Name___________________________
(Printed)
Student Number___________________
(Social Security Number)
E-mail__________________________
First retrieve the data set "Boston Cities" from the class web site. I found these data in StatLib, where you can obtain the background information.(1) The data on the web page come in two versions: one contains 14 variables for 506 towns in the Boston area; the other contains a subset of 4 of these variables. The smaller one will "fit" in the Student Version of Minitab for Windows. (The full version will, of course, accommodate either set.)
The information was apparently collected to investigate the effect of air quality on housing prices, but we will use it for a different purpose: an examination of crime rates in these towns. You should retrieve the set that your "system" can handle. The stripped-down or minimal batch is easier to work with but doesn't allow you to investigate as many hypotheses. In any event, you should save the files on your diskette or hard drive, as you have been doing, so that you can analyze them as needed.
Here are the variables in the full set:
c1 per capita crime rate by town
c2 proportion of residential land zoned for lots over 25,000 sq.ft.
c3 proportion of non-retail business acres per town
c4 Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
c5 nitric oxides concentration (parts per 10 million)
c6 average number of rooms per dwelling
c7 proportion of owner-occupied units built prior to 1940
c8 weighted distances to five Boston employment centres
c9 index of accessibility to radial highways
c10 full-value property-tax rate per $10,000
c11 pupil-teacher ratio by town
c12 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
c13 % lower status of the population
c14 Median value of owner-occupied homes in $1000's
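The transformation behind c12 can be hard to read at a glance, so here is a small worked sketch in Python. The function name is only illustrative and the three proportions are chosen for arithmetic convenience; they are not values from the data set.

```python
def c12_from_proportion(bk: float) -> float:
    """Return 1000 * (Bk - 0.63)**2 for a proportion Bk between 0 and 1."""
    return 1000 * (bk - 0.63) ** 2


# Illustrative proportions only (not taken from the Boston data).
for bk in (0.0, 0.63, 1.0):
    print(bk, round(c12_from_proportion(bk), 1))
```

Because the difference is squared, c12 is smallest (zero) at Bk = 0.63 and grows in both directions, so two different proportions can produce the same c12 value.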
Here are the variables for the smaller version:
c1 Per capita crime rate by town
c2 Proportion of owner-occupied units built prior to 1940
c3 % lower status of the population
c4 Median value of owner-occupied homes in $1000's
Our job is to explain variation in crime rates by reference to one or more independent variables: proportion of houses built before 1940 (proportion of older houses), percent of the population classified as "lower status," and median value of owner-occupied houses. All of these pertain to the social and economic environment of the towns. We will try to find a relationship between crime and one or more of the variables. We'll also strive to create the most parsimonious, or simplest, model.
             | Mean | Median | Standard deviation | Minimum | Maximum | Valid N |
Crime        |      |        |                    |         |         |         |
House age    |      |        |                    |         |         |         |
% lower      |      |        |                    |         |         |         |
Median value |      |        |                    |         |         |         |
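If you are not working in Minitab, the six summary statistics in the table above can be computed along the following lines. This is only a sketch using Python's standard library: the function name `describe` and the sample values are made up for illustration, and missing observations are assumed to be coded as `None`.

```python
import statistics


def describe(values):
    """Summarize one column, ignoring missing observations coded as None."""
    valid = [v for v in values if v is not None]
    return {
        "mean": statistics.mean(valid),
        "median": statistics.median(valid),
        "stdev": statistics.stdev(valid),  # sample standard deviation (n - 1 divisor)
        "min": min(valid),
        "max": max(valid),
        "valid_n": len(valid),
    }


# Made-up values for illustration; substitute the real crime column.
crime_example = [1.0, 2.0, None, 4.0, 9.0]
summary = describe(crime_example)
for name, value in summary.items():
    print(name, value)
```

Note that `statistics.stdev` uses the n - 1 divisor (the sample standard deviation); check that this matches what your software reports before comparing numbers.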
             | Crime | House age | % lower |
House age    |       |           |         |
% lower      |       |           |         |
Median value |       |           |         |
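The layout of the table above suggests a lower-triangular matrix of correlations between pairs of variables. Assuming that the ordinary Pearson correlation coefficient is what is wanted, each cell could be computed as in this sketch. The function name `pearson_r` and the example numbers are illustrative assumptions, not part of the assignment data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squares of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up columns for illustration only.
house_age_example = [10.0, 20.0, 30.0, 40.0]
crime_example = [1.0, 3.0, 2.0, 5.0]
print(round(pearson_r(house_age_example, crime_example), 3))
```

Only the cells below the diagonal need to be filled in, because the correlation between X and Y is the same as the correlation between Y and X.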
_______________________________________________________ _____________________________________________________________________________
___________________________________________________ _________________________________________________________________________________
             | Log Crime |
House age    |           |
% lower      |           |
Median value |           |
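The "Log Crime" column above implies a logarithmic transformation of the crime rate. A minimal sketch of that step follows, assuming the natural logarithm is intended (the choice of base only rescales the result and does not change correlations). The function name and the sample values are made up for illustration.

```python
import math


def log_transform(values):
    """Return the natural log of each value; all values must be positive."""
    if any(v <= 0 for v in values):
        raise ValueError("logarithm needs strictly positive values")
    return [math.log(v) for v in values]


# Made-up crime rates spanning several orders of magnitude.
crime_example = [0.01, 0.1, 1.0, 10.0]
log_crime = log_transform(crime_example)
print([round(v, 3) for v in log_crime])
```

Values that differ by a constant ratio end up evenly spaced after the transformation, which is why a log is often tried on a strongly skewed variable.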
_________________________________________________ _______________________________________________________ ________________________________________________________ _________________________________________________________ _______________________________________________________________________
Why not try some further analysis on your own?
1. Harrison, D. and Rubinfeld, D.L., 'Hedonic Prices and the Demand for Clean Air', J. Environ. Economics & Management, vol. 5, 81-102, 1978. The data were used in Belsley, Kuh & Welsch, Regression Diagnostics, Wiley, 1980. N.B. Various transformations are used in the table on pages 244-261 of the latter.