DEPARTMENT OF POLITICAL SCIENCE AND INTERNATIONAL RELATIONS

Posc/Uapp 815



Assignment 9

MULTIPLE REGRESSION



Name___________________________

(Printed)

Student Number___________________

(Social Security Number)

E-mail__________________________

First retrieve the data set "Boston Cities" from the class web site. I found these data in the StatLib archive, where you can also obtain the background information.(1) The data on the web page come in two versions: one contains 14 variables for 506 towns in the Boston area; the other contains a subset of 4 of these variables. The smaller one will "fit" in the Student Version of Minitab for Windows. (The full version will, of course, accommodate either set.)

The information was apparently collected to investigate the effect of air quality on housing prices, but we will use it for a different purpose: an examination of crime rates in these towns.

You should retrieve the set that your "system" can handle. The stripped down or minimal batch is easier to work with but doesn't allow you to investigate as many hypotheses. In any event you should save the files on your diskette or hard drive as you have been doing so that you can analyze them as needed.

Here are the variables in the full set:

c1 per capita crime rate by town

c2 proportion of residential land zoned for lots over 25,000 sq.ft.

c3 proportion of non-retail business acres per town

c4 Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)

c5 nitric oxides concentration (parts per 10 million)

c6 average number of rooms per dwelling

c7 proportion of owner-occupied units built prior to 1940

c8 weighted distances to five Boston employment centres

c9 index of accessibility to radial highways

c10 full-value property-tax rate per $10,000

c11 pupil-teacher ratio by town

c12 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town

c13 % lower status of the population

c14 Median value of owner-occupied homes in $1000's

Here are the variables for the smaller version:

c1 Per capita crime rate by town

c2 Proportion of owner-occupied units built prior to 1940

c3 % lower status of the population

c4 Median value of owner-occupied homes in $1000's

Our job is to explain variation in crime rates by reference to one or more independent variables: proportion of houses built before 1940 (proportion of older houses), percent of the population classified as "lower status," and median value of owner-occupied houses. Of course, these all pertain to the social and economic environment of the towns. We will try to find a relationship between crime and one or more of the variables. We'll also strive to create the most parsimonious or simplest model.



  1. Let's proceed more or less systematically. First obtain descriptive statistics for the four variables. (Valid N means the number of cases.)




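                Mean       Median     Standard    Minimum    Maximum    Valid N
                                      deviation
Crime           ________   ________   ________    ________   ________   ________
House age       ________   ________   ________    ________   ________   ________
% lower         ________   ________   ________    ________   ________   ________
Median value    ________   ________   ________    ________   ________   ________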
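If you are working outside Minitab, the six statistics requested for each variable can be sketched with Python's standard library. This is only an illustration: the numbers below are made up and are NOT the Boston data, and `describe` is a hypothetical helper name.

```python
import statistics

# Hypothetical values for illustration only; NOT the Boston data.
crime = [0.02, 0.05, 0.09, 0.25, 0.61, 1.40, 3.70, 9.20, 24.00]

def describe(values):
    """Return the six summary statistics asked for in the table."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation (divides by n - 1)
        "min": min(values),
        "max": max(values),
        "n": len(values),                   # "Valid N": the number of cases
    }

summary = describe(crime)
for name, value in summary.items():
    print(f"{name:>6}: {value:.3f}")
```

Comparing the mean with the median is a quick first check on the shape of a distribution: a mean far above the median points to a long right tail.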







  1. Do you notice anything about the dependent variable?
    1. ________________________________________________________ _______________________________________________________________ __________________________________________________________________________ _______________________________________________________________________


  1. Attach a stem-and-leaf plot of the dependent variable. It should suggest that a transformation is necessary, but wait for now.
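Minitab produces the stem-and-leaf display for you, but the idea is simple enough to sketch by hand. The function below is a minimal illustration, assuming non-negative values that are truncated (not rounded) to whole leaf units; `stem_and_leaf` and the sample numbers are hypothetical and are NOT the Boston data.

```python
from collections import defaultdict

def stem_and_leaf(values, leaf_unit=1.0):
    """Return the lines of a stem-and-leaf display for non-negative values."""
    rows = defaultdict(list)
    for v in sorted(values):
        # Truncate to whole leaf units; the tiny offset guards against
        # floating-point results such as 2.9999999999999996.
        scaled = int(v / leaf_unit + 1e-9)
        stem, leaf = divmod(scaled, 10)
        rows[stem].append(leaf)
    lines = []
    # Include empty stems so that gaps in the distribution stay visible.
    for stem in range(min(rows), max(rows) + 1):
        leaves = "".join(str(d) for d in rows.get(stem, []))
        lines.append(f"{stem:>3} | {leaves}")
    return lines

# Hypothetical values for illustration only; NOT the Boston data.
sample = [0.2, 0.4, 1.1, 1.9, 2.5, 3.8, 5.0, 9.6, 14.3, 25.7, 38.2]
lines = stem_and_leaf(sample, leaf_unit=1.0)
print("\n".join(lines))
```

Most of the leaves pile up on the first stem while a few cases trail off on higher stems. That pattern is what a strongly right-skewed variable looks like in this display.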


  2. Find the correlation matrix:
                Crime      House age   % lower
House age       ________
% lower         ________   ________
Median value    ________   ________    ________
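For reference, each entry of the correlation matrix is a Pearson correlation coefficient. The sketch below computes it from its definition and prints a lower-triangular table; `pearson_r` is a hypothetical helper and the values are made up for illustration, NOT taken from the Boston data.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values for illustration only; NOT the Boston data.
data = {
    "Crime":     [0.1, 0.3, 0.9, 2.5, 6.0, 11.0],
    "House age": [20.0, 35.0, 50.0, 70.0, 85.0, 95.0],
    "% lower":   [4.0, 7.0, 9.0, 14.0, 19.0, 25.0],
}

names = list(data)
# Print only the lower triangle: the matrix is symmetric, so the
# upper triangle would repeat the same numbers.
for i, row in enumerate(names[1:], start=1):
    cells = [f"{pearson_r(data[row], data[col]):6.3f}" for col in names[:i]]
    print(f"{row:<12}" + "  ".join(cells))
```

A coefficient near +1 or -1 between two independent variables is the usual warning sign that their separate effects will be hard to disentangle in a regression.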


    1. What does the correlation between crime and percent of population classified as lower tell you?

_______________________________________________________ _____________________________________________________________________________

    1. What does the correlation between median housing values and percent of population classified as lower tell you about multicollinearity?

___________________________________________________ _________________________________________________________________________________

    1. Now regress the crime variable on the proportion of older houses (house age). What is the estimated regression equation? (Please be neat.) ___________________________________________________________


    2. What is the measure of scale of Y, the dependent variable? ______________


    3. What is R^2?_____________________
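As a reminder of what the one-variable regression output contains, here is a minimal sketch of least squares with a single independent variable, assuming the usual model Y = b0 + b1*X + error. The helper name `simple_ols` and the data are hypothetical; the numbers are NOT the Boston data.

```python
def simple_ols(x, y):
    """Fit y = b0 + b1 * x by least squares; return (b0, b1, r_squared)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    b1 = sxy / sxx                      # slope
    b0 = mean_y - b1 * mean_x           # intercept
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))  # residual SS
    sst = sum((b - mean_y) ** 2 for b in y)                    # total SS
    return b0, b1, 1 - sse / sst

# Hypothetical values for illustration only; NOT the Boston data.
house_age = [20.0, 35.0, 50.0, 70.0, 85.0, 95.0]
crime = [0.1, 0.3, 0.9, 2.5, 6.0, 11.0]

b0, b1, r2 = simple_ols(house_age, crime)
print(f"crime = {b0:.3f} + {b1:.3f} * house_age   (R^2 = {r2:.3f})")
```

The slope is the estimated change in Y for a one-unit change in X, and R^2 is the share of the total variation in Y that the fitted line accounts for.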


    4. Now add the percent of the population classified as lower status to the model. That is, redo the regression analysis with the second independent variable added. What is the estimated equation? ______________________________________________________________


    1. Try to interpret this number in the context of the data. _________________________________________________________ _________________________________________________________ __________________________________________________________ __________________________


    1. What is R^2?_____________________


    2. Finally, add the third variable, median housing value. What is the estimated model when all three variables are included? ________________________________________________________________


    3. What is R^2?_____________________
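Adding independent variables one at a time, as in the steps above, amounts to re-solving the least-squares normal equations with one more column each time. The sketch below is one way to do that in plain Python, assuming the columns are not perfectly collinear; `solve` and `multiple_ols` are hypothetical helpers and the data are made up, NOT the Boston data.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):                    # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def multiple_ols(columns, y):
    """Fit y = b0 + b1*x1 + ... + bk*xk; return (coefficients, r_squared)."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]  # leading 1 = intercept
    k = len(rows[0])
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * yi for r, yi in zip(rows, y)) for p in range(k)]
    coef = solve(xtx, xty)                          # normal equations: (X'X) b = X'y
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    mean_y = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return coef, 1 - sse / sst

# Hypothetical values for illustration only; NOT the Boston data.
house_age = [20, 35, 50, 70, 85, 95, 60, 40]
pct_lower = [4, 9, 7, 14, 22, 25, 10, 12]
crime = [0.1, 0.4, 0.6, 2.2, 6.5, 10.8, 1.0, 0.9]

coef_1, r2_1 = multiple_ols([house_age], crime)
coef_2, r2_2 = multiple_ols([house_age, pct_lower], crime)
print(f"one variable:  R^2 = {r2_1:.3f}")
print(f"two variables: R^2 = {r2_2:.3f}")
```

R^2 cannot go down when a variable is added to a least-squares model with an intercept, so an increase by itself does not show that the new variable belongs in the model. The size of the increase is what matters.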


  1. We saw earlier that the dependent variable should be transformed. Create a new variable, log of crime rate, by taking the natural log of the raw variable. Attach a stem-and-leaf display of it.
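The transformation itself is a single step. The sketch below takes natural logs of a made-up, right-skewed variable and compares the gap between mean and median before and after; the values are hypothetical and are NOT the Boston data. Note that the natural log requires every value to be strictly positive.

```python
import math
import statistics

# Hypothetical values for illustration only; NOT the Boston data.
crime = [0.02, 0.05, 0.09, 0.25, 0.61, 1.40, 3.70, 9.20, 24.00]

# Natural log of each case; math.log raises ValueError for values <= 0.
log_crime = [math.log(v) for v in crime]

raw_gap = statistics.mean(crime) - statistics.median(crime)
log_gap = statistics.mean(log_crime) - statistics.median(log_crime)

print(f"raw: mean - median = {raw_gap:.3f}")
print(f"log: mean - median = {log_gap:.3f}")
```

In this made-up example the mean and median sit much closer together after the transformation, which is the kind of change the stem-and-leaf display of the logged variable should make visible.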


  2. What is the correlation between the transformed variable and the three independent variables?
                Log Crime
House age       ________
% lower         ________
Median value    ________




  1. Now regress log crime rate on the three independent variables and obtain the estimated equation. (Note: these parameters will obviously not correspond to the previous ones because the dependent variable now has a new measurement scale, log of crime rate, not crime rate.) _______________________________________________________________________


    1. What is R^2?_____________________


  1. Finally, eliminate median housing value from the model. Is the "fit" changed very much? Explain.


_________________________________________________ _______________________________________________________ ________________________________________________________ _________________________________________________________ _______________________________________________________________________
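One common way to compare the fit of models with different numbers of independent variables is adjusted R^2, which penalizes each added variable. It is offered here only as an optional aid, not as the required answer. The sketch assumes n = 506 cases, as stated for this data set; the R^2 values are placeholders, NOT results from the Boston data, and `adjusted_r2` is a hypothetical helper.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for a model with k independent variables fitted to n cases."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

n_cases = 506          # number of towns in the data set

# Placeholder R^2 values for illustration only; substitute your own results.
r2_three_vars = 0.50
r2_two_vars = 0.49

adj_three = adjusted_r2(r2_three_vars, n_cases, k=3)
adj_two = adjusted_r2(r2_two_vars, n_cases, k=2)
print(f"three variables: adjusted R^2 = {adj_three:.4f}")
print(f"two variables:   adjusted R^2 = {adj_two:.4f}")
```

If dropping a variable leaves adjusted R^2 nearly unchanged, the simpler model describes the data about as well, which fits the goal of parsimony.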

Why not try some further analysis on your own?

1. Harrison, D. and Rubinfeld, D.L., 'Hedonic Prices and the Demand for Clean Air', J. Environ. Economics & Management, vol. 5, 81-102, 1978. The data were also used in Belsley, Kuh & Welsch, Regression Diagnostics, Wiley, 1980. N.B. Various transformations are used in the table on pages 244-261 of the latter.

