
PPOL 509-03

Problem Set 3
Exercises in Multiple Linear Regression
Due in class 2/1/07

1. You will need to use SAS and the data from MEAP93.RAW for this exercise.
The variables we are interested in are math10 (the math pass rate), lexpend
(the natural log of spending per student), and lnchprg (the share of students
in a school qualifying for the school lunch program). Note that low-income
students qualify for the school lunch program, so here higher values of
lnchprg suggest that the school’s student body is poorer.

a. First, I would like you to produce two scatterplots: (1) plot math10 on the
vertical axis and lexpend on the horizontal axis and (2) plot math10 on the vertical
axis and lnchprg on the horizontal axis.

b. In the regression

$$\widehat{\text{math10}} = \hat{\beta}_0 + \hat{\beta}_1\,\text{lexpend} + \hat{\beta}_2\,\text{lnchprg}$$

What sign do you predict for the coefficients $\hat{\beta}_1$ and $\hat{\beta}_2$?

c. Using SAS, estimate the regression above. Write your results in equation
form.
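The estimation itself is done in SAS. As a hedged illustration of what the regression computes, here is a minimal Python/NumPy sketch of OLS with an intercept and two regressors. The helper `ols` is hypothetical and the data below are synthetic stand-ins, not MEAP93; with the real columns it should reproduce the SAS coefficients up to rounding.

```python
import numpy as np

def ols(y, *regressors):
    """Return OLS coefficients [b0, b1, ..., bk] for y on an intercept plus regressors."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    # lstsq minimizes ||y - X b||^2, i.e. it solves the OLS normal equations
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Synthetic stand-ins for lexpend and lnchprg (NOT the MEAP93 data)
rng = np.random.default_rng(0)
n = 400
lexpend = rng.normal(8.4, 0.2, n)
lnchprg = rng.uniform(0, 80, n)
math10 = -20 + 6 * lexpend - 0.3 * lnchprg + rng.normal(0, 8, n)

b0, b1, b2 = ols(math10, lexpend, lnchprg)
print(f"math10_hat = {b0:.3f} + {b1:.3f} lexpend + {b2:.3f} lnchprg")
```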

d. In a concise paragraph, drawing on the estimates from (c), describe the
relationship between math10, lexpend, and lnchprg as precisely as you can.
Indicate the direction, magnitude, and statistical significance of the
relationship based on this sample. What concerns do you have about the
interpretation of the coefficients on lexpend and lnchprg?

e. What is the R-squared for this estimation? What does it reveal about our
empirical specification?

f. In SAS, create a new variable called math10hat (that is, $\widehat{\text{math10}}$), defined as follows:

$$\widehat{\text{math10}} = \hat{\beta}_0 + \hat{\beta}_1\,\text{lexpend} + \hat{\beta}_2\,\text{lnchprg}$$

Calculate the correlation coefficient r between math10 and $\widehat{\text{math10}}$. What
does r tell us about how well our model predicts math10?
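A standard OLS fact bears on parts (e) and (f): when the regression includes an intercept, the squared correlation between the dependent variable and its fitted values equals R-squared. The Python sketch below is an illustration rather than the SAS steps; the helper name is hypothetical and the data are synthetic, not MEAP93.

```python
import numpy as np

def r_squared_and_corr(y, *regressors):
    """Fit OLS with an intercept; return (R-squared, corr(y, y_hat))."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = X @ beta                        # fitted values, the analogue of math10hat
    ssr = np.sum((y - y_hat) ** 2)          # sum of squared residuals
    sst = np.sum((y - y.mean()) ** 2)       # total sum of squares
    r2 = 1 - ssr / sst
    r = np.corrcoef(y, y_hat)[0, 1]         # Pearson correlation of y and y_hat
    return r2, r

# Synthetic data (NOT MEAP93), only to illustrate the identity
rng = np.random.default_rng(1)
x1, x2 = rng.normal(size=200), rng.normal(size=200)
y = 2 + 1.5 * x1 - 0.8 * x2 + rng.normal(size=200)
r2, r = r_squared_and_corr(y, x1, x2)
print(r2, r ** 2)  # equal up to floating-point error
```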

2. (Wooldridge 3.1)
Using the data in GPA2.RAW on 4,137 college students, the following equation was
estimated by OLS:
$$\widehat{\text{colgpa}} = 1.392 - 0.0135\,\text{hsperc} + 0.00148\,\text{sat}$$
n = 4,137, R-squared = 0.273
where colgpa is measured on a four-point scale, hsperc is the percentile in the high
school graduating class (defined so that, for example, hsperc = 5 means the top five
percent of the class), and sat is the combined math and verbal score on the student
achievement test.
i. Why does it make sense for the coefficient on hsperc to be negative?
ii. What is the predicted college GPA when hsperc = 20 and sat = 1050?
iii. Suppose that two high school graduates, A and B, graduated in the same
percentile from high school, but student A’s SAT score was 140 points
higher (about one standard deviation in the sample). What is the predicted
difference in college GPA for these two students? Is the difference large?
iv. Holding hsperc fixed, what difference in SAT scores leads to a predicted
colgpa difference of 0.50, or one-half of a grade point? Comment on your
answer.
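Because the coefficients are reported, parts (ii) through (iv) come down to arithmetic on the fitted equation. A minimal Python sketch, with hypothetical function names, that evaluates the equation and its partial effect of sat holding hsperc fixed:

```python
# Coefficients of the reported equation: colgpa_hat = 1.392 - 0.0135 hsperc + 0.00148 sat
B0, B_HSPERC, B_SAT = 1.392, -0.0135, 0.00148

def predicted_colgpa(hsperc, sat):
    """Predicted college GPA from the reported OLS equation."""
    return B0 + B_HSPERC * hsperc + B_SAT * sat

def gpa_difference(delta_sat):
    """Predicted colgpa difference for an SAT gap, holding hsperc fixed."""
    return B_SAT * delta_sat

def sat_gap_for(delta_gpa):
    """SAT difference needed for a given predicted colgpa difference, holding hsperc fixed."""
    return delta_gpa / B_SAT

print(predicted_colgpa(20, 1050))  # part (ii)
print(gpa_difference(140))         # part (iii)
print(sat_gap_for(0.50))           # part (iv)
```

Since hsperc is held fixed in (iii) and (iv), its term cancels and only the sat coefficient matters.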

3. (Wooldridge 3.4)
The median starting salary for new law school graduates is determined by

$$\log(\text{salary}) = \beta_0 + \beta_1\,\text{LSAT} + \beta_2\,\text{GPA} + \beta_3\log(\text{libvol}) + \beta_4\log(\text{cost}) + \beta_5\,\text{rank} + u$$


where LSAT is the median LSAT score for the graduating class, GPA is the median
college GPA for the class, libvol is the number of volumes in the law school library,
cost is the annual cost of attending law school, and rank is a law school ranking (with
rank = 1 being the best).
(i) Explain why we expect $\beta_5 \le 0$.
(ii) What signs do you expect for the other slope parameters? Justify
your answer.
(iii) Using the data in LAWSCH85.RAW, the estimated equation is

$$\widehat{\log(\text{salary})} = 8.34 + 0.0047\,\text{LSAT} + 0.248\,\text{GPA} + 0.95\log(\text{libvol}) + 0.038\log(\text{cost}) - 0.0033\,\text{rank}$$
n = 136, R-squared = 0.842
What is the predicted ceteris paribus difference in salary for
schools with a median GPA different by one point? (Report your
answer as a percentage.)
(iv) Interpret the coefficient on the variable log(libvol).
(v) Would you say it is better to attend a higher ranked law school?
How much is a difference in ranking of 20 worth in terms of
predicted starting salary?
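With log(salary) as the dependent variable, a level regressor's coefficient times 100 gives the approximate percentage change in salary per unit change in that regressor, and 100·(exp(βΔx) − 1) gives the exact percentage change in the predicted value. For a regressor that enters in logs, such as log(libvol), the coefficient is instead read directly as an elasticity. A minimal Python sketch of the two conversions, with hypothetical function names, using coefficients from the reported equation:

```python
import math

def pct_change_approx(beta, delta_x):
    """Approximate % change in y when log(y) = ... + beta*x and x changes by delta_x."""
    return 100 * beta * delta_x

def pct_change_exact(beta, delta_x):
    """Exact % change in predicted y: 100 * (exp(beta * delta_x) - 1)."""
    return 100 * (math.exp(beta * delta_x) - 1)

# Coefficients taken from the reported LAWSCH85 equation
B_GPA, B_RANK = 0.248, -0.0033

# part (iii): median GPA higher by one point
print(pct_change_approx(B_GPA, 1), pct_change_exact(B_GPA, 1))
# part (v): a school ranked 20 places better, i.e. rank lower by 20
print(pct_change_approx(B_RANK, -20), pct_change_exact(B_RANK, -20))
```

The approximation is close for small βΔx and drifts further from the exact figure as βΔx grows.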

4. (Wooldridge C3.1)
A problem of interest to health officials (and others) is to determine the effects of
smoking during pregnancy on infant health. One measure of infant health is birth
weight; a birth weight that is too low can put an infant at risk for contracting various
illnesses. Since factors other than cigarette smoking that affect birth weight are likely
to be correlated with smoking, we should take those factors into account. For
example, higher income generally results in access to better prenatal care, as well as
better nutrition for the mother. An equation that recognizes this is:
$$\text{bwght} = \beta_0 + \beta_1\,\text{cigs} + \beta_2\,\text{faminc} + u$$
(i) What is the most likely sign for $\beta_2$?
(ii) Do you think cigs and faminc are likely to be correlated? Explain why
the correlation might be positive or negative.
(iii) Now, estimate the equation with and without faminc, using the data in
BWGHT.RAW. Report both sets of results in equation form,
including the sample size and R-squared. Discuss your results,
focusing on whether adding faminc substantially changes the estimated
effect of cigs on bwght.
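A standard algebraic result in multiple regression helps with part (iii): the slope on cigs from the regression without faminc equals the slope on cigs from the regression with faminc, plus the faminc coefficient times the slope from a simple regression of faminc on cigs. The Python sketch below checks this in-sample identity; the variable names mirror BWGHT, but the numbers are synthetic, and `slope_coefs` is a hypothetical helper.

```python
import numpy as np

def slope_coefs(y, *regressors):
    """OLS coefficients (intercept first) of y on the given regressors."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Synthetic data (NOT BWGHT): cigs falls with faminc by construction
rng = np.random.default_rng(2)
n = 1000
faminc = rng.gamma(2.0, 15.0, n)
cigs = np.maximum(0, 10 - 0.15 * faminc + rng.normal(0, 5, n))
bwght = 115 - 0.5 * cigs + 0.1 * faminc + rng.normal(0, 20, n)

_, b1_short = slope_coefs(bwght, cigs)                  # without faminc
_, b1_long, b2_long = slope_coefs(bwght, cigs, faminc)  # with faminc
_, delta1 = slope_coefs(faminc, cigs)                   # faminc on cigs

# In-sample identity: short slope = long slope + b2_long * delta1
print(b1_short, b1_long + b2_long * delta1)
```

The gap between the two cigs estimates is therefore large only when both the faminc coefficient and the cigs-faminc relationship are sizable.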

5. (Wooldridge C3.2)
Use the data in HPRICE1.RAW to estimate the model

$$\text{price} = \beta_0 + \beta_1\,\text{sqrft} + \beta_2\,\text{bdrms} + u$$

where price is the house price measured in thousands of dollars.
(i) Write out the results in equation form.
(ii) What is the estimated increase in price for a house with one more
bedroom, holding square footage constant?
(iii) What is the estimated increase in price for a house with an
additional bedroom that is 140 square feet in size? Compare this to
your answer in part (ii).
(iv) What percentage of the variation in price is explained by square
footage and number of bedrooms?
(v) The first house in the sample has sqrft=2,438 and bdrms=4. Find
the predicted selling price for this house from the OLS regression
line.
(vi) The actual selling price of the first house in the sample was
$300,000 (so price=300). Find the residual for this house. Does it
suggest that the buyer overpaid or underpaid for the house?
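Once the coefficients from part (i) are in hand, parts (iii), (v), and (vi) are plug-in calculations. A minimal Python sketch, assuming you pass in your own estimates b0, b1, b2 (the function names are hypothetical and no estimates are hard-coded here):

```python
def predicted_price(b0, b1, b2, sqrft, bdrms):
    """Fitted value price_hat = b0 + b1*sqrft + b2*bdrms (price in $1,000s)."""
    return b0 + b1 * sqrft + b2 * bdrms

def residual(actual_price, b0, b1, b2, sqrft, bdrms):
    """Residual = actual price minus fitted price; positive means price above the fit."""
    return actual_price - predicted_price(b0, b1, b2, sqrft, bdrms)

def price_change(b1, b2, delta_sqrft, delta_bdrms):
    """Predicted change in price when sqrft and bdrms change together."""
    return b1 * delta_sqrft + b2 * delta_bdrms
```

For example, `price_change(b1, b2, 140, 1)` covers a 140-square-foot extra bedroom, and `residual(300, b0, b1, b2, 2438, 4)` covers the first house.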
