
PPOL 502: Problem Set #2

Vivek Agarwal
February 11, 2015
Time Taken: 2.5 hours

Solution 1
1. A 1% increase in the distance of a house from a recently built garbage
incinerator results in an increase of 0.312% in the price of the house sold.
One would expect that as the distance between a house and a garbage
incinerator increases, its price would go up. Harmful gases released by an
incinerator are more likely to be found in houses closer to the incinerator
and are therefore likely to reduce the demand for such houses. Hence, the
positive sign of the coefficient on log(dist) is expected.
2. No. Several other factors have not been controlled for. It is likely
that incinerators are installed by the local authorities based on other
criteria, such as availability of land and distance from water sources
(to prevent contamination, etc.). Clearly these factors also determine the
price of the houses; therefore, ignoring them creates a bias.
3. As discussed above, proximity to, say, a clean water body (lake, sea, etc.)
can increase the price of a house dramatically. Further, incinerators are
usually not installed close to water bodies, to prevent contamination, etc.
Therefore, proximity to a water body increases the price directly, and it is
also associated with a greater distance from the incinerator.
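
The elasticity reading in part 1 can be checked numerically. The following is a minimal sketch, assuming only the coefficient of 0.312 on log(dist) reported above; the function name is hypothetical. It compares the usual "percent times elasticity" approximation with the exact change implied by a log-log model, which matters only for large changes in distance.

```python
# Sketch: exact vs. approximate price response in a log-log model.
# Assumption: the only input is the 0.312 elasticity reported in part 1.

ELASTICITY = 0.312  # coefficient on log(dist)


def pct_change_in_price(pct_change_in_dist, elasticity=ELASTICITY):
    """Exact % change in price when distance changes by the given %."""
    ratio = 1.0 + pct_change_in_dist / 100.0
    return (ratio ** elasticity - 1.0) * 100.0


for pct in (1, 10, 50):
    approx = ELASTICITY * pct          # linear approximation
    exact = pct_change_in_price(pct)   # implied by the log-log form
    print(f"{pct:>2}% farther: approx {approx:.3f}%, exact {exact:.3f}%")
```

For a 1% change the two numbers are nearly identical, which is why the coefficient is read directly as an elasticity.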

Solution 2
1. The average salary is $957.9455, while the average IQ in the sample
is 101.2824. The standard deviation of IQ in the sample is 15.05264. See
Figure 1.
2. The simple regression model is presented in Figure 2. The equation
is:
\[ \widehat{wage} = 116.9916 + 8.303064\,IQ \qquad (1) \]

Figure 1: Summary Results

Figure 2: Regression Results

Figure 3: Regression Results


Further, an increase of 15 points in IQ will result in an increase in wage
of 15 times the coefficient on IQ. This can be calculated as $124.54596
(= 15 × 8.303064).
The R^2 value tells us how much of the variation in the dependent variable
is explained by the independent variable. Here, the R^2 value is 0.0955.
Hence, only 9.55%, a very small part, of the variation in wage is
explained by IQ.
3. The regression model is presented in Figure 3. The equation is:
\[ \widehat{\ln(wage)} = 5.886994 + 0.0088072\,IQ \qquad (2) \]

This model predicts that a one unit change in IQ will result in a 0.88072%
change in wage. Further, an increase of 15 points in IQ will result in an
increase which is 15 times the coefficient of IQ in the model, i.e. 13.2108%.
However, note that this uses the approximation 100 × coefficient × change
in IQ, which becomes less accurate as the change grows; the exact
percentage change is 100 × [exp(coefficient × change in IQ) - 1].
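
The arithmetic in parts 2 and 3 can be reproduced with a short script. This is a sketch that assumes only the slope estimates reported in Equations 1 and 2; the function names are hypothetical. It also shows how far the 15-point approximation in the log model is from the exact figure.

```python
# Sketch: effect of a change in IQ under the level and log-level models.
# Assumption: slopes are copied from Equations 1 and 2 above.
import math

B_LEVEL = 8.303064   # slope on IQ, wage measured in dollars
B_LOG = 0.0088072    # slope on IQ, dependent variable ln(wage)


def level_effect(delta_iq):
    """Dollar change in predicted wage for a change in IQ."""
    return B_LEVEL * delta_iq


def approx_pct_effect(delta_iq):
    """Approximate % change in wage: 100 * beta * delta."""
    return 100.0 * B_LOG * delta_iq


def exact_pct_effect(delta_iq):
    """Exact % change in wage: 100 * (exp(beta * delta) - 1)."""
    return 100.0 * (math.exp(B_LOG * delta_iq) - 1.0)


print(f"level model, +15 IQ: ${level_effect(15):.5f}")
print(f"log model, +15 IQ (approx): {approx_pct_effect(15):.4f}%")
print(f"log model, +15 IQ (exact):  {exact_pct_effect(15):.4f}%")
```

The exact figure is larger than the approximation because exp(x) - 1 exceeds x for any positive x.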

Solution 3
1. A lower value of rank reflects a higher academic standing of the school.
Further, a higher academic standing is expected to result in students
with higher abilities - a valued asset for employers - leading to higher
starting salaries. Therefore, rank is inversely related to salary. This results
in the expectation of \(\beta_5 < 0\).
2. An increase in LSAT reflects a higher ability for better job performance
and consequently would result in higher salaries. Therefore, one would
expect \(\beta_1 > 0\). Similarly, a higher GPA would result in higher
salaries. Therefore, one would expect \(\beta_2 > 0\).
The higher the number of volumes in a law school library, libvol, the greater
the opportunities for students to gain more expertise. This expertise is
of value to employers and is expected to result in higher starting salaries.
Therefore, one would expect \(\beta_3 > 0\).
Students typically perform a cost-benefit analysis while making the
choice of a college. A college with higher cost would be favored only if the
future payoffs, including starting salary, are high. It could also be argued
that schools that charge more have greater capacity to provide better
education that would result in better academic outcomes for the students.
This is again of value to employers and results in the same conclusion.
Therefore, one would expect \(\beta_4 > 0\).
3. A one unit change in GPA results in a \(100 \cdot \hat{\beta}_2\)% change
in salary, i.e. a 24.8% increase in salary.

4. A \(\beta_3\)% increase in salary results from a 1% change in the value of
libvol. The estimated equation suggests that a 0.095% increase in salary
results from every 1% increase in the value of libvol.
5. For every one-place improvement in rank, one can expect an average
increase of 0.33% \((= |\hat{\beta}_5| \times 100)\) in salary. Therefore, an
improvement in ranking by 20 places is expected to result in a 6.6%
(= 20 × 0.33%) increase in salary.
The substantive importance of the 0.33% increase in salary for every
one-place improvement in rank has to be interpreted considering the
trade-offs involved in increasing salary by changing the other parameters.
However, overall, choosing a better ranked (lower value) college is expected
to result in higher salaries.
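
The percentage effects in parts 3 to 5 follow from the same log-model rule and can be sketched as below. The coefficient magnitudes are taken from the percentages quoted in the text (24.8%, 0.095%, 0.33%); the negative sign on rank is an assumption based on the argument in part 1, and the helper names are hypothetical.

```python
# Sketch: percentage effects on salary in a model with log(salary) on the left.
# Assumptions: magnitudes come from the percentages quoted in the text;
# the sign of the rank coefficient is assumed negative (lower rank is better).
import math

B_GPA = 0.248      # coefficient on GPA (level regressor)
B_LIBVOL = 0.095   # coefficient on log(libvol), i.e. an elasticity
B_RANK = -0.0033   # coefficient on rank (level regressor), sign assumed


def approx_pct(coef, delta):
    """Approximate % change in salary: 100 * coef * delta."""
    return 100.0 * coef * delta


def exact_pct(coef, delta):
    """Exact % change in salary: 100 * (exp(coef * delta) - 1)."""
    return 100.0 * (math.exp(coef * delta) - 1.0)


# One more GPA point, and a 20-place improvement in rank (rank falls by 20).
print(f"GPA +1:   approx {approx_pct(B_GPA, 1):.2f}%, exact {exact_pct(B_GPA, 1):.2f}%")
print(f"rank -20: approx {approx_pct(B_RANK, -20):.2f}%, exact {exact_pct(B_RANK, -20):.2f}%")
print(f"libvol +1%: about {B_LIBVOL:.3f}% higher salary")
```

The gap between the approximate and exact figures is noticeable for the GPA effect because a full GPA point is a large change.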

Solution 4
1. The estimated equation is presented below. Also, see Figure??.
\[ \widehat{price} = -19.315 + 0.1284362\,sqrft + 15.19819\,bdrms \qquad (3) \]

2. The point estimate of the increase in the price of a house from one more
bedroom, holding square footage constant, is $15,198 (= 15.19819 × 1000).
However, the coefficient on bdrms is not statistically significant even at a
significance level of 10%. Hence, this effect is statistically
indistinguishable from $0.
3. The estimated increase in the price for a house with an additional bedroom
that is 140 square feet in size is $33,179
\((= (\hat{\beta}_1 \, \Delta sqrft + \hat{\beta}_2 \, \Delta bdrms) \times 1000)\).

Figure 4: Regression Results

Figure 5: Summary Results


4. The R-squared value gives the percentage of the variation in the dependent
variable that is explained by the independent variables. The R-squared value
for this estimated model is 0.6319 or 63.19%. Therefore, 63.19% of the
variation in price for a house is explained by the variation in square footage
and number of bedrooms.
5. Substituting sqrft = 2438 and bdrms = 4 in Equation 3, we have a predicted
price of 354.60522. Hence, the predicted price is $354,605.
6. The residual is the difference between the actual value (price) and the
estimated value (\(\widehat{price}\)). Therefore, the residual here is
-$54,605. Hence, the buyer underpaid by $54,605 according to the predictions
from the model.
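
The calculations in parts 3, 5 and 6 can be reproduced as follows. This is a sketch using the coefficients of Equation 3, with price in thousands of dollars. The intercept is taken as -19.315, the sign that reproduces the reported prediction of 354.60522. The actual selling price of 300 (thousand dollars) is not stated in the text; it is an assumption implied by the reported residual. Function names are hypothetical.

```python
# Sketch: prediction, marginal change and residual for the house price model.
# Assumptions: coefficients from Equation 3 (price in $1000s); the actual
# price of 300 is inferred from the reported residual, not given in the text.

B0 = -19.315          # intercept (sign chosen to match the reported prediction)
B_SQRFT = 0.1284362   # coefficient on sqrft
B_BDRMS = 15.19819    # coefficient on bdrms


def predict_price(sqrft, bdrms):
    """Predicted price in thousands of dollars."""
    return B0 + B_SQRFT * sqrft + B_BDRMS * bdrms


def price_change(d_sqrft, d_bdrms):
    """Change in predicted price (thousands) for given changes in regressors."""
    return B_SQRFT * d_sqrft + B_BDRMS * d_bdrms


ACTUAL_PRICE = 300.0  # thousands of dollars, assumed (see lead-in)

predicted = predict_price(2438, 4)
residual = ACTUAL_PRICE - predicted   # actual minus fitted

print(f"extra 140 sq ft bedroom: ${price_change(140, 1) * 1000:,.0f}")
print(f"predicted price:         ${predicted * 1000:,.0f}")
print(f"residual:                ${residual * 1000:,.0f}")
```

A negative residual means the actual price is below the fitted value, which is the sense in which the buyer underpaid relative to the model.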

Solution 5
1. The average value of prpblck is 0.1134864, and the average value of income
is $47053.78. See Figure 5. prpblck is a unitless proportion, while income
is measured in dollars per year; both are ratio-scale variables.
Figure 6: Regression Results

2. The estimated equation is presented below (see Equation 4). Also, see
Figure 6. The R-squared is 0.0642 and the sample size is 401.
\[ \widehat{psoda} = 0.9563196 + 0.1149882\,prpblck + 0.0000016\,income \qquad (4) \]

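For every one-unit increase in prpblck (that is, moving from a population
that is 0% black to one that is 100% black), the price of a medium soda
increases by $0.1149882; equivalently, a one percentage point increase raises
the price by about $0.00115. The full one-unit increase is about 11% of the
mean of psoda (1.044876) and greater than its standard deviation
(0.0886873). Therefore, across communities that differ widely in prpblck,
this change can be considered substantively significant.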
3. The estimated equation (without controlling for income) is presented
below (see Equation 5). Also, see Figure 7.
\[ \widehat{psoda} = 1.037399 + 0.0649269\,prpblck \qquad (5) \]

The previous model estimates a larger effect: a $0.1149882 increase in the
price of soda per unit increase in prpblck, compared to the $0.0649269
predicted by the modified model. The discrimination effect is larger when
income is controlled for in the model.
4. For every one-unit increase in prpblck, the price of soda increases by
\(100 \cdot \hat{\beta}_1\)%, i.e. 12.15803%. See Figure 8. Therefore, an
increase of 0.20 (20 percentage points) in prpblck will increase the price of
soda by 2.431606% (= 12.15803% × 0.20).
5. On adding the variable prppov to the regression model we notice that the
value of \(\hat{\beta}_{prpblck}\) decreases from 0.1215803 to 0.0728072.
See Figures 8 & 9.

Figure 7: Regression Results

Figure 8: Regression Results

Figure 9: Regression Results

Figure 10: Correlation Results


6. The correlation between log(income) and prppov is found to be -0.8385. See
Figure 10. A negative correlation was expected because median family
income, income, is expected to be low in poor areas. However,
it must be noted that areas with a low poverty proportion might still have
low median family incomes.
7. The statement "because log(income) and prppov are so highly correlated,
they have no business being in the same regression" is partly unjustified.
log(income) and prppov are not perfectly collinear; both relate to poverty
but capture it through different measures. Therefore, although including
both might lead to multicollinearity, it will not lead to perfect
collinearity, and they can be included in the same regression.
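
Returning to parts 2 and 4 of this solution, the size of the prpblck effect depends on reading prpblck as a proportion between 0 and 1. The sketch below assumes only the coefficients reported above (0.1149882 in the level model and 0.1215803 in the log model); the function names are hypothetical. It converts a change in prpblck into dollars for the level model and into a percentage for the log model.

```python
# Sketch: effect of a change in prpblck (a proportion, 0 to 1) on soda price.
# Assumption: coefficients are copied from the estimates reported above.
import math

B_LEVEL = 0.1149882   # level model, psoda in dollars (Equation 4)
B_LOG = 0.1215803     # log model, dependent variable log(psoda) (part 4)


def dollar_change(delta_prpblck):
    """Change in predicted psoda, in dollars, for a change in prpblck."""
    return B_LEVEL * delta_prpblck


def approx_pct_change(delta_prpblck):
    """Approximate % change in psoda: 100 * beta * delta."""
    return 100.0 * B_LOG * delta_prpblck


def exact_pct_change(delta_prpblck):
    """Exact % change in psoda: 100 * (exp(beta * delta) - 1)."""
    return 100.0 * (math.exp(B_LOG * delta_prpblck) - 1.0)


print(f"+1 percentage point:   ${dollar_change(0.01):.5f}")
print(f"+20 percentage points: approx {approx_pct_change(0.20):.4f}%, "
      f"exact {exact_pct_change(0.20):.4f}%")
```

A one percentage point change corresponds to a change of 0.01 in prpblck, so the dollar effect is one hundredth of the coefficient.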

Solution 6
Figure 11: Matrix Plot

1. MLR1: Although the few points corresponding to higher values of ln(income)
and prpblck tend to skew the best fit line, a generally linear trend can be
observed. See Figure 11.
2. MLR2: No information on random sampling has been provided. However,
for the purposes of this exercise it has been assumed that random sampling
was in fact adhered to.
3. MLR3: From Figure 11 it is clear that none of the independent variables
(ln(income) and prpblck) is constant. Also, the number of observations
(401) is far greater than the number of independent variables (2). In
addition, Stata does not report a problem with perfect collinearity. Hence,
it can be comfortably concluded that MLR3 is satisfied.
4. MLR4: This condition (zero conditional mean) essentially requires that, in
the rvp plot for each independent variable, the residuals be roughly a mirror
image across the residual = 0 line at every value of the variable (assuming
one dot represents one point).
Studying the rvp plot for prpblck, we can conclude that across almost all
the values of prpblck the residuals add up to approximately zero. See Figure
12. Similarly, on analyzing the rvp plot for ln(income) we notice that
although for a few low values of ln(income) the residuals are net negative,
they generally cancel each other out for most other values of ln(income).
See Figure 13.

Figure 12: Residual versus Predictor Plot for prpblck


This can also be interpreted from the rvf plot. Except for the lower
and higher fitted values the residuals approximately cancel each other
out. This deviation in lower and higher values is not surprising and was
expected based on the interpretations of the rvp plots. See Figure 14.
5. Unbiasedness: Since MLR1-4 are satisfied, it can be concluded that the
estimated equation is unbiased. Further, we can now explore the efficiency
of the estimated equation by evaluating MLR 5.
6. MLR5: This condition (homoskedasticity) essentially requires that the
envelope of points be rectangular in shape and symmetric around the
residuals = 0 line for all independent variables.
Studying the rvp plot for prpblck, we see that the residuals are boxed in
between -0.2 and 0.2 for values of prpblck between 0.1 and 0.7. For other
values of prpblck, although some residuals fall above or below this
rectangle, they are very few in number. Therefore, we can conclude that MLR
5 is almost satisfied for prpblck. See Figure 12. Similarly, on analyzing
the rvp plot for ln(income) we notice that for values of ln(income) between
10.25 and 11.25 the residuals are boxed between -0.2 and 0.2. However, for
the other values they are much lower than 0.2. Therefore, we can conclude
that MLR 5 is not satisfied for ln(income). See Figure 13.

Figure 13: Residual versus Predictor Plot for lnincome


Figure 14: Residual versus Fitted Value Plot


Figure 15: Kernel Density Plot


This can also be seen from the rvf plot. Though the residuals are boxed
between -0.2 and 0.2 for fitted values between 0 and 0.07, they are much
lower for higher and lower fitted values. This deviation at lower and higher
values is not surprising and was expected based on the interpretations of
the rvp plots. See Figure 14.
7. Efficiency: Since MLR 5 is violated overall, we can conclude that the
estimated equation is not an efficient estimator.
8. MLR6: This condition requires the population error u to be independent
of the independent variables and normally distributed with
zero mean and constant variance. This can be analyzed from the plot
in Figure 15. Clearly, the residuals do not follow a normal distribution.
Therefore, it can be concluded that MLR 6 is not satisfied.
9. Reliability of Standard Errors: Since MLR 6 is not satisfied we can
conclude that the standard errors are not reliable.
