Problem Set #4
Nathaniel Higgins
nhiggins@jhu.edu
Assignment
Read 4.1-4.3. Hand in answers to
C4.1(i)
C4.5(i) and (ii)
C4.7(i)-(iv)
C4.8(i)-(v)
C4.9(i) and (iii)
C4.10(i)-(vi)
C4.1
The following model can be used to study whether campaign expenditures affect election
outcomes:
$$\text{voteA} = \beta_0 + \beta_1 \log(\text{expendA}) + \beta_2 \log(\text{expendB}) + \beta_3\, \text{prtystrA} + u,$$
where voteA is the percentage of the vote received by Candidate A, expendA and expendB
are campaign expenditures by Candidates A and B, and prtystrA is a measure of party
strength for Candidate A (the percentage of the most recent presidential vote that went
to A's party).
i
What is the interpretation of $\beta_1$?
When Candidate A's campaign expenditure increases by 1%, the percentage of the vote that
Candidate A receives is predicted to increase by $\beta_1/100$ percentage points, holding
expendB and prtystrA fixed.
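The step from the coefficient to $\beta_1/100$ relies on the approximation $\Delta\log(x) \approx \%\Delta x / 100$, which is accurate for small changes. Holding expendB and prtystrA fixed, a sketch of the derivation is:

$$\Delta \text{voteA} = \beta_1\, \Delta\log(\text{expendA}) \approx \frac{\beta_1}{100}\, \%\Delta \text{expendA},$$

so setting $\%\Delta \text{expendA} = 1$ gives $\Delta \text{voteA} \approx \beta_1/100$ percentage points.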
C4.5
Use the data in MLB1.RAW for this exercise.
i
Use the model estimated in equation (4.31) and drop the variable rbisyr. What happens to
the statistical significance of hrunsyr? What about the size of the coefficient on hrunsyr?
The model estimated in equation (4.31) yields:
$\beta_{phsrank} = 0$ against the two-sided alternative hypothesis that $\beta_{phsrank} \neq 0$ at the 0.05
level of significance. (You can also see this from the p-value, which is greater than 0.05.)
Ten percentage points of high school rank is worth an increase of 10 × 0.0003032 =
0.003032 in log(wage). That is, since high school rank is already expressed in percentage
terms, we only need to multiply the coefficient by 10 to see how much log(wage) increases
when we increase phsrank by 10. We could leave it there, but it's not very useful
to interpret things in terms of log(wage) units. Because wage is logged on the left-hand
side of the regression equation, we can instead read this change of 0.003032 log(wage)
units as an approximate percentage change in wage. To do this, we multiply by 100.
Therefore, a 10 percentage point increase in phsrank increases wage by approximately
0.3032 percent. Not that much! High school really didn't matter (which is good for me,
because I spent most of high school figuring out creative ways to get in trouble).
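The arithmetic above uses the approximation $\%\Delta\text{wage} \approx 100\,\Delta\log(\text{wage})$. The short Python sketch below compares that approximation with the exact change $100\,[\exp(\Delta\log(\text{wage})) - 1]$ implied by the fitted log equation. It uses only the phsrank coefficient quoted above, and the function name is hypothetical.

```python
import math


def pct_change_from_log_change(delta_log):
    """Return (approximate, exact) percentage change in y for a change of delta_log in log(y)."""
    approx = 100 * delta_log                 # the usual 100 * change-in-log approximation
    exact = 100 * (math.exp(delta_log) - 1)  # exact change implied by the fitted log equation
    return approx, exact


coef_phsrank = 0.0003032             # coefficient on phsrank quoted in the text
delta_log_wage = 10 * coef_phsrank   # effect of a 10-percentage-point rise in phsrank
approx, exact = pct_change_from_log_change(delta_log_wage)
print(f"approximate: {approx:.4f}%, exact: {exact:.4f}%")
```

For a change this small, the two numbers differ only in the fourth decimal place, so the approximation is harmless here.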
iii
Does adding phsrank to (4.26) substantively change the conclusions on the returns to
two- and four-year colleges? Explain.
Adding phsrank to (4.26) (i.e., to a regression model that includes junior college credits,
total college credits, and work experience) does not seem to make much difference. The
share of variation explained by each model (the R-squared) is very similar, the magnitudes
of the coefficients (i.e., their absolute values) are very similar between the regressions,
and the standard errors (and hence the t-statistics) are essentially unchanged.
iv
The data set contains a variable called id. Explain why if you add id to equation (4.17)
or (4.26) you expect it to be statistically insignificant. What is the two-sided p-value?
By using the describe command in Stata I can see that the id variable is nothing
but an ID number, which should have absolutely nothing to do with a person's wage.
Therefore, if I add it to any regression model explaining wage, I should hope like hell
that it has no explanatory power. And it doesn't: the two-sided p-value on id is 0.587.
C4.8
The data set 401KSUBS.RAW contains information on net financial wealth (nettfa),
age of the survey respondent (age), annual family income (inc), family size (fsize), and
participation in certain pension plans for people in the United States. The wealth and
income variables are both recorded in thousands of dollars. For this question, use only
the data for single-person households (so fsize=1).
i
How many single-person households are there in the data set?
I used three commands to determine that, of the 9,275 observations in the data set, 2,017
come from individuals with a family size of 1. The only necessary command was sum
if fsize==1, which shows 2,017 observations in the data set with fsize==1.
I also wondered whether any observations of fsize were missing (which would mean there
could be more single-person households that I could not observe). To find this out, I
typed describe, which reports 9,275 total observations in the data set, and then typed
sum fsize, which reports 9,275 non-missing observations of fsize, i.e., there are no
missing values. Just some bonus knowledge for you.
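The logic of this check can be sketched in Python: count the observations that satisfy a condition, then compare the total number of observations with the number of non-missing values. The records below are made-up toy values, not 401KSUBS.RAW, and the helper names are hypothetical.

```python
# Hypothetical toy records standing in for 401KSUBS.RAW; these are NOT the real data.
records = [
    {"fsize": 1, "inc": 28.1},
    {"fsize": 3, "inc": 55.0},
    {"fsize": 1, "inc": 19.4},
    {"fsize": None, "inc": 40.2},  # None plays the role of a missing fsize value
    {"fsize": 2, "inc": 61.7},
]


def count_if(rows, var, value):
    """Count rows where rows[var] == value (missing values never match)."""
    return sum(1 for row in rows if row[var] == value)


def count_nonmissing(rows, var):
    """Count rows where var is observed, like the Obs column of `sum var`."""
    return sum(1 for row in rows if row[var] is not None)


n_total = len(records)                    # like the total reported by `describe`
n_single = count_if(records, "fsize", 1)  # like `sum if fsize==1`
n_missing = n_total - count_nonmissing(records, "fsize")
print(n_total, n_single, n_missing)
```

If `n_missing` is zero, as it is for fsize in the real data, then the count of single-person households cannot be an undercount caused by missing values.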
ii
Use OLS to estimate the model
$$\text{nettfa} = \beta_0 + \beta_1\, \text{inc} + \beta_2\, \text{age} + u,$$
and report the results using the usual format. Be sure to use only the single-person
households in the sample. Interpret the slope coefficients. Are there any surprises in the
slope estimates?
To run this regression I used the command reg nettfa inc age if fsize == 1. When
I did so I obtained the results:
$\beta_2 = 1$), what is the probability of getting a t-stat as large (in absolute value) as the t-stat we
observe? First, we have to calculate the t-stat. The t-stat under the null is:
$$t = \frac{0.8426563 - 1}{0.0920169} = -1.7099435.$$
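Since 0.8426563 < 1, the t-stat is negative, and because the t-distribution is symmetric,
the probability of a t-stat below -1.7099435 equals the probability of one above 1.7099435.
So we want to know the probability of getting a t-stat bigger than 1.7099435. If we were
testing a two-sided hypothesis (i.e., if the alternative hypothesis were $H_A: \beta_2 \neq 1$
instead of $H_A: \beta_2 < 1$), then we would double the value we are about to obtain. We
want to find: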
$$P(T > 1.7099435),$$
where T is a random draw from the t-distribution with 2,014 degrees of freedom
(n - k - 1 = 2,017 - 2 - 1). Let's find this exact value using Stata:
scalar pval = ttail(2014,1.7099435)
When I do this I obtain pval = .04371517. Since this p-value of about 0.044 is greater
than 0.01, we would not reject the null hypothesis at the 1% significance level (we would
reject it at the 5% level, since 0.044 < 0.05).
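Note that if you did not have Stata (or your version of Stata lacks the ttail command), you
could bracket this value by comparing the absolute value of the t-statistic (1.7099435)
with the one-sided critical values in Table G.2. With 2,014 degrees of freedom these are
essentially the large-sample values 1.645 (5% level) and 2.326 (1% level); since
1.645 < 1.7099435 < 2.326, the p-value must lie between 0.01 and 0.05.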
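The same one-sided p-value can also be reproduced with nothing but the Python standard library by integrating the t density numerically. This is a sketch. The helper names are hypothetical, Simpson's rule stands in for whatever algorithm Stata's ttail actually uses, and the only inputs are the estimate (0.8426563), standard error (0.0920169), and sample size (2,017) quoted in this write-up.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_upper_tail(df, t, steps=20_000):
    """Approximate P(T > t), the quantity Stata's ttail(df, t) reports.

    Uses Simpson's rule on [0, t]; steps must be even.
    """
    if t < 0:
        return 1.0 - t_upper_tail(df, -t, steps)
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # The density is symmetric about 0, so P(T > 0) = 0.5.
    return 0.5 - total * h / 3


beta2_hat, se_beta2 = 0.8426563, 0.0920169   # estimate and standard error from the text
t_stat = (beta2_hat - 1) / se_beta2          # t-stat for H0: beta2 = 1
df = 2017 - 2 - 1                            # n - k - 1 with n = 2,017 and k = 2 regressors
p_one_sided = t_upper_tail(df, abs(t_stat))  # P(T < t_stat) = P(T > |t_stat|) by symmetry
print(round(t_stat, 7), df, round(p_one_sided, 6))
```

Doubling `p_one_sided` gives the corresponding two-sided p-value.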
C4.9
Use the data in DISCRIM.RAW to answer this question.
i
Use OLS to estimate the model
$$\log(\text{psoda}) = \beta_0 + \beta_1\, \text{prpblck} + \beta_2 \log(\text{income}) + \beta_3\, \text{prppov} + u,$$
and report the results in the usual form. Is $\beta_1$ statistically different from zero at the 5%
level against a two-sided alternative? What about at the 1% level?
When I run the regression in Stata I obtain:
$\hat{\beta}_1$ is statistically different from zero at the 5% level, but not at the 1% level (its
p-value of 0.018 is less than 0.05 but greater than 0.01).
iii
To the regression in part (i), add the variable log(hseval). Interpret its coefficient and
report the two-sided p-value for $H_0: \beta_{\log(hseval)} = 0$.
When I add the variable log(hseval) to the regression above, I get a coefficient of
0.1213056 and a p-value reported as 0.000 (i.e., below 0.0005). I thus reject the null
hypothesis that the true coefficient on log(hseval) is zero at the 1% significance level. The
coefficient tells me that when log(hseval) increases by one unit, log(psoda) is predicted to
increase by about 0.12 units, holding the other regressors fixed. To convert this result
from units of logged variables to units of the variables themselves, I don't need to do
anything to the coefficient: because both the independent variable hseval and the
dependent variable psoda are logged, the coefficient is an elasticity. Therefore, I can say
that when median house value in a zip code increases by 1%, the price of a medium soda in
that same zip code is predicted to increase by about 0.12%.
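Because the coefficient is an elasticity, $100\,[(1 + p/100)^{\beta} - 1]$ gives the percentage change in psoda implied by the fitted log-log equation when hseval rises by p%, while $\beta p$ is the shortcut used above. The Python sketch below uses only the coefficient quoted in the text, and the function name is hypothetical.

```python
coef_lhseval = 0.1213056  # coefficient on log(hseval) quoted in the text


def implied_pct_change(elasticity, pct_change_x):
    """Percentage change in y implied by log(y) = ... + elasticity * log(x)
    when x rises by pct_change_x percent, other regressors held fixed."""
    return 100 * ((1 + pct_change_x / 100) ** elasticity - 1)


for pct in (1, 10):
    approx = coef_lhseval * pct  # the elasticity shortcut used in the text
    exact = implied_pct_change(coef_lhseval, pct)
    print(f"hseval +{pct}%: approx {approx:.4f}%, exact {exact:.4f}%")
```

For a 1% change the shortcut and the exact figure agree to about three decimal places. The gap widens somewhat for larger changes, such as 10%.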
C4.10
Use the data in ELEM94_95 to answer this question. The findings can be compared with
those in Table 4.1. The dependent variable lavgsal is the log of average teacher salary,
and bs is the ratio of average benefits to average salary (by school).
i
Run the simple regression of lavgsal on bs. Is the estimated slope statistically different
from zero? Is it statistically different from -1?
When I run the simple regression I get the following results:
$\hat{\beta}_{bs}$. The effect of reducing the unexplained variation in the model
outweighs the collinearity effect in this case. This makes sense when you observe that
the correlation between bs and lenrol is 0.02 and the correlation between bs and lstaff
is 0.04 (both relatively small).
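The trade-off described above can be made explicit with the standard sampling-variance formula for an OLS slope. This formula holds under the Gauss-Markov assumptions. It is a textbook result sketched here for intuition, not something computed in this problem set:

$$\operatorname{Var}(\hat{\beta}_j) = \frac{\sigma^2}{\mathrm{SST}_j\,(1 - R_j^2)},$$

Here $\sigma^2$ is the error variance, $\mathrm{SST}_j$ is the total sample variation in $x_j$, and $R_j^2$ is the R-squared from regressing $x_j$ on the other regressors. Adding lenrol and lstaff lowers the unexplained variation, which shrinks the variance of $\hat{\beta}_{bs}$. Any correlation between bs and the new regressors raises $R_j^2$, which inflates that variance. The small pairwise correlations suggest that $R_j^2$ stays close to zero, so the first effect dominates.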
iv
How come the coefficient on lstaff is negative? Is it large in magnitude?
The coefficient on lstaff suggests that the relationship between the number of staff and
average teacher salary is negative, controlling for enrollment size and the ratio of benefits
to salary. In other words, when more staff are added to a school, teachers are paid
less, on average. The magnitude of the coefficient is relatively large: when the number
of staff increases by 10%, the average salary decreases by about 7%.
v
Now add the variable lunch to the regression. Holding other factors fixed, are teachers
being compensated for teaching students from disadvantaged backgrounds? Explain.
When I run this new regression I obtain: