
Jason Schott

Econ 306

Professor Goldstein

3/1/16

Homework 3

100 points

README: The following two problems will require a lot of calculations in
STATA. It will generate many pages of output. Here is how you should
organize it. The first pages should contain your answers to all the
questions, along with any key algebraic equations or
explanations you need to use along the way. After that, include a
printout of the output from the regressions you executed in support of your
answers. Highlight any numbers in this output that you used in the first
section. (To save paper, you may print this section double-sided and/or in
2-up format.) Last, include a copy of the DO file that contains the commands
you asked STATA to execute. Be sure you organize these in a way that will be
clear to the reader.

1. (52 points total. 12 parts worth 4 each, 4 points free.)


With this assignment you will find a STATA data file called
HW3.Housing.dta. For reference, the variables in this file are:

price = House Price in $
sqft = Total square feet of living area
beds = Number of bedrooms
baths = Number of full bathrooms
age = age of house in years
stories = number of stories of the house
vacant = If yes, this variable=1. If no, this variable=0.

Open this dataset within STATA. Before you begin answering the
following, it's not a bad idea to ask STATA to summarize the data using
the command summarize. You should also start a log file to store your
results.

a. Run the following regression:

$\text{Price} = \beta_0 + \beta_1 \cdot \text{beds}$

b. Hypothesize the sign of the bias, if any, resulting from excluding
sqft from the regression. Explain your reasoning.

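The bias is likely positive. Holding beds constant, the
more sqft there is, the higher the price, and houses with
more bedrooms also tend to have more sqft. Leaving out sqft
therefore pushes the coefficient on beds upward.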

c. Use the data to verify (or not) your claim from b). Break down the
bias into the component pieces as we did in class.

The coefficient on beds drops once we include sqft. This
suggests the bias from excluding sqft was indeed positive and
led to an overestimate of the effect of beds by approximately
48000. In class we showed that the bias from leaving out a
variable equals the product of two terms: the coefficient on
the omitted variable in the regression that includes it, and
the slope from the shadow regression of the omitted variable
on the included one. The shadow regression of sqft on beds
gives a slope of 507.11, and the coefficient on sqft in the
regression that includes it is 95.29. The product of
these two equals 48322.458. This confirms our original
prediction.
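
The decomposition in part c can be checked by hand. The sketch below only multiplies the two figures reported in the answer. It assumes that 95.29 is the coefficient on sqft in the regression that includes it and that 507.11 is the slope from the shadow regression of sqft on beds; the variable names are illustrative.

```python
# Omitted-variable bias check for part c, using the figures reported above.
# Assumption: 95.29 is the coefficient on sqft in the regression of price
# on beds and sqft; 507.11 is the slope from regressing sqft on beds.
coef_sqft_long = 95.29        # dollars per extra square foot
slope_sqft_on_beds = 507.11   # extra square feet per extra bedroom

# Bias in the beds coefficient when sqft is left out = product of the two.
bias = coef_sqft_long * slope_sqft_on_beds
print(f"Implied bias in the beds coefficient: {bias:,.2f}")
```

The product of the rounded inputs can differ in the decimals from the figure quoted in the answer, presumably because the regression output carries more digits.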

d. You will see from c. that the effect of beds is negative, once we
control for square footage. Does this make sense?

This does make sense. If square footage is held constant,
adding a bedroom means the rooms must be smaller, so the
quality of the bedrooms can hold more value than simply
the number of bedrooms in the house. Which is better: 3 huge
bedrooms or 5 smaller bedrooms? It may be personal
preference.

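e. Now, run the regression:

$\text{Price} = \beta_0 + \beta_1 \cdot \text{beds} + \beta_2 \cdot \text{sqft} + \beta_3 \cdot \text{baths} + \beta_4 \cdot \text{age} + \beta_5 \cdot \text{stories}$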

f. At a level of $\alpha = .05$, for which, if any, values of $i$ would you
reject the null hypothesis that $\beta_i = 0$?

We reject the null for every coefficient except the one on
baths ($\beta_3$). Only baths is not statistically significant,
with a p-value of .62 according to the calculations.

g. What is the predicted price with beds=4, sqft=2185, age=45,
baths=2.5, stories=3? (Note: these are the figures for my house
here in State College, but the data is not, so the price here is pretty
meaningless as a predictor of my own home value.)

The predicted price is $135,815.82 (View output)

h. According to this model, how much will my house change in value
five years from today?

$135,815.82 - $134,320.45 = $1,495.37. This says that the house
is predicted to lose $1,495.37 over the next 5 years.
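
The arithmetic in part h can be reproduced from the two predicted prices quoted in the answer. The sketch assumes both predictions come from the same fitted model and differ only in age (45 years today versus 50 years in five years), so the change divided by 5 is the implied coefficient on age; that coefficient is inferred here, not read from the output.

```python
# Part h check: the two predicted prices quoted above.
# Assumption: they come from the same fitted model and differ only in age
# (45 years today versus 50 years five years from now).
price_today = 135815.82
price_in_5_years = 134320.45

change = price_in_5_years - price_today   # negative means a loss in value
implied_age_coef = change / 5             # implied dollars per year of age

print(f"Change over five years: {change:,.2f}")
print(f"Implied coefficient on age: {implied_age_coef:,.3f} per year")
```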

i. What percentage of the variation in price is explained by the five X-
variables?

R^2=.7118, so about 71.18% of the variation in price is
explained by the five X-variables.

Now change the measurement of price. Use the gen command:
gen price_thous=price/1000
and then use this in place of price in the regression command for
part e

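j. Compare the coefficients, standard error, and t-statistics for the
independent variables. Briefly interpret the difference between this
model and the version from part e.

All of the coefficients and standard errors have been reduced
by a factor of 1000. The t-statistics remained the same. The
model is the same as in part e; price is simply measured in
thousands of dollars, so each coefficient now gives the
effect in thousands of dollars.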

k. Create a new age variable by converting age from years to days
(365 days in a year). Rerun the regression from e with the new age
variable in place of the original age.

l. What has changed between the regression in e and regression in k?
Be precise.

The only thing that changed between the two regressions is
that the coefficient on the new age variable has been reduced
by a factor of 365, and so has its standard error. Its
t-statistic is the same as before.
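
The rescaling results in parts j and l can be illustrated on made-up data. The sketch below is not the homework regression: it fits a one-regressor model on synthetic numbers, and the helper `simple_ols` and all values are hypothetical. The same scaling logic carries over to the multiple regression from part e.

```python
import math
import random

def simple_ols(x, y):
    """Return (slope, standard error of slope, t-statistic) for y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ssr = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(ssr / (n - 2) / sxx)
    return slope, se, slope / se

# Synthetic data (made up for illustration, not the homework data set).
random.seed(0)
age_years = [random.uniform(0, 80) for _ in range(200)]
price = [150000 - 300 * a + random.gauss(0, 20000) for a in age_years]

fit_base = simple_ols(age_years, price)
# Part j analogue: measure price in thousands of dollars.
fit_price_thous = simple_ols(age_years, [p / 1000 for p in price])
# Part l analogue: measure age in days instead of years.
fit_age_days = simple_ols([a * 365 for a in age_years], price)

for label, (b, se, t) in [("original", fit_base),
                          ("price/1000", fit_price_thous),
                          ("age*365", fit_age_days)]:
    print(f"{label:>10}: slope={b:.6f}  se={se:.6f}  t={t:.4f}")
```

Dividing the dependent variable by 1000 divides the slope and its standard error by 1000; multiplying the regressor by 365 divides them by 365. The t-statistic is the same in all three fits.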
2. (48 Points Total, 9 parts worth 5 points each, add on 3 points
for free.)
For the following problem, use the STATA dataset called crime.dta. This
data set was compiled by Christopher Cornwell and William Trumbull to
study factors that influence crime rates. The data set contains
observations for 90 counties in North Carolina for 1981. The definitions
of the variables are given in the data set:

According to the economic model of crime rates, lower crime rates are
associated with better labor markets (higher wages), more police
presence and tougher sentences, and lower population density. We will
use this data set to examine these hypotheses. Use a significance
level of $\alpha = .05$ for all hypothesis tests. All of the following
regressions will utilize the following subset of variables from
this dataset.

crmrte=crime rate
prbarr=probability of arrest
prbconv=probability of conviction
prbpris=probability of a prison sentence
avgsen=average sentence in days
polpc=number of police per capita
density=population density
pctmin=percent minority
taxpc=tax revenue per capita
wmfg=average weekly wage in manufacturing
wcon=average weekly wage in construction
wtuc=average weekly wage in transportation, utilities, and
communications
wtrd=average weekly wage in wholesale and retail trade
wfir=average weekly wage in finance, insurance, and real estate
wser=average weekly wage in services
wfed=average weekly wage in federal government
wsta=average weekly wage in state government
wloc=average weekly wage in local government

a. Run a regression of crmrte on the variables listed above. Call this
Model 1.

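b. Do any p-values indicate a variable is not statistically significant?
Which?

Yes. Only a few variables are statistically significant: prbarr,
density, and pctmin80. The p-values for all of the other
variables indicate they are not statistically significant at
the .05 level.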

c. Interpret the F-statistic STATA has calculated for Model 1.

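This F-stat tests the regression as a whole. The null
hypothesis is that the coefficients on all of the variables
are zero at the same time. Since the p-value is 0.0000, we
reject H0 at a level of .05: the variables are jointly
significant, meaning at least one coefficient is not zero.
It does not say that each variable is significant on its own.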

d. Test the hypothesis that the coefficients on wsta and wloc are equal
to each other. Use the t-test method described in the lectures.
What transformation do you need to do here? Be specific.

If the coefficients are equal then their difference will be 0.
If we create a new variable and rerun the regression, we get
a p-value of .562, so we cannot reject H0.
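
One standard way to set up the transformation asked for in part d is sketched below. It is an assumption about the method, not necessarily the exact variable constructed for the answer above; the symbol $\theta$ and the name wsum are introduced here for illustration only.

```latex
% H0: \beta_{wsta} = \beta_{wloc}. Define the difference \theta.
\theta = \beta_{wsta} - \beta_{wloc}
\quad\Longrightarrow\quad
\beta_{wsta} = \theta + \beta_{wloc}

% Substitute into the two relevant terms of Model 1:
\beta_{wsta}\,\text{wsta} + \beta_{wloc}\,\text{wloc}
  = \theta\,\text{wsta} + \beta_{wloc}\,(\text{wsta} + \text{wloc})
```

Under this setup, rerunning Model 1 with wloc replaced by wsum = wsta + wloc makes the coefficient on wsta equal to $\theta$, so its usual t-test is a test of H0: $\theta = 0$.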

e. Test the hypothesis that the coefficients on wfed, wsta and wloc are
all equal to each other. Do this by writing down the formula for the
relevant F-statistic. Calculate it (by running the appropriate
restricted regression) and test the hypothesis. Report these results.
This restricted version of the regression will be called Model 2.

If all three coefficients are equal, the restricted regression
forces wfed, wsta and wloc to share a single coefficient.
Model 2 is the same as Model 1 except that these three wage
variables are combined into one variable, their sum. The SSR
value then becomes .0056988 compared to the Model 1
value of .0056612. The F-stat is far less than the critical value
of about 3.15, which shows we cannot reject H0.
f. Return to Model 1. Now test the hypothesis that all 9 of the wage
variables have a coefficient of zero. Do this by writing down the
formula for the relevant F-statistic. Calculate it (by running the
appropriate restricted regression) and test the hypothesis. Report
these results.

Model 3 is the same as Model 1 except that the 9 wage
variables are dropped. The SSR for Model 3 becomes .0062863
compared to the value of .0056612 for Model 1. Since the F-stat
is less than the critical value of 2.04, we cannot reject H0. This
supports the idea that the wage variables, taken together, do
not help explain the crime rate.
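
Parts e and f both ask for the F-statistic formula, which compares the restricted and unrestricted sums of squared residuals. The sketch below applies it to the SSR values reported in the answers. It assumes Model 1 uses all 90 counties and the 17 listed regressors, giving 72 residual degrees of freedom; check that against the STATA output. The function name is illustrative.

```python
def f_statistic(ssr_restricted, ssr_unrestricted, q, df_unrestricted):
    """F = [(SSR_r - SSR_u) / q] / [SSR_u / (n - k - 1)]."""
    numerator = (ssr_restricted - ssr_unrestricted) / q
    denominator = ssr_unrestricted / df_unrestricted
    return numerator / denominator

# Assumption: Model 1 uses all 90 counties and the 17 listed regressors,
# so its residual degrees of freedom are 90 - 17 - 1 = 72.
n, k = 90, 17
df_u = n - k - 1
ssr_model1 = 0.0056612

# Part e: wfed, wsta and wloc share one coefficient (2 restrictions).
f_model2 = f_statistic(0.0056988, ssr_model1, q=2, df_unrestricted=df_u)
# Part f: all 9 wage coefficients are zero (9 restrictions).
f_model3 = f_statistic(0.0062863, ssr_model1, q=9, df_unrestricted=df_u)

print(f"Model 2 vs Model 1: F = {f_model2:.3f}")
print(f"Model 3 vs Model 1: F = {f_model3:.3f}")
```

Under these assumptions the two statistics come out at roughly 0.24 and 0.88, well below the critical values of about 3.15 and 2.04 quoted in the answers, which agrees with the decision not to reject H0 in either test.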

g. If a crime is committed, the probability of arrest is prbarr. If a person
is arrested for the crime, the probability of conviction is prbconv. If
the person is convicted, the probability of prison is prbpris. Assume
all these probabilities are independent. What is the formula for
calculating the probability that someone who commits a crime will
a) get arrested AND b) get convicted AND c) get a prison sentence?
That is, recall from Econ 106 how you would calculate the
probability of an intersection of statistically independent events.
Generate a new variable in STATA using this formula and call it
prjail_ifcrime. [Note: the probabilities produced by the researchers
are derived from the arrest data, and thus may not follow the usual
rules of probability. In particular, some probabilities are greater
than one. Don't worry about that here.]

The probability that all of these independent events occur
is the product of the probabilities of each event happening
on its own. It would therefore be:
prjail_ifcrime=prbarr*prbconv*prbpris

h. Now that you have this probability, use it to go one step further and
write down the formula for the EXPECTED jail sentence if a person
commits a crime. [You will need to use the information about
average sentence.] Use this formula to create a new variable in
STATA called jailtime_ifcrime.

This becomes jailtime_ifcrime=prjail_ifcrime*avgsen
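
The formulas from parts g and h can be illustrated with a small numeric example. The input values below are hypothetical, chosen only to show the arithmetic; they are not taken from crime.dta.

```python
# Hypothetical values for a single county (made up, not from crime.dta).
prbarr = 0.30    # probability of arrest given a crime
prbconv = 0.50   # probability of conviction given arrest
prbpris = 0.40   # probability of prison given conviction
avgsen = 10.0    # average sentence, in the data set's units

# Independent events: the joint probability is the product.
prjail_ifcrime = prbarr * prbconv * prbpris
# Expected sentence = probability of prison times the average sentence.
jailtime_ifcrime = prjail_ifcrime * avgsen

print(f"prjail_ifcrime   = {prjail_ifcrime:.4f}")
print(f"jailtime_ifcrime = {jailtime_ifcrime:.4f}")
```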


i. Return to regression Model 1. Replace the variables prbarr,
prbconv, prbpris and avgsen with your new variable jailtime_ifcrime.
This is Model 4. Write a paragraph in which you discuss how
Model 4 compares with Model 1.

Model 4 compares reasonably well to Model 1. The new
variable created for Model 4 combines the probability of
getting caught and the severity of the punishment into a
single expected penalty. The results from our regression
show that as the new variable increases, the crime rate
drops by enough to be considered statistically
significant. It is saying that increases in punishments and
in the probabilities of getting caught act as a deterrent to
more crime. Since the R squared value stays essentially the
same across the two models, it is reasonable to assume we
do not lose any explanatory power due to the change. One
thing to note is that the police variable rises with the crime
rate, most likely because more crime increases the demand
for police officers. Logically, when there is more crime there
will be more officers trying to stop the crime.
