Final Review
Sophia Zhengzi Li
Department of Economics, Duke University
Econ 139/239
Since this violates OLS A1, OLS won't just be biased but also
inconsistent, so OVB is a problem whether the sample size is large or
small.
The magnitude and direction of the bias depend on the correlation
between the regressor and the omitted variable (or, more generally, the
error term).
The best solution to the OVB problem is to add (if you can) the
other relevant variables to the regression.
If you can't, you will have to use another method (like Fixed Effects) to
solve the problem.
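A small simulation can make the inconsistency concrete. The sketch below is illustrative only: the data-generating process, the coefficient values, and the function names are assumptions, not part of the course material. It regresses Y on X while omitting a correlated variable W, and the slope settles near 3.2 rather than the true value of 2, however large the sample.

```python
import random

def ols_slope(x, y):
    """Slope from regressing y on x with an intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    return s_xy / s_xx

def short_regression_slope(n, seed=0):
    """Simulate Y = 1 + 2*X + 3*W + e, then regress Y on X alone (W omitted)."""
    rng = random.Random(seed)
    w = [rng.gauss(0, 1) for _ in range(n)]
    x = [0.5 * wi + rng.gauss(0, 1) for wi in w]  # X is correlated with W
    y = [1 + 2 * xi + 3 * wi + rng.gauss(0, 1) for xi, wi in zip(x, w)]
    return ols_slope(x, y)

# True slope is 2. With W omitted, the slope converges to
# 2 + 3 * Cov(X, W) / Var(X) = 2 + 3 * 0.5 / 1.25 = 3.2 (assumed design).
slopes = {n: short_regression_slope(n) for n in (100, 10_000, 100_000)}
for n, slope in slopes.items():
    print(n, round(slope, 3))
```

The bias does not shrink as n grows, and its sign follows the sign of the correlation between X and W.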
Homoskedasticity
When the sample size is large, a 95% confidence interval for $\beta_j$ can
be constructed as
$$\left[\hat\beta_j - 1.96\, SE(\hat\beta_j),\ \hat\beta_j + 1.96\, SE(\hat\beta_j)\right]$$
Remember that this confidence interval contains the true value of $\beta_j$
with a 95% probability (i.e. it contains the true value of $\beta_j$ in 95% of
all possible randomly selected samples).
Equivalently, it is also the set of values of $\beta_j$ that cannot be rejected
by a 5% two-sided hypothesis test.
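The equivalence between the interval and the test can be checked numerically. This is a minimal sketch with made-up numbers (an estimate of 0.5 with standard error 0.1); the function names are hypothetical.

```python
def confidence_interval_95(beta_hat, se):
    """Large-sample 95% confidence interval: estimate +/- 1.96 standard errors."""
    half_width = 1.96 * se
    return beta_hat - half_width, beta_hat + half_width

def rejects_at_5_percent(beta_hat, se, beta_null):
    """Two-sided 5% test of H0: beta = beta_null using the t-statistic."""
    return abs((beta_hat - beta_null) / se) > 1.96

lower, upper = confidence_interval_95(0.5, 0.1)
print(lower, upper)
print(rejects_at_5_percent(0.5, 0.1, 0.0))  # 0.0 lies outside the interval
print(rejects_at_5_percent(0.5, 0.1, 0.4))  # 0.4 lies inside the interval
```

A null value is rejected exactly when it falls outside the interval.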
Assuming A1-A4 and a large sample size, you can use the F-statistic.
To do so in practice, you need to:
1
2
3
4
Goodness of Fit
There are 3 main ways to measure goodness of fit.
Adjusted $R^2$:
$$\bar R^2 = 1 - \frac{n-1}{n-k-1}\,\frac{SSR}{TSS} = 1 - \frac{s_{\hat u}^2}{s_Y^2}$$
You can have a good model but a low $R^2$ and $\bar R^2$ because $Var(u_i)$ is
large.
You can also have a bad model with $R^2 \approx 1$ (spurious regression).
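$$\hat\beta_1 = \beta_1 + \frac{\sum (X_i - \bar X)\, u_i}{\sum (X_i - \bar X)^2}$$
Since
$$\hat\beta_1 = \beta_1 + \frac{\sum (X_i - \bar X)\, u_i}{\sum (X_i - \bar X)^2} \xrightarrow{p} \beta_1 + \frac{\sigma_{Xu}}{\sigma_X^2},$$
$\hat\beta_1$ will be inconsistent if $Cov(X_i, u_i) \neq 0$.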
Simultaneous causality
X causes Y, but Y in turn causes X.
Use IV.
Design an experiment.
Sample selection
The availability of data is related to the value of the dependent
variable.
Use a model that corrects for selection bias.
Nonlinearities
Identifying and Modeling Nonlinearities
The basic OLS model assumes that the $X$s are all linearly related to
$Y$ through the population regression line.
What if this is not the case?
We looked at two methods for modeling nonlinearities using OLS:
Allowing the effect on Y of a unit change in X1 to depend on the value
of another independent variable X2 (or perhaps more than one).
This method uses dummy variables and interactions.
Quadratic Regression
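The estimator of the unknown population difference
$$\Delta Y = f(X_1, \ldots, X_i + \Delta X_i, \ldots, X_k) - f(X_1, \ldots, X_i, \ldots, X_k)$$
is just the difference between the predicted values
$$\Delta\hat Y = \hat f(X_1, \ldots, X_i + \Delta X_i, \ldots, X_k) - \hat f(X_1, \ldots, X_i, \ldots, X_k)$$
For a quadratic regression
$$\hat Y_i = \hat\beta_0 + \hat\beta_1 X_i + \hat\beta_2 X_i^2$$
we have
$$\Delta\hat Y = \left[\hat\beta_0 + \hat\beta_1 (X_i + \Delta X_i) + \hat\beta_2 (X_i + \Delta X_i)^2\right] - \left[\hat\beta_0 + \hat\beta_1 X_i + \hat\beta_2 X_i^2\right]$$
$$= \hat\beta_1 \Delta X_i + 2 \hat\beta_2 X_i (\Delta X_i) + \hat\beta_2 (\Delta X_i)^2$$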
Notice that to compute this you will need to know $\hat\beta_1$ and $\hat\beta_2$, the
initial value of $X_i$, and the size of the change $\Delta X_i$.
To compute the standard error of the effect on $Y$ of changing $X$ in
the quadratic regression, you need to compute
$$SE(\Delta\hat Y) = \frac{|\Delta\hat Y|}{\sqrt{F}}$$
where $F$ is the F-statistic from the null hypothesis that the effect is
zero, which will depend on the coefficients on $X$ and $X^2$, the initial
value of $X_i$, and the size of the change $\Delta X_i$.
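As a rough illustration, the predicted change is a linear combination of the two slope estimates, so its standard error can also be built from their variances and covariance. With a single restriction, F equals the squared t-statistic, which gives the same number as $|\Delta\hat Y| / \sqrt{F}$. All coefficient and covariance values below are hypothetical, as are the function names.

```python
import math

def quadratic_effect(b1, b2, x, dx):
    """Predicted change in Y when X moves from x to x + dx."""
    return b1 * dx + 2 * b2 * x * dx + b2 * dx ** 2

def quadratic_effect_se(var_b1, var_b2, cov_b1_b2, x, dx):
    """SE of the predicted change, treating it as c1*b1 + c2*b2."""
    c1 = dx                      # weight on b1
    c2 = 2 * x * dx + dx ** 2    # weight on b2
    variance = c1 ** 2 * var_b1 + c2 ** 2 * var_b2 + 2 * c1 * c2 * cov_b1_b2
    return math.sqrt(variance)

# Hypothetical estimates (not taken from any dataset in these slides).
b1, b2 = 3.85, -0.0423
var_b1, var_b2, cov_b1_b2 = 0.0729, 2.304e-5, -0.0012

effect = quadratic_effect(b1, b2, x=10, dx=1)
se = quadratic_effect_se(var_b1, var_b2, cov_b1_b2, x=10, dx=1)
f_stat = (effect / se) ** 2          # F for the single restriction "effect = 0"
se_from_f = abs(effect) / math.sqrt(f_stat)
print(effect, se, se_from_f)
```

Changing the starting value `x` changes both the effect and its standard error, which is the point of the slide.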
Polynomial Regression
Linear-log model
Log-linear model
Log-log model
Linear-log model
Assume that the regression has the following form
$$Y = \beta_0 + \beta_1 \ln X + u$$
When would we want to use this approach?
Most of the tools we've learned so far carry over to the LPM.
Confidence intervals, hypothesis tests, and interactions are the same.
Only $R^2$ and $\bar R^2$ don't, since the fitted values are always somewhat far
from $Y_i$.
However, the LPM has a serious flaw: you can get predicted
probabilities that are greater than one or less than zero.
For this reason, we introduced two nonlinear specifications (logit and
probit) to correct this flaw.
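The flaw is easy to reproduce. The sketch below fits an LPM by OLS to a tiny made-up dataset (ten observations, chosen only for illustration) and shows fitted "probabilities" below zero and above one at the extremes of X.

```python
def fit_simple_ols(x, y):
    """Intercept and slope from regressing y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    slope = s_xy / s_xx
    return mean_y - slope * mean_x, slope

# Made-up binary outcome: Y = 0 for small X, Y = 1 for large X.
x = list(range(10))
y = [0] * 5 + [1] * 5

intercept, slope = fit_simple_ols(x, y)
fitted = [intercept + slope * xi for xi in x]
print(round(fitted[0], 4), round(fitted[-1], 4))  # one below 0, one above 1
```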
$$F(\beta_0 + \beta_1 X_1 + \ldots + \beta_k X_k) = \frac{e^{\beta_0 + \beta_1 X_1 + \ldots + \beta_k X_k}}{1 + e^{\beta_0 + \beta_1 X_1 + \ldots + \beta_k X_k}}$$
One drawback relative to the LPM is that the coefficients from the
logit or probit do not have simple interpretations.
Both the predicted values and differences in predicted values are
non-linear functions of the $\beta$s and $X$s.
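To see why there is no single "effect" of X in a logit, the sketch below evaluates the same one-unit change at two starting values. The coefficients are hypothetical and the helper names are assumptions made for this example.

```python
import math

def logistic_cdf(z):
    """Logistic CDF, which always lies strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def predicted_probability(coefs, xs):
    """coefs = [b0, b1, ..., bk]; xs = [x1, ..., xk]."""
    index = coefs[0] + sum(b * x for b, x in zip(coefs[1:], xs))
    return logistic_cdf(index)

def effect_of_change(coefs, xs, j, dx):
    """Change in predicted probability when regressor j moves by dx."""
    shifted = list(xs)
    shifted[j] += dx
    return predicted_probability(coefs, shifted) - predicted_probability(coefs, xs)

coefs = [-4.0, 0.5]  # hypothetical logit coefficients
effect_at_2 = effect_of_change(coefs, [2.0], j=0, dx=1.0)
effect_at_8 = effect_of_change(coefs, [8.0], j=0, dx=1.0)
print(round(effect_at_2, 4), round(effect_at_8, 4))
```

The same coefficient of 0.5 produces a much larger change in probability near the middle of the curve than in the tail.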
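$$\hat f(X_1, \ldots, X_k) = F(\hat\beta_0 + \hat\beta_1 X_1 + \ldots + \hat\beta_k X_k) = \frac{e^{\hat\beta_0 + \hat\beta_1 X_1 + \ldots + \hat\beta_k X_k}}{1 + e^{\hat\beta_0 + \hat\beta_1 X_1 + \ldots + \hat\beta_k X_k}}$$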
Estimation
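The ML estimates maximize the log-likelihood over $\beta_0, \ldots, \beta_k$:
$$\max_{\beta_0, \ldots, \beta_k} \ln L(\beta_0, \ldots, \beta_k)$$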
Since this does not have a nice closed-form solution, we can't
represent the estimators using simple formulas (like we could with
OLS).
Instead, we must use a computer algorithm to maximize the function
numerically.
But we know that under fairly general conditions, ML estimation is
consistent, asymptotically normal, and efficient.
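One common numerical approach is Newton's method on the log-likelihood. The sketch below applies it to a logit with a single regressor on simulated data; the data-generating values (-0.5 and 1.0), the seed, and the function names are assumptions for illustration, and real software uses more careful implementations.

```python
import math
import random

def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit_log_likelihood(b0, b1, xs, ys):
    total = 0.0
    for x, y in zip(xs, ys):
        p = logistic_cdf(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total

def fit_logit_newton(xs, ys, iterations=25):
    """Maximize the logit log-likelihood in (b0, b1) by Newton's method."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = logistic_cdf(b0 + b1 * x)
            w = p * (1 - p)
            g0 += y - p            # gradient with respect to b0
            g1 += (y - p) * x      # gradient with respect to b1
            h00 += w               # entries of the negative Hessian
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve (negative Hessian) * step = gradient (2x2 case).
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

rng = random.Random(42)
true_b0, true_b1 = -0.5, 1.0   # assumed values for the simulation
xs = [rng.gauss(0, 1) for _ in range(5000)]
ys = [1 if rng.random() < logistic_cdf(true_b0 + true_b1 * x) else 0 for x in xs]

b0_hat, b1_hat = fit_logit_newton(xs, ys)
print(round(b0_hat, 3), round(b1_hat, 3))
```

The estimates land close to the values used to generate the data, in line with the consistency of ML.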
The formula for the logit simply replaces $L^{max}_{probit}$ with $L^{max}_{logit}$:
$$\text{Pseudo-}R^2 = 1 - \frac{\ln\left(L^{max}_{logit}\right)}{\ln\left(L^{max}_{bernoulli}\right)}$$
The Pseudo-$R^2$ tells us how well the probit or logit does relative to a
simple Bernoulli model, so a higher value means that the probit (or
logit) does a better job of explaining the data.
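A minimal sketch of the calculation, assuming the Bernoulli benchmark is the model with a single success probability equal to the sample mean of Y. The outcome vector and the model log-likelihood below are made up for illustration.

```python
import math

def bernoulli_max_log_likelihood(ys):
    """Maximized log-likelihood of a model with one common probability p.

    The maximum is at p = mean(ys); this requires both 0s and 1s in ys.
    """
    n = len(ys)
    n_ones = sum(ys)
    p = n_ones / n
    return n_ones * math.log(p) + (n - n_ones) * math.log(1 - p)

def pseudo_r2(ll_model, ll_bernoulli):
    """Pseudo-R^2 = 1 - ln(L_model) / ln(L_bernoulli)."""
    return 1 - ll_model / ll_bernoulli

ys = [1, 1, 0, 0]                       # made-up outcomes
ll_bernoulli = bernoulli_max_log_likelihood(ys)
ll_model = 2 * math.log(0.5)            # hypothetical maximized logit log-likelihood
print(pseudo_r2(ll_model, ll_bernoulli))
```

Both log-likelihoods are negative, so a model log-likelihood closer to zero pushes the Pseudo-$R^2$ toward one.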
If we believe that there is a third component of the error ($\lambda_t$) that varies over time
but is constant across units, we can also add a time fixed effect.
Random Effects
If the fixed effect $\alpha_i$ is uncorrelated with all the included regressors in
all time periods ($Cov(\alpha_i, X_{j,it}) = 0$) we can still use OLS to estimate
$$Y_{it} = \beta_0 + \beta_1 X_{1,it} + \ldots + \beta_k X_{k,it} + \alpha_i + u_{it}$$
but it will be more efficient to use an estimator that accounts for the
fact that the observations are no longer iid (due to the presence of
$\alpha_i$).
We can do so by using a particular form of GLS (Generalized Least
Squares) known as the random effects (RE) estimator.
If the fixed effect $\alpha_i$ is correlated with one or more of the included
regressors ($Cov(\alpha_i, X_{j,it}) \neq 0$), RE will be inconsistent.
In this case, we should use the fixed effects (FE) estimator, which
differences $\alpha_i$ away, allowing us to use OLS.
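A simulated panel can illustrate the difference. In the sketch below the unit effect is built to be correlated with the regressor, so pooled OLS is off while OLS on unit-demeaned data recovers the slope. The design (500 units, 5 periods, true slope 2) and all names are assumptions for illustration.

```python
import random

def ols_slope(x, y):
    """Slope from regressing y on x with an intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    return s_xy / s_xx

def demean_within_units(values, units):
    """Subtract each unit's time average from its own observations."""
    totals, counts = {}, {}
    for v, i in zip(values, units):
        totals[i] = totals.get(i, 0.0) + v
        counts[i] = counts.get(i, 0) + 1
    return [v - totals[i] / counts[i] for v, i in zip(values, units)]

rng = random.Random(1)
units, x, y = [], [], []
for i in range(500):
    alpha_i = rng.gauss(0, 1)                 # unit effect
    for t in range(5):
        x_it = alpha_i + rng.gauss(0, 1)      # regressor correlated with alpha_i
        units.append(i)
        x.append(x_it)
        y.append(1 + 2 * x_it + alpha_i + rng.gauss(0, 1))

pooled_slope = ols_slope(x, y)
fe_slope = ols_slope(demean_within_units(x, units), demean_within_units(y, units))
print(round(pooled_slope, 3), round(fe_slope, 3))
```

Under this design pooled OLS converges to about 2.5, while the within (FE) estimate stays near 2.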
Fixed Effects
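Specifically, by subtracting the unit-level time average of both sides of
$$Y_{it} = \beta_0 + \beta_1 X_{1,it} + \ldots + \beta_k X_{k,it} + \alpha_i + u_{it}$$
from itself, we are left with
$$Y_{it} - \bar Y_i = \beta_1 (X_{1,it} - \bar X_{1,i}) + \ldots + \beta_k (X_{k,it} - \bar X_{k,i}) + (u_{it} - \bar u_i)$$
in which both $\beta_0$ and $\alpha_i$ have dropped out.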
Hausman Test
Formally, the Hausman test involves constructing a test statistic
which measures the normalized difference of the coefficients estimated
using RE and FE respectively.
Instrumental Variables
$$Y_i = \beta_0 + \beta_1 X_i + u_i \qquad (2)$$
Instrument relevance
$Cov(Z_i, X_i) \neq 0$ (Usually easy to satisfy)
Instrument exogeneity
$Cov(Z_i, u_i) = 0$ (Usually hard to satisfy)
So how does IV work?
Assume that the relation between the endogenous variable $X_i$ and the
instrument $Z_i$ is described by the following linear model:
$$X_i = \pi_0 + \pi_1 Z_i + v_i$$
where, if $Z_i$ is a valid instrument, $(\pi_0 + \pi_1 Z_i)$ is uncorrelated with
the error term $u_i$ (but $Cov(v_i, u_i) \neq 0$).
2SLS estimates the parameter $\beta_1$ in
$$Y_i = \beta_0 + \beta_1 X_i + u_i$$
using only the component of $X_i$ that is uncorrelated with the error.
(Although this discussion concerns the univariate case with one instrument, the
general case is a simple extension.)
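This procedure is called 2SLS because it involves two steps:
1. Regress $X_i$ on $Z_i$ using OLS and compute the predicted values $\hat X_i$.
2. Regress $Y_i$ on $\hat X_i$ using OLS.
In practice, the two steps are performed jointly, which also computes
the correct standard errors.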
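The formula for this 2SLS estimator is given by
$$\hat\beta_1^{2SLS} = \frac{\sum_{i=1}^{n} (Z_i - \bar Z)(Y_i - \bar Y)}{\sum_{i=1}^{n} (Z_i - \bar Z)(X_i - \bar X)} = \frac{\widehat{Cov}(Z_i, Y_i)}{\widehat{Cov}(Z_i, X_i)}$$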
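The ratio-of-covariances form can be checked against the explicit two-stage procedure on simulated data. In this sketch X is endogenous by construction (its error shares a component with u) and Z is a valid instrument; the design, seed, and function names are assumptions for illustration.

```python
import random

def sample_cov(a, b):
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (n - 1)

def ols_slope(x, y):
    return sample_cov(x, y) / sample_cov(x, x)

def iv_slope(z, x, y):
    """2SLS slope with one regressor and one instrument."""
    return sample_cov(z, y) / sample_cov(z, x)

rng = random.Random(7)
n = 20_000
z = [rng.gauss(0, 1) for _ in range(n)]
v = [rng.gauss(0, 1) for _ in range(n)]
u = [0.8 * vi + 0.6 * rng.gauss(0, 1) for vi in v]   # Cov(v, u) != 0
x = [1 + zi + vi for zi, vi in zip(z, v)]            # first stage
y = [2 + 1.5 * xi + ui for xi, ui in zip(x, u)]      # true slope is 1.5

beta_ols = ols_slope(x, y)
beta_iv = iv_slope(z, x, y)

# The same estimate via two explicit OLS steps.
pi1 = ols_slope(z, x)
pi0 = sum(x) / n - pi1 * sum(z) / n
x_hat = [pi0 + pi1 * zi for zi in z]
beta_two_stage = ols_slope(x_hat, y)
print(round(beta_ols, 3), round(beta_iv, 3), round(beta_two_stage, 3))
```

OLS drifts toward 1.9 under this design, while both IV calculations agree with each other and sit near 1.5.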
Inference in 2SLS
$$\sigma^2_{\hat\beta_1^{2SLS}} = \frac{1}{n}\,\frac{var\left[(Z_i - \mu_Z)\, u_i\right]}{\left[Cov(Z_i, X_i)\right]^2}$$
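A sample analogue of this variance replaces each population moment with its sample counterpart and $u_i$ with the 2SLS residual. The sketch below is one way to do that, assuming a single regressor and a single instrument; the simulated design and names are hypothetical. Note that the residuals use the actual $X_i$, not the first-stage fitted values.

```python
import math
import random

def iv_slope_and_se(z, x, y):
    """2SLS slope and a large-sample standard error (one X, one Z)."""
    n = len(z)
    mean_z, mean_x, mean_y = sum(z) / n, sum(x) / n, sum(y) / n
    s_zx = sum((zi - mean_z) * (xi - mean_x) for zi, xi in zip(z, x)) / n
    s_zy = sum((zi - mean_z) * (yi - mean_y) for zi, yi in zip(z, y)) / n
    b1 = s_zy / s_zx
    b0 = mean_y - b1 * mean_x
    # Residuals are computed with the actual X, not the predicted X.
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    q = [(zi - mean_z) * ui for zi, ui in zip(z, resid)]
    mean_q = sum(q) / n
    var_q = sum((qi - mean_q) ** 2 for qi in q) / (n - 1)
    se = math.sqrt(var_q / n) / abs(s_zx)
    return b1, se

rng = random.Random(7)
n = 20_000
z = [rng.gauss(0, 1) for _ in range(n)]
v = [rng.gauss(0, 1) for _ in range(n)]
u = [0.8 * vi + 0.6 * rng.gauss(0, 1) for vi in v]
x = [1 + zi + vi for zi, vi in zip(z, v)]
y = [2 + 1.5 * xi + ui for xi, ui in zip(x, u)]

b1_hat, se_hat = iv_slope_and_se(z, x, y)
print(round(b1_hat, 3), round(se_hat, 4))
```

In this design the population formula gives a standard error of $1/\sqrt{n}$, and the estimate comes out close to that.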
1. Regress each $X_{ji}$ on the instruments $(Z_{1i}, \ldots, Z_{mi})$ and the included
exogenous regressors $(W_{1i}, \ldots, W_{ri})$ using OLS. Compute the predicted
values $\hat X_{1i}, \ldots, \hat X_{ki}$ from these $k$ regressions.
2. Regress $Y_i$ on the predicted values $\hat X_{1i}, \ldots, \hat X_{ki}$ and the included
exogenous regressors $(W_{1i}, \ldots, W_{ri})$ using OLS.
In practice, the two steps are done jointly, in order to compute the
correct standard errors.
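The two steps can be sketched in plain Python for the case of one endogenous regressor, two instruments, and one included exogenous regressor. Everything here is an assumption made for illustration: the helper names, the simulated design, and the use of the normal equations. The second-stage standard errors from this manual approach would not be correct, which is why the steps are done jointly in practice.

```python
import random

def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols(rows, y):
    """OLS coefficients; each element of rows holds one observation's regressors."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

def two_stage_least_squares(y, x, instruments, controls):
    """Returns [intercept, coefficient on X, coefficients on the W's]."""
    first = [[1.0] + z + w for z, w in zip(instruments, controls)]
    gamma = ols(first, x)                                    # step 1
    x_hat = [sum(g * val for g, val in zip(gamma, row)) for row in first]
    second = [[1.0, xh] + w for xh, w in zip(x_hat, controls)]
    return ols(second, y)                                    # step 2

rng = random.Random(3)
y, x, instruments, controls = [], [], [], []
for _ in range(5000):
    z1, z2, w, v = (rng.gauss(0, 1) for _ in range(4))
    u = 0.8 * v + 0.6 * rng.gauss(0, 1)          # X is endogenous through v
    x_i = 0.5 + z1 + 0.5 * z2 + 0.5 * w + v
    x.append(x_i)
    y.append(1 + 2 * x_i - w + u)                # true coefficients: 1, 2, -1
    instruments.append([z1, z2])
    controls.append([w])

beta_2sls = two_stage_least_squares(y, x, instruments, controls)
print([round(b, 3) for b in beta_2sls])
```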
Instrument Exogeneity
If we use OLS to estimate the regression coefficients in
$$\hat u_i^{2SLS} = \delta_0 + \delta_1 Z_{1i} + \ldots + \delta_m Z_{mi} + \delta_{m+1} W_{1i} + \ldots + \delta_{m+r} W_{ri} + e_i$$
we can then use the F-statistic testing the null hypothesis
$$H_0: \delta_1 = \ldots = \delta_m = 0$$
to construct the OIR test statistic
$$J = mF \xrightarrow{d} \chi^2_{m-k}$$
where $m$ is the number of instruments and $k$ is the number of
endogenous variables.
Since the null hypothesis of this test is that $u$ is uncorrelated with the
$Z$s, rejecting the null implies that the instruments are not exogenous.
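Given the F-statistic from that auxiliary regression, the last step is simple arithmetic. The sketch below uses hypothetical F values and a small hard-coded table of 5% chi-squared critical values; the function name is an assumption.

```python
# 5% critical values of the chi-squared distribution for 1 to 3 degrees of freedom.
CHI2_CRITICAL_5PCT = {1: 3.841, 2: 5.991, 3: 7.815}

def overidentification_test(f_stat, m, k):
    """J = m * F, compared with a chi-squared(m - k) critical value at 5%."""
    df = m - k
    if df <= 0:
        raise ValueError("exactly identified model: the J-test is not available")
    j_stat = m * f_stat
    return j_stat, df, j_stat > CHI2_CRITICAL_5PCT[df]

# Hypothetical case: two instruments, one endogenous regressor.
print(overidentification_test(1.5, m=2, k=1))  # J = 3.0, do not reject
print(overidentification_test(2.5, m=2, k=1))  # J = 5.0, reject exogeneity
```

The test needs more instruments than endogenous regressors; with $m = k$ there are no overidentifying restrictions to test.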