Econ 139/239: Introduction to Econometrics
Final Review
Sophia Zhengzi Li1
1 Department
of Economics
Duke University
Summer II, 2010
Final Review (Duke)
Econ 139/239
Summer II, 2010
1 / 61
The final exam will be Saturday, August 14 from 2 PM - 5 PM.

Content
The final will be cumulative, but will be biased toward more recent
material (i.e. Binary Dependent Variables, Panel Data, IV).
Today's slides provide a good indication of the topics that I believe are
important.
Stop me if you have any questions!
Preparation & Mechanics

Problem sets, quizzes, in-class practice, and discussion sessions are the
best indication of exam content and style.
The exam is closed book, but you will be allowed to use the final cheat
sheet.
You should bring a calculator, since you will be doing several
calculations!
Univariate Regression Analysis
Before the first midterm, we introduced the univariate regression model
$Y_i = \beta_0 + \beta_1 X_i + u_i$
which we estimated using OLS.
In order for OLS to have the properties that we value in an estimator
(unbiasedness, consistency, and asymptotic normality)[1], we needed to
make some assumptions.
[1] To prove efficiency we would need to assume homoskedasticity of the errors
($\mathrm{Var}(u_i \mid X_i) = \sigma^2$) as well. However, we usually won't make this assumption.
The OLS Assumptions

OLS Assumption 1 Linearity
$E(u_i \mid X_i) = 0$
OLS Assumption 2 Simple random sample
$(X_i, Y_i)$ are iid draws from their joint distribution
OLS Assumption 3 No extreme outliers
$u_i$ and $X_i$ have non-zero & finite fourth moments:
$0 < E(X_i^4) < \infty$ and $0 < E(u_i^4) < \infty$
Given OLS assumptions 1-3, the OLS estimators are:
Unbiased
Consistent
Asymptotically Normal
Omitted Variable Bias

However, univariate OLS has a big limitation: if the regressor (Xi ) is
correlated with a variable that has been omitted from the analysis,
but that determines (in part) the dependent variable, then the OLS
estimator will suffer from omitted variable bias (OVB).
OVB occurs when two conditions are true:
The OV is correlated with the included regressor
The OV is a determinant of the dependent variable
OVB means that OLS Assumption 1 (E (ui | Xi ) = 0) does not hold.

The error term ui represents all factors (other than Xi ) that are
determinants of Yi .
If one of these factors is correlated with Xi , then the error term will
be correlated with Xi .
Omitted Variable Bias
Since this violates OLS A1, OLS won't just be biased but also
inconsistent, so OVB is a problem whether the sample size is large or
small.
The magnitude and direction of the bias depend on the correlation
between the regressor and the omitted variable (or more generally, the
error term).
The best solution to the OVB problem is to add (if you can) the
other relevant variables to the regression.
If you can't, you will have to use another method (like Fixed Effects) to
solve the problem.
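A small simulation can make the bias concrete. The sketch below is illustrative only: the coefficient values, the variable names (`w` for the omitted variable), and the data-generating process are assumptions, not taken from the course material. It regresses Y on X alone when the omitted variable both determines Y and is correlated with X.

```python
import random

def ols_slope(x, y):
    """Slope from a univariate OLS regression of y on x (with intercept)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

random.seed(0)
n = 20000
beta1, beta2 = 1.0, 2.0   # hypothetical true coefficients on x and w

w = [random.gauss(0, 1) for _ in range(n)]        # omitted variable
x = [0.5 * wi + random.gauss(0, 1) for wi in w]   # regressor, correlated with w
y = [beta1 * xi + beta2 * wi + random.gauss(0, 1) for xi, wi in zip(x, w)]

# Regression that omits w: the slope picks up part of w's effect.
short_slope = ols_slope(x, y)

# In large samples the slope tends to beta1 + beta2 * Cov(x, w) / Var(x),
# which under these assumed values is 1 + 2 * 0.5 / 1.25 = 1.8, not 1.
large_sample_value = beta1 + beta2 * 0.5 / 1.25
print(round(short_slope, 2), large_sample_value)
```

Increasing `n` does not move the estimate toward the true value of 1, which is the sense in which the bias persists in large samples.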
Multiple Regression: Model and Interpretation
We assume the population regression model is given by

$Y_i = \beta_0 + \beta_1 X_{1i} + ... + \beta_k X_{ki} + u_i$
$\beta_0$ is the intercept (the mean impact of unobserved factors) and $\beta_k$ is
the slope coefficient of $X_k$.
$\beta_k$ represents the expected change in Y associated with a unit
change in $X_k$, holding all other regressors constant.
We can estimate the parameters of the multiple regression model
using OLS, by minimizing the sum of the squared prediction errors.
The OLS Assumptions in the Multiple Regression Model
OLS Assumption 1 Linearity
$E(u_i \mid X_{1i}, ..., X_{ki}) = 0$
OLS Assumption 2 Simple random sample
$(Y_i, X_{1i}, ..., X_{ki})$ iid
OLS Assumption 3 No extreme outliers
$X_{1i}, ..., X_{ki}, u_i$ have non-zero & finite fourth moments
OLS Assumption 4 No perfect collinearity
Regressors are not linear combinations of each other
Given OLS A1-A4, OLS is unbiased, consistent, and asymptotically
normal.
Homoskedasticity
If we want to assume homoskedasticity (in general we won't), we
would add:
OLS Assumption 5 Homoskedasticity
$\mathrm{Var}(u_i \mid X_{1i}, ..., X_{ki}) = \sigma^2$
Adding OLS Assumption 5 makes OLS efficient and allows us to use
homoskedasticity-only (HO) standard errors (but this assumption is often
violated in the data).
Hypothesis Tests and CIs for a Single Coefficient

To test the hypothesis $H_0: \beta_j = \beta_{j,0}$ against the alternative
$H_A: \beta_j \neq \beta_{j,0}$
Compute the standard error of $\hat\beta_j$, $SE(\hat\beta_j)$
Compute the t-statistic
$t = \frac{\hat\beta_j - \beta_{j,0}}{SE(\hat\beta_j)}$
Compute the p-value
$p\text{-value} = 2\Phi(-|t^{act}|)$
where $t^{act}$ is the value of the t-statistic actually computed. Reject $H_0$
at the 5% significance level if the p-value is less than 0.05, or
equivalently, if $|t^{act}| > 1.96$.
Hypothesis Tests and CIs for a Single Coefficient
When the sample size is large, a 95% confidence interval for $\beta_j$ can
be constructed as
$[\hat\beta_j - 1.96\,SE(\hat\beta_j),\ \hat\beta_j + 1.96\,SE(\hat\beta_j)]$
Remember that this confidence interval contains the true value of $\beta_j$
with a 95% probability (i.e. it contains the true value of $\beta_j$ in 95% of
all possible randomly selected samples).
Equivalently, it is also the set of values of $\beta_j$ that cannot be rejected
by a 5% two-sided hypothesis test.
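The t-statistic, two-sided p-value, and 95% confidence interval can be computed by hand from a coefficient estimate and its standard error. The following is a minimal sketch using the large-sample normal approximation; the estimate 0.52 and standard error 0.20 are made-up numbers, and the function names are illustrative.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def t_test(beta_hat, se, beta_null=0.0):
    """Return (t-statistic, two-sided p-value, 95% confidence interval)."""
    t = (beta_hat - beta_null) / se
    p_value = 2.0 * normal_cdf(-abs(t))
    ci = (beta_hat - 1.96 * se, beta_hat + 1.96 * se)
    return t, p_value, ci

# Hypothetical estimate and standard error, testing H0: beta_j = 0.
t, p_value, ci = t_test(beta_hat=0.52, se=0.20)
print(t, p_value, ci)

# The two decision rules agree: p-value < 0.05 exactly when |t| > 1.96,
# and in that case the null value lies outside the 95% interval.
reject = abs(t) > 1.96
```

Here the null value 0 falls outside the interval, so the test rejects at the 5% level, which illustrates the equivalence between the interval and the two-sided test.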
Testing Joint Hypotheses
What if you want to test a joint hypothesis about several coefficients?

Why might you want to?
If you think the coefficients are individually insignificant because of
near perfect multicollinearity.
Assuming A1-A4 and a large sample size, you can use the F -statistic.
To do so in practice, you need to:
1. Count the number of restrictions under the null (degrees of
freedom), call this q.
2. Compute the F-statistic.
3. Check the table for $F_{q,\infty}$ (or use the p-value, if it is provided).
4. Reject the null if p-value < $\alpha$ or F-stat > $F_{q,\infty}$.
Goodness of Fit
There are 3 main ways to measure goodness of fit.
1. The standard error of the regression (SER): the SER is a measure of
the spread of the distribution of Y around the regression line, but it
depends on the units of Y.
2. $R^2$: the regression $R^2$ is the fraction of the sample variation in $Y_i$
explained by the regressors.
$R^2 = \frac{\sum_i (\hat Y_i - \bar Y)^2}{\sum_i (Y_i - \bar Y)^2} = \frac{ESS}{TSS} = 1 - \frac{SSR}{TSS}$
However, in multiple regression, the $R^2$ increases whenever a new
regressor is added (unless it's perfectly multicollinear with the original
regressors).
3. $\bar R^2$: the $\bar R^2$ adjusts for this by deflating the $R^2$ by a penalty factor:
$\bar R^2 = 1 - \frac{n-1}{n-k-1}\frac{SSR}{TSS} = 1 - \frac{s_{\hat u}^2}{s_Y^2}$
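The three measures can be computed directly from the actual and fitted values. The sketch below is illustrative: the data and fitted values are made up rather than produced by an actual regression, and the SER uses the degrees-of-freedom correction n - k - 1, which is an assumption about the convention used in the course.

```python
def goodness_of_fit(y, y_hat, k):
    """Return (R^2, adjusted R^2, SER) for a regression with k regressors."""
    n = len(y)
    y_bar = sum(y) / n
    tss = sum((yi - y_bar) ** 2 for yi in y)               # total sum of squares
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # sum of squared residuals
    r2 = 1.0 - ssr / tss
    adj_r2 = 1.0 - (n - 1) / (n - k - 1) * ssr / tss
    ser = (ssr / (n - k - 1)) ** 0.5
    return r2, adj_r2, ser

# Hypothetical outcomes and fitted values from a one-regressor model.
y = [1.0, 2.0, 3.0, 4.0, 5.0]
y_hat = [1.2, 1.9, 3.1, 3.8, 5.0]
r2, adj_r2, ser = goodness_of_fit(y, y_hat, k=1)
print(r2, adj_r2, ser)
```

Because the penalty factor (n - 1)/(n - k - 1) exceeds one whenever k >= 1, the adjusted value is always below the unadjusted one.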
Goodness of Fit: Some Caveats
Some caveats about using $R^2$ and $\bar R^2$ in practice:
An increase in $R^2$ or $\bar R^2$ does not mean that an added variable is
statistically significant or that the regressors are a true cause of the
dependent variable.
A high $R^2$ or $\bar R^2$ does not mean that there is no omitted variable bias
or that you have the best possible set of regressors.
Neither $R^2$ nor $\bar R^2$ can prove our model is wrong or right.
You can have a good model but a low $R^2$ and $\bar R^2$ because $\mathrm{Var}(u_i)$ is
large.
You can also have a bad model with $R^2 \approx 1$ (spurious regression).
Threats to Internal Validity
A statistical analysis is internally valid if the statistical inferences
about causal effects are valid for the population being studied.
We know that internal validity hinges on two things:
1. The estimator of the causal effect should be consistent (unbiased
would be nice too, but it's not always feasible).
2. Hypothesis tests should have the desired significance level (i.e. you
should be using the correct standard errors).
We focused on 1, since 2 is just about using the right formula.

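Consider the simple univariate regression
$Y_i = \beta_0 + \beta_1 X_i + u_i$
We know that:
$\hat\beta_1 = \beta_1 + \frac{\frac{1}{n}\sum_i (X_i - \bar X) u_i}{\frac{1}{n}\sum_i (X_i - \bar X)^2}$
Since $\hat\beta_1 \xrightarrow{p} \beta_1 + \frac{\sigma_{Xu}}{\sigma_X^2}$,
$\hat\beta_1$ will be inconsistent if $\mathrm{Cov}(X_i, u_i) \neq 0$.
Also, if $E[u_i \mid X_i] \neq 0$ then $E[(X_i - \bar X) u_i] \neq 0$, so $\hat\beta_1$ will be biased as
well.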
So when might this occur?
Omitted variables. Is the omitted variable observed?
Yes: Include it! (Multivariate regression analysis)
No:
Use Panel Data (fixed effects).
Use IV.
Design an experiment.
Wrong functional form

Approximate with a nonlinear functional form like a polynomial
regression.
Simultaneous causality
X causes Y , but Y in turn causes X
Use IV.
Design an experiment.
Measurement error in the regressor

Get more accurate measurements.
Use Instrumental Variables (IV) or model the form of error.
Sample selection
The availability of data is related to the value of the dependent
variable.
Use a model that corrects for selection bias.
Nonlinearities
Identifying and Modeling Nonlinearities
The basic OLS model assumes that the Xs are all linearly related to
Y through the population regression line.
What if this is not the case?
We looked at two methods for modeling nonlinearities using OLS:
Allowing the effect on Y of a unit change in $X_1$ to depend on the value
of another independent variable $X_2$ (or perhaps more than one).
This method uses dummy variables and interactions.
Allowing the effect on Y of a unit change in $X_1$ to depend on the value
of $X_1$ itself.
This method uses nonlinear functions of the Xs like polynomials and
logarithms.
Dummies and interactions

Through the use of the interaction term $X_i \times D_i$, the population
regression line relating $Y_i$ and the continuous variable $X_i$ can have a
slope or intercept that depends on the binary variable $D_i$. There are
three possibilities:
1. Different intercepts, same slope
$Y_i = \beta_0 + \beta_1 X_i + \beta_2 D_i + u_i$
2. Different intercepts and slopes
$Y_i = \beta_0 + \beta_1 X_i + \beta_2 D_i + \beta_3 (X_i \times D_i) + u_i$
3. Same intercept, different slopes
$Y_i = \beta_0 + \beta_1 X_i + \beta_2 (X_i \times D_i) + u_i$
Quadratic Regression
In other situations, we would like to allow the effect on Y of a unit
change in X to depend on the value of X itself, rather than some
other variable.
One way to do this is to run a quadratic regression
$Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + u_i$
Note that this regression is linear in the parameters: after creating
the variable $X_i^2$ we can still use OLS to estimate the parameters.
For the nonlinear models considered in this class
$Y = f(X_1, X_2, ..., X_k) + u$
so the expected effect on Y of a change in $X_i$ is then
$\Delta Y = f(X_1, .., X_i + \Delta X_i, .., X_k) - f(X_1, .., X_i, .., X_k)$
Notice that this formula applies both to the examples in Chapter 8,
where f is a nonlinear function of the Xs but a linear function of the
parameters ($\beta$s), and to the examples in Chapter 11, where f can be
a nonlinear function of the parameters as well as the Xs.
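The estimator of the unknown population difference
$\Delta Y = f(X_1, .., X_i + \Delta X_i, .., X_k) - f(X_1, .., X_i, .., X_k)$
is just the difference between the predicted values
$\Delta\hat Y = \hat f(X_1, .., X_i + \Delta X_i, .., X_k) - \hat f(X_1, .., X_i, .., X_k)$
For a quadratic regression
$\hat Y_i = \hat\beta_0 + \hat\beta_1 X_i + \hat\beta_2 X_i^2$
we have
$\Delta\hat Y = [\hat\beta_0 + \hat\beta_1 (X_i + \Delta X_i) + \hat\beta_2 (X_i + \Delta X_i)^2] - [\hat\beta_0 + \hat\beta_1 X_i + \hat\beta_2 X_i^2]$
$= \hat\beta_1 \Delta X_i + 2\hat\beta_2 X_i (\Delta X_i) + \hat\beta_2 (\Delta X_i)^2$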
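As a quick numerical check of the quadratic formula, the sketch below computes the predicted change both as a difference in predicted values and with the expanded expression. The coefficient values are hypothetical and chosen only for illustration.

```python
def quadratic_effect(b1, b2, x, dx):
    """Predicted change in Y when X moves from x to x + dx in a quadratic regression.

    Returns the change computed two ways: as a difference of predicted values
    (the intercept cancels) and with the expanded formula.
    """
    direct = (b1 * (x + dx) + b2 * (x + dx) ** 2) - (b1 * x + b2 * x ** 2)
    formula = b1 * dx + 2.0 * b2 * x * dx + b2 * dx ** 2
    return direct, formula

# Hypothetical estimates: b1 = 3.85, b2 = -0.04.
direct_low, formula_low = quadratic_effect(b1=3.85, b2=-0.04, x=10.0, dx=1.0)
direct_high, formula_high = quadratic_effect(b1=3.85, b2=-0.04, x=40.0, dx=1.0)
print(formula_low, formula_high)
```

The same one-unit change in X has a different predicted effect at X = 10 than at X = 40, which is exactly the nonlinearity the quadratic term is meant to capture.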
Notice that to compute this you will need to know $\hat\beta_1$ and $\hat\beta_2$, the
initial value of $X_i$, and the size of the change $\Delta X_i$.
To compute the standard error of the effect on Y of changing X in
the quadratic regression, you need to compute
$SE(\Delta\hat Y) = \frac{|\Delta\hat Y|}{\sqrt{F}}$
where F is the F-statistic from the null hypothesis that the effect is
zero, which will depend on the coefficients on X and $X^2$, the initial
value of $X_i$, and the size of the change $\Delta X_i$.
Polynomial Regression
The quadratic regression is a special case of a polynomial regression,
which extends the quadratic specification to higher order polynomials
($X^3$, $X^4$, etc.).
As in the quadratic regression, calculating the effect of a change in
one regressor requires computing the difference in predicted values.
Since this can be tedious, researchers often prefer to use logarithms to
model nonlinearities.
Regressions using Logarithms

Logarithms are often useful because they convert changes in variables
into percentage changes:
$\ln\left(\frac{X + \Delta X}{X}\right) = \ln(X + \Delta X) - \ln(X) \approx \frac{\Delta X}{X}$ (when $\frac{\Delta X}{X}$ is small)
This approximation makes their coefficients simpler to interpret and
perform tests on than the coefficients in quadratic or polynomial
regressions.
The advantage of logs over polynomials (or quadratics) is that
interpretation and tests are easier.
The disadvantage is that you have to decide on the shape of the
relationship beforehand.

There are three main ways to use logs in regressions:
1. Linear-log model
2. Log-linear model
3. Log-log model
Linear-log model
Assume that the regression has the following shape
$Y = \beta_0 + \beta_1 \ln X + u$
When would we want to use this approach?
Here, a 1% change in X is associated with a change in Y of $.01\beta_1$.

Log-linear model
What if we apply the log to Y instead of X?
Now the regression has the following shape
$\ln Y = \beta_0 + \beta_1 X + u$
Here, a change in X of one unit ($\Delta X = 1$) is associated with a
$100\beta_1\%$ change in Y.
Log-log model
Now the regression has the following shape
$\ln Y = \beta_0 + \beta_1 \ln X + u$
Here, a 1% change in X is associated with a $\beta_1\%$ change in Y.
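The three interpretations are approximations that rely on the change being small. The sketch below compares each rule of thumb with the exact implied change; the coefficient value 0.05 and the function names are assumptions made for illustration.

```python
import math

def linear_log_effect(beta1, pct_change_x):
    """Exact change in Y when X rises by pct_change_x percent (Y = b0 + b1 ln X)."""
    return beta1 * math.log(1.0 + pct_change_x / 100.0)

def log_linear_effect(beta1, delta_x=1.0):
    """Exact percent change in Y when X rises by delta_x (ln Y = b0 + b1 X)."""
    return 100.0 * (math.exp(beta1 * delta_x) - 1.0)

def log_log_effect(beta1, pct_change_x):
    """Exact percent change in Y when X rises by pct_change_x percent (ln Y = b0 + b1 ln X)."""
    return 100.0 * ((1.0 + pct_change_x / 100.0) ** beta1 - 1.0)

beta1 = 0.05  # hypothetical coefficient

# Each line prints the exact value next to the rule-of-thumb value.
print(linear_log_effect(beta1, 1.0), 0.01 * beta1)   # linear-log: change in Y
print(log_linear_effect(beta1), 100.0 * beta1)       # log-linear: percent change in Y
print(log_log_effect(beta1, 1.0), beta1)             # log-log: percent change in Y
```

For small coefficients and small changes the two columns are close; with a larger log-linear coefficient such as 0.5, the exact percent change is noticeably larger than the approximate 50%.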

Regression with Binary Dependent Variables
Linear Probability Model (LPM)
One option for modeling discrete dependent variables is to just use

OLS.
The key here is to reinterpret the predicted values as probabilities.
Why is this interpretation legitimate?
The population regression function is a conditional expectation
(E (Y | X1 , ..., Xk )) and here Y is a 0/1 binary variable, so its
expected value is simply the probability that Y = 1.
Thus, for a binary variable,

E (Y | X1 , ..., Xk ) = P (Y = 1 | X1 , ..., Xk )
OLS with a binary dependent variable is called the linear probability
model since it models the probability that Y = 1 with a straight line.
$P(Y = 1 \mid X_1, ..., X_k) = \beta_0 + \beta_1 X_1 + ... + \beta_k X_k$
$\beta_j$ measures the change in $P(Y = 1)$ due to a unit change in $X_j$, or
if $X_j$ is a dummy variable ($D_j$) it measures the change in $P(Y = 1)$
associated with changing $D_j$ from being equal to 0 to being equal to
1.
Most of the tools we've learned so far carry over to the LPM.
Confidence intervals, hypothesis tests, & interactions are the same.
Only $R^2$ and $\bar R^2$ don't, since the fitted values are always somewhat far
from $Y_i$.
However, the LPM has a serious flaw: you can get predicted
probabilities that are greater than one or less than zero.
For this reason, we introduced two nonlinear specifications (logit and
probit) to correct this flaw.
Probit and Logit

With the LPM
$E(Y_i \mid X_{1i}, ..., X_{ki}) = P(Y_i = 1 \mid X_{1i}, ..., X_{ki})$
$= f(X_{1i}, ..., X_{ki})$
$= \beta_0 + \beta_1 X_{1i} + ... + \beta_k X_{ki}$
which can lead to predicted probabilities outside the unit interval.
Probit and logit use CDFs to model $f(X_{1i}, ..., X_{ki})$, which keeps the
predictions inside this interval.
The Probit model uses the standard normal CDF so
$f(X_1, ..., X_k) = \Phi(\beta_0 + \beta_1 X_1 + ... + \beta_k X_k)$
The Logit uses the standard logistic CDF
$f(X_1, ..., X_k) = F(\beta_0 + \beta_1 X_1 + ... + \beta_k X_k) = \frac{e^{\beta_0 + \beta_1 X_1 + ... + \beta_k X_k}}{1 + e^{\beta_0 + \beta_1 X_1 + ... + \beta_k X_k}}$
Probit and Logit

Both are essentially fitting an S shaped curve through the data, and
produce pretty similar results, so choosing between them is usually a
matter of preference (i.e. arbitrary).
Probit and Logit
One drawback relative to the LPM is that the coefficients from the
logit or probit do not have simple interpretations.
Both the predicted values and differences in predicted values are
non-linear functions of the $\beta$s and Xs.
The model is best interpreted by computing predicted probabilities
and the effect of a unit change in a regressor (often evaluated at the
mean value of the other regressors).
Probit & Logit: Predicted Probabilities
For the probit model, the predicted probability that Y = 1, given
values of $X_1, X_2, ..., X_k$ is calculated by computing the z-value,
$z = \hat\beta_0 + \hat\beta_1 X_1 + ... + \hat\beta_k X_k$, and then looking up this z-value in the
normal distribution table.
For the logit model, the predicted probability that Y = 1, given
values of $X_1, X_2, ..., X_k$ is calculated by computing the value of
$\hat\beta_0 + \hat\beta_1 X_1 + ... + \hat\beta_k X_k$, and then plugging this value into the logistic
cumulative distribution function
$\hat f(X_1, ..., X_k) = F(\hat\beta_0 + \hat\beta_1 X_1 + ... + \hat\beta_k X_k) = \frac{e^{\hat\beta_0 + \hat\beta_1 X_1 + ... + \hat\beta_k X_k}}{1 + e^{\hat\beta_0 + \hat\beta_1 X_1 + ... + \hat\beta_k X_k}}$
Probit & Logit: Interpreting Coefficients
For either model, the effect of a change in a regressor is computed by
1. Computing the predicted probability for the initial value of the
regressors,
2. Computing the predicted probability for the new or changed value of
the regressors, and
3. Taking their difference
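The three steps above can be sketched in a few lines. This is an illustration with one regressor and made-up coefficients (`b0 = -2.0`, `b1 = 3.0`); the function names are hypothetical, and `math.erf` replaces the normal table lookup.

```python
import math

def probit_prob(index):
    """Standard normal CDF evaluated at the index b0 + b1*X1 + ... + bk*Xk."""
    return 0.5 * (1.0 + math.erf(index / math.sqrt(2.0)))

def logit_prob(index):
    """Standard logistic CDF evaluated at the index."""
    return 1.0 / (1.0 + math.exp(-index))

def effect_of_change(prob_fn, b0, b1, x_old, x_new):
    """Step 1: probability at x_old. Step 2: probability at x_new. Step 3: difference."""
    p_old = prob_fn(b0 + b1 * x_old)
    p_new = prob_fn(b0 + b1 * x_new)
    return p_new - p_old

b0, b1 = -2.0, 3.0  # hypothetical estimates

# The same 0.1 increase in X has a different effect depending on where it starts.
effect_low = effect_of_change(probit_prob, b0, b1, 0.3, 0.4)
effect_high = effect_of_change(probit_prob, b0, b1, 0.8, 0.9)
print(effect_low, effect_high)

# A linear probability model with the same coefficients would predict
# b0 + b1 * 1.2 = 1.6 at X = 1.2, while both CDFs stay inside (0, 1).
print(probit_prob(b0 + b1 * 1.2), logit_prob(b0 + b1 * 1.2))
```

Because the effect depends on the starting value, reported effects are usually evaluated at a stated point, such as the sample means of the regressors.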
Estimation
We can't use OLS to estimate the coefficients because these
parameters enter both the logit and probit nonlinearly.
In other words, both
$E(Y_i \mid X_{1i}, ..., X_{ki}) = \Phi(\beta_0 + \beta_1 X_{1i} + ... + \beta_k X_{ki})$
and
$E(Y_i \mid X_{1i}, ..., X_{ki}) = F(\beta_0 + \beta_1 X_{1i} + ... + \beta_k X_{ki})$
are nonlinear in the coefficients ($\beta$s) so we can't use OLS.
Instead, the coefficients of the probit and logit models are estimated
using maximum likelihood.
Maximum Likelihood (ML)

To use ML, we treat the joint probability distribution of the data as a
function of the unknown coefficients.
If we know the distribution of the data as a function of the
parameters, $\hat\beta_{MLE}$ is the parameter value(s) that maximize the likelihood of
our data.
For probit, the log-likelihood is
$\ln L(Y_1, ..., Y_n \mid X_{1i}, .., X_{ki}; \beta_0, .., \beta_k) = \sum_{i=1}^{n} [y_i \ln \Phi(\beta_0 + ... + \beta_k X_{ki}) + (1 - y_i) \ln(1 - \Phi(\beta_0 + ... + \beta_k X_{ki}))]$
The MLE then solves
$\max_{\beta_0 ... \beta_k} \sum_{i=1}^{n} [y_i \ln \Phi(\beta_0 + ... + \beta_k X_{ki}) + (1 - y_i) \ln(1 - \Phi(\beta_0 + ... + \beta_k X_{ki}))]$
Maximum Likelihood (ML)
$\max_{\beta_0 ... \beta_k} \sum_{i=1}^{n} [y_i \ln \Phi(\beta_0 + ... + \beta_k X_{ki}) + (1 - y_i) \ln(1 - \Phi(\beta_0 + ... + \beta_k X_{ki}))]$
Since this does not have a nice closed form solution, we can't
represent the estimators using simple formulas (like we could with
OLS).
Instead, we must use a computer algorithm to maximize the function
numerically.
But we know that under fairly general conditions, ML estimation is
consistent, asymptotically normal, and efficient.
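To illustrate what "maximize numerically" means, the sketch below evaluates the probit log-likelihood for a tiny made-up data set with one regressor and searches a coarse grid of coefficient values. A grid search is used here only because it is easy to read; it is an assumption for illustration, and statistical software uses faster iterative methods.

```python
import math

def norm_cdf(z):
    """Standard normal CDF; erfc keeps far-tail probabilities strictly positive."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def probit_loglik(b0, b1, xs, ys):
    """Probit log-likelihood for one regressor: sum of y ln(Phi) + (1-y) ln(1-Phi)."""
    total = 0.0
    for x, y in zip(xs, ys):
        z = b0 + b1 * x
        # 1 - Phi(z) equals Phi(-z), which is numerically safer in the tails.
        total += y * math.log(norm_cdf(z)) + (1 - y) * math.log(norm_cdf(-z))
    return total

# Hypothetical data: larger x tends to go with y = 1, but not perfectly.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 1, 0, 1, 1]

# Crude numerical maximization over a grid of candidate (b0, b1) values.
grid = [i / 10.0 for i in range(-30, 31)]
loglik_hat, b0_hat, b1_hat = max(
    (probit_loglik(b0, b1, xs, ys), b0, b1) for b0 in grid for b1 in grid
)
print(b0_hat, b1_hat, loglik_hat)
```

The grid maximum has a positive slope and a higher log-likelihood than the no-information point (0, 0), which is all this sketch is meant to show.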
Inference & Goodness of Fit

Because the MLE is asymptotically normal, statistical inference about
the probit and logit coefficients is carried out in the same manner as
in OLS.
As usual, we can use a t-ratio or F-stat to test hypotheses about one
or more coefficients.
We can't use $R^2$ or $\bar R^2$ though because the fitted values will still be
somewhat far from $Y_i$ so instead, we can use
$\text{Pseudo-}R^2 = 1 - \frac{\ln(L^{max}_{probit})}{\ln(L^{max}_{bernoulli})}$
where $L^{max}_{probit}$ is the value of the maximized probit likelihood and
$L^{max}_{bernoulli}$ is the value of the maximized Bernoulli likelihood
Inference & Goodness of Fit
The formula for the logit simply replaces $L^{max}_{probit}$ with $L^{max}_{logit}$:
$\text{Pseudo-}R^2 = 1 - \frac{\ln(L^{max}_{logit})}{\ln(L^{max}_{bernoulli})}$
The Pseudo-$R^2$ tells us how well the probit or logit does relative to a
simple Bernoulli model, so a higher value means that the probit (or
logit) does a better job of explaining the data.
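A short sketch of the calculation follows. It assumes the "simple Bernoulli model" is the model with no regressors, whose maximized likelihood uses the sample fraction of ones as the single probability; the model log-likelihood value of -2.74 is a made-up input.

```python
import math

def bernoulli_loglik(ys):
    """Maximized log-likelihood of a model with no regressors.

    The ML estimate of P(Y = 1) is the sample fraction of ones, so this
    requires at least one 0 and one 1 in ys.
    """
    p = sum(ys) / len(ys)
    return sum(y * math.log(p) + (1 - y) * math.log(1.0 - p) for y in ys)

def pseudo_r2(model_loglik, ys):
    """Pseudo-R^2 = 1 - ln(L_model) / ln(L_bernoulli)."""
    return 1.0 - model_loglik / bernoulli_loglik(ys)

ys = [0, 0, 1, 0, 1, 1]                        # hypothetical outcomes
value = pseudo_r2(model_loglik=-2.74, ys=ys)   # hypothetical maximized log-likelihood
print(value)
```

A model that fits no better than the Bernoulli benchmark has a Pseudo-R^2 of zero, and the value rises toward one as the model log-likelihood approaches zero.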
Panel Data Techniques
Panel data techniques are a powerful method for addressing the

omitted variables problem.
If we are willing to assume that the omitted variables are constant
over time, we can solve the OV problem by collecting panel data from
the same units (e.g. people, firms, states) for several time periods (at
least two).
Panel Data Techniques
Suppose that we have n units each observed for T periods. We can
write our regression model as
$Y_{it} = \beta_0 + \beta_1 X_{1,it} + ... + \beta_k X_{k,it} + \alpha_i + u_{it}$   (1)
where the error term now includes two components:
A fixed effect $\alpha_i$ which includes all unobserved variables that are
constant over time[2] for each unit i
A second component $u_{it}$ which contains all the remaining (time-unit
specific) error.
[2] If we believe that there is a third component of the error ($\lambda_t$) that varies over time
but is constant across units, we can also add a time fixed effect.
Random Effects
If the fixed effect $\alpha_i$ is uncorrelated with all the included regressors in
all time periods ($\mathrm{Cov}(\alpha_i, X_{j,it}) = 0$) we can still use OLS to estimate
the $\beta$s, but it will be more efficient to use an estimator that accounts for the
fact that the observations are no longer iid (due to the presence of
$\alpha_i$).
We can do so by using a particular form of GLS (Generalized Least
Squares) known as the random effects (RE) estimator.
If the fixed effect $\alpha_i$ is correlated with one or more of the included
regressors ($\mathrm{Cov}(\alpha_i, X_{j,it}) \neq 0$), RE will be inconsistent.
In this case, we should use the fixed effects (FE) estimator, which
differences $\alpha_i$ away, allowing us to use OLS.
Fixed Effects
Specifically, by subtracting the unit-level time average of both sides of (1)
from itself, we are left with
$(Y_{it} - \bar Y_i) = \beta_1 (X_{1,it} - \bar X_{1,i}) + ... + \beta_k (X_{k,it} - \bar X_{k,i}) + (u_{it} - \bar u_i)$
which no longer includes the fixed effect $\alpha_i$ and can be estimated
using OLS.
Intuitively, we are exploiting the panel nature of the data to hold the
unobserved effect ($\alpha_i$) constant, even though we can't measure it.
Since we are regressing within-unit deviations of Y on within-unit
deviations of the Xs, the fixed effect won't play any role in this
regression since, by definition, the fixed effect did not change over time.
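The demeaning step can be illustrated on a tiny constructed panel. In the sketch below the true slope is 2, the unit effects are deliberately correlated with the regressor, and there is no idiosyncratic error, so the arithmetic is exact; all values and names are made up for illustration.

```python
def ols_slope(x, y):
    """Slope from a univariate OLS regression of y on x (with intercept)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

def demean_by_unit(panel):
    """Subtract each unit's time average from its x and y observations."""
    sums = {}
    for unit, x, y in panel:
        sx, sy, count = sums.get(unit, (0.0, 0.0, 0))
        sums[unit] = (sx + x, sy + y, count + 1)
    x_dm, y_dm = [], []
    for unit, x, y in panel:
        sx, sy, count = sums[unit]
        x_dm.append(x - sx / count)
        y_dm.append(y - sy / count)
    return x_dm, y_dm

beta1 = 2.0                                   # hypothetical true slope
alpha = {"A": 0.0, "B": 10.0, "C": 20.0}      # unit fixed effects, rising with x
x_by_unit = {"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0], "C": [7.0, 8.0, 9.0]}
panel = [(u, x, beta1 * x + alpha[u]) for u, xs in x_by_unit.items() for x in xs]

pooled = ols_slope([x for _, x, _ in panel], [y for _, _, y in panel])
fixed_effects = ols_slope(*demean_by_unit(panel))
print(pooled, fixed_effects)
```

Pooled OLS attributes the unit effects to X and overstates the slope, while the regression on demeaned data recovers the value used to build the panel.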
Fixed Effects versus Random Effects

When $\mathrm{Cov}(\alpha_i, X_{j,it}) = 0$, both FE and RE are consistent, but RE is
more efficient.
If $\mathrm{Cov}(\alpha_i, X_{j,it}) \neq 0$, FE is unbiased and consistent, but RE is not, so
FE is more robust.
Therefore, you should only use RE if $\mathrm{Cov}(\alpha_i, X_{j,it}) = 0$.
You can test this condition with a Hausman test.
Hausman Test
Formally, the Hausman test involves constructing a test statistic
which measures the normalized difference of the coefficients estimated
using RE and FE respectively.
This test statistic will be distributed $\chi^2_M$, where M is the number of
coefficients (that vary over time).
Since the null hypothesis of the test statistic is that the coefficients
are the same, a rejection of the null implies that RE is inconsistent
(so we should use FE instead).
Instrumental Variables
Instrumental variables (IV) techniques are another powerful method

for addressing the endogeneity problem.
Consider the simple univariate regression
$Y_i = \beta_0 + \beta_1 X_i + u_i$   (2)
We know that OLS is inconsistent if $X_i$ is correlated with $u_i$ (that is,
if $X_i$ is endogenous).
IV or 2SLS is an estimation technique that can be used instead of
OLS to recover consistent estimates of the parameters.
2SLS can be used when particular variables called instruments are
available.
$Y_i = \beta_0 + \beta_1 X_i + u_i$   (2)
A valid instrument $Z_i$ must satisfy two conditions:
1. Instrument relevance
$\mathrm{Cov}(Z_i, X_i) \neq 0$ (Usually easy to satisfy)
2. Instrument exogeneity
$\mathrm{Cov}(Z_i, u_i) = 0$ (Usually hard to satisfy)
Instruments allow us to break X into two parts, only one of which is
correlated with u, and then use the "good" (uncorrelated) part alone
to estimate (2).
So how does IV work?
Assume that the relation between the endogenous variable $X_i$ and the
instrument $Z_i$ is described by the following linear model:
$X_i = \pi_0 + \pi_1 Z_i + v_i$
where, if $Z_i$ is a valid instrument, $(\pi_0 + \pi_1 Z_i)$ is uncorrelated with
the error term $u_i$ (but $\mathrm{Cov}(v_i, u_i) \neq 0$).
2SLS[3] estimates the parameter $\beta_1$ in
$Y_i = \beta_0 + \beta_1 X_i + u_i$
using only the component of $X_i$ that is uncorrelated with the error.
[3] Although this discussion concerns the univariate case with one instrument, the
general case is a simple extension.
This procedure is called 2SLS because it involves two steps:
1. Estimate $(\pi_0 + \pi_1 Z_i)$ by regressing $X_i$ on $Z_i$, using OLS. The
predicted value will then be
$\hat X_i = \hat E(X_i \mid Z_i) = \hat\pi_0 + \hat\pi_1 Z_i$
2. Regress $Y_i$ on $\hat X_i$, again using OLS, to get $\hat\beta_0^{2SLS}$ & $\hat\beta_1^{2SLS}$
In practice, the two steps are performed jointly, which also computes
the correct standard errors.
The formula for this 2SLS estimator is given by
$\hat\beta_1^{2SLS} = \frac{\sum_{i=1}^{n} (Z_i - \bar Z)(Y_i - \bar Y)}{\sum_{i=1}^{n} (Z_i - \bar Z)(X_i - \bar X)} = \frac{\widehat{\mathrm{Cov}}(Z_i, Y_i)}{\widehat{\mathrm{Cov}}(Z_i, X_i)}$
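The two-step procedure and the covariance-ratio formula can be checked against each other on a tiny constructed data set. In the sketch below the instrument is exactly uncorrelated with the error in the sample and the regressor is not, so OLS misses the true slope of 0.5 while both IV calculations recover it. All numbers and names are made up for illustration, and the example covers point estimates only, not standard errors.

```python
def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance (dividing by n; the divisor cancels in the ratios below)."""
    a_bar, b_bar = mean(a), mean(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)) / len(a)

def ols(x, y):
    """Return (intercept, slope) from a univariate OLS regression of y on x."""
    slope = cov(x, y) / cov(x, x)
    return mean(y) - slope * mean(x), slope

beta0, beta1 = 1.0, 0.5             # hypothetical true coefficients
z = [-1.0, 1.0, -1.0, 1.0]          # instrument
u = [1.0, 1.0, -1.0, -1.0]          # error: uncorrelated with z in this sample
x = [2.0 * zi + ui for zi, ui in zip(z, u)]            # endogenous regressor (v = u)
y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]

# OLS is contaminated by Cov(x, u) != 0.
_, ols_slope = ols(x, y)

# Step 1: first stage, regress x on z and form predicted values.
pi0_hat, pi1_hat = ols(z, x)
x_hat = [pi0_hat + pi1_hat * zi for zi in z]
# Step 2: regress y on the predicted values.
_, two_step_slope = ols(x_hat, y)

# Closed form: ratio of sample covariances.
iv_slope = cov(z, y) / cov(z, x)
print(ols_slope, two_step_slope, iv_slope)
```

Running the second stage by hand like this gives the right coefficient but not the right standard errors, which is why the two steps are performed jointly in practice.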
Inference in 2SLS
Given the 2SLS assumptions, the 2SLS estimator is consistent and
asymptotically normal (CAN).
Specifically, in large samples, $\hat\beta_1^{2SLS}$ has a sampling distribution that is
approximately $N(\beta_1, \sigma^2_{\hat\beta_1^{2SLS}})$ where
$\sigma^2_{\hat\beta_1^{2SLS}} = \frac{1}{n} \frac{\mathrm{var}[(Z_i - \mu_Z) u_i]}{[\mathrm{Cov}(Z_i, X_i)]^2}$
which can be estimated by estimating the variance and covariance
terms.
Statistical inference is again straightforward (provided that you use
standard errors that take the two stage procedure into account).
Multiple Regression 2SLS

So what if you have more than one endogenous variable?
For the general case, the equation of interest is
$Y_i = \beta_0 + \beta_1 X_{1i} + .. + \beta_k X_{ki} + \beta_{k+1} W_{1i} + .. + \beta_{k+r} W_{ri} + u_i$
The 2SLS estimator is still computed in two stages:
1. Regress each $X_{ji}$ on the instruments $(Z_{1i}, ..., Z_{mi})$ and the included
exogenous regressors $(W_{1i}, ..., W_{ri})$ using OLS. Compute the predicted
values $\hat X_{1i}, ..., \hat X_{ki}$ from these k regressions.
2. Regress $Y_i$ on the predicted values $\hat X_{1i}, ..., \hat X_{ki}$ and the included
exogenous regressors $(W_{1i}, ..., W_{ri})$ using OLS.
In practice, the two steps are done jointly, in order to compute the
correct standard errors.
Multiple Regression 2SLS: Identification
So how many instruments do we need?

We need at least as many instruments (m) as endogenous regressors (k);
otherwise we can't estimate the parameters.
If m = k the equation is exactly identified.
If m > k the equation is over-identified.
If m < k the equation is under-identified.
If the equation is under-identified, IV/2SLS cannot be used!

However, if we are over-identified, we can test instrument exogeneity.
Problems with Instrument Relevance
Valid instruments must be both exogenous and relevant.

With relevance the issue is not just whether the instrument is
relevant, but how relevant.
The degree of relevance is called strength.
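An instrument Z is weak if $\mathrm{Cov}(Z_i, X_i) \approx 0$.
Instrument weakness is a problem since
$\hat\beta_1^{2SLS} = \frac{\widehat{\mathrm{Cov}}(Z_i, Y_i)}{\widehat{\mathrm{Cov}}(Z_i, X_i)}$
so if $\widehat{\mathrm{Cov}}(Z_i, X_i) \approx 0$ then $\hat\beta_1^{2SLS}$ explodes (and so will $\hat\sigma^2_{\hat\beta_1^{2SLS}}$).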
Problems with Instrument Relevance
You can check for weakness (with a single endogenous regressor) by

using the following rule of thumb test
If the F -statistic testing the null hypothesis that the coefficients on the
instruments are all zero in the first stage regression is less than 10, you
have weak instruments.
If your instruments are weak, you should seriously consider using a

different technique (or getting better instruments).
Instrument Exogeneity
If the instruments are not exogenous ($\mathrm{Cov}(Z_i, u_i) \neq 0$), then the
$\hat X$s will be correlated with u and 2SLS will be inconsistent.
This defeats the purpose of 2SLS since it can't isolate the "good" part of X.
If our system is over-identified, we can test for instrument exogeneity
by using the test of over-identifying restrictions (OIR), which works as
follows.
Exogeneity means u is uncorrelated with the Zs.
We don't observe $u_i$ but we can estimate it with the 2SLS coefficients
$\hat u_i^{2SLS} = Y_i - \hat\beta_0^{2SLS} - \hat\beta_1^{2SLS} X_{1i} - .. - \hat\beta_{k+r}^{2SLS} W_{ri}$
Instrument Exogeneity
If we use OLS to estimate the regression coefficients in
$\hat u_i^{2SLS} = \delta_0 + \delta_1 Z_{1i} + .. + \delta_m Z_{mi} + \delta_{m+1} W_{1i} + .. + \delta_{m+r} W_{ri} + e_i$
we can then use the F-statistic testing the null hypothesis
$H_0: \delta_1 = ... = \delta_m = 0$
to construct the OIR test statistic
$J = mF \xrightarrow{d} \chi^2_{m-k}$
where m is the number of instruments and k is the number of
endogenous variables.
Since the null hypothesis of this test is that u is uncorrelated with the
Zs, rejecting the null implies that the instruments are not exogenous.
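Once the F-statistic from the residual regression is in hand, the remaining arithmetic is short. The sketch below is illustrative: the F values are made up, the function name is hypothetical, and the 5% chi-squared critical values are hard-coded for one to three degrees of freedom only.

```python
# 5% critical values of the chi-squared distribution for 1, 2, and 3 degrees of freedom.
CHI2_CRIT_5PCT = {1: 3.84, 2: 5.99, 3: 7.81}

def j_test(f_stat, m, k):
    """Over-identifying restrictions test.

    f_stat: F-statistic for H0 that all m instrument coefficients are zero
            in the regression of the 2SLS residuals on instruments and Ws.
    m: number of instruments; k: number of endogenous regressors.
    Returns (J, degrees of freedom, reject at 5%).
    """
    df = m - k
    if df <= 0:
        raise ValueError("the test requires an over-identified equation (m > k)")
    if df not in CHI2_CRIT_5PCT:
        raise ValueError("critical value not tabulated in this sketch")
    j_stat = m * f_stat
    return j_stat, df, j_stat > CHI2_CRIT_5PCT[df]

# Two instruments, one endogenous regressor, hypothetical F-statistics.
print(j_test(f_stat=1.5, m=2, k=1))   # J = 3.0 with 1 degree of freedom
print(j_test(f_stat=2.5, m=2, k=1))   # J = 5.0 with 1 degree of freedom
```

With exactly as many instruments as endogenous regressors the degrees of freedom are zero, so the function refuses to run, matching the point that exogeneity can only be tested when the equation is over-identified.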