
Econ 139/239: Introduction to Econometrics

Final Review
Sophia Zhengzi Li
Department of Economics
Duke University

Summer II, 2010

Final Review (Duke)

Econ 139/239

Summer II, 2010

1 / 61

The final exam will be Saturday, August 14 from 2 PM - 5 PM.


Content
The final will be cumulative, but will be biased toward more recent
material (i.e. Binary Dependent Variables, Panel Data, IV).
Today's slides provide a good indication of the topics that I believe are
important.
Stop me if you have any questions!

Preparation & Mechanics


Problem sets, quizzes, in-class practice, and discussion sessions are the best
indication of exam content and style.
The exam is closed book, but you will be allowed to use the final cheat
sheet.
You should bring a calculator, since you will be doing several
calculations!


Univariate Regression Analysis

Before the first midterm, we introduced the univariate regression model
$Y_i = \beta_0 + \beta_1 X_i + u_i$
which we estimated using OLS.
In order for OLS to have the properties that we value in an estimator (unbiasedness, consistency, and asymptotic normality) [1], we needed to make some assumptions.

[1] To prove efficiency we would need to assume homoskedasticity of the errors ($Var(u_i \mid X_i) = \sigma^2$) as well. However, we usually won't make this assumption.

The OLS Assumptions


OLS Assumption 1: Linearity
$E(u_i \mid X_i) = 0$
OLS Assumption 2: Simple random sample
$(X_i, Y_i)$ are iid draws from their joint distribution
OLS Assumption 3: No extreme outliers
$u_i$ and $X_i$ have non-zero & finite fourth moments:
$0 < E(X_i^4) < \infty$ and $0 < E(u_i^4) < \infty$
Given OLS assumptions 1-3, the OLS estimators are:
Unbiased
Consistent
Asymptotically Normal


Omitted Variable Bias


However, univariate OLS has a big limitation: if the regressor (Xi ) is
correlated with a variable that has been omitted from the analysis,
but that determines (in part) the dependent variable, then the OLS
estimator will suffer from omitted variable bias (OVB).
OVB occurs when two conditions are true:
The OV is correlated with the included regressor
The OV is a determinant of the dependent variable

OVB means that OLS Assumption 1 (E (ui | Xi ) = 0) does not hold.


The error term ui represents all factors (other than Xi ) that are
determinants of Yi .
If one of these factors is correlated with Xi , then the error term will
be correlated with Xi .


Omitted Variable Bias

Since this violates OLS A1, OLS won't just be biased but also inconsistent, so OVB is a problem whether the sample size is large or small.
The magnitude and direction of the bias depend on the correlation between the regressor and the omitted variable (or more generally, the error term).
The best solution to the OVB problem is to add (if you can) the other relevant variables to the regression.
If you can't, you will have to use another method (like Fixed Effects) to solve the problem.
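
To make the direction and size of the bias concrete, here is a minimal sketch in plain Python. All numbers are hypothetical: the true model is assumed to be Y = 1 + 2 X1 + 3 X2 with no noise, and X2 is built to be positively correlated with X1. Omitting X2 then shifts the estimated slope on X1 by 3 times the slope from regressing X2 on X1.

```python
import random

def slope(x, y):
    """OLS slope from a univariate regression of y on x (with intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

random.seed(0)
n = 1000
x1 = [random.gauss(0, 1) for _ in range(n)]
# Hypothetical omitted variable, correlated with x1 by construction.
x2 = [0.5 * a + random.gauss(0, 1) for a in x1]
# Assumed true model (noise-free so the algebra is exact): Y = 1 + 2*X1 + 3*X2.
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]

short_slope = slope(x1, y)   # regression of Y on X1 alone (X2 omitted)
delta = slope(x1, x2)        # slope from regressing the omitted X2 on X1
bias = short_slope - 2.0     # equals 3 * delta: (effect of X2) x (link between X2 and X1)
print(short_slope, 2.0 + 3.0 * delta, bias)
```

Because X2 both determines Y and moves with X1, the short regression attributes part of X2's effect to X1. If either link were zero, the bias term would vanish.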


Multiple Regression: Model and Interpretation

We assume the population regression model is given by
$Y_i = \beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki} + u_i$
$\beta_0$ is the intercept (the mean impact of unobserved factors) and $\beta_k$ is the slope coefficient of $X_k$.
$\beta_k$ represents the expected change in Y associated with a unit change in $X_k$, holding all other regressors constant.
We can estimate the parameters of the multiple regression model
using OLS, by minimizing the sum of the squared prediction errors.


The OLS Assumptions in the Multiple Regression Model

OLS Assumption 1: Linearity
$E(u_i \mid X_{1i}, \dots, X_{ki}) = 0$
OLS Assumption 2: Simple random sample
$(Y_i, X_{1i}, \dots, X_{ki})$ iid
OLS Assumption 3: No extreme outliers
$X_{1i}, \dots, X_{ki}, u_i$ have non-zero & finite fourth moments
OLS Assumption 4: No perfect collinearity
Regressors are not linear combinations of each other
Given OLS A1-A4, OLS is unbiased, consistent, and asymptotically
normal.


Homoskedasticity

If we want to assume homoskedasticity (in general we won't), we would add:
OLS Assumption 5: Homoskedasticity
$Var(u_i \mid X_{1i}, \dots, X_{ki}) = \sigma^2$
Adding OLS Assumption 5 makes OLS efficient and allows us to use homoskedasticity-only (HO) standard errors (but this assumption is often violated in the data).


Hypothesis Tests and CIs for a Single Coefficient


To test the hypothesis $H_0: \beta_j = \beta_{j,0}$ against the alternative $H_A: \beta_j \neq \beta_{j,0}$:
Compute the standard error of $\hat\beta_j$, $SE(\hat\beta_j)$
Compute the t-statistic
$t = \frac{\hat\beta_j - \beta_{j,0}}{SE(\hat\beta_j)}$
Compute the p-value
$p\text{-value} = 2\Phi(-|t^{act}|)$
where $t^{act}$ is the value of the t-statistic actually computed. Reject $H_0$ at the 5% significance level if the p-value is less than 0.05, or equivalently, if $|t^{act}| > 1.96$.


Hypothesis Tests and CIs for a Single Coefficient

When the sample size is large, a 95% confidence interval for $\beta_j$ can be constructed as
$[\hat\beta_j - 1.96\, SE(\hat\beta_j),\ \hat\beta_j + 1.96\, SE(\hat\beta_j)]$
Remember that this confidence interval contains the true value of $\beta_j$ with a 95% probability (i.e. it contains the true value of $\beta_j$ in 95% of all possible randomly selected samples).
Equivalently, it is also the set of values of $\beta_j$ that cannot be rejected by a 5% two-sided hypothesis test.
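
The t-statistic, two-sided p-value, and 95% confidence interval can be computed by hand along the following lines. This is a minimal sketch that relies on the large-sample normal approximation; the estimate and standard error are made-up illustrative numbers, and `t_test_and_ci` is a hypothetical helper name.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def t_test_and_ci(b_hat, se, b_null=0.0):
    """Return (t, two-sided p-value, 95% CI) under the normal approximation."""
    t = (b_hat - b_null) / se
    p = 2.0 * normal_cdf(-abs(t))
    ci = (b_hat - 1.96 * se, b_hat + 1.96 * se)
    return t, p, ci

# Hypothetical estimate and standard error, testing H0: beta_j = 0.
t, p, ci = t_test_and_ci(b_hat=-2.28, se=0.52)
print(t, p, ci)

# A t-statistic of exactly 1.96 sits right at the 5% boundary.
t2, p2, ci2 = t_test_and_ci(b_hat=1.96, se=1.0)
```

Note that the interval excludes 0 exactly when the test rejects at the 5% level, which is the equivalence stated above.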


Testing Joint Hypotheses

What if you want to test a joint hypothesis about several coefficients?


Why might you want to?
If you think the coefficients are individually insignificant because of
near perfect multicollinearity.

Assuming A1-A4 and a large sample size, you can use the F-statistic.
To do so in practice, you need to:
1. Count the number of restrictions under the null (degrees of freedom), call this q.
2. Compute the F-statistic.
3. Check the table for $F_{q,\infty}$ (or use the p-value, if it is provided).
4. Reject the null if p-value < $\alpha$ or F-stat > $F_{q,\infty}$.


Goodness of Fit
There are 3 main ways to measure goodness of fit.
1. The standard error of the regression (SER): the SER is a measure of the spread of the distribution of Y around the regression line, but it depends on the units of Y.
2. $R^2$: the regression $R^2$ is the fraction of the sample variation in $Y_i$ explained by the regressors.
$R^2 = \frac{\sum_i (\hat Y_i - \bar Y)^2}{\sum_i (Y_i - \bar Y)^2} = \frac{ESS}{TSS} = 1 - \frac{SSR}{TSS}$
However, in multiple regression, the $R^2$ increases whenever a new regressor is added (unless it's perfectly multicollinear with the original regressors).
3. $\bar R^2$: $\bar R^2$ adjusts for this by deflating the $R^2$ by a penalty factor:
$\bar R^2 = 1 - \frac{n-1}{n-k-1} \frac{SSR}{TSS} = 1 - \frac{s_{\hat u}^2}{s_Y^2}$
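
As a quick numerical check of the two formulas, here is a minimal sketch. The SSR, TSS, n, and k values are hypothetical, and k counts regressors excluding the intercept.

```python
def r_squared(ssr, tss):
    """R^2 = 1 - SSR/TSS."""
    return 1.0 - ssr / tss

def adjusted_r_squared(ssr, tss, n, k):
    """Adjusted R^2 with n observations and k regressors (intercept excluded)."""
    return 1.0 - (n - 1) / (n - k - 1) * ssr / tss

# Hypothetical sums of squares from some fitted regression.
r2 = r_squared(ssr=20.0, tss=100.0)
r2_adj = adjusted_r_squared(ssr=20.0, tss=100.0, n=25, k=4)
print(r2, r2_adj)
```

The penalty factor (n - 1)/(n - k - 1) exceeds 1 whenever k > 0, so the adjusted measure is always below the unadjusted one.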


Goodness of Fit: Some Caveats

Some caveats about using $R^2$ and $\bar R^2$ in practice:
An increase in $R^2$ or $\bar R^2$ does not mean that an added variable is statistically significant or that the regressors are a true cause of the dependent variable.
A high $R^2$ or $\bar R^2$ does not mean that there is no omitted variable bias or that you have the best possible set of regressors.
Neither $R^2$ nor $\bar R^2$ can prove our model is wrong or right.
You can have a good model but a low $R^2$ and $\bar R^2$ because $Var(u_i)$ is large.
Can also have a bad model with $\bar R^2 \approx 1$ (spurious regression).


Threats to Internal Validity

A statistical analysis is internally valid if the statistical inferences about causal effects are valid for the population being studied.
We know that internal validity hinges on two things:
1. The estimator of the causal effect should be consistent (unbiased would be nice too, but it's not always feasible).
2. Hypothesis tests should have the desired significance level (i.e. you should be using the correct standard errors).

We focused on 1, since 2 is just about using the right formula.


Threats to Internal Validity


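Consider the simple univariate regression
$Y_i = \beta_0 + \beta_1 X_i + u_i$
We know that:
$\hat\beta_1 = \beta_1 + \frac{\sum_i (X_i - \bar X) u_i}{\sum_i (X_i - \bar X)^2}$
Since $\hat\beta_1 = \beta_1 + \frac{\sum_i (X_i - \bar X) u_i}{\sum_i (X_i - \bar X)^2} \xrightarrow{p} \beta_1 + \frac{\sigma_{Xu}}{\sigma_X^2}$, $\hat\beta_1$ will be inconsistent if $Cov(X_i, u_i) \neq 0$.
Also, if $E[u_i \mid X_i] \neq 0$ then $E[(X_i - \bar X) u_i] \neq 0$, so $\hat\beta_1$ will be biased as well.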
So when might this occur?


Threats to Internal Validity

Omitted variables. Is the omitted variable observed?
  Yes: Include it! (Multivariate regression analysis)
  No:
    Use Panel Data (fixed effects).
    Use IV.
    Design an experiment.

Wrong functional form


Approximate with a nonlinear functional form like a polynomial
regression.


Threats to Internal Validity

Simultaneous causality
X causes Y , but Y in turn causes X
Use IV.
Design an experiment.

Measurement error in the regressor


Get more accurate measurements.
Use Instrumental Variables (IV) or model the form of error.

Sample selection
The availability of data is related to the value of the dependent
variable.
Use a model that corrects for selection bias.


Nonlinearities
Identifying and Modeling Nonlinearities

The basic OLS model assumes that the X s are all linearly related to
Y through the population regression line.
What if this is not the case?
We looked at two methods for modeling nonlinearities using OLS:
Allowing the effect on Y of a unit change in X1 to depend on the value
of another independent variable X2 (or perhaps more than one).
This method uses dummy variables and interactions.

Allowing the effect on Y of a unit change in X1 to depend on the value


of X1 itself
This method uses nonlinear functions of the X s like polynomials and
logarithms


Dummies and interactions


Through the use of the interaction term $X_i \times D_i$, the population regression line relating $Y_i$ and the continuous variable $X_i$ can have a slope or intercept that depends on the binary variable $D_i$. There are three possibilities:
1. Different intercepts, same slope
$Y_i = \beta_0 + \beta_1 X_i + \beta_2 D_i + u_i$
2. Different intercepts and slopes
$Y_i = \beta_0 + \beta_1 X_i + \beta_2 D_i + \beta_3 (X_i \times D_i) + u_i$
3. Same intercept, different slopes
$Y_i = \beta_0 + \beta_1 X_i + \beta_2 (X_i \times D_i) + u_i$


Quadratic Regression

In other situations, we would like to allow the effect on Y of a unit change in X to depend on the value of X itself, rather than some other variable.
One way to do this is to run a quadratic regression
$Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + u_i$
Note that this regression is linear in the parameters: after creating the variable $X_i^2$ we can still use OLS to estimate the parameters.


Quadratic Regression

For the nonlinear models considered in this class
$Y = f(X_1, X_2, \dots, X_k) + u$
so the expected effect on Y of a change in $X_i$ is then
$\Delta Y = f(X_1, \dots, X_i + \Delta X_i, \dots, X_k) - f(X_1, \dots, X_i, \dots, X_k)$
Notice that this formula applies both to the examples in Chapter 8, where f is a nonlinear function of the Xs but a linear function of the parameters ($\beta$s), and to the examples in Chapter 11, where f can be a nonlinear function of the parameters as well as the Xs.


Quadratic Regression
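The estimator of the unknown population difference
$\Delta Y = f(X_1, \dots, X_i + \Delta X_i, \dots, X_k) - f(X_1, \dots, X_i, \dots, X_k)$
is just the difference between the predicted values
$\Delta\hat Y = \hat f(X_1, \dots, X_i + \Delta X_i, \dots, X_k) - \hat f(X_1, \dots, X_i, \dots, X_k)$
For a quadratic regression
$\hat Y_i = \hat\beta_0 + \hat\beta_1 X_i + \hat\beta_2 X_i^2$
we have
$\Delta\hat Y = [\hat\beta_0 + \hat\beta_1 (X_i + \Delta X_i) + \hat\beta_2 (X_i + \Delta X_i)^2] - [\hat\beta_0 + \hat\beta_1 X_i + \hat\beta_2 X_i^2]$
$= \hat\beta_1 \Delta X_i + 2 \hat\beta_2 X_i (\Delta X_i) + \hat\beta_2 (\Delta X_i)^2$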

Quadratic Regression

Notice that to compute this you will need to know $\hat\beta_1$ and $\hat\beta_2$, the initial value of $X_i$, and the size of the change $\Delta X_i$.
To compute the standard error of the effect on Y of changing X in the quadratic regression, you need to compute
$SE(\Delta\hat Y) = \frac{|\Delta\hat Y|}{\sqrt{F}}$
where F is the F-statistic from the null hypothesis that the effect is zero, which will depend on the coefficients on $X$ and $X^2$, the initial value of $X_i$, and the size of the change $\Delta X_i$.
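
A minimal sketch of this calculation follows. The coefficient values and the F-statistic are hypothetical placeholders, not estimates from any dataset. The closed-form expression for the change is checked against the direct difference in predicted values.

```python
import math

def quadratic_effect(b1, b2, x, dx):
    """Predicted change in Y when X moves from x to x + dx in a quadratic regression."""
    return b1 * dx + 2.0 * b2 * x * dx + b2 * dx ** 2

def se_of_effect(effect, f_stat):
    """SE of the predicted change, backed out from the F-statistic for H0: effect = 0."""
    return abs(effect) / math.sqrt(f_stat)

# Hypothetical estimated coefficients.
b0, b1, b2 = 600.0, 4.0, -0.05

def predict(x):
    return b0 + b1 * x + b2 * x ** 2

effect = quadratic_effect(b1, b2, x=10.0, dx=1.0)
direct = predict(11.0) - predict(10.0)      # same quantity, computed directly
se = se_of_effect(effect, f_stat=25.0)      # f_stat is an assumed value
print(effect, direct, se)
```

Because the effect depends on the starting value of X, repeating the calculation at x = 40 would give a different answer (here even a different sign), which is the whole point of the quadratic specification.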


Polynomial Regression

The quadratic regression is a special case of a polynomial regression, which extends the quadratic specification to higher order polynomials ($X^3$, $X^4$, etc.).
As in the quadratic regression, calculating the effect of a change in one regressor requires computing the difference in predicted values.
Since this can be tedious, researchers often prefer to use logarithms to model nonlinearities.


Regressions using Logarithms


Logarithms are often useful because they convert changes in variables into percentage changes:
$\ln\left(\frac{X + \Delta X}{X}\right) = \ln(X + \Delta X) - \ln(X) \approx \frac{\Delta X}{X}$ (when $\frac{\Delta X}{X}$ is small)
This approximation makes their coefficients simpler to interpret and perform tests on than the coefficients in quadratic or polynomial regressions, which is very convenient.
The advantage of logs over polynomials (or quadratics) is that interpretation and tests are easier.
The disadvantage is that you have to decide on the shape of the relationship beforehand.
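
The quality of the log approximation can be checked numerically. The sketch below compares the exact log difference with the proportional change for a few hypothetical changes in X; `log_diff` is an illustrative helper name.

```python
import math

def log_diff(x, dx):
    """Exact difference ln(x + dx) - ln(x)."""
    return math.log(x + dx) - math.log(x)

x = 100.0
for dx in (1.0, 5.0, 10.0, 50.0):
    exact = log_diff(x, dx)
    approx = dx / x
    print(f"dx/x = {approx:.2f}  log difference = {exact:.4f}  gap = {approx - exact:.4f}")
```

The gap is negligible for a 1% change and grows quickly for large changes, which is why the percentage interpretation is reserved for small proportional changes.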


Regressions using Logarithms


There are three main ways to use logs in regressions:
1. Linear-log model
2. Log-linear model
3. Log-log model

Linear-log model
Assume that the regression has the following shape
$Y = \beta_0 + \beta_1 \ln X + u$
When would we want to use this approach?

Here, a 1% change in X is associated with a change in Y of $0.01 \beta_1$.


Regressions using Logarithms


Log-linear model
What if we apply the log to Y instead of X?
Now the regression has the following shape
$\ln Y = \beta_0 + \beta_1 X + u$
When would we want to use this approach?

Here, a change in X of one unit ($\Delta X = 1$) is associated with a $100 \beta_1$% change in Y.

Log-log model
Now the regression has the following shape
$\ln Y = \beta_0 + \beta_1 \ln X + u$
When would we want to use this approach?

Here, a 1% change in X is associated with a $\beta_1$% change in Y.
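
The three interpretations are approximations. As a rough illustration, the sketch below computes the exact implied changes (holding the error term fixed) so they can be compared with the rules of thumb. The coefficient values are hypothetical and the function names are made up for this example.

```python
import math

def linear_log_change(b1, pct_x):
    """Exact change in Y when X rises by pct_x percent in Y = b0 + b1*ln(X)."""
    return b1 * math.log(1.0 + pct_x / 100.0)

def log_linear_pct_change(b1, dx=1.0):
    """Exact percent change in Y when X rises by dx in ln(Y) = b0 + b1*X."""
    return 100.0 * (math.exp(b1 * dx) - 1.0)

def log_log_pct_change(b1, pct_x):
    """Exact percent change in Y when X rises by pct_x percent in ln(Y) = b0 + b1*ln(X)."""
    return 100.0 * ((1.0 + pct_x / 100.0) ** b1 - 1.0)

print(linear_log_change(10.0, 1.0), "vs rule of thumb", 0.01 * 10.0)
print(log_linear_pct_change(0.05), "vs rule of thumb", 100 * 0.05)
print(log_log_pct_change(0.5, 1.0), "vs rule of thumb", 0.5)
```

For small coefficients and small percentage changes the rules of thumb are close to the exact values, which is the case the slides have in mind.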



Regression with Binary Dependent Variables


Linear Probability Model (LPM)

One option for modeling discrete dependent variables is to just use OLS.
The key here is to reinterpret the predicted values as probabilities.
Why is this interpretation legitimate?
The population regression function is a conditional expectation ($E(Y \mid X_1, \dots, X_k)$) and here Y is a 0/1 binary variable, so its expected value is simply the probability that Y = 1.

Thus, for a binary variable,
$E(Y \mid X_1, \dots, X_k) = P(Y = 1 \mid X_1, \dots, X_k)$


Linear Probability Model (LPM)

OLS with a binary dependent variable is called the linear probability model since it models the probability that Y = 1 with a straight line.
$P(Y = 1 \mid X_1, \dots, X_k) = \beta_0 + \beta_1 X_1 + \dots + \beta_k X_k$
$\beta_j$ measures the change in $P(Y = 1)$ due to a unit change in $X_j$, or if $X_j$ is a dummy variable ($D_j$) it measures the change in $P(Y = 1)$ associated with changing $D_j$ from being equal to 0 to being equal to 1.


Linear Probability Model (LPM)

Most of the tools we've learned so far carry over to the LPM.
Confidence intervals, hypothesis tests, & interactions are the same.
Only $R^2$ and $\bar R^2$ don't, since the fitted values are always somewhat far from $Y_i$.

However, the LPM has a serious flaw: you can get predicted
probabilities that are greater than one or less than zero.
For this reason, we introduced two nonlinear specifications (logit and
probit) to correct this flaw.


Probit and Logit


With the LPM
$E(Y_i \mid X_{1i}, \dots, X_{ki}) = P(Y_i = 1 \mid X_{1i}, \dots, X_{ki})$
$= f(X_{1i}, \dots, X_{ki})$
$= \beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki}$
which can lead to predicted probabilities outside the unit interval.
Probit and logit use CDFs to model $f(X_{1i}, \dots, X_{ki})$, which keeps the predictions inside this interval.
The Probit model uses the standard normal CDF so
$f(X_1, \dots, X_k) = \Phi(\beta_0 + \beta_1 X_1 + \dots + \beta_k X_k)$
The Logit uses the standard logistic CDF
$f(X_1, \dots, X_k) = \frac{e^{\beta_0 + \beta_1 X_1 + \dots + \beta_k X_k}}{1 + e^{\beta_0 + \beta_1 X_1 + \dots + \beta_k X_k}} \equiv F(\beta_0 + \beta_1 X_1 + \dots + \beta_k X_k)$

Probit and Logit


Both are essentially fitting an S shaped curve through the data, and
produce pretty similar results, so choosing between them is usually a
matter of preference (i.e. arbitrary).


Probit and Logit

One drawback relative to the LPM is that the coefficients from the logit or probit do not have simple interpretations.
Both the predicted values and differences in predicted values are non-linear functions of the $\beta$s and Xs.

The model is best interpreted by computing predicted probabilities and the effect of a unit change in a regressor (often evaluated at the mean value of the other regressors).


Probit & Logit: Predicted Probabilities

For the probit model, the predicted probability that Y = 1, given values of $X_1, X_2, \dots, X_k$, is calculated by computing the z-value, $z = \hat\beta_0 + \hat\beta_1 X_1 + \dots + \hat\beta_k X_k$, and then looking up this z-value in the normal distribution table.
For the logit model, the predicted probability that Y = 1, given values of $X_1, X_2, \dots, X_k$, is calculated by computing the value of $\hat\beta_0 + \hat\beta_1 X_1 + \dots + \hat\beta_k X_k$, and then plugging this value into the logistic cumulative distribution function
$\hat f(X_1, \dots, X_k) = \frac{e^{\hat\beta_0 + \hat\beta_1 X_1 + \dots + \hat\beta_k X_k}}{1 + e^{\hat\beta_0 + \hat\beta_1 X_1 + \dots + \hat\beta_k X_k}} \equiv F(\hat\beta_0 + \hat\beta_1 X_1 + \dots + \hat\beta_k X_k)$


Probit & Logit: Interpreting Coefficients

For either model, the effect of a change in a regressor is computed by
1. Computing the predicted probability for the initial value of the regressors,
2. Computing the predicted probability for the new or changed value of the regressors, and
3. Taking their difference
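
The three steps above can be sketched in a few lines. The coefficients and regressor values are hypothetical, and `math.erf` stands in for the normal table lookup. The logistic CDF is written as 1/(1 + e^(-z)), which is algebraically the same as e^z/(1 + e^z).

```python
import math

def probit_prob(z):
    """Standard normal CDF evaluated at the index z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logit_prob(z):
    """Logistic CDF evaluated at the index z."""
    return 1.0 / (1.0 + math.exp(-z))

def index(coefs, xs):
    """b0 + b1*x1 + ... + bk*xk, where coefs[0] is the intercept."""
    return coefs[0] + sum(b * x for b, x in zip(coefs[1:], xs))

coefs = [-2.0, 3.0]                              # hypothetical probit estimates
p_before = probit_prob(index(coefs, [0.3]))      # step 1: initial value of X
p_after = probit_prob(index(coefs, [0.4]))       # step 2: changed value of X
effect = p_after - p_before                      # step 3: difference
print(p_before, p_after, effect)
```

Swapping `probit_prob` for `logit_prob` gives the logit version of the same calculation, though the coefficients would of course come from a logit fit.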


Estimation

We can't use OLS to estimate the coefficients because these parameters enter both the logit and probit nonlinearly.
In other words, both
$E(Y_i \mid X_{1i}, \dots, X_{ki}) = \Phi(\beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki})$
and
$E(Y_i \mid X_{1i}, \dots, X_{ki}) = F(\beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki})$
are nonlinear in the coefficients ($\beta$s) so we can't use OLS.
Instead, the coefficients of the probit and logit models are estimated using maximum likelihood.


Maximum Likelihood (ML)


To use ML, we treat the joint probability distribution of the data as a function of the unknown coefficients.
If we know the distribution of the data as a function of the parameters, $\hat\beta^{MLE}$ is the parameter value that maximizes the likelihood of our data.
For probit, the log-likelihood is
$\ln L(Y_1, \dots, Y_n \mid X_{1i}, \dots, X_{ki}; \beta_0, \dots, \beta_k) = \sum_{i=1}^n [Y_i \ln \Phi(\beta_0 + \dots + \beta_k X_{ki}) + (1 - Y_i) \ln(1 - \Phi(\beta_0 + \dots + \beta_k X_{ki}))]$
The MLE then solves
$\max_{\beta_0, \dots, \beta_k} \sum_{i=1}^n [Y_i \ln \Phi(\beta_0 + \dots + \beta_k X_{ki}) + (1 - Y_i) \ln(1 - \Phi(\beta_0 + \dots + \beta_k X_{ki}))]$


Maximum Likelihood (ML)

$\max_{\beta_0, \dots, \beta_k} \sum_{i=1}^n [Y_i \ln \Phi(\beta_0 + \dots + \beta_k X_{ki}) + (1 - Y_i) \ln(1 - \Phi(\beta_0 + \dots + \beta_k X_{ki}))]$

Since this does not have a nice closed form solution, we can't represent the estimators using simple formulas (like we could with OLS).
Instead, we must use a computer algorithm to maximize the function numerically.
But we know that under fairly general conditions, ML estimation is consistent, asymptotically normal, and efficient.
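
To illustrate what "maximize numerically" means, here is a deliberately simplified sketch: a probit with an intercept only, maximized by a ternary search rather than the Newton-type routines that statistical software would typically use. The data are hypothetical. In this special case the answer can be verified, since the fitted probability must equal the sample mean of Y.

```python
import math

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probit_loglik(b0, y):
    """Log-likelihood of an intercept-only probit: P(Y = 1) = Phi(b0)."""
    p = normal_cdf(b0)
    return sum(yi * math.log(p) + (1 - yi) * math.log(1.0 - p) for yi in y)

def maximize(f, lo, hi, iters=200):
    """Ternary search for the maximizer of a single-peaked function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)

y = [1, 1, 1, 0, 1, 0, 1, 1]     # hypothetical binary outcomes, mean 0.75
b0_hat = maximize(lambda b: probit_loglik(b, y), -5.0, 5.0)
print(b0_hat, normal_cdf(b0_hat))
```

With regressors the search runs over several coefficients at once, so a multivariate optimizer is needed, but the idea is the same: try coefficient values and keep moving toward a higher log-likelihood.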


Inference & Goodness of Fit


Because the MLE is asymptotically normal, statistical inference about the probit and logit coefficients is carried out in the same manner as in OLS.
As usual, we can use a t-ratio or F-stat to test hypotheses about one or more coefficients.
We can't use $R^2$ or $\bar R^2$ though because the fitted values will still be somewhat far from $Y_i$, so instead we can use
$\text{Pseudo-}R^2 = 1 - \frac{\ln(L^{max}_{probit})}{\ln(L^{max}_{bernoulli})}$
where $L^{max}_{probit}$ is the value of the maximized probit likelihood and $L^{max}_{bernoulli}$ is the value of the maximized Bernoulli likelihood.


Inference & Goodness of Fit

The formula for the logit simply replaces $L^{max}_{probit}$ with $L^{max}_{logit}$:
$\text{Pseudo-}R^2 = 1 - \frac{\ln(L^{max}_{logit})}{\ln(L^{max}_{bernoulli})}$
The Pseudo-$R^2$ tells us how well the probit or logit does relative to a simple Bernoulli model, so a higher value means that the probit (or logit) does a better job of explaining the data.
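
A minimal sketch of the pseudo-R² calculation follows. The Bernoulli log-likelihood is computed from the data, taking the Bernoulli model to mean a model with no regressors whose fitted probability is the sample mean. The value -3.0 used for the maximized probit or logit log-likelihood is a hypothetical placeholder for whatever the software reports.

```python
import math

def bernoulli_loglik(y):
    """Maximized log-likelihood with no regressors: P(Y = 1) = sample mean of y."""
    n, n1 = len(y), sum(y)
    p = n1 / n
    return n1 * math.log(p) + (n - n1) * math.log(1.0 - p)

def pseudo_r2(loglik_model, loglik_bernoulli):
    """Pseudo-R^2 = 1 - ln(L_model) / ln(L_bernoulli)."""
    return 1.0 - loglik_model / loglik_bernoulli

y = [1, 1, 1, 0, 1, 0, 1, 1]     # hypothetical binary outcomes
ll0 = bernoulli_loglik(y)
r2 = pseudo_r2(-3.0, ll0)        # -3.0 is an assumed model log-likelihood
print(ll0, r2)
```

A model that does no better than the Bernoulli benchmark has a pseudo-R² of zero, and the measure rises toward one as the model's log-likelihood approaches zero.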


Panel Data Techniques

Panel data techniques are a powerful method for addressing the


omitted variables problem.
If we are willing to assume that the omitted variables are constant
over time, we can solve the OV problem by collecting panel data from
the same units (e.g. people, firms, states) for several time periods (at
least two).


Panel Data Techniques

Suppose that we have n units each observed for T periods. We can write our regression model as
$Y_{it} = \beta_0 + \beta_1 X_{1,it} + \dots + \beta_k X_{k,it} + \alpha_i + u_{it}$   (1)
where the error term now includes two components:
A fixed effect $\alpha_i$ which includes all unobserved variables that are constant over time [2] for each unit i
A second component $u_{it}$ which contains all the remaining (time-unit specific) error.

[2] If we believe that there is a third component of the error ($\lambda_t$) that varies over time but is constant across units, we can also add a time fixed effect.

Random Effects
If the fixed effect $\alpha_i$ is uncorrelated with all the included regressors in all time periods ($Cov(\alpha_i, X_{j,it}) = 0$) we can still use OLS to estimate
$Y_{it} = \beta_0 + \beta_1 X_{1,it} + \dots + \beta_k X_{k,it} + \alpha_i + u_{it}$
but it will be more efficient to use an estimator that accounts for the fact that the observations are no longer iid (due to the presence of $\alpha_i$).
We can do so by using a particular form of GLS (Generalized Least Squares) known as the random effects (RE) estimator.
If the fixed effect $\alpha_i$ is correlated with one or more of the included regressors ($Cov(\alpha_i, X_{j,it}) \neq 0$), RE will be inconsistent.
In this case, we should use the fixed effects (FE) estimator, which differences $\alpha_i$ away, allowing us to use OLS.

Fixed Effects
Specifically, by subtracting the unit-level time average of both sides of
$Y_{it} = \beta_0 + \beta_1 X_{1,it} + \dots + \beta_k X_{k,it} + \alpha_i + u_{it}$
from the equation itself, we are left with
$(Y_{it} - \bar Y_i) = \beta_1 (X_{1,it} - \bar X_{1,i}) + \dots + \beta_k (X_{k,it} - \bar X_{k,i}) + (u_{it} - \bar u_i)$
which no longer includes the fixed effect $\alpha_i$ and can be estimated using OLS.
Intuitively, we are exploiting the panel nature of the data to hold the unobserved effect ($\alpha_i$) constant, even though we can't measure it.
Since we are regressing changes of Y on changes of Xs (measured as deviations from each unit's time average), the fixed effect won't play any role in this regression since, by definition, the fixed effect did not change over time.
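
The within transformation can be illustrated with a tiny made-up panel. In the sketch below there are two hypothetical units, the true slope is assumed to be 2, there is no idiosyncratic noise, and the fixed effect is larger for the unit with larger X values. Pooled OLS is then badly biased, while demeaning within each unit recovers the slope exactly.

```python
def slope(x, y):
    """OLS slope from a univariate regression of y on x (with intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def demean(values):
    """Subtract the unit's own time average from each observation."""
    m = sum(values) / len(values)
    return [v - m for v in values]

# Hypothetical panel: Y_it = 2 * X_it + alpha_i, with alpha_i correlated with X.
alpha = {"A": 10.0, "B": 20.0}
x = {"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0]}
y = {u: [2.0 * xi + alpha[u] for xi in x[u]] for u in x}

pooled_slope = slope([v for u in x for v in x[u]],
                     [v for u in x for v in y[u]])
fe_slope = slope([v for u in x for v in demean(x[u])],
                 [v for u in x for v in demean(y[u])])
print(pooled_slope, fe_slope)
```

The demeaned data contain no trace of alpha, so the between-unit differences that contaminate the pooled estimate are simply not used.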

Fixed Effects versus Random Effects


When $Cov(\alpha_i, X_{j,it}) = 0$, both FE and RE are consistent, but RE is more efficient.
If $Cov(\alpha_i, X_{j,it}) \neq 0$, FE is unbiased and consistent, but RE is not, so FE is more robust.
Therefore, you should only use RE if $Cov(\alpha_i, X_{j,it}) = 0$.
You can test this condition with a Hausman test.


Hausman Test
Formally, the Hausman test involves constructing a test statistic
which measures the normalized difference of the coefficients estimated
using RE and FE respectively.

This test statistic will be distributed $\chi^2_M$, where M is the number of coefficients (that vary over time).
Since the null hypothesis of the test statistic is that the coefficients
are the same, a rejection of the null implies that RE is inconsistent
(so we should use FE instead).

Instrumental Variables

Instrumental variables (IV) techniques are another powerful method for addressing the endogeneity problem.
Consider the simple univariate regression
$Y_i = \beta_0 + \beta_1 X_i + u_i$   (2)
We know that OLS is inconsistent if $X_i$ is correlated with $u_i$ (that is, if $X_i$ is endogenous).
IV or 2SLS is an estimation technique that can be used instead of
OLS to recover consistent estimates of the parameters.
2SLS can be used when particular variables called instruments are
available.


Instrumental Variables

$Y_i = \beta_0 + \beta_1 X_i + u_i$   (2)

A valid instrument $Z_i$ must satisfy two conditions:
1. Instrument relevance
$Cov(Z_i, X_i) \neq 0$ (Usually easy to satisfy)
2. Instrument exogeneity
$Cov(Z_i, u_i) = 0$ (Usually hard to satisfy)

Instruments allow us to break X into two parts, only one of which is correlated with u, and then use the "good" (uncorrelated) part alone to estimate (2).


Instrumental Variables
So how does IV work?
Assume that the relation between the endogenous variable $X_i$ and the instrument $Z_i$ is described by the following linear model:
$X_i = \pi_0 + \pi_1 Z_i + v_i$
where, if $Z_i$ is a valid instrument, $(\pi_0 + \pi_1 Z_i)$ is uncorrelated with the error term $u_i$ (but $Cov(v_i, u_i) \neq 0$).
2SLS [3] estimates the parameter $\beta_1$ in
$Y_i = \beta_0 + \beta_1 X_i + u_i$
using only the component of $X_i$ that is uncorrelated with the error.

[3] Although this discussion concerns the univariate case with one instrument, the general case is a simple extension.

Instrumental Variables
This procedure is called 2SLS because it involves two steps:
1. Estimate $(\pi_0 + \pi_1 Z_i)$ by regressing $X_i$ on $Z_i$, using OLS. The predicted value will then be
$\hat X_i = \hat E(X_i \mid Z_i) = \hat\pi_0 + \hat\pi_1 Z_i$
2. Regress $Y_i$ on $\hat X_i$, again using OLS, to get $\hat\beta_0^{2SLS}$ & $\hat\beta_1^{2SLS}$

In practice, the two steps are performed jointly, which also computes the correct standard errors.
The formula for this 2SLS estimator is given by
$\hat\beta_1^{2SLS} = \frac{\sum_{i=1}^n (Z_i - \bar Z)(Y_i - \bar Y)}{\sum_{i=1}^n (Z_i - \bar Z)(X_i - \bar X)} = \frac{\widehat{Cov}(Z_i, Y_i)}{\widehat{Cov}(Z_i, X_i)}$
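
The equivalence between the two-step procedure and the covariance-ratio formula can be checked on simulated data. Everything in this sketch is hypothetical: the true slope is assumed to be 1.5, the instrument is constructed to be valid, and X is made endogenous by letting u depend on the first-stage error v.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / len(a)

random.seed(1)
n = 5000
z = [random.gauss(0, 1) for _ in range(n)]             # instrument
v = [random.gauss(0, 1) for _ in range(n)]             # first-stage error
u = [0.8 * vi + random.gauss(0, 1) for vi in v]        # correlated with v, not with z
x = [1.0 + 0.7 * zi + vi for zi, vi in zip(z, v)]      # endogenous regressor
y = [2.0 + 1.5 * xi + ui for xi, ui in zip(x, u)]      # assumed true slope: 1.5

beta_ols = cov(x, y) / cov(x, x)
beta_iv = cov(z, y) / cov(z, x)                        # covariance-ratio formula

# Step 1: regress X on Z and form predicted values.
pi1 = cov(z, x) / cov(z, z)
pi0 = mean(x) - pi1 * mean(z)
x_hat = [pi0 + pi1 * zi for zi in z]
# Step 2: regress Y on the predicted values.
beta_two_stage = cov(x_hat, y) / cov(x_hat, x_hat)
print(beta_ols, beta_iv, beta_two_stage)
```

The two 2SLS numbers agree up to floating-point error, and both sit near 1.5, while OLS is pulled upward by the correlation between X and u. The standard errors from a manual second stage would be wrong, which is why the slides recommend letting the software do both steps jointly.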


Inference in 2SLS

Given the 2SLS assumptions, the 2SLS estimator is consistent and asymptotically normal (CAN).
Specifically, in large samples, $\hat\beta_1^{2SLS}$ has a sampling distribution that is approximately $N(\beta_1, \sigma^2_{\hat\beta_1^{2SLS}})$ where
$\sigma^2_{\hat\beta_1^{2SLS}} = \frac{1}{n} \frac{var[(Z_i - \mu_Z) u_i]}{[Cov(Z_i, X_i)]^2}$
which can be estimated by estimating the variance and covariance terms.
Statistical inference is again straightforward (provided that you use standard errors that take the two stage procedure into account).


Multiple Regression 2SLS


So what if you have more than one endogenous variable?
For the general case, the equation of interest is
$Y_i = \beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki} + \beta_{k+1} W_{1i} + \dots + \beta_{k+r} W_{ri} + u_i$
The 2SLS estimator is still computed in two stages:
1. Regress each $X_{ji}$ on the instruments $(Z_{1i}, \dots, Z_{mi})$ and the included exogenous regressors $(W_{1i}, \dots, W_{ri})$ using OLS. Compute the predicted values $(\hat X_{1i}, \dots, \hat X_{ki})$ from these k regressions.
2. Regress $Y_i$ on the predicted values $(\hat X_{1i}, \dots, \hat X_{ki})$ and the included exogenous regressors $(W_{1i}, \dots, W_{ri})$ using OLS.

In practice, the two steps are done jointly, in order to compute the correct standard errors.

Multiple Regression 2SLS: Identification

So how many instruments do we need?


We need at least as many instruments as endogenous regressors
(otherwise we can't estimate the parameters).
If m = k the equation is exactly identified.
If m > k the equation is over-identified.
If m < k the equation is under-identified.

If the equation is under-identified, IV/2SLS cannot be used!


However, if we are over-identified, we can test instrument exogeneity.


Problems with Instrument Relevance

Valid instruments must be both exogenous and relevant.


With relevance the issue is not just whether the instrument is relevant, but how relevant.
The degree of relevance is called strength.
An instrument Z is weak if $Cov(Z_i, X_i) \approx 0$.
Instrument weakness is a problem since
$\hat\beta_1^{2SLS} = \frac{\widehat{Cov}(Z_i, Y_i)}{\widehat{Cov}(Z_i, X_i)}$
so if $\widehat{Cov}(Z_i, X_i) \approx 0$ then $\hat\beta_1^{2SLS}$ explodes (and so will $\sigma^2_{\hat\beta_1^{2SLS}}$).


Problems with Instrument Relevance

You can check for weakness (with a single endogenous regressor) by


using the following rule of thumb test
If the F -statistic testing the null hypothesis that the coefficients on the
instruments are all zero in the first stage regression is less than 10, you
have weak instruments.

If your instruments are weak, you should seriously consider using a


different technique (or getting better instruments).
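
As a rough sketch of the rule of thumb, the first-stage F-statistic can be backed out from the first-stage R². This shortcut assumes homoskedastic errors and a first stage that contains only an intercept and the m instruments (no included exogenous regressors), which is a simplification chosen for this example. The R² and sample size values are hypothetical.

```python
def first_stage_f(r2, n, m):
    """Homoskedasticity-only F for H0: all m instrument coefficients are zero,
    in a first stage with an intercept and m instruments only."""
    return (r2 / m) / ((1.0 - r2) / (n - m - 1))

def is_weak(f_stat, threshold=10.0):
    """Rule of thumb: a first-stage F below 10 signals weak instruments."""
    return f_stat < threshold

f_strong = first_stage_f(r2=0.10, n=102, m=1)
f_weak = first_stage_f(r2=0.05, n=102, m=1)
print(f_strong, is_weak(f_strong))
print(f_weak, is_weak(f_weak))
```

In applied work the F-statistic would normally be read off the first-stage regression output, ideally computed with heteroskedasticity-robust standard errors.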


Instrument Exogeneity

If the instruments are not exogenous ($Cov(Z_i, u_i) \neq 0$), then the $\hat X$s will be correlated with u and 2SLS will be inconsistent.
This defeats the purpose of 2SLS since it can't isolate the "good" part of X.

If our system is over-identified, we can test for instrument exogeneity by using the test of over-identifying restrictions (OIR), which works as follows.
Exogeneity means u is uncorrelated with the Zs.
We don't observe $u_i$ but we can estimate it with the 2SLS coefficients
$\hat u_i^{2SLS} = Y_i - \hat\beta_0^{2SLS} - \hat\beta_1^{2SLS} X_{1i} - \dots - \hat\beta_{k+r}^{2SLS} W_{ri}$


Instrument Exogeneity
If we use OLS to estimate the regression coefficients in
$\hat u_i^{2SLS} = \delta_0 + \delta_1 Z_{1i} + \dots + \delta_m Z_{mi} + \delta_{m+1} W_{1i} + \dots + \delta_{m+r} W_{ri} + e_i$
we can then use the F-statistic testing the null hypothesis
$H_0: \delta_1 = \dots = \delta_m = 0$
to construct the OIR test statistic
$J = mF \xrightarrow{d} \chi^2_{m-k}$
where m is the number of instruments and k is the number of endogenous variables.
Since the null hypothesis of this test is that u is uncorrelated with the Zs, rejecting the null implies that the instruments are not exogenous.
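
Once the F-statistic from the residual regression is in hand, the last step of the OIR test is simple arithmetic. The sketch below uses hypothetical F values and hard-codes the 5% chi-squared critical values for 1 to 3 degrees of freedom; `j_test` is an illustrative helper name.

```python
# 5% critical values of the chi-squared distribution by degrees of freedom.
CHI2_CRIT_5PCT = {1: 3.84, 2: 5.99, 3: 7.81}

def j_test(f_stat, m, k):
    """Return (J, degrees of freedom, reject at 5%?) for m instruments and
    k endogenous regressors, given the F-statistic from the residual regression."""
    df = m - k
    if df <= 0:
        raise ValueError("the OIR test requires an over-identified equation (m > k)")
    j = m * f_stat
    return j, df, j > CHI2_CRIT_5PCT[df]

print(j_test(f_stat=1.5, m=3, k=1))   # J = 4.5 with 2 df: do not reject exogeneity
print(j_test(f_stat=3.0, m=2, k=1))   # J = 6.0 with 1 df: reject exogeneity
```

Failing to reject does not prove the instruments are exogenous. It only means the data do not contradict exogeneity of the over-identifying instruments.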