Economics 2810a

Handout for Lecture #7

Lawrence Katz
9/22/14

PROGRAM EVALUATION: METHODS AND APPLICATIONS


Outline
1. The Program Evaluation Problem
--Basic Set-Up, Selection Bias, and Different Treatment Effects: ITT vs. TOT
--Randomized Social Experiments and Eligibility Randomization
--Example: Moving to Opportunity Kling, Liebman, and Katz (2007 EMA), KKL (2001 QJE)
--Instrumental Variables and Local Average Treatment Effects
--Natural or Quasi Experiments
--Differences-in-Differences
2. Estimating the Labor Market Returns to Training Programs
--Ashenfelter-Card (1985): Traditional Nonexperimental Methods -- CETA
--LaLonde (1986): Experimental vs. Nonexperimental Methods -- NSW
--Dehejia-Wahba (1999): Propensity Score Methods -- NSW
3. Regression Discontinuity Methods Imbens and Lemieux (2008)
--Ludwig and Miller (2007): Estimating the Long-Run Impacts of Head Start
4. Estimating the Effects of Class Size on Student Achievement
-- Krueger (1999): Experimental Tennessee STAR
--Angrist and Lavy (1999): Regression Discontinuity in Cross-Section Setting
--Hoxby (2000): Regression Discontinuity in Panel Setting
5. Estimating Teacher Impacts on Student Achievement Kane and Staiger (2008)
6. Estimating the Effects of Attending a Catholic High School
--Altonji et al. (2005): Sorting on Unobservables "Similar" to Sorting on Observables
I. The Program Evaluation Problem: Basic Set Up, Selection Bias and Treatment Effects
An individual can be in either a treated state "1" or an untreated state "0":
Y0i = outcome for i without the treatment or program
Y1i = outcome for i with the treatment or program
di = 1 if i receives the treatment and 0 if not
The actual observed outcome for i is given by:
Yi = diY1i + (1-di)Y0i
The causal effect (gain or loss) from treatment for an individual i is given by
Δi = Y1i - Y0i
Average Treatment Effect (ATE) = expected gain to a randomly selected person from entire population
ATE = E[Y1i - Y0i]
Mean Effect of Treatment on the Treated (TOT) or Selected Average Treatment Effect (SATE):
TOT = E[Y1i - Y0i | di=1] = E[Δi | di=1]
Selection bias problem in estimating TOT: The standard approach (ignoring covariates) is to compare the
mean post-program outcome (earnings) of the treatment group (trainees) and the comparison group (nontrainees):

E[Yi|di=1] - E[Yi|di=0]
= E[Y1i|di=1] - E[Y0i|di=1] + {E[Y0i|di=1] - E[Y0i|di=0]}
= E[Δi|di=1] + {E[Y0i|di=1] - E[Y0i|di=0]}
The first term is the parameter of interest, but the term in brackets is the selection bias term. In the case of
random assignment of treatment it is zero (up to sampling error when population moments are replaced by
sample moments). If assignment is nonrandom, then omitted variables that affect both Y0i and selection
into the program will generate selection bias. Selection bias arises when the non-participants differ from
the participants in the non-participant state.
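The decomposition above can be made concrete in a small simulation. All numbers are illustrative assumptions: an unobserved determinant u raises both the untreated outcome Y0 and the chance of self-selecting into treatment, and the true treatment effect is a constant 2.0.

```python
import random

random.seed(0)
DELTA = 2.0  # assumed constant treatment effect (illustrative)

pop = []
for _ in range(200_000):
    u = random.gauss(0, 1)                 # unobserved determinant of Y0
    y0 = 10 + u
    y1 = y0 + DELTA
    d_self = 1 if u + random.gauss(0, 1) > 0 else 0  # self-selection on u
    d_rand = random.randint(0, 1)                    # random assignment
    pop.append((y0, y1, d_self, d_rand))

def mean(xs):
    return sum(xs) / len(xs)

# Naive comparison under self-selection: E[Y|d=1] - E[Y|d=0]
naive = (mean([y1 for y0, y1, d, _ in pop if d == 1])
         - mean([y0 for y0, y1, d, _ in pop if d == 0]))

# The same comparison under random assignment: selection bias term is zero
exp_est = (mean([y1 for y0, y1, _, r in pop if r == 1])
           - mean([y0 for y0, y1, _, r in pop if r == 0]))

print(f"naive (self-selected): {naive:.2f}")   # DELTA plus selection bias
print(f"randomized:            {exp_est:.2f}") # close to DELTA
```

The gap between the two estimates is exactly the term {E[Y0i|di=1] - E[Y0i|di=0]} from the decomposition.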
Linear Model with Constant Treatment Effect (simplifying assumption):
Y0i = Xiβ + ui where Xi are observed covariates and ui are unobserved outcome determinants.
E[ui|Xi] = 0 by construction.
The actual outcome for i (Yi) is then given by:
Yi = Y0i + δdi     (constant treatment effect assumption)

The goal is to estimate δ. Selection bias is present (ignoring observed covariates for the moment) if
E[Y0i | di=1] ≠ E[Y0i | di=0], i.e., if E[ui | di] ≠ 0.
Selection bias is present with observed covariates if E[ui | di, Xi] ≠ 0.

If ui is correlated with selection into the program even after conditioning on observed covariates then
selection bias will bias estimates of program effects.
More General Model with Covariates, with outcomes a function of observables (X) and unobservables
(u1, u0):
Y1i = g1(Xi) + u1i
Y0i = g0(Xi) + u0i
where E(u1i) = E(u0i) = 0
TOT = E[Y1i - Y0i | Xi, di=1] = E(Δi | Xi, di=1) = g1(Xi) - g0(Xi) + E[u1i - u0i | Xi, di=1]
The TOT combines both the structure (the g0 and g1 functions) and the means of the error terms for the
treated in this more general set-up. Experimental (random assignment) approaches allow the
identification of the TOT but without further assumptions do not necessarily allow one to identify the
underlying structural parameters of g1 and g0.
Approaches to Estimating the Average Treatment Effect on the Treated (TOT):
1. Randomized Social Experiment: Random assignment of treatment among applicants to programs
(those that would have participated). A randomized social experiment generates an experimental control
group consisting of those persons who would have participated but were randomly denied access to the
program or treatment -- this requires no randomization bias, so that randomization does not change the
pool of applicants or behavior per se. The control group provides an estimate of E[Y0|d=1]. Compare
sample means of treatment and control group in experiment.
Under ideal conditions, social experiments recover: F(Y0 | d=1,X) from the distribution of outcomes of
the control group and F(Y1 | d=1, X) from the outcomes of the treatment group if randomization
administered at application stage, no attrition, and no randomization bias. The experiment supplements
missing data by providing an estimate of E[Y0|d=1,X] from the sample mean for the control group and
E[Y1|d=1,X] from the mean of the treatment group (which is also available in observational studies).
Thus the TOT can be estimated, but one can't recover the overall distribution of gains (treatment effects)
F(Δ | d=1, X) without stronger additional assumptions.
How does randomization identify the TOT? Drop i subscripts for ease of presentation.
Let d* = 1 if person applies to the program (would participate unless randomized out)
R = 1 if randomized in, R = 0 if randomized out

Assumption of No randomization bias: Let Y1* and Y0* be the outcomes observed under a regime of
randomization: Absence of randomization bias for the mean gain in the program implies:
E[Y1 | d=1,X] = E[Y1* | d=1,X]
E[Y0 | d=1,X] = E[Y0* | d=1,X]
Randomization operates conditional on d*=1 which is appropriate since we are trying to get the mean
treatment effect for those who would participate in the program.
E[Y | d*=1, R=1, X] = E[Y1 | d=1,X] = g1(X) + E[u1| d=1,X]
E[Y | d*=1, R=0, X] = E[Y0 | d=1,X] = g0(X) + E[u0| d=1,X]
Thus the difference of the mean of the treatment group and control group yields the TOT:
E[Y | d*=1, R=1, X] - E[Y | d*=1, R=0, X] = E[Y1 - Y0 | d=1, X] = E[Δ | d=1, X]
2. Eligibility Randomization: Randomization of eligibility to a program is sometimes a less disruptive
approach to implementing a social experiment. Under this approach, eligibility is randomly assigned (say
across hospitals or training centers), but then individuals and program operators can freely choose to
participate.
Eligibility randomization allows one to directly estimate the mean effect of eligibility for the program on
the population included in the experiment: the effect of eligibility for the program on outcomes is known
as the Intent to Treat (ITT) effect.
Consider a population of persons normally eligible for a program. Let e=1 if the person is kept eligible
after randomization and e=0 if the person loses eligibility. Let d* indicate "willingness to participate":
d*=1 if the person would participate when eligible. Assume eligibility e is randomly assigned. Ignore other covariates.
Actual participation d = ed*, only participate if eligible and willing to participate.
Intent-to-Treat Effect (ITT) = E[Y | e=1] - E[Y | e=0]
= difference in mean outcomes for eligibles and ineligibles if eligibility is randomly assigned

Randomization of eligibility directly allows the estimation of the ITT. But can one estimate the TOT
from an eligibility randomization experiment? Yes, but one needs additional assumptions:
The TOT can be estimated from an eligibility randomization experiment under the assumptions that (1)
treatment group (eligibility) assignment is truly random; (2) the effect of treatment group assignment on
outcomes operates only through participating in the program (e.g., using a housing voucher or getting
training) with no direct effect of eligibility per se; and (3) control group members (the ineligibles) cannot
participate in the program.
Under these assumptions, the difference in average outcomes of eligibles and ineligibles divided by
fraction of eligibles who participate provides an unbiased estimate of the TOT:
TOT = E[Y1 - Y0|d=1] = (E[Y|e=1] - E[Y|e=0])/P(d=1|e=1) = ITT/P(d=1|e=1)
where P( ) is the probability function, so that P(d=1 | e=1) is the program participation rate.
Proof:
(*) E[Y|e=1] = E[Y1|d=1,e=1]P(d=1|e=1) + E[Y0|d=0,e=1]P(d=0|e=1)
= E[Y1|d*=1,e=1]P(d*=1|e=1) + E[Y0|d*=0,e=1]P(d*=0|e=1)
(**) E[Y|e=0] = E[Y0|d*=1,e=0]P(d*=1|e=0) + E[Y0|d*=0,e=0]P(d*=0|e=0)
But P(d=1|e=1) = P(d*=1|e=1) = P(d*=1|e=0) and P(d=0|e=1) = P(d*=0|e=1) = P(d*=0|e=0);
E[Y1|d=1] = E[Y1|d=1,e=1] from eligibility randomization;
and E[Y0|d=0,e=1] = E[Y0|d*=0,e=1] = E[Y0|d*=0,e=0] since there are no direct effects of eligibility.
So the result follows from subtracting (**) from (*) and then substituting.
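The TOT = ITT/P(d=1|e=1) result can be checked numerically. A minimal simulation sketch, assuming (purely for illustration) a 40% willingness-to-participate rate and an effect of 3.0 on actual participants:

```python
import random

random.seed(1)
DELTA = 3.0  # assumed effect of actually receiving treatment (illustrative)

itt_treat, itt_ctrl, take_up = [], [], []
for _ in range(200_000):
    willing = random.random() < 0.4        # d* = 1: would participate if eligible
    e = random.randint(0, 1)               # eligibility randomly assigned
    y0 = 50 + random.gauss(0, 5) + (4 if willing else 0)  # compliers differ in levels
    d = e * (1 if willing else 0)          # actual participation d = e * d*
    y = y0 + DELTA * d
    if e == 1:
        itt_treat.append(y)
        take_up.append(d)
    else:
        itt_ctrl.append(y)

mean = lambda xs: sum(xs) / len(xs)
itt = mean(itt_treat) - mean(itt_ctrl)     # effect of eligibility
p = mean(take_up)                          # P(d=1 | e=1)
tot = itt / p                              # scaled-up ITT recovers the TOT
print(f"ITT = {itt:.2f}, take-up = {p:.2f}, TOT = {tot:.2f}")
```

The ITT is diluted toward zero by the 60% of eligibles who never participate; dividing by the take-up rate undoes the dilution.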
Instrumental Variable interpretation of eligibility randomization experiment: Let Z be an
instrument that affects participation d. Z is a legitimate instrument if it is uncorrelated with potential outcomes,
so that Z affects Y only by affecting d: E[Y0|Z=z] = E[Y0] and E[Y1|Z=z] = E[Y1], and participation d is a
non-trivial function of Z: E[d|Z=z] varies with z.
If there exists a set of values z0 of Z that occurs with positive probability and under which
Pr[di=1 | Zi ∈ z0] = 0, then one can estimate the TOT by defining e=1 for Z not in z0 and e=0 for Z in z0.
What can one estimate if have a legitimate instrument that affects probability of participation and
can be excluded from outcomes equation? The answer is one can estimate a Local Average Treatment
Effect (LATE) equal to the average treatment effect on those that can be induced to change their behavior
by change in the instrument.
Zi is a random variable where P(w) = E[di | Zi = w] is a nontrivial function of w.
Let treatment for i depend on the value of the instrument Z: di = di(Zi)
LATE(z,w) = E[Y1i - Y0i | di(z) ≠ di(w)]
= expected treatment effect for individuals who change treatment status as the instrument changes value
from w to z.

The LATE can be identified if Z is a legitimate instrument (can be excluded from the Y equations) and if
a monotonicity condition holds: di(z) ≥ di(w) for all i or di(z) ≤ di(w) for all i. Thus we are assuming that
there are no "defiers" in terms of Angrist-Imbens-Rubin (1996).
Assume di(z) ≥ di(w) for all i:
(***) E[Yi|Zi=z] - E[Yi|Zi=w] = (P(z) - P(w)) · E[Y1i - Y0i | di(z) - di(w) = 1]
The LATE is consistently estimated by the ratio of the difference in sample mean outcomes for those with
values of z and w for the instrument over the difference in fraction who are treated.
Proof: The monotonicity condition allows the expected value of Y given the instrument to be decomposed
into 3 groups (never takers (1-P(z)), compliers (P(z)-P(w)), and always-takers (P(w))). The LATE is the
average treatment effect for the compliers (those that change treatment status with different values of the
instrument).
E[Yi|Zi=z] = {P(z)-P(w)}·E[Y1i|di(z)-di(w)=1] + P(w)·E[Y1i|di(z)=di(w)=1]
+ {1-P(z)}·E[Y0i|di(z)=di(w)=0]
E[Yi|Zi=w] = {P(z)-P(w)}·E[Y0i|di(z)-di(w)=1] + P(w)·E[Y1i|di(z)=di(w)=1]
+ {1-P(z)}·E[Y0i|di(z)=di(w)=0]
Subtracting the second from the first equation yields equation (***).
Case of binary instrument Z:
LATE = E[Y1i - Y0i | di(1) - di(0) = 1] = (E[Yi|Zi=1] - E[Yi|Zi=0])/(P(1)-P(0))
One can't estimate the LATE if there exists a fourth category of "defiers" (those with di(z)=0 but di(w)=1), which arises with a failure of the monotonicity assumption -- see Angrist, Imbens, and Rubin (JASA, 1996).
Let Yi = Yi(Zi , di )
Exclusion restriction for instrument Z: Yi(1, di) = Yi(0, di) for d = 0, 1. The instrument Z only affects Y
through d.
-------------------------------------------------------------------------------------------------------------------------
Causal Effects of Z on Y for Population Units Classified by di(0) and di(1)

                          di(1) = 0                               di(1) = 1
di(0) = 0    Yi(1,0) - Yi(0,0) = 0                   Yi(1,1) - Yi(0,0) = Yi(1) - Yi(0)
             Never-Taker                             Complier
di(0) = 1    Yi(1,0) - Yi(0,1) = -(Yi(1) - Yi(0))    Yi(1,1) - Yi(0,1) = 0
             Defier                                  Always-Taker
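The binary-instrument Wald formula can be verified in a simulation with the three allowed types (the shares and effects below are invented for illustration; monotonicity holds because there are no defiers):

```python
import random

random.seed(2)
# Assumed population: 20% always-takers, 30% compliers, 50% never-takers,
# with different treatment effects by type (illustrative numbers only).
effects = {"always": 1.0, "complier": 2.0, "never": 0.5}

y_z1, y_z0, d_z1, d_z0 = [], [], [], []
for _ in range(200_000):
    u = random.random()
    typ = "always" if u < 0.2 else ("complier" if u < 0.5 else "never")
    z = random.randint(0, 1)               # binary instrument, randomly assigned
    d = 1 if typ == "always" or (typ == "complier" and z == 1) else 0
    y = 10 + random.gauss(0, 1) + effects[typ] * d
    (y_z1 if z == 1 else y_z0).append(y)
    (d_z1 if z == 1 else d_z0).append(d)

mean = lambda xs: sum(xs) / len(xs)
# LATE = (E[Y|Z=1] - E[Y|Z=0]) / (P(1) - P(0))
wald = (mean(y_z1) - mean(y_z0)) / (mean(d_z1) - mean(d_z0))
print(f"Wald estimate = {wald:.2f}")  # recovers the complier effect
```

The Wald ratio recovers the complier effect (2.0 here), not the always-taker or never-taker effects, since always-takers and never-takers contribute equally to both arms and difference out.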

MTO Example (Eligibility Randomization Social Experiment):


C = 1 = complier = d*=1; C = 0 = non-complier = d*=0
ITT = E[Y| Z=1] - E[Y|Z=0]
TOT = E[Y| C=1,Z=1] - E[Y|C=1,Z=0] = ITT/P[d=1| Z=1] = ITT/P[C=1]
The TOT is the estimated difference in outcomes between those who actually use the program (MTO voucher) and
those in the Control group who would have used the program (MTO voucher) if it had been offered to
them.
To assess the magnitude of the TOT effect in relative as well as absolute terms, it is useful to have a
benchmark level of the outcome in the absence of treatment for comparison. One can use the
mean outcome for treated compliers and the TOT difference to impute the Control Complier Mean
outcome (CCM):
CCM = E[Y|C=1,Z=0]
= E[Y|C=1,Z=1] - {E[Y|C=1,Z=1] - E[Y|C=1,Z=0]}
= E[Y|C=1,Z=1] - TOT
Although E[Y|C=1,Z=0] is not directly observable, E[Y|C=1,Z=1] and TOT can be estimated.
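As a sketch with invented numbers, the CCM imputation is simple arithmetic once the ITT, the take-up rate, and the treated-complier mean have been estimated:

```python
# Illustrative estimates (not actual MTO numbers):
mean_treated_compliers = 12.0   # estimate of E[Y | C=1, Z=1]
itt = 1.5                       # E[Y | Z=1] - E[Y | Z=0]
take_up = 0.5                   # P(d=1 | Z=1) = P[C=1]

tot = itt / take_up                   # TOT = ITT / P[C=1]
ccm = mean_treated_compliers - tot    # CCM = E[Y|C=1,Z=1] - TOT
print(f"TOT = {tot}, CCM = {ccm}")    # TOT = 3.0, CCM = 9.0
```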

II. Other Issues In Interpreting Program Evaluation Results:


1. Displacement Effects
2. Spillover Effects
3. Marginal Effects
III. Alternative Nonexperimental Estimators: Linear Models with Access to Pre- and Post-Program
Information for a Training Program -- Latent Index Function Approach
Yit = Xitβ + uit
We shall assume a training effect δit that is invariant across individuals, but not time, so δit = δt.

Observed earnings for i at t can be written as:
(1) Yit = Xitβ + δtdi + uit.
We assume E[uit|Xit] = 0 for all i and t.


Nonrandom assignment means selection bias can arise because of dependence between di and uit:
(2) E[uit | di, Xit] ≠ 0.

The decision-making rule for program assignment can be described in terms of a latent index function INi
that depends on both observed (Zi) and unobserved (vi) covariates:
(3) INi = Ziγ + vi ; where di = 1 iff INi > 0 and di = 0 otherwise.

Alternative nonexperimental estimators try to undo the dependence between uit and di by making
alternative assumptions about the forms of equations (1), (2), and (3).
Dependence between uit and di can arise for two reasons: (1) dependence between Zi and uit (selection on
observables); and (2) dependence between vi and uit (selection on unobservables). Dependence on
observables is easily solved by controlling for those observables; selection on unobservables is a more
difficult problem.
B. Selection on observables (Zi): E(uit|di,Xi) ≠ 0 and E(uit|di,Xi,Zi) ≠ 0, but
E(uit|di,Xi,Zi) = E(uit|Xi,Zi).
In this case, controlling for observed selection variables solves the selection bias problem. The only issue is
getting the right functional form for the control function E(uit|Xi,Zi); one then inserts this into equation (1)
and estimates by regression methods.
1. Propensity score and blocking approach -- Dehejia and Wahba (1999)
2. Exact match comparison approach if can discretize observables -- Card and Sullivan (1988)
C. Selection on unobservables (vi): E(uit|di,Xi) ≠ 0 and E(uit|di,Xi,Zi) ≠ E(uit|Xi,Zi). In this case, one
needs assumptions about the distributions of vi, uit, and Zi to get an estimate of δt using Control Function
Estimators (Heckits, control for propensity score) or one needs an instrument (a variable in Z not included in
X). The availability of longitudinal data (pre- and post-program) allows one to try alternatives such as (1)
Fixed Effects estimates assuming selection based on permanent earnings components; (2) Random-growth
estimator -- allow individual-specific trends to affect selection; or (3) Transitory shocks -- see
Ashenfelter and Card (85) and Heckman and Hotz (89). Benefits of pre-program data on the comparison
group -- Heckman et al. (1998, EMA).
"Natural or Quasi Experiments" (see Angrist-Krueger 1999; Meyer 1995)- Natural Experiments result
when exogenous variation in independent variables of interest is created by (1) sharp exogenous shocks
to markets (baby boom, Black Death, Mariel boatlift); (2) institutional quirks (e.g. draft lotteries;
Maimonides rule for maximum class size in Israel); or (3) exogenous policy changes that affect some
groups but not other groups (e.g. changes in maximum UI that leave the replacement rate unchanged for
workers not at the maximum in one state but not another).
Basic approach: A comparison of changes for treatment and comparison groups (differences-in-differences)
or a further difference relative to placebo treatment and comparison groups
(differences-in-differences-in-differences). All this can be done in a simple components of variance
scheme (time effects, location effects, treatment group effects, placebo group effects, interaction terms)
or by using an IV -- instrumental variables -- strategy in which one instruments for the treatment dummy
variable with the
natural experiment indicator variables. IV Estimates can be interpreted as natural experiments:
Legitimate instruments generate a natural experiment that assigns treatment in a manner independent of
unobserved covariates:
--Vietnam Draft Lottery and Effects of Military service on earnings (Angrist, 1990)
--Date of Birth, Compulsory Schooling Laws, and Returns to Education (Angrist and Krueger, 1991)
--Mariel boatlift (Card, 1990) and impact of mass immigration on local labor markets
--prison overcrowding legislation to estimate impacts of incarceration on crime (Levitt, 1996)
Diffs-in-Diffs and DDD examples: Mariel boatlift (high and low skill workers and low skill immigration):
Treatment city Miami; Placebo city Atlanta (p)
Experimentals: Low Education workers
Controls: High Education workers
                 Before    After
Experimentals     E_b       E_a
Controls          C_b       C_a

Diff-in-diff = (E_a - E_b) - (C_a - C_b)
DDD = [(E_a - E_b) - (C_a - C_b)] - [(E_a^p - E_b^p) - (C_a^p - C_b^p)]
(the superscript p denotes the placebo city)
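With the cell means in hand, the DD and DDD estimates are just arithmetic on means. A sketch with invented cell means for log wages (the numbers below are illustrative, not from Card's Mariel study):

```python
# Hypothetical cell means: (city, skill group, period) -> mean log wage
means = {
    ("miami",   "low_educ",  "before"): 1.80, ("miami",   "low_educ",  "after"): 1.72,
    ("miami",   "high_educ", "before"): 2.50, ("miami",   "high_educ", "after"): 2.48,
    ("placebo", "low_educ",  "before"): 1.78, ("placebo", "low_educ",  "after"): 1.75,
    ("placebo", "high_educ", "before"): 2.52, ("placebo", "high_educ", "after"): 2.49,
}

def dd(city):
    """(E_after - E_before) - (C_after - C_before) within one city."""
    d_exp = means[(city, "low_educ", "after")] - means[(city, "low_educ", "before")]
    d_ctl = means[(city, "high_educ", "after")] - means[(city, "high_educ", "before")]
    return d_exp - d_ctl

ddd = dd("miami") - dd("placebo")   # further difference against the placebo city
print(f"DD (Miami) = {dd('miami'):+.3f}, DDD = {ddd:+.3f}")
```

The DDD nets out any skill-group differential trend common to both cities; here the placebo city shows no differential trend, so DD and DDD coincide.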
Regression approach to DDD with covariates (X), where S = skill group, A = after, and M = Miami:
Yit = Xitβ + γ1Sit + γ2Mit + γ3At + γ4SitMit + γ5SitAt + γ6AtMit + δSitAtMit + εit
The coefficient δ on the triple interaction is the DDD estimate.

Some Inference issues in DD:


(1) Grouped errors if use micro data (Moulton J. of Econometrics 1986)
(2) Serial correlation (Bertrand, Duflo, Mullainathan QJE 2004)
(3) Are the identification assumptions plausible? Pre-existing trends?
How does one interpret natural experiments? They provide estimates of Local Average Treatment
Effects -- the impact on those whose treatment status is affected by the natural experiment (marginal impacts).
Key Issues with Instrumental Variables Models:
(1) Bad instruments / instrument legitimacy
(2) Weak instruments -- Bound, Jaeger, and Baker (JASA 1995)

IV. Estimating the Labor Market Impacts of Training Programs


Ashenfelter and Card (1985 RESTAT): They compare the earnings histories of 1976 adult male
enrollees in Comprehensive Employment and Training Act (CETA) training programs to the earnings
histories of non-experimental comparison groups drawn from the March 1976 CPS (likely to be
almost all non-enrollees) matched to Social Security earnings data. They choose CPS respondents
who participated in the labor force in March 1976 and who have the same age distribution as the
CETA trainees. Their goal is to estimate the impact of CETA training participation on earnings
through regression adjustments to control for other differences between the trainees and the
comparison group.
Table 1: The trainees are clearly not a random sample of adult males with earnings in 1976: trainees
are less educated and have lower earnings than the comparison group, unlike what one would have in
a randomized experiment comparing the treatment and control groups.
Table 1 also provides evidence of "Ashenfelter's Dip": trainees experience an earnings decline in
1975, the year before they enter the program.
(1) Simple Differences-in-Differences Estimates:
Suppose earnings yit for individual i in year t are given by
yit = αi + dt + δDit + εit
where αi is a permanent component (individual fixed effect), dt is an economy-wide component (time
fixed effect), Dit is a dummy variable for participation in training in period τ that takes on a value of 1
for trainees after τ and 0 otherwise, and εit is a serially uncorrelated error term (transitory component of
earnings).
Assume selection into CETA training in τ is governed by the permanent earnings component:
Diτ = 1 for t > τ iff αi < ȳ
where ȳ is a constant based on potential trainees' discount rates and tastes for training.
In this case, a simple differences-in-differences estimate comparing the change in earnings for
trainees between some pre-training period (τ-j) and the post-training period (τ+1) to the change in
earnings over the same period for the comparison group provides an unbiased estimate of the training
effect:
E[yiτ+1 - yiτ-j | Diτ+1 = 1] = (dτ+1 - dτ-j) + δ
E[yiτ+1 - yiτ-j | Comparison Group] = (dτ+1 - dτ-j) + πδ
where π is the fraction of the comparison group that participates in CETA training (contamination
effect). If π is trivially small (approximately 0), then one gets an unbiased estimate of δ through the
differences-in-differences estimator:
E[yiτ+1 - yiτ-j | Diτ+1 = 1] - E[yiτ+1 - yiτ-j | Comparison Group] = δ
If multiple years of data are available, then there are multiple difference-in-differences estimates, which
should all be equal up to sampling error. A test for the correct specification is a test of equality of the
alternative d-d estimates.
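Under this fixed-effect selection rule, the DD estimator removes the selection bias even though trainees are drawn from the bottom of the permanent-earnings distribution. A minimal simulation sketch (all parameter values are invented):

```python
import random

random.seed(3)
DELTA = 1.5   # assumed true training effect (illustrative)
TAU = 5       # training period tau

people = []
for _ in range(20_000):
    alpha = random.gauss(0, 2)       # permanent earnings component alpha_i
    trainee = alpha < -1.0           # selection on the fixed effect only
    people.append((alpha, trainee))

d = {t: 0.1 * t for t in range(10)}  # common time effects d_t

def earnings(alpha, trainee, t):
    treated = trainee and t > TAU    # serially uncorrelated transitory shock below
    return alpha + d[t] + DELTA * treated + random.gauss(0, 1)

def mean(xs):
    return sum(xs) / len(xs)

# Change from a pre-period (TAU - 3) to the post-period (TAU + 1), by group:
def change(group):
    return mean([earnings(a, tr, TAU + 1) - earnings(a, tr, TAU - 3)
                 for a, tr in people if tr == group])

dd_est = change(True) - change(False)
print(f"DD estimate = {dd_est:.2f}")  # close to DELTA despite selection on alpha
```

Differencing removes αi for every person, so the low-αi composition of the trainee group does not bias the estimate; this fails once selection also depends on transitory shocks (Ashenfelter's dip), which is the point of Table 2.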
Table 2: The choice of initial year greatly affects the A&C estimates. Using 1975 as the base year yields a
large positive effect (from Ashenfelter's dip and mean reversion); earlier base years yield negative
effects (from the fact that trainees are individuals with flatter age-earnings profiles than the
comparison group).
Furthermore, Ashenfelter's dip strongly indicates that transitory earnings are likely to play a key role
in training program entry not just the permanent (average) earnings component. Shocks to earnings
also appear to be serially correlated. Thus, one needs a more sophisticated model of earnings
dynamics and program selection.
(2) Components of Variance Estimates: Assume that selection is based on an individual-specific fixed
effect and an individual-year-specific disturbance term which displays first-order autoregressive serial
correlation.
(i) yit = αi + dt + δDit + εit
where εit = ρεit-1 + eit and vi is a random variable independent of the earnings components.
Training occurs iff:
yiτ-k + vi < ȳ
That is, training occurs iff
(ii) zi = (αi - ᾱ) + εiτ-k + vi < ȳ - ᾱ - dτ-k ≡ z̄ ; where ᾱ is the mean of αi.

Use the method of moments: Predict the means and covariances of the earnings of the comparison group
and the trainees using (i) and (ii). Estimate the means and covariances using the comparison group
and trainee group sample moments. Match the estimated sample moments to the predicted
moments to get parameter estimates. Use the parameter estimates to predict trainee earnings if they had
not received training. The difference between the predicted trainee earnings without training
and the actual earnings of trainees is the estimate of the effect of training, δ.
For the comparison group, the means and covariances are the unconditional means and covariances
from (i):
E(yit) = ᾱ + dt
var(yit) = σα² + σε²
cov(yit, yis) = σα² + ρ^|t-s| σε²
For the trainees, if we assume that αi, εit, and vi are jointly normally distributed, then the conditional
mean is:
E[yit | zi < z̄] = E(yit) + δDit + [cov(yit, zi)/var(zi)]·E[zi | zi < z̄]
= E(yit) + δDit + [σα² + ρ^|t-(τ-k)| σε²]·λ*
where λ* = E[zi | zi < z̄]/var(zi) < 0 since the truncated mean E[zi | zi < z̄] is negative (zi has mean zero).

The mean of trainee earnings differs from the mean of comparison earnings by a training effect plus
the sum of two components (a permanent component and a geometrically declining transitory
component centered symmetrically around the selection period), each proportional to λ*. The model
imposes the restriction that in the pre- and post-training periods the earnings of the trainees and
comparisons diverge in a systematic pattern that depends on only one free parameter, λ*. The
restrictions of the model are rejected since they fail to capture a systematically weaker trend in
trainees' earnings than in comparison group earnings. A&C supplement the model with individual-specific
earnings growth rate trends (gi):
yit = αi + dt + git + δDit + εit
This model does better but there is still much instability in the estimates.
LaLonde (1986 AER): A classic study in which the estimates of the impact of a training program
from a randomized social experiment, the National Supported Work (NSW) demonstration, are used as
a benchmark (the "true" estimates) to compare to alternative non-experimental estimates using
alternative comparison groups and econometric specifications.

Experimental Estimates: Difference in mean earnings of NSW treatment group and NSW controls
(applicants randomized out of access to the program).
Non-experimental estimates: LaLonde throws away the experimental controls and uses comparison
groups with longitudinal earnings histories from the PSID and CPS-SSA matched earnings samples.
He estimates differences-in-differences models, more detailed regression models, and Heckit (control function)
estimates.
Key Insights: Experimental treatments and controls look identical (balanced) up to sampling error.
None of the standard non-experimental approaches provide reliable estimates; different estimators
passing standard specification tests give widely varying estimates.
Dehejia and Wahba (1999 JASA): This paper re-examines the use of non-experimental estimators to
estimate the treatment impact on earnings of the NSW demonstration using propensity score
methods.
Propensity Score Method: A semi-parametric generalization of the Heckman selection correction
model. Its advantages are a more general first stage equation and a better diagnostic for assessing the
comparability of the treatment and comparison groups (how balanced are the covariates of treatment
and comparison group members with similar propensity scores). It is an approach to doing "selection
on observables." Thus, the propensity score method is most useful when the econometrician
observes all of the variables used in selection but does not know the exact form of the "rule" that
leads to selection into treatment.

Rosenbaum and Rubin (1983): If treatment and potential outcomes are independent conditional on
the observed covariates X, then they are independent conditional on the conditional probability of
receiving treatment given the covariates.
Let Yi = diYi1 + (1-di)Yi0 where di = 1 if treated, Yi1 = outcome for i with treatment, and Yi0 is the
outcome for i without treatment.

Selection on observables Xi: {Yi1, Yi0} ⊥ di | Xi, or
E[Yij | Xi, di=1] = E[Yij | Xi, di=0] = E[Yij | Xi] for j = 0, 1.
Conditional on the observables there is no systematic pre-treatment difference between the groups
assigned to treatment and control. This allows one to estimate the TOT by:
TOT = E{E(Yi|Xi, di=1) - E(Yi|Xi, di=0) | di=1},
where the outer expectation is over the distribution of Xi|di=1, the distribution of pre-treatment
variables in the treated population. The exact matching estimator approach is to match treatment
and control observations by X, get the difference in mean outcomes (the treatment effect at X=x) at
each value of X, and then get the TOT by averaging these estimated treatment effects over the
distribution of X. When X is high dimensional and takes on many values, this exact matching
approach may be impractical. This is where the propensity score theorem is helpful:
p(Xi) = Pr(di=1 | Xi) = E(di | Xi) = probability of i being assigned to treatment = propensity score
Propensity score theorem: {Yi1, Yi0} ⊥ di | Xi implies {Yi1, Yi0} ⊥ di | p(Xi).

Thus, in this case of selection on observables, adjusting for the propensity score removes the biases
associated with differences in covariates. Why is it sufficient to condition just on the propensity
score? The reason is that under the Rosenbaum-Rubin assumptions for selection on observables
(covariates X), the covariates are independent of assignment to treatment conditional on the
propensity score. In other words, the distribution of covariates should be the same across treatment
and comparison groups for observations with the same propensity score. This implication of the
assumptions for the propensity score approach to be appropriate provides a diagnostic: one can
group observations in strata based on the estimated propensity score and check whether the
covariates are balanced across the treatment and comparison groups in each stratum.
Implementing the Propensity Score Approach:
(1) Start with a parsimonious logit or probit selection equation for treatment and estimate the
propensity to select into the treatment group.
(2) Sort the data according to the estimated propensity score from lowest to highest.
(3) Divide the observations into blocks (or strata) of equal propensity score range
(0-0.1, 0.1-0.2, 0.2-0.3, etc.).
(4) Do t-tests for the difference in means for all covariates across treatment and control
observations in each block.
(5a) If all covariates are balanced (no significant differences in means), stop. Use the estimated
propensity scores.
(5b) If a particular block has one or more unbalanced covariates (but there is balance elsewhere),
divide the block into finer blocks and re-evaluate.
(5c) If there are still problems with unbalanced covariates, modify the initial logit or probit equation
to add higher-order terms in the problem covariates and/or further interactions. Re-evaluate.


There are a number of different semi-parametric ways to use the propensity score to estimate the
TOT given that you have achieved "balance" in the covariates. The multiplicity of methods arises
when the true functional form of the second stage equation is unknown:
(1) Control Function: Use the first stage equation to form the Heckman selection correction
term and add it to the second stage regression.
(2) Stratify: Divide the data into blocks based on the propensity score. Run the second stage
equation within each block (this might just be the mean difference in outcomes for
treatment and comparison observations in each block). Calculate the weighted mean of the
within-block estimates to get the TOT (weight by the number of treatment observations in
each block).
(3) Match: Match each treatment observation with a comparison observation based on
similar propensity scores (find the closest match). Treat the data like panel data (like twins
data) and run within-match (match fixed effects) models of the treatment effect.
(4) Weight: Weight each observation by its propensity score and estimate the second stage
equation (Hirano, Imbens, and Ridder 2003).
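Method (2), stratification, can be sketched in a short simulation. For simplicity the true propensity score stands in for the logit first stage (an illustrative assumption), and selection is on a single observable x:

```python
import random

random.seed(4)
DELTA = 1.0   # assumed true treatment effect (illustrative)

data = []
for _ in range(100_000):
    x = random.random()
    p = 0.2 + 0.6 * x                  # true propensity score (stands in for a logit fit)
    d = 1 if random.random() < p else 0
    y = 2.0 * x + DELTA * d + random.gauss(0, 1)  # x shifts both Y0 and selection
    data.append((p, d, y))

def mean(xs):
    return sum(xs) / len(xs)

# Blocks of equal propensity-score range: [0, 0.1), [0.1, 0.2), ...
num = den = 0.0
for b in range(10):
    lo, hi = b / 10, (b + 1) / 10
    t = [y for p, d, y in data if lo <= p < hi and d == 1]
    c = [y for p, d, y in data if lo <= p < hi and d == 0]
    if t and c:                               # skip blocks without overlap
        num += (mean(t) - mean(c)) * len(t)   # weight by treated count -> TOT
        den += len(t)

tot_strat = num / den
naive = (mean([y for _, d, y in data if d == 1])
         - mean([y for _, d, y in data if d == 0]))
print(f"naive = {naive:.2f}, stratified TOT = {tot_strat:.2f}")
```

The naive treatment-comparison difference is contaminated by the higher x of the treated; averaging within-block differences, weighted by treated counts, essentially removes that bias.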
Dehejia and Wahba (1999) illustrate these methods and show that, once one achieves "balance" of
covariates within blocks, these propensity score methods come quite close to the experimental
estimates for the NSW demonstration, in contrast to the lack of reliability of the traditional
econometric non-experimental estimators examined by LaLonde (1986). See table below.

