Panel data methods for microeconometrics using Stata

A. Colin Cameron Univ. of California - Davis


Prepared for West Coast Stata Users Group Meeting. Based on A. Colin Cameron and Pravin K. Trivedi, Microeconometrics using Stata, Stata Press, forthcoming.

October 25, 2007

A. Colin Cameron, Univ. of California - Davis. Panel methods for Stata, October 25, 2007.

1. Introduction
Panel data are repeated measures on individuals (i) over time (t).
Regress $y_{it}$ on $x_{it}$ for i = 1, ..., N and t = 1, ..., T.
Complications compared to cross-section data:

1. Inference: correct (inflate) standard errors. This is because each additional year of data is not independent of previous years.
2. Modelling: richer models and estimation methods are possible with repeated measures. Fixed effects and dynamic models are examples.
3. Methodology: different areas of applied statistics may apply different methods to the same panel data set.


This talk: overview of the panel data methods and xt commands for Stata 10 most commonly used by microeconometricians.
Three specializations to general panel methods:

1. Short panel: data on many individual units and few time periods.
2. Then data viewed as clustered on the individual unit. Many panel methods also apply to clustered data such as cross-section individual-level surveys clustered at the village level.
3. Causation from observational data: use repeated measures to estimate key marginal effects that are causative rather than mere correlation.
   - Fixed effects: assume time-invariant individual-specific effects.
   - IV: use data from other periods as instruments.
   - Dynamic models: regressors include lagged dependent variables.


Outline
1. Introduction
2. Linear models overview
3. Example: wages
4. Standard linear panel estimators
5. Linear panel IV estimators
6. Linear dynamic models
7. Long panels
8. Random coefficient models
9. Clustered data
10. Nonlinear panel models overview
11. Nonlinear panel models estimators
12. Conclusions

2.1 Some basic considerations


1. Regular time intervals assumed.
2. Unbalanced panel okay (xt commands handle unbalanced data). [Should then rule out selection/attrition bias.]
3. Short panel assumed, with T small and $N \to \infty$. [Versus long panels, with $T \to \infty$ and N small or $N \to \infty$.]
4. Errors are correlated. [For short panel: correlated over t for given i, but not over i.]
5. Parameters may vary over individuals or time.
   - Intercept: individual-specific effects model (fixed or random effects).
   - Slopes: pooling and random coefficients models.
6. Regressors: time-invariant, individual-invariant, or vary over both.
7. Prediction: ignored. [Not always possible even if marginal effects computed.]
8. Dynamic models: possible. [Usually static models are estimated.]

2.2 Basic linear panel models


Pooled model (or population-averaged)

$$ y_{it} = \alpha + x_{it}'\beta + u_{it}. \tag{1} $$

Two-way effects model allows intercept to vary over i and t

$$ y_{it} = \alpha_i + \gamma_t + x_{it}'\beta + \varepsilon_{it}. \tag{2} $$

Individual-specific effects model

$$ y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}, \tag{3} $$

for short panels where time effects are included as dummies in $x_{it}$.

Random coefficients model allows slopes to vary over i

$$ y_{it} = \alpha_i + x_{it}'\beta_i + \varepsilon_{it}. \tag{4} $$


2.2 Fixed effects versus random effects

Individual-specific effects model: $y_{it} = x_{it}'\beta + (\alpha_i + \varepsilon_{it})$.

Fixed effects (FE):

- $\alpha_i$ is possibly correlated with $x_{it}$
- regressor $x_{it}$ can be endogenous (though only wrt a time-invariant component of the error)
- can consistently estimate $\beta$ for time-varying $x_{it}$ (mean-differencing or first-differencing eliminates $\alpha_i$)
- cannot consistently estimate $\alpha_i$ if short panel
- prediction is not possible
- $\beta = \partial E[y_{it} | \alpha_i, x_{it}] / \partial x_{it}$
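A short worked derivation may help show why mean-differencing removes the individual effect. The bars denote averages over t for individual i; this bar notation is an assumption introduced here for illustration.

```latex
\begin{align*}
y_{it} &= \alpha_i + x_{it}'\beta + \varepsilon_{it} \\
\bar{y}_i &= \alpha_i + \bar{x}_i'\beta + \bar{\varepsilon}_i \\
y_{it} - \bar{y}_i &= (x_{it} - \bar{x}_i)'\beta + (\varepsilon_{it} - \bar{\varepsilon}_i)
\end{align*}
```

The last line no longer contains $\alpha_i$, so OLS on the demeaned data does not need $\alpha_i$ to be uncorrelated with $x_{it}$. A time-invariant regressor has $x_{it} - \bar{x}_i = 0$ and drops out as well, so its coefficient cannot be estimated this way.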


Random effects (RE) or population-averaged (PA)

- $\alpha_i$ is purely random (usually iid $(0, \sigma_\alpha^2)$).
- regressor $x_{it}$ must be exogenous
- corrects standard errors for equicorrelated clustered errors
- prediction is possible
- $\beta = \partial E[y_{it} | x_{it}] / \partial x_{it}$

Fundamental divide

- Microeconometricians: fixed effects
- Many others: random effects.


2.3 Robust inference

Many methods assume $\varepsilon_{it}$ and $\alpha_i$ (if present) are iid.
- Yields wrong standard errors if heteroskedasticity or if errors not equicorrelated over time for a given individual.

For short panel can relax and use cluster-robust inference.
- Allows heteroskedasticity and general correlation over time for given i.
- Independence over i is still assumed.

Use option vce(cluster) if available (xtreg, xtgee).

This is not available for many xt commands.
- Then use option vce(boot) or vce(cluster), but only if the estimator being used is still consistent.
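As a rough illustration of the difference between default and cluster-robust standard errors, the sketch below simulates a short panel with serially correlated errors and fits the same models both ways. The data-generating process, variable names and parameter values are all hypothetical, and the code assumes a Stata release that provides rnormal().

```stata
* Simulated short panel; names and values are illustrative assumptions
clear
set seed 10101
set obs 500                       // N = 500 individuals
gen id = _n
gen alpha = rnormal()             // individual-specific effect
gen xi = rnormal()                // persistent component of the regressor
expand 5                          // T = 5 periods per individual
bysort id: gen t = _n
gen x = xi + rnormal()
gen e = rnormal()
bysort id (t): replace e = 0.8*e[_n-1] + rnormal() if t > 1   // AR(1) error within i
gen y = 1 + x + alpha + e
xtset id t

xtreg y x, re                     // default standard errors (iid assumption)
xtreg y x, re vce(cluster id)     // cluster-robust standard errors
xtreg y x, fe vce(cluster id)     // cluster-robust standard errors for FE
```

Comparing the first two tables shows how much the standard errors change when correlation over time within an individual is allowed for.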


2.4 Stata linear panel commands

Panel summary       xtset; xtdescribe; xtsum; xtdata; xtline; xttab; xttrans
Pooled OLS          regress
Feasible GLS        xtgee, family(gaussian); xtgls; xtpcse
Random effects      xtreg, re; xtregar, re
Fixed effects       xtreg, fe; xtregar, fe
Random slopes       xtmixed; quadchk; xtrc
First differences   regress (with differenced data)
Static IV           xtivreg; xthtaylor
Dynamic IV          xtabond; xtdpdsys; xtdpd


3.1 Example: wages

- PSID wage data 1976-82 on 595 individuals. Balanced.
- Source: Baltagi and Khanti-Akom (1990). [Corrected version of Cornwell and Rupert (1988).]
- Goal: estimate causative effect of education on wages.
- Complication: education is time-invariant in these data.
  - Rules out fixed effects.
  - Need to use IV methods (Hausman-Taylor).


3.2 Reading in panel data

- xt commands require data to be in long form. Then each observation is an individual-time pair.
- Original data are often in wide form. Then an observation combines all time periods for an individual, or all individuals for a time period.
- Use reshape long to convert from wide to long.
- xtset is used to define i and t.
  - xtset id t is an example
  - allows use of panel commands and some time series operators.
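The conversion from wide to long form can be sketched on a tiny made-up data set. The variable names (lwage1 to lwage3, ed) and values below are hypothetical and are not the PSID data used in the talk.

```stata
* Three individuals in wide form: one row per individual, one wage column per period
clear
input id lwage1 lwage2 lwage3 ed
1 5.5 5.7 5.9 12
2 6.1 6.0 6.3 16
3 5.0 5.2 5.1 10
end

* Convert to long form: one row per individual-period pair
reshape long lwage, i(id) j(t)

* Declare the panel structure (i = id, t = t), then list the result
xtset id t
list, sepby(id)
```

After reshape long, the time-invariant variable ed is repeated on each of an individual's rows, and the time series operators (for example L.lwage) become available once xtset has been run.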


3.3 Summarizing panel data


describe, summarize and tabulate confound cross-section and time series variation.
Instead use specialized panel commands:

- xtdescribe: extent to which panel is unbalanced
- xtsum: separate within (over time) and between (over individuals) variation
- xttab: tabulations within and between for discrete data e.g. binary
- xttrans: transition frequencies for discrete data
- xtline: time series plot for each individual on one chart
- xtdata: scatterplots for within and between variation.
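A minimal sketch of these summary commands on simulated data follows. The data set, the variable names and the binary indicator d are hypothetical; the code assumes a Stata release that provides rnormal().

```stata
* Simulated short panel for illustration only
clear
set seed 10101
set obs 200
gen id = _n
gen alpha = rnormal()
expand 4                          // T = 4
bysort id: gen t = _n
gen x = alpha + rnormal()
gen byte d = (x > 0)              // a binary variable for xttab and xttrans
xtset id t

xtdescribe                        // pattern of observations: balanced or not
xtsum x                           // overall, between and within variation
xttab d                           // between and within tabulation
xttrans d, freq                   // transition frequencies from t-1 to t
xtline x if id <= 5, overlay      // time series for the first five individuals
```

xtdata is left out of the sketch because it replaces the data in memory with transformed values; if used, it is safer to wrap it in preserve and restore.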


4.1 Standard linear panel estimators


1. Pooled OLS: OLS of $y_{it}$ on $x_{it}$.
2. Between estimator: OLS of $\bar{y}_i$ on $\bar{x}_i$.
3. Random effects estimator: FGLS in RE model. Equals OLS of $(y_{it} - \hat{\theta}_i \bar{y}_i)$ on $(x_{it} - \hat{\theta}_i \bar{x}_i)$; $\theta_i = 1 - \sqrt{\sigma_\varepsilon^2 / (T_i \sigma_\alpha^2 + \sigma_\varepsilon^2)}$.
4. Within estimator or FE estimator: OLS of $(y_{it} - \bar{y}_i)$ on $(x_{it} - \bar{x}_i)$.
5. First difference estimator: OLS of $(y_{it} - y_{i,t-1})$ on $(x_{it} - x_{i,t-1})$.

Implementation:
- xtreg does 2-4 with options be, fe, re
- xtgee does 3 (with option exchangeable)
- regress does 1 and 5.

Only 4. and 5. give consistent estimates of $\beta$ in FE model.
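The five estimators can be compared on simulated data in which the individual effect is correlated with the regressor, so that the FE model holds. This is only a sketch: the data-generating process, names and the true slope of 1 are assumptions made for illustration.

```stata
* Simulated short panel with alpha_i correlated with x_it (true slope = 1)
clear
set seed 10101
set obs 500
gen id = _n
gen alpha = rnormal()
expand 5
bysort id: gen t = _n
gen x = 0.5*alpha + rnormal()
gen y = 1 + x + alpha + rnormal()
xtset id t

regress y x, vce(cluster id)                  // 1. pooled OLS
xtreg y x, be                                 // 2. between estimator
xtreg y x, re vce(cluster id)                 // 3. random effects (FGLS)
xtreg y x, fe vce(cluster id)                 // 4. within (FE) estimator
regress D.y D.x, noconstant vce(cluster id)   // 5. first differences
```

In this design the within and first-difference slopes should be close to 1, while pooled OLS, between and RE are pulled away from 1 by the correlation between alpha and x.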



4.2 Example

- Coefficients vary considerably across OLS, FE and RE estimators.
- Cluster-robust standard errors (suffix rob) larger even for FE and RE.
- Coefficient of ed not identified for FE as time-invariant regressor.

4.3 Fixed eects versus random eects


Use Hausman test to discriminate between FE and RE.
- If fixed effects: FE consistent and RE inconsistent.
- If not fixed effects: FE consistent and RE consistent.
- So see whether difference between FE and RE is zero.

$$ H = (\tilde{\beta}_{1,RE} - \hat{\beta}_{1,FE})' \left[ \widehat{\mathrm{Cov}}(\tilde{\beta}_{1,RE} - \hat{\beta}_{1,FE}) \right]^{-1} (\tilde{\beta}_{1,RE} - \hat{\beta}_{1,FE}), $$

where $\beta_1$ corresponds to time-varying regressors (or a subset of these).

Problem: hausman command assumes RE is fully efficient.
- But not the case here as robust se's for RE differ from default se's.
- So hausman is incorrect.
- Instead implement Hausman test using suest or panel bootstrap or Wooldridge (2002) robust version of Hausman test.
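One way to obtain a cluster-robust test of RE against FE is an auxiliary regression that adds the individual means of the time-varying regressors to the RE model and tests their joint significance. The sketch below uses this Mundlak-type variant; it is offered as an assumption about how such a test might be set up, not as the exact procedure in Wooldridge (2002), and the data and names are simulated and hypothetical.

```stata
* Simulated short panel with alpha_i correlated with x_it
clear
set seed 10101
set obs 500
gen id = _n
gen alpha = rnormal()
expand 5
bysort id: gen t = _n
gen x = 0.5*alpha + rnormal()
gen y = 1 + x + alpha + rnormal()
xtset id t

* Individual mean of the time-varying regressor
bysort id (t): egen xbar = mean(x)

* RE regression augmented with the individual mean, cluster-robust VCE
xtreg y x xbar, re vce(cluster id)

* Robust Wald test: rejecting H0 (coefficient on xbar = 0) favours FE over RE
test xbar
```

With several time-varying regressors, one mean per regressor would be added and tested jointly.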


5.1 Panel IV

Consider model with possibly transformed variables:

$$ y_{it}^* = \alpha + x_{it}^{*\prime}\beta + u_{it}^*, $$

where $y_{it}^* = y_{it}$, or $y_{it}^* = \bar{y}_i$ for BE, or $y_{it}^* = (y_{it} - \bar{y}_i)$ for FE, or $y_{it}^* = (y_{it} - \theta_i \bar{y}_i)$ for RE.

- OLS is inconsistent if $E[u_{it}^* | x_{it}^*] \neq 0$.
- So do IV estimation with instruments $z_{it}^*$ that satisfy $E[u_{it}^* | z_{it}^*] = 0$.

Command xtivreg is used, with options be, re or fe.
This command does not have option for robust standard errors.
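A minimal sketch of panel IV with xtivreg follows, on simulated data where the regressor is correlated with both the individual effect and the time-varying error. The instrument z, the other names and the parameter values are hypothetical.

```stata
* Simulated short panel: x is endogenous, z is a valid instrument (true slope = 1)
clear
set seed 10101
set obs 500
gen id = _n
gen alpha = rnormal()
expand 5
bysort id: gen t = _n
gen z = rnormal()                            // instrument
gen u = rnormal()                            // time-varying error
gen x = z + 0.5*alpha + 0.5*u + rnormal()    // correlated with alpha and u
gen y = 1 + x + alpha + u
xtset id t

xtreg y x, fe              // FE removes alpha but x is still correlated with u
xtivreg y (x = z), fe      // FE with x instrumented by z
```

In this design the FE slope stays biased upward, while the FE-IV slope should be close to 1.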


5.2 Hausman-Taylor IV estimator

Problem in the fixed effects model
- If an endogenous regressor is time-invariant
- Then FE estimator cannot identify $\beta$ (as time-invariant).

Solution:
- Assume the endogenous regressor is correlated only with $\alpha_i$ (and not with $\varepsilon_{it}$)
- Use exogenous time-varying regressors $x_{it}$ from other periods as instruments

Command xthtaylor does this (and has option amacurdy).
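The call pattern for xthtaylor can be sketched on simulated data with one regressor of each type. Everything about the design is an assumption for illustration: x1 is time-varying and exogenous, x2 is time-varying and correlated with the individual effect, w is time-invariant and exogenous, and ed is time-invariant and correlated with the individual effect.

```stata
* Simulated short panel with four types of regressor (hypothetical design)
clear
set seed 10101
set obs 500
gen id = _n
gen alpha = rnormal()                   // individual effect
gen m = rnormal()                       // persistent part of x1
gen w = rnormal()                       // time-invariant, exogenous
gen ed = 12 + m + alpha + rnormal()     // time-invariant, correlated with alpha
expand 5
bysort id: gen t = _n
gen x1 = m + rnormal()                  // time-varying, exogenous
gen x2 = 0.5*alpha + rnormal()          // time-varying, correlated with alpha
gen y = 1 + 0.5*x1 + 0.5*x2 + 0.3*w + 0.1*ed + alpha + rnormal()
xtset id t

* endog() lists the regressors allowed to be correlated with alpha;
* xthtaylor works out which regressors are time-invariant
xthtaylor y x1 x2 w ed, endog(x2 ed)
```

The individual mean of x1 is correlated with ed through m, which is what gives the instrument its strength in this design.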


6.1 Linear dynamic panel models

Simple dynamic model regresses $y_{it}$ on a polynomial in time.
- e.g. growth curve of child height or IQ as children grow older
- use previous models with $x_{it}$ a polynomial in time or age.

Richer dynamic model regresses $y_{it}$ on lags of $y_{it}$.


6.2 Linear dynamic panel models with individual effects

Leading example: AR(1) model with individual-specific effects

$$ y_{it} = \gamma y_{i,t-1} + x_{it}'\beta + \alpha_i + \varepsilon_{it}. $$

Three reasons for $y_{it}$ being serially correlated over time:
- True state dependence: via $y_{i,t-1}$
- Observed heterogeneity: via $x_{it}$ which may be serially correlated
- Unobserved heterogeneity: via $\alpha_i$

Focus on case where $\alpha_i$ is a fixed effect
- FE estimator is now inconsistent (if short panel)
- Instead use Arellano-Bond estimator


6.3 Arellano-Bond estimator


First-difference to eliminate $\alpha_i$ (rather than mean-difference)

$$ (y_{it} - y_{i,t-1}) = \gamma (y_{i,t-1} - y_{i,t-2}) + (x_{it} - x_{i,t-1})'\beta + (\varepsilon_{it} - \varepsilon_{i,t-1}). $$

OLS inconsistent as $(y_{i,t-1} - y_{i,t-2})$ correlated with $(\varepsilon_{it} - \varepsilon_{i,t-1})$ (even under assumption $\varepsilon_{it}$ is serially uncorrelated).

But $y_{i,t-2}$ is not correlated with $(\varepsilon_{it} - \varepsilon_{i,t-1})$, so can use $y_{i,t-2}$ as an instrument for $(y_{i,t-1} - y_{i,t-2})$.

Arellano-Bond is a variation that uses unbalanced set of instruments with further lags as instruments.
For t = 3 can use $y_{i1}$, for t = 4 can use $y_{i1}$ and $y_{i2}$, and so on.

Stata commands
- xtabond for Arellano-Bond
- xtdpdsys for Blundell-Bond (more efficient than xtabond)
- xtdpd for more complicated models than xtabond and xtdpdsys.
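The contrast between the FE estimator and Arellano-Bond can be sketched on a simulated AR(1) panel. The design below (autoregressive parameter 0.5, T = 8, the variable names) is a hypothetical choice made for illustration.

```stata
* Simulated dynamic panel: y_it = 0.5*y_i,t-1 + x_it + alpha_i + e_it
clear
set seed 10101
set obs 500
gen id = _n
gen alpha = rnormal()
expand 8                                   // T = 8
bysort id: gen t = _n
gen x = 0.5*alpha + rnormal()
gen y = alpha + rnormal() if t == 1        // initial condition
* replace works through the observations in order, so y[_n-1] is already filled in
bysort id (t): replace y = 0.5*y[_n-1] + x + alpha + rnormal() if t > 1
xtset id t

xtreg y L.y x, fe                  // within estimator: inconsistent for T fixed
xtabond y x, lags(1) vce(robust)   // Arellano-Bond with one lag of y
```

In this design the FE coefficient on L.y is expected to be biased downward, while the xtabond estimate should be close to 0.5.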

7.1 Long panels

For short panels asymptotics are T fixed and $N \to \infty$.
For long panels asymptotics are for $T \to \infty$.
- A dynamic model for the errors is specified, such as AR(1) error
- Errors may be correlated over individuals
- Individual-specific effects can be just individual dummies
- Furthermore if N is small and T large can allow slopes to differ across individuals and test for poolability.


7.2 Commands for long panels

Models with stationary errors:
- xtgls allows several different models for the error
- xtpcse is a variation of xtgls
- xtregar does FE and RE with AR(1) error

Models with nonstationary errors (currently active area):
- As yet no Stata commands
- Add-on levinlin does Levin-Lin-Chu (2002) panel unit root test
- Add-on ipshin does Im-Pesaran-Shin (1997) panel unit root test in heterogeneous panels
- Add-on xtpmg does Pesaran-Smith and Pesaran-Shin-Smith estimation for nonstationary heterogeneous panels with both N and T large.
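For the stationary case, a minimal sketch on a simulated long panel (N = 10, T = 50) with AR(1) errors might look as follows. The design, names and the AR parameter of 0.6 are assumptions for illustration only.

```stata
* Simulated long panel: few individuals, many periods, AR(1) errors
clear
set seed 10101
set obs 10                        // N = 10
gen id = _n
gen alpha = rnormal()
expand 50                         // T = 50
bysort id: gen t = _n
gen x = rnormal()
gen e = rnormal()
bysort id (t): replace e = 0.6*e[_n-1] + rnormal() if t > 1
gen y = 1 + x + alpha + e
xtset id t

xtgls y x, panels(heteroskedastic) corr(ar1)   // FGLS with a common AR(1) error
xtpcse y x, correlation(ar1)                   // OLS-type estimates, panel-corrected se's
xtregar y x, fe                                // FE with AR(1) error
```

The panels() and corr() options of xtgls are where the different error models are chosen; the settings above are just one combination.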


8.1 Random coefficients model

Generalize random effects model to random slopes.
Command xtrc estimates the random coefficients model

$$ y_{it} = \alpha_i + x_{it}'\beta_i + \varepsilon_{it}, $$

where $(\alpha_i, \beta_i)$ are iid with mean $(\alpha, \beta)$ and variance matrix $\Sigma$ and $\varepsilon_{it}$ is iid.
No vce(robust) option but can use vce(boot) if short panel.
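A sketch of xtrc on simulated data with a random intercept and a random slope follows. The design (mean slope 1, slope standard deviation 0.5, N = 100, T = 10) and the names are hypothetical, and the number of bootstrap replications is kept small only to keep the run short.

```stata
* Simulated panel with individual-specific intercepts and slopes
clear
set seed 10101
set obs 100
gen id = _n
gen a = 1 + rnormal()             // random intercept, mean 1
gen b = 1 + 0.5*rnormal()         // random slope, mean 1
expand 10                         // T = 10
bysort id: gen t = _n
gen x = rnormal()
gen y = a + b*x + rnormal()
xtset id t

xtrc y x                                // estimates the mean coefficients
xtrc y x, vce(bootstrap, reps(50))      // bootstrap standard errors
```

The reported coefficient on x estimates the mean slope, and the output also includes a test of parameter constancy across individuals.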


8.2 Mixed or multi-level or hierarchical model

Not used in microeconometrics but used in many other disciplines.
Stack all observations for individual i and specify

$$ y_i = X_i \beta + Z_i u_i + \varepsilon_i $$

where $u_i$ is iid $(0, G)$ and $Z_i$ is called a design matrix.
- Random effects: $Z_i = e$ (a vector of ones) and $u_i = \alpha_i$
- Random coefficients: $Z_i = X_i$.
- Other models including multi-level models are possible.

Command xtmixed estimates this model.
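The two special cases can be sketched with xtmixed on simulated data. The design and names are hypothetical. From Stata 13 onward the same models are fit by the command mixed, with xtmixed kept as the older name.

```stata
* Simulated panel with a random intercept and a random slope on x
clear
set seed 10101
set obs 100
gen id = _n
gen a = 1 + rnormal()
gen b = 1 + 0.5*rnormal()
expand 10
bysort id: gen t = _n
gen x = rnormal()
gen y = a + b*x + rnormal()

* Random effects: Z_i is a vector of ones (random intercept only)
xtmixed y x || id:

* Random coefficients: random intercept and random slope on x,
* with an unrestricted covariance matrix G
xtmixed y x || id: x, covariance(unstructured)
```

The part before || is the fixed part $X_i \beta$; the part after it names the grouping variable and the columns of $Z_i$.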


9.1 Clustered data

Consider data on individual i in village j with clustering on village.
A cluster-specific model (here village-specific) specifies

$$ y_{ji} = \alpha_j + x_{ji}'\beta + \varepsilon_{ji}. $$

Here clustering is on village (not individual) and the repeated measures are over individuals (not time).
- Use xtset village id

Assuming equicorrelated errors can be more reasonable here than with panel data (where correlation dampens over time).
- So perhaps less need for vce(cluster) after xtreg.


9.2 Estimators for clustered data

If the village effect $\alpha_j$ is random use:
- regress with option vce(cluster village)
- xtreg, re
- xtgee with option exchangeable
- xtmixed for richer models of error structure

If the village effect $\alpha_j$ is fixed use:
- xtreg, fe
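A minimal sketch for clustered cross-section data, with villages playing the role of the panel unit, might look as follows. The data are simulated and the names (village, person) are hypothetical.

```stata
* Simulated clustered data: 100 villages with 20 individuals each
clear
set seed 10101
set obs 100
gen village = _n
gen a = rnormal()                 // village-specific effect
expand 20
bysort village: gen person = _n   // individual index within village
gen x = rnormal()
gen y = 1 + x + a + rnormal()

* The village is the "panel" and the individual takes the place of time
xtset village person

regress y x, vce(cluster village)   // pooled OLS, cluster-robust se's
xtreg y x, re                       // random village effect
xtreg y x, fe                       // fixed village effect
```

Time series operators are not meaningful here, since the second xtset variable only indexes individuals within a village.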


10.1 Nonlinear panel models overview


General approaches similar to linear case
- Pooled estimation or population-averaged
- Random effects
- Fixed effects

Complications
- Random effects often not tractable so need numerical integration
- Fixed effects models in short panels are generally not estimable due to the incidental parameters problem.

Here we consider short panels throughout.
Standard nonlinear models are:
- Binary: logit and probit
- Counts: Poisson and negative binomial
- Truncated: Tobit


10.2 Nonlinear panel models


A pooled or population-averaged model may be used.
- This is same model as in cross-section case, with adjustment for correlation over time for a given individual.

A fully parametric model may be specified, with conditional density

$$ f(y_{it} | \alpha_i, x_{it}) = f(y_{it}, \alpha_i + x_{it}'\beta, \gamma), \quad t = 1, ..., T_i, \; i = 1, ..., N, \tag{5} $$

where $\gamma$ denotes additional model parameters such as variance parameters and $\alpha_i$ is an individual effect.

A conditional mean model may be specified, with additive effects

$$ E[y_{it} | \alpha_i, x_{it}] = \alpha_i + g(x_{it}'\beta) \tag{6} $$

or multiplicative effects

$$ E[y_{it} | \alpha_i, x_{it}] = \alpha_i \times g(x_{it}'\beta). \tag{7} $$


10.3 Nonlinear panel commands

                Counts                      Binary
Pooled          poisson                     logit
                nbreg                       probit
GEE (PA)        xtgee, family(poisson)      xtgee, family(binomial) link(logit)
                xtgee, family(nbinomial)    xtgee, family(binomial) link(probit)
RE              xtpoisson, re               xtlogit, re
                xtnbreg, re                 xtprobit, re
Random slopes   xtmepoisson                 xtmelogit
FE              xtpoisson, fe               xtlogit, fe
                xtnbreg, fe

plus tobit and xttobit.


11.1 Pooled or Population-averaged estimation


Extend pooled OLS
- Give the usual cross-section command for conditional mean models or conditional density models but then get cluster-robust standard errors
- Probit example: probit y x, vce(cluster id) or xtgee y x, fam(binomial) link(probit) corr(ind) vce(cluster id)

Extend pooled feasible GLS
- Estimate with an assumed correlation structure over time
- Equicorrelated probit example: xtprobit y x, pa vce(boot) or xtgee y x, fam(binomial) link(probit) corr(exch) vce(cluster id)
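The two approaches can be tried on a simulated binary panel. The design and names are hypothetical. For xtgee the sketch uses vce(robust), which gives standard errors that are robust to misspecification of the assumed within-panel correlation.

```stata
* Simulated short panel with a binary outcome and an individual effect
clear
set seed 10101
set obs 500
gen id = _n
gen alpha = rnormal()
expand 5
bysort id: gen t = _n
gen x = rnormal()
gen byte y = (0.5*x + 0.7*alpha + rnormal() > 0)
xtset id t

* Pooled probit with cluster-robust standard errors
probit y x, vce(cluster id)

* Population-averaged probit with equicorrelated errors
xtgee y x, family(binomial) link(probit) corr(exchangeable) vce(robust)
```

Both commands estimate population-averaged coefficients, which are scaled towards zero relative to the coefficient of 0.5 in the individual-level model that generated the data.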


11.2 Random effects estimation

Assume individual-specific effect $\alpha_i$ has specified distribution $g(\alpha_i | \eta)$.
Then the unconditional density for the ith observation is

$$ f(y_{i1}, ..., y_{iT} | x_{i1}, ..., x_{iT}, \beta, \gamma, \eta) = \int \left[ \prod_{t=1}^{T} f(y_{it} | x_{it}, \alpha_i, \beta, \gamma) \right] g(\alpha_i | \eta) \, d\alpha_i. \tag{8} $$

Analytical solution:
- For Poisson with gamma random effect
- For negative binomial with gamma effect
- Use xtpoisson, re and xtnbreg, re

No analytical solution:
- For other models.
- Instead use numerical integration (only univariate integration is required).
- Assume normally distributed random effects.
- Use re option for xtlogit, xtprobit
- Use normal option for xtpoisson and xtnbreg
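A sketch of RE estimation for a binary and a count outcome on simulated data follows. The design, names and true slope of 0.5 are hypothetical, and the code assumes a Stata release that provides runiform() and rpoisson().

```stata
* Simulated short panel with a normal individual effect independent of x
clear
set seed 10101
set obs 500
gen id = _n
gen alpha = rnormal()
expand 5
bysort id: gen t = _n
gen x = rnormal()
gen u = runiform()
gen byte d = (0.5*x + alpha + ln(u/(1-u)) > 0)   // logit with individual effect
gen c = rpoisson(exp(0.5*x + 0.5*alpha))         // Poisson with individual effect
xtset id t

xtlogit d x, re             // normal random effect, numerical integration
xtpoisson c x, re           // gamma random effect, closed-form likelihood
xtpoisson c x, re normal    // normal random effect, numerical integration
```

quadchk can be run after the quadrature-based fits to check whether the results are sensitive to the number of integration points.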

11.2 Random slopes estimation

Can extend to random slopes.
- Nonlinear generalization of xtmixed
- Then higher-dimensional numerical integral.
- Use adaptive Gaussian quadrature

Stata commands are:
- xtmelogit for binary data
- xtmepoisson for counts

Stata add-on that is very rich:
- gllamm (generalized linear latent and mixed models)
- Developed by Sophia Rabe-Hesketh and Anders Skrondal.


11.3 Fixed effects estimation

In general not possible in short panels.
Incidental parameters problem:
- N fixed effects $\alpha_i$ plus K regressors means (N + K) parameters
- But $(N + K) \to \infty$ as $N \to \infty$
- Need to eliminate $\alpha_i$ by some sort of differencing
- possible for Poisson, negative binomial and logit.

Stata commands
- xtlogit, fe
- xtpoisson, fe (better to use xtpqml as robust se's)
- xtnbreg, fe

Fixed effects extended to dynamic models for logit and probit. No Stata command.
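The FE commands for logit and Poisson can be sketched on simulated data in which the individual effect is correlated with the regressor. The design, names and true slopes are hypothetical, and the code assumes a Stata release that provides runiform() and rpoisson().

```stata
* Simulated short panel with alpha_i correlated with x_it
clear
set seed 10101
set obs 500
gen id = _n
gen alpha = rnormal()
expand 5
bysort id: gen t = _n
gen x = 0.5*alpha + rnormal()
gen u = runiform()
gen byte d = (x + alpha + ln(u/(1-u)) > 0)     // logit, slope 1
gen c = rpoisson(exp(0.5*x + 0.5*alpha))       // Poisson, slope 0.5
xtset id t

* Conditional (fixed effects) logit: individuals whose d never changes are dropped
xtlogit d x, fe

* Fixed effects Poisson: alpha_i is conditioned out of the likelihood
xtpoisson c x, fe
```

Neither command reports estimates of the $\alpha_i$, which is consistent with the point that the individual effects are eliminated rather than estimated.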


12. Conclusion
- Stata provides commands for panel models and estimators commonly used in microeconometrics and biostatistics.
- Stata also provides diagnostics and postestimation commands, not presented here.
- The emphasis is on short panels.
- Some commands provide cluster-robust standard errors, some do not.
- A big distinction is between fixed effects models, emphasized by microeconometricians, and random effects and mixed models favored by many others.
- Extensions to nonlinear panel models exist, though FE models may not be estimable with short panels.
- This presentation draws on two chapters in Cameron and Trivedi, Microeconometrics using Stata, forthcoming.

Book Outline
For Cameron and Trivedi, Microeconometrics using Stata, forthcoming.

1. Stata basics
2. Data management and graphics
3. Linear regression basics
4. Simulation
5. GLS regression
6. Linear instrumental variable regression
7. Quantile regression
8. Linear panel models
9. Nonlinear regression methods
10. Nonlinear optimization methods
11. Testing methods
12. Bootstrap methods

Book Outline (continued)

13. Binary outcome models
14. Multinomial models
15. Tobit and selection models
16. Count models
17. Nonlinear panel models
18. Topics

A. Programming in Stata
B. Mata


Econometrics graduate-level panel data texts


Comprehensive panel texts
- Baltagi, B.H. (1995, 2001, 200?), Econometric Analysis of Panel Data, 1st and 2nd editions, New York, John Wiley.
- Hsiao, C. (1986, 2003), Analysis of Panel Data, 1st and 2nd editions, Cambridge, UK, Cambridge University Press.

More selective advanced panel texts
- Arellano, M. (2003), Panel Data Econometrics, Oxford, Oxford University Press.
- Lee, M.-J. (2002), Panel Data Econometrics: Methods-of-Moments and Limited Dependent Variables, San Diego, Academic Press.

Texts with several chapters on panel
- Cameron, A.C. and P.K. Trivedi (2005), Microeconometrics: Methods and Applications, New York, Cambridge University Press.
- Greene, W.H. (2003), Econometric Analysis, fifth edition, Upper Saddle River, NJ, Prentice-Hall.
- Wooldridge, J.M. (2002, 200?), Econometric Analysis of Cross Section and Panel Data, Cambridge, MA, MIT Press.
