Practical fixed effects estimation methods for the three-way error components model


Martyn Andrews
University of Manchester

Thorsten Schank
Universität Erlangen-Nürnberg

Richard Upward
University of Nottingham
July 21, 2005

Abstract
Methods for fixed effects estimation of the three-way error component model are
not yet standard. In this paper, we make the fixed effects methods developed originally
by Abowd, Kramarz & Margolis (1999) for linked worker-firm data more accessible,
where possible, and show how they can be implemented in Stata. There is a caveat
to these methods, however. If the researcher wishes to recover estimates of the error
components themselves, and the number of units at the higher level of aggregation is
large, memory or matrix constraints may make estimation of the components themselves infeasible using Stata.

1 Introduction

Panel data with three or more dimensions of variation are increasingly available to researchers in various fields. One might have data on workers observed over time and also
the firms in which they work. Alternatively, one might have children in schools or patients
in hospitals. These data are also commonly described as being multilevel or hierarchical.
Note that, in the examples given here, the lower-level units (workers, children, patients)
are not merely nested in the higher-level. They may also move between the higher-level
units. Workers change their employer, for example.
In the econometric literature the linear error components model is frequently used for
panel data. If the fixed (over time) error components are assumed to be uncorrelated with
observed explanatory variables then a random effects estimator may be used. These may
be estimated in Stata in a variety of ways, including the standard xtreg command for
the two-way model and the new xtmixed command for models with a more complicated
hierarchical structure. However, in many cases it may not be desirable to impose the
assumption that the error components are uncorrelated with the observed explanatory
variables, in which case fixed effects methods are required to estimate the parameters of
interest.
It is well known that, in the linear model, fixed error components can be modelled using dummies, for example for individuals, firms and time. It is also well known that, in a
two-way model with individual and time dummies, algebraic solutions are available for
the estimation of all parameters of interest, including those associated with both sets of
dummies (Baltagi 2005, Section 3.2). However, in the case of data structures such as those
considered here, there is no algebraic transformation which sweeps away all the fixed error
components in one go, and which allows them to be recovered subsequently.
Although there is a growing literature in economics, much of it based on the work of
Abowd, Kramarz & Margolis (AKM), the analysis of data with three dimensions of variation (such as linked employer-employee panel data) is not yet routine. There are various
econometric issues to overcome, which mean that routine techniques and packages cannot
be used. AKM's papers suggest these issues are quite technical. The objective of this
paper, therefore, is to make these fixed effects methods more accessible, where possible,
and then show how they can be implemented in Stata.
To illustrate these methods we focus on a dataset of workers who are observed annually,
together with the identity of the firms they work for. In Section 2, we set out the generic
model that best represents the econometrics of fixed effects models using this type of
data. In Section 3 we describe the various methods that can be used to estimate this
generic model. In Section 4, we present some illustrative results using an example dataset
from the Institut für Arbeitsmarkt- und Berufsforschung (IAB) in Germany.

2 A generic model

Consider the following linear three-way error components model:


$$y_{it} = x_{it}\beta + w_{jt}\gamma + u_i\eta + q_j\rho + \alpha_i + \phi_j + \mu_t + \epsilon_{it} \qquad (1)$$

Workers are indexed $i = 1, \ldots, N$. They are observed once per period $t = 1, \ldots, T_i$ in firm
$j = 1, \ldots, J$. Workers can change firms over time. $y_{it}$ is the dependent variable, $x_{it}$ and $u_i$
are vectors of observable $i$-level covariates and $w_{jt}$ and $q_j$ are vectors of observable $j$-level
covariates. Both workers and firms are assumed to enter and exit the panel, which means
we have an unbalanced panel with $T_i$ observations per worker. There are $N^* = \sum_{i=1}^{N} T_i$
observations (worker-periods) in total.
The error components (or unobserved heterogeneities) comprise $\alpha_i$ for the worker and $\phi_j$
for the firm. The third component, $\mu_t$, represents the unobserved time effect. The error
components may be correlated with each other and with any of the observable covariates. We assume $\epsilon_{it}$ is strictly exogenous, which implies, inter alia, that workers' mobility
decisions are independent of $\epsilon_{it}$.
Note that both $\alpha_i$ and $u_i$ are variables that are time-invariant for workers and similarly $\phi_j$
and $q_j$ are fixed over time for firms. $x_{it}$, on the other hand, varies across $i$ and $t$, and $w_{jt}$
varies across $j$ and $t$. However, because the data are recorded at the $(i, t)$ level, firm-level
covariates also vary at that level. Thus we should strictly write $w_{J(i,t)t}$ and $q_{J(i,t)}$, where
$J(i, t)$ is the function which maps worker $i$ to firm $j$ at time $t$.
Equation (1) therefore contains all four possible types of covariate which a researcher might
have about workers and firms. There are K observed covariates in total.
We emphasise that it is usual to assume that the heterogeneity terms $\alpha_i$ and $\phi_j$ are
correlated with the observables. This means that random effects methods are inconsistent,
and so fixed effects methods are needed to estimate the parameters of interest. This, in
turn, means that $[\eta, \rho]$, the parameter vector associated with the time-invariant variables,
is not identified. Rather than dropping $[u_i, q_j]$, it is usual to define

$$\theta_i \equiv \alpha_i + u_i\eta \qquad (2)$$

and

$$\psi_j \equiv \phi_j + q_j\rho \qquad (3)$$

giving

$$y_{it} = x_{it}\beta + w_{jt}\gamma + \theta_i + \psi_j + \epsilon_{it}. \qquad (4)$$

In the next section we describe how to estimate the parameters of Equation (4) using
various fixed effects methods. We assume throughout that the unobserved time component
$\mu_t$ is to be treated as fixed and estimated directly by using time dummies.[1] This is why we
have dropped $\mu_t$ from (4); we have subsumed these time dummies into one of the vectors
of observable covariates. This means that, hereafter, we are essentially analysing a two-way
model.

3 Econometric methods

3.1 Spell Fixed Effects

If one is not interested in the estimates of $\theta_i$ and $\psi_j$ themselves, or in estimating the
parameters on the time-invariant variables $u_i$ and $q_j$, consistent estimates of $\beta$ and $\gamma$ from
Equation (4) are straightforward to obtain by taking differences or by time-demeaning
within each unique worker-firm combination (or spell). This is because for each spell
of a worker within a firm neither $\theta_i$ nor $\psi_j$ varies. Defining $\lambda_s \equiv \theta_i + \psi_j$ as spell-level
heterogeneity, which is swept out by subtracting averages at the spell level, both $\theta_i$ and
$\psi_j$ have disappeared:

$$y_{it} - \bar{y}_s = (x_{it} - \bar{x}_s)\beta + (w_{jt} - \bar{w}_s)\gamma + (\epsilon_{it} - \bar{\epsilon}_s). \qquad (5)$$

[1] This will always be practical so long as the time dimension of the panel is relatively short, which it
usually is with these kinds of data.

The effects of $u_i$ and $q_j$ are not identified, because $u_i - \bar{u}_s = 0$ and $q_j - \bar{q}_s = 0$. In
addition, any variable $x_{it}$ or $w_{jt}$ which is constant within a spell will also not be identified.
One observation per spell is used up in identifying each spell fixed effect.
This is essentially the first method that AKM discuss (Abowd et al. 1999, Section 3.3),
except they use differences rather than mean-deviations. We call this method Spell Fixed
Effects or FE(s). Because one cannot separate the worker and firm heterogeneities, AKM
do not pursue this method further. Given estimates $\hat{\lambda}_s$, one cannot recover $\theta_i$ and $\psi_j$.
It is worth emphasising, however, that for many researchers this spell fixed effects method
is a practical and simple solution which does not present any computational difficulty,
providing the researcher is not interested in analysing the heterogeneity post-estimation.
It is trivial to implement in Stata using the xtreg, fe command.

3.2 Least squares dummy variables (LSDV)

The spell fixed effects method outlined above is not useful if one wishes to recover estimates
of $\theta_i$ and $\psi_j$. This might be important if one wants to analyse these terms themselves,
or if one wants to recover estimates of $\eta$ and $\rho$ using Equations (2) and (3) respectively.
An alternative is to use a Least Squares Dummy Variables (LSDV) estimator. The LSDV
estimates of $\theta_i$ are inconsistent, although unbiased.[2] The estimates of $\psi_j$ have the same
properties as those of $[\beta, \gamma]$.
However, direct estimation of (4) using dummy variables when the dataset is large will not
usually be feasible, since this is a model with approximately $K + N + J$ parameters. In
the two-way model this problem is circumvented by using the within transformation which
sweeps out the $i$-level heterogeneity. But in this model there is no algebraic transformation
of the observables that sweeps away both heterogeneity terms in one go and which allows
them to be recovered subsequently. This is because of the lack of patterning between
workers and the firms they work for.[3] To circumvent this problem, AKM note that explicitly including dummy variables for the firm heterogeneity, but sweeping out the worker
heterogeneity algebraically, gives exactly the same solution as the LSDV estimator.[4]
[2] See Wooldridge (2002, ch. 10) for assumptions and properties of panel data models.
[3] More precisely, sort the data by workers and the firm dummies are unpatterned; sort the data by firms,
and the worker dummies are unpatterned.
[4] In linear models, there is no distinction between removing the heterogeneity algebraically or adding
two full sets of dummy variables, for workers and firms, and so the terminology LSDV applies to both.

More precisely, the researcher must generate a dummy variable for each firm:

$$F^j_{it} = 1(J(i,t) = j), \qquad j = 1, \ldots, J,$$

where $1(\cdot)$ is the dummy variable indicator function and the function $J(i,t) = j$ maps
worker $i$ at time $t$ to firm $j$. Now substitute

$$\psi_{J(i,t)} = \sum_{j=1}^{J} \psi_j F^j_{it} \qquad (6)$$

into Equation (4). The $\theta_i$ are removed by time-demeaning (or differencing) over $i$:

$$y_{it} - \bar{y}_i = (x_{it} - \bar{x}_i)\beta + (w_{jt} - \bar{w}_i)\gamma + \sum_{j=1}^{J} \psi_j (F^j_{it} - \bar{F}^j_i) + (\epsilon_{it} - \bar{\epsilon}_i). \qquad (7)$$

This means that $J$ de-meaned (or differenced) firm dummies actually need creating.[5] To
distinguish this estimator from LSDV above, we label this estimator FEiLSDVj. They are
identical estimators, but differ in how they are computed.
It is worth emphasising that firm dummies are no different from any multi-category
dummy, so long as workers can move from one category to another over time (e.g. region dummies, but not ethnicity dummies). $F^j_{it} - \bar{F}^j_i$ will be zero for all $J$ dummies for
any worker $i$ who does not change firm. Furthermore, if we have a sample of firms (rather
than the population) $F^j_{it} - \bar{F}^j_i$ will only be non-zero for workers who change from one firm
within the sample to another firm in the sample. This means that in many cases only
a tiny proportion of workers have any non-zero terms. Identification of $\psi_j$ is driven by
the total number of such movers in each firm $j$. Some small firms may have no movers,
in which case $\psi_j$ is not identified. Other small firms may have only a very few movers,
in which case estimates of $\psi_j$ will be very imprecise. This means that it may not be
sensible to estimate $\psi_j$ for small firms, and instead one should group small firms together
(this is what AKM and others do).
There are two potential computational problems with this estimator. The first is the
number of firms $J$, because the software needs to invert a matrix of dimension $(K + J) \times (K + J)$.
For many applications, the number of firms is sufficiently small that FEiLSDVj is
computationally feasible. However, some datasets have tens of thousands of firms, or even
hundreds of thousands. The second problem is the requirement that one must create and
store $J$ mean-deviations for $N^*$ observations, meaning that the data matrix is $N^* \times (K + J)$.
This may be prohibitively large for software packages which store all data in memory, such
as Stata.
[5] Differencing is ignored hereafter. There are various reasons why it is easier to implement the covariance
transformation. Normally, the decision whether to estimate the model in first differences or use the
covariance transform depends on which gives the more efficient estimates. Both estimators are consistent.
See Wooldridge (2002, Section 10.6.3).

Some improvement in the storage efficiency of the $J$ mean-deviated firm dummies can be
achieved in Stata by using the lowest common multiple of all values of $T_i$ if the panel is
sufficiently short. For example, if the data span a maximum of 5 years then $T_i$ can be
any value from [1, 2, 3, 4, 5]. Multiplying $F^j_{it} - \bar{F}^j_i$ by the lowest common multiple (in this
case 60) yields a set of integers which can be stored in Stata as single bytes rather than 4- or 8-byte fractions.[6] To implement this method the researcher would need to create the
mean deviations manually and use OLS on the transformed data, rather than relying on
xtreg.
The memory requirements of the data matrix for the FEiLSDVj estimator are then approximately $N^* J + 4N^*(K + 1)$ bytes. We require $N^* J$ bytes for the mean-deviated
firm dummies and $4N^*(K + 1)$ bytes for the remaining $K$ explanatory variables and the
dependent variable, assuming each is stored as 4 bytes.
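The integer-storage idea can be illustrated with a minimal Stata sketch. It builds the mean-deviated dummy for a single firm on a toy panel and scales it by 60. The toy data, the firm code 10 and the variable names F10, F10bar and F10dev are assumptions made for this illustration only; they do not come from the IAB data.

```stata
* toy panel: worker 1 spends two years in firm 10 and one year in firm 20
clear
input i t j
1 1 10
1 2 10
1 3 20
2 1 20
2 2 20
end

* dummy for firm 10 and its within-worker mean
generate byte F10 = (j == 10)
bysort i (t): egen double F10bar = mean(F10)

* scale the mean deviation by lcm(1,...,5) = 60; the result is an integer
* no larger than 48 in absolute value, so it fits in Stata's byte type
generate byte F10dev = round(60 * (F10 - F10bar))
list i t j F10 F10dev
```

For worker 1 the unscaled deviations are 1/3, 1/3 and -2/3, which become 20, 20 and -40 after scaling. Worker 2 never works in firm 10, so the deviations are all zero. Coefficients on the scaled dummies would then have to be multiplied by 60 to return to the original units.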

3.3 A Classical Minimum Distance (CMD) method

The FEiLSDVj method requires the inversion of a potentially very large $(K + J) \times (K + J)$
cross-product matrix, and, in addition, enough memory to store $J$ mean-deviated firm
dummies across $N^*$ observations. To circumvent the second of these problems, we propose
the following method, based on the fact that only movers between firms identify firm
effects.
We separate the model into observations for movers and non-movers, subscripted by 1
and 2 respectively. There are $N^*_1$ mover observations and $N^*_2$ non-mover observations.
We then write Equation (4) in matrix notation, where each model is estimated separately:[7]

$$\tilde{y}_1 = \tilde{X}_1\beta_1 + \tilde{F}_1\psi_1 + \epsilon_1 \qquad (8)$$

$$\tilde{y}_2 = \tilde{X}_2\beta_2 + \epsilon_2. \qquad (9)$$

Note that $\tilde{y}_1$, $\tilde{y}_2$, $\tilde{X}_1$, $\tilde{X}_2$ and $\tilde{F}_1$ have all been mean-deviated and defined viz $\tilde{y}_1 = M_D y_1$
etc, where $M_D \equiv I - D(D'D)^{-1}D'$. Denote the variances of the two error terms as $\sigma_1^2$
and $\sigma_2^2$. We now drop all columns of $\tilde{F}_1$ that are the zero vector, that is the $J_2$ firms that
have no turnover. By definition, $\tilde{F}_2 \equiv 0$.
Because there are often very few movers, eliminating $\tilde{F}_2 \equiv 0$ from the model means that,
by estimating the model for movers and non-movers separately, the memory constraints
noted above are side-stepped. The Classical Minimum Distance (CMD) estimator forms
a restricted estimator for $\beta$ and $\psi$ from $\hat{\beta}_1$, $\hat{\beta}_2$ and $\hat{\psi}_1$.[8]
In general, denote $\pi$ as the $S \times 1$ unrestricted parameter vector and denote $\delta$ as the
$P \times 1$ restricted parameter vector. The constraint is $\pi = h(\delta)$. In CMD estimation, one
estimates $\pi$ and then finds a $\delta$ such that the distance between $\hat{\pi}$ and $h(\delta)$ is minimised.

[6] Storing the mean-deviated firm dummies as integers also appears to improve the accuracy of the matrix
inversion procedure.
[7] We dispense with the distinction between $x_{it}$ variables and $w_{jt}$ variables.
[8] See Wooldridge (2002, ch. 14.6) for further details.
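An efficient CMD estimator uses any consistent estimator of $V$, the asymptotic covariance matrix
of $\hat{\pi}$, denoted $\hat{V}$, to act as weighting matrix for the distance between $\hat{\pi}$ and $h(\delta)$. In other
words, Efficient CMD solves:

$$\min_{\delta}\; [\hat{\pi} - h(\delta)]' \hat{V}^{-1} [\hat{\pi} - h(\delta)],$$

whose solution is

$$\hat{\delta} = (H'\hat{V}^{-1}H)^{-1} H'\hat{V}^{-1}\hat{\pi},$$

when the mapping from $\delta$ to $\pi$ is linear: $\pi = H\delta$. Also, the appropriate estimator of
$\mathrm{Avar}(\hat{\delta})$ with which to conduct inference is

$$\widehat{\mathrm{Avar}}(\hat{\delta}) = [H'\,\widehat{\mathrm{Avar}}(\hat{\pi})^{-1} H]^{-1} = [H'\hat{V}^{-1}H]^{-1}.$$

A test of the validity of the restrictions is given by Wooldridge (2002, Eqn. 14.76):

$$[\hat{\pi} - h(\hat{\delta})]' \hat{V}^{-1} [\hat{\pi} - h(\hat{\delta})] \sim \chi^2(S - P).$$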
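For the model at hand, the constraint $\pi = H\delta$ is written:

$$\begin{pmatrix} \beta_1 \\ \psi_1 \\ \beta_2 \end{pmatrix} = \begin{pmatrix} I_K & 0 \\ 0 & I_J \\ I_K & 0 \end{pmatrix} \begin{pmatrix} \beta \\ \psi \end{pmatrix}$$

where $\pi$ is $(2K + J) \times 1$, $\delta$ is $(K + J) \times 1$, and $H$ is $(2K + J) \times (K + J)$. The appropriate
asymptotic covariance matrix is:

$$\hat{V} = \begin{pmatrix} \hat{V}_1 & 0 \\ 0 & \hat{V}_2 \end{pmatrix} = \begin{pmatrix} \hat{\sigma}_1^2 \begin{pmatrix} \tilde{X}_1'\tilde{X}_1 & \tilde{X}_1'\tilde{F}_1 \\ \tilde{F}_1'\tilde{X}_1 & \tilde{F}_1'\tilde{F}_1 \end{pmatrix}^{-1} & 0 \\ 0 & \hat{\sigma}_2^2 (\tilde{X}_2'\tilde{X}_2)^{-1} \end{pmatrix}$$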
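Given the general expressions immediately above, it follows that the restricted estimator
is given by:

$$\hat{\delta} = [H'\hat{V}^{-1}H]^{-1} H'\hat{V}^{-1}\hat{\pi} = \left[ \hat{V}_1^{-1} + \begin{pmatrix} \hat{V}_2^{-1} & 0 \\ 0 & 0 \end{pmatrix} \right]^{-1} \left[ \hat{V}_1^{-1} \begin{pmatrix} \hat{\beta}_1 \\ \hat{\psi}_1 \end{pmatrix} + \begin{pmatrix} \hat{V}_2^{-1}\hat{\beta}_2 \\ 0 \end{pmatrix} \right] \qquad (10)$$

and that

$$\widehat{\mathrm{Avar}}(\hat{\delta}) = [H'\hat{V}^{-1}H]^{-1} = \left[ \hat{V}_1^{-1} + \begin{pmatrix} \hat{V}_2^{-1} & 0 \\ 0 & 0 \end{pmatrix} \right]^{-1}, \qquad (11)$$

a $(K + J) \times (K + J)$ matrix. It should be emphasised that these expressions use standard
(unrobust) covariance matrices. A robust version of this covariance matrix replaces $\hat{V}_1$
and $\hat{V}_2$ in Equation (11) by robust equivalents. It is wrong to do this in (10), however,
because if the true constraint could be imposed upon the model, one would not end up
with the LSDV estimator.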
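The step from the general CMD solution to Equations (10) and (11) can be sketched as follows. This is a worked derivation under the block structure of $H$ and $\hat{V}$ given above; the partition $H = (H_1', H_2')'$ is notation introduced here for the illustration only.

```latex
% Partition H by the mover block (first K+J rows) and the non-mover block (last K rows):
%   H_1 = I_{K+J},   H_2 = ( I_K   0 ),   \hat{V} = diag(\hat{V}_1, \hat{V}_2).
\begin{align*}
H'\hat{V}^{-1}H
  &= H_1'\hat{V}_1^{-1}H_1 + H_2'\hat{V}_2^{-1}H_2
   = \hat{V}_1^{-1} +
     \begin{pmatrix} \hat{V}_2^{-1} & 0 \\ 0 & 0 \end{pmatrix}, \\
H'\hat{V}^{-1}\hat{\pi}
  &= H_1'\hat{V}_1^{-1}\begin{pmatrix} \hat{\beta}_1 \\ \hat{\psi}_1 \end{pmatrix}
     + H_2'\hat{V}_2^{-1}\hat{\beta}_2
   = \hat{V}_1^{-1}\begin{pmatrix} \hat{\beta}_1 \\ \hat{\psi}_1 \end{pmatrix}
     + \begin{pmatrix} \hat{V}_2^{-1}\hat{\beta}_2 \\ 0 \end{pmatrix}.
\end{align*}
```

Substituting the first line into $[H'\hat{V}^{-1}H]^{-1}$ gives Equation (11), and multiplying its inverse by the second line gives Equation (10). The non-mover regression therefore adds precision only to the $\beta$ block; the $\psi$ block is informed by non-movers only through its covariance with $\hat{\beta}_1$.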
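A standard criticism is that movers and non-movers are different groups of individuals
and so one should model them separately. Before imposing $H_0: \beta_1 = \beta_2$, one should test
these restrictions, although this rarely happens. Under $H_0$:

$$\left[ \begin{pmatrix} \hat{\beta}_1 \\ \hat{\psi}_1 \end{pmatrix} - \hat{\delta} \right]' \hat{V}_1^{-1} \left[ \begin{pmatrix} \hat{\beta}_1 \\ \hat{\psi}_1 \end{pmatrix} - \hat{\delta} \right] + (\hat{\beta}_2 - \hat{\beta})' \hat{V}_2^{-1} (\hat{\beta}_2 - \hat{\beta}) \sim \chi^2(K). \qquad (12)$$

It should be emphasised that the only price paid with this approach is that one cannot
constrain $\sigma_1^2 = \sigma_2^2$. The only difference between this and the LSDV estimator arises because
$\hat{V}_1$ and $\hat{V}_2$ come from separate regressions.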
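3.4 Post-estimation analysis of the error components

Once estimates of $(\beta, \psi)$ have been made (either using FEiLSDVj or CMD methods), the
researcher can recover estimates of the error components themselves. First compute

$$\hat{\psi}_{J(i,t)} = \sum_{j=1}^{J} \hat{\psi}_j F^j_{it} \qquad (13)$$

and then

$$\hat{\theta}_i = \bar{y}_i - \bar{\hat{\psi}}_i - \bar{x}_i\hat{\beta} - \bar{w}_i\hat{\gamma} \qquad (14)$$

where $\bar{\hat{\psi}}_i$ averages $\hat{\psi}_{J(i,t)}$ over $t$ for each $i$.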


Identification of firm effects is only possible within a group, where a group is defined by
the movement of workers between firms. A group contains all the workers who have ever
worked for any of the firms in that group, and all the firms at which any of the workers
were employed. A second (unconnected) group is defined only if no firm in the first group
has ever employed any workers in the second, and no firm in the second group has ever
employed any workers in the first. It is not possible to identify one firm per group because
within each group the mean-deviated firm dummies sum to zero. Some normalisation is
therefore required between groups. In Section 4 we show how to identify groups and how
to perform this normalisation.
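The definition of a group can be made concrete with a minimal Stata sketch that labels connected groups by repeatedly passing the smallest label along workers and then along firms until nothing changes. This is an illustrative stand-in, not the ado file used later in the paper; the toy data and the variable names g, g_old, gi, gj and grp are assumptions made for the example.

```stata
* toy worker-firm data: worker 1 links firms 1 and 2, worker 4 links firms 3 and 4
clear
input i j
1 1
1 2
2 2
3 3
4 3
4 4
5 5
end

* start with the firm identifier as a provisional group label
generate long g = j
local changed = 1
while `changed' {
    generate long g_old = g
    * share the smallest label among all rows of a worker ...
    bysort i: egen long gi = min(g)
    quietly replace g = gi
    * ... and then among all rows of a firm
    bysort j: egen long gj = min(g)
    quietly replace g = gj
    * stop once a full pass leaves every label unchanged
    quietly count if g != g_old
    local changed = r(N)
    drop g_old gi gj
}
egen grp = group(g)
list i j grp
```

In the toy data firms 1 and 2 form one group, firms 3 and 4 a second, and firm 5 (which has no movers) a third. The loop terminates because labels can only decrease.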

Identifying the effects of time-invariant variables


AKM suggest that one can recover estimates of $\alpha_i$ and $\phi_j$ by estimating Equations (2, 3)
as follows. Thus, one can analyse distributions of $\hat{\phi}_j$, $\hat{\alpha}_i$, specifically to see whether they
are correlated. First, run the auxiliary regressions:

$$\hat{\theta}_i = \text{const} + u_i\eta + \text{error} \qquad (15)$$

$$\hat{\psi}_j = \text{const} + q_j\rho + \text{error} \qquad (16)$$

which give consistent estimates of $\eta$, $\rho$ (AKM 1999, Section 3.4.4). Because $\alpha_i$ is dropped
from (2), the identifying assumption is that $\mathrm{Cov}(u_i, \alpha_i) = 0$ or else there is omitted
variables bias. Similarly, $\mathrm{Cov}(q_j, \phi_j) = 0$ is assumed in (3). One only needs $N$ observations
to estimate (2) and $J$ observations to estimate (3). AKM estimate these equations by
GLS, because of the aggregation to the firm-level. Because there are other causes of
heteroskedasticity, one could use OLS and adjust the covariance matrix for clustering at
the firm-level. Second, the researcher computes

$$\hat{\alpha}_i = \hat{\theta}_i - u_i\hat{\eta} \qquad (17)$$

$$\hat{\phi}_j = \hat{\psi}_j - q_j\hat{\rho}. \qquad (18)$$

$\hat{\theta}$ and $\hat{\psi}$ can be defined at three levels of aggregation:

Level $(i, t)$: $\hat{\theta}_i$ replicated $T_i$ times, and $\hat{\psi}_{J(i,t)}$
Level $i$: $\bar{\hat{\psi}}_i = \sum_{t=1}^{T_i} \hat{\psi}_{J(i,t)} / T_i$
Level $j$: $\bar{\hat{\theta}}_j = \sum_{(i,t) \in j} \hat{\theta}_i / N_j$

($N_j$ is the total number of worker-years observed in firm $j$.) AKM show that statistics
based on aggregating $\hat{\theta}_i$ and $\hat{\alpha}_i$ to the level of the firm are consistent as $T_i$ goes to infinity
(see also Chamberlain 1984).

4 An illustrative example

To illustrate these methods we use data from a linked worker-firm dataset made available
by the IAB. The firm data comprise a panel of 4,376 establishments (or plants) from
the former West Germany observed over the period 1993-1997. The worker data comprise
a panel of 1,930,260 workers who are employed in these plants. A common establishment
identifier is available in both datasets, allowing them to be linked.[9] After eliminating observations with missing or incomplete information, the resulting linked dataset has 5,145,098
worker-year observations (the $i, t$ level). For each row in the data the identity $j$ of the
plant is recorded.

[9] Kölling (2000) provides more information on the IAB establishment panel, Bender, Haas & Klose
(2000) has details on the worker data and Alda, Bender & Gartner (2005) has details on the linked data.
Table 1: Sample sizes

                                              Rows (N*)   Workers (N)   Plants (J)   Spells (S)
Whole sample                                  5,145,098     1,930,260        4,376    1,953,774
Workers who move to other IAB plants             72,253        23,393        1,821       46,907
Workers who don't move to other IAB plants    5,072,845     1,906,867        4,376    1,906,867
Workers in plants with movement to other
  IAB plants                                  4,883,331     1,816,368        1,821    1,839,882

The first row of Table 1 reports the total sample size in terms of the total number of rows
in the data ($N^*$), the number of workers ($N$), the number of plants ($J$) and the number of
unique worker-firm combinations, or spells ($S$). Notice that the total number of spells
is only slightly greater than the total number of workers. This is a consequence of having
a sample of plants: a worker will be observed with more than one spell only if they move
from one plant in the sample to another, which is actually very unlikely.
Identification of unobserved plant-effects is driven only by those workers who change plants.
Thus an important sub-sample comprises those workers who have two or more spells
($S_i > 1$) in IAB plants (IAB movers). There are only 23,393 of these movers, and they
work in 1,821 plants. However, those 1,821 plants employ over 1.8 million workers because
they tend to be larger than plants that employ no IAB movers.
To illustrate our methods, we estimate log-wage equations using a small set of covariates
of each type ($x$, $w$, $q$ and $u$). The dependent variable $y_{it}$ is the log daily wage in Pfennigen.
Table 2 gives the sample means for the relevant variables.
Table 2: Sample means

Description                    Variable type   Variable name      Mean     S.D.
log daily wage in Pfennigen    y_it            lw                9.763    0.278
female                         u_i             female            0.213    0.409
married                        x_it            married           0.624    0.484
age                            x_it            age              39.643   10.539
age^2/100                      x_it            age_2            16.827    8.628
single-plant enterprise        q_j             single            0.269    0.443
log plant employment           w_jt            lN                7.702    1.454
(log plant employment)^2       w_jt            lN_2             61.429   22.283

A simple OLS estimate of Equation (1) provides a useful benchmark. This estimate treats
$\alpha_i$ and $\phi_j$ as part of the error term, while $\mu_t$ is estimated using a dummy for each year.
. reg lw female married age age_2 single lN lN_2 year2-year5

      Source |       SS         df       MS              Number of obs  = 5145098
-------------+--------------------------------           F( 11,5145086) =       .
       Model |  97058.6434       11  8823.51304           Prob > F       =  0.0000
    Residual |  300561.359  5145086  .058417169           R-squared      =  0.2441
-------------+--------------------------------           Adj R-squared  =  0.2441
       Total |  397620.003  5145097  .077281342           Root MSE       =   .2417

------------------------------------------------------------------------------
          lw |      Coef.   Std. Err.        t    P>|t|   [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |  -.1773087   .0002664   -665.56   0.000   -.1778308   -.1767865
     married |   .0126403    .000244     51.81   0.000    .0121621    .0131185
         age |   .0351288   .0000828    424.48   0.000    .0349666     .035291
       age_2 |  -.0356353   .0000999   -356.55   0.000   -.0358312   -.0354395
      single |  -.0317087   .0002526   -125.51   0.000   -.0322038   -.0312135
          lN |   .0793973   .0004695    169.10   0.000    .0784771    .0803176
        lN_2 |  -.0028624   .0000306    -93.53   0.000   -.0029224   -.0028024
       year2 |   .0295777   .0003233     91.49   0.000    .0289441    .0302114
       year3 |   .0723557   .0003402    212.66   0.000    .0716888    .0730226
       year4 |   .0983264   .0003471    283.26   0.000    .0976461    .0990068
       year5 |   .1078701   .0003587    300.75   0.000    .1071671     .108573
       _cons |   8.514487   .0023888   3564.36   0.000    8.509805    8.519169
------------------------------------------------------------------------------

We now wish to investigate whether any of these parameter estimates are likely to be
biased because of potential correlation between the unobserved error components and the
variables of interest.
It is straightforward to estimate two-way fixed-effects models by using the xtreg command.
The within-i transformation eliminates the unobserved worker-level component of the error
term, $\theta_i$:

. xtreg lw female married age age_2 single lN lN_2 year2-year5, fe i(i)

Fixed-effects (within) regression               Number of obs      =   5145098
Group variable (i): id                          Number of groups   =   1930260

R-sq:  within  = 0.3029                         Obs per group: min =         1
       between = 0.1088                                        avg =       2.7
       overall = 0.0964                                        max =         5

                                                F(9,3214829)       = 155200.61
corr(u_i, Xb)  = -0.6637                        Prob > F           =    0.0000

------------------------------------------------------------------------------
          lw |      Coef.   Std. Err.        t    P>|t|   [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |  (dropped)
     married |   .0061755   .0002202     28.05   0.000     .005744     .006607
         age |   .0582696   .0001161    501.69   0.000     .058042    .0584973
       age_2 |  -.0353585   .0001397   -253.12   0.000   -.0356323   -.0350847
      single |  -.0049728   .0010414     -4.78   0.000   -.0070139   -.0029316
          lN |  -.0016697   .0010828     -1.54   0.123   -.0037919    .0004526
        lN_2 |   .0010928   .0000734     14.89   0.000    .0009489    .0012366
 (output omitted )
-------------+----------------------------------------------------------------
     sigma_u |  .35916844
     sigma_e |  .06832495
         rho |  .96507601   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(1930259, 3214829) =    31.69    Prob > F = 0.0000

Note that we cannot estimate a coefficient on female because it does not vary within-i.
There are some significant changes in some of the parameter estimates. For example, the
effect of age has gone up from 0.035 to 0.058, while the effect of age squared remains the same.
This means that the estimated quadratic wage-age profile is steeper, with a much older
turning point. This implies a negative correlation between $\theta_i$ and age: older workers
have lower (unobserved) earning power. This is reflected in the large correlation (-0.6637)
between $\theta_i$ and the estimated effect of all the covariates.

The within-j transformation eliminates the unobserved plant-level component, $\psi_j$:


. xtreg lw female married age age_2 single lN lN_2 year2-year5, fe i(j)

Fixed-effects (within) regression               Number of obs      =   5145098
Group variable (i): idnum                       Number of groups   =      4376

R-sq:  within  = 0.2151                         Obs per group: min =         1
       between = 0.0000                                        avg =    1175.8
       overall = 0.1810                                        max =    102106

                                                F(10,5140712)      = 140884.11
corr(u_i, Xb)  = 0.0100                         Prob > F           =    0.0000

------------------------------------------------------------------------------
          lw |      Coef.   Std. Err.        t    P>|t|   [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |  -.1779654   .0002496   -712.90   0.000   -.1784547   -.1774761
     married |   .0247614   .0002188    113.16   0.000    .0243325    .0251902
         age |   .0327271   .0000717    456.32   0.000    .0325865    .0328676
       age_2 |  -.0336805   .0000866   -389.10   0.000   -.0338502   -.0335109
      single |  (dropped)
          lN |  -.0694038   .0031901    -21.76   0.000   -.0756563   -.0631512
        lN_2 |   .0042823    .000225     19.03   0.000    .0038412    .0047233
 (output omitted )
-------------+----------------------------------------------------------------
     sigma_u |  .35985478
     sigma_e |  .20663177
         rho |  .75204046   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(4375, 5140712) =   434.00       Prob > F = 0.0000

The coefficient estimates on female, married and age are all very similar to those from
the OLS regression, suggesting that the correlation between individual-level variables and
$\psi_j$ is not very important. However, the wage effect of plant size is now quite different from
that estimated by either the OLS or FE(i) models.

We now wish to eliminate both the unobserved worker- and plant-level error components.
The simplest way to do this is to estimate Equation (5) using FE(s):
. egen s=group(i j)
.
. xtreg lw female married age age_2 single lN lN_2 year2-year5, fe i(s)

Fixed-effects (within) regression               Number of obs      =   5145098
Group variable (i): s                           Number of groups   =   1953774

R-sq:  within  = 0.3029                         Obs per group: min =         1
       between = 0.1119                                        avg =       2.6
       overall = 0.0994                                        max =         5

                                                F(8,3191316)       = 173364.57
corr(u_i, Xb)  = -0.6669                        Prob > F           =    0.0000

------------------------------------------------------------------------------
          lw |      Coef.   Std. Err.        t    P>|t|   [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |  (dropped)
     married |   .0057338   .0002203     26.02   0.000     .005302    .0061657
         age |   .0582328   .0001164    500.43   0.000    .0580048    .0584609
       age_2 |  -.0350417   .0001396   -250.97   0.000   -.0353154    -.034768
      single |  (dropped)
          lN |  -.0102442   .0012252     -8.36   0.000   -.0126456   -.0078428
        lN_2 |   .0021531   .0000847     25.43   0.000    .0019871     .002319
 (output omitted )
-------------+----------------------------------------------------------------
     sigma_u |  .36006301
     sigma_e |    .067745
         rho |  .96581076   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(1953773, 3191316) =    31.89    Prob > F = 0.0000

Note that by defining a spell s in this way we treat as a single spell all unique combinations
of i and j. Therefore a worker who has two periods of employment with employer A,
separated by a period with employer B is treated as having just two spells in total.
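The way egen's group() function treats a returning worker can be checked on a toy example. The three-row dataset below is hypothetical and only illustrates the point made above.

```stata
* one worker observed with employer A, then B, then A again
clear
input i t str1 j
1 1 A
1 2 B
1 3 A
end

* spells are defined by unique (i, j) combinations, not by contiguous periods
egen s = group(i j)
list i t j s
```

Both periods with employer A receive the same value of s, so this worker contributes two spells rather than three.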
If the correct model is given by Equation (4), and if the error components are correlated
with the observed data then these estimates should be preferred to either of the standard
FE or OLS estimates. In this example, estimates from FE(s) are generally very close to
those from FE(i). However, we are now unable to estimate either of the coefficients on
female or single.
We now wish to estimate Equation (4), but we also want to subsequently recover our
estimates of $\theta_i$ and $\psi_j$. If we had sufficient memory, we could use the LSDV methods
outlined in Section 3.2. In our example we have $N^* = 5,145,098$, $J = 1,821$ and $K = 11$,
meaning that we require about 10GB of memory to proceed. At the time of writing this
amount of memory is not available to us (or to many researchers) and so we must use the
CMD method described in Section 3.3.
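The memory figure follows from the formula in Section 3.2, one byte per mean-deviated firm dummy plus four bytes for each of the remaining K + 1 variables. A short Stata calculation, with scalar names chosen here purely for illustration, reproduces it:

```stata
* approximate memory for the FEiLSDVj data matrix: N*J + 4*N*(K+1) bytes
scalar n_obs   = 5145098
scalar n_firms = 1821
scalar n_cov   = 11
scalar mem_bytes = n_obs*n_firms + 4*n_obs*(n_cov + 1)
display "approximate memory (GB): " %5.2f mem_bytes/1e9
```

This gives roughly 9.6 billion bytes, of which all but about 0.25 billion are taken up by the firm dummies.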
The dummy variable mover identifies workers who change plant during the sample period.
The variable plantin counts the number of workers in each plant who move.
sort i j
by i: gen byte mover = j[1]!=j[_N]
egen plantin = sum(mover), by(j)
save cmd, replace

Only workers with mover=1 contribute to estimates of $\psi_j$, and one cannot estimate $\psi_j$
for plants with plantin=0. To estimate Equation (8) we use xtreg only for movers, and
include a full set of firm dummies:
keep if mover==1
tab j, gen(F_)
local J1 = r(r)
* single is left out here: it is constant within a plant and so collinear with the plant dummies
xtreg lw married age age_2 lN lN_2 year2-year4 F_*, fe i(i)

We then save the coefficient estimates $(\hat{\beta}_1, \hat{\psi}_1)$ and the variance covariance matrix $\hat{V}_1$,
removing the constant from both:
* e(b) is a row vector, so transpose it before selecting rows by name
matrix B1 = e(b)'
matrix B1 = B1["married".."F_`J1'",1]
matrix V1 = e(V)
matrix V1 = V1["married".."F_`J1'","married".."F_`J1'"]

The process is then repeated for the non-movers. Note that we do not issue a clear
command on its own. Instead use, clear loads in the data without destroying any of the
relevant matrices in memory.
use cmd if mover==0, clear
xtreg lw married age age_2 lN lN_2 year2-year4, fe i(i)
* transpose e(b) so that BETA2 is a column vector, as rowsof() below assumes
matrix BETA2 = e(b)'
matrix BETA2 = BETA2["married".."year4",1]
local K = rowsof(BETA2)
matrix V2 = e(V)
matrix V2 = V2["married".."year4","married".."year4"]

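Now we can compute the restricted estimator $\hat{\delta}$, given by Equation (10). To do this we
need to construct the vector

$$\begin{pmatrix} \hat{\beta}_2 \\ 0 \end{pmatrix}$$

and the matrix

$$\begin{pmatrix} \hat{V}_2^{-1} & 0 \\ 0 & 0 \end{pmatrix},$$

which is achieved by adding blocks of zeros to $\hat{\beta}_2$ and $\hat{V}_2^{-1}$.
matrix B2 = BETA2\J(`J1',1,0)
matrix V2inv = J(`J1'+`K',`J1'+`K',0)
matrix V2inv[1,1] = syminv(V2)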

Equation (10) is then computed:
matrix DELTA = syminv(syminv(V1)+V2inv)*((syminv(V1)*B1)+(V2inv*B2))

and the variance-covariance matrix of $\hat{\delta}$ is given by Equation (11):
matrix VARDELTA = syminv(syminv(V1)+V2inv)
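To see what these two lines do, consider a hypothetical case with one covariate (K = 1) and one firm dummy (J = 1). The numbers below are invented for the illustration and the matrix names simply mirror those used above.

```stata
* mover regression: coefficient 1 on x and 2 on the firm dummy, diagonal covariance
matrix B1 = (1 \ 2)
matrix V1 = (0.5, 0 \ 0, 1)
* non-mover regression: coefficient 3 on x, same precision as the mover estimate
matrix BETA2 = (3)
matrix V2 = (0.5)

* pad the non-mover pieces with zeros, as in the text
matrix B2 = BETA2 \ J(1,1,0)
matrix V2inv = J(2,2,0)
matrix V2inv[1,1] = syminv(V2)

* Equations (10) and (11)
matrix DELTA = syminv(syminv(V1)+V2inv)*((syminv(V1)*B1)+(V2inv*B2))
matrix VARDELTA = syminv(syminv(V1)+V2inv)
matrix list DELTA
matrix list VARDELTA
```

Because the two estimates of the covariate coefficient are equally precise here, the restricted estimate is their simple average, 2, and its variance halves to 0.25. With a diagonal covariance matrix the firm-dummy coefficient and its variance are left unchanged at 2 and 1.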

We can label the resulting matrices using the variable names of $(\hat{\beta}_1, \hat{\psi}_1)$, and we can then
display the results in the usual format.
local rownames: rownames B1
matrix rownames DELTA = `rownames'
matrix rownames VARDELTA = `rownames'
matrix colnames VARDELTA = `rownames'
* ereturn post expects the coefficient vector as a row vector
matrix DELTA = DELTA'
ereturn post DELTA VARDELTA
ereturn display

------------------------------------------------------------------------------
             |      Coef.   Std. Err.        z    P>|z|   [95% Conf. Interval]
-------------+----------------------------------------------------------------
     married |   .0057685   .0002198     26.24   0.000    .0053377    .0061992
         age |   .0582999    .000116    502.42   0.000    .0580724    .0585273
       age_2 |  -.0351287   .0001392   -252.35   0.000   -.0354016   -.0348559
          lN |  -.0103505   .0012258     -8.44   0.000   -.0127531   -.0079479
        lN_2 |   .0021577   .0000847     25.47   0.000    .0019917    .0023237
       year2 |  -.0060567   .0000902    -67.13   0.000   -.0062335   -.0058798
       year3 |   .0145369   .0000931    156.21   0.000    .0143545    .0147193
       year4 |   .0073687   .0000972     75.80   0.000    .0071782    .0075593
         F_1 |   .1060651   .2684473      0.40   0.693    -.420082    .6322121
         F_2 |   .1250478   .2576193      0.49   0.627   -.3798769    .6299724
 (output omitted )
      F_1821 |   .4486626   .2417074      1.86   0.063   -.0250753    .9224005
------------------------------------------------------------------------------

The $\chi^2$ statistic for testing the restriction imposed by pooling movers and non-movers
is given by Equation (12).
. matrix DELTA = e(b)'
. matrix VARDELTA = e(V)
. matrix BETA = DELTA["married".."year4",1]
. matrix PSI = DELTA["F_1".."F_$J1",1]
. matrix CHI2 = ((B1-DELTA)'*syminv(V1)*(B1-DELTA) + (BETA2-BETA)'*V2inv[1..`K',1..`K']*(BETA2-BETA))
. display as text "Chi^2 statistic: " el(CHI2,1,1)
Chi^2 statistic: 317.33385
. display as text "P-value: " chi2tail(`K',el(CHI2,1,1))
P-value: 8.383e-64

Our test statistic strongly rejects the pooling hypothesis $H_0: \beta_1 = \beta_2$, namely that the
models for movers and non-movers are the same. Therefore, it would be wrong to estimate
this model by LSDV, even if one could.
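The structure of the test statistic is easier to see on a toy case. The sketch below uses hypothetical inputs with one covariate and one firm effect, builds the restricted estimator in the same way as the text, and then adds up the two quadratic forms. In this scalar case, with a diagonal V1, the result can be checked by hand against the familiar difference test, (1.0 - 1.2)^2 / (0.04 + 0.01) = 0.8.

```stata
* Hypothetical inputs: K = 1 covariate, J1 = 1 firm effect
matrix B1    = (1.0 \ 0.5)
matrix V1    = (0.04, 0 \ 0, 0.25)
matrix BETA2 = (1.2)
matrix V2    = (0.01)
local K = rowsof(BETA2)

* restricted estimator, built as in the text
matrix V2inv = J(2,2,0)
matrix V2inv[1,1] = syminv(V2)
matrix DELTA = syminv(syminv(V1)+V2inv)*(syminv(V1)*B1 + V2inv*(BETA2\J(1,1,0)))
matrix BETA  = DELTA[1..`K',1..1]

* distance of each unrestricted estimate from the restricted one
matrix CHI2 = (B1-DELTA)'*syminv(V1)*(B1-DELTA) + (BETA2-BETA)'*V2inv[1..`K',1..`K']*(BETA2-BETA)
display "Chi^2 statistic: " el(CHI2,1,1)
display "P-value:         " chi2tail(`K', el(CHI2,1,1))
```

The statistic comes out at approximately 0.8 with one degree of freedom, so pooling would not be rejected for these made-up numbers.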

Post-estimation analysis of error components


The CMD method allows us to recover the estimates of $\theta_i$ and $\psi_j$ so they can be analysed
and possibly used in auxiliary regressions such as (15) and (16).
We have a vector $\hat{\psi}$ which contains the firm-level error component for each firm which has
a mover. Using Equation (13) we can map this back to the data in the following way.
use cmd, clear
egen j1 = group(j) if plantin>0
generate psi=.
forvalues j=1(1)$J1 {
quietly replace psi = PSI[`j',1] if j1==`j'
}
assert psi==. if plantin==0

The variable psi now contains the appropriate value of $\hat{\psi}$. Note that we need a new
variable j1 which contains the index only for those firms with movement. We then assert
that plants with no movers do not have an estimated value of $\psi_j$.
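The mapping can be checked on a few rows of made-up data. In the sketch below the plant identifiers, the plantin values and the two entries of PSI are all hypothetical, and the local macro J1 stands in for the global used in the text. The point to note is that egen's group() numbers the plants in sorted order of j, so the loop assumes that the rows of PSI are stored in that same order.

```stata
* Toy data: plant 25 has no movers, plants 10 and 31 do (hypothetical values)
clear
input j plantin
10 2
10 2
25 0
31 1
31 1
end

* hypothetical estimates for the two plants with movers, in sorted order of j
matrix PSI = (0.10 \ -0.20)
local J1 = rowsof(PSI)

* index only the plants with movers, then copy each estimate across
egen j1 = group(j) if plantin>0
generate psi = .
forvalues k = 1/`J1' {
    quietly replace psi = PSI[`k',1] if j1==`k'
}

assert psi==. if plantin==0
list j plantin j1 psi
```

Plant 10 receives the first row of PSI, plant 31 the second, and plant 25 is left missing.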
An additional complication is that estimates of $\psi_j$ cannot be directly compared across
groups, as defined on Page 8. This is because it is arbitrary which $\psi_j$ is set equal to
zero for normalisation in each group. The same issue applies to the resulting $\theta_i$. Abowd,
Creecy & Kramarz (2002) therefore suggest normalising estimates of $\psi_j$ so they have the same
mean across groups. To do this we must first define the groups. We have written an ado
file, grouping, which creates a new variable recording which group each firm is in. The syntax of
grouping is very simple:
grouping newvarname , ivar(varname) jvar(varname)
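As a rough indication of what a grouping step involves, the following sketch labels connected sets of workers and plants in a toy dataset by repeatedly passing the smallest label along worker and plant links until nothing changes. It is an illustration of the general idea only and should not be read as the code of grouping; the data and the variable names g, gi, gj and grp are hypothetical.

```stata
* Toy linked data: one row per worker-plant match (hypothetical values)
clear
input i j
1 1
1 2
2 2
3 3
4 3
4 4
5 5
end

* start with each plant as its own group, then merge along worker moves
generate long g = j
local changed = 1
while `changed' {
    quietly {
        * smallest label among the plants each worker has been in
        egen long gi = min(g), by(i)
        * smallest label among the workers each plant has employed
        egen long gj = min(gi), by(j)
        count if gj != g
        local changed = r(N)
        replace g = gj
        drop gi gj
    }
}

* relabel the groups 1, 2, ...
egen grp = group(g)
tabulate grp
```

In this toy example there are three groups. Plants 1 and 2 are connected because worker 1 is observed in both, plants 3 and 4 are connected through worker 4, and plant 5 forms a group of its own with no movement.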

In our data we have 33 groups, as shown below.


. grouping g, ivar(i) jvar(j)
New variable g contains grouping indicator
Group 1: 4857672 person-years allocated to groups
Group 2: 4860228 person-years allocated to groups
Group 3: 4862179 person-years allocated to groups
Group 4: 4863445 person-years allocated to groups
Group 5: 4863770 person-years allocated to groups
Group 6: 4865494 person-years allocated to groups
(output omitted )
Group 30: 4881458 person-years allocated to groups
Group 31: 4882145 person-years allocated to groups
Group 32: 4882738 person-years allocated to groups
Group 33: 4883331 person-years allocated to groups

Note, however, that almost all rows in the data belong to group 1. All workers in plants
with movement to other plants are allocated a group. To normalise the estimates of $\psi_j$
across groups:

egen psigbar = mean(psi), by(g)


replace psi = psi-psigbar
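Because the data are at the person-year level, the group mean computed by egen weights each plant by its number of rows. The sketch below, on hypothetical data, shows this next to a mean that counts each plant once; psigbar_plant, first and psi_norm are illustrative names, and which weighting is appropriate depends on the application.

```stata
* Toy person-year data: plant 1 has three rows, the others one (hypothetical)
clear
input g j psi
1 1 0.0
1 1 0.0
1 1 0.0
1 2 0.4
2 3 0.2
2 4 0.6
end

* group mean over all rows, as in the text (weighted by person-years)
egen psigbar = mean(psi), by(g)

* group mean counting each plant once
egen byte first = tag(g j)
egen psigbar_plant = mean(cond(first, psi, .)), by(g)

* normalised plant effect using the row-weighted mean
generate psi_norm = psi - psigbar
list, sepby(g)
```

In group 1 the row-weighted mean is 0.1 while the plant-weighted mean is 0.2; in group 2 the two coincide at 0.4 because each plant contributes one row.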

To recover estimates of $\theta_i$ we can use Equation (14). The easiest way to implement this
in Stata is to use the matrix score command:
matrix x = DELTA["married".."year4",1]'
matrix score xb = x
gen theta_it = lw - xb - psi
egen theta = mean(theta_it), by(i)
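A small worked example may help to show what matrix score expects. In the sketch below the data and the coefficient values are hypothetical. The coefficient matrix has to be a row vector whose column names match variables in the data; matrix score then forms the linear combination for every row.

```stata
* Toy data: worker 1 stays in one plant, worker 2 changes plant (hypothetical)
clear
input i lw married age psi
1 2.0 0 1.0  0.1
1 2.3 1 1.1  0.1
2 1.5 0 0.8 -0.2
2 1.6 0 0.9  0.0
end

* hypothetical coefficients, stored as a named row vector
matrix x = (0.05, 0.5)
matrix colnames x = married age

* xb = 0.05*married + 0.5*age for each row
matrix score xb = x

* remove the observed part and the plant effect, then average within worker
generate theta_it = lw - xb - psi
egen theta = mean(theta_it), by(i)
list
```

For worker 1 the two residuals are 1.4 and 1.6, so theta is 1.5 up to rounding. For worker 2 the plant effect differs across the two rows, which is why psi is subtracted before averaging.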

Finally, we can estimate the auxiliary regressions and see whether the components are
themselves correlated.
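. bysort i: gen t=_n
. regress theta female if t==1, robust

Regression with robust standard errors                 Number of obs = 1816368
                                                       F(  1,1816366) =10679.67
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.0071
                                                       Root MSE      =  .35695

------------------------------------------------------------------------------
             |               Robust
       theta |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |  -.0724758   .0007013  -103.34   0.000    -.0738503   -.0711012
       _cons |   7.986986    .000287        .   0.000     7.986423    7.987548
------------------------------------------------------------------------------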

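. bysort j: gen n=_n
. regress psi single if n==1, robust

Regression with robust standard errors                 Number of obs =    1821
                                                       F(  1,  1819) =    8.06
                                                       Prob > F      =  0.0046
                                                       R-squared     =  0.0048
                                                       Root MSE      =  .17472

------------------------------------------------------------------------------
             |               Robust
         psi |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      single |   -.024473   .0086226    -2.84   0.005    -.0413842   -.0075618
       _cons |   .0105364   .0046327     2.27   0.023     .0014505    .0196223
------------------------------------------------------------------------------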

The first regression sample comprises those 1,816,368 workers who work in the 1,821 plants
for which we can estimate $\psi_j$ (see Table 1). The second regression sample comprises one
observation for each of these 1,821 plants.
The coefficients on female and single are both smaller than those estimated from the
original OLS regression, suggesting that these original estimates were biased. However, one
should be aware that these auxiliary regressions impose the usual identifying assumptions
that the unobserved component of the error is uncorrelated with the observed component,
so $\mathrm{Cov}(u_i, \alpha_i) = 0$ and $\mathrm{Cov}(q_j, \phi_j) = 0$.

Conclusion

We have shown how, using standard Stata code, it is possible to estimate fixed effects
three-way error components models.
There are two points worth emphasising. The first is that researchers who are interested in
estimating unobserved i- and j-level heterogeneity, and who have a large number of j-level
units, must use the Direct Least Squares algorithm of Abowd, Creecy & Kramarz (2002). In
this paper we explain how the researcher can make the feasible number of plants as large
as possible without having to resort to the Direct Least Squares algorithm. Our CMD
method is virtually identical to the correct FEiLSDVj method, and differs only because
the error variances are different in the mover and non-mover regressions.10
The second point is that the estimates of $\psi_j$ rely entirely on
workers who change plants, as in any fixed-effects model. If one has a sample of plants,
as here, there are very few movers (we have 1.9 million workers, but only 23,000 movers).
The estimates of $\psi_j$ therefore need interpreting with caution.
Researchers who are not interested in estimating the worker and firm heterogeneities
themselves, but merely wish to control for them, will find spell-level fixed effects very straightforward to use.

Acknowledgments
The authors thank the IAB (Institut für Arbeitsmarkt- und Berufsforschung, Nürnberg)
for kindly supplying the data, in particular, Lutz Bellmann and Stephan Bender. Financial
support from the British Academy under Grant SG-35691 is also gratefully acknowledged.
The views expressed in this paper are solely those of the authors and are not those of
the IAB. Comments from presentations at the Symposium of Multisource Databases, Universität Erlangen-Nürnberg,
July 2004, the 10th Annual Stata Users Group conference,
London 2004, the IAB, the Institute of Social and Economic Research at Essex, and the
Departments of Economics at Aberdeen, Kent, Manchester, and Warwick are gratefully
acknowledged. The usual disclaimer applies. All calculations were performed with Stata
9 SE and the code is available from
http://www.nottingham.ac.uk/economics/staff/details/richard upward.html.
10 In fact, these are estimated as 0.08512 and 0.0682 respectively.

References
Abowd, J., Creecy, R. & Kramarz, F. (2002), Computing person and firm effects using
linked longitudinal employer-employee data, Technical Paper 2002-06, U.S. Census
Bureau.
Abowd, J., Kramarz, F. & Margolis, D. (1999), High wage workers and high wage firms,
Econometrica 67, 251-333.
Alda, H., Bender, S. & Gartner, H. (2005), The linked employer-employee dataset of the
IAB (LIAB), IAB Discussion Paper 06/2005.
Baltagi, B. (2005), Econometric analysis of panel data, 3rd edn, John Wiley.
Bender, S., Haas, A. & Klose, C. (2000), The IAB employment subsample 1975-1995:
Opportunities for analysis provided by the anonymised sample, DP 117, IZA.
Chamberlain, G. (1984), Panel data, in Z. Griliches & M. Intriligator, eds, Handbook
of Econometrics, Vol. 2, Elsevier, Amsterdam, chapter 22, pp. 1247-1318.
Kölling, A. (2000), The IAB establishment panel, Schmollers Jahrbuch: Zeitschrift für
Wirtschafts- und Sozialwissenschaften 120, 291-300.
Wooldridge, J. (2002), The econometric analysis of cross section and panel data, MIT
Press.
