
Practical fixed effects estimation methods for the
three-way error components model

Martyn Andrews
University of Manchester
Thorsten Schank
Universität Erlangen-Nürnberg
Richard Upward
University of Nottingham
July 21, 2005
Abstract
Methods for fixed effects estimation of the three-way error component model are
not yet standard. In this paper, we make the fixed effects methods developed originally
by Abowd, Kramarz & Margolis (1999) for linked worker-firm data more accessible,
where possible, and show how they can be implemented in Stata. There is a caveat
to these methods, however. If the researcher wishes to recover estimates of the error
components themselves, and the number of units at the higher level of aggregation is
large, memory or matrix constraints may make estimation of the components themselves infeasible using Stata.
1 Introduction
Panel data with three or more dimensions of variation are increasingly available to researchers in various fields. One might have data on workers observed over time and also
the firms in which they work. Alternatively, one might have children in schools or patients
in hospitals. These data are also commonly described as being multilevel or hierarchical.
Note that, in the examples given here, the lower-level units (workers, children, patients)
are not merely nested in the higher-level. They may also move between the higher-level
units. Workers change their employer, for example.
In the econometric literature the linear error components model is frequently used for
panel data. If the fixed (over time) error components are assumed to be uncorrelated with
observed explanatory variables then a random effects estimator may be used. These may
be estimated in Stata in a variety of ways, including the standard xtreg command for
the two-way model and the new xtmixed command for models with a more complicated
hierarchical structure. However, in many cases it may not be desirable to impose the
assumption that the error components are uncorrelated with the observed explanatory
variables, in which case fixed effects methods are required to estimate the parameters of
interest.
It is well known that, in the linear model, fixed error components can be modelled using dummies, for example individuals, firms and time. It is also well known that, in a
two-way model with individual and time dummies, algebraic solutions are available for
the estimation of all parameters of interest, including those associated with both sets of
dummies (Baltagi 2005, Section 3.2). However, in the case of data structures such as those
considered here, there is no algebraic transformation which sweeps away all the fixed error
components in one go, and which allows them to be recovered subsequently.
Although there is a growing literature in economics, much of it based on the work of
Abowd, Kramarz & Margolis (AKM), the analysis of data with three dimensions of variation (such as linked employer-employee panel data) is not yet routine. There are various
econometric issues to overcome, which mean that routine techniques and packages cannot
be used. AKM's papers suggest these issues are quite technical. The objective of this
paper, therefore, is to make these fixed effects methods more accessible, where possible,
and then show how they can be implemented in Stata.
To illustrate these methods we focus on a dataset of workers who are observed annually,
together with the identity of the firms they work for. In Section 2, we set out the generic
model that best represents the econometrics of fixed effects models using this type of
data. In Section 3 we describe the various methods that can be used to estimate this
generic model. In Section 4, we present some illustrative results using an example dataset
from the Institut für Arbeitsmarkt- und Berufsforschung (IAB) in Germany.
2 A generic model
Consider the following linear three-way error components model:

$$y_{it} = x_{it}\beta + w_{jt}\gamma + u_i\eta + q_j\rho + \alpha_i + \phi_j + \mu_t + \epsilon_{it} \tag{1}$$
Workers are indexed $i = 1, \ldots, N$. They are observed once per period $t = 1, \ldots, T_i$ in firm
$j = 1, \ldots, J$. Workers can change firms over time. $y_{it}$ is the dependent variable, $x_{it}$ and $u_i$
are vectors of observable $i$-level covariates and $w_{jt}$ and $q_j$ are vectors of observable $j$-level
covariates. Both workers and firms are assumed to enter and exit the panel, which means
we have an unbalanced panel with $T_i$ observations per worker. There are $N^* = \sum_{i=1}^{N} T_i$
observations (worker-periods) in total.
The error components (or unobserved heterogeneities) comprise $\alpha_i$ for the worker and $\phi_j$
for the firm. The third component, $\mu_t$, represents the unobserved time effect. The error
components may be correlated with each other and with any of the observable covariates. We assume $\epsilon_{it}$ is strictly exogenous, which implies, inter alia, that workers' mobility
decisions are independent of $\epsilon_{it}$.
Note that both $\alpha_i$ and $u_i$ are time-invariant for workers and, similarly, $\phi_j$
and $q_j$ are fixed over time for firms. $x_{it}$, on the other hand, varies across $i$ and $t$, and $w_{jt}$
varies across $j$ and $t$. However, because the data are recorded at the $(i, t)$ level, firm-level
covariates also vary at that level. Thus we should strictly write $w_{J(i,t)t}$ and $q_{J(i,t)}$, where
$J(i, t)$ is the function which maps worker $i$ to firm $j$ at time $t$.
Equation (1) therefore contains all four possible types of covariate which a researcher might
have about workers and firms. There are K observed covariates in total.
We emphasise that it is usual to assume that the heterogeneity terms $\alpha_i$ and $\phi_j$ are
correlated with the observables. This means that random effects methods are inconsistent,
and so fixed effects methods are needed to estimate the parameters of interest. This, in
turn, means that $[\eta, \rho]$, the parameter vector associated with the time-invariant variables,
is not identified. Rather than dropping $[u_i, q_j]$, it is usual to define
$$\theta_i \equiv \alpha_i + u_i\eta \tag{2}$$
and
$$\psi_j \equiv \phi_j + q_j\rho \tag{3}$$
giving
$$y_{it} = x_{it}\beta + w_{jt}\gamma + \theta_i + \psi_j + \epsilon_{it}. \tag{4}$$
In the next section we describe how to estimate the parameters of Equation (4) using
various fixed effects methods. We assume throughout that the unobserved time component
$\mu_t$ is to be treated as fixed and estimated directly by using time dummies.[1] This is why we
have dropped $\mu_t$ from (4); we have subsumed these time dummies into one of the vectors
of observable covariates. This means that, hereafter, we are essentially analysing a two-way
model.
3 Econometric methods
3.1 Spell Fixed Effects
If one is not interested in the estimates of $\theta_i$ and $\psi_j$ themselves, or in estimating the
parameters on the time-invariant variables $u_i$ and $q_j$, consistent estimates of $\beta$ and $\gamma$ from
Equation (4) are straightforward to obtain by taking differences or by time-demeaning
within each unique worker-firm combination (or "spell"). This is because for each spell
of a worker within a firm neither $\theta_i$ nor $\psi_j$ varies. Defining $\lambda_s \equiv \theta_i + \psi_j$ as spell-level
heterogeneity, which is swept out by subtracting averages at the spell-level, both $\theta_i$ and
$\psi_j$ have disappeared:
$$y_{it} - \bar{y}_s = (x_{it} - \bar{x}_s)\beta + (w_{jt} - \bar{w}_s)\gamma + (\epsilon_{it} - \bar{\epsilon}_s). \tag{5}$$
The effects of $u_i$ and $q_j$ are not identified, because $u_i - \bar{u}_s = 0$ and $q_j - \bar{q}_s = 0$. In
addition, any variable $x_{it}$ or $w_{jt}$ which is constant within a spell will also not be identified.
One observation per spell is used up in identifying each spell fixed effect.
[1] This will always be practical so long as the time dimension of the panel is relatively short, which it
usually is with these kinds of data.
This is essentially the first method that AKM discuss (Abowd et al. 1999, Section 3.3),
except they use differences rather than mean-deviations. We call this method Spell Fixed
Effects or FE(s). Because one cannot separate the worker and firm heterogeneities, AKM
do not pursue this method further. Given estimates $\hat{\lambda}_s$, one cannot recover $\hat{\theta}_i$ and $\hat{\psi}_j$.
It is worth emphasising, however, that for many researchers this spell fixed effects method
is a practical and simple solution which does not present any computational difficulty,
providing the researcher is not interested in analysing the heterogeneity post-estimation.
It is trivial to implement in Stata using the xtreg, fe command.
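As a minimal sketch of the spell fixed effects idea, the following builds a small artificial panel and estimates the slope within worker-firm spells. All variable names, the data-generating process and the parameter values are hypothetical and are not taken from the IAB data; the random draws use invnormal(uniform()) so that the sketch should also run on older Stata releases.

```stata
* Toy illustration of FE(s); every name and number here is made up
clear
set seed 2005
set obs 50                              // 50 workers
generate i = _n
expand 4                                // 4 annual observations per worker
bysort i: generate t = _n
generate j = 1 + mod(i, 2)              // workers start in firm 1 or 2
replace j = 3 if mod(i, 5) == 0 & t > 2 // every fifth worker moves to firm 3
generate x = invnormal(uniform())
generate y = 0.5*x + 0.01*i + 0.2*j + 0.1*invnormal(uniform())
egen s = group(i j)                     // one identifier per worker-firm spell
xtreg y x, fe i(s)                      // within-spell estimate of the slope on x
```

Because the worker term (0.01*i) and the firm term (0.2*j) are both constant within a spell, they are swept out together, which is why neither can be recovered separately from this regression.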
3.2 Least squares dummy variables (LSDV)
The spell fixed effects method outlined above is not useful if one wishes to recover estimates
of $\theta_i$ and $\psi_j$. This might be important if one wants to analyse these terms themselves,
or if one wants to recover estimates of $\eta$ and $\rho$ using Equations (2) and (3) respectively.
An alternative is to use a Least Squares Dummy Variables (LSDV) estimator. The LSDV
estimates of $\theta_i$ are inconsistent, although unbiased.[2] The estimates of $\psi_j$ have the same
properties as those of $[\beta, \gamma]$.
However, direct estimation of (4) using dummy variables when the dataset is large will not
usually be feasible, since this is a model with approximately $K + N + J$ parameters. In
the two-way model this problem is circumvented by using the within transformation which
sweeps out the $i$-level heterogeneity. But in this model there is no algebraic transformation
of the observables that sweeps away both heterogeneity terms in one go and which allows
them to be recovered subsequently. This is because of the lack of patterning between
workers and the firms they work for.[3] To circumvent this problem, AKM note that explicitly including dummy variables for the firm heterogeneity, but sweeping out the worker
heterogeneity algebraically, gives exactly the same solution as the LSDV estimator.[4]
[2] See Wooldridge (2002, ch. 10) for assumptions and properties of panel data models.
[3] More precisely, sort the data by workers and the firm dummies are unpatterned; sort the data by firms,
and the worker dummies are unpatterned.
[4] In linear models, there is no distinction between removing the heterogeneity algebraically or adding
two full sets of dummy variables, for workers and firms, and so the terminology LSDV applies to both.
More precisely, the researcher must generate a dummy variable for each firm:
$$F^j_{it} = 1(J(i, t) = j), \qquad j = 1, \ldots, J,$$
where $1(\cdot)$ is the dummy variable indicator function and the function $J(i, t) = j$ maps
worker $i$ at time $t$ to firm $j$. Now substitute
$$\psi_{J(i,t)} = \sum_{j=1}^{J} \psi_j F^j_{it} \tag{6}$$
into Equation (4). The $\theta_i$ are removed by time-demeaning (or differencing) over $i$:
$$y_{it} - \bar{y}_i = (x_{it} - \bar{x}_i)\beta + (w_{jt} - \bar{w}_i)\gamma + \sum_{j=1}^{J} \psi_j (F^j_{it} - \bar{F}^j_i) + (\epsilon_{it} - \bar{\epsilon}_i). \tag{7}$$
This means that $J$ de-meaned (or differenced) firm dummies actually need creating.[5] To
distinguish this estimator from LSDV above, we label this estimator FEiLSDVj. They are
identical estimators, but differ in how they are computed.
It is worth emphasising that firm dummies are no different from any multi-category
dummy, so long as workers can move from one category to another over time (e.g. region dummies, but not ethnicity dummies). $F^j_{it} - \bar{F}^j_i$ will be zero for all $J$ dummies for
any worker $i$ who does not change firm. Furthermore, if we have a sample of firms (rather
than the population), $F^j_{it} - \bar{F}^j_i$ will only be non-zero for workers who change from one firm
within the sample to another firm in the sample. This means that in many cases only
a tiny proportion of workers have any non-zero terms. Identification of $\psi_j$ is driven by
the total number of such movers in each firm $j$. Some small firms may have no movers,
in which case $\psi_j$ is not identified. Other small firms may have only a very few movers,
in which case estimates of $\psi_j$ will be very imprecise. This means that it may not be
sensible to estimate $\psi_j$ for small firms, and instead one should group small firms together
(this is what AKM and others do).
There are two potential computational problems with this estimator. The first is the
number of firms $J$, because the software needs to invert a matrix of dimension $(K + J) \times (K + J)$. For many applications, the number of firms is sufficiently small that FEiLSDVj is
computationally feasible. However, some datasets have tens of thousands of firms, or even
hundreds of thousands. The second problem is the requirement that one must create and
store $J$ mean-deviations for $N^*$ observations, meaning that the data matrix is $N^* \times (K + J)$.
This may be prohibitively large for software packages which store all data in memory, such
as Stata.
[5] Differencing is ignored hereafter. There are various reasons why it is easier to implement the covariance
transformation. Normally, the decision whether to estimate the model in first differences or use the
covariance transform depends on which gives the more efficient estimates. Both estimators are consistent.
See Wooldridge (2002, Section 10.6.3).
Some improvement in the storage efficiency of the $J$ mean-deviated firm dummies can be
achieved in Stata by using the lowest common multiple of all values of $T_i$ if the panel is
sufficiently short. For example, if the data span a maximum of 5 years then $T_i$ can be
any value from $\{1, 2, 3, 4, 5\}$. Multiplying $F^j_{it} - \bar{F}^j_i$ by the lowest common multiple (in this
case 60) yields a set of integers which can be stored in Stata as single bytes rather than 4- or 8-byte floating-point numbers.[6] To implement this method the researcher would need to create the
mean deviations manually and use OLS on the transformed data, rather than relying on
xtreg.
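The byte-storage idea can be sketched on a tiny artificial dataset as follows. The variable names (d_F_1, m_F_1 and so on) and the five input rows are hypothetical; the only ingredients taken from the text are the within-worker mean deviation and the multiplier of 60.

```stata
* Sketch: store 60*(F - within-worker mean of F) as a byte; names are made up
clear
input i t j
1 1 1
1 2 1
1 3 2
2 1 2
2 2 2
end
tabulate j, generate(F_)                 // F_1, F_2: one dummy per firm
foreach v of varlist F_* {
    egen double m_`v' = mean(`v'), by(i) // worker-level mean of the dummy
    generate byte d_`v' = round(60*(`v' - m_`v'))  // integer in [-60, 60]
    drop m_`v'
}
list i t j d_F_1 d_F_2
```

Worker 1 spends two of three periods in firm 1, so the scaled deviations for F_1 are 20, 20 and -40; worker 2 never moves, so all of that worker's deviations are zero. Because the dummies have been multiplied by 60, a coefficient estimated on a scaled dummy would need to be multiplied by 60 to be read on the scale of $\psi_j$.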
The memory requirements of the data matrix for the FEiLSDVj estimator are then approximately $(N^* \times J) + 4[N^* \times (K + 1)]$ bytes. We require $N^* \times J$ bytes for the mean-deviated
firm dummies and $4[N^* \times (K + 1)]$ bytes for the remaining $K$ explanatory variables and the
dependent variable, assuming each is stored in 4 bytes.
3.3 A Classical Minimum Distance (CMD) method
The FEiLSDVj method requires the inversion of a potentially very large $(K + J) \times (K + J)$
cross-product matrix, and, in addition, enough memory to store J mean-deviated firm
dummies across N observations. To circumvent the second of these problems, we propose
the following method, based on the fact that only movers between firms identify firm
effects.
We separate the model into observations for movers and non-movers, subscripted by 1
and 2 respectively. There are $N^*_1$ mover observations and $N^*_2$ non-mover observations.
We then write Equation (4) in matrix notation, where each model is estimated separately:[7]
$$\tilde{y}_1 = \tilde{X}_1\beta_1 + \tilde{F}_1\psi_1 + \epsilon_1 \tag{8}$$
$$\tilde{y}_2 = \tilde{X}_2\beta_2 + \epsilon_2. \tag{9}$$
Note that $\tilde{y}_1$, $\tilde{y}_2$, $\tilde{X}_1$, $\tilde{X}_2$ and $\tilde{F}_1$ have all been mean-deviated and defined viz $\tilde{y}_1 = M_D y_1$
etc, where $M_D \equiv I - D(D'D)^{-1}D'$ (with $D$ the matrix of worker dummies). Denote the variances of the two error terms as $\sigma_1^2$
and $\sigma_2^2$. We now drop all columns of $\tilde{F}_1$ that are the zero vector, that is the $J_2$ firms that
have no turnover. By definition, $\tilde{F}_2 \equiv 0$.
Because there are often very few movers, eliminating $\tilde{F}_2 \equiv 0$ from the model means that,
by estimating the model for movers and non-movers separately, the memory constraints
noted above are side-stepped. The Classical Minimum Distance (CMD) estimator forms
a restricted estimator for $\beta$ and $\psi$ from $\hat{\beta}_1$, $\hat{\beta}_2$ and $\hat{\psi}_1$.[8]
In general, denote $\pi$ as the $S \times 1$ unrestricted parameter vector and denote $\delta$ as the
$P \times 1$ restricted parameter vector. The constraint is $\pi = h(\delta)$. In CMD estimation, one
estimates $\hat{\pi}$ and then finds a $\hat{\delta}$ such that the distance between $\hat{\pi}$ and $h(\hat{\delta})$ is minimised.
An efficient CMD estimator uses any consistent estimator of the asymptotic covariance matrix
$V$, denoted $\hat{V}$, to act as weighting matrix for the distance between $\hat{\pi}$ and $h(\delta)$. In other
words, Efficient CMD solves:
$$\min_{\delta}\; [\hat{\pi} - h(\delta)]' \hat{V}^{-1} [\hat{\pi} - h(\delta)],$$
whose solution is
$$\hat{\delta} = (H'\hat{V}^{-1}H)^{-1} H'\hat{V}^{-1}\hat{\pi},$$
when the mapping from $\delta$ to $\pi$ is linear: $\pi = H\delta$. Also, the appropriate estimator of
$\mathrm{Avar}(\hat{\delta})$ with which to conduct inference is
$$\widehat{\mathrm{Avar}}(\hat{\delta}) = [H'\,\widehat{\mathrm{Avar}}(\hat{\pi})^{-1} H]^{-1} = [H'\hat{V}^{-1}H]^{-1}.$$
A test of the validity of the restrictions is given by Wooldridge (2002, Eqn. 14.76):
$$[\hat{\pi} - h(\hat{\delta})]' \hat{V}^{-1} [\hat{\pi} - h(\hat{\delta})] \sim \chi^2(S - P).$$
[6] Storing the mean-deviated firm dummies as integers also appears to improve the accuracy of the matrix
inversion procedure.
[7] We dispense with the distinction between $x_{it}$ variables and $w_{jt}$ variables.
[8] See Wooldridge (2002, ch. 14.6) for further details.
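The closed form for the linear case follows from the first-order condition of the quadratic distance. A short derivation, writing $\pi$ for the unrestricted vector and $\delta$ for the restricted vector, and assuming that $H'\hat{V}^{-1}H$ is invertible and $\hat{V}$ is symmetric:

```latex
\begin{align*}
Q(\delta) &= (\hat{\pi} - H\delta)' \hat{V}^{-1} (\hat{\pi} - H\delta), \\
\left.\frac{\partial Q}{\partial \delta}\right|_{\delta = \hat{\delta}}
  &= -2 H' \hat{V}^{-1} (\hat{\pi} - H\hat{\delta}) = 0, \\
H' \hat{V}^{-1} H \,\hat{\delta} &= H' \hat{V}^{-1} \hat{\pi}, \\
\hat{\delta} &= (H' \hat{V}^{-1} H)^{-1} H' \hat{V}^{-1} \hat{\pi}.
\end{align*}
```

This has the same form as a generalised least squares regression of $\hat{\pi}$ on $H$ with weighting matrix $\hat{V}^{-1}$.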
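For the model at hand, the constraint $\pi = H\delta$ is written:
$$\begin{pmatrix} \beta_1 \\ \psi_1 \\ \beta_2 \end{pmatrix} = \begin{pmatrix} I_K & 0 \\ 0 & I_J \\ I_K & 0 \end{pmatrix} \begin{pmatrix} \beta \\ \psi \end{pmatrix}$$
where $\pi$ is $(2K + J) \times 1$, $\delta$ is $(K + J) \times 1$, and $H$ is $(2K + J) \times (K + J)$. The appropriate
asymptotic covariance matrix is:
$$\hat{V} = \begin{pmatrix} \hat{V}_1 & 0 \\ 0 & \hat{V}_2 \end{pmatrix} = \begin{pmatrix} \hat{\sigma}_1^2 \begin{pmatrix} \tilde{X}_1'\tilde{X}_1 & \tilde{X}_1'\tilde{F}_1 \\ \tilde{F}_1'\tilde{X}_1 & \tilde{F}_1'\tilde{F}_1 \end{pmatrix}^{-1} & 0 \\ 0 & \hat{\sigma}_2^2 (\tilde{X}_2'\tilde{X}_2)^{-1} \end{pmatrix}$$
Given the general expressions immediately above, it follows that the restricted estimator
$\hat{\delta} = [H'\hat{V}^{-1}H]^{-1} H'\hat{V}^{-1}\hat{\pi}$ is given by:
$$\hat{\delta} = \left[\hat{V}_1^{-1} + \begin{pmatrix} \hat{V}_2^{-1} & 0 \\ 0 & 0 \end{pmatrix}\right]^{-1} \left[\hat{V}_1^{-1} \begin{pmatrix} \hat{\beta}_1 \\ \hat{\psi}_1 \end{pmatrix} + \begin{pmatrix} \hat{V}_2^{-1} & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} \hat{\beta}_2 \\ 0 \end{pmatrix}\right] \tag{10}$$
and that
$$\widehat{\mathrm{Avar}}(\hat{\delta}) = [H'\hat{V}^{-1}H]^{-1} = \left[\hat{V}_1^{-1} + \begin{pmatrix} \hat{V}_2^{-1} & 0 \\ 0 & 0 \end{pmatrix}\right]^{-1} \tag{11}$$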
a $(K + J) \times (K + J)$ matrix. It should be emphasised that these expressions use standard
(unrobust) covariance matrices. A robust version of this covariance matrix replaces $\hat{V}_1$
and $\hat{V}_2$ in Equation (11) by robust equivalents. It is wrong to do this in (10), however,
because if the true constraint could be imposed upon the model, one would not end up
with the LSDV estimator.
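A standard criticism is that movers and non-movers are different groups of individuals
and so one should model them separately. Before imposing $H_0: \beta_1 = \beta_2$, one should test
these restrictions, although this rarely happens. Under $H_0$:
$$\left[\begin{pmatrix} \hat{\beta}_1 \\ \hat{\psi}_1 \end{pmatrix} - \hat{\delta}\right]' \hat{V}_1^{-1} \left[\begin{pmatrix} \hat{\beta}_1 \\ \hat{\psi}_1 \end{pmatrix} - \hat{\delta}\right] + (\hat{\beta}_2 - \hat{\beta})' \hat{V}_2^{-1} (\hat{\beta}_2 - \hat{\beta}) \sim \chi^2(K). \tag{12}$$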
It should be emphasised that the only price paid with this approach is that one cannot
constrain $\sigma_1^2 = \sigma_2^2$. The only difference between this and the LSDV estimator arises because
$\hat{V}_1^{-1}$ and $\hat{V}_2^{-1}$ come from separate regressions.
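To make the structure of the restricted estimator concrete, here is a toy calculation with one covariate and one firm effect ($K = 1$, $J = 1$). All the numbers are invented for illustration; only the matrix formulae correspond to Equations (10) and (11).

```stata
* Toy CMD calculation with K = 1 and J = 1; all numbers are made up
matrix B1    = (0.5 \ 0.2)              // movers: slope, then firm effect
matrix V1    = (0.04, 0 \ 0, 0.09)      // movers: covariance matrix
matrix BETA2 = (0.7)                    // non-movers: slope
matrix V2    = (0.01)                   // non-movers: variance of the slope
matrix B2    = BETA2 \ J(1,1,0)         // pad with a zero for the firm effect
matrix V2inv = J(2,2,0)
matrix V2inv[1,1] = syminv(V2)          // inverse variance in the top-left block
matrix DELTA    = syminv(syminv(V1)+V2inv)*(syminv(V1)*B1 + V2inv*B2)
matrix VARDELTA = syminv(syminv(V1)+V2inv)
matrix list DELTA
matrix list VARDELTA
```

With a diagonal V1 the restricted slope is just the precision-weighted average of the two slopes, (25 × 0.5 + 100 × 0.7)/125 = 0.66, while the firm effect stays at 0.2 because the non-movers carry no information about it. When V1 has non-zero off-diagonal elements, as it generally will, the firm effects also shift.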
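3.4 Post-estimation analysis of the error components
Once estimates of $(\beta, \psi)$ have been made (either using FEiLSDVj or CMD methods), the
researcher can recover estimates of the error components themselves. First compute
$$\hat{\psi}_{J(i,t)} = \sum_{j=1}^{J} \hat{\psi}_j F^j_{it} \tag{13}$$
and then
$$\hat{\theta}_i = \bar{y}_i - \bar{\hat{\psi}}_i - \bar{x}_i\hat{\beta} - \bar{w}_i\hat{\gamma} \tag{14}$$
where $\bar{\hat{\psi}}_i$ averages $\hat{\psi}_{J(i,t)}$ over $t$ for each $i$.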

Identification of firm effects is only possible within a group, where a group is defined by
the movement of workers between firms. A group contains all the workers who have ever
worked for any of the firms in that group, and all the firms at which any of the workers
were employed. A second (unconnected) group is defined only if no firm in the first group
has ever employed any workers in the second, and no firm in the second group has ever
employed any workers in the first. It is not possible to identify one firm effect per group because
within each group the mean-deviated firm dummies sum to zero. Some normalisation is
therefore required between groups. In Section 4 we show how to identify groups and how
to perform this normalisation.
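The definition of a group can be turned into a simple iterative labelling scheme. The sketch below is not the authors' ado-file; it is one possible way to find connected sets of workers and firms, shown on four hypothetical rows, and the variable names g, gi and gj are invented for the example.

```stata
* Sketch of a grouping algorithm on toy data; not the authors' implementation
clear
input i j
1 1
1 2
2 2
3 3
end
generate long g = j                     // start with each firm in its own group
local changed = 1
while `changed' > 0 {
    egen long gi = min(g), by(i)        // lowest label among a worker's firms
    egen long gj = min(gi), by(j)       // lowest label among a firm's workers
    quietly count if gj != g
    local changed = r(N)                // stop once no label changes
    quietly replace g = gj
    drop gi gj
}
list i j g
```

Workers 1 and 2 are linked through firm 2, so firms 1 and 2 end up sharing a label, while worker 3 and firm 3 form a separate group. On large data the number of passes depends on how long the chains of mobility are, so a purpose-built routine is likely to be faster.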
Identifying the effects of time-invariant variables

AKM suggest that one can recover estimates of $\alpha_i$ and $\phi_j$ by estimating Equations (2, 3)
as follows. Thus, one can analyse distributions of $\hat{\phi}_j$, $\hat{\alpha}_i$, specifically to see whether they
are correlated. First, run the auxiliary regressions:
$$\hat{\theta}_i = \text{const} + u_i\eta + \text{error} \tag{15}$$
$$\hat{\psi}_j = \text{const} + q_j\rho + \text{error} \tag{16}$$
which give consistent estimates of $\eta$, $\rho$ (AKM 1999, Section 3.4.4). Because $\alpha_i$ is dropped
from (2), the identifying assumption is that $\mathrm{Cov}(u_i, \alpha_i) = 0$ or else there is omitted
variables bias. Similarly, $\mathrm{Cov}(q_j, \phi_j) = 0$ is assumed in (3). One only needs $N$ observations
to estimate (2) and $J$ observations to estimate (3). AKM estimate these equations by
GLS, because of the aggregation to the firm-level. Because there are other causes of
heteroskedasticity, one could use OLS and adjust the covariance matrix for clustering at
the firm-level. Second, the researcher computes
$$\hat{\alpha}_i = \hat{\theta}_i - u_i\hat{\eta} \tag{17}$$
$$\hat{\phi}_j = \hat{\psi}_j - q_j\hat{\rho}. \tag{18}$$
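$\hat{\theta}$ and $\hat{\psi}$ can be defined at three levels of aggregation:
  $(i, t)$ level: $\hat{\theta}_i$ replicated $T_i$ times, and $\hat{\psi}_{J(i,t)}$
  $i$ level: $\bar{\hat{\psi}}_i = \sum_{t=1}^{T_i} \hat{\psi}_{J(i,t)} / T_i$
  $j$ level: $\bar{\hat{\theta}}_j = \sum_{(i,t) \in j} \hat{\theta}_i / N_j$
($N_j$ is the total number of worker-years observed in firm $j$.) AKM show that statistics
based on aggregating $\hat{\theta}_i$ and $\hat{\alpha}_i$ to the level of the firm are consistent as $T_i$ goes to infinity
(see also Chamberlain 1984).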
4 An illustrative example
To illustrate these methods we use data from a linked worker-firm dataset made available
by the IAB. The firm data comprise a panel of 4,376 establishments (or "plants") from
the former West Germany observed over the period 1993 to 1997. The worker data comprise
a panel of 1,930,260 workers who are employed in these plants. A common establishment
identifier is available in both datasets, allowing them to be linked.[9] After eliminating observations with missing or incomplete information, the resulting linked dataset has 5,145,098
worker-year observations (the $i, t$ level). For each row in the data the identity $j$ of the
plant is recorded.
[9] Kölling (2000) provides more information on the IAB establishment panel, Bender, Haas & Klose
(2000) has details on the worker data and Alda, Bender & Gartner (2005) has details on the linked data.
Table 1: Sample sizes
                                                       Rows (N*)  Workers (N)  Plants (J)  Spells (S)
Whole sample                                           5,145,098    1,930,260       4,376   1,953,774
Workers who move to other IAB plants                      72,253       23,393       1,821      46,907
Workers who don't move to other IAB plants             5,072,845    1,906,867       4,376   1,906,867
Workers in plants with movement to other IAB plants    4,883,331    1,816,368       1,821   1,839,882
The first row of Table 1 reports the total sample size in terms of the total number of rows
in the data ($N^*$), the number of workers ($N$), the number of plants ($J$) and the number of
unique worker-firm combinations, or "spells" ($S$). Notice that the total number of spells
is only slightly greater than the total number of workers. This is a consequence of having
a sample of plants: a worker will be observed with more than one spell only if they move
from one plant in the sample to another, which is actually very unlikely.
Identification of unobserved plant-effects is driven only by those workers who change plants.
Thus an important sub-sample comprises those workers who have two or more spells
($S_i > 1$) in IAB plants ("IAB movers"). There are only 23,393 of these movers, and they
work in 1,821 plants. However, those 1,821 plants employ over 1.8 million workers because
they tend to be larger than plants that employ no IAB movers.
To illustrate our methods, we estimate log-wage equations using a small set of covariates
of each type (x, w, q and u). The dependent variable yit is the log daily wage in Pfennigen.
Table 2 gives the sample means for the relevant variables.
Table 2: Sample means
Description                    Variable type   Variable name      Mean     S.D.
log daily wage in Pfennigen    y_it            lw                9.763    0.278
female                         u_i             female            0.213    0.409
married                        x_it            married           0.624    0.484
age                            x_it            age              39.643   10.539
age^2/100                      x_it            age_2            16.827    8.628
single-plant enterprise        q_j             single            0.269    0.443
log plant employment           w_jt            lN                7.702    1.454
(log plant employment)^2       w_jt            lN_2             61.429   22.283
A simple OLS estimate of Equation (1) provides a useful benchmark. This estimate treats
$\alpha_i$ and $\phi_j$ as part of the error term, while $\mu_t$ is estimated using a dummy for each year.
. reg lw female married age age_2 single lN lN_2 year2-year5

      Source |       SS         df       MS            Number of obs  = 5145098
-------------+--------------------------------         F( 11,5145086) =       .
       Model |  97058.6434      11  8823.51304         Prob > F       =  0.0000
    Residual |  300561.359 5145086  .058417169         R-squared      =  0.2441
-------------+--------------------------------         Adj R-squared  =  0.2441
       Total |  397620.003 5145097  .077281342         Root MSE       =   .2417

          lw |      Coef.   Std. Err.        t    P>|t|     [95% Conf. Interval]
-------------+------------------------------------------------------------------
      female |  -.1773087   .0002664   -665.56   0.000    -.1778308   -.1767865
     married |   .0126403    .000244     51.81   0.000     .0121621    .0131185
         age |   .0351288   .0000828    424.48   0.000     .0349666     .035291
       age_2 |  -.0356353   .0000999   -356.55   0.000    -.0358312   -.0354395
      single |  -.0317087   .0002526   -125.51   0.000    -.0322038   -.0312135
          lN |   .0793973   .0004695    169.10   0.000     .0784771    .0803176
        lN_2 |  -.0028624   .0000306    -93.53   0.000    -.0029224   -.0028024
       year2 |   .0295777   .0003233     91.49   0.000     .0289441    .0302114
       year3 |   .0723557   .0003402    212.66   0.000     .0716888    .0730226
       year4 |   .0983264   .0003471    283.26   0.000     .0976461    .0990068
       year5 |   .1078701   .0003587    300.75   0.000     .1071671     .108573
       _cons |   8.514487   .0023888   3564.36   0.000     8.509805    8.519169
We now wish to investigate whether any of these parameter estimates are likely to be
biased because of potential correlation between the unobserved error components and the
variables of interest.
It is straightforward to estimate two-way fixed-effects models by using the xtreg command.
The within-i transformation eliminates the unobserved worker-level component of the error
term, $\theta_i$:
. xtreg lw female married age age_2 single lN lN_2 year2-year5, fe i(i)

Fixed-effects (within) regression               Number of obs      =   5145098
Group variable (i): id                          Number of groups   =   1930260

R-sq:  within  = 0.3029                         Obs per group: min =         1
       between = 0.1088                                        avg =       2.7
       overall = 0.0964                                        max =         5

                                                F(9,3214829)       = 155200.61
corr(u_i, Xb)  = -0.6637                        Prob > F           =    0.0000

          lw |      Coef.   Std. Err.        t    P>|t|     [95% Conf. Interval]
-------------+------------------------------------------------------------------
      female |  (dropped)
     married |   .0061755   .0002202     28.05   0.000      .005744     .006607
         age |   .0582696   .0001161    501.69   0.000      .058042    .0584973
       age_2 |  -.0353585   .0001397   -253.12   0.000    -.0356323   -.0350847
      single |  -.0049728   .0010414     -4.78   0.000    -.0070139   -.0029316
          lN |  -.0016697   .0010828     -1.54   0.123    -.0037919    .0004526
        lN_2 |   .0010928   .0000734     14.89   0.000     .0009489    .0012366
  (output omitted)
-------------+------------------------------------------------------------------
     sigma_u |  .35916844
     sigma_e |  .06832495
         rho |  .96507601   (fraction of variance due to u_i)

F test that all u_i=0:     F(1930259, 3214829) =    31.69    Prob > F = 0.0000
Note that we cannot estimate a coefficient on female because it does not vary within-i.
There are some significant changes in some of the parameter estimates. For example, the
effect of age has gone up from 0.035 to 0.058, while the effect of age^2/100 is almost unchanged.
This means that the estimated quadratic wage-age profile is steeper, with a much older
turning point. This implies a negative correlation between $\theta_i$ and age: older workers
have lower (unobserved) earning power. This is reflected in the large correlation ($-0.6637$)
between $\theta_i$ and the estimated effect of all the covariates.
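The shift in the turning point can be checked by hand. Assuming, as in Table 2, that age_2 is age squared divided by 100, the profile b_age × age + b_age2 × age²/100 peaks where age = −100 × b_age / (2 × b_age2). A minimal sketch using the coefficients reported in the OLS and FE(i) outputs:

```stata
* Turning point of b_age*age + b_age2*(age^2/100): age = -100*b_age/(2*b_age2)
display "OLS turning point:   " -100*0.0351288/(2*(-0.0356353))
display "FE(i) turning point: " -100*0.0582696/(2*(-0.0353585))
```

These work out at roughly 49 years for OLS and roughly 82 years for FE(i), so within the observed age range the FE(i) profile is still rising.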
The within-j transformation eliminates the unobserved plant-level component, $\psi_j$:

. xtreg lw female married age age_2 single lN lN_2 year2-year5, fe i(j)

Group variable (i): idnum                       Number of obs      =   5145098
                                                Number of groups   =      4376

R-sq:  within  = 0.2151                         Obs per group: min =         1
       between = 0.0000                                        avg =    1175.8
       overall = 0.1810                                        max =    102106

                                                F(10,5140712)      = 140884.11
corr(u_i, Xb)  = 0.0100                         Prob > F           =    0.0000

          lw |      Coef.   Std. Err.        t    P>|t|     [95% Conf. Interval]
-------------+------------------------------------------------------------------
      female |  -.1779654   .0002496   -712.90   0.000    -.1784547   -.1774761
     married |   .0247614   .0002188    113.16   0.000     .0243325    .0251902
         age |   .0327271   .0000717    456.32   0.000     .0325865    .0328676
       age_2 |  -.0336805   .0000866   -389.10   0.000    -.0338502   -.0335109
      single |  (dropped)
          lN |  -.0694038   .0031901    -21.76   0.000    -.0756563   -.0631512
        lN_2 |   .0042823    .000225     19.03   0.000     .0038412    .0047233
  (output omitted)
-------------+------------------------------------------------------------------
     sigma_u |  .35985478
     sigma_e |  .20663177
         rho |  .75204046

F(4375, 5140712) =   434.00                      Prob > F = 0.0000
The coefficient estimates on female, married and age are all very similar to those from
the OLS regression, suggesting that the correlation between individual-level variables and
$\psi_j$ is not very important. However, the wage effect of plant size is now quite different from
that estimated by either the OLS or FE(i) models.
We now wish to eliminate both the unobserved worker- and plant-level error components.
The simplest way to do this is to estimate Equation (5) using FE(s):
. egen s=group(i j)
.
. xtreg lw female married age age_2 single lN lN_2 year2-year5, fe i(s)

Group variable (i): s                           Number of obs      =   5145098
                                                Number of groups   =   1953774

R-sq:  within  = 0.3029                         Obs per group: min =         1
       between = 0.1119                                        avg =       2.6
       overall = 0.0994                                        max =         5

                                                F(8,3191316)       = 173364.57
corr(u_i, Xb)  = -0.6669                        Prob > F           =    0.0000

          lw |      Coef.   Std. Err.        t    P>|t|     [95% Conf. Interval]
-------------+------------------------------------------------------------------
      female |  (dropped)
     married |   .0057338   .0002203     26.02   0.000      .005302    .0061657
         age |   .0582328   .0001164    500.43   0.000     .0580048    .0584609
       age_2 |  -.0350417   .0001396   -250.97   0.000    -.0353154    -.034768
      single |  (dropped)
          lN |  -.0102442   .0012252     -8.36   0.000    -.0126456   -.0078428
        lN_2 |   .0021531   .0000847     25.43   0.000     .0019871     .002319
  (output omitted)
-------------+------------------------------------------------------------------
     sigma_u |  .36006301
     sigma_e |    .067745
         rho |  .96581076

F(1953773, 3191316) =    31.89                   Prob > F = 0.0000
Note that by defining a spell s in this way we treat as a single spell all unique combinations
of i and j. Therefore a worker who has two periods of employment with employer A,
separated by a period with employer B is treated as having just two spells in total.
If the correct model is given by Equation (4), and if the error components are correlated
with the observed data then these estimates should be preferred to either of the standard
FE or OLS estimates. In this example, estimates from FE(s) are generally very close to
those from FE(i). However, we are now unable to estimate either of the coefficients on
female or single.
We now wish to estimate Equation (4), but we also want to subsequently recover our
estimates of $\theta_i$ and $\psi_j$. If we had sufficient memory, we could use the LSDV methods
outlined in Section 3.2. In our example we have $N^* = 5{,}145{,}098$, $J = 1{,}821$ and $K = 11$,
meaning that we require about 10GB of memory to proceed. At the time of writing this
amount of memory is not available to us (or to many researchers) and so we must use the
CMD method described in Section 3.3.
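The figure of about 10GB can be reproduced from the memory formula in Section 3.2, which counts one byte per mean-deviated firm dummy and four bytes for each of the other $K + 1$ variables. A minimal sketch of the arithmetic; the scalar names are arbitrary:

```stata
* Approximate size of the FEiLSDVj data matrix, using the figures quoted above
scalar Nstar    = 5145098               // worker-year observations
scalar nfirms   = 1821                  // firms with movers
scalar ncov     = 11                    // explanatory variables
scalar membytes = Nstar*nfirms + 4*Nstar*(ncov + 1)
display "bytes:           " %15.0f membytes
display "GB (10^9 bytes): " %5.2f membytes/1e9
```

This gives roughly 9.6 billion bytes, almost all of it taken up by the firm dummies, which is why the number of firms with movers dominates the memory requirement.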
The dummy variable mover identifies workers who change plant during the sample period.
The variable plantin counts the number of workers in each plant who move.
sort i j
by i: gen byte mover = j[1]!=j[_N]
egen plantin = sum(mover), by(j)
save cmd, replace
Only workers with mover=1 contribute to estimates of $\psi_j$, and one cannot estimate $\psi_j$
for plants with plantin=0. To estimate Equation (8) we use xtreg only for movers, and
include a full set of firm dummies:
keep if mover==1
tab j, gen(F_)
local J1 = r(r)
xtreg lw married age age_2 single lN lN_2 year2-year4 F_*, fe i(i)
We then save the coefficient estimates $(\hat{\beta}_1, \hat{\psi}_1)$ and the variance-covariance matrix $\hat{V}_1$,
removing the constant from both:
* e(b) is a row vector, so transpose it before selecting rows
matrix B1 = e(b)'
matrix B1 = B1["married".."F_`J1'",1]
matrix V1 = e(V)
matrix V1 = V1["married".."F_`J1'","married".."F_`J1'"]
The process is then repeated for the non-movers. Note that we do not issue a clear
command on its own. Instead, use ..., clear loads in the data without destroying any of the
relevant matrices in memory.
use cmd if mover==0, clear
xtreg lw married age age_2 lN lN_2 year2-year4, fe i(i)
matrix BETA2 = e(b)'
matrix BETA2 = BETA2["married".."year4",1]
local K = rowsof(BETA2)
matrix V2 = e(V)
matrix V2 = V2["married".."year4","married".."year4"]
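Now we can compute the restricted estimator $\hat{\delta}$, given by Equation (10). To do this we
need to construct the vector
$$\begin{pmatrix} \hat{\beta}_2 \\ 0 \end{pmatrix}$$
and the matrix
$$\begin{pmatrix} \hat{V}_2^{-1} & 0 \\ 0 & 0 \end{pmatrix}$$
which is achieved by adding blocks of zeros to $\hat{\beta}_2$ and $\hat{V}_2^{-1}$.
matrix B2 = BETA2\J(`J1',1,0)
matrix V2inv = J(`J1'+`K',`J1'+`K',0)
matrix V2inv[1,1] = syminv(V2)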
Equation (10) is then computed:

matrix DELTA = syminv(syminv(V1)+V2inv)*((syminv(V1)*B1)+(V2inv*B2))
and the variance-covariance matrix of $\hat{\delta}$ is given by Equation (11):
matrix VARDELTA = syminv(syminv(V1)+V2inv)
We can label the resulting matrices using the variable names from the mover regression (those attached to B1 and $\hat{V}_1$), and we can then
display the results in the usual format.
local rownames: rownames B1
matrix rownames DELTA = `rownames'
matrix rownames VARDELTA = `rownames'
matrix colnames VARDELTA = `rownames'
* ereturn post expects the coefficients as a row vector
matrix DELTA = DELTA'
ereturn post DELTA VARDELTA
ereturn display
             |      Coef.   Std. Err.        z    P>|z|     [95% Conf. Interval]
-------------+------------------------------------------------------------------
     married |   .0057685   .0002198     26.24   0.000     .0053377    .0061992
         age |   .0582999    .000116    502.42   0.000     .0580724    .0585273
       age_2 |  -.0351287   .0001392   -252.35   0.000    -.0354016   -.0348559
          lN |  -.0103505   .0012258     -8.44   0.000    -.0127531   -.0079479
        lN_2 |   .0021577   .0000847     25.47   0.000     .0019917    .0023237
       year2 |  -.0060567   .0000902    -67.13   0.000    -.0062335   -.0058798
       year3 |   .0145369   .0000931    156.21   0.000     .0143545    .0147193
       year4 |   .0073687   .0000972     75.80   0.000     .0071782    .0075593
         F_1 |   .1060651   .2684473      0.40   0.693     -.420082    .6322121
         F_2 |   .1250478   .2576193      0.49   0.627    -.3798769    .6299724
  (output omitted)
      F_1821 |   .4486626   .2417074      1.86   0.063    -.0250753    .9224005
The $\chi^2$ statistic to test the restriction imposed by pooling movers and non-movers
is given by Equation (12).
. matrix DELTA = e(b)'
. matrix VARDELTA = e(V)
. matrix BETA = DELTA["married".."year4",1]
. matrix PSI = DELTA["F_1".."F_`J1'",1]
. matrix CHI2 = ((B1-DELTA)'*syminv(V1)*(B1-DELTA) + (BETA2-BETA)'*V2inv[1..`K',1..`K']*(BETA2-BETA))
. display as text "Chi^2 statistic: " el(CHI2,1,1)
Chi^2 statistic: 317.33385
. display as text "P-value: " chi2tail(`K',el(CHI2,1,1))
P-value: 8.383e-64
Our test statistic strongly rejects the pooling hypothesis $H_0: \beta_1 = \beta_2$, namely that the
models for movers and non-movers are the same. Therefore, it would be wrong to estimate
this model by LSDV, even if one could.
Post-estimation analysis of error components

The CMD method allows us to recover the estimates of $\theta_i$ and $\psi_j$ so they can be analysed
and possibly used in auxiliary regressions such as (15) and (16).
We have a vector $\hat{\psi}$ which contains the firm-level error component for each firm which has
a mover. Using Equation (13) we can map this back to the data in the following way.
use cmd, clear
egen j1 = group(j) if plantin>0
generate psi=.
forvalues j=1(1)$J1 {
quietly replace psi = PSI[`j',1] if j1==`j'
}
assert psi==. if plantin==0
The variable psi now contains the appropriate value of $\hat{\psi}_j$. Note that we need a new
variable j1 which contains the index only for those firms with movement. We then assert
that plants with no movers do not have an estimated value of $\psi_j$.
An additional complication is that estimates of $\psi_j$ cannot be directly compared across
groups, as defined on Page 8. This is because it is arbitrary which $\psi_j$ is set equal to
zero for normalisation in each group. The same issue applies to the resulting $\theta_i$. Abowd,
Creecy & Kramarz (2002) therefore suggest normalising estimates of $\psi_j$ so they have the same
mean across groups. To do this we must first define the groups. We have written an ado
file, grouping, which creates a new variable recording which group each firm is in. The syntax of
grouping is very simple:
grouping newvarname , ivar(varname) jvar(varname)
In our data we have 33 groups, as shown below.

. grouping g, ivar(i) jvar(j)
New variable g contains grouping indicator
Group 1: 4857672 person-years allocated to groups
(output omitted )
But note that almost all rows in the data belong to group 1. All those workers in plants
with movement to other plants are allocated a group. To normalise the estimates of $\psi_j$
across groups:
egen psigbar = mean(psi), by(g)

replace psi = psi-psigbar
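In symbols, the two egen/replace lines subtract a within-group mean from each estimated firm effect. The notation below is a sketch introduced for illustration: $g$ indexes the groups created by grouping, $N_g$ is the number of person-year rows in group $g$ with a non-missing psi, and $J(i,t)$ is the plant of worker $i$ at time $t$.

```latex
% Group mean of the estimated firm effects, taken over person-year rows
\bar{\psi}_g = \frac{1}{N_g} \sum_{(i,t) \in g} \hat{\psi}_{J(i,t)}

% Normalised firm effect: mean zero within every group
\tilde{\psi}_j = \hat{\psi}_j - \bar{\psi}_{g(j)}
```

Since egen's mean() is computed over rows of the data, each plant is implicitly weighted by its number of person-year observations. An unweighted mean over plants would give a different normalisation.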
To recover estimates of $\theta_i$ we can use Equation (14). The easiest way to implement this
in Stata is to use the matrix score command:
matrix x = DELTA["married".."year4",1]'
matrix score xb = x
gen theta_it = lw - xb - psi
egen theta = mean(theta_it), by(i)
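The four lines above can be read as the following two-step calculation. This is a sketch matched to the code, not a quotation of Equation (14); the symbols $\hat{\beta}$ and $\hat{\gamma}$ for the coefficients on the worker-level and plant-level covariates are assumed notation, and $T_i$ denotes the number of observations on worker $i$.

```latex
% theta_it: wage net of observed covariates (xb) and the firm effect (psi)
\hat{\theta}_{it} = y_{it} - x_{it}\hat{\beta} - w_{J(i,t)t}\hat{\gamma}
                  - \hat{\psi}_{J(i,t)}

% theta: average of theta_it over the periods in which worker i is observed
\hat{\theta}_i = \frac{1}{T_i} \sum_{t=1}^{T_i} \hat{\theta}_{it}
```

Here xb covers every variable from married to year4, so it includes the plant-level terms lN and lN_2 and the year dummies as well as the worker-level covariates.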
Finally, we can estimate the auxiliary regressions and see whether the components are
themselves correlated.
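. bysort i: gen t=_n
. regress theta female if t==1, robust
Regression with robust standard errors                 Number of obs = 1816368
                                                       F(  1,1816366) =10679.67
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.0071
                                                       Root MSE      =  .35695
------------------------------------------------------------------------------
             |               Robust
       theta |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |  -.0724758   .0007013  -103.34   0.000    -.0738503   -.0711012
       _cons |   7.986986    .000287        .   0.000     7.986423    7.987548
------------------------------------------------------------------------------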
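. bysort j: gen n=_n
. regress psi single if n==1, robust
Regression with robust standard errors                 Number of obs =    1821
                                                       F(  1,  1819) =    8.06
                                                       Prob > F      =  0.0046
                                                       R-squared     =  0.0048
                                                       Root MSE      =  .17472
------------------------------------------------------------------------------
             |               Robust
         psi |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      single |   -.024473   .0086226    -2.84   0.005    -.0413842   -.0075618
       _cons |   .0105364   .0046327     2.27   0.023     .0014505    .0196223
------------------------------------------------------------------------------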
The first regression sample comprises those 1,816,368 workers who work in the 1,821 plants
for which we can estimate $\psi_j$ (see Table 1). The second regression sample comprises one
observation for each of these 1,821 plants.
The coefficients on female and single are both smaller than those estimated from the
original OLS regression, suggesting that these original estimates were biased. However, one
should be aware that these auxiliary regressions impose the usual identifying assumptions
that the unobserved component of the error is uncorrelated with the observed component:
$u_i$ is assumed uncorrelated with the unobserved part of the worker effect, and $q_j$ with
the unobserved part of the firm effect.
Conclusion
We have shown how, using standard Stata code, it is possible to estimate fixed effects
three-way error components models.
There are two points worth emphasising. The first is that researchers who are interested in
estimating unobserved i and j-level heterogeneity, and who have a large number of j-level
units must use the Direct Least Squares algorithm of Abowd, Creecy and Kramarz. In
this paper we explain how the researcher can make the feasible number of plants as large
as possible without having to resort to the Direct Least Squares algorithm. Our CMD
method is virtually identical to the correct FEiLSDVj method, and only differs because
the error variances are different in the mover and non-mover regressions (see footnote 10).
The second point is that the estimates of $\psi_j$ rely entirely on
workers who change plants, as in any fixed-effects model. If one has a sample of plants,
as here, there are very few movers (we have 1.9 million workers, but only 23,000 movers).
The estimates of $\psi_j$ therefore need interpreting with caution.
For researchers who are not interested in estimating the worker and firm heterogeneities
themselves, but merely wish to control for them, spell-level fixed effects is very straightforward to use.
Acknowledgments
The authors thank the IAB (Institut für Arbeitsmarkt- und Berufsforschung, Nürnberg)
for kindly supplying the data, in particular, Lutz Bellmann and Stephan Bender. Financial
support from the British Academy under Grant SG-35691 is also gratefully acknowledged.
The views expressed in this paper are solely those of the authors and are not those of
the IAB. Comments from presentations at the Symposium of Multisource Databases, Universität
Erlangen-Nürnberg, July 2004, the 10th Annual Stata Users Group conference,
London 2004, the IAB, the Institute of Social and Economic Research at Essex, and the
Departments of Economics at Aberdeen, Kent, Manchester, and Warwick are gratefully
acknowledged. The usual disclaimer applies. All calculations were performed with Stata
9 SE and the code is available from
http://www.nottingham.ac.uk/economics/staff/details/richard upward.html.
10. In fact, these are estimated as 0.08512 and 0.0682 respectively.
References
Abowd, J., Creecy, R. & Kramarz, F. (2002), Computing person and firm effects using linked longitudinal employer-employee data, Technical Paper 2002-06, U.S. Census Bureau.
Abowd, J., Kramarz, F. & Margolis, D. (1999), High wage workers and high wage firms, Econometrica 67, 251-333.
Alda, H., Bender, S. & Gartner, H. (2005), The linked employer-employee dataset of the IAB (LIAB), IAB Discussion Paper 06/2005.
Baltagi, B. (2005), Econometric analysis of panel data, 3rd edn, John Wiley.
Bender, S., Haas, A. & Klose, C. (2000), The IAB employment subsample 1975-1995: Opportunities for analysis provided by the anonymised sample, DP 117, IZA.
Chamberlain, G. (1984), Panel data, in Z. Griliches & M. Intriligator, eds, Handbook of Econometrics, Vol. 2, Elsevier, Amsterdam, chapter 22, pp. 1247-1318.
Kölling, A. (2000), The IAB establishment panel, Schmollers Jahrbuch: Zeitschrift für Wirtschafts- und Sozialwissenschaften 120, 291-300.
Wooldridge, J. (2002), The econometric analysis of cross section and panel data, MIT Press.
