1 Introduction

This handout extends the handout on "The Multiple Linear Regression Model" and refers to its definitions and assumptions in section 2. It relaxes the homoscedasticity assumption (A5a) and allows the error terms to be heteroscedastic and correlated within groups or so-called clusters. It shows in what situations the parameters of the linear model can be consistently estimated by OLS and how the standard errors need to be corrected.

The canonical example (Moulton 1990) for clustering is a regression of individual outcomes (e.g. wages) on explanatory variables of which some are observed on a more aggregate level (e.g. employment growth on the state level).

Clustering also arises when the sampling mechanism first draws a random sample of groups (e.g. schools, households, towns) and then surveys all (or a random sample of) observations within that group. Stratified sampling, where some observations are intentionally under- or oversampled, asks for more sophisticated techniques.

$$y_g = X_g \beta + \varepsilon_g$$

where $y_g$ is an $N_g \times 1$ vector, $X_g$ is an $N_g \times (K+1)$ matrix and $\varepsilon_g$ is an $N_g \times 1$ vector. Stacking observations cluster by cluster, we can write

$$y = X\beta + \varepsilon$$

where $y = [y_1' \dots y_G']'$ is $N \times 1$, $X$ is $N \times (K+1)$ and $\varepsilon$ is $N \times 1$.

The data generation process (dgp) is fully described by the following set of assumptions:

A1: Linearity

$$y_{ig} = x_{ig}'\beta + \varepsilon_{ig} \quad \text{and} \quad E(\varepsilon_{ig}) = 0$$

A2: Independence

c) $(X_g, y_g)_{g=1}^{G}$ independently distributed

A2c means that the observations in one cluster are independent from the observations in all other clusters.

A3: Strict Exogeneity

a) $\varepsilon_{ig} | X_g \sim N(0, \sigma_{ig}^2)$
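The cluster-by-cluster stacking of $y_g$ and $X_g$ into $y$ and $X$ can be illustrated with a minimal NumPy sketch. It is not part of the handout: the cluster sizes, the coefficient vector and the error structure (a shared cluster-level shock plus idiosyncratic noise) are assumptions chosen only to make the dimensions visible.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: G = 3 clusters of unequal size N_g, K = 2 regressors.
cluster_sizes = [3, 5, 2]
K = 2
beta = np.array([1.0, 0.5, -0.25])  # illustrative (K + 1)-vector

blocks_X, blocks_y, blocks_id = [], [], []
for g, N_g in enumerate(cluster_sizes):
    # X_g is N_g x (K + 1): a constant plus K regressors.
    X_g = np.column_stack([np.ones(N_g), rng.normal(size=(N_g, K))])
    # A shared cluster-level shock makes errors correlated within cluster g.
    eps_g = rng.normal() + rng.normal(size=N_g)
    y_g = X_g @ beta + eps_g
    blocks_X.append(X_g)
    blocks_y.append(y_g)
    blocks_id.append(np.full(N_g, g))

# Stack cluster by cluster: y is N x 1 and X is N x (K + 1), N = sum of N_g.
X = np.vstack(blocks_X)
y = np.concatenate(blocks_y)
cluster_id = np.concatenate(blocks_id)

print(X.shape, y.shape, cluster_id)
```

Keeping a `cluster_id` vector alongside the stacked arrays is one convenient way to recover each block $X_g$, $y_g$ later; other layouts would work equally well.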
4 Estimation with OLS

The parameter $\beta$ can be estimated with OLS as

$$\hat\beta_{OLS} = (X'X)^{-1} X'y$$

The OLS estimator of $\beta$ remains unbiased (under A1, A2c, A3c, A4, A5c and A6) and normally distributed (additionally assuming A3a) in small samples. It is consistent and approximately normally distributed (under A1, A2c, A3d, A4, A5c and A6b) in samples with a large number of clusters. However, the OLS estimator is no longer efficient. More importantly, the usual standard errors of the OLS estimator and the tests (t-, F-, z-, Wald-) based on them are no longer valid.

5 Estimating the Covariance of the OLS Estimator

The small sample covariance matrix of $\hat\beta_{OLS}$ is, under A3c and A5c,

$$V = V(\hat\beta_{OLS} | X) = (X'X)^{-1} X' \sigma^2 \Omega X (X'X)^{-1}$$

and differs from usual OLS where $V = \sigma^2 (X'X)^{-1}$. Consequently, the usual estimator $\hat V = \hat\sigma^2 (X'X)^{-1}$ is incorrect. Usual small sample test procedures, such as the F- or t-test, based on the usual estimator are therefore not valid.

With the number of clusters $G \to \infty$, the OLS estimator is asymptotically normally distributed under A1, A2, A3d, A4, A5c and A6b

$$\sqrt{G}\,(\hat\beta - \beta) \xrightarrow{d} N\left(0, Q^{-1} \Sigma Q^{-1}\right)$$

The OLS estimator is therefore approximately normally distributed in samples with a large number of clusters

$$\hat\beta \overset{A}{\sim} N(\beta, V)$$

where $V = G^{-1} Q^{-1} \Sigma Q^{-1}$ can be consistently estimated as

$$\hat V = (X'X)^{-1} \left[ \sum_{g=1}^{G} X_g' e_g e_g' X_g \right] (X'X)^{-1}$$

with $e_g = y_g - X_g \hat\beta_{OLS}$.

This so-called cluster-robust covariance matrix estimator is a generalization of Huber (1967) and White (1980).$^1$ It does not impose any restrictions on the form of either heteroscedasticity or correlation within clusters (though we assumed independence of the error terms across clusters). We can perform the usual z- and Wald-tests for large samples using the cluster-robust covariance estimator.

Note: the cluster-robust covariance matrix is consistent when the number of clusters $G \to \infty$ and the number of observations per cluster $N_g$ is fixed. In practice this requires a sample with many clusters (50 or more) and a relatively small number of observations per cluster.

Bootstrapping is an alternative method to estimate a cluster-robust covariance matrix under the same assumptions. See the handout on "The Bootstrap". Clustering is addressed in the bootstrap by randomly drawing clusters $g$ (rather than individual observations $ig$) and taking all $N_g$ observations for each drawn cluster. This so-called block bootstrap preserves all within-cluster correlation.

6 Estimation with Cluster Specific Random Effects

In the cluster specific random effects model, the error covariance matrix $\Omega$ only depends on the two parameters $\rho$ and $\sigma$. These two parameters can be consistently estimated in samples with many clusters. We could plug these estimates into $\Omega$ to estimate the correct covariance $\hat V$ for the OLS estimator $\hat\beta_{OLS}$.
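As an illustration of the cluster-robust covariance estimator and the block bootstrap described in section 5, the following is a minimal NumPy sketch, not code from the handout. The function names, the simulated data (a regressor observed only at the cluster level, in the spirit of the Moulton example) and all parameter values are assumptions made for the example. No finite-sample correction is applied, although statistical packages often add one.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients (X'X)^{-1} X'y."""
    return np.linalg.solve(X.T @ X, X.T @ y)

def cluster_robust_cov(X, y, cluster_id):
    """(X'X)^{-1} [sum_g X_g' e_g e_g' X_g] (X'X)^{-1}, without corrections."""
    e = y - X @ ols(X, y)
    XtX_inv = np.linalg.inv(X.T @ X)
    middle = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(cluster_id):
        rows = cluster_id == g
        s_g = X[rows].T @ e[rows]        # X_g' e_g, a (K + 1)-vector
        middle += np.outer(s_g, s_g)     # X_g' e_g e_g' X_g
    return XtX_inv @ middle @ XtX_inv

def block_bootstrap_cov(X, y, cluster_id, n_boot=500, seed=0):
    """Covariance of OLS estimates over resamples of whole clusters."""
    rng = np.random.default_rng(seed)
    groups = np.unique(cluster_id)
    rows_of = [np.flatnonzero(cluster_id == g) for g in groups]
    estimates = []
    for _ in range(n_boot):
        # Draw G clusters with replacement and keep all their observations.
        picked = rng.integers(0, len(groups), size=len(groups))
        idx = np.concatenate([rows_of[j] for j in picked])
        estimates.append(ols(X[idx], y[idx]))
    return np.cov(np.array(estimates), rowvar=False)

# Simulated data (assumed values): G = 100 clusters with N_g = 5 each.
rng = np.random.default_rng(1)
G, N_g = 100, 5
cluster_id = np.repeat(np.arange(G), N_g)
x = np.repeat(rng.normal(size=G), N_g)              # cluster-level regressor
eps = np.repeat(rng.normal(size=G), N_g) + rng.normal(size=G * N_g)
X = np.column_stack([np.ones(G * N_g), x])
y = X @ np.array([1.0, 2.0]) + eps

e = y - X @ ols(X, y)
V_usual = (e @ e) / (len(y) - X.shape[1]) * np.linalg.inv(X.T @ X)
V_cluster = cluster_robust_cov(X, y, cluster_id)
V_boot = block_bootstrap_cov(X, y, cluster_id)

print("usual SE:         ", np.sqrt(np.diag(V_usual)))
print("cluster-robust SE:", np.sqrt(np.diag(V_cluster)))
print("block bootstrap SE:", np.sqrt(np.diag(V_boot)))
```

With errors that share a cluster-level component, the usual standard errors should come out noticeably smaller than the two clustered versions. Passing `np.arange(len(y))` as `cluster_id` treats every observation as its own cluster, which reduces the formula to the heteroscedasticity-robust estimator it generalizes.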
$^1$ Note: the cluster-robust estimator is not clearly attributed to a specific author. See e.g. http://www.stata.com/support/faqs/stat/robust_ref.html
7 Short Guides to Microeconometrics Clustering in the Linear Model 8