
Fixed and Random Effects

Jos Elkink

April, 2008

Introduction

Motivation

Fixed effects

Random effects

Random coefficients

Further information

Four topics
Missing data               March 27
Fixed & random effects     April 3
Time-series models         April 10
Causation and inference    April 17

Motivations
Clustered sampling

Sampling strategies
Probability sampling:
Simple random sampling

Simple random sampling
The sample is a purely random selection from the sampling frame, drawn without replacement.
Each subject in the population has exactly the same chance of being selected into the sample, i.e. the sampling probability is identical for all subjects.

Sampling strategies
Probability sampling:
Simple random sampling
Systematic random sampling
Stratified sampling
Cluster sampling

Cluster sampling
To reduce costs, clusters are (randomly) sampled first, before units at lower levels are sampled within them.
E.g. selecting schools before selecting students, so that fewer schools need to be visited.
Individual observations from a clustered sample are not independent.

Motivations
Clustered sampling
Inherent structure

Examples
Macro level       Micro level
schools           teachers
classes           pupils
firms             employees
countries         political parties
doctors           patients
subjects          measurements
interviewers      respondents
judges            suspects

Motivations
Clustered sampling
Inherent structure
Panel data
Time-Series Cross-Section

Multilevel characteristics
Observations are not completely independent.
Variance can be divided into between-group and within-group variance.
Variables can be measured at the micro level, the macro level, or both.

Example
[Figure sequence: scatter plots of y against x for grouped example data. Panels: "Overall mean"; "Group means"; "Between variation" (slope = 0.683); "Group means"; "Within variation" (y.dev against x).]

Pooled model
When we simply run a regression on all micro-level data, ignoring the multilevel structure, we call this a pooled model.
If some variables are observed at the macro level, the pooled model treats every micro-level unit as an independent observation of them, which artificially increases the number of observations. The standard errors are then too small and we will be overconfident in our results.
E.g. characteristics of judges in explaining the severity of court rulings.

Pooled model
[Figure: pooled regression line in the example data, y against x; slope = 0.124.]

Fixed effects model
With a fixed effects model we explain the within-group variation, removing the between-group variation by either:
Adding dummy variables for each group
Subtracting the group means from all variables
The two are equivalent.

Fixed effects model
In essence, we thus have a different intercept for each group:
$y_i = \beta_0 + X_i \beta + \alpha_{j[i]} + \varepsilon_i$,
whereby $i$ denotes the individual unit, $j$ the group, and $j[i]$ the group of $i$.
If the fixed effects model is the true model, pooled estimates are biased and inconsistent (whenever the group effects $\alpha_j$ are correlated with $X$).
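The equivalence of the dummy-variable and group-demeaning approaches can be checked numerically. The following is a minimal sketch in plain Python, not the lecture's own code: the toy data, the helper names (`ols_slope`, `demean_by_group`, `solve`, `lsdv_slope`) and the single regressor are all assumptions made for illustration.

```python
from statistics import mean

def ols_slope(x, y):
    """Slope of a simple regression of y on x (with intercept)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx

def demean_by_group(values, groups):
    """Subtract each group's mean from its members (within transformation)."""
    group_means = {g: mean([v for v, h in zip(values, groups) if h == g])
                   for g in set(groups)}
    return [v - group_means[g] for v, g in zip(values, groups)]

def solve(A, b):
    """Solve A z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def lsdv_slope(x, y, groups):
    """Slope on x from a regression with one dummy per group, no overall intercept."""
    labels = sorted(set(groups))
    X = [[float(xi)] + [1.0 if g == lab else 0.0 for lab in labels]
         for xi, g in zip(x, groups)]
    k = len(X[0])
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    return solve(XtX, Xty)[0]

# Hypothetical toy data: within-group slope near 0.5, group intercepts
# that fall as x rises, so the pooled slope has the opposite sign.
groups = ["a"] * 4 + ["b"] * 4 + ["c"] * 4
x = [1, 2, 3, 4, 3, 4, 5, 6, 5, 6, 7, 8]
intercept = {"a": 6.0, "b": 3.0, "c": 0.0}
noise = [0.1, -0.1, 0.1, -0.1] * 3
y = [intercept[g] + 0.5 * xi + e for g, xi, e in zip(groups, x, noise)]

pooled = ols_slope(x, y)
within = ols_slope(demean_by_group(x, groups), demean_by_group(y, groups))
lsdv = lsdv_slope(x, y, groups)
print("pooled:", pooled, "within:", within, "dummies:", lsdv)
```

In this constructed example the dummy-variable and demeaned slopes coincide up to floating-point error, while the pooled slope differs because it mixes within-group and between-group variation.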

[Figures: "Pooled model", y against x, slope = 0.124; "Fixed effects model (1)", y against x, slope = 0.210; "Fixed effects model (2)", y.dev against x.dev, slope = 0.210.]

Between effects model
Another way of dealing with clustered data is looking at the between model, a regression on the group means:
$\bar{y}_j = \beta_0 + \bar{X}_j \beta + \varepsilon_j$
Typical mistake: drawing conclusions about individuals from aggregate data (the ecological fallacy).

Between effects model
[Figure: regression through the group means, y against x; slope = 0.683.]

Fixed effects in R
# one dummy per school absorbs the between-school variation
lm(grade ~ aptitude + age + factor(school))
Or, if you prefer, without an overall intercept:
# "- 1" drops the intercept, so every school gets its own dummy coefficient
lm(grade ~ aptitude + age + factor(school) - 1)

Fixed effects in Stata
xtreg grade aptitude age, i(school) fe
Or, manually:
xi: reg grade aptitude age i.school

School example
[Figure: scatter plot of grade against aptitude.]

School example
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    1.955      1.886    1.04    0.304
aptitude       0.797      0.159    5.02  5.4e-06 ***
age            0.287      0.151    1.90    0.062 .
Residual standard error: 1.2 on 57 degrees of freedom
Multiple R-Squared: 0.361, Adjusted R-squared: 0.339
F-statistic: 16.1 on 2 and 57 DF, p-value: 2.86e-06

School example
                Estimate Std. Error t value Pr(>|t|)
aptitude          0.9227     0.0723   12.76  < 2e-16 ***
age               0.2013     0.0675    2.98   0.0042 **
factor(school)1   2.6388     0.8565    3.08   0.0032 **
factor(school)2   4.5112     0.8550    5.28  2.3e-06 ***
factor(school)3   1.9550     0.8370    2.34   0.0232 *
Residual standard error: 0.54 on 55 degrees of freedom
Multiple R-Squared: 0.992, Adjusted R-squared: 0.991
F-statistic: 1.3e+03 on 5 and 55 DF, p-value: <2e-16

Group-level variables
Note that fixed effects models cannot deal with group-level variables.
A group-level variable is constant within each group, so including it would produce perfect multicollinearity with the group dummies.
High multicollinearity also arises from variables that vary little within groups, e.g. political institutions.

Group-level variables
Solution:
1. $y_i = X_i \beta + \alpha_{j[i]} + \varepsilon_i$
2. $\alpha_j = Z_j \gamma + \eta_j$
3. $y_i = X_i \beta + Z_{j[i]} \gamma + \eta_{j[i]} + \varepsilon_i$
(X and Z are assumed to include constants.)
The last step is necessary to get the correct standard errors.

Group-level variables
In most cases with group-level variables, however, a random effects or random intercept model is more appropriate.

Random effects
For the random effects model we still have:
$y_i = \beta_0 + X_i \beta + \alpha_{j[i]} + \varepsilon_i$.
However, this time we assume $\alpha_j \sim N(0, \sigma_\alpha^2)$.
By assuming that $\alpha_j$ comes from a normal distribution, we have fewer parameters to estimate (only one $\sigma_\alpha^2$ instead of $J$ separate $\alpha_j$s).

Variance components
In the population, the variance of the dependent variable can be split into within-group and between-group variance:
$\sigma_Y^2 = \sigma^2_{\text{between}} + \sigma^2_{\text{within}}$

Intraclass correlation
Aside: the proportion of the variance that is accounted for by the group level is the intraclass correlation:
$\rho_{\text{intra}} = \frac{\sigma^2_{\text{between}}}{\sigma^2_{\text{between}} + \sigma^2_{\text{within}}}$

Variance estimators
$\hat\sigma^2_{\text{within}} = s^2_{\text{within}}$
$\hat\sigma^2_{\text{between}} = s^2_{\text{between}} - \frac{s^2_{\text{within}}}{\tilde{n}}$,
where
$\tilde{n} = \bar{n} - \frac{s^2_{n_j}}{N \bar{n}}$
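The estimators above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the lecture's code: the slides do not spell out how $s^2_{\text{between}}$ is computed, so the size-weighted definition below (which reduces to the sample variance of the group means when all groups have equal size) is an assumption, as are the function name `variance_components` and the toy data. N is taken to be the number of groups.

```python
from statistics import mean, variance

def variance_components(data):
    """data: dict mapping a group label to a list of observations.

    Returns (within variance, between variance, intraclass correlation).
    """
    sizes = [len(v) for v in data.values()]
    N = len(data)                      # number of groups
    M = sum(sizes)                     # total number of observations
    n_bar = M / N                      # mean group size
    # adjusted group size: n_bar minus variance of sizes over (N * n_bar)
    n_tilde = n_bar - variance(sizes) / (N * n_bar)
    grand_mean = sum(sum(v) for v in data.values()) / M

    ss_within = 0.0
    ss_between = 0.0
    for v in data.values():
        m = mean(v)
        ss_within += sum((y - m) ** 2 for y in v)
        ss_between += len(v) * (m - grand_mean) ** 2

    s2_within = ss_within / (M - N)
    # assumed definition of the observed between-group variance
    s2_between = ss_between / (n_tilde * (N - 1))

    var_within = s2_within
    # note: this estimate can turn out negative in small samples
    var_between = s2_between - s2_within / n_tilde
    icc = var_between / (var_between + var_within)
    return var_within, var_between, icc

# Hypothetical toy data: three groups of four observations each.
data = {"a": [4, 5, 6, 5], "b": [7, 8, 9, 8], "c": [1, 2, 3, 2]}
var_within, var_between, icc = variance_components(data)
print(var_within, var_between, icc)
```

With equal group sizes the adjusted group size equals the common group size, so the correction term is simply the within variance divided by that size.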

Fixed vs random
When to use random effects?
A group effect is random if we can think of the levels we observe in that group as a sample from a larger population.
When making out-of-sample inferences.
When there are group-level variables.
When the sizes of groups are small.

Fixed vs random
When to use random effects?
Alternatively, one can primarily look at nj (the size of group j) and N (the number of groups):
N small                  fixed effects
N not small, nj small    random effects
nj larger                not as important
But this is only a preliminary quick judgment!

Fixed vs random
When to use random effects?
Gelman & Hill (2007): "Our advice (...) is to always use multilevel modeling ('random effects')."
Johnston & DiNardo (1997): choose random effects when you can assume that $X_i$ and $\alpha_{j[i]}$ are uncorrelated; fixed effects otherwise.

Random effects in R
library(arm)  # arm loads lme4, which provides lmer()
lmer(grade ~ aptitude + age + (1|school))

Random effects in Stata
* GLS estimator
xtreg grade aptitude age, i(school) re
* maximum likelihood estimator
xtreg grade aptitude age, i(school) re mle

School example
Note that we are talking about 3 schools: this is too few groups to seriously consider a random effects model!

School example
Linear mixed-effects model fit by REML
Random effects:
 Groups   Name        Variance Std.Dev.
 school   (Intercept) 1.737    1.318
 Residual             0.293    0.542
number of obs: 60, groups: school, 3
Fixed effects:
            Estimate Std. Error t value
(Intercept)   3.0259     1.1360    2.66
aptitude      0.9216     0.0723   12.75
age           0.2020     0.0675    2.99
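As a worked illustration (not part of the original output), the intraclass correlation implied by the two REML variance components reported above, 1.737 for school and 0.293 for the residual, can be computed by hand. Because aptitude and age are in the model, this is a residual intraclass correlation, conditional on those predictors:

```latex
\hat\rho_{\text{intra}}
  = \frac{\hat\sigma^2_{\alpha}}{\hat\sigma^2_{\alpha} + \hat\sigma^2_{\varepsilon}}
  = \frac{1.737}{1.737 + 0.293}
  = \frac{1.737}{2.030}
  \approx 0.856
```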

School example
Random-effects GLS regression                   Number of obs      =        60
Group variable (i): school                      Number of groups   =         3

R-sq:  within  = 0.7578                         Obs per group: min =        20
       between = 0.0072                                        avg =      20.0
       overall = 0.3610                                        max =        20

Random effects u_i ~ Gaussian                   Wald chi2(2)       =     32.21
corr(u_i, X)       = 0 (assumed)                Prob > chi2        =    0.0000

------------------------------------------------------------------------------
       grade |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    aptitude |   .7970039   .1587327     5.02   0.000     .4858935    1.108114
         age |   .2871654   .1508365     1.90   0.057    -.0084687    .5827995
       _cons |   1.955282    1.88636     1.04   0.300    -1.741915    5.652479
-------------+----------------------------------------------------------------
     sigma_u |          0
     sigma_e |   .5416609
         rho |          0   (fraction of variance due to u_i)
------------------------------------------------------------------------------

[Figures: "School example (fixed)" and "School example (random)"; grade against aptitude.]

School example
Random-effects ML regression                    Number of obs      =        60
Group variable (i): school                      Number of groups   =         3

Random effects u_i ~ Gaussian                   Obs per group: min =        20
                                                               avg =      20.0
                                                               max =        20

                                                LR chi2(2)         =     83.55
Log likelihood  = -53.896431                    Prob > chi2        =    0.0000

------------------------------------------------------------------------------
       grade |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    aptitude |   .9210743   .0710091    12.97   0.000      .781899     1.06025
         age |   .2022943   .0662681     3.05   0.002     .0724112    .3321775
       _cons |   3.021863   1.034865     2.92   0.003      .993565    5.050161
-------------+----------------------------------------------------------------
    /sigma_u |   1.073727   .4438302                      .4775795    2.414027
    /sigma_e |   .5320764   .0498341                      .4428439     .639289
         rho |   .8028508   .1342531                      .4616776    .9640621
------------------------------------------------------------------------------
Likelihood-ratio test of sigma_u=0: chibar2(01)=   83.34 Prob>=chibar2 = 0.000

R-squared
In linear regression, a popular statistic is $R^2$, the squared multiple correlation coefficient, which describes the proportion of the variance in the dependent variable that is explained by the model.
So what about $R^2$ for random effects models?

R-squared
Remember, the variance in a multilevel model has different components:
$\sigma_Y^2 = \sigma^2_{\text{between}} + \sigma^2_{\text{within}}$

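R-squared: individual level
Estimate two models, one with and one without explanatory variables (A and B, respectively).
Then,
$R^2_{\text{within}} = 1 - \frac{\hat\sigma^2_{\varepsilon,A} + \hat\sigma^2_{\alpha,A}}{\hat\sigma^2_{\varepsilon,B} + \hat\sigma^2_{\alpha,B}}$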

R-squared: group level
Estimate two models, one with and one without explanatory variables (A and B, respectively).
Then,
$R^2_{\text{between}} = 1 - \frac{\hat\sigma^2_{\varepsilon,A}/n + \hat\sigma^2_{\alpha,A}}{\hat\sigma^2_{\varepsilon,B}/n + \hat\sigma^2_{\alpha,B}}$,
whereby $n$ is the typical group size in the population.
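The two measures can be sketched as small Python functions. This assumes the proportional-reduction form, i.e. one minus the ratio of the unexplained variance in model A (with predictors) to that in model B (without). The function names and all variance values below are hypothetical and chosen only for the arithmetic; they are not estimates from the school example.

```python
def r2_within(var_e_a, var_alpha_a, var_e_b, var_alpha_b):
    """Individual-level R-squared: reduction in total unexplained variance."""
    return 1 - (var_e_a + var_alpha_a) / (var_e_b + var_alpha_b)

def r2_between(var_e_a, var_alpha_a, var_e_b, var_alpha_b, n):
    """Group-level R-squared: reduction in the variance of group means,
    with n the typical group size."""
    return 1 - (var_e_a / n + var_alpha_a) / (var_e_b / n + var_alpha_b)

# Hypothetical variance components.
# Model B (no predictors): residual 0.8, group 0.2.
# Model A (with predictors): residual 0.5, group 0.1.
individual = r2_within(0.5, 0.1, 0.8, 0.2)
group = r2_between(0.5, 0.1, 0.8, 0.2, n=10)
print(individual, group)
```

The residual variance is divided by n at the group level because a group mean averages over n individual-level errors.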

Predicted random effects
With a fixed effects model, we have the coefficients on the group dummies, which we can interpret as group-level predictions.
In a random effects model, we do not have these predictions, as we only estimated $\sigma_\alpha^2$ and $\beta_0$.

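Predicted random effects
The predicted group levels can be estimated using:
$\hat\beta_{0,j} = \lambda_j \bar{y}_j + (1 - \lambda_j) \hat\beta_0$
$\lambda_j = \frac{\sigma_\alpha^2}{\sigma_\alpha^2 + \sigma_\varepsilon^2 / n_j}$,
whereby $\bar{y}_j$ is the mean on y of group j.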
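A minimal Python sketch of this shrinkage formula, assuming a model without predictors so that the group mean and the overall intercept are directly comparable. The function names and all numbers are hypothetical.

```python
def shrinkage_weight(var_between, var_within, n_j):
    """Weight on the group's own mean; approaches 1 as the group grows."""
    return var_between / (var_between + var_within / n_j)

def predicted_group_level(group_mean, overall_intercept,
                          var_between, var_within, n_j):
    """Weighted average of the group mean and the overall intercept."""
    lam = shrinkage_weight(var_between, var_within, n_j)
    return lam * group_mean + (1 - lam) * overall_intercept

# Hypothetical values: between variance 1, within variance 4,
# overall intercept 6, observed group mean 10.
small_group = predicted_group_level(10.0, 6.0, 1.0, 4.0, n_j=4)
large_group = predicted_group_level(10.0, 6.0, 1.0, 4.0, n_j=36)
print(small_group, large_group)
```

The small group is pulled halfway towards the overall intercept, while the large group stays close to its own mean: the less information a group contributes, the more its prediction borrows from the other groups.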

Predicted random effects
In R, you get the estimated fixed effects with:
model.random <- lmer(y ~ x1 + x2 + (1|
fixef(model.random)
and the predicted random effects with:
ranef(model.random)

Random coefficients
In the random effects model, we assume that group intercepts vary according to a normal distribution.
But what about the coefficients?
I.e. what about group slopes that vary following a normal distribution?

Random coefficients
$y_i = \beta_0 + X_i \beta + X_i \beta_{j[i]} + \alpha_{j[i]} + \varepsilon_i$
$\alpha_j \sim N(0, \sigma_\alpha^2)$
$\beta_j \sim N(0, \sigma_\beta^2)$
Note that a model with random coefficients but a constant intercept across groups rarely makes sense, especially because the location of x = 0 is often arbitrary.
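To make the model concrete, the following sketch simulates data from a random coefficients model with one predictor and then fits a separate simple regression in each group. Everything here is an assumption for illustration: the parameter values, the function names `simulate` and `ols_slope`, the uniform distribution of x, and the fixed seed. It does not estimate the multilevel model itself; it only shows that the group slopes scatter around the common slope.

```python
import random
from statistics import mean

def simulate(J=10, n=20, beta0=1.0, beta=0.4,
             sd_alpha=2.0, sd_beta=0.6, sd_eps=0.3, seed=1):
    """Draw J groups of n observations with random intercepts and slopes."""
    rng = random.Random(seed)
    rows = []
    for j in range(J):
        alpha_j = rng.gauss(0.0, sd_alpha)   # group deviation in intercept
        beta_j = rng.gauss(0.0, sd_beta)     # group deviation in slope
        for _ in range(n):
            x = rng.uniform(0.0, 10.0)
            y = beta0 + x * beta + x * beta_j + alpha_j + rng.gauss(0.0, sd_eps)
            rows.append((j, x, y))
    return rows

def ols_slope(x, y):
    """Slope of a simple regression of y on x (with intercept)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx

rows = simulate()
slopes = []
for j in sorted({g for g, _, _ in rows}):
    xs = [x for g, x, _ in rows if g == j]
    ys = [y for g, _, y in rows if g == j]
    slopes.append(ols_slope(xs, ys))

print("group slopes:", [round(s, 3) for s in slopes])
print("mean of group slopes:", round(mean(slopes), 3))
```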

Random coefficients in R
library(arm)  # arm loads lme4, which provides lmer()
lmer(grade ~ aptitude + age + (aptitude|school))

School example
Linear mixed-effects model fit by REML
Random effects:
 Groups   Name        Variance Std.Dev. Corr
 school   (Intercept) 1.74e+00 1.32e+00
          aptitude    1.47e-10 1.21e-05 0.000
 Residual             2.93e-01 5.42e-01
number of obs: 60, groups: school, 3
Fixed effects:
            Estimate Std. Error t value
(Intercept)   3.0259     1.1359    2.66
aptitude      0.9216     0.0723   12.75
age           0.2020     0.0675    2.99

School example (random)
[Figure: grade against aptitude.]

Example
[Figure sequence: a second example data set, y against x. Panels: "Pooled model" (slope = 0.195); "Fixed effects model" (slope = 0.236); "Random effects model" (slope = 0.235); "Random coefficients model" (mean slope = 0.363).]

Random coefficients model
Linear mixed-effects model fit by REML
Random effects:
 Groups   Name        Variance Std.Dev. Corr
 g        (Intercept) 3.730    1.931
          x           0.397    0.630    0.168
 Residual             0.094    0.307
number of obs: 200, groups: g, 10
Fixed effects:
            Estimate Std. Error t value
(Intercept)    1.113      0.611    1.82
x              0.363      0.199    1.82

Important other topics
Time-dependence within groups (next week)
Predictors on the random coefficients
Bayesian estimation
More complex models dealing with panel data structures
Extensions towards limited dependent variables

Further information
A clear, relatively introductory textbook on multilevel modeling is Snijders & Bosker (1999), Multilevel analysis: An introduction to basic and advanced multilevel modeling.
An excellent, modern book on multilevel modeling, using primarily R and Bugs, is Gelman & Hill (2007), Data analysis using regression and multilevel/hierarchical models.

Further information
Their websites are also interesting:
Snijders: http://stat.gamma.rug.nl/snijders/
Gelman: http://www.stat.columbia.edu/~gelman/

Further information
When using Stata, the Longitudinal/panel-data reference manual is of very high quality. The relevant chapters for this lecture are in fact freely available as sample chapters (xtreg and xtmixed) at http://www.stata.com/bookstore/xt.html.
For the use of R, Google and Gelman & Hill (2007) are more helpful resources.

Further information
Two standard textbooks on panel data are Baltagi (2005), Econometric analysis of panel data (primarily for small N, large T) and Hsiao (2003), Analysis of panel data (primarily for large N, small T). Both are very technical in nature. Perhaps an easier introduction is Wooldridge (2002), Econometric analysis of cross section and panel data.