
My Favorite (Mixed) Models:
An Overview
Rod Sturdivant
Center for Data Analysis and Statistics (CDAS)
Feb 2007

AGENDA

The Problem
The Traditional Approach
The Mixed Model
An Example
References

The Problem

Initial Model Specification


Reasonable? A straight line:

$\mathrm{return}_i = \beta_0 + \beta_1\,\mathrm{use}_i + \varepsilon_i$

Where:
$i$ is the subject index ($i = 1, \ldots, 40$)
$\varepsilon_i$ is an error term (assumed independent with mean 0 and constant variance; usually assumed to follow a normal distribution)

Initial Model Specification


Estimating the parameters by least squares (or Maximum Likelihood) results in:

$\widehat{\mathrm{return}}_i = 72.9 - 1.8\,\mathrm{use}_i$
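As a minimal sketch of the least squares step, the closed-form estimates for a single-predictor line can be computed as below. The function name `fit_line` and the five data points are hypothetical; the points are placed exactly on the fitted line from the slide only to show that the formulas recover it, and they are not the study data.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x (closed-form solution)."""
    x_bar, y_bar = mean(x), mean(y)
    # Slope: sample covariance of (x, y) divided by sample variance of x
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical, noise-free points lying on the slide's fitted line
use = [2.0, 4.0, 6.0, 8.0, 10.0]
ret = [72.9 - 1.8 * u for u in use]
b0, b1 = fit_line(use, ret)
```

With real data the points scatter around the line, and the same two formulas give the estimates that minimize the sum of squared residuals.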

A Good Model?

Regression is statistically significant (p < 0.0001)
Standard regression diagnostics reveal nothing suspicious.

BUT, WHAT IF ...

A NEW PICTURE
The data came from 4 treatment centers:


Traditional Solution

Use 0-1 design variables to represent the group differences
Results in a different line for each group
4 groups: 3 design variables

Center   D1   D2   D3
1        0    0    0
2        1    0    0
3        0    1    0
4        0    0    1
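The coding in the table can be sketched in a few lines. The helper name `design_variables` is hypothetical; the sketch assumes, as in the table, that Center 1 is the reference group with all design variables equal to 0.

```python
def design_variables(center, n_centers=4):
    """Return the 0-1 design variables (D1, ..., D_{n_centers-1}) for a center.

    Center 1 is the reference group, so all of its design variables are 0.
    Center k (k >= 2) has a 1 in position k - 1 and 0 elsewhere.
    """
    if not 1 <= center <= n_centers:
        raise ValueError("center must be between 1 and n_centers")
    return [1 if center == k else 0 for k in range(2, n_centers + 1)]


# Reproduce the table: one row of (D1, D2, D3) per center
coding = {c: design_variables(c) for c in range(1, 5)}
```

Each additional group adds one more design variable, and therefore one more parameter to estimate.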

Revised Model Specification


$\mathrm{return}_i = \beta_0 + \beta_1\,\mathrm{use}_i + \beta_2 D_{1i} + \beta_3 D_{2i} + \beta_4 D_{3i} + \varepsilon_i$

USING LEAST SQUARES THE ESTIMATED MODEL IS:

$\widehat{\mathrm{return}}_i = 80.8 - 2\,\mathrm{use}_i - 2.9\,D_{1i} - 8.8\,D_{2i} - 15.8\,D_{3i}$

Revised Model Specification


Note each center has a different estimated line:

$\widehat{\mathrm{return}}_i = 80.8 - 2\,\mathrm{use}_i - 2.9\,D_{1i} - 8.8\,D_{2i} - 15.8\,D_{3i}$

Center   Design Variables   Estimated Line
1        0, 0, 0            $\widehat{\mathrm{return}}_i = 80.8 - 2\,\mathrm{use}_i$
2        1, 0, 0            $\widehat{\mathrm{return}}_i = 77.9 - 2\,\mathrm{use}_i$
3        0, 1, 0            $\widehat{\mathrm{return}}_i = 72.0 - 2\,\mathrm{use}_i$
4        0, 0, 1            $\widehat{\mathrm{return}}_i = 65.0 - 2\,\mathrm{use}_i$
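The per-center lines follow from plugging each center's design variables into the estimated model. A minimal sketch, using the coefficients quoted on the slide (the constant and function names are hypothetical):

```python
# Coefficients of the estimated additive model, as quoted on the slide
INTERCEPT = 80.8
SLOPE = -2.0
DESIGN_COEFS = (-2.9, -8.8, -15.8)  # coefficients of D1, D2, D3

# 0-1 design variables (D1, D2, D3) for each center; Center 1 is the reference
DESIGN = {1: (0, 0, 0), 2: (1, 0, 0), 3: (0, 1, 0), 4: (0, 0, 1)}


def center_line(design):
    """Return (intercept, slope) of the estimated line for one center."""
    # The design variables only shift the intercept; the slope is shared
    intercept = INTERCEPT + sum(c * d for c, d in zip(DESIGN_COEFS, design))
    return round(intercept, 1), SLOPE


lines = {center: center_line(d) for center, d in DESIGN.items()}
```

Because the model has no interaction terms, all four lines are parallel: only the intercept differs between centers.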

Model Results

Excellent fit ($R^2$ increase from 0.69 to 0.99)
All variables highly significant (p < 0.0001)

ONE MORE TWIST


Slopes are not all the same:


Revised Model Specification


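ADD use and D3 interaction:

$\mathrm{return}_i = \beta_0 + \beta_1\,\mathrm{use}_i + \beta_2 D_{1i} + \beta_3 D_{2i} + \beta_4 D_{3i} + \beta_5 D_{3i}\,\mathrm{use}_i + \varepsilon_i$

THE ESTIMATED MODEL IS:

$\widehat{\mathrm{return}}_i = 80.7 - 2\,\mathrm{use}_i - 2.9\,D_{1i} - 8.9\,D_{2i} - 44.1\,D_{3i} + 3.8\,D_{3i}\,\mathrm{use}_i$

The Center 4 Line Changes:

$\widehat{\mathrm{return}}_i = 36.6 + 1.8\,\mathrm{use}_i$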
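The Center 4 line can be checked by setting $D_{1i} = D_{2i} = 0$ and $D_{3i} = 1$ in the estimated model and collecting the intercept and slope terms:

```latex
\begin{aligned}
\widehat{\mathrm{return}}_i
  &= 80.7 - 2\,\mathrm{use}_i - 44.1\,(1) + 3.8\,(1)\,\mathrm{use}_i \\
  &= (80.7 - 44.1) + (-2 + 3.8)\,\mathrm{use}_i \\
  &= 36.6 + 1.8\,\mathrm{use}_i
\end{aligned}
```

The interaction term changes both the intercept and the sign of the slope for Center 4, while Centers 1 to 3 keep the common slope of $-2$.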

Model Results

Excellent fit ($R^2$ of 0.97)
All variables highly significant (p < 0.0001; for D1, p = 0.0015)

The Mixed Model

Q: What is wrong with the traditional model? It looked pretty good.

Increase in the number of groups leads to a large number of parameters
Data management and parameter estimation issues
Decrease in estimate precision (note: in our example, standard errors more than doubled when the interaction was added)
May have a large number of predictors unique at the higher level only
Observed groups are a random sample
Independence assumption: data are CORRELATED

Introduction

Issue of clustered data

Common in many studies and fields


Standard models fail to adequately address
Inference and estimates affected

Mixed models as a solution

Increasing use
Estimation algorithms and software available

Mixed Model Specification

Random intercept only

Emphasis on hierarchy:

$y_{ij} = \beta_{0j} + \beta_1 x_{ij} + \varepsilon_{ij}$
where $\beta_{0j} = \beta_0 + \mu_{0j}$

Emphasis on fixed/random:

$y_{ij} = (\beta_0 + \beta_1 x_{ij}) + (\mu_{0j} + \varepsilon_{ij})$

$j = 1, \ldots, J$: Level 2 (group) index
$i = 1, \ldots, n_j$: Level 1 (subject) index

We assume:

$\varepsilon_{ij} \sim N(0, \sigma^2)$, independent of $\mu_{0j} \sim N(0, \sigma_0^2)$

Random Slope Model


Emphasis on hierarchy:

$y_{ij} = \beta_{0j} + \beta_{1j} x_{ij} + \varepsilon_{ij}$
where
$\beta_{0j} = \beta_0 + \mu_{0j}$
$\beta_{1j} = \beta_1 + \mu_{1j}$

Emphasis on fixed/random:

$y_{ij} = (\beta_0 + \beta_1 x_{ij}) + (\mu_{0j} + \mu_{1j} x_{ij} + \varepsilon_{ij})$

Additional assumptions:

$\mu_{1j} \sim N(0, \sigma_1^2)$
$\mathrm{cov}(\mu_{0j}, \mu_{1j}) = \sigma_{01}$

Matrix Form of the Model


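$y = X\beta + Z\mu + \varepsilon$

As an example, consider the matrices for the 2 level random slope model:

$y_{ij} = (\beta_0 + \beta_1 x_{ij}) + (\mu_{0j} + \mu_{1j} x_{ij} + \varepsilon_{ij})$

$$
y = \begin{pmatrix} y_{11} \\ \vdots \\ y_{n_1 1} \\ y_{12} \\ \vdots \\ y_{n_2 2} \\ \vdots \\ y_{1J} \\ \vdots \\ y_{n_J J} \end{pmatrix},
\quad
X = \begin{pmatrix} 1 & x_{11} \\ \vdots & \vdots \\ 1 & x_{n_1 1} \\ 1 & x_{12} \\ \vdots & \vdots \\ 1 & x_{n_2 2} \\ \vdots & \vdots \\ 1 & x_{1J} \\ \vdots & \vdots \\ 1 & x_{n_J J} \end{pmatrix},
\quad
\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}
$$

$$
Z = \begin{pmatrix}
1 & x_{11} & & & & & \\
\vdots & \vdots & & & & & \\
1 & x_{n_1 1} & & & & & \\
 & & 1 & x_{12} & & & \\
 & & \vdots & \vdots & & & \\
 & & 1 & x_{n_2 2} & & & \\
 & & & & \ddots & & \\
 & & & & & 1 & x_{1J} \\
 & & & & & \vdots & \vdots \\
 & & & & & 1 & x_{n_J J}
\end{pmatrix},
\quad
\mu = \begin{pmatrix} \mu_{01} \\ \mu_{11} \\ \vdots \\ \mu_{0J} \\ \mu_{1J} \end{pmatrix},
\quad
\varepsilon = \begin{pmatrix} \varepsilon_{11} \\ \vdots \\ \varepsilon_{n_1 1} \\ \varepsilon_{12} \\ \vdots \\ \varepsilon_{n_2 2} \\ \vdots \\ \varepsilon_{1J} \\ \vdots \\ \varepsilon_{n_J J} \end{pmatrix}
$$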

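Covariance Structure of the Linear Hierarchical Model

$\mathrm{var}(y) = V = Z \Omega Z^{\top} + W$

Where:

$\Omega = \mathrm{var}(\mu)$ and $W = \mathrm{var}(\varepsilon)$

In the 2 level random slope example:

$$
\Omega = \begin{pmatrix}
\sigma_0^2 & \sigma_{01} & & & \\
\sigma_{01} & \sigma_1^2 & & & \\
 & & \ddots & & \\
 & & & \sigma_0^2 & \sigma_{01} \\
 & & & \sigma_{01} & \sigma_1^2
\end{pmatrix},
\quad
W = \mathrm{diag}(\sigma^2)
$$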
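To make the covariance structure concrete, the sketch below builds Z, Omega and W for a tiny two-group random slope example and forms V = Z Omega Z' + W using plain Python lists. The function names, the x values and the variance components are all hypothetical round numbers chosen for easy checking; they are not estimates from the example data.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(row) for row in zip(*a)]


def marginal_covariance(x_by_group, var_int, var_slope, cov_int_slope, var_resid):
    """Return V = Z Omega Z' + W for the 2 level random slope model."""
    n_obs = sum(len(xs) for xs in x_by_group)
    n_groups = len(x_by_group)
    # Z is block diagonal: each group contributes rows (1, x_ij) in its own two columns
    z = [[0.0] * (2 * n_groups) for _ in range(n_obs)]
    row = 0
    for j, xs in enumerate(x_by_group):
        for x in xs:
            z[row][2 * j] = 1.0
            z[row][2 * j + 1] = x
            row += 1
    # Omega is block diagonal with the same 2 x 2 block for every group
    omega = [[0.0] * (2 * n_groups) for _ in range(2 * n_groups)]
    for j in range(n_groups):
        omega[2 * j][2 * j] = var_int
        omega[2 * j + 1][2 * j + 1] = var_slope
        omega[2 * j][2 * j + 1] = cov_int_slope
        omega[2 * j + 1][2 * j] = cov_int_slope
    v = matmul(matmul(z, omega), transpose(z))
    # W = diag(sigma^2): add the level 1 variance on the diagonal
    for i in range(n_obs):
        v[i][i] += var_resid
    return v


# Hypothetical inputs: two groups with 2 and 3 subjects
x_by_group = [[1.0, 2.0], [3.0, 4.0, 5.0]]
V = marginal_covariance(x_by_group, var_int=4.0, var_slope=1.0,
                        cov_int_slope=0.5, var_resid=2.0)
```

Entries of V for two subjects in the same group are nonzero, while entries for subjects in different groups are zero: this is how the model represents correlation within clusters.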

Model Estimation

Recent (10 years) computing advances allow estimation
Maximum Likelihood (ML) Estimates
Iterative algorithms
Bayesian MCMC methods
Software including MLwiN, HLM, WinBUGS, SAS, VARCL

For Our Example

Parameter Estimates

Parameter                      Estimate   Standard Error
Fixed
  Intercept                    66.9       10.2
  Slope                        -1.0       0.96
Level 1 (residual variance)    4.0        0.97
Random
  Intercept                    412.8      293.6
  Slope                        3.6        2.6
  Covariance                   -38.1      27.3

NOTE: only estimate 6 parameters, regardless of number of groups!

Predictions

Residual estimates (posterior means) for level h of the model; substitute parameter estimates in:

$\hat{r}_h = R_h^{\top} V^{-1} (y - X\hat{\beta})$

$\mathrm{cov}(\hat{r}_h) = R_h^{\top} V^{-1} R_h$

where $R_h = Z_h \Omega_h$ (design and covariance matrices for level h)

Referred to as "shrunken" residuals. Example for the level two, intercept only case:

$\hat{\mu}_{0j} = \dfrac{n_j \sigma_0^2}{n_j \sigma_0^2 + \sigma^2}\,(\bar{y}_{.j} - \bar{y}_{..})$
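The intercept-only formula can be sketched directly. The function names and the numbers below are hypothetical and serve only to show how the shrinkage factor behaves as the group size grows.

```python
def shrinkage_factor(n_j, var_between, var_within):
    """n_j * sigma_0^2 / (n_j * sigma_0^2 + sigma^2), a value between 0 and 1."""
    return n_j * var_between / (n_j * var_between + var_within)


def shrunken_residual(group_mean, grand_mean, n_j, var_between, var_within):
    """Level two residual estimate for the random intercept only model."""
    raw = group_mean - grand_mean
    return shrinkage_factor(n_j, var_between, var_within) * raw


# Hypothetical values: between-group variance 1, within-group variance 4
small_group = shrunken_residual(12.0, 10.0, n_j=4, var_between=1.0, var_within=4.0)
large_group = shrunken_residual(12.0, 10.0, n_j=36, var_between=1.0, var_within=4.0)
```

With the same raw difference of 2, the small group is pulled halfway toward the overall mean (factor 0.5), while the large group keeps most of its difference (factor 0.9).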

Model Results

Center (j)   Intercept             Slope                 Std. Model Intercept   Std. Model Slope
1            66.9 + 12.9 = 79.8    -1 - 0.86 = -1.86     80.7                   -2
2            66.9 + 11.7 = 78.6    -1 - 1.03 = -2.03     77.8                   -2
3            66.9 + 5.7 = 72.6     -1 - 0.98 = -1.98     71.8                   -2
4            66.9 - 30.2 = 36.7    -1 + 2.87 = 1.87      36.6                   1.8

Estimate intercept and slope from:

$\beta_{0j} = \beta_{00} + \mu_{0j}$
$\beta_{1j} = \beta_{10} + \mu_{1j}$
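The center-specific lines in the table are the fixed estimates plus each center's predicted random effects. A short sketch reproducing that arithmetic from the values shown on the slide (the variable names are hypothetical):

```python
# Fixed-effect estimates from the mixed model
FIXED_INTERCEPT = 66.9
FIXED_SLOPE = -1.0

# Predicted random effects (intercept, slope) per center, as shown in the table
RANDOM_EFFECTS = {
    1: (12.9, -0.86),
    2: (11.7, -1.03),
    3: (5.7, -0.98),
    4: (-30.2, 2.87),
}

# Center-specific intercept and slope: fixed part plus random part
center_lines = {
    j: (round(FIXED_INTERCEPT + u0, 1), round(FIXED_SLOPE + u1, 2))
    for j, (u0, u1) in RANDOM_EFFECTS.items()
}
```

The resulting lines are close to those of the standard model with design variables and an interaction, even though the mixed model estimates only 6 parameters.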

Model Extensions
There are numerous possible extensions at this point, beyond adding further predictors (with either fixed or random coefficients) at level 1:

$\mathrm{return}_{ij} = \beta_{0j} + \beta_{1j}\,\mathrm{use}_{ij} + \varepsilon_{ij}$

Additional level:

$\beta_{0j} = \beta_{00k} + \beta_{01} z_{0j} + \mu_{0j}$
$\beta_{00k} = \beta_{000} + \mu_{00k}$

Center specific predictor variable ($z_{0j}$, $z_{1j}$):

$\beta_{1j} = \beta_{10} + \beta_{11} z_{1j} + \mu_{1j}$

Example

Batting Average in DAY/NIGHT games

MIXED MODEL (SAS PROC MIXED)

Solution for Fixed Effects

Effect      DAY   Estimate   Standard Error   DF    t Value   Pr > |t|
Intercept         0.2850     0.005203         19    54.77     <.0001
DAY         N     -0.0045    0.001027         379   -4.38     <.0001

In data: Day average = 0.28499, Night average = 0.28048

No random effect for Player

Parameter   DAY   Estimate   Error        t Value   Pr > |t|
Intercept         0.2850     0.00174983   162.87    <.0001
DAY         N     -0.0045    0.00247463   -1.82     0.0696

Random effects

Covariance Parameter Estimates

Cov Parm    Subject   Estimate
Intercept   PLAYER    0.000531
Residual              0.000105

Player   Day       Night     Model Day   Model Night
         0.24943   0.24545   0.25004     0.24554
         0.26905   0.26712   0.27048     0.26598
         0.29949   0.29169   0.29772     0.29321
         0.26852   0.26626   0.26979     0.26529

Acute Myocardial Infarction (AMI)

Article by Austin, Tu, Alter in the American Heart Journal (Jan 2001)
Analyzed patients admitted to Ontario hospitals between 1994 and 1999 (>100,000)
Compared traditional to hierarchical logistic models
3 levels: patients, physicians and hospitals
Separate analysis for 9 outcomes classified as fatal, non-fatal and processes of care

Study Findings

Found that results for patient level variables agreed
Hierarchical method led to different conclusions 44% of the time for hospital level factors
Traditional methods overestimated the statistical significance of these factors
Traditional models tended to underestimate the magnitude of physician level factors

References

BOOKS

Bryk and Raudenbush (1992): good diagnostic sections
Longford (1993): can be tough to read; Fisher scoring
Goldstein (1995): details sometimes omitted, but many of the technical issues are there (download for free)
Hox (1995): appears very introductory (download for free)
Kreft and DeLeeuw (1998): nice introduction with many examples using MLn
Snijders and Bosker (1999): good introduction; some math left out. Best sections on model checking/diagnostics
Goldstein and Leyland, Eds. (2001): great for intuitive understanding; recent and good source of further references

PAPERS

Topic dependent; I have lists


Useful Websites:

Multilevel models project (UK) - Goldstein


http://multilevel.ioe.ac.uk/index.html
Multilevel Analysis Page (Netherlands) - Snijders
http://stat.gamma.rug.nl/multilevel.htm
Multilevel Modeling Page (Leipzig) - Mayerhofer
http://www.lrz-muenchen.de/~wlm/wlmmule.htm#Author
LAMMP (Michigan) - Raudenbush
http://www-personal.engin.umich.edu/~gibsong/

Marginal Models

Example is Generalized Estimating Equations (GEE)
Primary interest in fixed parameters
Random structure specified as nuisance parameters
Hierarchical models consider the random parameters of interest themselves
Estimate fixed and random simultaneously
Focus of our proposed research

Linear Hierarchical Model Comparison

Using the Deviance

IGLS (ML estimation only)
Comparing models with different random parts: must have the same fixed part
Given the maximum of the likelihood function:

$D_i = -2 \log(L_i)$

Difference in deviance to compare two models:

$D_1 - D_2 \sim \chi^2(p_2 - p_1)$

where the numbers of parameters in the models satisfy $p_1 < p_2$
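The deviance comparison can be sketched with the standard library alone. The log-likelihood values below are hypothetical, and `chi2_sf` is a small helper written for this sketch (closed forms for 1 and 2 degrees of freedom, extended by the incomplete gamma recurrence), not a library function.

```python
import math


def chi2_sf(x, df):
    """Upper tail P(X > x) of a chi-square variable with integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half_x = x / 2.0
    # Closed forms for the two base cases
    if df % 2 == 1:
        sf, k = math.erfc(math.sqrt(half_x)), 1   # df = 1
    else:
        sf, k = math.exp(-half_x), 2              # df = 2
    # Recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1), a = k / 2
    while k < df:
        a = k / 2.0
        sf += half_x ** a * math.exp(-half_x) / math.gamma(a + 1.0)
        k += 2
    return sf


def deviance_test(loglik_1, loglik_2, p_1, p_2):
    """Compare nested models; model 1 is the smaller one (p_1 < p_2)."""
    d_1 = -2.0 * loglik_1
    d_2 = -2.0 * loglik_2
    diff = d_1 - d_2
    return diff, chi2_sf(diff, p_2 - p_1)


# Hypothetical maximized log-likelihoods: a 4-parameter random intercept model
# against a 6-parameter random slope model with the same fixed part
diff, p_value = deviance_test(-120.0, -115.0, p_1=4, p_2=6)
```

Here the deviance drops by 10 on 2 degrees of freedom, which would favor the larger model under the chi-square reference distribution.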

Wald Tests of Fixed Parameters

Concern about appropriateness of asymptotic standard normal distribution
Test of hypothesis:

$H_0: \beta_h = 0$

$T(\hat{\beta}_h) = \dfrac{\hat{\beta}_h}{SE(\hat{\beta}_h)}$
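As a rough illustration, the Wald statistic for the fixed slope of the treatment-center example (estimate -1.0, standard error 0.96) can be computed as follows. The function name is hypothetical, and the p-value uses the asymptotic standard normal reference, which is exactly the approximation the slide raises concern about for small samples.

```python
import math


def wald_test(estimate, std_error):
    """Return the Wald statistic and its two-sided standard normal p-value."""
    z = estimate / std_error
    # Two-sided tail area of the standard normal: P(|Z| > |z|) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Fixed slope from the mixed model parameter table
z_slope, p_slope = wald_test(-1.0, 0.96)
```

The statistic is about -1.04, so under the normal approximation the average slope across centers is not distinguishable from zero.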

Linear Hierarchical Model Comparison

Wald test in small samples: compare to a t-distribution
Random parameters (Bryk & Raudenbush)
Obtain LS estimates of parameter within each group and use chi-square test of equality across groups
Multivariate Wald test available for a group of parameters
Other tests all appear to have problems and/or are not available in most software packages
