longitudinal data
Kristin Sainani Ph.D.
http://www.stanford.edu/~kcobb
Stanford University
Department of Health Research and Policy
Limitations of rANOVA/rMANOVA
- They assume categorical predictors.
- They do not handle time-dependent covariates (predictors measured over time).
- They assume everyone is measured at the same time (time is categorical) and at equally spaced time intervals.
- You don't get parameter estimates (just p-values).
- Missing data must be imputed.
- They require restrictive assumptions about the correlation structure.
time1  time2  time3  time4  chem1  chem2  chem3  chem4
20     18     15     20     1000   1100   1200   1300
22     24     18     22     1000   1000   1005   950
14     10     24     10     1000   1999   800    1700
38     34     32     34     1000   1100   1150   1100
25     29     25     29     1000   1000   1050   1010
30     28     26     14     1000   1100   1109   1500
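data long;
set broad;
/* one output row per visit; visit times (months) 0, 2, 3, 6 are inferred from the fitted line Y = 24.90889 - 0.557778*time */
time=0; score=time1; chem=chem1; output;
time=2; score=time2; chem=chem2; output;
time=3; score=time3; chem=chem3; output;
time=6; score=time4; chem=chem4; output;
run;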
Data in long form:
id  time  score  chem
1   0     20     1000
1   2     18     1100
1   3     15     1200
1   6     20     1300
2   0     22     1000
2   2     24     1000
2   3     18     1005
2   6     22     950
3   0     14     1000
3   2     10     1999
3   3     24     800
3   6     10     1700
4   0     38     1000
4   2     34     1100
4   3     32     1150
4   6     34     1100
5   0     25     1000
5   2     29     1000
5   3     25     1050
5   6     29     1010
6   0     30     1000
6   2     28     1100
6   3     26     1109
6   6     14     1500
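The only way to force a rANOVA here is to collapse the chemical measurements into a categorical group:
data forcedanova;
set broad;
length group $ 4; /* without this, group gets length 3 from "low" and "high" is stored as "hig" */
avgchem=(chem1+chem2+chem3+chem4)/4;
if avgchem<1100 then group="low";
if avgchem>1100 then group="high";
run;
proc glm data=forcedanova;
class group;
model time1-time4= group/ nouni;
repeated time /summary;
run; quit;
Gives no significant results!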
Today's lecture:
- Introduction to GEE for longitudinal data.
- Introduction to mixed models for longitudinal data.
Can also look for a linear correlation between depression scores and time.
In SAS:
proc reg data=long;
model score=chem time;
run; quit;
Graphically
Naive linear regression here looks for significant slopes (ignoring the correlation among repeated measurements on the same individual):
Y = 24.90889 - 0.557778*time
Y = 42.44831 - 0.01685*chem
The model
The linear regression model:
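For reference, a linear regression of this kind can be written out as below. This is a sketch in assumed notation: the subscripts $i$ (subject) and $t$ (occasion) are not taken from the slides. It corresponds to the SAS statement `model score=chem time` and treats all 24 observations as independent.

```latex
\[
\text{Score}_{it} = \beta_0 + \beta_1\,\text{chem}_{it} + \beta_2\,\text{time}_{t} + \varepsilon_{it},
\qquad \varepsilon_{it} \sim N\!\left(0, \sigma^2_{y/t}\right)\ \text{independently}
\]
```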
Results
The fitted model:
Variable    DF   Estimate   Standard Error   t Value   Pr > |t|
Intercept   1    42.46803   6.06410          7.00      <.0001
chem        1    -0.01704   0.00550          -3.10     0.0054
time        1    0.07466    0.64946          0.11      0.9096
Generalized Estimating Equations (GEE)
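The model
$$
\begin{pmatrix} \text{Score}_1 \\ \text{Score}_2 \\ \text{Score}_3 \\ \text{Score}_4 \end{pmatrix}
= \beta_0 + \beta_1 \begin{pmatrix} \text{Chem}_1 \\ \text{Chem}_2 \\ \text{Chem}_3 \\ \text{Chem}_4 \end{pmatrix}
+ \beta_2(\text{time}) + \text{CORR} + \text{Error}
$$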
NOTE, for
time-dependent predictors
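A model of this form might be fit in SAS along the following lines. This is a minimal sketch, not the call used for the slides: it assumes the long-form dataset `long` with variables `id`, `score`, `chem` and `time`, and the choices `type=exch` (exchangeable working correlation) and `corrw` (print the working correlation matrix) are illustrative assumptions.

```sas
/* GEE sketch: depression score on chemical level and time,      */
/* with repeated measurements clustered within subject (id).     */
proc genmod data=long;
  class id;
  model score = chem time;               /* identity link, normal errors by default */
  repeated subject=id / type=exch corrw; /* assumed: exchangeable working correlation */
run; quit;
```

The `repeated` statement is what turns an ordinary generalized linear model into a GEE fit; the reported standard errors are then the empirical (robust) ones.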
Results
Empirical Standard Error Estimates
Parameter   Estimate   Standard Error   95% Confidence Limits   Z       Pr > |Z|
Intercept   38.2431    4.9704           28.5013   47.9848       7.69    <.0001
chem
time        -0.0775    0.2829           -0.6320   0.4770        -0.27   0.7841
In the naive regression, the standard error for the time parameter was 0.64946. It's cut by more than half here.
Between-subjects interpretation (different types of people): Having a 100-unit higher chemical level is correlated (on average) with having a 1.29 point lower depression score.
Within-subjects interpretation (change over time): A 100-unit increase in chemical levels within a person corresponds to an average 1.29 point decrease in depression levels.
**Look at the data: here all subjects start at the same chemical level, but have different depression scores. Plus, there's a strong within-person link between increasing chemical levels and decreasing depression scores within patients (so likely largely a within-person effect).
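The 1.29-point figure is simply the chemical coefficient scaled to a 100-unit change. The sketch below assumes a GEE coefficient of about $-0.0129$ per unit of chemical; that value is back-calculated from the 1.29 quoted above and is not read from the output.

```latex
\[
\Delta\,\text{Score} \;=\; 100 \times \hat\beta_{\text{chem}} \;\approx\; 100 \times (-0.0129) \;=\; -1.29
\]
```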
- Then the regression coefficients are refit, correcting for the correlation. (Iterative process)
- The within-subject correlation structure is treated as a nuisance variable (i.e. as a covariate).
Variance of scores is homogeneous across time (MSE in ordinary least squares regression).
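$$
\begin{array}{c|ccc}
 & t_1 & t_2 & t_3 \\ \hline
t_1 & \sigma^2_{y/t} & a & b \\
t_2 & a & \sigma^2_{y/t} & c \\
t_3 & b & c & \sigma^2_{y/t}
\end{array}
$$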
Independence
$$
\begin{array}{c|ccc}
 & t_1 & t_2 & t_3 \\ \hline
t_1 & 1 & 0 & 0 \\
t_2 & 0 & 1 & 0 \\
t_3 & 0 & 0 & 1
\end{array}
$$
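Exchangeable
$$
\begin{array}{c|ccc}
 & t_1 & t_2 & t_3 \\ \hline
t_1 & 1 & \rho & \rho \\
t_2 & \rho & 1 & \rho \\
t_3 & \rho & \rho & 1
\end{array}
$$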
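Autoregressive
$$
\begin{array}{c|cccc}
 & t_1 & t_2 & t_3 & t_4 \\ \hline
t_1 & 1 & \rho & \rho^2 & \rho^3 \\
t_2 & \rho & 1 & \rho & \rho^2 \\
t_3 & \rho^2 & \rho & 1 & \rho \\
t_4 & \rho^3 & \rho^2 & \rho & 1
\end{array}
$$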
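As a quick worked illustration of the autoregressive pattern, the correlation between two occasions decays as a power of the number of steps between them. The value $\rho = 0.7831$ used here is the lag-1 entry of the autoregressive working correlation matrix reported later in these slides; the indices $j, k$ are assumed notation.

```latex
\[
\operatorname{Corr}(Y_{j}, Y_{k}) = \rho^{|j-k|},
\qquad
\rho = 0.7831 \;\Rightarrow\; \rho^{2} \approx 0.613,\quad \rho^{3} \approx 0.480
\]
```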
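M-dependent
$$
\begin{array}{c|cccc}
 & t_1 & t_2 & t_3 & t_4 \\ \hline
t_1 & 1 & \rho_1 & \rho_2 & 0 \\
t_2 & \rho_1 & 1 & \rho_1 & \rho_2 \\
t_3 & \rho_2 & \rho_1 & 1 & \rho_1 \\
t_4 & 0 & \rho_2 & \rho_1 & 1
\end{array}
$$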
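Unstructured
$$
\begin{array}{c|cccc}
 & t_1 & t_2 & t_3 & t_4 \\ \hline
t_1 & 1 & \rho_1 & \rho_2 & \rho_3 \\
t_2 & \rho_1 & 1 & \rho_5 & \rho_4 \\
t_3 & \rho_2 & \rho_5 & 1 & \rho_6 \\
t_4 & \rho_3 & \rho_4 & \rho_6 & 1
\end{array}
$$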
        time1     time2     time3     time4
time1   1.00000   0.92569   0.69728   0.68635
                  0.0081    0.1236    0.1321
time2   0.92569   1.00000   0.55971   0.77991
        0.0081              0.2481    0.0673
time3   0.69728   0.55971   1.00000   0.37870
        0.1236    0.2481              0.4591
time4   0.68635   0.77991   0.37870   1.00000
        0.1321    0.0673    0.4591
Independent? Exchangeable? Autoregressive? M-dependent? Unstructured?
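An empirical correlation table like this one can be obtained from the wide-form data. The sketch below assumes the wide dataset is the one called `broad` elsewhere in these slides, with the scores stored in `time1` through `time4`.

```sas
/* Pairwise Pearson correlations (with p-values) among the four visits */
proc corr data=broad;
  var time1 time2 time3 time4;
run;
```

Inspecting this matrix before fitting is a reasonable way to pick a working correlation structure for GEE.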
Working Correlation Matrix (exchangeable)
       Col1     Col2     Col3     Col4
Row1   1.0000   0.7276   0.7276   0.7276
Row2   0.7276   1.0000   0.7276   0.7276
Row3   0.7276   0.7276   1.0000   0.7276
Row4   0.7276   0.7276   0.7276   1.0000
Compare to autoregressive
proc genmod data=long4;
class id;
       Col1     Col2     Col3     Col4
Row1   1.0000   0.7831   0.6133   0.4803
Row2   0.7831   1.0000   0.7831   0.6133
Row3   0.6133   0.7831   1.0000   0.7831
Row4   0.4803   0.6133   0.7831   1.0000
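The `proc genmod` call for the autoregressive fit is truncated above. One plausible completion is sketched here; the `model` and `repeated` statements are assumptions (they are not preserved in the slides), chosen to mirror the regression of score on chem and time used earlier.

```sas
/* Assumed completion: same mean model, AR(1) working correlation */
proc genmod data=long4;
  class id;
  model score = chem time;              /* assumed mean model */
  repeated subject=id / type=ar(1) corrw; /* corrw prints the working correlation matrix */
run; quit;
```

With `type=ar(1)` a single correlation parameter is estimated, and the matrix entries fall off as powers of it.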
Example two: recall
From rANOVA:
- Within-subjects effects, but no between-subjects effects.
- Time is significant.
- Group*time is not significant.
- Group is not significant.
This is an example with a binary time-independent predictor.
Empirical Correlation
Pearson Correlation Coefficients, N = 6
Prob > |r| under H0: Rho=0
        time1      time2      time3      time4
time1   1.00000    -0.13176   -0.01435   -0.50848
                   0.8035     0.9785     0.3030
time2   -0.13176   1.00000    -0.02819   -0.17480
        0.8035                0.9577     0.7405
time3   -0.01435   -0.02819   1.00000    0.69419
        0.9785     0.9577                0.1260
time4   -0.50848   -0.17480   0.69419    1.00000
        0.3030     0.7405     0.1260
Independent? Exchangeable? Autoregressive? M-dependent? Unstructured?
GEE analysis
proc genmod data=long;
class group id;
model score=
time-independent predictors
Working Correlation Matrix
       Col1      Col2      Col3      Col4
Row1   1.0000    -0.0701   0.1916    -0.1817
Row2   -0.0701   1.0000    0.1778    -0.5931
Row3   0.1916    0.1778    1.0000    0.5931
Row4   -0.1817   -0.5931   0.5931    1.0000
Comparable to within effects for time and time*group from rMANOVA and rANOVA.
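The `model` statement of the GEE call for this example is cut off in the slides. A sketch of what such a call could look like is given here; the right-hand side `group time group*time` is an assumption borrowed from the mixed-model code later in the deck, and `type=un` is assumed because the working correlation matrix shown has no repeated pattern.

```sas
/* Assumed GEE specification for example two (binary group predictor) */
proc genmod data=long;
  class group id;
  model score = group time group*time;   /* assumed, mirrors the later PROC MIXED model */
  repeated subject=id / type=un corrw;   /* assumed: unstructured working correlation */
run; quit;
```

Swapping `type=un` for `type=exch` would give the exchangeable version of the same analysis.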
GEE analysis
proc genmod data=long;
class group id;
model score=
Working Correlation Matrix
       Col1      Col2      Col3      Col4
Row1   1.0000    -0.0529   -0.0529   -0.0529
Row2   -0.0529   1.0000    -0.0529   -0.0529
Row3   -0.0529   -0.0529   1.0000    -0.0529
Row4   -0.0529   -0.0529   -0.0529   1.0000
P-values are similar to rANOVA.
$\varepsilon_{it} \sim N(0, \sigma^2_{y/t})$
$\beta_{0i} \sim N(\beta_{0,\text{population}}, \sigma^2_{\beta_0})$
$\beta_{time}$ = constant
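The distributional statements above describe a random-intercept model, which can be written out as below. This is a sketch in assumed notation: $Y_{it}$ for the score of subject $i$ at occasion $t$ and the symbol $\sigma^2_{\beta_0}$ for the variance of the intercepts are not taken verbatim from the slides.

```latex
\[
Y_{it} = \beta_{0i} + \beta_{time}\,\text{time}_{t} + \varepsilon_{it},
\qquad
\beta_{0i} \sim N\!\left(\beta_{0,\text{population}},\, \sigma^2_{\beta_0}\right),
\qquad
\varepsilon_{it} \sim N\!\left(0,\, \sigma^2_{y/t}\right)
\]
```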
$\beta_{0i} \sim N(\beta_{0,\text{population}}, \sigma^2_{\beta_0})$
Generally, this is a nuisance parameter: we have to estimate it for making statistical inferences, but we don't care so much about the actual value.
$\varepsilon_{it} \sim N(0, \sigma^2_{y/t})$
$\beta_0$ = constant
$\beta_{time}$ = constant
Unexplained variability in Y.
LEAST SQUARES ESTIMATION FINDS THE BETAS THAT MINIMIZE THIS VARIANCE (ERROR)
$\varepsilon_{it} \sim N(0, \sigma^2_{y/t})$: 59.482929
$\beta_0$ = constant: 24.90888889
$\beta_{time}$ = constant: -0.55777778
3 parameters to estimate.
Analysis of Variance
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model             1    35.00056         35.00056      0.59      0.4512
Error             22   1308.62444       59.48293
Corrected Total   23   1343.62500

Root MSE         7.71252    R-Square   0.0260
Dependent Mean   23.37500   Adj R-Sq   -0.0182
Coeff Var        32.99473

Parameter Estimates
Variable    DF   Estimate   Standard Error   t Value   Pr > |t|
Intercept   1    24.90889   2.54500          9.79      <.0001
time        1    -0.55778   0.72714          -0.77     0.4512
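The residual variance quoted for the ordinary regression is the mean squared error from this table. As a short check using only the numbers in the output:

```latex
\[
\hat\sigma^2_{y/t} = \text{MSE} = \frac{\text{SSE}}{\text{df}_{\text{error}}} = \frac{1308.62444}{22} \approx 59.4829,
\qquad
\text{Root MSE} = \sqrt{59.4829} \approx 7.7125
\]
```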
$\beta_{0i} \sim N(\beta_{0,\text{population}}, \sigma^2_{\beta_0})$
$\beta_{0,\text{population}}$: mean population intercept
$\sigma^2_{\beta_0}$: variation in intercepts
$\varepsilon_{it} \sim N(0, \sigma^2_{y/t})$ Residual variance: 18.9264
$\beta_{0i} \sim N(\beta_{0,\text{population}}, \sigma^2_{\beta_0})$ Same: 24.90888889
$\beta_{time}$ = constant Same: -0.55777778
4 parameters to estimate.
Where to find these things in the output from MIXED in SAS:
Cov Parm   Subject   Estimate
Variance   id        44.6121
Residual             18.9264
Fit Statistics
-2 Res Log Likelihood      146.7
AIC (smaller is better)    152.7
AICC (smaller is better)   154.1
BIC (smaller is better)    152.1
44.6121 / (18.9264 + 44.6121) = 70%
70% of variability in depression scores is explained by the differences between subjects.
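The share of variability attributed to between-subject differences is the intraclass correlation: the intercept variance divided by the total of intercept and residual variance. Worked out with the two covariance parameter estimates above (the symbol names are assumed notation):

```latex
\[
\text{ICC} = \frac{\sigma^2_{\beta_0}}{\sigma^2_{\beta_0} + \sigma^2_{y/t}}
= \frac{44.6121}{44.6121 + 18.9264}
= \frac{44.6121}{63.5385} \approx 0.702
\]
```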
Solution for Fixed Effects
Effect      Estimate   Standard Error   DF   t Value   Pr > |t|
Intercept   24.9089    3.0816           5    8.08      0.0005
time        -0.5578    0.4102           17   -1.36     0.1916
Interpretation is the same as with GEE: a 0.5578-point decrease in score per month of time.
Time coefficient is the same but standard error is nearly halved (from 0.72714).
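A random-intercept model of this kind might be requested as follows. This is a sketch patterned on the PROC MIXED code shown later in these slides; it assumes the long-form dataset `long` and is not necessarily the exact call that produced the output above.

```sas
/* Random intercept for each subject, fixed (common) time slope */
proc mixed data=long;
  class id;
  model score = time / s;     /* s: print the Solution for Fixed Effects table */
  random int / subject=id;    /* one random intercept per id */
run; quit;
```

The covariance parameter table then reports the intercept variance and the residual variance separately.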
$\varepsilon_{it} \sim N(0, \sigma^2_{y/t})$ Residual variance: 40.4937
$\beta_0$ = constant
Same: -0.55777778
$\beta_{0i} \sim N(\beta_{0,\text{population}}, \sigma^2_{\beta_0})$
Mean population intercept: 24.90888889
Variance of random intercepts: 53.0068
Variance of random time slopes: 0.4162
Additionally, we have to estimate the covariance of the random intercept and random slope: here -1.9943 (adding random time therefore cost us 2 degrees of freedom).
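A model with both a random intercept and a random time slope could be specified as sketched below. The option `type=un` is an assumption; it is what allows the intercept-slope covariance to be estimated rather than fixed at zero.

```sas
/* Random intercept and random time slope per subject */
proc mixed data=long;
  class id;
  model score = time / s;
  random int time / subject=id type=un;  /* type=un: variances plus intercept-slope covariance */
run; quit;
```

Compared with the random-intercept model, this adds two covariance parameters: the slope variance and the intercept-slope covariance.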
Model                                 AIC
All fixed                             162.2
Intercept random, time slope fixed    150.7
Intercept fixed, time effect random   161.4
All random                            152.7
Cov Parm    Subject   Estimate
Intercept   id        35.5720
Residual              10.2504
Residual and AIC are reduced even further due to strong explanatory power of chemical.
Fit Statistics
-2 Res Log Likelihood      143.7
AIC (smaller is better)    147.7
AICC (smaller is better)   148.4
BIC (smaller is better)    147.3
Effect      Estimate   Standard Error   DF   t Value   Pr > |t|
Intercept   38.1287    4.1727                9.14      0.0003
time        -0.08163   0.3234           16   -0.25     0.8039
chem        -0.01283   0.003125         16   -4.11     0.0008
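The AIC in these fit statistics can be recovered from the restricted log likelihood. For a REML fit in PROC MIXED the penalty counts the covariance parameters (here two: intercept variance and residual variance); the symbol $q$ is assumed notation.

```latex
\[
\text{AIC} = -2\,\ell_{\text{Res}} + 2q = 143.7 + 2(2) = 147.7
\]
```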
SAS code
proc mixed data=long ;
class id group;
model score = time group
time*group/s corrb;
random int /subject=id ;
run; quit;
Fit Statistics
-2 Res Log Likelihood      138.4
AIC (smaller is better)    142.4
AICC (smaller is better)   143.1
BIC (smaller is better)    142.0
Effect       group   Estimate   Standard Error   DF   t Value   Pr > |t|
Intercept            40.8333    4.1934                9.74      0.0006
time                 -5.1667    1.5250           16   -3.39     0.0038
group        A       7.1667     5.9303           16   1.21      0.2444
group        B       0
time*group   A       -3.5000    2.1567           16   -1.62     0.1242
time*group   B       0