
Stats 112/203: Sample Midterm Exam Solutions

1. Consider the following linear mixed model:


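\[ Y_{ij} = \beta_0 + b_{0i} + (\beta_1 + b_{1i}) t_{ij} + \epsilon_{ij}, \]
where $Y_{ij}$ is the response measurement for the $i$th individual's $j$th occasion taken at
time $t_{ij}$. The coefficients $\beta_0$ and $\beta_1$ are fixed effects. Assume
\[ \epsilon_{ij} \overset{iid}{\sim} N(0, \sigma^2), \]
\[ \begin{pmatrix} b_{0i} \\ b_{1i} \end{pmatrix} \overset{iid}{\sim} N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma_0^2 & \sigma_{01} \\ \sigma_{01} & \sigma_1^2 \end{pmatrix} \right), \]
and $(b_{0i}, b_{1i})'$ and $\epsilon_{ij}$ are independent.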
(a) Derive the marginal variance $\mathrm{Var}(Y_{ij})$ and covariance $\mathrm{Cov}(Y_{ij}, Y_{ik})$ ($j \neq k$).
(b) What is the conditional variance $\mathrm{Var}(Y_{ij} \mid b_{0i}, b_{1i})$?
(c) Suppose the observed responses for the first individual are $Y_{11} = 32$, $Y_{12} = 24$,
$Y_{13} = 25$, and $Y_{14} = 28$, observed at days $t_{11} = 0$, $t_{12} = 8$, $t_{13} = 16$, and
$t_{14} = 30$. Write the linear mixed model for this observation in matrix notation:
\[ Y_i = X_i \beta + Z_i b_i + \epsilon_i. \]

(d) Suppose the true values of the fixed effects are $\beta_0 = 20$ and $\beta_1 = 5$. Draw a
time plot that demonstrates
i the overall population mean response trajectory,
ii a hypothetical random sample of three individual mean response trajectories,
and
iii the hypothetical observed response measurements for each of the three chosen
individuals
under each of the following conditions:
1. $\sigma_0^2 = 0$ and $0 < \sigma^2 < \sigma_1^2$.
2. $\sigma_1^2 = 0$ and $0 < \sigma^2 < \sigma_0^2$.
3. $\sigma^2 = 0$, $\sigma_0^2 > 0$, and $\sigma_1^2 > 0$.
Solution:
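(a)
\begin{align*}
\mathrm{Var}(Y_{ij}) &= \mathrm{Cov}(\beta_0 + b_{0i} + (\beta_1 + b_{1i}) t_{ij} + \epsilon_{ij},\ \beta_0 + b_{0i} + (\beta_1 + b_{1i}) t_{ij} + \epsilon_{ij}) \\
&= \mathrm{Cov}(b_{0i} + b_{1i} t_{ij} + \epsilon_{ij},\ b_{0i} + b_{1i} t_{ij} + \epsilon_{ij}) \\
&= \mathrm{Var}(b_{0i}) + t_{ij}^2 \mathrm{Var}(b_{1i}) + \mathrm{Var}(\epsilon_{ij}) \\
&\quad + 2 t_{ij} \mathrm{Cov}(b_{0i}, b_{1i}) + 2 \mathrm{Cov}(b_{0i}, \epsilon_{ij}) + 2 t_{ij} \mathrm{Cov}(b_{1i}, \epsilon_{ij}) \\
&= \sigma_0^2 + t_{ij}^2 \sigma_1^2 + \sigma^2 + 2 t_{ij} \sigma_{01} + 0 + 0 \\
&= \sigma_0^2 + t_{ij}^2 \sigma_1^2 + 2 t_{ij} \sigma_{01} + \sigma^2
\end{align*}
since $b_{0i}$ and $\epsilon_{ij}$ are independent, and $b_{1i}$ and $\epsilon_{ij}$ are independent.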

Similarly,

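\begin{align*}
\mathrm{Cov}(Y_{ij}, Y_{ik}) &= \mathrm{Cov}(\beta_0 + b_{0i} + (\beta_1 + b_{1i}) t_{ij} + \epsilon_{ij},\ \beta_0 + b_{0i} + (\beta_1 + b_{1i}) t_{ik} + \epsilon_{ik}) \\
&= \mathrm{Cov}(b_{0i} + b_{1i} t_{ij} + \epsilon_{ij},\ b_{0i} + b_{1i} t_{ik} + \epsilon_{ik}) \\
&= \mathrm{Var}(b_{0i}) + t_{ij} t_{ik} \mathrm{Var}(b_{1i}) + \mathrm{Cov}(\epsilon_{ij}, \epsilon_{ik}) \\
&\quad + (t_{ij} + t_{ik}) \mathrm{Cov}(b_{0i}, b_{1i}) \\
&\quad + \mathrm{Cov}(b_{0i}, \epsilon_{ij}) + \mathrm{Cov}(b_{0i}, \epsilon_{ik}) \\
&\quad + t_{ij} \mathrm{Cov}(b_{1i}, \epsilon_{ik}) + t_{ik} \mathrm{Cov}(b_{1i}, \epsilon_{ij}) \\
&= \sigma_0^2 + t_{ij} t_{ik} \sigma_1^2 + (t_{ij} + t_{ik}) \sigma_{01}
\end{align*}

since $b_{0i}$, $\epsilon_{ij}$ and $\epsilon_{ik}$ are independent; and $b_{1i}$, $\epsilon_{ij}$ and $\epsilon_{ik}$ are independent.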
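
The two formulas in part (a) can be checked numerically against the matrix form $Z_i G Z_i' + \sigma^2 I$. The following is a minimal sketch in plain Python; the variance component values are hypothetical and chosen only for illustration, and the function names `closed_form` and `matrix_form` are not part of the exam.

```python
# Hypothetical variance components (illustrative only, not from the exam).
sigma0_sq = 4.0   # Var(b0i)
sigma1_sq = 1.0   # Var(b1i)
sigma01 = -0.5    # Cov(b0i, b1i)
sigma_sq = 2.0    # Var(eps_ij)
times = [0.0, 8.0, 16.0, 30.0]


def closed_form(tj, tk, same_occasion):
    """Variance (same occasion) or covariance (different occasions) from part (a)."""
    value = sigma0_sq + tj * tk * sigma1_sq + (tj + tk) * sigma01
    return value + sigma_sq if same_occasion else value


def matrix_form(ts):
    """Entries of Z G Z' + sigma^2 I, where row j of Z is (1, t_j)."""
    G = [[sigma0_sq, sigma01], [sigma01, sigma1_sq]]
    Z = [[1.0, t] for t in ts]
    n = len(ts)
    V = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for k in range(n):
            V[j][k] = sum(Z[j][a] * G[a][b] * Z[k][b]
                          for a in range(2) for b in range(2))
            if j == k:
                V[j][k] += sigma_sq  # residual variance only on the diagonal
    return V


V = matrix_form(times)
for row in V:
    print(row)
```

The diagonal entries correspond to $\mathrm{Var}(Y_{ij})$ and the off-diagonal entries to $\mathrm{Cov}(Y_{ij}, Y_{ik})$, so both derivations are covered by one matrix.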

(b) $\mathrm{Var}(Y_{ij} \mid b_{0i}, b_{1i}) = \mathrm{Var}(\epsilon_{ij}) = \sigma^2$.

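(c)
\[
\begin{pmatrix} 32 \\ 24 \\ 25 \\ 28 \end{pmatrix}
=
\begin{pmatrix} 1 & 0 \\ 1 & 8 \\ 1 & 16 \\ 1 & 30 \end{pmatrix}
\begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}
+
\begin{pmatrix} 1 & 0 \\ 1 & 8 \\ 1 & 16 \\ 1 & 30 \end{pmatrix}
\begin{pmatrix} b_{01} \\ b_{11} \end{pmatrix}
+
\begin{pmatrix} \epsilon_{11} \\ \epsilon_{12} \\ \epsilon_{13} \\ \epsilon_{14} \end{pmatrix}
\]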
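
As a rough illustration of the matrix form in part (c), the design matrices can be built directly from the observation times. This is a sketch in plain Python; the helper `conditional_mean` and the random-effect values passed to it are hypothetical, and the fixed effects 20 and 5 are borrowed from part (d) purely as example inputs.

```python
# Observed data for individual 1, from part (c).
times = [0, 8, 16, 30]
y = [32, 24, 25, 28]

# Fixed-effects design: a column of ones and a column of times.
X = [[1, t] for t in times]
# Random intercept and slope use the same columns, so Z_i equals X_i here.
Z = [row[:] for row in X]


def conditional_mean(beta, b):
    """Row-by-row X_i beta + Z_i b_i for given fixed and random effects."""
    return [
        sum(x * coef for x, coef in zip(x_row, beta))
        + sum(z * eff for z, eff in zip(z_row, b))
        for x_row, z_row in zip(X, Z)
    ]


# Hypothetical inputs: beta from part (d), random effects made up for illustration.
mean_pop = conditional_mean(beta=(20, 5), b=(0, 0))
mean_ind = conditional_mean(beta=(20, 5), b=(1, -1))
# Residual vector implied by these hypothetical inputs.
eps = [obs - m for obs, m in zip(y, mean_ind)]
print(X, mean_pop, mean_ind, eps)
```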

(d) $\sigma_0^2 = 0$ and $0 < \sigma^2 < \sigma_1^2$:

[Figure, titled "Problem 2(b)1.": time plot of Y (axis 0 to 200) against Time (axis 0 to 30); legend: Pop. Mean, Indiv. 1, Indiv. 2, Indiv. 3.]

$\sigma_1^2 = 0$ and $0 < \sigma^2 < \sigma_0^2$:

[Figure, titled "Problem 2(b)2.": time plot of Y (axis 0 to 200) against Time (axis 0 to 30); legend: Pop. Mean, Indiv. 1, Indiv. 2, Indiv. 3.]

$\sigma^2 = 0$, $\sigma_0^2 > 0$, and $\sigma_1^2 > 0$:

[Figure, titled "Problem 2(b)3.": time plot of Y (axis 0 to 200) against Time (axis 0 to 30); legend: Pop. Mean, Indiv. 1, Indiv. 2, Indiv. 3.]
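
The three conditions in part (d) can also be explored by simulation. The sketch below is only illustrative: the standard deviations, the measurement times, the seed, and the function name `simulate` are all hypothetical choices, and the random intercept and slope are drawn independently (that is, $\sigma_{01} = 0$ is assumed for simplicity). Only $\beta_0 = 20$ and $\beta_1 = 5$ come from the problem.

```python
import random

BETA0, BETA1 = 20.0, 5.0             # fixed effects given in part (d)
TIMES = [0, 5, 10, 15, 20, 25, 30]   # hypothetical measurement times


def simulate(sd0, sd1, sd_eps, n_subjects=3, seed=1):
    """Draw individual mean trajectories and noisy observations.

    sd0, sd1, sd_eps are the standard deviations of the random intercept,
    random slope, and residual error. A value of 0 switches that source off.
    """
    rng = random.Random(seed)
    subjects = []
    for _ in range(n_subjects):
        b0 = rng.gauss(0.0, sd0) if sd0 > 0 else 0.0
        b1 = rng.gauss(0.0, sd1) if sd1 > 0 else 0.0
        mean = [BETA0 + b0 + (BETA1 + b1) * t for t in TIMES]
        obs = [m + (rng.gauss(0.0, sd_eps) if sd_eps > 0 else 0.0) for m in mean]
        subjects.append({"b0": b0, "b1": b1, "mean": mean, "obs": obs})
    return subjects


population_mean = [BETA0 + BETA1 * t for t in TIMES]
cond1 = simulate(sd0=0.0, sd1=2.0, sd_eps=1.0)    # sigma_0^2 = 0, sigma^2 < sigma_1^2
cond2 = simulate(sd0=20.0, sd1=0.0, sd_eps=5.0)   # sigma_1^2 = 0, sigma^2 < sigma_0^2
cond3 = simulate(sd0=20.0, sd1=2.0, sd_eps=0.0)   # sigma^2 = 0

for name, cond in [("1", cond1), ("2", cond2), ("3", cond3)]:
    print("condition", name, [round(s["mean"][0], 2) for s in cond])
```

Under condition 1 every individual line starts at the population intercept and fans out; under condition 2 the lines are parallel to the population mean; under condition 3 the observations fall exactly on each individual's line.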

2. Data were collected on N = 22 students by the Minneapolis Public School District
in Minnesota to comply with federal accountability requirements, namely Title X
of the No Child Left Behind Act of 2001. Reading achievement scores (Read) were
measured in grades 5, 6, 7, and 8 for each student in the study. The predictor of
interest is called Risk, which takes on the value DADV if the student was designated as
disadvantaged (based on poverty or homeless measures), and ADV if the student was
designated as advantaged. R code, output, and summary plots are in the Appendix.

(a) Write out the equation of the mod1 fitted model for the subject-specific
(conditional) mean response. Define any variables used.
(b) Give an interpretation of the estimated Grade coefficient in mod1 in context of
the problem.
(c) Does the intercept for mod1 have a useful interpretation? If yes, give the
interpretation. If no, explain why.
(d) What is the estimated variance of the random slopes using mod1.REML? Provide
an interpretation of the magnitude of this value in context of the problem.
(e) Here are the predicted random effects (EBLUPs) for individual 1, who was
classified as disadvantaged:
> ranef(mod1.REML)
  (Intercept)     Grade
1 -25.6386581 0.7372782
You may leave your answers to the following three questions unsimplified.
i. What is the predicted subject-specific slope for individual 1?
ii. What is the predicted mean reading score for individual 1 in grade 9?
iii. What is the estimated marginal mean reading score for disadvantaged
students in grade 9?
(f) The R output includes a likelihood ratio test of mod2.REML vs. mod1.REML. State
the null and alternative hypotheses being tested (in mathematical symbols), the
p-value (report at least four digits), and state a conclusion of the test in context
of the problem.
(g) Write a short paragraph addressing the effect of being advantaged or
disadvantaged on the changes in reading scores across grades. Include both an
estimate of the effect as well as results from an appropriate test assessing
statistical significance of the effect.

Solution:

(a)
\[ \hat{Y}_{ij} \mid b_i = 194.571 - 24.258 D_i + 4.570 t_{ij} + 0.571 t_{ij} D_i + b_{0i} + b_{1i} t_{ij} \]
where
$\hat{Y}_{ij}$ is the fitted mean reading achievement score of the $i$th individual in grade
$t_{ij}$,
$t_{ij}$ is the grade level of individual $i$ at the time of occasion $j$, and
\[ D_i = \begin{cases} 1 & \text{individual } i \text{ is disadvantaged} \\ 0 & \text{individual } i \text{ is advantaged} \end{cases} \]
(b) For advantaged students, the estimated mean reading score increases by 4.570
per grade level.
(c) No. The intercept is the mean reading level at grade zero for advantaged students,
which is an extrapolation beyond the observed data (grades 5-8).
(d) $\mathrm{Var}(b_{1i}) = (2.8595)^2 = 8.177$.
Approximately 95% of advantaged students have an estimated mean change in
reading score between a 1.15 decrease and a 10.29 increase per grade level
($4.570 \pm 2(2.860)$).
(e) i. 4.570 + 0.571 + 0.737 = 5.878
ii. 194.571 - 24.258 - 25.639 + (4.570 + 0.571 + 0.737)(9) = 197.576
iii. 194.571 - 24.258 + (4.570 + 0.571)(9) = 216.582
(f) We are testing the set of hypotheses
$H_0: \sigma_1^2 = 0$
$H_a: \sigma_1^2 > 0$
where $\sigma_1^2 = \mathrm{Var}(b_{1i})$. The p-value is 0.01149973 (from a $\chi^2(2)$ and $\chi^2(1)$ mixture
distribution), which is less than a significance level of 0.05, so we reject $H_0$. There
is statistically significant evidence of individual variation in the mean change in
reading score per grade level.
(g) Disadvantaged students have an estimated increase in mean reading score that
is 0.571 larger than the estimated increase in mean reading score for advantaged
students per grade level. However, this difference between the two groups is not
statistically significant (p-value = 0.710).
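
The arithmetic in parts (d) and (e) can be reproduced with a short script. This is a minimal sketch in plain Python that uses the rounded estimates quoted in the solution; the function name `predict` and the variable names are hypothetical and not part of the R output.

```python
# Rounded ML fixed-effect estimates from summary(mod1).
b_int, b_dadv, b_grade, b_inter = 194.571, -24.258, 4.570, 0.571
# Rounded EBLUPs for individual 1 from ranef(mod1.REML).
u_int, u_grade = -25.639, 0.737
# REML standard deviation of the random slopes.
sd_slope = 2.8595


def predict(grade, disadvantaged, re_int=0.0, re_slope=0.0):
    """Mean reading score; random effects default to 0 (the marginal mean)."""
    d = 1 if disadvantaged else 0
    intercept = b_int + b_dadv * d + re_int
    slope = b_grade + b_inter * d + re_slope
    return intercept + slope * grade


# Part (d): variance of the random slopes and the approximate 95% range,
# using the rounded standard deviation 2.860 as in the solution.
var_slope = sd_slope ** 2
slope_range = (b_grade - 2 * 2.860, b_grade + 2 * 2.860)

# Part (e): subject-specific slope, subject-specific and marginal predictions.
slope_ind1 = b_grade + b_inter + u_grade
pred_ind1_g9 = predict(9, True, u_int, u_grade)
marg_dadv_g9 = predict(9, True)

print(round(var_slope, 3), [round(v, 2) for v in slope_range])
print(round(slope_ind1, 3), round(pred_ind1_g9, 3), round(marg_dadv_g9, 3))
```

Passing the EBLUPs gives the subject-specific prediction, while leaving them at zero gives the marginal mean for the group, which is the distinction between parts (e)ii and (e)iii.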

3. Consider a three-year longitudinal study of binge drinking among 100 college
students, with a weekly indicator variable $Y_{ij}$ as the response variable (1 for any binge
drinking episode that week; 0 otherwise). Covariates are time $t_{ij}$ in weeks, and
membership in organized group housing $X_i$ (1 for yes, such as fraternity, sorority,
or student cooperative; 0 otherwise). (Note that membership in organized group
housing for an individual does not change over time.) One of the goals of the study
is to assess changes in binge drinking behavior over time, and particularly whether
changes over time are influenced by membership in group housing.

(a) For the goal of this study, would a model for population-averaged effects or
subject-specific effects be more appropriate? Explain.
(b) Write out an appropriate generalized linear mixed model for this study, including
any model assumptions.
(c) Using your model from part (b), write out the null and alternative hypotheses that
match the study goal of determining whether changes over time are influenced
by membership in group housing in terms of the model parameters.

Solution:

(a) To assess within-subject changes in binge drinking behavior over time, a subject-
specific model would be more appropriate. For the question of whether changes
over time are influenced by membership in group housing, a population-averaged
model would be more appropriate since group membership does not change over
time within an individual in this study.
(b) Assume
\[ Y_{ij} \mid b_i \sim \mathrm{Bin}(1, \pi_{ij}) \]
where we model the conditional probability of any binge drinking episode in week
$j$ for individual $i$, $\pi_{ij} = E(Y_{ij} \mid b_i)$, as
\[ \log\left( \frac{\pi_{ij}}{1 - \pi_{ij}} \right) = \beta_0 + b_i + \beta_1 X_i + \beta_2 t_{ij} + \beta_3 X_i t_{ij}, \]
where we assume $b_i \overset{iid}{\sim} N(0, \sigma_b^2)$.
(Note that the model must include the interaction term between $X_i$ and $t_{ij}$ in
order to assess whether changes over time are influenced by membership in group
housing.)
(c) $H_0: \beta_3 = 0$ vs. $H_a: \beta_3 \neq 0$.
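
To see why the hypothesis in part (c) targets the interaction coefficient, it may help to evaluate the conditional log-odds at two consecutive weeks for each housing group. The sketch below is plain Python with made-up coefficient values; `beta`, `b_i`, and the function names are hypothetical and serve only to illustrate the model in part (b).

```python
import math


def inv_logit(eta):
    """Map a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))


def conditional_prob(t, x, b_i, beta):
    """P(Y_ij = 1 | b_i) under the random-intercept logistic model."""
    beta0, beta1, beta2, beta3 = beta
    eta = beta0 + b_i + beta1 * x + beta2 * t + beta3 * x * t
    return inv_logit(eta)


# Hypothetical values, chosen only for illustration.
beta = (-2.0, 0.5, 0.01, 0.005)
b_i = 0.3

# Weekly change in conditional log-odds for each housing group.
slope_no_housing = (logit(conditional_prob(1, 0, b_i, beta))
                    - logit(conditional_prob(0, 0, b_i, beta)))
slope_housing = (logit(conditional_prob(1, 1, b_i, beta))
                 - logit(conditional_prob(0, 1, b_i, beta)))

print(slope_no_housing, slope_housing, slope_housing - slope_no_housing)
```

The weekly change in log-odds is $\beta_2$ without group housing and $\beta_2 + \beta_3$ with it, so the two groups change at the same rate exactly when $\beta_3 = 0$.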

Appendix: R code, output, and plots for Problem 2
[Figure, two panels. Left: "Individual Reading Score Trajectories", Reading Score (axis 140 to 260) against Grade (5 to 8). Right: "Mean Reading Score Trajectories by Group", Mean Reading Score (axis 140 to 260) against Grade (5 to 8). Legend in both panels: Advantaged, Disadvantaged.]

> mod1 = lme(Read ~ Risk*Grade, random = ~ 1+Grade|ID, method="ML",...)


> summary(mod1)
Linear mixed-effects model fit by maximum likelihood
...
Fixed effects: Read ~ Risk * Grade
                   Value Std.Error DF   t-value p-value
(Intercept)    194.57106  9.205739 56 21.135844  0.0000
RiskDADV       -24.25798 12.429473 20 -1.951650  0.0651
Grade            4.56981  1.135098 56  4.025920  0.0002
RiskDADV:Grade   0.57102  1.528037 56  0.373693  0.7100

> mod1.REML = lme(Read ~ Risk*Grade, random = ~ 1+Grade|ID, method="REML",...)


> summary(mod1.REML)
Linear mixed-effects model fit by REML
...
Random effects:
Formula: ~1 + Grade | ID
Structure: General positive-definite, Log-Cholesky parametrization
            StdDev    Corr
(Intercept) 25.940918 (Intr)
Grade        2.859547 -0.786
Residual     4.268936

> mod2.REML = lme(Read ~ Risk*Grade, random = ~ 1|ID,
                  data=dat.long, method="REML", na.action=na.omit)
> anova(mod2.REML,mod1.REML)
          Model df      AIC      BIC    logLik   Test  L.Ratio p-value
mod2.REML     1  6 569.8403 583.8247 -278.9202
mod1.REML     2  8 565.8409 584.4867 -274.9204 1 vs 2 7.999482  0.0183
> pchibarsq(7.999482, 2, lower.tail=FALSE)
[1] 0.01149973
> pchibarsq(7.999482, 1, lower.tail=FALSE)
[1] 0.002339537
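
The numbers in the anova table and the two pchibarsq calls can be cross-checked without R. The sketch below is plain Python and rests on one assumption: that pchibarsq(x, df, lower.tail=FALSE), which is not defined in this document, returns the upper tail of an equal-weight mixture of chi-square distributions with df - 1 and df degrees of freedom. The names `chi2_sf` and `chibar_sf` are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable, for df in {0, 1, 2}."""
    if df == 0:
        return 0.0 if x > 0 else 1.0   # chi-square(0) is a point mass at zero
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("only df = 0, 1, 2 are implemented in this sketch")


def chibar_sf(x, df):
    """Assumed form of pchibarsq: 50:50 mixture of chi-square(df - 1) and chi-square(df)."""
    return 0.5 * chi2_sf(x, df - 1) + 0.5 * chi2_sf(x, df)


# Log-likelihoods and parameter counts as printed in the anova table.
loglik = {"mod2.REML": -278.9202, "mod1.REML": -274.9204}
n_par = {"mod2.REML": 6, "mod1.REML": 8}

# Likelihood ratio statistic and AIC recomputed from the rounded log-likelihoods.
lr_from_loglik = 2.0 * (loglik["mod1.REML"] - loglik["mod2.REML"])
aic = {m: -2.0 * loglik[m] + 2.0 * n_par[m] for m in loglik}

lr = 7.999482                # L.Ratio as printed by anova()
p_naive = chi2_sf(lr, 2)     # plain chi-square(2) reference, as in the anova table
p_mix_2 = chibar_sf(lr, 2)   # compare with pchibarsq(7.999482, 2, lower.tail=FALSE)
p_mix_1 = chibar_sf(lr, 1)   # compare with pchibarsq(7.999482, 1, lower.tail=FALSE)

print(lr_from_loglik, aic)
print(p_naive, p_mix_2, p_mix_1)
```

Small differences in the last digits of the recomputed statistic and AIC values are expected because the table reports rounded log-likelihoods.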
