Course Outline
Longitudinal studies are studies in which data on individuals are measured repeatedly through time. This course will cover exploratory analysis, modeling and interpretation of longitudinal data.
The outline of this course is as follows. Section 1 starts with the background assumed and a brief overview of SPSS, introducing the module used to analyze longitudinal data: SPSS Proc Mixed. Then we highlight the main features and merits of longitudinal data. In the longitudinal analysis considerations we mention different features of longitudinal studies that must be taken into account when selecting an appropriate methodology. We present the data structure needed to analyze longitudinal data using SPSS Proc Mixed, and we also introduce some general notation and concepts.
Section 2 presents some graphical techniques used to explore longitudinal data, and it provides a brief discussion of the two approaches to the analysis of longitudinal data that will be the main focus of this course.
Section 3 presents the Mean Response approach. Section 4 presents the Random
Coefficients approach.
General notes on using SPSS
Profile Analysis
Parametric Curves
1. Introduction
1.1 Background Assumed
- Variables:
Y: Outcome, response, dependent variable
X: Covariates, independent variables
- Inference
Estimation, testing, and confidence intervals
- Statistical methods:
Multiple linear regression, ANOVA, ANCOVA.
To open a file you need to click on File and then select Open Data. SPSS data sets have
the file extension *.sav.
Variable selection
After making the appropriate selection and clicking Continue, the following screen will appear.
Variable selection
On the right-hand side of the screen there are two sub-dialog boxes: Fixed and Random (see below).
These four screens are needed to specify a model. We will go through each dialog box and its range of options when presenting the methodology applied to longitudinal data.
EXAMPLES
1) Diet Study
A physician is evaluating a new diet for her patients with a family history of heart disease. To
test the effectiveness of this diet, 16 patients are placed on the diet for 6 months. Their
weights and triglyceride levels are measured before and after the study, and the physician
wants to know if the weights have changed.
2) Smoke Data
This dataset consists of a subsample from an epidemiologic study conducted in a rural area of the Netherlands, where residents were followed over time to obtain information on the prevalence of and risk factors for chronic obstructive lung disease. A measure of pulmonary function (FEV1) was obtained every three years for the first 15 years of the study, and also at year 19. Information on respiratory symptoms and smoking status was collected.
3) Exercise Therapy Study
The data are from a study of exercise therapies, where 37 patients were assigned to one of
two weightlifting programs. In the first program (treatment 1), the number of repetitions was
increased as subjects became stronger. In the second program (treatment 2), the number of
repetitions was fixed but the amount of weight was increased as subjects became stronger.
Measures of strength were taken at baseline (day 0), and on days 2, 4, 6, 8, 10, and 12.
4) Treatment of Lead Exposed Children
These data consist of four repeated measurements of blood lead levels (a common form of metal intoxication) obtained at baseline (week 0), week 1, week 4, and week 6 on 100 children who were randomly assigned to a chelating treatment (an antidote to lead poisoning) or placebo.
5) Grocery coupons data
This is a hypothetical data file that contains survey data collected by a grocery store chain
interested in the purchasing habits of their customers. Each customer is followed for four
weeks, and each case corresponds to a separate customer-week and records information
about where and how the customer shops, including how much was spent on groceries during
that week.
A grocery store chain is interested in determining the effects of three different coupons
(versus no coupon) on customer spending. To this end, they construct a crossover trial in
which a random sample of their regular customers is followed for four weeks.
Design for crossover trial

         Sequence 1   Sequence 2   Sequence 3   Sequence 4
Week 1   No coupon    5 percent    15 percent   25 percent
Week 2   5 percent    25 percent   No coupon    15 percent
Week 3   15 percent   No coupon    25 percent   5 percent
Week 4   25 percent   15 percent   5 percent    No coupon
Thus, in Sequence 1, a customer is not sent a coupon in the first week, receives a 5 percent
coupon in the second week, a 15 percent coupon in the third week, and a 25 percent coupon
in the fourth week. Each customer is randomly assigned to one of the sequences.
6) Clinical Trial of Patients with Respiratory Illness
The data are from a clinical trial of patients with respiratory illness, where 111 patients from
two different clinics were randomized to receive either placebo or an active treatment.
Patients were examined at baseline and at four visits during treatment. At each examination,
respiratory status (categorized as 1 = good, 0 = poor) was determined.
7) Airline Cost Data
These data are from a study of a group of U.S. airlines collected on several companies over
15 years. The objective of the study is to determine the effect of economic factors on cost.
8) Growth Study Data
Investigators at the University of North Carolina Dental School followed the growth of 27
children (16 males, 11 females) from age 8 until age 14. Every two years they measured the
distance between two points in the head that are easily identified on x-ray.
9) Opposites-naming data
This dataset consists of a sample of people who completed an inventory that assesses their performance on a timed cognitive task called opposites naming. Individuals were measured on four occasions. A baseline measure of cognitive skill was also collected. The research interest focuses on whether opposites-naming skill increases more rapidly over time among individuals with stronger cognitive skills.
(iii) Another advantage of a longitudinal study is its ability to distinguish the degree of
variation in the response over time for one person from the variation in the response among
people. We will come back to this later in the course.
General structure

               Time
Subjects     1      2      3
    1       y11    y12    y13
    2       y21    y22    y23
    .        .      .      .
    n       yn1    yn2    yn3

where yij is the jth observation on the ith subject and y is the response or outcome measure.
In the structure above we have a single group of subjects. We can extend this to two or more groups of subjects repeatedly measured over time. The groups can be defined by characteristics of the study subjects, such as age or gender (studies based on such categories are observational). Groups can also be defined by randomization to alternative treatments. This type of design is called a Parallel Groups Design. The structure of a Two-Group Parallel Design is below:
                          Time
Group 1
Subjects          1         2         3       ...      p
    1            y11       y12       y13      ...     y1p
    2            y21       y22       y23      ...     y2p
    .             .         .         .                .
    m            ym1       ym2       ym3      ...     ymp
Group 2
Subjects
   m+1          y(m+1)1   y(m+1)2   y(m+1)3   ...    y(m+1)p
    .             .         .         .                .
    n            yn1       yn2       yn3      ...     ynp
Next we consider a variant of the single-group repeated measures design known as the crossover design. In the simplest version of the crossover design, two treatments, say A and B, are to be compared. Subjects are randomly assigned to the two treatment orders: A B and B A. Example: a placebo-controlled study of the effect of erythropoietin on plasma histamine levels and pruritus scores of 10 dialysis patients. The treatment schedule was 5 weeks of placebo and 5 weeks of erythropoietin, in random order. Another example is the Grocery Coupons data in section 1.3.
Correlation
To avoid writing equations, we can say that correlation is a single number that describes the degree of relationship between two variables. For example, we would expect some measure of self-esteem and weight to be correlated. The same applies to price and demand for a certain item.
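The course computes correlations in SPSS; purely as an illustration, here is a short Python sketch of the "single number" idea. The price/demand values are made up for this example.

```python
import numpy as np

# Made-up price and demand values for a certain item (illustrative only)
price = np.array([1.0, 1.5, 2.0, 2.5, 3.0, 3.5])
demand = np.array([100, 92, 83, 71, 64, 55])

# Pearson correlation: one number in [-1, 1] describing the degree of
# (linear) relationship between the two variables
r = np.corrcoef(price, demand)[0, 1]
print(round(r, 3))  # strongly negative: demand falls as price rises
```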
Factors
Factors are categorical predictors, e.g., treatment (treatment 1, . . ., treatment k) and gender.
Level
The categories of a factor are called levels. For example gender is a factor with two levels.
Covariates
Covariates are continuous (quantitative) predictors, e.g., age or baseline weight.
Interactions
An interaction between two factors means that each combination of factor levels can have a different effect on the dependent variable. Additionally, you may find that the relationship between a covariate and the dependent variable changes for different levels of a factor. This would be a factor-covariate interaction.
The response or outcome measure will be denoted by y. Predictors, both factors and covariates, will be denoted by x, except for time, which will be denoted by t.
For example, in the Smoke dataset (section 1.3), smoking status is a predictor of FEV1, and it could also be considered time-varying if some patients stop smoking.
Parallel Design
Continuous outcome
Balanced design with equally or unequally spaced measurements.
ID   Var1   Var2   Var3   Group
1    12     45     34     A
2    23     43     34     B
3    31     54     45     A
4    13     42     31     A
5    26     40     38     B
6    27     49     44     B
As an example we could think of Var1, Var2 and Var3 being, respectively, the weight of each
individual measured on three occasions.
The defining characteristic of Multiple-Record (MR) structure is that information pertaining to a single observation is stacked, or placed on multiple lines, in the dataset. For example, if there are 20 participants in a study with 3 observations on variable X for each person, the resulting MR data file will contain 60 lines (records), i.e., all the information regarding variable X will be contained in one column. In addition to variable X, it is also necessary to include an individual-level identifier (ID) and a variable representing the timing or sequence (ORDER) of measurements.
This format is illustrated in figure 2 for the first 3 participants.
ID   Var X   Order
1    12      1
1    45      2
1    34      3
2    23      1
2    43      2
2    34      3
3    31      1
3    54      2
3    45      3
In this course we are going to use the SPSS Linear Mixed Models procedure to analyze
longitudinal data. SPSS Mixed uses the MR format so if the data is initially recorded in MV
format you need to restructure the file.
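Outside SPSS, the same MV-to-MR restructuring can be sketched in Python with pandas. The toy frame below uses hypothetical tg*/wgt* columns echoing the diet file's variable-group naming; it is not the real data.

```python
import pandas as pd

# Toy multiple-variable (one row per patient) file; the tg*/wgt* names
# mirror the two variable groups described for the diet study
wide = pd.DataFrame({
    "patid": [1, 2],
    "tg0": [180, 139], "tg1": [148, 116], "tg2": [106, 97],
    "wgt0": [198, 237], "wgt1": [196, 233], "wgt2": [193, 232],
})

# Stack both variable groups into multiple-record (long) format:
# one line per patient-occasion, plus an index variable (time)
long = (pd.wide_to_long(wide, stubnames=["tg", "wgt"], i="patid", j="time")
          .reset_index()
          .sort_values(["patid", "time"]))
print(long)  # 2 patients x 3 occasions = 6 records
```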
PREPARING THE DATA: using the Restructure option in SPSS
First, let's open the diet study file, Diet_Study.sav. To restructure the data from variables to cases, from the menus choose:
You want to restructure variables into cases, so simply click Next in the Restructure Data
Wizard Welcome dialog box.
You have to specify how many variable groups you want to restructure. The default is the
first option. However, the diet file contains Weight and Triglyceride which are recorded for
each patient at different time points. This means that we have two variable groups so the
second option is appropriate in this case.
Click on the More than One option and type 2 after How Many?
Click Next.
In the Case Group Identification group, select Use selected variable from the dropdown list
and select Patient ID as the identification variable.
In the Variables to Be Transposed group, type Triglice as the first target variable (trans1) and select tg0, tg1, tg2, tg3 and tg4 as the variables to be transposed.
In the same Variables to Be Transposed box, click on the second target variable (trans2), type Weight and select wgt0, wgt1, wgt2, wgt3 and wgt4 as the second group of variables to be transposed.
You want to create just one index variable, so simply click Next in the Variables to Cases:
Create Index Variables.
(ii) We can address the between-person question (How does individual change differ across people?) by exploring whether different people change in similar or different ways, and whether observed differences in change across people are associated with individuals' characteristics; in other words, which predictors are associated with which patterns.
Understanding these two questions will prepare you for subsequent model-based analyses.
The exploratory analyses presented in this course will be based on graphical techniques.
The simplest way of visualizing how a person changes over time (question (i)) is to plot each person's outcome vs. time. Because it is difficult to discern similarities and differences among individuals if each page contains only a single plot, we recommend that you cluster sets of plots in smaller numbers of panels.
We use the diet file with multiple records (Diet_StudyLF) as an illustration. Once you've opened this file, from the menu choose:
Graphs
After you insert Triglyceride in the Y-Axis box and time in the X-Axis box, click on the Groups/Point ID box.
A drop-down list appears. Tick the Rows panel variable box.
A Panel box is displayed on the right side of the canvas.
Drag Patient ID to the Panel box.
Go to the Options box and tick Wrap Panels in the Panels sub-box.
Click OK.
You can get another version of this graph by double-clicking on the graph to open the editor; from the menu choose Elements and then click Add Markers. If you get rid of the lines connecting the markers you obtain the graph below.
Should you examine every possible empirical growth plot if your data set is large, including perhaps hundreds of cases? Probably not; instead, you can randomly select a subsample of individuals (perhaps stratified into groups defined by the values of important predictors).
Having summarized how each individual changes over time, we now examine similarities and differences in these changes across people (question (ii)). In other words, we are interested in the average change trajectory for the entire group.
To produce this type of graph, go to
Graphs
Click Ok
The average change trajectory for the entire group would be a line somewhere in the middle of the graph above. Unfortunately, there is no straightforward way to add it in SPSS. However, you could do this semi-automatically by selecting Data > Split File from the menu and then clicking on the Organize output by groups option.
Now, if you go to Analyze > Descriptive Statistics, you obtain the mean weight for each occasion of measurement (time). You can create a new patient ID (e.g., 17) in the Diet_Study.sav file and type in the five mean values. Re-run the graph above and you'll get:
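The Split File / Descriptives detour above just computes a mean per occasion. The same arithmetic can be sketched in Python with a groupby (the weights below are invented, not the diet file):

```python
import pandas as pd

# Multiple-record data: one line per patient-occasion (invented values)
df = pd.DataFrame({
    "patid":  [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "time":   [0, 1, 2, 0, 1, 2, 0, 1, 2],
    "weight": [200, 196, 193, 237, 233, 232, 173, 172, 166],
})

# Mean weight at each occasion: the points of the average trajectory
mean_traj = df.groupby("time")["weight"].mean()
print(mean_traj)
```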
It is also useful to do box plots (go to Graphs > Legacy Dialogs > Box Plots) to investigate symmetry and variability at each time point. Lack of symmetry implies non-normality. In the diet study dataset we could do a box-plot graph for weight by time and gender. We can see that there is no apparent departure from symmetry.
The graph below shows box plots for each time point for the triglyceride outcome. This response is less symmetric and, unlike weight, its variability over time is less homogeneous. However, we will see that the methodology used in longitudinal designs can account for this type of heterogeneity.
Next we discuss different choices for modeling the mean response over time. The mean, μ, is given by the linear regression model

E(Y) = μ = Xβ

We also need to specify the covariance matrix, but we will deal with this later in this section. First we will concentrate on modeling the mean response. We can distinguish two basic strategies: (i) Arbitrary Means (Profile Analysis) and (ii) Parametric Curves.
H0(1): Are the profiles of means similar between the groups, in the sense that the line segments between adjacent occasions are parallel? This is the hypothesis of no group-by-time interaction.
H0(2): If the group profiles are parallel, are the means constant over time? This is the hypothesis of no time effect.
H0(3): If the group profiles are parallel, are they also at the same level? This is the hypothesis of no group effect.
Although these general formulations of the study hypotheses are a good place to begin, the
appropriate hypotheses in a particular study must be derived from the relevant scientific
issues in that investigation.
Suppose that each individual was measured three times. Some of the available structures are:

Unstructured. Every variance and covariance is a separate parameter:

    | σ11   σ12   σ13 |
    | σ12   σ22   σ23 |
    | σ13   σ23   σ33 |

First-Order Autoregressive (AR1). This structure has constant variance, with correlations that decline as the lag between occasions increases:

    σ² | 1    ρ    ρ² |
       | ρ    1    ρ  |
       | ρ²   ρ    1  |

Compound Symmetry. This structure has constant variance and constant covariance:

    | σ² + σ1²   σ1²         σ1²       |
    | σ1²        σ² + σ1²    σ1²       |
    | σ1²        σ1²         σ² + σ1²  |
For more options see Covariance Structures on the SPSS Proc Mixed online help.
With too little structure (e.g., unstructured), there may be too many parameters to be estimated with the limited amount of data available. This would leave too little information available for estimating β. With too much structure (e.g., compound symmetry), there is more information available for estimating β, but the assumed structure may not hold.
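The parameter-count trade-off can be made concrete with a small numpy sketch (all variance values are arbitrary): for p occasions, unstructured needs p(p+1)/2 parameters, while compound symmetry and AR(1) each need only two.

```python
import numpy as np

p = 3  # number of measurement occasions

# Unstructured: every variance and covariance is free
n_unstructured = p * (p + 1) // 2      # 6 parameters for p = 3

# Compound symmetry: constant off-diagonal covariance sigma1 and
# constant diagonal variance sigma2 + sigma1 -> 2 parameters
sigma2, sigma1 = 4.0, 1.5              # arbitrary illustrative values
cs = np.full((p, p), sigma1) + sigma2 * np.eye(p)

# First-order autoregressive: variance var_ar and lag-1 correlation rho,
# with correlation rho**|j-k| between occasions j and k -> 2 parameters
var_ar, rho = 4.0, 0.6
idx = np.arange(p)
ar1 = var_ar * rho ** np.abs(idx[:, None] - idx[None, :])

print(n_unstructured)
print(cs)
print(ar1)
```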
observed values. Note that a high likelihood corresponds to a low -2 Restricted Log Likelihood because we are multiplying by -2. Therefore, the smaller this measure is, the better.
The other four measures are modifications of the log likelihood which penalize more complex models.
Akaike's Information Criterion (AIC) adjusts the -2 Restricted Log Likelihood by twice the
number of parameters in the model.
Hurvich and Tsai's Criterion (AICC) is a correction for the AIC when the sample size is
small. As the sample size increases, the AICC converges to the AIC.
Schwarz's Bayesian Criterion (BIC) has a stronger penalty than the AIC for overparameterized models, and adjusts the -2 Restricted Log Likelihood by the number of parameters times the log of the number of cases.
Bozdogan's Criterion (CAIC) has a stronger penalty than the AIC for overparameterized models, and adjusts the -2 Restricted Log Likelihood by the number of parameters times one plus the log of the number of cases. As the sample size increases, the CAIC converges to the BIC.
Smaller values indicate better models.
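These criteria can be computed directly from the -2 (Restricted) Log Likelihood. The Python sketch below implements the formulas as described above, with q the number of parameters and n the number of cases (SPSS's exact effective n may differ depending on the estimation method):

```python
import math

def fit_criteria(neg2ll, q, n):
    """Penalized fit measures from a -2 (restricted) log likelihood."""
    aic  = neg2ll + 2 * q                      # AIC: twice the parameters
    aicc = neg2ll + 2 * q * n / (n - q - 1)    # AICC: small-sample correction
    bic  = neg2ll + q * math.log(n)            # BIC: parameters * log(cases)
    caic = neg2ll + q * (math.log(n) + 1)      # CAIC: parameters * (1 + log(cases))
    return aic, aicc, bic, caic

# Illustrative values only
print(fit_criteria(neg2ll=350.0, q=2, n=100))
```

Note that as n grows, 2qn/(n - q - 1) approaches 2q, so the AICC converges to the AIC, matching the description above.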
Model Comparison
First we need to make a distinction between nested and non-nested models:
Model I is nested in Model II when Model I is a particular case of Model II.
Example:
Y = β0 + β1x1 + β2x2 + β3x1x2 + e        (I)

Y = β0 + β1x1 + β2x2 + β3x1x2 + β4x1² + β5x2² + e        (II)
When models are nested and the parameters are estimated using maximum likelihood methods, as is the case with SPSS Mixed, they can be compared statistically by using the Likelihood Ratio Test.
The likelihood ratio test compares a smaller model with a more complex model. The null hypothesis of the test states that the smaller model fits the data as well as the larger, more complex model. If the null hypothesis is rejected, then the alternative, larger model provides a significant improvement over the smaller model.
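For nested models the test statistic is the difference in -2 log likelihoods, referred to a chi-square distribution with degrees of freedom equal to the difference in the number of parameters. A hedged Python sketch (the likelihood values are invented, not taken from the course output):

```python
from scipy.stats import chi2

# -2 (restricted) log likelihoods of two NESTED covariance models
# (illustrative numbers only)
neg2ll_small = 356.0   # e.g. compound symmetry for the covariance
neg2ll_large = 344.0   # e.g. unstructured, with the same model for the mean
df = 13                # extra covariance parameters in the larger model

lr_stat = neg2ll_small - neg2ll_large   # likelihood ratio statistic
p_value = chi2.sf(lr_stat, df)          # upper-tail chi-square p-value

# Reject H0 (the smaller model fits as well) when p_value is small
print(round(lr_stat, 2), round(p_value, 4))
```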
We can use likelihood ratio tests for hypotheses about models for the mean and the
covariance (keep in mind that in this approach we need to specify a model for the mean and a
model for the covariance structure).
Note that models for the covariance can be statistically compared provided that they have the same model for the mean.
Non-nested models can only be compared descriptively, using model fit measures such as the ones mentioned above. For example, the autoregressive and compound symmetry covariance structures are not special cases of each other, i.e., they are non-nested.
We will go back to model selection and comparison in the examples described throughout the
course.
Screen 1
Click Continue.
Note that so far we have used time (which is a factor) to specify the covariance structure (since observations for the same patient are correlated). However, we are also interested in the effect of time on the response, so in the second window we enter time as a factor.
Select Weight as the dependent variable.
Select time and gender as factors.
Screen 2
Click Fixed.
Screen 3-a
Select time and gender in the Factors and Covariates box and click Add.
Note that the box in the middle is set to Main Effects. If you want to include the interaction term between time and gender in the model, you have to click on this drop-down and select Interaction. Then highlight both time and gender and click Add. The screen below will appear.
Screen 3-b
Let's choose the model without the interaction term (you can test the interaction effect as an exercise). If you are already in Screen 3-b, highlight time*gender and click Remove.
Click Continue.
Click on Statistics in the Linear Mixed Models dialog box.
In the Statistics sub-box (Screen 2 ), select Parameter estimates, Tests for covariance
parameters and Covariances of residuals in the Model Statistics group.
Click Continue and then click OK in the Linear Mixed Models dialog box.
Screen 4
Table 1
Table 2
The tests of fixed effects table provides F tests for each of the fixed effects specified in the
model. Small significance values (Sig. column) indicate that the effect contributes to the
model.
Table 3
Table 4 provides estimates of the fixed model effects and tests of their significance. Since there is an intercept term, the fifth level of time is redundant. Thus, the estimates for the first four levels contrast the average weight at times 1, 2, 3 and 4 with the last period. We can see that the estimated average weight at each time point is significantly different from the last time point. From the first column, the estimated mean weights decrease over time, with the mean weight on the first occasion being the highest. The effect of gender is also significant.
Table 4
The following two tables show information about the variance-covariance structure.
Table 5
For the unstructured covariance matrix, the table above directly reports the values in the
matrix and their corresponding significance. UN(1,1) is the variance for the error term in time
1, UN(2,2) is the variance for the error term in time 2, UN(2,1) is the covariance between error
terms at the first and second time periods and so on. Another way to look at the estimated
variances and covariances is given by the table below.
Table 6
But how do we know that this model is appropriate for these data? First we need to assess whether this is a feasible model before doing any model comparison (see Restricting the Covariance Structure below).
A simple way to assess how well the model fits the data is by plotting the fitted values obtained from the model vs. the observed values. To obtain the predicted values when you run the analysis in SPSS Mixed, you will find a sub-box called Save in Screen 2. Click on Fixed Predicted Values. Let's assume for now that this model is suitable. In the example in Section 3.2 we will see how to plot predicted vs. observed values.
Table 7
The model dimension table shows that the number of parameters in the repeated effects is
reduced from 15 to 2.
We can compare the information criteria for this model to those for the previous model with
the unstructured covariance.
The -2 Restricted Log Likelihood is smaller for the unstructured model but this is expected
since this model has a larger number of parameters. All of the other measures, which
penalize overly complex models, are smaller for the autoregressive model except the AIC
which is similar for both models.
Another way to check that the simpler model is suitable is by looking at the parameter estimates table for the fixed effects (Table 10). If the estimated effects and standard errors were very different from those in the corresponding table for the unstructured model (Table 4), it would not be a good sign. In this case there are only slight differences between the two models. A more formal comparison is given by the Likelihood Ratio Test.
Table 10
The AR1 diagonal parameter specifies the residual variance for each time point. The AR1
rho parameter specifies the residual correlation between two consecutive occasions (see
table 11).
Table 11
The significance value of the test is saved to the variable p-value, and its value is 0.014. Therefore, at the 5% significance level, we reject the assumption of an AR(1) structure. However, if we take into account the information criteria for both models, it would make sense to keep an AR(1) structure, especially because if the number of repeated measurements becomes too large and the sample is not big enough, the estimation of an unstructured covariance becomes unfeasible.
Suggestion: Try to fit a Heterogeneous AR(1).
E(Yij) = β0 + β1 Time_j

While for subjects in treatment group 1,
Questions:
Does Lead level increase linearly over time?
Running the analysis:
Select Patient ID as the subjects variable.
Select time as the repeated effects variable.
Select Unstructured from the Repeated Covariance type dropdown list.
Screen 1
Click Continue.
Note that, as in profile analysis, we still use time (which is a factor) to specify the covariance structure. However, in the parametric curves approach we are interested in modeling the time effect as a continuous variable, so we are going to use the actual day of measurement (days) as the time variable. The simplest curve is a line, i.e., we assume that a straight line approximates the relationship between the response and days.
Select Lead Level as the dependent variable.
Select Treatment as a factor.
Select Days as a covariate.
Screen 2
Click Fixed.
Select Treatment and days in the Factors and Covariates box and click Add.
Screen 3
As in profile analysis,
Click Continue.
Table 12
Table 13
Table 14 below shows that days is very significant but the effect of treatment is borderline
(Sig. =0.042).
Table 14
Table 15
Table 16
First, before doing any model comparison (e.g., doing a profile analysis instead), we need to assess whether this is a feasible model.
As mentioned in the previous section, a straightforward way to assess how well the model fits the data is by plotting the fitted values obtained from the model vs. the observed values. After saving the Fixed Predicted Values in Screen 2, you can plot predicted vs. observed as follows. From the menu choose:
Graphs
Legacy Dialogs
Interactive Line
Click on Summaries of separate variables and select the Multiple option. You may want to put Treatment in the Row box so that you can have one plot for each treatment. You get the graph below.
It is obvious that when there is no treatment (A) the relationship between days and lead level is nonlinear, so you may be better off trying another curve or simply doing a profile analysis.
I strongly advise doing some exploratory analysis before fitting any model. Things such as nonlinearity are easy to check with a simple plot. In this case a graph of Lead Level vs. days by Treatment would have indicated the lack of linearity.
The parametric curves approach allows one to model the time trend and treatment effect(s) as a function of a small number of parameters. That is, the treatment effect can be captured in one or two parameters, leading to more powerful tests when these models fit the data.
Random Coefficients
A random coefficients model is an alternative approach to modeling longitudinal data. The most common applications are those in which a linear relationship is assumed between the outcome of interest and time, and it can be considered an extension of the simple linear regression model for the response Y of subject i on occasion j (see equation below).

Yij = β0 + β1 Time_ij + e_ij,   i = 1, ..., N;  j = 1, ..., T     (*)
Note that in (*), β0 and β1 do not vary between individuals. They are fixed, and they would have the same interpretation as in the linear curve in section 3.2. For β1, for example, this means that we are only assessing the average change trajectory for the entire group, not each individual's growth trajectory.
Also, remember that linear regression models assume independence of the errors, and for longitudinal data this assumption is unreasonable.
A way to extend the idea of multiple regression models to longitudinal data is by introducing random effects in the regression parameters. This is the main characteristic of Random Coefficient Models. These random effects allow describing each subject's trend across time and also accounting for the correlation between measurements.
We are going to describe these ideas in more detail in the next two sections by presenting the Random Intercept Model and the Random Intercept and Slope Model.
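Before formalizing the model, the effect of a random intercept can be sketched by simulation. All parameter values in the sketch below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Random-intercept model: y_ij = b_0i + beta1 * t_j + e_ij,
# with subject-specific intercepts b_0i = beta0 + u_0i
beta0, beta1 = 100.0, -2.0   # fixed effects (invented)
sd_u0, sd_e = 10.0, 1.0      # between- and within-subject SDs (invented)
n_subj = 5
times = np.arange(4.0)

u0 = rng.normal(0.0, sd_u0, size=n_subj)
b0 = beta0 + u0              # each subject gets a unique starting level
y = (b0[:, None] + beta1 * times
     + rng.normal(0.0, sd_e, size=(n_subj, len(times))))

# Every subject follows (noisily) a line with the SAME slope beta1
# but a DIFFERENT intercept b_0i
print(y.shape)  # (5, 4): 5 subjects x 4 occasions
```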
Yij = b0i + β1 Time_ij + e_ij     (4.1-a)

b0i = β0 + u0i     (4.1-b)

where e_ij ~ N(0, σe²) and u0i ~ N(0, σ²u0)     (4.1-c)
A typical situation where this model would be appropriate is if the data looked like the graph
below.
Each line corresponds to a different subject. The slope doesn't seem to change across subjects. However, the intercept does.
From the first equation in (4.1), the random intercept model indicates that the response for subject i is influenced by his/her initial level b0i. The second equation in (4.1) indicates that the initial level for subject i, b0i, is determined by β0, the initial level common to all subjects, plus a contribution unique to that subject, u0i. Equation (4.1-c) indicates that the subject effect b0i is random, by specifying a variance parameter σ²u0.
Another way to see it is to think that each subject's trend line is parallel to a common trend given by β0 + β1·Time, so the difference between each subject's trend and the common trend is given by u0i. This means that for the random intercept model u0i is a between-subject error and its variance σ²u0 is the between-subject variance. This variance represents the spread of the lines in the graph above. If σ²u0 is near zero, then the individual lines would not deviate much from the common trend β0 + β1·Time. The graph below illustrates this point.
The remaining variance parameter, σe², is called the within-person residual variance. It is the variability of the errors e_ij. To interpret these errors, suppose a hypothesized trajectory for subject i. In the graph below, e_i1, e_i2 and e_i3 are the deviations of subject i's observed values from that trajectory.
The introduction of random effects induces correlation among the repeated measurements on the same subject. For the Random Intercept Model the correlation structure will have a Compound Symmetry form (see the Compound Symmetry structure above). This type of correlation is usually unrealistic unless subjects are measured on only two occasions.
The Random Intercept and Random Slope Model, given next, allows for a more flexible structure.
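That the random intercept induces a compound-symmetry correlation can be checked by simulation: the correlation between any two occasions equals σ²u0 / (σ²u0 + σe²), whatever the lag. A numpy sketch with invented variance components:

```python
import numpy as np

rng = np.random.default_rng(42)

# Invented variance components
var_u0, var_e = 9.0, 4.0
n_subj, n_occ = 200_000, 3

# y_ij = (fixed part) + u_0i + e_ij; the fixed part is the same for
# everyone, so it does not affect variances or correlations
u0 = rng.normal(0.0, np.sqrt(var_u0), size=(n_subj, 1))
e = rng.normal(0.0, np.sqrt(var_e), size=(n_subj, n_occ))
y = u0 + e

corr = np.corrcoef(y, rowvar=False)
# Every off-diagonal entry should be close to 9 / (9 + 4) ~ 0.692,
# regardless of how far apart the two occasions are
print(corr.round(3))
```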
Screen 1
Click Continue.
Select Weight as the dependent variable.
Select time as a covariate and gender as a factor.
Screen 2
Click Fixed.
Screen 3
Select time and gender in the Factors and Covariates box and click Add.
Click Continue
Click Random in the Linear Mixed Models dialog box.
Screen 4
Screen 5
Table 18
From this table we can see that time and gender are significant.
Table 20
The intercept refers to the initial value of the response. Since gender is significant, the interpretation of the intercept (β0) depends on gender.
For gender = 0, the baseline average weight is 165.65 + 57.92 = 223.57. For gender = 1, the average initial weight is 165.65.
The estimated slope (β1) is -2.01. Because the sign is negative, the linear trend goes down, i.e., on average patients lose weight over time. The time scale in this data set is weeks, so we can say that on average patients are losing 2.01 units per week.
Table 21
The table above provides a summary of the parameters used to specify the random effect and residual covariance matrices. Unlike the models for the mean response, no repeated effects are specified here, so the variance of the residuals has only one parameter. Its estimated value is 2.52. Comparing the between-subject variance σ²u0 with the residual variance σe² shows that most of the variability in weight (i.e., the variability in the response not explained by the linear effect of time) is at the individual level.
It is always recommended to check predicted vs. observed values, either through the graph below or by producing a table of predicted vs. observed weight (see Table 22). The model seems to fit the data well.
Table 22
Yij = b0i + b1i Time_ij + e_ij,   where e_ij ~ N(0, σe²)     (4.2-a)

b0i = β0 + u0i
b1i = β1 + u1i     (4.2-b)

where

( u0i )        ( ( 0 )   ( σ²u0     σu0,u1 ) )
( u1i )  ~  N  ( ( 0 ) , ( σu0,u1   σ²u1   ) )     (4.2-c)
This model is more suitable when the data show a pattern as in the graph below. We can see
that not only the intercepts but also the slopes seem to vary across subjects.
In this model both the intercepts b0i and the slopes b1i are considered random. The randomness of the coefficients is set by assigning a distribution to them (4.2-c). It can be shown that for the Random Intercept and Random Slope model the variance-covariance structure is not only a function of the parameters in (4.2-c) but also a function of time.
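The time dependence can be written out explicitly: from (4.2), Var(Yij) = σ²u0 + 2t·σu0,u1 + t²·σ²u1 + σe². A small sketch with invented variance components shows how the variance changes across occasions:

```python
# Invented random intercept-and-slope variance components
var_u0, var_u1, cov_u01, var_e = 9.0, 1.0, 0.5, 4.0

def var_y(t):
    """Var(Y_ij) at time t for the random intercept and slope model:
    Var(u0) + 2*t*Cov(u0, u1) + t^2 * Var(u1) + Var(e)."""
    return var_u0 + 2 * t * cov_u01 + t ** 2 * var_u1 + var_e

# Unlike the random-intercept-only model, the variance grows with time here
for t in (0, 1, 2, 3):
    print(t, var_y(t))
```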
Screen 1
Click Continue.
Select the opposites-naming (OPP) score as the dependent variable.
Select time as a covariate.
Screen 2
Click Fixed
Screen 3
Select time in the Factors and Covariates box and click Add.
Click Continue
Click Random in the Linear Mixed Models dialog box.
Screen 4
Table 23
Table 24
Table 25
Table 26
The fixed effects (time and intercept) interpretation is the same as in the random intercept model: children start off, on average, with an OPP score of 164.37 and gain 26.96 points per testing occasion. Because the sign of the estimated slope is positive, children's OPP scores get higher over time.
Table 27
Inspection of the random effects indicates that the residual variance is 159.48 and it is statistically significant. The intercept and slope also have significant variances, UN(1,1) and UN(2,2) (σ²u0 and σ²u1, respectively).
This type of model assumes the correlation between repeated measurements arises because each subject has an underlying (latent) level of response which persists over time.
It distinguishes two sources of variation that account for differences in the responses of subjects measured at the same occasion:
(i) Between-Subject Variation: different subjects simply respond differently.
(ii) Within-Subject Variation: in a longitudinal design, this is the variation in the outcome measure over time within each subject.
Random coefficients models are very flexible since they can accommodate any degree of imbalance in the data. That is, we do not necessarily require the same number of observations on each subject, or that the measurements be taken at the same times.
Random Coefficients: make inferences about individuals rather than population averages.
Interest focuses on estimating intercepts or slopes for each subject and/or on controlling for unmeasured subject characteristics (variance parameters for the intercept and slope).
They are also a better choice when there are missing data.
References
Diggle, P.J., Heagerty, P., Liang, K.-Y. and Zeger, S.L. (2002). Analysis of Longitudinal Data (2nd ed.). Oxford: Oxford University Press.
Hedeker, D. and Gibbons, R.D. (2006). Longitudinal Data Analysis. Hoboken, NJ: John Wiley & Sons.
http://biosun1.harvard.edu/~fitzmaur/ala/