ANALYSIS OF REPEATED MEASURES DATA USING SAS

Krishan Lal I.A.S.R.I., Library Avenue, Pusa, New Delhi 110 012 klkalra@iasri.res.in

1. INTRODUCTION

The term repeated measures refers broadly to data in which the response of each experimental unit or subject is observed on multiple occasions or under multiple conditions. Thus repeated measurements arise when multiple measurements of the response variable are obtained, over several time periods, from each experimental unit, such as an animal. Usually the responses are taken over time, for example body weights of animals measured weekly or monthly, or fruit production recorded over several years from the same tree. Repeated measures data are obtained in animal science, horticulture, clinical trials, medical science, and physiological and psychological experiments, among others. Repeated measures experiments are a type of factorial experiment, with group and time as the two factors. They have been used commonly in animal, plant, and human research for several decades, but only in recent years have statistical and computing methodologies become available to analyze them effectively and efficiently.

The objectives of repeated measures data analysis are to examine and compare response trends over time. This can involve comparisons of groups at specific times, or averaged over time. It can also involve comparisons of times within a group. These are objectives common to any factorial experiment. The important feature of repeated measures experiments that requires special attention in data analysis is the correlation pattern among the responses on the same individual (animal) over time.

2. METHODS FOR ANALYZING REPEATED MEASURES

Responses measured on the same animal are correlated because they contain a common contribution from the animal. Moreover, measures on the same animal close in time tend to be more highly correlated than measures far apart in time. Also, variances of repeated measures often change with time. These potential patterns of correlation and variation may combine to produce a complicated covariance structure of repeated measures.
Special methods of statistical analysis are needed for repeated measures data because of the covariance structure. Standard regression and analysis of variance methods may produce invalid results because they require mathematical assumptions that do not hold with repeated measures data. In repeated measures analysis of variance, the effects of interest are:
i) between-subject effects, such as GROUP,
ii) within-subject effects, such as TIME, and
iii) interactions between the two types of effects, such as GROUP*TIME.

Several statistical methods are used for analyzing repeated measures data. Here we present methods, from basic to sophisticated, for the analysis of repeated measures data using SAS software. These include:
i) Separate analyses at each time point,
ii) Univariate analysis of variance,
iii) Univariate and multivariate analyses of time variables, and
iv) Mixed model methodology.

Separate analyses at each time point do not require special methods for repeated measures and do not directly address the objectives of examining and comparing trends over time. The other three approaches require special methodology and software. Development of statistical methods for repeated measures data has been an active area of research in the past two decades because of advancements in computing hardware and software. Enhancements in the SAS System reflect the advancements in methodology and hardware.

In the SAS System, the GLM procedure enabled users to perform univariate analysis of variance but did not provide valid standard errors for most estimates. Moreover, conclusions derived from univariate analysis of variance are often invalid because the methodology does not adequately address the covariance structure of repeated measures. The REPEATED statement is now available in both the GLM procedure and the MIXED procedure. PROC GLM provides both univariate and multivariate tests for repeated measures for one response. Another approach to the analysis of repeated measures is via general mixed models. This approach can handle balanced as well as unbalanced or missing within-subject data, and it offers more options for modeling the within-subject covariance. The main drawback of the mixed models approach is that it generally requires iteration and thus may be less computationally efficient.

The results provided by the REPEATED statement in PROC GLM are based on univariate and multivariate analyses of contrast variables computed from the repeated measures variables. This approach basically bypassed the problems of covariance structure rather than addressing them directly. The REPEATED statement enabled users to obtain statistical tests for effects involving time trends. However, the tests were inefficient and the problem of incorrect standard errors remained. In addition, missing data on even one measure of an animal caused all the data for that animal to be ignored.
The MIXED procedure provided the capabilities of mixed model methodology for the analysis of repeated measures data. Use of mixed model methodology enabled the user to address the covariance structure directly and greatly enhanced the user's ability to analyze repeated measures data by providing valid standard errors and efficient statistical tests.

Here we illustrate the univariate and multivariate methods of analysis and their respective advantages and shortcomings. The statistical analysis methods illustrated focus on group (sex) comparisons at specific times, group comparisons averaged over times, and changes over time in specific groups. Differences between groups (male and female) are computed at individual times and averaged across times. Separate analyses at each time and the GLM REPEATED statement require the data to be organized in multivariate mode. That is, there is one row per experimental unit in the data set, and the measurements at each time are considered separate response variables. The univariate ANOVA and the MIXED procedure require that the data be organized in univariate mode, that is, one row per experimental unit at each time.

We use data on body weight (kg) of male and female pigs. The body weights were recorded at 4-week intervals from birth to 20 weeks of age and are given in Table 1. Here sex has two levels.
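The difference between the two data layouts can be illustrated with a small sketch. It is written in Python rather than SAS purely for illustration, uses the first two animals of Table 1, and the names `wide_rows` and `to_univariate` are hypothetical helpers, not part of any SAS procedure.

```python
# Two animals from Table 1 in multivariate mode: one row per animal,
# with the six weights (weeks 0, 4, ..., 20) held as separate variables.
WEEKS = [0, 4, 8, 12, 16, 20]
wide_rows = [
    {"an": 1, "sex": "Male", "weights": [1.0, 4.8, 12.6, 16.0, 21.0, 1.6]},
    {"an": 2, "sex": "Male", "weights": [1.0, 4.2, 7.0, 10.0, 14.0, 22.0]},
]


def to_univariate(rows, weeks):
    """Return one record per animal per week (univariate mode)."""
    long_rows = []
    for row in rows:
        for wk, wt in zip(weeks, row["weights"]):
            long_rows.append(
                {"sex": row["sex"], "an": row["an"], "wk": wk, "wt": wt}
            )
    return long_rows


long_rows = to_univariate(wide_rows, WEEKS)
for rec in long_rows[:3]:
    print(rec)
```

Each record in `long_rows` carries the variables sex, an, wk and wt, which is the layout that the univariate ANOVA and the MIXED procedure expect.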

Table 1: Body weights (kg) of pigs maintained at Jabalpur during 1988-89

Anim No.  Sex     Week 0  Week 4  Week 8  Week 12  Week 16  Week 20
1         Male    1       4.8     12.6    16       21       1.6
2         Male    1       4.2     7       10       14       22
3         Male    0.8     4       6       6.4      10       15
4         Male    0.8     4       6       9        13       21
5         Male    0.8     5       9.4     11       14       23
6         Male    0.8     3.2     7       10       15       22
7         Male    0.8     3.2     5.5     7.4      12       17
8         Male    0.8     3.4     7       8.7      12.4     19.2
9         Female  1       5.4     10      13       17.4     26.4
10        Female  1.2     4.8     12.6    16       20       21
11        Female  1       4.6     13      18       22       24
12        Female  0.8     4.2     8       11       13       18
13        Female  0.8     3.8     7       7.2      12       19
14        Female  1       5.4     11      14       19       22
15        Female  1       6       5.4     10       17       26.8
16        Female  1       3.4     7.8     10       13       17.8

The analysis of these data by the different methods, with the use of software, is given below.

I) Analysis at Individual Time Points

Analysis of data at each time point examines group effects separately at individual observation times and makes no statistical comparisons among times. This analysis can be carried out even in Microsoft Excel, which is easily available. We create a worksheet in which the columns are the levels of the group factor and then use the Anova: Single Factor command under Data Analysis in the Tools menu. This process is repeated for each time point.
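The single-factor ANOVA performed at one time point can be sketched as follows. This is an illustrative Python computation, not SAS or Excel output; it uses the week-4 weights from Table 1, and the function name `one_way_anova` is a hypothetical helper.

```python
# Week-4 body weights (kg) from Table 1, grouped by sex.
male = [4.8, 4.2, 4.0, 4.0, 5.0, 3.2, 3.2, 3.4]
female = [5.4, 4.8, 4.6, 4.2, 3.8, 5.4, 6.0, 3.4]


def one_way_anova(groups):
    """Return (F, df_between, df_within, ss_between, ss_within)."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    # Between-group sum of squares: group size times squared deviation
    # of the group mean from the grand mean.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    # Within-group sum of squares: deviations from each group's own mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within, ss_between, ss_within


f_stat, df_b, df_w, ss_b, ss_w = one_way_anova([male, female])
print(f"F({df_b}, {df_w}) = {f_stat:.3f}")
```

Repeating the call with the weights of each of the other weeks gives the separate analyses, one per time point, with no comparison among times.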

In SAS, the data are organized in multivariate mode. The statements to obtain analyses at each time point are:

    DATA BW1;
    INPUT SEX T1-T6;   /* use SEX $ if sex is entered as text */
    CARDS;
    <data lines, one row per animal>
    ;
    PROC GLM;
    CLASS SEX;
    MODEL T1-T6 = SEX/SS3;
    MEANS SEX/LSD;
    ESTIMATE 'GP 1 - GP 2' SEX 1 -1;
    RUN;

No inference is drawn about trends over time, so this method is not truly a repeated measures analysis. Analysis at each time point is usually used at a preliminary stage of data analysis and is not a preferred method because it does not address time effects. The only advantage of this method is that, if no statistical software is available, the data can be analyzed in Microsoft Excel.

II) Univariate ANOVA when the data follow a trend

Some repeated measures data, such as growth and lactation data, follow a trend. Such data can be analyzed by fitting an appropriate curve (linear, quadratic, etc.) to the data of each animal. This gives a set of parameter estimates for each animal, and these estimates are then analyzed to determine the effects of the factors. Such an analysis is easily carried out in the SAS System. The drawback of this method is that the parameter estimates being analyzed may not be normally distributed.

III) Univariate Analysis of Variance Using the GLM Procedure

Univariate analysis of variance (ANOVA) is the method most commonly applied to repeated measures data that makes comparisons between times. It treats the data as if they were from a split-plot design with the animals as whole-plot units and animals at particular times as subplot units. This approach is also referred to as a split-plot-in-time analysis. If measurements have equal variance at all times, and if pairs of measurements on the same animal are equally correlated regardless of the time lag between the measurements, then the univariate ANOVA is valid from a statistical point of view and, in fact, yields an optimal method of analysis. However, measurements close in time are often more highly correlated than measures far apart in time, which invalidates tests for effects involving time. For this procedure the data must be set out in univariate mode. The SAS code using PROC GLM for this analysis is given below:

    DATA BW2;
    INPUT sex an wk wt;   /* use sex $ if sex is entered as text */
    CARDS;
    <data lines, one row per animal per week>
    ;
    PROC GLM;
    CLASS sex an wk;
    MODEL wt = sex an(sex) wk sex*wk;
    RANDOM an(sex)/TEST;
    LSMEANS sex/STDERR E = an(sex);
    LSMEANS sex*wk/PDIFF;
    RUN;

The MODEL statement specifies sources of variation for the ANOVA.
The RANDOM statement produces a table of expected mean squares which, in a true split-plot experiment, can be used to determine appropriate denominators of F-statistics for all terms in the MODEL statement. These tests are produced by the TEST option at the end of the RANDOM statement. In this case, the test statistic for SEX is F = MS[SEX]/MS[AN(SEX)]. Tests for effects of WK and SEX*WK use F-statistics with MS[ERROR] as the denominator mean square. The first LSMEANS statement computes means for each sex, averaged over weeks, with standard errors. The second LSMEANS statement computes means for combinations of sex and week and, through the PDIFF option, tests of their pairwise differences.

In addition to the potential problems of statistical validity with univariate ANOVA of repeated measures, there are shortcomings in the capabilities of the GLM procedure. The LSMEANS statement in PROC GLM does not compute correct standard errors for the SEX*WK means, even if the correlation structure of the repeated measures is not a problem, that is, even if variances are equal and correlations are equal. Also, comparisons of LSMEANS between sexes at specific weeks are not valid because the standard errors of the differences are calculated incorrectly. Moreover, the model is that of a split-plot design, but the sub-plot factor (time) is not randomly assigned to the sub-plots.

IV) Analysis of Contrast Variables Using the GLM REPEATED Statement

Contrast variables in repeated measures data are linear combinations of the responses over time for individual animals. A familiar example from basic statistical methodology is given by the orthogonal polynomials (Snedecor and Cochran, 1980), which represent linear, quadratic, cubic, etc., trends over time. Another example is the set of differences between responses at consecutive time points, that is, changes from time 1 to time 2, time 2 to time 3, and so forth. A set of contrast variables can be used to analyze trends over time and to make comparisons between times in repeated measures data. The original repeated measures data for each animal are transformed into a new set of variables given by a set of contrast variables. Then multivariate and univariate analyses can be applied to these new variables. This provides a method for analyzing repeated measures data that evades some of the covariance structure problems that invalidate univariate ANOVA, as discussed in the previous section.

The REPEATED statement in PROC GLM provides automatic computation and analysis for several common choices of contrast variables. Data must be in multivariate mode for use of the GLM REPEATED statement. The SAS statements are:

    DATA BW1;
    INPUT sex t1-t6;   /* use sex $ if sex is entered as text */
    CARDS;
    <data lines, one row per animal>
    ;
    PROC GLM;
    CLASS sex;
    MODEL t1-t6 = sex/SS3;
    REPEATED time 6 CONTRAST(1);
    TITLE 'repeated measures analysis using REPEATED Statement';
    RUN;

Note that TIME is not a variable in the SAS data set BW1. Rather, it is only a name attached to the set of contrasts to be analyzed.
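To make the idea of contrast variables concrete, the sketch below computes three kinds of contrasts for a single animal (animal 2 of Table 1). It is written in Python for illustration only and does not reproduce PROC GLM output. The function names are hypothetical, and the reading of CONTRAST(1) as "each later time minus the first time" is an assumption based on the usual meaning of that transformation.

```python
# Weights (kg) of animal 2 in Table 1 at weeks 0, 4, 8, 12, 16, 20.
weights = [1.0, 4.2, 7.0, 10.0, 14.0, 22.0]

# Linear orthogonal polynomial coefficients for six equally spaced times.
LINEAR_6 = [-5, -3, -1, 1, 3, 5]


def contrasts_vs_first(y):
    """Difference between each later time and the first time."""
    return [v - y[0] for v in y[1:]]


def successive_differences(y):
    """Change from each time point to the next."""
    return [b - a for a, b in zip(y, y[1:])]


def apply_contrast(coeffs, y):
    """Value of the linear combination sum(c_k * y_k)."""
    return sum(c * v for c, v in zip(coeffs, y))


vs_first = contrasts_vs_first(weights)
successive = successive_differences(weights)
linear_trend = apply_contrast(LINEAR_6, weights)

print("vs first time:", [round(v, 2) for v in vs_first])
print("successive   :", [round(v, 2) for v in successive])
print("linear trend :", round(linear_trend, 2))
```

Six repeated measures yield five contrast variables of each type; it is these derived variables, one set per animal, that are then analyzed by sex.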
The REPEATED statement produces results from several statistical methods to obtain tests for effects involving week. If there were the same number of animals per group and no missing data on any animal, then all four multivariate tests would give identical results. If all animals had complete data, the univariate ANOVA results would agree exactly with those given in Section III.

A mean-type contrast variable compares the response at one time with the mean of the responses at the other five times. For example, wk1 = t1 - (t2 + ... + t6)/5 is the difference between the response t1 at week 0 and the mean of the responses t2 at week 4 through t6 at week 20. Likewise, wk2 = t2 - (t1 + t3 + ... + t6)/5, and so forth. Contrasts of this form are requested with the MEAN transformation of the REPEATED statement, whereas CONTRAST(1), as used in the statements above, compares the response at each later time with the response at the first time (week 0). With the SUMMARY option added to the REPEATED statement, PROC GLM prints an ANOVA for each of the contrast variables.

V) Mixed Model Analysis Using the MIXED Procedure

As noted above, analysis of repeated measures data requires special attention to the covariance structure because of the sequential nature of the data on each animal. The procedures discussed previously either avoid the issue (analysis of contrast variables) or ignore it (univariate analysis of variance). Ignoring the covariance issues may result in incorrect conclusions from the statistical analysis. Avoiding the issues may result in inefficient analyses, which is tantamount to wasting data. The general linear mixed model makes it possible to address the issue directly by modeling the covariance structure. This capability is implemented in the MIXED procedure of the SAS System.

There are two basic steps in performing a repeated measures analysis using mixed model methodology. The first step is to model the covariance structure. The second step is to analyze time trends for groups by estimating and comparing means. Measures on different animals are independent, so the covariance concern is only with measures on the same animal. The covariance structure refers to variances at individual times and to correlation between measures at different times on the same animal. There are basically two aspects of the correlation. First, two measures on the same animal are correlated simply because they share common contributions from the animal. This is due to variation between animals. Second, measures on the same animal close in time are often more highly correlated than measures far apart in time. This is covariation within animals. Usually, when using PROC MIXED, the variation between animals is specified by the RANDOM statement, and covariation within animals is specified by the REPEATED statement.

Numerous structures are available as options on the REPEATED and RANDOM statements in the MIXED procedure. Three different structures will be shown here and one will be chosen as best among the three. First, a structure known as compound symmetry (CS) will be fitted. This structure specifies that measures at all times have the same variance, and that all pairs of measures on the same animal have the same correlation. The implication is that the only source of covariance between repeated measures is the animal contribution, irrespective of proximity in time.
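The compound symmetry structure can be written out explicitly. The sketch below is a Python illustration, not PROC MIXED output; the variance components 4.0 (between animals) and 2.0 (residual) are hypothetical values chosen only to show the pattern, and the function names are illustrative.

```python
import math


def compound_symmetry(n_times, var_animal, var_error):
    """Covariance matrix with var_animal + var_error on the diagonal
    and var_animal in every off-diagonal position."""
    return [
        [var_animal + (var_error if i == j else 0.0) for j in range(n_times)]
        for i in range(n_times)
    ]


def to_correlation(cov):
    """Convert a covariance matrix to a correlation matrix."""
    n = len(cov)
    return [
        [cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical variance components, for illustration only.
cov = compound_symmetry(6, var_animal=4.0, var_error=2.0)
corr = to_correlation(cov)

print("variance at every time:", cov[0][0])
print("correlation, adjacent times:", round(corr[0][1], 4))
print("correlation, first vs last :", round(corr[0][5], 4))
```

The correlation between weeks 0 and 4 equals that between weeks 0 and 20, which is exactly the assumption that growth data often violate.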
If this structure holds, then the univariate ANOVA of Section III would have valid tests, although the standard errors and tests of LSMEANS from the PROC GLM statements given there would not necessarily be valid. The compound symmetric structure can be fitted in two ways with PROC MIXED. One way is with the RANDOM statement:

    DATA BW2;
    INPUT sex an wk wt;   /* use sex $ if sex is entered as text */
    CARDS;
    <data lines, one row per animal per week>
    ;
    PROC MIXED;
    CLASS sex an wk;
    MODEL wt = sex wk sex*wk;   /* fixed effects only */
    RANDOM an(sex);
    RUN;

This RANDOM statement specifies that there is a contribution common to all measures on the same animal, which results in equal variances at all times and equal correlations between all pairs of times. Only fixed effects are included in the PROC MIXED MODEL statement, so an(sex) appears in the RANDOM statement and not in the MODEL statement. Statements for fitting the compound symmetric structure with the REPEATED statement are:

    DATA BW2;
    INPUT sex an wk wt;
    CARDS;
    <data lines, one row per animal per week>
    ;
    PROC MIXED;
    CLASS sex an wk;
    MODEL wt = sex wk sex*wk;
    REPEATED wk / SUB=an(sex) TYPE=CS R RCORR;
    RUN;

Here, the REPEATED statement indicates via SUB=an(sex) that data on the same animal are correlated. All animals are assumed to have the same covariance matrix, although heterogeneity of variances between animals can be accommodated by the MIXED procedure.

Second, a general structure will be fitted. As an option in PROC MIXED, this is indicated as UN, for unstructured. This structure makes no assumptions regarding equal variances or correlations. Statements for fitting this structure with the REPEATED statement are:

    DATA BW2;
    INPUT sex an wk wt;
    CARDS;
    <data lines, one row per animal per week>
    ;
    PROC MIXED;
    CLASS sex an wk;
    MODEL wt = sex wk sex*wk;
    REPEATED wk / SUB=an(sex) TYPE=UN R RCORR;
    RUN;

Again, no RANDOM statement is used because inter-animal variance is absorbed into the general structure. As before, all animals are assumed to have the same covariance matrix.

Third, a combination structure will be fitted: a first-order autoregressive structure within animals, TYPE=AR(1), together with a random effect for animals. This combination structure specifies an inter-animal random effect of differences between animals, and a correlation structure within animals that decreases with increasing lag between measures. The PROC MIXED statements, using both the RANDOM and REPEATED statements, are given below:

    DATA BW2;
    INPUT sex an wk wt;
    CARDS;
    <data lines, one row per animal per week>
    ;
    PROC MIXED;
    CLASS sex an wk;
    MODEL wt = sex wk sex*wk;
    RANDOM an(sex);
    REPEATED wk / SUB=an(sex) TYPE=AR(1);
    RUN;
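The correlation pattern implied by combining a random animal effect with a TYPE=AR(1) within-animal structure can be sketched numerically. The following Python illustration assumes the standard form of this model, in which the covariance between two measures on the same animal that are `lag` occasions apart is var_animal + var_within * rho**lag. The parameter values are hypothetical, not estimates from the pig data.

```python
def corr_ar1_plus_random(lag, var_animal, var_within, rho):
    """Correlation between two measures on the same animal that are
    `lag` measurement occasions apart, under a random animal effect
    plus an AR(1) within-animal structure."""
    covariance = var_animal + var_within * rho ** lag
    variance = var_animal + var_within
    return covariance / variance


# Hypothetical parameter values, for illustration only.
VAR_ANIMAL, VAR_WITHIN, RHO = 1.0, 3.0, 0.5

# Lags 0..5 correspond to separations of 0, 4, ..., 20 weeks.
correlations = [
    corr_ar1_plus_random(lag, VAR_ANIMAL, VAR_WITHIN, RHO) for lag in range(6)
]
for lag, r in enumerate(correlations):
    print(f"lag {lag} ({4 * lag:2d} weeks): correlation = {r:.4f}")
```

Unlike compound symmetry, the correlation falls as the lag grows, and it levels off at var_animal / (var_animal + var_within) rather than at zero because of the shared animal effect.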

Implications

Computer software is currently available that enables researchers to analyze repeated measures data using mixed model methodology. This methodology provides more valid and efficient statistical analyses of repeated measures. Implementation of this methodology requires the data analyst to model the variance and correlation structure of the data as a first step. Then, comparisons of groups and trends over time can be analyzed.

REFERENCES

Damon, R. A., and W. R. Harvey (1987). Experimental Design, ANOVA, and Regression. p 320. Harper and Row, New York.
SAS (1989). SAS/STAT User's Guide (Version 6, 4th Ed.). SAS Inst. Inc., Cary, NC.
SAS (1996). SAS/STAT Software: Changes and Enhancements through Release 6.11. SAS Inst. Inc., Cary, NC.
Snedecor, G. W., and W. G. Cochran (1980). Statistical Methods (7th Ed.). Iowa State University Press, Ames.
Searle, S. R. (1971). Linear Models. John Wiley & Sons, New York.
