ABSTRACT: Growth curve analysis is an important issue for many agricultural and laboratory species, for both phenotypic and genetic studies. The aim of this paper is to present the use of a novel statistical approach, namely the structured antedependence (SAD) models, to deal with this issue. The basic idea of these models is that an observation at time t can be explained by the previous observations. These models are especially appropriate to deal with cumulative traits such as growth, as BW at age t clearly depends on BW measures at ages (t − 1), (t − 2), etc. These models were applied on an INRA experimental Charolais herd data set. The data comprised BW records for 560 cows born over an 11-yr period (from 1988 to 1998) from 60 sires and 369 dams. The proposed SAD models were compared with the well-known random regression (RR) models that are already widely used in various areas of longitudinal data analysis. It was found that the SAD models fit the growth process better with far fewer parameters than the RR models (9 instead of 16 covariance parameters for the phenotypic analysis, and 14 instead of 21 for the genetic analysis). Despite this smaller number of covariance parameters, the likelihood value was found to be much higher with the SAD vs. the RR models, with a difference of 262.9 for the phenotypic analysis with a quartic polynomial for the RR and 751.5 for the genetic analysis with a cubic polynomial for both the genetic and environmental parts of the RR model. The SAD models also proved to be better able to interpolate missing values. Heritability, genetic, and environmental correlation coefficients were estimated for weights from birth to adulthood. The structured antedependence models proved, in this study, to be very appropriate to model growth data in a parsimonious and flexible way.
Key Words: Beef Cattle, Genetic Analysis, Growth Curves, Random Regression Models,
Structured Antedependence Models
©2004 American Society of Animal Science. All rights reserved. J. Anim. Sci. 2004. 82:3465–3473
3466 Jaffrezic et al.
The aim of this paper was to investigate the use of these SAD models for growth curve analysis and apply them to the genetic analysis of an experimental beef cattle data set.

Materials and Methods

Structured Antedependence Models

Let $Y_i(t_j)$ be the observed body weight at age $t_j$ for animal $i$. It is assumed that it can be decomposed into genetic and environmental components, as follows:

$Y_i(t_j) = \mu(t_j) + g_i(t_j) + e_i(t_j)$ [1]

where $\mu(t_j)$ corresponds to the fixed effects and includes the mean curve in the population. Functions $g_i(t_j)$ and $e_i(t_j)$ are Gaussian processes, which are independent of one another, have an expected value of 0 at each age, and covariance functions $G(t_j, t_k)$ and $E(t_j, t_k)$, respectively, for two ages $t_j$ and $t_k$. They represent the age-dependent genetic and environmental deviations, respectively, for animal $i$. It is assumed that $J$ measurement times are available with $t_1 < t_2 < \ldots < t_J$. These ages are assumed to be the same for all the individuals; however, the model is able to deal with missing values.

The idea of antedependence models, as originally proposed by Gabriel (1962), is that an observation at time $t_j$ can be explained by the previous observations. An antedependence structure of order $r$ is defined by the fact that the $k$th observation ($k > r$) given the $r$ preceding ones is independent of all further preceding observations (Gabriel, 1962). Generalizing this concept to genetic analysis, a second-order structured antedependence model for the genetic part $g_i(t_j)$, for animal $i$, can be written as:

$g_i(t_0) = \varepsilon_{g_i}(t_0)$ [2]

$g_i(t_1) = \phi_{11}\, g_i(t_0) + \varepsilon_{g_i}(t_1)$ [3]

$g_i(t_j) = \phi_{1j}\, g_i(t_{j-1}) + \phi_{2j}\, g_i(t_{j-2}) + \varepsilon_{g_i}(t_j)$ [4]

for $j \geq 2$.

Parameters $\phi_{1j}$ and $\phi_{2j}$ are termed antedependence parameters. As originally proposed by Gabriel (1962), these parameters were estimated for each time $j$. Therefore, if $J$ measurement times were considered, $(J - 1)$ parameters $\phi_{1j}$ and $(J - 2)$ parameters $\phi_{2j}$ had to be estimated. If the number of measurement times is large and the order of antedependence high, that will lead to a very large number of parameters to be estimated. Therefore, Núñez-Antón and Zimmerman (2000) proposed in their SAD models to decrease this number of parameters by fitting a parametric function of the lag time for these antedependence terms. For example, when using an exponential function, for $j = 1, \ldots, J$:

$\phi_{1j}(t_j, t_{j-1}) = \exp(\lambda_1 [t_j - t_{j-1}])$ and $\phi_{2j}(t_j, t_{j-2}) = \exp(\lambda_2 [t_j - t_{j-2}])$

A further advantage of modeling the antedependence parameters with a continuous function of the lag time is that it allows dealing with unequally spaced data, as well as unbalanced data. It is therefore not required that the animals be weighed at regular time intervals or that the times of measurement are the same for all the animals. To be able to find an appropriate parametric curve, the time spacing should not be completely random. The model can, for example, easily deal with smaller time intervals at earlier rather than late ages.

Although antedependence models are based on an idea similar to autoregressive models, due to the initial condition $g_i(t_0) = \varepsilon_{g_i}(t_0)$, there is no constraint on the antedependence parameters $\phi_{1j}$ and $\phi_{2j}$, whereas they have to be between $-1$ and $1$ for autoregressive models. Therefore, any parametric function can be used to model these parameters, depending on the biological process studied.

Parameters $\varepsilon_{g_i}(t_j)$ are assumed normally distributed, with mean 0 and variance $\sigma^2_{\varepsilon_g}(t_j)$ that are termed innovation variances. Gabriel (1962) originally suggested estimating one innovation variance for each time of measurement. This again leads to a quite large number of parameters when many different times of measurement are analyzed. In their structured antedependence models, Núñez-Antón and Zimmerman (2000) again suggested using a continuous function of time to model these innovation variances, such as polynomials. For a quadratic polynomial, for instance:

$\ln \sigma^2_{\varepsilon_g}(t_j) = a + b\, t_j + c\, t_j^2$ [5]

In this case, therefore, only three parameters will have to be estimated, regardless of the number of times of measurement $J$. This parametric modeling of the error variances can also be related to the structural model proposed by Foulley and Quaas (1995). The use of a polynomial function to model the variances might, as with the RR models, show some border effects. These border effects will, however, be far less important for the SAD models than for the RR case as the estimated covariance function of the process is not a simple function of these polynomials but rather a complex combination of the antedependence parameters and innovation variances. In practice, simple parametric functions are sufficient to adequately model these variances. When possible, it is best to fit an unstructured antedependence model (UAD) as originally proposed by Gabriel (1962), which will give a nonparametric estimation of the antedependence parameters and innovation variances and will allow one to choose the most appropriate parametric functions of time.

To relate this modeling to the genetic covariance function $G(t_j, t_k)$, for two ages $t_j$ and $t_k$, Pourahmadi (1999) showed that a Cholesky decomposition of the inverse of the covariance matrix can be easily calculated using the antedependence parameters and innovation variances. If there are $J$ measurement times, let $G$ be the
Genetic analysis of growth curves 3467
genetic covariance matrix, of dimension $J \times J$. It can be shown that:

$G^{-1} = L' D^{-1} L$ [6]

where $L$ is a lower triangular matrix with 1 on the diagonal and negatives of the antedependence coefficients $\phi_{rj}$ ($r = 1, \ldots, R$ for SAD($R$), $j = 1, \ldots, J$) below the diagonal entries, and $D$ is a diagonal matrix with innovation variances $\sigma^2_j$ ($j = 1, \ldots, J$) as components. An interesting computational property is that the inverse $G^{-1}$ is sparse. Indeed, for a second-order antedependence model, for instance, only the first two subdiagonals are nonzero. Antedependence and innovation variance parameters can be estimated by restricted maximum likelihood procedures. In the following example, these parameters were estimated using the OWN function of ASREML (Gilmour et al., 2002), which requires specification of the covariance matrix and derivatives with respect to each of the parameters. As presently implemented, it is necessary to build the $J \times J$ covariance matrix, which will be the same for all the animals. The $J$ times of measurement are therefore assumed to be the same for all the individuals, and should not be too large (20 at most). It is not required, however, that these ages should be equally spaced or that the animals have measurements at all times.

Table 1. Likelihood values (Log L) and Bayesian information criterion (BIC) for the phenotypic analysis using different structured antedependence (SAD) models, with antedependence up to order 3, and with the antedependence parameters modeled as constant (const), linear (lin), quadratic (quad), or exponential (exp) functions of time, different random regression (RR) models based on polynomials up to order 4 (quartic), and a completely unstructured model (US) with a 10 × 10 covariance matrix

Model^a                  Parameters^b    Log L      BIC
US                       55              2,003.4    1,766.9
SAD3 quad-lin-const       9              1,615.7    1,577.0
SAD3 quad-const-const     8              1,608.3    1,573.9
SAD3 quad-lin-lin        10              1,616.9    1,573.9
SAD2 quad-lin             8              1,606.8    1,572.4
SAD2 quad-const           7              1,598.2    1,568.1
SAD1 exp                  5              1,553.1    1,531.6
SAD1 quad                 6              1,551.8    1,526.0
SAD2 const                5              1,527.0    1,505.5
SAD3 const                6              1,529.5    1,503.7
SAD1 lin                  5              1,524.6    1,503.1
SAD1 const                4              1,503.0    1,485.8
RR quartic               16              1,352.8    1,284.0
RR cubic                 11              1,063.9    1,016.6
RR quad                   7                763.7      733.6
RR linear                 4                422.8      405.6

^a Numbers following SAD indicate the order of antedependence.
^b Parameters = number of covariance parameters in the model.
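The BIC column of Table 1 can be checked against the definition used for model comparison, BIC = ln L − 0.5 n_c ln(N − p), with the reported N = 5,455 observations and p = 25 fixed effects. The following Python sketch is illustrative only and is not the authors' code; the names `bic` and `TABLE1` are hypothetical. It recomputes the BIC of each model from its number of covariance parameters and its Log L.

```python
import math

N_OBS = 5455    # total number of observations reported for the data set
N_FIXED = 25    # number of fixed effects reported for the model


def bic(log_l, n_cov, n_obs=N_OBS, n_fixed=N_FIXED):
    """BIC = ln L - 0.5 * n_c * ln(N - p); larger values are preferred."""
    return log_l - 0.5 * n_cov * math.log(n_obs - n_fixed)


# model: (number of covariance parameters, Log L, BIC as printed in Table 1)
TABLE1 = {
    "US": (55, 2003.4, 1766.9),
    "SAD3 quad-lin-const": (9, 1615.7, 1577.0),
    "SAD3 quad-const-const": (8, 1608.3, 1573.9),
    "SAD3 quad-lin-lin": (10, 1616.9, 1573.9),
    "SAD2 quad-lin": (8, 1606.8, 1572.4),
    "SAD2 quad-const": (7, 1598.2, 1568.1),
    "SAD1 exp": (5, 1553.1, 1531.6),
    "SAD1 quad": (6, 1551.8, 1526.0),
    "SAD2 const": (5, 1527.0, 1505.5),
    "SAD3 const": (6, 1529.5, 1503.7),
    "SAD1 lin": (5, 1524.6, 1503.1),
    "SAD1 const": (4, 1503.0, 1485.8),
    "RR quartic": (16, 1352.8, 1284.0),
    "RR cubic": (11, 1063.9, 1016.6),
    "RR quad": (7, 763.7, 733.6),
    "RR linear": (4, 422.8, 405.6),
}

for model, (n_cov, log_l, reported) in TABLE1.items():
    # Print the recomputed value next to the tabulated one for comparison.
    print(f"{model:<24s} recomputed {bic(log_l, n_cov):8.1f}   reported {reported:8.1f}")
```

The recomputed values agree with the tabulated BIC to the printed precision. The Log L difference between the SAD3 quad-lin-const and RR quartic rows is 262.9, the figure quoted in the abstract for the phenotypic analysis.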
Presentation of the Data

Data analyzed in this study came from an INRA experimental Charolais herd (Mialon et al., 2001). The data set comprised BW records for 560 cows born over an 11-yr period (from 1988 to 1998) from 60 sires and 369 dams. Data were collected monthly from 1998 to 2003, and only measurements at 10 different ages were considered here for each animal, at approximately 0, 112, 224, 364, 540, 720, 900, 1,260, 1,620 and 1,980 d. Although the same ages were considered for each animal, they were unequally spaced and some records were missing. The fixed effects were the year of birth of the animal, twinning effect and the dam age at calving. To obtain the most accurate fit, the mean curve in the population was modeled nonparametrically by fitting one mean at each age by including in the fixed effects $\mu(t_j)$ of Eq. [1] the variable age as a factor. This was possible because only 10 measurements were considered for each animal and the data had very few missing values. In other practical cases, however, it could be better to use parametric or semiparametric functions of age, such as spline functions.

[...] and the antedependence parameters were either considered as constant, linear, quadratic, or exponential functions of age. Random regression models based on polynomials up to order 4 (quartic) were used.

Models were compared using the likelihood values and the Bayesian information criterion (BIC; Schwarz, 1978): $\mathrm{BIC} = \ln L - 0.5\, n_c \ln(N - p)$, where $\ln L$ is the restricted maximum likelihood value, $n_c$ is the number of covariance parameters in the model, $p$ is the number of fixed effects (also equal to rank($X$)), and $N$ is the total number of observations. The number of fixed effects in the model was equal to 25 and the total number of observations was 5,455.

This analysis was followed by a genetic study. An animal model was used and the pedigree file comprised 807 animals. Several SAD and RR models were compared for both the genetic and permanent environmental parts. As before, models were compared using the likelihood values and the BIC criterion. The estimated genetic parameters obtained at different ages for the chosen models are presented in the next section.

Results
Figure 2. Estimated phenotypic correlation functions obtained with the unstructured (US) model and the structured
antedependent (SAD) and random regression (RR) models chosen according to the Bayesian information criterion
(BIC) values given in Table 1.
animal $i$ at age 112 d, $\hat{y}_i$ is the predicted value obtained either with the SAD or RR model, and $\bar{y}$ and $\bar{\hat{y}}$ are the averages of the observed and predicted values, respectively. This $r_c$ coefficient has values between $-1$ and $1$, with a perfect fit at 1 and a lack of fit for negative values.

When calculating this coefficient for the interpolation of the 40% missing values at age 112 d, it was found that the prediction ability was much better with the SAD than with the RR model. In fact, the concordance coefficient was found to be equal to 0.83 for the SAD model for the predicted vs. actual observed BW values, and was 0.44 for the RR model.

The differences in the interpolation abilities were less important for other ages; for example, $r_c$ = 0.69 at 224 d of age for the SAD model compared with 0.58 for the RR model, or $r_c$ = 0.74 at 540 d of age for the SAD model compared with 0.70 for the RR model.

Finally, the $r_c$ coefficient was equal to 0.74 with the SAD model compared with 0.69 with the RR model for the latest age (1,980 d), showing a slightly better extrapolation ability of the SAD model. Figure 5 illustrates this prediction ability for an animal with a classical growth curve. In this particular example, the phenotypic prediction obtained with the RR model was decreasing, whereas the BW for this animal would be expected to remain stable, as predicted by the SAD model, or to keep increasing slightly, as actually observed.

Genetic Analysis

Model comparison results for the genetic analysis are presented in Table 2. As for the phenotypic analysis, it was found that the simplest first-order SAD model, with a constant antedependence parameter, had a higher likelihood value than a cubic RR model, while requiring far fewer parameters for the covariance structure (eight instead of 21 for the RR model). The best-fitting SAD model regarding the likelihood value and the BIC criterion was a second-order antedependence, with a quadratic first-order antedependence parameter, a constant second-order antedependence parameter, and quadratic innovation variances for both the genetic and environmental parts.
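To make the structure of such a model concrete, the following Python sketch builds the covariance matrix implied by a second-order antedependence recursion with a quadratic first-order parameter, a constant second-order parameter, and log-quadratic innovation variances. It also forms the inverse as L'D⁻¹L, as in Eq. [6]. All numerical coefficients and the unit-spaced time grid are hypothetical, chosen only for illustration; they are not the estimates from this study. The function names are likewise assumptions, not part of ASREML or of the authors' implementation.

```python
import math


def sad2_factors(times, phi1, phi2, innov_var):
    """Return (L, D): unit lower-triangular L holding the negated
    antedependence parameters, and the list D of innovation variances."""
    n = len(times)
    L = [[0.0] * n for _ in range(n)]
    for j, t in enumerate(times):
        L[j][j] = 1.0
        if j >= 1:
            L[j][j - 1] = -phi1(t)   # first-order term at time t_j
        if j >= 2:
            L[j][j - 2] = -phi2(t)   # second-order term at time t_j
    D = [innov_var(t) for t in times]
    return L, D


def precision_from_factors(L, D):
    """Inverse covariance matrix computed as L' D^{-1} L."""
    n = len(D)
    return [[sum(L[k][r] * L[k][c] / D[k] for k in range(n))
             for c in range(n)] for r in range(n)]


def covariance_from_recursion(times, phi1, phi2, innov_var):
    """Covariance matrix obtained by unrolling the recursion: each g(t_j)
    is written as a linear combination of independent innovations."""
    n = len(times)
    coef = []  # coef[j][k] = weight of innovation k in g(t_j)
    for j, t in enumerate(times):
        row = [0.0] * n
        row[j] = 1.0
        if j >= 1:
            row = [x + phi1(t) * y for x, y in zip(row, coef[j - 1])]
        if j >= 2:
            row = [x + phi2(t) * y for x, y in zip(row, coef[j - 2])]
        coef.append(row)
    D = [innov_var(t) for t in times]
    return [[sum(coef[r][k] * coef[c][k] * D[k] for k in range(n))
             for c in range(n)] for r in range(n)]


# Hypothetical parameter functions (illustrative values only).
def phi1(t):
    return 1.1 - 0.1 * t + 0.01 * t * t          # quadratic in time


def phi2(t):
    return -0.3                                   # constant


def innov_var(t):
    return math.exp(0.5 + 0.3 * t - 0.02 * t * t)  # log-quadratic, as in Eq. [5]


times = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
L, D = sad2_factors(times, phi1, phi2, innov_var)
G_inv = precision_from_factors(L, D)
G = covariance_from_recursion(times, phi1, phi2, innov_var)

n = len(times)
# Largest deviation of G * G_inv from the identity matrix.
max_err = max(
    abs(sum(G[r][k] * G_inv[k][c] for k in range(n)) - (1.0 if r == c else 0.0))
    for r in range(n) for c in range(n)
)
print("max |G * G_inv - I| =", max_err)
```

Multiplying G by the computed inverse returns the identity up to rounding, and every entry of the inverse more than two positions away from the diagonal is exactly zero, which is the sparsity property noted for Eq. [6].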
For the genetic part, the chosen model was as follows (for any animal $i$, at age $t$):

$y_{it} = \phi_{1t}\, y_{i(t-1)} + \phi_2\, y_{i(t-2)} + e_{it}$ [9]

with parameter estimates equal to $\phi_{1t} = 1.29 - 0.11\,t + 0.01\,t^2$, $\phi_2 = 0.34$, and $\mathrm{Var}(e_t) = 15.8 + 33.8\,t - 1.1\,t^2$.

For the environmental part, the chosen model was as follows:
Figure 6. Estimated genetic correlation functions obtained with the structured antedependent (SAD) and random regression (RR) models chosen according to the Bayesian information criterion (BIC) values given in Table 2.

Figure 7. Estimated environmental correlation functions obtained with the structured antedependent (SAD) and random regression (RR) models chosen according to the Bayesian information criterion (BIC) values given in Table 2.