I. INTRODUCTION
Linear mixed models, also called variance components models, have been used effectively and extensively by statisticians to analyze data with a univariate response. Reference [12] discussed a latent variable model for mixed ordinal (or discrete) and continuous outcomes and applied it to birth defects data. Reference [16] showed that maximum likelihood estimation of variance components from twin data can be parameterized in the framework of linear mixed models. Specialized variance component estimation software that can handle pedigree data and user-defined covariance structures can be used to analyze multivariate data for simple and complex models with a large number of random effects. Reference [2] showed that linear mixed models (LMM) can handle data in which the observations are not independent, i.e. they can be used to model data with correlated errors. Several technical terms are used for predictor variables in linear mixed models: (i) random effects, i.e. categorical predictor variables whose observed values are not a complete enumeration but a random sample of all possible values (for example, a variable product whose values represent only 5 of a possible 42 brands); (ii) hierarchical effects, i.e. predictor variables that are measured at more than one level; and (iii) fixed effects, i.e. predictor variables whose values of interest are all represented in the data rather than sampled from a larger set of possible values.
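In practice these kinds of effects are specified directly when a linear mixed model is fitted with statistical software. As a minimal sketch (using statsmodels' MixedLM on a small made-up data set; the variables rating, price, and brand are hypothetical and only echo the brand example above), a random intercept is declared for the sampled categorical predictor while the other predictor enters as a fixed effect:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: 5 observed brands out of many possible brands, so "brand"
# is treated as a random effect; "price" is a fixed effect.
rng = np.random.default_rng(0)
brands, n_per_brand = list("ABCDE"), 20
df = pd.DataFrame({
    "brand": np.repeat(brands, n_per_brand),
    "price": rng.uniform(1.0, 10.0, n_per_brand * len(brands)),
})
brand_effect = dict(zip(brands, rng.normal(0.0, 2.0, len(brands))))
df["rating"] = (3.0 + 0.5 * df["price"]
                + df["brand"].map(brand_effect)
                + rng.normal(0.0, 1.0, len(df)))

# Random intercept for each brand, fixed slope for price.
result = smf.mixedlm("rating ~ price", df, groups=df["brand"]).fit()
print(result.summary())
```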
The univariate linear mixed model can be written as

$$y = \underbrace{X\beta}_{\text{fixed}} + \underbrace{Zu + e}_{\text{random}}, \qquad (1)$$

or, marginally,

$$y \sim N(X\beta,\ \Sigma). \qquad (2)$$

The density of $y$ is

$$f(y;\beta,\Sigma) = (2\pi)^{-n/2}\,\lvert\Sigma\rvert^{-1/2}\exp\{-(y-X\beta)^{T}\Sigma^{-1}(y-X\beta)/2\}, \qquad (3)$$

so the ln-likelihood is

$$l(\beta,\sigma) = -\tfrac{1}{2}\{n\ln(2\pi) + \ln\lvert\Sigma\rvert + (y-X\beta)^{T}\Sigma^{-1}(y-X\beta)\}. \qquad (4)$$

Setting the partial derivatives of (4) with respect to $\beta$ and the variance components $\sigma_{q} = [\mathrm{vech}(\Sigma)]_{q}$ equal to zero yields

$$\hat{\beta} = (X^{T}\Sigma^{-1}X)^{-1}X^{T}\Sigma^{-1}y, \qquad (5)$$

$$y^{T}P\,\frac{\partial\Sigma}{\partial\sigma_{q}}\,Py = \mathrm{tr}\Big(\Sigma^{-1}\frac{\partial\Sigma}{\partial\sigma_{q}}\Big);\quad q = 1, 2, \ldots, n(n+1)/2, \qquad (6)$$

where $P = \Sigma^{-1} - \Sigma^{-1}X(X^{T}\Sigma^{-1}X)^{-1}X^{T}\Sigma^{-1}$.
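As a quick numerical illustration of (5), the following NumPy sketch (with arbitrary simulated $X$, $\Sigma$, and $y$, chosen only for illustration) computes $\hat\beta$ and checks the property $PX = 0$ that the matrix $P$ inherits from its definition:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 12, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])   # design matrix
A = rng.normal(size=(n, n))
Sigma = A @ A.T + n * np.eye(n)                                   # positive-definite covariance
y = rng.normal(size=n)

Sigma_inv = np.linalg.inv(Sigma)
XtSiX = X.T @ Sigma_inv @ X

beta_hat = np.linalg.solve(XtSiX, X.T @ Sigma_inv @ y)            # equation (5)
P = Sigma_inv - Sigma_inv @ X @ np.linalg.solve(XtSiX, X.T @ Sigma_inv)

print("beta_hat :", beta_hat)
print("max |PX| :", np.abs(P @ X).max())                          # ~0: P annihilates X
```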
Equation (5) has a closed form, whereas (6) does not; therefore [4], [17] proposed three algorithms that can be used to compute the estimates $\hat{\sigma}_{q}$: the EM (expectation-maximization), Newton-Raphson (N-R), and Fisher scoring algorithms. Because the MLE is consistent and asymptotically normal with asymptotic covariance matrix equal to the inverse of the Fisher information matrix, the components of the Fisher information matrix are needed. In both univariate and multivariate linear mixed models, the regression coefficients and the covariance matrix components are collected in a single vector $\theta = (\beta^{T}, \sigma^{T})^{T}$. The Fisher information matrix then has the following components:
$$\mathrm{Var}\Big(\frac{\partial l}{\partial\beta}\Big) = -E\Big(\frac{\partial^{2}l}{\partial\beta\,\partial\beta^{T}}\Big) = X^{T}\Sigma^{-1}X,$$

$$-E\Big(\frac{\partial^{2}l}{\partial\beta\,\partial\sigma_{q}}\Big) = 0;\quad 1 \le q \le n(n+1)/2,$$

$$-E\Big(\frac{\partial^{2}l}{\partial\sigma_{q}\,\partial\sigma_{q'}}\Big) = \tfrac{1}{2}\,\mathrm{tr}\Big(\Sigma^{-1}\frac{\partial\Sigma}{\partial\sigma_{q}}\,\Sigma^{-1}\frac{\partial\Sigma}{\partial\sigma_{q'}}\Big);\quad 1 \le q, q' \le n(n+1)/2. \qquad (7)$$
Based on (5) and a result of Theorem 1 in [15], the ML iterative algorithm can be used to compute the MLE of the unknown parameters in model (3). Through the iteration process, the estimators of the variance components are obtained as the elements of $\hat{\sigma}$, so that the estimator of the covariance matrix is the symmetric matrix $\hat{\Sigma}$ satisfying

$$\mathrm{vech}(\hat{\Sigma}) = \hat{\sigma}. \qquad (8)$$
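To make the scoring iteration concrete, the sketch below works in a simplified, assumed setting with only two variance components, $\Sigma(\sigma) = \sigma_{1}ZZ^{T} + \sigma_{2}I_{n}$ (one random effect plus error), rather than the general $n(n+1)/2$ components; it repeats the Fisher scoring update $\sigma \leftarrow \sigma + \mathcal{I}^{-1}s$ using the score underlying (6) and the expected information in (7):

```python
import numpy as np

rng = np.random.default_rng(2)
n, r = 60, 5
X = np.column_stack([np.ones(n), rng.normal(size=n)])            # fixed-effects design
Z = rng.integers(0, 2, size=(n, r)).astype(float)                # random-effects design
u = rng.normal(scale=np.sqrt(2.0), size=r)
y = X @ np.array([1.0, -0.5]) + Z @ u + rng.normal(size=n)       # sigma_true = (2, 1)

dSigma = [Z @ Z.T, np.eye(n)]          # dSigma/dsigma_q for sigma = (sigma_u, sigma_e)
sigma = np.array([1.0, 1.0])           # starting values

for _ in range(20):                    # Fisher scoring iterations
    Sigma = sigma[0] * dSigma[0] + sigma[1] * dSigma[1]
    Si = np.linalg.inv(Sigma)
    P = Si - Si @ X @ np.linalg.solve(X.T @ Si @ X, X.T @ Si)
    Py = P @ y
    # Score (1/2)[y' P dSigma_q P y - tr(Sigma^{-1} dSigma_q)], cf. (6)
    score = np.array([0.5 * (Py @ dS @ Py - np.trace(Si @ dS)) for dS in dSigma])
    # Expected information (1/2) tr(Sigma^{-1} dSigma_q Sigma^{-1} dSigma_q'), cf. (7)
    info = np.array([[0.5 * np.trace(Si @ a @ Si @ b) for b in dSigma] for a in dSigma])
    sigma = np.maximum(sigma + np.linalg.solve(info, score), 1e-6)   # keep components positive

Sigma = sigma[0] * dSigma[0] + sigma[1] * dSigma[1]
Si = np.linalg.inv(Sigma)
beta_hat = np.linalg.solve(X.T @ Si @ X, X.T @ Si @ y)           # equation (5) at the estimate
print("sigma_hat:", sigma, " beta_hat:", beta_hat)
```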
For REML estimation the analysis is based on error contrasts $y_{1} = A^{T}y$, where $A$ is an $n\times(n-p)$ matrix of full column rank satisfying $A^{T}X = 0$. The density of $y_{1}$ is

$$f_{R}(y_{1}) = (2\pi)^{-(n-p)/2}\,\lvert A^{T}\Sigma A\rvert^{-1/2}\exp\{-\tfrac{1}{2}\,y_{1}^{T}(A^{T}\Sigma A)^{-1}y_{1}\}, \qquad (9)$$

so the restricted ln-likelihood is

$$l_{R}(\sigma) = -\tfrac{1}{2}\big[\ln\lvert A^{T}\Sigma A\rvert + y_{1}^{T}(A^{T}\Sigma A)^{-1}y_{1}\big]. \qquad (10)$$

Its partial derivative with respect to a variance component $\sigma_{q}$ is

$$\frac{\partial l_{R}}{\partial\sigma_{q}} = \tfrac{1}{2}\Big\{y^{T}P\,\frac{\partial\Sigma}{\partial\sigma_{q}}\,Py - \mathrm{tr}\Big(P\frac{\partial\Sigma}{\partial\sigma_{q}}\Big)\Big\}, \qquad (11)$$

and the restricted Fisher information is

$$\mathcal{I}_{R}(\sigma) = -E\Big(\frac{\partial^{2}l_{R}}{\partial\sigma\,\partial\sigma^{T}}\Big), \qquad (12)$$

with components

$$-E\Big(\frac{\partial^{2}l_{R}}{\partial\sigma_{q}\,\partial\sigma_{q'}}\Big) = \tfrac{1}{2}\,\mathrm{tr}\Big(P\frac{\partial\Sigma}{\partial\sigma_{q}}\,P\frac{\partial\Sigma}{\partial\sigma_{q'}}\Big);\quad 1 \le q, q' \le n(n+1)/2. \qquad (13)$$
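In the same simplified two-component setting used above (again an assumption made only for illustration), the REML quantities (11) and (13) differ from their ML counterparts only in that $P$ replaces $\Sigma^{-1}$ inside the trace terms; the sketch below computes one REML scoring step:

```python
import numpy as np

rng = np.random.default_rng(3)
n, r = 40, 4
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = rng.integers(0, 2, size=(n, r)).astype(float)
y = X @ np.array([0.5, 1.0]) + Z @ rng.normal(size=r) + rng.normal(size=n)

dSigma = [Z @ Z.T, np.eye(n)]                 # dSigma/dsigma_q for sigma = (sigma_u, sigma_e)
sigma = np.array([1.0, 1.0])
Sigma = sigma[0] * dSigma[0] + sigma[1] * dSigma[1]
Si = np.linalg.inv(Sigma)
P = Si - Si @ X @ np.linalg.solve(X.T @ Si @ X, X.T @ Si)
Py = P @ y

# REML score (11) and expected information (13): P replaces Sigma^{-1} in the traces.
score_R = np.array([0.5 * (Py @ dS @ Py - np.trace(P @ dS)) for dS in dSigma])
info_R = np.array([[0.5 * np.trace(P @ a @ P @ b) for b in dSigma] for a in dSigma])

sigma_new = sigma + np.linalg.solve(info_R, score_R)   # one Fisher scoring step for REML
print("one REML scoring step:", sigma_new)
```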
For $s$ response variables observed on each of $n$ subjects, the multivariate linear model is

$$Y = XB + E, \qquad (14)$$

where $Y$ is $n\times s$, $X$ is the $n\times(p+1)$ design matrix, $E$ is $n\times s$, and

$$B = \begin{pmatrix}\beta_{01} & \beta_{02} & \cdots & \beta_{0s}\\ \beta_{11} & \beta_{12} & \cdots & \beta_{1s}\\ \vdots & \vdots & \ddots & \vdots\\ \beta_{p1} & \beta_{p2} & \cdots & \beta_{ps}\end{pmatrix}, \qquad x_{i} = \begin{pmatrix}1\\ x_{i1}\\ \vdots\\ x_{ip}\end{pmatrix}, \qquad e_{i} = \begin{pmatrix}\varepsilon_{i1}\\ \varepsilon_{i2}\\ \vdots\\ \varepsilon_{is}\end{pmatrix}, \qquad (15)$$

with $x_{i}^{T}$ and $e_{i}^{T}$ denoting the $i$-th rows of $X$ and $E$, respectively.
The error vectors $e_{i}$ in (15) are random vectors assumed to follow a multivariate normal distribution with zero mean, so that $\mathrm{vec}(E^{T})$ has zero mean vector in $\mathbb{R}^{ns}$. The rows of $E$ are assumed independent, because each row corresponds to a different observation; the columns of $E$, however, are still allowed to be correlated. Thus

$$\mathrm{Cov}(e_{i}, e_{i'}) = 0,\ \ i \ne i', \qquad \mathrm{Cov}(e_{i}, e_{i}) = \Sigma, \qquad (16)$$

and, in vector form,

$$\mathrm{vec}(Y^{T}) \sim N\big((X\otimes I_{s})\,\mathrm{vec}(B^{T}),\ I_{n}\otimes\Sigma\big). \qquad (17)$$

Maximizing the likelihood based on (17), with $\Sigma$ treated as known, yields the estimator

$$\hat{B} = (X^{T}X)^{-1}X^{T}Y, \qquad (18)$$

or, equivalently,

$$\mathrm{vec}(\hat{B}^{T}) = \big((X^{T}X)^{-1}X^{T}\otimes I_{s}\big)\,\mathrm{vec}(Y^{T}), \qquad (19)$$

with

$$\mathrm{Var}\big(\mathrm{vec}(\hat{B}^{T})\big) = (X^{T}X)^{-1}\otimes\Sigma. \qquad (20)$$
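The equivalence of (18) and (19) is easy to confirm numerically; the sketch below (arbitrary simulated sizes, purely for illustration) checks that stacking the columns of $\hat{B}^{T}$ reproduces the Kronecker-product form and then builds the covariance matrix in (20):

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, s = 25, 3, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])        # n x (p+1)
B = rng.normal(size=(p + 1, s))                                   # true coefficients
Sigma = np.array([[1.0, 0.3], [0.3, 0.5]])                        # s x s error covariance
E = rng.multivariate_normal(np.zeros(s), Sigma, size=n)
Y = X @ B + E                                                     # model (14)

B_hat = np.linalg.solve(X.T @ X, X.T @ Y)                         # equation (18)
vec_lhs = B_hat.T.flatten(order="F")                              # vec(B_hat^T): stack columns
vec_rhs = np.kron(np.linalg.solve(X.T @ X, X.T), np.eye(s)) @ Y.T.flatten(order="F")  # (19)
print("max |(18) - (19)| :", np.abs(vec_lhs - vec_rhs).max())     # ~0

Var_vecB = np.kron(np.linalg.inv(X.T @ X), Sigma)                 # equation (20)
print("Var(vec(B_hat^T)) shape:", Var_vecB.shape)
```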
Now consider the multivariate linear mixed model

$$Y = XB + ZD + E, \qquad (21)$$

where $Z$ is the $n\times r$ random-effects design matrix and $D$ is the $r\times s$ matrix of random effects, with $\mathrm{vec}(D^{T})$ having zero mean and covariance matrix $\Gamma$. In vector form, $\mathrm{vec}(Y^{T})$ has mean $(X\otimes I_{s})\,\mathrm{vec}(B^{T})$ and covariance matrix

$$V = (Z\otimes I_{s})\,\Gamma\,(Z\otimes I_{s})^{T} + (I_{n}\otimes\Sigma), \qquad (22)$$

so the ln-likelihood can be written as

$$l = -\tfrac{1}{2}\big\{\ln\lvert V\rvert + [\mathrm{vec}(Y^{T}) - (X\otimes I_{s})\mathrm{vec}(B^{T})]^{T}\,V^{-1}\,[\mathrm{vec}(Y^{T}) - (X\otimes I_{s})\mathrm{vec}(B^{T})]\big\} + \text{const}. \qquad (23)$$

By taking the partial derivatives of the ln-likelihood function (23) with respect to $\mathrm{vec}(B^{T})$ and $\sigma_{q}$ and setting them equal to zero, we obtain the estimator

$$\mathrm{vec}(\hat{B}^{T}) = \big[(X\otimes I_{s})^{T}\hat{V}^{-1}(X\otimes I_{s})\big]^{-1}(X\otimes I_{s})^{T}\hat{V}^{-1}\,\mathrm{vec}(Y^{T}), \qquad (24)$$

while the variance components are estimated by nonlinear optimization, with inequality constraints imposed on $\sigma$ so that the positive definiteness requirements on the $\Gamma$ and $\Sigma$ matrices are satisfied. There is no closed-form solution for $\sigma$, so the estimate of $\sigma$ is obtained by the Fisher scoring algorithm or the Newton-Raphson algorithm. After this iterative computational process we have $\hat{V} = V(\hat{\sigma})$, i.e.

$$\hat{V} = (Z\otimes I_{s})\,\hat{\Gamma}\,(Z\otimes I_{s})^{T} + (I_{n}\otimes\hat{\Sigma}). \qquad (25)$$

Thus the estimators of both covariance components, $\hat{\Gamma}$ and $\hat{\Sigma}$, are obtained from the same iterations used to compute $\hat{V}$. The resulting estimator (24) is unbiased, with variance matrix

$$\mathrm{Var}\big(\mathrm{vec}(\hat{B}^{T})\big) = \big[(X\otimes I_{s})^{T}\hat{V}^{-1}(X\otimes I_{s})\big]^{-1}.$$
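A small numerical sketch of one cycle of this scheme is given below (all dimensions, the diagonal choice of $\hat\Gamma$ and $\hat\Sigma$, and the pretence that they are the current iterates are assumptions made only for illustration): given $\hat\Gamma$ and $\hat\Sigma$ it forms $\hat V$ as in (25), then computes $\mathrm{vec}(\hat B^{T})$ from (24) and its variance matrix:

```python
import numpy as np

rng = np.random.default_rng(5)
n, p, s, r = 30, 2, 2, 4                    # observations, predictors, responses, random effects
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])         # n x (p+1)
Z = rng.integers(0, 2, size=(n, r)).astype(float)                  # n x r

Sigma_hat = np.array([[1.0, 0.2], [0.2, 0.8]])                     # current s x s error covariance
Gamma_hat = 0.5 * np.eye(r * s)                                    # current rs x rs random-effect cov.

# Simulate data from model (21) just to have something to estimate.
B_true = rng.normal(size=(p + 1, s))
D = rng.multivariate_normal(np.zeros(r * s), Gamma_hat).reshape(r, s)
E = rng.multivariate_normal(np.zeros(s), Sigma_hat, size=n)
Y = X @ B_true + Z @ D + E

Xk = np.kron(X, np.eye(s))                                         # X (x) I_s
Zk = np.kron(Z, np.eye(s))                                         # Z (x) I_s
V_hat = Zk @ Gamma_hat @ Zk.T + np.kron(np.eye(n), Sigma_hat)      # equation (25)
Vi = np.linalg.inv(V_hat)

XtViX = Xk.T @ Vi @ Xk
vecBt_hat = np.linalg.solve(XtViX, Xk.T @ Vi @ Y.T.flatten(order="F"))   # equation (24)
var_vecBt = np.linalg.inv(XtViX)                                   # Var(vec(B_hat^T))

B_hat = vecBt_hat.reshape(p + 1, s)       # rows of B recovered from vec(B_hat^T)
print("B_hat:\n", B_hat)
print("variance matrix shape:", var_vecBt.shape)
```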
To study the asymptotic properties of these estimators, consider a family of distributions $F = \{F_{\theta},\ \theta\in\Theta\}$, where $\theta = [\theta_{1},\ldots,\theta_{k}]^{T}$. For every $\theta_{0}\in\Theta$, a subset $N(\theta_{0})$ of $\Theta$ is a neighborhood of $\theta_{0}$ iff $N(\theta_{0})$ is a superset of an open set $G$ containing $\theta_{0}$ [8]. Assume $\Theta\subset\mathbb{R}^{k}$ and that three regularity conditions hold on $F$ [3], [14]:

(R1). For each $\theta\in\Theta$, the derivatives
$$\frac{\partial l(y,\theta)}{\partial\theta_{i}},\qquad \frac{\partial^{2}l(y,\theta)}{\partial\theta_{i}\,\partial\theta_{i'}},\qquad \frac{\partial^{3}l(y,\theta)}{\partial\theta_{i}\,\partial\theta_{i'}\,\partial\theta_{i''}}$$
exist.

(R2). The bounds
$$\Big\lvert\frac{\partial L(y,\theta)}{\partial\theta_{i}}\Big\rvert \le g(y),\qquad \Big\lvert\frac{\partial^{2}L(y,\theta)}{\partial\theta_{i}\,\partial\theta_{i'}}\Big\rvert \le h(y),\qquad \Big\lvert\frac{\partial^{3}l(y,\theta)}{\partial\theta_{i}\,\partial\theta_{i'}\,\partial\theta_{i''}}\Big\rvert \le H(y)$$
hold for all $y$, with $g$ and $h$ integrable and $E_{\theta}[H(y)] < \infty$.

(R3). For each $\theta\in\Theta$,
$$0 < E_{\theta}\big\lVert\nabla^{T}\nabla\, l(y,\theta)\big\rVert < \infty,$$
where $\nabla = \big(\partial/\partial\theta_{1},\ldots,\partial/\partial\theta_{k}\big)^{T}$ is the gradient operator and $\lVert\cdot\rVert$ denotes a norm of a matrix.

A sequence of estimators $\hat{\theta}_{m}$ converges to $\theta$ with probability 1 (wp1) if $P[\lim_{m\to\infty}\hat{\theta}_{m} = \theta] = 1$. This is written $\hat{\theta}_{m}\xrightarrow{wp1}\theta$; $m\to\infty$. An equivalent condition for convergence wp1 is

$$\lim_{m\to\infty}P\big[\lVert\hat{\theta}_{j}-\theta\rVert < \varepsilon\ \text{for all}\ j\ge m\big] = 1\quad\text{for every}\ \varepsilon > 0. \qquad (26)$$
Lemma:
Let $L(y,\theta)$ and $l(y,\theta)$ be the likelihood function and the ln-likelihood function, respectively, and assume that the regularity conditions (R1) and (R2) hold. The Fisher information is the variance of the random variable $\nabla l(y,\theta)$; i.e.,

$$I(\theta) = \mathrm{Var}\big[\nabla l(y,\theta)\big] = -E\Big[\frac{\partial^{2}l(y,\theta)}{\partial\theta\,\partial\theta^{T}}\Big] = -E\big[\nabla^{T}\nabla\, l(y,\theta)\big]. \qquad (27)$$

Proof:
$L(y,\theta)$ is a likelihood function; thus it can be regarded as a joint probability density function, so $\int L(y,\theta)\,dy = 1$. Differentiating both sides with respect to $\theta$, which by (R2) may be done under the integral sign, gives

$$\int \frac{\partial L(y,\theta)}{\partial\theta}\,dy = 0. \qquad (28)$$

Since $\partial L(y,\theta)/\partial\theta = [\partial(\ln L(y,\theta))/\partial\theta]\,L(y,\theta)$, this is

$$\int \frac{\partial(\ln L(y,\theta))}{\partial\theta}\,L(y,\theta)\,dy = 0, \qquad (29)$$

that is, $E[\partial l(y,\theta)/\partial\theta] = 0$. Differentiating (29) once more with respect to $\theta$ gives

$$\int\Big\{\frac{\partial^{2}(\ln L(y,\theta))}{\partial\theta\,\partial\theta^{T}}\,L(y,\theta) + \frac{\partial(\ln L(y,\theta))}{\partial\theta}\,\frac{\partial(\ln L(y,\theta))}{\partial\theta^{T}}\,L(y,\theta)\Big\}\,dy = 0, \qquad (30)$$

which is equivalent to

$$E\Big[\frac{\partial l(y,\theta)}{\partial\theta}\,\frac{\partial l(y,\theta)}{\partial\theta^{T}}\Big] + E\Big[\frac{\partial^{2}l(y,\theta)}{\partial\theta\,\partial\theta^{T}}\Big] = 0.$$

Because $E[\partial l(y,\theta)/\partial\theta] = 0$, the first term equals $\mathrm{Var}[\nabla l(y,\theta)]$, and (27) follows.

Theorem:
Let the observation vectors $y_{1}, y_{2},\ldots,y_{m}$ be iid with pdf $f(y;\theta)$, $\theta\in\Theta\subset\mathbb{R}^{k}$, satisfying the regularity conditions (R1)-(R3). Then the likelihood equations admit a sequence of solutions $\{\hat{\theta}_{m}\}$ satisfying

a). strong consistency: $\hat{\theta}_{m}\xrightarrow{wp1}\theta$; $m\to\infty$, $\qquad$ (31)

b). asymptotic normality and efficiency: $\hat{\theta}_{m}\sim AN\big(\theta,\ (1/m)\,I^{-1}(\theta)\big)$. $\qquad$ (32)

Proof:
Expanding $\nabla l(y,\theta)$ into a Taylor series of second order about $\theta_{0}$ and evaluating it at $\hat{\theta}_{m}$ gives

$$\nabla l(y,\hat{\theta}_{m}) = \nabla l(y,\theta_{0}) + \big\{\nabla^{T}\nabla\, l(y,\theta_{0}) + \tfrac{1}{2}\big[(\hat{\theta}_{m}-\theta_{0})^{T}\otimes(\mathbf{1}\mathbf{1}^{T})\big]\,H(y,\theta_{m}^{*})\big\}\,(\hat{\theta}_{m}-\theta_{0}), \qquad (33)$$

where $\lVert\theta_{m}^{*}-\theta_{0}\rVert < \lVert\hat{\theta}_{m}-\theta_{0}\rVert$ and $H(y,\theta_{m}^{*}) = \nabla(\nabla^{T}\nabla)\,l(y,\theta_{m}^{*})$ is the array of third-order derivatives. But $\nabla l(y,\hat{\theta}_{m}) = 0$, so the Taylor series (33) can be written as

$$\big\{-(1/m)\,\nabla^{T}\nabla\, l(y,\theta_{0}) - (1/(2m))\big[(\hat{\theta}_{m}-\theta_{0})^{T}\otimes(\mathbf{1}\mathbf{1}^{T})\big]\,H(y,\theta_{m}^{*})\big\}\,\sqrt{m}\,(\hat{\theta}_{m}-\theta_{0}) = (1/\sqrt{m})\,\nabla l(y,\theta_{0}). \qquad (34)$$

Therefore, putting

$$a_{m} = (1/\sqrt{m})\sum_{i=1}^{m}\nabla l(y_{i},\theta_{0}),$$

we have

$$E_{\theta_{0}}[a_{m}] = (1/\sqrt{m})\sum_{i=1}^{m}E_{\theta_{0}}[\nabla l(y_{i},\theta_{0})] = 0, \qquad \mathrm{Var}[a_{m}] = (1/\sqrt{m})^{2}\,m\,I(\theta_{0}) = I(\theta_{0}).$$
Next, define

$$B_{m} = -(1/m)\sum_{i=1}^{m}\nabla^{T}\nabla\, l(y_{i},\theta_{0}) \qquad\text{and}\qquad C_{m} = (1/(2m))\sum_{i=1}^{m}H(y_{i},\theta_{m}^{*}), \qquad (35)$$

so that (34) becomes $\big\{B_{m} - \big[(\hat{\theta}_{m}-\theta_{0})^{T}\otimes(\mathbf{1}\mathbf{1}^{T})\big]\,C_{m}\big\}\,\sqrt{m}\,(\hat{\theta}_{m}-\theta_{0}) = a_{m}$. By the Lemma and the Law of Large Numbers,

$$B_{m}\xrightarrow{wp1}I(\theta_{0}). \qquad (36)$$

To show that $(1/m)\sum_{i=1}^{m}H(y_{i},\theta_{m}^{*})$ is bounded in probability, let $c_{0}$ be a constant such that $\lVert\hat{\theta}_{m}-\theta_{0}\rVert < c_{0}$ implies $\lVert\theta_{m}^{*}-\theta_{0}\rVert < c_{0}$; on that event, condition (R2) gives

$$\Big\lVert(1/m)\sum_{i=1}^{m}H(y_{i},\theta_{m}^{*})\Big\rVert \le (1/m)\sum_{i=1}^{m}H(y_{i}). \qquad (37)$$

By condition (R2), $E_{\theta_{0}}[H(y)] < \infty$, and by applying the Law of Large Numbers (LLN) it can be shown that

$$(1/m)\sum_{i=1}^{m}H(y_{i})\xrightarrow{wp1}E_{\theta_{0}}[H(y)]. \qquad (38)$$

Hence, given $\varepsilon > 0$, there is an $M_{1}$ such that $m \ge M_{1}$ implies $P[\lVert\hat{\theta}_{m}-\theta_{0}\rVert < c_{0}] \ge 1 - \varepsilon/2$, and an $M_{2}$ such that $m \ge M_{2}$ implies

$$P\Big[\Big\lvert(1/m)\sum_{i=1}^{m}H(y_{i}) - E_{\theta_{0}}[H(y)]\Big\rvert < 1\Big] \ge 1 - \varepsilon/2. \qquad (39)$$

It follows from (37), (38) and (39) that, for $m \ge \max\{M_{1}, M_{2}\}$,

$$P\Big[\Big\lVert(1/m)\sum_{i=1}^{m}H(y_{i},\theta_{m}^{*})\Big\rVert \le 1 + E_{\theta_{0}}[H(y)]\Big] \ge 1 - \varepsilon;$$

hence $(1/m)\sum_{i=1}^{m}H(y_{i},\theta_{m}^{*})$, and therefore $C_{m}$, is bounded in probability. Moreover, since $a_{m}$ is a normalized sum of iid zero-mean random vectors with covariance matrix $I(\theta_{0})$, the multivariate central limit theorem gives

$$a_{m}\xrightarrow{d}AN\big(0,\ I(\theta_{0})\big). \qquad (40)$$

The existence of a sequence of solutions $\{\hat{\theta}_{m}\}$ satisfying the strong consistency of part a), $\hat{\theta}_{m}\xrightarrow{wp1}\theta$; $m\to\infty$, is proven under (R1)-(R3) in [3], [14]. Since $(\hat{\theta}_{m}-\theta_{0})$ then converges with probability 1 to $0$ while $(1/m)\sum_{i=1}^{m}H(y_{i},\theta_{m}^{*})$ is bounded in probability, the term $\big[(\hat{\theta}_{m}-\theta_{0})^{T}\otimes(\mathbf{1}\mathbf{1}^{T})\big]C_{m}$ converges in probability to $0$. Combining this with (36) and (40) and applying Slutsky's theorem to (34) gives

$$\sqrt{m}\,(\hat{\theta}_{m}-\theta_{0})\xrightarrow{d}AN\big(0,\ I^{-1}(\theta_{0})\big),$$

which can be written as $\hat{\theta}_{m}\sim AN\big(\theta,\ (1/m)\,I^{-1}(\theta)\big)$ [3], [14], proving part b).

Now let $\{\hat{\theta}_{n}\}$ be a sequence of solutions of the likelihood equations based on the ln-likelihood (23), where

$$\hat{\theta}_{n} = \big[(\mathrm{vec}(\hat{B}^{T}))^{T},\ (\mathrm{vech}(\hat{V}))^{T}\big]^{T} = \big[\hat{\theta}_{n1},\ldots,\hat{\theta}_{n(ps+s+ns(ns+1)/2)}\big]^{T}.$$

By the above Theorem, $\hat{\theta}_{n}$ converges with probability 1 to $\theta$ (strong consistency), $\hat{\theta}_{n}\xrightarrow{wp1}\theta$; $n\to\infty$, and is asymptotically efficient and normal, $\sqrt{n}\,(\hat{\theta}_{n}-\theta)\xrightarrow{d}AN\big(0,\ I^{-1}(\theta)\big)$, or $\hat{\theta}_{n}\sim AN\big(\theta,\ n^{-1}I^{-1}(\theta)\big)$.
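The content of the Lemma and the Theorem can be checked by simulation for any simple parametric family; the sketch below uses iid Exponential(θ) data (an assumption made purely for illustration, not a model used in this paper) to compare the variance of the score with $-E[\partial^{2}l/\partial\theta^{2}]$ and the Monte Carlo variance of $\sqrt{m}(\hat\theta_{m}-\theta_{0})$ with $I^{-1}(\theta_{0})$:

```python
import numpy as np

rng = np.random.default_rng(6)
theta0, m, reps = 2.0, 200, 5000            # true rate, sample size, Monte Carlo replications

# Lemma check: Var[dl/dtheta] vs -E[d^2 l/dtheta^2] for a single observation,
# where l(y, theta) = ln(theta) - theta*y for the Exponential(theta) density.
y = rng.exponential(scale=1.0 / theta0, size=100_000)
score = 1.0 / theta0 - y
print("Var[score]         :", score.var())
print("-E[Hessian] = I(t) :", 1.0 / theta0**2)

# Theorem check: sqrt(m)(theta_hat - theta0) should be approximately N(0, I^{-1}(theta0)).
samples = rng.exponential(scale=1.0 / theta0, size=(reps, m))
theta_hat = 1.0 / samples.mean(axis=1)      # MLE of the exponential rate
z = np.sqrt(m) * (theta_hat - theta0)
print("Monte Carlo Var    :", z.var())
print("I^{-1}(theta0)     :", theta0**2)
```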
V. CONCLUSION
1). For the multivariate linear mixed model $Y = XB + ZD + E$, the iterative ML procedure yields the estimated covariance matrix

$$\hat{V} = (Z\otimes I_{s})\,\hat{\Gamma}\,(Z\otimes I_{s})^{T} + (I_{n}\otimes\hat{\Sigma}),$$

and the estimator of the fixed effects has variance matrix

$$\mathrm{Var}\big(\mathrm{vec}(\hat{B}^{T})\big) = \big[(X\otimes I_{s})^{T}\hat{V}^{-1}(X\otimes I_{s})\big]^{-1}.$$

2). Let $\{\hat{\theta}_{n}\}$ be the sequence with $\hat{\theta}_{n} = \big[(\mathrm{vec}(\hat{B}^{T}))^{T},\ (\mathrm{vech}(\hat{V}))^{T}\big]^{T}$. By applying the proposed theorem it can be shown that $\hat{\theta}_{n}$ is strongly consistent, i.e. $\hat{\theta}_{n}\xrightarrow{wp1}\theta$; $n\to\infty$, and asymptotically normal and efficient, $\hat{\theta}_{n}\sim AN\big(\theta,\ n^{-1}I^{-1}(\theta)\big)$.
REFERENCES
[1] R. Christensen, Linear Models for Multivariate, Time Series, and Spatial Data. New York: Springer-Verlag, 1991, ch. 1.
[2] G. D. Garson, "Linear mixed models: random effects, hierarchical linear, multilevel, random coefficients, and repeated measures models" (Statnotes style), lecture notes, Department of Public Administration, North Carolina State University, 2008, pp. 1-49.
[3] R. V. Hogg, J. W. McKean, and A. T. Craig, Introduction to Mathematical Statistics, 6th ed., Intern. ed., New Jersey: Pearson-Prentice Hall, 2005, ch. 6-7.
[4] J. Jiang, Linear and Generalized Linear Mixed Models and Their Applications, New York: Springer Science+Business Media, LLC, 2007, ch. 2.
[5] H. A. Kalaian and S. W. Raudenbush, "A multivariate mixed linear model for meta-analysis," Psychological Methods, vol. 1(3), pp. 227-235, Mar. 1996.
[6] T. Kubokawa and M. S. Srivastava, "Prediction in multivariate mixed linear models" (Discussion papers are a series of manuscripts style), Discussion Papers CIRJE-F-180, pp. 1-24, Oct. 2002.
[7] L. R. LaMotte, "A direct derivation of the REML likelihood function," Statistical Papers, vol. 48, pp. 321-327, 2007.
[8] S. Lipschutz, Schaum's Outline of Theory and Problems of General Topology. New York: McGraw-Hill Book Co., 1965, ch. 1.
[9] K. E. Muller and P. W. Stewart, Linear Model Theory: Univariate, Multivariate, and Mixed Models, New Jersey: John Wiley & Sons, Inc., 2006, ch. 2-5, 12-17.
[10] F. Picard, E. Lebarbier, E. Budinská, and S. Robin, "Joint segmentation of multivariate Gaussian processes using mixed linear models" (Report style), Statistics for Systems Biology Group, Research Report (5), pp. 1-10, 2007. http://genome.jouy.inra.fr/ssb/