Analysis of Variance and Design of Experiments - I
MODULE - I
LECTURE - 1
We need some basic knowledge to understand the topics in analysis of variance.
Vectors
A vector Y is an ordered n-tuple of real numbers. A vector can be expressed as a row vector or a column vector, e.g., the column vector

Y = (y1, y2, ..., yn)'.
If

X = (x1, x2, ..., xn)',  Y = (y1, y2, ..., yn)',  Z = (z1, z2, ..., zn)'

then

X + Y = (x1 + y1, x2 + y2, ..., xn + yn)'  and  kY = (ky1, ky2, ..., kyn)'.
3
X + (Y + Z ) = ( X + Y ) + Z
X '(Y + Z ) = X ' Y + X ' Z
k ( X ' Y ) = ( kX ) ' Y = X '( kY )
k ( X + Y ) = kX + kY
X ' Y = x1 y1 + x2 y2 + ... + xn yn
where k is a scalar.
Orthogonal vectors
Two vectors X and Y are said to be orthogonal if X ' Y = Y ' X = 0.
The null vector is orthogonal to every vector X and is the only such vector.
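These identities are easy to verify numerically. The following is a minimal sketch (assuming NumPy is available; the vectors are made up for illustration) that checks the vector identities above and tests orthogonality through the inner product X'Y:

```python
import numpy as np

X = np.array([1.0, 2.0, -1.0])
Y = np.array([2.0, 0.0, 2.0])
Z = np.array([0.5, -1.0, 3.0])
k = 3.0

# Associativity and distributivity of vector addition and the inner product
assert np.allclose(X + (Y + Z), (X + Y) + Z)
assert np.isclose(X @ (Y + Z), X @ Y + X @ Z)
assert np.isclose(k * (X @ Y), (k * X) @ Y)

# X'Y = 1*2 + 2*0 + (-1)*2 = 0, so X and Y are orthogonal
print("X'Y =", X @ Y)   # 0.0 -> orthogonal
print("X'Z =", X @ Z)   # nonzero -> not orthogonal
```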
Linear combination
If x1, x2, ..., xm are m vectors of the same order and k1, k2, ..., km are scalars, then

t = Σ_{i=1}^{m} ki xi

is called a linear combination of x1, x2, ..., xm.
Linear independence
If X1, X2, ..., Xm are m vectors, then they are said to be linearly independent if

Σ_{i=1}^{m} ki Xi = 0

implies ki = 0 for all i = 1, 2, ..., m. If there exist scalars k1, k2, ..., km, not all zero, such that Σ_{i=1}^{m} ki Xi = 0, then X1, X2, ..., Xm are said to be linearly dependent.
Linear function
Let K = (k1, k2, ..., km)' be an m×1 vector of scalars and X = (x1, x2, ..., xm)' be an m×1 vector of variables. Then

K'X = Σ_{i=1}^{m} ki xi

is called a linear function or linear form. The vector K is called the coefficient vector.
For example, the mean of x1, x2, ..., xm can be expressed as

x̄ = (1/m) Σ_{i=1}^{m} xi = (1/m)(1, 1, ..., 1)(x1, x2, ..., xm)' = (1/m) 1m' X

where 1m is an m×1 vector with all elements unity.
Contrast
The linear function K'X = Σ_{i=1}^{m} ki xi is called a contrast if Σ_{i=1}^{m} ki = 0.
For example,

x1 − x2,  2x1 − 3x2 + x3,  x1 − (x2 + x3)/2  and  x1 − (1/m) Σ_{i=1}^{m} xi

are contrasts.
Matrix
A matrix is a rectangular array of real numbers. For example, an m×n matrix is

A = ((aij)) =
[ a11 a12 ... a1n
  a21 a22 ... a2n
  ...
  am1 am2 ... amn ].
If A is a matrix of order m×n and B and C are matrices of order n×p, then
A(B + C) = AB + AC.
If the orders of the matrices A, B and C are m×n, n×p and p×q respectively, then
(AB)C = A(BC).
If A is a matrix of order m×n, then
Im A = A In = A.
Trace of a matrix
The trace of an n×n matrix A, denoted tr(A) or trace(A), is defined to be the sum of all the diagonal elements of A, i.e., tr(A) = Σ_{i=1}^{n} aii.
tr(AB) = tr(BA).
If P is a nonsingular matrix, then tr(A) = tr(P⁻¹AP).
If P is an orthogonal matrix, then tr(A) = tr(P'AP).
If A and B are n×n matrices and a and b are scalars, then
tr(aA + bB) = a tr(A) + b tr(B).
If A is an m×n matrix, then tr(AA') = tr(A'A), and for a square matrix tr(A') = tr(A).
Rank of a matrix
The rank of an m×n matrix A is the number of linearly independent rows (equivalently, columns) in A.
Let B be another matrix of order n×q. Then rank(AB) ≤ min(rank(A), rank(B)).
Inverse of a matrix
The inverse of a square matrix A of order m is a square matrix of order m, denoted A⁻¹, such that AA⁻¹ = A⁻¹A = Im. If A and B are nonsingular matrices of the same order, then
(AB)⁻¹ = B⁻¹A⁻¹.
Idempotent matrix
A square matrix A is called idempotent if A2 = AA = A.
If A is an n×n idempotent matrix with rank(A) = r ≤ n, then
the eigenvalues of A are 1 or 0,
trace(A) = rank(A) = r,
and if A is of full rank n, then A = In.
If A and B are idempotent and AB = BA, then AB is also idempotent.
If A is idempotent, then (I − A) is also idempotent and A(I − A) = (I − A)A = 0.
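A minimal numerical sketch of these properties (assuming NumPy), using the projection matrix A = X(X'X)⁻¹X', which is idempotent:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))            # any full-column-rank 6x2 matrix
A = X @ np.linalg.inv(X.T @ X) @ X.T   # projection (hat) matrix, idempotent

assert np.allclose(A @ A, A)                                    # A^2 = A
eig = np.linalg.eigvalsh(A)
assert np.allclose(np.sort(np.round(eig)), [0, 0, 0, 0, 1, 1])  # eigenvalues 0 or 1
assert np.isclose(np.trace(A), np.linalg.matrix_rank(A))        # trace = rank = 2
I = np.eye(6)
assert np.allclose(A @ (I - A), np.zeros((6, 6)))               # A(I - A) = 0
print("trace(A) =", np.trace(A))
```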
Analysis of Variance and Design of Experiments - I
MODULE - I
LECTURE - 2
Quadratic forms
If A is a given matrix of order m×n and X and Y are two given vectors of order m×1 and n×1 respectively, then the bilinear form is given by

X'AY = Σ_{i=1}^{m} Σ_{j=1}^{n} aij xi yj.

When Y = X and A is square of order m, this gives the quadratic form

X'AX = a11 x1² + ... + amm xm² + (a12 + a21) x1 x2 + ... + (a_{m−1,m} + a_{m,m−1}) x_{m−1} xm.

If A is also symmetric, then

X'AX = a11 x1² + ... + amm xm² + 2a12 x1 x2 + ... + 2a_{m−1,m} x_{m−1} xm = Σ_{i=1}^{m} Σ_{j=1}^{m} aij xi xj.
The quadratic form X'AX and the matrix A of the form is called
positive definite if X'AX > 0 for all X ≠ 0,
positive semidefinite if X'AX ≥ 0 for all X ≠ 0,
negative definite if X'AX < 0 for all X ≠ 0,
negative semidefinite if X'AX ≤ 0 for all X ≠ 0.
If A is a positive semidefinite matrix, then aii ≥ 0, and if aii = 0 then aij = 0 for all j and aji = 0 for all j.
If P is any nonsingular matrix and A is any positive definite (or positive semidefinite) matrix, then P'AP is also positive definite (or positive semidefinite).
A matrix A is positive definite if and only if there exists a nonsingular matrix P such that A = P'P.
A positive definite matrix is a nonsingular matrix.
If A is an m×n matrix and rank(A) = m < n, then AA' is positive definite and A'A is positive semidefinite.
If A is an m×n matrix and rank(A) = k < m < n, then both A'A and AA' are positive semidefinite.
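These definiteness properties can be checked numerically through eigenvalues; a small sketch (assuming NumPy, with an arbitrary random matrix for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 5))            # m x n with rank m = 3 < n = 5

# rank(A) = m < n: AA' is positive definite, A'A only positive semidefinite
eig_AAt = np.linalg.eigvalsh(A @ A.T)  # all eigenvalues > 0
eig_AtA = np.linalg.eigvalsh(A.T @ A)  # >= 0, with n - m zero eigenvalues
print("eigenvalues of AA':", eig_AAt)
print("eigenvalues of A'A:", np.round(eig_AtA, 10))

# A = P'P with P nonsingular gives a positive definite matrix
P = rng.normal(size=(4, 4))
M = P.T @ P
assert np.all(np.linalg.eigvalsh(M) > 0)
```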
System of linear equations
Consider the system of linear equations

AX = b

where

A = ((aij)) is an m×n real matrix of known scalars, called the coefficient matrix,
X = (x1, x2, ..., xn)' is an n×1 vector of variables, and
b = (b1, b2, ..., bm)' is an m×1 real vector of known scalars.
If A is an n×n nonsingular matrix, then AX = b has a unique solution.
Let B = [A, b] be the augmented matrix. A solution to AX = b exists if and only if rank(A) = rank(B).
If A is an m×n matrix of rank m, then AX = b has a solution.
The linear homogeneous system AX = 0 has a solution other than X = 0 if and only if rank(A) < n.
If AX = b is consistent, then AX = b has a unique solution if and only if rank(A) = n.
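A brief sketch of the rank-consistency condition rank(A) = rank([A, b]) in code (assuming NumPy; the system is made up for illustration):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [0.0, 1.0]])
b_consistent   = np.array([1.0, 2.0, 3.0])   # lies in the column space of A
b_inconsistent = np.array([1.0, 0.0, 0.0])   # does not

def has_solution(A, b):
    # AX = b is solvable iff rank(A) equals rank of the augmented matrix [A, b]
    return np.linalg.matrix_rank(A) == np.linalg.matrix_rank(np.column_stack([A, b]))

print(has_solution(A, b_consistent))    # True
print(has_solution(A, b_inconsistent))  # False
```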
Orthogonal matrix
A square matrix A is called an orthogonal matrix if A'A = AA' = I or, equivalently, if A⁻¹ = A'.
If A is orthogonal, then −1 ≤ aij ≤ 1 for every element of A.
Writing A = (a1, a2, ..., an), where ai denotes the ith column of A, a necessary and sufficient condition that A is an orthogonal matrix is given by the following:
(i) ai'ai = 1 for i = 1, 2, ..., n
(ii) ai'aj = 0 for i ≠ j = 1, 2, ..., n.
An orthogonal matrix is non-singular.
If A is orthogonal, then A' is also orthogonal.
If A is an n×n matrix and P is an n×n orthogonal matrix, then the determinants of A and P'AP are the same.
Random vectors
Let Y1, Y2, ..., Yn be n random variables. Then Y = (Y1, Y2, ..., Yn)' is called a random vector.
The mean vector of Y is E(Y) = (E(Y1), E(Y2), ..., E(Yn))', and the covariance matrix of Y is the n×n matrix Var(Y) = ((Cov(Yi, Yj))) with the variances Var(Yi) on the diagonal.
If Y1, Y2, ..., Yn are pairwise uncorrelated, then the covariance matrix is a diagonal matrix.
If Var(Yi) = σ² for all i = 1, 2, ..., n, then Var(Y) = σ² In.
A linear function Σ_{i=1}^{n} ki Yi of Y1, Y2, ..., Yn is itself a random variable.
Multivariate normal distribution
If the random vector Y follows a multivariate normal distribution with mean vector μ and positive definite covariance matrix Σ, its probability density function is

f(Y | μ, Σ) = (2π)^(−n/2) |Σ|^(−1/2) exp[ −(1/2)(Y − μ)'Σ⁻¹(Y − μ) ].
Chi-square distribution
If Y1, Y2, ..., Yk are identically and independently distributed random variables following the normal distribution with mean 0 and variance 1, then Σ_{i=1}^{k} Yi² has a χ² distribution with k degrees of freedom.
The probability density function of the χ² distribution with k degrees of freedom is given as

f(x) = [1/(Γ(k/2) 2^(k/2))] x^(k/2 − 1) exp(−x/2);  0 < x < ∞.

If Y1, Y2, ..., Yk are independently distributed following the normal distribution with common mean 0 and common variance σ², then Σ_{i=1}^{k} Yi²/σ² has a χ² distribution with k degrees of freedom.
If the random variables Y1, Y2, ..., Yk are normally distributed with non-null means μ1, μ2, ..., μk but common variance 1, then the distribution of Σ_{i=1}^{k} Yi² is the non-central χ² distribution with k degrees of freedom and non-centrality parameter λ = Σ_{i=1}^{k} μi².
If Y1, Y2, ..., Yk are independently distributed following the normal distribution with means μ1, μ2, ..., μk but common variance σ², then Σ_{i=1}^{k} Yi²/σ² has a non-central χ² distribution with k degrees of freedom and non-centrality parameter λ = Σ_{i=1}^{k} μi²/σ².
If U has a Chi-square distribution with k degrees of freedom, then E(U) = k and Var(U) = 2k.
If U has a noncentral Chi-square distribution with k degrees of freedom and noncentrality parameter λ, then E(U) = k + λ and Var(U) = 2k + 4λ.
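These moment formulas are easy to confirm by simulation; a sketch (assuming NumPy), building noncentral chi-square draws directly from the definition:

```python
import numpy as np

rng = np.random.default_rng(2)
k, lam, n_sim = 5, 3.0, 200_000

# Sum of squares of k independent N(mu_i, 1) variables with sum(mu_i^2) = lambda
mu = np.zeros(k)
mu[0] = np.sqrt(lam)
Y = rng.normal(loc=mu, scale=1.0, size=(n_sim, k))
U = (Y ** 2).sum(axis=1)

print("E(U)   simulated:", U.mean(), " theory:", k + lam)          # k + lambda
print("Var(U) simulated:", U.var(),  " theory:", 2 * k + 4 * lam)  # 2k + 4*lambda
```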
If U1, U2, ..., Uk are independently distributed random variables, each Ui having a noncentral Chi-square distribution with ni degrees of freedom and noncentrality parameter λi, i = 1, 2, ..., k, then Σ_{i=1}^{k} Ui has a noncentral Chi-square distribution with Σ_{i=1}^{k} ni degrees of freedom and noncentrality parameter Σ_{i=1}^{k} λi.
If X ~ N(μ, I), then X'AX has a noncentral Chi-square distribution with k degrees of freedom and noncentrality parameter μ'Aμ if and only if A is an idempotent matrix of rank k.
Let X = (X1, X2, ..., Xn)' have a multivariate normal distribution with mean vector μ and positive definite covariance matrix Σ. Suppose X'A1X is distributed as χ² with n1 degrees of freedom and noncentrality parameter μ'A1μ, and X'A2X is distributed as χ² with n2 degrees of freedom and noncentrality parameter μ'A2μ. Then X'A1X and X'A2X are independently distributed if A1ΣA2 = 0.
t-distribution
If X ~ N(0, 1), Y has a χ² distribution with n degrees of freedom, and X and Y are independent random variables, then the distribution of

T = X / √(Y/n)

is called the t-distribution with n degrees of freedom. Its probability density function is

fT(t) = Γ((n+1)/2) / [√(nπ) Γ(n/2)] · (1 + t²/n)^(−(n+1)/2);  −∞ < t < ∞.
F-distribution
If X and Y are independent random variables with χ² distributions with m and n degrees of freedom respectively, then the distribution of

F = (X/m) / (Y/n)

is called the F-distribution with m and n degrees of freedom. The probability density function is

fF(f) = [Γ((m+n)/2) / (Γ(m/2) Γ(n/2))] (m/n)^(m/2) f^(m/2 − 1) (1 + mf/n)^(−(m+n)/2);  0 < f < ∞.
If X has a noncentral Chi-square distribution with m degrees of freedom and noncentrality parameter λ, Y has a χ² distribution with n degrees of freedom, and X and Y are independent random variables, then the distribution of

F = (X/m) / (Y/n)

is the noncentral F-distribution with m and n degrees of freedom and noncentrality parameter λ.
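The construction of F as a ratio of independent chi-squares can be checked by simulation; a sketch (assuming NumPy and SciPy):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
m, n, n_sim = 4, 10, 100_000

X = rng.chisquare(m, size=n_sim)   # chi-square with m degrees of freedom
Y = rng.chisquare(n, size=n_sim)   # independent chi-square with n degrees of freedom
F = (X / m) / (Y / n)

# Compare the simulated upper 5% point with the F(m, n) quantile
print("simulated 95th percentile:", np.percentile(F, 95))
print("F(m, n) 95th percentile:  ", stats.f.ppf(0.95, m, n))
```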
Analysis of Variance and Design of Experiments - I
MODULE - I
LECTURE - 3
Linear model
Suppose there are n observations. In the linear model, we assume that these observations are the values taken by n random variables Y1, Y2, ..., Yn satisfying

Y = Xβ + ε

where Y is an n×1 vector of observations, X is an n×p matrix of n observations (xij's) on each of the p variables X1, X2, ..., Xp, β is a p×1 vector of parameters and ε is an n×1 vector of random errors. Y is called the study or dependent variable, and X1, X2, ..., Xp are called the explanatory or independent variables.
Alternatively, since Y ~ N(Xβ, σ²I), the linear model can also be expressed in the expectation form as a normal regression model with

E(Y) = Xβ
Var(Y) = σ²I.

Note that β and σ² are unknown but X is known.
Estimable function
A linear parametric function λ'β of the parameter β is said to be an estimable parametric function (or estimable) if there exists a linear function ℓ'Y of the random variables Y = (Y1, Y2, ..., Yn)' such that

E(ℓ'Y) = λ'β

with ℓ = (ℓ1, ℓ2, ..., ℓn)' and λ = (λ1, λ2, ..., λp)' being vectors of known scalars.
Suppose ℓ1'Y and ℓ2'Y are the BLUEs (best linear unbiased estimators) of λ1'β and λ2'β respectively. Then (a1ℓ1 + a2ℓ2)'Y is the BLUE of (a1λ1 + a2λ2)'β.
If λ'β is estimable, its best estimate is λ'β̂, where β̂ is any solution of the equations X'Xβ = X'Y. To see where these equations come from, minimize the sum of squares

S = ε'ε = (Y − Xβ)'(Y − Xβ) = Y'Y − 2β'X'Y + β'X'Xβ.

Setting ∂S/∂β = 0 gives

X'Xβ = X'Y

which is termed the normal equation. Assuming rank(X) = p,

β̂ = (X'X)⁻¹X'Y.

Note that ∂²S/∂β∂β' = 2X'X is a positive definite matrix, so β̂ = (X'X)⁻¹X'Y is the value of β that minimizes S. Also E(Y − Xβ̂) = 0.
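A small sketch of the normal equations in code (assuming NumPy; the data are simulated for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.3, size=n)

# Solve the normal equations X'X beta = X'y (rank(X) = p assumed)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# The residual vector is orthogonal to the columns of X: X'(y - X beta_hat) = 0
residual = y - X @ beta_hat
print("beta_hat:", beta_hat)
print("X' residual (should be ~0):", X.T @ residual)
```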
More generally, if the errors have covariance matrix Σ, then in the model

Y = Xβ + ε

with E(ε) = 0, Var(ε) = Σ and ε normally distributed, we find

E(Y) = Xβ, Var(Y) = Σ.

Assuming Σ is positive definite, write Σ⁻¹ = P'P where P is a nonsingular matrix. Premultiplying Y = Xβ + ε by P, we get

PY = PXβ + Pε

or

Y* = X*β + ε*

where Y* = PY, X* = PX and ε* = Pε.
Distribution of ℓ'Y
In the linear model Y = Xβ + ε, ε ~ N(0, σ²I), consider a linear function ℓ'Y, which is normally distributed with

E(ℓ'Y) = ℓ'Xβ,
Var(ℓ'Y) = σ²(ℓ'ℓ).

Then

ℓ'Y / (σ√(ℓ'ℓ)) ~ N( ℓ'Xβ / (σ√(ℓ'ℓ)), 1 ).

Further, (ℓ'Y)² / (σ² ℓ'ℓ) has a noncentral Chi-square distribution with one degree of freedom and noncentrality parameter

(ℓ'Xβ)² / (σ² ℓ'ℓ).
Degrees of freedom
A linear function ℓ'Y of the observations (ℓ ≠ 0) is said to carry one degree of freedom. A set of r linear functions L'Y, where L is an r×n matrix, is said to have M degrees of freedom if there exist M linearly independent functions in the set and no more. Alternatively, the degrees of freedom carried by the set L'Y equals rank(L). When the set L'Y consists of the estimates of Λ'β, the degrees of freedom of the set L'Y will also be called the degrees of freedom for the estimates of Λ'β.
Sum of squares
If ℓ'Y is a linear function of observations, then the projection of Y on ℓ is the vector (Y'ℓ / ℓ'ℓ)ℓ. The squared length of this projection, (ℓ'Y)² / (ℓ'ℓ), is called the sum of squares (SS) due to ℓ'Y. Since ℓ'Y has one degree of freedom, the SS due to ℓ'Y has one degree of freedom.
The sums of squares and the degrees of freedom arising out of mutually orthogonal sets of functions can be added together to give the sum of squares and degrees of freedom for the set of all the functions together, and vice versa.
If X'A1X is distributed as χ² with n1 degrees of freedom and noncentrality parameter μ'A1μ, and X'A2X is distributed as χ² with n2 degrees of freedom and noncentrality parameter μ'A2μ, then X'A1X and X'A2X are independently distributed if A1ΣA2 = 0.
Fisher-Cochran theorem
If X = (X1, X2, ..., Xn)' has a multivariate normal distribution with mean vector μ and positive definite covariance matrix Σ, and

X'Σ⁻¹X = Q1 + Q2 + ... + Qk

where Qi = X'AiX with rank(Ai) = Ni, i = 1, 2, ..., k, then the Qi's are independently distributed following noncentral Chi-square distributions with Ni degrees of freedom and noncentrality parameters μ'Aiμ if and only if

Σ_{i=1}^{k} Ni = n,

in which case

μ'Σ⁻¹μ = Σ_{i=1}^{k} μ'Aiμ.
Let f(X) be a scalar function of the vector X = (x1, x2, ..., xn)'. Then the derivative of f(X) with respect to X is the vector

∂f(X)/∂X = (∂f(X)/∂x1, ∂f(X)/∂x2, ..., ∂f(X)/∂xn)'.

If K is an n×1 vector of constants, then

∂(K'X)/∂X = K.

If A is an n×n matrix, then

∂(X'AX)/∂X = (A + A')X,

which equals 2AX when A is symmetric.
m n matrix.
Var (Yi ) = 2 .
This is the linear model in the expectation form where 1 , 2 ,..., p are the unknown parameters and x ij ' s are the known values
of independent covariates X 1 , X 2 ,..., X p .
p
where i s are identically and independently distributed random error component with mean 0 and variance 2 , i.e.,
E ( i ) = 0,
0 Var ( i ) = 2 and Cov( i , j ) = 0(i j )).
In matrix notation,

Y = Xβ + ε

where

X =
[ x11 x12 ... x1p
  x21 x22 ... x2p
  ...
  xn1 xn2 ... xnp ].
In the analysis of variance setup, the covariates X1, X2, ..., Xp are counter variables or indicator variables, where xij counts the number of times the effect j occurs in the ith observation. The value xij = 1 indicates the presence of effect j in the ith observation and xij = 0 indicates its absence.
Note that in the linear regression model, the covariates are usually continuous variables. When some of the covariates are counter variables and the rest are continuous variables, the model is called a mixed model and is used in the analysis of covariance.
One-way model: Y = μ + αX + ε
Two-way model: Y = μ + αX + βZ + ε
Three-way model: Y = μ + αX + βZ + γW + ε, and so on.
Consider an example of agricultural yield. The study variable denotes the yield, which depends on various covariates X1, X2, ..., Xp. In the case of regression analysis, the covariates X1, X2, ..., Xp are different continuous variables like temperature etc.
Now consider the case of the one-way model and try to understand its interpretation in terms of the multiple regression model. The covariate X is now measured at different levels, e.g., if X is the quantity of fertilizer then suppose there are p possible values, say 1 Kg., 2 Kg., ..., p Kg.; then X1, X2, ..., Xp denote these p levels in the following way.
The linear model can now be expressed as

Y = β0 + β1X1 + β2X2 + ... + βpXp + ε

by defining Xj = 1 if the effect of the jth level is present and Xj = 0 otherwise.
If the effect of 1 Kg. of fertilizer is present, then the other effects are obviously absent and the linear model is expressible as

Y = β0 + β1(X1 = 1) + β2(X2 = 0) + ... + βp(Xp = 0) + ε = β0 + β1 + ε.

Similarly, for 2 Kg. of fertilizer,

Y = β0 + β1(X1 = 0) + β2(X2 = 1) + ... + βp(Xp = 0) + ε = β0 + β2 + ε,

and for p Kg. of fertilizer,

Y = β0 + β1(X1 = 0) + β2(X2 = 0) + ... + βp(Xp = 1) + ε = β0 + βp + ε,

and so on.
If the experiment with 1 Kg. of fertilizer is repeated n1 times, then n1 observations on the response variable are recorded, which can be represented as

Y11 = β0 + β1·1 + β2·0 + ... + βp·0 + ε11
Y12 = β0 + β1·1 + β2·0 + ... + βp·0 + ε12
...
Y1n1 = β0 + β1·1 + β2·0 + ... + βp·0 + ε1n1.

The experiment is continued, and if Xp = 1 is repeated np times, then on the same lines

Yp1 = β0 + β1·0 + β2·0 + ... + βp·1 + εp1
Yp2 = β0 + β1·0 + β2·0 + ... + βp·1 + εp2
...
Ypnp = β0 + β1·0 + β2·0 + ... + βp·1 + εpnp.
Stacking all the observations together, Y = Xβ + ε with

Y = (y11, ..., y1n1, y21, ..., y2n2, ..., yp1, ..., ypnp)',
β = (β0, β1, β2, ..., βp)',
ε = (ε11, ..., ε1n1, ε21, ..., ε2n2, ..., εp1, ..., εpnp)',

and X the n×(p + 1) matrix whose first column (corresponding to β0) consists entirely of ones, and whose (j + 1)th column contains ones in the rows of the nj observations receiving effect j and zeros elsewhere; in short,

Y = Xβ + ε.
In the two-way analysis of variance model, there are two covariates and the linear model is expressible as

Y = β0 + β1X1 + β2X2 + ... + βpXp + γ1Z1 + γ2Z2 + ... + γqZq + ε

where X1, X2, ..., Xp denote, e.g., the p levels of the quantity of fertilizer, say 1 Kg., 2 Kg., ..., p Kg., and Z1, Z2, ..., Zq denote, e.g., the q levels of irrigation, say 10 Cms., 20 Cms., ..., 10q Cms. The levels X1, X2, ..., Xp, Z1, Z2, ..., Zq are defined as counter variables indicating the presence or absence of the effect, as in the earlier case. If the effects of X1 and Z1 are present, i.e., 1 Kg. of fertilizer and 10 Cms. of irrigation are used, then the linear model is written as

Y = β0 + β1·1 + β2·0 + ... + βp·0 + γ1·1 + γ2·0 + ... + γq·0 + ε = β0 + β1 + γ1 + ε.
The regression parameters (the β's) can be fixed or random.
If all β's are unknown constants, they are called the parameters of the model and the model is called a fixed-effects model or Model I. The objective in this case is to make inferences about the parameters and the error variance σ².
If xij = 1 for all i = 1, 2, ..., n, i.e., the effect βj occurs with every observation, then βj is termed an additive constant.
If all β's other than the additive constant are observable random variables, then the linear model is termed a random-effects model, Model II or variance components model. The objective in this case is to make inferences about the variances of the β's, i.e., σ²β1, σ²β2, ..., σ²βp, and the error variance σ², and/or certain functions of them.
If some parameters are fixed and some are random variables, then the model is called a mixed-effects model or Model III. In a mixed-effects model, at least one βj is a random variable and at least one is a fixed constant. The objective is to make inferences about the fixed-effect parameters, the variances of the random effects and the error variance σ².
Analysis of variance
Analysis of variance is a body of statistical methods for analyzing measurements assumed to be structured as

yi = β1 xi1 + β2 xi2 + ... + βp xip + εi,  i = 1, 2, ..., n,

where the xij's are the known values of the covariates and the εi's are assumed to be identically and independently distributed with mean 0 and variance σ². It may be noted that the εi's can additionally be assumed to follow a normal distribution N(0, σ²). This assumption is needed from the beginning of the analysis for the maximum likelihood estimation of parameters, but in least squares estimation it is needed only when conducting tests of hypothesis and constructing confidence intervals for the parameters. The least squares method does not require any distributional assumption, such as normality, up to the stage of estimation of parameters.
We need some basic concepts to develop the tools.
Consider the sum of squares

S² = Σ_{i=1}^{n} εi² = ε'ε = (y − Xβ)'(y − Xβ) = y'y − 2β'X'y + β'X'Xβ

where y = (y1, y2, ..., yn)'. Differentiating S² with respect to β and setting it to zero, the normal equations are obtained as

dS²/dβ = 2X'Xβ − 2X'y = 0

or

X'Xβ = X'y.

If X has full rank, then (X'X) has a unique inverse and the unique least squares estimate of β is

β̂ = (X'X)⁻¹X'y.

Since ∂²S²/∂β∂β' = 2X'X is positive definite, S² is minimized at β̂. This β̂ is the best linear unbiased estimator of β, in the sense of having minimum variance in the class of linear and unbiased estimators. If the rank of X is not full, then a generalized inverse is used for finding the inverse of (X'X).
If L'β is a linear parametric function, where L = (ℓ1, ℓ2, ..., ℓp)' is a non-null vector of known constants, then the least squares estimate of L'β is L'β̂.
A question arises: what are the conditions under which a linear parametric function admits a unique least squares estimate in the general (not necessarily full rank) case?
Estimable functions
A linear function λ'β of the parameters, with λ known, is said to be an estimable parametric function (or estimable) if there exists a linear function L'Y of Y such that

E(L'Y) = λ'β for all β ∈ R^p.
Theorem 1
A linear parametric function L'β admits a unique least squares estimate if and only if L'β is estimable.
Theorem 2 (Gauss-Markoff theorem)
If the linear parametric function L'β is estimable, then the linear estimator L'β̂, where β̂ is a solution of X'Xβ̂ = X'Y, is the best linear unbiased estimator of L'β in the sense of having minimum variance in the class of all linear and unbiased estimators of L'β.
Theorem 3
If the linear parametric functions L1'β, L2'β, ..., Lk'β are estimable, then any linear combination of L1'β, L2'β, ..., Lk'β is also estimable.
Theorem 4
All linear parametric functions in β are estimable if and only if X has full rank.
If X is not of full rank, then some linear parametric functions do not admit unbiased linear estimators and nothing can be inferred about them. The linear parametric functions which are not estimable are said to be confounded. A possible solution to this problem is to add linear restrictions on β so as to reduce the linear model to full rank.
Theorem 5
Let L1'β and L2'β be two estimable parametric functions and let L1'β̂ and L2'β̂ be their least squares estimators. Then

Var(L1'β̂) = σ² L1'(X'X)⁻¹L1
Cov(L1'β̂, L2'β̂) = σ² L1'(X'X)⁻¹L2,

assuming that X is a full rank matrix. If not, the generalized inverse of X'X can be used in place of the unique inverse.
Estimator of σ² based on least squares estimation
Consider the estimator

σ̂² = (1/(n − p)) (y − Xβ̂)'(y − Xβ̂)
   = (1/(n − p)) [y − X(X'X)⁻¹X'y]'[y − X(X'X)⁻¹X'y]
   = (1/(n − p)) y'[I − X(X'X)⁻¹X']'[I − X(X'X)⁻¹X']y
   = (1/(n − p)) y'[I − X(X'X)⁻¹X']y,

using the idempotency of I − X(X'X)⁻¹X'. Its expectation is

E(σ̂²) = (σ²/(n − p)) tr[I − X(X'X)⁻¹X'] = (σ²/(n − p))(n − p) = σ²,

and so σ̂² is an unbiased estimator of σ².
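The unbiasedness of σ̂² can be checked by simulation; a sketch (assuming NumPy; the design and parameters are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
n, p, sigma2 = 30, 4, 2.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta = rng.normal(size=p)
H = X @ np.linalg.inv(X.T @ X) @ X.T        # hat matrix X(X'X)^{-1}X'

estimates = []
for _ in range(5000):
    y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)
    resid = y - H @ y                       # (I - H) y
    estimates.append(resid @ resid / (n - p))

print("mean of sigma^2-hat:", np.mean(estimates), " true sigma^2:", sigma2)
```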
The distributional assumption of normality is needed in least squares only while constructing the tests of hypothesis and the confidence intervals. For maximum likelihood estimation, we need the distributional assumption from the beginning.
Suppose y1, y2, ..., yn are independently distributed following a normal distribution with mean E(yi) = Σ_{j=1}^{p} βj xij and variance Var(yi) = σ² (i = 1, 2, ..., n). Then the likelihood function of y1, y2, ..., yn is

L(y | β, σ²) = (2πσ²)^(−n/2) exp[ −(1/(2σ²))(y − Xβ)'(y − Xβ) ].

Maximizing ln L with respect to β and σ² gives

∂ln L/∂β = 0 ⟹ X'Xβ = X'y,
∂ln L/∂σ² = 0 ⟹ σ̃² = (1/n)(y − Xβ̃)'(y − Xβ̃).

Assuming the full rank of X, the normal equations are solved and the maximum likelihood estimators are obtained as

β̃ = (X'X)⁻¹X'y
σ̃² = (1/n)(y − Xβ̃)'(y − Xβ̃) = (1/n) y'[I − X(X'X)⁻¹X']y.

The second order differentiation conditions can be checked, and they are satisfied for β̃ and σ̃² to be the maximum likelihood estimators.
Note that the maximum likelihood estimator β̃ is the same as the least squares estimator β̂, whereas σ̃² is a biased estimator of σ², unlike the least squares based estimator σ̂², which is unbiased.
Theorem 6
Let Y = (Y1, Y2, ..., Yn)' follow a multivariate normal distribution N(μ, Σ) with mean vector μ and positive definite covariance matrix Σ. Then Y'AY follows a noncentral Chi-square distribution with p degrees of freedom and noncentrality parameter μ'Aμ, i.e., χ²(p, μ'Aμ), if and only if AΣ is an idempotent matrix of rank p.
Theorem 7
Let Y = (Y1, Y2, ..., Yn)' follow a multivariate normal distribution N(μ, Σ) with mean vector μ and positive definite covariance matrix Σ. Let Y'A1Y follow χ²(p1, μ'A1μ) and Y'A2Y follow χ²(p2, μ'A2μ). Then Y'A1Y and Y'A2Y are independently distributed if A1ΣA2 = 0.
Theorem 8
Let Y = (Y1, Y2, ..., Yn)' follow N(Xβ, σ²I). Then nσ̃²/σ² follows χ²(n − p), where rank(X) = p.
The distribution of L'β̂ follows from

E(L'β̂) = L'(X'X)⁻¹X'E(Y) = L'(X'X)⁻¹X'Xβ = L'β
Var(L'β̂) = L'Var(β̂)L = L'E[(β̂ − β)(β̂ − β)']L = σ² L'(X'X)⁻¹L.

Since β̂ is a linear function of y and L'β̂ is a linear function of β̂, L'β̂ follows the normal distribution N(L'β, σ² L'(X'X)⁻¹L).
Let A = I − X(X'X)⁻¹X' and B = L'(X'X)⁻¹X'; then L'β̂ = L'(X'X)⁻¹X'Y = BY and

nσ̃² = (Y − Xβ̂)'[I − X(X'X)⁻¹X'](Y − Xβ̂) = Y'AY.

So, using Theorem 6 with rank(A) = n − p, nσ̃²/σ² follows χ²(n − p). Also

BA = L'(X'X)⁻¹X' − L'(X'X)⁻¹X'X(X'X)⁻¹X' = 0.

So, using Theorem 7, Y'AY and BY, i.e., nσ̃² and L'β̂, are independently distributed.
Analysis of variance: The technique in the analysis of variance involves breaking down the total variation into orthogonal components. Each orthogonal component represents the variation due to a particular factor contributing to the total variation.
Model
Let Y1, Y2, ..., Yn be independently distributed following a normal distribution with mean E(Yi) = Σ_{j=1}^{p} βj xij and variance σ². Denoting Y = (Y1, Y2, ..., Yn)', an n×1 column vector, this assumption can be expressed in the form of the linear regression model

Y = Xβ + ε

where X is an n×p matrix, β is a p×1 vector and ε is an n×1 vector of disturbances with

E(ε) = 0
Cov(ε) = σ²I

and ε follows a normal distribution. This implies that

E(Y) = Xβ
E[(Y − Xβ)(Y − Xβ)'] = σ²I.
Now we consider four different types of tests of hypothesis. In the first two cases, we develop the likelihood ratio test for the null hypothesis related to the analysis of variance. Note that later we will derive the same test on the basis of the least squares principle as well. An important idea behind this development is to demonstrate that the test used in the analysis of variance can be derived using the least squares principle as well as the likelihood ratio test.
Case 1: Test of H0: β = β0
Consider the null hypothesis H0: β = β0, where β = (β1, β2, ..., βp)', β0 = (β10, β20, ..., βp0)' is specified and σ² is unknown.
Assume that all βi's are estimable, i.e., rank(X) = p (full column rank). We now develop the likelihood ratio test.
The (p + 1)-dimensional parametric space Ω is the collection of points (β, σ²) with −∞ < βi < ∞ and σ² > 0; under H0 the restricted space is ω = {(β0, σ²); σ² > 0}.
The likelihood function of y1, y2, ..., yn is

L(y | β, σ²) = (1/(2πσ²))^(n/2) exp[ −(1/(2σ²))(y − Xβ)'(y − Xβ) ].

The likelihood function is maximized over Ω when β and σ² are substituted by their maximum likelihood estimators, i.e.,

β̃ = (X'X)⁻¹X'y
σ̃² = (1/n)(y − Xβ̃)'(y − Xβ̃).

Substituting β̃ and σ̃² in L(y | β, σ²) gives

Max_Ω L(y | β, σ²) = (1/(2πσ̃²))^(n/2) exp(−n/2) = [ n / (2π(y − Xβ̃)'(y − Xβ̃)) ]^(n/2) exp(−n/2).
Under H0, the maximum likelihood estimator of σ² is

σ̃ω² = (1/n)(y − Xβ0)'(y − Xβ0).

The maximum value of the likelihood function under H0 is

Max_ω L(y | β, σ²) = (1/(2πσ̃ω²))^(n/2) exp(−n/2) = [ n / (2π(y − Xβ0)'(y − Xβ0)) ]^(n/2) exp(−n/2).
The likelihood ratio test statistic is

λ = Max_ω L(y | β, σ²) / Max_Ω L(y | β, σ²)
  = [ (y − Xβ̃)'(y − Xβ̃) / (y − Xβ0)'(y − Xβ0) ]^(n/2).

Writing y − Xβ0 = (y − Xβ̃) + X(β̃ − β0) and noting that the cross-product term vanishes,

λ^(−2/n) = [ (y − Xβ̃)'(y − Xβ̃) + (β̃ − β0)'X'X(β̃ − β0) ] / (y − Xβ̃)'(y − Xβ̃)
         = 1 + q1/q2,

i.e., λ = (1 + q1/q2)^(−n/2), where

q2 = (y − Xβ̃)'(y − Xβ̃) and q1 = (β̃ − β0)'X'X(β̃ − β0).
The expressions for q1 and q2 can be further simplified as follows. Consider

q1 = (β̃ − β0)'X'X(β̃ − β0)
   = [(X'X)⁻¹X'y − β0]'X'X[(X'X)⁻¹X'y − β0]
   = [(X'X)⁻¹X'(y − Xβ0)]'X'X[(X'X)⁻¹X'(y − Xβ0)]
   = (y − Xβ0)'X(X'X)⁻¹X'X(X'X)⁻¹X'(y − Xβ0)
   = (y − Xβ0)'X(X'X)⁻¹X'(y − Xβ0)

and

q2 = (y − Xβ̃)'(y − Xβ̃)
   = [y − X(X'X)⁻¹X'y]'[y − X(X'X)⁻¹X'y]
   = y'[I − X(X'X)⁻¹X']y
   = [(y − Xβ0) + Xβ0]'[I − X(X'X)⁻¹X'][(y − Xβ0) + Xβ0]
   = (y − Xβ0)'[I − X(X'X)⁻¹X'](y − Xβ0),

since [I − X(X'X)⁻¹X']X = 0.
In order to find the decision rule for H0 based on λ, first we check whether λ is a monotonic increasing or decreasing function of q1/q2. Let g = q1/q2, so that λ = (1 + g)^(−n/2). Then

dλ/dg = −(n/2)(1 + g)^(−(n/2) − 1) < 0,

so λ decreases as g increases; i.e., λ is a monotonic decreasing function of q1/q2.
The decision rule is to reject H0 if λ ≤ λ0, where λ0 is a constant determined on the basis of the size of the test. Let us simplify this in our context:

λ ≤ λ0
or (1 + g)^(−n/2) ≤ λ0
or (1 + g) ≥ λ0^(−2/n)
or g ≥ λ0^(−2/n) − 1
or g ≥ C.

So we reject H0 whenever

q1/q2 ≥ C.

Note that the statistic q1/q2 can also be obtained by the least squares method, as follows (the least squares methodology will also be discussed in later lectures):
q1 = (β̃ − β0)'X'X(β̃ − β0) = Min_{H0} (y − Xβ)'(y − Xβ) − Min (y − Xβ)'(y − Xβ),

i.e., q1 is the sum of squares due to the deviation from H0 (the difference between the sum of squares due to H0 and the sum of squares due to error), q2 is the sum of squares due to error, and q1 + q2 = (y − Xβ0)'(y − Xβ0) is the total sum of squares (the sum of squares due to H0).
Theorem 9
Let

Z = Y − Xβ0
Q1 = Z'X(X'X)⁻¹X'Z
Q2 = Z'[I − X(X'X)⁻¹X']Z.

Then Q1 and Q2 are independently distributed. Further, when H0 is true, Q1/σ² ~ χ²(p) and Q2/σ² ~ χ²(n − p).
To see this, note that under H0

E(Z) = Xβ0 − Xβ0 = 0
Var(Z) = Var(Y) = σ²I.

Further, Z is a linear function of Y and Y follows a normal distribution, so Z ~ N(0, σ²I). The matrices X(X'X)⁻¹X' and [I − X(X'X)⁻¹X'] are idempotent matrices, with

tr[X(X'X)⁻¹X'] = tr[(X'X)⁻¹X'X] = tr(Ip) = p
tr[I − X(X'X)⁻¹X'] = tr(In) − tr[X(X'X)⁻¹X'] = n − p.

Hence

Q1/σ² ~ χ²(p) and Q2/σ² ~ χ²(n − p),

where the degrees of freedom p and (n − p) are obtained from the traces of X(X'X)⁻¹X' and I − X(X'X)⁻¹X', respectively.
Since Q1 and Q2 are independently distributed, under H0

(Q1/p) / (Q2/(n − p)) = ((n − p)/p)(Q1/Q2)

follows a central F-distribution F(p, n − p). Hence the constant C in the likelihood ratio test is chosen so that H0 is rejected when ((n − p)/p)(q1/q2) ≥ F1−α(p, n − p), where F1−α(n1, n2) denotes the upper 100α% point of the F-distribution with n1 and n2 degrees of freedom.
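A sketch of this F-test for H0: β = β0 in code (assuming NumPy and SciPy; the data are simulated under H0, so the test should reject only about 100α% of the time):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n, p, sigma = 40, 3, 1.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta0 = np.array([0.5, 1.0, -1.0])
y = X @ beta0 + rng.normal(scale=sigma, size=n)     # data generated under H0

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
d = beta_hat - beta0
q1 = d @ (X.T @ X) @ d                               # SS due to deviation from H0
q2 = (y - X @ beta_hat) @ (y - X @ beta_hat)         # SS due to error

F = ((n - p) / p) * (q1 / q2)
print("F =", F, " critical value:", stats.f.ppf(0.95, p, n - p))
```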
The computations of this test of hypothesis can be represented in the form of an analysis of variance table:

Source of variation | Degrees of freedom | Sum of squares | Mean squares | F-value
Due to H0: β = β0   | p                  | q1             | q1/p         | ((n − p)/p)(q1/q2)
Error               | n − p              | q2             | q2/(n − p)   |
Total               | n                  | (y − Xβ0)'(y − Xβ0) |         |

H0 is rejected when the F-value exceeds C = F1−α(p, n − p).
Case 2: Test of a subset of parameters, H0: βk = βk0, k = 1, 2, ..., r < p, when βr+1, ..., βp and σ² are unknown
In Case 1 we tested βi = βi0 for all i = 1, 2, ..., p. Now consider another situation, in which the interest is to test only a subset of β1, β2, ..., βp, i.e., not all but only a few parameters. This type of test of hypothesis can be used, e.g., in the following situation. Suppose five levels of voltage are applied to check the rotations per minute (rpm) of a fan at 160 volts, 180 volts, 200 volts, 220 volts and 240 volts. It can be realized in practice that when the voltage is low, the differences in rpm at 160, 180 and 200 volts can be observed easily. At 220 and 240 volts, the fan rotates at full speed and there is not much difference in the rotations per minute at these voltages. So the interest of the experimenter lies in testing the hypothesis related to only the first three effects, viz., β1 for 160 volts, β2 for 180 volts and β3 for 200 volts.
The null hypothesis in this case can be written as

H0: β1 = β10, β2 = β20, β3 = β30.
Let β1, β2, ..., βp be the p parameters. We can divide them into two parts, β1, β2, ..., βr and βr+1, ..., βp, and we are interested in testing a hypothesis about the first subset.
Suppose we want to test the null hypothesis H0: βk = βk0, k = 1, 2, ..., r < p, when βr+1, βr+2, ..., βp and σ² are unknown. The alternative hypothesis under consideration is H1: βk ≠ βk0 for at least one k = 1, 2, ..., r.
In order to develop a test for such a hypothesis, the linear model

Y = Xβ + ε

under the usual assumptions can be rewritten as follows. Partition X = (X1, X2) and β = (β(1)', β(2)')', so that

Y = Xβ + ε = (X1, X2)(β(1)', β(2)')' + ε = X1β(1) + X2β(2) + ε.
The null hypothesis of interest is now

H0: β(1) = β(1)0 = (β10, β20, ..., βr0)',

where β(2) and σ² are unknown.
The likelihood function is

L(y | β, σ²) = (1/(2πσ²))^(n/2) exp[ −(1/(2σ²))(y − Xβ)'(y − Xβ) ].

The maximum value of the likelihood function over the whole parametric space is obtained by substituting the maximum likelihood estimates of β and σ², i.e.,

β̃ = (X'X)⁻¹X'y
σ̃² = (1/n)(y − Xβ̃)'(y − Xβ̃),

as

Max_Ω L(y | β, σ²) = (1/(2πσ̃²))^(n/2) exp(−n/2) = [ n / (2π(y − Xβ̃)'(y − Xβ̃)) ]^(n/2) exp(−n/2).
Now we find the maximum value of the likelihood function under H0. The model under H0 becomes

Y = X1β(1)0 + X2β(2) + ε.

The likelihood function under H0 is

L(y | β(2), σ²) = (1/(2πσ²))^(n/2) exp[ −(1/(2σ²))(y − X1β(1)0 − X2β(2))'(y − X1β(1)0 − X2β(2)) ]
               = (1/(2πσ²))^(n/2) exp[ −(1/(2σ²))(y* − X2β(2))'(y* − X2β(2)) ]

where y* = y − X1β(1)0. Note that β(2) and σ² are the unknown parameters. This likelihood function looks as if it were written for the model y* = X2β(2) + ε, so the maximum likelihood estimates under H0 are

β̂(2) = (X2'X2)⁻¹X2'y*
σ̂² = (1/n)(y* − X2β̂(2))'(y* − X2β̂(2)).

Note that X2'X2 is a principal minor of X'X. Since X'X is a positive definite matrix, X2'X2 is also positive definite. Thus (X2'X2)⁻¹ exists and is unique.
Thus the maximum value of the likelihood function under H0 is obtained as

Max_ω L(y | β(2), σ²) = (1/(2πσ̂²))^(n/2) exp(−n/2) = [ n / (2π(y* − X2β̂(2))'(y* − X2β̂(2))) ]^(n/2) exp(−n/2).
The likelihood ratio test statistic for H0: β(1) = β(1)0 is

λ = Max_ω L(y | β(2), σ²) / Max_Ω L(y | β, σ²)
  = [ (y − Xβ̃)'(y − Xβ̃) / (y* − X2β̂(2))'(y* − X2β̂(2)) ]^(n/2)
  = [ 1 + ((y* − X2β̂(2))'(y* − X2β̂(2)) − (y − Xβ̃)'(y − Xβ̃)) / ((y − Xβ̃)'(y − Xβ̃)) ]^(−n/2)
  = (1 + q1/q2)^(−n/2)

where

q1 = (y* − X2β̂(2))'(y* − X2β̂(2)) − (y − Xβ̃)'(y − Xβ̃)
q2 = (y − Xβ̃)'(y − Xβ̃).
Consider

(y* − X2β̂(2))'(y* − X2β̂(2)) = [y* − X2(X2'X2)⁻¹X2'y*]'[y* − X2(X2'X2)⁻¹X2'y*]
  = y*'[I − X2(X2'X2)⁻¹X2']y*
  = (y − X1β(1)0 − X2β(2))'[I − X2(X2'X2)⁻¹X2'](y − X1β(1)0 − X2β(2)),

where the other terms become zero using the result X2'[I − X2(X2'X2)⁻¹X2'] = 0. Similarly,

(y − Xβ̃)'(y − Xβ̃) = y'[I − X(X'X)⁻¹X']y
  = (y − X1β(1)0 − X2β(2))'[I − X(X'X)⁻¹X'](y − X1β(1)0 − X2β(2)),

the other terms becoming zero using the result X'[I − X(X'X)⁻¹X'] = 0; note that under H0 the term X1β(1)0 + X2β(2) can be expressed as (X1, X2)(β(1)0', β(2)')'. Thus

q1 = (y* − X2β̂(2))'(y* − X2β̂(2)) − (y − Xβ̃)'(y − Xβ̃)
   = (y − X1β(1)0 − X2β(2))'[X(X'X)⁻¹X' − X2(X2'X2)⁻¹X2'](y − X1β(1)0 − X2β(2))

and

q2 = (y − Xβ̃)'(y − Xβ̃)
   = (y − X1β(1)0 − X2β(2))'[I − X(X'X)⁻¹X'](y − X1β(1)0 − X2β(2)).

Note that in simplifying the terms q1 and q2 we have written both as quadratic forms in the same variable (y − X1β(1)0 − X2β(2)).
Using the same argument as in Case 1, since λ is a monotonic decreasing function of q1/q2, the likelihood ratio test rejects H0 whenever

q1/q2 > C

where C is a constant determined by the size of the test.
The likelihood ratio test statistic can also be obtained through the least squares method as follows:

(q1 + q2): the minimum value of (y − Xβ)'(y − Xβ) when H0: β(1) = β(1)0 holds true, i.e., the sum of squares due to H0;
q2: the sum of squares due to error;
q1: the sum of squares due to the deviation from H0, or the sum of squares due to β(1) adjusted for β(2).

If β(1)0 = 0, then the model under H0 reduces to y = X2β(2) + ε with β̂(2) = (X2'X2)⁻¹X2'y, and (q1 + q2) is then called the reduction sum of squares, the sum of squares due to β(2), or the sum of squares due to β(2) ignoring β(1).
Now let

Z = Y − X1β(1)0 − X2β(2)
Q1 = Z'AZ
Q2 = Z'BZ

where

A = X(X'X)⁻¹X' − X2(X2'X2)⁻¹X2'
B = I − X(X'X)⁻¹X'.

Then Q1 and Q2 are independently distributed and, under H0,

Q1/σ² ~ χ²(r) and Q2/σ² ~ χ²(n − p).

Thus under H0,

(Q1/r) / (Q2/(n − p)) = ((n − p)/r)(Q1/Q2)

follows an F-distribution F(r, n − p). The constant C is therefore determined by

C = F1−α(r, n − p),

where F1−α(r, n − p) denotes the upper 100α% point of the F-distribution with r and (n − p) degrees of freedom.
The analysis of variance table for this test is:

Source of variation       | Degrees of freedom | Sum of squares | Mean squares | F-value
Due to H0: β(1) = β(1)0   | r                  | q1             | q1/r         | ((n − p)/r)(q1/q2)
Error                     | n − p              | q2             | q2/(n − p)   |
Total                     | n − (p − r)        | q1 + q2        |              |

H0 is rejected when the F-value exceeds C = F1−α(r, n − p).
Case 3: Test of H0: L'β = δ
Let us consider the test of hypothesis related to a linear parametric function. Assume that the linear parametric function L'β is estimable, where L = (ℓ1, ℓ2, ..., ℓp)' is a p×1 vector of known constants and β = (β1, β2, ..., βp)'. The null hypothesis of interest is

H0: L'β = δ

where δ is some specified constant.
Consider the setup of the linear model Y = Xβ + ε where Y = (Y1, Y2, ..., Yn)' follows N(Xβ, σ²I). The maximum likelihood estimators of β and σ² are

β̃ = (X'X)⁻¹X'y and σ̃² = (1/n)(y − Xβ̃)'(y − Xβ̃),

respectively. We have

E(L'β̃) = L'β
Cov(L'β̃) = σ² L'(X'X)⁻¹L,

so that

L'β̃ ~ N(L'β, σ² L'(X'X)⁻¹L) and nσ̃²/σ² ~ χ²(n − p),

independently. Hence under H0,

t = √(n − p) (L'β̃ − δ) / √( nσ̃² L'(X'X)⁻¹L )

follows a t-distribution with (n − p) degrees of freedom. So the test for H0: L'β = δ against H1: L'β ≠ δ rejects H0 whenever

|t| ≥ t1−α/2(n − p)

where t1−α/2(n1) denotes the upper 100(α/2)% point of the t-distribution with n1 degrees of freedom.
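A sketch of this t-test for a single estimable function L'β in code (assuming NumPy and SciPy; the unbiased-looking pieces are written via nσ̃² to match the statistic above, and the data are simulated so that H0 is true):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n, p = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta = np.array([0.5, 1.0, -1.0])
y = X @ beta + rng.normal(size=n)

L = np.array([0.0, 1.0, -1.0])        # test H0: beta_2 - beta_3 = delta
delta = 2.0                           # true value of L'beta here, so H0 holds
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
n_sig2 = resid @ resid                # n * sigma-tilde^2

t = np.sqrt(n - p) * (L @ beta_hat - delta) / np.sqrt(
    n_sig2 * (L @ np.linalg.inv(X.T @ X) @ L))
print("t =", t, " critical value:", stats.t.ppf(0.975, n - p))
```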
Case 4: Test of H0: φ1 = δ1, φ2 = δ2, ..., φk = δk
Now consider testing k linear parametric functions φi = Li'β simultaneously, i.e.,

H0: φ1 = δ1, φ2 = δ2, ..., φk = δk

where δ1, δ2, ..., δk are known constants. Let φ = (φ1, φ2, ..., φk)' and δ = (δ1, δ2, ..., δk)'. Then H0 is expressible as

H0: φ = Lβ = δ

where L is a k×p matrix of the constants associated with L1, L2, ..., Lk.
The maximum likelihood estimator of φi is φ̂i = Li'β̃, so that φ̂ = (φ̂1, φ̂2, ..., φ̂k)' = Lβ̃. Also

E(φ̂) = φ
Cov(φ̂) = σ²V, where V = ((Li'(X'X)⁻¹Lj)).

Then

(φ̂ − φ)'V⁻¹(φ̂ − φ)/σ²

follows a χ² distribution with k degrees of freedom, and

nσ̃²/σ²

follows a χ² distribution with (n − p) degrees of freedom, where σ̃² = (1/n)(y − Xβ̃)'(y − Xβ̃) is the maximum likelihood estimator of σ².
Further, (φ̂ − φ)'V⁻¹(φ̂ − φ) and nσ̃² are independently distributed. Thus under H0: φ = δ,

F = [ (φ̂ − δ)'V⁻¹(φ̂ − δ)/k ] / [ nσ̃²/(n − p) ] = ((n − p)/k) (φ̂ − δ)'V⁻¹(φ̂ − δ) / (nσ̃²)

follows F(k, n − p). H0 is rejected against H1: at least one φi ≠ δi, i = 1, 2, ..., k, whenever F ≥ F1−α(k, n − p), where F1−α(k, n − p) denotes the upper 100α% point of the F-distribution with k and (n − p) degrees of freedom.
The random samples from the different populations are assumed to be independent of each other. These observations follow the setup of the linear model

Y = Xβ + ε

where

β = (β1, β2, ..., βp)',
ε = (ε11, ε12, ..., ε1n1, ε21, ..., ε2n2, ..., εp1, εp2, ..., εpnp)',

and X is the n×p indicator matrix whose ith column contains ones in the rows of the ni observations from the ith population and zeros elsewhere, i.e., xij = 1 if effect i is present in observation j and xij = 0 if effect i is absent, with

n = Σ_{i=1}^{p} ni.

Obviously, rank(X) = p, E(Y) = Xβ and Cov(Y) = σ²I. This completes the representation of a fixed effect linear model of full rank.
The null hypothesis of interest is

H0: β1 = β2 = ... = βp = β (say)

and

H1: at least one βi ≠ βj (i ≠ j),

where β and σ² are unknown.
We will develop here the likelihood ratio test. It may be noted that the same test can also be derived through the least squares method; this will be demonstrated in the next module, so the reader will understand both methods.
We have already developed the likelihood ratio test of a hypothesis of this form in Case 1. The whole parametric space Ω is a (p + 1)-dimensional space. Here

L(y | β, σ²) = (1/(2πσ²))^(n/2) exp[ −(1/(2σ²)) Σ_{i=1}^{p} Σ_{j=1}^{ni} (yij − βi)² ]

so that

ln L(y | β, σ²) = −(n/2) ln(2πσ²) − (1/(2σ²)) Σ_{i=1}^{p} Σ_{j=1}^{ni} (yij − βi)².

The normal equations give

∂ln L/∂βi = 0 ⟹ β̂i = (1/ni) Σ_{j=1}^{ni} yij = ȳio
∂ln L/∂σ² = 0 ⟹ σ̂² = (1/n) Σ_{i=1}^{p} Σ_{j=1}^{ni} (yij − ȳio)².
The dot sign (o) in ȳio indicates that the average has been taken over the second subscript j. The Hessian matrix of second order partial derivatives of ln L with respect to βi and σ² is negative definite at βi = ȳio and σ² = σ̂², which ensures that the likelihood function is maximized at these values.
Thus the maximum value of L(y | β, σ²) over Ω is

Max_Ω L(y | β, σ²) = (1/(2πσ̂²))^(n/2) exp(−n/2) = [ n / (2π Σ_{i=1}^{p} Σ_{j=1}^{ni} (yij − ȳio)²) ]^(n/2) exp(−n/2).
Under H0: β1 = β2 = ... = βp = β, the likelihood function is

L(y | β, σ²) = (1/(2πσ²))^(n/2) exp[ −(1/(2σ²)) Σ_{i=1}^{p} Σ_{j=1}^{ni} (yij − β)² ]

and

ln L(y | β, σ²) = −(n/2) ln(2πσ²) − (1/(2σ²)) Σ_{i=1}^{p} Σ_{j=1}^{ni} (yij − β)².

The normal equations and the estimates under H0 are obtained as follows:

∂ln L/∂β = 0 ⟹ β̂ = (1/n) Σ_{i=1}^{p} Σ_{j=1}^{ni} yij = ȳoo
∂ln L/∂σ² = 0 ⟹ σ̂ω² = (1/n) Σ_{i=1}^{p} Σ_{j=1}^{ni} (yij − ȳoo)².
The maximum value of the likelihood function under H0 is

Max_ω L(y | β, σ²) = (1/(2πσ̂ω²))^(n/2) exp(−n/2) = [ n / (2π Σ_{i=1}^{p} Σ_{j=1}^{ni} (yij − ȳoo)²) ]^(n/2) exp(−n/2).

The likelihood ratio test statistic is

λ = Max_ω L(y | β, σ²) / Max_Ω L(y | β, σ²) = [ Σ_{i=1}^{p} Σ_{j=1}^{ni} (yij − ȳio)² / Σ_{i=1}^{p} Σ_{j=1}^{ni} (yij − ȳoo)² ]^(n/2).
We have the decomposition

Σ_{i=1}^{p} Σ_{j=1}^{ni} (yij − ȳoo)² = Σ_{i=1}^{p} Σ_{j=1}^{ni} (yij − ȳio)² + Σ_{i=1}^{p} ni(ȳio − ȳoo)².

Thus

λ^(−2/n) = [ Σ_{i=1}^{p} Σ_{j=1}^{ni} (yij − ȳio)² + Σ_{i=1}^{p} ni(ȳio − ȳoo)² ] / Σ_{i=1}^{p} Σ_{j=1}^{ni} (yij − ȳio)² = 1 + q1/q2

where

q1 = Σ_{i=1}^{p} ni(ȳio − ȳoo)²
q2 = Σ_{i=1}^{p} Σ_{j=1}^{ni} (yij − ȳio)².
Here q1 is the sum of squares due to deviations from H0, or the between-population sum of squares, and q2 is the sum of squares due to error, or the within-population sum of squares. Define the corresponding random variables

Q1 = Σ_{i=1}^{p} ni(Ȳio − Ȳoo)²
Q2 = Σ_{i=1}^{p} Si², where Si² = Σ_{j=1}^{ni} (Yij − Ȳio)²,

with

Ȳoo = (1/n) Σ_{i=1}^{p} Σ_{j=1}^{ni} Yij and Ȳio = (1/ni) Σ_{j=1}^{ni} Yij.
Then under H0,

Q1/σ² ~ χ²(p − 1) and Q2/σ² ~ χ²(n − p),

and Q1 and Q2 are independently distributed. Thus under H0,

F = [Q1/(σ²(p − 1))] / [Q2/(σ²(n − p))] = ((n − p)/(p − 1))(Q1/Q2) ~ F(p − 1, n − p).

The likelihood ratio test therefore rejects H0 whenever

q1/q2 > C

where the constant C is determined from F1−α(p − 1, n − p), i.e., H0 is rejected when ((n − p)/(p − 1))(q1/q2) > F1−α(p − 1, n − p).
The analysis of variance table for the one-way classification in the fixed effects model is:

Source of variation | Degrees of freedom | Sum of squares | Mean squares | F-value
Between populations | p − 1              | q1             | q1/(p − 1)   | ((n − p)/(p − 1))(q1/q2)
Within populations  | n − p              | q2             | q2/(n − p)   |
Total               | n − 1              | q1 + q2        |              |

H0 is rejected when the F-value exceeds C = F1−α(p − 1, n − p). Note that

E[Q2/(n − p)] = σ²
E[Q1/(p − 1)] = σ² + Σ_{i=1}^{p} ni(βi − β̄)²/(p − 1), where β̄ = (1/p) Σ_{i=1}^{p} βi.
Case of rejection of H0
If F > F1−α(p − 1, n − p), then H0: β1 = β2 = ... = βp is rejected. This means that at least one βi is different from the others and is responsible for the rejection. The objective then is to investigate and find such βi's and divide the populations into groups such that the means of populations within a group are the same. This can be done by pairwise testing of the β's.
Test H0: βi = βk (i ≠ k) against H1: βi ≠ βk using

t = (ȳio − ȳko) / √( s²(1/ni + 1/nk) ), where s² = q2/(n − p),

and reject H0 if |t| > t1−α/2, n−p. The quantity

CD = t1−α/2, n−p √( s²(1/ni + 1/nk) )

is called the critical difference; when ni = nk = n0 it becomes CD = t1−α/2, n−p √(2s²/n0).
If |ȳio − ȳko| > CD, then the corresponding effects/means ȳio and ȳko are considered to come from populations with different means.
Note that such pairwise tests are related: for example, the acceptance of H01: β1 = β2 and H02: β2 = β3 implies β1 = β3, i.e., H03. The probability that H03 is accepted is higher than the probability that H01 and H02 are both accepted, so we conclude, in general, that the acceptance of H01 and H02 implies the acceptance of H03.
Multiple comparison tests
One interest in the analysis of variance is to decide whether population means are equal or not. If the hypothesis of equal means is rejected, one would like to divide the populations into subgroups such that all populations with the same means come into the same subgroup. This can be achieved by multiple comparison tests.
A multiple comparison test procedure conducts the test of hypothesis for all pairs of effects and compares them at a significance level α, i.e., it works on a per-comparison basis.
There are various multiple comparison tests available. We will discuss some of them in the context of the one-way classification; in two-way or higher classifications, they can be used on similar lines.
Student-Newman-Keuls test
The testing procedure involves the comparison of the range of a set of ranked means with

Wp = qα,p,γ √(s²/n)

where qα,p,γ is the upper 100α% point of the Studentized range based on p means and γ = n − p degrees of freedom; tables for qα,p,γ are available. The steps are:
i. Arrange the sample means in increasing order as ȳ1* ≤ ȳ2* ≤ ... ≤ ȳp* and compare the overall range with Wp; if it is smaller, stop and conclude that all means are equal. Otherwise divide the ranked means into two subgroups containing (ȳp*, ȳp−1*, ..., ȳ2*) and (ȳp−1*, ȳp−2*, ..., ȳ1*).
ii. Compute the ranges R1 = ȳp* − ȳ2* and R2 = ȳp−1* − ȳ1*. Then compare the ranges R1 and R2 with Wp−1.
If either range R1 or R2 is smaller than Wp−1, then the means (or βi's) in that group are equal.
If R1 and/or R2 are greater than Wp−1, then the (p − 1) means in the group concerned are divided into two groups of (p − 2) means each, and the ranges of these groups are compared with Wp−2.
Continue with this procedure until a group of i means is found whose range does not exceed Wi.
By this method, the difference between any two means under test is significant when the range of the observed means of each and every subgroup containing the two means under test is significant according to the Studentized critical range. The procedure is summarized in the flow chart below.
[Flow chart: Arrange the ȳio's in increasing order ȳ1* ≤ ȳ2* ≤ ... ≤ ȳp*. Compute R = ȳp* − ȳ1* and compare with Wp = qα,p,γ √(s²/n). If R < Wp, stop and conclude β1 = β2 = ... = βp. If R > Wp, continue: compute R1 = ȳp* − ȳ2* and R2 = ȳp−1* − ȳ1* and compare both with Wp−1. Four possibilities arise: (a) R1 < Wp−1 and R2 < Wp−1: conclude β2 = β3 = ... = βp and β1 = β2 = ... = βp−1, hence β1 = β2 = ... = βp. (b) R1 < Wp−1, R2 > Wp−1: conclude β2 = β3 = ... = βp as one subgroup, with βi ≠ βj for some i ≠ j = 1, 2, ..., p − 1, which singles out β1. (c) R1 > Wp−1, R2 < Wp−1: conclude β1 = β2 = ... = βp−1 as one subgroup, with βi ≠ βj for some i ≠ j = 2, 3, ..., p, which singles out βp. (d) R1 > Wp−1 and R2 > Wp−1: compute R3 = ȳp* − ȳ3*, R4 = ȳp−1* − ȳ2* and R5 = ȳp−2* − ȳ1* and compare with Wp−2, and so on.]
Duncan's multiple comparison test
In Duncan's test the critical value is

Dp = q*αp, p, γ √(s²/n)

where αp = 1 − (1 − α)^(p−1) and q*αp, p, γ denotes the upper 100αp% point of the Studentized range based on Duncan's range. Tables for Duncan's range are available.
Duncan felt that this test is better than the Student-Newman-Keuls test for comparing the differences between any two ranked means. Duncan regarded the Student-Newman-Keuls method as too stringent, in the sense that the true differences between the means will tend to be missed too often. Duncan notes that in testing the equality of a subset of k, (2 ≤ k ≤ p), means through a null hypothesis, we are in fact testing whether (p − 1) orthogonal contrasts between the β's differ from zero or not. If these contrasts were tested in separate independent experiments, each at level α, the probability of incorrectly rejecting the null hypothesis would be 1 − (1 − α)^(p−1). So Duncan proposed to use αp = 1 − (1 − α)^(p−1) in place of α in the Student-Newman-Keuls test.
[Reference: Contributions to Order Statistics, Wiley, 1962, Chapter 9 (Multiple decision and multiple comparisons, H. A. David, pages 147-148).]
When the sample sizes are unequal, the quantity q*αp, p, γ s/√n is replaced by

q*αp, p, γ s √( (1/2)(1/nU + 1/nL) )

where nU and nL are the numbers of observations corresponding to the largest and smallest means in the data. This procedure is only approximate but will tend to be conservative, since means based on a small number of observations will tend to be overrepresented in the extreme groups of means. Alternatively, n may be replaced by the harmonic mean p / Σ_{i=1}^{p} (1/ni) of the sample sizes.
The least significant difference (LSD)
In the test of H0: βi = βk against H1: βi ≠ βk, the statistic

t = (ȳio − ȳko) / √( V̂ar(ȳio − ȳko) )

is used, which follows a t-distribution, say with df degrees of freedom. Thus H0 is rejected whenever

|t| > tdf, 1−α/2

and it is concluded that βi and βk are significantly different. The inequality |t| > tdf, 1−α/2 can equivalently be written as

|ȳio − ȳko| > tdf, 1−α/2 √( V̂ar(ȳio − ȳko) ).

If |ȳio − ȳko| exceeds tdf, 1−α/2 √(V̂ar(ȳio − ȳko)), this indicates that the difference between βi and βk is significantly different from zero. So the quantity tdf, 1−α/2 √(V̂ar(ȳio − ȳko)) is the least difference of ȳio and ȳko for which it will be declared that the difference between βi and βk is significant. Based on this idea, using the pooled variance of the two samples for V̂ar(ȳio − ȳko), we define

LSD = tdf, 1−α/2 √( s²(1/ni + 1/nk) ).

If ni = nk = n0, then LSD = tdf, 1−α/2 √(2s²/n0).
Now all p(p − 1)/2 pairs of ȳio and ȳko (i ≠ k = 1, 2, ..., p) are compared with the LSD. Use of the LSD criterion may not lead to good results if it is used for comparisons suggested by the data (largest/smallest sample mean) or if all pairwise comparisons are done without correction of the test level. If the LSD is used for all pairwise comparisons, these tests are not independent. Such a correction for test levels was incorporated in Duncan's test.
Tukey's honestly significant difference (HSD)
In this procedure, the Studentized range values qα,p,γ are used in place of t-quantiles, and the standard error of the difference of pooled means is used in place of the standard error of the mean in the common critical difference, for testing H0: βi = βk against H1: βi ≠ βk. Tukey's honestly significant difference is computed as

HSD = qα,p,γ √( MSerror / n ),

assuming all samples are of the same size n. All p(p − 1)/2 pairs |ȳio − ȳko| are compared with the HSD; a pair is declared significantly different if it exceeds the HSD.
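A sketch of the HSD computation in code (assuming NumPy and SciPy; scipy.stats.studentized_range is available in recent SciPy versions, and the group data are made up for illustration):

```python
import numpy as np
from scipy import stats

groups = [np.array([23.0, 25.1, 24.3, 26.0]),
          np.array([27.2, 28.1, 26.5, 27.8]),
          np.array([22.0, 21.5, 23.3, 22.8])]   # equal sample sizes n0

p = len(groups)
n0 = len(groups[0])
n = p * n0
ms_error = sum(((g - g.mean()) ** 2).sum() for g in groups) / (n - p)

q_crit = stats.studentized_range.ppf(0.95, p, n - p)   # upper 5% Studentized range
hsd = q_crit * np.sqrt(ms_error / n0)

means = [g.mean() for g in groups]
for i in range(p):
    for k in range(i + 1, p):
        diff = abs(means[i] - means[k])
        print(f"|ybar_{i+1} - ybar_{k+1}| = {diff:.2f}",
              "significant" if diff > hsd else "not significant", f"(HSD = {hsd:.2f})")
```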
We notice that all the multiple comparison test procedures discussed up to now are based on the testing of hypothesis. There is a one-to-one relationship between the testing of hypothesis and confidence interval estimation, so confidence intervals can also be used for such comparisons. Since H0: βi = βk is the same as H0: βi − βk = 0, we first establish this relationship and then describe Tukey's and Scheffé's procedures for multiple comparison tests, which are based on confidence intervals. We need the following concepts.
Contrast
A linear parametric function L = Σ_{i=1}^{p} ℓi βi, where the ℓi's are known constants, is called a contrast if Σ_{i=1}^{p} ℓi = 0.
For example, β1 − β2 = 0, β1 + β2 − 2β3 = 0 and β1 + 2β2 − 3β3 = 0 are contrasts, whereas β1 + β2 = 0, β1 + β2 + β3 + β4 = 0 and β1 − 2β2 − 3β3 = 0 are not contrasts.
Orthogonal contrasts
If L1 = Σ_{i=1}^{p} ℓi βi and L2 = Σ_{i=1}^{p} mi βi are two contrasts, they are said to be orthogonal if

Σ_{i=1}^{p} ℓi mi = 0.

The condition Σ_{i=1}^{p} ℓi mi = 0 ensures that the least squares estimators of L1 and L2 are uncorrelated.
Coming back to the multiple comparison tests: if the null hypothesis of equality of all effects is rejected, it is reasonable to look for the contrasts which are responsible for the rejection. In terms of contrasts, it is desirable to have a procedure
i. that permits the selection of the contrasts after the data is available, and
ii. with which a known overall level of significance is associated.
Such procedures are Tukey's and Scheffé's procedures. Before discussing these procedures, let us consider the following example, which illustrates the relationship between the testing of hypothesis and confidence intervals.
Example
Consider H0: βi = βj (i ≠ j = 1, 2, ..., p), i.e., H0: βi − βj = 0, i.e., H0: (contrast) = 0, i.e., H0: L = 0. The test statistic is

t = [ (β̂i − β̂j) − (βi − βj) ] / √( V̂ar(β̂i − β̂j) ) = (L̂ − L) / √( V̂ar(L̂) )

where β̂ denotes the maximum likelihood (or least squares) estimator of β, and t follows a t-distribution with df degrees of freedom. This statistic, in fact, can be extended to any linear contrast, e.g.,

L = β1 + β2 − β3 − β4, L̂ = β̂1 + β̂2 − β̂3 − β̂4.

The decision rule is: reject H0: L = 0 against H1: L ≠ 0 if

|L̂| > tdf √( V̂ar(L̂) ).

Equivalently, since

P[ −tdf ≤ (L̂ − L)/√(V̂ar(L̂)) ≤ tdf ] = 1 − α,

we have

P[ L̂ − tdf √(V̂ar(L̂)) ≤ L ≤ L̂ + tdf √(V̂ar(L̂)) ] = 1 − α,

so that

[ L̂ − tdf √(V̂ar(L̂)), L̂ + tdf √(V̂ar(L̂)) ]

is a 100(1 − α)% confidence interval for L, and H0: L = 0 is rejected exactly when this interval does not cover zero.
Suppose the confidence intervals obtained for β1 − β2 and β1 − β3 are −3 ≤ β1 − β2 ≤ 2 and 2 ≤ β1 − β3 ≤ 4. Thus we find that the interval for β1 − β2 includes zero, which implies that H0: β1 − β2 = 0 is accepted; thus β1 = β2. On the other hand, the interval for β1 − β3 does not include zero, so H0: β1 − β3 = 0 is not accepted; thus β1 ≠ β3. If the interval for β1 − β3 had been −1 ≤ β1 − β3 ≤ 1, then H0: β1 = β3 would be accepted. If both H0: β1 = β2 and H0: β1 = β3 are accepted, then we can conclude that β1 = β2 = β3.
Tukey's procedure for multiple comparison (T-method)
The T-method uses the distribution of the Studentized range statistic (the S-method, discussed next, utilizes the F-distribution). The T-method can be used to make simultaneous confidence statements about contrasts (βi − βj) among a set of parameters {β1, β2, ..., βp} and an estimate s² of the error variance, if certain restrictions are satisfied. These restrictions have to be viewed according to the given conditions. For example, one of the restrictions is that all β̂i's have equal variances. In the setup of the one-way classification, β̂i has mean Ȳio and variance σ²/ni. This reduces to the simple condition that all ni's are the same, i.e., ni = n for all i, so that all the variances are equal.
Another assumption is that β̂1, β̂2, ..., β̂p are statistically independent and the only contrasts considered are the p(p − 1)/2 differences {βi − βj, i ≠ j = 1, 2, ..., p}.
We make the following assumptions:
i. The β̂i's are statistically independent.
ii. β̂i ~ N(βi, a²σ²), i = 1, 2, ..., p, where a > 0 is a known constant.
iii. s² is an estimate of σ² with γ degrees of freedom (here γ = n − p), independent of the β̂i's, and
iv. γs²/σ² ~ χ²(γ).
The statement of the T-method is as follows.
Under the assumptions (i)-(iv), the probability is (1 − α) that the values of all contrasts L = Σ_{i=1}^{p} Ci βi (Σ_{i=1}^{p} Ci = 0) simultaneously satisfy

L̂ − Ts·(1/2) Σ_{i=1}^{p} |Ci| ≤ L ≤ L̂ + Ts·(1/2) Σ_{i=1}^{p} |Ci|

where L̂ = Σ_{i=1}^{p} Ci β̂i, β̂i is the maximum likelihood (or least squares) estimate of βi, and T = a qα,p,γ, with qα,p,γ being the upper 100α% point of the Studentized range.
For the pairwise differences L = βi − βj we have (1/2) Σ_{i=1}^{p} |Ci| = 1 and the variance is σ², so that a = 1 and the interval simplifies to

(β̂i − β̂j) − Ts ≤ βi − βj ≤ (β̂i − β̂j) + Ts

where T = qα,p,γ. Thus the maximum likelihood (or least squares) estimate L̂ = β̂i − β̂j of L = βi − βj is said to be significantly different from zero according to the T-criterion if the interval (β̂i − β̂j − Ts, β̂i − β̂j + Ts) does not cover βi − βj = 0, i.e., if

|β̂i − β̂j| > Ts,

or, more generally, if

|L̂| > Ts·(1/2) Σ_{i=1}^{p} |Ci|.
The steps involved are:
1. Compute L̂ or β̂i − β̂j.
2. Compute all possible pairwise differences.
3. Compare all the differences with

(s/√n) qα,p,γ (1/2) Σ_{i=1}^{p} |Ci|.

4. If |L̂| or |β̂i − β̂j| exceeds this quantity, the contrast (or the pair of means) is declared significantly different.
When the sample sizes are unequal, the quantity qα,p,γ s/√n is replaced by qα,p,γ s √((1/2)(1/ni + 1/nj)), so the comparison is with

qα,p,γ s √( (1/2)(1/ni + 1/nj) ) (1/2) Σ_{i=1}^{p} |Ci|  or  T √( (1/2)(1/ni + 1/nj) ) (1/2) Σ_{i=1}^{p} |Ci|.
Scheffé's procedure for multiple comparison (S-method)
Under the assumption that Y ~ N(Xβ, σ²I) with rank(X) = p, β = (β1, ..., βp)' and X an n×p matrix, consider a p-dimensional space L of estimable functions generated by a set of p linearly independent estimable functions {ψ1, ψ2, ..., ψp}, such that every ψ in L is of the form ψ = Σ_{i=1}^{p} Ci ψi where C1, C2, ..., Cp are known constants. In other words, L is the set of all linear combinations of ψ1, ψ2, ..., ψp.
For any ψ in L, let ψ̂ be its least squares (or maximum likelihood) estimator, with

Var(ψ̂) = σ² Σ_{i=1}^{p} Ci² = σψ² (say)

estimated by

σ̂ψ² = s² Σ_{i=1}^{p} Ci².

Then the probability is (1 − α) that simultaneously for all ψ in L,

ψ̂ − Sσ̂ψ ≤ ψ ≤ ψ̂ + Sσ̂ψ,

where the constant S = √( p F1−α(p, n − p) ).
The statement of the S-method is as follows: for a given space L of estimable functions and confidence coefficient (1 − α), the least squares (or maximum likelihood) estimate ψ̂ of ψ in L is said to be significantly different from zero if the confidence interval

(ψ̂ − Sσ̂ψ, ψ̂ + Sσ̂ψ)

does not cover ψ = 0, i.e., if |ψ̂| > Sσ̂ψ.
The S-method is less sensitive to violations of the assumptions of normality and homogeneity of variances.
Comparison of Tukey's and Scheffé's methods
1. Tukey's method can be used only with equal sample sizes for all factor levels, but the S-method is applicable whether the sample sizes are equal or not.
2. Although Tukey's method is applicable for any general contrast, the procedure is more powerful when comparing simple pairwise differences, not when making more complex comparisons.
3. If only pairwise comparisons are of interest, and all factor levels have equal sample sizes, Tukey's method gives shorter confidence intervals and thus is more powerful.
4. In the case of comparisons involving general contrasts, Scheffé's method tends to give narrower confidence intervals and provides a more powerful test.
5. Scheffé's method is less sensitive to violations of the assumptions of normal distribution and homogeneity of variances.
EXPERIMENTAL DESIGN MODELS
Dr. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur
We consider the models which are used in designing an experiment. The experimental conditions, the experimental setup and the objective of the study essentially determine what type of design is to be used, and hence which type of design model can be used for the further statistical analysis. These models are based on one-way classification, two-way classification (with or without interactions), etc. We discuss them now under the setup of one-way and two-way classifications, which can be extended further to any order of classification.
It may be noted that it has already been described how to develop the likelihood ratio tests for testing the hypothesis of equality of more than two means from normal distributions; now we concentrate on deriving the same tests through the least squares principle under the setup of the linear regression model. The design matrix is assumed to be not necessarily of full rank and consists of 0's and 1's only.
One-way classification
Let p random samples from p normal populations with the same variance but different means and different sample sizes be drawn independently. Let the observations follow the linear regression model setup, and let Yij denote the jth observation of the dependent variable Y when the effect of the ith level of the factor is present. Then the Yij are independently normally distributed with

E(Yij) = μ + αi, Var(Yij) = σ², i = 1, 2, ..., p; j = 1, 2, ..., ni.
Example
Suppose a medicine for reducing body temperature is given in dosages to groups of patients. If one dosage is given to 2 patients, the responses follow the model

E(Y1j) = μ + α1; j = 1, 2.

Similarly, if dosages of 5 mg. and 10 mg. are given to 4 and 7 patients respectively, then the responses follow the models

E(Y2j) = μ + α2; j = 1, 2, 3, 4
E(Y3j) = μ + α3; j = 1, 2, 3, 4, 5, 6, 7.

Here μ denotes the general mean effect, which may be thought of as follows: the human body has a tendency to fight against the fever, so the time taken by the medicine to bring down the temperature depends on many factors like body weight, height etc. So μ denotes the general effect of all these factors which are present in all the observations.
In the terminology of the linear regression model, μ denotes the intercept term, which is the value of the response variable when all the independent variables are set to zero. In experimental designs, the models with an intercept term are more commonly used, so generally we consider these types of models.
Here εij is the random error component in Yij. It indicates the variations due to uncontrolled causes which can influence the observations. We assume that the εij's are identically and independently distributed as N(0, σ²) with E(εij) = 0, Var(εij) = σ².
Note that the general linear model considered is

E(Y) = Xβ

where all the entries in X are 0's or 1's. This model can also be re-expressed in the form

E(Yij) = μ + αi.

This gives rise to some more issues.
Consider

E(Yij) = μi = μ + (μi − μ) = μ + αi

where αi = μi − μ is the effect of the ith level and

μ = (1/p) Σ_{i=1}^{p} μi,

so that Σ_{i=1}^{p} αi = 0.
Now let us see the changes in the structure of the design matrix and the vector of regression coefficients. The model E(Yij) = μi = μ + αi can now be written as

E(Y) = X*β*, Cov(Y) = σ²I

where β* = (μ, α1, α2, ..., αp)' is a (p + 1)×1 vector and X* = (1n, X) is the n×(p + 1) matrix obtained by adding a first column of ones to the earlier indicator matrix X. Note that

rank(X*) = p,

since the first column of X* equals the sum of the remaining columns.
It is thus apparent that not all linear parametric functions of μ, α1, α2, ..., αp are estimable. The question now arises: what kind of linear parametric functions are estimable?
Consider any linear function

L = Σ_{i=1}^{p} Σ_{j=1}^{ni} aij Yij with Ci = Σ_{j=1}^{ni} aij.

Now

E(L) = Σ_{i=1}^{p} Σ_{j=1}^{ni} aij E(Yij) = Σ_{i=1}^{p} Σ_{j=1}^{ni} aij (μ + αi) = μ Σ_{i=1}^{p} Ci + Σ_{i=1}^{p} Ci αi.

Thus Σ_{i=1}^{p} Ci αi is estimable if and only if Σ_{i=1}^{p} Ci = 0, i.e., if and only if Σ_{i=1}^{p} Ci αi is a contrast. In particular, neither μ nor any individual αi is separately estimable.
This effect and outcome can also be seen from the following explanation based on the estimation of the parameters μ, α1, α2, ..., αp.
Consider the least squares estimators μ̂, α̂1, α̂2, ..., α̂p of μ, α1, α2, ..., αp, respectively. Minimize the sum of squares due to the εij's,

S = Σ_{i=1}^{p} Σ_{j=1}^{ni} εij² = Σ_{i=1}^{p} Σ_{j=1}^{ni} (yij − μ − αi)²,

to obtain μ̂, α̂1, ..., α̂p:

(a) ∂S/∂μ = 0 ⟹ Σ_{i=1}^{p} Σ_{j=1}^{ni} (yij − μ − αi) = 0
(b) ∂S/∂αi = 0 ⟹ Σ_{j=1}^{ni} (yij − μ − αi) = 0, i = 1, 2, ..., p.

Note that (a) can be obtained by summing (b) over i, so (a) and (b) are linearly dependent, in the sense that there are (p + 1) unknowns and only p linearly independent equations. Consequently μ̂, α̂1, ..., α̂p do not have a unique solution. The same applies to the maximum likelihood estimation of μ, α1, ..., αp.
If a side condition

Σ_{i=1}^{p} ni αi = 0

is imposed, then (a) and (b) have a unique solution:

μ̂ = (1/n) Σ_{i=1}^{p} Σ_{j=1}^{ni} yij = ȳoo, where n = Σ_{i=1}^{p} ni,
α̂i = (1/ni) Σ_{j=1}^{ni} yij − μ̂ = ȳio − ȳoo.

In case all the sample sizes are the same, the condition Σ_{i=1}^{p} ni αi = 0 reduces to Σ_{i=1}^{p} αi = 0.
So the model yij = μ + αi + εij needs to be rewritten so that all the parameters can be uniquely estimated. Thus

Yij = μ + αi + εij = (μ + ᾱ) + (αi − ᾱ) + εij = μ* + αi* + εij

where

μ* = μ + ᾱ, αi* = αi − ᾱ, ᾱ = (1/p) Σ_{i=1}^{p} αi,

and now

Σ_{i=1}^{p} αi* = 0.
Thus in a linear model, when X is not of full rank, then the parameters do not have unique estimates. A restriction
p
i = 0 (or equivalently
i =1
n
i =1
= 0 in case all nis are not same) can be added and then the least squares (or
E (Yij ) = * + ;
*
i
i =1
*
1
=0
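As a quick illustration of this rank deficiency, here is a minimal Python sketch (NumPy assumed; the data and group sizes are hypothetical, echoing the dosage example above). It builds the one-way design matrix with an intercept, confirms rank(X*) = p, and shows that appending the side condition Σ n_i α_i = 0 as an extra row restores a unique least squares solution.

```python
import numpy as np

n_sizes = [2, 4, 7]                      # n_i for p = 3 groups (hypothetical)
p = len(n_sizes)
groups = np.repeat(np.arange(p), n_sizes)
n = groups.size

# Design matrix X* = [1 | indicators of the p levels]: n x (p+1)
X = np.column_stack([np.ones(n)] + [(groups == i).astype(float) for i in range(p)])
print(np.linalg.matrix_rank(X))          # p (= 3), not p + 1: mu is confounded

rng = np.random.default_rng(0)
y = 5 + np.array([0.0, 1.0, -1.0])[groups] + rng.normal(0, 1, n)

# Impose sum n_i * alpha_i = 0 by appending it as an extra "observation" = 0
constraint = np.concatenate([[0.0], np.array(n_sizes, float)])
Xc = np.vstack([X, constraint])
yc = np.concatenate([y, [0.0]])
beta = np.linalg.lstsq(Xc, yc, rcond=None)[0]   # unique (mu, a1, a2, a3)
print(beta, beta[1:] @ n_sizes)          # side condition holds (approx. 0)
```

Because the null space of X* is one-dimensional and not orthogonal to the constraint row, the augmented system is of full column rank, so the solution is unique and satisfies the side condition exactly.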
EXPERIMENTAL DESIGN
MODELS
Dr. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur
Let us now consider the analysis of variance with the additional constraint. Let

μ_i = μ + α_i,  Σ_{i=1}^{p} n_i α_i = 0,  n = Σ_{i=1}^{p} n_i,

and let the null hypothesis be

H_0: α_1 = α_2 = ... = α_p = 0,

with the alternative hypothesis

H_1: at least one α_i ≠ α_j for some i ≠ j.

Minimize the sum of squares due to error

E = Σ_{i=1}^{p} Σ_{j=1}^{n_i} ε_ij² = Σ_{i=1}^{p} Σ_{j=1}^{n_i} (y_ij − μ − α_i)².

The normal equations are

∂E/∂μ = 0 ⟹ −2 Σ_{i=1}^{p} Σ_{j=1}^{n_i} (y_ij − μ − α_i) = 0
  or  n μ + Σ_{i=1}^{p} n_i α_i = Σ_{i=1}^{p} Σ_{j=1}^{n_i} y_ij,    (1)

∂E/∂α_i = 0 ⟹ −2 Σ_{j=1}^{n_i} (y_ij − μ − α_i) = 0
  or  n_i μ + n_i α_i = Σ_{j=1}^{n_i} y_ij,  i = 1, 2, ..., p.    (2)

Using Σ_{i=1}^{p} n_i α_i = 0 in (1) gives

μ̂ = G/n = ȳ_oo,  where G = Σ_{i=1}^{p} Σ_{j=1}^{n_i} y_ij,

and then (2) gives

α̂_i = T_i/n_i − μ̂ = ȳ_io − ȳ_oo,  where T_i = Σ_{j=1}^{n_i} y_ij and ȳ_io = (1/n_i) Σ_{j=1}^{n_i} y_ij.
5
Now the fitted model is yij = + i and the error sum of squares after substituting and i in E becomes
p
ni
E = ( yij i ) 2
i =1 j =1
p
ni
i =1 j =1
p
ni
ni
i =1 j =1
p ni 2 G 2 p Ti 2 G 2
= yij
n i =1 ni
n
i =1 j =1
where the total sum of squares (TSS)
p
ni
G2
= y
,
n
i =1 j =1
p
2
ij
G2
and
is called as correction factor (CF).
n
6
To obtain a measure of variation due to treatments, let
H 0 = 1 = 2 = ... = p = 0
be true. Then the model becomes
Yij = + ij , i = 1, 2,..., p ; j = 1, 2,..., ni .
E1 = ( yij )2
i =1 j =1
with respect to
p
E1
= 0 2
i =1
ni
(y
j =1
ij
) = 0
G
= y oo .
n
ni
E1 = ( yij ) 2
i =1 j =1
p
ni
= ( yij yoo ) 2
i =1 j =1
ni
G2
= y .
n
i =1 j =1
p
2
ij
Note that
E1: Contains variation due to treatment and error both
E: Contains variation due to treatment only
So E1 E : contain variation due to treatment only.
The sum of squares due to treatment is given by
SSTr = E1 E
p
ni
Ti 2
G2
=
.
n
i =1 ni
p
The following quantity is called the error sum of squares or sum of squares due to error (SSE)
n
SSE =
i =1
ni
(y
j =1
ij
yio ) 2 .
These sum of squares forms the basis for the development of tools in the analysis of variance. We can write
8
The distribution of degrees of freedom among these sum of squares is as follows:
The total sum of squares is based on n quantities subject to the constraint that
ni
i =1
j =1
(y
ij
(n 1) degrees of freedom.
p
The sum of squares due to the treatments is based on p quantities subject to the constraint
n (y
i =1
i
io
yoo ) = 0 so
(y
j =1
ij
yio ) = 0, i = 1, 2,..., p so
N ( + i , 2 ).
)
so yij are also
l iindependently
d
d tl di
distributed
t ib t d ffollowing
ll i
Now using the Theorems 7 and 8 with q1 = SSTr , q2 = SSE , we have under H 0 ,
SSTr
~ 2 ( p 1) and
SSE
~ 2 ( n p ).
)
9
The mean square is defined as the sum of squares divided by the degrees of freedom. So the mean square due to
treatment is
MSTr =
SSTr
p 1
MSE =
SSE
.
n p
Thus, under H 0 ,
MST
MSTr
F=
~ F ( p 1, n p).
MSE
2
The decision rule is that reject H 0 , if
F > F1 , p 1, n p
at 100 % level of significance.
significance
If H 0 does not hold true, then
MSTr
~ noncentral F ( p 1, n p, )
MSE
p
where
=
i =1
ni i2
10
MSTr
can also be obtained from the likelihood ratio test.
MSE
If H 0 is rejected , then we go for multiple comparison tests and try to divide the population into several groups having
the same effects.
The analysis of variance table is as follows:

Source of variation   Degrees of freedom   Sum of squares   Mean squares   F-value
Treatments            p − 1                SSTr             MSTr           MSTr/MSE
Error                 n − p                SSE              MSE
Total                 n − 1                TSS
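To make the computations concrete, the following minimal Python sketch (hypothetical data; NumPy/SciPy assumed) builds this one-way ANOVA table for unequal group sizes and checks the F statistic against scipy.stats.f_oneway.

```python
import numpy as np
from scipy import stats

# Hypothetical responses for p = 3 treatments with n_i = (2, 4, 7)
groups = [np.array([10.1, 11.3]),
          np.array([9.0, 8.7, 9.5, 9.2]),
          np.array([7.8, 8.1, 7.5, 8.0, 7.7, 8.3, 7.9])]

p = len(groups)
n = sum(g.size for g in groups)
G = sum(g.sum() for g in groups)                      # grand total
CF = G**2 / n                                         # correction factor G^2/n

TSS  = sum((g**2).sum() for g in groups) - CF
SSTr = sum(g.sum()**2 / g.size for g in groups) - CF  # sum T_i^2/n_i - G^2/n
SSE  = TSS - SSTr                                     # found by subtraction

MSTr, MSE = SSTr / (p - 1), SSE / (n - p)
F = MSTr / MSE
print(F, stats.f.sf(F, p - 1, n - p))                 # F statistic and p-value
print(stats.f_oneway(*groups))                        # agrees with scipy
```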
Next we find E(SSTr). Writing ȳ_io = μ + α_i + ε̄_io and ȳ_oo = μ + ε̄_oo (which uses Σ_{i=1}^{p} n_i α_i = 0),

E(SSTr) = E[ Σ_{i=1}^{p} n_i {(μ + α_i + ε̄_io) − (μ + ε̄_oo)}² ]
        = E[ Σ_{i=1}^{p} n_i {α_i + (ε̄_io − ε̄_oo)}² ]
        = Σ_{i=1}^{p} n_i α_i² + Σ_{i=1}^{p} n_i E(ε̄_io − ε̄_oo)² + 0,

where ε̄_io = (1/n_i) Σ_{j=1}^{n_i} ε_ij, ε̄_oo = (1/n) Σ_{i=1}^{p} Σ_{j=1}^{n_i} ε_ij, and the cross term vanishes because E(ε̄_io) = E(ε̄_oo) = 0. Since

E(ε̄_io²) = Var(ε̄_io) = Var((1/n_i) Σ_j ε_ij) = n_i σ²/n_i² = σ²/n_i,
E(ε̄_oo²) = Var(ε̄_oo) = Var((1/n) Σ_i Σ_j ε_ij) = n σ²/n² = σ²/n,
E(ε̄_io ε̄_oo) = Cov(ε̄_io, ε̄_oo) = (1/(n_i n)) Cov(Σ_j ε_ij, Σ_i Σ_j ε_ij) = n_i σ²/(n_i n) = σ²/n,

we get E(ε̄_io − ε̄_oo)² = σ²/n_i − σ²/n, and hence

E(SSTr) = Σ_{i=1}^{p} n_i α_i² + Σ_{i=1}^{p} n_i (σ²/n_i − σ²/n)
        = Σ_{i=1}^{p} n_i α_i² + (p − 1) σ²,

or

E(MSTr) = E(SSTr/(p − 1)) = σ² + (1/(p − 1)) Σ_{i=1}^{p} n_i α_i².
4
Next
p ni
p ni
2
= E {( + i + ij ) ( + i + io )}
i =1 j =1
p ni
= E ( ij io ) 2
i =1 j =1
ni
= E ( ij2 + io2 2 ij io )
i =1 j =1
2 2 2 2
= +
ni
ni
i =1 j =1
ni
ni 1
i =1 j =1 ni
p
n (n 1)
= 2 i i
ni
i =1
p
ni
(n 1)
i =1
= (n p) 2
SSE
2
=
n p
or E
or E ( MSE ) = 2 .
Thus MSE is an unbiased estimator of
2.
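As a quick sanity check of these two expectations, the following Monte Carlo sketch (parameters are made up; it is an illustration, not part of the derivation) simulates the one-way model repeatedly and compares the empirical means of MSTr and MSE with the formulas above.

```python
import numpy as np

rng = np.random.default_rng(1)
n_sizes = np.array([2, 4, 7]); p = n_sizes.size; n = n_sizes.sum()
alpha = np.array([0.5, 0.0, -0.5])
alpha = alpha - (n_sizes @ alpha) / n          # enforce sum n_i * alpha_i = 0
mu, sigma2, reps = 5.0, 1.0, 20000
mstr, mse = [], []
for _ in range(reps):
    y = [mu + a + rng.normal(0, np.sqrt(sigma2), m) for a, m in zip(alpha, n_sizes)]
    G = sum(g.sum() for g in y); CF = G**2 / n
    sstr = sum(g.sum()**2 / g.size for g in y) - CF
    tss = sum((g**2).sum() for g in y) - CF
    mstr.append(sstr / (p - 1)); mse.append((tss - sstr) / (n - p))
print(np.mean(mse))                            # approx. sigma^2 = 1
print(np.mean(mstr), sigma2 + n_sizes @ alpha**2 / (p - 1))   # should agree
```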
Such an experiment is called a two-factor experiment. The different locations correspond to the different levels of factor A and the different varieties correspond to the different levels of factor B. The observations are collected per plot.

The combined effect of the two factors (A and B in our case) is called the interaction effect (of A and B). Mathematically, let a and b denote the levels of factors A and B respectively; a function f(a, b) is called a function of no interaction if and only if there exist functions g(a) and h(b) such that f(a, b) = g(a) + h(b). Otherwise the factors are said to interact.

For a function f(a, b) of no interaction,

f(a_1, b) = g(a_1) + h(b),
f(a_2, b) = g(a_2) + h(b),
⟹ f(a_1, b) − f(a_2, b) = g(a_1) − g(a_2),

which is independent of b. Such no-interaction functions are called additive functions.
For the two-way classification with one observation per cell, write

E(Y_ij) = μ_ij
        = μ_oo + (μ_io − μ_oo) + (μ_oj − μ_oo) + (μ_ij − μ_io − μ_oj + μ_oo)
        = μ + α_i + β_j + γ_ij,

where μ_io = (1/J) Σ_j μ_ij, μ_oj = (1/I) Σ_i μ_ij, μ_oo = (1/(IJ)) Σ_i Σ_j μ_ij, and

μ = μ_oo,
α_i = μ_io − μ_oo,
β_j = μ_oj − μ_oo,
γ_ij = μ_ij − μ_io − μ_oj + μ_oo,

with

Σ_{i=1}^{I} α_i = Σ_{i=1}^{I} (μ_io − μ_oo) = 0,  Σ_{j=1}^{J} β_j = Σ_{j=1}^{J} (μ_oj − μ_oo) = 0.

Here we assume that there is no interaction, i.e., γ_ij = 0 for all i, j. We also assume that the model E(Y_ij) = μ_ij is a full rank model, so that all the linear parametric functions of the μ_ij are estimable.

The total number of observations is IJ, which can be arranged in a two-way classified I × J table where the rows correspond to the different levels of A and the columns correspond to the different levels of B.
The observation vector and the design matrix in this case are, respectively,

Y = (y_11, y_12, ..., y_1J, ..., y_I1, y_I2, ..., y_IJ)'

and X, a matrix of 0s and 1s: the row of X corresponding to y_ij has a 1 in the column for μ, a 1 in the column for α_i, a 1 in the column for β_j, and 0s elsewhere.
If the design matrix is not of full rank, the model can be reparameterized. In such a case, we can start the analysis by assuming that the model E(Y_ij) = μ + α_i + β_j has been obtained after reparameterization.

There are two null hypotheses of interest:

H_0α: α_1 = α_2 = ... = α_I = 0,
H_0β: β_1 = β_2 = ... = β_J = 0.

Now we derive the least squares estimators (or equivalently the maximum likelihood estimators) of μ, α_i and β_j, i = 1, 2, ..., I, j = 1, 2, ..., J, by minimizing the error sum of squares

E = Σ_{i=1}^{I} Σ_{j=1}^{J} (y_ij − μ − α_i − β_j)².
The normal equations are

∂E/∂μ = 0 ⟹ −2 Σ_{i=1}^{I} Σ_{j=1}^{J} (y_ij − μ − α_i − β_j) = 0,
∂E/∂α_i = 0 ⟹ −2 Σ_{j=1}^{J} (y_ij − μ − α_i − β_j) = 0,  i = 1, 2, ..., I,
∂E/∂β_j = 0 ⟹ −2 Σ_{i=1}^{I} (y_ij − μ − α_i − β_j) = 0,  j = 1, 2, ..., J.

Using Σ_{i=1}^{I} α_i = 0 and Σ_{j=1}^{J} β_j = 0, the solutions are

μ̂ = G/(IJ) = ȳ_oo,  where G = Σ_{i=1}^{I} Σ_{j=1}^{J} y_ij,

α̂_i = T_i/J − ȳ_oo = ȳ_io − ȳ_oo,  i = 1, 2, ..., I,

β̂_j = B_j/I − ȳ_oo = ȳ_oj − ȳ_oo,  j = 1, 2, ..., J,

where

T_i: treatment total due to the ith effect, i.e., the sum of all the observations receiving the ith treatment,
B_j: block total due to the jth effect, i.e., the sum of all the observations in the jth block.
The minimum value of the error sum of squares is

SSE = Σ_{i=1}^{I} Σ_{j=1}^{J} (y_ij − μ̂ − α̂_i − β̂_j)²
    = Σ_{i=1}^{I} Σ_{j=1}^{J} (y_ij − ȳ_io − ȳ_oj + ȳ_oo)²,

which carries

IJ − (I − 1) − (J − 1) − 1 = (I − 1)(J − 1)

degrees of freedom.
Next we consider the estimation of μ and β_j under the null hypothesis H_0α: α_1 = α_2 = ... = α_I = 0 by minimizing

E_1 = Σ_{i=1}^{I} Σ_{j=1}^{J} (y_ij − μ − β_j)².

Solving

∂E_1/∂μ = 0 and ∂E_1/∂β_j = 0, j = 1, 2, ..., J,

gives the least squares estimates

μ̂ = ȳ_oo,  β̂_j = ȳ_oj − ȳ_oo.

The minimum value of E_1 is

min E_1 = Σ_{i=1}^{I} Σ_{j=1}^{J} (y_ij − ȳ_oj)²
        = Σ_{i=1}^{I} Σ_{j=1}^{J} (y_ij − ȳ_io − ȳ_oj + ȳ_oo)² + J Σ_{i=1}^{I} (ȳ_io − ȳ_oo)²
        = SSE + SSA,

i.e., the error sum of squares plus the sum of squares due to rows (factor A), where

SSA = J Σ_{i=1}^{I} (ȳ_io − ȳ_oo)²

is the sum of squares due to deviation from H_0α, and it carries

(IJ − J) − (I − 1)(J − 1) = I − 1

degrees of freedom.
Now we find the estimates of μ and α_i under H_0β: β_1 = β_2 = ... = β_J = 0 by minimizing

E_2 = Σ_{i=1}^{I} Σ_{j=1}^{J} (y_ij − μ − α_i)².

Solving ∂E_2/∂μ = 0 and ∂E_2/∂α_i = 0, i = 1, 2, ..., I, gives the estimators

μ̂ = ȳ_oo,  α̂_i = ȳ_io − ȳ_oo.

The minimum value of the error sum of squares is

min E_2 = Σ_{i=1}^{I} Σ_{j=1}^{J} (y_ij − ȳ_io)²
        = Σ_{i=1}^{I} Σ_{j=1}^{J} (y_ij − ȳ_io − ȳ_oj + ȳ_oo)² + I Σ_{j=1}^{J} (ȳ_oj − ȳ_oo)²
        = SSE + SSB.

The sum of squares due to deviation from H_0β (the sum of squares due to columns, or due to factor B) is

SSB = I Σ_{j=1}^{J} (ȳ_oj − ȳ_oo)²,

which carries

(IJ − I) − (I − 1)(J − 1) = J − 1

degrees of freedom.
The partitioning of the degrees of freedom into the corresponding groups is

IJ − 1 = (I − 1) + (J − 1) + (I − 1)(J − 1).

Note that SSA, SSB and SSE are mutually orthogonal, and that is why the degrees of freedom can be divided in this way.

Now, using the theory explained while discussing the likelihood ratio test, or assuming the y_ij to be independently distributed as N(μ + α_i + β_j, σ²), i = 1, 2, ..., I, j = 1, 2, ..., J, and using Theorems 6 and 7, we can write

SSA/σ² ~ χ²(I − 1),  SSB/σ² ~ χ²(J − 1),  SSE/σ² ~ χ²((I − 1)(J − 1)).

Thus

F_1 = [SSA/σ² ÷ (I − 1)] / [SSE/σ² ÷ (I − 1)(J − 1)]
    = [(I − 1)(J − 1)/(I − 1)] · SSA/SSE
    = MSA/MSE ~ F((I − 1), (I − 1)(J − 1)) under H_0α,

where

MSA = SSA/(I − 1)  and  MSE = SSE/((I − 1)(J − 1)).
The same statistic is also obtained using the likelihood ratio test for H_0α. The decision rule is

Reject H_0α if F_1 > F_{1−α}[(I − 1), (I − 1)(J − 1)].

Similarly,

F_2 = [SSB/σ² ÷ (J − 1)] / [SSE/σ² ÷ (I − 1)(J − 1)]
    = [(I − 1)(J − 1)/(J − 1)] · SSB/SSE
    = MSB/MSE ~ F((J − 1), (I − 1)(J − 1)) under H_0β,

where MSB = SSB/(J − 1). The same test statistic can also be obtained from the likelihood ratio test. When H_0α does not hold, (J/σ²) Σ_{i=1}^{I} α_i² is the associated noncentrality parameter.
The analysis of variance table is as follows:

Source of variation   Degrees of freedom   Sum of squares          Mean squares   F-value
Factor A (rows)       I − 1                SSA                     MSA            F_1 = MSA/MSE
Factor B (columns)    J − 1                SSB                     MSB            F_2 = MSB/MSE
Error                 (I − 1)(J − 1)       SSE (by subtraction)    MSE
Total                 IJ − 1               TSS
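The table can be computed directly from these formulas. Here is a minimal Python sketch (hypothetical I × J data with one observation per cell; NumPy/SciPy assumed):

```python
import numpy as np
from scipy import stats

# Hypothetical I x J table, one observation per cell (rows = levels of A)
y = np.array([[12.0, 14.1, 13.2],
              [11.5, 13.0, 12.4],
              [10.9, 12.2, 11.8],
              [12.8, 14.5, 13.9]])
I, J = y.shape
ybar, rbar, cbar = y.mean(), y.mean(axis=1), y.mean(axis=0)

SSA = J * ((rbar - ybar) ** 2).sum()
SSB = I * ((cbar - ybar) ** 2).sum()
TSS = ((y - ybar) ** 2).sum()
SSE = TSS - SSA - SSB                      # df = (I-1)(J-1)

MSA, MSB, MSE = SSA / (I - 1), SSB / (J - 1), SSE / ((I - 1) * (J - 1))
F1, F2 = MSA / MSE, MSB / MSE
print(F1, stats.f.sf(F1, I - 1, (I - 1) * (J - 1)))   # test for factor A
print(F2, stats.f.sf(F2, J - 1, (I - 1) * (J - 1)))   # test for factor B
```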
It can be shown, along the same lines as in the one-way classification, that

E(MSA) = σ² + (J/(I − 1)) Σ_{i=1}^{I} α_i²,
E(MSB) = σ² + (I/(J − 1)) Σ_{j=1}^{J} β_j²,
E(MSE) = σ².

If a null hypothesis is rejected, then we use multiple comparison tests to divide the α_i's (or β_j's) into groups such that those belonging to the same group are equal and those belonging to different groups are different. Generally, in practice, the experimenter's interest lies more in multiple comparison tests for the treatment effects than for the block effects, so these tests are generally used for the treatment effects only.
Consider now the two-way classification with K observations per cell. The model is

y_ijk = μ_ij + ε_ijk,  i = 1, 2, ..., I;  j = 1, 2, ..., J;  k = 1, 2, ..., K,

where the ε_ijk are identically and independently distributed following N(0, σ²). Thus

E(y_ijk) = μ_ij
         = μ_oo + (μ_io − μ_oo) + (μ_oj − μ_oo) + (μ_ij − μ_io − μ_oj + μ_oo)
         = μ + α_i + β_j + γ_ij,

where

μ = μ_oo,  α_i = μ_io − μ_oo,  β_j = μ_oj − μ_oo,  γ_ij = μ_ij − μ_io − μ_oj + μ_oo,

with

Σ_{i=1}^{I} α_i = 0,  Σ_{j=1}^{J} β_j = 0,  Σ_{i=1}^{I} γ_ij = 0,  Σ_{j=1}^{J} γ_ij = 0.

Assume that the design matrix X is of full rank, so that all the parametric functions of the μ_ij are estimable.

The null hypotheses are

H_0α: α_1 = α_2 = ... = α_I = 0,
H_0β: β_1 = β_2 = ... = β_J = 0,
H_0γ: all γ_ij = 0 for all i, j,

and the corresponding alternative hypotheses are that at least one of the effects in question is nonzero.
Minimizing the error sum of squares

E = Σ_{i=1}^{I} Σ_{j=1}^{J} Σ_{k=1}^{K} (y_ijk − μ − α_i − β_j − γ_ij)²

via

∂E/∂μ = 0,  ∂E/∂α_i = 0 for all i,  ∂E/∂β_j = 0 for all j,  ∂E/∂γ_ij = 0 for all i and j,

the least squares estimates are obtained as

μ̂ = ȳ_ooo = (1/(IJK)) Σ_i Σ_j Σ_k y_ijk,
α̂_i = ȳ_ioo − ȳ_ooo,  where ȳ_ioo = (1/(JK)) Σ_j Σ_k y_ijk,
β̂_j = ȳ_ojo − ȳ_ooo,  where ȳ_ojo = (1/(IK)) Σ_i Σ_k y_ijk,
γ̂_ij = ȳ_ijo − ȳ_ioo − ȳ_ojo + ȳ_ooo,  where ȳ_ijo = (1/K) Σ_k y_ijk.

The minimum error sum of squares is

SSE = min Σ Σ Σ (y_ijk − μ − α_i − β_j − γ_ij)² = Σ_{i=1}^{I} Σ_{j=1}^{J} Σ_{k=1}^{K} (y_ijk − ȳ_ijo)²,

with

SSE/σ² ~ χ²(IJ(K − 1)).
Now, minimizing the error sum of squares under H_0α,

E_1 = Σ_{i=1}^{I} Σ_{j=1}^{J} Σ_{k=1}^{K} (y_ijk − μ − β_j − γ_ij)²,

by solving

∂E_1/∂μ = 0,  ∂E_1/∂β_j = 0 for all j,  ∂E_1/∂γ_ij = 0 for all i and j,

gives the least squares estimates

μ̂ = ȳ_ooo,  β̂_j = ȳ_ojo − ȳ_ooo,  γ̂_ij = ȳ_ijo − ȳ_ioo − ȳ_ojo + ȳ_ooo.

The minimum value is

min E_1 = Σ_{i=1}^{I} Σ_{j=1}^{J} Σ_{k=1}^{K} (y_ijk − μ̂ − β̂_j − γ̂_ij)²
        = SSE + JK Σ_{i=1}^{I} (ȳ_ioo − ȳ_ooo)².

Thus the sum of squares due to deviation from H_0α is

SSA = JK Σ_{i=1}^{I} (ȳ_ioo − ȳ_ooo)²,

with SSA/σ² ~ χ²(I − 1).
Similarly, minimizing

E_2 = Σ_{i=1}^{I} Σ_{j=1}^{J} Σ_{k=1}^{K} (y_ijk − μ − α_i − γ_ij)²

via ∂E_2/∂μ = 0, ∂E_2/∂α_i = 0 for all i, and ∂E_2/∂γ_ij = 0 for all i and j yields the least squares estimators

μ̂ = ȳ_ooo,  α̂_i = ȳ_ioo − ȳ_ooo,  γ̂_ij = ȳ_ijo − ȳ_ioo − ȳ_ojo + ȳ_ooo.

The minimum error sum of squares is

min E_2 = SSE + IK Σ_{j=1}^{J} (ȳ_ojo − ȳ_ooo)²,

and the sum of squares due to deviation from H_0β is

SSB = IK Σ_{j=1}^{J} (ȳ_ojo − ȳ_ooo)²,

with SSB/σ² ~ χ²(J − 1).
Finally, minimizing

E_3 = Σ_{i=1}^{I} Σ_{j=1}^{J} Σ_{k=1}^{K} (y_ijk − μ − α_i − β_j)²

via ∂E_3/∂μ = 0, ∂E_3/∂α_i = 0 for all i, and ∂E_3/∂β_j = 0 for all j yields the least squares estimators

μ̂ = ȳ_ooo,  α̂_i = ȳ_ioo − ȳ_ooo,  β̂_j = ȳ_ojo − ȳ_ooo.

The minimum value is

min E_3 = Σ_{i=1}^{I} Σ_{j=1}^{J} Σ_{k=1}^{K} (y_ijk − μ̂ − α̂_i − β̂_j)²,

and the sum of squares due to deviation from H_0γ, i.e., the sum of squares due to the interaction effect AB, is

SSAB = K Σ_{i=1}^{I} Σ_{j=1}^{J} (ȳ_ijo − ȳ_ioo − ȳ_ojo + ȳ_ooo)²,

with SSAB/σ² ~ χ²((I − 1)(J − 1)).
Here SSA, SSB, SSAB and SSE are mutually orthogonal. So, either using the independence of SSA, SSB, SSAB and SSE together with their respective χ² distributions, or using the likelihood ratio test approach, the decision rules for the null hypotheses at level α are based on

F_1 = [IJ(K − 1)/(I − 1)] · SSA/SSE ~ F((I − 1), IJ(K − 1)) under H_0α,
F_2 = [IJ(K − 1)/(J − 1)] · SSB/SSE ~ F((J − 1), IJ(K − 1)) under H_0β,
F_3 = [IJ(K − 1)/((I − 1)(J − 1))] · SSAB/SSE ~ F((I − 1)(J − 1), IJ(K − 1)) under H_0γ.

So

Reject H_0α if F_1 > F_{1−α}[(I − 1), IJ(K − 1)],
Reject H_0β if F_2 > F_{1−α}[(J − 1), IJ(K − 1)],
Reject H_0γ if F_3 > F_{1−α}[(I − 1)(J − 1), IJ(K − 1)].

If H_0α or H_0β is rejected, one can use the t-test or a multiple comparison test to find which pairs of α_i's or β_j's are significantly different. If H_0γ is rejected, one would not usually explore it further, though theoretically the t-test or multiple comparison tests can be used.
It can also be shown that

E(MSA) = σ² + (JK/(I − 1)) Σ_{i=1}^{I} α_i²,
E(MSB) = σ² + (IK/(J − 1)) Σ_{j=1}^{J} β_j²,
E(MSAB) = σ² + (K/((I − 1)(J − 1))) Σ_{i=1}^{I} Σ_{j=1}^{J} γ_ij²,
E(MSE) = σ².
The analysis of variance table is as follows:

Source of variation   Degrees of freedom   Sum of squares   Mean squares                   F-value
Factor A              I − 1                SSA              MSA = SSA/(I − 1)              F_1 = MSA/MSE
Factor B              J − 1                SSB              MSB = SSB/(J − 1)              F_2 = MSB/MSE
Interaction AB        (I − 1)(J − 1)       SSAB             MSAB = SSAB/((I − 1)(J − 1))   F_3 = MSAB/MSE
Error                 IJ(K − 1)            SSE              MSE = SSE/(IJ(K − 1))
Total                 IJK − 1              TSS
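A minimal Python sketch of the same computations for an I × J × K layout with replication (hypothetical data; NumPy/SciPy assumed):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
I, J, K = 3, 4, 2
y = rng.normal(10, 1, (I, J, K))             # hypothetical y_ijk

m = y.mean()                                 # y_bar_ooo
mi = y.mean(axis=(1, 2))                     # y_bar_ioo
mj = y.mean(axis=(0, 2))                     # y_bar_ojo
mij = y.mean(axis=2)                         # y_bar_ijo

SSA = J * K * ((mi - m) ** 2).sum()
SSB = I * K * ((mj - m) ** 2).sum()
SSAB = K * ((mij - mi[:, None] - mj[None, :] + m) ** 2).sum()
SSE = ((y - mij[:, :, None]) ** 2).sum()     # df = IJ(K-1)

dfe = I * J * (K - 1)
F1 = (SSA / (I - 1)) / (SSE / dfe)
F2 = (SSB / (J - 1)) / (SSE / dfe)
F3 = (SSAB / ((I - 1) * (J - 1))) / (SSE / dfe)
for F, df1 in [(F1, I - 1), (F2, J - 1), (F3, (I - 1) * (J - 1))]:
    print(F, stats.f.sf(F, df1, dfe))        # statistic and p-value
```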
Consider the set-up of a two-way classification with one observation per cell and interaction:

y_ij = μ + α_i + β_j + γ_ij + ε_ij,  i = 1, 2, ..., I;  j = 1, 2, ..., J,

with Σ_{i=1}^{I} α_i = 0 and Σ_{j=1}^{J} β_j = 0.

The distribution of degrees of freedom in this case is as follows:

Source             Degrees of freedom
A                  I − 1
B                  J − 1
AB (interaction)   (I − 1)(J − 1)
Error              0
Total              IJ − 1

There is no degree of freedom left for error. The problem is that the two-factor interaction effect and the random error component are subsumed together and cannot be separated out; there is no estimate of σ².
If no interaction exists, then H_0: γ_ij = 0 for all i, j is accepted and the additive model

y_ij = μ + α_i + β_j + ε_ij

is adequate for testing the hypotheses H_0α: α_i = 0 and H_0β: β_j = 0, with the error carrying (I − 1)(J − 1) degrees of freedom.

If interaction exists, then H_0: γ_ij = 0 is rejected. In such a case, if we assume that the interaction effect is proportional to the product of the individual effects, i.e.,

γ_ij = λ α_i β_j,

then a test of H_0: λ = 0 can be constructed. Such a test serves as a test of nonadditivity; it helps in assessing the effect of the presence of interaction and whether the interaction enters the model additively. This is Tukey's test for nonadditivity, which requires one degree of freedom, leaving (I − 1)(J − 1) − 1 degrees of freedom for error.

Let us assume that the departure from additivity can be specified by introducing a product term and writing the model as

y_ij = μ + α_i + β_j + λ α_i β_j + ε_ij,  with Σ_{i=1}^{I} α_i = 0,  Σ_{j=1}^{J} β_j = 0.

When λ ≠ 0, the model becomes a nonlinear model, and the least squares theory for linear models is not applicable.
Using Σ_{i=1}^{I} α_i = 0 and Σ_{j=1}^{J} β_j = 0, we have

ȳ_oo = (1/(IJ)) Σ_{i=1}^{I} Σ_{j=1}^{J} y_ij
     = (1/(IJ)) Σ_{i=1}^{I} Σ_{j=1}^{J} (μ + α_i + β_j + λ α_i β_j + ε_ij)
     = μ + (1/I) Σ_i α_i + (1/J) Σ_j β_j + (λ/(IJ)) (Σ_i α_i)(Σ_j β_j) + ε̄_oo
     = μ + ε̄_oo,

so

E(ȳ_oo) = μ  and  μ̂ = ȳ_oo.
Next,

ȳ_io = (1/J) Σ_{j=1}^{J} y_ij = μ + α_i + (1/J) Σ_{j=1}^{J} β_j + (λ α_i/J) Σ_{j=1}^{J} β_j + ε̄_io
     = μ + α_i + ε̄_io,

so E(ȳ_io) = μ + α_i and

α̂_i = ȳ_io − μ̂ = ȳ_io − ȳ_oo.

Similarly, E(ȳ_oj) = μ + β_j and

β̂_j = ȳ_oj − μ̂ = ȳ_oj − ȳ_oo.

Thus μ̂, α̂_i and β̂_j remain the unbiased estimators of μ, α_i and β_j, respectively, irrespective of whether λ = 0 or not.

Also, write the sum of squares as

S = Σ_{i=1}^{I} Σ_{j=1}^{J} (y_ij − μ − α_i − β_j − λ α_i β_j)² = Σ_{i=1}^{I} Σ_{j=1}^{J} S_ij²,

say, where S_ij = y_ij − μ − α_i − β_j − λ α_i β_j.
The normal equations are solved as

∂S/∂μ = 0 ⟹ Σ_{i=1}^{I} Σ_{j=1}^{J} S_ij = 0 ⟹ μ̂ = ȳ_oo,
∂S/∂α_i = 0 ⟹ Σ_{j=1}^{J} (1 + λ β_j) S_ij = 0,
∂S/∂β_j = 0 ⟹ Σ_{i=1}^{I} (1 + λ α_i) S_ij = 0,
∂S/∂λ = 0 ⟹ Σ_{i=1}^{I} Σ_{j=1}^{J} α_i β_j S_ij = 0,

i.e.,

Σ_{i=1}^{I} Σ_{j=1}^{J} α_i β_j (y_ij − μ − α_i − β_j − λ α_i β_j) = 0,

which, using Σ_i α_i = 0 and Σ_j β_j = 0, gives

λ̂ = Σ_{i=1}^{I} Σ_{j=1}^{J} α_i β_j y_ij / (Σ_{i=1}^{I} α_i² Σ_{j=1}^{J} β_j²)  (say).
Since α_i and β_j can be estimated by

α̂_i = ȳ_io − ȳ_oo  and  β̂_j = ȳ_oj − ȳ_oo

irrespective of whether λ = 0 or λ ≠ 0, we can substitute them in place of α_i and β_j, which gives

λ̂ = Σ_{i} Σ_{j} α̂_i β̂_j y_ij / (Σ_{i} α̂_i² Σ_{j} β̂_j²) = IJ Σ_{i} Σ_{j} α̂_i β̂_j y_ij / (S_A S_B),

where

S_A = J Σ_{i=1}^{I} α̂_i² = J Σ_{i=1}^{I} (ȳ_io − ȳ_oo)²,
S_B = I Σ_{j=1}^{J} β̂_j² = I Σ_{j=1}^{J} (ȳ_oj − ȳ_oo)².
Assuming α_i and β_j to be known, we find

Var(λ̂) = Var( Σ_i Σ_j α_i β_j y_ij / (Σ_i α_i² Σ_j β_j²) )
        = Σ_i Σ_j α_i² β_j² Var(y_ij) / (Σ_i α_i² Σ_j β_j²)²
        = σ² / (Σ_i α_i² Σ_j β_j²),

using Var(y_ij) = σ² and Cov(y_ij, y_kl) = 0 for (i, j) ≠ (k, l).
Since α_i and β_j can be estimated by α̂_i and β̂_j, substituting them back into the expression for Var(λ̂) and treating it as Var(λ̂) gives

Var(λ̂) = σ² / (Σ_i α̂_i² Σ_j β̂_j²) = IJ σ² / (S_A S_B)

for given α̂_i and β̂_j.
Note that if λ = 0, then

E(λ̂) = E[ Σ_{i=1}^{I} Σ_{j=1}^{J} α_i β_j y_ij / (Σ_i α_i² Σ_j β_j²) ]
      = E[ Σ_{i=1}^{I} Σ_{j=1}^{J} α_i β_j (μ + α_i + β_j + 0 + ε_ij) ] / (Σ_i α_i² Σ_j β_j²)
      = 0,

since Σ_i α_i = 0 and Σ_j β_j = 0 annihilate every term except the one involving ε_ij, which has zero mean. As λ̂ is a linear function of the normally distributed y_ij, when λ = 0 it is distributed as

λ̂ ~ N(0, IJ σ²/(S_A S_B)).
Thus the statistic

λ̂²/Var(λ̂) = (S_A S_B/(IJ σ²)) λ̂²
           = IJ (Σ_{i=1}^{I} Σ_{j=1}^{J} α̂_i β̂_j y_ij)² / (S_A S_B σ²)
           = S_N/σ²

follows a χ²(1) distribution, where

S_N = IJ (Σ_{i=1}^{I} Σ_{j=1}^{J} α̂_i β̂_j y_ij)² / (S_A S_B)

is the sum of squares due to nonadditivity.
Note that

S_AB = Σ_{i=1}^{I} Σ_{j=1}^{J} (y_ij − ȳ_io − ȳ_oj + ȳ_oo)²,  S_AB/σ² ~ χ²((I − 1)(J − 1)),

and (S_AB − S_N)/σ² is nonnegative and follows χ²[(I − 1)(J − 1) − 1]. Indeed, the model is now

y_ij = μ + α_i + β_j + nonadditivity + ε_ij,

so the error degrees of freedom are

(IJ − 1) − (I − 1) − (J − 1) − 1 = (I − 1)(J − 1) − 1,

and the error sum of squares SSE = S_AB − S_N is nonnegative. Moreover, S_N (the sum of squares due to nonadditivity) and SSE are orthogonal. Thus the F-test for nonadditivity is

F = [S_N/σ²] / [SSE/σ² ÷ ((I − 1)(J − 1) − 1)]
  = [(I − 1)(J − 1) − 1] S_N/SSE
  ~ F(1, (I − 1)(J − 1) − 1) under H_0: λ = 0.
The analysis of variance table for the model including a term for nonadditivity is as follows:

Source of variation   Degrees of freedom   Sum of squares          Mean squares                      F-value
Factor A              I − 1                S_A                     MS_A = S_A/(I − 1)
Factor B              J − 1                S_B                     MS_B = S_B/(J − 1)
Nonadditivity         1                    S_N                     MS_N = S_N                        MS_N/MSE
Error                 (I − 1)(J − 1) − 1   SSE (by subtraction)    MSE = SSE/((I − 1)(J − 1) − 1)
Total                 IJ − 1               TSS
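A compact Python sketch of Tukey's one-degree-of-freedom test as derived above (hypothetical I × J data; NumPy/SciPy assumed):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
I, J = 4, 5
y = rng.normal(8, 1, (I, J))                     # hypothetical y_ij, one per cell

m = y.mean()
a = y.mean(axis=1) - m                           # alpha_hat_i
b = y.mean(axis=0) - m                           # beta_hat_j
SA, SB = J * (a**2).sum(), I * (b**2).sum()

num = (a[:, None] * b[None, :] * y).sum()        # sum a_i b_j y_ij
SN = I * J * num**2 / (SA * SB)                  # SS due to nonadditivity, 1 df

SAB = ((y - y.mean(axis=1, keepdims=True)
          - y.mean(axis=0, keepdims=True) + m) ** 2).sum()
SSE = SAB - SN                                   # df = (I-1)(J-1) - 1
dfe = (I - 1) * (J - 1) - 1
F = dfe * SN / SSE
print(F, stats.f.sf(F, 1, dfe))                  # reject additivity if p is small
```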
Comparison of variances
One of the basic assumptions in the analysis of variance is that the samples are drawn from different normal populations with possibly different means but the same variance. So before carrying out the analysis of variance, a test of the hypothesis of equality of variances is needed. We discuss the test of equality of two variances and then of more than two variances.

Let the two samples be

A: x_1, x_2, ..., x_{n_1};  x_i ~ N(μ_A, σ_A²),
B: y_1, y_2, ..., y_{n_2};  y_i ~ N(μ_B, σ_B²).
The sample variances corresponding to the two samples are

s_x² = (1/(n_1 − 1)) Σ_{i=1}^{n_1} (x_i − x̄)²,  s_y² = (1/(n_2 − 1)) Σ_{i=1}^{n_2} (y_i − ȳ)².

Under H_0: σ_A² = σ_B² = σ²,

(n_1 − 1) s_x²/σ² ~ χ²(n_1 − 1),  (n_2 − 1) s_y²/σ² ~ χ²(n_2 − 1).

Moreover, the sample variances s_x² and s_y² are independent, so

F = [(n_1 − 1)s_x²/σ² ÷ (n_1 − 1)] / [(n_2 − 1)s_y²/σ² ÷ (n_2 − 1)] = s_x²/s_y² ~ F(n_1 − 1, n_2 − 1).
H_0 is rejected if

F > F_{1−α/2; n_1−1, n_2−1}  or  F < F_{α/2; n_1−1, n_2−1},

where F_{α/2; n_1−1, n_2−1} = 1/F_{1−α/2; n_2−1, n_1−1}. If the null hypothesis H_0: σ_1² = σ_2² is rejected, the problem of comparing the means is termed the Fisher–Behrens problem, for which solutions are available.
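A minimal sketch of this two-variance F test in Python (hypothetical samples; SciPy assumed):

```python
import numpy as np
from scipy import stats

x = np.array([10.2, 9.8, 11.1, 10.5, 9.9])        # sample A (hypothetical)
y = np.array([8.9, 10.4, 9.7, 10.8, 9.3, 10.1])   # sample B (hypothetical)

F = x.var(ddof=1) / y.var(ddof=1)                 # s_x^2 / s_y^2
df1, df2 = x.size - 1, y.size - 1
alpha = 0.05
lo = stats.f.ppf(alpha / 2, df1, df2)             # lower critical value
hi = stats.f.ppf(1 - alpha / 2, df1, df2)         # upper critical value
print(F, lo, hi, not (lo < F < hi))               # True => reject H0
```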
For testing the equality of more than two variances, say H_0: σ_1² = σ_2² = ... = σ_k² for k normal populations, let s_i² be the sample variance of the ith sample, ν_i = n_i − 1, ν = Σ_{i=1}^{k} ν_i, and let the pooled variance be

s² = Σ_{i=1}^{k} ν_i s_i² / ν.

Under H_0, all the variances are the same, say σ², and an unbiased estimate of σ² is s². The statistic

Σ_{i=1}^{k} ν_i ln(s²/s_i²) / [ 1 + (1/(3(k − 1))) ( Σ_{i=1}^{k} 1/ν_i − 1/ν ) ]

is approximately distributed as χ² with (k − 1) degrees of freedom under H_0.
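This is Bartlett's test. SciPy ships an implementation, and the statistic can also be computed from the formula directly, as the following sketch with hypothetical samples shows:

```python
import numpy as np
from scipy import stats

samples = [np.array([10.2, 9.8, 11.1, 10.5]),
           np.array([8.9, 10.4, 9.7, 10.8, 9.3]),
           np.array([9.5, 9.9, 10.2, 10.0, 9.8, 10.1])]

k = len(samples)
nu = np.array([s.size - 1 for s in samples], float)
s2 = np.array([s.var(ddof=1) for s in samples])
sp2 = (nu * s2).sum() / nu.sum()                       # pooled variance

M = (nu * np.log(sp2 / s2)).sum()
C = 1 + ((1 / nu).sum() - 1 / nu.sum()) / (3 * (k - 1))
print(M / C, stats.chi2.sf(M / C, k - 1))              # statistic and p-value
print(stats.bartlett(*samples))                        # scipy's version agrees
```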
Design of experiments means deciding how an experiment should be designed, in the sense of how the observations or measurements should be obtained to answer a query in a valid, efficient and economical way. The designing of an experiment and the analysis of the data obtained from it are inseparable. If the experiment is designed properly, keeping the question in mind, then the data generated are valid and a proper analysis provides valid statistical inferences. If the experiment is not well designed, the validity of the statistical inferences is questionable and they may be invalid.

It is important first to understand the basic terminology used in experimental design.
Experimental unit
For conducting an experiment, the experimental material is divided into smaller parts, and each part is referred to as an experimental unit. The experimental unit is randomly assigned to a treatment. The phrase "randomly assigned" is very important in this definition.
Experiment
An experiment is a way of getting an answer to a question in which the experimenter is interested.
Treatment
Different objects or procedures which are to be compared in an experiment are called treatments.
Sampling unit
The object that is measured in an experiment is called the sampling unit. This may be different from the experimental unit.
Factor
A factor is a variable defining a categorization. A factor can be fixed or random in nature.
A factor is termed a fixed factor if all the levels of interest are included in the experiment.
A factor is termed a random factor if not all the levels of interest are included in the experiment, and those that are included can be considered to be randomly chosen from all the levels of interest.
Replication
It is the repetition of the experimental situation by replicating the experimental unit.
Experimental error
The unexplained random part of the variation in any experiment is termed experimental error. An estimate of experimental error can be obtained by replication.

Treatment design
A treatment design is the manner in which the levels of treatments are arranged in an experiment.
Design of experiment
One of the main objectives of designing an experiment is to verify the hypothesis in an efficient and economical way. In the context of the null hypothesis of equality of several means of normal populations having the same variance, the analysis of variance technique can be used. Note that such techniques are based on certain statistical assumptions. If these assumptions are violated, the outcome of the test of hypothesis may be faulty and the analysis of data may be meaningless. So the main question is how to obtain the data such that the assumptions are met and the data are readily suitable for tools like the analysis of variance. Designing a mechanism to obtain such data is achieved by the design of experiment. After obtaining sufficient experimental units, the treatments are allocated to the experimental units in a random fashion. Design of experiment provides a method by which the treatments are placed at random on the experimental units in such a way that the responses are estimated with the utmost possible precision.
The basic principles used here are:
i. Randomization
ii. Replication

i. Randomization
The principle of randomization involves the allocation of treatment to experimental units at random to avoid any bias in the
experiment
resulting from the influence of some extraneous unknown factor that may affect the experiment. In the
development of analysis of variance, we assume that the errors are random and independent. In turn, the observations
also become random. The principle of randomization ensures this.
The random assignment of experimental units to treatments results in the following outcomes.
a) It eliminates the systematic bias.
b) It is needed to obtain a representative sample from the population.
c) It helps in distributing the unknown variation due to confounded variables throughout the experiment and breaks the
confounding influence.
Randomization forms a basis of valid experiment but replication is also needed for the validity of the experiment.
If the randomization process is such that every experimental unit has an equal chance of receiving each treatment, it is
called a complete randomization.
ii. Replication
In the replication principle, any treatment is repeated a number of times to obtain a valid and more reliable estimate than is possible with one observation only. Replication provides an efficient way of increasing the precision of an experiment: the precision increases with the number of observations, since replication provides more observations on the same treatment. For example, if the variance of X is σ², then the variance of the sample mean based on n observations is σ²/n, so as n increases, σ²/n decreases.
Complete and incomplete block designs
In most experiments, the available experimental units are grouped into blocks of similar characteristics, in order to remove the blocking effect from the experimental error. Such designs are termed block designs.

When the number of treatments is so large that a full replication in each block makes the block too heterogeneous with respect to the characteristic under study, smaller but homogeneous blocks can be used. In such a case, the blocks do not contain a full replicate of the treatments. Experimental designs with blocks containing an incomplete replication of the treatments are called incomplete block designs.
Completely randomized design (CRD):
All the variability among the experimental units goes into the experimental error.
CRD is used when the experimental material is homogeneous.
CRD is often inefficient.
CRD is more useful when the experiments are conducted inside a laboratory.
CRD is well suited to a small number of treatments and homogeneous experimental material.
Layout of CRD
The following steps are needed to design a CRD:
Divide the entire experimental material or area into a number of experimental units, say n.
Fix the number of replications for the different treatments in advance (for a given total number of available experimental units).
No local control measure is provided as such, except that the error variance can be reduced by choosing a homogeneous set of experimental units.
Procedure
Let the v treatments be numbered 1, 2, ..., v and let n_i be the number of replications required for the ith treatment, such that Σ_{i=1}^{v} n_i = n.
Select n_1 units out of the n units at random and apply treatment 1 to these n_1 units. (Note: this is how the randomization principle is utilized in CRD.)
Select n_2 units out of the remaining (n − n_1) units at random and apply treatment 2 to these n_2 units.
Continue this procedure until all the treatments have been allocated.
Generally an equal number of replications is allocated to each treatment, unless practical limitations dictate otherwise or some treatments are more variable and/or of more interest.
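A small Python sketch of this randomization step (the treatment counts are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(4)
n_i = {1: 4, 2: 4, 3: 4}                 # replications per treatment (hypothetical)
n = sum(n_i.values())

units = rng.permutation(n)               # random order of the n experimental units
assignment = {}
start = 0
for t, r in n_i.items():                 # give the next r random units treatment t
    for u in units[start:start + r]:
        assignment[u] = t
    start += r
print([assignment[u] for u in range(n)]) # treatment applied to each unit
```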
Analysis
There is only one factor affecting the outcome: the treatment effect. So the set-up of one-way analysis of variance is used. Let

y_ij: individual measurement of the jth experimental unit for the ith treatment, i = 1, 2, ..., v, j = 1, 2, ..., n_i.

The y_ij are independently distributed following N(μ + α_i, σ²) with Σ_{i=1}^{v} n_i α_i = 0, where

μ: overall mean,
α_i: ith treatment effect.

The hypotheses are

H_0: α_1 = α_2 = ... = α_v = 0,
H_1: not all α_i are equal.
The data are arranged treatment-wise as

Treatment:  1        2        ...   v
            y_11     y_21     ...   y_v1
            y_12     y_22     ...   y_v2
            ...      ...      ...   ...
            y_1n_1   y_2n_2   ...   y_vn_v
Totals:     T_1      T_2      ...   T_v

where T_i = Σ_{j=1}^{n_i} y_ij is the treatment total due to the ith effect and

G = Σ_{i=1}^{v} T_i = Σ_{i=1}^{v} Σ_{j=1}^{n_i} y_ij

is the grand total.
To derive the test for H_0, we can use either the likelihood ratio test or the principle of least squares. Since the likelihood ratio test has already been derived earlier, we choose to demonstrate the use of the least squares principle. The ε_ij are identically and independently distributed random errors with mean 0 and variance σ²; the normality assumption on the ε_ij is not needed for the estimation of the parameters, but it will be needed for deriving the distributions of the various sums of squares. Let

S = Σ_{i=1}^{v} Σ_{j=1}^{n_i} ε_ij² = Σ_{i=1}^{v} Σ_{j=1}^{n_i} (y_ij − μ − α_i)².
The normal equations are

∂S/∂μ = 0 ⟹ n μ + Σ_{i=1}^{v} n_i α_i = G,
∂S/∂α_i = 0 ⟹ n_i μ + n_i α_i = Σ_{j=1}^{n_i} y_ij,  i = 1, 2, ..., v.

Using Σ_{i=1}^{v} n_i α_i = 0, we get

μ̂ = ȳ_oo,  α̂_i = ȳ_io − ȳ_oo,

where

ȳ_io = (1/n_i) Σ_{j=1}^{n_i} y_ij,  ȳ_oo = (1/n) Σ_{i=1}^{v} Σ_{j=1}^{n_i} y_ij.
The fitted model, obtained by substituting the estimates μ̂ and α̂_i in the linear model, is

y_ij = μ̂ + α̂_i + ε̂_ij = ȳ_oo + (ȳ_io − ȳ_oo) + (y_ij − ȳ_io),

or

y_ij − ȳ_oo = (ȳ_io − ȳ_oo) + (y_ij − ȳ_io).

Squaring both sides and summing over all the observations (the cross product vanishes), we have

Σ_{i=1}^{v} Σ_{j=1}^{n_i} (y_ij − ȳ_oo)² = Σ_{i=1}^{v} n_i (ȳ_io − ȳ_oo)² + Σ_{i=1}^{v} Σ_{j=1}^{n_i} (y_ij − ȳ_io)²,

i.e.,

Total sum of squares = sum of squares due to treatment effects + sum of squares due to error,
TSS = SSTr + SSE.

Since Σ_{i=1}^{v} Σ_{j=1}^{n_i} (y_ij − ȳ_oo) = 0, TSS is based on the sum of (n − 1) squared quantities; thus TSS carries (n − 1) degrees of freedom.
Since Σ_{i=1}^{v} n_i (ȳ_io − ȳ_oo) = 0, SSTr is based only on the sum of (v − 1) squared quantities; thus SSTr carries only (v − 1) degrees of freedom.

Since Σ_{j=1}^{n_i} (y_ij − ȳ_io) = 0 for each i = 1, 2, ..., v, SSE is based on n squared quantities subject to v constraints; thus SSE carries (n − v) degrees of freedom.

Using the Fisher–Cochran theorem,

TSS = SSTr + SSE

with the degrees of freedom partitioned as

(n − 1) = (v − 1) + (n − v).
Moreover, the equality TSS = SSTr + SSE has to hold exactly. In order to ensure that it does, one of the sums of squares is found through subtraction; generally it is recommended to find SSE by subtraction:

SSE = TSS − SSTr,

where

TSS = Σ_{i=1}^{v} Σ_{j=1}^{n_i} y_ij² − G²/n,  G = Σ_{i=1}^{v} Σ_{j=1}^{n_i} y_ij,  G²/n: correction factor,

SSTr = Σ_{i=1}^{v} T_i²/n_i − G²/n,  T_i = Σ_{j=1}^{n_i} y_ij.
Now under H_0: α_1 = α_2 = ... = α_v = 0, the model becomes

Y_ij = μ + ε_ij,

and minimizing S = Σ_{i=1}^{v} Σ_{j=1}^{n_i} ε_ij² with respect to μ gives

∂S/∂μ = 0 ⟹ μ̂ = G/n = ȳ_oo.

The TSS under H_0 contains variation due only to the random error, whereas the earlier decomposition TSS = SSTr + SSE contains variation due to both treatments and error. The difference between the two provides the effect of the treatments in terms of a sum of squares:

SSTr = Σ_{i=1}^{v} n_i (ȳ_io − ȳ_oo)².
Expectations

E(SSTr) = E[ Σ_{i=1}^{v} n_i (ȳ_io − ȳ_oo)² ]
        = Σ_{i=1}^{v} n_i E(α_i + ε̄_io − ε̄_oo)²
        = Σ_{i=1}^{v} n_i α_i² + Σ_{i=1}^{v} n_i (σ²/n_i − σ²/n)
        = Σ_{i=1}^{v} n_i α_i² + (v − 1) σ²,

so

E(MSTr) = E(SSTr/(v − 1)) = σ² + (1/(v − 1)) Σ_{i=1}^{v} n_i α_i².

Similarly,

E(SSE) = E[ Σ_{i=1}^{v} Σ_{j=1}^{n_i} (ε_ij − ε̄_io)² ]
       = Σ_{i=1}^{v} Σ_{j=1}^{n_i} E(ε_ij²) − Σ_{i=1}^{v} n_i E(ε̄_io²)
       = n σ² − v σ² = (n − v) σ²,

so

E(MSE) = E(SSE/(n − v)) = σ².

In general E(MSTr) ≠ σ², but under H_0 all α_i = 0 and so E(MSTr) = σ².
Under the normality of the ε_ij, the y_ij are also normal, being linear combinations of the ε_ij. Then

SSTr/σ² ~ χ²(v − 1) under H_0,
SSE/σ² ~ χ²(n − v),

and

MSTr/MSE ~ F(v − 1, n − v) under H_0.

H_0 is rejected at the α* level of significance if F > F_{1−α*}(v − 1, n − v). [Note: we denote the level of significance here by α* because α has been used for denoting the treatment effects.]
The analysis of variance table is:

Source of variation    Degrees of freedom   Sum of squares   Mean squares   F-value
Between treatments     v − 1                SSTr             MSTr           MSTr/MSE
Error                  n − v                SSE              MSE
Total                  n − 1                TSS
Randomized block design (RBD)

The responses from the b levels of blocks and the v levels of treatments can be arranged in a two-way layout. The observed data are set out in a b × v table: y_ij denotes the observation in the ith block on the jth treatment. The row (block) totals are B_i = y_io, i = 1, 2, ..., b, the column (treatment) totals are T_j = y_oj, j = 1, 2, ..., v, and the grand total is

G = y_oo = Σ_{i=1}^{b} B_i = Σ_{j=1}^{v} T_j.
Layout
A two-way layout is called a randomized block design (RBD) or a randomized complete block design (RCBD) if, within each block, the v treatments are randomly assigned to the v experimental units such that each of the v! ways of assigning the treatments to the units has the same probability of being adopted in the experiment, and the assignments in different blocks are statistically independent.
The RBD utilizes the principles of design (randomization, replication and local control) in the following way:

1. Randomization
Number the v treatments 1, 2, ..., v.
Number the units in each block 1, 2, ..., v.
Randomly allocate the v treatments to the v experimental units in each block.

2. Replication
Since each treatment appears once in each block, every treatment appears in all the blocks, so each treatment can be considered as replicated as many times as there are blocks. Thus in an RBD the number of blocks and the number of replications are the same.
3. Local control
Local control is adopted in an RBD in the following way:
First form homogeneous blocks of the experimental units.
Then allocate each treatment randomly within each block.
The error variance will now be smaller because of the homogeneous blocks, as some of the variance is removed from the error variance as the difference among the blocks.
Example
Suppose there are 7 treatments, denoted T1, T2, ..., T7, corresponding to 7 levels of a factor, to be arranged in 4 blocks. One possible layout of the assignment of the 7 treatments to the 4 blocks in an RBD is:

Block 1: T2 T7 T3 T5 T1 T4 T6
Block 2: T1 T6 T7 T4 T5 T3 T2
Block 3: T7 T5 T1 T6 T4 T2 T3
Block 4: T4 T1 T5 T6 T2 T7 T3
Analysis
Let y_ij be the individual measurement of the jth treatment in the ith block, i = 1, 2, ..., b, j = 1, 2, ..., v. The y_ij are independently distributed following N(μ + β_i + τ_j, σ²), where

μ: overall mean effect,
β_i: ith block effect,
τ_j: jth treatment effect,

such that Σ_{i=1}^{b} β_i = 0 and Σ_{j=1}^{v} τ_j = 0. The null hypotheses are

H_0B: β_1 = β_2 = ... = β_b = 0,
H_0T: τ_1 = τ_2 = ... = τ_v = 0.

The linear model in this case is the two-way model

y_ij = μ + β_i + τ_j + ε_ij,

where the ε_ij are identically and independently distributed random errors following a normal distribution with mean 0 and variance σ².
The tests of hypotheses can be derived using the likelihood ratio test or the principle of least squares. Since the use of the likelihood ratio test has already been demonstrated, we now use the principle of least squares. Minimizing

S = Σ_{i=1}^{b} Σ_{j=1}^{v} ε_ij² = Σ_{i=1}^{b} Σ_{j=1}^{v} (y_ij − μ − β_i − τ_j)²

by solving

∂S/∂μ = 0,  ∂S/∂β_i = 0,  ∂S/∂τ_j = 0 for all i = 1, 2, ..., b, j = 1, 2, ..., v,

the least squares estimators are obtained as

μ̂ = ȳ_oo,  β̂_i = ȳ_io − ȳ_oo,  τ̂_j = ȳ_oj − ȳ_oo.

The fitted model is

y_ij = μ̂ + β̂_i + τ̂_j + ε̂_ij = ȳ_oo + (ȳ_io − ȳ_oo) + (ȳ_oj − ȳ_oo) + (y_ij − ȳ_io − ȳ_oj + ȳ_oo).
Squaring both sides and summing over i and j gives

Σ_{i=1}^{b} Σ_{j=1}^{v} (y_ij − ȳ_oo)² = v Σ_{i=1}^{b} (ȳ_io − ȳ_oo)² + b Σ_{j=1}^{v} (ȳ_oj − ȳ_oo)² + Σ_{i=1}^{b} Σ_{j=1}^{v} (y_ij − ȳ_io − ȳ_oj + ȳ_oo)²,

or

TSS = SSBl + SSTr + SSE,

with degrees of freedom partitioned as

bv − 1 = (b − 1) + (v − 1) + (b − 1)(v − 1).

The reasoning behind the degrees of freedom of the different sums of squares is the same as in the case of CRD.
Here

TSS = Σ_{i=1}^{b} Σ_{j=1}^{v} y_ij² − G²/(bv),  G²/(bv): correction factor,

SSBl = (1/v) Σ_{i=1}^{b} B_i² − G²/(bv),  B_i = Σ_{j=1}^{v} y_ij: ith block total,

SSTr = (1/b) Σ_{j=1}^{v} T_j² − G²/(bv),  T_j = Σ_{i=1}^{b} y_ij: jth treatment total.
The expected mean squares are

E(MSBl) = E(SSBl/(b − 1)) = σ² + (v/(b − 1)) Σ_{i=1}^{b} β_i²,
E(MSTr) = E(SSTr/(v − 1)) = σ² + (b/(v − 1)) Σ_{j=1}^{v} τ_j²,
E(MSE) = E(SSE/((b − 1)(v − 1))) = σ².

Moreover,

SSBl/σ² ~ χ²(b − 1),  SSTr/σ² ~ χ²(v − 1),  SSE/σ² ~ χ²((b − 1)(v − 1)).
Under H_0B: β_1 = β_2 = ... = β_b = 0, E(MSBl) = E(MSE) and

F_Bl = MSBl/MSE ~ F((b − 1), (b − 1)(v − 1)).

Similarly, under H_0T: τ_1 = τ_2 = ... = τ_v = 0, E(MSTr) = E(MSE) and

F_Tr = MSTr/MSE ~ F((v − 1), (b − 1)(v − 1)).

Reject H_0B if F_Bl > F_{1−α}((b − 1), (b − 1)(v − 1)); reject H_0T if F_Tr > F_{1−α}((v − 1), (b − 1)(v − 1)).

If H_0B is accepted, this indicates that blocking is not necessary for future experimentation. If H_0T is rejected, this indicates that the treatments are different; multiple comparison tests are then used to divide the entire set of treatments into subgroups such that the treatments in the same subgroup have the same treatment effect and those in different subgroups have different treatment effects.
The analysis of variance table is:

Source of variation   Degrees of freedom   Sum of squares   Mean squares   F-value
Blocks                b − 1                SSBl             MSBl           F_Bl = MSBl/MSE
Treatments            v − 1                SSTr             MSTr           F_Tr = MSTr/MSE
Error                 (b − 1)(v − 1)       SSE              MSE
Total                 bv − 1               TSS
In an RBD there are only two factors, the block and treatment effects, which are taken into account. The total number of experimental units needed for a complete replication is bv, where b and v are the numbers of blocks and treatments respectively. If there are three factors with, say, b, v and k levels, then the total number of experimental units needed for a complete replication is bvk. This increases the cost of experimentation and the required number of experimental units compared with an RBD.
In a Latin square design (LSD), the experimental material is divided into rows and columns, each having the same number of experimental units, equal to the number of treatments. The treatments are allocated to the rows and columns such that each treatment occurs once and only once in each row and in each column.

In order to allocate the treatments to the experimental units in rows and columns, we take help from Latin squares.

Latin square
A Latin square of order p is an arrangement of p symbols in p² cells, arranged in p rows and p columns, such that each symbol occurs once and only once in each row and in each column. For example, to write a Latin square of order 4, choose four symbols A, B, C and D (Latin letters, whence the name) and write them so that each letter occurs once and only once in each row and each column, e.g.

A B C D
B C D A
C D A B
D A B C
Example:
Suppose different brands of petrol are to be compared with respect to the mileage per liter achieved in motor cars. Important factors responsible for variation in the mileage are the differences among cars and among drivers, so we have three factors: cars, drivers and petrol brands. Suppose there are 4 cars, 4 drivers and 4 brands of petrol. A complete replication would require 4 × 4 × 4 = 64 experiments. We choose only 16 experiments; to choose these 16, we take the help of a Latin square. Suppose we choose a 4 × 4 Latin square, write it in rows and columns, and assign rows to cars, columns to drivers and letters to petrol brands; for instance, the letter in row 4 and the column of driver d (say C) means that driver d will use petrol C in car 4. Thus 16 observations are recorded as per this plan of treatment combinations, and the analysis is carried out. Since such a design is based on a Latin square, it is called a Latin square design.

Choosing a different Latin square gives a different design; the 16 observations would be recorded again, but based on different treatment combinations. Since we use only 16 out of the 64 possible observations, this is an incomplete three-way layout in which each of the 3 factors (cars, drivers and petrol brands) is at 4 levels, and observations are recorded on only 16 of the 64 possible treatment combinations.
Thus in an LSD the row and column variations are eliminated from the within-treatment variation. In an RBD, the experimental units are divided into homogeneous blocks according to one blocking factor, which eliminates the differences among blocks from the experimental error. In an LSD, the experimental units are grouped according to two factors, so two effects (like two block effects) are removed from the experimental error. Hence the error variance can be reduced considerably in an LSD.

The LSD is an incomplete three-way layout in which each of the three factors (rows, columns and treatments) is at v levels, and observations are taken on only v² of the v³ possible treatment combinations. Each treatment combination contains one level of each factor.

The analysis of data in an LSD is conditional in the sense that it depends on which Latin square is used for allocating the treatments; if the Latin square changes, the conclusions may also change. Latin squares play an important role in an LSD, so first we study these Latin squares in more detail before describing the analysis of variance.
Given a Latin square, it is possible to rearrange its rows and columns so that the first row and first column are in natural order; such a square is said to be in standard form.

Example: the four standard forms of a 4 × 4 Latin square are:

A B C D    A B C D    A B C D    A B C D
B A D C    B C D A    B D A C    B A D C
C D B A    C D A B    C A D B    C D A B
D C A B    D A B C    D C B A    D C B A

For each standard Latin square of order p, the p rows can be permuted in p! ways; keeping a row fixed, the (p − 1) columns can be varied and permuted in (p − 1)! ways. So each standard square yields p!(p − 1)! different Latin squares.
For illustration:

Size of square   Number of standard squares   Value of p!(p − 1)!   Total number of different squares
3 × 3            1                            12                    12
4 × 4            4                            144                   576
5 × 5            56                           2880                  161280
6 × 6            9408                         86400                 812851200
Conjugate
Two standard Latin squares are called conjugate if the rows of one are the columns of the other. A Latin square is called self-conjugate if its arrangements in rows and columns are the same.

Transformation set
The set of all Latin squares obtained from a single Latin square by permuting its rows, columns and symbols is called a transformation set; its size is the number of different Latin squares of order p reachable by such permutations.
Greco-Latin square
A pair of orthogonal Latin squares, one with Latin symbols and the other with Greek symbols, forms a Greco-Latin square; one valid example of order 4 (the Greek letters show one orthogonal pairing) is

Aα Bβ Cγ Dδ
Bγ Aδ Dα Cβ
Cδ Dγ Aβ Bα
Dβ Cα Bδ Aγ

A Greco-Latin square design enables one to consider one more factor than a Latin square design. For example, in the earlier example, if there are four drivers, four cars and four petrol brands, and each brand has four variants α, β, γ and δ, then the Greco-Latin square decides the treatment combinations: rows for cars, columns for drivers, Latin letters for brands and Greek letters for variants, and so on.

Randomization of a Latin square
For Latin squares of order less than 5, fix the first row (i.e., start from a standard square) and then randomize the rows and columns. For Latin squares of order 5 or more, there is no need to fix even the first row; just randomize all rows and columns.
Example
Suppose the following 5 × 5 Latin square is chosen:

A B C D E
B C D E A
D E A B C
E A B C D
C D E A B

Now randomize the columns: say the 5th column becomes the 1st column, the 1st column becomes the 4th column and the 4th column becomes the 5th column (the 2nd and 3rd stay in place). This gives

E B C A D
A C D B E
C E A D B
D A B E C
B D E C A
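A small Python sketch of this randomization (it starts from a cyclic square; the point is the row/column permutation logic, and the assertions confirm the result is still a Latin square):

```python
import numpy as np

rng = np.random.default_rng(5)
v = 5
base = np.array([[(i + j) % v for j in range(v)] for i in range(v)])  # cyclic square

square = base[rng.permutation(v)][:, rng.permutation(v)]  # randomize rows, columns
letters = np.array(list("ABCDE"))
print(letters[square])                                    # the randomized square

# sanity check: each symbol appears exactly once per row and per column
assert all(len(set(row)) == v for row in square)
assert all(len(set(col)) == v for col in square.T)
```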
Analysis
Let y_ijk denote the observation on the kth treatment in the ith row and jth column, i = 1, 2, ..., v, j = 1, 2, ..., v, k = 1, 2, ..., v. The triplets (i, j, k) take only the v² values indicated by the particular Latin square chosen for the experiment. The y_ijk are independently distributed as N(μ + α_i + β_j + τ_k, σ²). The linear model is

y_ijk = μ + α_i + β_j + τ_k + ε_ijk,

where the ε_ijk are random errors, identically and independently distributed following N(0, σ²), with

Σ_{i=1}^{v} α_i = 0,  Σ_{j=1}^{v} β_j = 0,  Σ_{k=1}^{v} τ_k = 0,

and where α_i, β_j and τ_k are the row, column and treatment effects. The null hypotheses are

H_0R: α_1 = α_2 = ... = α_v = 0,
H_0C: β_1 = β_2 = ... = β_v = 0,
H_0T: τ_1 = τ_2 = ... = τ_v = 0.
The analysis of variance is developed along the same lines as earlier. Minimizing S = Σ Σ Σ ε_ijk² (the sum running over the v² cells of the square) gives the least squares estimates

μ̂ = ȳ_ooo,
α̂_i = ȳ_ioo − ȳ_ooo,  i = 1, 2, ..., v,
β̂_j = ȳ_ojo − ȳ_ooo,  j = 1, 2, ..., v,
τ̂_k = ȳ_ook − ȳ_ooo,  k = 1, 2, ..., v.

The sums of squares are

TSS = Σ Σ Σ (y_ijk − ȳ_ooo)² = Σ Σ Σ y_ijk² − G²/v²,

SSR = v Σ_{i=1}^{v} (ȳ_ioo − ȳ_ooo)² = Σ_{i=1}^{v} R_i²/v − G²/v²,  R_i: ith row total,
SSC = v Σ_{j=1}^{v} (ȳ_ojo − ȳ_ooo)² = Σ_{j=1}^{v} C_j²/v − G²/v²,  C_j: jth column total,
SSTr = v Σ_{k=1}^{v} (ȳ_ook − ȳ_ooo)² = Σ_{k=1}^{v} T_k²/v − G²/v²,  T_k: kth treatment total,

and SSE = TSS − SSR − SSC − SSTr.
The degrees of freedom carried by SSR, SSC and SSTr are (v − 1) each; TSS carries v² − 1, and SSE carries (v² − 1) − 3(v − 1) = (v − 1)(v − 2) degrees of freedom.
The expectations of the mean squares are obtained as

E(MSR) = E(SSR/(v − 1)) = σ² + (v/(v − 1)) Σ_{i=1}^{v} α_i²,
E(MSC) = E(SSC/(v − 1)) = σ² + (v/(v − 1)) Σ_{j=1}^{v} β_j²,
E(MSTr) = E(SSTr/(v − 1)) = σ² + (v/(v − 1)) Σ_{k=1}^{v} τ_k²,
E(MSE) = E(SSE/((v − 1)(v − 2))) = σ².
Thus

under H_0R: F_R = MSR/MSE ~ F((v − 1), (v − 1)(v − 2)),
under H_0C: F_C = MSC/MSE ~ F((v − 1), (v − 1)(v − 2)),
under H_0T: F_T = MSTr/MSE ~ F((v − 1), (v − 1)(v − 2)).

Decision rules: reject H_0R, H_0C or H_0T at level α if the corresponding F statistic exceeds F_{1−α}((v − 1), (v − 1)(v − 2)).
Source of variation   Degrees of freedom   Sum of squares   Mean squares   F-value
Rows                  v − 1                SSR              MSR            F_R
Columns               v − 1                SSC              MSC            F_C
Treatments            v − 1                SSTr             MSTr           F_T
Error                 (v − 1)(v − 2)       SSE              MSE
Total                 v² − 1               TSS
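A minimal Python sketch of the LSD computations (hypothetical 4 × 4 data; the treatment of each cell is read off the chosen Latin square):

```python
import numpy as np
from scipy import stats

v = 4
treat = np.array([[(i + j) % v for j in range(v)] for i in range(v)])  # cyclic plan
rng = np.random.default_rng(6)
y = 10 + rng.normal(0, 1, (v, v))                 # hypothetical responses

G = y.sum(); CF = G**2 / v**2
TSS = (y**2).sum() - CF
SSR = (y.sum(axis=1)**2).sum() / v - CF           # rows
SSC = (y.sum(axis=0)**2).sum() / v - CF           # columns
Tk = np.array([y[treat == k].sum() for k in range(v)])
SSTr = (Tk**2).sum() / v - CF                     # treatments
SSE = TSS - SSR - SSC - SSTr                      # df = (v-1)(v-2)

dfe = (v - 1) * (v - 2)
for name, ss in [("rows", SSR), ("cols", SSC), ("treat", SSTr)]:
    F = (ss / (v - 1)) / (SSE / dfe)
    print(name, F, stats.f.sf(F, v - 1, dfe))
```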
Missing observations
Missing observations occur commonly in practice. For example, in a clinical trial, suppose the readings of blood pressure are to be recorded three days after giving the medicine to the patients; the medicine is given to 20 patients and one of the patients does not turn up to provide the blood pressure reading. Similarly, in an agricultural experiment the seeds are sown and the yields are to be recorded after a few months; cattle may destroy the crop of some plot, or the crop of some plot may be destroyed by a storm, insects, etc.
We discuss here the classical missing plot technique proposed by Yates, which involves the following steps:
Estimate the missing observations by the values which make the error sum of squares minimum.
Substitute unknowns for the missing observations and express the error sum of squares as a function of these unknown values.
Minimize the error sum of squares using the principle of maxima/minima, i.e., differentiate it with respect to each missing value, set the derivative to zero, and form a linear equation.
Form as many linear equations as there are unknown values (i.e., differentiate the error sum of squares with respect to each unknown value).
Solve all the linear equations simultaneously; the solutions provide the estimated missing values.
Impute the missing values with the estimated values, complete the data, and apply the analysis of variance tools.
The error sum of squares thus obtained is correct, but the treatment sum of squares is not corrected. The number of degrees of freedom associated with the total sum of squares is reduced by the number of missing values, and this reduction is adjusted in the error sum of squares; no change in the degrees of freedom of the treatment sum of squares is needed.
Missing observation in an RBD
Suppose one observation in a randomized block design is missing; denote it y_ij = x, the observation on the ith treatment in the jth block. Arrange the data in the usual two-way table of treatments and blocks and let

y_oo': the total of the known observations, so the completed grand total is G' = y_oo' + x,
y_io': the total of the known observations receiving the ith treatment, so the completed treatment total is T_i = y_io' + x,
y_oj': the total of the known observations in the jth block, so the completed block total is B_j = y_oj' + x.
The correction factor is

CF = (y_oo' + x)²/(bv),

and, regarded as functions of x,

TSS = Σ Σ y_ij² − CF,
SSE = TSS − SSBl − SSTr
    = x² − (y_oj' + x)²/v − (y_io' + x)²/b + (y_oo' + x)²/(bv) + terms independent of x.

Setting ∂(SSE)/∂x = 0 gives

x̂ = (v y_io' + b y_oj' − y_oo') / ((b − 1)(v − 1)).
If two observations, say x and y, are missing (in different rows and columns), let R_1 and R_2 be the totals of the known observations in the treatments (rows) containing x and y, C_1 and C_2 the totals of the known observations in the blocks (columns) containing x and y, and S the total of all the known observations. Then

SSE = x² + y² − (1/b)[(R_1 + x)² + (R_2 + y)²] − (1/v)[(C_1 + x)² + (C_2 + y)²] + (1/(bv))(S + x + y)² + terms independent of x and y.

Setting

∂(SSE)/∂x = 0 ⟹ x − (R_1 + x)/b − (C_1 + x)/v + (S + x + y)/(bv) = 0,
∂(SSE)/∂y = 0 ⟹ y − (R_2 + y)/b − (C_2 + y)/v + (S + x + y)/(bv) = 0,

and solving these two linear equations in x and y gives the estimated missing values. After imputation:
(i) compute the sums of squares as usual from the completed data;
(ii) subtract the correct error sum of squares from (i); this gives the correct treatment sum of squares;
(iii) reduce the degrees of freedom of the error sum of squares by the number of missing observations.
Missing observation in an LSD
Let x be the missing observation in a v × v Latin square design, and let R, C and T be the totals of the known observations in the row, column and treatment containing x, with S the total of all the known observations. Treating the sums of squares as functions of x, with CF = (S + x)²/v²,

Total sum of squares (TSS) = x² + terms constant with respect to x − CF,
Row sum of squares (SSR) = (R + x)²/v + terms constant with respect to x − CF,
Column sum of squares (SSC) = (C + x)²/v + terms constant with respect to x − CF,
Treatment sum of squares (SSTr) = (T + x)²/v + terms constant with respect to x − CF.

Choose x such that SSE = TSS − SSR − SSC − SSTr is minimum:

d(SSE)/dx = 0 ⟹ 2x − (2/v)(R + C + T + 3x) + 4(S + x)/v² = 0,

or

x̂ = [v(R + C + T) − 2S] / ((v − 1)(v − 2)).
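A small Python sketch of Yates's single-missing-value formula for an RBD (hypothetical data; the missing cell is imputed and the analysis then proceeds as usual, with the error degrees of freedom reduced by one):

```python
import numpy as np

# Hypothetical b x v RBD data (rows = blocks, columns = treatments); NaN = missing
y = np.array([[12.1, 13.4, 11.8, 12.9],
              [11.5, np.nan, 11.2, 12.3],
              [13.0, 14.1, 12.5, 13.6]])
b, v = y.shape
i, j = map(int, np.argwhere(np.isnan(y))[0])   # position of the missing value

Bp = np.nansum(y[i, :])                        # known total of the block with the gap
Tp = np.nansum(y[:, j])                        # known total of its treatment
Gp = np.nansum(y)                              # grand total of known observations

x = (b * Bp + v * Tp - Gp) / ((b - 1) * (v - 1))
y[i, j] = x                                    # impute, then run the usual ANOVA
print(x)
```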
FACTORIAL EXPERIMENTS
Dr. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur
Factorial experiments involve more than one factor simultaneously, each at two or more levels. Several factors simultaneously affect the characteristic under study in factorial experiments, and the experimenter is interested in the main effects and the interaction effects among the different factors. First we consider an example to understand the utility of factorial experiments.
Example: Suppose the yield from the different plots in an agricultural experiment depends upon
(i) the variety of crop, and
(ii) the type of fertilizer.
In order to compare different fertilizers (or different dosages of fertilizer), one would sow a single variety of crop on all the plots and vary the quantity of fertilizer from plot to plot. The conclusions become invalid if different varieties of crop are sown: it is quite possible that one variety responds differently than another to a particular type of fertilizer.
For instance, with two crop varieties a and b and three fertilizer treatments A, B and C, one can lay out designs in which each block contains only one variety (so the fertilizer differences are estimable within a variety) or designs in which each block contains only one fertilizer; in the latter case the difference between crop varieties is estimable but the difference between fertilizer treatments is not.

Factorial experiments overcome this difficulty by combining each crop with each fertilizer treatment. There are six treatment combinations:

aA, aB, aC, bA, bB, bC.

Keeping the total number of observations at 18 (as earlier), we can use an RBD with three blocks of six plots each, e.g.

Block 1: bA aC aB bB aA bC
Block 2: aA aC bC aB bB bA
Block 3: bB aB bA aC aA bC
Factorial experiments involve more than one factor simultaneously, each at two or more levels. If the number of levels is the same for each factor, we call it a symmetrical factorial experiment; if the number of levels differs across factors, we call it an asymmetrical or mixed factorial experiment. We consider only symmetrical factorial experiments.

Through factorial experiments we can study both the individual effect of each factor and the interaction effects. Now we consider a 2² factorial experiment with an example and try to develop and understand the theory and notation through this example.
A general notation for representing the factors is to use capital letters, e.g., A, B, C, etc., with the levels of a factor represented by small letters. For example, the two levels of A are denoted a_0 and a_1; similarly, the two levels of B are b_0 and b_1. An alternative representation of the two levels of A is 0 (for a_0) and 1 (for a_1); the levels of B are then 0 (for b_0) and 1 (for b_1).

Note: an important point to remember is that factorial experiments are conducted within a design of experiment; for example, a factorial experiment may be conducted as an RBD.
Suppose the response of a fan depends on the current and the voltage. The two factors are denoted A (say, for current) and B (say, for voltage), each at two levels. In order to make an experiment, there are 4 different combinations of the values of current and voltage:

1. a_0b_0: low current, low voltage,
2. a_0b_1: low current, high voltage,
3. a_1b_0: high current, low voltage,
4. a_1b_1: high current, high voltage.

The responses from these treatment combinations are represented by (1), (b), (a) and (ab), respectively.
Now, writing C for current (factor A) and V for voltage (factor B), consider the following:

I. The effect of voltage at current level C_0 is (C_0V_1) − (C_0V_0) = (b) − (1), and the effect of voltage at current level C_1 is (C_1V_1) − (C_1V_0) = (ab) − (a). The average effect of voltage is then

B = (1/2){[(ab) − (a)] + [(b) − (1)]}.

II. The average interaction effect of voltage and current is obtained as

AB = (1/2){[average effect of voltage at current level C_1] − [average effect of voltage at current level C_0]}
   = (1/2){[(ab) − (a)] − [(b) − (1)]}.

Similarly, comparing the effects of current at the two voltage levels,

AB = (1/2){[average effect of current at voltage level V_1] − [average effect of current at voltage level V_0]}
   = (1/2){[(ab) − (b)] − [(a) − (1)]},

which is the same quantity. Comparisons like

A = (1/2){[(ab) − (b)] + [(a) − (1)]}

give the average effect of current at the different levels of voltage, i.e., the main effect of A. The quantity M = (1/4)[(1) + (a) + (b) + (ab)] is the general mean effect.
Treating (ab) as the symbolic product (a)(b) (mathematically and conceptually this is an abuse of notation), we can express all the main effects, the interaction effect and the general mean effect as

Main effect of A = (1/2)(a − 1)(b + 1) = (1/2)[(ab) + (a) − (b) − (1)],
Main effect of B = (1/2)(a + 1)(b − 1) = (1/2)[(ab) + (b) − (a) − (1)],
Interaction effect of A and B = (1/2)(a − 1)(b − 1) = (1/2)[(ab) + (1) − (a) − (b)],
General mean effect M = (1/4)(a + 1)(b + 1) = (1/4)[(ab) + (a) + (b) + (1)].

The correspondence between the notations is

a_0b_0 ↔ 00 ↔ (1),
a_0b_1 ↔ 01 ↔ (b),
a_1b_0 ↔ 10 ↔ (a),
a_1b_1 ↔ 11 ↔ (ab).

Sometimes 0 is referred to as the low level and 1 as the high level. The signs with which the treatment combinations enter each effect, together with the divisor, are:

Effect   (1)   (a)   (b)   (ab)   Divisor
M        +     +     +     +      4
A        −     +     −     +      2
B        −     −     +     +      2
AB       +     −     −     +      2
When the experiments are conducted factor by factor, many more resources are required than in a factorial experiment. For example, if we conduct an RBD for three levels of voltage V0, V1 and V2 and two levels of current I0 and I1, then to have 10 degrees of freedom for the error variance we need 6 replications on voltage and 11 replications on current, so the total number of fans needed is 40. For the factorial experiment with the 6 combinations of the 2 factors, the total number of fans needed for the same precision is 18.
We have considered the situation up to now assuming only one observation for each treatment combination, i.e., no replication. If r replicated observations are obtained for each treatment combination, the expressions for the main and interaction effects become

A = (1/2r)[(ab) + (a) − (b) − (1)],
B = (1/2r)[(ab) + (b) − (a) − (1)],
AB = (1/2r)[(ab) + (1) − (a) − (b)],
M = (1/4r)[(ab) + (a) + (b) + (1)],

where (1), (a), (b), (ab) now denote the totals of the r responses at the respective treatment combinations.
Now we detail the statistical theory and concepts related to these expressions. Let Y_* = ((1), (a), (b), (ab))' be the vector of total response values. Then

A = (1/2r) ℓ_A' Y_* = (1/2r)(−1, 1, −1, 1) Y_*,
B = (1/2r) ℓ_B' Y_* = (1/2r)(−1, −1, 1, 1) Y_*,
AB = (1/2r) ℓ_AB' Y_* = (1/2r)(1, −1, −1, 1) Y_*.

Note that A, B and AB are linear contrasts. Recall that a linear parametric function is estimable only when it is a linear contrast. Moreover, A, B and AB are mutually orthogonal contrasts in the total response values (1), (a), (b), (ab), apart from the factor 1/2r.

The sum of squares of a linear parametric function ℓ'y is given by (ℓ'y)²/(ℓ'ℓ); if there are r replicates, the sum of squares is (ℓ'y)²/(r ℓ'ℓ). It may also be recalled that, under the normality of the y's, this sum of squares divided by σ² has a χ² distribution with one degree of freedom (χ²(1)). Thus the various sums of squares due to A, B and AB are

SSA = (ℓ_A' Y_*)²/(r ℓ_A' ℓ_A) = (1/4r)[(ab) + (a) − (b) − (1)]²,
SSB = (ℓ_B' Y_*)²/(r ℓ_B' ℓ_B) = (1/4r)[(ab) + (b) − (a) − (1)]²,
SSAB = (ℓ_AB' Y_*)²/(r ℓ_AB' ℓ_AB) = (1/4r)[(ab) + (1) − (a) − (b)]².
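These contrast computations are mechanical. A small Python sketch (hypothetical treatment totals with r replicates) illustrates the effects and their sums of squares:

```python
import numpy as np

r = 3                                    # replicates per treatment combination
Y = np.array([60.0, 72.0, 66.0, 81.0])   # totals ((1), (a), (b), (ab)), hypothetical

lA  = np.array([-1,  1, -1, 1])          # contrast vectors in the order (1), a, b, ab
lB  = np.array([-1, -1,  1, 1])
lAB = np.array([ 1, -1, -1, 1])

for name, l in [("A", lA), ("B", lB), ("AB", lAB)]:
    effect = l @ Y / (2 * r)             # main / interaction effect
    ss = (l @ Y) ** 2 / (r * l @ l)      # sum of squares, 1 df each
    print(name, effect, ss)
```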
6
Each of SSA, SSB and SSAB has
12
under normality of y.
2
TSS = yijk
i =1 j =1 k =1
G2
4r
where
2
G = yijk
i =1 j =1 k =1
2 distribution with
(4r 1) 1 1 1 = 4( r 1)
degrees of freedom.
The mean squares are
MSA =
SSA
,
1
MSB =
SSB
,
1
MSAB =
MSE =
SSAB
,
1
SSA
.
4(r 1)
The F-statistics corresponding to A, B and AB are

F_A = MSA/MSE ~ F(1, 4(r − 1)) under H_0,
F_B = MSB/MSE ~ F(1, 4(r − 1)) under H_0,
F_AB = MSAB/MSE ~ F(1, 4(r − 1)) under H_0.

The analysis of variance table is:

Source   Sum of squares   Degrees of freedom   Mean squares   F-value
A        SSA              1                    MSA            F_A = MSA/MSE
B        SSB              1                    MSB            F_B = MSB/MSE
AB       SSAB             1                    MSAB           F_AB = MSAB/MSE
Error    SSE              4(r − 1)             MSE
Total    TSS              4r − 1

The decision rule is to reject the concerned null hypothesis when the value of the concerned F statistic exceeds its tabulated critical value.
2³ factorial experiment
Suppose that in a complete factorial experiment there are three factors, A, B and C, each at two levels, viz., a_0, a_1; b_0, b_1; and c_0, c_1 respectively. There are in total eight treatment combinations:

(1), a, b, ab, c, ac, bc, abc.

Each treatment combination has r replicates, so the total number of observations is N = 2³·r = 8r, and these are analyzed for their influence on the response. The factorial effects are M, A, B, AB, C, AC, BC and ABC, and each treatment combination enters each effect with a + or − sign according to the table of signs.
Note that once a few rows of this table of signs have been determined, the rest can be obtained by simple multiplication of the symbols. For example, consider the column corresponding to a: A has a + sign and B has a − sign, so AB has a − sign (= sign of A × sign of B); since AB has a − sign and C has a − sign, ABC has a + sign (= sign of AB × sign of C), and so on.

The first row (M, all + signs) is the basic element. With it, M = 1'Y_* can be computed, where 1 is a column vector with all elements unity. If any other row is multiplied by the first row, it stays unchanged (therefore we call the first row the identity and denote it I). Every other row has the same number of + and − signs. If + is replaced by 1 and − by −1, we obtain vectors of orthogonal contrasts, each with squared norm 8 (= 2³).

If each row is multiplied by itself, we obtain I (the first row). The product of any two rows leads to another row of the table; for example

A·B = AB,  AB·B = AB² = A,  AC·BC = ABC² = AB.

The structure of the table helps in estimating the average effects.
The average effect of A is

A = (1/4r)[(a) − (1) + (ab) − (b) + (ac) − (c) + (abc) − (bc)],

which is the average of the following four average effects of A:

(i) at the low level of B and low level of C: (a_1b_0c_0) − (a_0b_0c_0), i.e., (1/r)[(a) − (1)];
(ii) at the high level of B and low level of C: (a_1b_1c_0) − (a_0b_1c_0), i.e., (1/r)[(ab) − (b)];
(iii) at the low level of B and high level of C: (a_1b_0c_1) − (a_0b_0c_1), i.e., (1/r)[(ac) − (c)];
(iv) at the high levels of B and C: (a_1b_1c_1) − (a_0b_1c_1), i.e., (1/r)[(abc) − (bc)].

Hence, over all combinations of B and C, the average effect of A is the average of the four average effects in (i)–(iv).
Symbolically,

A = (1/4r)(a − 1)(b + 1)(c + 1) = (1/4r)[(a) + (ab) + (ac) + (abc) − (1) − (b) − (c) − (bc)],
B = (1/4r)(a + 1)(b − 1)(c + 1) = (1/4r)[(b) + (ab) + (bc) + (abc) − (1) − (a) − (c) − (ac)],
C = (1/4r)(a + 1)(b + 1)(c − 1) = (1/4r)[(c) + (ac) + (bc) + (abc) − (1) − (a) − (b) − (ab)],
AB = (1/4r)(a − 1)(b − 1)(c + 1) = (1/4r)[(1) + (ab) + (c) + (abc) − (a) − (b) − (ac) − (bc)],
AC = (1/4r)(a − 1)(b + 1)(c − 1) = (1/4r)[(1) + (b) + (ac) + (abc) − (a) − (ab) − (c) − (bc)],
BC = (1/4r)(a + 1)(b − 1)(c − 1) = (1/4r)[(1) + (a) + (bc) + (abc) − (b) − (ab) − (c) − (ac)],
ABC = (1/4r)(a − 1)(b − 1)(c − 1) = (1/4r)[(abc) + (a) + (b) + (c) − (ab) − (ac) − (bc) − (1)].
The corresponding sums of squares, each divided by σ², follow a χ² distribution with one degree of freedom under the normality of Y_*. The corresponding mean squares are obtained as

MS(Effect) = SS(Effect)/(degrees of freedom),

and

F(Effect) = MS(Effect)/MS(Error)

follows an F-distribution with 1 and the error degrees of freedom under the respective null hypothesis. The decision rule is to reject the corresponding null hypothesis at the α level of significance whenever the F statistic exceeds its tabulated critical value. These outcomes are presented in the following ANOVA table.
Sources   Sum of squares   Degrees of freedom   Mean squares        F
A         SSA              1                    MSA = SSA/1         F_A
B         SSB              1                    MSB = SSB/1         F_B
AB        SSAB             1                    MSAB = SSAB/1       F_AB
C         SSC              1                    MSC = SSC/1         F_C
AC        SSAC             1                    MSAC = SSAC/1       F_AC
BC        SSBC             1                    MSBC = SSBC/1       F_BC
ABC       SSABC            1                    MSABC = SSABC/1     F_ABC
Error     SS(Error)        8(r − 1)             MS(Error)
Total     TSS              8r − 1
FACTORIAL EXPERIMENTS
2^n Factorial experiment

Based on the theory developed for the 2^2 and 2^3 factorial experiments, we now extend it to the 2^n factorial experiment. Capital letters A, B, C, ... denote the factors; they are also the main effect contrasts for the factors A, B, C, ... The products AB, AC, BC, ... denote the first order (2-factor) interactions; ABC, ABD, BCD, ... denote the second order (3-factor) interactions, and so on. Each main effect and each interaction effect carries one degree of freedom.

Total number of first order interactions = C(n, 2),
total number of second order interactions = C(n, 3),

and so on.
For two factors A and B, the standard order is obtained by appending b and ab to the standard order of the single factor A; these are derived by multiplying (1) and a by b:

(1), a, b, ab.

For three factors, append c, ac, bc and abc, which are derived by multiplying the standard order of A and B by c:

(1), a, b, ab, c, ac, bc, abc.

Thus the standard order for any additional factor is obtained step by step by multiplying the preceding standard order by the additional letter. For example, the standard order of A, B, C and D in a 2^4 factorial experiment is

(1), a, b, ab, c, ac, bc, abc, d · {(1), a, b, ab, c, ac, bc, abc}
= (1), a, b, ab, c, ac, bc, abc, d, ad, bd, abd, cd, acd, bcd, abcd.
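As a small illustration, the following Python sketch builds the standard order for any number of factors by exactly this step-by-step multiplication (the factor letters are arbitrary placeholders):

```python
def standard_order(letters):
    """Standard order of treatment combinations, e.g. for 'abc':
    (1), a, b, ab, c, ac, bc, abc."""
    order = ["(1)"]
    for letter in letters:
        # Multiply every existing combination by the new letter and append.
        order += [letter if t == "(1)" else t + letter for t in order]
    return order

print(standard_order("abcd"))
# ['(1)', 'a', 'b', 'ab', 'c', 'ac', 'bc', 'abc', 'd', 'ad', 'bd', 'abd', ...]
```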
How to find the contrasts for main effects and interaction effects

Recall that earlier we illustrated how to write the contrasts for main and interaction effects. For example, in a 2^2 factorial experiment we expressed

A = (1/2)(a − 1)(b + 1) = (1/2)[−(1) + (a) − (b) + (ab)],
AB = (1/2)(a − 1)(b − 1) = (1/2)[(1) − (a) − (b) + (ab)].

In general, in a 2^n factorial experiment the general mean effect has divisor 2^n and any main effect or interaction effect of any order has divisor 2^{n−1}. For example, in a 2^6 factorial experiment, the general mean effect has divisor 2^6 and any main effect or interaction effect of any order has divisor 2^{6−1} = 2^5.
How to write contrasts

Method 1: Contrasts belonging to the main effects and the interaction effects are written as follows:

A = (a − 1)(b + 1)(c + 1) ... (z + 1)
B = (a + 1)(b − 1)(c + 1) ... (z + 1)
C = (a + 1)(b + 1)(c − 1) ... (z + 1)
...

each with the divisor 2^{n−1}, while the general mean effect M has the divisor 2^n. For example, in a 2^3 factorial experiment,

A = (1/2^{3−1})(a − 1)(b + 1)(c + 1)
  = (1/4)[−(1) + (a) − (b) + (ab) − (c) + (ac) − (bc) + (abc)],

M = (1/2^3)(a + 1)(b + 1)(c + 1)
  = (1/8)[(1) + (a) + (b) + (ab) + (c) + (ac) + (bc) + (abc)].
Method 2: Form a table such that the rows correspond to the main or interaction effects and the columns correspond to the treatment combinations (or the other way round). The + and − signs in the table indicate the sign of each treatment combination in each main or interaction effect. The signs are determined by the rule of odds and evens:

if the interaction has an even number of letters (AB, ABCD, ...), a treatment combination having an even number of letters in common with the interaction enters with a + sign, and one with an odd number of letters in common enters with a − sign;

if the interaction has an odd number of letters (A, ABC, ...), the rule is reversed.

Once a few rows are filled up, the others can be obtained by the multiplication rule: for example, the sign of ABCD is obtained as (sign of A × sign of BCD) or (sign of AB × sign of CD). The treatment combination (1) is taken to have an even number (zero) of letters in common with every interaction.
This rule of assignment of + or − is summarized in the following flow diagram:

Interaction with an even number of letters (AB, ABCD, ...): count the number of letters common between the treatment combination and the interaction; an even number of common letters gives a + sign, an odd number gives a − sign.

Interaction with an odd number of letters (A, ABC, ...): count the number of letters common between the treatment combination and the interaction; an even number of common letters gives a − sign, an odd number gives a + sign.
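A short Python sketch of this odds-and-evens rule (treatment combinations and effects written as strings; purely illustrative):

```python
def sign(effect, treatment):
    """Sign of a treatment combination (e.g. 'ab' or '(1)') in an effect
    (e.g. 'ABC'), by the rule of odds and evens."""
    t = "" if treatment == "(1)" else treatment         # (1) has zero common letters
    common = sum(1 for letter in effect if letter.lower() in t)
    if len(effect) % 2 == 0:                            # even-letter interaction
        return "+" if common % 2 == 0 else "-"
    return "-" if common % 2 == 0 else "+"              # odd-letter: rule reversed

# Reproduces the 2^3 table row for ABC: -, +, +, -, +, -, -, +
print([sign("ABC", t) for t in ["(1)", "a", "b", "ab", "c", "ac", "bc", "abc"]])
```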
Applying this rule reproduces the table of + and − signs for the 2^3 factorial experiment given earlier (rows I, A, B, AB, C, AC, BC, ABC against the treatment combinations (1), a, b, ab, c, ac, bc, abc).
FACTORIAL EXPERIMENTS
Sums of squares

Suppose the 2^n factorial experiment is carried out in a randomized block design with r replicates. Denote the total yield (output) from the r plots (experimental units) receiving a particular treatment combination by the same symbol within a square bracket. For example, [ab] denotes the total yield from the plots receiving the treatment combination (ab).
In a 2^2 factorial experiment, the factorial effect totals are

[A] = Σ_{i=1}^{r} [ y_i(ab) − y_i(b) + y_i(a) − y_i(1) ] = l_A' y_A (say),

where l_A is a vector of +1 and −1 coefficients and y_A is the vector of responses from ab, b, a and (1). Similarly, the other effect totals can be found. The sum of squares due to A is

SSA = (l_A' y_A)² / (r · 2²) = [A]² / (4r).

In a 2^n factorial experiment in an RBD, the divisor is r · 2^n. If a Latin square design based on a 2^n × 2^n Latin square is used, then r is replaced by 2^n.
Yates' method of computation arranges the calculation in columns:

Treatment combinations   Yield   Column (1)    Column (2)
(1)                      (1)     (1) + (a)     (1) + (a) + (b) + (ab) = [M]
a                        (a)     (b) + (ab)    (a) − (1) + (ab) − (b) = [A]
b                        (b)     (a) − (1)     (b) + (ab) − (1) − (a) = [B]
ab                       (ab)    (ab) − (b)    (ab) − (b) − (a) + (1) = [AB]

Note: the column operations are repeated n = 2 times in a 2^2 factorial experiment. Now

SSA = [A]²/(4r),   SSB = [B]²/(4r),   SSAB = [AB]²/(4r).
Example: Yates' procedure for a 2^3 factorial experiment

Treatment   Yield    (1)                 (2)             (3)             Effect total
(1)         (1)      u1 = (1) + (a)      v1 = u1 + u2    w1 = v1 + v2    [M]
a           (a)      u2 = (b) + (ab)     v2 = u3 + u4    w2 = v3 + v4    [A]
b           (b)      u3 = (c) + (ac)     v3 = u5 + u6    w3 = v5 + v6    [B]
ab          (ab)     u4 = (bc) + (abc)   v4 = u7 + u8    w4 = v7 + v8    [AB]
c           (c)      u5 = (a) − (1)      v5 = u2 − u1    w5 = v2 − v1    [C]
ac          (ac)     u6 = (ab) − (b)     v6 = u4 − u3    w6 = v4 − v3    [AC]
bc          (bc)     u7 = (ac) − (c)     v7 = u6 − u5    w7 = v6 − v5    [BC]
abc         (abc)    u8 = (abc) − (bc)   v8 = u8 − u7    w8 = v8 − v7    [ABC]

The sums of squares are obtained as follows when the design is RBD:

SS(Effect) = [Effect]² / (r · 2^3).
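Yates' procedure is easy to express in code. Below is a minimal Python sketch (illustrative data, standard library only) that applies the n rounds of pairwise sums and differences to treatment totals listed in standard order:

```python
def yates(totals, n, r):
    """Yates' algorithm for a 2^n factorial with r replicates.
    `totals` are treatment totals in standard order: (1), a, b, ab, c, ...
    Returns effect totals [M], [A], [B], [AB], ... and their sums of squares."""
    col = list(totals)
    for _ in range(n):  # one pass: pairwise sums, then pairwise differences
        col = [col[i] + col[i + 1] for i in range(0, len(col), 2)] + \
              [col[i + 1] - col[i] for i in range(0, len(col), 2)]
    ss = [e**2 / (r * 2**n) for e in col]   # SS(Effect) = [Effect]^2 / (r 2^n)
    return col, ss

effects, ss = yates([10.0, 14.0, 9.0, 17.0, 8.0, 13.0, 11.0, 16.0], n=3, r=2)
print(effects)  # [M], [A], [B], [AB], [C], [AC], [BC], [ABC]
print(ss[1:])   # sums of squares for A, B, AB, C, AC, BC, ABC
```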
For the analysis of a 2^n factorial experiment, the analysis of variance involves the partitioning of the treatment sum of squares so as to obtain the sums of squares due to the main and interaction effects of the factors. These sums of squares are mutually orthogonal, so the treatment sum of squares equals the total of all the SS due to main and interaction effects.

For example, in a 2^2 factorial experiment in an RBD with r replications, the division of the degrees of freedom and the treatment sum of squares is as follows:

Source          Degrees of freedom    Sum of squares
Replications    r − 1
Treatments      4 − 1 = 3
  A             1                     [A]² / 4r
  B             1                     [B]² / 4r
  AB            1                     [AB]² / 4r
Error           3(r − 1)
Total           4r − 1

The decision rule is to reject the concerned null hypothesis when the related F-statistic exceeds its critical value.
CONFOUNDING
A linear parametric function λ'β is said to be estimable if there exists a linear function l'y of the observations on the random variable y such that E(l'y) = λ'β. Now there arise two questions: firstly, what does confounding mean, and secondly, how does it compare to using a BIBD?

In order to understand confounding, let us consider a simple example of a 2^2 factorial with factors A and B, each at two levels. The four treatment combinations are (1), a, b and ab. Suppose each batch of raw material to be used in the experiment is enough for only two treatment combinations to be tested, so two batches of raw material are required; thus two out of the four treatment combinations must be assigned to each block. Suppose this 2^2 factorial experiment is being conducted in a randomized block design with the model

E(y_ij) = μ + β_i + τ_j;

then

A = (1/2r)[(ab) + (a) − (b) − (1)],
B = (1/2r)[(ab) + (b) − (a) − (1)],
AB = (1/2r)[(ab) + (1) − (a) − (b)].
Suppose the four treatment combinations are allotted to the two blocks as

Block 1: (1), ab        Block 2: a, b.

If the block effects of blocks 1 and 2 are β_1 and β_2 respectively, then the average responses corresponding to the treatment combinations a, b, ab and (1) are

E[y(a)] = μ + β_2 + (a),
E[y(b)] = μ + β_2 + (b),
E[y(ab)] = μ + β_1 + (ab),
E[y(1)] = μ + β_1 + (1),

respectively, where y(a), y(b), y(ab), y(1) and (a), (b), (ab), (1) denote the responses and the treatment effects corresponding to a, b, ab and (1). Substituting these expectations into the contrasts shows that the block effects cancel in A and B, but the contrast for AB picks up 2(β_1 − β_2); so under this arrangement A and B are estimable free of block effects, while AB is confounded with the block effects. Alternatively, if the arrangement of treatments in blocks were, say, Block 1: (1), a and Block 2: b, ab, then the same computation shows that the main effect B would be confounded instead.
We notice that it is in our control to decide which of the effects is to be confounded. The order in which treatments are run within a block is determined randomly, and the choice of which block to run first is also decided randomly.

For a given effect, when the two treatment combinations entering it with the same sign are assigned to one block and the other two treatment combinations, with the opposite sign, are assigned to the other block, then that effect gets confounded. The reason behind this observation is that if every block contains the treatment combinations of a contrast, then the effects are estimable and thus unconfounded. This is also evident from the theory of linear estimation: a linear parametric function is estimable if it is in the form of a linear contrast.

The contrasts which are not estimable are said to be confounded with the differences between blocks (or block effects). The contrasts which are estimable are said to be unconfounded with blocks, or free from block effects.
Now we explain how confounding and the BIBD compare. Consider a 2^3 factorial experiment, which needs a block size of 8, and suppose the raw material available for the experiment is sufficient only for blocks of size 4. One can use a BIBD in this case with parameters b = 14, k = 4, v = 8, r = 7 and λ = 3 (such a BIBD exists). For this BIBD the efficiency factor is

E = λv/(rk) = (3 × 8)/(7 × 4) = 6/7,

and

Var(τ̂_j − τ̂_j')_BIBD = (2k/λv) σ² = σ²/3.

Consider now an unconnected design in which 7 out of the 14 blocks get the treatment combinations

a, b, c, abc

and the remaining 7 blocks get the treatment combinations

(1), ab, bc, ac.

In this case, all the effects A, B, C, AB, BC and AC are estimable, but ABC is not estimable, because the treatment combinations with all + signs and all − signs in

ABC = (a − 1)(b − 1)(c − 1) = (a + b + c + abc) − ((1) + ab + bc + ac)

(the + set in the first type of block, the − set in the second) are contained within the same blocks. In this case, the variance of the estimates of the unconfounded main effects and interactions is

Var(τ̂_j − τ̂_j') = 2σ²/r = 2σ²/7,

and there are four linear contrasts, so the total variance is 4 × (2σ²/7) = 8σ²/7, which is smaller than the corresponding variance under the BIBD.

We observe that, at the cost of not being able to estimate ABC, we obtain better estimates of A, B, C, AB, BC and AC with the same number of replicates as in the BIBD. Since higher order interactions are difficult to interpret and are usually not large, it is much better to use confounding arrangements, which provide better estimates of the interactions in which we are more interested.

Note that this example is for understanding only. As such, the concepts behind incomplete block designs and confounding are different.
Confounding arrangement

The arrangement of treatment combinations in different blocks, whereby some pre-determined effect (either main or interaction) contrasts are confounded, is called a confounding arrangement.

For example, when the interaction ABC is confounded in a 2^3 factorial experiment, the confounding arrangement consists of dividing the eight treatment combinations into the following two sets:

a, b, c, abc    and    (1), ab, bc, ac.

With the treatments of each set being assigned to the same block, and each of these sets being replicated the same number of times in the experiment, we say that we have a confounding arrangement of a 2^3 factorial in two blocks.

It may be noted that any confounding arrangement has to be such that only the pre-determined interactions are confounded, and the estimates of the interactions which are not confounded are orthogonal whenever the interactions are orthogonal.

Defining contrast

The interactions which are confounded are called the defining contrasts of the confounding arrangement.

A confounded contrast has treatment combinations with the same sign within each block of the confounding arrangement. For example, if the effect AB = (a − 1)(b − 1)(c + 1) is to be confounded, then put all treatment combinations with + sign, i.e., (1), ab, c and abc, in one block and all treatment combinations with − sign, i.e., a, b, ac and bc, in the other block. So the block size reduces from 8 to 4 when one effect is confounded in a 2^3 factorial experiment.
Suppose that along with ABC we want to confound C also. To obtain such blocks, consider the blocks where ABC is confounded and divide them into further halves. So the block

a, b, c, abc   is divided into the two blocks   a, b   and   c, abc,

and the block

(1), ab, bc, ac   is divided into the two blocks   (1), ab   and   bc, ac.

These blocks of 4 treatments are each divided into 2 blocks of 2 treatments in the following way. If only C were confounded, the block with + sign of the treatment combinations in C would be

c, ac, bc, abc

and the block with − sign would be

(1), a, b, ab.

Now look into

(i) the block with + sign when ABC = (a − 1)(b − 1)(c − 1) is confounded: a, b, c, abc;
(ii) the block with + sign when C = (a + 1)(b + 1)(c − 1) is confounded: c, ac, bc, abc;
(iii) the table of + and − signs for the 2^3 factorial experiment.

Identify the treatment combinations having common + signs in the two blocks in (i) and (ii); these are c and abc, so assign them to one block. The remaining treatment combinations out of a, b, c and abc, namely a and b, go into another block.

Similarly, consider

(a) the block with − sign when ABC is confounded: (1), ab, bc, ac;
(b) the block with − sign when C is confounded: (1), a, b, ab;
(c) the table of + and − signs for the 2^3 factorial experiment.

The treatment combinations having a common − sign in the two blocks in (a) and (b) are (1) and ab, which go into one block; the remaining two treatment combinations ac and bc out of (1), ab, bc and ac go into another block. So the blocks where both ABC and C are confounded together are

(1), ab;   a, b;   ac, bc;   and   c, abc.
CONFOUNDING
Now we present some definitions which are useful in describing confounding arrangements.

Generalized interaction

Given any two interactions, their generalized interaction is obtained by multiplying the factors (in capital letters) and ignoring all terms with an even exponent. For example, the generalized interaction of the factors ABC and BCD is ABC · BCD = AB²C²D = AD, and the generalized interaction of the factors AB, BC and ABC is AB · BC · ABC = A²B³C² = B.

Independent set

A set of main effects and interaction contrasts is called independent if no member of the set can be obtained as a generalized interaction of the other members of the set. For example, the set {AB, BC, AD} is independent, but the set {AB, BC, CD, AD} is not independent, because AB · BC · CD = AB²C²D = AD, which is already contained in the set.
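Since a generalized interaction just cancels repeated letters, it is the symmetric difference of the letter sets; a small Python sketch (illustrative only):

```python
def generalized_interaction(*effects):
    """Generalized interaction of effects like 'ABC', 'BCD': multiply and drop
    even exponents, i.e. take the symmetric difference of the letter sets."""
    letters = set()
    for e in effects:
        letters ^= set(e)                 # XOR keeps letters with odd exponent
    return "".join(sorted(letters))

print(generalized_interaction("ABC", "BCD"))        # AD
print(generalized_interaction("AB", "BC", "ABC"))   # B
```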
A treatment combination is said to be orthogonal to an interaction A^x B^y C^z ... if it has an even number of letters in common with it. Similarly, if two interactions are each orthogonal to a treatment combination, then their generalized interaction is also orthogonal to it.
Now we give some general results for a confounding arrangement. Suppose we wish to have a confounding arrangement in 2^p blocks of a 2^k factorial experiment. Then we have the following observations:

1. The size of each block is 2^{k−p}.

2. The number of elements in the defining contrasts is 2^p − 1, i.e., 2^p − 1 interactions have to be confounded.

Proof: If p independent interactions are confounded, then the number of mth order generalized interactions among them is C(p, m), m = 1, 2, ..., p. So the total number of confounded effects is Σ_{m=1}^{p} C(p, m) = 2^p − 1.

3. If any two interactions are confounded, then their generalized interaction is also confounded.

4. The number of independent contrasts out of the 2^p − 1 defining contrasts is p; the rest are obtained as their generalized interactions.

5. The number of effects getting confounded automatically is 2^p − p − 1.
To illustrate this, consider a 2^5 factorial experiment (k = 5) with five factors A, B, C, D and E, to be confounded in 2^3 blocks (p = 3). The size of each block is 2^{5−3} = 4. The number of defining contrasts is 2^3 − 1 = 7, of which p = 3 independent contrasts can be chosen arbitrarily. Suppose we choose the following independent contrasts:

i.   ACE
ii.  CDE
iii. ABDE

Then the remaining 4 of the 7 defining contrasts are obtained as generalized interactions:

iv.  (ACE)(CDE) = AC²DE² = AD
v.   (ACE)(ABDE) = A²BCDE² = BCD
vi.  (CDE)(ABDE) = ABCD²E² = ABC
vii. (ACE)(CDE)(ABDE) = (AD)(ABDE) = A²BD²E = BE
Alternatively, if we choose another set of independent contrasts as

i.   ABCD
ii.  ACDE
iii. ABCDE

then the remaining defining contrasts are obtained as

iv.  (ABCD)(ACDE) = A²BC²D²E = BE
v.   (ABCD)(ABCDE) = A²B²C²D²E = E
vi.  (ACDE)(ABCDE) = A²BC²D²E² = B
vii. (ABCD)(ACDE)(ABCDE) = (BE)(ABCDE) = AB²CDE² = ACD

Here the main effects B and E get confounded, which is undesirable. As a rule, try to confound, as far as possible, only higher order interactions, because they are difficult to interpret.
After selecting p independent defining contrasts, divide the 2^k treatment combinations into 2^p groups of 2^{k−p} combinations each, with each group going into one block. The block containing the treatment combination (1) is called the principal block.

If there are p independent defining contrasts, then every treatment combination in the principal block is orthogonal to all p independent defining contrasts. In order to obtain the principal block,

write the treatment combinations in standard order;
check each one of them for orthogonality to the defining contrasts.

If two treatment combinations belong to the principal block, their product also belongs to the principal block. So, when a few treatment combinations of the principal block have been determined, the other treatment combinations can be obtained by the multiplication rule.
Example

Consider the set-up of a 2^5 factorial experiment in which we want to divide the treatment combinations into 2^3 = 8 blocks by confounding the three effects AD, BE and ABC. The generalized interactions in this case are ABDE, BCD, ACE and CDE.

Write the 32 treatment combinations in standard order:

(1), a, b, ab, c, ac, bc, abc, d, ad, bd, abd, cd, acd, bcd, abcd, e, ae, be, abe, ce, ace, bce, abce, de, ade, bde, abde, cde, acde, bcde, abcde.

Place a treatment combination in the principal block if it has an even number of letters in common with each of the confounded effects AD, BE and ABC. The principal block is (1), acd, bce and abde (= acd · bce).

Obtain the other blocks of the confounding arrangement from the principal block by multiplying the treatment combinations of the principal block by a treatment combination not occurring in it or in any other block already obtained; choose only distinct blocks. In this case, the other blocks are obtained by multiplying the principal block by a, b, ab, c, ac, bc and abc, as in the following table.
Arrangement of treatments in blocks when AD, BE and ABC are confounded:

Principal block 1:  (1),   acd,   bce,   abde
Block 2 (× a):      a,     cd,    abce,  bde
Block 3 (× b):      b,     abcd,  ce,    ade
Block 4 (× ab):     ab,    bcd,   ace,   de
Block 5 (× c):      c,     ad,    be,    abcde
Block 6 (× ac):     ac,    d,     abe,   bcde
Block 7 (× bc):     bc,    abd,   e,     acde
Block 8 (× abc):    abc,   bd,    ae,    cde

For example, block 2 is obtained by multiplying a with each treatment combination in the principal block:

(1) · a = a,  acd · a = a²cd = cd,  bce · a = abce,  abde · a = a²bde = bde;

block 3 is obtained by multiplying b with (1), acd, bce and abde; and similarly the other blocks are obtained.

If any other treatment combination is chosen to be multiplied with the treatments in the principal block, we get a block which is one among blocks 1 to 8. For example, if ae is multiplied with the treatments in the principal block, the resulting block consists of ae, cde, abc and bd, which is block 8 again.
Alternatively, if ACD, ABCD and ABCDE are to be confounded, then the independent defining contrasts are ACD, ABCD and ABCDE, and the principal block is (1), ac, ad and cd (= ac · ad).
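The principal-block search is mechanical, so a short Python sketch may help (purely illustrative; it checks the even-letters-in-common condition against each defining contrast):

```python
from itertools import combinations

def principal_block(factors, defining):
    """All treatment combinations having an even number of letters in common
    with every defining contrast; '(1)' denotes the empty combination."""
    combos = [""] + ["".join(c) for r in range(1, len(factors) + 1)
                     for c in combinations(factors, r)]
    block = [t for t in combos
             if all(len(set(t) & set(d.lower())) % 2 == 0 for d in defining)]
    return ["(1)" if t == "" else t for t in block]

print(principal_block("abcde", ["AD", "BE", "ABC"]))  # ['(1)', 'acd', 'bce', 'abde']
```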
ANALYSIS OF COVARIANCE
Any scientific experiment is performed to learn something unknown about a group of treatments and to test certain hypotheses about the corresponding treatment effects.

When the variability of the experimental units is small relative to the treatment differences and the experimenter does not wish to use an experimental design, one can simply take a large number of observations on each treatment and compute its mean; the variation around the mean can be made as small as desired by taking more observations.

When there is considerable variation among observations on the same treatment and it is not possible to take an unlimited number of observations, the techniques used for reducing the variation are the use of a proper experimental design and the use of concomitant observations. The use of concomitant variables is accomplished through the technique of analysis of covariance.

If both techniques fail to control the experimental variability, then the number of replications of the different treatments (in other words, the number of experimental units) needs to be increased to a point where adequate control of variability is attained.
Introduction to the analysis of covariance model

In the linear model

Y = X_1 β_1 + X_2 β_2 + ... + X_p β_p + ε,

if the explanatory variables include quantitative variables as well as indicator variables, i.e., some of them are qualitative and some are quantitative, then the linear model is termed an analysis of covariance (ANCOVA) model.

Note that the indicator variables do not provide as much information as the quantitative variables. For example, quantitative observations on age can be converted into an indicator variable; let

D = 1 if age ≥ 17 years,   D = 0 if age < 17 years.

Then the following quantitative values of age are changed into indicator values:

Ages (in years):  14  15  16  17  20  21  22
D:                 0   0   0   1   1   1   1
Including additional factors helps in reducing the sum of squares due to error, which in turn is reflected in better model adequacy diagnostics. To see how this works, note that

in the one way model Y_ij = μ + α_i + ε_ij, we have TSS_1 = SSA_1 + SSE_1;
in the two way model Y_ij = μ + α_i + β_j + ε_ij, we have TSS_2 = SSA_2 + SSB_2 + SSE_2;
in the three way model Y_ijk = μ + α_i + β_j + γ_k + ε_ijk, we have TSS_3 = SSA_3 + SSB_3 + SSC_3 + SSE_3;

and the F-statistics are of the form

F = [SS(effects)/df] / [SSE/df].

Since SSA, SSB, etc. here are based on dummy variables, obviously if SSA, SSB, etc. were instead based on quantitative variables, they would provide more information. Such ideas are used in ANCOVA models, where we construct the model by incorporating the quantitative explanatory variables into ANOVA models.
As another example, suppose our interest is to compare several different kinds of feed for their ability to put weight on animals. If we use ANOVA, then we use the final weights at the end of the experiment. However, the final weights of the animals depend upon the initial weights of the animals at the beginning of the experiment as well as upon the difference in feeds. The use of ANCOVA models enables us to adjust or correct for these initial differences.

ANCOVA is useful for improving the precision of an experiment. If such an adjustment is not made, then the concomitant variable X can inflate the error mean square and make the true differences in Y due to treatments harder to detect.

If, for a given experimental material, the use of a proper experimental design cannot control the experimental variation, the use of concomitant variables (which are related to the experimental material) may be effective in reducing the variability.
Consider the one way classification model

E(Y_ij) = μ_i,   i = 1, 2, ..., p; j = 1, 2, ..., N_i,   Var(Y_ij) = σ².

If the usual analysis of variance for testing the hypothesis of equality of treatment effects shows a highly significant difference due to some factor affecting the experiment, then consider the model which takes this effect into account:

E(Y_ij) = μ_i + β t_ij,   i = 1, 2, ..., p; j = 1, 2, ..., N_i,   Var(Y_ij) = σ²,

where the t_ij are observations on a concomitant variable (related to Y_ij) and β is the regression coefficient associated with t_ij. With this model, the variability of the treatment effects can be considerably reduced.

For example, in an agricultural experiment, if the experimental units are plots of land, then t_ij can be a measure of the fertility characteristic of the jth plot receiving the ith treatment and Y_ij can be the yield.

As another example, suppose the experimental units are animals and the objective is to compare the growth rates of groups of animals receiving different diets. The observed differences in growth rates can be attributed to diet only if all the animals are similar in observable characteristics like weight, age, etc. which influence the growth rates. In the absence of such similarity, use t_ij, the weight or age of the jth animal receiving the ith treatment.

If we consider a quadratic regression in t_ij, then

E(Y_ij) = μ_i + β_1 t_ij + β_2 t_ij²,   i = 1, ..., p; j = 1, ..., n_i,   Var(Y_ij) = σ².

ANCOVA in this case is the same as ANCOVA with the two concomitant variables t_ij and t_ij².
Similarly, in a two way classification with a concomitant variable, the model is

E(Y_ij) = μ + α_i + β_j + γ t_ij,   i = 1, ..., I; j = 1, ..., J,

with Σ_i α_i = 0 and Σ_j β_j = 0; with one or two concomitant variables, (y_ij, t_ij) or (y_ij, t_ij, w_ij) respectively are the observations in the (i, j)th cell.
One-way classification

Let Y_ij (j = 1, ..., n_i; i = 1, ..., p) be random samples from normal populations with means

μ_ij = E(Y_ij) = α_i + β t_ij,   Var(Y_ij) = σ²,

where α_i, β and σ² are unknown parameters and the t_ij are known constants, namely the observations on a concomitant variable. The null hypothesis is

H_0 : α_1 = α_2 = ... = α_p.

Let

y_io = (1/n_i) Σ_j y_ij,   t_io = (1/n_i) Σ_j t_ij,
y_oo = (1/n) Σ_i Σ_j y_ij,   t_oo = (1/n) Σ_i Σ_j t_ij,   n = Σ_i n_i.
Under the whole parametric space (Ω), we use the likelihood ratio test, for which the estimators are obtained by the least squares principle (or maximum likelihood estimation) as follows. Minimize

S = Σ_i Σ_j (y_ij − μ_ij)² = Σ_i Σ_j (y_ij − α_i − β t_ij)².

Setting ∂S/∂α_i = 0 for fixed β gives

α̂_i = y_io − β̂ t_io.

Substituting this in S and minimizing with respect to β, i.e., setting ∂S/∂β = 0, gives

β̂ = Σ_i Σ_j (y_ij − y_io)(t_ij − t_io) / Σ_i Σ_j (t_ij − t_io)².

Thus the fitted means under Ω are

μ̂_ij = α̂_i + β̂ t_ij.
Under the sample space restricted by H_0 (ω), all the α_i are equal, say α, so minimize

S_w = Σ_i Σ_j (y_ij − α − β t_ij)².

Setting ∂S_w/∂α = 0 and ∂S_w/∂β = 0 gives

α̃ = y_oo − β̃ t_oo,
β̃ = Σ_i Σ_j (y_ij − y_oo)(t_ij − t_oo) / Σ_i Σ_j (t_ij − t_oo)²,

and hence the fitted means under ω are

μ̃_ij = α̃ + β̃ t_ij.

The likelihood ratio is then

λ = max_ω L(α, β, σ²) / max_Ω L(α_i, β, σ²) = [ Σ_i Σ_j (y_ij − μ̂_ij)² / Σ_i Σ_j (y_ij − μ̃_ij)² ]^{n/2},

so the test can be based on the ratio of the two residual sums of squares.
Theorem 2: Let Y = (Y_1, Y_2, ..., Y_n)' follow a multivariate normal distribution N(μ, Σ) with mean vector μ and positive definite covariance matrix Σ. Let Y'A_1Y follow χ²(p_1, λ_1) and Y'A_2Y follow χ²(p_2, λ_2). Then Y'A_1Y and Y'A_2Y are independently distributed if A_1 Σ A_2 = 0.

Theorem 3: Let Y = (Y_1, Y_2, ..., Y_n)' follow a multivariate normal distribution N(Xβ, σ²I). Then the maximum likelihood (or least squares) estimator Lβ̂ of an estimable linear parametric function Lβ is independently distributed of σ̂²; Lβ̂ follows N(Lβ, σ²L(X'X)⁻¹L') and nσ̂²/σ² follows χ²(n − p), where rank(X) = p.
Using these theorems on the independence of quadratic forms, and dividing the numerator and denominator by their respective degrees of freedom, we have

F = [(n − p − 1)/(p − 1)] · [Σ_i Σ_j (y_ij − μ̃_ij)² − Σ_i Σ_j (y_ij − μ̂_ij)²] / Σ_i Σ_j (y_ij − μ̂_ij)²
  ~ F(p − 1, n − p − 1) under H_0.
The terms involved in this statistic can be expressed through sums of squares and products. Define

T_yy = Σ_i Σ_j (y_ij − y_oo)²,   T_tt = Σ_i Σ_j (t_ij − t_oo)²,   T_yt = Σ_i Σ_j (y_ij − y_oo)(t_ij − t_oo),
E_yy = Σ_i Σ_j (y_ij − y_io)²,   E_tt = Σ_i Σ_j (t_ij − t_io)²,   E_yt = Σ_i Σ_j (y_ij − y_io)(t_ij − t_io).

Then we can write

Σ_i Σ_j (y_ij − μ̂_ij)² = E_yy − E_yt²/E_tt,

Σ_i Σ_j (y_ij − μ̃_ij)² = T_yy − T_yt²/T_tt,

and, since Σ_i Σ_j (y_ij − μ̃_ij)² = Σ_i Σ_j (y_ij − μ̂_ij)² + Σ_i Σ_j (μ̂_ij − μ̃_ij)²,

Σ_i Σ_j (μ̂_ij − μ̃_ij)² = (T_yy − T_yt²/T_tt) − (E_yy − E_yt²/E_tt).
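A compact Python sketch of this one-way ANCOVA F-test (hypothetical data; standard library only), computing the T and E quantities and the F-statistic above:

```python
# One-way ANCOVA F-test sketch: p groups, response y and covariate t (hypothetical data).
y = [[12.1, 13.4, 11.8, 12.9], [14.2, 15.1, 13.8, 14.6], [10.9, 11.5, 12.2, 11.1]]
t = [[3.0, 3.5, 2.8, 3.2],     [4.1, 4.4, 3.9, 4.2],     [2.5, 2.9, 3.1, 2.6]]

p = len(y)
n = sum(len(g) for g in y)
yoo = sum(map(sum, y)) / n
too = sum(map(sum, t)) / n

Tyy = sum((v - yoo) ** 2 for g in y for v in g)
Ttt = sum((v - too) ** 2 for g in t for v in g)
Tyt = sum((yv - yoo) * (tv - too) for gy, gt in zip(y, t) for yv, tv in zip(gy, gt))

Eyy = Ett = Eyt = 0.0
for gy, gt in zip(y, t):                      # within-group (error) sums of squares
    yio, tio = sum(gy) / len(gy), sum(gt) / len(gt)
    Eyy += sum((v - yio) ** 2 for v in gy)
    Ett += sum((v - tio) ** 2 for v in gt)
    Eyt += sum((yv - yio) * (tv - tio) for yv, tv in zip(gy, gt))

sse_full = Eyy - Eyt ** 2 / Ett               # error SS under the full model
sse_null = Tyy - Tyt ** 2 / Ttt               # error SS under H0 (equal alphas)
F = ((sse_null - sse_full) / (p - 1)) / (sse_full / (n - p - 1))
print("F =", F, "on", p - 1, "and", n - p - 1, "d.f.")
```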
ANALYSIS OF COVARIANCE
Two way classification (with one observation per cell)

Consider the case of two way classification with one observation per cell. Let y_ij ~ N(μ_ij, σ²) be independently distributed with

μ_ij = μ + α_i + β_j + γ t_ij,   i = 1, ..., I; j = 1, ..., J,   Σ_i α_i = 0,   Σ_j β_j = 0,

where μ is the grand mean, α_i and β_j are the effects of the levels of the two factors, γ is the regression coefficient and the t_ij are observations on a concomitant variable. The null hypotheses are

H_0α : α_1 = α_2 = ... = α_I = 0,
H_0β : β_1 = β_2 = ... = β_J = 0,

with the respective alternative hypotheses that not all α_i (respectively β_j) are zero. The dimension of the whole parametric space (Ω) is I + J; the dimension of the sample space (ω) is J + 1 under H_0α and I + 1 under H_0β.
The tests require the minimum of Σ_i Σ_j (y_ij − μ_ij)² under Ω. To do this, minimize

Σ_i Σ_j (y_ij − μ − α_i − β_j − γ t_ij)².

For fixed γ, the normal equations give the least squares estimates (or the maximum likelihood estimates) of the respective parameters as

μ̂ = y_oo − γ̂ t_oo,
α̂_i = (y_io − y_oo) − γ̂ (t_io − t_oo),
β̂_j = (y_oj − y_oo) − γ̂ (t_oj − t_oo),    (1)

and substituting these back, Σ_i Σ_j (y_ij − μ − α_i − β_j − γ t_ij)² reduces to a function of γ alone, whose minimization gives

γ̂ = Σ_i Σ_j (y_ij − y_io − y_oj + y_oo)(t_ij − t_io − t_oj + t_oo) / Σ_i Σ_j (t_ij − t_io − t_oj + t_oo)².    (2)

Hence, with μ̂_ij = μ̂ + α̂_i + β̂_j + γ̂ t_ij,

Σ_i Σ_j (y_ij − μ̂_ij)² = E_yy − E_yt²/E_tt,

where now

E_yy = Σ_i Σ_j (y_ij − y_io − y_oj + y_oo)²,
E_tt = Σ_i Σ_j (t_ij − t_io − t_oj + t_oo)²,
E_yt = Σ_i Σ_j (y_ij − y_io − y_oj + y_oo)(t_ij − t_io − t_oj + t_oo).

Under H_0α the model contains no α_i, so minimizing Σ_i Σ_j (y_ij − μ − β_j − γ t_ij)² gives

μ̃ = y_oo − γ̃ t_oo,
β̃_j = (y_oj − y_oo) − γ̃ (t_oj − t_oo),
γ̃ = Σ_i Σ_j (y_ij − y_oj)(t_ij − t_oj) / Σ_i Σ_j (t_ij − t_oj)²,    (3)

and μ̃_ij = μ̃ + β̃_j + γ̃ t_ij. Then

Σ_i Σ_j (y_ij − μ̃_ij)² = Σ_i Σ_j (y_ij − y_oj)² − [Σ_i Σ_j (y_ij − y_oj)(t_ij − t_oj)]² / Σ_i Σ_j (t_ij − t_oj)²
                        = (E_yy + A_yy) − (E_yt + A_yt)² / (E_tt + A_tt).
Here

A_yy = J Σ_i (y_io − y_oo)²,   A_tt = J Σ_i (t_io − t_oo)²,   A_yt = J Σ_i (y_io − y_oo)(t_io − t_oo),

and the likelihood ratio λ_1 is a monotone function of

[Σ_i Σ_j (y_ij − μ̃_ij)² − Σ_i Σ_j (y_ij − μ̂_ij)²] / Σ_i Σ_j (y_ij − μ̂_ij)².

Adjusting with the degrees of freedom and using the earlier result for the independence of the two quadratic forms and their distributions,

F_1 = [(IJ − I − J)/(I − 1)] · [Σ_i Σ_j (y_ij − μ̃_ij)² − Σ_i Σ_j (y_ij − μ̂_ij)²] / Σ_i Σ_j (y_ij − μ̂_ij)²
    ~ F(I − 1, IJ − I − J) under H_0α.
Similarly, under H_0β, minimizing Σ_i Σ_j (y_ij − μ − α_i − γ t_ij)² with respect to μ, α_i and γ gives the least squares estimates (or the maximum likelihood estimates)

μ* = y_oo − γ* t_oo,
α*_i = (y_io − y_oo) − γ* (t_io − t_oo),
γ* = Σ_i Σ_j (y_ij − y_io)(t_ij − t_io) / Σ_i Σ_j (t_ij − t_io)²,    (4)

and μ*_ij = μ* + α*_i + γ* t_ij. From (4),

Σ_i Σ_j (y_ij − μ*_ij)² = Σ_i Σ_j (y_ij − y_io)² − [Σ_i Σ_j (y_ij − y_io)(t_ij − t_io)]² / Σ_i Σ_j (t_ij − t_io)²
                        = (E_yy + B_yy) − (E_yt + B_yt)² / (E_tt + B_tt),

where

B_yy = I Σ_j (y_oj − y_oo)²,   B_tt = I Σ_j (t_oj − t_oo)²,   B_yt = I Σ_j (y_oj − y_oo)(t_oj − t_oo).

Hence

F_2 = [(IJ − I − J)/(J − 1)] · [Σ_i Σ_j (y_ij − μ*_ij)² − Σ_i Σ_j (y_ij − μ̂_ij)²] / Σ_i Σ_j (y_ij − μ̂_ij)²
    ~ F(J − 1, IJ − I − J) under H_0β.

So the decision rule is to reject H_0β whenever F_2 ≥ F_{1−α}(J − 1, IJ − I − J), and analogously to reject H_0α whenever F_1 ≥ F_{1−α}(I − 1, IJ − I − J).

If H_0α is rejected, use multiple comparison methods to determine which of the contrasts in the α_i are responsible for the rejection. The same is true for H_0β.
The analysis of covariance table for the two way classification is as follows:

Source of        Degrees of    Sum of products         Adjusted sum of squares                          Degrees of    F
variation        freedom       yy     yt     tt                                                         freedom
Between
levels of A      I − 1         A_yy   A_yt   A_tt      q_0 = q_3 − q_2                                  I − 1         F_1 = [(IJ−I−J)/(I−1)] · q_0/q_2
Between
levels of B      J − 1         B_yy   B_yt   B_tt      q_1 = q_4 − q_2                                  J − 1         F_2 = [(IJ−I−J)/(J−1)] · q_1/q_2
Error            (I−1)(J−1)    E_yy   E_yt   E_tt      q_2 = E_yy − E_yt²/E_tt                          IJ − I − J
Total            IJ − 1        T_yy   T_yt   T_tt                                                       IJ − 2
Error +
levels of A      IJ − J                                q_3 = (A_yy + E_yy) − (A_yt + E_yt)²/(A_tt + E_tt)
Error +
levels of B      IJ − I                                q_4 = (B_yy + E_yy) − (B_yt + E_yt)²/(B_tt + E_tt)
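The q-quantities map directly to code. Here is a minimal Python sketch of the two-way ANCOVA with one observation per cell (hypothetical data; y[i][j] and t[i][j] are the response and covariate in cell (i, j)):

```python
# Two-way ANCOVA (one observation per cell) sketch with hypothetical data.
y = [[10.0, 12.0, 11.5], [13.0, 14.5, 13.8], [9.5, 10.8, 10.2], [12.2, 13.9, 13.1]]
t = [[2.0, 2.5, 2.2],    [3.1, 3.4, 3.2],    [1.8, 2.1, 2.0],   [2.9, 3.3, 3.0]]
I, J = len(y), len(y[0])

def mean(xs): return sum(xs) / len(xs)
yio = [mean(r) for r in y]; yoj = [mean(c) for c in zip(*y)]; yoo = mean([v for r in y for v in r])
tio = [mean(r) for r in t]; toj = [mean(c) for c in zip(*t)]; too = mean([v for r in t for v in r])

def sp(u, v):  # sum of products of two deviation tables
    return sum(u[i][j] * v[i][j] for i in range(I) for j in range(J))

ye = [[y[i][j] - yio[i] - yoj[j] + yoo for j in range(J)] for i in range(I)]  # interaction residuals
te = [[t[i][j] - tio[i] - toj[j] + too for j in range(J)] for i in range(I)]
Eyy, Ett, Eyt = sp(ye, ye), sp(te, te), sp(ye, te)
Ayy = J * sum((m - yoo) ** 2 for m in yio); Att = J * sum((m - too) ** 2 for m in tio)
Ayt = J * sum((yio[i] - yoo) * (tio[i] - too) for i in range(I))
Byy = I * sum((m - yoo) ** 2 for m in yoj); Btt = I * sum((m - too) ** 2 for m in toj)
Byt = I * sum((yoj[j] - yoo) * (toj[j] - too) for j in range(J))

q2 = Eyy - Eyt ** 2 / Ett
q3 = (Ayy + Eyy) - (Ayt + Eyt) ** 2 / (Att + Ett)
q4 = (Byy + Eyy) - (Byt + Eyt) ** 2 / (Btt + Ett)
df_err = I * J - I - J
F1 = (df_err / (I - 1)) * (q3 - q2) / q2     # test of H0: all alpha_i = 0
F2 = (df_err / (J - 1)) * (q4 - q2) / q2     # test of H0: all beta_j = 0
print(F1, F2)
```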
ANALYSIS OF VARIANCE IN RANDOM-EFFECTS MODEL AND MIXED-EFFECTS MODEL
Random-effects model

In a random-effects model, the levels of the factors used in the experiment are randomly drawn from a population of possible levels. The statistical inferences drawn from the data then apply to all levels of the factors in the population from which the levels were selected, not only to the levels used in the experiment.
For example, in the case of a quality control experiment about the daily production of five machines from an assembly line, we have the following set-ups of fixed- and random-effects models:

i. Fixed-effects: the daily production of five particular machines from an assembly line.
ii. Random-effects: the daily production of five machines, chosen at random, that represent the machines as a class.

Many studies involve both factors having a predetermined set of levels and factors in which the levels used in the study are randomly selected from a population of levels. For example, the blocks in a randomized complete block design may represent a random sample of b plots of land taken from a population of plots in an agricultural research area; the effects due to the blocks are then considered random effects. Suppose the treatments are four new varieties of wheat that have been developed to be resistant to a specific bacterium. The levels of the treatment are fixed, because these are the only varieties of interest to the researchers, whereas the levels of the plots of land are random, because the researchers are not interested only in those plots of land but in the effects of the treatments on a wide range of plots of land.

When some of the factors in the experiment have levels randomly selected from a population of possible levels and the other factors have predetermined levels, the model used to relate the response variable to the levels of the factors is referred to as a mixed-effects model.
Mixed-effects model
In a mixed-effects model for an experiment, the levels of some of the factors used in the experiment are randomly selected
from a population of levels, whereas the levels of the other factors in the experiment are predetermined.
The inferences from the data in the experiment concerning factors with fixed levels are only for the levels of the factors
used in the experiment, whereas the inferences concerning factors with randomly selected levels are for all levels of the
factors in the population from which the levels were selected.
Analysis of variance in the one way random-effects model

The model with random effects has the same structure as the model with fixed effects,

y_ij = μ + α_i + ε_ij,

but the interpretation of the parameter α_i has now changed: the α_i are now the random effects of the ith treatment (ith machine). Hence we assume

E(α_i) = 0,   Var(α_i) = σ_α²,   E(ε_ij α_i) = 0,   E(α_i α_j) = 0 (i ≠ j).

Then

y_ij ~ (μ, σ_α² + σ²)

holds.
In the model with fixed effects, the treatment effect A was represented by the parameter estimates α̂_i, or μ̂_i = μ̂ + α̂_i, respectively. In the model with random effects, a treatment effect is expressed by the variance components: the variance σ_α² is estimated as a component of the entire variance, and the absolute or relative size of this component then indicates the importance of the treatment effect.

The estimation of the variances σ_α² and σ² requires no assumptions about the distribution. For tests of hypotheses and the computation of confidence intervals, however, we assume the normal distribution, i.e.,

ε_ij ~ N(0, σ²), with the ε_ij assumed independent of each other,
α_i ~ N(0, σ_α²), with the α_i assumed independent of each other,

and hence

y_ij ~ N(μ, σ_α² + σ²).

Unlike the model with fixed effects, the response values y_ij within the same level i of the treatment are correlated; on the other hand, the response values of different samples are still uncorrelated (i ≠ i', for any j, j').

If σ_α² = 0, the random effects are identically 0 and each estimate α̂_i (i = 1, 2, ..., s) should be close to zero; if σ_α² > 0, the effects are not identically 0 and the variability of the estimates α̂_i (i = 1, 2, ..., s) reflects σ_α².

The ANOVA table for a random factor is the same as the ANOVA table for a fixed factor, with

E(MS_Error) = σ²,

i.e., σ̂² = MS_Error is an unbiased estimate of σ².
To obtain the expectation of the treatment mean square, write

y_io = μ + α_i + ε_io,   y_oo = μ + ᾱ + ε_oo,   with ᾱ = Σ_i n_i α_i / n,

so that

y_io − y_oo = (α_i − ᾱ) + (ε_io − ε_oo).

Then

E(y_io − y_oo)² = E(α_i − ᾱ)² + E(ε_io − ε_oo)²,

where

E(α_i − ᾱ)² = E(α_i²) + E(ᾱ²) − 2E(α_i ᾱ) = σ_α² (1 − 2n_i/n + Σ_m n_m²/n²),
E(ε_io − ε_oo)² = σ² (1/n_i − 1/n).

Hence the treatment sum of squares SS_A = Σ_i n_i (y_io − y_oo)² has expectation

E(SS_A) = Σ_{i=1}^{s} n_i E(y_io − y_oo)² = σ_α² (n − Σ_i n_i²/n) + σ² (s − 1).
We have now:

i. In the unbalanced case, i.e., when the sample sizes n_i are not all the same,

E(MS_Tr) = E(SS_A)/(s − 1) = σ² + k σ_α²   with   k = [1/(s − 1)] (n − Σ_i n_i²/n).

ii. In the balanced case, n_i = r for all i, so n = rs and

k = [1/(s − 1)] (rs − s r²/(rs)) = r,   hence   E(MS_Tr) = σ² + r σ_α².

This yields the unbiased estimate of σ_α² as

σ̂_α² = (MS_Tr − MS_Error)/k,

and in the balanced case

σ̂_α² = (MS_Tr − MS_Error)/r.

Under normality,

MS_Error ~ σ² χ²_{n−s}/(n − s)   and   MS_Tr ~ (σ² + k σ_α²) χ²_{s−1}/(s − 1).

The two distributions are independent; hence the ratio

(MS_Tr/MS_Error) · σ²/(σ² + k σ_α²)

has a central F-distribution, and under the assumption of equal variances, i.e., under H_0 : σ_α² = 0,

MS_Tr/MS_Error ~ F_{s−1, n−s}.
Hence H_0 : σ_α² = 0 is tested with the same test statistic as H_0 : α_i = 0 (all i) in the model with fixed effects. The table of expected mean squares is:

Source       Sum of     Degrees of    E(MS)
             squares    freedom       Fixed effects              Random effects
Treatments   SS_Tr      s − 1         σ² + Σ_i n_i α_i²/(s − 1)  σ² + k σ_α²
Error        SS_Error   n − s         σ²                         σ²
The estimate σ̂_α² can turn out to be negative, since MS_Tr may be smaller than MS_Error. In that case one can:

1. Take σ̂_α² = 0, regarding the negative estimate as evidence that σ_α² is zero.
2. Estimate σ_α² using the restricted maximum likelihood (REML) method, because it always yields a nonnegative estimate.
3. Assume the model is incorrect, and examine the problem in another way; for example, add or remove an effect from the model, and then analyze the new model.
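A small Python sketch (hypothetical unbalanced data) of these method-of-moments estimates and the F-test for H_0 : σ_α² = 0:

```python
# One-way random-effects sketch: method-of-moments variance components (hypothetical data).
groups = [[9.8, 10.2, 10.5], [12.1, 11.7, 12.4, 12.0], [8.9, 9.4, 9.1]]

s = len(groups)
n = sum(len(g) for g in groups)
grand = sum(sum(g) for g in groups) / n

ss_tr  = sum(len(g) * (sum(g)/len(g) - grand) ** 2 for g in groups)
ss_err = sum((v - sum(g)/len(g)) ** 2 for g in groups for v in g)
ms_tr, ms_err = ss_tr / (s - 1), ss_err / (n - s)

k = (n - sum(len(g) ** 2 for g in groups) / n) / (s - 1)   # from E(MS_Tr) = s^2 + k s_a^2
sigma2_hat = ms_err
sigma_alpha2_hat = max(0.0, (ms_tr - ms_err) / k)          # truncate a negative estimate at 0
F = ms_tr / ms_err                                         # ~ F(s-1, n-s) under H0: s_a^2 = 0
print(sigma2_hat, sigma_alpha2_hat, F)
```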
ANALYSIS OF VARIANCE IN
RANDOM-EFFECTS MODEL AND
MIXED-EFFECTS MODEL
Confidence intervals for the variance components can be obtained as follows.

Case 1: For σ². Because (n − s) MS_Error/σ² ~ χ²_{n−s}, a 100(1 − α)% confidence interval for σ² is

(n − s) MS_Error / χ²_{α/2, n−s}  ≤  σ²  ≤  (n − s) MS_Error / χ²_{1−α/2, n−s}.

Case 2: For σ_α². There is no closed form for a confidence interval for σ_α².
Case 3: For σ_α²/σ². If all the sample sizes are the same, i.e., n_i = r, then, because MS_Error and MS_Tr are independent,

[MS_Tr/(σ² + r σ_α²)] / [MS_Error/σ²] ~ F(s − 1, n − s),

so that

P[ F_{1−α/2}(s − 1, n − s) ≤ (MS_Tr/MS_Error) · σ²/(σ² + r σ_α²) ≤ F_{α/2}(s − 1, n − s) ] = 1 − α.

Since σ²/(σ² + r σ_α²) = 1/(1 + r σ_α²/σ²), inverting the inequalities gives

P[ (MS_Tr/MS_Error)/F_{α/2}(s − 1, n − s) ≤ 1 + r σ_α²/σ² ≤ (MS_Tr/MS_Error)/F_{1−α/2}(s − 1, n − s) ] = 1 − α.

Thus a 100(1 − α)% confidence interval for σ_α²/σ² is (L, U), where

L = (1/r) [ (MS_Tr/MS_Error) · 1/F_{α/2}(s − 1, n − s) − 1 ],
U = (1/r) [ (MS_Tr/MS_Error) · 1/F_{1−α/2}(s − 1, n − s) − 1 ].
Case 4: For σ_α²/(σ_α² + σ²). Note that

1 − α = P[ L ≤ σ_α²/σ² ≤ U ]
      = P[ 1 + L ≤ (σ_α² + σ²)/σ² ≤ 1 + U ]
      = P[ 1/(1 + U) ≤ σ²/(σ_α² + σ²) ≤ 1/(1 + L) ]
      = P[ 1 − 1/(1 + L) ≤ σ_α²/(σ_α² + σ²) ≤ 1 − 1/(1 + U) ]
      = P[ L/(1 + L) ≤ σ_α²/(σ_α² + σ²) ≤ U/(1 + U) ].

Thus ( L/(1 + L), U/(1 + U) ) is a 100(1 − α)% confidence interval for σ_α²/(σ_α² + σ²), which represents the proportion of the total variability attributable to the variability among the treatments.
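For completeness, a Python sketch of cases 3 and 4 using scipy's F quantiles (hypothetical mean squares; scipy is assumed available):

```python
from scipy.stats import f  # F-distribution quantiles

# Hypothetical balanced one-way data: s groups, r observations per group.
s, r, alpha = 5, 4, 0.05
ms_tr, ms_err = 38.2, 9.7
n = s * r

fu = f.ppf(1 - alpha / 2, s - 1, n - s)   # upper point F_{alpha/2}
fl = f.ppf(alpha / 2, s - 1, n - s)       # lower point F_{1-alpha/2}

ratio = ms_tr / ms_err
L = (ratio / fu - 1) / r                  # CI for sigma_a^2 / sigma^2
U = (ratio / fl - 1) / r
print("sigma_a^2/sigma^2 in (%.3f, %.3f)" % (L, U))
print("proportion in (%.3f, %.3f)" % (L / (1 + L), U / (1 + U)))
```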
Two-factor mixed model

Consider the two-factor model

y_ijk = μ + τ_i + β_j + (τβ)_ij + ε_ijk,

with the levels of factor A fixed and the levels of factor B randomly selected, under the following conditions:

1. τ_i is the fixed effect of the ith level of factor A, with Σ_i τ_i = 0.
2. β_j is a random effect due to the jth level of factor B; the β_j have independent normal distributions with mean 0 and variance σ_β².
3. (τβ)_ij is a random effect due to the interaction of the ith level of factor A with the jth level of factor B; the (τβ)_ij have independent normal distributions with mean 0 and variance σ_τβ².
4. The ε_ijk are independent N(0, σ²) random errors.

Using these assumptions, the analysis of variance table for a fixed, random, or mixed model in a two-factor experiment is shown in the following table.
ANOVA table for an a × b factorial treatment structure, with n observations per cell:

Source  SS     df            MS     E(MS): Fixed effects             E(MS): Random effects         E(MS): Mixed (A fixed, B random)
A       SSA    a−1           MSA    σ² + bn Στ_i²/(a−1)              σ² + n σ_τβ² + bn σ_τ²        σ² + n σ_τβ² + bn Στ_i²/(a−1)
B       SSB    b−1           MSB    σ² + an Σβ_j²/(b−1)              σ² + n σ_τβ² + an σ_β²        σ² + n σ_τβ² + an σ_β²
AB      SSAB   (a−1)(b−1)    MSAB   σ² + n Σ(τβ)_ij²/[(a−1)(b−1)]    σ² + n σ_τβ²                  σ² + n σ_τβ²
Error   SSE    ab(n−1)       MSE    σ²                               σ²                            σ²
Total   TSS    nab − 1
The test of H_0 : σ_τβ² = 0 is the same in the mixed model as in the random-effects model, namely

F = MSAB/MSE.

For the factors A and B we proceed as follows. For factor A,

H_0 : τ_1 = ... = τ_a = 0 versus H_a : at least one of the τ_i differs from the rest,

with test statistic

F = MSA/MSAB,

based on df_1 = (a − 1) and df_2 = (a − 1)(b − 1). For factor B,

H_0 : σ_β² = 0 versus H_1 : σ_β² > 0,

with test statistic

F = MSB/MSAB,

based on df_1 = (b − 1) and df_2 = (a − 1)(b − 1).

The analysis of variance procedure outlined for the mixed-effects model for an a × b factorial treatment structure can be used as well for a randomized block design, where treatments are fixed, blocks are assumed random, and there are observations for each combination of block and treatment.
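A quick sketch of these mixed-model tests, given the mean squares (hypothetical values):

```python
# Mixed model (A fixed, B random): hypothetical dimensions and mean squares.
a, b, n_cell = 3, 4, 2
MSA, MSB, MSAB, MSE = 52.0, 30.5, 12.3, 6.1

F_A  = MSA / MSAB    # df: (a-1), (a-1)(b-1)
F_B  = MSB / MSAB    # df: (b-1), (a-1)(b-1)
F_AB = MSAB / MSE    # df: (a-1)(b-1), ab(n_cell-1)
print(F_A, F_B, F_AB)
```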
ANALYSIS OF NONORTHOGONAL DATA
Orthogonal data

The concept of orthogonality of data is associated with two or higher way classified data. Consider the set-up of two way classified data. Let A and B be two factors at p and q levels respectively, let n_ij be the number of observations in the (i, j)th cell, and let y_ijk be the kth observation in the (i, j)th cell, i = 1, ..., p; j = 1, ..., q; k = 1, ..., n_ij. Write

T_ij = Σ_k y_ijk : the (i, j)th cell total,
A_i = Σ_j T_ij : the total for the ith level of A,
B_j = Σ_i T_ij : the total for the jth level of B,
G = Σ_i Σ_j T_ij : the grand total,
n_io = Σ_j n_ij,   n_oj = Σ_i n_ij,   n = Σ_i n_io = Σ_j n_oj,

and the marginal means

a_i = A_i / n_io : marginal mean of A,
b_j = B_j / n_oj : marginal mean of B.
Suppose first that all the cell frequencies are equal, n_ij = n. If

L = Σ_i l_i a_i = (1/qn) Σ_i l_i (T_i1 + ... + T_iq),   with Σ_i l_i = 0,

is any contrast in the marginal means of A and, similarly,

M = Σ_j m_j b_j = (1/pn) Σ_j m_j (T_1j + ... + T_pj),   with Σ_j m_j = 0,

is any contrast in the marginal means of B, then the sum of the products of the coefficients of identical observations in the two contrasts is

(1/pqn²) Σ_i Σ_j l_i m_j n = (1/pqn) (Σ_i l_i)(Σ_j m_j) = 0.

So every contrast among the levels of A is orthogonal to every contrast among the levels of B; such data are called orthogonal. More generally, this holds when the cell frequencies are proportional, i.e., when n_ij/n_ij' is the same for every row; otherwise the data are nonorthogonal.
Consider now the two way model without interaction,

y_ijk = μ + α_i + β_j + ε_ijk,

and assume that the ε_ijk are identically and independently distributed as N(0, σ²). Using the least squares principle, we minimize the sum of squares due to error,

E = Σ_i Σ_j Σ_k ε_ijk² = Σ_i Σ_j Σ_k (y_ijk − μ − α_i − β_j)².

The normal equations are

∂E/∂μ = 0 ⇒ n_oo μ + Σ_i n_io α_i + Σ_j n_oj β_j = Σ_i Σ_j Σ_k y_ijk = G,    (1)

∂E/∂α_i = 0 ⇒ n_io μ + n_io α_i + Σ_j n_ij β_j = A_i   (i = 1, 2, ..., I),    (2)

∂E/∂β_j = 0 ⇒ n_oj μ + Σ_i n_ij α_i + n_oj β_j = B_j   (j = 1, 2, ..., J),    (3)

so from (3),

β_j = B_j/n_oj − μ − (1/n_oj) Σ_i n_ij α_i.
Substituting β_j from (3) into (2) eliminates the β_j (the μ terms cancel) and gives, for each i,

A_i − Σ_j (n_ij/n_oj) B_j = n_io α_i − Σ_j (n_ij/n_oj) Σ_m n_mj α_m,

or, writing

Q_i = A_i − Σ_j (n_ij/n_oj) B_j,

the reduced normal equations

Q_i = C_ii α_i + Σ_{m ≠ i} C_im α_m   (i = 1, 2, ..., I).    (4)
Here

C_ii = n_io − Σ_j n_ij²/n_oj,
C_im = − Σ_j n_ij n_mj / n_oj   (m ≠ i),   with C_im = C_mi.

Written out, the reduced normal equations are

Q_1 = C_11 α_1 + Σ_{m ≠ 1} C_1m α_m,
Q_2 = C_22 α_2 + Σ_{m ≠ 2} C_2m α_m,
...
Q_I = C_II α_I + Σ_{m ≠ I} C_Im α_m.

Note that

Σ_i Q_i = Σ_i A_i − Σ_j (Σ_i n_ij) B_j / n_oj = G − Σ_j B_j = 0,

and, using the normal equations, summing the right hand side of (4) over i also gives 0, so the equations in (4) are not all independent. Consequently, if (α_1, ..., α_I) is a set of solutions, then (α_1 + c, ..., α_I + c) is also a set of solutions for any constant c. To get a unique solution, impose a condition such as Σ_i α_i = 0; the solutions then furnish estimates of contrasts among the α_i only.
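A numerical sketch (hypothetical unbalanced counts; standard library only) of forming Q and C and solving the reduced normal equations with the side condition Σ α_i = 0:

```python
# Sketch: reduced normal equations for unbalanced two-way data (no interaction).
# n[i][j]: cell counts; A[i], B[j]: level totals of the response (hypothetical).
n = [[3, 1, 2], [1, 2, 2], [2, 2, 1]]
A = [24.0, 19.5, 21.0]
B = [26.5, 20.0, 18.0]
I, J = len(n), len(n[0])
nio = [sum(row) for row in n]
noj = [sum(col) for col in zip(*n)]

Q = [A[i] - sum(n[i][j] * B[j] / noj[j] for j in range(J)) for i in range(I)]
C = [[nio[i] - sum(n[i][j] ** 2 / noj[j] for j in range(J)) if i == m
      else -sum(n[i][j] * n[m][j] / noj[j] for j in range(J))
      for m in range(I)] for i in range(I)]

# C is singular (its rows sum to 0); replace the last equation by the side
# condition sum(alpha) = 0, then solve by Gauss-Jordan elimination.
M = [row[:] + [q] for row, q in zip(C[:-1], Q[:-1])] + [[1.0] * I + [0.0]]
for c in range(I):
    p = max(range(c, I), key=lambda r: abs(M[r][c]))   # partial pivoting
    M[c], M[p] = M[p], M[c]
    for r in range(I):
        if r != c:
            fac = M[r][c] / M[c][c]
            M[r] = [x - fac * z for x, z in zip(M[r], M[c])]
alpha = [M[i][I] / M[i][i] for i in range(I)]
print(alpha)   # adjusted estimates of the alpha_i, constrained to sum to 0
```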
After obtaining the solutions α̂_i from (4), the solutions β̂_j can be obtained from (3) if required. Further, the error sum of squares is

E = Σ_i Σ_j Σ_k (y_ijk − μ̂ − α̂_i − β̂_j)²
  = Σ_i Σ_j Σ_k y_ijk² − μ̂ G − Σ_i α̂_i A_i − Σ_j β̂_j B_j
  = Σ_i Σ_j Σ_k y_ijk² − Σ_j B_j²/n_oj − Σ_i α̂_i Q_i.    (5)

Here we eliminated the β_j and obtained the error sum of squares through the α̂_i. Doing it the other way round, i.e., eliminating the α_i, obtaining the β̂_j and then the error sum of squares, gives

E = Σ_i Σ_j Σ_k y_ijk² − Σ_i A_i²/n_io − Σ_j β̂_j R_j,    (6)

where

R_j = B_j − Σ_i (n_ij/n_io) A_i.
Both error sums of squares (5) and (6) are the same, so

Σ_i A_i²/n_io + Σ_j β̂_j R_j = Σ_j B_j²/n_oj + Σ_i α̂_i Q_i,

or

Σ_i A_i²/n_io − Σ_j B_j²/n_oj = Σ_i α̂_i Q_i − Σ_j β̂_j R_j.    (7)
ANALYSIS OF NONORTHOGONAL DATA
Now, under

H_0 : α_1 = α_2 = ... = α_I = 0,

the model becomes

y_ijk = μ + β_j + ε_ijk.

Minimizing the sum of squares due to error E* = Σ_i Σ_j Σ_k (y_ijk − μ − β_j)², the normal equations are

∂E*/∂μ = 0 ⇒ n_oo μ + Σ_j n_oj β_j = G,
∂E*/∂β_j = 0 ⇒ n_oj μ + n_oj β_j = B_j,

so that μ̂ + β̂_j = B_j/n_oj, and the minimum value is

E_1 = Σ_i Σ_j Σ_k y_ijk² − Σ_j B_j²/n_oj.    (8)

Similarly, under the hypothesis that all the β_j vanish, the corresponding unadjusted sum of squares due to A is Σ_i A_i²/n_io − G²/n_oo.
The analysis of variance table (A adjusted for B) is:

Source           Degrees of freedom   Sum of squares                       Mean squares
A (adjusted)     I − 1                SSA(adj) = Σ_i α̂_i Q_i              MSA = SSA(adj)/(I − 1)
B (unadjusted)   J − 1                SSB(unadj) = Σ_j B_j²/n_oj − G²/n_oo MSB = SSB(unadj)/(J − 1)
Error            IJ − I − J + 1       SS_error (by subtraction)            MSE = SS_error/(IJ − I − J + 1)
Total            IJ − 1               Σ_i Σ_j Σ_k y_ijk² − G²/n_oo

The hypothesis H_0 : α_1 = ... = α_I = 0 is tested by F = MSA/MSE.
In terms of the grand total, the error sum of squares (5) can be written as

E = Σ_i Σ_j Σ_k y_ijk² − Σ_j B_j²/n_oj − Σ_i α̂_i Q_i
  = (Σ_i Σ_j Σ_k y_ijk² − G²/n_oo) − (Σ_j B_j²/n_oj − G²/n_oo) − Σ_i α̂_i Q_i,    (9)

and likewise (6) becomes

E = Σ_i Σ_j Σ_k y_ijk² − Σ_i A_i²/n_io − Σ_j β̂_j R_j
  = (Σ_i Σ_j Σ_k y_ijk² − G²/n_oo) − (Σ_i A_i²/n_io − G²/n_oo) − Σ_j β̂_j R_j.    (10)
If an interaction term is included, the fitted cell totals satisfy

T̂_ij = n_ij (μ̂ + α̂_i + β̂_j + γ̂_ij),

and the error sum of squares becomes

E_2 = Σ_i Σ_j Σ_k y_ijk (y_ijk − μ̂ − α̂_i − β̂_j − γ̂_ij) = Σ_i Σ_j Σ_k y_ijk² − Σ_i Σ_j T_ij²/n_ij.

Since E = Σ_i Σ_j Σ_k y_ijk² − Σ_j B_j²/n_oj − Σ_i α̂_i Q_i, the sum of squares due to interaction is

E − E_2 = Σ_i Σ_j T_ij²/n_ij − Σ_j B_j²/n_oj − Σ_i α̂_i Q_i.
Estimation of a contrast of the α's

Consider a linear contrast L = Σ_i l_i α_i (with Σ_i l_i = 0). Its estimate can be written as a linear function of the Q_i,

L̂ = Σ_i l_i α̂_i = Σ_i q_i Q_i.

If the α̂_i are available as linear functions of the Q_i, then the q_i can be read off directly. Otherwise, the q_i can be obtained as follows. Since

Q_i = C_ii α̂_i + Σ_{m ≠ i} C_im α̂_m,   i = 1, 2, ..., I,

we have

Σ_i l_i α̂_i = Σ_i q_i [C_i1 α̂_1 + C_i2 α̂_2 + ... + C_iI α̂_I],

so that the q_i satisfy

l_i = Σ_m q_m C_mi   (i = 1, 2, ..., I),

which are the same normal equations as (4), except that the Q_i are substituted by the l_i and the unknown α_i are written as q_i. Hence the q_i can be obtained from the solution of the same normal equations.
Now

Var(Q_i) = Var(A_i − Σ_j (n_ij/n_oj) B_j) = C_ii σ²,
Cov(Q_i, Q_m) = C_im σ²,

so

Var(L̂) = σ² Σ_i Σ_m q_i q_m C_im = σ² Σ_i q_i l_i.

In particular,

Var(α̂_i − α̂_m) = σ² (q_i − q_m),

where q_i and q_m are the coefficients of Q_i and Q_m respectively in the expression giving the estimate of (α_i − α_m).
The sum of squares due to the contrast L is the square of its estimate divided by the coefficient of σ² in its variance:

SS(L̂) = (Σ_i q_i Q_i)² / (Σ_i l_i q_i).

As an illustration, consider proportional cell frequencies of the form n_ij = r n_i n_j / N with Σ_i n_i = Σ_j n_j = N, so that n_io = r n_i and n_oj = r n_j. Then

C_ii = r n_i − r n_i²/N,   C_im = − r n_i n_m / N,

and the reduced normal equations become

r n_i α_i − (r n_i/N) Σ_m n_m α_m = Q_i,   i.e.,   n_i α_i − (n_i/N) Σ_m n_m α_m = Q_i / r.

Imposing the restriction Σ_m n_m α_m = 0, the solution is obtained as

α̂_i = Q_i / (r n_i).

The adjusted sum of squares due to A then reduces to

Σ_i α̂_i Q_i = Σ_i A_i²/(r n_i) − G²/(r N),

and the unadjusted sum of squares due to B is Σ_j B_j²/(r n_j) − G²/(r N); with proportional frequencies the data are orthogonal, so the adjusted and unadjusted sums of squares coincide.

Thus

Var(α̂_i − α̂_m) = (σ²/r) (1/n_i + 1/n_m).