You are on page 1of 137

SPRINGER BRIEFS IN STATISTICS

Simo Puntanen
George P. H. Styan
Jarkko Isotalo

Formulas Useful
for Linear Regression
Analysis and Related
Matrix Theory
Its Only Formulas
But We Like Them
SpringerBriefs in Statistics

For further volumes:


http://www.springer.com/series/8921
Photograph 1 Tiritiri Island, Auckland, New Zealand. (Photo: SP)
Simo Puntanen
George P. H. Styan
Jarkko Isotalo

Formulas Useful
for Linear Regression
Analysis and Related
Matrix Theory
Its Only Formulas But We Like Them

123
Simo Puntanen George P. H. Styan
School of Information Sciences Department of Mathematics
University of Tampere and Statistics
Tampere, Finland McGill University
Montral, QC, Canada
Jarkko Isotalo
Department of Forest Sciences
University of Helsinki
Helsinki, Finland

ISSN 2191-544X ISSN 2191-5458 (electronic)


ISBN 978-3-642-32930-2 ISBN 978-3-642-32931-9 (eBook)
DOI 10.1007/978-3-642-32931-9
Springer Heidelberg New York Dordrecht London

Library of Congress Control Number: 2012948184

The Author(s) 2013


This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of
the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or
information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed. Exempted from this legal reservation are brief
excerpts in connection with reviews or scholarly analysis or material supplied specifically for the
purpose of being entered and executed on a computer system, for exclusive use by the purchaser
of the work. Duplication of this publication or parts thereof is permitted only under the provisions of
the Copyright Law of the Publishers location, in its current version, and permission for use must always
be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright
Clearance Center. Violations are liable to prosecution under the respective Copyright Law.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt
from the relevant protective laws and regulations and therefore free for general use.
While the advice and information in this book are believed to be true and accurate at the date of
publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for
any errors or omissions that may be made. The publisher makes no warranty, express or implied, with
respect to the material contained herein.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)


Preface

Lie la lie, lie la la-lie lie la-lie.


There must be fifty ways to leave your lover.
Oh, still crazy after all these years.
Paul Simon1

Think about going to a lonely island for some substantial time and that you are sup-
posed to decide what books to take with you. This book is then a serious alternative:
it does not only guarantee a good nights sleep (reading in the late evening) but also
oers you a survival kit in your urgent regression problems (denitely met at the day
time on any lonely island, see for example Photograph 1, p. ii).
Our experience is that even though a huge amount of the formulas related to linear
models is available in the statistical literature, it is not always so easy to catch them
when needed. The purpose of this book is to collect together a good bunch of helpful
ruleswithin a limited number of pages, however. They all exist in literature but are
pretty much scattered. The rst version (technical report) of the Formulas appeared
in 1996 (54 pages) and the fourth one in 2008. Since those days, the authors have
never left home without the Formulas.
This book is not a regular textbookthis is supporting material for courses given
in linear regression (and also in multivariate statistical analysis); such courses are
extremely common in universities providing teaching in quantitative statistical anal-
ysis. We assume that the reader is somewhat familiar with linear algebra, matrix
calculus, linear statistical models, and multivariate statistical analysis, although a
thorough knowledge is not needed, one year of undergraduate study of linear alge-
bra and statistics is expected. A short course in regression would also be necessary
before traveling with our book. Here are some examples of smooth introductions to
regression: Chatterjee & Hadi (2012) (rst ed. 1977), Draper & Smith (1998) (rst
ed. 1966), Seber & Lee (2003) (rst ed. 1977), and Weisberg (2005) (rst ed. 1980).
The term regression itself has an exceptionally interesting history: see the excel-
lent chapter entitled Regression towards Mean in Stigler (1999), where (on p. 177)
he says that the story of Francis Galtons (18221911) discovery of regression is an
exciting one, involving science, experiment, mathematics, simulation, and one of the
great thought experiments of all time.
1
From (1) The Boxer, a folk rock ballad written by Paul Simon in 1968 and rst recorded by Simon
& Garfunkel, (2) 50 Ways to Leave Your Lover, a 1975 song by Paul Simon, from his album Still
Crazy After All These Years, (3) Still Crazy After All These Years, a 1975 song by Paul Simon and
title track from his album Still Crazy After All These Years.

v
vi Preface

This book is neither a real handbook: by a handbook we understand a thorough


representation of a particular area. There are some recent handbook-type books deal-
ing with matrix algebra helpful for statistics. The book by Seber (2008) should be
mentioned in particular. Some further books are, for example, by Abadir & Magnus
(2005) and Bernstein (2009). Quick visits to matrices in linear models and multi-
variate analysis appear in Puntanen, Seber & Styan (2013) and in Puntanen & Styan
(2013).
We do not provide any proofs nor references. The book by Puntanen, Styan &
Isotalo (2011) oers many proofs for the formulas. The website http://www.sis.uta.
/tilasto/matrixtricks supports both these books by additional material.
Sincere thanks go to Gtz Trenkler, Oskar Maria Baksalary, Stephen J. Haslett,
and Kimmo Vehkalahti for helpful comments. We give special thanks to Jarmo Nie-
mel for his outstanding LATEX assistance. The Figure 1 (p. xii) was prepared using
the Survo software, online at http://www.survo. (thanks go to Kimmo Vehkalahti)
and the Figure 2 (p. xii) using PSTricks (thanks again going to Jarmo Niemel).
We are most grateful to Alice Blanck, Ulrike Stricker-Komba, and to Niels Peter
Thomas of Springer for advice and encouragement.
This research has been supported in part by the Natural Sciences and Engineering
Research Council of Canada.
SP, GPHS & JI
June 7, 2012

MSC 2000: 15-01, 15-02, 15A09, 15A42, 15A99, 62H12, 62J05.


Key words and phrases: Best linear unbiased estimation, CauchySchwarz inequal-
ity, column space, eigenvalue decomposition, estimability, GaussMarkov model,
generalized inverse, idempotent matrix, linear model, linear regression, Lwner or-
dering, matrix inequalities, oblique projector, ordinary least squares, orthogonal pro-
jector, partitioned linear model, partitioned matrix, rank cancellation rule, reduced
linear model, Schur complement, singular value decomposition.

References
Abadir, K. M. & Magnus, J. R. (2005). Matrix Algebra. Cambridge University Press.
Bernstein, D. S. (2009). Matrix Mathematics: Theory, Facts, and Formulas. Princeton University
Press.
Chatterjee, S. & Hadi, A. S. (2012). Regression Analysis by Example, 5th Edition. Wiley.
Draper, N. R. & Smith, H. (1998). Applied Regression Analysis, 3rd Edition. Wiley.
Puntanen, S., Styan, G. P. H. & Isotalo, J. (2011). Matrix Tricks for Linear Statistical Models: Our
Personal Top Twenty. Springer.
Puntanen, S. & Styan, G. P. H. (2013). Chapter 52: Random Vectors and Linear Statistical Models.
Handbook of Linear Algebra, 2nd Edition (Leslie Hogben, ed.), Chapman & Hall, in press.
Puntanen, S., Seber, G. A. F. & Styan, G. P. H. (2013). Chapter 53: Multivariate Statistical Analysis.
Handbook of Linear Algebra, 2nd Edition (Leslie Hogben, ed.), Chapman & Hall, in press.
Seber, G. A. F. (2008). A Matrix Handbook for Statisticians. Wiley.
Seber, G. A. F. & Lee, A. J. (2006). Linear Regression Analysis, 2nd Edition. Wiley.
Stigler, S. M. (1999). Statistics on the Table: The History of Statistical Concepts and Methods.
Harvard University Press.
Weisberg, S. (2005). Applied Linear Regression, 3rd Edition. Wiley.
Contents

Formulas Useful for Linear Regression Analysis and Related Matrix Theory . . 1
1 The model matrix & other preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
2 Fitted values and residuals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
3 Regression coecients . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
4 Decompositions of sums of squares . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
5 Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
6 Best linear predictor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
7 Testing hypotheses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
8 Regression diagnostics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
9 BLUE: Some preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
10 Best linear unbiased estimator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
11 The relative eciency of OLSE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
12 Linear suciency and admissibility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
13 Best linear unbiased predictor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
14 Mixed model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
15 Multivariate linear model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
16 Principal components, discriminant analysis, factor analysis . . . . . . . . . . . 74
17 Canonical correlations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
18 Column space properties and rank rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
19 Inverse of a matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
20 Generalized inverses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
21 Projectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
22 Eigenvalues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
23 Singular value decomposition & other matrix decompositions . . . . . . . . . . 105
24 Lwner ordering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
25 Inequalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
26 Kronecker product, some matrix derivatives . . . . . . . . . . . . . . . . . . . . . . . . . 115

Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117

vii
Notation

Rnm the set of n  m real matrices: all matrices considered in this book
are real
Rnm
r the subset of Rnm consisting of matrices with rank r
NNDn the subset of symmetric n  n matrices consisting of nonnegative
denite (nnd) matrices
PDn the subset of NNDn consisting of positive denite (pd) matrices
0 null vector, null matrix
1n column vector of ones, shortened 1
In identity matrix, shortened I
ij the j th column of I; j th standard basis vector
Anm D faij g n  m matrix A with its elements aij , A D .a1 W : : : W am / pre-
sented columnwise, A D .a.1/ W : : : W a.n/ /0 presented row-wise
a column vector a 2 Rn
A0 transpose of matrix A; A is symmetric if A0 D A, skew-symmetric
if A0 D A
.A W B/ partitioned (augmented) matrix
1
A inverse of matrix Ann : AB D BA D In H) B D A1
A generalized inverse of matrix A: AA A D A
C
A the MoorePenrose inverse of matrix A: AAC A D A; AC AAC D
AC ; .AAC /0 D AAC ; .AC A/0 D AC A
A1=2 nonnegative denite square root of A 2 NNDn
C1=2
A nonnegative denite square square root of AC 2 NNDn
ha; bi standard inner product in Rn : ha; bi D a0 b
ha; biV inner product a0 Vb; V is a positive denite inner product matrix
(ipm)

ix
x Notation

kak Euclidean norm (standard norm, 2-norm) of vector a: kak2 D a0 a,


also denoted as kak2
kakV kak2V D a0 Va, norm when the ipm is positive denite V
kAkF Euclidean (Frobenius) norm of matrix A: kAk2F D tr.A0 A/
det.A/ determinant of matrix A, also denoted as jAj
diag.d1 ; : : : ; dn / n  n diagonal matrix with listed diagonal entries
diag.A/ diagonal matrix formed by the diagonal entries of Ann , denoted
also as A
r.A/ rank of matrix A, denoted also as rank.A/
tr.A/ trace of matrix Ann , denoted also as trace.A/: tr.A/ D a11 C
a22 C    C ann
A L 0 A is nonnegative denite: A D LL0 for some L
A >L 0 A is positive denite: A D LL0 for some invertible L
A L B A  B is nonnegative denite, Lwner partial ordering
A >L B A  B is positive denite
cos.a; b/ the cosine of the angle, , between the nonzero vectors a and b:
cos.a; b/ D cos  D ha; bi=.kakkbk/
vec.A/ the vector of columns of A, vec.Anm / D .a01 ; : : : ; a0m /0 2 Rnm
AB Kronecker product of Anm and Bpq :
0 1
a11 B : : : a1m B
B : :: :: C npmq
A B D @ :: : : A2R
an1 B : : : anm B
 11 A12 
A221 Schur complement of A11 in A D A A21 A22 : A221 D A22 
A21 A
11 A12 D A=A 11

PA orthogonal projector onto C .A/ w.r.t. ipm I: PA D A.A0 A/ A0 D


AAC
PAIV orthogonal projector onto C .A/ w.r.t. ipm V: PAIV D
A.A0 VA/ A0 V
PAjB projector onto C .A/ along C .B/: PAjB .A W B/ D .A W 0/
C .A/ column space of matrix Anm : C .A/ D f y 2 Rn W y D Ax for
some x 2 Rm g
N .A/ null space of matrix Anm : N .A/ D f x 2 Rm W Ax D 0 g
C .A/? orthocomplement of C .A/ w.r.t. ipm I: C .A/? D N .A0 /
A? matrix whose column space is C .A? / D C .A/? D N .A0 /
C .A/?
V orthocomplement of C .A/ w.r.t. ipm V
A?
V matrix whose column space is C .A/? ? ? 1 ?
V : AV D .VA/ D V A
Notation xi

chi .A/ D i the i th largest eigenvalue of Ann (all eigenvalues being real):
.i ; ti / is the i th eigenpair of A: Ati D i ti , ti 0
ch.A/ set of all n eigenvalues of Ann , including multiplicities
nzch.A/ set of nonzero eigenvalues of Ann , including multiplicities
sgi .A/ D i the i th largest singular value of Anm
sg.A/ set of singular values of Anm
U CV sum of vector spaces U and V
U V direct sum of vector spaces U and V; here U \ V D f0g
U V direct sum of orthogonal vector spaces U and U
vard .y/ D sy2 sample variance: argument is variable vector y 2 Rn :
vard .y/ D 1
n1
y0 Cy
covd .x; y/ D sxy sample covariance:
X
n
covd .x; y/ D 1
n1
x0 Cy D 1
n1
.xi  xN i /.yi  y/
N
iD1
p
cord .x; y/ D rxy sample correlation: rxy D x0 Cy= x0 Cx  y0 Cy, C is the centering
matrix
E.x/ expectation of a p-dimensional random vector x: E.x/ D x D
.1 ; : : : ; p /0 2 Rp
cov.x/ D covariance matrix (p  p) of a p-dimensional random vector x:
cov.x/ D D fij g D E.x  x /.x  x /0 ;
cov.xi ; xj / D ij D E.xi  i /.xj  j /;
var.xi / D i i D i2
cor.x/ correlation matrix of a p-dimensional random vector x:
 
ij
cor.x/ D f%ij g D
i j
xii

15

10

-5

-10

-15

x
-15 -10 -5 0 5 10 15

Figure 1 Observations from N2 .0; /; x D 5, y D 4, %xy D 0:7. Regression line has the
slope O1  %xy y =x . Also the regression line of x on y is drawn. The direction of the rst
major axis of the contour ellipse is determined by t1 , the rst eigenvector of .

1
X

y
SST SSR SSE
e I Hy
SSE

x
SSE
SST

H 1
SSR
y Jy 1 JHy

Figure 2 Illustration of SST D SSR C SSE.


Formulas Useful for Linear Regression Analysis
and Related Matrix Theory

1 The model matrix & other preliminaries

1.1 Linear model. By M D fy; X;  2 Vg we mean that we have the model y D


X C ", where E.y/ D X 2 Rn and cov.y/ D  2 V, i.e., E."/ D 0 and
cov."/ D  2 V; M is often called the GaussMarkov model.
 y is an observable random vector, " is unobservable random error vector,
X D .1 W X0 / is a given n  p (p D k C 1) model (design) matrix, the
vector D .0 ; 1 ; : : : ; k /0 D .0 ; 0x /0 and scalar  2 > 0 are unknown
 yi D E.yi / C "i D 0 C x0.i/ x C "i D 0 C 1 xi1 C    C k xik C "i ,
where x0.i/ D the i th row of X0
 from the context it is apparent when X has full column rank; when distri-
butional properties are considered, we assume that y  Nn .X;  2 V/
 according to the model, we believe that E.y/ 2 C .X/, i.e., E.y/ is a linear
combination of the columns of X but we do not know which linear combi-
nation
 from the context it is clear which formulas require that the model has the
intercept term 0 ; p refers to the number of columns of X and hence in the
no-intercept model p D k
 if the explanatory variables xi are random variables, then the model M
may be interpreted as the conditional model of y given X: E.y j X/ D X,
cov.y j X/ D  2 V, and the error term is dierence y  E.y j X/. In short,
regression is the study of how the conditional distribution of y, when x is
given, changes with the value of x.

S. Puntanen et al., Formulas Useful for Linear Regression 1


Analysis and Related Matrix Theory, SpringerBriefs in Statistics,
DOI: 10.1007/978-3-642-32931-9_1,  c The Author(s) 2013
2 Formulas Useful for Linear Regression Analysis and Related Matrix Theory
1
0
x0.1/
B C
1.2 X D .1 W X0 / D .1 W x1 W : : : W xk / D @ ::: A 2 Rn.kC1/ model matrix
0
x.n/ n  p, p DkC1
0 0 1 0 1
x.1/ x11 x12 : : : x1k
B :: C @ :: :: :: A
1.3 X0 D .x1 W : : : W xk / D @ : A D : : : 2 Rnk
0 xn1 xn2 : : : xnk data matrix
x.n/
of x1 ; : : : ; xk

1.4 1 D .1; : : : ; 1/0 2 Rn ; ii D .0; : : : ; 1 (i th) : : : ; 0/0 2 Rn ;


i0i X D x0.i/ D .1; x0.i/ / D the i th row of X;
i0i X0 D x0.i/ D the i th row of X0

1.5 x1 ; : : : ; xk variable vectors in variable space Rn


x.1/ ; : : : ; x.n/ observation vectors in observation space Rk

1.6 Xy D .X0 W y/ 2 Rn.kC1/ joint data matrix of x1 ; : : : ; xk and response y

1.7 J D 1.10 1/1 10 D n1 110 D P1 D Jn D orthogonal projector onto C .1n /


N D yN D .y;
Jy D y1 N y; N 0 2 Rn
N : : : ; y/

1.8 IJDC C D Cn D orthogonal projector onto C .1n /? ,


centering matrix
1.9 .I  J/y D Cy D y  y1
N n
D y  yN D yQ D .y1  y; N 0
N : : : ; yn  y/ centered y

1.10 xN D .xN 1 ; : : : ; xN k /0 D n1 X00 1n


D n1 .x.1/ C    C x.n/ / 2 Rk vector of x-means

1.11 JXy D .JX0 W Jy/ D .xN 1 1 W : : : W xN k 1 W y1/


N 0 0 1
xN yN
@ : :: A
D .xN 1 W : : : W xN k W yN / D 1.Nx ; y/
0
N D :: : 2R
n.kC1/

xN 0 yN

1.12 Q 0 D .I  J/X0 D CX0 D .x1  xN 1 W : : : W xk  xN k / D .Qx1 W : : : W xQ k /


X
0 0 1 0 0 1
x.1/  xN 0 xQ .1/
B : C B : C nk
D@ :: A D @ :: A 2 R centered X0
0 0 0
x.n/  xN xQ .n/
1 The model matrix & other preliminaries 3

1.13 Q y D .I  J/Xy
X
Q 0 W y  yN / D .X
D CXy D .CX0 W Cy/ D .X Q 0 W yQ / centered Xy
 0   0 
X0 CX0 X00 Cy Q X
X Q Q0Q
1.14 T D Xy0 .I  J/Xy D X Qy D
Q y0 X D 0 0 X0 y
y0 CX0 y0 Cy yQ 0 XQ 0 yQ 0 yQ
0 1
    t11 t12 : : : t1k t1y
T txy ssp.X0 / ssp.X0 ; y/ B :: :: :: :: C
D 0xx D DB : :
@tk1 tk2 : : : tkk tky A
: : C
txy tyy ssp.y; X0 / ssp.y/
ty1 ty2 : : : tyk tyy
D ssp.X0 W y/ D ssp.Xy / corrected sums of squares and cross products

X
n X
n
1.15 Txx D X Q 0 D X00 CX0 D
Q 00 X xQ .i/ xQ 0.i/ D .x.i/  xN /.x.i/  xN /0
iD1 iD1
X
n
D x.i/ x0.i/  nNxxN 0 D fx0i Cxj g D ftij g D ssp.X0 /
iD1
 
Sxx sxy
1.16 S D covd .Xy / D covd .X0 W y/ D 1
T D 0
n1 sxy sy2
 
covd .X0 / covd .X0 ; y/
D sample covariance matrix of xi s and y
covd .y; X0 / vard .y/
   
x covs .x/ covs .x; y/
D covs D here x is the vector
y covs .y; x/ vars .y/
of xs to be observed

1.17 T D diag.T/ D diag.t11 ; : : : ; tkk ; tyy /;


S D diag.S/ D diag.s12 ; : : : ; sk2 ; sy2 /

1.18 Q D .xQ W : : : W xQ W yQ /
Xy 1 k

Q y T1=2 ; centering & scaling: diag.X


DX Q
Q0X
y y / D IkC1

1.19 While calculating the correlations, we assume that all variables have nonzero
variances, that is, the matrix diag.T/ is positive denite, or in other words:
xi C .1/, i D 1; : : : ; k, y C .1/.

1.20 R D cord .X0 W y/ D cord .Xy / sample correlation matrix of xi s and y


D S1=2

SS1=2

D T1=2

TT1=2
!    
QX0 X Q X Q 0 yQ
DXQ0XQ D 0 0 0 D
Rxx rxy
D
cord .X0 / cord .X0 ; y/
y y Q
yQ 0 X yQ 0 yQ r0xy 1 cord .y; X0 / 1
0
4 Formulas Useful for Linear Regression Analysis and Related Matrix Theory
     
T t Sxx sxy Rxx rxy
1.21 T D 0xx xy ; SD D 1
T; RD
txy tyy s0xy sy2 n1 r0xy 1
0 1 0 1 0 1
t1y s1y r1y
1.22 txy D @ ::: A ; sxy D @ ::: A ; rxy D @ ::: A
tky sky rky

X
n X
n X
n 2
1.23 SSy D N 2D
.yi  y/ yi2  n1 yi
iD1 iD1 iD1
X
n
D yi2  nyN 2 D y0 Cy D y0 y  y0 Jy
iD1

X
n X
n X
n X
n
1.24 SPxy D .xi  x/.y
N i  y/
N D xi yi  1
n
xi yi
iD1 iD1 iD1 iD1
Xn
D xi yi  nxN yN D x0 Cy D x0 y  x0 Jy
iD1

1.25 sy2 D vars .y/ D vard .y/ D 1


n1
y0 Cy;
sxy D covs .x; y/ D covd .x; y/ D n11
x0 Cy

1.26 rij D cors .xi ; xj / D cord .xi ; xj / D cos.Cxi ; Cxj / D cos.Qxi ; xQj / D xQ 0i xQj
tij sij x0 Cxj SPij
Dp D Dp 0 i Dp sample
ti i tjj si sj 0
xi Cxi  xj Cxj SSi SSj correlation

1.27 If x and y are centered then cord .x; y/ D cos.x; y/.

1.28 Keeping observed data as a theoretical distribution. Let u1 , . . . , un be the ob-


served values of some empirical variable u, and let u be a discrete random
variable whose values are u1 , . . . , un , each with P.u D ui / D n1 . Then
E.u / D uN and var.u / D n1 n u
s 2 . More generally, consider a data matrix
0
U D .u.1/ W : : : W u.n/ / and dene a discrete random vector u with proba-
bility function P.u D u.i/ / D n1 , i D 1; : : : ; n. Then
N
E.u / D u; cov.u / D n1 U0 CU D n1
n
covd .U/ D n1
n
S:
Moreover, the sample correlation matrix of data matrix U is the same as the
(theoretical, population) correlation matrix of u . Therefore, any property
shown for population statistics, holds for sample statistics and vice versa.

1.29 Mahalanobis distance. Consider a data matrix Unp D .u.1/ W : : : W u.n/ /0 ,


where covd .U/ D S 2 PDp . The (squared) sample Mahalanobis distance of
1 The model matrix & other preliminaries 5

the observation u.i/ from the mean vector uN is dened as


N 0 S1 .u.i/  u/
N S/ D .u.i/  u/
MHLN2 .u.i/ ; u; N D kS1=2 .u.i/  u/k
N 2:
N S/ D .n1/hQ i i , where H
Moreover, MHLN2 .u.i/ ; u; Q D PCU . If x is a random
vector with E.x/ D  2 R and cov.x/ D 2 PDp , then the (squared)
p

Mahalanobis distance between x and  is the random variable


MHLN2 .x; ; / D .x/0 1 .x/ D z0 zI z D 1=2 .x/:

1.30 Statistical distance. The squared Euclidean distance of the i th observation u.i/
from the mean uN is of course ku.i/  uk N 2 . Given the data matrix Unp , one
may wonder if there is a more informative way, in statistical sense, to measure
N Consider a new variable D a0 u so that the n
the distance between u.i/ and u.
values of are in the variable vector z D Ua. Then i D a0 u.i/ and N D a0 u,N
and we may dene
ji  jN ja0 .u.i/  u/j
N
Di .a/ D p D p ; where S D covd .U/.
vars ./ 0
a Sa
Let us nd a vector a which maximizes Di .a/. In view of 22.24c (p. 102),
N 0 S1 .u.i/  u/
max Di2 .a/ D .u.i/  u/ N D MHLN2 .u.i/ ; u;
N S/:
a0

The maximum is attained for any vector a proportional to S1 .u.i/  u/.
N

1.31 C .A/ D the column space of Anm D .a1 W : : : W am /


D f z 2 Rn W z D At D a1 t1 C    C am tm for some t 2 Rm g  Rn

1.32 C .A/? D the orthocomplement of C .A/


D the set of vectors which are orthogonal (w.r.t. the standard inner
product u0 v) to every vector in C .A/
D f u 2 Rn W u0 At D 0 for all t g D f u 2 Rn W A0 u D 0 g
D N .A0 / D the null space of A0

1.33 Linear independence and rank.A/. The columns of Anm are linearly inde-
pendent i N .A/ D f0g. The rank of A, r.A/, is the maximal number of
linearly independent columns (equivalently, rows) of A; r.A/ D dim C .A/.

1.34 A? D a matrix whose column space is C .A? / D C .A/? D N .A0 /:


Z 2 fA? g () A0 Z D 0 and r.Z/ D n  r.A/ D dim C .A/? :

1.35 The rank of the model matrix X D .1 W X0 / can be expressed as


r.X/ D 1 C r.X0 /  dim C .1/ \ C .X0 / D r.1 W CX0 /
D 1 C r.CX0 / D 1 C r.Txx / D 1 C r.Sxx /;
6 Formulas Useful for Linear Regression Analysis and Related Matrix Theory

and thereby
r.Sxx / D r.X/  1 D r.CX0 / D r.X0 /  dim C .1/ \ C .X0 /:
If all x-variables have nonzero variances, i.e., the correlation matrix Rxx is
properly dened, then r.Rxx / D r.Sxx / D r.Txx /. Moreover,
Sxx is pd () r.X/ D k C 1 () r.X0 / D k and 1 C .X0 /:

1.36  In 1.1 vector y is a random vector but for example in 1.6 y is an observed
sample value.
 cov.x/ D D fij g refers to the covariance matrix (p  p) of a random
vector x (with p elements), cov.xi ; xj / D ij , var.xi / D i i D i2 :
cov.x/ D D E.x  x /.x  x /0
D E.xx0 /  x 0x ; x D E.x/:

 notation x  .; / indicates that E.x/ D  and cov.x/ D


 the determinant det./ is called the (population) generalized variance


 cor.x/ D % D iijj refers to the correlation matrix of a random vector x:

cor.x/ D % D 1=2

1=2

; D 1=2

% 1=2

 cov.Ax/ D A cov.x/A0 D AA0 , A 2 Rap , E.Ax/ D Ax


 E 1=2 .x  / D 0, cov 1=2 .x  / D Ip , when x  .; /
 cov.T0 x/ D , if D TT0 is the eigenvalue decomposition of
 var.a0 x/ D a0 cov.x/a D a0 a  0 for all a 2 Rp and hence every
covariance matrix is nonnegative denite; is singular i there exists a
nonzero a 2 Rp such that a0 x D a constant with probability 1
 var.x1 x2 / D 12 C 22 212
     
x cov.x/ cov.x; y/ xx xy
 cov D D ,
y cov.y; x/ cov.y/ yx yy
   
x xx  xy
cov D
y  0xy y2
 cov.x; y/ refers to the covariance matrix between random vectors x and y:
cov.x; y/ D E.x  x /.y  y /0 D E.xy0 /  x 0y D xy

 cov.x; x/ D cov.x/
 cov.Ax; By/ D A cov.x; y/B0 D A xy B0
 cov.Ax C By/ D A xx A0 C B yy B0 C A xy B0 C B yx A0
1 The model matrix & other preliminaries 7

 cov.Ax; By C Cz/ D cov.Ax; By/ C cov.Ax; Cz/


 cov.a0 x; y/ D a0 cov.x; y/ D a0  xy D a1 1y C    C ap py
 cov.z/ D I2 ,
   2 
x p0 x xy
AD H) cov.Az/ D ;
y % y 1  %2 xy y2
i.e.,
 
u
cov.z/ D cov D I2
v
   2 
xu
p x xy
H) cov   D
y %u C 1  %2 v xy y2

    !  
1 0 x x 2 0
 cov D cov xy D x 2
xy =x2 1 y y  2 x 0 y .1  %2 /
x

 covd .Unp / D covs .u/ refers to the sample covariance matrix


Xn
covd .U/ D n1
1
U0 CU D n1 1
.u.i/  u/.u N 0DS
N .i/  u/
iD1
 the determinant det.S/ is called the (sample) generalized variance
 covd .UA/ D A0 covd .U/A D A0 SA D covs .A0 u/
 covd .US1=2 / D Ip D covd .CUS1=2 /
 U D CUS1=2 : U is centered and transformed so that the new vari-
N
ables are uncorrelated N and each has variance 1. Moreover, diag.UU0 / D
N S/.
diag.d1 ; : : : ; dn /, where di2 D MHLN2 .u.i/ ; u;
2 2 NN

 U D CUdiag.S/1=2 : U is centered and scaled so that each variable


has variance 1 (and the squared length n  1)
Q D CUdiag.U0 CU/1=2 : U
 U Q is centered and scaled so that each variable
1
has length 1 (and variance n1 )
 Denote U# D CUT, where Tpp comprises the orthonormal eigenvectors
of S W S D TT0 , D diag.1 ; : : : ; p /. Then U# is centered and trans-
formed so that the new variables are uncorrelated and the i th variable has
variance i : covd .U# / D .
 vars .u1 C u2 / D covd .Un2 12 / D 10 covd .U/1 D 10 S22 1 D s12 C s22 C
2s12
 vars .u1 u2 / D s12 C s22 2s12
8 Formulas Useful for Linear Regression Analysis and Related Matrix Theory

2 Fitted values and residuals

2.1 H D X.X0 X/ X0 D XXC D PX orthogonal projector onto C .X/


0 1 0
D X.X X/ X; when r.X/ D p

2.2 H D P1 C P.IJ/X0 D P1 C PXQ 0 X D .1 W X0 /


DJCX Q 0 / X
Q 00 X
Q 0 .X Q 00 D J C X
Q 0 T Q0
xx X0
Q 0 D .I  J/X0 D CX0
X

2.3 Q 0 .X
H  J D PXQ 0 D X Q 0 / X
Q 00 X Q 00 D PC .X/\C .1/?

2.4 H D P X1 C P M1 X2 I
M1 D I  P X1 ; X D .X1 W X2 /; X1 2 Rnp1 ; X2 2 Rnp2

2.5 H  PX1 D PM1 X2 D PC .X/\C .M1 / D M1 X2 .X02 M1 X2 / X02 M1

2.6 C .M1 X2 / D C .X/ \ C .M1 /;


C .M1 X2 /? D N .X02 M1 / D C .X/?  C .X1 /

2.7 C .Cx/ D C .1 W x/ \ C .1/? ; C .Cx/? D C .1 W x/?  C .1/

2.8 r.X02 M1 X2 / D r.X02 M1 / D r.X2 /  dim C .X1 / \ C .X2 /

2.9 The matrix X02 M1 X2 is pd i r.M1 X2 / D p2 , i.e., i C .X1 / \ C .X2 / D f0g


and X2 has full column rank. In particular, Txx D X00 CX0 is pd i r.CX0 / D
k i r.X/ D k C 1 i
C .X0 / \ C .1/ D f0g and X0 has full column rank.

2.10 H D X1 .X01 M2 X1 / X01 M2 C X2 .X02 M1 X2 / X02 M1


i C .X1 / \ C .X2 / D f0g

2.11 MDIH orthogonal projector onto C .X/? D N .X0 /

2.12 M D I  .PX1 C PM1 X2 / D M1  PM1 X2 D M1 .I  PM1 X2 /

2.13 yO D Hy D XO D X
c D OLSE.X/ OLSE of X, the tted values

2.14 Because yO is the projection of y onto C .X/, it depends only on C .X/, not on
a particular choice of X D X , as long as C .X/ D C .X /. The coordinates
O depend on the choice of X.
of yO with respect to X, i.e., ,
2 Fitted values and residuals 9

2.15 yO D Hy D XO D 1O0 C X0 O x D O0 1 C O1 x1 C    C Ok xk
D .J C PXQ 0 /y D Jy C .I  J/X0 T1 N 0 T1
xx t xy D .yN  x
1
xx t xy /1 C X0 Txx t xy

2.16 yO D X1 O 1 C X2 O 2 D X1 .X01 M2 X1 /1 X01 M2 y C X2 .X02 M1 X2 /1 X02 M1 y


D .PX1 C PM1 X2 /y D X1 .X01 X1 /1 X01 y C M1 X2 .X02 M1 X2 /1 X02 M1 y
D PX1 y C M1 X2 O 2 here and in 2.15 r.X/ D p

2.17 OLS criterion. Let O be any vector minimizing ky  Xk2 . Then XO is


OLSE.X/. Vector XO is always unique but O is unique i r.X/ D p. Even
if r.X/ < p, O is called the OLSE of even though it is not an ordinary esti-
mator because of its nonuniqueness; it is merely a solution to the minimizing
problem. The OLSE of K0 is K0 O which is unique i K0 is estimable.

2.18 Normal equations. Let O 2 Rp be any solution to normal equation X0 X D


X0 y. Then O minimizes ky  Xk. The general solution to X0 X D X0 y is
O D .X0 X/ X0 y C Ip  .X0 X/ X0 Xz;
where z 2 Rp is free to vary and .X0 X/ is an arbitrary (but xed) generalized
inverse of X0 X.

2.19 Generalized normal equations. Let Q 2 Rp be any solution to the generalized


normal equation X0 V1 X D X0 V1 y, where V 2 PDn . Then Q minimizes
ky  XkV1 . The general solution to the equation X0 V1 X D X0 V1 y is
Q D .X0 V1 X/ X0 V1 y C Ip  .X0 V1 X/ X0 V1 Xz;
where z 2 Rp is free to vary.

2.20 Under the model fy; X;  2 Ig the following holds:


(a) E.Oy/ D E.Hy/ D X; cov.Oy/ D cov.Hy/ D  2 H
(b) cov.Oy; y  yO / D 0; cov.y; y  yO / D  2 M
(c) yO  Nn .X;  2 H/ under normality
O where x0 is the i th row of X
(d) yOi D x0.i/ , yOi  N.x0.i/ ;  2 hi i /
.i/

(e) "O D y  yO D .In  H/y D My D res.yI X/ "O D the residual vector


(f) " D y  X; E."/ D 0; cov."/ D  2 In " D error vector
O D 0, cov."/
(g) E."/ O D  2 M and hence the components of the residual
vector "O may be correlated and have unequal variances
(h) "O D My  Nn .0;  2 M/ when normality is assumed
O
(i) "Oi D yi  yOi D yi  x0.i/ ; "Oi  N0;  2 .1  hi i /, "Oi D the i th
residual
10 Formulas Useful for Linear Regression Analysis and Related Matrix Theory

(j) var.O"i / D  2 .1  hi i / D  2 mi i
hij mij
(k) cor.O"i ; "Oj / D D
.1  hi i /.1  hjj /1=2 .mi i mjj /1=2
"Oi "i
(l) p  N.0; 1/;  N.0; 1/
 1  hi i 

2.21 Under the intercept model fy; .1 W X0 /;  2 Ig the following holds:


 0 
y0 .I  J/Hy y .H  J/y 1=2
(a) cord .y; yO / D p D
y0 .I  J/y  y0 H.I  J/Hy y0 .I  J/y
 
SSR 1=2
D D R D Ryx D the multiple correlation
SST
O D0
(b) cord .Oy; "/ the tted values and residuals are uncorrelated
O D0
(c) cord .xi ; "/ each xi -variable is uncorrelated
with the residual vector
O D .C/.1  R2 /1=2 ( 0)
(d) cord .y; "/ y and "O may be
positively correlated
Pn
(e) "O 0 1 D y0 M1 D 0, i.e., Oi
iD1 " D0 the residual vector "O is centered
0O 0
(f) "O y D 0; "O xi D 0
Pn
(g) yO 0 1 D y0 H1 D y0 1, i.e., 1
n iD1 yOi D yN the mean of yOi -values is yN

2.22 Under the model fy; X;  2 Vg we have


(a) E.Oy/ D X; cov.Oy/ D  2 HVH; cov.Hy; My/ D  2 HVM,
O D ;
(b) E./ O D  2 .X0 X/1 X0 VX.X0 X/1 .
cov./ [if r.X/ D p]

3 Regression coecients

In this section we consider the model fy; X;  2 Ig, where r.X/ D p most of
the time and X D .1 W X0 /. As regards distribution, y  Nn .X;  2 I/.
!
O0
3.1 O D .X X/ X y D
0 1 0
2 RkC1 estimated regression
O x coecients, OLSE./

3.2 O D ;
E./ O D  2 .X0 X/1 I
cov./ O  NkC1 ;  2 .X0 X/1 

3.3 O0 D yN  O 0x xN D yN  .O1 xN 1 C    C Ok xN k / estimated constant term,


intercept
3 Regression coecients 11

3.4 O x D .O1 ; : : : ; Ok /0
Q 00 X
D .X00 CX0 /1 X00 Cy D .X Q 0 /1 X
Q 00 yQ D T1 1
xx t xy D Sxx sxy

SPxy sxy sy
3.5 k D 1; X D .1 W x/; O0 D yN  O1 x;
N O1 D D 2 D rxy
SSx sx sx

3.6 If the model does not have the intercept term, we denote p D k, X D X0 ,
and O D .O1 ; : : : ; Op /0 D .X0 X/1 X0 y D .X00 X0 /1 X00 y.

3.7 If X D .X1 W X2 /, Mi D I  PXi , Xi 2 Rnpi , i D 1; 2, then


!  
O1
.X01 M2 X1 /1 X01 M2 y
O D O D and
2 .X02 M1 X2 /1 X02 M1 y

3.8 O 1 D .X01 X1 /1 X01 y  .X01 X1 /1 X01 X2 O 2 D .X01 X1 /1 X01 .y  X2 O 2 /:

3.9 Denoting the full model as M12 D fy; .X1 W X2 /;  2 Ig and


M1 D fy; X1 1 ;  2 Ig; small model
2
M122 D fM2 y; M2 X1 1 ;  M2 g; with M2 D I  PX2 ; reduced
model
O i .A / D OLSE of i under the model A ;
we can write 3.8 as
(a) O 1 .M12 / D O 1 .M1 /  .X01 X1 /1 X01 X2 O 2 .M12 /,
and clearly we have
(b) O 1 .M12 / D O 1 .M122 /. (FrischWaughLovell theorem)

3.10 Let M12 D fy; .1 W X0 /;  2 Ig, D .0 W 0x /0 , M121 D fCy; CX0 x ;


 2 Cg D centered model. Then 3.9b means that x has the same OLSE in the
original model and in the centered model.

3.11 O 1 .M12 / D .X01 X1 /1 X01 y, i.e., the old regression coecients do not change
when the new regressors (X2 ) are added i X01 X2 O 2 D 0.

3.12 The following statements are equivalent:


(a) O 2 D 0, (b) X02 M1 y D 0, (c) y 2 N .X02 M1 / D C .M1 X2 /? ,
(d) pcord .X2 ; y j X01 / D 0 or y 2 C .X1 /I X D .1 W X01 W X2 /
D .X1 W X2 /:

3.13 The old regression coecients do not change when one new regressor xk is
added i X01 xk D 0 or Ok D 0, with Ok D 0 being equivalent to x0k M1 y D 0.
12 Formulas Useful for Linear Regression Analysis and Related Matrix Theory

3.14 In the intercept model the old regression coecients O1 , O2 , . . . , Okp2 (of
real predictors) do not change when new regressors (whose values are in
X2 ) are added if cord .X01 ; X2 / D 0 or O 2 D 0; here X D .1 W X01 W X2 /.

x0k .I  PX1 /y x 0 M1 y u0 v
3.15 Ok D 0 D 0k D 0 , where X D .X1 W xk / and
xk .I  PX1 /xk x k M1 x k vv
u D M1 y D res.yI X1 /, v D M1 xk D res.xk I X1 /, i.e., Ok is the OLSE of
k in the reduced model M121 D fM1 y; M1 xk k ;  2 M1 g.

y 0 M1 y
3.16 Ok2 D ryk12:::k1
2
 0 2
D ryk12:::k1  t kk  SSE.yI X1 /
xk M1 x0k

3.17 Multiplying xk by a means that Ok will be divided by a.


Multiplying y by b means that Ok will be multiplied by b.
si
3.18 O D R1
xx rxy ; O i D Oi ; O 0 D 0; standardized regr. coecients
sy (all variables centered & equal variance)

r1y  r12 r2y r2y  r12 r1y .1  r2y


2 1=2
/
3.19 k D 2 W O 1 D ; O
2 D ; O
1 D r1y2
1  r12
2
1  r12
2
.1  r12 /
2 1=2

3.20 Let X D .1 W X1 W X2 / WD .Z W X2 /, where X2 2 Rnp2 . Then


O D  2 .X0 X/1
cov./
 1  
2 n 10 X0 2 1=n C x N 0 T1 xN Nx0 T1
D D xx xx
X00 1 X00 X0 T1 xx x N T1
xx
0 1
00 01
! t t : : : t 0k
B 10 t 11 : : : t 1k C
var.O0 / cov.O0 ; O x / 2 Bt C
D D  B :: :: : : : C
cov.O x ; O0 / cov.O x / @ : : : :: A
t k0 t k1 : : : t kk
 0 0
1 

2 Z Z Z X2 2 T T2
D D
X02 Z X02 X2 T2 T22
 0 1

2 Z .I  PX2 /Z 
D ; where
 X02 .I  PZ /X2 1
(a) T22 D T1 0
221 D X2 .I  P.1WX1 / X2 
1

D .X02 M1 X2 /1 D X02 .C  PCX1 /X2 1 M1 D I  P.1WX1 /


D X02 CX2  X02 CX1 .X01 CX1 /1 X01 CX2 1
D .T22  T21 T1 1
11 T12 / ,
3 Regression coecients 13
 
T11 T12
(b) Txx D .X1 W X2 /0 C.X1 W X2 / D 2 Rkk ,
T21 T22
 
 
T1 D ,
xx  T22
(c) t kk D x0k .I  PX1 /xk 1 X D .X1 W xk /, X1 D .1 W x1 W : : : W xk1 /
D 1=SSE.k/ D 1=SSE.xk explained by all other xs/
D 1=SSE.xk I X1 / D 1=tkkX1 corresp. result holds for all t i i
(d) t kk D .x0k Cxk /1 D 1=tkk i X01 Cxk D 0 i r1k D r2k D    D
rk1;k D 0,
(e) t 00 D 1
n
C xN 0 T1 N D .n  10 PX0 1/1 D k.I  PX0 /1k1 .
xx x

3.21 Under fy; .1 W x/;  2 Ig:


  P 2 
O 2 1=n C x N 2 =SSx x=SS
N x 2 xi =n xN
cov./ D  D ;
x=SS
N x 1=SSx SSx xN 1
2 x0 x xN
var.O1 / D ; var.O0 / D  2 ; cor.O0 ; O1 / D p :
SSx nSSx x0 x=n

3.22 cov.O 2 / D  2 .X02 M1 X2 /1 ; O 2 2 Rp2 ; X D .X1 W X2 /; X2 is n  p2

3.23 cov.O 1 j M12 / D  2 .X01 M2 X1 /1 L  2 .X01 X1 /1 D cov.O 1 j M1 /:


adding new regressors cannot decrease the variances of old regression coef-
cients.

3.24 Ri2 D R2 .xi explained by all other xs/


D R2 .xi I X.i/ / X.i/ D .1 W x1 ; : : : ; xi1 ; xiC1 ; : : : ; xk /
SSR.i / SSE.i /
D D1 SSE.i / D SSE.xi I X.i/ /, SST.i / D ti i
SST.i / SST.i /

1
3.25 VIFi D D r i i ; i D 1; : : : ; k; VIFi D variance ination factor
1  Ri2
SST.i / ti i
D D D ti i t i i ; R1 ij 1
xx D fr g; Txx D ft g
ij
SSE.i / SSE.i /

3.26 VIFi  1 and VIFi D 1 i cord .xi ; X.i/ / D 0

3.27 var.Oi / D  2 t i i
2 VIFi rii 2
D D 2 D 2 D ; i D 1; : : : ; k
SSE.i / ti i ti i .1  Ri2 /ti i
14 Formulas Useful for Linear Regression Analysis and Related Matrix Theory

2
3.28 k D 2 W var.Oi / D ; cor.O1 ; O2 / D r12
.1  r12
2
/ti i
3.29 cor.O1 ; O2 / D r1234:::k D  pcord .x1 ; x2 j X2 /; X2 D .x3 W : : : W xk /

3.30 b O D O 2 .X0 X/1


cov./ estimated covariance matrix of O

3.31 c Oi / D O 2 t i i D se2 .Oi /


var. estimated variance of Oi
p p
3.32 se.Oi / D c Oi / D O t i i
var. estimated stdev of Oi , standard error of Oi

3.33 Oi t=2Ink1 se.Oi / .1  /100% condence interval for i

3.34 Best linear unbiased prediction, BLUP, of y under


     
y X 2 In 0
M D ; ;  I
y x0# 0 1
a linear model with new future observation; see Section 13 (p. 65). Suppose
that X D .1 W X0 / and denote x0# D .1; x0 / D .1; x1 ; : : : ; xk /. Then
(a) y D x0# C " new unobserved value y with
a given .1; x0 / under M
(b) yO D x0# O D O0 C x0 O x D O0 C O1 x1 C    C Ok xk
D .yN  O 0x xN / C O 0x x D yN C O 0x .x  xN / yO D BLUP.y /
(c) e D y  yO prediction error with a given x
(d) var.yO / D O
var.x0# / D  2 x0# .X0 X/1 x# WD  2 h#

D var.y/N C var O 0x .x  xN / Note: cov.O x ; y/
N D0
0 1

D  n C .x  xN / Txx .x  xN /
2 1

D  2 n1 C n1
1
.x  xN /0 S1
xx .x  xN/

D  2 n1 C n1
1
MHLN2 .x ; xN ; Sxx /

1 .x  x/
N 2
(e) var.yO / D  2 C ; when k D 1; yO D O0 C O1 x
n SSx
(f) var.e / D var.y  yO / variance of the prediction error
D var.y / C var.yO / D  C  h# D  2 1 C x0# .X0 X/1 x# 
2 2


1 .x  x/
N 2
(g) var.e / D  2 1 C C ; when k D 1; yO D O0 C O1 x
n SSx
(h) se2 .yO / D var. O D O 2 h#
c yO / D se2 .x0# / estimated variance of yO
4 Decompositions of sums of squares 15

(i) se2 .e / D var.e


c /
D var.y
c   yO / D O 2 .1 C h# / estimated variance of e
p
0 O O
c yO / D x# t=2Ink1 se.x0# /
(j) yO t=2Ink1 var.
p
D yO t=2Ink1 O h# condence interval for E.y /
p p
(k) yO t=2Ink1 var.y
c   yO / D yO t=2Ink1 O 1 C h#
prediction interval for the new unobserved y
p p
(l) yO .k C 1/F;kC1;nk1 O h#
WorkingHotelling condence band for E.y /

4 Decompositions of sums of squares

Unless otherwise stated we assume that 1 2 C .X/ holds throughout this sec-
tion.
4.1 SST D ky  yN k2 D k.I  J/yk2 D y0 .I  J/y D y0 y  nyN 2 D tyy total SS

4.2 SSR D kOy  yN k2 D k.H  J/yk2 D k.I  J/Hyk2 D y0 .H  J/y


D y0 PCX0 y D y0 PXQ 0 y D t0xy T1
xx t xy SS due to regression; 1 2 C .X/

4.3 SSE D ky  yO k2 D k.I  H/yk2 D y0 .I  H/y D y0 My D y0 .C  PCX0 /y


D y0 y  y0 XO D tyy  t0xy T1
xx t xy residual sum of squares

4.4 SST D SSR C SSE

4.5 (a) df.SST/ D r.I  J/ D n  1; sy2 D SST=.n  1/,


(b) df.SSR/ D r.H  J/ D r.X/  1; MSR D SSR=r.X/  1,
(c) df.SSE/ D r.I  H/ D n  r.X/; MSE D SSE=n  r.X/ D O 2 .
X
n X
n X
n
4.6 SST D N 2;
.yi  y/ SSR D N 2;
.yOi  y/ SSE D .yi  yOi /2
iD1 iD1 iD1
 
SSR
4.7 SSE D SST 1  D SST.1  R2 / D SST.1  Ryx
2
/
SST

4.8 MSE D O 2 sy2 .1  Ryx


2
/ 2
which corresponds to yx D y2 .1  %yx
2
/

4.9 MSE D O 2 D SSE=n  r.X/ unbiased estimate of  2 ,


residual mean square
D SSE=.n  k  1/; when r.X/ D k C 1
16 Formulas Useful for Linear Regression Analysis and Related Matrix Theory

4.10 We always have .I  J/y D .H  J/y C .I  H/y and similarly always y0 .I 


J/y D y0 .H  J/y C y0 .I  H/y, but the decomposition 4.4,
(a) k.I  J/yk2 D k.H  J/yk2 C k.I  H/yk2 ,
is valid i .Oy  yN /0 .y  yO / D .H  J/y0 .I  H/y D y0 J.H  I/y D 0, which
holds for all y i JH D J which is equivalent to 1 2 C .X/, i.e., to H1 D 1.
Decomposition (a) holds also if y is centered or y 2 C .X/.

4.11 1 2 C .X/ () H  J is orthogonal projector () JH D HJ D J in


which situation JOy D JHy D Jy D .y;
N y; N 0.
N : : : ; y/

4.12 In the intercept model we usually have X D .1 W X0 /. If X does not explicitly


have 1 as a column, but 1 2 C .X/, then C .X/ D C .1 W X/ and we have
H D P1 C PCX D J C PCX , and H  J is indeed an orthogonal projector.

4.13 SSE D min ky  Xk2 D SSE.yI X/ D kres.yI X/k2


4.14 SST D min ky  1k2 D SSE.yI 1/ D kres.yI 1/k2 y explained only by 1


4.15 SSR D minkHy  1k2 D SSE.HyI 1/ D kres.HyI 1/k2


D SSE.yI 1/  SSE.yI X/ change in SSE gained


adding real predictors
D SSE.X0 j 1/ when 1 is already in the model

SSR SSE
4.16 R2 D Ryx
2
D D1 multiple correlation coecient squared,
SST SST coecient of determination
SSE.yI 1/  SSE.yI X/
D fraction of SSE.yI 1/ D SST accounted
SSE.yI 1/ for by adding predictors x1 ; : : : ; xk
t0xy T1
xx t xy s0xy S1
xx sxy
D D
tyy sy2
D r0xy R1 O 0 rxy D O 1 r1y C    C O k rky
xx rxy D

4.17 O D cord2 .yI yO / D cos2 Cy; .H  J/y


R2 D max cord2 .yI X/ D cord2 .yI X/

4.18 (a) SSE D y0 I  .PX1 C PM1 X2 /y X D .X1 W X2 /; M1 D I  PX1


0 0
D y M1 y  y PM1 X2 y D SSE.M1 yI M1 X2 /
D SSE.yI X1 /  SSR.eyX1 I EX2 X1 /, where
(b) eyX1 D res.yI X1 / D M1 y D residual of y after elimination of X1 ,
(c) EX2 X1 D res.X2 I X1 / D M1 X2 .
4 Decompositions of sums of squares 17

4.19 y0 PM1 X2 y D SSE.yI X1 /  SSE.yI X1 ; X2 / D SSE.X2 j X1 /


D reduction in SSE when adding X2 to the model

4.20 Denoting M12 D fy; X;  2 Ig, M1 D fy; X1 1 ;  2 Ig, and M121 D fM1 y;
M1 X2 2 ;  2 M1 g, the following holds:
(a) SSE.M121 / D SSE.M12 / D y0 My,
(b) SST.M121 / D y0 M1 y D SSE.M1 /,
(c) SSR.M121 / D y0 M1 y  y0 My D y0 PM1 X2 y,
SSR.M121 / y 0 P M1 X2 y y0 My
(d) R2 .M121 / D D 0
D1 0 ,
SST.M121 / y M1 y y M1 y
y0 My
(e) X2 D xk : R2 .M121 / D ryk12:::k1
2
and 1  ryk12:::k1
2
D ,
y 0 M1 y
(f) 1  R2 .M12 / D 1  R2 .M1 /1  R2 .M121 /
D 1  R2 .yI X1 /1  R2 .M1 yI M1 X2 /;
(g) 1  Ry12:::k
2
D .1  ry1
2
/.1  ry21
2
/.1  ry312
2
/    .1  ryk12:::k1
2
/.

4.21 If the model does not have the intercept term [or 1 C .X/], then the decom-
position 4.4 is not valid. In this situation, we consider the decomposition
y0 y D y0 Hy C y0 .I  H/y; SSTc D SSRc C SSEc :
In the no-intercept model, the coecient of determination is dened as
SSRc y0 Hy SSE
Rc2 D D 0 D1 0 :
SSTc yy yy
In the no-intercept model we may have Rc2 D cos2 .y; yO / cord2 .y; yO /. How-
ever, if both X and y are centered (actually meaning that the intercept term
is present but not explicitly), then we can use the usual denitions of R2 and
Ri2 . [We can think that SSRc D SSE.yI 0/  SSE.yI X/ D change in SSE
gained adding predictors when there are no predictors previously at all.]

4.22 Sample partial correlations. Below we consider the data matrix .X W Y/ D


.x1 W : : : W xp W y1 W : : : W yq /.
(a) EYX D res.YI X/ D .I  P.1WX/ /Y D MY D .ey1 X W : : : W eyq X /,
(b) eyi X D .I  P.1WX/ /yi D Myi ,
(c) pcord .Y j X/ D cord res.YI X/ D cord .EYX /
D partial correlations of variables of Y after elimination of X,
(d) pcord .y1 ; y2 j X/ D cord .ey1 X ; ey2 X /,
(e) Tyyx D E0YX EYX D Y0 .I  P.1WX/ /Y D Y0 MY D ftij x g 2 PDq ,
18 Formulas Useful for Linear Regression Analysis and Related Matrix Theory

(f) ti ix D y0i .I  P.1WX/ /yi D y0i Myi D SSE.yi I 1; X/.


   
Txx Txy X0 CY X0 CX
4.23 Denote T D D , C D I  J, M D I  P.1WX/ .
Tyx Tyy Y0 CX Y0 CY
Then
   
T1  R1 
(a) T1 D xxy
; cord .X W Y/1 D R1 D xxy
;
 T1
yyx  R1
yyx

(b) Tyyx D Tyy  Tyx T1 0 0 0 1 0 0


xx Txy D Y CY  Y CX.Y CX/ Y CX D Y MY,

(c) Ryyx D Ryy  Ryx R1


xx Rxy ,

(d) pcord .Y j X/ D diag.Tyyx /1=2 Tyyx diag.Tyyx /1=2


D diag.Ryyx /1=2 Ryyx diag.Ryyx /1=2 :

4.24 .Y  XB/0 C.Y  XB/ D .CY  CXB/0 .CY  CXB/


L .CY  PCX Y/0 .CY  PCX Y/
D Y0 C.I  PCX /CY D Tyy  Tyx T1
xx Txy

for all B, and hence for all B we have


covs .y  B0 x/ D covd .Y  XB/ L Syy  Syx S1
xx Sxy ;

where the equality is attained if B D T1 1


xx Txy D Sxx Sxy ; see 6.7 (p. 28).

rxy  rx ry
4.25 rxy D p partial correlation
.1  rx
2
/.1  ry
2
/

4.26 If Y D .x1 W x2 /, X D .x3 W : : : W xk / and Ryyx D fr ij g, then


12
r
cor.O1 ; O2 / D  pcord .x1 ; x2 j x3 ; : : : ; xk / D r123:::k D p :
r 11 r 22

4.27 Added variable plot (AVP). Let X D .X1 W xk / and denote


u D eyX1 D M1 y D res.yI X1 /;
v D exk X1 D M1 xk D res.xk I X1 /:
The scatterplot of exk X1 versus eyX1 is an AVP. Moreover, consider the mod-
 
els M12 D fy; .X1 W xk /;  2 Ig, with D k1 , M1 D fy; X1 1 ;  2 Ig,
and M121 D fM1 y; M1 xk k ;  2 M1 g D feyX1 ; exk X1 k ;  2 M1 g. Then
(a) Ok .M12 / D Ok .M121 /, (FrischWaughLovell theorem)
(b) res.yI X/ D res.M1 yI M1 xk /, R2 .M121 / D ryk12:::k1
2
,

(c) 1  R2 .M12 / D 1  R2 .M1 /.1  ryk12:::k1


2
/.
5 Distributions 19

5 Distributions

5.1 Discrete uniform distribution. Let x be a random variable whose values are
1; 2; : : : ; N , each with equal probability 1=N . Then E.x/ D 12 .N C 1/, and
var.x/ D 12 1
.N 2  1/.

5.2 Sum of squares and cubes of integers:


X
n X
n
i 2 D 16 n.n C 1/.2n C 1/; i 3 D 14 n2 .n C 1/2 :
iD1 iD1

5.3 Let x1 ; : : : ; xp be a random sample selected without a replacement from A D


2
f1; 2; : : : ; N g. Denote y D x1 C    C xp D 1p0 x. Then var.xi / D N121 ,
 
cor.xi ; xj / D  N11 D %, i; j D 1; : : : ; p, cor 2 .x1 ; y/ D p1 C 1  p1 %.

5.4 Bernoulli distribution. Let x be a random variable whose values are 0 and
1, with probabilities p and q D 1  p. Then x  Ber.p/ and E.x/ D p,
var.x/ D pq. If y D x1 C    C xn , where xi are independent and each
xi  Ber.p/, then y follows the binomial distribution, y  Bin.n; p/, and
E.x/ D np, var.x/ D npq.

5.5 Two dichotomous variables. On the basis of the following frequency table:
1  n 
sx2 D D  1 ; x
n1 n n1 n n
1 ad  bc ad  bc 0 1 total
sxy D ; rxy D p ; 0 a b
n1 n  y
1 c d
n.ad  bc/2
2 D 2
D nrxy : total  n

 
5.6 Let z D yx be a discrete 2-dimensional random vector which is obtained
from the frequency table in 5.5 so that each
 observation
 has the same proba-
bility 1=n. Then E.x/ D n , var.x/ D n 1  n , cov.x; y/ D .ad  bc/=n2 ,
p
and cor.x; y/ D .ad  bc/=  .

5.7 In terms of the probabilities:


x
var.x/ D p1 p2 ;
0 1 total
cov.x; y/ D p11 p22  p12 p21 ;
0 p11 p12 p1
p11 p22  p12 p21 y
cor.x; y/ D p D %xy : 1 p21 p22 p2
p1 p2 p1 p2
total p1 p2 1
20 Formulas Useful for Linear Regression Analysis and Related Matrix Theory
   
p11 p12 a b
5.8 %xy D 0 () det D det D0
p21 p22 c d
p11 p12 a b
() f D () D
p21 p22 c d

5.9 Dichotomous random variables x and y are statistically independent i %xy D


0.

5.10 Independence between random variables means statistical (stochastic) inde-


pendence: the random vectors   x and y are statistically independent i the joint
distribution function of xy is the product of the distribution functions of x
and y. For example, if x and y are discrete random variables with values
x1 ; : : : ; xr and y1 ; : : : ; yc , then x and y are statistically independent i
P.x D xi ; y D yj / D P.x D xi / P.y D yj /; i D 1; : : : ; r; j D 1; : : : ; c:

5.11 Finiteness matters. Throughout this book, we assume that the expectations,
variances and covariances that we are dealing with are nite. Then indepen-
dence of the random variables x and y implies that cor.x; y/ D 0. This im-
plication may not be true if the niteness is not holding.

5.12 Denition N1: A p-dimensional random variable z is said to have a p-variate


normal distribution Np if every linear function a0 z has a univariate normal
distribution. We denote z  Np .; /, where  D E.z/ and D cov.z/. If
a0 z D b, where b is a constant, we dene a0 z  N.b; 0/.

5.13 Denition N2: A p-dimensional random variable z, with  D E.z/ and D


cov.z/, is said to have a p-variate normal distribution Np if it can be expressed
as z D  C Fu, where F is an p  r matrix of rank r and u is a random vector
of r independent univariate normal random variables.

5.14 If z  Np then each element of z follows N1 . The reverse relation does not
necessarily hold.
 
5.15 If z D xy is multinormally distributed, then x and y are stochastically inde-
pendent i they are uncorrelated.

5.16 Let z  Np .; /, where is positive denite. Then z has a density


1 1 0 1
n.zI ; / D e 2 .z/ .z/ :
.2
/p=2 jj1=2

5.17 Contours of constant density for N2 .; / are ellipses dened by


A D f z 2 R2 W .z  /0 1 .z  / D c 2 g:
5 Distributions 21
p
These ellipses are centered at  and have axes c i ti , where i D chi ./
and ti is the corresponding eigenvector. The major axis is the longest diameter
(line through ) of the ellipse, that is, we want to nd a point z1 solving
maxkz  k2 subject to z 2 A. Denoting u D z  , the above task becomes
max u0 u subject to u0 1 u D c 2 ;
p
for which the solution is u1 D z1   D c 1 t1 , and u01 u1 D c 2 1 .
Correspondingly, the minor axis is the shortest diameter of the ellipse A.
   
 2 12 1 %
5.18 The eigenvalues of D D 2 , where 12  0, are
21  2 % 1
ch1 ./ D  2 C 12 D  2 .1 C %/;
ch2 ./ D  2  12 D  2 .1  %/;
     
and t1 D p1 11 , t2 D p1 1 1 . If 12
0, then t 1 D
p1 1 .
2 2 2 1

5.19 When p D 2 and cor.1 ; 2 / D % ( 1), we have


 1  2 1
1 11 12 1 1 2 %
(a) D D
21 22 1 2 % 22
0 1
1 %
 
1 22 12 1 B 2 C
B 1 1 2 C ;
D D @ A
12 22 .1  %2 / 12 12 1% 2 % 1
1 2 22

(b) det./ D 12 22 .1  %2 /


12 22 ,

1 1 .1  1 /2
(c) n.zI ; / D p  exp
2
1 2 1  %2 2.1  %2 / 12

.1  1 /.2  2 / .2  2 /2
 2% C :
1 2 22
     
x x xx xy
5.20 Suppose that z  N.; /, z D ,D ,D . Then
y y yx yy
(a) the conditional distribution of y given that x is held xed at a selected
value x D x is normal with mean
N
E.y j x/ D y C yx 1
xx .x  x /
N N
D .y  yx 1 1
xx x / C yx xx x;
N
(b) and the covariance matrix (partial covariances)
cov.y j x/ D yyx D yy  yx 1
xx xy D = xx :
N
22 Formulas Useful for Linear Regression Analysis and Related Matrix Theory
   
x x
5.21 If z D  NpC1 .; /;  D ;
y y
   
xx  xy 1  
D ; D ; then
 0xy y2   yy
(a) E.y j x/ D y C  0xy 1 0 1 0 1
xx .x  x / D .y   xy xx x / C  xy xx x,
N N N
2
(b) var.y j x/ D y12:::p 2
D yx D y2   0xy 1 
xx xy
N  0 1 

xy xx xy 
D y2 1 
y2
D y2 .1  %yx
2
/ D 1= yy D conditional variance;
 0xy 1
xx  xy
2
(c) %yx D D the squared population multiple correlation.
y2
x
5.22 When y  N2 , cor.x; y/ D %, and WD xy =x2 , we have

(a) E.y j x/ D y C .x  x / D y C % yx .x  x / D .y  x / C x,
N N N N
2
xy
(b) var.y j x/ D yx
2
D y2  x2
D y2 .1  %2 /
y2 D var.y/.
N

5.23 The random vector y C yx 1 xx .x  x / appears to be the best linear pre-


dictor of y on the basis of x, denoted as BLP.yI x/. In general, BLP.yI x/ is
not the conditional expectation of y given x.

5.24 The random vector


eyx D y  BLP.yI x/ D y  y C yx 1
xx .x  x /

is the vector of residuals of y from its regression on x, i.e., prediction error


between y and its best linear predictor BLP.yI x/. The matrix of partial co-
variances of y (holding x xed) is
cov.eyx / D yyx D yy  yx 1
xx xy D = xx :

If z is not multinormally distributed, the matrix yyx is not necessarily the


covariance matrix of the conditional distribution.

5.25 The population partial correlations. The ij -element of the matrix of partial
correlations of y (eliminating x) is
ij x
%ij x D p D cor.eyi x ; eyj x /;
i ix jj x
fij x g D yyx ; f%ij x g D cor.eyx /;
and eyi x D yi  yi   0xyi 1
xx .x  x /,  xyi D cov.x; yi /. In particular,
5 Distributions 23
%xy  %x %y
%xy D p :
.1  %2x /.1  %y
2
/

5.26 The conditional expectation E.y j x/, where x is now a random vector, is
BP.yI x/, the best predictor of y on the basis of x. Notice that BLP.yI x/ is the
best linear predictor.

5.27 In the multinormal distribution, BLP.yI x/ D E.y j x/ D BP.yI x/ D the best


predictor of y on the basis of x; here E.y j x/ is a random vector.

5.28 The conditional mean E.y j x/ is called (in the world of random variables)
the regression function (true Nmean of y when x is held at a selected value x)
N
and similarly var.y j x/ is called the variance function. Note that in the multi-
N
normal case E.y j x/ is simply a linear function of x and var.y j x/ does not
depend on x at all. N N N
N
 
5.29 Let yx be a random vector and let E.y j x/ WD m.x/ be a random variable
taking the value E.y j x D x/ when x takes the value x, and var.y j x/ WD
N the value var.y j x D x/N when x D x. Then
v.x/ is a random variable taking
N N
E.y/ D EE.y j x/ D Em.x/;
var.y/ D varE.y j x/ C Evar.y j x/ D varm.x/ C Ev.x/:
 
5.30 Let yx be a random vector such that E.y j x D x/ D C x. Then D
N N
xy =x2 and D y  x .

5.31 If z  .; / and A is symmetric, then E.z0 Az/ D tr.A/ C 0 A.

5.32 Central 2 -distribution: z  Nn .0; In /: z0 z D 2n  2 .n/

5.33 Noncentral 2 -distribution: z  Nn .; In /: z0 z D 2n;  2 .n; /, D 0 

5.34 z  Nn .;  2 In / W z0 z= 2  2 .n; 0 = 2 /

5.35 Let z  Nn .; / where is pd and let A and B be symmetric. Then


(a) z0 Az  2 .r; / i AA D A, in which case r D tr.A/ D r.A/,
D 0 A,
(b) z0 1 z D z0 cov.z/1 z  2 .r; /, where r D n, D 0 1 ,
(c) .z  /0 1 .z  / D MHLN2 .z; ; /  2 .n/,
(d) z0 Az and z0 Bz are independent i AB D 0,
(e) z0 Az and b0 z are independent i Ab D 0.
24 Formulas Useful for Linear Regression Analysis and Related Matrix Theory

5.36 Let z  Nn .0; / where is nnd and let A and B be symmetric. Then
(a) z0 Az  2 .r/ i AA D A, in which case r D tr.A/ D
r.A/,
(b) z0 Az and x0 Bx are independent () AB D 0,
(c) z0  z D z0 cov.z/ z  2 .r/ for any choice of  and r./ D r.

5.37 Let z  Nn .; / where is pd and let A and B be symmetric. Then


(a) var.z0 Az/ D 2 tr.A/2  C 40 AA,
(b) cov.z0 Az; z0 Bz/ D 2 tr.AB/ C 40 AB.

2m; =m
5.38 Noncentral F -distribution: F D  F.m; n; /, where 2m; and 2n
are independent 2n =n

5.39 t -distribution: t2 .n/ D F.1; n/

5.40 Let y  Nn .X;  2 I/, where X D .1 W X0 /, r.X/ D k C 1. Then


(a) y0 y= 2  2 .n; 0 X0 X= 2 /,
(b) y0 .IJ/y= 2  2 n1; 0 X0 .IJ/X= 2  D 2 n1; 0x Txx x = 2 ,
(c) y0 .H  J/y= 2  2 k; 0x Txx x = 2 ,
(d) y0 .I  H/y= 2 D SSE= 2  2 .n  k  1/.

5.41 Suppose In D A1 C    C Am . Then the following statements are equivalent:


(a) n D r.A1 / C    C r.Am /,
(b) A2i D Ai for i D 1; : : : ; m,
(c) Ai Aj D 0 for all i j .

5.42 Cochrans theorem. Let z  Nn .; I/ and let z0 z D z0 A1 z C    C z0 Am z.


Then any of 5.41a5.41c is a necessary and sucient condition for z0 Ai z to
be independently distributed as 2 r.Ai /; .

5.43 Wishart-distribution. Let U0 D .u.1/ W : : : W u.n/ / be a random sample from


Np .0; /,
Pi.e., u.i/ s are independent and each u.i/  Np .0; /. Then W D
U0 U D niD1 u.i/ u0.i/ is said to have a Wishart-distribution with n degrees
of freedom and scale matrix , and we write W  Wp .n; /.

5.44 Hotellings T 2 distribution. Suppose v  Np .0; /, W  Wp .m; /, v and


W are independent, is pd. Hotellings T 2 distribution is the distribution of
5 Distributions 25

T 2 D m  v0 W1 v
 1  
W Wishart 1
D v0 v D .normal r.v./0 .normal r.v./
m df
and is denoted as T 2  T2 .p; m/.

5.45 Let U0 D .u.1/ W u.2/ W : : : W u.n/ / be a random sample from Np .; /. Then:
(a) The (transposed) rows of U, i.e., u.1/ , u.2/ , . . . , u.n/ , are independent
random vectors, each u.i/  Np .; /.
(b) The columns of U, i.e., u1 , u2 , . . . , up are n-dimensional random vectors:
ui  Nn .i 1n ; i2 In /, cov.ui ; uj / D ij In .
0 1 0 1
u1 1 1 n
B : C
(c) z D vec.U/ D @ ::: A, E.z/ D @ :: A D  1n .
up p 1n
0 2 1
1 In 12 In : : : 1p In
B : :: :: C
(d) cov.z/ D @ :: : : A D In .
p1 In p2 In : : : p2 In

(e) uN D n1 .u.1/ C u.2/ C    C u.n/ / D n1 U0 1n D .uN 1 ; uN 2 ; : : : ; uNp /0 .


X
n
(f) S D 1
n1
T D 1
n1
U0 .I  J/U D 1
n1
.u.i/  u/.u N 0
N .i/  u/
iD1
X
n
D 1
n1
u.i/ u0.i/  nuN uN 0 :
iD1
 
N D , E.S/ D , uN  Np ; n1 .
(g) E.u/
(h) uN and T D U0 .I  J/U are independent and T  W.n  1; /.
(i) Hotellings T 2 : T 2 D n.uN  0 /0 S1 .uN  0 / D n  MHLN2 .u;
N 0 ; S/,
np
.n1/p
T2  F.p; n  p;  /;  D n.  0 /0 1 .  0 /:

(j) Hypothesis  D 0 is rejected at risk level , if


n.uN  0 /0 S1 .uN  0 / > p.n1/
np
FIp;np :

(k) A 100.1  /% condence region for the mean of the Np .; / is the
ellipsoid determined by all  such that
n.uN  /0 S1 .uN  /
p.n1/
np
FIp;np :

a0 .uN  0 /2
(l) max D .uN  0 /0 S1 .uN  0 / D MHLN2 .u;
N 0 ; S/.
a0 a0 Sa
26 Formulas Useful for Linear Regression Analysis and Related Matrix Theory

5.46 Let $U_1'$ and $U_2'$ be independent random samples from $N_p(\mu_1,\Sigma)$ and $N_p(\mu_2,\Sigma)$, respectively. Denote $T_i = U_i'C_{n_i}U_i$ and
$S = \dfrac{1}{n_1+n_2-2}(T_1 + T_2), \qquad T^2 = \dfrac{n_1 n_2}{n_1+n_2}\,(\bar u_1 - \bar u_2)'S^{-1}(\bar u_1 - \bar u_2).$
If $\mu_1 = \mu_2$, then $\dfrac{n_1+n_2-p-1}{(n_1+n_2-2)p}\,T^2 \sim F(p,\, n_1+n_2-p-1)$.

5.47 If $n_1 = 1$, then Hotelling's $T^2$ becomes
$T^2 = \dfrac{n_2}{n_2+1}\,(u_{(1)} - \bar u_2)'S_2^{-1}(u_{(1)} - \bar u_2).$
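A companion sketch for 5.46 (our own illustration, with simulated data) computes the pooled covariance matrix and the two-sample $T^2$ together with its F transform.

```python
# Sketch of the two-sample Hotelling T^2 of 5.46, with simulated data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
p, n1, n2 = 2, 20, 25
U1 = rng.normal(size=(n1, p))
U2 = rng.normal(size=(n2, p)) + 0.5

T1 = (n1 - 1) * np.cov(U1, rowvar=False)      # T_i = U_i' C_{n_i} U_i
T2m = (n2 - 1) * np.cov(U2, rowvar=False)
S = (T1 + T2m) / (n1 + n2 - 2)                # pooled covariance matrix

d = U1.mean(axis=0) - U2.mean(axis=0)
T2 = n1 * n2 / (n1 + n2) * d @ np.linalg.solve(S, d)

F_stat = (n1 + n2 - p - 1) / ((n1 + n2 - 2) * p) * T2
p_value = stats.f.sf(F_stat, p, n1 + n2 - p - 1)
print(T2, F_stat, p_value)
```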

5.48 Let $U' = (u_{(1)} : \ldots : u_{(n)})$ be a random sample from $N_p(\mu, \Sigma)$. Then the likelihood function and its logarithm are
(a) $L = (2\pi)^{-pn/2} |\Sigma|^{-n/2} \exp\bigl[ -\tfrac12 \sum_{i=1}^n (u_{(i)} - \mu)' \Sigma^{-1} (u_{(i)} - \mu) \bigr]$,
(b) $\log L = -\tfrac12 pn \log(2\pi) - \tfrac12 n \log|\Sigma| - \tfrac12 \sum_{i=1}^n (u_{(i)} - \mu)' \Sigma^{-1} (u_{(i)} - \mu)$.

The function $L$ is considered as a function of $\mu$ and $\Sigma$ while $U'$ is kept fixed. The maximum likelihood estimators, MLEs, of $\mu$ and $\Sigma$ are the vector $\hat\mu$ and the positive definite matrix $\hat\Sigma$ that maximize $L$:
(c) $\hat\mu = n^{-1} U' 1 = (\bar u_1, \bar u_2, \ldots, \bar u_p)' = \bar u$, $\quad \hat\Sigma = n^{-1} U'(I - J)U = \frac{n-1}{n} S$.
(d) The maximum of $L$ is $\max_{\mu,\Sigma} L(\mu, \Sigma) = \dfrac{1}{(2\pi)^{np/2} |\hat\Sigma|^{n/2}}\, e^{-np/2}$.
(e) Denote $\max_{\Sigma} L(\mu_0, \Sigma) = \dfrac{1}{(2\pi)^{np/2} |\hat\Sigma_0|^{n/2}}\, e^{-np/2}$, where $\hat\Sigma_0 = \frac1n \sum_{i=1}^n (u_{(i)} - \mu_0)(u_{(i)} - \mu_0)'$. Then the likelihood ratio is
$\lambda = \dfrac{\max_{\Sigma} L(\mu_0, \Sigma)}{\max_{\mu,\Sigma} L(\mu, \Sigma)} = \left( \dfrac{|\hat\Sigma|}{|\hat\Sigma_0|} \right)^{n/2}.$
(f) Wilks's lambda: $\Lambda = \lambda^{2/n} = \dfrac{1}{1 + \frac{1}{n-1} T^2}$, where $T^2 = n(\bar u - \mu_0)' S^{-1} (\bar u - \mu_0)$.

5.49 Let $U' = (u_{(1)} : \ldots : u_{(n)})$ be a random sample from $N_{k+1}(\mu, \Sigma)$, where
$u_{(i)} = \begin{pmatrix} x_{(i)} \\ y_i \end{pmatrix}, \quad \operatorname{E}(u_{(i)}) = \begin{pmatrix} \mu_x \\ \mu_y \end{pmatrix}, \quad \operatorname{cov}(u_{(i)}) = \begin{pmatrix} \Sigma_{xx} & \sigma_{xy} \\ \sigma_{xy}' & \sigma_y^2 \end{pmatrix}, \quad i = 1, \ldots, n.$

Then the conditional mean and variance of $y$, given $x$, are
$\operatorname{E}(y \mid x) = \mu_y + \sigma_{xy}'\Sigma_{xx}^{-1}(x - \mu_x) = \beta_0 + \beta_x'x, \qquad \operatorname{var}(y \mid x) = \sigma_{y\cdot x}^2,$
where $\beta_0 = \mu_y - \sigma_{xy}'\Sigma_{xx}^{-1}\mu_x$ and $\beta_x = \Sigma_{xx}^{-1}\sigma_{xy}$. Then
$\mathrm{MLE}(\beta_0) = \bar y - s_{xy}'S_{xx}^{-1}\bar x = \hat\beta_0, \qquad \mathrm{MLE}(\beta_x) = S_{xx}^{-1}s_{xy} = \hat\beta_x,$
$\mathrm{MLE}(\sigma_{y\cdot x}^2) = \tfrac1n\,(t_{yy} - t_{xy}'T_{xx}^{-1}t_{xy}) = \tfrac1n\,\mathrm{SSE}.$
The squared population multiple correlation $\varrho_{y\cdot x}^2 = \sigma_{xy}'\Sigma_{xx}^{-1}\sigma_{xy}/\sigma_y^2$ equals 0 iff $\beta_x = \Sigma_{xx}^{-1}\sigma_{xy} = 0$. The hypothesis $\beta_x = 0$, i.e., $\varrho_{y\cdot x}^2 = 0$, can be tested by
$F = \dfrac{R_{y\cdot x}^2/k}{(1 - R_{y\cdot x}^2)/(n-k-1)}.$

6 Best linear predictor

6.1 Let $f(x)$ be a scalar-valued function of the random vector $x$. The mean squared error of $f(x)$ with respect to $y$ ($y$ being a random variable or a fixed constant) is
$\mathrm{MSE}[f(x);\, y] = \operatorname{E}[y - f(x)]^2.$
Correspondingly, for the random vectors $y$ and $f(x)$, the mean squared error matrix of $f(x)$ with respect to $y$ is
$\mathrm{MSEM}[f(x);\, y] = \operatorname{E}\{[y - f(x)][y - f(x)]'\}.$

6.2 We might be interested in predicting the random variable y on the basis of


some function of the random vector x; denote this function as f .x/. Then
f .x/ is called a predictor of y on the basis of x. Choosing f .x/ so that it
minimizes the mean squared error MSEf .x/I y D Ey  f .x/2 gives the
best predictor BP.yI x/. Then the BP.yI x/ has the property
min MSEf .x/I y D min Ey  f .x/2 D Ey  BP.yI x/2 :
f .x/ f .x/

It appears that the conditional expectation E.y j x/ is the best predictor of y:


E.y j x/ D BP.yI x/. Here we have to consider E.y j x/ as a random variable,
not a real number.

6.3 biasf .x/I y D Ey  f .x/

6.4 The mean squared error MSE.a0 x C bI y/ of the linear (inhomogeneous) pre-
dictor a0 x C b with respect to y can be expressed as

Ey  .a0 x C b/2 D var.y  a0 x/ C y  .a0 x C b/2


D variance C bias2 ;
and in the general case, the mean squared error matrix MSEM.Ax C bI y/ is
Ey  .Ax C b/y  .Ax C b/0
D cov.y  Ax/ C ky  .Ax C b/k2 :

6.5 BLP: Best linear predictor. Let $x$ and $y$ be random vectors such that
$\operatorname{E}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} \mu_x \\ \mu_y \end{pmatrix}, \qquad \operatorname{cov}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} \Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_{yy} \end{pmatrix}.$
Then a linear predictor $Gx + g$ is said to be the best linear predictor, BLP, for $y$, if the Löwner ordering
$\mathrm{MSEM}(Gx + g;\, y) \le_{\mathrm{L}} \mathrm{MSEM}(Fx + f;\, y)$
holds for every linear predictor $Fx + f$ of $y$.

6.6 Let $\operatorname{cov}(z) = \operatorname{cov}\begin{pmatrix} x \\ y \end{pmatrix} = \Sigma = \begin{pmatrix} \Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_{yy} \end{pmatrix}$, and denote
$B = \begin{pmatrix} I & 0 \\ -\Sigma_{yx}\Sigma_{xx}^- & I \end{pmatrix}, \qquad B\begin{pmatrix} x - \mu_x \\ y - \mu_y \end{pmatrix} = \begin{pmatrix} x - \mu_x \\ y - \mu_y - \Sigma_{yx}\Sigma_{xx}^-(x - \mu_x) \end{pmatrix}.$
Then, in view of the block diagonalization theorem of a nonnegative definite matrix, see 20.26 (p. 89), we have
(a) $\operatorname{cov}(Bz) = B\Sigma B' = \begin{pmatrix} I & 0 \\ -\Sigma_{yx}\Sigma_{xx}^- & I \end{pmatrix} \begin{pmatrix} \Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_{yy} \end{pmatrix} \begin{pmatrix} I & -(\Sigma_{xx}^-)'\Sigma_{xy} \\ 0 & I \end{pmatrix} = \begin{pmatrix} \Sigma_{xx} & 0 \\ 0 & \Sigma_{yy\cdot x} \end{pmatrix}$, where $\Sigma_{yy\cdot x} = \Sigma_{yy} - \Sigma_{yx}\Sigma_{xx}^-\Sigma_{xy}$,
(b) $\operatorname{cov}(x,\, y - \Sigma_{yx}\Sigma_{xx}^- x) = 0$, and

6.7 (a) $\operatorname{cov}(y - \Sigma_{yx}\Sigma_{xx}^- x) = \Sigma_{yy\cdot x} \le_{\mathrm{L}} \operatorname{cov}(y - Fx)$ for all $F$,
(b) $\mathrm{BLP}(y;\, x) = \mu_y + \Sigma_{yx}\Sigma_{xx}^-(x - \mu_x) = (\mu_y - \Sigma_{yx}\Sigma_{xx}^-\mu_x) + \Sigma_{yx}\Sigma_{xx}^- x$.

6.8 Let $e_{y\cdot x} = y - \mathrm{BLP}(y;\, x)$ be the prediction error. Then
(a) $e_{y\cdot x} = y - \mathrm{BLP}(y;\, x) = y - \mu_y - \Sigma_{yx}\Sigma_{xx}^-(x - \mu_x)$,
(b) $\operatorname{cov}(e_{y\cdot x}) = \Sigma_{yy} - \Sigma_{yx}\Sigma_{xx}^-\Sigma_{xy} = \Sigma_{yy\cdot x} = \mathrm{MSEM}[\mathrm{BLP}(y;\, x);\, y]$,
(c) $\operatorname{cov}(e_{y\cdot x},\, x) = 0$.
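As a quick numerical check of 6.7–6.8, the sketch below (our own illustration, with an arbitrarily chosen covariance matrix) forms the BLP coefficient matrix $\Sigma_{yx}\Sigma_{xx}^{-1}$, the prediction-error covariance $\Sigma_{yy\cdot x}$, and verifies that the prediction error is uncorrelated with $x$.

```python
# Sketch checking 6.7-6.8 for a made-up covariance matrix:
# the BLP coefficients, cov of the prediction error, and cov(e_{y.x}, x) = 0.
import numpy as np

Sxx = np.array([[2.0, 0.5],
                [0.5, 1.0]])
Sxy = np.array([[0.8],
                [0.3]])
Syy = np.array([[1.5]])

G = Sxy.T @ np.linalg.inv(Sxx)                    # coefficient matrix S_yx S_xx^{-1}
Syy_x = Syy - Sxy.T @ np.linalg.inv(Sxx) @ Sxy    # cov of e_{y.x}, see 6.8 (b)

# cov(e, x) = S_yx - G S_xx should vanish, see 6.8 (c)
print(G, Syy_x, Sxy.T - G @ Sxx)
```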

6.9 According to 4.24 (p. 18), for the data matrix U D .X W Y/ we have

Syy  Syx S1 0


xx Sxy
L covd .Y  XB/ D covs .y  B x/ for all B;
sy2  s0xy S1
xx sxy
0

vard .y  Xb/ D vars .y  b x/ for all b;
where the minimum is attained when B D S1 1
xx Sxy D Txx Txy .

6.10 x 2 C . xx W x / and x  x 2 C . xx / with probability 1.

6.11 The following statements are equivalent:


(a) cov.x; y  Ax/ D 0.
(b) A is a solution to A xx D yx .
(c) A is of the form A D yx  
xx C Z.Ip  xx xx /; Z is free to vary.

(d) A.x  x / D yx 
xx .x  x / with probability 1.

6.12 If b D 1 
xx  xy (no worries to use xx ), then

(a) min var.y  b0 x/ D var.y  b0 x/


b
D y2   0xy 1 2 2
xx  xy D y12:::p D yx ;

(b) cov.x; y  b0 x/ D 0,


(c) max cor 2 .y; b0 x/ D cor 2 .y; b0 x/ D cor 2 .y;  0xy 1
xx x/
b
 0xy 1
xx  xy 2
D D %yx , squared population
y2 multiple correlation,
2
(d) yx D y2   0xy 1 2 0 1
xx  xy D y .1   xy xx  xy =y / D y .1  %yx /.
2 2 2

6.13 (a) The tasks of solving b from minb var.y  b0 x/ and maxb cor 2 .y; b0 x/
yield essentially the same solutions b D 1
xx  xy .

(b) The tasks of solving b from minb ky  Abk2 and maxb cos2 .y; Ab/ yield
essentially the same solutions Ab D PA y.
(c) The tasks of solving b from minb vard .y  X0 b/ and maxb cord2 .y; X0 b/
yield essentially the same solutions b D S1 O
xx sxy D x .

6.14 Consider a random vector z with E.z/ D , cov.z/ D , and let the random
vector BLP.zI A0 z/ denote the BLP of z based on A0 z. Then
(a) BLP.zI A0 z/ D  C cov.z; A0 z/cov.A0 z/ A0 z  E.A0 z/
D  C A.A0 A/ A0 .z  /
D  C P0AI .z  /;
(b) the covariance matrix of the prediction error ezA0 z D z  BLP.zI A0 z/ is

covz  BLP.zI A0 z/ D  A.A0 A/ A0


D .I  PAI / D 1=2 .I  P1=2 A / 1=2 :
 
6.15 Cooks trick. The best linear predictor of z D xy on the basis of x is
   
x x
BLP.zI x/ D D :
y C yx  xx .x  x / BLP.yI x/

6.16 Let cov.z/ D and denote z.i/ D .1 ; : : : ; i /0 and consider the following
residuals (prediction errors)
e10 D 1 ; ei1:::i1 D i  BLP.i I z.i1/ /; i D 2; : : : ; p:
Let e be a p-dimensional random vector of these residuals: e D .e10 ; e21 ; : : : ;
ep1:::p1 /0 . Then cov.e/ WD D is a diagonal matrix and
det./ D detcov.e/ D 12 21
2 2
   p12:::p1
D .1  %212 /.1  %2312 /    .1  %p1:::p1
2
/12 22    p2

12 22    p2 :

6.17 (Continued . . . ) The vector e can be written as e D Fz, where F is a lower


triangular matrix (with ones in diagonal) and cov.e/ D D D FF0 . Thereby
D F1 D1=2 D1=2 .F0 /1 WD LL0 , where L D F1 D1=2 is an lower trian-
gular matrix. This gives a statistical proof of the triangular factorization of a
nonnegative denite matrix.

6.18 Recursive decomposition of 1  %y12:::p


2
:
2 2 2 2 2
1  %y12:::p D .1  %y1 /.1  %y21 /.1  %y312 /    .1  %yp12:::p1 /:

6.19 Mustonens measure of multivariate dispersion. Let cov.z/ D pp . Then


X
p
2
Mvar./ D max i12:::i1 ;
iD1

where the maximum is sought over all permutations of 1 ; : : : ; p .

6.20 Consider the data matrix X0 D .x1 W : : : W xk / and denote e1 D .I  P1 /x1 ,


ei D .I  P.1Wx1 W:::Wxi 1 / /xi , E D .e1 W e2 W : : : W ek /. Then E0 E is a diagonal
matrix where ei i D SSE.xi I 1; x1 ; : : : ; xi1 /, and
 1 0  2 2 2
det n1 E E D .1  r12 /.1  R312 /    .1  Rk1:::k1 /s12 s22    sk2
D det.Sxx /:

6.21 AR(1)-structure. Let yi D %yi1 C ui , i D 1; : : : ; n, j%j < 1, where ui s


(i D : : : , 2, 1, 0, 1, 2; : : : ) are independent random variables, each having
E.ui / D 0 and var.ui / D u2 . Then
0 n1 1
1 % %2 : : : %
B u2
% 1 % : : : %n2 C
(a) cov.y/ D D  2 V D B : : :: :: C
1  %2 @ :: :: : : A
n1 n2 n3
% % % ::: 1
u2 jij j

D % :
1  %2
   
y V11 v12
(b) cor.y/ D V D cor .n1/ D D f%jij j g,
yn v012 1
(c) V1 1 0
11 v12 D V11  %V11 in1 D %in1 D .0; : : : ; 0; %/ WD b 2 R
n1
,
(d) BLP.yn I y.n1/ / D b0 y.n1/ / D v012 V1
11 y.n1/ D %yn1 ,

(e) en D yn  BLP.yn I y.n1/ / D yn  %yn1 D nth prediction error,


(f) cov.y.n1/ ; yn  b0 y.n1/ D cov.y.n1/ ; yn  %yn1 / D 0 2 Rn1 ,
(g) For each yi , i D 2; : : : ; n, we have
BLP.yi I y.i1/ / D %yi1 ; eyi y.i 1/ D yi  %yi1 WD ei :
Dene e1 D y1 , ei D yi  %yi1 , i D 2; : : : ; n, i.e.,
0 1 0 1
y1 1 0 0 ::: 0 0 0
B y2  %y1 C B% 1 0 : : : 0 0 0C
B C B : : : :: :: :: C
eDB C B : : :
B y3 : %y2 C D B : : : : : :CC y WD Ly;
@ :: A @ 0 0 0 : : : % 1 0A
yn  %yn1 0 0 0 : : : 0 % 1
where L 2 Rnn .
 
1 00
(h) cov.e/ D  2 WD  2 D D cov.Ly/ D  2 LVL0 ,
0 .1  %2 /In1
(i) LVL0 D D; V D L1 D.L0 /1 ; V1 D L0 D1 L,
  p 
1  %2 0 0 1  %2 0 0
(j) D1 D 1%1
2 ; D 1=2
D p 1
;
0 In1 1%2 0 In1
0p 1
1  %2 0 0 : : : 0 0 0
B % 1 0 ::: 0 0 0C
1 B :: :: :: :: :: :: C
(k) D1=2 L D p B : :C
B : : : : C
1  %2 @
0 0 0 : : : % 1 0 A
0 0 0 ::: 0 % 1
1
WD p K;
1  %2
1
(l) D1=2 LVL0 D1=2 D KVK0 D In ,
1  %2

1
(m) V1 D L0 D1 L D L0 D1=2 D1=2 L D K0 K
1  %2
0 1
1 % 0 ::: 0 0 0
B% 1 C %2 % : : : 0 0 0C
1 B B ::: :: :: :: :: :: C
D B : : : : : C
C;
1% @
2
0 0 0 : : : % 1 C %2 %A
0 0 0 ::: 0 % 1
(n) det.V/ D .1  %2 /n1 .
(o) Consider the model M D fy; X;  2 Vg, where  2 D u2 =.1  %2 /.
Premultiplying this model by K yields the model M D fy ; X ; u2 Ig,
where y D Ky and X D KX.

6.22 Durbin–Watson test statistic for testing $\varrho = 0$:
$\mathrm{DW} = \sum_{i=2}^n (\hat\varepsilon_i - \hat\varepsilon_{i-1})^2 \Big/ \sum_{i=1}^n \hat\varepsilon_i^2 = \dfrac{\hat\varepsilon' G'G \hat\varepsilon}{\hat\varepsilon'\hat\varepsilon} \approx 2(1 - \hat\varrho),$
where $\hat\varrho = \sum_{i=2}^n \hat\varepsilon_{i-1}\hat\varepsilon_i \big/ \sum_{i=2}^n \hat\varepsilon_i^2$, and $G \in \mathbb{R}^{(n-1)\times n}$ and $G'G \in \mathbb{R}^{n\times n}$ are
$G = \begin{pmatrix} -1 & 1 & 0 & \ldots & 0 & 0 & 0 \\ 0 & -1 & 1 & \ldots & 0 & 0 & 0 \\ \vdots & & & \ddots & & & \vdots \\ 0 & 0 & 0 & \ldots & -1 & 1 & 0 \\ 0 & 0 & 0 & \ldots & 0 & -1 & 1 \end{pmatrix}, \qquad
G'G = \begin{pmatrix} 1 & -1 & 0 & \ldots & 0 & 0 \\ -1 & 2 & -1 & \ldots & 0 & 0 \\ \vdots & & & \ddots & & \vdots \\ 0 & 0 & 0 & \ldots & 2 & -1 \\ 0 & 0 & 0 & \ldots & -1 & 1 \end{pmatrix}.$
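A minimal sketch of 6.22 (our own, with simulated data): compute OLS residuals, the Durbin–Watson statistic, and compare it with $2(1-\hat\varrho)$.

```python
# Sketch of the Durbin-Watson statistic of 6.22:
# DW = sum (e_i - e_{i-1})^2 / sum e_i^2, computed from OLS residuals.
import numpy as np

rng = np.random.default_rng(2)
n = 100
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta_hat                              # OLS residuals

DW = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)
rho_hat = np.sum(e[1:] * e[:-1]) / np.sum(e[1:] ** 2)
print(DW, 2 * (1 - rho_hat))                      # DW is approximately 2(1 - rho_hat)
```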

6.23 Let the sample covariance matrix of x1 ; : : : ; xk ; y be


 

S s
S D r jij j D 0xx xy 2 R.kC1/.kC1/ :
sxy sy2

Then sxy D rSxx ik and O x D S1 0


xx sxy D rik D .0; : : : ; 0; r/ 2 R .
k

0 1
1 0 :::0 0
B1 1 :::0 0C
B C
6.24 If L D B
B:
1 1 :::1 0C 2 Rnn , then
@ :: :::: :: :: C
: :: :A
1 1 1 ::: 1
0 11 0 1
1 1 1
::: 1 1 2 1 0 : : : 0 0
B1 2 2
::: 2 2 C B1 2 1 : : : 0 0 C
B C B C
B1 2 3
::: 3 3 C B 0 1 2 : : : 0 0 C
.LL0 /1 DB
B ::: :: ::
:: :: C B
:: C D B :: :: :: : : :: :: C
C:
B : : : : : C B : : : : : : C
@1 2 3 : : : n  1 n  1A @ 0 0 0 : : : 2 1 A
1 2 3 ::: n  1 n 0 0 0 : : : 1 1

7 Testing hypotheses

7.1 Consider the model $\mathcal{M} = \{y, X\beta, \sigma^2 I\}$, where $y \sim N_n(X\beta, \sigma^2 I)$ and $X$ is $n \times p$ ($p = k + 1$). Most of the time we assume that $X = (1 : X_0)$ and that it has full column rank. Let the hypothesis to be tested be
$H\colon K'\beta = 0$, where $K' \in \mathbb{R}^{q \times p}$ and $K_{p\times q}$ has full column rank.
Let $\mathcal{M}_H = \{y, X\beta \mid K'\beta = 0, \sigma^2 I\}$ denote the model under $H$ and let $\mathcal{C}(X_*)$ be that subspace of $\mathcal{C}(X)$ where the hypothesis $H$ holds:
$\mathcal{C}(X_*) = \{\, z : \text{there exists } b \text{ such that } z = Xb \text{ and } K'b = 0 \,\}.$
Then $\mathcal{M}_H = \{y, X_*\beta_*, \sigma^2 I\}$ and the hypothesis is $H\colon \operatorname{E}(y) \in \mathcal{C}(X_*)$. In general we assume that $K'\beta$ is estimable, i.e., $\mathcal{C}(K) \subseteq \mathcal{C}(X')$, and then the hypothesis is said to be testable. If $r(X) = p$ then every $K'\beta$ is estimable.

7.2 The $F$-statistic for testing $H$: $F = \dfrac{Q/q}{\hat\sigma^2} = \dfrac{Q/q}{\mathrm{SSE}/(n-p)} \sim F(q,\, n-p,\, \delta)$

7.3 $\mathrm{SSE}/\sigma^2 = y'My/\sigma^2 \sim \chi^2(n-p)$

7.4 $Q = \mathrm{SSE}_H - \mathrm{SSE} = \Delta\mathrm{SSE}$  [change in SSE due to the hypothesis]
$= \mathrm{SSE}(\mathcal{M}_H) - \mathrm{SSE}(\mathcal{M})$  [$\mathrm{SSE}_H = \mathrm{SSE}(\mathcal{M}_H)$ = SSE under $H$]
$= y'(I - P_{X_*})y - y'(I - H)y = y'(H - P_{X_*})y$, and hence

7.5 $F = \dfrac{[\mathrm{SSE}(\mathcal{M}_H) - \mathrm{SSE}(\mathcal{M})]/q}{\mathrm{SSE}(\mathcal{M})/(n-p)} = \dfrac{(R^2 - R_H^2)/q}{(1-R^2)/(n-p)}$  [$R_H^2 = R^2$ under $H$]
$= \dfrac{[\mathrm{SSE}(\mathcal{M}_H) - \mathrm{SSE}(\mathcal{M})]/[r(X) - r(X_*)]}{\mathrm{SSE}(\mathcal{M})/[n - r(X)]} = \dfrac{(R^2 - R_H^2)/[r(X) - r(X_*)]}{(1-R^2)/[n - r(X)]}$
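The sketch below (an illustration of ours, with simulated data) computes the F-statistic of 7.2/7.5 for the hypothesis that one regression coefficient is zero, once from the change in SSE and once from the change in $R^2$; the two forms agree.

```python
# Sketch of the F-statistic of 7.2/7.5 for H: beta_2 = 0, computed both from the
# change in SSE and from the change in R^2; data simulated.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 60
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 0.8 * x1 + 0.0 * x2 + rng.normal(size=n)

def sse_and_r2(X, y):
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    res = y - X @ b
    sse = res @ res
    sst = np.sum((y - y.mean()) ** 2)
    return sse, 1 - sse / sst

X_full = np.column_stack([np.ones(n), x1, x2])    # p = 3 columns
X_H = np.column_stack([np.ones(n), x1])           # model under H
p, q = X_full.shape[1], 1                         # q = number of restrictions

SSE, R2 = sse_and_r2(X_full, y)
SSE_H, R2_H = sse_and_r2(X_H, y)

F_sse = ((SSE_H - SSE) / q) / (SSE / (n - p))
F_r2 = ((R2 - R2_H) / q) / ((1 - R2) / (n - p))
print(F_sse, F_r2, stats.f.sf(F_sse, q, n - p))
```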

7.6 Q= 2  2 .q; /; q D r.X/  r.X / central 2 if H true

7.7 D 0 X0 .H  PX /X= 2 D 0 X0 .I  PX /X= 2

7.8 C .X / D C .XK? /; r.X / D r.X/  dim C .X0 / \ C .K/ D r.X/  r.K/

7.9 Consider hypothesis H : kqC1 D    D k D 0, i.e., 2 D 0, when


 
1
D ; 2 2 Rq ; X D .X1 W X2 /; X1 D .1 W x1 W : : : W xkq /:
2
Then K0 D .0 W Iq /, X D X1 D XK? , and 7.107.16 hold.

7.10 Q D y0 .H  PX1 /y
D y 0 P M1 X2 y M1 D I  PX1  D 02 X02 M1 X2 2 = 2
D y0 M1 X2 .X02 M1 X2 /1 X02 M1 y
D O 02 X02 M1 X2 O 2 D O 02 T221 O 2 T221 D X02 M1 X2
D O 02 cov.O 2 /1 O 2  2 cov.O 2 / D  2 .X02 M1 X2 /1

.O 2  2 /0 T221 .O 2  2 /=q
7.11
FIq;nk1 condence ellipsoid for 2
O 2
7.12 The left-hand side of 7.11 can be written as
.O 2  2 /0 cov.O 2 /1 .O 2  2 / 2 =q
O 2
b
D .O 2  2 /0 cov.O 2 /1 .O 2  2 /=q

and hence the condence region for 2 is


b
.O 2  2 /0 cov.O 2 /1 .O 2  2 /=q
FIq;nk1 :

7.13 Consider the last two regressors: X D .X1 W xk1 ; xk /. Then 3.29 (p. 14)
implies that
 
O 1 rk1;kX1
cor. 2 / D ; rk1;kX1 D rk1;k12:::k2 ;
rk1;kX1 1
and hence the orientation of the ellipse (the directions of the major and minor
axes) is determined by the partial correlation between xk1 and xk .

b
The volume of the ellipsoid .O 2  2 /0 cov.O 2 /1 .O 2  2 / D c 2 is pro-
b
7.14
portional to c q detcov.O 2 /.

.O x  x /0 Txx .O x  x /=k
7.15
FIk;nk1 condence region for x
O 2

7.16 Q D kres.yI X1 /  res.yI X/k2 D SSE.yI X1 /  SSE.yI X1 ; X2 /


D SSE D change in SSE when adding X2 to the model
D SSR  SSRH D y0 .H  J/y  y0 .PX  J/y D SSR

7.17 If the hypothesis is H : K0 D d, where K0 2 Rqqp , then


(a) Q D .K0 O  d/0 K0 .X0 X/1 K1 .K0 O  d/
D .K0 O  d/0 cov.K0 /
O 1 .K0 O  d/ 2 WD u0 cov.u/1 u 2 ;

(b) Q= 2 D 2 .q; /; D .K0  d/0 K0 .X0 X/1 K1 .K0  d/= 2

(c) F D
Q=q
O 2
b
D .K0 O  d/0 cov.K0 /
O 1 .K0 O  d/=q  F.q; p  q; /,

(d) K0 O  d  Nq K0  d;  2 K0 .X0 X/1 K,


(e) SSEH D minK0 Dd ky  Xk2 D ky  XO r k2 ,
(f) O r D O  .X0 X/1 KK0 .X0 X/1 K1 .K0 O  d/. restricted OLSE

7.18 The restricted OLSE O r is the solution to equation


 0    0 
XX K O r Xy
0 D ; where  is the Lagrangian multiplier.
K 0  d
The equation above has a unique solution i r.X0 W K/ D p C q, and hence
O r may be unique even though O is not unique.

7.19 If the restriction is K0 D 0 and L is a matrix (of full column rank) such that
L 2 fK? g, then X D XL, and the following holds:
(a) XO r D X .X0 X /1 X0 y D XL.L0 X0 XL/1 L0 X0 y,
 
(b) O r D Ip  .X0 X/1 KK0 .X0 X/1 KK0 O
D .Ip  P0KI.X0 X/1 /O D PLIX0 X O
D L.L0 X0 XL/1 L0 X0 XO D L.L0 X0 XL/1 L0 X0 y:

(c) cov.O r / D  2 L.L0 X0 XL/1 L0



D  2 .X0 X/1  .X0 X/1 KK0 .X0 X/1 K1 K0 .X0 X/1 :

7.20 If we want to nd K so that there is only one solution to X0 X D X0 y which


satises the constraint K0 D 0, we need to consider the equation
 0   0     
XX Xy X PX y
D ; or equivalently, D :
K0 0 K0 0
0
The above equation  a unique solution for every given PX y i C .X / \
 Xhas
C .K/ D f0g and K0 has a full column rank in which case the solution for

is .X0 X C KK0 /1 X0 y. Requirement C .X0 / \ C .K/ D f0g means that K0


is not estimable.

7.21 $H\colon \beta_1 = \cdots = \beta_k = 0$, i.e., $H\colon \operatorname{E}(y) \in \mathcal{C}(1)$:
$F = F_{\mathrm{overall}} = \dfrac{\mathrm{MSR}}{\mathrm{MSE}} = \dfrac{R^2/k}{(1-R^2)/(n-k-1)} \sim F(k,\, n-k-1,\, \delta)$

7.22 $H\colon \beta_i = 0$: $F = F(\hat\beta_i) = \dfrac{\hat\beta_i^2}{\operatorname{se}^2(\hat\beta_i)} = \dfrac{\Delta\mathrm{SSE}}{\mathrm{MSE}} = \dfrac{\Delta R^2}{(1-R^2)/(n-k-1)} = \dfrac{r_{yi\cdot\mathrm{rest}}^2}{(1 - r_{yi\cdot\mathrm{rest}}^2)/(n-k-1)} \sim F(1,\, n-k-1,\, \delta),$
where $\Delta R^2 = R^2 - R_{(i)}^2$ = change in $R^2$ when $x_i$ is deleted, $R_{(i)}^2 = R^2(y; X_{(i)})$, $X_{(i)}$ = {all other regressors except the $i$-th} = {rest}.

7.23 $H\colon \beta_i = 0$: $t = t(\hat\beta_i) = \dfrac{\hat\beta_i}{\operatorname{se}(\hat\beta_i)} = \dfrac{\hat\beta_i}{\sqrt{\widehat{\operatorname{var}}(\hat\beta_i)}} = \sqrt{F(\hat\beta_i)} \sim t(n-k-1)$

7.24 $H\colon k'\beta = 0$: $F = \dfrac{(k'\hat\beta)^2}{\hat\sigma^2\, k'(X'X)^{-1}k} = \dfrac{(k'\hat\beta)^2}{\widehat{\operatorname{var}}(k'\hat\beta)} \sim F(1,\, n-k-1,\, \delta)$

7.25 $H\colon K'\beta = d$: $F = \dfrac{\Delta R^2/q}{(1-R^2)/(n-p)} \sim F(q,\, n-p,\, \delta)$, where $\Delta R^2 = R^2 - R_H^2$ = change in $R^2$ due to the hypothesis.
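To illustrate 7.22–7.23, the following sketch (simulated data, our own illustration) computes the coefficient-wise $t$-statistics, the corresponding $F = t^2$ values, and two-sided p-values.

```python
# Sketch relating 7.22-7.23: for each coefficient, t_i = beta_hat_i / se(beta_hat_i)
# and F_i = t_i^2; data simulated.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, k = 50, 2
X0 = rng.normal(size=(n, k))
y = 1.0 + X0 @ np.array([0.5, -0.3]) + rng.normal(size=n)

X = np.column_stack([np.ones(n), X0])             # p = k + 1 columns
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
res = y - X @ beta_hat
sigma2_hat = res @ res / (n - k - 1)              # MSE

se = np.sqrt(sigma2_hat * np.diag(XtX_inv))       # standard errors
t = beta_hat / se
F = t ** 2                                        # F(1, n-k-1) statistics
p_values = 2 * stats.t.sf(np.abs(t), n - k - 1)
print(t, F, p_values)
```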

7.26 In the no-intercept model R2 has to be replaced with Rc2 .

7.27 Consider the model M D fy; X;  2 Vg, where X 2 Rpnp , V is pd, y 


Nn .X;  2 V/, and F .V / is the F -statistic for testing linear hypothesis H :
K0 D d (K0 2 Rqqp /. Then
Q.V /=q Q.V /=q
(a) F .V / D D  F.q; n  p; /,
Q 2 SSE.V /=.n  p/
(b) Q 2 D SSE.V /=f; f D n  p, unbiased estimator of  2
under fy; X;  2 Vg
(c) SSE.V / D min ky  Xk2V1 weighted SSE
Q 0 V1 .y  X/,
D min .y  X/0 V1 .y  X/ D .y  X/ Q

(d) SSE.V /= 2  2 .n  p/,


(e) Q D .X0 V1 X/1 X0 V1 y D BLUE./ under fy; X;  2 Vg,

Q D  2 K0 .X0 V1 X/1 K D K0 cov./K,


(f) cov.K0 / Q

e Q D Q 2 K0 .X0 V1 X/1 K,


(g) cov.K0 /
(h) Q.V / D SSEH .V /  SSE.V / D SSE.V / change in SSE.V /
due to H ,
(i) Q.V / D .K0 Q  d/0 K0 .X0 V1 X/1 K1 .K0 Q  d/
D .K0 Q  d/0 cov.K0 /
Q 1 .K0 Q  d/ 2 WD u0 cov.u/1 u 2 ;

(j) Q.V /= 2 D .K0 Q  d/0 cov.K0 /


Q 1 .K0 Q  d/  2 .q; /,
Q 1 .K0  d/= 2 ,
(k) D .K0  d/0 cov.K0 /
.K0 Q  d/0 cov.K0 /
Q 1 .K0 Q  d/ 2 =q
(l) F .V / D
Q 2
0 Q 0
e
D .K  d/ cov.K / .K0 Q  d/=q  F.q; f; /;
0 Q 1

(m) K0 Q  d  Nq K0  d;  2 K0 .X0 V1 X/1 K,


(n) SSEH .V / D minK0 Dd ky  Xk2V1 D ky  XQ r k2V1 , where

(o) Q r D Q  .X0 V1 X/1 KK0 .X0 V1 X/1 K1 .K0 Q  d/ restricted
BLUE.
7.28 Consider general situation, when V is possibly singular. Denote
SSE.W / D .y  X/ Q
Q 0 W .y  X/; where W D V C XUX0

with U satisfying C .W/ D C .X W V/ and XQ D BLUE.X/. Then


Q 2 D SSE.W /=f; where f D r.X W V/  r.X/ D r.VM/;
is an unbiased estimator for  2 . Let K0 be estimable, i.e., C .K/  C .X0 /,
Q
denote u D K0 d, cov.K0 /Q D  2 D, m D r.D/, and assume that DD u D
0 
u. Then .u D u=m/=  F.m; f; /, i.e.,
2

e
.K0 Q  d/0 cov.K0 /
Q  .K0 Q  d/=m  F.m; f; /:

7.29 If r.X/ < p but K0 is estimable, then the hypothesis K0 D d can be


tested as presented earlier by replacing appropriated inverses with generalized
inverses and replacing p (D k C 1) with r D r.X/.

7.30 One-way ANOVA for $g$ groups; the hypothesis is $H\colon \mu_1 = \cdots = \mu_g$.
group 1: $y_{11}, y_{12}, \ldots, y_{1n_1}$;  $\mathrm{SS}_1 = \sum_{j=1}^{n_1}(y_{1j} - \bar y_1)^2$
$\vdots$
group $g$: $y_{g1}, y_{g2}, \ldots, y_{gn_g}$;  $\mathrm{SS}_g = \sum_{j=1}^{n_g}(y_{gj} - \bar y_g)^2$
(a) $\mathrm{SSE} = \mathrm{SSE}_{\mathrm{Within}} = \mathrm{SS}_1 + \cdots + \mathrm{SS}_g$, $\quad \mathrm{SSB} = \mathrm{SS}_{\mathrm{Between}} = \sum_{i=1}^g n_i(\bar y_i - \bar y)^2$,
iD1

(b) $F = \dfrac{\mathrm{SSB}/(g-1)}{\mathrm{SSE}/(n-g)} \sim F(g-1,\, n-g)$ if $H$ is true.

7.31 $t$-test for the equality of the expectations of two groups: $H\colon \mu_1 = \mu_2$.
$F = \dfrac{(\bar y_1 - \bar y_2)^2}{\dfrac{\mathrm{SS}_1 + \mathrm{SS}_2}{n-2}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)} = \dfrac{n_1(\bar y_1 - \bar y)^2 + n_2(\bar y_2 - \bar y)^2}{\dfrac{\mathrm{SS}_1 + \mathrm{SS}_2}{n-2}}$
$= \dfrac{n_1 n_2}{n}\,(\bar y_1 - \bar y_2)\left(\dfrac{\mathrm{SS}_1 + \mathrm{SS}_2}{n-2}\right)^{-1}(\bar y_1 - \bar y_2) = \dfrac{(\bar y_1 - \bar y_2)^2}{\dfrac{\mathrm{SSE}}{n-2}\cdot\dfrac{n}{n_1 n_2}} \sim F(1,\, n-2) = t^2(n-2), \qquad n_1 + n_2 = n.$
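A small numerical illustration of 7.30–7.32 (our own, with simulated groups): it checks the decomposition SST = SSB + SSE and evaluates the ANOVA F-statistic.

```python
# Sketch of the one-way ANOVA decomposition of 7.30-7.32 for simulated groups:
# SST = SSB + SSE and F = [SSB/(g-1)] / [SSE/(n-g)].
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
groups = [rng.normal(loc=m, size=n_i) for m, n_i in [(0.0, 12), (0.5, 15), (0.2, 10)]]

y_all = np.concatenate(groups)
n, g = y_all.size, len(groups)
grand_mean = y_all.mean()

SSB = sum(len(gr) * (gr.mean() - grand_mean) ** 2 for gr in groups)
SSE = sum(((gr - gr.mean()) ** 2).sum() for gr in groups)
SST = ((y_all - grand_mean) ** 2).sum()

F = (SSB / (g - 1)) / (SSE / (n - g))
print(np.isclose(SST, SSB + SSE), F, stats.f.sf(F, g - 1, n - g))
```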

7.32 One-way ANOVA in matrix terms.


y D X C "; yi D .yi1 ; yi2 ; : : : ; yi ni /0 ; i D 1; : : : ; g;
0 1 0 10 1 0 1
y1 1n1 ::: 0 1 "1
@ :: A D B :
@ ::
:: : C :
: :: A @ :: A C @ :: A ;
:
:
yg 0 : : : 1ng g "g
0 1 0 1
Jn1 : : : 0 In1  Jn1 : : : 0
B : :: C B :: :: C
H D @ :: : : : : A; M D @ :
::
: : A;
0 : : : Jng 0 : : : Ing  Jng
0 1 0 1 0 1
yN1 1n1 yN 1 Cn1 y1
B : C B:C B C
Hy D @ :: A D @ :: A ; My D @ ::: A ;
yNg 1ng yN g Cng yg
0 1
.yN1  y/1
N n1
B : C
.H  Jn /y D @ :
: A;
.yNg  y/1
N ng
X
g X
ni
0
SST D y Cn y D N 2;
.yij  y/
iD1 j D1

SSE D y0 My D SS1 C    C SSg ;


X
g
0
SSR D y .H  Jn /y D N 2 D SSB;
ni .yNi  y/
iD1
SST D SSR C SSE; because 1n 2 C .X/;
SSEH D SSE.1 D    D g / D SST H) SSEH  SSE D SSR:

7.33 Consider one-way ANOVA situation with g groups. Let x be a variable indi-
cating the group where the observation belongs to so that x has values 1; : : : ,
      g  
1
g. Suppose that the n data points y111 ; : : : ; y1n 1
; : : : ; ygg1 ; : : : ; ygng
comprise
 a theoretical two-dimensional discrete distribution for the random
vector yx where each pair appears with the same probability 1=n. Let
E.y j x/ WD m.x/ be a random variable which takes the value E.y j x D i /
when x takes the value i , and var.y j x/ WD v.x/ is a random variable taking
the value var.y j x D i / when x D i . Then the decomposition
var.y/ D varE.y j x/ C Evar.y j x/ D varm.x/ C Ev.x/
(multiplied by n) can be expressed as
X
g X
ni
X
g X
g X
ni
2 2
.yij  y/
N D ni .yNi  y/
N C .yij  yNi /2 ;
iD1 j D1 iD1 iD1 j D1

SST D SSR C SSE; y0 .I  J/y D y0 .H  J/y C y0 .I  H/y;


where J and H are as in 7.32.

7.34 One-way ANOVA using another parametrization.


y D X C "; yi D .yi1 ; yi2 ; : : : ; yi ni /0 ; i D 1; : : : ; g;
0 1 0 101 0 1
y1 1n1 1n1 : : : 0 "1
@ :: A D B :: :: : : :: C B
B 1 C
C @ :: A ;
: @ : : : : A @:A: C :
yg : "g
1ng 0 : : : 1ng g
0 1
n n1 : : : ng
B n 1 n1 : : : 0 C
X0 X D B C
@ ::: ::: : : : ::: A ;
n g 0 : : : ng
0 1 0 1
0 0 ::: 0 0
B0 1=n1 : : : 0 C B N
y C
G WD B : : : : C 2 f.X0 X/ g; O D GX0 y D B :1 C :
@:: :
: : : : A
: @ :: A
0 0 : : : 1=ng yNg

8 Regression diagnostics

8.1 Some particular residuals under $\{y, X\beta, \sigma^2 I\}$:

(a) $\hat\varepsilon_i = y_i - \hat y_i$  [ordinary least squares residual]
(b) $\mathrm{RESS}_i = (y_i - \hat y_i)/\hat\sigma$  [scaled residual]
(c) $\mathrm{STUDI}_i = r_i = \dfrac{y_i - \hat y_i}{\hat\sigma\sqrt{1 - h_{ii}}}$  [internally Studentized residual]

(d) $\mathrm{STUDE}_i = t_i = \dfrac{y_i - \hat y_i}{\hat\sigma_{(i)}\sqrt{1 - h_{ii}}}$  [externally Studentized residual]
$\hat\sigma_{(i)}$ = the estimate of $\sigma$ in the model where the $i$-th case is excluded; $\mathrm{STUDE}_i \sim t(n-k-2,\, \delta)$.
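The sketch below (our own illustration, simulated data) computes the leverages and the internally Studentized residuals of 8.1 (c); for the externally Studentized residuals of 8.1 (d) it uses the standard identity $t_i = r_i\sqrt{(n-p-1)/(n-p-r_i^2)}$, which is assumed here rather than quoted from the text.

```python
# Sketch of the residual scalings in 8.1: internally (r_i) and externally (t_i)
# Studentized residuals from the hat-matrix diagonal; data simulated.
import numpy as np

rng = np.random.default_rng(6)
n, k = 40, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 0.5, -0.2]) + rng.normal(size=n)

H = X @ np.linalg.inv(X.T @ X) @ X.T
h = np.diag(H)                                    # leverages h_ii
e = y - H @ y                                     # OLS residuals
p = k + 1
sigma_hat = np.sqrt(e @ e / (n - p))

r = e / (sigma_hat * np.sqrt(1 - h))              # internally Studentized (STUDI)
# externally Studentized (STUDE), via the standard leave-one-out identity:
t = r * np.sqrt((n - p - 1) / (n - p - r ** 2))
print(r[:5], t[:5])
```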

8.2 Denote
     
X.i/ y
XD 0 ; y D .i/ ; D ;
x.i/ yi
Z D .X W ii /; ii D .0; : : : ; 0; 1/0 ;
and let M.i/ D fy.i/ ; X.i/ ;  2 In1 g be the model where the i th (the last, for
notational convenience) observation is omitted, and let MZ D fy; Z;  2 Ig
be the extended (mean-shift) model. Denote O .i/ D .M O O O
.i/ /, Z D .MZ /,
O O
and D .MZ /. Then
(a) O Z D X0 .I  ii i0i /X1 X0 .I  ii i0i /y D .X0.i/ X.i/ /1 X0.i/ y.i/ D O .i/ ,
i0 My "Oi
(b) O D 0i D ; we assume that mi i > 0, i.e., is estimable,
ii Mii mi i
(c) SSEZ D SSE.MZ /
D y0 .I  PZ /y D y0 .M  PMii /y D y0 .I  ii i0i  P.Iii i0i /X /y
D SSE.i/ D SSE  y0 PMii y D SSE  mi i O2 D SSE  "O2i =mi i ;
(d) SSE  SSEZ D y0 PMii y D change in SSE due to hypothesis D 0,
 
.In1  PX.i / /y.i/
(e) .In  PZ /y D ;
0

y0 PMii y SSE  SSEZ O2


(f) ti2 D D D ,
y0 .M  PMii /y=.n  k  2/ SSEZ =.n  k  2/ O
se2 ./
(g) ti2  F.1; n  k  2; 2 mi i = 2 /, F -test statistic for
testing D 0 in MZ ,
O D O 2 =mi i D O 2 =mi i ,
(h) se2 ./ Z .i/

(i) mi i > 0 () Mii 0 () ii C .X/ () x.i/ 2 C .X0.i/ /


() is estimable under MZ
() is estimable under MZ :

yn  yN.n/ yn  yN
8.3 If Z D .1 W in /, then tn D p D p ,
s.n/ = 1  1=n s.n/ 1  1=n
2
where s.n/ D sample variance of y1 ; : : : ; yn1 and yN.n/ D .y1 C    C
yn1 /=.n  1/.

8.4 O
DFBETAi D O  O .i/ D .X0 X/1 X0 ii ; X.O  O .i/ / D Hii O

8.5 O
y  XO .i/ D My C Hii O H) "O.i/ D i0i .y  XO .i/ / D yi  x0.i/ O .i/ D ,
"O.i/ D the i th predicted residual: "O.i/ is based on a t to the data with the i th
case deleted

8.6 "O2.1/ C    C "O2.n/ D PRESS the predicted residual sum of squares

8.7 $\mathrm{COOK}_i^2 = D_i^2 = (\hat\beta - \hat\beta_{(i)})'X'X(\hat\beta - \hat\beta_{(i)})/(p\hat\sigma^2)$  [Cook's distance]
$= (\hat y - \hat y_{(i)}^*)'(\hat y - \hat y_{(i)}^*)/(p\hat\sigma^2) = \hat\delta^2 h_{ii}/(p\hat\sigma^2)$, where $\hat y_{(i)}^* = X\hat\beta_{(i)}$

8.8 $\mathrm{COOK}_i^2 = \dfrac{1}{p\hat\sigma^2}\cdot\dfrac{\hat\varepsilon_i^2 h_{ii}}{m_{ii}^2} = \dfrac{r_i^2}{p}\cdot\dfrac{h_{ii}}{m_{ii}}$, $\quad r_i = \mathrm{STUDI}_i$
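A numerical check of 8.7–8.8 (our own sketch, simulated data): Cook's distance computed from the shortcut $r_i^2 h_{ii}/[p(1-h_{ii})]$ is compared with the direct case-deletion form.

```python
# Sketch of Cook's distance 8.7-8.8: D_i = r_i^2 h_ii / (p (1 - h_ii)),
# checked against (beta_hat - beta_hat_(i))' X'X (beta_hat - beta_hat_(i)) / (p sigma_hat^2).
import numpy as np

rng = np.random.default_rng(7)
n, k = 30, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([0.5, 1.0, -1.0]) + rng.normal(size=n)
p = k + 1

XtX_inv = np.linalg.inv(X.T @ X)
H = X @ XtX_inv @ X.T
h = np.diag(H)
beta_hat = XtX_inv @ X.T @ y
e = y - X @ beta_hat
sigma2_hat = e @ e / (n - p)

r = e / np.sqrt(sigma2_hat * (1 - h))             # internally Studentized residuals
cook_short = r ** 2 * h / (p * (1 - h))           # shortcut form, 8.8

# direct deletion form for case i = 0, cf. 8.7
i = 0
keep = np.arange(n) != i
b_i = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
d = beta_hat - b_i
print(cook_short[i], d @ (X.T @ X) @ d / (p * sigma2_hat))
```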

8.9 hi i D x0.i/ .X0 X/1 x.i/ ; hi i D leverage


x0.i/ D .1; x0.i/ / D the i th row of X

8.10 h11 C h22 C    C hnn D p .D k C 1/ D trace.H/ D r.X/

8.11 H1 D 1 1 2 C .X/

1 1
8.12
hi i
; c D # of rows of X which are identical with x0.i/ (c  1)
n c
1
8.13 h2ij
; for all i j
4

8.14 $h_{ii} = \dfrac1n + \tilde x_{(i)}'T_{xx}^{-1}\tilde x_{(i)} = \dfrac1n + (x_{(i)} - \bar x)'T_{xx}^{-1}(x_{(i)} - \bar x)$  [$x_{(i)}'$ = the $i$-th row of $X_0$]
$= \dfrac1n + \dfrac{1}{n-1}(x_{(i)} - \bar x)'S_{xx}^{-1}(x_{(i)} - \bar x) = \dfrac1n + \dfrac{1}{n-1}\mathrm{MHLN}^2(x_{(i)}, \bar x, S_{xx})$

8.15 $h_{ii} = \dfrac1n + \dfrac{(x_i - \bar x)^2}{\mathrm{SS}_x}$, when $k = 1$

8.16 $\mathrm{MHLN}^2(x_{(i)}, \bar x, S_{xx}) = (x_{(i)} - \bar x)'S_{xx}^{-1}(x_{(i)} - \bar x) = (n-1)\tilde h_{ii}$, $\quad \tilde H = P_{CX_0}$

8.17 $\kappa(X) = \left(\dfrac{\operatorname{ch}_{\max}(X'X)}{\operatorname{ch}_{\min}(X'X)}\right)^{1/2} = \dfrac{\operatorname{sg}_{\max}(X)}{\operatorname{sg}_{\min}(X)}$ = condition number of $X$

8.18 $\kappa_i(X) = \left(\dfrac{\operatorname{ch}_{\max}(X'X)}{\operatorname{ch}_i(X'X)}\right)^{1/2} = \dfrac{\operatorname{sg}_{\max}(X)}{\operatorname{sg}_i(X)}$ = condition indexes of $X$
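The following sketch (our own, simulated data) verifies the leverage identity 8.14 via the Mahalanobis distance of the regressor rows and evaluates the condition number of 8.17 from the singular values of X.

```python
# Sketch of 8.14 and 8.17: leverage h_ii via the Mahalanobis distance of the
# i-th regressor row, and the condition number of X as a ratio of singular values.
import numpy as np

rng = np.random.default_rng(8)
n, k = 25, 3
X0 = rng.normal(size=(n, k))
X = np.column_stack([np.ones(n), X0])

H = X @ np.linalg.inv(X.T @ X) @ X.T
h = np.diag(H)

x_bar = X0.mean(axis=0)
Sxx = np.cov(X0, rowvar=False)
d2 = np.array([(row - x_bar) @ np.linalg.solve(Sxx, row - x_bar) for row in X0])
h_check = 1 / n + d2 / (n - 1)                    # identity 8.14
print(np.allclose(h, h_check))

sg = np.linalg.svd(X, compute_uv=False)
print(sg[0] / sg[-1])                             # condition number, 8.17
```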

8.19 Variables (columns) x1 ; : : : ; xp are exactly collinear if one of the xi s is an


exact linear combination of the others. This is exact collinearity, i.e., linear
dependence, and by term collinearity or near dependence we mean inexact
collinear relations.

8.20 Let MZ D fy; Z;  2 Ig denote the extended model where Z D .X W U/,
M D fy1 ; X.1/ ; Inb g is the model without the last b observations and
     
X.1/ 0
Xnp D ; X.2/ 2 Rbp ; Unb D ; D :
X.2/ Ib
 
Ia  H11 H12
(a) M D ; a C b D n;
H21 Ib  H22
M22 D Ib  H22 D U0 .In  H/U;
(b) r.M22 / D r.U0 M/ D b  dim C .U/ \ C .X/,
(c) rX0 .In  UU0 / D r.X.1/ / D r.X/  dim C .X/ \ C .U/,
(d) dim C .X/\C .U/ D r.X/r.X.1/ / D r.X.2/ /dim C .X0.1/ /\C .X0.2/ /,
(e) r.M22 / D b  r.X.2/ / C dim C .X0.1/ / \ C .X0.2/ /,

(f) M22 is pd () C .X0.2/ /  C .X0.1/ / () r.X.1/ / D r.X/


() ch1 .U0 HU/ D ch1 .H22 / < 1
() C .X/ \ C .U/ D f0g
() (and then ) is estimable under MZ :

8.21 O
If X.1/ has full column rank, then .M O O
 / D .MZ / WD  and

(a) O D .U0 MU/1 U0 My D M1 O2


22 " "O 2 D lower part of "O D My
D .M1
22 M21 W Ib /y D M1
22 M21 y1 C y2 y2 D lower part of y
(b) SSEZ D y .In  PZ /y D y .M  PMU /y D y In  UU0  P.In UU0 /X y
0 0 0

D SSE D SSE  y0 PMU y


D SSE  "O 02 M1 O 2 D SSE  O 0 M22 O
22 "

(c) O  O  D .X0 X/1 X0 UM1 0


22 U My

D .X0 X/1 X0.2/ M1 O 2 D .X0 X/1 X0 UO


22 "

y0 PMU y=b y0 MU.U0 MU/1 U0 My=b


(d) t2 D D
y0 .M  PMU /y=.n  p  b/ SSE =.n  p  b/
0 1
"O 2 M22 "O 2 =b
D  F.b; n  p  b; 0 M22 = 2 /
SSE =.n  p  b/
D F -test statistic for testing hypothesis D 0 in MZ
D the multiple case analogue of externally Studentized residual

(e) COOK D .O  O  /0 X0 X.O  O  /=.p O 2 / D O 0 H22 =.p


O O 2 /
D "O 02 M1 1
O 2 =.p O 2 /
22 H22 M22 "

8.22 jZ0 Zj D jX0 .In  UU0 /Xj D jX0.1/ X.1/ j D jM22 jjX0 Xj

8.23 jX0 Xj.1  i0i Hii / D jX0 .In  ii i0i /Xj; i.e., mi i D jX0.i/ X.i/ j=jX0 Xj
 
PX.1/ 0
8.24 C .X0.1/ / \ C .X0.2/ / D f0g () H D
0 PX.2/
   
PX.1/ 0 Ia  PX.1/ 0
8.25 PZ D ; QZ D I n  P Z D
0 Ib 0 0

8.26 Corresponding to 8.2 (p. 40), consider the models M D fy; X;  2 Vg,
M.i/ D fy.i/ ; X.i/ ;  2 V.n1/ g and MZ D fy; Z;  2 Vg, where r.Z/ D
Q 0 denote the BLUE of  under MZ . Then:
p C 1, and let Q D .Q 0 W / Z

(a) Q Z D Q .i/ ; Q D ePi =m


P ii ,
ePi
(b) DFBETAi .V / D Q  Q .i/ D .X0 V1 X/1 X0 V1 ii ,
P ii
m
P i i is the i th diagonal element of the matrix M
where m P dened as

(c) MP D V1  V1 X.X0 V1 X/1 X0 V1 D M.MVM/ M D .MVM/C ,


P We denote
and ePi is the i th element of the vector eP D My.
P D V1 X.X0 V1 X/1 X0 V1 and hence H
(d) H P CM
P D V1 .

ePi2 ePi2 Q2
(e) ti2 .V / D D D F -test statistic
P i i Q .i/
m 2 f ePi /
var. f /
var. Q
for testing D 0
SSE.V / P SSE.i/ .V /
(f) Q 2 D , where SSE.V / D y0 My; Q .i/
2
D ,
np np1
P i i D SSE.V /  Q2 m
(g) SSE.i/ .V / D SSE.V /  ePi2 =m P i i D SSEZ .V /,
(h) var.ePi / D  2 m
P ii ; Q D  2 =m
var./ P ii ,
.Q  Q .i/ /0 X0 V1 X.Q  Q .i/ / ri2 .V / hP i i
(i) COOK2i .V / D D .
p Q 2 p m P ii

8.27 Consider the deletion of the last b observations in model M D fy; X;  2 Vg


and let Z be partitioned as in 8.20 (p. 42). Then
(a) Q  Q  D .X0 V1 X/1 X0 V1 UQ D .X0 V1 X/1 X0 V1 UM P 1 eP 2
22
Q  is calculated without the last b observations;

(b) Q D .U0 MU/


P 1 U0 My
P DM
P 1 eP 2 ;
22
P
eP 2 D lower part of eP D My
P  y0 MU.U
(c) SSEZ .V / D y0 My P 0 P P
MU/1 U0 My
P  eP 0 M
D y0 My P 1 P 2
2 22 e
P Z y D y0 MZ .MZ VMZ / MZ y
D y0 M MZ D I  P Z
D SSE .V /,
P 1 eP 2 =b
eP 02 M
(d) t2 .V / D 22 P 22 = 2 /,
 F.b; n  p  b; 0 M
SSE .V /=.n  p  b/
(e) COOK2 .V / D .Q  Q  /0 X0 V1 X.Q  Q  /=.p 2 / D Q 0 H Q
P 22 =.p Q 2 /.

9 BLUE: Some preliminaries

9.1 Let Z be any matrix such that C .Z/ D C .X? / D N .X0 / D C .M/ and
denote F D VC1=2 X, L D V1=2 M. Then F0 L D X0 PV M, and
X0 PV M D 0 H) PF C PL D P.FWL/ D PV :

9.2 P Consider the linear model M D fy; X;  2 Vg, where X and V


Matrix M.
P and M
may not have full column ranks. Let the matrices M R be dened as
P D M.MVM/ M;
M R D PV MP
M P V:

The matrix M P V is always


P is unique i C .X W V/ D Rn . However, PV MP
unique. Suppose that the condition HPV M D 0 holds. Then
R D PV M.MVM/ MPV D VC  VC X.X0 VC X/ X0 VC ,
(a) M
R D MVC M  MVC X.X0 VC X/ X0 VC M,
(b) M
R D MM
(c) M R D MM
R R
D MMM,
(d) M.MVM/C M D .MVM/C M D M.MVM/C D .MVM/C ,
R M
(e) MV R D M,
R i.e., V 2 f.M/
R  g,
R D r.VM/ D r.V/  dim C .X/ \ C .V/ D r.X W V/  r.X/,
(f) r.M/
(g) If Z is a matrix with property C .Z/ D C .M/, then
R D PV Z.Z0 VZ/ Z0 PV ;
M P D VZ.Z0 VZ/ Z0 V:
VMV

(h) Let .X W Z/ be orthogonal. Then always (even if HPV M 0)


C
.X W Z/0 V.X W Z/ D .X W Z/0 VC .X W Z/:
Moreover, if in addition we have HPV M D 0, then
 
X0 V C X X0 V C Z
.X W Z/0 V.X W Z/C D D
Z0 VC X Z0 VC Z
 
X0 VX  X0 VZ.Z0 VZ/ Z0 VXC X0 VC XX0 VZ.Z0 VZ/C
:
.Z0 VZ/C Z0 VXX0 VC X Z0 VZ  Z0 VX.X0 VX/ X0 VZC

(i) If V is positive denite and C .Z/ D C .M/, then


(i) P DM
M R D M.MVM/ M D .MVM/C D Z.Z0 VZ/ Z0
D V1  V1 X.X0 V1 X/ X0 V1 D V1 .I  PXIV1 /;
P
(ii) X.X0 V1 X/ X0 D V  VZ.Z0 VZ/ Z0 V D V  VMV,
(iii) X.X0 V1 X/ X0 V1 D I  VZ.Z0 VZ/ Z0
P D I  P0ZIV D I  PVZIV1 :
D I  VM
(j) If V is positive denite and .X W Z/ is orthogonal, then
.X0 V1 X/1 D X0 VX  X0 VZ.Z0 VZ/1 Z0 VX:

9.3 If V is positive denite, then


P D V1  H
(a) M P
D V1  V1 X.X0 V1 X/ X0 V1 D V1 .I  PXIV1 /;
P D M.MVM/ M D MMM
(b) M P P D MM
D MM P
D M.MVM/C M D .MVM/C M D M.MVM/C D .MVM/C :

9.4 M.MVM/ M D M.MVM/C M i C .M/  C .MV/ i r.X W V/ D n

9.5 P general case. Consider the model fy; X; Vg. Let U be any matrix
Matrix M:
such that W D V C XUX0 has the property C .W/ D C .X W V/. Then
PW M.MVM/ MPW D .MVM/C
D WC  WC X.X0 W X/ X0 WC ;
i.e.,
P W WD M
PW MP R W D WC  WC X.X0 W X/ X0 WC :
R W has the corresponding properties as M
The matrix M R in 9.2 (p. 44).

9.6 W D VM.MVM/ MV C X.X0 W X/ X0

9.7 V  VM.MVM/ MV D X.X0 W X/ X0  X0 UX


D HVH  HVM.MVM/ MVH

9.8 X.X0 WC X/ X0 WC D X.X0 W X/ X0 WC D I  WM.MWM/ MPW


D I  VM.MVM/ MPW D H  HVM.MVM/ MPW

9.9 Let W D V C XUU0 X 2 NNDn have property C .X W V/ D C .W/. Then


W is a generalized inverse of V i C .V/ \ C .XUU0 X0 / D f0g.

9.10 Properties of X0 W X. Suppose that V 2 NNDnn , X 2 Rnp , and W D


V C XUX0 , where U 2 Rpp . Then the following statements are equivalent:
(a) C .X/  C .W/,
(b) C .X W V/ D C .W/,
(c) r.X W V/ D r.W/,
(d) X0 W X is invariant for any choice of W ,
(e) C .X0 W X/ is invariant for any choice of W ,
(f) C .X0 W X/ D C .X0 / for any choice of W ,
(g) r.X0 W X/ D r.X/ irrespective of the choice of W ,
(h) r.X0 W X/ is invariant with respect to the choice of W ,
(i) X.X0 W X/ X0 W X D X for any choices of the g-inverses involved.
Moreover, each of these statements is equivalent to (a) C .X/  C .W0 /, and
hence to the statements (b)(i) obtained from (b)(i), by setting W0 in place
of W. We will denote
(w) W D f Wnn W W D V C XUX0 ; C .W/ D C .X W V/ g.

9.11 C .VX? /? D C .W X W I  W W/ W D V C XUX0 2 W

9.12 Consider the linear model fy; X; Vg and denote W D V C XUX0 2 W, and
let W be an arbitrary g-inverse of W. Then
C .W X/ C .X/? D Rn ; C .W X/? C .X/ D Rn ;
C .W /0 X C .X/? D Rn ; C .W /0 X? C .X/ D Rn :

9.13 PA .PA NPA /C PA D .PA NPA /C PA


D PA .PA NPA /C D .PA NPA /C for any N

9.14 Let X be any matrix such that C .X/ D C .X /. Then


(a) PV X .X0 VX / X0 PV D PV X.X0 VX/ X0 PV ,
(b) PV X .X0 VC X / X0 PV D PV X.X0 VC X/ X0 PV ,
(c) C .X/  C .V/ H)

X .X0 V X / X0 D X.X0 V X/ X0 D H.H0 V H/ H D .H0 V H/C .

9.15 Let Xo be such that C .X/ D C .Xo / and X0o Xo D Ir , where r D r.X/. Then
Xo .X0o VC Xo /C X0o D H.H0 VC H/C H D .H0 VC H/C ;
whereas Xo .X0o VC Xo /C X0o D X.X0 VC X/C X0 i C .X0 XX0 V/ D C .X0 V/.

9.16 Let V 2 NNDnn , X 2 Rnp , and U 2 Rpp be such such that W D


V C XUX0 saties the condition C .W/ D C .X W V/. Then the equality
W D VB.B0 VB/ B0 V C X.X0 W X/ X0
holds for an n  q matrix B i
C .VW X/  C .B/? and C .VX? /  C .VB/;
or, equivalently, C .VW X/ D C .B/? \ C .V/, the subspace C .VW X/
being independent of the choice of W .

9.17 Let V 2 NNDnn , X 2 Rnp and C .X/  C .V/. Then the equality
V D VB.B0 VB/ B0 V C X.X0 V X/ X0
holds for an n  q matrix B i C .X/ D C .B/? \ C .V/.

9.18 Denition of estimability: Let K0 2 Rqp . Then K0 is estimable if there


exists a linear estimator Fy which is unbiased for K0 .
In what follows we consider the model fy; X1 1 C X2 2 ; Vg, where Xi 2
Rnpi , i D 1; 2, p1 C p2 D p.

9.19 K0 is estimable i C .K/  C .X0 / i K D X0 A for some A i K0 O D


K0 .X0 X/ X0 y is unique for all .X0 X/ .

9.20 L 2 is estimable i C .L0 /  C .X02 M1 /, i.e., L D BM1 X2 for some B.

9.21 The following statements are equivalent:


(a) 2 is estimable,

(b) C Ip02  C .X0 /,

(c) C .X1 / \ C .X2 / D f0g and r.X2 / D p2 ,


(d) r.M1 X2 / D r.X2 / D p2 .

9.22 Denoting PX2 X1 D X2 .X02 M1 X2 / X02 M1 , the following statements are
equivalent:
(a) X2 2 is estimable, (b) r.X02 / D r.X02 M1 /,

(c) C .X1 / \ C .X2 / D f0g, (d) PX2 X1 X2 D X2 ,


(e) PX2 X1 is invariant with respect to the choice of .X02 M1 X2 / ,
(f) PX2 X1 is a projector onto C .X2 / along C .X1 / C .X/? ,
(g) H D PX2 X1 C PX1 X2 .

9.23 k is estimable () xk C .X1 / () M1 xk 0

9.24 k0 is estimable () k 2 C .X0 /

9.25 is estimable () r.Xnp / D p

9.26 Consider the one-way ANOVA using the following parametrization:


 

y D X C "; D ;

0 1 0 101 0 1
y1 1n1 1 n1 : : : 0 "1
@ :: A D B :: :: : : :: C B C
B :1 C C @ :: A :
: @ : : : : A @:A :
yg : "g
1ng 0 : : : 1ng g
Then the parametric function a0 D .b; c0 / D b C c0  is estimable i
a0 u D 0; where u0 D .1; 10g /; i.e., b  .c1 C    C cg / D 0;
and c  is estimable i c 1g D 0. Such a function c0  is called a contrast, and
0 0

the contrasts of the form i  j , i j , are called elementary contrasts.

9.27 (Continued . . . ) Denote the model matrix above as X D .1n W X0 / 2


Rn.gC1/ , and let C 2 Rnn be the centering matrix. Then
(a) N .X/ D C .u/; where u0 D .1; 10g /,
(b) N .CX0 / D C .X00 C/? D C .1g /; and hence C .X00 C/ D C .1g /? .

10 Best linear unbiased estimator

10.1 Denition 1: Let k0 be an estimable parametric function. Then g0 y is


BLUE.k0 / under fy; X;  2 Vg if g0 y is an unbiased estimator of k0 and
it has the minimum variance among all linear unbiased estimators of k0 :
E.g0 y/ D k0 and var.g0 y/
var.f 0 y/ for any f W E.f 0 y/ D k0 :

10.2 Denition 2: Gy is BLUE.X/ under fy; X;  2 Vg if


E.Gy/ D X and cov.Gy/
L cov.Fy/ for any F W E.Fy/ D X:

10.3 Gy D BLUE.X/ under fy; X;  2 Vg () G.X W VM/ D .X W 0/.

10.4 GaussMarkov theorem: OLSE.X/ D BLUE.X/ under fy; X;  2 Ig.

10.5 f D XQ should be understood as


The notation Gy D BLUE.X/ D X
Gy 2 fBLUE.X j M /g; i.e., G 2 fPXjVM g;
where fBLUE.X j M /g refers to the set of all representations of the BLUE.
The matrix G is unique i C .X W V/ D Rn , but the value of Gy is always
unique after observing y.

10.6 A gentle warning regarding notation 10.5 may be worth giving. Namely in
10.5 Q refers now to any vector Q D Ay such that XQ is the BLUE.X/. The
vector Q in 10.5 need not be the BLUE./the parameter vector may not
even be estimable.

10.7 Let K0 be an estimable parametric function under fy; X;  2 Vg. Then


Gy D BLUE.K0 / () G.X W VM/ D .K0 W 0/:

10.8 Gy D BLUE.X/ under fy; X;  2 Vg i there exists a matrix L so that G is


a solution to (Pandoras Box)
   0  
V X G 0
D :
X0 0 L X0

10.9 The general solution for G satisfying G.X W VM/ D .X W 0/ can be expressed,
for example, in the following ways:
(a) G1 D .X W 0/.X W VM/ C F1 In  .X W VM/.X W VM/ ,
(b) G2 D X.X0 W X/ X0 W C F2 .In  WW /,
(c) G3 D In  VM.MVM/ M C F3 In  MVM.MVM/ M,
(d) G4 D H  HVM.MVM/ M C F4 In  MVM.MVM/ M,
where F1 , F2 , F3 and F4 are arbitrary matrices, W D V C XUX0 and U is
any matrix such that C .W/ D C .X W V/, i.e., W 2 W, see 9.10w (p. 46).

10.10 If C .X/  C .V/, fy; X;  2 Vg is called a weakly singular linear model.

10.11 Consistency condition for M D fy; X;  2 Vg. Under M , we have


y 2 C .X W V/ with probability 1.
The above statement holds if the model M is indeed correct, so to say (as all
our models are supposed to be). Notice that C .X W V/ can be written as
C .X W V/ D C .X W VM/ D C .X/ C .VM/:

10.12 Under the (consistent model) fy; X;  2 Vg, the estimators Ay and By are said
to be equal if their realized values are equal for all y 2 C .X W V/:
A1 y equals A2 y () A1 y D A2 y for all y 2 C .X W V/ D C .X W VM/:

10.13 If A1 y and A2 y are two BLUEs under fy; X;  2 Vg, then A1 y D A2 y for all
y 2 C .X W V/.

10.14 A linear estimator ` 0 y which is unbiased for zero, is called a linear zero func-
tion. Every linear zero function can be written as b0 My for some b. Hence an
unbiased estimator Gy is BLUE.X/ i Gy is uncorrelated with every linear
zero function.

10.15 If Gy is the BLUE for an estimable K0 under fy; X;  2 Vg, then LGy is the
BLUE for LK0 ; shortly, fLBLUE.K0 /g  fBLUE.LK0 /g for any L.

10.16 Gy D BLUE.GX/ under fy; X;  2 Vg () GVM D 0

10.17 If Gy D BLUE.X/ then HGy D BLUE.X/ and thereby there exists L


such that XLy D BLUE.X/.

10.18 Consider the models M D fy; X; Vg and MW D fy; X; Wg, where W D


V C XUU0 X0 , and C .W/ D C .X W V/. Then
(a) cov.XO j MW / D HWH;
cov.XQ j MW / D X.X0 W X/ X0 D .HW H/C ;

(b) cov.XO j M / D HVH; cov.XQ j M / D X.X0 W X/ X0  XUU0 X0 ,


(c) cov.XO j MW /  cov.XQ j MW / D cov.XO j M /  cov.XQ j M /,
(d) fBLUE.X j MW /g D fBLUE.X j M /g.

10.19 Under $\{y, X\beta, \sigma^2 V\}$, the BLUE($X\beta$) has the representations
$\mathrm{BLUE}(X\beta) = X\tilde\beta = \tilde\mu$
$= P_{X;V^{-1}}y = X(X'V^{-1}X)^-X'V^{-1}y$  [$V$ pd]
$= X(X_\#'X_\#)^-X_\#'y_\#$  [$y_\# = V^{-1/2}y$, $X_\# = V^{-1/2}X$]
$= X(X'V^-X)^-X'V^-y$  [$\mathcal{C}(X) \subseteq \mathcal{C}(V)$]
$= [H - HVM(MVM)^-M]y$
$= \mathrm{OLSE}(X\beta) - HVM(MVM)^-My$
$= [I - VM(MVM)^-M]y$
$= X(X'W^-X)^-X'W^-y$  [$W = V + XUX' \in \mathcal{W}$]
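As a numerical illustration of 10.19 (our own sketch, with a made-up positive definite V), two of the listed BLUE representations are evaluated and shown to coincide.

```python
# Sketch comparing two BLUE(X beta) representations from 10.19 for a pd V:
# X(X'V^{-1}X)^{-1}X'V^{-1} y   and   [H - HVM(MVM)^{-}M] y.
import numpy as np

rng = np.random.default_rng(9)
n, p = 15, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
A = rng.normal(size=(n, n))
V = A @ A.T + n * np.eye(n)                       # a positive definite V
y = rng.normal(size=n)

Vinv = np.linalg.inv(V)
blue1 = X @ np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)

H = X @ np.linalg.inv(X.T @ X) @ X.T
M = np.eye(n) - H
blue2 = H @ y - H @ V @ M @ np.linalg.pinv(M @ V @ M) @ M @ y
print(np.allclose(blue1, blue2))
```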

10.20 BLUE./ D Q D .X0 X/1 X0 y  .X0 X/1 X0 VM.MVM/ My


D O  .X0 X/1 X0 VMy
P only r.X/ D p requested
0 C 1 0 C
D .X V X/ XV y C .X/  C .V/, any V will do

Q D X.X0 V1 X/ X0


10.21 (a) cov.X/ V pd
0   0  C
D X.X V X/ X D .HV H/ C .X/  C .V/
D HVH  HVM.MVM/ MVH
D K0 K  K0 PL K K D V1=2 H; L D V1=2 M
D cov.X/O  HVM.MVM/ MVH
D V  VM.MVM/ MV D V1=2 .I  PV1=2 M /V1=2
D X.X0 W X/ X0  XUX0 ,
Q
(b) cov./
D .X0 V1 X/1 V pd, r.X/ D p
D .X X/ X VX.X X/  .X X/ X VM.MVM/ MVX.X0 X/1
0 1 0 0 1 0 1 0 

O  .X0 X/1 K0 PL K.X0 X/1


D cov./ only r.X/ D p requested
0 1 0 0 0 1
D .X X/ .K K  K PL K/.X X/ K D V1=2 X; L D V1=2 M
D .X0 VC X/1 C .X/  C .V/; r.X/ D p

10.22 Denoting D .H W M/0 V.H W M/, we have cov.X/ Q D MVM= and


Q Q D
hence r./ D r.V/ D r.MVM/ C rcov.X/, which yields rcov.X/
Q
r.V/r.VM/ D dim C .X/\C .V/. Moreover, C cov.X/ D C .X/\C .V/
Q D dim C .X/ \ C .V/.
and if r.X/ D p, we have rcov./

10.23 The BLUE of X and its covariance matrix remain invariant for any choice
of X as long as C .X/ remains the same.

10.24 (a) cov.XO  X/


Q D cov.X/
O  cov.X/
Q D HVM.MVM/ MVH

(b) cov.O  /
Q D cov./
O  cov./
Q
D .X0 X/1 X0 VM.MVM/ MVX.X0 X/1
O /
(c) cov.; Q D cov./
Q

10.25 "Q D y  XQ D VM.MVM/ My D VMy


P residual of the BLUE

Q D cov.y/  cov.X/
Q D cov.y  X/
10.26 cov."/ Q D covVM.MVM/ My
P D VM.MVM/ MV;
D cov.VMy/ Q D C .VM/
C cov."/

10.27 Pandoras Box. Consider the model fy; X;  2 Vg and denote


     
V X B1 B2 V X
D ; and BD D 2 f  g:
X0 0 B3 B4 X0 0
Then
   
V X
(a) C \ C D f0g,
X0 0
 
V X
(b) r D r.V W X/ C r.X/,
X0 0
(c) XB02 X D X; XB3 X D X,
(d) XB4 X0 D XB04 X0 D VB03 X0 D XB3 V D VB2 X0 D XB02 V,
(e) X0 B1 X, X0 B1 V and VB1 X are all zero matrices,
(f) VB1 VB1 V D VB1 V D VB01 VB1 V D VB01 V,
(g) tr.VB1 / D r.V W X/  r.X/ D tr.VB01 /,
(h) VB1 V and XB4 X0 are invariant for any choice of B1 and B4 ,
(i) XQ D XB02 y, cov.X/ Q D XB4 X0 D V  VB1 V,
"Q D y  XQ D VB1 y,
(j) for estimable k0 , k0 Q D k0 B02 y D k0 B3 y, var.k0 /
Q D  2 k0 B4 k,

(k) Q 2 D y0 B1 y=f is an unbiased estimator of  2 ; f D r.VM/.

10.28 "Q D y  XQ D .I  PXIV1 /y residual of the BLUE, V pd


P
D VM.MVM/ My D VMy
V can be singular

10.29 "Q # D V1=2 "Q residual in M# D fy# ; X# ;  2 Ig, V pd


D V1=2 .I  PXIV1 /y

10.30 SSE.V / D minky  Xk2V1 D min.y  X/0 V1 .y  X/


D ky  PXIV1 yk2V1
D ky  Xk Q 2 1 D .y  X/ Q 0 V1 .y Q D "Q 0 V1 "Q
 X/
V
D "Q 0# "Q # D y0# .I  PX# /y# y# D V1=2 y, X# D V1=2 X
D y0 V1  V1 X.X0 V1 X/ X0 V1 y
D y0 M.MVM/ My D y0 My P general presentation

10.31 Q 2 D SSE.V /=n  r.X/ unbiased estimator of  2 , V pd

10.32 Let W be dened as W D V C XUX0 , with C .W/ D C .X W V/. Then an


unbiased estimator for  2 is

(a) Q 2 D SSE.W /=f , where f D r.X W V/  r.X/ D r.VM/, and


Q 0 W .y  X/
(b) SSE.W / D .y  X/ Q D .y  X/
Q 0 V .y  X/
Q
0  0   0   0 
D "Q W "Q D y W  W X.X W X/ X W y
D y0 M.MVM/ My D y0 My P D SSE.V /: Note: y 2 C .W/

P D y0 My D SSE.I / 8 y 2 C .X W V/ () .VM/2 D VM
10.33 SSE.V / D y0 My

10.34 Let V be pd and let X be partitioned as X D .X1 W X2 /, r.X/ D p. Then, in


view of 21.2321.24 (p. 95):
(a) P.X1 WX2 /IV1 D X1 .X01 V1 X1 /1 X01 V1 C VM P 1 X2 .X02 M
P 1 X2 /1 X02 M
P1
P 2 X1 /1 X01 M
D X1 .X01 M P 2 C X2 .X02 M
P 1 X2 /1 X02 M
P 1;
P 1 D V1  V1 X1 .X0 V1 X1 /1 X0 V1 D M1 .M1 VM1 / M1 ,
(b) M 1 1

(c) Q 1 D .X01 M
P 2 X1 /1 X0 M
P
1 2 y; Q 2 D .X02 M
P 1 X2 /1 X0 M
P
2 1 y,

(d) cov.Q 2 / D  2 .X02 M


P 1 X2 /1 ,

(e) cov.O 2 / D  2 .X02 M1 X2 /1 X02 M1 VM1 X2 .X02 M1 X2 /1 ,


(f) BLUE.X/
D P.X1 WX2 /IV1 y D X1 Q 1 C X2 Q 2
D X1 .X0 M P 2 X1 /1 X0 M
P 2 y C X2 .X0 M
P 1 X2 /1 X0 M
P 1y
1 1 2 2
D 0 1 1 0 1
X1 .X1 V X1 / X1 V y P 1 X2 .X0 M
C VM P 1 0 P
2 1 X2 / X2 M1 y
D PX1 IV1 y C VM P 1 X2 Q 2
D X1 Q 1 .M1 / C VMP 1 X2 Q 2 .M12 /; M1 D fy; X1 1 ;  2 Vg
(g) Q 1 .M12 / D Q 1 .M1 /  .X01 V1 X1 /1 X01 V1 X2 Q 2 .M12 /.
(h) Replacing V1 with VC and denoting PAIVC D A.A0 VC A/1 A0 VC , the
results in (c)(g) hold under a weakly singular model.

10.35 Assume that C .X1 / \ C .X2 / D f0g, r.M1 X2 / D p2 and denote


W D V C X1 U1 X01 C X2 U2 X02 ; Wi D V C Xi Ui X0i ; i D 1; 2;
P 1W D M1 .M1 WM1 / M1 D M1 .M1 W2 M1 / M1 ;
M
where C .Wi / D C .Xi W V/, C .W/ D C .X W V/. Then
BLUE. 2 / D Q 2 .M12 / D .X02 M
P 1W X2 /1 X02 M
P 1W y:

10.36 (Continued . . . ) If the disjointness holds, and C .X2 /  C .X1 W V/, then
(a) X2 Q 2 .M12 / D X2 .X02 M
P 1 X 2 / X 0 M
P
2 1 y,

(b) X1 Q 1 .M12 / D X1 Q 1 .M1 /  X1 .X01 WC X1 / X01 WC X2 Q 2 .M12 /,



(c) BLUE.X j M12 /


P 1  BLUE.X2 2 j M12 /
D BLUE.X1 1 j M1 / C W1 M
D BLUE.X1 1 j M1 / C W1  BLUE.M P 1 X2 2 j M12 /:

P 12 y
10.37 SSE.V / D minky  Xk2V1 D y0 M P 12 D M
M P

D y0 V1 .I  P.X1 WX2 /IV1 /y


P 1 X2 .X02 M
D y0 V1 .I  PX1 IV1 /y  y0 M P 1 X2 /1 X02 M
P 1y
D SSEH .V /  SSE.V / H : 2 D 0

P 1 X2 .X02 M
10.38 SSE.V / D y0 M P 1 X2 /1 X02 M
P 1y
0 P 0 P
D y M1 y  y M12 y change in SSE.V / due to the hypothesis
D minky  Xk2V1  minky  Xk2V1 H W 2 D 0
H

D Q 02 cov.Q 2 /1 Q 2  2 D Q.V /

10.39 OLSE.X/ D BLUE.X/ i any of the following equivalent conditions


holds. (Note: V is replaceable with VC and H and M can be interchanged.)
(a) HV D VH, (b) HV D HVH,
(c) HVM D 0, (d) X0 VZ D 0, where C .Z/ D C .M/,
(e) C .VX/  C .X/, (f) C .VX/ D C .X/ \ C .V/,
(g) HVH
L V, i.e., V  HVH is nonnegative denite,
(h) HVH
rs V, i.e., r.V  HVH/ D r.V/  r.HVH/, i.e., HVH and V are
rank-subtractive, i.e., HVH is below V w.r.t. the minus ordering,
(i) $\mathcal{C}(X)$ has a basis consisting of $r = r(X)$ orthonormal eigenvectors of $V$,
(j) r.T0f1g X/ C    C r.T0fsg X/ D r.X/, where Tfig is a matrix consisting of
the orthonormal eigenvectors corresponding to the ith largest eigenvalue
fig of V; f1g > f2g >    > fsg , fig s are the distinct eigenvalues
of V,
(k) T0fig HTfig D .T0fig HTfig /2 for all i D 1; 2; : : : ; s,
(l) T0fig HTfj g D 0 for all i; j D 1; 2; : : : ; s, i j ,
(m) the squared nonzero canonical correlations between y and Hy are the
nonzero eigenvalues of V HVH for all V , i.e.,
cc2C .y; Hy/ D nzch.V HVH/ for all V ;

(n) V can be expressed as V D HAH C MBM, where A L 0, B L 0, i.e.,



V 2 V1 D f V L 0 W V D HAH C MBM; A L 0; B L 0 g;

(o) V can be expressed as V D XCX0 C ZDZ0 , where C L 0, D L 0, i.e.,


V 2 V2 D f V L 0 W V D XCX0 C ZDZ0 ; C L 0; D L 0 g;

(p) V can be expressed as V D I C XKX0 C ZLZ0 , where 2 R, and K


and L are symmetric, such that V is nonnegative denite, i.e.,
V 2 V3 D f V L 0 W V D I C XKX0 C ZLZ0 ; K D K0 ; L D L0 g:

10.40 Intraclass correlation structure. Consider the model M D fy; X;  2 Vg,


where 1 2 C .X/. If V D .1  %/I C %110 , then OLSE.X/ D BLUE.X/.

10.41 Consider models M% D fy; X;  2 Vg and M0 D fy; X;  2 Ig, where V


has intraclass correlation structure. Let the hypothesis to be tested be H :
E.y/ 2 C .X /, where it is assumed that 1 2 C .X /  C .X/. Then the
F -test statistics for testing H are the same under M% and M0 .

10.42 Consider the models M1 D fy; X; V1 g and M2 D fy; X; V2 g where V1


and V2 are pd and X 2 Rnp
r . Then the following statements are equivalent:
(a) BLUE.X j M1 / D BLUE.X j M2 /,
(b) PXIV1 D PXIV1 ,
1 2

(c) X 0
V1
2 PXIV1 D X0 V1
2 ,
1

(d) P0XIV1 V1 1


2 PXIV1 D V2 PXIV1 ,
1 1 1

(e) V1
2 PXIV1 is symmetric,
1

(f) C .V1 1
1 X/ D C .V2 X/,

(g) C .V1 X? / D C .V2 X? /,


(h) C .V2 V1
1 X/ D C .X/,

(i) X0 V1
1 V2 M D 0,

(j) C .V1=2
1 V2 V1=2
1  V1=2
1 X/ D C .V1=2
1 X/,

(k) C .V1=2
1 X/ has a basis U D .u1 W : : : W ur / comprising a set of r
eigenvectors of V1=2
1 V2 V1=2
1 ,

(l) V1=2
1 X D UA for some Arp , r.A/ D r,

(m) X D V1=2 1=2 1


1 UA; the columns of V1 U are r eigenvectors of V2 V1 ,

(n) C .X/ has a basis comprising a set of r eigenvectors of V2 V1


1 .

10.43 Consider the models M1 D fy; X; V1 g and M2 D fy; X; V2 g, where


r.X/ D r. Denote PXIWC D X.X0 WC  0 C
1 X/ X W1 , where W1 D V1 C
1
XUU0 X0 , and C .W1 / D C .X W V1 / and so PXIWC y is the BLUE for X
1
under M1 . Then PXIWC y is the BLUE for X also under M2 i any of the
1
following equivalent conditions holds:
(a) X0 WC
1 V2 M D 0, (b) C .V2 WC
1 X/  C .X/,

(c) C .V2 M/  N .X0 WC C ? C ?


1 / D C .W1 X/ D C .G/; G 2 f.W1 X/ g,

(d) C .WC
1 X/ is spanned by a set of r proper eigenvectors of V2 w.r.t. W1 ,

(e) C .X/ is spanned by a set of r eigenvectors of V2 WC


1,

(f) PXIWC V2 is symmetric,


1

(g) V2 2 f V2 2 NNDn W V2 D XN1 X0 C GN2 G0 for some N1 and N2 g.

10.44 Consider the models M1 D fy; X; V1 g and M2 D fy; X; V2 g, and let


the notation fBLUE.X j M1 /g  fBLUE.X j M2 /g mean that every rep-
resentation of the BLUE for X under M1 remains the BLUE for X under
M2 . Then the following statements are equivalent:
(a) fBLUE.X j M1 /g  fBLUE.X j M2 /g,
(b) fBLUE.K0 j M1 /g  fBLUE.K0 / j M2 /g for every estimable K0 ,
(c) C .V2 X? /  C .V1 X? /,
(d) V2 D aV1 C XN1 X0 C V1 MN2 MV1 for some a 2 R, N1 and N2 ,
(e) V2 D XN3 X0 C V1 MN4 MV1 for some N3 and N4 .

10.45 Consider the models M1 D fy; X; V1 g and M2 D fy; X; V2 g. For X to


have a common BLUE under M1 and M2 it is necessary and sucient that
C .V1 X? W V2 X? / \ C .X/ D f0g:

10.46 Consider the linear models M1 D fy; X; V1 g and M2 D fy; X; V2 g. Then


the following statements are equivalent:
(a) G.X W V1 M W V2 M/ D .X W 0 W 0/ has a solution for G,
(b) C .V1 M W V2 M/ \ C .X/ D f0g,
   
MV1 MV1 M
(c) C C :
MV2 MV2 M

10.47 Let U 2 Rnk be given such that r.U/


n  1. Then for every X satis-
fying C .U/  C .X/ the equality OLSE.X/ D BLUE.X/ holds under
fy; X;  2 Vg i any of the following equivalent conditions holds:

trV.I  PU /
(a) V.I  PU / D .I  PU /,
n  r.U/
(b) V can be expressed as V D aICUAU0 , where a 2 R, and A is symmetric,
such that V is nonnegative denite.

10.48 (Continued . . . ) If U D 1n then V in (b) above becomes V D aI C b110 , i.e.


V is a completely symmetric matrix.

10.49 Consider the model M12 D fy; X1 1 C X2 2 ; Vg and the reduced model
M121 D fM1 y; M1 X2 2 ; M1 VM1 g. Then
(a) every estimable function of 2 is of the form LM1 X2 2 for some L,
(b) K0 2 is estimable under M12 i it is estimable under M121 .

10.50 Generalized FrischWaughLovell theorem. Let us denote


fBLUE.M1 X2 2 j M12 /g D f Ay W Ay is BLUE for M1 X2 2 g:
Then every representation of the BLUE of M1 X2 2 under M12 remains the
BLUE under M121 and vice versa, i.e., the sets of the BLUEs coincide:
fBLUE.M1 X2 2 j M12 /g D fBLUE.M1 X2 2 j M121 /g:
In other words: Let K0 2 be an arbitrary estimable parametric function under
M12 . Then every representation of the BLUE of K0 2 under M12 remains
the BLUE under M121 and vice versa.

10.51 Let 2 be estimable under M12 . Then


O 2 .M12 / D O 2 .M121 /; Q 2 .M12 / D Q 2 .M121 /:

10.52 Equality of the OLSE and BLUE of the subvectors. Consider a partitioned
linear model M12 , where X2 has full column rank and C .X1 /\C .X2 / D f0g
holds. Then the following statements are equivalent:
(a) O 2 .M12 / D Q 2 .M12 /,
(b) O 2 .M121 / D Q 2 .M121 /,
(c) C .M1 VM1 X2 /  C .M1 X2 /,
(d) C M1 VM1 .M1 X2 /?   C .M1 X2 /? ,
(e) PM1 X2 M1 VM1 D M1 VM1 PM1 X2 ,
(f) PM1 X2 M1 VM1 QM1 X2 D 0, where QM1 X2 D I  PM1 X2 ,
(g) PM1 X2 VM1 D M1 VPM1 X2 ,
(h) PM1 X2 VM D 0,
(i) C .M1 X2 / has a basis comprising p2 eigenvectors of M1 VM1 .

11 The relative efficiency of OLSE

11.1 The Watson efficiency $\phi$ under the model $\mathcal{M} = \{y, X\beta, V\}$. The relative efficiency of the OLSE vs. the BLUE is defined as the ratio
$\operatorname{eff}(\hat\beta) = \phi = \dfrac{\det[\operatorname{cov}(\tilde\beta)]}{\det[\operatorname{cov}(\hat\beta)]} = \dfrac{|X'X|^2}{|X'VX|\cdot|X'V^{-1}X|}.$
We have $0 < \phi \le 1$, with $\phi = 1$ iff $\mathrm{OLSE}(\beta) = \mathrm{BLUE}(\beta)$.

BWK The Bloomfield–Watson–Knott inequality. The lower bound for $\phi$ is
$\phi \ge \dfrac{4\lambda_1\lambda_n}{(\lambda_1+\lambda_n)^2}\cdot\dfrac{4\lambda_2\lambda_{n-1}}{(\lambda_2+\lambda_{n-1})^2}\cdots\dfrac{4\lambda_p\lambda_{n-p+1}}{(\lambda_p+\lambda_{n-p+1})^2} = \tau_1^2\tau_2^2\cdots\tau_p^2,$
i.e.,
$\min_X \phi = \min_{X'X = I_p}\dfrac{1}{|X'VX|\cdot|X'V^{-1}X|} = \prod_{i=1}^p \dfrac{4\lambda_i\lambda_{n-i+1}}{(\lambda_i+\lambda_{n-i+1})^2} = \prod_{i=1}^p \tau_i^2,$
where $\lambda_i = \operatorname{ch}_i(V)$ and $\tau_i$ = the $i$-th antieigenvalue of $V$. The lower bound is attained when $X$ is
$X_{\mathrm{bad}} = \tfrac{1}{\sqrt2}(t_1 \pm t_n : \ldots : t_p \pm t_{n-p+1}) = \tfrac{1}{\sqrt2}\,T(i_1 \pm i_n : \ldots : i_p \pm i_{n-p+1}),$
where the $t_i$'s are the orthonormal eigenvectors of $V$ and $p \le n/2$.
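The sketch below (our own, with a random X and a made-up positive definite V) evaluates the Watson efficiency of 11.1 and the Bloomfield–Watson–Knott lower bound from the eigenvalues of V, confirming that the bound is not exceeded.

```python
# Sketch of the Watson efficiency of 11.1 and the Bloomfield-Watson-Knott lower
# bound, evaluated for a random X and a made-up pd V.
import numpy as np

rng = np.random.default_rng(10)
n, p = 10, 2
X = rng.normal(size=(n, p))
A = rng.normal(size=(n, n))
V = A @ A.T + np.eye(n)

Vinv = np.linalg.inv(V)
num = np.linalg.det(X.T @ X) ** 2
den = np.linalg.det(X.T @ V @ X) * np.linalg.det(X.T @ Vinv @ X)
eff = num / den                                   # Watson efficiency, 0 < eff <= 1

lam = np.sort(np.linalg.eigvalsh(V))[::-1]        # eigenvalues lambda_1 >= ... >= lambda_n
bound = np.prod([4 * lam[i] * lam[n - 1 - i] / (lam[i] + lam[n - 1 - i]) ** 2
                 for i in range(p)])
print(eff, bound, bound <= eff + 1e-12)
```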

11.2 Suppose that r.X/ D p and denote K D V1=2 X, L D V1=2 M. Then


 0   0   0 
Xy X VX X0 VM K K K0 L
cov D D ;
My MVX MVM L0 K L0 L
     0 1 0 
O O cov./
cov./ Q .X X/ X VX.X0 X/1 
cov Q D Q cov./ Q D
cov./  
 0 0

FK KF FK .In  PL /KF
D ; F D .X0 X/1 ;
FK0 .In  PL /KF FK0 .In  PL /KF
Q D cov./
cov./ O  .X0 X/1 X0 VM.MVM/ MVX.X0 X/1
O  D:
D .X0 X/1 K0 .In  PL /K.X0 X/1 WD cov./

Q
jcov./j jX0 VX  X0 VM.MVM/ MVXj  jX0 Xj2
11.3 D D
O
jcov./j jX0 VXj  jX0 Xj2
jX0 VX  X0 VM.MVM/ MVXj
D
jX0 VXj
D jIp  .X0 VX/1 X0 VM.MVM/ MVXj
D jIp  .K0 K/1 K0 PL Kj;

where we must have r.VX/ D r.X/ D p so that dim C .X/ \ C .V/? D f0g.
In a weakly singular model with r.X/ D p the above representation is valid.

11.4 Consider a weakly singular linear model where r.X/ D p and let and 1 
2      p  0 and 1  2      p > 0 denote the canonical
correlations between X0 y and My, and O and ,
Q respectively, i.e.,

i D cci .X0 y; My/; O /;


i D cci .; Q i D 1; : : : ; p:
Suppose that p
n=2 in which case the number of the canonical correlations,
i.e., the number of pairs of canonical variables based on X0 y and My, is p.
Then
(a) m D r.X0 VM/ D number of nonzero i s,
p D number of nonzero i s,
(b) f 12 ; : : : ; m
2
g D nzch.X0 VX/1 X0 VM.MVM/ MVX
D nzch.PV1=2 X PV1=2 M / D nzch.PV1=2 H PV1=2 M /
D nzch.PK PL /;
(c) f12 ; : : : ; p2 g D chX0 X.X0 VX/1 X0 X  .X0 VC X/1 
O 1  cov./
D ch.cov.// Q
O 1 .cov./
D ch.cov.// O  D/
D chIp  X0 X.K0 K/1 K0 PL K.X0 X/1 
D chIp  .K0 K/1 K0 PL K D f1  ch.K0 K/1 K0 PL Kg
D f1  ch.X0 VX/1 X0 VM.MVM/ MVXg;
(d) i2 D 1  piC1
2
; i.e.;
O /
cc2i .; Q D 1  cc2 0
i D 1; : : : ; p.
piC1 .X y; My/;

11.5 Under a weakly singular model the Watson eciency can be written as
jX0 Xj2 jX0 VX  X0 VM.MVM/ MVXj
D D
jX0 VXj  jX0 VC Xj jX0 VXj
D jIp  X VM.MVM/ MVX.X VX/1 j
0  0

D jIn  PV1=2 X PV1=2 M j D jIn  PV1=2 H PV1=2 M j


Y
p Y
p
D i2 D .1  i2 /:
iD1 iD1

11.6 Q   2 cov./
The i2 s are the roots of the equation detcov./ O D 0, and
Q O
thereby they are solutions to cov./w D  cov./w, w 0.
2

11.7 Let Gy be the BLUE.X/ and denote K D V1=2 H. Then


     0 
Hy HVH HVM K K K0 L
cov D D ;
My MVH MVM L0 K L0 L
     
Hy HVH GVG0 K0 K K0 .I  PL /K
cov D D :
Gy GVG0 GVG0 K0 .I  PL /K K0 .I  PL /K
Denote T1 D .HVH/ HVM.MVM/ MVH, T2 D .HVH/ GVG0 . Then
(a) cc2C .Hy; My/ D f 12 ; : : : ; m
2
g m D r.HVM/
D nzch.T1 / D nzch.PL PK /
D nzch.PL PK / D cc2C .X0 y; My/,
(b) cc2C .Hy; Gy/ D f12 ; : : : ; g2 g g D dim C .X/ \ C .V/
D nzch.T2 / D nzch.HVH/ GVG0  

D nzch.K0 K / K0 .In  PL /K 


D nzchPK .In  PL / D nzchPK .In  PL /,
(c) cc21 .Hy; Gy/ D max cor 2 .a0 Hy; b0 Gy/
a;b
a0 GVG0 a
D max max taken subject to VHa 0
a a0 HVHa
D ch1 .HVH/ GVG0  D 12 ,
(d) cc2i .Hy; My/ D 1  cc2hiC1 .Hy; Gy/; i D 1; : : : ; h; h D r.VH/.

11.8 u D # of unit canonical correlations ( i s) between X0 y and My


D # of unit canonical correlations between Hy and My
D dim C .V1=2 X/ \ C .V1=2 M/ D dim C .VX/ \ C .VM/
D r.V/  dim C .X/ \ C .V/  dim C .M/ \ C .V/
D r.HPV M/

11.9 Under M D fy; X; Vg, the following statements are equivalent:


(a) HPV M D 0, (b) PV M D MPV , (c) C .PV H/  C .H/,
(d) C .VH/ \ C .VM/ D f0g, (e) C .V 1=2
H/ \ C .V1=2 M/ D f0g,
(f) C .X/ D C .X/ \ C .V/  C .X/ \ C .V/? ,
(g) u D dim C .VH/ \ C .VM/ D r.HPV M/ D 0, where u is the number of
unit canonical correlations between Hy and My,
Q D PV X.X0 VC X/ X0 PV .
(h) cov.X/

11.10 The squared canonical correlations i2 s are the proper eigenvalues of K0 PL K


with respect to K0 K, i.e.,
HVM.MVM/ MVHw D 2 HVHw; VHw 0:

The squared canonical correlations i2 s are the proper eigenvalues of L0 L D


GVG0 with respect to K0 K D HVH:
GVG0 w D  2 HVHw D .1  2 /HVHw; VHw 0:

11.11 We can arrange the i2 s as follows:


12 D    D u2 D 1; u D r.HPV M/
2 2 2
1> uC1    uCt D m > 0; m D r.HVM/
2 2
uCtC1 D mC1 D  D mCs D %2h
2
D 0;
where i2 D 1  hiC1
2
, i D 1; : : : ; h D r.VH/, s D dim C .VX/ \ C .X/
and t D dim C .V/ \ C .X/  s.

11.12 Antieigenvalues. Denoting x D V1=2 z, the Watson eciency  can be inter-


preted as a specic squared cosine:
.x0 x/2 .x0 V1=2  V1=2 x/2
D D D cos2 .V1=2 x; V1=2 x/
x0 Vx  x0 V1 x x0 Vx  x0 V1 x
.z0 Vz/2
D 0 2 D cos2 .Vz; z/:
z V z  z0 z
The vector z minimizing  [maximizing angle .Vz; z/] is called an antieigen-
vector and the corresponding minimum ( 12 ) the rst antieigenvalue (squared):
p p
2 1 n 1 n
1 D min cos.Vz; z/ WD cos '.V/ D D ;
z0 1 C n .1 C n /=2
where '.V/ is the matrix angle of V and .i ; ti / is the p
i th eigenpair
p of V. The
rst antieigenvector has forms proportional to z1 D n t1 1 tn . The
second antieigenvalue of V is dened as
p
2 2 n1
2 D min cos.Vz; z/ D :
z0; z0 z1 D0 2 C n1

11.13 Consider the models M12 D fy; X1 1 C X2 2 ; Vg, M1 D fy; X1 ; Vg, and
M1H D fHy; X1 1 ; HVHg, where X is of full column rank and C .X/ 
C .V/. Then the corresponding Watson eciencies are
jcov.Q j M12 /j jX0 Xj2
(a) e.O j M12 / D D 0 ,
jcov.O j M12 /j jX VXj  jX0 VC Xj

jcov.Q 2 j M12 /j jX02 M1 X2 j2


(b) e.O 2 j M12 / D D 0 ,
jcov.O 2 j M12 /j P 1 X2 j
jX2 M1 VM1 X2 j  jX02 M
jX01 X1 j2
(c) e.O 1 j M1 / D ,
jX01 VX1 j  jX01 VC X1 j

jX01 X1 j2
(d) e.O 1 j M1H / D .
jX01 VX1 j  jX01 .HVH/ X1 j

11.14 (Continued . . . ) The Watson eciency e.O j M12 / can be expressed as


1
e.O j M12 / D e.O 1 j M1 /  e.O 2 j M12 /  ;
e.O 1 j M1H /
where
jX01 X1 j2
e.O 1 j M1H / D
jX01 VX1 j  jX01 .HVH/ X1 j
D jIp1  X01 VM1 X2 .X02 M1 VM1 X2 /1 X02 M1 VX01 .X01 VX1 /1 j:

11.15 (Continued . . . ) The following statements are equivalent:


(a) e.O j M12 / D e.O 2 j M12 /,
(b) C .X1 /  C .VX/,
(c) Hy is linearly sucient for X1 1 under the model M1 .

11.16 (Continued . . . ) The eciency factorization multiplier  is dened as


e.O j M12 /
D :
e.O 1 j M12 /  e.O 2 j M12 /
We say that the Watson eciency factorizes if  D 1, i.e., e.O j M12 / D
e.O 1 j M12 /  e.O 2 j M12 /, which happens i (supposing X0 X D Ip )
jIn  PVC1=2 X1 PVC1=2 X2 j D jIn  PV1=2 X1 PV1=2 X2 j:
 
11.17 Consider a weakly singular linear model MZ D fy; Z; Vg D fy; Z ; Vg,
where Z D .X W ii / has full column rank, and denote M D fy; X; Vg, and
let M.i/ D fy.i/ ; X.i/ ; V.i/ g be such a version of M in which the i th obser-
vation is deleted. Assume that X0 ii 0 (i D 1; : : : ; n), and that OLSE./
equals BLUE./ under M . Then
O
.M Q
.i/ / D .M.i/ / for all i D 1; : : : ; n;
i V satises MVM D  M, for some nonzero scalar .
2

11.18 The BloomeldWatson eciency :


D 12 kHV  VHk2F D kHVMk2F D tr.HV  VH/.HV  VH/0
1
2
X
p
D tr.HVMV/ D tr.HV2 /  tr.HV/2
14 .i  ni1 /2 ;
iD1

where the equality is attained in the same situation as the minimum of 12 .

11.19 C. R. Raos criterion for the goodness of OLSE:


O  cov.X/
D trcov.X/ Q D trHVH  X.X0 V1 X/1 X0 
X
p

.1=2
i  1=2 2
niC1 / :
iD1

11.20 Equality of the BLUEs of X2 2 under two models. Consider the models
M12 D fy; X1 1 C X2 2 ; Vg and M12 D fy; X1 1 C X2 2 ; Vg, where
N
C .X1 / \ C .X2 / D f0g. Then the following N
statements are equivalent:
(a) fBLUE.X2 2 j M12 g  fBLUE.X2 2 j M12 g,
N
(b) fBLUE.M1 X2 2 j M12 g  fBLUE.M1 X2 2 j M12 g,
N
(c) C .VM/  C .X1 W VM/,
N
(d) C .M1 VM/  C .M1 VM/,
N
(e) C M1 VM1 .M1 X1 /?   C M1 VM1 .M1 X2 /? ,
N
(f) for some L1 and L2 , the matrix M1 VM1 can be expressed as
N
M1 VM1 D M1 X2 L1 X02 M1 C M1 VM1 QM1 X2 L2 QM1 X2 M1 VM1
N
D M1 X2 L1 X02 M1 C M1 VML2 MVM1

12 Linear sufficiency and admissibility

12.1 Under the model {y, Xβ, σ²V}, a linear statistic Fy is called linearly sufficient for Xβ if there exists a matrix A such that AFy is the BLUE of Xβ.

12.2 Let W = V + XUU'X' be an arbitrary nnd matrix satisfying C(W) = C(X : V); this notation holds throughout this section. Then a statistic Fy is linearly sufficient for Xβ iff any of the following equivalent statements holds:
    (a) C(X) ⊆ C(WF'),
    (b) N(F) ∩ C(X : V) ⊆ C(VX^⊥),
    (c) r(X : VF') = r(WF'),
    (d) C(X'F') = C(X') and C(FX) ∩ C(FVX^⊥) = {0},
    (e) the best linear predictor of y based on Fy, BLP(y; Fy), is almost surely equal to a linear function of Fy which does not depend on β.

12.3 Let Fy be linearly sufficient for Xβ under M = {y, Xβ, σ²V}. Then each BLUE of Xβ under the transformed model {Fy, FXβ, σ²FVF'} is the BLUE of Xβ under the original model M and vice versa.
64 Formulas Useful for Linear Regression Analysis and Related Matrix Theory

12.4 Let K'β be an estimable parametric function under the model {y, Xβ, σ²V}. Then the following statements are equivalent:
    (a) Fy is linearly sufficient for K'β,
    (b) N(FX : FVX^⊥) ⊆ N(K' : 0),
    (c) C[X(X'W⁻X)⁻K] ⊆ C(WF').

12.5 Under the model {y, Xβ, σ²V}, a linear statistic Fy is called linearly minimal sufficient for Xβ if for any other linearly sufficient statistic Ly, there exists a matrix A such that Fy = ALy almost surely.

12.6 The statistic Fy is linearly minimal sufficient for Xβ iff C(X) = C(WF').

12.7 Let K'β be an estimable parametric function under the model {y, Xβ, σ²V}. Then the following statements are equivalent:
    (a) Fy is linearly minimal sufficient for K'β,
    (b) N(FX : FVX^⊥) = N(K' : 0),
    (c) C(K) = C(X'F') and FVX^⊥ = 0.

12.8 Let X₁β₁ be estimable under {y, X₁β₁ + X₂β₂, V} and denote W₁ = V + X₁X₁' and Ṁ₂ = M₂(M₂W₁M₂)⁻M₂. Then X₁'Ṁ₂y is linearly minimal sufficient for X₁β₁.

12.9 Under the model {y, Xβ, σ²V}, a linear statistic Fy is called linearly complete for Xβ if for all matrices A such that E(AFy) = 0 it follows that AFy = 0 almost surely.

12.10 A statistic Fy is linearly complete for Xβ iff C(FV) ⊆ C(FX).

12.11 Fy is linearly complete and linearly sufficient for Xβ ⟺ Fy is linearly minimal sufficient ⟺ C(X) = C(WF').

12.12 Linear Lehmann–Scheffé theorem. Under the model {y, Xβ, σ²V}, let Ly be a linear unbiased estimator of Xβ and let Fy be linearly complete and linearly sufficient for Xβ. Then the BLUE of Xβ is almost surely equal to the best linear predictor of Ly based on Fy, BLP(Ly; Fy).

12.13 Admissibility. Consider the linear model M = {y, Xβ, σ²V} and let K'β be an estimable parametric function, K' ∈ R^{q×p}. Denote the set of all linear (homogeneous) estimators of K'β as LE_q(y) = {Fy : F ∈ R^{q×n}}. The mean squared error matrix of Fy with respect to K'β is defined as

    MSEM(Fy; K'β) = E[(Fy − K'β)(Fy − K'β)'],

and the quadratic risk of Fy under M is

    risk(Fy; K'β) = trace MSEM(Fy; K'β) = E[(Fy − K'β)'(Fy − K'β)]
                  = σ² trace(FVF') + ‖(FX − K')β‖² = trace[cov(Fy)] + bias².

A linear estimator Ay is said to be admissible for K'β among LE_q(y) under M if there does not exist Fy ∈ LE_q(y) such that the inequality

    risk(Fy; K'β) ≤ risk(Ay; K'β)

holds for every (β, σ²) ∈ R^p × (0, ∞) and is strict for at least one point (β, σ²). The set of admissible estimators of K'β is denoted AD(K'β).

12.14 Consider the model {y, Xβ, σ²V} and let K'β (K' ∈ R^{q×p}) be estimable, i.e., K' = LX for some L ∈ R^{q×n}. Then Fy ∈ AD(K'β) iff

    C(VF') ⊆ C(X),  FVL' − FVF' ≥_L 0,  and  C[(F − L)X] = C[(F − L)N],

where N is a matrix satisfying C(N) = C(X) ∩ C(V).

12.15 Let K'β be an arbitrary parametric function. Then a necessary condition for Fy ∈ AD(K'β) is that C(FX : FV) ⊆ C(K').

12.16 If Fy is linearly sufficient and admissible for an estimable K'β, then Fy is also linearly minimal sufficient for K'β.

13 Best linear unbiased predictor

13.1 BLUP: Best linear unbiased predictor. Consider the linear model with new observations:

    M_f = { [y; y_f], [Xβ; X_f β], σ² [V, V₁₂; V₂₁, V₂₂] },

where y_f is an unobservable random vector containing new observations (observable in the future). Above we have E(y) = Xβ, E(y_f) = X_f β, and

    cov(y) = σ²V,  cov(y_f) = σ²V₂₂,  cov(y, y_f) = σ²V₁₂.

A linear predictor Gy is said to be a linear unbiased predictor, LUP, for y_f if E(Gy) = E(y_f) = X_f β for all β ∈ R^p, i.e., E(Gy − y_f) = 0, i.e., C(X_f') ⊆ C(X'). Then y_f is said to be unbiasedly predictable; the difference Gy − y_f is the prediction error. A linear unbiased predictor Gy is the BLUP for y_f if

    cov(Gy − y_f) ≤_L cov(Fy − y_f) for all Fy ∈ {LUP(y_f)}.

13.2 A linear predictor Ay is the BLUP for y_f iff the equation

    A(X : VX^⊥) = (X_f : V₂₁X^⊥)

is satisfied. This is equivalent to the existence of a matrix L such that A satisfies the equation (Pandora's Box for the BLUP)

    [V, X; X', 0] [A'; L] = [V₁₂; X_f'].

13.3 A linear estimator ℓ'y which is unbiased for zero is called a linear zero function. Every linear zero function can be written as b'My for some b. Hence a linear unbiased predictor Ay is the BLUP for y_f under M_f iff

    cov(Ay − y_f, ℓ'y) = 0 for every linear zero function ℓ'y.

13.4 Let C(X_f') ⊆ C(X'), i.e., X_f β is estimable (assumed below in all statements). The general solution to 13.2 can be written, for example, as

    A₀ = (X_f : V₂₁M)(X : VM)⁺ + F(I_n − P_{(X:VM)}),

where F is free to vary. Even though the multiplier A may not be unique, the observed value Ay of the BLUP is unique with probability 1. We can get, for example, the following matrices A_i such that A_i y equals BLUP(y_f):

    A₁ = X_f B + V₂₁W⁻[I_n − X(X'W⁻X)⁻X'W⁻],
    A₂ = X_f B + V₂₁V⁻[I_n − X(X'W⁻X)⁻X'W⁻],
    A₃ = X_f B + V₂₁M(MVM)⁻M,
    A₄ = X_f(X'X)⁻X' + [V₂₁ − X_f(X'X)⁻X'V]M(MVM)⁻M,

where W = V + XUX' and B = (X'W⁻X)⁻X'W⁻ with U satisfying C(W) = C(X : V).
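
A minimal numerical sketch (not from the book), assuming Python with NumPy and a nonsingular V so that W can be taken as V: it checks that the A₃-type representation of 13.4 agrees with the residual form X_f β̃ + V₂₁V⁻¹(y − Xβ̃) of 13.5. The cross-covariance V21 below is purely illustrative.

    import numpy as np

    rng = np.random.default_rng(1)
    n, p, m = 8, 2, 3
    X = rng.standard_normal((n, p))
    A = rng.standard_normal((n, n))
    V = A @ A.T + n * np.eye(n)             # positive definite cov(y)
    V21 = rng.standard_normal((m, n))       # illustrative cov(y_f, y)
    Xf = rng.standard_normal((m, p))
    y = rng.standard_normal(n)

    M = np.eye(n) - X @ np.linalg.pinv(X)   # M = I - H
    Vinv = np.linalg.inv(V)
    beta_tilde = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)   # BLUE (GLS)

    Mdot = M @ np.linalg.pinv(M @ V @ M) @ M
    blup_a = Xf @ beta_tilde + V21 @ Mdot @ y                 # A3-type form
    blup_b = Xf @ beta_tilde + V21 @ Vinv @ (y - X @ beta_tilde)   # 13.5
    assert np.allclose(blup_a, blup_b)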

13.5 The BLUP(y_f) can be written as

    BLUP(y_f) = X_f β̃ + V₂₁W⁻(y − Xβ̃)
              = X_f β̃ + V₂₁W⁻ε̃ = X_f β̃ + V₂₁V⁻ε̃
              = X_f β̃ + V₂₁M(MVM)⁻My = BLUE(X_f β) + V₂₁Ṁy
              = X_f β̂ + (V₂₁ − X_f X⁺V)Ṁy
              = OLSE(X_f β) + (V₂₁ − X_f X⁺V)Ṁy,

where ε̃ = y − Xβ̃ is the vector of the BLUE's residuals:

    Xβ̃ = y − VM(MVM)⁻My = y − VṀy,   ε̃ = VṀy = WṀy.

13.6 BLUP(y_f) = BLUE(X_f β) ⟺ C(V₁₂) ⊆ C(X)

13.7 The following statements are equivalent:



(a) BLUP.yf / D OLSE.Xf / D Xf O for a fixed Xf D LX,


(b) C V21  V.X0 /C Xf0   C .X/,
(c) C .V21  VHL0 /  C .X/.

13.8 The following statements are equivalent:


(a) BLUP.yf / D OLSE.Xf / D Xf O for all Xf of the form Xf D LX,
(b) C .VX/  C .X/ and C .V12 /  C .X/,
(c) OLSE.X/ D BLUE.X/ and BLUP.yf / D BLUE.Xf /.

13.9 Let Xf be estimable so that Xf D LX for some L, and denote Gy D


BLUE.X/. Then
Q C covV21 V .y  X/
covBLUP.yf / D cov.Xf / Q
D cov.LX/  Q
Q C covV21 V .y  X/
D cov.LGy/ C covV21 V .y  Gy/
P
D cov.LGy/ C cov.V21 My/
P 12
D L cov.Gy/L0 C V21 MV
D L cov.Gy/L C V21 V V  cov.Gy/V V12 :
0

13.10 If the covariance matrix of .y0 ; yf0 /0 has an intraclass correlation structure and
1n 2 C .X/, 1m 2 C .Xf /, then
BLUP.yf / D OLSE.Xf / D Xf O D Xf .X0 X/ X0 y D Xf XC y:

13.11 If the covariance matrix of .y0 ; yf /0 has an AR(1)-structure f%jij j g, then


BLUP.yf / D xf0 Q C %i0n "Q D xf0 Q C %eQn ;
where xf0 corresponds to Xf and "Q is the vector of the BLUEs residual.

13.12 Consider the models Mf and Mf , where Xf is a given estimable parametric


function such that C .Xf0 /  CN .X0 / and C .Xf0 /  C .X0 /:
     N  N
y X V11 V12
Mf D ; ; ;
yf Xf V21 V22
     
y X V V
Mf D ; N ; N 11 N 12 :
N y f X f V21 V22
N N N
Then every representation of the BLUP for yf under the model Mf is also a
BLUP for yf under the model Mf if and only if
  N 
X V11 M X V11 M
C N N N C :
Xf V21 M Xf V21 M
N N N

13.13 Under the model M_f, a linear statistic Fy is called linearly prediction sufficient for y_f if there exists a matrix A such that AFy is the BLUP for y_f. Moreover, Fy is called linearly minimal prediction sufficient for y_f if for any other linearly prediction sufficient statistic Sy, there exists a matrix A such that Fy = ASy almost surely. Under the model M_f, Fy is linearly prediction sufficient for y_f iff

    N(FX : FVX^⊥) ⊆ N(X_f : V₂₁X^⊥),

and Fy is linearly minimal prediction sufficient iff the equality holds above.

13.14 Let Fy be linearly prediction sufficient for y_f under M_f. Then every representation of the BLUP for y_f under the transformed model

    { [Fy; y_f], [FXβ; X_f β], σ² [FVF', FV₁₂; V₂₁F', V₂₂] }

is also the BLUP under M_f and vice versa.

14 Mixed model

14.1 The mixed model: y = Xβ + Zγ + ε, where y is an observable and ε an unobservable random vector, X_{n×p} and Z_{n×q} are known matrices, β is a vector of unknown parameters having fixed values (fixed effects), and γ is an unobservable random vector (random effects) such that E(γ) = 0, E(ε) = 0, and cov(γ) = σ²D, cov(ε) = σ²R, cov(γ, ε) = 0. We may denote briefly

    M_mix = {y, Xβ + Zγ, σ²D, σ²R}.

Then E(y) = Xβ, cov(y) = cov(Zγ + ε) = σ²(ZDZ' + R) := σ²Σ, and

    cov[(y; γ)] = σ² [ZDZ' + R, ZD; DZ', D] = σ² [Σ, ZD; DZ', D].

14.2 The mixed model can be presented as a version of the model with new observations, the new observations being now in γ: y = Xβ + Zγ + ε, γ = 0·β + ε_f, where cov(ε_f) = cov(γ) = σ²D, cov(γ, ε) = cov(ε_f, ε) = 0:

    M_mix-new = { [y; γ], [Xβ; 0], σ² [ZDZ' + R, ZD; DZ', D] }.

14.3 The linear predictor Ay is the BLUP for γ under the mixed model M_mix iff

    A(X : ΣM) = (0 : DZ'M),

where Σ = ZDZ' + R. In terms of Pandora's Box, Ay = BLUP(γ) iff there exists a matrix L such that A satisfies the equation

    [Σ, X; X', 0] [A'; L] = [ZD; 0].

The linear estimator By is the BLUE for Xβ under the model M_mix iff

    B(X : ΣM) = (X : 0).

14.4 Under the mixed model M_mix we have

    (a) BLUE(Xβ) = Xβ̃ = X(X'W⁻X)⁻X'W⁻y,
    (b) BLUP(γ) = γ̃ = DZ'W⁻(y − Xβ̃) = DZ'W⁻ε̃ = DZ'Σ⁻ε̃ = DZ'M(MΣM)⁻My,

where W = Σ + XUX' and U satisfies C(W) = C(X : Σ).

14.5 Henderson's mixed model equations are defined as

    [X'R⁻¹X, X'R⁻¹Z; Z'R⁻¹X, Z'R⁻¹Z + D⁻¹] [β; γ] = [X'R⁻¹y; Z'R⁻¹y].

They are obtained by minimizing the following quadratic form f(β, γ) with respect to β and γ (keeping also γ as a non-random vector):

    f(β, γ) = [y − Xβ − Zγ; γ]' [R, 0; 0, D]⁻¹ [y − Xβ − Zγ; γ].

If β* and γ* are solutions to the mixed model equations, then Xβ* = BLUE(Xβ) = Xβ̃ and γ* = BLUP(γ).
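
A minimal numerical sketch of 14.5 (not from the book), assuming Python with NumPy, invertible R and D, and arbitrary illustrative data: it solves Henderson's equations and checks the solutions against the GLS-type formulas of 14.4 with Σ = ZDZ' + R.

    import numpy as np

    rng = np.random.default_rng(2)
    n, p, q = 12, 2, 4
    X = rng.standard_normal((n, p))
    Z = rng.standard_normal((n, q))
    D = np.diag(rng.uniform(0.5, 2.0, q))    # cov of random effects (sigma^2 = 1)
    R = np.eye(n)                            # cov of errors
    y = rng.standard_normal(n)

    Rinv, Dinv = np.linalg.inv(R), np.linalg.inv(D)
    lhs = np.block([[X.T @ Rinv @ X, X.T @ Rinv @ Z],
                    [Z.T @ Rinv @ X, Z.T @ Rinv @ Z + Dinv]])
    rhs = np.concatenate([X.T @ Rinv @ y, Z.T @ Rinv @ y])
    sol = np.linalg.solve(lhs, rhs)          # Henderson's mixed model equations
    beta_star, gamma_star = sol[:p], sol[p:]

    Sigma = Z @ D @ Z.T + R
    Sinv = np.linalg.inv(Sigma)
    beta_tilde = np.linalg.solve(X.T @ Sinv @ X, X.T @ Sinv @ y)      # BLUE
    gamma_tilde = D @ Z.T @ Sinv @ (y - X @ beta_tilde)               # BLUP
    assert np.allclose(beta_star, beta_tilde)
    assert np.allclose(gamma_star, gamma_tilde)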

14.6 Let us denote

    y# = [y; 0],  X# = [X, Z; 0, I_q],  V# = [R, 0; 0, D],  α = [β; γ].

Then f(β, γ) in 14.5 can be expressed as

    f(β, γ) = (y# − X#α)' V#⁻¹ (y# − X#α),

and the minimum of f(β, γ) is attained when α has the value

    α̃ = [β̃; γ̃] = (X#'V#⁻¹X#)⁻¹X#'V#⁻¹y#;   X#α̃ = [Xβ̃ + Zγ̃; γ̃].

14.7 If V# is singular, then minimizing the quadratic form

    f(β, γ) = (y# − X#α)' W#⁻ (y# − X#α),

where W# = V# + X#X#', yields β̃ and γ̃.

14.8 One choice for X#^⊥ is X#^⊥ = [I_n; Z'] M = [M; Z'M].

14.9 Let us consider the fixed effects partitioned model

    F: y = Xβ + Zγ + ε,  cov(y) = cov(ε) = R,

where both β and γ are fixed (but of course unknown) coefficients, and supplement F with the stochastic restrictions y₀ = γ + ε₀, cov(ε₀) = D. This supplementation can be expressed as the partitioned model

    F_* = { [y; y₀], [X, Z; 0, I_q] [β; γ], [R, 0; 0, D] }.

Model F_* is the partitioned linear model F = {y, Xβ + Zγ, R} supplemented with stochastic restrictions on γ.

14.10 The estimator By is the BLUE for X under the model F i B satises
the equation B.X W V X? / D .X W 0/, i.e.,
    
B11 B12 X Z RM X Z 0
(a) D :
B21 B22 0 Iq DZ0 M 0 Iq 0

14.11 Let By be the BLUE of X under the augmented model F , i.e.,
     
B11 B12 B11 B12
By D y D yC y
B21 B22 B21 B22 0
D BLUE.X j F /:
Then it is necessary and sucient that B21 y is the BLUP of  and .B11 
ZB21 /y is the BLUE of X under the mixed model Mmix ; in other words, (a)
in 14.10 holds i
       
In Z y B11  ZB21 BLUE.X j M /
B D yD ;
0 Iq 0 B21 BLUP. j M /
or equivalently
     
y B11 BLUE.X j M / C BLUP.Z j M /
B D yD :
0 B21 BLUP. j M /

14.12 Assume that B satises (a) in 14.10. Then all representations of BLUEs of
X and BLUPs of  in the mixed model Mmix can be generated through
 
B11  ZB21
y by varying B11 and B21 .
B21

14.13 Consider two mixed models:


M1 D fy; X C Z; D1 ; R1 g; M2 D fy; X C Z; D2 ; R2 g;
and denote i D ZDi Z0 CRi . Then every representation of the BLUP. j M1 /
continues to be BLUP. j M2 / i any of the following equivalent conditions
holds:
   
2M X 1M
(a) C  C ;
D2 Z0 M 0 D1 Z0 M
   
R2 M X R1 M
(b) C C ;
D2 Z0 M 0 D1 Z0 M
   
M 2 M M 1 M
(c) C C ;
D2 Z0 M D1 Z0 M
   
MR2 M MR1 M
(d) C  C :
D2 Z0 M D1 Z0 M

14.14 (Continued . . . ) Both the BLUE.X j M1 / continues to be the BLUE.X j M2 /


and the BLUP. j M1 / continues to be the BLUP. j M2 / i any of the fol-
lowing equivalent conditions holds:
   
2M 1M
(a) C C ;
D2 Z0 M D1 Z0 M
   
R2 M R1 M
(b) C  C ;
D2 Z0 M D1 Z0 M
(c) C .V2 X? ?
 /  C .V1 X /, where
   
Ri 0 ? In
Vi D and X D M:
0 Di Z0

(d) The matrix V2 can be expressed as


V2 D X N1 X C V1 X? ? 0
 N2 .X / V1 for some N1 , N2 :

14.15 Consider the partitioned linear xed eects model


F D fy; X1 1 C X2 2 ; Rg D fy; X; Rg:
Let M be a linear mixed eects model y D X1 1 C X2  2 C ", where
cov. 2 / D D, cov."/ D R, which we denote as
M D fy; X1 1 ; g D fy; X1 1 ; X2 DX02 C Rg:
Then, denoting D X2 DX02 C R, the following statements hold:
(a) There exists a matrix L such that Ly is the BLUE of M2 X1 1 under the
models F and M i N .X1 W X2 W RX? 1 /  N .M2 X1 W 0 W 0/.

(b) Every representation of the BLUE of M2 X1 1 under F is also the BLUE


of M2 X1 1 under M i C .X2 W RX? / C .X? 1 /.

(c) Every representation of the BLUE of M2 X1 1 under M is also the


BLUE of M2 X1 1 under F i C .X2 W RX? /  C .X? 1 /.

15 Multivariate linear model

15.1 Instead of one response variable y, consider d response variables y1 ; : : : ; yd .


Let the n observed values of these d variables be in the data matrix Ynd
while Xnp is the usual model matrix:
0 0 1 0 0 1
y.1/ x.1/
B :: C B :: C
Y D .y1 W : : : W yd / D @ : A ; X D .x1 W : : : W xp / D @ : A :
y0.n/ x0.n/

Denote B D . 1 W : : : W d / 2 Rpd , and assume that


yj D Xj C "j ; cov."j / D j2 In ; E."j / D 0; j D 1; : : : ; d;
.y1 W : : : W yd / D X. 1 W : : : W d / C ."1 W : : : W "d /; Y D XB C ":
The columns of Y, y1 ; : : : ; yd , are n-dimensional random vectors such that
yj  .Xj ; j2 In /; cov.yj ; yk / D j k In ; j; k D 1; : : : ; d:
Transposing Y D XB C " yields
.y.1/ W : : : W y.n/ / D B0 .x.1/ W : : : W x.n/ / C .".1/ W : : : W ".n/ /;
where the (transposed) rows of Y, i.e., y.1/ ; : : : ; y.n/ , are independent d -
dimensional random vectors such that
y.i/  .B0 x.i/ ; /; cov.y.i/ ; y.j / / D 0; i j:
Notice that in this setup rows (observations) are independent but columns
(variables) may correlate. We denote this model shortly as A D fY; XB; g.

15.2 Denoting
0
1 0 10 1
y1 X ::: 0 1
: : :: A @ :: A
y D vec.Y/ D @ ::: A ; @
E.y / D :: : : : :
yd 0 ::: X d
D .Id X/ vec.B/;
we get
0 1
12 In 12 In : : : 1d In
B : :: :: C
cov.y / D @ :: : : A D In ;
d1 In d 2 In : : : d2 In
and hence the multivariate model can be rewritten as a univariate model
B D fvec.Y/; .Id X/ vec.B/; In g WD fy ; X  ; V g:

15.3 By analogy with the univariate model, we can estimate XB by minimizing

    tr[(Y − XB)'(Y − XB)] = ‖Y − XB‖²_F.

The resulting XB̂ = HY is the OLSE of XB. Moreover, we have

    (Y − XB)'(Y − XB) ≥_L (Y − HY)'(Y − HY) = Y'MY := E_res for all B,

where the equality holds iff XB = XB̂ = P_X Y = HY.

15.4 Under multinormality, Y'MY ~ W_d(n − r(X), Σ), and if r(X) = p, we have MLE(B) = B̂ = (X'X)⁻¹X'Y and MLE(Σ) = n⁻¹Y'MY = n⁻¹E_res.

15.5 C .V X /  C .X / H) BLUE.X  / D OLSE.X  /

15.6 Consider a linear hypothesis H: K'B = D, where K ∈ R^{p×q} of rank q and D ∈ R^{q×d} are such that K'B is estimable. Then the minimum E_H, say, of

    (Y − XB)'(Y − XB) subject to K'B = D

occurs (in the Löwner sense) when XB equals

    XB̂_H = XB̂ − X(X'X)⁻K[K'(X'X)⁻K]⁻¹(K'B̂ − D),

and so we have, corresponding to 7.17 (p. 35),

    E_H − E_res = (K'B̂ − D)'[K'(X'X)⁻K]⁻¹(K'B̂ − D).

The hypothesis testing in the multivariate model can be based on some appropriate function of (E_H − E_res)E_res⁻¹, or on some closely related matrix.

15.7 Consider two independent samples Y₁' = (y_(11) : ... : y_(1n₁)) and Y₂' = (y_(21) : ... : y_(2n₂)) so that each y_(1i) ~ N_d(μ₁, Σ) and y_(2i) ~ N_d(μ₂, Σ). Then the test statistic for the hypothesis μ₁ = μ₂ can be based, e.g., on

    ch₁[(E_H − E_res)E_res⁻¹] = (n₁n₂ / (n₁ + n₂)) ch₁[(ȳ₁ − ȳ₂)(ȳ₁ − ȳ₂)'E_res⁻¹]
                              = κ (ȳ₁ − ȳ₂)'S_*⁻¹(ȳ₁ − ȳ₂),

where κ = n₁n₂ / [(n₁ + n₂)(n₁ + n₂ − 2)], S_* = E_res/(n₁ + n₂ − 2) = T_*/(n₁ + n₂ − 2), and

    E_res = Y'(I_n − H)Y = Y₁'C_{n₁}Y₁ + Y₂'C_{n₂}Y₂ = T_*,
    E_H − E_res = Y'(I_n − J_n)Y − Y'(I_n − H)Y = Y'(H − J_n)Y
                = n₁(ȳ₁ − ȳ)(ȳ₁ − ȳ)' + n₂(ȳ₂ − ȳ)(ȳ₂ − ȳ)'
                = (n₁n₂ / (n₁ + n₂)) (ȳ₁ − ȳ₂)(ȳ₁ − ȳ₂)' := E_Between,
    E_H = E_Between + E_res.

Hence one appropriate test statistic appears to be a function of the squared Mahalanobis distance

    MHLN²(ȳ₁, ȳ₂, S_*) = (ȳ₁ − ȳ₂)'S_*⁻¹(ȳ₁ − ȳ₂).

One such statistic is (n₁n₂ / (n₁ + n₂)) (ȳ₁ − ȳ₂)'S_*⁻¹(ȳ₁ − ȳ₂) = T² = Hotelling's T².
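
A minimal numerical sketch of 15.7 (not from the book), assuming Python with NumPy and simulated two-sample data: it computes Hotelling's T² and checks that the largest eigenvalue of (E_H − E_res)E_res⁻¹ equals T²/(n₁ + n₂ − 2).

    import numpy as np

    rng = np.random.default_rng(3)
    d, n1, n2 = 3, 15, 12
    Y1 = rng.standard_normal((n1, d))
    Y2 = rng.standard_normal((n2, d)) + 0.5

    ybar1, ybar2 = Y1.mean(axis=0), Y2.mean(axis=0)
    Eres = (Y1 - ybar1).T @ (Y1 - ybar1) + (Y2 - ybar2).T @ (Y2 - ybar2)
    S = Eres / (n1 + n2 - 2)                 # pooled covariance S_*
    diff = ybar1 - ybar2
    T2 = n1 * n2 / (n1 + n2) * diff @ np.linalg.solve(S, diff)   # Hotelling's T^2

    # E_H - E_res has rank 1; its nonzero eigenvalue equals T^2 / (n1 + n2 - 2)
    EB = n1 * n2 / (n1 + n2) * np.outer(diff, diff)
    lam1 = np.linalg.eigvals(EB @ np.linalg.inv(Eres)).real.max()
    assert np.isclose(lam1, T2 / (n1 + n2 - 2))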

16 Principal components, discriminant analysis, factor analysis

16.1 Principal component analysis, PCA. Let a p-dimensional random vector z have E(z) = 0 and cov(z) = Σ. Consider the eigenvalue decomposition of Σ: Σ = TΛT', Λ = diag(λ₁, ..., λ_p), T'T = I_p, T = (t₁ : ... : t_p), where λ₁ ≥ λ₂ ≥ ... ≥ λ_p ≥ 0 are the ordered eigenvalues of Σ. Then the random variable t_i'z, which is the i-th element of the random vector T'z, is the i-th (population) principal component of z.

16.2 Denote T_(i) = (t₁ : ... : t_i). Then, in the above notation,

    max_{b'b=1} var(b'z) = var(t₁'z) = λ₁,
    max_{b'b=1, T_(i−1)'b=0} var(b'z) = var(t_i'z) = λ_i,

i.e., t_i'z has maximum variance among all normalized linear combinations uncorrelated with the elements of T_(i−1)'z.

16.3 Predictive approach to PCA. Let A_{p×k} have orthonormal columns, and consider the best linear predictor of z on the basis of A'z. Then the Euclidean norm of the covariance matrix of the prediction error, see 6.14 (p. 29),

    ‖Σ − ΣA(A'ΣA)⁻¹A'Σ‖_F = ‖Σ^{1/2}(I_p − P_{Σ^{1/2}A})Σ^{1/2}‖_F,

is a minimum when A is chosen as T_(k) = (t₁ : ... : t_k). Moreover, minimizing the trace of the covariance matrix of the prediction error, i.e., maximizing tr(P_{Σ^{1/2}A}Σ), yields the same result.

16.4 Sample principal components, geometric interpretation. Let X̃ be an n × p centered data matrix. How should G_{p×k} be chosen if we wish to minimize the sum of squared orthogonal distances of the observations x̃_(i) from C(G)? The function to be minimized is

    ‖E‖²_F = ‖X̃ − X̃P_G‖²_F = tr(X̃'X̃) − tr(P_G X̃'X̃),

and the solution is G = T_(k) = (t₁ : ... : t_k), where t₁, ..., t_k are the first k orthonormal eigenvectors of X̃'X̃ (which are the same as those of the corresponding covariance matrix). The new projected observations are the columns of the matrix T_(k)T_(k)'X̃' = P_{T_(k)}X̃'. In particular, if k = 1, the new projected observations are the columns of

    t₁t₁'X̃' = t₁(t₁'x̃_(1), ..., t₁'x̃_(n)) := t₁s₁',

where the vector s₁ = X̃t₁ = (t₁'x̃_(1), ..., t₁'x̃_(n))' ∈ R^n comprises the values (scores) of the first (sample) principal component; the i-th individual has the score t₁'x̃_(i) on the first principal component. The scores are the (signed) lengths of the new projected observations.

16.5 PCA and the matrix approximation. The matrix approximation of the centered data matrix X̃ by a matrix of lower rank yields the PCA. The scores can be obtained directly from the SVD of X̃: X̃ = UΔT'. The scores of the j-th principal component are in the j-th column of UΔ = X̃T. The columns of the matrix T'X̃' represent the new rotated observations.
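
A minimal numerical sketch of 16.4–16.5 (not from the book), assuming Python with NumPy and an arbitrary simulated data matrix: it computes the principal component scores from the SVD of the centered data and checks that the right singular vectors match the eigenvectors of the sample covariance matrix up to sign.

    import numpy as np

    rng = np.random.default_rng(4)
    n, p = 50, 4
    X = rng.standard_normal((n, p)) @ rng.standard_normal((p, p))
    Xc = X - X.mean(axis=0)                      # centered data matrix

    U, delta, Tt = np.linalg.svd(Xc, full_matrices=False)   # Xc = U diag(delta) Tt
    scores = U * delta                           # columns are the PC scores
    assert np.allclose(scores, Xc @ Tt.T)        # scores = Xc T

    S = Xc.T @ Xc / (n - 1)                      # sample covariance matrix
    evals, evecs = np.linalg.eigh(S)
    evecs = evecs[:, np.argsort(evals)[::-1]]    # descending eigenvalue order
    for j in range(p):
        assert np.isclose(abs(evecs[:, j] @ Tt[j]), 1.0)   # same axes up to sign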

16.6 Discriminant analysis. Let x denote a d-variate random vector with E(x) = μ₁ in population 1 and E(x) = μ₂ in population 2, and cov(x) = Σ (pd) in both populations. If a maximizes

    [a'(μ₁ − μ₂)]² / var(a'x) = [a'(μ₁ − μ₂)]² / (a'Σa),

then a'x is called a linear discriminant function for the two populations. Moreover,

    max_a [a'(μ₁ − μ₂)]² / (a'Σa) = (μ₁ − μ₂)'Σ⁻¹(μ₁ − μ₂) = MHLN²(μ₁, μ₂, Σ),

and any linear combination a'x with a = b·Σ⁻¹(μ₁ − μ₂), where b ≠ 0 is an arbitrary constant, is a linear discriminant function for the two populations. In other words, finding the linear combination a'x for which a'μ₁ and a'μ₂ are as distant from each other as possible (distance measured in terms of the standard deviation of a'x) yields a linear discriminant function a'x.

16.7 Let U₁' = (u_(11) : ... : u_(1n₁)) and U₂' = (u_(21) : ... : u_(2n₂)) be independent random samples from d-dimensional populations (μ₁, Σ) and (μ₂, Σ), respectively. Denote, cf. 5.46 (p. 26), 15.7 (p. 73),

    T_i = U_i'C_{n_i}U_i,  T_* = T₁ + T₂,  S_* = T_*/(n₁ + n₂ − 2),  n = n₁ + n₂,

and ū_i = U_i'1_{n_i}/n_i, ū = (U₁' : U₂')1_n/n. Let S_* be pd. Then

    max_a [a'(ū₁ − ū₂)]² / (a'S_*a) = (ū₁ − ū₂)'S_*⁻¹(ū₁ − ū₂) = MHLN²(ū₁, ū₂, S_*)
        = (n − 2) max_a a'(ū₁ − ū₂)(ū₁ − ū₂)'a / (a'T_*a) = (n − 2) max_a a'B_*a / (a'T_*a),

where B_* = (ū₁ − ū₂)(ū₁ − ū₂)' = (n/(n₁n₂)) Σ_{i=1}^{2} n_i(ū_i − ū)(ū_i − ū)'. The vector of coefficients of the discriminant function is given by a = S_*⁻¹(ū₁ − ū₂) or any vector proportional to it.
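
A minimal numerical sketch of 16.7 (not from the book), assuming Python with NumPy and simulated samples: it forms the discriminant coefficients a = S_*⁻¹(ū₁ − ū₂) and checks that the ratio [a'(ū₁ − ū₂)]²/(a'S_*a) attains its maximum, the squared Mahalanobis distance, at this a.

    import numpy as np

    rng = np.random.default_rng(5)
    d, n1, n2 = 3, 20, 25
    U1 = rng.standard_normal((n1, d))
    U2 = rng.standard_normal((n2, d)) + 1.0

    u1, u2 = U1.mean(axis=0), U2.mean(axis=0)
    T = (U1 - u1).T @ (U1 - u1) + (U2 - u2).T @ (U2 - u2)
    S = T / (n1 + n2 - 2)                        # pooled covariance S_*

    a = np.linalg.solve(S, u1 - u2)              # discriminant coefficients
    mhln2 = (u1 - u2) @ a                        # squared Mahalanobis distance

    ratio = lambda b: (b @ (u1 - u2)) ** 2 / (b @ S @ b)
    assert np.isclose(ratio(a), mhln2)
    for _ in range(200):                         # no direction does better
        b = rng.standard_normal(d)
        assert ratio(b) <= mhln2 + 1e-9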

16.8 The model for factor analysis is x = μ + Af + ε, where x is an observable random vector of p components, E(x) = μ and cov(x) = Σ. The vector f is an m-dimensional random vector, m ≤ p, whose elements are called (common) factors. The elements of ε are called specific or unique factors. The matrix A_{p×m} is the unknown matrix of factor loadings. Moreover,

    E(ε) = 0,  cov(ε) = Ψ = diag(ψ₁², ..., ψ_p²),
    E(f) = 0,  cov(f) = Φ = I_m,  cov(f, ε) = 0.

The fundamental equation for factor analysis is cov(x) = cov(Af) + cov(ε), i.e., Σ = AA' + Ψ. Then

    cov[(x; f)] = [Σ, A; A', I_m]  and  cov(f − Lx) ≥_L cov(f − A'Σ⁻¹x) for all L.

The matrix L_* = A'Σ⁻¹ can be represented as

    L_* = A'Σ⁻¹ = (A'Ψ⁻¹A + I_m)⁻¹A'Ψ⁻¹,

and the individual factor scores can be obtained by L_*(x − μ).

17 Canonical correlations
 
17.1 Let z D xy denote a d -dimensional random vector with (possibly singular)
covariance matrix . Let x and y have dimensions d1 and d2 D d  d1 ,
respectively. Denote
   
x xx xy
cov DD ; r. xy / D m; r. xx / D h
r. yy /:
y yx yy
Let %21 be the maximum value of cor2 .a0 x; b0 y/, and let a D a1 and b D b1
be the
pcorresponding maximizing values of a and b. Then the positive square
root %21 is called the rst (population) canonical correlation between x and
y, denoted as cc1 .x; y/ D %1 , and u1 D a01 x and v1 D b01 y are called the rst
(population) canonical variables.

17.2 Let %22 be the maximum value of cor 2 .a0 x; b0 y/, where a0 x is uncorrelated
with a01 x and b0 y is uncorrelated with b01 y, and let u 0
p2 D a2 x and v2 D b2 y
0

be the maximizing values. The positive square root %22 is called the second
canonical correlation, denoted as cc2 .x; y/ D %2 , and u2 and v2 are called
the second canonical variables. Continuing in this manner, we obtain h pairs
of canonical variables u D .u1 ; u2 ; : : : ; uh /0 and v D .v1 ; v2 ; : : : ; vh /0 .

17.3 We have

    cor²(a'x, b'y) = (a'Σ_xy b)² / (a'Σ_xx a · b'Σ_yy b)
                   = (a_*'Σ_xx^{+1/2}Σ_xyΣ_yy^{+1/2}b_*)² / (a_*'a_* · b_*'b_*),

where a_* = Σ_xx^{1/2}a, b_* = Σ_yy^{1/2}b. In view of 23.3 (p. 106), we get

    max_{a,b} cor²(a'x, b'y) = sg₁²(Σ_xx^{+1/2}Σ_xyΣ_yy^{+1/2})
                             = ch₁(Σ_xx^{+}Σ_xyΣ_yy^{+}Σ_yx) = cc₁²(x, y) = ρ₁².

17.4 The minimal angle between the subspaces A D C .A/ and B D C .B/ is
dened to be the number 0
min

=2 for which
.0 A0 B/2
cos2 min D max cos2 .u; v/ D max :
u2A; v2B A0 0 A0 A  0 B0 B
u0; v0 B0

17.5 Let Ana and Bnb be given matrices. Then


.0 A0 B/2 .01 A0 B 1 /2
max D
A0 0 A0 A  0 B0 B 01 A0 A1  01 B0 B 2
B0
01 A0 PB A1
D D ch1 .PA PB / D 21 ;
01 A0 A1
where .21 ; 1 / is the rst proper eigenpair for .A0 PB A; A0 A/ satisfying
A0 PB A1 D 21 A0 A1 ; A1 0:
The vector 1 is the proper eigenvector satisfying
B0 PA B 1 D 21 B0 B 1 ; B 1 0:

17.6 Consider an n-dimensional random vector u such that cov.u/ D In and dene
x D A0 u and y D B0 u where A 2 Rna and B 2 Rnb . Then
   0   0   
x Au A A A0 B xx xy
cov D cov D WD D :
y B0 u B0 A B0 B yx yy
Let %i denote the ith largest canonical correlation between the random vectors
A0 u and B0 u and let 01 A0 u and 01 B0 u be the rst canonical variables. In view
of 17.5,
.0 A0 B/2 01 A0 PB A1
%21 D max D D ch1 .PA PB /:
A0 0 A0 A  0 B0 B 01 A0 A1
B0

In other words, .%21 ; 1 / is the rst proper eigenpair for .A0 PB A; A0 A/:
A0 PB A1 D %21 A0 A1 ; A1 0:
Moreover, .%21 ; A1 / is the rst eigenpair of PA PB : PA PB A1 D %21 A1 .

17.7 Suppose that r D r.A/


r.B/ and denote
cc.A0 u; B0 u/ D f%1 ; : : : ; %m ; %mC1 ; : : : ; %h g D the set of all ccs;
ccC .A0 u; B0 u/ D f%1 ; : : : ; %m g
D the set of nonzero ccs, m D r.A0 B/.

Then
(a) there are h D r.A/ pairs of canonical variables 0i A0 u, 0i B0 u, and h
corresponding canonical correlations %1  %2      %h  0,
(b) the vectors i are the proper eigenvectors of A0 PB A w.r.t. A0 A, and the
%2i s are the corresponding proper eigenvalues,
(c) the %2i s are the h largest eigenvalues of PA PB ,
(d) the nonzero %2i s are the nonzero eigenvalues of PA PB , i.e.,
cc2C .A0 u; B0 u/ D nzch.PA PB / D nzch. yx  
xx xy yy /;

(e) the vectors i are the proper eigenvectors of B0 PA B w.r.t. B0 B,


(f) the number of nonzero %i s is m D r.A0 B/ D r.A/dim C .A/\C .B/? ,
(g) the number of unit %i s is u D dim C .A/ \ C .B/,
(h) the number of zero %i s is s D r.A/  r.A0 B/ D dim C .A/ \ C .B/? .
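
A minimal numerical sketch of 17.7(d) (not from the book), assuming Python with NumPy and arbitrary illustrative matrices A and B: the nonzero squared canonical correlations between A'u and B'u are computed as the nonzero eigenvalues of P_A P_B and cross-checked against the singular values of Q_A'Q_B, where Q_A and Q_B are orthonormal bases.

    import numpy as np

    rng = np.random.default_rng(6)
    n, a, b = 10, 4, 3
    A = rng.standard_normal((n, a))
    B = rng.standard_normal((n, b))

    PA = A @ np.linalg.pinv(A)              # orthogonal projector onto C(A)
    PB = B @ np.linalg.pinv(B)
    rho2 = np.sort(np.linalg.eigvals(PA @ PB).real)[::-1][:min(a, b)]

    QA, _ = np.linalg.qr(A)                 # orthonormal bases
    QB, _ = np.linalg.qr(B)
    sv = np.linalg.svd(QA.T @ QB, compute_uv=False)   # canonical correlations
    assert np.allclose(np.sqrt(np.clip(rho2, 0, None)), sv)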

17.8 Let u denote a random vector with covariance matrix cov.u/ D In and let
K 2 Rnp , L 2 Rnq , and F has the property C .F/ D C .K/ \ C .L/. Then
(a) The canonical correlations between K0 QL u and L0 QK u are all less than
1, and are precisely those canonical correlations between K0 u and L0 u
that are not equal to 1.
(b) The nonzero eigenvalues of PK PL  PF are all less than 1, and are pre-
cisely those canonical correlations between the vectors K0 u and L0 u that
are not equal to 1.

18 Column space properties and rank rules

18.1 For conformable matrices A and B, the following statements hold:


(a) N .A/ D N .A0 A/,
(b) C .A/? D C .A? / D N .A0 /,
(c) C .A/  C .B/ () C .B/?  C .A/? ,
(d) C .A/ D C .AA0 /,
(e) r.A/ D r.A0 / D r.AA0 / D r.A0 A/,
(f) Rn D C .Anm /  C .Anm /? D C .A/  N .A0 /,
(g) r.Anm / D n  r.A? / D n  dim N .A0 /,
(h) C .A W B/ D C .A/ C C .B/,

(i) r.A W B/ D r.A/ C r.B/  dim C .A/ \ C .B/,


 
A
(j) r D r.A/ C r.B/  dim C .A0 / \ C .B0 /,
B
(k) C .A C B/  C .A/ C C .B/ D C .A W B/,
(l) r.A C B/
r.A W B/
r.A/ C r.B/,
(m) C .A W B/? D C .A/? \ C .B/? .

18.2 (a) LAY D MAY & r.AY/ D r.A/ H) LA D MA rank cancel-


lation rule
(b) DAM D DAN & r.DA/ D r.A/ H) AM D AN

18.3 r.AB/ D r.A/ & FAB D 0 H) FA D 0

18.4 (a) r.AB/ D r.A/ H) r.FAB/ D r.FA/ for all F


(b) r.AB/ D r.B/ H) r.ABG/ D r.BG/ for all G

18.5 r.AB/ D r.A/  dim C .A0 / \ C .B/? rank of the product

18.6 r.A W B/ D r.A/ C r.I  PA /B D r.A/ C r.I  AA /B


D r.A/ C r.B/  dim C .A/ \ C .B/
 
A
18.7 r D r.A/ C r.I  PA0 /B0  D r.A/ C rB0 .I  A A/
B
D r.A/ C r.B/  dim C .A0 / \ C .B0 /

18.8 r.A0 UA/ D r.A0 U/


() C .A0 UA/ D C .A0 U/
() A0 UA.A0 UA/ A0 U D A0 U [holds e.g. if U L 0]

18.9 r.A0 UA/ D r.A/ () C .A0 UA/ D C .A0 / () A0 UA.A0 UA/ A0 D A0

18.10 r.A C B/ D r.A/ C r.B/


() dim C .A/ \ C .B/ D 0 D dim C .A0 / \ C .B0 /

18.11 C .U C V/ D C .U W V/ if U and V are nnd

18.12 C .G/ D C .G / H) C .AG/ D C .AG /

18.13 C .Anm / D C .Bnp /


i 9 F W Anm D Bnp Fpm ; where C .B0 / \ C .F? / D f0g

18.14 AA0 D BB0 () 9 orthogonal Q W A D BQ

18.15 Let y 2 Rn be a given vector, y 0. Then y0 Qx D 0 8 Qnp H) x D 0.

18.16 Frobenius inequality: r.AZB/  r.AZ/ C r.ZB/  r.Z/

18.17 Sylvesters inequality:


r.Amn Bnp /  r.A/ C r.B/  n; with equality i N .A/  C .B/

18.18 For A 2 Rnm we have


(a) .In  AA /0 2 fA? g,
(b) In  .A /0 A0 2 fA? g,
(c) In  .A0 / A0 2 fA? g,
(d) Im  A A 2 f.A0 /? g,
(e) In  AAC D In  PA D QA 2 fA? g.
  ( ? )   ( ? )
In Anp Bnq QA 0 Anp
18.19 QA 2 ; 2 ;
B0 0 Iq 0 0 0qp
QA D I  P A

  ( ? )
In Bnq
18.20 2
B0 Iq

18.21 Consider Anp , Znq , D D diag.d1 ; : : : ; dq / 2 PDq , and suppose that


Z0 Z D Iq and C .A/  C .Z/. Then D1=2 Z0 .In  PA / 2 f.D1=2 Z0 A/? g.

18.22 C .A/ \ C .B/ D C A.A0 B? /?  D C A.A0 QB /?  D C A.I  PA0 QB /

18.23 C .A/ \ C .B/? D C A.A0 B/?  D C A.I  PA0 B / D C PA .I  PA0 B /

18.24 C .A/ \ C .B/ D C AA0 .AA0 C BB0 / BB0 
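
A minimal numerical sketch of 18.24 (not from the book), assuming Python with NumPy and a constructed example in which C(A) and C(B) share a two-dimensional subspace: the matrix AA'(AA' + BB')⁻BB' is computed with a Moore–Penrose inverse as the g-inverse and its column space is checked to lie in both C(A) and C(B) with the expected dimension.

    import numpy as np

    rng = np.random.default_rng(7)
    n = 8
    common = rng.standard_normal((n, 2))               # shared directions
    A = np.hstack([common, rng.standard_normal((n, 2))])
    B = np.hstack([common, rng.standard_normal((n, 3))])

    AA, BB = A @ A.T, B @ B.T
    G = AA @ np.linalg.pinv(AA + BB) @ BB              # 18.24
    assert np.linalg.matrix_rank(G) == 2               # dim C(A) ∩ C(B)

    PA = A @ np.linalg.pinv(A)
    PB = B @ np.linalg.pinv(B)
    assert np.allclose(PA @ G, G) and np.allclose(PB @ G, G)   # C(G) in both spaces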

18.25 Disjointness: C .A/ \ C .B/ D f0g. The following statements are equivalent:
(a) C .A/ \ C .B/ D f0g,
 0  0 
A 0 0  0 0 A 0
(b) .AA C BB / .AA W BB / D ,
B0 0 B0
(c) A0 .AA0 C BB0 / AA0 D A0 ,
(d) A0 .AA0 C BB0 / B D 0,

(e) .AA0 C BB0 / is a generalized inverse of AA0 ,


(f) A0 .AA0 C BB0 / A D PA0 ,
   0
0 A
(g) C  C ,
B0 B0
(h) N .A W B/  N .0 W B/,
(i) Y.A W B/ D .0 W B/ has a solution for Y,
 
  PA0 0
(j) P A0 D ,
B0 0 P B0
(k) r.QB A/ D r.A/,
(l) C .A0 QB / D C .A0 /,
(m) PA0 QB A0 D A0 ,
(n) ch1 .PA PB / < 1,
(o) det.I  PA PB / 0.

18.26 Let us denote PAB D A.A0 QB A/ A0 QB , QB D I  PB . Then the following


statements are equivalent:
(a) PAB is invariant w.r.t. the choice of .A0 QB A/ ,
(b) C A.A0 QB A/ A0 QB  is invariant w.r.t. the choice of .A0 QB A/ ,
(c) r.QB A/ D r.A/,
(d) PAB A D A,
(e) PAB is the projector onto C .A/ along C .B/  C .A W B/? ,
(f) C .A/ \ C .B/ D f0g,
(g) P.AWB/ D PAB C PBA .

18.27 Let A 2 Rnp and B 2 Rnq . Then


r.PA PB QA / D r.PA PB / C r.PA W PB /  r.A/  r.B/
D r.PA PB / C r.QA PB /  r.B/
D r.PB PA / C r.QB PA /  r.A/
D r.PB PA QB /:

18.28 (a) C .APB / D C .AB/,


(b) r.AB/ D r.APB / D r.PB A0 / D r.B0 A0 / D r.PB PA0 / D r.PA0 PB /.

19 Inverse of a matrix

19.1 (a) A; : submatrix of Ann , obtained by choosing the elements of A


which lie in rows and columns ; and are index sets of the rows
and the columns of A, respectively.
(b) A D A; : principal submatrix; same rows and columns chosen.
(c) ALi D i th leading principal submatrix of A: ALi D A1; : : : ; i .
(d) A.; /: submatrix of A, obtained by choosing the elements of A which
do not lie in rows and columns .
(e) A.i; j / D submatrix of A, obtained by deleting row i and column j .
(f) minor.aij / D det.A.i; j / D ij th minor of A corresponding to aij .
(g) cof.aij / D .1/iCj minor.aij / D ij th cofactor of A corresponding to
aij .
(h) det.A/ D principal minor.
(i) det.ALi / D leading principal minor of order i .

19.2 Determinant. The determinant of matrix Ann , denoted as jAj or det.A/, is


det.a/ D a when a 2 R; when n > 1, we have the Laplace expansion of the
determinant by minors along the i th row:
X
n
det.A/ D aij cof.aij /; i 2 f1; : : : ; ng:
j D1

19.3 An alternative equivalent denition of the determinant is the following:


X
det.A/ D .1/f .i1 ;:::;in / a1i1 a2i2    anin ;
where the summation is taken over all permutations fi1 ; : : : ; in g of the set of
integers f1; : : : ; ng, and the function f .i1 ; : : : ; in / equals the number of trans-
positions necessary to change fi1 ; : : : ; in g to f1; : : : ; ng. A transposition is the
interchange of two of the integers. The determinant produces all products of
n terms of the elements of A such that exactly one element is selected from
each row and each column of A; there are n of such products.

19.4 If Ann Bnn D In then B D A1 and A is said to be nonsingular. Ann is


nonsingular i det.A/ 0 i rank.A/ D n.
   
19.5 If A = [a, b; c, d], then A⁻¹ = (1/(ad − bc)) [d, −b; −c, a], det(A) = ad − bc.
If A D faij g, then

1 1
A1 D faij g D cof.A/0 D adj.A/;
det.A/ det.A/
where cof.A/ D fcof.aij /g. The matrix adj.A/ D cof.A/0 is the adjoint
matrix of A.
 
A11 A12
19.6 Let a nonsingular matrix A be partitioned as A D , where A11
A21 A22
is a square matrix. Then
   
I 0 A11 0 I A1 A12
(a) A D 11
A21 A1
11 I 0 A22  A21 A1 11 A12 0 I
 1 
A11 C A1 1 1 1
11 A12 A221 A21 A11 A11 A12 A221
1
(b) A1 D 1 1 1
A221 A21 A11 A221
 
A1 A1 1
112 A12 A22
D 112
A1 1 1 1 1
22 A21 A112 A22 C A22 A21 A112 A12 A22
1

 1   1 
A11 0 A11 A12
D C A1 1
221 .A21 A11 W I/; where
0 0 I
(c) A112 D A11  A12 A1
22 A21 D the Schur complement of A22 in A;
A221 D A22  A21 A1
11 A12 WD A=A11 :

(d) For possibly singular A the generalized Schur complements are


A112 D A11  A12 A
22 A21 ; A221 D A22  A21 A
11 A12 :

(e) r.A/ D r.A11 / C r.A221 / if C .A12 /  C .A11 /; C .A021 /  C .A011 /


D r.A22 / C r.A112 / if C .A21 /  C .A22 /; C .A012 /  C .A022 /
(f) jAj D jA11 jjA22  A21 A
11 A12 j if C .A12 /  C .A11 /,
C .A021 /  C .A011 /
D jA22 jjA11  A12 A
22 A21 j if C .A21 /  C .A22 /,
C .A012 /  C .A022 /
(g) Let Aij 2 Rnn and A11 A21 D A21 A11 . Then jAj D jA11 A22 
A21 A12 j.

19.7 WedderburnGuttman theorem. Consider A 2 Rnp , x 2 Rn , y 2 Rp and


suppose that WD x0 Ay 0. Then in view of the rank additivity on the Schur
complement, see 19.6e, we have
 
A Ay
r 0 D r.A/ D r.x0 Ay/ C r.A  1 Ayx0 A/;
x A x0 Ay
and hence rank.A  1 Ayx0 A/ D rank.A/  1.
 1  
I A I A Enn Fnm
D
19.8
0 I 0 I
; 0 Gmm D jEjjGj

19.9 Let A be nnd. Then there exists L such that A D L0 L, where


 0  0   
0 L1 L1 L1 L01 L2 A11 A12
ADLLD .L1 W L2 / D D :
L02 L02 L1 L02 L2 A21 A22
If A is positive denite, then
 
.L01 Q2 L1 /1 .L01 Q2 L1 /1 L01 L2 .L02 L2 /1
A1 D
.L02 L2 /1 L02 L1 .L01 Q2 L1 /1 .L02 Q1 L2 /1
 11 12 
A A
D ; Qj D I  PLj ; L0i Qj Li D Ai ij ;
A21 A22
where A11 and A12 (and correspondingly A22 and A21 ) can be expressed also
as
A11 D .L01 L1 /1 C .L01 L1 /1 L01 L2 .L02 Q1 L2 /1 L02 L1 .L01 L1 /1 ;
A12 D .L01 L1 /1 L01 L2 .L02 Q1 L2 /1 :
 1  
0 1 n 10 X0 1=n C xN 0 T1 xN Nx0 T1
19.10 .X X/ D D xx xx
X00 1 X00 X0 T1
xx x N T1xx
0 1
t 00 t 01 : : : t 0k
Bt 10 t 11 : : : t 1k C
B C
D B :: :: : : :: C
@ : : : : A
t k0 t k1
::: t kk

 0 1 !
0 1 X X X0 y G O SSE
1
19.11 .X W y/ .X W y/ D D ; where
y0 X y0 y O 0 SSE
1 1
SSE

19.12 G D X0 .I  Py /X1 D .X0 X/1 C O O 0 =SSE

19.13 jX0 Xj D njX00 .I  J/X0 j D njTxx j D jX00 X0 j10 .I  PX0 /1


D jX00 X0 j  k.I  PX0 /1k2

19.14 j.X W y/0 .X W y/j D jX0 Xj  SSE

19.15 r.X/ D 1 C r.X0 /  dim C .1/ \ C .X0 /


D 1 C r.I  J/X0  D 1 C r.Txx / D 1 C r.Rxx / D 1 C r.Sxx /

19.16 jRxx j 0 () r.X/ D k C 1 () jX0 Xj 0 () jTxx j 0


() r.X0 / D k & 1 C .X0 /

19.17 rij D 1 for some i j implies that jRxx j D 0 (but not vice versa)

19.18 (a) The columns of X D .1 W X0 / are orthogonal i



(b) the columns of X0 are centered and cord .X0 / D Rxx D Ik .

19.19 The statements (a) the columns of X0 are orthogonal, (b) cord .X0 / D Ik (i.e.,
orthogonality and uncorrelatedness) are equivalent if X0 is centered.

19.20 It is possible that


(a) cos.x; y/ is high but cord .x; y/ D 0,
(b) cos.x; y/ D 0 but cord .x; y/ D 1.

19.21 cord .x; y/ D 0 () y 2 C .Cx/? D C .1 W x/?  C .1/


[exclude the cases when x 2 C .1/ and/or y 2 C .1/]
   
Txx txy S s
19.22 T D ; S D 0xx xy ;
t0xy tyy sxy syy
0 1
  R11 r12
R rxy B rxy C
R D 0xx D @ r012 1 A
rxy 1
0
rxy 1

19.23 jRj D jRxx j.1  r0xy R1 2


xx rxy / D jRxx j.1  Ryx /
1

19.24 jRxx j D jR11 j.1  r012 R1 2


11 r12 / D jR11 j.1  Rk1:::k1 /

2 2 2
19.25 jRj D jR11 j.1  Rk1:::k1 /.1  Ryx /; Ryx D R2 .yI X/

2 2 2 2 2
19.26 jRj D .1  r12 /.1  R312 /.1  R4123 /    .1  Rk123:::k1 /.1  Ryx /

2 jRj 2 2 2 2
19.27 1  Ryx D D .1  ry1 /.1  ry21 /.1  ry312 /    .1  ryk12:::k1 /
jRxx j
   
Rxx rxy R11 r12
19.28 R1 D D f r ij g; R1
xx D D fr ij g
.rxy /0 r yy N .r12 /0 r kk
   
1 Txx txy T11 t12
19.29 T D ij
D f t g; T1 D D ft ij g
.txy /0 t yy N
xx .t12 /0 t kk

19.30 Rxx D .Rxx  rxy r0xy /1 D R1 1 0 1 2


xx C R xx rxy rxy R xx =.1  R /

19.31 t yy D 1=SSE D the last diagonal element of T1

19.32 t i i D 1=SSE.xi I X.i/ / D the i th diagonal element of T1


xx

1 1
19.33 r yy D D D the last diagonal element of R1
1 r0xy R1
xx rxy 1  Ryx
2

1
19.34 r i i D
1  Ri2
D VIFi D the i th diagonal element of R1 2 2
xx ; Ri D R .xi I X.i/ /

19.35 Assuming that appropriate inverses exist:


(a) .B  CD1 C0 /1 D B1 C B1 CS1 C0 B1 , where S D D  C0 B1 C.
B1 uv0 B1
(b) .B C uv0 /1 D B1  .
1 C v0 B1 u
1
(c) .B C kii ij0 /1 D B1  B1 ii ij0 B1
1 C kij0 B1 ii
1
D B1  bi .bj /0 :
1 C kb .j i/
(d) X0 .I  ii i0i /X1 D .X0.i/ X.i/ /1
1
D .X0 X/1 C .X0 X/1 x.i/ x0.i/ .X0 X/1 :
1  hi i
(e) .A C kI/1 D A1  kA1 .I C kA1 /1 A1 .
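
A minimal numerical sketch of 19.35(b), the rank-one update formula (not from the book), assuming Python with NumPy and an arbitrary well-conditioned invertible B: the closed-form inverse of B + uv' is checked against direct inversion.

    import numpy as np

    rng = np.random.default_rng(8)
    p = 5
    B = rng.standard_normal((p, p)) + p * np.eye(p)   # invertible, well conditioned
    u = rng.standard_normal(p)
    v = rng.standard_normal(p)

    Binv = np.linalg.inv(B)
    # (B + uv')^{-1} = B^{-1} - B^{-1} u v' B^{-1} / (1 + v' B^{-1} u)
    lhs = np.linalg.inv(B + np.outer(u, v))
    rhs = Binv - np.outer(Binv @ u, v @ Binv) / (1.0 + v @ Binv @ u)
    assert np.allclose(lhs, rhs)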

20 Generalized inverses

In what follows, let A 2 Rnm .

20.1 (mp1) AGA = A,     (mp1) ⟺ G ∈ {A⁻}
     (mp2) GAG = G,     (mp1) & (mp2): G ∈ {A_r⁻}: reflexive g-inverse
     (mp3) (AG)' = AG,  (mp1) & (mp3): G ∈ {A_ℓ⁻}: least-squares g-inverse
     (mp4) (GA)' = GA,  (mp1) & (mp4): G ∈ {A_m⁻}: minimum norm g-inverse

All four conditions ⟺ G = A⁺: the Moore–Penrose inverse (unique)
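
A minimal numerical sketch of 20.1 (not from the book), assuming Python with NumPy: it verifies that numpy.linalg.pinv satisfies the four Penrose conditions for a rank-deficient matrix, and that a g-inverse built from the general representation in 20.9(a) with a random U still satisfies (mp1).

    import numpy as np

    rng = np.random.default_rng(9)
    A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 4))   # rank 2 < min(6, 4)
    G = np.linalg.pinv(A)                                           # Moore-Penrose inverse

    assert np.allclose(A @ G @ A, A)          # (mp1)
    assert np.allclose(G @ A @ G, G)          # (mp2)
    assert np.allclose((A @ G).T, A @ G)      # (mp3)
    assert np.allclose((G @ A).T, G @ A)      # (mp4)

    # another g-inverse, from 20.9(a) with a random U: (mp1) holds,
    # (mp3)-(mp4) need not
    U = rng.standard_normal(G.shape)
    G1 = G + U - G @ A @ U @ A @ G
    assert np.allclose(A @ G1 @ A, A)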

20.2 The matrix Gmn is a generalized inverse of Anm if any of the following
equivalent conditions holds:
(a) The vector Gy is a solution to Ab D y always when this equation is
consistent, i.e., always when y 2 C .A/.
(b1) GA is idempotent and r.GA/ D r.A/, or equivalently
(b2) AG is idempotent and r.AG/ D r.A/.

(c) AGA D A. (mp1)

20.3 AGA D A & r.G/ D r.A/ () G is a reexive generalized inverse of A

20.4 A general solution to a consistent (solvable) equation Ax D y is


A y C .Im  A A/z; where the vector z 2 Rm is free to vary,
and A is an arbitrary (but xed) generalized inverse.

20.5 The class of all solutions to a consistent equation Ax D y is fGyg, where G


varies through all generalized inverses of A.

20.6 The equation Ax D y is consistent i y 2 C .A/ i A0 u D 0 H) y0 u D 0.

20.7 The equation AX D Y has a solution (in X) i C .Y/  C .A/ in which case
the general solution is
A Y C .Im  A A/Z; where Z is free to vary.

20.8 A necessary and sufficient condition for the equation AXB = C to have a solution is that AA⁻CB⁻B = C, in which case the general solution is

    X = A⁻CB⁻ + Z − A⁻AZBB⁻, where Z is free to vary.

20.9 Two alternative representations of a general solution to g-inverse of A are


(a) G D A C U  A AUAA ,
(b) G D A C V.In  AA / C .Im  A A/W,
where A is a particular g-inverse and U, V, W are free to vary. In particular,
choosing A D AC , the general representations can be expressed as
(c) G D AC C U  PA0 UPA , (d) G D AC C V.In  PA / C .Im  PA0 /W.

20.10 Let A 0, C 0. Then AB C is invariant w.r.t. the choice of B i C .C/ 


C .B/ & C .A0 /  C .B0 /.

20.11 AA C D C for some A i AA C D C for all A i AA C is invariant


w.r.t. the choice of A i C .C/  C .A/.

20.12 AB B D A () C .A0 /  C .B0 /

20.13 Let C 0. Then C .AB C/ is invariant w.r.t. the choice of B i C .A0 / 


C .B0 / holds along with C .C/  C .B/ or along with r.ABC L/ D r.A/,
where L is any matrix such that C .L/ D C .B/ \ C .C/.

20.14 rank.AB C/ is invariant w.r.t. the choice of B i at least one of the column
spaces C .AB C/ and C .C0 .B0 / A0 / is invariant w.r.t. the choice of B .

20.15 r.A/ D r.AA A/


r.AA /
r.A /; r.AC / D r.A/

20.16 C .AA / D C .A/ but C .A A/ D C .A0 / () A 2 fA


14 g

20.17 C .Im  A A/ D N .A/; N .In  AA / D C .A/

20.18 C .AC / D C .A0 /

20.19 AC D .A0 A/C A0 D A0 .AA0 /C

20.20 .AC /0 D .A0 /C ; .A /0 2 f.A0 / g

20.21 If A has a full rank decomposition A D UV0 then


AC D V.V0 V/1 .U0 U/1 U0 :

20.22 .AB/C D BC AC () C .BB0 A0 /  C .A0 / & C .A0 AB/  C .B/


 
1 0
20.23 Let Anm have a singular value decomposition A D U V0 . Then:
 1  0 0
1 K
G 2 fA g () G D V U0 ;
L N
 1 
1 K
G 2 fA12 g () G D V U0 ;
L L1 K
 1 
1 0
G 2 fA13 g D fA 
` g () G D V U0 ;
L N
 1 
1 K
G 2 fA14 g D fA 
m g () G D V U0 ;
0 N
 1 
1 0
G D AC () G D V U0 ;
0 0
where K, L, and N are arbitrary matrices.

20.24 Let A D TT0 D T1 1 T01 be the EVD of A with 1 comprising the nonzero
eigenvalues. Then G is a symmetric and nnd reexive g-inverse of A i
 1 
1 L
GDT T0 ; where L is an arbitrary matrix.
L0 L0 1 L
 
B C
20.25 If Anm D and r.A/ D r D r.Brr /, then
D E
 1 
B 0
Gmn D 2 fA g:
0 0

20.26 In (a)(c) below we consider a nonnegative denite A partitioned as


 0   
L1 L1 L01 L2 A11 A12
A D L0 L D D :
L02 L1 L02 L2 A21 A22
(a) Block diagonalization of a nonnegative denite matrix:
   
I .L01 L1 / L01 L2 I A A12
(i) .L1 W L2 / DL 11 WD LU
0 I 0 I
D L1 W .I  PL1 /L2 ;
 0 
L1 L1 0
(ii) U0 L0 LU D U0 AU D ;
0 L02 .I  PL1 /L2
     
I 0 I A 11 A12 D A11 0
(iii) A ;
A21 AD11 I 0 I 0 A22  A21 A 11 A12
   
I 0 A11 0 I A A12
(iv) A D 11 ;
A21 AD
11 I 0 A22  A21 A 11 A12 0 I
where AD  
11 , A11 , and A11 are arbitrary generalized inverses of A11 , and

A221 D A22  A21 A11 A12 D the Schur complement of A11 in A.
(b) The matrix A# is one generalized inverse of A:
 
A A A12 A
A D# 112 112 22 ;
A     
22 A21 A112 A22 C A22 A21 A112 A12 A22

where A112 D L01 Q2 L1 , with B denoting a g-inverse of B and Q2 D


I  PL2 . In particular, the matrix A# is a symmetric reexive g-inverse
of A for any choices of symmetric reexive g-inverses .L02 L2 / and
.L01 Q2 L1 / . We say that A# is in BanachiewiczSchur form.
(c) If any of the following conditions hold, the all three hold:
(i) r.A/ D r.A11 / C r.A22 /, i.e., C .L1 / \ C .L2 / D f0g,
 
C AC AC A12 AC
(ii) A D 112 112 22 ;
AC C C C C
22 A21 A112 A22 C A22 A21 A112 A12 A22
C

 C 
C A11 C AC A12 AC A21 AC AC A12 AC
(iii) A D 11 221 11 11 221 :
AC C
221 A21 A11 AC
221

20.27 Consider Anm and Bnm and let the pd inner product matrix be V. Then
kAxkV
kAx C BykV 8 x; y 2 Rm () A0 VB D 0:
The statement above holds also for nnd V, i.e., t0 Vu is a semi-inner product.

20.28 Let G be a g-inverse of A such that Gy is a minimum norm solution (w.r.t. the standard norm) of Ax = y for any y ∈ C(A). Then it is necessary and sufficient that AGA = A and (GA)' = GA, i.e., G ∈ {A₁₄⁻}. Such a G is called a minimum norm g-inverse and denoted A_m⁻.

20.29 Let A 2 Rnm and let the inner product matrix be N 2 PDn and denote a
minimum norm g-inverse as Am.N/
2 Rmn . Then the following statements
are equivalent:
(a) G D A
m.N/
,
(b) AGA D A, .GA/0 N D NGA (here N can be nnd),
(c) GAN1 A0 D N1 A0 ,
(d) GA D PN1 A0 IN , i.e., .GA/0 D PA0 IN1 .

20.30 (a) .N C A0 A/ A0 A0 .N C A0 A/ A 2 fA


m.N/
g,
(b) C .A0 /  C .N/ H) N A0 .A0 N A/ 2 fA
m.N/
g.

20.31 Let G be a matrix (not necessarily a g-inverse) such that Gy is a least-squares solution (w.r.t. the standard norm) of Ax = y for any y, that is,

    ‖y − AGy‖ ≤ ‖y − Ax‖ for all x ∈ R^m, y ∈ R^n.

Then it is necessary and sufficient that AGA = A and (AG)' = AG, that is, G ∈ {A₁₃⁻}. Such a G is called a least-squares g-inverse and denoted A_ℓ⁻.

20.32 Let the inner product matrix be V 2 PDn and denote a least-squares g-inverse
as A`.V/
. Then the following statements are equivalent:
(a) G D A
`.V/
,
(b) AGA D A, .AG/0 V D VAG,
(c) A0 VAG D A0 V (here V can be nnd),
(d) AG D PAIV .

20.33 .A0 VA/ A0 V 2 fA


`.V/ g:


  0

20.34 Let the inner product matrix Vnn be pd. Then .A0 /
m.V/
D A`.V1 / .

20.35 The minimum norm solution for X0 a D k is aQ D .X0 /


m.V/
k WD Gk, where

.X0 /  0  
m.V/ WD G D W X.X W X/ I W D V C XX0 :
Furthermore, BLUE.k0 / D aQ 0 y D k0 G0 y D k0 .X0 W X/ X0 W y.

21 Projectors

21.1 Orthogonal projector. Let A ∈ R^{n×m} of rank r and let the inner product and the corresponding norm be defined as ⟨t, u⟩ = t'u and ‖t‖ = √⟨t, t⟩, respectively. Further, let A^⊥ ∈ R^{n×q} be a matrix spanning C(A)^⊥ = N(A'), and let the columns of the matrices A_b ∈ R^{n×r} and A_b^⊥ ∈ R^{n×(n−r)} form bases for C(A) and C(A)^⊥, respectively. Then the following conditions are equivalent ways to define the unique matrix P:
(a) The matrix P transforms every y 2 Rn ,
y D yA C yA? ; yA 2 C .A/; yA? 2 C .A/? ;
into its projection onto C .A/ along C .A/? ; that is, for each y above, the
multiplication Py gives the projection yA : Py D yA .
(b) P.Ab C A? c/ D Ab for all b 2 Rm , c 2 Rq .
(c) P.A W A? / D .A W 0/.
(d) P.Ab W A?
b
/ D .Ab W 0/.
(e) C .P/  C .A/; minky  Abk2 D ky  Pyk2 for all y 2 Rn .
b

(f) C .P/  C .A/; P0 A D A.


(g) C .P/ D C .A/; P0 P D P.
(h) C .P/ D C .A/; P2 D P; P0 D P.
(i) C .P/ D C .A/; Rn D C .P/  C .In  P/.
(j) P D A.A0 A/ A0 D AAC .
(k) P D Ab .A0b Ab /1 A0b .
 
(l) P D Ao A0o D U I0r 00 U0 , where U D .Ao W A? o /, the columns of Ao and
A?
o forming orthonormal bases for C .A/ and C .A/? , respectively.
The matrix P D PA is the orthogonal projector onto the column space C .A/
w.r.t. the inner product ht; ui D t0 u. Correspondingly, the matrix In  PA is
the orthogonal projector onto C .A? /: In  PA D PA? .
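
A minimal numerical sketch of 21.1(j) (not from the book), assuming Python with NumPy and an arbitrary matrix A: the projector P_A = AA⁺ is formed and checked to be symmetric and idempotent, and P_A y is checked to be the least-squares fit of y in C(A).

    import numpy as np

    rng = np.random.default_rng(10)
    n, m = 7, 3
    A = rng.standard_normal((n, m))
    P = A @ np.linalg.pinv(A)               # 21.1(j): P_A = A A^+

    assert np.allclose(P, P.T) and np.allclose(P @ P, P)   # symmetric idempotent
    y = rng.standard_normal(n)
    assert np.allclose(A.T @ (y - P @ y), 0)               # residual orthogonal to C(A)
    b = np.linalg.lstsq(A, y, rcond=None)[0]
    assert np.allclose(P @ y, A @ b)                       # Py = least-squares fit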

21.2 In 21.3–21.10 we consider orthogonal projectors defined w.r.t. the standard inner product in R^n, so that they are symmetric idempotent matrices. If P is idempotent but not necessarily symmetric, it is called an oblique projector or simply a projector.

21.3 PA C PB is an orthogonal projector () A0 B D 0, in which case PA C PB


is the orthogonal projector onto C .A W B/.

21.4 P.AWB/ D PA C P.IPA /B

21.5 The following statements are equivalent:


(a) PA  PB is an orthogonal projector, (b) PA PB D PB PA D PB ,
(c) kPA xk  kPB xk for all x 2 R , n
(d) PA  PB L 0,
(e) C .B/  C .A/.
If any of the above conditions hold, then PA PB D P.IPB /A D PC .A/\C .B/? .

21.6 Let L be a matrix with property C .L/ D C .A/ \ C .B/. Then


(a) C .A/ D C L W .I  PL /A D C .A/ \ C .B/  C .I  PL /A,
(b) PA D PL C P.IPL /A D PL C PC .A/\C .L/? ,
(c) PA PB D PL C P.IPL /A P.IPL /B ,
(d) .I  PL /PA D PA .I  PL / D PA  PL D PC .A/\C .L/? ,
(e) C .I  PL /A D C .I  PL /PA  D C .A/ \ C .L/? ,
(f) r.A/ D dim C .A/ \ C .B/ C dim C .A/ \ C .L/? .

21.7 Commuting projectors. Denote C .L/ D C .A/ \ C .B/. Then the following
statements are equivalent;
(a) PA PB D PB PA ,
(b) PA PB D PC .A/\C .B/ D PL ,
(c) P.AWB/ D PA C PB  PA PB ,
(d) C .A W B/ \ C .B/? D C .A/ \ C .B/? ,
(e) C .A/ D C .A/ \ C .B/  C .A/ \ C .B/? ,
(f) r.A/ D dim C .A/ \ C .B/ C dim C .A/ \ C .B/? ,
(g) r.A0 B/ D dim C .A/ \ C .B/,
(h) P.IPA /B D PB  PA PB ,
(i) C .PA B/ D C .A/ \ C .B/,
(j) C .PA B/  C .B/,
(k) P.IPL /A P.IPL /B D 0.

21.8 Let PA and PB be orthogonal projectors of order n  n. Then


(a) 1
chi .PA  PB /
1; i D 1; : : : ; n,
(b) 0
chi .PA PB /
1; i D 1; : : : ; n,

(c) trace.PA PB /
r.PA PB /,
(d) #fchi .PA PB / D 1g D dim C .A/ \ C .B/, where #fchi .Z/ D 1g D the
number of unit eigenvalues of Z.

21.9 The following statements are equivalent:


ch1 .PA PB / < 1; C .A/ \ C .B/ D f0g; det.I  PA PB / 0:

21.10 Let A 2 Rna , B 2 Rnb , and T 2 Rnt , and assume that C .T/  C .A/.
Moreover, let U be any matrix satisfying C .U/ D C .A/ \ C .B/. Then
C .QT U/ D C .QT A/ \ C .QT B/; where QT D In  PT :

21.11 As a generalization to y0 .In  PX /y


.y  Xb/0 .y  Xb/ for all b, we have,
for given Ynq and Xnp , the Lwner ordering
Y0 .In  PX /Y
L .Y  XB/0 .Y  XB/ for all Bpq :
 
21.12 Let P D PP11 P12
21 P22
be an orthogonal projector where P11 is a square matrix.
Then P221 D P22  P21 P 11 P12 is also an orthogonal projector.

21.13 The matrix P 2 Rnn


r is idempotent i any of the following conditions holds:
(a) P D AA for some A,
(b) In  P is idempotent,
(c) r.P/ C r.In  P/ D n,
(d) Rn D C .P/ C .In  P/,
(e) P has a full rank decomposition P D UV0 , where V0 U D Ir ,
 
I 0
(f) P D B r B1 , where B is a nonsingular matrix,
0 0
 
I C
(g) P D D r D0 , where D is an orthogonal matrix and C 2 Rr.nr/ ,
0 0
(h) r.P/ D tr.P/ and r.In  P/ D tr.In  P/.

21.14 If P2 D P then rank.P/ D tr.P/ and


ch.P/ D f0; : : : ; 0; 1; : : : ; 1g; #fch.P/ D 1g D rank.P/;
but ch.P/ D f0; : : : ; 0; 1; : : : ; 1g does not imply P2 D P (unless P0 D P).

21.15 Let P be symmetric. Then P2 D P () ch.P/ D f0; : : : ; 0; 1; : : : ; 1g; here


#fch.P/ D 1g D rank.P/ D tr.P/.

21.16 P2 D P H) P is the oblique projector onto C .P/ along N .P/, where the
direction space can be also written as N .P/ D C .I  P/.

21.17 AA is the oblique projector onto C .A/ along N .AA /, where the direction
space can be written as N .AA / D C .In  AA /. Correspondingly, A A
is the oblique projector onto C .A A/ along N .A A/ D N .A/.

21.18 AAC D PA ; AC A D PA0

21.19 Generalized projector P_{A|B}. By P_{A|B} we mean any matrix G, say, satisfying
    (a) G(A : B) = (A : 0),
where it is assumed that C(A) ∩ C(B) = {0}, which is a necessary and sufficient condition for the solvability of (a). The matrix G is a generalized projector onto C(A) along C(B), but it need not be unique and idempotent as is the case when C(A : B) = R^n. We denote the set of matrices G satisfying (a) as {P_{A|B}}, and the general expression for G is, for example,

    G = (A : 0)(A : B)⁻ + F(I − P_{(A:B)}), where F is free to vary.

21.20 Suppose that C .A/ \ C .B/ D f0n g D C .C/ \ C .D/. Then


fPCjD g  fPAjB g () C .A/  C .C/ and C .B/  C .D/:

21.21 Orthogonal projector w.r.t. the inner product matrix V. Let A 2 Rnm
r , and
let the inner product (and the corresponding norm) be dened as ht; uiV D
t0 Vu where V is pd. Further, let A? ?
V be an n  q matrix spanning C .A/V D
0 ? 1 ?
N .A V/ D C .VA/ D C .V A /. Then the following conditions are
equivalent ways to dene the unique matrix P :
(a) P .Ab C A?
V c/ D Ab for all b 2 Rm , c 2 Rq .
(b) P .A W A? 1 ?
V / D P .A W V A / D .A W 0/.

(c) C .P /  C .A/; minky  Abk2V D ky  P yk2V for all y 2 Rn .


b

(d) C .P /  C .A/; P0 VA D VA.


(e) P0 .VA W A? / D .VA W 0/.
(f) C .P / D C .A/; P0 V.In  P / D 0.
(g) C .P / D C .A/; P2 D P ; .VP /0 D VP .
(h) C .P / D C .A/; Rn D C .P /  C .In  P /; here  refers to the
orthogonality with respect to the given inner product.
(i) P D A.A0 VA/ A0 V, which is invariant for any choice of .A0 VA/ .
The matrix P D PAIV is the orthogonal projector onto the column space
C .A/ w.r.t. the inner product ht; uiV D t0 Vu. Correspondingly, the matrix

In  PAIV is the orthogonal projector onto C .A?


V /:

In  PAIV D PA? IV D V1 Z.Z0 V1 Z/ Z0 D P0ZIV1 D PV1 ZIV ;


V

where Z 2 fA g, i.e., PAIV D In  P0A? IV1 D In  PV1 A? IV .


?

21.22 Consider the linear model fy; X; Vg and let C .X/?V1


denote the set of vec-
tors which are orthogonal to every vector in C .X/ with respect to the inner
product matrix V1 . Then the following sets are identical:
(a) C .X/?
V1
, (b) C .VX? /, (c) N .X0 V1 /,
(d) C .V1 X/? , (e) N .PXIV1 /, (f) C .In  PXIV1 /.
Denote W D V C XUX0 , where C .W/ D C .X W V/. Then
C .VX? / D C .W X W In  W W/? ;
where W is an arbitrary (but xed) generalized inverse of W. The column
space C .VX? / can be expressed also as
C .VX? / D C .W /0 X W In  .W /0 W0 ? :
Moreover, let V be possibly singular and assume that C .X/  C .V/. Then
C .VX? / D C .V X W In  V V/?  C .V X/? ;
where the inclusion becomes equality i V is positive denite.

21.23 Let V be pd and let X be partitioned as X D .X1 W X2 /. Then


(a) PX1 IV1 C PX2 IV is an orthogonal projector i X01 V1 X2 D 0, in which
case PX1 IV1 C PX2 IV1 D P.X1 WX2 /IV1 ,
(b) P.X1 WX2 /IV1 D PX1 IV1 C P.IP /X2 IV1 ,
X1 IV1

D PVMP 1 X2 IV1 D VM P
P 1 X2 .X0 M 1 0 P
(c) P.IP
X1 IV1
/X2 IV1 2 1 X2 / X2 M1 ,

P 1 D V1  V1 X1 .X0 V1 X1 /1 X0 V1 D M1 .M1 VM1 / M1 .


(d) M 1 1

21.24 Consider a weakly singular partitioned linear model and denote PAIVC D
P 1 D M1 .M1 VM1 / M1 . Then
A.A0 VC A/ A0 VC and M
PXIVC D X.X0 VC X/ X0 VC D V1=2 PVC1=2 X VC1=2
C1=2
D V1=2 .PVC1=2 X1 C P.IP /VC1=2 X2 /V
VC1=2 X1

C1=2
D PX1 IVC C V1=2 P.IP /VC1=2 X2 V
VC1=2 X1

D PX1 IVC C V1=2 PV1=2 MP 1 X2 VC1=2


D PX1 IVC C VM P 1 X2 / X02 M
P 1 X2 .X02 M P 1 PV :

22 Eigenvalues

22.1 Definition: The scalar λ is an eigenvalue of A_{n×n} if

    At = λt for some nonzero vector t, i.e., (A − λI_n)t = 0,

in which case t is an eigenvector of A corresponding to λ, and (λ, t) is an eigenpair for A. If A is symmetric, then all eigenvalues are real. The eigenvalues are the n roots of the characteristic equation

    p_A(λ) = det(A − λI_n) = 0,

where p_A(λ) = det(A − λI_n) is the characteristic polynomial of A. We denote (when the eigenvalues are real)

    λ₁ ≥ ... ≥ λ_n,  ch_i(A) = λ_i,
    ch(A) = {λ₁, ..., λ_n} = spectrum of A,
    nzch(A) = {λ_i : λ_i ≠ 0}.

22.2 The characteristic equation can be written as

    p_A(λ) = (−λ)^n + S₁(−λ)^{n−1} + ... + S_{n−1}(−λ) + S_n = 0,
    p_A(λ) = (λ₁ − λ) ··· (λ_n − λ) = 0,

for appropriate real coefficients S₁, ..., S_n: S_i is the sum of all i × i principal minors, and hence S_n = p_A(0) = det(A) = λ₁···λ_n, S₁ = tr(A) = λ₁ + ··· + λ_n, and thereby always

    det(A) = λ₁···λ_n,  tr(A) = λ₁ + ··· + λ_n.

For n = 3, we have

    det(A − λI₃) = (−λ)³ + tr(A)(−λ)²
                 + (|a₁₁, a₁₂; a₂₁, a₂₂| + |a₁₁, a₁₃; a₃₁, a₃₃| + |a₂₂, a₂₃; a₃₂, a₃₃|)(−λ)
                 + det(A).
22.3 The characteristic polynomial of A = [a, b; b, c] is

    p_A(λ) = λ² − (a + c)λ + (ac − b²) = λ² − tr(A)λ + det(A).

The eigenvalues of A are ½[a + c ± √((a − c)² + 4b²)].
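
A minimal numerical sketch of 22.2–22.3 (not from the book), assuming Python with NumPy and arbitrary illustrative entries: the closed-form eigenvalues of a symmetric 2 × 2 matrix are checked against a numerical eigensolver, together with the trace and determinant identities.

    import numpy as np

    a, b, c = 2.0, 0.7, -1.0
    A = np.array([[a, b], [b, c]])

    disc = np.sqrt((a - c) ** 2 + 4 * b ** 2)
    closed_form = np.array([(a + c + disc) / 2, (a + c - disc) / 2])   # 22.3
    lam = np.linalg.eigvalsh(A)[::-1]                                  # descending
    assert np.allclose(lam, closed_form)

    # 22.2: det = product of eigenvalues, trace = sum of eigenvalues
    assert np.isclose(np.prod(lam), np.linalg.det(A))
    assert np.isclose(np.sum(lam), np.trace(A))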

22.4 The following statements are equivalent (for a nonnull t and pd A):
(a) t is an eigenvector of A, (b) cos.t; At/ D 1,
(c) t0 At  .t0 A1 t/1 D 0 with ktk D 1,
(d) cos.A1=2 t; A1=2 t/ D 1,

(e) cos.y; At/ D 0 for all y 2 C .t/? .

22.5 The spectral radius of A 2 Rnn is dened as .A/ D maxfjchi .A/jg; the
eigenvalue corresponding to .A/ is called the dominant eigenvalue.

EVD Eigenvalue decomposition. A symmetric A_{n×n} can be written as

    A = TΛT' = λ₁t₁t₁' + ··· + λ_n t_n t_n',

and thereby

    (At₁ : At₂ : ... : At_n) = (λ₁t₁ : λ₂t₂ : ... : λ_n t_n),  i.e., AT = TΛ,

where T_{n×n} is orthogonal, Λ = diag(λ₁, ..., λ_n), and λ₁ ≥ ··· ≥ λ_n are the ordered eigenvalues of A; ch_i(A) = λ_i. The columns t_i of T are the orthonormal eigenvectors of A.
Consider the distinct eigenvalues of A, λ{1} > ··· > λ{s}, and let T{i} be an n × m_i matrix consisting of the orthonormal eigenvectors corresponding to λ{i}; m_i is the multiplicity of λ{i}. Then

    A = TΛT' = λ{1}T{1}T{1}' + ··· + λ{s}T{s}T{s}'.

With this ordering, Λ is unique and T is unique up to postmultiplying by a blockdiagonal matrix U = blockdiag(U₁, ..., U_s), where U_i is an orthogonal m_i × m_i matrix. If all the eigenvalues are distinct, then U is a diagonal matrix with diagonal elements equal to ±1.

22.6 For a nonnegative denite n  n matrix A with rank r > 0 we have


  0 
1 0 T1
A D TT0 D .T1 W T0 / D T1 1 T01
0 0 T00
D 1 t1 t01 C    C r tr t0r ;
where 1      r > 0, 1 D diag.1 ; : : : ; r /, and T1 D .t1 W : : : W tr /,
T0 D .trC1 W : : : W tn /.

22.7 The nnd square root of the nnd matrix A = TΛT' is defined as

    A^{1/2} = TΛ^{1/2}T' = T₁Λ₁^{1/2}T₁';   (A^{1/2})⁺ = T₁Λ₁^{−1/2}T₁' := A^{+1/2}.
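
A minimal numerical sketch of 22.7 (not from the book), assuming Python with NumPy and a rank-deficient nnd matrix built as LL': the symmetric square root is formed from the eigenvalue decomposition and its Moore–Penrose inverse is checked against the form built from the nonzero eigenvalues only.

    import numpy as np

    rng = np.random.default_rng(11)
    L = rng.standard_normal((5, 3))
    A = L @ L.T                                   # nnd, rank 3

    lam, T = np.linalg.eigh(A)
    lam = np.clip(lam, 0, None)                   # guard tiny negative round-off
    A_half = T @ np.diag(np.sqrt(lam)) @ T.T      # A^{1/2} = T Lambda^{1/2} T'
    assert np.allclose(A_half @ A_half, A)

    pos = lam > 1e-10                             # nonzero eigenvalues (Lambda_1)
    A_phalf = T[:, pos] @ np.diag(lam[pos] ** -0.5) @ T[:, pos].T
    assert np.allclose(A_phalf, np.linalg.pinv(A_half))   # (A^{1/2})^+ = A^{+1/2}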

22.8 If App D .ab/Ip Cb1p 1p0 for some a; b 2 R, then A is called a completely
symmetric matrix. If it is a covariance matrix (i.e., nnd) then it is said to have
an intraclass correlation structure. Consider the matrices
App D .a  b/Ip C b1p 1p0 ; pp D .1  %/Ip C %1p 1p0 :
(a) det.A/ D .a  b/n1 a C .p  1/b,
(b) the eigenvalues of A are a C .p  1/b (with multiplicity 1), and a  b
(with multiplicity n  1),

(c) A is nonsingular i a b and a .p  1/b, in which case


 
1 1 b 0
A D Ip  1p 1p ;
ab a C .p  1/b
(
1 C .p  1/% with multiplicity 1,
(d) ch./ D
1% with multiplicity p  1,

(e) is nonnegative denite ()  p1


1

%
1,
(f) t1 D 1p D eigenvector w.r.t. 1 D 1 C .p  1/%, 0 2 R,
t2 ; : : : ; tp are orthonormal eigenvectors w.r.t. i D 1  %, i D 2; : : : ; p,
t2 ; : : : ; tp form an orthonormal basis for C .1p /? ,
(g) det./ D .1  %/p1 1 C .p  1/%,
(h) 1p D 1 C .p  1/%1p WD 1 1p ; if %  p1 1
, then 1p D 1
1 1p in
which case
p
1p0  1p D 2 0  2 0
1 1p 1p D 1 1p 1p D ;
1 C .p  1/%
 
1 % 1
(i) 1 D Ip  1p 1p0 , for % 1, %  .
1% 1 C .p  1/% p1
(j) Suppose that
   
x xx  xy 0
cov D D .1  %/IpC1 C %1pC1 1pC1 ;
y  0xy 1
where (necessarily)  p1
%
1. Then

p%2
2
%yx D  0xy 
xx  xy D :
1 C .p  1/%
   
x %1p 1p0
(k) Assume that cov D , where
y %1p 1p0
p%
D .1  %/Ip C %1p 1p0 . Then cor.1p0 x; 1p0 y/ D .
1 C .p  1/%

22.9 Consider a completely symmetric matrix App D .a  b/Ip C b1p 1p0 , Then
0 1
ab 0 0 ::: 0 b
B 0 a  b 0 ::: 0 b C
B : :: :: :: :: C
1
L AL D B :B : C WD F;
: : : : C
@ 0 0 0 ::: a  b b A
0 0 0 : : : 0 a C .p  1/b

where L carries out elementary column operations:


   
Ip1 0p1 1 Ip1 0p1
LD 0 ; L D 0 ;
1p1 1 1p1 1
and hence det.A/ D .a  b/p1 a C .p  1/b. Moreover, det.A  Ip / D 0
i det.Ip  F/ D 0.

22.10 The algebraic multiplicity of , denoted as alg mult A ./ is its multiplicity as
a root of det.A  In / D 0. The set f t 0 W At D t g is the set of all
eigenvectors associated with . The eigenspace of A corresponding to  is
f t W .A  In /t D 0 g D N .A  In /:
If alg multA ./ D 1,  is called a simple eigenvalue; otherwise it is a multiple
eigenvalue. The geometric multiplicity of  is the dimension of the eigenspace
of A corresponding to :
geo mult A ./ D dim N .A  In / D n  r.A  In /:
Moreover, geo mult A ./
alg multA ./ for each  2 ch.A/; here the equal-
ity holds e.g. when A is symmetric.

22.11 Similarity. Two n × n matrices A and B are said to be similar whenever there exists a nonsingular matrix F such that F⁻¹AF = B. The product F⁻¹AF is called a similarity transformation of A.

22.12 Diagonalizability. A matrix A_{n×n} is said to be diagonalizable whenever there exists a nonsingular matrix F_{n×n} such that F⁻¹AF = D for some diagonal matrix D_{n×n}, i.e., A is similar to a diagonal matrix. In particular, any symmetric A is diagonalizable.

22.13 The following statements concerning the matrix A_{n×n} are equivalent:
(a) A is diagonalizable,
(b) geo mult_A(λ) = alg mult_A(λ) for all λ ∈ ch(A),
(c) A has n linearly independent eigenvectors.

22.14 Let A ∈ ℝ^{n×m}, B ∈ ℝ^{m×n}. Then AB and BA have the same nonzero eigenvalues: nzch(AB) = nzch(BA). Moreover, det(Iₙ − AB) = det(Iₘ − BA).
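A numerical check of 22.14 (a Python/NumPy sketch; the random matrices are illustrative, not from the text):

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((4, 6))   # n x m
    B = rng.standard_normal((6, 4))   # m x n

    ev_AB = np.linalg.eigvals(A @ B)              # 4 eigenvalues
    ev_BA = np.linalg.eigvals(B @ A)              # 6 eigenvalues, two extra zeros
    nz = lambda v: np.sort_complex(v[np.abs(v) > 1e-10])
    assert np.allclose(nz(ev_AB), nz(ev_BA))
    assert np.allclose(np.linalg.det(np.eye(4) - A @ B),
                       np.linalg.det(np.eye(6) - B @ A))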

22.15 Let A, B ∈ NNDₙ. Then
(a) tr(AB) ≤ ch₁(A)·tr(B),
(b) A ≥_L B ⇒ chᵢ(A) ≥ chᵢ(B), i = 1, …, n; here we must have at least one strict inequality if A ≠ B.

22.16 Interlacing theorem. Let A_{n×n} be symmetric and let B be a principal submatrix of A of order (n − 1) × (n − 1). Then
ch_{i+1}(A) ≤ chᵢ(B) ≤ chᵢ(A),  i = 1, …, n − 1.

22.17 Let A ∈ ℝ^{n×a}_r and denote B = [0 A; A′ 0], sg(A) = {δ₁, …, δ_r}. Then the nonzero eigenvalues of B are δ₁, …, δ_r, −δ₁, …, −δ_r.
In 22.18–EY1 we consider A_{n×n} whose EVD is A = TΛT′ and denote T_(k) = (t₁ : … : t_k), T_[k] = (t_{n−k+1} : … : tₙ).

22.18 (a) ch₁(A) = λ₁ = max_{x≠0} x′Ax/x′x = max_{x′x=1} x′Ax,
(b) chₙ(A) = λₙ = min_{x≠0} x′Ax/x′x = min_{x′x=1} x′Ax,
(c) chₙ(A) = λₙ ≤ x′Ax/x′x ≤ λ₁ = ch₁(A); x′Ax/x′x = the Rayleigh quotient,
(d) ch₂(A) = λ₂ = max { x′Ax : x′x = 1, t₁′x = 0 },
(e) ch_{k+1}(A) = λ_{k+1} = max { x′Ax : x′x = 1, T_(k)′x = 0 },  k = 1, …, n − 1.

22.19 Let A_{n×n} be symmetric and let k be a given integer, k ≤ n. Then
max_{G′G=I_k} tr(G′AG) = max_{G′G=I_k} tr(P_G A) = λ₁ + ⋯ + λ_k,
where the upper bound is obtained when G = (t₁ : … : t_k) = T_(k).
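The extremal characterizations in 22.18–22.19 can be verified numerically; a Python/NumPy sketch (random symmetric A and random trial vectors, illustrative only):

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.standard_normal((6, 6)); A = A + A.T
    lam, T = np.linalg.eigh(A)
    lam, T = lam[::-1], T[:, ::-1]                 # descending order

    # Rayleigh quotients stay within [lam_n, lam_1] (22.18)
    X = rng.standard_normal((6, 1000))
    rq = np.einsum('ij,ij->j', X, A @ X) / np.einsum('ij,ij->j', X, X)
    assert lam[-1] - 1e-12 <= rq.min() and rq.max() <= lam[0] + 1e-12

    # tr(G'AG) with G'G = I_k is maximized by G = T_(k) (22.19)
    k = 3
    G = T[:, :k]
    assert np.allclose(np.trace(G.T @ A @ G), lam[:k].sum())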

PST Poincaré separation theorem. Let A_{n×n} be a symmetric matrix, and let G_{n×k} be such that G′G = I_k, k ≤ n. Then, for i = 1, …, k,
ch_{n−k+i}(A) ≤ chᵢ(G′AG) ≤ chᵢ(A).
Equality holds on the right simultaneously for all i = 1, …, k when G = T_(k)K, and on the left if G = T_[k]L; K and L are arbitrary orthogonal matrices.

CFT Courant–Fischer theorem. Let A_{n×n} be symmetric, let k be a given integer with 2 ≤ k ≤ n, and let B ∈ ℝ^{n×(k−1)}. Then
(a) min_B max { x′Ax/x′x : B′x = 0, x ≠ 0 } = λ_k,
(b) max_B min { x′Ax/x′x : B′x = 0, x ≠ 0 } = λ_{n−k+1}.
The result (a) is obtained when B = T_(k−1) and x = t_k.

EY1 Eckart–Young theorem. Let A_{n×n} be a symmetric matrix of rank r. Then
min_{r(B)=k} ‖A − B‖²_F = min_{r(B)=k} tr[(A − B)(A − B)′] = λ²_{k+1} + ⋯ + λ²_r,
and the minimum is attained when
B = T_(k)Λ_(k)T_(k)′ = λ₁t₁t₁′ + ⋯ + λ_k t_k t_k′.

22.20 Let A be a given n × m matrix of rank r, and let B ∈ ℝ^{n×m}_k, k < r. Then
(A − B)(A − B)′ ≥_L (A − AP_{B′})(A − AP_{B′})′ = A(I_m − P_{B′})A′,
and hence ‖A − B‖²_F ≥ ‖A − AP_{B′}‖²_F. Moreover, if B has the full rank decomposition B = FG′, where G′G = I_k, then P_{B′} = P_G and
(A − B)(A − B)′ ≥_L (A − AGG′)(A − AGG′)′ = A(I_m − GG′)A′.

22.21 Suppose V ∈ PDₙ and C denotes the centering matrix. Then the following statements are equivalent:
(a) CVC = c²C for some c ≠ 0,
(b) V = δ²I + a1′ + 1a′, where a is an arbitrary vector and δ is any scalar ensuring the positive definiteness of V.
The eigenvalues of V in (b) are δ² + 1′a ± √(n·a′a), each with multiplicity one, and δ² with multiplicity n − 2.

22.22 Eigenvalues of A w.r.t. pd B. Let A_{n×n} and B_{n×n} be symmetric, of which B is nonnegative definite. Let λ be a scalar and w a vector such that
(a) Aw = λBw, Bw ≠ 0.
Then we call λ a proper eigenvalue and w a proper eigenvector of A with respect to B, or shortly, (λ, w) is a proper eigenpair for (A, B). There may exist a vector w ≠ 0 such that Aw = Bw = 0, in which case (a) is satisfied with arbitrary λ. We call such a vector w an improper eigenvector of A with respect to B. The space of improper eigenvectors is C(A : B)⊥. Consider next the situation when B is positive definite (in which case the word proper can be dropped). Then (a) becomes
(b) Aw = λBw, w ≠ 0.
Premultiplying (b) by B⁻¹ yields the usual eigenvalue equation B⁻¹Aw = λw, w ≠ 0. We denote
(c) ch(A, B) = ch(B⁻¹A) = ch(AB⁻¹) = {λ₁, …, λₙ}.
The matrix B⁻¹A is not necessarily symmetric but in view of
(d) nzch(B⁻¹A) = nzch(B^{−1/2}·B^{−1/2}A) = nzch(B^{−1/2}AB^{−1/2}),
and the symmetry of B^{−1/2}AB^{−1/2}, the eigenvalues of B⁻¹A are all real.
Premultiplying (a) by B^{−1/2} yields B^{−1/2}Aw = λB^{1/2}w, i.e., B^{−1/2}AB^{−1/2}·B^{1/2}w = λB^{1/2}w, which shows the equivalence of the following statements:
(e) (λ, w) is an eigenpair for (A, B),
(f) (λ, B^{1/2}w) is an eigenpair for B^{−1/2}AB^{−1/2},
(g) (λ, w) is an eigenpair for B⁻¹A.

22.23 Rewriting (b) above as (A − λB)w = 0, we observe that nontrivial solutions w for (b) exist iff det(A − λB) = 0. The expression A − λB, with indeterminate λ, is called a matrix pencil or simply a pencil.

22.24 Let A_{n×n} be symmetric and B_{n×n} positive definite. Then
(a) max_{x≠0} x′Ax/x′Bx = max_{x≠0} (x′B^{1/2}·B^{−1/2}AB^{−1/2}·B^{1/2}x)/(x′B^{1/2}·B^{1/2}x)
  = max_{z≠0} z′B^{−1/2}AB^{−1/2}z/z′z
  = ch₁(B^{−1/2}AB^{−1/2}) = ch₁(B⁻¹A)
  := λ₁ = the largest root of det(A − λB) = 0.
(b) Denote Wᵢ = (w₁ : … : wᵢ). The vectors w₁, …, wₙ satisfy
max_{x≠0} x′Ax/x′Bx = w₁′Aw₁/w₁′Bw₁ = ch₁(B⁻¹A) = λ₁,
max { x′Ax/x′Bx : W_{i−1}′Bx = 0, x ≠ 0 } = wᵢ′Awᵢ/wᵢ′Bwᵢ = chᵢ(B⁻¹A) = λᵢ,  i > 1;
wᵢ is an eigenvector of B⁻¹A corresponding to the eigenvalue chᵢ(B⁻¹A) = λᵢ, i.e., λᵢ is the ith largest root of det(A − λB) = 0.
(c) max_{x≠0} (a′x)²/x′Bx = max_{x≠0} x′aa′x/x′Bx = ch₁(aa′B⁻¹) = a′B⁻¹a.

22.25 Let B_{n×n} be nonnegative definite and a ∈ C(B). Then
max_{Bx≠0} (a′x)²/x′Bx = a′B⁻a,
where the equality is obtained iff Bx = αa for some α ∈ ℝ.

22.26 Let V_{p×p} be nnd with V_d = diag(V) being pd. Then
max_{a≠0} a′Va/a′V_d a = ch₁(V_d^{−1/2}VV_d^{−1/2}) = ch₁(R_V),
where R_V = V_d^{−1/2}VV_d^{−1/2} can be considered as a correlation matrix. Moreover,
a′Va/a′V_d a ≤ p for all a ∈ ℝ^p,
where the equality is obtained iff V = σ²qq′ for some σ ∈ ℝ and some q = (q₁, …, q_p)′, and a is a multiple of a* = V_d^{−1/2}1 = σ⁻¹(1/q₁, …, 1/q_p)′.

22.27 Simultaneous diagonalization. Let A_{n×n} and B_{n×n} be symmetric. Then there exists an orthogonal matrix Q such that Q′AQ and Q′BQ are both diagonal iff AB = BA.

22.28 Consider the symmetric matrices A_{n×n} and B_{n×n}.
(a) If B is pd, then there exists a nonsingular matrix Q_{n×n} such that
Q′AQ = Λ = diag(λ₁, …, λₙ),  Q′BQ = Iₙ,
where ch(B⁻¹A) = {λ₁, …, λₙ}. The columns of Q are the eigenvectors of B⁻¹A; Q is not necessarily orthogonal.
(b) Let B be nnd with r(B) = b. Then there exists a matrix L_{n×b} such that
L′AL = diag(λ₁, …, λ_b),  L′BL = I_b.
(c) Let B be nnd with r(B) = b, and assume that r(N′AN) = r(N′A), where N = B⊥. Then there exists a nonsingular matrix Q_{n×n} such that
Q′AQ = [Λ₁ 0; 0 Λ₂],  Q′BQ = [I_b 0; 0 0],
where Λ₁ ∈ ℝ^{b×b} and Λ₂ ∈ ℝ^{(n−b)×(n−b)} are diagonal matrices.

22.29 As in 22.22 (p. 101), consider the eigenvalues of A w.r.t. nnd B, but allow now B to be singular. Let λ be a scalar and w a vector such that
(a) Aw = λBw, i.e., (A − λB)w = 0, Bw ≠ 0.
Scalar λ is a proper eigenvalue and w a proper eigenvector of A with respect to B. The nontrivial solutions w for (a) above exist iff det(A − λB) = 0. The matrix pencil A − λB, with indeterminate λ, is said to be singular if det(A − λB) = 0 is satisfied for every λ; otherwise the pencil is regular.

22.30 If λᵢ is the ith largest proper eigenvalue of A with respect to B, then we write
chᵢ(A, B) = λᵢ,  λ₁ ≥ λ₂ ≥ ⋯ ≥ λ_b,  b = r(B),
ch(A, B) = {λ₁, …, λ_b} = set of proper eigenvalues of A w.r.t. B,
nzch(A, B) = set of nonzero proper eigenvalues of A w.r.t. B.

22.31 Let A_{n×n} be symmetric, B_{n×n} nnd with r(B) = b, and assume that r(N′AN) = r(N′A), where N = B⊥. Then there are precisely b proper eigenvalues of A with respect to B, ch(A, B) = {λ₁, …, λ_b}, some of which may be repeated or null. Also w₁, …, w_b, the corresponding eigenvectors, can be so chosen that if wᵢ is the ith column of W_{n×b}, then
W′AW = Λ₁ = diag(λ₁, …, λ_b),  W′BW = I_b,  W′AN = 0.

22.32 Suppose that r(Q_B AQ_B) = r(Q_B A) holds; here Q_B = Iₙ − P_B. Then
(a) nzch(A, B) = nzch([A − AQ_B(Q_B AQ_B)⁻Q_B A]B⁻),
(b) C(A) ⊂ C(B) ⇒ nzch(A, B) = nzch(AB⁻),
where the set nzch(AB⁻) is invariant with respect to the choice of B⁻.

22.33 Consider the linear model {y, Xβ, V}. The nonzero proper eigenvalues of V with respect to H are the same as the nonzero eigenvalues of the covariance matrix of BLUE(Xβ).

22.34 If A_{n×n} is symmetric and B_{n×n} is nnd satisfying C(A) ⊂ C(B), then
max_{Bx≠0} x′Ax/x′Bx = w₁′Aw₁/w₁′Bw₁ = λ₁ = ch₁(B⁺A) = ch₁(A, B),
where λ₁ is the largest proper eigenvalue of A with respect to B and w₁ the corresponding proper eigenvector satisfying Aw₁ = λ₁Bw₁, Bw₁ ≠ 0. If C(A) ⊂ C(B) does not hold, then x′Ax/x′Bx has no upper bound.

22.35 Under a weakly singular linear model where r(X) = r we have
max_{Vx≠0} x′Hx/x′Vx = λ₁ = ch₁(V⁺H) = ch₁(HV⁺H) = 1/ch_r[(HV⁺H)⁺],
and hence 1/λ₁ is the smallest nonzero eigenvalue of cov(Xβ̃).

22.36 Consider symmetric n × n matrices A and B. Then
∑_{i=1}^{n} chᵢ(A)·ch_{n−i+1}(B) ≤ tr(AB) ≤ ∑_{i=1}^{n} chᵢ(A)·chᵢ(B).
Suppose that A ∈ ℝ^{n×p}, B ∈ ℝ^{p×n}, and k = min(r(A), r(B)). Then
−∑_{i=1}^{k} sgᵢ(A)·sgᵢ(B) ≤ tr(AB) ≤ ∑_{i=1}^{k} sgᵢ(A)·sgᵢ(B).

22.37 Suppose that A, B and A − B are nnd n × n matrices and r(B) = k. Then
chᵢ(A − B) ≥ ch_{i+k}(A),  i = 1, …, n − k,
with equality for all i iff B = T_(k)Λ_(k)T_(k)′, where Λ_(k) comprises the first k eigenvalues of A and T_(k) = (t₁ : … : t_k).

22.38 For A, B ∈ ℝ^{n×m}, where r(A) = r and r(B) = k, we have
sgᵢ(A − B) ≥ sg_{i+k}(A),  i + k ≤ r,
and sgᵢ(A − B) ≥ 0 for i + k > r. The equality is attained above iff k ≤ r and B = ∑_{i=1}^{k} sgᵢ(A)tᵢuᵢ′, where A = ∑_{i=1}^{r} sgᵢ(A)tᵢuᵢ′ is the SVD of A.

23 Singular value decomposition & other matrix decompositions

SVD Singular value decomposition. A matrix A ∈ ℝ^{n×m}_r (m ≤ n) can be written as
A = (U₁ : U₀) [Δ₁ 0; 0 0] (V₁ : V₀)′ = UΔV′ = U₁Δ₁V₁′ = U*Δ*V′
  = δ₁u₁v₁′ + ⋯ + δ_r u_r v_r′,
where Δ₁ = diag(δ₁, …, δ_r), δ₁ ≥ ⋯ ≥ δ_r > 0, Δ ∈ ℝ^{n×m}, and
Δ = [Δ₁ 0; 0 0] = [Δ*; 0] ∈ ℝ^{n×m},  Δ₁ ∈ ℝ^{r×r},  Δ* ∈ ℝ^{m×m},
Δ* = diag(δ₁, …, δ_r, δ_{r+1}, …, δ_m) = the first m rows of Δ,
δ_{r+1} = δ_{r+2} = ⋯ = δ_m = 0,
δᵢ = sgᵢ(A) = √chᵢ(A′A) = the ith singular value of A, i = 1, …, m,
U_{n×n} = (U₁ : U₀), U₁ ∈ ℝ^{n×r}, U′U = UU′ = Iₙ,
V_{m×m} = (V₁ : V₀), V₁ ∈ ℝ^{m×r}, V′V = VV′ = Iₘ,
U* = (u₁ : … : uₘ) = the first m columns of U, U* ∈ ℝ^{n×m},
V′A′AV = Δ′Δ = Δ*² = [Δ₁² 0; 0 0] ∈ ℝ^{m×m},
U′AA′U = ΔΔ′ = Δ#² = [Δ₁² 0; 0 0] ∈ ℝ^{n×n},
uᵢ = the ith left singular vector of A, the ith eigenvector of AA′,
vᵢ = the ith right singular vector of A, the ith eigenvector of A′A.
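A numerical companion to the SVD block above (Python/NumPy sketch; a random 6 × 4 matrix stands in for A):

    import numpy as np

    rng = np.random.default_rng(4)
    A = rng.standard_normal((6, 4))                # n = 6 >= m = 4

    U, d, Vt = np.linalg.svd(A)                    # full SVD: U 6x6, d (4,), Vt 4x4
    Delta = np.zeros((6, 4)); Delta[:4, :4] = np.diag(d)
    assert np.allclose(A, U @ Delta @ Vt)

    # singular values are square roots of the eigenvalues of A'A
    assert np.allclose(d**2, np.sort(np.linalg.eigvalsh(A.T @ A))[::-1])
    # A v_i = delta_i u_i and A' u_i = delta_i v_i  (23.1c)
    V = Vt.T
    assert np.allclose(A @ V, U[:, :4] * d)
    assert np.allclose(A.T @ U[:, :4], V * d)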

23.1 With the above notation,
(a) {δ₁², …, δ_r²} = nzch(A′A) = nzch(AA′),
(b) C(U₁) = C(A), C(V₁) = C(A′),
(c) Avᵢ = δᵢuᵢ, A′uᵢ = δᵢvᵢ, i = 1, …, m,
(d) A′uᵢ = 0, i = m + 1, …, n,
(e) uᵢ′Avᵢ = δᵢ, i = 1, …, m; uᵢ′Avⱼ = 0 for i ≠ j.

23.2 Not all pairs of orthogonal U and V satisfying U′AA′U = Δ#² and V′A′AV = Δ*² yield A = UΔV′.

23.3 Let A have an SVD A = UΔV′ = δ₁u₁v₁′ + ⋯ + δ_r u_r v_r′, r(A) = r. Then
sg₁(A) = δ₁ = max x′Ay subject to x′x = y′y = 1,
and the maximum is obtained when x = u₁ and y = v₁,
sg₁²(A) = δ₁² = max_{x≠0, y≠0} (x′Ay)²/(x′x·y′y) = ch₁(A′A).
The second largest singular value δ₂ can be obtained as δ₂ = max x′Ay, where the maximum is taken over the set
{ x ∈ ℝⁿ, y ∈ ℝᵐ : x′x = y′y = 1, x′u₁ = y′v₁ = 0 }.
The ith largest singular value can be defined correspondingly:
max { x′Ay/√(x′x·y′y) : x ≠ 0, y ≠ 0, U_(k)′x = 0, V_(k)′y = 0 } = δ_{k+1},  k = 1, …, r − 1,
where U_(k) = (u₁ : … : u_k) and V_(k) = (v₁ : … : v_k); the maximum occurs when x = u_{k+1} and y = v_{k+1}.

23.4 Let A ∈ ℝ^{n×m}, B ∈ ℝ^{n×n}, C ∈ ℝ^{m×m}, where B and C are pd. Then
max_{x≠0, y≠0} (x′Ay)²/(x′Bx·y′Cy)
  = max_{x≠0, y≠0} (x′B^{1/2}·B^{−1/2}AC^{−1/2}·C^{1/2}y)²/(x′B^{1/2}B^{1/2}x · y′C^{1/2}C^{1/2}y)
  = max_{t≠0, u≠0} (t′B^{−1/2}AC^{−1/2}u)²/(t′t·u′u) = sg₁²(B^{−1/2}AC^{−1/2}).
With minor changes the above holds for possibly singular B and C.

23.5 The matrix 2-norm (or the spectral norm) is defined as
‖A‖₂ = max_{‖x‖₂=1} ‖Ax‖₂ = max_{x≠0} (x′A′Ax/x′x)^{1/2} = √ch₁(A′A) = sg₁(A),
where ‖x‖₂ refers to the standard Euclidean vector norm, and sgᵢ(A) = √chᵢ(A′A) = δᵢ = the ith largest singular value of A. Obviously we have ‖Ax‖₂ ≤ ‖A‖₂‖x‖₂. Recall that
‖A‖_F = √(δ₁² + ⋯ + δ_r²),  ‖A‖₂ = δ₁,  where r = rank(A).

EY2 Eckart–Young theorem. Let A_{n×m} be a given matrix of rank r, with the singular value decomposition A = U₁Δ₁V₁′ = δ₁u₁v₁′ + ⋯ + δ_r u_r v_r′. Let B be an n × m matrix of rank k (< r). Then
min_B ‖A − B‖²_F = δ²_{k+1} + ⋯ + δ²_r,
and the minimum is attained taking B = B̂_k = δ₁u₁v₁′ + ⋯ + δ_k u_k v_k′.

23.6 Let A_{n×m} and B_{m×n} have the SVDs A = UΔ_A V′, B = RΔ_B S′, where Δ_A = diag(δ₁, …, δ_r), Δ_B = diag(γ₁, …, γ_r), and r = min(n, m). Then
|tr(AXBY)| ≤ δ₁γ₁ + ⋯ + δ_rγ_r
for all orthogonal X_{m×m} and Y_{n×n}. The upper bound is attained when X = VR′ and Y = SU′.

23.7 Let A and B be n × m matrices with SVDs corresponding to 23.6. Then
(a) the minimum of ‖XA − BY‖_F, when X and Y run through orthogonal matrices, is attained at X = RU′ and Y = SV′,
(b) the minimum of ‖A − BZ‖_F, when Z varies over orthogonal matrices, is attained when Z = LK′, where LΘK′ is the SVD of B′A.

FRD Full rank decomposition of A_{n×m}, r(A) = r:
A = BC′, where r(B_{n×r}) = r(C_{m×r}) = r.

CHO Cholesky decomposition. Let A_{n×n} be nnd. Then there exists a lower triangular matrix U_{n×n} having all uᵢᵢ ≥ 0 such that A = UU′.

QRD QR-decomposition. A matrix A_{n×m} can be expressed as A = Q_{n×n}R_{n×m}, where Q is orthogonal and R is upper triangular.
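The Cholesky and QR decompositions are available directly in NumPy; a small sketch (the matrices are arbitrary examples; note that numpy.linalg.cholesky returns the lower-triangular factor, matching CHO):

    import numpy as np

    rng = np.random.default_rng(6)
    F = rng.standard_normal((5, 5))
    A = F @ F.T + np.eye(5)              # a pd (hence nnd) matrix

    Ul = np.linalg.cholesky(A)           # lower triangular, A = Ul Ul'
    assert np.allclose(A, Ul @ Ul.T)
    assert np.allclose(Ul, np.tril(Ul))

    X = rng.standard_normal((6, 4))
    Q, R = np.linalg.qr(X, mode='complete')   # Q 6x6 orthogonal, R 6x4 upper triangular
    assert np.allclose(X, Q @ R)
    assert np.allclose(Q.T @ Q, np.eye(6))
    assert np.allclose(R, np.triu(R))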

POL Polar decomposition. A matrix A_{n×m} (n ≤ m) can be expressed as A = P_{n×n}U_{n×m}, where P is nonnegative definite with r(P) = r(A) and UU′ = Iₙ.

SCH Schur's triangularization theorem. Let A_{n×n} have real eigenvalues λ₁, …, λₙ. Then there exists an orthogonal U_{n×n} such that U′AU = T, where T is an upper-triangular matrix with the λᵢ's as its diagonal elements.

ROT Orthogonal rotation. Denote
A_θ = [cos θ  −sin θ; sin θ  cos θ],  B_θ = [cos θ  sin θ; sin θ  −cos θ].
Then any 2 × 2 orthogonal matrix Q is A_θ or B_θ for some θ. The transformation A_θ u_(i) rotates the observation u_(i) by the angle θ in the counter-clockwise direction, and B_θ u_(i) makes the reflection of the observation u_(i) w.r.t. the line y = tan(θ/2)·x. The matrix
C_θ = [cos θ  sin θ; −sin θ  cos θ]
carries out an orthogonal rotation clockwise.

HSD Hartwig–Spindelböck decomposition. Let A ∈ ℝ^{n×n} be of rank r. Then there exists an orthogonal U ∈ ℝ^{n×n} such that
A = U [ΣK ΣL; 0 0] U′,  where Σ = diag(δ₁I_{r₁}, …, δ_t I_{r_t}),
with Σ being the diagonal matrix of the singular values of A, δ₁ > ⋯ > δ_t > 0, r₁ + ⋯ + r_t = r, while K ∈ ℝ^{r×r}, L ∈ ℝ^{r×(n−r)} satisfy KK′ + LL′ = I_r.

23.8 Consider the matrix A ∈ ℝ^{n×n} with representation HSD. Then:
(a) A′A = U [K′Σ²K  K′Σ²L; L′Σ²K  L′Σ²L] U′,  AA′ = U [Σ² 0; 0 0] U′,
A⁺A = U [K′K  K′L; L′K  L′L] U′,  AA⁺ = U [I_r 0; 0 0] U′,
(b) A is an oblique projector, i.e., A² = A, iff ΣK = I_r,
(c) A is an orthogonal projector iff L = 0, Σ = I_r, K = I_r.

23.9 A matrix E ∈ ℝ^{n×n} is a general permutation matrix if it is a product of elementary permutation matrices Eᵢⱼ; Eᵢⱼ is the identity matrix Iₙ with the ith and jth rows (or equivalently columns) interchanged.

23.10 (a) A_{n×n} is nonnegative if all aᵢⱼ ≥ 0,
(b) A_{n×n} is reducible if there is a permutation matrix Q such that
Q′AQ = [A₁₁ A₁₂; 0 A₂₂],
where A₁₁ and A₂₂ are square, and it is otherwise irreducible.

PFT Perron–Frobenius theorem. If A_{n×n} is nonnegative and irreducible, then
(a) A has a positive eigenvalue, ϱ, equal to the spectral radius of A, ρ(A) = max{|chᵢ(A)|}; the eigenvalue corresponding to ρ(A) is called the dominant eigenvalue,
(b) ϱ has multiplicity 1,
(c) there is a positive eigenvector (all elements > 0) corresponding to ϱ.

23.11 Stochastic matrix. If A ∈ ℝ^{n×n} is nonnegative and A1ₙ = 1ₙ, then A is a stochastic matrix. If also 1ₙ′A = 1ₙ′, then A is a doubly stochastic matrix. If, in addition, both diagonal sums are 1, then A is a superstochastic matrix.

23.12 Magic square. The matrix
A = [16  3  2 13;  5 10 11  8;  9  6  7 12;  4 15 14  1],
appearing in Albrecht Dürer's copper-plate engraving Melencolia I, is a magic square, i.e., a k × k array such that the numbers in every row, every column, and each of the two main diagonals add up to the same magic sum, 34 in this case. The matrix A here defines a classic magic square since the entries of A are the consecutive integers 1, 2, …, k². The Moore–Penrose inverse A⁺ is also a magic square (though not a classic one) and its magic sum is 1/34.
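The Moore–Penrose claim in 23.12 is easy to verify numerically; a Python/NumPy sketch (the helper magic_sums is ours, not from the text):

    import numpy as np

    A = np.array([[16,  3,  2, 13],
                  [ 5, 10, 11,  8],
                  [ 9,  6,  7, 12],
                  [ 4, 15, 14,  1]], dtype=float)

    def magic_sums(M):
        # row sums, column sums, and the two diagonal sums
        return np.concatenate([M.sum(axis=0), M.sum(axis=1),
                               [np.trace(M), np.trace(np.fliplr(M))]])

    assert np.allclose(magic_sums(A), 34)           # A is magic with sum 34
    Ap = np.linalg.pinv(A)
    assert np.allclose(magic_sums(Ap), 1 / 34)      # A+ is magic with sum 1/34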

24 Löwner ordering

24.1 Let A_{n×n} be symmetric.
(a) The following statements are equivalent:
(i) A is positive definite; shortly A ∈ PDₙ, A >_L 0,
(ii) x′Ax > 0 for all vectors x ≠ 0,
(iii) chᵢ(A) > 0 for i = 1, …, n,
(iv) A = FF′ for some F_{n×n}, r(F) = n,
(v) all leading principal minors > 0,
(vi) A⁻¹ is positive definite.
(b) The following statements are equivalent:
(i) A is nonnegative definite; shortly A ∈ NNDₙ, A ≥_L 0,
(ii) x′Ax ≥ 0 for all vectors x,
(iii) chᵢ(A) ≥ 0 for i = 1, …, n,
(iv) A = FF′ for some matrix F,
(v) all principal minors ≥ 0.
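The equivalences in 24.1(a) suggest several practical positive-definiteness tests; a Python/NumPy sketch comparing three of them (the helper is_pd is ours, not from the text):

    import numpy as np

    def is_pd(A, tol=1e-12):
        """Three equivalent checks from 24.1(a): eigenvalues, Cholesky, leading minors."""
        by_eig = np.all(np.linalg.eigvalsh(A) > tol)
        try:
            np.linalg.cholesky(A)          # succeeds iff A = FF' with F nonsingular
            by_chol = True
        except np.linalg.LinAlgError:
            by_chol = False
        by_minors = all(np.linalg.det(A[:k, :k]) > tol
                        for k in range(1, A.shape[0] + 1))
        return by_eig, by_chol, by_minors

    rng = np.random.default_rng(7)
    F = rng.standard_normal((4, 4))
    print(is_pd(F @ F.T + np.eye(4)))   # (True, True, True)
    print(is_pd(-np.eye(4)))            # (False, False, False)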

24.2 A ≤_L B ⇔ B − A ≥_L 0 ⇔ B − A = FF′ for some matrix F
(definition of the Löwner ordering).

24.3 Let A, B ∈ NNDₙ. Then
(a) A ≤_L B ⇔ C(A) ⊂ C(B) & ch₁(AB⁻) ≤ 1, in which case ch₁(AB⁻) is invariant with respect to the choice of B⁻,
(b) A ≤_L B ⇔ C(A) ⊂ C(B) & AB⁻A ≤_L A.

24.4 Let A ≥_L B, where A >_L 0 and B ≥_L 0. Then det(A) = det(B) ⇒ A = B.

24.5 Albert's theorem. Let A = [A₁₁ A₁₂; A₂₁ A₂₂] be a symmetric matrix where A₁₁ is a square matrix. Then the following three statements are equivalent:
(a) A ≥_L 0,
(b₁) A₁₁ ≥_L 0, (b₂) C(A₁₂) ⊂ C(A₁₁), (b₃) A₂₂ − A₂₁A₁₁⁻A₁₂ ≥_L 0,
(c₁) A₂₂ ≥_L 0, (c₂) C(A₂₁) ⊂ C(A₂₂), (c₃) A₁₁ − A₁₂A₂₂⁻A₂₁ ≥_L 0.

24.6 (Continued …) The following three statements are equivalent:
(a) A >_L 0,
(b₁) A₁₁ >_L 0, (b₂) A₂₂ − A₂₁A₁₁⁻¹A₁₂ >_L 0,
(c₁) A₂₂ >_L 0, (c₂) A₁₁ − A₁₂A₂₂⁻¹A₂₁ >_L 0.

24.7 The following three statements are equivalent (for a scalar α > 0):
(a) A = [B b; b′ α] ≥_L 0,
(b) B − bb′/α ≥_L 0,
(c) B ≥_L 0, b ∈ C(B), b′B⁻b ≤ α.

24.8 Let U and V be pd. Then:
(a) U − B′V⁻¹B >_L 0 ⇔ V − BU⁻¹B′ >_L 0,
(b) U − B′V⁻¹B ≥_L 0 ⇔ V − BU⁻¹B′ ≥_L 0.

24.9 Let A >_L 0 and B >_L 0. Then A <_L B ⇔ A⁻¹ >_L B⁻¹.

24.10 Let A ≥_L 0 and B ≥_L 0. Then any two of the following conditions imply the third: A ≤_L B, A⁺ ≥_L B⁺, r(A) = r(B).

24.11 Let A be symmetric. Then A − BB′ ≥_L 0 iff
A ≥_L 0, C(B) ⊂ C(A), and I − B′A⁻B ≥_L 0.

24.12 Consider a symmetric matrix
R = [1 r₁₂ r₁₃; r₂₁ 1 r₂₃; r₃₁ r₃₂ 1],  where all rᵢⱼ² ≤ 1.
Then R is a correlation matrix iff R is nnd, which holds iff det(R) = 1 − r₁₂² − r₁₃² − r₂₃² + 2r₁₂r₁₃r₂₃ ≥ 0, or, equivalently, (r₁₂ − r₁₃r₂₃)² ≤ (1 − r₁₃²)(1 − r₂₃²).

24.13 The inertia In(A) of a symmetric A_{n×n} is defined as the triple {π, ν, ζ}, where π is the number of positive eigenvalues of A, ν is the number that are negative, and ζ is the number that are zero. Thus π + ν = r(A) and π + ν + ζ = n. Matrix A is nnd iff ν = 0, and pd iff ν = ζ = 0.

24.14 The inertia of the symmetric partitioned matrix A = [A₁₁ A₁₂; A₂₁ A₂₂] satisfies
(a) In(A) = In(A₁₁) + In(A₂₂ − A₂₁A₁₁⁻A₁₂) if C(A₁₂) ⊂ C(A₁₁),
(b) In(A) = In(A₂₂) + In(A₁₁ − A₁₂A₂₂⁻A₂₁) if C(A₂₁) ⊂ C(A₂₂).

24.15 The minus (or rank-subtractivity) partial ordering for A_{n×m} and B_{n×m}:
A ≤⁻ B ⇔ r(B − A) = r(B) − r(A);
equivalently, A ≤⁻ B holds iff any of the following conditions holds:
(a) A⁻A = A⁻B and AA⁻ = BA⁻ for some A⁻ ∈ {A⁻},
(b) A₁⁻A = A₁⁻B and AA₂⁻ = BA₂⁻ for some A₁⁻, A₂⁻ ∈ {A⁻},
(c) {B⁻} ⊂ {A⁻},
(d) C(A) ∩ C(B − A) = {0} and C(A′) ∩ C(B′ − A′) = {0},
(e) C(A) ⊂ C(B), C(A′) ⊂ C(B′) & AB⁻A = A for some (and hence for all) B⁻.

24.16 Let V ∈ NNDₙ, X ∈ ℝ^{n×p}, and denote the class
U = { U : 0 ≤_L U ≤_L V, C(U) ⊂ C(X) }.
The maximal element (in the Löwner partial ordering) of U is the shorted matrix of V with respect to X, denoted Sh(V | X).

24.17 The shorted matrix S = Sh(V | X) has the following properties:
(a) S = cov(Xβ̃) under {y, Xβ, V}, (b) C(S) = C(X) ∩ C(V),
(c) C(V) = C(S) ⊕ C(V − S), (d) r(V) = r(S) + r(V − S),
(e) SV⁺(V − S) = 0,
(f) V⁻ ∈ {S⁻} for some (and hence for all) V⁻ ∈ {V⁻}, i.e., {V⁻} ⊂ {S⁻}.

25 Inequalities

CSI (x′y)² ≤ x′x · y′y   Cauchy–Schwarz inequality, with equality holding iff x and y are linearly dependent

25.1 Recall: (a) Equality holds in CSI if x = 0 or y = 0. (b) The nonnull vectors x and y are linearly dependent (l.d.) iff x = λy for some λ ∈ ℝ.

In what follows, the matrix V_{n×n} is nnd or pd, depending on the case. The ordered eigenvalues are λ₁ ≥ ⋯ ≥ λₙ and the corresponding orthonormal eigenvectors are t₁, …, tₙ.

25.2 (x′Vy)² ≤ x′Vx · y′Vy   equality iff Vx and Vy are l.d.

25.3 (x′y)² ≤ x′Vx · y′V⁻¹y   equality iff x and Vy are l.d.; V is pd

25.4 (x′P_V y)² ≤ x′Vx · y′V⁺y   equality iff V^{1/2}x and V^{+1/2}y are l.d.

25.5 (x′y)² ≤ x′Vx · y′V⁻y for all y ∈ C(V)

25.6 (x′x)² ≤ x′Vx · x′V⁻¹x, i.e., (x′x)²/(x′Vx · x′V⁻¹x) ≤ 1,
where the equality holds (assuming x ≠ 0) iff x is an eigenvector of V: Vx = λx for some λ ∈ ℝ.

25.7 (x′Vx)⁻¹ ≤ x′V⁻¹x, where x′x = 1   equality iff Vx = λx for some λ

25.8 Let V ∈ NNDₙ with V^{{p}} defined as
V^{{p}} = V^p, p = 1, 2, …;  V^{{0}} = P_V;  V^{{p}} = (V⁺)^{|p|}, p = −1, −2, …
Then (x′V^{{(h+k)/2}}y)² ≤ (x′V^{{h}}x)(y′V^{{k}}y) for h, k = …, −1, 0, 1, 2, …,
with equality iff Vx ∝ V^{{1+(k−h)/2}}y.

KI (x′x)²/(x′Vx · x′V⁻¹x) ≥ 4λ₁λₙ/(λ₁ + λₙ)² := μ₁²   Kantorovich inequality; λᵢ = chᵢ(V), V ∈ PDₙ

25.9 Equality holds in the Kantorovich inequality KI when x is proportional to t₁ ± tₙ, where t₁ and tₙ are orthonormal eigenvectors of V corresponding to λ₁ and λₙ; when the eigenvalues λ₁ and λₙ are both simple (i.e., each has multiplicity 1) then this condition is also necessary.
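A numerical sketch of KI and the attainment condition in 25.9 (Python/NumPy; the random pd V is illustrative only):

    import numpy as np

    rng = np.random.default_rng(8)
    F = rng.standard_normal((5, 5))
    V = F @ F.T + np.eye(5)                          # pd
    lam, T = np.linalg.eigh(V); lam, T = lam[::-1], T[:, ::-1]
    bound = 4 * lam[0] * lam[-1] / (lam[0] + lam[-1])**2

    def ratio(x):
        return (x @ x)**2 / ((x @ V @ x) * (x @ np.linalg.solve(V, x)))

    # the bound holds for random x ...
    X = rng.standard_normal((5, 1000))
    assert all(ratio(x) >= bound - 1e-12 for x in X.T)
    # ... and is attained at x = t_1 + t_n (25.9)
    assert np.allclose(ratio(T[:, 0] + T[:, -1]), bound)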
25.10 μ₁ = √(λ₁λₙ)/[(λ₁ + λₙ)/2] = the first antieigenvalue of V

25.11 1/μ₁² = (λ₁ + λₙ)²/(4λ₁λₙ) = [(λ₁ + λₙ)/2]·[(1/λ₁ + 1/λₙ)/2]
   = [(λ₁ + λₙ)/(2√(λ₁λₙ))]² = [1 − ((λ₁ − λₙ)/(λ₁ + λₙ))²]⁻¹

25.12 x′Vx − 1/(x′V⁻¹x) ≤ (√λ₁ − √λₙ)²   (x′x = 1)
x0 V1 x

WI Wielandt inequality. Consider V ∈ NNDₙ with r(V) = v, and let λᵢ be the ith largest eigenvalue of V and tᵢ the corresponding eigenvector. Let x ∈ C(V) and y be nonnull vectors satisfying the condition x′y = 0. Then
(x′Vy)²/(x′Vx · y′Vy) ≤ [(λ₁ − λ_v)/(λ₁ + λ_v)]² = 1 − 4λ₁λ_v/(λ₁ + λ_v)² := ν_v².
The upper bound is attained when x = t₁ + t_v and y = t₁ − t_v.

25.13 If X ∈ ℝ^{n×p} and Y ∈ ℝ^{n×q}, then
0 ≤_L Y′(Iₙ − P_X)Y ≤_L (Y − XB)′(Y − XB) for all B_{p×q}.

25.14 |X′Y|² ≤ |X′X|·|Y′Y| for all X_{n×p} and Y_{n×p}

25.15 Let V ∈ NNDₙ with V^{{p}} defined as in 25.8 (p. 112). Then
X′V^{{(h+k)/2}}Y(Y′V^{{k}}Y)⁻Y′V^{{(h+k)/2}}X ≤_L X′V^{{h}}X for all X and Y,
with equality holding iff C(V^{{h/2}}X) ⊂ C(V^{{k/2}}Y).

25.16 X′P_VX(X′VX)⁻X′P_VX ≤_L X′V⁺X   equality iff C(VX) = C(P_VX)

25.17 X′V⁺X ≤_L [(λ₁ + λ_v)²/(4λ₁λ_v)]·X′P_VX(X′VX)⁻X′P_VX   r(V) = v

25.18 X′VX ≤_L [(λ₁ + λ_v)²/(4λ₁λ_v)]·X′P_VX(X′V⁺X)⁻X′P_VX   r(V) = v

25.19 X′V⁻¹X ≤_L [(λ₁ + λₙ)²/(4λ₁λₙ)]·X′X(X′VX)⁻¹X′X   V pd

25.20 If X′P_VX is idempotent, then (X′VX)⁺ ≤_L X′V⁺X, where the equality holds iff P_VXX′VX = VX.

25.21 (X′VX)⁺ ≤_L X′V⁻¹X, if V is pd and XX′X = X

25.22 Under the full rank model {y, Xβ, V} we have
(a) cov(β̃) ≤_L cov(β̂) ≤_L [(λ₁ + λₙ)²/(4λ₁λₙ)]·cov(β̃),
(b) (X′V⁻¹X)⁻¹ ≤_L (X′X)⁻¹X′VX(X′X)⁻¹ ≤_L [(λ₁ + λₙ)²/(4λ₁λₙ)]·(X′V⁻¹X)⁻¹,
(c) cov(β̂) − cov(β̃) = X′VX − (X′V⁻¹X)⁻¹ ≤_L (√λ₁ − √λₙ)²·I_p, when X′X = I_p.

25.23 Consider V ∈ NNDₙ, X_{n×p} and Y_{n×q}, satisfying X′P_VY = 0. Then
(a) X′VY(Y′VY)⁻Y′VX ≤_L [(λ₁ − λ_v)/(λ₁ + λ_v)]²·X′VX,  r(V) = v,
(b) V − P_VX(X′V⁺X)⁻X′P_V ≥_L VY(Y′VY)⁻Y′V,
(c) X′VX − X′P_VX(X′V⁺X)⁻X′P_VX ≥_L X′VY(Y′VY)⁻Y′VX,
where the equality holds iff r(V) = r(VX) + r(VY).

25.24 Samuelson's inequality. Consider the n × k data matrix X₀ = (x_(1) : … : x_(n))′ and denote x̄ = (x̄₁, …, x̄_k)′ and S_xx = X₀′CX₀/(n − 1). Then
[(n − 1)²/n]·S_xx − (x_(j) − x̄)(x_(j) − x̄)′ ≥_L 0,
or equivalently,
(x_(j) − x̄)′S_xx⁻¹(x_(j) − x̄) = MHLN²(x_(j), x̄, S_xx) ≤ (n − 1)²/n,  j = 1, …, n.
The equality above holds iff all x_(i) different from x_(j) coincide with their mean.

25.25 Let V_{p×p} be nnd. Then 1′V1 ≥ [p/(p − 1)]·(1′V1 − tr V), i.e., assuming 1′V1 ≠ 0,
α(1) := [p/(p − 1)]·[1 − tr(V)/(1′V1)] ≤ 1,
where the equality is obtained iff V = σ²11′ for some σ ∈ ℝ. The term α(1) can be interpreted as Cronbach's alpha.

25.26 Using the notation above, and assuming that V_d = diag(V) is pd [see 22.26 (p. 102)],
α(a) := [p/(p − 1)]·[1 − a′V_d a/(a′Va)] ≤ [p/(p − 1)]·[1 − 1/ch₁(R_V)]
for all a ∈ ℝ^p, where R_V is the correlation matrix calculated from the covariance matrix V. Moreover, the equality α(a) = 1 is obtained iff V = σ²qq′ for some σ ∈ ℝ and some q = (q₁, …, q_p)′, and a is a multiple of a* = (1/q₁, …, 1/q_p)′.

26 Kronecker product, some matrix derivatives

26.1 The Kronecker product of A_{n×m} = (a₁ : … : a_m) and B_{p×q}, and vec(A), are defined as
A ⊗ B = [a₁₁B a₁₂B … a₁ₘB; a₂₁B a₂₂B … a₂ₘB; … ; aₙ₁B aₙ₂B … aₙₘB] ∈ ℝ^{np×mq},
vec(A) = [a₁; a₂; … ; a_m] ∈ ℝ^{nm}.

26.2 (A ⊗ B)′ = A′ ⊗ B′;  a′ ⊗ b = ba′ = b ⊗ a′;  α ⊗ A = αA = A ⊗ α

26.3 (F : G) ⊗ B = (F ⊗ B : G ⊗ B);  (A ⊗ B)(C ⊗ D) = AC ⊗ BD

26.4 (A ⊗ b′)(c ⊗ D) = (b′ ⊗ A)(D ⊗ c) = Acb′D

26.5 A_{n×m} ⊗ B_{p×k} = (A ⊗ I_p)(I_m ⊗ B) = (Iₙ ⊗ B)(A ⊗ I_k)

26.6 (A ⊗ B)⁻¹ = A⁻¹ ⊗ B⁻¹;  (A ⊗ B)⁺ = A⁺ ⊗ B⁺

26.7 P_{A⊗B} = P_A ⊗ P_B;  r(A ⊗ B) = r(A)·r(B)

26.8 tr(A ⊗ B) = tr(A)·tr(B);  ‖A ⊗ B‖_F = ‖A‖_F·‖B‖_F

26.9 Let A ≥_L 0 and B ≥_L 0. Then A ⊗ B ≥_L 0.

26.10 Let ch(A_{n×n}) = {λ₁, …, λₙ} and ch(B_{m×m}) = {μ₁, …, μ_m}. Then ch(A ⊗ B) = {λᵢμⱼ} and ch(A ⊗ I_m + Iₙ ⊗ B) = {λᵢ + μⱼ}, where i = 1, …, n, j = 1, …, m.

26.11 vec(αA + βB) = α·vec(A) + β·vec(B);  α, β ∈ ℝ

26.12 vec(ABC) = (I ⊗ AB)·vec(C) = (C′ ⊗ A)·vec(B) = (C′B′ ⊗ I)·vec(A)

26.13 vec(A⁻¹) = [(A⁻¹)′ ⊗ A⁻¹]·vec(A)

26.14 tr(AB) = vec(A′)′·vec(B)

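A few of the Kronecker and vec identities above, checked numerically (Python/NumPy sketch with arbitrary matrices; recall that vec stacks columns, i.e. ravel(order='F')):

    import numpy as np

    rng = np.random.default_rng(9)
    A = rng.standard_normal((3, 4))
    B = rng.standard_normal((4, 5))
    C = rng.standard_normal((5, 2))
    D = rng.standard_normal((4, 3))

    vec = lambda M: M.ravel(order='F').reshape(-1, 1)   # column-stacking vec

    # 26.12: vec(ABC) = (C' kron A) vec(B)
    assert np.allclose(vec(A @ B @ C), np.kron(C.T, A) @ vec(B))
    # 26.7 / 26.8: rank and Frobenius norm are multiplicative over the Kronecker product
    assert np.linalg.matrix_rank(np.kron(A, B)) == \
           np.linalg.matrix_rank(A) * np.linalg.matrix_rank(B)
    assert np.allclose(np.linalg.norm(np.kron(A, B), 'fro'),
                       np.linalg.norm(A, 'fro') * np.linalg.norm(B, 'fro'))
    # 26.14: tr(AD) = vec(A')' vec(D)
    assert np.allclose(np.trace(A @ D), (vec(A.T).T @ vec(D)).item())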

26.15 Below are some matrix derivatives.
(a) ∂(Ax)/∂x = A′
(b) ∂(x′Ax)/∂x = (A + A′)x; = 2Ax when A is symmetric
(c) ∂vec(AX)/∂vec(X)′ = I ⊗ A
(d) ∂vec(AXB)/∂vec(X)′ = B′ ⊗ A
(e) ∂tr(AX)/∂X = A′
(f) ∂tr(X′AX)/∂X = 2AX for symmetric A
(g) ∂log|X′AX|/∂X = 2AX(X′AX)⁻¹ for symmetric A
Index

A block diagonalization 28, 89


BloomeldWatson eciency 62
Abadir, Karim M. vi BloomeldWatsonKnott inequality 58
added variable plot 18 BLP
adjoint matrix 83 BLP.yI x/ 22, 28
admissibility 65 BLP.zI A0 z/ 29
Alberts theorem 110 BLP.yn I y.n1/ / 31
ANOVA 37 y C yx 1 xx .x  x / 22
another parametrization 39 y C yx  xx .x  x / 28
conditional mean and variance 38 autocorrelation 31
estimability 48 BLUE
in matrix terms 38 Q representations 51
,
antieigenvalues 58, 61 Q 2 D .X02 M P 1W X2 /1 X0 MP
2 1W y 53
approximating Q 0 P 1 0 P
2 D .X2 M1 X2 / X2 M1 y 53
the centered data matrix 75
AR(1)-structure see autocorrelation Q i in the general case 53
autocorrelation BLUE.K0 / 49
V D f%ji j j g 30 BLUE.X/ 48
BLP 31 denition 48
in linear model 32 fundamental equation 49
cov.X/ Q as a shorted matrix 111
in prediction 67
regression coecients 32 Q
X, general representations 49
testing 32 X, Q representations 50
equality of OLSE and BLUE, several
B conditions 54
restricted BLUE 37
Baksalary, Oskar Maria vi without the ith observation 43
BanachiewiczSchur form 89 BLUP
Bernoulli distribution 19 fundamental equation 66
Bernstein, Dennis S. vi in mixed model 68
best linear predictor see BLP of y 14
best linear unbiased estimator see BLUE of  68
best linear unbiased predictor see BLUP, 65 of yf 66
best predictor see BP Pandoras Box 66
bias 27 representations for BLUP.yf / 66
binomial distribution 19 BP
Blanck, Alice vi BP.yI x/ 23


BP.yI x/ 27 consistency (when solving X)


E.y j x/ 27 of AX D B 87
in Np 23 of AXB D C 87
is the conditional expectation 23, 27 of X.A W B/ D .0 W B/ 81, 94
consistency of linear model 49
C constraints to obtain unique O 35
contingency table 19
canonical correlations contours of constant density xii, 20
between x and y 76 contrast 48
between O and Q 59 Cooks distance 41
between Hy and My 60 COOK 2i .V / 43
between K0 QL u and L0 QK u 78 Cook, R. Dennis 30
non-unit ccs between K0 u and L0 u 78 correlation see multiple correlation
proper eigenvalues 77 cor.x; y/ D xy =x y xi
unit ccs 60 cors .xi ; xj /; cord .xi ; xj / 4
Watson eciency 59 R D cord .y; yO / 10
CauchySchwarz inequality 112 cor.O1 ; O2 / 14, 18
centered X0 ; Xy 2 cor."Oi ; "Oj / 10
centered and scaled Xy 3 as a cosine 4
centered model 11 between dichotomous variables 19
centering matrix C 2 random sample without replacement 19
characteristic equation 96 correlation matrix
Chatterjee, Samprit v cor.x/ 6
Cholesky decomposition 107 cord .X0 / D Rxx 3
Cochrans theorem 24 cord .X0 W y/ D R 3
coecient of determination see multiple correlation matrix 111
correlation determinant 84, 85
cofactor 82 inverse 13, 85, 86
collinearity 42 rank 5, 84
column space recursive decomposition of det.R/ 85
denition 5 CourantFischer theorem 100
C .VX? / 95 covariance
C .W X W I  W W/ 95 cov.x; y/ D E.x  x /.y  y / xi
C .X W V/ D C .X W VM/ 49 covs .x; y/; covd .x; y/ 4
C X0 .V C XUX0 / X 46 cov.z0 Az; z0 Bz/ 24
simple properties 78 covariance matrix
of cov.X/ Q 51 cov.x/ 6
completely symmetric matrix 57 cov.x; y/ 6
eigenvalues 97, 98 cov./O 12
condition index 41
cov.O 2 / 13
condition number 41
conditional see normal distribution cov.O x ; y/
N D 0 14
EE.y j x/ 23 cov.Q 2 / 53
varE.y j x/ 23, 38 cov.eyx / 28
expectation and the best predictor 27 cov.eyx ; x/ 28
condence ellipse see condence region cov.X/ Q 51
condence interval cov.x; y  Ax/ D 0 29
for i 14 covz  BLP.zI A0 z/ 29
for E.y / 15 covd .U/ 7
WorkingHotelling condence band 15 covd .UA/ 7
condence region covd .US1=2 / 7
for 2 34 covd .UT/; S D TT0 7
for x 34 covd .X0 / D Sxx 3
for  25 covd .Xy / D S 3

always nonnegative denite 6 Draper, Norman R. v


of the prediction error 28, 74 Durbin–Watson test statistic 32
partial covariance 22 Dürer's Melencolia I 109
crazy, still v
Cronbachs alpha 114 E

D EckartYoung theorem
non-square matrix 107
data matrix symmetric matrix 101
Xy D .X0 W y/ 2 eciency of OLSE see Watson eciency,
best rank-k approximation 75 BloomeldWatson eciency, Raos
keeping data as a theoretical distribution 4 eciency
decomposition see eigenvalue decomposi- eigenvalue, dominant 97, 108
tion, see full rank decomposition, see eigenvalues
singular value decomposition denition 96
Hartwig–Spindelböck 108 (λ, w) as an eigenpair for (A, B) 102
derivatives, matrix derivatives 116 ch. 22 / 21
determinant ch.A; B/ D ch.B1 A/ 102
22 21 ch.A22 / 96
Laplace expansion 82 ch1 .A; B/ D ch1 .BC A/ 104
recursive decomposition of det.S/ 30 jA  Ij D 0 96
recursive decomposition of det./ 30 jA  Bj D 0 101
Schur complement 83 nzch.A; B/ D nzch.AB / 104
DFBETAi 41 nzch.AB/ D nzch.BA/ 99
diagonalizability 99 algebraic multiplicity 99
discrete uniform distribution 19 characteristic polynomial 96
discriminant function 75 eigenspace 99
disjointness geometric multiplicity 99
C .A/ \ C .B/ D f0g, several conditions intraclass correlation 97, 98
80 of 2 I C a10 C 1a0 101
C .VH/ \ C .VM/ 60 of cov.X/ Q 104
C .X0.1/ / \ C .X0.2/ / 43 of PA PB 78, 92, 93
C .X/ \ C .VM/ 49 of PK PL  PF 78
C .X1 / \ C .X2 / 8 spectral radius 97
estimability in outlier testing 42 elementary column/row operation 99
distance see Cooks distance, see Ma- equality
halanobis distance, see statistical of BLUPs of yf under two models 67
distance of BLUPs under two mixed models 70
distribution see Hotellings T 2 , see normal of OLSE and BLUE, several conditions 54
distribution of OLSE and BLUE, subject to
F -distribution 24 C .U/ C .X/ 56
t -distribution 24 of the BLUEs under two models 55
Bernoulli 19 of the OLSE and BLUE of 2 57
binomial 19 O M.i/ / D .Q M.i/ / 62
.
chi-squared 23, 24
discrete uniform 19 O 2 .M12 / D Q 2 .M12 / 57
Hotellings T 2 25 Q 2 .M12 / D Q 2 .M12 / 63
BLUP.yf / D BLUE.X N
of MHLN2 .x; ; / 23 f / 66
of y0 .H  J/y 24 BLUP.yf / D OLSE.Xf / 67
of y0 .I  H/y 24 SSE.V / D SSE.I / 53
of z0 Az 23, 24 XO D XQ 54
of z0 1 z 23 Ay D By 50
Wishart-distribution 24, 73 PXIV1 D PXIV2 55
dominant eigenvalue 97, 108 estimability

denition 47  D 0 25
constraints 35 1 D 2 26, 73
contrast 48 1 D    D k D 0 36
in hypothesis testing 33, 37 i D 0 36
of 48 D 0 in outlier testing 40
of k 48 1 D    D g 37
of in the extended model 42 1 D 2 38
of in the extended model 40 k0 D 0 36
of c0  48 K0 D d 35, 36
of K0 47 K0 D d when cov.y/ D  2 V 36
of X2 2 47 K0 D 0 33
extended (mean-shift) model 40, 42, 43 K0 B D D 73
overall-F -value 36
F testable 37
under intraclass correlation 55
factor analysis 75
factor scores 76 I
niteness matters 20
tted values 8 idempotent matrix
frequency table 19 eigenvalues 93
FrischWaughLovell theorem 11, 18 full rank decomposition 93
generalized 57 rank D trace 93
Frobenius inequality 80 several conditions 93
full rank decomposition 88, 107 illustration of SST = SSR + SSE xii
independence
G linear 5
statistical 20
Galtons discovery of regression v uN and T D U0 .I  J/U 25
GaussMarkov model see linear model observations, rows 24, 25
GaussMarkov theorem 49 of dichotomous variables 20
generalized inverse quadratic forms 23, 24
the 4 MoorePenrose conditions 86 inequality
BanachiewiczSchur form 89 chi .AA0 /
chi .AA0 C BB0 / 99
least-squares g-inverse 90 n
x0 Ax=x0 x
1 100
minimum norm g-inverse 90 kAxk
kAx C Byk 89
partitioned nnd matrix 89 kAxk2
kAk2 kxk2 106
reexive g-inverse 87, 89 tr.AB/
ch1 .A/  tr.B/ 99
through SVD 88 BloomeldWatsonKnott 58
generalized variance 6, 7 CauchySchwarz 112
generated observations from N2 .0; / xii Frobenius 80
Kantorovich 112
H Samuelsons 114
Sylvesters 80
Hadi, Ali S. v Wielandt 113
Hartwig–Spindelböck decomposition 108 inertia 111
Haslett, Stephen J. vi intercept 10
hat matrix H 8 interlacing theorem 100
Hendersons mixed model equations 69 intersection
Hotellings T 2 24, 25, 73 C .A/ \ C .B/ 80
when n1 D 1 26 C .A/ \ C .B/, several properties 92
hypothesis intraclass correlation
2 D 0 34 F -test 55
x D 0 27, 36 OLSE D BLUE 55
D 0 in outlier testing 42 eigenvalues 97, 98

invariance Magnus, Jan R. vi


of C .AB C/ 87 Mahalanobis distance
of AB C 87 MHLN2 .Ny1 ; yN 2 ; S / 26, 73, 75
of X0 .V C XUX0 / X 46 MHLN2 .u.i/ ; uN ; S/ 4
of r.AB C/ 88 MHLN2 .x; ; / 5, 23
inverse MHLN2 .x ; xN ; Sxx / 14
of Ann 82 MHLN2 .x.i/ ; xN ; Sxx / 41, 114
intraclass correlation 97 MHLN2 .1 ; 2 ; / 75
of A D faij g D fmin.i; j /g 32 major/minor axis of the ellipse 21
of R 85 matrix
of Rxx 13, 86 irreducible 108
of X0 X 12, 84 nonnegative 108
of 22 21 permutation 108
of a partitioned block matrix 83 reducible 108
of autocorrelation matrix 32 shorted 111
of partitioned pd matrix 84 spanning C .A/ \ C .B/ 80
Schur complement 83 stochastic 109
irreducible matrix 108 matrix M R
PV MPP V 44
K PW MP P WDM R W 45
matrix M P
Kantorovich inequality 112 M.MVM/ M 44
Kronecker product 115 SSE.V / D y0 My P 52, 53
in multivariate model 72 matrix M P1
in multivariate sample 25 M1 .M1 VM1 / M1 53, 95
matrix M P 1W
L M1 .M1 W2 M1 / M1 53
matrix angle 61
Lagrangian multiplier 35 matrix of corrected sums of squares
Laplace expansion of determinant 82 T D Xy0 CXy 3
LATEX vi Txx D X00 CX0 3
least squares see OLSE maximizing
Lee, Alan J. v tr.G0 AG/ subject to G0 G D Ik 100
LehmannSche theorem 64 .a0 x/2 =x0 Bx 102
leverage hi i 41 .x0 Ay/2 =.x0 Bx  y0 Cy/ 106
likelihood function 26 .x0 Ay/2 =.x0 x  y0 y/ 106
likelihood ratio 26 a0 .1  2 /2 =a0 a 75
linear hypothesis see hypothesis a0 .Nu  0 /2 =a0 Sa 25
linear independence 5 a0 .Nu1  uN 2 /2 =a0 S a 75
collinearity 42 a0 .u.i/  uN /2 =a0 Sa 5
linear model 1 .Vz; z/ 61
multivariate 72 cor.a0 x; b0 y/ 76
linear prediction suciency 68 cor.y; b0 x/ 29
linear zero function 50, 66 cord .y; X/ 16
linearly complete 64 cord .y; X0 b/ 29
linearly sucient see suciency cos2 .u; v/; u 2 A; v 2 B 77
Lwner ordering see minimizing .0 A0 B/2
77
denition 109 A0 A  0 B0 B
0
2
Alberts theorem 110 kHVMkF 62
Y0 MY
L .Y  XB/0 .Y  XB/ 18, 113 u0 u subject to u0 1 u D c 2 21
var.b0 z/ 74
M x0 Ax=x0 Bx 102
x0 Ax=x0 Bx, B nnd 104
magic square 109 x0 Hx=x0 Vx 104

likelihood function 26 multiple correlation


mean squared error 27 R D cord .y; yO / 10
mean squared error matrix 27, 64 in M121 17
Melencolia I by Dürer 109 in no-intercept model 17
minimizing population 22, 27, 29
.Y  XB/0 .Y  XB/ 18, 72, 73, 93, 113 squared R2 16
.y  X/0 .y  X/ 9 multivariate linear model 72
.y  X/0 .y  X/ under K0 D d 35 Mustonens measure 30
.y  X/0 V1 .y  X/ 9, 36, 52
.y  X/0 V1 .y  X/ under K0 D d N
37
Ey  .Ax C b/y  .Ax C b/0 28 Niemel, Jarmo vi
MSEM.Ax C bI y/ 28 no-intercept model 1, 17, 36
cos.Vz; z/ 61 nonnegative deniteness
cov.Gy/ subject to GX D X 48 of partitioned matrix 110
cov.y  Fx/ 28 nonnegative matrix 108
covd .Y  XB/ 18, 28 norm
covs .y  B0 x/ 18, 28 matrix 2-norm 106
e./O D jcov./j=jcov.
Q O
/j 58 spectral 106
k  A.A0 A/1 A0 kF 74 normal distribution see multinormal
kA  BZkF subject to Z orthogonal 107 distribution
kA  BkF subject to r.B/ D k 101, 107 normal equation 9
kXQ  XP
Q G kF 74 general solution 9
ky  AxkV 94 generalized 9
ky  1 k 16 null space 5
ky  Axk 91
ky  Xk 9 O
tr.P1=2 A / 74
var.g0 y/ subject to X0 g D k 48 observation space 2
var.y  b0 x/ 29 observation vector 2
vard .y  X0 b/ 28, 29 OLS criterion 9
angle 77 OLSE
mean squared error 28 constraints 35
mean squared error matrix 28 tted values 8
orthogonal distances 74 of 10
minor 82 of K0 9
leading principal 82 of X 8, 9
principal 82 restricted OLSE 35
minus partial ordering 54, 111 without the i th observation 40
mixed model 68 without the last b observations 42
MLE ordinary least squares see OLSE
of 0 and x 27 orthocomplement
of  and 26 C .A/? 5
2
of yx 27 C .VX? / 95
model matrix 2 C .VX? / D C .W X W I  W W/? 46,
rank 5, 84 95
multinormal distribution C .V1 X/? 95
denition 20 C .X/? V1
95
conditional mean 22, 27 A? 5, 80, 91
conditional variance 22, 27 A?V 94
contours 20 orthogonal projector see projector
density function of N2 21 simple properties of PA 91
density function of Np 20 simple properties of PAIV 94
sample U0 from Np 25 C 2

H 8 sample principal components 74


J 2 principal components
M 8 from the SVD of X Q 75
P221 93 principal minor
PC .A/\C .B/ 92 sum of all i  i principal minors 96
PC .A/\C .B? / 92 principal submatrix 82
PXIV1 D I  P0MIV 45, 95 projector see orthogonal projector
commuting PA PB D PB PA 92 PAjB 94
decomposition of P.X1 WX2 /IVC 95 PXjVM 49
decomposition of P.X1 WX2 /IV1 53, 95 oblique 91, 94
dierence PA  PB 92 proper eigenvalues
sum PA C PB 91 ch.A; B/ 103
orthogonal rotation 107 ch.GVG0 ; HVH/ 60
overall-F-value 36 ch.V; H/ 104
PSTricks vi
P
Q
Pandoras Box
QR-decomposition 107
BLUE 49, 51
quadratic form
BLUP 66
E.z0 Az/ 23
mixed model 68
var.z0 Az/ 24
partial correlation
distribution 23, 24
pcord .Y j X/ 17
independence 23, 24
%ij x 22
quadratic risk 64
%xy 22
rxy 18 R
2
decomposition of 1  %yx 30
2 random sample without replacement 19
decomposition of 1  Ry12:::k 17
2 rank
decomposition of 1  Ry12:::k 85
denition 5
population 22
simple properties 78
partial covariance 22
of .1 W X0 / 5, 84
partitioned matrix
of .A W B/ 79
g-inverse 89 Q
inverse 83 of cov.X/ 51
MP-inverse 89 of AB 79
nonnegative deniteness 110 of APB 81
pencil 102, 103 of A 88
permutation of HPV M 60
determinant 82 of Txx 5, 8
matrix 108 of X0 .V C XUX0 / X 46
PerronFrobenius theorem 108 of X02 M1 X2 8
Poincar separation theorem 100 of correlation matrix 5, 84
polar decomposition 107 of model matrix 5, 84
predictable, unbiasedly 65 rank additivity
prediction error r.A C B/ D r.A/ C r.B/ 79
Gy  yf 65 Schur complement 83
y  BLP.yI x/ 22, 28 rank cancellation rule 79
ei 1:::i1 30 rank-subtractivity see minus partial ordering
Mahalanobis distance 14 Raos eciency 63
with a given x 14 Rayleigh quotient x0 Ax=x0 x 100
prediction interval for y 15 recursive decomposition
2
principal component analysis 74 of 1  %yx 30
2
matrix approximation 75 of 1  Ry12:::k 17
2
predictive approach 74 of 1  Ry12:::k 85

of det.R/ 85 to A.X W VX? / D .Xf W V21 X? / 66


of det.S/ 30 to AX D B 87
of det./ 30 to AXB D C 87
reduced model to Ax D y 87
R2 .M121 / 17 to A 87
M121 57 to G.X W VM/ D .X W 0/ 49
M1H 62 to X0 X D X0 y 9
AVP 18 to X.A W B/ D .0 W B/ 94
SSE.M121 / 17 spectral radius 97, 108
reducible matrix 108 spectrum see eigenvalues, 96
reection 108 square root of nnd A 97
regression coecients 10 standard error of Oi 14
old ones do not change 11 statistical distance 5
standardized 12 Stigler, Stephen M. v
regression function 23 stochastic matrix 109
regression towards mean v stochastic restrictions 70
relative eciency of OLSE see Watson Stricker-Komba, Ulrike vi
eciency suciency
residual linear 63
y  BLP.yI x/ 22, 28 linear prediction 68
ei 1:::i1 30 linearly minimal 64
after elimination of X1 16 sum of products SPxy D txy 4
externally Studentized 40, 42 sum of squares
internally Studentized 39 change in SSE 16, 33, 35
of BLUE 51, 52 change in SSE(V) 37
of OLSE 9, 39 predicted residual 41
predicted residual 41 SSy D tyy 4
scaled 39 SSE 15
residual mean square, MSE 15 SSE under M121 17
residual sum of squares, SSE 15 SSE, various representations 16
rotation 107 SSR 15
of observations 75 SST 15
weighted SSE 36, 52
S sum of squares and cubes of integers 19
Survo vi
Samuelsons inequality 114 Sylvesters inequality 80
Schur complement 83
determinant 83 T
MP-inverse 89
rank additivity 83 theorem
Schurs triangularization theorem 107 Albert 110
Seber, George A. F. v, vi Cochran 24
shorted matrix 111 CourantFischer 100
similarity 99 EckartYoung 101, 107
Simon, Paul v FrischWaughLovell 11, 18, 57
simultaneous diagonalization GaussMarkov 49
by a nonsingular matrix 103 interlacing 100
by an orthogonal matrix 103 LehmannSche 64
of commuting matrices 103 PerronFrobenius 108
singular value decomposition 105 Poincar separation 100
skew-symmetric: A0 D A ix Schurs triangularization 107
Smith, Harry v WedderburnGuttman 83
solution Thomas, Niels Peter vi
to A.X W VX? / D .Xf W V21 X? / 66 Tiritiri Island ii

total sum of squares, SST 15 of a dichotomous variable 19


transposition 82 of prediction error with a given x 14
Trenkler, Gtz vi variance function 23
triangular factorization 30 vec 115
triangular matrix 30, 107 in multivariate model 72
in multivariate sample 25
U vector of means xN 2
Vehkalahti, Kimmo vi
unbiased estimator of  2 VIF 13, 86
O 2 15 volume of the ellipsoid 34
2
O .i/ 40
Q 2 36, 37, 52 W
unbiasedly predictable 65
Watson eciency
denition 58
V
decomposition 62
factorization 62
variable space 2 lower limit 58
variable vector 2 weakly singular linear model: C .X/ C .V/
variance 49
vars .y/; vard .y/ 4 WedderburnGuttman theorem 83
var.x/ D E.x  x /2 xi Weisberg, Sanford v
se2 .Oi / 14 Wielandt inequality 113
var.a0 x/ D a0 a 6 Wilkss lambda 26
var.Oi / 13 Wishart-distribution see distribution
var."Oi / 10 WorkingHotelling condence band 15
