

Linear Statistical Models: Random vectors


Notes by Yao-ban Chan and Owen Jones



Random vectors
The theory of linear algebra gives us a good grounding for analysing our linear models. However, we must still do some more groundwork. Once we have done this, the theoretical results come out quite easily!

Previously, we were thinking of matrices and vectors simply as a bunch of numbers. However, there is no reason why we can't think of them as a bunch of random variables!

We can then extend the traditional concepts of expectation, variance, etc. to random vectors.



Expectation
Although traditionally random variables are denoted with capital
letters, in keeping with our linear algebra notation, we will denote
them by lowercase.
We define the expectation of a random vector y to be the vector
of expectations of its components:

$$\text{If } y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_k \end{pmatrix} \text{ then } E[y] = \begin{pmatrix} E[y_1] \\ E[y_2] \\ \vdots \\ E[y_k] \end{pmatrix}.$$



Expectation properties
- If $a$ is a vector of constants, then $E[a] = a$.
- If $a$ is a vector of constants, then $E[a^T y] = a^T E[y]$.
- If $A$ is a matrix of constants, then $E[Ay] = A E[y]$.

Example. Let
$$A = \begin{pmatrix} 2 & 3 \\ 1 & 4 \end{pmatrix}, \quad y = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}$$
and assume that $E[y_1] = 10$ and $E[y_2] = 20$. Then
$$A E[y] = \begin{pmatrix} 2 & 3 \\ 1 & 4 \end{pmatrix} \begin{pmatrix} 10 \\ 20 \end{pmatrix} = \begin{pmatrix} 80 \\ 90 \end{pmatrix}.$$



On the other hand,
$$E[Ay] = E\begin{pmatrix} 2y_1 + 3y_2 \\ y_1 + 4y_2 \end{pmatrix} = \begin{pmatrix} E[2y_1 + 3y_2] \\ E[y_1 + 4y_2] \end{pmatrix} = \begin{pmatrix} 2E[y_1] + 3E[y_2] \\ E[y_1] + 4E[y_2] \end{pmatrix} = \begin{pmatrix} 80 \\ 90 \end{pmatrix} = A E[y].$$



Variance
Defining the variance of a random vector is slightly trickier. We want to include not just the variances of the variables themselves, but also how the variables covary with each other.

Recall that the variance of a random variable $Y$ with mean $\mu$ is defined to be $E[(Y - \mu)^2]$. Now let $y$ be as before, a $k \times 1$ vector of random variables. We define the variance of $y$ (sometimes known as the covariance matrix) to be
$$\operatorname{var} y = E[(y - \mu)(y - \mu)^T]$$
where $\mu = E[y]$.



The diagonal elements of the covariance matrix are just the variances of the individual elements of $y$:
$$[\operatorname{var} y]_{ii} = \operatorname{var} y_i, \quad i = 1, 2, \ldots, k.$$

The off-diagonal elements of the covariance matrix are the covariances of the individual elements:
$$[\operatorname{var} y]_{ij} = \operatorname{cov}(y_i, y_j) = E[(y_i - \mu_i)(y_j - \mu_j)].$$

This means that all covariance matrices are symmetric.



Variance properties

Suppose that $y$ is a random vector with $\operatorname{var} y = V$. Then

- If $a$ is a vector of constants, then $\operatorname{var} a^T y = a^T V a$.
- If $A$ is a matrix of constants, then $\operatorname{var} Ay = A V A^T$.

These can be derived from first principles quite easily.

It follows that any covariance matrix is symmetric and positive semidefinite.
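As a quick sanity check, we can verify $\operatorname{var} Ay = A V A^T$ by simulation. This is only a rough sketch: the particular $V$ and $A$ below are arbitrary choices, and mvrnorm comes from the MASS package used again later in these notes.

library(MASS)                                    # for mvrnorm

V <- matrix(c(2, 1, 1, 5), 2, 2)                 # an example covariance matrix
A <- matrix(c(1, 0, 1, 1), 2, 2)                 # an arbitrary matrix of constants
y <- mvrnorm(100000, mu = c(0, 0), Sigma = V)    # rows are observations

var(y %*% t(A))                                  # sample covariance of Ay ...
A %*% V %*% t(A)                                 # ... should be close to A V A^T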



Example. Let
$$y = \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix}$$
be a random vector, such that $\operatorname{var} y_i = \sigma^2$ for all $i$, and that the elements of $y$ are independent. This means that $\operatorname{cov}(y_i, y_j) = 0$ for $i \neq j$, so the covariance matrix of $y$ is
$$\operatorname{var} y = V = \begin{pmatrix} \sigma^2 & 0 & 0 \\ 0 & \sigma^2 & 0 \\ 0 & 0 & \sigma^2 \end{pmatrix} = \sigma^2 I.$$



Example continued. Assume that $X$ is a matrix of full rank (with more rows than columns), which implies that $X^T X$ is nonsingular. Let
$$z = (X^T X)^{-1} X^T y = Ay,$$
then
$$\begin{aligned}
\operatorname{var} z = A V A^T &= [(X^T X)^{-1} X^T] \, \sigma^2 I \, [(X^T X)^{-1} X^T]^T \\
&= (X^T X)^{-1} X^T (X^T)^T [(X^T X)^{-1}]^T \sigma^2 \\
&= (X^T X)^{-1} X^T X [(X^T X)^T]^{-1} \sigma^2 \\
&= (X^T X)^{-1} \sigma^2.
\end{aligned}$$
We will be using this quite a bit later on!
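We can check this variance formula numerically with a small simulation sketch. The design matrix $X$ below is an arbitrary illustration, $\sigma^2$ is set to 4, and the mean of $y$ is taken to be 0 for simplicity (the variance does not depend on it).

set.seed(1)
X <- cbind(1, 1:10)                     # an arbitrary full-rank 10 x 2 matrix
sigma2 <- 4
XtXinv <- solve(t(X) %*% X)

Y <- matrix(rnorm(nrow(X) * 50000, sd = sqrt(sigma2)), nrow = nrow(X))  # each column is one y
Z <- XtXinv %*% t(X) %*% Y              # each column is one z = (X'X)^{-1} X' y

var(t(Z))                               # sample covariance of z ...
sigma2 * XtXinv                         # ... should be close to (X'X)^{-1} sigma^2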



Matrix square root


A square matrix $A$ has a square root if there exists a matrix $B$ of the same size such that $B^2 = A$. In general the square root is not unique.

For a symmetric positive semidefinite matrix $A$, there is a unique symmetric positive semidefinite square root, called the principal root, denoted $A^{1/2}$.

Suppose that $P$ diagonalises $A$, that is $P^T A P = \Lambda$. Then (since $P^T P = I$)
$$\begin{aligned}
A &= P \Lambda P^T \\
&= P \Lambda^{1/2} P^T P \Lambda^{1/2} P^T \\
A^{1/2} &= P \Lambda^{1/2} P^T.
\end{aligned}$$
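In R, the principal root can be computed directly from the eigendecomposition; a minimal sketch (the matrix $A$ here is just an example):

A <- matrix(c(2, 1, 1, 5), 2, 2)         # symmetric positive definite
e <- eigen(A)                            # P = e$vectors, Lambda = diag(e$values)
Ahalf <- e$vectors %*% diag(sqrt(e$values)) %*% t(e$vectors)
Ahalf %*% Ahalf                          # recovers A, up to rounding error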



Multivariate normal
Definition
Let $z$ be a $k \times 1$ vector of i.i.d. standard normal r.v.s, $A$ an $n \times k$ matrix, and $b$ an $n \times 1$ vector; then we say that
$$x = Az + b$$
is (an $n$-dimensional) multivariate normal, with mean $\mu = E\,x = b$ and covariance matrix $\Sigma = \operatorname{var} x = AA^T$.

We write $x \sim \text{MVN}(\mu, \Sigma)$ or just $x \sim N(\mu, \Sigma)$.

For any $\mu$ and any symmetric positive semidefinite matrix $\Sigma$, let $z$ be a vector of i.i.d. standard normals; then
$$\mu + \Sigma^{1/2} z \sim \text{MVN}(\mu, \Sigma).$$



If $x \sim \text{MVN}(\mu, \Sigma)$ and $\Sigma$ is $k \times k$ positive definite, then $x$ has the density
$$f(x) = \frac{1}{(2\pi)^{k/2} |\Sigma|^{1/2}} \, e^{-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)}.$$

Note that a symmetric positive definite matrix is necessarily invertible. Also, be aware that some authors require the covariance matrix to be positive definite, rather than just positive semi-definite.
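The density formula translates directly into R; a rough sketch (the function name dmvn is ours, chosen for illustration):

# direct transcription of the MVN density formula (illustrative only)
dmvn <- function(x, mu, Sigma) {
  k <- length(mu)
  d <- x - mu
  exp(-0.5 * t(d) %*% solve(Sigma) %*% d) / ((2 * pi)^(k / 2) * sqrt(det(Sigma)))
}

mu <- c(3, 1)
Sigma <- matrix(c(1, .8, .8, 1), 2, 2)
dmvn(c(3, 1), mu, Sigma)                 # density at the mean: 1 / (2 pi |Sigma|^{1/2})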



If $x \sim \text{MVN}(\mu, \Sigma)$ is $k \times 1$, $A$ is $n \times k$, and $b$ is $n \times 1$, then
$$y = Ax + b \sim \text{MVN}(A\mu + b, \, A \Sigma A^T).$$
To see why, put $x = \Sigma^{1/2} z + \mu$; then
$$y = A \Sigma^{1/2} z + A\mu + b.$$



If the random vector $z = (z_1, z_2)^T$ is multivariate normal, then $z_1$ and $z_2$ are independent if and only if they are uncorrelated.

In general, if $z_1$ and $z_2$ are normal random variables, $z = (z_1, z_2)^T$ does not have to be multivariate normal. Moreover, $z_1$ and $z_2$ can be uncorrelated but not independent.

For example, suppose that $z_1 \sim N(0, 1)$ and $u \sim U(-1, 1)$; then $z_2 = z_1 \operatorname{sign}(u) \sim N(0, 1)$, but $z = (z_1, z_2)^T$ is not multivariate normal. (Consider its support.) Moreover $z_1$ and $z_2$ are uncorrelated, but clearly dependent.
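This counterexample is easy to simulate; a quick sketch:

set.seed(42)
z1 <- rnorm(10000)
u  <- runif(10000, -1, 1)
z2 <- z1 * sign(u)            # marginally N(0, 1), but constructed from z1

cor(z1, z2)                   # close to 0: uncorrelated
plot(z1, z2)                  # all points lie on the two diagonals, so (z1, z2) is not MVN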



R example: multivariate normal

To generate a sample of size 100 with distribution
$$\text{MVN}\left( \begin{pmatrix} 3 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 & 0.8 \\ 0.8 & 1 \end{pmatrix} \right):$$

> library(MASS)
> a <- matrix(c(3, 1), 2, 1)
> V <- matrix(c(1, .8, .8, 1), 2, 2)
> y <- mvrnorm(100, mu = a, Sigma = V)
> plot(y[,1], y[,2])


[Figure: scatter plot of the simulated sample, y[,2] against y[,1]]


Alternatively, starting with standard normals:

> P <- eigen(V)$vectors
> sqrtV <- P %*% diag(sqrt(eigen(V)$values)) %*% t(P)
> z <- matrix(rnorm(200), 2, 100)
> y_new <- sqrtV %*% z + rep(a, 100)
> points(y_new[1,], y_new[2,], col = "red")


[Figure: the same scatter plot with the new sample y_new overlaid in red]


Random quadratic forms

Just as we can consider vectors and matrices to be composed of random variables, we can see what happens when these random vectors are combined into quadratic forms. The result is a function of random variables which is scalar (not a vector), and so it is itself a random variable.

Quadratic forms will pop up regularly in our analysis of linear models. To fully analyse our models, we will want to know the distribution of these forms, under the assumptions that we make on the distribution of the variables in the model.



Theorem
Let $y$ be a random vector with $E[y] = \mu$ and $\operatorname{var} y = V$, and let $A$ be a matrix of constants. Then
$$E[y^T A y] = \operatorname{tr}(AV) + \mu^T A \mu.$$




Example. Let $y$ be a $2 \times 1$ random vector with
$$\mu = \begin{pmatrix} 1 \\ 3 \end{pmatrix}, \quad V = \begin{pmatrix} 2 & 1 \\ 1 & 5 \end{pmatrix}.$$
Let
$$A = \begin{pmatrix} 4 & 1 \\ 1 & 2 \end{pmatrix}.$$
Consider the quadratic form
$$y^T A y = 4y_1^2 + 2 y_1 y_2 + 2 y_2^2.$$



The expectation of this form is
$$E[y^T A y] = 4 E[y_1^2] + 2 E[y_1 y_2] + 2 E[y_2^2].$$
From the definition of variance and the given covariance matrix,
$$\begin{aligned}
2 &= \operatorname{var} y_1 = E[y_1^2] - E[y_1]^2 = E[y_1^2] - 1 \\
5 &= \operatorname{var} y_2 = E[y_2^2] - E[y_2]^2 = E[y_2^2] - 9
\end{aligned}$$
so $E[y_1^2] = 3$ and $E[y_2^2] = 14$.



From the definition of covariance and the given covariance matrix,
$$1 = \operatorname{cov}(y_1, y_2) = E[y_1 y_2] - E[y_1] E[y_2] = E[y_1 y_2] - 3$$
so $E[y_1 y_2] = 4$. This gives
$$E[y^T A y] = 4 \cdot 3 + 2 \cdot 4 + 2 \cdot 14 = 48.$$



From the theorem,
$$\begin{aligned}
E[y^T A y] &= \operatorname{tr}(AV) + \mu^T A \mu \\
&= \operatorname{tr}\left( \begin{pmatrix} 4 & 1 \\ 1 & 2 \end{pmatrix} \begin{pmatrix} 2 & 1 \\ 1 & 5 \end{pmatrix} \right) + \begin{pmatrix} 1 & 3 \end{pmatrix} \begin{pmatrix} 4 & 1 \\ 1 & 2 \end{pmatrix} \begin{pmatrix} 1 \\ 3 \end{pmatrix} \\
&= \operatorname{tr} \begin{pmatrix} 9 & 9 \\ 4 & 11 \end{pmatrix} + \begin{pmatrix} 1 & 3 \end{pmatrix} \begin{pmatrix} 7 \\ 7 \end{pmatrix} \\
&= 9 + 11 + 7 + 21 = 48.
\end{aligned}$$
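We can also check this example by simulation; a rough sketch using mvrnorm from MASS (the sample size is an arbitrary choice):

library(MASS)
mu <- c(1, 3)
V  <- matrix(c(2, 1, 1, 5), 2, 2)
A  <- matrix(c(4, 1, 1, 2), 2, 2)

sum(diag(A %*% V)) + t(mu) %*% A %*% mu                # tr(AV) + mu' A mu = 48

y <- mvrnorm(100000, mu = mu, Sigma = V)
mean(apply(y, 1, function(yi) t(yi) %*% A %*% yi))     # sample mean of y'Ay, close to 48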



Noncentral χ² distribution

Definition
Let $y = (y_i)$ be a $k \times 1$ normally distributed random vector with mean $\mu$ and variance $I$. Then $x = y^T y = \sum_{i=1}^k y_i^2$ follows a noncentral $\chi^2$ distribution with $k$ degrees of freedom and noncentrality parameter $\lambda = \frac{1}{2} \mu^T \mu$. We write $x \sim \chi^2_{k, \lambda}$.

Warning: some authors define $\lambda$ to be $\mu^T \mu$.

Note that the distribution of $x$ depends on $\mu$ only through $\lambda$.
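A quick simulation sketch with an arbitrary mean vector. Note that R's dchisq and rchisq parameterise the noncentrality as ncp $= \mu^T \mu$, i.e. $2\lambda$ in our notation, which is why 2*lambda appears in the R example later in these notes.

mu <- c(1, 0, 2)                                  # k = 3, lambda = sum(mu^2)/2 = 2.5
x  <- replicate(100000, sum((rnorm(3) + mu)^2))   # y ~ N(mu, I), x = y'y

mean(x)                                           # close to k + 2*lambda = 8
hist(x, freq = FALSE, breaks = 50)
curve(dchisq(x, df = 3, ncp = sum(mu^2)), add = TRUE)   # R's ncp = mu'mu = 2*lambda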



Suppose $y \sim \text{MVN}(\mu, I_k)$ and $x = y^T y \sim \chi^2_{k, \lambda}$. Then
$$E[x] = \operatorname{tr}(I_k) + \mu^T \mu = k + 2\lambda.$$

The noncentrality parameter $\lambda = \frac{1}{2} \mu^T \mu$ is zero if and only if $\mu = 0$, in which case $x$ is just the sum of squares of $k$ i.i.d. standard normals. That is, $x$ has an ordinary (central) $\chi^2$ distribution with $k$ degrees of freedom.


[Figure: noncentral χ² densities with 4 degrees of freedom and λ = 0, 1, 2]


Theorem
Let $X^2_{k_1, \lambda_1}, X^2_{k_2, \lambda_2}, \ldots, X^2_{k_n, \lambda_n}$ be a collection of $n$ independent noncentral $\chi^2$ random variables, with $k_1, k_2, \ldots, k_n$ degrees of freedom respectively and noncentrality parameters $\lambda_1, \lambda_2, \ldots, \lambda_n$ respectively. Then
$$\sum_{i=1}^n X^2_{k_i, \lambda_i}$$
has a noncentral $\chi^2$ distribution with $\sum_{i=1}^n k_i$ degrees of freedom and noncentrality parameter $\sum_{i=1}^n \lambda_i$.

If we set $\lambda_i = 0$ for all $i$, we get the result that the sum of independent $\chi^2$ variables is another $\chi^2$ variable.



Distribution of quadratic forms

Theorem
Let $y$ be an $n \times 1$ normally distributed random vector with mean $\mu$ and variance $I$, and let $A$ be an $n \times n$ symmetric matrix. Then $y^T A y$ has a noncentral $\chi^2$ distribution with $k$ degrees of freedom and noncentrality parameter $\lambda = \frac{1}{2} \mu^T A \mu$ if and only if $A$ is idempotent and has rank $k$.




Corollary
Let $y$ be an $n \times 1$ normally distributed random vector with mean $0$ and variance $I$, and let $A$ be an $n \times n$ symmetric matrix. Then $y^T A y$ has an (ordinary) $\chi^2$ distribution with $k$ degrees of freedom if and only if $A$ is idempotent and has rank $k$.

Corollary
Let $y$ be an $n \times 1$ normally distributed random vector with mean $\mu$ and variance $\sigma^2 I$, and let $A$ be an $n \times n$ symmetric matrix. Then $\frac{1}{\sigma^2} y^T A y$ has a noncentral $\chi^2$ distribution with $k$ degrees of freedom and noncentrality parameter $\lambda = \frac{1}{2\sigma^2} \mu^T A \mu$ if and only if $A$ is idempotent and has rank $k$.



Example. Let $y_1$ and $y_2$ be independent normal random variables with means 3 and $-2$ respectively and common variance 1. Let
$$A = \frac{1}{2} \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}.$$
It is easy to verify that $A$ is symmetric and idempotent, and has rank 1. Therefore
$$y^T A y = \frac{1}{2} \begin{pmatrix} y_1 & y_2 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \frac{1}{2} y_1^2 + y_1 y_2 + \frac{1}{2} y_2^2$$
has a noncentral $\chi^2$ distribution with 1 degree of freedom and noncentrality parameter
$$\lambda = \frac{1}{2} \cdot \frac{1}{2} \begin{pmatrix} 3 & -2 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} 3 \\ -2 \end{pmatrix} = \frac{1}{4}.$$


What happens if y does not have variance I ?

Theorem
Let $y$ be an $n \times 1$ normal random vector with mean $\mu$ and variance $V$, and let $A$ be an $n \times n$ symmetric matrix. Then $y^T A y$ has a noncentral $\chi^2$ distribution with $k$ degrees of freedom and noncentrality parameter $\lambda = \frac{1}{2} \mu^T A \mu$ if and only if $AV$ is idempotent and has rank $k$.



Corollary
Let $y$ be an $n \times 1$ normal random vector with mean $0$ and variance $V$, and let $A$ be an $n \times n$ symmetric matrix. Then $y^T A y$ has an (ordinary) $\chi^2$ distribution with $k$ degrees of freedom if and only if $AV$ is idempotent and has rank $k$.

Corollary
Let $y$ be an $n \times 1$ normal random vector with mean $\mu$ and variance $V$ of full rank. Then $y^T V^{-1} y$ has a noncentral $\chi^2$ distribution with $n$ degrees of freedom and noncentrality parameter $\lambda = \frac{1}{2} \mu^T V^{-1} \mu$.



R example: noncentral chi-squared

Consider the quadratic form $y^T A y$ with
$$y \sim \text{MVN}\left( a = \begin{pmatrix} 3 \\ 1 \end{pmatrix}, \; V = \begin{pmatrix} 1 & 0.8 \\ 0.8 & 1 \end{pmatrix} \right), \quad A = \frac{1}{3.6} \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}.$$

> A <- matrix(1/3.6, 2, 2)
> A %*% V
     [,1] [,2]
[1,]  0.5  0.5
[2,]  0.5  0.5
> library(Matrix)
> (df <- rankMatrix(A %*% V)[1])
[1] 1
> (lambda <- t(a) %*% A %*% a / 2)
         [,1]
[1,] 2.222222


> quadform <- function(y, A) t(y) %*% A %*% y
> x <- apply(y, 1, quadform, A = A)
> mean(x)
[1] 5.198274
> df + 2*lambda
         [,1]
[1,] 5.444444
> hist(x, freq=F)
> curve(dchisq(x, df, 2*lambda), add = TRUE)


[Figure: histogram of x with the noncentral χ² density curve overlaid]


Example. Let $y_1$ and $y_2$ follow a multivariate normal distribution with means $-1$ and $4$ respectively, and covariance matrix
$$V = \begin{pmatrix} 3 & 2 \\ 2 & 2 \end{pmatrix}.$$
Then
$$V^{-1} = \frac{1}{3 \cdot 2 - 2 \cdot 2} \begin{pmatrix} 2 & -2 \\ -2 & 3 \end{pmatrix} = \begin{pmatrix} 1 & -1 \\ -1 & 3/2 \end{pmatrix},$$
and the quadratic form
$$y^T V^{-1} y = y_1^2 - 2 y_1 y_2 + \frac{3}{2} y_2^2$$
has a noncentral $\chi^2$ distribution with 2 degrees of freedom and noncentrality parameter
$$\lambda = \frac{1}{2} \begin{pmatrix} -1 & 4 \end{pmatrix} \begin{pmatrix} 1 & -1 \\ -1 & 3/2 \end{pmatrix} \begin{pmatrix} -1 \\ 4 \end{pmatrix} = \frac{33}{2}.$$



Independence of quadratic forms

Sometimes we will want to know when two quadratic forms are independent. The next theorem tells us when this happens.

Theorem
Let $y$ be an $n \times 1$ normal random vector with mean $\mu$ and variance $V$ of full rank, and let $A$ and $B$ be symmetric $n \times n$ matrices. Then $y^T A y$ and $y^T B y$ are independent if and only if
$$A V B = 0.$$




Example. Let $y_1$ and $y_2$ follow a multivariate normal distribution with covariance matrix
$$V = \begin{pmatrix} 1 & c \\ c & 1 \end{pmatrix}.$$
Consider the symmetric matrices
$$A = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \quad B = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}.$$
It is obvious that
$$y^T A y = y_1^2, \quad y^T B y = y_2^2.$$



Now these quadratic forms will be independent if and only if
$$AVB = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} 1 & c \\ c & 1 \end{pmatrix} \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & c \\ 0 & 0 \end{pmatrix} \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 0 & c \\ 0 & 0 \end{pmatrix}$$
is the 0 matrix. But this happens if and only if $c = 0$, i.e. if $y_1$ and $y_2$ have zero covariance.



Corollary
Let $y$ be a random normal vector with mean $\mu$ and variance $\sigma^2 I$, and let $A$ and $B$ be symmetric matrices. Then $y^T A y$ and $y^T B y$ are independent if and only if $AB = 0$.



Next we consider when a quadratic form is independent of a random vector. Firstly, we define a random variable to be independent of a random vector if and only if it is independent of all elements of that vector.

Theorem
Let $y$ be an $n \times 1$ normal random vector with mean $\mu$ and variance $V$, and let $A$ be an $n \times n$ symmetric matrix and $B$ an $m \times n$ matrix. Then $y^T A y$ and $B y$ are independent if and only if $BVA = 0$.

Lastly, we can combine several of the theorems we have seen before to tell when a group of quadratic forms (more than two) are independent.
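One classical instance of this theorem: for $y \sim N(\mu, \sigma^2 I)$, the sample mean $\bar{y} = By$ with $B = \frac{1}{n}\mathbf{1}^T$ is independent of the sum of squared deviations $y^T A y$ with $A = I - \frac{1}{n}J$ ($J$ the all-ones matrix), because $BVA = \sigma^2 BA = 0$. A rough simulation sketch (zero sample correlation is only a sanity check, not a proof of independence):

n <- 5
A <- diag(n) - matrix(1, n, n) / n       # centering matrix: y'Ay = sum((y_i - ybar)^2)
B <- matrix(1, 1, n) / n                 # By = ybar
B %*% A                                  # all zeros, so BVA = sigma^2 B A = 0

sims <- replicate(50000, {
  y <- rnorm(n, mean = 2)                # mean vector (2, ..., 2), variance I
  c(B %*% y, t(y) %*% A %*% y)
})
cor(sims[1, ], sims[2, ])                # near 0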



Theorem
Let $y$ be a normal random vector with mean $\mu$ and variance $I$, and let $A_1, A_2, \ldots, A_m$ be a collection of $m$ symmetric matrices. If any two of the following statements are true:

- All $A_i$ are idempotent;
- $\sum_{i=1}^m A_i$ is idempotent;
- $A_i A_j = 0$ for all $i \neq j$;

then so is the third, and

- For all $i$, $y^T A_i y$ has a noncentral $\chi^2$ distribution with $r(A_i)$ degrees of freedom and noncentrality parameter $\lambda_i = \frac{1}{2} \mu^T A_i \mu$;
- $y^T A_i y$ and $y^T A_j y$ are independent for $i \neq j$; and
- $\sum_{i=1}^m r(A_i) = r\left( \sum_{i=1}^m A_i \right)$.



When $\sum_i A_i = I$, the previous result can be seen as a special case of the following result (which we will not prove):

Theorem (Cochran-Fisher Theorem)
Let $y$ be an $n \times 1$ normal random vector with mean $\mu$ and variance $\sigma^2 I$. Decompose the sum of squares of $y/\sigma$ into the quadratic forms
$$\frac{1}{\sigma^2} y^T y = \sum_{i=1}^m \frac{1}{\sigma^2} y^T A_i y.$$
Then the quadratic forms are independent and have noncentral $\chi^2$ distributions with parameters $r(A_i)$ and $\frac{1}{2\sigma^2} \mu^T A_i \mu$, respectively, if and only if
$$\sum_{i=1}^m r(A_i) = n.$$
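As an illustration (a sketch; $n$ and the mean are arbitrary choices, with $\sigma^2 = 1$), take $A_1 = \frac{1}{n}J$ and $A_2 = I - \frac{1}{n}J$, so that $A_1 + A_2 = I$ and $r(A_1) + r(A_2) = 1 + (n - 1) = n$:

n  <- 6
A1 <- matrix(1, n, n) / n                # rank 1
A2 <- diag(n) - A1                       # rank n - 1; A1 + A2 = I

qr(A1)$rank + qr(A2)$rank                # equals n, so the theorem applies

mu <- rep(2, n)                          # constant mean, so mu' A2 mu = 0 and y'A2 y is central
x2 <- replicate(50000, {
  y <- rnorm(n, mean = 2)
  t(y) %*% A2 %*% y
})
mean(x2)                                 # close to n - 1 = 5, the mean of a central chi-square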



Example
> A <- matrix(1, 2, 2)
> B <- matrix(c(1,-1,-1,1), 2, 2)
> A %*% B
     [,1] [,2]
[1,]    0    0
[2,]    0    0
> y <- mvrnorm(200, c(0, 0), diag(c(2, 2)))
> x1 <- apply(y, 1, quadform, A = A)
> x2 <- apply(y, 1, quadform, A = B)
> cor(x1, x2)
[1] 0.0662571


> plot(x1, x2)

[Figure: scatter plot of x2 against x1]
