
Topic 3

Some Basic Foundations


3.2 Some Distribution Theory
References
Stock and Watson, Ch. 18
Wooldridge, Appendices D and E
Heij, Section 3.4
Johnston, Section 3.4 and Appendix B
Verbeek, Ch. 2 and Appendix B
Greene, Ch. 4, 5, Appendices A, B
Topic 3.2: Some Distribution Theory

Objective
- To examine the distribution theory underlying hypothesis
testing and interval estimation in the linear regression model.
Assumptions
y X e = |+
2
( ) 0 ( )
N
E e E ee I
'
= = o

2
(0, )
N
e N I o
N K
X

is nonstochastic of rank K
Note: Rank will be defined shortly. It implies there is no exact
collinearity and that X X
'
is nonsingular (its inverse exists).
The assumptions e normally distributed, and X nonstochastic will
be relaxed later.

I will give results in a general form and then relate them to the linear
regression model.
Normal Distribution Results
Let x ~ N(μ, V), where x and μ are (N×1) and V is (N×N).
[The joint distribution of the elements in x is
multivariate normal with mean vector μ and
covariance matrix V.]
Then, (1) the marginal distribution of a single element of x is
    x_i ~ N(μ_i, v_ii)
See POE4, Appendix B for a discussion of joint, marginal and
conditional distributions.

(2) The distribution of y = Ax, where A is (P×N) and x is (N×1), is
    y ~ N(Aμ, AVA')
We have seen the result about the mean and covariance matrix
before. What might be new is that linear functions of normal
random variables are also normal.
Where relevant
    b − β = (X'X)⁻¹X'e
Since e ~ N(0, σ²I_N), it follows that
(1)  b − β ~ N(0, σ²(X'X)⁻¹)  and  b ~ N(β, σ²(X'X)⁻¹)
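The sampling-distribution result b ~ N(β, σ²(X'X)⁻¹) can be checked numerically. A minimal simulation sketch in NumPy (the design matrix, coefficient values, and seed are illustrative choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 200, 3
sigma2 = 4.0

# Fixed (nonstochastic) design matrix with an intercept column.
X = np.column_stack([np.ones(N), rng.uniform(0, 10, size=(N, 2))])
beta = np.array([1.0, 0.5, -0.25])

# Theoretical covariance matrix of b: sigma^2 (X'X)^{-1}.
cov_b = sigma2 * np.linalg.inv(X.T @ X)

# Simulate the sampling distribution of b = (X'X)^{-1} X'y.
draws = []
for _ in range(5000):
    e = rng.normal(0.0, np.sqrt(sigma2), size=N)
    y = X @ beta + e
    draws.append(np.linalg.solve(X.T @ X, X.T @ y))
draws = np.array(draws)

print(draws.mean(axis=0))  # close to beta
print(np.cov(draws.T))     # close to cov_b
```

Holding X fixed across replications matches the nonstochastic-X assumption above; only e is redrawn.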

(2)  b_k ~ N(β_k, σ²a_kk)  and  (b_k − β_k)/(σ√a_kk) ~ N(0, 1)
where a_kk is the k-th diagonal element of (X'X)⁻¹.
(3)  Rb ~ N(Rβ, σ²R(X'X)⁻¹R')
where R is (J×K) with J < K.
Some examples of R:
(a) A single coefficient.
(b) Returns to scale in a Cobb-Douglas production function.
(c) (b) and the equality of 2 input elasticities.
(d) All coefficients except the intercept.

Note how R picks up the correct elements of the mean vector
and the covariance matrix.
Chi-square distribution results
Definition of a chi-square random variable.
If z ~ N(0, I_N), where z is (N×1), then z'z ~ χ²(N).
Alternatively, if z_1, z_2, …, z_N are independent standard normal
random variables, then Σ_{i=1}^N z_i² has a chi-square distribution
with N degrees of freedom:
    Σ_{i=1}^N z_i² ~ χ²(N)
Also, E(z'z) = N and var(z'z) = 2N.


Theorem
If x ~ N(μ, V), where x is (N×1) and V is positive definite
(nonsingular), then
    (x − μ)'V⁻¹(x − μ) ~ χ²(N)
To prove this result we need to digress and consider some results
from matrix algebra.


Digression on matrix algebra
Diagonalization of a square symmetric matrix
Let A be an (N×N) symmetric matrix. Then there exists an orthogonal
matrix P such that
    P'AP = Λ
where Λ is a diagonal matrix containing the eigenvalues of A.
An orthogonal matrix P is such that P'P = I, and hence P⁻¹ = P'
and PP' = I.
The columns of P are called the eigenvectors of A.
Eigenvectors and eigenvalues are also called characteristic
vectors and characteristic roots.
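The diagonalization P'AP = Λ can be verified directly with NumPy's symmetric eigendecomposition (the matrix below is an arbitrary illustrative example):

```python
import numpy as np

rng = np.random.default_rng(1)

# Build an arbitrary (4x4) square symmetric matrix A.
B = rng.normal(size=(4, 4))
A = B + B.T

# eigh returns the eigenvalues and an orthogonal matrix P
# whose columns are the eigenvectors of A.
eigvals, P = np.linalg.eigh(A)
Lam = np.diag(eigvals)

print(np.allclose(P.T @ A @ P, Lam))    # P'AP = Lambda
print(np.allclose(P.T @ P, np.eye(4)))  # P'P = I (P orthogonal)
```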

Rank of a matrix
The rank of a matrix is defined as the maximum number of
linearly independent columns (or rows) in the matrix.
See matrices for 470 in 2010.pdf slides 22-29 for more detail
and some examples. [Other references will also have details.]
    rank ≤ min{no. of rows, no. of columns}
We will mainly be concerned with the rank of a square
symmetric matrix which is either positive definite or positive
semidefinite. [positive semidefinite = nonnegative definite]

An (N×N) matrix with rank N is said to be of full rank. It is
nonsingular.
An (N×N) matrix with rank less than N is said to be of
reduced rank. It is singular.
If a matrix is positive definite, it is of full rank.
If a matrix is positive semidefinite, and not positive definite, it is
of reduced rank.
The eigenvalues of a positive definite matrix are all positive.
The eigenvalues of a positive semidefinite matrix are positive or
zero.

The number of nonzero eigenvalues of a square symmetric matrix
is equal to its rank.
Thus,
If a matrix is positive definite, then (a) it is nonsingular, (b) it is
of full rank, and (c) all its eigenvalues are positive.
If a matrix is positive semidefinite, and not positive definite, then
(a) it is singular, (b) it is not of full rank, and (c) some of its
eigenvalues are positive, and some are zero; the rank of the
matrix is the number of nonzero eigenvalues.
[For more details, Johnston and Dinardo, Appendix A is good.]



Where relevant:
Covariance matrices are positive semidefinite. They are also
positive definite unless one of the random variables is a linear
combination of the others.
Matrices like X'X and X'V⁻¹X are positive definite unless we
have exact collinearity.


The square root of a positive definite matrix
Return to the diagonalization of a square symmetric matrix:
Let A be an (N×N) symmetric matrix. Then there exists an orthogonal
matrix P such that
    P'AP = Λ
Now suppose A is positive definite. Then the diagonal elements
of Λ are positive.
Define Λ^(1/2) as the matrix obtained by taking the square roots of
the diagonal elements of Λ. Then Λ = Λ^(1/2)Λ^(1/2), and
    P'AP = Λ^(1/2)Λ^(1/2)


    P'AP = Λ^(1/2)Λ^(1/2)
    PP'APP' = PΛ^(1/2)Λ^(1/2)P'
    A = L'L
where L = Λ^(1/2)P'. We have shown that for any positive definite
matrix A, we can always find a matrix L such that A = L'L. We
can think of L as the matrix square root of A.
L = Λ^(1/2)P' is not the only matrix with this property. There are
other matrices, H say, such that A = H'H. One that is commonly used is
the Cholesky decomposition of A. Most software has a function
for computing the Cholesky decomposition.
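Both square roots can be computed in NumPy. A sketch (the matrix A is an illustrative positive definite example; note that NumPy's `cholesky` returns a lower-triangular factor C with A = CC', so its transpose plays the role of H in A = H'H):

```python
import numpy as np

rng = np.random.default_rng(2)

# Build a positive definite matrix A = B'B (B of full column rank).
B = rng.normal(size=(6, 4))
A = B.T @ B

# Square root via diagonalization: L = Lambda^(1/2) P', so that A = L'L.
eigvals, P = np.linalg.eigh(A)
L = np.diag(np.sqrt(eigvals)) @ P.T
print(np.allclose(L.T @ L, A))  # True

# Cholesky factor: numpy returns lower-triangular C with A = CC',
# so H = C' satisfies A = H'H as in the slides.
C = np.linalg.cholesky(A)
H = C.T
print(np.allclose(H.T @ H, A))  # True
```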

Another result:
    P'AP = Λ^(1/2)Λ^(1/2)
    Λ^(-1/2)P'APΛ^(-1/2) = Λ^(-1/2)Λ^(1/2)Λ^(1/2)Λ^(-1/2)
    QAQ' = I
where Q = Λ^(-1/2)P'.
For any positive definite matrix A, there exists a matrix Q such
that QAQ' = I.
In all the above relationships, Q, H, L and P are all nonsingular
(of full rank).
End of digression.

We can now prove that:
If x ~ N(μ, V), where x is (N×1) and V is positive definite
(nonsingular), then
    (x − μ)'V⁻¹(x − μ) ~ χ²(N)
If V is positive definite, then V⁻¹ is positive definite.
(We can prove this result by showing that the eigenvalues of V⁻¹
are the reciprocals of the eigenvalues of V.)
Then, there exists a matrix Q such that V⁻¹ = Q'Q. Thus,
    (x − μ)'V⁻¹(x − μ) = (x − μ)'Q'Q(x − μ) = y'y
where y = Q(x − μ).

The mean and covariance matrix of y = Q(x − μ) are
    E(y) = E[Q(x − μ)] = Q[E(x) − μ] = 0
    cov(y) = E(yy') = E[Q(x − μ)(x − μ)'Q']
           = QE[(x − μ)(x − μ)']Q'
           = Q cov(x) Q'
           = QVQ'
           = I
The last line follows because V⁻¹ = Q'Q implies V = (Q'Q)⁻¹
and hence
    QVQ' = Q(Q'Q)⁻¹Q' = QQ⁻¹(Q')⁻¹Q' = I

Now, y is normally distributed because x is normally distributed
and y = Q(x − μ) is a linear transformation of x.
Thus, y ~ N(0, I), and
    y'y = (x − μ)'V⁻¹(x − μ) ~ χ²(N)
Where relevant
(1) Since b ~ N(β, σ²(X'X)⁻¹), it follows that
    (b − β)'X'X(b − β)/σ² ~ χ²(K)
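The theorem can be checked by Monte Carlo: repeated draws of the quadratic form should have the χ²(N) mean N and variance 2N. A sketch (the mean vector and covariance matrix are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)
N = 3

# An illustrative positive definite covariance matrix V and mean mu.
B = rng.normal(size=(N, N))
V = B @ B.T + N * np.eye(N)
mu = np.array([1.0, -2.0, 0.5])

Vinv = np.linalg.inv(V)
C = np.linalg.cholesky(V)  # V = CC', used to draw x ~ N(mu, V)

# Monte Carlo: q = (x - mu)' V^{-1} (x - mu) should be chi-square(N).
q = np.empty(20000)
for i in range(q.size):
    x = mu + C @ rng.normal(size=N)
    d = x - mu
    q[i] = d @ Vinv @ d

print(q.mean())  # approximately N, the chi-square mean
print(q.var())   # approximately 2N, the chi-square variance
```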


(2) Since Rb ~ N(Rβ, σ²R(X'X)⁻¹R'), where R is (J×K), it follows that
    (Rb − Rβ)'[R(X'X)⁻¹R']⁻¹(Rb − Rβ)/σ² ~ χ²(J)
(We assume there are no redundant restrictions, which implies R is
of rank J and R(X'X)⁻¹R' has full rank.)
Note: The Wald test statistic for testing H₀: Rβ = r is given by
    W = (Rb − r)'[R(X'X)⁻¹R']⁻¹(Rb − r)/σ̂²
When H₀ is true, W converges in distribution to χ²(J).

Notes:
1. The above results (1) and (2) could be used for interval
estimation and testing hypotheses about β if σ² were known. It is,
of course, unknown.
2. One solution to this dilemma is to replace σ² by its estimate σ̂².
Doing so leads to the Wald statistic, which has the chi-square
distribution asymptotically (in large samples).
3. Another solution is to derive a slightly modified statistic that has
the F-distribution in finite samples.
We will now proceed in that direction.


Theorem
If z ~ N(0, I_N), where z is (N×1), and A is an idempotent matrix of
rank G, then
    z'Az ~ χ²(G)

Another matrix result is required to prove this theorem.



Another digression on matrix algebra
If A is an idempotent matrix, its eigenvalues are 1 or 0, and
rank(A) = trace(A).
Proof:
If P is the orthogonal matrix that diagonalizes A, we have
    P'AP = Λ
    P'APP'AP = ΛΛ
    P'AAP = ΛΛ
    P'AP = ΛΛ
Thus, Λ = ΛΛ.

If each eigenvalue is such that λ_i = λ_i², then λ_i = 0 or 1.
Now,
    tr(Λ) = tr(P'AP) = tr(PP'A) = tr(A)
This result is true in general, not just for idempotent A.
When A is idempotent,
    tr(A) = tr(Λ)
          = the number of eigenvalues of A that are equal to 1
          = the number of nonzero eigenvalues of A
          = the rank of A

End of digression
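These properties are easy to confirm numerically for the idempotent matrix that appears later in this topic, M = I − X(X'X)⁻¹X'. A sketch (dimensions and data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
N, K = 10, 3

# The residual-maker matrix M = I - X(X'X)^{-1}X' is symmetric idempotent.
X = rng.normal(size=(N, K))
M = np.eye(N) - X @ np.linalg.solve(X.T @ X, X.T)

print(np.allclose(M @ M, M))     # idempotent: MM = M
eigvals = np.linalg.eigvalsh(M)
print(np.round(eigvals, 8))      # each eigenvalue is 0 or 1
print(np.trace(M))               # trace = N - K
print(np.linalg.matrix_rank(M))  # rank = N - K as well
```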

Return to the theorem
If z ~ N(0, I_N), where z is (N×1), and A is an idempotent matrix of
rank G, then
    z'Az ~ χ²(G)
Proof:
Let P be the orthogonal matrix that diagonalizes A,
and let z = Px. Then,
    z'Az = x'P'APx = x'Λx = Σ_{i=1}^G x_i²
where Λ = [I_G 0; 0 0] (the G unit eigenvalues ordered first).
If we can show that x ~ N(0, I), the result follows.


    x = P'z
    E(x) = P'E(z) = 0
    cov(x) = P' cov(z) P = P'I_N P = I
Thus, x ~ N(0, I) and
    z'Az = Σ_{i=1}^G x_i² ~ χ²(G)


Where relevant
In the model y = Xβ + e, e ~ N(0, σ²I_N),
we previously showed that the least squares residuals
    ê = y − Xb
were such that
    ê'ê = e'Me = e'[I_N − X(X'X)⁻¹X']e
where M = I_N − X(X'X)⁻¹X' is an idempotent matrix with trace N − K.
Since e ~ N(0, σ²I_N), it follows that e/σ ~ N(0, I_N), and
    ê'ê/σ² = (e/σ)'M(e/σ) ~ χ²(N − K)
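The distribution of ê'ê/σ² can also be checked by simulation: its draws should have the χ²(N − K) mean N − K and variance 2(N − K). A sketch with illustrative dimensions:

```python
import numpy as np

rng = np.random.default_rng(8)
N, K = 20, 3
sigma2 = 2.0

X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
M = np.eye(N) - X @ np.linalg.solve(X.T @ X, X.T)

# Monte Carlo: e'Me / sigma^2 = ehat'ehat / sigma^2 ~ chi-square(N - K).
q = np.empty(20000)
for i in range(q.size):
    e = rng.normal(0.0, np.sqrt(sigma2), size=N)
    ehat = M @ e
    q[i] = ehat @ ehat / sigma2

print(q.mean())  # approximately N - K
print(q.var())   # approximately 2(N - K)
```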


F-Distribution
An F random variable is defined as the ratio of two independent
chi-square random variables, each divided by its degrees of
freedom:
    F = (χ²₁/df₁) / (χ²₂/df₂)
The two chi-square random variables of interest to us are
    (Rb − Rβ)'[R(X'X)⁻¹R']⁻¹(Rb − Rβ)/σ² ~ χ²(J)
and
    ê'ê/σ² ~ χ²(N − K)


Their ratio, after dividing each by its degrees of freedom, is
    F = {(Rb − Rβ)'[R(X'X)⁻¹R']⁻¹(Rb − Rβ)/(σ²J)} / {ê'ê/[σ²(N − K)]}
      = (Rb − Rβ)'[R(X'X)⁻¹R']⁻¹(Rb − Rβ)/(J σ̂²)
      ~ F(J, N − K)
We have eliminated the unknown σ². Note that σ̂² = ê'ê/(N − K).
However, we still need to prove that the two chi-square random
variables are independent.

The two quadratic forms in b and ê will be independent if b and
ê are independent.
Since b and ê are normally distributed, they will be independent
if the matrix containing the covariances between all their
elements is a zero matrix.
Using the results b − β = (X'X)⁻¹X'e and ê = [I − X(X'X)⁻¹X']e,
the covariances between the elements in b and those in ê are
given by
    E[(b − β)ê'] = (X'X)⁻¹X' E(ee') [I − X(X'X)⁻¹X']
                 = σ²[(X'X)⁻¹X' − (X'X)⁻¹X'X(X'X)⁻¹X']
                 = 0


Testing hypotheses with 1 or more linear constraints
    F = (Rb − Rβ)'[R(X'X)⁻¹R']⁻¹(Rb − Rβ)/J / [ê'ê/(N − K)] ~ F(J, N − K)
Consider testing H₀: Rβ = r against H₁: Rβ ≠ r. When H₀ is true,
    (Rb − r)'[R(X'X)⁻¹R']⁻¹(Rb − r)/J / [ê'ê/(N − K)] ~ F(J, N − K)
and
    (Rb − r)'[R(X'X)⁻¹R']⁻¹(Rb − r) / [ê'ê/(N − K)] →d χ²(J)
These are the two test statistics given by EViews.
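Both statistics are straightforward to compute from first principles. A sketch (the data, restriction matrix, and seed are illustrative, not the example from the slides; the restrictions are chosen to hold in the simulated data so that H₀ is true):

```python
import numpy as np

rng = np.random.default_rng(5)
N, K, J = 100, 4, 2

# Simulated data whose true beta satisfies the restrictions below.
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
beta = np.array([1.0, 0.5, 0.5, -1.0])
y = X @ beta + rng.normal(size=N)

# Illustrative restrictions: beta_2 = beta_3 and beta_4 = -1.
R = np.array([[0.0, 1.0, -1.0, 0.0],
              [0.0, 0.0,  0.0, 1.0]])
r = np.array([0.0, -1.0])

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
ehat = y - X @ b
sigma2_hat = ehat @ ehat / (N - K)

# Wald form: W = (Rb - r)'[R(X'X)^{-1}R']^{-1}(Rb - r) / sigma2_hat,
# asymptotically chi-square(J) under H0; F = W / J is F(J, N - K).
d = R @ b - r
W = d @ np.linalg.solve(R @ XtX_inv @ R.T, d) / sigma2_hat
F = W / J
print(F, W)
```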

Example


We will consider part (c).
    H₀: Rβ = r    H₁: Rβ ≠ r
with
    R = [0 1 −1 0; 0 1 1 1],  β = (β₁, β₂, β₃, β₄)',  r = (0, 1)'
so that
    H₀: β₂ − β₃ = 0 and β₂ + β₃ + β₄ = 1






Testing whether all coefficients except the intercept are zero
In this case
    R = [0  I_{K−1}]  and  r = 0.
In terms of the notation in Tutorial 6, QA1,
    H₀: β_s = 0
It can be shown that
    (Rb − r)'[R(X'X)⁻¹R']⁻¹(Rb − r)/(K − 1) / [ê'ê/(N − K)]
        = b_s'X_s'DX_s b_s / [(K − 1)σ̂²]
When H₀ is true,
    b_s'X_s'DX_s b_s / [(K − 1)σ̂²] ~ F(K − 1, N − K)


Now, from Tutorial 6,
    b_s'X_s'DX_s b_s = y'Dy − ê'ê = Σ_{i=1}^N (y_i − ȳ)² − Σ_{i=1}^N ê_i²
which is the same as
    regression sum of squares = total sum of squares − sum of squared errors
    SSR = TSS − SSE
This F-statistic can be written as
    F = [(TSS − SSE)/(K − 1)] / [SSE/(N − K)] ~ F(K − 1, N − K)
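The overall F-statistic is easy to compute from the two sums of squares. A sketch on illustrative simulated data (it also checks the algebraically equivalent R² form, since R² = 1 − SSE/TSS):

```python
import numpy as np

rng = np.random.default_rng(6)
N, K = 60, 3

X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
beta = np.array([2.0, 1.0, -0.5])
y = X @ beta + rng.normal(size=N)

b = np.linalg.solve(X.T @ X, X.T @ y)
ehat = y - X @ b

SSE = ehat @ ehat
TSS = ((y - y.mean()) ** 2).sum()

# Overall F test: all slope coefficients zero.
F = ((TSS - SSE) / (K - 1)) / (SSE / (N - K))

# Equivalent form in terms of R^2 (divide numerator and
# denominator of F by TSS).
R2 = 1 - SSE / TSS
F_from_R2 = (R2 / (K - 1)) / ((1 - R2) / (N - K))
print(F, R2)
```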



It can be found in the EViews output as follows:


Testing hypotheses with 1 linear constraint: the t-distribution
Reconsider the example and look at part (b).
    R = [0 1 1 1],  β = (β₁, β₂, β₃, β₄)',  r = 1
When doing this test, we get a t-value as well as the F and χ²
values. How is it calculated?



In this case (J = 1) we can write
    F = (Rb − Rβ)'[R(X'X)⁻¹R']⁻¹(Rb − Rβ)/J / [ê'ê/(N − K)]
      = (Rb − Rβ)² / [σ̂² R(X'X)⁻¹R']
      = [ (Rb − Rβ) / (σ̂ √(R(X'X)⁻¹R')) ]²
We will see that
    (Rb − Rβ) / (σ̂ √(R(X'X)⁻¹R')) ~ t(N − K)
and hence that F(1, N − K) = t²(N − K).

Definition
The ratio of a N(0,1) random variable to the square root of an
independent χ²(G) random variable divided by its degrees of freedom
is a t(G) random variable (a t random variable with G degrees of
freedom):
    t(G) = N(0,1) / √(χ²(G)/G)
When J = 1,
    Rb ~ N(Rβ, σ²R(X'X)⁻¹R')
and so
    (Rb − Rβ) / (σ √(R(X'X)⁻¹R')) ~ N(0, 1)


    (Rb − Rβ) / (σ √(R(X'X)⁻¹R')) ~ N(0, 1)
    (N − K)σ̂²/σ² = ê'ê/σ² ~ χ²(N − K)
Hence
    t = [ (Rb − Rβ) / (σ √(R(X'X)⁻¹R')) ] / √[ ((N − K)σ̂²/σ²) / (N − K) ]
      = (Rb − Rβ) / (σ̂ √(R(X'X)⁻¹R')) ~ t(N − K)
When H₀: Rβ = r is true,
    t = (Rb − r) / (σ̂ √(R(X'X)⁻¹R')) ~ t(N − K)


    t = (Rb − r) / (σ̂ √(R(X'X)⁻¹R'))
is the t-value computed by EViews when doing the Wald test.
Using the 3rd coefficient as an example, a routine case is where
R = [0 0 1 0] and r = 0, in which case we have
    t = b₃ / (σ̂ √a₃₃)
where a₃₃ is the 3rd diagonal element of (X'X)⁻¹.
These are the values given routinely in the EViews output.
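The routine t-value and its general R-form give identical answers, which is easy to confirm on illustrative data:

```python
import numpy as np

rng = np.random.default_rng(7)
N, K = 50, 4

X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
beta = np.array([1.0, 0.0, 2.0, -1.0])
y = X @ beta + rng.normal(size=N)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
ehat = y - X @ b
sigma_hat = np.sqrt(ehat @ ehat / (N - K))

# t-value for the 3rd coefficient: t = b_3 / (sigma_hat * sqrt(a_33)),
# where a_33 is the 3rd diagonal element of (X'X)^{-1}.
k = 2  # zero-based index of the 3rd coefficient
t = b[k] / (sigma_hat * np.sqrt(XtX_inv[k, k]))

# The same value via R = [0 0 1 0] and r = 0.
R = np.zeros(K)
R[k] = 1.0
t_general = (R @ b - 0.0) / (sigma_hat * np.sqrt(R @ XtX_inv @ R))
print(t, t_general)
```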

The general F test written as the difference between restricted and
unrestricted sums of squared errors:
    F = (Rb − r)'[R(X'X)⁻¹R']⁻¹(Rb − r)/J / [ê'ê/(N − K)]
      = [(SSE* − SSE)/J] / [SSE/(N − K)]
where SSE* is the sum of squared errors from the model estimated
under the restriction that Rβ = r, and SSE is the sum of squared
errors from the unrestricted model.
For a proof, see Johnston and Dinardo, pp. 90-99, or Greene, pp. 83-92.
We have already seen how this is true for the special case
H₀: β_s = 0.
We will demonstrate by example for another case.


The restricted model for part (c) where
2 3
| = | and
2 3 4
1 | +| +| = is

1 2
2
ln ln
PROD AREA LABOR
e
FERT FERT

| | | |
= | +| +
| |
\ . \ .

Its EViews output follows


    SSE* = 40.60791
From slide 36, SSE = 40.56536.
    F = [(40.60791 − 40.56536)/2] / [40.56536/(352 − 4)] = 0.1825
which agrees with the result on slide 33.
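The arithmetic above can be reproduced in a few lines using the sums of squared errors reported on the slides:

```python
# Restricted-vs-unrestricted F test using the values from the slides.
SSE_r = 40.60791   # restricted model (two restrictions imposed)
SSE_u = 40.56536   # unrestricted model
J = 2              # number of restrictions
N, K = 352, 4      # observations and unrestricted coefficients

F = ((SSE_r - SSE_u) / J) / (SSE_u / (N - K))
print(round(F, 4))  # 0.1825
```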
