
MATH2831/2931

Linear Models / Higher Linear Models.


August 14, 2013
Week 3 Lecture 2 - Last lecture:

Least squares estimation of parameters in the general linear model.
The full rank linear model.
Example - Polynomial regression.
Example 2 - Basis function regressions.


Week 3 Lecture 2 - This lecture:

Properties of the least squares estimator of $\beta$.


Week 3 Lecture 2 - Formulation of the general linear model
Responses $y_1, \dots, y_n$.

The general linear model:
\[
y_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik} + \varepsilon_i
\]
where $\beta_0, \dots, \beta_k$ are unknown parameters and $\varepsilon_i$, $i = 1, \dots, n$, are a collection of uncorrelated errors with zero mean and common variance $\sigma^2$.

\[
\begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}
=
\begin{bmatrix}
1 & x_{11} & \dots & x_{1k} \\
\vdots & \vdots & & \vdots \\
1 & x_{n1} & \dots & x_{nk}
\end{bmatrix}
\begin{bmatrix} \beta_0 \\ \vdots \\ \beta_k \end{bmatrix}
+
\begin{bmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{bmatrix}
\]
or $y = X\beta + \varepsilon$.
Week 3 Lecture 2 - Full rank linear model
The least squares estimate $b$ of $\beta$ (provided $(X'X)^{-1}$ exists) is
\[
b = (X'X)^{-1} X' y.
\]
For the simple linear regression model, we derived expressions for
$E(b_0)$, $E(b_1)$, $\mathrm{Var}(b_0)$, $\mathrm{Var}(b_1)$, $\mathrm{Cov}(b_0, b_1)$.
For the least squares estimator $b$ of $\beta$, we want similar results.

Expectations of random vectors:
if $Y = (Y_1, \dots, Y_n)'$ is a random vector with $E(Y_i) = \mu_i$, then
\[
E(Y) = \mu = (\mu_1, \dots, \mu_n)'.
\]
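As a concrete aside (not part of the original slides), here is a minimal numpy sketch of the formula $b = (X'X)^{-1}X'y$; the small data set and variable names are invented purely for illustration.

```python
import numpy as np

# Hypothetical data: n = 5 observations, k = 2 predictors.
X = np.array([[1.0, 2.0, 0.5],
              [1.0, 1.0, 1.5],
              [1.0, 3.0, 2.0],
              [1.0, 4.0, 0.0],
              [1.0, 2.5, 1.0]])   # first column of ones for the intercept
y = np.array([3.1, 2.9, 5.2, 4.8, 3.9])

# b = (X'X)^{-1} X'y; solving the normal equations is preferred
# numerically to forming the inverse explicitly.
b = np.linalg.solve(X.T @ X, X.T @ y)
print(b)   # least squares estimates b_0, b_1, b_2
```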
Week 3 Lecture 2 - Result about vector expectations
Lemma:

i) If $a$ is a $k \times 1$ vector of constants, then $E(a) = a$.
ii) If $a$ is a $k \times 1$ vector of constants, and $Y$ is a $k \times 1$ random vector with $E(Y) = \mu$, then $E(a'Y) = a'\mu$.
iii) If $A$ is an $n \times k$ matrix, and $Y$ is a $k \times 1$ random vector with $E(Y) = \mu$, then $E(AY) = A\mu$.
Week 3 Lecture 2 - Vector expectations
Proof of i) is obvious.

Proof of ii):
\[
a'Y = \sum_{i=1}^{k} a_i Y_i
\]
and hence
\[
E(a'Y) = E\left(\sum_{i=1}^{k} a_i Y_i\right)
= \sum_{i=1}^{k} a_i E(Y_i)
= \sum_{i=1}^{k} a_i \mu_i
= a'\mu.
\]
Week 3 Lecture 2 - Vector expectations


Proof of iii):
\[
(AY)_i = \sum_{m=1}^{k} A_{im} Y_m
\]
and so
\[
E((AY)_i) = E\left(\sum_{m=1}^{k} A_{im} Y_m\right)
= \sum_{m=1}^{k} A_{im} E(Y_m)
= \sum_{m=1}^{k} A_{im} \mu_m
= (A\mu)_i.
\]
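A quick numerical sanity check of part iii), illustrative only; the matrix A, the mean vector and the noise distribution below are arbitrary choices, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

A = np.array([[1.0, 2.0, -1.0],
              [0.0, 1.0,  3.0]])          # an n x k = 2 x 3 matrix of constants
mu = np.array([1.0, -2.0, 0.5])           # E(Y)

# Simulate many Y vectors with mean mu and average the transformed vectors AY.
Y = mu + rng.normal(size=(100_000, 3))    # each row is an independent draw of Y'
AY_mean = (Y @ A.T).mean(axis=0)          # Monte Carlo estimate of E(AY)

print(AY_mean)    # approximately A @ mu
print(A @ mu)     # exact value given by the lemma
```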
Week 3 Lecture 2 - Unbiasedness of b
Exercise: apply the lemma just proved to show that $b$, the least squares estimator of $\beta$, is unbiased.

We have
\[
b = (X'X)^{-1} X' y,
\]
so, applying our lemma,
\[
E(b) = (X'X)^{-1} X' E(y) = (X'X)^{-1} X' X\beta = \beta,
\]
so that the least squares estimator is unbiased.

We would also like some measure of the precision of $b$. What are the variances of the components of $b$? Covariances?
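The exercise can also be checked by simulation. A minimal sketch, assuming a small invented design matrix, true $\beta$ and $\sigma$ (none of these values come from the slides): averaging $b$ over many simulated error vectors should recover $\beta$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented design: intercept plus two predictors, n = 50.
X = np.column_stack([np.ones(50),
                     rng.uniform(0, 10, size=50),
                     rng.uniform(-1, 1, size=50)])
beta = np.array([2.0, 0.5, -1.0])        # hypothetical "true" parameters
sigma = 1.5

# Repeatedly generate y = X beta + eps and compute b each time.
XtX_inv_Xt = np.linalg.solve(X.T @ X, X.T)
b_draws = []
for _ in range(5000):
    y = X @ beta + rng.normal(0, sigma, size=50)
    b_draws.append(XtX_inv_Xt @ y)

print(np.mean(b_draws, axis=0))   # close to beta, illustrating E(b) = beta
```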
Week 3 Lecture 2 - Covariance matrices
The covariance matrix (sometimes called the variance-covariance matrix) of a random vector:
if $Y$ is a $k \times 1$ random vector, the covariance matrix of $Y$, denoted $\mathrm{Var}(Y)$, is the $k \times k$ matrix with element $(i, j)$ equal to $\mathrm{Cov}(Y_i, Y_j)$.

We have seen that the expected value of a random matrix $A$ is the matrix with the same dimensions as $A$ and with element $(i, j)$ equal to $E(A_{ij})$. Hence we can write
\[
\mathrm{Var}(Y) = E\left((Y - \mu)(Y - \mu)'\right).
\]
The diagonal elements of $\mathrm{Var}(Y)$ are the variances of the components of $Y$, and $\mathrm{Var}(Y)$ is symmetric (since $\mathrm{Cov}(Y_i, Y_j) = \mathrm{Cov}(Y_j, Y_i)$).
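The identity $\mathrm{Var}(Y) = E((Y - \mu)(Y - \mu)')$ can be illustrated numerically. In this sketch the mean vector and covariance matrix are arbitrary choices, not anything from the slides.

```python
import numpy as np

rng = np.random.default_rng(2)

mu = np.array([0.0, 1.0, -1.0])
V = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.3],
              [0.0, 0.3, 1.5]])          # a symmetric, positive definite covariance

Y = rng.multivariate_normal(mu, V, size=200_000)   # rows are draws of Y'
Yc = Y - mu                                        # Y - mu for each draw

# Monte Carlo estimate of E((Y - mu)(Y - mu)'): average of the outer products.
V_hat = (Yc.T @ Yc) / Y.shape[0]
print(np.round(V_hat, 2))   # approximately V: symmetric, variances on the diagonal
```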
Week 3 Lecture 2 - Results about variances and covariance matrices

Lemma:

i) Let $Y$ be a $k \times 1$ random vector with $\mathrm{Var}(Y) = V$. If $a$ is a $k \times 1$ vector of real numbers, then
\[
\mathrm{Var}(a'Y) = a'Va.
\]
ii) Let $Y$ be a $k \times 1$ random vector with $\mathrm{Var}(Y) = V$. Let $A$ be a $k \times k$ matrix. If $Z = AY$, then
\[
\mathrm{Var}(Z) = AVA'.
\]
Week 3 Lecture 2 - Covariance matrices
Proof of i):
\[
\begin{aligned}
\mathrm{Var}(a'Y) &= \mathrm{Var}\left(\sum_{i=1}^{k} a_i Y_i\right)
= E\left[\left(\sum_{i=1}^{k} a_i Y_i - \sum_{i=1}^{k} a_i \mu_i\right)^2\right] \\
&= E\left[\left(\sum_{i=1}^{k} a_i (Y_i - \mu_i)\right)^2\right] \\
&= E\left[\sum_{i=1}^{k} \sum_{j=1}^{k} a_i a_j (Y_i - \mu_i)(Y_j - \mu_j)\right] \\
&= \sum_{i=1}^{k} \sum_{j=1}^{k} a_i a_j E\left((Y_i - \mu_i)(Y_j - \mu_j)\right) \\
&= \sum_{i=1}^{k} \sum_{j=1}^{k} a_i a_j V_{ij} = a'Va.
\end{aligned}
\]
Week 3 Lecture 2 - Covariance matrices
Proof of ii):
\[
\begin{aligned}
\mathrm{Cov}(Z_i, Z_j) &= \mathrm{Cov}\left(\sum_{q=1}^{k} A_{iq} Y_q, \; \sum_{r=1}^{k} A_{jr} Y_r\right) \\
&= E\left[\left(\sum_{q=1}^{k} A_{iq} (Y_q - \mu_q)\right)\left(\sum_{r=1}^{k} A_{jr} (Y_r - \mu_r)\right)\right] \\
&= E\left[\sum_{q=1}^{k} \sum_{r=1}^{k} A_{iq} A_{jr} (Y_q - \mu_q)(Y_r - \mu_r)\right] \\
&= \sum_{q=1}^{k} \sum_{r=1}^{k} A_{iq} A_{jr} V_{qr}
= \sum_{q=1}^{k} A_{iq} \sum_{r=1}^{k} V_{qr} A_{jr} \\
&= \sum_{q=1}^{k} A_{iq} (VA')_{qj} = (AVA')_{ij}.
\end{aligned}
\]
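As a numerical sanity check of part ii) of the lemma (illustrative only; the covariance matrix V and the matrix A below are arbitrary choices), the empirical covariance of simulated vectors $Z = AY$ should match $AVA'$.

```python
import numpy as np

rng = np.random.default_rng(3)

V = np.array([[1.0, 0.4],
              [0.4, 2.0]])               # Var(Y), chosen arbitrarily
A = np.array([[1.0, -1.0],
              [2.0,  0.5]])              # a k x k matrix of constants

Y = rng.multivariate_normal([0.0, 0.0], V, size=200_000)
Z = Y @ A.T                              # each row is (AY)' for one draw of Y

print(np.round(np.cov(Z, rowvar=False), 2))  # empirical Var(Z)
print(A @ V @ A.T)                           # the lemma's value AVA'
```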
Week 3 Lecture 2 - Properties of least squares estimator
Theorem:
In the full rank linear model, the least squares estimator $b = (X'X)^{-1} X' y$ is unbiased,
\[
E(b) = \beta,
\]
with covariance matrix
\[
\mathrm{Var}(b) = \sigma^2 (X'X)^{-1}.
\]
Week 3 Lecture 2 - Properties of least squares estimator
Recall:
\[
E(b) = E\left((X'X)^{-1} X' y\right) = (X'X)^{-1} X' E(y) = (X'X)^{-1} X' X\beta = \beta.
\]
Finding an expression for the covariance matrix is easy using the previous lemma.
Using $\mathrm{Var}(y) = \sigma^2 I$ (where $I$ is the identity matrix),
\[
\begin{aligned}
\mathrm{Var}(b) &= (X'X)^{-1} X' \,(\sigma^2 I)\, \left((X'X)^{-1} X'\right)' \\
&= \sigma^2 (X'X)^{-1} X' X (X'X)^{-1} \qquad \text{(since $(X'X)^{-1}$ is symmetric)} \\
&= \sigma^2 (X'X)^{-1}.
\end{aligned}
\]
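In practice this theorem is what gives standard errors for the components of $b$. A minimal sketch, assuming an invented design matrix and noise level (none of these numbers come from the slides):

```python
import numpy as np

rng = np.random.default_rng(4)

n = 40
X = np.column_stack([np.ones(n), rng.uniform(0, 5, size=n)])  # intercept + one predictor
sigma = 2.0                                                   # assumed error s.d.

# Covariance matrix of b from the theorem: Var(b) = sigma^2 (X'X)^{-1}.
var_b = sigma**2 * np.linalg.inv(X.T @ X)
print(var_b)                      # covariances of (b_0, b_1)
print(np.sqrt(np.diag(var_b)))    # standard errors of b_0 and b_1
```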
Week 2 Lecture 2 - Polynomial Regression (Special examples of LINEAR MODELS).

In a polynomial regression model we consider multiple predictors which can be used to construct models such as:
\[
y_i = \beta_0 + \sum_{j=1}^{J} \beta_j x_{ij}
+ \sum_{k=1}^{K} \sum_{l=1}^{L} \beta_{kl} x_{ik}^{l}
+ \sum_{m=1}^{M} \sum_{s=1}^{S} \beta_{ms} x_{im} x_{is}
+ \dots + \varepsilon_i
\]
For example, we could consider:
\[
y_i = \beta_0 + \beta_1 x_{i1}^{4} + \beta_2 x_{i3} + \beta_3 x_{i1} x_{i4} + \varepsilon_i
\]
(note this model is still linear in the parameters; see the sketch below).

HOW DO WE SELECT SUCH MODELS?

ARE THERE STATISTICAL PROCEDURES?

ARE THEY AUTOMATED PROCEDURES?
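Sketch referenced above: the transformed terms simply become columns of the design matrix, after which the usual least squares formula applies. All data and coefficient values here are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)

n = 100
x = rng.uniform(-1, 1, size=(n, 4))          # columns x_1, ..., x_4

# Design matrix for y_i = b0 + b1*x_{i1}^4 + b2*x_{i3} + b3*x_{i1}*x_{i4} + e_i.
X = np.column_stack([np.ones(n),
                     x[:, 0]**4,             # x_{i1}^4
                     x[:, 2],                # x_{i3}
                     x[:, 0] * x[:, 3]])     # x_{i1} x_{i4}

beta = np.array([1.0, 2.0, -0.5, 3.0])       # invented "true" coefficients
y = X @ beta + rng.normal(0, 0.1, size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
print(b)   # close to beta: the model is linear in the coefficients
```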


Week 3 - Basis Function Regression.
Enough of basic regression: what else can we do? (Note the small change of notation here.)

Now consider an extension to the class of linear models considered so far:
\[
y = f(x, w) + \varepsilon
\]
where
\[
f(x, w) = w_0 + \sum_{j=1}^{M-1} w_j \phi_j(x) = w^T \phi(x)
\]
with $\phi_j(x)$ a basis function, and we define $\phi_0(x) = 1$.

We still typically assume zero-mean Gaussian errors, $\varepsilon \sim N(0, \sigma^2)$.

Basis function regression is important in many areas of applied statistics: signal processing engineering, financial modelling, spatial point process modelling, and so on.
Week 3 - Basis Function Regression.
Some popular basis functions:

Many choices can be used:

Gaussian basis: $\phi_j(x) = \exp\left(-\dfrac{(x - \mu_j)^2}{2s^2}\right)$.

Sigmoidal basis: $\phi_j(x) = \sigma\left(\dfrac{x - \mu_j}{s}\right)$, where the logistic sigmoid function is $\sigma(a) = \dfrac{1}{1 + \exp(-a)}$, or alternatively $\sigma(a) = \dfrac{\tanh(a) + 1}{2}$.
[Figure: polynomial basis functions, Gaussian basis functions and sigmoidal basis function examples.]
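For reference (not from the slides), the basis functions shown in the figure can be evaluated with a few lines; the centres $\mu_j$ and scale $s$ below are arbitrary choices.

```python
import numpy as np

def gaussian_basis(x, mu, s):
    """phi_j(x) = exp(-(x - mu_j)^2 / (2 s^2))."""
    return np.exp(-(x - mu)**2 / (2 * s**2))

def sigmoid_basis(x, mu, s):
    """phi_j(x) = sigma((x - mu_j)/s) with logistic sigma(a) = 1/(1 + exp(-a))."""
    a = (x - mu) / s
    return 1.0 / (1.0 + np.exp(-a))

x = np.linspace(-1, 1, 200)
centres = np.array([-0.5, 0.0, 0.5])    # hypothetical mu_j
s = 0.2                                 # hypothetical scale

G = gaussian_basis(x[:, None], centres, s)   # 200 x 3 matrix of phi_j(x) values
S = sigmoid_basis(x[:, None], centres, s)
print(G.shape, S.shape)
```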
Week 3 - Basis Function Regression.

Hence our model is now:
\[
p(y \mid x, w, \sigma^2) = N\left(y \mid f(x, w), \sigma^2\right).
\]
Under the squared error loss function we know (from the week one simple linear model case) that the optimal prediction for a new value of $x$ is given by the conditional mean of the target variable:
\[
E(y \mid x) = \int y \, p(y \mid x) \, dy = f(x, w).
\]
Our least squares estimates carry through in the same manner as before:
\[
\hat{w} = \left(\Phi^T \Phi\right)^{-1} \Phi^T y.
\]
Week 3 - Basis Function Regression.
Here the design matrix for the basis function regression becomes
\[
\Phi = \begin{bmatrix} \phi_0(x) & \phi_1(x) & \dots & \phi_{M-1}(x) \end{bmatrix}
= \begin{bmatrix}
\phi_0(x_1) & \phi_1(x_1) & \dots & \phi_{M-1}(x_1) \\
\vdots & \vdots & & \vdots \\
\phi_0(x_n) & \phi_1(x_n) & \dots & \phi_{M-1}(x_n)
\end{bmatrix},
\]
i.e. element $(i, j)$ of $\Phi$ is $\phi_j(x_i)$.
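Putting this together, a hedged sketch of the fit $\hat{w} = (\Phi^T \Phi)^{-1} \Phi^T y$ with Gaussian basis functions; the data, centres and scale are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(6)

# One-dimensional inputs and noisy targets (invented data).
x = np.sort(rng.uniform(-1, 1, size=60))
y = np.sin(np.pi * x) + rng.normal(0, 0.1, size=60)

centres = np.linspace(-1, 1, 7)   # hypothetical mu_j for j = 1, ..., M-1
s = 0.3                           # hypothetical scale

# Design matrix: Phi[i, 0] = 1 and Phi[i, j] = phi_j(x_i) (Gaussian basis).
Phi = np.column_stack([np.ones_like(x),
                       np.exp(-(x[:, None] - centres)**2 / (2 * s**2))])

# Least squares weights w_hat = (Phi' Phi)^{-1} Phi' y.
w_hat = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)
y_fit = Phi @ w_hat               # fitted values at the observed inputs
print(w_hat)
```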
Week 3 Lecture 2 - Learning Expectations.

Be familiar with the matrix formulation of the linear regression model.

Be able to apply basic matrix manipulations to obtain the least squares estimate and its mean and variance.
