\[
\begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}
=
\begin{pmatrix}
1 & x_{11} & \cdots & x_{1k} \\
\vdots & \vdots & & \vdots \\
1 & x_{n1} & \cdots & x_{nk}
\end{pmatrix}
\begin{pmatrix} \beta_0 \\ \vdots \\ \beta_k \end{pmatrix}
+
\begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{pmatrix}
\]
or $y = X\beta + \varepsilon$.
Week 3 Lecture 2 - Full rank linear model
The least squares estimate $b$ of $\beta$ (provided $(X'X)^{-1}$ exists) is
\[
b = (X'X)^{-1}X'y.
\]
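As an aside, the formula can be checked numerically. A minimal NumPy sketch (the data here are made up for illustration):

```python
import numpy as np

# Toy data (made up for illustration): n = 5 observations, one predictor.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Design matrix X with a column of ones for the intercept.
X = np.column_stack([np.ones_like(x), x])

# b = (X'X)^{-1} X'y; solving the normal equations is numerically
# safer than inverting X'X explicitly.
b = np.linalg.solve(X.T @ X, X.T @ y)
print(b)  # approximately [0.05, 1.99]
```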
For the simple linear regression model, we derived expressions for:
$E(b_0)$, $E(b_1)$, $\mathrm{Var}(b_0)$, $\mathrm{Var}(b_1)$, $\mathrm{Cov}(b_0, b_1)$.
For the least squares estimator $b$ of $\beta$, we want similar results.
Expectations of random vectors: $Y = (Y_1, \ldots, Y_k)'$.
Week 3 Lecture 2 - Result about vector expectations
Lemma:
i) If $a$ is a $k \times 1$ vector of constants then $E(a) = a$.
ii) If $a$ is a $k \times 1$ vector of constants, and $Y$ is a $k \times 1$ random vector with $E(Y) = \mu$, then $E(a'Y) = a'\mu$.
iii) If $A$ is an $n \times k$ matrix, and $Y$ is a $k \times 1$ random vector with $E(Y) = \mu$, then $E(AY) = A\mu$.
Week 3 Lecture 2 - Vector expectations
Proof of i) is obvious.
Proof of ii):
\[
a'Y = \sum_{i=1}^{k} a_i Y_i
\]
and hence
\[
E(a'Y) = E\left(\sum_{i=1}^{k} a_i Y_i\right) = \sum_{i=1}^{k} a_i E(Y_i) = \sum_{i=1}^{k} a_i \mu_i = a'\mu.
\]
Proof of iii): the $i$th element of $AY$ is
\[
(AY)_i = \sum_{m=1}^{k} A_{im} Y_m
\]
and so
\[
E((AY)_i) = E\left(\sum_{m=1}^{k} A_{im} Y_m\right) = \sum_{m=1}^{k} A_{im} E(Y_m) = \sum_{m=1}^{k} A_{im} \mu_m = (A\mu)_i.
\]
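The lemma is easy to check by simulation. A minimal NumPy sketch (the particular $\mu$, $a$ and $A$ below are arbitrary choices for the example):

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0, 0.5])          # E(Y) = mu (arbitrary choice)
a = np.array([2.0, 1.0, -1.0])           # constant vector
A = np.array([[1.0, 0.0, 2.0],
              [0.0, 3.0, -1.0]])         # constant matrix

# Draw many realisations of Y with mean mu (any distribution works;
# the lemma only involves expectations).
Y = mu + rng.standard_normal((100_000, 3))

print(np.mean(Y @ a), a @ mu)            # E(a'Y) ~= a'mu
print(np.mean(Y @ A.T, axis=0), A @ mu)  # E(AY)  ~= A mu
```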
Week 3 Lecture 2 - Unbiasedness of b
Exercise: apply the lemma just proved to show that $b$, the least squares estimator of $\beta$, is unbiased.
We have
\[
b = (X'X)^{-1}X'y,
\]
so, applying our lemma with $A = (X'X)^{-1}X'$ and using $E(y) = X\beta$,
\[
E(b) = (X'X)^{-1}X'E(y) = (X'X)^{-1}X'X\beta = \beta,
\]
so that the least squares estimator is unbiased.
We would also like some measure of the precision of b.
What are the variances of components of b? Covariances?
Week 3 Lecture 2 - Covariance matrices
Covariance matrix (sometimes called variance-covariance matrix) of a random vector:
If $Y$ is a $k \times 1$ random vector, the covariance matrix of $Y$, denoted $\mathrm{Var}(Y)$, is the $k \times k$ matrix with element $(i, j)$ equal to $\mathrm{Cov}(Y_i, Y_j)$.
We have seen that the expected value of a random matrix $A$ is the matrix with the same dimensions as $A$ and with element $(i, j)$ equal to $E(A_{ij})$.
Hence, we can write:
\[
\mathrm{Var}(Y) = E\left((Y - \mu)(Y - \mu)'\right)
\]
The diagonal elements of $\mathrm{Var}(Y)$ are the variances of the components of $Y$, and $\mathrm{Var}(Y)$ is symmetric (since $\mathrm{Cov}(Y_i, Y_j) = \mathrm{Cov}(Y_j, Y_i)$).
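The definition translates directly into code as an average of outer products. A minimal NumPy sketch (the covariance matrix $V$ below is an arbitrary example):

```python
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([0.0, 1.0])
V = np.array([[2.0, 0.6],
              [0.6, 1.0]])             # a symmetric covariance matrix

# Draw samples of Y with Var(Y) = V.
Y = rng.multivariate_normal(mu, V, size=200_000)

# Var(Y) = E((Y - mu)(Y - mu)'): average the outer products.
D = Y - mu
V_hat = (D.T @ D) / len(Y)
print(V_hat)   # close to V; diagonal = variances, and symmetric
```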
Week 3 Lecture 2 - Results about variances and covariance matrices
Lemma:
i) Let $Y$ be a $k \times 1$ random vector with $\mathrm{Var}(Y) = V$. If $a$ is a $k \times 1$ vector of real numbers, then
\[
\mathrm{Var}(a'Y) = a'Va.
\]
ii) Let $Y$ be a $k \times 1$ random vector with $\mathrm{Var}(Y) = V$. Let $A$ be a $k \times k$ matrix. If $Z = AY$, then
\[
\mathrm{Var}(Z) = AVA'.
\]
Week 3 Lecture 2 - Covariance matrices
Proof of i):
\[
\mathrm{Var}(a'Y) = \mathrm{Var}\left(\sum_{i=1}^{k} a_i Y_i\right)
= E\left[\left(\sum_{i=1}^{k} a_i Y_i - \sum_{i=1}^{k} a_i \mu_i\right)^2\right]
\]
\[
= E\left[\left(\sum_{i=1}^{k} a_i (Y_i - \mu_i)\right)^2\right]
= E\left[\sum_{i=1}^{k} \sum_{j=1}^{k} a_i a_j (Y_i - \mu_i)(Y_j - \mu_j)\right]
\]
\[
= \sum_{i=1}^{k} \sum_{j=1}^{k} a_i a_j E\left((Y_i - \mu_i)(Y_j - \mu_j)\right)
= \sum_{i=1}^{k} \sum_{j=1}^{k} a_i a_j V_{ij} = a'Va.
\]
Week 3 Lecture 2 - Covariance matrices
Proof of ii):
\[
\mathrm{Cov}(Z_i, Z_j) = \mathrm{Cov}\left(\sum_{q=1}^{k} A_{iq} Y_q, \sum_{r=1}^{k} A_{jr} Y_r\right)
= E\left[\sum_{q=1}^{k} A_{iq}(Y_q - \mu_q) \sum_{r=1}^{k} A_{jr}(Y_r - \mu_r)\right]
\]
\[
= E\left[\sum_{q=1}^{k} \sum_{r=1}^{k} A_{iq} A_{jr} (Y_q - \mu_q)(Y_r - \mu_r)\right]
= \sum_{q=1}^{k} \sum_{r=1}^{k} A_{iq} A_{jr} V_{qr}
\]
\[
= \sum_{q=1}^{k} A_{iq} \sum_{r=1}^{k} V_{qr} A_{jr}
= \sum_{q=1}^{k} A_{iq} (VA')_{qj} = (AVA')_{ij}.
\]
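Both parts of the lemma can be verified by simulation. A minimal NumPy sketch with arbitrary choices of $V$, $a$ and $A$:

```python
import numpy as np

rng = np.random.default_rng(2)
mu = np.zeros(3)
V = np.array([[1.0, 0.3, 0.0],
              [0.3, 2.0, 0.5],
              [0.0, 0.5, 1.5]])        # Var(Y) = V (arbitrary example)
a = np.array([1.0, -1.0, 2.0])
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, -1.0],
              [2.0, 0.0, 1.0]])

Y = rng.multivariate_normal(mu, V, size=500_000)

print(np.var(Y @ a), a @ V @ a)        # Var(a'Y) ~= a'Va
Z = Y @ A.T                            # Z = AY for each sample
print(np.cov(Z, rowvar=False))         # ~= A V A'
print(A @ V @ A.T)
```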
Week 3 Lecture 2 - Properties of least squares estimator
Theorem:
In the full rank linear model, the least squares estimator $b = (X'X)^{-1}X'y$ is unbiased,
\[
E(b) = \beta,
\]
with covariance matrix
\[
\mathrm{Var}(b) = \sigma^2 (X'X)^{-1}.
\]
Week 3 Lecture 2 - Properties of least squares estimator
Recall:
\[
E(b) = E\left((X'X)^{-1}X'y\right) = (X'X)^{-1}X'E(y) = (X'X)^{-1}X'X\beta = \beta.
\]
Finding an expression for the covariance matrix is easy to do using the previous lemma.
Using $\mathrm{Var}(y) = \sigma^2 I$ (where $I$ is the identity matrix) and applying the lemma with $A = (X'X)^{-1}X'$:
\[
\mathrm{Var}(b) = (X'X)^{-1}X' \left(\sigma^2 I\right) \left((X'X)^{-1}X'\right)'
= \sigma^2 (X'X)^{-1}X'X(X'X)^{-1}
= \sigma^2 (X'X)^{-1}.
\]
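A simulation makes the theorem concrete. A minimal NumPy sketch (the true $\beta$, $\sigma$ and design below are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 30, 20_000
sigma = 2.0
beta = np.array([1.0, 0.5])            # true parameters (arbitrary)

x = np.linspace(0, 10, n)
X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)

# Generate many datasets y = X beta + eps and compute b for each.
eps = sigma * rng.standard_normal((reps, n))
Y = X @ beta + eps                     # shape (reps, n)
B = Y @ X @ XtX_inv                    # each row is b' = y'X(X'X)^{-1}

print(B.mean(axis=0), beta)            # E(b) ~= beta
print(np.cov(B, rowvar=False))         # ~= sigma^2 (X'X)^{-1}
print(sigma**2 * XtX_inv)
```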
Week 2 Lecture 2 - Polynomial Regression (Special examples of LINEAR MODELS).
\[
y_i = \beta_0 + \sum_{j=1}^{J} \beta_j x_{ij}
+ \sum_{k=1}^{K} \sum_{l=1}^{L} \beta_{kl} x_{ik}^{l}
+ \sum_{m=1}^{M} \sum_{s=1}^{S} \beta_{ms} x_{im} x_{is}
+ \ldots + \varepsilon_i
\]
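Each polynomial or interaction term is simply another column of the design matrix, so the model remains linear in the coefficients. A minimal NumPy sketch (the particular columns chosen here are one of many possibilities):

```python
import numpy as np

rng = np.random.default_rng(4)
x1 = rng.uniform(0, 1, size=20)
x2 = rng.uniform(0, 1, size=20)

# Columns: intercept, linear, quadratic, and interaction terms.
# The model is still linear in the coefficients, so ordinary
# least squares applies unchanged.
X = np.column_stack([np.ones_like(x1), x1, x2, x1**2, x2**2, x1 * x2])
print(X.shape)  # (20, 6)
```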
\[
f(x, w) = w_0 + \sum_{j=1}^{M} w_j \phi_j(x) = w^T \phi(x)
\]
with $\phi_j(x)$ a basis function, and we define $\phi_0(x) = 1$.
Gaussian basis:
\[
\phi_j(x) = \exp\left\{-\frac{(x - \mu_j)^2}{2s^2}\right\}
\]
Sigmoidal basis:
\[
\phi_j(x) = \sigma\left(\frac{x - \mu_j}{s}\right)
\]
where the logistic sigmoid function is $\sigma(a) = \frac{1}{1 + \exp(-a)}$, or $\sigma(a) = \frac{\tanh(a) + 1}{2}$.
[Figure: polynomial basis functions, Gaussian basis functions and sigmoidal basis function examples.]
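A minimal NumPy sketch of the two basis families (the grid, the centres $\mu_j$ and the width $s$ are arbitrary choices for illustration):

```python
import numpy as np

def gaussian_basis(x, mu, s):
    """phi_j(x) = exp(-(x - mu_j)^2 / (2 s^2))."""
    return np.exp(-(x - mu) ** 2 / (2 * s**2))

def sigmoidal_basis(x, mu, s):
    """phi_j(x) = sigma((x - mu_j) / s) with the logistic sigmoid."""
    a = (x - mu) / s
    return 1.0 / (1.0 + np.exp(-a))

x = np.linspace(-1, 1, 200)
centres = np.array([-0.5, 0.0, 0.5])   # mu_j values (arbitrary)
s = 0.2                                # width parameter (arbitrary)

G = gaussian_basis(x[:, None], centres, s)    # shape (200, 3)
S = sigmoidal_basis(x[:, None], centres, s)   # shape (200, 3)
```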
Week 3 - Basis Function Regression.
Under the squared error loss function, we know (from the week one simple linear model case) that the optimal prediction for a new value of $x$ is given by the conditional mean of the target variable:
\[
E(y|x) = \int y \, p(y|x) \, dy = f(x, w).
\]
The least squares estimate of the weights is
\[
\hat{w} = (\Phi^T \Phi)^{-1} \Phi^T y.
\]
Week 3 - Basis Function Regression.
Where the design matrix for the basis function regression becomes
\[
\Phi = \begin{pmatrix} \phi_0(x) & \phi_1(x) & \ldots & \phi_M(x) \end{pmatrix}
= \begin{pmatrix}
\phi_0(x_1) & \phi_1(x_1) & \ldots & \phi_M(x_1) \\
\vdots & \vdots & & \vdots \\
\phi_0(x_n) & \phi_1(x_n) & \ldots & \phi_M(x_n)
\end{pmatrix}
\]
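Putting the pieces together: $\Phi$ is built by evaluating each basis function at each data point, and the weights then follow from the least squares formula. A minimal NumPy sketch using the Gaussian basis (the data and settings are made up for the example):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 50
x = np.sort(rng.uniform(-1, 1, size=n))
y = np.sin(np.pi * x) + 0.1 * rng.standard_normal(n)  # toy targets

centres = np.linspace(-1, 1, 7)        # mu_j (arbitrary)
s = 0.3                                # width (arbitrary)

# Row i of Phi is (phi_0(x_i), phi_1(x_i), ..., phi_M(x_i)),
# with phi_0(x) = 1 for the bias term.
Phi = np.column_stack(
    [np.ones(n)] + [np.exp(-(x - c) ** 2 / (2 * s**2)) for c in centres]
)

# w = (Phi' Phi)^{-1} Phi' y via lstsq (more stable than an explicit inverse).
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
y_hat = Phi @ w                        # fitted values f(x_i, w)
```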
Week 2 Lecture 2 - Learning Expectations.