\[
\begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}
=
\begin{pmatrix}
1 & x_{11} & \cdots & x_{1k} \\
\vdots & \vdots & & \vdots \\
1 & x_{n1} & \cdots & x_{nk}
\end{pmatrix}
\begin{pmatrix} \beta_0 \\ \vdots \\ \beta_k \end{pmatrix}
+
\begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{pmatrix}
\]
or $y = X\beta + \varepsilon$.
Week 3 Lecture 2 - Full rank linear model
The least squares estimate $b$ of $\beta$ (provided $(X'X)^{-1}$ exists) is
\[
b = (X'X)^{-1}X'y.
\]
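As an aside, the formula can be checked numerically. A minimal NumPy sketch (the data here are made up for illustration):

```python
import numpy as np

# Toy data (made up for illustration): n = 5 observations, one predictor.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Design matrix X with a column of ones for the intercept.
X = np.column_stack([np.ones_like(x), x])

# b = (X'X)^{-1} X'y; solving the normal equations is numerically
# safer than inverting X'X explicitly.
b = np.linalg.solve(X.T @ X, X.T @ y)
print(b)  # approximately [0.05, 1.99]
```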
For the simple linear regression model, we derived expressions for:
$E(b_0)$, $E(b_1)$, $\mathrm{Var}(b_0)$, $\mathrm{Var}(b_1)$, $\mathrm{Cov}(b_0, b_1)$.
For the least squares estimator $b$ of $\beta$, we want similar results.
Expectations of random vectors: $Y = (Y_1, \ldots, Y_k)'$.
Week 3 Lecture 2 - Result about vector expectations
Lemma:
i) If $a$ is a $k \times 1$ vector of constants then $E(a) = a$.
ii) If $a$ is a $k \times 1$ vector of constants, and $Y$ is a $k \times 1$ random vector with $E(Y) = \mu$, then $E(a'Y) = a'\mu$.
iii) If $A$ is an $n \times k$ matrix, and $Y$ is a $k \times 1$ random vector with $E(Y) = \mu$, then $E(AY) = A\mu$.
Week 3 Lecture 2 - Vector expectations
Proof of i) is obvious.
Proof of ii):
\[
a'Y = \sum_{i=1}^{k} a_i Y_i
\]
and hence
\[
E(a'Y) = E\left(\sum_{i=1}^{k} a_i Y_i\right) = \sum_{i=1}^{k} a_i E(Y_i) = \sum_{i=1}^{k} a_i \mu_i = a'\mu.
\]
Proof of iii): the $i$th element of $AY$ is
\[
(AY)_i = \sum_{m=1}^{k} A_{im} Y_m
\]
and so
\[
E((AY)_i) = E\left(\sum_{m=1}^{k} A_{im} Y_m\right) = \sum_{m=1}^{k} A_{im} E(Y_m) = \sum_{m=1}^{k} A_{im} \mu_m = (A\mu)_i.
\]
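The lemma is easy to check by simulation. A minimal NumPy sketch (the particular $\mu$, $a$ and $A$ below are arbitrary choices for the example):

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0, 0.5])          # E(Y) = mu (arbitrary choice)
a = np.array([2.0, 1.0, -1.0])           # constant vector
A = np.array([[1.0, 0.0, 2.0],
              [0.0, 3.0, -1.0]])         # constant matrix

# Draw many realisations of Y with mean mu (any distribution works;
# the lemma only involves expectations).
Y = mu + rng.standard_normal((100_000, 3))

print(np.mean(Y @ a), a @ mu)            # E(a'Y) ~= a'mu
print(np.mean(Y @ A.T, axis=0), A @ mu)  # E(AY)  ~= A mu
```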
Week 3 Lecture 2 - Unbiasedness of b
Exercise: apply the lemma just proved to show that $b$, the least squares estimator of $\beta$, is unbiased.
We have
\[
b = (X'X)^{-1}X'y,
\]
so, applying our lemma with $A = (X'X)^{-1}X'$ and using $E(y) = X\beta$,
\[
E(b) = (X'X)^{-1}X'E(y) = (X'X)^{-1}X'X\beta = \beta,
\]
so that the least squares estimator is unbiased.
We would also like some measure of the precision of b.
What are the variances of components of b? Covariances?
Week 3 Lecture 2 - Covariance matrices
Covariance matrix (sometimes called variance-covariance matrix) of a random vector:
If $Y$ is a $k \times 1$ random vector, the covariance matrix of $Y$, denoted $\mathrm{Var}(Y)$, is the $k \times k$ matrix with element $(i, j)$ equal to $\mathrm{Cov}(Y_i, Y_j)$.
We have seen that the expected value of a random matrix $A$ is the matrix with the same dimensions as $A$ and with element $(i, j)$ equal to $E(A_{ij})$.
Hence, we can write:
\[
\mathrm{Var}(Y) = E\left((Y - \mu)(Y - \mu)'\right)
\]
The diagonal elements of $\mathrm{Var}(Y)$ are the variances of the components of $Y$, and $\mathrm{Var}(Y)$ is symmetric (since $\mathrm{Cov}(Y_i, Y_j) = \mathrm{Cov}(Y_j, Y_i)$).
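The definition translates directly into code as an average of outer products. A minimal NumPy sketch (the covariance matrix $V$ below is an arbitrary example):

```python
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([0.0, 1.0])
V = np.array([[2.0, 0.6],
              [0.6, 1.0]])             # a symmetric covariance matrix

# Draw samples of Y with Var(Y) = V.
Y = rng.multivariate_normal(mu, V, size=200_000)

# Var(Y) = E((Y - mu)(Y - mu)'): average the outer products.
D = Y - mu
V_hat = (D.T @ D) / len(Y)
print(V_hat)   # close to V; diagonal = variances, and symmetric
```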
Week 3 Lecture 2 - Results about variances and covariance matrices
Lemma:
i) Let $Y$ be a $k \times 1$ random vector with $\mathrm{Var}(Y) = V$. If $a$ is a $k \times 1$ vector of real numbers, then
\[
\mathrm{Var}(a'Y) = a'Va.
\]
ii) Let $Y$ be a $k \times 1$ random vector with $\mathrm{Var}(Y) = V$. Let $A$ be a $k \times k$ matrix. If $Z = AY$, then
\[
\mathrm{Var}(Z) = AVA'.
\]
Week 3 Lecture 2 - Covariance matrices
Proof of i):
\[
\mathrm{Var}(a'Y) = \mathrm{Var}\left(\sum_{i=1}^{k} a_i Y_i\right)
= E\left[\left(\sum_{i=1}^{k} a_i Y_i - \sum_{i=1}^{k} a_i \mu_i\right)^2\right]
\]
\[
= E\left[\left(\sum_{i=1}^{k} a_i (Y_i - \mu_i)\right)^2\right]
= E\left[\sum_{i=1}^{k} \sum_{j=1}^{k} a_i a_j (Y_i - \mu_i)(Y_j - \mu_j)\right]
\]
\[
= \sum_{i=1}^{k} \sum_{j=1}^{k} a_i a_j E\left((Y_i - \mu_i)(Y_j - \mu_j)\right)
= \sum_{i=1}^{k} \sum_{j=1}^{k} a_i a_j V_{ij} = a'Va.
\]
Week 3 Lecture 2 - Covariance matrices
Proof of ii):
\[
\mathrm{Cov}(Z_i, Z_j) = \mathrm{Cov}\left(\sum_{q=1}^{k} A_{iq} Y_q, \sum_{r=1}^{k} A_{jr} Y_r\right)
= E\left[\sum_{q=1}^{k} A_{iq}(Y_q - \mu_q) \sum_{r=1}^{k} A_{jr}(Y_r - \mu_r)\right]
\]
\[
= E\left[\sum_{q=1}^{k} \sum_{r=1}^{k} A_{iq} A_{jr} (Y_q - \mu_q)(Y_r - \mu_r)\right]
= \sum_{q=1}^{k} \sum_{r=1}^{k} A_{iq} A_{jr} V_{qr}
\]
\[
= \sum_{q=1}^{k} A_{iq} \sum_{r=1}^{k} V_{qr} A_{jr}
= \sum_{q=1}^{k} A_{iq} (VA')_{qj} = (AVA')_{ij}.
\]
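Both parts of the lemma can be verified by simulation. A minimal NumPy sketch with arbitrary choices of $V$, $a$ and $A$:

```python
import numpy as np

rng = np.random.default_rng(2)
mu = np.zeros(3)
V = np.array([[1.0, 0.3, 0.0],
              [0.3, 2.0, 0.5],
              [0.0, 0.5, 1.5]])        # Var(Y) = V (arbitrary example)
a = np.array([1.0, -1.0, 2.0])
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, -1.0],
              [2.0, 0.0, 1.0]])

Y = rng.multivariate_normal(mu, V, size=500_000)

print(np.var(Y @ a), a @ V @ a)        # Var(a'Y) ~= a'Va
Z = Y @ A.T                            # Z = AY for each sample
print(np.cov(Z, rowvar=False))         # ~= A V A'
print(A @ V @ A.T)
```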
Week 3 Lecture 2 - Properties of least squares estimator
Theorem:
In the full rank linear model, the least squares estimator $b = (X'X)^{-1}X'y$ is unbiased,
\[
E(b) = \beta,
\]
with covariance matrix
\[
\mathrm{Var}(b) = \sigma^2 (X'X)^{-1}.
\]
Week 3 Lecture 2 - Properties of least squares estimator
Recall:
\[
E(b) = E\left((X'X)^{-1}X'y\right) = (X'X)^{-1}X'E(y) = (X'X)^{-1}X'X\beta = \beta.
\]
Finding an expression for the covariance matrix is easy to do using the previous lemma.
Using $\mathrm{Var}(y) = \sigma^2 I$ (where $I$ is the identity matrix) and applying the lemma with $A = (X'X)^{-1}X'$:
\[
\mathrm{Var}(b) = (X'X)^{-1}X' \left(\sigma^2 I\right) \left((X'X)^{-1}X'\right)'
= \sigma^2 (X'X)^{-1}X'X(X'X)^{-1}
= \sigma^2 (X'X)^{-1}.
\]
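A simulation makes the theorem concrete. A minimal NumPy sketch (the true $\beta$, $\sigma$ and design below are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 30, 20_000
sigma = 2.0
beta = np.array([1.0, 0.5])            # true parameters (arbitrary)

x = np.linspace(0, 10, n)
X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)

# Generate many datasets y = X beta + eps and compute b for each.
eps = sigma * rng.standard_normal((reps, n))
Y = X @ beta + eps                     # shape (reps, n)
B = Y @ X @ XtX_inv                    # each row is b' = y'X(X'X)^{-1}

print(B.mean(axis=0), beta)            # E(b) ~= beta
print(np.cov(B, rowvar=False))         # ~= sigma^2 (X'X)^{-1}
print(sigma**2 * XtX_inv)
```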
Week 2 Lecture 2 - Polynomial Regression (Special examples of LINEAR MODELS).
\[
y_i = \beta_0 + \sum_{j=1}^{J} \beta_j x_{ij}
+ \sum_{k=1}^{K} \sum_{l=1}^{L} \beta_{kl} x_{ik}^{l}
+ \sum_{m=1}^{M} \sum_{s=1}^{S} \beta_{ms} x_{im} x_{is}
+ \ldots + \varepsilon_i
\]
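Each polynomial or interaction term is simply another column of the design matrix, so the model remains linear in the coefficients. A minimal NumPy sketch (the particular columns chosen here are one of many possibilities):

```python
import numpy as np

rng = np.random.default_rng(4)
x1 = rng.uniform(0, 1, size=20)
x2 = rng.uniform(0, 1, size=20)

# Columns: intercept, linear, quadratic, and interaction terms.
# The model is still linear in the coefficients, so ordinary
# least squares applies unchanged.
X = np.column_stack([np.ones_like(x1), x1, x2, x1**2, x2**2, x1 * x2])
print(X.shape)  # (20, 6)
```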
\[
f(x, w) = w_0 + \sum_{j=1}^{M} w_j \phi_j(x) = w^T \phi(x)
\]
with $\phi_j(x)$ a basis function, and we define $\phi_0(x) = 1$.
Gaussian basis:
\[
\phi_j(x) = \exp\left\{-\frac{(x - \mu_j)^2}{2s^2}\right\}
\]
Sigmoidal basis:
\[
\phi_j(x) = \sigma\left(\frac{x - \mu_j}{s}\right)
\]
where the logistic sigmoid function is $\sigma(a) = \frac{1}{1 + \exp(-a)}$, or $\sigma(a) = \frac{\tanh(a) + 1}{2}$.
[Figure: polynomial basis functions, Gaussian basis functions and sigmoidal basis function examples.]
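A minimal NumPy sketch of the two basis families (the grid, the centres $\mu_j$ and the width $s$ are arbitrary choices for illustration):

```python
import numpy as np

def gaussian_basis(x, mu, s):
    """phi_j(x) = exp(-(x - mu_j)^2 / (2 s^2))."""
    return np.exp(-(x - mu) ** 2 / (2 * s**2))

def sigmoidal_basis(x, mu, s):
    """phi_j(x) = sigma((x - mu_j) / s) with the logistic sigmoid."""
    a = (x - mu) / s
    return 1.0 / (1.0 + np.exp(-a))

x = np.linspace(-1, 1, 200)
centres = np.array([-0.5, 0.0, 0.5])   # mu_j values (arbitrary)
s = 0.2                                # width parameter (arbitrary)

G = gaussian_basis(x[:, None], centres, s)    # shape (200, 3)
S = sigmoidal_basis(x[:, None], centres, s)   # shape (200, 3)
```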
Week 3 - Basis Function Regression.
Under the squared error loss function, we know (from the week one simple linear model case) that the optimal prediction for a new value of $x$ is given by the conditional mean of the target variable:
\[
E(y|x) = \int y \, p(y|x) \, dy = f(x, w).
\]
The least squares estimate of the weights is
\[
\hat{w} = (\Phi^T \Phi)^{-1} \Phi^T y.
\]
Week 3 - Basis Function Regression.
Where the design matrix for the basis function regression becomes
\[
\Phi = \begin{pmatrix} \phi_0(x) & \phi_1(x) & \ldots & \phi_M(x) \end{pmatrix}
= \begin{pmatrix}
\phi_0(x_1) & \phi_1(x_1) & \ldots & \phi_M(x_1) \\
\vdots & \vdots & & \vdots \\
\phi_0(x_n) & \phi_1(x_n) & \ldots & \phi_M(x_n)
\end{pmatrix}
\]
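Putting the pieces together: $\Phi$ is built by evaluating each basis function at each data point, and the weights then follow from the least squares formula. A minimal NumPy sketch using the Gaussian basis (the data and settings are made up for the example):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 50
x = np.sort(rng.uniform(-1, 1, size=n))
y = np.sin(np.pi * x) + 0.1 * rng.standard_normal(n)  # toy targets

centres = np.linspace(-1, 1, 7)        # mu_j (arbitrary)
s = 0.3                                # width (arbitrary)

# Row i of Phi is (phi_0(x_i), phi_1(x_i), ..., phi_M(x_i)),
# with phi_0(x) = 1 for the bias term.
Phi = np.column_stack(
    [np.ones(n)] + [np.exp(-(x - c) ** 2 / (2 * s**2)) for c in centres]
)

# w = (Phi' Phi)^{-1} Phi' y via lstsq (more stable than an explicit inverse).
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
y_hat = Phi @ w                        # fitted values f(x_i, w)
```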
Week 2 Lecture 2 - Learning Expectations.