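1. (a) We show the claim only in the case that $X$ and $Y$ are continuous random variables. Let $f_{X,Y}$ be the joint density of $X$ and $Y$; then
$$
\begin{aligned}
E(aX + bY) &= \int\!\!\int (ax + by)\, f_{X,Y}(x, y)\, dx\, dy \\
&= a \int\!\!\int x\, f_{X,Y}(x, y)\, dx\, dy + b \int\!\!\int y\, f_{X,Y}(x, y)\, dx\, dy \\
&= aE(X) + bE(Y).
\end{aligned}
$$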
(b) The first claim is obvious; we show only the second. If $X$ and $Y$ are independent, then
$$E[(X - E(X))(Y - E(Y))] = E[X - E(X)]\, E[Y - E(Y)] = 0.$$
Linearity in the second component follows from the fact that the covariance function is symmetric.
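Finally, for $a, b \in \mathbb{R}$ and random variables $X$ and $Y$,
$$
\begin{aligned}
\mathrm{Var}(aX + bY) &= \mathrm{Cov}(aX + bY,\, aX + bY) \\
&= \mathrm{Cov}(aX,\, aX + bY) + \mathrm{Cov}(bY,\, aX + bY) && \text{(linearity in the first entry)} \\
&= \mathrm{Cov}(aX, aX) + \mathrm{Cov}(aX, bY) + \mathrm{Cov}(bY, aX) + \mathrm{Cov}(bY, bY) && \text{(linearity in the second entry)} \\
&= \mathrm{Var}(aX) + 2\,\mathrm{Cov}(aX, bY) + \mathrm{Var}(bY) \\
&= a^2\,\mathrm{Var}(X) + 2ab\,\mathrm{Cov}(X, Y) + b^2\,\mathrm{Var}(Y).
\end{aligned}
$$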
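The variance identity can be sanity-checked on a small data set. The sketch below is only an illustration: it treats a finite sample as a population (so moments are exact averages), and the sample values and the helper names `mean`, `cov`, `var` are assumptions made for this example.

```python
from fractions import Fraction

def mean(values):
    # Exact average of a list of Fractions.
    return sum(values) / len(values)

def cov(u, v):
    # Population covariance: average of products of deviations.
    mu, mv = mean(u), mean(v)
    return sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v)) / len(u)

def var(u):
    # Var(U) = Cov(U, U)
    return cov(u, u)

# Hypothetical sample values, chosen only for illustration.
xs = [Fraction(v) for v in (1, 2, 4, 7)]
ys = [Fraction(v) for v in (3, 1, 5, 2)]
a, b = Fraction(2), Fraction(-3)

lhs = var([a * x + b * y for x, y in zip(xs, ys)])
rhs = a**2 * var(xs) + 2 * a * b * cov(xs, ys) + b**2 * var(ys)
print(lhs == rhs)
```

Using `Fraction` keeps the arithmetic exact, so the two sides can be compared with `==` rather than a floating-point tolerance.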
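$$
\begin{aligned}
SS_{reg} &= \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 \\
&= \sum_{i=1}^{n} \bigl(b_0 + b_1 x_i - (b_0 + b_1 \bar{x})\bigr)^2 \\
&= \sum_{i=1}^{n} b_1^2 (x_i - \bar{x})^2 \\
&= b_1^2 S_{xx},
\end{aligned}
$$
where we used $\hat{y}_i = b_0 + b_1 x_i$ and $\bar{y} = b_0 + b_1 \bar{x}$, with $b_1 = S_{xy} / S_{xx}$.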
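4. We compute directly,
$$
\begin{aligned}
S_{xx} &= \sum_{i=1}^{n} (x_i - \bar{x})\, x_i \\
&= \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x} + \bar{x}) \\
&= \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x}) + \sum_{i=1}^{n} (x_i - \bar{x})\, \bar{x} \\
&= \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x}),
\end{aligned}
$$
since
$$\sum_{i=1}^{n} (x_i - \bar{x})\, \bar{x} = \bar{x} \sum_{i=1}^{n} (x_i - \bar{x}) = \bar{x} \Bigl( \sum_{i=1}^{n} x_i - n\bar{x} \Bigr) = 0.$$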
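By a similar argument,
$$
\begin{aligned}
S_{xy} &= \sum_{i=1}^{n} (x_i - \bar{x})\, y_i \\
&= \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y} + \bar{y}) \\
&= \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) + \bar{y} \sum_{i=1}^{n} (x_i - \bar{x}) \\
&= \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}),
\end{aligned}
$$
and
$$
\sum_{i=1}^{n} (x_i - \bar{x})\, y_i = \sum_{i=1}^{n} y_i x_i - \bar{x} \sum_{i=1}^{n} y_i = \sum_{i=1}^{n} y_i x_i - \frac{1}{n} \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i.
$$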
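As a quick check of the equivalent forms of $S_{xx}$ and $S_{xy}$ derived above, the following sketch evaluates each expression on a small data set. The data values and the function name `centered_sums` are assumptions made for illustration only.

```python
from fractions import Fraction

def centered_sums(xs, ys):
    """Evaluate the equivalent expressions for S_xx and S_xy exactly."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return {
        # sum (x_i - x_bar)^2
        "sxx_centered": sum((x - x_bar) ** 2 for x in xs),
        # sum (x_i - x_bar) x_i
        "sxx_half": sum((x - x_bar) * x for x in xs),
        # sum (x_i - x_bar)(y_i - y_bar)
        "sxy_centered": sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)),
        # sum (x_i - x_bar) y_i
        "sxy_half": sum((x - x_bar) * y for x, y in zip(xs, ys)),
        # sum x_i y_i - (1/n) sum x_i sum y_i
        "sxy_raw": sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n,
    }

# Hypothetical data, chosen only for illustration.
xs = [Fraction(v) for v in (1, 2, 4, 7)]
ys = [Fraction(v) for v in (3, 1, 5, 2)]
sums = centered_sums(xs, ys)
print(sums["sxx_centered"] == sums["sxx_half"])
print(sums["sxy_centered"] == sums["sxy_half"] == sums["sxy_raw"])
```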
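5. Compute $\mathrm{Var}(b_0)$:
$$
\begin{aligned}
\mathrm{Var}(b_0) &= \mathrm{Var}(\bar{y} - b_1 \bar{x}) \\
&= \mathrm{Var}(\bar{y}) + \mathrm{Var}(b_1 \bar{x}) - 2\,\mathrm{Cov}(\bar{y},\, b_1 \bar{x}) \\
&= \frac{\sigma^2}{n} + \bar{x}^2 \frac{\sigma^2}{S_{xx}},
\end{aligned}
$$
where the three terms are computed as follows. First,
$$
\begin{aligned}
\mathrm{Var}(\bar{y}) &= \mathrm{Var}\Bigl( \frac{1}{n} \sum_{i=1}^{n} y_i \Bigr) \\
&= \frac{1}{n^2} \sum_{i=1}^{n} \mathrm{Var}(y_i) && \text{(by independence of the } y_i \text{)} \\
&= \frac{\sigma^2}{n},
\end{aligned}
$$
and
$$
\begin{aligned}
\mathrm{Var}(b_1 \bar{x}) &= \bar{x}^2\, \mathrm{Var}(b_1) \\
&= \bar{x}^2 \frac{1}{S_{xx}^2} \sum_{i=1}^{n} (x_i - \bar{x})^2\, \mathrm{Var}(y_i) && \text{(using the fact that } b_1 = S_{xy} / S_{xx} \text{)} \\
&= \bar{x}^2 \frac{\sigma^2}{S_{xx}}.
\end{aligned}
$$
Finally,
$$
\begin{aligned}
\mathrm{Cov}(\bar{y},\, b_1 \bar{x}) &= \frac{\bar{x}}{n S_{xx}} \sum_{i=1}^{n} \mathrm{Cov}(y_i,\, S_{xy}) \\
&= \frac{\bar{x}}{n S_{xx}} \sum_{i=1}^{n} \mathrm{Cov}\Bigl( y_i,\, \sum_{j=1}^{n} (x_j - \bar{x})\, y_j \Bigr) && \text{(by linearity in the first component)} \\
&= \frac{\bar{x}}{n S_{xx}} \sum_{i=1}^{n} \sum_{j=1}^{n} (x_j - \bar{x})\, \mathrm{Cov}(y_i, y_j) && \text{(by linearity in the second component)} \\
&= \frac{\bar{x}}{n S_{xx}} \sum_{i=1}^{n} (x_i - \bar{x})\, \sigma^2 && \text{(} \mathrm{Cov}(y_i, y_j) = \sigma^2 \text{ when } j = i \text{, and } 0 \text{ otherwise)} \\
&= 0.
\end{aligned}
$$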
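6. Compute $\mathrm{Cov}(b_0, b_1)$:
$$
\begin{aligned}
\mathrm{Cov}(b_0, b_1) &= \mathrm{Cov}(\bar{y} - b_1 \bar{x},\, b_1) \\
&= \mathrm{Cov}(\bar{y},\, b_1) - \mathrm{Cov}(b_1 \bar{x},\, b_1) \\
&= -\bar{x}\, \mathrm{Cov}(b_1, b_1) && \text{(we used the results of the previous question)} \\
&= -\bar{x}\, \frac{\sigma^2}{S_{xx}}.
\end{aligned}
$$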
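$$
\begin{aligned}
F_T(t) &= \int\!\!\int_{x \le t\sqrt{y/n}} f_X(x)\, f_Y(y)\, dx\, dy \\
&= \int_0^\infty f_Y(y) \int_{-\infty}^{t\sqrt{y/n}} f_X(x)\, dx\, dy \\
&= \int_0^\infty f_Y(y)\, F_X\bigl(t\sqrt{y/n}\bigr)\, dy.
\end{aligned}
$$
Differentiating with respect to $t$,
$$
\begin{aligned}
f_T(t) &= \int_0^\infty f_Y(y)\, \sqrt{y/n}\; f_X\bigl(t\sqrt{y/n}\bigr)\, dy \\
&= \int_0^\infty \frac{1}{2^{n/2}\, \Gamma(n/2)}\, y^{\frac{n}{2} - 1} e^{-\frac{y}{2}}\, \sqrt{\frac{y}{n}}\, \frac{1}{\sqrt{2\pi}}\, e^{-\frac{t^2 y}{2n}}\, dy \\
&= \frac{1}{\sqrt{2\pi n}\; 2^{n/2}\, \Gamma(n/2)} \int_0^\infty y^{\frac{n-1}{2}}\, e^{-\frac{1}{2}\left(\frac{t^2}{n} + 1\right) y}\, dy.
\end{aligned}
$$
Substituting $z = \frac{1}{2}\bigl(\frac{t^2}{n} + 1\bigr) y$,
$$
\begin{aligned}
f_T(t) &= \frac{1}{\sqrt{2\pi n}\; 2^{n/2}\, \Gamma(n/2)} \left[ \frac{1}{2}\Bigl(\frac{t^2}{n} + 1\Bigr) \right]^{-\frac{n+1}{2}} \int_0^\infty z^{\frac{n-1}{2}} e^{-z}\, dz \\
&= \frac{2^{\frac{n+1}{2}}\, \Gamma\bigl(\frac{n+1}{2}\bigr)}{\sqrt{2\pi n}\; 2^{n/2}\, \Gamma(n/2)} \Bigl(\frac{t^2}{n} + 1\Bigr)^{-\frac{n+1}{2}} \\
&= \frac{\Gamma\bigl(\frac{n+1}{2}\bigr)}{\sqrt{\pi n}\; \Gamma(n/2)} \Bigl(\frac{t^2}{n} + 1\Bigr)^{-\frac{n+1}{2}} \\
&= \frac{1}{B\bigl(\frac{1}{2}, \frac{n}{2}\bigr) \sqrt{n}} \Bigl(\frac{t^2}{n} + 1\Bigr)^{-\frac{n+1}{2}},
\end{aligned}
$$
where the last equality follows from the fact that $B\bigl(\frac{1}{2}, \frac{n}{2}\bigr) = \dfrac{\Gamma(\frac{1}{2})\, \Gamma(\frac{n}{2})}{\Gamma(\frac{n+1}{2})}$ and $\Gamma(\frac{1}{2}) = \sqrt{\pi}$.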
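The closed form for $f_T$ can be evaluated directly with the standard library. This is a minimal sketch rather than a reference implementation; the function names `beta` and `t_density` are chosen here for illustration. For $n = 1$ the formula should reduce to the Cauchy density $1 / (\pi (1 + t^2))$, which gives a simple check.

```python
import math

def beta(a, b):
    # Beta function via B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b).
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

def t_density(t, n):
    # f_T(t) = (t^2/n + 1)^(-(n+1)/2) / (sqrt(n) * B(1/2, n/2))
    return (t * t / n + 1) ** (-(n + 1) / 2) / (math.sqrt(n) * beta(0.5, n / 2))

# With n = 1 the density is Cauchy: 1 / (pi * (1 + t^2)).
for t in (0.0, 1.0, 2.5):
    print(t, t_density(t, 1), 1 / (math.pi * (1 + t * t)))
```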
Week 4 - Solutions
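1. Let $X$ be a random vector and $C$ a constant matrix; then from the definition of the covariance function for random vectors,
$$
\begin{aligned}
\mathrm{Var}(CX) &= E\bigl( (CX - E(CX))(CX - E(CX))^T \bigr) \\
&= E\bigl( C(X - E(X))\, [C(X - E(X))]^T \bigr) \\
&= C\, E\bigl( (X - E(X))(X - E(X))^T \bigr)\, C^T \\
&= C\, \mathrm{Var}(X)\, C^T.
\end{aligned}
$$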
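Recall that $b = (X^T X)^{-1} X^T y$. Therefore, by using the above formula, the fact that $X^T X$ is a symmetric matrix, and $\mathrm{Var}(y) = \sigma^2 I$,
$$
\begin{aligned}
\mathrm{Var}(b) &= \mathrm{Var}\bigl( (X^T X)^{-1} X^T y \bigr) \\
&= (X^T X)^{-1} X^T\, \mathrm{Var}(y)\, X (X^T X)^{-1} \\
&= (X^T X)^{-1} X^T\, \sigma^2 I\, X (X^T X)^{-1} \\
&= \sigma^2 (X^T X)^{-1}.
\end{aligned}
$$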
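For simple linear regression the design matrix has columns $1$ and $x$, and $\sigma^2 (X^T X)^{-1}$ should reproduce the scalar results $\mathrm{Var}(b_0) = \sigma^2 (1/n + \bar{x}^2 / S_{xx})$, $\mathrm{Cov}(b_0, b_1) = -\bar{x} \sigma^2 / S_{xx}$ and $\mathrm{Var}(b_1) = \sigma^2 / S_{xx}$. The sketch below checks this with $\sigma^2 = 1$; the data values and the function name `unscaled_covariance` are assumptions for illustration.

```python
from fractions import Fraction

def unscaled_covariance(xs):
    """Return (X^T X)^{-1} for the design matrix with columns 1 and x."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    # X^T X = [[n, sum_x], [sum_x, sum_x2]]; invert the 2x2 matrix directly.
    det = n * sum_x2 - sum_x * sum_x
    return [[sum_x2 / det, -sum_x / det],
            [-sum_x / det, Fraction(n) / det]]

# Hypothetical predictor values, chosen only for illustration.
xs = [Fraction(v) for v in (1, 2, 4, 7)]
n = len(xs)
x_bar = sum(xs) / n
s_xx = sum((x - x_bar) ** 2 for x in xs)
inv = unscaled_covariance(xs)

print(inv[0][0] == Fraction(1, n) + x_bar**2 / s_xx)  # Var(b0) / sigma^2
print(inv[0][1] == -x_bar / s_xx)                     # Cov(b0, b1) / sigma^2
print(inv[1][1] == 1 / s_xx)                          # Var(b1) / sigma^2
```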
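2. Recall that $\mathrm{tr}(X) = \sum_{i=1}^{n} X_{ii}$. Then
$$\mathrm{tr}(cX) = \sum_{i=1}^{n} c X_{ii} = c \sum_{i=1}^{n} X_{ii} = c\, \mathrm{tr}(X),$$
$$\mathrm{tr}(X + Y) = \sum_{i=1}^{n} [X_{ii} + Y_{ii}] = \sum_{i=1}^{n} X_{ii} + \sum_{i=1}^{n} Y_{ii} = \mathrm{tr}(X) + \mathrm{tr}(Y),$$
$$\mathrm{tr}(XY) = \sum_{i=1}^{n} \sum_{j=1}^{n} X_{ij} Y_{ji} = \sum_{j=1}^{n} \sum_{i=1}^{n} Y_{ji} X_{ij} = \mathrm{tr}(YX).$$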
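The three trace properties are easy to confirm on small integer matrices. The following sketch uses plain nested lists; the matrices and the helper names `trace`, `matmul`, `add` and `scale` are hypothetical, introduced only for this example.

```python
def trace(M):
    # Sum of the diagonal entries of a square matrix.
    return sum(M[i][i] for i in range(len(M)))

def matmul(X, Y):
    # Product of two square matrices of the same size.
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(X, Y):
    # Entrywise sum.
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def scale(c, X):
    # Scalar multiple.
    return [[c * x for x in row] for row in X]

# Hypothetical matrices, chosen only for illustration.
X = [[1, 2], [3, 4]]
Y = [[0, 5], [-1, 2]]

print(trace(scale(3, X)) == 3 * trace(X))
print(trace(add(X, Y)) == trace(X) + trace(Y))
print(trace(matmul(X, Y)) == trace(matmul(Y, X)))
```

Note that $XY$ and $YX$ differ as matrices here; only their traces agree.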
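3. Prove the formula $E(y^T A y) = \mathrm{tr}(AV) + \mu^T A \mu$, where $V := \mathrm{Var}(y)$ and $\mu := E(y)$.
$$
\begin{aligned}
E(y^T A y) &= \sum_{i=1}^{n} \sum_{j=1}^{n} E[y_i A_{ij} y_j] \\
&= \sum_{i=1}^{n} \sum_{j=1}^{n} A_{ij} \bigl[ \mathrm{Cov}(y_i, y_j) + E(y_i) E(y_j) \bigr] \\
&= \sum_{i=1}^{n} \sum_{j=1}^{n} A_{ij} V_{ji} + \sum_{i=1}^{n} \sum_{j=1}^{n} \mu_i A_{ij} \mu_j \\
&= \mathrm{tr}(AV) + \mu^T A \mu.
\end{aligned}
$$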
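The quadratic-form identity can be checked exactly for a random vector that takes finitely many values with equal probability, since then $E$, $\mu$ and $V$ are plain averages. The sketch below makes that assumption; the support points, the matrix `A`, and the function names are hypothetical.

```python
from fractions import Fraction

def quad_form(y, A):
    # y^T A y written as a double sum.
    n = len(y)
    return sum(y[i] * A[i][j] * y[j] for i in range(n) for j in range(n))

def check_quadratic_form_identity(points, A):
    """Return (E(y^T A y), tr(AV) + mu^T A mu) for y uniform on `points`."""
    m, n = len(points), len(A)
    mu = [sum(p[i] for p in points) / m for i in range(n)]
    # Population covariance matrix V of the uniform distribution on `points`.
    V = [[sum((p[i] - mu[i]) * (p[j] - mu[j]) for p in points) / m
          for j in range(n)] for i in range(n)]
    lhs = sum(quad_form(p, A) for p in points) / m
    trace_AV = sum(A[i][j] * V[j][i] for i in range(n) for j in range(n))
    return lhs, trace_AV + quad_form(mu, A)

# Hypothetical support points and matrix, chosen only for illustration.
points = [[Fraction(v) for v in p] for p in ([1, 2], [3, -1], [0, 4], [2, 2])]
A = [[Fraction(2), Fraction(1)], [Fraction(-1), Fraction(3)]]
lhs, rhs = check_quadratic_form_identity(points, A)
print(lhs == rhs)
```

The matrix `A` is deliberately not symmetric, since the derivation does not require symmetry.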
$$\sum_{i=1}^{k} c_i v_i$$
$$\sum_{i=1}^{k} c_i\, v_j^T v_i = c_j \|v_j\|^2,$$
since $v_j^T v_i = 0$ if $i \neq j$.
6. To show that $c = (c_1, \dots, c_k)$, where $c_i = y^T x_i / \|x_i\|^2$, solves the normal equation $X^T X c = X^T y$, it is sufficient to notice that since the columns of $X = (x_1, \dots, x_k)$ form an orthogonal basis for $V$,
$$
X^T X = \begin{pmatrix} \|x_1\|^2 & & 0 \\ & \ddots & \\ 0 & & \|x_k\|^2 \end{pmatrix},
$$
and by substituting $c = (c_1, \dots, c_k)$ into the normal equation, we see that $c$ satisfies the normal equation.
7. The projection of $y$ onto $V$ is given by
$$\hat{y} = \sum_{i=1}^{k} \frac{y^T x_i}{\|x_i\|^2}\, x_i.$$
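The projection formula translates directly into code when the basis is orthogonal. This is a minimal sketch under that assumption (the function does not verify orthogonality); the vectors and the names `dot` and `project` are hypothetical.

```python
from fractions import Fraction

def dot(u, v):
    # Euclidean inner product u^T v.
    return sum(a * b for a, b in zip(u, v))

def project(y, basis):
    """Project y onto span(basis), assuming the basis vectors are orthogonal."""
    y_hat = [Fraction(0)] * len(y)
    for x in basis:
        c = Fraction(dot(y, x), dot(x, x))  # c_i = y^T x_i / ||x_i||^2
        y_hat = [h + c * xi for h, xi in zip(y_hat, x)]
    return y_hat

# Hypothetical orthogonal basis of a plane in R^3 and a vector to project.
basis = [[1, 1, 0], [1, -1, 0]]
y = [3, 1, 5]
y_hat = project(y, basis)
residual = [yi - hi for yi, hi in zip(y, y_hat)]

print(y_hat)
print([dot(residual, x) for x in basis])
```

The residual $y - \hat{y}$ is orthogonal to every basis vector, which is the defining property of the projection.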
Week 6 - Solutions
This shows that the distance from $y$ to $\hat{y}$ is the shortest amongst all vectors of the form $cx$ for $c \in \mathbb{R}$.
3. Expand $\|y - cx\|^2 = (y - cx)^T (y - cx)$ to obtain
$$\|y - cx\|^2 = \|y\|^2 - 2c\, y^T x + c^2 \|x\|^2,$$
which is a quadratic in $c$; from the previous question we know that it has a unique minimum, and it is nonnegative for every $c$. Therefore the discriminant is less than or equal to zero. That is,
$$4 |y^T x|^2 - 4 \|x\|^2 \|y\|^2 \le 0 \implies |y^T x|^2 \le \|x\|^2 \|y\|^2.$$
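4. (a) Compute the moment generating function of a $\chi^2_k$ random variable. By making the substitution $(\frac{1}{2} - t)x = y$ (for $t < \frac{1}{2}$),
$$
\begin{aligned}
\int_0^\infty e^{tx}\, \frac{1}{2^{k/2}\, \Gamma(\frac{k}{2})}\, x^{\frac{k}{2} - 1} e^{-\frac{x}{2}}\, dx
&= \frac{1}{2^{k/2}\, \Gamma(\frac{k}{2})} \int_0^\infty x^{\frac{k}{2} - 1} e^{-(\frac{1}{2} - t)x}\, dx \\
&= \frac{1}{2^{k/2}\, \Gamma(\frac{k}{2})} \Bigl(\frac{1}{2} - t\Bigr)^{-\frac{k}{2}} \int_0^\infty y^{\frac{k}{2} - 1} e^{-y}\, dy \\
&= (2^{-1} - t)^{-\frac{k}{2}}\, \frac{1}{2^{k/2}}\, \frac{\Gamma(\frac{k}{2})}{\Gamma(\frac{k}{2})} \\
&= (1 - 2t)^{-\frac{k}{2}}.
\end{aligned}
$$
$$\frac{d}{dt} (1 - 2t)^{-\frac{k}{2}} \Big|_{t=0} = k (1 - 2t)^{-\frac{k}{2} - 1} \Big|_{t=0} = k.$$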
Therefore
$$E\Bigl( \frac{1}{\sigma^2} SS_{reg} \Bigr) = E\Bigl( \frac{1}{\sigma^2} b_1^2 S_{xx} \Bigr) = 1.$$
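and $X^T X$ is symmetric.
(d) Computing the variance of $e = (I - H)y$:
$$
\begin{aligned}
\mathrm{Var}(e) &= \mathrm{Var}((I - H)y) \\
&= (I - H)\, \mathrm{Var}(y)\, (I - H)^T \\
&= \sigma^2 (I - H)(I - H)^T \\
&= \sigma^2 (I - H).
\end{aligned}
$$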