
Week 2 - Solutions

1. We show the claim only in the case that X and Y are continuous random variables.
Let f_{X,Y} be the joint density of X and Y. Then
\[
\begin{aligned}
E(aX + bY) &= \int\!\!\int (ax + by) f_{X,Y}(x,y)\,dx\,dy \\
&= \int\!\!\int ax\, f_{X,Y}(x,y)\,dx\,dy + \int\!\!\int by\, f_{X,Y}(x,y)\,dx\,dy \\
&= a \int x f_X(x)\,dx + b \int y f_Y(y)\,dy \\
&= a E(X) + b E(Y),
\end{aligned}
\]
where in the third equality we integrated out the other variable to obtain the marginal densities f_X and f_Y.
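As a quick sanity check (a sketch, not part of the solution), the identity can be illustrated numerically; the distributions, constants, and sample size below are arbitrary choices, and numpy is assumed to be available.

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = 2.0, -3.0                                  # arbitrary constants
X = rng.normal(1.0, 2.0, size=1_000_000)          # X with E(X) = 1
Y = rng.exponential(scale=5.0, size=1_000_000)    # Y with E(Y) = 5

# E(aX + bY) versus a E(X) + b E(Y); both estimates target 2*1 - 3*5 = -13.
# (Linearity of expectation holds whether or not X and Y are independent.)
print(np.mean(a * X + b * Y))
print(a * np.mean(X) + b * np.mean(Y))
```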

2. (a) From the definition of the variance,
\[
\begin{aligned}
\mathrm{Var}(aX) &= E\big((aX - E(aX))^2\big) \\
&= a^2 E\big((X - E(X))^2\big) \quad \text{by linearity of the expectation} \\
&= a^2 \mathrm{Var}(X).
\end{aligned}
\]

(b) The first claim is obvious, so we show only the second. If X and Y are independent,
then
\[
E\big[(X - E(X))(Y - E(Y))\big] = E\big[X - E(X)\big]\, E\big[Y - E(Y)\big] = 0.
\]

(c) From the definition of the covariance function, we have
\[
\begin{aligned}
\mathrm{Cov}(X, Y) &= E\big[(X - E(X))(Y - E(Y))\big] \\
&= E\big[(Y - E(Y))(X - E(X))\big] \\
&= \mathrm{Cov}(Y, X).
\end{aligned}
\]
To prove linearity in the first entry,
\[
\begin{aligned}
\mathrm{Cov}(aX + bY, Z) &= E\big[\big((aX + bY) - E(aX + bY)\big)(Z - E(Z))\big] \\
&= a E\big[(X - E(X))(Z - E(Z))\big] + b E\big[(Y - E(Y))(Z - E(Z))\big] \\
&= a\,\mathrm{Cov}(X, Z) + b\,\mathrm{Cov}(Y, Z).
\end{aligned}
\]

Linearity in the second component follows from the fact that the covariance
function is symmetric.
Finally, for a, b \in \mathbb{R} and random variables X and Y,
\[
\begin{aligned}
\mathrm{Var}(aX + bY) &= \mathrm{Cov}(aX + bY, aX + bY) \\
&= \mathrm{Cov}(aX, aX + bY) + \mathrm{Cov}(bY, aX + bY) && \text{linearity in the first entry} \\
&= \mathrm{Cov}(aX, aX) + \mathrm{Cov}(aX, bY) + \mathrm{Cov}(bY, aX) + \mathrm{Cov}(bY, bY) && \text{linearity in the second entry} \\
&= \mathrm{Var}(aX) + 2\,\mathrm{Cov}(aX, bY) + \mathrm{Var}(bY) \\
&= a^2 \mathrm{Var}(X) + 2ab\,\mathrm{Cov}(X, Y) + b^2 \mathrm{Var}(Y).
\end{aligned}
\]
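The final identity can be checked on sample moments as well (a sketch with arbitrary constants and a deliberately correlated pair; the identity holds exactly for the biased sample moments, since they obey the same algebra).

```python
import numpy as np

rng = np.random.default_rng(1)
a, b = 1.5, -0.7                          # arbitrary constants
X = rng.normal(size=500_000)
Y = 0.4 * X + rng.normal(size=500_000)    # correlated with X, so Cov(X, Y) != 0

lhs = np.var(a * X + b * Y)
rhs = a**2 * np.var(X) + 2 * a * b * np.cov(X, Y, bias=True)[0, 1] + b**2 * np.var(Y)
print(lhs, rhs)   # identical up to floating-point rounding
```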

3. In the simple linear regression model,
\[
\begin{aligned}
SS_{reg} &= \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 \\
&= \sum_{i=1}^{n} \big(b_0 + b_1 x_i - (b_0 + b_1 \bar{x})\big)^2 \\
&= \sum_{i=1}^{n} b_1^2 (x_i - \bar{x})^2 \\
&= b_1^2 S_{xx},
\end{aligned}
\]
where in the second equality we have used the fact that
\[
\hat{y}_i = b_0 + b_1 x_i \qquad \text{and} \qquad \bar{y} = b_0 + b_1 \bar{x}.
\]
One can conclude that SS_{reg} = S_{xy}^2 / S_{xx} by using the fact that b_1 = S_{xy}/S_{xx}.
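A short numerical check of the identity on a simulated data set (a sketch; the true coefficients and noise level are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=50)
y = 1.0 + 2.5 * x + rng.normal(scale=3.0, size=50)   # arbitrary simple linear model

Sxx = np.sum((x - x.mean()) ** 2)
Sxy = np.sum((x - x.mean()) * (y - y.mean()))
b1 = Sxy / Sxx
b0 = y.mean() - b1 * x.mean()

y_hat = b0 + b1 * x
SSreg = np.sum((y_hat - y.mean()) ** 2)
print(SSreg, b1**2 * Sxx, Sxy**2 / Sxx)   # all three expressions agree
```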

4. We compute directly,
\[
\begin{aligned}
S_{xx} &= \sum_{i=1}^{n} (x_i - \bar{x})\, x_i \\
&= \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x} + \bar{x}) \\
&= \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x}) + \bar{x} \sum_{i=1}^{n} (x_i - \bar{x}) \\
&= \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x}),
\end{aligned}
\]
where the second term is zero since
\[
\sum_{i=1}^{n} (x_i - \bar{x})\,\bar{x} = \bar{x} \sum_{i=1}^{n} (x_i - \bar{x}) = \bar{x} \left( \sum_{i=1}^{n} x_i - n\bar{x} \right) = 0.
\]
By a similar argument,
\[
\begin{aligned}
S_{xy} &= \sum_{i=1}^{n} (x_i - \bar{x})\, y_i \\
&= \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y} + \bar{y}) \\
&= \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) + \bar{y} \sum_{i=1}^{n} (x_i - \bar{x}) \\
&= \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}).
\end{aligned}
\]
On the other hand,
\[
\begin{aligned}
S_{xy} &= \sum_{i=1}^{n} (x_i - \bar{x})\, y_i \\
&= \sum_{i=1}^{n} y_i x_i - \bar{x} \sum_{i=1}^{n} y_i \\
&= \sum_{i=1}^{n} y_i x_i - \frac{1}{n} \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i.
\end{aligned}
\]

5. Compute Var(b_0).
\[
\begin{aligned}
\mathrm{Var}(b_0) &= \mathrm{Var}(\bar{y} - b_1 \bar{x}) \\
&= \mathrm{Var}(\bar{y}) + \mathrm{Var}(b_1 \bar{x}) - 2\,\mathrm{Cov}(\bar{y}, b_1 \bar{x}) \\
&= \frac{\sigma^2}{n} + \bar{x}^2 \frac{\sigma^2}{S_{xx}},
\end{aligned}
\]
where we have made use of the following.


\[
\begin{aligned}
\mathrm{Var}(\bar{y}) &= \mathrm{Var}\left( \frac{1}{n} \sum_{i=1}^{n} y_i \right) \\
&= \frac{1}{n^2} \sum_{i=1}^{n} \mathrm{Var}(y_i) \quad \text{by independence of the } y_i \\
&= \frac{\sigma^2}{n},
\end{aligned}
\]
using the fact that \mathrm{Var}(y_i) = \sigma^2,

and
\[
\begin{aligned}
\mathrm{Var}(b_1 \bar{x}) &= \bar{x}^2\, \mathrm{Var}(b_1) \\
&= \bar{x}^2\, \frac{1}{S_{xx}^2} \sum_{i=1}^{n} (x_i - \bar{x})^2\, \mathrm{Var}(y_i) \quad \text{using the fact that } b_1 = \frac{S_{xy}}{S_{xx}} \\
&= \bar{x}^2\, \frac{\sigma^2}{S_{xx}}.
\end{aligned}
\]

To compute the covariance,
\[
\begin{aligned}
\mathrm{Cov}(\bar{y}, b_1 \bar{x}) &= \frac{\bar{x}}{S_{xx}}\, \mathrm{Cov}(\bar{y}, S_{xy}) \\
&= \frac{\bar{x}}{n\, S_{xx}} \sum_{i=1}^{n} \mathrm{Cov}(y_i, S_{xy}) \\
&= \frac{\bar{x}}{n\, S_{xx}} \sum_{i=1}^{n} \mathrm{Cov}\Big( y_i, \sum_{j=1}^{n} (x_j - \bar{x})\, y_j \Big) && \text{by linearity in the first component} \\
&= \frac{\bar{x}}{n\, S_{xx}} \sum_{i=1}^{n} \sum_{j=1}^{n} (x_j - \bar{x})\, \mathrm{Cov}(y_i, y_j) && \text{by linearity in the second component} \\
&= \frac{\bar{x}}{n\, S_{xx}}\, \sigma^2 \sum_{i=1}^{n} (x_i - \bar{x}) && \mathrm{Cov}(y_i, y_j) = \sigma^2 \text{ when } j = i \text{, and } 0 \text{ otherwise} \\
&= 0.
\end{aligned}
\]

6. Compute Cov(b_0, b_1).
\[
\begin{aligned}
\mathrm{Cov}(b_0, b_1) &= \mathrm{Cov}(\bar{y} - b_1 \bar{x}, b_1) \\
&= \mathrm{Cov}(\bar{y}, b_1) - \mathrm{Cov}(b_1 \bar{x}, b_1) \\
&= -\bar{x}\, \mathrm{Cov}(b_1, b_1) && \text{using } \mathrm{Cov}(\bar{y}, b_1) = 0 \text{ from the previous question} \\
&= -\bar{x}\, \sigma^2 \frac{1}{S_{xx}}.
\end{aligned}
\]
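Both results from Questions 5 and 6 can be checked by simulation (a sketch; the design points, true coefficients, and noise level are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
n, sigma, beta0, beta1 = 30, 2.0, 1.0, 0.5
x = rng.uniform(0, 10, size=n)                   # fixed design
Sxx = np.sum((x - x.mean()) ** 2)

# many simulated response vectors, one per row
Y = beta0 + beta1 * x + rng.normal(scale=sigma, size=(200_000, n))
b1 = Y @ (x - x.mean()) / Sxx
b0 = Y.mean(axis=1) - b1 * x.mean()

print(np.var(b0), sigma**2 * (1 / n + x.mean()**2 / Sxx))    # Var(b0)
print(np.cov(b0, b1)[0, 1], -sigma**2 * x.mean() / Sxx)      # Cov(b0, b1)
```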

7. Let X \sim N(0, 1) and Y \sim \chi^2(n) be independent, and compute the density of T := X / \sqrt{Y/n}.
\[
\begin{aligned}
P(T \le t) &= P\big(X \le t \sqrt{Y/n}\big) \\
&= \int\!\!\int_{x \le t\sqrt{y/n}} f_{X,Y}(x, y)\,dx\,dy \\
&= \int\!\!\int_{x \le t\sqrt{y/n}} f_X(x) f_Y(y)\,dx\,dy \\
&= \int_0^{\infty} f_Y(y) \int_{-\infty}^{t\sqrt{y/n}} f_X(x)\,dx\,dy \\
&= \int_0^{\infty} f_Y(y)\, F_X\big(t\sqrt{y/n}\big)\,dy,
\end{aligned}
\]

assuming that we can differentiate under the integral sign to obtain
\[
\begin{aligned}
f_T(t) &= \int_0^{\infty} f_Y(y)\, \sqrt{y/n}\, f_X\big(t\sqrt{y/n}\big)\,dy \\
&= \int_0^{\infty} \frac{1}{2^{n/2}\Gamma(n/2)}\, y^{\frac{n}{2}-1} e^{-\frac{y}{2}} \cdot \sqrt{\frac{y}{n}} \cdot \frac{1}{\sqrt{2\pi}}\, e^{-\frac{t^2 y}{2n}}\,dy \\
&= \frac{1}{2^{n/2}\Gamma(n/2)\sqrt{2\pi n}} \int_0^{\infty} y^{\frac{n-1}{2}}\, e^{-\frac{1}{2}\left(\frac{t^2}{n}+1\right) y}\,dy.
\end{aligned}
\]

Change variable by setting \frac{1}{2}\left(\frac{t^2}{n}+1\right) y = z. Using the fact that \Gamma(\tfrac{1}{2}) = \sqrt{\pi}
and the definition of the Gamma function, we obtain
\[
\begin{aligned}
f_T(t) &= \frac{1}{2^{n/2}\Gamma(n/2)\sqrt{2\pi n}} \left(\frac{t^2}{n}+1\right)^{-\frac{n-1}{2}} 2^{\frac{n-1}{2}} \int_0^{\infty} z^{\frac{n-1}{2}}\, e^{-z}\, \frac{2}{\frac{t^2}{n}+1}\,dz \\
&= \frac{2^{\frac{n+1}{2}}}{2^{n/2}\Gamma(n/2)\sqrt{2\pi n}} \left(\frac{t^2}{n}+1\right)^{-\frac{n+1}{2}} \Gamma\!\left(\frac{n+1}{2}\right) \\
&= \frac{\Gamma\!\left(\frac{n+1}{2}\right)}{\Gamma(n/2)\sqrt{\pi n}} \left(\frac{t^2}{n}+1\right)^{-\frac{n+1}{2}} \\
&= \frac{1}{B(\tfrac{1}{2}, \tfrac{n}{2})\,\sqrt{n}} \left(\frac{t^2}{n}+1\right)^{-\frac{n+1}{2}},
\end{aligned}
\]
where the last equality follows from the fact that B(\tfrac{1}{2}, \tfrac{n}{2}) = \frac{\Gamma(\tfrac{1}{2})\,\Gamma(\tfrac{n}{2})}{\Gamma(\tfrac{n+1}{2})}.
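A quick numerical comparison of the final formula against a reference implementation (a sketch; scipy is assumed to be available and the degrees of freedom are arbitrary):

```python
import numpy as np
from scipy import stats, special

n = 7
t = np.linspace(-4, 4, 9)

derived = (1 + t**2 / n) ** (-(n + 1) / 2) / (special.beta(0.5, n / 2) * np.sqrt(n))
print(np.max(np.abs(derived - stats.t.pdf(t, df=n))))   # essentially zero
```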

8. It is enough to notice that x^T A^T A x = \|Ax\|^2 \ge 0 and x^T A A^T x = \|A^T x\|^2 \ge 0.



Week 4 - Solutions

1. From the definition of the covariance matrix of a random vector, for a constant matrix C,
\[
\begin{aligned}
\mathrm{Var}(CX) &= E\big((CX - E(CX))(CX - E(CX))^T\big) \\
&= E\big([C(X - E(X))]\,[C(X - E(X))]^T\big) \\
&= C\, E\big((X - E(X))(X - E(X))^T\big)\, C^T \\
&= C\,\mathrm{Var}(X)\, C^T.
\end{aligned}
\]
Recall that b = (X^T X)^{-1} X^T y. Therefore, by using the above formula and the fact
that X^T X is a symmetric matrix,
\[
\begin{aligned}
\mathrm{Var}(b) &= \mathrm{Var}\big((X^T X)^{-1} X^T y\big) \\
&= (X^T X)^{-1} X^T\, \mathrm{Var}(y)\, X (X^T X)^{-1} \\
&= (X^T X)^{-1} X^T\, \sigma^2 I\, X (X^T X)^{-1} \\
&= \sigma^2 (X^T X)^{-1}.
\end{aligned}
\]
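The sampling covariance of b can be checked by simulation (a sketch; the design matrix, coefficients, and noise level are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(5)
n, sigma = 40, 1.5
X = np.column_stack([np.ones(n), rng.uniform(-1, 1, size=(n, 2))])   # fixed design with intercept
beta = np.array([1.0, 2.0, -0.5])

Y = X @ beta + rng.normal(scale=sigma, size=(100_000, n))            # many simulated data sets
B = np.linalg.solve(X.T @ X, X.T @ Y.T).T                            # OLS estimate for each data set

print(np.cov(B.T))                            # empirical Var(b)
print(sigma**2 * np.linalg.inv(X.T @ X))      # sigma^2 (X^T X)^{-1}
```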

2. Given an n \times n matrix X, tr(X) = \sum_{i=1}^{n} X_{ii}.

(a) Show tr(cX) = ctr(X).


\[
tr(cX) = \sum_{i=1}^{n} c X_{ii} = c \sum_{i=1}^{n} X_{ii} = c\, tr(X).
\]

(b) Show tr(X + Y) = tr(X) + tr(Y).


\[
tr(X + Y) = \sum_{i=1}^{n} [X_{ii} + Y_{ii}] = \sum_{i=1}^{n} X_{ii} + \sum_{i=1}^{n} Y_{ii} = tr(X) + tr(Y).
\]

(c) Show tr(XY ) = tr(Y X).


\[
tr(XY) = \sum_{i=1}^{n} \sum_{j=1}^{n} X_{ij} Y_{ji} = \sum_{j=1}^{n} \sum_{i=1}^{n} Y_{ji} X_{ij} = tr(YX).
\]
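All three trace properties are easy to confirm numerically (a sketch with arbitrary random matrices):

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(4, 4))
Y = rng.normal(size=(4, 4))
c = 2.7

print(np.isclose(np.trace(c * X), c * np.trace(X)))                # (a)
print(np.isclose(np.trace(X + Y), np.trace(X) + np.trace(Y)))      # (b)
print(np.isclose(np.trace(X @ Y), np.trace(Y @ X)))                # (c)
```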

3. Prove the formula E(y^T A y) = tr(AV) + \mu^T A \mu, where V := \mathrm{Var}(y) and \mu := E(y).
\[
\begin{aligned}
E(y^T A y) &= \sum_{i=1}^{n} \sum_{j=1}^{n} E[y_i A_{ij} y_j] \\
&= \sum_{i=1}^{n} \sum_{j=1}^{n} A_{ij} \big[\mathrm{Cov}(y_i, y_j) + E(y_i) E(y_j)\big] \\
&= \sum_{i=1}^{n} \sum_{j=1}^{n} A_{ij} V_{ji} + \sum_{i=1}^{n} \sum_{j=1}^{n} E(y_i)\, A_{ij}\, E(y_j) \\
&= tr(AV) + \mu^T A \mu,
\end{aligned}
\]
where in the second equality we have used the fact that
\[
\mathrm{Cov}(y_i, y_j) = E(y_i y_j) - E(y_i) E(y_j).
\]
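A Monte Carlo check of the quadratic-form formula (a sketch; A, \mu and V below are arbitrary, with V built as L L^T so that it is a valid covariance matrix):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 3
A = rng.normal(size=(n, n))
mu = rng.normal(size=n)
L = rng.normal(size=(n, n))
V = L @ L.T                                   # a valid covariance matrix

y = mu + rng.multivariate_normal(np.zeros(n), V, size=500_000)
lhs = np.mean(np.einsum("ki,ij,kj->k", y, A, y))   # average of y^T A y over the samples
rhs = np.trace(A @ V) + mu @ A @ mu
print(lhs, rhs)   # agree up to Monte Carlo error
```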

4. To find \hat{y}, we need to solve the following system of equations:
\[
\begin{pmatrix} 1 & 1 & 1 \\ 2 & 1 & -1 \end{pmatrix}
\begin{pmatrix} \hat{y}_1 \\ \hat{y}_2 \\ \hat{y}_3 \end{pmatrix}
=
\begin{pmatrix} 1 & 1 & 1 \\ 2 & 1 & -1 \end{pmatrix}
\begin{pmatrix} 7 \\ 0 \\ 2 \end{pmatrix}
=
\begin{pmatrix} 9 \\ 12 \end{pmatrix}.
\]

Please enjoy yourself by doing row reduction.


5. Suppose v_1, \ldots, v_k is an orthogonal basis of V. Since \hat{y} is assumed to be in V, then
\[
\hat{y} = \sum_{i=1}^{k} c_i v_i
\]
for some c_i \in \mathbb{R}. It is then sufficient to find the c_i. To do that, we multiply both sides by v_j^T and obtain
\[
v_j^T \hat{y} = \sum_{i=1}^{k} c_i\, v_j^T v_i = c_j \|v_j\|^2,
\]
since v_j^T v_i = 0 if i \ne j.
6. To show that c = (c_1, \ldots, c_k), where c_i = y^T x_i / \|x_i\|^2, solves the normal equations,
it is sufficient to notice that since the columns of X = (x_1, \ldots, x_k) form an orthogonal basis for V,
\[
X^T X = \begin{pmatrix} \|x_1\|^2 & & 0 \\ & \ddots & \\ 0 & & \|x_k\|^2 \end{pmatrix}
\]
and (X^T y)_i = x_i^T y = y^T x_i. Substituting c = (c_1, \ldots, c_k) into the normal equations X^T X c = X^T y, the i-th equation reads \|x_i\|^2 c_i = y^T x_i, which is satisfied by the given c_i.
7. The projection of y onto V is given by
\[
\hat{y} = \sum_{i=1}^{k} \frac{y^T x_i}{\|x_i\|^2}\, x_i.
\]
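The projection formula is easy to try out numerically (a sketch; x_1 and x_2 below are an arbitrary orthogonal pair in R^4, and the residual is checked to be orthogonal to both):

```python
import numpy as np

rng = np.random.default_rng(8)
x1 = np.array([1.0, 1.0, 0.0, 0.0])
x2 = np.array([1.0, -1.0, 2.0, 0.0])
assert np.isclose(x1 @ x2, 0.0)          # orthogonal basis of the plane V

y = rng.normal(size=4)
y_hat = (y @ x1) / (x1 @ x1) * x1 + (y @ x2) / (x2 @ x2) * x2

print(np.isclose((y - y_hat) @ x1, 0.0), np.isclose((y - y_hat) @ x2, 0.0))
```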

Week 6 - Solutions

1. For vectors x and y, by using the fact that x^T y = y^T x, we have
\[
\begin{aligned}
\|x - y\|^2 + \|x + y\|^2 &= (x - y)^T (x - y) + (x + y)^T (x + y) \\
&= x^T x - 2 x^T y + y^T y + x^T x + 2 x^T y + y^T y \\
&= 2\|x\|^2 + 2\|y\|^2.
\end{aligned}
\]

2. Given the vectors y and x, add and subtract \hat{y} from y - cx:
\[
\begin{aligned}
y - cx &= y - cx - \hat{y} + \hat{y} \\
&= (y - \hat{y}) + (\hat{y} - cx).
\end{aligned}
\]
The vector y - \hat{y} is perpendicular to \hat{y} - cx = (b - c)x. Therefore, from Pythagoras'
theorem,
\[
\|y - cx\|^2 = \|y - \hat{y}\|^2 + \|\hat{y} - cx\|^2 \ge \|y - \hat{y}\|^2.
\]
This shows that, amongst all vectors of the form cx with c \in \mathbb{R}, the projection \hat{y} is the closest to y.
3. Expand \|y - cx\|^2 = (y - cx)^T (y - cx) to obtain
\[
\|y - cx\|^2 = \|y\|^2 - 2c\, y^T x + c^2 \|x\|^2,
\]
which is a quadratic in c. Since it is non-negative for every c, the quadratic has at most one real root, so its discriminant is less than or equal to zero. That is,
\[
4|y^T x|^2 - 4\|x\|^2 \|y\|^2 \le 0 \implies |y^T x|^2 \le \|x\|^2 \|y\|^2.
\]
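A two-line numeric illustration of the resulting Cauchy-Schwarz inequality (a sketch; the vectors are arbitrary, and equality is checked for a collinear pair):

```python
import numpy as np

rng = np.random.default_rng(9)
x, y = rng.normal(size=5), rng.normal(size=5)
print((x @ y) ** 2 <= (x @ x) * (y @ y))                           # True for any x, y
print(np.isclose((x @ (3 * x)) ** 2, (x @ x) * ((3 * x) @ (3 * x))))  # equality when y is a multiple of x
```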

4. (a) Compute the moment generating function of a \chi^2_k random variable. By making the
substitution (\tfrac{1}{2} - t)x = y,
\[
\begin{aligned}
M(t) = \int_0^{\infty} e^{tx}\, \frac{1}{2^{k/2}\Gamma(k/2)}\, x^{\frac{k}{2}-1} e^{-\frac{x}{2}}\,dx
&= \frac{1}{2^{k/2}\Gamma(k/2)} \int_0^{\infty} x^{\frac{k}{2}-1} e^{-(\frac{1}{2}-t)x}\,dx \\
&= \frac{1}{2^{k/2}\Gamma(k/2)} \int_0^{\infty} \big((\tfrac{1}{2}-t)^{-1} y\big)^{\frac{k}{2}-1} e^{-y}\, (\tfrac{1}{2}-t)^{-1}\,dy \\
&= \frac{(\tfrac{1}{2}-t)^{-\frac{k}{2}}}{2^{k/2}\Gamma(k/2)} \int_0^{\infty} y^{\frac{k}{2}-1} e^{-y}\,dy \\
&= (\tfrac{1}{2}-t)^{-\frac{k}{2}}\, \frac{1}{2^{k/2}} \\
&= (1 - 2t)^{-\frac{k}{2}},
\end{aligned}
\]
for t < \tfrac{1}{2}, where we have used the definition of the Gamma function, \int_0^{\infty} y^{\frac{k}{2}-1} e^{-y}\,dy = \Gamma(k/2).

To compute the expectation,
\[
M'(t)\big|_{t=0} = \frac{d}{dt} (1 - 2t)^{-\frac{k}{2}} \Big|_{t=0} = k (1 - 2t)^{-\frac{k}{2}-1} \Big|_{t=0} = k.
\]
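The MGF formula and the resulting mean can be checked by simulation (a sketch; the degrees of freedom and the point t < 1/2 are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(10)
k, t = 5, 0.2
samples = rng.chisquare(df=k, size=2_000_000)

print(np.mean(np.exp(t * samples)), (1 - 2 * t) ** (-k / 2))   # simulated MGF vs formula
print(np.mean(samples), k)                                      # mean equals k
```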

(b) Recall that SS_{reg} = b_1^2 S_{xx} and b_1 \sim N(\beta_1, \sigma^2 S_{xx}^{-1}). Therefore, under H_0: \beta_1 = 0, we have
\[
\frac{1}{\sigma^2}\, b_1^2 S_{xx} \sim \chi^2_1.
\]
Therefore
\[
E\Big( \frac{1}{\sigma^2} SS_{reg} \Big) = E\Big( \frac{1}{\sigma^2} b_1^2 S_{xx} \Big) = 1.
\]

(c) Computing E(SS_{reg}) in general,
\[
\begin{aligned}
E(SS_{reg}) &= E(b_1^2 S_{xx}) \\
&= S_{xx}\, E(b_1^2) \\
&= \big[\mathrm{Var}(b_1) + E(b_1)^2\big] S_{xx} \\
&= \sigma^2 S_{xx} \frac{1}{S_{xx}} + \beta_1^2 S_{xx} \\
&= \sigma^2 + \beta_1^2 S_{xx}.
\end{aligned}
\]
We see that E(SS_{reg}) under the null is smaller, since \beta_1^2 S_{xx} \ge 0.


(d) Recall that \frac{1}{\sigma^2} SS_{res} = \frac{(n-p)\,\hat{\sigma}^2}{\sigma^2} \sim \chi^2_{n-p}. In the simple linear regression model p = 2.
Therefore
\[
E(SS_{res}) = \sigma^2 (n - 2).
\]
Using the fundamental identity, we have
\[
\begin{aligned}
E(SS_{total}) &= E(SS_{reg}) + E(SS_{res}) \\
&= \sigma^2 (n - 2) + \sigma^2 + \beta_1^2 S_{xx} \\
&= \sigma^2 (n - 1) + \beta_1^2 S_{xx}.
\end{aligned}
\]
Under the null \beta_1 = 0, E(SS_{total}) = \sigma^2 (n - 1).
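The expectations in (c) and (d) can be checked by simulating a simple linear regression many times (a sketch; the design, coefficients, and noise level are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(11)
n, sigma, beta0, beta1 = 25, 2.0, 1.0, 0.8
x = rng.uniform(0, 10, size=n)
Sxx = np.sum((x - x.mean()) ** 2)

Y = beta0 + beta1 * x + rng.normal(scale=sigma, size=(100_000, n))
b1 = Y @ (x - x.mean()) / Sxx
b0 = Y.mean(axis=1) - b1 * x.mean()
Y_hat = b0[:, None] + b1[:, None] * x

SS_reg = np.sum((Y_hat - Y.mean(axis=1, keepdims=True)) ** 2, axis=1)
SS_res = np.sum((Y - Y_hat) ** 2, axis=1)

print(SS_reg.mean(), sigma**2 + beta1**2 * Sxx)   # E(SS_reg)
print(SS_res.mean(), sigma**2 * (n - 2))          # E(SS_res)
```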


5. Properties of e.
(a) In the simple linear regression model,
\[
E(e_i) = E(y_i - \hat{y}_i) = E(\beta_0 + \beta_1 x_i - b_0 - b_1 x_i),
\]
which is equal to zero, since E(b_0) = \beta_0 and E(b_1) = \beta_1.

Alternatively, to show that E(e) = E(y - Xb) = 0, it is enough to use the fact
that b is an unbiased estimator of \beta and write
\[
E(y - Xb) = E(y) - X E(b) = X\beta - X\beta = 0.
\]

(b) Xb = X(X^T X)^{-1} X^T y = Hy.


(c) The matrix H is an n \times n matrix. The matrix H is symmetric, since
\[
H^T = \big(X(X^T X)^{-1} X^T\big)^T = (X^T)^T \big((X^T X)^{-1}\big)^T X^T = X (X^T X)^{-1} X^T = H,
\]
where we used that X^T X is symmetric, hence so is (X^T X)^{-1}.
(d) Computing the variance of e = (I - H)y,
\[
\begin{aligned}
\mathrm{Var}(e) &= \mathrm{Var}\big((I - H)y\big) \\
&= (I - H)\,\mathrm{Var}(y)\,(I - H)^T \\
&= \sigma^2 (I - H)(I - H)^T \\
&= \sigma^2 (I - H),
\end{aligned}
\]
where the last equality holds since I - H is symmetric and idempotent.



(e) From the above, we can write
\[
\mathrm{Var}(e_i) = \sigma^2 (1 - H_{ii}), \qquad \mathrm{Cov}(e_i, e_j) = -\sigma^2 H_{ij} \quad (i \ne j).
\]
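These properties of H and e are straightforward to verify numerically (a sketch; the design matrix, coefficients, and noise level are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(12)
n, sigma = 30, 1.0
X = np.column_stack([np.ones(n), rng.uniform(-1, 1, size=(n, 2))])   # arbitrary design, p = 3
H = X @ np.linalg.inv(X.T @ X) @ X.T

# H is symmetric and idempotent, hence so is I - H
print(np.allclose(H, H.T), np.allclose(H @ H, H))

beta = np.array([1.0, -2.0, 0.5])
Y = X @ beta + rng.normal(scale=sigma, size=(100_000, n))   # many simulated response vectors
E = Y @ (np.eye(n) - H)                                     # residuals e = (I - H) y, row by row

print(np.max(np.abs(np.cov(E.T) - sigma**2 * (np.eye(n) - H))))   # small Monte Carlo error
```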
