EXAMINATIONS
1. This examination paper contains four (4) questions and comprises seven (7) printed pages.
2. Candidates must answer ALL questions on the paper.
3. Each question carries 20 marks. The total mark for the paper is 80.
4. Calculators may not be used.
5. Statistical tables will not be available.
6. This is a closed book exam.
ST5223
1. Consider the linear model:

$$Y_i = \beta_0 + \sum_{j=1}^{p-1} \beta_j x_{ij} + \epsilon_i, \qquad i \in \{1, \dots, n\}$$

where the $\epsilon_i$ are i.i.d. zero mean random variables, $p > 1$. For simplicity we will assume: $\epsilon_i \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2)$, with $N(0, \sigma^2)$ the univariate normal distribution with zero mean and variance $\sigma^2$. Assume $p < n$.
(i) Consider the estimation of $\beta = \beta_{0:p-1}$. Show that maximizing the log-likelihood of the normal linear model and minimizing the residual sum of squares lead to the same estimators. [3 Marks].
(ii) Using vector differentiation, derive the least squares estimator. Show that the estimator is unbiased. Note: you do not need to use any QR decompositions. [4 Marks].
(iii) State, but do not prove, the Gauss-Markov theorem. [3 Marks].
(iv) Let $\tilde{\beta} = BY$, where $B$ is a deterministic $p \times n$ matrix (which could depend upon $X$), be any unbiased estimator of $\beta$. Show that $BX = I_p$ and establish that
$$(B - S^{-1}X')(B - S^{-1}X')' + S^{-1} = BB'$$
where $S = X'X$. [5 Marks].
(v) Show that the variance of $\tilde{\beta}$ in (iv) is minimized (that is, the diagonal elements of the covariance matrix) when $\tilde{\beta} = (X'X)^{-1}X'Y$. [5 Marks].
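As an illustrative aside (not part of the examination), the estimator in (v) can be checked numerically. A minimal numpy sketch, with all values chosen arbitrarily and the first column of $X$ taken as an intercept, comparing the closed form $(X'X)^{-1}X'Y$ with a library solver and illustrating unbiasedness by Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, sigma = 200, 3, 0.5
beta = np.array([1.0, -2.0, 0.5])      # true coefficients beta_{0:p-1}
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])  # intercept column first

# One realisation: closed-form estimator (X'X)^{-1} X'Y vs numpy's solver
Y = X @ beta + sigma * rng.normal(size=n)
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
assert np.allclose(beta_hat, np.linalg.lstsq(X, Y, rcond=None)[0])

# Monte Carlo illustration of unbiasedness: the average of beta_hat over
# repeated data sets should be close to the true beta
reps = 2000
est = np.empty((reps, p))
for r in range(reps):
    Yr = X @ beta + sigma * rng.normal(size=n)
    est[r] = np.linalg.solve(X.T @ X, X.T @ Yr)
print(np.round(est.mean(axis=0), 2))   # close to [1.0, -2.0, 0.5]
```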
2.
This question again concerns the normal linear model, under study in question 1, which in matrix and vector notation is:
$$Y = X\beta + \epsilon$$
with $\epsilon \sim N_n(0, \sigma^2 I_n)$. Recall $N_n(0, \sigma^2 I_n)$ is the normal distribution with mean vector $0$ and covariance matrix $\sigma^2 I_n$, $I_n$ the $n \times n$ identity matrix.
(i) Suppose that there is an orthogonal matrix $Q$ such that the $n \times p$ matrix $R = QX$ is upper-triangular ($r_{ij} = 0$ for $i > j$). Show that
$$(Y - X\beta)'(Y - X\beta) = (QY - R\beta)'(QY - R\beta)$$
and hence that minimizing the residual sum of squares is the same as minimizing $\|QY - R\beta\|^2$. [4 Marks].
(ii) By partitioning $R$ into its first $p$ rows ($U$; you may assume that this matrix has rank $p$) and its zero $n - p$ rows, and partitioning $Q$ in a similar manner into $V$ and $W$, show that one must take $U\beta = VY$ to minimize the residual sum of squares. [4 Marks].
(iii) For the linear model under study, show that:
$$E[\hat{\beta}] = \beta, \qquad \text{Var}[\hat{\beta}] = \sigma^2 (X'X)^{-1}$$
where $U\hat{\beta} = VY$. [6 Marks].
(iv) For the linear model under study, show that:
$$E[\tilde{\sigma}^2] = \sigma^2$$
where $\tilde{\sigma}^2 = \text{RSS}(\hat{\beta})/(n - p)$ and $\text{RSS}(\hat{\beta}) = \|WY\|^2$. [6 Marks].
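As an illustrative aside, the QR construction in (i)-(ii) can be checked numerically. A sketch assuming numpy/scipy, taking the exam's $Q$ as the transpose of the orthogonal factor from a complete QR decomposition of $X$ (all data values arbitrary):

```python
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(1)
n, p = 50, 4
X = rng.normal(size=(n, p))
beta = rng.normal(size=p)
Y = X @ beta + 0.1 * rng.normal(size=n)

# Complete QR: X = Qf R with Qf an n x n orthogonal matrix; the exam's Q is Qf'
Qf, R = np.linalg.qr(X, mode="complete")
Q = Qf.T                       # orthogonal, with Q X = R upper-triangular (n x p)
U = R[:p, :]                   # first p rows of R (rank p)
V, W = Q[:p, :], Q[p:, :]      # matching partition of Q

# The least squares solution solves the triangular system U beta_hat = V Y
beta_hat = solve_triangular(U, V @ Y)
assert np.allclose(beta_hat, np.linalg.lstsq(X, Y, rcond=None)[0])

# RSS(beta_hat) = ||W Y||^2, as in part (iv)
rss = np.sum((Y - X @ beta_hat) ** 2)
assert np.allclose(rss, np.sum((W @ Y) ** 2))
```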
3.
This question again concerns the normal linear model, under study in question 1, which in matrix and vector notation is:
$$Y = X\beta + \epsilon.$$
Adopt the prior $\beta \mid \sigma^2 \sim N_p(m, \sigma^2 V)$, $\sigma^2 \sim IG(a, b)$, where $IG(a, b)$ is the inverse Gamma distribution with probability density function:
$$p(\sigma^2) = \frac{b^a}{\Gamma(a)} (\sigma^2)^{-(a+1)} e^{-b/\sigma^2}$$
with $a, b > 0$ and $\Gamma(\cdot)$ the gamma function. The joint distribution of $\beta, \sigma^2$ is called the normal-inverse gamma distribution, denoted $NIG(m, V, a, b)$.
(a) Show that the posterior density function of $\beta, \sigma^2$ is of the form:
$$\pi(\beta, \sigma^2 \mid Y, X) \propto (\sigma^2)^{-(a^* + p/2 + 1)} \exp\left\{ -\frac{1}{2\sigma^2} \left[ (\beta - m^*)'(V^*)^{-1}(\beta - m^*) + 2b^* \right] \right\}$$
where
$$\begin{aligned}
m^* &= V^*(V^{-1}m + X'Y) \\
V^* &= (V^{-1} + X'X)^{-1} \\
a^* &= a + n/2 \\
b^* &= b + [m'V^{-1}m + Y'Y - (m^*)'(V^*)^{-1}m^*]/2
\end{aligned}$$
where all inverses are assumed to exist. Hence identify the joint posterior distribution of $\beta, \sigma^2$. You may assume, without proof, that
$$(Y - X\beta)'(Y - X\beta) + (\beta - m)'V^{-1}(\beta - m) + 2b = (\beta - m^*)'(V^*)^{-1}(\beta - m^*) + 2b^*.$$
[5 Marks].
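As an illustrative aside, the update in (a) is straightforward to code. The sketch below (numpy, with arbitrary illustrative data and prior hyperparameters) computes $m^*, V^*, a^*, b^*$ and verifies the quoted completing-the-square identity at a randomly chosen $\beta$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 30, 3
X = rng.normal(size=(n, p))
Y = rng.normal(size=n)

# Illustrative NIG(m, V, a, b) prior hyperparameters
m = np.zeros(p)
V = np.eye(p)
a, b = 2.0, 1.0

# Posterior hyperparameters, as in part (a)
V_star = np.linalg.inv(np.linalg.inv(V) + X.T @ X)
m_star = V_star @ (np.linalg.inv(V) @ m + X.T @ Y)
a_star = a + n / 2
b_star = b + (m @ np.linalg.inv(V) @ m + Y @ Y
              - m_star @ np.linalg.inv(V_star) @ m_star) / 2

# Check the assumed identity at an arbitrary beta: both sides agree exactly
beta = rng.normal(size=p)
lhs = ((Y - X @ beta) @ (Y - X @ beta)
       + (beta - m) @ np.linalg.inv(V) @ (beta - m) + 2 * b)
rhs = (beta - m_star) @ np.linalg.inv(V_star) @ (beta - m_star) + 2 * b_star
assert np.allclose(lhs, rhs)
```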
(b) Show that the marginal posterior density of $\beta$ is:
$$\pi(\beta \mid Y, X) = \frac{\Gamma(\tfrac{1}{2}(2a^* + p))\, |V^*|^{-1/2}}{\Gamma(a^*)\, (2b^*)^{p/2}\, \pi^{p/2}} \left[ 1 + (\beta - m^*)'(V^*)^{-1}(\beta - m^*)/2b^* \right]^{-(2a^* + p)/2}.$$
Comment on the tails of the posterior distribution, relative to that of the prior. [5 Marks].
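As an illustrative aside, the density in (b) is that of a multivariate Student $t$ with $2a^*$ degrees of freedom, location $m^*$ and scale matrix $(b^*/a^*)V^*$. A numerical sketch (numpy/scipy, with arbitrary stand-in values for the posterior hyperparameters) comparing the displayed formula against scipy's multivariate $t$:

```python
import numpy as np
from scipy.stats import multivariate_t
from scipy.special import gammaln

rng = np.random.default_rng(3)
p = 3
# Illustrative stand-ins for the posterior hyperparameters m*, V*, a*, b*
m_star = rng.normal(size=p)
A = rng.normal(size=(p, p))
V_star = A @ A.T + p * np.eye(p)       # positive definite
a_star, b_star = 4.0, 2.5

def log_marginal(beta):
    """Log of the displayed marginal posterior density of beta."""
    quad = (beta - m_star) @ np.linalg.inv(V_star) @ (beta - m_star)
    return (gammaln((2 * a_star + p) / 2) - gammaln(a_star)
            - 0.5 * np.linalg.slogdet(V_star)[1]
            - (p / 2) * np.log(2 * b_star) - (p / 2) * np.log(np.pi)
            - ((2 * a_star + p) / 2) * np.log1p(quad / (2 * b_star)))

# The same density via scipy: t with df = 2a*, location m*, shape (b*/a*)V*
t_post = multivariate_t(loc=m_star, shape=(b_star / a_star) * V_star, df=2 * a_star)
beta = rng.normal(size=p)
assert np.allclose(log_marginal(beta), t_post.logpdf(beta))
```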
(ii) An often used method for regularization is ridge regression. That is, when there is no well-defined solution of the normal equations in linear regression, $(X'X)\beta = X'Y$. In this scenario, it is sought to minimize, with respect to (w.r.t.) $\beta$, the equation
$$\|Y - X\beta\|^2 + \lambda \|\beta\|^2.$$
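As an illustrative aside, this objective has the well-known closed-form minimizer $(X'X + \lambda I_p)^{-1}X'Y$. A minimal numpy sketch ($\lambda = 1$ and all data chosen arbitrarily), including a duplicated column so that the ordinary normal equations have no unique solution:

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, lam = 20, 5, 1.0
X = rng.normal(size=(n, p))
X[:, 4] = X[:, 3]                      # duplicated column: X'X is singular
Y = rng.normal(size=n)

# Ridge estimator: the unique minimizer of ||Y - X b||^2 + lam ||b||^2
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ Y)

# Sanity check: no small perturbation improves the (strictly convex) objective
obj = lambda b: np.sum((Y - X @ b) ** 2) + lam * np.sum(b ** 2)
for _ in range(100):
    assert obj(beta_ridge) <= obj(beta_ridge + 0.01 * rng.normal(size=p))
```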
4.
(i) Consider observations $(y_1, x_{1,0:p-1}), \dots, (y_n, x_{n,0:p-1})$. Describe the three elements of generalized linear models that have been studied in this course. [5 Marks].
(ii) The following question concerns a dynamic probit regression model. Here one observes $(y_1, x_1), \dots, (y_n, x_n), \dots$ sequentially in time with $(y_n, x_n) \in \{0, 1\} \times \mathbb{R}$. The proposed model is:
$$Y_n = I_{(0,\infty)}(Z_n)$$
$$Z_n = \beta_n x_n + \epsilon_n \qquad (1)$$
$$\beta_n = \beta_{n-1} + \nu_n \qquad (2)$$
where $\epsilon_n, \nu_n \overset{\text{i.i.d.}}{\sim} N(0, 1)$ and $\beta_0 = 0$.
(b) Considering only (1)-(2), show that:
$$p(\beta_n \mid z_{1:n-1}) = \int p(\beta_n \mid \beta_{n-1})\, p(\beta_{n-1} \mid z_{1:n-1})\, d\beta_{n-1}.$$
[4 Marks].
(c) Again, considering only (1)-(2), show that:
$$\beta_1 \mid z_1 \sim N\left( \frac{z_1 x_1}{1 + x_1^2}, \frac{1}{1 + x_1^2} \right).$$
In addition, given
$$\beta_{n-1} \mid z_{1:n-1} \sim N(\mu_{n-1}, \sigma_{n-1}^2)$$
show that:
$$\beta_n \mid z_{1:n-1} \sim N(\mu_{n|n-1}, \sigma_{n|n-1}^2)$$
$$\beta_n \mid z_{1:n} \sim N(\mu_n, \sigma_n^2)$$
for some $\mu_{n|n-1}, \sigma_{n|n-1}^2, \mu_n, \sigma_n^2$ to be determined. [7 Marks].
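As an illustrative aside, the first result in (c) can be verified numerically without the general recursion: with $\beta_1 \sim N(0, 1)$ a priori and $z_1 = \beta_1 x_1 + \epsilon_1$, the posterior computed on a fine grid should match the stated normal. A sketch (numpy; $x_1$ and $z_1$ arbitrary):

```python
import numpy as np

x1, z1 = 1.7, 0.8                      # arbitrary observed values
grid = np.linspace(-8, 8, 200001)      # fine grid over beta_1
d = grid[1] - grid[0]

# Unnormalised posterior: N(beta_1; 0, 1) prior times N(z1; beta_1 x1, 1) likelihood
post = np.exp(-0.5 * grid**2) * np.exp(-0.5 * (z1 - grid * x1) ** 2)
post /= post.sum() * d                 # normalise on the grid

mean = np.sum(grid * post) * d
var = np.sum((grid - mean) ** 2 * post) * d

# Compare with the stated N(z1 x1 / (1 + x1^2), 1 / (1 + x1^2))
assert np.isclose(mean, z1 * x1 / (1 + x1**2), atol=1e-6)
assert np.isclose(var, 1 / (1 + x1**2), atol=1e-6)
```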
(d) It is of interest to infer the regression coefficients as data arrives. Given the above results, and assuming one can perform expectations w.r.t. $p(z_{1:n} \mid y_{1:n})$, suggest how one may calculate the expectation:
$$E[\beta_n \mid y_{1:n}].$$
Note that as one cannot analytically perform expectations w.r.t. $p(z_{1:n} \mid y_{1:n})$, you may leave your answer in terms of an integral. [2 Marks].