EXAMINATIONS
1. This examination paper contains four (4) questions and comprises seven (7) printed pages.
2. Candidates must answer ALL questions on the paper.
3. Each question carries 20 marks. The total mark for the paper is 80.
4. Calculators may not be used.
5. Statistical tables will not be available.
6. This is a closed book exam.
ST5223
1. Consider the linear model:

$$Y_i = \beta_0 + \sum_{j=1}^{p-1} \beta_j x_{ij} + \epsilon_i, \qquad i \in \{1, \dots, n\}$$

where the $\epsilon_i$ are i.i.d. zero mean random variables, $p > 1$. For simplicity we will assume: $\epsilon_i \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2)$, with $N(0, \sigma^2)$ the univariate normal distribution with zero mean and variance $\sigma^2$. Assume $p < n$.
(i) Consider the estimation of $\beta = \beta_{0:p-1}$. Show that maximizing the log-likelihood of the normal linear model and minimizing the residual sum of squares lead to the same estimators. [3 Marks].
(ii) Using vector differentiation, derive the least squares estimator. Show that the estimator is unbiased. Note: you do not need to use any QR decompositions. [4 Marks].
(iii) State, but do not prove, the Gauss-Markov theorem. [3 Marks].
(iv) Let $\tilde{\beta} = BY$, where $B$ is a deterministic $p \times n$ matrix (which could depend upon $X$), be any unbiased estimator of $\beta$. Show that $BX = I_p$ and establish that
$$(B - S^{-1}X')(B - S^{-1}X')' + S^{-1} = BB'$$
where $S = X'X$. [5 Marks].
(v) Show that the variance of $\tilde{\beta}$ in (iv) is minimized (that is, the diagonal elements of the covariance matrix) when $\tilde{\beta} = (X'X)^{-1}X'Y$. [5 Marks].
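As an illustrative aside (not part of the examination), the estimator in (v) can be checked numerically. A minimal numpy sketch, with all values chosen arbitrarily and the first column of $X$ taken as an intercept, comparing the closed form $(X'X)^{-1}X'Y$ with a library solver and illustrating unbiasedness by Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, sigma = 200, 3, 0.5
beta = np.array([1.0, -2.0, 0.5])      # true coefficients beta_{0:p-1}
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])  # intercept column first

# One realisation: closed-form estimator (X'X)^{-1} X'Y vs numpy's solver
Y = X @ beta + sigma * rng.normal(size=n)
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
assert np.allclose(beta_hat, np.linalg.lstsq(X, Y, rcond=None)[0])

# Monte Carlo illustration of unbiasedness: the average of beta_hat over
# repeated data sets should be close to the true beta
reps = 2000
est = np.empty((reps, p))
for r in range(reps):
    Yr = X @ beta + sigma * rng.normal(size=n)
    est[r] = np.linalg.solve(X.T @ X, X.T @ Yr)
print(np.round(est.mean(axis=0), 2))   # close to [1.0, -2.0, 0.5]
```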
2.
This question again concerns the normal linear model, under study in question 1, which in matrix and vector notation is:
$$Y = X\beta + \epsilon$$
with $\epsilon \sim N_n(0, \sigma^2 I_n)$. Recall $N_n(0, \sigma^2 I_n)$ is the normal distribution with mean vector $0$ and covariance matrix $\sigma^2 I_n$, $I_n$ the $n \times n$ identity matrix.
(i) Suppose that there is an orthogonal matrix $Q$ such that the $n \times p$ matrix $R = QX$ is upper-triangular ($r_{ij} = 0$ for $i > j$). Show that
$$(Y - X\beta)'(Y - X\beta) = (QY - R\beta)'(QY - R\beta)$$
and hence that minimizing the residual sum of squares is the same as minimizing $\|QY - R\beta\|^2$. [4 Marks].
(ii) By partitioning $R$ into its first $p$ rows ($U$; you may assume that this matrix has rank $p$) and its zero $n - p$ rows, and partitioning $Q$ in a similar manner into $V$ and $W$, show that one must take $U\beta = VY$ to minimize the residual sum of squares. [4 Marks].
(iii) For the linear model under study, show that:
$$E[\hat{\beta}] = \beta, \qquad \text{Var}[\hat{\beta}] = \sigma^2 (X'X)^{-1}$$
where $U\hat{\beta} = VY$. [6 Marks].
(iv) For the linear model under study, show that:
$$E[\tilde{\sigma}^2] = \sigma^2$$
where $\tilde{\sigma}^2 = \text{RSS}(\hat{\beta})/(n - p)$ and $\text{RSS}(\hat{\beta}) = \|WY\|^2$. [6 Marks].
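As an illustrative aside, the QR construction in (i)-(ii) can be checked numerically. A sketch assuming numpy/scipy, taking the exam's $Q$ as the transpose of the orthogonal factor from a complete QR decomposition of $X$ (all data values arbitrary):

```python
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(1)
n, p = 50, 4
X = rng.normal(size=(n, p))
beta = rng.normal(size=p)
Y = X @ beta + 0.1 * rng.normal(size=n)

# Complete QR: X = Qf R with Qf an n x n orthogonal matrix; the exam's Q is Qf'
Qf, R = np.linalg.qr(X, mode="complete")
Q = Qf.T                       # orthogonal, with Q X = R upper-triangular (n x p)
U = R[:p, :]                   # first p rows of R (rank p)
V, W = Q[:p, :], Q[p:, :]      # matching partition of Q

# The least squares solution solves the triangular system U beta_hat = V Y
beta_hat = solve_triangular(U, V @ Y)
assert np.allclose(beta_hat, np.linalg.lstsq(X, Y, rcond=None)[0])

# RSS(beta_hat) = ||W Y||^2, as in part (iv)
rss = np.sum((Y - X @ beta_hat) ** 2)
assert np.allclose(rss, np.sum((W @ Y) ** 2))
```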
3.
This question again concerns the normal linear model, under study in question 1, which in matrix and vector notation is:
$$Y = X\beta + \epsilon.$$
Adopt the prior $\beta \mid \sigma^2 \sim N_p(m, \sigma^2 V)$, $\sigma^2 \sim IG(a, b)$, where $IG(a, b)$ is the inverse Gamma distribution with probability density function:
$$p(\sigma^2) = \frac{b^a}{\Gamma(a)} (\sigma^2)^{-(a+1)} e^{-b/\sigma^2}$$
with $a, b > 0$ and $\Gamma(\cdot)$ the gamma function. The joint distribution of $\beta, \sigma^2$ is called the normal-inverse gamma distribution, denoted $NIG(m, V, a, b)$.
(a) Show that the posterior density function of $\beta, \sigma^2$ is of the form:
$$\pi(\beta, \sigma^2 \mid Y, X) \propto (\sigma^2)^{-(a^* + p/2 + 1)} \exp\left\{ -\frac{1}{2\sigma^2} \left[ (\beta - m^*)'(V^*)^{-1}(\beta - m^*) + 2b^* \right] \right\}$$
where
$$\begin{aligned}
m^* &= V^*(V^{-1}m + X'Y) \\
V^* &= (V^{-1} + X'X)^{-1} \\
a^* &= a + n/2 \\
b^* &= b + [m'V^{-1}m + Y'Y - (m^*)'(V^*)^{-1}m^*]/2
\end{aligned}$$
where all inverses are assumed to exist. Hence identify the joint posterior distribution of $\beta, \sigma^2$. You may assume, without proof, that
$$(Y - X\beta)'(Y - X\beta) + (\beta - m)'V^{-1}(\beta - m) + 2b = (\beta - m^*)'(V^*)^{-1}(\beta - m^*) + 2b^*.$$
[5 Marks].
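As an illustrative aside, the update in (a) is straightforward to code. The sketch below (numpy, with arbitrary illustrative data and prior hyperparameters) computes $m^*, V^*, a^*, b^*$ and verifies the quoted completing-the-square identity at a randomly chosen $\beta$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 30, 3
X = rng.normal(size=(n, p))
Y = rng.normal(size=n)

# Illustrative NIG(m, V, a, b) prior hyperparameters
m = np.zeros(p)
V = np.eye(p)
a, b = 2.0, 1.0

# Posterior hyperparameters, as in part (a)
V_star = np.linalg.inv(np.linalg.inv(V) + X.T @ X)
m_star = V_star @ (np.linalg.inv(V) @ m + X.T @ Y)
a_star = a + n / 2
b_star = b + (m @ np.linalg.inv(V) @ m + Y @ Y
              - m_star @ np.linalg.inv(V_star) @ m_star) / 2

# Check the assumed identity at an arbitrary beta: both sides agree exactly
beta = rng.normal(size=p)
lhs = ((Y - X @ beta) @ (Y - X @ beta)
       + (beta - m) @ np.linalg.inv(V) @ (beta - m) + 2 * b)
rhs = (beta - m_star) @ np.linalg.inv(V_star) @ (beta - m_star) + 2 * b_star
assert np.allclose(lhs, rhs)
```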
(b) Show that the marginal posterior density of $\beta$ is:
$$\pi(\beta \mid Y, X) = \frac{\Gamma(\tfrac{1}{2}(2a^* + p))\, |V^*|^{-1/2}}{\Gamma(a^*)\, (2b^*)^{p/2}\, \pi^{p/2}} \left[ 1 + (\beta - m^*)'(V^*)^{-1}(\beta - m^*)/2b^* \right]^{-(2a^* + p)/2}.$$
Comment on the tails of the posterior distribution, relative to that of the prior. [5 Marks].
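As an illustrative aside, the density in (b) is that of a multivariate Student $t$ with $2a^*$ degrees of freedom, location $m^*$ and scale matrix $(b^*/a^*)V^*$. A numerical sketch (numpy/scipy, with arbitrary stand-in values for the posterior hyperparameters) comparing the displayed formula against scipy's multivariate $t$:

```python
import numpy as np
from scipy.stats import multivariate_t
from scipy.special import gammaln

rng = np.random.default_rng(3)
p = 3
# Illustrative stand-ins for the posterior hyperparameters m*, V*, a*, b*
m_star = rng.normal(size=p)
A = rng.normal(size=(p, p))
V_star = A @ A.T + p * np.eye(p)       # positive definite
a_star, b_star = 4.0, 2.5

def log_marginal(beta):
    """Log of the displayed marginal posterior density of beta."""
    quad = (beta - m_star) @ np.linalg.inv(V_star) @ (beta - m_star)
    return (gammaln((2 * a_star + p) / 2) - gammaln(a_star)
            - 0.5 * np.linalg.slogdet(V_star)[1]
            - (p / 2) * np.log(2 * b_star) - (p / 2) * np.log(np.pi)
            - ((2 * a_star + p) / 2) * np.log1p(quad / (2 * b_star)))

# The same density via scipy: t with df = 2a*, location m*, shape (b*/a*)V*
t_post = multivariate_t(loc=m_star, shape=(b_star / a_star) * V_star, df=2 * a_star)
beta = rng.normal(size=p)
assert np.allclose(log_marginal(beta), t_post.logpdf(beta))
```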
(ii) An often used method for regularization is ridge regression. That is, when there is no well-defined solution of the normal equations in linear regression, $(X'X)\beta = X'Y$. In this scenario, it is sought to minimize, with respect to (w.r.t.) $\beta$, the equation
$$\|Y - X\beta\|^2 + \lambda \|\beta\|^2.$$
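As an illustrative aside, this objective has the well-known closed-form minimizer $(X'X + \lambda I_p)^{-1}X'Y$. A minimal numpy sketch ($\lambda = 1$ and all data chosen arbitrarily), including a duplicated column so that the ordinary normal equations have no unique solution:

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, lam = 20, 5, 1.0
X = rng.normal(size=(n, p))
X[:, 4] = X[:, 3]                      # duplicated column: X'X is singular
Y = rng.normal(size=n)

# Ridge estimator: the unique minimizer of ||Y - X b||^2 + lam ||b||^2
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ Y)

# Sanity check: no small perturbation improves the (strictly convex) objective
obj = lambda b: np.sum((Y - X @ b) ** 2) + lam * np.sum(b ** 2)
for _ in range(100):
    assert obj(beta_ridge) <= obj(beta_ridge + 0.01 * rng.normal(size=p))
```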
4.
(i) Consider observations $(y_1, x_{1,0:p-1}), \dots, (y_n, x_{n,0:p-1})$. Describe the three elements of generalized linear models that have been studied in this course. [5 Marks].
(ii) The following question concerns a dynamic probit regression model. Here one observes $(y_1, x_1), \dots, (y_n, x_n), \dots$ sequentially in time with $(y_n, x_n) \in \{0, 1\} \times \mathbb{R}$. The proposed model is:
$$Y_n = I_{(0,\infty)}(Z_n)$$
$$Z_n = \beta_n x_n + \epsilon_n \qquad (1)$$
$$\beta_n = \beta_{n-1} + \nu_n \qquad (2)$$
where $\epsilon_n, \nu_n \overset{\text{i.i.d.}}{\sim} N(0, 1)$ and $\beta_0 = 0$.
(b) Considering only (1)-(2), show that:
$$p(\beta_n \mid z_{1:n-1}) = \int p(\beta_n \mid \beta_{n-1})\, p(\beta_{n-1} \mid z_{1:n-1})\, d\beta_{n-1}.$$
[4 Marks].
(c) Again, considering only (1)-(2), show that:
$$\beta_1 \mid z_1 \sim N\left( \frac{z_1 x_1}{1 + x_1^2}, \frac{1}{1 + x_1^2} \right).$$
In addition, given
$$\beta_{n-1} \mid z_{1:n-1} \sim N(\mu_{n-1}, \sigma_{n-1}^2)$$
show that:
$$\beta_n \mid z_{1:n-1} \sim N(\mu_{n|n-1}, \sigma_{n|n-1}^2)$$
$$\beta_n \mid z_{1:n} \sim N(\mu_n, \sigma_n^2)$$
for some $\mu_{n|n-1}, \sigma_{n|n-1}^2, \mu_n, \sigma_n^2$ to be determined. [7 Marks].
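As an illustrative aside, the first result in (c) can be verified numerically without the general recursion: with $\beta_1 \sim N(0, 1)$ a priori and $z_1 = \beta_1 x_1 + \epsilon_1$, the posterior computed on a fine grid should match the stated normal. A sketch (numpy; $x_1$ and $z_1$ arbitrary):

```python
import numpy as np

x1, z1 = 1.7, 0.8                      # arbitrary observed values
grid = np.linspace(-8, 8, 200001)      # fine grid over beta_1
d = grid[1] - grid[0]

# Unnormalised posterior: N(beta_1; 0, 1) prior times N(z1; beta_1 x1, 1) likelihood
post = np.exp(-0.5 * grid**2) * np.exp(-0.5 * (z1 - grid * x1) ** 2)
post /= post.sum() * d                 # normalise on the grid

mean = np.sum(grid * post) * d
var = np.sum((grid - mean) ** 2 * post) * d

# Compare with the stated N(z1 x1 / (1 + x1^2), 1 / (1 + x1^2))
assert np.isclose(mean, z1 * x1 / (1 + x1**2), atol=1e-6)
assert np.isclose(var, 1 / (1 + x1**2), atol=1e-6)
```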
(d) It is of interest to infer the regression coefficients as data arrives. Given the above results, and assuming one can perform expectations w.r.t. $p(z_{1:n} \mid y_{1:n})$, suggest how one may calculate the expectation:
$$E[\beta_n \mid y_{1:n}].$$
Note that as one cannot analytically perform expectations w.r.t. $p(z_{1:n} \mid y_{1:n})$, you may leave your answer in terms of an integral. [2 Marks].