
MLE AR(1)

For the Gaussian AR(1) process,

$$Y_t = c + \phi Y_{t-1} + \varepsilon_t, \qquad |\phi| < 1, \qquad \varepsilon_t \sim WN(0, \sigma^2).$$

The joint distribution of $Y_T = (Y_1, Y_2, \dots, Y_T)'$ is

$$Y_T \sim N(\mu, \Sigma),$$

and the observations $y = (y_1, y_2, \dots, y_T)'$ are a single realization of $Y_T$.
MLE AR(1)

$$\begin{pmatrix} Y_1 \\ \vdots \\ Y_T \end{pmatrix} \sim N(\mu, \Sigma), \qquad
\mu = \begin{pmatrix} \mu \\ \vdots \\ \mu \end{pmatrix}, \qquad
\Sigma = \begin{pmatrix} \gamma_0 & \cdots & \gamma_{T-1} \\ \vdots & \ddots & \vdots \\ \gamma_{T-1} & \cdots & \gamma_0 \end{pmatrix}$$
MLE AR(1)

The p.d.f. of the sample $y = (y_1, y_2, \dots, y_T)'$ is given by the multivariate normal density

$$f_Y(y; \mu, \Sigma) = (2\pi)^{-T/2}\, |\Sigma|^{-1/2} \exp\left\{-\tfrac{1}{2}(y-\mu)'\Sigma^{-1}(y-\mu)\right\}.$$

Denoting $\Sigma = \sigma_y^2\,\Omega$ (with $\sigma_y^2 = \gamma_0$) and $\Sigma_{ij} = \gamma_{|i-j|}$,

$$\Sigma = \begin{pmatrix} \gamma_0 & \cdots & \gamma_{T-1} \\ \vdots & \ddots & \vdots \\ \gamma_{T-1} & \cdots & \gamma_0 \end{pmatrix}
= \gamma_0 \begin{pmatrix} 1 & \cdots & \gamma_{T-1}/\gamma_0 \\ \vdots & \ddots & \vdots \\ \gamma_{T-1}/\gamma_0 & \cdots & 1 \end{pmatrix}$$

so that

$$\Sigma = \sigma_y^2\,\Omega = \sigma_y^2 \begin{pmatrix} 1 & \cdots & \rho(T-1) \\ \vdots & \ddots & \vdots \\ \rho(T-1) & \cdots & 1 \end{pmatrix}, \qquad \rho(j) = \phi^{j}.$$

Collecting the parameters of the model in $\theta = (c, \phi, \sigma^2)'$, the joint p.d.f. becomes

$$f_Y(y; \theta) = (2\pi\sigma_y^2)^{-T/2}\, |\Omega|^{-1/2} \exp\left\{-\frac{1}{2\sigma_y^2}(y-\mu)'\Omega^{-1}(y-\mu)\right\}$$

and the sample log-likelihood function is given by

$$\mathcal{L}(\theta) = -\frac{T}{2}\log(2\pi) - \frac{T}{2}\log(\sigma_y^2) - \frac{1}{2}\log|\Omega| - \frac{1}{2\sigma_y^2}(y-\mu)'\Omega^{-1}(y-\mu).$$
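To make the matrix form concrete, here is a minimal Python sketch (the function name and interface are my own, not from the notes) that builds $\Omega$ with entries $\phi^{|i-j|}$ and evaluates the exact log-likelihood:

```python
import numpy as np

def ar1_exact_loglik(theta, y):
    """Exact Gaussian AR(1) log-likelihood via the multivariate normal form (sketch)."""
    c, phi, sigma2 = theta
    y = np.asarray(y, dtype=float)
    T = len(y)
    mu = c / (1.0 - phi)                   # unconditional mean
    sigma2_y = sigma2 / (1.0 - phi**2)     # unconditional variance gamma_0
    idx = np.arange(T)
    Omega = phi ** np.abs(idx[:, None] - idx[None, :])  # correlation matrix, entries phi^|i-j|
    resid = y - mu
    _, logdet = np.linalg.slogdet(Omega)
    quad = resid @ np.linalg.solve(Omega, resid)
    return (-T / 2 * np.log(2 * np.pi)
            - T / 2 * np.log(sigma2_y)
            - 0.5 * logdet
            - quad / (2 * sigma2_y))
```

For large $T$ the factorization of the $T \times T$ matrix $\Omega$ is the costly step, which is one motivation for the conditional and prediction-error forms developed below.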
MLE

The exact log-likelihood function is a non-linear function of the parameters $\theta$. There is no closed-form solution for the exact MLEs.

The exact MLEs must be determined by numerically maximizing the exact log-likelihood function.

Usually, a Newton-Raphson type algorithm is used for the maximization, which leads to the iterative scheme

$$\hat\theta_{mle,n+1} = \hat\theta_{mle,n} - H(\hat\theta_{mle,n})^{-1}\, s(\hat\theta_{mle,n})$$
where $H(\hat\theta_{mle,n})$ is an estimate of the Hessian matrix (the second derivative of the log-likelihood function) and $s(\hat\theta_{mle,n})$ is an estimate of the score vector (the first derivative of the log-likelihood function).

The estimates of the Hessian and score may be computed numerically (using numerical derivative routines) or analytically (if analytic derivatives are known).

The resulting estimator may be biased in finite samples, but it is consistent: it converges in probability to the true parameter value.
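As a sketch of the numerical maximization step: the notes describe a Newton-Raphson update built from the score and Hessian; the snippet below substitutes SciPy's quasi-Newton L-BFGS-B routine, which approximates both from numerical derivatives. The helper name and the bounds are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def numerical_mle(loglik, y, start, bounds=None):
    """Numerically maximize a log-likelihood (quasi-Newton sketch).

    `loglik(theta, y)` can be any log-likelihood sketched in these notes,
    e.g. the illustrative ar1_exact_loglik above.
    """
    y = np.asarray(y, dtype=float)
    res = minimize(lambda th: -loglik(th, y), x0=np.asarray(start),
                   method="L-BFGS-B", bounds=bounds)
    return res.x, -res.fun  # parameter estimates and the maximized log-likelihood

# Example usage (assuming the ar1_exact_loglik sketch above):
# theta_hat, ll = numerical_mle(ar1_exact_loglik, y, start=(0.0, 0.1, 1.0),
#                               bounds=[(None, None), (-0.999, 0.999), (1e-8, None)])
```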
Maximum Likelihood Estimation of ARMA Models

For i.i.d. data with marginal pdf $f(y_t; \theta)$, the joint pdf for a sample $y = (y_1, \dots, y_T)$ is

$$f(y; \theta) = f(y_1, \dots, y_T; \theta) = \prod_{t=1}^{T} f(y_t; \theta).$$

The likelihood function is this joint density treated as a function of the parameters $\theta$ given the data $y$:

$$L(\theta \mid y) = L(\theta \mid y_1, \dots, y_T) = \prod_{t=1}^{T} f(y_t; \theta).$$

The log-likelihood is

$$\ln L(\theta \mid y) = \sum_{t=1}^{T} \ln f(y_t; \theta).$$
Maximum Likelihood Estimation of ARMA Models

Problem: For a sample from a covariance-stationary time series $\{y_t\}$, the construction of the log-likelihood given above does not work because the random variables in the sample $y = (y_1, \dots, y_T)$ are not i.i.d.

One solution: a conditional factorization of the log-likelihood.

Intuition: Consider the joint density of two adjacent observations, $f(y_2, y_1; \theta)$. The joint density can always be factored as the product of the conditional density of $y_2$ given $y_1$ and the marginal density of $y_1$:

$$f(y_2, y_1; \theta) = f(y_2 \mid y_1; \theta)\, f(y_1; \theta).$$

For three observations, the factorization becomes

$$f(y_3, y_2, y_1; \theta) = f(y_3 \mid y_2, y_1; \theta)\, f(y_2 \mid y_1; \theta)\, f(y_1; \theta).$$
In general, the conditional-marginal factorization has the form

$$f(y_T, \dots, y_1; \theta) = \left[\prod_{t=p+1}^{T} f(y_t \mid I_{t-1}; \theta)\right] \cdot f(y_p, \dots, y_1; \theta)$$

where

$I_t = \{y_t, \dots, y_1\}$ = information available at time $t$,
$y_p, \dots, y_1$ = initial values.

The exact log-likelihood follows from this factorization, and the exact MLE maximizes it:

$$\hat\theta_{mle} = \arg\max_{\theta}\; \sum_{t=p+1}^{T} \ln f(y_t \mid I_{t-1}; \theta) + \ln f(y_p, \dots, y_1; \theta).$$

The conditional MLE maximizes only the conditional log-likelihood:

$$\hat\theta_{cmle} = \arg\max_{\theta}\; \sum_{t=p+1}^{T} \ln f(y_t \mid I_{t-1}; \theta).$$
Two types of maximum likelihood estimates (MLEs) may be computed. The first type is based on maximizing the conditional log-likelihood function. These estimates are called conditional MLEs and are defined by

$$\hat\theta_{cmle} = \arg\max_{\theta}\; \sum_{t=p+1}^{T} \ln f(y_t \mid I_{t-1}; \theta).$$

The second type is based on maximizing the exact log-likelihood function. These estimates are called exact MLEs and are defined by

$$\hat\theta_{mle} = \arg\max_{\theta}\; \sum_{t=p+1}^{T} \ln f(y_t \mid I_{t-1}; \theta) + \ln f(y_p, \dots, y_1; \theta).$$
Result:

For stationary models, $\hat\theta_{cmle}$ and $\hat\theta_{mle}$ are consistent and have the same limiting normal distribution. In finite samples, however, $\hat\theta_{cmle}$ and $\hat\theta_{mle}$ are generally not equal and may differ by a substantial amount if the data are close to being non-stationary or non-invertible.
AR(p), OLS equivalent to Conditional MLE

Model:

$$y_t = c + \phi_1 y_{t-1} + \dots + \phi_p y_{t-p} + \varepsilon_t, \qquad \varepsilon_t \sim WN(0, \sigma^2),$$

which can be written as $y_t = x_t'\beta + \varepsilon_t$ with $\beta = (c, \phi_1, \phi_2, \dots, \phi_p)'$ and $x_t = (1, y_{t-1}, y_{t-2}, \dots, y_{t-p})'$.

OLS:

$$\hat\beta = \left(\sum_{t=1}^{T} x_t x_t'\right)^{-1} \sum_{t=1}^{T} x_t y_t, \qquad \hat\sigma^2 = \frac{1}{T-(p+1)}\sum_{t=1}^{T}\left(y_t - x_t'\hat\beta\right)^2.$$
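A minimal numpy sketch of this OLS fit (the function name and the exact degrees-of-freedom convention are illustrative assumptions); it conditions on the first $p$ observations, so the usable sums effectively run over $t = p+1, \dots, T$:

```python
import numpy as np

def ar_p_ols(y, p):
    """Fit an AR(p) with intercept by OLS, the conditional-MLE equivalent (sketch)."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    # Regressors x_t = (1, y_{t-1}, ..., y_{t-p}) for t = p+1, ..., T
    X = np.column_stack([np.ones(T - p)] + [y[p - j:T - j] for j in range(1, p + 1)])
    Y = y[p:]
    beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ beta_hat
    sigma2_hat = resid @ resid / (T - (p + 1))  # normalization as on the slide; conventions vary
    return beta_hat, sigma2_hat                 # beta_hat = (c, phi_1, ..., phi_p)
```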
Properties of the estimator

$\hat\phi$ is downward biased in a finite sample, i.e. $E[\hat\phi] < \phi$.

The estimator may be biased, but it is consistent: it converges in probability as $T \to \infty$.
Example: MLE for stationary AR(1)

$$Y_t = c + \phi Y_{t-1} + \varepsilon_t, \qquad t = 1, \dots, T,$$
$$\varepsilon_t \sim WN(0, \sigma^2), \qquad |\phi| < 1, \qquad \theta = (c, \phi, \sigma^2)'.$$

Conditional on $I_{t-1}$,

$$y_t \mid I_{t-1} \sim N(c + \phi y_{t-1}, \sigma^2), \qquad t = 2, \dots, T,$$

which only depends on $y_{t-1}$. The conditional density $f(y_t \mid I_{t-1}; \theta)$ is then

$$f(y_t \mid y_{t-1}; \theta) = (2\pi\sigma^2)^{-1/2}\exp\left\{-\frac{1}{2\sigma^2}(y_t - c - \phi y_{t-1})^2\right\}, \qquad t = 2, \dots, T.$$
To determine the marginal density for the initial value $y_1$, recall that for a stationary AR(1) process

$$E[y_1] = \mu = \frac{c}{1-\phi}, \qquad \mathrm{var}[y_1] = \frac{\sigma^2}{1-\phi^2}.$$

It follows that

$$y_1 \sim N\left(\frac{c}{1-\phi},\; \frac{\sigma^2}{1-\phi^2}\right)$$

$$f(y_1; \theta) = \left(\frac{2\pi\sigma^2}{1-\phi^2}\right)^{-1/2}\exp\left\{-\frac{1-\phi^2}{2\sigma^2}\left(y_1 - \frac{c}{1-\phi}\right)^2\right\}.$$
The conditional log-likelihood function is

$$\sum_{t=2}^{T} \ln f(y_t \mid y_{t-1}; \theta) = -\frac{T-1}{2}\ln(2\pi) - \frac{T-1}{2}\ln(\sigma^2) - \frac{1}{2\sigma^2}\sum_{t=2}^{T}(y_t - c - \phi y_{t-1})^2.$$

Notice that the conditional log-likelihood function has the form of the log-likelihood function for a linear regression model with normal errors,

$$y_t = c + \phi y_{t-1} + \varepsilon_t, \qquad \varepsilon_t \sim N(0, \sigma^2), \qquad t = 2, \dots, T.$$

It follows that

$$\hat c_{cmle} = \hat c_{ols}, \qquad \hat\phi_{cmle} = \hat\phi_{ols}, \qquad \hat\sigma^2_{cmle} = \frac{1}{T-1}\sum_{t=2}^{T}\left(y_t - \hat c_{cmle} - \hat\phi_{cmle}\, y_{t-1}\right)^2.$$
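For the AR(1) case this equivalence is easy to check in code; a small sketch (illustrative names) that computes the conditional MLE as the OLS regression of $y_t$ on $(1, y_{t-1})$:

```python
import numpy as np

def ar1_cmle(y):
    """Conditional MLE for the AR(1): OLS of y_t on (1, y_{t-1}) (illustrative sketch)."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y) - 1), y[:-1]])
    c_hat, phi_hat = np.linalg.lstsq(X, y[1:], rcond=None)[0]
    resid = y[1:] - c_hat - phi_hat * y[:-1]
    sigma2_hat = np.mean(resid**2)  # divides by T-1, matching the sigma^2_cmle formula above
    return c_hat, phi_hat, sigma2_hat
```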
The marginal log-likelihood for the initial value $y_1$ is

$$\ln f(y_1; \theta) = -\frac{1}{2}\ln(2\pi) - \frac{1}{2}\ln\left(\frac{\sigma^2}{1-\phi^2}\right) - \frac{1-\phi^2}{2\sigma^2}\left(y_1 - \frac{c}{1-\phi}\right)^2.$$

The exact log-likelihood function is then

$$\ln L(\theta; y) = -\frac{T}{2}\ln(2\pi) - \frac{1}{2}\ln\left(\frac{\sigma^2}{1-\phi^2}\right) - \frac{1-\phi^2}{2\sigma^2}\left(y_1 - \frac{c}{1-\phi}\right)^2 - \frac{T-1}{2}\ln(\sigma^2) - \frac{1}{2\sigma^2}\sum_{t=2}^{T}(y_t - c - \phi y_{t-1})^2.$$
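The exact log-likelihood above can be coded directly as the marginal term for $y_1$ plus the $T-1$ conditional terms. A hedged sketch (names are mine); it should agree numerically with the matrix-based version sketched earlier:

```python
import numpy as np

def ar1_exact_loglik_factored(theta, y):
    """Exact AR(1) log-likelihood: marginal of y_1 plus conditional terms (sketch)."""
    c, phi, sigma2 = theta
    y = np.asarray(y, dtype=float)
    T = len(y)
    mu, var1 = c / (1 - phi), sigma2 / (1 - phi**2)
    # marginal contribution of the initial value y_1
    ll = -0.5 * np.log(2 * np.pi * var1) - (y[0] - mu) ** 2 / (2 * var1)
    # conditional contributions for t = 2, ..., T
    resid = y[1:] - c - phi * y[:-1]
    ll += -0.5 * (T - 1) * np.log(2 * np.pi * sigma2) - np.sum(resid**2) / (2 * sigma2)
    return ll
```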
MLE Estimation of MA(1)

Recall

$$Y_t = \mu + \theta\varepsilon_{t-1} + \varepsilon_t, \qquad |\theta| < 1, \qquad \varepsilon_t \sim WN(0, \sigma^2).$$

The condition $|\theta| < 1$ is assumed only to obtain an invertible representation; it says nothing about stationarity (an MA(1) is covariance stationary for any $\theta$).
Estimation MA(1)

$$Y_t \mid \varepsilon_{t-1} \sim N(\mu + \theta\varepsilon_{t-1}, \sigma^2),$$

$$f(y_t \mid \varepsilon_{t-1}; \boldsymbol{\theta}) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2\sigma^2}(y_t - \mu - \theta\varepsilon_{t-1})^2}, \qquad \boldsymbol{\theta} = (\mu, \theta, \sigma^2)'.$$

Problem: without knowing $\varepsilon_{t-2}$ we do not observe $\varepsilon_{t-1}$; we need $\varepsilon_{t-2}$ to compute $\varepsilon_{t-1} = y_{t-1} - \mu - \theta\varepsilon_{t-2}$. But $\varepsilon_{t-1}$ is unobservable.

Assume $\varepsilon_0 = 0$: make it non-random, just fix it at the number zero. Any fixed number would work.
Estimation MA(1)

$$Y_1 \mid \varepsilon_0 = 0 \sim N(\mu, \sigma^2),$$
$$y_1 = \mu + \varepsilon_1 \;\Rightarrow\; \varepsilon_1 = y_1 - \mu,$$
$$y_2 = \mu + \varepsilon_2 + \theta\varepsilon_1 \;\Rightarrow\; \varepsilon_2 = y_2 - \mu - \theta(y_1 - \mu),$$
$$\varepsilon_t = y_t - \mu - \theta(y_{t-1} - \mu) + \dots + (-1)^{t-1}\theta^{t-1}(y_1 - \mu).$$

Conditional likelihood:

$$L(\boldsymbol{\theta} \mid y_1, \dots, y_T; \varepsilon_0 = 0) = \prod_{t=1}^{T}\frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2\sigma^2}\varepsilon_t^2}.$$

If $|\theta| < 1$ (strictly), the choice of $\varepsilon_0$ does not matter asymptotically, and the CMLE is consistent.

The exact MLE requires the Kalman filter.
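A short Python sketch of this conditional likelihood (in log form), computing the innovations recursively under the $\varepsilon_0 = 0$ assumption; the function name is illustrative:

```python
import numpy as np

def ma1_conditional_loglik(params, y):
    """Conditional MA(1) log-likelihood given eps_0 = 0 (illustrative sketch)."""
    mu, theta, sigma2 = params
    y = np.asarray(y, dtype=float)
    T = len(y)
    eps = np.zeros(T)
    eps_prev = 0.0                                # the eps_0 = 0 assumption
    for t in range(T):
        eps[t] = y[t] - mu - theta * eps_prev     # recursive innovation
        eps_prev = eps[t]
    return -0.5 * T * np.log(2 * np.pi * sigma2) - np.sum(eps**2) / (2 * sigma2)
```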
Prediction Error Decomposition

To illustrate this algorithm, consider the simple AR(1) model. Recall

$$y_t \mid I_{t-1} \sim N(c + \phi y_{t-1}, \sigma^2), \qquad t = 2, \dots, T,$$

from which it follows that

$$E[y_t \mid I_{t-1}] = c + \phi y_{t-1}, \qquad \mathrm{var}[y_t \mid I_{t-1}] = \sigma^2.$$

The 1-step ahead prediction errors may then be defined as

$$v_t = y_t - E[y_t \mid I_{t-1}] = y_t - c - \phi y_{t-1}, \qquad t = 2, \dots, T.$$
The variance of the prediction error at time $t$ is

$$f_t = \mathrm{var}(v_t) = \mathrm{var}(\varepsilon_t) = \sigma^2, \qquad t = 2, \dots, T.$$

For the initial value, the first prediction error and its variance are

$$v_1 = y_1 - E[y_1] = y_1 - \frac{c}{1-\phi}, \qquad f_1 = \mathrm{var}(v_1) = \frac{\sigma^2}{1-\phi^2}.$$

Using the prediction errors and the prediction error variances, the exact log-likelihood function may be re-expressed as
$$\ln L(\theta \mid y) = -\frac{T}{2}\ln(2\pi) - \frac{1}{2}\sum_{t=1}^{T}\ln f_t - \frac{1}{2}\sum_{t=1}^{T}\frac{v_t^2}{f_t},$$

which is the prediction error decomposition.

Remarks
1. A further simplification may be achieved by writing

$$\mathrm{var}(v_t) = \sigma^2 f_t^{*} = \begin{cases} \sigma^2 \cdot \dfrac{1}{1-\phi^2} & \text{for } t = 1, \\ \sigma^2 \cdot 1 & \text{for } t > 1, \end{cases}$$

that is, $f_1^{*} = \dfrac{1}{1-\phi^2}$ and $f_t^{*} = 1$ for $t > 1$. Then the log-likelihood becomes

$$\ln L(\theta \mid y) = -\frac{T}{2}\ln(2\pi) - \frac{T}{2}\ln\sigma^2 - \frac{1}{2}\sum_{t=1}^{T}\ln f_t^{*} - \frac{1}{2\sigma^2}\sum_{t=1}^{T}\frac{v_t^2}{f_t^{*}}.$$
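Finally, a sketch of this simplified prediction-error-decomposition log-likelihood for the AR(1) case (illustrative names); it should reproduce the exact log-likelihood obtained from the other forms sketched above:

```python
import numpy as np

def ar1_pred_error_loglik(theta, y):
    """AR(1) exact log-likelihood via the prediction error decomposition (sketch)."""
    c, phi, sigma2 = theta
    y = np.asarray(y, dtype=float)
    T = len(y)
    # prediction errors v_t and scaled variances f*_t
    v = np.empty(T)
    f_star = np.ones(T)
    v[0] = y[0] - c / (1 - phi)          # v_1 = y_1 - E[y_1]
    f_star[0] = 1.0 / (1 - phi**2)       # f*_1 = 1/(1 - phi^2)
    v[1:] = y[1:] - c - phi * y[:-1]     # v_t = y_t - c - phi*y_{t-1}, t >= 2
    return (-0.5 * T * np.log(2 * np.pi)
            - 0.5 * T * np.log(sigma2)
            - 0.5 * np.sum(np.log(f_star))
            - np.sum(v**2 / f_star) / (2 * sigma2))
```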