6.3
A linear function
$$ f^{(n)}(X) = \alpha_0 + \sum_{i=1}^{n} \alpha_i X_{n+1-i} $$
is called the best linear predictor (BLP) of $X_{n+m}$ if it minimizes the prediction error
$$ S(\alpha) = E\big[X_{n+m} - f^{(n)}(X)\big]^2, $$
where $\alpha$ is the vector of the coefficients $\alpha_i$ and $X$ is the vector of variables $X_{n+1-i}$.
Since $S(\alpha)$ is a quadratic function of $\alpha$ and is bounded below by zero, there is at least one value of $\alpha$ that minimizes $S(\alpha)$. It satisfies the equations
$$ \frac{\partial S(\alpha)}{\partial \alpha_i} = 0, \qquad i = 0, 1, \dots, n. $$
$$ \frac{\partial S(\alpha)}{\partial \alpha_0} = -2\,E\Big[X_{n+m} - \alpha_0 - \sum_{i=1}^{n} \alpha_i X_{n+1-i}\Big] = 0, $$
$$ \frac{\partial S(\alpha)}{\partial \alpha_j} = -2\,E\Big[\Big(X_{n+m} - \alpha_0 - \sum_{i=1}^{n} \alpha_i X_{n+1-i}\Big) X_{n+1-j}\Big] = 0, \qquad j = 1, \dots, n. \tag{6.19} $$
Assuming that $E(X_t) = \mu$, the first equation can be written as
$$ \mu - \alpha_0 - \mu \sum_{i=1}^{n} \alpha_i = 0, $$
which gives
$$ \alpha_0 = \mu\Big(1 - \sum_{i=1}^{n} \alpha_i\Big). \tag{6.20} $$
Substituting (6.20) into the second set of equations in (6.19) gives
$$ 0 = E(X_{n+m} X_{n+1-j}) - \alpha_0 \mu - \sum_{i=1}^{n} \alpha_i E(X_{n+1-i} X_{n+1-j}) $$
$$ \phantom{0} = E(X_{n+m} X_{n+1-j}) - \mu^2\Big(1 - \sum_{i=1}^{n} \alpha_i\Big) - \sum_{i=1}^{n} \alpha_i E(X_{n+1-i} X_{n+1-j}) $$
$$ \phantom{0} = \gamma\big(m - (1 - j)\big) - \sum_{i=1}^{n} \alpha_i \gamma(i - j), \qquad j = 1, \dots, n. \tag{6.21} $$
We obtain the same set of equations when $E(X_t) = 0$. Hence, we assume further that the TS is a zero-mean stationary process. Then $\alpha_0 = 0$ too, and the predictor takes the form
$$ f^{(n)}(X) = \sum_{i=1}^{n} \alpha_i X_{n+1-i}. $$
For one-step prediction ($m = 1$) the equations (6.21) become
$$ \sum_{i=1}^{n} \alpha_i \gamma(i - j) = \gamma(j), \qquad j = 1, 2, \dots, n. \tag{6.22} $$
In matrix notation we write
$$ \Gamma_n = \{\gamma(i-j)\}_{j,i=1,2,\dots,n}, \qquad \alpha_n = (\alpha_1, \dots, \alpha_n)^T, \qquad \gamma_n = (\gamma(1), \dots, \gamma(n))^T. $$
If $\Gamma_n$ is nonsingular then the unique solution to (6.22) exists and is equal to
$$ \alpha_n = \Gamma_n^{-1} \gamma_n. \tag{6.23} $$
The one-step predictor is then
$$ \widehat{X}^{(n)}_{n+1} = \alpha_n^T X. \tag{6.24} $$
Its mean square prediction error is
$$ P^{(n)}_{n+1} = E\big(X_{n+1} - \widehat{X}^{(n)}_{n+1}\big)^2 = E\big(X_{n+1} - \alpha_n^T X\big)^2 = E\big(X_{n+1} - \gamma_n^T \Gamma_n^{-1} X\big)^2 $$
$$ = E\big(X_{n+1}^2 - 2\gamma_n^T \Gamma_n^{-1} X X_{n+1} + \gamma_n^T \Gamma_n^{-1} X X^T \Gamma_n^{-1} \gamma_n\big) \tag{6.25} $$
$$ = \gamma(0) - 2\gamma_n^T \Gamma_n^{-1} \gamma_n + \gamma_n^T \Gamma_n^{-1} \Gamma_n \Gamma_n^{-1} \gamma_n $$
$$ = \gamma(0) - \gamma_n^T \Gamma_n^{-1} \gamma_n. $$
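The computation in (6.23) and (6.25) can be sketched numerically. The following minimal illustration (not from the text) solves $\Gamma_n \alpha_n = \gamma_n$ by Gaussian elimination and evaluates $\gamma(0) - \alpha_n^T \gamma_n$; the AR(1) autocovariance $\gamma(h) = \phi^h/(1-\phi^2)$ with $\phi = 0.5$, $\sigma^2 = 1$ is an assumed example.

```python
# Sketch: best linear one-step predictor coefficients (6.23) and MSE (6.25),
# for an assumed AR(1) autocovariance function.

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def blp(gamma, n):
    """Coefficients alpha_n and MSE of the one-step BLP based on n past values."""
    Gamma = [[gamma(abs(i - j)) for j in range(n)] for i in range(n)]  # Gamma_n
    g = [gamma(j + 1) for j in range(n)]                               # gamma_n
    alpha = solve(Gamma, g)                                            # (6.23)
    mse = gamma(0) - sum(a * gj for a, gj in zip(alpha, g))            # (6.25)
    return alpha, mse

# assumed AR(1) example: gamma(h) = phi**h / (1 - phi**2) with phi = 0.5
phi = 0.5
gamma = lambda h: phi ** h / (1 - phi ** 2)
alpha, mse = blp(gamma, 3)
# alpha is close to (phi, 0, 0) and mse close to sigma^2 = 1
```

For an AR(1) the predictor degenerates to $\phi X_n$, so all coefficients beyond the first vanish and the prediction error is the innovation variance.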
For $n = 1$ the prediction equation gives
$$ \alpha_1 = \frac{\gamma(1)}{\gamma(0)} = \rho(1) = \phi_{11} $$
and we obtain
$$ \widehat{X}^{(1)}_2 = \rho(1) X_1 = \phi_{11} X_1. $$
To predict $X_3$ based on $X_2$ and $X_1$ we need to calculate $\alpha_1$ and $\alpha_2$ in the prediction function
$$ f^{(2)}(X) = \alpha_1 X_2 + \alpha_2 X_1. $$
From (6.23) we get
$$ \binom{\alpha_1}{\alpha_2} = \Gamma_2^{-1} \gamma_2 = \frac{1}{\gamma^2(0) - \gamma^2(1)} \begin{pmatrix} \gamma(0) & -\gamma(1) \\ -\gamma(1) & \gamma(0) \end{pmatrix} \binom{\gamma(1)}{\gamma(2)} = \frac{1}{\gamma^2(0) - \gamma^2(1)} \binom{\gamma(0)\gamma(1) - \gamma(1)\gamma(2)}{\gamma(0)\gamma(2) - \gamma^2(1)}, $$
so that
$$ \alpha_1 = \frac{\gamma(1)\big(\gamma(0) - \gamma(2)\big)}{\gamma^2(0) - \gamma^2(1)} = \frac{\rho(1)\big(1 - \rho(2)\big)}{1 - \rho^2(1)}, \qquad \alpha_2 = \frac{\gamma(0)\gamma(2) - \gamma^2(1)}{\gamma^2(0) - \gamma^2(1)} = \frac{\rho(2) - \rho^2(1)}{1 - \rho^2(1)}. $$
From the difference equations (6.17) calculated in Example 6.4 we know that
$$ \rho(1) = \frac{\phi_1}{1 - \phi_2}, \qquad \gamma(2) - \phi_1 \gamma(1) - \phi_2 \gamma(0) = 0. $$
That is
$$ \rho(2) = \phi_1 \rho(1) + \phi_2. $$
It finally gives
$$ \alpha_1 = \phi_1, \qquad \alpha_2 = \phi_2. $$
In fact, we can obtain this result directly from the model taking
$$ \widehat{X}^{(2)}_3 = \phi_1 X_2 + \phi_2 X_1, $$
which satisfies the prediction equations, namely
$$ E[(X_3 - \phi_1 X_2 - \phi_2 X_1) X_1] = E[Z_3 X_1] = 0, $$
$$ E[(X_3 - \phi_1 X_2 - \phi_2 X_1) X_2] = E[Z_3 X_2] = 0. $$
In general, for $n \ge 2$, we have
$$ \widehat{X}^{(n)}_{n+1} = \phi_1 X_n + \phi_2 X_{n-1}, \tag{6.26} $$
i.e., $\alpha_j = 0$ for $j = 3, \dots, n$.
The prediction error is then $X_{n+1} - \widehat{X}^{(n)}_{n+1} = Z_{n+1}$, so the mean square prediction error is
$$ P^{(n)}_{n+1} = E Z_{n+1}^2 = \sigma^2. \tag{6.27} $$
Remark 6.8. An interesting connection between the PACF and the vector $\alpha_n$ is that in fact $\phi_{nn} = \alpha_n$, the last element of the vector. For this reason, the vector $\alpha_n$ is usually denoted by $\phi_n$ in the following way
$$ \alpha_n = \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_n \end{pmatrix} = \begin{pmatrix} \phi_{n1} \\ \phi_{n2} \\ \vdots \\ \phi_{nn} \end{pmatrix} = \phi_n. $$
The prediction equations (6.22) for a general ARMA($p,q$) model are more difficult to solve, particularly for large values of $n$, when we would have to calculate the inverse of a matrix $\Gamma_n$ of large dimension. Hence some recursive solutions for calculating the predictor (6.24) and the mean square error (6.25) were proposed, one of them by Levinson in 1947 and by Durbin in 1960. The method is known as the Durbin-Levinson algorithm. Its steps are the following:
Step 1. Put $\phi_{00} = 0$, $P^{(0)}_1 = \gamma(0)$.

Step 2. For $n \ge 1$ calculate
$$ \phi_{nn} = \frac{\rho(n) - \sum_{k=1}^{n-1} \phi_{n-1,k}\, \rho(n-k)}{1 - \sum_{k=1}^{n-1} \phi_{n-1,k}\, \rho(k)}, \tag{6.28} $$
$$ \phi_{nk} = \phi_{n-1,k} - \phi_{nn} \phi_{n-1,n-k}, \qquad k = 1, 2, \dots, n-1. $$

Step 3. For $n \ge 1$ calculate
$$ P^{(n)}_{n+1} = P^{(n-1)}_n \big(1 - \phi_{nn}^2\big). \tag{6.29} $$
Remark 6.9. Note that the Durbin-Levinson algorithm gives an iterative method for calculating the PACF of a stationary process.
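The recursion (6.28)-(6.29) can be sketched in a few lines of code. In the illustration below (not from the text), the autocorrelation function of an AR(2) with assumed coefficients $\phi_1 = 0.7$, $\phi_2 = -0.1$ is fed to the recursion, which recovers $\phi_{22} = \phi_2$ and a vanishing $\phi_{33}$, in line with Remark 6.11.

```python
# A sketch of the Durbin-Levinson recursion (6.28)-(6.29), written in terms of
# the autocorrelations rho(1), rho(2), ...; the AR(2) example below is assumed.

def durbin_levinson(rho, N):
    """Return phi[n] = [phi_{n1}, ..., phi_{nn}] and P[n] = P_{n+1}^{(n)} / gamma(0)."""
    phi = {0: []}
    P = {0: 1.0}  # Step 1: P_1^{(0)} = gamma(0); we work relative to gamma(0)
    for n in range(1, N + 1):
        prev = phi[n - 1]
        num = rho(n) - sum(prev[k - 1] * rho(n - k) for k in range(1, n))
        den = 1.0 - sum(prev[k - 1] * rho(k) for k in range(1, n))
        pnn = num / den                                      # (6.28): phi_nn, the PACF at lag n
        phi[n] = [prev[k - 1] - pnn * prev[n - k - 1] for k in range(1, n)] + [pnn]
        P[n] = P[n - 1] * (1.0 - pnn ** 2)                   # (6.29)
    return phi, P

def rho_ar2(h, phi1=0.7, phi2=-0.1):
    """Autocorrelations of an AR(2) built from its difference equations (assumed example)."""
    vals = [1.0, phi1 / (1.0 - phi2)]
    while len(vals) <= h:
        vals.append(phi1 * vals[-1] + phi2 * vals[-2])
    return vals[h]

phi, P = durbin_levinson(rho_ar2, 3)
# phi[2] recovers (phi_1, phi_2), and phi[3][-1] = phi_33 vanishes
```

The recursion never inverts a matrix, which is exactly the point of the algorithm for large $n$.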
Remark 6.10. When we predict a value of the TS based only on one preceding datum, that is $n = 1$, we obtain
$$ \phi_{11} = \rho(1), \qquad \widehat{X}^{(1)}_{n+1} = \rho(1) X_n $$
and its mean square error
$$ P^{(1)}_2 = \gamma(0)\big(1 - \rho^2(1)\big). $$
For $n = 2$ the algorithm gives
$$ \phi_{22} = \frac{\rho(2) - \phi_{11}\rho(1)}{1 - \phi_{11}\rho(1)} = \frac{\rho(2) - \rho^2(1)}{1 - \rho^2(1)}, $$
which we have also obtained solving the matrix equation (6.22) for $n = 2$, and
$$ \phi_{21} = \phi_{11} - \phi_{22}\phi_{11} = \rho(1)(1 - \phi_{22}). $$
Then the predictor is
$$ \widehat{X}^{(2)}_{n+1} = \phi_{21} X_n + \phi_{22} X_{n-1} $$
and its mean square error
$$ P^{(2)}_3 = P^{(1)}_2 \big(1 - \phi_{22}^2\big). $$
For the AR(2) process of Example 6.4 this gives
$$ \phi_{11} = \rho(1) = \frac{\phi_1}{1 - \phi_2}, $$
$$ \phi_{22} = \frac{\rho(2) - \rho^2(1)}{1 - \rho^2(1)} = \phi_2, \qquad \phi_{21} = \rho(1)(1 - \phi_{22}) = \phi_1, $$
$$ \phi_{33} = 0, \qquad \phi_{31} = \phi_{21} - \phi_{33}\phi_{22} = \phi_1, \qquad \phi_{32} = \phi_{22} - \phi_{33}\phi_{21} = \phi_2, $$
$$ \phi_{44} = 0. $$
The results for $\phi_{33}$ and $\phi_{44}$ come from the fact that in the numerator we have a difference which is zero (difference equation). Hence, the one-step-ahead predictor for AR(2) is based only on the two preceding values, as there are only two nonzero coefficients in the prediction function. As before, we obtain the result
$$ \widehat{X}^{(2)}_{n+1} = \phi_1 X_n + \phi_2 X_{n-1}. $$
Remark 6.11. The PACF for AR(2) is
$$ \phi_{11} = \frac{\phi_1}{1 - \phi_2}, \qquad \phi_{22} = \phi_2, \qquad \phi_{\tau\tau} = 0 \ \text{ for } \ \tau \ge 3. \tag{6.30} $$
The $m$-step prediction follows the same lines. By (6.21) the prediction equations are
$$ \sum_{i=1}^{n} \alpha_i \gamma(i - j) = \gamma(m + j - 1), \qquad j = 1, \dots, n, \tag{6.31} $$
so, denoting $\gamma^{(m)}_n = (\gamma(m), \gamma(m+1), \dots, \gamma(m+n-1))^T$, the vector of coefficients is
$$ \alpha^{(m)}_n = \Gamma_n^{-1} \gamma^{(m)}_n, \tag{6.32} $$
where $\alpha^{(m)}_n = (\alpha^{(m)}_{n1}, \alpha^{(m)}_{n2}, \dots, \alpha^{(m)}_{nn})^T$. The $m$-step predictor is $\widehat{X}^{(n)}_{n+m} = (\alpha^{(m)}_n)^T X$ and its mean square prediction error is
$$ P^{(n)}_{n+m} = E\big[X_{n+m} - \widehat{X}^{(n)}_{n+m}\big]^2 = \gamma(0) - (\gamma^{(m)}_n)^T \Gamma_n^{-1} \gamma^{(m)}_n. \tag{6.33} $$
The mean square prediction error assesses the precision of the forecast and it is used to calculate the so-called prediction interval (PI). When the process is Gaussian the PI is
$$ \widehat{X}^{(n)}_{n+m} \pm u_\alpha \sqrt{\widehat{P}^{(n)}_{n+m}}, \tag{6.34} $$
where $u_\alpha$ is such that $P(|U| < u_\alpha) = 1 - \alpha$, where $U$ is a standard normal r.v. For $\alpha = 0.05$ we have $u_\alpha \approx 1.96$ and the 95% prediction interval boundaries are
$$ \widehat{X}^{(n)}_{n+m} - 1.96\sqrt{\widehat{P}^{(n)}_{n+m}}, \qquad \widehat{X}^{(n)}_{n+m} + 1.96\sqrt{\widehat{P}^{(n)}_{n+m}}. $$
Here we have used the hat notation as usually we do not know the values of the
model parameters and we have to use their estimators. We will discuss the model
parameter estimation in the next section.
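As a small numerical illustration of (6.34), the snippet below computes the 95% PI; the predictor value and its estimated mean square error are hypothetical numbers, not taken from the text.

```python
# Hypothetical m-step prediction and estimated MSE; compute the 95% PI (6.34).
import math

x_hat = 2.1   # assumed value of the predictor X_hat
p_hat = 1.3   # assumed value of the estimated mean square prediction error P_hat
u = 1.96      # u_alpha for alpha = 0.05 (standard normal quantile)

half_width = u * math.sqrt(p_hat)
lo, hi = x_hat - half_width, x_hat + half_width
```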
The moment estimator of the mean $\mu = E(X_t)$ is the sample mean,
$$ \widehat{\mu} = \bar{X}. $$
The method of moments gives good estimators for AR models but less efficient
estimators for MA or ARMA processes. Hence we will present the method for
AR time series. As usual we denote an AR(p) model by
$$ X_t = \phi_1 X_{t-1} + \ldots + \phi_p X_{t-p} + Z_t. $$
This is a zero-mean model, but the estimation of the mean is straightforward and
we will not discuss it further. Here we use the difference equations, where we
replace the population autocovariance (central moment of order two) with the
sample autocovariance. The first p + 1 difference equations are
$$ \gamma(0) = \phi_1 \gamma(1) + \ldots + \phi_p \gamma(p) + \sigma^2, $$
$$ \gamma(\tau) = \phi_1 \gamma(\tau - 1) + \ldots + \phi_p \gamma(\tau - p), \qquad \tau = 1, 2, \dots, p. $$
Note that $q = 0$, so the sum on the right-hand side of (6.16) is zero. In matrix notation we can write
$$ \sigma^2 = \gamma(0) - \phi^T \gamma_p, \qquad \Gamma_p \phi = \gamma_p, $$
where
$$ \Gamma_p = \{\gamma(i-j)\}_{i,j=1,\dots,p}, \qquad \phi = (\phi_1, \dots, \phi_p)^T, \qquad \gamma_p = (\gamma(1), \dots, \gamma(p))^T. $$
Replacing $\gamma(\tau)$ by the sample ACVF
$$ \widehat{\gamma}(\tau) = \frac{1}{n} \sum_{t=1}^{n-\tau} (X_{t+\tau} - \bar{X})(X_t - \bar{X}), $$
we obtain the solution
$$ \widehat{\sigma}^2 = \widehat{\gamma}(0) - \widehat{\gamma}_p^T \widehat{\Gamma}_p^{-1} \widehat{\gamma}_p, \qquad \widehat{\phi} = \widehat{\Gamma}_p^{-1} \widehat{\gamma}_p. \tag{6.35} $$
These equations are called the Yule-Walker estimators. They are often expressed in terms of the autocorrelation function rather than the autocovariance function. Then we have
$$ \widehat{\sigma}^2 = \widehat{\gamma}(0)\big[1 - \widehat{\rho}_p^T \widehat{R}_p^{-1} \widehat{\rho}_p\big], \qquad \widehat{\phi} = \widehat{R}_p^{-1} \widehat{\rho}_p, \tag{6.36} $$
where
$$ \widehat{R}_p = \{\widehat{\rho}(i-j)\}_{i,j=1,2,\dots,p}, \qquad \widehat{\rho}_p = (\widehat{\rho}(1), \dots, \widehat{\rho}(p))^T. $$
Proposition 6.3. The distribution of the Yule-Walker estimators $\widehat{\phi}$ of the model parameters of a causal AR($p$) process
$$ X_t = \phi_1 X_{t-1} + \ldots + \phi_p X_{t-p} + Z_t $$
is asymptotically (as $n \to \infty$) normal, in the sense that
$$ \sqrt{n}\big(\widehat{\phi} - \phi\big) \xrightarrow{d} N\big(0, \sigma^2 \Gamma_p^{-1}\big) $$
and
$$ \widehat{\sigma}^2 \xrightarrow{P} \sigma^2. $$
Remark 6.12. Note that the matrix equation (6.23) is of the same form as (6.36). Hence, we can use the Durbin-Levinson algorithm to calculate the estimates. This will give us the values of the sample PACF as well as the estimates of $\phi$.
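Remark 6.12 can be sketched in code: applying the Durbin-Levinson recursion to the sample autocorrelations yields both the sample PACF and the Yule-Walker estimates (6.36) without any matrix inversion. The short data series below is made up purely for illustration.

```python
# A sketch of Yule-Walker fitting of an AR(p) via the Durbin-Levinson recursion
# applied to the sample autocorrelations; the data series is a made-up example.

def sample_acf(x, max_lag):
    """Sample variance gamma_hat(0) and sample autocorrelations rho_hat(0..max_lag)."""
    n = len(x)
    m = sum(x) / n
    g0 = sum((v - m) ** 2 for v in x) / n
    acf = [1.0]
    for h in range(1, max_lag + 1):
        gh = sum((x[t + h] - m) * (x[t] - m) for t in range(n - h)) / n
        acf.append(gh / g0)
    return g0, acf

def yule_walker(x, p):
    """Return (phi_hat, sigma2_hat, pacf_hat) for an AR(p) fitted as in (6.36)."""
    g0, acf = sample_acf(x, p)
    phi, pacf, P = [], [], 1.0
    for n in range(1, p + 1):
        num = acf[n] - sum(phi[k - 1] * acf[n - k] for k in range(1, n))
        den = 1.0 - sum(phi[k - 1] * acf[k] for k in range(1, n))
        pnn = num / den
        phi = [phi[k - 1] - pnn * phi[n - k - 1] for k in range(1, n)] + [pnn]
        pacf.append(pnn)
        P *= 1.0 - pnn ** 2
    return phi, g0 * P, pacf   # sigma2_hat = gamma_hat(0) * prod(1 - phi_nn^2)

x = [1.4, 0.8, -0.3, -0.5, 0.1, 0.6, 0.2, -0.4, -0.9, -0.2, 0.7, 1.1]
phi_hat, sigma2_hat, pacf_hat = yule_walker(x, 2)
```

The returned `pacf_hat` is exactly the sample PACF up to lag $p$, and `phi_hat` solves the sample Yule-Walker equations.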
Proposition 6.4. The distribution of the sample PACF of a causal AR($p$) process is asymptotically normal, that is,
$$ \sqrt{n}\, \widehat{\phi}_{\tau\tau} \xrightarrow{d} N(0, 1) \quad \text{for } \tau > p. $$
For an AR(2) process equations (6.36) take the form
$$ \widehat{\sigma}^2 = \widehat{\gamma}(0)\big[1 - \widehat{\rho}_2^T \widehat{R}_2^{-1} \widehat{\rho}_2\big], \qquad \widehat{\phi} = \widehat{R}_2^{-1} \widehat{\rho}_2, $$
where
$$ \widehat{R}_2 = \begin{pmatrix} \widehat{\rho}(0) & \widehat{\rho}(1) \\ \widehat{\rho}(1) & \widehat{\rho}(0) \end{pmatrix}, \qquad \widehat{\rho}_2 = (\widehat{\rho}(1), \widehat{\rho}(2))^T, \qquad \widehat{\phi} = (\widehat{\phi}_1, \widehat{\phi}_2)^T. $$
We can easily invert a $2 \times 2$ matrix and calculate the estimators, or we can use the Durbin-Levinson algorithm directly to obtain
$$ \widehat{\phi}_{11} = \widehat{\rho}(1) = \frac{\widehat{\phi}_1}{1 - \widehat{\phi}_2}. $$
Also, we get
$$ \widehat{\phi}_{22} = \frac{\widehat{\rho}(2) - \widehat{\rho}^2(1)}{1 - \widehat{\rho}^2(1)} = \widehat{\phi}_2, \qquad \widehat{\phi}_{21} = \widehat{\rho}(1)\big[1 - \widehat{\phi}_{22}\big] = \widehat{\phi}_1 $$
and
$$ \widehat{\sigma}^2 = \widehat{\gamma}(0)\bigg[1 - \big(\widehat{\rho}(1), \widehat{\rho}(2)\big) \binom{\widehat{\phi}_1}{\widehat{\phi}_2}\bigg] = \widehat{\gamma}(0)\big[1 - \big(\widehat{\rho}(1)\widehat{\phi}_1 + \widehat{\rho}(2)\widehat{\phi}_2\big)\big]. $$
Furthermore, from Proposition 6.3 we can derive the confidence interval for $\phi_i$. The proposition says that
$$ \sqrt{n}\big(\widehat{\phi} - \phi\big) \xrightarrow{d} N\big(0, \sigma^2 \Gamma_p^{-1}\big), $$
that is, the variance of $\sqrt{n}(\widehat{\phi}_i - \phi_i)$ is the $i$-th diagonal element of the matrix $\sigma^2 \Gamma_p^{-1}$, say $v_{ii}$. But
$$ \operatorname{var}\big(\widehat{\phi}_i\big) \approx \frac{1}{n} v_{ii}. $$
Also, from Proposition 6.4 we have
$$ \sqrt{n}\, \widehat{\phi}_{\tau\tau} \xrightarrow{d} N(0, 1), \qquad \tau > p, $$
that is,
$$ \operatorname{var}\big(\widehat{\phi}_{\tau\tau}\big) \approx \frac{1}{n}. $$
However, we know that the PACF for $\tau > p$ is zero. It means that with probability $1 - \alpha$ we have
$$ -u_\alpha < \frac{\widehat{\phi}_{\tau\tau} - 0}{\sqrt{1/n}} < u_\alpha. $$
It can be interpreted that the estimate of the PACF indicates a non-significant value of $\phi_{\tau\tau}$ if it is in the interval
$$ \big[-u_\alpha/\sqrt{n},\; u_\alpha/\sqrt{n}\big]. $$
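As a small sketch of this significance check (the sample PACF values below are hypothetical, chosen only to illustrate the band):

```python
# Hypothetical sample PACF values checked against the band [-1.96/sqrt(n), 1.96/sqrt(n)].
import math

n = 200
band = 1.96 / math.sqrt(n)
sample_pacf = {1: 0.66, 2: -0.17, 3: 0.05, 4: -0.03}   # assumed values
significant = [tau for tau, v in sample_pacf.items() if abs(v) > band]
# lags whose PACF estimate falls outside the band are flagged as significant
```

Here only lags 1 and 2 exceed the band, which is the pattern expected for an AR(2).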
We will do the calculations for the simulated AR(2) process given in Figure 6.3. For these data we have the following values of the sample variance $\widehat{\gamma}(0)$ and the sample autocorrelations $\widehat{\rho}(1)$ and $\widehat{\rho}(2)$:
$$ \widehat{\gamma}(0) = 1.947669, \qquad \widehat{\rho}(1) = 0.66018, \qquad \widehat{\rho}(2) = 0.33751. $$
Then the matrix $\widehat{R}_2$ is equal to
$$ \widehat{R}_2 = \begin{pmatrix} 1 & 0.66018 \\ 0.66018 & 1 \end{pmatrix} $$
and its inverse is
$$ \widehat{R}_2^{-1} = \begin{pmatrix} 1.77254 & -1.17020 \\ -1.17020 & 1.77254 \end{pmatrix}, $$
which gives the estimates
$$ \widehat{\phi}_1 = 0.7752, \qquad \widehat{\phi}_2 = -0.1743. $$
The series was simulated for $\phi_1 = 0.7$ and $\phi_2 = -0.1$ and a Gaussian white noise with zero mean and variance equal to 1. These estimates are not far from the true
values. Had we not known the true values we would have liked to calculate the
confidence intervals for them. There are 200 observations, i.e. n = 200 which is
big enough to use the asymptotic result given in Proposition 6.3. To calculate $v_{ii}$ note that
$$ \Gamma = \gamma(0) R, $$
which gives
$$ \Gamma^{-1} = \frac{1}{\gamma(0)} R^{-1}. $$
Hence
$$ \widehat{\sigma}^2 \widehat{\Gamma}_2^{-1} = \frac{\widehat{\sigma}^2}{\widehat{\gamma}(0)} \widehat{R}_2^{-1} = \frac{1.06542}{1.947669} \begin{pmatrix} 1.77254 & -1.17020 \\ -1.17020 & 1.77254 \end{pmatrix} = \begin{pmatrix} 0.969623 & -0.640129 \\ -0.640129 & 0.969623 \end{pmatrix} $$
and
$$ \frac{1}{n} v_{ii} = \frac{1}{200} \cdot 0.969623 = 0.0048481. $$
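This chain of computations can be reproduced directly from the three sample values; the sketch below takes only $\widehat{\gamma}(0)$, $\widehat{\rho}(1)$, $\widehat{\rho}(2)$ as input.

```python
# Reproduce the example: invert R_2, get the Yule-Walker estimates and v_11 / n.

g0, r1, r2 = 1.947669, 0.66018, 0.33751   # sample values from the text
n = 200

det = 1.0 - r1 ** 2                        # det(R_2) for a 2x2 correlation matrix
Rinv = [[1.0 / det, -r1 / det], [-r1 / det, 1.0 / det]]

phi1 = Rinv[0][0] * r1 + Rinv[0][1] * r2   # phi_hat = R_2^{-1} rho_2
phi2 = Rinv[1][0] * r1 + Rinv[1][1] * r2
sigma2 = g0 * (1.0 - (r1 * phi1 + r2 * phi2))

v11_over_n = (sigma2 / g0) * Rinv[0][0] / n
```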
The 95% approximate confidence intervals for the model parameters $\phi_1$ and $\phi_2$ are, respectively,
$$ 0.7752 \pm 1.96\sqrt{0.0048481} = (0.639,\; 0.912) \qquad \text{and} \qquad -0.1743 \pm 1.96\sqrt{0.0048481} = (-0.311,\; -0.038). $$
Maximum Likelihood Estimation
Assume that the observations $x_1, \dots, x_n$ come from a zero-mean Gaussian stationary process. Then the likelihood is
$$ L(\beta, \sigma^2 \mid x_1, \dots, x_n) = \frac{1}{\sqrt{(2\pi)^n \det(\Gamma_n)}} \exp\Big(-\frac{1}{2} X^T \Gamma_n^{-1} X\Big). $$
A more convenient form can be obtained after taking the natural logarithm. Then
$$ l(\beta, \sigma^2 \mid x_1, \dots, x_n) = \ln L(\beta, \sigma^2 \mid x_1, \dots, x_n) = -\frac{n}{2} \ln(2\pi) - \frac{1}{2} \ln \det(\Gamma_n) - \frac{1}{2} X^T \Gamma_n^{-1} X. $$
The maximum likelihood estimates (MLE) are the values of $\beta$ and $\sigma^2$ which maximize the function $l(\beta, \sigma^2 \mid x_1, \dots, x_n)$. Intuitively, the MLE is the parameter value for which the observed sample is most likely.
The estimates are usually found numerically using iterative optimization routines. We will not discuss them here.
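Whatever the routine, its inner loop evaluates $l(\beta, \sigma^2 \mid x_1, \dots, x_n)$. A minimal sketch (not from the text) uses a Cholesky factorization $\Gamma_n = L L^T$, so that $\ln \det \Gamma_n = 2 \sum_i \ln L_{ii}$ and $X^T \Gamma_n^{-1} X = \lVert L^{-1} X \rVert^2$; the AR(1) autocovariance and the data values below are assumed for illustration.

```python
# Sketch: evaluate the Gaussian log-likelihood via a Cholesky factorization
# Gamma_n = L L^T: ln det(Gamma_n) = 2 sum ln L_ii and x^T Gamma_n^{-1} x = ||L^{-1} x||^2.
import math

def cholesky(G):
    """Lower-triangular L with L L^T = G (G symmetric positive definite)."""
    n = len(G)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = G[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def forward_solve(L, b):
    """Solve L y = b for lower-triangular L."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def gaussian_loglik(gamma, x):
    """Log-likelihood of zero-mean Gaussian data x with autocovariance function gamma."""
    n = len(x)
    G = [[gamma(abs(i - j)) for j in range(n)] for i in range(n)]
    L = cholesky(G)
    y = forward_solve(L, x)
    logdet = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * n * math.log(2 * math.pi) - 0.5 * logdet - 0.5 * sum(v * v for v in y)

# assumed AR(1) autocovariance (phi = 0.5, sigma^2 = 1) and made-up observations
ar1_gamma = lambda h: 0.5 ** h / (1 - 0.5 ** 2)
ll = gaussian_loglik(ar1_gamma, [0.2, -0.1, 0.4, 0.0, -0.3])
```

An optimizer would maximize `gaussian_loglik` over the model parameters that enter through `gamma`.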
The MLE have the property of being asymptotically normally distributed. It is
stated in the following proposition.
Proposition 6.5. The distribution of the MLE $\widehat{\beta}$ of a causal and invertible ARMA($p,q$) process is asymptotically normal in the sense that
$$ \sqrt{n}\big(\widehat{\beta} - \beta\big) \xrightarrow{d} N\big(0, \sigma^2 V_{p+q}^{-1}\big), \tag{6.37} $$
where the $(p+q) \times (p+q)$-dimensional matrix $V_{p+q}$ depends on the model parameters.
AR(1): $X_t = \phi X_{t-1} + Z_t$,
$$ \widehat{\phi} \ \text{ is } \ AN\Big(\phi,\; \frac{1}{n}\big(1 - \phi^2\big)\Big). $$

MA(1): $X_t = Z_t + \theta Z_{t-1}$,
$$ \widehat{\theta} \ \text{ is } \ AN\Big(\theta,\; \frac{1}{n}\big(1 - \theta^2\big)\Big). $$

ARMA(1,1): $X_t = \phi X_{t-1} + Z_t + \theta Z_{t-1}$,
$$ \binom{\widehat{\phi}}{\widehat{\theta}} \ \text{ is } \ AN\left(\binom{\phi}{\theta},\; \frac{1 + \phi\theta}{n(\phi + \theta)^2} \begin{pmatrix} (1 - \phi^2)(1 + \phi\theta) & -(1 - \phi^2)(1 - \theta^2) \\ -(1 - \phi^2)(1 - \theta^2) & (1 - \theta^2)(1 + \phi\theta) \end{pmatrix}\right). $$
Using these results we can construct approximate confidence intervals for the
model parameters as in the method of moments.