6.3
A linear function
$$ f^{(n)}(X) = \alpha_0 + \sum_{i=1}^{n} \alpha_i X_{n+1-i} $$
is called the best linear predictor (BLP) of $X_{n+m}$ if it minimizes the prediction error
$$ S(\alpha) = E\big[X_{n+m} - f^{(n)}(X)\big]^2, $$
where $\alpha$ is the vector of the coefficients $\alpha_i$ and $X$ is the vector of variables $X_{n+1-i}$.
Since $S(\alpha)$ is a quadratic function of $\alpha$ and is bounded below by zero, there is at least one value of $\alpha$ that minimizes $S(\alpha)$. It satisfies the equations
$$ \frac{\partial S(\alpha)}{\partial \alpha_i} = 0, \qquad i = 0, 1, \dots, n. $$
$$ \frac{\partial S(\alpha)}{\partial \alpha_0} = -2\,E\Big[X_{n+m} - \alpha_0 - \sum_{i=1}^{n} \alpha_i X_{n+1-i}\Big] = 0, $$
$$ \frac{\partial S(\alpha)}{\partial \alpha_j} = -2\,E\Big[\Big(X_{n+m} - \alpha_0 - \sum_{i=1}^{n} \alpha_i X_{n+1-i}\Big) X_{n+1-j}\Big] = 0, \qquad j = 1, \dots, n. \tag{6.19} $$
Assuming that $E(X_t) = \mu$, the first equation can be written as
$$ \mu - \alpha_0 - \mu \sum_{i=1}^{n} \alpha_i = 0, $$
which gives
$$ \alpha_0 = \mu\Big(1 - \sum_{i=1}^{n} \alpha_i\Big). \tag{6.20} $$
Substituting (6.20) into the second set of equations in (6.19) gives
$$ 0 = E(X_{n+m} X_{n+1-j}) - \alpha_0 \mu - \sum_{i=1}^{n} \alpha_i E(X_{n+1-i} X_{n+1-j}) $$
$$ \phantom{0} = E(X_{n+m} X_{n+1-j}) - \mu^2\Big(1 - \sum_{i=1}^{n} \alpha_i\Big) - \sum_{i=1}^{n} \alpha_i E(X_{n+1-i} X_{n+1-j}) $$
$$ \phantom{0} = \gamma\big(m - (1 - j)\big) - \sum_{i=1}^{n} \alpha_i \gamma(i - j), \qquad j = 1, \dots, n. \tag{6.21} $$
We obtain the same set of equations when $E(X_t) = 0$. Hence, we assume further that the TS is a zero-mean stationary process. Then $\alpha_0 = 0$ too, and the predictor takes the form
$$ f^{(n)}(X) = \sum_{i=1}^{n} \alpha_i X_{n+1-i}. $$
For one-step prediction ($m = 1$) the equations (6.21) become
$$ \sum_{i=1}^{n} \alpha_i \gamma(i - j) = \gamma(j), \qquad j = 1, 2, \dots, n. \tag{6.22} $$
In matrix notation we write
$$ \Gamma_n = \{\gamma(i-j)\}_{j,i=1,2,\dots,n}, \qquad \alpha_n = (\alpha_1, \dots, \alpha_n)^T, \qquad \gamma_n = (\gamma(1), \dots, \gamma(n))^T. $$
If $\Gamma_n$ is nonsingular then the unique solution to (6.22) exists and is equal to
$$ \alpha_n = \Gamma_n^{-1} \gamma_n. \tag{6.23} $$
The one-step predictor is then
$$ \widehat{X}^{(n)}_{n+1} = \alpha_n^T X. \tag{6.24} $$
Its mean square prediction error is
$$ P^{(n)}_{n+1} = E\big(X_{n+1} - \widehat{X}^{(n)}_{n+1}\big)^2 = E\big(X_{n+1} - \alpha_n^T X\big)^2 = E\big(X_{n+1} - \gamma_n^T \Gamma_n^{-1} X\big)^2 $$
$$ = E\big(X_{n+1}^2 - 2\gamma_n^T \Gamma_n^{-1} X X_{n+1} + \gamma_n^T \Gamma_n^{-1} X X^T \Gamma_n^{-1} \gamma_n\big) \tag{6.25} $$
$$ = \gamma(0) - 2\gamma_n^T \Gamma_n^{-1} \gamma_n + \gamma_n^T \Gamma_n^{-1} \Gamma_n \Gamma_n^{-1} \gamma_n $$
$$ = \gamma(0) - \gamma_n^T \Gamma_n^{-1} \gamma_n. $$
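The computation in (6.23) and (6.25) can be sketched numerically. The following minimal illustration (not from the text) solves $\Gamma_n \alpha_n = \gamma_n$ by Gaussian elimination and evaluates $\gamma(0) - \alpha_n^T \gamma_n$; the AR(1) autocovariance $\gamma(h) = \phi^h/(1-\phi^2)$ with $\phi = 0.5$, $\sigma^2 = 1$ is an assumed example.

```python
# Sketch: best linear one-step predictor coefficients (6.23) and MSE (6.25),
# for an assumed AR(1) autocovariance function.

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def blp(gamma, n):
    """Coefficients alpha_n and MSE of the one-step BLP based on n past values."""
    Gamma = [[gamma(abs(i - j)) for j in range(n)] for i in range(n)]  # Gamma_n
    g = [gamma(j + 1) for j in range(n)]                               # gamma_n
    alpha = solve(Gamma, g)                                            # (6.23)
    mse = gamma(0) - sum(a * gj for a, gj in zip(alpha, g))            # (6.25)
    return alpha, mse

# assumed AR(1) example: gamma(h) = phi**h / (1 - phi**2) with phi = 0.5
phi = 0.5
gamma = lambda h: phi ** h / (1 - phi ** 2)
alpha, mse = blp(gamma, 3)
# alpha is close to (phi, 0, 0) and mse close to sigma^2 = 1
```

For an AR(1) the predictor degenerates to $\phi X_n$, so all coefficients beyond the first vanish and the prediction error is the innovation variance.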
For $n = 1$ the prediction equation gives
$$ \alpha_1 = \frac{\gamma(1)}{\gamma(0)} = \rho(1) = \phi_{11} $$
and we obtain
$$ \widehat{X}^{(1)}_2 = \rho(1) X_1 = \phi_{11} X_1. $$
To predict $X_3$ based on $X_2$ and $X_1$ we need to calculate $\alpha_1$ and $\alpha_2$ in the prediction function
$$ f^{(2)}(X) = \alpha_1 X_2 + \alpha_2 X_1. $$
From (6.23) we get
$$ \binom{\alpha_1}{\alpha_2} = \Gamma_2^{-1} \gamma_2 = \frac{1}{\gamma^2(0) - \gamma^2(1)} \begin{pmatrix} \gamma(0) & -\gamma(1) \\ -\gamma(1) & \gamma(0) \end{pmatrix} \binom{\gamma(1)}{\gamma(2)} = \frac{1}{\gamma^2(0) - \gamma^2(1)} \binom{\gamma(0)\gamma(1) - \gamma(1)\gamma(2)}{\gamma(0)\gamma(2) - \gamma^2(1)}, $$
so that
$$ \alpha_1 = \frac{\gamma(1)\big(\gamma(0) - \gamma(2)\big)}{\gamma^2(0) - \gamma^2(1)} = \frac{\rho(1)\big(1 - \rho(2)\big)}{1 - \rho^2(1)}, \qquad \alpha_2 = \frac{\gamma(0)\gamma(2) - \gamma^2(1)}{\gamma^2(0) - \gamma^2(1)} = \frac{\rho(2) - \rho^2(1)}{1 - \rho^2(1)}. $$
From the difference equations (6.17) calculated in Example 6.4 we know that
$$ \rho(1) = \frac{\phi_1}{1 - \phi_2}, \qquad \gamma(2) - \phi_1 \gamma(1) - \phi_2 \gamma(0) = 0. $$
That is
$$ \rho(2) = \phi_1 \rho(1) + \phi_2. $$
It finally gives
$$ \alpha_1 = \phi_1, \qquad \alpha_2 = \phi_2. $$
In fact, we can obtain this result directly from the model taking
$$ \widehat{X}^{(2)}_3 = \phi_1 X_2 + \phi_2 X_1, $$
which satisfies the prediction equations, namely
$$ E[(X_3 - \phi_1 X_2 - \phi_2 X_1) X_1] = E[Z_3 X_1] = 0, $$
$$ E[(X_3 - \phi_1 X_2 - \phi_2 X_1) X_2] = E[Z_3 X_2] = 0. $$
In general, for $n \ge 2$, we have
$$ \widehat{X}^{(n)}_{n+1} = \phi_1 X_n + \phi_2 X_{n-1}, \tag{6.26} $$
i.e., $\alpha_j = 0$ for $j = 3, \dots, n$.
The prediction error is then $X_{n+1} - \widehat{X}^{(n)}_{n+1} = Z_{n+1}$, so the mean square prediction error is
$$ P^{(n)}_{n+1} = E Z_{n+1}^2 = \sigma^2. \tag{6.27} $$
Remark 6.8. An interesting connection between the PACF and the vector $\alpha_n$ is that in fact $\phi_{nn} = \alpha_n$, the last element of the vector. For this reason, the vector $\alpha_n$ is usually denoted by $\phi_n$ in the following way
$$ \alpha_n = \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_n \end{pmatrix} = \begin{pmatrix} \phi_{n1} \\ \phi_{n2} \\ \vdots \\ \phi_{nn} \end{pmatrix} = \phi_n. $$
The prediction equations (6.22) for a general ARMA($p,q$) model are more difficult to solve, particularly for large values of $n$, when we would have to calculate the inverse of a matrix $\Gamma_n$ of large dimension. Hence some recursive solutions for calculating the predictor (6.24) and the mean square error (6.25) were proposed, one of them by Levinson in 1947 and by Durbin in 1960. The method is known as the Durbin-Levinson algorithm. Its steps are the following:
Step 1. Put $\phi_{00} = 0$, $P^{(0)}_1 = \gamma(0)$.

Step 2. For $n \ge 1$ calculate
$$ \phi_{nn} = \frac{\rho(n) - \sum_{k=1}^{n-1} \phi_{n-1,k}\, \rho(n-k)}{1 - \sum_{k=1}^{n-1} \phi_{n-1,k}\, \rho(k)}, \tag{6.28} $$
$$ \phi_{nk} = \phi_{n-1,k} - \phi_{nn} \phi_{n-1,n-k}, \qquad k = 1, 2, \dots, n-1. $$

Step 3. For $n \ge 1$ calculate
$$ P^{(n)}_{n+1} = P^{(n-1)}_n \big(1 - \phi_{nn}^2\big). \tag{6.29} $$
Remark 6.9. Note that the Durbin-Levinson algorithm gives an iterative method for calculating the PACF of a stationary process.
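The recursion (6.28)-(6.29) can be sketched in a few lines of code. In the illustration below (not from the text), the autocorrelation function of an AR(2) with assumed coefficients $\phi_1 = 0.7$, $\phi_2 = -0.1$ is fed to the recursion, which recovers $\phi_{22} = \phi_2$ and a vanishing $\phi_{33}$, in line with Remark 6.11.

```python
# A sketch of the Durbin-Levinson recursion (6.28)-(6.29), written in terms of
# the autocorrelations rho(1), rho(2), ...; the AR(2) example below is assumed.

def durbin_levinson(rho, N):
    """Return phi[n] = [phi_{n1}, ..., phi_{nn}] and P[n] = P_{n+1}^{(n)} / gamma(0)."""
    phi = {0: []}
    P = {0: 1.0}  # Step 1: P_1^{(0)} = gamma(0); we work relative to gamma(0)
    for n in range(1, N + 1):
        prev = phi[n - 1]
        num = rho(n) - sum(prev[k - 1] * rho(n - k) for k in range(1, n))
        den = 1.0 - sum(prev[k - 1] * rho(k) for k in range(1, n))
        pnn = num / den                                      # (6.28): phi_nn, the PACF at lag n
        phi[n] = [prev[k - 1] - pnn * prev[n - k - 1] for k in range(1, n)] + [pnn]
        P[n] = P[n - 1] * (1.0 - pnn ** 2)                   # (6.29)
    return phi, P

def rho_ar2(h, phi1=0.7, phi2=-0.1):
    """Autocorrelations of an AR(2) built from its difference equations (assumed example)."""
    vals = [1.0, phi1 / (1.0 - phi2)]
    while len(vals) <= h:
        vals.append(phi1 * vals[-1] + phi2 * vals[-2])
    return vals[h]

phi, P = durbin_levinson(rho_ar2, 3)
# phi[2] recovers (phi_1, phi_2), and phi[3][-1] = phi_33 vanishes
```

The recursion never inverts a matrix, which is exactly the point of the algorithm for large $n$.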
Remark 6.10. When we predict a value of the TS based only on one preceding datum, that is $n = 1$, we obtain
$$ \phi_{11} = \rho(1), \qquad \widehat{X}^{(1)}_{n+1} = \rho(1) X_n $$
and its mean square error
$$ P^{(1)}_2 = \gamma(0)\big(1 - \rho^2(1)\big). $$
For $n = 2$ the algorithm gives
$$ \phi_{22} = \frac{\rho(2) - \phi_{11}\rho(1)}{1 - \phi_{11}\rho(1)} = \frac{\rho(2) - \rho^2(1)}{1 - \rho^2(1)}, $$
which we have also obtained solving the matrix equation (6.22) for $n = 2$, and
$$ \phi_{21} = \phi_{11} - \phi_{22}\phi_{11} = \rho(1)(1 - \phi_{22}). $$
Then the predictor is
$$ \widehat{X}^{(2)}_{n+1} = \phi_{21} X_n + \phi_{22} X_{n-1} $$
and its mean square error
$$ P^{(2)}_3 = P^{(1)}_2 \big(1 - \phi_{22}^2\big). $$
For the AR(2) process of Example 6.4 this gives
$$ \phi_{11} = \rho(1) = \frac{\phi_1}{1 - \phi_2}, $$
$$ \phi_{22} = \frac{\rho(2) - \rho^2(1)}{1 - \rho^2(1)} = \phi_2, \qquad \phi_{21} = \rho(1)(1 - \phi_{22}) = \phi_1, $$
$$ \phi_{33} = 0, \qquad \phi_{31} = \phi_{21} - \phi_{33}\phi_{22} = \phi_1, \qquad \phi_{32} = \phi_{22} - \phi_{33}\phi_{21} = \phi_2, $$
$$ \phi_{44} = 0. $$
The results for $\phi_{33}$ and $\phi_{44}$ come from the fact that in the numerator we have a difference which is zero (difference equation). Hence, the one-step-ahead predictor for AR(2) is based only on the two preceding values, as there are only two nonzero coefficients in the prediction function. As before, we obtain the result
$$ \widehat{X}^{(2)}_{n+1} = \phi_1 X_n + \phi_2 X_{n-1}. $$
Remark 6.11. The PACF for AR(2) is
$$ \phi_{11} = \frac{\phi_1}{1 - \phi_2}, \qquad \phi_{22} = \phi_2, \qquad \phi_{\tau\tau} = 0 \ \text{ for } \ \tau \ge 3. \tag{6.30} $$
The $m$-step prediction follows the same lines. By (6.21) the prediction equations are
$$ \sum_{i=1}^{n} \alpha_i \gamma(i - j) = \gamma(m + j - 1), \qquad j = 1, \dots, n, \tag{6.31} $$
so, denoting $\gamma^{(m)}_n = (\gamma(m), \gamma(m+1), \dots, \gamma(m+n-1))^T$, the vector of coefficients is
$$ \alpha^{(m)}_n = \Gamma_n^{-1} \gamma^{(m)}_n, \tag{6.32} $$
where $\alpha^{(m)}_n = (\alpha^{(m)}_{n1}, \alpha^{(m)}_{n2}, \dots, \alpha^{(m)}_{nn})^T$. The $m$-step predictor is $\widehat{X}^{(n)}_{n+m} = (\alpha^{(m)}_n)^T X$ and its mean square prediction error is
$$ P^{(n)}_{n+m} = E\big[X_{n+m} - \widehat{X}^{(n)}_{n+m}\big]^2 = \gamma(0) - (\gamma^{(m)}_n)^T \Gamma_n^{-1} \gamma^{(m)}_n. \tag{6.33} $$
The mean square prediction error assesses the precision of the forecast and it is used to calculate the so-called prediction interval (PI). When the process is Gaussian the PI is
$$ \widehat{X}^{(n)}_{n+m} \pm u_\alpha \sqrt{\widehat{P}^{(n)}_{n+m}}, \tag{6.34} $$
where $u_\alpha$ is such that $P(|U| < u_\alpha) = 1 - \alpha$, where $U$ is a standard normal r.v. For $\alpha = 0.05$ we have $u_\alpha \approx 1.96$ and the 95% prediction interval boundaries are
$$ \widehat{X}^{(n)}_{n+m} - 1.96\sqrt{\widehat{P}^{(n)}_{n+m}}, \qquad \widehat{X}^{(n)}_{n+m} + 1.96\sqrt{\widehat{P}^{(n)}_{n+m}}. $$
Here we have used the hat notation as usually we do not know the values of the
model parameters and we have to use their estimators. We will discuss the model
parameter estimation in the next section.
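As a small numerical illustration of (6.34), the snippet below computes the 95% PI; the predictor value and its estimated mean square error are hypothetical numbers, not taken from the text.

```python
# Hypothetical m-step prediction and estimated MSE; compute the 95% PI (6.34).
import math

x_hat = 2.1   # assumed value of the predictor X_hat
p_hat = 1.3   # assumed value of the estimated mean square prediction error P_hat
u = 1.96      # u_alpha for alpha = 0.05 (standard normal quantile)

half_width = u * math.sqrt(p_hat)
lo, hi = x_hat - half_width, x_hat + half_width
```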
The moment estimator of the mean $\mu = E(X_t)$ is the sample mean,
$$ \widehat{\mu} = \bar{X}. $$
The method of moments gives good estimators for AR models but less efficient
estimators for MA or ARMA processes. Hence we will present the method for
AR time series. As usual we denote an AR(p) model by
$$ X_t = \phi_1 X_{t-1} + \ldots + \phi_p X_{t-p} + Z_t. $$
This is a zero-mean model, but the estimation of the mean is straightforward and
we will not discuss it further. Here we use the difference equations, where we
replace the population autocovariance (central moment of order two) with the
sample autocovariance. The first p + 1 difference equations are
$$ \gamma(0) = \phi_1 \gamma(1) + \ldots + \phi_p \gamma(p) + \sigma^2, $$
$$ \gamma(\tau) = \phi_1 \gamma(\tau - 1) + \ldots + \phi_p \gamma(\tau - p), \qquad \tau = 1, 2, \dots, p. $$
Note that $q = 0$, so the sum on the right-hand side of (6.16) is zero. In matrix notation we can write
$$ \sigma^2 = \gamma(0) - \phi^T \gamma_p, \qquad \Gamma_p \phi = \gamma_p, $$
where
$$ \Gamma_p = \{\gamma(i-j)\}_{i,j=1,\dots,p}, \qquad \phi = (\phi_1, \dots, \phi_p)^T, \qquad \gamma_p = (\gamma(1), \dots, \gamma(p))^T. $$
Replacing $\gamma(\tau)$ by the sample ACVF
$$ \widehat{\gamma}(\tau) = \frac{1}{n} \sum_{t=1}^{n-\tau} (X_{t+\tau} - \bar{X})(X_t - \bar{X}), $$
we obtain the solution
$$ \widehat{\sigma}^2 = \widehat{\gamma}(0) - \widehat{\gamma}_p^T \widehat{\Gamma}_p^{-1} \widehat{\gamma}_p, \qquad \widehat{\phi} = \widehat{\Gamma}_p^{-1} \widehat{\gamma}_p. \tag{6.35} $$
These equations are called the Yule-Walker estimators. They are often expressed in terms of the autocorrelation function rather than the autocovariance function. Then we have
$$ \widehat{\sigma}^2 = \widehat{\gamma}(0)\big[1 - \widehat{\rho}_p^T \widehat{R}_p^{-1} \widehat{\rho}_p\big], \qquad \widehat{\phi} = \widehat{R}_p^{-1} \widehat{\rho}_p, \tag{6.36} $$
where
$$ \widehat{R}_p = \{\widehat{\rho}(i-j)\}_{i,j=1,2,\dots,p}, \qquad \widehat{\rho}_p = (\widehat{\rho}(1), \dots, \widehat{\rho}(p))^T. $$
Proposition 6.3. The distribution of the Yule-Walker estimators $\widehat{\phi}$ of the model parameters of a causal AR($p$) process
$$ X_t = \phi_1 X_{t-1} + \ldots + \phi_p X_{t-p} + Z_t $$
is asymptotically (as $n \to \infty$) normal, in the sense that
$$ \sqrt{n}\big(\widehat{\phi} - \phi\big) \xrightarrow{d} N\big(0, \sigma^2 \Gamma_p^{-1}\big) $$
and
$$ \widehat{\sigma}^2 \xrightarrow{P} \sigma^2. $$
Remark 6.12. Note that the matrix equation (6.23) is of the same form as (6.36). Hence, we can use the Durbin-Levinson algorithm to calculate the estimates. This will give us the values of the sample PACF as well as the estimates of $\phi$.
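Remark 6.12 can be sketched in code: applying the Durbin-Levinson recursion to the sample autocorrelations yields both the sample PACF and the Yule-Walker estimates (6.36) without any matrix inversion. The short data series below is made up purely for illustration.

```python
# A sketch of Yule-Walker fitting of an AR(p) via the Durbin-Levinson recursion
# applied to the sample autocorrelations; the data series is a made-up example.

def sample_acf(x, max_lag):
    """Sample variance gamma_hat(0) and sample autocorrelations rho_hat(0..max_lag)."""
    n = len(x)
    m = sum(x) / n
    g0 = sum((v - m) ** 2 for v in x) / n
    acf = [1.0]
    for h in range(1, max_lag + 1):
        gh = sum((x[t + h] - m) * (x[t] - m) for t in range(n - h)) / n
        acf.append(gh / g0)
    return g0, acf

def yule_walker(x, p):
    """Return (phi_hat, sigma2_hat, pacf_hat) for an AR(p) fitted as in (6.36)."""
    g0, acf = sample_acf(x, p)
    phi, pacf, P = [], [], 1.0
    for n in range(1, p + 1):
        num = acf[n] - sum(phi[k - 1] * acf[n - k] for k in range(1, n))
        den = 1.0 - sum(phi[k - 1] * acf[k] for k in range(1, n))
        pnn = num / den
        phi = [phi[k - 1] - pnn * phi[n - k - 1] for k in range(1, n)] + [pnn]
        pacf.append(pnn)
        P *= 1.0 - pnn ** 2
    return phi, g0 * P, pacf   # sigma2_hat = gamma_hat(0) * prod(1 - phi_nn^2)

x = [1.4, 0.8, -0.3, -0.5, 0.1, 0.6, 0.2, -0.4, -0.9, -0.2, 0.7, 1.1]
phi_hat, sigma2_hat, pacf_hat = yule_walker(x, 2)
```

The returned `pacf_hat` is exactly the sample PACF up to lag $p$, and `phi_hat` solves the sample Yule-Walker equations.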
Proposition 6.4. The distribution of the sample PACF of a causal AR($p$) process is asymptotically normal, that is,
$$ \sqrt{n}\, \widehat{\phi}_{\tau\tau} \xrightarrow{d} N(0, 1) \quad \text{for } \tau > p. $$
For an AR(2) process equations (6.36) take the form
$$ \widehat{\sigma}^2 = \widehat{\gamma}(0)\big[1 - \widehat{\rho}_2^T \widehat{R}_2^{-1} \widehat{\rho}_2\big], \qquad \widehat{\phi} = \widehat{R}_2^{-1} \widehat{\rho}_2, $$
where
$$ \widehat{R}_2 = \begin{pmatrix} \widehat{\rho}(0) & \widehat{\rho}(1) \\ \widehat{\rho}(1) & \widehat{\rho}(0) \end{pmatrix}, \qquad \widehat{\rho}_2 = (\widehat{\rho}(1), \widehat{\rho}(2))^T, \qquad \widehat{\phi} = (\widehat{\phi}_1, \widehat{\phi}_2)^T. $$
We can easily invert a $2 \times 2$ matrix and calculate the estimators, or we can use the Durbin-Levinson algorithm directly to obtain
$$ \widehat{\phi}_{11} = \widehat{\rho}(1) = \frac{\widehat{\phi}_1}{1 - \widehat{\phi}_2}. $$
Also, we get
$$ \widehat{\phi}_{22} = \frac{\widehat{\rho}(2) - \widehat{\rho}^2(1)}{1 - \widehat{\rho}^2(1)} = \widehat{\phi}_2, \qquad \widehat{\phi}_{21} = \widehat{\rho}(1)\big[1 - \widehat{\phi}_{22}\big] = \widehat{\phi}_1 $$
and
$$ \widehat{\sigma}^2 = \widehat{\gamma}(0)\bigg[1 - \big(\widehat{\rho}(1), \widehat{\rho}(2)\big) \binom{\widehat{\phi}_1}{\widehat{\phi}_2}\bigg] = \widehat{\gamma}(0)\big[1 - \big(\widehat{\rho}(1)\widehat{\phi}_1 + \widehat{\rho}(2)\widehat{\phi}_2\big)\big]. $$
Furthermore, from Proposition 6.3 we can derive the confidence interval for $\phi_i$. The proposition says that
$$ \sqrt{n}\big(\widehat{\phi} - \phi\big) \xrightarrow{d} N\big(0, \sigma^2 \Gamma_p^{-1}\big), $$
that is, the variance of $\sqrt{n}(\widehat{\phi}_i - \phi_i)$ is the $i$-th diagonal element of the matrix $\sigma^2 \Gamma_p^{-1}$, say $v_{ii}$. But
$$ \operatorname{var}\big(\widehat{\phi}_i\big) \approx \frac{1}{n} v_{ii}. $$
Also, from Proposition 6.4 we have
$$ \sqrt{n}\, \widehat{\phi}_{\tau\tau} \xrightarrow{d} N(0, 1), \qquad \tau > p, $$
that is,
$$ \operatorname{var}\big(\widehat{\phi}_{\tau\tau}\big) \approx \frac{1}{n}. $$
However, we know that the PACF for $\tau > p$ is zero. It means that with probability $1 - \alpha$ we have
$$ -u_\alpha < \frac{\widehat{\phi}_{\tau\tau} - 0}{\sqrt{1/n}} < u_\alpha. $$
It can be interpreted that the estimate of the PACF indicates a non-significant value of $\phi_{\tau\tau}$ if it is in the interval
$$ \big[-u_\alpha/\sqrt{n},\; u_\alpha/\sqrt{n}\big]. $$
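As a small sketch of this significance check (the sample PACF values below are hypothetical, chosen only to illustrate the band):

```python
# Hypothetical sample PACF values checked against the band [-1.96/sqrt(n), 1.96/sqrt(n)].
import math

n = 200
band = 1.96 / math.sqrt(n)
sample_pacf = {1: 0.66, 2: -0.17, 3: 0.05, 4: -0.03}   # assumed values
significant = [tau for tau, v in sample_pacf.items() if abs(v) > band]
# lags whose PACF estimate falls outside the band are flagged as significant
```

Here only lags 1 and 2 exceed the band, which is the pattern expected for an AR(2).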
We will do the calculations for the simulated AR(2) process given in Figure 6.3. For these data we have the following values of the sample variance $\widehat{\gamma}(0)$ and the sample autocorrelations $\widehat{\rho}(1)$ and $\widehat{\rho}(2)$:
$$ \widehat{\gamma}(0) = 1.947669, \qquad \widehat{\rho}(1) = 0.66018, \qquad \widehat{\rho}(2) = 0.33751. $$
Then the matrix $\widehat{R}_2$ is equal to
$$ \widehat{R}_2 = \begin{pmatrix} 1 & 0.66018 \\ 0.66018 & 1 \end{pmatrix} $$
and its inverse is
$$ \widehat{R}_2^{-1} = \begin{pmatrix} 1.77254 & -1.17020 \\ -1.17020 & 1.77254 \end{pmatrix}, $$
which gives the estimates
$$ \widehat{\phi}_1 = 0.7752, \qquad \widehat{\phi}_2 = -0.1743. $$
The series was simulated for $\phi_1 = 0.7$ and $\phi_2 = -0.1$ and a Gaussian white noise with zero mean and variance equal to 1. These estimates are not far from the true
values. Had we not known the true values we would have liked to calculate the
confidence intervals for them. There are 200 observations, i.e. n = 200 which is
big enough to use the asymptotic result given in Proposition 6.3. To calculate $v_{ii}$ note that
$$ \Gamma = \gamma(0) R, $$
which gives
$$ \Gamma^{-1} = \frac{1}{\gamma(0)} R^{-1}. $$
Hence
$$ \widehat{\sigma}^2 \widehat{\Gamma}_2^{-1} = \frac{\widehat{\sigma}^2}{\widehat{\gamma}(0)} \widehat{R}_2^{-1} = \frac{1.06542}{1.947669} \begin{pmatrix} 1.77254 & -1.17020 \\ -1.17020 & 1.77254 \end{pmatrix} = \begin{pmatrix} 0.969623 & -0.640129 \\ -0.640129 & 0.969623 \end{pmatrix} $$
and
$$ \frac{1}{n} v_{ii} = \frac{1}{200} \cdot 0.969623 = 0.0048481. $$
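This chain of computations can be reproduced directly from the three sample values; the sketch below takes only $\widehat{\gamma}(0)$, $\widehat{\rho}(1)$, $\widehat{\rho}(2)$ as input.

```python
# Reproduce the example: invert R_2, get the Yule-Walker estimates and v_11 / n.

g0, r1, r2 = 1.947669, 0.66018, 0.33751   # sample values from the text
n = 200

det = 1.0 - r1 ** 2                        # det(R_2) for a 2x2 correlation matrix
Rinv = [[1.0 / det, -r1 / det], [-r1 / det, 1.0 / det]]

phi1 = Rinv[0][0] * r1 + Rinv[0][1] * r2   # phi_hat = R_2^{-1} rho_2
phi2 = Rinv[1][0] * r1 + Rinv[1][1] * r2
sigma2 = g0 * (1.0 - (r1 * phi1 + r2 * phi2))

v11_over_n = (sigma2 / g0) * Rinv[0][0] / n
```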
The 95% approximate confidence intervals for the model parameters $\phi_1$ and $\phi_2$ are, respectively,
$$ 0.7752 \pm 1.96\sqrt{0.0048481} = (0.639,\; 0.912) \qquad \text{and} \qquad -0.1743 \pm 1.96\sqrt{0.0048481} = (-0.311,\; -0.038). $$
Maximum Likelihood Estimation
Assume that the observations $x_1, \dots, x_n$ come from a zero-mean Gaussian stationary process. Then the likelihood is
$$ L(\beta, \sigma^2 \mid x_1, \dots, x_n) = \frac{1}{\sqrt{(2\pi)^n \det(\Gamma_n)}} \exp\Big(-\frac{1}{2} X^T \Gamma_n^{-1} X\Big). $$
A more convenient form can be obtained after taking the natural logarithm. Then
$$ l(\beta, \sigma^2 \mid x_1, \dots, x_n) = \ln L(\beta, \sigma^2 \mid x_1, \dots, x_n) = -\frac{n}{2} \ln(2\pi) - \frac{1}{2} \ln \det(\Gamma_n) - \frac{1}{2} X^T \Gamma_n^{-1} X. $$
The maximum likelihood estimates (MLE) are the values of $\beta$ and $\sigma^2$ which maximize the function $l(\beta, \sigma^2 \mid x_1, \dots, x_n)$. Intuitively, the MLE is the parameter value for which the observed sample is most likely.
The estimates are usually found numerically using iterative optimization routines. We will not discuss them here.
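Whatever the routine, its inner loop evaluates $l(\beta, \sigma^2 \mid x_1, \dots, x_n)$. A minimal sketch (not from the text) uses a Cholesky factorization $\Gamma_n = L L^T$, so that $\ln \det \Gamma_n = 2 \sum_i \ln L_{ii}$ and $X^T \Gamma_n^{-1} X = \lVert L^{-1} X \rVert^2$; the AR(1) autocovariance and the data values below are assumed for illustration.

```python
# Sketch: evaluate the Gaussian log-likelihood via a Cholesky factorization
# Gamma_n = L L^T: ln det(Gamma_n) = 2 sum ln L_ii and x^T Gamma_n^{-1} x = ||L^{-1} x||^2.
import math

def cholesky(G):
    """Lower-triangular L with L L^T = G (G symmetric positive definite)."""
    n = len(G)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = G[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def forward_solve(L, b):
    """Solve L y = b for lower-triangular L."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def gaussian_loglik(gamma, x):
    """Log-likelihood of zero-mean Gaussian data x with autocovariance function gamma."""
    n = len(x)
    G = [[gamma(abs(i - j)) for j in range(n)] for i in range(n)]
    L = cholesky(G)
    y = forward_solve(L, x)
    logdet = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * n * math.log(2 * math.pi) - 0.5 * logdet - 0.5 * sum(v * v for v in y)

# assumed AR(1) autocovariance (phi = 0.5, sigma^2 = 1) and made-up observations
ar1_gamma = lambda h: 0.5 ** h / (1 - 0.5 ** 2)
ll = gaussian_loglik(ar1_gamma, [0.2, -0.1, 0.4, 0.0, -0.3])
```

An optimizer would maximize `gaussian_loglik` over the model parameters that enter through `gamma`.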
The MLE have the property of being asymptotically normally distributed. It is
stated in the following proposition.
Proposition 6.5. The distribution of the MLE $\widehat{\beta}$ of a causal and invertible ARMA($p,q$) process is asymptotically normal in the sense that
$$ \sqrt{n}\big(\widehat{\beta} - \beta\big) \xrightarrow{d} N\big(0, \sigma^2 V_{p+q}^{-1}\big), \tag{6.37} $$
where the $(p+q) \times (p+q)$-dimensional matrix $V_{p+q}$ depends on the model parameters.
AR(1): $X_t = \phi X_{t-1} + Z_t$,
$$ \widehat{\phi} \ \text{ is } \ AN\Big(\phi,\; \frac{1}{n}\big(1 - \phi^2\big)\Big). $$

MA(1): $X_t = Z_t + \theta Z_{t-1}$,
$$ \widehat{\theta} \ \text{ is } \ AN\Big(\theta,\; \frac{1}{n}\big(1 - \theta^2\big)\Big). $$

ARMA(1,1): $X_t = \phi X_{t-1} + Z_t + \theta Z_{t-1}$,
$$ \binom{\widehat{\phi}}{\widehat{\theta}} \ \text{ is } \ AN\left(\binom{\phi}{\theta},\; \frac{1 + \phi\theta}{n(\phi + \theta)^2} \begin{pmatrix} (1 - \phi^2)(1 + \phi\theta) & -(1 - \phi^2)(1 - \theta^2) \\ -(1 - \phi^2)(1 - \theta^2) & (1 - \theta^2)(1 + \phi\theta) \end{pmatrix}\right). $$
Using these results we can construct approximate confidence intervals for the
model parameters as in the method of moments.