
6.3 Forecasting ARMA processes

The purpose of forecasting is to predict future values of a time series (TS) based on the data collected up to the present. In this section we discuss linear functions of $X = (X_n, X_{n-1}, \ldots, X_1)^T$ that predict a future value $X_{n+m}$ for $m = 1, 2, \ldots$.
We call a function
$$
f_{(n)}(X) = \beta_0 + \beta_1 X_n + \ldots + \beta_n X_1 = \beta_0 + \sum_{i=1}^{n} \beta_i X_{n+1-i}
$$
the best linear predictor (BLP) of $X_{n+m}$ if it minimizes the prediction error
$$
S(\beta) = E[X_{n+m} - f_{(n)}(X)]^2,
$$
where $\beta$ is the vector of the coefficients $\beta_i$ and $X$ is the vector of variables $X_{n+1-i}$.
Since $S(\beta)$ is a quadratic function of $\beta$ and is bounded below by zero, there is at least one value of $\beta$ that minimizes $S(\beta)$. It satisfies the equations
$$
\frac{\partial S(\beta)}{\partial \beta_i} = 0, \quad i = 0, 1, \ldots, n.
$$

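Evaluating the derivatives gives the so-called prediction equations
$$
\begin{aligned}
\frac{\partial S(\beta)}{\partial \beta_0} &= -2E\Big[X_{n+m} - \beta_0 - \sum_{i=1}^{n} \beta_i X_{n+1-i}\Big] = 0, \\
\frac{\partial S(\beta)}{\partial \beta_j} &= -2E\Big[\Big(X_{n+m} - \beta_0 - \sum_{i=1}^{n} \beta_i X_{n+1-i}\Big) X_{n+1-j}\Big] = 0, \quad j = 1, \ldots, n.
\end{aligned}
\tag{6.19}
$$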
Assuming that $E(X_t) = \mu$, the first equation can be written as
$$
\mu - \beta_0 - \sum_{i=1}^{n} \beta_i \mu = 0,
$$
which gives
$$
\beta_0 = \mu\Big(1 - \sum_{i=1}^{n} \beta_i\Big).
\tag{6.20}
$$

CHAPTER 6. ARMA MODELS

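Substituting (6.20) into the second set of equations in (6.19) gives
$$
\begin{aligned}
0 &= E(X_{n+m} X_{n+1-j}) - \beta_0 \mu - \sum_{i=1}^{n} \beta_i E(X_{n+1-i} X_{n+1-j}) \\
&= E(X_{n+m} X_{n+1-j}) - \mu^2\Big(1 - \sum_{i=1}^{n} \beta_i\Big) - \sum_{i=1}^{n} \beta_i E(X_{n+1-i} X_{n+1-j}) \\
&= \gamma(m - (1 - j)) - \sum_{i=1}^{n} \beta_i \gamma(i - j).
\end{aligned}
$$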

That is, the prediction equations take the form
$$
\gamma(m - 1 + j) = \sum_{i=1}^{n} \beta_i \gamma(i - j), \quad j = 1, \ldots, n.
\tag{6.21}
$$
We obtain the same set of equations when $E(X_t) = 0$. Hence, from now on we assume that the TS is a zero-mean stationary process. Then $\beta_0 = 0$ too.

6.3.1 One-step-ahead Prediction


Given $\{X_1, \ldots, X_n\}$ we want to forecast the value of $X_{n+1}$. The BLP of $X_{n+1}$ is
$$
f_{(n)} = \sum_{i=1}^{n} \beta_i X_{n+1-i}.
$$
The coefficients $\beta_i$ satisfy (6.21) with $m = 1$, that is
$$
\sum_{i=1}^{n} \beta_i \gamma(i - j) = \gamma(j), \quad j = 1, 2, \ldots, n.
$$

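A convenient way of writing these equations is to use matrix notation. We have
$$
\Gamma_n \beta_n = \gamma_n,
\tag{6.22}
$$
where
$$
\Gamma_n = \{\gamma(i - j)\}_{j,i=1,2,\ldots,n}, \qquad
\beta_n = (\beta_1, \ldots, \beta_n)^T, \qquad
\gamma_n = (\gamma(1), \ldots, \gamma(n))^T.
$$
If $\Gamma_n$ is nonsingular then the unique solution to (6.22) exists and is equal to
$$
\beta_n = \Gamma_n^{-1} \gamma_n.
\tag{6.23}
$$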


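Then the forecast of $X_{n+1}$ based on $X = (X_n, \ldots, X_1)^T$ can be written as
$$
X_{n+1}^{(n)} = \beta_n^T X.
\tag{6.24}
$$
The mean square one-step-ahead prediction error, denoted by $P_{n+1}^{(n)}$, is
$$
\begin{aligned}
P_{n+1}^{(n)} &= E(X_{n+1} - X_{n+1}^{(n)})^2 \\
&= E(X_{n+1} - \beta_n^T X)^2 \\
&= E(X_{n+1} - \gamma_n^T \Gamma_n^{-1} X)^2 \\
&= E(X_{n+1}^2 - 2\gamma_n^T \Gamma_n^{-1} X X_{n+1} + \gamma_n^T \Gamma_n^{-1} X X^T \Gamma_n^{-1} \gamma_n) \\
&= \gamma(0) - 2\gamma_n^T \Gamma_n^{-1} \gamma_n + \gamma_n^T \Gamma_n^{-1} \Gamma_n \Gamma_n^{-1} \gamma_n \\
&= \gamma(0) - \gamma_n^T \Gamma_n^{-1} \gamma_n.
\end{aligned}
\tag{6.25}
$$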
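The matrix form (6.22)-(6.25) translates almost directly into a few lines of linear algebra. The following is a minimal sketch, not part of the original notes: the function name `one_step_blp` and the AR(1) test case (with $\phi = 0.6$, $\sigma^2 = 1$, for which $\gamma(h) = \sigma^2\phi^{|h|}/(1-\phi^2)$) are assumptions chosen purely for illustration.

```python
import numpy as np

def one_step_blp(acvf, n):
    """Solve Gamma_n beta_n = gamma_n (6.22) and evaluate the MSE (6.25).

    acvf[h] must hold gamma(h) for h = 0, ..., n.
    Returns (beta_n, P) where P = gamma(0) - gamma_n^T Gamma_n^{-1} gamma_n.
    """
    acvf = np.asarray(acvf, dtype=float)
    idx = np.arange(n)
    # Gamma_n = {gamma(i - j)}: a symmetric Toeplitz matrix built from the ACVF
    Gamma = acvf[np.abs(idx[:, None] - idx[None, :])]
    gamma_n = acvf[1:n + 1]                 # (gamma(1), ..., gamma(n))
    beta = np.linalg.solve(Gamma, gamma_n)  # avoids forming the inverse explicitly
    mse = acvf[0] - gamma_n @ beta
    return beta, mse

# Hypothetical example: ACVF of a causal AR(1) with phi = 0.6 and sigma^2 = 1
phi, sigma2 = 0.6, 1.0
acvf_ar1 = sigma2 * phi ** np.arange(6) / (1 - phi ** 2)
beta, mse = one_step_blp(acvf_ar1, n=5)
print(beta, mse)
```

For this AR(1) case only the first coefficient should be nonzero and the mean square error should equal the white noise variance, which is the AR(1) analogue of the AR(2) result derived in the example that follows.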

Example 6.6. Prediction for an AR(2)

Let
$$
X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + Z_t
$$
be a causal AR(2) process. Suppose we have one observation, $X_1$. Then the one-step-ahead prediction function is
$$
f_{(1)}(X_2) = \beta_1 X_1,
$$
where
$$
\beta_1 = \Gamma_1^{-1} \gamma_1 = \frac{\gamma(1)}{\gamma(0)} = \rho(1) = \phi_{11}
$$
and we obtain
$$
X_2^{(1)} = \rho(1) X_1 = \phi_{11} X_1.
$$
To predict $X_3$ based on $X_2$ and $X_1$ we need to calculate $\beta_1$ and $\beta_2$ in the prediction function
$$
f_{(2)}(X_3) = \beta_1 X_2 + \beta_2 X_1.
$$

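These can be obtained from (6.23) as
$$
\begin{aligned}
\begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix}
&= \begin{pmatrix} \gamma(0) & \gamma(1) \\ \gamma(1) & \gamma(0) \end{pmatrix}^{-1}
\begin{pmatrix} \gamma(1) \\ \gamma(2) \end{pmatrix} \\
&= \frac{1}{\gamma^2(0) - \gamma^2(1)}
\begin{pmatrix} \gamma(0) & -\gamma(1) \\ -\gamma(1) & \gamma(0) \end{pmatrix}
\begin{pmatrix} \gamma(1) \\ \gamma(2) \end{pmatrix} \\
&= \frac{1}{\gamma^2(0) - \gamma^2(1)}
\begin{pmatrix} \gamma(0)\gamma(1) - \gamma(1)\gamma(2) \\ -\gamma^2(1) + \gamma(0)\gamma(2) \end{pmatrix} \\
&= \begin{pmatrix} \dfrac{\gamma(1)(\gamma(0) - \gamma(2))}{\gamma^2(0) - \gamma^2(1)} \\[2ex] \dfrac{\gamma(0)\gamma(2) - \gamma^2(1)}{\gamma^2(0) - \gamma^2(1)} \end{pmatrix}
= \begin{pmatrix} \dfrac{\rho(1)(1 - \rho(2))}{1 - \rho^2(1)} \\[2ex] \dfrac{\rho(2) - \rho^2(1)}{1 - \rho^2(1)} \end{pmatrix}.
\end{aligned}
$$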

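From the difference equations (6.17) calculated in Example 6.4 we know that
$$
\rho(1) = \frac{\phi_1}{1 - \phi_2}, \qquad
\rho(2) - \phi_1 \rho(1) - \phi_2 \rho(0) = 0.
$$
That is
$$
\rho(2) = \phi_1 \rho(1) + \phi_2.
$$
It finally gives
$$
\begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix} = \begin{pmatrix} \phi_1 \\ \phi_2 \end{pmatrix}.
$$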

In fact, we can obtain this result directly from the model by taking
$$
X_3^{(2)} = \phi_1 X_2 + \phi_2 X_1,
$$
which satisfies the prediction equations, namely
$$
\begin{aligned}
E[(X_3 - \phi_1 X_2 - \phi_2 X_1) X_1] &= E[Z_3 X_1] = 0, \\
E[(X_3 - \phi_1 X_2 - \phi_2 X_1) X_2] &= E[Z_3 X_2] = 0.
\end{aligned}
$$
In general, for $n \geq 2$, we have
$$
X_{n+1}^{(n)} = \phi_1 X_n + \phi_2 X_{n-1},
\tag{6.26}
$$
i.e., $\beta_j = 0$ for $j = 3, \ldots, n$.


Similarly, it can be shown that the one-step-ahead predictor for AR(p) is
$$
X_{n+1}^{(n)} = \phi_1 X_n + \phi_2 X_{n-1} + \ldots + \phi_p X_{n-p+1}, \quad \text{for } n \geq p.
\tag{6.27}
$$

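Remark 6.8. An interesting connection between the PACF and the vector $\beta_n$ is that in fact $\phi_{nn} = \beta_n$, the last element of the vector. For this reason, the vector $\beta_n$ is usually denoted by $\phi_n$ in the following way
$$
\beta_n = \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_n \end{pmatrix}
= \begin{pmatrix} \phi_{n1} \\ \phi_{n2} \\ \vdots \\ \phi_{nn} \end{pmatrix}
= \phi_n.
$$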
Solving the prediction equations (6.22) for a general ARMA(p,q) model is more difficult, particularly for large values of $n$, when we would have to invert the high-dimensional matrix $\Gamma_n$. Hence recursive methods for calculating the predictor (6.24) and the mean square error (6.25) were proposed, among them one by Levinson in 1947 and one by Durbin in 1960.
The method is known as the Durbin-Levinson Algorithm. Its steps are as follows:
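Step 1 Put $\phi_{00} = 0$, $P_1^{(0)} = \gamma(0)$.

Step 2 For $n \geq 1$ calculate
$$
\phi_{nn} = \frac{\rho(n) - \sum_{k=1}^{n-1} \phi_{n-1,k}\, \rho(n - k)}{1 - \sum_{k=1}^{n-1} \phi_{n-1,k}\, \rho(k)},
\tag{6.28}
$$
where, for $n \geq 2$,
$$
\phi_{nk} = \phi_{n-1,k} - \phi_{nn} \phi_{n-1,n-k}, \quad k = 1, 2, \ldots, n - 1.
$$

Step 3 For $n \geq 1$ calculate
$$
P_{n+1}^{(n)} = P_n^{(n-1)} (1 - \phi_{nn}^2).
\tag{6.29}
$$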
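The three steps above can be sketched in code as follows. This is an illustrative implementation, not taken from the notes: the function name `durbin_levinson`, its return layout, and the AR(2) test values ($\phi_1 = 0.7$, $\phi_2 = -0.1$, $\sigma^2 = 1$) are assumptions. For $n = 1$ both sums in (6.28) are empty, so the recursion starts with $\phi_{11} = \rho(1)$.

```python
import numpy as np

def durbin_levinson(rho, gamma0, n_max):
    """Durbin-Levinson recursion, Steps 1-3.

    rho[h] must hold rho(h) for h = 0, ..., n_max; gamma0 is gamma(0).
    Returns (coeffs, P): coeffs[n-1] is the vector (phi_n1, ..., phi_nn)
    and P[n] is the mean square error P_{n+1}^{(n)}, with P[0] = gamma(0).
    """
    rho = np.asarray(rho, dtype=float)
    phi_prev = np.array([])      # no coefficients yet (n = 0)
    P = [gamma0]                 # Step 1
    coeffs = []
    for n in range(1, n_max + 1):
        # Step 2, equation (6.28); rho[n-1:0:-1] is (rho(n-1), ..., rho(1))
        num = rho[n] - np.dot(phi_prev, rho[n - 1:0:-1])
        den = 1.0 - np.dot(phi_prev, rho[1:n])
        phi_nn = num / den
        # phi_nk = phi_{n-1,k} - phi_nn * phi_{n-1,n-k}, then append phi_nn
        phi_n = np.append(phi_prev - phi_nn * phi_prev[::-1], phi_nn)
        # Step 3, equation (6.29)
        P.append(P[-1] * (1.0 - phi_nn ** 2))
        coeffs.append(phi_n)
        phi_prev = phi_n
    return coeffs, np.array(P)

# Hypothetical check on a causal AR(2) with phi1 = 0.7, phi2 = -0.1, sigma^2 = 1
phi1, phi2 = 0.7, -0.1
rho = np.empty(5)
rho[0], rho[1] = 1.0, phi1 / (1 - phi2)
for h in range(2, 5):
    rho[h] = phi1 * rho[h - 1] + phi2 * rho[h - 2]   # difference equation for the ACF
gamma0 = 1.0 / (1 - phi1 * rho[1] - phi2 * rho[2])   # from gamma(0) = phi1 gamma(1) + phi2 gamma(2) + sigma^2
coeffs, P = durbin_levinson(rho, gamma0, n_max=4)
pacf = np.array([c[-1] for c in coeffs])             # phi_11, phi_22, phi_33, phi_44
print(coeffs[1], pacf, P)
```

The last entry of each coefficient vector is the PACF value $\phi_{nn}$, so the same loop yields the predictor coefficients, the PACF and the prediction mean square errors without inverting any matrix.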

Remark 6.9. Note that the Durbin-Levinson algorithm gives an iterative method to calculate the PACF of a stationary process.


Remark 6.10. When we predict a value of the TS based only on one preceding datum, that is $n = 1$, we obtain
$$
\phi_{11} = \rho(1),
$$
and hence the predictor $X_2^{(1)} = \rho(1) X_1$, or in general
$$
X_{n+1}^{(1)} = \rho(1) X_n,
$$
and its mean square error
$$
P_2^{(1)} = \gamma(0)(1 - \phi_{11}^2).
$$
When we predict $X_{n+1}$ based on two preceding values, that is $n = 2$, we obtain
$$
\phi_{22} = \frac{\rho(2) - \phi_{11}\rho(1)}{1 - \phi_{11}\rho(1)} = \frac{\rho(2) - \rho^2(1)}{1 - \rho^2(1)},
$$
which we have also obtained by solving the matrix equation (6.22) for $\beta_2$, and
$$
\phi_{21} = \phi_{11} - \phi_{22}\phi_{11} = \rho(1)(1 - \phi_{22}).
$$
Then the predictor is
$$
X_{n+1}^{(2)} = \phi_{21} X_n + \phi_{22} X_{n-1}
$$
and its mean square error is
$$
P_3^{(2)} = \gamma(0)(1 - \phi_{11}^2)(1 - \phi_{22}^2).
$$
We could continue these steps for $n = 3, 4, \ldots$.

Example 6.7. Prediction for an AR(2), continued


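Using the Durbin-Levinson algorithm for AR(2) we obtain
$$
\begin{aligned}
\phi_{11} &= \rho(1) = \frac{\phi_1}{1 - \phi_2}, \\
\phi_{22} &= \frac{\rho(2) - \rho^2(1)}{1 - \rho^2(1)} = \phi_2, \\
\phi_{21} &= \rho(1)(1 - \phi_{22}) = \phi_1, \\
\phi_{33} &= \frac{\rho(3) - \phi_1\rho(2) - \phi_2\rho(1)}{1 - \phi_1\rho(1) - \phi_2\rho(2)} = 0, \\
\phi_{31} &= \phi_{21} - \phi_{33}\phi_{22} = \phi_1, \\
\phi_{32} &= \phi_{22} - \phi_{33}\phi_{21} = \phi_2, \\
\phi_{44} &= \frac{\rho(4) - \phi_1\rho(3) - \phi_2\rho(2)}{1 - \phi_1\rho(1) - \phi_2\rho(2)} = 0.
\end{aligned}
$$
The results for $\phi_{33}$ and $\phi_{44}$ follow because the numerator is exactly the left-hand side of the difference equation for the ACF, which equals zero.
Hence, the one-step-ahead predictor for AR(2) is based only on the two preceding values, as there are only two nonzero coefficients in the prediction function. As before, we obtain the result
$$
X_{n+1}^{(2)} = \phi_1 X_n + \phi_2 X_{n-1}.
$$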
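Remark 6.11. The PACF for AR(2) is
$$
\begin{aligned}
\phi_{11} &= \frac{\phi_1}{1 - \phi_2}, \\
\phi_{22} &= \phi_2, \\
\phi_{\tau\tau} &= 0 \quad \text{for } \tau \geq 3.
\end{aligned}
\tag{6.30}
$$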

6.3.2 m-step-ahead Prediction


Given values of the variables $\{X_1, \ldots, X_n\}$, the m-step-ahead predictor is
$$
X_{n+m}^{(n)} = \phi_{n1}^{(m)} X_n + \phi_{n2}^{(m)} X_{n-1} + \ldots + \phi_{nn}^{(m)} X_1,
\tag{6.31}
$$

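where $\phi_{nj}^{(m)} = \beta_j$ satisfy the prediction equations (6.21). In matrix notation the prediction equations are
$$
\Gamma_n \phi_n^{(m)} = \gamma_n^{(m)},
\tag{6.32}
$$
where
$$
\gamma_n^{(m)} = (\gamma(m), \gamma(m + 1), \ldots, \gamma(m + n - 1))^T
$$
and
$$
\phi_n^{(m)} = (\phi_{n1}^{(m)}, \phi_{n2}^{(m)}, \ldots, \phi_{nn}^{(m)})^T.
$$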

The mean square m-step-ahead prediction error is
$$
P_{n+m}^{(n)} = E[X_{n+m} - X_{n+m}^{(n)}]^2 = \gamma(0) - (\gamma_n^{(m)})^T \Gamma_n^{-1} \gamma_n^{(m)}.
\tag{6.33}
$$

The mean square prediction error assesses the precision of the forecast and is used to calculate the so-called prediction interval (PI). When the process is Gaussian, the PI is
$$
\hat{X}_{n+m}^{(n)} \pm u_\alpha \sqrt{\hat{P}_{n+m}^{(n)}},
\tag{6.34}
$$
where $u_\alpha$ is such that $P(|U| < u_\alpha) = 1 - \alpha$ and $U$ is a standard normal r.v. For $\alpha = 0.05$ we have $u_\alpha \approx 1.96$ and the 95% prediction interval boundaries are
$$
\hat{X}_{n+m}^{(n)} - 1.96\sqrt{\hat{P}_{n+m}^{(n)}}, \qquad \hat{X}_{n+m}^{(n)} + 1.96\sqrt{\hat{P}_{n+m}^{(n)}}.
$$
Here we have used the hat notation because we usually do not know the values of the model parameters and have to use their estimators. We will discuss the model parameter estimation in the next section.
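Equations (6.31)-(6.34) can be combined into one small routine. The sketch below is illustrative only: the function name `m_step_forecast`, the three data values and the AR(1) autocovariances (with $\phi = 0.5$, $\sigma^2 = 1$) are assumptions, and the ACVF is treated as known, whereas in practice it would be replaced by estimates as the hat notation indicates.

```python
import numpy as np

def m_step_forecast(x, acvf, m, u=1.96):
    """m-step-ahead BLP (6.31), its MSE (6.33) and the Gaussian PI (6.34).

    x holds the zero-mean observations X_1, ..., X_n in time order;
    acvf[h] must hold gamma(h) for h = 0, ..., n + m - 1.
    """
    x = np.asarray(x, dtype=float)
    acvf = np.asarray(acvf, dtype=float)
    n = len(x)
    idx = np.arange(n)
    Gamma = acvf[np.abs(idx[:, None] - idx[None, :])]   # Gamma_n
    gamma_m = acvf[m:m + n]                              # (gamma(m), ..., gamma(m+n-1))
    phi_m = np.linalg.solve(Gamma, gamma_m)              # solves (6.32)
    forecast = phi_m @ x[::-1]                           # phi_n1 multiplies X_n, phi_nn multiplies X_1
    mse = acvf[0] - gamma_m @ phi_m                      # (6.33)
    half_width = u * np.sqrt(mse)
    return forecast, mse, (forecast - half_width, forecast + half_width)

# Hypothetical example: AR(1) with phi = 0.5, sigma^2 = 1, three observations, m = 2
phi = 0.5
acvf_ar1 = phi ** np.arange(5) / (1 - phi ** 2)
x_obs = [0.3, -1.2, 0.8]
forecast, mse, interval = m_step_forecast(x_obs, acvf_ar1, m=2)
print(forecast, mse, interval)
```

For an AR(1) process the two-step forecast reduces to $\phi^2 X_n$ with mean square error $\sigma^2(1 + \phi^2)$, which gives a quick sanity check of the routine.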


6.3.3 Parameter Estimation


In this section we will discuss methods of parameter estimation for ARMA(p,q)
assuming that the orders p and q are known.
Method of Moments
In this method we equate the population moments with the sample moments to obtain a set of equations whose solution gives the required estimators. For example, the first population moment is $\mu_1 = E(X)$ and its sample counterpart is
$$
m_1 = \bar{X}.
$$
This immediately gives
$$
\hat{\mu} = \bar{X}.
$$
The method of moments gives good estimators for AR models but less efficient estimators for MA or ARMA processes. Hence we will present the method for AR time series. As usual we denote an AR(p) model by
$$
X_t = \phi_1 X_{t-1} + \ldots + \phi_p X_{t-p} + Z_t.
$$
This is a zero-mean model, but the estimation of the mean is straightforward and we will not discuss it further. Here we use the difference equations, where we replace the population autocovariance (central moment of order two) with the sample autocovariance. The first $p + 1$ difference equations are
$$
\begin{aligned}
\gamma(0) &= \phi_1 \gamma(1) + \ldots + \phi_p \gamma(p) + \sigma^2, \\
\gamma(\tau) &= \phi_1 \gamma(\tau - 1) + \ldots + \phi_p \gamma(\tau - p), \quad \tau = 1, 2, \ldots, p.
\end{aligned}
$$
Note that $q = 0$, so the sum on the right-hand side of (6.16) is zero.
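In matrix notation we can write
$$
\begin{aligned}
\sigma^2 &= \gamma(0) - \phi^T \gamma_p, \\
\Gamma_p \phi &= \gamma_p,
\end{aligned}
$$
where
$$
\Gamma_p = \{\gamma(i - j)\}_{i,j=1,\ldots,p}, \qquad
\phi = (\phi_1, \ldots, \phi_p)^T, \qquad
\gamma_p = (\gamma(1), \ldots, \gamma(p))^T.
$$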
Replacing $\gamma(\tau)$ by the sample ACVF
$$
\hat{\gamma}(\tau) = \frac{1}{n} \sum_{t=1}^{n-\tau} (X_{t+\tau} - \bar{X})(X_t - \bar{X})
$$

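we obtain the solution
$$
\begin{aligned}
\hat{\sigma}^2 &= \hat{\gamma}(0) - \hat{\gamma}_p^T \hat{\Gamma}_p^{-1} \hat{\gamma}_p, \\
\hat{\phi} &= \hat{\Gamma}_p^{-1} \hat{\gamma}_p.
\end{aligned}
\tag{6.35}
$$
These are called the Yule-Walker estimators. They are often expressed in terms of the autocorrelation function rather than the autocovariance function. Then we have
$$
\begin{aligned}
\hat{\sigma}^2 &= \hat{\gamma}(0)\left[1 - \hat{\rho}_p^T \hat{R}_p^{-1} \hat{\rho}_p\right], \\
\hat{\phi} &= \hat{R}_p^{-1} \hat{\rho}_p,
\end{aligned}
\tag{6.36}
$$
where
$$
\hat{R}_p = \{\hat{\rho}(i - j)\}_{i,j=1,2,\ldots,p}
$$
is the matrix of the sample autocorrelations and
$$
\hat{\rho}_p = (\hat{\rho}(1), \ldots, \hat{\rho}(p))^T
$$
is the vector of sample autocorrelations.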
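The Yule-Walker estimators (6.36) can be computed directly from a data vector. The following sketch is an illustration rather than the notes' own code: the function names `sample_acvf` and `yule_walker`, the random seed, the sample size and the simulated AR(2) model (with $\phi_1 = 0.7$, $\phi_2 = -0.1$) are all assumptions.

```python
import numpy as np

def sample_acvf(x, max_lag):
    """Sample ACVF: (1/n) * sum_{t=1}^{n-h} (X_{t+h} - mean)(X_t - mean), h = 0..max_lag."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    return np.array([np.dot(xc[h:], xc[:n - h]) / n for h in range(max_lag + 1)])

def yule_walker(x, p):
    """Yule-Walker estimates (phi_hat, sigma2_hat) for an AR(p) model, as in (6.36)."""
    g = sample_acvf(x, p)
    rho = g / g[0]                                   # sample autocorrelations, rho[0] = 1
    idx = np.arange(p)
    R = rho[np.abs(idx[:, None] - idx[None, :])]     # R_p = {rho_hat(i - j)}
    rho_p = rho[1:]                                  # (rho_hat(1), ..., rho_hat(p))
    phi_hat = np.linalg.solve(R, rho_p)
    sigma2_hat = g[0] * (1.0 - rho_p @ phi_hat)
    return phi_hat, sigma2_hat

# Hypothetical simulation: AR(2) with phi1 = 0.7, phi2 = -0.1 and unit-variance Gaussian noise
rng = np.random.default_rng(0)
n, burn = 5000, 200
z = rng.standard_normal(n + burn)
x = np.zeros(n + burn)
for t in range(2, n + burn):
    x[t] = 0.7 * x[t - 1] - 0.1 * x[t - 2] + z[t]
x = x[burn:]                                         # drop the burn-in so the start-up effect is negligible
phi_hat, sigma2_hat = yule_walker(x, p=2)
print(phi_hat, sigma2_hat)
```

With a long simulated series the estimates should land close to the values used in the simulation; the exact numbers depend on the seed.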

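Proposition 6.3. The distribution of the Yule-Walker estimators $\hat{\phi}$ of the model parameters of a causal AR(p) process
$$
X_t = \phi_1 X_{t-1} + \ldots + \phi_p X_{t-p} + Z_t
$$
is asymptotically (as $n \to \infty$) normal, in the sense that
$$
\sqrt{n}(\hat{\phi} - \phi) \xrightarrow{d} N(0, \sigma^2 \Gamma_p^{-1}),
$$
and
$$
\hat{\sigma}^2 \xrightarrow{p} \sigma^2.
$$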

Remark 6.12. Note that the matrix equation (6.23) is of the same form as (6.36). Hence, we can use the Durbin-Levinson algorithm to calculate the estimates. This will give us the values of the sample PACF as well as the estimates of $\phi$.
Proposition 6.4. The distribution of the sample PACF of a causal AR(p) process is asymptotically normal, that is
$$
\sqrt{n}\, \hat{\phi}_{\tau\tau} \xrightarrow{d} N(0, 1) \quad \text{for } \tau > p.
$$

Example 6.8. Consider an AR(2) zero-mean causal process
$$
X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + Z_t.
$$


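Then the Yule-Walker estimators are
$$
\begin{aligned}
\hat{\sigma}^2 &= \hat{\gamma}(0)\left[1 - \hat{\rho}_2^T \hat{R}_2^{-1} \hat{\rho}_2\right], \\
\hat{\phi} &= \hat{R}_2^{-1} \hat{\rho}_2,
\end{aligned}
$$
where
$$
\hat{R}_2 = \begin{pmatrix} \hat{\rho}(0) & \hat{\rho}(1) \\ \hat{\rho}(1) & \hat{\rho}(0) \end{pmatrix}
$$
and
$$
\hat{\rho}_2 = (\hat{\rho}(1), \hat{\rho}(2))^T, \qquad \hat{\phi} = (\hat{\phi}_1, \hat{\phi}_2)^T.
$$
We can easily invert a $2 \times 2$ matrix and calculate the estimators, or we can use the Durbin-Levinson algorithm directly to obtain
$$
\begin{aligned}
\hat{\phi}_{11} &= \hat{\rho}(1) = \frac{\hat{\phi}_1}{1 - \hat{\phi}_2}, \\
\hat{\phi}_{22} &= \frac{\hat{\rho}(2) - \hat{\rho}^2(1)}{1 - \hat{\rho}^2(1)} = \hat{\phi}_2, \\
\hat{\phi}_{21} &= \hat{\rho}(1)[1 - \hat{\phi}_{22}] = \hat{\phi}_1.
\end{aligned}
$$
Also, we get
$$
\hat{\sigma}^2 = \hat{\gamma}(0)\left[1 - (\hat{\rho}(1), \hat{\rho}(2)) \begin{pmatrix} \hat{\phi}_1 \\ \hat{\phi}_2 \end{pmatrix}\right]
= \hat{\gamma}(0)\left[1 - (\hat{\rho}(1)\hat{\phi}_1 + \hat{\rho}(2)\hat{\phi}_2)\right].
$$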

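Furthermore, from Proposition 6.3 we can derive the confidence interval for $\phi_i$. The proposition says that
$$
\sqrt{n}(\hat{\phi} - \phi) \xrightarrow{d} N(0, \sigma^2 \Gamma_p^{-1}),
$$
that is, the asymptotic variance of $\sqrt{n}(\hat{\phi}_i - \phi_i)$ is the i-th diagonal element of the matrix $\sigma^2 \Gamma_p^{-1}$, say $v_{ii}$. But
$$
\mathrm{var}[\sqrt{n}(\hat{\phi}_i - \phi_i)] = n\, \mathrm{var}(\hat{\phi}_i - \phi_i) = n\, \mathrm{var}(\hat{\phi}_i).
$$
Hence,
$$
\mathrm{var}(\hat{\phi}_i) = \frac{1}{n} v_{ii}
$$
and the confidence interval is
$$
\left[\hat{\phi}_i - u_\alpha \sqrt{\frac{1}{n} v_{ii}},\; \hat{\phi}_i + u_\alpha \sqrt{\frac{1}{n} v_{ii}}\right].
$$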

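Also, from Proposition 6.4 we have
$$
\sqrt{n}\, \hat{\phi}_{\tau\tau} \xrightarrow{d} N(0, 1) \quad \text{for } \tau > p,
$$
that is
$$
\mathrm{var}(\sqrt{n}\, \hat{\phi}_{\tau\tau}) \to 1 \quad \text{for } \tau > p.
$$
This gives the asymptotic result
$$
\mathrm{var}(\hat{\phi}_{\tau\tau}) \approx \frac{1}{n}.
$$
However, we know that the PACF for $\tau > p$ is zero. It means that with probability $1 - \alpha$ we have
$$
-u_\alpha < \frac{\hat{\phi}_{\tau\tau} - 0}{\sqrt{1/n}} < u_\alpha.
$$
In other words, the estimate of the PACF indicates a non-significant value of $\phi_{\tau\tau}$ if it lies in the interval
$$
[-u_\alpha/\sqrt{n},\; u_\alpha/\sqrt{n}].
$$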
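We will do the calculations for the simulated AR(2) process given in Figure 6.3. For these data we have the following values of the sample variance $\hat{\gamma}(0)$ and the sample autocorrelations $\hat{\rho}(1)$ and $\hat{\rho}(2)$:
$$
\hat{\gamma}(0) = 1.947669, \qquad \hat{\rho}(1) = 0.66018, \qquad \hat{\rho}(2) = 0.33751.
$$
Then, the matrix $\hat{R}_2$ is equal to
$$
\hat{R}_2 = \begin{pmatrix} 1 & 0.66018 \\ 0.66018 & 1 \end{pmatrix}
$$
and its inverse is
$$
\hat{R}_2^{-1} = \begin{pmatrix} 1.77254 & -1.17020 \\ -1.17020 & 1.77254 \end{pmatrix}.
$$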

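Hence, we obtain the following Yule-Walker estimates of the model parameters
$$
\begin{pmatrix} \hat{\phi}_1 \\ \hat{\phi}_2 \end{pmatrix}
= \begin{pmatrix} 1.77254 & -1.17020 \\ -1.17020 & 1.77254 \end{pmatrix}
\begin{pmatrix} 0.66018 \\ 0.33751 \end{pmatrix}
= \begin{pmatrix} 0.775243 \\ -0.174290 \end{pmatrix}.
$$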


The estimate of the white noise variance is
$$
\hat{\sigma}^2 = 1.947669\,[1 - (0.66018 \cdot 0.775243 + 0.33751 \cdot (-0.174290))] = 1.06542.
$$

The series was simulated for $\phi_1 = 0.7$ and $\phi_2 = -0.1$ with Gaussian white noise of zero mean and variance equal to 1. The estimates are not far from the true values. Had we not known the true values, we would have wanted to calculate confidence intervals for them. There are 200 observations, i.e. $n = 200$, which is large enough to use the asymptotic result given in Proposition 6.3. To calculate $v_{ii}$ note that
$$
\Gamma = \gamma(0) R,
$$
which gives
$$
\Gamma^{-1} = \frac{1}{\gamma(0)} R^{-1}.
$$
Hence
$$
\hat{\sigma}^2 \hat{\Gamma}^{-1} = \frac{\hat{\sigma}^2}{\hat{\gamma}(0)} \hat{R}^{-1}
= \frac{1.06542}{1.947669} \begin{pmatrix} 1.77254 & -1.17020 \\ -1.17020 & 1.77254 \end{pmatrix}
= \begin{pmatrix} 0.969623 & -0.640129 \\ -0.640129 & 0.969623 \end{pmatrix}
$$

and we obtain the estimate of the variance of the parameter estimators
$$
\widehat{\mathrm{var}}(\hat{\phi}_i) = \frac{1}{n} v_{ii} = \frac{1}{200} \cdot 0.969623 = 0.0048481.
$$
The 95% approximate confidence intervals for the model parameters $\phi_1$ and $\phi_2$ are, respectively,
$$
[0.775243 - 1.96\sqrt{0.0048481},\; 0.775243 + 1.96\sqrt{0.0048481}] = [0.638771,\; 0.911714]
$$
and
$$
[-0.17429 - 1.96\sqrt{0.0048481},\; -0.17429 + 1.96\sqrt{0.0048481}] = [-0.310761,\; -0.037818].
$$
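The arithmetic of this example can be retraced in a few lines, which is a convenient way to check each intermediate quantity. The sketch below only uses the three sample values quoted in the example ($\hat{\gamma}(0)$, $\hat{\rho}(1)$, $\hat{\rho}(2)$) and $n = 200$; the variable names are illustrative.

```python
import numpy as np

# Sample quantities quoted in the example
gamma0_hat = 1.947669
rho_hat = np.array([0.66018, 0.33751])
n = 200

# Yule-Walker estimates, equation (6.36) with p = 2
R_hat = np.array([[1.0, rho_hat[0]],
                  [rho_hat[0], 1.0]])
R_inv = np.linalg.inv(R_hat)
phi_hat = R_inv @ rho_hat
sigma2_hat = gamma0_hat * (1.0 - rho_hat @ phi_hat)

# Estimated asymptotic covariance sigma^2 Gamma^{-1} = (sigma^2 / gamma(0)) R^{-1}
V_hat = sigma2_hat / gamma0_hat * R_inv
se = np.sqrt(np.diag(V_hat) / n)          # standard errors sqrt(v_ii / n)

# Approximate 95% confidence intervals, one row per parameter
ci = np.column_stack([phi_hat - 1.96 * se, phi_hat + 1.96 * se])
print(phi_hat, sigma2_hat)
print(V_hat)
print(ci)
```

Both intervals contain the values $\phi_1 = 0.7$ and $\phi_2 = -0.1$ used to simulate the series.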

Maximum Likelihood Estimation

The method of Maximum Likelihood Estimation applies to any ARMA(p,q) model
$$
X_t - \phi_1 X_{t-1} - \ldots - \phi_p X_{t-p} = Z_t + \theta_1 Z_{t-1} + \ldots + \theta_q Z_{t-q}.
$$
This method requires an assumption on the distribution of the random vector $X = (X_1, \ldots, X_n)^T$. The usual assumption is that the process is Gaussian. Let us denote the p.d.f. of $X$ by
$$
f_X(X_1, \ldots, X_n; \beta, \sigma^2),
$$
where
$$
\beta = (\phi_1, \ldots, \phi_p, \theta_1, \ldots, \theta_q)^T.
$$
Given the values of $X$, the p.d.f. becomes a function of the parameters. It is then denoted by
$$
L(\beta, \sigma^2 | x_1, \ldots, x_n)
$$
and for the Gaussian process it is
$$
L(\beta, \sigma^2 | x_1, \ldots, x_n) = \frac{1}{\sqrt{(2\pi)^n \det(\Gamma_n)}} \exp\left\{-\frac{1}{2} X^T \Gamma_n^{-1} X\right\}.
$$
A more convenient form is obtained by taking the natural logarithm. Then
$$
\begin{aligned}
l(\beta, \sigma^2 | x_1, \ldots, x_n) &= \ln L(\beta, \sigma^2 | x_1, \ldots, x_n) \\
&= -\frac{n}{2} \ln(2\pi) - \frac{1}{2} \ln \det(\Gamma_n) - \frac{1}{2} X^T \Gamma_n^{-1} X.
\end{aligned}
$$
The Maximum Likelihood Estimates are the values of $\beta$ and $\sigma^2$ which maximize the function $l(\beta, \sigma^2 | x_1, \ldots, x_n)$. Intuitively, the MLE is the parameter value for which the observed sample is most likely.
The estimates are usually found numerically using iterative optimization routines. We will not discuss them here.
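To make the log-likelihood concrete, the sketch below evaluates it for a Gaussian AR(1) model and locates its maximum over a coarse grid of $\phi$ values. This is only an illustration under stated assumptions: a plain grid search stands in for the iterative optimization routines, the noise variance is treated as known ($\sigma^2 = 1$), and the function names, seed, sample size and true value $\phi = 0.6$ are all hypothetical choices.

```python
import numpy as np

def gaussian_loglik(x, acvf):
    """l = -(n/2) ln(2 pi) - (1/2) ln det(Gamma_n) - (1/2) x^T Gamma_n^{-1} x.

    acvf[h] must hold the model autocovariance gamma(h) for h = 0, ..., n - 1.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    idx = np.arange(n)
    Gamma = np.asarray(acvf, dtype=float)[np.abs(idx[:, None] - idx[None, :])]
    _, logdet = np.linalg.slogdet(Gamma)      # numerically stable log-determinant
    quad = x @ np.linalg.solve(Gamma, x)
    return -0.5 * n * np.log(2 * np.pi) - 0.5 * logdet - 0.5 * quad

def ar1_acvf(phi, sigma2, n):
    """ACVF of a causal AR(1): gamma(h) = sigma^2 phi^h / (1 - phi^2), h = 0..n-1."""
    return sigma2 * phi ** np.arange(n) / (1 - phi ** 2)

# Hypothetical data: AR(1) with phi = 0.6 and unit-variance Gaussian noise
rng = np.random.default_rng(1)
true_phi, n, burn = 0.6, 300, 200
z = rng.standard_normal(n + burn)
x = np.zeros(n + burn)
for t in range(1, n + burn):
    x[t] = true_phi * x[t - 1] + z[t]
x = x[burn:]

# Grid search over phi (sigma^2 fixed at 1 for simplicity)
grid = np.linspace(-0.9, 0.9, 37)
loglik = np.array([gaussian_loglik(x, ar1_acvf(p, 1.0, n)) for p in grid])
phi_mle = grid[np.argmax(loglik)]
print(phi_mle)
```

In practice one would maximize over all parameters jointly, including $\sigma^2$, with a proper optimizer; the grid is used here only to keep the example free of additional dependencies.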
The MLEs are asymptotically normally distributed, as stated in the following proposition.
Proposition 6.5. The distribution of the MLE $\hat{\beta}$ of a causal and invertible ARMA(p,q) process is asymptotically normal in the sense that
$$
\sqrt{n}(\hat{\beta} - \beta) \xrightarrow{d} N(0, \sigma^2 \Gamma_{p+q}^{-1}),
\tag{6.37}
$$
where the $(p + q) \times (p + q)$-dimensional matrix $\Gamma_{p+q}$ depends on the model parameters.


Some Specific Asymptotic Distributions

AR(1): $X_t - \phi X_{t-1} = Z_t$
$$
\hat{\phi} \sim AN\left(\phi,\; \frac{1}{n}(1 - \phi^2)\right)
$$

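AR(2): $X_t - \phi_1 X_{t-1} - \phi_2 X_{t-2} = Z_t$
$$
\begin{pmatrix} \hat{\phi}_1 \\ \hat{\phi}_2 \end{pmatrix}
\sim AN\left(\begin{pmatrix} \phi_1 \\ \phi_2 \end{pmatrix},\;
\frac{1}{n} \begin{pmatrix} 1 - \phi_2^2 & -\phi_1(1 + \phi_2) \\ -\phi_1(1 + \phi_2) & 1 - \phi_2^2 \end{pmatrix}\right)
$$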

MA(1): $X_t = Z_t + \theta Z_{t-1}$
$$
\hat{\theta} \sim AN\left(\theta,\; \frac{1}{n}(1 - \theta^2)\right)
$$

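MA(2): $X_t = Z_t + \theta_1 Z_{t-1} + \theta_2 Z_{t-2}$
$$
\begin{pmatrix} \hat{\theta}_1 \\ \hat{\theta}_2 \end{pmatrix}
\sim AN\left(\begin{pmatrix} \theta_1 \\ \theta_2 \end{pmatrix},\;
\frac{1}{n} \begin{pmatrix} 1 - \theta_2^2 & \theta_1(1 - \theta_2) \\ \theta_1(1 - \theta_2) & 1 - \theta_2^2 \end{pmatrix}\right)
$$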

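ARMA(1,1): $X_t - \phi X_{t-1} = Z_t + \theta Z_{t-1}$
$$
\begin{pmatrix} \hat{\phi} \\ \hat{\theta} \end{pmatrix}
\sim AN\left(\begin{pmatrix} \phi \\ \theta \end{pmatrix},\;
\frac{1 + \phi\theta}{n(\phi + \theta)^2}
\begin{pmatrix} (1 - \phi^2)(1 + \phi\theta) & -(1 - \theta^2)(1 - \phi^2) \\ -(1 - \theta^2)(1 - \phi^2) & (1 - \theta^2)(1 + \phi\theta) \end{pmatrix}\right)
$$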

Using these results we can construct approximate confidence intervals for the
model parameters as in the method of moments.
