
Massachusetts Institute of Technology Department of Economics

Time Series 14.384

Guido Kuersteiner

Lecture 5: Estimation and Specification of ARMA Models

We first consider estimators for AR(p) models. Assume that $x_t$ is generated by
$$x_t = \phi_1 x_{t-1} + \dots + \phi_p x_{t-p} + \varepsilon_t$$
where $\varepsilon_t$ is a martingale difference sequence (or white noise plus mixing plus moment restrictions), i.e.
$$E[\varepsilon_t \mid \mathcal{M}_{t-1}] = 0,$$
where $\mathcal{M}_{t-1}$ contains all measurable functions of $\{x_s,\ s \le t-1\}$. This assumption is stronger than the WN assumption previously made. Stack $z_t = (x_{t-1}, \dots, x_{t-p})'$; then the OLS estimator for $\phi = (\phi_1, \dots, \phi_p)'$ is given by
$$\hat\phi = \left(\sum_{t=p+1}^{T} z_t z_t'\right)^{-1} \sum_{t=p+1}^{T} z_t x_t = \phi + \left(\sum_{t=p+1}^{T} z_t z_t'\right)^{-1} \sum_{t=p+1}^{T} z_t \varepsilon_t.$$
Then, assuming that a WLLN holds, it follows that
$$\frac{1}{T}\sum_t z_t z_t' \xrightarrow{p} \Gamma \qquad\text{where}\qquad \Gamma = \begin{pmatrix} \gamma_x(0) & \cdots & \gamma_x(p-1) \\ \vdots & \ddots & \vdots \\ \gamma_x(p-1) & \cdots & \gamma_x(0) \end{pmatrix}.$$
The WLLN holds, for example, if $z_t$ is strictly stationary, $E z_t z_t' = \Gamma$, and an additional technical condition, called ergodicity, holds. We turn to the asymptotic distribution next. First note that $z_t \varepsilon_t$ is a martingale difference sequence as well. Then, if $\sup_t E\|z_t\varepsilon_t\|^{2+\delta} < \infty$ and $\frac{1}{T}\sum_t (z_t z_t'\varepsilon_t^2 - E z_t z_t'\varepsilon_t^2) \xrightarrow{p} 0$, we can apply a martingale difference CLT to show that
$$\frac{1}{\sqrt{T}}\sum_{t=p+1}^{T} z_t\varepsilon_t \xrightarrow{d} N\!\left(0,\ \lim_{T\to\infty}\frac{1}{T}\sum_{t=1}^{T} E(z_t z_t'\varepsilon_t^2)\right).$$
If in addition $E[\varepsilon_t^2 \mid \mathcal{M}_{t-1}] = \sigma^2$ then $E z_t z_t'\varepsilon_t^2 = \sigma^2\Gamma$. Therefore the asymptotic distribution for $\hat\phi$ is given by
$$\sqrt{T}\,(\hat\phi - \phi) \xrightarrow{d} N\!\left(0,\ \sigma^2\Gamma^{-1}\right).$$
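As a concrete illustration of this estimator, the following minimal Python sketch (not part of the original notes; the helper name fit_ar_ols, the simulated data and the AR(2) parameter values are illustrative) computes $\hat\phi$ together with the standard errors implied by $\hat\sigma^2\hat\Gamma^{-1}$.

```python
import numpy as np

def fit_ar_ols(x, p):
    """OLS estimation of an AR(p): coefficient estimates, standard errors and
    residual variance, using the asymptotic covariance sigma^2 * Gamma^{-1}."""
    T = len(x)
    # Rows are z_t' = (x_{t-1}, ..., x_{t-p}) for t = p+1, ..., T
    Z = np.column_stack([x[p - 1 - j:T - 1 - j] for j in range(p)])
    y = x[p:]
    n = len(y)
    phi_hat = np.linalg.solve(Z.T @ Z, Z.T @ y)
    resid = y - Z @ phi_hat
    sigma2_hat = resid @ resid / n
    gamma_hat = Z.T @ Z / n                       # sample analogue of Gamma
    avar = sigma2_hat * np.linalg.inv(gamma_hat)  # asy. var of sqrt(T)(phi_hat - phi)
    se = np.sqrt(np.diag(avar) / n)
    return phi_hat, se, sigma2_hat

# Illustration on a simulated AR(2) with phi = (0.5, 0.3)
rng = np.random.default_rng(0)
T = 500
x = np.zeros(T)
for t in range(2, T):
    x[t] = 0.5 * x[t - 1] + 0.3 * x[t - 2] + rng.standard_normal()
print(fit_ar_ols(x, 2))
```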

5.1. ML-Estimation

The maximum likelihood estimator is defined as the value maximizing
$$f(x_1, \dots, x_T; \theta)$$
over $\theta$, where $f(\cdot\,; \theta)$ is the joint density of $\{x_1, \dots, x_T\}$. If $X_T = (x_1, \dots, x_T)'$ is a Gaussian time series then the likelihood function is given by
$$f(x_1, \dots, x_T; \theta) = \frac{1}{(2\pi)^{T/2}\det(\Gamma_T(\theta))^{1/2}} \exp\!\left(-\tfrac{1}{2} X_T'\Gamma_T(\theta)^{-1}X_T\right) \qquad (5.1)$$
where $\Gamma_T(\theta) = E X_T X_T'$ is the $T \times T$ covariance matrix of $X_T$. The covariance matrix is a nonlinear function of the underlying parameters. Maximizing (5.1) directly is therefore a highly nonlinear optimization problem. The problem can be simplified by considering the conditional densities of $x_t$. We can then write the joint density as a product of conditional densities
$$f(x_1, \dots, x_T; \theta) = f(x_1)\, f(x_2 \mid x_1)\, f(x_3 \mid x_2, x_1)\cdots f(x_T \mid x_1, \dots, x_{T-1}).$$
If $x_t$ is a Gaussian process the conditional densities are all normal with conditional mean $\hat x_t = P_{\mathcal{M}_{t-1}} x_t$ and conditional variance $\sigma_t^2 = E(x_t - P_{\mathcal{M}_{t-1}} x_t)^2$. We have seen in Lecture 5 how these expressions can be computed recursively. It therefore follows that the exact likelihood, assuming Gaussianity, can be computed in a recursive way for each set of parameter values $\theta$. In particular we can avoid numerical inversion of the $T \times T$ matrix $\Gamma_T(\theta)$. In particular cases the situation simplifies even further. If we specify for example that $\varepsilon_t \sim N(0, \sigma^2)$ and $x_t = \phi_1 x_{t-1} + \varepsilon_t$ then
$$f(x_t \mid x_{t-1}) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{1}{2\sigma^2}(x_t - \phi x_{t-1})^2\right)$$
such that
$$f(x_1, \dots, x_T; \theta) = \frac{1}{(2\pi)^{T/2}\sigma^{T-1}}\exp\!\left(-\frac{1}{2\sigma^2}\sum_{t=2}^{T}(x_t - \phi x_{t-1})^2\right)\frac{\sqrt{1-\phi^2}}{\sigma}\exp\!\left(-\frac{1}{2}\frac{x_1^2}{\sigma^2(1-\phi^2)^{-1}}\right).$$
Taking the log of the likelihood function we have
$$\log f(x_1, \dots, x_T; \theta) = -T\log\sigma - \frac{1}{2\sigma^2}\sum_{t=2}^{T}(x_t - \phi x_{t-1})^2 - \frac{1}{2}\frac{x_1^2}{\sigma^2(1-\phi^2)^{-1}}. \qquad (5.2)$$
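For a given pair $(\phi, \sigma^2)$ the exact log-likelihood can be evaluated directly. The sketch below (not part of the original notes; the function name is illustrative and the normalizing constants dropped above are kept) implements (5.2) for an AR(1).

```python
import numpy as np

def ar1_exact_loglik(phi, sigma2, x):
    """Exact Gaussian log-likelihood of an AR(1), keeping the normalizing
    constants and the stationary density of the first observation."""
    T = len(x)
    resid = x[1:] - phi * x[:-1]
    ll = -0.5 * T * np.log(2 * np.pi * sigma2)
    ll += 0.5 * np.log(1 - phi ** 2)                    # from the density of x_1
    ll -= 0.5 * np.sum(resid ** 2) / sigma2             # conditional terms, t = 2, ..., T
    ll -= 0.5 * (1 - phi ** 2) * x[0] ** 2 / sigma2     # stationary term for x_1
    return ll

# Evaluating ll on a grid of phi values shows the maximizer is close to the OLS estimate.
```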

If we ignore the last term and maximize $\log f(x_1, \dots, x_T; \theta)$ with respect to $\phi$, we see that the ML estimator is asymptotically equivalent to OLS. This result was derived under the assumption that $\varepsilon_t$ is Gaussian. If Gaussianity does not hold we can still use (5.2) as the criterion function. In this case the estimator is called a Quasi Maximum Likelihood estimator. It can be shown that under certain conditions, including that $E[\varepsilon_t^2 \mid \mathcal{M}_{t-1}] = \sigma^2$, the resulting estimator has the same asymptotic distribution as if the errors were indeed normal. In other cases it is useful to approximate the exact innovations updating formulas. In particular, we have seen that the ARMA(1,1) case can be handled by looking at the limiting behavior of the projection coefficients. For the ARMA(1,1) case parametrized by the polynomials $\phi(L) = (1 - \phi L)$ and $\theta(L) = (1 - \theta L)$ we therefore use the following approximate formulation for the likelihood function
$$f(x_t \mid x_{t-1}, \dots) \approx \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{1}{2\sigma^2}(x_t - \phi x_{t-1} + \theta\varepsilon_{t-1})^2\right);$$
the full likelihood is obtained by setting $\varepsilon_0 = 0$. We have now
$$f(x_1, \dots, x_T; \theta) \approx \frac{1}{(2\pi)^{T/2}\sigma^{T-1}}\exp\!\left(-\frac{1}{2\sigma^2}\sum_{t=2}^{T}(x_t - \phi x_{t-1} + \theta\varepsilon_{t-1})^2\right)\frac{1}{\sigma\sqrt{c}}\exp\!\left(-\frac{x_1^2}{2\sigma^2 c}\right)$$
where $c = (1 + \theta^2 - 2\phi\theta)/(1 - \phi^2)$. Maximizing $f(x_1, \dots, x_T; \theta)$ with respect to $\phi$ and $\theta$ is equivalent to minimizing the sum
$$S_T(\phi, \theta) = \sum_{t=2}^{T}(x_t - \phi x_{t-1} + \theta\varepsilon_{t-1})^2 + \frac{x_1^2}{c}.$$
The last term is typically left away since it has no effect asymptotically. The sum $S_T(\phi, \theta)$ can be evaluated for all values $\phi$ and $\theta$ by computing the residuals $\hat\varepsilon_t$ recursively, i.e.
$$\hat\varepsilon_1 = x_1, \qquad \hat\varepsilon_2 = x_2 - \phi x_1 + \theta x_1 = x_2 + (\theta - \phi)x_1, \qquad \hat\varepsilon_t = x_t - \phi x_{t-1} + \theta\hat\varepsilon_{t-1}.$$
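The recursion translates directly into code. A minimal sketch (not from the notes; the function name is illustrative, and the commented grid search assumes an observed series x):

```python
import numpy as np

def arma11_sum_of_squares(phi, theta, x):
    """S_T(phi, theta) via eps_1 = x_1, eps_t = x_t - phi*x_{t-1} + theta*eps_{t-1},
    dropping the asymptotically negligible x_1^2 / c term."""
    eps = np.empty(len(x))
    eps[0] = x[0]
    for t in range(1, len(x)):
        eps[t] = x[t] - phi * x[t - 1] + theta * eps[t - 1]
    return np.sum(eps[1:] ** 2)

# Crude grid search over (phi, theta) for an observed series x:
# grid = np.linspace(-0.9, 0.9, 37)
# phi_hat, theta_hat = min(((p, t) for p in grid for t in grid),
#                          key=lambda pt: arma11_sum_of_squares(*pt, x))
```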

We can therefore use numerical algorithms to evaluate $S_T(\phi, \theta)$ at different values of $\phi, \theta$. More generally, the ML estimator for the ARMA(p,q) class of models parametrized by $\phi(L) = (1 - \phi_1 L - \dots - \phi_p L^p)$ and $\theta(L) = (1 - \theta_1 L - \dots - \theta_q L^q)$ can be written as
$$f(x_1, \dots, x_T; \theta) \approx \frac{1}{(2\pi)^{T/2}\sigma^{T}}\exp\!\left(-\frac{1}{2\sigma^2}\sum_{t=m+1}^{T}\hat\varepsilon_t^2\right)\exp\!\left(-\tfrac{1}{2}X_m'\Gamma_m^{-1}X_m\right)$$
where $m = \max(p, q)$, $X_m = (x_1, \dots, x_m)'$ and $\Gamma_m = E X_m X_m'$. The errors can be approximated again by
$$\hat\varepsilon_1 = x_1, \qquad \hat\varepsilon_2 = x_2 - \phi_1 x_1 + \theta_1\hat\varepsilon_1, \qquad \dots, \qquad \hat\varepsilon_t = x_t - \phi_1 x_{t-1} - \dots - \phi_p x_{t-p} + \theta_1\hat\varepsilon_{t-1} + \dots + \theta_q\hat\varepsilon_{t-q}.$$
A further approximation step then uses
$$f(x_1, \dots, x_T; \theta) \approx \frac{1}{(2\pi)^{T/2}\sigma^{T}}\exp\!\left(-\frac{1}{2\sigma^2}\sum_{t=m+1}^{T}\hat\varepsilon_t^2\right)$$
to estimate the parameters.
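In practice this last step amounts to minimizing the conditional sum of squares $\sum_{t>m}\hat\varepsilon_t^2$ over the ARMA coefficients. A sketch using a generic numerical optimizer (the routine arma_css and the use of scipy's Nelder-Mead method are illustrative choices, not prescribed by the notes):

```python
import numpy as np
from scipy.optimize import minimize

def arma_css(params, x, p, q):
    """Conditional sum of squares for an ARMA(p,q): residuals are built recursively,
    with presample values of x and eps treated as zero, and summed for t > m."""
    phi, theta = params[:p], params[p:p + q]
    m = max(p, q)
    eps = np.zeros(len(x))
    for t in range(len(x)):
        ar = sum(phi[i] * x[t - 1 - i] for i in range(p) if t - 1 - i >= 0)
        ma = sum(theta[j] * eps[t - 1 - j] for j in range(q) if t - 1 - j >= 0)
        eps[t] = x[t] - ar + ma
    return np.sum(eps[m:] ** 2)

# res = minimize(arma_css, x0=np.zeros(p + q), args=(x, p, q), method="Nelder-Mead")
# phi_hat, theta_hat = res.x[:p], res.x[p:]
```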

5.2. Asymptotic Distribution of ML-estimators

It can be shown that estimators minimizing the criterion function $S_T(\phi, \theta)$ are consistent and asymptotically normal. More generally, let $Q_T(\theta) = -T^{-1}\log f(x_1, \dots, x_T; \theta)$. Then consistency follows if for some set $C$,
$$\sup_{\theta \in C}|Q_T(\theta) - Q(\theta)| \to 0 \qquad (5.3)$$
in probability, where $Q(\theta)$ is a nonstochastic function. Moreover, we also need that for any $\varepsilon > 0$ and a neighborhood $N(\theta_0, \varepsilon)$
$$\sup_{\theta \in C\setminus N(\theta_0, \varepsilon)}\{Q(\theta_0) - Q(\theta)\} < 0. \qquad (5.4)$$
Note that $\theta_0$ is identified once $\theta_0 \in C$. In words this condition means that the AR and MA polynomials should have no common zeros, should both have roots outside the unit circle, and should be of order p and q respectively in a nontrivial way. In particular this means that the coefficient on the highest order lag in both polynomials should be nonzero. We give an example of an MA model that has the same autocovariance function for two different parameter values of $\theta$. You should check that only one of the models is contained in $C$.

Example 5.1. The MA(1) models $x_t = \varepsilon_t + \theta\varepsilon_{t-1}$ and $x_t = \varepsilon_t + \theta^{-1}\varepsilon_{t-1}$ are observationally equivalent in the sense that they imply the same autocovariance function (for the second model the innovation variance has to be rescaled to $\theta^2\sigma^2$).
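A quick numerical check of Example 5.1 (not in the notes; the values $\theta = 0.5$, $\sigma^2 = 1$ are arbitrary): the autocovariances at lags 0 and 1 coincide once the innovation variance of the second parametrization is rescaled.

```python
def ma1_autocov(theta, sigma2):
    """MA(1) autocovariances: gamma(0) = sigma2*(1 + theta^2), gamma(1) = sigma2*theta."""
    return sigma2 * (1 + theta ** 2), sigma2 * theta

print(ma1_autocov(0.5, 1.0))     # (1.25, 0.5)
print(ma1_autocov(2.0, 0.25))    # (1.25, 0.5): theta -> 1/theta with rescaled variance
```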

If conditions (5.3) and (5.4) are satisfied and if
$$\hat\theta_T = \arg\min_{\theta \in C} Q_T(\theta) + o_p(1),$$
then it follows that $\hat\theta_T \to \theta_0$ in probability. For the case of an ARMA model we have $\theta = (\phi', \vartheta')'$ where $\phi = (\phi_1, \dots, \phi_p)'$ and $\vartheta = (\theta_1, \dots, \theta_q)'$. It can be shown that for the ARMA class the set $C$ that satisfies (5.4) when $Q_T(\theta)$ is the Gaussian likelihood is
$$C = \left\{\theta \in \mathbb{R}^{p+q} \mid \phi(z)\theta(z) \neq 0 \text{ for } |z| \le 1,\ \phi_p \neq 0,\ \theta_q \neq 0,\ \phi(z) \text{ and } \theta(z) \text{ have no common zeros}\right\}.$$
Conditions (5.3) and (5.4) can be shown to hold for the ARMA model with the Gaussian criterion function. The proofs are somewhat complicated because $C$ is not a compact set. We will omit them here. A consistency result is usually the first step in deriving the asymptotic distribution of an estimator. A second step consists in showing that $\hat\theta_T$ is contained in a $1/\sqrt{T}$ neighborhood of the true parameter with high probability. A Taylor expansion in the neighborhood of the true parameter $\theta_0$ is then used to obtain the asymptotic distribution of the estimator. It can be shown that
$$\sqrt{T}\,(\hat\theta_T - \theta_0) \xrightarrow{d} N(0, V(\theta)), \qquad V(\theta) = \sigma^2\left[E\begin{pmatrix} U_t U_t' & U_t V_t' \\ V_t U_t' & V_t V_t' \end{pmatrix}\right]^{-1}$$

with $u_t = \phi(L)^{-1}\varepsilon_t$ and $v_t = \theta(L)^{-1}\varepsilon_t$. The limiting covariance matrix $V(\theta)$ can then be expressed in terms of $U_t = [u_t, u_{t-1}, \dots, u_{t-p+1}]'$ and $V_t = [v_t, v_{t-1}, \dots, v_{t-q+1}]'$. Note that the limiting covariance matrix does not depend on $\sigma^2$.

Example 5.2 (ARMA(1,1)). $x_t = \phi_1 x_{t-1} + \varepsilon_t - \theta_1\varepsilon_{t-1}$, so $u_t = \sum_{j=0}^{\infty}\phi_1^j\varepsilon_{t-j}$ and $v_t = \sum_{j=0}^{\infty}\theta_1^j\varepsilon_{t-j}$. From this it follows that $E u_t^2 = \sigma^2\frac{1}{1-\phi_1^2}$, $E v_t^2 = \sigma^2\frac{1}{1-\theta_1^2}$ and $E u_t v_t = \sigma^2\frac{1}{1-\phi_1\theta_1}$.
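For concreteness, the covariance matrix of Example 5.2 can be evaluated numerically. The sketch below (not in the notes) uses arbitrary illustrative values for $\phi_1$, $\theta_1$ and $\sigma^2$ and makes visible that $\sigma^2$ cancels in $V(\theta)$.

```python
import numpy as np

phi, theta = 0.7, 0.4    # illustrative values
sigma2 = 2.3             # arbitrary: it cancels below

M = sigma2 * np.array([[1 / (1 - phi ** 2),     1 / (1 - phi * theta)],
                       [1 / (1 - phi * theta),  1 / (1 - theta ** 2)]])
V = sigma2 * np.linalg.inv(M)    # asymptotic covariance of sqrt(T)*(estimator - truth)
print(V)                         # unchanged if sigma2 is altered
```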

5.3. Order Selection

Before an ARMA(p,q) model can be estimated we need to select the orders p and q of the AR and MA polynomials. We have seen in Lecture 2 that in principle the autocorrelation and partial autocorrelation functions characterize pure AR(p) and MA(q) models. For a pure MA(q) process the variance of the jth autocorrelation coefficient is given by
$$\operatorname{var}(\hat\rho_T(j)) = \frac{1}{T}\left(1 + 2\sum_{i=1}^{q}\rho(i)^2\right)$$
which can be estimated by
$$\hat\sigma(\hat\rho(j)) = \frac{1}{\sqrt{T}}\left(1 + 2\sum_{i=1}^{q}\hat\rho(i)^2\right)^{1/2}$$
where
$$\rho(j) = \frac{\operatorname{cov}(x_t, x_{t+|j|})}{\sqrt{\operatorname{var}(x_t)}\sqrt{\operatorname{var}(x_{t+|j|})}}.$$
In order to identify the degree of the MA polynomial we can check for which value of $h$ the estimated autocorrelation coefficient $\hat\rho(h)$ stays within
$$\pm\frac{1.96}{\sqrt{T}}\left(1 + 2\left(\hat\rho(1)^2 + \dots + \hat\rho(h-1)^2\right)\right)^{1/2}.$$

In the same way we can identify a pure AR(p) model from its partial autocorrelation function. It can be shown that for a pure AR(p) model the partial autocorrelations $\alpha(n)$ for $n > p$ have variance $\frac{1}{T}$. We can thus check for which value of $j$ the estimated coefficient $\hat\alpha(j)$ lies within $\pm\frac{1.96}{\sqrt{T}}$. If the model is likely to be a mixed ARMA(p,q) model then the above identification procedure runs into difficulties. We can still look at autocorrelation and partial autocorrelation functions of the data to gain some insight into the maximal degree of the two lag polynomials. There is however a more formal procedure, based on information criteria, that can be used to determine the best model in an automated way.

If the process $\{x_t\}_{t=1}^{T}$ has a true density $f(x, \theta_0)$ and the ARMA class has densities $f(x, \theta)$ then the Kullback-Leibler distance is
$$d(\theta_0 \mid \theta) = \int_{\mathbb{R}^T} 2\ln\frac{f(x, \theta_0)}{f(x, \theta)}\, f(x, \theta_0)\,dx \;\ge\; -2\ln\int \frac{f(x, \theta)}{f(x, \theta_0)}\, f(x, \theta_0)\,dx = 0$$
(by Jensen's inequality), where $d(\theta_0 \mid \theta) = 0$ if and only if $f(x, \theta_0) = f(x, \theta)$ a.e. The distance measure $d(\theta_0 \mid \theta)$ can be approximated by
$$AIC(p, q) = \ln\hat\sigma^2 + 2(p+q)/T,$$
where $\hat\sigma^2 = T^{-1}\sum\hat\varepsilon_t^2$ is the maximum likelihood estimator of $\sigma^2$. The best model specification is found by calculating $AIC(p, q)$ for different values of $p$ and $q$ and picking the combination $(p^*, q^*)$ such that $AIC(p, q)$ is minimized.

Increasing the numbers $p, q$ reduces the value of $\hat\sigma^2$. This comes at the cost of overparametrizing the model, which is captured in the penalty term $2(p+q)/T$. It can be shown that AIC is inconsistent in the sense that it asymptotically picks $p$ and $q$ too large. A modified criterion, called BIC, does not suffer from this problem. It is defined as
$$BIC(p, q) = \ln\hat\sigma^2 + (p+q)\ln T / T.$$
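Order selection can then be automated as a double loop over candidate orders. In the sketch below (not from the notes) the routine sigma2_hat(x, p, q) is an assumed helper that returns the estimated innovation variance of an ARMA(p,q) fitted to x, for example the conditional sum of squares sketched earlier divided by T.

```python
import numpy as np

def select_order(x, sigma2_hat, max_p=4, max_q=4):
    """Return the (p, q) pairs minimizing AIC and BIC. sigma2_hat(x, p, q) must
    return the estimated innovation variance of an ARMA(p, q) fitted to x."""
    T = len(x)
    aic, bic = {}, {}
    for p in range(max_p + 1):
        for q in range(max_q + 1):
            s2 = sigma2_hat(x, p, q)
            aic[(p, q)] = np.log(s2) + 2 * (p + q) / T
            bic[(p, q)] = np.log(s2) + (p + q) * np.log(T) / T
    return min(aic, key=aic.get), min(bic, key=bic.get)
```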

5.4. Diagnostic Checking

Once we have determined the values for $p$ and $q$ the model can be estimated with the methods of the previous section. It is a sensible strategy to start with low-order models and then test against increasing the order of an AR or MA polynomial by one. Note that one should never increase the AR and MA polynomial at the same time. Assume we have already estimated an ARMA(1,1) model, and we want to test whether an ARMA(2,1) or ARMA(1,2) is more appropriate. One way to proceed is to estimate both the ARMA(2,1) and the ARMA(1,2) model and then test whether the additional coefficient is significantly different from zero. In particular, we choose the ARMA(1,1) if
$$\frac{|\hat\phi_2|}{\sqrt{\widehat{\operatorname{var}}(\hat\phi_2)}} < 1.96 \qquad\text{and}\qquad \frac{|\hat\theta_2|}{\sqrt{\widehat{\operatorname{var}}(\hat\theta_2)}} < 1.96.$$
Note that the variances of the parameter estimates should be determined from the null distribution, i.e. under the assumption that the true parameter value is zero. An alternative procedure is to test if the residuals are white noise. If the estimated model is correctly specified then the time dependence in the data should be captured by the model and the residuals should be uncorrelated. If we obtain residuals from
$$\hat\varepsilon_t = x_t - \hat\phi_1 x_{t-1} - \dots - \hat\phi_p x_{t-p} + \hat\theta_1\hat\varepsilon_{t-1} + \dots + \hat\theta_q\hat\varepsilon_{t-q}$$
and calculate
$$\hat\gamma_\varepsilon(j) = \frac{1}{T}\sum_{t=1}^{T-j}\hat\varepsilon_t\hat\varepsilon_{t+|j|}$$
then $\hat\rho_\varepsilon(j) = \hat\gamma_\varepsilon(j)/\hat\gamma_\varepsilon(0)$ should be close to zero for all $j$. A popular test of this hypothesis is the Portmanteau or Box-Pierce test. It is based on
$$Q = T\sum_{j=1}^{K}\hat\rho_\varepsilon(j)^2.$$
It can be shown that under $H_0: \rho_\varepsilon(j) = 0$ for all $j$ the limit distribution of $Q$ is $\chi^2_{K-(p+q)}$. The reduction in the degrees of freedom by $p+q$ in the asymptotic distribution is determined by the number of parameters used in the estimation of the model. In practical applications $K$ should be chosen at least 15 to 20. There is however a trade-off between low power of the test for $K$ too large and inconsistency of the test for $K$ too small. In practice it is therefore advisable to look at a plot of $\hat\rho_\varepsilon(j)$ before applying the test. Box and Ljung show that the statistic
$$\tilde{Q} = T(T+2)\sum_{j=1}^{K}\frac{\hat\rho_\varepsilon(j)^2}{T-j} \xrightarrow{d} \chi^2_{K-p-q}$$

has less bias than $Q$ relative to the asymptotic $\chi^2$ distribution in small samples.
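A minimal sketch of both portmanteau statistics computed from estimated residuals (not part of the original notes; the function name is illustrative, and the p-value uses the $\chi^2_{K-p-q}$ distribution as above):

```python
import numpy as np
from scipy.stats import chi2

def portmanteau_tests(eps, K, p, q):
    """Box-Pierce Q and Ljung-Box Q-tilde for residuals eps, with chi2(K - p - q) p-value."""
    T = len(eps)
    e = np.asarray(eps, dtype=float) - np.mean(eps)
    g0 = e @ e / T
    rho = np.array([e[:T - j] @ e[j:] / T / g0 for j in range(1, K + 1)])
    Q = T * np.sum(rho ** 2)
    Q_lb = T * (T + 2) * np.sum(rho ** 2 / (T - np.arange(1, K + 1)))
    dof = K - p - q
    return Q, Q_lb, 1 - chi2.cdf(Q_lb, dof)

# Q, Q_lb, pval = portmanteau_tests(eps_hat, K=20, p=1, q=1)
```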
