
CONSISTENCY OF AKAIKE'S INFORMATION CRITERION

FOR INFINITE VARIANCE AUTOREGRESSIVE PROCESSES

by

Keith Knight

TECHNICAL REPORT No. 97


January 1987

Department of Statistics, GN-22


University of Washington
Seattle, Washington 98195 USA

Consistency of Akaike's Information Criterion for Infinite Variance
Autoregressive Processes

Keith Knight
University of British Columbia and University of Washington

ABSTRACT

Suppose $\{X_n\}$ is a $p$-th order autoregressive process with innovations in the domain of attraction of a stable law and the true order $p$ unknown. The estimate of $p$, $\hat p$, is chosen to minimize Akaike's Information Criterion over the integers $0, 1, \ldots, K$. It is shown that $\hat p$ is weakly consistent and that consistency is retained if $K \to \infty$ as $N \to \infty$ at a certain rate depending on the index of the stable law.

March 14, 1987

This research was supported by ONR Contract N00014-84-C-0169 and by NSF Grant MCS 83-01807 and was part of the author's Ph.D. dissertation completed at the University of Washington. The author would like to thank his supervisor R. D. Martin for his support and encouragement in preparing this paper.

0. Introduction

Consider a stationary $p$-th order autoregressive (AR($p$)) process

$$X_n = \beta_1 X_{n-1} + \cdots + \beta_p X_{n-p} + \epsilon_n$$

where $\{\epsilon_n\}$ are independent, identically distributed (i.i.d.) random variables. The parameters $\beta_1, \ldots, \beta_p$ satisfy the usual stationarity constraints, namely all zeroes of the polynomial $z^p - \beta_1 z^{p-1} - \cdots - \beta_p$ have modulus less than 1.


Now assume that the true order $p$ is unknown but bounded by some finite constant $K(N)$. Our main purpose here will be to estimate $p$ by $\hat p$, where $\hat p$ will be obtained by minimizing a particular version of Akaike's Information Criterion (AIC) (Akaike, 1973) over the integers $\{0, 1, \ldots, K(N)\}$. Because we should be willing to examine a greater range of possible orders for our estimate as the number of observations increases, it makes sense to allow $K(N)$ to increase with $N$. In the finite variance case with $K(N) \equiv K$, AIC does not give a consistent estimate of $p$; instead $\hat p$ has a nondegenerate limit distribution concentrated on the integers $p, p+1, \ldots, K$ (Shibata, 1976).

For a $k$-dimensional parameter vector, AIC is defined as follows:

$$\mathrm{AIC} = -2 \ln(\text{maximized likelihood}) + 2k .$$

For autoregressive models, AIC is usually defined in terms of a Gaussian likelihood; so for a $k$-th order AR model it is defined as follows:

$$\phi(k) = N \ln \hat\sigma^2(k) + 2k$$

where $\hat\sigma^2(k)$ is the estimate of the innovations variance obtained from the Yule-Walker (YW) estimating equations. We will choose as our estimate of $p$ the value $\hat p$ which minimizes $\phi(k)$ for $k$ between 0 and $K$, that is,

$$\hat p = \arg\min_{0 \le k \le K} \phi(k) .$$

In the case where two or more orders achieve the minimum, we will take the smallest of those to be our estimate.
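To make the selection rule concrete, here is a minimal sketch of how $\hat p$ could be computed: the YW estimates $\hat\sigma^2(k)$ are obtained from the Levinson-Durbin recursion applied to the sample autocovariances, using the identity $\hat\sigma^2(k) = \hat\sigma^2(k-1)(1 - \hat\beta_k^2(k))$ (which reappears in the proof of Theorem 7). The function name and interface are our own, not part of the paper.

```python
import numpy as np

def yule_walker_aic(x, K):
    """Select an AR order by minimizing phi(k) = N*ln(sigma2_hat(k)) + 2k,
    where sigma2_hat(k) comes from the Levinson-Durbin (Yule-Walker)
    recursion.  Ties are broken in favor of the smallest order, as in
    the text."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    # sample autocovariances c_0, ..., c_K
    c = np.array([np.dot(x[:N - j], x[j:]) / N for j in range(K + 1)])
    sigma2 = c[0]                   # innovations variance estimate, order 0
    phi = [N * np.log(sigma2)]      # phi(0)
    a = np.zeros(0)                 # AR coefficients of the current fit
    for k in range(1, K + 1):
        # reflection coefficient = k-th sample partial autocorrelation
        rho_k = (c[k] - np.dot(a, c[k - 1:0:-1])) / sigma2
        a = np.concatenate([a - rho_k * a[::-1], [rho_k]])
        sigma2 *= 1.0 - rho_k ** 2  # sigma2_hat(k) = sigma2_hat(k-1)(1 - rho_k^2)
        phi.append(N * np.log(sigma2) + 2 * k)
    return int(np.argmin(phi))      # argmin returns the smallest minimizer
```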
For certain reasons, we may also want the autoregressive parameters to vary (with N)
over some region of the parameter space. For example, consider the following hypothesis
testing problem:

$H_0$: $X_n$ is white noise

versus

$H_a$: $X_n$ is a nondegenerate autoregressive process.

We can consider a sequence of parameter vectors $\{\beta^{(N)}\}$ converging to zero and examine the behaviour of a statistical test as the parameters converge to zero. For this purpose the set of parameters satisfying the stationarity condition is most conveniently described in the following sense: consider the set

$$\left\{ (\rho_1, \ldots, \rho_p) : \rho_j \in (-1, 1), \; j = 1, \ldots, p \right\} .$$

Following Barndorff-Nielsen and Schou (1973), one can parametrize an AR($p$) process by its $p$ partial autocorrelations, each lying in the interval $(-1, 1)$; moreover, one can show that this correspondence is one-to-one for an AR($p$) process. For order selection, the $\rho$-parametrization is somewhat more natural than the $\beta$-parametrization. That is, the "distance" between two autoregressive models with different orders is more easily seen in the $\rho$-parametrization.
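To illustrate why the $\rho$-parametrization is convenient, the Durbin-Levinson step below maps any point of $(-1, 1)^p$ to the coefficients of a stationary AR($p$) model; this is a sketch of the standard recursion underlying the parametrization, with names of our own choosing.

```python
import numpy as np

def pacf_to_ar(rho):
    """Map partial autocorrelations (rho_1, ..., rho_p), each in (-1, 1),
    to AR coefficients (beta_1, ..., beta_p) via the Durbin-Levinson step.
    Every such rho corresponds to a stationary AR(p) model."""
    beta = np.zeros(0)
    for r in rho:
        beta = np.concatenate([beta - r * beta[::-1], [r]])
    return beta

# For example, pacf_to_ar([0.5, -0.3]) returns the AR(2) coefficients
# whose first two partial autocorrelations are 0.5 and -0.3.
```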


1. Infinite variance autoregressions


We will be interested in the case where the innovations $\{\epsilon_n\}$ are in the domain of attraction of a stable law with index $\alpha \in (0, 2)$. If $E(|\epsilon_n|) < \infty$ then we will assume that $E(\epsilon_n) = 0$.

Recall that given observations $X_1, \ldots, X_N$ and known order $p$, it is possible to consistently estimate the AR parameters $\beta_1, \ldots, \beta_p$. In fact, for the LS estimates $\hat\beta_1(l), \ldots, \hat\beta_l(l)$, where $l \ge p$,

$$N^{1/\delta} \left( \hat\beta_k(l) - \beta_k \right) \to 0 \quad \text{a.s. for } \delta > \alpha$$

where $\beta_k = 0$ for $k > p$. For YW estimates, a slightly weaker result holds: convergence to 0 is in probability rather than almost sure.


We may also wish to consider AR models of the form

$$X_n - \mu = \beta_1 (X_{n-1} - \mu) + \cdots + \beta_p (X_{n-p} - \mu) + \epsilon_n$$

where $\mu$ is unknown and we retain the same assumptions on the $\beta_k$'s and $\{\epsilon_n\}$. It can be shown (Knight, 1987) that if we center the observed series by subtracting the sample mean $\bar X$ (i.e., replacing $X_n$ by $X_n - \bar X$ for $n = 1, \ldots, N$), we will still have

$$N^{1/\delta} \left( \hat\beta_k - \beta_k \right) \xrightarrow{p} 0 \quad \text{for } \delta > \max(1, \alpha) ,$$

and the convergence is almost sure for LS estimates. Depending on the location estimate $\hat\mu$ used to center the series (i.e., centering by subtracting $\hat\mu$ rather than $\bar X$), we may be able to improve this rate.

As stated earlier, we will want to vary the autoregressive parameters with $N$. For this reason, we will consider a triangular array of random variables

$$X_1^{(1)}; \quad X_1^{(2)}, X_2^{(2)}; \quad \ldots ; \quad X_1^{(N)}, \ldots, X_N^{(N)}; \quad \ldots$$

where each row is a finite realization of an AR($p$) process:

$$X_n^{(N)} = \sum_{j=1}^{p} \beta_j^{(N)} X_{n-j}^{(N)} + \epsilon_n^{(N)} .$$

The corresponding triangular array of innovations, $\{\epsilon_n^{(N)}\}_{n \le N}$, consists of row-wise independent random variables sampled from a common distribution which is in the domain of attraction of a stable law. Given a single i.i.d. sequence $\{\epsilon_n\}$, we could construct each element of the triangular array as follows:

$$X_n^{(N)} = \sum_{j=0}^{\infty} c_j(\beta^{(N)}) \, \epsilon_{n-j} .$$
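For simulation purposes, one way to realize a row of the triangular array is to run the AR($p$) recursion from zero initial conditions and discard a burn-in segment, which amounts to truncating the MA($\infty$) representation above at $j \le n$. The sketch below makes this explicit; the burn-in length is an assumption of ours, chosen so that the truncation error is negligible.

```python
import numpy as np

def ar_row_from_innovations(eps, beta, burn_in=500):
    """Build one row of the triangular array from a single innovation
    sequence eps by iterating X_n = sum_j beta_j X_{n-j} + eps_n.
    Starting from zero initial conditions truncates the MA(infinity)
    representation; the first `burn_in` values are discarded."""
    p = len(beta)
    x = np.zeros(len(eps))
    for n in range(len(eps)):
        for j in range(1, min(p, n) + 1):
            x[n] += beta[j - 1] * x[n - j]
        x[n] += eps[n]
    return x[burn_in:]
```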


We will require that the parameters $\beta^{(N)} = (\beta_1^{(N)}, \ldots, \beta_p^{(N)})$ are such that the corresponding partial autocorrelations $(\rho_1^{(N)}, \ldots, \rho_p^{(N)})$ lie within a compact subset of $(-1, 1)^p$. If $\rho_p^{(N)}$ tends to zero as $N$ tends to infinity, we can attempt to consistently estimate $p$ at the same time. (In the testing formulation, this amounts to asking for a test that is consistent against a sequence of local alternatives.) Intuitively, it would seem that the smaller $|\rho_p^{(N)}|$ is, the more difficult it should be to distinguish between a $p$-th order and a lower order AR model. From simulations, this does seem to be the case. This is the real motivation for allowing the parameters to vary with $N$. Consider the following example. Suppose we observe a $p$-th order AR process which has $\rho_p$ very close to zero (say $\rho_p = 0.1$). To estimate the order of the process, we use a procedure which we know to be consistent. So for $N$ large enough, we will select the true order with arbitrarily high probability. However, for moderate sized $N$, the probability of underestimating $p$ may be very high. Conversely, if $|\rho_p|$ is close to 1, then even for small $N$ there will be high probability of selecting the true order. So by allowing $\rho_p = \rho_p^{(N)}$ to shrink to zero with $N$, we may get some idea of the relative sample sizes needed to get the same probability of correct order selection for two different sets of AR parameters. If we view order selection as a hypothesis testing problem (say testing a null hypothesis of white noise versus autoregressive alternatives), shrinking $\rho_p^{(N)}$ to zero is similar in spirit to the sequence of contiguous alternative hypotheses to a null hypothesis considered in Pitman efficiency calculations.
We should note that the partial autocorrelations do not have their usual finite variance interpretation; however, they can be unambiguously defined in terms of the regular autocorrelations, which themselves can be unambiguously defined in terms of the linear process coefficients (see Davis and Resnick, 1985). Moreover, they can be estimated by the recursive YW estimates just as in the finite variance case. If we include an unknown location, $\mu$, in the model, we will assume that it is not of interest in itself; in a sense, $\mu$ is a nuisance parameter in this situation.

We will provide an answer to the following question: under what conditions (if any) on $K(N)$ and $(\beta_1^{(N)}, \ldots, \beta_p^{(N)})$ does AIC provide a consistent estimate of $p$? Bhansali (1983) conjectures that AIC may provide a consistent estimate of the order of an autoregressive process based on the rapid convergence of parameter estimates. However, he seems to conclude, from Monte Carlo results, that this may not be the case. If $K(N)$ is allowed to grow too fast then we may wind up severely overfitting much of the time; for example, $\hat p$ could equal $K(N)$ with high probability.

2. Theoretical Results
The main result of this paper is contained in Theorem 7; the first six results provide the
necessary machinery for Theorem 7. We begin by stating two results dealing with r-th
moments of martingales and submartingales.
Theorem 1. (Esseen and von Bahr, 1965) Let $S_n = \sum_{k=1}^{n} X_k$. If $E(X_n \mid S_{n-1}) = 0$ for $2 \le n \le N$ and $X_n \in L^r$ for $1 \le r \le 2$ then

$$E\left( |S_N|^r \right) \le 2 \sum_{n=1}^{N} E\left( |X_n|^r \right) .$$

(Note that $\{S_n, \sigma(S_n);\ n \ge 1\}$ is a martingale.)

Theorem 2. (cf. Chung, 1974, p. 346) If $\{X_n, \sigma(X_n);\ n \ge 1\}$ is a nonnegative $L^r$-submartingale for some $r > 1$ then

$$E\left[ \max_{1 \le n \le N} X_n^r \right] \le \left( \frac{r}{r-1} \right)^r E\left( X_N^r \right) . \qquad \square$$
The following lemma will allow us to ignore the dependence on $N$ of the moments of $\{X_n^{(N)}\}$ by virtue of being able to bound the moments uniformly for parameters within a compact set.

Lemma 3. Let $\{X_n(\beta)\}$ be a stationary AR($p$) process with parameter $\beta$ and innovations $\{\epsilon_n\}$ in the domain of attraction of a stable law with index $\alpha$. Then for $0 < \delta < \alpha$, $E\left( |X_n(\beta)|^\delta \right)$ is bounded uniformly for $\beta$ in a compact subset of the parameter space.

Proof. $X_n(\beta) = \sum_{j=0}^{\infty} c_j(\beta) \epsilon_{n-j}$ where $c_j(\beta)$ is a continuous function of $\beta$ for all $j$. Now

$$|X_n(\beta)| \le \sum_{j=0}^{\infty} |c_j(\beta)| \, |\epsilon_{n-j}| \le \sum_{j=0}^{\infty} a_j |\epsilon_{n-j}|$$

where $a_j = \sup_\beta |c_j(\beta)|$, the supremum being taken over the compact set. It can be shown that $|c_j(\beta)| \le C_\beta x^j$ for some $|x| < 1$, and so

$$\sum_{j=0}^{\infty} j^\gamma |a_j| < \infty \quad \text{for all } \gamma > 0 .$$

Under this summability condition, it follows from Cline (1983) that the random variable

$$X = \sum_{j=0}^{\infty} a_j |\epsilon_j|$$

is finite almost surely with

$$\lim_{x \to \infty} \frac{P[X > x]}{P[|\epsilon_0| > x]} = \sum_{j=0}^{\infty} a_j^\alpha < \infty .$$

This implies that $E(X^\delta)$ is finite for all $0 < \delta < \alpha$ and the result follows.
The following lemma will allow us to treat moments of $\sum X_n$ essentially the same as the moments of $\sum \epsilon_n$ when $\alpha > 1$.

Lemma 4. Let $\{X_n\}$ be a zero mean stationary AR($p$) process whose innovations are in the domain of attraction of a stable law with index $\alpha > 1$. Then for $1 < r < \alpha$,

(a) $\displaystyle E\left| \sum_{n=1}^{N} X_n \right|^r = O(N)$

(b) $\displaystyle E\left[ \max_{1 \le k \le N} \left| \sum_{n=1}^{k} X_n \right|^r \right] = O(N) .$

Proof. Write

$$\sum_{n=1}^{N} X_n = C \left[ \sum_{n=1}^{N} \epsilon_n + R_N \right]$$

where $C = \left[ 1 - \sum_{k=1}^{p} \beta_k \right]^{-1}$ and

$$|R_N| \le \left[ \max_{1 \le k \le p} |\beta_k| \right] p(p+1) \left[ \max_{1-p \le k \le N} |X_k| \right] .$$

Thus by Minkowski's Inequality,

$$\left( E\left| \sum_{n=1}^{N} X_n \right|^r \right)^{1/r} \le C \left[ \left( E\left| \sum_{n=1}^{N} \epsilon_n \right|^r \right)^{1/r} + \left( E |R_N|^r \right)^{1/r} \right]$$

and part (a) follows from Theorem 1 by noting that $E\left( |R_N|^r \right) = O(N)$. (It can actually be shown to be of smaller order by a uniform integrability argument.) Part (b) follows in the same way from Theorem 2 by noting that $\left\{ \left| \sum_{n=1}^{k} \epsilon_n \right| \right\}$ is a submartingale and using Minkowski's Inequality.


The following theorem deals with uniform convergence of both LS and YW autoregressive parameter estimates in the case where location is known.

Theorem 5. Assume known location $\mu$. Let $K(N) = O(N^\kappa)$ for some $0 < \kappa < 2/\alpha$ and let $\|v\|$ denote the Euclidean norm of the vector $v$. Then

(a) $\displaystyle \max_{p \le l \le K(N)} \sqrt{N} \left\| \hat\beta(l) - \beta^{(N)} \right\| \xrightarrow{p} 0$, where $\beta_k^{(N)} \equiv 0$ for $k > p$;

(b) $\displaystyle \max_{1 \le l \le K(N)} \sqrt{N} \left\| \hat\beta(l) - \tilde\beta(l) \right\| \xrightarrow{p} 0$, where $\tilde\beta(l)$ denotes the YW estimate.

Note that the vectors are not fixed length but may vary with $N$.
Proof. (a) The style of proof will mimic Hannan and Kanter (1977). For convenience we suppress the notation indicating the dependence of $\{X_n\}$, $\{\epsilon_n\}$ and $\beta$ on $N$. For $l \ge p$ the LS estimating equations can be reexpressed as follows:

$$C_l \left( \hat\beta(l) - \beta \right) = \Delta^{*}(l)$$

where $C_l$ is the matrix with entries $C_l(i,j) = \sum_{n=l+1}^{N} X_{n-i} X_{n-j}$ and $\Delta^{*}(l)$ has components

$$\Delta_l^{*}(j) = \sum_{n=l+1}^{N} \epsilon_n X_{n-j} .$$

Fix $0 < \kappa < 2/\alpha$ and set $K = K(N) = O(N^\kappa)$. For each $l$, $C_l$ is non-negative definite and so it suffices to show that for some $\sigma < 1/\alpha$,

(i) $\displaystyle N^{1/2 - 2\sigma} \max_{1 \le l \le K} \left\| \Delta^{*}(l) \right\| \xrightarrow{p} 0$

and

(ii) $\displaystyle \lambda_{\min}\left( N^{-2\sigma} C_l \right) \xrightarrow{p} \infty$ uniformly over $l \le K$,

since together (i) and (ii) give $\max_{p \le l \le K} \sqrt{N} \, \| \hat\beta(l) - \beta \| \xrightarrow{p} 0$.
To prove (i), it suffices to show that

$$E\left[ N^{(1 - 4\sigma)\gamma} \max_{1 \le l \le K} \left\| \Delta^{*}(l) \right\|^{2\gamma} \right] \to 0$$

for some $\gamma < \alpha/2$. Now

$$E\left[ N^{(1 - 4\sigma)\gamma} \max_{1 \le l \le K} \left\| \Delta^{*}(l) \right\|^{2\gamma} \right] \le \sum_{j=1}^{K} N^{(1 - 4\sigma)\gamma} E\left[ \max_{1 \le l \le K} \left| \sum_{n=l+1}^{N} \epsilon_n X_{n-j} \right|^{2\gamma} \right]$$

$$\le \sum_{j=1}^{K} N^{(1 - 4\sigma)\gamma} \left\{ E\left[ \max_{1 \le l \le K} \left| \sum_{n=1}^{l} \epsilon_n X_{n-j} \right|^{2\gamma} \right] + E\left[ \left| \sum_{n=1}^{N} \epsilon_n X_{n-j} \right|^{2\gamma} \right] \right\} = \sum_{j=1}^{K} N^{(1 - 4\sigma)\gamma} W_{N,j} .$$

If $2\gamma \le 1$ then

$$E\left[ \max_{1 \le l \le K} \left| \sum_{n=1}^{l} \epsilon_n X_{n-j} \right|^{2\gamma} \right] \le \sum_{n=1}^{N} E\left( |\epsilon_n X_{n-j}|^{2\gamma} \right) = O(N)$$

uniformly over $j$ between 1 and $K(N)$.

If $2\gamma > 1$ then $\alpha > 1$ and so $S_{k,j} = \sum_{n=1}^{k} \epsilon_n X_{n-j}$ is a martingale for each $j$; $|S_{k,j}|$ is an $L^{2\gamma}$-submartingale and so by Theorems 1 and 2,

$$E\left[ \max_{1 \le k \le N} |S_{k,j}|^{2\gamma} \right] \le \left( \frac{2\gamma}{2\gamma - 1} \right)^{2\gamma} E\left( |S_{N,j}|^{2\gamma} \right) = O(N)$$

uniformly over $j$. Similarly it can be shown that for all permissible values of $\gamma$, $W_{N,j} = O(N)$ uniformly over $j$ between 1 and $K(N)$. Thus for a given sequence $K(N) = O(N^\kappa)$, by taking $\sigma$ sufficiently close to $1/\alpha$ and $\gamma$ sufficiently close to $\alpha/2$, we will have

$$\sum_{j=1}^{K} N^{(1 - 4\sigma)\gamma} W_{N,j} \to 0$$

as desired.
To prove (ii), we define $X_{n,v}$ and $\epsilon_{n,v}$ as follows:

$$X_{n,v} = \sum_{k=1}^{l} v_k X_{n-k} , \qquad \epsilon_{n,v} = \sum_{k=1}^{l} v_k \epsilon_{n-k}$$

where $v = (v_1, \ldots, v_l)$ is a unit vector, $\sum_{k=1}^{l} v_k^2 = 1$. Since $v' C_l v \ge \sum_{n=K+1}^{N-K} X_{n,v}^2$ for every such $v$, it suffices to show that $N^{-2\sigma} \sum_{n=K+1}^{N-K} X_{n,v}^2 \xrightarrow{p} \infty$ uniformly over $l$ and unit vectors $v$.

Now note that $X_{n,v} = \sum_{j=1}^{p} \beta_j X_{n-j,v} + \epsilon_{n,v}$. By the triangle inequality,

$$\left( \sum_{n=K+1}^{N-K} X_{n,v}^2 \right)^{1/2} \ge \left( \sum_{n=K+1}^{N-K} \epsilon_{n,v}^2 \right)^{1/2} - \left( \sum_{n=K+1}^{N-K} \left[ \sum_{j=1}^{p} \beta_j X_{n-j,v} \right]^2 \right)^{1/2} .$$

Now

$$\sum_{n=K+1}^{N-K} \left[ \sum_{j=1}^{p} \beta_j X_{n-j,v} \right]^2 \le \left[ \sum_{j=1}^{p} \beta_j^2 \right] \sum_{j=1}^{p} \sum_{n=K+1}^{N-K} X_{n-j,v}^2 .$$

It remains only to show that $N^{-2\sigma} \sum_{n=K+1}^{N-K} \epsilon_{n,v}^2 \xrightarrow{p} \infty$; if this is true then $N^{-2\sigma} \sum_{n=K+1}^{N-K} X_{n,v}^2 \xrightarrow{p} \infty$, since the probability that this quantity stays bounded clearly must tend to zero.

Now

$$\sum_{n=K+1}^{N-K} \epsilon_{n,v}^2 = \sum_{k=1}^{l} v_k^2 \sum_{n=K+1}^{N-K} \epsilon_{n-k}^2 + 2 \sum_{k=2}^{l} \sum_{j=1}^{k-1} v_j v_k \sum_{n=K+1}^{N-K} \epsilon_{n-j} \epsilon_{n-k} .$$

Thus

$$N^{-2\sigma} \sum_{k=1}^{l} v_k^2 \sum_{n=K+1}^{N-K} \epsilon_{n-k}^2 \xrightarrow{p} \infty$$

uniformly over $l$ and unit vectors $v$, since each inner sum contains $\sum_{n=K+1}^{N-2K} \epsilon_n^2$ and $N^{-2\sigma} \sum_{n=K+1}^{N-2K} \epsilon_n^2 \xrightarrow{p} \infty$ (recall $\sigma < 1/\alpha$). Thus we need only show that the cross-product term is negligible, that is,

$$N^{-2\sigma} \sum_{k=2}^{l} \sum_{j=1}^{k-1} v_j v_k \sum_{n=K+1}^{N-K} \epsilon_{n-j} \epsilon_{n-k} \xrightarrow{p} 0$$

uniformly over $l$ and unit vectors $v$. Now take $\gamma < \alpha$ and note that $E\left( |\epsilon_{n-j} \epsilon_{n-k}|^\gamma \right) < \infty$ for $j \ne k$. If $\gamma < 1$ then

$$E\left| \sum_{n=K+1}^{N-K} \epsilon_{n-j} \epsilon_{n-k} \right|^\gamma \le \sum_{n=K+1}^{N-K} E\left( |\epsilon_{n-j} \epsilon_{n-k}|^\gamma \right) = O(N) .$$

If $\gamma \ge 1$ then necessarily $\alpha > 1$. Thus $S_l = \sum_{n=K+1}^{l} \epsilon_{n-j} \epsilon_{n-k}$ is an $L^\gamma$-martingale and hence

$$E\left| \sum_{n=K+1}^{N} \epsilon_{n-j} \epsilon_{n-k} \right|^\gamma = O(N)$$

uniformly over $j \ne k$ by Theorem 1. Now the cross-product term above tends to zero in probability as claimed, since $|v_k| \le 1$ for all $k$, and (ii) follows.


(b) From the definitions of $C_l$, $\bar C_l$, $\Delta_l$ and $\bar\Delta_l$ (the barred quantities being those of the YW estimating equations), it is easy to see that

(1a) $\displaystyle T_N \equiv \max_{1 \le l \le K} \max_{1 \le i, j \le l} \left| C_l(i,j) - \bar C_l(i,j) \right| \le 4 \sum_{n=1}^{K} X_n^2 + 4 \sum_{n=N-K+1}^{N} X_n^2$

and

(1b) a similar bound holds for $S_N$, the corresponding maximal difference between the components of $\Delta_l$ and $\bar\Delta_l$.

Thus using equations (1a) and (1b), we have

(2a) $N^{-2\sigma} T_N \xrightarrow{p} 0$

and

(2b) $N^{1/2 - 2\sigma} S_N \xrightarrow{p} 0$

for $\kappa < 2/\alpha$. Using some elementary facts about vector and matrix norms and equations (2a) and (2b), we get

(3a) $\displaystyle \max_{1 \le l \le K} \left\| N^{-2\sigma} \left( C_l - \bar C_l \right) \right\| \xrightarrow{p} 0$

and

(3b) $\displaystyle \max_{1 \le l \le K} N^{1/2 - 2\sigma} \left\| \Delta_l - \bar\Delta_l \right\| \xrightarrow{p} 0$,

where the matrix norm is the operator norm which corresponds to the Euclidean vector norm.

Now from the definitions of $\hat\beta(l)$ and $\tilde\beta(l)$, we get that $\sqrt{N} \, \| \hat\beta(l) - \tilde\beta(l) \| \xrightarrow{p} 0$ uniformly in $l$ by equation (3b). Finally we must show that the minimum eigenvalue of $N^{-2\sigma} \bar C_l$ tends in probability to infinity uniformly in $l$ since $\| (N^{-2\sigma} \bar C_l)^{-1} \|$ is (in the case of symmetric positive definite matrices) merely the reciprocal of this minimum eigenvalue. Note that for unit vectors $u$,

$$u' \left( N^{-2\sigma} \bar C_l \right) u \ge u' \left( N^{-2\sigma} C_l \right) u - l \, N^{-2\sigma} T_N \xrightarrow{p} \infty$$

uniformly over $l$ and unit vectors $u$, by condition (ii) of the proof of part (a) of this theorem and equation (3a) above. Therefore

$$\max_{1 \le l \le K} \sqrt{N} \, \left\| \hat\beta(l) - \tilde\beta(l) \right\| \xrightarrow{p} 0$$

as required.

In the case where we have an unknown location parameter and we estimate it with some location estimate $\hat\mu$, we can obtain the following corollary.

Corollary 6. 1. If $(\hat\mu - \mu) = O_p(N^{\gamma})$ for $\gamma \le \min\left( \frac{1}{\alpha} - \frac{1}{2}, \, 0 \right)$ uniformly over all compact subsets of the parameter space, then Theorem 5 still holds.

2. If $\alpha > 1$ and $K(N) = O(N^\kappa)$ as in Theorem 5, then the sample mean $\bar X$ satisfies this condition.
Proof. 1. (a) Assume without loss of generality that $\mu = 0$. We can again reexpress the LS estimating equations as follows:

$$C_l^{**} \left( \hat\beta(l) - \beta \right) = \Delta^{**}(l)$$

where now $C_l^{**}$ and $\Delta^{**}(l)$ are computed from the centered observations $X_n - \hat\mu$, and

$$\Delta_l^{**}(j) = \Delta_l^{*}(j) - \hat\mu \sum_{n=l+1}^{N} \epsilon_n - \left[ 1 - \sum_{k=1}^{p} \beta_k \right] \hat\mu \sum_{n=l+1}^{N} X_{n-j} + (N - l) \left[ 1 - \sum_{k=1}^{p} \beta_k \right] \hat\mu^2 .$$

By similar methods to those used in the proof of Theorem 5, it is easy to show that for some $\kappa < 2/\alpha$,

$$\max_{p \le l \le K} N^{1/2 - 2\sigma} \left\| \Delta^{**}(l) - \Delta^{*}(l) \right\| \xrightarrow{p} 0 .$$

(The term involving $\sum X_{n-j}$ is killed using Lemma 4.) In addition, using the conditions on $\hat\mu$, the remaining terms are negligible as well.

(b) Defining $T_N$ and $S_N$ analogously to the proof of Theorem 5, we again get that for some $\sigma < 1/\alpha$, $N^{-2\sigma} T_N \xrightarrow{p} 0$ and $N^{1/2 - 2\sigma} S_N \xrightarrow{p} 0$, and the rest of the proof follows as in the proof of Theorem 5.

2. Everything follows from the fact that for any $0 < \delta < \alpha$,

$$E\left[ \max_{1 \le l \le K} \left| \sum_{n=l+1}^{N} X_n \right|^\delta \right] = O(N)$$

which implies that

$$N^{-\tau} \max_{1 \le l \le K} \left| \sum_{n=l+1}^{N} X_n \right| \xrightarrow{p} 0 \quad \text{for } \tau > 1/\delta .$$

So by taking $\delta$ close to $\alpha$ and $\tau$ close to $1/\alpha$, we get $\bar X = O_p(N^{\tau - 1})$ with $\tau - 1$ arbitrarily close to $1/\alpha - 1$, and conclusions (a) and (b) follow directly from this.


Theorem 7. Suppose $K(N)$ and the location estimate (if any) satisfy the conditions of Theorem 5 or Corollary 6, so that conclusions (a) and (b) of Theorem 5 hold, and suppose

$$\liminf_{N \to \infty} N \left( \beta_p^{(N)} \right)^2 > 2p$$

(recall that $\beta_p = \rho_p$, the $p$-th partial autocorrelation). Then $\hat p \xrightarrow{p} p$.

Proof. First we note that since $\hat p$ is integer-valued, $\hat p \xrightarrow{p} p$ is equivalent to $P[\hat p = p] \to 1$ (as $N \to \infty$). From here on, we will refer to $K(N)$ as $K$ and to $\beta_k^{(N)}$ as $\beta_k$, thus suppressing the dependence on $N$. Moreover we will assume that the observations $X_n$ are already centered; that is, we have subtracted out the location estimate $\hat\mu$ (if we are assuming unknown location).

We now use the fact that

$$\hat\sigma^2(k) = \hat\sigma^2(0) \prod_{l=1}^{k} \left( 1 - \hat\beta_l^2(l) \right) \quad \text{for } k \ge 1$$

where

$$\hat\sigma^2(0) = \frac{1}{N} \sum_{n=1}^{N} X_n^2$$

and $\hat\beta_k(l)$ is the YW estimate of $\beta_k(l)$, the $k$-th coefficient in an AR($l$) model. Now

$$P[\hat p < p] \le P\left[ \min_{0 \le k < p} \phi(k) \le \phi(p) \right] .$$

Since

$$\phi(k) - \phi(p) = -N \sum_{l=k+1}^{p} \ln\left( 1 - \hat\beta_l^2(l) \right) - 2(p - k) ,$$

we can write

$$P[\hat p < p] \le P\left[ \prod_{l=k+1}^{p} \left( 1 - \hat\beta_l^2(l) \right) \ge \exp(-2p/N) \text{ for some } 0 \le k < p \right] \le P\left[ 1 - \hat\beta_p^2(p) \ge \exp(-2p/N) \right] \le P\left[ N \hat\beta_p^2(p) \le 2p \right] ,$$

so it suffices to show that

$$\limsup_{N \to \infty} P\left[ N \hat\beta_p^2(p) \le 2p \right] = 0 ,$$

which follows from the conclusions of Theorem 5 since $\liminf_{N \to \infty} N \beta_p^2 > 2p$.


We also have that

$$P[\hat p > p] \le P\left[ \phi(k) < \phi(k-1) \text{ for some } p < k \le K \right] \le P\left[ N \min_{p < k \le K} \ln\left( 1 - \hat\beta_k^2(k) \right) < -2 \right] .$$

Since the conclusions of Theorem 5 hold, it follows that

$$N \max_{p < k \le K} \hat\beta_k^2(k) \xrightarrow{p} 0$$

and hence

$$N \min_{p < k \le K} \ln\left( 1 - \hat\beta_k^2(k) \right) \xrightarrow{p} 0 .$$

Therefore $P[\hat p > p] \to 0$. Thus $P[\hat p \ne p] \to 0$ and so $P[\hat p = p] \to 1$, which implies that $\hat p \xrightarrow{p} p$.

The "practical" implication of this theorem is that if N is large, with high probability

P will equal

provided that

IlJp I

is not too small with respect to N. Or in other

words, for fixed (but large) N , the probability of selecting the correct order decreases as

IlJp I

decreases.

sample Monte Carlo results seem to bear

out.


3. Simulation results

For illustrative purposes, a small simulation study was carried out using four symmetric stable innovations distributions with α = 0.5, 1.2, 1.9 and 2.0 (the latter being the normal distribution). The underlying processes were AR(1) processes with the AR parameter β = 0.1, 0.5 and 0.9. The sample sizes considered were 100 and 900. For N = 100, the maximum order K was taken to be 10 while for N = 900, K was taken to be 15. 100 replications were made for each of the 24 possible arrangements of α, β and N. The results of the study are given in Tables 1 through 8.
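The following sketch shows how such a study can be set up: symmetric stable innovations are generated by the Chambers-Mallows-Stuck method, an AR(1) series is built with a burn-in, and the order is selected with the yule_walker_aic function sketched in Section 0. The seed, burn-in length and function names are our assumptions, not details reported in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumed seed, for reproducibility only

def sym_stable(alpha, size):
    """Symmetric alpha-stable variates via the Chambers-Mallows-Stuck method.
    Reduces to the standard Cauchy at alpha = 1 and is Gaussian at alpha = 2."""
    u = rng.uniform(-np.pi / 2, np.pi / 2, size)
    w = rng.exponential(1.0, size)
    if alpha == 1.0:
        return np.tan(u)
    return (np.sin(alpha * u) / np.cos(u) ** (1.0 / alpha)
            * (np.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))

def one_replication(alpha, beta, N, K, burn_in=500):
    """Generate one AR(1) series and return the order selected by AIC
    (yule_walker_aic is the sketch given in Section 0)."""
    eps = sym_stable(alpha, N + burn_in)
    x = np.zeros(N + burn_in)
    for n in range(1, N + burn_in):
        x[n] = beta * x[n - 1] + eps[n]  # AR(1) recursion
    return yule_walker_aic(x[burn_in:], K)

# e.g., one column of a table: frequencies of selected orders over
# 100 replications with alpha = 1.2, beta = 0.5, N = 100, K = 10
# orders = [one_replication(1.2, 0.5, 100, 10) for _ in range(100)]
```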

Estimated        AR parameter
order         0.1    0.5    0.9
0              89      0      1
1               4     91     87
2               4      3      2
3               0      1      1
4               1      0      0
5               0      3      2
6               1      1      2
7               0      0      3
8               0      0      1
9               1      1      0
10              0      0      1

Table 1: Frequency of selected order for AR(1) process. N = 100, α = 0.5.

Estimated        AR parameter
order         0.1    0.5    0.9
0               0      0      0
1              93     95     91
2               0      0      0
3               0      0      1
4               0      0      0
5               0      0      0
6               0      0      0
7               4      0      0
8               1      5      2
9               0      0      1
10-15           2      0      5

Table 2: Frequency of selected order for AR(1) process. N = 900, α = 0.5.
The results are much as expected. We can see that for N = 100 and β = 0.1, AIC underestimates the true order with high probability. For N = 900, the probabilities of selecting the true order increase over those for N = 100.

Estimated        AR parameter
order         0.1    0.5    0.9
0              70      0      0
1              15     86     86
2               7      7      4
3               3      3      1
4               1      3      1
5               0      1      0
6               0      0      2
7               0      2      1
8               2      1      1
9               0      0      0
10              0      0      0

Table 3: Frequency of selected order for AR(1) process. N = 100, α = 1.2.

Estimated        AR parameter
order         0.1    0.5    0.9
0               0      0      0
1              80     87     90
2               6      3      3
3               6      0      1
4               1      4      3
5               1      2      0
6               0      0      0
7               2      2      1
8               1      0      0
9               0      0      0
10-15           3      2      2

Table 4: Frequency of selected order for AR(1) process. N = 900, α = 1.2.

Estimated        AR parameter
order         0.1    0.5    0.9
0              57      0      0
1              25     76     71
2               5      8     10
3               2      5      9
4               3      6      6
5               2      1      2
6               3      1      0
7               1      1      1
8               1      0      0
9               0      0      1
10              1      2      0

Table 5: Frequency of selected order for AR(1) process. N = 100, α = 1.9.

Estimated        AR parameter
order         0.1    0.5    0.9
0               5      0      0
1              78     74     75
2               7     14      9
3               2      2      7
4               2      5      2
5               1      1      2
6               1      1      0
7               3      1      3
8               0      0      0
9               0      0      1
10-15           1      2      1

Table 6: Frequency of selected order for AR(1) process. N = 900, α = 1.9.

Estimated        AR parameter
order         0.1    0.5    0.9
0              63      0      0
1              25     75     75
2               4      3     12
3               1      6      2
4               0      7      7
5               2      2      3
6               2      4      0
7               1      0      1
8               1      1      0
9               0      2      0
10              1      0      0

Table 7: Frequency of selected order for AR(1) process. N = 100, Normal distribution.

Estimated        AR parameter
order         0.1    0.5    0.9
0               0      0      0
1              83     79     80
2               3      3     11
3               4      4      3
4               4      6      0
5               2      3      3
6               0      0      0
7               0      4      0
8               4      1      1
9               0      0      2
10-15           0      0      0

Table 8: Frequency of selected order for AR(1) process. N = 900, Normal distribution.

4. Comments

Bhansali and Downham (1977) propose a generalization of AIC. They propose to minimize

$$\phi'(k) = N \ln \hat\sigma^2(k) + \gamma k$$

where $\gamma \in (0, 4)$. It is easy to see from the proof of the above result that their criterion will also lead to consistent estimates of $p$ under similar conditions on $K(N)$ and $\beta^{(N)}$. In fact, if $\gamma = \gamma(N) > 0$ satisfies $\gamma(N)/N \to 0$, then the criterion corresponding to

$$\phi''(k) = N \ln \hat\sigma^2(k) + \gamma(N) k$$

will consistently estimate $p$. Specifically, with known location, the estimate will be consistent provided

$$\liminf_{N \to \infty} \frac{N}{\gamma(N)} \left( \beta_p^{(N)} \right)^2 > p$$

with $\gamma(N)$ bounded away from zero and with the same conditions on $K(N)$. With an appropriate choice of $\gamma(N)$, this criterion will also be consistent in the finite variance case. However if $\gamma(N)$ grows too quickly with $N$ then the criterion may seriously underestimate the true order $p$ in small samples, in both the finite and infinite variance cases. In an application such as autoregressive spectral density estimation (assuming now finite variance), underestimation is more serious than overestimation since, if the order is underestimated, the resulting spectral density estimate may be lacking important features which may indeed exist.
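For completeness, here is the same Levinson-Durbin computation with the general penalty $\gamma$ in place of 2, so that $\gamma = 2$ recovers AIC; the function name and the example penalty are our own choices.

```python
import numpy as np

def select_order_generalized(x, K, gamma):
    """Minimize phi''(k) = N*ln(sigma2_hat(k)) + gamma*k over k = 0, ..., K.
    gamma = 2 gives AIC; a slowly growing gamma(N) (with gamma(N)/N -> 0)
    gives the consistent variant discussed above."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    c = np.array([np.dot(x[:N - j], x[j:]) / N for j in range(K + 1)])
    sigma2, a = c[0], np.zeros(0)
    crit = [N * np.log(sigma2)]
    for k in range(1, K + 1):
        rho_k = (c[k] - np.dot(a, c[k - 1:0:-1])) / sigma2
        a = np.concatenate([a - rho_k * a[::-1], [rho_k]])
        sigma2 *= 1.0 - rho_k ** 2
        crit.append(N * np.log(sigma2) + gamma * k)
    return int(np.argmin(crit))

# For instance, gamma = np.log(N) (cf. BIC) satisfies gamma(N)/N -> 0
# while penalizing overfitting more heavily than AIC's gamma = 2.
```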

References

Akaike, H. (1973) Information theory and an extension of the maximum likelihood principle, in: Second International Symposium on Information Theory (eds. B. Petrov and F. Csaki), Akademiai Kiado, Budapest, 267-81.

Barndorff-Nielsen, O. and Schou, G. (1973) On the parametrization of autoregressive models by partial autocorrelations. J. Multivar. Anal. 3 408-19.

Bhansali, R. (1983) Order determination for processes with infinite variance, in: Robust and Nonlinear Time Series Analysis (eds. J. Franke, W. Hardle, R.D. Martin), Springer-Verlag, New York, 17-25.

Bhansali, R. and Downham, D. (1977) Some properties of the order of an autoregressive model selected by a generalization of Akaike's FPE criterion. Biometrika 64 547-51.

Chung, K.L. (1974) A Course in Probability Theory. Academic Press, New York.

Cline, D. (1983) Infinite series of random variables with regularly varying tails. Technical Report 83-24, Institute of Applied Mathematics and Statistics, University of British Columbia.

Davis, R. and Resnick, S. (1985) Limit theory for moving averages of random variables with regularly varying tail probabilities. Ann. Prob. 13 179-95.

Esseen, C. and von Bahr, B. (1965) Inequalities for the rth absolute moment of a sum of random variables, 1 ≤ r ≤ 2. Ann. Math. Statist. 36 299-303.

Hannan, E. and Kanter, M. (1977) Autoregressive processes with infinite variance. J. App. Prob. 14 411-15.

Knight, K. (1987) Rate of convergence of centred estimates of autoregressive parameters for infinite variance autoregressions. J. Time Ser. Anal. 8 51-60.

Shibata, R. (1976) Selection of the order of an autoregressive model by Akaike's information criterion. Biometrika 63 117-26.

Current address: Department of Statistics, University of British Columbia, 2021 West Mall, Vancouver, B.C., Canada V6T 1W5.
