
CONSISTENCY OF AKAIKE'S INFORMATION CRITERION

FOR INFINITE VARIANCE AUTOREGRESSIVE PROCESSES

by

Keith Knight

TECHNICAL REPORT No. 97


January 1987

Department of Statistics, GN-22


University of Washington
Seattle, Washington 98195 USA

Consistency of Akaike's Information Criterion for Infinite Variance
Autoregressive Processes

Keith Knight
University of British Columbia and University of Washington

ABSTRACT

Suppose $\{X_n\}$ is a $p$-th order autoregressive process with innovations in the domain of attraction of a stable law and the true order $p$ unknown. The estimate of $p$, $\hat p$, is chosen to minimize Akaike's Information Criterion over the integers $0, 1, \ldots, K$. It is shown that $\hat p$ is weakly consistent and that consistency is retained if $K \to \infty$ as $N \to \infty$ at a certain rate depending on the index of the stable law.

March 14, 1987

This research was supported by ONR Contract N00014-84-C-0169 and by NSF Grant MCS 83-01807 and was part of the author's Ph.D. dissertation completed at the University of Washington. The author would like to thank his supervisor R. D. Martin for his support and encouragement in preparing this paper.

0. Introduction

Consider a stationary $p$-th order autoregressive (AR($p$)) process

$$X_n = \beta_1 X_{n-1} + \cdots + \beta_p X_{n-p} + \epsilon_n$$

where $\{\epsilon_n\}$ are independent, identically distributed (i.i.d.) random variables. The parameters $\beta_1, \ldots, \beta_p$ satisfy the usual stationarity constraints, namely all zeroes of the polynomial $z^p - \beta_1 z^{p-1} - \cdots - \beta_p$ have modulus less than 1.


Now assume that the true order $p$ is unknown but bounded by some finite constant $K(N)$. Our main purpose here will be to estimate $p$ by $\hat p$, where $\hat p$ will be obtained by minimizing a particular version of Akaike's Information Criterion (AIC) (Akaike, 1973) over the integers $\{0, 1, \ldots, K(N)\}$. Because we should be willing to examine a greater range of possible orders for our estimate as the number of observations increases, it makes sense to allow $K(N)$ to increase with $N$. In the finite variance case with $K(N) \equiv K$, AIC does not give a consistent estimate of $p$; instead $\hat p$ has a nondegenerate limit distribution concentrated on the integers $p, p+1, \ldots, K$ (Shibata, 1976).

For a $k$-dimensional parameter vector, AIC is defined as follows:

$$\mathrm{AIC} = -2 \ln(\text{maximized likelihood}) + 2k .$$

For autoregressive models, AIC is usually defined in terms of a Gaussian likelihood; so for a $k$-th order AR model it is defined as follows:

$$\phi(k) = N \ln \hat\sigma^2(k) + 2k$$

where $\hat\sigma^2(k)$ is the estimate of the innovations variance obtained from the Yule-Walker (YW) estimating equations. We will choose as our estimate of $p$ the value $\hat p$ which minimizes $\phi(k)$ for $k$ between 0 and $K$, that is,

$$\hat p = \arg\min_{0 \le k \le K} \phi(k) .$$

In the case where two or more orders achieve the minimum, we will take the smallest of those to be our estimate.
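To make the selection rule concrete, here is a minimal sketch of how $\hat p$ could be computed: the YW estimates $\hat\sigma^2(k)$ are obtained from the Levinson-Durbin recursion applied to the sample autocovariances, using the identity $\hat\sigma^2(k) = \hat\sigma^2(k-1)(1 - \hat\beta_k^2(k))$ (which reappears in the proof of Theorem 7). The function name and interface are our own, not part of the paper.

```python
import numpy as np

def yule_walker_aic(x, K):
    """Select an AR order by minimizing phi(k) = N*ln(sigma2_hat(k)) + 2k,
    where sigma2_hat(k) comes from the Levinson-Durbin (Yule-Walker)
    recursion.  Ties are broken in favor of the smallest order, as in
    the text."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    # sample autocovariances c_0, ..., c_K
    c = np.array([np.dot(x[:N - j], x[j:]) / N for j in range(K + 1)])
    sigma2 = c[0]                   # innovations variance estimate, order 0
    phi = [N * np.log(sigma2)]      # phi(0)
    a = np.zeros(0)                 # AR coefficients of the current fit
    for k in range(1, K + 1):
        # reflection coefficient = k-th sample partial autocorrelation
        rho_k = (c[k] - np.dot(a, c[k - 1:0:-1])) / sigma2
        a = np.concatenate([a - rho_k * a[::-1], [rho_k]])
        sigma2 *= 1.0 - rho_k ** 2  # sigma2_hat(k) = sigma2_hat(k-1)(1 - rho_k^2)
        phi.append(N * np.log(sigma2) + 2 * k)
    return int(np.argmin(phi))      # argmin returns the smallest minimizer
```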
For certain reasons, we may also want the autoregressive parameters to vary (with N)
over some region of the parameter space. For example, consider the following hypothesis
testing problem:

$H_0$: $X_n$ is white noise

versus

$H_a$: $X_n$ is a nondegenerate autoregressive process.

We can consider a sequence of parameter vectors $\{\beta^{(N)}\}$ converging to zero and examine the behaviour of a statistical test as the parameters converge to zero. For this purpose the set of parameters satisfying the stationarity condition is most conveniently described in the following sense: consider the set

$$\left\{ (\rho_1, \ldots, \rho_p) : \rho_j \in (-1, 1), \; j = 1, \ldots, p \right\} .$$

Following Barndorff-Nielsen and Schou (1973), one can parametrize an AR($p$) process by its $p$ partial autocorrelations, each lying in the interval $(-1, 1)$; moreover, one can show that this correspondence is one-to-one for an AR($p$) process. For order selection, the $\rho$-parametrization is somewhat more natural than the $\beta$-parametrization. That is, the "distance" between two autoregressive models with different orders is more easily seen in the $\rho$-parametrization.
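To illustrate why the $\rho$-parametrization is convenient, the Durbin-Levinson step below maps any point of $(-1, 1)^p$ to the coefficients of a stationary AR($p$) model; this is a sketch of the standard recursion underlying the parametrization, with names of our own choosing.

```python
import numpy as np

def pacf_to_ar(rho):
    """Map partial autocorrelations (rho_1, ..., rho_p), each in (-1, 1),
    to AR coefficients (beta_1, ..., beta_p) via the Durbin-Levinson step.
    Every such rho corresponds to a stationary AR(p) model."""
    beta = np.zeros(0)
    for r in rho:
        beta = np.concatenate([beta - r * beta[::-1], [r]])
    return beta

# For example, pacf_to_ar([0.5, -0.3]) returns the AR(2) coefficients
# whose first two partial autocorrelations are 0.5 and -0.3.
```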


1. Infinite variance autoregressions


We will be interested in the case where the innovations $\{\epsilon_n\}$ are in the domain of attraction of a stable law with index $\alpha \in (0, 2)$. If $E(|\epsilon_n|) < \infty$ then we will assume that $E(\epsilon_n) = 0$.

Recall that given observations $X_1, \ldots, X_N$ and known order $p$, it is possible to consistently estimate the AR parameters $\beta_1, \ldots, \beta_p$. In fact, for the LS estimates $\hat\beta_1(l), \ldots, \hat\beta_l(l)$, where $l \ge p$,

$$N^{1/\delta} \left( \hat\beta_k(l) - \beta_k \right) \to 0 \quad \text{a.s. for } \delta > \alpha$$

where $\beta_k = 0$ for $k > p$. For YW estimates, a slightly weaker result holds: convergence to 0 is in probability rather than almost sure.


We may also wish to consider AR models of the form

$$X_n - \mu = \beta_1 (X_{n-1} - \mu) + \cdots + \beta_p (X_{n-p} - \mu) + \epsilon_n$$

where $\mu$ is unknown and we retain the same assumptions on the $\beta_k$'s and $\{\epsilon_n\}$. It can be shown (Knight, 1987) that if we center the observed series by subtracting the sample mean $\bar X$ (i.e., replacing $X_n$ by $X_n - \bar X$ for $n = 1, \ldots, N$), we will still have

$$N^{1/\delta} \left( \hat\beta_k - \beta_k \right) \xrightarrow{p} 0 \quad \text{for } \delta > \max(1, \alpha) ,$$

and the convergence is almost sure for LS estimates. Depending on the location estimate $\hat\mu$ used to center the series (i.e., centering by subtracting $\hat\mu$ rather than $\bar X$), we may be able to improve this rate.

As stated earlier, we will want to vary the autoregressive parameters with $N$. For this reason, we will consider a triangular array of random variables

$$X_1^{(1)}; \quad X_1^{(2)}, X_2^{(2)}; \quad \ldots ; \quad X_1^{(N)}, \ldots, X_N^{(N)}; \quad \ldots$$

where each row is a finite realization of an AR($p$) process:

$$X_n^{(N)} = \sum_{j=1}^{p} \beta_j^{(N)} X_{n-j}^{(N)} + \epsilon_n^{(N)} .$$

The corresponding triangular array of innovations, $\{\epsilon_n^{(N)}\}_{n \le N}$, consists of row-wise independent random variables sampled from a common distribution which is in the domain of attraction of a stable law. Given a single i.i.d. sequence $\{\epsilon_n\}$, we could construct each element of the triangular array as follows:

$$X_n^{(N)} = \sum_{j=0}^{\infty} c_j(\beta^{(N)}) \, \epsilon_{n-j} .$$
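For simulation purposes, one way to realize a row of the triangular array is to run the AR($p$) recursion from zero initial conditions and discard a burn-in segment, which amounts to truncating the MA($\infty$) representation above at $j \le n$. The sketch below makes this explicit; the burn-in length is an assumption of ours, chosen so that the truncation error is negligible.

```python
import numpy as np

def ar_row_from_innovations(eps, beta, burn_in=500):
    """Build one row of the triangular array from a single innovation
    sequence eps by iterating X_n = sum_j beta_j X_{n-j} + eps_n.
    Starting from zero initial conditions truncates the MA(infinity)
    representation; the first `burn_in` values are discarded."""
    p = len(beta)
    x = np.zeros(len(eps))
    for n in range(len(eps)):
        for j in range(1, min(p, n) + 1):
            x[n] += beta[j - 1] * x[n - j]
        x[n] += eps[n]
    return x[burn_in:]
```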


We will require that the parameters $\beta^{(N)} = (\beta_1^{(N)}, \ldots, \beta_p^{(N)})$ are such that the corresponding partial autocorrelations $(\rho_1^{(N)}, \ldots, \rho_p^{(N)})$ lie within a compact subset of $(-1, 1)^p$. If $\rho_p^{(N)}$ tends to zero as $N$ tends to infinity, we can attempt to consistently estimate $p$ at the same time. (In the testing formulation, this amounts to asking for a test that is consistent against a sequence of local alternatives.) Intuitively, it would seem that the smaller $|\rho_p^{(N)}|$ is, the more difficult it should be to distinguish between a $p$-th order and a lower order AR model. From simulations, this does seem to be the case. This is the real motivation for allowing the parameters to vary with $N$. Consider the following example. Suppose we observe a $p$-th order AR process which has $\rho_p$ very close to zero (say $\rho_p = 0.1$). To estimate the order of the process, we use a procedure which we know to be consistent. So for $N$ large enough, we will select the true order with arbitrarily high probability. However, for moderate sized $N$, the probability of underestimating $p$ may be very high. Conversely, if $|\rho_p|$ is close to 1, then even for small $N$ there will be high probability of selecting the true order. So by allowing $\rho_p = \rho_p^{(N)}$ to shrink to zero with $N$, we may get some idea of the relative sample sizes needed to get the same probability of correct order selection for two different sets of AR parameters. If we view order selection as a hypothesis testing problem (say testing a null hypothesis of white noise versus autoregressive alternatives), shrinking $\rho_p^{(N)}$ to zero is similar in spirit to the sequence of contiguous alternative hypotheses to a null hypothesis considered in Pitman efficiency calculations.
We should note that the partial autocorrelations do not have their usual finite variance interpretation; however, they can be unambiguously defined in terms of the regular autocorrelations, which themselves can be unambiguously defined in terms of the linear process coefficients (see Davis and Resnick, 1985). Moreover, they can be estimated by the recursive YW estimates just as in the finite variance case. If we include an unknown location, $\mu$, in the model, we will assume that it is not of interest in itself; in a sense, $\mu$ is a nuisance parameter in this situation.

We will provide an answer to the following question: under what conditions (if any) on $K(N)$ and $(\beta_1^{(N)}, \ldots, \beta_p^{(N)})$ does AIC provide a consistent estimate of $p$? Bhansali (1983) conjectures that AIC may provide a consistent estimate of the order of an autoregressive process based on the rapid convergence of parameter estimates. However, he seems to conclude, from Monte Carlo results, that this may not be the case. If $K(N)$ is allowed to grow too fast then we may wind up severely overfitting much of the time; for example, $\hat p$ could equal $K(N)$ with high probability.

2. Theoretical Results
The main result of this paper is contained in Theorem 7; the first six results provide the
necessary machinery for Theorem 7. We begin by stating two results dealing with r-th
moments of martingales and submartingales.
Theorem 1. (Esseen and von Bahr, 1965) Let $S_n = \sum_{k=1}^{n} X_k$. If $E(X_n \mid S_{n-1}) = 0$ for $2 \le n \le N$ and $X_n \in L^r$ for $1 \le r \le 2$ then

$$E\left( |S_N|^r \right) \le 2 \sum_{n=1}^{N} E\left( |X_n|^r \right) .$$

(Note that $\{S_n, \sigma(S_n);\ n \ge 1\}$ is a martingale.)

Theorem 2. (cf. Chung, 1974, p. 346) If $\{X_n, \sigma(X_n);\ n \ge 1\}$ is a nonnegative $L^r$-submartingale for some $r > 1$ then

$$E\left[ \max_{1 \le n \le N} X_n^r \right] \le \left( \frac{r}{r-1} \right)^r E\left( X_N^r \right) . \qquad \square$$
The following lemma will allow us to ignore the dependence on $N$ of the moments of $\{X_n^{(N)}\}$ by virtue of being able to bound the moments uniformly for parameters within a compact set.

Lemma 3. Let $\{X_n(\beta)\}$ be a stationary AR($p$) process with parameter $\beta$ and innovations $\{\epsilon_n\}$ in the domain of attraction of a stable law with index $\alpha$. Then for $0 < \delta < \alpha$, $E\left( |X_n(\beta)|^\delta \right)$ is bounded uniformly for $\beta$ in a compact subset of the parameter space.

Proof. $X_n(\beta) = \sum_{j=0}^{\infty} c_j(\beta) \epsilon_{n-j}$ where $c_j(\beta)$ is a continuous function of $\beta$ for all $j$. Now

$$|X_n(\beta)| \le \sum_{j=0}^{\infty} |c_j(\beta)| \, |\epsilon_{n-j}| \le \sum_{j=0}^{\infty} a_j |\epsilon_{n-j}|$$

where $a_j = \sup_\beta |c_j(\beta)|$, the supremum being taken over the compact set. It can be shown that $|c_j(\beta)| \le C_\beta x^j$ for some $|x| < 1$, and so

$$\sum_{j=0}^{\infty} j^\gamma |a_j| < \infty \quad \text{for all } \gamma > 0 .$$

Under this summability condition, it follows from Cline (1983) that the random variable

$$X = \sum_{j=0}^{\infty} a_j |\epsilon_j|$$

is finite almost surely with

$$\lim_{x \to \infty} \frac{P[X > x]}{P[|\epsilon_0| > x]} = \sum_{j=0}^{\infty} a_j^\alpha < \infty .$$

This implies that $E(X^\delta)$ is finite for all $0 < \delta < \alpha$ and the result follows.
The following lemma will allow us to treat moments of $\sum X_n$ essentially the same as the moments of $\sum \epsilon_n$ when $\alpha > 1$.

Lemma 4. Let $\{X_n\}$ be a zero mean stationary AR($p$) process whose innovations are in the domain of attraction of a stable law with index $\alpha > 1$. Then for $1 < r < \alpha$,

(a) $\displaystyle E\left| \sum_{n=1}^{N} X_n \right|^r = O(N)$

(b) $\displaystyle E\left[ \max_{1 \le k \le N} \left| \sum_{n=1}^{k} X_n \right|^r \right] = O(N) .$

Proof. Write

$$\sum_{n=1}^{N} X_n = C \left[ \sum_{n=1}^{N} \epsilon_n + R_N \right]$$

where $C = \left[ 1 - \sum_{k=1}^{p} \beta_k \right]^{-1}$ and

$$|R_N| \le \left[ \max_{1 \le k \le p} |\beta_k| \right] p(p+1) \left[ \max_{1-p \le k \le N} |X_k| \right] .$$

Thus by Minkowski's Inequality,

$$\left( E\left| \sum_{n=1}^{N} X_n \right|^r \right)^{1/r} \le C \left[ \left( E\left| \sum_{n=1}^{N} \epsilon_n \right|^r \right)^{1/r} + \left( E |R_N|^r \right)^{1/r} \right]$$

and part (a) follows from Theorem 1 by noting that $E\left( |R_N|^r \right) = O(N)$. (It can actually be shown to be of smaller order by a uniform integrability argument.) Part (b) follows in the same way from Theorem 2 by noting that $\left\{ \left| \sum_{n=1}^{k} \epsilon_n \right| \right\}$ is a submartingale and using Minkowski's Inequality.


The following theorem deals with uniform convergence of both LS and YW autoregressive parameter estimates in the case where location is known.

Theorem 5. Assume known location $\mu$. Let $K(N) = O(N^\kappa)$ for some $0 < \kappa < 2/\alpha$ and let $\|v\|$ denote the Euclidean norm of the vector $v$. Then

(a) $\displaystyle \max_{p \le l \le K(N)} \sqrt{N} \left\| \hat\beta(l) - \beta^{(N)} \right\| \xrightarrow{p} 0$, where $\beta_k^{(N)} \equiv 0$ for $k > p$;

(b) $\displaystyle \max_{1 \le l \le K(N)} \sqrt{N} \left\| \hat\beta(l) - \tilde\beta(l) \right\| \xrightarrow{p} 0$, where $\tilde\beta(l)$ denotes the YW estimate.

Note that the vectors are not fixed length but may vary with $N$.
Proof. (a) The style of proof will mimic Hannan and Kanter (1977). For convenience we suppress the notation indicating the dependence of $\{X_n\}$, $\{\epsilon_n\}$ and $\beta$ on $N$. For $l \ge p$ the LS estimating equations can be reexpressed as follows:

$$C_l \left( \hat\beta(l) - \beta \right) = \Delta^{*}(l)$$

where $C_l$ is the matrix with entries $C_l(i,j) = \sum_{n=l+1}^{N} X_{n-i} X_{n-j}$ and $\Delta^{*}(l)$ has components

$$\Delta_l^{*}(j) = \sum_{n=l+1}^{N} \epsilon_n X_{n-j} .$$

Fix $0 < \kappa < 2/\alpha$ and set $K = K(N) = O(N^\kappa)$. For each $l$, $C_l$ is non-negative definite and so it suffices to show that for some $\sigma < 1/\alpha$,

(i) $\displaystyle N^{1/2 - 2\sigma} \max_{1 \le l \le K} \left\| \Delta^{*}(l) \right\| \xrightarrow{p} 0$

and

(ii) $\displaystyle \lambda_{\min}\left( N^{-2\sigma} C_l \right) \xrightarrow{p} \infty$ uniformly over $l \le K$,

since together (i) and (ii) give $\max_{p \le l \le K} \sqrt{N} \, \| \hat\beta(l) - \beta \| \xrightarrow{p} 0$.
To prove (i), it suffices to show that

$$E\left[ N^{(1 - 4\sigma)\gamma} \max_{1 \le l \le K} \left\| \Delta^{*}(l) \right\|^{2\gamma} \right] \to 0$$

for some $\gamma < \alpha/2$. Now

$$E\left[ N^{(1 - 4\sigma)\gamma} \max_{1 \le l \le K} \left\| \Delta^{*}(l) \right\|^{2\gamma} \right] \le \sum_{j=1}^{K} N^{(1 - 4\sigma)\gamma} E\left[ \max_{1 \le l \le K} \left| \sum_{n=l+1}^{N} \epsilon_n X_{n-j} \right|^{2\gamma} \right]$$

$$\le \sum_{j=1}^{K} N^{(1 - 4\sigma)\gamma} \left\{ E\left[ \max_{1 \le l \le K} \left| \sum_{n=1}^{l} \epsilon_n X_{n-j} \right|^{2\gamma} \right] + E\left[ \left| \sum_{n=1}^{N} \epsilon_n X_{n-j} \right|^{2\gamma} \right] \right\} = \sum_{j=1}^{K} N^{(1 - 4\sigma)\gamma} W_{N,j} .$$

If $2\gamma \le 1$ then

$$E\left[ \max_{1 \le l \le K} \left| \sum_{n=1}^{l} \epsilon_n X_{n-j} \right|^{2\gamma} \right] \le \sum_{n=1}^{N} E\left( |\epsilon_n X_{n-j}|^{2\gamma} \right) = O(N)$$

uniformly over $j$ between 1 and $K(N)$.

If $2\gamma > 1$ then $\alpha > 1$ and so $S_{k,j} = \sum_{n=1}^{k} \epsilon_n X_{n-j}$ is a martingale for each $j$; $|S_{k,j}|$ is an $L^{2\gamma}$-submartingale and so by Theorems 1 and 2,

$$E\left[ \max_{1 \le k \le N} |S_{k,j}|^{2\gamma} \right] \le \left( \frac{2\gamma}{2\gamma - 1} \right)^{2\gamma} E\left( |S_{N,j}|^{2\gamma} \right) = O(N)$$

uniformly over $j$. Similarly it can be shown that for all permissible values of $\gamma$, $W_{N,j} = O(N)$ uniformly over $j$ between 1 and $K(N)$. Thus for a given sequence $K(N) = O(N^\kappa)$, by taking $\sigma$ sufficiently close to $1/\alpha$ and $\gamma$ sufficiently close to $\alpha/2$, we will have

$$\sum_{j=1}^{K} N^{(1 - 4\sigma)\gamma} W_{N,j} \to 0$$

as desired.
To prove (ii), we define $X_{n,v}$ and $\epsilon_{n,v}$ as follows:

$$X_{n,v} = \sum_{k=1}^{l} v_k X_{n-k} , \qquad \epsilon_{n,v} = \sum_{k=1}^{l} v_k \epsilon_{n-k}$$

where $v = (v_1, \ldots, v_l)$ is a unit vector, $\sum_{k=1}^{l} v_k^2 = 1$. Since $v' C_l v \ge \sum_{n=K+1}^{N-K} X_{n,v}^2$ for every such $v$, it suffices to show that $N^{-2\sigma} \sum_{n=K+1}^{N-K} X_{n,v}^2 \xrightarrow{p} \infty$ uniformly over $l$ and unit vectors $v$.

Now note that $X_{n,v} = \sum_{j=1}^{p} \beta_j X_{n-j,v} + \epsilon_{n,v}$. By the triangle inequality,

$$\left( \sum_{n=K+1}^{N-K} X_{n,v}^2 \right)^{1/2} \ge \left( \sum_{n=K+1}^{N-K} \epsilon_{n,v}^2 \right)^{1/2} - \left( \sum_{n=K+1}^{N-K} \left[ \sum_{j=1}^{p} \beta_j X_{n-j,v} \right]^2 \right)^{1/2} .$$

Now

$$\sum_{n=K+1}^{N-K} \left[ \sum_{j=1}^{p} \beta_j X_{n-j,v} \right]^2 \le \left[ \sum_{j=1}^{p} \beta_j^2 \right] \sum_{j=1}^{p} \sum_{n=K+1}^{N-K} X_{n-j,v}^2 .$$

It remains only to show that $N^{-2\sigma} \sum_{n=K+1}^{N-K} \epsilon_{n,v}^2 \xrightarrow{p} \infty$; if this is true then $N^{-2\sigma} \sum_{n=K+1}^{N-K} X_{n,v}^2 \xrightarrow{p} \infty$, since the probability that this quantity stays bounded clearly must tend to zero.

Now

$$\sum_{n=K+1}^{N-K} \epsilon_{n,v}^2 = \sum_{k=1}^{l} v_k^2 \sum_{n=K+1}^{N-K} \epsilon_{n-k}^2 + 2 \sum_{k=2}^{l} \sum_{j=1}^{k-1} v_j v_k \sum_{n=K+1}^{N-K} \epsilon_{n-j} \epsilon_{n-k} .$$

Thus

$$N^{-2\sigma} \sum_{k=1}^{l} v_k^2 \sum_{n=K+1}^{N-K} \epsilon_{n-k}^2 \xrightarrow{p} \infty$$

uniformly over $l$ and unit vectors $v$, since each inner sum contains $\sum_{n=K+1}^{N-2K} \epsilon_n^2$ and $N^{-2\sigma} \sum_{n=K+1}^{N-2K} \epsilon_n^2 \xrightarrow{p} \infty$ (recall $\sigma < 1/\alpha$). Thus we need only show that the cross-product term is negligible, that is,

$$N^{-2\sigma} \sum_{k=2}^{l} \sum_{j=1}^{k-1} v_j v_k \sum_{n=K+1}^{N-K} \epsilon_{n-j} \epsilon_{n-k} \xrightarrow{p} 0$$

uniformly over $l$ and unit vectors $v$. Now take $\gamma < \alpha$ and note that $E\left( |\epsilon_{n-j} \epsilon_{n-k}|^\gamma \right) < \infty$ for $j \ne k$. If $\gamma < 1$ then

$$E\left| \sum_{n=K+1}^{N-K} \epsilon_{n-j} \epsilon_{n-k} \right|^\gamma \le \sum_{n=K+1}^{N-K} E\left( |\epsilon_{n-j} \epsilon_{n-k}|^\gamma \right) = O(N) .$$

If $\gamma \ge 1$ then necessarily $\alpha > 1$. Thus $S_l = \sum_{n=K+1}^{l} \epsilon_{n-j} \epsilon_{n-k}$ is an $L^\gamma$-martingale and hence

$$E\left| \sum_{n=K+1}^{N} \epsilon_{n-j} \epsilon_{n-k} \right|^\gamma = O(N)$$

uniformly over $j \ne k$ by Theorem 1. Now the cross-product term above tends to zero in probability as claimed, since $|v_k| \le 1$ for all $k$, and (ii) follows.


(b) From the definitions of $C_l$, $\bar C_l$, $\Delta_l$ and $\bar\Delta_l$ (the barred quantities being those of the YW estimating equations), it is easy to see that

(1a) $\displaystyle T_N \equiv \max_{1 \le l \le K} \max_{1 \le i, j \le l} \left| C_l(i,j) - \bar C_l(i,j) \right| \le 4 \sum_{n=1}^{K} X_n^2 + 4 \sum_{n=N-K+1}^{N} X_n^2$

and

(1b) a similar bound holds for $S_N$, the corresponding maximal difference between the components of $\Delta_l$ and $\bar\Delta_l$.

Thus using equations (1a) and (1b), we have

(2a) $N^{-2\sigma} T_N \xrightarrow{p} 0$

and

(2b) $N^{1/2 - 2\sigma} S_N \xrightarrow{p} 0$

for $\kappa < 2/\alpha$. Using some elementary facts about vector and matrix norms and equations (2a) and (2b), we get

(3a) $\displaystyle \max_{1 \le l \le K} \left\| N^{-2\sigma} \left( C_l - \bar C_l \right) \right\| \xrightarrow{p} 0$

and

(3b) $\displaystyle \max_{1 \le l \le K} N^{1/2 - 2\sigma} \left\| \Delta_l - \bar\Delta_l \right\| \xrightarrow{p} 0$,

where the matrix norm is the operator norm which corresponds to the Euclidean vector norm.

Now from the definitions of $\hat\beta(l)$ and $\tilde\beta(l)$, we get that $\sqrt{N} \, \| \hat\beta(l) - \tilde\beta(l) \| \xrightarrow{p} 0$ uniformly in $l$ by equation (3b). Finally we must show that the minimum eigenvalue of $N^{-2\sigma} \bar C_l$ tends in probability to infinity uniformly in $l$ since $\| (N^{-2\sigma} \bar C_l)^{-1} \|$ is (in the case of symmetric positive definite matrices) merely the reciprocal of this minimum eigenvalue. Note that for unit vectors $u$,

$$u' \left( N^{-2\sigma} \bar C_l \right) u \ge u' \left( N^{-2\sigma} C_l \right) u - l \, N^{-2\sigma} T_N \xrightarrow{p} \infty$$

uniformly over $l$ and unit vectors $u$, by condition (ii) of the proof of part (a) of this theorem and equation (3a) above. Therefore

$$\max_{1 \le l \le K} \sqrt{N} \, \left\| \hat\beta(l) - \tilde\beta(l) \right\| \xrightarrow{p} 0$$

as required.

In the case where we have an unknown location parameter and we estimate it with some location estimate $\hat\mu$, we can obtain the following corollary.

Corollary 6. 1. If $(\hat\mu - \mu) = O_p(N^{\gamma})$ for $\gamma \le \min\left( \frac{1}{\alpha} - \frac{1}{2}, \, 0 \right)$ uniformly over all compact subsets of the parameter space, then Theorem 5 still holds.

2. If $\alpha > 1$ and $K(N) = O(N^\kappa)$ as in Theorem 5, then the sample mean $\bar X$ satisfies this condition.
Proof. 1. (a) Assume without loss of generality that $\mu = 0$. We can again reexpress the LS estimating equations as follows:

$$C_l^{**} \left( \hat\beta(l) - \beta \right) = \Delta^{**}(l)$$

where now $C_l^{**}$ and $\Delta^{**}(l)$ are computed from the centered observations $X_n - \hat\mu$, and

$$\Delta_l^{**}(j) = \Delta_l^{*}(j) - \hat\mu \sum_{n=l+1}^{N} \epsilon_n - \left[ 1 - \sum_{k=1}^{p} \beta_k \right] \hat\mu \sum_{n=l+1}^{N} X_{n-j} + (N - l) \left[ 1 - \sum_{k=1}^{p} \beta_k \right] \hat\mu^2 .$$

By similar methods to those used in the proof of Theorem 5, it is easy to show that for some $\kappa < 2/\alpha$,

$$\max_{p \le l \le K} N^{1/2 - 2\sigma} \left\| \Delta^{**}(l) - \Delta^{*}(l) \right\| \xrightarrow{p} 0 .$$

(The term involving $\sum X_{n-j}$ is killed using Lemma 4.) In addition, using the conditions on $\hat\mu$, the remaining terms are negligible as well.

(b) Defining $T_N$ and $S_N$ analogously to the proof of Theorem 5, we again get that for some $\sigma < 1/\alpha$, $N^{-2\sigma} T_N \xrightarrow{p} 0$ and $N^{1/2 - 2\sigma} S_N \xrightarrow{p} 0$, and the rest of the proof follows as in the proof of Theorem 5.

2. Everything follows from the fact that for any $0 < \delta < \alpha$,

$$E\left[ \max_{1 \le l \le K} \left| \sum_{n=l+1}^{N} X_n \right|^\delta \right] = O(N)$$

which implies that

$$N^{-\tau} \max_{1 \le l \le K} \left| \sum_{n=l+1}^{N} X_n \right| \xrightarrow{p} 0 \quad \text{for } \tau > 1/\delta .$$

So by taking $\delta$ close to $\alpha$ and $\tau$ close to $1/\alpha$, we get $\bar X = O_p(N^{\tau - 1})$ with $\tau - 1$ arbitrarily close to $1/\alpha - 1$, and conclusions (a) and (b) follow directly from this.


Theorem 7. Suppose $K(N)$ and the location estimate (if any) satisfy the conditions of Theorem 5 or Corollary 6, so that conclusions (a) and (b) of Theorem 5 hold, and suppose

$$\liminf_{N \to \infty} N \left( \beta_p^{(N)} \right)^2 > 2p$$

(recall that $\beta_p = \rho_p$, the $p$-th partial autocorrelation). Then $\hat p \xrightarrow{p} p$.

Proof. First we note that since $\hat p$ is integer-valued, $\hat p \xrightarrow{p} p$ is equivalent to $P[\hat p = p] \to 1$ (as $N \to \infty$). From here on, we will refer to $K(N)$ as $K$ and to $\beta_k^{(N)}$ as $\beta_k$, thus suppressing the dependence on $N$. Moreover we will assume that the observations $X_n$ are already centered; that is, we have subtracted out the location estimate $\hat\mu$ (if we are assuming unknown location).

We now use the fact that

$$\hat\sigma^2(k) = \hat\sigma^2(0) \prod_{l=1}^{k} \left( 1 - \hat\beta_l^2(l) \right) \quad \text{for } k \ge 1$$

where

$$\hat\sigma^2(0) = \frac{1}{N} \sum_{n=1}^{N} X_n^2$$

and $\hat\beta_k(l)$ is the YW estimate of $\beta_k(l)$, the $k$-th coefficient in an AR($l$) model. Now

$$P[\hat p < p] \le P\left[ \min_{0 \le k < p} \phi(k) \le \phi(p) \right] .$$

Since

$$\phi(k) - \phi(p) = -N \sum_{l=k+1}^{p} \ln\left( 1 - \hat\beta_l^2(l) \right) - 2(p - k) ,$$

we can write

$$P[\hat p < p] \le P\left[ \prod_{l=k+1}^{p} \left( 1 - \hat\beta_l^2(l) \right) \ge \exp(-2p/N) \text{ for some } 0 \le k < p \right] \le P\left[ 1 - \hat\beta_p^2(p) \ge \exp(-2p/N) \right] \le P\left[ N \hat\beta_p^2(p) \le 2p \right] ,$$

so it suffices to show that

$$\limsup_{N \to \infty} P\left[ N \hat\beta_p^2(p) \le 2p \right] = 0 ,$$

which follows from the conclusions of Theorem 5 since $\liminf_{N \to \infty} N \beta_p^2 > 2p$.


We also have that

$$P[\hat p > p] \le P\left[ \phi(k) < \phi(k-1) \text{ for some } p < k \le K \right] \le P\left[ N \min_{p < k \le K} \ln\left( 1 - \hat\beta_k^2(k) \right) < -2 \right] .$$

Since the conclusions of Theorem 5 hold, it follows that

$$N \max_{p < k \le K} \hat\beta_k^2(k) \xrightarrow{p} 0$$

and hence

$$N \min_{p < k \le K} \ln\left( 1 - \hat\beta_k^2(k) \right) \xrightarrow{p} 0 .$$

Therefore $P[\hat p > p] \to 0$. Thus $P[\hat p \ne p] \to 0$ and so $P[\hat p = p] \to 1$, which implies that $\hat p \xrightarrow{p} p$.

The "practical" implication of this theorem is that if N is large, with high probability

P will equal

provided that

IlJp I

is not too small with respect to N. Or in other

words, for fixed (but large) N , the probability of selecting the correct order decreases as

IlJp I

decreases.

sample Monte Carlo results seem to bear

out.


3. Simulation results

For illustrative purposes, a small simulation study was carried out using four symmetric stable innovations distributions with α = 0.5, 1.2, 1.9 and 2.0 (the latter being the normal distribution). The underlying processes were AR(1) processes with the AR parameter β = 0.1, 0.5 and 0.9. The sample sizes considered were 100 and 900. For N = 100, the maximum order K was taken to be 10 while for N = 900, K was taken to be 15. 100 replications were made for each of the 24 possible arrangements of α, β and N. The results of the study are given in Tables 1 through 8.
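The following sketch shows how such a study can be set up: symmetric stable innovations are generated by the Chambers-Mallows-Stuck method, an AR(1) series is built with a burn-in, and the order is selected with the yule_walker_aic function sketched in Section 0. The seed, burn-in length and function names are our assumptions, not details reported in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumed seed, for reproducibility only

def sym_stable(alpha, size):
    """Symmetric alpha-stable variates via the Chambers-Mallows-Stuck method.
    Reduces to the standard Cauchy at alpha = 1 and is Gaussian at alpha = 2."""
    u = rng.uniform(-np.pi / 2, np.pi / 2, size)
    w = rng.exponential(1.0, size)
    if alpha == 1.0:
        return np.tan(u)
    return (np.sin(alpha * u) / np.cos(u) ** (1.0 / alpha)
            * (np.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))

def one_replication(alpha, beta, N, K, burn_in=500):
    """Generate one AR(1) series and return the order selected by AIC
    (yule_walker_aic is the sketch given in Section 0)."""
    eps = sym_stable(alpha, N + burn_in)
    x = np.zeros(N + burn_in)
    for n in range(1, N + burn_in):
        x[n] = beta * x[n - 1] + eps[n]  # AR(1) recursion
    return yule_walker_aic(x[burn_in:], K)

# e.g., one column of a table: frequencies of selected orders over
# 100 replications with alpha = 1.2, beta = 0.5, N = 100, K = 10
# orders = [one_replication(1.2, 0.5, 100, 10) for _ in range(100)]
```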

Estimated        AR parameter
order         0.1    0.5    0.9
0              89      0      1
1               4     91     87
2               4      3      2
3               0      1      1
4               1      0      0
5               0      3      2
6               1      1      2
7               0      0      3
8               0      0      1
9               1      1      0
10              0      0      1

Table 1: Frequency of selected order for AR(1) process. N = 100, α = 0.5.

Estimated        AR parameter
order         0.1    0.5    0.9
0               0      0      0
1              93     95     91
2               0      0      0
3               0      0      1
4               0      0      0
5               0      0      0
6               0      0      0
7               4      0      0
8               1      5      2
9               0      0      1
10-15           2      0      5

Table 2: Frequency of selected order for AR(1) process. N = 900, α = 0.5.
The results are much as expected. We can see that for N = 100 and β = 0.1, AIC underestimates the true order with high probability. For N = 900, the probabilities of selecting the true order increase over those for N = 100.

Estimated        AR parameter
order         0.1    0.5    0.9
0              70      0      0
1              15     86     86
2               7      7      4
3               3      3      1
4               1      3      1
5               0      1      0
6               0      0      2
7               0      2      1
8               2      1      1
9               0      0      0
10              0      0      0

Table 3: Frequency of selected order for AR(1) process. N = 100, α = 1.2.

Estimated        AR parameter
order         0.1    0.5    0.9
0               0      0      0
1              80     87     90
2               6      3      3
3               6      0      1
4               1      4      3
5               1      2      0
6               0      0      0
7               2      2      1
8               1      0      0
9               0      0      0
10-15           3      2      2

Table 4: Frequency of selected order for AR(1) process. N = 900, α = 1.2.

Estimated        AR parameter
order         0.1    0.5    0.9
0              57      0      0
1              25     76     71
2               5      8     10
3               2      5      9
4               3      6      6
5               2      1      2
6               3      1      0
7               1      1      1
8               1      0      0
9               0      0      1
10              1      2      0

Table 5: Frequency of selected order for AR(1) process. N = 100, α = 1.9.

Estimated        AR parameter
order         0.1    0.5    0.9
0               5      0      0
1              78     74     75
2               7     14      9
3               2      2      7
4               2      5      2
5               1      1      2
6               1      1      0
7               3      1      3
8               0      0      0
9               0      0      1
10-15           1      2      1

Table 6: Frequency of selected order for AR(1) process. N = 900, α = 1.9.

Estimated        AR parameter
order         0.1    0.5    0.9
0              63      0      0
1              25     75     75
2               4      3     12
3               1      6      2
4               0      7      7
5               2      2      3
6               2      4      0
7               1      0      1
8               1      1      0
9               0      2      0
10              1      0      0

Table 7: Frequency of selected order for AR(1) process. N = 100, Normal distribution.

Estimated        AR parameter
order         0.1    0.5    0.9
0               0      0      0
1              83     79     80
2               3      3     11
3               4      4      3
4               4      6      0
5               2      3      3
6               0      0      0
7               0      4      0
8               4      1      1
9               0      0      2
10-15           0      0      0

Table 8: Frequency of selected order for AR(1) process. N = 900, Normal distribution.

4. Comments

Bhansali and Downham (1977) propose a generalization of AIC. They propose to minimize

$$\phi'(k) = N \ln \hat\sigma^2(k) + \gamma k$$

where $\gamma \in (0, 4)$. It is easy to see from the proof of the above result that their criterion will also lead to consistent estimates of $p$ under similar conditions on $K(N)$ and $\beta^{(N)}$. In fact, if $\gamma = \gamma(N) > 0$ satisfies $\gamma(N)/N \to 0$, then the criterion corresponding to

$$\phi''(k) = N \ln \hat\sigma^2(k) + \gamma(N) k$$

will consistently estimate $p$. Specifically, with known location, the estimate will be consistent provided

$$\liminf_{N \to \infty} \frac{N}{\gamma(N)} \left( \beta_p^{(N)} \right)^2 > p$$

with $\gamma(N)$ bounded away from zero and with the same conditions on $K(N)$. With an appropriate choice of $\gamma(N)$, this criterion will also be consistent in the finite variance case. However if $\gamma(N)$ grows too quickly with $N$ then the criterion may seriously underestimate the true order $p$ in small samples, in both the finite and infinite variance cases. In an application such as autoregressive spectral density estimation (assuming now finite variance), underestimation is more serious than overestimation since, if the order is underestimated, the resulting spectral density estimate may be lacking important features which may indeed exist.
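For completeness, here is the same Levinson-Durbin computation with the general penalty $\gamma$ in place of 2, so that $\gamma = 2$ recovers AIC; the function name and the example penalty are our own choices.

```python
import numpy as np

def select_order_generalized(x, K, gamma):
    """Minimize phi''(k) = N*ln(sigma2_hat(k)) + gamma*k over k = 0, ..., K.
    gamma = 2 gives AIC; a slowly growing gamma(N) (with gamma(N)/N -> 0)
    gives the consistent variant discussed above."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    c = np.array([np.dot(x[:N - j], x[j:]) / N for j in range(K + 1)])
    sigma2, a = c[0], np.zeros(0)
    crit = [N * np.log(sigma2)]
    for k in range(1, K + 1):
        rho_k = (c[k] - np.dot(a, c[k - 1:0:-1])) / sigma2
        a = np.concatenate([a - rho_k * a[::-1], [rho_k]])
        sigma2 *= 1.0 - rho_k ** 2
        crit.append(N * np.log(sigma2) + gamma * k)
    return int(np.argmin(crit))

# For instance, gamma = np.log(N) (cf. BIC) satisfies gamma(N)/N -> 0
# while penalizing overfitting more heavily than AIC's gamma = 2.
```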

References

Akaike, H. (1973) Information theory and an extension of the maximum likelihood principle, in: Second International Symposium on Information Theory (eds. B. Petrov and F. Csaki), Akademiai Kiado, Budapest, 267-81.

Barndorff-Nielsen, O. and Schou, G. (1973) On the parametrization of autoregressive models by partial autocorrelations. J. Multivar. Anal. 3 408-19.

Bhansali, R. (1983) Order determination for processes with infinite variance, in: Robust and Nonlinear Time Series Analysis (eds. J. Franke, W. Hardle, R.D. Martin), Springer-Verlag, New York, 17-25.

Bhansali, R. and Downham, D. (1977) Some properties of the order of an autoregressive model selected by a generalization of Akaike's FPE criterion. Biometrika 64 547-51.

Chung, K.L. (1974) A Course in Probability Theory. Academic Press, New York.

Cline, D. (1983) Infinite series of random variables with regularly varying tails. Technical Report 83-24, Institute of Applied Mathematics and Statistics, University of British Columbia.

Davis, R. and Resnick, S. (1985) Limit theory for moving averages of random variables with regularly varying tail probabilities. Ann. Prob. 13 179-95.

Esseen, C. and von Bahr, B. (1965) Inequalities for the rth absolute moment of a sum of random variables, 1 ≤ r ≤ 2. Ann. Math. Statist. 36 299-303.

Hannan, E. and Kanter, M. (1977) Autoregressive processes with infinite variance. J. App. Prob. 14 411-15.

Knight, K. (1987) Rate of convergence of centred estimates of autoregressive parameters for infinite variance autoregressions. J. Time Ser. Anal. 8 51-60.

Shibata, R. (1976) Selection of the order of an autoregressive model by Akaike's information criterion. Biometrika 63 117-26.

Current address: Department of Statistics, University of British Columbia, 2021 West Mall, Vancouver, B.C., Canada V6T 1W5.
