by
Keith Knight
ABSTRACT

Suppose {X_n} is a p-th order autoregressive process whose innovations lie in the domain of attraction of a stable law, and the true order p is unknown. The estimate p̂ of p is obtained by minimizing a version of Akaike's Information Criterion over the orders 0, 1, ..., K(N). It is shown that p̂ is weakly consistent provided K(N) → ∞ as N → ∞ at a certain rate.
This research was supported by ONR Contract N00014-84-C-0169 and by NSF Grant MCS 83-01807, and was part of the author's Ph.D. dissertation. The author would like to thank his supervisor R. D. Martin for his support and encouragement in preparing this work.
1. Introduction
Consider a stationary p-th order autoregressive (AR(p)) process {X_n} with parameters β_1, ..., β_p:

    X_n = Σ_{j=1}^p β_j X_{n-j} + ε_n,

where {ε_n} is an i.i.d. sequence of innovations and the characteristic polynomial 1 - β_1 z - ... - β_p z^p has no roots on or inside the unit circle. The true order p is unknown, and we wish to estimate it from the observations X_1, ..., X_N.
The estimate p̂ of p will be obtained by minimizing a particular version of Akaike's Information Criterion (AIC) (Akaike, 1973) over the integers {0, 1, ..., K(N)}. Because we should be willing to examine a greater range of possible orders for our estimate as the number of observations increases, it makes sense to allow K(N) to increase with N. In the finite variance case with K(N) ≡ K, AIC does not give a consistent estimate of p; instead, the limiting distribution of p̂ is concentrated on the integers p, p + 1, ..., K. It should be noted that AIC is usually defined in terms of -2 times a maximized Gaussian log-likelihood plus 2 times the number of fitted parameters; so for a k-dimensional parameter vector the criterion is defined as follows:

    φ(k) = N ln σ̂²(k) + 2k,

where σ̂²(k) is the estimate of the innovations variance for the fitted k-th order model.
In the case where two or more orders achieve the minimum, we will take the smallest of
those to be our estimate.
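As an illustration, the selection rule just described (minimize φ(k) = N ln σ̂²(k) + 2k over k = 0, ..., K, breaking ties by the smallest order) can be sketched as follows. The use of the Levinson-Durbin recursion to compute σ̂²(k), and the name `aic_order`, are assumptions made for illustration only.

```python
import numpy as np

def aic_order(x, K):
    """Pick the AR order minimizing N*ln(sigma2_hat(k)) + 2k over k = 0..K.

    sigma2_hat(k) is the Yule-Walker innovations-variance estimate for a
    fitted k-th order model, computed via the Levinson-Durbin recursion.
    np.argmin returns the first minimizer, i.e. the smallest order in a tie.
    """
    x = np.asarray(x, dtype=float)
    N = len(x)
    # biased sample autocovariances r(0), ..., r(K)
    r = np.array([np.dot(x[:N - h], x[h:]) / N for h in range(K + 1)])
    sigma2 = r[0]
    crit = [N * np.log(sigma2)]          # phi(0): no parameters fitted
    phi = np.zeros(0)
    for k in range(1, K + 1):
        # partial autocorrelation (reflection coefficient) at lag k
        beta = (r[k] - np.dot(phi, r[k - 1:0:-1])) / sigma2
        phi = np.concatenate([phi - beta * phi[::-1], [beta]])
        sigma2 *= 1.0 - beta ** 2
        crit.append(N * np.log(sigma2) + 2 * k)
    return int(np.argmin(crit))
```

For a strongly autocorrelated AR(1) series, the selected order will be at least 1 with overwhelming probability.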
For certain reasons, we may also want the autoregressive parameters to vary (with N)
over some region of the parameter space. For example, consider the following hypothesis
testing problem:
    H_0: β_p = 0    versus    H_a: β_p = β_p^(N),

where {β_p^(N)} is a sequence converging to zero. For this purpose it is convenient to parametrize the model by the partial autocorrelations ρ_1, ..., ρ_p, so that the stationarity condition is easy to express: the stationary region is the set

    { (ρ_1, ..., ρ_p) : ρ_j ∈ (-1, 1), j = 1, ..., p },

each ρ_j lying in the open interval (-1, 1).
Suppose the innovations are in the domain of attraction of a stable law with index α ∈ (0, 2); then E(|ε_n|^δ) < ∞ for all δ < α. Hannan and Kanter (1977) showed that the least squares (LS) estimates (β̂_1(l), ..., β̂_l(l)) of a fitted l-th order model with l ≥ p satisfy

    N^{1/δ} (β̂_k(l) - β_k) → 0  a.s.

for any δ > α, where β_k = 0 for k > p. Moreover, it can be shown (Knight, 1987) that if we center the observed series by subtracting the sample mean X̄ (or, more generally, a location estimate μ̂ satisfying a mild condition), the parameter estimates still converge at the same rate.
As stated earlier, we will want to vary the autoregressive parameters with N. For this
reason, we will consider a triangular array of random variables
    X_1^(1)
    X_1^(2), X_2^(2)
    ...
    X_1^(N), X_2^(N), ..., X_N^(N)

where

    X_n^(N) = Σ_{j=1}^p β_j^(N) X_{n-j}^(N) + ε_n^(N),

with moving average representation

    X_n^(N) = Σ_{j=0}^∞ c_j^(N) ε_{n-j}^(N).
, we can attempt to
consistently estimate p at
p;N
are
to intmity and
to
compact)
= "
same time.
to zero as
the testmg
consistent test
Intuitively, the smaller |β_p| is, the harder it should be to distinguish between a p-th order and a lower order AR model. From
simulations, this does seem to be the case. This is the real motivation for allowing the
parameters to vary with N . Consider the following example. Suppose we observe a p -th
order AR process which has β_p very close to zero (say β_p = 0.1). To estimate the order
of the process, we use a procedure which we know to be consistent. So for N large enough,
we will select the true order with arbitrarily high probability. However, for moderate sized
N, the probability of underestimating p may be very high. Conversely, if |β_p| is close to 1, then even for small N there will be a high probability of selecting the true order. So by allowing β_p = β_p^(N)
to shrink to zero with N , we may get some idea of the relative sample
sizes needed to get the same probability of correct order selection for two different sets of
AR parameters. If we view order selection as a hypothesis testing problem (say testing a null
hypothesis of white noise versus autoregressive alternatives), shrinking
β_p^(N) to zero is not unlike considering a sequence of local alternatives, as is done in the finite variance case.
Since we fit the autoregressive model using recursive Yule-Walker (YW) estimates, we do not assume that the true order is known; in a sense, the order p plays the role of a nuisance parameter in this situation.
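The point of the example above can be checked with a small Monte Carlo sketch. The settings and names here are hypothetical, and the order is selected by the AIC rule of the introduction, with σ̂²(k) computed (as an assumption) by the Levinson-Durbin recursion:

```python
import numpy as np

def aic_order(x, K):
    """AIC order selection via the Levinson-Durbin recursion (ties -> smallest)."""
    N = len(x)
    r = np.array([np.dot(x[:N - h], x[h:]) / N for h in range(K + 1)])
    sigma2, phi, crit = r[0], np.zeros(0), [N * np.log(r[0])]
    for k in range(1, K + 1):
        b = (r[k] - np.dot(phi, r[k - 1:0:-1])) / sigma2
        phi = np.concatenate([phi - b * phi[::-1], [b]])
        sigma2 *= 1.0 - b ** 2
        crit.append(N * np.log(sigma2) + 2 * k)
    return int(np.argmin(crit))

def detection_rate(beta, N, reps, rng, K=10):
    """Fraction of replications in which AIC selects an order >= 1 (true order 1)."""
    hits = 0
    for _ in range(reps):
        e = rng.standard_normal(N)
        x = np.empty(N)
        x[0] = e[0]
        for n in range(1, N):
            x[n] = beta * x[n - 1] + e[n]
        hits += aic_order(x, K) >= 1
    return hits / reps

rng = np.random.default_rng(0)
weak = detection_rate(0.1, 100, 100, rng)    # beta near zero: often missed
strong = detection_rate(0.9, 100, 100, rng)  # beta near one: nearly always found
```

With N = 100, the detection rate for β = 0.9 should be far higher than for β = 0.1, matching the discussion above.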
We will provide an answer to the following question: under what conditions (if any) on
K(N) and (β_1^(N), ..., β_p^(N)) does AIC provide a consistent estimate of p? Bhansali
(1983) conjectures that AIC may provide a consistent estimate of the order of an
autoregressive process based on the rapid convergence of parameter estimates. However, he
seems to conclude, from Monte Carlo results, that this may not be the case. If K (N) is
allowed to grow too fast, then we may wind up severely overfitting much of the time.
2. Theoretical Results
The main result of this paper is contained in Theorem 7; the first six results provide the
necessary machinery for Theorem 7. We begin by stating two results dealing with r-th
moments of martingales and submartingales.
Theorem 1. (Esseen and von Bahr, 1965) Let S_n = Σ_{k=1}^n X_k. If E(X_n | S_{n-1}) = 0 for 2 ≤ n ≤ N and E(|X_n|^r) < ∞ for 1 ≤ r ≤ 2, then

    E(|S_N|^r) ≤ 2 Σ_{n=1}^N E(|X_n|^r).  □
The following lemma will allow us to ignore the dependence on N of the moments of the X_n^(N)'s.

Lemma 3. Suppose {X_n(β)} is a stationary AR(p) process whose innovations {ε_n} are in the domain of attraction of a stable law with index α, and whose parameter vector β lies in a compact subset B of the stationary region. Then for 0 < δ < α,

    sup_{β ∈ B} E(|X_n(β)|^δ) < ∞.

Proof. Write

    X_n(β) = Σ_{j=0}^∞ c_j(β) ε_{n-j},

where c_j(β) is the j-th coefficient of the moving average representation. Now

    |X_n(β)| ≤ Σ_{j=0}^∞ |c_j(β)| |ε_{n-j}| ≤ Σ_{j=0}^∞ a_j |ε_{n-j}|,

where a_j = sup_{β ∈ B} |c_j(β)|. It can be shown that |c_j(β)| ≤ C j^p x^j for some constant C and some |x| < 1 uniformly over B, and so Σ_{j=0}^∞ a_j^α < ∞. Setting X = Σ_{j=0}^∞ a_j |ε_j|, we have

    lim_{x→∞} P(X > x) / P(|ε_1| > x) = Σ_{j=0}^∞ a_j^α < ∞

by the results of Davis and Resnick (1985). This implies that E(X^δ) is finite for all 0 < δ < α and the result follows.  □
The following lemma will allow us to treat moments of sums of the X_n's as moments of sums of the ε_n's when α > 1.

Lemma 4. Suppose {X_n} is a zero mean stationary AR(p) process whose innovations are in the domain of attraction of a stable law with index α > 1. Then for 1 < r < α,

    (a) E[ | Σ_{n=1}^N X_n |^r ] = O(N);

    (b) E[ max_{1≤k≤N} | Σ_{n=1}^k X_n |^r ] = O(N).
Proof. Summing the defining equation of the process, we can write

    Σ_{n=1}^N X_n = C Σ_{n=1}^N ε_n + R_N,

where C = [1 - Σ_{k=1}^p β_k]^{-1} and

    |R_N| ≤ [ max_{1≤k≤p} |β_k| ] p(p+1) [ max_{1-p≤k≤N} |X_k| ].

Thus by Minkowski's inequality,

    (E[ | Σ_{n=1}^N X_n |^r ])^{1/r} ≤ C (E[ | Σ_{n=1}^N ε_n |^r ])^{1/r} + (E[ |R_N|^r ])^{1/r}.

The first term on the right side is O(N^{1/r}) by Theorem 1, and E[|R_N|^r] = O(N) since E[max_k |X_k|^r] ≤ Σ_k E(|X_k|^r). Part (a) follows; part (b) is proved by a similar argument.  □
Theorem 5. Let β̂(l) = (β̂_1(l), ..., β̂_l(l)) denote the LS estimates and β̃(l) the corresponding YW estimates for a fitted l-th order model (the true parameter vector being padded with zeros to length l). Then

    (a) √N max_{p≤l≤K(N)} || β̂(l) - β^(N) || →p 0, and

    (b) √N max_{1≤l≤K(N)} || β̂(l) - β̃(l) || →p 0.

Note that the vectors are not of fixed length but may vary with N.
Proof. (a) The style of proof will mimic Hannan and Kanter (1977). For convenience we suppress the notation indicating the dependence of {X_n}, {ε_n} and β on N. For l ≥ p the LS estimating equations can be reexpressed as follows:
    N Ĉ_l (β̂(l) - β(l)) = ( Σ_{n=l+1}^N ε_n X_{n-j} )_{j=1,...,l},

where Ĉ_l is the l × l matrix of sample second moments of the observations. Two facts are needed:

    (i) the martingale sums Σ_{n=l+1}^N ε_n X_{n-j} are uniformly of smaller order than the diagonal of N Ĉ_l; and

    (ii) the minimum eigenvalue of N^{-1} Ĉ_l is bounded away from zero in probability, uniformly over l ≤ K.

To prove (i), fix κ with 0 < κ < 2/α. Since E(ε_n X_{n-j} | ε_1, ..., ε_{n-1}) = 0, Theorem 1 (together with Lemma 3, which bounds the moments of the X_n's uniformly in N) can be applied to the sums Σ_n ε_n X_{n-j}; splitting each maximum as

    max_{1≤l≤K} | Σ_{n=l+1}^N ε_n X_{n-j} | ≤ max_{1≤l≤K} | Σ_{n=1}^l ε_n X_{n-j} | + | Σ_{n=1}^N ε_n X_{n-j} |,

one obtains

    N^{-(1-κ)} max_{1≤l≤K} | Σ_{n=l+1}^N ε_n X_{n-j} | →p 0

uniformly over j. Similarly it can be shown that the remaining terms W_{N,j} are o(N) uniformly over j between 1 and K(N). Thus for a given sequence K(N) = o(N), by taking κ sufficiently close to 2/α, we will have (i), as desired.
To prove (ii), we define X_{n,v} and ε_{n,v} as follows:
    ε_{n,v} = Σ_{k=1}^l v_k ε_{n-k},    X_{n,v} = Σ_{k=1}^l v_k X_{n-k},

where v is a unit vector (Σ_{k=1}^l v_k² = 1). From the defining equation of the process,

    X_{n,v} = Σ_{j=1}^p β_j X_{n-j,v} + ε_{n,v}.

For the quadratic form Σ_{n=K+1}^{N-K} X_{n,v}² to remain bounded after normalization, the quantity Σ_{n=K+1}^{N-K} ε_{n,v}² would have to remain bounded as well; we show that the probability of this must tend to zero. Now

    Σ_{n=K+1}^{N-K} ε_{n,v}² = Σ_k v_k² Σ_{n=K+1}^{N-K} ε_{n-k}² + 2 Σ_{k=2}^l v_k Σ_{j=1}^{k-1} v_j Σ_{n=K+1}^{N-K} ε_{n-j} ε_{n-k}.

Since Σ_k v_k² = 1, the first term behaves like Σ_{n=K+1}^{N-K} ε_n², and

    N^{-1} Σ_{n=K+1}^{N-K} ε_n² → ∞

because the innovations have infinite variance. Each cross-product sum Σ_{n=K+1}^N ε_{n-j} ε_{n-k} (j < k) is an L_r-martingale, and hence by Theorem 1 it is of smaller order than Σ_n ε_n², uniformly over j and k.
Now define

    T_N = max_{1≤l≤K} max_{1≤i,j≤l} | C̄_l(i,j) - Ĉ_l(i,j) |,    (1a)

where C̄_l denotes the YW analogue of Ĉ_l. Since the two matrices differ only in edge terms,

    T_N ≤ 4 N^{-1} [ Σ_{n=1}^K X_n² + Σ_{n=N-K+1}^N X_n² ].    (1b)

By some elementary inequalities relating the maximum elementwise difference to the matrix norm, the difference between the two sets of estimating equations is negligible uniformly in l. Finally we must show that the minimum eigenvalue of N^{-1} Ĉ_l is bounded away from zero in probability, since the norm of the inverse matrix is (in the case of symmetric positive definite matrices) merely the reciprocal of this minimum eigenvalue; this is precisely claim (ii). Note that for unit vectors v the quadratic form v'(N^{-1} Ĉ_l)v is bounded below as shown above, as required.  □
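Part (b) of Theorem 5 says the least squares and Yule-Walker fits agree asymptotically: the two sets of estimating equations differ only through edge terms, so their gap shrinks like 1/N. A quick numerical sketch of this for an AR(1) with Gaussian innovations (a hypothetical check, not part of the paper's argument):

```python
import numpy as np

def ls_ar1(x):
    # least squares: regress x[n] on x[n-1]
    return np.dot(x[1:], x[:-1]) / np.dot(x[:-1], x[:-1])

def yw_ar1(x):
    # Yule-Walker: lag-1 sample autocorrelation (biased autocovariances)
    return (np.dot(x[1:], x[:-1]) / len(x)) / (np.dot(x, x) / len(x))

rng = np.random.default_rng(3)
gaps = {}
for N in (200, 2000, 20000):
    e = rng.standard_normal(N)
    x = np.empty(N)
    x[0] = e[0]
    for n in range(1, N):
        x[n] = 0.6 * x[n - 1] + e[n]     # AR(1) with beta = 0.6
    gaps[N] = abs(ls_ar1(x) - yw_ar1(x))
```

The two one-line estimators share the same numerator and differ only in one squared edge term in the denominator, which is the 1/N effect just described.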
In the case where we have an unknown location parameter and we estimate it with some
location estimate μ̂, we can obtain the following corollary.
Corollary 6. If the location estimate μ̂ satisfies N^γ (μ̂ - μ) →p 0 for an appropriate γ > 0, and the parameter vectors lie in compact subsets of the stationary region, then the conclusions of Theorem 5 continue to hold for the estimates computed from the centered observations X_n - μ̂, with the same condition on K = K(N).
Proof. 1. (a) Assume without loss of generality that μ = 0. We can again reexpress the LS estimating equations as follows:
    η_l**(j) = η_l(j) + (N - l) [ 1 - Σ_{k=1}^p β_k ] μ̂²,

where now

    η_l(j) = Σ_{n=l+1}^N ε_n X_{n-j}

and η_l** is the analogue of η_l computed from the centered observations. By similar methods to those used in the proof of Theorem 5, it is easy to show that for some κ < 2/α,

    max_{p≤l≤K} N^{-(1-κ)} | Σ_{n=l+1}^N X_{n-j} | →p 0,

so that the extra terms involving μ̂ are asymptotically negligible.
(b) Defining T_N and S_N analogously to the proof of Theorem 5, we again get the corresponding bound for some κ < 1/α.

2. Everything follows from the fact that for any 0 < δ < α,

    E[ max_{1≤l≤K} Σ_{n=l+1}^N |X_{n-j}|^δ ] = O(N);

taking δ sufficiently close to α, we get the analogue of part (b) of Theorem 5.  □
The main result, Theorem 7, states that under the conditions above on K(N) and the β_k^(N), p̂ is a weakly consistent estimate of p; that is, P[p̂ = p] → 1 (as N → ∞). From here on, we will refer to K(N) as K and to β_k^(N) as β_k. Moreover, we will assume that the observations X_n are already centered; that is, we have subtracted out the location estimate μ̂ (if we are assuming unknown location).
We now use the fact that

    σ̂²(k) = σ̂²(0) Π_{l=1}^k (1 - β̂_l²(l))    for k ≥ 1,

where

    σ̂²(0) = N^{-1} Σ_{n=1}^N X_n².

First consider underestimation:

    P[p̂ < p] ≤ P[ min_{0≤k<p} φ(k) ≤ φ(p) ].
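The variance recursion invoked here (each new partial autocorrelation multiplies the innovations-variance estimate by 1 - β̂_k²(k)) can be checked numerically against a direct solve of the Yule-Walker equations. This is an illustrative sketch with hypothetical names, not code from the paper:

```python
import numpy as np

def pacf_and_var(r):
    """Levinson-Durbin: partial autocorrelations beta_k(k) and innovation
    variances sigma2(k) for orders 0..len(r)-1, from autocovariances r."""
    sigma2 = [r[0]]
    pacf = []
    phi = np.zeros(0)
    for k in range(1, len(r)):
        b = (r[k] - np.dot(phi, r[k - 1:0:-1])) / sigma2[-1]
        phi = np.concatenate([phi - b * phi[::-1], [b]])
        pacf.append(b)
        sigma2.append(sigma2[-1] * (1.0 - b ** 2))
    return np.array(pacf), np.array(sigma2)

rng = np.random.default_rng(2)
N = 4000
x = rng.standard_normal(N)
for n in range(1, N):
    x[n] += 0.6 * x[n - 1]                      # AR(1) sample path
r = np.array([np.dot(x[:N - h], x[h:]) / N for h in range(6)])
pacf, s2 = pacf_and_var(r)

# direct Yule-Walker solution at order 5, for comparison
R = np.array([[r[abs(i - j)] for j in range(5)] for i in range(5)])
phi5 = np.linalg.solve(R, r[1:6])
direct = r[0] - np.dot(phi5, r[1:6])
```

Here `s2[5]` should agree both with `direct` and with `r[0]` times the product of the factors 1 - β̂_k²(k), which is exactly the identity used in the proof.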
Since 1 - exp(-2p/N) ≤ 2p/N, we can write

    P[p̂ < p] ≤ P[ 1 - β̂_p²(p) ≥ exp(-2p/N) ] ≤ P[ N β̂_p²(p) ≤ 2p ],

and the last probability tends to zero since N β̂_p²(p) →p ∞ under the conditions on β_p^(N). Similarly, p̂ > p requires that N ln(1 - β̂_k²(k)) < -2 for some k with p < k ≤ K, and hence

    P[p̂ > p] ≤ Σ_{k=p+1}^K P[ N ln(1 - β̂_k²(k)) < -2 ] → 0.

Therefore P[p̂ < p] → 0 and P[p̂ > p] → 0, and so P[p̂ = p] → 1 for the given sequence of parameter values.  □
The "practical" implication of this theorem is that if N is large, with high probability
p̂ will equal p, provided that |β_p| is not too small. In other words, for fixed (but large) N, the probability of selecting the correct order decreases as |β_p| decreases. This is borne out by the simulation results in the next section.
3. Simulation results
For illustrative purposes, a small simulation study was carried out using four symmetric
stable innovations distributions with α = 0.5, 1.2, 1.9 and 2.0 (the latter being the normal distribution). The underlying processes were AR(1) processes with AR parameter β = 0.1, 0.5 and 0.9. The sample sizes considered were 100 and 900. For N = 100, the maximum order K was taken to be 10, while for N = 900, K was taken to be 15. One hundred replications were made for each of the 24 possible combinations of α, β and N. The results of the study are given in Tables 1 through 8.
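Symmetric stable innovations such as those used in this study can be generated with the Chambers-Mallows-Stuck method. The sketch below (hypothetical names, unit scale) covers the α values above, with α = 2 reducing to a normal distribution with variance 2 and α = 1 to the Cauchy:

```python
import numpy as np

def sym_stable(alpha, size, rng):
    """Symmetric alpha-stable variates via the Chambers-Mallows-Stuck method."""
    U = rng.uniform(-np.pi / 2, np.pi / 2, size)   # uniform angle
    W = rng.exponential(1.0, size)                 # unit exponential
    if alpha == 1.0:
        return np.tan(U)                           # Cauchy case
    return (np.sin(alpha * U) / np.cos(U) ** (1.0 / alpha)
            * (np.cos((1.0 - alpha) * U) / W) ** ((1.0 - alpha) / alpha))
```

An AR(1) path with these innovations is then built as x[n] = β x[n-1] + e[n].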
Table 1. Frequencies of estimated orders (N = 100).

    Estimated        AR parameter
    order         0.1    0.5    0.9
      0            89      0      1
      1             4     91     87
      2             4      3      2
      3             0      1      1
      4             1      0      0
      5             0      3      2
      6             1      1      2
      7             0      0      3
      8             0      0      1
      9             1      1      0
     10             0      0      1
Table 2. Frequencies of estimated orders (N = 900).

    Estimated        AR parameter
    order         0.1    0.5    0.9
      0             0      0      0
      1            93     95     91
      2             0      0      0
      3             0      0      1
      4             0      0      0
      5             0      0      0
      6             0      0      0
      7             4      0      0
      8             1      5      2
      9             0      0      1
     10-15          2      0      5
Note that for β = 0.1 and N = 100, AIC underestimates the true order with high probability. For N = 900, the probabilities of selecting the true order increase over those for N = 100.
Table 3. Frequencies of estimated orders (N = 100).

    Estimated        AR parameter
    order         0.1    0.5    0.9
      0            70      0      0
      1            15     86     86
      2             7      7      4
      3             3      3      1
      4             1      3      1
      5             0      1      0
      6             0      0      2
      7             0      2      1
      8             2      1      1
      9             0      0      0
     10             0      0      0
Table 4. Frequencies of estimated orders (N = 900).

    Estimated        AR parameter
    order         0.1    0.5    0.9
      0             0      0      0
      1            80     87     90
      2             6      3      3
      3             6      0      1
      4             1      4      3
      5             1      2      0
      6             0      0      0
      7             2      2      1
      8             1      0      0
      9             0      0      0
     10-15          3      2      2
Table 5. Frequencies of estimated orders (N = 100).

    Estimated        AR parameter
    order         0.1    0.5    0.9
      0            57      0      0
      1            25     76     71
      2             5      8     10
      3             2      5      9
      4             3      6      6
      5             2      1      2
      6             3      1      0
      7             1      1      1
      8             1      0      0
      9             0      0      1
     10             1      2      0
Table 6. Frequencies of estimated orders (N = 900).

    Estimated        AR parameter
    order         0.1    0.5    0.9
      0             5      0      0
      1            78     74     75
      2             7     14      9
      3             2      2      7
      4             2      1      2
      5             1      1      2
      6             1      3      0
      7             1      0      3
      8             0      0      0
      9             0      5      1
     10-15          3      0      1
Table 7. Frequencies of estimated orders (N = 100).

    Estimated        AR parameter
    order         0.1    0.5    0.9
      0            63      0      0
      1            25     75     75
      2             4      3     12
      3             1      6      2
      4             0      7      7
      5             2      2      3
      6             2      4      0
      7             1      0      1
      8             1      1      0
      9             0      2      0
     10             1      0      0
Table 8. Frequencies of estimated orders (N = 900).

    Estimated        AR parameter
    order         0.1    0.5    0.9
      0             0      0      0
      1            83     79     80
      2             3      3     11
      3             4      4      3
      4             4      6      0
      5             2      3      3
      6             0      0      0
      7             0      4      0
      8             4      1      1
      9             0      0      2
     10-15          0      0      0
4. Comments
Criteria of the form N ln σ̂²(k) + γk with γ > 2 have also been proposed; it follows from the above result that such a criterion will also lead to consistent estimates of p under similar conditions on K(N) and β_p^(N). In fact, if γ = γ(N) > 0 satisfies γ(N)/N → 0, then the criterion corresponding to φ(k) = N ln σ̂²(k) + γ(N)k will consistently estimate p provided that

    N (β_p^(N))² / γ(N) → ∞,

with γ(N) bounded away from zero and with the same conditions on K(N). With an appropriate choice of γ(N), this criterion will also be consistent in the finite variance case. However, if γ(N) grows too quickly with N, then the criterion may seriously underestimate the true order p in small samples, in both the finite and infinite variance cases. In an
application such as autoregressive spectral density estimation (assuming now finite
variance), underestimation is more serious than overestimation since, if the order is
underestimated, the resulting spectral density estimate may be lacking important features
which may indeed exist.
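The modified criterion discussed above differs from AIC only in the penalty weight, so it can be sketched by a one-line change to the AIC rule (a hypothetical helper, with σ̂²(k) again computed by the Levinson-Durbin recursion). Since the penalty is linear in k, raising the weight can never increase the selected order, which is the mechanism behind the underestimation risk just described:

```python
import numpy as np

def penalized_order(x, K, gamma):
    """Minimize phi(k) = N*ln(sigma2_hat(k)) + gamma*k over k = 0..K.

    gamma = 2 recovers AIC; gamma = ln(N) gives a BIC-type criterion.
    Ties are broken toward the smallest order.
    """
    x = np.asarray(x, dtype=float)
    N = len(x)
    r = np.array([np.dot(x[:N - h], x[h:]) / N for h in range(K + 1)])
    sigma2, phi = r[0], np.zeros(0)
    crit = [N * np.log(sigma2)]
    for k in range(1, K + 1):
        b = (r[k] - np.dot(phi, r[k - 1:0:-1])) / sigma2
        phi = np.concatenate([phi - b * phi[::-1], [b]])
        sigma2 *= 1.0 - b ** 2
        crit.append(N * np.log(sigma2) + gamma * k)
    return int(np.argmin(crit))
```

For any fixed series, the order selected with a larger penalty weight is never larger than the order selected with a smaller one; this follows directly from the linearity of the penalty.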
References

Akaike, H. (1973) Information theory and an extension of the maximum likelihood principle, in: Second International Symposium on Information Theory (eds. B. N. Petrov and F. Csáki). Akadémiai Kiadó, Budapest.

Bhansali, R. (1983) Order determination for processes with infinite variance, in: Robust and Nonlinear Time Series Analysis (eds. J. Franke, W. Härdle, R. D. Martin). Springer-Verlag, New York.

Chung, K. L. (1974) A Course in Probability Theory. Academic Press, New York.

Cline, D. Technical Report.

Davis, R. and Resnick, S. (1985) Limit theory for moving averages of random variables with regularly varying tail probabilities. Ann. Prob. 13 179-95.

Esseen, C. and von Bahr, B. (1965) Inequalities for the rth absolute moment of a sum of random variables, 1 ≤ r ≤ 2. Ann. Math. Statist. 36 299-303.

Hannan, E. and Kanter, M. (1977) Autoregressive processes with infinite variance. J. App. Prob. 14 411-15.

Knight, K. (1987) Rate of convergence of centred estimates of autoregressive parameters for infinite variance autoregressions. J. Time Series Anal.

Current address: V6T 1W5