BY GRAHAM ELLIOTT, THOMAS J. ROTHENBERG, AND JAMES H. STOCK^1

The asymptotic power envelope is derived for point-optimal tests of a unit root in the autoregressive representation of a Gaussian time series under various trend specifications. We propose a family of tests whose asymptotic power functions are tangent to the power envelope at one point and are never far below the envelope. When the series has no deterministic component, some previously proposed tests are shown to be asymptotically equivalent to members of this family. When the series has an unknown mean or linear trend, commonly used tests are found to be dominated by members of the family of point-optimal invariant tests. We propose a modified version of the Dickey-Fuller t test which has substantially improved power when an unknown mean or trend is present. A Monte Carlo experiment indicates that the modified test works well in small samples.
KEYWORDS: Power envelope, point optimal tests, nonstationarity, Ornstein-Uhlenbeck processes.

1. INTRODUCTION

FOLLOWING THE SEMINAL WORK of Fuller (1976) and Dickey and Fuller (1979), econometricians have developed numerous alternative procedures for testing the hypothesis that a univariate time series is integrated of order one against the hypothesis that it is integrated of order zero. The procedures typically are based on second-order sample moments, but employ various testing principles and a variety of methods to eliminate nuisance parameters. Banerjee et al. (1993) and Stock (1994) survey many of the most popular of these tests. Although numerical calculations (e.g., Nabeya and Tanaka (1990)) suggest that the power functions for the tests can differ substantially, no general optimality theory has been developed. In particular, there are few general results (even asymptotic) concerning the relative merits of the competing testing principles and of the various methods for eliminating trend parameters.
Employing a model common in the previous literature, we assume that the data $y_1, \ldots, y_T$ were generated as

(1)  $y_t = d_t + u_t, \qquad u_t = \alpha u_{t-1} + v_t \qquad (t = 1, \ldots, T),$

where $\{d_t\}$ is a deterministic component and $\{v_t\}$ is an unobserved stationary zero-mean error process whose spectral density function is positive at zero frequency. Our interest is in the null hypothesis $\alpha = 1$ (which implies the $y_t$ are integrated of order one) versus $|\alpha| < 1$ (which implies the $y_t$ are integrated of order zero). Standard asymptotic testing theory, as surveyed for example in Engle (1984), is inapplicable since the data do not give rise to a locally asymptotic normal likelihood.^2 Nevertheless, it is possible to develop an asymptotic framework for comparing alternative tests for a unit root in this model. If the distribution of the data were otherwise known, the Neyman-Pearson Lemma gives us the best test against any given point alternative $\bar\alpha$. The power of this test, when plotted against $\bar\alpha$, defines the power envelope which is an upper bound for the power function of any test based on the same likelihood. Using large-sample approximations to simplify the analysis, we can then compare the asymptotic power functions of existing tests with this asymptotic bound. In practice, of course, the likelihood function will depend on additional nuisance parameters determining $d_t$, $u_0$, and the distribution of $\{v_t\}$. If there exist feasible tests with the same asymptotic power as the Neyman-Pearson point-optimal tests, the comparison will be appropriate in the nuisance parameter case as well.

^1 The authors thank Jushan Bai, Maxwell King, Sastry Pantula, Pierre Perron, and Mark Watson for helpful discussions. This research was supported in part by National Science Foundation Grant SES-91-22463.

G. ELLIOTT, T. J. ROTHENBERG, AND J. H. STOCK / AUTOREGRESSIVE UNIT ROOT

When the observed time series is Gaussian with constant or slowly evolving deterministic component, we find that, although no test uniformly attains the asymptotic power bound, there exist tests with asymptotic power functions very close to the bound. Furthermore, these tests can be constructed without knowledge of any nuisance parameters. When the deterministic component contains a polynomial trend, no feasible test comes close to attaining the power bound derived under the assumption the trend parameters are known. Nevertheless, the Neyman-Pearson Lemma can still be employed to derive an asymptotic power bound for the natural family of tests that are invariant to the trend parameters. Again, there exist feasible invariant tests with asymptotic power functions very close to this bound, even when there are additional nuisance parameters determining the autocovariances of the $v_t$.

Our asymptotic power results have implications for tests commonly used in practice. In the case where there is no deterministic component, we find that the asymptotic power curve of the Dickey-Fuller t test virtually equals the bound when power is one-half and is never very far below. In the more relevant case where a deterministic mean or trend is present, power can be improved considerably over the standard Dickey-Fuller test by modifying the method employed to estimate the parameters characterizing the deterministic term.

Our approach is similar to that employed by Dufour and King (1991) in their analysis of exact point-optimal invariant tests in the normal AR(1) model. However, by employing local-to-unity asymptotic approximations, we are able to obtain simpler and more interpretable results that cover a much broader class of models. Saikkonen and Luukkonen (1993) apply a similar analysis in their study of asymptotically point-optimal invariant tests for a unit moving-average root.

2. THE ASYMPTOTIC GAUSSIAN POWER ENVELOPE

In this section we derive an upper bound to the asymptotic power function for tests of the hypothesis $\alpha = 1$ when the data are generated by (1) and the following condition is satisfied.

CONDITION A: The stationary sequence $\{v_t\}$ has a strictly positive spectral density function; it has a moving average representation $v_t = \sum_{j=0}^{\infty}\delta_j\eta_{t-j}$ where the $\eta_t$ are independent standard normal random variables and $\sum_{j=0}^{\infty} j|\delta_j| < \infty$. The initial $u_0$ is 0 and the $\delta$'s are known.
The unrealistic assumption of known $u_0$, $\delta$'s and error distribution is made so we may employ the Neyman-Pearson theory; in Section 3 we show that it may be dropped without any essential change. Our results, however, are quite sensitive to the nature of the deterministic components $d_t$. Section 2.1 considers the simplest case where the $d_t$ are known. Section 2.2 examines the case where the $d_t$ are "slowly evolving" and Section 2.3 examines the case where $d_t$ is a linear combination of nonrandom trending regressors. Our purpose here is to derive the power bound; tests that might be used in practice are discussed later in the paper. All proofs are given in the Appendix.

2.1. Known Deterministic Component

When the $d_t$ are known, $u_t$ is observable and minus two times the log likelihood is (except for an additive constant) given by

(2)  $L(\alpha) = [\Delta u - (\alpha - 1)u_{-1}]'\Sigma^{-1}[\Delta u - (\alpha - 1)u_{-1}],$

where $\Delta u = (u_1, u_2 - u_1, \ldots, u_T - u_{T-1})'$, $u_{-1} = (0, u_1, \ldots, u_{T-1})'$, and $\Sigma$ is the non-singular variance-covariance matrix for $v_1, \ldots, v_T$. By the Neyman-Pearson Lemma, the most powerful test of the null hypothesis that $\alpha = 1$ against the alternative that $\alpha = \bar\alpha$ rejects for small values of the likelihood ratio statistic $L(\bar\alpha) - L(1)$.

When the sample size is large, any reasonable test will have high power unless $\alpha$ is close to one. Thus, in obtaining large-sample approximations, it is natural to employ local-to-unity asymptotics where the parameter space is a shrinking neighborhood of unity as the sample size grows. In our case the appropriate rate to get nondegenerate distributions is $T^{-1}$ so we reparameterize the model writing $c = T(\alpha - 1)$ and take $c$ to be a constant when making limiting arguments. Cf. Chan and Wei (1987), Phillips and Perron (1988). Setting $\bar c = T(\bar\alpha - 1)$, we can then write the likelihood ratio test statistic as

(3)  $L(\bar\alpha) - L(1) = \bar c^2 T^{-2}u_{-1}'\Sigma^{-1}u_{-1} - 2\bar c T^{-1}u_{-1}'\Sigma^{-1}\Delta u.$

For any given $\bar c$, rejecting when the linear combination (3) is small yields the most powerful test against the alternative that $c = \bar c$.

^2 Using a different maintained model, Robinson (1994) develops a "standard" asymptotic theory of efficient tests for a unit root. This requires dropping the familiar autoregression framework and assuming, for example, a fractionally differenced process for the data.
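The step from (2) to (3) is pure algebra and can be checked numerically. The following is a minimal sketch, assuming $\Sigma = I$ (white-noise errors) and $u_0 = 0$; the function names and the choices $T = 200$ and $\bar c = -7$ are illustrative assumptions, not part of the original analysis.

```python
import numpy as np

def lr_statistic_direct(u, alpha_bar):
    """L(alpha_bar) - L(1) computed from definition (2), assuming Sigma = I."""
    u_lag = np.concatenate(([0.0], u[:-1]))   # u_{-1} = (0, u_1, ..., u_{T-1})'
    du = u - u_lag                            # Delta u = (u_1, u_2 - u_1, ...)'
    resid_alt = du - (alpha_bar - 1.0) * u_lag
    return resid_alt @ resid_alt - du @ du

def lr_statistic_local(u, c_bar):
    """Right-hand side of (3), again assuming Sigma = I."""
    T = len(u)
    u_lag = np.concatenate(([0.0], u[:-1]))
    du = u - u_lag
    return c_bar**2 * (u_lag @ u_lag) / T**2 - 2.0 * c_bar * (u_lag @ du) / T

rng = np.random.default_rng(0)
T, c_bar = 200, -7.0                          # illustrative values
u = np.cumsum(rng.standard_normal(T))         # random walk with u_0 = 0
alpha_bar = 1.0 + c_bar / T                   # c_bar = T (alpha_bar - 1)

direct = lr_statistic_direct(u, alpha_bar)
local = lr_statistic_local(u, c_bar)
print(direct, local)                          # the two numbers should agree
```

The two functions differ only in how the point alternative is parameterized, so any discrepancy beyond floating-point error would signal an algebra mistake.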

Note that $(T^{-2}u_{-1}'\Sigma^{-1}u_{-1},\ T^{-1}u_{-1}'\Sigma^{-1}\Delta u)$ is a pair of minimally sufficient statistics for inference about $\alpha$ when the nuisance parameters are known. Furthermore, since the pair has a nondegenerate joint limiting distribution under local alternatives, the asymptotic minimal sufficient statistic also has dimension two. As a consequence, there exists no uniformly most powerful test of $\alpha = 1$ even in large samples. There is an infinite family of asymptotically admissible tests, indexed by $\bar c$, no one dominating the others for all $c$.

The limiting power functions for the family of Neyman-Pearson tests can be expressed conveniently in terms of stochastic integrals. Let $W_0(\cdot)$ represent standard Brownian motion defined on [0, 1] and let $W_c(\cdot)$ be the related diffusion process $W_c(t) = \int_0^t \exp\{c(t - s)\}\,dW_0(s)$ which satisfies the stochastic differential equation $dW_c(t) = cW_c(t)\,dt + dW_0(t)$ with initial condition $W_c(0) = 0$. In the Appendix we show that the local asymptotic power function for the test indexed by $\bar c$ when the significance level is $\varepsilon$ is given by

(4)  $\pi(c, \bar c) = \Pr[\bar c^2\int W_c^2 - \bar c W_c^2(1) < b(\bar c)],$

where $\int W_c^2 = \int_0^1 W_c^2(t)\,dt$ and $b(\bar c)$ satisfies $\Pr[\bar c^2\int W_0^2 - \bar c W_0^2(1) < b(\bar c)] = \varepsilon$. Because the test indexed by $\bar c$ is optimal against the alternative $c = \bar c$, the envelope power function for this family of point-optimal tests is $\Pi(c) \equiv \pi(c, c)$.

CONDITION B (Slowly evolving trend): The ... bounded with $T^{-1}\sum_t(\Delta d_t)^2 \to 0$ and $T^{-1/2}\max_t|d_t| \to 0$ as $T \to \infty$.

This will automatically be satisfied if the $d_t$ are constant. It will also be satisfied by a variety of smooth functions of time. These include low frequency sinusoids (e.g., $d_t = \cos(2\pi kt/T)$ for finite $k$); slowly increasing time trends (e.g., $d_t = \ln(t)$ or $d_t = t^{\delta}$ for $\delta < 1/2$); and step functions with finitely many jumps (e.g., $d_t = \beta_0$ when $t < t_0$ and $d_t = \beta_1$ when $t \ge t_0$). In the slowly evolving trend case, the random component of $y_t$ dominates the deterministic component when $T$ is large. It is tempting therefore to ignore the deterministic term when constructing the test statistic. In the Appendix we show that, if the $d_t$ evolve slowly, replacing $u_t$ by $y_t$ when forming (3) has no effect on the asymptotic size or power of the Neyman-Pearson tests. Under Condition B, there is no efficiency loss from $d_t$ being unknown.

The construction of a useful asymptotic power bound when the $d_t$ are unknown and not slowly evolving is more complicated. Suppose, for example, the $d_t$ are modeled as a linear combination of a set of nonrandom regressors so that $d_t = \beta'z_t$, where $\beta$ is a $q$-dimensional unknown parameter vector and the $z_t$ are observed $q$-dimensional data vectors. Unless $\beta'z_t$ happens to satisfy Condition B, the power functions $\pi(c, \bar c)$ derived in Section 2.1 will not be attainable by any feasible critical region.

The testing problem is, however, unchanged if $y_t$ were replaced by $y_t + b'z_t$ for arbitrary $b$. It is therefore natural to restrict attention to the family of tests which are themselves invariant to this group of transformations. This approach is taken by Dufour and King (1991) who build on previous results in King (1980, 1988). (Their additional restriction of scale invariance is ignored here since the tests proposed in Section 3 satisfy this invariance automatically.) Defining the $T$-dimensional column vector $y_a$ and the $T \times q$ matrix $Z_a$ by

(5)  $y_a = (y_1, y_2 - ay_1, \ldots, y_T - ay_{T-1})', \qquad Z_a = (z_1, z_2 - az_1, \ldots, z_T - az_{T-1})',$

minus two times the log likelihood is (except for an additive constant) $L(a, \beta) = (y_a - Z_a\beta)'\Sigma^{-1}(y_a - Z_a\beta)$.

From the development in Lehmann (1959, p. 249), the most powerful invariant test of $\alpha = 1$ vs. $\alpha = \bar\alpha$ rejects for large values of $\int\exp\{-\tfrac{1}{2}L(\bar\alpha, \beta)\}\,d\beta \,/ \int\exp\{-\tfrac{1}{2}L(1, \beta)\}\,d\beta$. For our normal likelihood, this is equivalent to rejecting for small values of

(6)  $L_T^* = \min_{\beta} L(\bar\alpha, \beta) - \min_{\beta} L(1, \beta).$

The test statistic is the difference in (weighted) sum of squared residuals from two constrained GLS regressions, one imposing $\alpha = \bar\alpha$ and the other imposing $\alpha = 1$.

Asymptotic representations in terms of stochastic integrals can be found for this family of statistics but they depend on the specific $z_t$. When there is no deterministic term ($z_t = 0$), $L_T^*$ is identical to the statistic defined in (3). Some general results for polynomial trend are given in the Appendix. We present explicit formulas for a constant mean where $z_t = 1$ and a linear trend where $z_t = (1, t)'$. Let $\lambda = (1 - \bar c)/(1 - \bar c + \bar c^2/3)$. Defining the process

(7)  $V_c(t, \bar c) = W_c(t) - t[\lambda W_c(1) + 3(1 - \lambda)\int_0^1 sW_c(s)\,ds],$

we have the following result.

THEOREM 1: Suppose $\{y_t\}$ is generated by the Gaussian model (1) under Condition A. Consider unit-root tests of size $\varepsilon$ under local-to-unity asymptotics where both $c = T(\alpha - 1)$ and $\bar c = T(\bar\alpha - 1)$ are fixed as $T$ tends to infinity.

a. When $d_t$ is known or satisfies Condition B, the Neyman-Pearson most powerful test against the alternative $c = \bar c$ has asymptotic power function $\pi(c, \bar c)$ defined in (4). An upper bound to the asymptotic power of any unit-root test is given by the power envelope $\Pi(c) \equiv \pi(c, c)$.

b. When $d_t = \beta_0$, the most powerful invariant test against the alternative $c = \bar c$ has asymptotic power function $\pi(c, \bar c)$. The asymptotic power envelope for this family of point-optimal invariant tests is $\Pi(c)$.

c. When $d_t = \beta_0 + \beta_1 t$, the most powerful invariant test against the alternative $c = \bar c$ has asymptotic power function

(8)  $\pi^{\tau}(c, \bar c) = \Pr[\bar c^2\int V_c^2(t, \bar c)\,dt + (1 - \bar c)V_c^2(1, \bar c) < b^{\tau}(\bar c)],$

where $b^{\tau}(\bar c)$ satisfies $\Pr[\bar c^2\int V_0^2(t, \bar c)\,dt + (1 - \bar c)V_0^2(1, \bar c) < b^{\tau}(\bar c)] = \varepsilon$. An upper bound to the asymptotic power of any unit-root test invariant to the trend parameters $\beta_0$ and $\beta_1$ is given by the power envelope $\Pi^{\tau}(c) \equiv \pi^{\tau}(c, c)$.

Our primary interest is in alternatives $c < 0$, but the theorem is valid for positive $c$ and $\bar c$ as well. There are no simple analytic expressions for the power envelopes $\Pi(c)$ and $\Pi^{\tau}(c)$, but simulations indicate that they are monotonically increasing functions of $|c|$. Some plots and calculations are given in Section 4 where a number of alternative tests are discussed.
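Since the envelope has no closed form, it has to be evaluated by simulation. The sketch below approximates $\pi(c, \bar c)$ in (4) for the case without a deterministic component by replacing $W_c$ with a scaled discrete AR(1) path, $\int W_c^2$ with $T^{-2}\sum u_{t-1}^2$, and $W_c^2(1)$ with $T^{-1}u_T^2$. The discretization length, replication count, seed, and function names are assumptions chosen for illustration; this is not the authors' simulation code.

```python
import numpy as np

def simulate_stat(c, c_bar, T, reps, rng):
    """Draws of c_bar^2 * int(W_c^2) - c_bar * W_c(1)^2 via a discretized path."""
    alpha = 1.0 + c / T
    u = np.zeros(reps)                 # u_0 = 0 for every replication
    sum_sq = np.zeros(reps)
    for _ in range(T):
        sum_sq += u**2                 # accumulates u_{t-1}^2
        u = alpha * u + rng.standard_normal(reps)
    return c_bar**2 * sum_sq / T**2 - c_bar * u**2 / T

def asymptotic_power(c, c_bar, size=0.05, T=250, reps=20000, seed=0):
    """Approximate pi(c, c_bar): rejection rate when rejecting for small values."""
    rng = np.random.default_rng(seed)
    null_stats = simulate_stat(0.0, c_bar, T, reps, rng)
    b = np.quantile(null_stats, size)  # critical value b(c_bar) under c = 0
    alt_stats = simulate_stat(c, c_bar, T, reps, rng)
    return float(np.mean(alt_stats < b))

# Envelope value Pi(c) = pi(c, c) at c = -7, and a size check at c = 0.
envelope_at_minus7 = asymptotic_power(c=-7.0, c_bar=-7.0)
size_check = asymptotic_power(c=0.0, c_bar=-7.0)
print(envelope_at_minus7, size_check)
```

Evaluating `asymptotic_power(c, c)` over a grid of negative `c` traces out an approximation to the envelope $\Pi(c)$; holding `c_bar` fixed instead traces the power curve of a single point-optimal test.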
3. FEASIBLE POINT-OPTIMAL TESTS

Although the point-optimal test statistics defined in (3) and (6) require $\Sigma$ and $u_0$ to be known, it is possible to construct tests having the same large-sample properties even in the absence of this knowledge. Furthermore, the asymptotic theory is valid under less stringent assumptions than those made in Theorem 1. In this section, we continue to assume that equation (1) describes the data generating process but we drop Condition A and consider the properties of some feasible tests under weaker assumptions. For $0 \le s \le 1$, let $[sT]$ be the greatest integer less than or equal to $sT$ and let $\Rightarrow$ denote weak convergence of the underlying probability measures as $T$ tends to infinity.

CONDITION C: The initial error $u_0$ has a distribution with bounded second moment for all $\alpha$ in a neighborhood of unity. The zero mean process $\{v_t\}$ is stationary and ergodic with finite autocovariances $\gamma(k) = Ev_tv_{t-k}$ such that
(a) $\omega^2 = \sum_{k=-\infty}^{\infty}\gamma(k)$ is finite and nonzero;
(b) the scaled partial-sum process $T^{-1/2}\sum_{t=1}^{[sT]}v_t \Rightarrow \omega W_0(s)$.

The assumptions on $\{v_t\}$ are satisfied by stationary and invertible ARMA models under moment conditions and are standard in the literature. The stationary assumption can be relaxed at the cost of a more complex notation. Specific assumptions on $v_t$ which imply (b) are discussed in Phillips (1987) and Phillips and Solo (1992). When $\alpha = 1 + T^{-1}c$ and Condition C is satisfied, $T^{-1/2}u_{[sT]} \Rightarrow \omega W_c(s)$ and sample moments of the data have limiting representations in terms of stochastic integrals involving $W_c$.

To generate a convenient family of tests, we note that, when $u_0 = 0$ and the $v_t$ are iid N(0, 1), the likelihood ratio statistic $L_T^*$ takes a very simple form. Let $S(a)$ be the sum of squared residuals from a least squares regression of $y_a$ on $Z_a$, where $y_a$ and $Z_a$ are defined in (5). Then the test statistic is equal to $S(\bar\alpha) - S(1)$. If $\Sigma$ is not in fact the identity matrix, this difference in sum of squared residuals will in general have a limiting distribution depending on the error variances and covariances and thus will not produce a test of the correct size. However, it is easy to construct a modified statistic that does produce a valid large-sample test. For the general problem of testing $\alpha = 1$ vs. $\alpha = \bar\alpha$ where $d_t = \beta'z_t$ and the $v_t$ have unknown covariances, consider the feasible statistic

(9)  $P_T = [S(\bar\alpha) - \bar\alpha S(1)]/\hat\omega^2,$

where $\hat\omega^2$ is an estimator for $\omega^2$. If $\beta$ is known, no regression is needed and $S(a) = (y_a - Z_a\beta)'(y_a - Z_a\beta)$.
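The feasible statistic can be computed with two least squares regressions on quasi-differenced data. The following is a minimal sketch: the helper names are hypothetical, the first observation is left untransformed as in (5), and $\hat\omega^2$ is simply passed in as a number (set to 1 in the usage lines, which assumes white-noise errors with unit variance).

```python
import numpy as np

def quasi_difference(x, a):
    """Return (x_1, x_2 - a x_1, ..., x_T - a x_{T-1})', as in (5).
    Works for a vector or for a T x q matrix (row-wise)."""
    x = np.asarray(x, dtype=float)
    out = x.copy()
    out[1:] = x[1:] - a * x[:-1]
    return out

def ssr(y, Z, a):
    """S(a): sum of squared residuals from regressing y_a on Z_a."""
    y_a = quasi_difference(y, a)
    Z_a = quasi_difference(Z, a)
    beta, *_ = np.linalg.lstsq(Z_a, y_a, rcond=None)
    resid = y_a - Z_a @ beta
    return float(resid @ resid)

def p_t_statistic(y, Z, c_bar, omega2_hat):
    """P_T = [S(alpha_bar) - alpha_bar * S(1)] / omega2_hat, alpha_bar = 1 + c_bar/T."""
    alpha_bar = 1.0 + c_bar / len(y)
    return (ssr(y, Z, alpha_bar) - alpha_bar * ssr(y, Z, 1.0)) / omega2_hat

# Usage: constant mean case (z_t = 1) with an illustrative random walk.
rng = np.random.default_rng(0)
T = 100
y = 10.0 + np.cumsum(rng.standard_normal(T))
Z = np.ones((T, 1))
p_original = p_t_statistic(y, Z, c_bar=-7.0, omega2_hat=1.0)
p_shifted = p_t_statistic(y + 5.0, Z, c_bar=-7.0, omega2_hat=1.0)
print(p_original, p_shifted)   # equal: the statistic is invariant to the mean
```

Adding a constant to the data changes the fitted coefficient but not the residuals of either regression, which is the invariance property the construction relies on. The unit root null is rejected for small values of the statistic.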

THEOREM 2: Suppose $\{y_t\}$ is generated by (1) where $d_t$ is a (possibly constant) polynomial time trend. Then, if Condition C is satisfied and $\hat\omega^2$ is a consistent estimator of $\omega^2$ when $c = T(\alpha - 1)$ is fixed, $P_T$ has the same limiting distribution under local-to-unity asymptotics as $L_T^* - \bar c$. Specifically, $P_T$ converges in distribution to $\bar c^2\int W_c^2 - \bar c W_c^2(1)$ in the zero mean and constant mean cases and to $\bar c^2\int V_c^2(t, \bar c)\,dt + (1 - \bar c)V_c^2(1, \bar c)$ in the linear trend case.

Thus the power functions $\pi(c, \bar c)$ and $\pi^{\tau}(c, \bar c)$ derived in Theorem 1 for point-optimal tests in the Gaussian model with $\Sigma$ known can be attained by the simple $P_T$ family of statistics under the much weaker assumptions of this section. This is important, because, in practice, $\Sigma$ will generally contain unknown parameters and there is often no compelling reason to believe that the data are normally distributed. If the errors are non-normal, tests exploiting the form of the actual likelihood and possessing power higher than $\Pi(c)$ and $\Pi^{\tau}(c)$ could be constructed. In the absence of such information, quasi-likelihood tests based on least-squares regressions are likely to be used in practice. The power bounds derived under normality are still valid when comparing such tests.

Although our analysis is based on relatively weak assumptions, two interesting models considered elsewhere in the literature are ruled out. A problem closely related to ours is to test the null hypothesis that $\{u_t\}$ is an integrated process against the alternative that it is a strictly stationary process. Under that alternative, $u_0$ will have a variance proportional to $(1 - \alpha^2)^{-1}$, a violation of Condition C. The tests studied in Section 2 are not point optimal under this specification and the asymptotic power bounds are no longer valid. Our $P_T$ statistics, however, still have simple local-to-unity limiting representations under the stationary alternative. Suppose, for example, $u_0$ is normal with mean zero and variance $(1 - \alpha^2)^{-1}$ and that the $v_t$ are serially uncorrelated with unit variance. Then $T^{-1/2}u_{[sT]} \Rightarrow W_c^*(s) = W_c(s) + \eta_0e^{cs}$ where $\eta_0$ is a normal variate, independent of $W_c(\cdot)$, with mean zero and variance $(-2c)^{-1}$. The $P_T$ statistics can then be written as functionals of the $W_c^*$ process. Further analysis of the stationary alternative testing problem can be found in Elliott (1993).

A second, closely related approach to modeling unit roots is also ruled out here. One way to avoid making an assumption about the initial error $u_0$ is to base the entire statistical analysis on the conditional distribution of the data given the first observation $y_1$. When $d_t$ is known, there is no difference asymptotically between our analysis based on the full likelihood and analysis based on the conditional likelihood. But when $d_t$ is unknown, the point-optimal invariant test based on the full likelihood is not asymptotically equivalent to the point-optimal test based on the conditional likelihood. Invariance under the transformation $y_t \to y_t + b'z_t$ is often justified by the argument that adding a constant to the data should not change the analysis. But that argument is not compelling once one has conditioned on the first observation. There appears to be no convincing way to avoid making an assumption about the initial observation when there are unknown nuisance parameters in $d_t$.

Although our analysis has been based on local-to-unity asymptotics, our tests have good properties when judged by standard fixed parameter asymptotics as well. Specifically, since the power functions for the $P_T$ tests have nondegenerate limits under sequences of local alternatives $\alpha$ approaching unity, one would also expect power to approach one as $T$ tends to infinity for any fixed $\alpha < 1$. This is indeed the case if the estimate $\hat\omega^2$ is not only consistent for local alternatives but also well behaved globally.

THEOREM 3: Suppose the conditions of Theorem 2 hold except that $\alpha$ is fixed less than one in absolute value. If $\Pr[\hat\omega^2 > p] \to 1$ for some positive constant $p$, then the tests which reject for small values of $P_T$ (with $\bar c = T(\bar\alpha - 1)$ held fixed) have power functions tending to one as $T$ tends to infinity.

Estimators $\hat\omega^2$ that are consistent under local alternatives and have nonzero probability limits under fixed alternatives clearly satisfy this condition. Some examples of such estimators are given in Section 5.

4. SOME

Theorems 1 and 2 imply that, among tests based on second-order sample moments, those that reject for small values of $[S(\bar\alpha) - \bar\alpha S(1)]/\hat\omega^2$ are asymptotically point optimal invariant; each has an asymptotic power curve tangent to the power envelope at one point. It will be convenient to index the test by its power rather than by the value $\bar\alpha$. That is, by inverting the envelope power function $\pi = \Pi[T(\bar\alpha - 1)]$ for $\bar\alpha$, we can find that alternative $\bar\alpha(\pi, T, \varepsilon)$ which yields (approximate) power $\pi$ when using the point optimal test of level $\varepsilon$ with a sample of size $T$. Then, for $\varepsilon < \pi < 1$, the family of test statistics can be written as

(10)  $P_T(\pi) = \{S[\bar\alpha(\pi, T, \varepsilon)] - \bar\alpha(\pi, T, \varepsilon)S(1)\}/\hat\omega^2.$

(We suppress the dependence of $P$ on $\varepsilon$.) Although every member of this family is admissible, past research suggests that values of $\pi$ near one-half often yield tests whose power functions lie close to the power envelope over a considerable range. Cf. King (1988).
For the remainder of the paper we restrict attention to the three standard cases discussed in the literature where $d_t$ is either zero, a constant, or a linear trend. To distinguish the cases, we follow Dickey and Fuller (1979) and use a superscript $\mu$ when $d_t$ is constant and a superscript $\tau$ when it is a linear trend. Since commonly used test statistics have distributions not depending on the parameters determining the $d_t$, we shall also restrict attention to invariant tests.

When there is no deterministic term, our family of $P_T$ tests includes as special cases many tests previously proposed. Recall that $P_T(\pi)$ has the asymptotic representation $\bar c^2(\pi)\int W_c^2 - \bar c(\pi)W_c^2(1)$ where $\bar c(\pi)$ is a monotonically decreasing function taking the value zero when $\pi$ is equal to $\varepsilon$ (the size of the test) and tending to minus infinity as $\pi$ approaches one. Sargan and Bhargava (1983) suggest $S(0)/S(1)$ as a test statistic when the $v_t$ are white noise; asymptotically it behaves like $\int W_c^2$ and corresponds to $P_T(1)$. The locally most powerful test described by Dufour and King (1991) behaves asymptotically like $W_c^2(1)$ and corresponds to $P_T(\varepsilon)$. The Dickey-Fuller estimator test (based on their statistic $\hat\rho$) is also a member, since its rejection region is determined, asymptotically, by a linear combination of $\int W_c^2$ and $W_c^2(1)$; computations indicate that it has the same limiting distribution as our $P_T(1 - \varepsilon)$. The Dickey-Fuller t statistic (denoted by $\hat\tau$) is a nonlinear function of $\int W_c^2$ and $W_c^2(1)$. Nonetheless, computations indicate that the asymptotic power function of their t test is tangent to the power envelope when power is about one-half and behaves like the $P_T(.5)$ test. Likewise, the $Z_{\alpha}$ and $Z_t$ tests examined in Phillips (1987) and Phillips and Perron (1988) behave like members of the $P_T$ family since they are asymptotically equivalent to the $\hat\rho$ and $\hat\tau$ tests, respectively.
Figure 1 graphs the asymptotic power functions of these tests along with the power envelope when the tests have size 0.05. These are based on 20,000 Monte Carlo replications where $W_c$ was approximated by its discrete realization from a sample of size 500; simulation standard errors are less than 0.0013. The power envelope is monotonic and equals one-half when $c = -7$. With the exception of the locally most powerful test which puts all the weight on $W_c^2(1)$, all the tests have power functions very close to the power envelope. Indeed, it is hard to distinguish them without vastly changing the scale of the figure. Although none of these tests is uniformly most powerful even asymptotically, our numerical calculations indicate that, for $.25 < \pi < .95$, the $P_T(\pi)$ tests are, for all practical purposes, equivalent in large samples and have power functions essentially identical to the power bound. Other calculations not reported here demonstrate that this conclusion carries over to tests at the 1% and 10% significance levels as well.

FIGURE 1. Asymptotic power functions of selected unit root tests: no deterministic component. Solid line: Gaussian power envelope; A: $P_T(1.0)$, Sargan-Bhargava; B: $P_T(.95)$, Dickey-Fuller $\hat\rho$; C: $P_T(.5)$, Dickey-Fuller $\hat\tau$; D: $P_T(.05)$, locally most powerful.

Things are rather different, however, when $d_t$ contains parameters that have to be estimated. The Sargan-Bhargava (1983) test for the constant mean case, Bhargava's (1986) extension for the linear trend case, the Dickey-Fuller estimator tests (based on their statistics $\hat\rho^{\mu}$ and $\hat\rho^{\tau}$), the Dickey-Fuller t tests (based on their $\hat\tau^{\mu}$ and $\hat\tau^{\tau}$), and the Phillips-Perron $Z$ tests are no longer asymptotically equivalent to members of the $P_T$ family since they employ OLS estimates of the $\beta$'s instead of constrained local-to-unity estimates. The power functions for the $P_T^{\mu}(\pi)$ and $P_T^{\tau}(\pi)$ tests remain very close to the relevant power envelopes $\Pi(c)$ and $\Pi^{\tau}(c)$ for a broad range of $\pi$ values. The power functions for the tests which use OLS estimates of $\beta$ are well below the power envelopes. Some results for tests at the 5% level are presented in Figure 2 for the constant mean case and in Figure 3 for the linear trend case. The envelope power curve $\Pi^{\tau}(c)$ has the same shape as $\Pi(c)$, but now takes the value one-half when $c = -13.5$. The power loss of the commonly used tests is particularly dramatic in the constant mean case. The same pattern is found for tests at the 1% and 10% significance levels.

FIGURE 2. Asymptotic power functions of selected unit root tests: constant mean ($z_t = 1$). Solid line: Gaussian power envelope; A: $P_T^{\mu}(.5)$; B: DF-GLS$^{\mu}$(.5); C: Sargan-Bhargava; D: Dickey-Fuller $\hat\rho^{\mu}$; E: Dickey-Fuller $\hat\tau^{\mu}$.
A measure of the difference between two tests is Pitman asymptotic relative efficiency (ARE), defined as the ratio of the values of $c$ at which the tests achieve a specified power. Evaluating efficiency at power one-half and using 5% level tests, we find in the constant mean case the ARE's of the Sargan-Bhargava, $\hat\rho^{\mu}$ and $\hat\tau^{\mu}$ tests relative to the power envelope are, respectively, 1.40, 1.53, and 1.91. Since $c$ is proportional to $T$, this implies that using the Dickey-Fuller t test instead of the $P_T(.5)$ test is equivalent in large samples to discarding almost half of the observations. The corresponding ARE's for the linear trend case are 1.07, 1.13, and 1.25.
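The "discarding almost half of the observations" statement follows from simple arithmetic. Because $c = T(\alpha - 1)$, a test that needs $|c|$ to be ARE times larger to reach the same power needs a sample ARE times longer at a fixed $\alpha$, so the efficient test matches it using only a fraction $1/\text{ARE}$ of the data. The sketch below works this out for the ARE values quoted above; the dictionary labels and the helper name are illustrative.

```python
# Pitman AREs quoted in the text (5% level tests, evaluated at power one-half).
are_constant_mean = {"Sargan-Bhargava": 1.40, "DF rho": 1.53, "DF t": 1.91}
are_linear_trend = {"Bhargava": 1.07, "DF rho": 1.13, "DF t": 1.25}

def fraction_discarded(are):
    """Share of the sample that could be dropped by the efficient test
    while matching the power of a test with the given ARE."""
    return 1.0 - 1.0 / are

discarded_mean = {k: fraction_discarded(v) for k, v in are_constant_mean.items()}
discarded_trend = {k: fraction_discarded(v) for k, v in are_linear_trend.items()}

for label, table in (("constant mean", discarded_mean), ("linear trend", discarded_trend)):
    for name, frac in table.items():
        print(f"{label:14s} {name:16s} {frac:.3f}")
```

For the Dickey-Fuller t test in the constant mean case the fraction is $1 - 1/1.91$, a little under one-half, while in the linear trend case the same calculation gives $1 - 1/1.25 = 0.2$.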
Since the difficulties with the standard tests are associated with inefficient estimates of the trend parameters, it is reasonable to expect that modified estimates could improve their performance. Because of their relatively good size properties found in small-sample Monte Carlo studies (e.g., Schwert (1989)), natural tests to modify are those based on the Dickey-Fuller t statistics $\hat\tau^{\mu}$ and $\hat\tau^{\tau}$. Choosing $\bar\alpha$ to be that alternative where maximal power is approximately one-half, we propose regressing $y_{\bar\alpha}$ on $Z_{\bar\alpha}$ to obtain the estimate $\tilde\beta$. Then one can perform the usual augmented Dickey-Fuller t test (without deterministic regressors) using the residual series $y_t^d = y_t - \tilde\beta'z_t$ in place of $y_t$. Thus the modified test statistic (denoted by DF-GLS($\pi$) in the tables and figures) is the t statistic for testing $a_0 = 0$ in the regression

(11)  $\Delta y_t^d = a_0y_{t-1}^d + a_1\Delta y_{t-1}^d + \cdots + a_p\Delta y_{t-p}^d + \text{error}.$

When $d_t = \beta_0$, the estimate $\tilde\beta_0$ is stochastically bounded and $T^{-1/2}y^d_{[sT]} \Rightarrow \omega W_c(s)$; the t statistic calculated from the demeaned data has the limiting representation $0.5(\int W_c^2)^{-1/2}[W_c^2(1) - 1]$, which is identical to that of $\hat\tau$. Critical values are therefore those of the conventional Dickey-Fuller t statistic when there is no intercept. In the linear trend case, the detrended series $y_t - \tilde\beta_0 - \tilde\beta_1t$ plays the role of $y_t^d$. It is shown in the Appendix that $T^{-1/2}y^d_{[sT]} \Rightarrow \omega V_c(s, \bar c)$ when $\bar\alpha = 1 + \bar c/T$ is used for the estimation of $\beta$; the t statistic then has the limiting representation $0.5[\int V_c^2(s, \bar c)]^{-1/2}[V_c^2(1, \bar c) - 1]$. Figure 3 graphs the asymptotic power function of the locally detrended t test when $\bar c = -13.5$. It is indistinguishable from the power envelope.

FIGURE 3. Asymptotic power functions of selected unit root tests: linear trend ($z_t = (1, t)'$). Solid line: Gaussian power envelope; A: $P_T^{\tau}(.5)$; B: DF-GLS$^{\tau}$(.5); C: Bhargava; D: Dickey-Fuller $\hat\rho^{\tau}$; E: Dickey-Fuller $\hat\tau^{\tau}$.

The $\bar c$ that produces a given asymptotic power $\pi$ depends on the size of the test, so critical values for the $P_T(\pi)$ tests and for the Dickey-Fuller t test applied to locally detrended data depend on $\varepsilon$. This is inconvenient as it requires an extensive set of tables and, if marginal significance levels are calculated, recomputing the test statistic for different $\varepsilon$. Since the power curves of the tests are not sensitive to $\bar c(\pi)$ in the range $.25 < \pi < .65$, a simpler approach is to fix $\bar\alpha = 1 + \bar c/T$ independently of $\varepsilon$. Our calculations indicate that, if $\bar c = -7$ is chosen for the constant mean case and $\bar c = -13.5$ for the linear trend case, the limiting power functions of the resulting $P_T$ tests and for the Dickey-Fuller t test applied to locally detrended data are within 0.01 of the power envelope for $.01 < \varepsilon < .10$. Some critical values for this choice of $\bar c$ are given in Table I. Note that, although the small-sample values are valid only for Gaussian white noise $\{v_t\}$, the large-sample critical values do not depend on $\Sigma$ or normality.

TABLE I
CRITICAL VALUES^a

                      Level
  T         1%      2.5%      5%      10%

A. Constant Mean: $P_T^{\mu}$ with $\bar c = -7$
  50       1.87     2.39     2.97     3.91
  100      1.95     2.47     3.11     4.17
  200      1.91     2.47     3.17     4.33
  ∞        1.99     2.55     3.26     4.48

B. Linear Trend: $P_T^{\tau}$ with $\bar c = -13.5$
  50       4.22     4.94     5.72     6.77
  100      4.26     4.90     5.64     6.79
  200      4.05     4.83     5.66     6.86
  ∞        3.96     4.78     5.62     6.89

C. Linear Trend: DF-GLS$^{\tau}$ with $\bar c = -13.5$
  50      -3.77    -3.46    -3.19    -2.89
  100     -3.58    -3.29    -3.03    -2.74
  200     -3.46    -3.18    -2.93    -2.64
  ∞       -3.48    -3.15    -2.89    -2.57

^a Entries are based on 20,000 ... integrals.
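The locally detrended Dickey-Fuller procedure described in this section can be sketched in a few lines. The code below is a minimal illustration, not the authors' implementation: the function names are hypothetical, the lag length `p` is supplied by the user, and the standard error uses the usual OLS degrees-of-freedom correction, which is an assumption since the text does not spell it out.

```python
import numpy as np

def gls_detrend(y, Z, c_bar):
    """Residual series y^d_t = y_t - beta_tilde' z_t, where beta_tilde comes from
    regressing quasi-differenced y on quasi-differenced Z at alpha_bar = 1 + c_bar/T."""
    y = np.asarray(y, dtype=float)
    Z = np.asarray(Z, dtype=float)        # T x q matrix of deterministic regressors
    a_bar = 1.0 + c_bar / len(y)
    y_a, Z_a = y.copy(), Z.copy()
    y_a[1:] = y[1:] - a_bar * y[:-1]      # first observation is left untransformed
    Z_a[1:] = Z[1:] - a_bar * Z[:-1]
    beta, *_ = np.linalg.lstsq(Z_a, y_a, rcond=None)
    return y - Z @ beta

def df_gls_t(y, Z, c_bar, p=0):
    """t statistic on a_0 in regression (11): no deterministic regressors."""
    yd = gls_detrend(y, Z, c_bar)
    dy = np.diff(yd)                      # dy[i] = yd[i+1] - yd[i]
    lhs = dy[p:]
    cols = [yd[p:-1]]                     # lagged level y^d_{t-1}
    cols += [dy[p - j:len(dy) - j] for j in range(1, p + 1)]  # lagged differences
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, lhs, rcond=None)
    resid = lhs - X @ coef
    s2 = resid @ resid / (len(lhs) - X.shape[1])
    se_a0 = np.sqrt(s2 * np.linalg.inv(X.T @ X)[0, 0])
    return float(coef[0] / se_a0)

# Usage: linear trend case with a clearly stationary AR(1) error (alpha = 0.5).
rng = np.random.default_rng(1)
T = 200
t = np.arange(1, T + 1)
Z = np.column_stack([np.ones(T), t])
eta = rng.standard_normal(T)
u = np.zeros(T)
u[0] = eta[0]
for i in range(1, T):
    u[i] = 0.5 * u[i - 1] + eta[i]
y = 2.0 + 0.1 * t + u
t_stat = df_gls_t(y, Z, c_bar=-13.5, p=1)
t_stat_shifted = df_gls_t(y + 3.0 + 0.5 * t, Z, c_bar=-13.5, p=1)
print(t_stat, t_stat_shifted)             # identical up to rounding: trend invariance
```

For the linear trend case the statistic would be compared with the DF-GLS$^{\tau}$ critical values in Table I; for the constant mean case `Z` is a single column of ones, `c_bar` is $-7$, and the no-intercept Dickey-Fuller critical values apply.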
5. FINITE SAMPLE PERFORMANCE

A Monte Carlo experiment was conducted to see how well the asymptotic theory describes the small-sample properties of our tests. We investigated tests based on $P_T^{\mu}(.5)$, the standard Dickey-Fuller t statistic (denoted DF-$\tau^{\mu}$), and the modified Dickey-Fuller t statistic (denoted DF-GLS$^{\mu}$) for the constant mean case and the corresponding three tests (based on $P_T^{\tau}(.5)$, DF-$\tau^{\tau}$, and DF-GLS$^{\tau}$) in the linear trend case. Data generating processes considered elsewhere in the literature (e.g., Phillips and Perron (1988), Schwert (1989), DeJong et al. (1992), and Lumsdaine (1994)) were employed. Specifically, letting $\{\eta_t\}$ be a set of independent standard normal variables, we used the following three models for the $\{v_t\}$ process:

I. MA(1): $v_t = \eta_t - \theta\eta_{t-1}$ ($\theta = .8, .5, 0, -.5, -.8$),
II. AR(1): $v_t = \phi v_{t-1} + \eta_t$ ($\phi = .5, -.5$),
III. GARCH MA(1): $v_t = \xi_t - \theta\xi_{t-1}$, $\xi_t = h_t^{1/2}\eta_t$, $h_t = 1 + .65h_{t-1} + .25\xi_{t-1}^2$, $h_0 = 0$ ($\theta = .5, 0, -.5$).

In each of these models the initial condition was $u_0 = 0$. Although the null distributions of the test statistics considered here are invariant to the initial condition, small-sample power typically depends on $u_0$. This dependence is investigated by considering a variant of the first model where the $\{u_t\}$ are strictly stationary under the alternative hypothesis. That is, $u_0$ is normal with mean zero and variance equal to $(1 + \theta^2 - 2\theta\alpha)/(1 - \alpha^2)$, $\alpha \neq 1$. This design violates our Condition C and is intended to shed light on the importance of that assumption.

The autocorrelation structure of $\{v_t\}$ was assumed to be unknown to the investigator, so two types of loosely parameterized estimators, autoregressive (AR) and sum-of-covariances (SC), were used for $\omega^2$. The AR estimators are given by

(13)  $\hat{\omega}_{AR}^2 = \hat{\sigma}_{\eta}^2 / (1 - \sum_{i=1}^{p}\hat{a}_i)^2$

where $\hat{\sigma}_{\eta}^2$ is the estimated residual variance and the $\hat{a}_i$ are OLS estimates from the regression

(14)  $\Delta y_t = a_0 y_{t-1} + a_1\Delta y_{t-1} + \cdots + a_p\Delta y_{t-p} + a_{p+1} + \eta_t$.

Two choices of lag length were employed: the AR(8) estimator used $p = 8$ and the AR(BIC) estimator used $p$ chosen by the Schwarz (1978) Bayesian information criterion constrained so $3 \le p \le 8$. The SC estimators are given by

$\hat{\omega}_{SC}^2 = \sum_{m=-l_T}^{l_T} K(m/l_T)\,\hat{\gamma}(m)$

where $K(\cdot)$ is the Parzen kernel, $\hat{\gamma}(m) = T^{-1}\sum_{t=1}^{T-|m|} e_t e_{t+|m|}$, and $e_t$ is the residual from an OLS regression of $y_t$ on $(y_{t-1}, z_t)$. Two variants were employed: SC(12) using $l_T = 12$ and SC(auto) using Andrews' (1991) optimal automatic procedure (his equations (6.2) and (6.4)).
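The two families of estimators for $\omega^2$ can be sketched numerically as follows. This is an illustration under stated assumptions, not the authors' code: the function names `parzen`, `omega2_sc`, and `omega2_ar` are hypothetical, the residual variance uses a degrees-of-freedom correction chosen here for convenience, the lag length is supplied by the user rather than selected by BIC, and Andrews' automatic bandwidth is not implemented.

```python
import numpy as np

def parzen(x):
    """Parzen kernel weight for a scaled lag x."""
    x = abs(x)
    if x <= 0.5:
        return 1.0 - 6.0 * x**2 + 6.0 * x**3
    if x <= 1.0:
        return 2.0 * (1.0 - x) ** 3
    return 0.0

def omega2_sc(e, l):
    """Sum-of-covariances estimator from residuals e with truncation lag l."""
    e = np.asarray(e, dtype=float)
    T = e.size

    def gamma(m):
        # Sample autocovariance of the residuals at lag m, scaled by 1/T.
        return (e[: T - m] @ e[m:]) / T

    # The kernel and autocovariances are symmetric in m, so lags m >= 1 count twice.
    return gamma(0) + 2.0 * sum(parzen(m / l) * gamma(m) for m in range(1, l + 1))

def omega2_ar(y, p):
    """Autoregressive estimator based on the regression of dy_t on y_{t-1}, p lagged dy, and a constant."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                           # dy[i] = y[i+1] - y[i]
    X = np.array([[y[i]] + [dy[i - j] for j in range(1, p + 1)] + [1.0]
                  for i in range(p, dy.size)])
    Y = dy[p:]
    a, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ a
    sigma2 = resid @ resid / (Y.size - X.shape[1])   # assumption: dof-corrected variance
    return sigma2 / (1.0 - a[1 : p + 1].sum()) ** 2
```

For white-noise errors both functions should return values near the innovation variance; they differ most when the errors have a large moving-average component, which is the case the Monte Carlo design is meant to probe.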
The results are summarized in Table II for a constant mean and in Table III for a linear trend. Tests were at the 5% asymptotic significance level and the sample size T was 100. For $\alpha = 1$, the tables report the observed rejection rates from 5000 Monte Carlo replications when critical values were based on the limiting distributions. For $\alpha < 1$, the tables report size-adjusted power; this is the rejection rate when critical values are estimated from the $\alpha = 1$ Monte Carlo trials.
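The size adjustment described here can be sketched in a few lines. The example below is a simplified stand-in rather than a replication of the experiment: it uses Model I errors with $u_0 = 0$, sets the deterministic component to zero, and applies an ordinary Dickey-Fuller t statistic with an intercept and no lagged differences. The names `simulate_u`, `df_tstat`, and `size_adjusted_power` are hypothetical.

```python
import numpy as np

def simulate_u(alpha, theta, T, rng):
    """Generate u_t = alpha*u_{t-1} + v_t with MA(1) errors v_t = eta_t - theta*eta_{t-1}, u_0 = 0."""
    eta = rng.standard_normal(T + 1)
    v = eta[1:] - theta * eta[:-1]
    u = np.empty(T)
    prev = 0.0
    for t in range(T):
        prev = alpha * prev + v[t]
        u[t] = prev
    return u

def df_tstat(y):
    """Dickey-Fuller t statistic from a regression of dy_t on y_{t-1} and a constant."""
    dy, ylag = np.diff(y), y[:-1]
    X = np.column_stack([ylag, np.ones(ylag.size)])
    b, *_ = np.linalg.lstsq(X, dy, rcond=None)
    r = dy - X @ b
    s2 = r @ r / (dy.size - 2)
    return b[0] / np.sqrt(s2 * np.linalg.inv(X.T @ X)[0, 0])

def size_adjusted_power(stat, alpha, theta=0.0, T=100, reps=5000, level=0.05, seed=0):
    """Rejection rate under alpha, using a critical value estimated from alpha = 1 trials."""
    rng = np.random.default_rng(seed)
    null = np.array([stat(simulate_u(1.0, theta, T, rng)) for _ in range(reps)])
    crit = np.quantile(null, level)           # left-tail critical value under the null
    alt = np.array([stat(simulate_u(alpha, theta, T, rng)) for _ in range(reps)])
    return float(np.mean(alt < crit))
```

Because the critical value is taken from the simulated null distribution for the same error model, size distortions in the asymptotic critical values do not contaminate the power comparison across tests.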

The results suggest three conclusions. First, the predicted superiority of the tests using local-to-unity estimates of the mean and trend parameters is borne out by the Monte Carlo study. The $P_T$ and modified Dickey-Fuller tests have higher size-adjusted power than the standard Dickey-Fuller t test for almost all of the data generating processes and all choices of $\hat{\omega}^2$. The improvement is largest in the constant mean case. Although the observed power curves tend to be somewhat below the asymptotic power curves, the results are generally consistent with the predictions of the asymptotic theory. The main exception is the poor performance of the point-optimal tests using SC estimates of $\omega^2$ when the MA parameter $\theta$ is large.

Second, the choice of estimator for $\omega^2$ has a large effect on the size of the $P_T$ tests, with the AR estimator exhibiting much smaller distortions than the SC estimator. This mirrors similar results found for other unit-root statistics; see, for example, DeJong et al. (1992) and Perron (1996). The AR(8) and AR(BIC) tests have moderate size distortion except in the MA model with large $\theta$. The modified Dickey-Fuller tests have notably smaller size distortions than those based on $P_T$. In addition, the tests based on the AR(BIC) estimator have better size-adjusted power than those based on the AR(8) estimator, which typically estimates more nuisance parameters. Other experiments not reported in Tables II or III indicate that the AR(BIC) tests also dominate the ones based on the AR(4) estimator for $\omega^2$. Lag length selection based on sequential likelihood ratio statistics was also tried; no general improvement over AR(BIC) was found, although the LR selector appears to improve the size-adjusted power of the modified Dickey-Fuller test relative to BIC in the linear trend case, at least for small values of $\theta$.

Third, the powers of the $P_T$ and modified Dickey-Fuller tests deteriorate substantially when the $u_t$ are stationary. Even so, in the linear trend case with

TABLE II
SIZE AND SIZE-ADJUSTED POWER OF SELECTED TESTS OF THE I(1) NULL: MONTE CARLO RESULTS
5% LEVEL TESTS, CONSTANT MEAN ($z_t = 1$), T = 100
Test

Asymptotic
Power

Slalistic

Pi\$)
AR(8)

1.00
.95
.90
.80
.70

.05
.32
.'t6
1.00
1.00

Pi\$)
1.00 .os
AR(BIC) .95 .32
.90 .76
.80 1.00
.70 1.00
1.00

.05

.95

.32

.90

Pr(J)
SC(auto)

.80

r.00

.70

1.00

l.oo
.95
.90
.80
..10

A(.5) 1.00
AR(8)
.95
.90
.80
.70

DF-GLS

P(.5) 1.00
AR(BIC)
.95
.90
.80
.,to

DF-GLS

1.00
.95
.90
.80
.'lo

.os
.32
.76
1.00

1.00
.05

.32
..15

1.00
1.00

0.18
0.31
0.47
0.56

o.t4
0.24
0.50
0.82
0.92

0.o2
0.29
0.64
0.96
1.00
0.04
0.30
0.67
0.97
1.00
0.05
0.21
0.42
0.68
0.80

1.00

0.10
0.26
0.56
0.87
0.96

.05
.12
.31
.85
l 00

0.08
0.11
0.23
0.55
0.76

.05

.32

.'t5
1.00

TEsrs oF THE

CoNsTANT

Mrar (2,:

MAo), A-

0.18

AUTOREGRESSIVE

1),

0J-

REsuLTs

GARCH MA(l),

- 0.5

0.20 0.18 0.20 0.22

0.19 0.18 0.15 0.18
o.32 0.32 0.30 0.31
0.50 0.51 0.51 0.46
0.s7 0.59 0.60 0.47 0.55

0.20
o.17
0.29
0.46
0.53

0.21
0.18
0.30
0.48
0.s6

0.11
0.27
0.57
0.89
0.97

0.10
0.27
0.56
0.88
0.96

0.13
0.26
0.s4
0.86
0.95

0.56 0.57 0.3,1

0.8

0.20
0.18
0.31
0.48

0.65
0.97
1.00

0.04
0.30
0.68
0.98
1.00

0.06
o.23
0.43

0.10
0.28
0.59
0.91
0.98

0.28 0.79 0.26
0.59 0.41 0.52
0.92 0.79 0.83
0.98 0.94 0.93

0.07 0.50 0.98 0.01

0,29 0.2'1 0.15 0.29
o.'to 0.62 0.09 0.s9
0.99 0.84 0.01 0.92
1.00 0.78 0.00 0.98
0.06 0.31 0.88 0.03
0.32 0.31 0.30 0.29
o.74 0.73 0.59 0.64
0.99 0.99 0.'t2 0.96
1.00 1.00 0.71 0.99

0.84 0.91

0.98

0.08

0.07 0.1 1 0.45 0.07

0.28 0.30 0.30 0.26
0.60 0.67 0.68 0.54
0.93 0.97 0.98 0.86
0.99 1.00 1.00 0.95

0.59
0.92
0.98
0.06
0.10
0.22
0.56
0.7e

0.10 0.13 0.13
o.22 0.31 0.31
o.59 0.77 0.78
0.83 0.96 0.96

0.06
0.10
0.20
0.46
0.67

.99

1.oo

.05

0.i3

.95

.10

0.18

0.3?

.90

.27

0.36

0.69

.80

.81

0.65

0.83

.10

.99

o.82

0.10
o.tl
0.36
0.69
0.86

1.00

.05
.10

.90

.27

.80

.81

.'10

.99

0.00
0.10
0.25
0.65
0.87

0.00
0.11
0.26
0.69
0.91

0.o3 0.'t7 1.00 0.00

0.12 0.11 0.05 0.11
0.32 0.22 0.OZ 0.24
0.81 0.35 0.00 0.57
0.96 0.29 0.00 0.83

0.57

.95

1.oo

.05
.10

.90

.27

.80

.81

.'70

.99

0.01
o.12
0.30
o.7'1
0.9'1

0.01
0.11
0.30
0.79
0.97

0.12 0.12 0.10
0.32 0.33 0.22
0.85 0.85 0.41
0.99 0.98 0.39

0.00
0.11
0.27
0.69
0.93

0.26

.95

1.00

.05

,10

0.04
0.08
0.16
0.30
0.41

0.0s
0.09
0.7't
0.31
0.42

0.05
0.09
0.7't
0.33
0.45

0.04 0.09
0.10 0.11
o.20 0.25
0.40 0.s3
0.56 0.68

0.0s
0.09
0.15
0.28
0.37

0.05

.95

0.11
0.11
0.23
0.53
0.75

0.08
0.10
0.23
0.57
0.80

0.07
0.10
0.24
0.61
0.84

0.11 0.58
0.11 0.12
0.28 0.2't
0.72 0.70
0.94 0.91

0.06
0.10
0.22
0.48
0.69

0.07

0.10
0.09
0.16
0.36
0.5'1

0.07
0.08
0.74
0.36
0.s8

0.08 0.09 0.08
0.15 0.18 0.17
0.39 0.51 0.50
0.64 0.81 0.80

0.05
0.08
0.14
0.30
0.48

0.06

0.11 0.12 0.11

0.68

0.64

0.68

0.59

o.97

0.95
0.99

0.98

0.80

1.00

0.74

0.o7

0.34

0.31

0.31

0.73

0.04
0.30
0.68

0.69

0.99

o.97

1.00
0.06

0.87

0.82

,,10

0.51

0.7't

o.gz

0.s1

0.28

0.74

o.li

0.46

0.08
0.28
0.62
0.95
1.00
0.05
0.11
0.2s
0.65
0.89

0.02
o.17
0.36
0.63
0.76

Pi\$)
AR(8)

Pi(.s\

0.10

AR(BIC)

0.17

Pi\$)

0.07

ss(12)

0.17
0.40
0.'13
0.85

0.70

0.04
0.17
0.39

0.44

0.98

0.98

0.80

1.00

0.'12

1.00

1.00

0.8s

0.91

0.06
o.22
0.43
0.69
o.82

0.06 0.07
0.22 0.23
0.44 0.4'1
0.71 0.78
0.83 0.90

0.06
0.14
o.25
0,40
0.46

0.09
0.26
o.57
0.90
0.98

0.08 0.1 I
0.27 0.26
0.59 0.61
0.92 0.95
0.98 1.00

0.08
o.17
o.37
0.66
o.79

0.80

0.07
0.10
o.23
0.54
o.77

0.06 0.08

0.06

0.06

Pi(s)

0.06

Sc(auto)

0.19

DF-GLS'(.5)
0.06

AR(8)
.

0.14
0.25
0.40

DF-GLS'(.5)
0.07

AR(BIC)

0.16
0.37
0.68

0.10 0.13

0.11

0.r1

0.23 0.29

0.24

0.24

0.58 0.73

0.57

0.60

0.82 0.93

0.80

0.84

replications,

Power

.95

.05
.10

1.00

.90
.80

.81

.'70

.99

1.00

.05

.95

.10

.90

.27

.80

.81

.70

.99

1.00

.05

.9s

.09

.90

.19

.80

.61

.'70

.94

0.5

0.5

0.16
0.35
0.68
0.84

0.11
0.26
0.61
0.71

0.12
0.32
0.86
0.99

0.09
0.18
0.3s
0.48

GARCH MA(l),

Staiionary MA(l),

-0J
0.19
0.14
o.23
o.37
0.46

0.o 05
0.19 0.13
0.15 0.15
0.25 0.25
0.40 0.42
0.48 0.51

-0J
0.18
0.13
0.23
0.38
0.46

0"0

0J

0.18
0.13
0.24
0.39
0.48

0.14
0.13

0.11
0.17
0.34
0.66
0.83

0.08
0.16
0.34
0.68
0.84

0.06
0.17
0.37
0.73
0.89

0.10
0.14
0.29
0.61
0.78

0.07
0.14
0.30
0.63
0.81

0.05

0.00
0.11
0.27
0.68
0.89

0.03
0.12
0.29
0.74
0.93

0.77
0.11
0.20
0.31
0.25

0.00
0.09
0.21
0.51
0.73

0.03
0.10
0.25
0.64
0.85

0.77

0.01
0.11
0.30
0;76
o.97

0.05
0.11
0.30
0.81
0.98

0.51
0.11
0.30
0.80
0.97

0.01
0.10
0.23
0.62
0.87

0.04
0.10
0.25
0.70
0.93

0.49

0.05
0.08
0.15
0.31
0.43

0.05
0.09
0.16
0.32
0.45

0.0s
0.09
0.18
0.38
0.53

0.05
0.08
0.13
0.22
0.30

0.05
0.08
0.13
0.23
0.31

0.04

0.08
0.10
0.23
0.56
0.78

0.06
0.10
0.24
0.59
0.82

0.11
0.11
0.26
0.69
0.91

0.08
0.09
0.19
0.46
0.67

0.07
0.09
0.19
0.49
0.71

0.11

0.07
0.08
0.14
o.34
0.55

0.06
0.08
0.14
0.37
0.60

0.09
0.09
0.18
0.48
0.78

0.o7
0.08
0.15
0.36
0.58

0.05
0.08
0.15
0.39
0.64

0.09

0.25
0.42
0.50

0.14

0.32
0.68
0.86

0.09
0.16
0.25
0.20

0.10
0.24
0.63
0.82

0.09
0.15
0.26
0.33

0.47

DF-z"
AR(BrC)

Notes: For each statistic, entries in the first row are the empirical rejection rate under the null (the size). The remaining entries are size-adjusted power under the model described in the column heading. The column "asymptotic power" is the local-to-unity asymptotic power for each statistic. The entry below the name of each statistic indicates the estimator of $\omega^2$ used (see Section 5). For the results in the final three columns, $u_0$ was drawn from its stationary distribution. Based on 5000 Monte Carlo replications.

0.05

.81

0.30

0.66

o.tz

0.10
0.16
0.37
0.60
0.76

.2't

.80

0.57 0.58 o.4s

0.08

0.24

0.07 0.05 0.29

0.17 0.18 0.15
0.36 0.39 0.32
0.72 0.7'r 0.70
0.88 0.92 0.90

.90

0.13

0.03

0.31

0.18
0.16
0.26
0.41
0.49

AR(l), d 05 -oJ
0.18 0.14 0.13 0.21 0.17
0.16 0.16 0.13 0.15 0.15
0.26 0.2't 0.25 0.2s O.24
0.41 0.44 0.45 0.38 0.38
0.s0 0.53 0.42 0.45 0.46

0.25

0.20

o.29

0.18

MA(l).0:
-

0.42

0.33

0.99

Asymptotic

Tesl
Statistic

0.8
0.16
0.15
0.26
0.40
0.48

0.0

0.29

0.06 0.06 0.12 0.06

0.23 0.23 0.30 o.23
o.45 0.47 0,62 0.42

0.70

0.27

TABLE III
SIZE AND SIZE-ADJUSTED POWER OF SELECTED TESTS OF THE I(1) NULL: MONTE CARLO RESULTS
5% LEVEL TESTS, LINEAR TREND ($z_t = (1, t)'$), T = 100

0- Srationary
0.5 - 0.5 0.0

05

0.29

829

100

":

ARo), 6

-o5

0.02

UNIT ROOT

TABLEIII
sslrcreo

LevuTEsrs,

0.8

STOCK

0.10
0.25
0.63
0.88

0.08
0.15
0.42
0.69

0.09
0.21
0.54
0.76

0.09
0.18
0.52
0.81


$\theta = 0$, the size-adjusted powers of the tests using local detrending exceed that of DF-$\tau^{\tau}$. In the constant mean case, the size-adjusted powers of tests using local demeaning exceed that of DF-$\tau^{\mu}$ for close but not distant alternatives. The gains from employing local-to-unity estimates of the intercept appear to depend crucially on the assumption that, under both the null and the alternative

let

O.ifttrcelementsoftheTxluectot'zandofttrcTxTmatr*Aareboundedinabsoluieualue,then,

Pnoor:

nt

s- l - f

- *)z:

T-@

onl- r, rf, there exist positive m and M such that

r(2) <2rM and r(-I-r) <(2nm)-t. For some constant K, lD'zl<r<llDll,

/(

hence,

lll<KT.

>- | _ V ) zl

: 7- r lat l- | D, zl. 7- t y, E_ | llD, zl < T_ t lxlr( E_,

|
T- ltrl A' ( E- -,1, ) A)l : T- 2 lttl A' >- | D'All < T - 2 r ( E- 1 )l All D'Al
t

lxt (

)f

llOll

O,

Leurraa

A2:
I

(l)

if c : T(a

posiriue and,

- l)

'u',>'il

is

*,

fued as T -

,-a'l

St.,

O.

under Condition

rs

P^
,)ll.

'u',u

O.

Pnoor: Since zrf(),):V-*y(k)eik^ and,l2rf(l)l'r:y'-*p(k)eik^, we find that a2:2nf(0)

>0andIi_-p(k):.-z.As /rr:u+cT-tu-y itwillsufficetoshowthat Sl=T'2u'-{2-1 .-21)u-,3 0 ,na Sz=27-tu'-{2-t - o.-2l)u 1.-2y(0- 1. When uo:0, u-r:Au, where
A:la,,l is a ?X 7 matrix with a,, equal to dr-r-l when r>s and zero otherwise. Note that
la,,l<el'l and, for nonrandom square matrix B, E(u'Bu): tr(B)) and var(u'Bu) :tr(BEB'+
B}B'E). Defining R:(61-t/z - r-!t/2), we find
lE(sr )l : 7-2.- tl1rlA'R E- t/2A r)l r-r.- r( l)r1/2 ( >-t )etclpnl,
=
Var(Sr ) : T-aa-22trl At R r- /2A tAt t- /2RA 2) < 2'T-2@-212(>rrr r-t1"zvtlRAlz ,
E(s2) : 2T- t ltr(A) - .-2 tr( Ee)l
: -2 o-2 7- tDI: t(k)Q - k) dk- | - o-2y(0) - t,
Var(S2 ) : T-2o- 4 fil At R El / 2At R > / 2 + RA EA' Rl < 8T- o:- r ( E )l Mr .
t

d,,: D y(k)pG-r-k)+ Dz(k)p(s -r-k).

Since

<f(l) <M;
T-

94720, U.S.A.,

Let $\gamma(k)$ be the autocovariance function and $f(\lambda)$ the spectral density function for the stationary process $\{v_t\}$ satisfying Condition A. The $rs$ element of the $T \times T$ covariance matrix $\Sigma$ is $\gamma(r - s) = \int_{-\pi}^{\pi} e^{i(r-s)\lambda} f(\lambda)\,d\lambda$. We shall approximate $\Sigma^{-1}$ by the $T \times T$ matrix $\Psi$ with $rs$ element $\psi(r - s) = \int_{-\pi}^{\pi} e^{i(r-s)\lambda}[4\pi^2 f(\lambda)]^{-1}\,d\lambda$. The $rs$ element of $D = I - \Psi\Sigma$ is given by Davies (1973) and Dzhaparidze (1985) as

tim r-rx'(.5-r

T)@

lD'Al<KTt/zllDll, and

Dept. of Economics, University of California-San Diego, 9500 Gilman Dr., La Jolla, CA 92093, U.S.A.,
Dept. of Economics, University of California-Berkeley, Evans Hall, Berkeley,

APPENDIX A: PRELIMINARy LErrluas

D Dia,,l.2Ll-y?)lk
D lp(ilt.-.
j=-@
r-ls:l
k-t

under ConditionA,

accurate, these tests are essentially optimal among tests based on second-order sample moments and should perform considerably better than tests which employ OLS estimates of the parameters determining $d_t$. Our Monte Carlo results suggest that the Dickey-Fuller t test applied to a locally demeaned or detrended time series, using a data-dependent lag length selection procedure, has the best overall performance in terms of small-sample size and power.
The numerical finding that, as a practical matter, the asymptotic power functions of the $P_T(.5)$ and the modified Dickey-Fuller t tests effectively lie on the Gaussian power envelope indicates that, in large samples, there is little room for improvement under the stochastic specification made here. Of course, if the errors have a known non-normal distribution or if the initial error $u_0$ is large compared to $\omega$, better tests could be constructed. Furthermore, the Monte Carlo evidence suggests that autocorrelation in the $v_t$ can have very substantial effects in small samples. Nevertheless, it appears that, when parameters in the deterministic component of a series have to be estimated, the proposed tests for a unit root dominate those currently in common use.

final

TT

tlDll:

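LEMMA A1: Let $\Sigma$ and $\Psi$ be $T \times T$ Toeplitz matrices formed from $\gamma(k)$ and $\psi(k)$, the Fourier coefficients of $2\pi f(\lambda)$ and $[2\pi f(\lambda)]^{-1}$, respectively. Let $x$ be a $T \times 1$ vector such that $\lim_{T\to\infty} T^{-1}|x| = 0$. If the elements of the $T \times 1$ vector $z$ and of the $T \times T$ matrix $A$ are bounded in absolute value, then $\lim_{T\to\infty} T^{-1}x'(\Sigma^{-1} - \Psi)z = 0$ and $\lim_{T\to\infty} T^{-2}\,\mathrm{tr}[A'(\Sigma^{-1} - \Psi)A] = 0$.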

If the sample size is large enough so the effects of residual autocorrelation are captured by $\hat{\omega}^2$ and the asymptotic approximations are

1992;

B:lbijl,let

=LLlf:lbijl,<

As a consequence, we have the following four lemmas.

squares regressions.

Manuscript received November,

llBll

(A1)

6. CONCLUSIONS
The $P_T$ and modified Dickey-Fuller t statistics are easily computed from least

and
Kennedy School of Government, Harvard University, 79 John F. Kennedy St.,
Cambridge, MA 02138, U.S.A.

831

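For real $p \times q$ matrix $B = [b_{ij}]$, let $\|B\| = \sum_i\sum_j |b_{ij}|$, let $r(B)$ be the square root of the largest characteristic root of $B'B$, and let $|B| = \mathrm{tr}^{1/2}(B'B)$. Then, $r(B) \le |B| \le \|B\|$ and, if $B$ and $C$ are conformable, $|\mathrm{tr}(BC)| \le |B||C|$ and $|BC| \le |B|\,r(C)$. Cf. Davies (1973). Since $\sum_j |\delta_j j| < \infty$ implies $\sum_{k=-\infty}^{\infty}|\gamma(k)k| < \infty$ and (by Theorem 5.2 in Zygmund (1968, p. 245)) $\sum_{k=-\infty}^{\infty}|\psi(k)| < \infty$, we find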

hypotheses, only the early observations are informative about that parameter.

cA

UNIT ROOT

'i

:i

To complete the proof, we need to show that $T^{-1}|RA| \to 0$ as $T \to \infty$. Define

$a_T(k) = T^{-2}\sum_{t=1}^{T-k}\sum_{s=1}^{T} a_{ts}a_{t+k,s} = T^{-2}(1 + cT^{-1})^{k}\sum_{j=1}^{T-k-1}(1 + cT^{-1})^{2(j-1)}(T - k - j).$

Note that, for fixed $k$, $a_T(k) \to (e^{2c} - 1 - 2c)/(2c)^2$ when $c \neq 0$ and $a_T(k) \to 1/2$ when $c = 0$, where the limits are independent of $k$. Moreover, the $a_T(k)$ are uniformly bounded, and $\gamma(k)$ and $\psi(k)$ are absolutely summable.
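The stated limit of $a_T(k)$ can be checked numerically. The sketch below builds the matrix $A$ with elements $a_{ts} = \alpha^{t-s-1}$ for $t > s$ (and zero otherwise), taking $\alpha = 1 + c/T$, and compares the double sum with the closed-form limit. The function names `a_T` and `a_limit` are hypothetical, and the sample size used for the comparison is an arbitrary choice.

```python
import numpy as np

def a_T(k, c, T):
    """T^{-2} * sum_{t=1}^{T-k} sum_s a_{ts} a_{t+k,s}, with a_{ts} = alpha^(t-s-1) for t > s."""
    alpha = 1.0 + c / T
    t = np.arange(1, T + 1)
    expo = t[:, None] - t[None, :] - 1                      # t - s - 1
    A = np.where(expo >= 0, alpha ** np.maximum(expo, 0), 0.0)
    # Rows 1..T-k of A are multiplied elementwise by rows k+1..T, then summed.
    return float(np.sum(A[: T - k] * A[k:]) / T**2)

def a_limit(c):
    """Limit of a_T(k) for fixed k as T grows."""
    if c == 0:
        return 0.5
    return (np.exp(2 * c) - 1 - 2 * c) / (2 * c) ** 2

if __name__ == "__main__":
    for c in (-5.0, 0.0, 1.0):
        for k in (0, 3):
            print(c, k, a_T(k, c, 1000), a_limit(c))
```

The limit is the integral $\int_0^1 (1 - r)e^{2cr}\,dr$, which is why it does not depend on the fixed lag $k$.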

s.=E|=l_r*fl(k)-o2

Since

a-2 , it follows that

T-ztt[A,(>-srr)A]:2L

and

AUTOREGRESSIVE

rr=L[*r_r+tp(k)

y(k)lar(&) -a7(0)l -

t-1
T-l
T-2trlA, (V - rrl) A) : 2 | p(k)ta7(k)

_ a1(0)l

k:1

0,

where

T-2

0.

1A

o)-

>A

the data are generated by (7) under Conditions

and B, then

(A4)

(As)

plimT-2 d'- 1 Z-

where

d_,

(0, db

..

.,

dr_

-, :

I-

plim

and Atl

d,_ r

(db d2

S- r 4a

dr,

A, A]

0.

dr_ rl

g,

p(o)r-

,o?

,f

T-1

o?-

[d,d,*o-d,_1d,_111,1

-T-r

- r)2

- a d'
-,

>-

tA

ZA,

E-

- t < T-

e2ktr ( E ) 12 (

E-

- e
which.implies that plimT-2d'-t>-1u_t:0. For the second part of (A5), note that Au_ul
+ 0 and lD'Al37r/z"tctllDll imply T-t ad,(2-t _ v)u_r 5 0 ,in""
)l

d.

rl2

argument used
shows that

T-

t

ld,1l,

d, _

thus completing the proof.

rp

u_

rl < T-

o) - a-2lQr(d) - Qr\)l

t<T

k-_@

p&)l L

o,

where,r, is the last column of X'. Under Condition A,T-t/2upz..-aW,(s). By the continuous
mapping theorem and the Ito calculus,
|

/2

xE + hO, ilw,o)

Io'

n tt, e) + eh(s,

wherelr(s,c-)isthevectorconsistingofthe4-lfunctions

e))w,( s) ds

:I

h?)ldw,

- ew,l,

h,(s,e),H(s,d) isthevectoroffirst

derivatives dh/s,e)/ds, and the time index is dropped in the final term for notational convenience.
Since Iim T- tZ'Z is block diagonal, we find that Qy(a) + u? + o2Q(d) vrhere

orrrroo -rw))
o@ =Uh.aaw, -ew,)f'
Un<etn,<ttf-'11
Setting a to one, the same argument shows that QrG) - Qr1) * azlQc) - O(0)1.
The argument now follows the proof of Lemma A2. Because the elements 2,i of Z
polynomials in t, for all k and for all (i, j) pairs except i:i:1,
the terms

arc

T-k

2,,i21

a,il

are uniformly bounded

to constants independent of k. The ,J element of 7:-r(Z'92is given by LI:l "r]tr"^o
p?)tblk) - b7(0)l and hence tends to zero except when i :J : 1. Since
T-tXt(Z't - 9)X + 0 and T-tX'(2-t - 9)x + 0 by lrmma A1, we conclude that T-rX'>,-rx

rrZ'Z)

o.

last term tends to zero by the same

is stochastically bounded, the algebra of (46)

mryl,)

t<T

o.

ana

+0and o2T-txtE-txaG(e).similarly,T-t(x'>x-.2x'x)+0,so?-llRxl2-0,whereR

mu<ld,l

G10)

k_

QY

and

2

o- T-

< t<t
_a

EIT- d'- | 2-

G) -

Q{G): t'2-12(72-tZ)-tzt>-te . write Z: [.r,x], where x is the flrst

X: [,r,il consists of the other q - 1 columns. Defining el to be the first column of
ITand r tobeavectorof Tones,wecanwrite x:(Tt/2+eT-t/2)et-dT-t/2t;hence,T-tx'x+1
indT-t/2x,t-_ttr+or(7).For j:1,.,...,q_1, xtj:1and x,i:T-i(Tili _c-ri)when /> 1. Forall
tand j, l;,;l<g+eandhenceT-tX'x+0.Definethecontinuousfunctionft;(s,-):(j-cs)si-r
-1.
for 0 < s < Then T- X' X converges to the matrix GG) : I ) : I I ; h I s, z)i jG, d dsl.
The identity X'u - X'-ru-1: X' Au -l 4X'U- r implies
(A9)
T-t/zx't:T-t/2Xt(Au -eT-tu_1) :T-t/2xrur- T-3/2(TAX+eXlu_1

,,

T-k

+2\ oG)T-, |
k-l
,-1

in (5).

T-2d,_t>-rd_1<r(>-t)T-2dt_i_t<r(D-t)max,.rdlyT-0.
I-I Ad,(r-t - ild_t:0. But, defining do =b, wi have
I

is given

uo

oY

!ZG'Z)-'Z't

By Lemma A1, lim

(,{6)

Z,

and

pnoor: Define the qxq diagonal matrix Nr: diag(j.1/2,7,r-1,...,7,-n-') and the f x4
matru Z:ZuNr. Then, setting 6: Au-dT-tu-1:uf (c-e)T-tu-, we have QrG):

column of

plim T- | Ad, E- 1 u _,

dr

...,

k) : u',,E-'Z,:ZLI,- tZ,)-' 222-

Qy

2At

uo:(upuz-du1,...tu7-au7-11

(A8)

RAf - T- 2 fi At R2A : T- 2 tr[ @2A, U-

Leuva A3: If

t
Q7k) : u'oZolZ|Z"l- Z2u,,

LEMMA A4: Suppose the data are generated by (1) under Condition A and $d_t = \beta'z_t$, where $z_t' = (1, t, t^2, \ldots, t^{q-1})$. Then $Q_T(\bar{\alpha}) - Q_T(1)$ has a limiting distribution when $c = T(\alpha - 1)$ and $\bar{c} = T(\bar{\alpha} - 1)$ are fixed as $T$ tends to infinity. Furthermore,

Cf. Anderson (1971, 1,0.2.3). Using (A?), we have

(43)

833

Define

(AT

T-1

UNIT ROOT

So:7-rtz*'rr-t -.-211u !

O,

tr[E(S3S!)]: (d-2T-ltrlXtR>-t/2A>At>-t/2RX'l< a-27-tr(E)r(E-t)e2t"tlRXl2 - 0 and

: s'- z 7- t 11 yt R2 x) : r- z 7- r RX 12 + 0. This impties T- | /2 x'( >- I - r-, t) t 3 o.
By the same argument, T-2LtE-tL-0, T-rll-rer-0, T-rt2-1t1 0, ura T-1e'rE-tu-rLO.
since

tr[E(s+sl)]

834

G. ELLIOTT, T. J. ROTHENBERG, 41qp J. H.

Definin_g

x'b) 3
(Ai1)

AUTOREGRESSIVE

STOCK

the chi-square vaiate Xz(u)=(e,rl-tu)2/e\I-re1, we find that 1*,2-tE)z7xtE-txT- tZ'>- tZ is block diagonal, we have

0. Since lim

OYG): a-2er(d)

(A8) follows.

- .-ru? + oo0).

+ x2(u)

on d,

(fw,'z,W,'z(D). Lemma 42 implies that iiizu'_r2'-tu_r,27-tu,_rE-t

to a-2 (T-2 u'- ru - y,2T- | u'-, Au + /0)- .2). But
(B1)
since

2T-tu'-, Au=T-11u2

Au'

Aul:f-tul.-

(82)

e2

T-

Lt

-, E-

-, -

ZeT-

If y is used in place of Il in the slowly

2

a2T- (d'_ t

2-

d_

i2

d'_

| tt'

-I

E-' Au -

e2

y(0).

W,

/u)

converies in probability

(l) - ll.

u_

r)

2eT-

t(

>-

Recalling that

L| :

Ii :

B)

y;l 2-,

minrL(-u, B)
T-

>-, z.(zL E- l z )- 2," E-' r.

]
minoL{l, F) and

a:

>- | Au + Ad, E-

u_

u,o

E-

uo

- e{

(a).

(83)

(T-2u'-p

u'

- Et

- 1,7-

2eT-

u' _ t

uzr,grG) - ereD

I4- tt

=e2

Iw,'-aw"z(7)

+ g(o)

- ,r([w,r,w,2(t),eG)- O(0)),

- eG) + c.

For polynomial trend, $L^*$ is a function of $c$, $\bar{c}$, and $q$; it does not depend on $\Sigma$ at all. When $q = 1$ so $z_t = 1$, $Q(\bar{c}) = 0$ and the result is the same as in part (a).
(c) Wtren
(A10)
is given by (1 -cs) so [h(e)2:t_ e +
/,.1 Fo+ Pl + ut, the tunction lr(s,c) in
"
e'/3
and lhG)G\-zw,):Q,-e)w,(l)+ezlswc(s). Afrer considerable algebraic manipulation,
we find the following alternative expressions for (84):

(Bs)

t -e : r, I r:

(1,

e)w,2 O)

e2

/3),

azPr -- s(d)

: izT-

u'

,'ouo

- Qrk),

s(1)(1 + er- t)
- ru - t

Or( d).

P,

a-'.'le'lw,'

- a(e)]

-ew,'<t>+ o(o)
= Lx

as long as plim 6'2

o2.

PRooF oF THEoREM 3: It suffices to show that A2Prl 0 *h", ?+ o with lal < 1 and fixed.
Since the initial condition is asymptotically negligible, {u,} behaves like a stationary process. Cf.
Anderson (1971, Section 5.5.2). Thus T-zu'-ru-r3O urA f-'u'r30. From (Ag), T-t/2XE:

^p g.
(86) that to'Pr
-

0, so both

Or(1) and Q7(d) converge to af. It follows from

+ ?-1c-, we find

where Q is defined in (A10). Thus, from (A8) and the argument leading to (B2),

(84)

(86)

Z- | Au + ey e)

- Oy G).
-, For the polynomial trend of Lemma 44, we have from the continuous mapping theorem,
c2

Comparing (B4) and (B7), we see that P, + e

d _ L + d, _ |

to the statistic on the left of (B2). But, from Lemma ,A3, these terms converge in probability to zero
under Condition B, so the limiting distribution is unchanged.
(b) From standard GLS projection theory and (A7)
min L(o,

(.B7)

>-

I : o - d)/(l -

^w,(l) y,P be the detrended series

process Z.(s, c-) has a simple interpretation.

The limiting results (A10) and (B3) follow from the fact rhat T-t/2uL"rt-alY,(s) and that
,-' r' , L z(0). Since these limits also follow from Condition C, we have

ftrs,

elw,2

+ (7 - )r)3lsw,(s), and Iz.(s,e) : I/.(s) -sb,. The

l*t
_
_ Bol _ ( Ey _ ilt
_
_
),p : t, B{ B( t : ut ( B{
te
wnere B{ ard B( are the estim^ates that minimize L(a, il when d : 7 +.7- . From the algebra
pfl is stochastically bounded and T'/2(E( - F)- ob,. Hince,
leading io (A11), we find that

where

estimates. Thus the same interpretation holds when detrending is done with OLS.

z(0) +or(1)

T-r Au' ALt:T-ru'u + czT-3u,_rtt_r+ZcT-2u,_ru

835

ROOT

r)1tr,. By I-emma
4(s,e) is the limiting representation of the standardized detrended series 7-rl2opolynomial
of
in
the
trend
case
are
asymptotically
estimates
equivalent
to GIS
B
44, OLS

The statistical theory underlying Theorem 1 can be found in Lehmann (1959, Chapter 6). The
limiting representations for the test statistics are derived as follows:
(a) Under Condition A, T-1/2up,]
aWG) and hence a-2(T-2u,_ru_yT-ruzr) *

UNIT

- 0 - e + e, fil-

tf<r

-.)w,(1) *

REFERENCES
ANDERSON, T. W. (1971): The Statistical Analysis of Time Series. New York: Wiley.
ANDREWS, D. W. K. (1991): "Heteroskedasticity and Autocorrelation Consistent Covariance Matrix Estimation," Econometrica, 59, 817-858.
BANERJEE, A., J. J. DOLADO, J. W. GALBRAITH, AND D. F. HENDRY (1993): Co-integration, Error Correction, and the Econometric Analysis of Non-stationary Data. Oxford: Oxford University Press.
BHARGAVA, A. (1986): "On the Theory of Testing for Unit Roots in Observed Time Series," Review of Economic Studies, 53, 369-384.
CHAN, N. H., AND C. Z. WEI (1987): "Asymptotic Inference for Nearly Nonstationary AR(1) Processes," Annals of Statistics, 15, 1050-1063.
DAVIES, R. B. (1973): "Asymptotic Inference in Stationary Gaussian Time-Series," Advances in Applied Probability, 5, 469-497.
DEJONG, D. N., J. C. NANKERVIS, N. E. SAVIN, AND C. H. WHITEMAN (1992): "The Power Problems of Unit Root Tests in Time Series with Autoregressive Errors," Journal of Econometrics, 53, 323-343.

r,

l rw,<rlf'

:e'ftlw,<a - sw,(l))2rls + (1- n

llw,<a-,11,2<,1]'r"]
$= \bar{c}^2\int [W_c(s) - sb_1]^2\,ds + (1 - \bar{c})[W_c(1) - b_1]^2 = \bar{c}^2\int V_c^2(s, \bar{c})\,ds + (1 - \bar{c})V_c^2(1, \bar{c})$

DICKEY, D. A., AND W. A. FULLER (1979): "Distribution of the Estimators for Autoregressive Time Series with a Unit Root," Journal of the American Statistical Association, 74, 427-431.
DUFOUR, J.-M., AND M. L. KING (1991): "Optimal Invariant Tests for the Autocorrelation Coefficient in Linear Regressions with Stationary or Nonstationary AR(1) Errors," Journal of Econometrics, 47, 115-143.
DZHAPARIDZE, K. (1985): Parameter Estimation and Hypothesis Testing in Spectral Analysis of Stationary Time Series. New York: Springer-Verlag.
ELLIOTT, G. (1993): "Efficient Tests for a Unit Root when the Initial Observation is Drawn from its Unconditional Distribution," unpublished manuscript.
ENGLE, R. F. (1984): "Wald, Likelihood Ratio, and Lagrange Multiplier Tests in Econometrics," in Handbook of Econometrics, Vol. II, ed. by Z. Griliches and M. Intriligator. New York: North Holland.
FULLER, W. A. (1976): Introduction to Statistical Time Series. New York: Wiley.
KING, M. L. (1980): "Robust Tests for Spherical Symmetry and their Application to Least Squares Regression," Annals of Statistics, 8, 1265-1271.
KING, M. L. (1988): "Towards a Theory of Point Optimal Testing," Econometric Reviews, 6, 169-218.
LEHMANN, E. (1959): Testing Statistical Hypotheses. New York: Wiley.
LUMSDAINE, R. L. (1995): "Finite Sample Properties of the Maximum Likelihood Estimator in GARCH(1, 1) and IGARCH(1, 1) Models: A Monte Carlo Investigation," Journal of Business and Economic Statistics, 13, 1-10.
NABEYA, S., AND K. TANAKA (1990): "Limiting Power of Unit-Root Tests in Time-Series Regression," Journal of Econometrics, 46, 247-271.
PERRON, P. (1996): "The Adequacy of Asymptotic Approximations in the Near-Integrated Autoregressive Model with Dependent Errors," Journal of Econometrics, 70, 317-350.
PHILLIPS, P. C. B. (1987): "Time Series Regression with a Unit Root," Econometrica, 55, 277-301.
PHILLIPS, P. C. B., AND P. PERRON (1988): "Testing for a Unit Root in a Time Series Regression," Biometrika, 75, 335-346.
PHILLIPS, P. C. B., AND V. SOLO (1992): "Asymptotics for Linear Processes," Annals of Statistics, 20, 971-1001.
ROBINSON, P. M. (1994): "Efficient Tests of Nonstationary Hypotheses," Journal of the American Statistical Association, 89, 1420-1437.
SAIKKONEN, P., AND R. LUUKKONEN (1993): "Point Optimal Tests for Testing the Order of Differencing in ARIMA Models," Econometric Theory, 9, 343-362.
SARGAN, J. D., AND A. BHARGAVA (1983): "Testing Residuals from Least Squares Regression for Being Generated by the Gaussian Random Walk," Econometrica, 51, 153-174.
SCHWARZ, G. (1978): "Estimating the Dimension of a Model," Annals of Statistics, 6, 461-464.
SCHWERT, G. W. (1989): "Tests for Unit Roots: A Monte Carlo Investigation," Journal of Business and Economic Statistics, 7, 147-159.
STOCK, J. H. (1994): "Unit Roots and Trend Breaks in Econometrics," in Handbook of Econometrics, Vol. 4, ed. by R. F. Engle and D. McFadden. New York: North Holland, pp. 2740-2841.
ZYGMUND, A. (1968): Trigonometric Series, Vol. 1. Cambridge: Cambridge University Press.

ECONOMETRICA, JOURNAL OF THE ECONOMETRIC SOCIETY, VOL. 64, NO. 4, July, 1996
GRAHAM ELLIOTT, THOMAS J. ROTHENBERG, AND JAMES H. STOCK: Efficient Tests for an Autoregressive Unit Root, 813