
UNIT ROOT AND COINTEGRATING LIMIT THEORY

WHEN INITIALIZATION IS IN THE INFINITE PAST




By

Peter C. B. Phillips and Tassos Magdalinos



May 2008






COWLES FOUNDATION DISCUSSION PAPER NO. 1655













COWLES FOUNDATION FOR RESEARCH IN ECONOMICS
YALE UNIVERSITY
Box 208281
New Haven, Connecticut 06520-8281

http://cowles.econ.yale.edu/
Unit Root and Cointegrating Limit Theory when Initialization
is in the Infinite Past¹

Peter C. B. Phillips
Cowles Foundation for Research in Economics, Yale University
and University of Auckland & Singapore Management University

Tassos Magdalinos
University of Nottingham, UK

January 28, 2008

¹ Phillips acknowledges partial support from a Kelly Fellowship and from the NSF under Grant Nos. SES 04-142254 and SES 06-47086. Correspondence to: Peter C. B. Phillips, Department of Economics, Yale University, P.O. Box 208268, New Haven, CT 06520-8268. Email: peter.phillips@yale.edu
Abstract

It is well known that unit root limit distributions are sensitive to initial conditions in the distant past. If the distant past initialization is extended to the infinite past, the initial condition dominates the limit theory, producing a faster rate of convergence, a limiting Cauchy distribution for the least squares coefficient and a limiting normal distribution for the t ratio. This amounts to the tail of the unit root process wagging the dog of the unit root limit theory. These simple results apply in the case of a univariate autoregression with no intercept. The limit theory for vector unit root regression and cointegrating regression is affected but is no longer dominated by infinite past initializations. The latter contribute to the limiting distribution of the least squares estimator and produce a singularity in the limit theory, but do not change the principal rate of convergence. Usual cointegrating regression theory and inference continues to hold in spite of the degeneracy in the limit theory and is therefore robust to initial conditions that extend to the infinite past.

Keywords: Cauchy limit distribution, cointegration, distant past initialization, infinite past initialization, random orthonormalization, singular limit theory.

JEL classification: C22
1. Introduction

Early research on unit root limit theory revealed that initial conditions could play an important role in the finite sample performance of tests and the form of the limit distribution. The latter role was evident in continuous record asymptotics (Phillips, 1987) and unit root asymptotics developed under distant past initializations (Phillips and Lee, 1996; Uhlig, 1995). The importance of initial conditions in affecting size and power in inference has been particularly emphasized in recent work (Elliott, 1999; Müller and Elliott, 2003; Elliott and Müller, 2006; Harvey, Leybourne and Taylor, 2007).

For many economic time series that wander randomly like integrated processes, the precise initialization of the sample observations that are used in inference typically has nothing to do with and, in principle at least, should not affect the underlying stochastic properties of the time series. Moreover, the stochastic properties of the initial observation must often be expected to be analogous to those of the terminal observation. Accordingly, just as the time series may wander according to a stochastic trend, the initialization itself may be regarded as the outcome of a similar random wandering process that may have originated in the distant past. In developing asymptotics that embody these properties, it is therefore of some interest to determine the effects of such conditions on the form of the limit theory and on econometric inference.

The present contribution points out that if a distant past initialization is extended to the infinite past, as is frequently the case in stationary series, then the unit root limit theory is dominated by the initial condition. This outcome is equivalent to the tail of the unit root process wagging the dog of the unit root limit theory, an analogy given in an early draft of this paper (Phillips, 2006). Thus, even though an invariance principle still operates, the tail of the process determines the form of the limit theory. In such cases, initial conditions are evidently of great significance.
To fix ideas, consider the simple unit root autoregression
$$x_t = \rho x_{t-1} + u_t, \quad t = 1, \ldots, n, \quad \rho = 1, \quad (1)$$
driven by stationary innovations $u_t$. In order for the process $x_t$ to be uniquely defined by the stochastic difference equation (1) an initial condition is required. In most cases this initial condition is taken to be a constant or a random variable with a specified distribution; see, e.g., White (1958) and Anderson (1959). However, other possibilities may be considered. Much of the theory for the stationary case ($|\rho| < 1$) is based on the Wold decomposition $x_t = \sum_{j=0}^{\infty} \rho^j u_{t-j}$, which entails an initial condition of the form $x_0 = \sum_{j=0}^{\infty} \rho^j u_{-j}$ for (1), so that $x_0$ and $x_t$ are comparable in distribution and order of magnitude. When $\rho = 1$ the infinite series in this initialization for $x_0$ diverges almost surely. We can nonetheless consider an initial condition of the form
$$x_0(n) = \sum_{j=0}^{\kappa_n} u_{-j}, \quad (2)$$
where $\kappa_n$ is an integer-valued sequence increasing to infinity with the sample size. Clearly, the sequence $\kappa_n$ determines how many past innovations are included in the initial condition, with larger values of $\kappa_n$ associated with the more distant past. As shown below, under suitable assumptions on the innovation sequence, $\kappa_n^{-1/2} x_0(n)$ has a limiting form that dominates the rate of convergence and the asymptotic distribution of $\hat{\rho}_n$ when $\kappa_n/n \to \infty$. The limit distribution of $\hat{\rho}_n$ is then Cauchy and bears more similarity to autoregressions with explosive or mildly explosive roots (cf. Phillips and Magdalinos, 2007a, 2007b) than it does to conventional unit root limit theory. Andrews and Guggenberger (2007) found that a similar result applies for autoregressions with roots very close to unity and infinite past initializations.
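As a purely illustrative aside (not part of the paper's analysis), the process (1) with initialization (2) is easy to simulate. The function name and the i.i.d. Gaussian choice for the innovations are our own assumptions; the i.i.d. case is a special case of the weak dependence conditions imposed below.

```python
import numpy as np

def simulate_unit_root(n, kappa_n, rng):
    """Simulate x_t = x_{t-1} + u_t with initialization x_0(n) = sum_{j=0}^{kappa_n} u_{-j}.

    Innovations are i.i.d. N(0, 1) for simplicity, a special case of the
    linear process assumption used in the paper.
    """
    # Past innovations u_0, u_{-1}, ..., u_{-kappa_n} build the initial condition.
    x0 = rng.standard_normal(kappa_n + 1).sum()
    # Sample-period innovations u_1, ..., u_n drive the observed trajectory,
    # so that x_t = x_0(n) + S_t with S_t the partial sum of the u's.
    u = rng.standard_normal(n)
    x = x0 + np.cumsum(u)
    return x0, x

rng = np.random.default_rng(0)
n = 200
# kappa_n growing faster than n corresponds to an infinite past initialization:
# x_0(n) = O_p(kappa_n^{1/2}) then dominates the O_p(n^{1/2}) sample fluctuations.
x0, x = simulate_unit_root(n, kappa_n=n**2, rng=rng)
print(abs(x0), np.abs(x - x0).max())
```

With $\kappa_n = n^2$ the initialization is typically an order of magnitude larger than the within-sample movement of the trajectory, which is the phenomenon studied below.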
On the other hand, when (1) is a vector autoregression, an infinite past initialization for the process gives rise to a singularity in the asymptotic form of the sample moment matrix. This degeneracy is analyzed in the paper by characterizing the degeneracy and rotating the regression coordinates in a direction orthogonal to the initial condition. These reductions produce a limit theory for the least squares estimator that has the usual $n$-rate of convergence but a different analytic form.

In cointegrated models involving integrated processes where initial conditions are in the infinite past, a similar degeneracy occurs in the limiting sample moments. Nonetheless, the usual mixed normal limit theory for estimation of the cointegrating matrix still applies and inference may proceed as usual in such situations. The effect of infinite past initializations is therefore moderated in multiple regressions when there are some unit roots. These results are relevant in practice and confirm that there is some robustness in cointegrating regression theory to very distant initializations. In this respect, scalar unit root limit theory and cointegration theory are again quite distinct.

The paper is organized as follows. Section 2 outlines the models used and formulates initial conditions into three categories (recent, distant, and infinitely distant) determined by the inherent order of magnitude of the initialization and the extent to which the initialization reaches into the past. Our primary interest in this paper is in the third category, where infinite past initializations are permitted. Some preliminary results for unit root autoregressions and vector autoregressions as well as the new limit theory for infinitely distant initializations are presented in Section 2. Section 3 develops the corresponding limit theory for cointegrated systems and explores the implications for inference. Section 4 discusses extensions to models with deterministic trend. Proofs are given in the Appendix. Throughout the paper standard weak convergence and unit root limit theory notation is employed.
2. Limit Theory under Extended Initializations

2.1 Model and assumptions

Consider an $\mathbb{R}^K$-valued integrated process generated by
$$x_t = R x_{t-1} + u_t, \quad t = 1, \ldots, n, \quad R = I_K, \quad (3)$$
where $u_t$ is a sequence of zero mean, weakly dependent disturbances and $x_0 = x_0(n)$ is an initialization based on past innovations that is possibly dependent on the sample size $n$. The latter dependence enables $x_0$ to have analogous properties to those of the sample trajectory values $x_t$, $t = 1, \ldots, n$. The following conditions facilitate the development of a limit theory based on the Phillips-Solo (1992) framework.
Assumption LP. Let $F(z) = \sum_{j=0}^{\infty} F_j z^j$, where $F_0 = I_K$ and $F(1)$ has full rank. For each $s \in \mathbb{Z}$, $u_s$ has Wold representation
$$u_s = F(L)\varepsilon_s = \sum_{j=0}^{\infty} F_j \varepsilon_{s-j}, \quad \sum_{j=0}^{\infty} j^2 \|F_j\|^2 < \infty, \quad (4)$$
where $(\varepsilon_s)_{s \in \mathbb{Z}}$ is a sequence of independent and identically distributed $(0, \Sigma)$ random vectors with $\Sigma > 0$.

We employ the usual notation $\Omega = F(1)\Sigma F(1)'$ for the long run variance of $u_s$, and $\Lambda = \sum_{h=1}^{\infty} E\left(u_t u_{t-h}'\right)$, $\Delta = \sum_{h=0}^{\infty} E\left(u_t u_{t-h}'\right)$ for the one sided long run covariance matrices.
Under (3), we may decompose $x_t$ as
$$x_t = x_0(n) + S_t, \quad (5)$$
where $S_t := \sum_{j=1}^{t} u_j$ is an integrated process with initial condition $S_0 = 0$. The asymptotic behavior of $x_t$ is governed by the order of magnitude of the initialization $x_0(n)$, which in turn depends on the behavior of $\kappa_n$ as $n \to \infty$.
Assumption IC. The initial condition $x_0(n)$ of the stochastic difference equation (3) is given by (2) with $u_j$ satisfying Assumption LP and $(\kappa_n)_{n \in \mathbb{N}}$ an integer valued sequence satisfying $\kappa_n \to \infty$ and
$$\frac{\kappa_n}{n} \to \tau \in [0, \infty] \text{ as } n \to \infty. \quad (6)$$
The following cases are distinguished:

(i) If $\tau = 0$, $x_0(n)$ is said to be a recent past initialization.

(ii) If $\tau \in (0, \infty)$, $x_0(n)$ is said to be a distant past initialization.

(iii) If $\tau = \infty$, $x_0(n)$ is said to be an infinite past (or infinitely distant) initialization.
The above rates for the sequence $\kappa_n$ are considered in view of the differing impact of the initial condition on the time series $x_t$ and least squares regression theory on (3). Recent past initializations where $\tau = 0$ satisfy $x_0(n) = O_p\left(\kappa_n^{1/2}\right) = o_p\left(n^{1/2}\right)$ and do not contribute to the limiting distribution of the least squares coefficient estimator
$$\hat{R}_n = \left(\sum_{t=1}^{n} x_t x_{t-1}'\right)\left(\sum_{t=1}^{n} x_{t-1} x_{t-1}'\right)^{-1},$$
in the same way that constant initial conditions are asymptotically negligible. Thus, the limit distribution of the standardized and centred estimator $n(\hat{R}_n - I_K)$ is invariant to recent past initialization of the process and has the standard form given in Phillips and Durlauf (1986). Distant past initializations have asymptotic order $x_0(n) = O_p\left(n^{1/2}\right)$ and are of the same order of magnitude as the partial sum process in the functional limit theory that drives unit root asymptotics. In consequence, the standard approach to unit root limit theory applies but with an additional contribution from the initial condition, as shown in Phillips and Lee (1996) for the near-integrated case.
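For concreteness, the least squares estimator $\hat{R}_n$ above can be computed directly from a simulated trajectory. The following sketch is our own illustration (function names and design choices are assumptions, not the paper's); under a recent initialization it exhibits the usual $n$-consistency for $R = I_K$.

```python
import numpy as np

def ls_autoregression(x):
    """Least squares estimator R_hat = (sum_t x_t x_{t-1}') (sum_t x_{t-1} x_{t-1}')^{-1}
    for the VAR(1) x_t = R x_{t-1} + u_t, where x is an (n+1) x K array with row 0 = x_0."""
    X1, X0 = x[1:], x[:-1]          # x_t and x_{t-1} for t = 1, ..., n
    return (X1.T @ X0) @ np.linalg.inv(X0.T @ X0)

rng = np.random.default_rng(1)
n, K = 500, 2
u = rng.standard_normal((n + 1, K))
x = np.cumsum(u, axis=0)            # pure random walk: recent (negligible) initialization
R_hat = ls_autoregression(x)
# Under tau = 0 the estimator is n-consistent for R = I_K, so R_hat is close to I_2.
print(np.round(R_hat, 3))
```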
The effect of infinite past initializations on unit root limit theory is materially different and seems not to have been considered in the published literature, although some results may be familiar.² The present paper makes several contributions to this subject. First, we show that an infinite past initialization dominates the unit root limit theory, giving rise to a Cauchy limit distribution for the normalized and centred least squares estimator and a limit normal distribution for the t statistic in the univariate case. These results, which are analogous to those for an explosive Gaussian autoregression, hold under an invariance principle. Second, for multivariate integrated regressors the effects are shown to be more complex in nature but simpler in terms of their implications. The complexity arises because infinite past initializations produce an asymptotic degeneracy that gives rise to a singular least squares regression limit theory. This singularity carries over to cointegrating regression limit theory, where the effects are important for inference because they ensure robustness of the standard limit theory to infinite past initial conditions, thereby simplifying the effects of initialization on inference. In this respect, there are some major differences between the effects of large initial conditions on unit root limit theory and cointegrating regression theory.
2.2 Recent and distant past initializations

The following result summarizes limit theory for $\hat{R}_n$ covering recent and distant past initializations and is largely already familiar.

Theorem 1. Under model (3) and Assumptions LP and IC with $\tau \in [0, \infty)$,
$$n\left(\hat{R}_n - I_K\right) \Rightarrow \left(\int_0^1 dB\, B_\tau^{*\prime} + \Lambda\right)\left(\int_0^1 B_\tau^* B_\tau^{*\prime}\right)^{-1}, \quad \text{as } n \to \infty, \quad (7)$$
where $B$, $B_0$ are independent $K$-vector Brownian motions with variance matrix $\Omega$, $B_\tau^*(r) = B(r) + \sqrt{\tau}\, B_0(1)$, and $\Lambda = \sum_{h=1}^{\infty} E\left(u_t u_{t-h}'\right)$.
Remark A

(i) Under recent past initializations, $\tau = 0$ and the usual least squares regression theory (Phillips and Durlauf, 1986; Phillips, 1988a) applies. Similar results have been obtained (see Müller and Elliott (2003) and the references therein) for nearly integrated processes with coefficient matrix $R_n = I_K + C/n$, $C = \mathrm{diag}(c_i) < 0$, and an initial condition of the form $x_0(n) = \sum_{j=0}^{\infty} R_n^j u_{-j}$. Of course, when $C = 0$ this infinite series diverges. The integrated processes of this paper could be nested into a local to unity framework by choosing a more flexible distant past initialization of the form
$$x_0(n) = \sum_{j=0}^{\kappa_n} R_n^j u_{-j}, \quad \text{where } \kappa_n/n \to \tau \in (0, \infty),$$
with $R_n = I_K + C/n$, as in Phillips and Lee (1996). Theorem 1 then specializes that limit theory to the case where $C = 0$. In this sense, Theorem 1 is not new and is included for the sake of completeness.

² For instance, the scalar case has been given in Yale time series lectures for some years and, as mentioned above, Andrews and Guggenberger (2007) recently considered a very near to unity scalar limit theory with infinite past initializations.
(ii) The Brownian motions $B_0$ and $B$ in Theorem 1 are independent limit processes corresponding to partial sums that involve past and sample period innovations, respectively. These processes are defined by the functional laws
$$Y_n^0(r) := \frac{F(1)}{\kappa_n^{1/2}} \sum_{j=0}^{\lfloor \kappa_n r \rfloor} \varepsilon_{-j} \Rightarrow B_0(r) \quad \text{and} \quad Y_n(r) := \frac{F(1)}{n^{1/2}} \sum_{t=1}^{\lfloor nr \rfloor} \varepsilon_t \Rightarrow B(r)$$
given in (27) and (37) below. The composite process $B_\tau^*(r)$ in Theorem 1 then depends on both the limiting sample trajectory $B(r)$ and the component $\sqrt{\tau}\, B_0(1)$ which carries the effect of the initial conditions.

(iii) Theorem 1 is readily extended to include the case where a nonparametric bias correction (Phillips, 1987) is made to the estimate $\hat{R}_n$ involving a consistent estimator $\hat{\Lambda}$ of the one sided long run covariance matrix $\Lambda$ that is constructed in the usual manner from regression residuals. Expression (7) in the limit theory is adjusted accordingly, eliminating the term in the numerator of the matrix quotient that involves $\Lambda$. Evidently, the critical values corresponding to this limit theory differ from those delivered by standard unit root tabulations when $\tau > 0$, partly explaining the size distortions from distant initializations that can occur in such cases.

(iv) If an intercept is included in the regression, i.e. the integrated process $x_t$ is generated by (3) but the least squares estimator is obtained from the regression
$$x_t = \hat{\mu}_n + \hat{R}_n x_{t-1} + \hat{u}_t, \quad (8)$$
then the distribution of $\hat{R}_n$ is invariant to the initial condition $x_0$ even in finite samples. This simple algebraic fact implies, in particular, that the limit theory for least squares regression in this case is given by Theorem 1 with $\tau = 0$ and $B$ replaced by demeaned Brownian motion.
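The finite sample invariance claimed in (iv) is easy to verify numerically: under (3) with $R = I_K$, changing $x_0$ shifts every $x_t$ by the same constant, which the fitted intercept absorbs exactly. The check below is our own illustration in the scalar case.

```python
import numpy as np

def ar1_with_intercept(x):
    """Scalar least squares regression x_t = mu + rho x_{t-1} + u_t; returns rho_hat.

    With a fitted intercept, rho_hat depends on the data only through demeaned
    quantities, so it is unaffected by a common level shift of the whole series.
    """
    y, z = x[1:], x[:-1]
    zc = z - z.mean()
    return (zc @ (y - y.mean())) / (zc @ zc)

rng = np.random.default_rng(2)
u = rng.standard_normal(300)
x = np.cumsum(u)                 # trajectory with x_0 = 0
shift = 1e3                      # changing x_0 shifts the whole unit root path
rho_a = ar1_with_intercept(x)
rho_b = ar1_with_intercept(x + shift)
print(rho_a, rho_b)              # identical up to floating point rounding
```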
2.3 Infinite past initializations: scalar autoregression

The main contribution of the present work is the development of a limit theory under infinitely distant initializations, as presented in Theorems 2 and 3 below and in the cointegration limit theory that follows in Sections 3 and 4. We start with the scalar case.

Theorem 2. When $K = 1$ and Assumptions LP and IC hold with $\tau = \infty$, the following limit theory applies as $n \to \infty$:

(i) $\sqrt{n \kappa_n}\left(\hat{\rho}_n - 1\right) \Rightarrow \mathcal{C}$, where $\mathcal{C}$ is a standard Cauchy variate.

(ii) Letting $s_n^2 = n^{-1} \sum_{t=1}^{n} \left(x_t - \hat{\rho}_n x_{t-1}\right)^2$, the t-statistic satisfies
$$\left(\sum_{t=1}^{n} x_{t-1}^2\right)^{1/2} s_n^{-1}\left(\hat{\rho}_n - 1\right) \Rightarrow \frac{\omega^{1/2}}{\sigma}\, W(1), \quad (9)$$
where $\omega$ denotes the (scalar) long run variance $\Omega$ of $u_t$, $\sigma^2 = E\left(u_t^2\right)$, and $W$ is standard Brownian motion.
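A small Monte Carlo makes Theorem 2(i) concrete. The design below (sample size, $\kappa_n = n^2$, i.i.d. N(0,1) innovations so that $\omega = \sigma^2 = 1$) is our own illustrative choice; for a standard Cauchy variate the population quartiles are $-1$, $0$ and $1$, and extreme draws are routine because the distribution has no finite moments.

```python
import numpy as np

def centred_statistic(n, kappa_n, rng):
    """Return sqrt(n * kappa_n) * (rho_hat - 1) for one simulated trajectory of (1)-(2)."""
    x0 = rng.standard_normal(kappa_n + 1).sum()      # x_0(n) as in (2), iid innovations
    x = np.empty(n + 1)
    x[0] = x0
    x[1:] = x0 + np.cumsum(rng.standard_normal(n))   # x_t = x_0(n) + S_t
    rho_hat = (x[1:] @ x[:-1]) / (x[:-1] @ x[:-1])
    return np.sqrt(n * kappa_n) * (rho_hat - 1.0)

rng = np.random.default_rng(3)
n = 200
stats = np.array([centred_statistic(n, n**2, rng) for _ in range(2000)])
q25, q50, q75 = np.quantile(stats, [0.25, 0.5, 0.75])
# The empirical quartiles should be near the standard Cauchy quartiles (-1, 0, 1).
print(q25, q50, q75)
```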
Remark B

(i) Part (ii) follows immediately from part (i) and the fact that $s_n^2 = n^{-1} \sum_{t=1}^{n} u_t^2 + O_p\left(n^{-1}\right) \to_p E\left(u_t^2\right)$. Theorem 2 shows that integrated processes with infinitely distant initializations do not conform with the usual unit root asymptotics. The asymptotic behavior of the least squares estimator presents more similarities to explosive than to unit root regression theory in the form of the limiting distribution, its symmetry, and the rate of convergence. The latter can be made to grow arbitrarily fast according to how far into the past of the innovation sequence the initial condition $x_0(n)$ is allowed to reach. If the sequence $\kappa_n$ is allowed to increase at an exponential rate, the least squares estimator of Theorem 2 may achieve or even exceed the explosive consistency rate.

(ii) The limit behavior of the t-statistic also resembles the standard stationary and Gaussian explosive cases. When the innovation sequence $u_t$ is independent, $\omega = E\left(u_1^2\right)$, so the t-statistic has a standard normal limit distribution. Andrews and Guggenberger (2007) derived a related result by considering local to unity autoregressions with an infinite past initialization based on i.i.d. innovations. The present result extends that theory to the unit root case with weakly dependent innovations. Obviously, both (i) and (ii) can be used for inference, and in the case of the t statistic consistent estimation of $\omega$ and $\sigma^2$ can be accomplished by standard methods.

(iii) It is worth pointing out that the effect of the dominating initial condition in Theorem 2 is analogous to the effect of the initial condition and initial shocks in an explosive autoregression. In that case the initial condition and shocks also play a dominant role in determining the form of the signal, which behaves like the square of a one dimensional random variable whose distribution depends on the distribution of the shocks in the pure explosive case but not in the mildly explosive case; see Phillips and Magdalinos (2007a). In the present case, the centred least squares estimator again behaves like the ratio of two independent random variables, one determined by the past (through $B_0$) and one by the future (through $B$). Unlike the explosive case, the limit theory involves an invariance principle because the dominating initial condition effect arises from the functional limit law (28).

(iv) The heuristic explanation for the result in Theorem 2(i) is that when $\tau = \infty$ the behavior of the time series $x_{\lfloor nr \rfloor}$ is overtaken by the one dimensional normal random variable $B_0(1)$, which does not depend on $r$, the limiting point in the sample trajectory corresponding to $t$. So the limiting trajectory of the process over the sample period is dominated by the infinitely distant initialization. Hence, upon suitable scaling, the numerator of the centred least squares estimate is a product of independent normals and the denominator is the square of one of these normals, thereby producing a Cauchy limit distribution for the centred coefficient. Upon random normalization in the case of the t ratio, the effect of the infinitely distant initialization cancels from the numerator and denominator, producing a Gaussian limit. In both cases, the tail of the unit root process wags the trajectory of the process and in doing so defines the limit theory when $\tau = \infty$.
(v) Theorem 2 holds for regressions through the origin. If an intercept is included in the regression as in (8), the effect of the initial condition is eliminated and standard unit root limit theory involving demeaned Brownian motion applies.

(vi) The Cauchy limiting distribution of Theorem 2 requires that the initial condition is a random process. If the initial condition is a non-stochastic sequence increasing at a rate faster than $O\left(n^{1/2}\right)$, i.e. $x_0(n)/\sqrt{\kappa_n} \to b_0 \in \mathbb{R} \setminus \{0\}$ with $\kappa_n/n \to \infty$ as $n \to \infty$, the tail of the unit root process again wags the trajectory of the process. The decomposition (5) and the central limit theorem then imply that $x_t$ is dominated by $x_0(n)$ in such a way that the signal is non random in the limit and
$$\hat{\rho}_n - 1 = \frac{\sum_{t=1}^{n} x_{t-1} u_t}{\sum_{t=1}^{n} x_{t-1}^2} \sim \frac{x_0(n) \sum_{t=1}^{n} u_t}{n\, x_0(n)^2} = \frac{1}{\sqrt{n \kappa_n}} \frac{n^{-1/2} \sum_{t=1}^{n} u_t}{x_0(n)/\sqrt{\kappa_n}},$$
implying that $\sqrt{n \kappa_n}\left(\hat{\rho}_n - 1\right) \Rightarrow N\left(0, \omega/b_0^2\right)$. This Gaussian limit theory corresponds to results originally obtained for large initializations in Phillips (1987) and later in Perron (1991). Of course, this specification for $x_0(n)$ is rather unrealistic because the initialization does not carry any information about past innovations, unlike initializations such as those given in (2) which carry long range memory effects of the past innovation sequence.

(vii) Since the rate of convergence of $\hat{\rho}_n$ in Theorem 2(i) is of order $\sqrt{n \kappa_n}$, which exceeds the order $n$ rate of conventional unit root theory (Phillips, 1987), it is apparent that conventional coefficient based unit root tests will be conservative, thereby under-rejecting the null hypothesis of a unit root asymptotically. Hence, as indicated in Andrews and Guggenberger (2007), the usual unit root tests are robust to infinite past initializations with $\tau = \infty$.
2.4 Infinite past initializations: vector autoregression

When the initialization is in the infinite past ($\tau = \infty$), the sample moment matrix is shown in (43) to behave asymptotically as
$$\frac{1}{\kappa_n n} \sum_{t=1}^{n} x_{t-1} x_{t-1}' \Rightarrow B_0(1) B_0(1)' \quad \text{as } n \to \infty,$$
where $B_0 = BM(\Omega)$ is obtained from the functional law (27), so the limit is singular if $K \geq 2$. A similar situation occurs in explosively cointegrated systems with repeated roots, i.e. systems with a (possibly mildly) explosive coefficient matrix that does not have distinct latent roots; see Phillips and Magdalinos (2007b) and Magdalinos and Phillips (2007) for details. The asymptotic singularity of the sample moment matrix may be treated by rotating the regression coordinate system to isolate the effects of the dominant component (here the initialization $x_0(n)$). This coordinate rotation is analogous to that used in Park and Phillips (1988) and Phillips (1989) for systems with cointegrated regressors, but in the present case the rotation matrix is a random matrix in the limit, corresponding to the random limit of $x_0(n)$, a feature that causes some technical complications.

To fix ideas, define
$$H(n) = x_0(n)\left(x_0(n)' x_0(n)\right)^{-1/2}, \quad (10)$$
and consider a $K \times (K-1)$ random orthogonal complement $H_\perp(n)$ to $H(n)$ satisfying $H_\perp(n)' H(n) = 0$ and $H_\perp(n)' H_\perp(n) = I_{K-1}$ almost surely. Although $H_\perp(n)$ is not unique, its outer product is uniquely defined by the well known identity (e.g., 8.67 in Abadir and Magnus, 2005)
$$H_\perp(n) H_\perp(n)' + H(n) H(n)' = I_K \quad \text{a.s.} \quad (11)$$
Then $M(n) = [H(n), H_\perp(n)]$ is a $K \times K$ orthogonal matrix which may be used to transform $x_t$ into a vector with the property that all but one of the regressors has a zero initialization. Specifically, define
$$z_t := M(n)' x_t = \begin{bmatrix} H(n)' x_t \\ H_\perp(n)' x_t \end{bmatrix} =: \begin{bmatrix} z_{1t} \\ z_{2t} \end{bmatrix}. \quad (12)$$
Then, using (5), we can write $z_{2t} = H_\perp(n)' x_0(n) + H_\perp(n)' S_t = H_\perp(n)' S_t$, which implies that $z_{2t}$ has initial condition zero, and
$$z_{1t} = H(n)' x_t = H(n)'\left(x_0(n) + S_t\right) = \left(x_0(n)' x_0(n)\right)^{1/2} + H(n)' S_t = \left(x_0(n)' x_0(n)\right)^{1/2}\left(1 + O_p\left(\sqrt{n/\kappa_n}\right)\right), \quad (13)$$
under infinite past initialization ($\tau = \infty$). Thus, for large $n$, $z_{1t}$ behaves like the quantity $\left(x_0(n)' x_0(n)\right)^{1/2}$ and is independent of $t$ as $n/\kappa_n \to 0$. Thus, the new coordinate system reveals that in one direction the time series behaves like an integrated process originating at the origin (i.e., $H_\perp(n)' S_t$), whereas in the other direction the time series behaves like a constant (over $t$) intercept but one that has a random diverging value as $n \to \infty$, viz.
$$\left(x_0(n)' x_0(n)\right)^{1/2} \sim \kappa_n^{1/2}\left(B_0(1)' B_0(1)\right)^{1/2}.$$
The differing behavior of these components leads to a singular regression limit theory that corresponds to a unit root limit theory of reduced dimension ($K - 1$) in one direction and an explosive limit theory in the other. The outcome is presented in the following result.
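As a numerical aside, $H(n)$ and a valid $H_\perp(n)$ can be obtained from the complete QR decomposition of $x_0(n)$. The helper below is our own sketch, not the paper's construction, and it verifies the identity (11) directly.

```python
import numpy as np

def rotation(x0):
    """Return H = x0 (x0' x0)^{-1/2} and an orthogonal complement H_perp with
    H_perp' H = 0, H_perp' H_perp = I_{K-1}, and H_perp H_perp' + H H' = I_K."""
    x0 = np.asarray(x0, dtype=float).reshape(-1, 1)
    # Complete QR: the first column of Q spans x0, the rest span its orthocomplement.
    Q, _ = np.linalg.qr(x0, mode="complete")
    H = x0 / np.linalg.norm(x0)
    H_perp = Q[:, 1:]
    return H, H_perp

x0 = np.array([3.0, -1.0, 2.0])     # stand-in for a realized x_0(n), K = 3
H, H_perp = rotation(x0)
K = len(x0)
# Identity (11): the two outer products sum to the identity matrix.
print(np.allclose(H_perp @ H_perp.T + H @ H.T, np.eye(K)))
```

Any other complement differs from this one only by an orthogonal rotation of its columns, which leaves the outer product $H_\perp(n) H_\perp(n)'$ unchanged, as the identity requires.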
Theorem 3. For the multivariate integrated process generated by (3) with $K \geq 2$ under Assumptions LP and IC with $\tau = \infty$, the following limit theory applies as $n \to \infty$:
$$n\left(\hat{R}_n - I_K\right) \Rightarrow \Psi(B, B_0) := \left(\int_0^1 dB\, \underline{B}' + \Lambda\right) H_\perp \left(H_\perp'\left(\int_0^1 \underline{B}\,\underline{B}'\right) H_\perp\right)^{-1} H_\perp', \quad (14)$$
$$\sqrt{n \kappa_n}\left(\hat{R}_n - I_K\right) H(n) \Rightarrow \left(B_0(1)' B_0(1)\right)^{-1/2}\left[B(1) - \Psi(B, B_0)\int_0^1 B\right], \quad (15)$$
where $H_\perp$ is a $K \times (K-1)$ random orthogonal complement to $B_0(1)$ satisfying (26), $B$ and $B_0$ are independent $K$-vector Brownian motions with variance matrix $\Omega$, and $\underline{B}(r) = B(r) - \int_0^1 B(s)\, ds$.
Remark C

(i) Theorem 3 reveals that the least squares estimator has the usual $n$-rate of convergence and that the initialization contributes to the asymptotic distribution (through $H_\perp H_\perp'$) but does not dominate the limit theory. Thus, the effect of an infinite past initial condition on multivariate unit root regression theory is moderated by higher dimensional effects in comparison with the univariate case. The result of Theorem 3 bears some similarity to regression theory under distant past initializations, where both the initial condition and the sample moments of the integrated process contribute to the limiting distribution of the least squares estimator without one dominating the other. Of course, in the direction $H(n)$ where the initialization dominates, the limit theory is accelerated to the rate $\sqrt{n \kappa_n}$. When $K = 1$, (15) reduces to the result for the scalar case given in Theorem 2(i) because in this case $H_\perp(n) = H_\perp = 0$, $H(n) = \mathrm{sign}\left(x_0(n)\right) \Rightarrow \mathrm{sign}\left(B_0(1)\right)$, $\left(B_0(1)' B_0(1)\right)^{1/2} = |B_0(1)|$, and then (15) is simply $\sqrt{n \kappa_n}\left(\hat{\rho}_n - 1\right) \Rightarrow \mathcal{C}$.

(ii) Interestingly, the unit root limit theory given in (14) and (15) involves the demeaned process $\underline{B}(r)$ even though there is no intercept in the regression. The demeaning effect arises because, as shown in (13), in the direction of the initial condition the time series is dominated by a component that behaves like a constant, i.e., $z_{1t} \sim \left(x_0(n)' x_0(n)\right)^{1/2}$. Thus, using the identity (11) and the definition of $H(n)$ in (10), we can write the fitted regression as
$$x_t = \hat{R}_n x_{t-1} + \hat{u}_t = \hat{R}_n H(n) z_{1t-1} + \hat{R}_n H_\perp(n) z_{2t-1} + \hat{u}_t \sim \hat{R}_n x_0(n) + \hat{R}_n H_\perp(n) z_{2t-1} + \hat{u}_t$$
as $n \to \infty$. Thus, the fitted regression in the direction of $H_\perp(n)$ is given by
$$z_{2t} \sim H_\perp(n)' \hat{R}_n x_0(n) + H_\perp(n)' \hat{R}_n H_\perp(n) z_{2t-1} + H_\perp(n)' \hat{u}_t. \quad (16)$$
It is the regression in (16) which gives rise to the limit theory in (14), the term $H_\perp(n)' \hat{R}_n x_0(n)$ producing the demeaning effect of an intercept. Of course, this random intercept does not appear in the data generating process, since $H_\perp(n)' R\, x_0(n) = H_\perp(n)' x_0(n) = 0$.

(iii) The limiting distribution in Theorem 3 is singular, since the matrix
$$H_\perp \left(H_\perp'\left(\int_0^1 \underline{B}\,\underline{B}'\right) H_\perp\right)^{-1} H_\perp'$$
has rank equal to $K - 1$. This is a manifestation in the limit theory of the asymptotic singularity of the sample moment matrix in the original regression coordinates.

(iv) The matrix $H_\perp \left(H_\perp'\left(\int_0^1 \underline{B}\,\underline{B}'\right) H_\perp\right)^{-1} H_\perp'$ is invariant to the coordinate system defining $H_\perp$. Thus, the limit theory of Theorem 3 is also invariant to the choice of coordinates.

(v) When $\Lambda$ is estimated nonparametrically and a corresponding bias corrected estimate $\hat{R}_n^*$ is constructed, the limit theory for this estimate is given by expression (14) with $\Lambda = 0$. This limit theory is analogous to that of a first order vector autoregression with a fitted intercept and $K - 1$ unit roots. The reason for the fitted intercept in this correspondence is that the implicit regression on $z_{1t}$ in the new coordinate system is equivalent to regression on a constant because $z_{1t} = H(n)' x_t$ behaves like $\left(x_0(n)' x_0(n)\right)^{1/2}$ asymptotically.
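The invariance in (iv) can be checked numerically: any two orthogonal complements of the same vector are related by $\tilde{H}_\perp = H_\perp O$ with $O$ orthogonal, and $O$ cancels inside the sandwich. The sketch below is our own construction, with a generic positive definite matrix standing in for $\int_0^1 \underline{B}\,\underline{B}'$.

```python
import numpy as np

def projection(H_perp, M):
    """Compute H_perp (H_perp' M H_perp)^{-1} H_perp' for positive definite M."""
    inner = H_perp.T @ M @ H_perp
    return H_perp @ np.linalg.inv(inner) @ H_perp.T

rng = np.random.default_rng(4)
K = 3
# Generic positive definite matrix standing in for the random limit matrix (an assumption).
A = rng.standard_normal((K, K))
M = A @ A.T + K * np.eye(K)
# One orthogonal complement of a random vector, via complete QR.
Q, _ = np.linalg.qr(rng.standard_normal((K, 1)), mode="complete")
H_perp = Q[:, 1:]
# A second complement, obtained by rotating the first with an orthogonal O.
theta = 0.7
O = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
H_tilde = H_perp @ O
print(np.allclose(projection(H_perp, M), projection(H_tilde, M)))
```

The resulting matrix also has rank $K - 1$, matching the singularity noted in (iii).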
3. Cointegration under Extended Initialization

This section considers the cointegrated system
$$y_t = A x_t + u_{yt}, \quad x_t = x_{t-1} + u_{xt}, \quad t = 1, \ldots, n, \quad (17)$$
where $u_t = \left(u_{yt}', u_{xt}'\right)'$ is an $(m + K)$-vector of innovations satisfying Assumption LP, $A$ is an $m \times K$ matrix of cointegrating coefficients, $x_t$ is a $K$-vector of integrated time series, and the system is initialized at some $x_0(n) = \sum_{j=0}^{\kappa_n} u_{x,-j}$ that satisfies Assumption IC. Under LP, the functional law $n^{-1/2} \sum_{j=1}^{\lfloor n \cdot \rfloor} u_j \Rightarrow B(\cdot)$ applies with $B$ an $(m + K)$-vector Brownian motion with variance matrix $\Omega$. We partition the limiting Brownian motion and the various matrices associated with its variance conformably with $u_t$ as follows: $B = \left(B_y', B_x'\right)'$, $F(1) = \left[F_y(1)', F_x(1)'\right]'$,
$$\Omega = \begin{bmatrix} \Omega_{yy} & \Omega_{yx} \\ \Omega_{xy} & \Omega_{xx} \end{bmatrix} \quad \text{and} \quad \Delta = \begin{bmatrix} \Delta_{yy} & \Delta_{yx} \\ \Delta_{xy} & \Delta_{xx} \end{bmatrix}.$$
Finally, we let $B_0$ denote a $K$-vector Brownian motion with variance matrix $\Omega_{xx}$ defined by the functional law $\kappa_n^{-1/2} \sum_{j=0}^{\lfloor \kappa_n \cdot \rfloor} u_{x,-j} \Rightarrow B_0(\cdot)$.

We will be concerned with the effect of the initialization on the limit theory of cointegration estimators and tests. These effects are demonstrated in terms of the FM regression procedure (Phillips and Hansen, 1990) and the same results apply for other commonly used cointegration procedures. Of course, under IC(i), or recent past initializations, the limit theory is well known to be invariant to the effects of $x_0(n)$. Under IC(ii), the effects are manifest in the mixture process in the limit theory, so that
$$n\left(\hat{A}^* - A\right) \Rightarrow \mathcal{MN}\left(0,\; \Omega_{yy.x} \otimes \left(\int_0^1 B_\tau^* B_\tau^{*\prime}\right)^{-1}\right), \quad (18)$$
where $\hat{A}^*$ is the FM regression estimator, $\Omega_{yy.x} = \Omega_{yy} - \Omega_{yx} \Omega_{xx}^{-1} \Omega_{xy}$ is the conditional long-run covariance matrix of $u_{yt}$ given $u_{xt}$, and $B_\tau^*(r) = B_x(r) + \sqrt{\tau}\, B_0(1)$ as in Theorem 1, so that $B_x$ and $B_0$ are independent $K$-vector Brownian motions with variance matrix $\Omega_{xx}$. Result (18) follows in a straightforward way using results obtained in the proof of Theorem 2. Since $\int_0^1 B_\tau^* B_\tau^{*\prime}$ is the weak limit of the standardized sample moment matrix $n^{-2} \sum_{t=1}^{n} x_t x_t'$, as shown earlier, the limit theory (18) leads to the usual inferential theory based on the estimate $\hat{A}^*$. Thus, the conventional approach to inference in cointegrated systems is robust to both recent and distant initializations. We therefore focus our attention in this section on infinitely distant initial conditions.
The FM regression estimator has the explicit form
$$\hat{A}^* = \left(\hat{Y}^{*\prime} X - n \hat{\Delta}^*_{yx}\right)\left(X' X\right)^{-1},$$
where $X = [x_1, \ldots, x_n]'$, $\hat{Y}^* = \left[y_1^*, \ldots, y_n^*\right]'$ is an $n \times m$ matrix of observations of the corrected variates $y_t^* = y_t - \hat{\Omega}_{yx} \hat{\Omega}_{xx}^{-1} \Delta x_t$, and $\hat{\Omega}_{yx} \hat{\Omega}_{xx}^{-1}$ and $\hat{\Delta}^*_{yx}$ are consistent estimates of $\Omega_{yx} \Omega_{xx}^{-1}$ and $\Delta^*_{yx} = \Delta_{yx} - \Omega_{yx} \Omega_{xx}^{-1} \Delta_{xx}$, all of which may be constructed in the familiar fashion using semiparametric lag kernel methods with residuals from a preliminary cointegrating least squares regression on (17). The limit theory for $\hat{A}^*$ under infinitely distant initial conditions as given in IC(iii) is as follows.
Theorem 4. Under model (17) and Assumptions LP and IC(iii) with $\tau = \infty$, we have, as $n \to \infty$,
$$n\left(\hat{A}^* - A\right) \Rightarrow \mathcal{MN}\left(0,\; \Omega_{yy.x} \otimes H_\perp \left(H_\perp'\left(\int_0^1 \underline{B}_x \underline{B}_x'\right) H_\perp\right)^{-1} H_\perp'\right), \quad (19)$$
$$\sqrt{n \kappa_n}\left(\hat{A}^* - A\right) H(n) \Rightarrow \mathcal{MN}\left(0,\; \Omega_{yy.x}\, \frac{\int_0^1 V(s)^2\, ds}{B_0(1)' B_0(1)}\right), \quad (20)$$
where $B_x$ and $B_0$ are independent $K$-vector Brownian motions with variance matrix $\Omega_{xx}$, $H_\perp$ is a $K \times (K-1)$ random orthogonal complement to $B_0(1)$ satisfying (26), $\underline{B}_x(s) = B_x(s) - \int_0^1 B_x(r)\, dr$ is demeaned $B_x$, $B_{y.x} = B_y - \Omega_{yx} \Omega_{xx}^{-1} B_x$ is Brownian motion with covariance matrix $\Omega_{yy.x}$ independent of $B_x$ and $B_0$, and
$$V(s) = 1 - \underline{B}_x(s)' H_\perp \left(H_\perp'\left(\int_0^1 \underline{B}_x \underline{B}_x'\right) H_\perp\right)^{-1} H_\perp'\left(\int_0^1 B_x\right).$$
Remark D

(i) The limit distribution of $\hat{A}^*$ is mixed Gaussian, just as in the case of recent and distant initial conditions, and the dominating rate of convergence is of order $n$ as usual. The dominating limit theory (19) is invariant to the infinitely distant initialization. Nonetheless, the initialization does affect the limit theory because the limit distribution (19) is singular and a faster convergence rate $\sqrt{n \kappa_n}$ applies in the direction of the infinitely distant initial condition. In that direction the limit theory is also mixed Gaussian and the mixing variate depends on the squared norm $\|B_0(1)\|^2 = B_0(1)' B_0(1)$ of the standardized limiting initialization. Thus, while the initialization does have an effect on the limit theory, it is of secondary importance.

(ii) As in Theorem 3, the limit theory (19) involves the demeaned process $\underline{B}_x(s)$ corresponding to the regressor $x_t$. Again, the demeaning is caused by the fact that in the direction of the initial condition the time series $x_t$ is dominated by a component that behaves like a constant, in this case $H(n)' x_t \sim \left(x_0(n)' x_0(n)\right)^{1/2}$, which acts like an intercept in the limit theory; see Remark C(ii). Therefore, one material impact of the infinitely distant initialization is that the regression equation behaves as if there is a fitted intercept.
4. Extensions to Models with Drift
The above discussion has considered unit root and cointegrating regression models without intercept and trend. Introducing drift to these models provides a practical extension that produces some further new results. It will be sufficient to use the cointegrating regression model to illustrate the effects of drift in both the sample observations and the initial conditions. One aspect of the results, an increase in the degeneracy of the limit theory stemming from a drifted initialization, is not immediate.
We take model (17), assume $K\geq3$, and replace the generating mechanism of the regressors by
$$x_{t}=\mu t+x_{t}^{0},\qquad t=1,\ldots,n,\qquad(21)$$
$$x_{t}^{0}=\sum_{j=1}^{t}u_{xj}+x_{0}^{0},\qquad x_{0}^{0}\left(n\right)=\sum_{j=0}^{\left[\kappa_{n}\right]}u_{x,-j}+\mu\kappa_{n},\qquad(22)$$
in which case $x_{0}=x_{0}^{0}\left(n\right)$ is the outcome of a random wandering process with drift, so that its stochastic order is $O_{p}\left(\kappa_{n}\right)$, which is analogous to that of $x_{t}$. In this event, the sample data $X'=\left[x_{1},\ldots,x_{n}\right]$ satisfy
$$X=\tau_{n}\mu'+\iota_{n}x_{0}^{0\prime}+S,$$
where $S'=\left[S_{1},\ldots,S_{n}\right]$ with $S_{t}=\sum_{j=1}^{t}u_{xj}$, $\tau_{n}=\left(1,\ldots,n\right)'$, and $\iota_{n}=\left(1,\ldots,1\right)'$. As usual, unit root regression with a fitted trend and intercept removes the effects of the initialization $x_{0}^{0}$ and the trend coefficient $\mu$, and conventional theory applies with appropriate effects of the detrending being manifest in the limit theory, as shown in Park and Phillips (1988) across a variety of models. Similar considerations apply in the present case but with an additional complication arising from the form of the initialization (22).
We illustrate by taking the case of FM regression applied to (17) with $x_{t}$ generated as in (21). Here the limit theory is given by
$$n\left(\hat{A}^{+}-A\right)\Rightarrow N\left(0,\ \Omega_{yy.x}\otimes H_{\perp}\left(H_{\perp}'\int_{0}^{1}\tilde{B}_{x}\tilde{B}_{x}'\,H_{\perp}\right)^{-1}H_{\perp}'\right),\qquad(23)$$
where $\tilde{B}_{x}$ is the detrended process
$$\tilde{B}_{x}\left(r\right)=B_{x}\left(r\right)-\left(\int_{0}^{1}B_{x}Z'\right)\left(\int_{0}^{1}ZZ'\right)^{-1}Z\left(r\right),\qquad Z\left(r\right)=\left(1,r\right)',\qquad(24)$$
so that the limit theory is entirely analogous in form to that given in (19). However, in the present case the additional complication stems from the fact that the directional matrix $H_{\perp}$ has structure and rank that reflect the presence of the time trend and the space spanning the infinitely distant initialization. The latter is affected by the rate at which $\kappa_{n}\to\infty$ in relation to $n$ and the various components of the initialization, which we now briefly discuss.
Observe that under (22) the drift in the initialization determines the primary limit, so that $\kappa_{n}^{-1}x_{0}^{0}\left(n\right)\to_{p}\mu$. Expanding the probability space as needed for the strong invariance principle $B_{0n}:=\kappa_{n}^{-1/2}\sum_{j=0}^{\left[\kappa_{n}\right]}u_{x,-j}\to_{a.s.}B_{0}\left(1\right)$ to hold, with $B_{0}=BM\left(\Omega_{xx}\right)$, the large sample behavior of the initial condition has the form
$$x_{0}^{0}\left(n\right)=\sum_{j=0}^{\left[\kappa_{n}\right]}u_{x,-j}+\mu\kappa_{n}=\mu\kappa_{n}+B_{0n}\sqrt{\kappa_{n}}=\left[\mu,B_{0}\left(1\right)\right]\left(\kappa_{n},\sqrt{\kappa_{n}}\right)'\left(1+o_{a.s.}\left(1\right)\right),$$
so that $x_{0}^{0}\left(n\right)$ is spanned by the two columns of the matrix $C_{n}=\left[\mu,B_{0n}\right]$ and in the limit by the matrix $C=\left[\mu,B_{0}\left(1\right)\right]$, where $\mu$ and $B_{0}\left(1\right)$ are $a.s.$ linearly independent vectors.

The components $\mu$ and $B_{0n}$ of the initialization vector $x_{0}^{0}\left(n\right)$ have divergence rates $\kappa_{n}$ and $\kappa_{n}^{1/2}$ corresponding to the two components in (22). So, because of (21) there will be a time trend in the regression and, because of the effect of the initial condition, there is effectively a (random) intercept in the regression since $\kappa_{n}$ is large. When $\kappa_{n}$ is very large relative to the trend, in particular if $\kappa_{n}^{1/2}/n\to\infty$, then $x_{0}^{0}\left(n\right)$ is the dominating force in the asymptotics and both components of $x_{0}^{0}\left(n\right)$ figure in the limit theory. To resolve the limit, we transform coordinates using the matrix $\left[C_{n},C_{n\perp}\right]$, where $C_{n\perp}$ is a complementary matrix of vectors orthogonal to $C$, giving
$$\begin{pmatrix}C_{n}'\\ C_{n\perp}'\end{pmatrix}x_{\left[ns\right]}=\begin{pmatrix}C_{n}'\mu\left[ns\right]+C_{n}'C_{n}\left(\kappa_{n},\sqrt{\kappa_{n}}\right)'+C_{n}'S_{\left[ns\right]}\\ C_{n\perp}'\mu\left[ns\right]+C_{n\perp}'S_{\left[ns\right]}\end{pmatrix}.$$
If $\kappa_{n}$ is such that $\sqrt{\kappa_{n}}/n\to\infty$, the largest effect is in the direction $C_{n}$, so that both components $\mu$ and $B_{0n}$ are relevant. The next largest effect comes in the direction $C_{n\perp}C_{n\perp}'\mu$, and then finally the dominating effect on the limit theory for $\hat{A}^{+}$, with slowest asymptotics, comes in the direction orthogonal to $\left[C_{n},C_{n\perp}C_{n\perp}'\mu\right]$. That rate is $O_{p}\left(n\right)$ and the limit theory for $\hat{A}^{+}$ is just as given in Theorem 4 by (19) or (23) above. However, in this case $H_{\perp}$ is of reduced dimension $K\times\left(K-3\right)$ and is a random orthogonal matrix spanning the orthogonal complement of the limit matrix $\left[C,C_{\perp}C_{\perp}'\mu\right]$. The dimension reduction to $K-3$ in the columns of $H_{\perp}$ comes about because of the effect of the linear trend in $x_{t}$ and the initialization $x_{0}^{0}\left(n\right)$, which lies in the two dimensional space spanned by $C$ in the limit. The process $\tilde{B}_{x}$ in (23) is the detrended process (24). Again, inference proceeds as usual in the presence of initializations such as (22).

Thus, initialization with drift in a cointegrated system does not affect the practicalities of inference even when the initialization is in the infinite past. But initialization does influence the form of the asymptotic theory in a subtle manner in terms of its dimensionality and its support, whose orientation involves a random component that is determined by infinitely distant initialization effects.
5. Appendix
This section provides proofs of theorems in the text together with some auxiliary results. We
start with the following preliminary results. The notation is the same as that used in the text.
Lemma A1. Joint convergence in distribution of
$$\left(X_{n}^{0}\left(1\right),\ X_{n}\left(1\right),\ \frac{1}{n^{3/2}}\sum_{t=1}^{n}Y_{t-1},\ \frac{1}{n^{2}}\sum_{t=1}^{n}Y_{t-1}Y_{t-1}'\right)$$
as $n\to\infty$ is equivalent to convergence in distribution of each component, where $X_{n}^{0}\left(s\right):=D\left(1\right)\kappa_{n}^{-1/2}\sum_{j=0}^{\left[\kappa_{n}s\right]}\varepsilon_{-j}$, $X_{n}\left(s\right):=D\left(1\right)n^{-1/2}\sum_{t=1}^{\left[ns\right]}\varepsilon_{t}$, and $Y_{t}:=\sum_{j=1}^{t}u_{j}$.
Proof. Joint convergence of $X_{n}^{0}\left(1\right)$ and $X_{n}\left(1\right)$ holds trivially by independence. We will show that the last two components are asymptotically equivalent to continuous functionals of the partial sum process $X_{n}\left(\cdot\right)$ on the Skorohod space $D\left[0,1\right]^{K}$. The lemma will then follow by the continuous mapping theorem and independence of $X_{n}\left(\cdot\right)$ and $X_{n}^{0}\left(\cdot\right)$.

The BN decomposition yields, for each $s\in\left[0,1\right]$,
$$U_{n}\left(s\right):=\frac{1}{n^{1/2}}\sum_{t=1}^{\left[ns\right]}u_{t}=X_{n}\left(s\right)-\frac{1}{n^{1/2}}\left(\tilde{\varepsilon}_{\left[ns\right]}-\tilde{\varepsilon}_{0}\right),\qquad(25)$$
where $\tilde{\varepsilon}_{t}=\sum_{j=0}^{\infty}\tilde{D}_{j}\varepsilon_{t-j}$ with $\tilde{D}_{j}=\sum_{k=j+1}^{\infty}D_{k}$. Using (25) and the fact that $Y_{0}=0$, $n^{-3/2}\sum_{t=1}^{n}Y_{t-1}$ can be written as
$$\frac{1}{n^{3/2}}\sum_{t=1}^{n}Y_{t-1}=\int_{0}^{1}U_{n}\left(s\right)ds=\int_{0}^{1}X_{n}\left(s\right)ds-\frac{1}{n^{1/2}}\int_{0}^{1}\tilde{\varepsilon}_{\left[ns\right]}ds+o_{p}\left(1\right)=\int_{0}^{1}X_{n}\left(s\right)ds+o_{p}\left(1\right),$$
since $n^{-1/2}\int_{0}^{1}\tilde{\varepsilon}_{\left[ns\right]}ds\to_{L_{1}}0$. Similarly, by (25),
$$\frac{1}{n^{2}}\sum_{t=1}^{n}Y_{t-1}Y_{t-1}'=\int_{0}^{1}U_{n}\left(s\right)U_{n}\left(s\right)'ds=\int_{0}^{1}X_{n}\left(s\right)X_{n}\left(s\right)'ds+o_{p}\left(1\right),$$
since $n^{-1}\int_{0}^{1}\tilde{\varepsilon}_{\left[ns\right]}\tilde{\varepsilon}_{\left[ns\right]}'ds\to_{L_{1}}0$ and, by the Cauchy-Schwarz inequality,
$$E\left\Vert \frac{1}{n^{1/2}}\int_{0}^{1}X_{n}\left(s\right)\tilde{\varepsilon}_{\left[ns\right]}'ds\right\Vert \leq\frac{1}{n^{1/2}}\int_{0}^{1}E\left(\left\Vert X_{n}\left(s\right)\right\Vert \left\Vert \tilde{\varepsilon}_{\left[ns\right]}\right\Vert \right)ds\leq\frac{1}{n^{1/2}}\int_{0}^{1}\left(E\left\Vert X_{n}\left(s\right)\right\Vert ^{2}\right)^{1/2}\left(E\left\Vert \tilde{\varepsilon}_{\left[ns\right]}\right\Vert ^{2}\right)^{1/2}ds$$
$$\leq\left(E\left\Vert \tilde{\varepsilon}_{1}\right\Vert ^{2}\right)^{1/2}\sup_{s\in\left[0,1\right]}\left(E\left\Vert X_{n}\left(s\right)\right\Vert ^{2}\right)^{1/2}\frac{1}{n^{1/2}}=O\left(n^{-1/2}\right).$$
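The BN decomposition identity underlying (25) can be checked numerically. The sketch below is a hypothetical finite-order illustration: for the MA(2) filter $u_t = \varepsilon_t + d_1\varepsilon_{t-1} + d_2\varepsilon_{t-2}$ we have $D(1) = 1 + d_1 + d_2$ and $\tilde\varepsilon_t = (d_1 + d_2)\varepsilon_t + d_2\varepsilon_{t-1}$, and the identity $\sum_{t=1}^m u_t = D(1)\sum_{t=1}^m \varepsilon_t + \tilde\varepsilon_0 - \tilde\varepsilon_m$ holds exactly, path by path.

```python
import random

def bn_check(d1=0.6, d2=-0.3, m=500, seed=0):
    # Beveridge-Nelson check for the MA(2) filter u_t = e_t + d1*e_{t-1} + d2*e_{t-2}:
    # partial sums of u_t equal D(1) times partial sums of e_t plus the stationary
    # correction e~_0 - e~_m, where e~_t = (d1 + d2)*e_t + d2*e_{t-1}.
    rng = random.Random(seed)
    e = [rng.gauss(0.0, 1.0) for _ in range(m + 3)]   # e[t + 2] stores eps_t, t = -2,...,m
    eps = lambda t: e[t + 2]
    u = [eps(t) + d1 * eps(t - 1) + d2 * eps(t - 2) for t in range(1, m + 1)]
    D1 = 1.0 + d1 + d2
    etil = lambda t: (d1 + d2) * eps(t) + d2 * eps(t - 1)
    lhs = sum(u)
    rhs = D1 * sum(eps(t) for t in range(1, m + 1)) + etil(0) - etil(m)
    return lhs, rhs
```

The two sides agree up to floating-point rounding, which is the algebraic content of (25) before any limit is taken.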
Lemma A2. In the setup of Sections 2.4 and 3, there exists a $K\times\left(K-1\right)$ random orthogonal complement, $H_{\perp}$, to $B_{0}\left(1\right)$ satisfying
$$H_{\perp}'B_{0}\left(1\right)=0\quad\text{and}\quad H_{\perp}H_{\perp}'=I_{K}-\left(B_{0}\left(1\right)'B_{0}\left(1\right)\right)^{-1}B_{0}\left(1\right)B_{0}\left(1\right)'\quad a.s.\qquad(26)$$
Define $\bar{B}\left(s\right)=B\left(s\right)-\int_{0}^{1}B\left(s\right)ds$, $Z_{1}=\left[z_{10},z_{11},\ldots,z_{1,n-1}\right]'$, the $n\times\left(K-1\right)$ matrix $Z_{2}=\left[z_{20}',z_{21}',\ldots,z_{2,n-1}'\right]'$, and
$$H_{1n}=\left(Z_{1}'Z_{1}\right)^{-1}Z_{1}'Z_{2},\qquad Q_{1}=I_{n}-Z_{1}\left(Z_{1}'Z_{1}\right)^{-1}Z_{1}'.$$
The following hold as $n\to\infty$ and $n/\kappa_{n}\to0$:

(i) $H_{1n}=O_{p}\left(\sqrt{n/\kappa_{n}}\right)=o_{p}\left(1\right)$,

(ii) $\frac{1}{n}\sum_{t=1}^{n}u_{t}z_{1,t-1}'H_{1n}=X_{n}\left(1\right)\left(\frac{1}{n^{3/2}}\sum_{t=1}^{n}Y_{t-1}\right)'H_{\perp}\left(n\right)+o_{p}\left(1\right)$,

(iii) $H_{\perp}\left(n\right)\left(n^{-2}Z_{2}'Q_{1}Z_{2}\right)^{-1}H_{\perp}\left(n\right)'\Rightarrow H_{\perp}\left(H_{\perp}'\int_{0}^{1}\bar{B}\bar{B}'\,H_{\perp}\right)^{-1}H_{\perp}'$.
Proof of (26). We begin by establishing the existence of an orthogonal complement satisfying (26) in the setup of Section 2.4. In view of Assumption LP the asymptotic behavior of the initial condition $x_{0}\left(n\right)$ follows by standard methods (Phillips and Solo, 1992). In particular, letting $B_{0}=BM\left(\Omega\right)$, we have the functional law
$$X_{n}^{0}\left(s\right)=D\left(1\right)\frac{1}{\kappa_{n}^{1/2}}\sum_{j=0}^{\left[\kappa_{n}s\right]}\varepsilon_{-j}\Rightarrow B_{0}\left(s\right),\quad\text{as }n\to\infty,\qquad(27)$$
which, together with the BN decomposition, yields
$$\kappa_{n}^{-1/2}x_{0}\left(n\right)=X_{n}^{0}\left(1\right)+o_{p}\left(1\right)\Rightarrow B_{0}\left(1\right).\qquad(28)$$
By (10) and (28) we obtain
$$H\left(n\right)=X_{n}^{0}\left(1\right)\left(X_{n}^{0}\left(1\right)'X_{n}^{0}\left(1\right)\right)^{-1/2}+o_{p}\left(1\right)\Rightarrow H:=B_{0}\left(1\right)\left(B_{0}\left(1\right)'B_{0}\left(1\right)\right)^{-1/2}.\qquad(29)$$
Since $\left\Vert H\right\Vert =1$, the random matrix $I_{K}-HH'$ is positive semidefinite with rank $K-1$. Therefore, by a standard decomposition result for positive semidefinite matrices (cf. 8.21 in Abadir and Magnus, 2005) there exists a $K\times\left(K-1\right)$ random matrix $H_{\perp}$ such that, $a.s.$,
$$H_{\perp}H_{\perp}'=I_{K}-HH'=I_{K}-\left(B_{0}\left(1\right)'B_{0}\left(1\right)\right)^{-1}B_{0}\left(1\right)B_{0}\left(1\right)'$$
and $H_{\perp}'H_{\perp}$ is a diagonal matrix of rank $K-1$ containing the positive eigenvalues of $I_{K}-HH'$. Since $I_{K}-HH'$ is idempotent, all its positive eigenvalues are equal to 1, implying that $H_{\perp}'H_{\perp}=I_{K-1}$ $a.s.$ Combining the latter with $H_{\perp}H_{\perp}'=I_{K}-HH'$ implies that $H_{\perp}'H=0$, so the matrix $H_{\perp}$ is an orthogonal complement to $H$ (and hence to $B_{0}\left(1\right)$).

Having established the existence of an orthogonal complement $H_{\perp}$ satisfying (26), we can use (11) to write the limiting distribution of the outer product $H_{\perp}\left(n\right)H_{\perp}\left(n\right)'$ as
$$H_{\perp}\left(n\right)H_{\perp}\left(n\right)'=I_{K}-\frac{X_{n}^{0}\left(1\right)X_{n}^{0}\left(1\right)'}{X_{n}^{0}\left(1\right)'X_{n}^{0}\left(1\right)}+o_{p}\left(1\right)\Rightarrow H_{\perp}H_{\perp}'.\qquad(30)$$
For the setup of Section 3, we can use an identical argument, replacing $X_{n}^{0}\left(s\right)$ by $X_{xn}^{0}\left(s\right)$ (defined in (53)) and $\Omega$ by $\Omega_{xx}$.
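The construction in the proof of (26) can be illustrated numerically: given any nonzero vector $b$ (standing in for a realized value of $B_0(1)$), Gram-Schmidt on the identity basis produces a $K\times(K-1)$ orthonormal complement $H_\perp$ satisfying the two conditions in (26). A minimal sketch:

```python
import math

def orth_complement(b):
    # Build K-1 orthonormal columns orthogonal to b by Gram-Schmidt,
    # starting the orthonormal set from the unit vector b/|b| and then
    # sweeping the identity basis; the surviving columns form H_perp.
    K = len(b)
    nb = math.sqrt(sum(v * v for v in b))
    cols = [[v / nb for v in b]]              # first basis vector: b/|b|
    for i in range(K):
        e = [1.0 if j == i else 0.0 for j in range(K)]
        for c in cols:                        # project out existing columns
            dot = sum(e[j] * c[j] for j in range(K))
            e = [e[j] - dot * c[j] for j in range(K)]
        norm = math.sqrt(sum(v * v for v in e))
        if norm > 1e-10:
            cols.append([v / norm for v in e])
        if len(cols) == K:
            break
    return cols[1:]                           # drop b/|b|: the K-1 complement columns
```

By construction each column is orthogonal to $b$, and the outer product of the columns reproduces the projection identity $H_\perp H_\perp' = I_K - (b'b)^{-1}bb'$ of (26).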
Proof of Lemma A2 (i). First, note that, by (5),
$$\frac{1}{\kappa_{n}^{1/2}n^{3/2}}\sum_{t=1}^{n}x_{t-1}Y_{t-1}'=\frac{x_{0}\left(n\right)}{\kappa_{n}^{1/2}}\frac{1}{n^{3/2}}\sum_{t=1}^{n}Y_{t-1}'+\frac{1}{\kappa_{n}^{1/2}n^{3/2}}\sum_{t=1}^{n}Y_{t-1}Y_{t-1}'=\frac{x_{0}\left(n\right)}{\kappa_{n}^{1/2}}\frac{1}{n^{3/2}}\sum_{t=1}^{n}Y_{t-1}'+O_{p}\left(\sqrt{\frac{n}{\kappa_{n}}}\right).\qquad(31)$$
Thus, since, by (29) and (30), $H\left(n\right)$ and $H_{\perp}\left(n\right)$ are $O_{p}\left(1\right)$, (28) yields
$$Z_{1}'Z_{2}=H\left(n\right)'\sum_{t=1}^{n}x_{t-1}Y_{t-1}'\,H_{\perp}\left(n\right)=O_{p}\left(\kappa_{n}^{1/2}n^{3/2}\right).$$
The result for $H_{1n}$ follows since $\left(Z_{1}'Z_{1}\right)^{-1}=O_{p}\left(\kappa_{n}^{-1}n^{-1}\right)$.
Proof of Lemma A2 (ii). By (43) and (29),
$$\frac{1}{n\kappa_{n}}Z_{1}'Z_{1}=H\left(n\right)'\frac{1}{n\kappa_{n}}\sum_{t=1}^{n}x_{t-1}x_{t-1}'\,H\left(n\right)=X_{n}^{0}\left(1\right)'X_{n}^{0}\left(1\right)+o_{p}\left(1\right),\qquad(32)$$
which, together with (31), (28) and (29), yields
$$H_{1n}=\left(\frac{Z_{1}'Z_{1}}{n\kappa_{n}}\right)^{-1}H\left(n\right)'\frac{1}{n\kappa_{n}}\sum_{t=1}^{n}x_{t-1}Y_{t-1}'\,H_{\perp}\left(n\right)=\left(\frac{Z_{1}'Z_{1}}{n\kappa_{n}}\right)^{-1}H\left(n\right)'\sqrt{\frac{n}{\kappa_{n}}}\left[\frac{x_{0}\left(n\right)}{\kappa_{n}^{1/2}}\frac{1}{n^{3/2}}\sum_{t=1}^{n}Y_{t-1}'+O_{p}\left(\sqrt{\frac{n}{\kappa_{n}}}\right)\right]H_{\perp}\left(n\right)$$
$$=\sqrt{\frac{n}{\kappa_{n}}}\left(X_{n}^{0}\left(1\right)'X_{n}^{0}\left(1\right)\right)^{-1/2}\left(\frac{1}{n^{3/2}}\sum_{t=1}^{n}Y_{t-1}\right)'H_{\perp}\left(n\right)+o_{p}\left(\sqrt{\frac{n}{\kappa_{n}}}\right).\qquad(33)$$
Thus, by (41) and (29) we obtain
$$\frac{1}{n}\sum_{t=1}^{n}u_{t}z_{1,t-1}'H_{1n}=\left(X_{n}^{0}\left(1\right)'X_{n}^{0}\left(1\right)\right)^{-1/2}\left[\frac{1}{\sqrt{n\kappa_{n}}}\sum_{t=1}^{n}u_{t}x_{t-1}'\right]H\left(n\right)\left(\frac{1}{n^{3/2}}\sum_{t=1}^{n}Y_{t-1}\right)'H_{\perp}\left(n\right)+o_{p}\left(\sqrt{\frac{n}{\kappa_{n}}}\,\frac{1}{n}\sum_{t=1}^{n}u_{t}x_{t-1}'\right)$$
$$=\left(X_{n}^{0}\left(1\right)'X_{n}^{0}\left(1\right)\right)^{-1/2}X_{n}\left(1\right)X_{n}^{0}\left(1\right)'H\left(n\right)\left(\frac{1}{n^{3/2}}\sum_{t=1}^{n}Y_{t-1}\right)'H_{\perp}\left(n\right)+o_{p}\left(1\right)=X_{n}\left(1\right)\left(\frac{1}{n^{3/2}}\sum_{t=1}^{n}Y_{t-1}\right)'H_{\perp}\left(n\right)+o_{p}\left(1\right).\qquad(34)$$
Proof of Lemma A2 (iii). We first show that
$$n^{-2}Z_{2}'Q_{1}Z_{2}=H_{\perp}\left(n\right)'\mathcal{Y}_{n}H_{\perp}\left(n\right)+o_{p}\left(1\right),\qquad(35)$$
where $\mathcal{Y}_{n}$ denotes the random matrix
$$\mathcal{Y}_{n}=\frac{1}{n^{2}}\sum_{t=1}^{n}Y_{t-1}Y_{t-1}'-\left(\frac{1}{n^{3/2}}\sum_{t=1}^{n}Y_{t-1}\right)\left(\frac{1}{n^{3/2}}\sum_{t=1}^{n}Y_{t-1}\right)'.$$
By an application of (11) we can write, using $H\left(n\right)H\left(n\right)'=I_{K}-H_{\perp}\left(n\right)H_{\perp}\left(n\right)'$,
$$\frac{Z_{2}'Z_{1}}{\kappa_{n}^{1/2}n^{3/2}}\,\frac{Z_{1}'Z_{2}}{\kappa_{n}^{1/2}n^{3/2}}=H_{\perp}\left(n\right)'\frac{\sum_{t=1}^{n}Y_{t-1}x_{t-1}'}{\kappa_{n}^{1/2}n^{3/2}}H\left(n\right)H\left(n\right)'\frac{\sum_{t=1}^{n}x_{t-1}Y_{t-1}'}{\kappa_{n}^{1/2}n^{3/2}}H_{\perp}\left(n\right)$$
$$=H_{\perp}\left(n\right)'\frac{\sum_{t=1}^{n}Y_{t-1}x_{t-1}'}{\kappa_{n}^{1/2}n^{3/2}}\,\frac{\sum_{t=1}^{n}x_{t-1}Y_{t-1}'}{\kappa_{n}^{1/2}n^{3/2}}H_{\perp}\left(n\right)+O_{p}\left(\frac{n}{\kappa_{n}}\right),$$
where $H_{\perp}\left(n\right)'\sum_{t=1}^{n}x_{t-1}Y_{t-1}'=H_{\perp}\left(n\right)'\sum_{t=1}^{n}Y_{t-1}Y_{t-1}'$ because of (5) and the fact that $H_{\perp}\left(n\right)'x_{0}\left(n\right)=0$. Thus, (31) and (28) yield
$$\frac{Z_{2}'Z_{1}}{\kappa_{n}^{1/2}n^{3/2}}\,\frac{Z_{1}'Z_{2}}{\kappa_{n}^{1/2}n^{3/2}}=X_{n}^{0}\left(1\right)'X_{n}^{0}\left(1\right)\,H_{\perp}\left(n\right)'\left(\frac{1}{n^{3/2}}\sum_{t=1}^{n}Y_{t-1}\right)\left(\frac{1}{n^{3/2}}\sum_{j=1}^{n}Y_{j-1}'\right)H_{\perp}\left(n\right)+o_{p}\left(1\right),$$
and (35) follows by (32) and the identity $Z_{2}'Z_{2}=H_{\perp}\left(n\right)'\sum_{t=1}^{n}Y_{t-1}Y_{t-1}'\,H_{\perp}\left(n\right)$, since
$$\frac{1}{n^{2}}Z_{2}'Q_{1}Z_{2}=\frac{1}{n^{2}}Z_{2}'Z_{2}-\left(\frac{Z_{1}'Z_{1}}{n\kappa_{n}}\right)^{-1}\frac{Z_{2}'Z_{1}}{\kappa_{n}^{1/2}n^{3/2}}\,\frac{Z_{1}'Z_{2}}{\kappa_{n}^{1/2}n^{3/2}}=H_{\perp}\left(n\right)'\mathcal{Y}_{n}H_{\perp}\left(n\right)+o_{p}\left(1\right).$$
Having established (35), the limiting distribution of
$$H_{\perp}\left(n\right)\left(n^{-2}Z_{2}'Q_{1}Z_{2}\right)^{-1}H_{\perp}\left(n\right)'=H_{\perp}\left(n\right)\left[H_{\perp}\left(n\right)'\mathcal{Y}_{n}H_{\perp}\left(n\right)\right]^{-1}H_{\perp}\left(n\right)'+o_{p}\left(1\right)$$
is derived as follows. By Lemma A1, $\left(X_{n}^{0}\left(1\right),\mathcal{Y}_{n}\right)\Rightarrow\left(B_{0}\left(1\right),\mathcal{Y}\right)$, where $\mathcal{Y}=\int_{0}^{1}\bar{B}\bar{B}'$. So (30) implies that
$$\left(H_{\perp}\left(n\right)H_{\perp}\left(n\right)',\ \mathcal{Y}_{n}\right)\Rightarrow\left(H_{\perp}H_{\perp}',\ \mathcal{Y}\right).\qquad(36)$$
Thus, the Skorohod representation theorem implies that there exist random matrices $\left(P_{n},\tilde{\mathcal{Y}}_{n}\right)$ and $\left(P,\tilde{\mathcal{Y}}\right)$ defined on the same probability space for all $n\in\mathbb{N}$ such that $\left(P_{n},\tilde{\mathcal{Y}}_{n}\right)=_{d}\left(H_{\perp}\left(n\right)H_{\perp}\left(n\right)',\mathcal{Y}_{n}\right)$ and $\left(P_{n},\tilde{\mathcal{Y}}_{n}\right)\to_{a.s.}\left(P,\tilde{\mathcal{Y}}\right)$ as $n\to\infty$. By (36), $\left(P,\tilde{\mathcal{Y}}\right)=_{d}\left(H_{\perp}H_{\perp}',\mathcal{Y}\right)$.

Denote by $M^{+}$ the Moore-Penrose inverse of a matrix $M$. Since the rank of both $P_{n}\tilde{\mathcal{Y}}_{n}P_{n}$ and $P\tilde{\mathcal{Y}}P$ is $K-1$ $a.s.$, Theorem 2 of Andrews (1987) yields
$$H_{\perp}\left(n\right)\left[H_{\perp}\left(n\right)'\mathcal{Y}_{n}H_{\perp}\left(n\right)\right]^{-1}H_{\perp}\left(n\right)'=\left[H_{\perp}\left(n\right)H_{\perp}\left(n\right)'\mathcal{Y}_{n}H_{\perp}\left(n\right)H_{\perp}\left(n\right)'\right]^{+}=_{d}\left(P_{n}\tilde{\mathcal{Y}}_{n}P_{n}\right)^{+}\to_{a.s.}\left(P\tilde{\mathcal{Y}}P\right)^{+}=_{d}\left(H_{\perp}H_{\perp}'\mathcal{Y}H_{\perp}H_{\perp}'\right)^{+}=H_{\perp}\left(H_{\perp}'\mathcal{Y}H_{\perp}\right)^{-1}H_{\perp}',$$
which proves part (iii) of the lemma.
Proofs of Theorems 1 and 2
The limit theory for sample moments involving trajectories of $x_{t}$ may incorporate elements from both initial conditions and sample period observations depending on the behavior of $\kappa_{n}$ as $n\to\infty$. Decompose $x_{t}$ as $x_{t}=x_{0}\left(n\right)+Y_{t}$, as in (5). Recalling that $Y_{0}=0$, the limit behavior of $Y_{t}$ and its sample moments is standard (Phillips and Durlauf, 1986), viz.,
$$\frac{1}{n^{1/2}}\sum_{t=1}^{\left[ns\right]}u_{t}=X_{n}\left(s\right)+o_{p}\left(1\right),\qquad X_{n}\left(s\right):=D\left(1\right)\frac{1}{n^{1/2}}\sum_{t=1}^{\left[ns\right]}\varepsilon_{t}\Rightarrow\Omega^{1/2}W\left(s\right),\qquad(37)$$
and
$$\frac{1}{n^{3/2}}\sum_{t=1}^{n}Y_{t-1}\Rightarrow\int_{0}^{1}B,\qquad\frac{1}{n^{2}}\sum_{t=1}^{n}Y_{t-1}Y_{t-1}'\Rightarrow\int_{0}^{1}BB',\qquad\frac{1}{n}\sum_{t=1}^{n}u_{t}Y_{t-1}'\Rightarrow\int_{0}^{1}dB\,B'+\Lambda,\qquad(38)$$
where $B=\Omega^{1/2}W=BM\left(\Omega\right)$, $\Omega=D\left(1\right)D\left(1\right)'$, $\Lambda=\sum_{k=1}^{\infty}E\left(u_{t}u_{t-k}'\right)$ and $W$ is standard $K$-vector Brownian motion. By virtue of the independence of the $\varepsilon_{t}$, the processes
$$\frac{1}{\kappa_{n}^{1/2}}\sum_{j=0}^{\left[\kappa_{n}s\right]}u_{-j}=X_{n}^{0}\left(s\right)+o_{p}\left(1\right)\qquad\text{and}\qquad\frac{1}{n^{1/2}}\sum_{t=1}^{\left[ns\right]}u_{t}=X_{n}\left(s\right)+o_{p}\left(1\right)\qquad(39)$$
are asymptotically independent for all $s\in\left[0,1\right]$, and so the Brownian motions $B_{0}$ and $B$ are also independent. The asymptotic equivalences in (39) follow by employing the BN decomposition and partial summation as in Phillips and Solo (1992), in view of the summability assumption in (4).

The effect of the initial condition on the asymptotic behavior of the sample moments of $x_{t}$ can be obtained by comparing the convergence rate of $x_{0}\left(n\right)$ with that of the sample moments of $Y_{t}$. First,
$$\frac{1}{n}\sum_{t=1}^{n}u_{t}x_{t-1}'=\sqrt{\frac{\kappa_{n}}{n}}\left(\frac{1}{n^{1/2}}\sum_{t=1}^{n}u_{t}\right)\left(\frac{x_{0}\left(n\right)}{\kappa_{n}^{1/2}}\right)'+\frac{1}{n}\sum_{t=1}^{n}u_{t}Y_{t-1}'=\sqrt{\tau}\,X_{n}\left(1\right)X_{n}^{0}\left(1\right)'+\frac{1}{n}\sum_{t=1}^{n}u_{t}Y_{t-1}'+o_{p}\left(1\right)\Rightarrow\int_{0}^{1}dB\,B_{\tau}^{+\prime}+\Lambda,\qquad(40)$$
where $B_{\tau}^{+}\left(s\right)=B\left(s\right)+\sqrt{\tau}\,B_{0}\left(1\right)$, giving the limit result for recent ($\tau=0$) and distant ($0<\tau<\infty$) past initializations. For infinite ($\tau=\infty$) past initializations, (40) requires rescaling so that
$$\frac{1}{\sqrt{\kappa_{n}n}}\sum_{t=1}^{n}u_{t}x_{t-1}'=X_{n}\left(1\right)X_{n}^{0}\left(1\right)'+O_{p}\left(\sqrt{\frac{n}{\kappa_{n}}}\right)\Rightarrow B\left(1\right)B_{0}\left(1\right)',\qquad(41)$$
and the sample moments involving $Y_{t}$ are asymptotically negligible under the revised standardization, thereby eliminating the components that produce the usual unit root limit theory. Instead, the asymptotic behavior of the sample covariance $\sum_{t=1}^{n}u_{t}x_{t-1}'$ is determined exclusively by the infinite past initialization $x_{0}\left(n\right)$ and partial sums of $u_{t}$.

For $\tau\in[0,\infty)$ the sample moment matrix of $x_{t}$ has the expanded form
$$\frac{1}{n^{2}}\sum_{t=1}^{n}x_{t-1}x_{t-1}'=\frac{\kappa_{n}}{n}\frac{x_{0}\left(n\right)x_{0}\left(n\right)'}{\kappa_{n}}+\sqrt{\frac{\kappa_{n}}{n}}\left(\frac{1}{n^{3/2}}\sum_{t=1}^{n}Y_{t-1}\right)\frac{x_{0}\left(n\right)'}{\kappa_{n}^{1/2}}+\sqrt{\frac{\kappa_{n}}{n}}\frac{x_{0}\left(n\right)}{\kappa_{n}^{1/2}}\left(\frac{1}{n^{3/2}}\sum_{t=1}^{n}Y_{t-1}'\right)+\frac{1}{n^{2}}\sum_{t=1}^{n}Y_{t-1}Y_{t-1}'\Rightarrow\int_{0}^{1}B_{\tau}^{+}B_{\tau}^{+\prime},\qquad(42)$$
giving the limit result for recent and distant past initializations. Under infinite ($\tau=\infty$) past initializations the sample moment matrix has a faster rate of convergence that is driven by the behavior of $x_{0}\left(n\right)$. In particular,
$$\frac{1}{\kappa_{n}n}\sum_{t=1}^{n}x_{t-1}x_{t-1}'=X_{n}^{0}\left(1\right)X_{n}^{0}\left(1\right)'+O_{p}\left(\sqrt{\frac{n}{\kappa_{n}}}\right)\Rightarrow B_{0}\left(1\right)B_{0}\left(1\right)',\qquad(43)$$
producing a singular limit for the sample moment matrix unless (3) is a scalar autoregression ($K=1$). In the scalar case, (41) and (43) yield
$$\sqrt{\kappa_{n}n}\left(\hat{R}_{n}-1\right)=\frac{\frac{1}{\sqrt{\kappa_{n}n}}\sum_{t=1}^{n}x_{t-1}u_{t}}{\frac{1}{\kappa_{n}n}\sum_{t=1}^{n}x_{t-1}^{2}}=\frac{\frac{D\left(1\right)}{n^{1/2}}\sum_{t=1}^{n}\varepsilon_{t}\,\frac{D\left(1\right)}{\kappa_{n}^{1/2}}\sum_{j=0}^{\left[\kappa_{n}\right]}\varepsilon_{-j}+O_{p}\left(\sqrt{\frac{n}{\kappa_{n}}}\right)}{\left(\frac{D\left(1\right)}{\kappa_{n}^{1/2}}\sum_{j=0}^{\left[\kappa_{n}\right]}\varepsilon_{-j}\right)^{2}+O_{p}\left(\sqrt{\frac{n}{\kappa_{n}}}\right)}\Rightarrow\frac{B\left(1\right)}{B_{0}\left(1\right)}=_{d}\mathcal{C},$$
where $\mathcal{C}$ is a standard Cauchy variate, giving the scalar result of Theorem 2. In this case where $\tau=\infty$, the tail of the process from the origination of $x_{t}$ wags the dog in the limit theory of the estimator. The distribution depends on the past through $B_{0}$ and the sample through $B$.
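The Cauchy limit above is easy to reproduce by simulation. The sketch below, with illustrative parameter choices, simulates the scalar unit root process with a heavy pre-sample initialization, using the exact normality of the Gaussian partial sum ($x_0 \sim N(0,\kappa)$) as a computational shortcut, and inspects the quantiles of $\sqrt{\kappa_n n}(\hat R_n - 1)$, which for a standard Cauchy variate are $(-1, 0, 1)$ at the quartiles.

```python
import random, math, statistics

def cauchy_stats(n=400, kappa=1_000_000, reps=3000, seed=1):
    # Monte Carlo for sqrt(kappa * n) * (R_hat - 1) in the scalar model
    # x_t = x_{t-1} + eps_t with infinitely distant initialization (kappa >> n).
    # The initialization, a sum of kappa iid N(0,1) pre-sample shocks, is drawn
    # exactly as x_0 ~ N(0, kappa); the limit is B(1)/B_0(1), a standard Cauchy.
    rng = random.Random(seed)
    stats = []
    for _ in range(reps):
        x = rng.gauss(0.0, math.sqrt(kappa))     # x_0(n)
        num = den = 0.0
        for _ in range(n):
            e = rng.gauss(0.0, 1.0)
            num += x * e                          # sum of x_{t-1} * eps_t
            den += x * x                          # sum of x_{t-1}^2
            x += e
        stats.append(math.sqrt(kappa * n) * num / den)
    q1, med, q3 = statistics.quantiles(stats, n=4)
    return q1, med, q3
```

With these settings the sample quartiles land close to $(-1, 0, 1)$, while the sample mean is unstable across replications, as expected under a Cauchy limit with no finite moments.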
Combining (40) and (42) we have the least squares regression limit theory for (3) under recent or distant past initializations,
$$n\left(\hat{R}_{n}-I_{K}\right)\Rightarrow\left(\int_{0}^{1}dB\,B_{\tau}^{+\prime}+\Lambda\right)\left(\int_{0}^{1}B_{\tau}^{+}B_{\tau}^{+\prime}\right)^{-1},$$
as stated in Theorem 1.
Proof of Theorem 3
In the new coordinates given by (12), we have
$$n\left(\hat{R}_{n}-I_{K}\right)=\left(\frac{1}{n}\sum_{t=1}^{n}u_{t}z_{t-1}'\right)\left(\frac{1}{n^{2}}\sum_{t=1}^{n}z_{t-1}z_{t-1}'\right)^{-1}M\left(n\right)'=\left(\frac{1}{n}\sum_{t=1}^{n}u_{t}z_{t-1}'\right)\begin{pmatrix}n^{-2}Z_{1}'Z_{1} & n^{-2}Z_{1}'Z_{2}\\ n^{-2}Z_{2}'Z_{1} & n^{-2}Z_{2}'Z_{2}\end{pmatrix}^{-1}\begin{pmatrix}H\left(n\right)'\\ H_{\perp}\left(n\right)'\end{pmatrix},\qquad(44)$$
where
$$H_{1n}=\left(Z_{1}'Z_{1}\right)^{-1}Z_{1}'Z_{2}\qquad\text{and}\qquad Q_{1}=I_{n}-Z_{1}\left(Z_{1}'Z_{1}\right)^{-1}Z_{1}'.$$
Set $Z=\left[Z_{1},Z_{2}\right]$. Standard partitioned inversion gives
$$\left(n^{-2}Z'Z\right)^{-1}=\begin{pmatrix}\left(\frac{Z_{1}'Z_{1}}{n^{2}}\right)^{-1}+H_{1n}\left(\frac{Z_{2}'Q_{1}Z_{2}}{n^{2}}\right)^{-1}H_{1n}' & -H_{1n}\left(\frac{Z_{2}'Q_{1}Z_{2}}{n^{2}}\right)^{-1}\\ -\left(\frac{Z_{2}'Q_{1}Z_{2}}{n^{2}}\right)^{-1}H_{1n}' & \left(\frac{Z_{2}'Q_{1}Z_{2}}{n^{2}}\right)^{-1}\end{pmatrix},\qquad(45)$$
and (44) becomes
$$n\left(\hat{R}_{n}-I_{K}\right)=\frac{1}{n}\sum_{t=1}^{n}u_{t}z_{1,t-1}'\left(\frac{Z_{1}'Z_{1}}{n^{2}}\right)^{-1}H\left(n\right)'+\frac{1}{n}\sum_{t=1}^{n}u_{t}z_{1,t-1}'H_{1n}\left(\frac{Z_{2}'Q_{1}Z_{2}}{n^{2}}\right)^{-1}\left[H_{1n}'H\left(n\right)'-H_{\perp}\left(n\right)'\right]$$
$$\quad+\frac{1}{n}\sum_{t=1}^{n}u_{t}z_{2,t-1}'\left(\frac{Z_{2}'Q_{1}Z_{2}}{n^{2}}\right)^{-1}\left[H_{\perp}\left(n\right)'-H_{1n}'H\left(n\right)'\right].\qquad(46)$$
By (29) and (30) we know that both $H\left(n\right)$ and $H_{\perp}\left(n\right)$ are bounded in probability. Thus, recalling that the effect of the initial condition is present only in $z_{1,t-1}$, we have
$$Z_{1}'Z_{1}=O_{p}\left(n\kappa_{n}\right),\qquad\sum_{t=1}^{n}u_{t}z_{1,t-1}'=O_{p}\left(\sqrt{n\kappa_{n}}\right)\qquad\text{and}\qquad\sum_{t=1}^{n}u_{t}z_{2,t-1}'=O_{p}\left(n\right).\qquad(47)$$
The asymptotic behavior of the remaining terms of (46) is given in Lemma A2 above. Consideration of these terms leads to the following simplification:
$$n\left(\hat{R}_{n}-I_{K}\right)=\left[\frac{1}{n}\sum_{t=1}^{n}u_{t}z_{2,t-1}'-\frac{1}{n}\sum_{t=1}^{n}u_{t}z_{1,t-1}'H_{1n}\right]\left(\frac{Z_{2}'Q_{1}Z_{2}}{n^{2}}\right)^{-1}H_{\perp}\left(n\right)'+O_{p}\left(\sqrt{\frac{n}{\kappa_{n}}}\right)$$
$$=\left[\frac{1}{n}\sum_{t=1}^{n}u_{t}Y_{t-1}'-X_{n}\left(1\right)\left(\frac{1}{n^{3/2}}\sum_{t=1}^{n}Y_{t-1}\right)'\right]H_{\perp}\left(n\right)\left(\frac{Z_{2}'Q_{1}Z_{2}}{n^{2}}\right)^{-1}H_{\perp}\left(n\right)'+o_{p}\left(1\right).\qquad(48)$$
Joint convergence in distribution of the various elements in (48) needs to be proved. The proof of Lemma A2(iii) yields
$$H_{\perp}\left(n\right)\left(\frac{Z_{2}'Q_{1}Z_{2}}{n^{2}}\right)^{-1}H_{\perp}\left(n\right)'=\left[H_{\perp}\left(n\right)H_{\perp}\left(n\right)'\mathcal{Y}_{n}H_{\perp}\left(n\right)H_{\perp}\left(n\right)'\right]^{+}+o_{p}\left(1\right),$$
which, together with (30), implies that the right side of (48) is a continuous function of
$$\left(X_{n}^{0}\left(1\right),\ X_{n}\left(1\right),\ \frac{1}{n^{3/2}}\sum_{t=1}^{n}Y_{t-1},\ \frac{1}{n^{2}}\sum_{t=1}^{n}Y_{t-1}Y_{t-1}',\ \frac{1}{n}\sum_{t=1}^{n}u_{t}Y_{t-1}'\right).$$
Joint convergence of the first four terms has been established in Lemma A1. The sample covariance $n^{-1}\sum_{t=1}^{n}u_{t}Y_{t-1}'$ does not admit a neat integral representation like the other two sample moments. The stochastic component of its limiting distribution is nonetheless driven by the partial sum process $U_{n}\left(\cdot\right)$ in (25), and joint convergence of $n^{-1}\sum_{t=1}^{n}u_{t}Y_{t-1}'$ and other sample moments of $Y_{t}$ is well documented (cf. Phillips, 1988b). Thus, it is enough to show that $n^{-1}\sum_{t=1}^{n}u_{t}Y_{t-1}'-\Lambda$ is asymptotically independent of $X_{n}^{0}\left(\cdot\right)$. To see this, note that $n^{-1}\sum_{t=1}^{n}\tilde{\varepsilon}_{t}u_{t}'\to_{a.s.}E\left(\tilde{\varepsilon}_{t}u_{t}'\right)=\Lambda$ by the ergodic theorem and a simple calculation. Using the BN decomposition and summation by parts we can write
$$\frac{1}{n}\sum_{t=1}^{n}u_{t}Y_{t-1}'-\Lambda=\frac{D\left(1\right)}{n}\sum_{t=1}^{n}\varepsilon_{t}Y_{t-1}'-\frac{1}{n}\tilde{\varepsilon}_{n}Y_{n}'+\left(\frac{1}{n}\sum_{t=1}^{n}\tilde{\varepsilon}_{t}u_{t}'-\Lambda\right)=\frac{D\left(1\right)}{n}\sum_{t=1}^{n}\varepsilon_{t}Y_{t-1}'+o_{p}\left(1\right)$$
$$=\frac{D\left(1\right)}{n}\sum_{t=1}^{n}\varepsilon_{t}\left(\sum_{j=1}^{t-1}\varepsilon_{j}'\right)D\left(1\right)'+\frac{D\left(1\right)}{n}\sum_{t=1}^{n}\varepsilon_{t}\tilde{\varepsilon}_{t-1}'+o_{p}\left(1\right)=\frac{D\left(1\right)}{n}\sum_{t=1}^{n}\varepsilon_{t}\left(\sum_{j=1}^{t-1}\varepsilon_{j}'\right)D\left(1\right)'+o_{p}\left(1\right),\qquad(49)$$
since $n^{-1}\sum_{t=1}^{n}\varepsilon_{t}\tilde{\varepsilon}_{t-1}'\to0$ in $L_{2}$ by a martingale LLN. This establishes the required asymptotic independence.

Since joint convergence of the various terms of (48) applies, (38), (37) and Lemma A2(iii) give
$$n\left(\hat{R}_{n}-I_{K}\right)\Rightarrow\left[\int_{0}^{1}dB\,B'+\Lambda-B\left(1\right)\int_{0}^{1}B'\right]H_{\perp}\left(H_{\perp}'\int_{0}^{1}\bar{B}\bar{B}'H_{\perp}\right)^{-1}H_{\perp}'=\left[\int_{0}^{1}dB\,\bar{B}'+\Lambda\right]H_{\perp}\left(H_{\perp}'\int_{0}^{1}\bar{B}\bar{B}'H_{\perp}\right)^{-1}H_{\perp}'$$
as stated in (14).

For (15), using the fact that
$$\frac{1}{\sqrt{n\kappa_{n}}}\sum_{t=1}^{n}u_{t}z_{1,t-1}'=\frac{1}{\sqrt{n\kappa_{n}}}\sum_{t=1}^{n}u_{t}\,x_{0}\left(n\right)'H\left(n\right)+O_{p}\left(\sqrt{\frac{n}{\kappa_{n}}}\right)=\left(X_{n}^{0}\left(1\right)'X_{n}^{0}\left(1\right)\right)^{1/2}X_{n}\left(1\right)+o_{p}\left(1\right)$$
and that $H\left(n\right)'H\left(n\right)=1$, $H_{\perp}\left(n\right)'H\left(n\right)=0$ $a.s.$, (46) gives
$$\sqrt{n\kappa_{n}}\left(\hat{R}_{n}-I_{K}\right)H\left(n\right)=\frac{1}{\sqrt{n\kappa_{n}}}\sum_{t=1}^{n}u_{t}z_{1,t-1}'\left(\frac{Z_{1}'Z_{1}}{n\kappa_{n}}\right)^{-1}+\frac{1}{\sqrt{n\kappa_{n}}}\sum_{t=1}^{n}u_{t}z_{1,t-1}'\left[\sqrt{\frac{\kappa_{n}}{n}}H_{1n}\right]\left(\frac{Z_{2}'Q_{1}Z_{2}}{n^{2}}\right)^{-1}\left[\sqrt{\frac{\kappa_{n}}{n}}H_{1n}\right]'$$
$$\quad-\frac{1}{n}\sum_{t=1}^{n}u_{t}z_{2,t-1}'\left(\frac{Z_{2}'Q_{1}Z_{2}}{n^{2}}\right)^{-1}\left[\sqrt{\frac{\kappa_{n}}{n}}H_{1n}\right]'$$
$$=\left(X_{n}^{0}\left(1\right)'X_{n}^{0}\left(1\right)\right)^{-1/2}\left\{X_{n}\left(1\right)-\left[\frac{1}{n}\sum_{t=1}^{n}u_{t}Y_{t-1}'-X_{n}\left(1\right)\left(\frac{1}{n^{3/2}}\sum_{t=1}^{n}Y_{t-1}\right)'\right]H_{\perp}\left(n\right)\left(\frac{Z_{2}'Q_{1}Z_{2}}{n^{2}}\right)^{-1}H_{\perp}\left(n\right)'\left(\frac{1}{n^{3/2}}\sum_{t=1}^{n}Y_{t-1}\right)\right\}+o_{p}\left(1\right)$$
$$\Rightarrow\left(B_{0}\left(1\right)'B_{0}\left(1\right)\right)^{-1/2}\left[B\left(1\right)-\left(\int_{0}^{1}dB\,\bar{B}'+\Lambda\right)H_{\perp}\left(H_{\perp}'\int_{0}^{1}\bar{B}\bar{B}'H_{\perp}\right)^{-1}H_{\perp}'\left(\int_{0}^{1}B\right)\right],$$
where we have used (32), (33), (38), Lemma A2 and the joint convergence developed in Lemma A1 and (49).
Proof of Theorem 4
Setting $u_{y.xt}=u_{yt}-\Omega_{yx}\Omega_{xx}^{-1}\Delta x_{t}$ and $U_{y.x}=\left[u_{y.x1}',\ldots,u_{y.xn}'\right]'$ as the corresponding data matrix, we have
$$y_{t}^{+}=y_{t}-\hat{\Omega}_{yx}\hat{\Omega}_{xx}^{-1}\Delta x_{t}=Ax_{t}+u_{y.xt}-\left(\hat{\Omega}_{yx}\hat{\Omega}_{xx}^{-1}-\Omega_{yx}\Omega_{xx}^{-1}\right)\Delta x_{t},$$
and, letting $U_{x}:=\Delta X=\left[u_{x1}',\ldots,u_{xn}'\right]'$, we obtain
$$\hat{A}^{+}-A=\left[U_{y.x}'X-n\hat{\Delta}_{yx}^{+}-\left(\hat{\Omega}_{yx}\hat{\Omega}_{xx}^{-1}-\Omega_{yx}\Omega_{xx}^{-1}\right)U_{x}'X\right]\left(X'X\right)^{-1}.\qquad(50)$$
Writing (50) in the rotated coordinates (12) and using the inversion formula (45), we have
$$n\left(\hat{A}^{+}-A\right)=\left[\frac{U_{y.x}'Z}{n}-\hat{\Delta}_{yz}^{+}-\left(\hat{\Omega}_{yx}\hat{\Omega}_{xx}^{-1}-\Omega_{yx}\Omega_{xx}^{-1}\right)\frac{U_{x}'Z}{n}\right]\left(n^{-2}Z'Z\right)^{-1}M\left(n\right)'$$
$$=\left[\frac{1}{n}\sum_{t=1}^{n}u_{y.xt}z_{1t}'-\hat{\Delta}_{yz_{1}}^{+}-\left(\hat{\Omega}_{yx}\hat{\Omega}_{xx}^{-1}-\Omega_{yx}\Omega_{xx}^{-1}\right)\frac{U_{x}'Z_{1}}{n}\right]\left(\frac{Z_{1}'Z_{1}}{n^{2}}\right)^{-1}H\left(n\right)'$$
$$\quad+\left[\frac{1}{n}\sum_{t=1}^{n}u_{y.xt}z_{1t}'-\hat{\Delta}_{yz_{1}}^{+}-\left(\hat{\Omega}_{yx}\hat{\Omega}_{xx}^{-1}-\Omega_{yx}\Omega_{xx}^{-1}\right)\frac{U_{x}'Z_{1}}{n}\right]H_{1n}\left(\frac{Z_{2}'Q_{1}Z_{2}}{n^{2}}\right)^{-1}\left[H_{1n}'H\left(n\right)'-H_{\perp}\left(n\right)'\right]$$
$$\quad+\left[\frac{1}{n}\sum_{t=1}^{n}u_{y.xt}z_{2t}'-\hat{\Delta}_{yz_{2}}^{+}-\left(\hat{\Omega}_{yx}\hat{\Omega}_{xx}^{-1}-\Omega_{yx}\Omega_{xx}^{-1}\right)\frac{U_{x}'Z_{2}}{n}\right]\left(\frac{Z_{2}'Q_{1}Z_{2}}{n^{2}}\right)^{-1}\left[H_{\perp}\left(n\right)'-H_{1n}'H\left(n\right)'\right].\qquad(51)$$
In order to analyze the components of (51), note that, by an identical argument to Lemma A2, both $\sum_{t=1}^{n}u_{y.xt}z_{1t}$ and $U_{x}'Z_{1}$ are of order $O_{p}\left(\sqrt{n\kappa_{n}}\right)$, $\sum_{t=1}^{n}u_{y.xt}z_{2t}'$ and $U_{x}'Z_{2}$ are of order $O_{p}\left(n\right)$, $Z_{1}'Z_{1}=O_{p}\left(n\kappa_{n}\right)$ and $Z_{2}'Z_{2}=O_{p}\left(n^{2}\right)$. Also, given an integrable lag kernel function $k\left(\cdot\right)$ and a lag truncation parameter $M$ satisfying $\frac{1}{M}+\frac{M}{n}\to0$,
$$\hat{\Delta}_{yz}^{+}=\sum_{h=0}^{M}k\left(\frac{h}{M}\right)\frac{1}{n}\sum_{t=1}^{n}u_{y.xt}\Delta\hat{z}_{t-h}'=\left[\sum_{h=0}^{M}k\left(\frac{h}{M}\right)\frac{1}{n}\sum_{t=1}^{n}u_{y.xt}\Delta\hat{x}_{t-h}'\right]M\left(n\right)=\hat{\Delta}_{yx}^{+}M\left(n\right)=O_{p}\left(1\right),$$
since $\hat{\Delta}_{yx}^{+}$ is a consistent estimator (cf. Phillips and Hansen, 1990) and $M\left(n\right)=O_{p}\left(1\right)$. Thus both $\hat{\Delta}_{yz_{1}}^{+}$ and $\hat{\Delta}_{yz_{2}}^{+}$ are bounded in probability. Finally, as both $\hat{\Omega}_{yx}$ and $\hat{\Omega}_{xx}$ are consistent estimators (Phillips and Hansen, 1990), $\hat{\Omega}_{yx}\hat{\Omega}_{xx}^{-1}-\Omega_{yx}\Omega_{xx}^{-1}=o_{p}\left(1\right)$.

The above facts imply, for the first term of (51),
$$\frac{n}{\kappa_{n}}\left[\frac{1}{n}\sum_{t=1}^{n}u_{y.xt}z_{1t}'-\hat{\Delta}_{yz_{1}}^{+}-o_{p}\left(1\right)\frac{U_{x}'Z_{1}}{n}\right]\left(\frac{Z_{1}'Z_{1}}{n\kappa_{n}}\right)^{-1}=\left[\frac{1}{\kappa_{n}}\sum_{t=1}^{n}u_{y.xt}z_{1t}'-\frac{n}{\kappa_{n}}\hat{\Delta}_{yz_{1}}^{+}-o_{p}\left(1\right)\frac{U_{x}'Z_{1}}{\kappa_{n}}\right]O_{p}\left(1\right)=O_{p}\left(\sqrt{\frac{n}{\kappa_{n}}}\right).$$
Since $n/\kappa_{n}\to0$, this shows that the first term of (51) is $o_{p}\left(1\right)$ as $n\to\infty$. For the second term of (51), since $H_{1n}=O_{p}\left(\sqrt{n/\kappa_{n}}\right)$ from Lemma A2, the pieces ending in $H_{1n}'H\left(n\right)'$ are $O_{p}\left(\sqrt{n/\kappa_{n}}\right)$, while for the pieces ending in $H_{\perp}\left(n\right)'$,
$$\left[\hat{\Delta}_{yz_{1}}^{+}+o_{p}\left(1\right)\frac{U_{x}'Z_{1}}{n}\right]H_{1n}\left(\frac{Z_{2}'Q_{1}Z_{2}}{n^{2}}\right)^{-1}H_{\perp}\left(n\right)'=O_{p}\left(\sqrt{\frac{n}{\kappa_{n}}}\right)+o_{p}\left(1\right)\frac{U_{x}'Z_{1}}{\sqrt{n\kappa_{n}}}O_{p}\left(1\right)=o_{p}\left(1\right).$$
A similar argument on the third term of (51) yields
$$\left[\frac{1}{n}\sum_{t=1}^{n}u_{y.xt}z_{2t}'-\hat{\Delta}_{yz_{2}}^{+}-o_{p}\left(1\right)\frac{U_{x}'Z_{2}}{n}\right]\left(\frac{Z_{2}'Q_{1}Z_{2}}{n^{2}}\right)^{-1}H_{1n}'H\left(n\right)'=O_{p}\left(\left\Vert H_{1n}\right\Vert \right)=O_{p}\left(\sqrt{\frac{n}{\kappa_{n}}}\right),$$
and
$$\left(\hat{\Omega}_{yx}\hat{\Omega}_{xx}^{-1}-\Omega_{yx}\Omega_{xx}^{-1}\right)\frac{U_{x}'Z_{2}}{n}\left(\frac{Z_{2}'Q_{1}Z_{2}}{n^{2}}\right)^{-1}H_{\perp}\left(n\right)'=o_{p}\left(1\right).$$
Thus, (51) yields
$$n\left(\hat{A}^{+}-A\right)=\left[\frac{1}{n}\sum_{t=1}^{n}u_{y.xt}z_{2t}'-\hat{\Delta}_{yz_{2}}^{+}-\frac{1}{n}\sum_{t=1}^{n}u_{y.xt}z_{1t}'H_{1n}\right]\left(\frac{Z_{2}'Q_{1}Z_{2}}{n^{2}}\right)^{-1}H_{\perp}\left(n\right)'+o_{p}\left(1\right).\qquad(52)$$

Corresponding to the notation of Lemma A1, let $Y_{xt}:=\sum_{j=1}^{t}u_{xj}$,
$$X_{xn}^{0}\left(s\right):=D_{x}\left(1\right)\frac{1}{\kappa_{n}^{1/2}}\sum_{j=0}^{\left[\kappa_{n}s\right]}\varepsilon_{-j},\qquad X_{an}\left(s\right):=D_{a}\left(1\right)\frac{1}{n^{1/2}}\sum_{t=1}^{\left[ns\right]}\varepsilon_{t},\qquad(53)$$
for $a\in\left\{x,y\right\}$, and
$$X_{y.xn}^{+}\left(s\right):=X_{yn}\left(s\right)-\Omega_{yx}\Omega_{xx}^{-1}X_{xn}\left(s\right)\Rightarrow B_{y.x}\left(s\right)=BM\left(\Omega_{yy.x}\right).$$
Then, an identical argument to that used in the derivation of (34) in Lemma A2(ii) yields
$$\frac{1}{n}\sum_{t=1}^{n}u_{y.xt}z_{1t}'H_{1n}=\frac{1}{n}\sum_{t=1}^{n}u_{yt}z_{1t}'H_{1n}-\Omega_{yx}\Omega_{xx}^{-1}\frac{1}{n}\sum_{t=1}^{n}u_{xt}z_{1t}'H_{1n}=\left[X_{yn}\left(1\right)-\Omega_{yx}\Omega_{xx}^{-1}X_{xn}\left(1\right)\right]\left(\frac{1}{n^{3/2}}\sum_{t=1}^{n}Y_{xt}\right)'H_{\perp}\left(n\right)+o_{p}\left(1\right)$$
$$=X_{y.xn}^{+}\left(1\right)\left(\frac{1}{n^{3/2}}\sum_{t=1}^{n}Y_{xt}\right)'H_{\perp}\left(n\right)+o_{p}\left(1\right)\Rightarrow B_{y.x}\left(1\right)\left(\int_{0}^{1}B_{x}\right)'.\qquad(54)$$
Also, using Lemma A2(iii) with $B_{x}$ in place of $B$ we obtain
$$H_{\perp}\left(n\right)\left(\frac{Z_{2}'Q_{1}Z_{2}}{n^{2}}\right)^{-1}H_{\perp}\left(n\right)'\Rightarrow H_{\perp}\left(H_{\perp}'\int_{0}^{1}\bar{B}_{x}\bar{B}_{x}'H_{\perp}\right)^{-1}H_{\perp}'.\qquad(55)$$
Thus, for the final component of (52), the fact that $\hat{\Delta}_{yz_{2}}^{+}=\hat{\Delta}_{yx}^{+}H_{\perp}\left(n\right)=\hat{\Delta}_{yY}^{+}H_{\perp}\left(n\right)$ yields
$$\left[\frac{1}{n}\sum_{t=1}^{n}u_{y.xt}z_{2t}'-\hat{\Delta}_{yz_{2}}^{+}\right]\left(\frac{Z_{2}'Q_{1}Z_{2}}{n^{2}}\right)^{-1}H_{\perp}\left(n\right)'=\left[\frac{1}{n}\sum_{t=1}^{n}u_{y.xt}Y_{xt}'-\hat{\Delta}_{yY}^{+}\right]H_{\perp}\left(n\right)\left(\frac{Z_{2}'Q_{1}Z_{2}}{n^{2}}\right)^{-1}H_{\perp}\left(n\right)'$$
$$\Rightarrow\left(\int_{0}^{1}dB_{y.x}B_{x}'\right)H_{\perp}\left(H_{\perp}'\int_{0}^{1}\bar{B}_{x}\bar{B}_{x}'H_{\perp}\right)^{-1}H_{\perp}',\qquad(56)$$
as in (49) and using Lemma A2(iii).

Substituting (54), (55) and (56) into (52) and using joint convergence of the various elements as in the proof of Theorem 3, we obtain
$$n\left(\hat{A}^{+}-A\right)\Rightarrow\left[\int_{0}^{1}dB_{y.x}B_{x}'-B_{y.x}\left(1\right)\left(\int_{0}^{1}B_{x}\right)'\right]H_{\perp}\left(H_{\perp}'\int_{0}^{1}\bar{B}_{x}\bar{B}_{x}'H_{\perp}\right)^{-1}H_{\perp}'=\left(\int_{0}^{1}dB_{y.x}\bar{B}_{x}'\right)H_{\perp}\left(H_{\perp}'\int_{0}^{1}\bar{B}_{x}\bar{B}_{x}'H_{\perp}\right)^{-1}H_{\perp}'$$
$$=N\left(0,\ \Omega_{yy.x}\otimes H_{\perp}\left(H_{\perp}'\int_{0}^{1}\bar{B}_{x}\bar{B}_{x}'H_{\perp}\right)^{-1}H_{\perp}'\right),$$
producing the stated result (19). Mixed normality holds because the limit process $B_{y.x}$ is independent of both $B_{x}$ and $B_{0}$.

It remains to show (20). From (51) we have
$$\sqrt{n\kappa_{n}}\left(\hat{A}^{+}-A\right)H\left(n\right)=\frac{1}{\sqrt{n\kappa_{n}}}\sum_{t=1}^{n}u_{y.xt}z_{1t}'\left(\frac{Z_{1}'Z_{1}}{n\kappa_{n}}\right)^{-1}+\frac{1}{\sqrt{n\kappa_{n}}}\sum_{t=1}^{n}u_{y.xt}z_{1t}'\left[\sqrt{\frac{\kappa_{n}}{n}}H_{1n}\right]\left(\frac{Z_{2}'Q_{1}Z_{2}}{n^{2}}\right)^{-1}\left[\sqrt{\frac{\kappa_{n}}{n}}H_{1n}\right]'$$
$$\quad-\left[\frac{1}{n}\sum_{t=1}^{n}u_{y.xt}z_{2t}'-\hat{\Delta}_{yz_{2}}^{+}\right]\left(\frac{Z_{2}'Q_{1}Z_{2}}{n^{2}}\right)^{-1}\left[\sqrt{\frac{\kappa_{n}}{n}}H_{1n}\right]'+o_{p}\left(1\right),\qquad(57)$$
using similar arguments for the remainder terms as those used in the derivation of (52). Using (32) and the fact that $\sum_{t=1}^{n}u_{y.xt}Y_{xt}'=O_{p}\left(n\right)$, the first term of (57) becomes
$$\frac{1}{\sqrt{n\kappa_{n}}}\sum_{t=1}^{n}u_{y.xt}z_{1t}'\left(\frac{Z_{1}'Z_{1}}{n\kappa_{n}}\right)^{-1}=\frac{1}{\sqrt{n\kappa_{n}}}\sum_{t=1}^{n}u_{y.xt}\,x_{0}\left(n\right)'H\left(n\right)\left(\frac{Z_{1}'Z_{1}}{n\kappa_{n}}\right)^{-1}+O_{p}\left(\sqrt{\frac{n}{\kappa_{n}}}\right)$$
$$=\left(X_{xn}^{0}\left(1\right)'X_{xn}^{0}\left(1\right)\right)^{-1/2}X_{y.xn}^{+}\left(1\right)+o_{p}\left(1\right)\Rightarrow\left(B_{0}\left(1\right)'B_{0}\left(1\right)\right)^{-1/2}B_{y.x}\left(1\right).\qquad(58)$$
Similarly, letting
$$\mathcal{Y}_{xn}=\frac{1}{n^{2}}\sum_{t=1}^{n}Y_{xt}Y_{xt}'-\left(\frac{1}{n^{3/2}}\sum_{t=1}^{n}Y_{xt}\right)\left(\frac{1}{n^{3/2}}\sum_{t=1}^{n}Y_{xt}\right)'$$
and using the above, (33), (35) and Lemma A2, the second term of (57) can be written as
$$\left[\frac{1}{\sqrt{n\kappa_{n}}}\sum_{t=1}^{n}u_{y.xt}z_{1t}'\right]\left[\sqrt{\frac{\kappa_{n}}{n}}H_{1n}\right]\left(\frac{Z_{2}'Q_{1}Z_{2}}{n^{2}}\right)^{-1}\left[\sqrt{\frac{\kappa_{n}}{n}}H_{1n}\right]'$$
$$=\left(X_{xn}^{0}\left(1\right)'X_{xn}^{0}\left(1\right)\right)^{-1/2}X_{y.xn}^{+}\left(1\right)\left(\frac{1}{n^{3/2}}\sum_{t=1}^{n}Y_{xt}\right)'H_{\perp}\left(n\right)\left[H_{\perp}\left(n\right)'\mathcal{Y}_{xn}H_{\perp}\left(n\right)\right]^{-1}H_{\perp}\left(n\right)'\left(\frac{1}{n^{3/2}}\sum_{t=1}^{n}Y_{xt}\right)+o_{p}\left(1\right)$$
$$\Rightarrow\left(B_{0}\left(1\right)'B_{0}\left(1\right)\right)^{-1/2}B_{y.x}\left(1\right)\left(\int_{0}^{1}B_{x}\right)'H_{\perp}\left(H_{\perp}'\int_{0}^{1}\bar{B}_{x}\bar{B}_{x}'H_{\perp}\right)^{-1}H_{\perp}'\left(\int_{0}^{1}B_{x}\right).\qquad(59)$$
For the third term of (57), since $\hat{\Delta}_{yz_{2}}^{+}=\hat{\Delta}_{yY}^{+}H_{\perp}\left(n\right)$, we obtain
$$\left[\frac{1}{n}\sum_{t=1}^{n}u_{y.xt}z_{2t}'-\hat{\Delta}_{yz_{2}}^{+}\right]\left(\frac{Z_{2}'Q_{1}Z_{2}}{n^{2}}\right)^{-1}\left[\sqrt{\frac{\kappa_{n}}{n}}H_{1n}\right]'=\left(X_{xn}^{0}\left(1\right)'X_{xn}^{0}\left(1\right)\right)^{-1/2}\left[\frac{1}{n}\sum_{t=1}^{n}u_{y.xt}Y_{xt}'-\hat{\Delta}_{yY}^{+}\right]H_{\perp}\left(n\right)\left(\frac{Z_{2}'Q_{1}Z_{2}}{n^{2}}\right)^{-1}H_{\perp}\left(n\right)'\left(\frac{1}{n^{3/2}}\sum_{t=1}^{n}Y_{xt}\right)+o_{p}\left(1\right)$$
$$\Rightarrow\left(B_{0}\left(1\right)'B_{0}\left(1\right)\right)^{-1/2}\left(\int_{0}^{1}dB_{y.x}B_{x}'\right)H_{\perp}\left(H_{\perp}'\int_{0}^{1}\bar{B}_{x}\bar{B}_{x}'H_{\perp}\right)^{-1}H_{\perp}'\left(\int_{0}^{1}B_{x}\right).\qquad(60)$$
Applying (58), (59) and (60) to (57) and using the joint weak convergence of the random elements as in Lemma A1, we obtain
$$\sqrt{n\kappa_{n}}\left(\hat{A}^{+}-A\right)H\left(n\right)\Rightarrow\left(B_{0}\left(1\right)'B_{0}\left(1\right)\right)^{-1/2}\left[B_{y.x}\left(1\right)+B_{y.x}\left(1\right)\left(\int_{0}^{1}B_{x}\right)'H_{\perp}\left(H_{\perp}'\int_{0}^{1}\bar{B}_{x}\bar{B}_{x}'H_{\perp}\right)^{-1}H_{\perp}'\left(\int_{0}^{1}B_{x}\right)-\left(\int_{0}^{1}dB_{y.x}B_{x}'\right)H_{\perp}\left(H_{\perp}'\int_{0}^{1}\bar{B}_{x}\bar{B}_{x}'H_{\perp}\right)^{-1}H_{\perp}'\left(\int_{0}^{1}B_{x}\right)\right]$$
$$=\left[B_{y.x}\left(1\right)-\left(\int_{0}^{1}dB_{y.x}\bar{B}_{x}'\right)H_{\perp}\left(H_{\perp}'\int_{0}^{1}\bar{B}_{x}\bar{B}_{x}'H_{\perp}\right)^{-1}H_{\perp}'\left(\int_{0}^{1}B_{x}\right)\right]\left(B_{0}\left(1\right)'B_{0}\left(1\right)\right)^{-1/2}$$
$$=\int_{0}^{1}dB_{y.x}\left(s\right)\left[1-\bar{B}_{x}\left(s\right)'H_{\perp}\left(H_{\perp}'\int_{0}^{1}\bar{B}_{x}\bar{B}_{x}'H_{\perp}\right)^{-1}H_{\perp}'\left(\int_{0}^{1}B_{x}\right)\right]\left(B_{0}\left(1\right)'B_{0}\left(1\right)\right)^{-1/2}$$
$$=\int_{0}^{1}V\left(s\right)dB_{y.x}\left(s\right)\,\left(B_{0}\left(1\right)'B_{0}\left(1\right)\right)^{-1/2}\equiv N\left(0,\ \Omega_{yy.x}\frac{\int_{0}^{1}V\left(s\right)^{2}ds}{B_{0}\left(1\right)'B_{0}\left(1\right)}\right),$$
where
$$V\left(s\right)=1-\bar{B}_{x}\left(s\right)'H_{\perp}\left(H_{\perp}'\int_{0}^{1}\bar{B}_{x}\bar{B}_{x}'H_{\perp}\right)^{-1}H_{\perp}'\left(\int_{0}^{1}B_{x}\right),$$
as required for (20). Again, mixed normality holds because $B_{y.x}$ is independent of both $B_{x}$ and $B_{0}$.
7. References
Abadir, K. M. and J. R. Magnus (2005). Matrix Algebra. Econometric Exercises, vol. 1. Cambridge University Press.
Anderson, T. W. (1959). On asymptotic distributions of estimates of parameters of stochastic difference equations. Annals of Mathematical Statistics, 30, 676-687.
Andrews, D. W. K. (1987). Asymptotic results for generalised Wald tests. Econometric Theory, 3, 348-358.
Andrews, D. W. K. and P. Guggenberger (2007). Asymptotics for stationary very nearly unit root processes. Journal of Time Series Analysis (forthcoming).
Elliott, G. (1999). Efficient tests for a unit root when the initial observation is drawn from its unconditional distribution. International Economic Review, 40, 767-783.
Elliott, G. and U. K. Müller (2006). Minimizing the impact of the initial condition on testing for unit roots. Journal of Econometrics, 135, 285-310.
Harvey, D. I., S. J. Leybourne and A. M. R. Taylor (2007). Unit root testing in practice: dealing with uncertainty over the trend and initial condition. University of Nottingham (mimeographed).
Magdalinos, T. and P. C. B. Phillips (2007). Limit theory for cointegrated systems with moderately integrated and moderately explosive regressors. Working paper.
Müller, U. and G. Elliott (2003). Tests for unit roots and the initial condition. Econometrica, 71, 1269-1286.
Park, J. Y. and P. C. B. Phillips (1988). Statistical inference in regressions with integrated processes: Part 1. Econometric Theory, 4, 468-497.
Perron, P. (1991). A continuous time approximation to the unstable first-order autoregressive process: the case without an intercept. Econometrica, 59, 211-236.
Phillips, P. C. B. (1987). Time series regression with a unit root. Econometrica, 55, 277-302.
Phillips, P. C. B. (1988a). Multiple regression with integrated processes. In N. U. Prabhu (ed.), Statistical Inference from Stochastic Processes, Contemporary Mathematics 80, 79-106.
Phillips, P. C. B. (1988b). Weak convergence to the matrix stochastic integral $\int_{0}^{1}B\,dB'$. Journal of Multivariate Analysis, 24(2), 252-264.
Phillips, P. C. B. (1989). Partially identified econometric models. Econometric Theory, 5, 181-240.
Phillips, P. C. B. (2006). When the tail wags the unit root limit theory. Yale University (mimeographed).
Phillips, P. C. B. and S. N. Durlauf (1986). Multiple time series regression with integrated processes. Review of Economic Studies, 53, 473-495.
Phillips, P. C. B. and B. E. Hansen (1990). Statistical inference in instrumental variables regression with I(1) processes. Review of Economic Studies, 57, 99-125.
Phillips, P. C. B. and C. C. Lee (1996). Efficiency gains from quasi-differencing under nonstationarity. In P. M. Robinson and M. Rosenblatt (eds.), Athens Conference on Applied Probability and Time Series: Essays in Memory of E. J. Hannan. Springer-Verlag: New York.
Phillips, P. C. B. and T. Magdalinos (2007a). Limit theory for moderate deviations from a unit root. Journal of Econometrics, 136, 115-130.
Phillips, P. C. B. and T. Magdalinos (2007b). Limit theory for explosively cointegrated systems. Econometric Theory (forthcoming).
Phillips, P. C. B. and V. Solo (1992). Asymptotics for linear processes. Annals of Statistics, 20, 971-1001.
Uhlig, H. (1995). On Jeffreys prior when using the exact likelihood function. Econometric Theory, 10, 633-644.
White, J. S. (1958). The limiting distribution of the serial correlation coefficient in the explosive case. Annals of Mathematical Statistics, 29, 1188-1197.