
Massachusetts Institute of Technology
Department of Economics
Time Series 14.384
Guido Kuersteiner
Lecture 7: Unit Root Asymptotics and Unit Root Tests

In this lecture we relax the stationarity assumption in the sense that we allow for processes of the form
$$x_t = x_{t-1} + u_t, \tag{7.1}$$
where $u_t = \sum_{j=0}^{\infty} c_j \varepsilon_{t-j}$ such that $\sum_{j=0}^{\infty} j^{1/2}|c_j| < \infty$ and $\varepsilon_t \sim$ i.i.d.$(0, 1)$. Under these conditions $u_t$ is weakly stationary. If the polynomial $C(L) = \sum_{j=0}^{\infty} c_j L^j$ is invertible, then we can also view $x_t$ as being generated by an infinite order AR model. Thus
$$C(L)^{-1}(1 - L)x_t = \varepsilon_t,$$
where the AR polynomial $C(L)^{-1}(1 - L)$ now has one root on the unit circle. In other words, we are considering a generalization of the model
$$x_t = \phi_1 x_{t-1} + \dots + \phi_p x_{t-p} + \varepsilon_t$$
with $1 - \phi_1 - \dots - \phi_p = 0$. In particular, we are interested in estimation and testing of the unit root. It turns out that this can be done without fully specifying the short run dynamics of the model. We can show that parameter estimates are consistent even if the model is misspecified, and that they converge at a faster rate than the usual $\sqrt{T}$ asymptotics suggest. This property is often referred to as superconsistency. These facts open the way to some novel statistical procedures based on semiparametric approximations to the true underlying model.
In order to develop the necessary asymptotic theory we return to the model in equation (7.1). Expanding $x_n$ in terms of past innovations, it follows that $x_n = \sum_{t=1}^{n} u_t + x_0$. We can use the Beveridge-Nelson (BN) decomposition to analyze
$$u_t = \sum_{j=0}^{\infty} c_j \varepsilon_{t-j}.$$
The goal is to express $u_t$ as the sum of an independent innovation plus the difference between two stationary processes. We first obtain a new representation of the lag polynomial $C(L)$:
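$$\begin{aligned}
C(L) &= \sum_{i=0}^{\infty} c_i L^i \\
&= \Big(\sum_{i=0}^{\infty} c_i - \sum_{i=1}^{\infty} c_i\Big) + \Big(\sum_{i=1}^{\infty} c_i - \sum_{i=2}^{\infty} c_i\Big)L + \Big(\sum_{i=2}^{\infty} c_i - \sum_{i=3}^{\infty} c_i\Big)L^2 + \dots \\
&= \sum_{i=0}^{\infty} c_i + (L - 1)\sum_{i=1}^{\infty} c_i + (L^2 - L)\sum_{i=2}^{\infty} c_i + \dots
\end{aligned}$$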
We now define the coefficients $\tilde c_j = \sum_{k=j+1}^{\infty} c_k$, such that $C(L) = C(1) + (L - 1)\tilde C(L)$, where $\tilde C(L) = \sum_{j=0}^{\infty} \tilde c_j L^j$.
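The BN identity can be checked coefficient by coefficient. The sketch below is a minimal illustration that assumes a finite lag polynomial, so all the sums are finite. The function names and coefficient values are hypothetical choices, not part of the lecture.

```python
def bn_coefficients(c):
    """Return C(1) and tilde_c[j] = c[j+1] + c[j+2] + ... for a finite
    lag polynomial C(L) = c[0] + c[1] L + ... + c[q] L^q."""
    c1 = sum(c)
    tilde_c = [sum(c[j + 1:]) for j in range(len(c))]
    return c1, tilde_c


def rebuild_polynomial(c1, tilde_c):
    """Coefficients (lowest power first) of C(1) + (L - 1) * tilde_C(L)."""
    out = [0.0] * (len(tilde_c) + 1)
    out[0] += c1
    for j, t in enumerate(tilde_c):
        out[j + 1] += t   # L * tilde_c[j] L^j
        out[j] -= t       # -1 * tilde_c[j] L^j
    return out


c = [1.0, 0.5, -0.3, 0.2]          # illustrative MA(3) coefficients
c1, tilde_c = bn_coefficients(c)
rebuilt = rebuild_polynomial(c1, tilde_c)
print(c1, tilde_c)
print(rebuilt)  # reproduces c (plus a trailing zero) up to rounding
```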

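It follows immediately that $u_t = C(1)\varepsilon_t + (L - 1)\tilde C(L)\varepsilon_t$, where we also commonly write $\tilde\varepsilon_t = \tilde C(L)\varepsilon_t$. The process $u_t$ can now be written as
$$u_t = C(1)\varepsilon_t + \tilde\varepsilon_{t-1} - \tilde\varepsilon_t.$$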
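If we sum up the $u_t$ terms, then the differences $\tilde\varepsilon_{t-1} - \tilde\varepsilon_t$ cancel in the summation except for the first and last term. Such a sum is sometimes referred to as a telescoping sum. We therefore have
$$x_n = C(1)\sum_{t=1}^{n} \varepsilon_t + \tilde\varepsilon_0 - \tilde\varepsilon_n + x_0.$$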
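The per-period decomposition $u_t = C(1)\varepsilon_t + \tilde\varepsilon_{t-1} - \tilde\varepsilon_t$ and the telescoping sum can both be checked numerically. This sketch rests on a few assumptions:

- $u_t$ is a finite MA(3) with illustrative coefficients.
- $\varepsilon_t$ is Gaussian with unit variance.
- $x_0 = 0$.

The helper name is hypothetical.

```python
import random


def bn_decomposition_check(c, n, seed=0):
    """Simulate u_t = sum_j c_j eps_{t-j} (t = 1..n) for a finite MA(q) and check
    (i)  u_t = C(1) eps_t + eps~_{t-1} - eps~_t for every t, and
    (ii) x_n = sum_t u_t (with x_0 = 0) equals C(1) sum_t eps_t + eps~_0 - eps~_n."""
    rng = random.Random(seed)
    q = len(c) - 1
    draws = [rng.gauss(0.0, 1.0) for _ in range(n + q)]  # eps_{1-q}, ..., eps_n

    def eps(t):
        return draws[t + q - 1]

    c1 = sum(c)
    tilde_c = [sum(c[j + 1:]) for j in range(q)]  # tilde_c_j = 0 for j >= q

    def tilde_eps(t):
        return sum(tc * eps(t - j) for j, tc in enumerate(tilde_c))

    u = [sum(cj * eps(t - j) for j, cj in enumerate(c)) for t in range(1, n + 1)]
    max_gap = max(abs(u[t - 1] - (c1 * eps(t) + tilde_eps(t - 1) - tilde_eps(t)))
                  for t in range(1, n + 1))
    x_n = sum(u)
    telescoped = c1 * sum(eps(t) for t in range(1, n + 1)) + tilde_eps(0) - tilde_eps(n)
    return max_gap, x_n, telescoped


gap, x_n, telescoped = bn_decomposition_check([1.0, 0.5, -0.3, 0.2], n=500)
print(gap, x_n - telescoped)  # both zero up to floating-point rounding
```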
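Note that $E\tilde\varepsilon_t = 0$ and that $\operatorname{var}(\tilde\varepsilon_t) = \sum_{j=0}^{\infty}\tilde c_j^2 < \infty$. This follows from
$$\sum_{j=0}^{\infty} j^{1/2}|c_j| < \infty \implies \sum_{j=0}^{\infty}\tilde c_j^2 < \infty,$$
since
$$\begin{aligned}
\sum_{j=0}^{\infty}\tilde c_j^2 &\le \tilde c_0^2 + \sum_{j=1}^{\infty}\Big(\sum_{k=j+1}^{\infty}|c_k|\Big)^2 \\
&\le \tilde c_0^2 + \sum_{j=1}^{\infty}\Big(\sum_{k=j+1}^{\infty}k^{1/2}|c_k|\Big)\Big(\sum_{k=j+1}^{\infty}\frac{|c_k|}{k^{1/2}}\Big) \\
&\le \tilde c_0^2 + \Big(\sum_{j=1}^{\infty}j^{1/2}|c_j|\Big)\sum_{j=1}^{\infty}\sum_{k=j+1}^{\infty}\frac{|c_k|}{k^{1/2}} \\
&\le \tilde c_0^2 + \Big(\sum_{j=1}^{\infty}j^{1/2}|c_j|\Big)\Big(\sum_{k=1}^{\infty}k^{1/2}|c_k|\Big) < \infty.
\end{aligned}$$
The second inequality follows from the Cauchy-Schwarz inequality. The last inequality follows from counting the number of times each element $|c_k|/k^{1/2}$ appears in the double sum, namely $k - 1 \le k$ times. Moreover, $\tilde c_0^2 = (\sum_{k=1}^{\infty}c_k)^2$ is finite because $\sum_{k\ge1}|c_k| \le \sum_{k\ge1}k^{1/2}|c_k|$.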
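As an illustrative assumption, take geometric coefficients $c_j = a^j$ with $|a| < 1$. Then $\tilde c_j = a^{j+1}/(1-a)$, and hence $\sum_j \tilde c_j^2 = a^2/((1-a)^2(1-a^2))$. The sketch compares this closed form with a direct, truncated computation of $\operatorname{var}(\tilde\varepsilon_t)$ when $\operatorname{var}(\varepsilon_t) = 1$. The function name is hypothetical.

```python
def var_tilde_eps(c):
    """var(eps~_t) = sum_j tilde_c_j^2 with tilde_c_j = c_{j+1} + c_{j+2} + ...,
    for a finite (or truncated) coefficient list c and var(eps_t) = 1."""
    total, tail = 0.0, 0.0
    for cj in reversed(c[1:]):   # builds tilde_c_{q-1}, tilde_c_{q-2}, ..., tilde_c_0
        tail += cj
        total += tail * tail
    return total


a = 0.6
c = [a ** j for j in range(400)]  # c_j = a^j, truncated far into the tail
closed_form = a ** 2 / ((1 - a) ** 2 * (1 - a ** 2))
print(var_tilde_eps(c), closed_form)
```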
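We now define a stochastic process on $[0, 1]$ as
$$\begin{aligned}
X_n(r) &= \frac{1}{\sqrt n}\sum_{j=1}^{[nr]}u_j + \frac{1}{\sqrt n}x_0, \qquad r \in [0, 1] \\
&= \frac{C(1)}{\sqrt n}\sum_{j=1}^{[nr]}\varepsilon_j + \frac{1}{\sqrt n}\big(\tilde\varepsilon_0 - \tilde\varepsilon_{[nr]}\big) + \frac{1}{\sqrt n}x_0,
\end{aligned}$$
where $[nr]$ denotes the largest integer less than or equal to $nr$. The process $X_n(r)$ is right continuous and has left limits. Functions with this property are usually called CADLAG.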
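The step-function nature of $X_n(r)$ is easy to see in code. Below is a minimal sketch that returns $r \mapsto X_n(r)$ from a sample $u_1, \dots, u_n$ and a starting value $x_0$, using the integer part of $nr$. The function name and the sample values are hypothetical.

```python
import math


def partial_sum_process(u, x0=0.0):
    """Return r -> X_n(r) = n^{-1/2} (u_1 + ... + u_{[nr]} + x0),
    where [nr] is the integer part of n r, so X_n is a step function."""
    n = len(u)
    prefix = [0.0]
    for uj in u:                  # prefix[k] = u_1 + ... + u_k
        prefix.append(prefix[-1] + uj)

    def X(r):
        if not 0.0 <= r <= 1.0:
            raise ValueError("r must lie in [0, 1]")
        k = math.floor(n * r)     # [nr]
        return (prefix[k] + x0) / math.sqrt(n)

    return X


u = [1.0, -2.0, 0.5, 3.0]
X = partial_sum_process(u, x0=1.0)
print(X(0.0), X(0.49), X(0.5), X(1.0))  # 0.5 1.0 0.0 1.75
```

The jump between $r = 0.49$ and $r = 0.5$ illustrates right continuity: at $r = 0.5$ the function already takes the new value.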
The problem with the function space of right continuous functions with left limits is that it is not separable under the uniform metric. Lack of separability can lead to nonmeasurability, so that the standard theory of weak convergence does not apply. For the processes we consider here, however, this is not a serious concern. There are several ways to proceed. One can use a continuous approximation to $X_n(r)$. Alternatively, one can use a different metric, the so-called Skorokhod metric, to make the space of CADLAG functions complete and separable. We will not go into these technical details in this lecture. We only point out a few elements of the proofs needed to establish that $X_n(r)$ converges weakly to a limit process. This result is known as Donsker's Theorem in the probability literature, and the interested reader is referred to Billingsley (1968).
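We have seen that $\tilde\varepsilon_t$ is stationary with finite variance. It can be shown that
$$\sup_{r\in[0,1]}\Big|\frac{1}{\sqrt n}\big(\tilde\varepsilon_0 - \tilde\varepsilon_{[nr]}\big)\Big| \xrightarrow{p} 0,$$
and it is also innocuous to assume $\frac{1}{\sqrt n}x_0 \xrightarrow{p} 0$. We are therefore left with
$$X_n(r) = \frac{C(1)}{\sqrt n}\sum_{j=1}^{[nr]}\varepsilon_j + o_p(1).$$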
It is an immediate consequence of the CLT that, for $r$ fixed,
$$X_n(r) = C(1)\frac{\sqrt{[nr]}}{\sqrt n}\,\frac{1}{\sqrt{[nr]}}\sum_{j=1}^{[nr]}\varepsilon_j \xrightarrow{d} N\big(0, C(1)^2 r\big),$$
where $[nr]/n \to r$. Note that $C(1)^2 = 2\pi f_u(0)$. Moreover, for $r_1 < r_2 < \dots < r_m$ fixed, we have
$$\big[X_n(r_1),\ X_n(r_2) - X_n(r_1),\ \dots,\ X_n(r_m) - X_n(r_{m-1})\big] \xrightarrow{d} N\big(0, C(1)^2 R\big),$$
where $R = \operatorname{diag}(r_1, r_2 - r_1, \dots, r_m - r_{m-1})$. It should be pointed out that the increments $X_n(r_i) - X_n(r_{i-1})$ are independent up to the $o_p(1)$ remainder, since they are built from non-overlapping blocks of the i.i.d. $\varepsilon_j$. We have therefore established that the finite dimensional distributions of $X_n(r)$ converge to the finite dimensional distributions of Brownian Motion, denoted $B(r)$. Here $B(r)$ is a process defined from standard Brownian Motion $W(r)$. Standard Brownian motion has the following properties.
1. $W(0) = 0$.
2. $W(t)$ has stationary and independent increments, and for all $t$ and $s$ such that $t > s$ we have $W(t) - W(s) \sim N(0, t - s)$.
3. $W(t) \sim N(0, t)$ for all $t$.
4. $W(t)$ is sample path continuous.
We can now define $B(r)$ as $C(1)W(r)$. We have shown that our limit process has the same finite dimensional distributions as Brownian Motion. To show convergence of $X_n(r)$ in the function space, we need to establish in addition that for all $\varepsilon, \eta > 0$ there exists a $\delta > 0$ such that
$$\limsup_{n\to\infty} P\Big(\sup_{|r - s| < \delta}|X_n(r) - X_n(s)| > \varepsilon\Big) < \eta.$$
A proof of this statement (tightness) is omitted but can be found in Billingsley (1968).
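A small Monte Carlo experiment illustrates the marginal limit $X_n(r) \xrightarrow{d} N(0, C(1)^2 r)$. The sketch assumes:

- $u_t = \varepsilon_t + \theta\varepsilon_{t-1}$ with Gaussian $\varepsilon_t$ of unit variance, so that $C(1) = 1 + \theta$.
- $x_0 = 0$.

Sample size, seed and replication count are arbitrary choices.

```python
import random
import statistics


def simulate_X_n(r, n, theta, reps, seed=1):
    """Monte Carlo draws of X_n(r) = n^{-1/2} sum_{j<=[nr]} u_j with MA(1)
    errors u_j = eps_j + theta * eps_{j-1}, eps iid N(0, 1), and x_0 = 0."""
    rng = random.Random(seed)
    k = int(n * r)                    # [nr] for r >= 0
    draws = []
    for _ in range(reps):
        prev = rng.gauss(0.0, 1.0)    # eps_0
        s = 0.0
        for _ in range(k):
            e = rng.gauss(0.0, 1.0)
            s += e + theta * prev
            prev = e
        draws.append(s / n ** 0.5)
    return draws


theta, r = 0.5, 0.5
draws = simulate_X_n(r, n=400, theta=theta, reps=2000)
# the sample variance should be close to C(1)^2 r = (1 + theta)^2 r
print(statistics.variance(draws), (1 + theta) ** 2 * r)
```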
We can now use the continuous mapping theorem to analyze the behavior of certain statistics of interest. The first result is an asymptotic representation of the sample mean. Unlike in the stationary case, the sample mean does not converge to a constant; after rescaling, it converges to a random variable. In particular, we have
$$\frac{1}{n^{3/2}}\sum_{t=1}^{n}x_t \Rightarrow \int_0^1 B(s)\,ds,$$
where $\int_0^1 B(s)\,ds$ is a standard Riemann integral and $\Rightarrow$ denotes weak convergence in the function space. We will often write $\int_0^1 B(s)\,ds = \int B$ to simplify the notation. The integral is a random variable because its integrand is a stochastic process. We now turn to the proof of this result.
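We have from before
$$x_t = \sum_{j=1}^{t-1}u_j + u_t + x_0,$$
so that $\sum_{t=1}^{n}x_t = \sum_{t=1}^{n}(S_{t-1} + u_t + x_0)$ with $S_{t-1} = \sum_{j=1}^{t-1}u_j$. Then
$$\begin{aligned}
\frac{1}{n^{3/2}}\sum_{t=1}^{n}x_t &= \frac1n\sum_{t=1}^{n}\frac{S_{t-1}}{\sqrt n} + \frac1n\sum_{t=1}^{n}\frac{u_t}{\sqrt n} + \frac{x_0}{\sqrt n} \\
&= \sum_{t=1}^{n}\int_{(t-1)/n}^{t/n}X_n(r)\,dr + o_p(1),
\end{aligned}$$
since $n^{-3/2}\sum_{t=1}^{n}u_t = O_p(n^{-1})$ and $x_0/\sqrt n = O_p(n^{-1/2})$. Also, $X_n(r) = \frac{S_{[nr]}}{\sqrt n} + o_p(1) = \frac{S_{t-1}}{\sqrt n} + o_p(1)$ for $\frac{t-1}{n} \le r < \frac{t}{n}$, and $\int_{(t-1)/n}^{t/n}dr = \frac1n$. Hence
$$\frac{1}{n^{3/2}}\sum_{t=1}^{n}x_t = \int_0^1 X_n(r)\,dr + o_p(1).$$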
Since the integral is a continuous functional, it now follows that
$$\int_0^1 X_n(r)\,dr \Rightarrow \int_0^1 B(r)\,dr$$
by the continuous mapping theorem. By linearity of the integral, $\int_0^1 B(r)\,dr$ is Gaussian $N(0, \Omega)$, where $\Omega = C(1)^2/3$. To see this, note that $E\int_0^1 B(r)\,dr = 0$ by linearity of the integral, and that $E\big(\int_0^1 B(r)\,dr\big)^2 = \int_0^1\int_0^1 E[B(r)B(s)]\,dr\,ds$. Now $E[B(r)B(s)] = C(1)^2\min(r, s)$, so that
$$\Omega = 2C(1)^2\int_0^1\int_0^s r\,dr\,ds = C(1)^2/3.$$
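The variance formula $\Omega = C(1)^2/3$ can be checked by simulation. The sketch assumes:

- MA(1) errors $u_t = \varepsilon_t + \theta\varepsilon_{t-1}$ with standard normal $\varepsilon_t$, so that $C(1) = 1 + \theta$.
- $x_0 = 0$.

With $\theta = -0.4$ the target is $0.6^2/3 = 0.12$, and the finite-sample variance is slightly larger. The function name is hypothetical.

```python
import random
import statistics


def scaled_sum_draws(n, theta, reps, seed=2):
    """Draws of n^{-3/2} sum_{t=1}^n x_t for x_t = x_{t-1} + u_t, x_0 = 0,
    u_t = eps_t + theta * eps_{t-1}, eps iid N(0, 1)."""
    rng = random.Random(seed)
    out = []
    for _ in range(reps):
        prev = rng.gauss(0.0, 1.0)    # eps_0
        x, total = 0.0, 0.0
        for _ in range(n):
            e = rng.gauss(0.0, 1.0)
            x += e + theta * prev     # random walk driven by MA(1) errors
            prev = e
            total += x
        out.append(total / n ** 1.5)
    return out


theta = -0.4
draws = scaled_sum_draws(n=200, theta=theta, reps=2000)
omega = (1 + theta) ** 2 / 3          # C(1)^2 / 3 with C(1) = 1 + theta
print(statistics.variance(draws), omega)
```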
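We next turn to the analysis of the sample variance of $x_t$. In particular, writing $x_t = S_t + x_0$, we have
$$\begin{aligned}
\frac{1}{n^2}\sum_{t=1}^{n}x_t^2 &= \frac{1}{n^2}\sum_{t=1}^{n}(S_t + x_0)^2 = \frac{1}{n^2}\sum_{t=1}^{n}\big(S_t^2 + 2x_0S_t + x_0^2\big) \\
&= C(1)^2\sum_{t=1}^{n}\int_{(t-1)/n}^{t/n}\frac{S_{[nr]}^2}{nC(1)^2}\,dr + \frac{2x_0C(1)}{\sqrt n}\sum_{t=1}^{n}\int_{(t-1)/n}^{t/n}\frac{S_{[nr]}}{\sqrt n\,C(1)}\,dr + \frac{x_0^2}{n} + o_p(1) \\
&= C(1)^2\int_0^1\Big(\frac{X_n(r)}{C(1)}\Big)^2dr + \frac{2x_0C(1)}{\sqrt n}\int_0^1\frac{X_n(r)}{C(1)}\,dr + \frac{x_0^2}{n} + o_p(1) \\
&\xrightarrow{d} C(1)^2\int_0^1 W(r)^2\,dr.
\end{aligned}$$
This follows again by the continuous mapping theorem and the previous results, in particular the fact that $\int_0^1\frac{X_n(r)}{C(1)}\,dr = O_p(1)$.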
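Since $E\int_0^1 W(r)^2\,dr = \int_0^1 r\,dr = 1/2$, the mean of $n^{-2}\sum x_t^2$ should be close to $C(1)^2/2$ for large $n$. Below is a Monte Carlo sketch under illustrative assumptions: MA(1) errors $u_t = \varepsilon_t + \theta\varepsilon_{t-1}$ with $\theta = 0.5$, standard normal $\varepsilon_t$, and $x_0 = 0$. The function name is hypothetical.

```python
import random
import statistics


def scaled_sum_squares(n, theta, reps, seed=3):
    """Draws of n^{-2} sum_{t=1}^n x_t^2 for a random walk with MA(1) errors
    u_t = eps_t + theta * eps_{t-1}, eps iid N(0, 1), and x_0 = 0."""
    rng = random.Random(seed)
    out = []
    for _ in range(reps):
        prev = rng.gauss(0.0, 1.0)
        x, ss = 0.0, 0.0
        for _ in range(n):
            e = rng.gauss(0.0, 1.0)
            x += e + theta * prev
            prev = e
            ss += x * x
        out.append(ss / n ** 2)
    return out


theta = 0.5
draws = scaled_sum_squares(n=200, theta=theta, reps=2000)
# E[int_0^1 W(r)^2 dr] = 1/2, so the mean should be near C(1)^2 / 2
print(statistics.mean(draws), (1 + theta) ** 2 / 2)
```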
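We are also interested in the behavior of score functions of the form
$$\frac1n\sum_{t=1}^{n}x_{t-1}u_t.$$
We use partial summation. Note that
$$\frac1n\sum_{t=1}^{n}x_{t-1}u_t = \frac1n\sum_{t=1}^{n}(S_{t-1} + x_0)u_t = \frac1n\sum_{t=1}^{n}S_{t-1}u_t + o_p(1).$$
Moreover, with $S_0 = 0$,
$$\sum_{t=1}^{n}S_t^2 = \sum_{t=1}^{n}(S_{t-1} + u_t)^2 = \sum_{t=1}^{n}\big(S_{t-1}^2 + u_t^2 + 2u_tS_{t-1}\big) = \sum_{t=1}^{n}S_{t-1}^2 + \sum_{t=1}^{n}u_t^2 + 2\sum_{t=1}^{n}u_tS_{t-1},$$
such that
$$\begin{aligned}
\frac1n\sum_{t=1}^{n}S_{t-1}u_t &= \frac12\Big(\frac1n\sum_{t=1}^{n}S_t^2 - \frac1n\sum_{t=1}^{n}S_{t-1}^2 - \frac1n\sum_{t=1}^{n}u_t^2\Big) = \frac12\Big(\frac1n S_n^2 - \frac1n\sum_{t=1}^{n}u_t^2\Big) \\
&= \frac12\Big[\Big(\frac{S_n}{\sqrt n}\Big)^2 - \frac1n\sum_{t=1}^{n}u_t^2\Big] \xrightarrow{d} \frac12\big(B(1)^2 - \sigma^2\big),
\end{aligned}$$
where $\sigma^2 = \sum_{j=0}^{\infty}c_j^2 = \operatorname{var}(u_t)$. Then
$$\frac12\big(B(1)^2 - \sigma^2\big) = \frac{C(1)^2}{2}\big(W(1)^2 - 1\big) + \underbrace{\frac12\big(C(1)^2 - \sigma^2\big)}_{=:\,\lambda}.$$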
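The partial summation step is purely algebraic, so it holds exactly in any sample. The sketch does two things:

- It checks $\sum_t S_{t-1}u_t = \frac12(S_n^2 - \sum_t u_t^2)$ on arbitrary numbers.
- It evaluates $\lambda = \frac12(C(1)^2 - \sigma^2)$ for a finite MA polynomial with unit innovation variance.

For an MA(1) with coefficient $\theta$ this gives $\lambda = \frac12\big((1+\theta)^2 - (1+\theta^2)\big) = \theta$. The function names are hypothetical.

```python
import random


def score_two_ways(u):
    """Return sum_t S_{t-1} u_t computed directly and via the partial
    summation formula 1/2 (S_n^2 - sum_t u_t^2), with S_0 = 0."""
    s_prev, direct = 0.0, 0.0
    for ut in u:
        direct += s_prev * ut     # S_{t-1} u_t
        s_prev += ut              # becomes S_t
    via_partial = 0.5 * (s_prev ** 2 - sum(ut * ut for ut in u))
    return direct, via_partial


def bias_lambda(c):
    """lambda = (C(1)^2 - sigma^2) / 2 with sigma^2 = sum_j c_j^2 (var(eps_t) = 1)."""
    return 0.5 * (sum(c) ** 2 - sum(cj * cj for cj in c))


rng = random.Random(4)
u = [rng.gauss(0.0, 1.0) for _ in range(1000)]
direct, via_partial = score_two_ways(u)
print(direct, via_partial)        # equal up to rounding
print(bias_lambda([1.0, 0.5]))    # 0.5, i.e. lambda = theta for an MA(1)
```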
It can be shown that $\frac{C(1)^2}{2}\big(W(1)^2 - 1\big)$ has a representation as a stochastic (Itô) integral $\int_0^1 B\,dB$. Also note that $W(1)^2 \sim \chi^2_1$. It now follows immediately that
$$n(\hat\rho - 1) = \frac{n^{-1}\sum x_{t-1}u_t}{n^{-2}\sum x_{t-1}^2} \Rightarrow \frac{\int_0^1 B\,dB + \lambda}{\int_0^1 B^2},$$
where $\hat\rho = \sum x_tx_{t-1}/\sum x_{t-1}^2$ and $x_t = x_{t-1} + u_t$. Several things are worth noting at this point.

First of all, the estimator converges at rate $O_p(n^{-1})$ to the true parameter value, regardless of whether the short run dynamics of the model are correctly specified or not. This sharply contrasts with the stationary case, where misspecification of the model leads to inconsistency. However, in the case of misspecification, i.e. if the innovations $u_t$ are serially correlated, the limit distribution has an asymptotic bias term $\lambda/\int B^2$. It is the presence of this bias term that makes inference hard, since it is no longer possible to use tables to obtain critical values.

If $\lambda = 0$, which is the case for example when $u_t$ is i.i.d., then
$$n(\hat\rho - 1) \Rightarrow \frac{\int_0^1 W\,dW}{\int_0^1 W^2},$$
so that the limit distribution is nuisance parameter free. In other words, there is no need to use a t-test in this case, because the limit distribution of $n(\hat\rho - 1)$ does not depend on an unknown parameter.
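A Monte Carlo sketch of the bias term. With i.i.d. errors, $n(\hat\rho - 1)$ centres near the mean of $\int W\,dW/\int W^2$. MA(1) errors $u_t = \varepsilon_t + \theta\varepsilon_{t-1}$ with $\theta = -0.5$ (so $\lambda = \theta = -0.5$) push the distribution to the left through $\lambda/\int B^2$. The DGP, sample size and seeds are illustrative assumptions. $\hat\rho$ is the no-constant OLS estimator $\sum x_tx_{t-1}/\sum x_{t-1}^2$, and the function names are hypothetical.

```python
import random
import statistics


def df_rho_stat(x):
    """n (rho_hat - 1) from the no-constant regression of x_t on x_{t-1}."""
    num = sum(x[t] * x[t - 1] for t in range(1, len(x)))
    den = sum(x[t - 1] ** 2 for t in range(1, len(x)))
    n = len(x) - 1                    # number of usable observations
    return n * (num / den - 1.0)


def simulate_rho_stats(n, theta, reps, seed=5):
    """Draws of n(rho_hat - 1) when x_t = x_{t-1} + u_t, x_0 = 0,
    u_t = eps_t + theta * eps_{t-1}, eps iid N(0, 1)."""
    rng = random.Random(seed)
    stats = []
    for _ in range(reps):
        prev, x = rng.gauss(0.0, 1.0), [0.0]
        for _ in range(n):
            e = rng.gauss(0.0, 1.0)
            x.append(x[-1] + e + theta * prev)
            prev = e
        stats.append(df_rho_stat(x))
    return stats


iid = simulate_rho_stats(200, theta=0.0, reps=1000)
ma = simulate_rho_stats(200, theta=-0.5, reps=1000)   # lambda = -0.5
print(statistics.mean(iid), statistics.mean(ma))
```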
In this simple situation, a unit root test would compare $n(\hat\rho - 1)$ against the critical value of a statistic with distribution
$$\frac{\frac12\big(W(1)^2 - 1\big)}{\int_0^1 W(r)^2\,dr}.$$
This is possible since under $H_0: \rho = 1$ we have shown that
$$n(\hat\rho - 1) \Rightarrow \frac{\frac12\big(W(1)^2 - 1\big)}{\int_0^1 W(r)^2\,dr}.$$
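The limit distribution is non-standard, so its quantiles are obtained by simulation. This is a rough sketch that assumes i.i.d. standard normal errors, $x_0 = 0$, and no constant in the fitted regression. For $n = 100$ the tabulated 5% critical value (Hamilton, Table B5, case 1) is $-7.9$, so the simulated quantile should land in that neighbourhood, up to Monte Carlo error. The function name is hypothetical.

```python
import random


def df_rho_quantile(n, reps, prob=0.05, seed=6):
    """Empirical `prob` quantile of n(rho_hat - 1) under H0: x_t = x_{t-1} + eps_t,
    x_0 = 0, eps iid N(0, 1), with no constant in the fitted regression."""
    rng = random.Random(seed)
    stats = []
    for _ in range(reps):
        x_prev, num, den = 0.0, 0.0, 0.0
        for _ in range(n):
            x = x_prev + rng.gauss(0.0, 1.0)
            num += x_prev * (x - x_prev)   # sum x_{t-1} u_t
            den += x_prev * x_prev         # sum x_{t-1}^2
            x_prev = x
        stats.append(n * num / den)        # n(rho_hat - 1)
    stats.sort()
    return stats[int(prob * reps)]


q05 = df_rho_quantile(n=100, reps=4000)
print(q05)
```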
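Critical values are tabulated in Hamilton, Table B5, case 1. For example, if $n = 100$, then one can reject $H_0$ at 5% if $n(\hat\rho - 1) < -7.9$ for a one-sided test. We can also look at the t-statistic, assuming again that $u_t \sim \text{iid}(0, \sigma^2)$:
$$t = \frac{\hat\rho - 1}{\hat\sigma_{\hat\rho}} = \frac{\hat\rho - 1}{\hat\sigma\big(\sum x_{t-1}^2\big)^{-1/2}},$$
where
$$\begin{aligned}
\hat\sigma^2 = \frac1n\sum(x_t - \hat\rho x_{t-1})^2 &= \frac1n\sum\big(u_t + (\rho - \hat\rho)x_{t-1}\big)^2 \\
&= \frac1n\sum u_t^2 + 2(\rho - \hat\rho)\frac1n\sum x_{t-1}u_t + (\rho - \hat\rho)^2\frac1n\sum x_{t-1}^2 \xrightarrow{p} \sigma^2
\end{aligned}$$
and
$$n(\hat\rho - 1) \Rightarrow \frac{\frac{\sigma^2}{2}\big(W(1)^2 - 1\big)}{\sigma^2\int_0^1 W(r)^2\,dr}, \qquad \frac{1}{n^2}\sum x_{t-1}^2 \Rightarrow \sigma^2\int_0^1 W(r)^2\,dr,$$
so
$$t = \frac{n(\hat\rho - 1)\big(n^{-2}\sum x_{t-1}^2\big)^{1/2}}{\hat\sigma} \Rightarrow \frac{\frac12\big(W(1)^2 - 1\big)}{\big(\int_0^1 W(r)^2\,dr\big)^{1/2}},$$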
which is tabulated in Table B6, case 1.

We will now see that the limit distribution of $\hat\rho$ depends on the fitted model, even if the true underlying model remains unchanged. In particular, we consider fitting a constant while the true process is still assumed to be $x_t = x_{t-1} + u_t$ with $u_t = C(L)\varepsilon_t$. The estimator for the autoregressive parameter can now be written as
$$\begin{aligned}
\hat\rho &= \frac{\sum_{t=2}^{n}(x_t - \bar x)(x_{t-1} - \bar x_{-1})}{\sum_{t=2}^{n}(x_{t-1} - \bar x_{-1})^2} = 1 + \frac{\sum_{t=2}^{n}u_t(x_{t-1} - \bar x_{-1})}{\sum_{t=2}^{n}(x_{t-1} - \bar x_{-1})^2} \\
&= 1 + \frac{\sum_{t=2}^{n}u_tx_{t-1} - \bar x_{-1}\sum_{t=2}^{n}u_t}{\sum_{t=2}^{n}x_{t-1}^2 - (n-1)\bar x_{-1}^2},
\end{aligned}$$
where $\bar x = (n-1)^{-1}\sum_{t=2}^{n}x_t$ and $\bar x_{-1} = (n-1)^{-1}\sum_{t=2}^{n}x_{t-1}$, so
$$n(\hat\rho - 1) = \frac{n^{-1}\sum_{t=2}^{n}u_tx_{t-1} - n^{-1}\bar x_{-1}\sum_{t=2}^{n}u_t}{n^{-2}\sum_{t=2}^{n}x_{t-1}^2 - n^{-1}\bar x_{-1}^2} + o_p(1).$$
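Because $u_t = x_t - x_{t-1}$ under the null, the two expressions for the demeaned estimator are algebraically identical. The sketch computes $\hat\rho$ once from centred sums and once from the decomposed form, and confirms they agree up to rounding. It uses hypothetical helper names and takes means over the $n - 1$ usable observations.

```python
import random


def rho_hat_with_constant(x):
    """OLS slope of x_t on (1, x_{t-1}), t = 2..n, via centred sums."""
    y, z = x[1:], x[:-1]                  # x_t and x_{t-1}
    m = len(y)                            # n - 1 usable observations
    y_bar, z_bar = sum(y) / m, sum(z) / m
    num = sum((yt - y_bar) * (zt - z_bar) for yt, zt in zip(y, z))
    den = sum((zt - z_bar) ** 2 for zt in z)
    return num / den


def rho_hat_decomposed(x):
    """Same estimator as 1 + (sum u x_{t-1} - xbar_{-1} sum u) /
    (sum x_{t-1}^2 - (n-1) xbar_{-1}^2), with u_t = x_t - x_{t-1}."""
    z = x[:-1]
    u = [b - a for a, b in zip(x[:-1], x[1:])]
    m = len(u)
    z_bar = sum(z) / m
    num = sum(ut * zt for ut, zt in zip(u, z)) - z_bar * sum(u)
    den = sum(zt * zt for zt in z) - m * z_bar ** 2
    return 1.0 + num / den


rng = random.Random(10)
x = [5.0]
for _ in range(300):
    x.append(x[-1] + rng.gauss(0.0, 1.0))
a, b = rho_hat_with_constant(x), rho_hat_decomposed(x)
print(a, b, (len(x) - 1) * (a - 1.0))
```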
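Now, using the previous arguments, we can establish the following results:
$$\begin{aligned}
n^{-2}\sum_{t=2}^{n}x_{t-1}^2 &\Rightarrow C(1)^2\int_0^1 W(r)^2\,dr, \\
\big(n^{-1/2}\bar x_{-1}\big)^2 &\Rightarrow C(1)^2\Big(\int_0^1 W(r)\,dr\Big)^2, \\
n^{-1}\sum_{t=2}^{n}u_tx_{t-1} &\Rightarrow \frac{C(1)^2}{2}\big(W(1)^2 - 1\big) + \lambda, \\
\frac{\bar x_{-1}}{\sqrt n}\,\frac{1}{\sqrt n}\sum_{t=2}^{n}u_t &\Rightarrow C(1)^2\,W(1)\int_0^1 W(r)\,dr.
\end{aligned}$$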

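This leads to the following asymptotic representation of the estimator:
$$\begin{aligned}
n(\hat\rho - 1) &\Rightarrow \frac{\frac{C(1)^2}{2}\big(W(1)^2 - 1\big) + \lambda - C(1)^2\,W(1)\int_0^1 W(r)\,dr}{C(1)^2\Big[\int W(r)^2\,dr - \big(\int W(r)\,dr\big)^2\Big]} \\
&= \frac{\frac12\big(W(1)^2 - 1\big) - W(1)\int_0^1 W}{\int W^2 - \big(\int W\big)^2} + \frac{\lambda}{C(1)^2\Big[\int W^2 - \big(\int W\big)^2\Big]}.
\end{aligned}$$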
If $\lambda = 0$, then the asymptotic distribution is free of nuisance parameters, and critical values can be taken from Table B5, case 2.

There are basically two possible ways to avoid the misspecification problem. One is to fit the correct parametric model to the short run dynamics. This approach is the basis of augmented Dickey-Fuller tests (ADF tests). The second approach, the Phillips $Z_\alpha$ test, uses a nonparametric correction to the test statistic to account for omitted serial dependence. The advantage of this second approach is that the unit root hypothesis can be tested without making specific assumptions about the parametric form of the dependence in $u_t$.
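The Phillips $Z_\alpha$ test uses a nonparametric correction to obtain a nuisance-parameter-free limit distribution. It is formulated as
$$Z_\alpha = n(\hat\rho - 1) - \frac{\hat\lambda}{n^{-2}\sum(x_{t-1} - \bar x_{-1})^2},$$
with $\hat\rho$ being the demeaned estimator for the unit root and
$$\hat\lambda = \frac12\big(2\pi\hat f(0) - \hat\sigma_u^2\big),$$
where
$$\hat\sigma_u^2 = \frac1n\sum\hat u_t^2 = \frac1n\sum(x_t - x_{t-1})^2$$
and
$$2\pi\hat f(0) = \sum_{j=-M}^{M}\Big(1 - \frac{|j|}{M}\Big)\hat\gamma(j), \qquad \hat\gamma(j) = n^{-1}\sum_t\hat u_t\hat u_{t-j}.$$
For $M \to \infty$ with $M = O(n^{1/4})$ we have $2\pi\hat f(0) \xrightarrow{p} 2\pi f_u(0) = C(1)^2$, so that $\hat\lambda \xrightarrow{p} \lambda$, while
$$n^{-2}\sum(x_{t-1} - \bar x_{-1})^2 \Rightarrow C(1)^2\Big[\int_0^1 W(r)^2\,dr - \Big(\int_0^1 W(r)\,dr\Big)^2\Big].$$
Then
$$n(\hat\rho - 1) - \frac{\hat\lambda}{n^{-2}\sum(x_{t-1} - \bar x_{-1})^2} \Rightarrow \frac{\frac12\big(W(1)^2 - 1\big) - W(1)\int_0^1 W(r)\,dr}{\int_0^1 W(r)^2\,dr - \big(\int_0^1 W(r)\,dr\big)^2}.$$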
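A minimal sketch of the $Z_\alpha$ statistic as described above. It uses the demeaned OLS estimate, $\hat u_t = x_t - x_{t-1}$, Bartlett weights $1 - |j|/M$, and $\hat\lambda = \frac12(2\pi\hat f(0) - \hat\sigma_u^2)$. Some further choices are assumptions of the sketch:

- The function names are hypothetical.
- $n$ is taken as the number of usable observations.
- Autocovariances use uncentred products, as in the definition of $\hat\gamma(j)$.

With these weights, $M = 1$ uses no autocovariances at all, so $\hat\lambda = 0$ and the statistic reduces to $n(\hat\rho - 1)$.

```python
import random


def bartlett_lrv(u, M):
    """2*pi*f_hat(0) = sum_{|j|<M} (1 - |j|/M) gamma_hat(j), with
    gamma_hat(j) = n^{-1} sum_t u_t u_{t-j} (uncentred products)."""
    n = len(u)

    def gamma(j):
        return sum(u[t] * u[t - j] for t in range(j, n)) / n

    total = gamma(0)
    for j in range(1, M):              # the weight is zero at |j| = M
        total += 2.0 * (1.0 - j / M) * gamma(j)
    return total


def phillips_z_alpha(x, M):
    """Return (Z_alpha, rho_hat, lambda_hat) for a regression with a constant."""
    y, z = x[1:], x[:-1]
    m = len(y)
    y_bar, z_bar = sum(y) / m, sum(z) / m
    szz = sum((zt - z_bar) ** 2 for zt in z)
    rho_hat = sum((yt - y_bar) * (zt - z_bar) for yt, zt in zip(y, z)) / szz
    u_hat = [yt - zt for yt, zt in zip(y, z)]          # x_t - x_{t-1}
    sigma2_u = sum(v * v for v in u_hat) / m
    lam_hat = 0.5 * (bartlett_lrv(u_hat, M) - sigma2_u)
    z_alpha = m * (rho_hat - 1.0) - lam_hat / (szz / m ** 2)
    return z_alpha, rho_hat, lam_hat


rng = random.Random(9)
x = [0.0]
for _ in range(200):
    x.append(x[-1] + rng.gauss(0.0, 1.0))
print(phillips_z_alpha(x, M=int(len(x) ** 0.25)))
```

The Newey-West variant uses weights $1 - |j|/(M+1)$ instead; the two differ only by a shift of the bandwidth by one.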
We can use standard tables even though we have not fully specified the dynamics of the model. In particular, the null is rejected at a given significance level if
$$n(\hat\rho - 1) - \frac{\hat\lambda}{n^{-2}\sum(x_{t-1} - \bar x_{-1})^2} < c_1 \qquad\text{or}\qquad n(\hat\rho - 1) - \frac{\hat\lambda}{n^{-2}\sum(x_{t-1} - \bar x_{-1})^2} > c_2,$$
where $c_1$ and $c_2$ are the lower- and upper-tail critical values at level $\alpha$ for a one-sided test, taken from Table B5, case 2. There is a similar test based on the t-statistic.

Note that Monte Carlo studies indicate that the $Z_\alpha$ test has size distortions for certain models, such as $\Delta x_t = \varepsilon_t - 0.8\varepsilon_{t-1}$. Here $x_t$ is nonstationary, but the $Z_\alpha$ test rejects $H_0$ too often.
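The size distortion can be reproduced in a small experiment. This sketch assumes:

- $n = 100$, standard normal $\varepsilon_t$, $x_0 = 0$, and $M = \lfloor n^{1/4}\rfloor$.
- $-13.7$ as the 5% critical value for $n(\hat\rho - 1)$ with a fitted constant. This is the commonly tabulated Dickey-Fuller value for $n = 100$, case 2; treat it as an assumption and check it against your table.

The exact rejection frequencies depend on $n$, $M$ and the seed. Function names are hypothetical.

```python
import random


def z_alpha(x, M):
    """Phillips Z_alpha with a fitted constant and Bartlett weights 1 - |j|/M."""
    y, z = x[1:], x[:-1]
    m = len(y)
    y_bar, z_bar = sum(y) / m, sum(z) / m
    szz = sum((zt - z_bar) ** 2 for zt in z)
    rho_hat = sum((yt - y_bar) * (zt - z_bar) for yt, zt in zip(y, z)) / szz
    u = [yt - zt for yt, zt in zip(y, z)]                  # x_t - x_{t-1}
    gam = [sum(u[t] * u[t - j] for t in range(j, m)) / m for j in range(M)]
    lrv = gam[0] + 2.0 * sum((1.0 - j / M) * gam[j] for j in range(1, M))
    lam_hat = 0.5 * (lrv - gam[0])                         # gam[0] is sigma_u^2 hat
    return m * (rho_hat - 1.0) - lam_hat / (szz / m ** 2)


def rejection_rate(theta, n=100, reps=500, crit=-13.7, seed=7):
    """Share of samples with Z_alpha < crit when Delta x_t = eps_t + theta * eps_{t-1},
    eps iid N(0, 1), x_0 = 0 (so H0 of a unit root is true)."""
    rng = random.Random(seed)
    M = max(1, int(n ** 0.25))         # M = O(n^{1/4}); M = 3 for n = 100
    hits = 0
    for _ in range(reps):
        prev, x = rng.gauss(0.0, 1.0), [0.0]
        for _ in range(n):
            e = rng.gauss(0.0, 1.0)
            x.append(x[-1] + e + theta * prev)
            prev = e
        hits += z_alpha(x, M) < crit
    return hits / reps


size, distorted = rejection_rate(0.0), rejection_rate(-0.8)
print(size, distorted)   # nominal 5%: the second rate is far above it
```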
As mentioned before, a parametric way to remove the nuisance parameters from the limit distribution is to fully specify the short run dynamics. Assume that $\phi(L)x_t = \varepsilon_t$, with $\varepsilon_t \sim \text{iid}(0, \sigma^2)$ and $\phi(L)$ such that $\phi(1) = 0$.
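Then we can write
$$\begin{aligned}
\phi(L) &= 1 - \phi_1L - \phi_2L^2 - \dots - \phi_pL^p \\
&= 1 - (\phi_1 + \phi_2 + \dots + \phi_p)L + (\phi_2 + \dots + \phi_p)L - \phi_2L^2 - \dots - \phi_pL^p \\
&= 1 - (\phi_1 + \dots + \phi_p)L + (\phi_2 + \dots + \phi_p)L(1 - L) + (\phi_3 + \dots + \phi_p)L^2(1 - L) + \dots + \phi_pL^{p-1}(1 - L),
\end{aligned}$$
so
$$x_t = \rho x_{t-1} + \zeta_1\Delta x_{t-1} + \dots + \zeta_{p-1}\Delta x_{t-p+1} + \varepsilon_t,$$
where $\zeta_i = -(\phi_{i+1} + \dots + \phi_p)$ and $\rho = \phi_1 + \dots + \phi_p$.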
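The mapping from $(\phi_1, \dots, \phi_p)$ to $(\rho, \zeta_1, \dots, \zeta_{p-1})$ is a pure reparametrization. It can be verified by computing $\phi(L)x_t$ both ways on arbitrary data. The sketch below uses hypothetical names and illustrative coefficients $\phi = (0.5, 0.3, 0.2)$, which satisfy $\phi(1) = 0$.

```python
import random


def adf_reparametrize(phi):
    """Map AR coefficients phi_1..phi_p to (rho, [zeta_1, ..., zeta_{p-1}]) with
    rho = phi_1 + ... + phi_p and zeta_i = -(phi_{i+1} + ... + phi_p)."""
    rho = sum(phi)
    zeta = [-sum(phi[i:]) for i in range(1, len(phi))]
    return rho, zeta


def residual_two_ways(x, t, phi):
    """phi(L) x_t from the AR form and from the ADF form (requires t >= p)."""
    rho, zeta = adf_reparametrize(phi)
    ar_form = x[t] - sum(p * x[t - i] for i, p in enumerate(phi, start=1))
    adf_form = x[t] - rho * x[t - 1] - sum(
        z * (x[t - i] - x[t - i - 1]) for i, z in enumerate(zeta, start=1))
    return ar_form, adf_form


phi = [0.5, 0.3, 0.2]
rho, zeta = adf_reparametrize(phi)
rng = random.Random(11)
x = [rng.gauss(0.0, 1.0) for _ in range(50)]   # arbitrary data
pairs = [residual_two_ways(x, t, phi) for t in range(len(phi), len(x))]
print(rho, zeta, max(abs(a - b) for a, b in pairs))
```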
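So under the null hypothesis
$$H_0: \phi(L) \text{ has one unit root}$$
we have $\rho = 1$, and $\Delta x_t = u_t = \varepsilon_t/(1 - \zeta_1L - \dots - \zeta_{p-1}L^{p-1})$ is stationary. Stack the variables $z_t' = (\Delta x_{t-1}, \dots, \Delta x_{t-p+1}, 1, x_{t-1})$ and compute
$$\hat\beta = \Big(\sum z_tz_t'\Big)^{-1}\sum z_tx_t;$$
then
$$D_n(\hat\beta - \beta) = \Big(D_n^{-1}\sum z_tz_t'D_n^{-1}\Big)^{-1}D_n^{-1}\sum z_t\varepsilon_t$$
with
$$D_n = \operatorname{diag}\big(\underbrace{n^{1/2}, \dots, n^{1/2}}_{p}, n\big).$$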
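Note that
$$D_n^{-1}\sum z_tz_t'D_n^{-1} \xrightarrow{d} \begin{pmatrix}\Gamma & 0 & 0\\ 0 & 1 & \int_0^1 B\\ 0 & \int_0^1 B & \int_0^1 B^2\end{pmatrix}, \qquad [\Gamma]_{ij} = \operatorname{cov}(u_{t-i}, u_{t-j}),$$
since $n^{-1}\sum\Delta x_{t-i} \xrightarrow{p} 0$ and $n^{-3/2}\sum x_{t-1}\Delta x_{t-i} \xrightarrow{p} 0$. Thus
$$\Big(D_n^{-1}\sum z_tz_t'D_n^{-1}\Big)^{-1} \xrightarrow{d} \begin{pmatrix}\Gamma^{-1} & 0\\ 0 & \begin{pmatrix}1 & \int_0^1 B\\ \int_0^1 B & \int_0^1 B^2\end{pmatrix}^{-1}\end{pmatrix}.$$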
Also,
$$D_n^{-1}\sum z_t\varepsilon_t = \begin{pmatrix}n^{-1/2}\sum u_{t-1}\varepsilon_t\\ \vdots\\ n^{-1/2}\sum u_{t-p+1}\varepsilon_t\\ n^{-1/2}\sum\varepsilon_t\\ n^{-1}\sum x_{t-1}\varepsilon_t\end{pmatrix}.$$
The first $p - 1$ elements converge to $N(0, \sigma^2\Gamma)$. This is an important result in itself, since it shows that the stationary part of the model can be estimated and tested by standard inferential methods. The reason is that the parameters of the nonstationary part converge at a faster rate. They can thus be treated as constant in the asymptotic theory for the parameters of the stationary part of the model.
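The nonstationary part of the model behaves according to
$$\Big(\frac{1}{\sqrt n}\sum\varepsilon_t,\ \frac1n\sum x_{t-1}\varepsilon_t\Big) \Rightarrow \Big(\sigma W(1),\ \frac{C(1)\sigma^2}{2}\big(W(1)^2 - 1\big)\Big)$$
with $C(1) = (1 - \zeta_1 - \dots - \zeta_{p-1})^{-1}$. This follows from $\frac1n\sum x_{t-1}\varepsilon_t = \frac1n\sum S_{t-1}\varepsilon_t + o_p(1)$. Now by the BN decomposition we have
$$S_{t-1} = C(1)\sum_{j=1}^{t-1}\varepsilon_j + (\tilde\varepsilon_0 - \tilde\varepsilon_{t-1}),$$
such that
$$\begin{aligned}
\frac1n\sum_{t=1}^{n}x_{t-1}\varepsilon_t &= C(1)\frac1n\sum_{t=1}^{n}\sum_{j=1}^{t-1}\varepsilon_j\varepsilon_t + \tilde\varepsilon_0\frac1n\sum_{t=1}^{n}\varepsilon_t - \frac1n\sum_{t=1}^{n}\tilde\varepsilon_{t-1}\varepsilon_t + o_p(1) \\
&\Rightarrow \frac{C(1)\sigma^2}{2}\big(W(1)^2 - 1\big).
\end{aligned}$$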
These results show that
$$n(\hat\rho - 1) \xrightarrow{d} \frac{1}{C(1)}\,\frac{\frac12\big(W(1)^2 - 1\big) - W(1)\int_0^1 W(r)\,dr}{\int_0^1 W(r)^2\,dr - \big(\int_0^1 W(r)\,dr\big)^2}.$$
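This can be seen by considering
$$\begin{pmatrix}1 & \int_0^1 B\\ \int_0^1 B & \int_0^1 B^2\end{pmatrix}^{-1} = \frac{1}{\int B^2 - \big(\int B\big)^2}\begin{pmatrix}\int_0^1 B^2 & -\int_0^1 B\\ -\int_0^1 B & 1\end{pmatrix}$$
and
$$\int B = C(1)\,\sigma\int_0^1 W(r)\,dr, \qquad \int_0^1 B^2 = C(1)^2\sigma^2\int_0^1 W(r)^2\,dr,$$
from which it follows that
$$\begin{aligned}
n(\hat\rho - 1) &\Rightarrow \frac{-\sigma W(1)\int B + \frac{C(1)\sigma^2}{2}\big(W(1)^2 - 1\big)}{\int B^2 - \big(\int B\big)^2} \\
&= \frac{1}{C(1)}\,\frac{\frac12\big(W(1)^2 - 1\big) - W(1)\int W}{\int W^2 - \big(\int W\big)^2}.
\end{aligned}$$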
We see that the limit distribution is free of a bias term but still depends on nuisance parameters. The bias term vanished because we correctly modeled the short run dynamics. Unfortunately, the limit distribution still depends on the unknown long run variance of the process through $C(1)$. This problem can be overcome by considering a t-test rather than a test on $n(\hat\rho - 1)$ directly. We consider a t-test of $H_0: \rho = 1$:
$$t = \frac{\hat\rho - 1}{\widehat{\operatorname{se}}(\hat\rho)} = \frac{n(\hat\rho - 1)}{\hat\sigma\Big[\Big(D_n^{-1}\sum z_tz_t'D_n^{-1}\Big)^{-1}\Big]_{p+1}^{1/2}},$$
where $[\,\cdot\,]_{p+1}$ stands for the $(p+1)$-th diagonal element of $\big(D_n^{-1}\sum z_tz_t'D_n^{-1}\big)^{-1}$. We have shown before that
$$\Big[\Big(D_n^{-1}\sum z_tz_t'D_n^{-1}\Big)^{-1}\Big]_{p+1}^{1/2} \Rightarrow \frac{1}{C(1)\,\sigma\Big[\int W^2 - \big(\int W\big)^2\Big]^{1/2}}.$$
It now follows that
$$\frac{n(\hat\rho - 1)}{\hat\sigma\Big[\Big(D_n^{-1}\sum z_tz_t'D_n^{-1}\Big)^{-1}\Big]_{p+1}^{1/2}} \Rightarrow \frac{\frac12\big(W(1)^2 - 1\big) - W(1)\int_0^1 W(r)\,dr}{\Big[\int_0^1 W(r)^2\,dr - \big(\int_0^1 W(r)\,dr\big)^2\Big]^{1/2}},$$
which is free of nuisance parameters. Critical values can be obtained from Table B6, case 2. It should be pointed out that the limit distribution is independent of the number of estimated parameters for lagged $\Delta x_{t-i}$.
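A simulation sketch of the ADF t-test with a constant, using $z_t' = (\Delta x_{t-1}, \dots, \Delta x_{t-p+1}, 1, x_{t-1})$ as defined in the text. The sketch assumes:

- An AR(1) in differences with $\zeta_1 = 0.5$, so one lagged difference is the correct specification.
- Standard normal $\varepsilon_t$ and $x_0 = 0$.
- 100 usable observations, with $\hat\sigma^2$ computed using $n - k$ degrees of freedom.

The linear solver is a small hand-written Gauss-Jordan routine that keeps the example dependency-free. For $n = 100$ the commonly tabulated 5% value in Table B6, case 2 is about $-2.89$; treat this as an assumption to check against the table. The simulated quantile should be close to it and should not depend on $\zeta_1$.

```python
import random


def solve(A, b):
    """Solve A v = b by Gauss-Jordan elimination with partial pivoting."""
    k = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(k):
            if r != col:
                f = M[r][col] / M[col][col]
                for c in range(col, k + 1):
                    M[r][c] -= f * M[col][c]
    return [M[i][k] / M[i][i] for i in range(k)]


def adf_t(x, lags):
    """OLS t-statistic for H0: rho = 1 in the regression of x_t on
    z_t' = (dx_{t-1}, ..., dx_{t-lags}, 1, x_{t-1})."""
    dx = [x[t] - x[t - 1] for t in range(1, len(x))]    # dx[t-1] = x_t - x_{t-1}
    rows = [[dx[t - 1 - i] for i in range(1, lags + 1)] + [1.0, x[t - 1]]
            for t in range(lags + 1, len(x))]
    ys = [x[t] for t in range(lags + 1, len(x))]
    k, n = lags + 2, len(ys)
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    beta = solve(XtX, Xty)
    resid = [y - sum(bi * v for bi, v in zip(beta, r)) for r, y in zip(rows, ys)]
    s2 = sum(e * e for e in resid) / (n - k)           # n - k degrees of freedom
    v_last = solve(XtX, [0.0] * (k - 1) + [1.0])[-1]   # last diagonal of (X'X)^{-1}
    return (beta[-1] - 1.0) / (s2 * v_last) ** 0.5


def adf_t_quantile(n=100, zeta=0.5, reps=1500, prob=0.05, seed=8):
    """Empirical quantile of the ADF t-statistic (one lagged difference) under
    H0: dx_t = zeta * dx_{t-1} + eps_t, eps iid N(0, 1), x_0 = 0."""
    rng = random.Random(seed)
    stats = []
    for _ in range(reps):
        x, d = [0.0], 0.0
        for _ in range(n + 1):                # n + 1 steps -> n usable observations
            d = zeta * d + rng.gauss(0.0, 1.0)
            x.append(x[-1] + d)
        stats.append(adf_t(x, lags=1))
    stats.sort()
    return stats[int(prob * reps)]


print(solve([[2.0, 1.0], [1.0, 3.0]], [3.0, 5.0]))  # approximately [0.8, 1.4]
q05 = adf_t_quantile()
print(q05)
```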