
Estimation and Hypothesis Testing of Cointegration Vectors in Gaussian Vector Autoregressive Models
Author(s): Søren Johansen
Source: Econometrica, Vol. 59, No. 6 (Nov., 1991), pp. 1551-1580
Published by: The Econometric Society
Stable URL: http://www.jstor.org/stable/2938278
Accessed: 16/08/2010 08:48

Econometrica, Vol. 59, No. 6 (November, 1991), 1551-1580

ESTIMATION AND HYPOTHESIS TESTING OF COINTEGRATION VECTORS IN GAUSSIAN VECTOR AUTOREGRESSIVE MODELS

BY SØREN JOHANSEN

The purpose of this paper is to present the likelihood methods for the analysis of
cointegration in VAR models with Gaussian errors, seasonal dummies, and constant
terms. We discuss likelihood ratio tests of cointegration rank and find the asymptotic
distribution of the test statistics. We characterize the maximum likelihood estimator of
the cointegrating relations and formulate tests of structural hypotheses about these
relations. We show that the asymptotic distribution of the maximum likelihood estimator
is mixed Gaussian. Once a certain eigenvalue problem is solved and the eigenvectors and
eigenvalues calculated, one can conduct inference on the cointegrating rank using some
nonstandard distributions, and test hypotheses about cointegrating relations using the χ²
distribution.

KEYWORDS: Cointegration, error correction models, maximum likelihood estimation, likelihood ratio test, Gaussian VAR models.

1. INTRODUCTION AND SUMMARY
A LARGE NUMBER OF PAPERS are devoted to the analysis of the concept of cointegration, defined first by Granger (1981, 1983), Granger and Weiss (1983), and studied further by Engle and Granger (1987). Under this heading the topic has been studied by Stock (1987), Phillips and Ouliaris (1988), Phillips (1988, 1990), Johansen (1988b), and Johansen and Juselius (1990, 1991). The main statistical technique that has been applied is regression with integrated regressors, which has been studied by Phillips (1988), Phillips and Park (1988), Park and Phillips (1988, 1989), Phillips and Hansen (1990), Park (1988), and Sims, Stock, and Watson (1990). Similar problems have been studied under the name common trends (see Stock and Watson (1988)).
The purpose of this paper is to present some new results on maximum likelihood estimators and likelihood ratio tests for cointegration in Gaussian vector autoregressive models which allow for a constant term and seasonal dummies. This brings in the technique of reduced rank regression (see Anderson (1951), Velu, Reinsel, and Wichern (1986), Ahn and Reinsel (1990), and Reinsel and Ahn (1990)), as well as the notion of canonical analysis (Box and Tiao (1981), Velu, Wichern, and Reinsel (1987), Pena and Box (1987), and the very elegant paper by Tso (1981)). In Johansen (1988b) the likelihood based theory was presented for such a model without constant term and seasonal dummies, but it turns out that the constant plays a crucial role for the interpretation of the model, as well as for the statistical and the probabilistic analysis.
A detailed statistical analysis illustrating the techniques by data on money demand from Denmark and Finland is given in Johansen and Juselius (1990), and the present paper deals mainly with the underlying probability theory that allows one to make asymptotic inference.

The structure of the paper is the following: Section 2 describes the cointegration model and the tests for cointegration rank. The asymptotic distribution of the likelihood ratio test statistic for the hypothesis of r cointegration vectors is given. In Section 3 it is shown that the cointegration model with linear restrictions on the cointegrating relations and the adjustment coefficients allows explicit estimation. The likelihood ratio test statistic of this hypothesis is given. Section 4 gives a simple proof of Granger's representation theorem, which clarifies the role of the constant term and gives a condition for the process to be integrated of order 1. In Section 5 the asymptotic distribution of the maximum likelihood estimator for the cointegrating relations is given together with an estimate of its "variance" to be used in constructing Wald tests. The presence of the trend gives rise to some new limit distributions. Section 6 contains a brief discussion of the relation of the present work to the results of Phillips, Stock, and Watson and others, and the appendices contain technical details as well as results for inference concerning smooth hypotheses on the cointegrating relations.

2. THE STATISTICAL ANALYSIS OF THE VAR MODEL FOR COINTEGRATION AND THE TEST FOR COINTEGRATION RANK

Consider a general VAR model with Gaussian errors written in the error correction form

(2.1)  ΔX_t = Σ_{i=1}^{k-1} Γ_i ΔX_{t-i} + ΠX_{t-k} + ΦD_t + μ + ε_t   (t = 1, ..., T),

where D_t are seasonal dummies orthogonal to the constant term. Further, ε_t (t = 1, ..., T) are independent p-dimensional Gaussian variables with mean zero and variance matrix Λ. The first k data points X_{1-k}, ..., X_0 are considered fixed and the likelihood function is calculated for given values of these. The parameters Γ_1, ..., Γ_{k-1}, Φ, μ, and Λ are assumed to vary without restrictions, and we formulate the hypotheses of interest as restrictions on Π.
In this section we analyze the likelihood function conditional on the initial values. There are two reasons for this. Firstly we shall discuss nonstationary processes, for which only the conditional likelihood can be defined, and secondly the conditional likelihood function gives the usual least squares regression estimators in the unrestricted model, and hence gives tractable estimators. When it comes to discussing the properties of the process X_t, as will be necessary for the asymptotic analysis, it is convenient (see Theorem 4.1) to consider some linear combinations of X_t as well as ΔX_t as stationary processes under the conditions stated there. Thus the likelihood function described in this section is the conditional likelihood function for the observations X_1, ..., X_T from the process, described in detail in Theorem 4.1, conditional on the initial values, that is, conditional on the first k observations.
Model (2.1) is denoted by H1 and we formulate the hypothesis of (at most) r cointegration vectors as

(2.2)  H2: Π = αβ′,

where β, the cointegrating vectors, and α, the adjustment coefficients, are p × r matrices. Sometimes we compare models with different numbers of cointegration vectors, and we then use the notation H2(r).
The purpose of the analysis of this paper is to conduct inference on the number of cointegrating relations as well as the structure of these without imposing a priori structural relations. This is accomplished by fitting the general VAR model (2.1), which is used to describe the variation of the data, and then formulating questions concerning structural economic relations as hypotheses on parameters of the VAR model. These hypotheses are tested using likelihood ratio statistics, and allow the researcher to check interesting economic hypotheses against the data.
It is seen that the parameters α and β are not identified in model H2, since for any choice of an r × r matrix ξ of full rank, the matrices α(ξ′)⁻¹ and βξ imply the same distribution. What can be determined by the model is the space spanned by β, the cointegration space sp(β), and the space spanned by α, the adjustment space sp(α). Note that the space spanned by β is the row space of Π, and the adjustment space is the column space of Π.
It turns out that the role of the constant term is crucial for the statistical analysis as well as for the probabilistic analysis. It is proved in Theorem 4.1 that under certain conditions on the parameters the process given by (2.1) is integrated of order 1. In this model the constant term μ can be decomposed into two parts, α(α′α)⁻¹α′μ, which contributes to the intercept in the cointegrating relation (see (4.9)), and α⊥(α⊥′α⊥)⁻¹α⊥′μ, which determines a linear trend. Here α⊥ is a p × (p − r) matrix of full rank consisting of vectors orthogonal to the vectors in α. The presence of the linear trend changes the analysis and it is therefore convenient to define a model H2*, where the * indicates that apart from the restriction imposed under H2 we also impose the restriction μ = αβ0, where the (r × 1) vector −β0 has the interpretation of an intercept in the cointegration relations. In this case clearly α⊥′μ = 0, and the linear trend is absent.
In order to facilitate the presentation of the main result of this section we first introduce some notation.
Let Z_{0t} = ΔX_t, Z_{1t} = (ΔX_{t-1}′, ..., ΔX_{t-k+1}′, D_t′, 1)′, and Z_{kt} = X_{t-k}, and let Γ consist of the parameters (Γ_1, ..., Γ_{k-1}, Φ, μ). Then the model becomes

(2.3)  Z_{0t} = ΓZ_{1t} + αβ′Z_{kt} + ε_t.

With this notation define the product moment matrices

(2.4)  M_{ij} = T⁻¹ Σ_{t=1}^T Z_{it}Z_{jt}′   (i, j = 0, 1, k),
the residuals

R_{it} = Z_{it} − M_{i1}M_{11}⁻¹Z_{1t}   (i = 0, k),

and the residual sums of squares

(2.5)  S_{ij} = M_{ij} − M_{i1}M_{11}⁻¹M_{1j}   (i, j = 0, k).

The estimate of Γ for fixed values of α, β, and Λ is found to be

(2.6)  Γ̂(α, β) = (M_{01} − αβ′M_{k1})M_{11}⁻¹.

Thus the residuals are found by regressing ΔX_t and X_{t-k} on the lagged differences, the dummies, and the constant. This gives the likelihood function concentrated with respect to the parameters Γ_1, ..., Γ_{k-1}, Φ, and μ:

(2.7)  L_max^{-2/T}(α, β, Λ) = |Λ| exp{T⁻¹ Σ_{t=1}^T (R_{0t} − αβ′R_{kt})′Λ⁻¹(R_{0t} − αβ′R_{kt})}.

This function is easily minimized for fixed β to give

(2.8)  α̂(β) = S_{0k}β(β′S_{kk}β)⁻¹,

(2.9)  Λ̂(β) = S_{00} − S_{0k}β(β′S_{kk}β)⁻¹β′S_{k0},

together with

(2.10)  L_max^{-2/T}(β) = |S_{00}| |β′(S_{kk} − S_{k0}S_{00}⁻¹S_{0k})β| / |β′S_{kk}β|.

This again is minimized by the choice β̂ = (v̂_1, ..., v̂_r), where V̂ = (v̂_1, ..., v̂_p) are the eigenvectors of the equation

(2.11)  |λS_{kk} − S_{k0}S_{00}⁻¹S_{0k}| = 0,

normed by V̂′S_{kk}V̂ = I, and ordered by λ̂_1 > ... > λ̂_p > 0. The maximized likelihood function is found from

(2.12)  L_max^{-2/T}(r) = |S_{00}| ∏_{i=1}^r (1 − λ̂_i).
This procedure is given in Johansen (1988b) for the model without constant term and seasonal dummies, and consists of well known multivariate techniques from the theory of partial canonical correlations and reduced rank regression (see Anderson (1951) and Tso (1981)).
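The computations in (2.4)-(2.12) reduce to ordinary least squares followed by a generalized symmetric eigenvalue problem. The following numpy/scipy sketch (function name and interface are mine, not the paper's) assumes the residuals R_{0t} and R_{kt} of the preliminary regressions have already been obtained; it forms the S matrices and solves (2.11).

```python
import numpy as np
from scipy.linalg import eigh, solve

def solve_eigenproblem(R0, Rk):
    """Solve |lambda S_kk - S_k0 S_00^{-1} S_0k| = 0, eq. (2.11).

    R0, Rk : (T, p) arrays of residuals from regressing dX_t and X_{t-k}
    on the lagged differences, the dummies, and the constant (eq. (2.5)).
    Returns eigenvalues ordered lambda_1 > ... > lambda_p and
    eigenvectors V normed by V' S_kk V = I.
    """
    T = R0.shape[0]
    S00 = R0.T @ R0 / T
    S0k = R0.T @ Rk / T
    Skk = Rk.T @ Rk / T
    A = S0k.T @ solve(S00, S0k)      # S_k0 S_00^{-1} S_0k, symmetric psd
    lam, V = eigh(A, Skk)            # generalized symmetric eigenproblem
    order = np.argsort(lam)[::-1]    # order lambda_1 > ... > lambda_p
    return lam[order], V[:, order]
```

With the eigenvalues in hand, the maximized likelihood (2.12) for any rank r follows as |S_00| ∏_{i≤r}(1 − λ̂_i), so all ranks can be compared from a single decomposition.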
To give an intuition for the above analysis, consider the estimate of Π in the unrestricted VAR model given by Π̂ = S_{0k}S_{kk}⁻¹. Since the hypothesis of cointegration is the hypothesis of reduced rank of Π, it is intuitively reasonable to calculate the eigenvalues of Π̂ and check whether they are close to zero. This is the approach of Fountis and Dickey (1989). Another possibility is to calculate singular values, i.e. eigenvalues of Π̂′Π̂, since they are real and positive. It is interesting to see that the maximum likelihood estimation involves solving (2.11), which amounts to calculating the singular values of S_{00}^{-1/2}S_{0k}S_{kk}^{-1/2} = S_{00}^{-1/2}Π̂S_{kk}^{1/2}, that is, a normalized version of the intuitively reasonable solution. The normalization of the problem given by the likelihood method guarantees a simple asymptotic distribution of the likelihood ratio tests, which only depends on the dimension of the problem considered and not on nuisance parameters. The distributions of the test statistics are nonstandard and have to be tabulated by simulation. They are natural generalizations of the Dickey-Fuller or unit root distributions.
The relation (2.12) gives the maximized likelihood function for all values of r. For r = p, H2(r) = H1, so that the ratio L_max(r)/L_max(p) is the likelihood ratio test statistic of the hypothesis H2(r) in H1, and L_max(r)/L_max(r + 1) is the likelihood ratio statistic for H2(r) in H2(r + 1).
The main result that is given here and proved in Appendix B is that the test statistic has a limit distribution that only depends on the dimension of the problem (p − r), and on whether α⊥′μ = 0 or not. The relation of this result to that of Stock and Watson (1988) is discussed in Section 6.

THEOREM 2.1: The likelihood ratio test statistic for the hypothesis H2: Π = αβ′ versus H1 is given by

(2.13)  −2 ln(Q; H2|H1) = −T Σ_{i=r+1}^p ln(1 − λ̂_i),

whereas the likelihood ratio test statistic of H2(r) versus H2(r + 1) is given by

(2.14)  −2 ln(Q; r|r + 1) = −T ln(1 − λ̂_{r+1}).

The statistic −2 ln(Q; H2|H1) has a limit distribution which, if α⊥′μ ≠ 0, can be expressed in terms of a (p − r)-dimensional Brownian motion B with i.i.d. components as

(2.15)  tr{∫(dB)F′ [∫FF′ du]⁻¹ ∫F(dB)′},

where F′ = (F_1′, F_2), and

(2.16)  F_{1i}(t) = B_i(t) − ∫B_i(u) du   (i = 1, ..., p − r − 1)

and

(2.17)  F_2(t) = t − 1/2.

The test statistic −2 ln(Q; r|r + 1) is asymptotically distributed as the maximum eigenvalue of the matrix in (2.15).
If in fact α⊥′μ = 0, then the asymptotic distributions of −2 ln(Q; H2|H1) and −2 ln(Q; r|r + 1) are given as the trace and the maximal eigenvalue respectively of the matrix in (2.15) with F(t) = B(t) − ∫B(u) du.
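Given the ordered eigenvalues of (2.11), the two statistics of Theorem 2.1 are simple functions; a minimal sketch (helper name is mine). Note that the limit distributions are the nonstandard ones derived from (2.15), tabulated by simulation, not χ².

```python
import numpy as np

def rank_test_statistics(eigenvalues, r, T):
    """Trace statistic (2.13) for H2(r) in H1, and lambda-max statistic
    (2.14) for H2(r) in H2(r+1), from the eigenvalues of (2.11)."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    trace = -T * np.sum(np.log1p(-lam[r:]))   # -T sum_{i>r} ln(1 - lambda_i)
    lam_max = -T * np.log1p(-lam[r])          # -T ln(1 - lambda_{r+1})
    return trace, lam_max
```

The statistics are then compared with the simulated quantiles of the trace and maximal-eigenvalue functionals of (2.15).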

Here, and in the following, the integrals are all on the unit interval, where the Brownian motions are defined. Note that integrals of the form ∫FF′ du are ordinary Riemann integrals of continuous functions and the result is a matrix of stochastic variables. The integral ∫F(dB)′, however, is a matrix of stochastic integrals, defined as L₂ limits of the corresponding Riemann sums.
This section is concluded by pointing out how one can analyze the model H2*. First note that if μ = αβ0 then

αβ′X_{t-k} + μ = αβ′X_{t-k} + αβ0 = αβ*′X*_{t-k},

for β* = (β′, β0)′ and X*_{t-k} = (X_{t-k}′, 1)′. In these new variables the model is written

ΔX_t = Σ_{i=1}^{k-1} Γ_i ΔX_{t-i} + αβ*′X*_{t-k} + ΦD_t + ε_t   (t = 1, ..., T).

The analysis is now performed as above by defining Z*_{0t} = ΔX_t, Z*_{1t} = (ΔX_{t-1}′, ..., ΔX_{t-k+1}′, D_t′)′, and Z*_{kt} = X*_{t-k}, as well as the corresponding moment matrices M*_{ij} and S*_{ij} (see (2.4) and (2.5)).

THEOREM 2.2: Under hypothesis H2*: Π = αβ′ and μ = αβ0, the likelihood ratio statistics −2 ln(Q; H2*|H1) and −2 ln(Q; H2*(r)|H2*(r + 1)) are distributed as the trace and maximal eigenvalue respectively of the matrix in (2.15), with F = (B′, 1)′.

Finally we test the hypothesis H2* in H2 by a likelihood ratio test, i.e., test that the trend is absent under the assumption that there are r cointegrating relations.

THEOREM 2.3: The likelihood ratio test statistic −2 ln(Q; H2*|H2) for the hypothesis H2* given the hypothesis H2, i.e. α⊥′μ = 0, when there are r cointegration vectors, is asymptotically distributed as χ² with p − r degrees of freedom.

The distributions derived in this section have been tabulated by simulation in Johansen and Juselius (1990) (p − r = 1, ..., 5) in connection with an application of the methods to money demand in Denmark and Finland. The tables have been extended (p − r = 1, ..., 10) by Osterwald-Lenum (1992); see also Reinsel and Ahn (1990).

3. THE TEST OF HYPOTHESES ON THE COINTEGRATING RELATIONS AND THE ADJUSTMENT COEFFICIENTS

The purpose of fitting the VAR model and determining the cointegrating rank is that one gets the opportunity to formulate and test interesting hypotheses about the cointegrating relations and their adjustment coefficients.
Since the parameters α and β are not identified we can only test restrictions that cannot be satisfied by normalization.
We consider here in detail a simple but important model for linear restrictions of the cointegrating space and the adjustment space that allows explicit maximum likelihood estimation:

(3.1)  H3: β = Hφ and α = Aψ.

Note that H3 is a submodel of H2. The likelihood ratio test of the restrictions H3 in the model H2 will be discussed below. There are of course many other possible hypotheses on the cointegrating relations but the ones chosen here are simple to analyze, and have a wide variety of applications; see Johansen and Juselius (1990), Hoffman and Rasche (1989), and Kunst and Neusser (1990). Another class of hypotheses of the form β = (Hφ, ψ) can be solved with similar methods (see Johansen and Juselius (1991), and Mosconi and Giannini (1992)). For more general hypotheses on β of the form h(β) = 0 one can of course not prove the existence and uniqueness of the maximum likelihood estimator, but such hypotheses can be tested by likelihood ratio or Wald tests using the asymptotic distribution of β̂ derived in Appendix C.
Under hypothesis H3 we transform the matrices S_{ij} some more. Together with A (p × m) we consider a p × (p − m) matrix B = A⊥ of full rank, such that B′A = 0, and introduce the notation

S_{hh.b} = H′S_{kk}H − H′S_{k0}B(B′S_{00}B)⁻¹B′S_{0k}H,

S_{aa.b} = A′S_{00}A − A′S_{00}B(B′S_{00}B)⁻¹B′S_{00}A,

and similarly for S_{ha.b}, S_{ah.b}, S_{ab}, S_{bb}, etc.

THEOREM 3.1: Under hypothesis H3: β = Hφ and α = Aψ, where H is p × s and A is p × m, the maximum likelihood estimators are found as follows: First solve

(3.2)  |λS_{hh.b} − S_{ha.b}S_{aa.b}⁻¹S_{ah.b}| = 0,

for eigenvalues λ̂_1 > ... > λ̂_s > 0 and eigenvectors v̂_1, ..., v̂_s. Then

(3.3)  β̂ = H(v̂_1, ..., v̂_r),

(3.4)  α̂ = A(A′A)⁻¹S_{ak.b}β̂.

The estimate of Λ is found from

(3.5)  Λ̂_{bb} = S_{bb},

(3.6)  Λ̂_{ab} = S_{ab} − A′α̂β̂′S_{kb},

(3.7)  Λ̂_{aa.b} = S_{aa.b} − S_{ah.b}φ̂φ̂′S_{ha.b}.

The estimate for Γ is found from (2.6) and the maximized likelihood function is

(3.8)  L_max^{-2/T} = |S_{00}| ∏_{i=1}^r (1 − λ̂_i).

THEOREM 3.2: The likelihood ratio test statistic of the restriction β = Hφ and α = Aψ versus H2 is given by

(3.9)  −2 ln(Q; H3|H2) = T Σ_{i=1}^r ln{(1 − λ̂_i(H3))/(1 − λ̂_i(H2))},

which is asymptotically χ² distributed with (p − m)r + (p − s)r degrees of freedom.

The proofs of these results are given in Appendices B and C.
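The statistic (3.9) compares the r largest eigenvalues of the restricted problem (3.2) with those of the unrestricted problem (2.11); a small sketch with a hypothetical interface:

```python
import numpy as np
from scipy.stats import chi2

def restriction_test(lam_h3, lam_h2, r, T, p, s, m):
    """LR test (3.9) of H3: beta = H phi (H is p x s) and alpha = A psi
    (A is p x m) against H2(r).

    lam_h3, lam_h2 : descending eigenvalues from (3.2) and (2.11).
    Asymptotically chi2 with (p - m) r + (p - s) r degrees of freedom.
    """
    lam_h3 = np.asarray(lam_h3, dtype=float)[:r]
    lam_h2 = np.asarray(lam_h2, dtype=float)[:r]
    stat = T * np.sum(np.log((1 - lam_h3) / (1 - lam_h2)))
    df = (p - m) * r + (p - s) * r
    return stat, df, chi2.sf(stat, df)   # statistic, dof, p-value
```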

4. GRANGER'S REPRESENTATION THEOREM

When we want to investigate the distributional properties of the estimators and the test statistics we have to make more assumptions about the process in order to rule out various types of nonstationarity. The basic assumption is that for the characteristic polynomial derived from model (2.1),

(4.1)  Π(z) = (1 − z)I − Σ_{i=1}^{k-1} Γ_i(1 − z)zⁱ − Πz^k,

it holds that |Π(z)| = 0 implies that either |z| > 1 or z = 1. This guarantees that the nonstationarity of X_t can be removed by differencing.
If Π has full rank it is well known that under the above condition the equations (2.1) determine X_t as a stationary process provided the initial values are given their invariant distribution. If we start with a doubly infinite sequence {ε_t}, we can represent the initial values and hence the whole process as X_t = Π⁻¹(L)(ε_t + μ + ΦD_t), that is, as an infinite linear combination of the ε's (see Anderson (1971, p. 170)).
If Π has reduced rank we want to prove that some linear combinations of X_t have stationary distributions for a suitable choice of initial distribution, whereas others are nonstationary.
The model defined by (2.1) is rewritten as

(4.2)  Π(L)X_t = −ΠX_t + Ψ(L)ΔX_t = ε_t + μ + ΦD_t   (t = 1, ..., T),

where we have introduced Ψ(L) = (Π(L) − Π(1))/(1 − L). Note that −Π = Π(1) is the value of Π(z) for z = 1, and that −Ψ(1) is the derivative of Π(z) at z = 1; below we write Ψ = Ψ(1).
The result that we want to prove is the fundamental result about error correction models of order 1 and their structure. The basic result is due to Granger (1983) (see also Engle and Granger (1987) or Johansen (1988a)). We give a very simple proof here. In addition we provide an explicit condition for the process to be integrated of order 1 (see (4.4) below) and we clarify the role of the constant term.

THEOREM 4.1 (Granger's Representation Theorem): Let the process X_t satisfy equation (4.2) for t = 1, 2, ..., and let

(4.3)  Π = αβ′

for α and β of dimension p × r and rank r, and let

(4.4)  α⊥′Ψβ⊥

have full rank p − r. We define

(4.5)  C = β⊥(α⊥′Ψβ⊥)⁻¹α⊥′.

Then ΔX_t and β′X_t can be given initial distributions, such that

(4.6)  ΔX_t is stationary,

(4.7)  β′X_t is stationary,

(4.8)  X_t is nonstationary, with linear trend τt = Cμt.

Further

(4.9)  Eβ′X_t = −(α′α)⁻¹α′μ + (α′α)⁻¹α′Ψβ⊥(α⊥′Ψβ⊥)⁻¹α⊥′μ,

(4.10)  EΔX_t = τ,

apart from terms involving the seasonal dummies. If α⊥′μ = 0, then τ = 0 and the linear trend disappears. If the initial distributions are expressed in terms of the doubly infinite sequence {ε_t}, then ΔX_t has a representation

ΔX_t = C(L)(ε_t + μ + ΦD_t)

with C(1) = C. For C_1(L) = (C(L) − C(1))/(1 − L), so that C(L) = C(1) + (1 − L)C_1(L), the process X_t has the representation

(4.11)  X_t = X_0 + C Σ_{i=1}^t ε_i + τt + C(L)Φ Σ_{i=1}^t D_i + S_t − S_0,

where S_t = C_1(L)ε_t, and β′X_0 = β′S_0.

PROOF: If we multiply equation (4.2) by α′ and α⊥′, respectively, we get the equations

−α′αβ′X_t + α′Ψ(L)ΔX_t = α′(ε_t + μ + ΦD_t),

α⊥′Ψ(L)ΔX_t = α⊥′(ε_t + μ + ΦD_t).

To discuss the properties of the process X_t we solve the equations for X_t and express it in terms of the ε's. The problem is of course that since Π is singular the system is not invertible, and we therefore introduce the new variables Z_t = (β′β)⁻¹β′X_t and Y_t = (β⊥′β⊥)⁻¹β⊥′ΔX_t, where β⊥ is a p × (p − r) matrix of full rank such that β⊥′β = 0. It is also convenient with the notation ā = a(a′a)⁻¹ for any matrix a of full rank. With this notation note that ā′a = I and aā′ = āa′ = a(a′a)⁻¹a′, which is just the projection onto the space spanned by the columns of a. The process ΔX_t can be recovered from Z_t and Y_t:

ΔX_t = (β̄β′ + β̄⊥β⊥′)ΔX_t = βΔZ_t + β⊥Y_t.
This gives the equations for Z_t and Y_t:

(4.12)  −α′αβ′βZ_t + α′Ψ(L)βΔZ_t + α′Ψ(L)β⊥Y_t = α′(ε_t + μ + ΦD_t),

(4.13)  α⊥′Ψ(L)βΔZ_t + α⊥′Ψ(L)β⊥Y_t = α⊥′(ε_t + μ + ΦD_t).

The idea of the proof is now to show that the equations for the processes Z_t and Y_t constitute an invertible autoregressive model.
We write the equations for Z_t and Y_t as

A(L)(Z_t′, Y_t′)′ = (α, α⊥)′(ε_t + μ + ΦD_t)

with

A(z) = [ −α′αβ′β + α′Ψ(z)β(1 − z)   α′Ψ(z)β⊥ ; α⊥′Ψ(z)β(1 − z)   α⊥′Ψ(z)β⊥ ].

For z = 1 this has determinant

|A(1)| = (−1)^r |α′α| |β′β| |α⊥′Ψβ⊥|,

which is nonzero by assumptions (4.3) and (4.4). Hence z = 1 is not a root. For z ≠ 1 we use the representation

A(z) = (α, α⊥)′Π(z)(β, β⊥(1 − z)⁻¹),

which gives the determinant as

|A(z)| = |(α, α⊥)′| |Π(z)| |(β, β⊥)| (1 − z)^{-(p-r)}.

This shows that all roots of |A(z)| = 0 are outside the unit disk, by the assumption about Π(z); see (4.1).
It follows that the system defined by (4.12) and (4.13) is invertible, and hence that Z_t and Y_t can be given such initial distributions that they become stationary. Hence also ΔX_t = β⊥Y_t + βΔZ_t is stationary apart from the contribution from the centered dummies. This proves (4.6) and (4.7).
If these initial distributions are expressed in terms of a doubly infinite sequence {ε_t}, then the process (Z_t′, Y_t′)′ has the representation

(Z_t′, Y_t′)′ = A(L)⁻¹(α, α⊥)′(ε_t + μ + ΦD_t).

The expectation of Z_t and Y_t can be found from A(1)⁻¹(α, α⊥)′μ. From the representation of the processes Z_t and Y_t we get a representation of ΔX_t by multiplying by the matrix (β(1 − L), β⊥). Hence

C(L) = (β(1 − L), β⊥)A(L)⁻¹(α, α⊥)′.

For L = 1 we get (4.5). By summation of ΔX_t we find that X_t has the representation (4.11) and hence contains the nonstationary component C Σ_{i=1}^t ε_i together with a linear trend τt = Cμt, which proves (4.8) and completes the proof of Theorem 4.1.
Note that μ enters the linear trend only through α⊥′μ, and that the linear trend τ is contained in the span of β⊥, and hence cancels if we consider the components β′X_t. The seasonal dummies are so constructed that they remain bounded even after summation over t and hence do not contribute to the linear trend.
Strictly speaking the processes ΔX_t and β′X_{t-k} equal a stationary process plus the term involving the seasonal dummies, but we shall call such a process stationary. One can also make the seasonal dummies stationary by initial random assignment of a season.
Relations (4.3) and (4.5) display an interesting symmetry between the singularity of the "impact" matrix Π for the autoregressive representation and the singularity of the "impact" matrix C for the moving average representation. The null space for C is the range space for Π and the null space for C′ is the range space for Π′. It is this symmetry that allows the results for I(1) processes to be relatively simple.
Note also that the condition (4.4) is what is needed for the process to be integrated of order 1. If this matrix has reduced rank, the process X_t will be integrated of higher order than 1. Thus a theory of I(2) processes in the context of a VAR model can be based on the reduced rank of the first two matrices in the expansion of Π(z) at the point z = 1. The mathematical and statistical theory for such processes has been worked out in Johansen (1988a, 1990, 1991).
It is of course easy at this point to give a representation of the process for the model with a linear term added to (2.1). Such a term gives rise to a quadratic trend in general. The asymptotic analysis of such a model, however, becomes somewhat more complicated because there are more directions that will require special normalization.
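Condition (4.4) and the matrix C of (4.5) are easy to check numerically; the sketch below (function name is mine) computes the orthogonal complements α⊥ and β⊥ as null spaces and raises an error when (4.4) fails, i.e. when the process is integrated of order higher than 1.

```python
import numpy as np
from scipy.linalg import null_space

def impact_matrix(alpha, beta, psi):
    """C = beta_perp (alpha_perp' Psi beta_perp)^{-1} alpha_perp', eq. (4.5).

    alpha, beta : (p, r) full-rank matrices with Pi = alpha beta'.
    psi : (p, p) matrix Psi = Psi(1).
    """
    a_perp = null_space(alpha.T)        # columns orthogonal to alpha
    b_perp = null_space(beta.T)         # columns orthogonal to beta
    core = a_perp.T @ psi @ b_perp      # alpha_perp' Psi beta_perp, (4.4)
    if np.linalg.matrix_rank(core) < core.shape[0]:
        raise ValueError("alpha_perp' Psi beta_perp singular: not I(1)")
    return b_perp @ np.linalg.solve(core, a_perp.T)
```

By construction Cα = 0 and β′C = 0, which is exactly the symmetry between the two impact matrices noted above.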

5. ASYMPTOTIC PROPERTIES OF THE ESTIMATORS UNDER THE ASSUMPTION OF COINTEGRATION AND LINEAR RESTRICTIONS ON α AND β

The asymptotic properties of the estimators are here given under the hypothesis H3 where restrictions are imposed on both α and β. The maximum likelihood estimators under H3 are denoted by ^. The results are corollaries of Theorem C.1 which gives the asymptotic distribution of the estimator under a smooth hypothesis on the parameters. An important result is that inference concerning β can be conducted as if the other parameters are fixed, and vice versa. Thus inference concerning α and the short term dynamics represented by Γ_1, ..., Γ_{k-1} follows the usual results for stationary processes. We therefore concentrate on the results for β.
The first result concerns β̂ normalized by c, i.e. β̂_c = β̂(c′β̂)⁻¹, such that c′β̂ has full rank. The results will be expressed in the natural coordinate system in sp(H). We choose, apart from β ∈ sp(H), the projection of τ onto sp(H), τ_H = P_H τ, which is also orthogonal to β, since β′τ_H = β′P_H τ = β′τ = 0. Next supplement with s − r − 1 vectors γ_H = (γ_{H1}, ..., γ_{H,s-r-1}) ∈ sp(H), such that (β, γ_H, τ_H) consists of s mutually orthogonal vectors spanning sp(H). Similarly let γ (p × (p − r − 1)) be chosen orthogonal to (β, τ) such that (β, τ, γ) span R^p.

THEOREM 5.1: Suppose hypothesis H3: α = Aψ and β = Hφ is satisfied. If τ_H ≠ 0 the limit distribution of T(β̂_c − β_c) is given by

(5.1)  (I − β_c c′)γ_H(γ_H′γ ∫G_{1.2}G_{1.2}′ du γ′γ_H)⁻¹γ_H′γ ∫G_{1.2}(dV_a)′(c′β)⁻¹,

where

(5.2)  V_a = (α′Λ⁻¹α)⁻¹α′Λ⁻¹W

is independent of G′ = (G_1′, G_2), defined as

(5.3)  G_1(t) = γ′C(W(t) − ∫W(u) du)   (t ∈ [0, 1]),

(5.4)  G_2(t) = t − 1/2   (t ∈ [0, 1]),

G_{1.2} = G_1 − (∫G_1G_2 du)(∫G_2G_2 du)⁻¹G_2,

that is, the Brownian motion corrected for level and trend. The asymptotic conditional variance is

(5.5)  (I − β_c c′)γ_H(γ_H′γ ∫G_{1.2}G_{1.2}′ du γ′γ_H)⁻¹γ_H′(I − cβ_c′) ⊗ (c′Π′Λ⁻¹Πc)⁻¹,

which is consistently estimated by

(5.6)  T(I − β̂_c c′)H v̂v̂′ H′(I − cβ̂_c′) ⊗ (c′Π̂′Λ̂⁻¹Π̂c)⁻¹,

where v̂ = (v̂_{r+1}, ..., v̂_s); see (3.2). One can replace the matrix v̂v̂′ by S_{hh.b}⁻¹ in (5.6), and apply the identity

c′Π̂′Λ̂⁻¹Π̂c = c′β̂ diag(λ̂_1(1 − λ̂_1)⁻¹, ..., λ̂_r(1 − λ̂_r)⁻¹) β̂′c.

If τ_H = 0 then (5.1) and (5.5) hold with γ_H (p × (s − r)) orthogonal to β such that (β, γ_H) span sp(H), and G_{1.2} replaced by G_1.

The proof is given in Appendix C. The stochastic integrals are all taken on the unit interval. Since the constant term is included in the regressors the process X_t is corrected for its mean in the preliminary regressions. This is seen to be reflected in the asymptotics by the subtraction of ∫W(u) du and ū = ∫u du. Since the process contains a linear trend only p − r − 1 components (γ′C) of the process W enter the result. The trend is described by defining the last component of G by t.
Note that the limiting distribution for fixed G is Gaussian with mean zero and variance

(5.7)  (I − β_c c′)γ_H(γ_H′γ ∫G_{1.2}G_{1.2}′ du γ′γ_H)⁻¹γ_H′(I − cβ_c′) ⊗ (c′Π′Λ⁻¹Πc)⁻¹,

which we call the limiting conditional variance. Thus the limiting distribution of T(β̂_c − β_c) is a mixture of Gaussian distributions. See Jeganathan (1988) for a general theory of locally asymptotically mixed normal models.
The result shows that if the β's are normalized by c one can find the limiting distribution of any of the coefficients and hence of any smooth function of the coefficients. Note, however, that if we were interested in the linear combination τ_H′(β̂_c − β_c) then the limiting distribution degenerates to zero, since τ_H′(I − β(c′β)⁻¹c′)γ_H = 0. A different normalization by T^{3/2} is needed in this case. The result can be determined from the proof of Theorem 5.1 in Appendix C, but will not be explicitly formulated here.
Without proof we give the corresponding result for model H3*, i.e. when μ = αβ0, β = Hφ, and α = Aψ. Introduce γ_H such that β and γ_H span sp(H), and define γ*_H = (γ_H′, 0)′ and f = (0, 1)′. The normalization by the p × r matrix c is now done as follows: β*_c = β*(c′β)⁻¹ = (β_c′, β_{0c}′)′.

THEOREM 5.2: Suppose hypothesis H3*: α = Aψ, β = Hφ, and μ = αβ0 is satisfied; then the limit distribution of T(β̂_c − β_c) is given by (5.1) with G_{1.2} replaced by G_1.
A consistent estimate of the asymptotic conditional variance is given by

(5.8)  T(I − β̂_c c′)(H, 0)v̂*v̂*′(H, 0)′(I − cβ̂_c′) ⊗ (c′Π̂′Λ̂⁻¹Π̂c)⁻¹.

The estimator of the constant term behaves differently: Let G*_1 = γ′CW, G*_2(t) = 1, and

G*_{2.1} = G*_2 − (∫G*_2G*_1′ du)(∫G*_1G*_1′ du)⁻¹G*_1;

then

(5.9)  T^{1/2}(β̂_{0c} − β_{0c}) is asymptotically distributed as (∫G*_{2.1}G*_{2.1}′ du)⁻¹ ∫G*_{2.1}(dV_a)′(c′β)⁻¹,

and a consistent estimator for the asymptotic conditional variance is

(5.10)  T(0, 1)v̂*v̂*′(0, 1)′ ⊗ (c′Π̂′Λ̂⁻¹Π̂c)⁻¹.

Here (λ̂*, v̂*) are the eigenvalues and eigenvectors from (3.2) with S_{ij.b} replaced by S*_{ij.b}.
Note that T(β̂_c − β_c) has the same limit distribution as that given by (5.1) for τ_H = 0.
As an example of an application of the results given above consider the following simple situation where r = 1, and where we want to test a linear constraint K′β = 0 on the cointegration relation β′ = (β_1, ..., β_p). We formulate the result as a corollary.

COROLLARY 5.3: If only one cointegration vector β is present (r = 1), and if we want to test the hypothesis K′β = 0, then the test statistic

(5.11)  T(K′β̂)²((λ̂_1⁻¹ − 1)K′v̂v̂′K)⁻¹

is asymptotically χ² with 1 degree of freedom. Here λ̂_1 is the maximal eigenvalue and β̂ the corresponding eigenvector of the equation

|λS_{kk} − S_{k0}S_{00}⁻¹S_{0k}| = 0.

The remaining eigenvectors form v̂. A similar result holds for the model with no trend.

Note that the test statistic (5.11) is very easy to calculate once the basic eigenvalue problem (2.11) has been solved. After having picked out the eigenvector that estimates the cointegrating relation one can apply the remaining eigenvectors to estimate the "variance" of the coefficients of the cointegrating relations.
Thus if there is only one cointegration vector β one can think of the matrix (λ̂_1⁻¹ − 1)v̂v̂′/T as giving an estimate of the asymptotic "variance" of β̂.
If we want to derive a confidence interval for the parameter ρ = β_2/β_1 we define K′ = (ρ_0, −1, 0, ..., 0), such that K′β = ρ_0β_1 − β_2, which is zero if ρ = ρ_0. Theorem 5.1 yields the result that

ω(ρ_0) = T(ρ_0β̂_1 − β̂_2)²((λ̂_1⁻¹ − 1) Σ_{i=2}^p (ρ_0v̂_{1i} − v̂_{2i})²)⁻¹

is asymptotically χ²(1) if ρ = ρ_0, where v̂_{1i} and v̂_{2i} are the first two coordinates of the eigenvector v̂_i. Then the set

{ρ : ω(ρ) ≤ χ²_{1-ε}}

will be an asymptotic 1 − ε confidence set for the parameter ρ, where χ²_{1-ε} is the 1 − ε quantile in the distribution of χ²(1). A simpler interval can be obtained by inserting the estimate ρ̂ = β̂_2/β̂_1 for ρ_0 in the denominator, which gives the interval

(5.12)  ρ̂ ± (χ²_{1-ε}(λ̂_1⁻¹ − 1) Σ_{i=2}^p (ρ̂v̂_{1i} − v̂_{2i})² / (Tβ̂_1²))^{1/2}.
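For r = 1 the Wald statistic (5.11) needs only the eigenvalues and eigenvectors of (2.11); a minimal sketch (function and argument names are mine):

```python
import numpy as np

def wald_statistic(K, beta1, lam1, V_rest, T):
    """Wald statistic (5.11) for K' beta = 0 when r = 1.

    beta1  : first eigenvector of (2.11), the estimated relation.
    lam1   : largest eigenvalue of (2.11).
    V_rest : (p, p-1) matrix of the remaining eigenvectors.
    Asymptotically chi-squared with 1 degree of freedom under K' beta = 0.
    """
    num = T * (K @ beta1) ** 2
    den = (1.0 / lam1 - 1.0) * (K @ V_rest @ V_rest.T @ K)
    return num / den
```

The confidence set (5.12) follows by inverting the same quadratic in ρ.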


6. DISCUSSION

This paper addresses three issues: first the problem of finding the number of cointegrating relations in nonstationary data, next the problem of estimating the cointegrating relations, and finally that of testing interesting economic hypotheses about their structure. The approach is model based in the sense that we assume that a VAR model describes the data adequately, but no economic structure is imposed on the model in the initial analysis. The VAR model is analyzed using likelihood methods in order to answer the above problems.
The method has the advantage that, once the eigenvalue problem (2.11) is solved, the inference can be based entirely on the eigenvalues and eigenvectors found. The successive tests for the rank are all based on the eigenvalues from (2.11). For any value $r$ of the cointegrating rank, the estimate of the cointegrating relations is the subset of the eigenvectors corresponding to the $r$ largest eigenvalues. Finally, the remaining eigenvectors corresponding to the $p - r$ smallest eigenvalues contain information about the "variance" of the estimators. In view of this summary let us now consider some of the methods that have been proposed before.
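The successive rank tests mentioned above can be sketched as follows (Python; the function name is an illustrative assumption, and the critical values passed in are placeholders for the actual tabulated quantiles of the trace statistic, which depend on $p - r$ and the deterministic terms):

```python
import numpy as np

def select_rank(lam, T, critical_values):
    """Sequential trace testing: for r = 0, 1, ..., p-1 compute
    -T * sum_{i=r+1}^{p} ln(1 - lam_i) and accept the first r for which
    the statistic falls below its critical value."""
    p = len(lam)
    for r in range(p):
        stat = -T * np.sum(np.log(1.0 - np.asarray(lam[r:])))
        if stat < critical_values[r]:
            return r, stat
    return p, 0.0
```

With two large and two tiny eigenvalues the procedure stops at $r = 2$, because only the last two eigenvalues are compatible with zero.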
We first consider the estimation of $\beta$, assuming that the cointegrating rank is known. The original method of Engle and Granger for estimating the long-run parameters consisted of regressing some of the variables on the others. This gives consistent estimators, as shown by Stock (1987), but the asymptotic distribution theory is complicated, which makes inference on structural hypotheses difficult (see Phillips (1990) for a discussion of these problems). Very briefly, one can say that the simple regression estimator has a limiting distribution that is composed of a mixed Gaussian distribution, a unit root distribution, and a constant. One can get rid of the constant by including the lags, and one can get rid of the unit root distribution component by analyzing the full system rather than single equations.
A number of other methods have been proposed for estimating cointegration relations: Stock and Watson (1988) have suggested the smallest principal components of $\sum_t(X_t - \bar X)(X_t - \bar X)'$ as estimators of the cointegrating relations, and the orthogonal complement as estimator of the common trends. Bossaerts (1988) has suggested canonical variates between $X_t$ and $X_{t-1}$, and there are a number of single equation methods. Common to all these methods is that they give asymptotic inference for the cointegrating relations that has the same problem with the asymptotic inference as indicated above for the regression estimator. A simulation of a number of these methods has been performed by Gonzalo (1989). He finds, not surprisingly, that if the data are generated by an error correction model, like the one we have analyzed here, the likelihood methods, which derive estimators by analyzing the model, have a better performance.
Other methods give mixed Gaussian limit distributions. Phillips (1988) has suggested a nonparametric spectral regression method which permits the estimation of long-run equilibrium relationships in the frequency domain. Phillips (1990) contains a discussion of maximum likelihood estimation in a number of models, including the VAR. Park (1988) and Phillips and Hansen (1990) suggest a regression estimator where the regressors are corrected using a spectral estimate of the long-run variance matrix. Finally, Engle and Yoo (1989) have suggested a three stage estimator for the error correction model, which starts with the original regression estimator, calculates the remaining parameters by OLS, and then performs one step in a Newton-Raphson algorithm in order to approach the maximum of the likelihood function. This estimator is asymptotically equivalent to the maximum likelihood estimator in the VAR context.
Next consider the problem of determining the cointegration rank $r$. A systematic approach to finding the cointegration rank is proposed by Stock and Watson (1988). They determine the rank of the cointegrating space by first estimating $\beta$ by a given number of principal components, then filtering the common trends $\hat\beta_\perp'X_t$ by fitting an autoregressive process, and finally regressing the residuals on the summed residuals. The estimated coefficient matrix is then investigated for unit roots. The procedure is repeated until the correct number of cointegrating relations is found.
In order to facilitate the comparison between their method and the method derived from the likelihood function in the VAR model, we present in an informal manner their calculations in our notation. The principal components are almost the same as the smallest eigenvectors in $S_{kk}$, and the subsequent fitting of an autoregressive model to $\hat\beta_\perp'X_t$ is similar to the autoregressions performed in this paper. Thus the final matrix that they analyze is analogous to $S_{0k}\hat\beta_\perp(\hat\beta_\perp'S_{kk}\hat\beta_\perp)^{-1}$. An investigation of eigenvalues in this matrix is analogous to an investigation of eigenvalues of $\hat\Pi = S_{0k}S_{kk}^{-1}$. The limiting distribution of the matrix is of the form encountered here, that is $(\int BB'\,du)^{-1}\int B(dB)'$ for a $(p - r)$-dimensional Brownian motion $B$. Thus many of the calculations are similar to those based on the likelihood methods for the error correction model, but the setup is different, in that the present approach is model dependent through the use of the VAR model and the likelihood analysis.
Throughout this paper we have assumed the Gaussian distribution in order to be able to analyze the likelihood function, with the purpose of developing new methods that are presumably optimal for this distribution. These methods clearly depend on the VAR model assumptions, and major departures from these assumptions would require new models. It would, however, be interesting to see how robust the methods derived are to minor departures from the assumptions.
The assumption of a Gaussian distribution is not so serious, as long as the partial sum process $\sum_{i=1}^{t}\varepsilon_i$ can be approximated by a Brownian motion, since it is not difficult to see that the asymptotic analysis then gives the same results.
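This robustness remark can be illustrated by simulation: even for non-Gaussian errors, the rescaled partial sums $T^{-1/2}\sum_{i \le [Tu]}\varepsilon_i$ behave like a Brownian motion, so that, for instance, the endpoint is approximately standard normal. A minimal sketch (Python; the choice of a uniform error distribution is arbitrary):

```python
import numpy as np

def scaled_partial_sums(eps):
    """T^{-1/2} times the partial sums of the errors: the random-walk
    process whose weak limit is the Brownian motion W."""
    return np.cumsum(eps) / np.sqrt(len(eps))

rng = np.random.default_rng(0)
T, reps = 400, 2000
# Standardized uniform errors (mean 0, variance 1): by the functional CLT
# the endpoint T^{-1/2} sum_{i<=T} eps_i is approximately N(0, 1).
draws = np.array([scaled_partial_sums(rng.uniform(-1, 1, T) * np.sqrt(3))[-1]
                  for _ in range(reps)])
```

Across many replications, the simulated endpoints have mean close to 0 and standard deviation close to 1, as the invariance principle predicts.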
The choice of lag length is more important, but simulations indicate that for moderate departures (which would not be detected in the initial statistical analysis) the inference does not seem to change too much; see Gonzalo (1989).

It is the advantage of the model based inference presented here that one can check whether the model fits the data, and one can give a precise formulation of the economic hypotheses to be tested, but the methods are clearly model dependent. If major departures from the model assumptions underlying the present analysis are relevant, a new model should be formulated and analyzed.

It is important to note that for VAR models that allow integration of higher order, the likelihood analysis is more complicated. It turns out, however, that the present methods can be applied to processes that are I(2) with only minor modifications; see Johansen (1991).

Institute of Mathematical Statistics, Universitetsparken 5, DK-2100 Copenhagen Ø, Denmark

Manuscript received January, 1989; final revision received March, 1991.

APPENDIX A. SOME TECHNICAL RESULTS

Since we have proved in Theorem 4.1 that under certain conditions $\Delta X_t$ and $\beta'X_t$ can be considered stationary, it follows that the stochastic components of $Z_{1t} = (\Delta X'_{t-1}, \ldots, \Delta X'_{t-k+1}, D'_t, 1)'$ are stationary. Let $\Sigma_{kk} = \mathrm{Var}(X_{t-k} \mid Z_{1t})$. Since the process $X_{t-k}$ is nonstationary, this variance clearly depends on $t$, but since $\beta'X_{t-k}$ is stationary, $\beta'\Sigma_{kk}\beta$ does not depend on $t$. We shall indicate this by leaving out the dependence on $t$ and defining

$\mathrm{Var}\left(\begin{matrix}\Delta X_t \\ \beta'X_{t-k}\end{matrix}\;\Big|\;Z_{1t}\right) = \begin{pmatrix}\Sigma_{00} & \Sigma_{0k}\beta \\ \beta'\Sigma_{k0} & \beta'\Sigma_{kk}\beta\end{pmatrix}.$

The first result concerns the relations between these variance-covariance matrices and the parameters $\alpha$ and $\Lambda$ in the model $H_2$.

LEMMA A.1: The following relations hold:

(A.1) $\Sigma_{00} = \alpha\beta'\Sigma_{k0} + \Lambda,$

(A.2) $\Sigma_{0k}\beta = \alpha\beta'\Sigma_{kk}\beta,$

and hence

(A.3) $\Sigma_{00} = \alpha(\beta'\Sigma_{kk}\beta)\alpha' + \Lambda.$

These relations imply that

(A.4) $(\alpha'\Sigma_{00}^{-1}\alpha)^{-1}\alpha'\Sigma_{00}^{-1} = (\alpha'\Lambda^{-1}\alpha)^{-1}\alpha'\Lambda^{-1},$

(A.5) $\Sigma_{00}^{-1} - \Sigma_{00}^{-1}\alpha(\alpha'\Sigma_{00}^{-1}\alpha)^{-1}\alpha'\Sigma_{00}^{-1} = \alpha_\perp(\alpha_\perp'\Sigma_{00}\alpha_\perp)^{-1}\alpha_\perp' = \alpha_\perp(\alpha_\perp'\Lambda\alpha_\perp)^{-1}\alpha_\perp' = \Lambda^{-1} - \Lambda^{-1}\alpha(\alpha'\Lambda^{-1}\alpha)^{-1}\alpha'\Lambda^{-1},$

(A.6) $\beta'\Sigma_{kk}\beta(\beta'\Sigma_{k0}\Sigma_{00}^{-1}\Sigma_{0k}\beta)^{-1}\beta'\Sigma_{kk}\beta - \beta'\Sigma_{kk}\beta = (\alpha'\Lambda^{-1}\alpha)^{-1}.$

PROOF: From equation (2.3),

$\Delta X_t = \Gamma Z_{1t} + \Pi Z_{kt} + \varepsilon_t,$

one finds immediately the results (A.1), (A.2), and (A.3). To prove (A.4), multiply first by $\alpha$ from the right, and both sides become the identity; then multiply by $\Sigma_{00}\alpha_\perp = \Lambda\alpha_\perp$, and both sides reduce to zero. Since the $p \times p$ matrix $(\alpha, \Lambda\alpha_\perp)$ has full rank, the relation (A.4) has been proved.

The first relation in (A.5) is proved the same way, by multiplying by $(\alpha, \Sigma_{00}\alpha_\perp)$. The second equality in (A.5) follows from (A.3), since $\alpha_\perp'\Sigma_{00} = \alpha_\perp'\Lambda$, and the third is proved as the first.

Finally, (A.6) is proved by inserting $\alpha = \Sigma_{0k}\beta(\beta'\Sigma_{kk}\beta)^{-1}$, such that (A.6) becomes

$(\alpha'\Sigma_{00}^{-1}\alpha)^{-1} - \beta'\Sigma_{kk}\beta = (\alpha'\Lambda^{-1}\alpha)^{-1}.$

This relation can be proved by multiplying (A.4) from the right by

$\Sigma_{00}\alpha(\alpha'\alpha)^{-1} = (\alpha\beta'\Sigma_{kk}\beta\alpha' + \Lambda)\alpha(\alpha'\alpha)^{-1}.$

This completes the proof of Lemma A.1.
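Since the relations in Lemma A.1 are purely algebraic, they can be checked numerically by constructing $\Sigma_{00}$ from arbitrary $\alpha$, $\beta'\Sigma_{kk}\beta$, and $\Lambda$ via (A.3). A minimal sketch (Python; the dimensions and the random inputs are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
p, r = 4, 2
alpha = rng.standard_normal((p, r))
M = rng.standard_normal((r, r))
M = M @ M.T + r * np.eye(r)          # M = beta' Sigma_kk beta, pos. definite
Lam = rng.standard_normal((p, p))
Lam = Lam @ Lam.T + p * np.eye(p)    # Lambda, positive definite
S00 = alpha @ M @ alpha.T + Lam      # relation (A.3)

# (A.4): (a'S00^-1 a)^-1 a'S00^-1 = (a'Lam^-1 a)^-1 a'Lam^-1
lhs4 = np.linalg.solve(alpha.T @ np.linalg.solve(S00, alpha),
                       alpha.T @ np.linalg.inv(S00))
rhs4 = np.linalg.solve(alpha.T @ np.linalg.solve(Lam, alpha),
                       alpha.T @ np.linalg.inv(Lam))

# Reduced form of (A.6): (a'S00^-1 a)^-1 - M = (a'Lam^-1 a)^-1
lhs6 = np.linalg.inv(alpha.T @ np.linalg.solve(S00, alpha)) - M
rhs6 = np.linalg.inv(alpha.T @ np.linalg.solve(Lam, alpha))
```

Both identities hold exactly (up to floating point error) for any such construction, mirroring the algebraic proof.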


The asymptotic properties of the nonstationary process $X_t$ are described by a Brownian motion $W$ in $p$ dimensions on the unit interval. This Brownian motion is the limit of the random walk $\sum_{i=0}^{t}\varepsilon_i$, which appears in the representation (4.11), and can be found by rescaling the time axis and the variables as follows:

$T^{-1/2}\sum_{i=0}^{[Tu]}\varepsilon_i \xrightarrow{w} W(u) \qquad (u \in [0, 1]).$

From the representation (4.11) it follows that $X_t$ is composed of a random walk, a linear trend, and a stationary process. The asymptotic properties of the process therefore depend on the linear combination of the process we consider. If we consider $\tau'X_t$, it is clear that the process is dominated by the linear trend, whereas if we take vectors $\gamma$ which are orthogonal to $\tau$ and linearly independent of $\beta$, then the dominating term is the random walk. Finally, if we take the linear combinations $\beta'X_t$, then both the trend and the random walk are multiplied by $\beta'C = 0$, and the process becomes stationary. Thus let $\gamma\ (p \times (p - r - 1))$ be chosen orthogonal to $\tau$ and $\beta$, such that $(\beta, \gamma, \tau)$ span all of $R^p$. The properties of the process are then summarized in the following lemma.

LEMMA A.2: Let $T \to \infty$ and $u \in [0, 1]$; then

(A.7) $T^{-1/2}\gamma'X_{[Tu]} = T^{-1/2}\gamma'C\sum_{i=0}^{[Tu]}\varepsilon_i + o_p(1) \xrightarrow{w} \gamma'CW(u),$

(A.8) $T^{-1}\tau'X_{[Tu]} = T^{-1}\tau'C\sum_{i=0}^{[Tu]}\varepsilon_i + \tau'\tau[Tu]T^{-1} + o_p(1) \xrightarrow{P} \tau'\tau u.$

Note that the limiting behavior of the nonstationary part of the process is completely described by the matrix $C$ (see (4.5)), the direction $\tau = C\mu$, and the variance matrix $\Lambda$ of the errors.
Using these results one can describe the asymptotic properties of the product moment matrices $S_{ij}$ defined in Section 2, which are basic for the properties of the estimators and tests. We also need the product moment matrices when only the intercept is corrected for:

$M_{kk} = T^{-1}\sum_{t=1}^{T}(X_{t-k} - \bar X_{-k})(X_{t-k} - \bar X_{-k})',$

$M_{k0} = T^{-1}\sum_{t=1}^{T}(X_{t-k} - \bar X_{-k})(\varepsilon_t - \bar\varepsilon)'.$

We do not give the asymptotic results in detail here, since the proofs are similar to those in Johansen (1988b), which are based on the results in Phillips and Durlauf (1986), and are simple consequences of the representation (4.11), but we summarize the results in two lemmas.
LEMMA A.3: For $\tau = C\mu \ne 0$ define $B_T = (\gamma, T^{-1/2}\tau)$ and define $G' = (G_1', G_2)$ as in Theorem 5.1; then

(A.9) $T^{-1}B_T'S_{kk}B_T \xrightarrow{w} \int GG'\,du,$

(A.10) $B_T'(S_{k0} - S_{kk}\beta\alpha') \xrightarrow{w} \int G(dW)'.$

Finally $\beta'S_{kk}\beta \xrightarrow{P} \beta'\Sigma_{kk}\beta$, $\beta'S_{k0} \xrightarrow{P} \beta'\Sigma_{k0}$, and $S_{00} \xrightarrow{P} \Sigma_{00}$.

If $\tau = 0$, we choose $\gamma\ (p \times (p - r)) = \beta_\perp$, delete the terms involving $\tau$, and then (A.9) and (A.10) hold with $G$ replaced by $G_1$. The same results hold if $S_{kk}$ is replaced by $M_{kk}$ and $S_{k0}$ is replaced by $M_{k0}$.

Under the hypothesis $H_2^*$, which also restricts $\mu$ to have no trend component, i.e. $\tau = 0$ or $\mu = \alpha\beta_0$, we get different asymptotic results. Estimates are calculated using the matrices $S^*_{ij}$ (see Section 2), and we define $\beta^{*\prime} = (\beta', \beta_0)$, choose $\delta' = (0, 1)$, and $\gamma^{*\prime} = (\beta_\perp', 0)$. As $\beta$ has full rank, the vectors $(\beta^*, \gamma^*, \delta)$ are $r + (p - r) + 1 = p + 1$ linearly independent vectors spanning $R^{p+1}$. We then have the following lemma.

LEMMA A.4: Let $B_T^* = (\gamma^*, T^{-1/2}\delta)$ and define $G^{*\prime} = (G_1^{*\prime}, G_2^*)$ as in Theorem 5.2; then $T^{-1}\sum_{t=1}^{T}\beta^{*\prime}X^*_{t-k}$ converges in probability, and

(A.11) $T^{-1}B_T^{*\prime}S^*_{kk}B_T^* \xrightarrow{w} \int G^*G^{*\prime}\,du,$

(A.12) $B_T^{*\prime}(S^*_{k0} - S^*_{kk}\beta^*\alpha') \xrightarrow{w} \int G^*(dW)'.$

Finally $\beta^{*\prime}S^*_{kk}\beta^* \xrightarrow{P} \beta'\Sigma_{kk}\beta$, $\beta^{*\prime}S^*_{k0} \xrightarrow{P} \beta'\Sigma_{k0}$, and $S_{00} \xrightarrow{P} \Sigma_{00}$.


By moving the constant term into the vector $X_{t-k}$, we no longer correct for the mean in the process $W$, and the added 1 gives an extra dimension to the matrix $S^*_{kk}$. It is seen that the constant term plays an important role for the formulation of the limiting results, either because it implies a linear trend for the nonstationary part of the process, or because it enters the cointegration vector. The two cases require a different normalization. The seasonal dummies do not play an equally important role once they have been orthogonalized to the constant term. The reason for this is that quantities like $T^{-1}\sum_{t=1}^{T}D_t\Delta X_t'$ and $T^{-1}\sum_{t=1}^{T}D_tX_{t-k}'$ remain bounded in probability as $T \to \infty$. The crucial property, which is applied to see this, is that the partial sums of $D_t$ remain bounded.

APPENDIX B. PROOF OF THE RESULTS IN SECTION 2 AND SECTION 3

Proof of Theorem 2.1

The likelihood ratio test statistic of $H_2$ in $H_1$ is given in the form

(B.1) $-2\ln(Q; H_2 \mid H_1) = -T\sum_{i=r+1}^{p}\ln(1 - \hat\lambda_i),$

where the eigenvalues $\hat\lambda_{r+1}, \ldots, \hat\lambda_p$ are the smallest solutions of the equation

(B.2) $|\lambda S_{kk} - S_{k0}S_{00}^{-1}S_{0k}| = 0;$

see (2.11). Let $S(\lambda) = \lambda S_{kk} - S_{k0}S_{00}^{-1}S_{0k}$. We apply Lemma A.3 to investigate the asymptotic properties of $S(\lambda)$ and apply the fact that the ordered solutions of (B.2) are continuous functions of the coefficient matrices.

As in Lemma A.3 we let $\gamma$ be orthogonal to $\beta$ and $\tau$, such that $(\beta, \gamma, \tau)$ span $R^p$. We then find from Lemma A.3 that for $B_T = (\gamma, T^{-1/2}\tau)$ and $A_T = (\beta, T^{-1/2}B_T)$ we get

(B.3) $|A_T'S(\lambda)A_T| \xrightarrow{w} \begin{vmatrix}\lambda\beta'\Sigma_{kk}\beta - \beta'\Sigma_{k0}\Sigma_{00}^{-1}\Sigma_{0k}\beta & 0 \\ 0 & \lambda\int GG'\,du\end{vmatrix} = |\lambda\beta'\Sigma_{kk}\beta - \beta'\Sigma_{k0}\Sigma_{00}^{-1}\Sigma_{0k}\beta|\,\Big|\lambda\int GG'\,du\Big|,$

which has $r$ positive roots and $p - r$ zero roots. This shows that the $r$ largest solutions of (B.2) converge to the roots of (B.3) and that the rest converge to zero.
Next consider the decomposition

$|(\beta, B_T)'S(\lambda)(\beta, B_T)| = |\beta'S(\lambda)\beta|\,\big|B_T'\{S(\lambda) - S(\lambda)\beta[\beta'S(\lambda)\beta]^{-1}\beta'S(\lambda)\}B_T\big|,$

and let $T \to \infty$ and $\lambda \to 0$ such that $\rho = T\lambda$ is fixed. From Lemma A.3 it follows that

$\beta'S(\lambda)\beta = \rho T^{-1}\beta'S_{kk}\beta - \beta'S_{k0}S_{00}^{-1}S_{0k}\beta \xrightarrow{P} -\beta'\Sigma_{k0}\Sigma_{00}^{-1}\Sigma_{0k}\beta,$

which shows that in the limit the first factor has no roots. In order to investigate the next factor we note the following consequences of Lemma A.3:

$B_T'S(\lambda)\beta = \rho T^{-1}B_T'S_{kk}\beta - B_T'S_{k0}S_{00}^{-1}S_{0k}\beta = -B_T'S_{k0}\Sigma_{00}^{-1}\Sigma_{0k}\beta + o_p(1),$

and

$B_T'\{S(\lambda) - S(\lambda)\beta[\beta'S(\lambda)\beta]^{-1}\beta'S(\lambda)\}B_T = \rho T^{-1}B_T'S_{kk}B_T - B_T'S_{k0}NS_{0k}B_T + o_p(1),$

where $N$ is a notation for the matrix

$\Sigma_{00}^{-1} - \Sigma_{00}^{-1}\Sigma_{0k}\beta[\beta'\Sigma_{k0}\Sigma_{00}^{-1}\Sigma_{0k}\beta]^{-1}\beta'\Sigma_{k0}\Sigma_{00}^{-1}.$

By Lemma A.1 this matrix equals $\alpha_\perp(\alpha_\perp'\Lambda\alpha_\perp)^{-1}\alpha_\perp'$, which shows that the limit distribution of $B_T'S_{k0}\alpha_\perp = B_T'(S_{k0} - S_{kk}\beta\alpha')\alpha_\perp$ can be found from Lemma A.3.
The above results imply that the $p - r$ smallest solutions of (B.2), normalized by $T$, converge to those of the equation

(B.4) $\Big|\rho\int GG'\,du - \int G(dW)'\alpha_\perp(\alpha_\perp'\Lambda\alpha_\perp)^{-1}\alpha_\perp'\int(dW)G'\Big| = 0,$

where $G_1(t) = \gamma'C(W(t) - \int W(u)\,du)$, $G_2(t) = t - \tfrac{1}{2}$, and $G' = (G_1', G_2)$.

In order to simplify this expression, introduce the $(p - r)$-dimensional Brownian motion $U = (\alpha_\perp'\Lambda\alpha_\perp)^{-1/2}\alpha_\perp'W$, which has variance matrix $I$, and the $(p - r + 1)$-dimensional process $F(t) = (U(t)' - \int U'(u)\,du,\ t - \tfrac{1}{2})'$. We can then write (B.4) as

(B.5) $\Big|L\Big(\rho\int FF'\,du - \int F(dU)'\int(dU)F'\Big)L'\Big| = 0,$

where the $(p - r) \times (p - r + 1)$ matrix $L$ has the form

$L = \begin{pmatrix}L_{11} & 0 \\ 0 & 1\end{pmatrix},$

and $L_{11} = \gamma'\beta_\perp(\alpha_\perp'\Gamma\beta_\perp)^{-1}(\alpha_\perp'\Lambda\alpha_\perp)^{1/2}$, applying the representation (4.5) for the matrix $C$. The process $F$ enters into the integrals with the factor $L_{11}$, which gives $p - r - 1$ linearly independent combinations of the components of $U - \int U(u)\,du$. By multiplying by $(L_{11}L_{11}')^{-1/2}$ we can turn these into orthonormal components, and by supplementing these vectors with an extra orthonormal vector we can transform the process $U$ by an orthonormal matrix $O$ into the process $B = OU$. Then the equation can be written as

(B.6) $\Big|\rho\int FF'\,du - \int F(dB)'\int(dB)F'\Big| = 0,$



where $F$ is given by (2.16) and (2.17). This equation has $p - r$ roots. Thus we have seen that the $p - r$ smallest roots of (B.2) decrease to zero at the rate $T^{-1}$ and that $T\hat\lambda$ converges to the roots of (B.6). From the expression for the likelihood ratio test statistic we find that

$-2\ln(Q; H_2 \mid H_1) = T\sum_{i=r+1}^{p}\hat\lambda_i + o_p(1) \xrightarrow{w} \sum_{i=r+1}^{p}\rho_i = \mathrm{tr}\Big\{\int(dB)F'\Big[\int FF'\,du\Big]^{-1}\int F(dB)'\Big\}.$

Note that if $\tau = 0$, i.e. the linear trend is missing, then again applying Lemma A.3 we can choose $\gamma = \beta_\perp$, and the results have to be modified by leaving out the terms containing $\tau$. The matrix $L_{11}$ is then $(p - r) \times (p - r)$ and cancels in (B.5), so that the test of $H_2$ in $H_1$ is distributed as

(B.7) $T_2 = \mathrm{tr}\Big\{\int(dU)F'\Big[\int FF'\,du\Big]^{-1}\int F(dU)'\Big\},$

with $F(t) = U(t) - \int U(u)\,du$. This completes the proof of Theorem 2.1.
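The limit law (B.7) has no closed form, but draws from it can be simulated by discretizing the Brownian motion as a random walk. A minimal sketch (Python; the step count and the use of the lagged process to approximate the stochastic integral are discretization choices, not part of the paper):

```python
import numpy as np

def trace_limit_draw(dim, n, rng):
    """One draw of tr{ int(dU)F' [int FF' du]^{-1} int F(dU)' } with
    F = U - int U du, via an n-step random-walk discretization of U."""
    dU = rng.standard_normal((n, dim)) / np.sqrt(n)
    U = np.cumsum(dU, axis=0)
    # Use the lagged path so the stochastic integral is non-anticipating.
    Ulag = np.vstack([np.zeros((1, dim)), U[:-1]])
    F = Ulag - Ulag.mean(axis=0)             # demeaned process
    FF = F.T @ F / n                         # int F F' du
    FdU = F.T @ dU                           # int F (dU)'
    return np.trace(FdU.T @ np.linalg.solve(FF, FdU))
```

Repeating `trace_limit_draw` many times gives a Monte Carlo approximation of the distribution whose quantiles serve as critical values for the trace test.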
Proof of Theorem 2.2

The estimation under $H_2^*$ involves the solution of the equation

(B.8) $|\lambda S^*_{kk} - S^*_{k0}S_{00}^{-1}S^*_{0k}| = 0;$

see (2.11) with the $S_{ij}$ replaced by $S^*_{ij}$. Let $A_T^* = (\beta^*, T^{-1/2}B_T^*)$ and multiply the matrix in (B.8) by $A_T^{*\prime}$ and its transpose (see Lemma A.4), and let $T \to \infty$. The roots of (B.8) converge to the roots of the equation

$\begin{vmatrix}\lambda\beta'\Sigma_{kk}\beta - \beta'\Sigma_{k0}\Sigma_{00}^{-1}\Sigma_{0k}\beta & 0 \\ 0 & \lambda\int G^*G^{*\prime}\,du\end{vmatrix} = 0.$

This shows that the $r$ largest solutions of (B.8) converge to the roots of the same limiting equation as before (see (B.3)). Now multiply instead by $B_T^{*\prime}$ and its transpose and let $\rho = T\lambda$ and $\lambda \to 0$; then we obtain, by an argument similar to that given in the proof of Theorem 2.1, that the $p - r + 1$ smallest roots normalized by $T$ will converge in distribution to the roots of the equation

$\Big|\rho\int G^*G^{*\prime}\,du - \int G^*(dW)'\alpha_\perp(\alpha_\perp'\Lambda\alpha_\perp)^{-1}\alpha_\perp'\int(dW)G^{*\prime}\Big| = 0.$

Again we can introduce the $(p - r)$-dimensional process $U = (\alpha_\perp'\Lambda\alpha_\perp)^{-1/2}\alpha_\perp'W$ and cancel a full-rank factor to see that the test statistic has a limit distribution which is given by

(B.9) $T_2^* = \mathrm{tr}\Big\{\int(dU)F^{*\prime}\Big[\int F^*F^{*\prime}\,du\Big]^{-1}\int F^*(dU)'\Big\},$

with $F^{*\prime} = (U', 1)$. The result for the maximal eigenvalue follows similarly. This completes the proof of Theorem 2.2.
Proof of Theorem 2.3

From the relation

$F^*(t)'\Big[\int F^*F^{*\prime}\,du\Big]^{-1}F^*(s) = 1 + (U(t) - \bar U)'\Big[\int(U - \bar U)(U - \bar U)'\,du\Big]^{-1}(U(s) - \bar U),$

where $\bar U = \int U(u)\,du$, it follows that $T_2^* = U(1)'U(1) + T_2$ (see (B.7) and (B.9)). The likelihood ratio test statistic of $H_2^*$ in $H_2$ is the difference of the two test statistics considered in Theorem 2.1 and Theorem 2.2. Furthermore, the test statistics have the same variables entering the asymptotic expansions, and hence the distribution can be found by subtracting the above random variables $T_2$ and $T_2^*$; but $U(1)'U(1)$ is $\chi^2(p - r)$.
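The decomposition $T_2^* = U(1)'U(1) + T_2$ holds exactly in a discrete random-walk approximation, because attaching the constant to $F^*$ and demeaning $U$ correspond to orthogonal projections. A sketch (Python; the discretization and the helper function are illustrative assumptions):

```python
import numpy as np

def proj_trace(Z, dU, n):
    """n * tr{ dU' Z (Z'Z)^{-1} Z' dU }: discrete version of
    tr{ int(dU)F' [int FF' du]^{-1} int F(dU)' } with F = columns of Z."""
    B = Z @ np.linalg.solve(Z.T @ Z, Z.T @ dU)
    return n * np.trace(dU.T @ B)

rng = np.random.default_rng(4)
n, dim = 300, 2
dU = rng.standard_normal((n, dim)) / np.sqrt(n)
U = np.cumsum(dU, axis=0)
Ulag = np.vstack([np.zeros((1, dim)), U[:-1]])   # non-anticipating path

Fstar = np.hstack([Ulag, np.ones((n, 1))])       # F*' = (U', 1)
F = Ulag - Ulag.mean(axis=0)                     # F = U - int U du
T2star = proj_trace(Fstar, dU, n)
T2 = proj_trace(F, dU, n)
U1 = U[-1]                                       # U(1) = sum of increments
```

Here the projection onto the span of (constant, demeaned $U$) splits into the two orthogonal pieces, and the constant piece contributes exactly $|U(1)|^2$.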

Proof of Theorem 3.1

We multiply (2.3) by $A'$ and $B'$ respectively, insert $\alpha = A\psi$ and $\beta = H\varphi$, and obtain

(B.10) $A'Z_{0t} = A'\Gamma Z_{1t} + A'A\psi\varphi'H'Z_{kt} + A'\varepsilon_t,$

(B.11) $B'Z_{0t} = B'\Gamma Z_{1t} + B'\varepsilon_t,$

since $B'A\psi = 0$. These equations are analyzed by considering the contribution to the likelihood function from (B.11) and then the contribution from (B.10) given (B.11).

The contribution from (B.11), after having maximized with respect to the parameters $(\Gamma_1, \ldots, \Gamma_{k-1}, \Phi, \mu)$, is

$L_{\max}^{-2/T} = |\Lambda_{bb}|\exp\Big\{T^{-1}\sum_{t=1}^{T}R_{bt}'\Lambda_{bb}^{-1}R_{bt}\Big\}\Big/|B'B|,$

where $R_{bt} = B'R_{0t}$ and $\Lambda_{bb} = B'\Lambda B$. Maximizing, we find

$\hat\Lambda_{bb} = S_{bb} = B'S_{00}B,$

which proves (3.5). The relevant part of the maximized likelihood function is

$L_{\max}^{-2/T} = |S_{bb}|/|B'B|.$

The contribution from (B.10) given (B.11) is, after the initial maximization,

$L_{\max}^{-2/T}(\psi, \varphi, \Lambda_{ab}\Lambda_{bb}^{-1}, \Lambda_{aa.b}) = |\Lambda_{aa.b}|\exp\Big\{T^{-1}\sum_{t=1}^{T}R_t'\Lambda_{aa.b}^{-1}R_t\Big\}\Big/|A'A|,$

where

$R_t = R_{at} - \Lambda_{ab}\Lambda_{bb}^{-1}R_{bt} - A'A\psi\varphi'R_{ht},$

and $R_{at} = A'R_{0t}$ and $R_{ht} = H'R_{kt}$. Minimizing with respect to the parameter $\Lambda_{ab}\Lambda_{bb}^{-1}$ gives rise to yet another regression, of $R_{at}$ and $R_{ht}$ on $R_{bt}$, and the estimate

$\widehat{\Lambda_{ab}\Lambda_{bb}^{-1}}(\psi, \varphi) = (S_{ab} - A'A\psi\varphi'S_{hb})S_{bb}^{-1}.$

This proves (3.6) and gives the new residuals

$R_{a.bt} = R_{at} - S_{ab}S_{bb}^{-1}R_{bt}$

and

$R_{h.bt} = R_{ht} - S_{hb}S_{bb}^{-1}R_{bt}.$

For $\tilde\psi = A'A\psi$, the likelihood function is reduced to the form (2.7) in terms of $R_{a.bt}$, $R_{h.bt}$, $\tilde\psi\varphi'$, and $\Lambda_{aa.b}$. Hence the solution can be found by solving (3.2) and using $\alpha = A(A'A)^{-1}\tilde\psi$ together with the relations (2.8), (2.9), (2.12), and (2.6), which completes the proof of Theorem 3.1.
Proof of Theorem 3.2

The limit result follows from Theorem C.1, proved in Appendix C. What remains is to calculate the degrees of freedom. The matrix $\Pi = \alpha\beta' = A\psi\varphi'H'$ is identified, as is the matrix $\psi\varphi'$. Now normalize $\varphi$ to be of the form $\varphi' = (I, \varphi_0')$ with $\varphi_0'$ of dimension $r \times (s - r)$. Then there are $rm + r(s - r)$ free parameters under the assumptions of $H_3$. For $m = s = p$ we get the result for $H_2$, and the difference is the degrees of freedom for the test.

APPENDIX C. ASYMPTOTIC INFERENCE

There is a qualitative difference between inference for $\beta$ and that for the other parameters. It was proved by Stock (1987) that the regression estimate for $\beta$ is superconsistent. This has consequences for the usual proof of asymptotic normality, as later exploited by Phillips (1990).
The idea can be illustrated as follows: Let $\theta = (\theta_1, \theta_2)$ denote all the parameters, with $\theta_2$ the parameters in $\beta$. Let $q$ denote the log likelihood function normalized by $T$, and let $q_1$, $q_{12}$, etc., denote the derivatives with respect to $\theta_1$, with respect to $\theta_1$ and $\theta_2$, etc. The usual asymptotic representation for the maximum likelihood estimator is

$\begin{pmatrix}q_{11} & q_{12} \\ q_{21} & q_{22}\end{pmatrix}\begin{pmatrix}\hat\theta_1 - \theta_1 \\ \hat\theta_2 - \theta_2\end{pmatrix} \approx \begin{pmatrix}q_1 \\ q_2\end{pmatrix}.$

If one can now prove that $q_{11}$, $q_{12}$, $q_{21}$ are $O_p(1)$ whereas $q_{22}$ is $O_p(T^v)$ for some $v > 0$, then it is convenient to normalize as follows:

(C.1) $\begin{pmatrix}q_{11} & T^{-v/2}q_{12} \\ T^{-v/2}q_{21} & T^{-v}q_{22}\end{pmatrix}\begin{pmatrix}T^{1/2}(\hat\theta_1 - \theta_1) \\ T^{(v+1)/2}(\hat\theta_2 - \theta_2)\end{pmatrix} \approx \begin{pmatrix}T^{1/2}q_1 \\ T^{(1-v)/2}q_2\end{pmatrix}.$

Since by assumption $T^{-v/2}q_{12} \xrightarrow{P} 0$, the equation (C.1) splits into

$q_{11}T^{1/2}(\hat\theta_1 - \theta_1) \approx T^{1/2}q_1$

and

$[T^{-v}q_{22}]T^{(v+1)/2}(\hat\theta_2 - \theta_2) \approx T^{(1-v)/2}q_2.$

These equations are the ones we would get when conducting inference about $\theta_1$ for fixed $\theta_2$, and about $\theta_2$ for fixed $\theta_1$. The same expansion will show that the likelihood ratio test statistic for a simple hypothesis about $\theta$ is

$-2\ln Q \approx -T\begin{pmatrix}\hat\theta_1 - \theta_1 \\ \hat\theta_2 - \theta_2\end{pmatrix}'\begin{pmatrix}q_{11} & q_{12} \\ q_{21} & q_{22}\end{pmatrix}\begin{pmatrix}\hat\theta_1 - \theta_1 \\ \hat\theta_2 - \theta_2\end{pmatrix}$

$\approx -T^{1/2}(\hat\theta_1 - \theta_1)'\,q_{11}\,T^{1/2}(\hat\theta_1 - \theta_1) - T^{(v+1)/2}(\hat\theta_2 - \theta_2)'[T^{-v}q_{22}]T^{(v+1)/2}(\hat\theta_2 - \theta_2).$

This shows that the test statistic decomposes into a test for $\theta_1$ and an independent test for $\theta_2$.

The above argument indicates that inference about $\theta_2$ can be conducted as if $\theta_1$ were known, and vice versa.
We can prove the above property about the second derivatives of the likelihood function concentrated with respect to $\Lambda$, and we therefore deduce that inference about $(\Gamma_1, \ldots, \Gamma_{k-1}, \alpha, \Phi, \Lambda)$ can be conducted for fixed $\beta$; hence one can apply the well known results for asymptotic inference for the stationary processes $\Delta X_t$ and $\beta'X_t$. See Dunsmuir and Hannan (1976) for a general treatment of smooth hypotheses for stationary processes. The asymptotic distribution of $\hat\mu$ is somewhat more complicated and will not be treated in detail here.
The asymptotic properties of estimators and test statistics are discussed here for a general smooth hypothesis on the cointegrating relations: $\beta = \beta(\theta)$, $\theta \in \Theta \subset R^k$, leaving the remaining parameters unrestricted. Let $D\beta(u)$ denote the derivative of $\beta(\theta)$ with respect to $\theta$ in the direction $u \in R^k$, i.e. the $p \times r$ matrix with elements $\sum_{s=1}^{k}u_s\,\partial\beta_{ij}(\theta)/\partial\theta_s$. We assume throughout that $D\beta(u)$ has full rank for $u \ne 0$, and that $\beta'D\beta(u) = 0$ for all $u$, where $\beta$ is the value of the parameter for which the results are derived. This last condition can always be achieved by normalizing $\beta(\theta)$ by $\beta$, i.e. by considering $\beta(\theta)(\beta'\beta(\theta))^{-1}$. We also let $D\beta$ denote the $pr \times k$ matrix with element $((i, j), s)$ equal to

$\partial\beta_{ij}(\theta)/\partial\theta_s \qquad (i = 1, \ldots, p;\ j = 1, \ldots, r;\ s = 1, \ldots, k),$

so that $(D\beta)u = \mathrm{vec}(D\beta(u))$. The natural coordinate system in $R^p$ is given by $(\beta, \gamma, \tau)$, and the corresponding coordinate system in $R^k$ is defined by choosing $u_1, \ldots, u_k$ orthogonal in $R^k$ such that

(C.2) $\tau'D\beta(u_i) = 0 \qquad (i = 1, \ldots, k_1)$

and

(C.3) $\tau'D\beta(u_i) \ne 0 \qquad (i = k_1 + 1, \ldots, k).$

The behavior of the estimate of $\theta$ depends on which linear combination $u'(\hat\theta - \theta)$ is considered, and we define $N_T$ as the $k \times k$ matrix with $i$th column given by $Tu_i(u_i'u_i)^{-1}$ if $i = 1, \ldots, k_1$, and $T^{3/2}u_i(u_i'u_i)^{-1}$ if $i = k_1 + 1, \ldots, k$.

Furthermore we need a $(p - r)r \times k$ matrix $\widetilde{D\beta}$ with $i$th column given by

(C.4) $\widetilde{D\beta}_i = \mathrm{vec}\{(\gamma, 0)'D\beta(u_i)\} \qquad (i = 1, \ldots, k_1),$

(C.5) $\widetilde{D\beta}_i = \mathrm{vec}\{(0, \tau)'D\beta(u_i)\} \qquad (i = k_1 + 1, \ldots, k).$

We also define the variable $Y_t = (\Delta X'_{t-1}, \ldots, \Delta X'_{t-k+1}, X'_{t-k}\beta, D'_t)'$, as well as the parameters

$\xi_Y = E(Y_t)$

and

$\Omega = \lim_{T\to\infty} T^{-1}\sum_{t=1}^{T}E\{(Y_t - \xi_Y)(Y_t - \xi_Y)'\}.$

The results below are derived under the assumption that the maximum likelihood estimator for $\theta$ exists and is consistent.

THEOREM C.1: Under the assumption $\Pi = \alpha\beta(\theta)'$, the asymptotic distribution of

$T^{1/2}\big((\hat\Gamma_1, \ldots, \hat\Gamma_{k-1}, \hat\alpha, \hat\Phi) - (\Gamma_1, \ldots, \Gamma_{k-1}, \alpha, \Phi)\big)$

is Gaussian with mean zero and variance matrix given by $\Omega^{-1} \otimes \Lambda$. The asymptotic distribution of $\hat\theta$ is given by

(C.6) $N_T(\hat\theta - \theta) \xrightarrow{w} \Big\{\widetilde{D\beta}'\Big(\int GG'\,du \otimes (\alpha'\Lambda^{-1}\alpha)\Big)\widetilde{D\beta}\Big\}^{-1}\widetilde{D\beta}'\,\mathrm{vec}\Big(\int G(dV)'\Big),$

that is, Gaussian for fixed $G$ with variance

$\Big\{\widetilde{D\beta}'\Big(\int GG'\,du \otimes (\alpha'\Lambda^{-1}\alpha)\Big)\widetilde{D\beta}\Big\}^{-1},$

which we call the asymptotic conditional variance. Here $V = \alpha'\Lambda^{-1}W$. It follows that the limit distribution of $(T\gamma, T^{3/2}\tau)'(\beta(\hat\theta) - \beta)$ is given by

(C.7) $\widetilde{D\beta}\Big\{\widetilde{D\beta}'\Big(\int GG'\,du \otimes (\alpha'\Lambda^{-1}\alpha)\Big)\widetilde{D\beta}\Big\}^{-1}\widetilde{D\beta}'\,\mathrm{vec}\Big(\int G(dV)'\Big).$

If $k_1 = k$, that is if $\tau'D\beta(u) = 0$ for all $u$, then the results should be modified by replacing $G$ by $G_1$; see Theorem 5.1.

The likelihood ratio test statistic of a smooth hypothesis $\theta = \theta(\xi)$, $\xi \in R^s$, $s < k$, is asymptotically distributed as $\chi^2$ with $k - s$ degrees of freedom. A similar result holds if $\alpha'\mu = 0$; only $G$ should be replaced by $G^*$ (see Theorem 5.2).

PROOF: The likelihood function concentrated with respect to $\mu$ and divided by $T$ is given by

$q(\theta, \alpha, \Phi, \Lambda) = -\tfrac{1}{2}\ln|\Lambda| - \tfrac{1}{2}\,\mathrm{tr}\Big\{\Lambda^{-1}T^{-1}\sum_t(\varepsilon_t - \bar\varepsilon)(\varepsilon_t - \bar\varepsilon)'\Big\}$

for

$\varepsilon_t - \bar\varepsilon = \Delta X_t - \overline{\Delta X} - \sum_{i=1}^{k-1}\Gamma_i(\Delta X_{t-i} - \overline{\Delta X}_{-i}) - \alpha\beta(\theta)'(X_{t-k} - \bar X_{-k}) - \Phi(D_t - \bar D).$

Here the bar denotes an average. The derivatives are most easily found by a Taylor expansion. Thus if $q_\theta(u)$ denotes the derivative of $q$ with respect to $\theta$ in the direction $u$, we can find the derivative from the expansion $q(\theta + u, \alpha, \Lambda) = q(\theta, \alpha, \Lambda) + q_\theta(u) + O(|u|^2)$. We then get the derivatives

$q_{\Gamma_i}(g) = \mathrm{tr}\Big\{\Lambda^{-1}T^{-1}\sum_t(\varepsilon_t - \bar\varepsilon)(\Delta X_{t-i} - \overline{\Delta X}_{-i})'g'\Big\},$

$q_\alpha(a) = \mathrm{tr}\Big\{\Lambda^{-1}T^{-1}\sum_t(\varepsilon_t - \bar\varepsilon)(X_{t-k} - \bar X_{-k})'\beta a'\Big\},$

$q_\Phi(f) = \mathrm{tr}\Big\{\Lambda^{-1}T^{-1}\sum_t(\varepsilon_t - \bar\varepsilon)(D_t - \bar D)'f'\Big\},$

$q_\Lambda(l) = \tfrac{1}{2}\,\mathrm{tr}\Big\{\Lambda^{-1}l\Lambda^{-1}T^{-1}\sum_t(\varepsilon_t - \bar\varepsilon)(\varepsilon_t - \bar\varepsilon)'\Big\} - \tfrac{1}{2}\,\mathrm{tr}\{\Lambda^{-1}l\},$

$q_\theta(u) = \mathrm{tr}\{\Lambda^{-1}M_{0k}D\beta(u)\alpha'\},$

and the second derivatives

$q_{\Gamma_i\Gamma_i}(g, g) = -\mathrm{tr}\Big\{\Lambda^{-1}gT^{-1}\sum_t(\Delta X_{t-i} - \overline{\Delta X}_{-i})(\Delta X_{t-i} - \overline{\Delta X}_{-i})'g'\Big\},$

$q_{\alpha\alpha}(a, a) = -\mathrm{tr}\Big\{\Lambda^{-1}a\beta'T^{-1}\sum_t(X_{t-k} - \bar X_{-k})(X_{t-k} - \bar X_{-k})'\beta a'\Big\},$

$q_{\Phi\Phi}(f, f) = -\mathrm{tr}\Big\{\Lambda^{-1}fT^{-1}\sum_t(D_t - \bar D)(D_t - \bar D)'f'\Big\},$

$q_{\Lambda\Lambda}(l, l) = -\mathrm{tr}\Big\{\Lambda^{-1}l\Lambda^{-1}l\Lambda^{-1}T^{-1}\sum_t(\varepsilon_t - \bar\varepsilon)(\varepsilon_t - \bar\varepsilon)'\Big\} + \tfrac{1}{2}\,\mathrm{tr}\{\Lambda^{-1}l\Lambda^{-1}l\},$

$q_{\theta\theta}(u, u) = -\mathrm{tr}\Big\{\Lambda^{-1}T^{-1}\sum_t(\varepsilon_t - \bar\varepsilon)(X_{t-k} - \bar X_{-k})'D^2\beta(u, u)\alpha'\Big\} - \mathrm{tr}\{\Lambda^{-1}\alpha D\beta(u)'M_{kk}D\beta(u)\alpha'\},$

together with similar expressions for the mixed second derivatives. It follows from Lemma A.3 that all terms in the above derivatives are $O_p(1)$, except the expression $-\mathrm{tr}\{\Lambda^{-1}\alpha D\beta(u)'M_{kk}D\beta(u)\alpha'\}$ in $q_{\theta\theta}$, and that this term tends to infinity, since the columns of $D\beta(u)$ are orthogonal to $\beta$. Thus we apply the above general argument, even though we have two different normalizations, and have shown that inference for $\beta$ can be conducted for fixed values of the other parameters, and vice versa.
It is not difficult to see by the central limit theorem for stationary ergodic processes (see White (1984)) that the derivatives $T^{1/2}(q_{\Gamma_1}, \ldots, q_{\Gamma_{k-1}}, q_\alpha, q_\Phi, q_\Lambda)$ are asymptotically Gaussian with mean zero and a variance matrix which is also the limit of the matrix of second derivatives with opposite sign with respect to these parameters, such that the first conclusion of the Theorem holds and the variance is given by (C.7); see also Lütkepohl and Reimers (1989).
To find the asymptotic distribution of the estimate of $\theta$ we expand the likelihood function around the point $\theta$, keeping the other parameters fixed. The likelihood equation now takes the form

$\mathrm{tr}\{D\beta(u)'M_{kk}D\beta(\hat\theta - \theta)\alpha'\Lambda^{-1}\alpha\} = \mathrm{tr}\{D\beta(u)'M_{k0}\Lambda^{-1}\alpha\}$

for all $u \in R^k$. In vectorized form this becomes

(C.8) $\big[D\beta'(M_{kk} \otimes \alpha'\Lambda^{-1}\alpha)D\beta\big](\hat\theta - \theta) = D\beta'\,\mathrm{vec}(M_{k0}\Lambda^{-1}\alpha).$

The limiting behavior of the various matrices is given by Lemma A.3. From the identity $B_T(\gamma, T^{1/2}\tau)' = \gamma\gamma' + \tau\tau' = I - P_\beta$, we get, since $\beta'D\beta(u) = 0$, that for $u, v \in R^k$,

$D\beta(u)'M_{kk}D\beta(v) = D\beta(u)'(\gamma, T^{1/2}\tau)B_T'M_{kk}B_T(\gamma, T^{1/2}\tau)'D\beta(v) = M(u, v),$

say. The matrix $M_{kk}$ should be normalized by $T^{-1}$ and by $B_T$ to get convergence (see Lemma A.3), but of course the factor $T^{1/2}$ has to be taken care of. Now introduce the coordinate system $w_i = T^{-1/2}u_i$, $i = 1, \ldots, k_1$, and $w_i = T^{-1}u_i$, $i = k_1 + 1, \ldots, k$. Then $M(w_i, w_j)$ is weakly convergent towards

$u_i'\widetilde{D\beta}'\Big[\int GG'\,du\Big]\widetilde{D\beta}u_j$

(see (C.4) and (C.5)). Similarly

$T^{1/2}D\beta(u)'M_{k0}\Lambda^{-1}\alpha = T^{1/2}D\beta(u)'(\gamma, T^{1/2}\tau)B_T'M_{k0}\Lambda^{-1}\alpha$

converges weakly for $u = w_i$ towards

$u_i'\widetilde{D\beta}'\int G(dV)'.$

With this notation we can replace (C.8) by the result (C.6). By a Taylor expansion we find (C.7). By expanding the likelihood function around a fixed value of $\theta$ one finds that the test statistic for a simple hypothesis about $\theta$ is

$-2\ln Q = T\,\mathrm{tr}\{D\beta(\hat\theta - \theta)'M_{kk}D\beta(\hat\theta - \theta)\alpha'\Lambda^{-1}\alpha\} + o_p(1)$

$\xrightarrow{w} \Big(\mathrm{vec}\int G(dV)'\Big)'\widetilde{D\beta}\Big\{\widetilde{D\beta}'\Big(\int GG'\,du \otimes \alpha'\Lambda^{-1}\alpha\Big)\widetilde{D\beta}\Big\}^{-1}\widetilde{D\beta}'\Big(\mathrm{vec}\int G(dV)'\Big),$

which for given $G$ is $\chi^2$ distributed with $k$ degrees of freedom. Now if $\theta = \theta(\xi)$, one finds the same result except that $\widetilde{D\beta}$ is replaced by $\widetilde{D\beta}\,D\theta$, i.e. multiplied by the $k \times s$ matrix of derivatives $D\theta$ of the map $\xi \mapsto \theta(\xi)$. The likelihood ratio test statistic for the hypothesis $\theta = \theta(\xi)$ is then asymptotically distributed as

$Q'\Big(\widetilde{D\beta}\{\widetilde{D\beta}'V(Q \mid G)\widetilde{D\beta}\}^{-1}\widetilde{D\beta}' - \widetilde{D\beta}D\theta\{D\theta'\widetilde{D\beta}'V(Q \mid G)\widetilde{D\beta}D\theta\}^{-1}D\theta'\widetilde{D\beta}'\Big)Q,$

where $Q = \mathrm{vec}(\int G(dV)')$ and $V(Q \mid G) = \int GG'\,du \otimes \alpha'\Lambda^{-1}\alpha$. For fixed $G$ the variable $Q$ is Gaussian with covariance matrix $V(Q \mid G)$; hence this statistic is $\chi^2$ distributed with $k - s$ degrees of freedom, and as this distribution does not depend on $G$ the result holds unconditionally.
This completes the proof of Theorem C.1, but let us just see how these results can be applied to find the asymptotic distribution of $\hat\mu$. The estimating equation is

$\overline{\Delta X} = \sum_{i=1}^{k-1}\hat\Gamma_i\overline{\Delta X}_{-i} + \hat\alpha\hat\beta'\bar X_{-k} + \hat\Phi\bar D + \hat\mu,$

which together with the identity

$\overline{\Delta X} = \sum_{i=1}^{k-1}\Gamma_i\overline{\Delta X}_{-i} + \alpha\beta'\bar X_{-k} + \Phi\bar D + \mu + \bar\varepsilon$

shows that

$T^{1/2}(\hat\mu - \mu) = -\Big\{T^{1/2}\big(\hat\Gamma_1 - \Gamma_1, \ldots, \hat\Gamma_{k-1} - \Gamma_{k-1}, \hat\alpha - \alpha, \hat\Phi - \Phi\big)\bar Y - T^{1/2}\bar\varepsilon\Big\} - \hat\alpha T^{1/2}(\hat\beta - \beta)'\bar X_{-k}.$

The first term converges towards a Gaussian distribution with mean zero (with a variance matrix determined by $\Lambda$, $\xi_Y$, and $\Omega$), and the second is normalized just right, so that its distribution can be found from the second part of Theorem C.1. Note that the asymptotic distribution of $\alpha_\perp'\hat\mu$ is Gaussian, but the component which goes into the cointegrating relation has a more complicated limit distribution.

Proof of Theorem 5.1

The proof consists of simplifying the expressions given in Theorem C.1. A point in $\mathrm{sp}(H)$ can be represented as $\beta + (\gamma_H, \tau_H)\theta$, where $(\beta, \gamma_H, \tau_H)$ are orthogonal and span $\mathrm{sp}(H)$, and where $\theta = \{\theta_{ij},\ i = 1, \ldots, s - r;\ j = 1, \ldots, r\}$. It follows that $D\beta(u) = (\gamma_H, \tau_H)u$ and that the equation

$\tau'D\beta(u) = \tau'(\gamma_H, \tau_H)u = (0, \tau'\tau_H)u = 0$

(see (C.2) and (C.3)) can be solved by choosing $u_{ij} = o_ie_j'$, $i = 1, \ldots, s - r - 1$, $j = 1, \ldots, r$, where the $o_i$ are unit vectors in $R^{s-r}$ and the $e_j$ unit vectors in $R^r$. Thus in this case $k = (s - r)r$ and $k_1 = (s - r - 1)r$. The $(i, j)$th column of $\widetilde{D\beta}$ (see (C.4) and (C.5)) is given by

$\mathrm{vec}\{(\gamma, 0)'(\gamma_H, \tau_H)o_ie_j'\} \qquad (i = 1, \ldots, s - r - 1;\ j = 1, \ldots, r),$

$\mathrm{vec}\{(0, \tau)'(0, \tau_H)o_ie_j'\} \qquad (i = s - r;\ j = 1, \ldots, r).$

This shows that

$\widetilde{D\beta}'\Big(\int GG'\,du \otimes (\alpha'\Lambda^{-1}\alpha)\Big)\widetilde{D\beta} = P \otimes \alpha'\Lambda^{-1}\alpha,$

where $P$ is a notation for

$P = \begin{pmatrix}\gamma_H'\gamma\int G_1G_1'\,du\,\gamma'\gamma_H & \gamma_H'\gamma\int G_1G_2\,du\,\tau'\tau_H \\ \tau_H'\tau\int G_2G_1'\,du\,\gamma'\gamma_H & \tau_H'\tau\int G_2G_2\,du\,\tau'\tau_H\end{pmatrix}.$

The left hand side of (C.7) becomes $(\mathrm{tr}(u_{ij}'u_{ij}))^{-1}\mathrm{tr}(u_{ij}'\hat\theta) = \hat\theta_{ij}$, multiplied by $T$ if $i \le s - r - 1$ and by $T^{3/2}$ if $i = s - r$. The matrix $\widetilde{D\beta}'\,\mathrm{vec}\{\int G(dV)'\}$ has $(i, j)$th element equal to

$\mathrm{tr}\Big\{e_jo_i'(\gamma_H, 0)'(\gamma, 0)\int G(dV)'\Big\} = o_i'\gamma_H'\gamma\int G_1\,dV_j \qquad (i = 1, \ldots, s - r - 1;\ j = 1, \ldots, r)$

and

$\mathrm{tr}\Big\{e_jo_i'(0, \tau_H)'(0, \tau)\int G(dV)'\Big\} = o_i'\tau_H'\tau\int G_2\,dV_j \qquad (i = s - r;\ j = 1, \ldots, r).$

Collecting these results we obtain

$(T\gamma_H, T^{3/2}\tau_H)'(\hat\beta - \beta) \xrightarrow{w} P^{-1}(\gamma_H, \tau_H)'(\gamma, \tau)\int G(dV)'(\alpha'\Lambda^{-1}\alpha)^{-1}.$

The normalized estimate $\hat\beta_c = \hat\beta(c'\hat\beta)^{-1}$ has the expansion around $\theta = 0$:

$\beta_c(\theta) - \beta_c = (I - \beta_cc')(\beta(\theta) - \beta)(c'\beta)^{-1} + O(|\theta|^2),$

which shows that

(C.9) $T(\hat\beta_c - \beta_c) \xrightarrow{w} (I - \beta_cc')(\gamma_H, 0)P^{-1}(\gamma_H, \tau_H)'(\gamma, \tau)\int G(dV)'(\alpha'\Lambda^{-1}\alpha)^{-1}(c'\beta)^{-1}.$

We can simplify the expression

$(\gamma_H, 0)P^{-1}(\gamma_H, \tau_H)'(\gamma, \tau)G = \gamma_HP^{11}\gamma_H'\gamma G_1 + \gamma_HP^{12}\tau_H'\tau G_2$

$= \gamma_HP^{11}(\gamma_H'\gamma G_1 - P_{12}P_{22}^{-1}\tau_H'\tau G_2)$

$= \gamma_H\Big(\gamma_H'\gamma\int G_{1.2}G_{1.2}'\,du\,\gamma'\gamma_H\Big)^{-1}\gamma_H'\gamma G_{1.2},$

which, inserted into (C.9), gives the result (5.1).



The consistent estimator (5.6) for the asymptotic conditional variance is found from Lemma C.2 below, for the choice $K' = I - \beta_cc'$, which satisfies $K'\beta = 0$.

Finally we note that the normalization $\hat V'S_{hh.b}\hat V = I$ (see (3.2)) implies that $HS_{hh.b}^{-1}H' = H\hat V\hat V'H' = \hat\beta\hat\beta' + H\hat u\hat u'H'$. Since $(I - \beta_cc')\beta_c = 0$, $(I - \hat\beta_cc')\hat\beta$ is $O_p(T^{-1})$, such that

$T(I - \hat\beta_cc')H\hat u\hat u'H'(I - \hat\beta_cc')' = T(I - \hat\beta_cc')HS_{hh.b}^{-1}H'(I - \hat\beta_cc')' + o_p(1).$

Hence one can apply either of these in the consistent estimation of the asymptotic conditional variance. The relation (A.6) from Lemma A.1 gives the identity $(\hat\alpha'\hat\Lambda^{-1}\hat\alpha)^{-1} = \mathrm{diag}(\hat\lambda_1^{-1} - 1, \ldots, \hat\lambda_r^{-1} - 1)$. This completes the proof of Theorem 5.1.
Consistent Estimates of the Asymptotic Conditional Variance
The next results are needed for the consistent estimation of the limiting conditional variance in
the limiting distribution of β̂_c. We take the coordinates (β, γ_H, τ_H), where γ_H (p × (s − r − 1)) is
chosen such that (β, γ_H, τ_H) span sp(H). We let û = (v̂_{r+1}, …, v̂_s) (see (3.2), with the normalization
û'S_{hh.b}û = I).

LEMMA C.2: If K'β = 0, and τ_H ≠ 0, then

TK'Hûû'H'K ⇒ K'γ_H (γ_H'γ'∫G1.2 G1.2' du γγ_H)⁻¹ γ_H'K.

If τ_H = 0 then this result holds with γ_H (p × (s − r)) chosen such that (β, γ_H) span sp(H) and G1.2
is replaced by G1.

PROOF: We first expand

(C.10) Hû = βe + γ_H g + τ_H f

and then note that from

û'S_{ha.b}S_{aa.b}⁻¹S_{ah.b}û = diag(λ̂_{r+1}, …, λ̂_s) = O_p(T⁻¹)

it follows that the coordinates (e, g, f) are O_p(T^{-1/2}). From the normalization
û'S_{hh.b}û = I, it even follows that f is O_p(T⁻¹) and that, since e →_P 0, we have

(γ_H g + τ_H f)'S_{kk.b}(γ_H g + τ_H f) →_P I

and hence that

(C.11) (γ_H g + τ_H f)'S_{kk}(γ_H g + τ_H f) →_P I.
Note finally that from (C.10) we have, since K'β = 0, that

(C.12) K'Hû = K'(γ_H g + τ_H f) = K'(γ_H, τ_H)(g', f')'.

Now insert (C.11) and (C.12) into TK'Hûû'H'K and we get

(C.13) TK'(γ_H g + τ_H f){(γ_H g + τ_H f)'S_{kk}(γ_H g + τ_H f)}⁻¹(γ_H g + τ_H f)'K

= TK'(γ_H, τ_H){(γ_H, τ_H)'S_{kk}(γ_H, τ_H)}⁻¹(γ_H, τ_H)'K

= K'B_T(T⁻¹B_T'S_{kk}B_T)⁻¹B_T'K,

for B_T = (γ_H, T^{-1/2}τ_H). The terms involving T^{-1/2}τ_H are of smaller order of magnitude than the
terms involving γ_H and hence (C.13) converges to

K'γ_H (γ_H'γ'{∫G1G1' du − (∫G1G2' du)(∫G2G2' du)⁻¹(∫G2G1' du)}γγ_H)⁻¹ γ_H'K.

If τ_H = 0 we can drop the terms involving τ_H and choose γ_H orthogonal to β such that they span
sp(H) and apply Lemma A.3 again.
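The process G1.2 = G1 − (∫G1G2' du)(∫G2G2' du)⁻¹G2 in the limit above is G1 corrected for G2 by an L²[0, 1] projection, so the corrected process satisfies ∫G1.2 G2' du = 0. A discretized illustration (a sketch using a simulated random walk for G1 and a linear trend for G2; the simulation design is an assumption for illustration, not code from the paper):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000
u = np.linspace(0.0, 1.0, n)

# G1: scaled random-walk approximation of a Brownian motion on [0, 1].
G1 = np.cumsum(rng.standard_normal(n)) / np.sqrt(n)
# G2: the deterministic trend component, here G2(u) = u.
G2 = u

du = 1.0 / n
# Discretized projection coefficient (int G1 G2 du) / (int G2 G2 du).
b = np.sum(G1 * G2) / np.sum(G2 * G2)
G12 = G1 - b * G2                 # corrected process G_{1.2}

# The correction removes the G2 component in L^2[0, 1].
assert abs(np.sum(G12 * G2) * du) < 1e-10
```

This is the same correction that replaces ∫G1G1' du by the Schur-complement form ∫G1G1' du − (∫G1G2' du)(∫G2G2' du)⁻¹(∫G2G1' du) in the limiting variance.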

