Introductory Financial Econometrics
Reading: Wooldridge, Ch.2
Jianhua Gang
School of Finance
Renmin University of China
Spring 2013
Regression Analysis

Consider the simple linear regression model

y_i = \alpha + \beta x_i + u_i, \qquad i = 1, ..., n.   (1)
Estimation of Parameters

Method of moments (MM): impose the population moment conditions

E(u_i) = 0,
E(x_i u_i) = 0,
E(u_i^2 - \sigma^2) = 0.
Ordinary least squares (OLS): choose estimates \hat\alpha and \hat\beta to get the "best fit" in the sense of minimizing

S(\alpha, \beta) = \sum_i [y_i - (\alpha + \beta x_i)]^2.

Setting \partial S(\hat\alpha, \hat\beta)/\partial\alpha = 0 and \partial S(\hat\alpha, \hat\beta)/\partial\beta = 0 gives the normal equations

\sum_i \hat u_i = \sum_i [y_i - (\hat\alpha + \hat\beta x_i)] = 0,
\sum_i x_i \hat u_i = \sum_i x_i [y_i - (\hat\alpha + \hat\beta x_i)] = 0,   (2)

where \hat u_i = y_i - (\hat\alpha + \hat\beta x_i) denotes the OLS residual.
The solutions \hat\alpha and \hat\beta which minimize the objective function S(\alpha, \beta) are

\hat\beta = \frac{\sum_i (x_i - \bar x)(y_i - \bar y)}{\sum_i (x_i - \bar x)^2}, \qquad \hat\alpha = \bar y - \hat\beta \bar x.   (3)

It is clear that the normal equations imply that the OLS estimates of \alpha and \beta are equal to the corresponding method-of-moments estimates obtained previously.
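As an illustration, here is a minimal numerical sketch of these closed-form estimators in Python (simulated data; the variable names are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
u = rng.normal(scale=0.5, size=n)
y = 1.0 + 2.0 * x + u          # true alpha = 1, beta = 2

# OLS estimates from equation (3)
beta_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
alpha_hat = y.mean() - beta_hat * x.mean()

# residuals satisfy the normal equations (2): both sums are ~0
u_hat = y - (alpha_hat + beta_hat * x)
print(alpha_hat, beta_hat, u_hat.sum(), (x * u_hat).sum())
```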
Let \hat y_i = \hat\alpha + \hat\beta x_i denote a typical OLS predicted (fitted) value; the normal equations for OLS then yield several results, in particular

\sum_i \hat y_i \hat u_i = 0,

so that

\sum_i y_i^2 = \sum_i (\hat y_i + \hat u_i)^2 = \sum_i \hat y_i^2 + \sum_i \hat u_i^2 + 0.

Maximum likelihood (ML): the likelihood is L = \prod_i f(y_i) and, under normality, the log-likelihood is

l(\alpha, \beta, \sigma^2) = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{\sum_i [y_i - (\alpha + \beta x_i)]^2}{2\sigma^2}.

The MLEs of \alpha and \beta equal the OLS estimates. The MLE of \sigma^2 is

\hat\sigma^2 = n^{-1}\sum_i \hat u_i^2 = \text{MM estimate}.

Goodness of Fit

The coefficient of determination measures the fit of the OLS line:

R^2 = \frac{\sum_i (\hat y_i - \bar y)^2}{\sum_i (y_i - \bar y)^2}, \qquad \bar y = n^{-1}\sum_i y_i.
Sampling moments of the OLS estimators:

Var(\hat\beta) = \frac{\sigma^2}{\sum_i (x_i - \bar x)^2},

Var(\hat\alpha) = \frac{\sigma^2}{n} + \bar x^2\,Var(\hat\beta),

Cov(\hat\alpha, \hat\beta) = -\bar x\,Var(\hat\beta).
The estimators are linear in the errors:

\hat\beta = \beta + \sum_i w_i u_i, \qquad \hat\alpha = \alpha + \sum_i z_i u_i,

and RSS = \sum_i \hat u_i^2 \sim \sigma^2\chi^2(n-2), which is independent of \hat\alpha and \hat\beta.
Statistical Inference

Hence

E\left(\sum_i \hat u_i^2\right) = \sigma^2(n-2),

so

s^2 = \frac{\sum_i \hat u_i^2}{n-2}

is unbiased. The ML estimator \hat\sigma^2 = [(n-2)/n]\,s^2 is, however, biased (the bias is nonnegligible when the sample size is relatively small).
Assume \sum_i (x_i - \bar x)^2 > 0.
Under normality, \hat\alpha and \hat\beta are N(\alpha, var(\hat\alpha)) and N(\beta, var(\hat\beta)), respectively, so that

z(\hat\alpha) = (\hat\alpha - \alpha)/\sqrt{var(\hat\alpha)} \sim N(0,1),
z(\hat\beta) = (\hat\beta - \beta)/\sqrt{var(\hat\beta)} \sim N(0,1).

RSS = \sum_i \hat u_i^2 \sim \sigma^2\chi^2(n-2) independently of \hat\alpha and \hat\beta, so RSS/\sigma^2 \sim \chi^2(n-2) independently of z(\hat\alpha) and z(\hat\beta), and therefore

t(\hat\alpha) = \frac{z(\hat\alpha)}{\sqrt{RSS/[(n-2)\sigma^2]}} \sim t(n-2), \qquad t(\hat\beta) = \frac{z(\hat\beta)}{\sqrt{RSS/[(n-2)\sigma^2]}} \sim t(n-2).
Hence, replacing \sigma^2 with s^2 and writing S_{XX} = \sum_i (x_i - \bar x)^2,

t(\hat\beta) = \frac{(\hat\beta - \beta)/\sqrt{\sigma^2/S_{XX}}}{\sqrt{s^2/\sigma^2}} = \frac{\hat\beta - \beta}{\sqrt{s^2/S_{XX}}} = \frac{\hat\beta - \beta}{SE(\hat\beta)} \sim t(n-2),

and similarly

t(\hat\alpha) = \frac{\hat\alpha - \alpha}{SE(\hat\alpha)} \sim t(n-2).
Confidence intervals: choose d_1 such that

prob(-d_1 \le t(n-2) \le d_1) = 1 - \gamma.

Since

t(\hat\beta) = \frac{\hat\beta - \beta}{SE(\hat\beta)} \sim t(n-2),

a (1-\gamma) \times 100 per cent confidence interval for \beta is [\hat\beta - d_1 SE(\hat\beta),\; \hat\beta + d_1 SE(\hat\beta)], and analogously for \alpha. For testing H_0: \beta = \beta_0, use

t_0(\hat\beta) = \frac{\hat\beta - \beta_0}{SE(\hat\beta)},

which is t(n-2)-distributed if H_0 is true.
Test procedures:

1. H_1: \beta \ne \beta_0 — reject H_0 if |t_0(\hat\beta)| > d_1, where prob(t(n-2) > d_1) = \gamma/2;
2. H_1^+: \beta > \beta_0 — reject H_0 if t_0(\hat\beta) > d_2, where prob(t(n-2) > d_2) = \gamma;
3. H_1^-: \beta < \beta_0 — reject H_0 if t_0(\hat\beta) < -d_2, where prob(t(n-2) < -d_2) = \gamma.

Just replace \beta by \alpha and \hat\beta by \hat\alpha in the above to obtain test procedures for \alpha (the intercept).
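A sketch of these test computations in Python, continuing the simulated example above (scipy is assumed available for the t critical value):

```python
from scipy import stats

s2 = (u_hat ** 2).sum() / (n - 2)                 # unbiased error variance
sxx = ((x - x.mean()) ** 2).sum()
se_beta = np.sqrt(s2 / sxx)

t0 = beta_hat / se_beta                            # test of H0: beta = 0
d1 = stats.t.ppf(1 - 0.05 / 2, df=n - 2)           # two-sided 5% critical value
ci = (beta_hat - d1 * se_beta, beta_hat + d1 * se_beta)
print(t0, abs(t0) > d1, ci)
```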
E(\hat\alpha|x_1, ..., x_n) = \alpha, E(\hat\beta|x_1, ..., x_n) = \beta and E(s^2|x_1, ..., x_n) = \sigma^2. These expectations do not depend upon the x values, and so the OLS estimators are unconditionally unbiased. Similar remarks apply to probability limits.

var(\hat\alpha|x_1, ..., x_n), var(\hat\beta|x_1, ..., x_n) and cov(\hat\alpha, \hat\beta|x_1, ..., x_n), as given above, do depend on the x values, and so do not correspond to unconditional characteristics.

Fortunately, this does not pose major problems for inference. The variables (\hat\alpha - \alpha)/SE(\hat\alpha) and (\hat\beta - \beta)/SE(\hat\beta) are, given the x values, still distributed as t(n-2). This distribution does not depend on the x values, but just on the degrees of freedom (n-2). Hence the t tests and confidence intervals described above are unconditionally valid.
Prediction

Consider predicting a future value

y_f = \alpha + \beta x_f + u_f, \qquad u_f \sim N(0, \sigma^2).
Reading

Wooldridge, Ch.3, 4.

Multiple regression: y_i = \alpha + \sum_j \beta_j x_{ji} + u_i.

\alpha and the \beta_j are parameters/coefficients.
The regressors x_{ji} vary with i, but are nonrandom (nonstochastic, i.e. fixed in repeated sampling).
\alpha can be regarded as an intercept, with \alpha = E(y_i) given all x_{ji} = 0.
Slopes \beta_j can often be regarded as partial derivatives: \beta_j = \partial E(y_i)/\partial x_{ji}.
Estimation of Parameters

OLS: choose \hat\alpha, \hat\beta_1, ..., \hat\beta_k to minimize

S(\alpha, \beta_1, ..., \beta_k) = \sum_i \left[ y_i - \left( \alpha + \sum_j \beta_j x_{ji} \right) \right]^2.

The first-order conditions give the normal equations

\sum_i \hat u_i = 0, \qquad \sum_i x_{ji}\hat u_i = 0,

for j = 1, ..., k, where \hat u_i is the residual y_i - (\hat\alpha + \sum_j \hat\beta_j x_{ji}), i = 1, ..., n. The MLE of the error variance is \hat\sigma^2 = n^{-1}\sum_i \hat u_i^2.

The log-likelihood is

l(\alpha, \beta_1, ..., \beta_k, \sigma^2) = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{S(\alpha, \beta_1, ..., \beta_k)}{2\sigma^2}.
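A matrix-form sketch of multiple-regression OLS (solving the normal equations X'Xb = X'y directly; simulated data, names illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 300, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])  # intercept + k regressors
beta_true = np.array([1.0, 0.5, -1.0, 2.0])
y = X @ beta_true + rng.normal(scale=0.8, size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)    # OLS via the normal equations
u_hat = y - X @ b
sigma2_ml = (u_hat ** 2).mean()          # MLE of sigma^2 (divides by n)
print(b, sigma2_ml)
```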
Let \hat y_i = \hat\alpha + \sum_j \hat\beta_j x_{ji} denote the fitted value, so y_i = \hat y_i + \hat u_i. The normal equations imply

\sum_i \hat y_i\hat u_i = \sum_i \left(\hat\alpha + \sum_j \hat\beta_j x_{ji}\right)\hat u_i = 0,

so that

\sum_i y_i^2 = \sum_i (\hat y_i + \hat u_i)^2 = \sum_i \hat y_i^2 + \sum_i \hat u_i^2,

and, in deviation form,

\sum_i (y_i - \bar y)^2 = \sum_i (\hat y_i - \bar y)^2 + \sum_i \hat u_i^2, \qquad \text{i.e. } TSS = ESS + RSS.

Goodness of Fit
The coefficient of determination is an index of the goodness of fit of the OLS line, with

R^2 = \frac{ESS}{TSS} = 1 - \frac{RSS}{TSS}, \qquad 0 \le R^2 \le 1.

The OLS estimates satisfy

\hat\alpha = \bar y - \sum_j \hat\beta_j\bar x_j, \qquad \hat\beta_j = \frac{\sum_i \tilde x_{ji} y_i}{\sum_i \tilde x_{ji}^2},

where \tilde x_{ji} is the ith residual from the OLS regression of the jth regressor on the other (k-1) regressors and the intercept term (the partialling-out result).
Substituting for y_i,

\hat\beta_j = \beta_j + \frac{\sum_i \tilde x_{ji} u_i}{\sum_i \tilde x_{ji}^2} = \beta_j + \frac{\sum_i \tilde x_{ji} u_i}{RSS_j},

where RSS_j = \sum_i \tilde x_{ji}^2. Hence E(\hat\beta_j) = \beta_j and var(\hat\beta_j) = \sigma^2/RSS_j, j = 1, ..., k.

Moreover RSS = \sum_i \hat u_i^2 \sim \sigma^2\chi^2(n-k-1), independently of \hat\alpha and the \hat\beta_j, so s^2 = RSS/(n-k-1) is unbiased.
Under normality, \hat\alpha and \hat\beta_j are N(\alpha, var(\hat\alpha)) and N(\beta_j, var(\hat\beta_j)), respectively, so that

z(\hat\alpha) = (\hat\alpha - \alpha)/\sqrt{var(\hat\alpha)} \sim N(0,1),
z(\hat\beta_j) = (\hat\beta_j - \beta_j)/\sqrt{var(\hat\beta_j)} \sim N(0,1).
Confidence Intervals

t(\hat\beta_j) = (\hat\beta_j - \beta_j)/SE(\hat\beta_j) \sim t(n-k-1),

and similarly

t(\hat\alpha) = (\hat\alpha - \alpha)/SE(\hat\alpha) \sim t(n-k-1).

Tests of H_0: \beta_j = \beta_{j0} proceed exactly as in the simple regression case, with critical values such that, e.g., prob(t(n-k-1) > d_2) = \gamma.
Example

Suppose that the null hypothesis to be tested is denoted by H_0 and consists of several linear restrictions on the parameters of the regression model. Thus H_0 specifies the values of, say, q < (k+1) linear combinations of the regression coefficients. For example, with k = 4 and q = 3, H_0 could consist of the following restrictions: \alpha + \beta_1 = 0; \beta_2 = 1; and \beta_4 = 0. We now need a joint test of all the restrictions of H_0, rather than a collection of separate t-tests.
Definition

Define the F statistic by

F = \frac{[RSS(H_0) - RSS(H_1)]/q}{RSS(H_1)/(n-k-1)},

which is F(q, n-k-1)-distributed under H_0.

Prediction

y_f = \alpha + \sum_j \beta_j x_{jf} + u_f, \qquad u_f \sim N(0, \sigma^2).
The OLS estimators use the data for i = 1, ..., n. This predictor is BLUE for E(y_f) = \alpha + \beta x_f.

The predictor \hat y_f is a linear combination of the OLS estimators and so is normally distributed. The variance of \hat y_f can be estimated, and confidence intervals and tests of hypotheses are feasible.
Reading

Wooldridge, Ch.3.

Multicollinearity
The sampling variance of \hat\beta_j can be written

var(\hat\beta_j) = \frac{\sigma^2}{\left[\sum_i (x_{ji} - \bar x_j)^2\right]\left(1 - R_j^2\right)},

where R_j^2 is the R^2 from regressing x_j on the other regressors. High correlation among regressors (R_j^2 close to 1) inflates var(\hat\beta_j).

Reading

Wooldridge, Ch.3, Ch.7, Ch.9.
Consequences

Case 2. We may have omitted some relevant regressors: write the conditional mean function as

E(y_i | x_{ji}, j = 1, ..., k) = \alpha + \sum_j \beta_j x_{ji} + E(f_i | x_{ji}, j = 1, ..., k),

where f_i denotes the omitted factor.
Case 3. If we have a strong belief about the omitted factor, we can use a precise test. For example, if we are sure that f_i is a linear combination of q variables z_{ji}, we can apply an F-test of H_0: \delta_1 = ... = \delta_q = 0 in the expanded model

y_i = \alpha + \sum_j \beta_j x_{ji} + \sum_j \delta_j z_{ji} + u_i, \qquad u_i \sim NID(0, \sigma^2).

Without such a belief, RESET uses powers of the fitted values:

y_i = \alpha + \sum_j \beta_j x_{ji} + \sum_j \delta_j (\hat y_i)^{j+1} + u_i, \qquad u_i \sim NID(0, \sigma^2).
Notes:

1. There is no \hat y_i (first-power) term because this is a linear combination of the intercept term and the regressors x_{ji};
2. The F-test is valid even though the added variables are random;
3. The choice of q has an impact on power;
4. There is no rule for determining the best value of q;
5. Quite small values of q, e.g. 1 or 2, are often used;
6. RESET cannot be expected to indicate how a model should be re-specified;
7. RESET cannot be assumed always to have high power.

Structural stability can be examined by splitting the sample:

y_i = \alpha + \sum_j \beta_j x_{ji} + u_i, \quad u_i \sim NID(0, \sigma^2), \quad \text{if } i \in n_1,
y_i = \alpha' + \sum_j \beta_j' x_{ji} + u_i, \quad u_i \sim NID(0, \sigma^2), \quad \text{if } i \in n_2,
and an F statistic comparing the pooled and split-sample residual sums of squares provides a test of parameter constancy.
Treatment

Reading

Wooldridge, Ch.5.

Non-normal Disturbances
Consequences

In the model y_i = \alpha + \sum_j \beta_j x_{ji} + u_i, i = 1, ..., n, with non-normal errors, the OLS estimators are still BLUE but, in general, are NOT normally distributed. Therefore the t and F tests are no longer valid in finite samples.
Test Procedures; Treatment

Heteroskedasticity: Introduction
Consequences of Heteroskedasticity

With var(u_i) = \sigma_i^2 varying over i,

var(\hat\beta_j) = \frac{\sum_i \tilde x_{ji}^2\sigma_i^2}{\left(\sum_i \tilde x_{ji}^2\right)^2} = \sum_i \tilde x_{ji}^2\sigma_i^2 \Big/ RSS_j^2,

which is not equal to \sigma^2/RSS_j. Conventional OLS standard errors are therefore biased.

Tests for Heteroskedasticity
Goldfeld-Quandt Test

Let RSS_1 and RSS_2 denote the OLS residual sums of squares from estimation using the first m and last m observations, respectively. Under the null hypothesis of homoskedasticity, the statistic GQ = RSS_2/RSS_1 is distributed as F(m-k-1, m-k-1), and large values indicate that the data are inconsistent with the null hypothesis.

Other tests exist in which the form of \sigma_i^2 need not be specified.
Treatment of Heteroskedasticity
Use heteroskedasticity-consistent (robust, White-type) standard errors:

WSE(\hat\beta_j) = \left(\sum_i \tilde x_{ji}^2\hat u_i^2 \Big/ RSS_j^2\right)^{1/2},

which is consistent whether or not the errors are homoskedastic.
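A sketch of the matrix (sandwich) version of these robust standard errors, reusing X and the residuals u_hat from the multiple-regression example above (the single-coefficient WSE formula is the diagonal of this matrix):

```python
import numpy as np

XtX_inv = np.linalg.inv(X.T @ X)
meat = X.T @ (X * (u_hat ** 2)[:, None])   # sum_i u_i^2 x_i x_i'
V_white = XtX_inv @ meat @ XtX_inv         # sandwich (HC0) covariance estimator
wse = np.sqrt(np.diag(V_white))
print(wse)
```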
With prob(-d_1 \le N(0,1) \le d_1) = 1 - \gamma, the (1-\gamma) \times 100 per cent confidence intervals for \alpha and \beta_j are given by \hat\alpha \pm d_1 WSE(\hat\alpha) and \hat\beta_j \pm d_1 WSE(\hat\beta_j), respectively. Also,

t_0^W(\hat\beta_j) = (\hat\beta_j - \beta_{j0})/WSE(\hat\beta_j) \sim N(0,1)

asymptotically under H_0.
Test procedures:

H_1: \beta_j \ne \beta_{j0} — reject H_0 if |t_0^W(\hat\beta_j)| > d_1, where prob(N(0,1) > d_1) = \gamma/2.

Just replace \beta_j by \alpha and \hat\beta_j by \hat\alpha in the above to obtain test procedures relevant to testing hypotheses concerning the intercept.
Consequences of Autocorrelation

With time-series data, \hat\beta_j = \beta_j + \sum_t \tilde x_{jt}u_t/RSS_j. When the errors are autocorrelated,

var\left(\sum_t \tilde x_{jt}u_t\right) \ne \sigma^2\sum_t \tilde x_{jt}^2,

and so var(\hat\beta_j) \ne \sigma^2/RSS_j. Conventional standard errors are, therefore, biased.
The Durbin-Watson statistic is

d = \frac{\sum_{t=2}^{n}(\hat u_t - \hat u_{t-1})^2}{\sum_{t=1}^{n}\hat u_t^2} \approx 2[1 - r(1)], \qquad \text{where } r(1) = \frac{\sum_{t=2}^{n}\hat u_t\hat u_{t-1}}{\sum_{t=1}^{n}\hat u_t^2}.

Lemma

Values of d close to 0 (resp. 4) indicate a high level of positive (resp. negative) residual first-order serial correlation. The distribution of d under the null hypothesis of independent errors depends upon the values of the regressors, so critical values vary from one case to another.

Tables are available for combinations of n and k (and for models with and without an intercept) giving bounds for the critical values for testing H_0 of serial independence against H_1: \rho(1) > 0. These upper and lower bounds, denoted by d_u and d_l, define an interval that contains the true (unknown) critical value. If d < d_l, reject. If d > d_u, do not reject. If d_l \le d \le d_u, the test is inconclusive. For H_1: \rho(1) < 0, use 4 - d_u and 4 - d_l as bounds.
Estimation

Feasible test and correction procedures are implemented in standard econometric software. In the auxiliary regressions, the \hat u_{t-j} are lagged values of the residuals from the OLS estimation, for selected j terms. If t - j is not positive, set \hat u_{t-j} = 0.
Jianhua Gang
School of Finance
Renmin University of China
Spring 2013
Moments

Covariance: E[(Y_t - \mu_t)(Y_{t+j} - \mu_{t+j})] = \gamma_t(j).

Correlation: \rho_t(j) = \dfrac{\gamma_t(j)}{\sigma_t\sigma_{t+j}}.
Operators

Problem

Suppose \{Y_1, Y_2, ..., Y_t, Y_{t+1}, ..., Y_{T-1}, Y_T\} is a single realization from a stochastic process \{Y_t\}. We are interested in the model that generated the time series, but we do not know it. How can we make inference using one single realization?

Lag operator: L, with L Y_t = Y_{t-1}, so L^{-1}Y_t = Y_{t+1}.

Solution

We must use the fact that this is a T-dimensional observation, and restrict the heterogeneity and dependence of the process.
Restrict Heterogeneity

Covariance (weak) stationarity:

E(Y_t) = \mu, \quad \forall t,
E[(Y_t - \mu)(Y_{t+j} - \mu)] = \gamma(j), \quad \forall t,

i.e. the first two moments are finite and do not depend on time (a spatial equivalent of homogeneity).
Strict stationarity: for any j_1, ..., j_n, the joint distribution of (Y_{t+j_1}, ..., Y_{t+j_n}) and of (Y_{t+\tau+j_1}, ..., Y_{t+\tau+j_n}) is the same for any \tau.

Definition

One restriction on the dependence that allows us to consistently estimate the population moments using the sample moments in stationary processes is called ergodicity. A sufficient condition (ergodicity for the mean) is absolute summability of the autocovariances:

\sum_{j=0}^{\infty}|\gamma_j| < \infty.
Restrict Dependence
Forecast; Wold Decomposition

We focus on linear forecasts; of course, in some cases a non-linear forecast may be better. The coefficients (\alpha_1^{(m)}, \alpha_2^{(m)}, ..., \alpha_m^{(m)}) characterise a good linear forecast of Y_{t+1} based on X_t (the m most recent observations) when the forecast error is orthogonal to the information used:

E[(Y_{t+1} - \alpha'X_t)X_t'] = 0,

so that

\hat\alpha = E(X_tX_t')^{-1}E(X_tY_{t+1}).

The Wold decomposition writes any covariance-stationary process as Y_t = \mu + \sum_{j=0}^{\infty}\psi_j\varepsilon_{t-j} with

\psi_0 = 1, \qquad \sum_{j=0}^{\infty}\psi_j^2 < \infty.
Impulse Response

For a process Y_t that admits the representation

Y_t = \mu + \sum_{j=0}^{\infty}\psi_j\varepsilon_{t-j},

with

E(\varepsilon_t) = 0, \qquad E(\varepsilon_t^2) = \sigma^2, \qquad E(\varepsilon_t\varepsilon_s) = 0 \text{ for } s \ne t,

notice that

\frac{\partial Y_t}{\partial\varepsilon_{t-j}} = \psi_j,

so \psi_j traces the impulse response at horizon j.
Autocorrelation Function

Definition (ACF)

\rho(j) = \frac{\gamma(j)}{\gamma(0)}, \qquad j = 1, 2, ...

Definition (PACF)

The m-th partial autocorrelation is the coefficient \alpha_m^{(m)} in the linear projection

\hat Y_{t+1|t,...,t-m+1} = \alpha_1^{(m)}Y_t + \alpha_2^{(m)}Y_{t-1} + ... + \alpha_m^{(m)}Y_{t-m+1}.
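A small sketch of the sample ACF, the estimator used throughout these notes (the T-denominator convention is the one defined formally in the estimation topic later):

```python
import numpy as np

def sample_acf(y, max_lag):
    """Sample autocorrelations rho_hat(1..max_lag), T-denominator convention."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    d = y - y.mean()
    gamma0 = (d @ d) / T
    return np.array([(d[j:] @ d[:T - j]) / T / gamma0
                     for j in range(1, max_lag + 1)])

rng = np.random.default_rng(2)
eps = rng.normal(size=1000)
y = eps[1:] + 0.6 * eps[:-1]      # MA(1) with theta = 0.6
print(sample_acf(y, 3))           # rho_1 near 0.6/(1+0.36) ~ 0.44, rest near 0
```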
Jianhua Gang
School of Finance
Renmin University of China
Spring 2013
Preliminaries

Notation: x^* denotes the sample space and x a random variable, with

\sum_{x \in x^*} f(x) = 1 \; \text{(discrete)}, \qquad \Pr\{x \in x^*\} = \int_{x^*} f(x)\,dx = 1 \; \text{(continuous)}.

Binomial Distribution

Define

f(x) = \frac{n!}{x!(n-x)!}p^x(1-p)^{n-x}, \qquad x = 0, 1, 2, ..., n.

The density sums to one by the binomial theorem:

(a+b)^n = \sum_{x=0}^{n}\frac{n!}{x!(n-x)!}a^x b^{n-x}.
Poisson Distribution

Define

f(x) = \frac{e^{-\lambda}\lambda^x}{x!}, \qquad x = 0, 1, 2, ...

The density arises from the identity

e^{\lambda} = \sum_{x=0}^{\infty}\frac{\lambda^x}{x!},

in which \lambda = E(x).

Normal Distribution

Define

f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\},

written as x \sim N(\mu, \sigma^2), where -\infty < x < \infty.
Moments

E(x) = \sum_{x \in x^*} x\,f(x) \; \text{(discrete)}, \qquad E(x) = \int_{x^*} x\,f(x)\,dx \; \text{(continuous)},

and more generally

E\{g(x)\} = \sum_{x \in x^*} g(x)f(x) \quad \text{or} \quad \int_{x^*} g(x)f(x)\,dx.

Central moments:

\mu_i = E\{(x-\mu)^i\}.
Calculation of Moments

Expectation is linear:

E\{c\,g(x)\} = c\,E\{g(x)\}, \qquad E\{a + b\,g(x)\} = a + b\,E\{g(x)\},

and hence, for example,

\mu_2 = E\{(x-\mu)^2\} = E(x^2) - [E(x)]^2.
Moment-Generating Function (MGF)

M_x(\theta) = E\{e^{\theta x}\} = \sum_{x \in x^*} e^{\theta x}f(x) \quad \text{or} \quad \int_{x^*} e^{\theta x}f(x)\,dx.

Expanding the exponential,

M_x(\theta) = E\left\{1 + \theta x + \frac{\theta^2 x^2}{2!} + ...\right\} = E\left\{\sum_{i=0}^{\infty}\frac{(\theta x)^i}{i!}\right\} = 1 + \theta\mu_1' + \frac{\theta^2}{2!}\mu_2' + \frac{\theta^3}{3!}\mu_3' + ... + \frac{\theta^i}{i!}\mu_i' + ...,

so that

\frac{d^i M_x(\theta)}{d\theta^i}\bigg|_{\theta=0} = \mu_i' \quad \text{(the ith raw moment)}.

Hence we call the function M_x(\theta) the MGF of x. Note that this property is true in either the discrete or the continuous case.
Example of MGF

It is also easy to see that the MGF satisfies two very important properties: for independent random variables the MGF of a sum is the product of the MGFs, and M_{cx}(\theta) = M_x(c\theta).

Example

Observations x_1 through x_n are independent copies of the r.v. x \sim Po(\lambda). Suppose we are interested in the properties (distribution, moments, etc.) of the sample mean

\bar X = \frac{1}{n}\sum_{i=1}^{n}X_i.

Then

M_{\frac{1}{n}\sum x_i}(\theta) = M_{\sum x_i}\left(\frac{\theta}{n}\right) = \prod_{i=1}^{n}M_{x_i}\left(\frac{\theta}{n}\right) = \left[M_x\left(\frac{\theta}{n}\right)\right]^n.
Example of MGF

Problem: calculate the MGF of x \sim Po(\lambda).

Solution:

M_x(\theta) = E\{e^{\theta x}\} = \sum_{x=0}^{\infty}e^{\theta x}\frac{e^{-\lambda}\lambda^x}{x!} = e^{-\lambda}\sum_{x=0}^{\infty}\frac{(\lambda e^{\theta})^x}{x!} = e^{-\lambda}\exp\{\lambda e^{\theta}\} = \exp\{\lambda(e^{\theta}-1)\}.

Problem: calculate the MGF of S_n = n\bar X = \sum_{i=1}^{n}x_i.

Solution:

M_{S_n}(\theta) = \prod_{i=1}^{n}\exp\{\lambda(e^{\theta}-1)\} = \left[\exp\{\lambda(e^{\theta}-1)\}\right]^n = \exp\{n\lambda(e^{\theta}-1)\}.

Note that the MGF of S_n is of the same form as that for x, i.e. letting \lambda^* = n\lambda,

M_{S_n}(\theta) = \exp\{\lambda^*(e^{\theta}-1)\},

i.e. S_n \sim Po(n\lambda) = Po(\lambda^*).
Example of MGF

Problem: calculate the MGF of \bar X.

Solution:

M_{\bar X}(\theta) = M_{S_n}\left(\frac{\theta}{n}\right) = \prod_{i=1}^{n}M_{x_i}\left(\frac{\theta}{n}\right) = \left[M_x\left(\frac{\theta}{n}\right)\right]^n = \left[\exp\left\{\lambda\left(e^{\theta/n}-1\right)\right\}\right]^n = \exp\left\{n\lambda\left(e^{\theta/n}-1\right)\right\}.

Problem: the moments of \bar X.

Solution: differentiate the MGF and evaluate at \theta = 0:

E[\bar X] = \frac{d}{d\theta}\exp\{n\lambda(e^{\theta/n}-1)\}\Big|_{\theta=0} = \lambda,

E[\bar X^2] = \frac{d^2}{d\theta^2}\exp\{n\lambda(e^{\theta/n}-1)\}\Big|_{\theta=0} = \lambda^2 + \frac{\lambda}{n},

so that (central moments)

\sigma^2_{\bar X} = E[\bar X^2] - (E[\bar X])^2 = \frac{\lambda}{n}.

If we consider \bar X as an estimator of \lambda, we refer to these properties as unbiasedness and, since the variance tends to zero as n grows, consistency.
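A quick simulation check of these two properties (unbiasedness and variance \lambda/n), with illustrative values of \lambda and n:

```python
import numpy as np

rng = np.random.default_rng(6)
lam, n, reps = 3.0, 50, 200_000
xbar = rng.poisson(lam, size=(reps, n)).mean(axis=1)

print(xbar.mean(), lam)        # ~3.0: E[X_bar] = lambda
print(xbar.var(), lam / n)     # ~0.06: Var(X_bar) = lambda / n
```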
3 Credits, 51 Hours
Jianhua Gang
School of Finance
Renmin University of China
Spring 2013

Throughout, we use the Wold representation

Y_t = \mu + \sum_{j=0}^{\infty}\psi_j\varepsilon_{t-j}

for the impulse response analysis and for forecasting.
Definition

\{\varepsilon_t\} is white noise if:

E(\varepsilon_t) = 0 \;\; \forall t, \qquad E(\varepsilon_t^2) = \sigma^2 \;\; \forall t, \qquad E(\varepsilon_t\varepsilon_s) = 0 \;\; \forall t \ne s,

so \mu = 0, \gamma_j = 0 and \rho_j = 0 for j \ne 0.

If \varepsilon_t is w.n.(0, \sigma^2) and Y_t = \mu + \sum_{j=0}^{\infty}\psi_j\varepsilon_{t-j}, then Y_t is stationary if

\sum_{j=0}^{\infty}\psi_j^2 < \infty,

and Y_t is ergodic for the mean if

\sum_{j=0}^{\infty}|\psi_j| < \infty.
MA(1)

Y_t = \mu + \varepsilon_t + \theta\varepsilon_{t-1}.

Stationarity always holds, since \sum_{j=0}^{\infty}\psi_j^2 = 1 + \theta^2 < \infty. Otherwise, we can check directly that the first two moments do not depend on time:

1. Mean: E(Y_t) = \mu.
2. Autocovariances:

\gamma_0 = E[(Y_t - \mu)^2] = E[(\varepsilon_t + \theta\varepsilon_{t-1})^2] = (1 + \theta^2)\sigma^2,
\gamma_1 = E[(Y_t - \mu)(Y_{t-1} - \mu)] = \theta\sigma^2,
\gamma_j = 0, \qquad j \ge 2.
Autocorrelations: \rho_1 = \dfrac{\theta}{1+\theta^2}, \quad \rho_j = 0 for j \ge 2.

Invertible MA(1)

Rewrite \varepsilon_t = (Y_t - \mu) - \theta\varepsilon_{t-1} as \varepsilon_t = (Y_t - \mu) - \theta L\varepsilon_t, so that (for |\theta| < 1)

\varepsilon_t = \frac{Y_t - \mu}{1 + \theta L} = (Y_t - \mu)\sum_{j=0}^{\infty}(-\theta)^jL^j = \sum_{j=0}^{\infty}(-\theta)^j(Y_{t-j} - \mu),

i.e. Y_t - \mu = -\sum_{j=1}^{\infty}(-\theta)^j(Y_{t-j} - \mu) + \varepsilon_t: an invertible MA(1) has an AR(\infty) representation.

MA(q)

Y_t = \mu + \varepsilon_t + \theta_1\varepsilon_{t-1} + ... + \theta_q\varepsilon_{t-q}, with

\gamma_0 = E[(\varepsilon_t + \theta_1\varepsilon_{t-1} + ... + \theta_q\varepsilon_{t-q})^2] = (1 + \theta_1^2 + ... + \theta_q^2)\sigma^2,

and, for 0 < j \le q,

\gamma_j = E[(\varepsilon_t + \theta_1\varepsilon_{t-1} + ... + \theta_q\varepsilon_{t-q})(\varepsilon_{t-j} + \theta_1\varepsilon_{t-1-j} + ... + \theta_q\varepsilon_{t-q-j})] = (\theta_j + \theta_1\theta_{j+1} + \theta_2\theta_{j+2} + ... + \theta_{q-j}\theta_q)\sigma^2,

while \gamma_j = 0 for j > q.
MA(\infty)

For Y_t = \mu + \sum_{j=0}^{\infty}\psi_j\varepsilon_{t-j},

\gamma_0 = \sum_{k=0}^{\infty}\psi_k^2\sigma^2, \qquad \gamma_j = \sum_{k=0}^{\infty}\psi_k\psi_{k+j}\sigma^2.

AR(1)

Y_t = c + \phi Y_{t-1} + \varepsilon_t. By recursive substitution,

Y_t = c + \phi(c + \phi Y_{t-2} + \varepsilon_{t-1}) + \varepsilon_t = (1+\phi)c + \phi^2Y_{t-2} + \phi\varepsilon_{t-1} + \varepsilon_t = ... \text{(iterating)}
    = \sum_{j=0}^{n}\phi^jc + \phi^{n+1}Y_{t-n-1} + \sum_{j=0}^{n}\phi^j\varepsilon_{t-j},

and, as n \to \infty with |\phi| < 1,

Y_t = \frac{c}{1-\phi} + \sum_{j=0}^{\infty}\phi^j\varepsilon_{t-j}.
The MA(\infty) coefficients are \psi_j = \phi^j, and absolute summability holds:

\sum_{j=0}^{\infty}|\psi_j| = \sum_{j=0}^{\infty}|\phi|^j = \frac{1}{1-|\phi|} < \infty.

Equivalently, Y_t = (1-\phi L)^{-1}c + (1-\phi L)^{-1}\varepsilon_t. Hence:

Mean: E(Y_t) = \dfrac{c}{1-\phi} \; (= \mu).

Autocovariances:

\gamma_0 = \sum_{k=0}^{\infty}\phi^{2k}\sigma^2 = \frac{\sigma^2}{1-\phi^2},

\gamma_j = \sum_{k=0}^{\infty}\phi^k\phi^{k+j}\sigma^2 = \sum_{k=0}^{\infty}\phi^{2k}\phi^j\sigma^2 = \frac{\phi^j\sigma^2}{1-\phi^2}.
12 / 47
AR(1)
AR(1)
AR(1)
AR(1)
Autocorrelations
Upon knowing that the process is stationary, we could derive the mean and
autocovariances:
j
= j
j =
0
= + Yt 1 + t
Yt = (Yt 1 ) + t
Yt
then
0 = E (Yt )2 = E ((Yt 1 ) + t )2
13 / 47
AR(1)
AR(1)
14 / 47
Solving for \gamma_0,

\gamma_0 = \frac{\sigma^2}{1-\phi^2}.

For j \ge 1,

\gamma_j = E[(Y_t - \mu)(Y_{t-j} - \mu)] = E[(\phi(Y_{t-1} - \mu) + \varepsilon_t)(Y_{t-j} - \mu)] = \phi\gamma_{j-1},

so

\gamma_j = \frac{\phi^j\sigma^2}{1-\phi^2}.

AR(p)

Problem: how can we check for stationarity?

Solution: factoring

(1 - \phi_1L - ... - \phi_pL^p) = (1 - \lambda_1L)...(1 - \lambda_pL),

stationarity follows if |\lambda_j| < 1 for all j. Another way to state this condition is to check that the solutions of the equation in z,

1 - \phi_1z - ... - \phi_pz^p = 0,

are all OUTSIDE the unit circle.
Given stationarity, take expectations of the defining equations (the Yule-Walker system). For the AR(2), Y_t - \mu = \phi_1(Y_{t-1}-\mu) + \phi_2(Y_{t-2}-\mu) + \varepsilon_t:

\gamma_0 = \phi_1\gamma_1 + \phi_2\gamma_2 + \sigma^2,
\gamma_1 = \phi_1\gamma_0 + \phi_2\gamma_1,
\gamma_2 = \phi_1\gamma_1 + \phi_2\gamma_0,

and notice that \rho_1 = \gamma_1/\gamma_0 = \dfrac{\phi_1}{1-\phi_2}. Replacing \gamma_1 and \gamma_2 in the first equation,

\gamma_0 = \left[\frac{\phi_1^2}{1-\phi_2} + \phi_2\left(\frac{\phi_1^2}{1-\phi_2} + \phi_2\right)\right]\gamma_0 + \sigma^2,

which solves to

\gamma_0 = \frac{(1-\phi_2)\,\sigma^2}{(1+\phi_2)\left[(1-\phi_2)^2 - \phi_1^2\right]}.
ARMA(p, q)
Given stationarity:

Mean:

E(Y_t) = E(c + \phi_1Y_{t-1} + ... + \phi_pY_{t-p} + \varepsilon_t + \theta_1\varepsilon_{t-1} + ... + \theta_q\varepsilon_{t-q}) = c + \phi_1\mu + ... + \phi_p\mu + 0 + ... + 0,

so \mu = \dfrac{c}{1 - \phi_1 - ... - \phi_p}.

Autocovariances: the autocovariances are a combination of those of an AR(p) and an MA(q), so for j > q,

\gamma_j = \phi_1\gamma_{j-1} + ... + \phi_p\gamma_{j-p}.

For the ARMA(1,1), Y_t - \mu = \phi(Y_{t-1}-\mu) + \varepsilon_t + \theta\varepsilon_{t-1}:

E[(Y_t - \mu)\varepsilon_t] = E[(\phi(Y_{t-1}-\mu) + \varepsilon_t + \theta\varepsilon_{t-1})\varepsilon_t] = 0 + \sigma^2 + 0 = \sigma^2,
E[(Y_t - \mu)\varepsilon_{t-1}] = E[(\phi(Y_{t-1}-\mu) + \varepsilon_t + \theta\varepsilon_{t-1})\varepsilon_{t-1}] = \phi\sigma^2 + 0 + \theta\sigma^2 = (\phi+\theta)\sigma^2.
so

\gamma_0 = E[(\phi(Y_{t-1}-\mu) + \varepsilon_t + \theta\varepsilon_{t-1})(Y_t - \mu)] = \phi\gamma_1 + \sigma^2 + \theta(\phi+\theta)\sigma^2,

and

\gamma_1 = E[(Y_t - \mu)(Y_{t-1} - \mu)] = \phi\gamma_0 + \theta\sigma^2,

while \gamma_j = \phi\gamma_{j-1} for j \ge 2. Solving the first two equations,

\gamma_0 = \left[1 + \frac{(\phi+\theta)^2}{1-\phi^2}\right]\sigma^2, \qquad \gamma_1 = \left[\phi + \theta + \frac{\phi(\phi+\theta)^2}{1-\phi^2}\right]\sigma^2 = \frac{(\phi+\theta)(1+\phi\theta)}{1-\phi^2}\sigma^2.
The autocorrelations can be derived in the same way: for the generic ARMA(p, q), for j > q,

\rho_j = \phi_1\rho_{j-1} + ... + \phi_p\rho_{j-p}.

IRF of ARMA

Write Y_t = \mu + \phi(L)^{-1}\theta(L)\varepsilon_t = \mu + \psi(L)\varepsilon_t, so the impulse-response weights \psi_j solve \phi(L)\psi(L) = \theta(L):

(1 - \phi_1L - ... - \phi_pL^p)(1 + \psi_1L + \psi_2L^2 + ...) = (1 + \theta_1L + ... + \theta_qL^q).
Matching powers of L:

L^0: \psi_0 = 1,
L^1: \psi_1 = \phi_1 + \theta_1,
L^2: \psi_2 = \phi_2 + \phi_1\psi_1 + \theta_2,
L^3: \psi_3 = \phi_3 + \phi_2\psi_1 + \phi_1\psi_2 + \theta_3,

and so on. For the ARMA(1,1), \psi_1 = \phi + \theta and \psi_j = \phi\psi_{j-1} = (\phi+\theta)\phi^{j-1} for j \ge 2. The ARMA(1,1) can also be decomposed in impulse-response form directly, by looking at

Y_t - \mu = \phi(Y_{t-1} - \mu) + \varepsilon_t + \theta\varepsilon_{t-1}.
Then,

Y_t - \mu = \sum_{j=0}^{\infty}\phi^j\varepsilon_{t-j} + \theta\sum_{j=0}^{\infty}\phi^j\varepsilon_{t-j-1}
          = \varepsilon_t + \sum_{j=1}^{\infty}\phi^j\varepsilon_{t-j} + \sum_{j=1}^{\infty}\theta\phi^{j-1}\varepsilon_{t-j}
          = \varepsilon_t + (\phi+\theta)\sum_{j=1}^{\infty}\phi^{j-1}\varepsilon_{t-j},

confirming \psi_j = (\phi+\theta)\phi^{j-1} for j \ge 1.
Common Factors

Example: if the AR and MA polynomials share a common root, it should be cancelled. For instance, an ARMA model whose polynomials share a common factor can reduce to

(1 - 0.5L)Y_t = \varepsilon_t,

i.e.

Y_t = 0.5Y_{t-1} + \varepsilon_t.
Filters

Sometimes data are treated (by nature or by the researcher) by summing / averaging / differencing, etc. For Y_t = \mu + \psi(L)\varepsilon_t, a filter h(L) is applied as

X_t = h(L)Y_t, \qquad \text{where } h(L) = \sum_{j=-\infty}^{\infty}h_jL^j.

If

\sum_{j=-\infty}^{\infty}|h_j| < \infty \quad \text{and} \quad \sum_{j=0}^{\infty}|\psi_j| < \infty,

then

X_t = \mu^* + \psi^*(L)\varepsilon_t, \qquad \text{where } \mu^* = h(1)\mu, \quad \psi^*(L) = h(L)\psi(L).
Sum of ARMA Processes

(A k-term moving average filter, for instance, sets X_t = k^{-1}\sum_{j=0}^{k-1}Y_{t-j}.)

Example:

Y_t = X_t + v_t, \qquad \text{where } X_t = u_t + \theta u_{t-1},

with u_t and v_t mutually uncorrelated white noises.
To find the implied MA(1) representation, compute the moments of Y_t:

\gamma_0 = (1+\theta^2)\sigma_u^2 + \sigma_v^2,
\gamma_1 = E[(X_t + v_t)(X_{t-1} + v_{t-1})] = E(X_tX_{t-1}) + E(v_tX_{t-1}) + E(X_tv_{t-1}) + E(v_tv_{t-1}) = \theta\sigma_u^2,
\gamma_j = 0, \qquad j \ge 2.

This is the autocovariance structure of an MA(1), so write

Y_t = \varepsilon_t + \theta^*\varepsilon_{t-1}, \qquad \varepsilon_t \sim w.n.(0, \sigma_*^2), \qquad \rho_1 = \frac{\theta^*}{1+\theta^{*2}}.

Solving \rho_1\theta^{*2} - \theta^* + \rho_1 = 0 for \theta^*:

\theta^*_{1,2} = \frac{1 \pm \sqrt{1 - 4\rho_1^2}}{2\rho_1}.
For invertibility, take the root

\theta^* = \frac{1 - \sqrt{1 - 4\rho_1^2}}{2\rho_1}, \qquad \rho_1 = \frac{\theta\sigma_u^2}{(1+\theta^2)\sigma_u^2 + \sigma_v^2},

and recover \sigma_*^2, for example from \theta^*\sigma_*^2 = \theta\sigma_u^2, since in an MA(1) \gamma_1 = \theta^*\sigma_*^2.

Lemma

In general, consider Y_t = X_t + W_t, where X_t and W_t are (zero-mean) stationary processes such that X_t and W_{t-\tau} are not correlated at any t, \tau. Then

E(Y_tY_{t-j}) = E(X_tX_{t-j}) + E(W_tW_{t-j}),

i.e.

\gamma_j^Y = \gamma_j^X + \gamma_j^W.
Sum of Two MA Processes

Lemma

If X_t is MA(q_1) and W_t is MA(q_2), then Y_t is MA(max[q_1, q_2]).

Sum of Two AR Processes

Suppose Y_t = X_t + W_t, where

(1 - \phi L)X_t = u_t, \qquad (1 - \rho L)W_t = v_t \quad (\phi \ne \rho).

Then

(1 - \phi L)(1 - \rho L)(X_t + W_t) = (1 - \rho L)u_t + (1 - \phi L)v_t,

so Y_t is ARMA(2,1). (If \phi = \rho, Y_t is AR(1).)
Sum of Two ARMA Processes

Lemma

If X_t is ARMA(p_1, q_1) and W_t is ARMA(p_2, q_2), then Y_t is ARMA(p, q) with

p \le p_1 + p_2 \qquad \text{and} \qquad q \le \max(p_1 + q_2, p_2 + q_1).
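To tie the ARMA(1,1) results together, a short sketch that simulates the process and evaluates the impulse-response weights \psi_j derived above (parameter values are illustrative):

```python
import numpy as np

def arma11_irf(phi, theta, horizon):
    """Impulse-response weights of an ARMA(1,1):
    psi_0 = 1, psi_1 = phi + theta, psi_j = phi * psi_{j-1} for j >= 2."""
    psi = [1.0, phi + theta]
    for _ in range(2, horizon + 1):
        psi.append(phi * psi[-1])
    return np.array(psi[: horizon + 1])

def simulate_arma11(phi, theta, T, sigma=1.0, seed=0):
    rng = np.random.default_rng(seed)
    eps = rng.normal(scale=sigma, size=T + 1)
    y = np.zeros(T)
    y[0] = eps[1] + theta * eps[0]
    for t in range(1, T):
        y[t] = phi * y[t - 1] + eps[t + 1] + theta * eps[t]
    return y

print(arma11_irf(0.7, 0.3, 5))   # [1., 1., 0.7, 0.49, 0.343, 0.2401]
```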
Jianhua Gang
School of Finance
Renmin University of China
Spring 2013
Sample moments:

Sample mean: \bar Y = \dfrac{1}{T}\sum_{t=1}^{T}Y_t.

Sample autocovariance: \hat\gamma_j = \dfrac{1}{T}\sum_{t=j+1}^{T}(Y_t - \bar Y)(Y_{t-j} - \bar Y).

Sample autocorrelation: \hat\rho_j = \dfrac{\hat\gamma_j}{\hat\gamma_0}.

Let Y = (Y_1, ..., Y_T)' be a normally distributed vector with

E(Y) = \mu, \qquad E[(Y - \mu)(Y - \mu)'] = \Omega.

The joint density in the support of Y is

f_{Y_T,...,Y_1}(y_T, ..., y_1) = (2\pi)^{-T/2}|\Omega|^{-1/2}\exp\left(-\frac{1}{2}(y-\mu)'\Omega^{-1}(y-\mu)\right).
Examples

f(y; \theta) = (2\pi)^{-T/2}|\Omega(\theta)|^{-1/2}\exp\left(-\frac{1}{2}(y-\mu)'\Omega(\theta)^{-1}(y-\mu)\right)

is the likelihood function. Maximizing that function w.r.t. \theta yields the (exact) maximum likelihood estimate. Note the difference between \theta and \theta_0 (the true value).

AR(1) (|\phi_0| < 1):

Y_t = c_0 + \phi_0Y_{t-1} + \varepsilon_t, \qquad \varepsilon_t \sim Nid(0, \sigma_0^2),

with \theta = (c, \phi, \sigma^2)', |\phi| < 1, and

\Omega(\theta) = \frac{\sigma^2}{1-\phi^2}
\begin{pmatrix}
1 & \phi & \phi^2 & ... & \phi^{T-1} \\
\phi & 1 & \phi & ... & \phi^{T-2} \\
... & ... & ... & ... & ... \\
\phi^{T-1} & \phi^{T-2} & \phi^{T-3} & ... & 1
\end{pmatrix}.
MA(1) (|\theta_0| < 1):

Y_t = \mu_0 + \varepsilon_t + \theta_0\varepsilon_{t-1}, \qquad \varepsilon_t \sim Nid(0, \sigma_0^2),

with \theta = (\mu, \theta, \sigma^2)' and

\Omega(\theta) = \sigma^2
\begin{pmatrix}
(1+\theta^2) & \theta & 0 & ... & 0 \\
\theta & (1+\theta^2) & \theta & ... & 0 \\
... & ... & ... & ... & ... \\
0 & ... & 0 & \theta & (1+\theta^2)
\end{pmatrix}.

Numerical illustration: suppose the observed data are

y_1 = 0.5, \quad y_2 = 0.8, \quad y_3 = 0.2, \quad y_4 = 2,

and suppose you want to estimate \theta_0 in the MA(1) model with the additional assumption that \mu_0 = 0 and \sigma_0^2 = 1. Consider five potential values for \theta_0: -0.5, -0.25, 0, 0.25, 0.5. Then we have to compute \Omega(\theta) for each \theta: for example, when \theta = 0.5,
\Omega(\theta) =
\begin{pmatrix}
1.25 & 0.5 & 0 & 0 \\
0.5 & 1.25 & 0.5 & 0 \\
0 & 0.5 & 1.25 & 0.5 \\
0 & 0 & 0.5 & 1.25
\end{pmatrix},

and then

(y-\mu)'\Omega(\theta)^{-1}(y-\mu) =
\begin{pmatrix} 0.5 & 0.8 & 0.2 & 2 \end{pmatrix}
\begin{pmatrix}
1.25 & 0.5 & 0 & 0 \\
0.5 & 1.25 & 0.5 & 0 \\
0 & 0.5 & 1.25 & 0.5 \\
0 & 0 & 0.5 & 1.25
\end{pmatrix}^{-1}
\begin{pmatrix} 0.5 \\ 0.8 \\ 0.2 \\ 2 \end{pmatrix}
= 4.6903.
So,

f = (2\pi)^{-4/2}|\Omega(\theta)|^{-1/2}\exp\left(-\frac{1}{2}(y-\mu)'\Omega(\theta)^{-1}(y-\mu)\right) = (2\pi)^{-4/2}(1.332)^{-1/2}\exp\left(-\frac{1}{2}\times 4.6903\right) = 2.1033 \times 10^{-3}.

Therefore, we may get all the likelihoods for different \theta:

\theta:    -0.5    -0.25   0       0.25    0.5
10^3 f:    3.178   2.618   2.153   1.967   2.103
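A sketch of this exact-likelihood calculation in Python, with \mu_0 = 0 and \sigma_0^2 = 1 as assumed in the slides (numerical output may differ from the slides' rounded table in the last digits):

```python
import numpy as np

y = np.array([0.5, 0.8, 0.2, 2.0])   # the four observations from the slides

def ma1_exact_likelihood(theta, y, sigma2=1.0):
    """Exact Gaussian likelihood of a zero-mean MA(1), via Omega(theta)."""
    T = len(y)
    Omega = sigma2 * ((1 + theta**2) * np.eye(T)
                      + theta * (np.eye(T, k=1) + np.eye(T, k=-1)))
    quad = y @ np.linalg.solve(Omega, y)
    return ((2 * np.pi) ** (-T / 2)
            * np.linalg.det(Omega) ** (-0.5) * np.exp(-0.5 * quad))

for th in (-0.5, -0.25, 0.0, 0.25, 0.5):
    print(th, ma1_exact_likelihood(th, y))   # theta = 0.5 gives ~2.1e-3
```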
ML of AR(1)

The computation of

(2\pi)^{-T/2}|\Omega(\theta)|^{-1/2}\exp\left(-\frac{1}{2}(y-\mu)'\Omega(\theta)^{-1}(y-\mu)\right)

requires inverting a T \times T matrix. Instead, factorize the density. For the AR(1),

Y_1 \sim N\left(\frac{c_0}{1-\phi_0}, \frac{\sigma_0^2}{1-\phi_0^2}\right),

so

f_{Y_1}(y_1; \theta) = (2\pi)^{-1/2}\left(\frac{\sigma^2}{1-\phi^2}\right)^{-1/2}\exp\left[-\frac{1}{2}\frac{\left(y_1 - \frac{c}{1-\phi}\right)^2}{\frac{\sigma^2}{1-\phi^2}}\right].
and, by the same argument,

f_{Y_2|Y_1}(y_2|y_1; \theta) = (2\pi)^{-1/2}(\sigma^2)^{-1/2}\exp\left[-\frac{1}{2}\frac{(y_2 - c - \phi y_1)^2}{\sigma^2}\right].

The joint density factorizes as

f(y_T, ..., y_1; \theta) = f_{Y_1}(y_1; \theta)\prod_{t=2}^{T}f_{Y_t|Y_{t-1},...,Y_1}(y_t|y_{t-1}, ..., y_1; \theta),

where

f_{Y_t|Y_{t-1},...,Y_1}(y_t|y_{t-1}, ..., y_1; \theta) = (2\pi)^{-1/2}(\sigma^2)^{-1/2}\exp\left[-\frac{1}{2}\frac{(y_t - c - \phi y_{t-1})^2}{\sigma^2}\right]

when t = 2, ..., T.
The log-likelihood is

l(\theta) = \ln f_{Y_1}(y_1; \theta) + \sum_{t=2}^{T}\ln f_{Y_t|Y_{t-1}}(y_t|y_{t-1}; \theta)
 = -\frac{1}{2}\ln\left(\frac{2\pi\sigma^2}{1-\phi^2}\right) - \frac{\left(y_1 - \frac{c}{1-\phi}\right)^2}{2\frac{\sigma^2}{1-\phi^2}} - \frac{T-1}{2}\ln(2\pi\sigma^2) - \frac{1}{2}\sum_{t=2}^{T}\frac{(y_t - c - \phi y_{t-1})^2}{\sigma^2}.

We then succeed in writing the (log-)likelihood in a way that does not require the inversion of a T \times T matrix.
Conditional ML of AR(1): treating y_1 as given, maximize

-\frac{T-1}{2}\ln(2\pi\sigma^2) - \frac{1}{2}\sum_{t=2}^{T}\frac{(y_t - c - \phi y_{t-1})^2}{\sigma^2}.

The first-order conditions yield OLS-type estimators:

\hat\phi = \frac{\sum_{t=2}^{T}(y_t - \bar y_.)(y_{t-1} - \bar y_{.-1})}{\sum_{t=2}^{T}(y_{t-1} - \bar y_{.-1})^2}, \qquad \hat c = \bar y_. - \hat\phi\,\bar y_{.-1},

\hat\sigma^2 = \frac{1}{T-1}\sum_{t=2}^{T}(y_t - \hat c - \hat\phi y_{t-1})^2,

where \bar y_. = \frac{1}{T-1}\sum_{t=2}^{T}y_t and \bar y_{.-1} = \frac{1}{T-1}\sum_{t=2}^{T}y_{t-1}.
ML of AR(p)

Assume

Y_t = c_0 + \phi_{0;1}Y_{t-1} + ... + \phi_{0;p}Y_{t-p} + \varepsilon_t, \qquad \varepsilon_t \sim Nid(0, \sigma_0^2),

where the roots of 1 - \phi_{0;1}z - ... - \phi_{0;p}z^p = 0 are outside the unit circle.

Introduce the vector of the first p observations, y_p = (y_1, ..., y_p)', with mean vector \mu_p and variance matrix \sigma^2V_p(\theta).
The density of the first p observations is

f(y_p; \theta) = (2\pi)^{-p/2}|\sigma^2V_p(\theta)|^{-1/2}\exp\left[-\frac{1}{2\sigma^2}(y_p - \mu_p)'V_p(\theta)^{-1}(y_p - \mu_p)\right],

and, for t = p+1, ..., T,

f_{Y_t|Y_{t-1},...,Y_1}(y_t|y_{t-1}, ..., y_1; \theta) = (2\pi)^{-1/2}(\sigma^2)^{-1/2}\exp\left[-\frac{1}{2}\frac{(y_t - c - \phi_1y_{t-1} - ... - \phi_py_{t-p})^2}{\sigma^2}\right].
The log-likelihood is

l(\theta) = -\frac{p}{2}\ln(2\pi) - \frac{1}{2}\ln|\sigma^2V_p(\theta)| - \frac{1}{2\sigma^2}(y_p - \mu_p)'V_p^{-1}(\theta)(y_p - \mu_p)
 - \frac{T-p}{2}\ln(2\pi\sigma^2) - \frac{1}{2}\sum_{t=p+1}^{T}\frac{(y_t - c - \phi_1y_{t-1} - ... - \phi_py_{t-p})^2}{\sigma^2}.

Conditioning on the first p observations again reduces the maximization to OLS.
ML of MA(1)

Suppose Y_t = \mu_0 + \varepsilon_t + \theta_0\varepsilon_{t-1}, i.e. the density of Y_t|\varepsilon_{t-1} is

f_{Y_t|\varepsilon_{t-1}}(y_t|\varepsilon_{t-1}; \theta_0) = \frac{1}{\sqrt{2\pi\sigma_0^2}}\exp\left[-\frac{1}{2}\frac{(y_t - \mu_0 - \theta_0\varepsilon_{t-1})^2}{\sigma_0^2}\right] = \frac{1}{\sqrt{2\pi\sigma_0^2}}\exp\left[-\frac{1}{2}\frac{\varepsilon_t^2}{\sigma_0^2}\right].
With \varepsilon_1(\theta_0) = y_1 - \mu_0 - \theta_0\varepsilon_0, and iterating,

f_{Y_T,...,Y_1|\varepsilon_0}(y_T, ..., y_1|\varepsilon_0; \theta_0) = f_{Y_1|\varepsilon_0}(y_1|\varepsilon_0; \theta_0)\prod_{t=2}^{T}f_{Y_t|\varepsilon_{t-1}}(y_t|\varepsilon_{t-1}; \theta_0)
 = (2\pi)^{-T/2}(\sigma_0^2)^{-T/2}\exp\left[-\sum_{t=1}^{T}\frac{\varepsilon_t(\theta_0)^2}{2\sigma_0^2}\right].

Notice that this is not the (unconditional) density of (Y_T, ..., Y_1) where each Y_t has an MA(1) representation, but rather that density conditional on \varepsilon_0. Moreover, we cannot compute this likelihood directly, because we cannot observe \varepsilon_0.
Therefore, consider the process

Y_t = \mu_0 + \varepsilon_t + \theta_0\varepsilon_{t-1}, \qquad t > 0, \quad \varepsilon_0 = 0.

This process is very similar to the stationary MA(1), and it has the density above (setting \varepsilon_0 = 0); given that we know \varepsilon_0, we can initialize the iterations (for all the admissible values of \theta):

f(y_T, ..., y_1|\varepsilon_0 = 0; \theta) = (2\pi)^{-T/2}(\sigma^2)^{-T/2}\exp\left[-\sum_{t=1}^{T}\frac{\varepsilon_t(\theta)^2}{2\sigma^2}\right],

so the conditional log-likelihood is

l(\theta) = -\frac{T}{2}\ln(2\pi) - \frac{T}{2}\ln(\sigma^2) - \frac{1}{2\sigma^2}\sum_{t=1}^{T}\varepsilon_t^2(\theta),

where the \varepsilon_t(\theta) are computed recursively as \varepsilon_t(\theta) = y_t - \mu - \theta\varepsilon_{t-1}(\theta).
ML of MA(q)

Iterating in the same way, we can formulate a "conditional maximum likelihood" by setting \varepsilon_0 = \varepsilon_{-1} = ... = \varepsilon_{-q+1} = 0 and computing recursively

\varepsilon_t(\theta) = y_t - \mu - \theta_1\varepsilon_{t-1}(\theta) - ... - \theta_q\varepsilon_{t-q}(\theta),

so that

l(\theta) = -\frac{T}{2}\ln(2\pi) - \frac{T}{2}\ln(\sigma^2) - \frac{1}{2\sigma^2}\sum_{t=1}^{T}\varepsilon_t^2(\theta).
ML of ARMA(p, q)

Combining the two devices — conditioning on the first p observations of y and on zero initial values for the \varepsilon's — the conditional log-likelihood again takes the form

l(\theta) = -\frac{T-p}{2}\ln(2\pi) - \frac{T-p}{2}\ln(\sigma^2) - \frac{1}{2\sigma^2}\sum_{t=p+1}^{T}\varepsilon_t^2(\theta),

with \varepsilon_t(\theta) = y_t - c - \phi_1y_{t-1} - ... - \phi_py_{t-p} - \theta_1\varepsilon_{t-1}(\theta) - ... - \theta_q\varepsilon_{t-q}(\theta).
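A minimal sketch of the conditional ML recursion for an MA(1), with the optimization delegated to scipy (assumed available; the log-variance reparameterization is a convenience, not from the slides):

```python
import numpy as np
from scipy.optimize import minimize

def ma1_neg_cond_loglik(params, y):
    """Conditional (eps_0 = 0) negative log-likelihood of an MA(1)."""
    mu, theta, log_sigma2 = params
    sigma2 = np.exp(log_sigma2)             # keeps sigma^2 positive
    eps = np.zeros(len(y))
    eps_prev = 0.0
    for t, yt in enumerate(y):
        eps[t] = yt - mu - theta * eps_prev  # the recursion from the slides
        eps_prev = eps[t]
    T = len(y)
    return 0.5 * (T * np.log(2 * np.pi) + T * np.log(sigma2)
                  + (eps ** 2).sum() / sigma2)

rng = np.random.default_rng(3)
e = rng.normal(size=501)
y = 0.2 + e[1:] + 0.5 * e[:-1]               # true mu = 0.2, theta = 0.5
res = minimize(ma1_neg_cond_loglik, x0=np.zeros(3), args=(y,),
               method="Nelder-Mead")
print(res.x[:2], np.exp(res.x[2]))           # near (0.2, 0.5) and 1.0
```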
Numerical maximization (Newton-Raphson): define

g(\theta^{(0)}) = \frac{\partial l(\theta)}{\partial\theta}\bigg|_{\theta=\theta^{(0)}} \; \text{(gradient)}, \qquad H(\theta^{(0)}) = \frac{\partial^2 l(\theta)}{\partial\theta\,\partial\theta'}\bigg|_{\theta=\theta^{(0)}} \; \text{(Hessian)}.

A second-order expansion around \theta^{(0)} gives

l(\theta) \approx l(\theta^{(0)}) + g(\theta^{(0)})'\left[\theta - \theta^{(0)}\right] + \frac{1}{2}\left[\theta - \theta^{(0)}\right]'H(\theta^{(0)})\left[\theta - \theta^{(0)}\right].

Setting \partial l(\theta)/\partial\theta = 0 at \theta = \hat\theta and solving

g(\theta^{(0)}) + H(\theta^{(0)})\left[\theta - \theta^{(0)}\right] = 0,

i.e.,

\theta = \theta^{(0)} - H(\theta^{(0)})^{-1}g(\theta^{(0)}),

which is iterated until convergence.
Spring 2013
A Sample

Financial return series typically display:

leptokurtosis;
volatility clustering or volatility pooling;
leverage effects.

[Figure: SPX (S&P 500 Index) and CBOE VIX levels, 1990-2009.]

All the models considered so far are linear, or more compactly

y = X\beta + u, \qquad u \sim N(0, \sigma^2).
A Sample

[Figure: SPX daily returns and daily changes in the VIX, 1990-2009.]

Non-linear models: a general data generating process is

y_t = f(u_t, u_{t-1}, u_{t-2}, ...),

often specialised into a conditional-mean part g(\cdot) and a conditional-variance part \sigma^2(\cdot). Models with nonlinear g(\cdot) are non-linear in the mean, while those with nonlinear \sigma^2(\cdot) are non-linear in variance.
Examples of non-linear models:

ARCH / GARCH;
switching models;
bilinear models.

One particular non-linear model that has proved very useful in finance is the ARCH model due to Engle (1982).
Heteroskedasticity Revisited; ARCH Models

Use a model which does not assume that the variance is constant. Recall the definition of the variance of u_t:

\sigma_t^2 = var(u_t|u_{t-1}, u_{t-2}, ...) = E\{[u_t - E(u_t)]^2|u_{t-1}, u_{t-2}, ...\}.

We usually assume that E(u_t) = 0, so

\sigma_t^2 = var(u_t|u_{t-1}, u_{t-2}, ...) = E[u_t^2|u_{t-1}, u_{t-2}, ...].

Now, what could the current value of the variance of the errors plausibly depend upon? Previous squared errors: the ARCH(1) model sets

\sigma_t^2 = \alpha_0 + \alpha_1u_{t-1}^2.

We can easily extend this to the general case where the error variance depends on q lags of squared errors (ARCH(q)).
Testing for ARCH:

1. Run the postulated linear regression and save the residuals \hat u_t.
2. Then square the residuals, and regress them on q own lags to test for ARCH of order q, i.e. run the regression

\hat u_t^2 = \gamma_0 + \gamma_1\hat u_{t-1}^2 + \gamma_2\hat u_{t-2}^2 + ... + \gamma_q\hat u_{t-q}^2 + v_t,

and base the test statistic on the R^2 of this auxiliary regression, under

H_0: \gamma_1 = \gamma_2 = ... = \gamma_q = 0 \quad \text{vs.} \quad H_1: \text{at least one } \gamma_q \ne 0.

How do we decide on q? If the value of the test statistic is greater than the critical value from the \chi^2 distribution, then reject the null hypothesis. Note that the ARCH test is also sometimes applied directly to returns instead of the residuals from Stage 1 above.
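A sketch of the ARCH LM test described above, using the standard TR^2 form (statsmodels is assumed available; the helper name is illustrative):

```python
import numpy as np
import statsmodels.api as sm

def arch_lm_stat(resid, q):
    """Regress squared residuals on q own lags; compare T*R^2 with chi2(q)."""
    u2 = resid ** 2
    Y = u2[q:]
    X = sm.add_constant(np.column_stack(
        [u2[q - j: len(u2) - j] for j in range(1, q + 1)]))
    r2 = sm.OLS(Y, X).fit().rsquared
    return len(Y) * r2        # reject H0 of no ARCH if above chi2(q) critical value
```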
GARCH Models

\sigma_t^2 = \alpha_0 + \alpha_1u_{t-1}^2 + \beta\sigma_{t-1}^2.

This is a GARCH(1,1) model, which is like an ARMA(1,1) model for the variance equation. We could also show that a GARCH(1,1) model can be written as an infinite-order ARCH model. More generally,

\sigma_t^2 = \alpha_0 + \sum_{i=1}^{q}\alpha_iu_{t-i}^2 + \sum_{j=1}^{p}\beta_j\sigma_{t-j}^2.
The unconditional variance of u_t is

var(u_t) = \frac{\alpha_0}{1 - (\alpha_1 + \beta)}

when \alpha_1 + \beta < 1; \alpha_1 + \beta \ge 1 is termed non-stationarity in variance.

Since the model is no longer of the usual linear form, we cannot use OLS; we use maximum likelihood instead. The method works by finding the most likely values of the parameters given the actual data.
Specify the appropriate equations for the mean and the variance, e.g. an AR(1)-GARCH(1,1) model:

y_t = \mu + \phi y_{t-1} + u_t, \qquad u_t \sim N(0, \sigma_t^2),   (1)
\sigma_t^2 = \alpha_0 + \alpha_1u_{t-1}^2 + \beta\sigma_{t-1}^2.   (2)

The log-likelihood function (LLF) is

l = -\frac{T}{2}\log(2\pi) - \frac{1}{2}\sum_{t=1}^{T}\log(\sigma_t^2) - \frac{1}{2}\sum_{t=1}^{T}\frac{(y_t - \mu - \phi y_{t-1})^2}{\sigma_t^2}.

Steps:

1. Set up the LLF.
2. Use regression to get initial guesses for the mean parameters, and choose some initial guesses for the conditional variance parameters.
3. Specify a convergence criterion, either by criterion or by value.

The computer will maximise the function and give parameter values and their standard errors.
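A sketch of the GARCH(1,1) negative log-likelihood described above, for demeaned returns u; initializing the variance recursion at the sample variance is one common convention, not the only one:

```python
import numpy as np

def garch11_neg_loglik(params, u):
    """Negative Gaussian log-likelihood of a GARCH(1,1) for demeaned returns u."""
    a0, a1, beta = params
    if a0 <= 0 or a1 < 0 or beta < 0 or a1 + beta >= 1:
        return np.inf                      # crude positivity/stationarity guard
    T = len(u)
    s2 = np.empty(T)
    s2[0] = u.var()                        # initialization convention
    for t in range(1, T):
        s2[t] = a0 + a1 * u[t - 1] ** 2 + beta * s2[t - 1]
    return 0.5 * np.sum(np.log(2 * np.pi) + np.log(s2) + u ** 2 / s2)

# e.g. minimize over (a0, a1, beta) with scipy.optimize.minimize(...)
```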
Write u_t = v_t\sigma_t with v_t \sim N(0,1), where

\sigma_t = \sqrt{\alpha_0 + \alpha_1u_{t-1}^2 + \beta\sigma_{t-1}^2}, \qquad v_t = \frac{u_t}{\sigma_t}.

The sample counterpart is \hat v_t = \hat u_t/\hat\sigma_t. Are the \hat v_t normal? Typically the \hat v_t are still leptokurtic, although less so than the \hat u_t. Is this a problem? Not really, as we discussed before: we can use ML with a robust variance/covariance estimator. ML with robust standard errors is called Quasi-Maximum Likelihood or QML (also known as pseudo-ML).
24 / 30
T HE EGARCH M ODEL
T HE EGARCH M ODEL
T HE GJR M ODEL
r
2
ut 1
|ut 1 |
+ q
log 2t = + log 2t 1 + q
2t 1
2t 1
Since we model the log 2t , then even if the parameters are negative,
2t will be positive.
We can account for the leverage effect by noticing that a negative
shock (u
t 1 ) has an asymmetric effect on the dependent variable
log 2t as opposed to a positive shock.
T HE GJR M ODEL
25 / 30
\sigma_t^2 = \alpha_0 + \alpha_1u_{t-1}^2 + \beta\sigma_{t-1}^2 + \gamma u_{t-1}^2I_{t-1},

where

I_{t-1} = 1 \text{ if } u_{t-1} < 0, \qquad I_{t-1} = 0 \text{ otherwise}.

For a leverage effect, we would see \gamma > 0. We require \alpha_1 + \gamma \ge 0 and \alpha_1 \ge 0 for the non-negativity conditions.

An Example of GJR
Estimated GJR model (t-ratios in parentheses): conditional mean \hat y_t = 0.172 (3.198); the variance-equation t-ratios are 16.372, 0.437, 14.999 and 5.772, indicating a significant asymmetry (GJR) term.

[Figure: news impact curves — value of conditional variance against the lagged shock, GARCH vs. GJR.]
GARCH-in-Mean

In GARCH-M models the conditional variance (or standard deviation) enters the mean equation, so that the expected return depends on risk.

Uses of GARCH models: GARCH can model the volatility clustering effect, since the conditional variance is autoregressive; such models can be used to forecast volatility. We could show that

Var(y_t|y_{t-1}, y_{t-2}, ...) = Var(u_t|u_{t-1}, u_{t-2}, ...),

so modelling \sigma_t^2 gives forecasts of the variance of y_t.
Spring 2013

MGARCH Family
Problem

Is the impact the same for negative and positive shocks of the same amplitude?
VEC Model (Bollerslev et al. 1988)

y_t = \mu_t(\theta) + \varepsilon_t, \qquad \varepsilon_t = H_t^{1/2}(\theta)z_t, \qquad E(z_t) = 0, \quad Var(z_t) = I_N.

Let

h_t = vech(H_t), \qquad \eta_t = vech(\varepsilon_t\varepsilon_t'),

where vech(\cdot) denotes the operator that stacks the lower triangular portion of an N \times N matrix as an N(N+1)/2 \times 1 vector. The VEC model is

h_t = c + A\eta_{t-1} + Gh_{t-1},

where A and G are square parameter matrices of order N(N+1)/2 and c is an N(N+1)/2 \times 1 parameter vector.
VEC and DVEC (Bollerslev et al. 1988)

In the DVEC model, A and G are restricted to be diagonal. But even under this diagonality, large-scale systems are still highly parameterized and difficult to estimate. An even simpler version of the DVEC (Ding and Engle, 2001) restricts A and G to be positive scalars (the scalar model).
RiskMetrics EWMA:

h_t = (1-\lambda)\eta_{t-1} + \lambda h_{t-1},

which is a scalar VEC. The decay factor proposed by RiskMetrics is \lambda = 0.94 for daily data and \lambda = 0.97 for monthly data. However, the decay factor is not estimated but suggested, and is therefore very hard to justify.
BEKK

Definition (BEKK(1,1,K))

H_t = C'C + \sum_{k=1}^{K}A_k'\varepsilon_{t-1}\varepsilon_{t-1}'A_k + \sum_{k=1}^{K}G_k'H_{t-1}G_k.   (1)

By construction, H_t is positive definite.
VEC and BEKK vs. Factor Models

The difficulty when estimating a VEC or even a BEKK model is the high number of unknown parameters, even after imposing several restrictions. It is thus not surprising that these models are rarely used when the number of series is larger than 3 or 4. Factor and orthogonal models circumvent this difficulty by imposing a common dynamic structure on all the elements of H_t, which results in less parameterized models.

Definition (FGARCH(1,1,K))

In the factor GARCH (Engle et al.), the BEKK matrices are restricted to rank one:

A_k = \alpha_k w_k\lambda_k', \qquad G_k = \beta_k w_k\lambda_k',   (2), (3)

where the weights satisfy

\lambda_k'w_i = \begin{cases} 0 & \text{for } k \ne i \\ 1 & \text{for } k = i \end{cases}, \qquad \sum_{n=1}^{N}w_{kn} = 1.   (4)
Substituting (2) and (3) into (1), and defining \Omega = C'C, we get

H_t = \Omega + \sum_{k=1}^{K}\lambda_k\lambda_k'\left[\alpha_k^2w_k'\varepsilon_{t-1}\varepsilon_{t-1}'w_k + \beta_k^2w_k'H_{t-1}w_k\right].   (5)

For K = 2,

H_t = \Omega + \lambda_1\lambda_1'\left[\alpha_1^2w_1'\varepsilon_{t-1}\varepsilon_{t-1}'w_1 + \beta_1^2w_1'H_{t-1}w_1\right] + \lambda_2\lambda_2'\left[\alpha_2^2w_2'\varepsilon_{t-1}\varepsilon_{t-1}'w_2 + \beta_2^2w_2'H_{t-1}w_2\right].   (6)
Factor Models

Equivalently, the factor structure can be written

\varepsilon_t = \lambda_1f_{1t} + \lambda_2f_{2t} + e_t, \qquad \text{or generally} \quad \varepsilon_t = \Lambda f_t + e_t,

where each factor f_{kt} has zero conditional mean and a conditional variance like a GARCH(1,1) process, and e_t represents an idiosyncratic shock with a constant variance matrix, uncorrelated with the factors.

Orthogonal GARCH

Definition (O-GARCH(1,1,m))

Kariya (1988) and Alexander and Chibumba (1997): the N \times N time-varying variance matrix H_t is generated by m \le N univariate GARCH models.
20 / 44
Multivariate models must allow where one can specify separately (the
individual conditional variances) and the conditional correlation
matrix or other measure of dependence between individual series (like
the copula of the conditional joint density).
A hierarchical procedure:
1
21 / 44
CCC Model

Definition (CCC)

H_t = D_tRD_t, \qquad D_t = diag\left(h_{11t}^{1/2}\; h_{22t}^{1/2}\; ...\; h_{NNt}^{1/2}\right),   (7)

where h_{iit} can be defined as any univariate GARCH model and R is a constant conditional correlation matrix.
DCC Model

The DCC models of Tse and Tsui (2002) and Engle (2002) are useful when modelling high-dimensional data sets.

DCC of Tse and Tsui (DCCT):

H_t = D_tR_tD_t,

where D_t is defined in (7), h_{iit} can be defined as any univariate GARCH model, and

R_t = (1 - \theta_1 - \theta_2)R + \theta_1\Psi_{t-1} + \theta_2R_{t-1},   (8)
where \Psi_{t-1} is the sample correlation matrix of the standardized residuals u_{it} = \varepsilon_{it}/\sqrt{h_{iit}} over the previous M periods, with typical element

\psi_{ij,t-1} = \frac{\sum_{m=1}^{M}u_{i,t-m}u_{j,t-m}}{\sqrt{\left(\sum_{m=1}^{M}u_{i,t-m}^2\right)\left(\sum_{m=1}^{M}u_{j,t-m}^2\right)}}.   (9)
DCC of Engle (DCCE):

H_t = D_tR_tD_t,

where

R_t = diag\left(q_{11,t}^{-1/2}, ..., q_{NN,t}^{-1/2}\right)Q_t\,diag\left(q_{11,t}^{-1/2}, ..., q_{NN,t}^{-1/2}\right),

with Q_t an N \times N symmetric positive definite matrix driven by the standardized residuals.
For example, in the bivariate case the DCCT correlation uses (with M past periods)

\psi_{12,t-1} = \frac{\sum_{m=1}^{M}u_{1,t-m}u_{2,t-m}}{\sqrt{\left(\sum_{m=1}^{M}u_{1,t-m}^2\right)\left(\sum_{m=1}^{M}u_{2,t-m}^2\right)}},   (10)

while in the DCCE,

q_{12,t} = (1 - \theta_1 - \theta_2)\bar q_{12} + \theta_1u_{1,t-1}u_{2,t-1} + \theta_2q_{12,t-1},
q_{22,t} = (1 - \theta_1 - \theta_2)\bar q_{22} + \theta_1u_{2,t-1}^2 + \theta_2q_{22,t-1},

and the implied conditional correlation is

\rho_{12,t} = \frac{q_{12,t}}{\sqrt{q_{11,t}\,q_{22,t}}}.   (11)

Unlike the DCCT, the DCCE model does not formulate the conditional correlation as a weighted sum of past correlations.
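A sketch of the DCCE correlation recursion in Python; the function name and the use of the sample covariance of the standardized residuals for Q-bar are illustrative conventions, not prescribed by the slides:

```python
import numpy as np

def dcc_correlations(u, theta1, theta2):
    """DCC(1,1) of Engle: Q_t recursion on standardized residuals u (T x N)."""
    T, N = u.shape
    Q_bar = (u.T @ u) / T                  # unconditional target (assumption)
    Q = Q_bar.copy()
    R = np.empty((T, N, N))
    for t in range(T):
        d = 1.0 / np.sqrt(np.diag(Q))
        R[t] = Q * np.outer(d, d)          # R_t from Q_t, as in (11)
        Q = ((1 - theta1 - theta2) * Q_bar
             + theta1 * np.outer(u[t], u[t]) + theta2 * Q)
    return R
```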
A model somewhat different from the previous ones, but that nests several of them, is the general dynamic covariance (GDC) model proposed by Kroner and Ng (1998),   (12), (13)

in which D_t = (d_{ijt}) is diagonal with d_{iit} = \sqrt{\theta_{iit}} and the \theta_{ijt} follow BEKK-type dynamics. Elementwise we have

h_{ijt} = \rho_{ijt}\sqrt{\theta_{iit}\theta_{jjt}} + \phi_{ij}\theta_{ijt}, \quad \text{for } i \ne j,
h_{iit} = \theta_{iit}, \quad \text{for any } i.   (14)

Particular parameter choices deliver several of the previous models as special cases.
Estimation Issues

Two-Step MLE: estimate the univariate volatility models in a first step, then the correlation parameters in a second step, maximizing the (quasi-)likelihood in each step.
Diagnostic Checking

It is desirable to check the standardized residuals z_t = H_t^{-1/2}\varepsilon_t. If the model is correctly specified:

E(z_tz_t') = I_N;
Cov(z_{it}^2, z_{jt}^2) = 0, \text{ for all } i \ne j;
Cov(z_{it}^2, z_{j,t-k}^2) = 0, \text{ for } k > 0.
Spring 2013
All the models we have looked at thus far have been single-equation models of the form y = X\beta + u; y is an ENDOGENOUS variable. An example from economics to illustrate: the demand and supply of a good,

Q_{dt} = \alpha + \beta P_t + \gamma S_t + u_t,   (1)
Q_{st} = \lambda + \mu P_t + \kappa T_t + v_t,   (2)
Q_{dt} = Q_{st}.   (3)

Assuming that the market always clears, and dropping the time subscripts for simplicity,

Q = \alpha + \beta P + \gamma S + u,   (4)
Q = \lambda + \mu P + \kappa T + v.   (5)
Re-arranging, by equating (4) and (5):

\alpha + \beta P + \gamma S + u = \lambda + \mu P + \kappa T + v,   (6)

so that

(\beta - \mu)P = (\lambda - \alpha) + \kappa T - \gamma S + (v - u).   (7)

Solving for P,

P = \frac{\lambda - \alpha}{\beta - \mu} + \frac{\kappa}{\beta - \mu}T - \frac{\gamma}{\beta - \mu}S + \frac{v - u}{\beta - \mu}.   (8)

Solving for Q,

Q = \frac{\beta\lambda - \alpha\mu}{\beta - \mu} + \frac{\beta\kappa}{\beta - \mu}T - \frac{\gamma\mu}{\beta - \mu}S + \frac{\beta v - \mu u}{\beta - \mu}.   (9)

(8) and (9) are the reduced-form equations for P and Q.
But what would happen if we had estimated equations (4) and (5), i.e. the structural-form equations, separately using OLS?

Both equations depend on P. One of the CLRM assumptions was that E(X'u) = 0, where X is a matrix containing all the variables on the R.H.S. of the equation. It is clear from (8) that P is related to the errors in (4) and (5), i.e. it is stochastic. Hence the OLS estimator of the coefficient on P is biased (since E(X'u) \ne 0 in general!).
Writing the reduced form as

P = \pi_{10} + \pi_{11}T + \pi_{12}S + \varepsilon_1,   (10)
Q = \pi_{20} + \pi_{21}T + \pi_{22}S + \varepsilon_2,   (11)

we CAN estimate equations (10) and (11) using OLS, since all the R.H.S. variables are exogenous. But we probably don't care what the values of the \pi coefficients are; what we wanted were the original parameters in the structural equations: \alpha, \beta, \gamma, \lambda, \mu, \kappa.

Problem

As well as simultaneity, we sometimes encounter another problem: identification. Consider the following demand and supply equations:

Q = \alpha + \beta P,   (12)
Q = \lambda + \mu P.   (13)

We cannot tell which is which! (The two equations look identical from the OLS point of view.)
An equation is unidentified, like (12) and (13): we cannot get the structural coefficients from the reduced-form estimates.

An equation is exactly identified, e.g. (4) or (5): we can get unique structural-form coefficient estimates.

An equation is over-identified (examples given later): more than one set of structural coefficients could be obtained from the reduced form.
The Order Condition

Definition

Statement of the order condition (from Ramanathan 1995, p. 666): let G denote the number of structural equations. An equation is just identified if the number of variables excluded from that equation is G-1. If more than G-1 are absent, it is over-identified. If fewer than G-1 are absent, it is not identified.

Example

In the following system of equations, the Ys are endogenous, while the Xs are exogenous. Determine whether each equation is over-, under-, or just-identified.

Y_1 = \alpha_0 + \alpha_1Y_2 + \alpha_2Y_3 + \alpha_3X_1 + \alpha_4X_2 + u_1,   (14)
Y_2 = \beta_0 + \beta_1Y_3 + \beta_2X_1 + u_2,   (15)
Y_3 = \gamma_0 + \gamma_1Y_2 + u_3.   (16)
Solution

G = 3. If the number of excluded variables is 2, the equation is just identified; if more than 2, over-identified; if fewer than 2, not identified. Hence:

Equation (14): not identified.
Equation (15): just identified.
Equation (16): over-identified.

The Rank Condition

For example:

y_1 = 3y_2 - 2x_1 + x_2 + u_1,
y_2 = y_3 + x_3 + u_2,
y_3 = y_1 - y_2 - 2x_3 + u_3.
Estimation by 2SLS: write the reduced form

Y_1 = \pi_{10} + \pi_{11}X_1 + \pi_{12}X_2 + v_1,   (17)
Y_2 = \pi_{20} + \pi_{21}X_1 + v_2,   (18)
Y_3 = \pi_{30} + \pi_{31}X_1 + v_3.   (19)

Estimate the reduced-form equations (17)-(19) using OLS, and obtain the fitted values \hat Y_1, \hat Y_2, \hat Y_3.
As a Hausman-type check of whether Y_2 and Y_3 can be treated as exogenous, run regression (14) again, but now also including the fitted values \hat Y_2, \hat Y_3 as additional regressors:

Y_1 = \alpha_0 + \alpha_1Y_2 + \alpha_2Y_3 + \alpha_3X_1 + \alpha_4X_2 + \delta_2\hat Y_2 + \delta_3\hat Y_3 + u_1.   (20)

Recursive Systems

Consider the following system of equations:

Y_1 = \beta_{10} + \gamma_{11}X_1 + \gamma_{12}X_2 + u_1,   (21)
Y_2 = \beta_{20} + \beta_{21}Y_1 + \gamma_{21}X_1 + \gamma_{22}X_2 + u_2,   (22)
Y_3 = \beta_{30} + \beta_{31}Y_1 + \beta_{32}Y_2 + \gamma_{31}X_1 + \gamma_{32}X_2 + u_3.   (23)
Problem

Assume that the error terms are not correlated with each other. Can we estimate the equations individually using OLS?

(21) contains no endogenous variables, so X_1 and X_2 are NOT correlated with u_1. So we can use OLS on (21).

(22) contains the endogenous variable Y_1. We can use OLS on (22) if all the R.H.S. variables are uncorrelated with the error u_2 (true!). In fact, Y_1 is not correlated with u_2 because there is no Y_2 term in equation (21). So we can use OLS on (22), and, by the same argument, on (23).
Indirect Least Squares (ILS)

Definition

If the system is just identified, ILS involves estimating the reduced-form equations using OLS, and then using the estimates to substitute back to obtain the structural parameters.

Two-Stage Least Squares (2SLS):

Stage 1. Obtain and estimate the reduced-form equations using OLS. Save the fitted values for the endogenous variables.

Stage 2. Estimate the structural equations, but replace any R.H.S. endogenous variables with their Stage 1 fitted values.
Y_1 = \alpha_0 + \alpha_1\hat Y_2 + \alpha_2\hat Y_3 + \alpha_3X_1 + \alpha_4X_2 + u_1,   (24)
Y_2 = \beta_0 + \beta_1\hat Y_3 + \beta_2X_1 + u_2,   (25)
Y_3 = \gamma_0 + \gamma_1\hat Y_2 + u_3.   (26)

Now \hat Y_2 and \hat Y_3 will not be correlated with u_1, \hat Y_3 will not be correlated with u_2, and \hat Y_2 will not be correlated with u_3.
Recall that the reason we cannot use OLS directly on the structural equations is that the endogenous variables are correlated with the errors.

One solution: do not use Y_2 or Y_3 directly, but use some other variables instead. We want these other variables to be (highly) correlated with Y_2 and Y_3, but not correlated with the errors: the INSTRUMENTS.

Say z_2 and z_3 are suitable instruments for Y_2 and Y_3, respectively. We do not use the instruments directly, but run regressions of the form

Y_2 = \lambda_1 + \lambda_2z_2 + \varepsilon_1,   (27)
Y_3 = \lambda_3 + \lambda_4z_3 + \varepsilon_2.   (28)
Obtain the fitted values \hat Y_2 and \hat Y_3 from (27) and (28), and replace Y_2 and Y_3 with these in the structural equation. We do not use the instruments directly in the structural equation.

It is typical to use more than one instrument per endogenous variable. If the instruments are the variables in the reduced-form equations, then IV is equivalent to 2SLS.

Problem

What are the instruments?

Solution

2SLS is easier: the instruments are the exogenous variables of the system. Other (full-information) estimation techniques also exist.
Vector Autoregressive (VAR) Models

A VAR generalises the univariate autoregression Y_t = \beta_0 + \beta_1Y_{t-1} + u_t to a vector of series; e.g., in a bivariate VAR(1),

y_{2t} = \beta_{20} + \beta_{21}y_{2,t-1} + \gamma_{21}y_{1,t-1} + u_{2t}.

Issues:

1. VARs are a-theoretical (as are ARMA models). What if the data are not generated by a VAR process?
2. How do we decide the appropriate lag length?
3. So many parameters to estimate!
4. Do we need to ensure all components of the VAR are stationary?
5. How do we interpret the coefficients?

Lag length by likelihood ratio test: LR \sim \chi^2(q), q = number of restrictions. In our case above we restrict 4 lags of two variables in each of the two equations: 4 \times 2 \times 2 = 16 restrictions.

Problem

Conducting the LR test is cumbersome and requires a normality assumption for the disturbances.
Alternatively, use information criteria: the values of the information criteria are constructed for 0, 1, ... lags (up to some pre-specified maximum k_{max}), and the lag length minimizing the criterion is chosen.

A primitive (structural) bivariate VAR allows contemporaneous feedback:

y_{1t} = \beta_{10} + \beta_{11}y_{1,t-1} + \alpha_{12}y_{2t} + u_{1t},
y_{2t} = \beta_{20} + \beta_{21}y_{2,t-1} + \alpha_{21}y_{1t} + u_{2t},

or in compact form

BY_t = \beta_0 + \beta_1Y_{t-1} + u_t.
We can take the contemporaneous terms over to the L.H.S. and write

BY_t = \beta_0 + \beta_1Y_{t-1} + u_t.

We then pre-multiply both sides by B^{-1}:

Y_t = B^{-1}\beta_0 + B^{-1}\beta_1Y_{t-1} + B^{-1}u_t,

or

Y_t = A_0 + A_1Y_{t-1} + e_t.

This is known as a standard-form VAR, which we can estimate using ML as before.
Granger Causality (Block Significance Tests)

With three lags, the y_{2t} equation of the bivariate VAR(3) is, e.g.,

y_{2t} = \beta_{20} + \beta_{22}y_{2,t-1} + \beta_{21}y_{1,t-1} + \gamma_{22}y_{2,t-2} + \gamma_{21}y_{1,t-2} + \delta_{22}y_{2,t-3} + \delta_{21}y_{1,t-3} + u_{2t}.

Hypothesis — implied restriction:

1. Lags of y_{1t} do not explain current y_{2t} — \beta_{21} = 0 and \gamma_{21} = 0 and \delta_{21} = 0;
2. Lags of y_{1t} do not explain current y_{1t} — \beta_{11} = 0 and \gamma_{11} = 0 and \delta_{11} = 0;
3. Lags of y_{2t} do not explain current y_{1t} — \beta_{12} = 0 and \gamma_{12} = 0 and \delta_{12} = 0;
4. Lags of y_{2t} do not explain current y_{2t} — \beta_{22} = 0 and \gamma_{22} = 0 and \delta_{22} = 0.

Each of these four joint hypotheses can be tested within the F-test framework, since each set of restrictions contains only parameters drawn from one equation.
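A sketch of these block-significance (Granger causality) F-tests using statsmodels' VAR implementation (statsmodels and pandas are assumed available; data simulated):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(4)
T = 500
y1 = np.zeros(T); y2 = np.zeros(T)
for t in range(1, T):
    y1[t] = 0.5 * y1[t - 1] + 0.3 * y2[t - 1] + rng.normal()  # y2 Granger-causes y1
    y2[t] = 0.4 * y2[t - 1] + rng.normal()

data = pd.DataFrame({"y1": y1, "y2": y2})
res = VAR(data).fit(maxlags=4, ic="aic")     # lag length by information criterion
print(res.test_causality("y1", ["y2"], kind="f").summary())  # should reject H0
print(res.test_causality("y2", ["y1"], kind="f").summary())  # should not reject
```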
Variance Decompositions

Tests of Randomness

Under the null of independence,

\sqrt{T}\,\hat\rho_j \to_d N(0,1), \qquad j \ge 1.

We can use this property to design two tests to check whether the data are independently distributed, e.g. the portmanteau test:

T\sum_{j=1}^{k}\hat\rho_j^2 \to_d \chi_k^2.
Parsimonious Modelling

Large econometric models tend to do badly in terms of forecasting, and are outperformed by small ARMA models (Box & Jenkins).

Even in ARMA models, increasing the number of parameters reduces the precision with which each parameter is estimated. This is because, when the parameters are estimated, their variance contributes to the variance of the forecast.

Adding extra parameters may then help to reduce or eliminate the forecast bias, but the gain in terms of reduced squared bias can be outweighed by the loss from the increased variance of the forecast.

One should balance the number of estimated parameters and the number of observations. Sometimes, information criteria have been advocated also to select more parsimonious models.
3 Credits, 51 Hours
Jianhua Gang
School of Finance
Renmin University of China
Spring 2013

Example

Examine the series for consumption and income. The presence of trends can sometimes invalidate the usual asymptotic theory for OLS estimation and test procedures. A discussion of trends and related topics, such as tests for unit roots and cointegration, is therefore required.
Applied workers often specify models that include both lagged values of the dependent variable and a distributed lag component in the regression function. These models are called autoregressive distributed lag (ADL) models. A very simple ADL relationship is

y_t = \alpha + \alpha_1y_{t-1} + \beta_0x_t + \beta_1x_{t-1} + u_t, \qquad |\alpha_1| < 1.
Since y_t = \Delta y_t + y_{t-1} and x_t = \Delta x_t + x_{t-1}, the ADL can be written as

(\Delta y_t + y_{t-1}) = \alpha + \alpha_1y_{t-1} + \beta_0(\Delta x_t + x_{t-1}) + \beta_1x_{t-1} + u_t,

and rearranged as the error correction model (ECM)

\Delta y_t = \beta_0\Delta x_t - (1 - \alpha_1)\left[y_{t-1} - \kappa_0 - \kappa_1x_{t-1}\right] + u_t.

Thus the ECM has first differences in y linked to first differences in x and to the extent by which y deviates from its long-run expected value in the previous period.
If the OLS estimates of the ADL are denoted by hats, and the nonlinear least squares estimates of the ECM by tildes, then it can be shown that

\tilde\kappa_0 = \frac{\hat\alpha}{1 - \hat\alpha_1}, \qquad \tilde\kappa_1 = \frac{\hat\beta_0 + \hat\beta_1}{1 - \hat\alpha_1}, \qquad \tilde\beta_0 = \hat\beta_0, \qquad \tilde\alpha_1 = \hat\alpha_1.

This two-step approach can play an important role when the data contain trends, and will be discussed later in further detail.
Trending Variables

Consider the driftless random walk

\Delta z_t = z_t - z_{t-1} = u_t, \qquad u_t \sim NID(0, \sigma^2),   (1)

hence

z_t = \sum_{s=1}^{t}u_s + z_0.

For contrast, a stable process z_t = \rho z_{t-1} + u_t with |\rho| < 1 and z_0 = 0 has

E(z_t) = 0, \qquad var(z_t) = \sigma^2/(1-\rho^2) < \infty, \qquad corr(z_t, z_{t-s}) = \rho^s, \quad s \ge 0.

Thus this I(0) variable has constant mean and constant variance (hence large departures are rare), and its autocorrelations decline as the order increases. A drift term a makes the random walk trend:

z_t = \sum_{s=1}^{t}u_s + at,   (2)
where z_t = \sum_{s=1}^{t}u_s if z_0 = 0 and a = 0. Hence, for the driftless random walk,

E(z_t) = 0, \qquad var(z_t) = t\sigma^2 \; \text{(monotonic in } t\text{)}, \qquad corr(z_t, z_{t-g}) = \sqrt{(t-g)/t} \; \text{(dependent on } t\text{)}.

Note that \Delta z_t = u_t is I(0): the random walk is I(1).
For trend-stationary data, regressing on a deterministic time trend provides a basis for valid estimation and inference, with the additional regressor serving as a trend-removing agent.

However, it has been established that the asymptotic theory of OLS estimators and tests developed for I(0) variables can be misleading when applied to data from I(1) processes. With nonstationary variables, OLS estimators may tend to nonstandard distributions, rather than normality, as n \to \infty.
Unit root tests: in z_t = \rho z_{t-1} + u_t,

if \rho = 1, then z_t \sim I(1); if |\rho| < 1, then z_t \sim I(0).

Let DGP and RE denote the true data generating process and the regression equation used to compute the test, respectively. We will consider three cases.

Case 1. H_1: z_t is a stable AR(1) with zero mean:

z_t = \rho z_{t-1} + u_t, \qquad u_t \sim NID(0, \sigma^2), \quad |\rho| < 1.
The augmented Dickey-Fuller (ADF) regression is

\Delta z_t = A_0 + A_1t + (\rho - 1)z_{t-1} + \sum_{j=1}^{p}\delta_j\Delta z_{t-j} + \epsilon_t,

and the t-ratio on (\rho - 1) is compared with Dickey-Fuller critical values.
Co-integration

Definition: the components of z_t are co-integrated if

1. z_{it} \sim I(d), d > 0, for all i; and
2. there exists a vector \alpha \ne 0 such that the linear combination \alpha'z_t is integrated of a lower order.
The second approach involves applying the ADF test in t-ratio form after OLS estimation of the equation

\Delta\hat u_t = \varphi\hat u_{t-1} + \sum_{j=1}^{p}\delta_j\Delta\hat u_{t-j} + e_t,

where \hat u_t are the residuals of the co-integrating regression (CR). This test is denoted CRADF. The DF tables are not valid for CRADF: asymptotic distributions under the unit root hypothesis depend upon the number of I(1) regressors and whether or not the CR includes an intercept and/or a trend term. Finite-sample critical values have been estimated by computer methods for various cases, and are available in some estimation programs, e.g. PcGive.

Consider, for example, the PPP theory for the exchange rate. In perfect markets there are no arbitrage opportunities, so the exchange rate R is determined by the relative movements of the domestic price level P and the foreign price level P*, i.e.

R = \frac{P}{P^*}, \qquad r = p - p^* \; \text{(in logs)}.

This can be seen as a long-run equilibrium. Data like exchange rates and price levels are usually I(1), so they are quite volatile. However, if the PPP theory is correct, they should not drift apart a lot over time, i.e.

r - (p - p^*) \; \text{should be small (stationary)}.
Moreover, in practice the variables above are usually I(1), so they may co-integrate. Consider, e.g., a money demand relation

m_t = \beta_1r_t + \beta_2\pi_t + \beta_3Y_t.

Residual-based (Engle-Granger type) procedure:

1. Use the DF test to make sure that all the variables are I(1).
2. Use OLS to estimate the model m_t = \hat\beta_1r_t + \hat\beta_2\pi_t + \hat\beta_3Y_t + \hat u_t.
3. Conduct a test on the residuals: if there is co-integration, the residuals must be stationary; otherwise the residuals will be I(1).
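A sketch of this residual-based co-integration procedure with statsmodels (assumed available; simulated data). Recall from above that standard DF critical values are not strictly valid for the residual test:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(5)
T = 400
x = np.cumsum(rng.normal(size=T))        # I(1) regressor
y = 2.0 * x + rng.normal(size=T)         # co-integrated with x

# Step 1: both series look I(1) in levels (large ADF p-values)
print(adfuller(y)[1], adfuller(x)[1])

# Step 2: co-integrating regression by OLS
res = sm.OLS(y, sm.add_constant(x)).fit()

# Step 3: test the residuals for stationarity (CRADF critical values apply)
print(adfuller(res.resid)[0])            # strongly negative test statistic
```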
VAR Model

The VAR model is, as the name suggests, an autoregression of a vector process. Consider the simplest example: a two-variable VAR model with lag of first order (VAR(1)),

\begin{pmatrix} y_{1t} \\ y_{2t} \end{pmatrix} =
\begin{pmatrix} \pi_{11} & \pi_{12} \\ \pi_{21} & \pi_{22} \end{pmatrix}
\begin{pmatrix} y_{1,t-1} \\ y_{2,t-1} \end{pmatrix} +
\begin{pmatrix} \epsilon_{1t} \\ \epsilon_{2t} \end{pmatrix}.

In general, Y_t = \sum_{i=1}^{p}\Phi_iY_{t-i} + \epsilon_t, and it can be reparameterized as

\Delta Y_t = \Pi Y_{t-1} + \sum_{i=1}^{p-1}\Gamma_i\Delta Y_{t-i} + \epsilon_t,

or

\Delta Y_t = \Pi_pY_{t-p} + \sum_{i=1}^{p-1}c_i\Delta Y_{t-i} + \epsilon_t.
VECM

Just like the scalar AR(p) model, the VAR(p) model can be reparameterised as the vector error correction model (VECM)

\Delta Y_t = \Pi_1Y_{t-1} + \sum_{i=1}^{p-1}\Gamma_i\Delta Y_{t-i} + \epsilon_t,

where \Pi_1 and the \Gamma_i are functions of the \Phi_j, so that

\Pi_1Y_{t-1} = \Delta Y_t - \sum_{i=1}^{p-1}\Gamma_i\Delta Y_{t-i} - \epsilon_t.

Note: the right-hand side is I(0), so \Pi_1Y_{t-1} must be I(0) as well, i.e. the rows of the matrix \Pi_1 are co-integrating vectors, and y_{1t} and y_{2t} co-integrate. The rank of \Pi_1 gives the number of linearly independent co-integrating vectors. Note that m = 2 here, so we cannot have more than one linearly independent co-integrating vector.
The result from the last slide can be generalized easily to higher-order VECMs. Consider the model as before, and suppose that Y_t = I(1).

The Johansen tests are based on the estimated eigenvalues \hat\lambda_1 > \hat\lambda_2 > ... of \Pi:

\lambda_{trace}(r) = -T\sum_{i=r+1}^{m}\ln(1 - \hat\lambda_i),

and

\lambda_{max}(r, r+1) = -T\ln(1 - \hat\lambda_{r+1}).
Co-integrating Vectors

The \lambda_{trace}(r) statistic tests the null that the number of co-integrating vectors is less than or equal to r against an unspecified alternative, while \lambda_{max}(r, r+1) tests the null that the number of co-integrating vectors is r against an alternative of r+1.

Write

\Pi_p = \alpha\beta',

where \alpha, \beta are m \times r full-rank matrices. Consider, for example, the case of m = 2. Then if y_{1t}, y_{2t} co-integrate, r = 1 and

\Pi_p = \alpha\beta' = \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix}\begin{pmatrix} \beta_1 & \beta_2 \end{pmatrix} = \begin{pmatrix} \alpha_1\beta_1 & \alpha_1\beta_2 \\ \alpha_2\beta_1 & \alpha_2\beta_2 \end{pmatrix}.
Let e_t = x_t - x_t^* denote the deviation from the long-run equilibrium value x_t^*. The ECM suggests that x_t changes over time to correct disequilibrium errors that occurred in the past, i.e.

\Delta x_t = -\pi e_{t-1},

where \pi is a speed-of-adjustment coefficient.
Spring 2013

Goals; Our Data; Variables

Definition (Endogenous Variable)

Variables that are specific to the phenomenon under study; they allow one to follow its evolution.

Definition (Exogenous Variable)

In order to explain the phenomenon, some variables may have an influence on the endogenous variables while their own values are fixed outside the phenomenon.
Linear System

A_0y_t + A_1y_{t-1} + ... + A_py_{t-p} + B_0x_t + B_1x_{t-1} + ... + B_px_{t-p} + b = 0,   (1)

where the A_j, j = 0, 1, ..., p, are n \times n matrices, the B_j are n \times m matrices, and b is an n \times 1 vector. A_0 is supposed to be nonsingular, so that the whole system allows for a unique determination of the current values of the endogenous variables.
The system

GDP_t = C_t + I_t + G_t,
C_t = a\,GDP_{t-1},
I_t = b(GDP_{t-1} - GDP_{t-2}),   (2)

can be stacked in the form (1) as

\begin{pmatrix} 1 & -1 & -1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} GDP_t \\ C_t \\ I_t \end{pmatrix}
+ \begin{pmatrix} 0 & 0 & 0 \\ -a & 0 & 0 \\ -b & 0 & 0 \end{pmatrix}
\begin{pmatrix} GDP_{t-1} \\ C_{t-1} \\ I_{t-1} \end{pmatrix}
+ \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ b & 0 & 0 \end{pmatrix}
\begin{pmatrix} GDP_{t-2} \\ C_{t-2} \\ I_{t-2} \end{pmatrix}
+ \begin{pmatrix} -1 \\ 0 \\ 0 \end{pmatrix}G_t = 0.   (3)
Randomness: Dynamics and Disturbances

The dynamic model (3) is deterministic and does not reflect short-run disturbances. If the whole dynamics has been correctly included in the initial specification, as in (3), these disturbances should be independent. With random factors, we may re-write the model (2) as

TD_t = C_t + I_t + G_t,
GDP_t = TD_t,
C_t = a\,GDP_{t-1} + u_t,
I_t = b(GDP_{t-1} - GDP_{t-2}) + v_t.   (4)
More compactly,

A_0y_t + A_1y_{t-1} + A_2y_{t-2} + B_0x_t + B_1x_{t-1} + B_2x_{t-2} + b = \epsilon_t.   (5)

Definitions

Adding control (policy) variables z_t, the structural system becomes

A_0y_t + A_1y_{t-1} + ... + A_py_{t-p} + B_0x_t + B_1x_{t-1} + ... + B_px_{t-p} + C_0z_t + C_1z_{t-1} + ... + C_pz_{t-p} + b = \epsilon_t,

with the exogenous variables generated by

x_t + D_1x_{t-1} + ... + D_px_{t-p} + E_0z_t + E_1z_{t-1} + ... + E_pz_{t-p} + F_1y_{t-1} + ... + F_py_{t-p} + d = u_t.   (6)
Weak Exogeneity

The control variables can have an impact on the endogenous variables or the environment variables. However, they do not influence them directly (i.e. they do not alter the structural coefficients A_j, B_j, D_j, F_j). The x's are exogenous because the x_t's are fixed prior to the y_t's (F_0 = 0, and cov(u_t, \epsilon_t) = 0).   (7)

Example

In the Keynesian model, the government can alter G_t so as to influence the economy, e.g. maintain a constant level of expenditure,

G_t = G_{t-1}.   (8)

Completing the system with an equation for the control variables,

z_t + G_1z_{t-1} + ... + G_pz_{t-p} + H_1x_{t-1} + ... + H_px_{t-p} + I_1y_{t-1} + ... + I_py_{t-p} + g = v_t,   (9)

gives a recursive determination: of z, then of x, then of y. However, the policy maker may only assign the values that he wants to the coefficients G_j, H_j, I_j, whereas he does not have any influence on the other parameters of the model.   (10)
The Structural Form

A_0y_t + A_1y_{t-1} + ... + A_py_{t-p} + B_0x_t + B_1x_{t-1} + ... + B_px_{t-p} + b = \epsilon_t.   (11)

Simultaneity: rewriting,

y_t = b + (I - A_0)y_t - A_1y_{t-1} - ... - A_py_{t-p} - B_0x_t - B_1x_{t-1} - ... - B_px_{t-p} + \epsilon_t,   (12)

which shows that each endogenous variable generally depends on the current values of the others.

The Reduced Form

Definition: solving (11) for y_t,

y_t = -A_0^{-1}(A_1y_{t-1} + ... + A_py_{t-p} + B_0x_t + B_1x_{t-1} + ... + B_px_{t-p} + b) + A_0^{-1}\epsilon_t.   (13)
The Final Form; Causality

Definition (Causality)

y causes x at time t if

E(x_t | \underline{x}_{t-1}, \underline{y}_{t-1}) \ne E(x_t | \underline{x}_{t-1}),

and y causes x instantaneously at time t if

E(x_t | \underline{x}_{t-1}, \underline{y}_t) \ne E(x_t | \underline{x}_{t-1}, \underline{y}_{t-1}),

where \underline{x}_t denotes the information in current and past x.

Definition (Noncausality)

1. y does not cause x at time t iff

var(\varepsilon(x_t | \underline{x}_{t-1}, \underline{y}_{t-1})) = var(\varepsilon(x_t | \underline{x}_{t-1}));

2. y does not cause x instantaneously at time t iff

var(\varepsilon(x_t | \underline{x}_{t-1}, \underline{y}_t)) = var(\varepsilon(x_t | \underline{x}_{t-1}, \underline{y}_{t-1})).

Corollary (Symmetry)

The two following statements are equivalent:
1. y does not cause x instantaneously at time t;
2. x does not cause y instantaneously at time t.
Causality Reversal; Limits

The causal structure can be summarized through a lag-polynomial representation of the joint process,

\begin{pmatrix} x_t \\ y_t \end{pmatrix} = \begin{pmatrix} c_x \\ c_y \end{pmatrix} + \begin{pmatrix} \phi_{xx}(L) & \phi_{xy}(L) \\ \phi_{yx}(L) & \phi_{yy}(L) \end{pmatrix}\begin{pmatrix} x_{t-1} \\ y_{t-1} \end{pmatrix} + \text{noise},   (14)

in which y does not cause x iff \phi_{xy}(L) = 0.