26 Autoregressive Conditional Heteroscedasticity (ARCH)

1 INTRODUCTION

2 ARCH(m) MODEL
A model that allows the conditional variance to depend on past realizations of the series is considered in the following. Suppose that
\[ u_t = \sqrt{h_t}\,\varepsilon_t, \tag{1} \]
\[ h_t = \alpha_0 + \alpha_1 u_{t-1}^2 + \alpha_2 u_{t-2}^2 + \cdots + \alpha_m u_{t-m}^2, \tag{2} \]
with $E(\varepsilon_t) = 0$ and $Var(\varepsilon_t) = 1$. This is an example of what will be called an autoregressive conditional heteroscedasticity (ARCH(m)) model.
2.1 Properties of the ARCH(m) Model

2.1.1 The Conditional Mean and Variance
Let $F_{t-1}$ denote the information set available at time $t-1$. The conditional mean of $u_t$ is
\[ E(u_t|F_{t-1}) = \sqrt{h_t}\,E(\varepsilon_t|F_{t-1}) = 0. \tag{3} \]
From (3), the conditional variance of $u_t$ is
\begin{align*}
\sigma_t^2 = Var(u_t|F_{t-1}) &= E\{[u_t - E(u_t|F_{t-1})]^2\,|\,F_{t-1}\} \\
&= E(u_t^2|F_{t-1}) \\
&= E(h_t\varepsilon_t^2|F_{t-1}) \\
&= E(\varepsilon_t^2)\,E(\alpha_0 + \alpha_1 u_{t-1}^2 + \alpha_2 u_{t-2}^2 + \cdots + \alpha_m u_{t-m}^2\,|\,F_{t-1}) \\
&= \alpha_0 + \alpha_1 u_{t-1}^2 + \alpha_2 u_{t-2}^2 + \cdots + \alpha_m u_{t-m}^2 \\
&= h_t.
\end{align*}
From the structure of the model, it is seen that large past squared shocks $u_{t-i}^2$, $i = 1, \ldots, m$, imply a large conditional variance $\sigma_t^2\ (= Var(u_t|F_{t-1}))$ for $u_t$. Consequently, $u_t$ tends to assume a large value. This means that, under the ARCH framework, large shocks tend to be followed by other large shocks. This feature is similar to the volatility clustering observed in asset returns.
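The mechanism is easy to see in simulation. The sketch below (function name and the parameter values $\alpha_0 = 0.2$, $\alpha_1 = 0.5$ are ours, chosen for illustration) generates a Gaussian ARCH(1) path; because $h_t = \alpha_0 + \alpha_1 u_{t-1}^2$, each large realized shock mechanically raises next period's variance:

```python
import math
import random

def simulate_arch1(a0, a1, T, seed=0):
    """Simulate u_t = sqrt(h_t)*eps_t with h_t = a0 + a1*u_{t-1}^2 (ARCH(1))."""
    rng = random.Random(seed)
    u, h = [], []
    u_prev_sq = a0 / (1.0 - a1)  # start at the unconditional variance
    for _ in range(T):
        h_t = a0 + a1 * u_prev_sq          # conditional variance recursion (2)
        u_t = math.sqrt(h_t) * rng.gauss(0.0, 1.0)
        h.append(h_t)
        u.append(u_t)
        u_prev_sq = u_t * u_t
    return u, h

u, h = simulate_arch1(a0=0.2, a1=0.5, T=5000, seed=42)
```

Plotting `u` shows quiet stretches interrupted by bursts of large shocks, i.e. the clustering described above; note also that $h_t \ge \alpha_0$ by construction.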
2.1.2 The Conditional Density

2.1.3 The Unconditional Mean and Variance
The unconditional mean of $u_t$ is $E[E(u_t|F_{t-1})] = E(0) = 0$. While $u_t$ is conditionally heteroscedastic, the unconditional variance of $u_t$ is
\begin{align*}
Var(u_t) &= Var[E(u_t|F_{t-1})] + E[Var(u_t|F_{t-1})] \\
&= 0 + \alpha_0 + \alpha_1 E(u_{t-1}^2) + \alpha_2 E(u_{t-2}^2) + \cdots + \alpha_m E(u_{t-m}^2) \\
&= \alpha_0 + \alpha_1 Var(u_{t-1}) + \alpha_2 Var(u_{t-2}) + \cdots + \alpha_m Var(u_{t-m}).
\end{align*}
Here we have used the fact that $E(u_t|F_{t-1}) = 0$. If the process generating $u_t^2$ is covariance stationary, i.e. all the roots of
\[ 1 - \alpha_1 z - \alpha_2 z^2 - \cdots - \alpha_m z^m = 0 \]
lie outside the unit circle, then the unconditional variance does not change over time, so
\[ Var(u_t) = Var(u_{t-1}) = Var(u_{t-2}) = \cdots = Var(u_{t-m}) = \frac{\alpha_0}{1 - \alpha_1 - \cdots - \alpha_m}. \]
For this ratio to be finite and positive, we require that $\alpha_0 > 0$ and $\alpha_1 + \alpha_2 + \cdots + \alpha_m < 1$. Moreover, since $E(u_t|F_{t-1}) = 0$, we have $E(u_t u_{t-j}) = 0$ for $j \neq 0$; that is, $u_t$ is a white noise process.
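The variance formula can be checked numerically. A minimal sketch (function name and parameter values are ours): with $\alpha_0 = 0.2$, $\alpha_1 = 0.5$ the formula gives $Var(u_t) = 0.2/0.5 = 0.4$, and the sample variance of a long simulated path should be close to it:

```python
import math
import random

def arch_unconditional_variance(a0, alphas):
    """Var(u_t) = a0 / (1 - sum(alphas)), valid when a0 > 0 and sum(alphas) < 1."""
    s = sum(alphas)
    if not (a0 > 0 and s < 1):
        raise ValueError("need a0 > 0 and sum(alphas) < 1 for a finite positive variance")
    return a0 / (1.0 - s)

# Compare against a long simulated ARCH(1) path (illustrative parameter values).
rng = random.Random(7)
a0, a1, T = 0.2, 0.5, 200_000
u_sq_prev, draws = a0 / (1 - a1), []
for _ in range(T):
    h = a0 + a1 * u_sq_prev
    u = math.sqrt(h) * rng.gauss(0.0, 1.0)
    draws.append(u)
    u_sq_prev = u * u
sample_var = sum(x * x for x in draws) / T
theory_var = arch_unconditional_variance(a0, [a1])  # 0.4
```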
2.1.4 Moments of the ARCH(1) Model

Consider the ARCH(1) model
\[ u_t = \sqrt{h_t}\,\varepsilon_t, \tag{4} \]
\[ h_t = \alpha_0 + \alpha_1 u_{t-1}^2, \tag{5} \]
with $\varepsilon_t \sim N(0,1)$. Because the variance of $u_t$ must be positive, we need $0 \le \alpha_1 < 1$. In some applications, we need higher-order moments of $u_t$ to exist and, hence, $\alpha_1$ must also satisfy some additional constraints. For instance, to study its tail behavior, we require that the fourth moment of $u_t$ be finite. Under the normality assumption on $\varepsilon_t$, we have¹
\[ E(u_t^4|F_{t-1}) = 3[E(u_t^2|F_{t-1})]^2 = 3(\alpha_0 + \alpha_1 u_{t-1}^2)^2. \]
Therefore,
\[ E(u_t^4) = E[E(u_t^4|F_{t-1})] = 3E(\alpha_0 + \alpha_1 u_{t-1}^2)^2 = 3E(\alpha_0^2 + 2\alpha_0\alpha_1 u_{t-1}^2 + \alpha_1^2 u_{t-1}^4). \]
If $u_t$ is fourth-order stationary with $m_4 = E(u_t^4)$, then we have
\begin{align*}
m_4 &= 3(\alpha_0^2 + 2\alpha_0\alpha_1 Var(u_t) + \alpha_1^2 m_4) \\
&= 3\alpha_0^2\left(1 + \frac{2\alpha_1}{1-\alpha_1}\right) + 3\alpha_1^2 m_4.
\end{align*}
Consequently,
\[ m_4 = \frac{3\alpha_0^2(1+\alpha_1)}{(1-\alpha_1)(1-3\alpha_1^2)}, \]
which is finite and positive only if $1 - 3\alpha_1^2 > 0$. The unconditional kurtosis of $u_t$ is then
\[ \frac{E(u_t^4)}{[Var(u_t)]^2} = \frac{3\alpha_0^2(1+\alpha_1)}{(1-\alpha_1)(1-3\alpha_1^2)} \cdot \frac{(1-\alpha_1)^2}{\alpha_0^2} = \frac{3(1-\alpha_1^2)}{1-3\alpha_1^2} > 3. \]
Thus the excess kurtosis of $u_t$ is positive and the tail of the distribution of $u_t$ is heavier² than that of a normal distribution. In other words, the shock $u_t$ of a conditionally Gaussian ARCH(1) model is more likely than a Gaussian white noise series to produce outliers. This is in agreement with the empirical finding that outliers appear more often in asset returns than would be implied by an iid sequence of normal random variates.

¹ The skewness $S(X)$ and kurtosis $K(X)$ of a random variable $X$ are defined as
\[ S(X) = \frac{E(X-\mu_X)^3}{\sigma_X^3}, \qquad K(X) = \frac{E(X-\mu_X)^4}{\sigma_X^4}, \]
where $\sigma_X^2 = E(X-\mu_X)^2$. The quantity $K(X)-3$ is called the excess kurtosis because $K(X) = 3$ for a normal distribution. Here $f(u_t|F_{t-1}) \sim N(0, h_t)$, i.e. the conditional density of $u_t$ is normal; therefore $E(u_t^4|F_{t-1})/[E(u_t^2|F_{t-1})]^2 = 3$.

² So the unconditional distribution $f(u_t)$ is not normal.
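The two closed forms above can be wrapped into small helper functions (names ours), which also make the existence condition $3\alpha_1^2 < 1$ explicit:

```python
def arch1_fourth_moment(a0, a1):
    """m4 = 3*a0^2*(1+a1) / ((1-a1)*(1-3*a1^2)); requires 3*a1^2 < 1."""
    if not (a0 > 0 and 0 <= a1 and 3 * a1 * a1 < 1):
        raise ValueError("fourth moment exists only when a0 > 0 and 3*a1^2 < 1")
    return 3 * a0**2 * (1 + a1) / ((1 - a1) * (1 - 3 * a1**2))

def arch1_kurtosis(a1):
    """E(u^4)/[Var(u)]^2 = 3*(1-a1^2)/(1-3*a1^2) for a Gaussian ARCH(1);
    equals 3 (the normal value) only in the degenerate case a1 = 0."""
    return 3 * (1 - a1**2) / (1 - 3 * a1**2)
```

Dividing `arch1_fourth_moment` by the squared unconditional variance $(\alpha_0/(1-\alpha_1))^2$ reproduces `arch1_kurtosis` for any admissible $\alpha_1$.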
2.1.5 An Alternative Representation

The ARCH(m) model can equivalently be written as an AR(m) model for $u_t^2$:
\[ u_t^2 = \alpha_0 + \alpha_1 u_{t-1}^2 + \alpha_2 u_{t-2}^2 + \cdots + \alpha_m u_{t-m}^2 + w_t, \tag{6} \]
where $w_t = u_t^2 - h_t$ is a white noise process with $E(w_t) = 0$ and
\[ E(w_t w_\tau) = \begin{cases} \lambda^2 & \text{for } t = \tau, \\ 0 & \text{otherwise.} \end{cases} \]
Expression (6) implies that
\[ E(u_t^2|F_{t-1}) = \alpha_0 + \alpha_1 u_{t-1}^2 + \alpha_2 u_{t-2}^2 + \cdots + \alpha_m u_{t-m}^2. \tag{7} \]
Since $u_t$ is random and $u_t^2$ cannot be negative, this can be a sensible representation only if (7) is positive and (6) is nonnegative for all realizations of $u_t$. This can be ensured if $w_t$ is bounded from below by $-\alpha_0$ with $\alpha_0 > 0$ and if $\alpha_j \ge 0$ for $j = 1, 2, \ldots, m$. In order for $u_t^2$ to be covariance stationary, we further require that the roots of
\[ 1 - \alpha_1 z - \alpha_2 z^2 - \cdots - \alpha_m z^m = 0 \]
lie outside the unit circle. If the $\alpha_i$ are all nonnegative, this is equivalent to the requirement that
\[ \alpha_1 + \alpha_2 + \cdots + \alpha_m < 1. \]
When these conditions are satisfied, the unconditional variance of $u_t$ is given by
\[ \sigma^2 = E(u_t^2) = \frac{\alpha_0}{1 - \alpha_1 - \alpha_2 - \cdots - \alpha_m}. \]
2.2
For example, for $p = 2$,
\[ \gamma_0 = \frac{(1-\phi_2)\,\sigma^2}{(1+\phi_2)\left[(1-\phi_2)^2 - \phi_1^2\right]}. \]
2.3 Estimation

2.3.1 Pure ARCH(m) Model
Under normality, the likelihood of the sample factors as
\[ f(u_T, u_{T-1}, \ldots, u_1) = \prod_{t=m+1}^{T} \frac{1}{\sqrt{2\pi\sigma_t^2}} \exp\left[-\frac{(u_t - 0)^2}{2\sigma_t^2}\right] \times f(u_m, u_{m-1}, \ldots, u_1), \]
where $\mathcal{U}_t$ is the vector of observations obtained through date $t$:
\[ \mathcal{U}_t = (u_t, u_{t-1}, \ldots, u_1), \]
and we have used the fact that the conditional mean and variance of $u_t$ are $E(u_t|\mathcal{U}_{t-1}) = 0$ and $Var(u_t|\mathcal{U}_{t-1}) = \sigma_t^2$. Since the exact form of $f(u_m, u_{m-1}, \ldots, u_1)$ is complicated, it is commonly dropped from the likelihood function, especially when the sample size is sufficiently large. This results in using the conditional likelihood function
\[ f(u_T, u_{T-1}, \ldots, u_{m+1}|u_m, u_{m-1}, \ldots, u_1) = \prod_{t=m+1}^{T} \frac{1}{\sqrt{2\pi\sigma_t^2}} \exp\left(-\frac{u_t^2}{2\sigma_t^2}\right), \tag{10} \]
where $\sigma_t^2 = \alpha_0 + \alpha_1 u_{t-1}^2 + \alpha_2 u_{t-2}^2 + \cdots + \alpha_m u_{t-m}^2$ can be evaluated recursively.
We refer to estimates obtained by maximizing the likelihood function as the conditional maximum likelihood estimates under normality.
Maximizing the conditional likelihood function is equivalent to maximizing its
logarithm, which is easier to handle. The conditional log likelihood function is
\begin{align*}
l(u_T, u_{T-1}, \ldots, u_{m+1}|u_m, u_{m-1}, \ldots, u_1) &= \ln f(u_T, u_{T-1}, \ldots, u_{m+1}|u_m, u_{m-1}, \ldots, u_1) \\
&= \sum_{t=m+1}^{T}\left[-\frac{1}{2}\ln(2\pi) - \frac{1}{2}\ln(\sigma_t^2) - \frac{1}{2}\frac{u_t^2}{\sigma_t^2}\right]. \tag{11}
\end{align*}
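Equation (11) translates directly into code. A minimal sketch (function name ours, data simulated from an illustrative ARCH(1) with $\alpha_0 = 0.2$, $\alpha_1 = 0.5$) evaluates $\sigma_t^2$ recursively and sums the Gaussian log density; on data generated by the model, the true parameters should score higher than the best homoskedastic rival:

```python
import math
import random

def arch_cond_loglik(u, a0, alphas):
    """Conditional Gaussian log likelihood (11) for an ARCH(m) model:
    sum over t = m+1..T of -0.5*ln(2*pi) - 0.5*ln(sigma2_t) - u_t^2/(2*sigma2_t),
    with sigma2_t = a0 + sum_i alphas[i] * u_{t-1-i}^2 evaluated recursively."""
    m = len(alphas)
    ll = 0.0
    for t in range(m, len(u)):
        sigma2 = a0 + sum(a * u[t - 1 - i] ** 2 for i, a in enumerate(alphas))
        ll += -0.5 * math.log(2 * math.pi) - 0.5 * math.log(sigma2) - u[t] ** 2 / (2 * sigma2)
    return ll

# Simulate an ARCH(1) sample (illustrative values a0 = 0.2, a1 = 0.5).
rng = random.Random(3)
a0, a1 = 0.2, 0.5
u, prev_sq = [], a0 / (1 - a1)
for _ in range(3000):
    h = a0 + a1 * prev_sq
    x = math.sqrt(h) * rng.gauss(0.0, 1.0)
    u.append(x)
    prev_sq = x * x

ll_true = arch_cond_loglik(u, 0.2, [0.5])
ll_homo = arch_cond_loglik(u, 0.4, [0.0])  # homoskedastic rival with Var(u) = 0.4
```

Maximizing `arch_cond_loglik` numerically over `(a0, alphas)` gives the conditional MLE described in the text.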
In some applications, it is more appropriate to assume that $\varepsilon_t$ follows a heavy-tailed distribution such as a standardized Student-$t$ distribution, whose density is
\[ f(\varepsilon_t) = \frac{\Gamma((v+1)/2)}{\Gamma(v/2)\sqrt{(v-2)\pi}}\left(1 + \frac{\varepsilon_t^2}{v-2}\right)^{-(v+1)/2}, \quad v > 2. \]
The conditional likelihood function of $u_t$ then becomes
\[ f(u_T, \ldots, u_{m+1}|u_m, \ldots, u_1) = \prod_{t=m+1}^{T} \frac{\Gamma((v+1)/2)}{\Gamma(v/2)\sqrt{(v-2)\pi}}\,\frac{1}{\sigma_t}\left(1 + \frac{u_t^2}{(v-2)\sigma_t^2}\right)^{-(v+1)/2}, \quad v > 2. \tag{12} \]
We refer to the estimates that maximize this likelihood function as the conditional MLE under the $t$-distribution. The degrees of freedom of the $t$-distribution can be specified a priori or estimated jointly with the other parameters. A value between 3 and 6 is often used if it is prespecified.
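The standardized $t$ density is easy to implement with log-gamma functions (function names ours). As a sanity check, its tail should be heavier than the standard normal's, and for large $v$ it should approach the normal:

```python
import math

def std_t_logpdf(x, v):
    """Log density of the standardized Student-t (unit variance), v > 2:
    Gamma((v+1)/2) / (Gamma(v/2)*sqrt((v-2)*pi)) * (1 + x^2/(v-2))^(-(v+1)/2)."""
    if v <= 2:
        raise ValueError("v > 2 is required for a unit-variance standardization")
    c = math.lgamma((v + 1) / 2) - math.lgamma(v / 2) - 0.5 * math.log((v - 2) * math.pi)
    return c - ((v + 1) / 2) * math.log(1 + x * x / (v - 2))

def normal_logpdf(x):
    """Log density of the standard normal, for comparison."""
    return -0.5 * math.log(2 * math.pi) - 0.5 * x * x
```

Replacing the Gaussian log density in the ARCH log likelihood with `std_t_logpdf(u_t / sigma_t, v) - log(sigma_t)` gives the conditional log likelihood corresponding to (12).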
2.3.2 Regression Model with ARCH Disturbances

For the regression model $Y_t = \mathbf{x}_t'\boldsymbol{\beta} + u_t$ with ARCH(m) disturbances, the conditional log likelihood is
\[ l = -\frac{T-m}{2}\ln(2\pi) - \frac{1}{2}\sum_{t=m+1}^{T}\ln(\sigma_t^2) - \frac{1}{2}\sum_{t=m+1}^{T}\frac{(Y_t - \mathbf{x}_t'\boldsymbol{\beta})^2}{\sigma_t^2}, \tag{13} \]
where
\begin{align*}
\sigma_t^2 &= \alpha_0 + \alpha_1 u_{t-1}^2 + \alpha_2 u_{t-2}^2 + \cdots + \alpha_m u_{t-m}^2 \\
&= \alpha_0 + \alpha_1(Y_{t-1} - \mathbf{x}_{t-1}'\boldsymbol{\beta})^2 + \alpha_2(Y_{t-2} - \mathbf{x}_{t-2}'\boldsymbol{\beta})^2 + \cdots + \alpha_m(Y_{t-m} - \mathbf{x}_{t-m}'\boldsymbol{\beta})^2 \tag{14}
\end{align*}
and the vector of parameters to be estimated is $\boldsymbol{\theta}' = (\boldsymbol{\beta}', \alpha_0, \alpha_1, \alpha_2, \ldots, \alpha_m)'$. For a given numerical value of the parameter vector $\boldsymbol{\theta}$, the sequence of conditional
variances can be calculated from (14) and used to evaluate the log likelihood function (13). This can then be maximized numerically using the methods described in Chapter 3.⁴
Under the Student-$t$ assumption, the conditional log likelihood is instead
\begin{align*}
\sum_{t=m+1}^{T}\ln f(Y_t|\mathbf{x}_t, \mathcal{Y}_{t-1}; \boldsymbol{\theta}) = {} & (T-m)\ln\left\{\frac{\Gamma[(v+1)/2]}{\Gamma(1/2)\Gamma(v/2)}(v-2)^{-1/2}\right\} - \frac{1}{2}\sum_{t=m+1}^{T}\ln(\sigma_t^2) \\
& - \frac{v+1}{2}\sum_{t=m+1}^{T}\ln\left[1 + \frac{(Y_t - \mathbf{x}_t'\boldsymbol{\beta})^2}{\sigma_t^2(v-2)}\right],
\end{align*}
where
\begin{align*}
\sigma_t^2 &= \alpha_0 + \alpha_1 u_{t-1}^2 + \alpha_2 u_{t-2}^2 + \cdots + \alpha_m u_{t-m}^2 \\
&= \alpha_0 + \alpha_1(Y_{t-1} - \mathbf{x}_{t-1}'\boldsymbol{\beta})^2 + \alpha_2(Y_{t-2} - \mathbf{x}_{t-2}'\boldsymbol{\beta})^2 + \cdots + \alpha_m(Y_{t-m} - \mathbf{x}_{t-m}'\boldsymbol{\beta})^2.
\end{align*}
⁴ Imposing the stationarity condition ($\sum_{j=1}^{m}\alpha_j < 1$) and the nonnegativity conditions ($\alpha_j \ge 0$, $\forall j$) can be difficult in practice. Typically, either the value of $m$ is very small or else some ad hoc structure is imposed on the sequence $\{\alpha_j\}_{j=1}^{m}$ as in Engle (1982, eq. (38)).
2.4 Testing for ARCH Disturbances

2.4.1 The Lagrange Multiplier Test
In the linear regression model, OLS is the appropriate procedure if the disturbances are not conditionally heteroscedastic. Because the ARCH model requires an iterative estimation procedure, it may be desirable to test whether it is appropriate before going to the effort of estimating it. The Lagrange multiplier test procedure is ideal for this, as in many similar cases; see, for example, Breusch and Pagan (1979, 1980), Godfrey (1978), and Engle (1979).
Consider the following linear regression model:
\begin{align*}
Y_t &= \mathbf{x}_t'\boldsymbol{\beta} + u_t, \\
u_t &= \sqrt{h_t}\,\varepsilon_t, \\
h_t &= \alpha_0 + \alpha_1 u_{t-1}^2 + \alpha_2 u_{t-2}^2 + \cdots + \alpha_m u_{t-m}^2 = \boldsymbol{\alpha}'\mathbf{z}_t,
\end{align*}
where $\boldsymbol{\alpha} = [\alpha_0, \alpha_1, \ldots, \alpha_m]'$ and $\mathbf{z}_t = [1, u_{t-1}^2, \ldots, u_{t-m}^2]'$. Under the null hypothesis, $\alpha_1 = \cdots = \alpha_m = 0$. The test is based upon the score and the information matrix evaluated under the null. The likelihood function, omitting the relevant constants, can be written as
\[ l(\boldsymbol{\alpha}) = -\frac{1}{2}\sum_{t=m+1}^{T}\ln(h_t) - \frac{1}{2}\sum_{t=m+1}^{T}\frac{u_t^2}{h_t}. \tag{15} \]
The score is therefore
\[ \frac{\partial l}{\partial\boldsymbol{\alpha}} = \sum_{t=m+1}^{T}\frac{1}{2h_t}\frac{\partial h_t}{\partial\boldsymbol{\alpha}}\left(\frac{u_t^2}{h_t} - 1\right). \]
Under the null hypothesis that $\alpha_1 = \cdots = \alpha_m = 0$, the estimate of $\boldsymbol{\beta}$ is simply the OLS estimator $\hat{\boldsymbol{\beta}}$, with residuals $e_t$. Therefore, under the null,
\[ h_t = \alpha_0, \qquad \left.\frac{\partial h_t}{\partial\boldsymbol{\alpha}}\right|_{H_0} = [1, e_{t-1}^2, \ldots, e_{t-m}^2]' = \mathbf{z}_t, \]
so the score becomes
\[ \left.\frac{\partial l}{\partial\boldsymbol{\alpha}}\right|_{H_0} = \sum_{t=m+1}^{T}\frac{1}{2\hat{\alpha}_0}\mathbf{z}_t\left(\frac{e_t^2}{\hat{\alpha}_0} - 1\right) = \frac{1}{2\hat{\alpha}_0}\mathbf{Z}'\mathbf{g},
\]
where
\[ \mathbf{Z}' = [\mathbf{z}_{m+1}\ \mathbf{z}_{m+2}\ \cdots\ \mathbf{z}_T], \qquad \mathbf{g} = \left[\left(\frac{e_{m+1}^2}{\hat{\alpha}_0} - 1\right)\ \left(\frac{e_{m+2}^2}{\hat{\alpha}_0} - 1\right)\ \cdots\ \left(\frac{e_T^2}{\hat{\alpha}_0} - 1\right)\right]'. \]
The second derivative of (15) is
\[ \frac{\partial^2 l}{\partial\boldsymbol{\alpha}\partial\boldsymbol{\alpha}'} = \sum_{t=m+1}^{T}\left[-\frac{1}{2h_t^2}\frac{\partial h_t}{\partial\boldsymbol{\alpha}}\frac{\partial h_t}{\partial\boldsymbol{\alpha}'}\left(\frac{2u_t^2}{h_t} - 1\right) + \left(\frac{u_t^2}{h_t} - 1\right)\frac{\partial}{\partial\boldsymbol{\alpha}'}\left(\frac{1}{2h_t}\frac{\partial h_t}{\partial\boldsymbol{\alpha}}\right)\right]. \]
Since $E(u_t^2|F_{t-1}) = h_t$, the conditional expectation of the second term is zero, and that of the factor $(2u_t^2/h_t - 1)$ in the first term is just one. Hence the information matrix, which is simply the negative expectation of the Hessian matrix over all observations, becomes
\[ \mathcal{I} = \sum_{t=m+1}^{T}E\left[\frac{1}{2h_t^2}\frac{\partial h_t}{\partial\boldsymbol{\alpha}}\frac{\partial h_t}{\partial\boldsymbol{\alpha}'}\right], \]
which, evaluated under the null, is
\[ \mathcal{I}\big|_{H_0} = \sum_{t=m+1}^{T}\frac{1}{2\hat{\alpha}_0^2}\mathbf{z}_t\mathbf{z}_t' = \frac{1}{2\hat{\alpha}_0^2}\mathbf{Z}'\mathbf{Z}. \]
The LM statistic is therefore
\[ LM = \left(\frac{1}{2\hat{\alpha}_0}\mathbf{Z}'\mathbf{g}\right)'\left(\frac{1}{2\hat{\alpha}_0^2}\mathbf{Z}'\mathbf{Z}\right)^{-1}\left(\frac{1}{2\hat{\alpha}_0}\mathbf{Z}'\mathbf{g}\right) = \frac{1}{2}\mathbf{g}'\mathbf{Z}(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}'\mathbf{g}, \]
which under the null hypothesis is asymptotically distributed as a $\chi_m^2$ distribution.
Moreover, under the null and normality,
\begin{align*}
\text{plim}\,\frac{\mathbf{g}'\mathbf{g}}{T} &= \text{plim}\,\frac{1}{T}\sum_{t=m+1}^{T}\left(\frac{e_t^2}{\hat{\alpha}_0} - 1\right)^2 \\
&= \text{plim}\left[\frac{\sum e_t^4/T}{\hat{\alpha}_0^2} - \frac{2\sum e_t^2/T}{\hat{\alpha}_0} + 1\right] \\
&= 3 - 2 + 1 = 2 \qquad (\text{under normality}),
\end{align*}
since $\text{plim}\,\sum e_t^4/T = 3\alpha_0^2$ and $\text{plim}\,\sum e_t^2/T = \alpha_0 = \text{plim}\,\hat{\alpha}_0$. Thus, an asymptotically equivalent statistic would be
\[ LM = TR^2, \]
where $R^2$ is the squared multiple correlation from the regression of $\mathbf{g}$ on $\mathbf{Z}$, i.e. of $e_t^2$ on an intercept and the $m$ lagged squared residuals.
2.4.2 An Equivalent F Test
It is simple to test whether the residuals $\hat{u}_t$ from a regression model exhibit time-varying heteroscedasticity without having to estimate the ARCH parameters. Engle (1982) derived the following test from the Lagrange multiplier principle. The test is equivalent to the usual $F$ statistic for testing $\alpha_i = 0$, $i = 1, \ldots, m$, in the linear regression
\[ \hat{u}_t^2 = \alpha_0 + \alpha_1\hat{u}_{t-1}^2 + \alpha_2\hat{u}_{t-2}^2 + \cdots + \alpha_m\hat{u}_{t-m}^2 + e_t, \quad t = m+1, \ldots, T, \tag{16} \]
where $e_t$ denotes the error term, $m$ is a prespecified positive integer, and $T$ is the sample size. Let $SSE_0 = \sum_{t=m+1}^{T}(\hat{u}_t^2 - \bar{\omega})^2$, where $\bar{\omega}$ is the sample mean of $\hat{u}_t^2$, and $SSE_1 = \sum_{t=m+1}^{T}\hat{e}_t^2$, where $\hat{e}_t$ is the least squares residual of the linear regression (16). Then we have
\[ F = \frac{(SSE_0 - SSE_1)/m}{SSE_1/(T - 2m - 1)}. \]
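A sketch of the $TR^2$ form of this test with $m = 1$ (function name ours; the simulated series use illustrative parameter values). An ARCH series should yield a far larger statistic than an iid Gaussian series:

```python
import math
import random

def arch_lm_test_m1(resid):
    """TR^2-style ARCH test with m = 1: regress e_t^2 on an intercept and
    e_{t-1}^2 and return n*R^2, asymptotically chi-squared(1) under the null."""
    y = [e * e for e in resid[1:]]
    x = [e * e for e in resid[:-1]]
    n = len(y)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    # For a simple regression, R^2 is the squared sample correlation.
    r2 = (sxy * sxy) / (sxx * syy) if sxx > 0 and syy > 0 else 0.0
    return n * r2

rng = random.Random(11)
iid = [rng.gauss(0.0, 1.0) for _ in range(2000)]          # homoskedastic benchmark
arch, prev_sq = [], 0.2 / (1 - 0.3)                        # ARCH(1): a0 = 0.2, a1 = 0.3
for _ in range(2000):
    h = 0.2 + 0.3 * prev_sq
    x = math.sqrt(h) * rng.gauss(0.0, 1.0)
    arch.append(x)
    prev_sq = x * x

lm_arch = arch_lm_test_m1(arch)   # large: squared series is autocorrelated
lm_iid = arch_lm_test_m1(iid)     # small: roughly a chi-squared(1) draw
```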
3 GARCH MODEL

Although the ARCH model is simple, it often requires many parameters to adequately describe the volatility process of, say, an asset return, so some alternative model must be sought. Bollerslev (1986) proposes a useful extension known as the generalized ARCH (GARCH) model.

The white noise process $u_t$ follows a GARCH(s, m) model if
\[ u_t = \sqrt{h_t}\,\varepsilon_t, \qquad h_t = \alpha_0 + \sum_{j=1}^{s}\beta_j h_{t-j} + \sum_{i=1}^{m}\alpha_i u_{t-i}^2, \tag{17} \]
where again $\varepsilon_t$ is a sequence of iid random variables with mean 0 and variance 1, $\alpha_0 > 0$, $\alpha_i \ge 0$, $\beta_j \ge 0$, and $\sum_{i=1}^{\max(m,s)}(\alpha_i + \beta_i) < 1$.
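The recursion (17) simulates just like the ARCH case, with the extra lagged-variance term. A minimal GARCH(1,1) sketch (function name and parameter values ours):

```python
import math
import random

def simulate_garch11(a0, a1, b1, T, seed=0):
    """Simulate u_t = sqrt(h_t)*eps_t with h_t = a0 + b1*h_{t-1} + a1*u_{t-1}^2."""
    rng = random.Random(seed)
    u, h = [], []
    h_prev = a0 / (1.0 - a1 - b1)  # start at the unconditional variance
    u_prev = math.sqrt(h_prev) * rng.gauss(0.0, 1.0)
    for _ in range(T):
        h_t = a0 + b1 * h_prev + a1 * u_prev * u_prev
        u_t = math.sqrt(h_t) * rng.gauss(0.0, 1.0)
        h.append(h_t)
        u.append(u_t)
        h_prev, u_prev = h_t, u_t
    return u, h

u, h = simulate_garch11(a0=0.1, a1=0.1, b1=0.8, T=5000, seed=1)
```

With $\beta_1$ close to 1, the conditional variance moves slowly, which is why a GARCH(1,1) often fits as well as a high-order ARCH.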
3.1 Properties of the GARCH Model

3.1.1 An ARMA Representation
Let $\eta_t = u_t^2 - h_t$. The GARCH model (17) can be rewritten as
\[ u_t^2 = \alpha_0 + \sum_{i=1}^{\max(m,s)}(\alpha_i + \beta_i)u_{t-i}^2 + \eta_t - \sum_{j=1}^{s}\beta_j\eta_{t-j}. \tag{18} \]
Here, it is understood that $\alpha_i = 0$ for $i > m$ and $\beta_j = 0$ for $j > s$. Notice that $h_t$ is the forecast of $u_t^2$ based on its own lagged values and thus $\eta_t = u_t^2 - h_t$ is the error associated with this forecast. Thus, $\eta_t$ is a white noise process that is fundamental for $u_t^2$, and (18) is recognized as an ARMA form for the squared series $u_t^2$.
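Representation (18) is an algebraic identity along any path, which can be verified numerically. The sketch below (illustrative GARCH(1,1) parameter values, ours) simulates a path and checks $u_t^2 = \alpha_0 + (\alpha_1+\beta_1)u_{t-1}^2 + \eta_t - \beta_1\eta_{t-1}$ with $\eta_t = u_t^2 - h_t$:

```python
import math
import random

a0, a1, b1 = 0.1, 0.1, 0.8
rng = random.Random(5)
h_prev = a0 / (1 - a1 - b1)
u_prev = math.sqrt(h_prev) * rng.gauss(0.0, 1.0)
u, h = [u_prev], [h_prev]
for _ in range(500):
    h_t = a0 + b1 * h[-1] + a1 * u[-1] ** 2    # GARCH(1,1) recursion (17)
    u.append(math.sqrt(h_t) * rng.gauss(0.0, 1.0))
    h.append(h_t)

eta = [ui * ui - hi for ui, hi in zip(u, h)]    # forecast errors eta_t = u_t^2 - h_t
# Identity (18) specialized to GARCH(1,1), checked at every t:
max_err = max(
    abs(u[t] ** 2 - (a0 + (a1 + b1) * u[t - 1] ** 2 + eta[t] - b1 * eta[t - 1]))
    for t in range(1, len(u))
)
```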
3.1.2 The Conditional Mean and Variance

Let $F_{t-1}$ denote the information set available at time $t-1$. The conditional mean of $u_t$ is
\[ E(u_t|F_{t-1}) = \sqrt{h_t}\,E(\varepsilon_t|F_{t-1}) = 0. \tag{19} \]
The conditional variance of $u_t$ is
\begin{align*}
Var(u_t|F_{t-1}) &= E(u_t^2|F_{t-1}) \\
&= E(h_t\varepsilon_t^2|F_{t-1}) \\
&= E(\varepsilon_t^2)\,E\Big(\alpha_0 + \sum_{i=1}^{m}\alpha_i u_{t-i}^2 + \sum_{j=1}^{s}\beta_j h_{t-j}\,\Big|\,F_{t-1}\Big) \\
&= \alpha_0 + \sum_{i=1}^{m}\alpha_i u_{t-i}^2 + \sum_{j=1}^{s}\beta_j h_{t-j} \\
&= h_t.
\end{align*}
3.1.3 The Conditional Density

3.1.4 The Unconditional Mean and Variance
The unconditional mean of $u_t$ is $E[E(u_t|F_{t-1})] = E(0) = 0$. While $u_t$ is conditionally heteroscedastic, from (18) the unconditional variance of $u_t$ is
\begin{align*}
E(u_t^2) &= E\Big[\alpha_0 + \sum_{i=1}^{\max(m,s)}(\alpha_i+\beta_i)u_{t-i}^2 + \eta_t - \sum_{j=1}^{s}\beta_j\eta_{t-j}\Big] \\
&= \alpha_0 + \sum_{i=1}^{\max(m,s)}(\alpha_i+\beta_i)E(u_{t-i}^2),
\end{align*}
so that, under stationarity,
\[ Var(u_t) = \frac{\alpha_0}{1 - \sum_{i=1}^{\max(m,s)}(\alpha_i+\beta_i)}. \]
3.1.5 The GARCH(1,1) Model
The strengths and weaknesses of the GARCH model can easily be seen by focusing on the simplest GARCH(1,1) model, with
\[ u_t = \sqrt{h_t}\,\varepsilon_t, \qquad h_t = \alpha_0 + \alpha_1 u_{t-1}^2 + \beta_1 h_{t-1}, \quad 0 \le \alpha_1, \beta_1 \le 1, \quad (\alpha_1 + \beta_1) < 1, \]
and $\varepsilon_t \sim N(0,1)$. First, a large $u_{t-1}^2$ or $h_{t-1}$ gives rise to a large $\sigma_t^2$. This means that a large $u_{t-1}^2$ tends to be followed by another large $u_t^2$, generating, again, the well-known behavior of volatility clustering in financial time series. Second, it can be shown that if $1 - 2\alpha_1^2 - (\alpha_1+\beta_1)^2 > 0$, then
\[ \frac{E(u_t^4)}{[E(u_t^2)]^2} = \frac{3[1-(\alpha_1+\beta_1)^2]}{1-(\alpha_1+\beta_1)^2 - 2\alpha_1^2} > 3. \tag{20} \]
To see this, note that under normality $E(u_t^4|F_{t-1}) = 3h_t^2$, and hence
\[ E(u_t^4) = 3E[\alpha_0^2 + \alpha_1^2 u_{t-1}^4 + \beta_1^2 h_{t-1}^2 + 2\alpha_0\alpha_1 u_{t-1}^2 + 2\alpha_0\beta_1 h_{t-1} + 2\alpha_1\beta_1 u_{t-1}^2 h_{t-1}]. \tag{21} \]
Denote $E(u_t^4) = m_4$. Notice that
\begin{align*}
E(h_t) &= E[E(u_t^2|F_{t-1})] = E(u_t^2) = \sigma^2, \\
E(h_t^2) &= \frac{1}{3}E[E(u_t^4|F_{t-1})] = \frac{m_4}{3}, \\
E(u_{t-1}^2 h_{t-1}) &= E[E(u_{t-1}^2 h_{t-1}|F_{t-2})] = E[h_{t-1}E(u_{t-1}^2|F_{t-2})] = E(h_{t-1}^2) = \frac{m_4}{3}.
\end{align*}
Then (21) becomes
\[ E(u_t^4) = 3\alpha_0^2 + 3\alpha_1^2 m_4 + \beta_1^2 m_4 + 6\alpha_0\alpha_1\sigma^2 + 6\alpha_0\beta_1\sigma^2 + 2\alpha_1\beta_1 m_4. \]
Consequently,
\begin{align*}
(1 - 3\alpha_1^2 - \beta_1^2 - 2\alpha_1\beta_1)m_4 &= 3\alpha_0^2 + 6\alpha_0\alpha_1\sigma^2 + 6\alpha_0\beta_1\sigma^2 \\
&= 3\alpha_0^2 + \frac{6\alpha_0^2\alpha_1}{1-\alpha_1-\beta_1} + \frac{6\alpha_0^2\beta_1}{1-\alpha_1-\beta_1} \\
&= \frac{3\alpha_0^2(1-\alpha_1-\beta_1) + 6\alpha_0^2\alpha_1 + 6\alpha_0^2\beta_1}{1-\alpha_1-\beta_1} \\
&= \frac{3\alpha_0^2(1+\alpha_1+\beta_1)}{1-\alpha_1-\beta_1},
\end{align*}
that is,
\[ m_4 = \frac{3\alpha_0^2(1+\alpha_1+\beta_1)}{(1-\alpha_1-\beta_1)(1-3\alpha_1^2-\beta_1^2-2\alpha_1\beta_1)}. \]
Dividing by $(\sigma^2)^2 = \alpha_0^2/(1-\alpha_1-\beta_1)^2$ gives
\[ \frac{m_4}{(\sigma^2)^2} = \frac{3(1+\alpha_1+\beta_1)(1-\alpha_1-\beta_1)}{1-3\alpha_1^2-\beta_1^2-2\alpha_1\beta_1} = \frac{3[1-(\alpha_1+\beta_1)^2]}{1-(\alpha_1+\beta_1)^2-2\alpha_1^2} > 3. \]
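The kurtosis result (20) can be wrapped into a small helper (name ours) that also enforces the fourth-moment existence condition:

```python
def garch11_kurtosis(a1, b1):
    """E(u^4)/[E(u^2)]^2 = 3*(1-(a1+b1)^2) / (1-(a1+b1)^2-2*a1^2) for a Gaussian
    GARCH(1,1); finite only when 1 - 2*a1^2 - (a1+b1)^2 > 0."""
    d = 1 - (a1 + b1) ** 2 - 2 * a1 ** 2
    if d <= 0:
        raise ValueError("fourth moment does not exist for these parameters")
    return 3 * (1 - (a1 + b1) ** 2) / d
```

Note that the excess kurtosis comes entirely from the ARCH coefficient: with $\alpha_1 = 0$ the ratio collapses to 3, the Gaussian value, whatever $\beta_1$ is.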
3.2

3.3 Estimation

3.4 The IGARCH Model
If the AR polynomial of the GARCH representation in Eq. (18) has a unit root, then $u_t$ follows an IGARCH model. Thus, IGARCH models are unit-root GARCH models. Similar to ARIMA models, a key feature of IGARCH models is that the impact of past squared shocks $\eta_{t-i} = u_{t-i}^2 - h_{t-i}$, $i > 0$, on $u_t^2$ is persistent. For example, an IGARCH(1,1) model can be written as
\[ u_t = \sqrt{h_t}\,\varepsilon_t, \qquad h_t = \alpha_0 + \beta_1 h_{t-1} + (1-\beta_1)u_{t-1}^2. \tag{22} \]
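The persistence can be made concrete through the multi-step variance forecast. A sketch (function name ours): from (22), $E_t[u_{t+j}^2] = E_t[h_{t+j}]$ gives $E_t[h_{t+k}] = \alpha_0 + E_t[h_{t+k-1}]$, so forecasts grow linearly in the horizon instead of reverting to an unconditional variance:

```python
def igarch11_forecast(h_next, a0, k):
    """k-step-ahead conditional variance forecast for IGARCH(1,1).

    From h_{t+j} = a0 + b1*h_{t+j-1} + (1-b1)*u_{t+j-1}^2 and
    E_t[u_{t+j}^2] = E_t[h_{t+j}], we get E_t[h_{t+k}] = a0 + E_t[h_{t+k-1}],
    hence E_t[h_{t+k}] = h_{t+1} + (k-1)*a0: shocks to volatility never die out.
    h_next is the known one-step variance h_{t+1}; k >= 1.
    """
    return h_next + (k - 1) * a0
```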
In the EGARCH model, asymmetry is introduced through the weighted innovation $g(\varepsilon_t) = \theta\varepsilon_t + \gamma[|\varepsilon_t| - E(|\varepsilon_t|)]$, which can be written as
\[ g(\varepsilon_t) = \begin{cases} (\theta+\gamma)\varepsilon_t - \gamma E(|\varepsilon_t|) & \text{if } \varepsilon_t \ge 0, \\ (\theta-\gamma)\varepsilon_t - \gamma E(|\varepsilon_t|) & \text{if } \varepsilon_t < 0. \end{cases} \]
For the standard Gaussian random variable $\varepsilon_t$, $E(|\varepsilon_t|) = \sqrt{2/\pi}$. For the standardized Student-$t$ distribution, we have
\[ E(|\varepsilon_t|) = \frac{2\sqrt{v-2}\,\Gamma((v+1)/2)}{(v-1)\Gamma(v/2)\sqrt{\pi}}. \]
An EGARCH(m, s) model then takes the form
\[ u_t = \sqrt{h_t}\,\varepsilon_t, \qquad \ln(h_t) = \alpha_0 + \frac{1 + \beta_1 L + \cdots + \beta_s L^s}{1 - \alpha_1 L - \cdots - \alpha_m L^m}\,g(\varepsilon_{t-1}), \]
where $\alpha_0$ is a constant and all roots of $1 + \beta(L) = 0$ and $1 - \alpha(L) = 0$ lie outside the unit circle. Based on this representation, some properties of the EGARCH model can be obtained in a similar manner as those of the GARCH model. For instance, the unconditional mean of $\ln h_t$ is $\alpha_0$. However, the model differs from the GARCH model in at least two ways:
(a) it uses the logged conditional variance to relax the positivity constraint on the model coefficients;
(b) the use of $g(\varepsilon_t)$ enables the model to respond asymmetrically to positive and negative lagged values of $u_t$.
To better understand the EGARCH model, let us consider the simple EGARCH(1, 0) model
\[ u_t = \sqrt{h_t}\,\varepsilon_t, \qquad (1-\alpha L)\ln(h_t) = (1-\alpha)\alpha_0 + g(\varepsilon_{t-1}), \]
where the $\varepsilon_t$ are iid standard normal. In this case, $E(|\varepsilon_t|) = \sqrt{2/\pi}$ and the model for $\ln h_t$ becomes
\[ (1-\alpha L)\ln(h_t) = \begin{cases} \alpha_* + (\theta+\gamma)\varepsilon_{t-1} & \text{if } \varepsilon_{t-1} \ge 0, \\ \alpha_* + (\theta-\gamma)\varepsilon_{t-1} & \text{if } \varepsilon_{t-1} < 0, \end{cases} \]
where $\alpha_* = (1-\alpha)\alpha_0 - \gamma\sqrt{2/\pi}$. This is a nonlinear function similar to that of the threshold autoregressive model (TAR) of Tong (1978, 1990). It suffices to say that for this simple EGARCH model the conditional variance evolves in a nonlinear manner depending on the sign of $u_{t-1}$. Specifically, we have
\[ h_t = h_{t-1}^{\alpha}\,e^{\alpha_*} \times \begin{cases} \exp\left[(\theta+\gamma)\dfrac{u_{t-1}}{\sqrt{h_{t-1}}}\right] & \text{if } u_{t-1} \ge 0, \\[6pt] \exp\left[(\theta-\gamma)\dfrac{u_{t-1}}{\sqrt{h_{t-1}}}\right] & \text{if } u_{t-1} < 0. \end{cases} \]
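The asymmetric response is easy to demonstrate numerically. The sketch below (function name and parameter values ours) implements one step of the EGARCH(1,0) recursion above; with $\theta < 0$, a negative shock of the same magnitude raises next period's variance more than a positive one, the so-called leverage effect:

```python
import math

def egarch10_h(h_prev, u_prev, alpha, a0, theta, gamma):
    """One step of the simple EGARCH(1,0) recursion in the text:
    h_t = h_{t-1}^alpha * exp(a_star) * exp((theta +/- gamma) * u_{t-1}/sqrt(h_{t-1})),
    with a_star = (1-alpha)*a0 - gamma*sqrt(2/pi)."""
    a_star = (1 - alpha) * a0 - gamma * math.sqrt(2 / math.pi)
    eps = u_prev / math.sqrt(h_prev)
    slope = theta + gamma if u_prev >= 0 else theta - gamma
    return h_prev ** alpha * math.exp(a_star) * math.exp(slope * eps)

# Same-sized shocks of opposite sign (illustrative values theta = -0.1, gamma = 0.2):
h_pos = egarch10_h(1.0, 0.5, alpha=0.9, a0=0.0, theta=-0.1, gamma=0.2)
h_neg = egarch10_h(1.0, -0.5, alpha=0.9, a0=0.0, theta=-0.1, gamma=0.2)
```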
Finance theory suggests that an asset with a higher perceived risk would pay a higher return on average. For example, let $r_t$ denote the excess rate of return on some asset, i.e. its return minus the return on a safe alternative asset. Suppose that $r_t$ is decomposed into a component anticipated by investors at date $t-1$ (denoted $\mu_t$) and a component that was unanticipated (denoted $u_t$):
\[ r_t = \mu_t + u_t. \]
Then the theory suggests that the mean return $\mu_t$ would be related to the variance of the return $h_t$. In general, the ARCH-in-mean, or ARCH-M, regression model introduced by Engle, Lilien, and Robins (1987) is characterized by
\[ Y_t = \mathbf{x}_t'\boldsymbol{\beta} + \delta h_t + u_t, \qquad u_t = \sqrt{h_t}\,\varepsilon_t, \qquad h_t = \alpha_0 + \sum_{i=1}^{m}\alpha_i u_{t-i}^2, \]
for $\varepsilon_t$ i.i.d. with zero mean and unit variance. The effect that a higher perceived variability of $u_t$ has on the level of $Y_t$ is captured by the parameter $\delta$.
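The risk-premium term is easy to see in simulation. The sketch below (illustrative values; $\mathbf{x}_t'\boldsymbol{\beta}$ is set to 0 for simplicity) generates an ARCH-M series with $\delta = 2$: even though $E(u_t) = 0$, the level of $Y_t$ inherits a positive mean $\delta\,Var(u_t) = 2 \times 0.4 = 0.8$ from the variance term:

```python
import math
import random

delta, a0, a1 = 2.0, 0.2, 0.5     # illustrative ARCH-M parameter values
rng = random.Random(9)
prev_sq = a0 / (1 - a1)
Y = []
for _ in range(20_000):
    h = a0 + a1 * prev_sq          # ARCH(1) conditional variance
    u = math.sqrt(h) * rng.gauss(0.0, 1.0)
    Y.append(delta * h + u)        # Y_t = delta*h_t + u_t (risk premium in the mean)
    prev_sq = u * u
mean_Y = sum(Y) / len(Y)           # should be near delta * Var(u) = 0.8
```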
7.1 Constant Conditional Correlations

Suppose each conditional variance follows a univariate GARCH(1,1) specification:
\[ h_{ii}^{(t)} = \kappa_i + a_i h_{ii}^{(t-1)} + b_i u_{i,t-1}^2. \]
We might postulate $n$ such GARCH specifications ($i = 1, 2, \ldots, n$), one for each element of $u_t$. The conditional covariance between $u_{it}$ and $u_{jt}$, i.e. the row $i$, column $j$ element of $H_t$, is taken to be a constant correlation $\rho_{ij}$ (instead of a time-varying $\rho_{ij}^{(t)}$) times the conditional standard deviations of $u_{it}$ and $u_{jt}$:
\[ h_{ij}^{(t)} = E(u_{it}u_{jt}|F_{t-1}) = \rho_{ij}\sqrt{h_{ii}^{(t)}}\sqrt{h_{jj}^{(t)}}. \]
Maximum likelihood estimation of this specification turns out to be quite tractable;
see Bollerslev (1990) for details.
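A sketch of the constant-correlation construction for $n = 2$ (function names and values ours). One appeal of the specification is that $|\rho| < 1$ and positive conditional variances automatically give a positive definite $H_t$, since $\det H_t = h_{11}h_{22}(1-\rho^2) > 0$:

```python
import math

def cc_covariance(h_ii, h_jj, rho):
    """Constant-correlation model: h_ij^(t) = rho * sqrt(h_ii^(t)) * sqrt(h_jj^(t))."""
    return rho * math.sqrt(h_ii) * math.sqrt(h_jj)

def cc_cov_matrix_2x2(h11, h22, rho):
    """Assemble the 2x2 conditional covariance matrix H_t for given variances."""
    h12 = cc_covariance(h11, h22, rho)
    return [[h11, h12], [h12, h22]]

# Variances would come from the two univariate GARCH recursions above.
H = cc_cov_matrix_2x2(0.5, 1.2, 0.3)
det = H[0][0] * H[1][1] - H[0][1] * H[1][0]   # = h11*h22*(1 - rho^2) > 0
```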
Copyright by Chingnun Lee, 2008.