
Lecture notes: Financial time series, ARCH and GARCH models

Piotr Fryzlewicz
Department of Mathematics, University of Bristol, Bristol BS8 1TW, UK
p.z.fryzlewicz@bristol.ac.uk
http://www.maths.bris.ac.uk/~mapzf/
July 18, 2007

1 Financial time series

Let $P_k$, $k = 0, \ldots, n$, be a time series of prices of a financial asset, e.g. daily quotes on a share, stock index, currency exchange rate or a commodity. Instead of analysing $P_k$, which often displays unit-root behaviour and thus cannot be modelled as stationary, we often analyse log-returns on $P_k$, i.e. the series
\[
y_k = \log P_k - \log P_{k-1} = \log \frac{P_k}{P_{k-1}} = \log\left(1 + \frac{P_k - P_{k-1}}{P_{k-1}}\right).
\]
By Taylor-expanding the above, we can see that $y_k$ is almost equivalent to the relative return $(P_k - P_{k-1})/P_{k-1}$. The reason we typically consider log-returns instead of relative returns is the additivity property of log-returns, which is not shared by relative returns.

As an illustration, consider the time series of daily closing values of the WIG index, which is the main summary index of the Warsaw Stock Exchange, running from 16 April 1991 to 5 January 2007. The data are available from http://bossa.pl/notowania/daneatech/metastock/ (page in Polish). The top left plot in Figure 1 shows the actual series $P_k$. The top right plot shows $y_k$. The series $y_k$ displays many of the typical stylised facts present in financial log-return series. As shown in the middle left plot of Figure 1, the series $y_k$ is uncorrelated, here with the exception of lag 1 (typically, log-return series are uncorrelated with the exception of the first few lags). However, as shown in the middle right plot, the squared series $y_k^2$ is strongly auto-correlated even for very large lags. In fact, in this example it is not obvious that the auto-correlation of $y_k^2$ decays to zero at all.

It is also typical of financial log-return series to be heavy-tailed, as illustrated in the bottom left plot of Figure 1, which shows the Q-Q plot of the marginal distribution of $y_k$ against the standard normal. Finally, the bottom right plot illustrates the so-called leverage effect: the series $y_k$ responds differently to its own positive and negative movements, or in other words the conditional distribution of $|y_k| \mid \{y_{k-1} > 0\}$ is different from that of $|y_k| \mid \{y_{k-1} < 0\}$. The bottom right plot of Figure 1 shows the sample quantiles of the two conditional distributions plotted against each other. The explanation is that the market responds differently to good and bad news, which is only too natural.

Statisticians like stationarity as it enables them to estimate parameters globally, using the entire available data set. However, to propose a stationary model for $y_k$ which captures the above stylised facts is not easy, as the series does not look stationary: the local variance (volatility) is clearly clustered in bunches of low/high values. If we were to fit a linear time series model (such as ARMA) to $y_k$, the estimated parameters would come out as zero because of the lack of serial correlation property, which clearly would not be what we wanted.

[Figure 1 here: top left, the WIG index $P_k$; top right, the log-returns $y_k$; middle left, ACF of $y_k$; middle right, ACF of $y_k^2$; bottom left, normal Q-Q plot of $y_k$; bottom right, sample quantiles of returns after -ve moves against returns after +ve moves.]
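To make the above concrete, here is a minimal Python sketch of how the log-returns and the stylised facts (the ACF of $y_k$ and of $y_k^2$, and the normal Q-Q plot) can be checked in practice. The simulated placeholder prices stand in for the WIG data, whose loading would depend on the file format; the helper `acf` is my own, not from these notes.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# p: a NumPy array of daily closing prices P_0, ..., P_n. Here we use
# simulated placeholder data; in practice, load e.g. the WIG closing prices.
rng = np.random.default_rng(0)
p = 1000 * np.exp(np.cumsum(0.01 * rng.standard_normal(3000)))

y = np.diff(np.log(p))      # log-returns y_k = log P_k - log P_{k-1}

def acf(x, max_lag=100):
    """Sample autocorrelations of x at lags 1..max_lag."""
    x = x - x.mean()
    denom = np.sum(x**2)
    return np.array([np.sum(x[h:] * x[:-h]) / denom
                     for h in range(1, max_lag + 1)])

fig, ax = plt.subplots(2, 2, figsize=(10, 8))
ax[0, 0].plot(y)
ax[0, 0].set_title("log-returns y_k")
ax[0, 1].bar(range(1, 101), acf(y))
ax[0, 1].set_title("ACF of y_k")
ax[1, 0].bar(range(1, 101), acf(y**2))
ax[1, 0].set_title("ACF of y_k^2")
stats.probplot(y, dist="norm", plot=ax[1, 1])   # normal Q-Q plot
plt.tight_layout()
plt.show()
```

On real log-return data, the second panel should show (near-)zero correlations, while the third shows the slowly decaying autocorrelations of the squares described above.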

2 (G)ARCH models

2.1 Introduction

Noting the above difficulties, Engle (1982) was the first to propose a stationary non-linear model for $y_k$, which he termed ARCH (Auto-Regressive Conditionally Heteroscedastic; it means that the conditional variance of $y_k$ evolves according to an autoregressive-type process). Bollerslev (1986) and Taylor (1986) independently generalised Engle's model to make it more realistic; the generalisation was called GARCH. GARCH is probably the most commonly used financial time series model and has inspired dozens of more sophisticated models.

Literature. Literature on GARCH is massive. My favourites are: Giraitis et al. (2005), Bera and Higgins (1993), Berkes et al. (2003), and the book by Straumann (2005). This chapter is based on the latter three.

Definition. The GARCH(p, q) model is defined by

\[
y_k = \sigma_k \varepsilon_k, \qquad \sigma_k^2 = \omega + \sum_{i=1}^{p} \alpha_i y_{k-i}^2 + \sum_{j=1}^{q} \beta_j \sigma_{k-j}^2, \tag{1}
\]
where $\omega > 0$, $\alpha_i \ge 0$, $\beta_j \ge 0$, and the innovation sequence $\{\varepsilon_i\}_{i=-\infty}^{\infty}$ is independent and identically distributed with $E(\varepsilon_0) = 0$ and $E(\varepsilon_0^2) = 1$.
The main idea is that $\sigma_k^2$, the conditional variance of $y_k$ given information available up to time $k-1$, has an autoregressive structure and is positively correlated to its own recent past and to recent values of the squared returns $y^2$. This captures the idea of volatility (= conditional variance) being persistent: large (small) values of $y_k^2$ are likely to be followed by large (small) values.
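As an illustration of the defining equations (1), the following sketch simulates a GARCH(p, q) path with Gaussian innovations. The function name, the burn-in length and the parameter values are my own arbitrary choices, not prescribed by the notes.

```python
import numpy as np

def simulate_garch(n, omega, alpha, beta, burn=500, seed=0):
    """Simulate n observations from the GARCH(p, q) model (1) with i.i.d.
    N(0, 1) innovations; alpha = (alpha_1..alpha_p), beta = (beta_1..beta_q)."""
    p, q = len(alpha), len(beta)
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n + burn)
    y = np.zeros(n + burn)
    y2 = np.zeros(n + burn)              # past squared returns
    sig2 = np.full(n + burn, omega)      # conditional variances
    for k in range(max(p, q), n + burn):
        sig2[k] = (omega
                   + np.dot(alpha, y2[k - p:k][::-1])    # alpha_i * y_{k-i}^2
                   + np.dot(beta, sig2[k - q:k][::-1]))  # beta_j * sig_{k-j}^2
        y[k] = np.sqrt(sig2[k]) * eps[k]
        y2[k] = y[k] ** 2
    return y[burn:], sig2[burn:]

# Example: GARCH(1,1) with alpha_1 + beta_1 = 0.98, close to the IGARCH
# boundary; the simulated path shows pronounced volatility clustering.
y, sig2 = simulate_garch(2000, omega=1e-5, alpha=[0.08], beta=[0.90])
```

With $\alpha_1 + \beta_1$ close to 1, the simulated path displays the volatility clustering seen in Figure 1.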

2.2 Basic properties

2.2.1 Uniqueness and stationarity

The answer to the question of whether and when the system of equations (1) admits a unique and stationary solution is not straightforward and involves the so-called top Lyapunov exponent associated with a certain sequence of random matrices. This is beyond the scope of this course. See Bougerol and Picard (1992) for a theorem which gives a necessary and sufficient condition for (1) to have a unique and stationary solution. Proposition 3.3.3 (b) in Straumann (2005) implies that it is the case if
\[
\sum_{i=1}^{p} \alpha_i + \sum_{j=1}^{q} \beta_j < 1. \tag{2}
\]

2.2.2 Mean zero

Define the information set
\[
\mathcal{F}_{k-1} = \sigma\{\varepsilon_i,\ -\infty < i \le k-1\}.
\]
In any model in which $\sigma_k$ is measurable with respect to $\mathcal{F}_{k-1}$ (which is the case in the GARCH model (1)), the mean of $y_k$ is zero:
\[
E(y_k) = E(\sigma_k \varepsilon_k) = E[E(\sigma_k \varepsilon_k \mid \mathcal{F}_{k-1})] = E[\sigma_k E(\varepsilon_k \mid \mathcal{F}_{k-1})] = E[\sigma_k E(\varepsilon_k)] = E[\sigma_k \cdot 0] = 0.
\]

2.2.3 Lack of serial correlation

In the same way, we can show that $y_k$ is not correlated with $y_{k+h}$ for $h > 0$:
\[
E(y_k y_{k+h}) = E(y_k \sigma_{k+h} \varepsilon_{k+h}) = E[E(y_k \sigma_{k+h} \varepsilon_{k+h} \mid \mathcal{F}_{k+h-1})] = E[y_k \sigma_{k+h} E(\varepsilon_{k+h} \mid \mathcal{F}_{k+h-1})] = 0.
\]

2.2.4 Unconditional variance

In order to compute $E(y_k^2)$, it is useful to consider an alternative representation of $y_k^2$. First define the sequence
\[
Z_k = y_k^2 - \sigma_k^2 = \sigma_k^2(\varepsilon_k^2 - 1).
\]
Proceeding exactly like in Sections 2.2.2 and 2.2.3, we can show that $Z_k$ is a martingale difference and therefore has mean zero (the lack of serial correlation property is more tricky as it would require $E(y_k^4) < \infty$, which is not always true). The main point of this definition is that for many purposes, $Z_k$ can be treated as a white noise sequence: there are a lot of results about martingale difference sequences which extend results on independent sequences. See for example Davidson (1994).

We now proceed with our alternative representation. Write
\[
y_k^2 = \sigma_k^2 + Z_k = \omega + \sum_{i=1}^{p} \alpha_i y_{k-i}^2 + \sum_{j=1}^{q} \beta_j \sigma_{k-j}^2 + Z_k = \omega + \sum_{i=1}^{p} \alpha_i y_{k-i}^2 + \sum_{j=1}^{q} \beta_j y_{k-j}^2 - \sum_{j=1}^{q} \beta_j Z_{k-j} + Z_k.
\]
If we denote $R = \max(p, q)$, $\alpha_i = 0$ for $i > p$ and $\beta_j = 0$ for $j > q$, then the above can be written as
\[
y_k^2 = \omega + \sum_{i=1}^{R} (\alpha_i + \beta_i) y_{k-i}^2 - \sum_{j=1}^{q} \beta_j Z_{k-j} + Z_k. \tag{3}
\]
In other words, $y_k^2$ is an ARMA process with martingale difference innovations.

Using stationarity (which implies $E(y_k^2) = E(y_{k+h}^2)$), the unconditional variance is now easy to obtain:
\[
E(y_k^2) = \omega + \sum_{i=1}^{R} (\alpha_i + \beta_i) E(y_{k-i}^2) - \sum_{j=1}^{q} \beta_j E(Z_{k-j}) + E(Z_k),
\]
\[
E(y_k^2) = \omega + E(y_k^2) \sum_{i=1}^{R} (\alpha_i + \beta_i),
\]
which gives
\[
E(y_k^2) = \frac{\omega}{1 - \sum_{i=1}^{R} (\alpha_i + \beta_i)}.
\]
This result shows again the importance of condition (2).
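As a quick numerical sanity check of this formula, one can compare the sample second moment of a long simulated GARCH(1,1) path with $\omega/(1 - \alpha_1 - \beta_1)$. The following self-contained sketch uses arbitrary parameter values; convergence of the sample average is slow, in line with the heavy tails discussed next.

```python
import numpy as np

# Simulate a long GARCH(1,1) path and compare the sample second moment
# with the formula omega / (1 - alpha_1 - beta_1). Parameters are arbitrary.
omega, alpha1, beta1 = 1e-5, 0.08, 0.90
rng = np.random.default_rng(0)
n = 200_000
eps = rng.standard_normal(n)
y = np.zeros(n)
sig2 = omega / (1 - alpha1 - beta1)    # start at the stationary variance
for k in range(1, n):
    sig2 = omega + alpha1 * y[k - 1] ** 2 + beta1 * sig2
    y[k] = np.sqrt(sig2) * eps[k]

print(np.mean(y**2), omega / (1 - alpha1 - beta1))   # both approx 5e-4
```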

2.2.5 Heavy tails of $y_k$

In this section, we argue that the GARCH model (1) can easily be heavy-tailed. For ease of presentation, we only show it for the GARCH(1,1) model. We first assume the following condition:
\[
E(\alpha_1 \varepsilon_k^2 + \beta_1)^{q/2} > 1 \tag{4}
\]
for some $q > 0$. This condition is, for example, satisfied if $\varepsilon_k \sim N(0,1)$ (but not only in this case, obviously). We write
\[
\sigma_{k+1}^2 = \omega + \alpha_1 y_k^2 + \beta_1 \sigma_k^2 = \omega + (\alpha_1 \varepsilon_k^2 + \beta_1)\sigma_k^2,
\]
which, using the independence of $\varepsilon_k$ of $\mathcal{F}_{k-1}$, gives
\[
E(\sigma_{k+1}^q) = E[\omega + (\alpha_1 \varepsilon_k^2 + \beta_1)\sigma_k^2]^{q/2} \ge E[(\alpha_1 \varepsilon_k^2 + \beta_1)\sigma_k^2]^{q/2} = E(\alpha_1 \varepsilon_k^2 + \beta_1)^{q/2}\, E(\sigma_k^q).
\]
If $E(\sigma_k^q)$ were finite, then by stationarity, it would be equal to $E(\sigma_{k+1}^q)$. Then, simplifying, we would obtain
\[
1 \ge E(\alpha_1 \varepsilon_k^2 + \beta_1)^{q/2},
\]
which contradicts assumption (4). Thus $E(\sigma_k^q)$ is infinite, which also implies that $E(y_k^q)$ is infinite. Thus $y_k$ does not have all finite moments, so it is heavy-tailed.
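For Gaussian innovations, condition (4) is easy to check by Monte Carlo. In the sketch below the parameter values and the grid of moment orders $q$ are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
eps2 = rng.standard_normal(1_000_000) ** 2    # eps_k^2 for eps_k ~ N(0, 1)
alpha1, beta1 = 0.08, 0.90

for q in (2.0, 4.0, 8.0, 16.0):
    m = np.mean((alpha1 * eps2 + beta1) ** (q / 2))
    print(q, m)   # m exceeds 1 once q is large enough: condition (4) holds
```

Here $\alpha_1 + \beta_1 = 0.98 < 1$, so the process has a finite variance, yet the expectation crosses 1 between $q = 4$ and $q = 8$: moments of $y_k$ of high enough order are infinite.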

2.3 Estimation

2.3.1 Some extra notation

The parameter vector will be denoted by $\theta$, that is
\[
\theta = (\omega, \alpha_1, \ldots, \alpha_p, \beta_1, \ldots, \beta_q).
\]
We assume that there is a unique stationary solution to the set of equations (1). By expanding $\sigma_{k-j}^2$ in (1), $\sigma_k^2$ can be represented as
\[
\sigma_k^2 = c_0 + \sum_{i=1}^{\infty} c_i y_{k-i}^2 \tag{5}
\]
(given that $E(\log \sigma_0^2) < \infty$, which is a kind of a stability condition, but this is not important for now). How do we compute the coefficients $c_i$ in (5)? They obviously depend on the parameter $\theta$. More generally, if the parameter is $u = (x, s_1, \ldots, s_p, t_1, \ldots, t_q)$, then:

If $q \ge p$, then
\begin{align*}
c_0(u) &= \frac{x}{1 - t_1 - \ldots - t_q} \\
c_1(u) &= s_1 \\
c_2(u) &= s_2 + t_1 c_1(u) \\
&\ \ \vdots \\
c_p(u) &= s_p + t_1 c_{p-1}(u) + \ldots + t_{p-1} c_1(u) \\
c_{p+1}(u) &= t_1 c_p(u) + \ldots + t_p c_1(u) \\
&\ \ \vdots \\
c_q(u) &= t_1 c_{q-1}(u) + \ldots + t_{q-1} c_1(u).
\end{align*}
If $q < p$, then
\begin{align*}
c_0(u) &= \frac{x}{1 - t_1 - \ldots - t_q} \\
c_1(u) &= s_1 \\
c_2(u) &= s_2 + t_1 c_1(u) \\
&\ \ \vdots \\
c_{q+1}(u) &= s_{q+1} + t_1 c_q(u) + \ldots + t_q c_1(u) \\
&\ \ \vdots \\
c_p(u) &= s_p + t_1 c_{p-1}(u) + \ldots + t_q c_{p-q}(u).
\end{align*}
For $i > \max(p, q)$,
\[
c_i(u) = t_1 c_{i-1}(u) + t_2 c_{i-2}(u) + \ldots + t_q c_{i-q}(u). \tag{6}
\]
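The recursions above translate directly into code. Below is a minimal sketch (the function name and the truncation length `n_coef` are my own choices), together with a GARCH(1,1) sanity check, where $c_0 = \omega/(1-\beta_1)$ and $c_i = \alpha_1 \beta_1^{i-1}$.

```python
import numpy as np

def garch_coefs(x, s, t, n_coef=200):
    """Coefficients c_0(u), ..., c_{n_coef}(u) of representation (5),
    computed via the recursions above and (6); u = (x, s_1..s_p, t_1..t_q)."""
    s, t = np.asarray(s, float), np.asarray(t, float)
    p, q = len(s), len(t)
    c = np.zeros(n_coef + 1)
    c[0] = x / (1.0 - t.sum())
    for i in range(1, n_coef + 1):
        c[i] = s[i - 1] if i <= p else 0.0          # s_i term (0 for i > p)
        for k in range(1, min(i - 1, q) + 1):        # t_k * c_{i-k} terms
            c[i] += t[k - 1] * c[i - k]
    return c

# Sanity check for GARCH(1,1): c_0 = omega/(1-beta_1), c_i = alpha_1*beta_1^(i-1).
c = garch_coefs(1e-5, [0.08], [0.90], n_coef=5)
print(c)
print(1e-5 / 0.1, [0.08 * 0.90 ** (i - 1) for i in range(1, 6)])
```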

2.3.2 Estimation via Quasi Maximum Likelihood

Define the parameter space
\[
U = \{u : t_1 + \ldots + t_q \le \rho_0,\ \underline{u} \le \min(x, s_1, \ldots, s_p, t_1, \ldots, t_q),\ \max(x, s_1, \ldots, s_p, t_1, \ldots, t_q) \le \overline{u}\},
\]
where $\rho_0$, $\underline{u}$, $\overline{u}$ are arbitrary but such that $0 < \underline{u} < \overline{u}$, $0 < \rho_0 < 1$, $q\underline{u} < \rho_0$. Clearly, $U$ is a compact set (it is a closed and bounded subset of $\mathbb{R}^{p+q+1}$).

Quasi-likelihood for GARCH(p, q) is defined as
\[
\hat{L}_n(u) = -\frac{1}{2} \sum_{1 \le k \le n} \left( \log w_k(u) + \frac{y_k^2}{w_k(u)} \right),
\]
where
\[
w_k(u) = c_0(u) + \sum_{1 \le i < \infty} c_i(u) y_{k-i}^2
\]
(in practice we truncate this sum, but let us not worry about it for now). Note that $w_k(\theta) = \sigma_k^2$.

If $\varepsilon_k$ is standard normal, then conditionally on $\mathcal{F}_{k-1}$, $y_k$ is normal with mean zero and variance $w_k(\theta)$. The normal log-likelihood for $N(0, \sigma^2)$ is
\[
\log\left( \frac{1}{\sigma\sqrt{2\pi}} \exp\left( -\frac{x^2}{2\sigma^2} \right) \right) = -\frac{1}{2}\left( \log \sigma^2 + \frac{x^2}{\sigma^2} \right) + C.
\]
Thus, if the $\varepsilon_k$ are $N(0,1)$ and the parameter is a general $u$, then the conditional log-likelihood of $y_k$ given $\mathcal{F}_{k-1}$ is
\[
-\frac{1}{2}\left( \log w_k(u) + \frac{y_k^2}{w_k(u)} \right).
\]
Summing over $k$, we get exactly $\hat{L}_n(u)$. The quasi-likelihood estimator $\hat{\theta}_n$ is defined as $\hat{\theta}_n = \operatorname{argmax}_{u \in U} \hat{L}_n(u)$. The estimator is called a quasi-likelihood estimator as it will also be consistent if the variables $\varepsilon_k$ are not normally distributed. In order to prove consistency, we first need to list a few assumptions.

Assumption 2.1

(i) $\varepsilon_0^2$ is a non-degenerate random variable.

(ii) The polynomials
\[
A(x) = \alpha_1 x + \alpha_2 x^2 + \ldots + \alpha_p x^p, \qquad
B(x) = 1 - \beta_1 x - \beta_2 x^2 - \ldots - \beta_q x^q
\]
do not have common roots.

(iii) $\theta$ lies in the interior of $U$. (This implies that there are no zero coefficients and that the orders $p$ and $q$ are known!)

(iv) $E\,\varepsilon_0^{2(1+\delta)} < \infty$ for some $\delta > 0$.

(v) $\lim_{t \to 0} t^{-\mu} P(\varepsilon_0^2 \le t) = 0$ for some $\mu > 0$.

(vi) $E(\varepsilon_0^2) = 1$.

(vii) $E|\varepsilon_0|^{2\kappa} < \infty$ for some $\kappa > 0$.

(viii) $E|y_0^2|^{\gamma} < \infty$ for some $\gamma > 0$.

We are now ready to formulate a consistency theorem for the quasi-likelihood estimator $\hat{\theta}_n$.

Theorem 2.1 Under Assumptions 2.1 (i)–(vi), we have $\hat{\theta}_n \to \theta$ a.s., as $n \to \infty$.
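In practice $\hat{\theta}_n$ is computed numerically. The sketch below does this for GARCH(1,1), computing $w_k$ recursively rather than through the infinite series (one common way of performing the truncation mentioned above; initialising $w_0$ at the sample variance is my own convention, not part of the notes) and minimising $-\hat{L}_n$ with a box-constrained optimiser.

```python
import numpy as np
from scipy.optimize import minimize

def neg_quasi_loglik(u, y):
    """-L_n(u) (up to constants) for GARCH(1,1), u = (omega, alpha1, beta1).
    w_k is computed recursively; w_0 is initialised at the sample variance."""
    omega, alpha, beta = u
    w = np.empty(len(y))
    w[0] = np.var(y)
    for k in range(1, len(y)):
        w[k] = omega + alpha * y[k - 1] ** 2 + beta * w[k - 1]
    return 0.5 * np.sum(np.log(w) + y**2 / w)

# Simulate a GARCH(1,1) series with known parameters, then re-estimate them.
rng = np.random.default_rng(0)
true = (1e-5, 0.08, 0.90)
y = np.zeros(5000)
s2 = true[0] / (1 - true[1] - true[2])
for k in range(1, len(y)):
    s2 = true[0] + true[1] * y[k - 1] ** 2 + true[2] * s2
    y[k] = np.sqrt(s2) * rng.standard_normal()

res = minimize(neg_quasi_loglik, x0=(1e-4, 0.1, 0.8), args=(y,),
               method="L-BFGS-B",
               bounds=[(1e-8, None), (1e-8, 1.0), (1e-8, 0.999)])
print(res.x)   # close to (1e-5, 0.08, 0.90) for large n
```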

2.3.3 Consistency proof for the Quasi Maximum Likelihood estimator

Before we prove Theorem 2.1, we need a number of technical lemmas. These lemmas use, in an ingenious and beautiful way, a number of simple but extremely useful mathematical techniques/tricks, such as: mathematical induction, the Hölder inequality, an alternative representation of the expectation of a nonnegative random variable, a method for proving implications, the Bonferroni inequality, the Markov inequality, the mean-value theorem, ergodicity, the ergodic theorem and the concept of equicontinuity.

Lemma 2.1 For any $u \in U$, we have
\[
C_1 \underline{u}^i \le c_i(u), \qquad 0 \le i < \infty, \tag{7}
\]
\[
c_i(u) \le C_2 \rho_0^{i/q}, \qquad 0 \le i < \infty, \tag{8}
\]
for some positive constants $C_1$, $C_2$.

Proof. By induction. (7) and (8) hold for $0 \le i \le \max(p, q)$ if $C_1$ ($C_2$) is chosen small (large) enough. Let $j > R$ and assume (7), (8) hold for all $i < j$. By (6),
\[
c_j(u) \ge \underline{u} \min_{k=1,\ldots,q} c_{j-k}(u) \ge \underline{u}\, C_1 \underline{u}^{j-1} = C_1 \underline{u}^j.
\]
Also by (6),
\[
c_j(u) \le (t_1 + \ldots + t_q) \max_{k=1,\ldots,q} c_{j-k}(u) \le \rho_0 C_2 \rho_0^{(j-q)/q} = C_2 \rho_0^{j/q},
\]
which completes the proof.

Lemma 2.2 Suppose Assumptions 2.1 (iii), (v) and (vii) hold. Then for any $0 < \nu < \infty$ we have
\[
E \sup_{u \in U} \left( \frac{\sigma_k^2}{w_k(u)} \right)^{\nu} < \infty.
\]

Proof. Take $M \ge 1$. We have
\[
\left( \frac{\sigma_k^2}{w_k(u)} \right)^{\nu} \le \left( \frac{\sigma_k^2}{\sum_{i=1}^{M} c_i(u) y_{k-i}^2} \right)^{\nu} = \left( \frac{\sigma_k^2}{\sum_{i=1}^{M} c_i(u) \sigma_{k-i}^2 \varepsilon_{k-i}^2} \right)^{\nu}. \tag{9}
\]
By definition of $\sigma_k^2$,
\[
\sigma_{k-1}^2 > \beta_i \sigma_{k-i-1}^2, \quad 1 \le i \le q, \qquad
\sigma_{k-1}^2 > \alpha_i y_{k-i-1}^2, \quad 1 \le i \le p, \qquad
\sigma_{k-1}^2 > \omega.
\]
Hence
\[
\frac{\sigma_k^2}{\sigma_{k-1}^2}
= \frac{\omega + \alpha_1 y_{k-1}^2 + \ldots + \alpha_p y_{k-p}^2 + \beta_1 \sigma_{k-1}^2 + \ldots + \beta_q \sigma_{k-q}^2}{\sigma_{k-1}^2}
\le 1 + \alpha_1 \varepsilon_{k-1}^2 + \frac{\alpha_2}{\alpha_1} + \ldots + \frac{\alpha_p}{\alpha_{p-1}} + \beta_1 + \frac{\beta_2}{\beta_1} + \ldots + \frac{\beta_q}{\beta_{q-1}}
\le K_1 (1 + \varepsilon_{k-1}^2),
\]
for some $K_1 > 1$. This leads to
\[
\sigma_k^2 \le K_1^M \, \sigma_{k-i}^2 \prod_{1 \le j \le M} (1 + \varepsilon_{k-j}^2), \qquad \text{for all } i = 1, \ldots, M.
\]
Thus, using Lemma 2.1, (9) gives
\[
\left( \frac{\sigma_k^2}{w_k(u)} \right)^{\nu}
\le \left( \frac{K_1^M \prod_{1 \le j \le M} (1 + \varepsilon_{k-j}^2)}{\sum_{i=1}^{M} c_i(u) \varepsilon_{k-i}^2} \right)^{\nu}
\le K_2^M \left( \frac{\prod_{1 \le j \le M} (1 + \varepsilon_{k-j}^2)}{\sum_{i=1}^{M} \varepsilon_{k-i}^2} \right)^{\nu},
\]
for some $K_2 > 0$, which is now independent of $u$ (a uniform bound)! Thus, using the Hölder inequality,
\[
E \sup_{u \in U} \left( \frac{\sigma_k^2}{w_k(u)} \right)^{\nu}
\le K_2^M \left\{ E \prod_{1 \le j \le M} (1 + \varepsilon_{k-j}^2)^{2\nu} \right\}^{1/2}
\left\{ E \left( \sum_{i=1}^{M} \varepsilon_{k-i}^2 \right)^{-2\nu} \right\}^{1/2}.
\]
By the independence of the $\varepsilon_i$'s, and by Assumption 2.1 (vii),
\[
E \prod_{1 \le j \le M} (1 + \varepsilon_{k-j}^2)^{2\nu} = \left\{ E (1 + \varepsilon_0^2)^{2\nu} \right\}^M < \infty.
\]
It is now enough to show
\[
E \left( \sum_{i=1}^{M} \varepsilon_{k-i}^2 \right)^{-2\nu} < \infty.
\]
Dealing with moments of sums can be difficult, but we will use a trick to avoid having to do so. From 1st year probability, we know that for a nonnegative random variable $Y$,
\[
EY = \int_0^{\infty} P(Y > t) \, dt.
\]
Thus if $P(Y > t) \le K_3 t^{-2}$ for $t$ large enough, then $EY < \infty$. We will show
\[
P\left( \left( \sum_{i=1}^{M} \varepsilon_{k-i}^2 \right)^{-2\nu} > t \right) \le K_3 t^{-2}
\]
for $t$ large enough. We have
\[
P\left( \left( \sum_{i=1}^{M} \varepsilon_{k-i}^2 \right)^{-2\nu} > t \right)
= P\left( \sum_{i=1}^{M} \varepsilon_{k-i}^2 \le t^{-1/(2\nu)} \right). \tag{10}
\]
Obviously the following implication holds:
\[
\sum_{i=1}^{M} \varepsilon_{k-i}^2 \le t^{-1/(2\nu)} \implies \varepsilon_{k-i}^2 \le t^{-1/(2\nu)} \ \ \forall i,
\]
so, by the independence of the $\varepsilon_i$'s, (10) can be bounded from above by
\[
\left\{ P\left( \varepsilon_0^2 \le t^{-1/(2\nu)} \right) \right\}^M. \tag{11}
\]
Now, Assumption 2.1 (v) implies
\[
P(\varepsilon_0^2 \le s) \le \tilde{C} s^{\mu} \quad \text{for all } s, \text{ for some } \tilde{C}.
\]
Thus, we bound (11) from above by
\[
\tilde{C}^M t^{-\mu M/(2\nu)} \le K_3 t^{-2},
\]
if $M$ is chosen large enough, which completes the proof.

Lemma 2.3 Suppose Assumptions 2.1 (iii) and (viii) hold. Then for any $\rho > 0$,
\[
E \sup_{u \in U} \left( \frac{\sum_{i=1}^{\infty} i \, c_i(u) y_{k-i}^2}{1 + \sum_{i=1}^{\infty} c_i(u) y_{k-i}^2} \right)^{\rho} < \infty.
\]

Proof. For any $M \ge 1$, we have
\[
\frac{\sum_{i=1}^{\infty} i \, c_i(u) y_{k-i}^2}{1 + \sum_{i=1}^{\infty} c_i(u) y_{k-i}^2}
= \frac{\sum_{i=1}^{M} i \, c_i(u) y_{k-i}^2}{1 + \sum_{i=1}^{\infty} c_i(u) y_{k-i}^2}
+ \frac{\sum_{i=M+1}^{\infty} i \, c_i(u) y_{k-i}^2}{1 + \sum_{i=1}^{\infty} c_i(u) y_{k-i}^2}
\le \frac{M \sum_{i=1}^{M} c_i(u) y_{k-i}^2}{1 + \sum_{i=1}^{M} c_i(u) y_{k-i}^2}
+ \sum_{i=M+1}^{\infty} i \, c_i(u) y_{k-i}^2
\le M + \sum_{i=M+1}^{\infty} i \, c_i(u) y_{k-i}^2. \tag{12}
\]
We now recall another basic fact of probability. For a nonnegative variable $Y$, if $P(Y > t) \le e^{-t^{\tau}}$ for some $\tau > 0$ and $t$ large enough, then all moments of $Y$ are finite, i.e. $E|Y|^{\rho} < \infty$ for all $\rho$. Explanation:
\[
E|Y|^{\rho} = \int_0^{\infty} P(Y^{\rho} > t) \, dt = \int_0^{\infty} P(Y > t^{1/\rho}) \, dt \le \int_0^{\infty} e^{-t^{\tau/\rho}} \, dt < \infty.
\]
We will show
\[
P\left( \sup_{u \in U} \sum_{i=M+1}^{\infty} i \, c_i(u) y_{k-i}^2 > t \right) \le e^{-t^{\tau}}.
\]
Choose constants $\rho_0^{1/q} < \delta < 1$ and $\lambda > 1$ such that $\lambda\delta < 1$, and take $M \ge M_0(C_2, \delta, \lambda)$ large enough, so that $C_2 \, i \, \rho_0^{i/q} \le \delta^i$ for all $i > M$. Then by (8) we have
\[
P\left( \sup_{u \in U} \sum_{i=M+1}^{\infty} i \, c_i(u) y_{k-i}^2 > t \right)
\le P\left( C_2 \sum_{i=M+1}^{\infty} i \, \rho_0^{i/q} y_{k-i}^2 > t \right)
\le P\left( \sum_{i=M+1}^{\infty} \delta^i y_{k-i}^2 > t \right). \tag{13}
\]
Now,
\[
\sum_{i=M+1}^{\infty} \delta^i y_{k-i}^2 > t \implies \exists \, i = M+1, \ldots, \infty : \ \delta^i y_{k-i}^2 > t \left( 1 - \frac{1}{\lambda} \right) \lambda^{-i}.
\]
To see that this implication of the form $p \implies q$ is true, it is easy to show that $\neg q \implies \neg p$: if $\delta^i y_{k-i}^2 \le t(1 - 1/\lambda)\lambda^{-i}$ for all $i \ge M+1$, then the sum is at most $t\lambda^{-(M+1)} \le t$. Thus, by the Bonferroni inequality, (13) can be bounded from above by
\[
\sum_{i=M+1}^{\infty} P\left( y_{k-i}^2 > t \left( 1 - \frac{1}{\lambda} \right) (\lambda\delta)^{-i} \right)
= \sum_{i=M+1}^{\infty} P\left( y_0^2 > t \left( 1 - \frac{1}{\lambda} \right) (\lambda\delta)^{-i} \right).
\]
Now using the Markov inequality, we bound the above by
\[
\frac{E|y_0^2|^{\gamma}}{\{t(1 - 1/\lambda)\}^{\gamma}} \sum_{i=M+1}^{\infty} \left\{ (\lambda\delta)^{\gamma} \right\}^i
= \frac{E|y_0^2|^{\gamma}}{\{t(1 - 1/\lambda)\}^{\gamma}} \cdot \frac{\{(\lambda\delta)^{\gamma}\}^{M+1}}{1 - (\lambda\delta)^{\gamma}}. \tag{14}
\]
We now take $t > 2\max(M_0, 1)$ (it is enough to show the above for large $t$) and $M = t/2$. Combining (12) and (14), we get
\[
P\left( \sup_{u \in U} \frac{\sum_{i=1}^{\infty} i \, c_i(u) y_{k-i}^2}{1 + \sum_{i=1}^{\infty} c_i(u) y_{k-i}^2} > t \right)
\le P\left( M + \sup_{u \in U} \sum_{i=M+1}^{\infty} i \, c_i(u) y_{k-i}^2 > t \right)
= P\left( \sup_{u \in U} \sum_{i=M+1}^{\infty} i \, c_i(u) y_{k-i}^2 > \frac{t}{2} \right)
\le \frac{E|y_0^2|^{\gamma}}{\{(t/2)(1 - 1/\lambda)\}^{\gamma}} \cdot \frac{\{(\lambda\delta)^{\gamma}\}^{t/2}}{1 - (\lambda\delta)^{\gamma}}
\le K_4 e^{-K_5 t},
\]

which completes the proof.

Lemma 2.4 Let $|\cdot|$ denote the maximum norm of a vector. Suppose that Assumptions 2.1 (iii), (iv), (v), (viii) hold. Then
\[
E \sup_{u,v \in U} \frac{|\log w_k(u) - \log w_k(v)|}{|u - v|} < \infty, \qquad
E \sup_{u,v \in U} \frac{1}{|u - v|} \left| \frac{y_k^2}{w_k(u)} - \frac{y_k^2}{w_k(v)} \right| < \infty.
\]

Proof. The mean value theorem says that for a differentiable function $f$, we have
\[
\frac{|f(u) - f(v)|}{|u - v|} = |f'(\xi)|, \qquad \text{where } \max(|\xi - u|, |\xi - v|) \le |u - v|.
\]
Applying it to $f(u) = y_k^2 / w_k(u)$, we get
\[
\frac{1}{|u - v|} \left| \frac{y_k^2}{w_k(u)} - \frac{y_k^2}{w_k(v)} \right| = \frac{y_k^2}{w_k(\xi)} \cdot \frac{|w_k'(\xi)|}{w_k(\xi)}.
\]
Clearly,
\[
w_k'(u) = c_0'(u) + \sum_{i=1}^{\infty} c_i'(u) y_{k-i}^2.
\]
We now use a fact which we accept without proof. The proof is easy but long and is again done by induction. If you are interested in the details, see Berkes et al. (2003), Lemma 3.2:
\[
|c_0'(u)| < C, \qquad |c_i'(u)| < C \, i \, c_i(u).
\]
Using the above, we get
\[
\sup_{u \in U} \frac{|w_k'(u)|}{w_k(u)} \le K_6 \, \frac{1 + \sum_{i=1}^{\infty} i \, c_i(u) y_{k-i}^2}{1 + \sum_{i=1}^{\infty} c_i(u) y_{k-i}^2}.
\]
Given the above, Lemma 2.3 implies that
\[
E \sup_{u \in U} \left( \frac{|w_k'(u)|}{w_k(u)} \right)^{(2+\delta)/\delta} < \infty.
\]
On the other hand, by Lemma 2.2 and the assumptions of this lemma,
\[
E \sup_{u \in U} \left( \frac{y_k^2}{w_k(u)} \right)^{1+\delta/2}
= E(\varepsilon_k^2)^{1+\delta/2} \, E \sup_{u \in U} \left( \frac{\sigma_k^2}{w_k(u)} \right)^{1+\delta/2} < \infty.
\]
The Hölder inequality (with the conjugate exponents $1+\delta/2$ and $(2+\delta)/\delta$) completes the proof. The proof for the log term is very similar.

Lemma 2.5 Suppose Assumptions 2.1 (iii), (iv) and (v) hold. Then
\[
\sup_{u \in U} \left| \frac{1}{n} \hat{L}_n(u) - L(u) \right| \to 0
\]
almost surely, as $n \to \infty$, where
\[
L(u) = -\frac{1}{2} E\left( \log w_0(u) + \frac{y_0^2}{w_0(u)} \right).
\]

Proof. We start by stating the fact that if $E|\varepsilon_0^2|^{\delta} < \infty$ for some $\delta > 0$, then there exists a $\gamma > 0$ such that $E|y_0^2|^{\gamma} < \infty$ and $E|\sigma_0^2|^{\gamma} < \infty$. The proof is beyond the scope of this course. See Berkes et al. (2003), Lemma 2.3, for details. Thus, the assumptions of this lemma mean that
\[
E|y_0^2|^{\gamma} < \infty \tag{15}
\]
for some $\gamma$. Using Lemma 2.1,
\[
0 < C_1 \le w_k(u) \le C_2 \left( 1 + \sum_{1 \le i < \infty} \rho_0^{i/q} y_{k-i}^2 \right),
\]
which implies
\[
|\log w_0(u)| \le |\log C_1| + \log C_2 + \log\left( 1 + \sum_{1 \le i < \infty} \rho_0^{i/q} y_{-i}^2 \right),
\]
which implies $E|\log w_0(u)| < \infty$ by (15). By Lemma 2.2,
\[
E\left( \frac{y_0^2}{w_0(u)} \right) = E\varepsilon_0^2 \, E\left( \frac{\sigma_0^2}{w_0(u)} \right) < \infty.
\]
Clearly, there exists a function $g$ such that
\[
y_k = g(\varepsilon_k, \varepsilon_{k-1}, \ldots),
\]
and therefore $y_k$ is stationary and ergodic by Theorem 3.5.8 of Stout (1974) (since $\{\varepsilon_k\}_k$ is stationary and ergodic as it is independent). As the expectation defining $L(u)$ is finite, we can use the ergodic theorem, which says that for any $u \in U$,
\[
\frac{1}{n} \hat{L}_n(u) \to L(u)
\]
almost surely. Also,
\[
\sup_{u,v \in U} \frac{|\hat{L}_n(u) - \hat{L}_n(v)|}{|u - v|} \le \frac{1}{2} \sum_{1 \le k \le n} \zeta_k,
\]
where
\[
\zeta_k = \sup_{u,v \in U} \frac{1}{|u - v|} \left( |\log w_k(u) - \log w_k(v)| + \left| \frac{y_k^2}{w_k(u)} - \frac{y_k^2}{w_k(v)} \right| \right).
\]
Again by Theorem 3.5.8 of Stout (1974), $\zeta_k$ is stationary and ergodic. By Lemma 2.4, $E\zeta_0 < \infty$. Using the ergodic theorem,
\[
\frac{1}{n} \sum_{i=1}^{n} \zeta_i = O(1)
\]
almost surely, showing that
\[
\sup_{u,v \in U} \frac{1}{|u - v|} \left| \frac{1}{n}\hat{L}_n(u) - \frac{1}{n}\hat{L}_n(v) \right| = O(1).
\]
Thus the sequence of functions $\hat{L}_n(u)/n$ is equicontinuous. Also, as shown earlier, it converges almost surely to $L(u)$ for all $u \in U$. This, along with the fact that $U$ is a compact set, implies that the convergence is uniform, which completes the proof. (Recall a well-known fact of mathematical analysis: let $f_n$ be an equicontinuous sequence of functions from a compact set to $\mathbb{R}$. If $f_n(x) \to f(x)$ for all $x$, then $f_n \to f$ uniformly in $x$.)

Lemma 2.6 Suppose the conditions of Theorem 2.1 are satisfied. Then $L(u)$ has a unique maximum at $\theta$.
Proof. $w_0(\theta) = \sigma_0^2$. As $E\varepsilon_0^2 = 1$,
\[
E\left( \frac{y_0^2}{w_0(u)} \right) = E\left( \frac{\sigma_0^2}{w_0(u)} \right).
\]
We have
\begin{align*}
L(\theta) - L(u) &= -\frac{1}{2} E\left( \log \sigma_0^2 + \frac{y_0^2}{\sigma_0^2} \right) + \frac{1}{2} E\left( \log w_0(u) + \frac{y_0^2}{w_0(u)} \right) \\
&= -\frac{1}{2} E(\log \sigma_0^2) - \frac{1}{2} + \frac{1}{2} E(\log w_0(u)) + \frac{1}{2} E\left( \frac{\sigma_0^2}{w_0(u)} \right) \\
&= -\frac{1}{2} \left\{ E \log \frac{\sigma_0^2}{w_0(u)} + 1 - E\left( \frac{\sigma_0^2}{w_0(u)} \right) \right\} \\
&= -\frac{1}{2} + \frac{1}{2} E\left( \frac{\sigma_0^2}{w_0(u)} - \log \frac{\sigma_0^2}{w_0(u)} \right).
\end{align*}
The function $x - \log x$ is positive for all $x > 0$ and attains its minimum value (of 1) for $x = 1$. Thus $L(u)$ has a global maximum at $\theta$.

Is the maximum unique? Assume $L(u^*) = L(\theta)$ for some $u^* \in U$. Then
\[
0 = L(\theta) - L(u^*) = -\frac{1}{2} + \frac{1}{2} E\left( \frac{\sigma_0^2}{w_0(u^*)} - \log \frac{\sigma_0^2}{w_0(u^*)} \right).
\]
When is it possible that $E(X - \log X) = 1$ if $X > 0$? $X - \log X \ge 1$, so it is only possible if $X = 1$ almost surely. Thus $\sigma_0^2 = w_0(u^*)$ almost surely, so we must also have $c_i(\theta) = c_i(u^*)$ for all $i$ (we accept this intuitively obvious statement here without proof; see Berkes et al. (2003) for details). So we also have $\sigma_k^2 = w_k(u^*)$ for all $k$. Let
\[
u^* = (x^*, s_1^*, \ldots, s_p^*, t_1^*, \ldots, t_q^*).
\]
On the one hand, by definition,
\[
\sigma_k^2 = w_k(\theta) = \omega + \alpha_1 y_{k-1}^2 + \ldots + \alpha_p y_{k-p}^2 + \beta_1 \sigma_{k-1}^2 + \ldots + \beta_q \sigma_{k-q}^2.
\]
On the other hand, by the above discussion,
\[
\sigma_k^2 = w_k(u^*) = x^* + s_1^* y_{k-1}^2 + \ldots + s_p^* y_{k-p}^2 + t_1^* \sigma_{k-1}^2 + \ldots + t_q^* \sigma_{k-q}^2.
\]
Equating coefficients (using the uniqueness of the GARCH representation, also without proof: see Berkes et al. (2003) for details), we have $u^* = \theta$, which completes the proof.

We are finally ready to prove Theorem 2.1.

Proof of Theorem 2.1. $U$ is a compact set. $\hat{L}_n/n$ converges uniformly to $L$ on $U$ with probability one (Lemma 2.5) and $L$ has a unique maximum at $u = \theta$ (Lemma 2.6). Thus by standard arguments (best seen graphically!) the locations of the maxima of $\hat{L}_n/n$ converge to that of $L$. This completes the proof of the Theorem.

Exercise: try to think why we need uniform convergence for this reasoning to be valid. Would it not be enough if $\hat{L}_n(u)/n$ converged pointwise to $L(u)$ for all $u$?

2.4 Forecasting

By standard Hilbert space theory, the best point forecasts of $y_{k+h}$ under the $L_2$ norm are given by $E(y_{k+h} \mid \mathcal{F}_k)$ and are equal to zero if $h > 0$ by the martingale difference property of $y_k$.

The equation (3) is a convenient starting point for the analysis of the optimal forecasts for $y_{k+h}^2$. Again under the $L_2$ norm, they are given by $E(y_{k+h}^2 \mid \mathcal{F}_k)$. Formally, this only makes sense if $E(y_k^4) < \infty$, which is not always the case. However, many authors take the above as their forecasting statistic of choice. It might be more correct (and interesting) to consider $\mathrm{Median}(y_{k+h}^2 \mid \mathcal{F}_k)$, which is the optimal forecast under the $L_1$ norm. This always makes sense as $E(y_k^2) < \infty$, as we saw before. However, it is mathematically far more tractable to look at $E(y_{k+h}^2 \mid \mathcal{F}_k)$, which is what we are going to do below.
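As an aside, the conditional median is rarely available in closed form, but it can be approximated by Monte Carlo: simulate many $h$-step futures of the model from the current state and take the sample median of $y_{k+h}^2$. A sketch for GARCH(1,1), with my own function name and arbitrary illustrative inputs:

```python
import numpy as np

def median_forecast_y2(yk2, sigk2, omega, alpha1, beta1, h,
                       n_sim=100_000, seed=0):
    """Monte Carlo approximation of Median(y_{k+h}^2 | F_k) for GARCH(1,1):
    simulate n_sim independent h-step futures from (y_k^2, sigma_k^2)."""
    rng = np.random.default_rng(seed)
    y2 = np.full(n_sim, yk2)
    s2 = np.full(n_sim, sigk2)
    for _ in range(h):
        s2 = omega + alpha1 * y2 + beta1 * s2
        y2 = s2 * rng.standard_normal(n_sim) ** 2
    return np.median(y2)

# The L1-optimal forecast sits below the L2-optimal one E(y_{k+h}^2 | F_k),
# since the conditional distribution of y_{k+h}^2 is strongly right-skewed.
print(median_forecast_y2(4e-4, 6e-4, 1e-5, 0.08, 0.90, h=10))
```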

Take $h > 0$. Recall that $E(Z_{k+h} \mid \mathcal{F}_k) = 0$. From (3), we get
\begin{align*}
y_{k+h}^2 &= \omega + \sum_{i=1}^{R} (\alpha_i + \beta_i) y_{k+h-i}^2 - \sum_{j=1}^{q} \beta_j Z_{k+h-j} + Z_{k+h}, \\
E(y_{k+h}^2 \mid \mathcal{F}_k) &= \omega + \sum_{i=1}^{R} (\alpha_i + \beta_i) E(y_{k+h-i}^2 \mid \mathcal{F}_k) - \sum_{j=1}^{q} \beta_j E(Z_{k+h-j} \mid \mathcal{F}_k) + E(Z_{k+h} \mid \mathcal{F}_k), \\
E(y_{k+h}^2 \mid \mathcal{F}_k) &= \omega + \sum_{i=1}^{R} (\alpha_i + \beta_i) E(y_{k+h-i}^2 \mid \mathcal{F}_k) - \sum_{j=1}^{q} \beta_j E(Z_{k+h-j} \mid \mathcal{F}_k). \tag{16}
\end{align*}
The recursive formula (16) is used to compute the forecasts, with the following boundary conditions:

$E(y_{k+h-i}^2 \mid \mathcal{F}_k)$ is given recursively by (16) if $h > i$,
$E(y_{k+h-i}^2 \mid \mathcal{F}_k) = y_{k+h-i}^2$ if $h \le i$,
$E(Z_{k+h-j} \mid \mathcal{F}_k) = 0$ if $h > j$,
$E(Z_{k+h-j} \mid \mathcal{F}_k) = Z_{k+h-j}$ if $h \le j$.
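A direct implementation of recursion (16) with these boundary conditions might look as follows; the helper name and the use of the simulator's conditional variances are assumptions of this sketch, and for GARCH(1,1) the output reduces to the closed form derived in Section 2.4.2.

```python
import numpy as np

def forecast_y2(y2, sig2, omega, alpha, beta, H):
    """E(y_{k+h}^2 | F_k) for h = 1..H via recursion (16); y2 and sig2 hold
    the observed squared returns and conditional variances up to time k."""
    p, q = len(alpha), len(beta)
    R = max(p, q)
    a = np.concatenate([alpha, np.zeros(R - p)])   # alpha_i = 0 for i > p
    b = np.concatenate([beta, np.zeros(R - q)])    # beta_j  = 0 for j > q
    Z = y2 - sig2                                  # Z_k = y_k^2 - sigma_k^2
    f = {}                                         # f[h] = E(y_{k+h}^2 | F_k)
    for h in range(1, H + 1):
        val = omega
        for i in range(1, R + 1):                  # squared-return terms
            val += (a[i - 1] + b[i - 1]) * (f[h - i] if h > i
                                            else y2[len(y2) - 1 + h - i])
        for j in range(1, q + 1):                  # Z terms: zero if h > j
            if h <= j:
                val -= beta[j - 1] * Z[len(Z) - 1 + h - j]
        f[h] = val
    return np.array([f[h] for h in range(1, H + 1)])

# Example: a short GARCH(1,1) history, then a 50-step forecast.
omega, alpha, beta = 1e-5, [0.08], [0.90]
rng = np.random.default_rng(0)
n = 1000
y2 = np.zeros(n)
sig2 = np.full(n, omega / 0.02)     # stationary variance as starting value
for k in range(1, n):
    sig2[k] = omega + alpha[0] * y2[k - 1] + beta[0] * sig2[k - 1]
    y2[k] = sig2[k] * rng.standard_normal() ** 2
fc = forecast_y2(y2, sig2, omega, alpha, beta, H=50)
```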

2.4.1 The asymptotic forecast

For $h > q$, (16) becomes
\[
E(y_{k+h}^2 \mid \mathcal{F}_k) = \omega + \sum_{i=1}^{R} (\alpha_i + \beta_i) E(y_{k+h-i}^2 \mid \mathcal{F}_k), \tag{17}
\]
which is a difference equation for the sequence $\{E(y_{k+h}^2 \mid \mathcal{F}_k)\}_{h=q+1}^{\infty}$. Standard theory of difference equations says that if the roots of the polynomial
\[
p(z) = 1 - (\alpha_1 + \beta_1)z - \ldots - (\alpha_R + \beta_R)z^R
\]
lie outside the unit circle, then the solution of (17) converges to
\[
\frac{\omega}{1 - \sum_{i=1}^{R} (\alpha_i + \beta_i)},
\]
which is the unconditional expectation of $y_k^2$! In other words, as the forecasting horizon gets longer and longer, the conditioning set $\mathcal{F}_k$ has less and less impact on the forecast and asymptotically, it does not matter at all.

2.4.2 Example: GARCH(1,1)

In this section, we obtain explicit formulae for forecasts in the GARCH(1,1) model. Using formula (16) and the definition of $Z_k$, we get
\[
E(y_{k+1}^2 \mid \mathcal{F}_k) = \omega + (\alpha_1 + \beta_1) y_k^2 - \beta_1 Z_k = \omega + \alpha_1 y_k^2 + \beta_1 \sigma_k^2.
\]
Substituting recursively into (17), we obtain
\begin{align*}
E(y_{k+2}^2 \mid \mathcal{F}_k) &= \omega[1 + (\alpha_1 + \beta_1)] + \alpha_1(\alpha_1 + \beta_1) y_k^2 + \beta_1(\alpha_1 + \beta_1) \sigma_k^2 \\
E(y_{k+3}^2 \mid \mathcal{F}_k) &= \omega[1 + (\alpha_1 + \beta_1) + (\alpha_1 + \beta_1)^2] + \alpha_1(\alpha_1 + \beta_1)^2 y_k^2 + \beta_1(\alpha_1 + \beta_1)^2 \sigma_k^2 \\
&\ \ \vdots \\
E(y_{k+h}^2 \mid \mathcal{F}_k) &= \omega \sum_{i=0}^{h-1} (\alpha_1 + \beta_1)^i + \alpha_1(\alpha_1 + \beta_1)^{h-1} y_k^2 + \beta_1(\alpha_1 + \beta_1)^{h-1} \sigma_k^2,
\end{align*}
which clearly converges to $\omega/(1 - \alpha_1 - \beta_1)$ as $h \to \infty$, as expected.
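The closed-form expression translates into a few lines of code; as a check, the forecasts indeed approach the unconditional variance $\omega/(1 - \alpha_1 - \beta_1)$ geometrically at rate $\alpha_1 + \beta_1$. The inputs $y_k^2$ and $\sigma_k^2$ below are arbitrary illustrative values, and the function name is mine.

```python
import numpy as np

def garch11_forecast(yk2, sigk2, omega, alpha1, beta1, H):
    """Explicit GARCH(1,1) forecasts E(y_{k+h}^2 | F_k), h = 1..H."""
    g = alpha1 + beta1
    h = np.arange(1, H + 1)
    return (omega * (1 - g**h) / (1 - g)          # omega * sum of g^i
            + alpha1 * g**(h - 1) * yk2
            + beta1 * g**(h - 1) * sigk2)

# The forecasts decay geometrically (at rate alpha1 + beta1) towards the
# unconditional variance omega / (1 - alpha1 - beta1).
fc = garch11_forecast(4e-4, 6e-4, 1e-5, 0.08, 0.90, H=200)
print(fc[-1], 1e-5 / 0.02)   # nearly equal for large h
```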

2.5 Extensions of GARCH

There are many extensions of the GARCH model. Two of them, EGARCH and IGARCH, are probably the most popular and are covered in Straumann (2005). The Exponential GARCH (EGARCH) model reads
\[
\log \sigma_k^2 = \omega + \beta \log \sigma_{k-1}^2 + \gamma \varepsilon_{k-1} + \delta |\varepsilon_{k-1}|.
\]
The Integrated GARCH (IGARCH) process is a GARCH process for which
\[
\sum_{i=1}^{R} (\alpha_i + \beta_i) = 1.
\]
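A minimal EGARCH simulator, analogous to the GARCH sketch of Section 2.1; the parameter names follow the display above, and the values are arbitrary choices of mine.

```python
import numpy as np

def simulate_egarch(n, omega, beta, gamma, delta, burn=500, seed=0):
    """Simulate the EGARCH model above with N(0, 1) innovations.
    gamma < 0 makes volatility react more strongly to negative returns,
    reproducing the leverage effect discussed in Section 1."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n + burn)
    logs2 = np.zeros(n + burn)
    for k in range(1, n + burn):
        logs2[k] = (omega + beta * logs2[k - 1]
                    + gamma * eps[k - 1] + delta * abs(eps[k - 1]))
    y = np.exp(0.5 * logs2) * eps
    return y[burn:]

y = simulate_egarch(2000, omega=-0.1, beta=0.98, gamma=-0.06, delta=0.1)
```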

2.6 Software for fitting GARCH models

Both S-Plus and R have their own packages containing routines for fitting and forecasting GARCH models. The S-Plus module is called FinMetrics, is described on http://www.insightful.com/products/finmetrics/ and is a commercial product. Sadly, it is much better than its (free) R counterpart, the tseries package, available from http://cran.r-project.org/src/contrib/Descriptions/tseries.html The R package is only able to fit GARCH models, while the S-Plus module can fit GARCH, EGARCH and a number of other models.

2.7 Relevance of GARCH models

Are GARCH models really used in practice? The answer is YES. Only recently, a big UK bank was looking for a time series analyst to work on portfolio construction (risk management). One of the job requirements was familiarity with GARCH models!

References
A. K. Bera and M. L. Higgins. ARCH models: properties, estimation and testing. J. Economic Surveys, 7:305–366, 1993.

I. Berkes, L. Horváth, and P. Kokoszka. GARCH processes: structure and estimation. Bernoulli, 9:201–227, 2003.

T. Bollerslev. Generalized autoregressive conditional heteroskedasticity. J. Econometrics, 31:307–327, 1986.

P. Bougerol and N. Picard. Stationarity of GARCH processes and of some nonnegative time series. J. Econometrics, 52:115–127, 1992.

J. Davidson. Stochastic Limit Theory. Oxford University Press, 1994.

R. F. Engle. Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica, 50:987–1007, 1982.

L. Giraitis, R. Leipus, and D. Surgailis. Recent advances in ARCH modelling. In Long Memory in Economics, pages 3–39. Springer, Berlin, 2005.

W. F. Stout. Almost Sure Convergence. Academic Press, New York–London, 1974. Probability and Mathematical Statistics, Vol. 24.

D. Straumann. Estimation in Conditionally Heteroscedastic Time Series Models, volume 181 of Lecture Notes in Statistics. Springer-Verlag, Berlin, 2005.

S. J. Taylor. Modelling Financial Time Series. Wiley, Chichester, 1986.
