y_t = y_t* + ε_t    (2)

where y_t is the ex post interest differential, y_t* is the ex ante real interest differential and ε_t is the cross-country differential in inflation forecast errors which is assumed to be
identically and independently distributed. They conclude that ex ante real interest differentials are short-lived and mean-reverting to zero, supporting theoretical models of economic interdependence.
Factor models
There is a long tradition of factor models in finance. These models simplify the
computation of the covariance matrix of returns in the context of mean-variance
portfolio allocation. Furthermore, factors are central in two asset pricing theories: the
mutual fund separation theory, of which the CAPM is a special case, and the arbitrage
pricing theory (APT). In latent factor models, the observed variables depend on a few
factors that are modelled as GARCH processes. Multivariate latent factor models have
been used in several applications. For example, Diebold and Nerlove (1989) fitted a
one-factor model to represent the dynamic evolution of the volatilities of seven dollar
exchange rates. King, Sentana and Wadhwani (1994) used a factor model to assess the extent of capital market integration in sixteen national stock markets. Sentana (2004)
analyses the statistical properties of alternative ways of creating actively and passively
managed mimicking portfolios from a finite number of assets. He proposes the
following model:
r_1t = c_11 f_1t + c_12 f_2t + ... + c_1k f_kt + v_1t
  ...
r_Nt = c_N1 f_1t + c_N2 f_2t + ... + c_Nk f_kt + v_Nt

or, in vector notation, r_t = C f_t + v_t, where f_t = (f_1t, ..., f_kt)' are the common factors and v_t = (v_1t, ..., v_Nt)' the idiosyncratic noises.
y_t(τ) = ã_t + b̃_t exp{-cτ} + u_t(τ)

where y_t(τ) is the spot interest rate at time t for maturity t + τ.
In a dynamic context, Dungey, Martin and Pagan (2000) analyze bond yield spreads between five countries by decomposing international interest rate spreads into national and global latent factors.
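As an illustration (not taken from any of the papers cited above), a one-factor model r_t = C f_t + v_t with a GARCH(1,1) latent factor, in the spirit of Diebold and Nerlove (1989), can be simulated as follows; all parameter values are made up for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 1000, 7                      # sample size and number of series
c = rng.uniform(0.5, 1.5, size=N)   # factor loadings (hypothetical values)
omega, alpha, beta = 0.1, 0.1, 0.8  # GARCH(1,1) parameters for the factor

f = np.zeros(T)
sig2 = np.full(T, omega / (1 - alpha - beta))  # start at the unconditional variance
for t in range(1, T):
    sig2[t] = omega + alpha * f[t - 1] ** 2 + beta * sig2[t - 1]
    f[t] = np.sqrt(sig2[t]) * rng.standard_normal()

v = 0.5 * rng.standard_normal((T, N))  # idiosyncratic noises
r = f[:, None] * c[None, :] + v        # T x N panel: r_t = c f_t + v_t
print(r.shape)
```

Because the factor is common to all N series, periods of high factor volatility show up as volatility clustering in every column of r at the same time.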
Modelling volatility
Consider that we are interested in modelling the volatility of a price series. There are two main types of models proposed for this goal. The most popular are the GARCH models, where the volatility is assumed to be a non-linear function of past returns. Consider, for example, the GARCH(1,1) model given by
y_t = σ_t ε_t
σ_t² = ω + α y_{t-1}² + β σ_{t-1}²

where ε_t is an IID white noise with variance 1. The parameters have to be restricted to guarantee the positivity of the conditional variance: in particular, ω > 0, α ≥ 0 and β ≥ 0. The stationarity condition is α + β < 1.
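A minimal simulation of this GARCH(1,1) model, with parameter values chosen only to satisfy the restrictions above:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 5000
omega, alpha, beta = 0.05, 0.1, 0.85   # made-up values
assert omega > 0 and alpha >= 0 and beta >= 0 and alpha + beta < 1

y = np.zeros(T)
sig2 = np.full(T, omega / (1 - alpha - beta))   # unconditional variance
for t in range(1, T):
    sig2[t] = omega + alpha * y[t - 1] ** 2 + beta * sig2[t - 1]
    y[t] = np.sqrt(sig2[t]) * rng.standard_normal()

# the sample variance should be in the vicinity of omega / (1 - alpha - beta)
print(y.var(), omega / (1 - alpha - beta))
```

Note that here the conditional variance σ_t² is a deterministic function of past observations, which is the point made in the next paragraph.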
ARCH-type models assume that the volatility can be observed one-step-ahead. However, a more realistic model for volatility can be based on modelling it as having a predictable component that depends on past information plus an unexpected noise. In this case, the volatility is a latent unobserved variable. One interpretation of the latent volatility is that it represents the arrival of new information into the market; see, for example, Clark (1973). In the simplest case, the log-volatility follows an AR(1) process.
Then, we have the ARSV(1) model given by
y_t = σ* σ_t ε_t
log(σ_t²) = φ log(σ_{t-1}²) + η_t

where σ* is a scale parameter and ε_t is a strict white noise with variance 1. The noise of the volatility equation, η_t, is assumed to be a Gaussian white noise with variance σ_η²
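The ARSV(1) model can be simulated in the same way; the parameter values below are illustrative only, with |φ| < 1 so that the log-volatility is stationary:

```python
import numpy as np

rng = np.random.default_rng(2)
T = 2000
phi, sig_star, sig_eta = 0.98, 1.0, 0.2   # made-up values, |phi| < 1

log_sig2 = np.zeros(T)                    # log(sigma_t^2), an AR(1) process
for t in range(1, T):
    log_sig2[t] = phi * log_sig2[t - 1] + sig_eta * rng.standard_normal()

y = sig_star * np.exp(log_sig2 / 2) * rng.standard_normal(T)
print(y[:3])
```

Unlike the GARCH case, the volatility path here is driven by its own noise η_t, so it cannot be recovered exactly from past observations.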
In state space form, we have

Z_t = (1, 0, ..., 0),   α_t = (y_t*, y_{t-1}*, ..., y_{t-p+1}*)',

T = ( φ_1  φ_2  ...  φ_p )
    ( 1    0    ...  0   )
    ( ...  ...  ...  ... )
    ( 0    ...  1    0   )

H_t = σ_ε²  and  Q_t = σ_η².
When ε_t and η_t are assumed to be Gaussian, the model is a Gaussian state space model.
Unobserved component models depend on several disturbances. Provided the model is linear, the components driven by these disturbances can be combined to give a model with a single disturbance. This is known as the reduced form. The reduced form is an ARIMA model, and the fact that it is derived from a structural form will typically imply restrictions on its parameters.
Consider, once more, the random walk plus noise model. Taking first differences, we
obtain the following expression:
Δy_t = η_t + Δε_t

The mean and variance of Δy_t are given by

E(Δy_t) = E(η_t) + E(Δε_t) = 0
Var(Δy_t) = Var(η_t) + Var(Δε_t) = σ_η² + 2σ_ε²
The dynamic properties of Δy_t can be analysed by looking at its autocorrelation function, given by

ρ(h) = -σ_ε² / (σ_η² + 2σ_ε²) = -1/(q + 2),   h = 1
ρ(h) = 0,                                     h ≥ 2

The constant q = σ_η²/σ_ε² is known as the signal-to-noise ratio. When q → ∞, ρ(1) → 0, so that Δy_t is a white noise and y_t is a random walk.
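These properties can be checked by simulation: for a long simulated series, the sample first-order autocorrelation of Δy_t should be close to -1/(q + 2), and higher-order autocorrelations close to zero. The variances below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
T = 100_000
sig2_eps, sig2_eta = 1.0, 0.5    # made-up values
q = sig2_eta / sig2_eps          # signal-to-noise ratio

mu = np.cumsum(np.sqrt(sig2_eta) * rng.standard_normal(T))  # random walk level
y = mu + np.sqrt(sig2_eps) * rng.standard_normal(T)
dy = np.diff(y)

def acf(x, h):
    # sample autocorrelation of x at lag h
    x = x - x.mean()
    return np.dot(x[:-h], x[h:]) / np.dot(x, x)

print(acf(dy, 1), -1 / (q + 2))   # both should be close to -1/(q+2) = -0.4
```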
2.2 The Kalman filter: filtered estimates of the unobserved components
The Kalman filter is made up of two sets of equations. First, we have the prediction
equations that give us the one-step ahead predictions of the unobserved components:
a_{t/t-1} = E[α_t | Y_{t-1}] = T_t a_{t-1} + c_t

where a_{t-1} = E[α_{t-1} | Y_{t-1}]. The one-step ahead MSE matrices of the components are given by

P_{t/t-1} = E[(α_t - a_{t/t-1})(α_t - a_{t/t-1})' | Y_{t-1}] = T_t P_{t-1} T_t' + Q_t

where P_{t-1} = E[(α_{t-1} - a_{t-1})(α_{t-1} - a_{t-1})' | Y_{t-1}] is the MSE matrix of a_{t-1}. Once we have these one-step ahead estimates of the state, we can also obtain the one-step ahead estimates of y_t and the corresponding prediction errors and their MSEs as follows:

ỹ_{t/t-1} = E[y_t | Y_{t-1}] = Z_t a_{t/t-1} + d_t
ν_t = y_t - ỹ_{t/t-1} = Z_t (α_t - a_{t/t-1}) + ε_t
F_t = E(ν_t²) = Z_t P_{t/t-1} Z_t' + H_t
The one-step ahead estimates of the state, a_{t/t-1}, can be updated using the new information provided by the observation y_t. The resulting equations are known as updating equations. These equations can be easily derived using the properties of the multivariate normal distribution. In particular, consider the distribution of α_t and y_t conditional on past information up to and including time t-1. The conditional mean and variance of α_t and y_t have been derived before. The conditional covariance between both variables can be easily derived taking into account that y_t can be written as

y_t = Z_t a_{t/t-1} + d_t + Z_t (α_t - a_{t/t-1}) + ε_t

and, therefore,
Cov(y_t, α_t | Y_{t-1}) = E[(y_t - Z_t a_{t/t-1} - d_t)(α_t - a_{t/t-1})']
                        = E[(Z_t (α_t - a_{t/t-1}) + ε_t)(α_t - a_{t/t-1})']
                        = Z_t P_{t/t-1}
Consequently, the required conditional distribution is given by
( y_t )               ( ( Z_t a_{t/t-1} + d_t )   ( F_t              Z_t P_{t/t-1} ) )
( α_t ) | Y_{t-1} ~ N ( ( a_{t/t-1}           ) , ( P_{t/t-1} Z_t'   P_{t/t-1}     ) )
From this distribution, we can see that the updating equations are given by

a_t = E[α_t | Y_t] = a_{t/t-1} + P_{t/t-1} Z_t' F_t^{-1} ν_t
P_t = E[(α_t - a_t)(α_t - a_t)' | Y_t] = P_{t/t-1} - P_{t/t-1} Z_t' F_t^{-1} Z_t P_{t/t-1}
The prediction error plays a role in updating the new estimates: the more the predictor deviates from its realized value, the bigger the change made to the estimator of the state.
If the model is Gaussian then, given the initial conditions a_0 and P_0, the Kalman filter delivers the conditional mean of the state, which is the minimum MSE estimator of the state as each new observation becomes available. When the disturbances are not normally distributed, it is no longer true, in general, that the Kalman filter yields the conditional mean of the state vector. In this case, the estimates are minimum MSE linear estimates.
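The prediction and updating equations above can be sketched, for the scalar random walk plus noise model, as follows; the initial conditions a0 and P0 are arguments, and nothing here is specific to any particular data set:

```python
import numpy as np

def kalman_local_level(y, sig2_eps, sig2_eta, a0=0.0, P0=1e7):
    """Kalman filter for y_t = mu_t + eps_t, mu_t = mu_{t-1} + eta_t."""
    n = len(y)
    a_pred, P_pred = np.zeros(n), np.zeros(n)   # a_{t/t-1}, P_{t/t-1}
    a_upd, P_upd = np.zeros(n), np.zeros(n)     # a_t, P_t
    nu, F = np.zeros(n), np.zeros(n)            # innovations and their variances
    a, P = a0, P0
    for t in range(n):
        # prediction equations (T_t = 1, c_t = 0 for the local level model)
        a_pred[t], P_pred[t] = a, P + sig2_eta
        # innovation and its variance
        nu[t] = y[t] - a_pred[t]
        F[t] = P_pred[t] + sig2_eps
        # updating equations
        a = a_pred[t] + (P_pred[t] / F[t]) * nu[t]
        P = P_pred[t] - P_pred[t] ** 2 / F[t]
        a_upd[t], P_upd[t] = a, P
    return a_pred, P_pred, a_upd, P_upd, nu, F
```

Since the model is time-invariant, the P recursions do not depend on the data and P_upd converges quickly to a steady state, in line with the remark that follows.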
It is important to note that, in time-invariant models, the observations y_t do not affect the MSE matrices P_{t/t-1} and P_t. Therefore, these matrices are both conditional and unconditional MSE matrices.
Consider, for example, the random walk plus noise model with known σ_ε² and σ_η². To initialize the filter, we need initial values for m_0 and P_0. One alternative is to use what is known as a diffuse prior distribution which, in this case, is given by m_0 = 0 and P_0 = ∞, where m_0 = E(μ_0). This says that nothing is known about the initial state.
Then, using the expression of the prediction equations, we obtain

m_{1/0} = m_0 = 0
P_{1/0} = P_0 + σ_η²
We can update this estimate of the underlying level at time 1 by using the information contained in y_1. Then, using the updating equations of the Kalman filter, we obtain

m_1 = m_{1/0} + (P_{1/0} / F_1)(y_1 - m_{1/0}) = (P_0 + σ_η²) y_1 / (P_0 + σ_η² + σ_ε²)

P_1 = P_{1/0} - P_{1/0}² F_1^{-1} = P_{1/0} (1 - P_{1/0}/F_1) = (P_0 + σ_η²) σ_ε² / (P_0 + σ_η² + σ_ε²)
Then, we can continue applying the prediction and updating equations recursively:

m_{2/1} = m_1
P_{2/1} = P_1 + σ_η²

and

m_2 = m_{2/1} + (P_{2/1} / F_2)(y_2 - m_{2/1})
P_2 = P_{2/1} (1 - P_{2/1}/F_2) = (P_1 + σ_η²) σ_ε² / (P_1 + σ_η² + σ_ε²)
Note that initializing with a diffuse prior is equivalent to using the first observation as an initial value at time t = 1.
If the state is generated by a stationary process, the initial conditions for the Kalman filter are given by its marginal mean and variance.
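The equivalence between the diffuse prior and conditioning on the first observation can be checked numerically by approximating P_0 = ∞ with a very large value; the first observation y_1 = 2.5 below is arbitrary:

```python
sig2_eps, sig2_eta = 1.0, 0.49
m0, P0 = 0.0, 1e8              # huge P0 stands in for the diffuse prior
y1 = 2.5                       # an arbitrary first observation

m10, P10 = m0, P0 + sig2_eta   # prediction step
F1 = P10 + sig2_eps            # innovation variance
m1 = m10 + (P10 / F1) * (y1 - m10)   # updating step
P1 = P10 - P10 ** 2 / F1

print(m1, P1)   # approximately y1 = 2.5 and sig2_eps = 1.0
```

As P_0 grows, m_1 tends to y_1 and P_1 tends to σ_ε², exactly as stated above.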
Summarizing, if the system matrices are observable at time t-1, the Kalman filter yields optimal:
i) One-step ahead estimates of the unobserved components: a_{t/t-1}.
ii) Updated estimates of the unobserved components: a_t.
iii) One-step ahead prediction errors of y_t and their variances: ν_t and F_t.
Consider, for example, the following series generated by a random walk plus noise model with parameters σ_ε² = 1 and σ_η² = 0.49.
The fixed-interval smoothing equations, which deliver estimates of α_t based on the whole sample Y_T, are given by

a_{t/T} = E[α_t | Y_T] = a_t + P_t* (a_{t+1/T} - T_{t+1} a_t - c_{t+1})
P_{t/T} = P_t + P_t* (P_{t+1/T} - P_{t+1/t}) P_t*'
P_t* = P_t T_{t+1}' P_{t+1/t}^{-1}
Given that the smoothed estimate of α_t is based on more information than the filtered estimates, its MSE, P_{t/T}, is, in general, smaller than that of the filtered estimator.
These smoothers are very useful because they also provide what are known as the auxiliary residuals, which are estimates of the disturbances associated with each of the different components of the model. These auxiliary residuals can be used to identify outliers that affect different components (Harvey and Koopman, 1992) or to identify whether the components of a given series are conditionally heteroscedastic (Broto and Ruiz, 2005a,b). Expressions for the auxiliary residuals have been derived by Durbin and Koopman (2001).
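The filtering and smoothing recursions for the local level model can be sketched as follows; this is a self-contained illustration (with T_t = 1 and c_t = d_t = 0), not code from any of the references:

```python
import numpy as np

def smooth_local_level(y, sig2_eps, sig2_eta, a0=0.0, P0=1e7):
    n = len(y)
    a_pred, P_pred = np.zeros(n), np.zeros(n)
    a_upd, P_upd = np.zeros(n), np.zeros(n)
    a, P = a0, P0
    # forward pass: Kalman filter
    for t in range(n):
        a_pred[t], P_pred[t] = a, P + sig2_eta
        F = P_pred[t] + sig2_eps
        a = a_pred[t] + (P_pred[t] / F) * (y[t] - a_pred[t])
        P = P_pred[t] - P_pred[t] ** 2 / F
        a_upd[t], P_upd[t] = a, P
    # backward pass: fixed-interval smoother, with P*_t = P_t / P_{t+1/t}
    a_sm, P_sm = a_upd.copy(), P_upd.copy()
    for t in range(n - 2, -1, -1):
        Pstar = P_upd[t] / P_pred[t + 1]
        a_sm[t] = a_upd[t] + Pstar * (a_sm[t + 1] - a_pred[t + 1])
        P_sm[t] = P_upd[t] + Pstar ** 2 * (P_sm[t + 1] - P_pred[t + 1])
    return a_sm, P_sm, a_upd, P_upd
```

In line with the text, the smoothed MSEs P_sm never exceed the filtered ones P_upd.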
The following figure represents the smoothed estimates of the underlying level together
with the one-step ahead estimates for the same series considered above.
[Figure: filtered (YFILTERED) and smoothed (YSMOOTHED) estimates of the underlying level.]
2.4 Prediction
Once we reach t = T, we can run the prediction equations to obtain forecasts of future values and their MSEs:

ỹ_{T+k/T} = E[y_{T+k} | Y_T] = Z T^k a_T

E[(y_{T+k} - ỹ_{T+k/T})(y_{T+k} - ỹ_{T+k/T})' | Y_T] = Z T^k P_T (T^k)' Z' + Z ( Σ_{j=0}^{k-1} T^j Q (T^j)' ) Z'

For example, in the random walk plus noise model,

m̃_{T+k/T} = m_T
P_{T+k/T} = P_T + k σ_η²
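For the random walk plus noise example, the k-step ahead forecasts and their MSEs can be computed directly; the values of m_T and P_T below are hypothetical:

```python
import numpy as np

m_T, P_T = 1.8, 0.9    # filtered level and its MSE at t = T (made-up values)
sig2_eta = 0.49
k = np.arange(1, 11)   # forecast horizons 1, ..., 10

m_fore = np.full(k.shape, m_T)   # m_{T+k/T} = m_T for every horizon
P_fore = P_T + k * sig2_eta      # P_{T+k/T} = P_T + k * sig2_eta
print(P_fore[:3])
```

The point forecast is flat at m_T while its MSE grows linearly with the horizon, reflecting the random walk behaviour of the level.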
2.5 Estimation of the parameters
Up to now, we have assumed that the parameters of the model are known. However, in practice, they are unknown and should be estimated from the available data. If the model is conditionally Gaussian, then the parameters can be estimated by Maximum Likelihood (ML). Remember that the Kalman filter provides the innovations (one-step ahead errors) and their variances.
The likelihood function can be written as follows:

L = Π_{t=1}^{T} p(y_t | Y_{t-1})
The conditional distribution of y_t can be easily derived by writing

y_t = Z_t a_{t/t-1} + d_t + Z_t (α_t - a_{t/t-1}) + ε_t

Then, if (α_t - a_{t/t-1}) and ε_t are conditionally normal,

y_t | Y_{t-1} ~ N(Z_t a_{t/t-1} + d_t, F_t)
and the log-likelihood function can be written down immediately as

log L = -(T/2) log(2π) - (1/2) Σ_{t=1}^{T} log|F_t| - (1/2) Σ_{t=1}^{T} ν_t² / F_t
This expression is known as the prediction error decomposition form of the likelihood. The parameters are estimated by maximizing the likelihood function numerically. The asymptotic properties of the ML estimator are the usual ones as long as the parameters lie in the interior of the parameter space. However, in many models of interest, the parameters are variances, and it is of interest to know whether they are zero (we have deterministic components). In this case, the asymptotic distribution could still be related to the Normal but is modified to take account of the boundary; see Harvey (1989).
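A sketch of this numerical maximization for the local level model, using the prediction error decomposition; the optimizer works with log-variances so that the variance estimates stay positive, which is a convenience of this sketch rather than something imposed by the text:

```python
import numpy as np
from scipy.optimize import minimize

def neg_loglik(theta, y):
    # theta holds log-variances, so the variances are always positive
    sig2_eps, sig2_eta = np.exp(theta)
    a, P = 0.0, 1e7                      # approximately diffuse prior
    ll = 0.0
    for yt in y:
        P_pred = P + sig2_eta            # prediction step
        F = P_pred + sig2_eps
        nu = yt - a
        ll += -0.5 * (np.log(2 * np.pi) + np.log(F) + nu ** 2 / F)
        a = a + (P_pred / F) * nu        # updating step
        P = P_pred - P_pred ** 2 / F
    return -ll

# simulate a local level series with sig2_eps = 1 and sig2_eta = 0.49
rng = np.random.default_rng(5)
y = np.cumsum(0.7 * rng.standard_normal(500)) + rng.standard_normal(500)

res = minimize(neg_loglik, x0=np.log([0.5, 0.5]), args=(y,), method="Nelder-Mead")
sig2_eps_hat, sig2_eta_hat = np.exp(res.x)
print(sig2_eps_hat, sig2_eta_hat)   # should be in the vicinity of 1 and 0.49
```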
If the model is not conditionally Gaussian, then by maximizing the Gaussian log-likelihood we obtain what is known as the Quasi-Maximum Likelihood (QML) estimator. In this case, the estimator loses its efficiency. Alternatives based on the true likelihood are more efficient but, when they can be defined, are more complicated from the computational point of view. Furthermore, dropping the Normality assumption tends to affect the asymptotic distribution of all the model parameters. In this case, the asymptotic distribution is given by

√T (θ̂ - θ) → N(0, J^{-1} I J^{-1})

where J = -E[∂² log L / ∂θ ∂θ'] and I = E[(∂ log L / ∂θ)(∂ log L / ∂θ')], and the expectations are taken with respect to the true distribution; see Gourieroux (1997).
Once the parameters are estimated, the Kalman filter is run again with the parameters fixed at the estimated values to yield one-step ahead and updated estimates of the unknown states, a_{t/t-1} and a_t respectively, and the smoother to yield estimates based on the whole sample, a_{t/T}. As a by-product, we also obtain several residuals:
a) Standardized residuals: ν̃_t = ν_t / √F_t. Apply standard tests for Normality, heteroscedasticity and serial correlation.
b) Auxiliary residuals:

ε̂_{t/T} = y_t - Z_t a_{t/T} - d_t
η̂_{t/T} = a_{t/T} - T_t a_{t-1/T} - c_t
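The standardized residuals can be computed directly from the filter output; the sketch below simulates a Gaussian local level model with true parameters and checks informally that ν_t / √F_t behaves like standard normal white noise:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 2000
sig2_eps, sig2_eta = 1.0, 0.49
y = np.cumsum(np.sqrt(sig2_eta) * rng.standard_normal(n)) + rng.standard_normal(n)

a, P = 0.0, 1e7                      # approximately diffuse prior
std_resid = np.zeros(n)
for t in range(n):
    P_pred = P + sig2_eta
    F = P_pred + sig2_eps
    nu = y[t] - a
    std_resid[t] = nu / np.sqrt(F)   # standardized residual
    a = a + (P_pred / F) * nu
    P = P_pred - P_pred ** 2 / F

# discard the first residual, which is dominated by the diffuse prior
print(std_resid[1:].mean(), std_resid[1:].std())   # roughly 0 and 1
```

In practice, these residuals would be fed to Normality, heteroscedasticity and serial correlation tests, as indicated in point a) above.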
References

Broto, C. and E. Ruiz (2005), Unobserved component models with asymmetric conditional variances, Computational Statistics and Data Analysis, forthcoming.

Harvey, A.C. (1989), Forecasting, Structural Time Series Models and the Kalman Filter, Cambridge University Press, Cambridge. Chapter 4.

Harvey, A.C. (1993), Time Series Models, 2nd ed., Harvester Wheatsheaf, London. Chapters 4 and 5.

Harvey, A.C. and S.J. Koopman (1992), Diagnostic checking of unobserved-components time series models, Journal of Business & Economic Statistics, 10, 377-389.

Koopman, S.J. (1993), Disturbance smoother for state space models, Biometrika, 80, 117-126.

Koopman, S.J., A.C. Harvey, J.A. Doornik and N. Shephard (2000), STAMP: Structural Time Series Analyser, Modeller and Predictor, Timberlake Consultants Press.

Wells, The Kalman Filter in Finance, Kluwer Academic Publishing.
Exercises

1. (a) Consider an AR(1) model in which the first observation is fixed. Write down the likelihood function when the observation at time τ is missing. (b) Given a value of the AR parameter, φ, show that the estimator of the missing observation obtained by smoothing is

ŷ_{τ/T} = φ (y_{τ-1} + y_{τ+1}) / (1 + φ²).
2. Consider a random walk plus noise model. If σ_η² = 0.

3. Using the Kalman filter, obtain estimates of the underlying level of the IBEX35. Derive the reduced form model and check whether it is consistent with the model fitted by analysing the correlogram of Δy_t.
4. Obtain the reduced form of the local linear trend model given by

y_t = μ_t + ε_t
μ_t = μ_{t-1} + β_{t-1} + η_t
β_t = β_{t-1} + ζ_t

where ε_t, η_t and ζ_t are mutually uncorrelated white noise processes with variances σ_ε², σ_η² and σ_ζ², respectively.