In this paper we present a neural network extended Kalman filter for modeling noisy financial time
series. The neural network is employed to estimate the nonlinear dynamics of the extended Kalman
filter. Conditions for the neural network weight matrix are provided to guarantee the stability of the
filter. The extended Kalman filter presented is designed to filter three types of noise commonly observed
in financial data: process noise, measurement noise, and arrival noise. The erratic arrival of data (arrival
noise) results in the neural network predictions being iterated into the future. Constraining the neural
network to have a fixed point at the origin produces better iterated predictions and more stable results.
The performance of constrained and unconstrained neural networks within the extended Kalman filter is
demonstrated on “Quote” tick data from the $/DM exchange rate (1993–1995).
P. J. Bolland & J. T. Connor
when a deterministic system is unknown and must be estimated from the data, see for example Engle and Watson (1987). In Sec. 2, we give a brief description of the workings of the Kalman filter on linear models.

Neural networks have been successfully applied to the prediction of time series by many practitioners in a diverse range of problem domains (Weigend 1990). In many cases neural networks have been shown to yield significant performance improvements over conventional statistical methodologies. Neural networks have many attractive features compared to other statistical techniques: they make no parametric assumption about the functional relationship being modeled, they are capable of modeling interaction effects, and the smoothness of fit is locally conditioned by the data. However, neural networks are generally designed under careful laboratory conditions which, while taking into consideration process noise, usually ignore the presence of measurement and arrival noise. In Sec. 3, we show how the extended Kalman filter can be used with neural network models to produce reliable predictions in the presence of process, measurement and arrival noise.

When observations are missing, the neural network's predictions are iterated. Because the iterated feedforward neural network predictions are based on previous predictions, the network acts as a discrete time recurrent neural network. The dynamics of the recurrent network may converge to a stable point; however, it is equally possible that the recurrent network could oscillate or become chaotic. Section 4 describes a neural network model which is constrained to have a stable point at the origin. Conditions are determined for which the neural network will always converge to a single fixed point. This is the neural network analogue of a stable linear system, which will always converge to the fixed point of zero. The constrained neural network is useful for two reasons:

(1) The extended Kalman filter is guaranteed to have bounded errors if the neural network model is observable and controllable (discussed in Sec. 3). The existence of a stable fixed point will help make this the case.
(2) It reflects our belief that price increments beyond a certain horizon are unpredictable for the Dollar–DM foreign exchange rates.

For different problems, a neural network with a fixed point at zero may not make sense, in which case we do not advocate the constrained neural network. However, for modeling foreign exchange data, this constrained neural network should yield better results.

In Sec. 5, the performance of the extended Kalman filter with a constrained neural network is shown on Dollar–Deutsche Mark foreign exchange tick data. The performance of the constrained neural network is shown to be both quantitatively and qualitatively superior to the unconstrained neural network.

2. Linear Kalman Filter

Kalman filters originated in the engineering community with Kalman and Bucy (1961) and have been extensively applied to filtering, smoothing and control problems. The modeling of time series in state space form has advantages over other techniques both in interpretability and estimation. The Kalman filter lies at the heart of state space analysis and provides the basis for likelihood estimation. The general state space form for a multivariate time series is expressed as follows. The observable variables, an n × 1 vector yt, are related to an m × 1 vector xt, known as the state vector, via the observation equation,

    yt = Ht xt + εt ,   t = 1, . . . , N   (1)

where Ht is the n × m observation matrix and εt, which denotes the measurement noise, is an n × 1 vector of serially uncorrelated disturbance terms with mean zero and covariance matrix Rt. The states are unobservable and are known to be generated by a first-order Markov process,

    xt = Φt xt−1 + Γt ηt ,   t = 1, . . . , N   (2)

where Φt is the m × m state matrix, Γt is an m × g matrix, and ηt a g × 1 vector of serially uncorrelated disturbance terms with mean zero and covariance matrix Qt. A linear AR(p) time series model can be represented in the general state form by

         ⎡ φ1  φ2  φ3  · · ·  φp−1  φp ⎤
         ⎢ 1   0   0   · · ·  0     0  ⎥
    xt = ⎢ 0   1   0   · · ·  0     0  ⎥ xt−1 + Γt ηt   (3)
         ⎢ 0   0   1   · · ·  0     0  ⎥
         ⎢ ..                 ..       ⎥
         ⎣ 0   0   0   · · ·  1     0  ⎦
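As a concrete illustration of the recursions implied by (1)–(3), the companion-form state matrix and one predict/update cycle can be sketched in a few lines of NumPy. The function names are ours, and the sketch assumes the system matrices are constant within a step:

```python
import numpy as np

def ar_companion(phi):
    """Companion (phase canonical) state matrix of Eq. (3) for an AR(p) model."""
    p = len(phi)
    Phi = np.zeros((p, p))
    Phi[0, :] = phi               # first row carries the AR coefficients
    Phi[1:, :-1] = np.eye(p - 1)  # sub-diagonal shifts the state down
    return Phi

def kalman_step(x, P, y, Phi, Gamma, H, Q, R):
    """One predict/update cycle for the linear state space model (1)-(2)."""
    # Prediction: x_{t|t-1} = Phi x_{t-1},  P_{t|t-1} = Phi P Phi' + Gamma Q Gamma'
    x_pred = Phi @ x
    P_pred = Phi @ P @ Phi.T + Gamma @ Q @ Gamma.T
    # Innovation, its covariance, and the Kalman gain
    v = y - H @ x_pred
    N = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(N)
    return x_pred + K @ v, P_pred - K @ H @ P_pred
```

With `H = [1 0 ... 0]` the first state component is observed, matching the AR(p) representation of Eq. (3).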
A Constrained Neural Network Kalman Filter
only measured to a precision of one second. A time discretization of one second is a natural choice for modeling financial time series, as aggregation will be across synchronous data and missing observations will be in the minority.

Augmenting the Kalman filter to deal with erratic time series is achieved by allowing the dimensions of the observation vector yt and the observation errors εt to vary at each time step (nt × 1). The observation equation dimensions now vary with time so,

    yt = Wt Ht xt + εt ,   t = 1, . . . , N   (11)

where Wt is an nt × n matrix of fixed weights. This gives rise to several possible situations:

• Contemporaneous aggregation of the first component of the observation vector, in addition to the other components. The weight matrix is ((n + 1) × n) and has an augmented row for the first component, giving rise to the two values y1t,1 and y1t,2:

                                      ⎡ 1  0  0  · · ·  0 ⎤
                                      ⎢ 1  0  0  · · ·  0 ⎥
    yt = [ y1t,1  y1t,2  · · ·  ynt ]′ ,   Wt = ⎢ 0  1  0  · · ·  0 ⎥   (12)
                                      ⎢ 0  0  1  · · ·  0 ⎥
                                      ⎢ ..                ⎥
                                      ⎣ 0  0  0  · · ·  1 ⎦ (n+1)×n

• All components of the observation vector are observed, so nt = n and the weight matrix is the identity:

    yt = [ y1t  · · ·  ynt ]′ ,   Wt = In×n   (13)

• The ith component of the observation vector is unobserved, so nt is (n − 1) and the weight matrix ((n − 1) × n) has a single row removed:

                                             ⎡ Ii−1   0   ⎤
    yt = [ y1t · · · yi−1t  yi+1t · · · ynt ]′ ,   Wt = ⎣  0   In−i ⎦ (n−1)×n   (14)

• All components of the observation vector are unobserved, nt is zero, and the weight matrix is undefined:

    yt = [NULL] ,   Wt = [NULL]   (15)

The dimensions of the innovations ȳt|t−1 and their covariance Nt|t−1 also vary with nt. Equation (10) is undefined when nt is zero. When there are no observations at a given time t, the Kalman filter updating equations can simply be skipped. The resulting predictions for the states and filtering error covariances are x̂t = x̂t|t−1 and Pt = Pt|t−1. For consecutive missing observations (nt = 0, . . . , nt+l = 0) a multiple step prediction is required; with repeated substitution into the transition Eq. (2), multiple step predictions are obtained by

    xt+l = ( ∏_{j=1}^{l} Φt+j ) xt + Σ_{j=1}^{l} ( ∏_{i=j+1}^{l} Φt+i ) Γt+j ηt+j

and similarly the expectation of the multiple step ahead prediction error covariance for the case of a time invariant system is given by

    Pt+l|t = Φ^l Pt (Φ′)^l + Σ_{j=0}^{l−1} Φ^j ΓQΓ′ (Φ′)^j .   (18)
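The varying-dimension machinery is simple in practice: the weight matrices of Eqs. (12)–(15) select or duplicate rows of the identity, and the l-step covariance of Eq. (18) is a short loop. A minimal sketch, with illustrative function names of our own:

```python
import numpy as np

def selection_matrix(n, missing):
    """W_t as in Eq. (14): the n x n identity with the rows of unobserved
    components removed, giving an (n - len(missing)) x n matrix."""
    keep = [i for i in range(n) if i not in set(missing)]
    return np.eye(n)[keep, :]

def multistep_cov(Phi, Gamma, Q, P, l):
    """Eq. (18): P_{t+l|t} = Phi^l P_t Phi'^l + sum_{j=0}^{l-1} Phi^j Gamma Q Gamma' Phi'^j."""
    Phi_l = np.linalg.matrix_power(Phi, l)
    total = Phi_l @ P @ Phi_l.T
    for j in range(l):
        Phi_j = np.linalg.matrix_power(Phi, j)
        total += Phi_j @ Gamma @ Q @ Gamma.T @ Phi_j.T
    return total
```

When every component is missing (`missing` covering all rows), the selection matrix is empty and the update is simply skipped, exactly as described for Eq. (15).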
The estimates for multiple step predictions ŷt+l|t and x̂t+l|t can be shown to be the minimum mean square error estimators Et(yt+l).

2.2. Non-Gaussian Kalman filter

The Gaussian estimation methods outlined above can produce very poor results in the presence of gross outliers. The distributions of the process and measurement noise for the standard Kalman filter are assumed to be Gaussian, but the filter can be adapted to systems where either the process or measurement noise is non-Gaussian (Mazreliez 1973). Mazreliez showed that the Kalman filter update equations depend on the distribution of the innovations and its score function. For a known distribution the optimal minimum mean variance estimator can be produced. In the situation where the distribution is unknown, piecewise linear methods can be employed to approximate it (Kitagawa 1987). Since the filtering equations depend on the derivative of the distribution, such methods can be inaccurate. Other methods take advantage of higher moments of the density functions and require no assumptions about the shape of the probability densities (Hilands and Thomopoulos 1994). Carlin et al. (1992) demonstrate a Kalman filter robust to non-Gaussian state noise. For robustness, heavy tailed distributions can be considered as a mixture of Gaussians, or an ε-contaminated distribution. If the bulk of the data behaves in a Gaussian manner then we can employ a density function which is not dominated by the tails. The Huber function (1980) can be shown to be the least informative distribution in the case of ε-contaminated data.

In this paper we achieve robustness to measurement outliers by the methodology first described in Bolland and Connor (1996a). A Huber function is employed as the score function of the residuals. For spherically ε-contaminated normal distributions in R3 the score function g0 for the Huber function is given by,

    g0(r) = r   for r < c
          = c   for r ≥ c .   (19)

The Huber function behaves as a Gaussian for the center of the distribution and as an exponential in the tails. The heavy tails allow the robust Kalman filter to down weight large residuals and provide a degree of insurance against the corrupting influence of outliers. The degree of insurance is determined by the distribution parameter c in (19). The parameter can be related to the level of assumed contamination as shown by Huber (1980).

2.3. Sufficient conditions

For a Kalman filter to estimate the time varying states of the system given by (1) and (2), the system must be both observable and controllable.

An observable system allows the states to be uniquely estimated from the data. For example, in the no noise condition (εt = 0) the state xt can be determined uniquely from future observations if the observability matrix given by O = [H′ Φ′H′ Φ′2H′ Φ′3H′ · · · ] has Rank(O) = m, where m is the number of elements in the state vector xt. The condition for observability is also equivalent to (see Aoki, 1990)

    GO = Σ_{k=0}^{∞} (Φ′)^k H′H Φ^k > 0   (20)

where the observability Grammian, GO, satisfies the following Lyapunov equation, Φ′GOΦ − GO = −H′H.

The notion of controllability emerged from the control theory literature, where vt denotes an action that can be made by an operator of a plant. If the system is controllable, any state can be reached with the correct sequence of actions. A system Φ is controllable if the reachability matrix defined by C = [Γ ΦΓ Φ2Γ Φ3Γ · · · ] has Rank(C) = m, which is equivalent to (see Aoki, 1990)

    GC = Σ_{k=0}^{∞} Φ^k ΓΓ′ (Φ′)^k > 0   (21)

where the controllability Grammian, GC, satisfies the following Lyapunov equation, ΦGCΦ′ − GC = −ΓΓ′.
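The Huber score of Eq. (19) and the observability Grammian of Eq. (20) are both easy to evaluate numerically. In this sketch we apply the score symmetrically in r and truncate the infinite sum once its terms vanish; both choices, and the function names, are our assumptions:

```python
import numpy as np

def huber_score(r, c):
    """Score function g0 of Eq. (19): linear in the centre, clipped at +/- c in the tails."""
    return np.clip(r, -c, c)

def observability_grammian(Phi, H, tol=1e-12, max_terms=1000):
    """G_O of Eq. (20); the sum converges when the eigenvalues of Phi
    lie strictly inside the unit circle."""
    term = H.T @ H                 # k = 0 term of the series
    G = np.zeros_like(term)
    for _ in range(max_terms):
        G = G + term
        term = Phi.T @ term @ Phi  # advance (Phi')^k H'H Phi^k to k + 1
        if np.abs(term).max() < tol:
            break
    return G
```

Positive definiteness of the returned matrix (all eigenvalues positive) then corresponds to the condition GO > 0 in (20).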
met and the Kalman filter will estimate a unique sequence of underlying states. In the next two sections a neural network analogue of the stable linear system is introduced which will be used within the extended Kalman filter. If the eigenvalues are greater than one, the Kalman filter may still estimate a unique sequence of underlying states, but this must be evaluated on a system by system basis; see for example the work by Harvey on the random trend model.

3. Extended Kalman Filter

The Kalman filter can be extended to filter nonlinear state space models. These models are not generally conditionally Gaussian and so an approximate filter is used. The state space model's observation function ht(xt) and the state update function ft(xt−1) are no longer linear functions of the state vector. Using a Taylor expansion of these nonlinear functions around the predicted state vector and the filtered state vector we have,

    ht(xt) ≈ ht(x̂t|t−1) + Ĥt (xt − x̂t|t−1) ,   Ĥt = ∂ht(xt)/∂x′t |xt=x̂t|t−1   (22)

    ft(xt−1) ≈ ft(x̂t−1) + Φ̂t (xt−1 − x̂t−1) ,   Φ̂t = ∂ft(xt−1)/∂x′t−1 |xt−1=x̂t−1   (23)

The extended Kalman filter (Jazwinski 1970) is produced by modeling the linearized state space model using a modified Kalman filter. The linearized observation equation and state update equation are approximated by,

    yt ≈ Ĥt xt + d̂t + εt ,   d̂t = ht(x̂t|t−1) − Ĥt x̂t|t−1   (24)

    xt ≈ Φ̂t xt−1 + ĉt + ηt ,   ĉt = ft(x̂t−1) − Φ̂t x̂t−1   (25)

and the corresponding Kalman prediction equation and the state update equation (in prediction error correction format) are,

    x̂t|t−1 = ft(x̂t−1)

The quality of the approximation depends on the smoothness of the nonlinearity, since the extended Kalman filter is only a first order approximation of E{xt|Yt−1}. The extended Kalman filter can be augmented as described in Eqs. (11)–(15) to deal with missing data and contemporaneous aggregation. The functional forms of ht(xt) and ft(xt−1) are estimated using a neural network described in Sec. 5.

3.1. Sufficient conditions

The nonlinear analog of the AR(p) model expressed in phase canonical form given by (3) and (4) is expressed as

    xTt = [xt xt−1 xt−2 · · · xt−p+1]   (27)

    f(xt−1)T = [f(xt−1) xt−1 xt−2 · · · xt−p+1]   (28)

    yt = H f(xt−1)T + εt   (29)

with H = [1 0 0 · · · 0] and Γ = [1 0 0 · · · 0] as in the linear time series model of (3) and (4).

The extended Kalman filter has a long history of working in practice, but only recently has theory produced bounds on the resulting error dynamics. Baras et al. (1988) and Song and Grizzle (1995) have shown bounds on the EKF error dynamics of deterministic systems in continuous and discrete time respectively. La Scala, Bitmead, and James (1995) have shown how the error dynamics of an EKF on a general nonlinear stochastic discrete time system are bounded. The nonlinear system considered by La Scala et al. has a linear output map, which is also true for our system described in (27)–(29). As in the case of the linear Kalman filter, the results of La Scala et al. require that the nonlinear system be both observable and controllable.

The nonlinear observability Grammian is more complicated than its linear counterpart in (20) and must be evaluated upon the trajectory of interest (t1 → t2). The nonlinear observability Grammian as defined by La Scala et al. and applied to the system given by (27)–(29) is
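The linearisation of Eqs. (22)–(25) amounts to evaluating Jacobians at the current estimates and then running the usual prediction and correction cycle. A sketch using finite-difference Jacobians (our simplification, purely for illustration; the function names are not from the paper):

```python
import numpy as np

def jacobian(f, x, eps=1e-6):
    """Forward-difference approximation of the Jacobians in Eqs. (22)-(23)."""
    fx = np.atleast_1d(f(x))
    J = np.zeros((fx.size, x.size))
    for j in range(x.size):
        dx = np.zeros_like(x)
        dx[j] = eps
        J[:, j] = (np.atleast_1d(f(x + dx)) - fx) / eps
    return J

def ekf_step(x, P, y, f, h, GQG, R):
    """One extended Kalman cycle: linearise f and h, then predict and correct.
    GQG is the process noise term Gamma Q Gamma'."""
    Phi_hat = jacobian(f, x)                         # Eq. (23)
    x_pred = np.atleast_1d(f(x))                     # x_{t|t-1} = f(x_{t-1})
    P_pred = Phi_hat @ P @ Phi_hat.T + GQG
    H_hat = jacobian(h, x_pred)                      # Eq. (22)
    v = np.atleast_1d(y) - np.atleast_1d(h(x_pred))  # innovation
    N = H_hat @ P_pred @ H_hat.T + R
    K = P_pred @ H_hat.T @ np.linalg.inv(N)
    return x_pred + K @ v, P_pred - K @ H_hat @ P_pred
```

For linear f and h the step reduces to the ordinary Kalman cycle of Sec. 2, which is a useful sanity check.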
system is stable. From the view of the financial example presented later, a limiting point of zero will correspond to future price increments being unknown beyond a certain distance into the future.

There are two ways of putting symmetry constraints in neural networks: a hard symmetry constraint obtained by direct embedding of the symmetry in the weight space, and a soft constraint of pushing a neural network towards a symmetric solution but not enforcing the final solution to be symmetric.

The soft constraint of biasing towards symmetric solutions can be viewed from two perspectives, providing hints and regularization. Abu-Mostafa (1990), (1993), and (1995) showed among other things that it is possible to augment a data set with examples obtained by generating new data under the symmetry constraint. The neural network is then trained on the augmented training set and is likely to have a solution that is closer to the desired symmetric solution than would otherwise be the case. The soft constraint can come in the form of extra data generated from a constraint or as a constraint within the learning algorithm itself. Alternatively, neural networks which drift from the symmetric solution can be penalized by a regularization term, see

Other possible dangers exist with recurrent neural networks. The existence of a fixed point does not preclude chaos; the fixed point may be unstable or only stable locally. Often recurrent neural networks can perform oscillations or more complicated types of stable or chaotic patterns, see for example Marcus and Westervelt (1989), Pearlmutter (1989), Pineda (1989), Cohen (1992), Blum and Wang (1992) and many others. Under some circumstances this complicated behavior in an iterated predictor could be desirable; however, it is the view of this paper that any advantages are outweighed by the dangers of using a Kalman filter on a poorly understood nonstable system.

A fixed point at ŷt = 0 is achieved by augmenting a neural network with a constrained output bias parameter. For a network consisting of H hidden units and with activation function f, the functional form of the constrained network is given by,

    ŷt = Σ_{i=1}^{H} Wi f( Σ_{j=1}^{p} wij yt−j + θi )

    dŷt/dθi = Wi f( Σ_{j=1}^{p} wij yt−j + θi ) × ( 1 − f( Σ_{j=1}^{p} wij yt−j + θi ) )
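One way to realise the constrained output bias described above is to subtract the network's response to zero input, so that ŷt = 0 whenever all lagged inputs are zero. This particular parameterisation is our illustration of the idea, not necessarily the paper's exact construction:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def constrained_net(y_lags, W, w, theta):
    """Feedforward NAR network with H hidden units whose output is forced to
    have a fixed point at the origin: the output bias -sum_i W_i f(theta_i)
    cancels the response to zero input (an illustrative parameterisation).
    W: (H,) output weights, w: (H, p) input weights, theta: (H,) hidden biases."""
    hidden = sigmoid(w @ y_lags + theta)  # f(sum_j w_ij y_{t-j} + theta_i)
    bias = -W @ sigmoid(theta)            # constrained output bias
    return W @ hidden + bias
```

Iterating such a network from any starting lag vector that decays towards zero keeps the predictions anchored at the origin, which is exactly the behaviour wanted for long-horizon price increments.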
    Cip = Σ_{j≠i} (1/pj) |ωji|   (44)

and (p1, . . . , pn) are all positive.

4.2.1. The general case

Stability guarantees for neural networks using (41) are typically quoted for two cases, γ = 0 and γ = 1. The fully recurrent network given by (41) is more general than the iterated neural network given in

4.2.2. All pi equal

The choice of p = (p0, p0, . . . , p0) will not satisfy (45)–(47) or (48)–(50) because of the recurrent connections which are equal to 1. However, if pi = p0 + (i − H − 1)ε is used instead for i = H + 2, . . . , H + p, where ε is vanishingly small, the stability guarantees of (48)–(50) reduce to

    Σ_{j=1}^{p} |wij| < ci^−1 ,   i = 1, . . . , H ,   (51)
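Condition (51) is straightforward to verify for a trained network. In this sketch the bounds ci are taken as given inputs (they come from the paper's conditions (48)–(50); the helper name is ours):

```python
import numpy as np

def satisfies_51(w, c):
    """Check Eq. (51): sum_j |w_ij| < 1 / c_i for every hidden unit i.
    w is the H x p input weight matrix, c a length-H vector of bounds."""
    row_sums = np.abs(np.asarray(w)).sum(axis=1)
    return bool(np.all(row_sums < 1.0 / np.asarray(c)))
```

A network that fails the check can be rescaled or retrained with a weight penalty until the row sums fall below the required bounds.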
where g is the predicted change in state based on the changes in previous time periods. The function g was estimated by both a constrained feedforward neural network and an unconstrained neural network. For the results shown here a simple NAR(1) model was estimated. A parsimonious neural network specification was used, with a 4 hidden unit feedforward network using sigmoidal activation functions.

An estimation maximization (EM) algorithm is employed at the center of a robust estimation procedure based on filtered data (for full details see Bolland and Connor 1996a). The EM algorithm, see Dempster, Laird, and Rubin (1977), is the standard approach when estimating model parameters with missing data. The EM algorithm has been used in the neural network community before, see for example Jordan and Jacobs (1992) or Connor, Martin, and Atlas (1994). During the estimation step, the missing data, namely the xt, εt, and ηt of (1) and (2), must be estimated. This amounts to estimating parameters of the state update function f and the noise variance matrices Qt and Rt. With the estimated missing data assumed to be true, the parameters are then chosen by way of maximizing the likelihood. This procedure is iterative, with new parameter estimates giving rise to new estimates of missing data, which in turn give rise to newer parameter estimates. The iterative estimation procedure was initialized by constructing a contiguous data set (no arrival noise) and estimating a linear auto-regressive model. The variances of the disturbance terms are non-stationary; to remove some of this non-stationarity the intra day seasonal pattern of the variances was estimated (Bolland and Connor 1996b). The parameters of the state update function were assumed to be stationary across the length of the data set.
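The iterative procedure just described has the familiar EM shape. A skeleton, with `e_step` and `m_step` as placeholders for the filtering/smoothing pass and the likelihood maximisation (neither is a function from the paper):

```python
def em_estimate(data, init_params, e_step, m_step, n_iter=20):
    """Alternate estimating the missing data (states and noises) given the
    current parameters with re-estimating the parameters given the filled-in
    data, as in Dempster, Laird and Rubin (1977)."""
    params = init_params
    for _ in range(n_iter):
        filled = e_step(data, params)  # E-step: fill in the missing data
        params = m_step(data, filled)  # M-step: maximise the likelihood
    return params
```

In the paper's setting the E-step is the (extended) Kalman filter pass over the tick data and the M-step re-fits the network weights and the noise variances.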
Table 1 gives the performance of the two models for the non-iterated forecast. The constraints on the network are not detrimental to the overall performance, with the percentage variance explained (r-squared) and the correlation being very similar.

Figure 2 shows the fitted function of a simple NAR(1) model for the constrained neural network and the unconstrained neural network. The qualitative difference in the models' estimated functions is only slight.

Figure 3 shows the estimated function around the origin. At the origin the constrained network has a bias, as it has been restrained from learning the mean of the estimation set. Although this bias is only very
small (for linear regression the bias is 5.12 × 10−7 with a t-statistic of 0.872), its effect is large, as it is compounded by iterating the forecast.

The filter produces estimated states (shown in Fig. 4) which can be viewed as the "true mid-prices"; the noise due to market frictions (bid-ask bounce, price quantization, etc.) has been estimated and filtered. The iterated forecasts reach the stable point after only a small number of iterations (approx. 5).

Figure 5 shows a close up of the iterated forecasts of the two networks. The value of the stable point in the case of the simple NAR(1) is the final gradient of the iterated forecasts. The stable point of the constrained neural network is zero, and the stable point of the unconstrained is 1.57 × 10−6. This is the result of a small bias in the model. When the forecast is iterated this bias is accumulated and therefore the unconstrained network predictions trend. For the constrained network the iterated forecasts soon reach the stable point zero, reflecting our prior belief in the long term predictability of the series. The mean squared error (MSE) as well as the median absolute deviations (MAD) of the constrained and unconstrained networks are given in Table 2, and shown in Fig. 6. As the forecast is iterated the MSE for the unconstrained network grows rapidly. This is due to its trending forecast.

It is clear that the performance is improved by constraining the neural network. The MSE for the constrained neural network remains relatively constant with prediction. The accuracy of iterated prediction should decrease as the forecast is iterated, yet from Fig. 6 it is clear that the MSE is not increasing with the number of iterations. However, only in periods of very low trading activity are forecasts iterated for 40 time steps, and in such periods the variance of the time series is also low. So the errors in these periods are only small even though the time between observations can be large.

6. Conclusion and Discussion

Using neural networks within an extended Kalman filter is desirable because of the measurement and arrival noise associated with foreign exchange tick data. The desirability of using a stable system within a Kalman filter was used as an analogy for developing a "stable neural network" for use within an extended Kalman filter. The "stable neural network" was obtained by constraining the neural network so that zero input gives zero output, a fixed point at zero. In addition, the fixed point at zero reflected our belief that price increments beyond a certain horizon are unknowable and a predicted price increment of zero is best (random walk). This constrained neural network is optimized for foreign exchange modeling; for other problems a constrained neural network with a fixed point at zero would be undesirable.

The behavior of the neural network within the extended Kalman filter under normal operating conditions is roughly the same as that of the unconstrained neural network. But in the presence of missing data, the iterated predictions of the constrained neural network far outperformed the unconstrained neural network in both quality and performance metrics.

References

Y. S. Abu-Mostafa 1990, "Learning from hints in neural networks," J. Complexity 6, 192–198.
Y. S. Abu-Mostafa 1993, "A method for learning from hints," Advances in Neural Information Processing 5, ed. S. J. Hanson (Morgan Kaufmann), pp. 73–80.
Y. S. Abu-Mostafa 1995, "Financial applications of learning from hints," Advances in Neural Information Processing 7, eds. J. Tesauro, D. S. Touretzky and T. Leen (Morgan Kaufman), pp. 411–418.
H. Akaike 1975, "Markovian representation of stochastic processes by stochastic variables," SIAM J. Control 13, 162–173.
M. Aoki 1987, State Space Modeling of Time Series, Second Edition (Springer–Verlag).
J. S. Baras, A. Bensoussan and M. R. James 1988, "Dynamic observers as asymptotic limits of recursive filters: Special cases," SIAM J. Appl. Math. 48, 1147–1158.
E. K. Blum and X. Wang 1992, "Stability of fixed points and periodic orbits and bifurcations in analog neural networks," Neural Networks 5, 577–587.
P. J. Bolland and J. T. Connor 1996a, "A robust non-linear multivariate Kalman filter for arbitrage identification in high frequency data," in Neural Networks in Financial Engineering (Proc. NNCM-95), eds. A.-P. N. Refenes, Y. Abu-Mostafa, J. Moody and A. S. Weigend (World Scientific), pp. 122–135.
P. J. Bolland and J. T. Connor 1996b, "Estimation of intra day seasonal variances," Technical Report, London Business School.
S. J. Butlin and J. T. Connor 1996, "Forecasting foreign exchange rates: Bayesian model comparison using Gaussian and Laplacian noise models," in Neural Networks in Financial Engineering (Proc. NNCM-95), eds. A.-P. N. Refenes, Y. Abu-Mostafa, J. Moody and A. Weigend (World Scientific, Singapore), pp. 146–156.
B. P. Carlin, N. G. Polson and D. S. Stoffer 1992, "A Monte Carlo approach to nonnormal and nonlinear state-space modeling," J. Am. Stat. Assoc. 87, 493–500.
P. Y. Chung 1991, "A transactions data test of stock index futures market efficiency and index arbitrage profitability," J. Finance 46, 1791–1809.
M. A. Cohen 1992, "The construction of arbitrary stable dynamics in nonlinear neural networks," Neural Networks 5, 83–103.
J. T. Connor, R. D. Martin and L. E. Atlas 1994, "Recurrent neural networks and robust time series prediction," IEEE Trans. Neural Networks 5, 240–254.
G. Cybenko 1989, "Approximation by superpositions of a sigmoidal function," Math. Control Signals Syst. 2, 303–314.
M. M. Dacorogna 1995, "Price behavior and models for high frequency data in finance," Tutorial, NNCM conference, London, England, Oct. 11–13.
P. de Jong 1989, "The likelihood for a state space model," Biometrika 75, 165–169.
A. P. Dempster, N. M. Laird and D. B. Rubin 1977, "Maximum likelihood from incomplete data via the EM algorithm," J. Royal Stat. Soc. B39, 1–38.
R. F. Engle and M. W. Watson 1987, "The Kalman filter: Applications to forecasting and rational expectation models," in Advances in Econometrics, Fifth World Congress, Volume I, ed. T. F. Bewley (Cambridge University Press).
E. Ghysels and J. Jasiak 1995, "Stochastic volatility and time deformation: An application of trading volume and leverage effects," Proc. HFDF-I Conf., Zurich, Switzerland, March 29–31, Vol. 1, pp. 1–14.
C. L. Giles, R. D. Griffen and T. Maxwell 1990, "Encoding geometric invariances in higher order neural networks," in Neural Information Processing Systems, ed. D. Z. Anderson (American Institute of Physics), pp. 301–309.
A. V. M. Herz, Z. Li and J. Leo van Hemmen, "Statistical mechanics of temporal association in neural networks with delayed interactions," NIPS, 176–182.
T. W. Hilands and S. C. Thomopoulos 1994, "High-order filters for estimation in non-Gaussian noise," Inf. Sci. 80, 149–179.
J. J. Hopfield 1984, "Neurons with graded response have collective computational properties like those of two-state neurons," Proc. Natl. Acad. Sci. 81, 3088–3092.
P. J. Huber 1980, Robust Statistics (Wiley, New York).
A. H. Jazwinski 1970, Stochastic Processes and Filtering Theory (Academic Press, New York).
L. Jin, P. N. Nikiforuk and M. Gupta 1994, "Absolute stability conditions for discrete-time recurrent neural networks," IEEE Trans. Neural Networks 5(6), 954–964.
M. I. Jordan and R. A. Jacobs 1992, "Hierarchical mixtures of experts and the EM algorithm," Neural Comput. 4, 448–472.
R. E. Kalman and R. S. Bucy 1961, "New results in linear filtering and prediction theory," Trans. ASME J. Basic Eng. Series D 83, 95–108.
D. G. Kelly 1990, "Stability in contractive nonlinear neural networks," IEEE Trans. Biomed. Eng. 37, 231–242.
G. Kitagawa 1987, "Non-Gaussian state-space modeling of non-stationary time series," J. Am. Stat. Assoc. 82, 1033–1063.
B. F. La Scala, R. R. Bitmead and M. R. James 1995, "Conditions for stability of the extended Kalman filter and their application to the frequency tracking problem," Math. Control Signals Syst. 8, 1–26.
Y. Le Cun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard and L. D. Jackel 1990, "Handwritten digit recognition with a back-propagation network," in Advances in Neural Information Processing Systems 2, ed. D. S. Touretzky (Morgan Kaufmann), pp. 396–404.
T. K. Leen 1995, "From data distributions to regularization in invariant learning," in Advances in Neural Information Processing 7, eds. J. Tesauro, D. S. Touretzky and T. Leen (Morgan Kaufman), pp. 223–230.
A. U. Levin and K. S. Narendra 1996, "Control of nonlinear dynamical systems using neural networks, Part II: Observability, identification, and control," IEEE Trans. Neural Networks 7(1).
A. U. Levin and K. S. Narendra 1993, "Control of nonlinear dynamical systems using neural networks: Controllability and stabilization," IEEE Trans. Neural Networks 4(2).
C. M. Marcus and R. M. Westervelt 1989, "Dynamics of analog neural networks with time delay," Advances in Neural Information Processing Systems 2, ed. D. S. Touretzky (Morgan Kaufmann).
C. J. Mazreliez 1973, "Approximate non-Gaussian filtering with linear state and observation relations," IEEE Trans. Automatic Control, February.
K. Matsuoka 1992, "Stability conditions for nonlinear continuous neural networks with asymmetric connection weights," Neural Networks 5, 495–500.
J. S. Meditch 1969, Stochastic Optimal Linear Estimation and Control (McGraw-Hill, New York).
U. A. Muller, M. M. Dacorogna, R. B. Olsen, O. V. Pictet, M. Schwarz and C. Morgenegg 1990, "Statistical study of foreign exchange rates, empirical evidence of a price change scaling law, and intraday analysis," J. Banking and Finance 14, 1189–1208.
B. A. Pearlmutter 1989, "Learning state-space trajectories in recurrent neural networks," Neural Computation 1, 263–269.
F. J. Pineda 1989, "Recurrent backpropagation and the dynamical approach to adaptive neural computation," Neural Computation 1, 161–172.
R. Roll 1984, "A simple implicit measure of the effective bid-ask spread in an efficient market," J. Finance 39, 1127–1140.
P. Simard, B. Victorri, Y. Le Cun and J. Denker 1992, "Tangent prop: a formalism for specifying selected invariances in an adaptive network," in Advances in