
SWITCHED KALMAN FILTER

EECS 598-3 Term Project


Fall 2003

Hirak Parikh

1. Introduction
While there are many application-specific approaches to computing
(estimating) an unknown state from a set of process measurements, many of these
methods do not inherently take into consideration the typically noisy nature of the
measurements. Consider, for example, the problem of tracking for interactive computer
graphics. The noise is typically statistical in nature (or can be effectively modeled as
such), which leads us to stochastic methods for addressing the problem.
This is also known as the Observer Design Problem ([4] Welch & Bishop).
Observer Design Problem: There is a related general problem in the area of linear
systems theory generally called the observer design problem. The basic problem is to
determine (estimate) the internal states of a linear system, given access only to the
system's outputs. Access to the system's control inputs may also be presumed. This is
akin to what people often think of as the "black box" problem, where you have access to
some signals coming from the box (the outputs) but you cannot directly observe what is
inside.

[Figure: a black box with inputs (I/P) entering on one side and outputs (O/P) leaving the other.]

Process State and Measured State:


Inside the black box there is some hidden process state that we wish to estimate.
The only variable that we have access to is the measured variable. But in the real world
both these values are corrupted by noise, process noise and measurement noise
respectively.

[Figure: the process variable X is corrupted by process noise Q; the measurement variable Z is corrupted by measurement noise W.]

The many approaches to this basic problem are typically based on a state-space model.
There is typically a process model that describes the transformation of the process state.
This can usually be represented as a linear stochastic difference equation:

x_{k+1} = A x_k + Q_k

In addition there is some form of measurement model that describes the relationship
between the process state and the measurements. This can usually be represented with a
linear expression:

z_k = H_k x_k + W_k

The terms Q and W are random variables representing the process and measurement
noise respectively.
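As a concrete illustration, the following MATLAB-style sketch simulates this model for a
scalar state. The particular values of A, H and the noise variances are illustrative
assumptions, not values taken from any experiment in this report.

    % Simulate the state-space model above for a scalar hidden state.
    % A, H and the noise variances below are illustrative choices.
    A = 0.95; H = 1.0;              % process and measurement matrices
    q = 0.01; w = 0.1;              % variances of the process noise Q and measurement noise W
    N = 100;
    x = zeros(1, N);                % hidden process variable
    for k = 1:N-1
        x(k+1) = A*x(k) + sqrt(q)*randn;     % x_{k+1} = A x_k + Q_k
    end
    z = H*x + sqrt(w)*randn(1, N);           % z_k = H x_k + W_k (what the sensor reports)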
Why model measurement and process noise?
We consider here the common case of noisy sensor measurements. There are
many sources of noise in such measurements. For example, each type of sensor has
fundamental limitations related to the associated physical medium, and when pushing the
envelope of these limitations the signals are typically degraded. In addition, some amount
of random electrical noise is added to the signal via the sensor and the electrical circuits.
The time varying ratio of pure signal to the electrical noise continuously affects the
quantity and quality of the information. The result is that information obtained from any
one sensor must be qualified as it is interpreted as part of an overall sequence of
estimates, and analytical measurement models typically incorporate some notion of
random measurement noise or uncertainty as shown above.
There is the additional problem that the actual state transitions are never perfectly
known. While we can make predictions over relatively short intervals using models
based on recent state transforms, such predictions assume that the transforms are
predictable, which is not always the case. The result is that like sensor information,
ongoing estimates of the state must be qualified as they are combined with measurements
in an overall sequence of estimates. In addition, process models typically incorporate
some notion of random motion or uncertainty as shown above.

2. The Kalman Filter


Within the significant toolbox of mathematical tools that can be used for stochastic
estimation from noisy sensor measurements, one of the most well-known and often-used
tools is what is known as the Kalman filter. The Kalman filter is named after Rudolph E.
Kalman, who in 1960 published his famous paper describing a recursive solution to the
discrete-data linear filtering problem (Kalman 1960). The Kalman filter is essentially a
set of mathematical equations that implement a predictor-corrector type estimator that is
optimal in the sense that it minimizes the estimated error covariance, when some
presumed conditions are met. Since the time of its introduction, the Kalman filter has
been the subject of extensive research and application, particularly in the area of
autonomous or assisted navigation. This is likely due in large part to advances in digital
computing that made the use of the filter practical, but also to the relative simplicity and
robust nature of the filter itself. Rarely do the conditions necessary for optimality actually
exist, and yet the filter apparently works well for many applications in spite of this
situation.

Advantages of the Kalman Filter:


The Kalman filter is optimal with respect to virtually any criterion that makes sense.
There are a few good reasons to choose a Kalman filter over other approaches ([5]
Maybeck):
1. One aspect of this optimality is that the Kalman filter incorporates all information
that can be provided to it. It processes all available measurements,
regardless of their precision, to estimate the current value of the variables of
interest, with use of (1) knowledge of the system and measurement device
dynamics, (2) the statistical description of the system noises, measurement
errors, and uncertainty in the dynamics models, and (3) any available information
about initial conditions of the variables of interest. Rather than ignore any of
these measurements, a Kalman filter can be built to combine all of this data and
knowledge of the various systems' dynamics to generate an overall best estimate of the
variables of interest.
2. The Kalman filter is recursive which means that, unlike certain data processing
concepts, the Kalman filter does not require all previous data to be kept in storage and
reprocessed every time a new measurement is taken. This will be of vital importance to
the practicality of filter implementation.
A Kalman filter combines all available measurement data, plus prior knowledge
about the system and measuring devices, to produce an estimate of the desired variables
in such a manner that the error is minimized statistically. In other words, if we were to
run a number of candidate filters many times for the same application, then the average
results of the Kalman filter would be better than the average results of any other.
Conceptually, what any type of filter tries to do is obtain an optimal estimate of desired
quantities from data provided by a noisy environment, optimal meaning that it
minimizes errors in some respect. There are many means of accomplishing this objective.
If we adopt a Bayesian viewpoint, then we want the filter to propagate the conditional
probability density of the desired quantities, conditioned on knowledge of the actual data
coming from the measuring devices. Once such a conditional probability density function
is propagated, the optimal estimate can be defined. Possible choices would include (1)
the mean, the center of probability mass of the estimate; (2) the mode, the value of x that
has the highest probability, locating the peak of the density; and (3) the median, the value
of x such that half of the probability weight lies to the left and half to the right of it.
A Kalman filter performs this conditional probability density propagation for
problems in which the system can be described through a linear model and in which
system and measurement noises are white and Gaussian (to be explained shortly). Under
these conditions, the mean, mode, median, and virtually any reasonable choice for an
optimal estimate all coincide, so there is in fact a unique best estimate of the value of
x. Under these three restrictions, the Kalman filter can be shown to be the best filter of any
conceivable form. Some of the restrictions can be relaxed, yielding a qualified optimal
filter. For instance, if the Gaussian assumption is removed, the Kalman filter can be

shown to be the best (minimum error variance) filter out of the class of linear unbiased
filters.

Implementation of the Kalman Filter

Prediction Equations
1) The first equation computes the a priori estimate of the state x from the previous
estimate, using the state transition matrix A.
2) The second equation computes the a priori estimate of the error covariance Pk.
Measurement Update Equations:
1) Kalman Gain: This is the most crucial part of the Kalman filter equations. It has a
large value at the beginning, since the filter trusts the a priori estimate less than the
measurement. As the filter converges, the Kalman gain becomes small.
2) Then the a posteriori estimate of x is obtained from the a priori estimate, the
measurement and the Kalman gain.
3) Finally the value of Pk is corrected. (The standard form of both groups of equations
is sketched below.)
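In the standard textbook form (see [4]), with A the state transition matrix, H the
measurement matrix, Q the process-noise covariance and W the measurement-noise
covariance, the two groups of equations read:

    Prediction (time update):
        xhat_k^-  =  A xhat_{k-1}                          (a priori state estimate)
        P_k^-     =  A P_{k-1} A^T + Q                     (a priori error covariance)

    Measurement update (correction):
        K_k       =  P_k^- H^T (H P_k^- H^T + W)^{-1}      (Kalman gain)
        xhat_k    =  xhat_k^- + K_k (z_k - H xhat_k^-)     (a posteriori state estimate)
        P_k       =  (I - K_k H) P_k^-                     (a posteriori error covariance)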
Training of Parameters:
The A, H, W and Q parameters need to be retrained as the filter runs. This is done using a
sliding window that refines the values of the model parameters based on the last few
values of the measured variable and the estimated process variable.

3. Switched Kalman Filter:


Unfortunately most systems are not linear and are subject to non-Gaussian noise, which
contradicts the basic assumptions under which the Kalman filter is optimal. One approach to
this problem is to discretize the hidden state variables, resulting in Dynamic Bayesian
Networks, of which the Hidden Markov Model is the simplest example. However the
resulting system will in general have a belief state that is exponential in the number of
hidden state variables, resulting in intractable inference. In addition it may also have a
large number of parameters, resulting in inefficient learning (i.e. a lot of data is needed),
which may not be the best approach for real-time applications.
Another approach is to have a bank of M different linear models and to switch
between them, or take some linear combination of them. Let us consider the case where
the dynamics are piecewise linear. We have a discrete switch variable St which specifies
which A, Q matrices to use at time t. We assume St has Markovian dynamics with
transition matrix Z and initial distribution π.
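In equations, the piecewise-linear model just described can be written (using the
observation notation C, R that is introduced below) as:

    x_{t+1} = A(S_t) x_t + v_t,        v_t ~ N(0, Q(S_t))
    y_t     = C(S_t) x_t + w_t,        w_t ~ N(0, R(S_t))
    p(S_t = j | S_{t-1} = i) = Z(i, j),        p(S_1 = i) = pi(i)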

M-Bank of Linear Filters

[Figure: a bank of M linear Kalman filters, with the switch variable St selecting which sub-model is active at each time step.]
If the switch St were observed, we would know when to apply each sub-model (i.e.
the segmentation would be known), but since St is hidden we use a weighted
combination of the sub-models, where the weights are calculated from the error
and the measured variable. This is called soft switching. Hence the resulting system can be
thought of as a mixture of Kalman filters.

For example, we might be interested in tracking a maneuvering airplane. If the
two basic models cover horizontal and vertical motion, then we can represent turns using
a convex combination of the two. SKFs have been shown to give superior performance to
online adaptive methods such as Input Estimation for problems such as these. Let us now
consider the case where St specifies which observation matrices C, R to use at time t.
This can be used to model non-Gaussian observation noise by approximating it as a
mixture of Gaussians. For example we might take R1 to be the nominal covariance of the
observation noise and R2 to be a very broad covariance (e.g. approximately uniform).
The prior probability of St reflects how often we expect outliers to occur. This is a widely
used technique for making linear regression more robust, e.g. for modeling sensor
failure. We can also make the dynamics and the observation model depend on two
separate switch variables, each with its own Markov chain. This is the most general case,
and the one assumed in this implementation.
The fundamental problem with SKFs is that the belief state grows exponentially
with time. To see this, suppose that the initial distribution p(X1) is a mixture of M
Gaussians, one for each value of S1. Then each of these must be propagated through M
different equations (one for each value of S2), so that p(X2) will be a mixture of M^2
Gaussians. In general, at time t, the belief state p(Xt | y1:t) will be a mixture of M^t
Gaussians, one for each possible model history S1, ..., St. The approach that is followed
in this paper, and which I implemented in the project, is the collapsing method.
Algorithm
1) First, from the training data, the parameters A, Q, C and W are estimated.
Initial mean and covariance:
These were calculated from the initial training data provided.
Initial state probability and transition matrix:
These were all set to identical (uniform) probabilities at the start. This works quite well
([3] Rabiner).
2) Then the filter equations are run for each of the M filters. Since the switch state is
hidden, we have to calculate M estimates of xhat for each of the M filters,
giving us M^2 xhat values.
3) During the filtering operation the probability of each xhat is also calculated. This
probability is used to weight that particular value when the values are
collapsed.
4) The M^2 values are then collapsed to give M estimates of xhat.
5) Then these M values are merged to obtain one single value of xhat.
6) Go to step 2. (One such filter-and-collapse step is sketched below.)
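To make the loop structure of steps 2) to 5) concrete, here is a minimal MATLAB-style
sketch of one filter-and-collapse step. The variable names (mu, P, w for the M collapsed
means, covariances and switch probabilities) and the helper kf_update (a placeholder for
the standard Kalman update of Section 2, returning the updated mean, covariance and
observation likelihood) are assumptions for illustration, not the project's actual code;
collapse is the moment-matching routine given under Collapsing below.

    % One time step of the switched filter (GPB(1)-style), illustrative only.
    for j = 1:M                                      % candidate sub-model at time t
        for i = 1:M                                  % collapsed component carried over from time t-1
            [mu_ij(:,i,j), P_ij(:,:,i,j), lik(i,j)] = ...
                kf_update(A(:,:,j), C(:,:,j), Q(:,:,j), R(:,:,j), mu(:,i), P(:,:,i), y_t);
            joint(i,j) = lik(i,j) * Z(i,j) * w(i);   % unnormalised p(S_{t-1}=i, S_t=j | y_1:t)
        end
    end
    joint = joint / sum(joint(:));                   % normalise over all M^2 hypotheses
    for j = 1:M
        w_new(j) = sum(joint(:,j));                  % new switch posterior p(S_t=j | y_1:t)
        [mu_new(:,j), P_new(:,:,j)] = ...
            collapse(mu_ij(:,:,j), P_ij(:,:,:,j), joint(:,j) / w_new(j));
    end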

Filtering:
We use the following equations, which are the same as for the simple Kalman case. The
only overhead is calculating the various probabilities of the different states. There are a
number of probabilities to keep track of, depending on whether they refer to the current
state, the previous state, or a conditional probability.

We compute the error in the prediction (the innovation), the variance of the error, the
Kalman gain matrix, and the likelihood of the observation:
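In the notation of [1], with xhat_t^- and P_t^- the one-step prediction under a given
sub-model and C, R its observation matrices, these quantities take the standard form:

    e_t = y_t - C xhat_t^-                    (innovation, i.e. prediction error)
    S_t = C P_t^- C^T + R                     (variance of the innovation)
    K_t = P_t^- C^T S_t^{-1}                  (Kalman gain)
    L_t = N(e_t; 0, S_t)                      (likelihood of the observation under this sub-model)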

Now we update the estimates of the mean, variance and cross variance:
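Using the same quantities (and following [1] for the cross-variance term, which is needed
later in training):

    xhat_t    = xhat_t^- + K_t e_t             (mean)
    P_t       = (I - K_t C) P_t^-              (variance)
    V_{t,t-1} = (I - K_t C) A P_{t-1}          (cross variance between x_t and x_{t-1})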

Calculation of the probabilities:
Now, using the likelihood of each particular uncollapsed state, the transition matrix and
the switch probabilities from the previous step, the weights used for collapsing are
calculated:
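A sketch of this step in the notation of [1], where M_{t-1}(i) is the switch posterior from
the previous step, L_t(i,j) the likelihood computed above, and Z the transition matrix:

    M_{t-1,t}(i,j) ∝ L_t(i,j) Z(i,j) M_{t-1}(i)        (joint switch posterior, normalised over all i, j)
    M_t(j)   = sum_i M_{t-1,t}(i,j)                    (new switch posterior)
    W_t(i|j) = M_{t-1,t}(i,j) / M_t(j)                 (weights used when collapsing the M^2 estimates to M)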

Collapsing:
The technique is to approximate the mixture of M^t Gaussians with a mixture of r
Gaussians. This is called the Generalized Pseudo Bayesian algorithm of order r, GPB(r).
When r = 1, we approximate a mixture of Gaussians with a single Gaussian using moment
matching. When r = 2, we collapse Gaussians which differ in their history two steps ago;
in general these will be more similar than Gaussians that differ in their more recent
history.
One worry is that errors introduced at each time step by approximating the posterior
might accumulate over time, leading to poor performance. However, the stochasticity of
the process ensures that the true distribution spreads out and with high probability
overlaps the approximate distribution; hence it can be shown that the error remains
bounded ([1]).
function [x, new_predictor] = collapse(xhat, predictor, weight)   % moment-match the mixture to one Gaussian
% xhat: d-by-M component means; predictor: d-by-d-by-M covariances; weight: M-by-1 mixture weights
x = xhat * weight(:);                                             % weighted mean: sum_i weight(i)*xhat(:,i)
new_predictor = zeros(size(predictor(:,:,1)));
for i = 1:numel(weight), d = xhat(:,i) - x; new_predictor = new_predictor + weight(i)*(predictor(:,:,i) + d*d'); end
Training:
At the end of every iteration the filter parameters need to be retrained. The
parameters are re-estimated over a sliding window. Equations for A, C, W and Q follow
(the details of the derivation are to be found in [1], Kevin Murphy, Switching Kalman
Filters).
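As a rough sketch, assuming the standard least-squares / sample-covariance form over a
window of N samples, with the state estimates xhat standing in for the true states and
ignoring the covariance correction terms (the exact weighted, per-model versions are
derived in [1]):

    A ≈ ( sum_t xhat_t xhat_{t-1}^T ) ( sum_t xhat_{t-1} xhat_{t-1}^T )^{-1}
    Q ≈ (1/N) sum_t ( xhat_t - A xhat_{t-1} )( xhat_t - A xhat_{t-1} )^T
    C ≈ ( sum_t y_t xhat_t^T ) ( sum_t xhat_t xhat_t^T )^{-1}
    W ≈ (1/N) sum_t ( y_t - C xhat_t )( y_t - C xhat_t )^T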

4. Demonstration & Results:


The algorithm was tested for three cases.
1. Two States:
In this model two Kalman states were assumed and the filter was trained on noisy
data covering the two states.
Sample output:

2. Rat Training Data:


In this experiment a rat tone-detection task was simulated. The measured variable
was the rat's firing rates, and the hidden state variable was assumed to be the internal
process of the rat.
The three states that were modeled were as follows:
a) High Tone
b) Low Tone
c) Baseline (do nothing)
The filter was trained on these three states to obtain initial estimates of the A, C, W and Q
parameters.

3. X Y Tracking:
This was a simple X-Y tracking task that was implemented using a single Kalman filter.
Sample Output:

Some results for a sample simulation of each of the above cases:

Case                   Mean Error    MSE       of given data    of calculated data
Two_State              -0.0614       0.0038    3.458            2.8394
Rat Tone Detection      0.3092       0.0956    1.4193           1.9134
Tracking               -2.882        7.186     106.18           105.18

5. Discussion:
1. The Switched Kalman filter performed quite well. It matched the data well within the
error bounds. One issue that cropped up during some of the simulations is that the
training data has to be reasonably good, otherwise some of the initial parameters are set
incorrectly and the filter might not converge quickly, though given enough time it does
converge, even if slowly.
2. The size of the window that is used for smoothing can play a role in quick or slow
convergence, and it has to be set depending on how much the signal is expected to vary.
3. Since the number of states grows to M^2 during the filtering operation, increasing the
bank of filters can become computationally expensive. Since all the filters work
independently during filtering, they can be vectorized. The current code has a for loop in
the collapse routine and this needs to be removed, as MATLAB's speed is compromised by
for loops.
4. This implementation needs to be checked for its operation in real-time situations.

6. References
[1] Kevin Murphy, Switching Kalman Filters (1998)
[2] Wei Wu, Michael Black, et al., A Switching Kalman Filter Model for Motor
Cortical Coding of Hand Motion (2003)
[3] Lawrence Rabiner, A Tutorial on Hidden Markov Models and Selected
Applications in Speech Recognition (1989)
[4] Welch & Bishop, An Introduction to the Kalman Filter (2001)
[5] Peter Maybeck, Stochastic Models, Estimation and Control (1979)
[6] T. Kailath, Lectures on Wiener and Kalman Filtering (2nd ed., 1981)
