We derive here the basic equations of the Kalman filter (KF), for discrete-time
linear systems. We consider several derivations under different assumptions and
viewpoints:
• For the Gaussian case, the KF is the optimal (MMSE) state estimator.
• In the non-Gaussian case, the KF is derived as the best linear (LMMSE) state
estimator.
4.1 The Stochastic State-Space Model
The noise sequences (wk , vk ) and the initial conditions x0 are stochastic processes
with known statistics.
Assumption A1: The state-noise process {wk } is white in the strict sense, namely
all wk ’s are independent of each other. Furthermore, this process is independent of
x0 .
Note:
• Linearity is not essential: The Markov property follows from A1 also for the nonlinear state equation x_{k+1} = f(x_k, w_k).
• The pdf of the state can (in principle) be computed recursively via the following (Chapman-Kolmogorov) equation (a numerical illustration is given after this list):

p(x_{k+1}) = ∫ p(x_{k+1} | x_k) p(x_k) dx_k .
• Assume that the noise sequences {wk }, {vk } and the initial conditions x0 are
jointly Gaussian.
• It easily follows that the processes {xk } and {zk } are (jointly) Gaussian as
well.
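As a small numerical illustration of the Chapman-Kolmogorov recursion, the following Python sketch propagates the state pdf on a grid for one step of a scalar system. The model x_{k+1} = a x_k + w_k and all numerical values are assumptions made up for this example only.

import numpy as np

# Assumed scalar example: x_{k+1} = a*x_k + w_k,  w_k ~ N(0, q),  x_0 ~ N(0, p0).
a, q, p0 = 0.9, 0.2, 1.0
x = np.linspace(-6.0, 6.0, 601)          # grid for the state
dx = x[1] - x[0]

def gauss(u, var):
    return np.exp(-0.5 * u**2 / var) / np.sqrt(2 * np.pi * var)

p_k = gauss(x, p0)                        # p(x_k) on the grid (here k = 0)

# Chapman-Kolmogorov: p(x_{k+1}) = integral of p(x_{k+1}|x_k) p(x_k) dx_k,
# with transition kernel p(x_{k+1}|x_k) = N(x_{k+1}; a*x_k, q).
kernel = gauss(x[:, None] - a * x[None, :], q)   # rows: x_{k+1}, columns: x_k
p_next = kernel @ p_k * dx

# For this linear-Gaussian case the result should be N(0, a^2*p0 + q); check the variance.
print("grid variance:", np.sum(x**2 * p_next) * dx, " exact:", a**2 * p0 + q)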
Second-Order Model
We often assume that only the first and second order statistics of the noise are known.
Consider our linear system:
x_{k+1} = F_k x_k + G_k w_k ,    k ≥ 0
z_k = H_k x_k + v_k ,
Mean and covariance propagation
For the standard second-order model, we easily obtain recursive formulas for the
mean and covariance of the state.
x̄_{k+1} = E(F_k x_k + G_k w_k) = F_k x̄_k ,

P_k = E((x_k − x̄_k)(x_k − x̄_k)^T) ,    P_{k+1} = F_k P_k F_k^T + G_k Q_k G_k^T .
• In the Gaussian case, the pdf of x_k is completely specified by the mean and covariance: x_k ∼ N(x̄_k, P_k).
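These recursions are immediate to implement. The following minimal NumPy sketch propagates the mean and covariance of the state for an assumed two-dimensional system; all matrix values below are illustrative assumptions, not part of the notes.

import numpy as np

# Assumed system matrices for x_{k+1} = F x_k + G w_k.
F = np.array([[1.0, 1.0], [0.0, 1.0]])    # simple constant-velocity model
G = np.array([[0.5], [1.0]])
Q = np.array([[0.1]])                      # cov(w_k)

xbar = np.array([0.0, 1.0])                # mean of x_0
P = np.eye(2)                               # cov(x_0)

for k in range(10):
    # Mean propagation: xbar_{k+1} = F xbar_k  (since E(w_k) = 0).
    xbar = F @ xbar
    # Covariance propagation: P_{k+1} = F P_k F^T + G Q G^T  (x_k and w_k independent).
    P = F @ P @ F.T + G @ Q @ G.T

print("mean after 10 steps:", xbar)
print("covariance after 10 steps:\n", P)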
4.2 The KF for the Gaussian Case
We consider again the linear model

x_{k+1} = F_k x_k + G_k w_k ,    k ≥ 0
z_k = H_k x_k + v_k
where:
• {wk } and {vk } are independent, zero-mean Gaussian white processes with
covariances
E(v_k v_l^T) = R_k δ_kl ,    E(w_k w_l^T) = Q_k δ_kl
• The initial state x0 is a Gaussian RV, independent of the noise processes, with
x_0 ∼ N(x̄_0, P_0).
Denote the measurement history Z_k = (z_0, . . . , z_k), and define

x̂_k^- ≡ x̂_{k|k-1} = E(x_k | Z_{k-1}) ,    x̂_k^+ ≡ x̂_{k|k} = E(x_k | Z_k) ,

P_k^+ ≡ P_{k|k} = E{(x_k − x̂_k^+)(x_k − x̂_k^+)^T | Z_k} ,

P_k^- ≡ P_{k|k-1} = E{(x_k − x̂_k^-)(x_k − x̂_k^-)^T | Z_{k-1}} .
Note that Pk+ (and similarly Pk− ) can be viewed in two ways:
(i) It is the conditional covariance of the estimation error x_k − x̂_k^+, given Z_k.
(ii) It is the covariance matrix of the “conditional RV (x_k | Z_k)”, namely an RV with distribution p(x_k | Z_k) (since x̂_k^+ is its mean).
Finally, denote P_0^- = P_0 , x̂_0^- = x̄_0 .
Recall the basic fact: if x and z are jointly Gaussian, then the conditional distribution of x given z is Gaussian, with mean and covariance

m = m_x + Σ_xz Σ_zz^{-1} (z − m_z) ,    Σ = Σ_xx − Σ_xz Σ_zz^{-1} Σ_zx .

According to the terminology above, we say in this case that the conditional RV (x|z) is Gaussian.
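A tiny numerical example of the conditioning formula (scalar case, with made-up joint statistics):

import numpy as np

# Assumed joint statistics of (x, z): means and covariance entries (scalars for simplicity).
m_x, m_z = 1.0, 2.0
S_xx, S_xz, S_zz = 2.0, 0.8, 1.0

z = 2.5                                    # an observed value
# Conditional mean and covariance of x given z (jointly Gaussian case):
m = m_x + S_xz / S_zz * (z - m_z)
S = S_xx - S_xz / S_zz * S_xz
print("E(x|z) =", m, "  cov(x|z) =", S)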
Proposition: For the model above, all random processes (noises, xk , zk ) are jointly
Gaussian.
Proof: All can be expressed as linear combinations of the noise sequences and the initial state, which are jointly Gaussian (why?).
Filter Derivation
Measurement update step: Conditioned on Z_{k-1}, the pair (x_k, z_k) is jointly Gaussian, with means (x̂_k^-, H_k x̂_k^-) and covariance blocks

cov(x_k | Z_{k-1}) = P_k^- ,    cov(x_k, z_k | Z_{k-1}) = P_k^- H_k^T ,    cov(z_k | Z_{k-1}) = M_k ,

where

M_k = H_k P_k^- H_k^T + R_k .
To compute (xk |Zk ) = (xk |zk , Zk−1 ), we apply the above formula for conditional
expectation of Gaussian RVs, with everything pre-conditioned on Zk−1 . It follows
that (xk |Zk ) is Gaussian, with mean and covariance:
x̂_k^+ = E(x_k | Z_k) = x̂_k^- + P_k^- H_k^T M_k^{-1} (z_k − H_k x̂_k^-) ,

P_k^+ = cov(x_k | Z_k) = P_k^- − P_k^- H_k^T M_k^{-1} H_k P_k^- .
Time update step: Recall that x_{k+1} = F_k x_k + G_k w_k. Further, x_k and w_k are independent given Z_k (why?). Therefore,
x̂_{k+1}^- = E(x_{k+1} | Z_k) = F_k x̂_k^+ ,

P_{k+1}^- = cov(x_{k+1} | Z_k) = F_k P_k^+ F_k^T + G_k Q_k G_k^T .
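For concreteness, here is a minimal sketch of one measurement-update/time-update cycle of the equations above. The system matrices and the measurement value are made-up illustrative assumptions.

import numpy as np

# Assumed system: x_{k+1} = F x_k + G w_k,  z_k = H x_k + v_k.
F = np.array([[1.0, 1.0], [0.0, 1.0]])
G = np.array([[0.5], [1.0]])
H = np.array([[1.0, 0.0]])
Q = np.array([[0.1]])
R = np.array([[0.5]])

x_pred = np.array([0.0, 1.0])   # x_k^- (predicted estimate)
P_pred = np.eye(2)               # P_k^-
z = np.array([0.3])              # a (made-up) measurement z_k

# Measurement update:
M = H @ P_pred @ H.T + R                          # innovations covariance M_k
K = P_pred @ H.T @ np.linalg.inv(M)               # gain P_k^- H^T M_k^{-1}
x_filt = x_pred + K @ (z - H @ x_pred)            # x_k^+
P_filt = P_pred - K @ H @ P_pred                  # P_k^+

# Time update:
x_next = F @ x_filt                               # x_{k+1}^-
P_next = F @ P_filt @ F.T + G @ Q @ G.T           # P_{k+1}^-

print(x_filt, "\n", P_filt, "\n", x_next, "\n", P_next)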
Remarks:
1. Note that the covariance computation is needed as part of the estimator computation. However, it is also of independent importance, as it assigns a measure of uncertainty (or confidence) to the estimate.
2. It is remarkable that the conditional covariance matrices Pk+ and Pk− do not de-
pend on the measurements {zk }. They can therefore be computed in advance,
given the system matrices and the noise covariances.
3. As usual in the Gaussian case, P_k^+ is also the unconditional error covariance: P_k^+ = E{(x_k − x̂_k^+)(x_k − x̂_k^+)^T}. In the non-Gaussian case, the unconditional covariance will play the central role as we compute the LMMSE estimator.
4. Suppose we need to estimate some s_k = C x_k. Then the optimal estimate is ŝ_k = E(s_k | Z_k) = C x̂_k^+.
5. The measurement prediction error, or innovation,
z̃_k = z_k − H_k x̂_k^- ≡ z_k − E(z_k | Z_{k-1}) ,
will be central to the derivation in the next section.
4.3 Best Linear Estimator – Innovations Approach
a. Linear Estimators
Recall that the best linear (or LMMSE) estimator of x given y is an estimator of the form x̂ = Ay + b, which minimizes the mean square error E(‖x − x̂‖²). It is given by:
x̂ = m_x + Σ_xy Σ_yy^{-1} (y − m_y)
where Σ_xy and Σ_yy are the corresponding covariance matrices. It easily follows that x̂ is unbiased: E(x̂) = m_x, and the corresponding (minimal) error covariance is

cov(x − x̂) = Σ_xx − Σ_xy Σ_yy^{-1} Σ_yx .
We shall find it convenient to denote this estimator x̂ as E L (x|y). Note that this is
not the standard conditional expectation.
In particular, the orthogonality principle holds: the estimation error is uncorrelated with any linear function L(y) of the data,

E((x − E^L(x|y)) L(y)^T) = 0 .
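A short sketch of the LMMSE formula as a reusable function follows; the means and covariance blocks used in the example are arbitrary assumed values.

import numpy as np

def lmmse(m_x, m_y, S_xy, S_yy, y):
    # Best linear estimate: x_hat = m_x + S_xy S_yy^{-1} (y - m_y); also return the gain S_xy S_yy^{-1}.
    gain = S_xy @ np.linalg.inv(S_yy)
    return m_x + gain @ (y - m_y), gain

# Assumed example statistics (all values made up).
m_x = np.array([0.0]);  m_y = np.array([0.0, 0.0])
S_xx = np.array([[1.0]])
S_xy = np.array([[0.6, 0.3]])
S_yy = np.array([[1.0, 0.2], [0.2, 1.0]])

y = np.array([0.5, -0.1])
x_hat, gain = lmmse(m_x, m_y, S_xy, S_yy, y)
err_cov = S_xx - gain @ S_xy.T     # minimal error covariance: S_xx - S_xy S_yy^{-1} S_yx
print("x_hat =", x_hat, " error covariance =", err_cov)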
The following property will be most useful. It follows simply by using y = (y1; y2) in the formulas above: if y1 and y2 are uncorrelated, then

E^L(x | y1, y2) = E^L(x | y1) + E^L(x | y2) − m_x .
b. The innovations process
The innovations process is defined as

z̃_k = z_k − E^L(z_k | Z_{k-1}) ,

where Z_{k-1} = (z_0; · · · ; z_{k-1}). The innovation RV z̃_k may be regarded as containing only the new statistical information which is not already in Z_{k-1}.
The following properties follow directly from those of the best linear estimator:
(1) E(z̃_k) = 0, and E(z̃_k Z_{k-1}^T) = 0.
This implies that the innovations process is a zero-mean white noise process.
Denote Z̃k = (z̃0 ; · · · ; z̃k ). It is easily verified that Zk and Z̃k are linear functions of
each other. This implies that E L (x|Zk ) = E L (x|Z̃k ) for any RV x.
Using the whiteness of the innovations (and assuming zero-mean quantities for simplicity),

E^L(x | Z_k) = E^L(x | Z̃_k) = E^L(x | Z̃_{k-1}) + E^L(x | z̃_k) = ∑_{l=0}^{k} E^L(x | z̃_l) .
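The decomposition can be checked numerically. The sketch below assumes a made-up joint covariance for zero-mean (x, z0, z1), computes the direct LMMSE coefficients, and compares them with the sum of the projections onto the innovations.

import numpy as np

# Assumed joint covariance of the zero-mean vector (x, z0, z1).
S = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.2]])
Sx_z = S[0, 1:]          # cov(x, (z0, z1))
Sz   = S[1:, 1:]         # cov((z0, z1))

# Direct LMMSE of x given (z0, z1): x_hat = c^T (z0, z1).
c_direct = np.linalg.solve(Sz, Sx_z)

# Innovations: zt0 = z0,  zt1 = z1 - a*z0 with a = cov(z1, z0)/var(z0).
a = Sz[1, 0] / Sz[0, 0]
var_t0 = Sz[0, 0]
var_t1 = Sz[1, 1] - a * Sz[1, 0]          # var(zt1)
cov_x_t0 = Sx_z[0]                         # cov(x, zt0)
cov_x_t1 = Sx_z[1] - a * Sx_z[0]           # cov(x, zt1)

# Sum of the one-dimensional projections, expressed back in terms of (z0, z1):
c_innov = np.array([cov_x_t0 / var_t0 - a * cov_x_t1 / var_t1,
                    cov_x_t1 / var_t1])
print(c_direct, c_innov)   # the two coefficient vectors coincide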
c. Derivation of the KF equations
We proceed to derive the Kalman filter as the best linear estimator for our linear,
non-Gaussian model. We slightly generalize the model that was treated so far by
allowing correlation between the state noise and measurement noise. Thus, we
consider the model
x_{k+1} = F_k x_k + G_k w_k ,    k ≥ 0
z_k = H_k x_k + v_k ,

with zero-mean white noise processes as before, but now allowing correlation E(w_k v_l^T) = S_k δ_kl. Denote Z_k = (z_0; · · · ; z_k), and

x̂_{k|k-1} = E^L(x_k | Z_{k-1}) ,    x̂_{k|k} = E^L(x_k | Z_k) ,
x̃_{k|k-1} = x_k − x̂_{k|k-1} ,      x̃_{k|k} = x_k − x̂_{k|k} ,
P_{k|k-1} = cov(x̃_{k|k-1}) ,        P_{k|k} = cov(x̃_{k|k}) .
z̃_k = z_k − E^L(z_k | Z_{k-1}) = z_k − H_k x̂_{k|k-1} .

Note that

z̃_k = H_k x̃_{k|k-1} + v_k .
Measurement update: From our previous discussion of linear estimation and innovations,

x̂_{k|k} = E^L(x_k | Z̃_k) = x̂_{k|k-1} + E(x_k z̃_k^T) [cov(z̃_k)]^{-1} z̃_k .

This relation is the basis for the innovations approach. The rest follows essentially by direct computations, and some use of the orthogonality principle. First,

E(x_k z̃_k^T) = E(x_k (H_k x̃_{k|k-1} + v_k)^T) = P_{k|k-1} H_k^T ,

where E(x_k x̃_{k|k-1}^T) = P_{k|k-1} follows by orthogonality, and we also used the fact that v_k and x_k are not correlated. Similarly,

cov(z̃_k) = cov(H_k x̃_{k|k-1} + v_k) = H_k P_{k|k-1} H_k^T + R_k = M_k .
Time update: This step is less trivial than before due to the correlation between vk
and wk . We have
x̂_{k+1|k} = E^L(x_{k+1} | Z̃_k) = E^L(F_k x_k + G_k w_k | Z̃_k)
          = F_k x̂_{k|k} + G_k E^L(w_k | z̃_k) .

In the last equation we used E^L(w_k | Z̃_{k-1}) = 0, since w_k is uncorrelated with Z̃_{k-1}. Thus, since E(w_k z̃_k^T) = E(w_k v_k^T) = S_k,

x̂_{k+1|k} = F_k x̂_{k|k} + G_k S_k M_k^{-1} z̃_k .
Combined update: Combining the measurement and time updates, we obtain the one-step update for x̂_{k|k-1}:

x̂_{k+1|k} = F_k x̂_{k|k-1} + K_k z̃_k ,

where

K_k = (F_k P_{k|k-1} H_k^T + G_k S_k) M_k^{-1} ,
z̃_k = z_k − H_k x̂_{k|k-1} ,
M_k = H_k P_{k|k-1} H_k^T + R_k .
Covariance update: The relation between P_{k|k} and P_{k|k-1} is exactly as before. The recursion for P_{k+1|k} is most conveniently obtained in terms of P_{k|k-1} directly. From the previous relations we obtain

P_{k+1|k} = F_k P_{k|k-1} F_k^T + G_k Q_k G_k^T − K_k M_k K_k^T .
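Putting the pieces together, a minimal sketch of the combined (one-step) update with correlated noises follows; the matrices F, G, H, Q, R, S and the measurements below are illustrative assumptions.

import numpy as np

# Assumed system matrices and noise statistics (with correlated noises).
F = np.array([[1.0, 0.1], [0.0, 1.0]])
G = np.array([[0.0], [1.0]])
H = np.array([[1.0, 0.0]])
Q = np.array([[0.2]])     # cov(w_k)
R = np.array([[0.5]])     # cov(v_k)
S = np.array([[0.1]])     # E(w_k v_k^T)

def one_step(x_pred, P_pred, z):
    # One iteration of the combined update: (x_{k|k-1}, P_{k|k-1}, z_k) -> (x_{k+1|k}, P_{k+1|k}).
    M = H @ P_pred @ H.T + R                                  # innovations covariance M_k
    K = (F @ P_pred @ H.T + G @ S) @ np.linalg.inv(M)         # combined gain K_k
    z_tilde = z - H @ x_pred                                  # innovation
    x_next = F @ x_pred + K @ z_tilde
    P_next = F @ P_pred @ F.T + G @ Q @ G.T - K @ M @ K.T
    return x_next, P_next

x_pred, P_pred = np.zeros(2), np.eye(2)
for z in [np.array([0.2]), np.array([0.1]), np.array([-0.3])]:   # made-up measurements
    x_pred, P_pred = one_step(x_pred, P_pred, z)
print(x_pred, "\n", P_pred)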
Addendum: A Hilbert space interpretation
The definitions and results concerning linear estimators can be nicely interpreted in
terms of a Hilbert space formulation.
Recall that a Hilbert space is a (complete) inner-product space. That is, it is a linear vector space V, with a real-valued inner product operation ⟨v1, v2⟩ which is bi-linear, symmetric, and non-degenerate (⟨v, v⟩ = 0 iff v = 0). (Completeness means that every Cauchy sequence has a limit.) The derived norm is defined by ‖v‖² = ⟨v, v⟩.
The following facts are standard:
5. Given a set of (independent) vectors {v1, v2, . . . }, the following Gram-Schmidt procedure provides an orthogonal basis (a short numerical sketch follows):

ṽ_1 = v_1 ,    ṽ_k = v_k − ∑_{l<k} (⟨v_k, ṽ_l⟩ / ⟨ṽ_l, ṽ_l⟩) ṽ_l ,    k ≥ 2 .
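A short sketch of the Gram-Schmidt step, here applied to ordinary vectors in R^3 with the Euclidean inner product (any inner product could be substituted); the input vectors are arbitrary.

import numpy as np

def gram_schmidt(vectors):
    # Classical Gram-Schmidt: v~_k = v_k - sum_{l<k} (<v_k, v~_l> / <v~_l, v~_l>) v~_l.
    basis = []
    for v in vectors:
        v_t = v.astype(float) - sum((v @ u) / (u @ u) * u for u in basis)
        basis.append(v_t)
    return basis

V = [np.array([1.0, 1.0, 0.0]), np.array([1.0, 0.0, 1.0]), np.array([0.0, 1.0, 1.0])]
B = gram_schmidt(V)
print(np.round([[u @ w for w in B] for u in B], 10))  # off-diagonal entries are (numerically) zero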
We can fit the previous results on linear estimation to this framework by noting the
following correspondence:
• Our Hilbert space is the space of all zero-mean random variables x (on a given probability space) which are square-integrable: E(x²) < ∞. The inner product is defined as ⟨x, y⟩ = E(xy).
• The optimal linear estimator E L (xk |Zk ), with Zk = (z0 , . . . , zk ), is the orthog-
onal projection of the vector xk on the subspace spanned by Zk . (If xk is
vector-valued, we simply consider the projection of each element separately.)
The Hilbert space formulation provides a nice insight, and can also provide useful
technical results, especially in the continuous-time case. However, we shall not go
deeper into this topic.
4.4 The Kalman Filter as a Least-Squares Problem
Consider the following (deterministic) least-squares problem: minimize the cost

J_k = (1/2) (x_0 − x̄_0)^T P_0^{-1} (x_0 − x̄_0)
    + (1/2) ∑_{l=0}^{k} (z_l − H_l x_l)^T R_l^{-1} (z_l − H_l x_l)
    + (1/2) ∑_{l=0}^{k-1} w_l^T Q_l^{-1} w_l .
Constraints:
xl+1 = Fl xl + Gl wl , l = 0, 1, . . . , k − 1
Variables:
x0 , . . . xk ; w0 , . . . wk−1 .
Claim: The last component x_k^{(k)} of the minimizing solution coincides with the KF estimate x̂_k^+.
This claim can be established by writing explicitly the least-squares solution for k − 1 and k, and manipulating the matrix expressions. We will take here a quicker route, using the Gaussian insight.
Theorem: The minimizing solution (x_0^{(k)}, . . . , x_k^{(k)}) of the above LS problem is the maximizer of the conditional probability (that is, the MAP estimator):

(x_0^{(k)}, . . . , x_k^{(k)}) = arg max_{x_0,...,x_k} p(x_0, . . . , x_k | Z_k) ,
related to the Gaussian model:
x_{k+1} = F_k x_k + G_k w_k ,    x_0 ∼ N(x̄_0, P_0)
z_k = H_k x_k + v_k ,            w_k ∼ N(0, Q_k), v_k ∼ N(0, R_k)
Immediate Consequence: Since for Gaussian RVs MAP = MMSE, the minimizers (x_0^{(k)}, . . . , x_k^{(k)}) coincide with the conditional means E(x_l | Z_k). In particular, x_k^{(k)} = x̂_k^+.
Remark: The above theorem (but not the last consequence) holds true even for the
non-linear model: xk+1 = Fk (xk ) + Gk wk .
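A small numerical check of this equivalence, for an assumed scalar model, is sketched below: the LS cost is minimized directly (using scipy) and the resulting x_k is compared with the filtered KF estimate. All parameter values are made up for the illustration.

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Assumed scalar model: x_{l+1} = a x_l + w_l,  z_l = h x_l + v_l.
a, h = 0.9, 1.0
Q, R, P0, x0_mean = 0.04, 0.25, 1.0, 0.0
k = 5

# Simulate some data z_0, ..., z_k.
x = rng.normal(x0_mean, np.sqrt(P0))
z = []
for l in range(k + 1):
    z.append(h * x + rng.normal(0, np.sqrt(R)))
    x = a * x + rng.normal(0, np.sqrt(Q))

# Least-squares cost in the variables theta = (x_0, w_0, ..., w_{k-1}).
def J(theta):
    x0, w = theta[0], theta[1:]
    xs = [x0]
    for l in range(k):
        xs.append(a * xs[-1] + w[l])          # constraints eliminated by substitution
    cost = 0.5 * (x0 - x0_mean) ** 2 / P0
    cost += 0.5 * sum((z[l] - h * xs[l]) ** 2 / R for l in range(k + 1))
    cost += 0.5 * sum(w[l] ** 2 / Q for l in range(k))
    return cost

theta_opt = minimize(J, np.zeros(k + 1)).x
x0, w = theta_opt[0], theta_opt[1:]
xk_ls = x0
for l in range(k):
    xk_ls = a * xk_ls + w[l]                  # x_k from the LS minimizer

# Scalar Kalman filter for comparison.
xm, Pm = x0_mean, P0
for zl in z:
    M = h * Pm * h + R
    K = Pm * h / M
    xp = xm + K * (zl - h * xm)               # measurement update
    Pp = (1 - K * h) * Pm
    xm, Pm = a * xp, a * Pp * a + Q           # time update

print("LS solution x_k:", xk_ls, "  KF estimate x_k^+:", xp)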
4.5 KF Equations – Basic Versions
a. Two-step iterations
Initial Conditions:
x̂_0^- = x̄_0 = E(x_0) ,    P_0^- = P_0 = cov(x_0) .
Measurement update:
x̂_k^+ = x̂_k^- + K_k (z_k − H_k x̂_k^-) ,
P_k^+ = (I − K_k H_k) P_k^- ,
where K_k = P_k^- H_k^T (H_k P_k^- H_k^T + R_k)^{-1} is the gain matrix.
Time update:
x̂_{k+1}^- = F_k x̂_k^+  [+ B_k u_k]
P_{k+1}^- = F_k P_k^+ F_k^T + G_k Q_k G_k^T
b. One-step iterations
The two-step equations may obviously be combined into a one-step update which computes x̂_{k+1}^+ from x̂_k^+ (or x̂_{k+1}^- from x̂_k^-).
For example,
x̂_{k+1}^- = F_k x̂_k^- + F_k K_k (z_k − H_k x̂_k^-)
P_{k+1}^- = F_k (P_k^- − K_k H_k P_k^-) F_k^T + G_k Q_k G_k^T .
L_k = F_k K_k is also known as the Kalman gain.
The iterative equation for Pk− is called the (discrete-time, time-varying) Matrix
Riccati Equation.
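The Riccati recursion can be iterated offline (no measurements are needed), e.g. to obtain the steady-state prediction covariance. A minimal sketch, for assumed time-invariant matrices:

import numpy as np

# Assumed time-invariant system matrices.
F = np.array([[1.0, 0.1], [0.0, 1.0]])
G = np.array([[0.0], [1.0]])
H = np.array([[1.0, 0.0]])
Q = np.array([[0.2]])
R = np.array([[0.5]])

P = np.eye(2)                      # P_0^-
for _ in range(200):
    M = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(M)                   # K_k
    P = F @ (P - K @ H @ P) @ F.T + G @ Q @ G.T      # Riccati iteration for P_{k+1}^-
print("steady-state P^-:\n", P)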
c. Other important quantities
The measurement prediction, the innovations process, and the innovations covari-
ance are given by
ẑ_k = E(z_k | Z_{k-1}) = H_k x̂_k^-  [+ I_k u_k]
z̃_k = z_k − ẑ_k = H_k x̃_k^- + v_k
M_k = cov(z̃_k) = H_k P_k^- H_k^T + R_k
The measurement update for the (optimal) covariance P_k may be expressed in the following equivalent formulas:

P_k^+ = (I − K_k H_k) P_k^-
      = P_k^- − K_k M_k K_k^T .

Two additional forms are worth noting.

1. Joseph form:
P_k^+ = (I − K_k H_k) P_k^- (I − K_k H_k)^T + K_k R_k K_k^T ,
which follows from the error equation
x_k − x̂_k^+ = (I − K_k H_k)(x_k − x̂_k^-) − K_k v_k .
This form may be more computationally expensive, but has the following advantages:
– It preserves the symmetry and positive semi-definiteness of the computed covariance.
– It holds for any gain K_k (not just the optimal) that is used in the estimator equation x̂_k^+ = x̂_k^- + K_k z̃_k .
2. Information form:
(P_k^+)^{-1} = (P_k^-)^{-1} + H_k^T R_k^{-1} H_k .
The equivalence may be obtained via the useful Matrix Inversion Lemma:
(A + BCD)^{-1} = A^{-1} − A^{-1} B (C^{-1} + D A^{-1} B)^{-1} D A^{-1} .
P^{-1} is called the Information Matrix. It forms the basis for the “information filter”, which only computes the inverse covariances.
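The equivalence of the various measurement-update forms is easy to check numerically; the sketch below uses assumed values for P_k^-, H and R and compares the standard, Joseph and information forms.

import numpy as np

# Assumed quantities at a single step.
P_pred = np.array([[2.0, 0.3], [0.3, 1.0]])    # P_k^-
H = np.array([[1.0, 0.0]])
R = np.array([[0.5]])
I = np.eye(2)

M = H @ P_pred @ H.T + R                       # innovations covariance
K = P_pred @ H.T @ np.linalg.inv(M)            # optimal gain

P1 = (I - K @ H) @ P_pred                                               # standard form
P2 = P_pred - K @ M @ K.T                                               # via K M K^T
P3 = (I - K @ H) @ P_pred @ (I - K @ H).T + K @ R @ K.T                 # Joseph form
P4 = np.linalg.inv(np.linalg.inv(P_pred) + H.T @ np.linalg.inv(R) @ H)  # information form

print(np.allclose(P1, P2), np.allclose(P1, P3), np.allclose(P1, P4))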
d. Relation to observers
The one-step form of the KF has the structure of a state observer, a basic concept from control theory.
Given a (deterministic) system:
x_{k+1} = F_k x_k + B_k u_k
z_k = H_k x_k
an observer for its state has the form
x̂_{k+1} = F_k x̂_k + B_k u_k + L_k (z_k − H_k x̂_k) ,
where L_k are gain matrices to be chosen, with the goal of obtaining x̃_k = (x_k − x̂_k) → 0 as k → ∞.
Since

x̃_{k+1} = (F_k − L_k H_k) x̃_k ,

the gains L_k should be chosen so that this error dynamics is (asymptotically) stable.
The Kalman gain automatically satisfies this stability requirement (whenever the detectability condition is satisfied).
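As a quick illustration (for an assumed time-invariant, detectable system), the sketch below iterates the Riccati equation to near steady state and verifies that the eigenvalues of F − L H lie inside the unit circle.

import numpy as np

# Assumed time-invariant system (observable pair (F, H)).
F = np.array([[1.0, 0.1], [0.0, 1.0]])
G = np.array([[0.0], [1.0]])
H = np.array([[1.0, 0.0]])
Q = np.array([[0.2]])
R = np.array([[0.5]])

# Iterate the Riccati equation to (near) steady state.
P = np.eye(2)
for _ in range(500):
    M = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(M)
    P = F @ (P - K @ H @ P) @ F.T + G @ Q @ G.T

L = F @ K                                   # steady-state Kalman (observer) gain
eigs = np.linalg.eigvals(F - L @ H)
print("eigenvalues of F - L H:", eigs, " spectral radius:", np.max(np.abs(eigs)))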