
Technion – Israel Institute of Technology, Department of Electrical Engineering

Estimation and Identification in Dynamical Systems (048825)


Lecture Notes, Fall 2009, Prof. N. Shimkin

4 Derivations of the Discrete-Time Kalman Filter

We derive here the basic equations of the Kalman filter (KF), for discrete-time
linear systems. We consider several derivations under different assumptions and
viewpoints:

• For the Gaussian case, the KF is the optimal (MMSE) state estimator.

• In the non-Gaussian case, the KF is derived as the best linear (LMMSE) state
estimator.

• We also provide a deterministic (least-squares) interpretation.

We start by describing the basic state-space model.

4.1 The Stochastic State-Space Model

A discrete-time, linear, time-varying state space system is given by:

xk+1 = Fk xk + Gk wk (state evolution equation)


zk = Hk xk + vk (measurement equation)

for k ≥ 0 (say), and initial conditions x0 . Here:


– Fk , Gk , Hk are known matrices.
– xk ∈ IRn is the state vector.
– wk ∈ IRnw is the state noise.
– zk ∈ IRm is the observation vector.
– vk ∈ IRm is the observation noise.
– The initial conditions are given by x0 , usually a random variable.

The noise sequences (wk , vk ) and the initial conditions x0 are stochastic processes
with known statistics.

The Markovian model

Recall that a stochastic process {Xk } is a Markov process if

p(Xk+1 |Xk , Xk−1 , . . . ) = p(Xk+1 |Xk ) .

For the state xk to be Markovian, we need the following assumption.

Assumption A1: The state-noise process {wk } is white in the strict sense, namely
all wk ’s are independent of each other. Furthermore, this process is independent of
x0 .

The following is then a simple exercise:

Proposition: Under A1, the state process {xk , k ≥ 0} is Markov.

Note:

• Linearity is not essential: The Markov property follows from A1 also for the
nonlinear state equation xk+1 = f (xk , wk ).

• The measurement process zk is usually not Markov.

• The pdf of the state can (in principle) be computed recursively via the following
(Chapman-Kolmogorov) equation:
    p(xk+1) = ∫ p(xk+1 | xk) p(xk) dxk ,

where p(xk+1 | xk) is determined by p(wk).

The Gaussian model

• Assume that the noise sequences {wk }, {vk } and the initial conditions x0 are
jointly Gaussian.

• It easily follows that the processes {xk } and {zk } are (jointly) Gaussian as
well.

• If, in addition, A1 is satisfied (namely {wk} is white and independent of x0),
then xk is a Markov process.

This model is often called the Gauss-Markov Model.

Second-Order Model

We often assume that only the first- and second-order statistics of the noise are known.
Consider our linear system:

xk+1 = Fk xk + Gk wk , k≥0

zk = Hk xk + vk ,

under the following assumptions:

• wk a 0-mean white noise: E(wk ) = 0, cov(wk , wl ) = Qk δkl .

• vk a 0-mean white noise: E(vk ) = 0, cov(vk , vl ) = Rk δkl .

• cov(wk , vl ) = 0: uncorrelated noise.

• x0 is uncorrelated with the other noise sequences.


Denote x̄0 = E(x0), cov(x0) = P0.

We refer to this model as the standard second-order model.

It is sometimes useful to allow correlation between vk and wk :

cov(wk , vl ) ≡ E(wk vlT ) = Sk δkl .

This gives the second-order model with correlated noise.

A short-hand notation for the above correlations:

         ⎡ wk ⎤   ⎡ wl ⎤       ⎡ Qk δkl    Sk δkl    0  ⎤
    cov( ⎢ vk ⎥ , ⎢ vl ⎥ )  =  ⎢ SkT δkl   Rk δkl    0  ⎥
         ⎣ x0 ⎦   ⎣ x0 ⎦       ⎣ 0         0         P0 ⎦

Note that the Gauss-Markov model is a special case of this model.

Mean and covariance propagation

For the standard second-order model, we easily obtain recursive formulas for the
mean and covariance of the state.

• The mean obviously satisfies:

    x̄k+1 = Fk x̄k + Gk E(wk) = Fk x̄k

• Consider next the covariance:

    Pk ≐ E((xk − x̄k)(xk − x̄k)T) .

  Note that xk+1 − x̄k+1 = Fk (xk − x̄k) + Gk wk, and wk and xk are uncorrelated
  (why?). Therefore

    Pk+1 = Fk Pk FkT + Gk Qk GkT .

This equation is in the form of a Lyapunov difference equation.

• Since zk = Hk xk + vk, it is now easy to compute its covariance, and also the
joint covariances of (xk , zk).

• In the Gaussian case, the pdf of xk is completely specified by the mean and
covariance: xk ∼ N(x̄k, Pk).
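These propagation equations are easy to implement. A minimal NumPy sketch (the function name and the test system below are ours, for illustration only):

```python
import numpy as np

def propagate_moments(F, G, Q, xbar, P, steps):
    """Propagate mean and covariance of the standard second-order model:
    xbar_{k+1} = F xbar_k,  P_{k+1} = F P F^T + G Q G^T
    (the latter is a Lyapunov difference equation)."""
    for _ in range(steps):
        xbar = F @ xbar
        P = F @ P @ F.T + G @ Q @ G.T
    return xbar, P
```

For a stable scalar system F = 0.5, G = Q = 1, the covariance recursion P ← 0.25 P + 1 converges to the fixed point 4/3.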

4.2 The KF for the Gaussian Case

Consider the linear Gaussian (or Gauss-Markov) model

xk+1 = Fk xk + Gk wk , k≥0

zk = Hk xk + vk

where:

• {wk } and {vk } are independent, zero-mean Gaussian white processes with
covariances
E(vk vlT ) = Rk δkl , E(wk wlT ) = Qk δkl

• The initial state x0 is a Gaussian RV, independent of the noise processes, with
x0 ∼ N(x̄0, P0).

Let Zk = (z0, . . . , zk). Our goal is to compute recursively the following optimal
(MMSE) estimator of xk:

    x̂k+ ≡ x̂k|k ≐ E(xk | Zk) .

Also define the one-step predictor of xk:

    x̂k− ≡ x̂k|k−1 ≐ E(xk | Zk−1)

and the respective covariance matrices:

    Pk+ ≡ Pk|k ≐ E{(xk − x̂k+)(xk − x̂k+)T | Zk}
    Pk− ≡ Pk|k−1 ≐ E{(xk − x̂k−)(xk − x̂k−)T | Zk−1} .

Note that Pk+ (and similarly Pk−) can be viewed in two ways:

(i) It is the covariance matrix of the (posterior) estimation error, ek = xk − x̂k+.
In particular, MMSE = trace(Pk+).

(ii) It is the covariance matrix of the “conditional RV (xk |Zk)”, namely an RV
with distribution p(xk |Zk) (since x̂k+ is its mean).

Finally, denote P0− ≐ P0, x̂0− ≐ x̄0.

Recall the formulas for conditioned Gaussian vectors:

• If x and z are jointly Gaussian, then p(x|z) ∼ N(m, Σ), with

    m = mx + Σxz Σzz−1 (z − mz) ,
    Σ = Σxx − Σxz Σzz−1 Σzx .

• The same formulas hold when everything is conditioned, in addition, on an-
other random vector.

According to the terminology above, we say in this case that the conditional RV
(x|z) is Gaussian.
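The conditioning formulas translate directly into code. A small sketch (the helper name `gaussian_conditional` is ours, not from the notes):

```python
import numpy as np

def gaussian_conditional(mx, mz, Sxx, Sxz, Szz, z):
    """Mean and covariance of (x|z) for jointly Gaussian (x, z):
    m = mx + Sxz Szz^{-1} (z - mz),  Sigma = Sxx - Sxz Szz^{-1} Szx."""
    K = Sxz @ np.linalg.inv(Szz)
    m = mx + K @ (z - mz)
    Sigma = Sxx - K @ Sxz.T          # Szx = Sxz^T
    return m, Sigma
```

For example, with z ∼ N(0, 1) and x = 2z + e, e ∼ N(0, 1) independent, one has Σxx = 5, Σxz = 2, Σzz = 1, so E(x | z = 1) = 2 and cov(x|z) = 1.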

Proposition: For the model above, all random processes (noises, xk , zk ) are jointly
Gaussian.

Proof: All can be expressed as linear combinations of the noise sequences, which
are jointly Gaussian (why?).

It follows that (xk |Zm) is Gaussian (for any k, m). In particular:

    (xk |Zk) ∼ N(x̂k+, Pk+) ,   (xk |Zk−1) ∼ N(x̂k−, Pk−) .

Filter Derivation

Suppose, at time k, that (x̂k−, Pk−) is given.

We shall compute (x̂k+, Pk+) and (x̂k+1−, Pk+1−), using the following two steps.

Measurement update step: Since zk = Hk xk + vk, the conditional vector
([xk ; zk] | Zk−1) is Gaussian, with mean and covariance:

    ⎡ x̂k−    ⎤     ⎡ Pk−      Pk− HkT ⎤
    ⎣ Hk x̂k− ⎦ ,   ⎣ Hk Pk−   Mk      ⎦

where

    Mk ≜ Hk Pk− HkT + Rk .

To compute (xk |Zk) = (xk |zk , Zk−1), we apply the above formula for conditional
expectation of Gaussian RVs, with everything pre-conditioned on Zk−1. It follows
that (xk |Zk) is Gaussian, with mean and covariance:

    x̂k+ ≐ E(xk |Zk) = x̂k− + Pk− HkT (Mk)−1 (zk − Hk x̂k−)

    Pk+ ≐ cov(xk |Zk) = Pk− − Pk− HkT (Mk)−1 Hk Pk−

Time update step: Recall that xk+1 = Fk xk + Gk wk. Further, xk and wk are inde-
pendent given Zk (why?). Therefore,

    x̂k+1− ≐ E(xk+1 |Zk) = Fk x̂k+

    Pk+1− ≐ cov(xk+1 |Zk) = Fk Pk+ FkT + Gk Qk GkT
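The two steps above translate directly into code. A sketch of one full filter cycle in NumPy (the function name and the scalar test case are ours):

```python
import numpy as np

def kf_step(F, G, H, Q, R, x_pred, P_pred, z):
    """One Kalman filter cycle: measurement update of (x^-, P^-)
    into (x^+, P^+), then time update to the next (x^-, P^-)."""
    M = H @ P_pred @ H.T + R                 # innovation covariance M_k
    K = P_pred @ H.T @ np.linalg.inv(M)      # gain P^- H^T M^{-1}
    x_post = x_pred + K @ (z - H @ x_pred)   # measurement update
    P_post = P_pred - K @ H @ P_pred
    x_next = F @ x_post                      # time update
    P_next = F @ P_post @ F.T + G @ Q @ G.T
    return x_post, P_post, x_next, P_next
```

In the scalar case F = H = 1, Q = 0, R = 1, starting from x̂− = 0, P− = 1 and observing z = 1, one gets M = 2, gain 0.5, and posterior (0.5, 0.5).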

Remarks:

1. The KF computes both the estimate x̂k+ and its MSE/covariance Pk+ (and
similarly for x̂k−).

Note that the covariance computation is needed as part of the estimator com-
putation. However, it is also of independent importance, as it assigns a measure
of uncertainty (or confidence) to the estimate.

2. It is remarkable that the conditional covariance matrices Pk+ and Pk− do not de-
pend on the measurements {zk }. They can therefore be computed in advance,
given the system matrices and the noise covariances.

3. As usual in the Gaussian case, Pk+ is also the unconditional error covariance:

    Pk+ = cov(xk − x̂k+) = E[(xk − x̂k+)(xk − x̂k+)T] .

In the non-Gaussian case, the unconditional covariance will play the central
role as we compute the LMMSE estimator.

4. Suppose we need to estimate some sk ≐ C xk.
Then the optimal estimate is ŝk = E(sk |Zk) = C x̂k+.

5. The following “output prediction error”

    z̃k ≐ zk − Hk x̂k− ≡ zk − E(zk |Zk−1)

is called the innovation, and {z̃k} is the important innovations process.
Note that Mk = Hk Pk− HkT + Rk is just the covariance of z̃k.

4.3 Best Linear Estimator – Innovations Approach

a. Linear Estimators

Recall that the best linear (or LMMSE) estimator of x given y is an estimator of
the form x̂ = Ay + b which minimizes the mean square error E(‖x − x̂‖²). It is
given by:

    x̂ = mx + Σxy Σyy−1 (y − my)

where Σxy and Σyy are the covariance matrices. It easily follows that x̂ is unbiased,
E(x̂) = mx, and the corresponding (minimal) error covariance is

    cov(x − x̂) = E(x − x̂)(x − x̂)T = Σxx − Σxy Σyy−1 ΣxyT .

We shall find it convenient to denote this estimator x̂ as E^L(x|y). Note that this is
not the standard conditional expectation.

Recall further the orthogonality principle:

    E[(x − E^L(x|y)) L(y)] = 0

for any linear function L(y) of y.

The following property will be most useful. It follows simply by using y = (y1 ; y2)
in the formulas above:

• Suppose cov(y1 , y2) = 0. Then

    E^L(x|y1 , y2) = E^L(x|y1) + [E^L(x|y2) − E(x)] .

Furthermore,

    cov(x − E^L(x|y1 , y2)) = (Σxx − Σxy1 Σy1y1−1 Σxy1T) − Σxy2 Σy2y2−1 Σxy2T .
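The LMMSE formula itself is a one-liner to implement. A small sketch (the function name is ours):

```python
import numpy as np

def lmmse(mx, my, Sxy, Syy, y):
    """Best linear estimate E^L(x|y) = mx + Sxy Syy^{-1} (y - my)."""
    return mx + Sxy @ np.linalg.inv(Syy) @ (y - my)
```

As a sanity check: if x = 3y exactly (so Σxy = 3 Σyy), the best linear estimate must recover x, e.g. y = 2 gives x̂ = 6.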

b. The innovations process

Consider a discrete-time stochastic process {zk}k≥0. The (wide-sense) innovations
process is defined as

    z̃k = zk − E^L(zk |Zk−1) ,

where Zk−1 = (z0 ; · · · ; zk−1). The innovation RV z̃k may be regarded as containing
only the new statistical information which is not already in Zk−1.

The following properties follow directly from those of the best linear estimator:

(1) E(z̃k) = 0, and E(z̃k Zk−1T) = 0.

(2) z̃k is a linear function of Zk.

(3) Thus, cov(z̃k , z̃l) = E(z̃k z̃lT) = 0 for k ≠ l.

This implies that the innovations process is a zero-mean white noise process.

Denote Z̃k = (z̃0 ; · · · ; z̃k). It is easily verified that Zk and Z̃k are linear functions of
each other. This implies that E^L(x|Zk) = E^L(x|Z̃k) for any RV x.

It follows that (taking E(x) = 0 for simplicity):

    E^L(x|Zk) = E^L(x|Z̃k) = E^L(x|Z̃k−1) + E^L(x|z̃k) = Σ_{l=0}^{k} E^L(x|z̃l) .

c. Derivation of the KF equations

We proceed to derive the Kalman filter as the best linear estimator for our linear,
non-Gaussian model. We slightly generalize the model that was treated so far by
allowing correlation between the state noise and measurement noise. Thus, we
consider the model

xk+1 = Fk xk + Gk wk ,   k ≥ 0

zk = Hk xk + vk ,

with [wk ; vk] a zero-mean white noise sequence with covariance

       ⎡ wk ⎤                 ⎡ Qk    Sk ⎤
    E( ⎣ vk ⎦ [wlT , vlT] ) = ⎣ SkT   Rk ⎦ δkl .

x0 has mean x̄0, covariance P0, and is uncorrelated with the noise sequence.

We use here the following notation:

    Zk = (z0 ; · · · ; zk)
    x̂k|k−1 = E^L(xk |Zk−1) ,   x̂k|k = E^L(xk |Zk)
    x̃k|k−1 = xk − x̂k|k−1 ,   x̃k|k = xk − x̂k|k
    Pk|k−1 = cov(x̃k|k−1) ,   Pk|k = cov(x̃k|k)

and define the innovations process

    z̃k ≜ zk − E^L(zk |Zk−1) = zk − Hk x̂k|k−1 .

Note that

    z̃k = Hk x̃k|k−1 + vk .

Measurement update: From our previous discussion of linear estimation and inno-
vations,

    x̂k|k = E^L(xk |Zk) = E^L(xk |Z̃k)
         = E^L(xk |Z̃k−1) + E^L(xk |z̃k) − E(xk)

This relation is the basis for the innovations approach. The rest follows essentially
by direct computations, and some use of the orthogonality principle. First,

    E^L(xk |z̃k) − E(xk) = cov(xk , z̃k) cov(z̃k)−1 z̃k .

The two covariances are next computed:

    cov(xk , z̃k) = cov(xk , Hk x̃k|k−1 + vk) = Pk|k−1 HkT ,

where E(xk x̃k|k−1T) = Pk|k−1 follows by orthogonality, and we also used the fact that
vk and xk are uncorrelated. Similarly,

    cov(z̃k) = cov(Hk x̃k|k−1 + vk) = Hk Pk|k−1 HkT + Rk ≐ Mk .

By substituting in the estimator expression we obtain

    x̂k|k = x̂k|k−1 + Pk|k−1 HkT Mk−1 z̃k .

Time update: This step is less trivial than before, due to the correlation between vk
and wk. We have

    x̂k+1|k = E^L(xk+1 |Z̃k) = E^L(Fk xk + Gk wk |Z̃k)
            = Fk x̂k|k + Gk E^L(wk |z̃k) .

In the last equation we used E^L(wk |Z̃k−1) = 0, since wk is uncorrelated with Z̃k−1.
Thus

    x̂k+1|k = Fk x̂k|k + Gk E(wk z̃kT) cov(z̃k)−1 z̃k
            = Fk x̂k|k + Gk Sk Mk−1 z̃k ,

where E(wk z̃kT) = E(wk vkT) = Sk follows from z̃k = Hk x̃k|k−1 + vk.

Combined update: Combining the measurement and time updates, we obtain the
one-step update for x̂k|k−1:

    x̂k+1|k = Fk x̂k|k−1 + Kk z̃k

where

    Kk ≐ (Fk Pk|k−1 HkT + Gk Sk) Mk−1
    z̃k = zk − Hk x̂k|k−1
    Mk = Hk Pk|k−1 HkT + Rk .

Covariance update: The relation between Pk|k and Pk|k−1 is exactly as before.
The recursion for Pk+1|k is most conveniently obtained in terms of Pk|k−1 directly.
From the previous relations we obtain

    x̃k+1|k = (Fk − Kk Hk) x̃k|k−1 + Gk wk − Kk vk .

Since x̃k|k−1 is uncorrelated with wk and vk,

    Pk+1|k = (Fk − Kk Hk) Pk|k−1 (Fk − Kk Hk)T + Gk Qk GkT
             + Kk Rk KkT − (Gk Sk KkT + Kk SkT GkT)

This completes the filter equations for this case.
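As a numerical sanity check of the correlated-noise one-step predictor (the function name and test values are ours): with Sk = 0 it must reduce to the standard one-step form.

```python
import numpy as np

def kf_predictor_step(F, G, H, Q, R, S, x_pred, P_pred, z):
    """One-step predictor for the correlated-noise model, E(w v^T) = S:
    x^-_{k+1} = F x^-_k + K (z - H x^-_k),  K = (F P H^T + G S) M^{-1}."""
    M = H @ P_pred @ H.T + R
    K = (F @ P_pred @ H.T + G @ S) @ np.linalg.inv(M)
    x_next = F @ x_pred + K @ (z - H @ x_pred)
    A = F - K @ H
    P_next = (A @ P_pred @ A.T + G @ Q @ G.T
              + K @ R @ K.T - G @ S @ K.T - K @ S.T @ G.T)
    return x_next, P_next
```

In the scalar case F = G = H = Q = R = P = 1, S = 0, z = 1: M = 2, K = 0.5, so x̂−_{k+1} = 0.5 and P_{k+1|k} = 0.25 + 1 + 0.25 = 1.5, matching Fk(Pk− − Kk' Hk Pk−)FkT + Gk Qk GkT with Kk' = Pk− HkT Mk−1.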

Addendum: A Hilbert space interpretation

The definitions and results concerning linear estimators can be nicely interpreted in
terms of a Hilbert space formulation.

Consider for simplicity all RVs in this section to have 0 mean.

Recall that a Hilbert space is a (complete) inner-product space. That is, it is a linear
vector space V, with a real-valued inner product operation ⟨v1 , v2⟩ which is bi-linear,
symmetric, and non-degenerate (⟨v, v⟩ = 0 iff v = 0). (Completeness means that
every Cauchy sequence has a limit.) The derived norm is defined as ‖v‖² = ⟨v, v⟩.
The following facts are standard:

1. A subspace S is a linearly-closed subset of V. Alternatively, it is the linear
span of some set of vectors {vα}.

2. The orthogonal projection ΠS v of a vector v onto the subspace S is the closest
element to v in S, i.e., the vector v′ ∈ S which minimizes ‖v − v′‖. Such a
vector exists and is unique, and satisfies (v − ΠS v) ⊥ S, i.e., ⟨v − ΠS v, s⟩ = 0
for s ∈ S.
3. If S = span{s1 , . . . , sk}, then ΠS v = Σ_{i=1}^{k} αi si, where

    [α1 , . . . , αk] = [⟨v, s1⟩, . . . , ⟨v, sk⟩] ([⟨si , sj⟩]_{i,j=1...k})−1 .

4. If S = S1 ⊕ S2 (S is the direct sum of two orthogonal subspaces S1 and S2),
then

    ΠS v = ΠS1 v + ΠS2 v .

If {s1 , . . . , sk} is an orthogonal basis of S, then

    ΠS v = Σ_{i=1}^{k} ⟨v, si⟩ ⟨si , si⟩−1 si .

5. Given a set of (independent) vectors {v1 , v2 , . . . }, the following Gram-Schmidt
procedure provides an orthogonal basis:

    ṽk = vk − Π_span{v1 ...vk−1} vk
        = vk − Σ_{i=1}^{k−1} ⟨vk , ṽi⟩ ⟨ṽi , ṽi⟩−1 ṽi .
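The Gram-Schmidt view of the innovations can be checked numerically on sample data, approximating the inner product ⟨x, y⟩ = E(xy) by a sample average. A sketch (the function name `innovations` is ours):

```python
import numpy as np

def innovations(samples):
    """Gram-Schmidt on scalar RVs given as columns of `samples`
    (rows = realizations): returns the orthogonalized columns,
    i.e. an empirical innovations sequence."""
    n, k = samples.shape
    out = np.zeros_like(samples)
    for j in range(k):
        v = samples[:, j].copy()
        for i in range(j):
            ti = out[:, i]
            v -= (v @ ti) / (ti @ ti) * ti   # subtract projection on tilde-v_i
        out[:, j] = v
    return out
```

After the procedure the columns are (numerically) mutually orthogonal, i.e. the off-diagonal entries of the Gram matrix vanish.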

We can fit the previous results on linear estimation to this framework by noting the
following correspondence:

• Our Hilbert space is the space of all zero-mean random variables x (on a given
probability space) which are square-integrable: E(x²) < ∞. The inner product
is defined as ⟨x, y⟩ = E(xy).

• The optimal linear estimator E^L(xk |Zk), with Zk = (z0 , . . . , zk), is the orthog-
onal projection of the vector xk on the subspace spanned by Zk. (If xk is
vector-valued, we simply consider the projection of each element separately.)

• The innovations process {z̃k} is an orthogonalized version of {zk}.

The Hilbert space formulation provides a nice insight, and can also provide useful
technical results, especially in the continuous-time case. However, we shall not go
deeper into this topic.

4.4 The Kalman Filter as a Least-Squares Problem

Consider the following deterministic optimization problem.

Cost function (to be minimized):

    Jk = (1/2)(x0 − x̄0)T P0−1 (x0 − x̄0)
         + (1/2) Σ_{l=0}^{k} (zl − Hl xl)T Rl−1 (zl − Hl xl)
         + (1/2) Σ_{l=0}^{k−1} wlT Ql−1 wl

Constraints:
xl+1 = Fl xl + Gl wl , l = 0, 1, . . . , k − 1

Variables:
x0 , . . . xk ; w0 , . . . wk−1 .

Here x̄0 and {zl} are given vectors, and P0, Rl, Ql are symmetric positive-definite matrices.


Let (x0(k), . . . , xk(k)) denote the optimal solution of this problem. We claim that xk(k)
can be computed exactly as x̂k|k in the corresponding KF problem.

This claim can be established by writing explicitly the least-squares solution for
k − 1 and k, and manipulating the matrix expressions.
We will take here a quicker route, using the Gaussian insight.

Theorem: The minimizing solution (x0(k), . . . , xk(k)) of the above LS problem is the
maximizer, over (x0 , . . . , xk), of the conditional probability (that is, the MAP estimator):

    p(x0 , . . . , xk | Zk)

related to the Gaussian model:

    xk+1 = Fk xk + Gk wk ,   x0 ∼ N(x̄0, P0)
    zk = Hk xk + vk ,   wk ∼ N(0, Qk), vk ∼ N(0, Rk)

with wk , vk white and independent of x0.

Proof: Write down the joint distribution p(x0 , . . . , xk , Zk).

Immediate Consequence: Since for Gaussian RVs MAP = MMSE, (x0(k), . . . , xk(k))
coincide with the conditional means; in particular, xk(k) = x̂k+.

Remark: The above theorem (but not the last consequence) holds true even for the
non-linear model: xk+1 = Fk (xk ) + Gk wk .
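The claim is easy to verify numerically in the scalar case with k = 0, where the LS problem has a closed-form (information-form) minimizer; it must equal the Kalman measurement update. The numbers below are illustrative.

```python
import numpy as np

# Scalar problem, k = 0: minimize
#   J(x) = (x - x0bar)^2 / (2 P0) + (z0 - H x)^2 / (2 R).
# Setting dJ/dx = 0 gives the information-form solution below,
# which must equal the Kalman update x0bar + K (z0 - H x0bar).
x0bar, P0, H, R, z0 = 1.0, 2.0, 1.5, 0.5, 3.0

x_ls = (x0bar / P0 + H * z0 / R) / (1.0 / P0 + H * H / R)

K = P0 * H / (H * H * P0 + R)          # Kalman gain
x_kf = x0bar + K * (z0 - H * x0bar)

assert abs(x_ls - x_kf) < 1e-12
```

With these numbers both routes give x = 1.9; the agreement holds for any positive P0, R.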

4.5 KF Equations – Basic Versions

a. The basic equations

Initial Conditions:

    x̂0− ≐ x̄0 = E(x0) ,   P0− ≐ P0 = cov(x0) .

Measurement update:

    x̂k+ = x̂k− + Kk (zk − Hk x̂k−)

    Pk+ = Pk− − Kk Hk Pk−

where Kk is the Kalman Gain matrix:

    Kk = Pk− HkT (Hk Pk− HkT + Rk)−1 .

Time update:

    x̂k+1− = Fk x̂k+ [+ Bk uk]

    Pk+1− = Fk Pk+ FkT + Gk Qk GkT

b. One-step iterations

The two-step equations may obviously be combined into a one-step update which
computes x̂k+1+ from x̂k+ (or x̂k+1− from x̂k−). For example,

    x̂k+1− = Fk x̂k− + Fk Kk (zk − Hk x̂k−)

    Pk+1− = Fk (Pk− − Kk Hk Pk−) FkT + Gk Qk GkT .

Lk ≐ Fk Kk is also known as the Kalman gain.
The iterative equation for Pk− is called the (discrete-time, time-varying) Matrix
Riccati Equation.
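The Riccati recursion can be iterated to a (numerical) steady state. A sketch (function name and the scalar example are ours): for F = G = H = Q = R = 1 the fixed point satisfies P²/(P + 1) = 1, i.e. P² − P − 1 = 0, whose positive root is the golden ratio (1 + √5)/2.

```python
import numpy as np

def riccati_step(F, G, H, Q, R, P):
    """One iteration of the Riccati difference equation for P_k^-."""
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
    return F @ (P - K @ H @ P) @ F.T + G @ Q @ G.T

# Iterate the scalar example to steady state.
F = G = H = Q = R = np.array([[1.0]])
P = np.array([[1.0]])
for _ in range(200):
    P = riccati_step(F, G, H, Q, R, P)
```

This also illustrates Remark 2 of Section 4.2: the covariance iteration uses no measurements.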

c. Other important quantities

The measurement prediction, the innovations process, and the innovations covari-
ance are given by

    ẑk ≐ E(zk |Zk−1) = Hk x̂k− (+ Ik uk)

    z̃k ≐ zk − ẑk = Hk x̃k− + vk

    Mk ≐ cov(z̃k) = Hk Pk− HkT + Rk

d. Alternative Forms for the covariance update

The measurement update for the (optimal) covariance Pk may be expressed in the
following equivalent formulas:

    Pk+ = Pk− − Kk Hk Pk−
        = (I − Kk Hk) Pk−
        = Pk− − Pk− HkT Mk−1 Hk Pk−
        = Pk− − Kk Mk KkT

We mention two alternative forms:

1. The Joseph form: Noting that

    xk − x̂k+ = (I − Kk Hk)(xk − x̂k−) − Kk vk ,

it follows immediately that

    Pk+ = (I − Kk Hk) Pk− (I − Kk Hk)T + Kk Rk KkT .

This form may be more computationally expensive, but has the following
advantages:

– It holds for any gain Kk (not just the optimal one) that is used in the esti-
mator equation x̂k+ = x̂k− + Kk z̃k.

– Numerically, it is guaranteed to preserve positive-definiteness (Pk+ > 0).
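For the optimal gain, the Joseph form and the short form must agree; this is easy to confirm numerically (the random test matrices below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))
P_pred = A @ A.T + np.eye(3)           # a generic P^- > 0
H = rng.normal(size=(2, 3))
R = 0.5 * np.eye(2)

M = H @ P_pred @ H.T + R
K = P_pred @ H.T @ np.linalg.inv(M)    # optimal gain

P_std = P_pred - K @ H @ P_pred        # short form
I_KH = np.eye(3) - K @ H
P_joseph = I_KH @ P_pred @ I_KH.T + K @ R @ K.T

assert np.allclose(P_std, P_joseph)
```

With a non-optimal K the two would differ, and only the Joseph form would remain a valid error covariance.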

2. Information form:

    (Pk+)−1 = (Pk−)−1 + HkT Rk−1 Hk

The equivalence may be obtained via the useful Matrix Inversion Lemma:

    (A + BCD)−1 = A−1 − A−1 B (D A−1 B + C−1)−1 D A−1

where A, C are square nonsingular matrices (possibly of different size).

P−1 is called the Information Matrix. It forms the basis for the “information
filter”, which only computes the inverse covariances.
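The information form can likewise be verified against the covariance form on random data (test matrices are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 3))
P_pred = A @ A.T + np.eye(3)           # P^- > 0
H = rng.normal(size=(2, 3))
R = 0.3 * np.eye(2)

# Covariance form of the measurement update.
K = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R)
P_post = P_pred - K @ H @ P_pred

# Information form: (P^+)^{-1} = (P^-)^{-1} + H^T R^{-1} H.
info = np.linalg.inv(P_pred) + H.T @ np.linalg.inv(R) @ H

assert np.allclose(np.linalg.inv(P_post), info)
```

This is exactly the Matrix Inversion Lemma with A = (Pk−)−1, B = HkT, C = Rk−1, D = Hk.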

e. Relation to Deterministic Observers

The one-step recursion for x̂k− is similar in form to the algebraic state observer from
control theory.

Given a (deterministic) system:

xk+1 = Fk xk + Bk uk

zk = Hk xk

a state observer is defined by

x̂k+1 = Fk x̂k + Bk uk + Lk (zk − Hk x̂k )

where Lk are gain matrices to be chosen, with the goal of obtaining x̃k ≐ (xk − x̂k) → 0
as k → ∞.

Since
x̃k+1 = (Fk − Lk Hk )x̃k ,

we need to choose Lk so that the linear system defined by Ak = (Fk − Lk Hk) is
asymptotically stable.
This is possible when the original system is detectable.

The Kalman gain automatically satisfies this stability requirement (whenever the
detectability condition is satisfied).
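As a small check of the stability requirement in the time-invariant case (the system and the gain below are illustrative, not from the notes): for a stable observer, the eigenvalues of F − LH must lie inside the unit circle.

```python
import numpy as np

F = np.array([[1.0, 1.0],
              [0.0, 1.0]])             # e.g. a discrete double integrator
H = np.array([[1.0, 0.0]])             # position is measured
L = np.array([[1.0],
              [0.25]])                 # an illustrative observer gain

A = F - L @ H                          # error dynamics matrix
eigs = np.linalg.eigvals(A)            # here both eigenvalues equal 0.5
```

Since |λ(A)| < 1, the estimation error x̃k decays to zero for this gain choice.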
