IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. AC-16, NO. 6, DECEMBER 1971
A Tutorial Introduction to Estimation and Filtering
Abstract: In this tutorial paper the basic principles of least squares estimation are introduced and applied to the solution of some filtering, prediction, and smoothing problems involving stochastic linear dynamic systems. In particular, the paper includes derivations of the discrete-time and continuous-time Kalman filters and their prediction and smoothing counterparts, with remarks on the modifications that are necessary if the noise processes are colored and correlated. The examination of these state estimation problems is preceded by a derivation of both the unconstrained and the linear least squares estimator of one random vector in terms of another, and an examination of the properties of each, with particular attention to the case of jointly Gaussian vectors. The paper concludes with a discussion of the duality between least squares estimation problems and least squares optimal control problems.
Manuscript received July 19, 1971. Paper recommended by D. G. Luenberger, Associate Guest Editor. The preparation of this paper was supported by the Air Force Office of Scientific Research through Themis Grant F44620-69-C-0116.
The author is with the Control Systems Science and Engineering Laboratory, Washington University, St. Louis, Mo.
I. INTRODUCTION
This paper contains a tutorial introduction to the basic principles of least squares estimation and their application to the solution of some state-estimation problems associated with finite-dimensional linear dynamic systems operating in a stochastic environment. The exposition begins with the problem of estimating one random variable or vector in terms of another, and proceeds through a derivation of the discrete-time Kalman filter and predictor to a derivation of the continuous-time Kalman-Bucy filter and predictor, and an examination of the continuous-time smoothing problem. The paper concludes with a discussion of the duality between least squares estimation and least squares optimal control problems.
The development in each section draws nontrivially on that in preceding sections and on the following assumed prerequisites.
1) Familiarity with the elements of probability theory through the concept of jointly distributed random variables and random vectors described by their joint probability density function, and the associated means, covariances, and conditional expectations. For a discussion of these topics, see, for example, [1].
2) Beginning with Section VII, an exposure to the state-space description of linear dynamic systems, the dual concepts of controllability and observability, and the least squares (linear-quadratic) regulator problem in both finite and infinite time. For an exposition of these topics the reader is referred to the appropriate papers in this issue or, for example, to [2].
In Section II, we examine the least squares estimation of one random vector in terms of another. Some important properties of this estimator are derived in Section III, while in Section IV we summarize the properties of Gaussian random vectors and discuss the least squares estimation of one jointly Gaussian random vector in terms of another. Section V is devoted to the linear least squares estimation of one random vector in terms of another, and Section VI contains the derivations of a number of properties of this linear least squares estimator that are important both in their own right and for the straightforward inductive derivation, in Section VII, of the Kalman filter and predictor for estimating the state of a noise-corrupted discrete-time linear dynamic system in terms of its noisy output measurements. These same properties provide the background for the introduction of the so-called innovations sequence (or process) and the inclusion of an alternative derivation of the discrete-time Kalman filter and predictor. The continuous-time filtering, prediction, and smoothing problems are then examined from an innovations viewpoint in Section VIII. Finally, Section IX contains a discussion of the duality between least squares estimation problems and least squares control problems, and an examination of the steady-state Kalman-Bucy filter.
In the first six sections we distinguish between random vectors and their sample values by denoting the former with capital letters $X$, $Y$, and $Z$ and the latter by the corresponding lower case letter $x$, $y$, or $z$. In order to conform with standard engineering usage, we discontinue making this distinction in Sections VII through IX, where lower case letters $x$, $y$, $z$, etc., are used to denote both random vectors and their sample values. Unless specifically indicated to the contrary, the upper case letters $A$, $B$, etc. denote matrices, while the lower case letters $a$, $b$, etc. denote (nonrandom) vectors.
II. LEAST SQUARES ESTIMATION OF ONE RANDOM VECTOR IN TERMS OF ANOTHER
Consider two jointly distributed random vectors $X$ and $Y$, and suppose that in a particular sample observation we measure the value of the random vector $Y$ to be (the $m$-vector) $y$. It is not unreasonable to expect that the knowledge that $Y$ has value $y$ will convey in general some information about the corresponding (but unmeasured) sample value $x$ of the random vector $X$. In particular, we expect that in general the information that $Y$ has value $y$ will change any a priori guess or estimate we might have made about the value of $X$ and that our degree of uncertainty about $X$ will have been decreased. To be more specific, it is natural to ask the following question. Given the information that $Y$ has the value $y$, what is the best
RHODES: ESTIMATION AND FILTERING
estimate $\hat{x}$ of the corresponding value of the random vector $X$? The sense in which the estimate $\hat{x}$ is to be best must, of course, be defined. While several natural criteria present themselves, we restrict attention here to the case where we define the best estimate to be the $n$-vector $\hat{x}$ that minimizes over all $n$-vectors $z$ the conditional expectation
$$E\{\|X - z\|^2 \mid Y = y\}$$
of the norm-squared estimation error given that $Y$ has value $y$. In other words, we adopt as our measure of uncertainty about $X$ the mean-squared estimation error given that $Y$ has value $y$, and we choose as the best estimate the $n$-vector $\hat{x}$ that minimizes this measure of uncertainty. For obvious reasons, this estimate is known variously as the least squares estimate, the least mean squares estimate, the minimum-mean-square-error estimate, or the minimum variance estimate.
We now summarize and formalize these ideas in the following problem statement.
Problem 1 (Least Squares Estimate): Consider two jointly distributed random vectors $X$ and $Y$ with respective dimensions $n$ and $m$ and with joint probability density $f_{X,Y}(\cdot,\cdot)$, and suppose that in a particular sample observation the value of $Y$ is measured to be $y$. Find the estimate $\hat{x}$ of the corresponding sample value $x$ of the random vector $X$ that is best in the sense that $\hat{x}$ minimizes over all $n$-vectors $z$ the conditional expectation $E\{\|X - z\|^2 \mid Y = y\}$ of the norm-squared estimation error given that $Y$ has value $y$.
Proposition 1: The solution to Problem 1 is given for all $y$ with $f_Y(y) > 0$ by the conditional expectation
$$\hat{x} = E\{X \mid Y = y\} = \int x\, f_{X|Y}(x \mid y)\, dx \tag{1}$$
of $X$ given that $Y$ has value $y$. The corresponding minimum mean-square error is the conditional variance of $X$ given that $Y$ has value $y$ (the trace of the conditional covariance matrix).
Proof: Expanding the optimality criterion, using the linearity of the expectation, and then completing the square we obtain
$$\begin{aligned} E\{\|X - z\|^2 \mid Y = y\} &= E\{X'X - 2z'X + z'z \mid Y = y\} \\ &= E\{X'X \mid Y = y\} - 2z'E\{X \mid Y = y\} + z'z \\ &= \|z - E\{X \mid Y = y\}\|^2 + E\{\|X\|^2 \mid Y = y\} - \|E\{X \mid Y = y\}\|^2. \end{aligned} \tag{2}$$
The only term on the right side of (2) involving $z$ is the first, and this is uniquely minimized by setting
$$z = \hat{x} = E\{X \mid Y = y\}. \tag{3}$$
The corresponding minimum value of (2) is then
$$E\{\|X - \hat{x}\|^2 \mid Y = y\} = E\{\|X\|^2 \mid Y = y\} - \|E\{X \mid Y = y\}\|^2 = E\{\|X\|^2 \mid Y = y\} - \|\hat{x}\|^2 \tag{4}$$
which is the conditional variance of $X$ given that $Y = y$.
Alternatively, the same conclusion can be obtained by first differentiating the second line of (2) with respect to $z$ and setting the result equal to the zero vector, and then noting that the second derivative of (2) with respect to $z$ is twice the $n \times n$ unit matrix, so that the local extremum so obtained is indeed a (global) minimum. Q.E.D.
It is important to observe that the essence of Problem 1 is that the value $y$ of the random vector $Y$ is given and we seek the $n$-vector $\hat{x}$ that is the best estimate of the value of the random vector $X$. It is clear that $\hat{x}$ will depend in general on the given $m$-vector $y$. Conceptually, this procedure could be repeated for every value $y$ of $Y$ for which $f_Y(y) > 0$ to yield, in principle, a graph of the corresponding best estimate $\hat{x}$ in terms of the value $y$ of $Y$. This graph may, of course, be interpreted as defining a function, which we denote $\hat{X}$, of the random vector $Y$. Since a function of a random vector is itself a random vector, it follows that $\hat{X}$ is a random vector; it is the random vector defined for all $y$ with $f_Y(y) > 0$ by
$$\hat{X}(y) \triangleq \hat{x} = E\{X \mid Y = y\}. \tag{5a}$$
In other words, $\hat{X}$ is the random vector
$$\hat{X} = E\{X \mid Y\}. \tag{5b}$$
We can view $\hat{X}$ as an operator (or black box) that accepts sample observations $y$ on the random vector $Y$ and produces the corresponding least squares estimate
$$\hat{x} = \hat{X}(y) = E\{X \mid Y = y\}$$
of $X$ given that $Y$ has value $y$. We call $\hat{X}$ the least squares estimator of $X$ in terms of $Y$. If $g$ is any function mapping $R^m$ into $R^n$, then it is clear from the way we constructed $\hat{X}$ that for all $y$ with $f_Y(y) > 0$ we have
$$E\{\|X - \hat{X}(Y)\|^2 \mid Y = y\} \le E\{\|X - g(Y)\|^2 \mid Y = y\}. \tag{6}$$
Taking expectations of both sides and invoking the identity¹ [1]
$$E_Y[E\{\|X - g(Y)\|^2 \mid Y\}] = E\{\|X - g(Y)\|^2\}$$
we then have
$$E\{\|X - \hat{X}(Y)\|^2\} \le E\{\|X - g(Y)\|^2\}. \tag{7}$$
Thus the estimator $\hat{X}$ constructed by solving Problem 1 for each $y$ with $f_Y(y) > 0$ also solves Problem 1′ below.
¹ This identity is easily proven by noting that the left side is just
$$\int \left[ \int \|x - g(y)\|^2 f_{X|Y}(x \mid y)\, dx \right] f_Y(y)\, dy$$
which, on writing $f_{X|Y}(x \mid y) f_Y(y) = f_{X,Y}(x, y)$ (Bayes' rule), reduces immediately to the right side.
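The construction in Proposition 1 can be checked numerically on a small discrete example. The sketch below is only an illustration: it replaces the joint density by a hypothetical joint probability mass function for a scalar pair (the table `PMF` and all function names are assumptions, not part of the paper) and uses exact rational arithmetic so that identities (2) and (4) can be compared without rounding.

```python
from fractions import Fraction as F

# Hypothetical joint pmf P(X = x, Y = y) for a scalar pair; keys are (x, y).
# This discrete stand-in for the density f_{X,Y} is an assumption of the sketch.
PMF = {(0, 0): F(1, 4), (1, 0): F(1, 4), (1, 1): F(1, 8), (3, 1): F(3, 8)}


def conditional_expectation(pmf, y, h):
    """E{h(X) | Y = y}; only defined when P(Y = y) > 0."""
    p_y = sum(p for (_, yy), p in pmf.items() if yy == y)
    if p_y == 0:
        raise ValueError("conditioning event has probability zero")
    return sum(h(x) * p for (x, yy), p in pmf.items() if yy == y) / p_y


def least_squares_estimate(pmf, y):
    """The estimate of (3): the conditional mean E{X | Y = y}."""
    return conditional_expectation(pmf, y, lambda x: x)


def conditional_mse(pmf, y, z):
    """The Problem 1 criterion E{(X - z)^2 | Y = y} for a candidate z."""
    return conditional_expectation(pmf, y, lambda x: (x - z) ** 2)
```

For this table, `least_squares_estimate(PMF, 1)` is 5/2 and the corresponding minimum of the criterion is 3/4; any other candidate `z` increases the criterion by exactly `(z - 5/2) ** 2`, which is the completed square in (2).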
Problem 1′ (Least Squares Estimator): Consider two jointly distributed random vectors $X$ and $Y$ with joint probability density function $f_{X,Y}(\cdot,\cdot)$. Find the estimator $\hat{X}$ of $X$ in terms of $Y$ that is best in the sense that $\hat{X}$ minimizes $E\{\|X - g(Y)\|^2\}$ over all functions $g$ mapping $R^m$ into $R^n$.
Proposition 1′: The least squares estimator $\hat{X}$ of $X$ in terms of $Y$ in the sense of Problem 1′ is the conditional expectation
$$\hat{X} = E\{X \mid Y\}$$
of $X$ given $Y$, and the corresponding minimum mean-square error is the conditional variance $E\{\|X - E\{X \mid Y\}\|^2\}$.
It is important to note that in Problem 1 we seek an estimate, viz., the vector $\hat{x}$ that minimizes over all $n$-vectors $z$ the conditional expectation $E\{\|X - z\|^2 \mid Y = y\}$ given that $Y$ has value $y$, whereas in Problem 1′ we seek an estimator, viz., the function $\hat{X}$ that minimizes the (unconditioned) expectation $E\{\|X - g(Y)\|^2\}$ over all functions $g$ mapping $R^m$ into $R^n$. As long as the functions $g$ are unconstrained in any way (such as being continuous or linear) we can construct the solution to Problem 1′ by solving Problem 1 for each $y$ with $f_Y(y) > 0$, as we have done above. If, on the other hand, the functions $g$ over which we seek a solution to Problem 1′ are constrained to be, for example, linear or continuous, then this approach is no longer valid (unless, of course, the unconstrained least squares estimator turns out to have the desired property of linearity or continuity), and an alternative approach must be adopted. Depending on the joint probability density function involved, it may well turn out that in the constrained case there may be some functions $g$ and values $y$ for which
$$E\{\|X - g(Y)\|^2 \mid Y = y\} < E\{\|X - \hat{X}(Y)\|^2 \mid Y = y\}$$
but $\hat{X}$ is still the best "on the average" in the sense that (7) holds for all functions $g$ satisfying the required constraints.
III. PROPERTIES OF THE LEAST SQUARES ESTIMATOR
The properties derived in this section are direct consequences of the fact that the least squares estimator of $X$ in terms of $Y$ is the conditional expectation $E\{X \mid Y\}$. We begin with three trivial properties.
Proposition 2a (Properties of the Least Squares Estimator): The least squares estimator $\hat{X} = E\{X \mid Y\}$ has the following properties.
1) It is linear, i.e., for any deterministic matrix $A$ and deterministic vector $b$ with the appropriate dimensions
$$\widehat{(AX + b)} = E\{AX + b \mid Y\} = A\,E\{X \mid Y\} + b = A\hat{X} + b$$
and, for any random vector $Z$ that is jointly distributed with $X$ and $Y$ and has the same dimension as $X$,
$$\widehat{(X + Z)} = E\{X + Z \mid Y\} = E\{X \mid Y\} + E\{Z \mid Y\} = \hat{X} + \hat{Z}.$$
2) It is unbiased in the sense that
$$E\{X - \hat{X}\} = E\{X\} - E_Y[E\{X \mid Y\}] = E\{X\} - E\{X\} = 0.$$
In fact, we have the stronger statement
$$E\{X - \hat{X} \mid Y\} = E\{X \mid Y\} - E[E\{X \mid Y\} \mid Y] = E\{X \mid Y\} - E\{X \mid Y\} = 0.$$
3) For any nonnegative definite matrix $F$, $\hat{X} = E\{X \mid Y\}$ minimizes $E\{[X - g(Y)]'F[X - g(Y)]\}$ over all functions $g: R^m \to R^n$, and $\hat{x} = E\{X \mid Y = y\}$ minimizes $E\{[X - z]'F[X - z] \mid Y = y\}$ over all $n$-vectors $z$.
Proof: The proofs of 1 and 2 are included in the statements of these properties. Property 3 follows by a direct modification of the proof of Proposition 1. If $F$ is positive definite the proof is unchanged if $\|q\|^2$ is interpreted to mean $q'Fq$ and $z'q$ is replaced by $z'Fq$; if $F$ is nonnegative definite the same identifications may be made but $\|q\| = [q'Fq]^{1/2}$ is in this case only a seminorm, and while $\hat{x} = E\{X \mid Y = y\}$ minimizes the first term on the right side of (2), it does not do so uniquely.
The following property of the least squares estimator is somewhat less trivial and, like the above three properties, has far reaching implications in estimation theory.
Proposition 2b: The estimation error $\tilde{X} \triangleq X - \hat{X}$ in the least squares estimator $\hat{X} = E\{X \mid Y\}$ is uncorrelated with any function $g$ of the random vector $Y$, i.e.,
$$E\{g(Y)\tilde{X}'\} = 0 \tag{8}$$
and, in fact,
$$E\{g(Y)\tilde{X}' \mid Y\} = 0. \tag{9}$$
Proof: For every value $y$ of $Y$ we have
$$E\{g(Y)\tilde{X}' \mid Y = y\} = E\{g(y)[X - \hat{X}(Y)]' \mid Y = y\} = g(y)\left[E\{X \mid Y = y\} - \hat{X}(y)\right]' = 0 \tag{10}$$
which establishes (9). Taking the expectation of both sides of (9) with respect to $Y$ then yields (8).
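Unbiasedness (Proposition 2a) and the orthogonality property (8) of Proposition 2b can be verified on a small discrete example. The following is a minimal sketch under stated assumptions: the joint probability mass function `PMF`, the helper names, and the particular test functions `g` are all hypothetical and chosen only for illustration, with $X$ and $Y$ scalar so that (8) reduces to a single number.

```python
from fractions import Fraction as F

# Hypothetical joint pmf P(X = x, Y = y); keys are (x, y). An assumption of this sketch.
PMF = {(0, 0): F(1, 4), (1, 0): F(1, 4), (1, 1): F(1, 8), (3, 1): F(3, 8)}


def expect(pmf, h):
    """E{h(X, Y)} under the joint pmf."""
    return sum(h(x, y) * p for (x, y), p in pmf.items())


def least_squares_estimator(pmf):
    """Return the function y -> E{X | Y = y}, tabulated over the support of Y."""
    table = {}
    for y in {yy for (_, yy) in pmf}:
        p_y = sum(p for (_, yy), p in pmf.items() if yy == y)
        table[y] = sum(x * p for (x, yy), p in pmf.items() if yy == y) / p_y
    return lambda y: table[y]


def error_correlation(pmf, estimator, g):
    """E{g(Y) * (X - estimator(Y))}, the scalar form of the left side of (8)."""
    return expect(pmf, lambda x, y: g(y) * (x - estimator(y)))
```

With `x_hat = least_squares_estimator(PMF)`, the quantity `error_correlation(PMF, x_hat, g)` is zero for every `g` tried, whereas a different estimator such as `lambda y: y` leaves a nonzero correlation even with the constant function `g(y) = 1`, i.e., it is biased.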
IV. LEAST SQUARES ESTIMATION OF GAUSSIAN RANDOM VECTORS
Gaussian random vectors play a major role in probability theory and system theory. Their importance stems largely from two facts: first, they possess many distinctive mathematical properties and, second, the Gaussian distribution bears close resemblance to the probability laws of many physical random phenomena. We summarize here the properties of Gaussian random vectors that are of greatest importance in estimation theory, leaving it to the reader to consult, e.g., [1], [4] for a detailed discussion of the Gaussian distribution.
Definition 1: An $r$-dimensional random vector $Z$ is said to be Gaussian (or normal) with parameters $m$ (an $r$-vector) and $\Sigma$ (an $r \times r$ positive definite matrix) if its probability density function $f_Z(\cdot)$ is given for all $z \in R^r$ by
$$f_Z(z) = (2\pi)^{-r/2}\, |\Sigma|^{-1/2} \exp\left[-\tfrac{1}{2}(z - m)'\Sigma^{-1}(z - m)\right] \tag{11}$$
where $|\Sigma|$ is the determinant of $\Sigma$. The corresponding characteristic function of $Z$ is
$$\varphi_Z(v) = E\{\exp[jv'Z]\} = \exp\left[jv'm - \tfrac{1}{2}v'\Sigma v\right] \tag{12}$$
and this constitutes an alternative definition of a Gaussian random vector with parameters $m$ and $\Sigma$. In fact, the definition of a Gaussian random vector as one whose characteristic function is given by (12) is more general because it includes the possibility that $Z$ may be degenerate and have its entire density concentrated on a proper subspace of $R^r$, in which case $|\Sigma| = 0$ and $\Sigma$ is nonnegative definite but not positive definite and not invertible. In any case, we henceforth adopt the shorthand notation $N(m, \Sigma)$ for the Gaussian (or normal) distribution with parameters $m$ and $\Sigma$.
Using the well-known [1], [4] properties of the characteristic function to compute the moments of $Z$ we have, in particular,
$$E\{Z\} = \frac{1}{j}\frac{\partial \varphi_Z}{\partial v}(0) = m$$
$$\operatorname{cov}[Z, Z] = E\{ZZ'\} - mm' = \frac{1}{j^2}\frac{\partial^2 \varphi_Z}{\partial v^2}(0) - mm' = \Sigma.$$
Thus the parameters $m$ and $\Sigma$ of the Gaussian probability distribution (11) or (12) are, respectively, the mean and the covariance of $Z$. It is important to note that the probability density function of a Gaussian random vector is therefore completely specified by a knowledge of its mean and covariance. The importance of Gaussian random vectors in estimation and control theory is due largely to this fact and to the following facts.
1) Uncorrelated jointly Gaussian random vectors are independent.
2) Linear functions of Gaussian random vectors are themselves Gaussian random vectors.
3) In particular, sums of jointly Gaussian random vectors are Gaussian random vectors.
4) The conditional expectation of one jointly Gaussian random vector given another is a Gaussian random vector that is a linear function of the conditioning vector.
In particular, let $X$ and $Y$ be jointly distributed random vectors with respective dimensions $n$ and $m$ whose composite vector $Z = [X', Y']'$ is $N(m, \Sigma)$ with mean
$$E[Z] = m = \begin{bmatrix} m_X \\ m_Y \end{bmatrix} \tag{13a}$$
and covariance
$$\Sigma = \operatorname{cov}[Z, Z] = \begin{bmatrix} \operatorname{cov}[X, X] & \operatorname{cov}[X, Y] \\ \operatorname{cov}[Y, X] & \operatorname{cov}[Y, Y] \end{bmatrix} \triangleq \begin{bmatrix} \Sigma_{XX} & \Sigma_{XY} \\ \Sigma_{YX} & \Sigma_{YY} \end{bmatrix}. \tag{13b}$$
Then the following properties hold.
Property 1: If $W = AZ$ where $A$ is any nonrandom $q \times r$ matrix then, from (12),
$$\begin{aligned} \varphi_W(v) &= E\{\exp(jv'W)\} = E\{\exp[j(v'A)Z]\} = \varphi_Z(A'v) \\ &= \exp\left[j(v'A)m - \tfrac{1}{2}(v'A)\Sigma(A'v)\right] \\ &= \exp\left[jv'(Am) - \tfrac{1}{2}v'(A\Sigma A')v\right] \end{aligned}$$
so that $W$ is $N(Am, A\Sigma A')$.
Property 2: In particular, taking $A = [I_n \;\; 0]$ and then $A = [0 \;\; I_m]$, where $I_n$ is the $n \times n$ unit matrix and $0$ is a zero matrix of the appropriate dimensions, we see with the use of (13) that the marginal distributions of $X$ and $Y$ are Gaussian, i.e., $X$ and $Y$ are, respectively, $N(m_X, \Sigma_{XX})$ and $N(m_Y, \Sigma_{YY})$.
Property 3: Furthermore, if $X$ and $Y$ have the same dimension, taking $A = [I_n \;\; I_n]$ in Property 1 and using (13) we see that $X + Y$ is $N(m_X + m_Y,\; \Sigma_{XX} + \Sigma_{XY} + \Sigma_{YX} + \Sigma_{YY})$.
Property 4: If $X$ and $Y$ are uncorrelated, so that $\Sigma_{XY} = \Sigma_{YX}' = 0$, then they are also independent, because in this case
$$(z - m)'\Sigma^{-1}(z - m) = (x - m_X)'\Sigma_{XX}^{-1}(x - m_X) + (y - m_Y)'\Sigma_{YY}^{-1}(y - m_Y)$$
and (11) reduces to $f_{X,Y}(x, y) = f_X(x) f_Y(y)$.
Property 5: Assuming for the moment that $m_X = 0$ and $m_Y = 0$, the conditional density of $X$ given $Y$ is, from (11) and (13),
$$\begin{aligned} f_{X|Y}(x \mid y) = \frac{f_{X,Y}(x, y)}{f_Y(y)} &= K \exp -\tfrac{1}{2}\left[z'\Sigma^{-1}z - y'\Sigma_{YY}^{-1}y\right] \\ &= K \exp -\tfrac{1}{2}\left[x'S_{XX}x + x'S_{XY}y + y'S_{YX}x + y'(S_{YY} - \Sigma_{YY}^{-1})y\right] \end{aligned} \tag{14}$$
where $K$ is a normalizing constant and
$$S \triangleq \Sigma^{-1} = \begin{bmatrix} S_{XX} & S_{XY} \\ S_{YX} & S_{YY} \end{bmatrix}$$
so that, expanding $S\Sigma = I$, we have
$$S_{XX}\Sigma_{XX} + S_{XY}\Sigma_{YX} = I \tag{15a}$$
$$S_{XX}\Sigma_{XY} + S_{XY}\Sigma_{YY} = 0 \tag{15b}$$
$$S_{YX}\Sigma_{XX} + S_{YY}\Sigma_{YX} = 0 \tag{15c}$$
$$S_{YX}\Sigma_{XY} + S_{YY}\Sigma_{YY} = I. \tag{15d}$$
Completing the square in the exponent of (14) yields
$$f_{X|Y}(x \mid y) = K \exp -\tfrac{1}{2}\left[\|x + S_{XX}^{-1}S_{XY}y\|^2_{S_{XX}} + y'(S_{YY} - S_{YX}S_{XX}^{-1}S_{XY} - \Sigma_{YY}^{-1})y\right] \tag{16}$$
where $\|q\|^2_{S_{XX}}$ denotes $q'S_{XX}q$. We now note that from (15a) and (15b), respectively, we have
$$S_{XX}^{-1} = \Sigma_{XX} + S_{XX}^{-1}S_{XY}\Sigma_{YX}$$
$$S_{XX}^{-1}S_{XY} = -\Sigma_{XY}\Sigma_{YY}^{-1}$$
which may be combined to give
$$S_{XX}^{-1} = \Sigma_{XX} - \Sigma_{XY}\Sigma_{YY}^{-1}\Sigma_{YX}.$$
A similar argument from (15d) and (15b) yields
$$\Sigma_{YY}^{-1} = S_{YY} - S_{YX}S_{XX}^{-1}S_{XY}$$
and substitution of these in (16) shows that $f_{X|Y}(\cdot \mid y)$ is $N(\Sigma_{XY}\Sigma_{YY}^{-1}y,\; \Sigma_{XX} - \Sigma_{XY}\Sigma_{YY}^{-1}\Sigma_{YX})$. More generally, if $X$ and $Y$ have nonzero mean, $f_{X|Y}(\cdot \mid y)$ is $N(m_X + \Sigma_{XY}\Sigma_{YY}^{-1}(y - m_Y),\; \Sigma_{XX} - \Sigma_{XY}\Sigma_{YY}^{-1}\Sigma_{YX})$.
Property 6: The least squares estimator of $X$ in terms of $Y$ is thus the random vector
$$\hat{X} = E\{X \mid Y\} = m_X + \Sigma_{XY}\Sigma_{YY}^{-1}(Y - m_Y). \tag{17}$$
Since this random vector is a linear function of the random vector $Y$ it follows immediately from Property 1 that $E\{X \mid Y\}$ is $N(m_X,\; \Sigma_{XY}\Sigma_{YY}^{-1}\Sigma_{YX})$.
Property 7: The least squares estimation error $\tilde{X} = X - E\{X \mid Y\}$ is the difference between two jointly Gaussian random vectors and is, therefore, from Property 3, a Gaussian random vector with mean zero (since, from Proposition 2a, the least squares estimator is unbiased) and covariance equal to the conditional covariance $\Sigma_{XX} - \Sigma_{XY}\Sigma_{YY}^{-1}\Sigma_{YX}$ of $X$ given $Y$ (from Property 5). Thus $\tilde{X}$ is $N(0,\; \Sigma_{XX} - \Sigma_{XY}\Sigma_{YY}^{-1}\Sigma_{YX})$.
Property 8: Any function of the least squares estimation error $\tilde{X} = X - E\{X \mid Y\}$ is independent of any function of the random vector $Y$. This follows by recalling from Proposition 2b that $\tilde{X}$ is uncorrelated with $Y$, and, from Property 7, that $\tilde{X}$ and $Y$ are jointly Gaussian. Thus, from Property 4, $\tilde{X}$ and $Y$ are independent, and invoking the standard theorem [1] that functions of independent random variables are themselves independent establishes the desired result.
From an estimation viewpoint, the most interesting and important of these properties is probably Property 6, viz., when $X$ and $Y$ are jointly Gaussian, the least squares estimator $\hat{X}$ of $X$ in terms of $Y$ is a linear function of $Y$. In fact, $\hat{X}$ is the Gaussian random vector given by (17).
If $X$ and $Y$ are not Gaussian random vectors, however, it is in general to be expected that the conditional expectation $E\{X \mid Y\}$ will be a nonlinear function of the random variable $Y$. In many such cases the calculation of the conditional mean is intractable. Under these circumstances, a common approach is to modify the problem formulation and seek the estimator $\hat{X}$ of $X$ that is the best linear function of the random vector $Y$. This leads to the linear least squares estimator which is discussed in the next section.
V. THE LINEAR LEAST SQUARES ESTIMATOR
In this section we restrict attention to the special case of Problem 1′ in which the estimator $\hat{X}$ is restricted to being a linear function of the random vector $Y$. The problem of interest is therefore the following.
Problem 2: Consider two jointly distributed random vectors $X$ and $Y$ whose means and covariances are assumed known,² i.e., it is assumed that the mean and covariance of the composite random vector $Z$ defined by $Z = [X', Y']'$ are given by (13). Find the linear estimator $\hat{X} = A^{\circ}Y + b^{\circ}$ of $X$ in terms of $Y$ that is best in the sense that $\hat{X}$ minimizes
$$E\{\|X - AY - b\|^2\} \triangleq E\{[X - AY - b]'[X - AY - b]\} \tag{18}$$
over all linear estimators $AY + b$ of $X$ in terms of $Y$, i.e., find the $n \times m$ matrix $A^{\circ}$ and the $n$-vector $b^{\circ}$ that minimize (18) over all $n \times m$ matrices $A$ and all $n$-vectors $b$.
We refer to the estimator $\hat{X} = A^{\circ}Y + b^{\circ}$ that is optimal in the sense defined by this problem as the best linear estimator or the linear least squares estimator of $X$ in terms of $Y$.
We remark at the outset that if its mean $m_Z$ is known, the distribution of the random vector $Z$ is completely specified by the distribution of its zero-mean component $Z - m_Z$, so that estimating $X$ in terms of $Y$ is clearly equivalent to estimating $X - m_X$ in terms of $Y - m_Y$ as long as the means $m_X$ and $m_Y$ are known. In other words, it can be assumed without loss of generality that $X$ and $Y$ have zero mean, and this we adopt as a standing assumption for the remainder of this paper, with the understanding that $X$ and $Y$ are to be replaced by $X - m_X$ and $Y - m_Y$ if they do not already have mean zero.
Since the trace of a real number is itself, the optimality criterion (18) is unchanged if we take the trace of both sides. Using the trace identity $\operatorname{tr}[FG] = \operatorname{tr}[GF]$ [5], [6], and the linearity of expectation and its interchangeability with the trace operation, we obtain
$$\begin{aligned} E\{\|X - AY - b\|^2\} &= \operatorname{tr} E\{[X - AY - b]'[X - AY - b]\} \\ &= \operatorname{tr} E\{[X - AY - b][X - AY - b]'\} \\ &= \operatorname{tr}\left[E\{XX'\} - AE\{YX'\} - E\{XY'\}A' + AE\{YY'\}A' + bb'\right] \end{aligned}$$
² As we shall shortly see, it is not necessary to assume that the joint probability density function $f_{X,Y}(\cdot,\cdot)$ is known. It is sufficient that we know simply the means and covariances of $X$ and $Y$.
$$= \operatorname{tr}\left[\Sigma_{XX} - A\Sigma_{YX} - \Sigma_{XY}A' + A\Sigma_{YY}A' + bb'\right] \tag{19}$$
where for the third and fourth equalities we recall that $X$ and $Y$ are now assumed to have zero mean, so that $E\{bX'\} = bE\{X'\} = 0$ and $E\{XX'\} = \operatorname{cov}[X, X] = \Sigma_{XX}$, etc.
We now note that for any positive definite matrix $Q$ a valid definition of the inner product between two $n \times m$ matrices $F$ and $G$ is $\langle F, G \rangle = \operatorname{tr}[FQG']$, and the corresponding induced norm is $\|F\|^2 = \operatorname{tr}[FQF']$ [6]. Thus if the random vector $Y$ is nondegenerate so that $Q = \Sigma_{YY}$ is positive definite, (19) may be rewritten as
$$E\{\|X - AY - b\|^2\} = \operatorname{tr}(\Sigma_{XX}) - \langle A, \Sigma_{XY}\Sigma_{YY}^{-1} \rangle - \langle \Sigma_{XY}\Sigma_{YY}^{-1}, A \rangle + \|A\|^2 + \|b\|^2$$
where we have used the fact that $\operatorname{tr}[bb'] = \operatorname{tr}[b'b] = \|b\|^2$. Completing the square then yields
$$E\{\|X - AY - b\|^2\} = \|A - \Sigma_{XY}\Sigma_{YY}^{-1}\|^2 + \|b\|^2 + \operatorname{tr}\left[\Sigma_{XX} - \Sigma_{XY}\Sigma_{YY}^{-1}\Sigma_{YX}\right] \tag{20}$$
which is clearly uniquely minimized by taking
$$A^{\circ} = \Sigma_{XY}\Sigma_{YY}^{-1}, \qquad b^{\circ} = 0.$$
Thus the least squares linear estimator is
$$\hat{X} = \Sigma_{XY}\Sigma_{YY}^{-1}Y \tag{21}$$
and the corresponding minimum value of the variance of the estimation error $\tilde{X} = X - \hat{X}$ is, from (20),
$$E\{\|\tilde{X}\|^2\} = \operatorname{tr}\left[\Sigma_{XX} - \Sigma_{XY}\Sigma_{YY}^{-1}\Sigma_{YX}\right].$$
In fact, it is readily found that
$$\operatorname{cov}[\tilde{X}, \tilde{X}] = \Sigma_{XX} - \Sigma_{XY}\Sigma_{YY}^{-1}\Sigma_{YX}. \tag{22}$$
In the case where $X$ and $Y$ do not have zero mean we have
$$\hat{X} - m_X = \Sigma_{XY}\Sigma_{YY}^{-1}(Y - m_Y)$$
or, equivalently
$$\hat{X} = m_X + \Sigma_{XY}\Sigma_{YY}^{-1}(Y - m_Y) \tag{23}$$
with the covariance of the estimation error still given by (22).
In summary, we have the following solution to Problem 2.
Proposition 3: The best linear estimator, in the sense defined by Problem 2, of the zero-mean random vector $X$ in terms of the zero-mean random vector $Y$ is given by
$$\hat{X} = \Sigma_{XY}\Sigma_{YY}^{-1}Y$$
while the covariance of the corresponding estimation error $\tilde{X} = X - \hat{X}$ is given by
$$\operatorname{cov}[\tilde{X}, \tilde{X}] = E\{\tilde{X}\tilde{X}'\} = \Sigma_{XX} - \Sigma_{XY}\Sigma_{YY}^{-1}\Sigma_{YX}.$$
In particular, we define the best linear estimate $\hat{x}$ of the value of $X$ given that $Y$ has value $y$ to be
$$\hat{x} = \hat{X}(y) = \Sigma_{XY}\Sigma_{YY}^{-1}y.$$
An alternative means of deriving this result is to set equal to zero the partial derivatives of (19) with respect to $A$ and $b$, using the identities listed, for example, in [6], to obtain
$$\frac{\partial}{\partial b} E\{\|X - AY - b\|^2\} = 2b^{\circ} = 0$$
$$\frac{\partial}{\partial A} E\{\|X - AY - b\|^2\} = A^{\circ}\Sigma_{YY}' + A^{\circ}\Sigma_{YY} - \Sigma_{XY} - \Sigma_{YX}' = 0$$
which may be solved to give (21). In the case where $X$ and $Y$ are (real-valued) random variables there is, of course, no need to introduce the trace operations and the process of taking partial derivatives is more direct and familiar.
It was shown in Section II that the unconstrained least squares estimator of $X$ in terms of $Y$ is the conditional expectation $E\{X \mid Y\}$, and in Section IV it was noted that when $X$ and $Y$ are jointly Gaussian this conditional expectation is linear in $Y$. Thus, when $X$ and $Y$ are jointly Gaussian, the unconstrained least squares estimator is already linear in $Y$ and it must therefore coincide with the linear least squares estimator. A comparison of the expressions for $\hat{X}$ and $\operatorname{cov}[\tilde{X}, \tilde{X}]$ given in Properties 6 and 7 of Section IV with those given above in Proposition 3 shows that this is indeed the case. In fact, the identification of the best linear estimator with the best unconstrained estimator in the Gaussian case provides a means for directly deducing several of the properties of the best linear estimator given in the following section.
In order to emphasize the fact that the best linear estimator has a number of properties, such as linearity, in common with the conditional expectation, and to provide a notation for the best linear estimator of $X$ in terms of $Y$ that explicitly identifies both $X$ and $Y$, we introduce the notation
$$E^*\{X \mid Y\} \triangleq \hat{X} = \Sigma_{XY}\Sigma_{YY}^{-1}Y. \tag{24}$$
It is emphasized that $E^*\{X \mid Y\}$ is simply an alternative notation for the best linear estimator of $X$ in terms of $Y$: it is not to be confused with the conditional expectation $E\{X \mid Y\}$ with which it corresponds only in such extremely special cases as when $X$ and $Y$ are jointly Gaussian.
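The distinction between the best linear estimator and the conditional expectation can be made concrete with a small non-Gaussian example. The sketch below is an illustration under stated assumptions: the pair is hypothetical ($Y$ uniform on $\{-1, 0, 1\}$ and $X = Y^2 + Y$), both variables are scalar, and the function names are invented for this example. It computes the estimator from means and covariances only, in the nonzero-mean form (23), together with the error variance (22).

```python
from fractions import Fraction as F

# Hypothetical non-Gaussian pair: Y is uniform on {-1, 0, 1} and X = Y^2 + Y.
# Keys are (x, y); the table is an assumption made for this sketch.
PMF = {(0, -1): F(1, 3), (0, 0): F(1, 3), (2, 1): F(1, 3)}


def expect(pmf, h):
    """E{h(X, Y)} under the joint pmf."""
    return sum(h(x, y) * p for (x, y), p in pmf.items())


def best_linear_estimator(pmf):
    """Return (a, b, error_variance) for the scalar estimator a*Y + b of (23)."""
    m_x = expect(pmf, lambda x, y: x)
    m_y = expect(pmf, lambda x, y: y)
    s_xx = expect(pmf, lambda x, y: (x - m_x) ** 2)
    s_xy = expect(pmf, lambda x, y: (x - m_x) * (y - m_y))
    s_yy = expect(pmf, lambda x, y: (y - m_y) ** 2)
    a = s_xy / s_yy                       # scalar form of Sigma_XY Sigma_YY^{-1}; needs s_yy > 0
    b = m_x - a * m_y                     # so that a*Y + b = m_x + a*(Y - m_y)
    return a, b, s_xx - s_xy ** 2 / s_yy  # scalar form of (22)


def mse(pmf, a, b):
    """E{(X - a*Y - b)^2}, the criterion (18) for scalar X and Y."""
    return expect(pmf, lambda x, y: (x - a * y - b) ** 2)
```

For this table the best linear estimator is $Y + 2/3$ with error variance $2/9$, and perturbing either coefficient increases `mse`. The conditional expectation is $Y^2 + Y$, which has zero error here, so the two estimators differ (at $y = 0$ they give $2/3$ and $0$ respectively), as expected for a pair that is not jointly Gaussian.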
VI. PROPERTIES OF THE LINEAR LEAST SQUARES ESTIMATOR
In this section we derive a number of simple properties of the best linear estimator. As well as being important in their own right, these properties form the basis of our inductive derivation in Section VII of the discrete-time Kalman filter and predictor; in fact, this derivation is nothing more than the straightforward application of these properties to estimation problems involving linear dynamic systems. These same properties provide the basis for subsequent introduction of the so-called innovations process which plays a major role in later sections of the paper.
As before, we separate the trivial properties, which we give first, from those that are somewhat less trivial.
Proposition 4a (Properties of the Best Linear Estimator): Let $X$, $Y$, and $Z$ be jointly distributed random vectors and let $\hat{X} = E^*\{X \mid Y\}$ be the best linear estimator of $X$ in terms of $Y$ in the sense defined by Problem 2. Then we have the following properties.
Property 1: The best linear estimator (24) and the covariance (22) of the corresponding estimation error $\tilde{X} = X - \hat{X}$ depend only on the first and second moments of the random vectors $X$ and $Y$ and not on their entire probability density functions. Thus jointly distributed random vectors with the same means and covariances but different probability density functions have the same estimator $\hat{X} = E^*\{X \mid Y\}$.
Property 2: When $X$ and $Y$ are jointly Gaussian the (unconstrained) least squares estimator $E\{X \mid Y\}$ is linear in $Y$ and coincides with the linear least squares estimator $E^*\{X \mid Y\}$.
Property 3: If $X$ and $Y$ are uncorrelated then
$$E^*\{X \mid Y\} = E\{X\}.$$
Property 4: The estimator $\hat{X} = E^*\{X \mid Y\}$ is unbiased in the sense that
$$E\{\tilde{X}\} = E\{X - \hat{X}\} = 0$$
or, equivalently,
$$E\{\hat{X}\} = E\{X\}.$$
Property 5: The linear least squares estimator is linear, i.e., if $M$ is a nonrandom matrix and $c$ is a nonrandom vector (with appropriate dimensions),
$$E^*\{MX + c \mid Y\} = M E^*\{X \mid Y\} + c$$
and the covariance of the corresponding estimation error is $M\Sigma_{\tilde{X}\tilde{X}}M'$, where $\Sigma_{\tilde{X}\tilde{X}}$ is the covariance of $X - E^*\{X \mid Y\}$. Also, if $X$ and $Z$ have the same dimension, then
$$E^*\{X + Z \mid Y\} = E^*\{X \mid Y\} + E^*\{Z \mid Y\}.$$
Property 6: For any nonnegative definite matrix $F$, $\hat{X} = E^*\{X \mid Y\}$ minimizes $E\{[X - g(Y)]'F[X - g(Y)]\}$ over all linear functions $g(Y) = AY + b$ of $Y$.
Proof: Properties 1 and 2 are trivial observations included for completeness. Properties 3 and 4 follow immediately on setting $\Sigma_{XY} = \operatorname{cov}[X, Y] = 0$ in (24) and taking the expectation of both sides of (24) [or, more generally, (23)], respectively. Properties 5 and 6 are established by direct modification of the proof of Proposition 3.
Proposition 4b (Further Properties of the Best Linear Estimator):
Property 1: The linear least squares estimator $\hat{X} = E^*\{X \mid Y\}$ is characterized by the condition that the estimation error $\tilde{X} = X - \hat{X}$ is uncorrelated with $Y$, i.e., $\operatorname{cov}[X - \hat{X}, Y] = 0$. In fact, $\tilde{X}$ is uncorrelated with any linear function of $Y$ and, in particular, $\operatorname{cov}[\tilde{X}, \hat{X}] = 0$.
Property 2: If $Y$ and $Z$ are uncorrelated then the best linear estimator of $X$ in terms of both $Y$ and $Z$ (i.e., in terms of the composite vector $[Y', Z']'$) may be written
$$E^*\{X \mid Y, Z\} = E^*\{X \mid Y\} + E^*\{X \mid Z\} \tag{25}$$
and the corresponding estimation error has covariance
$$\Sigma_{XX} - \Sigma_{XY}\Sigma_{YY}^{-1}\Sigma_{YX} - \Sigma_{XZ}\Sigma_{ZZ}^{-1}\Sigma_{ZX} = \Sigma_{\tilde{X}_{|Y}\tilde{X}_{|Y}} - \Sigma_{XZ}\Sigma_{ZZ}^{-1}\Sigma_{ZX},$$
where $\Sigma_{\tilde{X}_{|Y}\tilde{X}_{|Y}}$ is the covariance of $\tilde{X}_{|Y} \triangleq X - E^*\{X \mid Y\}$. Alternatively, with convenience for later applications in mind, we can write
$$E^*\{X \mid Y, Z\} = E^*\{X \mid Y\} + E^*\{\tilde{X}_{|Y} \mid Z\} \tag{26}$$
and the covariance of the estimation error as $\Sigma_{\tilde{X}_{|Y}\tilde{X}_{|Y}} - \Sigma_{\tilde{X}_{|Y}Z}\Sigma_{ZZ}^{-1}\Sigma_{Z\tilde{X}_{|Y}}$, where $\Sigma_{\tilde{X}_{|Y}\tilde{X}_{|Y}}$ is as above and $\Sigma_{\tilde{X}_{|Y}Z} = \operatorname{cov}[\tilde{X}_{|Y}, Z]$.
Property 3: If $Y$ and $Z$ are correlated, then the best linear estimator of $X$ in terms of both $Y$ and $Z$ may be written
$$E^*\{X \mid Y, Z\} = E^*\{X \mid Y, \tilde{Z}_{|Y}\} = E^*\{X \mid Y\} + E^*\{\tilde{X}_{|Y} \mid \tilde{Z}_{|Y}\}$$
where $\tilde{Z}_{|Y} = Z - E^*\{Z \mid Y\}$, and the covariance of the corresponding estimation error is
$$\operatorname{cov}[\tilde{X}_{|Y}, \tilde{X}_{|Y}] - \operatorname{cov}[\tilde{X}_{|Y}, \tilde{Z}_{|Y}]\left[\operatorname{cov}[\tilde{Z}_{|Y}, \tilde{Z}_{|Y}]\right]^{-1}\operatorname{cov}[\tilde{Z}_{|Y}, \tilde{X}_{|Y}].$$
Property 4: More generally, the best linear estimator
$$\hat{X}_{|k+1} \triangleq E^*\{X \mid Y_1, Y_2, \ldots, Y_k, Y_{k+1}\}$$
of $X$ in terms of the random vectors $Y_1, Y_2, \ldots, Y_k, Y_{k+1}$ may be written recursively as
$$\hat{X}_{|k+1} = \hat{X}_{|k} + E^*\{\tilde{X}_{|k} \mid \tilde{Y}_{k+1|k}\} \tag{27}$$
where
$$\tilde{X}_{|k} = X - \hat{X}_{|k} \tag{28a}$$
$$\tilde{Y}_{k+1|k} = Y_{k+1} - E^*\{Y_{k+1} \mid Y_1, Y_2, \ldots, Y_k\}. \tag{28b}$$
The covariance of the estimation error is given by
$$\operatorname{cov}[\tilde{X}_{|k+1}, \tilde{X}_{|k+1}] = \operatorname{cov}[\tilde{X}_{|k}, \tilde{X}_{|k}] - \operatorname{cov}[\tilde{X}_{|k}, \tilde{Y}_{k+1|k}]\left[\operatorname{cov}[\tilde{Y}_{k+1|k}, \tilde{Y}_{k+1|k}]\right]^{-1}\operatorname{cov}[\tilde{Y}_{k+1|k}, \tilde{X}_{|k}]. \tag{29}$$
Proof: 1) For any linear estimator $AY$ of $X$ in terms of $Y$ we have
$$\operatorname{cov}[X - AY, Y] = \Sigma_{XY} - A\Sigma_{YY}$$
which (assuming $Y$ is nondegenerate so that $\Sigma_{YY}$ is positive definite) is zero if and only if $A = \Sigma_{XY}\Sigma_{YY}^{-1}$ which, in turn, uniquely defines the best linear estimator (21). Thus $\operatorname{cov}[X - AY, Y] = 0$ if and only if $AY = \hat{X}$ is the best linear estimator. Furthermore, for any matrix $M$,
$$\operatorname{cov}[\tilde{X}, MY] = E\{\tilde{X}Y'\}M' = 0.$$
RHODES: ESTIlrL4TION AND FILTERING
2) Defining $W = \begin{bmatrix} Y \\ Z \end{bmatrix}$ we have
$$\Sigma_{WW} = \operatorname{cov}[W, W] = \begin{bmatrix} \operatorname{cov}[Y, Y] & \operatorname{cov}[Y, Z] \\ \operatorname{cov}[Z, Y] & \operatorname{cov}[Z, Z] \end{bmatrix} = \begin{bmatrix} \Sigma_{YY} & 0 \\ 0 & \Sigma_{ZZ} \end{bmatrix} \tag{30}$$
$$\Sigma_{XW} = \operatorname{cov}[X, W] = \bigl[\operatorname{cov}[X, Y] \;\vdots\; \operatorname{cov}[X, Z]\bigr]. \tag{31}$$
Then, using (21), we have
$$E^*\{X \mid Y, Z\} = E^*\{X \mid W\} = \Sigma_{XW}\Sigma_{WW}^{-1}W$$
and substitution from (30) and (31) immediately yields (25). The expression for the covariance of the corresponding estimation error follows by substituting (30) and (31) into (22) with $W$ replacing $Y$. The alternative expression (26) for $E^*\{X \mid Y, Z\}$ is a direct consequence of the observation that, writing $\hat X_{|Y}$ for $E^*\{X \mid Y\}$ and $\tilde X_{|Y}$ for $X - \hat X_{|Y}$,
$$E^*\{X \mid Z\} = E^*\{\hat X_{|Y} + \tilde X_{|Y} \mid Z\} = E^*\{\tilde X_{|Y} \mid Z\}$$
since $\hat X_{|Y}$ is a linear function of $Y$ and $Y$ is, by assumption, uncorrelated with $Z$, so that $E^*\{\hat X_{|Y} \mid Z\} = 0$. A similar argument establishes the alternative expression for the estimation error.
3) This is an immediate consequence of Property 2, the observation from Property 1 that the random vector $\tilde Z_{|Y} = Z - E^*\{Z \mid Y\}$ is uncorrelated with $Y$, and the observation that $E^*\{X \mid Y, Z\} = E^*\{X \mid Y, \tilde Z_{|Y}\}$ since a knowledge of $Y$ and $Z$ is clearly equivalent to a knowledge of $Y$ and $\tilde Z_{|Y}$.
4) This is simply a restatement of Property 3 with $Y$ replaced by $[Y_1, Y_2, \ldots, Y_k]$ and $Z$ replaced by $Y_{k+1}$. Q.E.D.
From the viewpoint of subsequent developments, the most important of these properties of the least squares linear estimator are its linearity, its characterization in terms of the requirement that the estimation error be uncorrelated with the data, and the updating formulas listed in Proposition 4b. In particular, we first have the important fact that, if $Y$ and $Z$ are uncorrelated then the best linear estimator $E^*\{X \mid Y, Z\}$ of $X$ in terms of both $Y$ and $Z$ can be obtained simply by additively combining the individual best linear estimators $E^*\{X \mid Y\}$ and $E^*\{X \mid Z\}$. If $Y$ and $Z$ are correlated, the same principle can be used once it is recalled that $Y$ is uncorrelated with the error $\tilde Z_{|Y}$ in estimating $Z$ in terms of $Y$, and that estimating $X$ in terms of $Y$ and $\tilde Z_{|Y}$ is the same as estimating $X$ in terms of $Y$ and $Z$, since a knowledge of $Y$ and $\tilde Z_{|Y}$ is clearly equivalent to a knowledge of $Y$ and $Z$, i.e., the combined vector $[Y, \tilde Z_{|Y}]$ is related to $[Y, Z]$ by a linear transformation, and any linear function of $[Y, Z]$ can be written as some linear function of $[Y, \tilde Z_{|Y}]$ and vice versa. More generally, the best linear estimator $\hat X_{|k+1}$ of $X$ in terms of the $k + 1$ random vectors $Y_1, Y_2, \ldots, Y_{k+1}$ can be written recursively in terms of $\hat X_{|k}$ and the error vector $\tilde Y_{k+1|k}$ defined by (28b) using the updating formula (27). Thus, given the additional data in the form of another random vector $Y_{k+1}$, it is not necessary to re-solve an entire new problem of estimating $X$ in terms of the $k + 1$ random vectors $Y_1, Y_2, \ldots, Y_k, Y_{k+1}$; all that one need do is simply to additively combine the previous best estimator $\hat X_{|k}$ with the best linear estimator of $X$ in terms of $\tilde Y_{k+1|k}$. As demonstrated in (29), the covariance of the corresponding estimation error may also be updated by an equally simple procedure.
The random vector $\tilde Y_{k+1|k}$ defined by (28b) is sometimes called the innovation in $Y_{k+1}$ with respect to $Y_1, Y_2, \ldots, Y_k$ [7]-[12]. As shown in Proposition 4b, it is uncorrelated with the composite random vector $[Y_1, Y_2, \ldots, Y_k]$ and therefore with $Y_1, Y_2, \ldots, Y_{k-1}$ and $Y_k$ separately. It might therefore be viewed as the component of $Y_{k+1}$ that conveys new information not already present in $Y_1$ through $Y_k$. This viewpoint can be made quite precise by viewing the least squares linear estimation problem as a minimum-norm problem in an appropriately defined inner product space $R$ of random variables, and solving it using the projection theorem [13]. In this formulation, the least squares linear estimator becomes the orthogonal projection of (the components of) $X$ on the subspace of $R$ generated by (the components of) the random vectors $Y_1, Y_2, \ldots, Y_k, Y_{k+1}$, and the characterization of the best linear estimator as the one whose estimation error is uncorrelated with $Y_1$ through $Y_{k+1}$ is merely a statement of the orthogonality of the projection, viz., that $\tilde X_{|k+1}$ must be orthogonal to the vectors that generate the subspace. The iterative calculation of the innovations $Y_1, \tilde Y_{2|1}, \tilde Y_{3|2}, \ldots, \tilde Y_{k|k-1}, \tilde Y_{k+1|k}$ is nothing more than the application of the well-known Gram-Schmidt orthogonalization procedure to generate an orthogonal basis for the subspace, and because the innovations sequence $\{\tilde Y_{i|i-1}\}$ and the sequence of original vectors $\{Y_i\}$ generate the same subspace they convey equivalent information insofar as the linear estimation is concerned. The updating formula (27) is then simply a manifestation of the intuitive and easily proven observation that the projection of a vector on a subspace is the sum of its projections on each of the orthogonal basis vectors of that subspace.
We remark that if the subscript $k$ in $Y_k$ is interpreted to be a time index, the fact that the innovations $\tilde Y_{i|i-1}$ are mutually uncorrelated means that the innovations sequence $\{\tilde Y_{i|i-1}\}$ is discrete-time (wide-sense) white noise. Much will be made of this in the next sections when we examine estimation problems associated with linear dynamic systems.

VII. THE DISCRETE-TIME KALMAN FILTER AND PREDICTOR

In this section we apply the simple estimation principles developed in earlier sections to the solution of some estimation problems involving discrete-time linear dynamic systems. In particular, we consider the problem of estimating the state of a system in terms of noise-corrupted output measurements when there are random disturbances entering the state equation of the system and the initial state is a random vector. More specifically, we first direct
attention to the following prediction problem formulated as Problem 3. In contrast to our earlier practice of distinguishing between random vectors and their sample values by denoting the former with capital letters and the latter with lower case letters, we henceforth follow the standard practice of writing both random vectors and their sample values as lower case letters. No confusion should arise if, unless specifically indicated to the contrary, lower case letters are interpreted as random vectors.
For the first part of this section we depart temporarily from our earlier standing assumption that all vectors have zero mean, and include in the description of the dynamic system a deterministic control input and a nonzero-mean initial state. Retaining these terms in the inductive proof is no more difficult than omitting them, and it is perhaps constructive to clearly exhibit the role they play by including them in the proof.
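The updating principle of Section VI, on which the inductive derivation in this section rests, can be checked on a small scalar example. The sketch below is illustrative only: the function names and the numerical covariances are assumptions made for the example and do not appear in the paper. It computes the best linear estimator of a scalar $X$ in terms of two scalars $Y_1$, $Y_2$ in two ways: jointly, from $\Sigma_{XW}\Sigma_{WW}^{-1}$ as in (21), and recursively, by adding to the estimator based on $Y_1$ the estimator based on the innovation in $Y_2$ with respect to $Y_1$.

```python
from fractions import Fraction

def joint_coefficients(s11, s12, s22, sx1, sx2):
    """Coefficients (a1, a2) of the estimator a1*Y1 + a2*Y2 of X, obtained
    as Sigma_XW * inverse(Sigma_WW) for the stacked vector W = (Y1, Y2).
    s11, s12, s22 are the covariances of (Y1, Y2); sx1, sx2 are cov[X, Y1]
    and cov[X, Y2]. Sigma_WW is assumed nonsingular."""
    det = s11 * s22 - s12 * s12
    a1 = (sx1 * s22 - sx2 * s12) / det
    a2 = (sx2 * s11 - sx1 * s12) / det
    return a1, a2

def recursive_coefficients(s11, s12, s22, sx1, sx2):
    """Same coefficients, built by the updating idea: estimate X from Y1,
    then add the estimate of X in terms of the innovation Y2 - h*Y1."""
    g1 = sx1 / s11                    # gain of the estimator of X from Y1 alone
    h = s12 / s11                     # best linear estimator of Y2 from Y1 is h*Y1
    innov_var = s22 - s12 * h         # variance of the innovation Y2 - h*Y1
    g2 = (sx2 - sx1 * h) / innov_var  # cov[X, innovation] / var[innovation]
    # Express g1*Y1 + g2*(Y2 - h*Y1) as coefficients on Y1 and Y2.
    return g1 - g2 * h, g2

if __name__ == "__main__":
    # Hypothetical second-order statistics, chosen only for illustration.
    stats = (Fraction(2), Fraction(1), Fraction(3), Fraction(1), Fraction(2))
    print(joint_coefficients(*stats))
    print(recursive_coefficients(*stats))
```

Exact rational arithmetic is used so that the two routes can be compared for equality rather than to within rounding error.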
Problem 3: Consider the discrete-time $n$-dimensional linear dynamic system whose state $x_k$ at time $k$ is generated by the difference equation
$$x_{k+1} = A_k x_k + B_k u_k + D_k \xi_k \tag{32}$$
with $m$-dimensional output
$$z_k = C_k x_k + \theta_k. \tag{33}$$
The initial state $x_0$ of the system is assumed to be a random vector with
$$E\{x_0\} = m_0, \qquad \operatorname{cov}[x_0, x_0] = \Sigma_0. \tag{34}$$
The control input $u_k$ is assumed known (nonrandom) for all $k$, while the disturbances $\{\xi_k\}$ and $\{\theta_k\}$ are assumed to be uncorrelated white zero-mean random sequences with known covariances, i.e., the $q$- and $m$-dimensional random vectors $\xi_k$ and $\theta_k$ have the following second-order statistical properties for all $k, j = 0, 1, 2, \ldots$:
$$E\{\xi_k\} = 0, \qquad E\{\theta_k\} = 0$$
$$\operatorname{cov}[\xi_k, \xi_j] = \Xi_k \delta_{kj}, \qquad \Xi_k \text{ nonnegative definite}$$
$$\operatorname{cov}[\theta_k, \theta_j] = \Theta_k \delta_{kj}, \qquad \Theta_k \text{ positive definite}$$
$$\operatorname{cov}[\xi_k, \theta_j] = 0 \tag{35}$$
where $\delta_{kj}$ is the Kronecker delta defined by
$$\delta_{kj} = \begin{cases} 1, & k = j \\ 0, & k \neq j. \end{cases}$$
Furthermore, it is assumed that the initial state $x_0$ is uncorrelated with the disturbances $\xi_k$ and $\theta_k$ for all $k = 0, 1, 2, \ldots$, i.e.,
$$\operatorname{cov}[\xi_k, x_0] = 0, \qquad \operatorname{cov}[\theta_k, x_0] = 0. \tag{36}$$
The matrices $A_k$, $B_k$, $C_k$, and $D_k$ are assumed to be known and nonrandom for all $k$ and to have the appropriate dimensions. For each $k = 0, 1, 2, \ldots$, denote by $Z_k$ the sequence of (random) output vectors up to and including time $k$, i.e.,
$$Z_k = \{z_0, z_1, \ldots, z_k\}. \tag{37}$$
Find, for each $k = 0, 1, 2, \ldots$, the best linear estimator of the system state at time $k + 1$ in terms of the output sequence up to and including time $k$, i.e., find the best linear estimator $\hat x_{k+1|k} \triangleq E^*\{x_{k+1} \mid Z_k\}$ of the random vector $x_{k+1}$ in terms of the sequence $Z_k$ of random vectors $z_0, z_1, \ldots, z_k$.
We choose to estimate the state $x_{k+1}$ at time $k + 1$ (rather than the state $x_k$ at time $k$) in terms of the output sequence $Z_k$ up to time $k$ for convenience of later interpretations and extensions. Estimation of $x_j$ in terms of $Z_k$ for different relationships between $j$ and $k$ will be discussed later in the paper.
The solution to Problem 3 is given by the following proposition.
Proposition 5: The best linear estimator $\hat x_{k+1|k}$ of $x_{k+1}$ in terms of the output sequence $Z_k = \{z_j\}_{j=0}^{k}$ may be expressed recursively using $\hat x_{k|k-1}$ and $z_k$ according to the iterative relation
$$\hat x_{k+1|k} = A_k \hat x_{k|k-1} + B_k u_k + L_k[z_k - C_k \hat x_{k|k-1}] \tag{38}$$
with initial condition
$$\hat x_{0|-1} = m_0 = E\{x_0\} \tag{39}$$
where the gain matrix $L_k$ is given for all $k = 0, 1, \ldots$ by
$$L_k = A_k \Sigma_k C_k'[C_k \Sigma_k C_k' + \Theta_k]^{-1} \tag{40}$$
and where, in turn, the $n \times n$ nonnegative definite matrix $\Sigma_k$ is the covariance of the estimation error
$$\tilde x_{k|k-1} = x_k - \hat x_{k|k-1}, \tag{41}$$
i.e.,
$$\Sigma_k = \operatorname{cov}[\tilde x_{k|k-1}, \tilde x_{k|k-1}], \tag{42}$$
and may be precomputed using the iterative relation
$$\Sigma_{k+1} = A_k\bigl[\Sigma_k - \Sigma_k C_k'[C_k \Sigma_k C_k' + \Theta_k]^{-1} C_k \Sigma_k\bigr]A_k' + D_k \Xi_k D_k' \tag{43}$$
with initial condition
$$\Sigma_0 = \operatorname{cov}[x_0, x_0]. \tag{44}$$
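The recursion in Proposition 5 can be sketched in a few lines for the scalar special case $n = m = q = 1$, where every matrix reduces to a number and transposes and inverses become ordinary multiplication and division. This is a minimal sketch under that assumption, not the paper's own implementation; the function and argument names are hypothetical, and the numerical values in the usage lines are chosen only for illustration.

```python
def predictor_step(x_pred, sigma, z, u, a, b, c, d, xi_var, theta_var):
    """One step of the one-step predictor for a scalar system.

    x_pred and sigma are the current estimate x_hat(k|k-1) and its error
    variance Sigma_k; z and u are the measurement and control at time k.
    Returns x_hat(k+1|k), Sigma_{k+1}, and the gain L_k."""
    innov_var = c * sigma * c + theta_var                    # C Sigma C' + Theta
    gain = a * sigma * c / innov_var                         # scalar form of (40)
    x_next = a * x_pred + b * u + gain * (z - c * x_pred)    # scalar form of (38)
    sigma_next = (a * (sigma - sigma * c * c * sigma / innov_var) * a
                  + d * xi_var * d)                          # scalar form of (43)
    return x_next, sigma_next, gain

def run_predictor(zs, us, m0, sigma0, a, b, c, d, xi_var, theta_var):
    """Run the predictor over measurement and control sequences, starting
    from the initial conditions (39) and (44). Constant coefficients are
    assumed here purely to keep the example short."""
    x_pred, sigma = m0, sigma0
    history = []
    for z, u in zip(zs, us):
        x_pred, sigma, gain = predictor_step(
            x_pred, sigma, z, u, a, b, c, d, xi_var, theta_var)
        history.append((x_pred, sigma, gain))
    return history

if __name__ == "__main__":
    # Hypothetical data: a random-walk state observed in unit-variance noise.
    for row in run_predictor([2.0, 1.0, 1.5], [0.0, 0.0, 0.0],
                             m0=0.0, sigma0=1.0, a=1.0, b=0.0, c=1.0,
                             d=1.0, xi_var=1.0, theta_var=1.0):
        print(row)
```

Note that the variance and gain returned at each step do not depend on the measurements or the control input, so they could equally be computed before any data arrive.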
Before proceeding with the proof of this proposition, we pause to examine the structure of the estimator it defines and, at the same time, outline the essence of an inductive proof. It should be emphasized at the outset that this estimator, which is called the Kalman (one-step) predictor [14], recursively generates $\hat x_{k+1|k}$ from $\hat x_{k|k-1}$ and the newly available measurement $z_k$. Its structure is shown in Fig. 1, from which we see that it is comprised of three elements: 1) a model of the deterministic counterpart of the system (32); 2) a time-varying gain matrix $L_k$; and 3) a unity-gain negative feedback loop. At each time $k = 0, 1, 2, \ldots$, the unity-gain negative feedback loop generates the term $\tilde z_{k|k-1} = z_k - C_k \hat x_{k|k-1}$ which we will shortly see is uncorrelated with the past measurement sequence $Z_{k-1}$ and is the innovation in the newly available output $z_k$. This is operated on by the gain $L_k$ to generate the best linear
estimate $E^*\{x_{k+1} \mid \tilde z_{k|k-1}\} = L_k(z_k - C_k \hat x_{k|k-1})$ of the system state $x_{k+1}$ in terms of the innovation $\tilde z_{k|k-1}$. Meanwhile, the model of the system generates the best linear estimate $\hat x_{k+1|k-1} = E^*\{x_{k+1} \mid Z_{k-1}\} = A_k \hat x_{k|k-1} + B_k u_k$ which, by the updating formula (27), may be simply added to $E^*\{x_{k+1} \mid \tilde z_{k|k-1}\}$ to give $\hat x_{k+1|k} = E^*\{x_{k+1} \mid Z_k\} = E^*\{x_{k+1} \mid Z_{k-1}, \tilde z_{k|k-1}\}$, because the innovation $\tilde z_{k|k-1}$ is uncorrelated with past data $Z_{k-1}$.

Fig. 1. The discrete-time Kalman predictor. [Block diagram: model of system with delay, gain, and unity negative feedback.]

It should also be noted that the covariance $\Sigma_k$ of the estimation error $\tilde x_{k|k-1}$ is independent of the particular control input sequence $\{u_k\}$ and may be either precomputed or computed in real time using the iterative relation (43) with boundary condition (44). Consequently the time-varying gain matrix $L_k$, which is derived from $\Sigma_k$ via (40), is independent of the input sequence $\{u_k\}$ and may be computed in advance. Notice also that $\Sigma_k$ and therefore $L_k$ are time-varying even if the system (32) is constant and both noise sequences are stationary.
Thus the Kalman predictor that solves Problem 3 is a time-varying linear dynamic system whose dimension is that of the original system (32). It accepts as inputs the output (33) of the system (32) and the deterministic input $u_k$ of that system, and has as its state the best linear estimate of the state of the system (32) at time $k + 1$ in terms of the particular sample output sequence $Z_k$ received.
In the preceding section it was noted that the innovation $\tilde z_{k|k-1} \triangleq z_k - E^*\{z_k \mid Z_{k-1}\}$ is uncorrelated with all past data and consequently the recursively generated vectors $\tilde z_{0|-1} (= z_0), \tilde z_{1|0}, \tilde z_{2|1}, \ldots$ are mutually uncorrelated and the innovations sequence $\{\tilde z_{k|k-1}\}$ is (wide-sense) white noise that is related to the sequence $\{z_k\}$ by a causally invertible linear transformation. Using the linearity of the least squares linear estimator we have from (33) that
$$\hat z_{k|k-1} \triangleq E^*\{z_k \mid Z_{k-1}\} = C_k E^*\{x_k \mid Z_{k-1}\} + E^*\{\theta_k \mid Z_{k-1}\} = C_k \hat x_{k|k-1} + 0 \tag{45}$$
and we note that $E^*\{\theta_k \mid Z_{k-1}\}$ vanishes because $Z_{k-1}$ depends only (and linearly) on $x_0$, $\{\xi_j\}_{j=0}^{k-2}$, and $\{\theta_j\}_{j=0}^{k-1}$, all of which are, by assumption, uncorrelated with $\theta_k$. Using (45) and (33) we thus have
$$\tilde z_{k|k-1} = z_k - \hat z_{k|k-1} = z_k - C_k \hat x_{k|k-1} = C_k \tilde x_{k|k-1} + \theta_k. \tag{46}$$
Noting that $\theta_k$ is uncorrelated with both $x_k$ and $\hat x_{k|k-1}$ and therefore with $\tilde x_{k|k-1} = x_k - \hat x_{k|k-1}$, a direct calculation using (46) shows that $\operatorname{cov}[\tilde z_{k|k-1}, \tilde z_{k|k-1}] = C_k \Sigma_k C_k' + \Theta_k$, where $\Sigma_k = \operatorname{cov}[\tilde x_{k|k-1}, \tilde x_{k|k-1}]$. Thus in this problem the zero-mean white-noise innovations sequence $\{\tilde z_{k|k-1}\}$ has covariance
$$\operatorname{cov}[\tilde z_{k|k-1}, \tilde z_{j|j-1}] = [C_k \Sigma_k C_k' + \Theta_k]\delta_{kj}. \tag{47}$$
For later purposes we also note that, for $k \ge j$,
$$\operatorname{cov}[x_k, \tilde z_{j|j-1}] = \Phi_{k,j} \Sigma_j C_j' \tag{48}$$
where
$$\Phi_{k,j} = A_{k-1} A_{k-2} \cdots A_j, \qquad \Phi_{k,k} = I. \tag{49}$$
Similarly, for $k > j$,
$$\operatorname{cov}[\tilde x_{k|j-1}, \tilde z_{j|j-1}] = \operatorname{cov}[x_k - \hat x_{k|j-1}, \tilde z_{j|j-1}] = \Phi_{k,j} \Sigma_j C_j'. \tag{50}$$
We now turn to the proof of Proposition 5. We give two proofs, an inductive one based on the outline given in the above discussion of the structure of the Kalman predictor, and another that exploits both the properties of the innovations sequence and the characterization of the linear least squares estimator as the linear estimator whose estimation error is uncorrelated with all past data.
Proof of Proposition 5: For the inductive proof, we assume that we know $\hat x_{k|k-1} \triangleq E^*\{x_k \mid Z_{k-1}\}$ and the covariance $\Sigma_k = \operatorname{cov}[\tilde x_{k|k-1}, \tilde x_{k|k-1}]$ of the corresponding estimation error $\tilde x_{k|k-1} \triangleq x_k - \hat x_{k|k-1}$. We must show that using this information and $z_k$ we can generate $\hat x_{k+1|k}$ and the covariance $\Sigma_{k+1}$ of the corresponding estimation error $\tilde x_{k+1|k}$. The inductive proof is then completed by showing that, by suitable choice of the initial conditions, the proposition gives the correct estimator for $k = 0$.
Using the updating formula (27), we can immediately write
$$\hat x_{k+1|k} \triangleq E^*\{x_{k+1} \mid Z_k\} = E^*\{x_{k+1} \mid Z_{k-1}, z_k\} = E^*\{x_{k+1} \mid Z_{k-1}\} + E^*\{\tilde x_{k+1|k-1} \mid \tilde z_{k|k-1}\} = \hat x_{k+1|k-1} + L_k[z_k - \hat z_{k|k-1}] \tag{51}$$
where
$$L_k = \operatorname{cov}[\tilde x_{k+1|k-1}, \tilde z_{k|k-1}]\bigl[\operatorname{cov}[\tilde z_{k|k-1}, \tilde z_{k|k-1}]\bigr]^{-1}$$
which, using (47) and (50), becomes (40). Furthermore, from (29), the covariance of $\tilde x_{k+1|k} = x_{k+1} - \hat x_{k+1|k}$ can be written
$$\Sigma_{k+1} = \operatorname{cov}[\tilde x_{k+1|k-1}, \tilde x_{k+1|k-1}] - \operatorname{cov}[\tilde x_{k+1|k-1}, \tilde z_{k|k-1}]\bigl[\operatorname{cov}[\tilde z_{k|k-1}, \tilde z_{k|k-1}]\bigr]^{-1}\operatorname{cov}[\tilde z_{k|k-1}, \tilde x_{k+1|k-1}]$$
and, using (47) and (50), we have
$$\Sigma_{k+1} = \operatorname{cov}[\tilde x_{k+1|k-1}, \tilde x_{k+1|k-1}] - A_k \Sigma_k C_k'[C_k \Sigma_k C_k' + \Theta_k]^{-1} C_k \Sigma_k A_k'. \tag{53}$$
The required expressions for $\hat x_{k+1|k-1}$ and $\operatorname{cov}[\tilde x_{k+1|k-1}, \tilde x_{k+1|k-1}]$ in terms of $\hat x_{k|k-1}$, $\Sigma_k$ and the problem data follow trivially from the linearity of the linear least squares estimator. From Property 5 in Proposition 4a and (32) we have
$$\hat x_{k+1|k-1} = A_k E^*\{x_k \mid Z_{k-1}\} + B_k u_k + D_k E^*\{\xi_k \mid Z_{k-1}\} = A_k \hat x_{k|k-1} + B_k u_k \tag{54}$$
where $E^*\{\xi_k \mid Z_{k-1}\}$ vanishes because $Z_{k-1}$ depends only (and linearly) on $x_0$, $\{\xi_j\}_{j=0}^{k-2}$ and $\{\theta_j\}_{j=0}^{k-1}$, all of which are, by assumption, uncorrelated with $\xi_k$. Furthermore, a similar argument shows that $\tilde x_{k|k-1}$ and $\xi_k$ are uncorrelated, and since $u_k$ is deterministic, the covariance of the corresponding estimation error is easily found to be
$$\operatorname{cov}[\tilde x_{k+1|k-1}, \tilde x_{k+1|k-1}] = A_k \operatorname{cov}[\tilde x_{k|k-1}, \tilde x_{k|k-1}]A_k' + D_k \operatorname{cov}[\xi_k, \xi_k]D_k' = A_k \Sigma_k A_k' + D_k \Xi_k D_k' \tag{55}$$
using either the formula in Property 5 of Proposition 4a or by direct calculation after subtracting (54) from (32). Substitution of (54) and (55) into (51) and (53) then yields the iterative expressions (38) and (43).
The inductive proof is completed by noting that, by choice of the boundary conditions (39) and (44), Proposition 5 holds trivially for $k = 0$, since, in the absence of any output data in terms of which to estimate $x_0$, $\hat x_{0|-1} = E\{x_0\}$ is the correct best linear estimator of $x_0$ and $\Sigma_0 = \operatorname{cov}[x_0, x_0]$ the covariance of the corresponding estimation error. Q.E.D.
For the alternative proof of Proposition 5 we return to our earlier standing assumption that all random vectors have zero mean and take $u_k = 0$ and $m_0 = 0$.
Alternative Proof of Proposition 5: In view of the equivalence of information (insofar as linear estimation is concerned) conveyed by the innovations sequence $\{\tilde z_{j|j-1}\}_{j=0}^{k}$ and the output process $\{z_j\}_{j=0}^{k}$, $\hat x_{k|k-1}$ may be written
$$\hat x_{k|k-1} = \sum_{i=0}^{k-1} G_{k,i}\,\tilde z_{i|i-1} \tag{56}$$
and the gain matrices $G_{k,i}$ can be determined using the characterization of $\hat x_{k|k-1}$ as the linear estimator whose estimation error is uncorrelated with all past data, i.e.,
$$\operatorname{cov}[x_k - \hat x_{k|k-1}, \tilde z_{j|j-1}] = 0, \qquad \forall j \le k - 1.$$
Substitution from (56) yields
$$\operatorname{cov}[x_k, \tilde z_{j|j-1}] - \sum_{i=0}^{k-1} G_{k,i}\operatorname{cov}[\tilde z_{i|i-1}, \tilde z_{j|j-1}] = 0, \qquad \forall j \le k - 1$$
and, because $\{\tilde z_{j|j-1}\}$ is a white-noise process, the summation on the right reduces to a single term: in fact from (47) and (48) we have
$$G_{k,j} = \Phi_{k,j} \Sigma_j C_j'[C_j \Sigma_j C_j' + \Theta_j]^{-1}. \tag{57}$$
Now note, from (49), that $\Phi_{k+1,j} = A_k \Phi_{k,j}$ for $j \le k - 1$, while $\Phi_{k+1,k} = A_k$, so that $G_{k+1,j} = A_k G_{k,j}$ for $j \le k - 1$ and
$$\hat x_{k+1|k} = \sum_{j=0}^{k} G_{k+1,j}\,\tilde z_{j|j-1} = A_k \sum_{j=0}^{k-1} G_{k,j}\,\tilde z_{j|j-1} + G_{k+1,k}\,\tilde z_{k|k-1} = A_k \hat x_{k|k-1} + A_k \Sigma_k C_k'[C_k \Sigma_k C_k' + \Theta_k]^{-1}[z_k - C_k \hat x_{k|k-1}]$$
which is (38) with $u_k = 0$. The initial condition follows as before. Subtraction of (38) from (32) yields
$$\tilde x_{k+1|k} = (A_k - L_k C_k)\tilde x_{k|k-1} - L_k \theta_k + D_k \xi_k$$
from which the difference equation (43) and boundary condition (44) for the error covariance follow by direct calculation. Thus the alternative proof is complete. Q.E.D.
Remark 1 (The Gaussian Case): When the initial state $x_0$ and the random processes are jointly Gaussian, the system state $x_{k+1}$ and the output sequence $Z_k$ (or the innovations sequence $\{\tilde z_{j|j-1}\}_{j=0}^{k}$) are jointly Gaussian for all $k$, since linear transformations and sums of jointly Gaussian vectors are themselves jointly Gaussian. As noted earlier, under these circumstances the unconstrained least squares estimator is linear and thus coincides with the best linear estimator: thus the Kalman predictor defined by Proposition 5 is in this case the unconstrained least squares estimator $\hat x_{k+1|k} = E\{x_{k+1} \mid Z_k\}$ of $x_{k+1}$ in terms of $Z_k$.
Remark 2 (Prediction Beyond Time $k + 1$): For any integer $i \ge 1$, the best linear estimator $\hat x_{k+i|k} = E^*\{x_{k+i} \mid Z_k\}$ of $x_{k+i}$ in terms of $Z_k$ may be obtained from $\hat x_{k+1|k}$ using the formula
$$\hat x_{k+i|k} = \Phi_{k+i,k+1}\,\hat x_{k+1|k} + \sum_{j=k+1}^{k+i-1} \Phi_{k+i,j+1} B_j u_j$$
where $\Phi_{k,j}$ is defined by (49). This result follows by applying the linearity of the least squares linear estimator to the solution of (32) at time $k + i$ and recalling that, as noted earlier, $Z_k$ is uncorrelated with $\xi_j$ for $j \ge k$.
Remark 3 (Filtering): If $A_k$ is invertible, then $\hat x_{k|k} \triangleq E^*\{x_k \mid Z_k\}$ may be obtained from $\hat x_{k+1|k}$ using the relation
$$\hat x_{k|k} = A_k^{-1}[\hat x_{k+1|k} - B_k u_k].$$
More generally, whether $A_k$ is invertible or not, $\hat x_{k|k}$ may be obtained directly from the linear dynamic system
$$\hat x_{k+1|k+1} = A_k \hat x_{k|k} + B_k u_k + M_{k+1}[z_{k+1} - C_{k+1}(A_k \hat x_{k|k} + B_k u_k)]$$
with boundary condition [recall (34)]
$$\hat x_{0|0} = m_0 + M_0[z_0 - C_0 m_0]$$
where
$$M_k = \Sigma_k C_k'[C_k \Sigma_k C_k' + \Theta_k]^{-1}$$
and $\Sigma_k$ satisfies (43) with boundary condition (44). The proof of this assertion follows from a simple modification of the proof of Proposition 5.
Remark 4 (Smoothing): Even if the matrix $\Phi_{k+1,j} = A_k A_{k-1} \cdots A_j$ is invertible for all $j < k$, the best linear estimator (smoother) $\hat x_{j|k} \triangleq E^*\{x_j \mid Z_k\}$ is not in general given by
$$\hat x_{j|k} = \Phi_{k+1,j}^{-1}\Bigl[\hat x_{k+1|k} - \sum_{i=j}^{k} \Phi_{k+1,i+1} B_i u_i\Bigr] \qquad \text{(not true)}$$
if $j$ is strictly less than $k$. Briefly, this is because $\xi_i$ is clearly correlated with $z_{i+1}$ and all future outputs $z_{i+2}, z_{i+3}, \ldots$; thus $E^*\{\xi_i \mid Z_k\}$ is not in general the zero vector for $i < k$. The discrete-time smoothing problem can be attacked by methods analogous to those discussed later in connection with the continuous-time state estimation problem.
Remark 5 (Correlated Noises): If the noise sequences $\{\xi_k\}$ and $\{\theta_k\}$ are white, zero-mean, and uncorrelated with $x_0$ but correlated with each other, so that
$$\operatorname{cov}[\xi_k, \theta_j] = \Gamma_k \delta_{kj},$$
(40) for $L_k$ must be replaced by
$$L_k = [A_k \Sigma_k C_k' + D_k \Gamma_k][C_k \Sigma_k C_k' + \Theta_k]^{-1}$$
while the iterative equation (43) for $\Sigma_{k+1}$ must be replaced by
$$\Sigma_{k+1} = A_k \Sigma_k A_k' - [A_k \Sigma_k C_k' + D_k \Gamma_k][C_k \Sigma_k C_k' + \Theta_k]^{-1}[C_k \Sigma_k A_k' + \Gamma_k' D_k'] + D_k \Xi_k D_k'.$$
The only modification that is necessary in the inductive proof is in the prior calculation of $\operatorname{cov}[\tilde x_{k+1|k-1}, \tilde z_{k|k-1}]$ given by (50): the correlation between $\xi_k$ and $\theta_k$ leads to an additional term in (50), which in this case becomes
$$\operatorname{cov}[\tilde x_{k+1|k-1}, \tilde z_{k|k-1}] = A_k \Sigma_k C_k' + D_k \Gamma_k$$
and this is reflected through the two equations immediately following (52) to give the above expressions for $L_k$ and $\Sigma_{k+1}$. For the alternative proof, (48) must be replaced by
$$\operatorname{cov}[x_k, \tilde z_{j|j-1}] = \Phi_{k,j+1}[A_j \Sigma_j C_j' + D_j \Gamma_j]$$
to reflect the correlation between $\xi_j$ and $\theta_j$, and the additional term carried through the algebra beginning with (57).

VIII. CONTINUOUS-TIME FILTERING, PREDICTION, AND SMOOTHING

In this section we consider the continuous-time counterpart of the discrete-time state estimation problem examined in the preceding section and derive the corresponding continuous-time Kalman-Bucy filter using the continuous-time analog of the alternative (innovations) proof given in Section VII. This derivation is formal to the extent that it involves some familiar formal manipulations with white noise and omits the step of rigorously proving the equivalence of the output process and the innovations process insofar as linear estimation is concerned. A more precise treatment would require the theory of stochastic integrals and stochastic differential equations (see, e.g., [15]-[18]) and would include a rigorous proof that the linear transformation that generates the innovations process from the output process is causally invertible. The innovations approach to estimation and detection problems is due to Kailath and his collaborators, to whose works the reader is referred for more detailed treatment and extensions, including nonlinear problems ([7]-[12]).
Problem 4: Consider the smooth $n$-dimensional linear dynamic system with state equation
$$\dot x(t) = A(t)x(t) + B(t)u(t) + D(t)\xi(t) \tag{58}$$
and $m$-dimensional output
$$z(t) = C(t)x(t) + \theta(t). \tag{59}$$
The initial state $x(t_0)$ of the system is assumed to be a random vector with
$$E\{x(t_0)\} = m_0, \qquad \operatorname{cov}[x(t_0), x(t_0)] = \Sigma_0. \tag{60}$$
The control input $u(t)$ is assumed known and nonrandom for all $t$, while the disturbances $\xi(\cdot)$ and $\theta(\cdot)$ are assumed to be white zero-mean stochastic processes that are uncorrelated with each other and with $x_0$, and have known covariances, i.e., the $q$- and $m$-dimensional random vectors $\xi(t)$ and $\theta(t)$ have the following second-order statistical properties for all $t$, $s$:
$$E\{\xi(t)\} = 0, \qquad E\{\theta(t)\} = 0$$
$$\operatorname{cov}[\xi(t), \xi(s)] = \Xi(t)\delta(t - s), \qquad \Xi(t) \text{ nonnegative definite}$$
$$\operatorname{cov}[\theta(t), \theta(s)] = \Theta(t)\delta(t - s), \qquad \Theta(t) \text{ positive definite}$$
$$\operatorname{cov}[\xi(t), \theta(s)] = 0$$
$$\operatorname{cov}[\xi(t), x(t_0)] = 0, \qquad \operatorname{cov}[\theta(t), x(t_0)] = 0. \tag{61}$$
The matrices $A(t)$, $B(t)$, $C(t)$, and $D(t)$ are assumed to be known and nonrandom and to have the appropriate dimensions. For each $t \ge t_0$, denote by $Z_t$ the stochastic
process $z(\cdot)$ defined by (58) and (59) up to time $t$, i.e.,
$$Z_t = \{(z(s), s);\ s \in [t_0, t)\}. \tag{62}$$
Find, for each $t \ge t_0$, the best linear estimator $\hat x(t|t) \triangleq E^*\{x(t) \mid Z_t\}$ of the state $x(t)$ at time $t$ in terms of the output process $Z_t$ up to time $t$.
As in the proof via the innovations approach in the preceding section, we assume without loss of generality that $u(t) = 0$ and $m_0 = 0$, so that all random variables and random processes have zero mean: for convenience of reference, these terms will be reintroduced when the solution is summarized as a proposition later in the section.
Before proceeding with a derivation of the solution to Problem 4, we pause to consider what we mean by the least squares linear estimator of a random vector in terms of a continuous-time random process and to establish two subsequently needed properties that are the natural extensions to this situation of two properties of the least squares linear estimator of one random vector in terms of another. By the least squares linear estimator $E^*\{x \mid Y_T\}$ of an $n$-dimensional random vector $x$ in terms of an $m$-dimensional random process $Y_T \triangleq \{(y(s), s);\ s \in [0, T)\}$ we mean the linear function
$$E^*\{x \mid Y_T\} = \hat x = \int_0^T H^0(\tau)y(\tau)\,d\tau \tag{63}$$
of $Y_T$ that minimizes $E\{\|x - \int_0^T H(\tau)y(\tau)\,d\tau\|^2\}$ over all $n \times m$ matrix-valued functions $H(\cdot)$. The natural extension to this situation of the characterization of $\hat x$ as the linear estimator whose estimation error is uncorrelated with any linear function of the data (Proposition 4b) is that, for all $\sigma \in [0, T)$,
$$\operatorname{cov}[x - \hat x, y(\sigma)] = \operatorname{cov}[x, y(\sigma)] - \int_0^T H^0(\tau)\operatorname{cov}[y(\tau), y(\sigma)]\,d\tau = 0. \tag{64}$$
We establish this by observing that for any linear estimator we can write
$$E\Bigl\{\Bigl\|x - \int_0^T H(\tau)y(\tau)\,d\tau\Bigr\|^2\Bigr\} = E\Bigl\{\Bigl\|x - \int_0^T H^0(\tau)y(\tau)\,d\tau\Bigr\|^2\Bigr\} - 2E\Bigl\{\Bigl[x - \int_0^T H^0(\tau)y(\tau)\,d\tau\Bigr]'\int_0^T \tilde H(\tau)y(\tau)\,d\tau\Bigr\} + E\Bigl\{\Bigl\|\int_0^T \tilde H(\tau)y(\tau)\,d\tau\Bigr\|^2\Bigr\} \tag{65}$$
where $H^0(\cdot)$ is defined implicitly by (64) and $\tilde H(t) \triangleq H(t) - H^0(t)$. Now note that the second term on the right side of (65) vanishes because it can be written
$$-2\int_0^T \operatorname{tr}\bigl\{\operatorname{cov}[x - \hat x, y(\sigma)]\,\tilde H'(\sigma)\bigr\}\,d\sigma$$
the integrand of which is, by (64), identically zero. The first term on the right side of (65) is independent of $H(\cdot)$, while the third term is nonnegative and can be made zero [thus minimizing (63)] by taking $\tilde H(\tau) \equiv 0$, i.e., $H(\tau) = H^0(\tau)$. We leave it to the reader to check that if $y(\cdot)$ contains a nondegenerate white-noise component then this is the only condition under which the third term is zero, and, furthermore, under these conditions the solution to (64) is unique; thus, in this case, the least squares linear estimator $\hat x$ is unique and is characterized by (64). Because $\Theta(t)$ is positive definite, these conditions are met for our problem and we will shortly see, in fact, that (64) defines a unique function $H^0(\cdot)$. We note that, if (64) holds for all $\sigma$ then it is trivial that any linear function of $Y_T$ is uncorrelated with $x - \hat x$. Note also that the linearity of the best linear estimator, viz.,
$$E^*\{Ax \mid Y_T\} = AE^*\{x \mid Y_T\} \tag{66}$$
follows immediately by observing that if $H^0(\cdot)$ satisfies (64) then $AH^0(\cdot)$ satisfies the same equation with $Ax$ replacing $x$.
Returning to Problem 4, we see that since the estimation error $\tilde z(t|t) = z(t) - \hat z(t|t)$ in $\hat z(t|t) = E^*\{z(t) \mid Z_t\}$ is uncorrelated with any linear function of the past data $Z_t$, it is uncorrelated, in particular, with $z(\tau)$ and $\hat z(\tau|\tau)$ for all $\tau < t$. Thus, for all $\tau < t$, $\operatorname{cov}[\tilde z(t|t), \tilde z(\tau|\tau)] = \operatorname{cov}[\tilde z(t|t), z(\tau)] - \operatorname{cov}[\tilde z(t|t), \hat z(\tau|\tau)] = 0$. Clearly, by symmetry of autocorrelation functions, the same is true for all $\tau > t$, so that $\operatorname{cov}[\tilde z(t|t), \tilde z(\tau|\tau)] = 0$ if $t \neq \tau$.
From the linearity of the best linear estimator and (59) we have
$$\hat z(t|t) = E^*\{z(t) \mid Z_t\} = C(t)E^*\{x(t) \mid Z_t\} + E^*\{\theta(t) \mid Z_t\} = C(t)\hat x(t|t) \tag{67}$$
and we note that $E^*\{\theta(t) \mid Z_t\}$ vanishes because $Z_t$ depends only (and linearly) on $x_0$, $\{\xi(\tau);\ t_0 \le \tau < t\}$ and $\{\theta(\tau);\ t_0 \le \tau < t\}$, all of which are, by assumption, uncorrelated with $\theta(t)$. Thus the innovation $\tilde z(t|t)$ may be written
$$\tilde z(t|t) = z(t) - \hat z(t|t) = z(t) - C(t)\hat x(t|t) = C(t)[x(t) - \hat x(t|t)] + \theta(t) = C(t)\tilde x(t|t) + \theta(t) \tag{68}$$
from which it can be calculated directly that $\operatorname{cov}[\tilde z(t|t), \tilde z(t|t)]$ is an impulse with magnitude $\Theta(t)$, using the fact that $\theta(t)$ is uncorrelated with both $\hat x(t|t)$ (because, as discussed above, it is uncorrelated with $Z_t$) and $x(t)$, and therefore with $\tilde x(t|t)$. Thus the zero-mean white-noise innovations process $\tilde z$ has covariance
$$\operatorname{cov}[\tilde z(t|t), \tilde z(\tau|\tau)] = \Theta(t)\delta(t - \tau). \tag{69}$$
As in Section VII, it is convenient for later purposes to have calculated by arguments similar to those leading to (48) and (50) that
$$\operatorname{cov}[x(t), \tilde z(\tau|\tau)] = \Phi(t, \tau)\Sigma(\tau)C'(\tau) \tag{70}$$
where $\Phi(t, \tau)$ is the transition matrix associated with the differential equation $\dot x(t) = A(t)x(t)$.
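The whiteness of the innovations in (69) has a discrete-time counterpart in (47) that is easy to check by hand or by machine. The following sketch is an illustration under stated assumptions, not part of the paper: it treats a scalar system, uses hypothetical function names, and relies on the discrete-time error recursion obtained in Section VII by subtracting (38) from (32) together with the innovation expression (46). Under those relations the covariance between consecutive innovations is $C[(A - LC)\Sigma C' - L\Theta]$, which vanishes for the gain (40) and not, in general, for other gains.

```python
from fractions import Fraction

def kalman_gain(a, c, theta_var, sigma):
    """Scalar form of the predictor gain (40)."""
    return a * sigma * c / (c * sigma * c + theta_var)

def lag_one_innovation_cov(a, c, theta_var, sigma, gain):
    """Covariance between the innovations at times k+1 and k for a scalar
    predictor that applies `gain` at time k, given error variance `sigma`
    at time k. Derived from
        x_err(k+1) = (a - gain*c) * x_err(k) - gain*theta(k) + d*xi(k)
        innov(k)   = c * x_err(k) + theta(k)
    with theta(k) and xi(k) uncorrelated with x_err(k) and with each other."""
    return c * ((a - gain * c) * sigma * c - gain * theta_var)

if __name__ == "__main__":
    # Hypothetical numbers, chosen only for illustration.
    a, c = Fraction(9, 10), Fraction(1)
    theta_var, sigma = Fraction(1, 2), Fraction(2)
    best = kalman_gain(a, c, theta_var, sigma)
    print(best, lag_one_innovation_cov(a, c, theta_var, sigma, best))
    print(lag_one_innovation_cov(a, c, theta_var, sigma, Fraction(1, 2)))
```

The disturbance coefficient $d$ does not enter the result because $\xi_k$ is uncorrelated with the innovation at time $k$.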
Assuming that the linear transformation that generates the innovations process $\tilde z$ from the output process $z$ is causally invertible, so that any linear function of $z$ can be expressed as a linear function of $\tilde z$ (and vice versa), we can write
$$\hat x(t|t) = \int_{t_0}^{t} G(t, \tau)\tilde z(\tau|\tau)\,d\tau \tag{71}$$
where the matrix-valued gain function $G(t, \cdot)$ can be calculated using the characterization (64) of $\hat x(t|t)$, viz.,
$$\operatorname{cov}[x(t), \tilde z(\sigma|\sigma)] = \int_{t_0}^{t} G(t, \tau)\operatorname{cov}[\tilde z(\tau|\tau), \tilde z(\sigma|\sigma)]\,d\tau, \qquad t_0 \le \sigma < t.$$
Using (69) and (70) this reduces to
$$G(t, \sigma) = \Phi(t, \sigma)\Sigma(\sigma)C'(\sigma)\Theta^{-1}(\sigma), \qquad t_0 \le \sigma < t. \tag{72}$$
Formally differentiating (71) with respect to $t$ using the Leibnitz rule, (72), and the defining property of the transition matrix, viz., $(d/dt)\Phi(t, \tau) = A(t)\Phi(t, \tau)$, we obtain
$$\frac{d}{dt}\hat x(t|t) = \int_{t_0}^{t} A(t)G(t, \tau)\tilde z(\tau|\tau)\,d\tau + G(t, t)\tilde z(t|t) = A(t)\hat x(t|t) + G(t, t)\tilde z(t|t) \tag{73}$$
with, setting $t = t_0$ in (71), initial condition $\hat x(t_0|t_0) = 0$. Note that (72) defines $G(t, \sigma)$ only for $\sigma < t$ and yet (73) requires that we determine $G(t, t)$. If we integrate (73) from $t_0$ to $t$ we obtain
$$\hat x(t|t) = \int_{t_0}^{t} \Phi(t, \tau)G(\tau, \tau)\tilde z(\tau|\tau)\,d\tau$$
and if this expression is to coincide with (71) when $G(t, \tau)$ is defined for $\tau < t$ by (72), we must have
$$\Phi(t, \tau)G(\tau, \tau) = \Phi(t, \tau)\Sigma(\tau)C'(\tau)\Theta^{-1}(\tau)$$
which, since $\Phi(t, \tau)$ is nonsingular, yields
$$G(\tau, \tau) = \Sigma(\tau)C'(\tau)\Theta^{-1}(\tau) \tag{74}$$
and (73) becomes
$$\frac{d}{dt}\hat x(t|t) = A(t)\hat x(t|t) + \Sigma(t)C'(t)\Theta^{-1}(t)[z(t) - C(t)\hat x(t|t)]. \tag{75}$$
Subtracting this equation from (58) (with $u(t) = 0$) gives
$$\frac{d}{dt}\tilde x(t|t) = [A(t) - \Sigma(t)C'(t)\Theta^{-1}(t)C(t)]\tilde x(t|t) + D(t)\xi(t) - \Sigma(t)C'(t)\Theta^{-1}(t)\theta(t)$$
from which we can write
$$\tilde x(t|t) = \Psi(t, t_0)\tilde x(t_0|t_0) + \int_{t_0}^{t} \Psi(t, \tau)[D(\tau)\xi(\tau) - \Sigma(\tau)C'(\tau)\Theta^{-1}(\tau)\theta(\tau)]\,d\tau \tag{76}$$
where $\Psi(t, \tau)$ is the transition matrix associated with
$$\dot w(t) = [A(t) - \Sigma(t)C'(t)\Theta^{-1}(t)C(t)]w(t).$$
A direct calculation using the fact that $\xi$ and $\theta$ are white and uncorrelated with each other and $\tilde x(t_0|t_0)\ (= x(t_0))$ then shows that
$$\Sigma(t) \triangleq \operatorname{cov}[\tilde x(t|t), \tilde x(t|t)] = \Psi(t, t_0)\Sigma_0\Psi'(t, t_0) + \int_{t_0}^{t} \Psi(t, \tau)\Sigma(\tau)C'(\tau)\Theta^{-1}(\tau)C(\tau)\Sigma(\tau)\Psi'(t, \tau)\,d\tau + \int_{t_0}^{t} \Psi(t, \tau)D(\tau)\Xi(\tau)D'(\tau)\Psi'(t, \tau)\,d\tau \tag{77}$$
and this may be differentiated to give
$$\dot\Sigma(t) = [A(t) - \Sigma(t)C'(t)\Theta^{-1}(t)C(t)]\Sigma(t) + \Sigma(t)[A(t) - \Sigma(t)C'(t)\Theta^{-1}(t)C(t)]' + D(t)\Xi(t)D'(t) + \Sigma(t)C'(t)\Theta^{-1}(t)C(t)\Sigma(t)$$
$$= A(t)\Sigma(t) + \Sigma(t)A'(t) - \Sigma(t)C'(t)\Theta^{-1}(t)C(t)\Sigma(t) + D(t)\Xi(t)D'(t) \tag{78}$$
and, setting $t = t_0$ in (77), the boundary condition $\Sigma(t_0) = \Sigma_0$.
A nonzero control input and a nonzero-mean initial state simply affect the mean of $x(t)$, which is then
$$E\{x(t)\} = \Phi(t, t_0)m_0 + \int_{t_0}^{t} \Phi(t, \tau)B(\tau)u(\tau)\,d\tau$$
and this sum must be added to the right side of (71). This leads to an additional term $B(t)u(t)$ on the right side of (75) and changes its initial condition from zero to $m_0$. The differential equations for $\tilde x(t|t)$ and $\Sigma(t)$ remain unaffected.
In summary, we have the following solution to Problem 4.
Proposition 6: The best linear estimator $\hat x(t|t) = E^*\{x(t) \mid Z_t\}$ of the state $x(t)$ of the system (58) in terms of the output process $Z_t$ is the $n$-dimensional linear dynamic system
$$\frac{d}{dt}\hat x(t|t) = A(t)\hat x(t|t) + B(t)u(t) + \Sigma(t)C'(t)\Theta^{-1}(t)[z(t) - C(t)\hat x(t|t)]$$
with initial condition
$$\hat x(t_0|t_0) = m_0 = E\{x(t_0)\}$$
where the $n \times n$ symmetric nonnegative definite matrix $\Sigma(t)$ is the covariance of the estimation error $\tilde x(t|t) = x(t) - \hat x(t|t)$ and is the solution to the Riccati equation
$$\dot\Sigma(t) = A(t)\Sigma(t) + \Sigma(t)A'(t) - \Sigma(t)C'(t)\Theta^{-1}(t)C(t)\Sigma(t) + D(t)\Xi(t)D'(t)$$
with initial condition
$$\Sigma(t_0) = \Sigma_0 = \operatorname{cov}[x(t_0), x(t_0)].$$
The form of this best linear estimator, which is called the Kalman-Bucy filter for Problem 4, is shown in Fig. 2 [19]. Like its discrete-time counterpart, it is comprised of three elements.
1) A unity-gain negative feedback loop which generates
u(t)
1971
These expressions are not validinthesmoothing
case
where T <.O, which will be discussed in a later remark.
Remark 8Co?related Noises: If E( and 9( are correlated with each ot,her so that
MODELOF SYSTEM
a )
cov [ ~ ( t e(s)l
),
r(t)&(t
 S)
then equations (78) and (75) must be replaced by
i ( t ) = A(t)X(t)
+ X(t)A(t)  [ X ( t ) C ( t )
+ D(t)r(t)l@W[C(t)r,(t) + r(t>D(t)l
+ D(t)3(t)D(t)
UNITY NEGATIVE REDBACK
mean in t.erms of z(t1t).
3) A model of the deterministic part of the system
(58), theinternalfeedback loop a.nd externallyapplied
input, of which give A(t)i(tlt) B(t)u(t),nhich can be interpreted as the best, linear estimator of x ( t ) prior to the
arrival of z(t), and which, when additively combined with
~ ( t ) ~ ( l l gives
t),
= (d/dt)~*{x(t)I~,].
Kotice also that, as in the discretetime case, the covariance X (t) of the estimation error is independent
of the
deterministic cont,rol inputandmay
beeitherprecomputed or computed in real timefrom t,he Riccati equation
(78) wit,h initialcondition X ( t , J = Eo.The exhence of
solutions to this Riccati equation andt.he behavior of t,he
solution a.s t approaches infinity will be examined in t,he
next section.
Remark 6 (The Gaussian Case): If $x(t_0)$, $\xi(\cdot)$, and $\theta(\cdot)$ are jointly Gaussian, so also are the state $x(t)$ and the innovations process $\tilde{z}(\cdot|\cdot)$ [or the output process $z(\cdot)$], and the Kalman-Bucy filter is the unconstrained least squares estimator of $x(t)$ in terms of $Z_t$, i.e., $\hat{x}(t|t) = E\{x(t)|Z_t\}$.
Remark 7 (Prediction): For $T \ge 0$, the best linear estimator of $x(t + T)$ in terms of $Z_t$ may be obtained from $\hat{x}(t|t)$ using the relation
and this change reflected through the subsequent algebra; otherwise, the proof is unchanged.
Remark 9 (Colored Noises): Consider now the case where, in addition to being correlated, the noise processes $\xi(\cdot)$ and $\theta(\cdot)$ are not white. This problem can be reduced immediately to an equivalent higher dimensional problem involving white-noise processes, the solution to which is discussed in Remark 8, provided that the combined process $n(\cdot)$ is a finite-dimensional Markov process and $\theta(\cdot)$ contains a white component with nonsingular covariance matrix, i.e., provided that the combined process $n(\cdot)$ can be generated as the output of a finite-dimensional linear dynamic system of the form
a )
.z
H ( M )
+ J(t)v(t)
@Ob)
where $w(\cdot)$ and $v(\cdot)$ are (possibly correlated) white-noise processes with covariances
$$\mathrm{cov}\,[w(t), w(\tau)] = W(t)\,\delta(t - \tau), \qquad \mathrm{cov}\,[v(t), v(\tau)] = V(t)\,\delta(t - \tau)$$
which follows immediately by applying the linearity of the best linear estimator to the solution of (58) and recalling that, as noted earlier, $\xi(s)$ is uncorrelated with $Z_t$ for $s \ge t$. A direct calculation shows that the covariance of the corresponding estimation error is given by
$$\mathrm{cov}\,[\tilde{x}(t + T|t), \tilde{x}(t + T|t)] = \Phi(t + T, t)\,\Sigma(t)\,\Phi'(t + T, t) + \int_{t}^{t+T} \Phi(t + T, \tau)\,D(\tau)\Xi(\tau)D'(\tau)\,\Phi'(t + T, \tau)\, d\tau.$$
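The prediction error covariance above can be checked numerically in the simplest setting. The sketch below assumes a scalar, constant-coefficient system, so that $\Phi(t + T, \tau) = e^{a(t + T - \tau)}$; the function names and numerical values are hypothetical, and the trapezoidal rule is only one convenient way to evaluate the integral term.

```python
import math


def prediction_covariance(sigma_t, a, d, xi, horizon, steps=1000):
    """Scalar, constant-coefficient prediction error covariance.

    With s = t + T - tau, the integral term becomes the integral of
    exp(2*a*s) * d*d*xi over [0, T]; it is evaluated here with the
    composite trapezoidal rule.
    """
    phi = math.exp(a * horizon)
    h = horizon / steps
    integrand = [math.exp(2.0 * a * i * h) * d * d * xi for i in range(steps + 1)]
    integral = h * (0.5 * integrand[0] + sum(integrand[1:-1]) + 0.5 * integrand[-1])
    return phi * sigma_t * phi + integral


def prediction_covariance_closed_form(sigma_t, a, d, xi, horizon):
    """Closed form of the same expression, valid for a != 0."""
    e2 = math.exp(2.0 * a * horizon)
    return e2 * sigma_t + d * d * xi * (e2 - 1.0) / (2.0 * a)


# Hypothetical values: filtering covariance 0.4 at time t, a = -1, d = xi = 1.
numeric = prediction_covariance(0.4, -1.0, 1.0, 1.0, horizon=2.0)
exact = prediction_covariance_closed_form(0.4, -1.0, 1.0, 1.0, horizon=2.0)
at_zero_horizon = prediction_covariance(0.4, -1.0, 1.0, 1.0, horizon=0.0)
```

With a zero horizon the expression reduces to the filtering covariance, and for a stable system the predicted covariance grows toward the unconditional state covariance as the horizon lengthens.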
and, in addition,
$$J_2(t)V(t)J_2'(t) > 0, \quad \forall t. \tag{81}$$
The requirement (81) that $\theta(t)$ contain a nondegenerate white-noise component will be seen shortly to correspond to our earlier standing assumption (61) that $\Theta(t)$ be positive definite when $\theta(\cdot)$ is white.
Under these circumstances, the systems (58) and (80) can be combined into the single system
which involves white (but correlated) noises in the dynamics and the measurements. The Kalman filter and predictor for estimating the state of this combined system are then obtained as a direct application of Remark 8, and the estimate of $x(t)$ is obtained from this estimate of the state of the combined system using the linearity of the best linear estimator, i.e.,

We note that the requirement (81) ensures that the additive white noise in the measurements (82) is nondegenerate in the sense that it has positive definite covariance matrix, which conforms with our earlier standing assumption (61) on the additive white measurement noise.
If we restrict attention to white-noise processes $v(\cdot)$ and $w(\cdot)$ that are uncorrelated, a necessary and sufficient condition for the combined process $n(\cdot)$ defined by (79) to be representable as the output of a finite-dimensional linear dynamic system of the form (82) is that its autocorrelation function $R(t, \tau) = E\{n(t)n'(\tau)\}$ can be expressed for all $t$ and $\tau$ as
$$R(t, \tau) = P(t)\,Q(t \wedge \tau)\,P'(\tau) + S(t)S'(\tau)\,\delta(t - \tau) \tag{83}$$
for some continuous matrix-valued functions $P(\cdot)$ and $S(\cdot)$ and some continuously differentiable, symmetric, nonnegative definite, matrix-valued function $Q(\cdot)$ whose derivative $\dot{Q}(\cdot)$ also has nonnegative definite values. The notation $t \wedge \tau$ denotes the minimum of $t$ and $\tau$ [20]-[22].
If the combined process $n(\cdot)$ is wide-sense stationary, so that $R(t, \tau) = R(t - \tau, 0) = R'(\tau - t, 0)$, an equivalent characterization can be given in the frequency domain in terms of the Laplace transform $\mathcal{L}(\cdot)$ of $R(\cdot, 0)$: a necessary and sufficient condition for the wide-sense stationary process $n(\cdot)$ to be representable as the output of a stable, constant, finite-dimensional linear dynamic system of the form (82) is that its spectrum $\Lambda(\cdot) = \mathcal{L}[E\{n(t)n'(0)\}]$ is a rational function of the form

for some polynomial $p(s) = s^n + \sum_i a_i s^i$ with all roots having negative real parts, some matrix-valued polynomial $Q(s) = \sum_i Q_i s^i$ with degree at most $n - 1$ and all zeros in the left half-plane, and some matrix $T$ [21]-[24].
Remark 10 (Smoothing): Consider now the problem of finding $\hat{x}(t|T) = E^*\{x(t)|Z_T\}$, where $T > t$. Proceeding as in the filtering case, we first write
$$\hat{x}(t|T) = \int_{t_0}^{T} H(t, T; \tau)\,\tilde{z}(\tau|\tau)\, d\tau$$
and then apply the characterization of $\hat{x}(t|T)$ as the linear estimator whose estimation error is uncorrelated with all the data up to time $T$ to obtain
$$\mathrm{cov}\,[x(t), \tilde{z}(\sigma|\sigma)] = \int_{t_0}^{T} H(t, T; \tau)\,\mathrm{cov}\,[\tilde{z}(\tau|\tau), \tilde{z}(\sigma|\sigma)]\, d\tau = H(t, T; \sigma)\,\Theta(\sigma), \quad t_0 \le \sigma < T.$$
Now observe that for $\sigma < t$ this equation is identical to (72) and $H(t, T; \sigma) = G(t, \sigma) = \Psi(t, \sigma)\Sigma(\sigma)C'(\sigma)\Theta^{-1}(\sigma)$, so that, splitting the integral into two parts, we can write (85) as
$$\hat{x}(t|T) = \int_{t_0}^{t} G(t, \tau)\,\tilde{z}(\tau|\tau)\, d\tau + \int_{t}^{T} \mathrm{cov}\,[x(t), \tilde{z}(\tau|\tau)]\,\Theta^{-1}(\tau)\,\tilde{z}(\tau|\tau)\, d\tau = \hat{x}(t|t) + \int_{t}^{T} \mathrm{cov}\,[x(t), \tilde{z}(\tau|\tau)]\,\Theta^{-1}(\tau)\,\tilde{z}(\tau|\tau)\, d\tau. \tag{86}$$
Now for $t < \tau$ we have that
$$\mathrm{cov}\,[x(t), \tilde{z}(\tau|\tau)] = \mathrm{cov}\,[x(t), \tilde{x}(\tau|\tau)]\,C'(\tau) + 0 = \mathrm{cov}\,[\hat{x}(t|t) + \tilde{x}(t|t), \tilde{x}(\tau|\tau)]\,C'(\tau) = 0 + \mathrm{cov}\,[\tilde{x}(t|t), \tilde{x}(\tau|\tau)]\,C'(\tau) = P(t, \tau)\,C'(\tau) \tag{87}$$
and, using (76), we calculate, for $t \le \tau$,
$$P(t, \tau) \triangleq \mathrm{cov}\,[\tilde{x}(t|t), \tilde{x}(\tau|\tau)] = \Sigma(t)\,\Psi'(\tau, t), \quad t \le \tau \tag{88a}$$
where, as before, $\Psi(\tau, t)$ is the transition matrix associated with $\dot{w}(t) = [A(t) - \Sigma(t)C'(t)\Theta^{-1}(t)C(t)]\,w(t)$. We note in passing that, for $t \ge \tau$, we have
$$P(t, \tau) \triangleq \mathrm{cov}\,[\tilde{x}(t|t), \tilde{x}(\tau|\tau)] = \Psi(t, \tau)\,\Sigma(\tau), \quad t \ge \tau. \tag{88b}$$
Thus
$$\hat{x}(t|T) = \hat{x}(t|t) + \Sigma(t) \int_{t}^{T} \Psi'(\tau, t)\,C'(\tau)\Theta^{-1}(\tau)\,[z(\tau) - C(\tau)\hat{x}(\tau|\tau)]\, d\tau \tag{89}$$
and subtracting each side of this equation from $x(t)$ yields
$$\tilde{x}(t|T) = \tilde{x}(t|t) - \Sigma(t) \int_{t}^{T} \Psi'(\tau, t)\,C'(\tau)\Theta^{-1}(\tau)\,\tilde{z}(\tau|\tau)\, d\tau$$
from which a direct calculation using (69) and (88) shows
$$\Gamma(t) \triangleq \mathrm{cov}\,[\tilde{x}(t|T), \tilde{x}(t|T)] = \Sigma(t) - \Sigma(t) \left[ \int_{t}^{T} \Psi'(\tau, t)\,C'(\tau)\Theta^{-1}(\tau)C(\tau)\,\Psi(\tau, t)\, d\tau \right] \Sigma(t). \tag{90}$$
If $T$ is fixed and we differentiate (89) with respect to $t$ using (73), (78), and the identity [2] $(d/dt)\Psi'(\tau, t) = -[A(t) - \Sigma(t)C'(t)\Theta^{-1}(t)C(t)]'\,\Psi'(\tau, t)$, we find after some algebra and the reuse of (89) that
$$(d/dt)\,\hat{x}(t|T) = A(t)\hat{x}(t|T) + D(t)\Xi(t)D'(t)\Sigma^{-1}(t)\,[\hat{x}(t|T) - \hat{x}(t|t)] \tag{91}$$
which is integrated backwards from the final time $T$, with the boundary condition $\hat{x}(T|T)$ obtained by integrating (75) forward from $t_0$ to $T$. Notice that once $\hat{x}(t|t)$ has been found over the entire interval from $t_0$ to $T$ there is no need to retain the output measurements, since they are not needed in (91). We remark that $\Sigma^{-1}(t)$ can be computed directly from
$$(d/dt)\,\Sigma^{-1} = -A'\Sigma^{-1} - \Sigma^{-1}A - \Sigma^{-1}D\Xi D'\Sigma^{-1} + C'\Theta^{-1}C, \qquad \Sigma^{-1}(t_0) = \Sigma_0^{-1}$$
which follows from (78) and the identity [2] $(d/dt)(\Sigma^{-1}(t)) = -\Sigma^{-1}(t)\dot{\Sigma}(t)\Sigma^{-1}(t)$. Differentiation of (90) shows, after some algebra, that $\Gamma(t)$ satisfies
$$\dot{\Gamma}(t) = [A(t) + D(t)\Xi(t)D'(t)\Sigma^{-1}(t)]\,\Gamma(t) + \Gamma(t)\,[A(t) + D(t)\Xi(t)D'(t)\Sigma^{-1}(t)]' - D(t)\Xi(t)D'(t)$$
with terminal condition $\Gamma(T) = \Sigma(T)$.
Various other representations of the solution to the smoothing problem have been proposed when the problem is specialized to the so-called fixed-interval, fixed-point, or fixed-lag smoothing problems. A summary of these can be found in [9], including the two-filter solution to the fixed-interval problem given in [27].

IX. THE DUALITY BETWEEN LEAST SQUARES ESTIMATION AND LEAST SQUARES CONTROL
Consider the least squares regulator problem involving the linear dynamic system
$$\dot{p}(t) = -A'(t)p(t) - C'(t)w(t) \tag{92a}$$
$$v(t) = D'(t)p(t) \tag{92b}$$
operating in reverse time from the boundary condition
$$p(t_1) = p_1 \tag{93}$$
and the quadratic cost functional
$$J[w] = p'(t_0)\Sigma_0 p(t_0) + \int_{t_0}^{t_1} [w'(t)\Theta(t)w(t) + v'(t)\Xi(t)v(t)]\, dt = p'(t_0)\Sigma_0 p(t_0) + \int_{t_0}^{t_1} [w'(t)\Theta(t)w(t) + p'(t)D(t)\Xi(t)D'(t)p(t)]\, dt. \tag{94}$$
It is well known [2] that the control law $w^\circ(\cdot, \cdot): R^n \times [t_0, t_1] \to R^m$ that minimizes $J[w]$ for all boundary conditions $p(t_1) = p_1$ is given by
$$w^\circ(p(t), t) = -L'(t)p(t) = -\Theta^{-1}(t)C(t)K(t)p(t) \tag{95}$$
where the $n \times n$ nonnegative definite matrix $K(t)$ satisfies the Riccati equation
$$\dot{K}(t) = A(t)K(t) + K(t)A'(t) - K(t)C'(t)\Theta^{-1}(t)C(t)K(t) + D(t)\Xi(t)D'(t) \tag{96a}$$
with initial condition (terminal condition in reverse time)
$$K(t_0) = \Sigma_0. \tag{96b}$$
Furthermore, the corresponding minimum cost is
$$J[w^\circ] = p_1'K(t_1)p_1. \tag{97}$$
It should be noted at the outset that the Riccati equation (96a) and boundary condition (96b) for $K(\cdot)$ are identical to the Riccati equation and boundary condition (78) for $\Sigma(\cdot)$ in the Kalman-Bucy filter of Proposition 6. Furthermore, making the consequent identification of $K(t)$ with $\Sigma(t)$, it is seen that the time-varying gain matrix $L'(t) = \Theta^{-1}(t)C(t)K(t)$ in the negative feedback loop that implements the optimum control law (95) for the least squares control problem is simply the transpose of the time-varying gain matrix $L(t) = \Sigma(t)C'(t)\Theta^{-1}(t)$ that operates on the innovation $z(t) - C(t)\hat{x}(t|t)$ in the Kalman-Bucy filter of Proposition 6. It can thus be seen that there is a one-to-one correspondence between the solutions of least squares estimation problems of the type discussed in Section VIII and the solutions of least squares control problems of the type discussed above. Partly because dual (or adjoint) systems are involved, this correspondence is often referred to as the duality between least squares estimation and least squares control.
In this vein, we remark that the linear dynamic system defined by (92) with boundary condition at the terminal time $t_1$ is the dual (or adjoint) of the linear dynamic system
$$\dot{x}(t) = A(t)x(t) + D(t)\xi(t) \tag{98a}$$
$$y(t) = C(t)x(t) \tag{98b}$$
with boundary condition at the initial time $t_0$. Thus the system considered in the above least squares control problem is the dual of the system involved in the least squares estimation problem of Section VIII. Furthermore, the Kalman-Bucy filter of Proposition 6, viz.,
$$(d/dt)\,\hat{x}(t|t) = [A(t) - L(t)C(t)]\,\hat{x}(t|t) + L(t)z(t) \tag{99}$$
with boundary condition at the initial time $t_0$ has as its dual the system
$$\dot{p}(t) = -[A(t) - L(t)C(t)]'\,p(t) \tag{100a}$$
$$v(t) = -L'(t)p(t) \tag{100b}$$
with boundary condition at the terminal time $t_1$. The state equation of this dual system will be immediately recognized as the optimum closed-loop system for the above least squares control problem, while the output equation
defines the corresponding optimum control, i.e., $v(t) = w^\circ(p(t), t) = -L'(t)p(t)$. Thus the solution to the above least squares control problem is given by the dual of the solution to the least squares estimation problem of Section VIII.
The duality between estimation and control provides a direct and convenient means for examining the properties of the Kalman-Bucy filter and its associated Riccati equation by drawing on established properties of the solution to the dual least squares control problem. In particular, we can examine the existence and properties of any steady-state solution to the Kalman filtering problem by appealing to the known solution of the corresponding infinite-time regulator problem. With this in mind, we identify in Table I the corresponding properties of the two problems and their solutions. Since our greatest subsequent interest is in the case where the systems are constant, and the noises are stationary or, equivalently, the weighting matrices $\Xi$ and $\Theta$ in the cost functional are constant, we restrict attention to this situation.

TABLE I
Item | Least Squares Estimation Problem | Least Squares Control Problem
System | $\dot{x} = Ax + D\xi$, $z = Cx + \theta$ | $\dot{p} = -A'p - C'w$, $v = D'p$
Boundary condition | At $t_0$ | At $t_1$
Cost functional | | $J[w] = p'(t_0)\Sigma_0 p(t_0) + \int_{t_0}^{t_1} (w'\Theta w + v'\Xi v)\, dt$
Solution | $(d/dt)\hat{x} = (A - LC)\hat{x} + Lz$; $L = \Sigma C'\Theta^{-1}$; $\dot{\Sigma} = A\Sigma + \Sigma A' - \Sigma C'\Theta^{-1}C\Sigma + D\Xi D'$; $\Sigma(t_0) = \Sigma_0$ | $\dot{p}^\circ = -(A - LC)'p^\circ$; $w^\circ = -L'p^\circ$; $L = KC'\Theta^{-1}$; $\dot{K} = AK + KA' - KC'\Theta^{-1}CK + D\Xi D'$; $K(t_0) = \Sigma_0$
Transition matrix | $\Phi(t, s) = e^{A(t - s)}$ | $\Phi^*(t, s) = e^{A'(s - t)}$
Controllability matrix (estimation) / Observability matrix (control) | $[D, AD, \ldots, A^{n-1}D]$ has rank $n$ $\Leftrightarrow$ completely controllable | $[D, AD, \ldots, A^{n-1}D]$ has rank $n$ $\Leftrightarrow$ completely observable
Observability matrix (estimation) / Controllability matrix (control) | $[C', A'C', \ldots, (A')^{n-1}C']$ has rank $n$ $\Leftrightarrow$ completely observable | $[C', A'C', \ldots, (A')^{n-1}C']$ has rank $n$ $\Leftrightarrow$ completely controllable

The following properties of the solution to the Riccati equation (96) with initial condition $\Sigma_0$ are well known for the least squares control problem. We exploit the duality discussed above to write these properties directly in terms of the notation of the estimation problem, and include in brackets the equivalent conditions on the system involved in the control problem from which these properties are deduced. Of particular importance here is the well-known fact that a constant system is completely controllable (respectively completely observable) if and only if its dual is completely observable (respectively completely controllable); this is evident from the last two lines of Table I.
For fixed but arbitrary $t$, the solution $\Sigma_{t_0}(t)$ to the Riccati equation (96) with boundary condition $\Sigma_{t_0}(t_0) = 0$ has the following properties as the initial time $t_0$ decreases to increasingly negative values.
1) If the constant system (98a) is completely observable from the output (98b) [equivalently, if the dual system (92) is completely controllable from the input $w$], then $\Sigma_{t_0}(t)$ is monotone nondecreasing as $t_0$ decreases, and is uniformly bounded from above. As $t_0 \to -\infty$, $\Sigma_{t_0}(t)$ approaches a limit, $\Sigma_\infty$, that is independent of $t$ and is a (not necessarily unique) nonnegative definite solution to the algebraic Riccati equation
$$A\Sigma_\infty + \Sigma_\infty A' - \Sigma_\infty C'\Theta^{-1}C\Sigma_\infty + D\Xi D' = 0. \tag{101}$$
2) If, in addition, the constant system (98) is completely controllable from the input $\xi$ [equivalently, if the dual system (92a) is completely observable from the output (92b)], then $\Sigma_{t_0}(t)$ is positive definite for all $t > t_0$ and $\Sigma_\infty$ is the unique positive definite solution to the algebraic Riccati equation (101).
3) If the constant system (98) [equivalently, the dual system (92)] is both completely controllable and completely observable, then the eigenvalues of $A - \Sigma_\infty C'\Theta^{-1}C$ have strictly negative real parts, so that the system
$$(d/dt)\,\hat{x}(t|t) = A\hat{x}(t|t) + \Sigma_\infty C'\Theta^{-1}[z(t) - C\hat{x}(t|t)] \tag{102}$$
is asymptotically stable, and so bounded inputs result in bounded outputs, operating in forward time. [Equivalently, the dual system
$$\dot{p}(t) = -[A - \Sigma_\infty C'\Theta^{-1}C]'\,p(t)$$
is asymptotically stable when operating in reverse time.]
With these facts at hand, it is a simple matter to show that the constant feedback control law
$$w^\circ(p(t), t) = -L_\infty' p(t) = -\Theta^{-1}C\Sigma_\infty p(t) \tag{103}$$
is the unique solution to the infinite-time regulator problem defined by the constant completely controllable and completely observable system (92) and the cost functional (94), where the weighting matrices $\Theta$ and $\Xi$ are constant and $t_0 = -\infty$ (see, e.g., [8]). The natural analog of this result in the estimation case is that the steady-state Kalman filter (102) is the best linear estimator of the state of the completely controllable and completely observable constant system
$$\dot{x}(t) = Ax(t) + D\xi(t)$$
$$z(t) = Cx(t) + \theta(t) \tag{104}$$
in terms of the output process $z(\cdot)$ over $(-\infty, t)$. Since, however, the system (104) has been operating since $-\infty$, the processes $x(\cdot)$ and $z(\cdot)$ will not be well defined unless the stability properties of the system are such that the covariance of $x(t)$ has reached a steady-state value $\mathrm{cov}\,[x(t), x(t)] = \Sigma_x$ which is given by the solution to the algebraic matrix equation
$$A\Sigma_x + \Sigma_x A' = -D\Xi D'.$$
A sufficient condition for there to exist a unique such solution is that all the eigenvalues of $A$ have strictly negative real parts [2].
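To make the steady-state covariance condition concrete, the following sketch solves the algebraic matrix equation $A\Sigma_x + \Sigma_x A' = -D\Xi D'$ for a two-state example by writing it as three linear equations in the entries of the symmetric unknown. The helper names, the example matrices, and the restriction to the 2 x 2 case are assumptions made for this illustration only.

```python
def solve_linear(m, rhs):
    """Solve m x = rhs by Gaussian elimination with partial pivoting.

    Raises ZeroDivisionError if m is singular, which for the Lyapunov system
    below happens when two eigenvalues of A sum to zero.
    """
    n = len(rhs)
    aug = [list(m[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= f * aug[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def steady_state_covariance(a, q):
    """Solve A S + S A' = -Q for symmetric 2x2 S, where Q stands for D Xi D'."""
    m = [
        [2.0 * a[0][0], 2.0 * a[0][1], 0.0],          # (1,1) entry
        [a[1][0], a[0][0] + a[1][1], a[0][1]],        # (1,2) entry
        [0.0, 2.0 * a[1][0], 2.0 * a[1][1]],          # (2,2) entry
    ]
    s11, s12, s22 = solve_linear(m, [-q[0][0], -q[0][1], -q[1][1]])
    return [[s11, s12], [s12, s22]]


def lyapunov_residual(a, s, q):
    """Return A S + S A' + Q, which should vanish at the solution."""
    a_s = [[sum(a[i][k] * s[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
    return [[a_s[i][j] + a_s[j][i] + q[i][j] for j in range(2)] for i in range(2)]


# Hypothetical stable system: eigenvalues of A are -1 and -2, D = [0, 1]', Xi = 1.
A = [[0.0, 1.0], [-2.0, -3.0]]
Q = [[0.0, 0.0], [0.0, 1.0]]
S = steady_state_covariance(A, Q)
R = lyapunov_residual(A, S, Q)
```

Because both eigenvalues of the assumed $A$ have negative real parts, the linear system is nonsingular and the solution is unique, in line with the sufficient condition stated above.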
Under these circumstances the problem becomes identical to the classical Wiener filtering problem, and the steady-state Kalman filter (102) is the optimum realizable Wiener filter for this problem.
Perhaps of more practical importance, however, is that the steady-state Kalman filter (102) is trivially the solution to the finite-time estimation problem of Section VIII if the covariance at the finite initial time $t_0$ is taken to be $\mathrm{cov}\,[x(t_0), x(t_0)] = \Sigma_\infty$. Also of importance is the result that, even if $\mathrm{cov}\,[x(t_0), x(t_0)]$ is not $\Sigma_\infty$ but some other value $\Sigma_0$, the error in using the steady-state Kalman filter (102) instead of the correct time-varying filter approaches zero as $t \to \infty$ [25], [26].
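A scalar sketch can make these steady-state statements tangible. Under the assumption of a constant scalar system, the algebraic Riccati equation (101) is a quadratic with a closed-form nonnegative root, and the time-varying gain obtained from the differential Riccati equation can be compared with the steady-state gain. The function names, the parameter values, and the use of forward Euler integration are choices made only for this example, and the quantity reported is the gain mismatch rather than the estimation error itself.

```python
import math


def algebraic_riccati_scalar(a, c, d, xi, theta):
    """Nonnegative root of 2*a*S - (c*S)**2/theta + d*d*xi = 0 (scalar form of (101))."""
    return (theta / (c * c)) * (a + math.sqrt(a * a + c * c * d * d * xi / theta))


def gain_mismatch(sigma0, a, c, d, xi, theta, t_end, steps=20000):
    """Integrate the scalar Riccati equation by forward Euler from Sigma(0) = sigma0
    and return |L(t_end) - L_inf|, the gap between the time-varying gain and the
    steady-state gain."""
    sigma_inf = algebraic_riccati_scalar(a, c, d, xi, theta)
    h = t_end / steps
    sigma = sigma0
    for _ in range(steps):
        sigma += h * (2.0 * a * sigma - (c * sigma) ** 2 / theta + d * d * xi)
    return abs(sigma - sigma_inf) * abs(c) / theta


# Hypothetical constant system: a = -0.5, c = d = 1, xi = theta = 1.
a, c, d, xi, theta = -0.5, 1.0, 1.0, 1.0, 1.0
sigma_inf = algebraic_riccati_scalar(a, c, d, xi, theta)
closed_loop_pole = a - sigma_inf * c * c / theta  # scalar form of A - Sigma_inf C' Theta^{-1} C
are_residual = 2.0 * a * sigma_inf - (c * sigma_inf) ** 2 / theta + d * d * xi
mismatch_from_zero = gain_mismatch(0.0, a, c, d, xi, theta, t_end=10.0)
mismatch_from_large = gain_mismatch(5.0, a, c, d, xi, theta, t_end=10.0)
```

For these assumed values the closed-loop pole is negative and the gain mismatch is negligible after ten time units from either initial covariance, which is consistent with the convergence result cited from [25], [26].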
ACKNOWLEDGMENT
The comments and suggestions of D. L. Snyder, T. Kailath, and A. S. Gilman are gratefully acknowledged.
REFERENCES
[1] E. Parzen, Modern Probability Theory and Its Applications. New York: Wiley, 1960.
[2] R. W. Brockett, Finite-Dimensional Linear Systems. New York: Wiley, 1970.
[3] W. M. Wonham, "On the separation theorem of stochastic control," SIAM J. Contr., vol. 6, no. 2, pp. 312-326, 1968.
[4] H. Cramer, Mathematical Methods of Statistics. Princeton, N.J.: Princeton University Press, 1946.
[5] R. Bellman, Introduction to Matrix Analysis, 2nd ed. New York: McGraw-Hill, 1970.
[6] M. Athans, "The matrix minimum principle," Inform. Contr., vol. 11, pp. 592-606, Nov./Dec. 1967.
[7] T. Kailath, "The innovations approach to detection and estimation theory," Proc. IEEE, vol. 58, pp. 680-695, May 1970.
[8] T. Kailath, "An innovations approach to least-squares estimation, Part I: Linear filtering in additive white noise," IEEE Trans. Automat. Contr., vol. AC-13, pp. 646-655, Dec. 1968.
[9] T. Kailath and P. Frost, "An innovations approach to least-squares estimation, Part II: Linear smoothing in additive white noise," IEEE Trans. Automat. Contr., vol. AC-13, pp. 655-660, Dec. 1968.
[10] P. A. Frost and T. Kailath, "An innovations approach to least-squares estimation, Part III: Nonlinear estimation in white Gaussian noise," IEEE Trans. Automat. Contr., vol. AC-16, pp. 217-226, June 1971.
[11] T. Kailath and R. Geesey, "An innovations approach to least-squares estimation, Part IV: Recursive estimation given lumped covariance functions," this issue, pp. 720-727.
[12] P. Frost, "Nonlinear estimation in continuous-time systems," Ph.D. dissertation, Dep. Elec. Eng., Stanford Univ., Stanford, Calif., June 1968.
[13] D. G. Luenberger, Optimization by Vector Space Methods. New York: Wiley, 1969.
[14] R. E. Kalman, "A new approach to linear filtering and prediction problems," Trans. ASME, vol. 82, ser. D, pp. 35-43, Mar. 1960.
[15] K. Ito and H. P. McKean, Diffusion Processes and Their Sample Paths. New York: Springer, 1965.
[16] H. P. McKean, Stochastic Integrals. New York: Academic Press, 1969.
[17] I. I. Gikhman and A. V. Skorokhod, Introduction to the Theory of Random Processes. Philadelphia: Saunders, 1969.
[18] E. Wong, Stochastic Processes in Information and Dynamic Systems. New York: McGraw-Hill, 1971.
[19] R. E. Kalman and R. S. Bucy, "New results in linear filtering and prediction theory," Trans. ASME, vol. 83, ser. D, pp. 95-108, Mar. 1961.
[20] B. D. O. Anderson, J. B. Moore, and S. G. Loo, "Spectral factorization of time-varying covariance functions," IEEE Trans. Inform. Theory, vol. IT-15, pp. 550-557, Sept. 1969.
[21] R. Geesey, "Canonical representations of second-order processes with applications," Ph.D. dissertation, Dep. Elec. Eng., Stanford Univ., Stanford, Calif., Dec. 1968.
[22] T. Kailath and R. Geesey, "Covariance factorization: An explication via examples," in Proc. 2nd Asilomar Conf. Systems and Circuits, Monterey, Calif., Nov. 1968.
[23] D. C. Youla, "On the factorization of rational matrices," IRE Trans. Inform. Theory, vol. IT-7, pp. 172-189, July 1961.
[24] B. D. O. Anderson, "An algebraic solution to the spectral factorization problem," IEEE Trans. Automat. Contr., vol. AC-12, pp. 410-414, Aug. 1967.
[25] B. Friedland, "On the effect of incorrect gain in Kalman filter," IEEE Trans. Automat. Contr. (Corresp.), vol. AC-12, p. 610, Oct. 1967.
[26] R. A. Singer and P. A. Frost, "On the relative performance of the Kalman and Wiener filters," IEEE Trans. Automat. Contr., vol. AC-14, pp. 390-394, Aug. 1969.
[27] D. C. Fraser and J. E. Potter, "The optimum linear smoother as a combination of two optimum linear filters," IEEE Trans. Automat. Contr., vol. AC-14, pp. 387-390, Aug. 1969.
Ian B. Rhodes (67) was born in Melbourne, Australia, on May 29, 1941. He received the B.E. and M.Eng.Sc. degrees in electrical engineering from the University of Melbourne, Melbourne, Australia, in 1963 and 1965, respectively, and the Ph.D. degree in electrical engineering from Stanford University, Stanford, Calif., in 1968.
In January 1968 he was appointed Assistant Professor of Electrical Engineering at the Massachusetts Institute of Technology and taught there until September 1970, when he joined the faculty of Washington University, St. Louis, Mo., as Associate Professor of Engineering and Applied Science in the graduate program of Control Systems Science and Engineering. His current research interests lie in linear system theory, optimal estimation and control, and minimax problems.
Dr. Rhodes is a member of the Society of Industrial and Applied Mathematics, the Institute of Radio and Electronics Engineers (Australia), and Sigma Xi, and is an Associate Editor of the IFAC journal Automatica.