
Automatica, Vol. 7, pp. 465-479. Pergamon Press, 1971. Printed in Great Britain.

Recursive Bayesian Estimation Using Gaussian Sums*


H. W. SORENSON† and D. L. ALSPACH‡
The approximation of the probability density p(x_k | Z_k) of the state of a noisy dynamic
system conditioned on available measurement data using a convex combination of
gaussian densities is proposed as a practical means for accomplishing non-linear filtering.
Summary--The Bayesian recursion relations which describe
the behavior of the a posteriori probability density function
of the state of a time-discrete stochastic system conditioned
on available measurement data cannot generally be solved
in closed-form when the system is either non-linear or non-
gaussian. In this paper a density approximation involving
convex combinations of gaussian density functions is intro-
duced and proposed as a meaningful way of circumventing
the difficulties encountered in evaluating these relations and
in using the resulting densities to determine specific estima-
tion policies. It is seen that as the number of terms in the
gaussian sum increases without bound, the approximation
converges uniformly to any density function in a large class.
Further, any finite sum is itself a valid density function unlike
many other approximations that have been investigated.
The problem of determining the a posteriori density and
minimum variance estimates for linear systems with non-
gaussian noise is treated using the gaussian sum approxima-
tion. This problem is considered because it can be dealt
with in a relatively straightforward manner using the
approximation but still contains most of the difficulties that
one encounters in considering non-linear systems since the
a posteriori density is nongaussian. After discussing the
general problem from the point-of-view of applying gaussian
sums, a numerical example is presented in which the actual
statistics of the a posteriori density are compared with the
values predicted by the gaussian sum and by the Kalman
filter approximations.
I. INTRODUCTION
THE PROBLEM of estimating the state of a non-linear stochastic system from noisy measurement data is considered. This problem has been the subject of considerable research interest during the past few years, and JAZWINSKI [1] gives a thorough discussion of the subject. Although a great deal has been published on the subject, the basic objective of obtaining a solution that can be implemented in a straightforward manner for specific applications has not been satisfactorily realized. This is manifested by the fact that the Kalman filter equations [2, 3], which are valid only for linear, gaussian systems, continue to be widely used for non-linear, non-gaussian systems. Of course continued application has resulted in the development of ad hoc techniques [e.g. Refs. 4, 5] that have improved the performance of the Kalman filter and which give it some of the characteristics of non-linear filters.

* Received 24 August 1970; revised 18 December 1970. The original version of this paper was not presented at any IFAC meeting. It was recommended for publication in revised form by associate editor L. Meier.
† Department of the Aerospace and Mechanical Engineering Sciences, University of California at San Diego, La Jolla, California 92037, U.S.A.
‡ Department of Electrical Engineering, Colorado State University, Fort Collins, Colorado, U.S.A.
Central to the non-linear estimation and stochastic control problems is the determination of the probability density function of the state conditioned on the available measurement data. If this a posteriori density function were known, an estimate of the state for any performance criterion could be determined. Unfortunately, although the manner in which the density evolves with time and additional measurement data can be described in terms of differential, or difference, equations [6-8], these relations are generally very difficult to solve either in closed form or numerically, so that it is usually impossible to determine the a posteriori density for specific applications. Because of this difficulty it is natural to investigate the possibility of approximating the density with some tractable form. It is to this approximation problem that this discussion is directed.
1.1. The general problem

The approximation that is discussed below is introduced as a means of dealing with the following system and filtering problem. Suppose that the state x evolves according to

x_{k+1} = f_k(x_k, w_k)    (1)
and that the behavior of the state is observed imperfectly through the measurement data

z_k = h_k(x_k, v_k).    (2)

The w_k and v_k represent white noise sequences and are assumed to be mutually independent.
The basic problem that is considered is that of estimating the state x_k from the measurement data* Z_k for each k, that is, the filtering problem. Generally, one attempts to determine a "best" estimate by choosing the estimate to extremize some performance criterion. For example, the estimate could be selected to minimize the mean-square error. Regardless of the performance criterion, given the a posteriori density function p(x_k | Z_k), any type of estimate can be determined. Thus, the estimation problem can first be approached as the problem of determining the a posteriori density. This is generally referred to as the Bayesian approach [9].
1.2. The Bayesian approach

As has been demonstrated by the great interest in the Kalman filter, it is frequently desirable to determine the estimates recursively. That is, an estimate of the current state is updated as a function of a previous estimate and the most recent or new measurement data. In the Bayesian case, the a posteriori density can be determined recursively according to the following relations.
p(x_k \mid Z_k) = \frac{p(x_k \mid Z_{k-1})\, p(z_k \mid x_k)}{p(z_k \mid Z_{k-1})}    (3)

p(x_k \mid Z_{k-1}) = \int p(x_{k-1} \mid Z_{k-1})\, p(x_k \mid x_{k-1})\, dx_{k-1}    (4)

where the normalizing constant p(z_k \mid Z_{k-1}) in equation (3) is given by

p(z_k \mid Z_{k-1}) = \int p(x_k \mid Z_{k-1})\, p(z_k \mid x_k)\, dx_k.    (5)

The initial density p(x_0 \mid Z_0) is given by

p(x_0 \mid Z_0) = \frac{p(z_0 \mid x_0)\, p(x_0)}{p(z_0)}.    (6)
The density p(z_k | x_k) in equation (3) is determined by the a priori measurement noise density p(v_k) and the measurement equation (2). Similarly, the p(x_k | x_{k-1}) in equation (4) is determined by p(w_{k-1}) and equation (1). Knowledge of these densities and p(x_0) determines the p(x_k | Z_k) for all k. However, it is generally impossible to accomplish the integration indicated in equation (4) in closed form so that the density cannot actually be determined for most
* The upper case letter Z_k denotes the set (z_0, z_1, ..., z_k).
applications. The principal exception occurs when
the plant and measurement equations are linear and
the initial state and the noise sequences are gaussian.
Then, equations (3-6) can be evaluated and the
a posteriori density p(x_k | Z_k) is gaussian for all k.
The mean and covariances for this system are
known as the Kal man filter equations.
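
As a concrete illustration of how relations (3)-(5) can be evaluated when no closed form is available, the following sketch (not part of the original paper) carries out one recursion numerically on a fixed grid of state values for a scalar system; the function and variable names are illustrative assumptions only.

```python
import numpy as np

def bayes_filter_step(prior_grid, x_grid, z, p_meas, p_trans):
    """One cycle of equations (3)-(5) on a fixed grid of state values.

    prior_grid   : p(x_k | Z_{k-1}) evaluated on x_grid
    p_meas(z, x) : measurement likelihood p(z_k | x_k)
    p_trans(x1, x): transition density p(x_{k+1} | x_k)
    """
    dx = x_grid[1] - x_grid[0]

    # Equation (3): multiply the prior by the likelihood, normalize by (5).
    likelihood = p_meas(z, x_grid)
    posterior = prior_grid * likelihood
    posterior /= np.sum(posterior) * dx

    # Equation (4): propagate through the plant to get p(x_{k+1} | Z_k).
    trans = p_trans(x_grid[:, None], x_grid[None, :])  # rows: x_{k+1}, cols: x_k
    predicted = trans @ posterior * dx

    return posterior, predicted
```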
When the system is non-linear and/or the a priori distributions are non-gaussian, two problems are encountered. First, the integration in equation (4) cannot be accomplished in closed form and, second, the moments are not easily obtained from equation (3). These problems lead to the investigation of density approximations for which the required operations can be accomplished in a straightforward manner. A particularly promising approximation is considered here. Emphasis in this discussion is on the approximation itself rather than the non-linear filtering problem, but the latter has been stated because it provides the motivation for considering an approximation. The approximation of probability density functions is discussed in section 2. The use of the gaussian sum approximation for a linear system with non-gaussian a priori density functions for the initial state and for the plant and measurement noise sequences is discussed in section 3.
2. APPROXIMATION OF DENSITIES
The problem of approximating density functions in order to facilitate the determination of the a posteriori density and its moments has been considered previously [10] using Edgeworth and Gram-Charlier expansions. While this approach has several advantages and its utility has been demonstrated in some applications, it has the distinct disadvantage that when these series are truncated, the resulting density approximation is not positive for all values of the independent variable. Thus, the approximation is not a valid density function itself. To avoid or at least reduce the negativity of the density approximation, it is sometimes necessary to retain a large number of terms in the series, which can make the approximation computationally unattractive. Further, it is well known that the series converges to a given density only under somewhat restrictive conditions. Thus, although the Edgeworth expansion is a convenient choice in many ways, it has proven desirable to seek an approximation which eliminates these difficulties. The gaussian sum approximation that is described here exhibits the basic advantages of the Edgeworth expansion but has none of the disadvantages already noted. That is, the approximation is always a valid density function and, further, converges uniformly to any density of practical concern. An approximation of this type was
suggested briefly by AOKI [11]. More recently, CAMERON [12] and LO [13] have assumed that the a priori density function for the initial state of linear systems with gaussian noise sequences has the form of a gaussian sum.
2.1. Theoretical foundations of the approximation
Consider a probability density function p which is assumed to have the following properties:

1. p is defined and continuous at all but a finite number of locations;
2. \int_{-\infty}^{\infty} p(x)\, dx = 1;
3. p(x) \ge 0 for all x.

It is convenient although not necessary to consider only scalar random variables. The generalization to the vector case is not difficult but complicates the presentation and, it is felt, unnecessarily detracts from the basic ideas.
The probl em of approxi mat i ng p can be con-
veniently considered within the context of delta
families of positive t ype [14]. Basically, these are
families of functions which converge t o a delta, or
impulse, funct i on as a paramet er characterizing the
fami l y converges to a limit value. More precisely,
let {64} be a fami l y of functions on ( - o% oo) which
are integrable over every interval. This is called a
delta f a mi l y o f positive type if the following con-
ditions are satisfied.
(i) I a a 6~(x)dx tends t o one as 2 tends t o some
limit value ;to for some a.
(ii) For every const ant y>_0, no mat t er how
small, 6x tends to zero uni forml y for
7 < [ x [ < ~ as 2 tends to 20.
(iii) 6z ( x ) > 0 for all x and 2.
Using the delta families, the following result can be used for the approximation of a density function p.

Theorem 2.1. The sequence p_λ(x) which is formed by the convolution of δ_λ and p,

p_\lambda(x) = \int_{-\infty}^{\infty} \delta_\lambda(x - u)\, p(u)\, du    (7)

converges uniformly to p(x) on every interior subinterval of (-∞, ∞).

For a proof of this result, see KOREVAAR [14]. When p has a finite number of discontinuities, the theorem is still valid except at the points of discontinuity. It should be noted that essentially the same result is given by Theorem 2.1 in FELLER [15].
If {δ_λ} is required to satisfy the condition that \int_{-\infty}^{\infty} \delta_\lambda(x)\, dx = 1, it follows from equation (7) that p_λ is a probability density function for all λ.
It is basically the presence of the gaussian weighting function that has made the Edgeworth expansion attractive for use in the Bayesian recursion relations. The operations defined by equations (3-6) are simplified when the a priori densities are gaussian or closely related to the gaussian. Bearing this in mind, the following delta family is a natural choice for density approximations. Let

N_\lambda(x) \triangleq \delta_\lambda(x) = (2\pi\lambda^2)^{-1/2} \exp[-x^2/2\lambda^2].    (8)

It is shown without difficulty that N_λ(x) forms a delta family of positive type as λ → 0. That is, as the variance tends to zero, the gaussian density tends to the delta function.
Using equations (7) and (8), the density approximation p_λ is written as

p_\lambda(x) = \int_{-\infty}^{\infty} p(u)\, N_\lambda(x - u)\, du.    (9)
It is this form that provides the basis for the gaussian sum approximation that is the subject of this discussion.

While equation (9) is an interesting result, it does not immediately provide the approximation that can be used for specific application. However, it is clear that p(u)N_λ(x - u) is integrable on (-∞, ∞) and is at least piecewise continuous. Thus, (9) can itself be approximated on any finite interval by a Riemann sum. In particular, consider an approximation of p_λ over some bounded interval (a, b) given by

p_{n,\lambda}(x) = \frac{1}{k} \sum_{i=1}^{n} p(x_i)\, N_\lambda(x - x_i)\, [\xi_i - \xi_{i-1}]    (10)
where the interval (a, b) is divided into n subintervals by selecting points ξ_i such that

a = \xi_0 < \xi_1 < \dots < \xi_n = b.

In each subinterval, choose the point x_i such that

p(x_i)\, [\xi_i - \xi_{i-1}] = \int_{\xi_{i-1}}^{\xi_i} p(x)\, dx
which is possible by the mean-value theorem. The constant k is a normalizing constant equal to

k = \int_{a}^{b} p(x)\, dx

and insures that p_{n,λ} is a density function. Clearly, for (b - a) sufficiently large, k can be made arbitrarily close to 1. Note that it follows that

\frac{1}{k} \sum_{i=1}^{n} p(x_i)\, [\xi_i - \xi_{i-1}] = 1    (11)
so that p_{n,λ} essentially is a convex combination of gaussian density functions N_λ. It is basically this form that will be used in all future discussion and which is referred to hereafter as the gaussian sum approximation. It is important to recognize that the p_{n,λ} that are formed in this manner are valid probability density functions for all n, λ.
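
The construction of equations (10) and (11) translates directly into a short numerical routine. The sketch below is an illustration written for this discussion (Python/NumPy, with assumed names); it uses subinterval midpoints rather than the exact mean-value points x_i, which changes only the weights slightly, and the weights are renormalized so that equation (11) still holds.

```python
import numpy as np

def gaussian_sum_from_density(p, a, b, n, lam):
    """Build the gaussian sum p_{n,lambda} of equation (10) for a density p on (a, b).

    Returns the weights alpha_i, the means x_i and the common standard deviation lam.
    """
    xi = np.linspace(a, b, n + 1)            # partition a = xi_0 < ... < xi_n = b
    x = 0.5 * (xi[:-1] + xi[1:])             # one representative point per subinterval
    alpha = p(x) * np.diff(xi)               # p(x_i)[xi_i - xi_{i-1}]
    alpha /= alpha.sum()                     # divide by k so the weights sum to one
    return alpha, x, lam

def evaluate_gaussian_sum(alpha, mu, sigma, x):
    """Evaluate sum_i alpha_i N_sigma(x - mu_i) at the points x."""
    x = np.atleast_1d(x)[:, None]
    g = np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)
    return (g * alpha).sum(axis=1)
```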
2.2. Implementation of the gaussian sum approximation

The preceding discussion has indicated that a probability density function p that has a finite number of discontinuities can be approximated arbitrarily closely outside of a region of arbitrarily small measure around each point of discontinuity by the gaussian sum p_{n,λ} as defined in equation (10). These asymptotic properties are certainly necessary for the continued investigation of the gaussian sum. However, for practical purposes, it is desirable, in fact imperative, that p can be approximated to within an acceptable accuracy by a relatively small number of terms of the series. This requirement furnishes an additional facet to the problem that is considered in this section.
For the subsequent discussion, it is convenient to write the gaussian sum approximation as

p_n(x) = \sum_{i=1}^{n} \alpha_i\, N_{\sigma_i}(x - \mu_i)    (12)

where

\sum_{i=1}^{n} \alpha_i = 1;  \alpha_i \ge 0 for all i.
The relation of equation (12) to equation (10) is obvious by inspection. Unlike equation (10), in which the variance λ² is common to every term of the gaussian sum, it has been assumed that the variance σ_i² can vary from one term to another. This has been done to obtain greater flexibility for approximations using a finite number of terms. Certainly, as the number of terms increases, it is necessary to require that the σ_i tend to become equal and vanish.
The problem of choosing the parameters α_i, μ_i, σ_i to obtain the "best" approximation p_n to some density function p can be considered. To define this more precisely, consider the L_k norm. The distance between p and p_n can be defined as

\| p - p_n \|_k = \left[ \int_{-\infty}^{\infty} | p(x) - p_n(x) |^k\, dx \right]^{1/k}    (13)

and one can attempt to choose the α_i, μ_i, σ_i (i = 1, 2, ..., n) so that the distance ||p - p_n|| is minimized. As the number of terms n increases and as the variance decreases to zero, the distance must vanish. However, for finite n and nonzero variance, it is reasonable to attempt to minimize the distance in a manner such as this. In doing this, the stochastic estimation problem has been recast at this point as a deterministic curve fitting problem.
There are other norms and problem formulations that could be considered. In many problems, it may be desirable to cause the approximation to match some of the moments, for example, the mean and variance, of the true density exactly. If this were a requirement, then one could consider the moments as constraints on the minimization problem and proceed appropriately. For example, if the mean associated with p is μ, then the constraint that p_n have mean value μ would be

\mu = \sum_{i=1}^{n} \alpha_i \mu_i.    (14)

Thus, equation (14) would be considered in addition to the constraints on the α_i stated after equation (12).
Studies related to the problem of approximating p with a small number of terms have been conducted for a large number of density functions. These investigations have indicated, not surprisingly, that densities which have discontinuities generally are more difficult to approximate than are continuous functions. The results for two density functions, the uniform and the gamma, are discussed below. The uniform density is discontinuous and is of interest from that point of view. The gamma function is nonzero only for positive values of x so is an example of a nonsymmetric density that extends over a semi-infinite range.

Consider the following uniform density function

p(x) = 1/4 for -2 \le x \le 2;  p(x) = 0 elsewhere.    (15)

This distribution has a mean value of zero and variance of 1.333.

Two different methods of fitting equation (15) have been considered. First, consider an approximation that is suggested directly by (10) and referred to subsequently as a Theorem Fit. The parameters of the approximation are chosen in the following general manner.
(1) Select the mean value μ_i of each gaussian so that the densities are equally spaced on (-2, 2). By an appropriate location of the densities the mean value constraint (14) can be satisfied immediately.

(2) The weighting factors α_i are set equal to 1/n so that \sum_{i=1}^{n} \alpha_i = 1.

(3) The variance λ² of each gaussian is the same and is selected so that the L1 distance between p and p_n is minimized.

This approximation procedure requires only a one-dimensional search to determine λ.
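
A minimal sketch of the theorem fit just described, for the uniform density of equation (15): equally spaced means, equal weights 1/n, and a one-dimensional search over the common standard deviation λ that minimizes the L1 distance on a dense grid. The grid limits, the search bounds and the use of scipy.optimize.minimize_scalar are implementation choices made here, not taken from the paper.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def uniform_pdf(x, half_width=2.0):
    """Uniform density of equation (15)."""
    return np.where(np.abs(x) <= half_width, 1.0 / (2 * half_width), 0.0)

def theorem_fit_uniform(n_terms=10, half_width=2.0):
    mu = np.linspace(-half_width, half_width, n_terms)   # symmetric means, so (14) holds
    alpha = np.full(n_terms, 1.0 / n_terms)
    x = np.linspace(-2 * half_width, 2 * half_width, 4001)

    def l1_error(lam):
        g = np.exp(-0.5 * ((x[:, None] - mu) / lam) ** 2) / (np.sqrt(2 * np.pi) * lam)
        p_n = (g * alpha).sum(axis=1)
        return np.trapz(np.abs(p_n - uniform_pdf(x, half_width)), x)

    res = minimize_scalar(l1_error, bounds=(1e-3, half_width), method="bounded")
    return alpha, mu, res.x   # weights, means, optimized standard deviation
```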
To investigate the accuracy and the convergence of the approximation, the number of terms in the sum was varied. Figure 1(a) shows the approximation when 6, 10, 20 and 49 terms were included. It is interesting to observe that the approximation retains the general character of the uniform density even for the six-term case. As should be expected, the largest errors appear in the vicinity of the discontinuities at ±2. These approximations exhibit an apparent oscillation about the true value that is not visually satisfying. This oscillation can be eliminated by using a slightly larger value for the variance of the gaussian terms as is depicted in Fig. 1(b).
The second and fourth moments and the L1 error are listed in Table 1 for these two sets of approximations. The individual terms were located symmetrically about zero so that the mean value of the gaussian sum agrees with that of the uniform density in all cases. Note that for the best fit the error in the variance is only 1.25 per cent when 20 terms are used.
FIG. 1. Gaussian sum approximations of uniform density function. (a) Best theorem fit: 6, 10, 20 and 49 term approximations. (b) Smoothed theorem fit: 6, 10, 20 and 49 term approximations. (c) L2 search fit compared with the 10-term best and smoothed theorem fits.
As should be expected, higher order moments converge more slowly since errors farthest away from the mean assume more importance. For example, the fourth central moment has an error of 5 per cent for the 20-term approximation. The errors in the moments of the smoothed fit are aggravated only slightly although the L1 error increases in a nontrivial manner.
TABLE 1. UNIFORM DENSITY APPROXIMATION

                         Variance   Fourth central moment   L1 error
True values              1.333      3.200                   --

Best theorem fit
  6 terms                1.581      5.320                   0.2199
  10 terms               1.417      3.884                   0.1271
  20 terms               1.354      3.363                   0.0623
  49 terms               1.336      3.226                   0.0272

Smoothed theorem fit
  6 terms                1.690      6.387                   0.2444
  10 terms               1.456      4.224                   0.1426
  20 terms               1.363      3.442                   0.0701
  49 terms               1.338      3.238                   0.0280

L2 search fit
  6 terms                1.419      3.626                   0.0968

As an alternative approach, the parameters α_i, μ_i, σ_i² were chosen to minimize the L2 distance, which is hereafter referred to as an L2 search fit. These results are summarized in Fig. 1(c) for n = 6. Included for comparison in this figure are the 10-term theorem fits from Figs. 1(a) and 1(b). Because of the number of terms involved, obtaining a search fit is significantly more difficult than obtaining a theorem fit for the same number of terms. Thus, the theorem fit may be more desirable from a practical standpoint. The moments and L1 error for the search fit are also included in Table 1. These values are slightly better than the theorem fit involving ten terms. It is interesting in Fig. 1(c) to note the "spikes" that have appeared at the points of discontinuity. This appears to be analogous to the Gibbs phenomenon of Fourier series.

The second example that is discussed here is the gamma density function. It is defined as

p(x) = 0 for x < 0;  p(x) = \frac{x^3 e^{-x}}{6} for x \ge 0.    (16)

The distribution has a mean value of 4 and second, third, and fourth central moments of 4, 8 and 72 respectively.
First, consider a theorem fit of this density in which the mean values are distributed uniformly on (0, 10). For the uniform density, the uniform placement was natural; for the gamma density it is not as appropriate. For example, for n = 6 or 10, it is seen in Fig. 2(a) that the approximating density is not as good, at least visually, as one might hope. The first four central moments are listed in Table 2. Clearly, the higher order moments contain large errors and even the mean value is incorrect in contrast with the uniform density.
TABLE 2. GAMMA DENSITY APPROXIMATION

                       Mean   Variance   Third central moment   Fourth central moment   L1 error
True values            4      4          8                      72                      --

Theorem fit
  6 terms in (0, 10)   3.94   4.345      5.496                  60.78                   0.119
  10 terms in (0, 10)  3.94   3.861      4.941                  47.84                   0.053
  20 terms in (0, 10)  3.93   3.611      4.653                  41.23                   0.023
  20 terms in (0, 12)  4.00   4.206      7.477                  71.41                   0.042

L2 search fit
  1 term in (0, 10)    3.51   3.510      0                      10.53                   0.203
  2 terms in (0, 10)   3.82   3.427      3.04                   34.96                   0.078
  3 terms in (0, 10)   3.91   3.632      4.711                  43.39                   0.036
  4 terms in (0, 10)   3.95   3.744      5.682                  49.57                   0.018
FIG. 2. Gaussian sum approximations of gamma density function. (a) Best theorem fit: 6 and 10 term approximations. (b) Twenty-term approximations over different intervals, (0, 10) and (0, 12). (c) L2 search fit: 3 and 4 term approximations.
Two different 20-term approximations are depicted in Fig. 2(b). In one the mean values of the gaussian terms are selected in the interval (0, 10), whereas the second approximation is distributed in (0, 12). Note in the first case the gaussian sum tends to zero much more rapidly than the gamma function for x > 10. Thus, to improve the approximation and the moments it is necessary to increase the interval over which terms are placed, and the second curve indicates the influence of this change. The results of these two cases are included in Fig. 2 and indicate that increasing the interval over which the approximation is valid has significantly improved the moments.

The theorem fit provides a very simple method for obtaining an approximation. However, it is clear that better results could be obtained by choosing at least the mean values of the individual gaussian terms more carefully. Consider now some L2 search fits. In Fig. 2(c), the search fits for three and four terms are depicted and the moments are listed in Table 2. Clearly, values of the mean and variance appear to be converging to the true values of 4. Note also that the 4-term approximation is considerably better than the 10-term theorem fit. Thus, the search technique, while more difficult to obtain, points out the desirability of judicious placement of the gaussian terms in order to obtain the most suitable approximation.
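
For comparison, the following sketch sets up an L2 search fit in which all parameters of the sum are adjusted: a general-purpose optimizer minimizes the squared-error integral on a grid, with the weights kept positive and normalized through an exponential (softmax) parameterization. The parameterization, the Nelder-Mead optimizer and the grid are choices made here for illustration; the paper does not specify its search procedure.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import gamma

def l2_search_fit(p, n_terms, x_grid, mu0, sigma0):
    """Choose alpha_i, mu_i, sigma_i to minimize the L2 distance between p
    and the gaussian sum on x_grid."""
    def unpack(theta):
        a = np.exp(theta[:n_terms]); a /= a.sum()        # positive weights summing to one
        mu = theta[n_terms:2 * n_terms]
        sig = np.abs(theta[2 * n_terms:]) + 1e-6
        return a, mu, sig

    def mixture(theta):
        a, mu, sig = unpack(theta)
        g = np.exp(-0.5 * ((x_grid[:, None] - mu) / sig) ** 2) / (np.sqrt(2 * np.pi) * sig)
        return (g * a).sum(axis=1)

    def l2(theta):
        return np.trapz((mixture(theta) - p(x_grid)) ** 2, x_grid)

    theta0 = np.concatenate([np.zeros(n_terms), mu0, np.full(n_terms, sigma0)])
    res = minimize(l2, theta0, method="Nelder-Mead",
                   options={"maxiter": 20000, "xatol": 1e-6, "fatol": 1e-10})
    return unpack(res.x)

# Example: a 4-term search fit of the gamma density (16), p(x) = x^3 e^{-x}/6 for x > 0.
x = np.linspace(0.0, 15.0, 1500)
weights, means, sigmas = l2_search_fit(lambda t: gamma.pdf(t, a=4.0), 4, x,
                                       mu0=np.linspace(1.0, 8.0, 4), sigma0=1.0)
```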
3. LINEAR SYSTEMS WITH NONGAUSSIAN NOISE
It is envisioned that the gaussian sum approximation will be very useful in dealing with non-linear stochastic systems. However, many of the properties and concomitant difficulties of the approximation are exhibited by considering linear systems which are influenced by nongaussian noise. As is well-known, the a posteriori density p(x_k | Z_k) is gaussian for all k when the system is linear and the initial state and plant and measurement noise sequences are gaussian. The mean and variance of the conditional density are described by the Kalman filter equations. When nongaussian distributions are ascribed to the initial state and/or noise sequences, p(x_k | Z_k) is no longer gaussian and it is generally impossible to determine p(x_k | Z_k) in a closed form. Furthermore, in the linear, gaussian problem, the conditional mean, i.e. the minimum variance estimate, is a linear function of the measurement data and the conditional variance is independent of the measurement data. These characteristics are generally not true for a system which is either non-linear or nongaussian.
For the following discussion, consider a scalar system whose state evolves according to

x_k = \Phi_{k,k-1} x_{k-1} + w_{k-1}    (17)

and whose behavior is observed through measurement data z_k described by

z_k = H_k x_k + v_k.    (18)

Suppose that the density function describing the initial state has the form

p(x_0) = \sum_{i=1}^{l_0} \alpha_{i0}\, N_{\sigma_{i0}}(x_0 - \mu_{i0}).    (19)

Assume that the plant and measurement noise sequences (i.e. {w_k} and {v_k}) are mutually independent, white noise sequences with density functions represented by

p(w_k) = \sum_{i=1}^{s_k} \beta_{ik}\, N_{q_{ik}}(w_k - \omega_{ik})    (20)

p(v_k) = \sum_{i=1}^{m_k} \gamma_{ik}\, N_{r_{ik}}(v_k - v_{ik}).    (21)
There are a variety of ways in which the gaussian sum approximation could be introduced. For example, it is natural to proceed in the manner that is to be discussed here in which the a priori distributions are represented by gaussian sums as in equations (19), (20) and (21). This approach has the advantage that the approximation can be determined off-line and then used directly in the Bayesian recursion relations. An alternative approach would be to perform the approximation in more of an on-line procedure. Instead of approximating the a priori densities, one could deal with p(x_k | Z_k) in equation (3) and the integrand in (4) and derive approximations at each stage. This would be more direct but has the disadvantage that considerable computation may be required during the processing of data. Discussion of the implementation of this approach will not be attempted in this paper.
3.1. Determination of the a posteriori distributions
Suppose that the a priori density functions are given by equations (19), (20) and (21). In using these representations in the Bayesian recursion relations, it is useful to note the following properties of gaussian density functions.

Scholium 3.1. For a ≠ 0,

N_\sigma(x - ay) = \frac{1}{|a|}\, N_{\sigma/|a|}(y - x/a).    (22)

Scholium 3.2.

N_{\sigma_i}(x - \mu_i)\, N_{\sigma_j}(x - \mu_j) = N_{(\sigma_i^2 + \sigma_j^2)^{1/2}}(\mu_i - \mu_j)\, N_{\sigma_{ij}}(x - \mu_{ij})    (23)

where

\mu_{ij} = \frac{\sigma_j^2 \mu_i + \sigma_i^2 \mu_j}{\sigma_i^2 + \sigma_j^2},  \sigma_{ij}^2 = \frac{\sigma_i^2 \sigma_j^2}{\sigma_i^2 + \sigma_j^2}.

The proofs of (22) and (23) are omitted.
Armed with these two results it is a simple matter to prove that the following descriptions of the filtering and prediction densities are true.

Theorem 3.1. Suppose that p(x_k | Z_{k-1}) is described by

p(x_k \mid Z_{k-1}) = \sum_{i=1}^{l_k} \alpha_{ik}\, N_{\sigma'_{ik}}(x_k - \mu'_{ik}).    (24)

Then, p(x_k | Z_k) is given by

p(x_k \mid Z_k) = \sum_{i=1}^{l_k} \sum_{j=1}^{m_k} c_{ij}\, N_{\sigma_{ij}}(x_k - e_{ij})    (25)

where

c_{ij} = \frac{\alpha_{ik}\, \gamma_{jk}\, N_{\lambda_{ij}}(z_k - H_k \mu'_{ik} - v_{jk})}{\sum_{i=1}^{l_k} \sum_{j=1}^{m_k} \alpha_{ik}\, \gamma_{jk}\, N_{\lambda_{ij}}(z_k - H_k \mu'_{ik} - v_{jk})}

\lambda_{ij}^2 = H_k^2 {\sigma'_{ik}}^2 + r_{jk}^2

e_{ij} = \mu'_{ik} + \frac{{\sigma'_{ik}}^2 H_k}{{\sigma'_{ik}}^2 H_k^2 + r_{jk}^2}\, [z_k - v_{jk} - H_k \mu'_{ik}]

\sigma_{ij}^2 = {\sigma'_{ik}}^2 - \frac{{\sigma'_{ik}}^4 H_k^2}{{\sigma'_{ik}}^2 H_k^2 + r_{jk}^2}.

It is obvious that the c_{ij} > 0 and that

\sum_{i=1}^{l_k} \sum_{j=1}^{m_k} c_{ij} = 1.

Thus, equation (25) is a gaussian sum and for convenience one can rewrite it as

p(x_k \mid Z_k) = \sum_{i=1}^{n_k} \alpha_{ik}\, N_{\sigma_{ik}}(x_k - \mu_{ik})    (26)

where n_k = (l_k)(m_k) and the α_{ik}, σ_{ik} and μ_{ik} are formed in an obvious fashion from the c_{ij}, σ_{ij} and e_{ij}.

The proof of the theorem is straightforward. From the definition of the measurement relation (18) and the measurement noise density (21), one sees that

p(z_k \mid x_k) = \sum_{i=1}^{m_k} \gamma_{ik}\, N_{r_{ik}}(z_k - H_k x_k - v_{ik}).

Using this and (24) in (3) and applying the two scholiums, one obtains (25).
The prediction density is determined from (4) and leads to the following result.

Theorem 3.2. Assume that p(x_k | Z_k) is given by (26). Then for the linear system (17) and plant noise (20) the prediction density p(x_{k+1} | Z_k) is

p(x_{k+1} \mid Z_k) = \sum_{i=1}^{n_k} \sum_{j=1}^{s_k} \alpha_{ik}\, \beta_{jk}\, N_{\lambda_{ij}}(x_{k+1} - \Phi_{k+1,k}\, \mu_{ik} - \omega_{jk})    (27)

where

\lambda_{ij}^2 = \Phi_{k+1,k}^2\, \sigma_{ik}^2 + q_{jk}^2.

It is convenient to redefine terms so that (27) can be written as

p(x_{k+1} \mid Z_k) = \sum_{i=1}^{l_{k+1}} \alpha_{i(k+1)}\, N_{\sigma'_{i(k+1)}}(x_{k+1} - \mu'_{i(k+1)}).    (28)

Clearly, the definition of p(x_0) has the form (28) as does the p(x_k | Z_{k-1}) assumed in Theorem 3.1. Thus, it follows that the gaussian sums repeat themselves from one stage to the next and that (26) and (28) can be regarded as the general forms for an arbitrary stage. Thus, the gaussian sum can almost be regarded as a reproducing density [16]. It is important, however, to note that the number of terms in the gaussian sum increases at each stage so that the density is not described by a fixed number of parameters. The density would be truly reproducing if only the initial state were non-gaussian, for example, see Ref. [12]. It is clear in this case that the number of terms in the gaussian sum remains equal to the number used to define p(x_0) so that p(x_k | Z_k) is described by a fixed number of parameters.
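
To make the recursions of Theorems 3.1 and 3.2 concrete, here is a sketch of one measurement update and one prediction step for the scalar system (17)-(18), with mixtures stored as lists of (weight, mean, variance) triples. This is an illustration written for this discussion, not code from the paper; names such as prior, meas_noise and plant_noise are assumptions.

```python
import numpy as np
from itertools import product

def normal_pdf(x, mean, var):
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

def measurement_update(prior, z, H, meas_noise):
    """Theorem 3.1: combine each prior term with each measurement-noise term."""
    posterior = []
    for (a, mu, s2), (g, v, r2) in product(prior, meas_noise):
        S = H * H * s2 + r2                        # lambda_ij^2, the pair's innovation variance
        K = s2 * H / S                             # the pair's Kalman gain
        c = a * g * normal_pdf(z, H * mu + v, S)   # unnormalized weight c_ij
        posterior.append((c, mu + K * (z - v - H * mu), s2 - K * H * s2))
    total = sum(c for c, _, _ in posterior)
    return [(c / total, m, s2) for c, m, s2 in posterior]

def time_update(posterior, Phi, plant_noise):
    """Theorem 3.2: propagate each term through the linear plant."""
    return [(a * b, Phi * m + w, Phi * Phi * s2 + q2)
            for (a, m, s2), (b, w, q2) in product(posterior, plant_noise)]
```

Applying the two functions stage by stage reproduces the growth in the number of terms noted above: each update multiplies the term count by the number of terms in the corresponding noise sum.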
There are several aspects that require comment at this point. First, if the gaussian sums for the a priori density all contain only one term, that is, they are gaussian, the Kalman filter equations and the gaussian a posteriori density are obtained. In fact the e_{ij} and σ_{ij}² in (25) and the means and variances λ_{ij}² in (27) each represent the Kalman filter equations for the ij-th density combination. Thus, the gaussian sum in a manner of speaking describes a combination of Kalman filters operating in concert. To examine this further, consider the first and second moments associated with the prediction and filtering densities.
Theorem 3.3.

E[x_k \mid Z_k] \triangleq \hat{x}_{k/k} = \sum_{i=1}^{l_k} \sum_{j=1}^{m_k} c_{ij}\, e_{ij}    (29)

E[(x_k - \hat{x}_{k/k})^2 \mid Z_k] \triangleq P_{k/k}^2 = \sum_{i=1}^{l_k} \sum_{j=1}^{m_k} c_{ij}\, [\sigma_{ij}^2 + (\hat{x}_{k/k} - e_{ij})^2]    (30)

E[x_{k+1} \mid Z_k] \triangleq \hat{x}_{k+1/k} = \Phi_{k+1,k}\, \hat{x}_{k/k} + E[w_k]    (31)

E[(x_{k+1} - \hat{x}_{k+1/k})^2 \mid Z_k] = P_{k+1/k}^2 = \Phi_{k+1,k}^2\, P_{k/k}^2 + E\{[w_k - E(w_k)]^2\}    (32)

where

E[w_k] = \sum_{i=1}^{s_k} \beta_{ik}\, \omega_{ik}

E\{[w_k - E(w_k)]^2\} = \sum_{i=1}^{s_k} \beta_{ik}\, (q_{ik}^2 + \omega_{ik}^2) - E^2[w_k].

In equation (29), the mean value \hat{x}_{k/k} is formed as the convex combination of the mean values e_{ij} of the individual terms, or Kalman filters, of the gaussian sum. It is important to recognize that the c_{ij}, as is apparent from equation (25), depend upon the measurement data. Thus, the conditional mean is a non-linear function of the current measurement data.
The conditional variance P_{k/k}² described by (30) is more than a convex combination of the variances of the individual terms because of the presence of the term (\hat{x}_{k/k} - e_{ij})². This shows that the variance is increased by the presence of terms whose mean values differ significantly from the conditional mean \hat{x}_{k/k}. The influence of these terms is tempered by the weighting factor c_{ij}. Note also that the conditional variance (in contrast to the linear Kalman filter) is a function of the measurement data because of the c_{ij} and the (\hat{x}_{k/k} - e_{ij}).

The mean and variance of the prediction density are described in an obvious manner. If the gaussian sum is an approximation to the true noise density, these relations suggest the desirability of matching the first two moments exactly in order to obtain an accurate description of the conditional mean \hat{x}_{k+1/k} and variance P_{k+1/k}².
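
The moment formulas (29) and (30) reduce to a few lines of code; the sketch below uses the same (weight, mean, variance) list representation introduced above, with illustrative names.

```python
def mixture_moments(mixture):
    """Equations (29)-(30): conditional mean and variance of a gaussian sum
    given as a list of (weight, mean, variance) triples."""
    mean = sum(c * e for c, e, _ in mixture)
    var = sum(c * (s2 + (mean - e) ** 2) for c, e, s2 in mixture)
    return mean, var
```

The second contribution inside the variance sum is exactly the spread term (\hat{x}_{k/k} - e_{ij})² discussed above.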
As discussed earlier, it is convenient to assign the same variance to all terms of the gaussian sum. Thus, if the initial state and the measurement and noise sequences are identically distributed, it is reasonable to consider the variances for all terms to be identical and to determine the consequences of this assumption. Note from Scholium 3.2 that if

\sigma_i^2 = \sigma_j^2 = \sigma^2

then

\sigma_{ij}^2 = \sigma^2 / 2 for all i, j.

Thus, the variance remains the same for all terms in the gaussian sum when p(x_k | Z_k) is formed and the concentration, as described by the variance, becomes greater. Furthermore, from Scholium 3.2 it follows that the mean value is given by

\mu_{ij} = \frac{\mu_i + \mu_j}{2}

so the new mean value is the average of the previous means. This suggests the possibility that the mean values of some terms, since they are the average of two other terms, may become equal, or almost equal. If two terms of the sum have equal means and variances, they could be combined by adding their respective weighting factors. This would reduce the total number of terms in the gaussian sum.
The c_{ij} in (25) are essentially determined by a gaussian density. Thus, if the residual z_k - H_k μ'_{ik} - v_{jk} becomes very large, then the c_{ij} may become sufficiently small that the entire term is negligible. If terms could be neglected, then the total number of terms in the gaussian sum could be reduced.
The prediction and filtering densities are represented at each stage by a gaussian sum. However, it has been seen that the sums have the characteristic that the number of terms increases at each stage as the product of the number of terms in the two constituent sums from which the densities are formed. This fact could seriously reduce the practicality of this approximation if there were no alleviating circumstances. The discussion above regarding the diminishing of the weighting factors and the combining of terms with nearly equal moments has introduced the mechanisms which significantly reduce the apparent ill effects caused by the increase in the number of terms in the sum. It is an observed fact that the mechanisms whereby terms can be neglected or combined are indeed operative and in fact can sometimes permit the number of terms in the series to be reduced by a substantial amount.

Since weighting factors for individual terms do not vanish identically nor do the mean and variance of two gaussian densities become identical, it is necessary to establish criteria by which one can determine when terms are negligible or are approximately the same. This is accomplished by defining numerical thresholds which are prescribed to maintain the numerical error less than acceptable limits.
Consider the effects on the L1 error of neglecting terms with small weighting factors. Suppose that the density is

p(x) = \sum_{i=1}^{n} \alpha_i\, N_\sigma(x - a_i)    (33)

and such that α_1, α_2, ..., α_{m-1} (m < n) are less than some positive number δ_1. Note that the variance has been assumed to be the same for each term. Consider replacing p by p_A where

p_A(x) = \frac{1}{\sum_{i=m}^{n} \alpha_i} \sum_{i=m}^{n} \alpha_i\, N_\sigma(x - a_i).    (34)

The following bound is determined without difficulty.

Theorem 3.4.

\int_{-\infty}^{\infty} | p(x) - p_A(x) |\, dx \le 2 \sum_{i=1}^{m-1} \alpha_i    (35)

\le 2(m - 1)\, \delta_1.    (36)

The L1 error caused by neglecting (m - 1) terms each of which is less than δ_1 is seen in (35) to be less than twice the sum of the neglected terms. Thus, the threshold δ_1 can be selected by using (36) or (35) to keep the increased L1 error within acceptable limits.
Consider the situation in which the absolute value of the difference of the mean values of two terms is small. In particular, suppose that a_1 and a_2 are approximately the same and consider the L1 error that results if the p(x) given in (33) is replaced by

p_A(x) = \sum_{i=3}^{n} \alpha_i\, N_\sigma(x - a_i) + (\alpha_1 + \alpha_2)\, N_\sigma(x - \bar{a})    (37)

where

\bar{a} = \frac{\alpha_1 a_1 + \alpha_2 a_2}{\alpha_1 + \alpha_2}.

Using (37), one can prove the following bound. For a detailed proof of this and other results, see Ref. [17].

Theorem 3.5.

\int_{-\infty}^{\infty} | p(x) - p_A(x) |\, dx \le 4\sqrt{2}\, M\, |a_2 - a_1|\, \frac{\alpha_1 \alpha_2}{\alpha_1 + \alpha_2}.    (38)

Thus terms can be combined if the right-hand side of (38) is less than some positive number δ_2 which represents the allowable L1 error. The M in (38) is the maximum value of N_σ and is given by

M = \frac{1}{\sigma \sqrt{2\pi}}.

Observe that as the variance σ² decreases, the distance between two terms to be combined must also decrease in order to retain the same error bound.
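
The two thresholds δ_1 and δ_2 suggest the following housekeeping step, applied after each measurement or time update. The sketch below is an illustration under the assumptions used in the text (a common variance for the terms being combined); the merged mean is taken as the weight-averaged value ā of (37), and the acceptance test uses the bound (38) as reconstructed above.

```python
import math

def prune(mixture, delta1):
    """Theorem 3.4: drop terms whose weights fall below delta1 and renormalize.
    The added L1 error is at most twice the sum of the dropped weights."""
    kept = [(a, m, s2) for a, m, s2 in mixture if a >= delta1]
    total = sum(a for a, _, _ in kept)
    return [(a / total, m, s2) for a, m, s2 in kept]

def combine(mixture, delta2):
    """Theorem 3.5: greedily combine pairs of terms whenever merging keeps the
    L1 error bound below delta2.  A common variance sigma^2 is assumed."""
    out = list(mixture)
    i = 0
    while i < len(out):
        j = i + 1
        while j < len(out):
            a1, m1, s2 = out[i]
            a2, m2, _ = out[j]
            M = 1.0 / math.sqrt(2.0 * math.pi * s2)    # peak value of N_sigma
            bound = 4.0 * math.sqrt(2.0) * M * abs(m2 - m1) * a1 * a2 / (a1 + a2)
            if bound < delta2:
                # Merged term: weights add, mean is the weighted average a-bar.
                out[i] = (a1 + a2, (a1 * m1 + a2 * m2) / (a1 + a2), s2)
                del out[j]
            else:
                j += 1
        i += 1
    return out
```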
3.2. A numerical example
In this section the results presented in section 3.1 are applied to a specific example. To make (17) and (18) more specific, suppose that the system is

x_k = x_{k-1} + w_{k-1}    (39)

z_k = x_k + v_k    (40)

where the x_0, w_k, v_k (k = 0, 1, ...) are assumed to be uniformly distributed on (-2, 2) as defined by (15).

The problem that is considered represents something of a worst case for the approximation because the initial state and the noise sequences are assumed to be uniformly distributed with the density discussed in section 2.2. As discussed there, the discontinuities at ±2 make it difficult to fit this density and necessitate the use of many terms in the gaussian sum. The specific approximation used here contains 10 terms and is shown in Fig. 1(b). It is apparent that this approximation has nontrivial errors in the neighborhood of the discontinuities but nonetheless retains the basic character of the uniform distribution.
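
The experiment just described can be reproduced in outline as follows; the common standard deviation of 0.5 for the ten-term fit and the random seed are illustrative values chosen here, not those used in the paper, and the filtering recursions themselves are the measurement_update and time_update sketches given earlier.

```python
import numpy as np

rng = np.random.default_rng(0)

# System (39)-(40): x_{k+1} = x_k + w_k, z_k = x_k + v_k, with x_0, w_k, v_k
# uniform on (-2, 2) as in (15).
N_STEPS = 20
x = rng.uniform(-2.0, 2.0)
true_states, measurements = [], []
for _ in range(N_STEPS):
    measurements.append(x + rng.uniform(-2.0, 2.0))
    true_states.append(x)
    x = x + rng.uniform(-2.0, 2.0)

# Ten-term theorem fit of the uniform density used for x_0, w_k and v_k.
mu = np.linspace(-2.0, 2.0, 10)
alpha = np.full(10, 0.1)
sigma2 = 0.5 ** 2
prior = [(a, m, sigma2) for a, m in zip(alpha, mu)]
noise = list(prior)   # the same ten-term sum represents w_k and v_k

# The recursions of Theorems 3.1 and 3.2 (see the measurement_update and
# time_update sketches above) can now be applied stage by stage to prior,
# measurements and noise, pruning and combining terms after each stage.
```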
The performance of the gaussian sum approximation for this example is described below by comparing the conditional mean and variance provided by the approximation with that predicted by the Kalman filter and with the statistics obtained by considering the true uniform distribution. The latter have been determined for this example after a nontrivial amount of numerical computation. In addition the true a posteriori density p(x_k | Z_k) has been computed and is compared with that obtained using the approximation.
The Kalman error variance is independent of the measurement sequence and can cause misleading filter response. For example, for some measurement sequences perfect knowledge of the state is possible, that is, the variance is zero, but the Kalman variance still predicts a large uncertainty. For example, suppose that the measurements at each stage are equal to

z_k = 2k + 4,  k = 0, 1, ...

Then, the minimum mean-square estimate for the state based on the uniform distribution is

\hat{x}_{k/k} = 2k + 2,  k = 0, 1, ...

and the variance of this estimate is

P_{k/k}^2 = 0 for all k.

Thus, for this measurement realization the minimum mean-square estimate is error-free: each measurement confines x_k to [z_k - 2, z_k + 2], and the bounded plant noise leaves x_k = 2k + 2 as the only state sequence consistent with all of the measurements. Since the Kalman variance is independent of the measurements, it is necessarily a poor approximation of the actual conditional variance. The square root of the Kalman variance σ_KAL and the error in the best linear estimate, together with the square root of the variance σ_GS and the error in the estimate of the state for the ten-term gaussian sum for this measurement realization, are shown in Fig. 3(a). This shows that a considerable improvement in both the mean and variance is provided by the gaussian sum when compared with the results provided by the Kalman filter.
FIG. 3. Gaussian sum and Kalman filters compared with best non-linear filter. (a) Perfect knowledge example. (b) Random noise example.
The measurement sequence described above is highly improbable. A more representative case is depicted in Fig. 3(b) in which the plant and measurement noise sequences were chosen using a random number generator for the uniform distribution prescribed above. This figure again shows that the gaussian sum approximation to the true variance is considerably improved over the Kalman estimate and, in fact, agrees very closely with the true standard deviation σ_TRUE of the a posteriori density. However, the Kalman variance is representative enough that the error in the estimate of the mean is not particularly different than that provided by the gaussian sum.

The a posteriori density function for the third and fourteenth stages is shown in Figs. 4(a) and 4(b). In these figures, the actual density, the gaussian approximation provided by the Kalman filter and the gaussian sum approximation are all included. At stage 3, the true variance is smaller than the Kalman variance and the Kalman mean has a significant error, whereas at stage 14 the true variance is larger than that predicted by the Kalman filter. Note that the error in the a priori density approximation at the discontinuities is still evident in the a posteriori approximation but that the general character of the density is reproduced by the gaussian sum.
FIG. 4. Influence of δ_1, δ_2 on a posteriori density. (a) Third stage: δ_1 = δ_2 = 0.001. (b) Fourteenth stage: δ_1 = δ_2 = 0.001. (c) Third stage: δ_1 = 0.001, δ_2 = 0.005, and δ_1 = 0.005, δ_2 = 0.001. (d) Fourteenth stage: δ_1 = 0.005, δ_2 = 0.001, and δ_1 = 0.001, δ_2 = 0.005.
In Figs. 4(a) and 4(b) the approximations at each stage are based on terms being eliminated when their weighting factors are less than δ_1 = 0.001 and combined when the difference in mean values causes the L1 error to be less than δ_2 = 0.001. The effect of changing these parameters, that is, δ_1 and δ_2, can be seen in Figs. 4(c) and 4(d). In one case δ_1 is increased from 0.001 to 0.005 while keeping δ_2 equal to 0.001. Alternatively, δ_2 is increased to 0.005 while the value of δ_1 is maintained equal to 0.001. It is apparent in this example that δ_2 has a larger effect on the density approximation. The increase in this parameter can be seen to introduce a "ripple" into the density function and indicates that the individual terms have become too widely separated to provide a smooth approximation. It is interesting that the effect appears to be cumulative as the ripple is not apparent after three stages but is quite marked at the 14th stage.

The effect of improving the accuracy of the density approximation by including more terms in the a priori representations and by retaining more terms for the a posteriori representation can be seen in Fig. 5. Twenty terms are included in the a priori densities and δ_1 and δ_2 are reduced to 0.0001. The figure presents the actual a posteriori density, the gaussian sum approximation, and the gaussian approximation provided by the Kalman filter equations. Comparison of the results for stages 3 and 14 with Fig. 4 indicates the improvements in the approximation that have occurred.
4. CONCLUSIONS
The approximation of density functions by a
sum of gaussian densities has been discussed as a
reasonable framework within which estimation
policies for non-linear and/or nongaussian stochastic
systems can be established. It has been shown that
a probability density function can be approximated
arbitrarily closely except at discontinuities by such
a gaussian sum. In contrast with the Edgeworth or
Gram-Charlier expansions that have been in-
vestigated earlier, this approximation has the
advantage of converging to a broader class of
density functions. Furthermore, any finite sum of
these terms is itself a valid density function.
The gaussian sum approximation is a departure
from more classical approximation techniques
because the sum is restricted to be positive for all
possible values of the independent variable. As a
result, the series is not orthogonalizable so that the
manner in which parameters appearing in the sum
are chosen is not obvious. Two numerical pro-
cedures are discussed in which certain parameters
are chosen to satisfy constraints or are somewhat
arbitrarily selected and others are chosen to mini-
mize the L k error.
It is anticipated that the gaussian sum approxima-
tion will find its greatest application in developing
estimation policies for non-linear stochastic systems.
However, many of the characteristics exhibited by
non-linear systems and some of the difficulties in
using the gaussian sum in these cases are exhibited
by treating linear systems with nongaussian noise
FIG. 5. A posteriori density for sixteen stages: true probability density, Kalman approximation and gaussian sum approximation.
sources. Of course, when the noise is entirely gaussian the problem degenerates and the familiar Kalman filter equations are obtained as the exact solution of the problem.

The linear, nongaussian estimation problem is discussed and it is shown that, if the a priori density functions are represented as gaussian sums, then the number of terms required to describe the a posteriori density is equal to the product of the number of terms of the a priori densities used to form it. The apparent disadvantage is seen to cause little difficulty, however, because the moments of many individual terms converge to common values which allows them to be combined. Further, the weighting factors associated with many other terms become very small and permit those terms to be neglected without introducing significant error to the approximation.

Numerical results for a specific system are presented which provide a demonstration of some of the effects discussed in the text. These results confirm dramatically that the gaussian sum approximation can provide considerable insight into problems that hitherto have been intractable to analysis.
REFERENCES

[1] A. H. JAZWINSKI: Stochastic Processes and Filtering Theory. Academic Press, New York (1970).
[2] R. E. KALMAN: A new approach to linear filtering and prediction problems. J. bas. Engng 82D, 35-45 (1960).
[3] H. W. SORENSON: Advances in Control Systems, Vol. 3, Ch. 5. Academic Press, New York (1966).
[4] R. COSAERT and E. GOTTZEIN: A decoupled shifting memory filter method for radio tracking of space vehicles. 18th International Astronautical Congress, Belgrade, Yugoslavia (1967).
[5] A. JAZWINSKI: Adaptive filtering. Automatica 5, 475-485 (1969).
[6] Y. C. HO and R. C. K. LEE: A Bayesian approach to problems in stochastic estimation and control. IEEE Trans. Aut. Control 9, 333-339 (1964).
[7] H. J. KUSHNER: On the differential equations satisfied by conditional probability densities of Markov processes. SIAM J. Control 2, 106-119 (1964).
[8] J. R. FISHER and E. B. STEAR: Optimal non-linear filtering for independent increment processes, Parts I and II. IEEE Trans. Inform. Theory 3, 558-578 (1967).
[9] M. AOKI: Optimization of Stochastic Systems, Topics in Discrete-Time Systems. Academic Press, New York (1967).
[10] H. W. SORENSON and A. R. STUBBERUD: Nonlinear filtering by approximation of the a posteriori density. Int. J. Control 8, 33-51 (1968).
[11] M. AOKI: Optimal Bayesian and min-max control of a class of stochastic and adaptive dynamic systems. Proceedings IFAC Symposium on Systems Engineering for Control System Design, Tokyo, pp. 77-84 (1965).
[12] A. V. CAMERON: Control and estimation of linear systems with nongaussian a priori distributions. Proceedings of the Third Annual Conference on Circuit and System Science (1969).
[13] J. T. LO: Finite dimensional sensor orbits and optimal nonlinear filtering. University of Southern California, Report USCAE 114, August 1969.
[14] J. KOREVAAR: Mathematical Methods, Vol. 1, pp. 330-333. Academic Press, New York (1968).
[15] W. FELLER: An Introduction to Probability Theory and Its Applications, Vol. II, p. 249. John Wiley, New York (1966).
[16] J. D. SPRAGINS: Reproducing distributions for machine learning. Stanford Electronics Laboratories, Technical Report No. 6103-7 (November 1963).
[17] D. L. ALSPACH: A Bayesian approximation technique for estimation and control of time-discrete stochastic systems. Ph.D. Dissertation, University of California, San Diego (1970).
