
ARTICLE IN PRESS

Image and Vision Computing xxx (2006) xxx–xxx


www.elsevier.com/locate/imavis

A particle filter for joint detection and tracking of color objects


Jacek Czyz a,*, Branko Ristic b, Benoit Macq a

a Communications Laboratory, Université catholique de Louvain, Place du Levant 2, 1348 Louvain-la-Neuve, Belgium
b ISRD, DSTO, 200 Labs, P.O. Box 1500, Edinburgh, SA 5111, Australia

Received 15 April 2005; received in revised form 30 May 2006; accepted 29 July 2006

Abstract

Color is a powerful feature for tracking deformable objects in image sequences with complex backgrounds. The color particle filter has
proven to be an efficient, simple and robust tracking algorithm. In this paper, we present a hybrid valued sequential state estimation
algorithm, and its particle filter-based implementation, that extends the standard color particle filter in two ways. First, target detection
and deletion are embedded in the particle filter without relying on an external track initialization and cancellation algorithm. Second, the
algorithm is able to track multiple objects sharing the same color description while keeping the attractive properties of the original color
particle filter. The performance of the proposed filter is evaluated qualitatively on various real-world video sequences with appearing
and disappearing targets.
© 2006 Elsevier B.V. All rights reserved.

Keywords: Visual tracking; Particle filter; Hybrid sequential estimation; Multiple-target tracking

* Corresponding author. Tel.: +32 10 479353; fax: +32 10 472089.
E-mail address: czyz@tele.ucl.ac.be (J. Czyz).

0262-8856/$ - see front matter © 2006 Elsevier B.V. All rights reserved.
doi:10.1016/j.imavis.2006.07.027

Please cite this article in press as: J. Czyz et al., A particle filter for joint detection and tracking of color objects, Image Vis. Comput. (2006), doi:10.1016/j.imavis.2006.07.027

1. Introduction

Tracking moving objects in video sequences is a central concern in computer vision. Reliable visual tracking is indispensable in many emerging vision applications such as automatic video surveillance, human–computer interfaces and robotics. Traditionally, the tracking problem is formulated as sequential recursive estimation [1]: having an estimate of the probability distribution of the target in the previous frame, the problem is to estimate the target distribution in the new frame using all available prior knowledge and the new information brought by the new frame. The state–space formalism, where the current tracked object properties are described by an unknown state vector updated by noisy measurements, is very well adapted to modeling the tracking problem. Unfortunately, sequential estimation has an analytic solution only under very restrictive hypotheses. The well-known Kalman filter [2,3] is such a solution, and is optimal for the class of linear Gaussian estimation problems. The particle filter (PF), a numerical method that finds an approximate solution to the sequential estimation problem [4], has been used successfully in many target tracking problems [1] and visual tracking problems [5]. Its success, in comparison with the Kalman filter, can be explained by its capability to cope with multi-modal measurement densities and non-linear observation models. In visual tracking, multi-modality of the measurement density is very frequent due to the presence of distractors – scene elements which have a similar appearance to the target [6]. The observation model, which relates the state vector to the measurements, is non-linear because the (very redundant) image data undergoes feature extraction, a highly non-linear operation.

1.1. Joint detection and tracking

Our work evolves from the adaptive color-based particle filter proposed independently by [7,8]. This color-based tracker uses color histograms as image features, following the popular Mean-Shift tracker by Comaniciu et al. [9]. Since color histograms are robust to partial occlusion and are scale and rotation invariant, the resulting algorithm can efficiently and successfully handle non-rigid deformation of the target and rapidly changing dynamics in complex unknown backgrounds. However, it is designed for tracking a single object and uses an external mechanism to initialize the track. If multiple targets are present in the scene, but they are distinctive from each other, they can be tracked independently by running multiple instances of the color particle filter with different target models. In contrast, when several objects sharing the same color description are present in the scene (e.g. a football game telecast, recordings of colonies of ants or bees, etc.), the color particle filter approach fails, either because particles are attracted by the different objects and the computed state estimates are meaningless, or because particles tend to track only the best-fitting target – a phenomenon called coalescence [10]. In both cases alternative approaches must be found.

In this paper we develop a particle filter which extends the color particle filter of [7,8] by integrating the detection of new objects into the algorithm, and by tracking multiple similar objects. Detection of new targets entering the scene, i.e., track initialization, is embedded in the particle filter without relying on an external target detection algorithm. Also, the proposed algorithm can track multiple objects sharing the same color description and moving within the scene. The key feature of our approach is the augmentation of the state vector by a discrete-valued variable which represents the number of existing objects in the video sequence. This random variable is incorporated into the state vector and modeled as an (M + 1)-state Markov chain. In this way, the problem of joint detection and tracking of multiple objects translates into a hybrid-valued (continuous–discrete) sequential state estimation problem. We propose a conceptual solution to the joint detection and tracking problem in the Bayesian framework and we implement it using sequential Monte Carlo methods. Experimental results on real data show that the tracker, while keeping the attractive properties of the original color particle filter, can detect objects entering or leaving the scene; it keeps an internal list of observable objects (that can vary from 0 to a predefined number) without the need for external detection and deletion mechanisms.

1.2. Related work

Mixed or hybrid-valued (continuous–discrete) sequential state estimation, and its particle-based solution, have been successful in many video sequence analysis problems. In [11], the proposed tracking algorithm switches between different motion models depending on a discrete label, included in the state vector, which encodes which one of a discrete set of motion models is active. Black and Jepson proposed a mixed state–space approach to gesture/expression recognition [12]. First, several models of temporal trajectories are trained. Next, the models are matched against new unknown trajectories using a PF-based algorithm in which the state vector contains a label of the model that matches the observed trajectory. The Bayesian multiple-blob tracker [13], BraMBLe, manages multiple blob tracking also by incorporating the number of visible objects into the state vector. The multi-blob observation likelihood is based on filter bank responses which may come from the background image or one of the object models. In contrast to BraMBLe, which requires background and foreground models, the method that we propose does not need a background estimate and can be used directly on moving-camera sequences. Moreover, using color to describe the targets leads to a small state vector size in comparison with contour tracking. The low dimensionality of the state–space permits the use of a smaller number of particles for the same estimation accuracy. This allows the algorithm to track many similar objects with modest computational resources.

Classically, the problem of visual tracking of multiple objects is tackled using data association techniques [14]. Rasmussen and Hager [6] use the Joint Probabilistic Data Association Filter (JPDAF) to track several different objects. Their main concern is to find the correspondence between image measurements and tracked objects. Other approaches include [15,16], where moving objects are modeled as blobs, i.e., groups of connected pixels, which are detected in the images using a combination of stereo vision and background modeling. MacCormick and Blake [10] studied particle-based tracking of multiple identical objects. They proposed an exclusion principle to avoid coalescence onto the best-fitting target when two targets come close to each other. This principle prevents a single image feature from being simultaneously reinforced by mutually exclusive hypotheses (either the image feature is generated by one target or by the other, but not both at the same time). Our approach circumvents the need for the exclusion principle by integrating into the filter a target-merge operation (one target occludes the other) and a target-split operation (the two targets are visible again) through the object existence variable. Recently, the mixture particle filter has been proposed [17]. In this work, the coalescence of multiple targets is avoided by maintaining the multi-modality of the state posterior density over time. This is done by modeling the state density as a non-parametric mixture. Each particle receives, in addition to a weight, an indicator describing to which mixture component it belongs. A re-clustering procedure must be applied regularly to take into account appearing and disappearing modes.

1.3. Paper organization

The paper is organized as follows. In the next section we formulate joint detection and tracking of multiple targets as a sequential state estimation problem. First we explain how the discrete variable denoting the number of targets is modeled. Then the two-step conceptual solution to the hybrid estimation problem is given. In Section 3, we present a numerical solution to the estimation problem, which is obtained by a particle filter using color histograms as target features. Section 4 is devoted to experiments. Conclusions are given in the last section.

2. Sequential recursive estimation

The aim is to perform simultaneous detection and tracking of objects described by some image features (we chose color histograms) in a video sequence $Z_k = \{z_1, z_2, \ldots, z_k\}$, where $z_j$ is the image frame at discrete-time (sequence) index $j = 1, \ldots, k$. This task is to be performed in a sequential manner, that is, as the image frames become available over time. In the following, we first briefly review sequential recursive estimation for the case of a single target. The reader is referred to [18,1] for more details. We then provide in detail the proposed formal recursive solution for multiple-target tracking.

2.1. Bayesian estimation for single-target tracking

In the state–space formalism, the state of the target is described by a state vector $x_k$ containing parameters such as target position, velocity, angle, or size. The target evolves according to the following discrete-time stochastic model, called the dynamic model:

$x_k = f_{k-1}(x_{k-1}, v_{k-1})$,   (1)

where $f_{k-1}$ is a known function and $v_{k-1}$ is the process noise. Equivalently, target evolution can be characterized by the transition density $p(x_k | x_{k-1})$. The target state is related to the measurements via the observation model

$z_k = h_{k-1}(x_k, w_k)$,   (2)

where $h_{k-1}$ is a known function and $w_k$ is the measurement noise. Again, the observation density $p(z_k | x_k)$ equivalently characterizes the relationship between the state vector and the measurement. Given the sequence of all available measurements $Z_k = \{z_i, i = 1, \ldots, k\}$, we seek $p(x_k | Z_k)$. Bayesian estimation allows one to find $p(x_k | Z_k)$ recursively, i.e., in terms of the posterior density at the previous time step, $p(x_{k-1} | Z_{k-1})$. The conceptual solution is found in two steps. Using the transition density, one can perform the prediction step:

$p(x_k | Z_{k-1}) = \int p(x_k | x_{k-1}) \, p(x_{k-1} | Z_{k-1}) \, dx_{k-1}$.   (3)

The prediction step makes use of the available knowledge of target evolution encoded in the dynamic model. The update step uses the measurement $z_k$, available at time $k$, to update the predicted density:

$p(x_k | Z_k) \propto p(z_k | x_k) \, p(x_k | Z_{k-1})$.   (4)

These two steps, repeated for each frame, allow the state density to be computed recursively for each frame.

2.2. Bayesian estimation for detection and multi-target tracking

In the case of multiple targets, each target $i$ is characterized by one state vector $x_{i,k}$ and one transition density $p(x_k | x_{k-1})$ if independent motion models are assumed. This results in a multi-object state vector $X_k = (x_{1,k}^T, \ldots, x_{M,k}^T)^T$, where superscript $T$ stands for the matrix transpose [19]. However, the detection of appearing objects and the deletion of disappearing objects is not integrated in the estimation process. To accommodate Bayesian estimation with detection and a varying number of targets, the state vector $X_k$ is augmented by a discrete variable $E_k$, which we call the existence variable, and which denotes the number of visible or existing objects in video frame $k$. The problem then becomes a jump Markov or hybrid estimation problem [20]. In the following we first describe how the existence variable $E_k$ is modeled; we then present the proposed multi-target sequential estimation.

2.2.1. Object detection and deletion

The existence variable $E_k$ is a discrete-valued random variable, $E_k \in \mathcal{E} = \{0, 1, \ldots, M\}$, with $M$ being the maximum expected number of objects. The dynamics of this random variable is modeled by an $(M + 1)$-state Markov chain, whose transitions are specified by an $(M + 1) \times (M + 1)$ transitional probability matrix (TPM) $P = [p_{ij}]$, where

$p_{ij} = \Pr\{E_k = j \,|\, E_{k-1} = i\}, \quad (i, j \in \mathcal{E})$   (5)

is the probability of a transition from $i$ objects existing at time $k - 1$ to $j$ objects at time $k$. The elements of the TPM satisfy $\sum_{j=0}^{M} p_{ij} = 1$ for each $i \in \mathcal{E}$. The dynamics of variable $E_k$ is fully specified by the TPM and its initial probabilities at time $k = 0$, i.e., $\mu_i = \Pr\{E_0 = i\}$, for $i = 0, 1, \ldots, M$.

For illustration, if we were to detect and track a single object (i.e., $M = 1$), the TPM is a $2 \times 2$ matrix given by:

$P = \begin{pmatrix} 1 - P_b & P_b \\ P_d & 1 - P_d \end{pmatrix}$,

where $P_b$ and $P_d$ represent the probability of object ''birth'' (entering the scene) and ''death'' (leaving the scene), respectively. Similarly, for $M = 2$, a possible Markov chain which does not allow transitions from zero to two objects or from two to zero objects is shown in Fig. 1. The TPM of this model is given by:

$P = \begin{pmatrix} 1 - P_b & P_b & 0 \\ P_d & 1 - P_d - P_m & P_m \\ 0 & P_r & 1 - P_r \end{pmatrix}$.

Again, $P_b$, $P_d$, $P_m$ and $P_r$ are design parameters. For higher values of $M$ a similar model must be adopted.

2.2.2. Formal solution and state estimates

This section describes the conceptual solution to integrated detection and tracking of multiple objects in the sequential Bayesian estimation framework.

Fig. 1. A Markov chain of the $E_k$ variable for $M = 2$.
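To make the existence dynamics concrete, the $M = 2$ transition matrix above can be exercised with a few lines of code. This is a minimal sketch, not the authors' implementation; the function names and the numeric probabilities are our own illustrative choices.

```python
import numpy as np

def make_tpm(p_b, p_d, p_m, p_r):
    """(M+1) x (M+1) TPM for M = 2, mirroring the matrix in the text:
    no direct transitions between zero and two objects are allowed."""
    return np.array([
        [1 - p_b, p_b,           0.0    ],
        [p_d,     1 - p_d - p_m, p_m    ],
        [0.0,     p_r,           1 - p_r],
    ])

def sample_existence(tpm, e_prev, rng):
    """One Markov-chain step: draw E_k with probability Pr{E_k = j | E_{k-1} = e_prev}."""
    return int(rng.choice(tpm.shape[0], p=tpm[e_prev]))

# Illustrative design parameters (the paper leaves P_b, P_d, P_m, P_r to the user).
P = make_tpm(p_b=0.05, p_d=0.05, p_m=0.05, p_r=0.05)
assert np.allclose(P.sum(axis=1), 1.0)  # every row of a TPM sums to one

rng = np.random.default_rng(0)
e = 0
history = []
for _ in range(200):
    e = sample_existence(P, e, rng)
    history.append(e)
```

Because $p_{02} = p_{20} = 0$, the sampled number of objects can only change by one per time step, matching the chain of Fig. 1.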


Let us first introduce a new state vector $y_k$, which consists of the variable $E_k$ and the state vector $x_{i,k}$ for each ''existing'' object $i$. The size of $y_k$ depends on the value of $E_k$, that is:

$y_k = \begin{cases} E_k & \text{if } E_k = 0, \\ [x_{1,k}^T \; E_k]^T & \text{if } E_k = 1, \\ [x_{1,k}^T \; x_{2,k}^T \; E_k]^T & \text{if } E_k = 2, \\ \quad\vdots & \quad\vdots \\ [x_{1,k}^T \ldots x_{M,k}^T \; E_k]^T & \text{if } E_k = M, \end{cases}$   (6)

where $x_{m,k}$ is the state vector of object $m = 1, \ldots, E_k$ at time $k$. Given the posterior density $p(y_{k-1} | Z_{k-1})$ and the latest available image $z_k$ in the video sequence, the goal is to construct the posterior density at time $k$, that is, $p(y_k | Z_k)$. This problem is an instance of sequential hybrid estimation, since one component of the state vector is discrete valued, while the rest is continuous valued.

Once the posterior pdf $p(y_k | Z_k)$ is known, the probability $P_m = \Pr\{E_k = m | Z_k\}$ that there are $m$ objects in the video sequence at time $k$ is computed as the marginal of $p(y_k | Z_k)$, i.e.:

$P_m = \int \ldots \int p(x_{1,k}, \ldots, x_{m,k}, E_k = m | Z_k) \, dx_{1,k} \ldots dx_{m,k}$   (7)

for $m = 1, \ldots, M$. The case $m = 0$ is trivial, since in this case $p(y_k | Z_k)$ reduces to $\Pr\{E_k = 0 | Z_k\}$. The MAP estimate of the number of objects at time $k$ is then determined as:

$\hat{m}_k = \arg\max_{m = 0, 1, \ldots, M} P_m$.   (8)

This estimate provides the means for automatic detection of new object appearance and existing object disappearance. The posterior pdfs of the state components corresponding to individual objects in the scene are then computed as the marginals of the pdf $p(x_{1,k}, \ldots, x_{\hat{m},k}, E_k = \hat{m} | Z_k)$.

The two-step procedure consisting of prediction and update is described in the following.

2.2.3. Prediction step

Suppose that $m$ objects are present and visible in the scene, with $0 \le m \le M$. In this case $E_k = m$ and the predicted state density can be expressed as:

$p(x_{1,k}, \ldots, x_{m,k}, E_k = m | Z_{k-1}) = \sum_{j=0}^{M} p_j$,   (9)

where, using the notation

$X_k^j \equiv x_{1,k} \ldots x_{j,k}$,   (10)

we have

$p_j = \int p(X_k^m, E_k = m | X_{k-1}^j, E_{k-1} = j, Z_{k-1}) \, p(X_{k-1}^j, E_{k-1} = j | Z_{k-1}) \, dX_{k-1}^j$   (11)

for $j = 0, \ldots, M$. Eq. (9) is a prediction step because its right hand side (RHS) features the posterior pdf at time $k - 1$. Further simplification of Eq. (11) follows from

$p(X_k^m, E_k = m | X_{k-1}^j, E_{k-1} = j, Z_{k-1}) = p(X_k^m | X_{k-1}^j, E_k = m, E_{k-1} = j) \, \Pr\{E_k = m | E_{k-1} = j\}$.   (12)

Note that the second term on the RHS of Eq. (12) is an element of the TPM, i.e., $\Pr\{E_k = m | E_{k-1} = j\} = p_{jm}$. Assuming that the objects' states (kinematics and size parameters) are mutually independent, the first term on the RHS of Eq. (12) can be expressed as:

$p(X_k^m | X_{k-1}^j, E_k = m, E_{k-1} = j) = \begin{cases} \prod_{i=1}^{m} p(x_{i,k} | x_{i,k-1}) & \text{if } m = j, \\ \prod_{i=1}^{j} p(x_{i,k} | x_{i,k-1}) \prod_{i=j+1}^{m} p_b(x_{i,k}) & \text{if } m > j, \\ \prod_{i=1}^{j} [p(x_{i,k} | x_{i,k-1})]^{d_i} & \text{if } m < j, \end{cases}$

where

• $p(x_{i,k} | x_{i,k-1})$ is the transitional density of object $i$, defined by the object dynamic model, see Eq. (1). For simplicity, we assume independent motion models for the targets. In theory nothing prevents one from creating joint motion models; however, this would require huge amounts of data to train the models.
• $p_b(x_{i,k})$ is the initial object pdf on its appearance, which in the Bayesian framework is assumed to be known (subscript $b$ stands for ''birth''). For example, we can expect the object to appear in a certain region (e.g. along the edges of the image), with a certain velocity, length and width. If this initial knowledge is imprecise, we can model $p_b(x_{i,k})$ with a uniform density.
• $d_1, d_2, \ldots, d_j$, which features in the case $m < j$, is a random binary sequence such that $d_i \in \{0, 1\}$ and $\sum_{i=1}^{j} d_i = m$. The distribution of the $d_i$ (which may depend on $x_i$) reflects our knowledge of the disappearance of object $i$. Again, $\Pr\{d_i = 0 | x_i\}$ might be higher if the object $x_i$ is close to the edges of the image. If this knowledge is imprecise, we can model these distributions by uniform distributions.

2.2.4. Update step

The update step results from the application of the Bayes rule and formally states:

$p(X_k^m, E_k = m | Z_k) = \dfrac{p(z_k | X_k^m, E_k = m) \, p(X_k^m, E_k = m | Z_{k-1})}{p(z_k | Z_{k-1})}$,   (13)

where $p(X_k^m, E_k = m | Z_{k-1})$ is the prediction density given by Eq. (9) and $p(z_k | X_k^m, E_k = m)$ is the image likelihood function. From image $z_k$ we extract a set of features $q_{i,k}$, for each of the $i = 1, \ldots, m$ objects, and use them as
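In the particle implementation of Section 3, the marginal $P_m$ of Eq. (7) is approximated by the total normalized weight of the particles whose existence variable equals $m$, and Eq. (8) becomes an argmax over these totals. The following is a minimal sketch of that estimator under this standard particle approximation; the helper name and the numbers are ours, not the paper's.

```python
import numpy as np

def map_object_count(existence, weights, M):
    """Approximate P_m = Pr{E_k = m | Z_k} (Eq. (7)) by the total weight of
    particles with E_k = m, then take the MAP estimate of Eq. (8)."""
    P_m = np.array([weights[existence == m].sum() for m in range(M + 1)])
    return int(np.argmax(P_m)), P_m

# Five particles: existence values and normalized importance weights.
existence = np.array([0, 1, 1, 2, 1])
weights   = np.array([0.1, 0.3, 0.3, 0.1, 0.2])
m_hat, P_m = map_object_count(existence, weights, M=2)
# Particles with E_k = 1 carry total weight 0.8, so m_hat = 1.
```

The same particle set also yields the per-object state estimates: for the MAP value $\hat{m}$, average the state vectors of the particles with $E_k = \hat{m}$ under their (re-normalized) weights.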


measurements (the choice of the features is discussed in Section 3.2). Assuming these features are mutually independent from one object to another, we can replace $p(z_k | X_k^m, E_k = m)$ with

$p(q_{1,k}, \ldots, q_{m,k} | X_k^m, E_k = m) = \prod_{i=1}^{m} p(q_{i,k} | x_{i,k})$.   (14)

As the image region where $q_{i,k}$ is computed is encoded in the state vector $x_{i,k}$, there is no problem of associating measurements to state vectors. The assumption that object features are independent for different objects holds only if the image regions inside which we compute the object features are non-overlapping. In the overlapping case, the same image feature is attributed to two different objects, which is not realistic. In order to prevent several objects from being associated to the same feature, we impose that the distance between two objects cannot be smaller than a threshold. Therefore, when a target A passes in front of a target B and occludes it, only one target will be existing. The drawback is that when B reappears, there must be some logic that says ''OK, this is B again''. For the time being, B will simply be re-detected and considered as a new object. To solve this problem, one option is to have a ''track management system'' on top of the presented algorithm. This system would store the position, heading and speed (and possibly other useful attributes) of the object when it disappears behind the occlusion and compare them to the position of a new object that appears in the vicinity (both in space and time). Using simple heuristics, the correspondence between the disappearance and the apparition could be established in many cases. Another option is to use a multi-camera setup. The proposed algorithm would output a list of 2D target positions for each camera view, and these lists would be collected by a central data association module. This module would perform data association and output the 3D positions of the targets. The problem of occlusion is largely avoided in this case since, when a target is occluded in one view, it is often visible in the other.

The described conceptual solution for simultaneous detection and tracking of a varying number of objects next has to be implemented. Note that hybrid sequential estimation is non-linear even if the dynamic and observation models are linear [1]; a Kalman filter is therefore inappropriate for solving the problem, and one must look for approximations.

3. Color-based particle filter

The general multi-target sequential estimation presented in the previous section can be adapted to different applications by an appropriate choice of the dynamic and observation models. In visual tracking, the state vector characterizes the target (region or shape parameters). The observation model reflects which image features will be used to update the current state estimate. As in [9,7,8], we use color histograms extracted from the target region as image features. This choice is motivated by ease of implementation, efficiency and robustness. However, the general framework for joint detection and tracking can be adapted to other observation features such as appearance models [21].

3.1. Dynamic model

The state vector of a single object typically consists of kinematic and region parameters. We adopt the following state vector:

$x_k = [x_k \; y_k \; H_k \; W_k]^T$,   (15)

where $(x_k, y_k)$ denotes the center of the image region (in our case a rectangle) within which the computation of the object's color histogram is carried out; $H_k$ and $W_k$ denote the image region parameters (in our case its height and width); superscript $T$ in Eq. (15) stands for the matrix transpose. Object motion and the dynamics of its size are modeled by a random walk, that is, the state equation is linear and given by:

$x_k = x_{k-1} + w_{k-1}$.   (16)

The process noise $w_{k-1}$ in Eq. (16) is assumed to be white, zero-mean Gaussian, with covariance matrix $Q$. Other motion models (e.g. constant velocity) and higher dimensional state vectors (e.g. one could include the aspect-ratio change rate of the image region rectangle in the state vector) might be more appropriate depending on the application.

3.2. Color measurement model

Following [9,7,8], we do not use the entire image $z_k$ as the measurement; rather, we extract from the image the color histogram $q_k$, computed inside the rectangular region whose location and size are specified by the state vector $x_k$: the center of the region is at $(x_k, y_k)$; the size of the region is determined by $(H_k, W_k)$.

We adopt a Gaussian density for the likelihood function of the measured color histogram as follows:

$p(q_k | x_k) \propto N(D_k; 0, \sigma^2) = \dfrac{1}{\sqrt{2\pi}\sigma} \exp\left(-\dfrac{D_k^2}{2\sigma^2}\right)$,   (17)

where $D_k = \mathrm{dist}[q^*, q_k]$ is the distance between (i) the reference histogram $q^*$ of the objects to be tracked and (ii) the histogram $q_k$ computed from image $z_k$ in the region defined by the state vector $x_k$. The standard deviation $\sigma$ of the Gaussian density in Eq. (17) is a design parameter.

Suppose $q^* = \{q^*(u)\}_{u=1,\ldots,U}$ and $q_k = \{q_k(u)\}_{u=1,\ldots,U}$ are the two histograms calculated over $U$ bins. We adopt the distance $D_k$ between two histograms derived in [9] from the Bhattacharyya similarity coefficient, defined as:

$D_k = \sqrt{1 - \sum_{u=1}^{U} \sqrt{q^*(u) \, q_k(u)}}$.   (18)

The computation of histograms is typically done in the RGB space or HSV space [8]. A weighting function, which
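Eqs. (17) and (18) are straightforward to compute once the two histograms are normalized to sum to one. Below is a minimal sketch with our own helper names; $\sigma = 0.2$ is just an illustrative value for the design parameter, not a value taken from the paper.

```python
import numpy as np

def bhattacharyya_distance(q_ref, q):
    """D_k of Eq. (18); both histograms must be normalized to sum to 1."""
    bc = np.sum(np.sqrt(q_ref * q))        # Bhattacharyya similarity coefficient
    return np.sqrt(max(1.0 - bc, 0.0))     # clip tiny negative round-off

def color_likelihood(q_ref, q, sigma=0.2):
    """Gaussian likelihood of Eq. (17), up to its normalizing constant."""
    D = bhattacharyya_distance(q_ref, q)
    return np.exp(-D**2 / (2.0 * sigma**2))

# Identical histograms give D = 0 and hence the maximal (unnormalized) likelihood 1.
q = np.full(8, 1.0 / 8)
assert abs(bhattacharyya_distance(q, q)) < 1e-9
```

Two histograms with disjoint support give $D_k = 1$, the maximal distance, so the likelihood decays smoothly from a perfect match to no overlap at all.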


assigns smaller weights to the pixels that are further away from the region center, is often applied in computing the histograms. In this way, the reliability of the color distribution is increased when boundary pixels belong to the background or get occluded.

In the framework of multiple objects, we must extract multiple histograms from the measurement $z_k$, as specified by Eq. (14). Therefore, based on Eq. (17), the multi-object observation likelihood becomes

$p(z_k | X_k^m, E_k = m) \propto \dfrac{1}{(\sqrt{2\pi}\sigma)^m} \exp\left(-\dfrac{1}{2\sigma^2} \sum_{i=1}^{m} D_{i,k}^2\right)$,   (19)

where $D_{i,k} = \mathrm{dist}[q^*, q_{i,k}]$ is the distance between the $i$th object color histogram and the reference color histogram. Note that $q_{i,k}$ is the color histogram computed from $z_k$ in the region specified by $x_{i,k}$.

We point out that the described measurement likelihood function is not necessarily restricted to color video sequences – recently it has been applied to the detection and tracking of objects in monochromatic FLIR imagery [22].

3.3. Particle filter

Particle filters are sequential Monte Carlo techniques specifically designed for sequential Bayesian estimation when systems are non-linear and random elements are non-Gaussian. The hybrid estimation presented in the previous section, with the dynamic model and the highly non-linear observation model described above, is carried out using a particle filter. Particle filters approximate the posterior density $p(y_k | Z_k)$ by a weighted set of random samples or particles. In our case, a particle of index $n$ is characterized by a certain value of the $E_k^n$ variable and the corresponding number of state vectors $x_{i,k}^n$, where $i = 1, \ldots, E_k^n$, i.e.,

$y_k^n = [E_k^n, x_{1,k}^n, \ldots, x_{E_k^n,k}^n] \quad (n = 1, \ldots, N)$,

where $N$ is the number of particles. The pseudo-code of the main steps of this filter (single cycle) is presented in Table 1. The input to the PF are the particles at time $k - 1$ and the image at time $k$; the output are the particles at time $k$.

Table 1
Particle filter pseudo-code (single cycle)
$[\{y_k^n\}_{n=1}^N] = \mathrm{PF}[\{y_{k-1}^n\}_{n=1}^N, z_k]$
(1) Transitions of the $E_{k-1}$ variable (random transition of the number of existing objects):
    $[\{E_k^n\}_{n=1}^N] = \mathrm{ETrans}[\{E_{k-1}^n\}_{n=1}^N, P]$
(2) FOR n = 1:N
    a. Based on the $(E_{k-1}^n, E_k^n)$ pair, draw at random $x_{1,k}^n, \ldots, x_{E_k^n,k}^n$;
    b. Evaluate the importance weight $\tilde{w}_k^n$ (up to a normalizing constant) using Eq. (24).
(3) END FOR
(4) Normalize importance weights
    a. Calculate the total weight: $t = \mathrm{SUM}[\{\tilde{w}_k^n\}_{n=1}^N]$
    b. FOR n = 1:N
       Normalize: $w_k^n = t^{-1} \tilde{w}_k^n$
       END FOR
(5) Resample:
    $[\{y_k^n, -\}_{n=1}^N] = \mathrm{RESAMPLE}[\{y_k^n, w_k^n\}_{n=1}^N]$
(6) Compute the output of the PF

Next we describe in more detail each step of the algorithm of Table 1.

• The first step in the algorithm represents the random transition of $E_{k-1}^n$ to $E_k^n$ based on the TPM $P$. This is done by implementing the rule that if $E_{k-1}^n = i$ then $E_k^n$ should be set to $j$ with probability $p_{ij}$. See for example [1] (Table 3.9) for more details.
• Step 2.a of Table 1 follows from Eq. (13). If $E_{k-1}^n = E_k^n$, then we draw $x_{i,k}^n \sim p(x_k | z_k, x_{i,k-1}^n)$ for $i = 1, \ldots, E_k^n$. In our implementation we use the transitional prior for this purpose, that is, $x_{i,k}^n \sim p(x_k | x_{i,k-1}^n)$. If the number of objects increases from $k - 1$ to $k$, i.e., $E_{k-1}^n < E_k^n$, then for the objects that continue to exist we draw $x_{i,k}^n$ using the transitional prior (as above), but for the newborn objects we draw particles from $p_b(x_k)$. Finally, if $E_{k-1}^n > E_k^n$, we select at random $E_k^n$ objects from the possible $E_{k-1}^n$, with equal probability. The selected objects continue to exist (the others do not) and for them we draw particles using the transitional prior (as above).
• Step 2.b follows from Eqs. (14) and (19). In order to perform its role as a detector, the particle filter computes its importance weights based on the likelihood ratio

$L_k(m) = \prod_{i=1}^{m} \dfrac{p(q_{i,k} | x_{i,k})}{p(q_{i,k}^B | x_{i,k})}$,   (20)

where $q_{i,k}^B$ is the color histogram of the image background computed in the region specified by $x_{i,k}$. Using Eq. (19), the likelihood ratio can be computed for each existing particle $n$ as

$L_k^n(E_k^n) = \exp\left(-\dfrac{1}{2\sigma^2} \sum_{i=1}^{E_k^n} \left[(D_{i,k}^n)^2 - (D_{i,k}^{n,B})^2\right]\right)$,   (21)

where

$D_{i,k}^n = \mathrm{dist}[q^*, q_{i,k}^n(z_k)]$   (22)

is the distance between the reference histogram $q^*$ and the histogram $q_{i,k}^n$ computed from $z_k$ in the region specified by $x_{i,k}^n$, and

$D_{i,k}^{n,B} = \mathrm{dist}[q^*, q_{i,k}^n(z_B)]$   (23)

is the distance between the reference histogram $q^*$ and the histogram $q_{i,k}^n(z_B)$ computed at the same position but from the background image $z_B$. The unnormalized importance weights are computed for each particle as:

$\tilde{w}_k^n = \begin{cases} 1 & \text{if } E_k^n = 0, \\ L_k^n(E_k^n) & \text{if } E_k^n > 0. \end{cases}$   (24)

Note that if the distance sum $\sum_{i=1}^{E_k^n} (D_{i,k}^n)^2$ in Eq. (21) is smaller than the background distance sum $\sum_{i=1}^{E_k^n} (D_{i,k}^{n,B})^2$, then the weight $\tilde{w}_k^n$ is greater than 1, and this particle has a better chance of survival in the resampling step. Strictly speaking,
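Step 5 of Table 1 only asks for some standard $O(N)$ resampling scheme. As one concrete possibility (our choice of illustration, not necessarily the authors'), systematic resampling can be sketched as follows:

```python
import numpy as np

def systematic_resample(weights, rng):
    """Systematic resampling: one O(N) pass, a common realization of step 5
    of Table 1. `weights` must be normalized; returns the indices of the
    particles selected to survive, with multiplicity roughly proportional
    to weight."""
    N = len(weights)
    positions = (rng.random() + np.arange(N)) / N  # N evenly spaced, jittered points
    cumulative = np.cumsum(weights)
    cumulative[-1] = 1.0                           # guard against floating round-off
    return np.searchsorted(cumulative, positions)

rng = np.random.default_rng(7)
w = np.array([0.05, 0.05, 0.8, 0.05, 0.05])
survivors = systematic_resample(w, rng)
# The dominant particle (index 2, weight 0.8) is selected several times,
# while low-weight particles are likely to be discarded.
```

Unlike multinomial resampling, which draws $N$ independent uniforms, the single jittered grid keeps the cost at one cumulative-sum pass and tends to reduce the variance of the particle counts.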


Fig. 2. Surveillance camera sequence: detected and tracked persons are marked with a rectangle.


Strictly speaking, the computation of $L_k$ requires that the color histogram of the image background is known at every location of the image. In many cases this is impractical (especially when the camera is moving and the background is varying), hence we approximate the distance $D^B_{i,k}$ between the target histogram and the background histogram as constant over all the image and for i = 1, ..., M. Introducing the constant $C^B = \exp\{(D^B_{i,k})^2/2\sigma^2\}$, the likelihood ratio becomes

$$L_k = \prod_{i=1}^{m} \exp\left\{\frac{1}{2\sigma^2}(D^B_{i,k})^2\right\} \exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^{m} D_{i,k}^2\right\} \qquad (25)$$

$$\phantom{L_k} = (C^B)^m \exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^{m} D_{i,k}^2\right\}. \qquad (26)$$

We thus treat the constant $C^B$ as a design parameter, as in [7]. $C^B$ is adapted to take into account the similarity between the target and the background histogram. An additional condition must be added: to prevent a histogram from being attributed to two overlapping objects, and to avoid the appearance of multiple objects on the same image region, the weight of the particle is set to zero if two objects are too close to each other, i.e., $\tilde{w}^n_k = 0$ if $(x^n_{j,k} - x^n_{i,k})^2 + (y^n_{j,k} - y^n_{i,k})^2 < R^2$, where R is a constant fixed by the user.

• For the resampling step (5), standard O(N) algorithms exist; see for example Table 3.2 in [1].
• The output of the PF (step 6) is carried out for reporting purposes, and consists of the estimation of the number of objects $\hat{m}$ and the estimation of the objects' states. The number of objects is estimated based on Eq. (8), where $\Pr\{E_k = m\,|\,Z_k\}$ is computed in the PF as

$$\Pr\{E_k = m\,|\,Z_k\} = \frac{1}{N}\sum_{n=1}^{N} \delta(E^n_k, m) \qquad (27)$$

and δ(i, j) = 1 if i = j, and zero otherwise (Kronecker delta). The estimate of the state vector of object $i = 1, \ldots, \hat{m}$ is then

$$\hat{x}_{i,k|k} = \frac{\sum_{n=1}^{N} x^n_{i,k}\,\delta(E^n_k, i)}{\sum_{n=1}^{N} \delta(E^n_k, i)}. \qquad (28)$$

4. Experimental results

Experiments were conducted on several real-world image sequences. The sequences can be found at http://euterpe.tele.ucl.ac.be/Tracking/pf.html. The practical details of the implemented algorithm are described next.

The transitional probability matrix is simplified as described in Section 2.2.1: only transitions from $m_{k-1}$ objects at time k − 1 to $m_{k-1} \pm 1$ objects at time k are allowed, with probability 0.05. In this way the TPM is a tri-diagonal matrix, with approximately 5% of the particles in the state with $E_k = m_{k-1} \pm 1$. The probability that the number of objects remains unchanged is accordingly set to 0.90. This simplification of the TPM means that if two objects appear at the same time, the estimate of the object number $\hat{m}$ will be incremented in two steps.

The distribution of newborn particles $p_b(x_{i,k})$ was adopted to be a uniform density over the state vector variables (i.e., no prior knowledge as to where the objects will appear). Finally, the color histograms were computed in the RGB color space using 8 × 8 × 8 bins, as in [8]. The target histogram is acquired using a few training frames. In each frame, the target region is selected manually and a histogram is computed from the region. The target histogram is obtained by averaging the histograms obtained for each frame.

The number of particles required by the filter depends on the selected value of M and the prior knowledge on where the objects are likely to appear (this knowledge is modeled by $p_b(x_{i,k})$). For M = 1, the filter with up to N = 150 particles achieves adequate accuracy both for detection and estimation. For M = 6 identical objects, it was necessary to use N = 5000 particles in order to rapidly detect newly appearing objects. This number can certainly be decreased using a better prior $p_b(x_{i,k})$ or a more adequate dynamic model, depending on the application. Also, the transition density that we use as proposal density is not optimal [23]. As we are primarily interested in testing the viability of the algorithm, we used the simplest models and the least prior knowledge in order to stay independent of a particular application.

Fig. 3. The probability of existence for objects 1 and 2 in the video sequence of Fig. 2.

In the first example the objective is to detect and track two different objects (i.e., two humans with different color histograms) in a video sequence recorded with a surveillance camera. The image resolution is 435 × 343 pixels. The first person (person A) wears a white shirt with a black tie, and his pants are black. The second person (person B) is in a blue t-shirt. There are 200 image frames available for detection and tracking, with a camera moving slowly in order to follow person A.

Fig. 4. Can sequence: detected and tracked cans are marked with a rectangle.
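The simplified transitional probability matrix described above can be built explicitly. The sketch below is ours, not the paper's code; in particular, how the boundary rows m = 0 and m = M absorb the leftover mass is our assumption, as it is not spelled out in the text.

```python
def tridiagonal_tpm(M, p_change=0.05):
    """Tri-diagonal TPM over the number of objects m = 0..M:
    probability p_change of moving to m - 1 or m + 1, with the
    remaining mass (0.90 in the interior, 0.95 at the boundaries)
    kept on the diagonal so every row sums to one."""
    P = [[0.0] * (M + 1) for _ in range(M + 1)]
    for m in range(M + 1):
        if m > 0:
            P[m][m - 1] = p_change   # one object disappears
        if m < M:
            P[m][m + 1] = p_change   # one object appears
        P[m][m] = 1.0 - sum(P[m])    # number of objects unchanged
    return P

P = tridiagonal_tpm(6)   # M = 6, as in the identical-objects setting
```

Because only ±1 transitions carry probability mass, two simultaneous appearances must be picked up over two consecutive steps, which is the behavior of $\hat{m}$ noted above.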

The estimated probabilities of existence of both person A and B are shown in Fig. 3. Eight selected image frames are displayed in Fig. 2. Here we effectively run two particle filters in parallel, each tuned (by the reference histogram) to detect and track its respective object. Each filter is using 150 particles, with σ = 0.8 and $C^B$ = 30. Person A appears in the first image frame and continues to exist throughout the video sequence. The particle filter detects it in frame number 14: the probability of existence of person 1 jumps to the value of 1 between frames 14 and 16, as indicated in Fig. 3. A detected object/person is indicated in each image by a white rectangle, located at the estimated object position. Person B enters the scene (from the left) in frame 50 and is detected by the PF in frame 60. Frame 79 is noteworthy: here person B partially occludes person A, and this is very well reflected in the drop of the probability of existence for person A; see again Fig. 3. In frame 160, person B leaves the scene, and hence its probability of existence drops to zero; person A continues to be tracked until the last frame.

In the second example, the aim is to detect and track three identical red and white-colored cans in a cluttered background. The complete sequence is 600 frames long and can be viewed on the web site given above. Fig. 4 shows an interesting event: one can passing in front of the second. The filter is using N = 1000 particles with parameters σ = 0.6 and $C^B$ = 70; the image size is 640 × 480. The two cans appear at frame 101 and are detected at frame 151. At frame 183, one can is occluded by the other. The second object is deleted by the filter at frame 187. At frame 226, the second can is again visible and the filter detects its presence at frame 232. Note that the filter does not perform data association. As the objects are not distinguishable, it cannot maintain the "identity" of the cans when they merge and subsequently split.

Fig. 5. Football sequence: detected and tracked players are marked with a rectangle.

In the third example the objective is to detect (as they enter or leave the scene) and track the soccer players of the team in red and black shirts (with white-colored numbers on their back). Fig. 5 displays 12 selected frames of this video sequence, recorded with a moving camera. The image resolution is 780 × 320 pixels. The filter is using N = 5000 particles, with parameters σ = 0.6 and $C^B$ = 80. We observe that initially five red players are present in the scene. Frames 4, 9, 35 and 67 show the first, second, third and fourth player being detected, respectively. Hence $\hat{m}_{67} = 4$. At frame 99, the first detected player leaves the scene. It is deleted by the filter at frame 102, and $\hat{m}_{102}$ switches back to 3.
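In all three examples the filter is tuned only through σ and $C^B$, which enter via the likelihood ratio of Eq. (26). As a hedged illustration (our sketch, not the paper's code), Eq. (26) can be evaluated as follows; the color distances $D_{i,k}$ between candidate regions and the reference histogram are assumed to be computed elsewhere.

```python
import math

def likelihood_ratio(distances, sigma, C_B):
    """Eq. (26): L_k = (C^B)^m * exp(-(1/(2 sigma^2)) * sum_i D_{i,k}^2)
    for a particle hypothesizing m objects with color distances D_{i,k}."""
    m = len(distances)
    ssq = sum(d * d for d in distances)
    return (C_B ** m) * math.exp(-ssq / (2.0 * sigma ** 2))

# Can-sequence settings: sigma = 0.6, C^B = 70. A candidate region
# close to the reference histogram (small D) scores much higher than
# a poor match, which drives the detection and deletion behavior above.
good = likelihood_ratio([0.2], sigma=0.6, C_B=70)
poor = likelihood_ratio([0.9], sigma=0.6, C_B=70)
```

With no hypothesized objects (m = 0) the ratio is 1, matching the unit weight that Eq. (24) assigns to particles with $E^n_k = 0$.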


At the same time, another player is entering the scene. It is detected at frame 141. One of the players leaves the scene at the top of frame 173, and subsequently it is deleted. This demonstrates the quick response of the particle filter to changes in the number of objects. All three remaining players are tracked successfully until the last frame.

The algorithm processing time is of course related to the number of particles needed to make the algorithm work. When the number of objects is small (i.e., one or two objects) with "gentle" motion (i.e., the dynamical model accurately describes the motion), the number of particles is below 500. In that case, our C++ implementation can run at 15 frames/s on a 2.8 GHz CPU. However, in the football sequence shown in the experiments, there are five objects to detect and track, and the required number of particles is then 5000. In that case the algorithm works at about 1 frame/s. Thus the main drawback of the proposed approach is that the number of particles increases with the number of objects (i.e., the size of the state vector). An excellent analysis of the relationship between the state vector size and the number of particles was presented in [24] and can be summarized as follows: using a smart proposal density in the PF, this relationship can be made linear; otherwise it tends to be exponential.

5. Conclusion

The paper presented a formal recursive estimation method for joint detection and tracking of multiple objects having the same feature description. This formal solution was then implemented by a particle filter using color histograms as object features. The performance of the detection and tracking algorithm was then tested on several real-world sequences. From the results, the algorithm can successfully detect and track many identical targets. It can handle non-rigid deformation of targets, partial occlusions and cluttered background. The experimental results also confirm that the method can be successfully applied even when the camera is moving. The key hypothesis in the adopted approach is that the background is of a sufficiently different color structure than the objects to be tracked. To alleviate this limitation, different observation features can be used in addition to color, for example appearance models [21] or contours.

This basic algorithm can be improved in several ways. For example, color histograms can be computed in different regions of the target (face, shirt, pants, etc.) in order to take the topological information into account [7]. Also, the number of required particles could be reduced by adopting a better proposal density for existing particles and a better prior density of appearing objects.

References

[1] B. Ristic, S. Arulampalam, N. Gordon, Beyond the Kalman Filter: Particle Filters for Tracking Applications, Artech House, 2004.
[2] R.E. Kalman, A new approach to linear filtering and prediction problems, Transactions of the ASME, Journal of Basic Engineering 82 (1960) 35–45.
[3] G. Welch, G. Bishop, An introduction to the Kalman filter, Technical Report TR 95-041.
[4] A. Doucet, J.F.G. de Freitas, N.J. Gordon (Eds.), Sequential Monte Carlo Methods in Practice, Springer, New York, 2001.
[5] M. Isard, A. Blake, Visual tracking by stochastic propagation of conditional density, in: Proceedings of the European Conference on Computer Vision, 1996, pp. 343–356.
[6] C. Rasmussen, G. Hager, Probabilistic data association methods for tracking complex visual objects, IEEE Transactions on Pattern Analysis and Machine Intelligence 23 (6) (2001) 560–576.
[7] P. Pérez, C. Hue, J. Vermaak, M. Gangnet, Color-based probabilistic tracking, in: Proceedings of the European Conference on Computer Vision (ECCV), LNCS 2350, Springer-Verlag, 2002, pp. 661–675.
[8] K. Nummiaro, E. Koller-Meier, L. Van Gool, An adaptive color-based particle filter, Image and Vision Computing 21 (2003) 99–110.
[9] D. Comaniciu, V. Ramesh, P. Meer, Real-time tracking of non-rigid objects using mean shift, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Hilton Head, SC, 2000, pp. II:142–149.
[10] J. MacCormick, A. Blake, A probabilistic exclusion principle for tracking multiple objects, International Journal of Computer Vision 39 (1) (2000) 57–71.
[11] M. Isard, A. Blake, A mixed-state condensation tracker with automatic model-switching, in: Proceedings of the International Conference on Computer Vision, 1998, pp. 107–112.
[12] M.J. Black, A.D. Jepson, Recognizing temporal trajectories using the condensation algorithm, in: Proceedings of the 3rd International Conference on Automatic Face and Gesture Recognition, 1998, pp. 16–21.
[13] M. Isard, J. MacCormick, BraMBLe: a Bayesian multiple blob tracker, in: Proceedings of the International Conference on Computer Vision, 2001, pp. 34–41.
[14] I. Cox, A review of statistical data association techniques for motion correspondence, International Journal of Computer Vision 10 (1) (1993) 53–66.
[15] I. Haritaoglu, D. Harwood, L. Davis, W4S: a real-time system for detecting and tracking people in 2 1/2-D, in: European Conference on Computer Vision, 1998, pp. 877–892.
[16] C. Wren, A. Azarbayejani, T. Darrell, A. Pentland, Pfinder: real-time tracking of the human body, IEEE Transactions on Pattern Analysis and Machine Intelligence 19 (1997) 780–785.
[17] J. Vermaak, A. Doucet, P. Pérez, Maintaining multi-modality through mixture tracking, in: International Conference on Computer Vision, 2003, pp. 1110–1116.
[18] S. Arulampalam, S. Maskell, N.J. Gordon, T. Clapp, A tutorial on particle filters for on-line non-linear/non-Gaussian Bayesian tracking, IEEE Transactions on Signal Processing 50 (2) (2002) 174–188.
[19] C. Hue, J.-P. Le Cadre, P. Pérez, Tracking multiple objects with particle filtering, IEEE Transactions on Aerospace and Electronic Systems 38 (3) (2002) 791–812.
[20] G.D. Forney, The Viterbi algorithm, Proceedings of the IEEE 61 (1973) 268–278.
[21] S. Zhou, R. Chellappa, B. Moghaddam, Visual tracking and recognition using appearance-adaptive models in particle filters, IEEE Transactions on Image Processing 13 (11) (2004) 1434–1456.
[22] A. Yilmaz, K. Shafique, M. Shah, Target tracking in airborne forward looking infrared imagery, Image and Vision Computing 21 (2003) 623–635.
[23] A. Doucet, S. Godsill, C. Andrieu, On sequential Monte Carlo sampling methods for Bayesian filtering, Statistics and Computing 10 (3) (2000) 197–208.
[24] F. Daum, J. Huang, Curse of dimensionality and particle filters, in: Proceedings of the IEEE Aerospace Conference, Big Sky, Montana, USA, 2003.
