
International Journal of Advanced Robotic Systems

Robot Visual Tracking via Incremental Self-Updating of Appearance Model

Regular Paper



Danpei Zhao 1, Ming Lu 1, Xuguang Zhang 2,* and Zhiguo Jiang 1

1 Image Processing Centre, School of Astronautics, Beihang University, Beijing, People's Republic of China
2 Key Lab of Industrial Computer Control Engineering of Hebei Province, Institute of Electrical Engineering, Yanshan University, Qinhuangdao, People's Republic of China
* Corresponding author E-mail: zhangxg@ysu.edu.cn

Received 20 Mar 2013; Accepted 12 Jun 2013



DOI: 10.5772/56759

© 2013 Zhao et al.; licensee InTech. This is an open access article distributed under the terms of the Creative
Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use,
distribution, and reproduction in any medium, provided the original work is properly cited.


Abstract This paper proposes a target tracking method called Incremental Self-Updating Visual Tracking for robot platforms. Our tracker treats the tracking problem as a binary classification between the target and the background. Greyscale, HOG and LBP features are used to represent the target and are integrated into a particle filter framework. To track the target over long sequences, the tracker has to update its model to follow the most recent appearance of the target. To address the wasted computation and the lack of a model-updating strategy in traditional methods, an intelligent and effective online self-updating strategy is devised to choose the optimal update opportunity. The appearance model is updated according to the change in discriminative capability between the current frame and the previously updated frame. By adjusting the update step adaptively, severe waste of calculation time on needless updates can be avoided while the stability of the model is preserved. Moreover, the appearance model is kept away from serious drift when the target undergoes temporary occlusion. The experimental results show that the proposed tracker achieves robust and efficient performance on several challenging benchmark video sequences with various complex environment changes in posture, scale, illumination and occlusion.

Keywords Visual Tracking, Adaptive Appearance Model, Self-Updating, Discriminative Capability, Multi-Feature Observation Model

1. Introduction
Visual tracking [1] [2] is one of the most important tasks in many applications of robots, such as vehicle tracking and control [3], military reconnaissance [4], security surveillance [5], non-rigid target tracking [6], and investigation of robotics [7]. The main challenges in visual tracking include scale and posture variation of the object, illumination change, complex and cluttered background, motion or viewpoint change of the camera [8] and occlusion, all of which inevitably cause large appearance variation. All these difficulties cause the target to change throughout a video sequence. This problem can be overcome if the target model is appropriately updated during tracking. Many methods have been proposed to achieve stable model updating, such as those based on subspace learning,
the adaptive mixture model, and online boosting. One of the most representative subspace-learning methods for modelling target appearance is the PCA subspace algorithm [9]. Inspired by this idea, an incremental 2DPCA was proposed in [10] to significantly reduce the computational cost on 2D image signals; an incremental linear discriminant analysis (ILDA) was also proposed in [11] to separate target samples from background samples. For the mixture model, the most popular approach is to integrate a Gaussian mixture model into an online expectation maximization (EM) framework to handle target appearance variations during tracking [12]. Online boosting-based work was proposed for feature selection and successfully applied to target tracking in [13]. This method was also applied to classify pixels as belonging to the foreground or the background.

All the above methods focus on representing the adaptive appearance model. Unfortunately, a significant problem is neglected, i.e., how to select the optimal update opportunity, which directly influences the accuracy and efficiency of the appearance model update [14]. Almost all state-of-the-art tracking approaches based on an incremental adaptive appearance model rigidly employ a mechanism that updates the appearance model in every frame or at a fixed step (e.g., every five frames). However, this kind of updating strategy lacks adaptability because the appropriate update time is hard to determine. A long-step updating strategy cannot keep up with the changes of the target; a short-step updating strategy is vulnerable to the drifting problem and is more time-consuming. Several complicated situations should be considered for model updating: 1) the model should be updated when the background or the target encounters significant changes in the video sequence; 2) the model should not be updated frequently if it maintains stable adaptability, in order to be less time-consuming; 3) the model should not be updated if severe occlusion occurs, to avoid model drifting. Therefore, neither a short-step nor a long-step fixed updating strategy can guarantee the validity of the model update, which may degrade the model or cause fast drift over time.

The key to selecting a suitable update opportunity for the model is to measure the adaptability of the current model during tracking. The model should be updated whenever adaptability is lost. Otherwise, the tracker should skip the updating procedure. A self-updating strategy is proposed in this paper, based on measuring the adaptability of the appearance model. Since model updating serves the tracking of the target, i.e., distinguishing the target from the background, the adaptability of the appearance model is measured by the change in discriminative capability, Dfc, which determines whether the current model needs to be updated or not. Compared with the fixed-step update strategy, our self-updating mechanism can capture the optimal update opportunity and eliminate the majority of unnecessary updates during tracking. To achieve the above goals, an Incremental Self-Updating Visual Tracking (ISUVT) method is proposed to validate the effectiveness of the proposed updating mechanism. In the ISUVT tracker, three features (greyscale, Histogram of Gradient (HOG) and Local Binary Pattern (LBP)) are used to represent the target. A particle filter is adopted as the search mechanism, which predicts the target state in the next frame based on the recent target state. Furthermore, three Incremental FLDA (IFLDA) classifiers are trained online with the ISUVT tracker by sampling target data and background data for each of the three features. According to the trained classifiers, the current target can be located by selecting the most similar particle. Using the pre-trained classifiers to classify target samples and background samples in the current frame, the discriminant capability of each feature is obtained. Whether the current sub-model should be updated is determined by the change of discriminant capability between the current frame and the previously updated frame. If updating is indeed required, the tracker will repeat the IFLDA training by sampling current data until the best target model is obtained. The framework of the proposed ISUVT tracker is illustrated in Fig. 1.

Figure 1. Basic flow of the proposed tracking approach
The rest of the paper is organized as follows: Section 2 briefly introduces the feature representation of the target. Section 3 deals with the proposed self-updating strategy and Section 4 shows the implementation of the ISUVT algorithm in detail. The experimental results of the tracker are presented in Section 5, followed by a conclusion in Section 6.
2. Target representation in ISUVT
The basic component of a visual tracking system is target representation. However, no single feature is suitable for all tracking conditions. Therefore, multiple features should be extracted to track the target throughout the whole sequence. In this work, three features are extracted to form the adaptive model: greyscale, HOG (Histogram of Gradient) and LBP (Local Binary Pattern). Since ISUVT adopts a particle filter as the search mechanism, i.e., the tracker detects the target state from all of the candidate particles in the current frame according to the prior state, every particle corresponds to a candidate image region. In other words, the tracker extracts greyscale, HOG and LBP features for every particle. These three features are used to measure the similarity of every particle with the target particle of the previous frame. The most similar particle will be chosen to locate the target in the current frame.

Greyscale is the most basic image feature. Many existing trackers use it for target representation; however, the greyscale feature is vulnerable to illumination changes and noisy data. In this paper, we combine the greyscale feature with HOG and LBP to form a robust feature representation. HOG [15] is a widely used gradient-based feature and is considered to be one of the best descriptors of a target's edges and shapes. To accelerate the calculation of HOG, we employ an integral histogram [16]. LBP is a powerful feature for texture classification; it has been shown that LBP and HOG features together can better detect humans under partial occlusion [17]. These three features together form a stable representation. Figure 2 gives visual examples of the three features.

Figure 2. Three kinds of feature representation of the target
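As an illustration of this multi-feature representation, the short Python sketch below extracts the three feature vectors for a single candidate image patch. It uses scikit-image's hog and local_binary_pattern; the patch size, HOG cell/block layout and LBP radius are our own illustrative choices, not parameters reported in the paper.

# Sketch: greyscale, HOG and LBP features for one candidate patch.
# The concrete parameter values here are illustrative assumptions only.
import numpy as np
from skimage.feature import hog, local_binary_pattern

def extract_features(patch):
    """patch: 2-D greyscale array, already cropped and resized (e.g., 16x16)."""
    grey = patch.astype(np.float64).ravel()
    grey /= (np.linalg.norm(grey) + 1e-12)                 # normalised greyscale vector

    hog_vec = hog(patch, orientations=9,
                  pixels_per_cell=(4, 4), cells_per_block=(2, 2))

    lbp = local_binary_pattern(patch, P=8, R=1, method='uniform')
    lbp_hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)
    return grey, hog_vec, lbp_hist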
3. Evaluation and self-updating of target model
Since it is very hard to track a target for a long time using a fixed appearance model, the target model must be able to self-update. In this section, we elaborate our self-updating strategy in detail. The tracking procedure can be divided into two stages: observation and updating. At the observation stage, our tracker samples data randomly and extracts features from each sample. At the updating stage, the Dfc value between the current frame and the last updated frame is calculated to determine whether the current model should be updated. Part A below shows how IFLDA (Incremental Fisher Linear Discriminant Analysis) is used in ISUVT. Part B introduces the observation stage. In Part C the discriminant capability of the appearance model is evaluated, and the updating strategy is discussed in Part D. The optimal updating occasion is analysed in practice in order to derive our self-updating strategy.
A. Feature classification by FLDA
Many discriminant classifiers have been used for binary classification in visual tracking applications, including SVM and Fisher Linear Discriminant Analysis (FLDA). Since the efficacy of Fisher Linear Discriminant Analysis in visual tracking has been demonstrated in [11] and [18], this classifier is employed in our ISUVT tracker. The aim of traditional FLDA in target tracking is to find an optimal discriminative generative model that can separate the target samples from the background samples, seeking a linear transformation matrix W by maximizing the inter-class scatter matrix Sb and minimizing the intra-class scatter matrix Sw. By randomly perturbing the target's position in the current frame, the target and background samples can be obtained and the linear transformation matrix W can be calculated. The optimal projection direction W is obtained by eigenvalue decomposition:

$S_w^{-1} S_b W = \lambda W.$  (1)
Since the target and background samples are separated in the transformation space, we estimate similarity in the transformation space instead of the feature space. However, if the target appearance changes, the transformation space may lose its discriminant capability for target and background samples. In this case, the transformation matrix W needs to be retrained with currently sampled data.
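For concreteness, the following sketch shows a plain batch FLDA training step corresponding to Eq. (1): it builds Sb and Sw from target and background sample matrices and takes the leading generalized eigenvector as the projection W. The incremental variant (IFLDA [18]) used by the tracker updates these quantities online instead of recomputing them; the regularization term here is our own addition for numerical stability.

# Sketch: batch FLDA training for Eq. (1); the incremental IFLDA update is not shown.
import numpy as np

def train_flda(target_feats, background_feats, reg=1e-3):
    """target_feats, background_feats: arrays of shape (n_samples, dim)."""
    mu_t = target_feats.mean(axis=0)
    mu_b = background_feats.mean(axis=0)
    mu = np.vstack([target_feats, background_feats]).mean(axis=0)

    # intra-class scatter S_w and inter-class scatter S_b
    Sw = ((target_feats - mu_t).T @ (target_feats - mu_t) +
          (background_feats - mu_b).T @ (background_feats - mu_b))
    Sb = (len(target_feats) * np.outer(mu_t - mu, mu_t - mu) +
          len(background_feats) * np.outer(mu_b - mu, mu_b - mu))

    # Eq. (1): eigen-decomposition of S_w^{-1} S_b (regularised for stability)
    Sw += reg * np.eye(Sw.shape[0])
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(-eigvals.real)
    W = eigvecs[:, order[:1]].real        # leading discriminant direction
    return W, mu_t, mu_b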
B. Observation model in particle filter
In this work, target tracking is achieved with a particle filter. As is well known, the tracking process is governed by the observation model of the particle filter framework, so we introduce the observation model adopted by our tracker first. The variable $X_t = (x_t, y_t, h_t, w_t, \theta_t)$ is used to describe the target state at time t, representing the target rectangular window by its translation $(x_t, y_t)$, height $h_t$, width $w_t$ and rotation angle $\theta_t$. During the observation stage, the value of $X_t$ is estimated from the likelihood maps of particles drawn in the input image I. Using FLDA classifiers $L_i$ as
sub-models, the particle likelihood maps corresponding to the multiple features are obtained as:

$P(X_t \mid L_t^i) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left\{-\frac{\mathrm{DIFF}_i^2}{2\sigma^2}\right\}$  (2)

where $\sigma$ is a constant and $\mathrm{DIFF}_i$ is defined as

$\mathrm{DIFF}_i = \frac{\| W_i^{T} (V - \mu_i^{T}) \|^2}{\| W_i^{T} (V - \mu_i^{B}) \|^2}$  (3)

where i denotes the i-th feature, $W_i$ is the projection matrix of FLDA trained on the i-th feature, $\mu_i^{T}$ and $\mu_i^{B}$ are the mean appearance vectors of the target and background samples respectively, and $V = (v_1, v_2, \dots, v_k)$ is the appearance vector of the drawn particles.

The vector $\mathrm{DIFF}_i = (d_i^1, d_i^2, \dots, d_i^k)$ denotes the distance ratios of the drawn particles between the target class and the background class. When $d_i^k$ is smaller than 1, the k-th particle is closer to the state of the target and is classified into the target class; otherwise it is classified into the background class.
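A minimal sketch of Eqs. (2)-(4) for one feature follows: given the trained projection $W_i$ and the mean target/background vectors, it computes the distance ratio DIFF for every drawn particle, the corresponding likelihood, and the target/background split at the value 1. The value of sigma is an assumption.

# Sketch of Eqs. (2)-(4): distance ratios, likelihoods and the class split for one feature.
import numpy as np

def particle_likelihood(W, mu_t, mu_b, V, sigma=1.0):
    """W: FLDA projection (dim x r); mu_t, mu_b: mean target/background
    appearance vectors; V: particle appearance vectors, shape (k, dim)."""
    proj_t = (V - mu_t) @ W                  # projection relative to the target mean
    proj_b = (V - mu_b) @ W                  # projection relative to the background mean
    diff = (np.linalg.norm(proj_t, axis=1) ** 2 /
            (np.linalg.norm(proj_b, axis=1) ** 2 + 1e-12))          # Eq. (3)
    likelihood = (np.exp(-diff ** 2 / (2 * sigma ** 2)) /
                  (np.sqrt(2 * np.pi) * sigma))                     # Eq. (2)
    target_mask = diff < 1                   # Eq. (4): d < 1 -> target class
    return diff, likelihood, target_mask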
C. Evaluation of discriminant capability of target model
After the observation step, the drawn particles are classified into two classes in the likelihood maps:

$\mathrm{DIFF}_i^{T} = (d_i^{t_1}, d_i^{t_2}, \dots, d_i^{t_m}), \quad d_i^{t_m} < 1$
$\mathrm{DIFF}_i^{B} = (d_i^{b_1}, d_i^{b_2}, \dots, d_i^{b_n}), \quad d_i^{b_n} > 1$  (4)
Particles that are far away from the classification boundary make a greater contribution to estimating the final tracking result and to evaluating the discriminative capability of the appearance model, while particles near the boundary are not selected for this task. In order to eliminate the influence of trivial $\mathrm{DIFF}_i$ values and highlight the discriminative capability, $\mathrm{DIFF}_i$ is stretched:

$\mathrm{DIFF}_i^{T*} = \frac{1}{\sqrt{2\pi}\,\sigma_1}\exp\left\{-\frac{(\mathrm{DIFF}_i^{T} - \min(\mathrm{DIFF}_i^{T}))^2}{2\sigma_1^2}\right\}$
$\mathrm{DIFF}_i^{B*} = \frac{1}{\sqrt{2\pi}\,\sigma_2}\exp\left\{-\frac{(\mathrm{DIFF}_i^{B} - \min(\mathrm{DIFF}_i^{B}))^2}{2\sigma_2^2}\right\}$  (5)
To judge the class separability of the target and the background, the class variance is defined as:

$\mathrm{var}(L; a) = E\left[(L(i) - E[L(i)])^2\right] = \sum_i a(i) L(i)^2 - \left[\sum_i a(i) L(i)\right]^2$  (6)

where a(i) denotes the probability density function of the corresponding distribution.

The Variance Ratio (VR) of the target and the background is then calculated from the new distribution of the particle likelihood maps, and the discriminative capability of the appearance model is obtained as:

$D_i = VR(L_i; p, q) = \frac{\mathrm{var}(L_i; (p+q)/2)}{\mathrm{var}(L_i; p) + \mathrm{var}(L_i; q)}$  (7)

where p and q denote the distributions of the target class and the background class, respectively, in the likelihood map $L_i$. A high value of $D_i$ indicates that the particles of the target class and of the background class are each tightly clustered (low intra-class variance) while the two classes are well separated from each other (high inter-class variance); this means that the appearance model is better able to discriminate between the target and the surrounding background.
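The sketch below implements Eqs. (6)-(7) directly: a weighted-variance helper and the variance ratio of a likelihood map under the target distribution p and background distribution q. How p and q are discretised (e.g., as normalised histograms over the stretched DIFF values) is our assumption.

# Sketch of Eqs. (6)-(7): weighted variance and the variance ratio of a likelihood map.
import numpy as np

def weighted_var(L, a):
    """Eq. (6): var(L; a) = E_a[L^2] - (E_a[L])^2, with L and a defined over the same bins."""
    return np.sum(a * L ** 2) - np.sum(a * L) ** 2

def variance_ratio(L, p, q):
    """Eq. (7): D_i = VR(L; p, q) for target distribution p and background distribution q."""
    return (weighted_var(L, (p + q) / 2.0) /
            (weighted_var(L, p) + weighted_var(L, q) + 1e-12))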

Since ISUVT extracts three features for target representation, our tracker can adaptively choose the optimal feature and assign a different weight to every feature online for fusion in each frame, according to the change in the target and the surrounding background. Only a feature with high discriminative capability can make a significant contribution to the final tracking result. The tracking performance depends on how distinguishable the target is from its background. The greatest weight should therefore be assigned to the feature with the greatest ability to discriminate between the target and the surrounding background. A fusion strategy for the multiple features is employed here, based on the Variance Ratio method [19] [20].

The weight $w_i$ is approximated by the normalized VR:

$w_i = \frac{VR(L_i; p, q)}{\sum_{j=1}^{M} VR(L_j; p, q)}.$  (8)

To compute the final state $P(X_t \mid I)$ from the input image I, the likelihood maps $P(X_t \mid L_i)$, based on the different feature extraction mechanisms, are fused [20]:

$P(X_t \mid I) = \int P(X_t \mid L) P(L \mid I)\, dL \approx \sum_{i=1}^{M} w_i P(X_t \mid L_i).$  (9)

where $L_i$ denotes the particle likelihood map generated by the i-th feature from the previously extracted features, and $w_i$ is the adaptive weight factor of this likelihood map.

Once one feature fails, it is assigned a very small weight. Therefore, it cannot seriously influence the
tracker. Because the weights of the likelihood maps are adjusted in every frame, the tracker can adaptively choose the best feature for tracking online. When dealing with significant challenges such as variation of posture, scale and illumination, the tracker is therefore more robust and accurate than other methods.
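The weight normalisation of Eq. (8) and the likelihood-map fusion of Eq. (9) reduce to a few lines; the sketch below assumes the per-feature variance ratios and per-particle likelihood maps have already been computed.

# Sketch of Eqs. (8)-(9): normalised fusion weights and the fused particle likelihood.
import numpy as np

def fuse_likelihoods(vr_per_feature, likelihood_maps):
    """vr_per_feature: length-M array of VR(L_i; p, q);
    likelihood_maps: array of shape (M, k) with P(X_t | L_i) per particle."""
    w = np.asarray(vr_per_feature, dtype=float)
    w = w / w.sum()                                                   # Eq. (8)
    fused = (w[:, None] * np.asarray(likelihood_maps)).sum(axis=0)    # Eq. (9)
    return w, fused

# The particle with the highest fused likelihood is taken as the target state:
# best = np.argmax(fused)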
D. Adaptive self-updating mechanism of the appearance model
During tracking, the model needs to be updated incrementally with newly obtained data in order to adapt efficiently to changes in the environment. However, most existing algorithms employ a rigid update mechanism with a fixed step of frames; thus, they cannot capture the optimal occasion to update the appearance model. In this paper, we propose a smart self-updating strategy, which evaluates the stability of the appearance model and the change in the tracking environment based on the change in the discriminative capability of the appearance model.

The requirements for selecting the optimal update time are analysed first. The appearance model is more robust and accurate when it can exactly distinguish the target from the background. When the model loses this discriminative capability, the tracking performance drops. When the tracker encounters a significant environmental change for the first time (e.g., illumination changes from bright to dark), the discriminative capability of the model inevitably declines in the current frame, because the model lacks prior information on the current situation. Therefore, the model should be updated when it degrades to a certain level.

Consider another situation: when the tracking condition gradually reverts to a previous state (e.g., the illumination changes back to a bright condition), the discriminative capability rises in the current frame, because the appearance model still retains classification ability in the restored state, based on the prior information learned from previous frames. In this case, the increase in discriminative capability shows that the target itself is more distinguishable from the background in this frame, and the tracker should collect positive and negative samples to strengthen the stability of the model. Moreover, when occlusion or an accidental error occurs, the classification is not reliable, and all the predicted locations of the target might be classified into the background class, which leads to a dramatic change in discriminative capability. Because this change greatly exceeds the normal range, the occlusion or obvious error can be detected during tracking, and updating in the current frame is firmly avoided.

As discussed above, the optimal update time and the change in the tracking surroundings can be captured by evaluating the change in the discriminative capability of the appearance model between the current frame and the last updated frame.

In the proposed self-updating strategy, the discriminative capability of the appearance model is the result of fusing the discriminative capabilities of multiple independent sub-models; the fused discriminative capability $D_f$ can be expressed by weighting the discriminative capability $D_i$ of each independent sub-model with the normalized weight:

$D_f = \sum_{i=1}^{M} w_i D_i.$  (10)
The extent of change between the current frame and the last updated frame is defined as:

$D_{fc} = \frac{|D_f^t - D_f^l|}{D_f^l}$  (11)

where $D_f^t$ and $D_f^l$ represent the discriminative capability of the appearance model in the current frame and in the last updated frame, respectively. During tracking, the tracker evaluates $D_{fc}$ after locating the target in every frame to determine whether the model should be updated.

Once the condition $\theta_u \le D_{fc} \le \theta_o$ is satisfied, with $\theta_u$ and $\theta_o$ being preset thresholds, the ability to separate the target from the background has changed within a predetermined range and the appearance model should be updated. Occlusion or an accidental error is likely when $D_{fc} > \theta_o$, and the tracker should firmly avoid updating in such a scenario. When $D_{fc} < \theta_u$, although a change is present, the appearance model is still stable enough to separate the target and background accurately, so updating can also be skipped to improve efficiency. By adopting the self-updating strategy, the tracker can intelligently capture the optimal update opportunity, which is beneficial both for improving efficiency and for avoiding the serious drift caused by wrong updates.
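Putting Eqs. (10)-(11) and the threshold test together, the self-updating decision for one frame can be sketched as below; the values 0.3 and 0.7 are the empirical thresholds reported in Section 5, and the small constant in the denominator is our own guard against division by zero.

# Sketch of Eqs. (10)-(11) and the three-way update decision.
import numpy as np

def update_decision(weights, D_per_feature, D_f_last, theta_u=0.3, theta_o=0.7):
    D_f = float(np.dot(weights, D_per_feature))           # Eq. (10): fused capability
    D_fc = abs(D_f - D_f_last) / (D_f_last + 1e-12)        # Eq. (11): relative change
    if D_fc > theta_o:
        action = 'skip'      # likely occlusion or gross error: do not update
    elif D_fc < theta_u:
        action = 'skip'      # model still stable: save the computation
    else:
        action = 'update'    # theta_u <= D_fc <= theta_o: retrain IFLDA online
    return D_f, D_fc, action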

If $D_{fc}$ satisfies the condition mentioned above, the tracker needs to update the appearance model. It is not necessary to repeat the full retraining, i.e., recalculating the inter-class scatter matrix Sb and the intra-class scatter matrix Sw from the currently sampled data from scratch. Instead, we adopt incremental learning to train the FLDA classifier, as proposed by Lin [18]. IFLDA establishes an online, incremental learning and updating scheme that combines the previous result with the new training data, which is very efficient and suitable for tracking and other time-sequence applications. As a result, our tracker trains FLDA in the first frame and trains IFLDA online in later frames when needed.
4. Implementation of ISUVT
In this section we describe aspects of our ISUVT tracker in more detail, including the particle filter framework, multiple-feature representation, evaluation of discriminative capability and updating of the appearance model. The basic flow of the proposed ISUVT tracker is illustrated in Figure 3.
Basic Flow of ISUVT
Given: target position $X_t = (x_t, y_t, h_t, w_t, \theta_t)$ in the last frame t (t > 1)
for every frame in the video sequence:
  Observation stage:
  1: randomly select a particle set $\{P_n\}_{n=1}^{N}$ based on $X_t$
  2: extract features from the N particles
  3: if $W_i$ does not exist (first frame), train the FLDA classifier using current target and background data
  4: calculate $\mathrm{DIFF}_i$ for each feature using the last updated $W_i$
  Updating stage:
  5: generate the likelihood map of each feature using the corresponding $\mathrm{DIFF}_i$
  6: use the likelihood maps generated in step 5 to calculate the class variance of each feature
  7: calculate the discriminant capability $D_i$ of each feature using the class variance obtained in step 6
  8: calculate the normalized coefficient $w_i$ of each feature using the current VR
  9: fuse the likelihood maps of the features using $w_i$ and $P(X_t \mid L_i)$
  10: select the weighted particle as the target in frame t+1
  11: calculate the change of discriminant capability $D_{fc}$ using $w_i$
  12: make the update decision:
      if $D_{fc}$ lies in $[\theta_u, \theta_o]$ then update W using IFLDA
      end if
end for
Figure 3. Basic flow of the proposed tracking approach
As the last section showed, the main tracking procedure can be divided into the observation stage and the updating stage.

At the particle observation stage, suppose that we know the target state of the last frame or the initial state. Our tracker randomly selects positive and negative samples from the target class and the background class, respectively, to train the IFLDA classifier online. Then, our tracker estimates the similarity, called DIFF, between each particle and the target of the last frame, and generates the likelihood maps of the current frame. The DIFF values are also used at the updating stage, since they determine the model's adaptability during tracking.

At the updating stage, we first evaluate the likelihood maps using the DIFF values from the particle observation stage. According to the likelihood maps, the tracker detects the best candidate particle as the current target state. Then the discriminant capability of the current frame is calculated. Finally, we obtain $D_{fc}$ from the current frame and the last updated one. Our tracker can then easily determine whether the model needs to be updated based on the $D_{fc}$ value. If necessary, the model is updated by retraining the IFLDA classifier online with the recent data.
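The two stages can be tied together into a single per-frame step. The skeleton below reuses the helper functions from the earlier sketches; sample_particles, crop_patch and the per-feature sub-model objects (with extract, histograms and incremental_update methods) are hypothetical placeholders for components the paper describes but does not name.

# Skeleton of one ISUVT iteration (observation stage + updating stage).
# Helper names not defined in the earlier sketches are hypothetical placeholders.
import numpy as np

def isuvt_step(frame, state, models, D_f_last):
    particles = sample_particles(state)                       # observation: draw candidates
    vr, maps = [], []
    for m in models:                                          # one sub-model per feature
        V = np.stack([m.extract(crop_patch(frame, p)) for p in particles])
        diff, lik, _ = particle_likelihood(m.W, m.mu_t, m.mu_b, V)
        maps.append(lik)
        vr.append(variance_ratio(*m.histograms(diff)))        # Eqs. (5)-(7)
    w, fused = fuse_likelihoods(vr, maps)                     # Eqs. (8)-(9)
    new_state = particles[int(np.argmax(fused))]              # best particle = new target state
    D_f, D_fc, action = update_decision(w, vr, D_f_last)      # Eqs. (10)-(11)
    if action == 'update':                                    # retrain IFLDA only when needed
        for m in models:
            m.incremental_update(frame, new_state)
    return new_state, D_f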

When our model is extended to multi-feature representation, a feature fusion strategy is added when generating the likelihood maps and calculating the discriminant capability. In fact, each fusion weight used to generate the likelihood maps and to calculate the discriminant capability is a coefficient obtained from the normalized Variance Ratio.
5. Experimental results
In this section, we design a two-part experiment to demonstrate the efficiency and robustness of the proposed method. Firstly, the proposed self-updating strategy is compared with the fixed-step updating strategy. Secondly, the tracking performance of ISUVT is compared with the state-of-the-art Incremental Visual Tracking (IVT) tracker [9]. IVT adjusts the appearance model automatically based on Incremental Principal Component Analysis and is widely accepted as a robust tracking method.

The number of particles in IVT is set to 600 and every image patch is resized to 32×32 in [9]. To significantly reduce the calculation cost and demonstrate the validity of the proposed self-updating strategy, the number of particles and the size of each image patch are reduced to 300 and 16×16, respectively, in both of the tested algorithms; the other parameters of IVT remain at the default values provided by the authors. Furthermore, the two thresholds $\theta_u$ and $\theta_o$ used at the self-updating stage are set to 0.3 and 0.7, based on trial-and-error experiments. If we set $\theta_u$ too small and $\theta_o$ too large, the tracker becomes sensitive to subtle changes and updates its model frequently. On the other hand, if we set the two thresholds too close to each other (i.e., a large value of $\theta_u$ and a small value of $\theta_o$), the tracker will be unable to update its model. All the parameters of the two algorithms are kept constant during the experiments.
To show the robustness of the proposed method in dealing with different situations, the datasets used in this work are challenging benchmark video sequences involving temporary occlusion, variation in posture, scale and illumination, and complex background. For all these sequences, the ground truths are manually labelled. Except for "Tank" and "Fighter", which were captured by our research group, all the video sequences are downloaded from [21].
A. Self-updating vs. fixed-step updating
In this section we compare our ISUVT with the fixed-step updating strategy adopted by Incremental Visual Tracking (IVT) in the four aspects below.
(1) Dealing with severe occlusion with minor tracking error
The "Tank" video sequence contains a moving tank that is fully occluded by flame and smoke after it fires. It can be clearly seen from Fig. 4 that the tank undergoes full occlusion from frame 44 to frame 60, so that the Dfc curve changes dramatically in these frames. In this period, IVT updates three times, in frames 45, 50 and 55, since it updates its model every five frames. As a result, as shown in Fig. 5, IVT loses its target with a large tracking error, whereas ISUVT maintains tracking with only a minor error. Using the two thresholds $[\theta_u, \theta_o]$ in the updating stage, our tracker can distinguish effective from poor updating opportunities, as shown in Fig. 3.

Figure 4. The Dfc curves with corresponding video frames of "Tank": ISUVT (green dot) and IVT (red dot)

Figure 5. Tracking results of the "Tank" sequence
(2) Tracking a slowly changing target
To evaluate the tracking performance for a target whose appearance changes slowly, two sequences ("Dudek" and "Car4") are selected for our experiments. The tracking results show that the ISUVT tracker produces only minor errors with fewer updates than IVT, which adopts the fixed-step updating strategy.

The target tracked in the "Car4" sequence is a moving car that undergoes large illumination and scale changes. The tracking results are presented in Fig. 6. The IVT tracker gradually drifts away after a heavy illumination change under a bridge in frame 190. The ISUVT tracker, however, tracks the target quite well. From the curve shown in Fig. 8, ISUVT chooses the optimal update times around frames 56, 190, 235 and 304 in this sequence. According to the results shown in Table 1, ISUVT updates 51 times while IVT updates 100 times in total.

Figure 6. Tracking results of the "Car4" sequence: ISUVT (green dot), IVT (red dot)

Figure 7. Tracking results of the "Dudek" sequence: ISUVT (green dot), IVT (red dot)

Figure 8. Dfc curves of "Car4" and "Dudek"
In the "Dudek" sequence, the target is a moving face that undergoes temporary occlusion and large changes in motion and lighting. The tracking results are presented in Fig. 7. From the Dfc curve shown in Fig. 8, it can be seen that ISUVT updates in several frames of the sequence, including frame 85 (face moving), frame 185 (taking off glasses), frame 277 (smiling), and between frames 350 and 450 (large motion and lighting changes). When temporary occlusion takes place, the dramatic change in Dfc goes beyond the set range, so the proposed appearance model remains stable without being updated. The tracked objects in the above videos change slowly, yet IVT gradually loses the target while ISUVT holds it stably. Figure 9 shows the tracking errors of IVT and ISUVT in these two video sequences. ISUVT updates 24 times while the IVT tracker updates 92 times (as shown in Table 1).

Figure 9. Tracking errors in "Car4" and "Dudek"
(3) Tracking a rapidly changing target
The target tracked in the "Fighter" sequence is a flying fighter jet that undergoes severe scale changes as well as changes in position and rotation due to its motion. From Table 1 we can see that IVT updates 28 times while ISUVT updates 24 times. Although both trackers track the target successfully, the ISUVT tracker is more accurate in both scale and location, especially in the period where the target shrinks to a very small size. From the Dfc curves, we can clearly see that in the first half of the sequence ISUVT seeks fewer updating occasions than IVT, since the appearance of the target changes slowly. In the second half of the sequence, ISUVT increases its updating frequency according to the Dfc changes, while IVT keeps the same fixed schedule. As a result, ISUVT is able to follow the changes of the fighter, as Fig. 10 and Fig. 11 show.

Figure 10. Tracking results for the "Fighter" sequence: ISUVT (green dot), IVT (red dot)

Figure 11. Dfc curves (left) and tracking errors (right) in the "Fighter" sequence
(4) Comparison of updating frequency
To compare the efficiency of the two trackers, we list the number of updates for every tested sequence in Table 1. The IVT tracker updates the appearance model every five frames. Compared with the IVT tracker, the proposed method is more efficient and obtains more precise results with a lower update frequency.

Videos    Total frames    Update frequency (IVT)    Update frequency (ISUVT)
Dudek     460             92                        24
Tank      124             24                        10
Trellis   501             100                       51
David     462             92                        16
Fighter   136             28                        24
Car4      500             100                       36
Car11     393             78                        19
Cars      245             49                        11
Table 1. Efficiency comparison of our method with IVT in terms of update frequency for the benchmark sequences
B. Tracking the target in a challenging environment
The first part of the experiment has shown that our proposed self-updating strategy is superior to the fixed-step updating strategy. In the second part we evaluate the robustness of ISUVT on several challenging benchmark video sequences with various disturbing factors such as illumination changes, posture changes, scale changes and complex background.



(1) Illumination, posture and scale changes
The target tracked in the "Trellis" sequence is a man walking past a trellis covered by vines. As shown in Fig. 12, the IVT tracker and the proposed tracker are both robust when the lighting conditions vary drastically; however, IVT fails completely when the man also shows severe posture changes in frame 340. From the Dfc curve (shown in Fig. 14) we can see that the ISUVT tracker updates frequently during the period when the target undergoes both illumination and posture changes, which guarantees the robustness of the appearance model.

The target tracked in the "David" sequence is a moving face undergoing illumination, posture and scale changes. The tracking results are shown in Fig. 13. Although both trackers track the target successfully, the ISUVT tracker is more stable than IVT, which is especially clear in frame 173. The ISUVT tracker is more stable and robust under these challenging conditions. The Dfc curve is presented in Fig. 14.

Figure 12. Tracking results for the "Trellis" sequence: ISUVT (green dot window), IVT (red dot window)

Figure 13. Tracking results for the "David" sequence: ISUVT (green dot), IVT (red dot)

Figure 14. Dfc curves for the "Trellis" and "David" sequences
(2) Complex background and low-contrast environment
The target tracked in the "Car11" sequence is a vehicle driven in a very dark outdoor environment. From the tracking results shown in Fig. 15, it is obvious that the IVT tracker faces a drifting problem from frame 208. Although IVT can roughly detect the position of the target over the whole sequence, its results are not as accurate as those delivered by ISUVT. The Dfc curve of this sequence is presented in Fig. 17. ISUVT carries out the updating procedure more efficiently, with updates occurring mostly in the last part of the sequence, when the background becomes more complex, which effectively prevents drifting of the appearance model.

In the "Cars" sequence, a moving car experiences interference from very similar targets in a low-contrast environment. The tracking results are presented in Fig. 16. The IVT tracker exhibits unstable behaviour in frames 132 and 147, when the target is close to a tree and a similar car, respectively. The tracking performance of the ISUVT tracker stays very good throughout the sequence. The corresponding Dfc curve is presented in Fig. 17.

Figure 15. Tracking results for the "Car11" sequence: ISUVT (green dot), IVT (red dot)

Figure 16. Tracking results for the "Cars" sequence: ISUVT (green dot), IVT (red dot)

Figure 17. Dfc curves for the "Car11" and "Cars" sequences
To quantitatively compare robustness and accuracy, the ground truth of the sequences was manually labelled. The tracking error is evaluated as the relative position error (in pixels) between the centre of the tracking result and that of the ground truth. The quantitative comparison results are shown in Fig. 18. Our method produces smaller tracking errors for almost all sequences.
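For reference, the position-error metric used in Fig. 18 amounts to the per-frame Euclidean distance between the tracked centre and the ground-truth centre, as sketched below.

# Sketch: per-frame centre position error in pixels.
import numpy as np

def centre_errors(tracked_centres, gt_centres):
    """Both arguments: arrays of shape (n_frames, 2) holding (x, y) centres."""
    return np.linalg.norm(np.asarray(tracked_centres) - np.asarray(gt_centres), axis=1)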

Figure 18. Quantitative comparison for the benchmark sequences of ISUVT (red line) and IVT (blue line) in terms of position error
6. Conclusions and future work
This paper has proposed a robust and efficient online tracking algorithm and presented a smart self-updating strategy. Compared with the fixed-step updating strategy, self-updating can better detect the optimal updating opportunities, addressing both update efficiency and the drifting problem. From the experiments carried out on different challenging datasets, we can draw the following conclusions:

1. The established self-updating mechanism can select the optimal update opportunity based on the change in the discriminative capability of the appearance model between the current frame and the last updated frame. In this way, the tracker is able not only to reduce unnecessary computation when the appearance model is still stable, but also to keep the appearance model away from significant drift when the target undergoes temporary partial occlusion.
2. The proposed tracker (ISUVT), which uses multiple features for target representation and adopts the self-updating strategy, achieves higher efficiency and greater precision with fewer updates than IVT.
In future work, we will attempt to introduce Multiple Instance Learning (MIL) into our method, which can alleviate the drifting problem and achieve better sampling performance for long-term tracking tasks. The computing speed of the method could also be improved in order to integrate the proposed tracker into a robotic system.
7. Acknowledgements
This research was supported by the National Natural
Science Foundation of China (No. 60802043, No.
61071137, No. 61271409), the National Basic Research
Programme (also called 973 Programme, No.
2010CB327900), the China Postdoctoral Science
Foundation (No. 2012M510768, No. 2013T60264) and the
Natural Science Foundation of Hebei Province, China
(No. F2013203364). The authors also thank Prof. David Ross for providing the code for the IVT tracker.
8. References
[1] X. Jin, J. Du, J. Bao. Target Tracking of a Linear Time-Invariant System Under Irregular Sampling. International Journal of Advanced Robotic Systems, vol. 9, 2012.
[2] X. Zhang, S. Hu, X. Li, D. Chen. Fast covariance matching with fuzzy genetic algorithm. IEEE Transactions on Industrial Informatics, vol. 8, no. 1, pp. 148-157, 2012.
[3] B. Shirinzadeh. Vision-based tracking and control of remotely controlled vehicles. Industrial Robot: An International Journal, vol. 24, no. 4, pp. 297-301, 1997.
[4] A. Lacaze, K. Murphy, M.D. Giorno. Real-time Reconnaissance and Autonomy for Small Robots (RASR) team: MAGIC 2010 challenge. Journal of Field Robotics, vol. 29, no. 5, pp. 729-744, 2012.
[5] T. Theodoridis and H. Hu. Toward Intelligent Security Robots: A Survey. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 42, no. 6, pp. 1219-1230, 2012.
[6] X. Zhang, X. Li, M. Liang, Y. Wang. Covariance tracking with forgetting factor and random sampling. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 19, no. 3, pp. 547-558, 2011.
[7] X. Zhang, H. Liu, Y. Wang. Feature Fusion-based Object Tracking for Robot Platforms. Industrial Robot, vol. 38, no. 1, pp. 66-75, 2011.
[8] X. Zhang, H. Liu, X. Li. Target Tracking for Mobile Robot Platforms via Object Matching and Background Anti-matching. Robotics and Autonomous Systems, vol. 58, no. 11, pp. 1197-1206, 2010.
[9] D. Ross, J. Lim, R.S. Lin, and M.H. Yang. Incremental Learning for Robust Visual Tracking. International Journal of Computer Vision, vol. 77, no. 1-3, pp. 125-141, 2008.
[10] T. Wang, Y.H. Gu and P. Shi. Object Tracking Using Incremental 2DPCA Learning and ML Estimation. IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 933-936, 2007.
[11] R. Lin, D. Ross, J. Lim, and M.H. Yang. Adaptive Discriminative Generative Model and Its Applications. Advances in Neural Information Processing Systems, pp. 801-808, 2005.
[12] A.D. Jepson, D.J. Fleet, and T.F. El-Maraghi. Robust Online Appearance Models for Visual Tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 10, pp. 1296-1311, 2003.
[13] H. Grabner and H. Bischof. Online boosting and vision. IEEE Proceedings on Computer Vision and Pattern Recognition, pp. 260-267, 2006.
[14] Q. Wang, F. Chen and W. Xu. An Experimental Comparison of Online Object Tracking Algorithms. SPIE Proceedings on Image and Signal Processing, vol. 8138, pp. 1-11, 2011.
[15] N. Dalal, B. Triggs. Histograms of oriented gradients for human detection. IEEE International Conference on Computer Vision and Pattern Recognition, pp. 886-893, 2005.
[16] F. Porikli. Integral histogram: a fast way to extract histograms in Cartesian spaces. IEEE International Conference on Computer Vision and Pattern Recognition, pp. 886-893, 2005.
[17] X. Wang, X. Han, S. Yan. An HOG-LBP Human Detector with Partial Occlusion Handling. IEEE International Conference on Computer Vision, pp. 32-39, 2009.
[18] R.S. Lin, M.H. Yang and S.E. Levinson. Object Tracking Using Incremental Fisher Discriminant Analysis. IEEE Proceedings on International Conference on Pattern Recognition, vol. 2, pp. 757-760, 2004.
[19] R.T. Collins, Y. Liu and M. Leordeanu. Online Selection of Discriminative Tracking Features. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 10, pp. 1631-1643, 2005.
[20] Z. Yin, F. Porikli, and R. Collins. Likelihood Map Fusion for Visual Object Tracking. IEEE Workshop on Applications of Computer Vision, pp. 1-7, 2008.
[21] http://www.cs.toronto.edu/~dross/ivt/
