
Detection and Tracking of Moving

Objects from a Moving Platform

Gérard Medioni
Institute of Robotics and Intelligent Systems
Computer Science Department
Viterbi School of Engineering
University of Southern California

Problem Definition
Scenario: rigidly moving objects + moving camera

Goal
Motion segmentation: motion regions / background area
Tracking of multiple objects: consistent track(s) over time
Geo-registration and Geo-tracking: Geo-referenced mosaic and tracks

Scenario example 1 - moving cameras

Moving cameras → Image stabilization → Motion segmentation → Tracking → Mosaic + Tracks

Scenario example 2 - moving cameras with a map

Moving camera + Map → Image stabilization → Geo-registration → Motion segmentation → Tracking → Global data association → Geo-referenced mosaic + tracks

Challenges & Applications


Information sources
Pixel colors + 2D coordinates
Object model information (optional)

Difficulties
Camera motion
3D Static structures (parallax)
Multiple moving objects

Applications
Video surveillance
Video compression and indexing

Outline
✓ Introduction
✓ 2D Motion segmentation
Tracking of multiple moving objects
Geo-registration and geo-tracking
Summary and Discussion

Motion Segmentation Overview


Task: segment motion regions from the background

Assumptions
General camera motion
Distant scene
Textured background

Feature Extraction & Matching


Salient parts of the scene
Extraction
Harris corners
Multi-scale
Multi-orientation
Sub-pixel accuracy

Matching
Small inter-frame motion: gray-scale windows, cross correlation
Large viewpoint change: gradient histogram, vector angle
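The gray-scale window matching for small inter-frame motion can be sketched with normalized cross-correlation between fixed-size patches around each corner; the window size, score threshold, greedy matching strategy, and function names below are illustrative choices, not from the talk.

```python
import numpy as np

def ncc(patch_a, patch_b):
    """Normalized cross-correlation between two equal-size gray patches."""
    a = patch_a - patch_a.mean()
    b = patch_b - patch_b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def match_corners(img1, img2, corners1, corners2, win=7, min_score=0.8):
    """Greedy matching of corner lists by best NCC score (illustrative)."""
    h = win // 2
    matches = []
    for (x1, y1) in corners1:
        best, best_pt = min_score, None
        p1 = img1[y1 - h:y1 + h + 1, x1 - h:x1 + h + 1]
        for (x2, y2) in corners2:
            p2 = img2[y2 - h:y2 + h + 1, x2 - h:x2 + h + 1]
            if p1.shape != p2.shape:
                continue  # corner too close to the image border
            s = ncc(p1, p2)
            if s > best:
                best, best_pt = s, (x2, y2)
        if best_pt is not None:
            matches.append(((x1, y1), best_pt, best))
    return matches
```

For large viewpoint changes the slide switches to gradient-histogram descriptors compared by vector angle; the same matching loop applies with a different score function.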

Multiple Image Registration

Frame motion model
Assumptions: small inter-frame motion, distant planar scene
2D affine transform

Robust estimation
Random Sample Consensus (RANSAC): keep the model with the largest number of inliers
Non-linear refinement over the inliers

A·p1 = p2:

    [ A11 A12 A13 ] [ u1 ]   [ u2 ]
    [ A21 A22 A23 ] [ v1 ] = [ v2 ]
    [  0   0   1  ] [  1 ]   [  1 ]
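The RANSAC estimation above can be sketched as follows; the minimal sample of 3 point pairs (the minimum for an affine model), the iteration count, and the inlier threshold are illustrative, and the closing least-squares refit stands in for the talk's non-linear refinement.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2x3 affine A with dst ≈ A @ [src; 1]."""
    n = len(src)
    M = np.hstack([src, np.ones((n, 1))])        # n x 3 design matrix
    X, *_ = np.linalg.lstsq(M, dst, rcond=None)  # 3 x 2 solution
    return X.T                                   # 2 x 3 affine matrix

def ransac_affine(src, dst, iters=200, thresh=2.0, seed=0):
    """Keep the affine model with the largest inlier set, then refit."""
    rng = np.random.default_rng(seed)
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    n = len(src)
    best_inliers = None
    for _ in range(iters):
        idx = rng.choice(n, 3, replace=False)    # minimal sample: 3 pairs
        A = fit_affine(src[idx], dst[idx])
        pred = src @ A[:, :2].T + A[:, 2]
        err = np.linalg.norm(pred - dst, axis=1)
        inliers = err < thresh
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # least-squares refit over the inliers (refinement stand-in)
    return fit_affine(src[best_inliers], dst[best_inliers]), best_inliers
```

Uncontaminated minimal samples dominate after a few hundred iterations, so gross mismatches are rejected before the refit.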

Motion Segmentation
Two-frame pixel-level segmentation?
Segmentation within a temporal window
Accumulate the pixels warped from adjacent frames
K-Means to find the most representative pixel (background model I_model)
Frame differencing and thresholding: |I_original − I_model| > T

(Figure: frames t−w … t … t+w; t: reference frame, w: half size of the temporal window)
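The temporal-window idea can be sketched per pixel on already-registered frames; the per-pixel median below is a cheap stand-in for the K-Means "most representative pixel" step on the slide, and the function name and threshold are illustrative.

```python
import numpy as np

def motion_mask(frames, ref_idx, thresh=0.1):
    """Temporal-window motion segmentation on registered gray frames.

    frames: list of aligned 2D arrays covering [t-w, t+w].
    A pixel is flagged as moving when the reference frame differs from
    the per-pixel background model by more than the threshold.
    """
    stack = np.stack(frames)              # (2w+1, H, W)
    model = np.median(stack, axis=0)      # background model per pixel
    diff = np.abs(frames[ref_idx] - model)
    return diff > thresh                  # boolean motion mask
```

A moving object covers each pixel in only a few frames of the window, so the median (or K-Means centroid) recovers the background value there.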

Experimental Results (1)

Original
Images

Motion
Prob.
Maps

Initial
Detection
Results

Tracking
Results


Experimental Results (2)

Original
images

Motion
Prob.
Maps

Initial
Detection
Results

Tracking
Results

Experimental Results (3)

A synthesized video without motion regions

Outline
✓ Introduction
✓ 2D Motion segmentation
✓ Tracking of multiple moving objects
Geo-registration and geo-tracking
Summary and Discussion

Problem statement - multiple target tracking


Input: foreground regions in each frame
Output: trajectories with consistent track IDs
Challenges:
Noisy foreground regions
Occlusions

Problematic underlying assumption

One-to-one assumption
One target can correspond to at most one observation
One observation can be associated with at most one target
Appropriate for punctual (point) observations

The underlying one-to-one assumption may not hold for visual tracking

(Sensor examples: radar, UAV camera, stationary camera)

Related work

Under the one-to-one assumption:
(Pasula et al., 99) Gibbs sampling to compute joint DA — MMSE; MCMC outperforms JPDAF; one-scan/multi-scan
(Cong et al., 04) Approximate association probabilities in JPDAF — MMSE; single scan; similar to JPDAF
(Sastry et al., 04) MCMC to compute joint DA with an unknown number of targets — MAP; multi-scan; outperforms MHT; considers temporal association only
(F. Dellaert et al., 03) MCMC applied to SfM without correspondence — MAP; multi-scan; uniform prior (no missing or false detections)

Our method — overcomes the one-to-one assumption:
MAP; multi-scan; considers both spatial and temporal association

Anatomy of the problem

Explaining foreground regions:
Hard in a single frame without any model information
Solvable if smoothness in motion and appearance is exploited

Explanation of foreground regions

Two ways of explaining foreground regions:

Precisely — labeling of foreground regions
The label(s) of a pixel indicate the track ID
Each pixel can have multiple labels to represent occlusions
Accurate but expensive!

Approximately — cover of foreground regions
A set of shapes (rectangles)
Each rectangle can overlap with others to represent occlusions
Approximate but efficient!

Our formulation
Given
A set of noisy observations Y (foreground regions)
Find
A cover ω of the foreground regions over time
Each track τk is a sequence of shapes (rectangles)

Solution space
The solution space Ω is the collection of spatio-temporal covers of the observations Y
Joint association event: ω = {τ1, τ2, …, τK}

Two kinds of data association


Spatial data association - change the cover at one instant
Temporal data association - form consistent tracks

Uncovered area belongs to false alarms

(a) Observations Y

(b) One possible cover of Y

Bayesian formulation
MAP estimate

ω* = arg max_ω p(ω | Y),  with p(ω | Y) ∝ p(Y | ω) p(ω)

Prior model p(ω)
Prefer a small number of long tracks
A track should overlap other tracks as little as possible, unless necessary

p(ω) = p(L) p(K) p(O)

Likelihood p(Y | ω)
Smoothness in both motion and appearance
Area of uncovered false alarms, p(F)

p(Y | ω) = p(F) ∏_{k=1..K} ∏_{i=1..|τk|−1} L(τk(t_{i+1}) | τk(t_i))

where L combines the motion likelihood and the appearance likelihood.

Motion and appearance likelihood

Motion

x^k_{t+1} = A_k x^k_t + w,  w ~ N(0, Q)
y^k_t = H x^k_t + v,        v ~ N(0, R)

L_M(τk(t_{i+1}) | τk(t_i)) ∝ p(τk(t_{i+1}) | τk(t_i))

Appearance

L_A(τk(t_{i+1}) | τk(t_i)) = (1/z3) exp(−λ3 D(τk(t_i), τk(t_{i+1})))

D(τk(t_i), τk(t_{i+1})): Kullback-Leibler (KL) distance between two RGB color histograms
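A minimal sketch of the appearance likelihood L_A, assuming a joint RGB histogram and a symmetrized KL distance; the bin count, the λ weight, and the dropped normalizer z are illustrative stand-ins for the talk's z3 and λ3.

```python
import numpy as np

def color_hist(patch, bins=8):
    """Joint RGB histogram of an HxWx3 uint8 patch, normalized to sum 1."""
    idx = (patch // (256 // bins)).reshape(-1, 3)
    flat = idx[:, 0] * bins * bins + idx[:, 1] * bins + idx[:, 2]
    h = np.bincount(flat, minlength=bins ** 3).astype(float)
    return h / h.sum()

def kl_distance(p, q, eps=1e-8):
    """Symmetrized Kullback-Leibler distance between two histograms."""
    p, q = p + eps, q + eps          # avoid log(0) on empty bins
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))

def appearance_likelihood(patch_a, patch_b, lam=1.0):
    """L_A ∝ exp(-lam * D(hist_a, hist_b)); the normalizer z is omitted."""
    d = kl_distance(color_hist(patch_a), color_hist(patch_b))
    return float(np.exp(-lam * d))
```

Identical rectangles score near 1; histograms concentrated in different bins drive the likelihood toward 0.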

MAP of full posterior p(ω | Y)

MAP estimation for such a posterior is non-trivial
Even determining the parameters of the posterior is not easy

p(ω | Y) ∝ exp{ C0·S_len − C1·K − C2·F − C3·S_olp − C4·S_app − S_mot }

MAP is equivalent to minimizing an energy function.

Solution to MAP:
Sampling-based method to avoid enumerating all possible solutions
Two types of proposal moves (temporal and spatial)
Symmetric temporal information

Markov Chain Monte Carlo

Basic idea: construct a Markov chain that converges to the target distribution
The state of the Markov chain is defined in the solution space Ω
Transitions of the Markov chain are guided by a proposal distribution

Metropolis-Hastings algorithm
Propose a new state ω' from the previous state ω(i):

ω' ~ q(ω' | ω(i))

Accept ω' with probability

min{ 1, [p(ω') q(ω(i) | ω')] / [p(ω(i)) q(ω' | ω(i))] }

Properties
No need to compute the normalizing constant of p(ω); only the ratio p(ω')/p(ω(i)) is required
For MAP, no need to keep the whole chain; only the current state and the best one

Metropolis-Hastings algorithm
1. Initialize ω(0).
2. For i = 0 to N−1        (N is the length of the Markov chain)
   - Sample u ~ U[0,1]
   - Propose ω' ~ q(ω' | ω(i))        (q(·) is the proposal distribution)
   - Compute A(ω(i), ω') = min{ 1, [p(ω') q(ω(i) | ω')] / [p(ω(i)) q(ω' | ω(i))] }
   - If u < A(ω(i), ω') then ω(i+1) = ω'
     else ω(i+1) = ω(i)
   Endfor

The chain {ω(0), …, ω(N)} converges to samples from p(ω) as N → ∞
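The loop above can be sketched generically, including the "keep only the current and best states" property; the toy integer target, the random-walk proposal, and all names below are illustrative, not from the talk.

```python
import math
import random

def metropolis_hastings(p, propose, x0, n_iter, seed=0):
    """Generic Metropolis-Hastings keeping only the current and best states.

    p(x) is an unnormalized target; propose(x, rng) returns
    (x_new, q(x_new|x), q(x|x_new)). Only the ratio p(x')/p(x) is needed,
    matching the 'no normalizing constant' property on the slide.
    """
    rng = random.Random(seed)
    x, best = x0, x0
    for _ in range(n_iter):
        x_new, q_fwd, q_bwd = propose(x, rng)
        a = min(1.0, (p(x_new) * q_bwd) / (p(x) * q_fwd))
        if rng.random() < a:
            x = x_new          # accept the proposed state
        if p(x) > p(best):
            best = x           # track the MAP-like best state
    return best

# Toy target over the integers 0..10, peaked at 7
def target(x):
    return math.exp(-0.5 * (x - 7) ** 2) if 0 <= x <= 10 else 0.0

def walk(x, rng):
    # Symmetric random-walk proposal: q cancels in the acceptance ratio
    return x + rng.choice([-1, 1]), 0.5, 0.5
```

In the talk's setting, the state is a cover ω and `propose` draws one of the temporal or spatial moves described on the following slides.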

Two types of proposal moves q(ω' | ω)

Temporal and spatial moves drive the Markov chain
Data-driven proposal: q(ω' | ω) ≈ q(ω' | ω, D)

Temporal moves
Birth/Death
Extension/Reduction
Split/Merge
Switch

Symmetric temporal information
Forward and backward (e.g. extension)
Deals with occlusions at the very beginning

Spatial moves (made only after enough temporal information is collected)
Segmentation/Aggregation
Diffusion

MCMC Data Association

1. Initialize ω(0).
2. For i = 0 to N−1
   - Sample u ~ U[0,1]
   - If i < Nt (first Nt iterations, temporal moves only): propose ω' ~ q_Temporal(ω' | ω(i))
     else: propose ω' ~ q_All(ω' | ω(i))
   - Compute A(ω(i), ω') = min{ 1, [p(ω') q(ω(i) | ω')] / [p(ω(i)) q(ω' | ω(i))] }
   - If u < A(ω(i), ω') then ω(i+1) = ω'
     else ω(i+1) = ω(i)
   Endfor

Determining Parameters
Determine the parameters in the full posterior
A casual (hand-picked) setting can make the ground truth p(ω_gt | Y) much lower than the computed solution
Take advantage of a property of MCMC: only posterior ratios matter

p(ω | Y) ∝ exp{ C0·S_len − C1·K − C2·F − C3·S_olp − C4·S_app − S_mot }

Requiring the ground truth not to degenerate, p(ω_gt) ≥ p(ω') for perturbed solutions ω', yields linear constraints:

A·[C0, C1, C2, C3, C4]ᵀ ≥ b,  C0, …, C4 ≥ 0
maximize (C0 + C1 + C2 + C3 + C4)

Linear Programming solves it (GNU Linear Programming Kit)
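The constraint construction can be sketched with an off-the-shelf LP solver; the feature encoding (signs folded into the score vectors), the box bound of 1 on each coefficient, and the use of SciPy's `linprog` instead of GLPK are all assumptions for illustration.

```python
import numpy as np
from scipy.optimize import linprog

def fit_posterior_weights(feat_gt, feats_other):
    """Fit non-negative weights C so the ground truth beats competitors.

    feat_gt, feats_other: score vectors such that C . feat is the
    log-posterior (up to constant terms). Requiring
    C . feat_gt >= C . feat' for every competitor gives linear
    constraints; we maximize sum(C) with C in [0, 1] (the box bound is
    an illustrative normalization, not from the talk).
    """
    # Each row encodes C . (feat' - feat_gt) <= 0
    A_ub = np.array([np.asarray(f) - np.asarray(feat_gt) for f in feats_other])
    b_ub = np.zeros(len(feats_other))
    res = linprog(c=-np.ones(len(feat_gt)),   # maximize sum(C)
                  A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0, 1)] * len(feat_gt), method="highs")
    return res.x
```

Each perturbed solution ω' contributes one inequality, so the LP grows linearly with the number of perturbations sampled around the ground truth.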

Simulation experiments
Settings

K (unknown number) moving discs in 200x200


Independent color appearance and motion
Static occlusion and inter-occlusion
False alarms

Original video

Tracking result

Simulation experiments
Quantitative comparison
MHT (I. Cox '94), JPDAF (J. Kang '03), temporal moves only
STDA score from the VACE-II evaluation
Same motion and appearance likelihoods for all methods
Averages over multiple sequences and multiple runs

(Plots: FA=0, W=50, 10K MCMC iterations; K=5, W=50, 10K MCMC iterations)

Simulation experiments
Online implementation
Sliding window W
Initialize ω_t with ω*_{t−1}

Online vs. offline comparison, T = 1000

Real Scenarios

Experiments

CLEAR 320x240

Vivid-II 320x240

Experiments
Can handle occlusion at the beginning by using symmetric
temporal information

Outline
✓ Introduction
✓ 2D Motion segmentation
✓ Tracking of multiple moving objects
✓ Geo-registration and geo-tracking
Summary and Discussion

Geo-registration

Use a 2D homography to compensate inter-frame (2-view) motion
Refine the homography between the map and the images:

H_{i+1,M} = H_update · H_{i,M} · (H_{i,i+1})⁻¹
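The propagation rule can be sketched directly; the point convention (p_{i+1} = H_{i,i+1} · p_i, p_M = H_{i,M} · p_i) and the function name are assumptions, and H_update defaults to the identity when no geo-refinement is available.

```python
import numpy as np

def update_map_homography(H_i_M, H_i_ip1, H_update=None):
    """Propagate the image-to-map homography to the next frame.

    With p_{i+1} = H_i_ip1 @ p_i and p_M = H_i_M @ p_i, the new frame
    maps to the map via H_{i+1,M} = H_update @ H_i_M @ inv(H_i_ip1);
    H_update is the optional geo-refinement (identity if absent).
    """
    if H_update is None:
        H_update = np.eye(3)
    H = H_update @ H_i_M @ np.linalg.inv(H_i_ip1)
    return H / H[2, 2]   # normalize the homogeneous scale
```

Chaining inter-frame homographies accumulates drift, which is why the talk re-runs the map refinement (H_update) every 50 frames.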

Geo-registration results

Geo-mosaicing 2000 frames on top of the reference frame.

Experimental results
Results are shown on two UAV data sets
Map is acquired from Google Earth
Geo-registration is performed every 50 frames
Local data association (MCMCDA) window 50 frames

Geo-registration

Without geo-refinement

With geo-refinement

Experimental results

Experimental results

System implementation
C++ implementation
Xeon Dual Core P4 3.0 GHz
Preliminary time performance:

Procedure                                  | Time (seconds) on 320x240
Image registration                         | ~0.25
Motion detection (moving cameras)          | ~2 / ~0.1 (CPU / GPU)
Object detection after motion segmentation | ~0.25
Geo-registration                           | ~6 every 50 frames
Tracking                                   | ~0.4
Total                                      | ~1 (GPU)


Outline
✓ Introduction
✓ 2D Motion segmentation
✓ Tracking of multiple moving objects
✓ Geo-registration and geo-tracking
✓ Summary and Discussion

Summary & Discussion

Detection and tracking in dynamic scenes
Moving camera + rigidly moving objects
2D motion segmentation and geometric analysis of the background
Spatial and temporal (2D+t) data association of moving objects
Tracking with geo-registration

Highlights
Solutions to practical problems in the detection and tracking area
Encouraging results and extensive applications

Future directions
Multi-view geometry + object recognition
Automatic determination of applicable tasks

References

Qian Yu and Gérard Medioni, "A GPU-based Implementation of Motion Detection from a Moving Platform," IEEE Workshop on Computer Vision on GPU (in conjunction with CVPR'08), 2008.

Qian Yu and Gérard Medioni, "Integrated Detection and Tracking for Multiple Moving Objects using Data-Driven MCMC Data Association," IEEE Workshop on Motion and Video Computing (WMVC'08), 2008.

Qian Yu, Gérard Medioni, and Isaac Cohen, "Multiple Target Tracking Using Spatio-Temporal Markov Chain Monte Carlo Data Association," IEEE Conference on Computer Vision and Pattern Recognition (CVPR'07), 2007, pp. 1-8.

Qian Yu and Gérard Medioni, "Map-Enhanced Detection and Tracking from a Moving Platform with Local and Global Data Association," IEEE Workshop on Motion and Video Computing (WMVC'07), 2007.

Yuping Lin, Qian Yu, and Gérard Medioni, "Map-Enhanced UAV Image Sequence Registration," IEEE Workshop on Applications of Computer Vision (WACV'07), 2007.

Qian Yu, Isaac Cohen, Gérard Medioni, and Bo Wu, "Boosted Markov Chain Monte Carlo Data Association for Multiple Target Detection and Tracking," Proceedings of the 18th International Conference on Pattern Recognition (ICPR'06), Vol. 2, pp. 675-678.

Q&A

Thank you!
