$\hat{P}_k$ given the state $x_{k-1}$ and $P_{k-1}$ in the previous time step $k-1$. The measurement update equations are responsible for the correction of the estimated state and covariance according to the observed data. The Kalman gain $K$ is computed according to the covariance matrix $S$ of the innovation vector $\tilde{z}_k$; the innovation vector $\tilde{z}_k$ is defined as the difference between the real and the estimated measurement. Both $K$ and $\tilde{z}_k$ are used to compute the corrected state vector $x_k$ and its covariance $P_k$, which are used as the input for the next iteration. The Kalman Filter cycle is presented in Fig. 3.10.
Prediction:
$$\hat{x}_k = F x_{k-1} \qquad (3.7)$$

$$\hat{P}_k = F P_{k-1} F^T + Q \qquad (3.8)$$
where $\hat{x}_k$ and $\hat{P}_k$ are the state and covariance predictions at time $k$.
$$\hat{z}_k = H \hat{x}_k \qquad (3.9)$$

where $\hat{z}_k$ is the predicted observation at time $k$.
Update:
$$x_k = \hat{x}_k + K_k \tilde{z}_k \qquad (3.10)$$

$$P_k = (I - K_k H)\,\hat{P}_k \qquad (3.11)$$

where the filter gain matrix $K_k$ is

$$K_k = \hat{P}_k H^T S_k^{-1} \qquad (3.12)$$

and the innovation vector $\tilde{z}_k$

$$\tilde{z}_k = z_k - \hat{z}_k \qquad (3.13)$$

has the covariance matrix $S_k$

$$S_k = H \hat{P}_k H^T + R \qquad (3.14)$$
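To make the prediction and update steps concrete, the following is a minimal Python sketch of one Kalman Filter cycle implementing Eqs. (3.7)-(3.14); the function and variable names are illustrative and do not come from the thesis implementation.

```python
import numpy as np

def kalman_step(x, P, z, F, H, Q, R):
    """One Kalman Filter cycle: prediction (Eqs. 3.7-3.8), then update (Eqs. 3.9-3.14)."""
    # Prediction of state and covariance
    x_pred = F @ x                               # Eq. (3.7)
    P_pred = F @ P @ F.T + Q                     # Eq. (3.8)
    # Predicted observation, innovation and its covariance
    z_pred = H @ x_pred                          # Eq. (3.9)
    z_tilde = z - z_pred                         # Eq. (3.13)
    S = H @ P_pred @ H.T + R                     # Eq. (3.14)
    # Kalman gain and correction
    K = P_pred @ H.T @ np.linalg.inv(S)          # Eq. (3.12)
    x_new = x_pred + K @ z_tilde                 # Eq. (3.10)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred    # Eq. (3.11)
    return x_new, P_new
```

The corrected $x_k$ and $P_k$ returned here are exactly the quantities fed back as the input to the next iteration of the cycle.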
3.3 3D Tracking: Projective Kalman Filter
3.3.1 Projective Geometry and Camera Model
We use a simple pinhole camera model with known parameters [12]. The pinhole camera model is illustrated in Fig. 3.11.
Let $Q$ be a point of the Euclidean 3D space and $q$ its projection; both $Q$ and $q$ are represented by homogeneous coordinates. Then the transformation from $Q$ to $q$ can be described using the camera matrix $M$:

$$q = MQ, \qquad M = K \begin{bmatrix} R & t \\ 0_{1\times3} & 1 \end{bmatrix} \qquad (3.15)$$
Figure 3.11: Pinhole camera model

where $R$ and $t$ are the rotation and translation, which define the extrinsic parameters of the camera, and $K$ contains the intrinsic parameters of the camera:
$$K = \begin{bmatrix} \alpha_x & 0 & x_0 & 0 \\ 0 & \alpha_y & y_0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \qquad (3.16)$$
where $(x_0, y_0)$ are the coordinates of the principal point of the image plane; $\alpha_x = f m_x$ and $\alpha_y = f m_y$ represent the focal length $f$ of the camera in terms of pixel dimensions in the $x$ and $y$ directions, with scale factors $m_x$ and $m_y$ respectively.
It is important to mention that the projection from 3D space to the image plane is actually defined only up to a scale $s$: $q = [x\ y\ s]^T$; hence, normalized coordinates $\bar{q} = [\bar{x}\ \bar{y}\ 1]^T$ can be defined:

$$\bar{q} = \frac{q}{s} \qquad (3.17)$$
Therefore, in order to model the projection from 3D space to normalized coordinates in the image plane, it is necessary to take into account the non-linearity of such a transformation.
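As a worked example of Eqs. (3.15)-(3.17), the sketch below builds a camera matrix from assumed intrinsic and extrinsic parameters and projects a homogeneous 3D point to normalized image coordinates; all numeric values are placeholders rather than calibration results from this thesis.

```python
import numpy as np

# Assumed intrinsics (Eq. 3.16): alpha_x = f*m_x, alpha_y = f*m_y, principal point (x0, y0)
alpha_x, alpha_y, x0, y0 = 800.0, 800.0, 320.0, 240.0
K = np.array([[alpha_x, 0.0, x0, 0.0],
              [0.0, alpha_y, y0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])

# Assumed extrinsics: identity rotation and a small translation
R, t = np.eye(3), np.array([[0.1], [0.0], [0.0]])
Rt = np.vstack([np.hstack([R, t]), [0.0, 0.0, 0.0, 1.0]])

M = K @ Rt                           # camera matrix (Eq. 3.15)

Q = np.array([0.5, 0.2, 2.0, 1.0])   # homogeneous 3D point
q = M @ Q                            # projection q = [x, y, s], defined up to scale s
q_bar = q / q[2]                     # normalized coordinates (Eq. 3.17)
print(q_bar)                         # [x_bar, y_bar, 1]
```

The division by the third coordinate is the non-linearity referred to above: it depends on the depth of the point, so the mapping from 3D space to normalized coordinates is not linear in $Q$.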
3.3.2 Projective Kalman Filter
In the Projective Kalman Filter, designed by [7], the state $x_k$ from Eq. (3.3) is defined as the position and velocity that describe the dynamics of the tracked 3D location in homogeneous coordinates:
$$x_k = \begin{bmatrix} X & Y & Z & 1 & \dot{X} & \dot{Y} & \dot{Z} & 0 \end{bmatrix}^T \qquad (3.18)$$
The measurement process described by Eq. (3.4) must be modelled according to the projective nature of the observations. The data captured by the $N$ cameras form the observation vector $z_k$:

$$z_k = \begin{bmatrix} z^1 & \dots & z^N \end{bmatrix}^T = \begin{bmatrix} \bar{x}_1 & \bar{y}_1 & 1 & \dots & \bar{x}_N & \bar{y}_N & 1 \end{bmatrix}^T \qquad (3.19)$$
The key point of the projective Kalman filter is the definition of the adaptive matrix $H_k$ that maps the state $x_k$ to the measurement $z_k$ according to Eq. (3.4). Canton-Ferrer [7] proposed to define $H_k$ as:
$$H_k = \begin{bmatrix} \beta^1_k & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \beta^N_k \end{bmatrix} \begin{bmatrix} M^1 & 0_{3\times4} \\ \vdots & \vdots \\ M^N & 0_{3\times4} \end{bmatrix} \qquad (3.20)$$
with $\beta^n_k$ modelling the non-linearity of the projection from 3D coordinates to normalized coordinates in the image plane:

$$\beta^n_k = \frac{1}{M^3_n \hat{x}_k}\, I_{3\times3} \qquad (3.21)$$
where $M_n$ is the transformation matrix of camera $n$, $M^3_n$ is the third row of $M_n$, and $\hat{x}_k$ is the predicted state.
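A sketch of how the adaptive measurement matrix of Eqs. (3.20)-(3.21) could be assembled for $N$ cameras is given below; the inputs M_list (the 3x4 camera matrices) and x_pred (the predicted 8-dimensional state) are hypothetical, and $\beta^n_k$ is computed here from the homogeneous position part of the state, which is what makes the product with the row $M^3_n$ well-defined.

```python
import numpy as np

def projective_H(M_list, x_pred):
    """Assemble the adaptive H_k of Eq. (3.20) for N cameras.

    M_list: list of 3x4 camera matrices M^n.
    x_pred: predicted state [X, Y, Z, 1, Xdot, Ydot, Zdot, 0].
    """
    pos = x_pred[:4]                   # homogeneous position part of the state
    blocks = []
    for M in M_list:
        beta = 1.0 / (M[2, :] @ pos)   # Eq. (3.21): 1 / (M^3_n x_k), scaling I_3x3
        blocks.append(beta * np.hstack([M, np.zeros((3, 4))]))  # beta^n_k [M^n 0_3x4]
    return np.vstack(blocks)           # (3N x 8) measurement matrix
```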
Incorrect measurements caused by occlusions can produce large jumps in the 3D reconstruction. In order to ensure a smooth trajectory in 3D space, a special approach for defining the covariance matrix of the measurement noise $R$ is used [7]. The covariance matrix $R$ is defined as:
$$R_k = \begin{bmatrix} R_{k1} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & R_{kN} \end{bmatrix} \qquad (3.22)$$
where $R_{kn}$ is the noise covariance matrix of view $n$ at time $k$. Whenever an occlusion is detected in view $n$ (one measurement can be assigned to several targets, as discussed in Section 3.4), $R_{kn}$ is defined as:
$$R_{kn} = \begin{bmatrix} \infty & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \infty \end{bmatrix} \qquad (3.23)$$
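In an implementation, Eq. (3.23) can be realized by inflating the diagonal of the occluded view's noise block with a large finite constant standing in for infinity, so that the Kalman gain for that view tends to zero and its measurement is effectively ignored; a sketch under that assumption:

```python
import numpy as np

def measurement_noise(R_views, occluded, big=1e12):
    """Block-diagonal R_k of Eq. (3.22); views listed in `occluded` receive the
    'infinite' covariance of Eq. (3.23)."""
    blocks = [np.eye(R.shape[0]) * big if n in occluded else R
              for n, R in enumerate(R_views)]
    dim = sum(b.shape[0] for b in blocks)
    R_k = np.zeros((dim, dim))
    i = 0
    for b in blocks:
        j = i + b.shape[0]
        R_k[i:j, i:j] = b   # place each view's block on the diagonal
        i = j
    return R_k
```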
3.4 Multitarget Tracking: Joint Probabilistic Data Association Filter
3.4.1 Problem Formulation
In this section the problem of assigning each target state to a measurement is discussed. Data association is the key stage of the widely used JPDAF algorithm introduced by Fortmann [11].

The Kalman Filter, as well as the Projective Kalman Filter described above, assumes that the correspondence between skin-colored areas and objects is one-to-one. However, as was shown above, this is not always the case. Sometimes several targets produce the same measurement, and sometimes one target splits into several areas. The JPDAF can handle all these cases and properly deal with data association.

The data association problem can be formulated as follows [11] (to simplify the notation, we suppress the time subscript $k$ from all variables unless it is required for clarity).
Let $m$ and $T$ be the numbers of measurements and targets. At each time step the sensor provides a set of candidate measurements to be associated with targets (or rejected). This is done by forming a validation gate around the predicted measurement $\hat{z}^t$ of each target $t$, $t = 1, \dots, T$, and retaining only those detections $z_j$, $j = 1, \dots, m$, that lie within the gate. The gate is normally represented by a $g$-sigma ellipsoid:
$$(\tilde{z}^t_j)^T S^{-1}_t\, \tilde{z}^t_j \leq g^2_t \qquad (3.24)$$
where $\tilde{z}^t_j$ is the innovation corresponding to measurement $j$ and predicted measurement $\hat{z}^t$:

$$\tilde{z}^t_j = z_j - \hat{z}^t \qquad (3.25)$$
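A minimal sketch of the gating test of Eqs. (3.24)-(3.25): a detection is retained for a target only if the Mahalanobis distance of its innovation lies inside the g-sigma ellipsoid. The function and argument names are illustrative.

```python
import numpy as np

def gate(detections, z_pred, S, g):
    """Return indices of detections inside the validation gate (Eq. 3.24)."""
    S_inv = np.linalg.inv(S)
    kept = []
    for j, z in enumerate(detections):
        z_tilde = z - z_pred            # innovation (Eq. 3.25)
        d2 = z_tilde @ S_inv @ z_tilde  # squared Mahalanobis distance
        if d2 <= g ** 2:
            kept.append(j)
    return kept
```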
The conditional mean estimate $x^t$ is obtained from Eq. (3.10) by using the combined (weighted) innovation:

$$\tilde{z}^t = \sum_{j=1}^{m} \beta^t_j\, \tilde{z}^t_j \qquad (3.26)$$
where $\beta^t_j$ is the probability that the $j$th measurement (or no measurement, for $j = 0$) is the correct one.
The update of the error covariance matrix $P^t_k$ of Eq. (3.11) for target $t$ becomes [11]:

$$P^t_k = \hat{P}^t - (1-\beta^t_0)\, K^t S^t (K^t)^T + K^t \left[\sum_{j=1}^{m} \beta^t_j\, \tilde{z}^t_j (\tilde{z}^t_j)^T - \tilde{z}^t (\tilde{z}^t)^T\right] (K^t)^T \qquad (3.27)$$

where $\beta^t_0$ is the probability that none of the measurements originated from target $t$.
The key stage of the JPDAF algorithm is the calculation of the weights $\beta^t_j$ taking multiple targets into account.
3.4.2 Association Probabilities Estimation
Computation of the $\beta^t_j$ begins with the construction of a so-called validation matrix $\Omega$ for the $T$ targets and $m$ measurements. The validation matrix $\Omega$ is an $m \times (T+1)$ rectangular matrix defined as [11]:
$$\Omega = [\omega_{jt}] = \begin{bmatrix} 1 & \omega_{11} & \omega_{12} & \cdots & \omega_{1T} \\ 1 & \omega_{21} & \omega_{22} & \cdots & \omega_{2T} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & \omega_{m1} & \omega_{m2} & \cdots & \omega_{mT} \end{bmatrix} \qquad (3.28)$$

with rows indexed by the measurements $j = 1, \dots, m$ and columns by $t = 0, 1, 2, \dots, T$.
where $\omega_{j0} = 1$ shows that measurement $j$ could originate from clutter, $\omega_{jt} = 1$ if measurement $j$ is inside the validation gate of target $t$, and $\omega_{jt} = 0$ if measurement $j$ is outside the validation gate of target $t$, for $j = 1, 2, \dots, m$ and $t = 1, 2, \dots, T$. Based on the validation matrix, data association hypotheses (feasible events $\theta$) are generated subject to two restrictions:

- each measurement can have only one origin (either a specific target or clutter);
- no more than one measurement originates from a target.
Each feasible event $\theta$ is represented by a hypothesis matrix $\hat{\Omega}(\theta)$ of the same size as $\Omega$. A typical element of $\hat{\Omega}(\theta)$ is denoted by $\hat{\omega}_{jt}(\theta)$, where $\hat{\omega}_{jt}(\theta) = 1$ only if measurement $j$ is assumed to be associated with clutter ($t = 0$) or with target $t$ ($t > 0$).
Consider an example with two targets and three measurements, presented in Fig. 3.12. In this case the validation matrix $\Omega$ and the hypothesis matrices $\hat{\Omega}(\theta)$ are determined as:
Figure 3.12: Example with two targets and three measurements

$$\Omega = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 1 \\ 1 & 0 & 1 \end{bmatrix}$$

$$\hat{\Omega}(\theta_0) = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \end{bmatrix} \quad \hat{\Omega}(\theta_1) = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \end{bmatrix} \quad \hat{\Omega}(\theta_2) = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix} \quad \hat{\Omega}(\theta_3) = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

$$\hat{\Omega}(\theta_4) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix} \quad \hat{\Omega}(\theta_5) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \quad \hat{\Omega}(\theta_6) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix} \quad \hat{\Omega}(\theta_7) = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
After each $\hat{\Omega}(\theta)$ is obtained, the measurement association indicator $\tau_j(\theta)$, the target detection indicator $\delta_t(\theta)$, and the total number of false measurements $\phi(\theta)$ in event $\theta$ are computed:

- the measurement association indicator shows whether measurement $j$ is associated with any target in event $\theta$:

$$\tau_j(\theta) = \sum_{t=1}^{T} \hat{\omega}_{jt}(\theta); \qquad (3.29)$$

- the target detection indicator shows whether target $t$ is associated with any measurement in event $\theta$:

$$\delta_t(\theta) = \sum_{j=1}^{m} \hat{\omega}_{jt}(\theta); \qquad (3.30)$$

- the total number of false measurements in event $\theta$ is defined as:

$$\phi(\theta) = \sum_{j=1}^{m} \left[1 - \tau_j(\theta)\right]. \qquad (3.31)$$
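Given a hypothesis matrix $\hat{\Omega}(\theta)$ stored as an array whose column 0 is the clutter column, the three quantities of Eqs. (3.29)-(3.31) reduce to row and column sums, as this sketch shows:

```python
import numpy as np

def event_indicators(omega_hat):
    """Indicators for one feasible event; column 0 of omega_hat is clutter."""
    tau = omega_hat[:, 1:].sum(axis=1)    # Eq. (3.29): measurement association indicator
    delta = omega_hat[:, 1:].sum(axis=0)  # Eq. (3.30): target detection indicator
    phi = int((1 - tau).sum())            # Eq. (3.31): number of false measurements
    return tau, delta, phi

# Event theta_2 from the two-target, three-measurement example above
theta_2 = np.array([[0, 1, 0],
                    [0, 0, 1],
                    [1, 0, 0]])
print(event_indicators(theta_2))          # tau=[1, 1, 0], delta=[1, 1], phi=1
```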
The conditional probability of the corresponding data association hypothesis, or feasible event $\theta$, given the observed data $Z^k$ at time $k$ is calculated by the formula:
$$P(\theta|Z^k) = \frac{C^{\phi(\theta)}}{c} \prod_{j:\tau_j=1} \frac{\exp\left\{-\frac{1}{2}(\tilde{z}^{t_j}_j)^T S^{-1}_{t_j}\, \tilde{z}^{t_j}_j\right\}}{(2\pi)^{m/2}\,|S_{t_j}|^{1/2}} \prod_{t:\delta_t=1} P^t_D \prod_{t:\delta_t=0} (1-P^t_D) \qquad (3.32)$$

where $P^t_D$ is the probability of detection of target $t$, $C$ is the probability of a false measurement, and $c$ is a normalization constant.
The probability $\beta^t_j$ that measurement $j$ belongs to target $t$ may now be obtained by summing over all feasible events $\theta$:

$$\beta^t_j = \sum_{\theta} P(\theta | Z^k)\, \hat{\omega}_{jt}(\theta) \qquad (3.33)$$
and the probability that none of the measurements originated from target $t$ is

$$\beta^t_0 = 1 - \sum_{j=1}^{m} \beta^t_j \qquad (3.34)$$
These probabilities are used to form the combined innovation for each target in Eq. (3.26) and in the covariance update of Eq. (3.27).
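To tie Eqs. (3.28)-(3.34) together, the sketch below enumerates the feasible events of a validation matrix by recursion and then accumulates the association probabilities; the per-event probabilities are assumed to be already computed (e.g., via Eq. (3.32)) and normalized, and are replaced here by a uniform placeholder.

```python
import numpy as np

def feasible_events(omega):
    """Enumerate feasible events for a validation matrix omega (m x (T+1)).

    Each event assigns measurement j to column 0 (clutter) or to a target,
    subject to the two restrictions stated above."""
    m, cols = omega.shape
    events = []

    def assign(j, used, current):
        if j == m:
            events.append(current)
            return
        for t in range(cols):
            if omega[j, t] and (t == 0 or t not in used):
                assign(j + 1, used | ({t} if t else set()), current + [t])

    assign(0, set(), [])
    return events

omega = np.array([[1, 1, 0],     # validation matrix of the example above
                  [1, 1, 1],
                  [1, 0, 1]])
events = feasible_events(omega)
print(len(events))               # 8 events, matching theta_0 ... theta_7

p = np.full(len(events), 1.0 / len(events))  # placeholder event probabilities
m, T = omega.shape[0], omega.shape[1] - 1
beta = np.zeros((m, T + 1))
for prob, ev in zip(p, events):              # Eq. (3.33): sum over feasible events
    for j, t in enumerate(ev):
        beta[j, t] += prob
beta_t0 = 1.0 - beta[:, 1:].sum(axis=0)      # Eq. (3.34): per-target miss probability
```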
3.5 3D Multitarget Tracking: Projective Joint Probabilistic Data Association Filter

Having analysed the Projective Kalman Filter (PKF) and the Joint Probabilistic Data Association Filter (JPDAF), it is necessary to design a clear scheme that combines both methods in one algorithm: the Projective Joint Probabilistic Data Association Filter (PJPDAF).

Let $T$ be the number of targets, $N$ the number of views from which we observe measurements, and $m_n$ the number of measurements in view $n$. The PJPDAF algorithm for time step $k$ and target $t$ is as follows:
1. Prediction: predict the 3D state $\hat{x}^t_k$ and the error covariance matrix $\hat{P}^t_k$ given the state and its covariance at step $k-1$.

2. Observation and data association:

- measurement prediction: project the estimated state $\hat{x}^t_k$ into every image plane $n$ as a measurement $\hat{z}^{tn}_k$;
- estimate the weighted innovation $\tilde{z}^{tn}_k$ for every image plane, taking into account the $m_n$ observations;
- combine the innovations from all the planes together into the innovation $\tilde{z}^t_k$.

3. Update: correct the state and its covariance.
Prediction: The prediction of the state $\hat{x}^t_k$ and its covariance $\hat{P}^t_k$ is done according to Eqs. (3.7, 3.8):

$$\hat{x}^t_{k|k-1} = F^t x^t_{k-1} \qquad (3.35)$$

$$\hat{P}^t_k = F^t P^t_{k-1} (F^t)^T + Q^t \qquad (3.36)$$
Observation and data association: The projection of the state to the measurement space is done using Eq. (3.9), $\hat{z}^{tn}_k = H^{tn}_k \hat{x}^t_k$, where $H^{tn}_k$ is computed according to Eq. (3.20) taking into account only one view $n$:

$$H^{tn}_k = \beta^n_k \begin{bmatrix} M^n & 0_{3\times4} \end{bmatrix} \qquad (3.37)$$
The covariance of the innovation $S^{tn}_k$ is also computed separately for each view $n$ according to Eq. (3.14).
The weighted innovation is estimated using the JPDAF approach and can therefore be determined as

$$\tilde{z}^{tn}_k = \sum_{j=1}^{m_n} \beta_{jn}\, \tilde{z}_{jn} \qquad (3.38)$$
where $\tilde{z}_{jn}$ is defined by Eq. (3.25) and $\beta_{jn}$ can be calculated from Eq. (3.33).
The probability $\beta^{tk}_{0n}$ that target $t$ was not detected in image plane $n$ is estimated from Eq. (3.34).
The next step is to prepare the data for the update. Considering the measurement model described in Eq. (3.19), the innovations from all the image planes are combined together:

$$\tilde{z}^t_k = \begin{bmatrix} \tilde{z}^{t1}_k \\ \vdots \\ \tilde{z}^{tN}_k \end{bmatrix} \qquad (3.39)$$
where the covariance of the innovation $S^t_k$ is computed according to Eq. (3.14) using $H^t_k$ defined as:

$$H^t_k = \begin{bmatrix} H^{t1}_k \\ \vdots \\ H^{tN}_k \end{bmatrix} \qquad (3.40)$$
Update: The update of the state and its covariance is based on Eqs. (3.10)-(3.12): the Kalman gain $K^t_k$ is obtained from Eq. (3.12) using the $H^t_k$ and $S^t_k$ defined above, and the state $x^t_k$ is corrected according to Eq. (3.10) using the combined innovation $\tilde{z}^t_k$ and the Kalman gain $K^t_k$. The error covariance matrix defined in Eq. (3.27) becomes:

$$P^t_k = \hat{P}^t_k - (1-\beta^{tk}_0)\, K^t_k S^t_k (K^t_k)^T + K^t_k \left[\Gamma^t_k - \Lambda^t_k\right] (K^t_k)^T \qquad (3.41)$$
where $\beta^{tk}_0$ is the probability that the target was not detected in any view:

$$\beta^{tk}_0 = \prod_{n=1}^{N} \beta^{tk}_{0n} \qquad (3.42)$$
$\Gamma^t_k$ is:

$$\Gamma^t_k = \begin{bmatrix} \Gamma^{t1}_k & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \Gamma^{tN}_k \end{bmatrix}, \qquad \Gamma^{tn}_k = \sum_{j=1}^{m_n} \beta_{jn}\, \tilde{z}_{jn} \tilde{z}_{jn}^T \qquad (3.43)$$
and $\Lambda^t_k$ is:

$$\Lambda^t_k = \begin{bmatrix} \tilde{z}^{t1}_k (\tilde{z}^{t1}_k)^T & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \tilde{z}^{tN}_k (\tilde{z}^{tN}_k)^T \end{bmatrix} \qquad (3.44)$$
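The per-view quantities of Eqs. (3.37)-(3.40) stack into a single update, as sketched below; the per-view innovations and measurement matrices are assumed to have been produced as described above, and the names are illustrative.

```python
import numpy as np

def combine_views(z_tilde_views, H_views, P_pred, R):
    """Stack per-view innovations and measurement matrices (Eqs. 3.39-3.40)
    and form the combined innovation covariance via Eq. (3.14)."""
    z_tilde = np.concatenate(z_tilde_views)  # Eq. (3.39)
    H = np.vstack(H_views)                   # Eq. (3.40)
    S = H @ P_pred @ H.T + R                 # Eq. (3.14) with the combined H
    return z_tilde, H, S
```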
3.6 Initialization of Trackers

In the previous sections we introduced the PJPDAF algorithm for 3D tracking of multiple objects; in this section we discuss how to make the initial guess about the 3D positions of our objects of interest (hands and head).

In order to estimate the initial 3D positions of the head and the left and right hands, we use the following assumptions (Fig. 3.13):

- the person is manipulating objects while sitting near the table;
- we have the results of temporal segmentation: we know when the action starts;
- when the action starts, the person's hands are lying on the table; hands and head are visible to both cameras.
Figure 3.13: Initial image frame

The first step of the initialization is to detect skin-colored areas which can possibly represent the hands and head. The largest ellipse-shaped detected area in the top part of the image is considered to be the head, and skin-colored elliptical areas in the middle left and right parts of the image are considered to be the left and right hands correspondingly. After the data association between objects and observations is done, the center of each detected skin area is estimated by the algorithm described in Section 3.1.
The second step of the initialization is to estimate the 3D locations given the known 2D positions of the hands and head in both images. The 3D location is computed using a triangulation algorithm. Since we need only an approximate estimate of the 3D location, the linear triangulation algorithm works well. The linear triangulation algorithm (Fig. 3.14) can be described as follows:
In each image we have a measurement: $q = MQ$ and $q' = M'Q$. These constraints can be combined into a linear system $AQ = 0$ with

$$A = \begin{bmatrix} x M^{3T} - M^{1T} \\ y M^{3T} - M^{2T} \\ x' M'^{3T} - M'^{1T} \\ y' M'^{3T} - M'^{2T} \end{bmatrix} \qquad (3.45)$$

where $M^{iT}$ denotes the $i$-th row of $M$.
Once $A$ is known, $Q$ can be estimated using eigen-analysis: $Q$ lies in the nullspace of $A$, which is obtained from the singular value decomposition $A = UDV^T$ ($Q$ is taken as the column of $V$ corresponding to the smallest singular value).
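A sketch of this linear triangulation step: build $A$ from the two camera matrices and the normalized image points, then take $Q$ as the right singular vector of $A$ associated with the smallest singular value. Variable names are illustrative.

```python
import numpy as np

def triangulate(M1, M2, q1, q2):
    """Linear triangulation (Eq. 3.45); q1, q2 are normalized points (x, y),
    M1, M2 the 3x4 camera matrices of the two views."""
    x1, y1 = q1
    x2, y2 = q2
    A = np.vstack([x1 * M1[2] - M1[0],
                   y1 * M1[2] - M1[1],
                   x2 * M2[2] - M2[0],
                   y2 * M2[2] - M2[1]])
    _, _, Vt = np.linalg.svd(A)   # A = U D V^T
    Q = Vt[-1]                    # nullspace direction: last right singular vector
    return Q / Q[3]               # de-homogenize to [X, Y, Z, 1]
```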
Figure 3.14: Triangulation
Chapter 4
Results
This chapter demonstrates the performance of the PJPDAF algorithm introduced in Section 3.5. Qualitative and quantitative characteristics of the algorithm are evaluated, and different image sequences illustrating tracking with the PJPDAF are provided.
4.1 System Setup
The system setup can be divided into a hardware and a software setup.

The hardware used in this system is a humanoid robot head with two foveal and two wide-field cameras; in our application we analyse images only from the wide-field cameras. In order to obtain the parameters of the cameras (intrinsic and extrinsic), the cameras are calibrated using the OpenCV calibration toolbox. Images are captured at a frame rate of 3 frames per second at a resolution of 640x480. The room should have normal lighting conditions and a static background during the tracking session. The person whose hands and head are tracked should wear clothing that leaves only the hands and head exposed.

The software includes a grabber of image sequences and a tracking system composed of two modules: the hands and head detection module (HHD module) and the PJPDAF module. Both modules are written in Matlab and work offline, since running speed is not the main point of this thesis. The HHD module also includes undistortion of the captured images, which is done using the Matlab Calibration Toolbox [4].

In our application a set of different parameters is used; their values were empirically estimated and are summarized in Table 4.1.
Table 4.1: Parameters and their description

Parameter | Value | Description | Module
$A_{min}$ | 500 | minimum size of a skin-color area to be considered as hand/head | HHD
$p$ | 5 | penalty coefficient for k-means clustering | HHD
$P^t_0$ | diag(70, 70, 70, 0, 30, 30, 30, 0) | covariance matrix of the initial state | PJPDAF
$Q^t$ | diag(50, 50, 50, 0, 5, 5, 5, 0) | covariance matrix of motion noise | PJPDAF
$R^t_n$ | diag(15, 15, 0.5) | covariance matrix of measurement noise | PJPDAF
$P^t_D$ | 0.9 | detection probability of a target | PJPDAF
$g^t_n$ | sqrt(15) | validation gate threshold | PJPDAF
$C$ | 0.1 | probability of a false measurement | PJPDAF

where $t = 1, \dots, T$ ($T$ is the number of targets) and $n = 1, \dots, N$ ($N$ is the number of views).
4.2 Experiment Results Analysis
In this section we analyse the results of tracking with the PJPDAF algorithm. We concentrate on the joint performance of the HHD and PJPDAF modules and do not evaluate the HHD module and the PJPDAF module separately.

The results of tracking are in 3D; therefore, in order to evaluate how good the tracker is, we would need to know the exact 3D positions of our objects of interest (head and hands in this application). However, due to limited hardware (we do not have any 3D sensors), all the evaluation of the PJPDAF is done in the image planes, by projecting the reconstructed 3D trajectory onto both the left and right images.
The accuracy of tracking target $t$ can be expressed by the error $Err^t_{acc}$:

$$Err^t_{acc} = \frac{1}{N} \sum_{n=1}^{N} \sqrt{(x^e_{nt} - x^g_{nt})^2 + (y^e_{nt} - y^g_{nt})^2} \qquad (4.1)$$
where $[x^g_{nt}, y^g_{nt}]$ is the ground-truth estimate of the location of target $t$ based on manual marking, and $[x^e_{nt}, y^e_{nt}]$ is the location of target $t$ obtained from the PJPDAF module.
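Reading Eq. (4.1) as the mean Euclidean pixel distance between the estimated and ground-truth positions over the $N$ frames, it amounts to:

```python
import numpy as np

def accuracy_error(est, gt):
    """Mean Euclidean distance of Eq. (4.1); est and gt are (N, 2) pixel arrays."""
    return np.mean(np.linalg.norm(est - gt, axis=1))
```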
It is worth noting that the manual estimation is also not perfect, especially considering that we do not know exactly which point to consider as the centre of the hand/head.
Selected frames from the sequence used for accuracy estimation are presented in Fig. 4.1. The 3D reconstruction of the target locations is shown in Fig. 4.2, and the accuracy error of every frame for each target is presented in Fig. 4.3. It can be observed that the error of head tracking is much more stable than the error of hand tracking. This can be explained by two factors: first, the head is usually lit in the same manner, while the illumination of the hands is not constant due to their motion and the non-uniformity of the light in the room; second, the hands move much more than the head, so the tracking error also changes according to the motion. Once again, this estimation of the error is not very accurate due to the manual marking, which can be done differently by different people.
Figure 4.1: Selected frames (1, 15, 50, 75, 115, 150) from PJPDAF tracking; red tracker: head, green tracker: right hand, blue tracker: left hand
Figure 4.2: 3D tracking by PJPDAF

Figure 4.3: Accuracy estimation (per-frame tracking error for the head, right hand, and left hand)

A more complicated example of tracking is shown in Fig. 4.5, where we can observe a boy drinking a coke. In frames 21-28 the hand is occluded by the head, and in the images both trackers do not look clean. However, as we can see in frames 30-49, the trackers are still properly associated with the targets. The 3D trajectory, presented in Fig. 4.4, is smooth and represents the action performed: pick up the coke, drink, and put the coke back on the table.
Figure 4.4: 3D trajectory of the action of drinking a coke; red tracker: head, green tracker: right hand, blue tracker: left hand

It is also important to analyse under which conditions the PJPDAF fails and loses a tracker (and cannot recover). From Fig. 4.7 we can observe how well the occlusion in frames 21-22 was handled: even though the two hands are together and form one skin-colored blob (Fig. 4.6), we still have two trackers. After that, in frame 62 the hands were partially occluded and, because of the fast motion, the right and left hands were confused. Later, in frames 82-99, the hands again lose their trackers and then recover. At the end of the sequence, due to the fast motion, both hand trackers were lost.

We found that fast motion is our main problem, which can be a result of the low-speed image capture (just 3 frames per second). Another problem is the confusion of the trackers between the hands. It can be explained by random hand motion and hand acceleration.
Figure 4.5: Selected frames (1, 21, 23, 27, 30, 31, 32, 36, 40, 49) from PJPDAF tracking: a boy drinking a coke; red tracker: head, green tracker: right hand, blue tracker: left hand
Figure 4.6: Segmentation of frame 21 (left and right images); both hands appear as one skin-color area
Figure 4.7: Selected frames (1, 21, 23, 62, 63, 64, 82, 84, 99, 121) from PJPDAF tracking; red tracker: head, green tracker: right hand, blue tracker: left hand
Chapter 5
Conclusions
Recognition of manipulation actions and manipulated objects is an important part of different applications in video surveillance, human-computer interfaces, programming by demonstration, and many others. Meanwhile, hands and head tracking plays a particular role in manipulation action and manipulated object recognition, since in most cases the trajectories of the hands and head give the main cue to such a system.

The thesis focused on two problems: object detection and object tracking. The detection module, based on background subtraction and color segmentation, showed good results in object localization. The disadvantage of this method is that a static and known environment is assumed. The tracking module was designed according to the introduced concept of the Projective Joint Probabilistic Data Association Filter, which is based on two well-known techniques: the Projective Kalman Filter and the Joint Probabilistic Data Association Filter. The PKF algorithm handles 3D tracking with 2D observations, and the JPDAF provides reliable data association algorithms for the mapping between measurements and targets.

The result analysis showed that the PJPDAF is able to track hands and head in 3D space and overcome the occlusion problem in many cases. The main problem of the PJPDAF is that the trackers can occasionally swap between the left and right hands. The main problem with the hands is that they can move unpredictably, sometimes following linear trajectories and then suddenly changing direction. In these conditions, distinguishing between the left and right hands becomes a real problem with a non-trivial solution. A motion cue can indicate the direction of hand movement and therefore reduce the ambiguity.
5.1 Future work
3D tracking of a human's head and hands is a complex problem with numerous specific solutions and no generalized algorithm. For a manipulation action and object recognition system, the proposed solution can be improved in different ways. Future work includes enhancing both the detection and the PJPDAF modules. The detection module should be improved so that it is able to localize the hands and head of a person in any environment. Concerning the PJPDAF, many different cues can be incorporated into it, making the results more accurate and the tracking more robust. Within my interests, I believe that adding information about motion can help us achieve better tracking performance. Moreover, online reconstruction of the 3D trajectories of the hands and head is also an important factor. Therefore, optimization of the introduced algorithm to achieve real-time tracking is also within the scope of future research.
Bibliography
[1] Antonis A. Argyros and Manolis I. A. Lourakis. Real-time tracking of multiple skin-colored objects with a possibly moving camera. In Tomas Pajdla and Jiri Matas, editors, ECCV (3), volume 3023 of Lecture Notes in Computer Science, pages 368-379. Springer, 2004.

[2] Antonis A. Argyros and Manolis I. A. Lourakis. Binocular hand tracking and reconstruction based on 2d shape matching. In ICPR '06: Proceedings of the 18th International Conference on Pattern Recognition, pages 207-210, Washington, DC, USA, 2006. IEEE Computer Society.

[3] O. Bernier and D. Collobert. Head and hands 3d tracking in real time by the EM algorithm. In Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems, 2001. Proceedings. IEEE ICCV Workshop on, pages 75-81, 2001.

[4] G. R. Bradski. Real time face and object tracking as a component of a perceptual user interface. In Applications of Computer Vision, 1998. WACV '98. Proceedings., Fourth IEEE Workshop on, pages 214-219, Oct 1998.

[5] C. Canton-Ferrer, J. Salvador, J. R. Casas, and M. Pardàs. Multi-person tracking strategies based on voxel analysis. Multimodal Technologies for Perception of Humans: International Evaluation Workshops CLEAR 2007 and RT 2007, Baltimore, MD, USA, May 8-11, 2007, Revised Selected Papers, pages 91-103, 2008.

[6] Cristian Canton-Ferrer, Josep R. Casas, A. Murat Tekalp, and Montse Pardàs. Projective kalman filter: Multiocular tracking of 3d locations towards scene understanding. In Steve Renals and Samy Bengio, editors, MLMI, volume 3869 of Lecture Notes in Computer Science, pages 250-261. Springer, 2005.

[7] Hyun-Chul Do, Ju-Yeon You, and Sung-Il Chien. Skin color detection through estimation and conversion of illuminant color under various illuminations. Consumer Electronics, IEEE Transactions on, 53(3):1103-1108, Aug. 2007.
[8] Ayman El-Sawah, Chris Joslin, Nicolas D. Georganas, and Emil M. Petriu. A framework for 3d hand tracking and gesture recognition using elements of genetic programming. In CRV '07: Proceedings of the Fourth Canadian Conference on Computer and Robot Vision, pages 495-502, Washington, DC, USA, 2007. IEEE Computer Society.

[9] James D. Foley, Richard L. Phillips, John F. Hughes, Andries van Dam, and Steven K. Feiner. Introduction to Computer Graphics. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1994.

[10] T. Fortmann, Y. Bar-Shalom, and M. Scheffe. Sonar tracking of multiple targets using joint probabilistic data association. Oceanic Engineering, IEEE Journal of, 8(3):173-184, Jul 1983.

[11] R. I. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, ISBN: 0521540518, second edition, 2004.

[12] Rudolph Emil Kalman. A new approach to linear filtering and prediction problems. Transactions of the ASME - Journal of Basic Engineering, 82(Series D):35-45, 1960.

[13] Ig-Jae Kim, Shwan Lee, S. C. Ahn, Yong-Moo Kwon, and Hyoung-Gon Kim. 3d tracking of multi-objects using color and stereo for hci. In Image Processing, 2001. Proceedings. 2001 International Conference on, volume 3, pages 278-281 vol. 3, 2001.

[14] Hedvig Kjellström, Javier Romero, David Martínez, and Danica Kragic. Simultaneous visual recognition of manipulation actions and manipulated objects. In ECCV '08: Proceedings of the 10th European Conference on Computer Vision, pages 336-349, Berlin, Heidelberg, 2008. Springer-Verlag.

[15] J. Kovac, P. Peer, and F. Solina. Human skin color clustering for face detection. In EUROCON 2003. Computer as a Tool. The IEEE Region 8, volume 2, pages 144-148 vol. 2, Sept. 2003.

[16] Choong Hwan Lee, Jun Sung Kim, and Kyu Ho Park. Automatic human face location in a complex background using motion and color information. Pattern Recognition, 29(11):1877-1889, 1996.

[17] S. Lloyd. Least squares quantization in pcm. Information Theory, IEEE Transactions on, 28(2):129-137, 1982.

[18] Stephen McKenna and Shaogang Gong. Tracking faces. In Proceedings of International Conference on Automatic Face and Gesture Recognition, pages 271-276. IEEE Computer Society Press, 1996.
[19] Thomas B. Moeslund, Adrian Hilton, and Volker Krüger. A survey of advances in vision-based human motion capture and analysis. Comput. Vis. Image Underst., 104(2):90-126, 2006.

[20] Kai Nickel, Edgar Seemann, and Rainer Stiefelhagen. 3d-tracking of head and hands for pointing gesture recognition in a human-robot interaction scenario. Automatic Face and Gesture Recognition, IEEE International Conference on, 0:565, 2004.

[21] Son L. Phung, A. Bouzerdoum, and D. Chai. A novel skin color model in ycbcr color space and its application to human face detection. In Image Processing. 2002. Proceedings. 2002 International Conference on, volume 1, pages I-289-I-292 vol. 1, 2002.

[22] S. Ray and R. Turi. Determination of number of clusters in k-means clustering and application in colour image segmentation, 1999.

[23] C. Stauffer and W. E. L. Grimson. Adaptive background mixture models for real-time tracking. In Computer Vision and Pattern Recognition, 1999. IEEE Computer Society Conference on., volume 2, page 252 Vol. 2, 1999.

[24] David J. Sturman and David Zeltzer. A survey of glove-based input. IEEE Computer Graphics and Applications, 14(1):30-39, 1994.

[25] Hisashi Tanizaki and Roberto S. Mariano. Nonlinear and non-gaussian state-space modeling with monte carlo simulations. Journal of Econometrics, 83(1-2):263-290, 1998.

[26] Arasanathan Thayananthan, Björn Stenger, Philip H. S. Torr, and Roberto Cipolla. Model-based hand tracking using a hierarchical bayesian filter. IEEE Trans. Pattern Anal. Mach. Intell., 28(9):1372-1384, 2006.
[27] A. Vacavant and T. Chateau. Realtime head and hands tracking by monocular vision. In Image Processing, 2005. ICIP 2005. IEEE International Conference on, volume 2, pages II-302-5, Sept. 2005.
[28] Xavier Varona, Jose Maria Buades Rubio, and Francisco J. Perales Lopez. Hands and face tracking for vr applications. Computers & Graphics, 29(2):179-187, 2005.

[29] Vladimir Vezhnevets, Vassili Sazonov, and Alla Andreeva. A survey on pixel-based skin color detection techniques. In Proc. Graphicon-2003, pages 85-92, 2003.

[30] P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. In Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference on, volume 1, pages I-511-I-518 vol. 1, 2001.
[31] Christopher Wren, Ali Azarbayejani, Trevor Darrell, and Alex Pentland. Pfinder: Real-time tracking of the human body. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19:780-785, 1997.

[32] Zhanwu Xiong. Robust object tracking using multiple cues. Master's thesis, KTH, 2009.

[33] A. Yilmaz, O. Javed, and M. Shah. Object tracking: A survey. ACM Comput. Surv., 38(4):13, 2006.