
Object Tracking Using Radial Basis Function Networks


A. Prem Kumar[a], T. N. Rickesh[b], R. Venkatesh Babu[c], R. Hariharan[d]

Abstract: The applications of visual tracking are broad in scope, ranging from surveillance
and monitoring to smart rooms. A robust object-tracking algorithm using Radial Basis
Function (RBF) networks has been implemented using OpenCV libraries. The pixel-based
color features are used to develop classifiers. The algorithm has been tested on various
video samples under different conditions, and the results are analyzed.

1. Introduction

The objective of tracking is to follow the target object in successive video frames. A
major application of such an algorithm is in the design of video surveillance systems to
tackle terrorism. For instance, large-scale surveillance might have played a crucial role in
preventing, or tracing the trails of, the 26/11 terrorist attacks in Mumbai and the many
bomb blasts in Kashmir, the North-east Indian region, and other parts of India. It is
therefore important to have a robust object-tracking algorithm. Since the neural network
framework does not require any assumptions on the structure of the input data, neural
networks have been widely used in pattern recognition, image analysis, and related fields.
The Radial Basis Function (RBF) based neural network is one of many ways to build
classifiers. A robust algorithm for object tracking using RBF networks was described in [1].
We have implemented that algorithm using OpenCV libraries so that this module can be
integrated into a larger surveillance system.

2. Object Tracking

Object tracking is an important task within the field of computer vision. The growth of
high-performance computers, the availability of high-quality yet inexpensive video cameras,
and the increasing need for automated video analysis have generated a great deal of interest
in object-tracking algorithms. There are three key steps in video analysis: detection of
interesting moving objects, tracking of such objects from frame to frame, and analysis of
tracks to recognize their behavior. Object tracking is pertinent in the tasks of:

 Motion-based recognition, that is, human identification based on gait, automatic
object detection, etc.
 Automated surveillance, that is, monitoring a scene to detect suspicious activities or
unlikely events
 Video indexing, that is, automatic annotation and retrieval of the videos in
multimedia databases
 Human-computer interaction, that is, gesture recognition, eye gaze tracking for data
input to computers, etc.
 Traffic monitoring, that is, real-time gathering of traffic statistics to direct traffic flow
 Vehicle navigation, that is, video-based path planning and obstacle avoidance
capabilities

[a] - Indian Institute of Technology Bombay; [b] - National Institute of Technology Karnataka, Surathkal;
[c] - Video analytics consultant; [d] - Junior scientist, Flosolver

In its simplest form, tracking can be defined as the problem of estimating the trajectory
of an object in the image plane as it moves around a scene. A tracker assigns consistent
labels to the tracked objects in different frames of a video. Additionally, depending on the
tracking domain, a tracker can also provide object-centric information, such as orientation,
area, or shape of an object. Tracking objects can be complex due to:

 Loss of depth information,
 Noise in images,
 Complex object motion,
 Non-rigid or articulated nature of objects,
 Partial and full object occlusions,
 Complex object shapes,
 Scene illumination changes, and
 Real-time processing requirements.

One can simplify tracking by imposing constraints on the motion and/or appearance of
objects. For example, almost all tracking algorithms assume that the object motion is
smooth with no abrupt changes. One can further constrain the object motion to be of
constant velocity or constant acceleration based on a priori information. Prior knowledge
about the number and size of objects, or about the object appearance and shape, can also
be incorporated. The foremost factor, however, is the object itself: its representation and
modeling.

3. Object Representation

Objects can be represented using their shapes and appearances. Here we describe the
object shape representations commonly employed for tracking.

 Points. The object is represented by a point, that is, the centroid, or by a set of
points. In general, the point representation is suitable for tracking objects that
occupy small regions in an image.
 Primitive geometric shapes. Object shape is represented by a rectangle, ellipse, etc.
Object motion for such representations is usually modeled by translation, affine, or
projective transformation. Though primitive geometric shapes are more suitable for
representing simple rigid objects, they are also used for tracking non-rigid objects.
 Object silhouette and contour. Contour representation defines the boundary of an
object. The region inside the contour is called the silhouette of the object. Silhouette
and contour representations are suitable for tracking complex non-rigid shapes.

4. Object Modeling

The purpose of modeling is to classify whether a chosen pixel belongs to the object or
not. Some of the prominent features used for modeling are:

 Templates: Templates are formed using simple geometric shapes or silhouettes. An
advantage of a template is that it carries both spatial and appearance information.
Templates, however, only encode the object appearance generated from a single
view. Thus, they are only suitable for tracking objects whose poses do not vary
considerably during the course of tracking.

 Probabilistic densities of object appearance: The probability density estimates of the
object appearance can either be parametric, such as a Gaussian or a mixture of
Gaussians (as in RBF networks), or nonparametric, such as histograms. The
probability densities of object appearance features (color, texture) can be computed
from the image regions specified by the shape models (the interior region of an
ellipse or a contour).
 Histogram: A histogram uses the color features of the image. Based on the histogram
developed, a pixel can be classified as belonging to the object or not. When the
background has a color similar to that of the object, the classification can instead be
based on a component color that differentiates object from non-object.
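As an illustration of the histogram-based color model, the sketch below builds a coarse joint R-G-B histogram from a region's pixels and normalizes it into a pdf. The bin count, the `ColorHistogram` type, and all names here are illustrative choices, not part of the original implementation.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// A coarse joint R-G-B histogram used as a nonparametric pdf of
// appearance. Each channel is quantized into BINS levels and the
// counts are normalized into probabilities.
constexpr int BINS = 8;               // 8 levels per channel -> 512 bins
constexpr int BIN_WIDTH = 256 / BINS;

struct Pixel { std::uint8_t r, g, b; };

struct ColorHistogram {
    std::array<double, BINS * BINS * BINS> p{};  // normalized bin probabilities

    static int binIndex(const Pixel& px) {
        int ri = px.r / BIN_WIDTH, gi = px.g / BIN_WIDTH, bi = px.b / BIN_WIDTH;
        return (ri * BINS + gi) * BINS + bi;
    }

    // Build the pdf from the pixels of a (non-empty) marked region.
    void build(const std::vector<Pixel>& region) {
        p.fill(0.0);
        for (const Pixel& px : region) p[binIndex(px)] += 1.0;
        for (double& v : p) v /= static_cast<double>(region.size());
    }

    // Probability that a pixel's color falls under this model.
    double prob(const Pixel& px) const { return p[binIndex(px)]; }
};
```

Building one such histogram for the object region and one for the background region yields the two pdfs compared later in the likelihood test.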

5. Radial Basis Function Networks

A radial basis function network [2] is an artificial neural network that uses radial basis
functions as activation functions; its output is a linear combination of radial basis
functions. RBF networks are neural nets consisting of three layers. The first, input
layer feeds data to a hidden intermediate layer. The hidden layer processes the data and
transports it to the output layer. Only the tap weights between the hidden layer and the
output layer are modified during training. Each hidden-layer neuron represents a basis
function of the output space with respect to a particular center in the input space. The
activation function chosen is commonly a Gaussian kernel, centered at the point in the
input space specified by the neuron's weight vector. The closer the input signal is to the
current weight vector, the higher the output of the neuron will be. Radial basis function
networks are commonly used in function approximation and series prediction.
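The forward pass described above can be sketched as follows; the `RBFNetwork` struct and its member names are hypothetical, introduced only for this illustration.

```cpp
#include <cmath>
#include <vector>

// Gaussian RBF network forward pass:
//   output(x) = sum_i alpha_i * exp(-||x - mu_i||^2 / sigma_i^2)
struct RBFNetwork {
    std::vector<std::vector<double>> mu;  // hidden-neuron centers
    std::vector<double> sigma;            // hidden-neuron widths
    std::vector<double> alpha;            // hidden-to-output weights

    double output(const std::vector<double>& x) const {
        double y = 0.0;
        for (std::size_t i = 0; i < mu.size(); ++i) {
            double d2 = 0.0;
            for (std::size_t j = 0; j < x.size(); ++j) {
                double diff = x[j] - mu[i][j];
                d2 += diff * diff;
            }
            // Gaussian activation: inputs closer to the center respond more.
            y += alpha[i] * std::exp(-d2 / (sigma[i] * sigma[i]));
        }
        return y;
    }
};
```

An input exactly at a center contributes its full weight alpha_i; an input far from every center produces an output near zero.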

6. Description of Algorithm

6.1 Object background separation

The object is selected, and a white rectangle then marks the object domain. Another box
is marked around the first one, with the surrounding region containing an equal number of
pixels; this region is used as the object background.

The object and background are separated from each other. The R-G-B based joint
probability density function (pdf) of the object region and that of the background region
are obtained: the region within the marked rectangle is used to obtain the object pdf, and
the marked background region is used to obtain the background pdf.

The log-likelihood of a pixel belonging to the object versus the background is obtained as

L_i = log[(h_o(i) + ε) / (h_b(i) + ε)]

where h_o and h_b are the probabilities of the i-th pixel belonging to the object and the
background respectively, and ε is a small non-zero value to avoid numerical instability. A
binary image is then constructed by thresholding the likelihood: a pixel is considered to be
on the object if L_i > τ_0 and in the background otherwise, where τ_0 is the threshold.
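A minimal sketch of this likelihood test, assuming the object and background pdf values for each pixel have already been looked up from the two histograms (the function name and signature are illustrative):

```cpp
#include <cmath>
#include <vector>

// For each pixel i, compare the object pdf value h_o[i] against the
// background pdf value h_b[i]; eps avoids log(0) and tau0 is the
// decision threshold. Returns 1 for object pixels, 0 for background.
std::vector<int> likelihoodMask(const std::vector<double>& h_o,
                                const std::vector<double>& h_b,
                                double eps, double tau0) {
    std::vector<int> mask(h_o.size(), 0);
    for (std::size_t i = 0; i < h_o.size(); ++i) {
        double L = std::log((h_o[i] + eps) / (h_b[i] + eps));
        mask[i] = (L > tau0) ? 1 : 0;
    }
    return mask;
}
```

The resulting mask is the binary image referred to above.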

6.2 Feature Extraction

We use the color features of pixels to develop the RBF-based classifier. Applying the
classifier to a pixel yields either +1 or -1: a pixel belonging to the object is assigned +1,
and a pixel belonging to the background is assigned -1.

6.3 Developing the Object Model

The object model is developed using a radial basis function (RBF) classifier, called the
'object classifier' or 'object model'. The classifier labels pixels as object or background
based on its output. With a sufficient number of hidden (second-layer) neurons, any
function can be approximated to any required level of accuracy. Let µ_i, a d-dimensional
real vector, and σ_i, a d-dimensional positive real vector, be the centre and the width of
the i-th Gaussian hidden neuron respectively; let α be the output weights and N the
number of pixels.

The output with k neurons has the following form [1]:

Ŷ_i = Σ_{j=1..k} α_j exp(−‖U_i − µ_j‖² / σ_j²)

The above equation can be rewritten in matrix form,

Ŷ = Y_H α

where Y_H is the matrix representation of the hidden-neuron responses: each row of Y_H
contains the Gaussian activations for one of the inputs U_1, U_2, U_3, …, U_N. The µ and
σ values are selected randomly. The output weights are estimated analytically as

α = (Y_H)† Ŷ

where (Y_H)† is the pseudo-inverse of Y_H.
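The analytic weight estimation can be sketched as below. For illustration, the pseudo-inverse is replaced by solving the normal equations (Y_H^T Y_H) α = Y_H^T Ŷ with Gaussian elimination; a production implementation would use an SVD-based pseudo-inverse. All names and the tiny training set in the usage are illustrative assumptions, not the paper's code.

```cpp
#include <cmath>
#include <utility>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;

// Gaussian hidden-neuron response for input x, center mu, width sigma.
double gaussian(const Vec& x, const Vec& mu, double sigma) {
    double d2 = 0.0;
    for (std::size_t j = 0; j < x.size(); ++j) {
        double d = x[j] - mu[j];
        d2 += d * d;
    }
    return std::exp(-d2 / (sigma * sigma));
}

// Solve A w = b (A square, assumed well-conditioned) by Gaussian
// elimination with partial pivoting.
Vec solve(Mat A, Vec b) {
    const std::size_t n = A.size();
    for (std::size_t c = 0; c < n; ++c) {
        std::size_t piv = c;
        for (std::size_t r = c + 1; r < n; ++r)
            if (std::fabs(A[r][c]) > std::fabs(A[piv][c])) piv = r;
        std::swap(A[c], A[piv]);
        std::swap(b[c], b[piv]);
        for (std::size_t r = c + 1; r < n; ++r) {
            double f = A[r][c] / A[c][c];
            for (std::size_t k = c; k < n; ++k) A[r][k] -= f * A[c][k];
            b[r] -= f * b[c];
        }
    }
    Vec w(n);
    for (std::size_t i = n; i-- > 0;) {
        double s = b[i];
        for (std::size_t k = i + 1; k < n; ++k) s -= A[i][k] * w[k];
        w[i] = s / A[i][i];
    }
    return w;
}

// Estimate output weights alpha for fixed centers mu and widths sigma,
// given training inputs X and target labels y (+1 object, -1 background).
Vec trainAlpha(const Mat& X, const Vec& y, const Mat& mu, const Vec& sigma) {
    const std::size_t N = X.size(), k = mu.size();
    Mat H(N, Vec(k));                       // hidden-response matrix Y_H
    for (std::size_t i = 0; i < N; ++i)
        for (std::size_t j = 0; j < k; ++j)
            H[i][j] = gaussian(X[i], mu[j], sigma[j]);
    // Normal equations: (H^T H) alpha = H^T y.
    Mat HtH(k, Vec(k, 0.0));
    Vec Hty(k, 0.0);
    for (std::size_t i = 0; i < N; ++i)
        for (std::size_t a = 0; a < k; ++a) {
            Hty[a] += H[i][a] * y[i];
            for (std::size_t b = 0; b < k; ++b) HtH[a][b] += H[i][a] * H[i][b];
        }
    return solve(HtH, Hty);
}
```

After training, a pixel is classified by the sign of Σ_j α_j exp(−‖x − µ_j‖²/σ_j²).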

6.4 Object Tracking

Tracking is the process of tracing the path of an object from one frame to the next in a
video sequence. The centroid of the object is calculated from the output of the classifier.
In the first frame, where the object is selected, its centroid is computed. In each
subsequent frame a new centroid is calculated. If the new centroid is within an ε range
(i.e. a tolerance) of the previous frame's centroid, it is assigned as the current object
centroid and the tracker proceeds to the next frame. Otherwise, the centroid is recomputed
recursively until it falls within the ε range of the previous centroid.
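The centroid-update loop can be sketched as follows, assuming the classifier output for one frame is available as a row-major binary mask. The window size, function name, and iteration cap are illustrative assumptions; the original implementation may differ in these details.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct Point { double x, y; };

// Starting from the previous frame's centroid, recompute the centroid
// of object pixels (mask value 1) inside a square search window until
// successive estimates differ by less than eps.
Point trackCentroid(const std::vector<int>& mask, int width, int height,
                    Point prev, int window, double eps, int maxIter = 20) {
    Point c = prev;
    for (int iter = 0; iter < maxIter; ++iter) {
        double sx = 0.0, sy = 0.0;
        int n = 0;
        int x0 = std::max(0, (int)c.x - window), x1 = std::min(width - 1, (int)c.x + window);
        int y0 = std::max(0, (int)c.y - window), y1 = std::min(height - 1, (int)c.y + window);
        for (int y = y0; y <= y1; ++y)
            for (int x = x0; x <= x1; ++x)
                if (mask[y * width + x]) { sx += x; sy += y; ++n; }
        if (n == 0) return c;                   // no object pixels in window
        Point next{sx / n, sy / n};
        double dx = next.x - c.x, dy = next.y - c.y;
        c = next;
        if (std::sqrt(dx * dx + dy * dy) < eps) break;  // converged
    }
    return c;
}
```

Each frame's final centroid then seeds the search in the next frame.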

7. Implementation

This algorithm was implemented in C++ using the OpenCV libraries [3]. The code flow is given below:

Fig 1: Code Flow

8. Results

The algorithm was tested on various video samples. The results are given below, and
the problems encountered during the experiments are also noted.

8.1 Likelihood Results

The following figures show sources (Fig. 2(a), 3(a)) and their binary images (Fig. 2(b),
3(b)) based on likelihoods.

Fig. 2(a) Fig. 2(b) Fig. 3(a) Fig. 3(b)

8.2 Classifier Results

The following figures show the results of the classifier. The first column (Fig. 4(a), 5(a))
shows the object selection. The second column (Fig. 4(b), 5(b)) shows the corresponding
binary images based on likelihoods, and the third column (Fig. 4(c), 5(c)) shows the binary
images obtained from the classifier.

Fig. 4(a) Fig. 4(b) Fig. 4(c)

Fig. 5(a) Fig. 5(b) Fig. 5(c)

8.3 Tracking results

The following figures show the tracking rectangle of the object and the respective binary
images from the classifier.

Video frame | Binary image | Video frame | Binary image

Fig. 6(a) Fig. 6(b) Fig. 7(a) Fig. 7(b)

Fig. 8(a) Fig. 8(b) Fig. 9(a) Fig. 9(b)

Fig. 10(a) Fig. 10(b) Fig. 11(a) Fig. 11(b)

Fig. 6(a), 7(a), 8(a), 9(a), 10(a), and 11(a) correspond to frame numbers 89, 172, 265,
316, 394 and 404 respectively.

8.4 Issues

The problems encountered in the tracking experiment are discussed below.

1) Similar background color: when the neighborhood of the object has a color very close to
that of the object, the algorithm gives false detections; the white mark on the floor has
misled the tracking.

Fig. 12(a) Fig. 12(b)

Fig. 12(c) Fig. 12(d)

2) Occlusion: When the tracked object (the car in Fig. 13) is completely covered by the
surrounding environment (the tree in Fig. 13), the object information is lost, leading to
failure of tracking.

Fig. 13(a) Fig. 13(b)

Fig. 13(c) Fig. 13(d)

3) Intensity change: When the intensity of the light changes (i.e., the lighting conditions
change), the color of the object changes. Fig. 14(a) and 15(a) are video frames and 14(b)
and 15(b) are their binary images respectively. The performance of the classifier, which
was designed for the original lighting conditions, degrades. This can be clearly seen in the
corresponding binary images.

Fig. 14(a) Fig. 14(b)

Fig. 15(a) Fig. 15(b)

9. Conclusions and future enhancements

A robust object-tracking algorithm using Radial Basis Function (RBF) networks has been
implemented using OpenCV libraries. The pixel-based color features are used to develop
classifiers. The algorithm has been tested on various video samples under different
conditions, and the results are analyzed. The cases where the tracking algorithm fails are
also shown, along with possible reasons. The RBF networks could be redesigned to
incorporate adaptive mechanisms for light variations, a varying object domain, thresholds,
scale changes, and multiple camera feeds.

Acknowledgement: We thank Dr. U. N. Sinha (Head, Flosolver) for his constant
encouragement and inspiration. Without his support and guidance, this work would not have
been carried out.

References

[1] R. Venkatesh Babu, S. Suresh, and Anamitra Makur, "Robust Object Tracking with
Radial Basis Function Networks", Proc. ICASSP, vol. I, pp. 937-940, 2007.
[2] Simon Haykin, Neural Networks: A Comprehensive Foundation, 2nd Edition, Prentice
Hall, 1999.
[3] Gary Bradski and Adrian Kaehler, Learning OpenCV, 1st Edition, O'Reilly, 2008.
