
This article has been accepted for inclusion in a future issue of this journal.

Content is final as presented, with the exception of pagination.


IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS 1
On Exploration of Classifier Ensemble
Synergism in Pedestrian Detection
Luciano Oliveira, Student Member, IEEE, Urbano Nunes, Senior Member, IEEE, and Paulo Peixoto, Member, IEEE
Abstract: A single feature extractor-classifier is not usually able to deal with the diversity of multiple image scenarios. Therefore, the integration of features and classifiers can bring benefits to cope with this problem, particularly when the parts are carefully chosen and synergistically combined. In this paper, we address the problem of pedestrian detection by a novel ensemble method. Initially, histograms of oriented gradients (HOGs) and local receptive fields (LRFs), which are provided by a convolutional neural network, have both been classified by multilayer perceptrons (MLPs) and support vector machines (SVMs). A diversity measure is used to refine the initial set of feature extractors and classifiers. A final classifier ensemble was then structured by an HOG and an LRF as features, classified by two SVMs and one MLP. We have analyzed the following two classes of fusion methods for combining the outputs of the component classifiers: 1) majority vote and 2) fuzzy integral. The first part of the performance evaluation consisted of running the final proposed ensemble over the DaimlerChrysler cropwise data set, which was also artificially modified to simulate sunny and shadowy illumination conditions, which are typical of outdoor scenarios. Then, a window-wise study has been performed over a collected video sequence. Experiments have highlighted a state-of-the-art classification system, performing consistently better than the component classifiers and other methods.
Index Terms: Convolutional neural network (CNN), fuzzy integral (FI), histograms of oriented gradients (HOGs), majority vote (MV), multilayer perceptron (MLP), pedestrian detection, support vector machine (SVM).
I. INTRODUCTION
DESIGNING a single feature extractor-classifier that is able to cope with a large image variability is a complex task. This is so because it is particularly difficult to build a feature extractor that uniquely represents the object of interest; as a result, the feature extractor may not help the classifier in every situation. Therefore, the fusion of classifiers has been studied over the past few years, with the goal of overcoming certain inabilities of the single feature extractor-classifier [1]. One of the main goals of classifier ensembles is to explore the diversity of the component classifiers to enhance overall classification performance. In other words, since there is no perfect individual classifier yet,
Manuscript received December 12, 2007; revised May 8, 2008 and February 19, 2009. This work was supported in part by the Portuguese Foundation for Science and Technology under Grant PTDC/EEA-ACR/72226/2006. The work of L. Oliveira was supported by Coordenação de Aperfeiçoamento de Pessoal de Nível Superior, which is a doctorate program of the Ministry of Education of Brazil, through Grant BEX 4000-5-6. The Associate Editor for this paper was M. Trivedi.
The authors are with the Institute of Systems and Robotics, Department of Electrical and Computer Engineering, University of Coimbra, 3030-290 Coimbra, Portugal (e-mail: lreboucas@isr.uc.pt; urbano@isr.uc.pt; peixoto@isr.uc.pt).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TITS.2009.2026447
Fig. 1. Diversity in an ensemble of two classifiers. (Top) Individual classification by two classifiers (defined by the light and dark bounding boxes). (Bottom) Result of a classification fusion. It is expected that a fusion method takes the complementarity of the component classifiers into account such that the performance of the ensemble is superior to the performance of the individual classifiers.
by assembling them, one can complement the other. If there are errors in the ensemble components, it is expected that they occur on different image objects, giving rise to fusion methods that improve the performance of the whole system. Furthermore, the rationale of building ensembles is also that most individual classifiers agree in a certain way such that the whole system can be more successful than its component parts. Fig. 1 illustrates these aspects for a simple ensemble composed of two classifiers.
There are many ways to integrate features and classifiers. An underlying question can then be stated as follows: How is the best combination of classifiers determined? Although diversity is a possible answer to this question, it is difficult to find a canonical relationship between diversity and ensemble accuracy. Kuncheva [1] presents some methods of measuring the diversity of classifier ensembles. Aksela and Laaksonen [2] present an in-depth study of the correlation between ensemble accuracy and diversity. Several types of component selection criteria
1524-9050/$25.00 2009 IEEE
Authorized licensed use limited to: Universidade de Coimbra. Downloaded on July 13, 2009 at 04:50 from IEEE Xplore. Restrictions apply.
for ensemble classifiers were examined. Although there is no unquestionable diversity measure for choosing the best ensemble from an initial set of classifiers, by exploiting the diversity of errors across the component classifiers, it is possible to choose a set of classifiers whose decisions are better integrated by the fusion method. The authors suggest that the choice of a diversity measure to prune the classifiers in one ensemble should rely on the goal of the fusion method and the type of component classifiers.
Based on two previous studies on the performance comparison of several feature extractors and classifiers [3], [4], which show two types of features as being excellent representations for pedestrians, we present here an in-depth study on how to combine variant versions of histograms of oriented gradients (HOGs) [3] and local receptive fields (LRFs), which are provided by a type of convolutional neural network (CNN) [5]. These two types of features are then classified by multilayer perceptrons (MLPs) and support vector machines (SVMs), and decisions over their outputs are taken by the following two types of fusion methods: 1) majority vote (MV) using sum, weighted, and heuristic rules and 2) fuzzy integral (FI) using Choquet [7] and Sugeno [8] rules.
To evaluate the performance of the final ensemble classifier, the DaimlerChrysler (DC) cropwise data sets (18 × 36 pixel image size) have been used with different characteristics: full data sets as provided in [4] and full data sets with lighting transformations (validation data set only). The application of artificial lighting transformations has been essential to the analysis of the sensitivity of the proposed method to outdoor scenarios, i.e., scenarios prone to sunny (overexposure to sunlight) and shadowy (penumbra) illumination conditions. Finally, we have also studied the behavior of the proposed ensemble in a window-wise evaluation over a collected video sequence, searching all scales and locations in a 320 × 240 pixel frame.
In this paper, our contribution is not only a novel synergistic ensemble method that performs consistently well, even in the presence of changing lighting conditions, as a result of a balance in the classification expertise, but also a detailed evaluation of the CNN [5] applied to pedestrian recognition. Moreover, it is shown that our proposed method is more accurate than the component classifiers, as well as the methods found in [4] and [9] (also evaluated over DC data sets).
A. Background
There are many pedestrian-detection techniques in the intelligent transportation systems literature (for a recent survey of methods, see the work of Gandhi and Trivedi [10]). One of the earliest methods of recognizing pedestrians in images was proposed by Papageorgiou and Poggio [11]. In that work, Haar wavelet features were applied, which were classified by an SVM. The main drawback of that method resides in the lack of invariance to illumination and shape distortion. Other methods have been proposed with the aim of coping with these problems. Dalal and Triggs [3] have densely applied the HOG, which was used by Lowe [12] as a region descriptor, and classified it by a linear SVM. This HOG extractor was compared with other extractors, such as the Haar wavelet and principal component analysis (PCA) applied to the scale-invariant feature transform (SIFT) and shape contexts, reaching the highest accuracy over the Institut National de Recherche en Informatique et Automatique (INRIA) image data set [3]. Recently, Munder and Gavrila [4] published a detailed study on the comparison of different features and classifiers. Their experiments have employed features such as PCA, Haar wavelets, and a type of LRF obtained from the weights of the MLP's hidden layer over PCA and Haar wavelet features. They concluded in their work that the latter approach, which is classified by SVM or Adaboost, provides the highest performance, with Adaboost presenting a lower computational cost. Szarvas et al. [13] presented a combination of a laser scanner sensor and a camera to recognize pedestrians. The laser scanner was used as a region of interest (ROI) definer with the goal of narrowing down the image areas to be classified. That approach is not able to find pedestrians when the laser scanner fails. The image classification was performed by a CNN with state-of-the-art performance, although no analysis with regard to CNN parameters is given.
With respect to fusion methods, Viola and Jones [14] have proposed a classification system based on Haar wavelet features classified by an Adaboost classifier, which performs classification fusion by using a cascade of weak classifiers. Llorca et al. [15] addressed a combination of classifiers based on a parts-based ensemble method by evaluating which classifier performs better for each part of a person's body. Oliveira et al. [16] presented a novel method based on a hierarchical fuzzy integration (HFI) that combines HOG/SVM and Haar wavelet/Adaboost classification systems. HFI relies on fuzzy logic applied to the closed-unit-interval-scaled scores provided by the classifiers and the overlapping area of the detected windows of two classifiers (in a window-wise approach). Nanni and Lumini [9] conceived a novel ensemble classifier based on Laplacian eigenmap (LEM) and Gabor local binary (GLB) features, with each feature vector being classified by an SVM. They followed a parts-based approach, where the images in the DC data sets are divided into two parts (top and bottom), with each part represented by one of those features.
II. FEATURE EXTRACTORS
A. LRFs
The receptive field (RF) of a neuron is a region in which neuron inputs cause neuron outputs to change behavior. Neuronal RFs can be categorized as local (i.e., LRF) or nonlocal, depending on whether stimuli come from a bounded region or not [6]. If the neurons of an artificial neural network (ANN) have LRFs, then this ANN is an LRF neural network. One well-known example of an LRF neural network was proposed by LeCun and Bengio [5], based on a type of CNN that they named LeNet-5. It has been applied successfully to document recognition [17] and face recognition [18], [19]. When it was first applied, LeCun et al. [17] intuitively showed that this type of CNN with connections to specific regions (not fully connected) should demonstrate more stability, shift and scale invariance, and a decrease in output dimensionality. However, later, De Ridder et al. [20] experimentally found that a fully connected network might reach the same performance without loss of generality. A CNN is composed
Fig. 2. CNN implemented architecture. The first five neural network layers correspond to an 18 × 36 pixel gray-level input image (input layer), with C_1, S_1, C_2, and S_2 working as a trainable feature extractor. C layers convolve the input image and the previous S layers, which, in turn, subsample C layers, reducing the dimensionality of the feature vector, obtaining only features which are hopefully important to the recognition process. In the last two layers, an MLP neural network is used to classify S_2 features. During the training, the error signal is backpropagated up to the C_1 layer.
of the following two important parts, which characterize it as having some degree of shift, scale, and distortion invariance: 1) feature maps, which are composed of C and S layers and are responsible for first convolving the input and then subsampling it, consequently reducing its dimensionality, and 2) weight sharing, where individual neurons of a feature map share a set of weights, decreasing the number of free parameters to train and allowing a parallel implementation of the neural network.
We implemented a variant of LeCun and Bengio's CNN, with full connections and just one hidden layer in the MLP. In Fig. 2, our network architecture is presented with the best parameters obtained by a cross-validation procedure performed on the number of feature maps and hidden neurons of the MLP (see Section V-B for further details). The input layer of our CNN is an 18 × 36 pixel gray-level image with each pixel normalized to (x − μ)/σ, where x is the gray value of the pixel, μ is the mean, and σ is the standard deviation. Both μ and σ are calculated over all input pixels. This normalization has the advantage of speeding up the convergence of the training process, also providing more invariance to illumination change. The next four layers, namely, C_1, S_1, C_2, and S_2, are in charge of extracting the features: The C_1 layer is composed of four feature maps, which have been obtained by convolving regions of 5 × 5 (size of the kernel) pixels of the input image (each region of a feature map shares the same weights); the S_1 layer takes regions of 4 × 4 (size of the kernel) of C_1, subsampling them with the objective of reducing dimensionality; a sigmoidal function is applied at the end of each layer; and similar steps are followed on C_2 and S_2, making these layers act like a bipyramid feature extractor. The feature vector is obtained from the output of S_2. In the last two layers, an MLP neural network is used to classify S_2 features, backpropagating the error up to the C_1 layer's input during the training stage. Outputs y_n^{(i,j)} of each of the C and S layers are given by
$$y_n^{(i,j)} = b_n + \sum_{l=1}^{M} \sum_{s=1}^{K} \sum_{t=1}^{K} W_{n,l,s,t}\, X_l^{(\Delta_w (i-1)+s,\; \Delta_h (j-1)+t)} \qquad (1)$$

$$y_n^{(i,j)} = b_n + \beta \sum_{s=1}^{K} \sum_{t=1}^{K} X_k^{(\Delta_w (i-1)+s,\; \Delta_h (j-1)+t)} \qquad (2)$$

with n = 1, …, N, where N is the number of feature map outputs; i and j are region coordinates of a feature map; X is the input vector; K denotes the size of a square kernel (usually of the same size in the width and height directions); M is the number of input images or feature maps; Δ_w and Δ_h are the strides (in pixels) between each application of the kernel in the width and height directions, respectively; W_{n,l,s,t} represents the weight vectors of each output neuron; and β is a constant of subsampling.
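As an illustration, the layer computations in (1) and (2) can be sketched in NumPy as follows (a minimal sketch with our own naming, assuming a unit convolution stride and non-overlapping subsampling windows; it is not the authors' implementation):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def conv_layer(X, W, b):
    """C layer, eq. (1): each output pixel of feature map n is a bias plus a
    K x K weighted sum over all M input maps; weights are shared per map."""
    N, M, K, _ = W.shape                 # W[n, l, s, t]
    oh, ow = X.shape[1] - K + 1, X.shape[2] - K + 1
    Y = np.empty((N, oh, ow))
    for n in range(N):
        for i in range(oh):
            for j in range(ow):
                Y[n, i, j] = b[n] + np.sum(W[n] * X[:, i:i + K, j:j + K])
    return sigmoid(Y)

def subsample_layer(X, b, beta, K):
    """S layer, eq. (2): non-overlapping K x K sums scaled by the
    subsampling constant beta, one output map per input map."""
    N, H, Wd = X.shape
    Y = np.empty((N, H // K, Wd // K))
    for n in range(N):
        for i in range(H // K):
            for j in range(Wd // K):
                Y[n, i, j] = b[n] + beta * X[n, i*K:(i+1)*K, j*K:(j+1)*K].sum()
    return sigmoid(Y)
```

Stacking conv_layer and subsample_layer twice, as in Fig. 2, yields the bipyramid feature extractor; the flattened output of the second S layer is the LRF feature vector.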
B. HOG
Lowe [12] used an HOG as a descriptor by computing it only over stable points, which are obtained after applying a pyramid of Gaussians. These stable points are referred to as invariant to affine transformations. Later, Dalal and Triggs [3] applied rectangular and circular log-polar types of HOG in a dense way, weighting pixels, cells, and blocks and normalizing the feature vector by its norm. These procedures provide a lexicon of features that are partially scale and lighting invariant. In our algorithm, we have followed the same strategy as in [21] to extract rectangular HOGs, first computing the edges by a simple kernel mask [−1, 0, 1] in the vertical and horizontal directions; next, the gradients of the edges are calculated in regions of 3 × 3 pixel cells and 2 × 2 cell blocks (block descriptors are then 6 × 6 pixels wide), with a block stride of 2 pixels, preventing falling off boundaries. For pedestrians, the histogram of the edges is calculated in a half circle and is composed of nine bins of 20° each. After extracting the features, the feature vector is normalized by applying an L2-Hys norm [21].
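The rectangular-HOG pipeline above can be sketched as follows (our illustrative reading: [−1, 0, 1] gradient masks, nine unsigned-orientation bins of 20° over 3 × 3 pixel cells, 2 × 2 cell blocks with a one-cell stride for simplicity, and L2-Hys block normalization; the function and parameter names, and the simplified block stride, are assumptions):

```python
import numpy as np

def rectangular_hog(img, cell=3, bins=9):
    """Sketch of rectangular-HOG extraction: gradients, per-cell orientation
    histograms weighted by gradient magnitude, and L2-Hys-normalized blocks."""
    img = img.astype(float)
    gx = np.zeros_like(img); gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]        # horizontal [-1, 0, 1] mask
    gy[1:-1, :] = img[2:, :] - img[:-2, :]        # vertical   [-1, 0, 1] mask
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0  # unsigned half-circle
    ch, cw = img.shape[0] // cell, img.shape[1] // cell
    hist = np.zeros((ch, cw, bins))
    for i in range(ch):
        for j in range(cw):
            m = mag[i*cell:(i+1)*cell, j*cell:(j+1)*cell].ravel()
            a = ang[i*cell:(i+1)*cell, j*cell:(j+1)*cell].ravel()
            idx = np.minimum((a // (180.0 / bins)).astype(int), bins - 1)
            np.add.at(hist[i, j], idx, m)         # magnitude-weighted bins
    feats = []
    for i in range(ch - 1):                       # 2 x 2 cell blocks
        for j in range(cw - 1):
            v = hist[i:i+2, j:j+2].ravel()
            v = v / (np.linalg.norm(v) + 1e-6)    # L2 normalization
            v = np.minimum(v, 0.2)                # Hys clipping
            v = v / (np.linalg.norm(v) + 1e-6)    # renormalize (L2-Hys)
            feats.append(v)
    return np.concatenate(feats)
```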
III. CLASSIFIERS
SVMs and MLPs are both discriminant classifiers of the type f(X) = sign(W · X + b), where X ∈ R^N is the input vector, W ∈ R^N is the weight vector, and b ∈ R is the bias component that adjusts the hyperplane f(X) to better separate X. The main difference between these two classifiers is with respect to the way weights and biases are obtained. For SVMs, a quadratic and convex optimization problem is cast,
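The shared decision form f(X) = sign(W · X + b) can be written as follows (a sketch; normalizing by ||W|| to obtain the hyperplane distance is our addition):

```python
import numpy as np

def linear_decision(X, W, b):
    """f(X) = sign(W . X + b): the label is the sign of the activation, and
    |W . X + b| / ||W|| is the distance to the separating hyperplane, of the
    kind later used as the confidence score s_c."""
    s = float(np.dot(W, X) + b)
    label = 1 if s >= 0 else -1
    return label, abs(s) / np.linalg.norm(W)
```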
Fig. 3. Initial ensemble. Composed of two feature extractors, HOG and LRF (provided by the CNN), and four classifiers (two MLPs and two SVMs). HS = HOG/SVM, HM = HOG/MLP, LS = LRF/SVM, and LM = LRF/MLP.
and when the input space is nonlinearly separable, a kernel function is used to take the input vector to a higher dimensional feature space, separating it linearly. In our approach, the SVM-Light library was used to obtain SVM models (http://svmlight.joachims.org/). For MLP training, we used a stochastic gradient descent method by backpropagating the error.
IV. ENSEMBLE OF CLASSIFIERS
Considering an ensemble of classifiers, it is reasonable to assume that individual classifiers make some errors on different objects, and complementarity among the classifiers makes the ensemble superior to its parts. In this sense, let us consider the initial ensemble of classifiers depicted in Fig. 3.
The initial ensemble is structured by HOG and LRF features, with each feature vector being classified by an MLP and an SVM. Henceforth, consider the following notation for each extractor-classifier pair: HS = HOG/SVM, HM = HOG/MLP, LS = LRF/SVM, and LM = LRF/MLP. The MLP of HM has a 150-neuron hidden layer, and the parameters of the other feature extractors and classifiers are given in Section V.
The choice of the two feature extractors (HOG and LRF) was based on the studies found in [3], [4], and [13]. Dalal and Triggs [3] presented an experimental analysis demonstrating that HOG features outperform PCA-SIFT, Haar wavelets, and shape contexts in a complex data set. Munder and Gavrila [4] also experimentally showed that LRF features, which are computed from an MLP over Haar wavelet or PCA features and classified by SVM or Adaboost, present superior performance in comparison with PCA and Haar wavelets. Szarvas et al. [13] found that LRFs built from CNNs have a great potential for pedestrian recognition. Motivated by these studies, our goal is to show that there is an opportunity to synergistically integrate the outputs of high-performance classifiers acting over these two types of features.
A. Refining the Initial Ensemble
There is much discussion on whether ensembles of less-accurate classifiers can outperform ensembles of more-accurate classifiers with less diversity [1], [22]. Here, we found that ensembles of accurate classifiers have the following advantages: 1) more stable consensus between the parts and 2) in the presence of
TABLE I
EXPONENTIAL ERROR COUNT (E_ec) FOR CLASSIFIER COMBINATIONS FROM THE INITIAL ENSEMBLE
lighting variation, where the classification performance naturally drops, the consensus and diversity of opinions can bring better benefits to the ensemble performance. Despite these advantages, we are still looking for some diversity in the ensemble, aiming to raise the overall performance of the system.
Rather than simply relying on the accuracies of individual classifiers to choose the best combination of feature extractors and classifiers, we are interested in a causal relationship between the diversity and ensemble accuracy, as experimentally demonstrated in [2]. This way, the initial ensemble architecture depicted in Fig. 3 has been used as a basis for a diversity analysis to determine the best combination of classifiers to integrate the final ensemble. In this sense, we have considered the exponential error count studied in [2] to refine the ensemble. This diversity measure counts the importance of the component classifiers not making the same errors too often, also considering the correct classifications by scaling the diversity measure. This assumption is very beneficial in our approach since we are considering component classifiers with high accuracy. The best set of classifiers is found by taking the one with the lowest exponential error count (E_ec), given by

$$E_{ec} = \sum_{i=1}^{K} \frac{\left(N_{same}^{i0}\right)^{i}}{N^{K1} + 1} \qquad (3)$$

where K is the number of classifiers in a set, N_{same}^{i0} denotes the count of errors made by a total of i classifiers to the same class, and N^{K1} denotes the number of testing samples for which all classifiers in the set are correct. The rationale of this diversity measure is, thus, to count the frequency of the same mistakes made by groups of classifiers in a set, scaling the result by the total agreement within the set. It is noteworthy that, by doing this, one relates not only the accuracy of the classifiers but the coincident errors of the set of classifiers as well. The E_ec for each combination of classifiers from the initial ensemble was computed by using the original DC data sets provided in [4]. Each classifier was trained by using the three training data sets, consequently obtaining three classification models. Each one of those models was then used to classify the two validation data sets (refer to Section V for more details). The final result was then averaged over those six combinations. The values of E_ec are summarized in Table I. According to Table I, the set of classifiers HS + LM + LS presents the lowest E_ec. When the initial ensemble is evaluated, the value of E_ec is the highest one. There is no advantage in selecting sets of two classifiers since it would be difficult for the fusion methods applied here to recover from errors and still make a good expected decision.
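Under our reading of (3), E_ec can be computed as follows (a sketch; the names are ours, and "same-class errors" counts samples on which all erring classifiers output the identical wrong label):

```python
def exponential_error_count(preds, truth):
    """Eq. (3), as we read it: preds maps classifier name -> predicted labels,
    truth holds the true labels. n_same[i] counts samples on which exactly i
    classifiers err with the SAME wrong label; n_all_correct counts samples
    every classifier gets right (the agreement term N^{K1})."""
    K = len(preds)
    n_same = {i: 0 for i in range(1, K + 1)}
    n_all_correct = 0
    for idx, t in enumerate(truth):
        wrong = [p[idx] for p in preds.values() if p[idx] != t]
        if not wrong:
            n_all_correct += 1
        elif len(set(wrong)) == 1:    # all erring classifiers chose one class
            n_same[len(wrong)] += 1
    return sum(n_same[i] ** i for i in range(1, K + 1)) / (n_all_correct + 1)
```

A lower value means fewer coincident errors relative to the set's total agreement, which is how Table I ranks the candidate subsets.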
TABLE II
PEDESTRIAN DATA SETS
B. Classifier Fusion Methods
Let V = {V_1, V_2, V_3} be the set of feature extractor-classifiers LM, LS, and HS, respectively, such that V_i : R^n → Ω_i, where Ω_i = {−1, 1}, i = 1, 2, 3, assigns a binary class label for V_i. In practice, our deterministic classifiers provide a confidence score output (s_c), which is obtained from the distance between the input vector and the separating hyperplane.
Two classes of fusion methods are evaluated in this paper, namely, 1) MV and 2) FI. For the former, sum, weighted, and heuristic rules are applied. The heuristic MV was conceived to strengthen the different characteristics of the component classifiers, i.e., according to evidence in the experiments, we found that HS is better at defining the nonpedestrian class, while LM and LS are better at classifying pedestrians. In the second class, the Sugeno and Choquet FIs have been analyzed. Next, some background on the methods is presented:
1) MV: For MV, the sum (S_MV), weighted (W_MV), and heuristic (H_MV) rules are given by

$$S_{MV} = \operatorname{sign}\left(\sum_{i=1}^{3} \Omega_i\right) \qquad (4)$$

$$W_{MV} = \operatorname{sign}\left(\sum_{i=1}^{3} b_i \Omega_i\right) \qquad (5)$$

where b_i ∝ log(p_i/(1 − p_i)), and p_i is the global accuracy of each component classifier, and

$$H_{MV} = \begin{cases} 1, & \sum_{i=1}^{3} \Omega_i > 0 \ \text{or}\ \Omega_1 = 1 \ \text{or}\ \Omega_2 = 1 \\ -1, & \sum_{i=1}^{3} \Omega_i < 0 \ \text{or}\ \Omega_3 = -1. \end{cases} \qquad (6)$$
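A sketch of the three rules for the ordered ensemble (Ω_1, Ω_2, Ω_3) = (LM, LS, HS); the weight choice b_i = log(p_i/(1 − p_i)) and the priority given to the pedestrian branch of (6) when its two conditions overlap are our assumptions:

```python
import math

def majority_votes(omega, accuracies):
    """Eqs. (4)-(6) for an ordered three-classifier ensemble (LM, LS, HS),
    each label in {-1, +1}; accuracies holds the global accuracies p_i."""
    o1, o2, o3 = omega
    total = sum(omega)
    s_mv = 1 if total > 0 else -1                          # eq. (4): sum rule
    b = [math.log(p / (1.0 - p)) for p in accuracies]
    w_mv = 1 if sum(bi * oi for bi, oi in zip(b, omega)) > 0 else -1  # eq. (5)
    h_mv = 1 if (total > 0 or o1 == 1 or o2 == 1) else -1  # eq. (6): heuristic
    return s_mv, w_mv, h_mv
```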
2) FI: Implementation of the FI fusion methods follows these definitions: Let X be an input vector with N_c elements, where N_c is the number of classifiers, and let P(X) be the power set of X. Since X must be in the interval [0, 1], the SVM and MLP scores (s_c) are scaled to the closed unit interval (henceforth referred to as scaled scores) by the following logistic link function (LLF):

$$P(s_c) = \frac{1}{1 + e^{-s_c}} \qquad (7)$$

where s_c ∈ [0, ∞) is the confidence score provided by each classifier.
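The LLF of (7) in code (one line; the larger the raw confidence score, the closer the scaled score gets to 1):

```python
import math

def scaled_score(s_c):
    """Logistic link function of eq. (7), mapping a raw confidence score
    to the closed unit interval required by the fuzzy measures."""
    return 1.0 / (1.0 + math.exp(-s_c))
```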
A fuzzy measure g : P(X) → [0, 1] represents the individual importance of each classifier V_i, satisfying the following properties:
1) g(∅) = 0 and g(X) = 1 (boundary conditions);
2) g(A) ≤ g(B) if A ⊆ B, for any subsets A, B ∈ P(X) (monotonicity).
g is called a λ-fuzzy measure if, for any subsets A, B ∈ P(X) with A ∩ B = ∅,

$$g(A \cup B) = g(A) + g(B) + \lambda g(A) g(B) \qquad (8)$$

where λ ≥ −1 denotes the degree of interrelation between subsets A and B, according to

$$\lambda + 1 = \prod_{i=1}^{N_c} (1 + \lambda g_i) \qquad (9)$$

where g_i is a fuzzy measure used to express the decision support for each individual classifier.
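Given the singleton densities g_i, λ can be found from (9) by a simple root search, after which (8) folds the densities into any subset measure. This is a sketch with our own function names (bisection over the positive branch, which applies when Σ g_i < 1, as with the densities used later in Section V-E):

```python
def solve_lambda(g, hi=50.0, iters=100):
    """Solve eq. (9), lambda + 1 = prod_i(1 + lambda * g_i), for its nonzero
    root by bisection; the root is positive when sum(g) < 1."""
    def resid(lam):
        p = 1.0
        for gi in g:
            p *= 1.0 + lam * gi
        return p - (lam + 1.0)
    lo = 1e-9
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if resid(lo) * resid(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def lambda_measure(g, lam, subset):
    """Fold eq. (8) over a subset of classifier indices to obtain g(A)."""
    val = 0.0
    for i in subset:
        val = val + g[i] + lam * val * g[i]
    return val
```

By construction of λ, folding all N_c densities returns g(X) = 1, which is a convenient sanity check.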
Let h : X → [0, 1] be a membership function, which monotonically decreases with respect to each element of X (if this does not hold, X must be re-sorted), and let H = {h_i, i = 1, …, N_c}. Then, the Sugeno FI is defined as follows:

$$\int h(x) \circ g = \max_{E \subseteq X} \left[ \min_{x \in E} \left( h(x), g(E) \right) \right]. \qquad (10)$$

The Choquet FI differs in the way that it is computed, i.e.,

$$\int h(x) \circ g = h_1(x) + \sum_{i=2}^{N_c} \left[ h_{i-1}(x) - h_i(x) \right] g_{i-1}. \qquad (11)$$
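For sorted, decreasing memberships h over the nested subsets A_i = {1, …, i}, the two integrals can be sketched as below. For Choquet, we use the standard difference form C = Σ_i (h_i − h_{i+1}) g(A_i) with h_{N_c+1} = 0, which is the usual way the integral is evaluated (the paper's (11) uses its own indexing convention); g_of is any subset-measure lookup, e.g., built from the λ-measure of (8) and (9):

```python
def sugeno_integral(h, g_of):
    """Sugeno FI, eq. (10), for h sorted in decreasing order: the max over
    the nested subsets A_i of min(h_i, g(A_i))."""
    return max(min(h[i], g_of(tuple(range(i + 1)))) for i in range(len(h)))

def choquet_integral(h, g_of):
    """Choquet FI in the standard difference form (h decreasing,
    h_{n+1} = 0): C = sum_i (h_i - h_{i+1}) * g(A_i)."""
    n = len(h)
    total = 0.0
    for i in range(n):
        h_next = h[i + 1] if i + 1 < n else 0.0
        total += (h[i] - h_next) * g_of(tuple(range(i + 1)))
    return total
```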
V. EXPERIMENTAL ANALYSIS
In this section, an experimental analysis is described to highlight the performance of the proposed method. It consists of two experiments over cropwise data sets (all images are 18 × 36 pixels wide) and a video sequence (320 × 240 frame set). Moreover, the proposed ensemble was also compared with two other methods over DC data sets. The classification performance is analyzed by using receiver operating characteristic (ROC) curves.
A. Methodology and Data Sets
Here, we evaluate the performance of the proposed ensemble over DC data sets, also applying artificial lighting transformations (see Section V-D for more details). The objective is to systematically study the behavior of the ensemble method.
Table II depicts the characteristics of each data set. The DC data sets contain pedestrian images in a variety of poses, scene illuminations, and contrasts. Some samples of these data sets are depicted in Fig. 4. For experimental evaluation, we have followed a four-step methodology: 1) parameter selection; 2) performance evaluation over DC data sets; 3) sensitivity analysis with respect to lighting transformations; and, finally,
Fig. 4. Samples. Some samples of (a) and (b) the DC and (c) the transformed DC data sets. (a) Pedestrian examples. (b) Nonpedestrian examples. (c) Pedestrian images after lighting transformations. From left to right: original image, shadowy effect, and sunny effect. (d) Edge information of the images in (c).
4) a window-wise analysis over a gathered data set in an urban
scenario.
B. Parameter Selection
Only LRF parameters have been found since the HOG parameters, which are used as reference in this paper, are provided in [21]. To find the former, the number of C and S feature maps and the number of hidden neurons in the MLP have been varied in a cross-validation procedure. The size of the kernels has been kept at the finest possible resolution as in [17]: C_1 = 5 × 5, S_1 = 4 × 4, C_2 = 2 × 2, and S_2 = 3 × 3, as illustrated in Fig. 2. As shown in Fig. 5, the best LRF parameters were found as F_{C_1} = 4, F_{S_1} = 4, F_{C_2} = 14, F_{S_2} = 14, and HN = 25, where F_{C_n} and F_{S_n} represent the numbers of feature maps in the C and S layers, respectively, and HN is the number of hidden neurons in the MLP layer. With these parameters, the LRF network reached a 94% hit rate (HR).
C. Evaluation of the Component Classifiers
All curves plotted over DC data sets were obtained by training the classifiers with the three training data sets, averaging the results of the classification models over the two validation data sets, following the same methodology as in [4]. Fig. 6(a) and (b) shows the ROCs of the individual classifiers. For LM, the best parameters shown in Fig. 5 have been used. For LS and HS, three types of SVM kernels, namely, 1) linear, 2) third-degree polynomial (poly3), and 3) radial basis function (RBF), have been evaluated: The poly3 kernel showed the best performance for both SVM classifiers over LRF and HOG; the best points in the ROCs were chosen at a 4% false alarm rate (FAR), where
Fig. 5. LRF parameters. Parameters have been found by varying the number of C and S feature maps and the number of hidden neurons. The best achieved parameters were F_{C_1} = 4, F_{S_1} = 4, F_{C_2} = 14, F_{S_2} = 14, and HN = 25, with the best HR equal to 94%.
LS-poly3 is equal to 91%, LM is equal to 88%, and HS-poly3
is equal to 92%.
D. Sensitivity Analysis
Two artificial lighting transformations have been applied to modify the validation data sets with the aim of creating shadowy and sunny (overexposure to sunlight) effects. The shadowy effect is obtained by applying I_shad(x, y) = I(x, y) − (2α/(w − 1))y + α, where I(x, y) is the original image of width w, and α is a pixel constant equal to 80 in our experiments; the sunny effect is obtained by applying a multiscale retinex (MSR) algorithm [25]. MSR was originally designed for image enhancement by estimating scene reflectance from the ratios of scene intensities. In our experiments, MSR parameters were taken to produce brighter images, simulating a sunlit effect on the objects in the scene. Fig. 4(c) depicts some image samples of the artificial transformations applied to the DC validation data sets. Fig. 4(d) shows, in turn, the effect of the lighting transformations on the edge information of each image; it can be observed that the shadowy effect makes the image lose some edge information since this transformation obscures part of the image. On the other hand, the sunny effect depends on the illumination of the image: If the image is underilluminated (original image in Fig. 4(c), left), the dark areas raise the contour of the image, causing more edges to appear, whereas application to a brighter image (original image in Fig. 4(c), right) leads the image to have brighter areas and, consequently, to a loss of some edge information. This side effect of the lighting transformations mainly influences the individual classifiers since they are based on edge detection, causing a decline in their performance. This fact also produces an increase in the diversity of the classifiers (as the agreement among them decreases), and the fusion method can explore this synergism to raise the overall performance of the system.
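The shadowy transformation I_shad(x, y) = I(x, y) − (2α/(w − 1))y + α (our reading of the formula, with α = 80) can be sketched as follows; clipping to the 8-bit range is our addition:

```python
import numpy as np

def shadowy(img, alpha=80.0):
    """Linear shading ramp across an image of width w: the leftmost column
    is brightened by +alpha and the rightmost darkened by -alpha."""
    h, w = img.shape
    ramp = alpha - (2.0 * alpha / (w - 1)) * np.arange(w)  # +alpha .. -alpha
    out = img.astype(float) + ramp[np.newaxis, :]
    return np.clip(out, 0, 255)
```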
Fig. 6. ROCs of the component classifiers. Curves show the performance of the individual classifiers over the DC and transformed DC data sets. In (a), the performance of the MLP, linear SVM, RBF SVM, and poly3 SVM over LRF features is depicted; in (b), SVMs with linear, RBF, and poly3 kernels are plotted over HOG features; (c) and (d) illustrate the best individual classifier performance over the transformed DC data sets. All ROC curves are averaged over the DC training and validation data sets for comparison purposes.
TABLE III
COMPARATIVE RESULTS AFTER LIGHTING TRANSFORMATIONS
The ROC curves of the individual classifiers, after the lighting
transformations, can be seen in Fig. 6(c) and (d), where the
points highlighted in the boxes correspond to the same scores
of the classifiers at 4% FAR in the ROC curves of Fig. 6(a)
and (b).
Table III summarizes the results of the best individual classifiers
achieved in the ROC curves of Fig. 6. Note that whereas LS
and LM decreased by approximately 3 and 2 percentage points
of HR, respectively, HS decreased by 2 percentage points under the shadowy
effect; under the sunny effect, LS and LM decreased by 1,
and HS by 4, percentage points of HR.
These individual classifier behaviors demonstrate that they
were affected in opposite ways by the two
lighting transformations, allowing the fusion method to balance
the overall performance.
E. Evaluation of the Fusion Methods
In this section, the evaluation of the fusion methods is discussed.
Since N_c = 3, eight fuzzy measures must be defined.
The initial values for g(P(s_c1)), g(P(s_c2)), and g(P(s_c3))
have been chosen to be 0.15, 0.24, and 0.30 for LM, LS,
and HS, respectively, which come from the best points in the
ROC curves of the scaled scores of the individual classifiers,
which are given by (7). The fuzzy measures of the other
aggregated subsets g({P(s_c1), P(s_c2)}), g({P(s_c1), P(s_c3)}),
and g({P(s_c2), P(s_c3)}) can now be calculated from (8), using

1 + λ = (1 + 0.15λ)(1 + 0.24λ)(1 + 0.30λ).

After finding the fuzzy measures, a threshold was chosen to be the
value of the minimum fuzzy measure of a set of two
classifiers. In other words, the threshold for the FIs was
g({P(s_c1), P(s_c2)}) ≈ 0.47. This value indicates that one
should rely on a fuzzy output only if it is greater than the fuzzy
measure of the set consisting of LM and LS.
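The value of λ above can be recovered numerically. The sketch below, with helper names of our own choosing, solves 1 + λ = Π(1 + λ g_i) by bisection (the nontrivial positive root, which exists when the densities sum to less than 1) and then builds the measures of the aggregated subsets; as a sanity check, the measure of the full set of three classifiers must come out as 1:

```python
def solve_lambda(densities, hi=100.0, iters=200):
    """Sugeno lambda-measure: find lam > 0 with
    1 + lam = prod_i (1 + lam * g_i), valid when sum(g_i) < 1."""
    def f(lam):
        p = 1.0
        for g in densities:
            p *= 1.0 + lam * g
        return p - (1.0 + lam)
    lo = 1e-9                 # lam = 0 is a trivial root; start just above it
    for _ in range(iters):    # f < 0 below the root, f > 0 above it
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def measure(subset, lam):
    """g(A u B) = g(A) + g(B) + lam * g(A) * g(B), applied iteratively."""
    total = 0.0
    for g in subset:
        total = total + g + lam * total * g
    return total

densities = {"LM": 0.15, "LS": 0.24, "HS": 0.30}
lam = solve_lambda(densities.values())
g_lm_ls = measure([densities["LM"], densities["LS"]], lam)
```

With the densities quoted in the text, λ comes out near 1.8 and g({LM, LS}) lands in the neighborhood of the reported 0.47 threshold; small differences are attributable to rounding of the published densities.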
Table IV presents the results of the fusion methods over the DC
data sets. It is worth noting that the Sugeno FI and the heuristic MV
perform similarly, although the Sugeno FI
provides a more comprehensive framework and theoretical basis
for classifier fusion, since one can choose the appropriate
fuzzy measures.
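For reference, the standard Sugeno integral reduces to a max-min scan over the classifiers sorted by score, growing the measured subset one source at a time. The function below is a minimal sketch under that textbook formulation (the names are ours, and whether it matches the authors' exact implementation details, such as tie-breaking, is an assumption); it takes λ as an input:

```python
def sugeno_integral(scores, densities, lam):
    """Sugeno fuzzy integral.
    scores: dict name -> scaled score in [0, 1];
    densities: dict name -> fuzzy density g({name});
    lam: the Sugeno lambda for these densities.
    Returns max_i min(h_(i), g(A_i)) over sources sorted by score."""
    order = sorted(scores, key=scores.get, reverse=True)
    g, best = 0.0, 0.0
    for name in order:
        d = densities[name]
        g = g + d + lam * g * d          # grow A_i by one more source
        best = max(best, min(scores[name], g))
    return best
```

The fused output is then compared against the acceptance threshold (the fuzzy measure of the weakest two-classifier subset) to decide whether to trust it.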
TABLE IV
RESULTS OF THE FUSION METHODS OVER DC DATA SETS
TABLE V
RESULT OF THE FUSION METHODS OVER TRANSFORMED DC DATA SETS
TABLE VI
SUMMARY OF THE RESULTS OVER DC AND TRANSFORMED
DC DATA SETS [HR/FAR (IN PERCENT)]
The effect of the lighting transformations on the ensemble is
summarized in Table V. Interestingly, despite the decrease in the
performance of the individual classifiers caused by the lighting
transformations, the classifier fusion compensated for those
losses by combining different characteristics of the component
classifiers.
Table VI summarizes the global performance results, considering
the best fusion methods compared with the component
classifiers over the DC and transformed DC data sets. Over the DC
data sets, the best fusion methods (heuristic MV and Sugeno FI)
increased HR by approximately 8.5, 5.5, and 4.5 percentage points
over the best LM, LS, and HS, respectively, whereas
FAR decreased by 2.1 and 1.6 percentage points
for heuristic MV and Sugeno FI, respectively. The last line of
the table presents the average performance over all data sets.
The heuristic MV and Sugeno FI methods present similar average
performances, with a difference of 0.2 percentage points of HR
and 0.3 of FAR between them. It is worth noting an average
gain of approximately 4.7 percentage points of the heuristic MV
and Sugeno FI fusion methods over HS (the best individual classifier).
Furthermore, comparing fusions after the lighting transformations,
in the worst case (Sugeno FI and DC shadowy),
the performance of the ensemble decreased by approximately
3.5 percentage points of HR, while FAR increased by 0.9 percentage
points with respect to the classifier fusion without the
lighting transformations.
According to these results, the final architecture of our
proposed method is illustrated in Fig. 7.
F. Comparison With Other Methods Over DC Data Sets
Table VII summarizes the HR/FAR of our component and
ensemble classifiers, as well as the results of two other methods:
SVM on LRF (bootstrapped) [4] and Sum MV on GLB+LEM
(nonbootstrapped) [9]. As can be noticed, even the individual
classifiers LS and HS outperform those two methods in both
HR and FAR. In comparison with the methods in [4] and
[9], the heuristic MV (and, approximately, the Sugeno FI) increased
HR by 6.5 percentage points, whereas FAR decreased by 3.1 percentage
points. It is important to observe the following:
1) the final result in [4] is achieved after an increment in the
number of training images (bootstrapping), while neither in our
work nor in [9] was this additional step applied, and 2) as mentioned
before, the generation of the LRF features in [4] involved
the training of an MLP applied to Haar-like features and PCAs,
while in our work, the LRFs obtained from a CNN led to more
invariance to illumination changes and image shifts.
G. Evaluation on the Nature-inspired Smart Information Systems
(NiSIS) Competition
The proposed ensemble method with the heuristic MV won
the most accurate model award at the 2007 NiSIS international
competition out of 16 participants (http://www.nisis.risk-
technologies.com). The model was evaluated on a subset^1 of the
DC data sets, containing 1225 training images, 2450 labeled
images for validation of the algorithms, and 6125 unlabeled
images for measuring the performance in the competition.
Images in the testing and validation data sets were artificially
occluded. In the pedestrian classification challenge, our
proposed method achieved a classification accuracy of 95.97%.
H. Evaluation on Video Sequence
In addition to the study and conclusions presented over the DC
and transformed DC cropwise data sets, the best ensemble,
which is composed of the three classifiers and the Sugeno FI fusion,
has been tested on a video sequence gathered at the Engineering
campus of Coimbra University. This video sequence has
364 frames (640 × 480 pixel frames resized to 320 × 240) with
430 annotated pedestrians (ground truth) in different poses and
interactions. To locate and recognize the pedestrians in each
frame, a sliding-window technique has been applied with the
goal of analyzing the tradeoff between recognition performance
and speed. After finding the detected windows, they were clustered
by a nonmaxima suppression algorithm. Only pedestrians
at a distance up to 25 m were considered for annotation, as
constrained by a laser scanner used to gather the data set.
pedestrian was successfully matched if an overlap criterion was
met, i.e.,
A
Overlap
=
A
gt
A
det
A
gt
A
det
0.4 (12)
where A
gt
corresponds to the area of the ground truth bound-
ing box, and A
det
is the area of the detected bounding box.
If A
Overlap
is greater or equal than 40%, then the detected
bounding box is considered a hit.
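Criterion (12) is the familiar intersection-over-union test for axis-aligned boxes. A straightforward implementation, with coordinates given as (x1, y1, x2, y2) and helper names of our own, is:

```python
def area(box):
    """Area of an axis-aligned box (x1, y1, x2, y2)."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])

def overlap(box_a, box_b):
    """Eq. (12): area of intersection over area of union."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = area(box_a) + area(box_b) - inter
    return inter / union if union > 0 else 0.0

def is_hit(gt_box, det_box, threshold=0.4):
    """A detection counts as a hit when the overlap reaches 40%."""
    return overlap(gt_box, det_box) >= threshold
```

For example, a detection shifted by half the box width gives an overlap of 1/3 and is rejected at the 0.4 threshold, while a shift of one fifth gives 2/3 and is accepted.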
Since an 18 × 36 pixel window with three strong classifiers is
not a suitable candidate for a proper recognition speed, we have
decided to double and triple the size of the cropped images,
considering also a window size of 64 × 128 pixels [3], [11],
^1 These data sets are provided at http://www.isr.uc.pt/~lreboucas as they
were used, since they are no longer publicly available on the NiSIS website.
Fig. 7. Final ensemble architecture. The component classifiers provide scores with values in [0, +∞), which are scaled to [0, 1] by an LLF. Finally, the Sugeno FI
fuses the classifier results, giving another confidence score in [0, 1], which is thresholded by the minimum value of the combination of two classifiers.
TABLE VII
COMPARISON OVER DC DATA SETS
making the stride of the searching window equal to one eighth
for the 64 × 128 window size and one ninth for the rest, again
preventing falling off boundaries. Since doubling and tripling
the DC images would introduce too much distortion, the choice
was to use another data set to guarantee more stability in the
training stage. This way, the INRIA person data set [3] was
chosen (2416 pedestrians and 15 000 nonpedestrians), resizing the
64 × 128 pixel images to 36 × 72 and 54 × 108 pixels. This
data set has the advantage of presenting a fixed height for the
pedestrians in the image, with the pedestrians centered within a border
of 16 pixels on each side. These characteristics provide a more
stable search in a sliding-window procedure [3], [11].
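The window placement described above can be sketched as a simple generator. The stride fractions (one eighth for the 64 × 128 window, one ninth otherwise) and the 320 × 240 frame size follow the text; the function names are illustrative:

```python
def sliding_windows(img_w, img_h, win_w, win_h, frac):
    """Yield top-left (x, y) corners with a stride of 1/frac of the
    window size, never letting the window fall off the image."""
    step_x, step_y = max(1, win_w // frac), max(1, win_h // frac)
    for y in range(0, img_h - win_h + 1, step_y):
        for x in range(0, img_w - win_w + 1, step_x):
            yield x, y

# strides: one eighth for the 64 x 128 window, one ninth for the others
scales = [((36, 72), 9), ((54, 108), 9), ((64, 128), 8)]
counts = {wh: sum(1 for _ in sliding_windows(320, 240, *wh, frac))
          for wh, frac in scales}
```

Each yielded window would then be fed to the three feature extractor-classifiers before the nonmaxima suppression step.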
Fig. 8(a)-(c) depicts the ROC curves of the video sequence
evaluation. For the three feature extractor-classifiers, we found
that a 54 × 108 pixel window achieved the best performance.
Adjusting the thresholds of the classifiers to lie at 11% FAR, HRs of
84%, 84%, and 87% were obtained by LM, LS, and HS, respectively;
36 × 72 pixel windows showed generally more false positives
per frame (considering all frames in the video sequence).
For HS, the 54 × 108 and 64 × 128 pixel windows presented
similar performance, with a small difference of 2 percentage
points. The best parameters found in the cropwise analysis have
been scaled proportionally as the windows increased. In this
sense, the CNN had its kernel sizes doubled or tripled according to
the increase of the window (for the 64 × 128 pixel window, the
values were inherited from the 54 × 108 pixel window). The
number of C1 and S1 feature maps has not changed, although
the size of the kernels in each layer has been reduced to half to
keep the number of free parameters in the network under control.
Concerning HOG parameters, for a 36 × 72 pixel window,
a descriptor of 12 × 12 pixels with a 2 × 2 cell block was
applied, while for the 54 × 108 and 64 × 128 pixel windows, the
best parameters provided in [21] were used, i.e., a 16 × 16 pixel
descriptor with a 2 × 2 cell block. In Fig. 9, two subsets of
sequences are shown. The annotated objects are represented by
the dark bounding boxes, while the detected (light) bounding
boxes represent the result of the recognition by the proposed
ensemble. Frame 189 shows an example of a false alarm; no
missed detection occurred in those frame sequences.
1) Comparison With HFI: HFI was introduced in [16] with
the aim of fusing two different classifiers in a fuzzy way.
Although it is possible to incorporate more classifiers, it can be
intractable to manage fuzzy rules for more than two classifiers.
As the input fuzzy variables (universe of discourse) are
the classifier-scaled scores and the overlapping areas of the two
resulting windows, the only possible way to use this method
is window-wise, particularly because of the use of
overlapping areas as fuzzy input variables. HFI is applied to
independent classifier executions, i.e., there is no need to define
the same window size or sliding-window parameters. This way,
HFI has been used with the following two different pairs of
classifiers: 1) Haar wavelet/AdaBoost + HS (originally used in
[16]) and 2) LS + HS. It is worth noting that, as the training data
set has been changed and the classifiers have been retrained, the
fuzzy measures of the Sugeno FI had to be recalculated. The fuzzy
measures obtained for LM, LS, and HS were 0.18, 0.21, and
0.27, respectively, and the threshold of the fusion output was
0.46. Considering the same individual classifier thresholds to
represent their ROCs [see Fig. 8(a)-(c)], it can be noticed
in Fig. 8(d) that the proposed ensemble method outperforms
the other two HFI-based methods, with 94% HR, reducing
the FAR by 7 percentage points while increasing the HR by
7 percentage points with regard to the best individual classifier
(HS poly3).
Fig. 8. ROCs of the video sequence evaluation. (a)-(c) Evaluation of different sliding-window sizes, considering also strides of one eighth for the 64 × 128 pixel
window and one ninth for the others. In (d), ROCs of the evaluation among the following three types of ensemble: 1) the proposed ensemble using Sugeno FI as the
fusion method; 2) HFI over Haar wavelet/AdaBoost + HS; and 3) HFI over LS + HS.
Fig. 9. Examples of frame sequences. Annotations (ground truth) are the dark bounding boxes, while the light bounding boxes are the objects detected by the
ensemble classifier. In frame 189, there is an example of a false alarm.
VI. IMPLEMENTATION ISSUES
With the advent of multiprocessing machines, parallelization
has become mandatory to achieve a balance between
classification performance and computational cost.
The more parallelizable a feature extractor or classifier is,
the more one can speed it up. In this sense, we have started
research to parallelize our proposed ensemble method on a cluster
of Sony PlayStation 3 (PS3) consoles. This platform brings special
advantages due to its low cost while providing more computational
power than a modern central processing unit (even with
multiple cores). In preliminary studies, an HOG/linear-SVM
detector has been implemented on a PS3, making use of some
parallelization techniques, but still without vectorization [26].
Concerning the sliding-window version of the proposed ensemble,
the processing time was 4 s to classify one frame
in a C++ implementation under Linux on a 1.8-GHz dual-core
machine (using only one core) with no programming
optimization. The heaviest component is the HOG extractor,
Authorized licensed use limited to: Universidade de Coimbra. Downloaded on July 13, 2009 at 04:50 from IEEE Xplore. Restrictions apply.
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
OLIVEIRA et al.: ON EXPLORATION OF CLASSIFIER ENSEMBLE SYNERGISM IN PEDESTRIAN DETECTION 11
taking 50% of the time. However, the HOG implementation can
be optimized by an integral histogram [27], providing linear-time
extraction. This way, it is expected that this processing
time could be reduced by a large factor on a PS3 using sliding-window
search, according to, e.g., [28] and [29].
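The integral-histogram idea of [27] keeps one cumulative table per orientation bin, so the histogram of any rectangular cell costs four lookups per bin regardless of cell size. The sketch below is a simplified illustration over per-pixel bin indices (a real HOG implementation also weights each vote by gradient magnitude); the function names are ours:

```python
import numpy as np

def integral_histogram(bins_img, n_bins):
    """bins_img: per-pixel orientation-bin index (H x W int array).
    Returns an (H+1, W+1, n_bins) cumulative table; the histogram of
    any rectangle is then four lookups per bin (constant time)."""
    h, w = bins_img.shape
    ih = np.zeros((h + 1, w + 1, n_bins))
    one_hot = np.eye(n_bins)[bins_img]        # H x W x n_bins indicator
    ih[1:, 1:] = one_hot.cumsum(0).cumsum(1)
    return ih

def region_hist(ih, x1, y1, x2, y2):
    """Histogram of the half-open box rows [y1:y2], cols [x1:x2]."""
    return ih[y2, x2] - ih[y1, x2] - ih[y2, x1] + ih[y1, x1]
```

Building the table is linear in the number of pixels, which is the source of the expected speedup for dense sliding-window HOG extraction.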
VII. CONCLUSION
The main goal of an image-classification system is to recognize
the target object successfully. By building single feature
extractor-classifiers, one often faces problems, for instance,
with lighting changes, since finding the best tradeoff between
increasing the training data and classification performance is
not a simple task. By exploring ensemble methods, one is able
to create synergistic approaches that compensate for the individual
inability of the component classifiers under certain circumstances.
In this paper, we contribute a new ensemble
method, assessing its performance through an in-depth analysis.
Considering two types of lighting transformation (shadowy and
sunny effects), which were artificially applied to the validation
images and are prone to happen in outdoor environments,
we experimentally demonstrated the high performance of the
proposed classifier ensemble. Although the heuristic MV and
Sugeno FI both reached the highest performance in the experiments,
we found that the Sugeno FI provides
more benefits as a generic framework. The best average
performance of our proposed method (Sugeno FI) was 94.6%
HR and 3% FAR on the cropwise data sets (considering also
the application of the lighting transformations) and 94% HR
and 4% FAR on the video sequence. Our research direction
now is to make use of global features and classifiers to improve
classification performance under more difficult scenarios, taking
advantage of contextual information in the scene.
REFERENCES
[1] L. Kuncheva, Combining Pattern Classifiers: Methods and Algorithms.
New York: Wiley-Interscience, 2004.
[2] M. Aksela and J. Laaksonen, "Using diversity of errors for selecting
members of a committee classifier," Pattern Recognit., vol. 39, no. 4,
pp. 608-623, Apr. 2006.
[3] N. Dalal and B. Triggs, "Histograms of oriented gradients for human
detection," in Proc. IEEE Int. Conf. Comput. Vis. Pattern Recog., 2005,
pp. 886-893.
[4] S. Munder and M. Gavrila, "An experimental study on pedestrian
classification," IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 11,
pp. 1863-1868, Nov. 2006.
[5] Y. LeCun and Y. Bengio, "Convolutional networks for images, speech and
time series," in The Handbook of Brain Theory and Neural Networks.
Cambridge, MA: MIT Press, 1995.
[6] S. Haykin, Neural Networks and Learning Machines. Englewood Cliffs,
NJ: Prentice-Hall, 2008.
[7] G. Choquet, "Theory of capacities," Ann. Inst. Fourier, vol. 5, pp. 131-195,
1954.
[8] M. Sugeno, "Theory of fuzzy integrals and its applications," Ph.D.
dissertation, Tokyo Inst. Technol., Tokyo, Japan, 1974.
[9] L. Nanni and A. Lumini, "Ensemble of multiple pedestrian representations,"
IEEE Trans. Intell. Transp. Syst., vol. 9, no. 2, pp. 365-369,
Jun. 2008.
[10] T. Gandhi and M. Trivedi, "Pedestrian protection systems: Issues,
survey, and challenges," IEEE Trans. Intell. Transp. Syst., vol. 8, no. 3,
pp. 413-430, Sep. 2007.
[11] C. Papageorgiou and T. Poggio, "A trainable system for object detection,"
Int. J. Comput. Vis., vol. 38, no. 1, pp. 15-33, Jun. 2000.
[12] D. Lowe, "Object recognition from local scale-invariant features," in
Proc. IEEE Int. Conf. Comput. Vis., 1999, pp. 1150-1157.
[13] M. Szarvas, U. Sakai, and J. Ogata, "Real-time pedestrian detection using
LIDAR and convolutional neural networks," in Proc. IEEE Int. Symp.
Intell. Vehicles, 2006, pp. 213-218.
[14] P. Viola and M. Jones, "Rapid object detection using a boosted cascade,"
in Proc. IEEE Int. Conf. Comput. Vis. Pattern Recog., 2001, pp. 511-518.
[15] L. Llorca, M. Sotelo, L. Bergasa, P. Toro, J. Nuevo, M. Ocana, and
M. Garrido, "Combination of feature extraction methods for SVM
pedestrian detection," IEEE Trans. Intell. Transp. Syst., vol. 8, no. 2,
pp. 292-307, Jun. 2006.
[16] L. Oliveira, G. Monteiro, P. Peixoto, and U. Nunes, "Towards a robust
vision-based obstacle perception with classifier fusion in cybercars," in
Computer Aided System Theory (Eurocast 2007), vol. 4739. New York:
Springer-Verlag, 2007, pp. 1089-1096.
[17] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based
learning applied to document recognition," Proc. IEEE, vol. 86, no. 11,
pp. 2278-2324, Nov. 1998.
[18] S. Lawrence, L. Giles, A. Tsoi, and A. Back, "Face recognition: A convolutional
neural network approach," IEEE Trans. Neural Netw.-Special
Issue Neural Netw. Pattern Recognit., vol. 8, no. 1, pp. 98-113, Jan. 1997.
[19] C. Garcia and M. Delakis, "Convolutional face finder: A neural
architecture for fast and robust face detection," IEEE Trans. Pattern Anal.
Mach. Intell., vol. 26, no. 11, pp. 1408-1423, Nov. 2004.
[20] D. De Ridder, R. Duin, M. Egmont-Petersen, L. van Vliet, and
P. Verbeek, "Nonlinear image processing using artificial neural networks,"
Adv. Imaging Electron Phys., vol. 126, pp. 352-450, 2003.
[21] N. Dalal, "Finding people in images and videos," Ph.D. dissertation,
Institut National Polytechnique de Grenoble, Grenoble, France, 2006.
[22] G. Zenobi and P. Cunningham, "Using diversity in preparing ensembles
of classifiers based on different feature subsets to minimize generalization
error," in Proc. Eur. Conf. Mach. Learning, vol. 2167. New York:
Springer-Verlag, 2001, pp. 576-587.
[23] L. Kuncheva, "That elusive diversity in classifier ensembles," in Pattern
Recognition and Image Analysis, vol. 2652. New York: Springer-Verlag,
2003, pp. 1126-1138.
[24] P. Melville and R. Mooney, "Constructing diverse classifier ensembles
using artificial training examples," in Proc. Int. Joint Conf. Artif. Intell.,
2003, pp. 505-510.
[25] Z. Rahman, D. Jobson, and G. Woodell, "A multiscale retinex for colour
rendition and dynamic range compression," in Proc. SPIE Int. Symp. Opt.
Sci., Eng., Instrum.-Appl. Digital Image Process. XIX, 1996, pp. 183-191.
[26] L. Oliveira, R. Britto, and U. Nunes, "On using cell broadband engine for
object detection in ITS," in Proc. IEEE IROS 2nd Workshop Planning,
Perception Navigat. Intell. Vehicles, 2008, pp. 54-58.
[27] Q. Zhu, S. Avidan, M.-C. Yeh, and K.-T. Cheng, "Fast human detection
using a cascade of histograms of oriented gradients," in Proc. IEEE Conf.
Comput. Vis. Pattern Recog., 2006, pp. 1491-1498.
[28] H. Sugano and R. Miyamoto, "A real-time object recognition system on
cell broadband engine," in Advances in Image and Video Technology,
vol. 4872. New York: Springer-Verlag, 2007, pp. 932-943.
[29] A. Sarje and S. Aluru, "Parallel biological sequence alignments on the cell
broadband engine," in Proc. IEEE Int. Parallel Distrib. Process. Symp.,
2008, pp. 1-11.
Luciano Oliveira (S'03) received the B.Sc. and
M.Sc. degrees in mechatronics from the Federal
University of Bahia, Salvador, Brazil, in 1997 and
2005, respectively. He is currently working toward
the Ph.D. degree with the Institute of Systems
and Robotics, Department of Computer and Electrical
Engineering, University of Coimbra, Coimbra,
Portugal.
Since 2006, he has been participating in projects
such as Perception Methods for an Intelligent
Transportation System, Multi-Target Detection and
Tracking in Semi-structured Outdoor Environments, and CyberC3, developing
object detection and localization systems based on vision and lidar.
Mr. Oliveira received First Place at the Nature-inspired Smart Information
Systems Competition, Problem Task: Analysis and Classification of the DaimlerChrysler
Automotive Data Set Images, in 2007. In 2008, he received Third
Place at the Intel/GV Entrepreneurship and Venture Capital Competition with
the TruckSafe project for intelligent sensing in carrier trucking.
Authorized licensed use limited to: Universidade de Coimbra. Downloaded on July 13, 2009 at 04:50 from IEEE Xplore. Restrictions apply.
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
12 IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS
Urbano Nunes (S'90-M'95-SM'09) received the
Lic. and Ph.D. degrees in electrical engineering from
the University of Coimbra, Coimbra, Portugal, in
1983 and 1995, respectively.
He is currently an Associate Professor with the
Faculty of Sciences and Technology and the Head
of the Department of Computer and Electrical Engineering,
University of Coimbra. He is also a Researcher
with the Institute for Systems and Robotics,
University of Coimbra, where he is the Coordinator
of the Automation and Mobile Robotics Group
and the Coordinator of the Mechatronics Laboratory. He has been involved
with/responsible for several funded projects at both the national and international
levels in the areas of mobile robotics and intelligent vehicles.
Dr. Nunes is an Associate Editor for the IEEE TRANSACTIONS ON
INTELLIGENT TRANSPORTATION SYSTEMS, an Associate Editor for the IEEE
Intelligent Transportation Systems Magazine, and a Cochair of the Technical
Committee (TC) on Autonomous Ground Vehicles and Intelligent Transportation
Systems (ITS) of the IEEE Robotics and Automation Society (RAS). He
received the IEEE ITS Society Outstanding Service Award in 2006, the IEEE
RAS Most Active TC Award as a Cochair of the RAS TC on ITS in 2006, and
First Place at the 2007 Nature-inspired Smart Information Systems Competition,
Problem Task: Analysis and Classification of the DaimlerChrysler
Automotive Data Set Images, as a coauthor.
Paulo Peixoto (M'09) received the B.Sc. degree in
electrical engineering, the M.Sc. degree in systems
and automation, and the Ph.D. degree in electrical
engineering from the University of Coimbra,
Coimbra, Portugal, in 1989, 1995, and 2003,
respectively.
He is currently an Assistant Professor with the
Department of Electrical and Computer Engineering,
University of Coimbra, and a Researcher with the
Institute of Systems and Robotics, University of
Coimbra. He has been involved with/responsible for
several funded projects at both the national and international levels in the areas
of computer vision, visual surveillance, and intelligent vehicles. His research
interests include computer vision applied to intelligent transportation systems,
pattern recognition, and human-computer interaction.
Dr. Peixoto received First Place at the Nature-inspired Smart Information
Systems Competition, Problem Task: Analysis and Classification of the
DaimlerChrysler Automotive Data Set Images, as a coauthor in 2007.