$$y_n^{(i,j)} = b_n + \sum_{l=1}^{M}\sum_{s=1}^{K}\sum_{t=1}^{K} W_{n,l,s,t}\, X_l\big(\sigma_w(i-1)+s,\ \sigma_h(j-1)+t\big) \qquad (1)$$

$$y_n^{(i,j)} = b_n + k\sum_{s=1}^{K}\sum_{t=1}^{K} X\big(\sigma_w(i-1)+s,\ \sigma_h(j-1)+t\big) \qquad (2)$$
with n = 1, . . . , N, where N is the number of feature map outputs, i and j are region coordinates of a feature map, X is the input vector, K denotes the size of a square kernel (usually of the same size in the width and height directions), M is the number of input images or feature maps, σ_w and σ_h are the strides (in pixels) between each application of the kernel in the width and height directions, respectively, W_{n,l,s,t} represents the weight vectors of each output neuron, and k is a constant of subsampling.
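As an illustration of Eqs. (1) and (2), the following plain-Python sketch computes one convolution map and one subsampling map; it is not the authors' implementation, and for brevity it assumes equal strides in both directions (a single `stride` plays the role of σ_w = σ_h):

```python
def conv_map(inputs, weights, b, K, stride):
    """Eq. (1): one output feature map y_n from M input maps X_l.
    inputs[l][r][c] are the input maps, weights[l][s][t] the kernel."""
    M = len(inputs)
    H, W = len(inputs[0]), len(inputs[0][0])
    out_h = (H - K) // stride + 1
    out_w = (W - K) // stride + 1
    y = [[b] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            for l in range(M):
                for s in range(K):
                    for t in range(K):
                        y[i][j] += weights[l][s][t] * \
                            inputs[l][stride * i + s][stride * j + t]
    return y

def subsample_map(x, b, k, K, stride):
    """Eq. (2): subsampling -- bias plus k times the K-by-K window sum."""
    H, W = len(x), len(x[0])
    out_h = (H - K) // stride + 1
    out_w = (W - K) // stride + 1
    return [[b + k * sum(x[stride * i + s][stride * j + t]
                         for s in range(K) for t in range(K))
             for j in range(out_w)]
            for i in range(out_h)]
```

With k = 1/K², the subsampling map reduces to average pooling of K × K windows.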
B. HOG
Lowe [12] used an HOG as a descriptor by computing it only over stable points, which are obtained after applying a pyramid of Gaussians. These stable points are referred to as invariant to affine transformations. Later, Dalal and Triggs [3] applied rectangular and circular log-polar types of HOG in a dense way, weighting pixels, cells, and blocks and normalizing the feature vector by its norm. These procedures provide a lexicon of features that are partially scale and lighting invariant. In our algorithm, we have followed the same strategy as in [21] to extract rectangular HOGs, first computing the edges by a simple kernel mask [−1, 0, 1] in the vertical and horizontal directions; next, the gradients of the edges are calculated in regions of 3 × 3 pixel cells and 2 × 2 cell blocks (block descriptors are then 6 × 6 pixels wide), with a block stride of 2 pixels, preventing falling off boundaries. For pedestrians, the histogram of the edges is calculated in a half circle and is composed of nine bins of 20° each.
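The two building blocks just described, the [−1, 0, 1] derivative mask and the nine-bin unsigned-orientation histogram over a half circle (20° per bin), can be sketched as follows; this is an illustrative reimplementation, not the code of [21]:

```python
import math

def gradients(img):
    """Central differences with the [-1, 0, 1] mask in x and y;
    returns per-pixel magnitude and unsigned angle in [0, 180)."""
    H, W = len(img), len(img[0])
    mag = [[0.0] * W for _ in range(H)]
    ang = [[0.0] * W for _ in range(H)]
    for y in range(1, H - 1):
        for x in range(1, W - 1):
            gx = img[y][x + 1] - img[y][x - 1]
            gy = img[y + 1][x] - img[y - 1][x]
            mag[y][x] = math.hypot(gx, gy)
            ang[y][x] = math.degrees(math.atan2(gy, gx)) % 180.0
    return mag, ang

def cell_histogram(mag, ang, y0, x0, cell=3, nbins=9):
    """Magnitude-weighted orientation histogram of one cell (20-deg bins)."""
    hist = [0.0] * nbins
    for y in range(y0, y0 + cell):
        for x in range(x0, x0 + cell):
            hist[int(ang[y][x] // (180.0 / nbins)) % nbins] += mag[y][x]
    return hist
```

Block descriptors would then concatenate the 2 × 2 neighboring cell histograms and normalize the resulting vector.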
$$E_{ec} = \frac{\sum_{i=1}^{K} \left( N_{i0}^{same} \right)^{i}}{N_{K1} + 1} \qquad (3)$$
where K is the number of classifiers in a set, N_{i0}^{same} denotes the count of errors made by a total of i classifiers to the same class, and N_{K1} denotes the number of testing samples for which all classifiers in the set are correct. The rationale of this diversity measure is, thus, to count the frequency of the same mistakes made by groups of classifiers in a set, scaling the result by the total agreement within the set. It is noteworthy that, by doing this, one relates not only the accuracy of the classifier but the coincident error of the set of classifiers as well. The E_{ec} for each combination of classifiers from the initial ensemble was computed by using the original DC data sets provided in [4]. Each classifier was trained by using the three training data sets, consequently obtaining three classification models. Each of those models was then used to classify the two validation data sets (refer to Section V for more details). The final result was then averaged over those six combinations. The values of E_{ec} are summarized in Table I. According to Table I, the set of classifiers HS + LM + LS presents the lowest E_{ec}. When the initial ensemble is evaluated, the value of E_{ec} is the highest one. There is no advantage in selecting sets of two classifiers, since it would be difficult for the fusion methods applied here to recover from errors in a way that makes a good expected decision.
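A sketch of the diversity measure of (3) as we read it from the damaged original, E_ec = Σ_{i=1}^{K} (N_{i0}^{same})^i / (N_{K1} + 1), counting per sample the largest group of classifiers making the same wrong prediction; the exact grouping rule is our assumption, not the authors' code:

```python
def ensemble_error_coincidence(predictions, labels):
    """predictions: one prediction list per classifier; labels: ground truth.
    Returns the coincident-error measure scaled by total agreement."""
    K = len(predictions)
    same = [0] * (K + 1)   # same[i]: samples where exactly i classifiers
                           # agree on the same wrong answer (largest group)
    all_correct = 0        # N_K1: samples every classifier gets right
    for j in range(len(labels)):
        outs = [p[j] for p in predictions]
        if all(o == labels[j] for o in outs):
            all_correct += 1
            continue
        wrong = [o for o in outs if o != labels[j]]
        i = max(wrong.count(v) for v in set(wrong))  # biggest coincident group
        same[i] += 1
    return sum(same[i] ** i for i in range(1, K + 1)) / (all_correct + 1)
```

Lower values indicate fewer coincident mistakes relative to the set's agreement, matching the selection criterion used in Table I.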
Authorized licensed use limited to: Universidade de Coimbra. Downloaded on July 13, 2009 at 04:50 from IEEE Xplore. Restrictions apply.
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
OLIVEIRA et al.: ON EXPLORATION OF CLASSIFIER ENSEMBLE SYNERGISM IN PEDESTRIAN DETECTION 5
TABLE II
PEDESTRIAN DATA SETS
B. Classifier Fusion Methods

Let V = {V_1, V_2, V_3} be the set of feature extractor–classifiers LM, LS, and HS, respectively, such that V_i : R^n → Ω_i, where Ω_i = {−1, 1}, i = 1, 2, 3, assigns a binary class label for V_i. In practice, our deterministic classifiers provide a confidence score output (s_c), which is obtained from the distance between the input vector and the separating hyperplane.

Two classes of fusion methods are evaluated in this paper, namely 1) MV and 2) FI. For the former, sum, weighted, and heuristic rules are applied. Heuristic MV was conceived to strengthen the different characteristics of the component classifiers; i.e., according to evidence in the experiments, we found that HS is better at defining the nonpedestrian class, while LM and LS are better at classifying pedestrians. In the second class, Sugeno's and Choquet's FIs have been analyzed. Next, some background on the methods is presented:
1) MV: For MV, the sum (S_MV), weighted (W_MV), and heuristic (H_MV) rules are given by

$$S_{MV} = \mathrm{sign}\left( \sum_{i=1}^{3} \omega_i \right) \qquad (4)$$

$$W_{MV} = \mathrm{sign}\left( \sum_{i=1}^{3} b_i\, \omega_i \right) \qquad (5)$$

where ω_i ∈ Ω_i is the class label output by V_i, b_i ∝ log(p_i/(1 − p_i)), and p_i is the global accuracy of each component classifier, and

$$H_{MV} = \begin{cases} 1, & \sum_{i=1}^{3} \omega_i > 0 \ \text{or}\ \omega_1 = 1 \ \text{or}\ \omega_2 = 1 \\ -1, & \sum_{i=1}^{3} \omega_i < 0 \ \text{or}\ \omega_3 = -1. \end{cases} \qquad (6)$$
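The three MV rules (4)–(6) can be sketched as follows, with ω_i ∈ {−1, +1} ordered as (LM, LS, HS); when both branches of the heuristic rule fire at once, the pedestrian branch is taken first, which is our reading of the piecewise definition rather than a stated tie-break:

```python
import math

def sum_mv(omega):
    """Eq. (4): sign of the label sum, omega_i in {-1, +1}."""
    return 1 if sum(omega) > 0 else -1

def weighted_mv(omega, p):
    """Eq. (5): log-odds-weighted vote; p[i] is classifier i's accuracy."""
    b = [math.log(pi / (1.0 - pi)) for pi in p]
    return 1 if sum(bi * oi for bi, oi in zip(b, omega)) > 0 else -1

def heuristic_mv(omega):
    """Eq. (6); LM or LS voting 'pedestrian' wins, else the negative branch."""
    lm, ls, hs = omega
    if sum(omega) > 0 or lm == 1 or ls == 1:
        return 1
    return -1  # covers sum(omega) < 0 or hs == -1
```

With three ±1 votes the sum is never zero, so the sign rules are always decisive.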
2) FI: Implementation of the FI fusion methods follows these definitions: Let X be an input vector with N_c elements, where N_c is the number of classifiers, and let P(X) be the power set of X, whose generic subsets are denoted E. Since the inputs must be in the interval [0, 1], the SVM and MLP scores (s_c) are scaled to the closed unit interval (henceforth referred to as scaled scores) by the following logistic link function (LLF):

$$P(s_c) = \frac{1}{1 + e^{-s_c}} \qquad (7)$$

where s_c ∈ [0, ∞) is the confidence score provided by each classifier.
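Eq. (7) is the standard logistic function; a one-line sketch (note that nonnegative scores actually land in [0.5, 1), the upper half of the unit interval):

```python
import math

def llf(s_c):
    """Eq. (7): logistic link scaling a confidence score into the unit interval."""
    return 1.0 / (1.0 + math.exp(-s_c))
```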
A fuzzy measure g : P(X) → [0, 1] represents the individual importance of each classifier V_i, satisfying the following properties:
1) g(∅) = 0, and g(X) = 1 (boundary conditions).
2) g(A) ≤ g(B) if A ⊆ B for any subsets A, B ∈ P(X) (monotonicity).
g is called a λ-fuzzy measure if, for any subsets A, B ∈ P(X) with A ∩ B = ∅,

$$g(A \cup B) = g(A) + g(B) + \lambda\, g(A)\, g(B) \qquad (8)$$

where λ > −1 denotes the degree of interrelation between subsets A and B according to

$$\lambda + 1 = \prod_{i=1}^{N_c} (1 + \lambda g_i) \qquad (9)$$

where g_i is a fuzzy measure density used to express the decision support for each individual classifier.
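Given the densities g_i, Eq. (9) has a unique nonzero root for λ, which can be found numerically; a sketch by bisection, assuming Σ g_i < 1 (the case of the densities used later in this paper, for which the root is positive):

```python
def solve_lambda(g, iters=100):
    """Nonzero root of prod(1 + lam*g_i) = 1 + lam, assuming sum(g) < 1."""
    def f(lam):
        prod = 1.0
        for gi in g:
            prod *= 1.0 + lam * gi
        return prod - (1.0 + lam)
    lo, hi = 1e-9, 1.0
    while f(hi) < 0:          # grow the bracket until the sign flips
        hi *= 2.0
    for _ in range(iters):    # bisect on [lo, hi]
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def g_union(ga, gb, lam):
    """Eq. (8) for disjoint subsets A and B."""
    return ga + gb + lam * ga * gb
```

Chaining `g_union` over all classifiers with the solved λ recovers g(X) = 1, which is exactly the constraint Eq. (9) encodes.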
Let h : X → [0, 1] be a membership function, which monotonically decreases with respect to each element of X (if this does not hold, X must be re-sorted), and let H = {h_i, i = 1, . . . , N_c}. Then, the Sugeno FI is defined as follows:

$$\int h(x) \circ g = \max_{E \subseteq X} \left[ \min\left( \min_{x \in E} h(x),\ g(E) \right) \right]. \qquad (10)$$

The Choquet FI differs in the way that it is computed, i.e.,

$$\int h(x) \circ g = h_{N_c}(x) + \sum_{i=2}^{N_c} \left[ h_{i-1}(x) - h_i(x) \right] g_{i-1} \qquad (11)$$

where g_{i−1} denotes the measure of the set of the first i − 1 (sorted) elements of X.
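Both integrals admit compact sorted forms: with h_1 ≥ h_2 ≥ . . . and A_i the set of the first i classifiers, the Sugeno integral is max_i min(h_i, g(A_i)) and the Choquet integral telescopes the sorted differences. A sketch of these textbook forms (the list `g_of_A` holds g(A_1), . . . , g(A_{N_c}), built elsewhere with the λ-measure):

```python
def sugeno_integral(h, g_of_A):
    """Sugeno FI: max over i of min(h_i, g(A_i)), h sorted decreasing."""
    return max(min(hi, gA) for hi, gA in zip(h, g_of_A))

def choquet_integral(h, g_of_A):
    """Choquet FI, telescoped: h_n + sum_{i>=2} (h_{i-1} - h_i) * g(A_{i-1})."""
    total = h[-1]
    for i in range(1, len(h)):
        total += (h[i - 1] - h[i]) * g_of_A[i - 1]
    return total
```

The telescoped Choquet form is algebraically identical to the direct sum Σ h_i (g(A_i) − g(A_{i−1})) with g(A_0) = 0 and g(A_{N_c}) = 1.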
V. EXPERIMENTAL ANALYSIS

In this section, an experimental analysis is described to highlight the performance of the proposed method. It consists of two experiments over cropwise data sets (all images are 18 × 36 pixels wide) and a video sequence (320 × 240 frame set). Moreover, the proposed ensemble was also compared with two other methods over DC data sets. The classification performance is analyzed by using receiver operating characteristic (ROC) curves.

A. Methodology and Data Sets

Here, we evaluate the performance of the proposed ensemble over DC data sets by also applying artificial lighting transformations (see Section V-D for more details). The objective is to systematically study the behavior of the ensemble method.

Table II depicts the characteristics of each data set. The DC data sets contain pedestrian images in a variety of poses, scene illumination, and contrast. Some samples of these data sets are depicted in Fig. 4. For experimental evaluation, we have followed a four-step methodology: 1) parameter selection; 2) performance evaluation over DC data sets; 3) sensitivity analysis with respect to lighting transformation; and, finally,
6 IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS
Fig. 4. Samples. Some samples of (a) and (b) DC and (c) transformed DC
data sets. (a) Pedestrian examples. (b) Nonpedestrian examples. (c) Pedestrian
images after lighting transformations. From left to right: Original image,
shadowy, and sunny effects. (d) Edge information of the images in (c).
4) a window-wise analysis over a gathered data set in an urban
scenario.
B. Parameter Selection

Only the LRF parameters have been found, since the HOG parameters, which are used as reference in this paper, are provided in [21]. To find the former, the number of C and S feature maps and the number of hidden neurons in the MLP have been varied in a cross-validation procedure. The size of the kernels has been kept at the finest possible resolution as in [17]: C_1 = 5 × 5, S_1 = 4 × 4, C_2 = 2 × 2, and S_2 = 3 × 3, as illustrated in Fig. 2. As shown in Fig. 5, the best LRF parameters were found to be F_{C_1} = 4, F_{S_1} = 4, F_{C_2} = 14, F_{S_2} = 14, and HN = 25, where F_{C_n} and F_{S_n} represent the numbers of feature maps in the C and S layers, respectively, and HN is the number of hidden neurons in the MLP layer. With these parameters, the LRF network reached a 94% hit rate (HR).
C. Evaluation of the Component Classifiers

All curves plotted over DC data sets were obtained by training the classifiers with the three training data sets, averaging the result of the classification models over the two validation data sets, following the same methodology as in [4]. Fig. 6(a) and (b) shows the ROCs of the individual classifiers. For LM, the best parameters shown in Fig. 5 have been used. For LS and HS, three types of SVM kernels, namely, 1) linear, 2) third-degree polynomial (poly3), and 3) radial basis function (RBF), have been evaluated: The poly3 kernel showed the best performance for both SVM classifiers over LRF and HOG; the best points in the ROCs were chosen at a 4% false alarm rate (FAR), where
Fig. 5. LRF parameters. Parameters have been found by varying the number of C and S feature maps and the number of hidden neurons. The best achieved parameters were F_{C_1} = 4, F_{S_1} = 4, F_{C_2} = 14, F_{S_2} = 14, and HN = 25, with the best HR equal to 94%.
LS-poly3 is equal to 91%, LM is equal to 88%, and HS-poly3
is equal to 92%.
D. Sensitivity Analysis

Two artificial lighting transformations have been applied to modify the validation data sets with the aim of creating shadowy and sunny (overexposure to sunlight) effects. The shadowy effect is obtained by applying I_shad(x, y) = I(x, y) − (2β/(w − 1))·y + β, where I(x, y) is the original image of width w, and β is a pixel constant equal to 80 in our experiments; the sunny effect is obtained by applying a multiscale retinex (MSR) algorithm [25]. MSR was originally designed for image enhancement by estimating scene reflectance from the ratios of scene intensities. In our experiments, the MSR parameters were chosen to produce brighter images, simulating a sunlit effect on the objects in the scene. Fig. 4(c) depicts some image samples of the artificial transformations applied to the DC validation data sets. Fig. 4(d) shows, in turn, the effect of the lighting transformation on the edge information of each image; it can be observed that the shadowy effect makes the image lose some edge information, since this transformation obscures part of the image. On the other hand, the sunny effect depends on the illumination of the image: If the image is under illumination (original image in Fig. 4(c), left), the dark areas raise the contour of the image, causing more edges to appear, whereas application on a brighter image (original image in Fig. 4(c), right) leads the image to have brighter areas and, consequently, to a loss of some edge information. This side effect of the light transformations mainly influences the individual classifiers, since they are based on edge detection, causing a decline in their performances. This fact also produces an increase in the diversity of the classifiers (as the agreement among them decreased), and the fusion method can explore this synergism to raise the overall performance of the system.
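The shadowy transformation, as reconstructed above, is a linear brightness ramp, I_shad(x, y) = I(x, y) − (2β/(w − 1))·y + β; both the sign and the placement of β are read off the damaged original, so treat them as assumptions. A sketch applying it along the width coordinate with clamping to the 8-bit range:

```python
def shadowy(img, beta=80):
    """Linear shade ramp: +beta at one side fading to -beta at the other."""
    h, w = len(img), len(img[0])
    out = []
    for row in range(h):
        out.append([])
        for col in range(w):
            v = img[row][col] - (2.0 * beta / (w - 1)) * col + beta
            out[-1].append(min(255, max(0, int(round(v)))))  # clamp to [0, 255]
    return out
```

With β = 80 the ramp darkens one side of the crop by up to 80 gray levels while brightening the other, which is what removes edge detail from the shaded region.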
Fig. 6. ROCs of the component classifiers. Curves show the performance of the individual classifiers over DC and transformed DC data sets. In (a), the performance of MLP, linear SVM, RBF SVM, and poly3 SVM over LRF features is depicted; in (b), SVMs with linear, RBF, and poly3 kernels are plotted over HOG features; (c) and (d) illustrate the best individual classifier performance over transformed DC data sets. All ROC curves are averaged over DC training and validation data sets for comparison purposes.
TABLE III
COMPARATIVE RESULTS AFTER LIGHTING TRANSFORMATIONS
The ROC curves of the individual classifiers, after the lighting transformations, can be seen in Fig. 6(c) and (d), where the points highlighted in the boxes correspond to the same scores of the classifiers at 4% of FAR in the ROC curves of Fig. 6(a) and (b).

Table III summarizes the results of the best individual classifiers achieved in the ROC curves of Fig. 6. Note that, under the shadowy effect, LS and LM decreased by approximately 3 and 2 percentage points of HR, respectively, while HS decreased by 2 percentage points; under the sunny effect, LS and LM decreased by 1, and HS decreased by 4 percentage points of HR, respectively. These individual classifier behaviors demonstrate that they were affected in opposite ways with respect to those two lighting transformations, allowing the fusion method to balance the overall performance.
E. Evaluation of the Fusion Methods

In this section, the evaluation of the fusion methods is discussed. Since N_c = 3, eight fuzzy measures must be defined. The initial values for g(P(s_c1)), g(P(s_c2)), and g(P(s_c3)) have been chosen to be 0.15, 0.24, and 0.30 for LM, LS, and HS, respectively, which come from the best points in the ROC curves of the scaled scores of the individual classifiers, which are given by (7). The fuzzy measures of the other aggregated subsets g({P(s_c1), P(s_c2)}), g({P(s_c1), P(s_c3)}), and g({P(s_c2), P(s_c3)}) can now be calculated from (8), using 1 + λ = (1 + 0.15λ)(1 + 0.24λ)(1 + 0.30λ). After finding the fuzzy measures, a threshold was chosen to be the value of the minimum fuzzy measure of a set of two classifiers. In other words, the threshold for the FIs was g({P(s_c1), P(s_c2)}) ≈ 0.47. This value indicates that one should rely on a fuzzy output if it is greater than the fuzzy measure of the set consisting of LM and LS.
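The threshold selection just described can be walked through numerically: solve the λ-measure equation for the reported densities (0.15, 0.24, 0.30), form the three pairwise measures, and take the minimum (the LM + LS pair). This is an illustrative sketch; the rounding of the intermediate values need not match the two-decimal figure quoted in the text:

```python
def pairwise_measures(g1, g2, g3, iters=100):
    """Solve 1 + lam = prod(1 + lam*g_i) by bisection, then apply Eq. (8)."""
    def f(lam):
        return (1 + lam * g1) * (1 + lam * g2) * (1 + lam * g3) - (1 + lam)
    lo, hi = 1e-9, 1.0
    while f(hi) < 0:               # bracket the positive root
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) < 0 else (lo, mid)
    lam = 0.5 * (lo + hi)
    pair = lambda a, b: a + b + lam * a * b    # Eq. (8), disjoint subsets
    return lam, {"LM+LS": pair(g1, g2),
                 "LM+HS": pair(g1, g3),
                 "LS+HS": pair(g2, g3)}

lam, pairs = pairwise_measures(0.15, 0.24, 0.30)
threshold = min(pairs.values())    # the LM+LS pair is the smallest
```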
Table IV presents the results of the fusion methods over DC data sets. It is worth noting that the Sugeno FI and the heuristic MV perform similarly, although the employment of the Sugeno FI can provide a more comprehensive framework and theoretical basis for the classifier fusion, since one can choose the appropriate fuzzy measures.
TABLE IV
RESULTS OF THE FUSION METHODS OVER DC DATA SETS
TABLE V
RESULT OF THE FUSION METHODS OVER TRANSFORMED DC DATA SETS
TABLE VI
SUMMARY OF THE RESULTS OVER DC AND TRANSFORMED
DC DATA SETS [HR/FAR (IN PERCENT)]
The effect of the lighting transformations on the ensemble is summarized in Table V. Interestingly, despite the decrease in the performance of the individual classifiers caused by the lighting transformations, the classifier fusion compensated those losses by combining different characteristics of the component classifiers.

Table VI summarizes the global performance results, considering the best fusion methods compared with the component classifiers over DC and transformed DC data sets. Over DC data sets, the best fusion methods (heuristic MV and Sugeno FI) increased HR by approximately 8.5, 5.5, and 4.5 percentage points over the best LM, LS, and HS, respectively, whereas there was a decrease of 2.1 and 1.6 percentage points in FAR for heuristic MV and Sugeno FI, respectively. The last line of the table presents the average performance over all data sets. The heuristic MV and Sugeno FI methods present similar average performances, with a difference of 0.2 percentage points of HR and 0.3 of FAR between them. It is worth noting an average gain of approximately 4.7 percentage points over HS (the best individual classifier) for the heuristic MV and Sugeno FI fusion methods. Furthermore, comparing fusions after the lighting transformations, in the worst case (Sugeno FI and DC shadowy), the performance of the ensemble decreased by approximately 3.5 percentage points of HR, while FAR increased by 0.9 percentage points with respect to the classification fusion without the lighting transformations.

According to the results, the final architecture of our proposed method is illustrated in Fig. 7.
F. Comparison With Other Methods Over DC Data Sets

Table VII summarizes the HR/FAR of our component and ensemble classifiers, as well as the results of two other methods: SVM on LRF (bootstrapped) [4] and sum MV on GLB+LEM (nonbootstrapped) [9]. As can be noticed, even the individual classifiers LS and HS outperform those two methods concerning HR and FAR. In comparison with the methods in [4] and [9], heuristic MV (and, approximately, Sugeno FI) increased HR by 6.5 percentage points, whereas it decreased FAR by 3.1 percentage points. It is important to observe the following: 1) The final result in [4] is achieved after an increment in the number of training images (bootstrap), while neither in our work nor in [9] was this additional step applied, and 2) as mentioned before, the generation of the LRF features in [4] involved the training of an MLP applied to Haar-like features and PCAs, while in our work, the LRFs obtained from a CNN led to more invariance to illumination changes and image shifts.
G. Evaluation on the Nature-inspired Smart Information Systems (NiSIS) Competition

The proposed ensemble method with the heuristic MV won the most accurate model award at the 2007 NiSIS international competition out of 16 participants (http://www.nisis.risk-technologies.com). The model was evaluated on a subset¹ of the DC data sets, containing 1225 training images, 2450 labeled images for validation of the algorithms, and 6125 unlabeled images for measuring the performance in the competition. Images in the testing and validation data sets were artificially occluded. In the pedestrian classification challenge, our proposed method achieved a classification accuracy of 95.97%.
H. Evaluation on Video Sequence

In addition to the study and conclusions presented over the DC and transformed DC cropwise data sets, the best ensemble, which is composed of the three classifiers and Sugeno FI fusion, has been tested on a video sequence gathered at the engineering campus of Coimbra University. This video sequence has 364 frames (640 × 480 pixel frames resized to 320 × 240) with 430 annotated pedestrians (ground truth) in different poses and interactions. To locate and recognize the pedestrians in each frame, a sliding window technique has been applied with the goal of analyzing the tradeoff between recognition performance and speed. After finding the detected windows, they were clustered by a nonmaxima suppression algorithm. Only pedestrians at a distance up to 25 m were considered for annotation, as constrained by a laser scanner used to gather the data set. A pedestrian was successfully matched if an overlap criterion was met, i.e.,
$$A_{Overlap} = \frac{A_{gt} \cap A_{det}}{A_{gt} \cup A_{det}} \geq 0.4 \qquad (12)$$
where A_gt corresponds to the area of the ground-truth bounding box, and A_det is the area of the detected bounding box. If A_Overlap is greater than or equal to 40%, then the detected bounding box is considered a hit.
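The overlap criterion of (12), intersection area over union area of the two boxes, can be sketched as follows; the (x, y, w, h) box layout is an assumed convention, not taken from the paper:

```python
def overlap_hit(gt, det, thresh=0.4):
    """Eq. (12): intersection-over-union of two axis-aligned boxes (x, y, w, h)."""
    ax, ay, aw, ah = gt
    bx, by, bw, bh = det
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))   # intersection width
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))   # intersection height
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union >= thresh
```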
Since an 18 × 36 pixel window with three strong classifiers is not a suitable candidate for a proper recognition speed, we have decided to double and triple the size of the cropped images, considering also a window size of 64 × 128 pixels [3], [11],
¹These data sets are provided at http://www.isr.uc.pt/~lreboucas as they were, since they are no longer publicly available on the NiSIS website.
Fig. 7. Final ensemble architecture. The component classifiers provide scores with values in [0, +∞), which are scaled to [0, 1] by an LLF. Finally, the Sugeno FI fuses the classifier results, giving another confidence score in [0, 1], which is thresholded by the minimum value of the combination of two classifiers.
TABLE VII
COMPARISON OVER DC DATA SETS
making the stride of the searching window equal to one eighth for the 64 × 128 window size and one ninth for the rest, again preventing falling off boundaries. Since doubling and tripling the DC images would introduce considerable distortion, the choice was to use another data set to guarantee more stability in the training stage. This way, the INRIA person data set [3] was chosen (2416 pedestrians and 15 000 nonpedestrians), resizing the 64 × 128 pixel images to 36 × 72 and 54 × 108 pixels wide. This data set has the advantage of presenting a fixed height for the pedestrians in the image, these being centralized with a border of 16 pixels on each side. These characteristics provide a more stable search in a sliding window procedure [3], [11].

Fig. 8(a)–(c) depicts the ROC curves of the video sequence evaluation. For the three feature extractor–classifiers, we found that a 54 × 108 pixel window achieved the best performance. Adjusting the thresholds of the classifiers to lie at 11% of FAR, HRs of 84%, 84%, and 87% were obtained by LM, LS, and HS, respectively; 36 × 72 pixel windows showed generally more false positives per frame (considering all frames in the video sequence). For HS, 54 × 108 and 64 × 128 pixel windows presented similar performance, with a small difference of 2 percentage points. The best parameters found in the cropwise analysis have been increased proportionally as the windows increased. In this sense, the CNN had the kernel sizes doubled or tripled according to the increase of the window (for the 64 × 128 pixel window, the values were inherited from the 54 × 108 pixel window). The number of C_1 and S_1 feature maps has not changed, although the size of the kernels in each layer has been reduced to half to keep the number of free parameters in the network under control. Concerning the HOG parameters, for a 36 × 72 pixel window, a descriptor of 12 × 12 pixels with a 2 × 2 cell block was applied, while for 54 × 108 and 64 × 128 pixel windows, the best parameters provided in [21] were used, i.e., a 16 × 16 pixel descriptor with a 2 × 2 cell block. In Fig. 9, two subsets of sequences are shown. The annotated objects are represented by the dark bounding boxes, while the detected (light) bounding boxes represent the result of the recognition by the proposed ensemble. Frame 189 shows an example of a false alarm, and no missed detection was encountered in those frame sequences.
1) Comparison With HFI: HFI was introduced in [16] with the aim of fusing two different classifiers in a fuzzy way. Although it is possible to embody more classifiers, it can be intractable to manage fuzzy rules for more than two classifiers. As the input fuzzy variables (universe of discourse) are the classifier-scaled scores and the overlapping areas of the two resulting windows, the only possible way to use this method is in a window-wise way, particularly because of the use of overlapping areas as fuzzy input variables. HFI is applied to independent classifier executions, i.e., there is no need to define the same window size or sliding window parameters. This way, HFI has been used with the following two different pairs of classifiers: 1) Haar wavelet/AdaBoost + HS (originally used in [16]) and 2) LS + HS. It is worth noting that, as the training data set has been changed and the classifiers have been retrained, the fuzzy measures of the Sugeno FI had to be recalculated. The fuzzy measures obtained for LM, LS, and HS were 0.18, 0.21, and 0.27, respectively, and the threshold of the fusion output was 0.46. Considering the same individual classifier thresholds to represent their ROCs [see Fig. 8(a)–(c)], it can be noticed in Fig. 8(d) that the proposed ensemble method outperforms the other two HFI-based methods, with 94% of HR, reducing the FAR by 7 percentage points, while increasing the HR by 7 percentage points with regard to the best individual classifier (HS poly3).
Fig. 8. ROCs of the video sequence evaluation. (a)–(c) Evaluation of different sliding window sizes, considering also strides of one eighth for the 64 × 128 pixel window and one ninth for the others. In (d), ROCs of the evaluation among the following three types of ensemble: 1) the proposed ensemble using Sugeno FI as a fusion method; 2) HFI over Haar wavelet/AdaBoost + HS; and 3) HFI over LS + HS.
Fig. 9. Examples of frame sequences. Annotations (ground truth) are the dark bounding boxes, while the light bounding boxes are the objects detected by the ensemble classifier. In frame 189, there is an example of a false alarm.
VI. IMPLEMENTATION ISSUES

With the advent of multiprocessing machines, parallelization has become mandatory to achieve a balance between classification performance and computational cost.

The more parallelizable a feature extractor or classifier is, the more one can speed it up. In this sense, we have started research to parallelize our proposed ensemble method on a cluster of Sony PlayStation 3 (PS3) consoles. This platform brings special advantages due to its low cost while providing more computational power than a modern central processing unit (even with multiple cores). In preliminary studies, an HOG/linear SVM detector has been implemented on a PS3, making use of some parallelization techniques, but still without vectorization [26].

Concerning the sliding-window version of the proposed ensemble, the processing time was 4 s to classify one frame in a C++ implementation under Linux on a 1.8-GHz dual-core machine (using only one core) with no programming optimization. The heaviest component is the HOG extractor,
Authorized licensed use limited to: Universidade de Coimbra. Downloaded on July 13, 2009 at 04:50 from IEEE Xplore. Restrictions apply.
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
OLIVEIRA et al.: ON EXPLORATION OF CLASSIFIER ENSEMBLE SYNERGISM IN PEDESTRIAN DETECTION 11
taking 50% of the time. However, the HOG implementation can be optimized by an integral histogram [27], providing linear-time extraction. This way, it is expected that this processing time could be reduced by a large factor on a PS3 using sliding-window search, according to, e.g., [28] and [29].
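The integral-histogram idea of [27] can be sketched briefly: one cumulative table per orientation bin lets the histogram of any rectangular cell be read off with four lookups, so HOG extraction time no longer depends on cell size. This is an illustrative toy (unit pixel weights), not the implementation of [27]:

```python
def build_integral_histogram(bins_img, nbins):
    """bins_img[y][x] is the orientation bin of each pixel (weight 1 here);
    ih[y][x][b] counts bin-b pixels in the rectangle [0, y) x [0, x)."""
    H, W = len(bins_img), len(bins_img[0])
    ih = [[[0] * nbins for _ in range(W + 1)] for _ in range(H + 1)]
    for y in range(H):
        for x in range(W):
            for b in range(nbins):
                ih[y + 1][x + 1][b] = (ih[y][x + 1][b] + ih[y + 1][x][b]
                                       - ih[y][x][b])
            ih[y + 1][x + 1][bins_img[y][x]] += 1
    return ih

def cell_hist(ih, y0, x0, y1, x1, nbins):
    """Histogram of the rectangle [y0, y1) x [x0, x1) via four lookups."""
    return [ih[y1][x1][b] - ih[y0][x1][b] - ih[y1][x0][b] + ih[y0][x0][b]
            for b in range(nbins)]
```

The construction pass is linear in the image size, after which every cell query is O(nbins) regardless of the cell's area.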
VII. CONCLUSION

The main goal of an image-classification system is to recognize the target object successfully. By building single feature extractor–classifiers, one can often face problems, for instance, with lighting changes, since finding the best tradeoff between increasing the training data and classification performance is not a simple task. By exploring ensemble methods, one is able to create synergistic approaches to compensate for the individual inability of the component classifiers under certain circumstances. In this paper, we contribute a new ensemble method, assessing its performance through an in-depth analysis. Considering two types of lighting transformation (shadowy and sunny effects), which were artificially applied to the validation images and are prone to happen in outdoor environments, we experimentally demonstrated the high performance of the proposed classifier ensemble. Although the heuristic MV and Sugeno FI reached the highest performance in the experiments, we found that the use of the Sugeno FI provides more benefits in terms of a generic framework. The best average performance of our proposed method (Sugeno FI) was 94.6% of HR and 3% of FAR on the cropwise data sets (considering also the application of the lighting transformations) and 94% of HR and 4% of FAR on the video sequence. Our research direction now is to make use of global features and classifiers to improve classification performance under more difficult scenarios, taking advantage of contextual information in the scene.
REFERENCES
[1] L. Kuncheva, Combining Pattern Classiers: Methods and Algorithms.
New York: Wiley-Interscience, 2004.
[2] M. Aksela and J. Laaksonen, Using diversity of errors for selecting
members of a committee classier, Pattern Recognit., vol. 39, no. 4,
pp. 608623, Apr. 2006.
[3] N. Dalal and B. Triggs, Histograms of oriented gradients for human
detection, in Proc. IEEE Int. Conf. Comput. Vis. Pattern Recog., 2005,
pp. 886893.
[4] S. Munder and M. Gavrila, An experimental study on pedestrian
classication, IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 11,
pp. 18631868, Nov. 2006.
[5] Y. LeCun and Y. Bengio, Convolutional networks for images, speech and
time series, in The Handbook of Brain Theory and Neural Networks.
Cambridge, MA: MIT Press, 1995.
[6] S. Haykin, Neural Networks and Learning Machines. Englewood Cliffs,
NJ: Prentice-Hall, 2008.
[7] G. Choquet, Theory of capacities, Ann. Inst. Fourier, vol. 5, pp. 131
195, 1954.
[8] M. Sugeno, Theory of fuzzy integrals and its applications, Ph.D.
dissertation, Tokyo Inst. Technol., Tokyo, Japan, 1974.
[9] L. Nanni and A. Lumini, Ensemble of multiple pedestrian represen-
tations, IEEE Trans. Intell. Transp. Syst., vol. 9, no. 2, pp. 365369,
Jun. 2008.
[10] T. Gandhi and M. Trivedi, Pedestrian protection systems: Issues,
survey, and challenges, IEEE Trans. Intell. Transp. Syst., vol. 8, no. 3,
pp. 413430, Sep. 2007.
[11] C. Papageorgiou and T. Poggio, A trainable system for object detection,
Int. J. Comput. Vis., vol. 38, no. 1, pp. 1533, Jun. 2000.
[12] D. Lowe, Object recognition from local scale-invariant features, in
Proc. IEEE Int. Conf. Comput. Vis., 1999, pp. 11501157.
[13] M. Szarvas, U. Sakai, and J. Ogata, Real-time pedestrian detection using
LIDAR and convolutional neural networks, in Proc. IEEE Int. Symp.
Intell. Vehicles, 2006, pp. 213218.
[14] P. Viola and M. Jones, Rapid object detection using a boosted cascade,
in Proc. IEEE Int. Conf. Comput. Vis. Pattern Recog., 2001, pp. 511518.
[15] L. Llorca, M. Sotelo, L. Bergasa, P. Toro, J. Nuevo, M. Ocana, and
M. Garrido, Combination of feature extraction methods for SVM
pedestrian detection, IEEE Trans. Intell. Transp. Syst., vol. 8, no. 2,
pp. 292307, Jun. 2006.
[16] L. Oliveira, G. Monteiro, P. Peixoto, and U. Nunes, Towards a robust
vision-based obstacle perception with classier fusion in cybercars, in
Computer Aided System Theory (Eurocast 2007), vol. 4739. New York:
Springer-Verlag, 2007, pp. 10891096.
[17] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based
learning applied to document recognition, Proc. IEEE, vol. 86, no. 11,
pp. 22782324, Nov. 1998.
[18] S. Lawrence, L. Giles, A. Tsoi, and A. Back, Face recognition: A con-
volutional neural network approach, IEEE Trans. Neural Netw.Special
Issue Neural Netw. Pattern Recognit., vol. 8, no. 1, pp. 98113, Jan. 1997.
[19] C. Garcia and M. Delakis, Convolutional face nder: A neural
architecture for fast and robust face detection, Proc. IEEE, vol. 26,
no. 11, pp. 14081423, Nov. 2004.
[20] D. De Ridder, R. Duin, M. Egmont-Petersen, L. van Vliet, and P. Verbeek, "Nonlinear image processing using artificial neural networks," Adv. Imaging Electron Phys., vol. 126, pp. 352–450, 2003.
[21] N. Dalal, "Finding people in images and videos," Ph.D. dissertation, Institut National Polytechnique de Grenoble, Grenoble, France, 2006.
[22] G. Zenobi and P. Cunningham, "Using diversity in preparing ensembles of classifiers based on different feature subsets to minimize generalization error," in Proc. Eur. Conf. Mach. Learning, vol. 2167. New York: Springer-Verlag, 2001, pp. 576–587.
[23] L. Kuncheva, "That elusive diversity in classifier ensembles," in Pattern Recognition and Image Analysis, vol. 2652. New York: Springer-Verlag, 2003, pp. 1126–1138.
[24] P. Melville and R. Mooney, "Constructing diverse classifier ensembles using artificial training examples," in Proc. Int. Joint Conf. Artif. Intell., 2003, pp. 505–510.
[25] Z. Rahman, D. Jobson, and G. Woodell, "A multiscale retinex for colour rendition and dynamic range compression," in Proc. SPIE Int. Symp. Opt. Sci., Eng., Instrum., Appl. Digital Image Process. XIX, 1996, pp. 183–191.
[26] L. Oliveira, R. Britto, and U. Nunes, "On using cell broadband engine for object detection in ITS," in Proc. IEEE IROS 2nd Workshop Planning, Perception Navigat. Intell. Vehicles, 2008, pp. 54–58.
[27] Q. Zhu, S. Avidan, M.-C. Yeh, and K.-T. Cheng, "Fast human detection using a cascade of histograms of oriented gradients," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2006, pp. 1491–1498.
[28] H. Sugano and R. Miyamoto, "A real-time object recognition system on cell broadband engine," in Advances in Image and Video Technology, vol. 4872. New York: Springer-Verlag, 2007, pp. 932–943.
[29] A. Sarje and S. Aluru, "Parallel biological sequence alignments on the cell broadband engine," in Proc. IEEE Int. Parallel Distrib. Process. Symp., 2008, pp. 1–11.
Luciano Oliveira (S'03) received the B.Sc. and M.Sc. degrees in mechatronics from the Federal University of Bahia, Salvador, Brazil, in 1997 and 2005, respectively. He is currently working toward the Ph.D. degree with the Institute of Systems and Robotics, Department of Computer and Electrical Engineering, University of Coimbra, Coimbra, Portugal.
Since 2006, he has been participating in projects such as Perception Methods for an Intelligent Transportation System, Multi-Target Detection and Tracking in Semi-Structured Outdoor Environments, and CyberC3, developing object detection and localization systems based on vision and lidar.
Mr. Oliveira received First Place at the Nature-inspired Smart Information Systems Competition, Problem Task: Analysis and Classification of the DaimlerChrysler Automotive Dataset Images, in 2007. In 2008, he received Third Place at the Intel/GV Entrepreneurship and Venture Capital Competition with the TruckSafe project for intelligent sensing in carrier trucking.
Authorized licensed use limited to: Universidade de Coimbra. Downloaded on July 13, 2009 at 04:50 from IEEE Xplore. Restrictions apply.
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
12 IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS
Urbano Nunes (S'90–M'95–SM'09) received the Lic. and Ph.D. degrees in electrical engineering from the University of Coimbra, Coimbra, Portugal, in 1983 and 1995, respectively.
He is currently an Associate Professor with the
Faculty of Sciences and Technology and the Head
of the Department of Computer and Electrical En-
gineering, University of Coimbra. He is also a Re-
searcher with the Institute for Systems and Robotics,
University of Coimbra, where he is the Coordina-
tor of the Automation and Mobile Robotics Group
and the Coordinator of the Mechatronics Laboratory. He has been involved
with/responsible for several funded projects at both the national and interna-
tional levels in the areas of mobile robotics and intelligent vehicles.
Dr. Nunes is an Associate Editor for the IEEE TRANSACTIONS ON
INTELLIGENT TRANSPORTATION SYSTEMS, an Associate Editor for the IEEE
Intelligent Transportation Systems Magazine, and a Cochair of the Technical
Committee (TC) on Autonomous Ground Vehicles and Intelligent Transporta-
tion Systems (ITS) of the IEEE Robotics and Automation Society (RAS). He
received the IEEE ITS Society Outstanding Service Award in 2006, the IEEE
RAS Most Active TC Award as a Cochair of RAS TC on ITS in 2006, and
the First Place at the 2007 Nature-inspired Smart Information Systems Competition, Problem Task: Analysis and Classification of the DaimlerChrysler Automotive Dataset Images, as a coauthor.
Paulo Peixoto (M'09) received the B.Sc. degree in electrical engineering, the M.Sc. degree in systems and automation, and the Ph.D. degree in electrical engineering from the University of Coimbra, Coimbra, Portugal, in 1989, 1995, and 2003, respectively.
He is currently an Assistant Professor with the
Department of Electrical and Computer Engineering,
University of Coimbra, and a Researcher with the
Institute of Systems and Robotics, University of
Coimbra. He has been involved with/responsible for
several funded projects at both the national and international levels in the areas
of computer vision, visual surveillance, and intelligent vehicles. His research
interests include computer vision applied to intelligent transportation systems,
pattern recognition, and human–computer interaction.
Dr. Peixoto received First Place at the Nature-inspired Smart Information Systems Competition, Problem Task: Analysis and Classification of the DaimlerChrysler Automotive Dataset Images, as a coauthor in 2007.