
Proceedings of the Fifth International Conference on Machine Learning and Cybernetics, Dalian, 13-16 August 2006

CONSTRUCTING LEAST SQUARES SUPPORT VECTOR MACHINES ENSEMBLE BASED ON FUZZY INTEGRAL

CHUN-MEI LIU 1, LIANG-KUAN ZHU 2

1 College of Foundation Science, Harbin University of Commerce, Harbin, 150076, China
2 Department of Control Science and Engineering, Harbin Institute of Technology, Harbin, 150001, China
E-MAIL: zhuliangkuan@gmail.com

Abstract:
Although the SVM ensemble has been shown to improve classification performance greatly over a single support vector machine (SVM), the classification result of a practically implemented SVM ensemble is often far from the theoretically expected level, because such methods do not evaluate the degree of importance of each individual component SVM classifier's output to the final decision. This paper proposes a boosting least squares support vector machine (LS-SVM) ensemble method based on fuzzy integral to improve this limited classification performance. In general, the proposed method is built in three steps: construct the component LS-SVMs; obtain the probabilistic outputs model of each component LS-SVM; and combine the component predictions based on fuzzy integral. The trained individual LS-SVMs are aggregated to make the final decision. The simulation results demonstrate that the proposed LS-SVM ensemble with boosting outperforms a single SVM and the traditional SVM (or LS-SVM) ensemble technique via majority voting in terms of classification accuracy.

Keywords:
LS-SVM; SVM ensemble; Boosting; Fuzzy integral;
Information fusion

1. Introduction

The past decade witnessed the increasing popularity of support vector machines (SVMs) because of their ease of use, relatively high performance, and ability to deal with various problems, including classification, regression, and density estimation [1]. By now, SVMs have been successfully applied in many areas, such as face detection, handwritten digit recognition, and data mining. In SVMs, the classification problem is formulated and represented as a convex quadratic programming (QP) problem [2, 3]. Basically, the SVM classifier maps the inputs into a higher dimensional feature space in which a linear classifier is obtained by solving a finite-dimensional QP problem in the dual space, avoiding explicit knowledge of the high-dimensional mapping and using only the related kernel function.
The LS-SVM, introduced by Suykens [4], uses equality constraints instead of inequality constraints and a least squares error term in order to obtain a linear set of equations in the dual space, which is computationally attractive.
Despite the high performance of SVM and LS-SVM, researchers have sought to further improve them with ensemble techniques such as bagging or boosting [5, 6, 7]. However, none of these methods consider the degree of importance of the outputs of the component SVMs and LS-SVMs, which plays a significant role in the classification. To deal with this problem, we propose a support vector machines ensemble method based on the fuzzy integral fusion technique. The proposed method consists of three phases. First, we use the boosting technique to construct the component LS-SVMs; in boosting, the training samples for each individual SVM are chosen according to an updated probability distribution over the samples (related to the classification error). Second, we obtain the probabilistic outputs model of each individual component LS-SVM. Finally, we combine the component predictions based on fuzzy integral, in which the relative importance of the different individual component SVMs is considered. The fuzzy integral non-linearly combines objective evidence, in the form of a fuzzy membership function, with a subjective evaluation of the worth of the individual LS-SVMs with respect to the decision. The experimental results confirm the superiority of the presented method over the majority voting technique.
The rest of this paper is organized as follows. In Section 2, some basic notions of the LS-SVM are reviewed. In Section 3, probabilistic outputs models for LS-SVMs are provided. The boosting method for constructing the LS-SVM ensemble and the fuzzy integral for aggregating the LS-SVMs are described in Section 4. Section 5 presents experimental results on benchmark problems. Finally, a conclusion is drawn in Section 6.

2. SVM and LS-SVM for classification problems

2.1. SVM for classification

Given training data $T = \{(x_1, y_1), \ldots, (x_N, y_N)\}$, where $x_i \in \mathbb{R}^n$ is the input pattern and $y_i \in \{-1, 1\}$ is the class label for a two-class problem, SVM for classification attempts to find a classifier $f(x)$ which minimizes the expected misclassification rate. Training a linear classifier $f(x)$ with SVM is equivalent to solving the convex quadratic optimization problem

$$\min_{w, b, \xi} \; \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{N} \xi_i \quad (1)$$

subject to $y_i (\langle w, x_i \rangle + b) \ge 1 - \xi_i$ and $\xi_i \ge 0$,

where $C$ is called the regularization parameter and is used to balance the classifier's complexity against its classification accuracy on the training set $T$. This quadratic problem is generally solved through its dual formulation [2]. Simply replacing the vector inner product involved with a non-linear kernel function converts the linear SVM into a more flexible non-linear SVM, which is the essence of the famous kernel trick. Please refer to [1] for more details on SVM for classification.
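For illustration, the following minimal Python sketch (assuming scikit-learn is available; the synthetic data and the values of C and gamma are placeholders, not settings from our experiments) trains a non-linear soft-margin SVM in which C plays exactly the role of the regularization parameter in (1) and the RBF kernel supplies the kernel trick:

```python
# Minimal soft-margin SVM sketch; C and gamma are illustrative only.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic two-class data standing in for a real training set T.
X, y = make_classification(n_samples=300, n_features=9, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# C balances complexity against training accuracy, as in (1);
# the RBF kernel realizes the implicit high-dimensional mapping.
clf = SVC(C=1.0, kernel="rbf", gamma="scale")
clf.fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```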
2.2. LS-SVM for classification

The LS-SVM constructs a linear classification model, $f_{w,b}(x) = \mathrm{sgn}(\langle w, \phi(x) \rangle + b)$, in a high-dimensional feature space $F$ ($\phi : X \to F$) induced by a kernel function $H$, e.g. the Gaussian kernel $H(x, x') = \exp\{-\|x - x'\|^2 / \sigma^2\}$. The optimal values of the weight vector $w$ and the bias $b$ are given by the minimum of the objective function

$$W(w, b) = \frac{1}{2} \|w\|^2 + \frac{\lambda}{2} \sum_{i=1}^{l} \big( y_i - \langle w, \phi(x_i) \rangle - b \big)^2, \quad (2)$$

implementing a quadratic regularization of a sum-of-squares empirical risk. The solution of this problem can be written as an expansion in terms of the training patterns,

$$f(x) = \mathrm{sign}\Big( \sum_{i=1}^{l} \alpha_i H(x_i, x) + b \Big). \quad (3)$$

Suykens et al. [4] show that the optimal coefficients of this expansion, $(\alpha, b)$, are given by the solution of the system of linear equations

$$\begin{bmatrix} \Omega & \mathbf{1} \\ \mathbf{1}^T & 0 \end{bmatrix} \begin{bmatrix} \alpha \\ b \end{bmatrix} = \begin{bmatrix} y \\ 0 \end{bmatrix}, \quad (4)$$

where $\Omega = K + \lambda^{-1} I$, $K = \{k_{ij} = H(x_i, x_j)\}_{i,j=1}^{l}$, $\mathbf{1} = (1, 1, \ldots, 1)^T$, $y = (y_1, y_2, \ldots, y_l)^T$, and $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_l)^T$.

More details about LS-SVM can be found in [4].
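For illustration, a minimal NumPy sketch of LS-SVM training, obtained by solving the linear system (4) directly; the kernel width sigma and the regularization value lam are placeholder choices:

```python
# Minimal LS-SVM training sketch: solve the linear system (4) directly.
# Assumes NumPy; sigma and lam are illustrative hyperparameter values.
import numpy as np

def gaussian_kernel(X1, X2, sigma=1.0):
    # H(x, x') = exp(-||x - x'||^2 / sigma^2)
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / sigma**2)

def lssvm_fit(X, y, lam=10.0, sigma=1.0):
    l = len(y)
    K = gaussian_kernel(X, X, sigma)
    # Assemble the (l+1) x (l+1) system [[Omega, 1], [1^T, 0]] [alpha; b] = [y; 0]
    A = np.zeros((l + 1, l + 1))
    A[:l, :l] = K + np.eye(l) / lam           # Omega = K + lam^{-1} I
    A[:l, l] = 1.0
    A[l, :l] = 1.0
    rhs = np.append(y.astype(float), 0.0)
    sol = np.linalg.solve(A, rhs)
    return sol[:l], sol[l]                    # (alpha, b)

def lssvm_decision(X_train, alpha, b, X_test, sigma=1.0):
    # f(x) = sum_i alpha_i H(x_i, x) + b, as in (3)
    return gaussian_kernel(X_test, X_train, sigma) @ alpha + b
```

Solving (4) by direct elimination costs $O(l^3)$, which is the computational price of the otherwise simple linear-system formulation.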


3. Probabilistic outputs for SVMs

This section gives a brief overview of the notion of probabilistic outputs for SVMs [9].

3.1. Two-class case

Given training data $x_i \in \mathbb{R}^n$, $i = 1, \ldots, l$, labeled by $y_i \in \{-1, 1\}$, the binary support vector machine yields a decision function $f(x)$ such that $\mathrm{sgn}(f(x))$ is the prediction for any test sample $x$. Instead of merely predicting the label, many applications require a posterior class probability

$$P(y = 1 \mid x) = \frac{1}{1 + \exp(A f(x) + B)} \quad (5)$$

with parameters $A$ and $B$ [9]. To estimate the best values of $(A, B)$, any subset of the $l$ training data ($N_+$ of them with $y_i = 1$ and $N_-$ of them with $y_i = -1$) can be used to solve the following maximum likelihood problem:

$$\min_{z = (A, B)} F(z), \quad (6)$$

where

$$F(z) = -\sum_{i=1}^{l} \big( t_i \log(p_i) + (1 - t_i) \log(1 - p_i) \big),$$

$$p_i = \frac{1}{1 + \exp(A f_i + B)}, \quad f_i = f(x_i),$$

and

$$t_i = \begin{cases} \dfrac{N_+ + 1}{N_+ + 2}, & \text{if } y_i = 1, \\[2mm] \dfrac{1}{N_- + 2}, & \text{if } y_i = -1, \end{cases} \qquad i = 1, 2, \ldots, l. \quad (7)$$
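A minimal sketch of fitting the sigmoid parameters $(A, B)$ of (5)-(7), assuming Python with NumPy and SciPy; Platt [9] recommends a more robust model-trust Newton procedure, so this direct use of a general-purpose optimizer is only illustrative:

```python
# Minimal Platt-scaling sketch: fit (A, B) of Eq. (5) by minimizing
# the cross-entropy F(z) of Eqs. (6)-(7). Assumes NumPy and SciPy.
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

def fit_sigmoid(f, y):
    """f: decision values f(x_i); y: labels in {-1, +1}."""
    n_pos = int(np.sum(y == 1))
    n_neg = int(np.sum(y == -1))
    # Regularized targets t_i from Eq. (7)
    t = np.where(y == 1, (n_pos + 1.0) / (n_pos + 2.0), 1.0 / (n_neg + 2.0))

    def F(z):
        A, B = z
        p = np.clip(expit(-(A * f + B)), 1e-12, 1 - 1e-12)  # Eq. (5)
        return -np.sum(t * np.log(p) + (1 - t) * np.log(1 - p))

    # Platt's suggested starting point for B; minimize F over z = (A, B).
    z0 = np.array([0.0, np.log((n_neg + 1.0) / (n_pos + 1.0))])
    return minimize(F, z0).x                  # (A, B)

def posterior(f, A, B):
    return expit(-(A * f + B))                # P(y = 1 | x), Eq. (5)
```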

3.2. Multi-class case

A $K$-class classification problem can be efficiently solved by partitioning the original problem into a set of $K(K-1)/2$ two-class problems. Given the observation $x$ and the class label $y$, we assume that estimates $r_{ij}$ of the pairwise class probabilities $\mu_{ij} = P(y = i \mid y = i \text{ or } j, x)$ are available: from the $i$th and $j$th classes of a training set, we can obtain a model which, for any new $x$, calculates $r_{ij}$ as an approximation of $\mu_{ij}$. Then, using all the $r_{ij}$, the following algorithm in [9] can be used to estimate $p_i = P(y = i \mid x)$, $i = 1, \ldots, K$. Consider that

$$\sum_{j : j \ne i} P(y = i \text{ or } j \mid x) = (K - 2)\, P(y = i \mid x) + \sum_{j=1}^{K} P(y = j \mid x) = (K - 2)\, P(y = i \mid x) + 1. \quad (8)$$

Using

$$r_{ij} \approx \mu_{ij} = \frac{P(y = i \mid x)}{P(y = i \text{ or } j \mid x)}, \quad (9)$$

we can obtain

$$p_i = \frac{1}{\sum_{j : j \ne i} \dfrac{1}{r_{ij}} - (K - 2)}. \quad (10)$$
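A minimal NumPy sketch of the estimate (10); `R` holds the pairwise probabilities $r_{ij}$ (its diagonal is unused), and the final renormalization is a pragmatic safeguard, not part of the derivation above:

```python
# Minimal pairwise-coupling sketch: recover class posteriors p_i from
# pairwise probabilities r_ij via Eq. (10). Assumes NumPy.
import numpy as np

def couple_pairwise(R):
    """R: K x K matrix with R[i, j] ~ P(y = i | y = i or j, x)."""
    K = R.shape[0]
    p = np.empty(K)
    for i in range(K):
        others = [j for j in range(K) if j != i]
        p[i] = 1.0 / (np.sum(1.0 / R[i, others]) - (K - 2))
    return p / p.sum()  # renormalize so the estimates sum to one

# Example with K = 3 pairwise probabilities:
R = np.array([[0.0, 0.7, 0.6],
              [0.3, 0.0, 0.4],
              [0.4, 0.6, 0.0]])
print(couple_pairwise(R))  # approximately (0.48, 0.21, 0.32)
```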

4. LS-SVM ensemble

4.1. Boosting for constructing the LS-SVM ensemble

The representative boosting algorithm is AdaBoost [10]. Given a training set $TR = \{(x_i, y_i) \mid i = 1, 2, \ldots, l\}$ consisting of $l$ samples, each sample in $TR$ is initially assigned the same weight value $p_0(x_i) = 1/l$. For training the $k$th SVM classifier, we build a set of training samples $TR_{boost}^{k} = \{(x_i, y_i) \mid i = 1, 2, \ldots, l'\}$, obtained by selecting $l'\,(< l)$ samples from the whole data set $TR$ according to the weight values $p_{k-1}(x_i)$ of the $(k-1)$th iteration. This training sample set is used for training the $k$th SVM classifier. Then we evaluate the classification performance of the $k$th trained SVM classifier on the whole training set $TR$ and obtain the updated weight values $p_k(x_i)$ of the training samples in $TR$ from their classification errors: the weight values of the incorrectly classified samples are increased while those of the correctly classified samples are decreased, which implies that samples that are hard to classify are selected more frequently. The updated weight values are used for building the training samples $TR_{boost}^{k+1} = \{(x_i, y_i) \mid i = 1, 2, \ldots, l'\}$ of the $(k+1)$th SVM classifier. This sampling procedure is repeated until $K$ training sample sets have been built for the $K$ SVM classifiers.

4.2. Fusing LS-SVMs based on fuzzy integral

The fuzzy integral and the associated fuzzy measures provide a useful way of aggregating information. In the following, we briefly introduce the basic theory of the fuzzy integral [11].

A set function $g : P(Y) \to [0, 1]$ is called a fuzzy measure if the following conditions are satisfied:
1. $g(\emptyset) = 0$, $g(Y) = 1$;
2. $g(A) \le g(B)$ if $A \subseteq B$ and $A, B \in P(Y)$.

Following this definition, Sugeno (1977) introduced the so-called $g_\lambda$-fuzzy measure, which comes with the additional property

$$g(A \cup B) = g(A) + g(B) + \lambda\, g(A)\, g(B) \quad (11)$$

for all $A, B \subseteq Y$ with $A \cap B = \emptyset$, and for some $\lambda > -1$.

Because of the boundary condition $g(Y) = 1$, $\lambda$ is determined by solving the polynomial equation

$$\lambda + 1 = \prod_{i=1}^{n} (1 + \lambda g^i).$$

Let $h : Y \to [0, 1]$ be a fuzzy subset of $Y = \{y_1, \ldots, y_n\}$ and use the notation $A_i = \{y_i, y_{i+1}, \ldots, y_n\}$. For $g$ being a $g_\lambda$-fuzzy measure, the values of $g(A_i)$ can be determined recursively as

$$g(A_1) = g(\{y_1\}) = g^1,$$
$$g(A_i) = g^i + g(A_{i-1}) + \lambda\, g^i\, g(A_{i-1}), \quad 1 < i \le n. \quad (12)$$

Based on the $g_\lambda$-fuzzy measure, suppose $h(y_1) \ge h(y_2) \ge \cdots \ge h(y_n)$; then the so-called fuzzy integral $e$ can be computed by

$$e = \max_{i} \big[ \min\big( h(y_i), g(A_i) \big) \big]. \quad (13)$$

In our setting, $Y$ is the set of component LS-SVMs, $g^i$ is the importance degree assigned to the $i$th classifier, $h(y_i)$ is that classifier's probabilistic support for a given class, and the class with the largest fuzzy integral value $e$ is taken as the final decision.
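A minimal sketch of computing the fuzzy integral (11)-(13), assuming Python with NumPy and SciPy; the root-finding brackets assume fuzzy densities $g^i$ strictly inside $(0, 1)$, and the density values used in practice would come from Step 4 of the algorithm below:

```python
# Minimal Sugeno fuzzy integral sketch over n information sources.
# Assumes NumPy and SciPy; densities g^i must lie in (0, 1).
import numpy as np
from scipy.optimize import brentq

def solve_lambda(g):
    """Root of lam + 1 = prod_i (1 + lam * g_i) with lam > -1, lam != 0."""
    f = lambda lam: np.prod(1.0 + lam * np.asarray(g)) - (lam + 1.0)
    s = float(np.sum(g))
    if np.isclose(s, 1.0):
        return 0.0                            # measure is already additive
    if s > 1.0:
        return brentq(f, -1.0 + 1e-9, -1e-9)  # unique root in (-1, 0)
    return brentq(f, 1e-9, 1e9)               # unique root in (0, inf)

def sugeno_integral(h, g):
    """h: evidence h(y_i) per source; g: fuzzy densities g^i."""
    h, g = np.asarray(h, float), np.asarray(g, float)
    lam = solve_lambda(g)
    order = np.argsort(h)[::-1]               # ensure h(y_1) >= ... >= h(y_n)
    h, g = h[order], g[order]
    gA = np.empty_like(g)
    gA[0] = g[0]                              # g(A_1) = g^1
    for i in range(1, len(g)):                # recursion of Eq. (12)
        gA[i] = g[i] + gA[i - 1] + lam * g[i] * gA[i - 1]
    return float(np.max(np.minimum(h, gA)))   # Eq. (13)
```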

4.3. The algorithm for the LS-SVMs ensemble

The whole process of the LS-SVMs ensemble is described as follows. A compact sketch of Steps 1-6 for the two-class case is given below.
Step 1: Generate replicated training data sets via boosting from the original training set according to Section 4.1.
Step 2: Train each component LS-SVM using its replicated training data set.
Step 3: Obtain the probabilistic outputs model of each component LS-SVM according to Sections 3.1 and 3.2.
Step 4: Assign the $g^i$ values, the degrees of importance of the component LS-SVMs, based on how well each LS-SVM performed on the training data.
Step 5: Obtain the local decision of each component LS-SVM when given a new test example.
Step 6: Aggregate all the component LS-SVMs by fuzzy integral to get the final decision according to Section 4.2.
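The following Python sketch (assuming NumPy and reusing the illustrative helpers lssvm_fit, lssvm_decision, fit_sigmoid, posterior, and sugeno_integral from the earlier sketches) strings Steps 1-6 together for the two-class case; the doubling/halving weight update and the accuracy-based density assignment in Step 4 are simple placeholder choices, not the exact rules of our experiments:

```python
# Ensemble sketch (two-class case): boosting-style resampling, per-model
# Platt sigmoids, and fuzzy-integral fusion. Reuses the earlier helper
# functions; the weight update rule is one illustrative choice.
import numpy as np

def train_ensemble(X, y, K=5, l_sub=100, rng=None):
    """X: (l, n) inputs; y: labels in {-1, +1}."""
    rng = rng or np.random.default_rng(0)
    l = len(y)
    p = np.full(l, 1.0 / l)                      # p_0(x_i) = 1/l
    models = []
    for _ in range(K):
        idx = rng.choice(l, size=l_sub, replace=True, p=p)  # TRboost_k
        alpha, b = lssvm_fit(X[idx], y[idx])                # Step 2
        f_all = lssvm_decision(X[idx], alpha, b, X)
        A, B = fit_sigmoid(f_all, y)                        # Step 3
        err = np.sign(f_all) != y                # errors on the whole set TR
        # Illustrative reweighting: raise weights of misclassified samples.
        p = p * np.where(err, 2.0, 0.5)
        p = p / p.sum()
        models.append((X[idx], alpha, b, A, B, 1.0 - err.mean()))
    acc = np.array([m[5] for m in models])
    g_dens = acc / (acc.sum() + 1.0)             # densities in (0, 1), Step 4
    return models, g_dens

def predict(models, g_dens, x):
    h = np.array([posterior(lssvm_decision(Xs, a, b, x[None, :]), A, B)[0]
                  for (Xs, a, b, A, B, _) in models])       # Step 5
    e_pos = sugeno_integral(h, g_dens)           # support for class +1
    e_neg = sugeno_integral(1.0 - h, g_dens)     # support for class -1
    return 1 if e_pos >= e_neg else -1           # Step 6
```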

[Figure 1. LS-SVMs ensemble scheme: the original training set is replicated via boosting into training sets 1, 2, ..., n; each replicated set trains a component LS-SVM with probabilistic outputs; the component outputs are fused by fuzzy integral into the final decision.]

Figure 1 above shows the proposed scheme of the LS-SVMs ensemble.

5. Experimental results

To evaluate the efficacy of the proposed boosting LS-SVM ensemble based on fuzzy integral, we performed two different classification experiments, breast cancer classification and statlog satimage data classification. The former is a two-class classification problem and the latter is a multi-class classification problem.

5.1. Breast cancer classification problem

The breast cancer data set was obtained from the UCI benchmark repository [12]. It is a binary problem with 9 input dimensions and 286 instances. In this experiment, 200 instances were used for training and the rest for testing. For boosting, we iteratively re-sampled 100 data samples with replacement according to the updated probability distribution over the training data set. We then trained each component LS-SVM independently over its replicated training data set and aggregated the trained LS-SVMs via fuzzy integral. Table 1 shows the test results.

Table 1. Experimental results on the breast cancer data set

    Algorithm                                Test accuracy (%)
    Single SVM                               75.66
    SVMs ensemble via majority voting        77.32
    LS-SVMs ensemble via majority voting     78.24
    LS-SVMs ensemble via fuzzy integral      83.65

5.2. Statlog satimage data classification

The satimage data set contains six classes. 4435 data samples are used as the training set and the remaining 2000 samples as the test set. For boosting, we randomly re-sampled 3000 data samples with replacement from the original training data set. We trained each component LS-SVM independently over its replicated training data set and aggregated the trained LS-SVMs via fuzzy integral. Table 2 shows the test results.

Table 2. Experimental results on the satimage data set

    Algorithm                                Test accuracy (%)
    Single SVM                               84.98
    SVMs ensemble via majority voting        86.25
    LS-SVMs ensemble via majority voting     87.32
    LS-SVMs ensemble via fuzzy integral      89.98

From Tables 1 and 2, we can see that the performance of the LS-SVMs ensemble based on fuzzy integral is better than that of the single SVM classifier and of the SVM and LS-SVM ensembles via majority voting. The traditional SVM and LS-SVM ensemble methods based on majority voting only predict the label and cannot give posterior probabilities, so the majority voting fusion strategy does not evaluate the importance degree of each component classifier's output to the final decision; we consider this the main reason why the accuracy of the LS-SVMs ensemble with the fuzzy integral fusion strategy is higher than that of the other methods.
6. Conclusions

This paper proposes a novel ensemble method based on fuzzy integral for LS-SVMs. The most important advantage of this methodology is that both the combination of the classification results and the importance degrees of the different component LS-SVM classifiers are taken into account. We evaluated the classification performance of the proposed method on two different classification problems, breast cancer classification and statlog satimage data classification; the simulation results show the effectiveness and efficiency of our method.
Acknowledgements

This work was partly supported by the Research Fund for the Doctoral Program of Higher Education of China (20050213010).
References
[1] Schölkopf, B., Smola, A., Learning with Kernels, MIT Press, Cambridge, MA, 2002.
[2] Cristianini, N. and Shawe-Taylor, J., An Introduction to Support Vector Machines, Cambridge Univ. Press, 2000.
[3] Vapnik, V., Statistical Learning Theory, New York: Wiley, 1998.
[4] Suykens, J.A.K., De Brabanter, J., "Weighted least squares support vector machines: robustness and sparse approximation", Neurocomputing, Vol. 48, pp. 85-105, 2002.
[5] Derbeko, P., El-Yaniv, R., Meir, R., "Variance optimized bagging", Proceedings of the 13th European Conference on Machine Learning, Helsinki, Finland, pp. 60-71, August 19-23, 2002.
[6] Valentini, G., Muselli, M., Ruffino, F., "Cancer recognition with bagged ensembles of support vector machines", Neurocomputing, Vol. 56 (1), pp. 461-466, 2004.
[7] Kim, H.C., Pang, S., Je, H.M., Kim, D., Bang, S.Y., "Constructing support vector machine ensemble", Pattern Recognition, Vol. 36, pp. 2757-2767, 2003.
[8] Burges, C.J., "A tutorial on support vector machines for pattern recognition", Data Mining and Knowledge Discovery, Vol. 2 (2), pp. 121-167, 1998.
[9] Platt, J., "Probabilistic outputs for support vector machines and comparison to regularized likelihood methods", in Advances in Large Margin Classifiers, Smola, A., Bartlett, P., Schölkopf, B., and Schuurmans, D., Eds., Cambridge, MA: MIT Press, 2000.
[10] Freund, Y., Schapire, R., "A decision-theoretic generalization of on-line learning and an application to boosting", J. Comput. System Sci., Vol. 55 (1), pp. 119-139, 1997.
[11] Kwak, K.C., Pedrycz, W., "Face recognition: a study in information fusion using fuzzy integral", Pattern Recognition Letters, Vol. 26, pp. 719-733, 2005.
[12] Blake, C., UCI Repository of Machine Learning Databases, Department of Information and Computer Science, University of California, Irvine, CA, http://www.ics.uci.edu/~mlearn/MLRepository.html
