
Proceedings of the Fifth International Conference on Machine Learning and Cybernetics, Dalian, 13-16 August 2006

CONSTRUCTING LEAST SQUARES SUPPORT VECTOR MACHINES ENSEMBLE BASED ON FUZZY INTEGRAL

CHUN-MEI LIU 1, LIANG-KUAN ZHU 2

1 College of Foundation Science, Harbin University of Commerce, Harbin, 150076, China
2 Department of Control Science and Engineering, Harbin Institute of Technology, Harbin, 150001, China
E-MAIL: zhuliangkuan@gmail.com

Abstract:
Although the SVM ensemble has been shown to improve classification performance greatly over a single support vector machine (SVM), the classification result of a practically implemented SVM ensemble is often far from the theoretically expected level, because such methods do not evaluate the degree of importance of each individual component SVM classifier's output to the final decision. This paper proposes a boosting least squares support vector machine (LS-SVM) ensemble method based on fuzzy integral to improve this limited classification performance. In general, the proposed method is built in three steps: construct the component LS-SVMs; obtain the probabilistic outputs model of each component LS-SVM; and combine the component predictions based on fuzzy integral. The trained individual LS-SVMs are aggregated to make the final decision. The simulation results demonstrate that the proposed LS-SVM ensemble with boosting outperforms a single SVM and the traditional SVM (or LS-SVM) ensemble technique via majority voting in terms of classification accuracy.

Keywords:
LS-SVM; SVM ensemble; Boosting; Fuzzy integral;
Information fusion

1. Introduction

The past decade witnessed the increasing popularity of support vector machines (SVMs) because of their ease of use, relatively high performance, and ability to deal with various problems, including classification, regression, and density estimation [1]. By now, SVMs have been successfully applied in many areas, such as face detection, handwritten digit recognition, and data mining. In SVMs, the classification problem is formulated and represented as a convex quadratic programming (QP) problem [2, 3]. Basically, the SVM classifier maps the inputs into a higher dimensional feature space in which a linear classifier is obtained by solving a finite-dimensional QP problem in the dual space, avoiding explicit knowledge of the high-dimensional mapping and using only the related kernel function.
The LS-SVM, introduced by Suykens [4], uses equality constraints instead of inequality constraints and a least squares error term in order to obtain a linear set of equations in the dual space, which is computationally attractive.
Despite the high performance of SVM and LS-SVM, researchers have sought to further improve them with ensemble techniques such as bagging or boosting [5, 6, 7]. However, none of these methods consider the degree of importance of the outputs of the component SVMs and LS-SVMs, which plays a significant role in the classification. To deal with this problem, we propose a support vector machines ensemble method based on the fuzzy integral fusion technique. The proposed method consists of three phases. First, we use the boosting technique to construct the component LS-SVMs; in boosting, the training samples for each individual SVM are chosen according to an updated probability distribution over the samples (related to the classification error). Second, we obtain the probabilistic outputs model of each individual component LS-SVM. Finally, we combine the component predictions based on fuzzy integral, in which the relative importance of the different individual component SVMs is considered. The fuzzy integral non-linearly combines objective evidence, in the form of a fuzzy membership function, with a subjective evaluation of the worth of the individual LS-SVMs with respect to the decision. The experimental results confirm the superiority of the presented method over the majority voting technique.
The rest of this paper is organized as follows. In Section 2, some basic notions of the LS-SVM are reviewed. In Section 3, probabilistic outputs models for LS-SVMs are provided. The boosting method for constructing the LS-SVM ensemble and the fuzzy integral for aggregating the LS-SVMs are described in Section 4. Section 5 presents experimental results on benchmark problems. Finally, a conclusion is drawn in Section 6.

2. SVM and LS-SVM for classification problems

2.1. SVM for classification

Given training data $T = \{(x_1, y_1), \ldots, (x_N, y_N)\}$, where $x_i \in \mathbb{R}^n$ is the input pattern and $y_i \in \{-1, 1\}$ is the class label for a two-class problem, SVM for classification attempts to find a classifier $f(x)$ which minimizes the expected misclassification rate. Training a linear classifier $f(x)$ with SVM is equivalent to solving the convex quadratic optimization problem

$$\min_{w, b, \xi} \; \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{N} \xi_i \quad (1)$$

subject to $y_i (\langle w, x_i \rangle + b) \ge 1 - \xi_i$ and $\xi_i \ge 0$,

where $C$ is called the regularization parameter and is used to balance the classifier's complexity against its classification accuracy on the training set $T$. This quadratic problem is generally solved through its dual formulation [2]. Simply replacing the vector inner product involved with a non-linear kernel function converts the linear SVM into a more flexible non-linear SVM, which is the essence of the famous kernel trick. Please refer to [1] for more details on SVM for classification.
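For illustration, the following minimal Python sketch (assuming scikit-learn is available; the synthetic data and the values of C and gamma are placeholders, not settings from our experiments) trains a non-linear soft-margin SVM in which C plays exactly the role of the regularization parameter in (1) and the RBF kernel supplies the kernel trick:

```python
# Minimal soft-margin SVM sketch; C and gamma are illustrative only.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic two-class data standing in for a real training set T.
X, y = make_classification(n_samples=300, n_features=9, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# C balances complexity against training accuracy, as in (1);
# the RBF kernel realizes the implicit high-dimensional mapping.
clf = SVC(C=1.0, kernel="rbf", gamma="scale")
clf.fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```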
2.2. LS-SVM for classification

The LS-SVM constructs a linear classification model, $f_{w,b}(x) = \mathrm{sgn}(\langle w, \phi(x) \rangle + b)$, in a high-dimensional feature space $F$ ($\phi : X \to F$) induced by a kernel function $H$, e.g. the Gaussian kernel $H(x, x') = \exp\{-\|x - x'\|^2 / \sigma^2\}$. The optimal values of the weight vector $w$ and the bias $b$ are given by the minimum of the objective function

$$W(w, b) = \frac{1}{2} \|w\|^2 + \frac{\lambda}{2} \sum_{i=1}^{l} \big( y_i - \langle w, \phi(x_i) \rangle - b \big)^2, \quad (2)$$

implementing a quadratic regularization of a sum-of-squares empirical risk. The solution of this problem can be written as an expansion in terms of the training patterns,

$$f(x) = \mathrm{sign}\Big( \sum_{i=1}^{l} \alpha_i H(x_i, x) + b \Big). \quad (3)$$

Suykens et al. [4] show that the optimal coefficients of this expansion, $(\alpha, b)$, are given by the solution of the system of linear equations

$$\begin{bmatrix} \Omega & \mathbf{1} \\ \mathbf{1}^T & 0 \end{bmatrix} \begin{bmatrix} \alpha \\ b \end{bmatrix} = \begin{bmatrix} y \\ 0 \end{bmatrix}, \quad (4)$$

where $\Omega = K + \lambda^{-1} I$, $K = \{k_{ij} = H(x_i, x_j)\}_{i,j=1}^{l}$, $\mathbf{1} = (1, 1, \ldots, 1)^T$, $y = (y_1, y_2, \ldots, y_l)^T$, and $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_l)^T$.

More details about LS-SVM can be found in [4].
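For illustration, a minimal NumPy sketch of LS-SVM training, obtained by solving the linear system (4) directly; the kernel width sigma and the regularization value lam are placeholder choices:

```python
# Minimal LS-SVM training sketch: solve the linear system (4) directly.
# Assumes NumPy; sigma and lam are illustrative hyperparameter values.
import numpy as np

def gaussian_kernel(X1, X2, sigma=1.0):
    # H(x, x') = exp(-||x - x'||^2 / sigma^2)
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / sigma**2)

def lssvm_fit(X, y, lam=10.0, sigma=1.0):
    l = len(y)
    K = gaussian_kernel(X, X, sigma)
    # Assemble the (l+1) x (l+1) system [[Omega, 1], [1^T, 0]] [alpha; b] = [y; 0]
    A = np.zeros((l + 1, l + 1))
    A[:l, :l] = K + np.eye(l) / lam           # Omega = K + lam^{-1} I
    A[:l, l] = 1.0
    A[l, :l] = 1.0
    rhs = np.append(y.astype(float), 0.0)
    sol = np.linalg.solve(A, rhs)
    return sol[:l], sol[l]                    # (alpha, b)

def lssvm_decision(X_train, alpha, b, X_test, sigma=1.0):
    # f(x) = sum_i alpha_i H(x_i, x) + b, as in (3)
    return gaussian_kernel(X_test, X_train, sigma) @ alpha + b
```

Solving (4) by direct elimination costs $O(l^3)$, which is the computational price of the otherwise simple linear-system formulation.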


3. Probabilistic outputs for SVMs

This section gives a brief overview of the notion of probabilistic outputs for SVMs [9].

3.1. Two-class case

Given training data $x_i \in \mathbb{R}^n$, $i = 1, \ldots, l$, labeled by $y_i \in \{-1, 1\}$, the binary support vector machine yields a decision function $f(x)$ such that $\mathrm{sgn}(f(x))$ is the prediction for any test sample $x$. Instead of merely predicting the label, many applications require a posterior class probability

$$P(y = 1 \mid x) = \frac{1}{1 + \exp(A f(x) + B)} \quad (5)$$

with parameters $A$ and $B$ [9]. To estimate the best values of $(A, B)$, any subset of the $l$ training data ($N_+$ of them with $y_i = 1$ and $N_-$ of them with $y_i = -1$) can be used to solve the following maximum likelihood problem:

$$\min_{z = (A, B)} F(z), \quad (6)$$

where

$$F(z) = -\sum_{i=1}^{l} \big( t_i \log(p_i) + (1 - t_i) \log(1 - p_i) \big),$$

$$p_i = \frac{1}{1 + \exp(A f_i + B)}, \quad f_i = f(x_i),$$

and

$$t_i = \begin{cases} \dfrac{N_+ + 1}{N_+ + 2}, & \text{if } y_i = 1, \\[2mm] \dfrac{1}{N_- + 2}, & \text{if } y_i = -1, \end{cases} \qquad i = 1, 2, \ldots, l. \quad (7)$$
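A minimal sketch of fitting the sigmoid parameters $(A, B)$ of (5)-(7), assuming Python with NumPy and SciPy; Platt [9] recommends a more robust model-trust Newton procedure, so this direct use of a general-purpose optimizer is only illustrative:

```python
# Minimal Platt-scaling sketch: fit (A, B) of Eq. (5) by minimizing
# the cross-entropy F(z) of Eqs. (6)-(7). Assumes NumPy and SciPy.
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

def fit_sigmoid(f, y):
    """f: decision values f(x_i); y: labels in {-1, +1}."""
    n_pos = int(np.sum(y == 1))
    n_neg = int(np.sum(y == -1))
    # Regularized targets t_i from Eq. (7)
    t = np.where(y == 1, (n_pos + 1.0) / (n_pos + 2.0), 1.0 / (n_neg + 2.0))

    def F(z):
        A, B = z
        p = np.clip(expit(-(A * f + B)), 1e-12, 1 - 1e-12)  # Eq. (5)
        return -np.sum(t * np.log(p) + (1 - t) * np.log(1 - p))

    # Platt's suggested starting point for B; minimize F over z = (A, B).
    z0 = np.array([0.0, np.log((n_neg + 1.0) / (n_pos + 1.0))])
    return minimize(F, z0).x                  # (A, B)

def posterior(f, A, B):
    return expit(-(A * f + B))                # P(y = 1 | x), Eq. (5)
```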

3.2. Multi-class case

A $K$-class classification problem can be efficiently solved by partitioning the original problem into a set of $K(K-1)/2$ two-class problems. Given the observation $x$ and the class label $y$, we assume that estimates $r_{ij}$ of the pairwise class probabilities $\mu_{ij} = P(y = i \mid y = i \text{ or } j, x)$ are available: from the $i$th and $j$th classes of a training set, we can obtain a model which, for any new $x$, calculates $r_{ij}$ as an approximation of $\mu_{ij}$. Then, using all the $r_{ij}$, the following algorithm in [9] can be used to estimate $p_i = P(y = i \mid x)$, $i = 1, \ldots, K$. Consider that

$$\sum_{j : j \ne i} P(y = i \text{ or } j \mid x) = (K - 2)\, P(y = i \mid x) + \sum_{j=1}^{K} P(y = j \mid x) = (K - 2)\, P(y = i \mid x) + 1. \quad (8)$$

Using

$$r_{ij} \approx \mu_{ij} = \frac{P(y = i \mid x)}{P(y = i \text{ or } j \mid x)}, \quad (9)$$

we can obtain

$$p_i = \frac{1}{\sum_{j : j \ne i} \dfrac{1}{r_{ij}} - (K - 2)}. \quad (10)$$
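A minimal NumPy sketch of the estimate (10); `R` holds the pairwise probabilities $r_{ij}$ (its diagonal is unused), and the final renormalization is a pragmatic safeguard, not part of the derivation above:

```python
# Minimal pairwise-coupling sketch: recover class posteriors p_i from
# pairwise probabilities r_ij via Eq. (10). Assumes NumPy.
import numpy as np

def couple_pairwise(R):
    """R: K x K matrix with R[i, j] ~ P(y = i | y = i or j, x)."""
    K = R.shape[0]
    p = np.empty(K)
    for i in range(K):
        others = [j for j in range(K) if j != i]
        p[i] = 1.0 / (np.sum(1.0 / R[i, others]) - (K - 2))
    return p / p.sum()  # renormalize so the estimates sum to one

# Example with K = 3 pairwise probabilities:
R = np.array([[0.0, 0.7, 0.6],
              [0.3, 0.0, 0.4],
              [0.4, 0.6, 0.0]])
print(couple_pairwise(R))  # approximately (0.48, 0.21, 0.32)
```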

4. LS-SVM ensemble

4.1. Boosting for constructing the LS-SVM ensemble

The representative boosting algorithm is AdaBoost [10]. Given a training set $TR = \{(x_i, y_i) \mid i = 1, 2, \ldots, l\}$ consisting of $l$ samples, each sample in $TR$ is initially assigned the same weight value $p_0(x_i) = 1/l$. For training the $k$th SVM classifier, we build a set of training samples $TR_{boost}^{k} = \{(x_i, y_i) \mid i = 1, 2, \ldots, l'\}$, obtained by selecting $l'\,(< l)$ samples from the whole data set $TR$ according to the weight values $p_{k-1}(x_i)$ of the $(k-1)$th iteration. This training sample set is used for training the $k$th SVM classifier. Then we evaluate the classification performance of the $k$th trained SVM classifier on the whole training set $TR$ and obtain the updated weight values $p_k(x_i)$ of the training samples in $TR$ from their classification errors: the weight values of the incorrectly classified samples are increased while those of the correctly classified samples are decreased, which implies that samples that are hard to classify are selected more frequently. The updated weight values are used for building the training samples $TR_{boost}^{k+1} = \{(x_i, y_i) \mid i = 1, 2, \ldots, l'\}$ of the $(k+1)$th SVM classifier. This sampling procedure is repeated until $K$ training sample sets have been built for the $K$ SVM classifiers.

4.2. Fusing LS-SVMs based on fuzzy integral

The fuzzy integral and the associated fuzzy measures provide a useful way of aggregating information. In the following, we briefly introduce the basic theory of the fuzzy integral [11].

A set function $g : P(Y) \to [0, 1]$ is called a fuzzy measure if the following conditions are satisfied:
1. $g(\emptyset) = 0$, $g(Y) = 1$;
2. $g(A) \le g(B)$ if $A \subseteq B$ and $A, B \in P(Y)$.

Following this definition, Sugeno (1977) introduced the so-called $g_\lambda$-fuzzy measure, which comes with the additional property

$$g(A \cup B) = g(A) + g(B) + \lambda\, g(A)\, g(B) \quad (11)$$

for all $A, B \subseteq Y$ with $A \cap B = \emptyset$, and for some $\lambda > -1$.

Because of the boundary condition $g(Y) = 1$, $\lambda$ is determined by solving the polynomial equation

$$\lambda + 1 = \prod_{i=1}^{n} (1 + \lambda g^i).$$

Let $h : Y \to [0, 1]$ be a fuzzy subset of $Y = \{y_1, \ldots, y_n\}$ and use the notation $A_i = \{y_i, y_{i+1}, \ldots, y_n\}$. For $g$ being a $g_\lambda$-fuzzy measure, the values of $g(A_i)$ can be determined recursively as

$$g(A_1) = g(\{y_1\}) = g^1,$$
$$g(A_i) = g^i + g(A_{i-1}) + \lambda\, g^i\, g(A_{i-1}), \quad 1 < i \le n. \quad (12)$$

Based on the $g_\lambda$-fuzzy measure, suppose $h(y_1) \ge h(y_2) \ge \cdots \ge h(y_n)$; then the so-called fuzzy integral $e$ can be computed by

$$e = \max_{i} \big[ \min\big( h(y_i), g(A_i) \big) \big]. \quad (13)$$

In our setting, $Y$ is the set of component LS-SVMs, $g^i$ is the importance degree assigned to the $i$th classifier, $h(y_i)$ is that classifier's probabilistic support for a given class, and the class with the largest fuzzy integral value $e$ is taken as the final decision.
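A minimal sketch of computing the fuzzy integral (11)-(13), assuming Python with NumPy and SciPy; the root-finding brackets assume fuzzy densities $g^i$ strictly inside $(0, 1)$, and the density values used in practice would come from Step 4 of the algorithm below:

```python
# Minimal Sugeno fuzzy integral sketch over n information sources.
# Assumes NumPy and SciPy; densities g^i must lie in (0, 1).
import numpy as np
from scipy.optimize import brentq

def solve_lambda(g):
    """Root of lam + 1 = prod_i (1 + lam * g_i) with lam > -1, lam != 0."""
    f = lambda lam: np.prod(1.0 + lam * np.asarray(g)) - (lam + 1.0)
    s = float(np.sum(g))
    if np.isclose(s, 1.0):
        return 0.0                            # measure is already additive
    if s > 1.0:
        return brentq(f, -1.0 + 1e-9, -1e-9)  # unique root in (-1, 0)
    return brentq(f, 1e-9, 1e9)               # unique root in (0, inf)

def sugeno_integral(h, g):
    """h: evidence h(y_i) per source; g: fuzzy densities g^i."""
    h, g = np.asarray(h, float), np.asarray(g, float)
    lam = solve_lambda(g)
    order = np.argsort(h)[::-1]               # ensure h(y_1) >= ... >= h(y_n)
    h, g = h[order], g[order]
    gA = np.empty_like(g)
    gA[0] = g[0]                              # g(A_1) = g^1
    for i in range(1, len(g)):                # recursion of Eq. (12)
        gA[i] = g[i] + gA[i - 1] + lam * g[i] * gA[i - 1]
    return float(np.max(np.minimum(h, gA)))   # Eq. (13)
```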

4.3. The algorithm for the LS-SVMs ensemble

The whole process of the LS-SVMs ensemble is described as follows. A compact sketch of Steps 1-6 for the two-class case is given below.
Step 1: Generate replicated training data sets via boosting from the original training set according to Section 4.1.
Step 2: Train each component LS-SVM using its replicated training data set.
Step 3: Obtain the probabilistic outputs model of each component LS-SVM according to Sections 3.1 and 3.2.
Step 4: Assign the $g^i$ values, the degrees of importance of the component LS-SVMs, based on how well each LS-SVM performed on the training data.
Step 5: Obtain the local decision of each component LS-SVM when given a new test example.
Step 6: Aggregate all the component LS-SVMs by fuzzy integral to get the final decision according to Section 4.2.
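The following Python sketch (assuming NumPy and reusing the illustrative helpers lssvm_fit, lssvm_decision, fit_sigmoid, posterior, and sugeno_integral from the earlier sketches) strings Steps 1-6 together for the two-class case; the doubling/halving weight update and the accuracy-based density assignment in Step 4 are simple placeholder choices, not the exact rules of our experiments:

```python
# Ensemble sketch (two-class case): boosting-style resampling, per-model
# Platt sigmoids, and fuzzy-integral fusion. Reuses the earlier helper
# functions; the weight update rule is one illustrative choice.
import numpy as np

def train_ensemble(X, y, K=5, l_sub=100, rng=None):
    """X: (l, n) inputs; y: labels in {-1, +1}."""
    rng = rng or np.random.default_rng(0)
    l = len(y)
    p = np.full(l, 1.0 / l)                      # p_0(x_i) = 1/l
    models = []
    for _ in range(K):
        idx = rng.choice(l, size=l_sub, replace=True, p=p)  # TRboost_k
        alpha, b = lssvm_fit(X[idx], y[idx])                # Step 2
        f_all = lssvm_decision(X[idx], alpha, b, X)
        A, B = fit_sigmoid(f_all, y)                        # Step 3
        err = np.sign(f_all) != y                # errors on the whole set TR
        # Illustrative reweighting: raise weights of misclassified samples.
        p = p * np.where(err, 2.0, 0.5)
        p = p / p.sum()
        models.append((X[idx], alpha, b, A, B, 1.0 - err.mean()))
    acc = np.array([m[5] for m in models])
    g_dens = acc / (acc.sum() + 1.0)             # densities in (0, 1), Step 4
    return models, g_dens

def predict(models, g_dens, x):
    h = np.array([posterior(lssvm_decision(Xs, a, b, x[None, :]), A, B)[0]
                  for (Xs, a, b, A, B, _) in models])       # Step 5
    e_pos = sugeno_integral(h, g_dens)           # support for class +1
    e_neg = sugeno_integral(1.0 - h, g_dens)     # support for class -1
    return 1 if e_pos >= e_neg else -1           # Step 6
```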

[Figure 1. LS-SVMs ensemble scheme: the original training set is replicated via boosting into training sets 1, 2, ..., n; each replicated set trains a component LS-SVM with probabilistic outputs; the component outputs are fused by fuzzy integral into the final decision.]

Figure 1 above shows the proposed scheme of the LS-SVMs ensemble.

5. Experimental results

To evaluate the efficacy of the proposed boosting LS-SVM ensemble based on fuzzy integral, we performed two different classification experiments, breast cancer classification and statlog satimage data classification. The former is a two-class classification problem and the latter is a multi-class classification problem.

5.1. Breast cancer classification problem

The breast cancer data set was obtained from the UCI benchmark repository [12]. It is a binary problem with 9 input dimensions and 286 instances. In this experiment, 200 instances were used for training and the rest for testing. For boosting, we iteratively re-sampled 100 data samples with replacement according to the updated probability distribution over the training data set. We then trained each component LS-SVM independently over its replicated training data set and aggregated the trained LS-SVMs via fuzzy integral. Table 1 shows the test results.

Table 1. Experimental results on the breast cancer data set

    Algorithm                                Test accuracy (%)
    Single SVM                               75.66
    SVMs ensemble via majority voting        77.32
    LS-SVMs ensemble via majority voting     78.24
    LS-SVMs ensemble via fuzzy integral      83.65

5.2. Statlog satimage data classification

The satimage data set contains six classes. 4435 data samples are used as the training set and the remaining 2000 samples as the test set. For boosting, we randomly re-sampled 3000 data samples with replacement from the original training data set. We trained each component LS-SVM independently over its replicated training data set and aggregated the trained LS-SVMs via fuzzy integral. Table 2 shows the test results.

Table 2. Experimental results on the satimage data set

    Algorithm                                Test accuracy (%)
    Single SVM                               84.98
    SVMs ensemble via majority voting        86.25
    LS-SVMs ensemble via majority voting     87.32
    LS-SVMs ensemble via fuzzy integral      89.98

From Tables 1 and 2, we can see that the performance of the LS-SVMs ensemble based on fuzzy integral is better than that of the single SVM classifier and of the SVM and LS-SVM ensembles via majority voting. The traditional SVM and LS-SVM ensemble methods based on majority voting only predict the label and cannot give posterior probabilities, so the majority voting fusion strategy does not evaluate the importance degree of each component classifier's output to the final decision; we consider this the main reason why the accuracy of the LS-SVMs ensemble with the fuzzy integral fusion strategy is higher than that of the other methods.
6. Conclusions

This paper proposes a novel ensemble method based on fuzzy integral for LS-SVMs. The most important advantage of this methodology is that both the combination of the classification results and the importance degrees of the different component LS-SVM classifiers are taken into account. We evaluated the classification performance of the proposed method on two different classification problems, breast cancer classification and statlog satimage data classification; the simulation results show the effectiveness and efficiency of our method.
Acknowledgements

This work was partly supported by the Research Fund for the Doctoral Program of Higher Education of China (20050213010).
References
[1] Schölkopf, B., Smola, A., Learning with Kernels, MIT Press, Cambridge, MA, 2002.
[2] Cristianini, N. and Shawe-Taylor, J., An Introduction to Support Vector Machines, Cambridge Univ. Press, 2000.
[3] Vapnik, V., Statistical Learning Theory, New York: Wiley, 1998.
[4] Suykens, J.A.K., De Brabanter, J., "Weighted least squares support vector machines: robustness and sparse approximation", Neurocomputing, Vol. 48, pp. 85-105, 2002.
[5] Derbeko, P., El-Yaniv, R., Meir, R., "Variance optimized bagging", Proceedings of the 13th European Conference on Machine Learning, Helsinki, Finland, pp. 60-71, August 19-23, 2002.
[6] Valentini, G., Muselli, M., Ruffino, F., "Cancer recognition with bagged ensembles of support vector machines", Neurocomputing, Vol. 56 (1), pp. 461-466, 2004.
[7] Kim, H.C., Pang, S., Je, H.M., Kim, D., Bang, S.Y., "Constructing support vector machine ensemble", Pattern Recognition, Vol. 36, pp. 2757-2767, 2003.
[8] Burges, C.J., "A tutorial on support vector machines for pattern recognition", Data Mining and Knowledge Discovery, Vol. 2 (2), pp. 121-167, 1998.
[9] Platt, J., "Probabilistic outputs for support vector machines and comparison to regularized likelihood methods", in Advances in Large Margin Classifiers, Smola, A., Bartlett, P., Schölkopf, B., and Schuurmans, D., Eds., Cambridge, MA: MIT Press, 2000.
[10] Freund, Y., Schapire, R., "A decision-theoretic generalization of on-line learning and an application to boosting", J. Comput. System Sci., Vol. 55 (1), pp. 119-139, 1997.
[11] Kwak, K.C., Pedrycz, W., "Face recognition: a study in information fusion using fuzzy integral", Pattern Recognition Letters, Vol. 26, pp. 719-733, 2005.
[12] Blake, C., UCI Repository of Machine Learning Databases, Department of Information and Computer Science, University of California, Irvine, CA, http://www.ics.uci.edu/~mlearn/MLRepository.html
