2. SVM in CBIR RF

SVM [11,12] is a very effective binary classification algorithm. Consider a linearly separable binary classification problem

$\{(x_i, y_i)\}_{i=1}^{N}, \quad y_i \in \{+1, -1\},$   (1)

where x_i is an n-dimensional vector and y_i is the label of the class that the vector belongs to. SVM separates the two classes of points by a hyper-plane,

$w^T x + b = 0,$   (2)

where x is an input vector, w is an adaptive weight vector, and b is a bias. SVM finds the parameters w and b of the optimal hyper-plane by maximizing the geometric margin 2/||w||, subject to

$y_i (w^T x_i + b) \ge +1.$   (3)

The solution can be found through a Wolfe dual problem with Lagrangian multipliers $\alpha_i$:

$Q(\alpha) = \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{m} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j),$   (4)

subject to $\alpha_i \ge 0$ and $\sum_{i=1}^{m} \alpha_i y_i = 0$.

In the dual form, the data points appear only through inner products. To obtain a potentially better representation of the data, the points are mapped into a Hilbert inner-product space through the replacement

$x_i \cdot x_j \rightarrow \phi(x_i) \cdot \phi(x_j) = K(x_i, x_j),$   (5)

where K(.,.) is a kernel function. We then obtain the kernel version of the Wolfe dual problem:

$Q(\alpha) = \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{m} \alpha_i \alpha_j y_i y_j K(x_i, x_j).$   (6)

Thus, for a given kernel function, the SVM classifier is given by

$F(x) = \mathrm{sgn}(f(x)),$   (7)

where $f(x) = \sum_{i=1}^{l} \alpha_i y_i K(x_i, x) + b$ is the output hyper-plane decision function of the SVM.

In general, when f(x) is large for a given pattern, the corresponding prediction confidence is high. Conversely, a small f(x) means the pattern is close to the decision boundary and its prediction confidence is low. Consequently, in traditional SVM based CBIR RF the output of the SVM, f(x), has been used to measure the dissimilarity [5,6] between a given pattern and the query image.
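To make the ranking step concrete, the following is a minimal sketch (not the authors' implementation) of how the decision value f(x) can be used to order database images in one feedback round; the use of scikit-learn, the RBF kernel, and the helper name rank_by_svm_output are assumptions made only for illustration:

import numpy as np
from sklearn.svm import SVC

def rank_by_svm_output(pos_feats, neg_feats, db_feats):
    # Train an RBF SVM on the current feedback samples and rank the whole
    # database by the decision value f(x); a larger f(x) is treated as more
    # relevant (i.e. a smaller dissimilarity to the query concept).
    X = np.vstack([pos_feats, neg_feats])
    y = np.hstack([np.ones(len(pos_feats)), -np.ones(len(neg_feats))])
    svm = SVC(kernel="rbf", gamma="scale").fit(X, y)
    scores = svm.decision_function(db_feats)   # f(x) for every database image
    return np.argsort(-scores)                 # indices, most relevant first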
3. CCL for SVMs

To address the three problems of SVM RF described in the introduction, we propose three algorithms in this section.

3.1. Asymmetric Bagging SVM

The Bagging strategy [8] combines the benefits of bootstrapping and aggregation. Multiple classifiers are generated by training on multiple sample sets produced by bootstrapping, i.e. random sampling with replacement from the training samples. The generated classifiers are then aggregated, for instance by the majority voting rule (MVR) [10]. Experimental and theoretical results have shown that Bagging can significantly improve a good but unstable classifier [8], which is exactly the situation in the first problem of SVM based RF. However, directly using Bagging in SVM RF is not appropriate, since we have only a very small number of positive feedback samples. To overcome this problem we develop a novel asymmetric Bagging strategy: bootstrapping is executed only on the negative feedbacks, since there are far more negative feedbacks than positive feedbacks. In this way each generated classifier is trained on a balanced number of positive and negative samples, which solves the second problem as well. The Asymmetric Bagging SVM (ABSVM) algorithm is described in Table 1.

Table 1: Algorithm of Asymmetric Bagging SVM.
Input: positive training set S^+, negative training set S^-, weak classifier I (SVM), integer T (number of generated classifiers), test sample x.
1. For i = 1 to T {
2.   S_i^- = bootstrap sample from S^-, with |S_i^-| = |S^+|.
3.   C_i = I(S_i^-, S^+)
4. }
5. C*(x) = aggregation{ C_i(x, S_i^-, S^+), 1 <= i <= T }.
Output: classifier C*.

In ABSVM, the aggregation is implemented by the majority voting rule (MVR). The asymmetric Bagging strategy solves the classifier instability problem and the unbalanced training set problem, but it cannot solve the small sample size problem; we address that with the Random Subspace Method (RSM) in the next subsection.
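The Table 1 procedure can be sketched in a few lines. The code below is a hedged illustration under the assumption of scikit-learn SVMs; the helper names absvm_train and absvm_predict_mvr are hypothetical, not the authors' code:

import numpy as np
from sklearn.svm import SVC

def absvm_train(pos_feats, neg_feats, T=10, seed=0):
    # Bootstrap only the negative feedbacks, each time down to |S+| samples,
    # and train one SVM per balanced replica (steps 1-4 of Table 1).
    rng = np.random.default_rng(seed)
    classifiers = []
    for _ in range(T):
        idx = rng.choice(len(neg_feats), size=len(pos_feats), replace=True)
        X = np.vstack([pos_feats, neg_feats[idx]])
        y = np.hstack([np.ones(len(pos_feats)), -np.ones(len(pos_feats))])
        classifiers.append(SVC(kernel="rbf", gamma="scale").fit(X, y))
    return classifiers

def absvm_predict_mvr(classifiers, db_feats):
    # Step 5 of Table 1: aggregate the T weak SVMs by the majority voting rule.
    votes = np.stack([c.predict(db_feats) for c in classifiers])
    return np.sign(votes.sum(axis=0) + 1e-9)   # +1 relevant, -1 irrelevant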
3.2. Random Subspace Method SVM

Similar to Bagging, RSM [9] also benefits from bootstrapping and aggregation. However, unlike Bagging, which bootstraps the training samples, RSM performs the bootstrapping in the feature space. For SVM based RF, over-fitting happens when the training set is small relative to the high dimensionality of the feature vector. To avoid over-fitting, we sample a small subset of the features and thereby reduce the discrepancy between the training set size and the feature vector length. Using such random sampling we construct multiple SVMs, each free of the over-fitting problem, and then combine them into a more powerful classifier. Thus the over-fitting problem is solved. The RSM based SVM (RSVM) algorithm is described in Table 2.

Table 2: Algorithm of RSM SVM.
Input: feature set F, weak classifier I (SVM), integer T (number of generated classifiers), test sample x.
1. For i = 1 to T {
2.   F_i = bootstrap feature subset from F.
3.   C_i = I(F_i)
4. }
5. C*(x) = aggregation{ C_i(x, F_i), 1 <= i <= T }.
Output: classifier C*.
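A rough sketch of the Table 2 procedure follows; the use of scikit-learn, the subspace size n_sub, and sampling features without replacement are assumptions made for illustration only:

import numpy as np
from sklearn.svm import SVC

def rsvm_train(X, y, n_sub, T=5, seed=0):
    # Train T weak SVMs, each on a random subset of n_sub features, so that
    # the dimensionality seen by each SVM stays close to the sample size.
    rng = np.random.default_rng(seed)
    committee = []
    for _ in range(T):
        feats = rng.choice(X.shape[1], size=n_sub, replace=False)
        committee.append((feats, SVC(kernel="rbf", gamma="scale").fit(X[:, feats], y)))
    return committee

def rsvm_predict_mvr(committee, X_test):
    # Majority voting over the T feature-subspace SVMs.
    votes = np.stack([svm.predict(X_test[:, feats]) for feats, svm in committee])
    return np.sign(votes.sum(axis=0) + 1e-9)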
3.3. Asymmetric Bagging RSM SVM

Since the asymmetric Bagging method can overcome the first two problems of SVM RF and the RSM can overcome the third, we should be able to integrate the two methods to solve all three problems together. We therefore propose the Asymmetric Bagging RSM SVM (ABRSVM) to combine the two. The algorithm is described in Table 3.

Table 3: Algorithm of Asymmetric Bagging RSM SVM.
Input: positive training set S^+, negative training set S^-, feature set F, weak classifier I (SVM), integer T_s (number of Bagging classifiers), integer T_f (number of RSM classifiers), test sample x.
1. For j = 1 to T_s {
2.   S_j^- = bootstrap sample from S^-.
3.   For i = 1 to T_f {
4.     F_i = bootstrap feature subset from F.
5.     C_{i,j} = I(F_i, S_j^-, S^+).
6.   }
7. }
8. C*(x) = aggregation{ C_{i,j}(x, F_i, S_j^-, S^+), 1 <= i <= T_f, 1 <= j <= T_s }.
Output: classifier C*.

To explain why the Bagging RSM strategy works, we derive a proof following the discussion of Bagging in [8]. Let (y, x) be a data sample in the training set L with feature vector F, where y is the class label of the sample x. L is drawn from the probability distribution P. Suppose $\varphi(x, L, F)$ is the simple predictor (classifier) constructed by the Bagging RSM strategy, and the aggregated predictor is $\varphi_A(x, P) = E_F E_L \varphi(x, L, F)$.

Let the random variables (Y, X) be drawn from the distribution P, independently of the training set L. The average prediction error of the simple predictor is $e_a = E_F E_L E_{Y,X} (Y - \varphi(X, L, F))^2$, and the corresponding error of the aggregated predictor is

$e_A = E_{Y,X} (Y - \varphi_A(X, P))^2.$   (8)

Using the inequality $\frac{1}{M} \sum_{j=1}^{M} \frac{1}{N} \sum_{i=1}^{N} z_{ij}^2 \ge \left( \frac{1}{M} \sum_{j=1}^{M} \frac{1}{N} \sum_{i=1}^{N} z_{ij} \right)^2$, we have

$E_F E_L \varphi^2(X, L, F) \ge \left( E_F E_L \varphi(X, L, F) \right)^2,$   (9)

$E_{Y,X} E_F E_L \varphi^2(X, L, F) \ge E_{Y,X} \varphi_A^2(X, P).$   (10)

Thus,

$e_a = E_{Y,X} Y^2 - 2 E_{Y,X} Y \varphi_A + E_{Y,X} E_F E_L \varphi^2(X, L, F) \ge E_{Y,X} (Y - \varphi_A)^2 = e_A.$   (11)

Therefore, the prediction error of the aggregated method is reduced. The inequality also shows that the more diverse the individual predictors $\varphi(x, L, F)$ are, the more accurate the aggregated predictor becomes. In CBIR RF, the SVM classifier is unstable with respect to both the training features and the training samples, so the Bagging RSM strategy can improve the performance.

Here we make the assumption that the average performance of the individual classifiers $\varphi(x, L, F)$, each trained on a feature subset and a training set replica, is similar to that of a classifier trained on the full feature set and the whole training set. This holds when the feature and training-data subsets are large enough to approximate the distributions of the full sets. Even when it does not hold, the drop in accuracy of each simple classifier may be well compensated in the aggregation process.
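Inequality (9) is simply the statement that a mean of squares dominates the squared mean. The short numeric check below is an illustrative aside, not part of the paper:

import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(size=(20, 50))     # rows ~ feature subsets, columns ~ training replicas
lhs = np.mean(z ** 2)             # average squared output of the individual predictors
rhs = np.mean(z) ** 2             # squared output of the aggregated (averaged) predictor
print(lhs >= rhs)                 # always True; the gap grows with predictor diversity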
Since the Bagging RSM strategy can generate more diversified classifiers than using Bagging or RSM alone, it should outperform both. To achieve maximum diversity, we choose to combine all generated classifiers in parallel, as shown in Figure 1; this is better than applying Bagging first and then RSM, or RSM first and then Bagging. The resulting set of weak classifiers is

$\{ C_{ij} = C(F_i, S_j) \mid 1 \le i \le T_f, 1 \le j \le T_s \}.$   (12)

Then, an aggregation rule is used to integrate the results of all the weak classifiers for the final classification of a sample as relevant or irrelevant.
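A hedged sketch of the Table 3 procedure, combining the two previous sketches, might look as follows; scikit-learn and the hypothetical helper names abrsvm_train and abrsvm_classify are again assumptions:

import numpy as np
from sklearn.svm import SVC

def abrsvm_train(pos, neg, n_sub, T_s=5, T_f=5, seed=0):
    # For each of the T_s negative bootstraps (asymmetric Bagging), train T_f
    # SVMs on random feature subsets (RSM), giving T_s * T_f weak classifiers C_ij.
    rng = np.random.default_rng(seed)
    committee = []
    for _ in range(T_s):
        idx = rng.choice(len(neg), size=len(pos), replace=True)
        X = np.vstack([pos, neg[idx]])
        y = np.hstack([np.ones(len(pos)), -np.ones(len(pos))])
        for _ in range(T_f):
            feats = rng.choice(X.shape[1], size=n_sub, replace=False)
            committee.append((feats, SVC(kernel="rbf", gamma="scale").fit(X[:, feats], y)))
    return committee

def abrsvm_classify(committee, X_test):
    # Aggregate all T_s * T_f weak classifiers in parallel, e.g. by majority voting.
    votes = np.stack([svm.predict(X_test[:, feats]) for feats, svm in committee])
    return np.sign(votes.sum(axis=0) + 1e-9)   # relevant (+1) / irrelevant (-1)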
For the color feature, the three color components are quantized into 8, 8, and 4 bins respectively. Texture is extracted from the Y component in the YCrCb space by a pyramid wavelet transform (PWT) with the Haar wavelet; the mean value and standard deviation are calculated for each sub-band at each decomposition level, giving a feature length of 2 × 4 × 3 = 24. For the shape feature, an edge histogram [14] is calculated on the Y component of the YCrCb color space, with edges grouped into four categories: horizontal, 45-degree diagonal, vertical, and 135-degree diagonal. We combine the color, texture, and shape features into a single feature vector, and then normalize each feature to a normal distribution.
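As an illustrative aside (not the authors' implementation), the color and texture parts of such a feature vector could be computed roughly as below; the HSV interpretation of the 8/8/4 quantization, the three-level decomposition, and the OpenCV/PyWavelets calls are assumptions:

import numpy as np
import cv2
import pywt

def color_texture_feature(bgr_img):
    # Joint 8 x 8 x 4 color histogram (assumed HSV), L1-normalized.
    hsv = cv2.cvtColor(bgr_img, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1, 2], None, [8, 8, 4],
                        [0, 180, 0, 256, 0, 256]).flatten()
    hist /= hist.sum() + 1e-9

    # Pyramid wavelet texture on the Y channel: mean and standard deviation of
    # the four sub-bands at each of three decomposition levels (2 x 4 x 3 values).
    y = cv2.cvtColor(bgr_img, cv2.COLOR_BGR2YCrCb)[:, :, 0].astype(float)
    texture = []
    approx = y
    for _ in range(3):
        approx, (h, v, d) = pywt.dwt2(approx, "haar")
        for band in (approx, h, v, d):
            texture.append(band.mean())
            texture.append(band.std())
    return np.hstack([hist, np.array(texture)])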
In the curves, 0 feedback refers to retrieval based on the Euclidean distance measure without RF.

The first experiment evaluates ABSVM with different numbers of SVMs, compared against the original SVM based RF [5] and the constrained SVM (CSVM) based RF. The results show that the number of SVMs does not noticeably affect the precision.

[Figure: precision and standard deviation of ABSVM versus the number of SVMs (5 to 25), plotted for the Top 10 through Top 80 results.]

The second experiment compares the RSVM using 5 weak SVM classifiers with the standard SVM and CSVM based RF. The experimental results are also shown in Figure 5. The results demonstrate that 5 weak SVMs are sufficient.
6. Conclusion

In this paper, we design a new Asymmetric Bagging Random Subspace Method for SVM based RF. The proposed algorithm effectively addresses the classifier instability problem, the unbalanced training set problem, and the small sample size problem. Extensive experiments on a Corel Photo database with 17,800 images show that the new algorithm can improve the performance of relevance feedback significantly.

7. Acknowledgement

The work described in this paper was fully supported by a grant from the Research Grants Council of the Hong Kong SAR (Project no. AoE/E-01/99).

8. References

[1] Y. Rui, T. S. Huang, and S. Mehrotra, "Content-based image retrieval with relevance feedback in MARS," in Proc. IEEE ICIP, 1997.
[2] A. W. M. Smeulders, M. Worring, S. Santini, A. Gupta, and R. Jain, "Content-based image retrieval at the end of the early years," IEEE Trans. on PAMI, vol. 22, no. 12, pp. 1349-1380, Dec. 2000.
[3] Y. Chen, X. Zhou, and T. S. Huang, "One-class SVM for learning in image retrieval," in Proc. IEEE ICIP, 2001.
[4] X. Zhou and T. S. Huang, "Small sample learning during multimedia retrieval using BiasMap," in Proc. IEEE CVPR, 2001.
[5] L. Zhang, F. Lin, and B. Zhang, "Support vector machine learning for image retrieval," in Proc. IEEE ICIP, 2001.
[6] P. Hong, Q. Tian, and T. S. Huang, "Incorporate support vector machines to content-based image retrieval with relevant feedback," in Proc. IEEE ICIP, 2000.
[7] G. Guo, A. K. Jain, W. Ma, and H. Zhang, "Learning similarity measure for natural image retrieval with relevance feedback," IEEE Trans. on Neural Networks, vol. 12, no. 4, pp. 811-820, July 2002.
[8] L. Breiman, "Bagging predictors," Machine Learning, vol. 24, pp. 123-140, 1996.
[9] T. K. Ho, "The random subspace method for constructing decision forests," IEEE Trans. on PAMI, vol. 20, no. 8, pp. 832-844, Aug. 1998.
[10] J. Kittler, M. Hatef, R. P. W. Duin, and J. Matas, "On combining classifiers," IEEE Trans. on PAMI, vol. 20, no. 3, pp. 226-239, Mar. 1998.
[11] V. Vapnik, The Nature of Statistical Learning Theory, Springer-Verlag, New York, 1995.
[12] C. J. C. Burges, "A tutorial on support vector machines for pattern recognition," Data Mining and Knowledge Discovery, vol. 2, pp. 121-167, 1998.
[13] M. J. Swain and D. H. Ballard, "Color indexing," IJCV, vol. 7, no. 1, pp. 11-32, 1991.
[14] B. S. Manjunath, J. Ohm, V. Vasudevan, and A. Yamada, "Color and texture descriptors," IEEE Trans. on CSVT, vol. 11, pp. 703-715, June 2001.
[15] http://www.eleceng.ohio-state.edu/~maj/osu_svm/
[Figure 5. Performance of all proposed algorithms (ABRSVM, RSMSVM, ABSVM) compared to existing algorithms (SVM, CSVM), evaluated over 9 feedback iterations. The panels plot retrieval precision and standard deviation in the top 20 and top 60 results against the number of iterations.]