College of Computer Science and Technology, Jilin University, Changchun 130012, China
Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
Article info
Keywords:
Breast cancer diagnosis
Rough set theory
Support vector machines
Feature selection
Abstract
Breast cancer is becoming a leading cause of death among women worldwide; meanwhile, it is confirmed that early detection and accurate diagnosis of this disease can ensure long survival of the patients. Expert systems and machine learning techniques are gaining popularity in this field because of their effective classification and high diagnostic capability. In this paper, a rough set (RS) based support vector machine classifier (RS_SVM) is proposed for breast cancer diagnosis. In the proposed method (RS_SVM), an RS reduction algorithm is employed as a feature selection tool to remove redundant features and thereby further improve the diagnostic accuracy of the SVM. The effectiveness of RS_SVM is examined on the Wisconsin Breast Cancer Dataset (WBCD) using classification accuracy, sensitivity, specificity, confusion matrices and receiver operating characteristic (ROC) curves. Experimental results demonstrate that the proposed RS_SVM can not only achieve very high classification accuracy but also detect a combination of five informative features, which can give the physicians an important clue for breast cancer diagnosis.
2011 Elsevier Ltd. All rights reserved.
1. Introduction
Worldwide, breast cancer is by far the most common cancer amongst women, with an incidence rate more than twice that of colorectal cancer and cervical cancer and about three times that of lung cancer (http://www.en.wikipedia.org/wiki/Breast_cancer, last accessed September 2009). It is reported in (http://www.breastcancer.org/symptoms/understand_bc/what_is_bc.jsp, last accessed September 2009) that, for women in the US, breast cancer death rates are higher than those for any other cancer besides lung cancer, and that besides skin cancer, breast cancer is the most commonly diagnosed cancer among US women. According to the World Health Organization, about one-third of the cancer burden could be decreased if cases were detected and treated early (http://www.who.int/mediacentre/factsheets/fs297/en/index.html, last accessed September 2009). The commonly used diagnostic techniques, such as mammography and fine needle aspiration cytology (FNAC), are reported to lack high diagnostic capability. Therefore, there is an absolute necessity to develop better diagnostic techniques. Owing to the above-mentioned needs, expert systems and machine learning techniques have been gaining popularity in this field.
2. Related work
A great many techniques have been proposed to deal with the automated diagnosis of breast cancer, and most of them achieved high classification accuracies. In Quinlan (1996), 10-fold cross-validation with the C4.5 decision tree method was used and the obtained classification accuracy was 94.74%. In Pena-Reyes and Sipper (1999), a fuzzy-GA method was employed and a classification accuracy of 97.36% was obtained. In Setiono (2000), the classification was based on a feed-forward neural network rule extraction algorithm; the reported accuracy was 98.10%. In Goodman et al. (2002), three different methods, optimized learning vector quantization (LVQ), big LVQ, and artificial immune recognition system (AIRS), were applied and the obtained accuracies were 96.7%, 96.8%, and 97.2%, respectively. In Abonyi and Szeifert (2003), an accuracy of 95.57% was obtained with the application of a supervised fuzzy clustering technique. In Hassanien (2004), the classification technique used the rough set method, reaching a classification accuracy of 98%. In Sahan et al. (2007), a new hybrid method based on a fuzzy-artificial immune system and the k-nn algorithm was used and the obtained accuracy was 99.14%. In Polat and Gunes (2007), least square SVM was used and an accuracy of 98.53% was obtained. In Ubeyli (2007), several different methods, multilayer perceptron neural network, combined neural network, probabilistic neural network, recurrent neural network and SVM, were applied, and the highest classification accuracy of 97.36% was achieved by SVM. In Maglogiannis et al. (2009), three different methods, SVM, Bayesian classifiers and artificial neural networks, were applied and the obtained accuracies were 97.54%, 92.80% and 97.90%, respectively. Finally, in Karabatak and Ince (2009), a method combining association rules and a neural network was utilized and an accuracy of 95.6% was obtained.
3. Theoretical backgrounds
3.1. Basic concepts of rough set theory
Rough set (RS) theory is an intelligent mathematical tool proposed by Pawlak (1982) to deal with uncertainty and incompleteness. It is based on the concepts of the upper and lower approximations of a set, the approximation space, and models of sets. The main advantage of RS theory is that it does not need any preliminary or additional information about the data, such as probabilities in statistics, basic probability assignments in Dempster-Shafer theory, or membership grades in fuzzy set theory. One of the major applications of RS theory is attribute reduction, that is, the elimination of redundant attributes. The reduction of attributes is achieved by comparing the equivalence relations generated by sets of attributes: using the dependency degree as a measure, attributes are removed so that the reduced set provides the same dependency degree as the original. Here we give only the main concepts of RS theory needed to understand the rough set analysis done in this work; for a full description of rough set theory and related terms see Pawlak (1982, 1996).
3.1.1. Information system
Knowledge representation in rough sets is done via an information system, denoted as a 4-tuple S = <U, A, V, f>, where U is the closed universe, a finite set of N objects {x_1, x_2, ..., x_N}; A is a finite set of attributes {a_1, a_2, ..., a_m}, which can be further divided into two disjoint subsets C and D, A = C ∪ D, where C is the set of condition attributes and D is the set of decision attributes; V = ∪_{a ∈ A} V_a, where V_a is the domain of the attribute a; and f: U × A → V is the total decision function, called the information function, such that f(x, a) ∈ V_a for every a ∈ A, x ∈ U.
3.1.2. Indiscernibility relation
One of the most significant aspects of RS theory is the indiscernibility relation. The R-indiscernibility relation, denoted by IND(R), is defined as:

IND(R) = \{(x, y) \in U \times U : \forall a \in R,\ f(x, a) = f(y, a)\}

We write [x]_R for the equivalence class of x under IND(R).

3.1.3. Lower and upper approximation
The lower approximation of a set X ⊆ U is the set of objects of U that are certainly in X, defined as:

\underline{R}X = \{x \in U : [x]_R \subseteq X\}

The upper approximation of X is the set of objects of U that are possibly in X, defined as:

\overline{R}X = \{x \in U : [x]_R \cap X \neq \emptyset\}

The boundary region of X is the difference between the two approximations:

Bnd(X) = \overline{R}X - \underline{R}X

A set is said to be rough if its boundary region is non-empty; otherwise the set is crisp.
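The definitions above translate directly into code. The following is a minimal illustrative sketch (not the paper's implementation) that computes indiscernibility classes and the lower/upper approximations over a tiny hand-made information system; the helper names and toy data are my own.

```python
# Minimal sketch of indiscernibility classes and rough-set lower/upper
# approximations. Objects are indices; attribute values live in a dict.

def ind_classes(objects, value_of, attrs):
    """Partition `objects` into equivalence classes of IND(attrs):
    two objects are indiscernible iff they agree on every attribute."""
    classes = {}
    for x in objects:
        key = tuple(value_of[x][a] for a in attrs)
        classes.setdefault(key, set()).add(x)
    return list(classes.values())

def approximations(objects, value_of, attrs, X):
    """Return (lower, upper) approximations of the target set X."""
    lower, upper = set(), set()
    for eq in ind_classes(objects, value_of, attrs):
        if eq <= X:     # [x]_R subset of X  -> certainly in X
            lower |= eq
        if eq & X:      # [x]_R intersects X -> possibly in X
            upper |= eq
    return lower, upper

# Toy data: two condition attributes, target set X = {0, 1}.
U = [0, 1, 2, 3]
vals = {0: {'a': 1, 'b': 0}, 1: {'a': 1, 'b': 0},
        2: {'a': 1, 'b': 1}, 3: {'a': 0, 'b': 1}}
X = {0, 1}
low, up = approximations(U, vals, ['a', 'b'], X)
# Objects 0 and 1 are indiscernible and both lie in X, so here
# lower == upper == {0, 1}: the boundary is empty and X is crisp.
```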
3.1.4. Attribute reduction and core
There often exist condition attributes in the information system that do not provide any additional information about the objects in U. These redundant attributes can be eliminated without losing essential classificatory information (Kryszkiewicz & Rybinski, 1996). Reduct and core attribute sets are two fundamental concepts of rough set theory. A reduct is a minimal subset of attributes from A (the whole attribute set) that yields the same object classification as the full set of attributes; that is, a reduct is a minimal set of attributes R ⊆ C such that IND(R) = IND(C). Let RED(A) denote the set of all reducts of A. The intersection of all reducts of A is referred to as the core of A, i.e., CORE(A) = ∩ RED(A); the core is common to all reducts.
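To make the reduct concept concrete, here is a brute-force sketch (exponential, for illustration only; the paper's RS reduction algorithm is not specified at this level of detail) that enumerates all minimal attribute subsets preserving the full partition. The function names and toy data are assumptions of mine.

```python
from itertools import combinations

def partition(objects, value_of, attrs):
    """IND(attrs) represented as a frozenset of equivalence classes."""
    classes = {}
    for x in objects:
        key = tuple(value_of[x][a] for a in attrs)
        classes.setdefault(key, set()).add(x)
    return frozenset(frozenset(c) for c in classes.values())

def reducts(objects, value_of, all_attrs):
    """All minimal attribute subsets inducing the same partition as
    the full attribute set (exhaustive search, small systems only)."""
    full = partition(objects, value_of, all_attrs)
    found = []
    for r in range(1, len(all_attrs) + 1):
        for subset in combinations(all_attrs, r):
            if partition(objects, value_of, subset) == full:
                # keep only minimal subsets: skip supersets of a reduct
                if not any(set(f) <= set(subset) for f in found):
                    found.append(subset)
    return found

U = [0, 1, 2, 3]
vals = {0: {'a': 0, 'b': 0, 'c': 0}, 1: {'a': 0, 'b': 1, 'c': 1},
        2: {'a': 1, 'b': 0, 'c': 1}, 3: {'a': 1, 'b': 1, 'c': 0}}
reds = reducts(U, vals, ['a', 'b', 'c'])
# Any two of {a, b, c} already discern all four objects, so the
# reducts are {a,b}, {a,c}, {b,c}; their intersection (the core)
# is empty for this toy system.
```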
3.1.5. Dependency degree
Various measures can be defined to represent how much D, the set of decision attributes, depends on C, the set of condition attributes. One of the most common measures is the dependency degree (Pawlak, 1997), denoted γ_C(D) and defined as:

\gamma_C(D) = \frac{|POS_C(D)|}{|U|}

where |U| is the cardinality of the set U and POS_C(D), called the positive region, is defined by

POS_C(D) = \bigcup_{X \in U/D} \underline{C}X

Note that 0 ≤ γ_C(D) ≤ 1. If γ_C(D) = 1 we say that D depends totally on C; if 0 < γ_C(D) < 1, D depends partially on C; and if γ_C(D) = 0, C and D are totally independent of each other.
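The dependency degree can be sketched in a few lines by combining the positive region with the indiscernibility partition. This is an illustrative toy implementation under my own naming, not the paper's code.

```python
def ind_classes(objects, value_of, attrs):
    """Equivalence classes of the indiscernibility relation IND(attrs)."""
    classes = {}
    for x in objects:
        key = tuple(value_of[x][a] for a in attrs)
        classes.setdefault(key, set()).add(x)
    return list(classes.values())

def dependency_degree(objects, value_of, cond, dec):
    """gamma_C(D) = |POS_C(D)| / |U|: fraction of objects that the
    condition attributes assign unambiguously to one decision class."""
    pos = set()
    for X in ind_classes(objects, value_of, dec):        # U / D
        for eq in ind_classes(objects, value_of, cond):  # U / C
            if eq <= X:       # lower approximation of X w.r.t. C
                pos |= eq
    return len(pos) / len(objects)

U = [0, 1, 2, 3]
vals = {0: {'a': 0, 'd': 'yes'}, 1: {'a': 0, 'd': 'yes'},
        2: {'a': 1, 'd': 'yes'}, 3: {'a': 1, 'd': 'no'}}
gamma = dependency_degree(U, vals, ['a'], ['d'])
# Objects 2 and 3 agree on 'a' but differ on 'd', so only {0, 1}
# lie in the positive region: gamma = 2/4 = 0.5 (partial dependency).
```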
3.2. Support vector machines

Given a linearly separable training set {(x_i, y_i)}, i = 1, ..., n, with y_i ∈ {-1, +1}, the SVM finds the maximum-margin separating hyperplane by solving:

Minimize \; g(w) = \frac{1}{2}\|w\|^2

so that:

y_i(w^T x_i + b) \geq 1, \quad \forall i

Introducing Lagrange multipliers α_i ≥ 0, the primal Lagrangian is:

Minimize \; L_p(w, b, \alpha_i) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{n} \alpha_i \left[ y_i(w^T x_i + b) - 1 \right]

Setting the derivative with respect to w to zero gives:

\frac{\partial L_p}{\partial w} = 0 \;\Rightarrow\; w = \sum_{i=1}^{n} \alpha_i y_i x_i

Now substituting b and w back into the primal gives the Wolfe dual Lagrangian:

Maximize \; \sum_{i=1}^{n} \alpha_i - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j x_i^T x_j \qquad (10)

so that:

\sum_{i=1}^{n} \alpha_i y_i = 0, \quad \alpha_i \geq 0 \qquad (11)

For the non-separable case, slack variables ξ_i are introduced and the primal becomes:

Minimize \; g(w, \xi) = \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{n} \xi_i \qquad (12)

so that:

y_i(w^T x_i + b) \geq 1 - \xi_i, \quad \xi_i \geq 0 \qquad (13)

The corresponding dual is:

Maximize \; \sum_{i=1}^{n} \alpha_i - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j x_i^T x_j \qquad (14)

so that:

0 \leq \alpha_i \leq C, \quad \sum_{i=1}^{n} \alpha_i y_i = 0 \qquad (15)

The solution to this minimization problem is identical to the separable case except for the upper bound C on the Lagrange multipliers α_i.
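As a sanity check on the dual formulation, the following pure-Python sketch (my own illustration, not part of the paper) evaluates the soft-margin dual objective and its constraints on a trivial 1-D problem whose solution is known in closed form.

```python
# Evaluate the soft-margin SVM dual objective
#   W(alpha) = sum_i alpha_i - 1/2 sum_ij alpha_i alpha_j y_i y_j x_i.x_j
# and check the constraints 0 <= alpha_i <= C, sum_i alpha_i y_i = 0.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def dual_objective(alpha, X, y):
    lin = sum(alpha)
    quad = sum(alpha[i] * alpha[j] * y[i] * y[j] * dot(X[i], X[j])
               for i in range(len(X)) for j in range(len(X)))
    return lin - 0.5 * quad

def feasible(alpha, y, C):
    box = all(0 <= a <= C for a in alpha)
    eq = abs(sum(a * yi for a, yi in zip(alpha, y))) < 1e-9
    return box and eq

# Two points at -1 and +1 with opposite labels: the optimal separating
# hyperplane is w = 1, b = 0, and the optimal multipliers are
# alpha = (0.5, 0.5); the dual optimum equals ||w||^2 / 2 = 0.5.
X = [(-1.0,), (1.0,)]
y = [-1, 1]
alpha = [0.5, 0.5]
W = dual_objective(alpha, X, y)   # 0.5, matching ||w||^2 / 2
```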
Table 1
The detail of the nine attributes of breast cancer data.

Label  Attribute                    Domain
C1     Clump Thickness              1-10
C2     Uniformity of Cell Size      1-10
C3     Uniformity of Cell Shape     1-10
C4     Marginal Adhesion            1-10
C5     Single Epithelial Cell Size  1-10
C6     Bare Nuclei                  1-10
C7     Bland Chromatin              1-10
C8     Normal Nucleoli              1-10
C9     Mitoses                      1-10
For nonlinear classification, the input vectors are mapped by a function φ into a higher-dimensional feature space, in which the decision function becomes:

g(x) = w^T \varphi(x) + b = \sum_{i \in SV} \alpha_i y_i \varphi(x_i)^T \varphi(x) + b \qquad (16)

Since only inner products of the mapped vectors are required, they can be replaced by a kernel function K(x_i, x_j) = φ(x_i)^T φ(x_j), and the dual becomes:

Maximize \; \sum_{i=1}^{n} \alpha_i - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \qquad (17)

so that:

0 \leq \alpha_i \leq C, \quad \sum_{i=1}^{n} \alpha_i y_i = 0 \qquad (18)

After the optimal values of α_i have been found, the decision function is based on the sign of:

g(x) = \sum_{i \in SV} \alpha_i y_i K(x_i, x) + b \qquad (19)

Commonly used kernels include the polynomial kernel

K(x_i, x_j) = (1 + x_i^T x_j)^p \qquad (20)

and the radial basis function (RBF) kernel

K(x_i, x_j) = \exp\!\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right) \qquad (21)

Fig. 1. The scatter plot of the breast cancer data using Fisher discriminant analysis.
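The two kernels of Eqs. (20) and (21) are one-liners; the following is a small standalone sketch (my own naming) that evaluates both on a pair of toy vectors.

```python
import math

def poly_kernel(x, z, p=2):
    """Polynomial kernel K(x, z) = (1 + x.z)^p, as in Eq. (20)."""
    return (1.0 + sum(a * b for a, b in zip(x, z))) ** p

def rbf_kernel(x, z, sigma=1.0):
    """RBF kernel K(x, z) = exp(-||x - z||^2 / (2 sigma^2)), Eq. (21)."""
    sq = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq / (2.0 * sigma ** 2))

x, z = (1.0, 0.0), (0.0, 1.0)
k_poly = poly_kernel(x, z)    # dot product is 0, so (1 + 0)^2 = 1.0
k_rbf = rbf_kernel(x, z)      # squared distance is 2, so exp(-1)
```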
Table 2
Training set and testing set.

Training-test partition (%)  Training set  Testing set
50-50                        341           342
70-30                        478           205
80-20                        546           137
Fig. 2. The architecture of RS_SVM: for each of the 50-50%, 70-30% and 80-20% training-test sets, an initial (C, g) is refined by grid search on (C, g).
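The grid-search step can be sketched as an exhaustive loop over exponentially spaced (C, g) pairs, as is common practice for RBF-kernel SVMs. In this illustration `toy_score` is a made-up stand-in for the cross-validation accuracy; a real run would train an SVM for each pair.

```python
import math

def grid_search(cv_score, c_exps, g_exps):
    """Try every (C, gamma) = (2**i, 2**j) pair and keep the best
    according to the supplied scoring function."""
    best = None
    for i in c_exps:
        for j in g_exps:
            C, gamma = 2.0 ** i, 2.0 ** j
            score = cv_score(C, gamma)
            if best is None or score > best[0]:
                best = (score, C, gamma)
    return best

# Toy scoring surface peaking at C = 2**3, gamma = 2**-1, invented
# purely to exercise the search loop.
def toy_score(C, gamma):
    return 100 - (math.log2(C) - 3) ** 2 - (math.log2(gamma) + 1) ** 2

score, C, gamma = grid_search(toy_score, range(-5, 6), range(-5, 6))
# The search recovers the peak of the toy surface: C = 8.0, gamma = 0.5.
```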
Table 3
Confusion matrix.

                 Predicted positive  Predicted negative
Actual positive  TP                  FN
Actual negative  FP                  TN
The classification performance is measured by:

Accuracy = \frac{TP + TN}{TP + FP + FN + TN} \times 100\% \qquad (22)

Sensitivity = \frac{TP}{TP + FN} \times 100\% \qquad (23)

Specificity = \frac{TN}{FP + TN} \times 100\% \qquad (24)

The architecture of RS_SVM, which combines feature selection and parameter optimization, is shown in Fig. 2. The feature selection is done by the RS reduction algorithm, and the parameter optimization is conducted by grid search.
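Eqs. (22)-(24) translate directly into code. The sketch below (my own helper, with benign taken as the positive class) reproduces the paper's reported 50-50% figures for subset #5: 221 benign correct, 2 benign misclassified, all 119 malignant correct.

```python
def metrics(tp, fp, fn, tn):
    """Accuracy, sensitivity and specificity of Eqs. (22)-(24), in %."""
    accuracy = 100.0 * (tp + tn) / (tp + fp + fn + tn)
    sensitivity = 100.0 * tp / (tp + fn)
    specificity = 100.0 * tn / (fp + tn)
    return accuracy, sensitivity, specificity

# 50-50% partition of subset #5 (benign = positive class):
acc, sens, spec = metrics(tp=221, fp=0, fn=2, tn=119)
# acc = 340/342 ~ 99.42%, sens = 221/223 ~ 99.10%, spec = 100%,
# matching the paper's reported accuracy and sensitivity values.
```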
Table 4
Attribute sets identified by RS.

No.  Attribute set
1    {Clump Thickness, Single Epithelial Cell Size, Bare Nuclei, Normal Nucleoli}
2    {Clump Thickness, Uniformity of Cell Shape, Single Epithelial Cell Size, Bare Nuclei}
3    {Clump Thickness, Uniformity of Cell Size, Bare Nuclei, Normal Nucleoli}
4    {Uniformity of Cell Shape, Single Epithelial Cell Size, Bare Nuclei, Bland Chromatin}
5    {Uniformity of Cell Shape, Marginal Adhesion, Bare Nuclei, Normal Nucleoli}
6    {Clump Thickness, Marginal Adhesion, Bare Nuclei, Bland Chromatin}
7    {Clump Thickness, Uniformity of Cell Shape, Bare Nuclei, Normal Nucleoli}
8    {Clump Thickness, Uniformity of Cell Size, Bare Nuclei, Bland Chromatin}
9    {Uniformity of Cell Size, Single Epithelial Cell Size, Bare Nuclei, Bland Chromatin, Mitoses}
10   {Uniformity of Cell Size, Single Epithelial Cell Size, Bare Nuclei, Normal Nucleoli, Mitoses}
11   {Uniformity of Cell Size, Single Epithelial Cell Size, Bare Nuclei, Bland Chromatin, Normal Nucleoli}
12   {Clump Thickness, Uniformity of Cell Size, Uniformity of Cell Shape, Marginal Adhesion, Bare Nuclei}
13   {Uniformity of Cell Shape, Marginal Adhesion, Bare Nuclei, Bland Chromatin, Mitoses}
14   {Uniformity of Cell Size, Uniformity of Cell Shape, Marginal Adhesion, Bare Nuclei, Bland Chromatin}
15   {Uniformity of Cell Size, Marginal Adhesion, Single Epithelial Cell Size, Bare Nuclei, Normal Nucleoli}
16   {Clump Thickness, Uniformity of Cell Size, Single Epithelial Cell Size, Bare Nuclei, Mitoses}
17   {Uniformity of Cell Size, Marginal Adhesion, Single Epithelial Cell Size, Bare Nuclei, Bland Chromatin}
18   {Clump Thickness, Uniformity of Cell Shape, Marginal Adhesion, Bare Nuclei, Mitoses}
19   {Clump Thickness, Uniformity of Cell Size, Marginal Adhesion, Bare Nuclei, Mitoses}
20   {Single Epithelial Cell Size, Bare Nuclei, Bland Chromatin, Normal Nucleoli, Mitoses}
Table 5
The remaining 7 attribute sets.

No.  Attribute set
1    {Uniformity of Cell Size, Single Epithelial Cell Size, Bare Nuclei, Bland Chromatin, Mitoses}
2    {Uniformity of Cell Size, Single Epithelial Cell Size, Bare Nuclei, Normal Nucleoli, Mitoses}
3    {Uniformity of Cell Shape, Marginal Adhesion, Bare Nuclei, Bland Chromatin, Mitoses}
4    {Clump Thickness, Uniformity of Cell Size, Single Epithelial Cell Size, Bare Nuclei, Mitoses}
5    {Clump Thickness, Uniformity of Cell Shape, Marginal Adhesion, Bare Nuclei, Mitoses}
6    {Clump Thickness, Uniformity of Cell Size, Marginal Adhesion, Bare Nuclei, Mitoses}
7    {Single Epithelial Cell Size, Bare Nuclei, Bland Chromatin, Normal Nucleoli, Mitoses}
Table 6
Classification accuracies for each subset and different testing sets.

Subset  50-50% training-test      70-30% training-test      80-20% training-test
        Highest (freq)  Average   Highest (freq)  Average   Highest (freq)  Average
#1      98.24 (1)       95.55     99.51 (1)       95.81     100 (1)         95.96
#2      97.95 (9)       95.71     99.02 (3)       95.90     100 (5)         95.90
#3      99.12 (1)       95.99     99.51 (2)       96.27     100 (7)         96.49
#4      98.53 (1)       95.96     99.51 (1)       96.06     100 (1)         96.06
#5      99.41 (1)       96.55     100 (1)         96.72     100 (14)        96.87
#6      98.24 (5)       96.14     99.02 (5)       96.14     100 (3)         96.05
#7      97.66 (5)       95.36     99.51 (1)       95.51     99.27 (11)      95.64

The highest and average classification accuracies of subset #5 for the testing results have been shown in bold.
Table 7
The best parameter pair (C, γ) and the cross-validation rate k of subset #5.

Partition  C    γ    k (%)
50-50%     2^1  2^1  95.3079
70-30%     2^1  2^1  96.0251
80-20%     2^3  2^1  96.8864

Table 8
Sensitivity and specificity for subset #5.

Partition  Sensitivity (%)  Specificity (%)
50-50%     99.10            100
70-30%     100              100
80-20%     100              100
Table 9
Confusion matrices of subset #5 for the three training-test partitions.

Partition  Actual     Predicted benign  Predicted malignant
50-50%     Benign     221               2
           Malignant  0                 119
70-30%     Benign     134               0
           Malignant  0                 71
80-20%     Benign     90                0
           Malignant  0                 47
Table 10
The correlation between condition attributes and the decision attribute.

Condition attribute  Correlation
C1                   0.71479
C2                   0.820801
C3                   0.821891
C4                   0.706294
C5                   0.690958
C6                   0.822696
C7                   0.758228
C8                   0.718677
C9                   0.423448
Table 11
Classification accuracies for the top five relevant features on different partitions.

Subset        50-50%            70-30%            80-20%
              Max (freq)  Avg   Max (freq)  Avg   Max (freq)  Avg
{C2C3C6C7C8}  98.53 (3)   95.90 99.51 (2)   96.11 100 (1)     96.13

Max, freq and Avg represent highest, frequency and average, respectively.
Fig. 5. ROC curve for the 80-20% training-test partition.
Table 12
Classification accuracies for the nine features on different partitions.

Subset                50-50%            70-30%            80-20%
                      Max (freq)  Avg   Max (freq)  Avg   Max (freq)  Avg
{C1C2C3C4C5C6C7C8C9}  98.53 (4)   96.54 99.51 (2)   96.59 100 (2)     96.54

Max, freq and Avg represent highest, frequency and average, respectively.
Table 13
Comparison of the classification accuracies of three feature subsets.

Subset                50-50%            70-30%            80-20%
                      Max (freq)  Avg   Max (freq)  Avg   Max (freq)  Avg
{C1C3C4C6C9}          99.41 (1)   96.55 100 (1)     96.72 100 (14)    96.87
{C2C3C6C7C8}          98.53 (3)   95.90 99.51 (2)   96.11 100 (1)     96.13
{C1C2C3C4C5C6C7C8C9}  98.53 (4)   96.54 99.51 (2)   96.59 100 (2)     96.54

Max, freq and Avg represent highest, frequency and average, respectively, and the highest one among the three subsets has been shown in bold.
As can be seen from Table 13, the five features obtained by RS_SVM, namely C1, C3, C4, C6 and C9, performed best in terms of both the highest and the average classification accuracy. These five features are thus shown to be the most informative for classifying breast cancer. This suggests an important clue for physicians: to pay particular attention to these five features, namely Clump Thickness, Uniformity of Cell Shape, Marginal Adhesion, Bare Nuclei and Mitoses, in breast cancer diagnosis. We believe the proposed expert system can be very helpful in assisting physicians to make accurate diagnoses and shows great potential in the area of clinical diagnosis.
6. Conclusion and future work
This work has explored a new expert system, RS_SVM, for breast cancer diagnosis. Experiments on different partitions of the WBCD demonstrated that RS_SVM performed well in distinguishing benign breast tumors from malignant ones. The proposed method achieved the highest classification accuracies (99.41%, 100%, and 100% for the 50-50%, 70-30%, and 80-20% training-test partitions, respectively) for a subset that contained five features (subset #5). Meanwhile, a comparative experiment was conducted on the top-ranked five relevant features and the whole nine features; the results showed that the five features identified by RS_SVM outperformed the other two feature subsets in terms of both the highest and the average classification accuracy. In addition, a combination of five features (i.e., Clump Thickness, Uniformity of Cell Shape, Marginal Adhesion, Bare Nuclei and Mitoses) was identified by the RS-based reduction algorithm to be the most informative for classifying breast tumors, implying that these five features deserve close attention from physicians when conducting the diagnosis.

We believe the promising results demonstrated by RS_SVM in classifying breast cancer can help physicians make very accurate diagnostic decisions. Future investigation will focus on evaluating the proposed RS_SVM on other, larger breast cancer datasets. In addition, since the performance of SVM depends greatly on its model parameters, a more efficient approach to identifying the optimal model parameters should also be examined in our future work.
Acknowledgements
This research is supported by the National Natural Science Foundation of China (NSFC) under Grant Nos. 60603030, 60873149, 60973088, 60773099, 60703022, and the National High-Tech Research and Development Plan of China under Grant Nos. 2006AA10Z245, 2006AA10A309. This work is also supported by the Open Projects of the Shanghai Key Laboratory of Intelligent Information Processing at Fudan University under Grant No. IIPL-09-007, and the Erasmus Mundus External Cooperation Window Project (EMECW): Bridging the Gap, under Grant No. 155776-EM-1-2009-1-IT-ERAMUNDUS-ECW-L12, 2009-2012.
References
Abonyi, J., & Szeifert, F. (2003). Supervised fuzzy clustering for the identification of fuzzy classifiers. Pattern Recognition Letters, 24(14), 2195-2207.
Boser, B. E., Guyon, I. M., et al. (1992). A training algorithm for optimal margin classifiers. In Fifth annual workshop on computational learning theory. Pittsburgh: ACM.
Chang, C. C., & Lin, C. J. (2001). LIBSVM: A library for support vector machines. Software available at www.csie.ntu.edu.tw/~cjlin/libsvm.
Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273-297.
Cristianini, N., & Shawe-Taylor, J. (2000). An introduction to support vector machines: And other kernel-based learning methods. Cambridge, UK: Cambridge University Press.
Frohlich, H., & Chapelle, O., et al. (2003). Feature selection for support vector machines by means of genetic algorithms. In Proceedings of the 15th IEEE international conference on tools with artificial intelligence, Sacramento, CA, USA, pp. 142-148.
Goodman, D., Boggess, L., et al. (2002). Artificial immune system classification of multiple-class problems. Intelligent Engineering Systems Through Artificial Neural Networks, Fuzzy Logic, Evolutionary Programming Complex Systems and Artificial Life, 12, 179-184.
Hassanien, A. E. (2004). Rough set approach for attribute reduction and rule generation: A case of patients with suspected breast cancer. Journal of the American Society for Information Science and Technology, 55(11), 954-962.
Hsu, C. W., & Chang, C. C., et al. (2003). A practical guide to support vector classification. Technical report, Department of Computer Science and Information Engineering, National Taiwan University, Taipei. Available at http://www.csie.ntu.edu.tw/cjlin/libsvm/.
Joachims, T., & Nedellec, C., et al. (1998). Text categorization with support vector machines: Learning with many relevant features. In Proceedings of the 10th European conference on machine learning, pp. 137-142.
John, G. H., & Kohavi, R., et al. (1994). Irrelevant features and the subset selection problem. In Proceedings of the 11th international conference on machine learning.
Karabatak, M., & Ince, M. C. (2009). An expert system for detection of breast cancer based on association rules and neural network. Expert Systems with Applications, 36(2, Part 2), 3465-3469.
Kryszkiewicz, M., & Rybinski, H. (1996). Attribute reduction versus property reduction. In Proceedings of the fourth European congress on intelligent techniques and soft computing, pp. 204-208.
Maglogiannis, I., Zafiropoulos, E., et al. (2009). An intelligent system for automated breast cancer diagnosis and prognosis using SVM based classifiers. Applied Intelligence, 30(1), 24-36.
Osuna, E., Freund, R., & Girosi, F. (1997). Training support vector machines: Application to face detection. In Proceedings of computer vision and pattern recognition, Puerto Rico, pp. 130-136.
Pawlak, Z. (1982). Rough sets. International Journal of Parallel Programming, 11(5), 341-356.
Pawlak, Z. (1996). Why rough sets. In Proceedings of the IEEE international conference on fuzzy systems.
Pawlak, Z. (1997). Rough set approach to knowledge-based decision support. European Journal of Operational Research, 99(1), 48-57.
Pena-Reyes, C. A., & Sipper, M. (1999). A fuzzy-genetic approach to breast cancer diagnosis. Artificial Intelligence in Medicine, 17(2), 131-155.
Polat, K., & Gunes, S. (2007). Breast cancer diagnosis using least square support vector machine. Digital Signal Processing, 17(4), 694-701.
Quinlan, J. R. (1996). Improved use of continuous attributes in C4.5. Journal of Artificial Intelligence Research, 4, 77-90.
Sahan, S., Polat, K., et al. (2007). A new hybrid method based on fuzzy-artificial immune system and k-nn algorithm for breast cancer diagnosis. Computers in Biology and Medicine, 37(3), 415-423.
Scholkopf, B., & Smola, A. J. (2002). Learning with kernels: Support vector machines, regularization, optimization, and beyond. The MIT Press.
Setiono, R. (2000). Generating concise and accurate classification rules for breast cancer diagnosis. Artificial Intelligence in Medicine, 18(3), 205-219.
Ubeyli, E. D. (2007). Implementing automated diagnostic systems for breast cancer detection. Expert Systems with Applications, 33(4), 1054-1062.
UCI Repository of Machine Learning Databases. www.archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/.
Vapnik, V. (1995). The nature of statistical learning theory. New York: Springer.
Vapnik, V. (1998). Statistical learning theory. New York: Wiley.