0126CS10MT17
Under the Guidance of
Dr. Kavita Burse
CERTIFICATE
THIS IS TO CERTIFY THAT THE DISSERTATION ENTITLED
Classification of Wisconsin Breast Cancer Diagnostic and Prognostic Dataset
using Polynomial Neural Network BEING SUBMITTED BY Shweta Saxena
IN PARTIAL FULFILLMENT OF THE REQUIREMENT FOR
THE AWARD OF M.TECH DEGREE IN COMPUTER
SCIENCE & ENGINEERING TO ORIENTAL COLLEGE OF
TECHNOLOGY, BHOPAL (M.P.) IS A RECORD OF BONAFIDE
WORK DONE BY HER UNDER MY GUIDANCE.
Prof. Roopali Soni
Head of Department
OCT, Bhopal
APPROVAL CERTIFICATE
This dissertation work entitled Classification of Wisconsin Breast Cancer
Diagnostic and Prognostic Dataset using Polynomial Neural Network
INTERNAL EXAMINER
Date:
EXTERNAL EXAMINER
Date:
CANDIDATE DECLARATION
I hereby declare that the dissertation work presented in the report
entitled Classification of Wisconsin Breast Cancer Diagnostic
and Prognostic Dataset using Polynomial Neural Network,
submitted in partial fulfillment of the requirements for the award
of the degree of Master of Technology in Computer Science &
Engineering of Oriental College of Technology, is an authentic record
of my own work.
I have not submitted this report, in part or in full, for the award
of any other degree or diploma.
Date:
Shweta Saxena
(0126CS10MT17)
ACKNOWLEDGEMENT
I would like to express my deep sense of respect and gratitude towards my advisor
and guide Dr. Kavita Burse, Director Oriental College of Technology who has
given me an opportunity to work under her. She has been a constant source of
inspiration throughout my work. She displayed unique tolerance and understanding at
every step of progress of this work and encouraged me incessantly. Her invaluable
knowledge and innovative ideas helped me to take the work to the final stage. I
consider it my good fortune to have worked under such a wonderful person.
I express my respect to Prof. Roopali Soni, Head, Computer Science
Engineering Department, Oriental College of Technology for her constant
encouragement and invaluable advice in every aspect of my academic life. I am also
thankful to all faculty members of Computer Science and Engineering Department for
their support and guidance.
I am especially thankful to my father Mr. Damodar Saxena, my mother Mrs.
Nirmala Saxena, and my loving sisters Shikha and Shraddha for their love,
sacrifice and support on every path of my life. I extend a special word of thanks to my
husband Mr. Ashish Saxena for his moral support and help in achieving my aim.
Last but not the least, I am extremely thankful to all who have directly or indirectly
helped me in the completion of my work.
Shweta Saxena
(0126CS10MT17)
ORGANIZATION OF DISSERTATION
The report Classification of Wisconsin Breast Cancer Diagnostic and Prognostic
Dataset using Polynomial Neural Network has been divided into six chapters as follows:
Chapter 1 Introduction
Chapter 1 first describes the motivation of this research work. It then describes breast
cancer disease, its symptoms and types in detail. The chapter also describes the diagnosis
and prognosis process of the disease.
Chapter 4 MATLAB
The technology used for implementation of proposed work is MATLAB. The chapter
gives a brief introduction of MATLAB along with its advantages and detailed
description of Neural Network Toolbox available in MATLAB for design of neural
network. The chapter also explains the neural network design process using neural
network toolbox.
Chapter 5 Simulation and Results
Chapter 5 presents the description of dataset used for implementation of this research
and the results of implementation.
Chapter 6 Conclusion and Future Scope
Chapter 6 concludes the dissertation and provides possible directions for relevant
future work.
ABSTRACT
Breast cancer is the most common form of cancer and a major cause of death in
women. Normally, the cells of the breast divide in a regulated manner. If cells keep
on dividing when new cells are not needed, a mass of tissue forms. This mass is
called a tumor. This tumor can be cancerous or non-cancerous. The goal of diagnosis
is to distinguish between cancerous and non-cancerous cells. Once a patient is
diagnosed with breast cancer, the prognosis gives the anticipated long-term behavior
of the ailment. Breast cancer detection, classification, scoring and grading of
histopathological images is the standard clinical practice for the diagnosis and
prognosis of breast cancer. In a large hospital, a pathologist typically handles a
number of cancer detection cases per day. It is, therefore, a very difficult and
time-consuming task. Owing to their wide range of applicability and their ability to learn
complex and nonlinear relationships, including noisy or less precise information,
Artificial Neural Networks (ANNs) are very well suited to solve problems in
biomedical engineering. ANNs can be applied to medicine in four basic fields:
modeling, bioelectric signal processing, diagnosing and prognostics. There are
several systems available for the diagnosis and selection of therapeutic strategies in
breast cancer.
In this research we propose a neural network based clinical support system to provide
medical data analysis for diagnosis and prognosis of breast cancer. The system
classifies the breast cancer diagnostic data, which are provided as input to the neural
network, into two sets- benign (non-cancerous) and malignant (cancerous)- to get the
diagnostic results. To get prognosis results, the system classifies the prognostic
data, given as input to the neural network, into two classes- recurrent and
non-recurrent. Results belonging to the recurrent class show that the cancer recurred
after some time. A polynomial neural network (PNN) structure is used along with the
back propagation algorithm for classification of breast cancer data. The Wisconsin
Breast Cancer (WBC) datasets from the UCI Machine Learning Repository are used as
input datasets to the PNN. A data pre-processing technique named Principal Component
Analysis (PCA) is used as a feature reduction transformation method to improve the
accuracy of the PNN. In our results the mean square error (MSE) is substantially
reduced for PCA-preprocessed data as compared to normalized data. Hence we get
more accurate diagnosis and prognosis results.
Keywords- breast cancer, polynomial neural network, principal component
analysis, Wisconsin breast cancer dataset.
CONTENTS
Chapter 1  Introduction (1-7)
    1.1  Research Motivation
    1.2  Introduction
Chapter 2  Literature Review (8-26)
    2.1  Introduction
    2.2  Neural network techniques for diagnosis and prognosis of breast cancer
Chapter 3  (27-40)
    3.1  Overview of ANN
    3.2  Basics of ANN
    3.5  Advantages of ANN
    3.6  Medical Applications
         Advantages of PCA
Chapter 4  MATLAB (41-48)
    4.1  Introduction
    4.2  Advantages of MATLAB
    4.3  Limitations of MATLAB
Chapter 5  Simulation and Results (49-60)
    5.1  Introduction
    5.2  Description of dataset
Chapter 6  Conclusion and Future Scope (61-62)
    6.1  Conclusion
    6.2  Future work
List of Publications (63-64)
References (65-74)

LIST OF FIGURES
Fig. 1.1  Breast Cancer
Fig. 2.1  An MLP structure
Fig. 3.1  A single neuron
Testing error for normalization and PCA data for WPBC dataset over 100 data
Testing error for normalization and PCA data for WPBC dataset over 198 data

LIST OF TABLES
Table 2.1, Table 4.1, Tables 5.1-5.10
Chapter 1
Introduction
1.1 Research Motivation
According to the World Health Organization (WHO), breast cancer is currently the
top cancer in women worldwide and the second highest cause of death among females.
Diagnosis and prognosis of breast cancer at a very early stage is recondite due to
various factors which are cryptically interconnected to each other; we are oblivious
to many of them. Until an effective preventive measure becomes widely available,
early detection followed by effective treatment is the only recourse for reducing
breast cancer mortality. Most breast cancers are detected by the patient as a lump in
the breast. The majority of breast lumps are benign (non-cancerous), so it is the
physician's responsibility to diagnose breast cancer. The goal of diagnosis is to
distinguish between malignant (Cancerous) and benign breast lumps. Once a patient
is diagnosed with breast cancer, the malignant lump must be excised. During this
procedure, or during a different post-operative procedure, physicians must determine
the prognosis of the disease. Prognosis gives the anticipated long-term behavior of
the ailment. A major class of problems in medical science involves the diagnosis and
prognosis of breast cancer, based upon various tests performed upon the patient.
When several tests are involved, the ultimate diagnosis and prognosis may be
difficult to obtain, even for a medical expert. In human-operator-based analysis of test
results, errors may also be made in calculation, and this will result in faulty
treatment for the patients. This has given rise, over the past few decades, to
computerized diagnostic and prognostic tools, intended to aid the physician in
making sense out of the welter of data. A prime target for such computerized tools is
in the domain of cancer diagnosis and prognosis. Neural networks are computer-based
tools inspired by the vertebrate nervous system that have been increasingly
used in the past decade to model biomedical domains. The motivation for this
research is to create a neural network based tool that doctors can use for classifying
the results obtained from various tests performed upon the patient. The neural network
based clinical support system proposed in this research provides medical data analysis
for diagnosis and prognosis in a shorter time and remains unaffected by human errors
caused by inexperience or fatigue. Use of ANN increases the accuracy of most of the
methods and reduces the need for a human expert. The back propagation algorithm
has been used to train the neural network, keeping in view the significant
characteristics of NN and its advantages for the implementation of the classification
problem. PCA is used as a feature reduction transformation method to improve the
accuracy of the ANN. Advantages of feature reduction include the identification of a
reduced set of features among a large set of features that are used for outcome
prediction. Though the proposed neural network model is implemented on the standard
Wisconsin dataset obtained from the UCI machine learning repository, it can also be
implemented using similar datasets.
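The PCA feature-reduction step mentioned above can be illustrated with a short sketch. The thesis implements its experiments in MATLAB; this sketch uses Python with NumPy purely for illustration, and the function name `pca_reduce` and the toy data are hypothetical.

```python
import numpy as np

def pca_reduce(X, k):
    """Project the data onto the top-k principal components."""
    # Center each feature at zero mean
    Xc = X - X.mean(axis=0)
    # Covariance matrix of the features
    cov = np.cov(Xc, rowvar=False)
    # Eigen-decomposition; eigh is used because cov is symmetric
    vals, vecs = np.linalg.eigh(cov)
    # Keep the k eigenvectors with the largest eigenvalues
    order = np.argsort(vals)[::-1]
    W = vecs[:, order[:k]]
    return Xc @ W

# Toy example: 5 samples with 3 correlated features reduced to 2
X = np.array([[2.0, 4.1, 1.0],
              [1.0, 2.0, 0.9],
              [3.0, 6.2, 1.1],
              [4.0, 8.1, 1.0],
              [2.5, 5.0, 1.2]])
Z = pca_reduce(X, 2)
print(Z.shape)  # (5, 2)
```

In the proposed system the same kind of transformation is applied to the dataset features before they are fed to the neural network.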
1.2 Introduction
Breast cancer is the major cause of death by cancer in the female population [1].
Most breast cancer cases occur in women aged 40 and above, but certain women with
high-risk characteristics may develop breast cancer at a younger age [2]. Breast
cancer occurs in humans and other mammals. While the overwhelming majority of
human cases occur in women, male breast cancer can also occur [3]. Cancer is a
disease in which cells become abnormal and form more cells in an uncontrolled way.
With breast cancer, the cancer begins in the tissues that make up the breasts. The
breast consists of lobes, lobules, and bulbs that are connected by ducts. The breast
also contains blood and lymph vessels. These lymph vessels lead to structures that
are called lymph nodes. Clusters of lymph nodes are found under the arm, above the
collarbone, in the chest, and in other parts of the body. Together, the lymph vessels
and lymph nodes make up the lymphatic system, which circulates fluid called lymph
throughout the body. Lymph contains cells that help fight infection and disease.
Normally, the cells of the breast divide in a regulated manner. If cells keep dividing
when new cells are not needed, a mass of tissue forms. This mass is called a tumor as
shown in fig. 1.1[4]. A tumor can be benign or malignant. A benign tumor is not
cancer and will not spread to other parts of the body. A malignant tumor is cancer.
Cancer cells divide and damage tissue around them. When breast cancer spreads
outside the breast, cancer cells are most often found under the arm in the lymph
nodes. In many cases, if the cancer has reached the lymph nodes, cancer cells may
have also spread to other parts of the body via the lymphatic system or through the
bloodstream. This can be life-threatening [5].
a) Ductal carcinoma in situ (DCIS):
This is the most common pre-invasive breast cancer. It is more commonly seen now
because this form is generally visible on a mammogram and is identified by unusual
calcium deposits or puckering of the breast tissue (called stellate appearance). If left
untreated, DCIS will progress to invasive breast cancer.
b) Lobular carcinoma in situ (LCIS):
Unlike DCIS, LCIS is not really cancer at all. Most physicians consider the finding
of LCIS to be accidental, and it is thought to be a marker for breast cancer risk. That
is, women with LCIS seem to have a 7-10 times increased risk of developing some
form of breast cancer (usually invasive lobular carcinoma) over the next 20 years.
LCIS does not warrant treatment by surgery or radiation therapy. Close follow-up is
most commonly indicated, and LCIS is not easily seen on mammogram. Recent data
suggest that this condition may be a precursor to invasive lobular cancer. There may
be some forms of LCIS (i.e. the pleomorphic subtype) that require more aggressive
local therapy and closer follow-up.
Invasive forms of cancer are-
a)
Ductal carcinoma:
Ductal carcinoma:
This is the most common form of breast cancer and accounts for 70% of breast
cancer cases. This cancer begins in the milk ducts and grows into surrounding
tissues.
b)
Lobular carcinoma:
This originates in the milk-producing lobules of the breast. It can spread to the fatty
tissue and other parts of the body. About 1 in 10 breast cancers are of this type [10].
c)
Inflammatory carcinoma:
This is the fastest growing and most difficult type of breast cancer to treat. This
cancer invades the lymphatic vessels of the skin and can be very extensive. It is very
likely to spread to the local lymph nodes.
d)
Paget's disease:
Paget's disease is cancer of the areola and nipple. It is very rare (about 1% of all
breast cancers). In general, women who develop this type of cancer have a history of
nipple crusting, scaling, itching, or inflammation.
1.5 Breast Cancer Diagnosis
Most breast cancers are detected by the patient as a lump in the breast. The
majority of breast lumps are benign (non-cancerous), so it is the physician's
responsibility to diagnose breast cancer. The goal of diagnosis is to distinguish
between malignant (Cancerous) and benign breast lumps. The three methods
currently used for breast cancer diagnosis are mammography, fine needle aspirate
(FNA) and surgical biopsy [11]. Mammography has a reported sensitivity
(probability of correctly identifying a malignant lump) which varies between 68%
and 79% [12]. Taking a fine needle aspirate (i.e. extracting fluid from a breast lump
using a small-gauge needle) and visually inspecting the fluid under a microscope has
a reported sensitivity varying from 65% to 98% [13]. Fig 1.2 shows an FNA image
of benign and malignant breast mass.
1.6 Breast Cancer Prognosis
Prognosis gives the anticipated long-term outlook for the disease for patients whose
cancer has been surgically removed [11].
Prognosis is important because the type and intensity of the medications are based on
it. Currently, the most reliable method of determining the prognosis is by axillary
clearance (the dissection of axillary lymph nodes) [Choong]. Unfortunately, for
patients with unaffected lymph nodes, the result is unnecessary numbness, pain,
weakness, swelling, and stiffness [15]. Prognosis poses a more difficult problem than
that of diagnosis since the data is censored; that is, there are only a few cases where
we have an observed recurrence of the disease [14]. A patient can be classified as
recurrent if the disease is observed at some time subsequent to tumor excision; a
patient for whom the cancer has not recurred, and may never recur, has an unknown or
censored [16] time to recur (TTR). On the other hand, we do not observe recurrence
in most patients. For these, there is no real point at which we can consider the patient
a non-recurrent case. So the data is considered censored, since we do not know the
time of recurrence. For such patients, all we know is the time of their last check-up.
We call this the disease-free survival time (DFS) [14].
Chapter 2
Literature Review
2.1 Introduction
Neural network techniques have been successfully applied to the diagnosis and
prognosis of breast cancer. This chapter reviews the existing/popular neural network
techniques for the diagnosis and prognosis of breast cancer. Various neural network
techniques are compared at the end. The Wisconsin breast cancer data set is used to
study the classification accuracy of the neural networks. Two research papers which
were helpful for getting the idea of the survey are-
An Analysis of the methods employed for breast cancer diagnosis by M. M.
Beg and M. Jain.
Breast cancer diagnosis using statistical neural networks by T. Kiyan and T.
Yildirim.
A brief description of the above two papers is as follows-
An Analysis of the methods employed for breast cancer diagnosis, Author:
M. M. Beg and M. Jain [17]
Abstract:
Breast cancer research over the last decade has been tremendous. Groundbreaking
innovations and novel methods help in the early detection, in setting the
stages of the therapy and in assessing the response of the patient to the treatment. The
prediction of the recurrent cancer is also crucial for the survival of the patient. This
paper studies various techniques used for the diagnosis of breast cancer. Different
methods are explored for their merits and de-merits for the diagnosis of breast lesion.
Some of the methods are yet unproven but the studies look very encouraging. It was
found that the recent use of the combination of Artificial Neural Networks in most of
the instances gives accurate results for the diagnosis of breast cancer and their use
can also be extended to other diseases.
Comments:
This paper reviews the existing/popular methods which employ the soft computing
techniques to the diagnosis of breast cancer. The paper demonstrated the better
performance of the multiple neural networks over the monolithic neural networks for
the diagnosis of breast cancer. It can be concluded from this study that the neural
networks based clinical support systems provide the medical experts with a second
opinion, thus removing the need for biopsy and excision and reducing unnecessary
expenditure. Use of ANN increases the accuracy of most of the methods and reduces
the need of the human expert. The ANN, Support Vector Machine, Genetic algorithm
(GA), and K-nearest neighbor may be used for classification problems. The GA is
better suited for feature selection. The fuzzy co-occurrence matrix and fuzzy
entropy method can also be used for feature extraction.
Breast cancer diagnosis using statistical neural networks, Author: T. Kiyan
and T. Yildirim [18]
Abstract:
Breast cancer is the second largest cause of cancer deaths among women. The
performance of the statistical neural network structures, radial basis network (RBF),
general regression neural network (GRNN) and probabilistic neural network (PNN)
are examined on the Wisconsin breast cancer data (WBCD) in this paper. This is a
well-used database in machine learning, neural network and signal processing.
Statistical neural networks are used to increase the accuracy and objectivity of breast
cancer diagnosis.
Comments:
This paper shows how statistical neural networks are used in actual clinical
diagnosis of breast cancer. The simulations were realized by using MATLAB 6.0
Neural Network Toolbox. Four different neural network structures, multi layer
perceptron (MLP), RBF, PNN and GRNN, were applied to the WBCD database to show
the performance of statistical neural networks on breast cancer data. According to the
results, RBF and PNN are the best classifiers on the training set, whereas GRNN gives
the best classification accuracy when the test set is considered. According to the overall
results, it is seen that the most suitable neural network model for classifying WBCD
data is GRNN.
2.2 Neural network techniques for diagnosis and prognosis of breast cancer
Various techniques for diagnosis and prognosis of breast cancer are-
Multilayer Perceptron (MLP):
MLP has been widely used for the aim of cancer prediction and prognosis [19]. MLP
is a class of feed forward neural networks which is trained in a supervised manner to
become capable of outcome prediction for new data [20]. The structure of MLP is
shown in fig 2.1. An MLP consists of a set of interconnected artificial neurons
connected only in a forward manner to form layers. One input, one or more hidden
and one output layer are the layers forming an MLP [21]. An artificial neuron is the
basic processing element of a neural network. It receives signals from other neurons,
multiplies each signal by the corresponding connection strength, that is, the weight,
sums up the weighted signals, passes them through an activation function and feeds the
output to other neurons [22].
The simplest form of trainable neural network, first developed by Rosenblatt (1958), is
composed of two layers of nodes, namely the input and output layers. A mapping between
the input and output data could be established by assigning weights to the input
numerical data during training. More complicated MLPs which are commonly used
consist of some hidden layers in addition to the input and output layers. These hidden
layers enable the MLP to extract higher order statistics from a set of given data and
hence, capture the complex relationship between input-output data. Therefore, MLPs
commonly consist of an input layer, for which the number of nodes is defined by the
size of the input vector, one or more hidden layers, which can have a variable number of
nodes depending on the application, and an output layer, which has one or more nodes
depending on the number of output classes. Connections between these layers are
defined by weights which are assigned in a supervised learning process so that the
neural network would respond correctly to new data. This can be done via a training
algorithm, in which a cost function is computed by comparing the network's output
and the desired output and is then minimized with respect to the network parameters
[21]. Neural network classification process consists of two steps- training and testing.
The classification accuracy depends on training [23]. A mapping between the input
and output data could be established by assigning weights to the input numerical data
during training [21]. The training requires a series of input and associated output
vectors. During the training, the network is repeatedly presented with the training
data and the weights and thresholds in the network are adjusted from time to time till
the desired input output mapping occurs [22]. Training is done on known examples
and testing is done on unknown samples. The training procedure itself consists of
two processes, feed-forwarding the input data followed by back propagation of the
error, adjusting the weights to minimize the error on each training epoch
[24]. The following research paper presents the effectiveness of MLP for diagnosis
and prognosis of breast cancer-
An expert system for detection of breast cancer based on association rules
and neural network, Author: M. Karabatak and M. C. Ince [93]
This paper presents an automatic diagnosis system for detecting breast cancer based
on association rules (AR) and neural network (NN). In this study, AR is used for
reducing the dimension of breast cancer database and NN is used for intelligent
classification. The proposed AR + NN system performance is compared with NN
model. The dimension of input feature space is reduced from nine to four by using
AR. In the test stage, a 3-fold cross validation method was applied to the Wisconsin
breast cancer database to evaluate the performance of the proposed system. The correct
classification rate of the proposed system is 95.6%. This research demonstrated that the
AR can be used for reducing the dimension of feature space and proposed AR + NN
model can be used to obtain fast automatic diagnostic systems for other diseases.
MLP-based analysis provides an accurate and reliable platform for breast cancer
prediction given that an appropriate design and validation method is employed.
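The two-step feed-forward/backpropagation training described above can be sketched in a few lines. This is an illustrative sketch only, not the thesis implementation (which is done in MATLAB); the network size, learning rate and toy dataset are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical toy dataset: class 1 when the feature sum is positive
X = rng.normal(size=(200, 4))
y = (X.sum(axis=1) > 0).astype(float).reshape(-1, 1)

# One hidden layer (8 units) and one sigmoid output unit
W1 = rng.normal(scale=0.5, size=(4, 8)); b1 = np.zeros(8)
W2 = rng.normal(scale=0.5, size=(8, 1)); b2 = np.zeros(1)
lr = 1.0

for epoch in range(1000):
    # Feed-forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backpropagation of the mean-squared error
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # Weight and threshold adjustment by gradient descent
    W2 -= lr * h.T @ d_out / len(X); b2 -= lr * d_out.mean(axis=0)
    W1 -= lr * X.T @ d_h / len(X);  b1 -= lr * d_h.mean(axis=0)

acc = ((out > 0.5) == (y > 0.5)).mean()
print(f"training accuracy: {acc:.2f}")
```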
WBCD breast cancer database classification applying artificial metaplasticity
neural network
The AUC values in this case were as follows: 0.989 for AMMLP and 0.928 for BP;
this indicates once more the superiority of AMMLP over BP in this particular case.
From the above results, we conclude that the AMMLP obtains very promising results
in classifying possible breast cancer. We believe that the proposed system can be
very helpful to physicians as a second opinion for their final decision. By
using such an efficient tool, they can make very accurate decisions. The AMMLP
proved to be equal or superior to the state-of-the-art algorithms applied to the
Wisconsin Breast Cancer Database, and shows that it can be an interesting
alternative.
Radial Basis Function Neural Network (RBFNN):
An RBFNN is trained to perform a mapping from an m-dimensional input space to an
n-dimensional output space. An RBFNN consists of the m-dimensional input x being
passed directly to a hidden layer. Suppose there are c neurons in the hidden layer.
Each of the c neurons in the hidden layer applies an activation function, which is a
function of the Euclidean distance between the input and an m-dimensional prototype
vector. Each hidden neuron contains its own prototype vector as a parameter. The
output of each hidden neuron is then weighted and passed to the output layer. The
outputs of the network consist of sums of the weighted hidden layer neurons [28].
The transformation from the input space to the hidden-unit space is nonlinear where
as the transformation from the hidden-unit space to the output space is linear [29].
The performance of an RBFNN depends on the number and location (in the input
space) of the centers, the shape of the RBFNN functions at the hidden neurons, and
the method used for determining the network weights. Some researchers have trained
RBFNN networks by selecting the centers randomly from the training data [30].
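The RBFNN just described (Gaussian hidden units around prototype centers chosen randomly from the training data, followed by a linear output layer) can be sketched as below. It is illustrative only; the toy data, number of centers and spread are hypothetical, and the output weights are fitted here by ordinary least squares rather than by any method from the cited papers.

```python
import numpy as np

rng = np.random.default_rng(1)

def rbf_design(X, centers, spread):
    # Euclidean distance from every input to every prototype center
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    # Gaussian activation of each hidden neuron
    return np.exp(-(d / spread) ** 2)

# Hypothetical 2-class data: label 1 outside the unit circle
X = rng.normal(size=(100, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0).astype(float)

# Centers selected randomly from the training data
centers = X[rng.choice(len(X), size=10, replace=False)]
H = rbf_design(X, centers, spread=1.0)

# Linear transformation from the hidden-unit space to the output space
w, *_ = np.linalg.lstsq(H, y, rcond=None)
pred = (H @ w > 0.5).astype(float)
print("training accuracy:", (pred == y).mean())
```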
The following research paper describes the application of RBFNN in breast cancer
prediction-
Breast Cancer Detection using Recursive Least Square and Modified Radial
Basis Functional Neural Network, Author: M. R. Senapati, P. K. Routray, P. K.
Dash [31]
Abstract:
A new approach for classification has been presented in this paper. The proposed
technique, Modified Radial Basis Functional Neural Network (MRBFNN) consists of
assigning weights between the input layer and the hidden layer of Radial Basis
functional Neural Network (RBFNN). The centers of MRBFNN are initialized using
Particle swarm Optimization (PSO) and variance and centers are updated using back
propagation and both the sets of weights are updated using Recursive Least Square
(RLS). Our simulation is carried out on the Wisconsin Breast Cancer (WBC) data
set. The results are compared with the RBFNN, where the variance and centers are
updated using back propagation and weights are updated using Recursive Least
Square (RLS) and Kalman Filter. It is found that the proposed method provides more
accurate results and better classification.
Comments:
The Modified Radial Basis Functional Neural Network is the same as the RBFNN with
the exception that weights are assigned between the neurons in the input layer and the
neurons in the hidden layer. An efficient Pattern Recognition and rule extraction
technique using Recursive Least square approximation and Modified Radial Basis
Functional Neural Networks (MRBFNN) is presented in this paper. The weights
between input layer and the hidden layer as well as hidden layer and output layer of
the RBFNN classifier can be trained using the linear recursive least square (RLS)
algorithm. The RLS has a much faster rate of convergence compared to gradient
search and least mean square (LMS) algorithms.
The PNN used in [33] has a multilayer structure consisting of a single RBF hidden layer
of locally tuned units which are fully interconnected to an output layer (competitive
layer) of two units, as shown in Fig. 2.2. In this system, the real-valued input vector is
the feature vector, and the two outputs are the indices of the two classes. All hidden
units simultaneously receive the eight-dimensional real-valued input vector. The input
vector to the network is passed to the hidden layer nodes via unit connection weights.
The hidden layer consists of a set of radial basis functions. Associated with the jth
hidden unit is a parameter vector C_j, called a center. The hidden layer node
calculates the Euclidean distance between the center and the network input vector
and then passes the result to the radial basis function. All the radial basis functions
are of Gaussian type. The equations used in the neural network model are as
follows-
X_j = ||f - C_j|| * b^ih                       (2.1)
φ(X) = exp(-X^2)                               (2.2)
b^ih = 0.833/s                                 (2.3)
S_i = Σ_{j=1}^{h} W_ji^ho * φ(X_j)             (2.4)
Y_i = 1 if S_i = max{S_1, S_2}, 0 else         (2.5)
where i = 1, 2; j = 1, 2, ..., h; Y_i is the ith output (classification index), f is the
eight-dimensional real-valued input vector, W_ji^ho is the weight between the jth
hidden node and the ith output node, C_j is the center vector of the jth hidden
node, s is a real constant known as the spread factor, b^ih is the biasing term of the
radial basis layer, and φ(·) is the nonlinear RBF (Gaussian). PNN provides a general
solution to pattern classification problems by following an approach developed in
statistics, called Bayesian classifiers [34][35]. PNN combines the Bayes decision
strategy with the Parzen non-parametric estimator of the probability density functions
of different classes [36]. The following research papers present the application of PNN
in breast cancer diagnosis and prognosis-
The Wisconsin Breast Cancer Problem: Diagnosis and DFS time prognosis
using probabilistic and generalised regression neural classifiers, Author: Ioannis
Anagnostopoulos, Christos Anagnostopoulos, Angelos Rouskas, George
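The forward pass defined by Eqs. 2.1-2.5 can be sketched directly. This is an illustrative Python translation of the equations, not code from the cited work; the centers, weights and input vector below are hypothetical (and two-dimensional rather than eight-dimensional for brevity).

```python
import numpy as np

def pnn_classify(f, centers, W, s):
    """Forward pass of the network of Eqs. 2.1-2.5."""
    b = 0.833 / s                                # Eq. 2.3: bias of the RBF layer
    X = np.linalg.norm(f - centers, axis=1) * b  # Eq. 2.1: scaled distances to centers C_j
    phi = np.exp(-X ** 2)                        # Eq. 2.2: Gaussian RBF
    S = phi @ W                                  # Eq. 2.4: weighted sums at the output nodes
    return (S == S.max()).astype(int)            # Eq. 2.5: winner-take-all class index

centers = np.array([[0.0, 0.0], [3.0, 3.0]])  # one hidden node per class (hypothetical)
W = np.eye(2)                                 # each hidden node votes for its own class
f = np.array([0.2, -0.1])                     # input close to the first center
print(pnn_classify(f, centers, W, s=1.0))     # the first class wins
```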
Generalized Regression Neural Network (GRNN):
If f(x, z) is the joint probability density function of the vector random variable x and
its scalar random variable z, then the GRNN calculates the conditional mean E(z|x) of the
output vector. The joint probability density function f(x, z) is required to compute the
above conditional mean. GRNN approximates the probability density function from
the training vectors using Parzen windows estimation [40]. GRNNs do not require
iterative training; the hidden-to-output weights are just the target values t_k, so the
output y(x) is simply a weighted average of the target values t_k of the training cases x_k
close to the given input case x. It can be viewed as a normalized RBF network in
which there is a hidden unit centered at every training case. These RBF units are
called kernels and are usually probability density functions such as the Gaussians.
The only weights that need to be learned are the widths of the RBF units h. These
widths (often a single width is used) are called smoothing parameters or bandwidths
and are usually chosen by cross validation [38]. The following research paper gives
breast cancer diagnosis and prognosis results by GRNN-
The Wisconsin Breast Cancer Problem: Diagnosis and DFS time prognosis
using probabilistic and generalised regression neural classifiers, Author: Ioannis
Anagnostopoulos and Christos Anagnostopoulos
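A minimal sketch of the GRNN prediction just described, a kernel at every training case and a normalized weighted average of the targets t_k, is given below. It is illustrative only; the Gaussian kernel, the toy data and the bandwidth h are hypothetical choices.

```python
import numpy as np

def grnn_predict(x, X_train, t_train, h):
    """GRNN output: kernel-weighted average of the training targets."""
    # One Gaussian kernel (hidden unit) centered at every training case
    d2 = np.sum((X_train - x) ** 2, axis=1)
    k = np.exp(-d2 / (2.0 * h ** 2))
    # Normalized weighted average of the targets t_k
    return np.sum(k * t_train) / np.sum(k)

# Hypothetical 1-D data: targets switch from 0 to 1 near x = 1.5
X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
t_train = np.array([0.0, 0.0, 1.0, 1.0])
print(grnn_predict(np.array([2.1]), X_train, t_train, h=0.5))  # close to 1
```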
variables [46]. The following research papers use a fuzzy logic approach for breast
cancer diagnosis-
Cancer Diagnosis Using Modified Fuzzy Network, Author: Essam AlDaoud [48]
Abstract:
In this study, a modified fuzzy c-means (MFCM) radial basis function (RBF)
network is proposed. The main purposes of the suggested model are to diagnose the
cancer diseases by using fuzzy rules with a relatively small number of linguistic labels,
reduce the similarity of the membership functions and preserve the meaning of the
linguistic labels. The modified model is implemented and compared with adaptive
neuro-fuzzy inference system (ANFIS). The both models are applied on "Wisconsin
Breast Cancer" data set. Three rules are needed to obtain the classification rate 97%
by using the modified model (3 out of 114 is classified wrongly). On the contrary,
more rules are needed to get the same accuracy by using ANFIS. Moreover, the
results indicate that the new model is more accurate than the state-of-art prediction
methods. The suggested neuro-fuzzy inference system can be re-applied to many
applications such as data approximation, human behavior representation, forecasting
urban water demand and identifying DNA splice sites.
Comments:
ANFIS works with different activation functions and uses un-weighted connections
in each layer. ANFIS consists of five layers and can be adapted by a supervised
learning algorithm. In this paper ANFIS and the modified fuzzy RBF (MFRBF) are
applied on the Wisconsin Breast Cancer data set. The main purposes of the suggested
model are to diagnose cancer diseases by using fuzzy rules with a relatively small
number of linguistic labels, reduce the similarity of the membership functions and
preserve the meaning of the linguistic labels. The standard fuzzy c-means has various
well-known problems: the number of clusters must be specified in advance, the
output membership functions have high similarity, and FCM is an unsupervised
method that cannot preserve the meaning of the linguistic labels. On the contrary,
the grid partition method solves some of these matters, but it produces a very high
number of output clusters. The basic idea of the suggested MFCM algorithm is to
combine the advantages of the two methods: if more than one cluster centre exists
in one partition, the centres are merged and the membership values are calculated
again; if there is no cluster centre in a partition, it is deleted and the other clusters
are redefined. The experimental results show that MFRBF can be used to get high
accuracy with fewer and unambiguous rules. The classification rate is 97% using
only three rules. On the contrary, more rules are needed to get the same accuracy
using ANFIS. Moreover, the feature-projected partition in ANFIS is ambiguous and
cannot preserve the meaning of the linguistic labels.
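The membership computation that underlies fuzzy c-means, which the MFCM model modifies, can be illustrated with a short sketch. This is the standard FCM membership update only, not AlDaoud's modified algorithm, and the data points and cluster centres below are hypothetical:

```python
import numpy as np

def fcm_memberships(X, centers, m=2.0):
    """Standard fuzzy c-means membership update:
    u_ik = 1 / sum_j (d_ik / d_ij) ** (2 / (m - 1)),
    where d_ik is the distance from point i to centre k and m > 1
    is the fuzzifier. Each row of the result sums to 1."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)  # (n, c)
    d = np.maximum(d, 1e-12)                                         # avoid division by zero
    ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))     # (n, c, c)
    return 1.0 / ratio.sum(axis=2)

# Hypothetical points: two near the first centre, one at the second.
X = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]])
C = np.array([[0.0, 0.0], [10.0, 0.0]])
U = fcm_memberships(X, C)
```

Points near a centre get membership close to 1 in that cluster, while intermediate points share their membership; MFCM then merges or deletes centres based on how the partitions overlap.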
Genetic Algorithm (GA):
The standard GA proceeds as follows: an initial population of individuals is
generated at random or heuristically. At every evolutionary step, known as a generation,
the individuals in the current population are decoded and evaluated according to
some predefined quality criterion. To form a new population (the next generation),
individuals are selected according to their fitness. Many selection procedures are
currently in use, one of the simplest being fitness-proportionate selection, where
individuals are selected with a probability proportional to their relative fitness. This
ensures that the expected number of times an individual is chosen is approximately
proportional to its relative performance in the population. Thus, high-fitness or good
individuals stand a better chance of reproducing, while low-fitness ones are more
likely to disappear [45]. Genetic algorithms can be used to determine the
interconnecting weights of an ANN. During training of the network, BP requires
approximately two ANN evaluations (i.e., one forward propagation and one
backward error propagation) for each iteration, while the GA requires only one ANN
evaluation (i.e., forward propagation) for each generation and each chromosome. In
comparison to the conventional BP training algorithm, the GA has been shown to provide
some benefit in evolving the interconnecting weights of ANNs. In [49], although
the GA-trained ANN did not outperform the BP-trained ANN at all numbers of ANN
evaluations in the test set, the GA-trained ANN was found to converge faster than the
BP-trained ANN in the training set.
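Fitness-proportionate (roulette-wheel) selection as described above can be sketched directly. This is an illustrative Python sketch, not code from [49]; the population and fitness values are hypothetical:

```python
import random

def roulette_select(population, fitnesses):
    """Fitness-proportionate selection: each individual is chosen
    with probability proportional to its relative fitness."""
    total = sum(fitnesses)
    r = random.uniform(0.0, total)   # spin the wheel
    acc = 0.0
    for ind, fit in zip(population, fitnesses):
        acc += fit                   # each individual owns a slice of size fit
        if acc >= r:
            return ind
    return population[-1]            # guard against floating-point round-off

# High-fitness individuals should be picked far more often.
random.seed(0)
pop = ["a", "b", "c"]
fit = [1.0, 1.0, 8.0]
picks = [roulette_select(pop, fit) for _ in range(1000)]
```

With fitness 8 out of a total of 10, individual "c" is expected to be selected about 80% of the time, which is exactly the "expected number of times proportional to relative performance" property stated above.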
Computer-aided diagnosis of breast cancer using artificial neural networks:
Comparison of Backpropagation and Genetic Algorithms, by Yuan-Hsiang
Chang, Bin Zheng, Xiao-Hui Wang and Walter F. Good [49].
Abstract:
Comments:
In this paper it is found that although the GA-trained ANN did not outperform the
BP-trained ANN at all numbers of ANN evaluations in the test set, the GA-trained ANN
was found to converge faster than the BP-trained ANN in the training set.
2.3 Comparison of neural network techniques for breast cancer diagnosis and
prognosis
NN techniques for breast cancer diagnosis are compared for the WBC data. It is
concluded that MLP, RBFNN, PNN, GRNN, GA, fuzzy-neuro systems, SANE,
IGANIFS, the Xcyt system, ANFIS and SIANN may be used for the classification problem.
Almost all intelligent computational learning algorithms use supervised learning. The
accuracy of the different methods is compared in Table 2.1.
Table 2.1 Accuracy comparison for test data classification
Type of Network    Accuracy       References
                                  [18]
                   96.18%         [18]
                   98.8%          [18]
                   98.7%          [50]
                                  [51]
                   90 to 91%      [52]
                                  [53]
                                  [54]
                   100%           [55]
Chapter 4
MATLAB
4.1 Introduction
MATLAB is a powerful computing system for handling the calculations involved in
scientific and engineering problems. The name MATLAB stands for MATrix
LABoratory, because the system was designed to make matrix computations
particularly easy [87]. MATLAB program and script files always have filenames ending
with ".m". Script files contain a sequence of usual MATLAB commands that are
executed (in order) once the script is called within MATLAB. In MATLAB almost
every data object is assumed to be an array. A good source of information related to
MATLAB, the creator company The MathWorks Inc. and their other products
is their web page at www.mathworks.com [88]. There are two essential requirements
for successful MATLAB programming [87]:
a)
b)
1. Ease of use:
Because MATLAB is an interpreted language, like Basic, it is very easy to use. Programs may
be easily written and modified with the built-in integrated development environment
and debugged with the MATLAB debugger. Because the language is so easy to use,
it is ideal for the rapid prototyping of new programs. Many program development
tools are provided to make programming easy. They include an integrated
editor/debugger, on-line documentation and manuals, a workspace browser, and
extensive demos.
2. Platform Independence:
MATLAB programs written on any platform will run on all of the other platforms,
and data files written on any platform may be read transparently on any other
platform. As a result, programs written in MATLAB can migrate to new platforms
when the needs of the user change.
3. Predefined functions:
MATLAB has an extensive library of predefined functions that provide tested
and pre-packaged solutions to many basic technical tasks. There are many special
purpose toolboxes available to solve complex problems in specific areas. Toolboxes
are libraries of MATLAB functions used to customize MATLAB for solving a
particular class of problems. Toolboxes are the result of work by some of the world's top
researchers in specialized fields. They are equivalent to pre-packaged off-the-shelf
software for a particular class of problems. They are collections of special files
called M-files that extend the functionality of the base program. Such files are called
M-files because they must have the filename extension ".m". This extension is
required in order for these files to be interpreted by MATLAB. Each toolbox is
purchased separately. If an evaluation license is requested, the MathWorks sales
department requires detailed information about the project for which MATLAB is to
be evaluated. Overall the process of acquiring a license is expensive in terms of
money and time. If granted (which it often is), the evaluation license is valid for two
to four weeks. The various toolboxes are:
a. Control Systems
b. Signal Processing
c. Communications
d. System Identification
e. Robust Control
f. Simulink
g. Image Processing
h. Neural Networks
i. Fuzzy Logic
j. Analysis
k. Optimization
l. Spline
m. Symbolic
n.
4. Plotting:
MATLAB has many integral plotting and imaging commands. The plots and images
can be displayed on any graphical output device supported by the computer on which
MATLAB is running. This capability makes MATLAB an outstanding tool for
visualizing technical data.
5.
MATLAB Compiler:
Because MATLAB is interpreted, it can execute more slowly than compiled languages.
This problem can be mitigated by properly structuring the MATLAB program and by
the use of MATLAB compiler to compile the final MATLAB program before
distribution and general use.
1.
The first level is represented by the GUIs that are described in Getting
Started with Neural Network Toolbox. These provide a quick way to access the
power of the toolbox for many problems of function fitting, pattern recognition,
clustering and time series analysis.
XLII
2.
The command-line functions use simple argument lists with intelligent default
settings for function parameters. (You can override all of the default settings, for
increased functionality.) This topic, and the ones that follow, concentrate on
command-line operations. The GUIs described in Getting Started can automatically
generate MATLAB code files with the command-line implementation of the GUI
operations. This provides a nice introduction to the use of the command-line
functionality.
3. The third level of toolbox usage is customization of the toolbox. This
capability allows you to create your own custom neural networks, while still having
access to the full functionality of the toolbox.
4. The fourth level of toolbox usage is the ability to modify any of the M-files
contained in the toolbox.
4.5 Multilayer neural networks and backpropagation training
The multilayer feed forward neural network is the workhorse of the Neural Network
Toolbox software. It can be used for both function fitting and pattern recognition
problems. With the addition of a tapped delay line, it can also be used for prediction
problems. The work flow for the neural network design process has seven primary
steps: collect data, create the network, configure the network, initialize the weights
and biases, train the network, validate the network, and use the network.
The first step might happen outside the framework of Neural Network Toolbox
software, but this step is critical to the success of the design process.
4.5.1 Collecting the data
We need to collect and prepare sample data that cover the range of inputs for which
the network will be used. After the data have been collected, there are two steps that
need to be performed before the data are used to train the network: the data need to
be pre-processed, and they need to be divided into subsets.
Pre-processing functions such as processpca, which applies a principal component
analysis transformation to the input vectors, can be applied to the inputs and
outputs before training.
Generally, the normalization step is applied to both the input vectors and the target
vectors in the data set. In this way, the network output always falls into a normalized
range. The network output can then be reverse transformed back into the units of the
original target data when the network is put to use in the field.
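The normalization and reverse transform described above can be sketched as follows. The thesis relies on the toolbox's built-in mapping functions; this illustrative Python sketch mimics a mapminmax-style row-wise mapping into [-1, 1] (the function names and sample targets are hypothetical, and each row is assumed non-constant):

```python
import numpy as np

def normalize(x, lo=-1.0, hi=1.0):
    """Map each row of x linearly into [lo, hi], returning the
    settings needed to reverse the transform later."""
    xmin = x.min(axis=1, keepdims=True)
    xmax = x.max(axis=1, keepdims=True)
    y = (x - xmin) / (xmax - xmin) * (hi - lo) + lo
    return y, (xmin, xmax, lo, hi)

def denormalize(y, settings):
    """Reverse transform: map network outputs back to the units of
    the original target data."""
    xmin, xmax, lo, hi = settings
    return (y - lo) / (hi - lo) * (xmax - xmin) + xmin

targets = np.array([[10.0, 20.0, 30.0]])
tn, cfg = normalize(targets)        # network is trained against tn
restored = denormalize(tn, cfg)     # field outputs mapped back to original units
```

Training against the normalized targets keeps the network output in a fixed range; the stored settings then recover the original units when the network is put to use.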
4.5.1.2 Representing Unknown or Don't Care Targets
Unknown or don't care targets can be represented with NaN values. All the
performance functions of the toolbox will ignore those targets for purposes of
calculating performance and derivatives of performance.
4.5.1.3 Dividing the Data
When training multilayer networks, the general practice is to first divide
the data into three subsets: training, validation and testing. The function dividerand
is the default function, and it divides the data randomly into the three subsets.
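A random three-way division in the spirit of dividerand can be sketched as follows. This is an illustrative Python sketch, not the toolbox function itself; the 70/15/15 ratios and the function name are assumptions:

```python
import numpy as np

def dividerand_like(n, train=0.7, val=0.15, test=0.15, seed=0):
    """Randomly split sample indices 0..n-1 into training,
    validation and test subsets with the given ratios."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)                 # random order of all samples
    n_train = int(round(train * n))
    n_val = int(round(val * n))
    return (idx[:n_train],                   # training indices
            idx[n_train:n_train + n_val],    # validation indices
            idx[n_train + n_val:])           # test indices

tr, va, te = dividerand_like(100)
```

The three index sets are disjoint and together cover every sample, so each case is used exactly once, for training, early-stopping validation, or final testing.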
4.5.2 Creating and configuring the network
Basic components of a neural network are created and stored in the network object.
As an example, the dataset file contains a predefined set of input and target vectors.
We load the dataset using the load command. Loading the dataset file creates two
variables: the input matrix and the target matrix.
The training function stops the training when the validation error increases over
net.trainParam.max_fail iterations.
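The max_fail stopping rule can be illustrated with a small sketch. This Python fragment is not the toolbox's training loop; the function name and the validation-error sequence are hypothetical, and "increase" is treated here as any failure to improve on the best error so far:

```python
def train_with_early_stopping(val_errors, max_fail=6):
    """Stop when the validation error has failed to improve for
    max_fail consecutive checks; return the epoch with the best
    (lowest) validation error."""
    best, best_epoch, fails = float("inf"), 0, 0
    for epoch, err in enumerate(val_errors):
        if err < best:
            best, best_epoch, fails = err, epoch, 0   # new best: reset counter
        else:
            fails += 1                                # validation error did not improve
            if fails >= max_fail:
                break                                 # early stop
    return best_epoch

# Validation error falls, then rises: training stops and the
# minimum at epoch 3 is reported.
errors = [1.0, 0.8, 0.6, 0.5, 0.55, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1]
stop = train_with_early_stopping(errors, max_fail=6)
```

This is the essence of validation-based early stopping: the training set drives the weight updates, while the validation set decides when further training would only overfit.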
4.5.5 Validation of the network
When the training is complete, we check the network performance and determine if
any changes need to be made to the training process, the network architecture or the
data sets. The first thing to do is to check the training record, tr, which was the
second argument returned from the training function. For example, tr.trainInd,
tr.valInd and tr.testInd contain the indices of the data points that were used in the
training, validation and test sets, respectively. If we want to retrain the network using
the same division of data, we can set net.divideFcn to 'divideInd' and reuse these
indices; retraining several times is good practice. If the network is not sufficiently
accurate, we can try initializing the network and training again. Each time you
initialize a feed forward network, the network parameters are different and might
produce different solutions.
4.5.6 Use the network
After the network is trained and validated, the network object can be used to
calculate the network response to any input.
Chapter 5
Simulation and Results
5.1 Introduction
For simulation, three different datasets, named the Wisconsin Breast Cancer original
(WBC) dataset, the Wisconsin Diagnosis Breast Cancer (WDBC) dataset and the Wisconsin
Prognosis Breast Cancer (WPBC) dataset, are downloaded from the UCI Machine
Learning Repository website [91] and saved as text files. A brief description of the
Wisconsin datasets is given in Table 5.1. A detailed description of the datasets is
provided in the next section.
Table 5.1 A brief description of Breast Cancer datasets
Dataset name    No of attributes    No of instances    No. of classes
WBC             11                  699                2
WDBC            32                  569                2
WPBC            34                  198                2
After downloading we have three separate files, one for each dataset. These files
are then imported into Excel spreadsheets and the values are saved with the
corresponding attributes as column headers. The ID of the patient cases does not
contribute to the classifier performance, hence it is removed, and the outcome
attribute defines the target or dependent variable. We preprocessed the data using
principal component analysis, described in chapter 3 [34]. After pre-processing, the
WBC data is applied to the PNN described in chapter 3 [29-31], which classifies the data
into two sets. The overall classification involves training and testing as shown in Fig
5.1. Implementation is done with the help of MATLAB 7.0 using the neural network
toolbox described in chapter 4 [40-41].
Table 5.2 Attribute information of WBC dataset
Attribute                      Domain
Clump thickness                1-10
Uniformity of cell size        1-10
Uniformity of cell shape       1-10
Marginal adhesion              1-10
Single epithelial cell size    1-10
Bare nuclei                    1-10
Bland chromatin                1-10
Normal nucleoli                1-10
Mitosis                        1-10
Epithelial cells that are significantly enlarged may be malignant cells. Bare nuclei is a
term used for nuclei that are not surrounded by cytoplasm (the rest of the cell); those
are typically seen in benign tumors. Bland chromatin describes a uniform
texture of the nucleus seen in benign cells; in cancer cells the chromatin tends to be
coarser. Normal nucleoli are small structures seen in the nucleus. In normal cells
the nucleolus is usually very small if visible; in cancer cells the nucleoli become
more prominent, and sometimes there are more of them. Finally, mitosis is nuclear
division plus cytokinesis, producing two identical daughter cells. It is the process in
which the cell divides and replicates. Pathologists can determine the grade of a
cancer by counting the number of mitoses.
Wisconsin Diagnosis Breast Cancer (WDBC) Dataset:
This database has 569 instances and 32 attributes including the class attribute.
Attribute 2 is the class attribute; the other attributes are used to represent instances. Each
instance has one of two possible classes: benign or malignant. According to the class
distribution, 357 instances are benign and 212 instances are malignant. Table 5.3
provides the attribute information of the WDBC dataset.
Table 5.3 Attribute information of WDBC dataset
Attribute name             Significance                                   Attribute ID
ID                         Unique ID of patient                           1
Outcome                    Diagnosis (M = malignant, B = benign)          2
Radius 1,2,3                                                              3, 13, 23
Texture 1,2,3                                                             4, 14, 24
Perimeter 1,2,3                                                           5, 15, 25
Area 1,2,3                                                                6, 16, 26
Smoothness 1,2,3                                                          7, 17, 27
Compactness 1,2,3                                                         8, 18, 28
Concavity 1,2,3                                                           9, 19, 29
Concave points 1,2,3       Number of concave portions of the contour      10, 20, 30
Symmetry 1,2,3                                                            11, 21, 31
Fractal dimension 1,2,3                                                   12, 22, 32
The details of the attributes found in the WDBC dataset are: ID number, Diagnosis
(M = malignant, B = benign), and ten real-valued features computed for each cell
nucleus: Radius, Texture, Perimeter, Area, Smoothness, Compactness, Concavity,
Concave points, Symmetry and Fractal dimension [92]. These features are computed
from a digitized image of a fine needle aspirate (FNA) of a breast mass. The
radius of an individual nucleus is measured by averaging the length of the radial line
segments defined by the centroid of the snake and the individual snake points. The
total distance between consecutive snake points constitutes the nuclear perimeter. The
area is measured by counting the number of pixels on the interior of the snake and
adding one-half of the pixels on the perimeter. The perimeter and area are combined
to give a measure of the compactness of the cell nuclei using the formula
perimeter^2/area. Smoothness is quantified by measuring the difference between the
length of a radial line and the mean length of the lines surrounding it. This is similar
to the curvature energy computation in the snakes. Concavity is captured by
measuring the size of the indentations (concavities) in the boundary of the cell
nucleus. Chords between non-adjacent snake points are drawn, and the extent
to which the actual boundary of the nucleus lies on the inside of each chord is measured.
The concave points feature is similar to concavity, but only the number of
boundary points lying on the concave regions of the boundary is counted. In order to measure
symmetry, the major axis, or longest chord through the center, is found. Then the
length difference between lines perpendicular to the major axis to the nuclear
boundary in both directions is measured. The fractal dimension of a nuclear boundary
is approximated using the coastline approximation described by Mandelbrot. The
perimeter of the nucleus is measured using increasingly larger rulers. As the ruler
size increases, decreasing the precision of the measurement, the observed perimeter
decreases. Plotting log of observed perimeter against log of ruler size and measuring
the downward slope gives (the negative of) an approximation to the fractal
dimension. With all the shape features, a higher value corresponds to a less regular
contour and thus to a higher probability of malignancy. The texture of the cell
nucleus is measured by finding the variance of the gray scale intensities in the
component pixels.
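The compactness formula perimeter^2/area stated above can be checked with a small worked example. This is an illustrative Python sketch (the function name and the shapes chosen are hypothetical); it shows that a circle, the most regular contour, gives the minimum possible value of 4*pi, while an elongated shape of the same area scores much higher:

```python
import math

def compactness(perimeter, area):
    """Shape compactness as used for the WDBC features:
    perimeter^2 / area. Higher values mean a less regular contour,
    hence a higher suspicion of malignancy."""
    return perimeter ** 2 / area

# A circle of radius r: perimeter^2 / area = (2*pi*r)^2 / (pi*r^2) = 4*pi.
r = 3.0
circle = compactness(2 * math.pi * r, math.pi * r ** 2)

# A 1 x 9 rectangle with the same area (9) is far less compact.
rect = compactness(2 * (1 + 9), 1 * 9)
```

This matches the statement in the text that, for all the shape features, a higher value corresponds to a less regular contour.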
Table 5.4 Attribute information of WPBC dataset
Attribute name             Significance                                    Attribute ID
ID                         Unique ID of patient                            1
Outcome                    R = recur, N = non-recur                        2
Time                       Recurrence time or disease-free time            3
Radius 1,2,3                                                               4, 14, 24
Texture 1,2,3                                                              5, 15, 25
Perimeter 1,2,3                                                            6, 16, 26
Area 1,2,3                                                                 7, 17, 27
Smoothness 1,2,3                                                           8, 18, 28
Compactness 1,2,3                                                          9, 19, 29
Concavity 1,2,3                                                            10, 20, 30
Concave points 1,2,3       Number of concave portions of the contour       11, 21, 31
Symmetry 1,2,3                                                             12, 22, 32
Fractal dimension 1,2,3                                                    13, 23, 33
Tumour size                Diameter of the excised tumor in centimeters    34
Lymph node status          Number of positive axillary lymph nodes         35
The details of the attributes found in the WPBC dataset are: ID number, Outcome (R
= recur, N = non-recur), Time (R => recurrence time, N => disease-free time), and, from
attribute 4 to attribute 33, ten real-valued features computed for each cell nucleus: Radius,
Texture, Perimeter, Area, Smoothness, Compactness, Concavity, Concave points, Symmetry
and Fractal dimension. These features are computed for each cell nucleus from a
digitized image of a fine needle aspirate (FNA) of a breast mass. The mean, standard
error, and largest (worst case: mean of the three largest values) of these features
were computed for each image, resulting in 30 features. Attribute thirty-four is Tumor size
and attribute thirty-five is Lymph node status. As noted above, prognosis uses the
same features as diagnosis, plus two additional features, as follows. Tumor size is the
diameter of the excised tumor in centimeters. Tumor size is divided into four classes:
T-1 is from 0-2 centimeters, T-2 is from 2-5 cm, T-3 is greater than 5 cm, and T-4 is a
tumor of any size that has broken through (ulcerated) the skin, or is attached to the
chest wall. Lymph node status is the number of positive axillary lymph nodes observed
at the time of surgery. The lymph nodes in the armpit (the axillary lymph nodes) are the
first place breast cancer is likely to spread, and lymph node status is highly related to
prognosis. Lymph node-negative means the lymph nodes do not contain cancer;
lymph node-positive means the lymph nodes contain cancer.
In the WDBC and WPBC datasets, each of the ten real-valued features is therefore
recorded three times (mean, standard error and worst), occupying three columns in
the data set.
5.2 Results
The training performance for normalized WBC data and for WBC data after applying
PCA is compared in Table 5.5. We observed that the MSE is substantially reduced for
PCA-processed data even when 400 instances are used for training. The testing error
is compared in Table 5.6.
Table 5.5 Training performance for WBC dataset
Number of Training Patterns    MSE for Normalization    MSE for PCA
100                            0.0050                   0.0011
200                            0.0025                   6.6408e-04
300                            0.0017                   4.1121e-04
400                            0.0013                   3.0468e-04
Table 5.6 Testing performance for WBC dataset
Number of Patterns    MSE for Normalization    MSE for PCA
100                   0.0076                   3.3319e-04
200                   9.7617e-05               4.1995e-04
300                   3.6278e-04               1.5611e-04
400                   1.4996e-04               3.0486e-04
500                   1.6873e-04               9.5419e-05
600                   1.0297e-04               3.8735e-05
699                   1.0078e-05               2.2117e-05
Training performance for WDBC dataset (MSE):
100    0.0050
200    0.0026
300    0.0017
400    0.0013
Testing performance for WDBC dataset (MSE):
100    0.0340
200    0.0016
300    7.6359e-04
400    5.7702e-04
500    1.9828e-04
569    8.8298e-05
Performance for WPBC dataset (MSE for Normalization, MSE for PCA):
100    0.0050    3.5910e-08
198    0.0025    2.1737e-08
100    0.0022    6.9075e-08
198    0.0019    4.6077e-11
Fig 5.3(a) Testing error for normalization and PCA data for WPBC dataset over 100
data
Fig 5.3(b) Testing error for normalization and PCA for WPBC dataset over 198 data.
Chapter 3
Artificial Neural Network
and
Principal Component Analysis
Neural networks are an emergent technology with an increasing number of real-world
applications [56]. Neural networks are a form of artificial intelligence that has found
application in a wide range of problems [57]-[59] and has given, in
many cases, superior results to standard statistical models [60]. Artificial neural
networks perform various tasks such as pattern matching and classification,
function optimization and data clustering. These tasks are very difficult for
traditional computers, which are faster at algorithmic computational tasks and precise
arithmetic operations [61]. Originally inspired by biological models of mammalian
brains, ANNs have emerged as a powerful technique for data analysis [62]. A neural
network is able to solve highly complex problems due to the nonlinear processing
capabilities of its neurons. In addition, the inherent modularity of the neural network
structure makes it adaptable to a wide range of applications [63]. Following are the
main characteristics of ANNs [64]:
NNs exhibit mapping capabilities, that is, they can map input patterns to
their associated output patterns.
NNs learn by example. Thus NN architectures can be trained with
known examples of a problem before they are tested for their inference capability
on unknown instances of the problem. They can, therefore, identify objects on
which they have not previously been trained.
NNs possess the capability to generalize. Thus they can predict new
outcomes from past trends.
NNs are robust systems and are fault tolerant. They can, therefore, recall
full patterns from incomplete, partial or noisy patterns.
NNs can process information in parallel, at high speed, and in a distributed
manner.
3.2 Basic concepts of ANN
The terminology of ANNs has developed from a biological model of the brain [64].
There are three aspects involved in the construction of a neural network [63]:
Structure: the architecture and topology of the neural network.
Encoding: the method of changing weights (training).
A NN consists of a set of connected cells, the neurons. The neuron or unit processes
the inputs of the NN to create an output [64]. The network consists of a number of input
units, one or more output units, together with internal units. The outputs of the
network correspond to the variables we require to predict; the inputs correspond to
the variables on which we base the prediction. Adjustable weights are associated with
the interconnections between the units [65]. Fig 3.1 [64] shows the structure of a single
neuron. An artificial neuron performs the following: it receives signals from other
neurons, multiplies each signal by the corresponding connection strength, that is, the
weight, sums up the weighted signals, passes them through an activation function
and feeds the output to other neurons [66].
Various neural network models exist; among them is the feed forward neural network.
The feed forward neural network model, besides being popular and simple, is easy to
implement and appropriate for classification applications [63]. The feed forward
backpropagation network does not have feedback connections, but the errors are back
propagated during training. Fig 3.2 shows the feed forward NN for breast cancer
diagnosis. The network consists of an input layer, one or more hidden layers and an
output layer. It takes the predictive attributes as input and produces as output the
class attribute (benign or malignant).
Backpropagation learning consists of two passes through the different layers
of the network: a forward pass and a backward pass. In the forward pass, an input vector is
applied to the sensory nodes of the network and its effect propagates through the
network layer by layer. Finally, a set of outputs is produced as the actual response of
the network. During the forward pass the synaptic weights of the network are all
fixed. During the backward pass, the synaptic weights are all adjusted in accordance
with an error correction rule. The actual response of the network is subtracted from a
desired (target) response to produce an error signal. This error signal is then
backpropagated through the network, against the direction of the synaptic connections
[67]. Propagation of errors is done beginning at the output layer, through the hidden
layers, and so on, to the input layer, in the backward direction. The weights are therefore
updated at each layer, beginning at the output layer. The changes in weights are
proportional to the derivative of the errors with respect to the incoming weights [68]. For
a given set of training input-output pairs, a BP learning algorithm provides a
procedure for changing the weights in a BPNN to classify the given input patterns
correctly. The error is the difference between the actual (calculated) and desired
(target) output [69]. The input and output of neuron i (except for the input layer),
according to the BP algorithm [70], are formulated in (3.1) and (3.2).
Input_i = Σ_j W_ij x_j + b_i    (3.1)
Output_i = f(Input_i)    (3.2)
where W_ij is the weight of the connection from neuron i to node j, b_i is the numerical
value called the bias and f is the activation function. The sum in (3.1) is over all
neurons, j, in the previous layer. The output function is a nonlinear function, which
allows a network to solve problems that a linear network cannot [71]. The training
algorithm and the various parameters used for training the BPNN are as follows [61]:
A. Various Parameters:
Input training vector x = (x_1, …, x_i, …, x_n)
Output target vector t = (t_1, …, t_k, …, t_m)
δ_k = error at the output unit y_k
δ_j = error at the hidden unit z_j
α = learning rate
v_oj = bias on hidden unit j
z_j = hidden unit j
w_ok = bias on output unit k
y_k = output unit k.
B. Training Algorithm:
Step 1: Initialize the weights to small random values.
Step 2: While the stopping condition is false, do steps 3-10.
Step 3: For each training pair do steps 4-9.
Step 4: Each input unit receives the input signal x_i and transmits it to all units in the
hidden layer.
Step 5: Each hidden unit (z_j, j = 1…p) sums its weighted input signals,
z_inj = v_oj + Σ_{i=1..n} x_i v_ij, and applies its activation function, z_j = f(z_inj).
Step 6: Each output unit (y_k, k = 1…m) sums its weighted input signals,
y_ink = w_ok + Σ_{j=1..p} z_j w_jk, and computes y_k = f(y_ink).
Step 7: Each output unit computes its error term, δ_k = (t_k − y_k) f′(y_ink).
Step 8: Each hidden unit computes its error term, δ_j = f′(z_inj) Σ_{k=1..m} δ_k w_jk.
Step 9: The weights and biases are updated: w_jk = w_jk + α δ_k z_j, w_ok = w_ok + α δ_k,
v_ij = v_ij + α δ_j x_i, v_oj = v_oj + α δ_j.
Step 10: Test the stopping condition.
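The steps above can be sketched as a small batch implementation. This is an illustrative Python sketch with a logistic activation, not the thesis's MATLAB code; the hidden-layer size, learning rate, epoch count and the XOR example are assumptions made for the demonstration:

```python
import numpy as np

def f(x):
    """Logistic activation function."""
    return 1.0 / (1.0 + np.exp(-x))

def train_bp(X, T, p=4, alpha=0.5, epochs=10000, seed=0):
    """One-hidden-layer batch backpropagation following the steps
    above: forward pass (Steps 4-6), error terms (Steps 7-8) and
    weight/bias updates (Step 9)."""
    rng = np.random.default_rng(seed)
    n, m = X.shape[1], T.shape[1]
    V = rng.uniform(-0.5, 0.5, (n, p))   # input-to-hidden weights v_ij
    v0 = np.zeros(p)                     # hidden biases v_oj
    W = rng.uniform(-0.5, 0.5, (p, m))   # hidden-to-output weights w_jk
    w0 = np.zeros(m)                     # output biases w_ok
    for _ in range(epochs):
        Z = f(X @ V + v0)                # hidden activations z_j
        Y = f(Z @ W + w0)                # output activations y_k
        dk = (T - Y) * Y * (1 - Y)       # delta_k = (t_k - y_k) f'(y_ink)
        dj = Z * (1 - Z) * (dk @ W.T)    # delta_j = f'(z_inj) sum_k delta_k w_jk
        W += alpha * Z.T @ dk            # Step 9: weight and bias updates
        w0 += alpha * dk.sum(axis=0)
        V += alpha * X.T @ dj
        v0 += alpha * dj.sum(axis=0)
    return lambda x: f(f(x @ V + v0) @ W + w0)

# XOR: a problem a linear network cannot solve.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)
net = train_bp(X, T)
pred = net(X)
```

The XOR example illustrates the remark above that the nonlinear output function lets the network solve problems a linear network cannot.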
Higher order or polynomial neural networks (PNNs) were first introduced by [72]
and further analyzed by [73], who referred to them as 'tensor networks' and regarded
them as a special case of his functional-link models. PNNs use joint activations
between inputs, thus removing the task of establishing relationships between them
during training. PNN is faster to train and execute when compared to other neural
networks [73]. An error back propagation based learning rule using a norm-squared error
function is described as follows [74]. The aggregation function is considered as a
product of linear functions in the different dimensions of the space. A bipolar sigmoidal
activation function is used at each node. This kind of neuron looks complex at
first instance, but when used to solve a complicated problem it needs fewer
parameters than the existing conventional models. The PNN is a type of feed
forward NN. Fig. 3.4 shows a feed forward NN for breast cancer diagnosis. Fig.
3.3 [75] shows a schematic diagram of a generalized single multiplicative or
polynomial neuron. The operator is a multiplicative operation, as in (3.3) and (3.4):
net = Π_{i=1..n} (w_i x_i + b_i)    (3.3)
y = f(net) = (1 − e^(−net)) / (1 + e^(−net))    (3.4)
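A single polynomial (multiplicative) neuron of this kind, a product of linear functions per input dimension followed by a bipolar sigmoid, can be sketched as follows. This is an illustrative Python sketch; the weight and bias values are hypothetical, not the trained thesis model:

```python
import numpy as np

def bipolar_sigmoid(x):
    """Bipolar sigmoidal activation with outputs in (-1, 1)."""
    return 2.0 / (1.0 + np.exp(-x)) - 1.0

def polynomial_neuron(x, w, b):
    """Single multiplicative (polynomial) neuron: the aggregation is
    a product of linear functions, one per input dimension,
    net = prod_i (w_i * x_i + b_i), passed through a bipolar sigmoid."""
    net = np.prod(w * x + b)
    return bipolar_sigmoid(net)

# Hypothetical input, weights and biases for one neuron:
x = np.array([0.5, -0.2, 0.8])
w = np.array([1.0, 2.0, 0.5])
b = np.array([0.1, 0.3, -0.2])
y = polynomial_neuron(x, w, b)
```

The product aggregation gives the neuron joint (higher-order) activations between inputs with only one weight and one bias per dimension, which is why a single such neuron can need fewer parameters than a conventional summing network of comparable power.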
Without data pre-processing, training of the neural networks would have been very
slow. Data preprocessing is required to improve the predictive accuracy. It can be
used to scale the data into the same range of values for each input feature in order to
minimize bias within the neural network from one feature to another. Data pre-processing
speeds up training by starting the training process for each feature within the same
scale. It is especially useful for modeling applications where the inputs are generally
on widely different scales. Therefore neural networks learn faster and give better
performance if the input variables are pre-processed before being used to train the
network. Preprocessing for a neural network involves feature selection and feature
extraction.
3.7.1 Feature selection
Feature selection is the process of finding a subset of the original variables, with the
aim of reducing and eliminating noise dimensions. The main idea of feature selection
is to choose a subset of input variables by eliminating features with little or no
predictive information. Feature selection can significantly improve the
comprehensibility of the resulting classifier models and often builds a model that
generalizes better to unseen points [81].
3.7.2 Feature extraction
Feature extraction is a technique to transform high-dimensional data into lower
dimensions. When the input data to an algorithm is too large to be processed and is
suspected to be notoriously redundant (much data, but not much information), the
input data is transformed into a reduced representation set of features (also called a
feature vector). If the features extracted are carefully chosen, it is expected that the
feature set will capture the relevant information from the input data, so that the
desired task can be performed using this reduced representation instead of the full
size input. By reducing the dimensionality of the input set, correlated information is
eliminated at the cost of a loss of accuracy. Dimensionality reduction can be
achieved either by eliminating data closely related to other data in the set, or by
combining data to make a smaller set of features. The identification of a reduced set
of features that are predictive of outcomes can be very useful from a knowledge
discovery perspective. For many learning algorithms, the training and/or
classification time increases directly with the number of features, which is efficiently
reduced by dimension reduction methods. Noisy or irrelevant features can have a
negative impact on classification accuracy.
3.8 Principal Component Analysis
PCA allows us to compute a linear transformation that maps data from a high
dimensional space to a lower dimensional space, y = Tx, where T is the K x N
transformation matrix. The components b_k of the projected vector are obtained from
the K principal eigenvectors u_k of the data covariance matrix:
[b_1, b_2, …, b_K]^T = [u_1^T; u_2^T; …; u_K^T] (x − x̄) = U^T (x − x̄)
where x̄ is the mean of the data.
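The projection b = U^T(x − x̄) can be sketched as follows. This illustrative Python sketch computes the top-K eigenvectors of the sample covariance matrix and projects the centred data onto them; the correlated toy data are hypothetical:

```python
import numpy as np

def pca_project(X, K):
    """Project each sample onto its first K principal components:
    b = U^T (x - mean), with U the top-K eigenvectors of the
    sample covariance matrix."""
    mean = X.mean(axis=0)
    Xc = X - mean                          # centre the data (x - x_bar)
    cov = Xc.T @ Xc / (len(X) - 1)         # sample covariance matrix
    vals, vecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    U = vecs[:, ::-1][:, :K]               # top-K principal eigenvectors
    return Xc @ U, U, mean

# Correlated 2-D data: one component captures almost all the variance.
rng = np.random.default_rng(0)
t = rng.normal(size=(200, 1))
X = np.hstack([t, 2 * t]) + 0.01 * rng.normal(size=(200, 2))
B, U, mu = pca_project(X, 1)
```

Because the two coordinates are almost perfectly correlated, a single principal component retains nearly all of the variability, which is exactly the property exploited when the WBC inputs are reduced before training.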
3.9 Advantages of Principal Component Analysis
Principal components capture most of the variability in the data using fewer
dimensions than those in which the data originally exists; the principal components
lie in the same space as the data.
The principal eigenvectors are orthogonal and represent the directions in which
the signals have maximum variation. This property speeds up the convergence of
model training and improves system performance [85].
The reduced feature space retains the features that truly contribute to
classification, which cuts pre-processing costs [85].
Chapter 6
Conclusion and
Future Work
6.1 Conclusion
The last decade has witnessed major advancements in the methods of diagnosis and
prognosis of breast cancer. Soft computing techniques can be used for breast cancer
diagnosis and prognosis. The use of ANNs increases the accuracy of most methods
and decreases the need for a human expert. The neural network based clinical support
system proposed in this research provides the medical experts with a second opinion,
thus removing the need for biopsy and excision and reducing unnecessary
expenditure. We believe that the results presented here are interesting and will lead
to further research on how the technique can be used more efficiently for the diagnosis of
other diseases. The behaviour of the neural network depends on the data it is trained
on; the more data the network is trained on, the more intelligent it becomes. From the
diagnosis results, a determination can be made whether a woman has a cancerous
tumor or not. The prognosis results will help in taking treatment decisions for women
having a cancerous tumor.
6.2 Future work
For future work, neural network classification combined with a fuzzy inference system can be applied to the task of diagnosis and prognosis, so that results can be given with a percentage of confidence that a woman has a cancerous or non-cancerous breast tumor. Prognosis results can likewise be given with a confidence measure for whether the cancer will recur. More accurate learning methods may also be evaluated. It is believed that a fuzzy system combined with the polynomial neural network can be very helpful to physicians in reaching a final decision on the diagnosis and prognosis of their patients. With such an efficient tool, physicians can perform very accurately and with confidence; it can assist in both the diagnosis and the prognosis of breast cancer.
We can also use a genetic algorithm (GA) to generate the optimum weights for our network. There are three ways in which a GA can be used with a NN. The first is the threshold hybrid, where the fitness function stops once the standard deviation is less than 0.0001. The second is the basic hybrid of GA and BP, which runs while the iteration count is less than 2000 or until the basic error rate falls below 0.01. The third is the adaptation hybrid of GA and BP, where the square root of the error is decreased by 30% until it reaches 0.01. All three methods help keep the NN from being trapped in a local minimum. In addition, since the GA is a stochastic method, it works well with a BPNN, as the generated weights may be changed several times during the learning process.
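A minimal sketch of the basic hybrid idea, evolving a network's weight vector with a GA and stopping once the error falls below 0.01, might look as follows. The task (XOR), population size, and genetic operators here are illustrative assumptions, not taken from this work:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy task: fit XOR with a tiny 2-2-1 network whose 9 weights
# form one GA chromosome.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
y = np.array([0, 1, 1, 0], float)

def forward(w, x):
    W1 = w[:4].reshape(2, 2); b1 = w[4:6]   # hidden layer weights/bias
    W2 = w[6:8];              b2 = w[8]     # output weights/bias
    h = np.tanh(x @ W1 + b1)
    return 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))  # sigmoid output

def mse(w):
    return np.mean((forward(w, X) - y) ** 2)

pop = rng.normal(scale=2.0, size=(60, 9))        # random initial population
for gen in range(300):
    fit = np.array([mse(w) for w in pop])
    if fit.min() < 0.01:                         # error threshold stopping rule
        break
    order = np.argsort(fit)
    parents = pop[order[:20]]                    # keep the 20 fittest (elitism)
    kids = []
    for _ in range(len(pop) - len(parents)):
        a, b = parents[rng.integers(20)], parents[rng.integers(20)]
        cut = rng.integers(1, 9)                 # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        child += rng.normal(scale=0.3, size=9)   # Gaussian mutation
        kids.append(child)
    pop = np.vstack([parents, kids])

best = pop[np.argmin([mse(w) for w in pop])]
print(round(mse(best), 4))
```

Because selection keeps the fittest chromosomes while crossover and mutation keep perturbing the weights, the search does not follow a single gradient path, which is the property that helps it escape local minima.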
List of Publications
International Journal
1. Classification of Breast Cancer Data, Int. J. of Eng. and Adv. Tech. (IJEAT), ISSN: 2249-8958, vol. 2, no. 1, pp. 234-237, October 30, 2012.
International Conferences
1. Neural Network Model for Prognostic Breast Cancer Prediction, in Proc. of Int. Conf. on Advances in Comput. Sci. and Eng., Jan 2013, Hyderabad, India.
References
1. Cancer Facts and Figures 2010 [online]. Available: http://www.cancer.org/Research/CancerFactsFigures/cancer-facts-and-figures-2010.
2. www.ameinfo.com/tawams-2012-breast-cancer-awareness-campaign-312992.
3.
/malebreastcancer.html.
4.
http://www.beliefnet.com/healthandhealing/getcontent.aspx?cid=21322
6.
/research/jan12/0112RA-20.htm.
7. 107. doi:10.1093/innovait/inn001.
9.
National Cancer Institute (27 June, 2005). "Paget's Disease of the Nipple:
10.
cancer/what-is-breast-cancer.
11. the international workshop on screening for breast cancer, Journal of the National Cancer Institute, 85: 1644-1656, 1993.
13. R. W. M. Giard and J. 69: 2104-2110, 1992.
14.
15.
E. T. Lee, Statistical Methods for Survival Data Analysis, John Wiley and
cancer diagnosis,
29.
18.
networks, J. of Elect. & Electron. Eng., vol. 4-2, 2004, pp. 1149-1153.
19.
G.
Networks, American J. of Engineering and Applied Sciences, vol. 5, no. 1, pp. 42-51, 2012.
22.
comparing back propagation training algorithms, Int. J. on Comput. Sci. and Eng.
(IJCSE), vol. 3 no. 1, Jan 2011.
23.
comparing back propagation training algorithms, Int. J. on Comput. Sci. and Eng.
(IJCSE), vol. 3 no. 1, Jan 2011.
28.
Recursive Least Square and Modified Radial Basis Functional Neural Network, Int.
Conf. [ICCT-2010], IJCCT vol. 2, no. 2, 3, 4, 3rd-5th December 2010.
29.
31.
Recursive Least Square and Modified Radial Basis Functional Neural Network, Int.
Conf. [ICCT-2010], IJCCT vol. 2, no. 2, 3, 4, 3rd-5th December 2010.
32.
Zhang, Application of
disease diagnosis using neural networks, Expert Systems with Applications, vol. 36,
no. 4, pp. 8610-8615, May 2009.
34.
Approach, J. of Theoretical and Applied Inform. Technology, vol. 4, no. 8, pp. 697
699, 2009.
35.
Angelos Rouskas,
H. Pournaghshband, The
comparison of methods artificial neural network with linear regression using specific
variables for prediction stock price in Tehran stock exchange, Int. J. of Comput. Sci.
and Inform. Security (IJCSIS), vol. 7, no. 2, Feb. 2010.
40.
D. Vergados The wisconsin breast cancer problem: diagnosis and DFS time
prognosis using probabilistic and generalised regression neural classifiers Draft
version of paper to appear at the Oncology Reports, special issue Computational
Analysis and Decision Support Systems in Oncology, last quarter 2005.
41.
Learning for Large Data Sets, Proc. of the IEEE Int. Adv. Computing Conf. 6-7, Patiala, India, pp. 541-545, 2009.
42.
with support-vector regression for noisy regression problems, IEEE Trans. on Fuzzy
Systems, vol. 18, no. 4, pp. 686 699, 2010.
43.
with local feedbacks and its application to dynamic system processing, Fuzzy Sets
and Systems, vol. 161, no. 19, pp. 2552-2562, 2010.
44.
RR Yager, LA. Zadeh, Fuzzy Sets, Neural Networks, and Soft Computing.
breast cancer diagnosis, Artificial Intelligence in Medicine, vol. 1, no. 2, pp. 131-155, Oct. 1999.
46.
breast cancer diagnosis, Artificial Intelligence in Medicine, vol. 1, no. 2, pp. 131-155, Oct. 1999.
48.
Essam Al-Daoud
50.
system using symbiotic adaptive neuro-evolution (SANE) in Proc. Int. conf. of Soft
Computing and Pattern Recognition 2010 (SoCPaR-2010), ABV-IIITM, Gwalior,
7th-10th Dec., pp. 326-329.
51.
inference system for breast cancer diagnoses, in Proc. Comput. Sci. Convergence
Inform. Tech. 2010 (ICCIT-2010), IEEE, Seoul, 30th Nov.-2nd Dec., pp. 911-915.
52.
V. Bevilacqua, G. Mastronardi,
methods and artificial neural network design in breast cancer diagnosis: IDEST
experience, in Proc. Int. Conf. on Intelligent Agents, Web Technologies and Internet
Commerce and Int. Conf. on Computational Intelligence for Modeling, Control
Automation 2005 (CIMCA-2005), 28th-30th Nov., IEEE, Vienna, pp. 373-378.
53.
neural network with adaptive boosting for computer aided Diagnosis of breast
cancer, in Proc. IEEE Int. Workshop on Soft Computing in Ind. Application, 2003
(SMCia-2003), Finland, 23rd-25th June, pp. 167-172.
54.
artificial neural networks to medical diagnosis, in Proc. 7th Australian and New
Zealand Intelligent Inform. System Conf. 2001, IEEE, Perth, 18th-21st Nov., 2001,
pp. 89 -94.
56.
60.
Comput. 1989;1:425-64.
61.
63.
www.iasri.res.in/ebook /EBADAT/5.../5-ANN_GKJHA_2007.pdf
65.
Ruth M. Ripley, Neural network models for breast cancer prognosis, Ph. D.
Thesis, Dept. of Eng. and Sci., St. Cross College, Univ. Of Oxford, 1998.
66.
comparing back propagation training algorithms, Int. J. on Comput. Sci. and Eng.
(IJCSE), vol. 3 no. 1, Jan 2011.
67.
Education, 2001.
68.
connectionist models of
Comput. and
P.
Heermann and N.
sensing data using a back- propagation neural network, IEEE Trans. on Geoscience
and Remote Sensing, vol. 30, pp. 81-88 , 1992.
72.
high-order neural networks, Applied Optics, vol. 26, no. 23, Optical Society of America, Washington DC, pp. 4972-4978, 1987.
73.
Network for Reducing Bit Error Rate in Dispersive FIR Channel Noise Model, Int.
J. of Elect. and Comput. Eng., vol. 3, no. 3, 2009, pp. 150-153.
75.
R.N. Yadav, P.K. Kalra, and J. John, Time series prediction with single
multiplicative neuron model, Applied Soft Computing, vol. 7, pp. 1157-1163, 2007.
76.
1995;346:1135-8.
77.
[Online]. Available: uran.donetsk.ua/~masters/2006/kita/zbykovsky/library/nninmed.pdf.
80.
82.
Comput. and
84.
in the Presence of Missing Values, J. of Machine Learning Research , vol. 11, pp.
1957-2000, 2010.
85.
Anil K. Jain, Robert P.W. Duin, and Jianchang Mao, Statistical Pattern
Tang F., Tao H., Fast linear discriminant analysis using binary bases, Proc. of
University, 1999 [online]. Available: web2.clarkson.edu/class/ma571/Xeno-MATLAB_guide.pdf.
89. Neural Network Toolbox User's Guide [online]. Available: www.mathworks.in/help/pdf_doc/nnet/nnet_ug.pdf.
91. A. Frank and A. Asuncion (2010). UCI Machine Learning Repository [online].
extraction for breast tumor diagnosis, Proc. IS&T/SPIE Int. Symp. on Electron. Imaging: Sci. and Technology, 1993, vol. 1905, pp. 861-870.
93.
based on association rules and neural network, Expert Systems with Applications, vol. 36, pp. 3465-3469, 2009.