
ARTIFICIAL INTELLIGENCE

ITE2010

Prediction of Breast Cancer

Project report for Review-II

Under the guidance of


Prof. Ajit Kumar Santra

BY
Ashish Tiwari
16BIT0005
SLOT: E1+TE1
ABSTRACT

Breast cancer is becoming one of the most common diseases among women and, if not
addressed in its initial stages, can lead to death. It is therefore of utmost importance to
classify tumors accurately. One of the best ways to treat breast cancer is to target the
cancer cells at an early stage, since their progression can be fatal. Early detection,
however, is difficult because patients often ignore minor symptoms that could indicate
the disease. Neural network techniques are very useful for detecting and monitoring cancer.
We aim to combat breast cancer by using soft computing techniques such as neural
networks, genetic algorithms and fuzzy logic for its timely prediction. By using them to
process data faster, we hope not only to detect minor symptoms but also to deliver the
required healthcare at the earliest possible hour. This could lead to higher accuracy in
treatment as well as lower mortality rates for breast cancer patients. Soft computing
techniques can be used to cluster or classify gene data. They have also proven very helpful
in pattern recognition, which makes them useful for diagnosing cancer at very early stages.
OBJECTIVE
This project shows how a combination of supervised and unsupervised learning methods
can be used to solve classification problems. Using these methods, we classify the input
and determine whether or not the patient has cancer.

INTRODUCTION

Cancer is a major cause of death in many developed countries. Cancer classification
in medical practice that relies only on clinical and histopathological facts may produce
incomplete or misleading results. The DNA microarray is very useful for determining the
expression levels of thousands of genes simultaneously in a cell mixture, and DNA
microarray technology has been applied to the accurate prediction and diagnosis of
cancer. Molecular-level diagnostics with gene expression profiles can offer a
methodology for accurate and systematic cancer classification. Classifying tumors
accurately is very important for the treatment of cancer. Since gene expression data
generally comprise a huge number of genes, several researchers have examined the
problem of cancer classification using data mining approaches, statistical methods and
machine learning algorithms to evaluate these data effectively. Various machine
learning techniques are used for cancer classification, such as support vector machines,
k-nearest neighbors and neural networks. Neural network techniques are very useful for
the detection and monitoring of cancer. The artificial neural network is a robust tool
that has recently been used for both clustering and classification of gene expression
data: supervised models are used for classification, while unsupervised models are used
for clustering.
LITERATURE SURVEY
Daoud and Mayo [1] observe that neural networks are powerful tools used widely for
building cancer prediction models from microarray data. They review the most recently
proposed models to highlight the roles of neural networks in predicting cancer from gene
expression data. The articles were identified from those published between 2013 and 2018
in scientific databases using keywords such as cancer classification, cancer analysis,
cancer prediction, cancer clustering and microarray data. Their analysis reveals that
neural network methods have been used for filtering (data engineering) the gene
expressions in a step prior to prediction; for predicting the existence of cancer, the
cancer type or the survivability risk; or for clustering unlabeled samples. The paper also
discusses some practical issues that can be considered when building a neural
network-based cancer prediction model.

Vazifehdan et al. [2] note that data mining and machine learning approaches can be used
to predict breast cancer recurrence, but that real datasets often include missing values
for various reasons. They propose a hybrid imputation method that takes into account the
dependency between the attributes and the type of the incomplete attributes in order to
improve the prediction of breast cancer recurrence. After the dataset is split into a
discrete subset and a numerical subset, the missing values of the discrete fields are
first imputed using a Bayesian network. Then, using tensor factorization, an integrated
dataset is constructed from the filled subset of the previous stage and the numerical
subset with missing values, so that the continuous missing values are imputed and the
accuracy of imputation is enhanced. The authors evaluated the proposed method against six
imputation methods (mean, hot-deck, k-NN, weighted k-NN, tensor factorization and
Bayesian network) on three datasets, using three classifiers for recurrence prediction:
decision tree, k-nearest neighbor and support vector machine.
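
As a point of reference for the baseline methods named above, mean imputation can be
sketched in a few lines of NumPy. This is only an illustrative sketch of the simplest
baseline, not the hybrid method of [2]; the function name mean_impute and the toy array
are assumptions made for the example.

```python
import numpy as np

def mean_impute(data):
    """Replace each NaN with the mean of the observed values in its column."""
    data = np.asarray(data, dtype=float)
    # Column means computed over the non-missing entries only.
    col_means = np.nanmean(data, axis=0)
    # Boolean mask marking the missing entries.
    missing = np.isnan(data)
    filled = data.copy()
    # For every missing cell, look up the mean of the column it belongs to.
    filled[missing] = np.take(col_means, np.where(missing)[1])
    return filled

# Hypothetical toy records: rows are patients, columns are numeric attributes.
toy = [[1.0, 10.0],
       [np.nan, 20.0],
       [3.0, np.nan]]
print(mean_impute(toy))
```

Mean imputation ignores any dependency between attributes, which is the limitation the
hybrid approach in [2] is designed to address.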
Agrawal and Agrawal [3] describe cancer as a dreadful disease from which millions of
people die every year. It is essential for medical practitioners to choose a proper
treatment for cancer patients, and therefore cancer cells should be identified correctly.
Neural networks are currently an active research area in medical science, especially in
cardiology, radiology, oncology and urology. The paper surveys various neural network
technologies for the classification of cancer. The main aim of the survey is to guide
researchers in developing cost-effective and user-friendly systems, processes and
approaches for clinicians.

Asri et al. [4] note that breast cancer causes a high number of deaths every year; they
describe it as the most common type of all cancers and the main cause of women’s deaths
worldwide. Classification and data mining methods are an effective way to classify data,
especially in the medical field, where they are widely used in diagnosis and analysis to
support decisions. The paper compares the performance of different machine learning
algorithms, namely Support Vector Machine (SVM), Decision Tree (C4.5), Naive Bayes (NB)
and k-Nearest Neighbors (k-NN), on the Wisconsin Breast Cancer (original) dataset. The
main objective is to assess the correctness of each algorithm in classifying the data,
with respect to its efficiency and effectiveness, in terms of accuracy, precision,
sensitivity and specificity.
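
The four evaluation measures named above follow directly from the counts of true and false
positives and negatives. The sketch below shows one way to compute them for binary labels;
the function name classification_metrics and the example labels are hypothetical and are
not taken from [4].

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, sensitivity and specificity for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    def ratio(num, den):
        # Guard against empty denominators (e.g. no positive predictions).
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),  # true positive rate (recall)
        "specificity": ratio(tn, tn + fp),  # true negative rate
    }

# Hypothetical labels: 1 = malignant, 0 = benign.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
print(classification_metrics(y_true, y_pred))
```

In a diagnostic setting sensitivity is usually the measure to watch most closely, since a
false negative corresponds to a missed malignant case.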

Kumari and Singh [5] note that breast cancer has become a major cause of mortality among
women. The availability of healthcare datasets and data analysis encourages researchers to
extract unknown patterns from such datasets. The aim of their study is to design a system
that can predict the incidence of breast cancer at an early stage by analyzing the smallest
set of attributes selected from the clinical dataset. The Wisconsin breast cancer dataset
(WBCD) was used to conduct the experiment. The potential of the proposed method is measured
by its classification accuracy, obtained by comparing actual and predicted values. The
maximum classification accuracy reported for the study is 99.28%.
Kaur et al. [6] note that breast cancer is a major cause of death in females and that early
recognition of the disease with the assistance of mammography reduces the death rate.
Deep learning (DL) is an approach used and requested by radiologists, as it assists them
in making an accurate diagnosis and helps to improve outcome predictions. The paper
presents a new approach, applied to the Mini-MIAS dataset of 322 images, involving a
pre-processing method and built-in feature extraction using K-means clustering for
Speeded-Up Robust Features (SURF) selection. A new layer is added at the classification
level, which uses a split of 70% training to 30% testing for the deep neural network
and the Multiclass Support Vector Machine (MSVM). The results show that the accuracy of
the proposed automated DL method using K-means clustering with MSVM is better than that
of a decision tree model. The experimental results show average accuracy (ACC) rates for
the three classes, i.e., normal, benign and malignant cancer, of 95%, 94% and 98%,
respectively.

Saxena and Kumar [7] describe breast cancer as the most perilous of all cancer types and
the second leading cause of death among women throughout the world. Breast cancer is
preventable and curable if detected at an early stage, but because it often cannot be
diagnosed early through laboratory tests, it is detected late and the death rate is
increasing enormously. Diagnosis and prediction of breast cancer has therefore always been
a challenging task, which has made it an attractive subject of study for researchers over
the past few years. Work in the field of breast cancer is not bound to historical methods:
recent advances in technology give researchers the freedom to expand their field of
research, and various data mining and machine learning techniques have since been applied
to various breast cancer datasets.
Chaurasia et al. [8] report that breast cancer is the second leading cancer among women.
Around 1.1 million cases were recorded in 2004. Observed rates of this cancer increase
with industrialization and urbanization, and also with facilities for early detection. It
remains much more common in high-income countries but is now increasing rapidly in
middle- and low-income countries, including within Africa, much of Asia and Latin America.
Breast cancer is fatal in under half of all cases and is the leading cause of death from
cancer in women, accounting for 16% of all cancer deaths worldwide.

Said et al. [9] note that breast cancer is a deadly disease in women and that predicting
breast cancer outcomes is very useful in determining an efficient treatment plan for new
patients. Predicting the outcomes (also called prognosis) is done on the basis of previous
patients’ data, which show the patients’ characteristics and how the doctors treated them.
The paper proposes a new, efficient model for predicting the main outcomes of breast
cancer: survival rate, disease-free survival and recurrence detection.

Zhou et al. [10] note that gene selection is an important research problem owing to the
large number of genes and the small number of experimental conditions. They propose a
Bayesian approach to gene selection and classification using the logistic regression
model. The basic idea of the approach is to use a logistic regression model to relate the
gene expression to the class labels. Gibbs sampling and Markov chain Monte Carlo (MCMC)
methods are used to discover important genes; to implement the Gibbs sampler and the MCMC
search, the authors derive a posterior distribution of the selected genes given the
observed data.
METHODOLOGY

We propose a system that comprises the best combinations of genetic algorithms, neural
networks and fuzzy networks, as well as hybrid systems such as neuro-fuzzy networks and
GA-ANN systems. The primary aim is to compare the various methods for detecting the early
stages of breast cancer. The first phase involves performing classification and drawing
conclusions about which methods are more accurate on the particular dataset and how
economical each method is. The second phase takes this a step further by checking whether
combinations of the methods help to improve prediction accuracy and economy.

Several types of neural network can be used for the prediction of breast cancer. A few of
them are as follows.
“Feedforward Neural Network”: here the information travels only forward through the
network (no loops), first through the input nodes, then through the hidden nodes (if
present), and finally through the output nodes. It is primarily used for supervised
learning in cases where the data to be learned is neither sequential nor time-dependent.
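
The forward-only flow of information described above can be sketched as follows. The
sigmoid activation, the layer sizes and the random weights are assumptions chosen for
illustration; they are not the configuration used in this project.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feedforward(x, layers):
    """Propagate x through a list of (weights, bias) pairs, one pair per layer."""
    activation = np.asarray(x, dtype=float)
    for weights, bias in layers:
        # Each layer only sees the previous layer's output: there are no loops.
        activation = sigmoid(activation @ weights + bias)
    return activation

# Hypothetical sizes: 2 inputs -> 3 hidden nodes -> 1 output node.
rng = np.random.default_rng(0)
layers = [(rng.normal(size=(2, 3)), np.zeros(3)),
          (rng.normal(size=(3, 1)), np.zeros(1))]
print(feedforward([0.5, -1.2], layers))
```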
Another type of network is the “Radial Basis Function Neural Network” (RBF). It has two
layers: in the inner layer the features are combined with the radial basis function, and
the outputs of this layer are then combined in the next layer to compute the final output.
It performs classification by measuring the input’s similarity to examples from the
training set.
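
The idea of classifying by similarity to training examples can be illustrated with a
deliberately simplified RBF-style classifier. In this sketch every training example acts
as a prototype with a Gaussian basis function, and the output layer simply sums the
activations per class instead of learning output weights; the names rbf_activations and
rbf_predict, the gamma parameter and the toy data are all assumptions.

```python
import numpy as np

def rbf_activations(x, centers, gamma=1.0):
    """Gaussian radial basis function: exp(-gamma * squared distance to each center)."""
    sq_dist = np.sum((centers - x) ** 2, axis=1)
    return np.exp(-gamma * sq_dist)

def rbf_predict(x, centers, labels, gamma=1.0):
    """Return the label whose prototypes are, in total, most similar to x."""
    acts = rbf_activations(np.asarray(x, dtype=float), centers, gamma)
    classes = np.unique(labels)
    # Output layer: sum the hidden activations belonging to each class.
    scores = np.array([acts[labels == c].sum() for c in classes])
    return classes[np.argmax(scores)]

# Hypothetical prototypes in a 2-feature space (0 = benign, 1 = malignant).
centers = np.array([[1.0, 1.0], [1.5, 0.5], [5.0, 5.0], [6.0, 5.5]])
labels = np.array([0, 0, 1, 1])
print(rbf_predict([1.2, 0.8], centers, labels))
print(rbf_predict([5.5, 5.0], centers, labels))
```

A full RBF network would normally choose a smaller set of centers (for example by
clustering) and train the output weights, which this sketch omits.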
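“Recurrent Neural Network” (RNN) works on the principle of saving the output of a layer
and feeding it back to the input to help in predicting the outcome of the layer. The saved
output is taken into consideration when computing the output at the next time-step, so it
basically acts as a memory.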
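
The feedback of a layer's output into its own input can be sketched with a simple
(Elman-style) recurrent layer. The tanh activation, the sizes and the random weights below
are assumptions made for illustration only.

```python
import numpy as np

def rnn_forward(inputs, w_in, w_rec, bias):
    """Run a simple recurrent layer over a sequence of input vectors."""
    hidden = np.zeros(w_rec.shape[0])
    states = []
    for x_t in inputs:
        # The previous output `hidden` is fed back in alongside the new input.
        hidden = np.tanh(x_t @ w_in + hidden @ w_rec + bias)
        states.append(hidden)
    return np.array(states)

# Hypothetical sizes: 2 input features, 3 hidden units, a sequence of 4 steps.
rng = np.random.default_rng(1)
w_in = rng.normal(scale=0.5, size=(2, 3))
w_rec = rng.normal(scale=0.5, size=(3, 3))
bias = np.zeros(3)
sequence = rng.normal(size=(4, 2))
print(rnn_forward(sequence, w_in, w_rec, bias).shape)  # (4, 3)
```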
“Convolutional Neural Network” (CNN) is similar to the feedforward neural network in that
its neurons have learnable weights and biases. Its applications have been in signal and
image processing, and it is widely used in the field of computer vision.
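
The core operation of a convolutional layer, sliding one small set of weights over an
image, can be sketched as below. As is usual in CNN practice the kernel is not flipped, so
strictly speaking this is a cross-correlation; the function name conv2d_valid, the toy
image and the kernel are assumptions.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    k_h, k_w = kernel.shape
    out_h = image.shape[0] - k_h + 1
    out_w = image.shape[1] - k_w + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # The same kernel weights are reused at every image position.
            out[i, j] = np.sum(image[i:i + k_h, j:j + k_w] * kernel)
    return out

# Hypothetical 4x4 "image" with a vertical edge, and a 2x2 edge-detecting kernel.
image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]], dtype=float)
kernel = np.array([[-1, 1],
                   [-1, 1]], dtype=float)
print(conv2d_valid(image, kernel))
```

The feature map is non-zero only in the column where the window straddles the edge, which
is how a learned kernel comes to act as a local feature detector.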
Artificial Neural Network Architecture:

Constructing the Neural Network:


A four-layered, back-propagation-based neural network architecture was proposed. The input layer
consists of 3 nodes, which specify certain attributes of cell size that are critical in detecting tumours.
Hidden layer 1 and hidden layer 2 each consist of 4 nodes. The number of nodes in the hidden layers was
varied, and the most accurate prediction was observed with 4 nodes per hidden layer. A series of trials
had to be made to optimize the results.

Learning rate = 0.2

Number of training samples = 599

Number of testing samples = 100

Training iterations = 50
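
The configuration above can be sketched as a small NumPy network trained with back
propagation. This is a minimal sketch under several assumptions, not the project's actual
implementation: the output layer is taken to be a single sigmoid node, each of the 50
iterations is a full-batch gradient step, and the data are synthetic stand-ins with the
same 599/100 split, because the real dataset is not reproduced here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_network(sizes, rng):
    """One (weights, bias) pair per connection between consecutive layers."""
    return [(rng.normal(scale=0.5, size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(x, net):
    """Return the activations of every layer, input included."""
    acts = [x]
    for w, b in net:
        acts.append(sigmoid(acts[-1] @ w + b))
    return acts

def train_step(x, y, net, lr):
    """One full-batch gradient-descent update using back propagation."""
    acts = forward(x, net)
    # Output error for a sigmoid unit with cross-entropy loss.
    delta = acts[-1] - y
    new_net = list(net)
    for i in reversed(range(len(net))):
        w, b = net[i]
        grad_w = acts[i].T @ delta / len(x)
        grad_b = delta.mean(axis=0)
        if i > 0:
            # Propagate the error back through the sigmoid of the layer below.
            delta = (delta @ w.T) * acts[i] * (1.0 - acts[i])
        new_net[i] = (w - lr * grad_w, b - lr * grad_b)
    return new_net

def loss(x, y, net):
    """Mean binary cross-entropy of the network's predictions."""
    p = np.clip(forward(x, net)[-1], 1e-9, 1 - 1e-9)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

rng = np.random.default_rng(0)
# Synthetic stand-in data (assumption): 3 features, label 1 when their sum is positive.
x_all = rng.normal(size=(699, 3))
y_all = (x_all.sum(axis=1, keepdims=True) > 0).astype(float)
x_train, y_train = x_all[:599], y_all[:599]
x_test, y_test = x_all[599:], y_all[599:]

net = init_network([3, 4, 4, 1], rng)  # 3 inputs, two hidden layers of 4, 1 output
loss_before = loss(x_train, y_train, net)
for _ in range(50):                    # Training iterations = 50
    net = train_step(x_train, y_train, net, lr=0.2)  # Learning rate = 0.2
loss_after = loss(x_train, y_train, net)
test_acc = float(np.mean((forward(x_test, net)[-1] > 0.5) == y_test))
print(loss_before, loss_after, test_acc)
```

With only 50 full-batch steps the network may still be far from converged; varying the
number of hidden nodes or iterations, as described above, is a matter of changing the
sizes list and the loop count.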
PLATFORM

Jupyter Notebook:
The Jupyter Notebook is an incredibly powerful tool for interactively developing
and presenting data science projects. A notebook integrates code and its output into a
single document that combines visualisations, narrative text, mathematical equations,
and other rich media. The intuitive workflow promotes iterative and rapid development,
making notebooks an increasingly popular choice at the heart of contemporary data
science, analysis, and increasingly science at large. Best of all, as part of the open
source Project Jupyter, they are completely free.

NumPy:
NumPy (Numerical Python) is a Python library for numerical computing and linear
algebra. It is a very important library on which almost every data science or machine
learning Python package, such as SciPy (Scientific Python), Matplotlib (plotting
library) and scikit-learn, depends to a reasonable extent.

TensorFlow:
TensorFlow is a machine learning framework that is well suited to projects with a lot of
data and/or a need for the state of the art in AI: deep learning with large neural
networks. It is not a data science Swiss Army knife but an industrial lathe, which means
it is probably more than is needed if all you want to do is put a regression line through
a 20-by-2 spreadsheet.

TEST CASE:

1. Radius= [14,13]
Texture= [25,26]
2. Radius= [20,15]
Texture= [30,29]
CODES

Define the Neural Network:

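# Written against the TensorFlow 1.x Estimator API.
import tensorflow as tf

# feature_columns, hidden_units_spec, n_classes_spec and tmp_dir_spec
# are assumed to be defined earlier in the notebook.
classifier = tf.estimator.DNNClassifier(
    feature_columns=feature_columns,
    hidden_units=hidden_units_spec,
    n_classes=n_classes_spec,
    model_dir=tmp_dir_spec)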

Train the network:

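# Define the training input function (training_features is a pandas
# DataFrame and training_labels a pandas Series)
train_input_fn = tf.estimator.inputs.pandas_input_fn(
    x=training_features,
    y=training_labels,
    num_epochs=epochs_spec,
    shuffle=True)

# Train the classifier on the training data
classifier.train(input_fn=train_input_fn)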

Test the network:

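# Define the test input function
test_input_fn = tf.estimator.inputs.pandas_input_fn(
    x=testing_features,
    y=testing_labels,
    num_epochs=epochs_spec,
    shuffle=False)

# Evaluate the trained classifier; the result is a dict of metrics
# that includes "accuracy"
metrics = classifier.evaluate(input_fn=test_input_fn)
print("Test accuracy:", metrics["accuracy"])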
CONCLUSION

Cancer is one of the most dreadful diseases, and diagnosing it at an initial stage is very
important for proper treatment. Cancer data is a collection of thousands of genes, and the
DNA microarray is used to determine their expression levels. Analysis of microarray gene
expression data is very difficult because of its sparsity and sheer volume. Selecting the
informative genes among thousands of genes is a very challenging task. By analysing gene
expression data, heterogeneous cancers can be classified into their proper subgroups.
Nowadays, various machine learning and statistical approaches are used to classify tumour
cells accurately, such as support vector machines, k-nearest neighbour, decision trees and
neural network techniques. Recently, many researchers have shown interest in neural network
techniques for classifying cancer cells. The literature surveyed here indicates the
effectiveness of neural network technologies in the detection of cancer. Most of the neural
networks show very good results in classifying tumour cells accurately: the MLP (Multi-Layer
Perceptron) gives 97.1% accuracy, the PNN (Probabilistic Neural Network) 96%, the
Perceptron 93% and ART1 92%. Experimental results improve after missing values are removed
from the dataset, and the results of neural network structures can be enhanced by proper
setting of the network parameters. Although neural network techniques provide a good
classification rate, their training time is very high. Several researchers therefore
hybridize neural network techniques with optimization algorithms such as PSO to enhance
accuracy further. These optimization algorithms are used for dimensionality reduction: they
shrink the search space and therefore reduce the training time of the neural network. FLANN
alone shows 63.4% accuracy, whereas PSO-FLANN provides a good classification rate of
92.36%. In future studies, the accuracy of a neural network may be enhanced by increasing
the number of neurons in the hidden layer, and different training and learning rules can be
applied for training the ANN in order to improve the performance of the classifier.
FUTURE SCOPE

We propose a system that comprises the best combinations of genetic algorithms,
neural networks and fuzzy networks, as well as hybrid systems such as neuro-fuzzy
networks and GA-ANN systems. The primary aim is to compare the various methods for
detecting the early stages of breast cancer. The first phase involves performing
classification and drawing conclusions about which methods are more accurate on the
particular dataset and how economical each method is. The second phase takes this a
step further by checking whether combinations of the methods help to improve
prediction accuracy and economy.
REFERENCES

[1] Daoud Maisa and Michael Mayo (2019). Prediction of Breast Cancer using Neural
Network. Artificial Intelligence in Medicine, 0933-3657.
[2] Vazifehdan Mahin, Mohammad Hossein Moattar and Mehrdad Jalali (2019).
Hybrid Bayesian network and tensor factorization approach for missing value
imputation to improve breast cancer recurrence prediction. Journal of King Saud
University – Computer and Information Science, 31, 175-184.
[3] Agrawal Shikha and Jitendra Agrawal (2015). Neural Network Techniques for
Cancer Prediction. Procedia Computer Science, 60, 769-774.
[4] Asri Hiba, Hajar Mousannif, Hassan Al Moatassime and Thomas Noel (2016).
Using Machine Learning Algorithms for Breast Cancer Risk Prediction and
Diagnosis. Procedia Computer Science, 83, 1064-1069.
[5] Kumari Madhu and Vijendra Singh (2018). Breast Cancer Prediction System.
Procedia Computer Science, 132, 371-376.
[6] Kaur Prabhpreet, Gurvinder Singh and Parminder Kaur (2019). Intellectual
Detection and Validation of Automated Mammogram Breast Cancer Images by
Multi-Class SVM using Deep Learning Classification. Informatics in Medicine Unlocked,
18, 30181-3.
[7] Saxena Niharika and Meeta Kumar (2018). Clustering for breast cancer prognosis
and risk exposure. International Journal of Pure and Applied Mathematics, 118,
1314-3395.
[8] Vikas Chaurasia, Saurabh Pal and BB Tiwari (2018). Prediction of benign and
malignant breast cancer using data mining techniques. Journal of Algorithm and
Computational Technology, 12, 119-126.
[9] Said Attia Ahmed, Laila A. Abd-Elmegid, Sherif Kholeif and Ayman Abdelsamie
Gaber (2018). Classification based on Clustering Model for Predicting Main Outcomes
of Breast Cancer using Hyper-Parameters Optimization. International Journal of
Advanced Computer Science and Applications, 9, 12.
[10] Zhou Xiaobo, Kuang-Yu Liu and Stephen T.C. Wong (2004). Breast Cancer
classification and prediction using logistic regression with Bayesian gene selection.
Journal of Biomedical Science, 37, 249-259.
