
Breast Cancer Prediction

Using Data Mining

Department of Information Science


Team members
Under the guidance of -

Mrs Ramya.B.S

Presented by

Asif H [1NT15IS020]

Deivanai A [1NT15IS026]

Dhiraj M G [1NT15IS028]

Gautham N [1NT15IS032]
Introduction
● Breast cancer is cancer that forms in the cells of the breasts.
● It can occur in both men and women, but it’s far more common in
women.
● Data mining, the extraction of useful information and patterns from a large
pool of data, can be used to develop a tool that predicts the likelihood of
breast cancer in a patient.
● Here it is used to create a predictive model that can be accessed through a
website.
Motivation
● The incidence rate in India was found to be 25.8 per 100,000 women
● Studies show India still has a low breast cancer survival rate of 66%
● The major reason for the low survival rate in India is that awareness of
the cancer and its treatment is very low
● More than 90% of women diagnosed with breast cancer at the earliest
stage survive their disease for at least 5 years
● Early detection can help cure the disease before it becomes advanced
Objective
● Analyse the given dataset appropriately after removing null values
● Obtain a clean dataset free of null values, redundant values, etc.
● Use the following algorithms to predict the possibility of breast cancer
○ Support Vector Machine
○ Logistic Regression
○ Linear Regression
● Create a web application to predict the possibility of breast cancer with
the three algorithms
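A minimal sketch of the cleaning objective, using only the Python standard library. The column names (`radius`, `texture`, `diagnosis`), the sample values, and the set of markers treated as "null" are assumptions made for illustration; the slides do not specify the dataset's schema.

```python
# Hypothetical rows: column names and values are illustrative only.
RAW_ROWS = [
    {"radius": "14.2", "texture": "20.1", "diagnosis": "1"},
    {"radius": "",     "texture": "18.3", "diagnosis": "0"},  # missing value
    {"radius": "11.0", "texture": "17.5", "diagnosis": "0"},
    {"radius": "14.2", "texture": "20.1", "diagnosis": "1"},  # duplicate of first row
    {"radius": "12.5", "texture": None,   "diagnosis": "0"},  # missing value
]

# Markers treated as "null" in this sketch (an assumption).
MISSING = {None, "", "NA", "NaN", "null"}


def clean_rows(rows):
    """Drop rows with any missing field, then drop exact duplicates (first one kept)."""
    seen = set()
    cleaned = []
    for row in rows:
        if any(value in MISSING for value in row.values()):
            continue  # null value found: skip the row
        key = tuple(sorted(row.items()))
        if key in seen:
            continue  # redundant row: already kept an identical one
        seen.add(key)
        cleaned.append(row)
    return cleaned


CLEAN_ROWS = clean_rows(RAW_ROWS)
print(len(RAW_ROWS), "rows before cleaning,", len(CLEAN_ROWS), "after")
```

Dropping incomplete rows is only one option; filling missing values is another, and the slides do not say which was used.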
Problem Statement

Breast Cancer Prediction Using Data Mining and Machine
Learning: development of a web application that allows users
to get information regarding the likelihood of breast cancer,
based on readings obtained from medical tests.
Literature Survey
● Breast Cancer analysis and prediction has been done using various
algorithms
● SVM has shown the best accuracy amongst all
● Papers referred:
○ Study on prediction of breast cancer recurrence using data mining techniques by Uma
Ojha and Savita Goel (IEEE)
○ Breast Cancer severity degree prediction using data mining techniques by Mohammed H
Tafish and Alaa M El-Halees (IEEE)
○ Breast cancer Prediction using Data Mining Model by Haifeng Wang and Sang Won
Yoon
Design methodology
● Data mining: the use of sophisticated data analysis tools to discover
previously unknown, valid patterns and relationships in large data sets.
● Regression: regression algorithms predict output values based on input
features from the data fed into the system. Oracle defines regression as
a data mining function that predicts a number.
● Classification: a data mining function that assigns items in a collection
to target categories or classes. The goal of classification is to
accurately predict the target class for each case in the data.
● Linear Regression.
● Logistic Regression.
● Support Vector Machine.
Architectural Design

Fig1. Project Architecture


Workflow

Fig 2. Workflow
Linear Regression
● Linear regression is a kind of statistical analysis that attempts to show a
relationship between two variables.
● Linear regression looks at various data points and plots a trend line.
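The trend-line idea can be sketched as an ordinary least-squares fit on a single feature. The feature values and 0/1 labels below are invented for illustration, and turning the fitted line into a class label by thresholding at 0.5 is an assumption about how a regression output might be used for prediction; the slides do not describe that step.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical single feature and 0/1 labels (illustrative values only).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [0, 0, 0, 1, 1, 1]

slope, intercept = fit_line(xs, ys)


def predict_label(x, threshold=0.5):
    """Threshold the fitted line to get a class label (assumed decision rule)."""
    return 1 if slope * x + intercept >= threshold else 0


print([predict_label(x) for x in xs])
```

Note that the fitted line is not bounded to the range 0 to 1, which is one reason a classifier such as logistic regression is usually a better fit for a yes/no outcome.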

Fig3. Probability graph


Accuracy calculated

Accuracy : 77.07%

Fig4. Confusion matrix graph
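For reference, accuracy follows from the four confusion-matrix counts as (TP + TN) / (TP + TN + FP + FN). The sketch below shows the arithmetic on made-up labels; it does not reproduce the project's data or its reported figures.

```python
def confusion_counts(actual, predicted):
    """Return (tp, tn, fp, fn) for binary labels where 1 is the positive class."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            tp += 1
        elif a == 0 and p == 0:
            tn += 1
        elif a == 0 and p == 1:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def accuracy(actual, predicted):
    """Fraction of predictions that match the true label."""
    tp, tn, fp, fn = confusion_counts(actual, predicted)
    return (tp + tn) / (tp + tn + fp + fn)


# Made-up labels for illustration only.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

print(confusion_counts(actual, predicted))  # (3, 3, 1, 1)
print(accuracy(actual, predicted))          # 0.75
```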


Logistic regression
Logistic regression is a classification algorithm used when the response variable is
categorical. The idea is to find a relationship between the features and the probability
of a particular outcome.

E.g. when we have to predict whether a student passes or fails an exam, given the number
of hours spent studying as a feature, the response variable has two values: pass and fail.

This type of problem is referred to as Binomial Logistic Regression, where the
response variable has two values: 0 and 1, pass and fail, or true and false. Multinomial
Logistic Regression deals with situations where the response variable can have three
or more possible values.
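The pass/fail example can be sketched as a binomial logistic regression trained by plain gradient descent. The study hours, outcomes, learning rate, and epoch count are all invented for illustration, and a library implementation would normally be used instead of this hand-written loop.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit P(y=1 | x) = sigmoid(w * x + b) by gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical data: hours studied and whether the student passed (1) or failed (0).
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
passed = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

w, b = fit_logistic(hours, passed)


def pass_probability(h):
    """Estimated probability of passing after h hours of study."""
    return sigmoid(w * h + b)


print(round(pass_probability(1.0), 3), round(pass_probability(4.5), 3))
```

A prediction is then made by comparing the probability with a cut-off such as 0.5.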
Accuracy calculated

Fig4. Confusion matrix graph

Accuracy : 96.10%

Fig5. Confusion matrix graph


Support Vector Machine
Support vector machines are supervised learning models with associated
learning algorithms that analyze data used for classification and regression
analysis.

An SVM represents the examples as points in a space, mapped so that points
belonging to two separate categories are divided by a hyperplane. Support
vector machines can perform linear as well as non-linear classification and
regression.
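The separating-hyperplane idea can be sketched with a linear SVM trained by sub-gradient descent on the regularised hinge loss. This is a from-scratch illustration, not the project's implementation: the 2-D points, the hyperparameters, and the function names are assumptions, and the slides do not state which library or kernel was used.

```python
def train_linear_svm(points, labels, lr=0.01, reg=0.01, epochs=1000):
    """Learn w, b so that sign(w . x + b) separates the two classes (labels are -1 or +1)."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), y in zip(points, labels):
            margin = y * (w[0] * x1 + w[1] * x2 + b)
            if margin < 1:
                # Point is inside the margin or misclassified: hinge-loss step.
                w[0] += lr * (y * x1 - reg * w[0])
                w[1] += lr * (y * x2 - reg * w[1])
                b += lr * y
            else:
                # Point is safely classified: only apply the regularisation shrink.
                w[0] -= lr * reg * w[0]
                w[1] -= lr * reg * w[1]
    return w, b


# Hypothetical, linearly separable 2-D points (illustrative values only).
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (5.0, 5.0), (6.0, 5.5), (5.5, 6.5)]
labels = [-1, -1, -1, 1, 1, 1]

w, b = train_linear_svm(points, labels)


def classify(point):
    """Return +1 or -1 depending on which side of the hyperplane the point lies."""
    score = w[0] * point[0] + w[1] * point[1] + b
    return 1 if score >= 0 else -1


print([classify(p) for p in points])
```

Non-linear boundaries are usually obtained with a kernel function, which this linear sketch does not cover.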
Accuracy calculated

Accuracy : 98.04%

Fig6. Confusion matrix graph


Software and Hardware requirements
Software Requirements :

- Python
- Windows/Ubuntu
- Web browser

Hardware Requirements :

- i5 processor.
- 4 - 8GB RAM.
Results - Web application
Comparison of the algorithms - Time taken
Comparison of the algorithms - Accuracy
Test case
Result for Support vector machine
Result for Logistic regression
Result for Linear regression
References
● [1] Uma Ojha, Savita Goel, A study on prediction of Breast Cancer
recurrence using Data Mining Techniques
● [2] Mohammed H. Tafish, Dr. Alaa M. El-Halees, Breast Cancer Severity
Prediction Using Data Mining Techniques in the Gaza Strip.
● [3] Jabeena Sultana, Abdul Khader Jilani, Predicting Breast Cancer Using
Logistic Regression and Multi-Class Classifiers.
● [6] https://www.tutorialspoint.com/flask
● [7] https://towardsdatascience.com
● [8] https://flask.pocoo.org
THANK YOU
