International Journal of Trend in Scientific Research and Development (IJTSRD)

International Open Access Journal | www.ijtsrd.com

ISSN No: 2456 - 6470 | Volume - 2 | Issue – 6 | Sep – Oct 2018

A Survey on Various Disease Prediction Techniques
C. Leancy Jannet¹, G. V. Sumalatha²
¹Student, ²Assistant Professor
Department of CT, Sri Krishna College of Arts and Science, Coimbatore

ABSTRACT
Various diseases can be predicted using data mining and text mining techniques, and in this article we discuss six such prediction techniques. Using gene expression patterns, disease outcome is predicted and a pathway-based approach is implemented that classifies diseases using hyper-box principles. We also present a novel hybrid prediction model with missing value imputation (HPM-MI), which analyzes imputation techniques using simple k-means clustering. A technique based on CCAR (Constraint Class Association Rules) has been used to reduce the time needed to predict a particular disease. We discuss text mining techniques and their applications. Another technique studied predicts hypertriglyceridemia from anthropometric measures, whose predictive value differs according to age and gender. Finally, multilayer classifiers for disease prediction can achieve high diagnostic accuracy and high performance.

Keywords: Prediction, Genes, Data Mining, Text Mining, Hypertriglyceridemia, Missing Values, HMV and Classifiers

1. INTRODUCTION
In this article we discuss six techniques for predicting disease. In the first technique, diseases are predicted from gene expression patterns. A feature selection algorithm is used to select distinctive genes. Two approaches are used for classification: network-based approaches and a pathway-based approach, in which diseases are classified through a hyper-box representation [1]. The second technique predicts disease using missing value imputation (MVI). Eleven missing data imputation techniques are first evaluated experimentally, and the best method for handling missing values in a dataset is then found using k-means clustering [2]. The third prediction technique mines CARs (class association rules), which capture relationships between item sets and class labels. Item set constraints reduce the number of CARs, shrink the search space and improve performance. The method first builds a tree structure for efficient mining, then uses two theorems to prune infrequent item sets early, and finally provides an efficient and fast algorithm for mining CARs. The pre-processing and post-processing approaches are also considered [3]. The fourth technique is text mining, which helps researchers assess the scientific literature. Information can be extracted using co-occurrence based and NLP-based methods, and several text mining tools are discussed [4]. The fifth technique predicts hypertriglyceridemia from anthropometric measures using data mining. Many diseases can be predicted from changes in hypertriglyceridemia, whose best anthropometric predictors vary according to age and gender [6]. The sixth and final technique is a multilayer classifier for disease prediction. A large number of prediction models can be built with data mining techniques, and here the concept of machine learning is extended. Machine learning methods are further classified into supervised and unsupervised learning.

2. RELATED WORK
2.1. Pathway Based Approach:
Diseases can be predicted through gene expression. Gene expression samples are collected as specimens of different disease states in order to understand the disease phenotype, and the measured gene expression is used to estimate disease outcome. A feature selection algorithm picks a set of genes that are differentially expressed. In network-based approaches, disease-module-based methods assume that all cellular elements belonging to the same topological, functional or disease module have an increased chance of being involved in the same disease. A greedy search is executed over a protein interaction network (PIN) to pinpoint gene modules with the highest mean expression.
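The greedy module search over a PIN can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions. It is not the implementation used in [1] or in any published network-based method. The toy network and expression values are invented. The module score is an assumption: the absolute gap between mean module expression in disease samples and in control samples. The helper names `module_activity`, `module_score` and `greedy_module` are hypothetical.

```python
from statistics import mean

def module_activity(expr, module, sample):
    # Mean expression of the module's genes in a single sample.
    return mean(expr[gene][sample] for gene in module)

def module_score(expr, module, cases, controls):
    # Assumed (hypothetical) score: absolute gap between the average module
    # activity in disease (case) samples and in control samples.
    case_act = mean(module_activity(expr, module, s) for s in cases)
    ctrl_act = mean(module_activity(expr, module, s) for s in controls)
    return abs(case_act - ctrl_act)

def greedy_module(graph, expr, seed, cases, controls, max_size=10):
    # Grow a module from a seed gene, adding one network neighbour at a time
    # for as long as the score keeps improving.
    module = {seed}
    best = module_score(expr, module, cases, controls)
    while len(module) < max_size:
        # Neighbours of the current module in the interaction network.
        frontier = sorted({n for g in module for n in graph.get(g, []) if n not in module})
        if not frontier:
            break
        # max() returns the first best candidate, so ties break alphabetically.
        candidate = max(frontier, key=lambda n: module_score(expr, module | {n}, cases, controls))
        score = module_score(expr, module | {candidate}, cases, controls)
        if score <= best:
            break  # no neighbour improves the module any further
        module.add(candidate)
        best = score
    return module, best

# Hypothetical toy network (adjacency lists) and expression values per sample.
graph = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A"], "D": ["B"]}
expr = {
    "A": {"s1": 5, "s2": 5, "s3": 1, "s4": 1},
    "B": {"s1": 6, "s2": 6, "s3": 1, "s4": 1},
    "C": {"s1": 1, "s2": 1, "s3": 1, "s4": 1},
    "D": {"s1": 9, "s2": 9, "s3": 1, "s4": 1},
}
module, score = greedy_module(graph, expr, "A", cases=["s1", "s2"], controls=["s3", "s4"])
print(sorted(module), round(score, 3))  # ['A', 'B', 'D'] 5.667
```

Sorting the frontier keeps tie-breaking reproducible, because the iteration order of a Python set of strings can change between runs. The search stops as soon as no neighbouring gene improves the score or `max_size` is reached.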

@ IJTSRD | Available Online @ www.ijtsrd.com | Volume – 2 | Issue – 6 | Sep-Oct 2018 Page: 734
PIN data is usually unreliable and noisy, and it also has a high false positive rate. Pathway-based approaches instead rely on biological pathways, which are organised groups within the molecular interaction network. A pathway-level disease classification approach based on hyper-box principles works as follows. Given a microarray gene expression profile and a number of biological pathways/gene sets, the classification accuracy of each pathway/gene set is assessed using only the member genes of that pathway. On psoriasis and breast cancer datasets, prediction accuracies exceeded 85%, which is superior to randomly selected sets of genes. Hyper-box disease classification (HyperDC) classifies disease samples accurately compared with other prominent classification methods in terms of SNR classification rates, and HyperDC shows high SNR. The real strength of HyperDC lies in its illustrative power, and its main advantage is its flexibility [1].

2.2. Hybrid Prediction Model with Missing Value Imputation:
Accurate prediction on a dataset with a large number of missing values is a difficult task. To face this problem, many hybrid models simply remove instances with missing values from the dataset, which is commonly known as case deletion. The Hybrid Prediction Model with Missing Value Imputation (HPM-MI) instead examines different imputation techniques by applying simple k-means clustering, and applies the best one to the dataset. Missing values arise for many reasons, such as mistakes in manual data entry, equipment faults or inaccurate measurements. They can cause many problems, such as loss of efficiency and complications in handling and analysing data, and they may also lead to biased decisions. Eleven missing value imputation techniques were analysed:

- case deletion
- Most Common (MC)
- Concept Most Common (CMC)
- K-Nearest Neighbor Imputation (KNNI)
- Weighted Imputation with K-Nearest Neighbor (WKNN)
- K-Means Clustering Imputation (KMI)
- Imputation with Fuzzy K-Means Clustering (FKMI)
- Support Vector Machines Imputation (SVMI)
- Singular Value Decomposition Imputation (SVDI)
- Local Least Squares Imputation (LLSI)
- Matrix Factorisation

The best MVI method is selected according to the accuracy it attains, with clustering used as the basis for this selection. K-means clustering separates a dataset into groups so that instances in one group are similar to each other and as dissimilar as possible from the instances in other groups. The CMC method gave fewer wrongly classified instances on both the diabetes and the hepatitis datasets, while case deletion performed best on the hepatitis dataset [2].

2.3. CCAR: With Item Set Constraints:
Class Association Rules (CARs) are used to build a classification model for prediction and to describe associations between item sets and class labels. Three major approaches have been proposed for mining association rules with item set constraints:

- The post-processing method first finds frequent item sets using the Apriori algorithm.
- The pre-processing method filters out the records that do not satisfy the item set constraint.
- Constrained item set filtering integrates the item set constraints into the original mining process, so that only frequent item sets satisfying the constraints need to be generated.

The post-processing approach uses a CAR-mining algorithm. This strategy is not very efficient because all CARs must be generated, and a large number of candidate CARs frequently need to be tested. The advantage of the pre-processing approach is that the filtered dataset is much smaller than the original dataset, so mining time can be notably reduced. The authors showed that this approach outperforms the post-processing method. In the proposed CCAR method, the item set constraint is pushed as deeply "inside" the computation as possible. Instead of generating all rules, only rules that satisfy the item set constraints are formed, which speeds up the process. The proposed algorithm for mining CARs uses a tree structure that includes only nodes containing the constrained item set, together with two theorems for pruning infrequent item sets early [3].

2.4. TEXT MINING:
One of the dominant entry points to the scientific literature for biomedical research is PubMed, which gives access to more than 24 million scientific articles. Retrieving relevant information from literature databases and integrating it with experimental results is time-consuming and requires careful attention, which is why text mining was introduced. Text mining addresses many research questions, ranging from the discovery of drug targets to drug repositioning. Marti Hearst defines text mining as "the discovery by computer of new, previously unknown information from different written resources, to reveal otherwise 'hidden' meanings". The first and foremost step in text mining is to retrieve suitable textual resources for a given subject of interest.
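This retrieval step can be illustrated with a toy keyword search in Python. It is a minimal sketch and does not reflect how PubMed or any tool cited in [4] or [5] works. The three-document corpus, the tokenizer and the ranking rule are all illustrative assumptions, and the helper names `tokenize` and `retrieve` are hypothetical. Under this rule, a document must contain every query term, and matching documents are ordered by total term frequency.

```python
import re
from collections import Counter

def tokenize(text):
    # Lower-case the text and split it into alphanumeric tokens.
    return re.findall(r"[a-z0-9]+", text.lower())

def retrieve(corpus, query, top_k=3):
    # Keep documents that contain every query term (Boolean AND), ranked by
    # how often the terms occur; ties are broken by document id.
    terms = tokenize(query)
    hits = []
    for doc_id, text in corpus.items():
        counts = Counter(tokenize(text))
        if all(counts[t] > 0 for t in terms):
            hits.append((sum(counts[t] for t in terms), doc_id))
    hits.sort(key=lambda hit: (-hit[0], hit[1]))
    return [doc_id for _, doc_id in hits[:top_k]]

# Hypothetical mini-corpus standing in for a literature database.
corpus = {
    "d1": "BRCA1 mutations raise breast cancer risk; BRCA1 is a DNA repair gene.",
    "d2": "Gene expression profiles in psoriasis.",
    "d3": "Breast cancer screening guidelines.",
}
print(retrieve(corpus, "BRCA1 breast cancer"))  # ['d1']
print(retrieve(corpus, "breast cancer"))        # ['d1', 'd3']
```

Requiring every term is a strict Boolean match. Ranking by raw term counts favours longer documents, so practical systems usually apply a weighting scheme such as TF-IDF or BM25.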

This process is called information retrieval (IR). After IR, search algorithms can examine the resulting document set for occurrences of particular keywords of interest. For example, a particular gene should be recognized in the text not only by its gene symbol but also by its synonyms and previous names; this is called named entity recognition (NER). After IR and NER, further algorithms can be used to discover links between concepts in the text. Currently, the most widely used approaches for extracting information from text are co-occurrence based methods and natural language processing (NLP) based methods. Of the two, NLP-based methods have more advantages. Applications of text mining to biomedical problems include genome and gene expression annotation, drug repositioning, adverse events, electronic health records and domain-specific databases [4].

Proteins are molecules that facilitate most biological processes in a cell. Some of the text mining tools are BioRAT (Biological Research Assistant for Text Mining), eFIP (Extracting Functional Impact of Phosphorylation), FACTA+ (Finding Associated Concepts with Text Analysis), GeneWays, HitPredict, In Print, I2D (Interologous Interaction Database), iHOP (Information Hyperlinked Over Proteins), IMID (Integrated Molecular Interaction Database), Negatome, OpenDMAP, PCorral (Protein Corral), PIE the search, PolySearch, PPIExtractor, PPIfinder, PPLook and STRING [5].

2.5. Hypertriglyceridemia From Anthropometric Measures:
Which anthropometric measure of body shape best predicts hypertriglyceridemia remains a matter of debate. A total of 5517 subjects (3675 women and 1842 men) aged 20-90 years participated in this cross-sectional study. The circumferences of 8 specific sites were measured with each subject standing, using a flexible non-elastic tape. BMI was calculated as weight in kilograms divided by the square of height in meters. The circumference measurements were Forehead Circumference (FC), Neck Circumference (NC), Axillary Circumference (AC), Chest Circumference (CC), Pelvic Circumference (PC), Rib Circumference (RC), Waist Circumference (WC) and Hip Circumference (HC). Male and female data were analysed separately because age-related changes in body shape may differ by sex. Waist to Hip Ratio (WHR) was the strongest predictor of hypertriglyceridemia in Indian men. WHtR (Waist to Height Ratio) was the best predictor of hypertriglyceridemia in women, and the Rib to Forehead Circumference Ratio (RFCR) was the best predictor in men. By age group, however, the best predictors in women were rib circumference for 20-50 years and WHtR for 51-90 years. In men they were RFCR for 20-50 years and BMI for 51-90 years. The best predictor of hypertriglyceridemia therefore varies according to gender and age [6].

2.6. HMV: Medical Decision Support Framework:
Decision Support Systems (DSS) help decision makers collect and interpret information and build a foundation for decision making. Medical DSS play an important role in medicine by guiding doctors in clinical decisions. Data mining in the medical area is the process of uncovering hidden patterns and information in huge medical datasets, examining them and using them for disease prediction. The proposed HMV framework overcomes the limitations of conventional approaches by using a group of seven heterogeneous classifiers. HMV removes noise from the medical dataset using a clustering approach. Selecting the optimal set of classifiers is a crucial step: to achieve high quality, HMV satisfies two conditions, accuracy and prediction diversity. The HMV ensemble framework was evaluated on two heart disease datasets, two diabetes datasets, two liver disease datasets, one hepatitis dataset and one Parkinson's disease dataset. On all of these datasets, HMV produced the highest accuracy compared with other classifiers. A multilayer classifier is introduced to enhance prediction, and the predictions made by the proposed DSS are comparable to those made by a panel of doctors. The heterogeneous classifier ensemble combines different types of classifiers and achieves a high level of diversity. Naive Bayes (NB) is a probabilistic classifier that shows high prediction and classification accuracy. Eager evaluation methods include decision trees, QDA (Quadratic Discriminant Analysis), LR (Linear Regression), SVM and Bayesian classification, while KNN is a lazy evaluation method. Combining lazy and eager evaluation algorithms (a hybrid approach) overcomes the limitations of both. Eager methods may suffer from the missing rule problem when no matching rule exists, in which case a default prediction is adopted. The proposed framework is constructed on three modules. The first module handles data acquisition and preprocessing, gathering data from different data
warehouses and preprocessing them. In the second module, individual classifiers are trained on the training set and then used to predict unknown class labels for test instances. The third module performs prediction and evaluation of the proposed ensemble framework. The framework was also applied to real-time blood datasets taken from PIMS hospital to distinguish healthy from diseased patients, and the results again showed higher disease prediction accuracy. It also guides practitioners and patients in predicting disease based on its symptoms [7].

3. COMPARATIVE STUDY:

Prediction                         | Advantage     | Disadvantage
1st prediction (pathway based)     | High accuracy | Time consuming
2nd prediction (HPM-MI)            | Accuracy      | Imbalanced classification
3rd prediction (CCAR)              | Efficiency    | Duplicate content
4th prediction (text mining)       | Efficient     | Uncertain
5th prediction (hypertriglyceridemia) | Simple     | No cause-effect relationship
6th prediction (HMV)               | High accuracy | Inflexible

Table 1: Comparative study on various predictions

4. CONCLUSION:
Accuracy plays an essential role in the medical field, as it is related to the life of an individual. Of the six prediction techniques analysed, HMV, a medical DSS using multilayer classifiers, is considered the best because it has higher prediction accuracy and also helps guide practitioners in predicting disease.

REFERENCES:
1. Lingjian Yang, Chrysanthi Ainali, Aristotelis Kittas, Frank O. Nestle, Lazaros G. Papageorgiou, Sophia Tsoka, "Pathway level disease data mining through hyper-box principles", JID: MBS (November 2014).
2. Archana Purwar, Sandeep Kumar Singh, "Hybrid prediction model with missing value imputation for medical data", Expert Systems with Applications (2015).
3. Dang Nguyen, Bay Vo, Bac Le, "CCAR: An efficient method for mining class association rules with itemset constraints", Engineering Applications of Artificial Intelligence 37 (2015) 115–124.
4. Wilco W. M. Fleuren, Wynand Alkema, "Application of text mining in the biomedical domain", Methods 74 (2015) 97–106.
5. Nikolas Papanikolaou, Georgios A. Pavlopoulos, Theodosios Theodosiou, Ioannis Iliopoulos, "Protein–protein interaction predictions using text mining methods", Methods xxx (2014) xxx–xxx.
6. Bum Ju Lee, Jong Yeol Kim, "Indicators of hypertriglyceridemia from anthropometric measures based on data mining", Computers in Biology and Medicine 57 (2015) 201–211.
7. Saba Bashir, Usman Qamar, Farhan Hassan Khan, Lubna Naseem, "HMV: A medical decision support framework using multi-layer classifiers for disease prediction", Journal of Computational Science (2016).
