
A SURVEY ON PREDICTION IN HEALTH DATABASE

J.SUMITHA

Abstract:

Health can be affected by environmental factors such as pollution, by behaviors such as
smoking and drinking, by diet such as fast food, and by medicines such as cough pills,
headache tablets and painkillers, which can in turn have adverse effects on the body. This
paper surveys work in which health and the factors affecting it are predicted using Bayesian
networks. The study of health databases and these factors may help raise awareness of them
in the future. The surveyed work uses various classification techniques, together with
prediction, to classify health problems and the factors affecting them. The paper also
reviews recently used methods for determining health and the factors that affect it.

Keywords: Bayesian network, C4.5, CART, Drug safety, Health, Induction learning
algorithm, Prediction, Renal transplantation center.

1. INTRODUCTION

Health and the factors affecting it can be analyzed with many algorithms, including Bayesian
networks, which are used in many applications. This survey reviews a number of experiments
that apply Bayesian networks, together with prediction, to health and the factors affecting
it. In one of them, an automatic learning method for Bayesian networks is combined with
decision trees to predict the distribution of cancer, cardiological disease and dengue, based
on the characteristics of each disease [1]. Health can also be affected by changes occurring
in coastal and river areas. Increased nitrogen inputs to a river promote toxic
micro-organisms on the one hand and raise the carbon content on the other. Together these
reduce the oxygen content of the water, which in turn causes disease and even death among
fish. Humans who catch and eat the diseased fish can also be affected. The numbers of
diseased and killed fish caused by these rich nitrogen inputs within a particular year can be
predicted [3]. The coastal area most affected by these chemical compositions is classified
and predicted using recent methodologies [4]. The probability that patients with certain
diseases are registered at a transplantation center is classified and predicted using
prediction algorithms. Decision trees and a Bayesian network are used to predict the
registration of patients at a transplantation center [5].

In the past decade, a new methodology has been used to detect problems with adverse drug
reactions (ADRs) in China: an automatic algorithm incorporated in Bayesian networks predicts
the adverse events for the drugs concerned [7]. Statistical methods combined with a Bayesian
approach have recently been used to detect adverse drug events as early as possible. The
self-controlled case series (SCCS) method, used together with a Bayesian network, estimates
the relative incidence of adverse events of a vaccine for safety monitoring in
pharmacovigilance [7]. Patients taking multiple drugs are considered, and the corresponding
drug reactions are recorded in order to predict the reactions to a particular drug [7].

The classification algorithms considered here are CART decision trees, the C4.5 algorithm and
Bayesian networks. Apart from these classification algorithms, much other work has been done
to classify health and the factors affecting it. The other methodologies covered in this
survey are induction learning techniques [1] and the self-controlled case series (SCCS)
method [8].

2. HEALTH DATABASE

Several health databases are surveyed here. The cancer database can be predicted on the
basis of tumor characteristics [1], the cardiological database on the basis of the
respective symptoms, and the dengue database on the basis of environmental characteristics.
A combination of Bayesian networks and the C4.5 algorithm is applied to predict the
distribution of these diseases in a locality or an area.

The Spontaneous Reporting System (SRS) is a health database comprising reports of suspected
adverse drug reactions together with a remedy [7]. Individual records in the SRS database
consist of age, sex, date of report, drugs and their adverse events. The advantage of the SRS
database is that pharmaceutical companies, health authorities and drug monitoring centers
use it for global safety screening.

The drawback of the SRS database is that it contains only reports of adverse effects and
cannot determine the number of individuals consuming a particular drug.

The renal transplantation database is divided into two sets: a training set (90%) and a
validation set (10%) [5]. A chi-square test is used to check that the two data sets have
comparable characteristics. The predictive performance of the CART and Bayesian network
algorithms is evaluated on the validation set. The advantage reported for this database is
that the McNemar test gives a more accurate performance comparison than other tests.
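
The McNemar comparison of two classifiers on a shared validation set can be sketched as
follows. This is a minimal illustration of the standard test with continuity correction, not
the procedure of [5]; the function name and the toy data in the usage note are assumptions
made for the example.

```python
import math

def mcnemar(y_true, pred_a, pred_b):
    """McNemar test (with continuity correction) for two classifiers
    evaluated on the same validation cases.

    Returns (statistic, p_value). Only discordant cases matter:
    b = cases classifier A got right and B got wrong,
    c = cases classifier A got wrong and B got right.
    """
    b = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a == t and p != t)
    c = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a != t and p == t)
    if b + c == 0:
        # The classifiers never disagree, so there is no evidence of a difference.
        return 0.0, 1.0
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```

For example, if classifier A is right on all ten validation cases and classifier B is wrong
on six of them, `mcnemar([1] * 10, [1] * 10, [0] * 6 + [1] * 4)` gives a statistic of 25/6
and a p-value slightly above 0.04.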

3. ALGORITHMS

Many algorithms are surveyed for predicting health and the factors affecting it. Some of
them are used as prediction tools for determining predictive performance on the health
databases.

3.1 C4.5

C4.5 is one of the simplest classification methods and is used in a number of induction
learning methods. Its advantage in this survey is that it gives more accurate predictive
performance for the distribution of the cancer, cardiology and dengue databases in a
locality or an area [1].

The C4.5 analysis consists of four steps, as follows:

Step 1: The health database is divided into (i) a control database and (ii) a validation
database.
Step 2: The C4.5 algorithm is applied to the control database.
Step 3: The same process is repeated for 10%, 20%, ... up to 100% of the control database,
with 30 repetitions at each iteration.
Step 4: The predictive performance of both C4.5 and the Bayesian network is evaluated on the
validation database.

The drawback of C4.5 is that its classification precision is only 10% better than that of
the other network; only a small difference is observed.
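
C4.5 is generally described as choosing each split by gain ratio, that is, information gain
divided by the entropy of the split itself. The sketch below shows that criterion for a
single categorical attribute. It is an illustration only, not the implementation used in
[1], and the attribute values in the usage note are hypothetical.

```python
import math
from collections import Counter

def entropy(items):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(items)
    return -sum((k / n) * math.log2(k / n) for k in Counter(items).values())

def gain_ratio(values, labels):
    """Gain ratio of one categorical attribute with respect to the class labels."""
    n = len(labels)
    # Group the class labels by attribute value.
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Expected entropy of the labels after splitting on the attribute.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    # Split information penalizes attributes with many distinct values.
    split_info = entropy(values)
    return gain / split_info if split_info > 0 else 0.0
```

With a hypothetical symptom attribute, `gain_ratio(["yes", "yes", "no", "no"], [1, 1, 0, 0])`
is 1.0 because the attribute separates the classes perfectly, while an attribute unrelated
to the class scores 0.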

3.2 CART
CART is one of the most popular classification methods in many application fields
especially in medicine. The main advantages that make CART so popular are its simplicity
and its interpretability.

CART analysis consists of five steps in this survey [5], as follows:
Step 1: The dataset is first split into a number of homogeneous classes.
Step 2: The root node is then split into a number of child nodes. A split criterion is used
to split the root node into child nodes and to select the best classifier sample node.
Step 3: Splitting continues on the child nodes until the stopping criteria are met. The
CART method uses binary recursive partitioning for the analysis.
Step 4: The Gini split criterion is used until the stopping criterion, a maximum of 5 tree
nodes, is reached.
Step 5: Finally, the predictive performance is measured.
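
The Gini criterion and a single binary split from the steps above can be sketched as follows.
This is a minimal illustration for one numeric attribute, assuming splits of the form
x <= threshold; it is not the configuration used in [5], and the ages in the usage note are
hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((k / n) ** 2 for k in Counter(labels).values())

def best_binary_split(xs, labels):
    """Return (threshold, weighted_gini) for the best split x <= threshold.

    If no split improves on the unsplit node, the threshold is None.
    """
    n = len(labels)
    best = (None, gini(labels))
    # Skip the largest value: splitting there would leave the right child empty.
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, labels) if x <= t]
        right = [y for x, y in zip(xs, labels) if x > t]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best[1]:
            best = (t, score)
    return best
```

For hypothetical patient ages `[20, 30, 40, 50, 60, 70]` with labels `[1, 1, 1, 0, 0, 0]`,
the best split is at 40 with a weighted impurity of 0.0. A full CART tree applies this search
recursively to each child node until a stopping criterion is reached.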

The drawback of CART noted in this survey is that the complexity underlying a CART model is
captured more easily by Bayesian networks, which do so by selecting a few discriminating
factors for performance evaluation.

4. RELATED WORKS

Pablo Felgaer et al. [1] combined an automatic learning method for Bayesian networks with
decision trees to predict the distribution of cancer, dengue and cardiological diseases in a
locality or an area. The predictive performance is based on the characteristics of the
corresponding diseases. They reported that C4.5 gives better prediction than the Bayesian
networks and that C4.5 is the better-suited algorithm for inductive machine learning methods.

Mark E. Borsuk et al. [3] proposed a Bayesian network for determining how an abundance of
nitrogen inputs depletes the oxygen supply in water, causing fish to die or become diseased.
Humans who eat these fish are also affected. The work analyzes the factors affecting a river
that in turn become harmful to human health.

Sahar Bayat et al. [5] estimated the probability of registration at a transplantation center
for patients with certain diseases. Two models are built for this purpose: a Bayesian network
model and a CART model. Strong discriminating factors are selected for predicting
performance. The predictive performance is evaluated on the validation set and compared
using the McNemar test.
Figure: Bayesian network model (adapted from [5]).

Chen Wen-ge [7] formulated a method for detecting adverse drug reactions, using the SRS
database to identify drugs and their corresponding side effects. Therapy is also proposed
along with the drugs and their side effects. The proportional ADR reporting ratio (PRR) and
the reporting odds ratio (ROR) are used as performance measures and for observing the data.
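
Both measures are computed from a 2x2 table of report counts. The sketch below uses the
usual definitions, with a, b, c and d as assumed names for the four cells; the counts in the
usage note are invented for illustration and do not come from [7].

```python
def prr(a, b, c, d):
    """Proportional reporting ratio.

    a: reports with the drug of interest and the event of interest
    b: reports with the drug of interest and any other event
    c: reports with other drugs and the event of interest
    d: reports with other drugs and any other event
    """
    return (a / (a + b)) / (c / (c + d))

def ror(a, b, c, d):
    """Reporting odds ratio for the same 2x2 table."""
    return (a * d) / (b * c)
```

With hypothetical counts a=20, b=80, c=10, d=890, the PRR is 18 and the ROR is 22.25. Values
well above 1 indicate that the event is reported disproportionately often for the drug.
Neither function guards against empty cells, which a real screening tool would need to
handle.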

David Madigan [7], in his review, used statistical methods to detect adverse drug events
earlier. A Bayesian approach is used to handle the high dimensionality of the data, and the
Bayesian confidence propagation neural network (BCPNN) algorithm is used to detect adverse
events earlier.
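
The quantity at the core of BCPNN signal detection is the information component (IC), the
base-2 logarithm of the observed to expected number of reports for a drug and event pair. The
sketch below computes only the plain point estimate from a 2x2 table; it deliberately leaves
out the Bayesian priors and credibility intervals that the full BCPNN method adds, and the
cell names a, b, c and d are assumptions made for the example.

```python
import math

def information_component(a, b, c, d):
    """Point estimate of the information component for a drug-event pair.

    a: reports with the drug and the event
    b: reports with the drug and other events
    c: reports with other drugs and the event
    d: reports with other drugs and other events
    """
    n = a + b + c + d
    # Reports expected for the pair if drug and event were independent.
    expected = (a + b) * (a + c) / n
    return math.log2(a / expected)
```

An IC of 0 means the pair is reported exactly as often as independence would predict, and a
positive IC means it is reported more often than expected.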

Farrington [8] proposed the self-controlled case series method to estimate the incidence of
side effects in humans. The work covers the side effects in people taking multiple drugs at
the same time and in people frequently taking a single drug at a high dose [8]. The relative
incidence is estimated and used for drug safety evaluation [8].

5. DISCUSSIONS

This survey makes clear that the Bayesian network is found to be a better network for
predicting health and the factors affecting it than the other networks that can be used for
prediction. However, for predicting the distribution of diseases [1] and the composition of
the rich nitrogen supply in river flow [3], it is suggested that the recently used advanced
Markov model can give better prediction than Bayesian networks. Because it is an advanced,
recently used probabilistic algorithm, it shows better prediction in various applications,
for instance in medicine. This survey also suggests that the Markov model, whether applied
to predicting the registration of patients at a transplantation center [5] or to detecting
the ADRs of the corresponding drugs [7], gives more effective results than the Bayesian
network.

6. CONCLUSION

This paper surveys the use of Bayesian networks for the factors affecting health and
discusses the significant sectors in which prediction is used along with them. Bayesian
networks are used to estimate the distribution of disease accurately and, in the same way,
to derive accurate results for the abundance of nitrogen input in rivers [3]. The
self-controlled case series method used for drug safety in pharmacovigilance is also
predicted using Bayesian networks [8]. To analyze the existing problems, an automatic
algorithm based on the concept of disproportionality is used to predict the result, and the
BCPNN algorithm is used for adverse signal detection in ADRs [7]. In future, Bayesian
networks are expected to advance further and to be used for estimating many existing
problems, not only in health but in various other areas. The Markov model is a probabilistic
model and one of the advanced models among classification methods. This paper suggests that
the Markov model can be implemented along with Bayesian networks for better prediction. The
Markov model is recognized as a more advanced and more effective model in recent times and
is suggested as highly effective for prediction in many application fields, especially
medicine. This survey concludes that the Markov model has an effective future in predicting
medicines and their effects on health.

REFERENCES

[1] Pablo Felgaer, Paola Britos, Ramon Garcia Martinez, Prediction in health domain using
Bayesian networks optimization based on induction learning techniques, International
Journal of Modern Physics C, Vol. 17, No. 3 (2006) 447–455, World Scientific
Publishing Company.

[2] K. J. Ezawa and T. Schuermann, Fraud/uncollectible debt detection using a Bayesian
network based learning system: A rare binary outcome with mixed data structures,
Proc. Conf. Uncertainty Arti. Intell. (San Francisco, CA, 1995), p. 157.

[3] Mark E. Borsuk, Craig A. Stow, and Kenneth H. Reckhow, Integrative environmental
prediction using Bayesian networks: A synthesis of models describing estuarine
eutrophication, Nicholas School of the Environment and Earth Sciences, Duke
University, Durham, North Carolina, USA (2002).

[4] Borsuk, M. E., L. A. Eby, and L. B. Crowder. 2002b. Probabilistic prediction of fish
health and fish kills in the Neuse River estuary using the elicited judgment of
scientific experts. In review.

[5] Sahar Bayat, Marc Cuggia, Delphine Rossille, Michèle Kessler, Luc Frimat, Comparison
of Bayesian Network and Decision Tree Methods for Predicting Access to the Renal
Transplant Waiting List, Medical Informatics in a United and Healthy Europe, K.-P.
Adlassnig et al. (Eds.), IOS Press, 2009, © 2009 European Federation for Medical
Informatics.

[6] Bayat, S. et al. (2008) Modeling access to renal transplantation waiting list in a
French healthcare network using Bayesian method. Studies in Health Technology and
Informatics 136:605–610.

[7] Wen-ge, C. and Jian-xiong, D., A Study on Signal Detection and Automatic Warning
Algorithm for Adverse Drug Reaction, 2008 International Conference on Computer Science
and Software Engineering, 12-14 Dec. 2008, pp. 202–205.

[8] Farrington, P. (1995). Relative incidence estimation from case series for vaccine
safety evaluation. Biometrics 51, 228-235.
