I. INTRODUCTION
Modern medicine generates enormous quantities of
heterogeneous data almost every day. Today, the
biggest challenge is to transform these huge quantities of
data into useful information and knowledge. Medical data
may contain images (MRI), signals (ECG), clinical
information such as temperature or cholesterol level, as
well as the doctor's interpretation. More and more medical
procedures use medical imaging as a preferred diagnostic
instrument, so efficient methods for exploiting image
databases need to be developed.
Analyzing patient data makes it possible to build
the profile of a patient suffering from a certain disease,
to improve diagnosis [1], and to determine the relationships
that exist between certain medical parameters in order to
make medical predictions. Data mining techniques have been
successfully applied to human brain imaging and to genetics
(for example, to predict protein structures, i.e. to determine
the 3D structure of a protein from its amino-acid sequence).
In genetics, data mining techniques are also applied to
discover correlations between variations in the DNA
sequences of different individuals and their susceptibility
to certain diseases. The aim of research in this field is to
improve the diagnosis, prevention, and treatment of
diseases.
Data mining techniques also support the discovery of
new drugs and predictions at the level of the individual
patient (personalized medicine).
Because the files to which data mining techniques are
applied contain medical data linked to human subjects and
their private lives, ensuring data confidentiality is very
important [2].
This paper presents a synthesis of the analysis of
medical data through data mining techniques. Data mining
tools are also discussed, along with certain aspects of the
use of classification techniques in the medical field. WEKA
is a very powerful data mining and machine learning tool.
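[Table: column headers and row alignment lost in extraction]
Data sets: breast cancer, Diabetes, Cleveland heart disease, dermatology
Values (in source order): 76.3, 73.83, 73.83, 85.3, 77.56, 81.52, 97.27, 93.99, 94.81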
VII. CONCLUSIONS
Enormous amounts of data are gathered in medical
databases every day. Medical data is very diverse and can
include images (MRI), signals (ECG), clinical
information (temperature), etc. Analyzing these data
through data mining techniques aims to extract useful
knowledge from them.
Several data mining techniques are used in the medical
field, such as classification (to predict a nominal
value), regression (to predict a numeric value),
clustering (to determine the main groups of similar data),
and association rules (to detect associations between
different types of information that apparently have no
dependency).
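The difference between predicting a nominal value and predicting a numeric
value can be illustrated with a minimal sketch. It is not taken from the
paper: the toy records, the "risk score" target, and the function names
(nearest, classify, regress) are hypothetical, and a simple k-nearest-
neighbour scheme is assumed purely for illustration.

```python
from collections import Counter
from math import dist

# Hypothetical toy records (values are made up for illustration):
# (temperature in degrees C, cholesterol in mg/dL), nominal label, numeric risk score.
records = [
    ((36.6, 180.0), "healthy", 1.0),
    ((36.8, 190.0), "healthy", 1.5),
    ((38.5, 260.0), "sick", 6.0),
    ((39.0, 280.0), "sick", 7.0),
    ((38.8, 270.0), "sick", 6.5),
]


def nearest(query, k):
    """Return the k records closest to the query (Euclidean distance)."""
    return sorted(records, key=lambda r: dist(r[0], query))[:k]


def classify(query, k=3):
    """Classification: predict a nominal value by majority vote."""
    labels = [label for _, label, _ in nearest(query, k)]
    return Counter(labels).most_common(1)[0][0]


def regress(query, k=3):
    """Regression: predict a numeric value by averaging the neighbours."""
    targets = [target for _, _, target in nearest(query, k)]
    return sum(targets) / len(targets)


if __name__ == "__main__":
    patient = (38.7, 265.0)
    print("predicted class:", classify(patient))
    print("predicted score:", regress(patient))
```

Both functions use the same neighbours; only the way the neighbours are
combined differs, which is the distinction between the two techniques.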
These data mining techniques can be applied to
medical data by means of open-source tools such as
WEKA, R, and Orange.
The authors improved the classification interface of the
open-source WEKA program by introducing a section
whose content adapts dynamically to the opened data
set and which allows the classifier model to be used to
make predictions. The graphical interface was also
improved with a 3D graph that illustrates the number of
correctly and incorrectly classified instances.
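The quantities that such a graph displays can be computed from the actual and
predicted class labels. The sketch below is not the authors' WEKA extension;
it only assumes two equal-length lists of labels, and the function name
classification_summary is hypothetical.

```python
from collections import Counter


def classification_summary(actual, predicted):
    """Count correctly and incorrectly classified instances.

    Returns the two counts, the accuracy in percent, and a Counter keyed by
    (actual, predicted) pairs, i.e. the cells of a confusion matrix.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    correct = sum(a == p for a, p in zip(actual, predicted))
    incorrect = len(actual) - correct
    accuracy = 100.0 * correct / len(actual) if actual else 0.0
    return {
        "correct": correct,
        "incorrect": incorrect,
        "accuracy_percent": accuracy,
        "confusion": Counter(zip(actual, predicted)),
    }


if __name__ == "__main__":
    # Made-up labels for five instances.
    actual = ["sick", "sick", "healthy", "healthy", "sick"]
    predicted = ["sick", "healthy", "healthy", "healthy", "sick"]
    print(classification_summary(actual, predicted))
```

The per-pair counts in "confusion" show which classes are mistaken for which,
something a single accuracy figure does not reveal.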