Research Scholar, Sri Chandra Sekarendra Viswa Maha Vidyalaya, Enathur, Kanchipuram-531 602.
Sr. Professor, St. Joseph College of Engineering, Chennai-600 119.
Abstract
Data mining is an emerging field. Every day the health care industry produces a huge volume of data, and getting the right data to the right place is a tedious task. In other words, the data are rich but poorly utilized. In this paper we discuss how data mining techniques can be used in the health care industry. Today the industry generates large amounts of complex data about patients, hospital resources, disease diagnoses, electronic patient records, medical devices, and so on. These data are a key resource to be processed and analyzed for knowledge extraction that supports cost savings and decision making. Data mining provides a set of tools and techniques that can be applied to the processed data to discover hidden patterns, giving healthcare professionals an additional source of knowledge for making decisions. With continuous advances in technology, an increasing number of clinicians are using electronic medical records to accumulate substantial amounts of data about their patients, including their clinical conditions and treatment details. The hidden relationships and patterns within this information could further our medical knowledge, revealing both efficiencies and deficiencies. Methodologies that are being used with increasing effectiveness in parallel industries need to be adapted and applied to discover this knowledge. In this article we discuss how data mining can be used in the medical field, how it addresses business issues, the challenges in data mining, data mining techniques, and the SEMMA methodology.
Keywords: Data Mining, Health Care, Neural Networks, Neuro-Fuzzy, Decision Tree Algorithm
1. Introduction
Data mining can be defined as the process of finding previously unknown patterns and trends in databases and using that information to build predictive models. Alternatively, it can be defined as the process of selecting and exploring data and building models from vast data stores to uncover previously unknown patterns. Data mining is not new. Financial institutions have used it intensively and extensively for credit scoring and fraud detection. Marketers have used it for direct marketing and for cross-selling or up-selling, retailers for market segmentation and store layout, and manufacturers for quality control and maintenance scheduling. In healthcare, data mining is becoming increasingly popular, if not increasingly essential. Several factors have motivated the use of data mining applications in healthcare. The existence of medical insurance fraud and abuse, for example, has led many healthcare insurers to attempt to reduce their losses by using data mining tools to help them find and track offenders. Fraud detection using data mining applications is prevalent in the commercial world, for example in detecting fraudulent credit card transactions. Recently, there have also been reports of successful data mining applications in healthcare fraud and abuse detection. Another factor is that the huge amounts of data generated by healthcare transactions are too complex and voluminous to be processed and analyzed by traditional methods. Data mining can improve decision making by discovering patterns and trends in large amounts of complex data. Such analysis has become increasingly essential as financial pressures have heightened the need for healthcare organizations to base decisions on the analysis of clinical and financial data. Insights gained from data mining can influence cost, revenue, and operating efficiency while maintaining a high level of care.
International Journal of Computational Intelligence and Information Security, May 2013, Vol. 4 No. 5, ISSN: 1837-7823
warehousing can be supported by decision support tools such as data marts, OLAP, and data mining tools. A data mart is a subset of a data warehouse that focuses on selected subjects. An online analytical processing (OLAP) solution provides a multidimensional view of the data held in relational databases. That data is stored in a two-dimensional (tabular) format, yet OLAP makes it possible to analyze potentially large amounts of it with very fast response times. It also lets users navigate the data, drilling down or rolling up through the various dimensions defined by the data structure.
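The following Python sketch is a rough illustration of the roll-up and drill-down operations mentioned above. It aggregates a tiny fact table at different levels of detail. The table, its hospital names and admission counts, and the `aggregate` helper are all hypothetical. A real OLAP server would typically precompute and index such aggregates rather than scan every row on each query.

```python
from collections import defaultdict

# Hypothetical fact table: (region, hospital, year, admissions).
# All names and numbers are invented for illustration.
FACTS = [
    ("North", "Hospital A", 2012, 120),
    ("North", "Hospital A", 2013, 135),
    ("North", "Hospital B", 2012, 80),
    ("South", "Hospital C", 2012, 200),
    ("South", "Hospital C", 2013, 190),
]
DIMENSIONS = ("region", "hospital", "year")


def aggregate(facts, dims):
    """Sum the admissions measure grouped by the requested dimensions.

    Fewer dimensions give a coarser view (roll-up); more give a finer one (drill-down).
    """
    positions = [DIMENSIONS.index(d) for d in dims]
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[p] for p in positions)
        totals[key] += row[-1]  # the last column holds the measure
    return dict(totals)


if __name__ == "__main__":
    print(aggregate(FACTS, ["region"]))              # rolled-up regional totals
    print(aggregate(FACTS, ["region", "hospital"]))  # drilled down to hospitals
```

Passing fewer dimensions rolls the view up, for example to regional totals. Adding a dimension drills down, for example from regions to individual hospitals.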
Once the business problems have been defined and agreed upon, the next logical step is to determine the type and amount of data needed to make business decisions. As a precursor to data mining, a data warehouse strategy and implementation is suggested. Integration with SAS software gives the SAS data mining solution several distinguishing characteristics. These allow faster, easier, and more accurate conversion of data into knowledge useful to decision makers:
Data diversity. The SAS data mining solution is designed to accept a wider range of data formats than any other data mining product currently on the market. It accepts data from relational and hierarchical databases, flat files, and other data formats, and it accepts this data from all major hardware platforms.
Distributed client/server. The SAS data mining solution supports both models of client/server computing. In the data server model, data located on a remote machine can be accessed. In the compute server model, data is processed on a remote server and then forwarded to a client. This is particularly well suited to analytical tasks that involve large volumes of data and require superior processing capabilities.
Consistent implementation on multiple platforms. The SAS data mining solutions are fully integrated with SAS software and give users the flexibility to use their platforms of choice, ranging from desktop machines to powerful servers.
Integrated data management. SAS software's data management facilities guarantee data integrity without the need for re-keying or additional validation of data. No other data mining solution includes seamless integration with such a comprehensive range of data management functionality.
Once the business objectives and data issues have been resolved, the methodology and approach to data mining can begin.
5. SEMMA Methodology
The methodology and approach that SAS Institute proposes is called SEMMA, for Sample, Explore, Modify, Model, and Assess. Users begin with a statistically representative sample of data and apply exploratory statistical and visualization techniques. They then select and transform the most significant predictive variables, model the variables to predict outcomes, and confirm the model's accuracy.
Sample. The first step is to extract a portion of a large data set that is big enough to contain the significant information yet small enough to manipulate quickly.
Explore. This phase involves searching speculatively for unanticipated trends and anomalies in order to gain understanding and ideas. It can reveal which subset of attributes will be the most productive to work with in the modeling phase. Data visualization provides intuitive tools for business professionals, while statistical techniques offer additional detail for specialists.
Modify. The insights gained in the exploration phase enable knowledge workers to group the most productive subsets and clusters of data together for further analysis and exploration.
Model. This phase involves automatically searching for a combination of variables that reliably predicts a desired outcome. Data mining techniques such as neural networks, tree-based models, and traditional statistical techniques can help reveal patterns in the data and provide a best-fitting predictive model.
Assess. In this evaluation phase, the assessment of the modeling results indicates which results should be conveyed to senior management. It also shows how to model new questions raised by the previous results, after which the process returns to the exploration phase.
According to SAS Institute, SEMMA distinguishes it as the only vendor that can offer all of these components. SAS Institute also states that it can seamlessly integrate them with a company's existing hardware and software strategy.
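The following Python sketch shows how the five SEMMA phases fit together, under several stated assumptions:
- The records are synthetic.
- Explore is reduced to comparing class means for each attribute.
- Modify is reduced to keeping the most promising attribute.
- Model is a nearest-centroid classifier.
- Assess is hold-out accuracy.

These choices and all function names are hypothetical stand-ins chosen for illustration. They are not SAS software's own implementation of SEMMA.

```python
import random
from statistics import mean


def make_records(n, seed=0):
    """Synthetic (features, label) records; attribute 0 is informative, 1 and 2 are noise."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        label = 1 if rng.random() < 0.5 else 0
        informative = rng.gauss(2.0 * label, 1.0)
        noise = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
        records.append(([informative] + noise, label))
    return records


def sample(records, k, seed=1):
    """Sample: draw a random subset that is small enough to work with quickly."""
    return random.Random(seed).sample(records, k)


def explore(records):
    """Explore: gap between class means per attribute (a larger gap looks more predictive)."""
    gaps = []
    for j in range(len(records[0][0])):
        pos = [x[j] for x, y in records if y == 1]
        neg = [x[j] for x, y in records if y == 0]
        gaps.append(abs(mean(pos) - mean(neg)))
    return gaps


def modify(gaps, top_k=1):
    """Modify: keep the indices of the top_k most promising attributes."""
    return sorted(range(len(gaps)), key=lambda j: gaps[j], reverse=True)[:top_k]


def model(records, attrs):
    """Model: nearest-centroid classifier restricted to the selected attributes."""
    def centroid(label):
        rows = [x for x, y in records if y == label]
        return [mean(r[j] for r in rows) for j in attrs]

    c0, c1 = centroid(0), centroid(1)

    def predict(x):
        d0 = sum((x[j] - c) ** 2 for j, c in zip(attrs, c0))
        d1 = sum((x[j] - c) ** 2 for j, c in zip(attrs, c1))
        return 1 if d1 < d0 else 0

    return predict


def assess(predict, records):
    """Assess: fraction of held-out records classified correctly."""
    hits = sum(1 for x, y in records if predict(x) == y)
    return hits / len(records)


if __name__ == "__main__":
    data = make_records(2000)
    train, holdout = data[:1500], data[1500:]
    working = sample(train, 500)
    selected = modify(explore(working))
    predict = model(working, selected)
    print("selected attributes:", selected)
    print("hold-out accuracy:", round(assess(predict, holdout), 3))
```

In a real project the Assess phase would usually feed back into Explore, as described above, rather than run once.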
Diagnosis=Blood_alcohol_content_HIGH.
Decision Tree algorithms
Decision tree algorithms include CART (Classification and Regression Tree) and ID3 (Iterative Dichotomiser 3). These algorithms differ in how they select splits, when they stop splitting a node, and how they assign a class to a node that is not split further. CART uses the Gini index to measure the impurity of a partition or set of training tuples, and it can handle high-dimensional categorical data. Decision trees can also handle continuous data (as in regression). ID3 requires continuous attributes to be discretized into categories first, whereas CART splits them directly at threshold values.
Neural Network Architecture
The neural network used in this study has a multilayer feed-forward architecture with 20 input nodes, 10 hidden nodes, and 10 output nodes. The number of input nodes is determined by the finalized data, and the number of hidden nodes was determined by trial and error. The output nodes represent a range of values indicating the disease classification. The most widely used neural-network learning method is the backpropagation (BP) algorithm. Learning in a neural network involves modifying the weights and biases of the network to minimize a cost function. The cost function always includes an error term, which measures how close the network's predictions are to the class labels of the examples in the training set. It may additionally include a complexity term that reflects a prior distribution over the values the parameters can take. Neural networks have been proposed as useful decision-making tools in a variety of medical applications. They are not a replacement for human experts, but they can help in screening, and experts can use them to double-check their diagnoses. In general, the results of a disease classification or prediction task hold only with a certain probability.
Neuro-Fuzzy
A stochastic backpropagation algorithm is used to construct the fuzzy neural network. The algorithm proceeds in three steps. First, the connection weights are initialized with random values. Second, the net input, output value, and error rate are computed for each unit. Third, to handle uncertainty, a certainty measure c is calculated for each node, and the decision is made based on this measure. The level of certainty is determined by the following conditions:
a. If 0.8 < c ≤ 1, there is very high certainty.
b. If 0.6 < c ≤ 0.8, there is high certainty.
c. If 0.4 < c ≤ 0.6, there is average certainty.
d. If 0.1 < c ≤ 0.4, there is low certainty.
e. If c ≤ 0.1, there is very low certainty.
The constructed network consists of three layers: an input layer, a hidden layer, and an output layer. Figure 2 shows a sample trained neural network with 9 input nodes, 3 hidden nodes, and 1 output node. According to medical guidelines, when a thrombus (blood clot) occupies more than 75% of the surface area of the lumen of an artery, the expected result may be a prediction of cell death or heart disease. That is, the result R is generated with reference to the given set of input data.
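The Gini index used by CART, as mentioned above, can be computed as Gini(D) = 1 − Σ p_i². Here p_i is the proportion of tuples in D that belong to class i. A candidate binary split is scored by the size-weighted Gini index of its two partitions, and the split with the lowest score is preferred. The Python sketch below also shows one way to turn a continuous attribute into a binary test (value ≤ threshold) by scanning candidate thresholds, which is how CART treats numeric attributes. The patient ages and risk labels are invented for illustration. `best_threshold` is a hypothetical helper, not part of any library.

```python
from collections import Counter


def gini(labels):
    """Gini index of a set of class labels: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_split(left, right):
    """Size-weighted Gini index of a binary split (lower means purer partitions)."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_threshold(values, labels):
    """Score every midpoint between sorted distinct values; return (threshold, weighted Gini)."""
    distinct = sorted(set(values))
    best = None
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        score = gini_split(left, right)
        if best is None or score < best[1]:
            best = (t, score)
    return best  # None if the attribute has fewer than two distinct values


if __name__ == "__main__":
    # Hypothetical patient ages and risk labels, for illustration only.
    ages = [25, 32, 47, 51, 62]
    risk = ["low", "low", "high", "high", "high"]
    print(gini(risk))                  # impurity before splitting
    print(best_threshold(ages, risk))  # best binary split on age
```

On this toy data the split at age 39.5 separates the two classes perfectly, so its weighted Gini index is zero.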
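The Python sketch below illustrates the backpropagation learning described above, which is also the core of the neuro-fuzzy construction steps (random initial weights, then net input, output, and error for each unit). It rests on assumptions the paper does not state:
- sigmoid activations
- a squared-error term
- an L2 weight-decay penalty standing in for the complexity term
- a single hidden layer
- a fixed learning rate

The class and method names are hypothetical, and the toy task (logical OR) is not medical data.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class FeedForwardNet:
    """Fully connected feed-forward network with one hidden layer and sigmoid units."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        # Initialize every connection weight and bias with a small random value.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [rng.uniform(-0.5, 0.5) for _ in range(n_out)]

    def forward(self, x):
        """Net input = weighted sum of inputs plus bias; output = sigmoid(net input)."""
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(self.w1, self.b1)]
        o = [sigmoid(sum(w * hj for w, hj in zip(row, h)) + b) for row, b in zip(self.w2, self.b2)]
        return h, o

    def cost(self, data, decay=1e-4):
        """Error term (half squared error over the data) plus complexity term (L2 penalty)."""
        error = sum(sum((t - y) ** 2 for t, y in zip(target, self.forward(x)[1]))
                    for x, target in data)
        penalty = sum(w * w for row in self.w1 + self.w2 for w in row)
        return 0.5 * error + 0.5 * decay * penalty

    def train_step(self, x, target, lr=0.5, decay=1e-4):
        """One stochastic backpropagation update for a single training example."""
        h, o = self.forward(x)
        # Error of each output unit, then of each hidden unit (propagated backwards).
        d_out = [(y - t) * y * (1 - y) for y, t in zip(o, target)]
        d_hid = [hj * (1 - hj) * sum(d_out[k] * self.w2[k][j] for k in range(len(o)))
                 for j, hj in enumerate(h)]
        # Gradient-descent updates; the decay term pulls weights toward zero.
        for k, dk in enumerate(d_out):
            for j, hj in enumerate(h):
                self.w2[k][j] -= lr * (dk * hj + decay * self.w2[k][j])
            self.b2[k] -= lr * dk
        for j, dj in enumerate(d_hid):
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * (dj * xi + decay * self.w1[j][i])
            self.b1[j] -= lr * dj


if __name__ == "__main__":
    # Toy task (logical OR), not medical data: 2 inputs, 3 hidden units, 1 output.
    data = [([0, 0], [0]), ([0, 1], [1]), ([1, 0], [1]), ([1, 1], [1])]
    net = FeedForwardNet(2, 3, 1)
    shuffler = random.Random(1)
    print("cost before:", round(net.cost(data), 4))
    for _ in range(2000):
        shuffler.shuffle(data)
        for x, target in data:
            net.train_step(x, target)
    print("cost after:", round(net.cost(data), 4))
```

`FeedForwardNet(20, 10, 10)` would give the 20-10-10 layout described for this study. `FeedForwardNet(9, 3, 1)` would give the layout of the sample network in Figure 2. The fuzzy certainty step is not modeled here.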
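The five certainty levels listed above amount to a simple threshold mapping. The paper does not say how the certainty measure c itself is computed. The Python sketch below therefore covers only the mapping and assumes that c lies in [0, 1]. `certainty_level` is a hypothetical name.

```python
def certainty_level(c):
    """Map a certainty measure c to the five levels listed in the text (c assumed in [0, 1])."""
    if not 0.0 <= c <= 1.0:
        raise ValueError("certainty measure c is assumed to lie in [0, 1]")
    if c > 0.8:
        return "very high"   # 0.8 < c <= 1
    if c > 0.6:
        return "high"        # 0.6 < c <= 0.8
    if c > 0.4:
        return "average"     # 0.4 < c <= 0.6
    if c > 0.1:
        return "low"         # 0.1 < c <= 0.4
    return "very low"        # c <= 0.1


if __name__ == "__main__":
    for c in (0.95, 0.8, 0.5, 0.25, 0.05):
        print(c, certainty_level(c))
```

Checking the thresholds from the top down means each boundary value (0.8, 0.6, 0.4, 0.1) falls into the lower band, which matches the "≤" on the upper end of each interval.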
7. Summary
The effective use of information and technology is crucial for health care organizations to stay competitive in today's complex, evolving environment. The challenges of making sense of large, diverse, and often complex data sources are considerable. In an effort to turn information into knowledge, health care organizations are implementing data mining technologies to help control costs and improve the efficacy of patient care. Data mining can be used to help predict future patient behavior and to improve treatment programs. By identifying high-risk patients, clinicians can better manage the care of patients today so that they do not become the problems of tomorrow.
We studied and summarized different data mining algorithms, focusing on their use for predicting combinations of several target attributes. We conclude that applying appropriate data mining algorithms in the health care industry can produce better results and can help prevent several diseases.