
IAETSD JOURNAL FOR ADVANCED RESEARCH IN APPLIED SCIENCES ISSN NO: 2394-8442

Reliable Data Mining Tasks and Techniques for Industrial Applications
G. Sabarmathi1, Dr. R. Chinnaiyan2
1 Department of Computer Science, Christ Academy Institute for Advanced Studies, Bangalore University, Bangalore
2 Department of MCA, New Horizon College of Engineering (Autonomous), VTU, Bangalore
1 sabarganesh@gmail.com, 2 vijayachinns@gmail.com

Abstract— Data mining is the process of extracting the useful data, patterns and trends from a large amount of data by
using techniques such as clustering, classification, association and regression. This paper discusses various data mining
applications and recently used techniques.

Keywords— Data Mining, Patterns, Clustering, Classification, Association

1. INTRODUCTION
Data mining is the procedure of extracting knowledge from available data. The extracted results can be used in applications
such as market analysis and data analysis in industries including information technology and health care. Data mining
functions fall into two categories: descriptive models and predictive models. A predictive model analyses the available data
to estimate unknown or future values, whereas a descriptive model analyses the data for patterns and yields new and
significant information. The two follow different approaches to analysing the data. In both cases, data mining finds
information in data and previously unknown facts, which helps in decision making.

Figure 0.1: Data Mining: Multiple Disciplines

2. KNOWLEDGE DISCOVERY IN DATABASES (KDD)


Knowledge Discovery in Databases (KDD) is the method of extracting useful information from underlying databases. KDD
involves steps such as data selection, cleansing, integration, transformation of the raw data into information, and
interpretation and evaluation of the patterns found in that information.

VOLUME 4, ISSUE 7, DEC/2017 138 http://iaetsdjaras.org/



Figure 1.1 Process of Knowledge Discovery in Database


The steps of the KDD process shown above are:

• Data from various sources are combined into a single data set, called the target data.
• The target data are pre-processed and transformed into processed data.
• Data mining algorithms are applied to the processed data to extract patterns.
• Finally, the patterns are interpreted to form useful information, called knowledge.
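
The four steps above can be sketched as a small pipeline. This is a minimal illustration only: the record layout, the function names and the frequency count used as the "mining algorithm" are all assumptions made for the example, not part of the KDD definition.

```python
from collections import Counter

# Hypothetical raw records from two sources; None marks a missing value.
source_a = [{"customer": "c1", "item": "bread"}, {"customer": "c2", "item": None}]
source_b = [{"customer": "c3", "item": "Bread "}, {"customer": "c4", "item": "milk"}]


def integrate(*sources):
    """Combine records from several sources into one target data set."""
    return [record for source in sources for record in source]


def preprocess(target):
    """Drop incomplete records and normalise the text of the rest."""
    cleaned = [r for r in target if r["item"] is not None]
    return [{**r, "item": r["item"].strip().lower()} for r in cleaned]


def mine(processed):
    """Stand-in for a mining algorithm: count how often each item occurs."""
    return Counter(r["item"] for r in processed)


def interpret(patterns, min_count=2):
    """Keep only the patterns frequent enough to report as knowledge."""
    return {item: n for item, n in patterns.items() if n >= min_count}


target_data = integrate(source_a, source_b)
processed_data = preprocess(target_data)
patterns = mine(processed_data)
knowledge = interpret(patterns)
print(knowledge)  # {'bread': 2}
```

Each function corresponds to one bullet, so a real system could swap the frequency count in `mine` for any of the algorithms discussed later without changing the surrounding steps.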

3. DATA MINING TECHNIQUES


The following techniques are often used in data mining:
• Data mining software uses advanced pattern recognition algorithms to sift through large amounts of data. This assists
the discovery of previously unknown business information at the strategic level.[5]
• Sequence mining is a technique used in human genetics to learn and understand the relationship between
inter-individual variations in human DNA sequence and the variability in disease susceptibility.[5]
• Spatial analysis is a technique applied to structures at the human scale, most notably in the analysis of
geographic data. Complex issues arise in spatial analysis, many of which are neither clearly defined nor completely
resolved, but form the basis for current research.[6]
• Information retrieval

4. DATA MINING TASKS


• Classification is the most commonly applied data mining technique. It requires a set of pre-classified examples
to create a model, which is then applied to a larger dataset. Credit risk assessment and fraud detection are typical
uses of classification.[9]
• Support vector machines (SVMs) are supervised learning models, with associated learning algorithms, that analyse data
for classification and regression analysis. An SVM performs classification by finding the hyperplane that maximizes
the margin between the two classes. The vectors (cases) that define the hyperplane are the support vectors.[9]
• A decision tree is a widely used data mining method. In decision modelling, a decision tree is a graph of decisions
and their possible consequences, represented in the form of branches and nodes. A decision tree contains a root node,
branches, and leaf nodes. Each internal node denotes a test on an attribute, each branch denotes an outcome of the
test, and each leaf node holds a class label. The topmost node in the tree is the root node. Decision trees use
recursive data partitioning to extract useful data.[9]
• Regression is a task used to predict numeric values such as age, height, income or distance. It predicts values for
data whose target value is unknown. A regression model is evaluated statistically by measuring the difference between
the predicted and expected values.[20]


• Prediction is a method for estimating a future or otherwise unavailable outcome from the data that are available.
It is mainly used in market analysis, medical diagnosis, fraud detection, etc.
• Time series analysis studies a sequence of events in which the next event is analysed on the basis of earlier events.
The series reflects the process being measured, and certain components affect the behaviour of that process. It includes
methods for analysing time-series data in order to extract useful patterns, trends, rules and statistics. Stock market
prediction is an important application of time-series analysis.[19]
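
As an illustration of how a classifier is built from pre-classified examples and then applied to new data, the sketch below trains a one-level decision tree (a "decision stump"): a root node that tests a single attribute and two leaf nodes that hold class labels. The transaction amounts, the labels and the function names are hypothetical, and the sketch assumes that the "fraud" class lies on the high side of the threshold.

```python
# Hypothetical pre-classified examples: (transaction amount, class label).
training = [
    (20, "ok"), (35, "ok"), (50, "ok"),
    (400, "fraud"), (650, "fraud"), (900, "fraud"),
]


def train_stump(examples):
    """Pick the threshold on the single attribute that misclassifies the fewest examples."""
    values = sorted(v for v, _ in examples)
    # Candidate thresholds lie midway between neighbouring attribute values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best_threshold, best_errors = None, len(examples) + 1
    for t in candidates:
        # An example is misclassified when "amount > t" disagrees with its label.
        errors = sum((v > t) != (label == "fraud") for v, label in examples)
        if errors < best_errors:
            best_threshold, best_errors = t, errors
    return best_threshold


def classify(threshold, amount):
    """Root-node test on the attribute; each outcome leads to a leaf with a class label."""
    return "fraud" if amount > threshold else "ok"


threshold = train_stump(training)
predictions = [classify(threshold, amount) for amount in (10, 75, 500)]
print(threshold, predictions)  # 225.0 ['ok', 'ok', 'fraud']
```

A full decision tree would apply the same threshold search recursively to each partition of the data; the stump shows only the first split.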

Figure 3.1 Categories of Data Mining Techniques

• Association rule analysis is a descriptive data mining function that discovers patterns or associations in the data.
It is a method for finding interesting relations between variables in large databases, and it is intended to identify
strong rules using measures of interestingness. Associative classification evolved by combining association rules with
classification methodology. This helps to identify interesting correlations, frequent patterns, associations or causal
structures.[9]
• Clustering is the process of partitioning a data set into clusters, each of which is a meaningful sub-class. It is a
descriptive data mining function that places similar objects in the same cluster and dissimilar objects in different
clusters. It is a main task of exploratory data mining and a common technique for statistical data analysis, used in
pattern recognition, machine learning, image analysis, information retrieval, and bioinformatics.
• Summarization is a simplification of data in which a data set is condensed into combined information about the data.
This method helps to identify customer behaviour in the sale and purchase of a product.
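
The "measures of interestingness" used in association rule analysis are commonly support and confidence. The sketch below computes both for single-item rules over a handful of hypothetical market-basket transactions; the data, the thresholds and the function names are assumptions chosen for illustration.

```python
from itertools import permutations

# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset):
    """Fraction of transactions that contain every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent):
    """Share of transactions containing the antecedent that also contain the consequent."""
    return support(antecedent | consequent) / support(antecedent)


def strong_rules(min_support=0.5, min_confidence=0.6):
    """Return single-item rules (a, c), read as a -> c, that pass both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, c in permutations(items, 2):
        if support({a, c}) >= min_support and confidence({a}, {c}) >= min_confidence:
            rules.append((a, c))
    return rules


rules = strong_rules()
```

With these thresholds the rule butter -> bread has confidence 1.0, because every transaction that contains butter also contains bread, while rules involving butter and milk together are rejected for low support.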

5. RELIABILITY IN DATA MINING


A reliability model for a data mining system gives a clear picture of the functional interdependencies of the data. It
provides a means to trade off data mining design alternatives and to identify areas of database design that can be improved.
Reliability models for data mining are also helpful in:
i. Identifying critical items and single points of failure in the data
ii. Allocating reliability goals to portions of the database and mining design
iii. Providing a framework for comparing the estimated reliability of data mining methods
iv. Trading off alternative fault-tolerance approaches for data mining
The reliability of systems is presented by Dr. R. Chinnaiyan and Dr. S. Somasundaram [25, 26, 27, 28, 29, 30, 31, 32]. The
authors evaluated the reliability of systems with novel methods and ensured that the reliability models provide optimized
reliability results. The same can be applied to all the data mining tasks and techniques to provide reliable and
optimized results.
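
One simple way to make these ideas concrete is a series reliability model, in which a data mining pipeline works only if every stage works. The sketch below is an assumption made for illustration, not the method of the cited authors: it treats the stages as independent, and the stage names and reliability values are hypothetical.

```python
import math

# Hypothetical per-stage reliabilities (probability that a stage completes correctly).
stage_reliability = {
    "selection": 0.99,
    "cleaning": 0.95,
    "transformation": 0.98,
    "mining": 0.97,
}


def series_reliability(stages):
    """Reliability of independent stages in series: the product of the stage reliabilities."""
    return math.prod(stages.values())


def weakest_stage(stages):
    """The stage with the lowest reliability, a natural first candidate for improvement."""
    return min(stages, key=stages.get)


overall = series_reliability(stage_reliability)
critical = weakest_stage(stage_reliability)
print(round(overall, 4), critical)  # 0.894 cleaning
```

Under this model every stage is a single point of failure, so the overall figure can never exceed the reliability of the weakest stage; this is what makes the allocation of reliability goals across stages a design decision.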


6. APPLICATIONS IN DATA MINING


Data Mining For Financial Data Analysis
• In the banking industry, data mining is used for credit card fraud detection, profit prediction, analysis of current
market trends, etc.
• In financial markets, data mining, including neural networks, is used to analyse the stock market and to predict
financial disasters, commodity prices, etc.
• In the telecommunications industry, it is used to predict customer satisfaction, plan new investment strategies,
study the current market, etc.
• The retail industry is a main application area, where it is used to analyse customer buying behaviour and to
introduce new products.
• In medical science, it is used in the diagnosis of diseases, patient satisfaction, history generation, etc.
Computer-aided detection (CAD) tools for mammography are used to detect tumours.
• It is also used in the sports industry to schedule the daily games of national and international competitions
throughout the world from a huge amount of data.

7. CONCLUSIONS
Data mining plays a significant role in finding patterns and helps in discovering new ideas from large datasets. This paper
groups the different data mining techniques by their category of function. These techniques help businesses to take
decisions based on the available data sets. The paper also describes the various application areas. Future work aims to
combine the different algorithms used in these techniques and the various applications involved in those categories. Data
mining is thus a promising interdisciplinary area of research.

REFERENCES
[1]. PhridviRaj MSB, GuruRao CV (2013) Data mining – past, present and future – a typical survey on data streams. INTER-ENG,
Procedia Technology 12:255–263
[2]. Nikhil Bhongade, Shrikant Akarte, Gaurav Sawale, A Review On: Data Mining Tools and Application, IJSTMR, Volume 2,
Issue 5, May 2017
[3]. Demšar J, Zupan B (2013) Orange: Data Mining Fruitful and Fun - A Historical Perspective. Informatica 37:55–60
[4]. Sangeeta Goele, Nisha Chanana, “Data Mining Trend In Past, Current And Future,” International Journal of Computing &
Business Research, in Proc. I-Society 2012, 2012.
[5]. https://www.dragon1.com/terms/data-mining-definition
[6]. Spatial Data Analysis: Models, Methods and Techniques, by Jinfeng Wang and Manfred M. Fischer, 5 August 2011
[7]. Usama Fayyad, Gregory Piatetsky-Shapiro, and Padhraic Smyth, From Data Mining to Knowledge Discovery in Databases
[8]. Dhruv Madan Gopal, Aditya R, C Vishnu Kumar Reddy, Gautham S, Nagarathna N, “A Survey on Data Mining Applications,
Techniques and Challenges in Healthcare,” International Journal of Emerging Technologies and Innovative Research, April 2015,
Volume 2, Issue 4
[9]. P. Haripriya, R. Porkodi, A Survey Paper on Data Mining Techniques and Challenges in Distributed DICOM, IJARCCE, Vol. 5,
Issue 3, March 2016
[10]. Indexing Techniques for Data Warehouses’ Queries, Sirirut Vanichayobon, Le Gruenwald, The University of Oklahoma,
School of Computer Science, Norman, OK, 73019.
[11]. R. Agrawal, H. Mannila, R. Srikant, H. Toivonen, and A. Inkeri Verkamo. Fast discovery of association rules. In
U. Fayyad et al., editors, Advances in Knowledge Discovery and Data Mining, pages 307–328. AAAI Press, Menlo Park, CA, 1996.
[12]. Indexing and Data Access Methods for Database Mining, Ganesh Ramesh, William A. Maniatty, Dept. of Computer Science,
University at Albany, Albany, NY 12222, Mohammed J. Zaki, Dept. of Computer Science, Rensselaer Polytechnic Institute, Troy,
NY 12180.


[13]. Gary M. Weiss, Ph.D., Brian D. Davison, Ph.D., Data Mining, to appear in the Handbook of Technology Management,
H. Bidgoli (Ed.), John Wiley and Sons, 2010
[14]. Jerome H. Friedman, Department of Statistics, Stanford Linear Accelerator Center, Stanford University, Stanford, CA 94305.
[15]. Tukey, J.W. (1962). The Future of Data Analysis, Ann. Math. Statist. 33, 1-67.
[16]. David J. Hand, Data Mining: Statistics and More?
[17]. Bhandari, I., Colet, E., Parker, J., Pines, Z., Pratap, R., and Ramanujam, K. (1997). Advanced Scout: Data Mining and
Knowledge Discovery in NBA Data. Data Mining and Knowledge Discovery, 1, 121-125.
[18]. https://www.ibm.com/developerworks/library/ba-data-mining-techniques/
[19]. http://www.wideskills.com/data-mining/data-mining-applications
[20]. http://www.comp.dit.ie/btierney/Oracle11gDoc/datamine.111/b28129/regress.htm
[21]. Arun K Pujari, “Data Mining Techniques”, Universities Press (India) Private Limited, Hyderabad, 2001.
[22]. R. Tamilselvi, S. Kalaiselvi, An Overview of Data Mining Techniques and Applications, IJSR, India Online ISSN: 2319-7064.
[23]. Jiawei Han and Micheline Kamber (2006), Data Mining: Concepts and Techniques, Morgan Kaufmann, 2nd ed.
[24]. Mrs. Bharati M. Ramageri, Data Mining Techniques and Applications, Indian Journal of Computer Science and Engineering,
Vol. 1 No. 4, 301-305
[25]. R. Chinnaiyan, Dr. S. Somasundaram, “Reliability Assessment of Component Based Software Systems using Test Suite -
A Review”, Journal of Computer Applications, Vol – 1, No. 4, Oct – Dec 2008.
[26]. R. Chinnaiyan, Dr. S. Somasundaram, “An Experimental Study on Reliability Estimation of GNU Compiler
Components – A Review”, International Journal of Computer Applications (0975 – 8887), Volume 25, No. 3, July 2011.
[27]. R. Chinnaiyan, Dr. S. Somasundaram, “Reliability of Component Based Software With Similar Software Components
- A Review”, i-Manager's Journal on Software Engineering; Nagercoil, 5.2 (Oct-Dec 2010).
[28]. R. Chinnaiyan, S. Somasundaram, “Monte Carlo Simulation For Reliability Assessment of Component Based Software
Systems”, i-Manager's Journal on Software Engineering, 2010.
[29]. R. Chinnaiyan, S. Somasundaram, “Reliability Estimation of Component Based Software Systems Through Markov
Process”, International Journal of Mathematics, Computer Sciences and Information Technology.
[30]. R. Chinnaiyan, S. Somasundaram (2010), “Evaluating the Reliability of Component Based Software Systems”,
International Journal of Quality and Reliability Management, Vol. 27, No. 1, pp. 78-88
[31]. R. Chinnaiyan, S. Somasundaram (2011), “An Experimental Study on Reliability Estimation of GNU
Compiler Components - A Review”, International Journal of Computer Applications, Vol. 25, No. 3, July 2011, pp. 13-16.
[32]. R. Chinnaiyan, S. Somasundaram (2009), “Reliability of Object Oriented Software Systems using Communication
Variables – A Review”, International Journal of Software Engineering, Vol. 2, No. 2, pp. 87-96
