Recent Applications

DATA MINING
SUBMITTED TO: Ma'am Sobia Khalid
SUBMITTED BY:
Rehna Batool (2013-BCS-033)
Rida Batool (2013-BCS-034)
Rida Safeer (2013-BCS-036)
Sadia Firdous (2013-BCS-038)
Sumia Ahmad (2013-BCS-041)
DEPARTMENT: BS(CS)-VI (Group B)
SUBMISSION DATE: May 19, 2016
A Review Paper On
Recent Trends and Applications in Data Mining

Sumia Ahmed (sumiaahmd48@gmail.com)
Rehna Batool (rehna.batool94@gmail.com)
Sadia Firdous (Simple301994@yahoo.com)
Rida Batool (ridabatool407@gmail.com)
Rida Safeer (rida.safeer9@gmail.com)
Department of Computer Sciences
Fatima Jinnah Women University
The Mall Rawalpindi, Pakistan
Supervisor: Sobia Khalid
Abstract: Data mining is used in almost every field of life. In science and technology, data mining techniques are widely used in telecommunication, education, biology and the retail industry. In this paper we describe the use of data mining in these application areas, discussing how it is applied in each field and which algorithms are used to achieve results. Data mining involves extracting patterns from a data set and transforming them into a form suitable for use. This makes data mining a complicated process, because it involves data preprocessing, the use of different algorithms, and so on. The application areas of data mining are vast: it processes data and derives rules from it in order to capture different behaviors and gather the desired information. This paper discusses different applications of data mining along with the algorithms they use.
Key words: Network Intrusion Detection (NID), KDD99, DoS, U2R, R2L, MADAM ID, Naive Bayes, K-means, Apriori algorithm, UCAM algorithm, Moodle System, OLAP.
Introduction:
Data mining can be considered a way to discover knowledge in large databases. As a new technology, data mining has emerged with the development of database technologies, which allow the user to access or process a large quantity of information. The purpose of data mining techniques is to produce automatic tools to investigate and extract information from databases [9]. The extracted knowledge is finally presented in terms of models and rules among variables. Data mining techniques are very powerful tools and can be used to describe a database in a concise way by capturing its important properties, or to predict new data based on a set of models/rules extracted from the database. Due to their multidisciplinary application, a multitude of data mining techniques have been studied, used and proposed in a variety of different fields [4]. Data mining works best when there is a lot of data available with as much detail as possible. Modern technologies allow the storage of large amounts of raw data, which usually contains usable information. Data mining offers techniques which can be used to analyze that data and uncover previously unknown rules and associations, i.e. hidden knowledge that becomes useful once it is acquired and appropriately interpreted [17].
The first section of the paper discusses telecommunication and its techniques. The second section covers the retail industry and its techniques. The third section covers intrusion detection and different algorithms to handle it. The fourth and fifth sections cover health informatics and education respectively.
1. TELECOMMUNICATION
The telecommunication industry uses data mining because it has to handle large amounts of customer data. Companies use data mining methods to improve their marketing and to manage their telecommunication networks. The loss of customers to competitors is a substantial business problem that is prominent in the telecommunication industry. This is often referred to as churn. A company that can predict which customers are likely to leave can take effective steps to stop them and gain an advantage from doing so. Weka is an open source data mining tool that can be used to identify such users so that the company can meet the competition in business. Different algorithms are implemented and compared to determine which gives the better result [1].
Different data mining techniques have been developed and are used to reduce churn. One of these techniques is K-means.
1.1 K-Means
K-means is a clustering technique for grouping items with similar characteristics. It requires the number of clusters K to be specified, but most of the time this information is not available, which makes K difficult to determine. The algorithm minimizes the sum of squared distances between all the points and their cluster centers. The most widely used version of the algorithm for classifying data was developed by Tou and Gonzalez, who describe the unsupervised clustering technique.
In telecommunication networks, customer analysis permits K to be determined from the information provided. The value of K should lie in the range from 1 to Kmax. Kmax should be large enough to reflect the characteristics of the data set, but it should not be close to the size of the data set, because the clusters then become less meaningful. To find the value of K, candidate values in the range from 1 to Kmax are evaluated to find the specific range that brings a decrease in the average diameter. The K-means algorithm is run for k = 1, 2, 3, ..., log2 Kmax, and the value of K lies within this range. If the range is [Kmax/2, Kmax], then the number of steps required is 2 log2 Kmax + 1. The flow chart of K-means is shown in Fig 1 [2, 3].
[Fig 1 flow chart: Input Data -> Initialize the platform -> Set up the K value in each node (K = 1, K = 2, ..., K = Kmax) -> Data Analysis -> change in diameter (bigger change / small change) -> Output K and mean square.]
Fig 1. Parallel K-means clustering algorithm. Different K values are assigned to different data nodes, and each K value is computed on an individual machine in parallel.
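The search over K described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the customer points are invented, the sum of squared distances stands in for the average diameter, and the candidate values of K are tried one after another rather than on parallel nodes as in Fig 1. The names `k_means`, `customers` and `sse_by_k` are hypothetical.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def k_means(points, k, iterations=50, seed=0):
    """Plain K-means; returns (centroids, sum of squared distances)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centers drawn from the data
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # keep the center of an empty cluster
        if new_centroids == centroids:
            break
        centroids = new_centroids
    sse = sum(min(squared_distance(p, c) for c in centroids) for p in points)
    return centroids, sse

# Hypothetical customers: (monthly call minutes, monthly bill)
customers = [(100, 20), (110, 22), (105, 21),
             (400, 60), (410, 62), (395, 59),
             (900, 120), (910, 125), (905, 122)]

k_max = 4  # assumed upper bound, well below the number of customers
sse_by_k = {}
for k in range(1, k_max + 1):
    _, sse_by_k[k] = k_means(customers, k)

for k, sse in sse_by_k.items():
    print(k, round(sse, 1))
```

A large drop in the error from one K to the next suggests that the extra cluster captures real structure, while a small drop suggests that K is already large enough.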
1.2 Rough Set Theory
Data mining is a method of processing data to find rules that are used to make decisions. Rough set theory was proposed by Z. Pawlak in the 1980s. It is a mathematical model for studying incomplete or inconsistent data. In rough set theory, data is redefined from a new perspective. The theory consists of upper and lower approximations, reducts and attribute significance.
Rough set theory divides data mining in telecommunication into three steps:
1. Data Preprocessing
In data preprocessing, the sub chamber is used to clean the original data. This property is used to fill in the missing elements.
2. Choose an Attribute Set
After data preprocessing, the attribute set is selected, and the data table is divided into two parts, i.e. the condition attributes and the decision attributes.
3. Rule Extraction and Reduction
The reduction of the condition attributes can be calculated using the discernibility matrix. Here the discernibility matrix is a 12 x 12 matrix. Because the matrix is symmetric, only half of its elements need to be listed.
Rough sets are used for the development of sales strategy and fee strategy, and there are many more fields in which they can be explored [4].
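The upper and lower approximations mentioned above can be illustrated with a small sketch. The decision table below is invented for illustration (it is not the table from [4]), and the attribute names `plan`, `usage` and `churn` are assumptions. Customers that agree on all condition attributes are indiscernible; the lower approximation contains the customers that certainly belong to the target set, and the upper approximation contains those that possibly belong to it.

```python
from collections import defaultdict

def indiscernibility_classes(table, condition_attrs):
    """Group object ids whose values agree on every condition attribute."""
    classes = defaultdict(set)
    for obj_id, row in table.items():
        key = tuple(row[a] for a in condition_attrs)
        classes[key].add(obj_id)
    return list(classes.values())

def approximations(table, condition_attrs, target):
    """Return the (lower, upper) approximation of the target set of objects."""
    lower, upper = set(), set()
    for cls in indiscernibility_classes(table, condition_attrs):
        if cls <= target:   # the whole class lies inside the target
            lower |= cls
        if cls & target:    # the class overlaps the target
            upper |= cls
    return lower, upper

# Hypothetical decision table: condition attributes plan/usage, decision attribute churn
customers = {
    "c1": {"plan": "prepaid", "usage": "low", "churn": "yes"},
    "c2": {"plan": "prepaid", "usage": "low", "churn": "no"},
    "c3": {"plan": "prepaid", "usage": "high", "churn": "yes"},
    "c4": {"plan": "postpaid", "usage": "high", "churn": "no"},
    "c5": {"plan": "postpaid", "usage": "low", "churn": "no"},
    "c6": {"plan": "prepaid", "usage": "high", "churn": "yes"},
}
churners = {cid for cid, row in customers.items() if row["churn"] == "yes"}
lower, upper = approximations(customers, ["plan", "usage"], churners)
print(sorted(lower), sorted(upper))
```

In this toy table, c1 and c2 have identical condition values but different decisions, so they fall in the upper approximation only. That boundary region is what makes the set "rough".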
2. RETAIL INDUSTRY
The retail industry gathers a large volume of retail sales data. Due to the current competition in this industry, retailers are struggling to improve their operations in order to run their stores more professionally [5]. Recently, the ability of the retail industry to generate and collect data has been increasing swiftly. The extensive use of bar codes for most commercial products, the computerization of many businesses, and progress in data collection tools have produced a huge amount of retail data. This huge amount of data creates a problem, and data mining may be the cure for it [6], because the explosive growth in data and databases has produced an urgent need for data mining techniques and tools that can extract implicit, previously unknown and potentially useful information from data stores [5]. In the retail sector, data mining can be applied for the following purposes:
2.1 Acquiring and Retaining Customers
It is more expensive to acquire new customers than to retain existing ones. By knowing the purchase behavior of existing customers, a direct marketer can predict customers' needs and their interest in buying a specific product. Using this type of prediction, a retailer can retain existing customers by giving discounts or offers, and can also attract and acquire new customers [7].
2.2 Market Basket Analysis
One of the most widely used data mining approaches is association rules, which is commonly applied to study market baskets and to help managers determine which items are generally purchased together by customers. It was first presented by R. Agrawal, T. Imielinski, and A. Swami [5].
The Apriori algorithm, proposed by Rakesh Agrawal and Ramakrishnan Srikant in 1994, is the most classical algorithm for mining association rules among data mining techniques and tools [8].
2.2.1 Description of the classical Apriori algorithm
Association rule mining studies the frequency of items occurring together in a transactional database. Normally, it is decomposed into two phases. In the first phase, we discover the large item sets, whose supports (numbers of occurrences) are greater than or equal to the user-defined minimum support. In the second phase, we use the large item sets found in the first phase to generate effective association rules. An association rule is effective if its confidence is greater than or equal to the user-defined minimum confidence. The original problem of determining association rules has gradually been followed by refinements, generalizations, improvements and extensions, including generalized association rules, multi-level patterns, and association rules with categorical and numeric attributes [5, 7].
Apriori uses an iterative approach known as a level-wise search, where k-item sets are used to find (k+1)-item sets. First, the set of frequent 1-item sets, L1, is found. In the next step, L1 is used to find the set of frequent 2-item sets, L2. Then L2 is used to find the set of frequent 3-item sets, L3. The algorithm iterates until no more frequent k-item sets can be found. Finding each Lk requires a full scan of the database.
An example of the algorithm
The sales data of a retail enterprise is shown in Table
1 as an example of the Apriori algorithm.
Table 1. Sales data of a retail enterprise [7]
Table 2. Logo items [8]
Suppose the minimum support count is 3 and the minimum confidence is 50%. As a result of the Apriori algorithm,
I1 => I3 (shampoo => toothpaste)
I2 => I3 (toothbrush => toothpaste)
I2 I3 => I4 (toothbrush, toothpaste => gargle cup)
are all strong rules [8].
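The two phases of Apriori can be sketched as follows. Because the contents of Table 1 are not reproduced here, the six transactions below are invented; they reuse the labels I1 to I4 only for readability, so the output is not a reproduction of the result in [8]. The function names `apriori` and `strong_rules` are hypothetical, and the thresholds follow the example (support count 3, confidence 50%).

```python
from itertools import combinations

def apriori(transactions, min_support_count):
    """Phase 1: return {itemset: support count} for all frequent item sets."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    current = [frozenset([i]) for i in items]  # candidate 1-item sets
    frequent = {}
    k = 1
    while current:
        # One full scan of the database per level
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support_count}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-item sets into k-item candidates
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        current = [c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent

def strong_rules(frequent, min_confidence):
    """Phase 2: return (antecedent, consequent, confidence) for strong rules."""
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                antecedent = frozenset(antecedent)
                confidence = count / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((set(antecedent), set(itemset - antecedent), confidence))
    return rules

# Hypothetical transactions (not the data of Table 1)
transactions = [
    {"I1", "I3"},
    {"I2", "I3", "I4"},
    {"I1", "I2", "I3"},
    {"I2", "I3", "I4"},
    {"I1", "I3", "I4"},
    {"I2", "I3", "I4"},
]
frequent = apriori(transactions, min_support_count=3)
rules = strong_rules(frequent, min_confidence=0.5)
for antecedent, consequent, confidence in rules:
    print(sorted(antecedent), "=>", sorted(consequent), round(confidence, 2))
```

The prune step is what keeps the level-wise search tractable: a candidate is counted against the database only if all of its smaller subsets were already found to be frequent.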
2.3 Customer Segmentation and Target Marketing
Segmentation divides the market into various parts on the basis of certain characteristics. Data mining can be used to form groups or clusters of customers on the basis of behavior. This type of information is useful for describing the similar customers in a cluster, holding on to good customers and identifying expected responders for target marketing.
Data mining in the retail industry can be used for marketing campaigns and to identify profitable customers using reward-based points. The retail industry will gain, sustain and be more successful in this competitive market if it implements data mining technology for marketing campaigns [7].
3. INTRUSION DETECTION
Intrusion Detection Systems (IDS) are security tools that are used to strengthen the security of communication and information systems. They work alongside other measures such as antivirus software, firewalls and access control schemes [12]. The objective of intrusion detection is to detect security violations in information systems. Intrusion detection is a passive approach to security, as it monitors information systems and raises alarms when security violations are detected [11].
Network attacks and cyber-crimes have increased in number and severity over the past few years, so intrusion detection is becoming a critical task in securing the network. To protect against various cyber-attacks and computer viruses, different types of computer security measures, such as cryptography, firewalls and intrusion detection, have been thoroughly studied in the past. Among all these measures, network intrusion detection (NID) has been considered one of the best and most important methods for providing protection against complex and dynamic intrusion behaviors. Furthermore, a lot of redundant or unreliable information is present in the growing number of security events. Only by mining the real attacks from these abundant and complex security events can a rational evaluation and a precise judgment of network security be made [9, 10].
Different researchers have proposed different techniques and models to provide protection against intrusions.
3.1. Network Security Management Model:
Li et al. proposed a novel network security management model based on data mining to overcome the defect that traditional computer network security management systems process a huge amount of data with low efficiency and accuracy. The model uses a multi-source data collection approach to collect and integrate the related data of different security products, and it uses data mining technology to analyze the huge volume of data intelligently and in depth and to respond automatically. Li et al. performed an experiment, and the experimental result shows that the model performs well [9].
The model comprises five modules: data acquisition, data integration and preprocessing, data mining, information database and user interface. It is useful for identifying real attacks among a huge number of security events with good performance, so that the precision, intelligence and flexibility of network security management are improved and the requirements of the new security situation are met [9].
3.2. Classifier algorithms:
Nguyen et al. discuss and evaluate the performance of a comprehensive set of classifier algorithms on the KDD99 dataset. The classifier algorithms are Bayes Net, Naive Bayes, the C4.5 decision tree and the decision table. KDD99 is the dataset of the KDD Cup 1999 competition and contains a standard set of data for intrusion detection. KDD (knowledge discovery in databases) refers to the overall process of discovering useful knowledge from data. It involves the evaluation and possibly interpretation of the patterns to decide what qualifies as knowledge. It also includes the choice of encoding schemes, preprocessing, sampling, and projections of the data prior to the data mining step. The KDD99 dataset contains one type of normal data and 24 different types of attacks, which are broadly categorized into four groups: Probe (information gathering), DoS (denial of service), U2R (user to root) and R2L (remote to local). The KDD99 dataset has been criticized for its potential problems, but it is the most widespread dataset, used by many researchers, and it is among the few comprehensive datasets that can be shared in intrusion detection nowadays [10].
3.3. MADAM ID Algorithm:
Manoj et al. describe the MADAM ID algorithm (Mining Audit Data for Automated Models for Intrusion Detection). MADAM ID is a network-based intrusion detection system that uses a data mining approach for anomaly detection as well as misuse detection. Using MADAM ID, raw audit data is first preprocessed into records with a set of basic features, e.g., duration, source and destination hosts and ports, number of bytes transmitted, etc. By applying data mining algorithms, patterns such as association rules can be computed from the audit records. Association rules describe associations/relations between system features, whereas frequent episodes capture the sequential co-occurrences of system events (e.g., what network connections are made within a short time span). Association rules and frequent episodes combine to form statistical summaries of system activities. Domain knowledge is necessary in MADAM ID [11].
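The idea of a frequent episode can be illustrated with a deliberately simplified sketch. It is not the MADAM ID implementation: it only counts ordered pairs of events that occur within a short time window, and the audit records, the window of 2 seconds and the name `frequent_episodes` are all assumptions made for illustration.

```python
from collections import Counter

def frequent_episodes(records, window, min_count):
    """Count ordered event pairs (a, b) where b follows a within `window` seconds."""
    records = sorted(records)  # sort by timestamp
    counts = Counter()
    for i, (t_a, event_a) in enumerate(records):
        for t_b, event_b in records[i + 1:]:
            if t_b - t_a > window:
                break  # later records are even further away in time
            counts[(event_a, event_b)] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}

# Hypothetical preprocessed audit records: (timestamp in seconds, service)
audit_records = [
    (0, "http"), (1, "smtp"),
    (10, "http"), (11, "smtp"),
    (20, "http"), (22, "smtp"),
    (40, "ftp"),
]
episodes = frequent_episodes(audit_records, window=2, min_count=3)
print(episodes)
```

A pattern such as "an smtp connection shortly after an http connection" that recurs often becomes part of the statistical summary of normal activity, against which new audit data can be compared.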
3.4. Clustering based & Classification based Algorithms:
Agrawal et al. review several data mining techniques for intrusion detection to offer a better understanding of the current techniques, which may help interested researchers to work further in this direction [12]. Anomalies or intrusions are patterns in the data that do not conform to well-defined normal behavior. Agrawal et al. discuss different methods of intrusion/anomaly detection, with emphasis on a broad classification of current data mining techniques. Data mining consists of four types of tasks: association rule learning, clustering, classification and regression. Clustering-based anomaly detection techniques (k-Means clustering, outlier detection algorithms), classification-based anomaly detection techniques (the ID3 and C4.5 classification trees, the Naive Bayes network, Support Vector Machines) and combinations of different algorithms are used to detect anomalies. These algorithms are reported to perform well. In future, combinations of these techniques may be used to get good results.
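One simple form of distance-based anomaly detection can be sketched as follows. It is a minimal illustration and not a method taken from [12]: normal traffic is summarized by a single centroid, and a connection is flagged when it lies further from that centroid than a threshold derived from the normal data. The feature values and the "mean plus three standard deviations" threshold are assumptions.

```python
import math
import statistics

def fit_profile(normal_points):
    """Summarize normal behavior as a centroid and a distance threshold."""
    centroid = tuple(statistics.mean(col) for col in zip(*normal_points))
    distances = [math.dist(p, centroid) for p in normal_points]
    # Assumed rule of thumb: mean distance plus three standard deviations
    threshold = statistics.mean(distances) + 3 * statistics.pstdev(distances)
    return centroid, threshold

def is_anomaly(point, centroid, threshold):
    """Flag a point that lies further from the centroid than the threshold."""
    return math.dist(point, centroid) > threshold

# Hypothetical normal connections: (duration in seconds, kilobytes transferred)
normal = [(1.0, 10.0), (1.2, 12.0), (0.9, 9.0), (1.1, 11.0), (1.0, 10.5)]
centroid, threshold = fit_profile(normal)

print(is_anomaly((1.05, 10.2), centroid, threshold))
print(is_anomaly((30.0, 500.0), centroid, threshold))
```

With several clusters instead of one centroid, the same test is applied against the nearest cluster center, which is the usual way k-Means is used for anomaly detection.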
4. HEALTH INFORMATICS
Health informatics is one of the foremost focal points of researchers nowadays. The availability of timely and accurate data is crucial for informed medical decision making. Health care organizations face a common problem with the large amount of data they hold in many systems. Such systems are unstructured and unorganized, and they require computational time for data integration. Researchers, medical practitioners, health care providers and patients will not be able to utilize the knowledge stored in diverse repositories unless the information from disparate sources is synthesized [13]. The health care system is "data rich" but "knowledge poor". A tool that can process data in a meaningful way is the need of the time, and data mining methods can serve as a remedy in this situation. For this reason, different data mining techniques can be utilized.
Motivated by the world-wide increase in mortality of heart disease patients each year and the availability of vast amounts of data, researchers are using data mining techniques in the diagnosis of heart disease. Data mining combines statistical analysis, machine learning and database technology to extract hidden patterns and relationships from large databases. The two most common modeling goals of data mining are classification and prediction [14]. Data mining brings with it a set of techniques for finding hidden patterns to support decision making in healthcare organizations. Some are discussed here.
4.1. Decision Tree:
The decision tree approach is one of the most influential classification techniques in data mining. It constructs models in the form of a tree structure. The data set is broken down into smaller and smaller subsets while, in parallel, an associated decision tree is built. For medical purposes, a decision tree determines an order among the diverse attributes, and a decision is taken at each node based on an attribute.
Advantages:
Easy to understand and interpret.
Rules are simply generated.
Implicitly carries out feature selection.
Permits the addition of new data.
Disadvantages:
Non-numeric data is tricky to handle.
Trees with many branches are hard to understand.
Time consuming [15].
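How a decision tree orders the attributes can be illustrated with information gain, the criterion used by ID3. The sketch below is an assumption about the splitting criterion (the text above does not name one), and the four patient records and attribute names are invented. The attribute with the highest gain would be tested at the root of the tree.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Reduction in entropy of `target` obtained by splitting on `attribute`."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

# Hypothetical patient records
patients = [
    {"chest_pain": "yes", "smoker": "yes", "disease": "yes"},
    {"chest_pain": "yes", "smoker": "no", "disease": "yes"},
    {"chest_pain": "no", "smoker": "yes", "disease": "no"},
    {"chest_pain": "no", "smoker": "no", "disease": "no"},
]
gains = {a: information_gain(patients, a, "disease") for a in ("chest_pain", "smoker")}
best_attribute = max(gains, key=gains.get)
print(gains, best_attribute)
```

In this toy data, splitting on chest_pain separates the two classes completely, while splitting on smoker tells us nothing, so chest_pain is chosen first.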

4.2. Artificial neural network:
An artificial neural network is an information processing system inspired by the biological nervous system. A neural network is organized into a number of layers consisting of a large number of highly interconnected elements, i.e. neurons, each of which has an activation function. Patterns are presented to the input layer, which communicates with one or more hidden layers, and finally the output layer produces the result. A neural network is trained to perform a function by modifying the values of the weights between elements. Neural networks are a useful tool to help doctors analyze and model complex clinical data.
Advantages:
Neural networks can straightforwardly handle missing or noisy data.
Once trained, a network does not require reprogramming.
It can easily work with a huge number of datasets.
Disadvantages:
A neural network needs training to work well.
High processing time is required for big networks.
Neural networks cannot easily be retrained, i.e. if there are any changes in the data, it is almost impossible to add them to an existing network.
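Training by modifying weights can be shown on the smallest possible case, a single neuron. The sketch below is a simplification under stated assumptions: it uses one neuron with a step activation and the classic perceptron learning rule rather than a multi-layer network, and the training samples (two hypothetical binary risk indicators, with a positive label only when both are present) are invented.

```python
def step(x):
    """Step activation function: fire (1) when the weighted sum is non-negative."""
    return 1 if x >= 0 else 0

def predict(weights, bias, inputs):
    """Output of a single neuron for the given inputs."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)

def train_perceptron(samples, epochs=20, learning_rate=1.0):
    """Adjust the weights after every sample whose prediction is wrong."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias

# Hypothetical samples: (risk indicator 1, risk indicator 2) -> positive only if both present
samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(samples)
predictions = [predict(weights, bias, inputs) for inputs, _ in samples]
print(weights, bias, predictions)
```

Multi-layer networks follow the same principle of error-driven weight updates, but they propagate the error backwards through the hidden layers.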
4.3. Naive Bayes classifier:
The Naive Bayes classifier is a simple probabilistic classifier based on Bayes' theorem. It is also known as the "independent feature model". In general terms, a naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. Naive Bayes classifiers are trained in a supervised learning setting. The classifier is particularly suitable when the dimensionality of the inputs is high.
4.4. Bayes' Theorem:
P(B | A) = P(A and B) / P(A) = P(A | B) P(B) / P(A)
In medical data mining, the Naive Bayes classifier plays a critical role. It demonstrates high performance when the attributes are independent of one another, so it can easily be used in medical diagnosis. Medical data often contains missing values, and this classifier can handle missing values simply.
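A minimal categorical Naive Bayes classifier shows how Bayes' theorem and the independence assumption combine. This is a sketch under assumptions: the six patient records and the attribute names are invented, Laplace (add-one) smoothing is an added choice that the text does not mention, and a missing attribute is handled simply by leaving it out of the product.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, target):
    """Count class frequencies and per-class feature value frequencies."""
    class_counts = Counter(r[target] for r in rows)
    feature_counts = defaultdict(Counter)  # (feature, class) -> value counts
    values = defaultdict(set)              # feature -> observed values
    for r in rows:
        for feature, value in r.items():
            if feature != target:
                feature_counts[(feature, r[target])][value] += 1
                values[feature].add(value)
    return class_counts, feature_counts, values

def predict(model, sample):
    """Return (most probable class, unnormalized scores per class)."""
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    scores = {}
    for cls, n in class_counts.items():
        score = n / total  # prior P(class)
        for feature, value in sample.items():
            # P(value | class) with Laplace smoothing; features are treated as independent
            score *= (feature_counts[(feature, cls)][value] + 1) / (n + len(values[feature]))
        scores[cls] = score
    return max(scores, key=scores.get), scores

# Hypothetical patient records
patients = [
    {"chest_pain": "yes", "high_bp": "yes", "disease": "yes"},
    {"chest_pain": "yes", "high_bp": "no", "disease": "yes"},
    {"chest_pain": "yes", "high_bp": "yes", "disease": "yes"},
    {"chest_pain": "no", "high_bp": "yes", "disease": "no"},
    {"chest_pain": "no", "high_bp": "no", "disease": "no"},
    {"chest_pain": "no", "high_bp": "no", "disease": "no"},
]
model = train_naive_bayes(patients, "disease")
print(predict(model, {"chest_pain": "yes", "high_bp": "yes"})[0])
print(predict(model, {"chest_pain": "no"})[0])  # high_bp is missing for this patient
```

The second call shows why missing values are easy to handle: each attribute contributes its own factor, so an unknown attribute is skipped without affecting the others.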
Advantages:
Easily handles huge amounts of data.
Requires only a small training set to estimate the parameters needed for classification, i.e. the mean and variance.
Fast to train and quick to classify.
Not sensitive to irrelevant features.
Handles real-valued and discrete data.
Handles streaming data well.
Disadvantages:
Loss of precision.
In practice there are dependencies between variables, but these dependencies are not handled by the classifier.
Assumes independence of features [16].
Applying data mining techniques to help health care professionals in the diagnosis of diseases is having some success. The use of data mining techniques to identify an appropriate treatment for a disease has received less attention. Although data mining has shown promising results in the diagnosis of diseases, it still needs further investigation.
5. EDUCATION
Education is the backbone of all developing countries. Upgrading the education system upgrades the country towards the world's top ranking level [18]. Web-based education has become the new growth point of educational development. There are many problems in current web-based education, such as a single teaching mode, the simple stacking of educational resources and a low level of intelligence. Data mining algorithms and techniques can be used in the academic community to potentially improve some aspects of education quality. Adaptability in education and learning systems requires an in-depth picture of the student's mental state [17].
There are many algorithms and techniques used to mine educational data. Some of them are described below.
5.1. Clustering:
Clustering is a data mining technique that can be used to upgrade the education system. Educational systems hold large amounts of student profile data, and clustering techniques can be applied to find interesting relationships between the attributes of students. Cluster analysis divides the given data into meaningful groups. Normally the performance of students can be classified into different patterns, such as normal, average and below average. Clustering is a widely used technique in data mining applications for discovering patterns in large datasets. The aim of cluster analysis is exploratory: to find whether the data naturally falls into meaningful groups with small within-group variations and large between-group variations [18].
5.2. UCAM (Unique clustering with Affinity Measures) clustering algorithm:
UCAM is a clustering algorithm mostly for numeric data. It mainly addresses a drawback of the K-Means clustering algorithm. The K-Means algorithm begins with initial seeds and the number of clusters to be obtained, but the number of clusters cannot be predicted from a single observation of the dataset. The result may not be unique if the number of clusters and the initial seeds are not properly chosen. UCAM performs clustering with the help of an affinity measure. The clustering process in UCAM starts without any centroids and without the number of clusters to be created; instead, it sets a threshold value for making unique clusters [18].
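The general idea of clustering with a threshold instead of a preset number of clusters can be sketched as follows. This is a generic threshold-based scheme written for illustration, not the UCAM algorithm as specified in [18]: Euclidean distance to the cluster centroid stands in for the affinity measure, and the student marks and the threshold of 10 are invented.

```python
import math

def threshold_clustering(points, threshold):
    """Assign each point to the nearest cluster within `threshold`, else start a new one."""
    clusters = []  # each cluster is a list of points
    for p in points:
        best, best_dist = None, None
        for cluster in clusters:
            centroid = tuple(sum(c) / len(cluster) for c in zip(*cluster))
            d = math.dist(p, centroid)
            if best_dist is None or d < best_dist:
                best, best_dist = cluster, d
        if best is not None and best_dist <= threshold:
            best.append(p)
        else:
            clusters.append([p])  # no cluster is close enough
    return clusters

# Hypothetical student marks in two subjects
marks = [(85, 90), (88, 92), (50, 55), (52, 58), (20, 25), (22, 21)]
clusters = threshold_clustering(marks, threshold=10)
print(len(clusters), clusters)
```

The number of clusters is an output of the procedure rather than an input, which is the property the text above attributes to UCAM. A smaller threshold produces more and tighter clusters.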
5.3. Automatic text categorization technology:
Automatic text categorization consists of automatic classification and automatic clustering.
Automatic classification:
In automatic classification, a sound classification system must be defined first. Pre-classified documents are then created as a training set, a classification model is obtained from the training set, and finally further documents are classified with the trained model.
Automatic clustering:
In automatic clustering, a computer system studies the internal or external characteristics of the items to be classified, compares these features according to definite requirements, and groups the objects with similar or identical characteristics. Because automatic clustering does not require a training process, it has a certain degree of flexibility and higher processing power.
Therefore, in the system described in [19], the automatic classification of text is realized by means of automatic clustering technology. The common automatic clustering methods include the once clustering method, the reverse center clustering method, the density test method and so on [19].
5.4. Moodle System (Modular Object-Oriented Dynamic Learning Environment):
E/m-learning and computer game learning are based on the interaction between the student and the application, and thus produce a variety of data suitable for data mining. Although e-learning systems hold large amounts of data, they are primarily designed to support learning rather than the analysis of the stored data. The Moodle system (Modular Object-Oriented Dynamic Learning Environment) is one of the most widely used open source systems for e-learning. It has tables which store configuration settings, user profiles, courses, access and activity data, etc. In Moodle, user authentication is carried out by password. Some other techniques used in e-learning systems are:
Regression: predicting the time students will spend logged onto the system, or predicting to what extent students will be satisfied with the educational institution.
Grouping: establishing models of students who work under similar conditions and recommending activities the student has not yet used which would help him or her in the acquisition of knowledge, depending on the predisposition.
Classification: predicting students' final grades according to their accomplishments in the system [20].
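The regression task in the list above can be illustrated with ordinary least squares on a single predictor. The sketch is an assumption about how such a prediction might be set up: the predictor (number of assignments submitted), the data values and the name `fit_line` are all invented, and a real study would use activity data exported from the e-learning system.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: assignments submitted vs. hours logged onto the system
assignments = [1, 2, 3, 4, 5]
hours_logged = [3.0, 5.0, 7.0, 9.0, 11.0]

slope, intercept = fit_line(assignments, hours_logged)
predicted_hours = slope * 6 + intercept  # prediction for a student with 6 assignments
print(slope, intercept, predicted_hours)
```

The toy data lies exactly on a line so that the fitted coefficients are easy to check by hand; real activity data would scatter around the fitted line.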
Conclusion:
In this paper an attempt has been made to present data mining as a tool used to extract important information from large databases so that various sectors can make better business decisions. From the analysis of the techniques discussed above, it is concluded that, among the numerous innovations in recent technology, data mining is making comprehensive changes in the fields of telecommunication, retail industry, intrusion detection, health care and higher education. It has remarkable applications in these fields and can offer a range of answers relating to their data. The experts in these areas need the help of data mining researchers to accelerate the construction of more models. Data mining can address the concrete needs of the application areas mentioned above, enhance the accuracy and scientific basis of departmental decision-making, and provide better service to the people.
References:
[1]. Srdjan M. Sladojevic, Dubravko R. Culibrk, Vladimir S. Crnojevic, "Predicting the Churn of Telecommunication Service Users using Open Source Data Mining Tools", TELSIKS, October 8-11, 2011, 978-1-4577-2019-2/11, IEEE.
[2]. Da-Qi Ren, Da Zheng, Guowei Huang, Shujie Zhang, Zane Wei (The US R&D Center, Huawei Technologies, 2330 Central Expressway, Santa Clara, CA 95050, USA), "Parallel Set Determination and K-means Clustering for Data Mining on Telecommunication Networks", 2013 IEEE International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing.
[3]. Xu Hong, Qian Gangyi, "Data Mining in Market Segmentation and Tariff Policy Design: a Telecommunication Case", 2009 Asia-Pacific Conference on Information Processing.
[4]. Lei Li, Fang-Cheng Shen, De-Zhang Yang (College of Science, Nanjing University of Posts and Telecommunications, Nanjing, 210003, China), "Application of Data Mining Based on Rough Sets in the Field of Telecommunications", 978-1-4244-5540-9/10, 2010 IEEE.
[5]. Maryam Nafari, Jamal Shahrabi, "A Data Mining Approach for Finding Optimal Discount of Retail Assortments", Proceedings of International Workshop on Data Mining and Artificial Intelligence (DMAI 08), 24 December, 2008, Khulna, Bangladesh.
[6]. Changsheng Zhang, Jing Ruan, "A Modified Apriori Algorithm with Its Application in Instituting Cross-Selling Strategies of the Retail Industry", 2009 International Conference on Electronic Commerce and Business Intelligence, IEEE, DOI 10.1109/ECBI.2009.121.
[7]. X. M. Ye, B. X. Liu, "The Retail Cross-Selling Strategy Based on Mining Association Rules", J. Statistics and Strategy, vol. 4, pp. 156-157, 2007.
[8]. Bharati M. Ramageri, Dr. B. L. Desai, "Role of data mining in retail sector", International Journal on Computer Science and Engineering (IJCSE), ISSN: 0975-3397, Vol. 5, No. 01, Jan 2013.
[9]. Lin Li, De-bao Xiao, "Research on The Network Security Management Based on Data Mining", 2010 3rd International Conference on Advanced Computer Theory and Engineering (ICACTE).
[10]. Huy Anh Nguyen, Deokjai Choi, "Application of Data Mining to Network Intrusion Detection: Classifier Selection Model", Chonnam National University, Computer Science Department, 300 Yongbong-dong, Buk-ku, Gwangju 500-757, Korea.
[11]. Manoj, Jatinder Singh, "Applications of Data Mining for Intrusion Detection", International Journal of Educational Planning & Administration, Volume 1, Number 1 (2011), pp. 37-42, Research India Publications.
[12]. Shikha Agrawal, Jitendra Agrawal, "Survey on Anomaly Detection using Data Mining Techniques", International Conference on Knowledge Based and Intelligent Information and Engineering Systems, Procedia Computer Science 60 (2015) 708-713.
[13]. Shahidul Islam Khan, Abu Sayed Md. Latiful Hoque, "Towards Development of Health Data Warehouse: Bangladesh Perspective", 2nd Int'l Conf. on Electrical Engineering and Information & Communication Technology (ICEEICT) 2015, Jahangirnagar University, Dhaka-1342, Bangladesh, 21-23 May 2015.
[14]. Xia Zhao, Youbing Zhao, Nikolaos Ersotelos, Dina Fan, Enjie Liu, Gordon J. Clapworthy, Feng Dong, "A Scalable Data Repository for Recording Self-Managed Longitudinal Health Data of Digital Patients", 978-1-4799-3163-7/13/$31.00, 2013 IEEE.
[15]. Anjana Gosain and Amit Kumar, "Analysis of Health Care Data Using Different Data Mining Techniques", 978-1-4244-4711-4/09/$25.00, 2009 IEEE, IAMA 2009.
[16]. Monika Gandhi, Dr. Shailendra Narayan Singh, "Predictions in Heart Disease Using Techniques of Data Mining", 2015 1st International Conference on Futuristic trend in Computational Analysis and Knowledge Management (ABLAZE-2015).
[17]. Edson Pinheiro Pimentel and Nizam Omar, "Towards a Model for Organizing and Measuring Knowledge Upgrade in Education with Data Mining", 0-7803-9093-8/05/$20.00, 2005 IEEE.
[18]. Banumathi A., Pethalakshmi A., "A Novel Approach for Upgrading Indian Education by Using Data Mining Techniques".
[19]. Mingzhang Zuo, Lixin Diao, Qiang Liu, Peishun Wang, "Data Mining strategies and techniques of Internet education public sentiment monitoring and analysis system", 978-1-4244-5824-0/$26.00, 2010 IEEE.
[20]. Petar Jurić, Maja Matetić and Marija Brkić, "Data Mining of Computer Game Assisted e/m-learning Systems in Higher Education", MIPRO 2014, 26-30 May 2014, Opatija, Croatia.
