A Paper on
“DATA MINING”
The Era of Knowledge Engineering
1 Email: chinna_chetan05@yahoo.com
Visit: www.geocities.com/chinna_chetan05/forfriends.html
Abstract:
Most organizations have accumulated a great deal of data, but what they really want is information. How profitable are the customers? Which products are usually sold together? Which customers are likely to jump ship? These are common business questions, but the answers are not easy to find. One of the newest technologies for addressing these concerns is data mining. Data mining is the automated extraction of predictive information from large databases. It predicts future trends and finds behavior that experts may miss because it lies beyond their expectations. Data mining is part of a larger process called knowledge discovery; specifically, it is the step in which advanced statistical analysis and modeling techniques are applied to the data to find useful patterns and relationships. In this paper we present an overview of the different processes and techniques involved in data mining, and we use a case study of an airline to illustrate the advantages of data mining.
These users usually perform three types
1.3 Data Mining Applications
1.4.1 Classification
The clustering techniques analyze a set of data and generate a set of grouping rules that can be used to classify future data. The mining tool automatically identifies the clusters by studying the patterns in the training data. Once the clusters are generated, classification can be used to identify which particular cluster an input belongs to. For example, one may classify diseases and provide the symptoms that describe each class or subclass.

1.4.2 Association
An association rule is a rule that implies certain association relationships among a set of objects in a database. In this process we discover a set of association rules at multiple levels of abstraction from the relevant set(s) of data in a database. For example, one may discover a set of symptoms that often occur together with certain kinds of diseases and then study the reasons behind them.

1.4.3 Sequential Analysis
In sequential analysis, we seek to discover patterns that occur in sequence. This deals with data that appear in separate transactions (as opposed to data that appear in the same transaction, as in the case of association). For example, a shopper buys item A in the first week of the month, then buys item B in the second week, and so on.

1.4.4 Neural Nets and Decision Trees
For any given problem, the nature of the data will affect the techniques we choose. Consequently, we need a variety of tools and technologies to find the best possible model. Classification models are among the most common, so the more popular ways of building them are explained here.
Classification typically involves at least one of two workhorse statistical techniques: logistic regression (a generalization of linear regression) and discriminant analysis. However, as data mining becomes more common, neural nets and decision trees are also getting more consideration. Although complex in their own way, these methods require less statistical sophistication on the part of the user.
Neural nets use many parameters (the weights on the connections into and out of the nodes in the hidden layer) to build a model that takes and combines a set of inputs to predict a continuous or categorical variable.
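The cluster-then-classify idea in 1.4.1 can be sketched in a few lines of Python. The paper does not name a clustering algorithm, so this minimal sketch assumes plain k-means on two-dimensional numeric points; the functions `kmeans` and `nearest` and the training points are hypothetical. Centroids are learned from the training data, and a new input is then classified by the cluster whose centroid is nearest.

```python
import math
import random

Point = tuple[float, float]


def nearest(point: Point, centroids: list[Point]) -> int:
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points: list[Point], k: int, iterations: int = 20, seed: int = 0) -> list[Point]:
    """Plain k-means: start from k sampled points, then alternate assign/update steps."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        # Assignment step: put every training point under its nearest centroid.
        groups: list[list[Point]] = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its group
        # (an empty group keeps its previous centroid).
        centroids = [
            (sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g)) if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids


# Hypothetical training data with two obvious groups.
training = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centroids = kmeans(training, k=2)

# Classification: a new input belongs to the cluster of its nearest centroid.
low_label = nearest((1.1, 0.9), centroids)
high_label = nearest((8.1, 8.0), centroids)
```

The cluster numbers themselves are arbitrary; what matters is that inputs close to the same group of training points receive the same label.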
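The association rules of 1.4.2 are commonly scored by support (how often the items occur together) and confidence (how often the consequent appears when the antecedent does). The minimal sketch below assumes each record is a set of items, here hypothetical symptom sets, and mines only single-item rules X -> Y from item pairs; it does not attempt the multi-level rules the section mentions, and the `min_support` and `min_confidence` thresholds are illustrative.

```python
from collections import Counter
from itertools import combinations


def association_rules(transactions, min_support=0.5, min_confidence=0.7):
    """Mine single-item rules X -> Y, returned as (X, Y, support, confidence)."""
    n = len(transactions)
    # How many transactions contain each item, and each unordered pair of items.
    item_counts = Counter(item for t in transactions for item in set(t))
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(set(t)), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of transactions containing both items
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]  # estimate of P(y | x)
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return sorted(rules)


# Hypothetical patient records: the symptoms observed in each case.
records = [
    {"fever", "cough", "fatigue"},
    {"fever", "cough"},
    {"fever", "fatigue"},
    {"cough", "sneezing"},
]
rules = association_rules(records)  # [('fatigue', 'fever', 0.5, 1.0)]
```

Cough and fever co-occur just as often as fatigue and fever, but only two of the three coughing patients have a fever, so that rule falls below the 0.7 confidence threshold.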
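The difference between sequential analysis (1.4.3) and association is that the items must appear in separate, ordered transactions. A minimal sketch, assuming purchases are recorded as hypothetical `(customer_id, week, item)` tuples and that each week counts as one transaction, measures the support of the pattern "A, then later B" as the fraction of customers whose history contains it.

```python
from collections import defaultdict


def sequence_support(purchases, first, then):
    """Fraction of customers who bought `first` and, in a later week, `then`.

    `purchases` holds (customer_id, week, item) tuples; each week is treated
    as a separate transaction (an assumption made for this sketch).
    """
    history = defaultdict(list)
    for customer, week, item in purchases:
        history[customer].append((week, item))

    matches = 0
    for events in history.values():
        first_weeks = [w for w, item in events if item == first]
        then_weeks = [w for w, item in events if item == then]
        # The pattern holds if `then` appears strictly after the earliest `first`.
        if first_weeks and any(w > min(first_weeks) for w in then_weeks):
            matches += 1
    return matches / len(history)


purchases = [
    ("c1", 1, "A"), ("c1", 2, "B"),                  # A, then B a week later
    ("c2", 1, "A"), ("c2", 1, "B"),                  # same week: association, not sequence
    ("c3", 2, "B"), ("c3", 3, "A"),                  # wrong order
    ("c4", 1, "A"), ("c4", 3, "C"), ("c4", 4, "B"),  # A ... B with a gap
]
support = sequence_support(purchases, "A", "B")  # 0.5 (customers c1 and c4)
```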
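Logistic regression, the first of the two workhorse techniques named in 1.4.4, models the probability of a class as the sigmoid of a weighted sum of the inputs. The sketch below fits it with plain batch gradient descent on the mean log-loss; the learning rate, the epoch count and the one-feature training data are illustrative assumptions, and in practice a tested statistics library would normally be used instead.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, x):
    """P(class = 1 | x) = sigmoid(w . x + b)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def fit_logistic(xs, ys, lr=0.5, epochs=2000):
    """Fit weights w and bias b by batch gradient descent on the mean log-loss."""
    n, n_features = len(xs), len(xs[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(xs, ys):
            error = predict_proba(w, b, x) - y  # derivative of log-loss w.r.t. the logit
            for j in range(n_features):
                grad_w[j] += error * x[j]
            grad_b += error
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Hypothetical one-feature training set: low values are class 0, high values class 1.
xs = [[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]]
ys = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(xs, ys)
p_low = predict_proba(w, b, [0.5])
p_high = predict_proba(w, b, [4.0])
```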
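To make the last paragraph of 1.4.4 concrete, the sketch below shows the forward pass of a network with one hidden layer: each hidden node combines the inputs through its own weights and bias, and the output node combines the hidden values. The weights are hand-picked for illustration (not trained) so that the two hidden nodes act roughly as OR and NAND, which lets the network reproduce XOR, a pattern that no single weighted sum followed by a threshold can capture. The names `forward`, `HIDDEN` and `OUTPUT` are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def forward(inputs, hidden_layer, output_node):
    """One-hidden-layer network; every node is given as (weights, bias)."""
    hidden = [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b) for ws, b in hidden_layer]
    ws, b = output_node
    return sigmoid(sum(w * h for w, h in zip(ws, hidden)) + b)


# Hand-picked parameters (2 hidden nodes x 3 + output node x 3 = 9 parameters).
HIDDEN = [([20.0, 20.0], -10.0),   # approximately OR(x1, x2)
          ([-20.0, -20.0], 30.0)]  # approximately NAND(x1, x2)
OUTPUT = ([20.0, 20.0], -30.0)     # approximately AND(h1, h2)

predictions = {x: round(forward(x, HIDDEN, OUTPUT)) for x in [(0, 0), (0, 1), (1, 0), (1, 1)]}
```

In a real model these parameters would be learned from training data (for example by backpropagation) rather than set by hand.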
2.1 Introduction
An application was built for an airline company that wanted to explore hidden trends in its data. The airline wanted to improve its service levels by identifying customer behavior in different sectors. The mined data contains information about the members of the frequent flier program, together with their travel and redemption details. The model below was broadly followed in building the mining prototype.
2.6.1 Category Type
sector. The same people were also flying