
Data Mining

Presented by

Dr. J. Nelson Raja


Assistant Professor
Department of Computer Application
Arul Anandar College, Karumathur
Madurai
Evolution of Data Mining Systems

First Generation Systems:

• The first-generation systems were introduced in the 1980s and consisted of research tools focused on individual data mining tasks, such as building a classifier using a decision tree or a neural network.

• These tools addressed specific data-analysis problems and required technically sophisticated users.

• The main difficulty was using more than one tool on the same data, which often required significant transformation of the data and metadata.

• Examples: Purple Insight’s MineSet, IBM’s Intelligent Miner, and SAS Institute’s Enterprise Miner

Evolution of Data Mining Systems

Second Generation Systems:

• The second-generation systems, called suites, were developed in the mid-1990s. They provided multiple types of integrated data analysis methods and supported data cleaning, preprocessing, data transformation, and visualization.

• Examples: SPSS’s Clementine, Silicon Graphics’ MineSet, and IBM’s Intelligent Miner

Evolution of Data Mining Systems

Third Generation Systems:

• The third-generation systems were developed in the late 1990s and introduced vertical data mining-based applications and solutions.

• These systems were used to support business decisions and addressed specific business problems, such as detecting financial fraud and exploring business opportunities. Some of these suites also used knowledge discovery and data mining (KDDM) process models to guide the execution of projects.

• Examples: Purple Insight’s MineSet and SAS Institute’s Enterprise Miner

Knowledge Discovery Process

1. Define the problem
2. Collect, clean, and prepare the data
3. Data mining
4. Validate the models
5. Monitor the model

Knowledge Discovery Process

• Define the problem: This step involves understanding the problem and establishing the goals and expectations of the project.

• Collect, clean, and prepare the data: This step involves selecting and creating a data set. It demands the most effort, about 70% of the total data mining effort.

• Data mining: This step involves selecting appropriate data mining tools, transforming the data, and generating samples for training and testing in order to build or select a model.

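The sample generation mentioned in the data mining step can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name train_test_split, the 30% test fraction, and the toy records are assumptions made for the example, not part of the original slides.

```python
import random

def train_test_split(records, test_fraction=0.3, seed=0):
    """Shuffle a copy of the records and split it into training and test samples."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(records)               # leave the caller's data untouched
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical data set: ten record identifiers.
records = list(range(10))
train, test = train_test_split(records, test_fraction=0.3, seed=42)
```

The model is built on train and evaluated on test, so that the evaluation uses records the model has not seen.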
Knowledge Discovery Process

• Define the problem: This step involves understanding the problem and establishing the goals and expectations of the project.

• Data cleaning: In this step, noise and inconsistent data are removed.

• Data integration: In this step, multiple data sources are combined.

• Data selection: In this step, data relevant to the analysis task are retrieved from the database.

• Data transformation: In this step, data are transformed or consolidated into forms suitable for mining.

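The four preparation steps above can be sketched on a toy data set. This is only an illustration under stated assumptions: the two sources (sales, regions), the cleaning rule (drop missing or negative amounts), and the choice of min-max scaling as the transformation are all hypothetical and not taken from the slides.

```python
# Two hypothetical sources describing the same customers.
sales = [
    {"id": 1, "amount": "120.5"},
    {"id": 2, "amount": None},   # missing value
    {"id": 3, "amount": "-40"},  # inconsistent: negative amount
    {"id": 4, "amount": "300"},
]
regions = {1: "south", 2: "north", 4: "south"}

def clean(rows):
    """Data cleaning: drop rows with missing or negative amounts."""
    out = []
    for r in rows:
        if r["amount"] is None:
            continue
        amount = float(r["amount"])
        if amount < 0:
            continue
        out.append({"id": r["id"], "amount": amount})
    return out

def integrate(rows, region_by_id):
    """Data integration: attach the region from the second source."""
    return [dict(r, region=region_by_id.get(r["id"])) for r in rows]

def select(rows, region):
    """Data selection: keep only the rows relevant to the analysis task."""
    return [r for r in rows if r["region"] == region]

def transform(rows):
    """Data transformation: min-max scale amounts to the range [0, 1]."""
    if not rows:
        return []
    amounts = [r["amount"] for r in rows]
    lo, hi = min(amounts), max(amounts)
    span = (hi - lo) or 1.0  # avoid division by zero when all amounts are equal
    return [dict(r, amount_scaled=(r["amount"] - lo) / span) for r in rows]

prepared = transform(select(integrate(clean(sales), regions), "south"))
```

Each function corresponds to one bullet, and the last line chains them in the order the slide lists them.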
Knowledge Discovery Process

• Data mining: In this step, appropriate data mining methods are used to extract data patterns or to build models.

• Validate the models: Test the model to ensure that it produces accurate and adequate results.

• Monitor the model: Monitoring is necessary because, as time passes, the model must be revalidated to ensure that it still meets requirements. A model that works today may not work tomorrow, so its behavior must be monitored to ensure it meets performance standards.

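Validation and monitoring can be sketched as an accuracy check that is repeated on newer batches of data. The rule-based model, the 0.8 accuracy threshold, and the two labelled batches below are hypothetical values chosen for illustration; a real project would use its own model and performance standard.

```python
def accuracy(model, examples):
    """Fraction of (features, label) examples that the model labels correctly."""
    correct = sum(1 for x, y in examples if model(x) == y)
    return correct / len(examples)

def needs_revalidation(model, batches, min_accuracy=0.8):
    """Return the index of the first batch below the threshold, or None if all pass."""
    for i, batch in enumerate(batches):
        if accuracy(model, batch) < min_accuracy:
            return i
    return None

# Hypothetical model: flag a transaction as suspicious when it exceeds 1000.
def model(amount):
    return amount > 1000

# Labelled batches of (amount, is_suspicious); the later batch has drifted.
batch_today = [(500, False), (1500, True), (200, False), (3000, True)]
batch_later = [(500, False), (800, True), (900, True), (3000, True)]

first_bad = needs_revalidation(model, [batch_today, batch_later])
```

Here the model passes on the first batch but falls below the threshold on the later one, which is the signal that it should be revalidated or rebuilt.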
Data Mining Supporting Technologies Overview

• Data mining is an integration of multiple technologies, as shown in the figure.

• Statistics, decision support systems, database management and warehousing, machine learning, visualization, and parallel processing are technologies that interact with and support a data mining tool.

• Statistics and machine learning continue to be developed, yielding more sophisticated statistical techniques.

• Decision support systems help managers make decisions and guide them in their management tasks.

• Visualization is one of the most useful techniques for discovering data patterns.

• Database management and data warehousing techniques are used to integrate the various data sources and organize the data for effective data mining.
