
7. Data Mining Process Framework

AIK21432 (3 credits)
Data Mining
Topics

7.1 The CRISP-DM Framework

7.2 Example CRISP-DM Implementation


7.1 The CRISP-DM Framework

CRISP-DM: Cross-Industry Standard Process for Data Mining
A consortium involving:
NCR Systems Engineering Copenhagen
DaimlerChrysler AG
SPSS Inc.
OHRA Verzekeringen en Bank Groep B.V
History:
Version 1.0 released in 1999
Version 2.0 being developed
See www.crisp-dm.org for details
7.1 Visual Overview
7.1 CRISP-DM Phases

1. Business Understanding
Initial phase
Focuses on:
Understanding the project objectives and requirements from a
business perspective
Converting this knowledge into a data mining problem definition,
and a preliminary plan designed to achieve the objectives

2. Data Understanding
Starts with an initial data collection
Proceeds with activities aimed at:
Getting familiar with the data
Identifying data quality problems
Discovering first insights into the data
Detecting interesting subsets to form hypotheses for hidden
information

3. Data Preparation
Covers all activities to construct the final dataset (data that will
be fed into the modeling tool(s)) from the initial raw data
Data preparation tasks are likely to be performed multiple times,
and not in any prescribed order
Tasks include table, record, and attribute selection, as well as
transformation and cleaning of data for modeling tools
4. Modeling
Various modeling techniques are selected and applied, and their
parameters are calibrated to optimal values
Typically, there are several techniques for the same data mining
problem type
Some techniques have specific requirements on the form of data,
therefore, stepping back to the data preparation phase is often
needed

5. Evaluation
At this stage, a model (or models) that appears to have
high quality, from a data analysis perspective, has been
built
Before proceeding to final deployment of the model, it is
important to more thoroughly evaluate the model, and
review the steps executed to construct the model, to be
certain it properly achieves the business objectives
A key objective is to determine if there is some
important business issue that has not been
sufficiently considered
At the end of this phase, a decision on the use of the
data mining results should be reached

6. Deployment
Creation of the model is generally not the end of the
project
Even if the purpose of the model is to increase knowledge
of the data, the knowledge gained will need to be
organized and presented in a way that the customer can
use it
Depending on the requirements, the deployment phase
can be as simple as generating a report or as complex as
implementing a repeatable data mining process
In many cases it will be the customer, not the data
analyst, who will carry out the deployment steps
However, even if the analyst will not carry out the
deployment effort, it is important for the customer to
understand up front what actions will need to be
carried out in order to actually make use of the created
models
7.1 CRISP-DM Summary

Tasks and their outputs for each phase:

Business Understanding
  Determine Business Objectives: Background; Business Objectives; Business Success Criteria
  Situation Assessment: Inventory of Resources; Requirements, Assumptions, and Constraints; Risks and Contingencies; Terminology; Costs and Benefits
  Determine Data Mining Goals: Data Mining Goals; Data Mining Success Criteria
  Produce Project Plan: Project Plan; Initial Assessment of Tools and Techniques

Data Understanding
  Collect Initial Data: Initial Data Collection Report
  Describe Data: Data Description Report
  Explore Data: Data Exploration Report
  Verify Data Quality: Data Quality Report

Data Preparation
  Phase outputs: Data Set; Data Set Description
  Select Data: Rationale for Inclusion / Exclusion
  Clean Data: Data Cleaning Report
  Construct Data: Derived Attributes; Generated Records
  Integrate Data: Merged Data
  Format Data: Reformatted Data

Modeling
  Select Modeling Technique: Modeling Technique; Modeling Assumptions
  Generate Test Design: Test Design
  Build Model: Parameter Settings; Models; Model Description
  Assess Model: Model Assessment; Revised Parameter Settings

Evaluation
  Evaluate Results: Assessment of Data Mining Results w.r.t. Business Success Criteria; Approved Models
  Review Process: Review of Process
  Determine Next Steps: List of Possible Actions; Decision

Deployment
  Plan Deployment: Deployment Plan
  Plan Monitoring and Maintenance: Monitoring and Maintenance Plan
  Produce Final Report: Final Report; Final Presentation
  Review Project: Experience Documentation
7.2 Example CRISP-DM Implementation

Ch 4 DMM: Correlation
Sarah is a regional sales manager for a
nationwide supplier of fossil fuels for home heating.
Recent volatility in market prices for heating oil
specifically, coupled with wide variability in the size
of each order for home heating oil, has Sarah
concerned. She feels a need to understand the
types of behaviors and other factors that may
influence the demand for heating oil in the
domestic market.

1. Business (Organizational) Understanding
Sarah's goal is to better understand how her company can
succeed in the home heating oil market. She recognizes that
there are many factors that influence heating oil
consumption, and believes that by investigating the
relationship between a number of those factors, she will be
able to better monitor and respond to heating oil demand.
She has selected correlation as a way to model the
relationship between the factors she wishes to investigate.

2. Data Understanding
Working together, using Sarah's employer's data resources,
which are primarily drawn from the company's billing
database, we create a data set composed of the following
attributes:
Insulation*
Temperature*
Heating_Oil*
Num_Occupants*
Avg_Age*
Home_Size*

*) with description
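
A first pass over a data set like this usually checks each attribute's range and its missing values, in line with the Data Understanding activities listed earlier. The sketch below does this in plain Python. The three sample rows and the `summarize` helper are invented for illustration and are not taken from Sarah's data.

```python
from statistics import mean

# Hypothetical sample rows; every value here is made up for illustration.
rows = [
    {"Insulation": 6, "Temperature": 74, "Heating_Oil": 132,
     "Num_Occupants": 4, "Avg_Age": 23.8, "Home_Size": 4},
    {"Insulation": 10, "Temperature": 43, "Heating_Oil": 263,
     "Num_Occupants": 4, "Avg_Age": 56.7, "Home_Size": 4},
    {"Insulation": 3, "Temperature": 81, "Heating_Oil": None,
     "Num_Occupants": 2, "Avg_Age": 28.0, "Home_Size": 6},
]


def summarize(rows, attribute):
    """Return the missing-value count, min, max, and mean for one attribute."""
    values = [r[attribute] for r in rows if r[attribute] is not None]
    return {
        "missing": len(rows) - len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
    }


# Print one summary line per attribute.
for name in rows[0]:
    print(name, summarize(rows, name))
```

A non-zero `missing` count or an implausible minimum or maximum is the kind of data quality problem that would be handled in the Data Preparation phase.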

3. Data Preparation

4. Modeling

5. Evaluation
The correlation coefficient between the Heating_Oil consumption
attribute and the Insulation rating attribute is 0.736. This is a
positive number, and therefore a positive correlation. But what
does that mean? As one attribute rises, the other tends to rise
as well.
The correlation coefficient between the Temperature attribute and
the Insulation rating attribute is -0.794, a negative correlation:
as one attribute rises, the other tends to fall.
The closer a correlation coefficient is to 1 or to -1, the
stronger it is.
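
To make the sign and strength of a coefficient concrete, here is a minimal sketch that computes a correlation coefficient by hand. It assumes the Pearson coefficient, which the slides do not name explicitly. The `pearson` helper and the sample values are hypothetical and will not reproduce the 0.736 or -0.794 figures above.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (covariance numerator).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each attribute.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical values, not Sarah's data.
insulation = [2, 4, 6, 8, 10]
heating_oil = [110, 150, 170, 230, 260]
temperature = [80, 72, 60, 51, 40]

r_pos = pearson(insulation, heating_oil)   # both rise together
r_neg = pearson(insulation, temperature)   # one rises as the other falls
print(round(r_pos, 3), round(r_neg, 3))
```

In this made-up sample both coefficients are close to the extremes, so both relationships would count as strong, one positive and one negative.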

6. Deployment
There are several possible outcomes from
this investigation.
1. Dropping the Num_Occupants attribute.
2. Investigating the role of home insulation.
3. Adding greater granularity in the data set.
4. Adding additional attributes to the data set.
Individual Assignment

Install RapidMiner
Complete the tutorials:
Basic
Data Handling
Modeling, Scoring, Validation
Then fill in the quiz:
http://bit.ly/quizrapidm
End of File
