Data Mining
Student Name:
Data mining is a process that uses statistical, mathematical, artificial intelligence, and machine learning techniques to extract and identify useful information, and subsequently knowledge, from large databases.
CRISP-DM (Cross-Industry Standard Process for Data Mining)
SEMMA (Sample, Explore, Modify, Model, and Assess)
KDD (Knowledge Discovery in Databases)
CRISP-DM (a cycle of six steps built around the Data Sources)
1 Business Understanding
2 Data Understanding
3 Data Preparation
4 Model Building
5 Testing and Evaluation
6 Deployment
CRISP-DM provides a systematic and orderly way to conduct data mining projects. The process has six steps. First, an understanding of the business issues to be addressed and an understanding of the data are developed concurrently. Next, the data are prepared for modeling. Then the models are built, and their results are tested and evaluated. Finally, the models are deployed for regular use.
SEMMA
Sample
Explore (Visualization and basic description of the data)
Modify (Select variables, transform variable representations)
Model (Use a variety of statistical and machine learning models)
Assess (Evaluate the accuracy and usefulness of the models)
The main difference between CRISP-DM and SEMMA is that CRISP-DM takes a more comprehensive approach to data mining projects, one that includes understanding the business and the relevant data, whereas SEMMA implicitly assumes that the project's goals and objectives, along with the appropriate data sources, have already been identified and understood.
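The five SEMMA stages can be illustrated with a very small sketch. Everything in it is an assumption made for illustration: the data set is invented, the single-variable threshold classifier stands in for whatever statistical or machine learning model a real project would use, and the names (`records`, `threshold`, `accuracy`) are hypothetical.

```python
import random

# Hypothetical data set, invented for illustration:
# (monthly_spend, is_high_value) pairs.
records = [(spend, spend > 50) for spend in range(1, 101)]

# Sample: draw a manageable subset (fixed seed so the run is repeatable).
rng = random.Random(42)
sample = rng.sample(records, 60)

# Explore: basic description of the sampled variable.
spends = [spend for spend, _ in sample]
print("min / mean / max:", min(spends), sum(spends) / len(spends), max(spends))

# Modify: transform the variable by rescaling it to the 0-1 range.
lo, hi = min(spends), max(spends)
scaled = [((spend - lo) / (hi - lo), label) for spend, label in sample]

# Model: fit a one-variable threshold classifier on a training split.
# The threshold is the midpoint between the two class means.
train, holdout = scaled[:40], scaled[40:]
high = [x for x, label in train if label]
low = [x for x, label in train if not label]
threshold = (sum(high) / len(high) + sum(low) / len(low)) / 2

# Assess: measure accuracy on records the model has not seen.
correct = sum((x > threshold) == label for x, label in holdout)
accuracy = correct / len(holdout)
print("holdout accuracy:", accuracy)
```

Note that the sketch starts from data that is already in hand, which mirrors the point above: SEMMA begins at sampling and leaves business understanding and data source selection outside the process.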
Classification learns patterns from past data (a set of information, such as traits, variables, and features, on the characteristics of previously labeled items, objects, or events) in order to place new instances (with unknown labels) into their respective groups or classes. The objective of classification is to analyze the historical data stored in a database and automatically generate a model that can predict future behavior.
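As a minimal sketch of this idea, the example below uses k-nearest neighbors, which is only one of many classification techniques and is chosen here for brevity. The loan data, the feature choice, and the function name `knn_predict` are all hypothetical.

```python
from collections import Counter
from math import dist

def knn_predict(train, new_point, k=3):
    """Label new_point by majority vote among its k nearest labeled neighbors."""
    neighbors = sorted(train, key=lambda row: dist(row[0], new_point))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical historical data:
# (income in $1000s, years as customer) -> known loan outcome.
history = [
    ((25, 1), "default"),
    ((30, 2), "default"),
    ((28, 1), "default"),
    ((70, 8), "repaid"),
    ((85, 10), "repaid"),
    ((65, 6), "repaid"),
]

# New instances with unknown labels are placed into one of the known classes.
print(knn_predict(history, (72, 7)))  # repaid
print(knn_predict(history, (27, 2)))  # default
```

Here the "model" is simply the stored labeled history plus a distance rule. Other classifiers, such as decision trees, summarize the history into explicit rules instead.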
Association rule mining is a popular data mining method that is commonly used as an example to explain to a less technical audience what data mining is and what it can do. Association rule mining aims to find interesting relationships (affinities) between variables (items) in large databases. For example, a recession is associated with a decline in house prices.
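One common way to judge whether such a relationship is interesting is to compute the support and confidence of a candidate rule. These two measures are standard in the field but are not named in the text above, and the shopping baskets and function names below are hypothetical, so treat this as an illustrative sketch only.

```python
def count(transactions, items):
    """Number of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= t)

def support(transactions, items):
    """Fraction of all transactions that contain every item in `items`."""
    return count(transactions, items) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that
    also contain the consequent. Assumes the antecedent occurs at least once."""
    both = set(antecedent) | set(consequent)
    return count(transactions, both) / count(transactions, antecedent)

# Hypothetical market-basket data, one set of items per transaction.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]

# Candidate rule: diapers -> beer
print(support(baskets, {"diapers", "beer"}))       # 0.6
print(confidence(baskets, {"diapers"}, {"beer"}))  # 0.75
```

In this toy data, 60% of baskets contain both items, and 75% of the baskets with diapers also contain beer. A real association rule miner would search over many candidate rules and keep those above chosen support and confidence thresholds.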
Commercial
SPSS - PASW (formerly Clementine)
SAS - Enterprise Miner
IBM - Intelligent Miner
StatSoft - Statistica Data Miner
Free and/or Open Source
Weka
RapidMiner
Data mining is considered to be a powerful analytical tool that helps decision-makers understand the past and predict the future. However, there are common myths and mistakes associated with this field.
Common myths: data mining
provides instant solutions/predictions
is not yet viable for business applications
requires a separate, dedicated database
can only be done by those with advanced degrees
is only for large firms that have lots of customer data
CONCLUSION
Data mining refers to developing business intelligence from the data that an organization collects, organizes, and processes. Organizations use data mining techniques to gain a better understanding of their customers and their own operations.