
Chapter 27

Data Mining Concepts

Copyright © 2004 Pearson Education, Inc.


Overview of Data Mining Technology
 Data Mining vs Data Warehousing
 KDD (Knowledge Discovery in Databases)
– Data selection
– Data cleansing
– Enrichment
– Data encoding (or transformation)
– Data mining
– Reporting or display of results

Elmasri and Navathe, Fundamentals of Database Systems, Fourth Edition


Slide 27-3
Goals of Data Mining
 New Knowledge For:
– Prediction (vs forecasting?)
– Identification
Detection
Authentication
– Classification
Categories/classes
– Optimization

Slide 27-4
Forms of Knowledge Discovered by Data Mining
 Association Rules (Cause-Effect?)
 Classification Hierarchies
 Sequential Patterns
 Patterns Within Time Series
 Spatial Patterns
 Clustering
 Outliers
Slide 27-5
Association Rules
 Market-Basket Model, Support, and Confidence
– Association Rule: X ==> Y
– LHS (left-hand side) ==> RHS (right-hand side)
– X = {x1, x2, x3, …, xn}, Y = {y1, y2, y3, …, ym}
– LHS ∪ RHS is called an itemset
– Support of a rule (or itemset): percentage of transactions that contain the itemset LHS ∪ RHS (aka prevalence)
– Confidence of a rule: of the transactions that contain LHS, the fraction that also contain RHS
 support (LHS ∪ RHS) / support (LHS)
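
The two measures above can be sketched in a few lines of Python. This is a minimal illustration, not the textbook's code: the transaction data and the function names `support` and `confidence` are assumptions made up for this example.

```python
from typing import FrozenSet, Iterable, List

# Hypothetical market-basket transactions, invented for illustration.
TRANSACTIONS: List[FrozenSet[str]] = [
    frozenset({"milk", "bread", "cookies", "juice"}),
    frozenset({"milk", "juice"}),
    frozenset({"milk", "eggs"}),
    frozenset({"bread", "cookies", "coffee"}),
]


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(lhs: Iterable[str], rhs: Iterable[str],
               transactions: List[FrozenSet[str]]) -> float:
    """support(LHS ∪ RHS) / support(LHS) for the rule LHS ==> RHS."""
    lhs_support = support(lhs, transactions)
    if lhs_support == 0:
        raise ValueError("LHS never occurs, so confidence is undefined")
    return support(frozenset(lhs) | frozenset(rhs), transactions) / lhs_support


# Rule {milk} ==> {juice}: the itemset {milk, juice} occurs in 2 of 4 transactions.
print(support({"milk", "juice"}, TRANSACTIONS))       # 0.5
print(confidence({"milk"}, {"juice"}, TRANSACTIONS))  # 2/3, since milk occurs in 3 of 4
```

In this made-up data, the rule {bread} ==> {cookies} has the same support as {milk} ==> {juice} but a confidence of 1.0, because every transaction containing bread also contains cookies.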

Slide 27-6
Example transactions in market-basket model.

What are some rules?


What is the itemset for each?
What is the support?
What is the confidence?
Slide 27-7
Association Rules Algorithms
 Apriori Algorithm
 Sampling Algorithm
 Frequent-Pattern Tree Algorithm
 Partition Algorithm
 Other Types of Association Rules
– Associations in hierarchies
– Multidimensional associations
– Negative associations
 Additional Considerations for Association Rules
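
As a rough illustration of the first algorithm in the list, the following Python sketch finds frequent itemsets level by level in the Apriori style: candidates of size k are built only from frequent itemsets of size k-1, and any candidate with an infrequent subset is pruned. The function name `apriori`, the `min_support` parameter, and the `baskets` data are assumptions for this example, and the sketch stops at frequent itemsets without generating rules.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def keep_frequent(candidates):
        # Count each candidate against all transactions and keep the frequent ones.
        result = {}
        for c in candidates:
            s = sum(1 for t in transactions if c <= t) / n
            if s >= min_support:
                result[c] = s
        return result

    items = {item for t in transactions for item in t}
    level = keep_frequent({frozenset([i]) for i in items})
    all_frequent = dict(level)
    k = 2
    while level:
        prev = set(level)
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        level = keep_frequent(candidates)
        all_frequent.update(level)
        k += 1
    return all_frequent


# Hypothetical baskets, invented for illustration.
baskets = [
    {"milk", "bread", "cookies", "juice"},
    {"milk", "juice"},
    {"milk", "eggs"},
    {"bread", "cookies", "coffee"},
]
frequent = apriori(baskets, min_support=0.5)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

The pruning relies on the fact that a superset can never occur more often than any of its subsets, so an itemset with an infrequent subset cannot be frequent.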

Slide 27-8
Taxonomy of items in a supermarket.

Slide 27-9
Simple hierarchy of soft drinks and chips.

Slide 27-10
Classification
 Supervised learning
 Labeled data - training set
 Decision tree approach
– Machine learning
– Inductive learning
– Information gain
– Entropy
 Neural networks
 Nonlinear regression
 ID3 and C4.5/C5.0
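
Entropy and information gain, which decision-tree learners such as ID3 use to choose the attribute to split on, can be sketched as follows. The record layout, the attribute names (`employed`, `homeowner`, `approve`), and the data are hypothetical, chosen only to make the arithmetic easy to check.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(records, attribute, label):
    """Entropy of the labels minus the weighted entropy after splitting on `attribute`."""
    base = entropy([r[label] for r in records])
    partitions = {}
    for r in records:
        partitions.setdefault(r[attribute], []).append(r[label])
    remainder = sum(len(p) / len(records) * entropy(p) for p in partitions.values())
    return base - remainder


# Hypothetical labeled training set, invented for illustration.
records = [
    {"employed": "yes", "homeowner": "yes", "approve": "yes"},
    {"employed": "yes", "homeowner": "no", "approve": "yes"},
    {"employed": "no", "homeowner": "yes", "approve": "no"},
    {"employed": "no", "homeowner": "no", "approve": "no"},
]

# "employed" separates the classes perfectly; "homeowner" tells us nothing.
print(information_gain(records, "employed", "approve"))   # 1.0
print(information_gain(records, "homeowner", "approve"))  # 0.0
```

A tree builder in this style would split on the attribute with the highest gain (here `employed`) and then repeat the calculation on each resulting partition.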
Slide 27-11
Example decision tree for credit card applications.

Slide 27-12
Decision tree based on sample training data where the leaf nodes are represented by a set of RIDs of the partitioned records.

Slide 27-13
Clustering
 Unsupervised learning
 K-Means

Sample 2-dimensional records for clustering example (the RID column is not considered).
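
A minimal K-Means sketch for 2-dimensional records is shown below. The points are hypothetical and are not the slide's sample records; the function name `k_means`, its parameters, and the random choice of initial centroids are assumptions for this example.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def k_means(points, k, iterations=100, seed=0):
    """Alternate assignment and centroid-update steps until the centroids stop moving."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumption: start from k randomly chosen records
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical 2-dimensional records (no RID column), invented for illustration.
POINTS = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = k_means(POINTS, k=2)
for centroid, members in zip(centroids, clusters):
    print(centroid, sorted(members))
```

Because no class labels are supplied, the grouping comes only from distances between the records, which is what makes this an unsupervised method. In general the result depends on the initial centroids, so practical use often involves several runs with different seeds.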

Slide 27-14
Approaches to Other Data Mining Problems
 Discovery of Sequential Patterns
 Discovery of Patterns in Time Series
 Regression
 Neural Networks
 Genetic Algorithms

Slide 27-15
Applications of Data Mining
 Marketing
 Computer security - intrusion detection
 Finance
 Manufacturing
 Health care
 Homeland security
 Military intelligence
 Scientific research
– Bioinformatics
– Data farming
Slide 27-16
Commercial Data Mining Tools
 See Table 27.1 in Book
 http://www.kdnuggets.com/software/suites.html

Slide 27-17
