
Chapter 6

Decision Trees

An Algorithm for Building Decision Trees
1. Let T be the set of training instances.
2. Choose an attribute that best differentiates the instances contained in T.
3. Create a tree node whose value is the chosen attribute. Create child links from this node, where each link represents a unique value for the chosen attribute. Use the child link values to further subdivide the instances into subclasses.
4. For each subclass created in step 3:
   a. If the instances in the subclass satisfy predefined criteria, or if the set of remaining attribute choices for this path of the tree is null, specify the classification for new instances following this decision path.
   b. If the subclass does not satisfy the predefined criteria and there is at least one attribute to further subdivide the path of the tree, let T be the current set of subclass instances and return to step 2.
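The four steps above translate directly into a short recursive routine. The sketch below is our own illustration, not the course's reference implementation: the attribute-scoring heuristic (how many instances each candidate split classifies correctly by majority vote) is a stand-in for the information gain measure covered later in this chapter, and all names are ours.

# Illustrative sketch of steps 1-4 (not the course's reference implementation).
# Instances are dicts mapping attribute names to values; `target` names the
# output attribute.
from collections import Counter

def majority_class(instances, target):
    return Counter(row[target] for row in instances).most_common(1)[0][0]

def build_tree(instances, attributes, target):
    classes = {row[target] for row in instances}
    if len(classes) == 1:            # step 4a: predefined criterion met (pure node)
        return classes.pop()
    if not attributes:               # step 4a: no attributes left on this path
        return majority_class(instances, target)

    # Step 2: choose the attribute that best differentiates the instances
    # (here: the split whose subclasses agree most with their majority class).
    def score(attr):
        values = {row[attr] for row in instances}
        return sum(
            Counter(r[target] for r in instances if r[attr] == v).most_common(1)[0][1]
            for v in values
        )
    best = max(attributes, key=score)

    # Step 3: one child link per unique value of the chosen attribute,
    # then recurse on each subclass (step 4b).
    node = {best: {}}
    for value in {row[best] for row in instances}:
        subset = [row for row in instances if row[best] == value]
        node[best][value] = build_tree(subset, [a for a in attributes if a != best], target)
    return node

Leaves are plain class labels and internal nodes are single-key dicts, so each path through the nested dicts corresponds to one decision path.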

Decision Trees for the Credit Card Promotion Database
[Figures: decision trees built from the credit card promotion database]

Decision Tree Rules
IF Age <= 43 & Sex = Male & Credit Card Insurance = No
THEN Life Insurance Promotion = No

IF Sex = Male & Credit Card Insurance = No
THEN Life Insurance Promotion = No

General Considerations
A few of the many advantages decision trees have to offer:
Decision trees are easy to understand and map nicely to a set of production rules.
Decision trees have been successfully applied to real problems.
Decision trees make no prior assumptions about the nature of the data.
Decision trees are able to build models with datasets containing numerical as well as categorical data.

General Considerations
As with all data mining algorithms, there are several issues surrounding decision tree usage. Specifically:
Output attributes must be categorical, and multiple output attributes are not allowed.
Decision tree algorithms are unstable in that slight variations in the training data can result in different attribute selections at each choice point within the tree. The effect can be significant, as attribute choices affect all descendant subtrees.
Trees created from numeric datasets can be quite complex, as attribute splits for numeric data are typically binary.

Classification by Decision Tree Induction
Decision tree
A flow-chart-like tree structure
Internal node denotes a test on an attribute
Branch represents an outcome of the test
Leaf nodes represent class labels or class distribution
Decision tree generation consists of two phases
Tree construction
At start, all the training examples are at the root
Partition examples recursively based on selected attributes
Tree pruning
Identify and remove branches that reflect noise or outliers
Use of decision tree: classifying an unknown sample
Test the attribute values of the sample against the decision tree
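As a small illustration of this last step, the following helper (our own, not from the slides) tests a sample's attribute values against a tree stored as nested dicts, the same representation used in the earlier sketch.

# Illustrative helper (our own): classify one sample by testing its attribute
# values against a tree stored as nested dicts.
def classify(tree, sample, default=None):
    while isinstance(tree, dict):
        attribute = next(iter(tree))      # the attribute tested at this node
        branches = tree[attribute]
        value = sample.get(attribute)
        if value not in branches:         # unseen value: fall back to a default
            return default
        tree = branches[value]            # follow the branch for this outcome
    return tree                           # reached a leaf: the class label

# Example with the buys_computer tree shown later in this chapter:
tree = {'age': {'<=30': {'student': {'no': 'no', 'yes': 'yes'}},
                '31…40': 'yes',
                '>40': {'credit_rating': {'excellent': 'no', 'fair': 'yes'}}}}
print(classify(tree, {'age': '<=30', 'student': 'yes'}))   # -> 'yes'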


Training Dataset

age     income   student  credit_rating  buys_computer
<=30    high     no       fair           no
<=30    high     no       excellent      no
31…40   high     no       fair           yes
>40     medium   no       fair           yes
>40     low      yes      fair           yes
>40     low      yes      excellent      no
31…40   low      yes      excellent      yes
<=30    medium   no       fair           no
<=30    low      yes      fair           yes
>40     medium   yes      fair           yes
<=30    medium   yes      excellent      yes
31…40   medium   no       excellent      yes
31…40   high     yes      fair           yes
>40     medium   no       excellent      no

This follows an example from Quinlan's ID3.
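For convenience, here are the 14 rows above written as a Python list of dicts, the row representation used by the sketches in this chapter (the variable names are our own, not part of the slides).

# The 14 training examples above as a list of dicts (names are ours).
COLUMNS = ["age", "income", "student", "credit_rating", "buys_computer"]
TRAINING_DATA = [dict(zip(COLUMNS, row)) for row in [
    ("<=30",  "high",   "no",  "fair",      "no"),
    ("<=30",  "high",   "no",  "excellent", "no"),
    ("31…40", "high",   "no",  "fair",      "yes"),
    (">40",   "medium", "no",  "fair",      "yes"),
    (">40",   "low",    "yes", "fair",      "yes"),
    (">40",   "low",    "yes", "excellent", "no"),
    ("31…40", "low",    "yes", "excellent", "yes"),
    ("<=30",  "medium", "no",  "fair",      "no"),
    ("<=30",  "low",    "yes", "fair",      "yes"),
    (">40",   "medium", "yes", "fair",      "yes"),
    ("<=30",  "medium", "yes", "excellent", "yes"),
    ("31…40", "medium", "no",  "excellent", "yes"),
    ("31…40", "high",   "yes", "fair",      "yes"),
    (">40",   "medium", "no",  "excellent", "no"),
]]
print(sum(1 for r in TRAINING_DATA if r["buys_computer"] == "yes"),
      sum(1 for r in TRAINING_DATA if r["buys_computer"] == "no"))   # 9 yes, 5 no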

Output: A Decision Tree for buys_computer

age?
  <=30: student?
    no: buys_computer = no
    yes: buys_computer = yes
  31…40: buys_computer = yes
  >40: credit_rating?
    excellent: buys_computer = no
    fair: buys_computer = yes

Algorithm for Decision Tree Induction
Basic algorithm (a greedy algorithm)
Tree is constructed in a top-down, recursive, divide-and-conquer manner
At start, all the training examples are at the root
Attributes are categorical (if continuous-valued, they are discretized in advance)
Examples are partitioned recursively based on selected attributes
Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain)
Conditions for stopping partitioning
All samples for a given node belong to the same class
There are no remaining attributes for further partitioning (majority voting is employed for classifying the leaf)
There are no samples left

[Figure: a small example decision tree. The root tests weather (sunny, rainy, cloudy); the sunny branch tests Temp > 75, the cloudy branch tests windy, and the rainy branch leads directly to a leaf. The leaves are BBQ or Eat in.]

Attribute Selection Measure
Information gain (ID3/C4.5)
All attributes are assumed to be categorical
Can be modified for continuous-valued attributes
Gini index (IBM IntelligentMiner)
All attributes are assumed to be continuous-valued
Assume there exist several possible split values for each attribute
May need other tools, such as clustering, to get the possible split values
Can be modified for categorical attributes
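To make the Gini index concrete, here is a small sketch (our own code, not the IBM IntelligentMiner procedure) that evaluates a candidate binary split value for a continuous attribute; the function names and the example data are ours.

# Illustrative sketch (our own): Gini index for a two-class set and for a
# candidate binary split value on a continuous attribute.
def gini(p, n):
    """Gini impurity of a set with p positive and n negative examples."""
    total = p + n
    if total == 0:
        return 0.0
    return 1.0 - (p / total) ** 2 - (n / total) ** 2

def gini_split(values, labels, threshold, positive="yes"):
    """Weighted Gini of splitting numeric `values` at `threshold`."""
    left  = [l for v, l in zip(values, labels) if v <= threshold]
    right = [l for v, l in zip(values, labels) if v >  threshold]
    def side(side_labels):
        p = sum(1 for l in side_labels if l == positive)
        return gini(p, len(side_labels) - p)
    n_total = len(labels)
    return (len(left) / n_total) * side(left) + (len(right) / n_total) * side(right)

# Example with hypothetical ages and class labels:
ages   = [25, 28, 35, 42, 50]
labels = ["no", "yes", "yes", "yes", "no"]
print(gini_split(ages, labels, threshold=30))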

Information Gain (ID3/C4.5)
Select the attribute with the highest information gain
Assume there are two classes, P and N
Let the set of examples S contain p elements of class P and n elements of class N
The amount of information needed to decide if an arbitrary example in S belongs to P or N is defined as

$$ I(p, n) = -\frac{p}{p+n}\log_2\frac{p}{p+n} - \frac{n}{p+n}\log_2\frac{n}{p+n} $$

Information Gain in Decision Tree Induction
Assume that using attribute A, a set S will be partitioned into sets {S1, S2, ..., Sv}
If Si contains pi examples of P and ni examples of N, the entropy, or the expected information needed to classify objects in all subtrees Si, is

$$ E(A) = \sum_{i=1}^{v} \frac{p_i + n_i}{p + n}\, I(p_i, n_i) $$

The encoding information that would be gained by branching on A is

$$ \mathrm{Gain}(A) = I(p, n) - E(A) $$
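The two formulas above translate directly into code. The sketch below contains our own helper functions (the names info, expected_info, and gain are ours); partitions are passed as (pi, ni) count pairs for a two-class problem.

# Illustrative translation of the formulas above (function names are ours).
from math import log2

def info(p, n):
    """I(p, n): expected information for a two-class set."""
    total = p + n
    result = 0.0
    for count in (p, n):
        if count:                      # 0 * log2(0) is taken as 0
            result -= (count / total) * log2(count / total)
    return result

def expected_info(partitions, p, n):
    """E(A): weighted information over the subsets S1..Sv of the split."""
    return sum((pi + ni) / (p + n) * info(pi, ni) for pi, ni in partitions)

def gain(partitions, p, n):
    """Gain(A) = I(p, n) - E(A)."""
    return info(p, n) - expected_info(partitions, p, n)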

Attribute Selection by Information Gain Computation

Class P: buys_computer = "yes" (p = 9)
Class N: buys_computer = "no" (n = 5)
(counts taken from the 14-example training dataset shown earlier)

$$ \mathrm{Gain}(A) = I(p, n) - E(A) $$

$$ I(p, n) = I(9, 5) = -\frac{9}{14}\log_2\frac{9}{14} - \frac{5}{14}\log_2\frac{5}{14} = 0.940 $$

Compute the entropy for age:

age     pi   ni   I(pi, ni)
<=30    2    3    0.971
31…40   4    0    0
>40     3    2    0.971

$$ E(\mathrm{age}) = \frac{5}{14} I(2,3) + \frac{4}{14} I(4,0) + \frac{5}{14} I(3,2) = 0.694 $$

$$ \mathrm{Gain}(\mathrm{age}) = I(p, n) - E(\mathrm{age}) = 0.940 - 0.694 = 0.246 $$

Similarly:
Gain(income) = 0.029
Gain(student) = 0.151
Gain(credit_rating) = 0.048

Since age yields the highest information gain, it is selected as the splitting attribute at the root, as in the decision tree for buys_computer shown earlier.
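These figures can be checked mechanically. The short, self-contained sketch below is our own check: it recomputes every gain from the per-value class counts read off the training table, and its output matches the figures above up to rounding of the intermediate values.

# Our own check of the worked example: recompute each attribute's gain from the
# per-value class counts (pi, ni) read off the 14-example training dataset.
from math import log2

def info(p, n):
    total = p + n
    return -sum((c / total) * log2(c / total) for c in (p, n) if c)

def gain(partitions, p=9, n=5):
    return info(p, n) - sum((pi + ni) / (p + n) * info(pi, ni) for pi, ni in partitions)

counts = {
    "age":           [(2, 3), (4, 0), (3, 2)],   # <=30, 31…40, >40
    "income":        [(2, 2), (4, 2), (3, 1)],   # high, medium, low
    "student":       [(6, 1), (3, 4)],           # yes, no
    "credit_rating": [(6, 2), (3, 3)],           # fair, excellent
}
for attribute, partitions in counts.items():
    print(f"Gain({attribute}) = {gain(partitions):.3f}")
# The slide figures (0.246, 0.029, 0.151, 0.048) were rounded from the
# intermediate values, so the printed numbers may differ in the last digit.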

Extracting Classification Rules from Trees
Represent the knowledge in the form of IF-THEN rules
One rule is created for each path from the root to a leaf
Each attribute-value pair along a path forms a conjunction
The leaf node holds the class prediction
Rules are easier for humans to understand
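A sketch of this path-to-rule idea, using the nested-dict tree representation from the earlier sketches (the code and the function name extract_rules are ours, not from the slides):

# Illustrative sketch (our own): emit one IF-THEN rule per root-to-leaf path
# of a tree stored as nested dicts.
def extract_rules(tree, conditions=()):
    if not isinstance(tree, dict):                    # leaf: one complete rule
        antecedent = " AND ".join(f'{a} = "{v}"' for a, v in conditions) or "TRUE"
        return [f'IF {antecedent} THEN class = "{tree}"']
    attribute = next(iter(tree))
    rules = []
    for value, subtree in tree[attribute].items():    # one branch per attribute value
        rules.extend(extract_rules(subtree, conditions + ((attribute, value),)))
    return rules

tree = {'age': {'<=30': {'student': {'no': 'no', 'yes': 'yes'}},
                '31…40': 'yes',
                '>40': {'credit_rating': {'excellent': 'no', 'fair': 'yes'}}}}
for rule in extract_rules(tree):
    print(rule)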


Reading the rules off the decision tree for buys_computer shown earlier:

IF age = "<=30" AND student = "no" THEN buys_computer = "no"
IF age = "<=30" AND student = "yes" THEN buys_computer = "yes"
IF age = "31…40" THEN buys_computer = "yes"
IF age = ">40" AND credit_rating = "excellent" THEN buys_computer = "no"
IF age = ">40" AND credit_rating = "fair" THEN buys_computer = "yes"

Avoid Overfitting in Classification
The generated tree may overfit the training data
Too many branches, some of which may reflect anomalies due to noise or outliers
The result is poor accuracy for unseen samples
Two approaches to avoid overfitting
Prepruning: halt tree construction early; do not split a node if this would result in the goodness measure falling below a threshold
Difficult to choose an appropriate threshold
Postpruning: remove branches from a fully grown tree to get a sequence of progressively pruned trees
Use a set of data different from the training data to decide which is the best pruned tree
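As one concrete, library-based route to postpruning, scikit-learn's decision trees support cost-complexity pruning. The sketch below is not from the slides; it assumes scikit-learn is installed and uses one of its bundled datasets. It grows a full tree, enumerates candidate pruning strengths, and picks the pruned tree that does best on held-out data, in the spirit of the last bullet above.

# Hedged sketch: cost-complexity postpruning with scikit-learn (assumed
# available; not prescribed by the slides).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=0)

# Candidate pruning strengths for the fully grown tree.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)

best_alpha, best_score = 0.0, 0.0
for alpha in path.ccp_alphas:
    tree = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)
    score = tree.score(X_valid, y_valid)     # accuracy on data not used for training
    if score > best_score:
        best_alpha, best_score = alpha, score

pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=best_alpha).fit(X_train, y_train)
print(f"best ccp_alpha={best_alpha:.4f}, validation accuracy={best_score:.3f}, "
      f"nodes={pruned.tree_.node_count}")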

Approaches to Determine the Final Tree Size
Separate training (2/3) and testing (1/3) sets
Use cross-validation, e.g., 10-fold cross-validation (see the sketch after this list)
Use all the data for training, but apply a statistical test (e.g., chi-square) to estimate whether expanding or pruning a node may improve the entire distribution
Use the minimum description length (MDL) principle: halt growth of the tree when the encoding is minimized
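For the cross-validation option above, a minimal sketch (again assuming scikit-learn and one of its bundled datasets; this is our own illustration, not something prescribed by the slides):

# Hedged sketch: 10-fold cross-validation to estimate decision tree accuracy.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=10)
print(f"10-fold accuracy: mean={scores.mean():.3f}, std={scores.std():.3f}")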


Enhancements to basic decision tree induction
Allow for continuous-valued attributes
Dynamically define new discrete-valued attributes that partition the continuous attribute values into a discrete set of intervals (see the sketch after this list)
Handle missing attribute values
Assign the most common value of the attribute
Assign a probability to each of the possible values
Attribute construction
Create new attributes based on existing ones that are sparsely represented
This reduces fragmentation, repetition, and replication
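A minimal sketch of the first enhancement (our own code and names): choose a binary threshold for a continuous attribute by information gain, which in effect defines two discrete intervals.

# Illustrative sketch (our own): turn a continuous attribute into a binary
# split (<= t vs > t) by picking the threshold with the highest information gain.
from math import log2

def entropy(labels):
    total = len(labels)
    result = 0.0
    for label in set(labels):
        fraction = labels.count(label) / total
        result -= fraction * log2(fraction)
    return result

def best_threshold(values, labels):
    base = entropy(labels)
    candidates = sorted(set(values))[:-1]     # split points between observed values
    best = None
    for t in candidates:
        left  = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
        information_gain = base - remainder
        if best is None or information_gain > best[1]:
            best = (t, information_gain)
    return best                                # (threshold, information gain)

# Hypothetical ages and class labels:
print(best_threshold([27, 35, 38, 43, 51], ["no", "yes", "yes", "no", "no"]))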

Classification in Large Databases
Classification is a classical problem extensively studied by statisticians and machine learning researchers
Scalability: classifying data sets with millions of examples and hundreds of attributes with reasonable speed
Why decision tree induction in data mining?
Relatively faster learning speed (than other classification methods)
Convertible to simple and easy-to-understand classification rules
Can use SQL queries for accessing databases
Comparable classification accuracy with other methods

Chapter Summary
Decision trees are probably the most popular structure for supervised data mining.
A common algorithm for building a decision tree selects a subset of instances from the training data to construct an initial tree.
The remaining training instances are then used to test the accuracy of the tree.
If any instance is incorrectly classified, the instance is added to the current set of training data and the process is repeated.

A main goal is to minimize the number of tree levels and tree nodes, thereby maximizing data generalization.
Decision trees have been successfully applied to real problems, are easy to understand, and map nicely to a set of production rules.


Reference
Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques (Chapter 7 slides for the textbook), Intelligent Database Systems Research Lab, School of Computing Science, Simon Fraser University, Canada.
