Data Warehouse and Data Mining
Chapter 6: Decision Trees
An Algorithm for Building Decision Trees
1. Let T be the set of training instances.
2. Choose an attribute that best differentiates the instances contained in T.
3. Create a tree node whose value is the chosen attribute. Create child links from this node, where each link represents a unique value for the chosen attribute. Use the child link values to further subdivide the instances into subclasses.
4. For each subclass created in step 3:
   a. If the instances in the subclass satisfy predefined criteria, or if the set of remaining attribute choices for this path of the tree is null, specify the classification for new instances following this decision path.
   b. If the subclass does not satisfy the predefined criteria and there is at least one attribute to further subdivide the path of the tree, let T be the current set of subclass instances and return to step 2.
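A minimal Python sketch of this procedure. The function and field names are ours, not from the chapter; step 2's "best differentiates" is instantiated here as the split with the lowest expected entropy, and step 4a's "predefined criteria" is taken to be class purity:

```python
import math
from collections import Counter

def entropy(instances):
    """Expected information of the class labels in a set of instances."""
    counts = Counter(inst["class"] for inst in instances)
    total = len(instances)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def best_attribute(instances, attributes):
    """Step 2: choose the attribute that best differentiates the instances,
    measured here as the lowest expected entropy after splitting."""
    def expected_entropy(attr):
        values = {inst[attr] for inst in instances}
        return sum(len(subset) / len(instances) * entropy(subset)
                   for subset in ([i for i in instances if i[attr] == v]
                                  for v in values))
    return min(attributes, key=expected_entropy)

def build_tree(instances, attributes):
    """Steps 1, 3, and 4: grow the tree recursively."""
    labels = {inst["class"] for inst in instances}
    # Step 4a: stop when the subclass is pure or no attributes remain;
    # the leaf's classification is the majority class.
    if len(labels) == 1 or not attributes:
        return Counter(i["class"] for i in instances).most_common(1)[0][0]
    attr = best_attribute(instances, attributes)          # step 2
    children = {}
    for value in {inst[attr] for inst in instances}:      # step 3: child links
        subset = [i for i in instances if i[attr] == value]
        children[value] = build_tree(subset,              # step 4b: recurse
                                     [a for a in attributes if a != attr])
    return {attr: children}

# Hypothetical usage:
# data = [{"age": "<=30", "student": "no", "class": "no"}, ...]
# tree = build_tree(data, ["age", "student"])
```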
Decision Trees for the Credit Card Promotion Database

[Figures: candidate decision trees built for the credit card promotion database.]
Decision Tree Rules
IF Age <= 43 & Sex = Male & Credit Card Insurance = No
THEN Life Insurance Promotion = No

IF Sex = Male & Credit Card Insurance = No
THEN Life Insurance Promotion = No
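To make the mapping from tree paths to rules concrete, here is a minimal sketch. The nested-dict tree encoding, the function name, and the tree fragment itself are ours, invented for illustration:

```python
def extract_rules(node, conditions=()):
    """Walk a decision tree and yield one IF-THEN rule per leaf.
    A node is either a leaf label (str) or a dict
    {attribute: {value: subtree, ...}} with exactly one test per node."""
    if isinstance(node, str):                      # leaf: emit the rule
        lhs = " & ".join(f"{a} = {v}" for a, v in conditions) or "TRUE"
        yield f"IF {lhs} THEN Life Insurance Promotion = {node}"
        return
    (attribute, branches), = node.items()
    for value, subtree in branches.items():
        yield from extract_rules(subtree, conditions + ((attribute, value),))

# A hypothetical fragment of the credit card promotion tree:
tree = {"Sex": {"Male": {"Credit Card Insurance": {"No": "No", "Yes": "Yes"}},
                "Female": "Yes"}}
for rule in extract_rules(tree):
    print(rule)
```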
General Considerations
Here is a list of a few of the many advantages decision trees have to offer:

- Decision trees are easy to understand and map nicely to a set of production rules.
- Decision trees have been successfully applied to real problems.
- Decision trees make no prior assumptions about the nature of the data.
- Decision trees are able to build models with datasets containing numerical as well as categorical data.
As with all data mining algorithms, there are several issues surrounding decision tree usage. Specifically:

- Output attributes must be categorical, and multiple output attributes are not allowed.
- Decision tree algorithms are unstable, in that slight variations in the training data can result in different attribute selections at each choice point within the tree. The effect can be significant, since attribute choices affect all descendent subtrees.
- Trees created from numeric datasets can be quite complex, since attribute splits for numeric data are typically binary.
Training Dataset

age      buys_computer
<=30     no
<=30     no
31..40   yes
>40      yes
>40      yes
>40      no
31..40   yes
<=30     no
<=30     yes
>40      yes
<=30     yes
31..40   yes
31..40   yes
>40      no
This follows an example from Quinlan's ID3.
[Figure: the decision tree induced for buys_computer]

age?
  <=30: student?
    no: no
    yes: yes
  31..40: yes
  >40: credit rating?
    excellent: no
    fair: yes
[Figure: another example decision tree, for choosing between a BBQ and eating in. The root tests weather (sunny / rainy / cloudy); the sunny branch tests Temp > 75, the cloudy branch tests windy, and the rainy branch leads directly to Eat in. The leaves are BBQ or Eat in.]
Attribute Selection Measure
Information Gain (ID3/C4.5)

Select the attribute with the highest information gain.
Assume there are two classes, P and N. Let the set of examples S contain p elements of class P and n elements of class N.
The amount of information needed to decide whether an arbitrary example in S belongs to P or N is defined as

  I(p, n) = -\frac{p}{p+n} \log_2 \frac{p}{p+n} - \frac{n}{p+n} \log_2 \frac{n}{p+n}

Assume that, using attribute A, the set S is partitioned into subsets \{S_1, S_2, \ldots, S_v\}. If S_i contains p_i examples of P and n_i examples of N, the entropy, i.e. the expected information needed to classify objects in all subtrees S_i, is

  E(A) = \sum_{i=1}^{v} \frac{p_i + n_i}{p + n} I(p_i, n_i)

The encoding information that would be gained by branching on A is

  \mathrm{Gain}(A) = I(p, n) - E(A)
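A direct transcription of these formulas into Python, as a minimal sketch (the function names are ours):

```python
import math

def info(p, n):
    """I(p, n): information needed to classify an example drawn from a set
    with p members of class P and n of class N (0 * log2(0) taken as 0)."""
    result = 0.0
    for k in (p, n):
        if k > 0:
            frac = k / (p + n)
            result -= frac * math.log2(frac)
    return result

def expected_info(partitions):
    """E(A) for a split into subsets S_i, given as (p_i, n_i) pairs."""
    total = sum(p + n for p, n in partitions)
    return sum((p + n) / total * info(p, n) for p, n in partitions)

def gain(p, n, partitions):
    """Gain(A) = I(p, n) - E(A)."""
    return info(p, n) - expected_info(partitions)

print(round(info(9, 5), 3))                                # 0.94, the I(9, 5) used below
print(round(expected_info([(2, 3), (4, 0), (3, 2)]), 3))   # E(age) = 0.694
```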
Attribute Selection by Information Gain: Example

Class P: buys_computer = "yes", so p = 9
Class N: buys_computer = "no", so n = 5

  I(p, n) = I(9, 5) = 0.940

Compute the entropy for age:

age      p_i   n_i   I(p_i, n_i)
<=30     2     3     0.971
31..40   4     0     0
>40      3     2     0.971

  E(\text{age}) = \frac{5}{14} I(2,3) + \frac{4}{14} I(4,0) + \frac{5}{14} I(3,2) = 0.694

Hence

  \mathrm{Gain}(\text{age}) = I(p, n) - E(\text{age}) = 0.246

Similarly,

  \mathrm{Gain}(\text{income}) = 0.029
  \mathrm{Gain}(\text{student}) = 0.151
  \mathrm{Gain}(\text{credit\_rating}) = 0.048

Since age yields the highest information gain, it becomes the root of the tree.
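As a check, the sketch below recomputes these gains from the raw data. Only the age and buys_computer columns appear in the training table shown earlier, so the income, student, and credit_rating values here are restored from the standard 14-example training set in Han and Kamber, which this chapter follows:

```python
import math
from collections import Counter, defaultdict

# The 14-row buys_computer training set from Han and Kamber; the income,
# student, and credit_rating columns are restored from the textbook.
ROWS = [
    ("<=30",   "high",   "no",  "fair",      "no"),
    ("<=30",   "high",   "no",  "excellent", "no"),
    ("31..40", "high",   "no",  "fair",      "yes"),
    (">40",    "medium", "no",  "fair",      "yes"),
    (">40",    "low",    "yes", "fair",      "yes"),
    (">40",    "low",    "yes", "excellent", "no"),
    ("31..40", "low",    "yes", "excellent", "yes"),
    ("<=30",   "medium", "no",  "fair",      "no"),
    ("<=30",   "low",    "yes", "fair",      "yes"),
    (">40",    "medium", "yes", "fair",      "yes"),
    ("<=30",   "medium", "yes", "excellent", "yes"),
    ("31..40", "medium", "no",  "excellent", "yes"),
    ("31..40", "high",   "yes", "fair",      "yes"),
    (">40",    "medium", "no",  "excellent", "no"),
]
ATTRS = ["age", "income", "student", "credit_rating"]

def info(labels):
    """Expected information of an arbitrary multiset of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def gain(col):
    """Gain(A) = I(p, n) - E(A) for the attribute in column `col`."""
    labels = [row[-1] for row in ROWS]
    groups = defaultdict(list)
    for row in ROWS:
        groups[row[col]].append(row[-1])
    expected = sum(len(g) / len(ROWS) * info(g) for g in groups.values())
    return info(labels) - expected

for col, name in enumerate(ATTRS):
    print(f"Gain({name}) = {gain(col):.3f}")
# Prints 0.247, 0.029, 0.152, 0.048; the chapter's 0.246 and 0.151 differ
# only because the slides round I and E before subtracting.
```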
Extracting classification rules from the tree shown earlier:

IF age = "<=30" AND student = "no" THEN buys_computer = "no"
IF age = "<=30" AND student = "yes" THEN buys_computer = "yes"
IF age = "31..40" THEN buys_computer = "yes"
IF age = ">40" AND credit_rating = "excellent" THEN buys_computer = "no"
IF age = ">40" AND credit_rating = "fair" THEN buys_computer = "yes"
Avoid Overfitting in Classification

The generated tree may overfit the training data: too many branches, some of which may reflect anomalies due to noise or outliers. A common remedy is to limit the tree's growth, as the sketch below illustrates.
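A minimal pre-pruning sketch using scikit-learn, assuming it is available; the dataset and parameter values are illustrative, not from the chapter:

```python
# Pre-pruning sketch with scikit-learn: cap the tree's size so it
# cannot memorize noise. Dataset and parameters are illustrative only.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
pruned = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5,
                                random_state=0).fit(X_train, y_train)

# The capped tree is much smaller, and its held-out accuracy is usually
# as good as or better than the fully grown tree's.
print("nodes:", full.tree_.node_count, "vs", pruned.tree_.node_count)
print("test accuracy:", full.score(X_test, y_test),
      pruned.score(X_test, y_test))
```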
Enhancements to Basic Decision Tree Induction

- Allow for continuous-valued attributes: dynamically define new discrete-valued attributes that partition the continuous attribute's values into a discrete set of intervals (see the sketch after this list).
- Attribute construction: create new attributes based on existing ones that are sparsely represented. This reduces fragmentation, repetition, and replication.
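A minimal sketch of one way to discretize a continuous attribute, in the style of C4.5: try each midpoint between consecutive sorted values as a binary cut and keep the one with the highest information gain. The data and names here are illustrative only:

```python
import math

def info(labels):
    """Expected information of a list of class labels."""
    total = len(labels)
    return -sum(labels.count(c) / total * math.log2(labels.count(c) / total)
                for c in set(labels))

def best_threshold(values, labels):
    """Pick the cut v maximizing the gain of the binary split
    (attribute <= v) versus (attribute > v)."""
    pairs = sorted(zip(values, labels))
    base = info(labels)
    best_cut, best_gain = None, -1.0
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                          # no cut between equal values
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for v, lab in pairs if v <= cut]
        right = [lab for v, lab in pairs if v > cut]
        expected = (len(left) * info(left)
                    + len(right) * info(right)) / len(pairs)
        if base - expected > best_gain:
            best_cut, best_gain = cut, base - expected
    return best_cut, best_gain

# Hypothetical ages and class labels, for illustration only:
ages = [25, 28, 35, 42, 51, 33, 46, 29]
buys = ["no", "no", "yes", "yes", "no", "yes", "yes", "no"]
print(best_threshold(ages, buys))
```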
Chapter Summary
- Decision trees are probably the most popular structure for supervised data mining.
- A common algorithm for building a decision tree selects a subset of instances from the training data to construct an initial tree. The remaining training instances are then used to test the accuracy of the tree. If any instance is incorrectly classified, the instance is added to the current set of training data and the process is repeated.
- A main goal is to minimize the number of tree levels and tree nodes, thereby maximizing data generalization.
- Decision trees have been successfully applied to real problems, are easy to understand, and map nicely to a set of production rules.
Reference

Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques (slides for Chapter 7 of the textbook), Intelligent Database Systems Research Lab, Simon Fraser University.