
Chapter 6

Decision Trees

An Algorithm for Building Decision Trees
1. Let T be the set of training instances.
2. Choose an attribute that best differentiates the instances contained in T.
3. Create a tree node whose value is the chosen attribute. Create child links from this node, where each link represents a unique value for the chosen attribute. Use the child link values to further subdivide the instances into subclasses.
4. For each subclass created in step 3:
   a. If the instances in the subclass satisfy predefined criteria, or if the set of remaining attribute choices for this path of the tree is null, specify the classification for new instances following this decision path.
   b. If the subclass does not satisfy the predefined criteria and there is at least one attribute to further subdivide the path of the tree, let T be the current set of subclass instances and return to step 2.
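The four steps above translate directly into a short recursive routine. The sketch below is our own illustration, not the course's reference implementation: the attribute-scoring heuristic (how many instances each candidate split classifies correctly by majority vote) is a stand-in for the information gain measure covered later in this chapter, and all names are ours.

# Illustrative sketch of steps 1-4 (not the course's reference implementation).
# Instances are dicts mapping attribute names to values; `target` names the
# output attribute.
from collections import Counter

def majority_class(instances, target):
    return Counter(row[target] for row in instances).most_common(1)[0][0]

def build_tree(instances, attributes, target):
    classes = {row[target] for row in instances}
    if len(classes) == 1:            # step 4a: predefined criterion met (pure node)
        return classes.pop()
    if not attributes:               # step 4a: no attributes left on this path
        return majority_class(instances, target)

    # Step 2: choose the attribute that best differentiates the instances
    # (here: the split whose subclasses agree most with their majority class).
    def score(attr):
        values = {row[attr] for row in instances}
        return sum(
            Counter(r[target] for r in instances if r[attr] == v).most_common(1)[0][1]
            for v in values
        )
    best = max(attributes, key=score)

    # Step 3: one child link per unique value of the chosen attribute,
    # then recurse on each subclass (step 4b).
    node = {best: {}}
    for value in {row[best] for row in instances}:
        subset = [row for row in instances if row[best] == value]
        node[best][value] = build_tree(subset, [a for a in attributes if a != best], target)
    return node

Leaves are plain class labels and internal nodes are single-key dicts, so each path through the nested dicts corresponds to one decision path.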

Decision Trees for the Credit Card Promotion Database
[Figures: decision trees built from the credit card promotion database]

Decision Tree Rules
IF Age <= 43 & Sex = Male & Credit Card Insurance = No
THEN Life Insurance Promotion = No

IF Sex = Male & Credit Card Insurance = No
THEN Life Insurance Promotion = No

General Considerations
A few of the many advantages decision trees have to offer:
Decision trees are easy to understand and map nicely to a set of production rules.
Decision trees have been successfully applied to real problems.
Decision trees make no prior assumptions about the nature of the data.
Decision trees are able to build models with datasets containing numerical as well as categorical data.

General Considerations
As with all data mining algorithms, there are several issues surrounding decision tree usage. Specifically:
Output attributes must be categorical, and multiple output attributes are not allowed.
Decision tree algorithms are unstable in that slight variations in the training data can result in different attribute selections at each choice point within the tree. The effect can be significant, as attribute choices affect all descendant subtrees.
Trees created from numeric datasets can be quite complex, as attribute splits for numeric data are typically binary.

Classification by Decision Tree Induction
Decision tree
A flow-chart-like tree structure
Internal node denotes a test on an attribute
Branch represents an outcome of the test
Leaf nodes represent class labels or class distribution
Decision tree generation consists of two phases
Tree construction
At start, all the training examples are at the root
Partition examples recursively based on selected attributes
Tree pruning
Identify and remove branches that reflect noise or outliers
Use of decision tree: classifying an unknown sample
Test the attribute values of the sample against the decision tree
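As a small illustration of this last step, the following helper (our own, not from the slides) tests a sample's attribute values against a tree stored as nested dicts, the same representation used in the earlier sketch.

# Illustrative helper (our own): classify one sample by testing its attribute
# values against a tree stored as nested dicts.
def classify(tree, sample, default=None):
    while isinstance(tree, dict):
        attribute = next(iter(tree))      # the attribute tested at this node
        branches = tree[attribute]
        value = sample.get(attribute)
        if value not in branches:         # unseen value: fall back to a default
            return default
        tree = branches[value]            # follow the branch for this outcome
    return tree                           # reached a leaf: the class label

# Example with the buys_computer tree shown later in this chapter:
tree = {'age': {'<=30': {'student': {'no': 'no', 'yes': 'yes'}},
                '31…40': 'yes',
                '>40': {'credit_rating': {'excellent': 'no', 'fair': 'yes'}}}}
print(classify(tree, {'age': '<=30', 'student': 'yes'}))   # -> 'yes'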


Training Dataset

age     income   student  credit_rating  buys_computer
<=30    high     no       fair           no
<=30    high     no       excellent      no
31…40   high     no       fair           yes
>40     medium   no       fair           yes
>40     low      yes      fair           yes
>40     low      yes      excellent      no
31…40   low      yes      excellent      yes
<=30    medium   no       fair           no
<=30    low      yes      fair           yes
>40     medium   yes      fair           yes
<=30    medium   yes      excellent      yes
31…40   medium   no       excellent      yes
31…40   high     yes      fair           yes
>40     medium   no       excellent      no

This follows an example from Quinlan's ID3.
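For convenience, here are the 14 rows above written as a Python list of dicts, the row representation used by the sketches in this chapter (the variable names are our own, not part of the slides).

# The 14 training examples above as a list of dicts (names are ours).
COLUMNS = ["age", "income", "student", "credit_rating", "buys_computer"]
TRAINING_DATA = [dict(zip(COLUMNS, row)) for row in [
    ("<=30",  "high",   "no",  "fair",      "no"),
    ("<=30",  "high",   "no",  "excellent", "no"),
    ("31…40", "high",   "no",  "fair",      "yes"),
    (">40",   "medium", "no",  "fair",      "yes"),
    (">40",   "low",    "yes", "fair",      "yes"),
    (">40",   "low",    "yes", "excellent", "no"),
    ("31…40", "low",    "yes", "excellent", "yes"),
    ("<=30",  "medium", "no",  "fair",      "no"),
    ("<=30",  "low",    "yes", "fair",      "yes"),
    (">40",   "medium", "yes", "fair",      "yes"),
    ("<=30",  "medium", "yes", "excellent", "yes"),
    ("31…40", "medium", "no",  "excellent", "yes"),
    ("31…40", "high",   "yes", "fair",      "yes"),
    (">40",   "medium", "no",  "excellent", "no"),
]]
print(sum(1 for r in TRAINING_DATA if r["buys_computer"] == "yes"),
      sum(1 for r in TRAINING_DATA if r["buys_computer"] == "no"))   # 9 yes, 5 no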

Output: A Decision Tree for buys_computer

age?
  <=30: student?
    no: buys_computer = no
    yes: buys_computer = yes
  31…40: buys_computer = yes
  >40: credit_rating?
    excellent: buys_computer = no
    fair: buys_computer = yes

Algorithm for Decision Tree Induction
Basic algorithm (a greedy algorithm)
Tree is constructed in a top-down, recursive, divide-and-conquer manner
At start, all the training examples are at the root
Attributes are categorical (if continuous-valued, they are discretized in advance)
Examples are partitioned recursively based on selected attributes
Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain)
Conditions for stopping partitioning
All samples for a given node belong to the same class
There are no remaining attributes for further partitioning (majority voting is employed for classifying the leaf)
There are no samples left

[Figure: a small example decision tree. The root tests weather (sunny, rainy, cloudy); the sunny branch tests Temp > 75, the cloudy branch tests windy, and the rainy branch leads directly to a leaf. The leaves are BBQ or Eat in.]

Attribute Selection Measure
Information gain (ID3/C4.5)
All attributes are assumed to be categorical
Can be modified for continuous-valued attributes
Gini index (IBM IntelligentMiner)
All attributes are assumed to be continuous-valued
Assume there exist several possible split values for each attribute
May need other tools, such as clustering, to get the possible split values
Can be modified for categorical attributes
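To make the Gini index concrete, here is a small sketch (our own code, not the IBM IntelligentMiner procedure) that evaluates a candidate binary split value for a continuous attribute; the function names and the example data are ours.

# Illustrative sketch (our own): Gini index for a two-class set and for a
# candidate binary split value on a continuous attribute.
def gini(p, n):
    """Gini impurity of a set with p positive and n negative examples."""
    total = p + n
    if total == 0:
        return 0.0
    return 1.0 - (p / total) ** 2 - (n / total) ** 2

def gini_split(values, labels, threshold, positive="yes"):
    """Weighted Gini of splitting numeric `values` at `threshold`."""
    left  = [l for v, l in zip(values, labels) if v <= threshold]
    right = [l for v, l in zip(values, labels) if v >  threshold]
    def side(side_labels):
        p = sum(1 for l in side_labels if l == positive)
        return gini(p, len(side_labels) - p)
    n_total = len(labels)
    return (len(left) / n_total) * side(left) + (len(right) / n_total) * side(right)

# Example with hypothetical ages and class labels:
ages   = [25, 28, 35, 42, 50]
labels = ["no", "yes", "yes", "yes", "no"]
print(gini_split(ages, labels, threshold=30))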

Information Gain (ID3/C4.5)
Select the attribute with the highest information gain
Assume there are two classes, P and N
Let the set of examples S contain p elements of class P and n elements of class N
The amount of information needed to decide if an arbitrary example in S belongs to P or N is defined as

$$ I(p, n) = -\frac{p}{p+n}\log_2\frac{p}{p+n} - \frac{n}{p+n}\log_2\frac{n}{p+n} $$

Information Gain in Decision Tree Induction
Assume that using attribute A, a set S will be partitioned into sets {S1, S2, ..., Sv}
If Si contains pi examples of P and ni examples of N, the entropy, or the expected information needed to classify objects in all subtrees Si, is

$$ E(A) = \sum_{i=1}^{v} \frac{p_i + n_i}{p + n}\, I(p_i, n_i) $$

The encoding information that would be gained by branching on A is

$$ \mathrm{Gain}(A) = I(p, n) - E(A) $$
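The two formulas above translate directly into code. The sketch below contains our own helper functions (the names info, expected_info, and gain are ours); partitions are passed as (pi, ni) count pairs for a two-class problem.

# Illustrative translation of the formulas above (function names are ours).
from math import log2

def info(p, n):
    """I(p, n): expected information for a two-class set."""
    total = p + n
    result = 0.0
    for count in (p, n):
        if count:                      # 0 * log2(0) is taken as 0
            result -= (count / total) * log2(count / total)
    return result

def expected_info(partitions, p, n):
    """E(A): weighted information over the subsets S1..Sv of the split."""
    return sum((pi + ni) / (p + n) * info(pi, ni) for pi, ni in partitions)

def gain(partitions, p, n):
    """Gain(A) = I(p, n) - E(A)."""
    return info(p, n) - expected_info(partitions, p, n)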

Attribute Selection by Information Gain Computation

Class P: buys_computer = "yes" (p = 9)
Class N: buys_computer = "no" (n = 5)
(counts taken from the 14-example training dataset shown earlier)

$$ \mathrm{Gain}(A) = I(p, n) - E(A) $$

$$ I(p, n) = I(9, 5) = -\frac{9}{14}\log_2\frac{9}{14} - \frac{5}{14}\log_2\frac{5}{14} = 0.940 $$

Compute the entropy for age:

age     pi   ni   I(pi, ni)
<=30    2    3    0.971
31…40   4    0    0
>40     3    2    0.971

$$ E(\mathrm{age}) = \frac{5}{14} I(2,3) + \frac{4}{14} I(4,0) + \frac{5}{14} I(3,2) = 0.694 $$

$$ \mathrm{Gain}(\mathrm{age}) = I(p, n) - E(\mathrm{age}) = 0.940 - 0.694 = 0.246 $$

Similarly:
Gain(income) = 0.029
Gain(student) = 0.151
Gain(credit_rating) = 0.048

Since age yields the highest information gain, it is selected as the splitting attribute at the root, as in the decision tree for buys_computer shown earlier.
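These figures can be checked mechanically. The short, self-contained sketch below is our own check: it recomputes every gain from the per-value class counts read off the training table, and its output matches the figures above up to rounding of the intermediate values.

# Our own check of the worked example: recompute each attribute's gain from the
# per-value class counts (pi, ni) read off the 14-example training dataset.
from math import log2

def info(p, n):
    total = p + n
    return -sum((c / total) * log2(c / total) for c in (p, n) if c)

def gain(partitions, p=9, n=5):
    return info(p, n) - sum((pi + ni) / (p + n) * info(pi, ni) for pi, ni in partitions)

counts = {
    "age":           [(2, 3), (4, 0), (3, 2)],   # <=30, 31…40, >40
    "income":        [(2, 2), (4, 2), (3, 1)],   # high, medium, low
    "student":       [(6, 1), (3, 4)],           # yes, no
    "credit_rating": [(6, 2), (3, 3)],           # fair, excellent
}
for attribute, partitions in counts.items():
    print(f"Gain({attribute}) = {gain(partitions):.3f}")
# The slide figures (0.246, 0.029, 0.151, 0.048) were rounded from the
# intermediate values, so the printed numbers may differ in the last digit.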

Extracting Classification Rules from Trees
Represent the knowledge in the form of IF-THEN rules
One rule is created for each path from the root to a leaf
Each attribute-value pair along a path forms a conjunction
The leaf node holds the class prediction
Rules are easier for humans to understand
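A sketch of this path-to-rule idea, using the nested-dict tree representation from the earlier sketches (the code and the function name extract_rules are ours, not from the slides):

# Illustrative sketch (our own): emit one IF-THEN rule per root-to-leaf path
# of a tree stored as nested dicts.
def extract_rules(tree, conditions=()):
    if not isinstance(tree, dict):                    # leaf: one complete rule
        antecedent = " AND ".join(f'{a} = "{v}"' for a, v in conditions) or "TRUE"
        return [f'IF {antecedent} THEN class = "{tree}"']
    attribute = next(iter(tree))
    rules = []
    for value, subtree in tree[attribute].items():    # one branch per attribute value
        rules.extend(extract_rules(subtree, conditions + ((attribute, value),)))
    return rules

tree = {'age': {'<=30': {'student': {'no': 'no', 'yes': 'yes'}},
                '31…40': 'yes',
                '>40': {'credit_rating': {'excellent': 'no', 'fair': 'yes'}}}}
for rule in extract_rules(tree):
    print(rule)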


Reading the rules off the decision tree for buys_computer shown earlier:

IF age = "<=30" AND student = "no" THEN buys_computer = "no"
IF age = "<=30" AND student = "yes" THEN buys_computer = "yes"
IF age = "31…40" THEN buys_computer = "yes"
IF age = ">40" AND credit_rating = "excellent" THEN buys_computer = "no"
IF age = ">40" AND credit_rating = "fair" THEN buys_computer = "yes"

Avoid Overfitting in Classification
The generated tree may overfit the training data
Too many branches, some of which may reflect anomalies due to noise or outliers
The result is poor accuracy for unseen samples
Two approaches to avoid overfitting
Prepruning: halt tree construction early; do not split a node if this would result in the goodness measure falling below a threshold
Difficult to choose an appropriate threshold
Postpruning: remove branches from a fully grown tree to get a sequence of progressively pruned trees
Use a set of data different from the training data to decide which is the best pruned tree
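As one concrete, library-based route to postpruning, scikit-learn's decision trees support cost-complexity pruning. The sketch below is not from the slides; it assumes scikit-learn is installed and uses one of its bundled datasets. It grows a full tree, enumerates candidate pruning strengths, and picks the pruned tree that does best on held-out data, in the spirit of the last bullet above.

# Hedged sketch: cost-complexity postpruning with scikit-learn (assumed
# available; not prescribed by the slides).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=0)

# Candidate pruning strengths for the fully grown tree.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)

best_alpha, best_score = 0.0, 0.0
for alpha in path.ccp_alphas:
    tree = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)
    score = tree.score(X_valid, y_valid)     # accuracy on data not used for training
    if score > best_score:
        best_alpha, best_score = alpha, score

pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=best_alpha).fit(X_train, y_train)
print(f"best ccp_alpha={best_alpha:.4f}, validation accuracy={best_score:.3f}, "
      f"nodes={pruned.tree_.node_count}")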

Approaches to Determine the Final Tree Size
Separate training (2/3) and testing (1/3) sets
Use cross-validation, e.g., 10-fold cross-validation (see the sketch after this list)
Use all the data for training, but apply a statistical test (e.g., chi-square) to estimate whether expanding or pruning a node may improve the entire distribution
Use the minimum description length (MDL) principle: halt growth of the tree when the encoding is minimized
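For the cross-validation option above, a minimal sketch (again assuming scikit-learn and one of its bundled datasets; this is our own illustration, not something prescribed by the slides):

# Hedged sketch: 10-fold cross-validation to estimate decision tree accuracy.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=10)
print(f"10-fold accuracy: mean={scores.mean():.3f}, std={scores.std():.3f}")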


Enhancements to basic decision tree induction
Allow for continuous-valued attributes
Dynamically define new discrete-valued attributes that partition the continuous attribute values into a discrete set of intervals (see the sketch after this list)
Handle missing attribute values
Assign the most common value of the attribute
Assign a probability to each of the possible values
Attribute construction
Create new attributes based on existing ones that are sparsely represented
This reduces fragmentation, repetition, and replication
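A minimal sketch of the first enhancement (our own code and names): choose a binary threshold for a continuous attribute by information gain, which in effect defines two discrete intervals.

# Illustrative sketch (our own): turn a continuous attribute into a binary
# split (<= t vs > t) by picking the threshold with the highest information gain.
from math import log2

def entropy(labels):
    total = len(labels)
    result = 0.0
    for label in set(labels):
        fraction = labels.count(label) / total
        result -= fraction * log2(fraction)
    return result

def best_threshold(values, labels):
    base = entropy(labels)
    candidates = sorted(set(values))[:-1]     # split points between observed values
    best = None
    for t in candidates:
        left  = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
        information_gain = base - remainder
        if best is None or information_gain > best[1]:
            best = (t, information_gain)
    return best                                # (threshold, information gain)

# Hypothetical ages and class labels:
print(best_threshold([27, 35, 38, 43, 51], ["no", "yes", "yes", "no", "no"]))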

Classification in Large Databases
Classification is a classical problem extensively studied by statisticians and machine learning researchers
Scalability: classifying data sets with millions of examples and hundreds of attributes with reasonable speed
Why decision tree induction in data mining?
Relatively faster learning speed (than other classification methods)
Convertible to simple and easy-to-understand classification rules
Can use SQL queries for accessing databases
Comparable classification accuracy with other methods

Chapter Summary
Decision trees are probably the most popular structure for supervised data mining.
A common algorithm for building a decision tree selects a subset of instances from the training data to construct an initial tree.
The remaining training instances are then used to test the accuracy of the tree.
If any instance is incorrectly classified, the instance is added to the current set of training data and the process is repeated.

A main goal is to minimize the number of tree levels and tree nodes, thereby maximizing data generalization.
Decision trees have been successfully applied to real problems, are easy to understand, and map nicely to a set of production rules.


Reference
Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques (Chapter 7 slides for the textbook), Intelligent Database Systems Research Lab, School of Computing Science, Simon Fraser University, Canada.
