Decision Trees
Mausam
(based on slides by UW-AI faculty)
Decision Trees
To play or not to play?
[Figure: http://www.sfgate.com/blogs/images/sfgate/sgreen/2007/09/05/2240773250x321.jpg]
Attributes and their values:
- Outlook = {sunny, overcast, rain}
- Humidity = {high, normal}
- Wind = {weak, strong}
[Figure: decision tree]
Outlook?
  Sunny -> Humidity?
    High -> No
    Normal -> Yes
  Overcast -> Yes
  Rain -> Wind?
    Strong -> No
    Weak -> Yes
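The tree above can be read as nested conditionals. A minimal sketch, assuming the standard PlayTennis tree structure (the function name and lowercase attribute values are hypothetical):

```python
def play_tennis(outlook, humidity, wind):
    """Classify with the decision tree above: the root tests Outlook;
    the sunny branch tests Humidity, the rain branch tests Wind,
    and overcast is a pure leaf."""
    if outlook == "overcast":
        return "Yes"
    if outlook == "sunny":
        return "Yes" if humidity == "normal" else "No"
    if outlook == "rain":
        return "Yes" if wind == "weak" else "No"
    raise ValueError(f"unknown outlook: {outlook}")

print(play_tennis("sunny", "high", "weak"))      # -> No
print(play_tennis("overcast", "high", "strong")) # -> Yes
```

Only the attributes on the path from root to leaf are ever examined; Wind is irrelevant on a sunny day.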
Decision Trees
Input: a description of an object or a situation through a set of attributes
Output: a decision, the predicted output value for the input
[Figure: a decision tree over numeric attributes x1 and x2 partitions the (x1, x2) plane into axis-aligned regions; http://a.espncdn.com/media/ten/2006/0306/photo/g_mcenroe_195.jpg]
Entropy of a sample with p positive and n negative examples:

$I\left(\frac{p}{p+n}, \frac{n}{p+n}\right) = -\frac{p}{p+n}\log_2\frac{p}{p+n} - \frac{n}{p+n}\log_2\frac{n}{p+n}$
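The entropy formula is straightforward to compute. A minimal sketch, with the convention that 0·log2(0) = 0 so pure samples have zero entropy (the function name is hypothetical):

```python
import math

def binary_entropy(p, n):
    """I(p/(p+n), n/(p+n)) for a sample with p positive and n negative
    examples; a pure sample (p == 0 or n == 0) has entropy 0."""
    if p == 0 or n == 0:
        return 0.0
    q = p / (p + n)
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)

print(binary_entropy(6, 6))  # 1.0: maximal uncertainty at a 50/50 split
print(binary_entropy(6, 0))  # 0.0: a pure sample
```

As the plot below shows, the function peaks at 1 bit when positives and negatives are equally likely and falls to 0 at either extreme.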
Entropy is highest when uncertainty is greatest.

[Figure: entropy I as a function of P(Wait = T); zero at P = 0 (Wait = F) and P = 1 (Wait = T), maximum of 1.0 at P = 0.5]
Average entropy after splitting on attribute A, where the i-th branch receives p_i positive and n_i negative examples:

$\mathrm{AvgEntropy}(A) = \sum_{i} \frac{p_i + n_i}{p + n}\, I\left(\frac{p_i}{p_i + n_i}, \frac{n_i}{p_i + n_i}\right)$
Information gain
Information Gain (IG), or reduction in entropy, from splitting on attribute A:

$IG(A) = I\left(\frac{p}{p+n}, \frac{n}{p+n}\right) - \mathrm{AvgEntropy}(A)$
Worked example with 12 examples (6 positive, 6 negative, so the initial entropy is 1 bit):

$IG(\mathrm{Patrons}) = 1 - \left[\tfrac{2}{12} I(0,1) + \tfrac{4}{12} I(1,0) + \tfrac{6}{12} I\left(\tfrac{2}{6}, \tfrac{4}{6}\right)\right] \approx 0.541 \text{ bits}$

$IG(\mathrm{Type}) = 1 - \left[\tfrac{2}{12} I\left(\tfrac{1}{2}, \tfrac{1}{2}\right) + \tfrac{2}{12} I\left(\tfrac{1}{2}, \tfrac{1}{2}\right) + \tfrac{4}{12} I\left(\tfrac{2}{4}, \tfrac{2}{4}\right) + \tfrac{4}{12} I\left(\tfrac{2}{4}, \tfrac{2}{4}\right)\right] = 0 \text{ bits}$
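These two computations can be checked directly. A minimal sketch, passing each branch of the split as a (p_i, n_i) count pair (function names are hypothetical):

```python
import math

def entropy(p, n):
    """Binary entropy I(p/(p+n), n/(p+n)); pure samples have entropy 0."""
    if p == 0 or n == 0:
        return 0.0
    q = p / (p + n)
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)

def information_gain(p, n, branches):
    """IG(A) = I(p, n) - sum_i (p_i + n_i)/(p + n) * I(p_i, n_i),
    where `branches` lists the (p_i, n_i) counts for each value of A."""
    total = p + n
    avg = sum((pi + ni) / total * entropy(pi, ni) for pi, ni in branches)
    return entropy(p, n) - avg

# Patrons splits the 12 examples into (positive, negative) counts
# (0, 2), (4, 0), (2, 4); Type splits them into four 50/50 branches.
print(round(information_gain(6, 6, [(0, 2), (4, 0), (2, 4)]), 3))        # 0.541
print(round(information_gain(6, 6, [(1, 1), (1, 1), (2, 2), (2, 2)]), 3))  # 0.0
```

Patrons is the more informative attribute; Type leaves every branch as uncertain as the root, so its gain is zero.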
Performance Evaluation
How do we know that the learned tree h approximates the target function f?
Answer: try h on a new test set of examples.
Learning curve = % correct on the test set as a function of training set size.
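Test-set accuracy is just the fraction of held-out examples the hypothesis labels correctly. A minimal sketch (the function name and the toy data are hypothetical):

```python
def accuracy(h, test_set):
    """Fraction of (x, y) test examples on which hypothesis h(x) == y."""
    correct = sum(1 for x, y in test_set if h(x) == y)
    return correct / len(test_set)

# Toy check with a constant hypothesis that always predicts "Yes":
h = lambda x: "Yes"
test_examples = [({"outlook": "sunny"}, "Yes"), ({"outlook": "rain"}, "No")]
print(accuracy(h, test_examples))  # 0.5
```

Plotting this accuracy against increasing training-set sizes yields the learning curve described above.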
Overfitting

[Figure: accuracy (roughly 0.6 to 0.9) as the tree grows: accuracy on the training data keeps rising, while accuracy on the test data levels off and drops]
28
29
Test
Tune
Tune
Tune
30
[Figures: candidate pruned trees built from the Outlook / Humidity / Wind tree, each obtained by replacing one subtree (the Humidity test or the Wind test) with a single leaf labeled Play or Don't play]
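Choosing among such pruning candidates can be sketched as model selection on a held-out validation set: keep the candidate with the highest validation accuracy. A minimal sketch, assuming each candidate tree is represented by a classify function (all names and the toy data are hypothetical):

```python
def validation_accuracy(classify, examples):
    """Fraction of (x, y) validation examples that classify(x) gets right."""
    return sum(classify(x) == y for x, y in examples) / len(examples)

def best_pruned_tree(candidates, validation_set):
    """Return the name of the candidate tree with the highest
    accuracy on the validation set."""
    return max(candidates,
               key=lambda name: validation_accuracy(candidates[name],
                                                    validation_set))

# Toy illustration: here the full tree fits the validation data better,
# so pruning the subtree would not be accepted.
validation = [(1, "Play"), (2, "Play"), (3, "Don't play")]
candidates = {
    "full tree":   lambda x: "Play" if x < 3 else "Don't play",
    "pruned leaf": lambda x: "Play",
}
print(best_pruned_tree(candidates, validation))  # full tree
```

When a pruned candidate ties or wins on validation accuracy, the smaller tree is preferred, which is what combats overfitting.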
Early Stopping

[Figure: accuracy (roughly 0.6 to 0.9) on the training, test, and validation data as training proceeds]
Conversion to Rules

Outlook?
  Sunny -> Humidity?
    High -> Don't play
    Low -> Play
  Overcast -> Play
  Rain -> Wind?
    Strong -> Don't play
    Weak -> Play

Each root-to-leaf path becomes one rule, e.g.:
IF Outlook = Sunny AND Humidity = High THEN Don't play
IF Outlook = Overcast THEN Play
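The path-to-rule conversion can be sketched mechanically. A minimal example, assuming a tree encoded as nested dicts of the form {attribute: {value: subtree}} with string leaves (the encoding and function name are hypothetical):

```python
def tree_to_rules(tree, conditions=()):
    """Turn each root-to-leaf path into an IF-THEN rule string."""
    if isinstance(tree, str):  # leaf: emit one rule for this path
        cond = " AND ".join(f"{a} = {v}" for a, v in conditions) or "TRUE"
        return [f"IF {cond} THEN {tree}"]
    rules = []
    (attr, branches), = tree.items()  # exactly one attribute test per node
    for value, subtree in branches.items():
        rules += tree_to_rules(subtree, conditions + ((attr, value),))
    return rules

tree = {"Outlook": {
    "Sunny": {"Humidity": {"High": "Don't play", "Low": "Play"}},
    "Overcast": "Play",
    "Rain": {"Wind": {"Strong": "Don't play", "Weak": "Play"}},
}}
for rule in tree_to_rules(tree):
    print(rule)
# e.g. IF Outlook = Sunny AND Humidity = High THEN Don't play
```

The five leaves of the tree yield five rules, one per root-to-leaf path, and each rule can then be simplified independently.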