[Figure: truth tables defining Boolean functions f(x1, x2), shown with corresponding decision trees]
Inductive learning method
Construct/adjust h to agree with f on the training set
(h is consistent if it agrees with f on all examples)
E.g., curve fitting:
How do we choose from among multiple consistent hypotheses?
Ockham's razor: prefer the simplest hypothesis consistent with the data
Learning decision trees
Problem: decide whether to wait for a table at a restaurant, based on the following attributes:
1. Alternate: is there an alternative restaurant nearby?
2. Bar: is there a comfortable bar area to wait in?
3. Fri/Sat: is today Friday or Saturday?
4. Hungry: are we hungry?
5. Patrons: number of people in the restaurant (None, Some, Full)
6. Price: price range ($, $$, $$$)
7. Raining: is it raining outside?
8. Reservation: have we made a reservation?
9. Type: kind of restaurant (French, Italian, Thai, Burger)
10. WaitEstimate: estimated waiting time (0-10, 10-30, 30-60, >60)
The Restaurant Domain
Attributes (Fri through Est) and goal (WillWait):

Example  Fri  Hun  Pat   Price  Rain  Res  Type     Est    WillWait
X1       No   Yes  Some  $$$    No    Yes  French   0-10   Yes
X2       No   Yes  Full  $      No    No   Thai     30-60  No
X3       No   No   Some  $      No    No   Burger   0-10   Yes
X4       Yes  Yes  Full  $      No    No   Thai     10-30  Yes
X5       Yes  No   Full  $$$    No    Yes  French   >60    No
X6       No   Yes  Some  $$     Yes   Yes  Italian  0-10   Yes
X7       No   No   None  $      Yes   No   Burger   0-10   No
X8       No   Yes  Some  $$     Yes   Yes  Thai     0-10   Yes
X9       Yes  No   Full  $      Yes   No   Burger   >60    No
X10      Yes  Yes  Full  $$$    No    Yes  Italian  10-30  No
X11      No   No   None  $      No    No   Thai     0-10   No
X12      Yes  Yes  Full  $      No    No   Burger   30-60  Yes

Will we wait, or not?
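For experimenting with the examples that follow, the table can be encoded directly. The encoding below is one possible sketch (the variable names are mine, and only the columns shown in the table are included):

```python
# The restaurant examples, transcribed from the table above.
# The last field of each row is the goal attribute WillWait.
COLUMNS = ["Fri", "Hun", "Pat", "Price", "Rain", "Res", "Type", "Est"]

ROWS = [
    # Fri   Hun    Pat     Price  Rain   Res    Type       Est      WillWait
    ("No",  "Yes", "Some", "$$$", "No",  "Yes", "French",  "0-10",  "Yes"),  # X1
    ("No",  "Yes", "Full", "$",   "No",  "No",  "Thai",    "30-60", "No"),   # X2
    ("No",  "No",  "Some", "$",   "No",  "No",  "Burger",  "0-10",  "Yes"),  # X3
    ("Yes", "Yes", "Full", "$",   "No",  "No",  "Thai",    "10-30", "Yes"),  # X4
    ("Yes", "No",  "Full", "$$$", "No",  "Yes", "French",  ">60",   "No"),   # X5
    ("No",  "Yes", "Some", "$$",  "Yes", "Yes", "Italian", "0-10",  "Yes"),  # X6
    ("No",  "No",  "None", "$",   "Yes", "No",  "Burger",  "0-10",  "No"),   # X7
    ("No",  "Yes", "Some", "$$",  "Yes", "Yes", "Thai",    "0-10",  "Yes"),  # X8
    ("Yes", "No",  "Full", "$",   "Yes", "No",  "Burger",  ">60",   "No"),   # X9
    ("Yes", "Yes", "Full", "$$$", "No",  "Yes", "Italian", "10-30", "No"),   # X10
    ("No",  "No",  "None", "$",   "No",  "No",  "Thai",    "0-10",  "No"),   # X11
    ("Yes", "Yes", "Full", "$",   "No",  "No",  "Burger",  "30-60", "Yes"),  # X12
]

# Pair an attribute dictionary with the WillWait label for each example.
EXAMPLES = [(dict(zip(COLUMNS, row[:-1])), row[-1]) for row in ROWS]
```

Note the set is balanced: six positive and six negative examples, which is why the initial entropy computed later is exactly 1 bit.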
Splitting Examples by Testing on Attributes
+ X1, X3, X4, X6, X8, X12 (positive examples)
- X2, X5, X7, X9, X10, X11 (negative examples)
Splitting on Patrons?:
  None: - X7, X11                       → No
  Some: + X1, X3, X6, X8               → Yes
  Full: + X4, X12; - X2, X5, X9, X10
Decision tree learning example
Induced tree (from examples) vs. the true tree.
Goal predicate: will we wait for a table?
[Tree figure: root Patrons? with branches none, some, full; deeper tests WaitEst? (>60, 30-60, 10-30, 0-10) and Hungry? (yes, no), ending in Yes/No leaves]
Logical Representation of a Path
∀r [Patrons(r, Full) ∧ Wait_Estimate(r, 10-30) ∧ Hungry(r, Yes)] ⇒ Will_Wait(r)
Choosing an attribute
Idea: a good attribute splits the examples into subsets that are (ideally) "all positive" or "all negative".
Patrons? is a better choice:

Patrons?
  None: - X7, X11
  Some: + X1, X3, X6, X8
  Full: + X4, X12; - X2, X5, X9, X10

Type?
  French:  + X1;       - X5
  Italian: + X6;       - X10
  Thai:    + X4, X8;   - X2, X11
  Burger:  + X3, X12;  - X7, X9
What Makes a Good Attribute?
[Figure: a better attribute yields nearly pure subsets; a worse attribute leaves each subset mixed]
Decision tree learning example: choosing the attribute to test
T = True, F = False. The full training set has 6 True and 6 False examples:
Entropy = -(6/12) log2(6/12) - (6/12) log2(6/12) = 1
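The entropy computation above can be sketched in Python (a minimal helper of my own, not code from the slides):

```python
import math

def entropy(counts):
    """Entropy in bits of a class distribution given by its counts."""
    total = sum(counts)
    # Terms with zero count contribute 0 (the 0 * log 0 = 0 convention).
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# The full training set: 6 True and 6 False examples -> maximum uncertainty.
print(entropy([6, 6]))  # 1.0
```

A pure node, e.g. `entropy([4, 0])`, gives 0: no uncertainty remains about the label.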
Decision tree learning example
Alternate?  Yes: 3 T, 3 F   No: 3 T, 3 F
Entropy = (6/12)[-(3/6) log2(3/6) - (3/6) log2(3/6)] + (6/12)[-(3/6) log2(3/6) - (3/6) log2(3/6)] = 1
Entropy decrease for Alternate = 1 - 1 = 0
Decision tree learning example
Bar?  Yes: 3 T, 3 F   No: 3 T, 3 F
Entropy = (6/12)[-(3/6) log2(3/6) - (3/6) log2(3/6)] + (6/12)[-(3/6) log2(3/6) - (3/6) log2(3/6)] = 1
Entropy decrease for Bar = 1 - 1 = 0
Decision tree learning example
Fri/Sat?  Yes: 2 T, 3 F   No: 4 T, 3 F
Entropy = (5/12)[-(2/5) log2(2/5) - (3/5) log2(3/5)] + (7/12)[-(4/7) log2(4/7) - (3/7) log2(3/7)] = 0.98
Entropy decrease for Fri/Sat = 1 - 0.98 = 0.02
Decision tree learning example
Hungry?  Yes: 5 T, 2 F   No: 1 T, 4 F
Entropy = (7/12)[-(5/7) log2(5/7) - (2/7) log2(2/7)] + (5/12)[-(1/5) log2(1/5) - (4/5) log2(4/5)] = 0.804
Entropy decrease for Hungry = 1 - 0.804 = 0.19
Decision tree learning example
Raining?  Yes: 2 T, 2 F   No: 4 T, 4 F
Entropy = (4/12)[-(2/4) log2(2/4) - (2/4) log2(2/4)] + (8/12)[-(4/8) log2(4/8) - (4/8) log2(4/8)] = 1
Entropy decrease for Raining = 1 - 1 = 0
Decision tree learning example
Reservation?  Yes: 3 T, 2 F   No: 3 T, 4 F
Entropy = (5/12)[-(3/5) log2(3/5) - (2/5) log2(2/5)] + (7/12)[-(3/7) log2(3/7) - (4/7) log2(4/7)] = 0.978
Entropy decrease for Reservation = 1 - 0.978 = 0.02
Decision tree learning example
Patrons?  None: 2 F   Some: 4 T   Full: 2 T, 4 F
Taking 0 · log2 0 = 0:
Entropy = (2/12)[-(0/2) log2(0/2) - (2/2) log2(2/2)] + (4/12)[-(4/4) log2(4/4) - (0/4) log2(0/4)] + (6/12)[-(2/6) log2(2/6) - (4/6) log2(4/6)] = 0.456
Entropy decrease for Patrons = 1 - 0.456 = 0.543
Decision tree learning example
Price?  $: 3 T, 3 F   $$: 1 T, 3 F   $$$: 2 T
Entropy = (6/12)[-(3/6) log2(3/6) - (3/6) log2(3/6)] + (2/12)[-(2/2) log2(2/2) - (0/2) log2(0/2)] + (4/12)[-(1/4) log2(1/4) - (3/4) log2(3/4)] = 0.77
Entropy decrease for Price = 1 - 0.77 = 0.23
Decision tree learning example
Type?  French: 1 T, 1 F   Italian: 1 T, 1 F   Thai: 2 T, 2 F   Burger: 2 T, 2 F
Entropy = (2/12)[-(1/2) log2(1/2) - (1/2) log2(1/2)] + (2/12)[-(1/2) log2(1/2) - (1/2) log2(1/2)] + (4/12)[-(2/4) log2(2/4) - (2/4) log2(2/4)] + (4/12)[-(2/4) log2(2/4) - (2/4) log2(2/4)] = 1
Entropy decrease for Type = 1 - 1 = 0
Decision tree learning example
Est. waiting time?  0-10: 4 T, 2 F   10-30: 1 T, 1 F   30-60: 1 T, 1 F   >60: 2 F
Entropy = (6/12)[-(4/6) log2(4/6) - (2/6) log2(2/6)] + (2/12)[-(1/2) log2(1/2) - (1/2) log2(1/2)] + (2/12)[-(1/2) log2(1/2) - (1/2) log2(1/2)] + (2/12)[-(0/2) log2(0/2) - (2/2) log2(2/2)] = 0.792
Entropy decrease for Est = 1 - 0.792 = 0.21
Entropy for each Attribute
Entropy decrease for Alternate = 1 - 1 = 0
Entropy decrease for Bar = 1 - 1 = 0
Entropy decrease for Fri/Sat = 1 - 0.98 = 0.02
Entropy decrease for Hungry = 1 - 0.804 = 0.19
Entropy decrease for Raining = 1 - 1 = 0
Entropy decrease for Reservation = 1 - 0.978 = 0.02
Entropy decrease for Patrons = 1 - 0.456 = 0.543
Entropy decrease for Price = 1 - 0.77 = 0.23
Entropy decrease for Est = 1 - 0.792 = 0.21
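The per-attribute entropy decreases can be reproduced mechanically. The sketch below is my own code over the 12 examples from the table (only the columns shown there); note that exact arithmetic gives about 0.541 for Patrons, while the slides' 0.543 reflects intermediate rounding:

```python
import math
from collections import Counter

# Restaurant examples from the table; last field is the WillWait label.
COLS = ["Fri", "Hun", "Pat", "Price", "Rain", "Res", "Type", "Est"]
DATA = [
    ("No",  "Yes", "Some", "$$$", "No",  "Yes", "French",  "0-10",  "Yes"),  # X1
    ("No",  "Yes", "Full", "$",   "No",  "No",  "Thai",    "30-60", "No"),   # X2
    ("No",  "No",  "Some", "$",   "No",  "No",  "Burger",  "0-10",  "Yes"),  # X3
    ("Yes", "Yes", "Full", "$",   "No",  "No",  "Thai",    "10-30", "Yes"),  # X4
    ("Yes", "No",  "Full", "$$$", "No",  "Yes", "French",  ">60",   "No"),   # X5
    ("No",  "Yes", "Some", "$$",  "Yes", "Yes", "Italian", "0-10",  "Yes"),  # X6
    ("No",  "No",  "None", "$",   "Yes", "No",  "Burger",  "0-10",  "No"),   # X7
    ("No",  "Yes", "Some", "$$",  "Yes", "Yes", "Thai",    "0-10",  "Yes"),  # X8
    ("Yes", "No",  "Full", "$",   "Yes", "No",  "Burger",  ">60",   "No"),   # X9
    ("Yes", "Yes", "Full", "$$$", "No",  "Yes", "Italian", "10-30", "No"),   # X10
    ("No",  "No",  "None", "$",   "No",  "No",  "Thai",    "0-10",  "No"),   # X11
    ("Yes", "Yes", "Full", "$",   "No",  "No",  "Burger",  "30-60", "Yes"),  # X12
]

def entropy(labels):
    """Entropy in bits of a multiset of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain(attr):
    """Entropy decrease from splitting all 12 examples on one attribute."""
    i = COLS.index(attr)
    labels = [row[-1] for row in DATA]
    remainder = 0.0
    for value in {row[i] for row in DATA}:
        subset = [row[-1] for row in DATA if row[i] == value]
        remainder += len(subset) / len(DATA) * entropy(subset)
    return entropy(labels) - remainder

best = max(COLS, key=gain)
print(best, round(gain(best), 3))  # Pat gives the largest entropy decrease
```

Note the table here lacks Alternate and Bar columns, so those two zero-gain attributes are simply absent from `COLS`.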
Decision tree learning example
The largest entropy decrease (0.543) is achieved by splitting on Patrons:
Patrons?  None: 2 F → No   Some: 4 T → Yes   Full: 2 T, 4 F → test another attribute X?
Continue like this, making new splits, always purifying nodes.
Next step
Given Patrons as the root node, the next attribute chosen is Hungry?
Within the Full branch:  Hungry = Yes: 2 T, 2 F   Hungry = No: 2 F
Entropy = (4/12)[-(2/4) log2(2/4) - (2/4) log2(2/4)] + (2/12)[-(0/2) log2(0/2) - (2/2) log2(2/2)] = 0.333
Entropy decrease for Hungry = 1 - 0.333 = 0.666
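Note that the 0.333 figure weights the branches by the full training-set size of 12, not by the 6 examples in the Full branch. A quick numeric check (my own sketch; the binary-entropy helper is an assumption, not slide code):

```python
import math

def entropy2(p):
    """Binary entropy in bits of a positive-class probability p."""
    if p in (0.0, 1.0):  # 0 * log 0 = 0 convention
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Full branch: Hungry=Yes has 2 T, 2 F; Hungry=No has 0 T, 2 F.
remainder = (4 / 12) * entropy2(2 / 4) + (2 / 12) * entropy2(0 / 2)
print(round(remainder, 3))  # 0.333
```

Weighting by the 6 Full-branch examples instead (4/6 and 2/6) would give 0.667, so the convention chosen changes the numbers but not the ranking of attributes within a branch.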
Decision tree learning
Aim: find a small tree consistent with the training examples
Idea: (recursively) choose the "most significant" attribute as the root of each (sub)tree.
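The recursive idea can be sketched as a minimal ID3-style learner. This is an illustration under my own naming (`learn_tree`, tuple-encoded nodes), not the exact algorithm from the slides:

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy in bits of a multiset of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def learn_tree(examples, attributes):
    """examples: list of (attribute_dict, label) pairs.
    Returns a label (leaf) or an (attribute, {value: subtree}) node."""
    labels = [lab for _, lab in examples]
    if len(set(labels)) == 1:           # pure node: stop splitting
        return labels[0]
    if not attributes:                   # no tests left: majority label
        return Counter(labels).most_common(1)[0][0]

    def gain(a):                         # entropy decrease for attribute a
        rem = 0.0
        for v in {ex[a] for ex, _ in examples}:
            sub = [lab for ex, lab in examples if ex[a] == v]
            rem += len(sub) / len(examples) * entropy(sub)
        return entropy(labels) - rem

    best = max(attributes, key=gain)     # "most significant" attribute
    rest = [a for a in attributes if a != best]
    return (best, {v: learn_tree([(ex, lab) for ex, lab in examples
                                  if ex[best] == v], rest)
                   for v in {ex[best] for ex, _ in examples}})

# Tiny usage example on toy data (hypothetical, not the restaurant set):
toy = [({"Pat": "Some"}, "Yes"), ({"Pat": "None"}, "No"), ({"Pat": "Some"}, "Yes")]
tree = learn_tree(toy, ["Pat"])
print(tree)  # root tests Pat; each branch ends in a pure leaf
```

Splitting the restaurant examples with this learner reproduces the slides' choices: Patrons at the root, then Hungry inside the Full branch.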
Thank You