
Chapter 8: Learning

By Safa Hamdare


Learning
Learning is essential for unknown environments,
i.e., when the designer lacks omniscience

Learning is useful as a system construction
method,
i.e., expose the agent to reality rather than trying to
write it down

Learning modifies the agent's decision
mechanisms to improve performance
Learning Agent
Four Components
1. Performance Element: collection of knowledge
and procedures to decide on the next action.
E.g. walking, turning, drawing, etc.
2. Learning Element: takes in feedback from the
critic and modifies the performance element
accordingly.
3. Critic: provides the learning element with
information on how well the agent is doing based on
a fixed performance standard.
E.g. the audience
4. Problem Generator: provides the performance
element with suggestions on new actions to take.
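As a rough sketch of how these four components fit together (every name below is illustrative, not a standard API), one step of the agent loop might look like this in Python:

# One step of the agent loop, with the four components passed as callables.
def agent_step(percept, performance_element, learning_element, critic, problem_generator):
    feedback = critic(percept)                        # critic: score behaviour vs. a fixed standard
    learning_element(feedback)                        # learning element: adjust the performance element
    # Usually act on current knowledge; sometimes try a suggested new action.
    return problem_generator(percept) or performance_element(percept)

# Toy instantiation: every component below is made up for illustration.
action = agent_step(
    percept="audience applauds",
    performance_element=lambda p: "keep walking",
    learning_element=lambda fb: None,                 # no-op learner for the demo
    critic=lambda p: +1 if "applauds" in p else -1,
    problem_generator=lambda p: None,                 # no exploratory suggestion this step
)
print(action)  # -> "keep walking"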


Designing the Learning Element
Design of a learning element is affected by
4 major issues:
1. Which components of the performance
element to improve
2. The representation of those
components
3. Available feedback
4. Prior knowledge
Two types of learning in AI
1. Deductive: Deduce rules/facts from already
known rules/facts. (We have already dealt
with this)



2. Inductive: Learn new rules/facts from a data
set D.
E.g. deduce (A ⇒ C) from (A ⇒ B) and (B ⇒ C); induce a rule (A ⇒ C) from a data set D = {x(n), y(n)}, n = 1, ..., N.
Types of Inductive Learning
1. Supervised learning: inputs and outputs available. For
every input, the learner is provided with a target; that is,
the environment tells the learner what its response should be.

[Figure: example inputs labelled M, M, M, F, F, F]

2. Unsupervised learning: no hint of the correct outcome.
The learner receives no feedback from the world at all, just
examples (e.g. the same figures as above, without the labels).
3. Reinforcement learning: evaluation of actions. The
learner receives feedback about the appropriateness of its
response, i.e. occasional rewards.
Comparison between types of
inductive learning
1. Supervised: The machine has access to
a teacher who corrects it.


2. Unsupervised: No access to a teacher.
Instead, the machine must search for
order and structure in the
environment.
Inductive Learning
Key idea:
To use specific examples to reach
general conclusions
Given a set of examples, the system
tries to approximate the evaluation
function.
Also called Pure Inductive Inference.
Recognizing Handwritten Digits
[Figure: a learning agent presented with training examples; different variations of handwritten 3s]
Bias
Bias: Any preference for one hypothesis
over another, beyond mere consistency
with the examples.
Since there are almost always a large
number of possible consistent hypotheses,
all learning algorithms exhibit some sort
of bias.
Example: preferring the simplest curve that fits a set of
data points over a more complex curve that also fits them.

Formal Definition for Inductive Learning
Simplest form: learn a function from
examples
Example: a pair (x, f(x)), where
x is the input,
f(x) is the output of the target function
hypothesis: a function h that
approximates f, given a set of examples.
Task of induction: Given a set of examples,
find a function h that approximates the true
evaluation function f.

Inductive learning - Example 1
f(x) is the target function
An example is a pair [x, f(x)]
Learning task: find a hypothesis h such that h(x) ≈ f(x), given a
training set of examples D = {[xᵢ, f(xᵢ)]}, i = 1, 2, ..., N




x = (0, 0, 1, 0, 1, 0, 1, 1, 1)ᵀ,  f(x) = +1

x = (0, 0, 1, 1, 1, 0, 0, 1, 1)ᵀ,  f(x) = +1

x = (0, 1, 0, 0, 1, 0, 1, 1, 1)ᵀ,  f(x) = +1
Etc...
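Concretely, such a training set can be stored as (vector, label) pairs; a minimal sketch, reusing the binary vectors above (the hypothesis h is just one arbitrary consistent choice):

# Training set D: each example is a pair (x, f(x)) with x a binary feature vector.
D = [
    ((0, 0, 1, 0, 1, 0, 1, 1, 1), +1),
    ((0, 0, 1, 1, 1, 0, 0, 1, 1), +1),
    ((0, 1, 0, 0, 1, 0, 1, 1, 1), +1),
]

# A hypothesis h is any function from a vector to a label; the task of
# induction is to choose h so that h(x) == f(x) on (at least) these examples.
h = lambda x: +1 if sum(x) >= 5 else -1   # one of many consistent hypotheses
assert all(h(x) == y for x, y in D)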
Inductive learning method
Construct/adjust h to agree with f on training set
(h is consistent if it agrees with f on all examples)
E.g., curve fitting:
[Figure: the same data points fitted by successively more complex curves]
How do we choose from among multiple consistent
hypotheses?







Ockham's razor: prefer the simplest hypothesis
consistent with the data
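A small numpy sketch of this situation, with made-up data points: several polynomials of different degrees are (nearly) consistent with the same training set, and Ockham's razor says to keep the lowest-degree one that fits:

import numpy as np

# Five made-up (x, f(x)) training pairs.
xs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
ys = np.array([0.1, 0.9, 2.1, 2.9, 4.1])

# Hypotheses of increasing complexity; the degree-4 polynomial threads
# every point exactly, but the degree-1 line already explains the data.
for degree in (1, 2, 4):
    coefficients = np.polyfit(xs, ys, degree)
    worst_error = np.max(np.abs(np.polyval(coefficients, xs) - ys))
    print(f"degree {degree}: max training error = {worst_error:.3f}")

# Ockham's razor: among hypotheses that fit acceptably well,
# prefer the simplest (here, the lowest degree).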

Learning decision trees
Problem: decide whether to wait for a table at
a restaurant, based on the following attributes:
1. Alternate: is there an alternative restaurant nearby?
2. Bar: is there a comfortable bar area to wait in?
3. Fri/Sat: is today Friday or Saturday?
4. Hungry: are we hungry?
5. Patrons: number of people in the restaurant (None, Some, Full)
6. Price: price range ($, $$, $$$)
7. Raining: is it raining outside?
8. Reservation: have we made a reservation?
9. Type: kind of restaurant (French, Italian, Thai, Burger)
10. WaitEstimate: estimated waiting time (0-10, 10-30, 30-60, >60)
The Restaurant Domain
Example | Fri | Hun | Pat  | Price | Rain | Res | Type    | Est   | WillWait
X1      | No  | Yes | Some | $$$   | No   | Yes | French  | 0-10  | Yes
X2      | No  | Yes | Full | $     | No   | No  | Thai    | 30-60 | No
X3      | No  | No  | Some | $     | No   | No  | Burger  | 0-10  | Yes
X4      | Yes | Yes | Full | $     | No   | No  | Thai    | 10-30 | Yes
X5      | Yes | No  | Full | $$$   | No   | Yes | French  | >60   | No
X6      | No  | Yes | Some | $$    | Yes  | Yes | Italian | 0-10  | Yes
X7      | No  | No  | None | $     | Yes  | No  | Burger  | 0-10  | No
X8      | No  | Yes | Some | $$    | Yes  | Yes | Thai    | 0-10  | Yes
X9      | Yes | No  | Full | $     | Yes  | No  | Burger  | >60   | No
X10     | Yes | Yes | Full | $$$   | No   | Yes | Italian | 10-30 | No
X11     | No  | No  | None | $     | No   | No  | Thai    | 0-10  | No
X12     | Yes | Yes | Full | $     | No   | No  | Burger  | 30-60 | Yes
Will we wait, or not?
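For the entropy calculations that follow, it helps to have the twelve examples in machine-readable form; a sketch in Python (column names abbreviated as in the table above):

# The 12 restaurant examples as (attributes, WillWait) pairs.
COLUMNS = ("Fri", "Hun", "Pat", "Price", "Rain", "Res", "Type", "Est")
ROWS = [
    # Fri   Hun    Pat     Price  Rain   Res    Type       Est      WillWait
    ("No",  "Yes", "Some", "$$$", "No",  "Yes", "French",  "0-10",  True),   # X1
    ("No",  "Yes", "Full", "$",   "No",  "No",  "Thai",    "30-60", False),  # X2
    ("No",  "No",  "Some", "$",   "No",  "No",  "Burger",  "0-10",  True),   # X3
    ("Yes", "Yes", "Full", "$",   "No",  "No",  "Thai",    "10-30", True),   # X4
    ("Yes", "No",  "Full", "$$$", "No",  "Yes", "French",  ">60",   False),  # X5
    ("No",  "Yes", "Some", "$$",  "Yes", "Yes", "Italian", "0-10",  True),   # X6
    ("No",  "No",  "None", "$",   "Yes", "No",  "Burger",  "0-10",  False),  # X7
    ("No",  "Yes", "Some", "$$",  "Yes", "Yes", "Thai",    "0-10",  True),   # X8
    ("Yes", "No",  "Full", "$",   "Yes", "No",  "Burger",  ">60",   False),  # X9
    ("Yes", "Yes", "Full", "$$$", "No",  "Yes", "Italian", "10-30", False),  # X10
    ("No",  "No",  "None", "$",   "No",  "No",  "Thai",    "0-10",  False),  # X11
    ("Yes", "Yes", "Full", "$",   "No",  "No",  "Burger",  "30-60", True),   # X12
]
EXAMPLES = [(dict(zip(COLUMNS, row[:-1])), row[-1]) for row in ROWS]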
Splitting Examples by Testing on
Attributes
+ X1, X3, X4, X6, X8, X12 (Positive examples)
- X2, X5, X7, X9, X10, X11 (Negative examples)
Splitting Examples by Testing on
Attributes (cont)
+ X1, X3, X4, X6, X8, X12 (positive examples)
- X2, X5, X7, X9, X10, X11 (negative examples)
Splitting on Patrons?
  none: - X7, X11 → No
  some: + X1, X3, X6, X8 → Yes
  full: + X4, X12; - X2, X5, X9, X10
Decision tree learning example
Induced tree (from examples)
[Figure: the decision tree induced from the 12 training examples]
Decision tree learning example
True tree
Goal predicate: will we wait for a table?
[Figure: the true tree. Root test Patrons? (none → No, some → Yes, full → WaitEst?); WaitEst? branches on >60, 30-60, 10-30, 0-10, with a further Hungry? (yes/no) test below it]
Logical Representation of a Path
∀r [Patrons(r, full) ∧ Wait_Estimate(r, 10-30) ∧
Hungry(r, yes)] ⇒ Will_Wait(r)
Choosing an attribute
Idea: a good attribute splits the examples into
subsets that are (ideally) "all positive" or "all
negative"








Patrons? is a better choice
Splitting on Patrons?
  none: - X7, X11
  some: + X1, X3, X6, X8
  full: + X4, X12; - X2, X5, X9, X10

Splitting on Type?
  French: + X1; - X5
  Italian: + X6; - X10
  Thai: + X4, X8; - X2, X11
  Burger: + X3, X12; - X7, X9
What Makes a Good Attribute?
[Figure: comparing two splits. A better attribute separates the examples into nearly pure subsets; a not-as-good attribute leaves every subset mixed]
Decision tree learning example:
Choosing an attribute test
T = True, F = False
The full training set contains 6 True and 6 False examples:
Entropy = −(6/12) log₂(6/12) − (6/12) log₂(6/12) = 1
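The same number falls out of a small Python helper (using the convention 0 · log₂ 0 = 0):

from math import log2

def entropy(pos, neg):
    """Entropy (in bits) of a set with `pos` positive and `neg` negative examples."""
    total = pos + neg
    bits = 0.0
    for count in (pos, neg):
        if count:                      # convention: 0 * log2(0) = 0
            p = count / total
            bits -= p * log2(p)
    return bits

print(entropy(6, 6))   # 1.0: the 6 True / 6 False training set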
Decision tree learning example
Alternate?  (Yes branch: 3 T, 3 F; No branch: 3 T, 3 F)
Entropy = (6/12)[−(3/6) log₂(3/6) − (3/6) log₂(3/6)]
        + (6/12)[−(3/6) log₂(3/6) − (3/6) log₂(3/6)] = 1
Entropy decrease for Alternate = 1 − 1 = 0
Decision tree learning example
Bar?  (Yes branch: 3 T, 3 F; No branch: 3 T, 3 F)
Entropy = (6/12)[−(3/6) log₂(3/6) − (3/6) log₂(3/6)]
        + (6/12)[−(3/6) log₂(3/6) − (3/6) log₂(3/6)] = 1
Entropy decrease for Bar = 1 − 1 = 0
Decision tree learning example
Fri/Sat?  (Yes branch: 2 T, 3 F; No branch: 4 T, 3 F)
Entropy = (5/12)[−(2/5) log₂(2/5) − (3/5) log₂(3/5)]
        + (7/12)[−(4/7) log₂(4/7) − (3/7) log₂(3/7)] = 0.98
Entropy decrease for Fri/Sat = 1 − 0.98 = 0.02
Decision tree learning example
Hungry?  (Yes branch: 5 T, 2 F; No branch: 1 T, 4 F)
Entropy = (7/12)[−(5/7) log₂(5/7) − (2/7) log₂(2/7)]
        + (5/12)[−(1/5) log₂(1/5) − (4/5) log₂(4/5)] = 0.804
Entropy decrease for Hungry = 1 − 0.804 = 0.19
Decision tree learning example
Raining?  (Yes branch: 2 T, 2 F; No branch: 4 T, 4 F)
Entropy = (4/12)[−(2/4) log₂(2/4) − (2/4) log₂(2/4)]
        + (8/12)[−(4/8) log₂(4/8) − (4/8) log₂(4/8)] = 1
Entropy decrease for Raining = 1 − 1 = 0
Decision tree learning example
Reservation?  (Yes branch: 3 T, 2 F; No branch: 3 T, 4 F)
Entropy = (5/12)[−(3/5) log₂(3/5) − (2/5) log₂(2/5)]
        + (7/12)[−(3/7) log₂(3/7) − (4/7) log₂(4/7)] = 0.978
Entropy decrease for Reservation = 1 − 0.978 = 0.02
Decision tree learning example
Patrons?  (None: 2 F; Some: 4 T; Full: 2 T, 4 F)
Entropy = (2/12)[−(0/2) log₂(0/2) − (2/2) log₂(2/2)]
        + (4/12)[−(4/4) log₂(4/4) − (0/4) log₂(0/4)]
        + (6/12)[−(2/6) log₂(2/6) − (4/6) log₂(4/6)] = 0.456
(taking 0 · log₂ 0 = 0)
Entropy decrease for Patrons = 1 − 0.456 = 0.543
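Using the entropy helper and the EXAMPLES list sketched earlier, this gain can be checked directly; remainder is the usual name for the weighted post-split entropy:

from collections import Counter

def remainder(attribute):
    """Weighted entropy of the subsets produced by splitting on `attribute`."""
    total = len(EXAMPLES)
    rem = 0.0
    for value in {attrs[attribute] for attrs, _ in EXAMPLES}:
        labels = [label for attrs, label in EXAMPLES if attrs[attribute] == value]
        counts = Counter(labels)
        rem += len(labels) / total * entropy(counts[True], counts[False])
    return rem

print(1 - remainder("Pat"))    # about 0.54: the slide's 0.543, up to rounding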
Decision tree learning example
Price?  ($: 3 T, 3 F; $$: 2 T; $$$: 1 T, 3 F)
Entropy = (6/12)[−(3/6) log₂(3/6) − (3/6) log₂(3/6)]
        + (2/12)[−(2/2) log₂(2/2) − (0/2) log₂(0/2)]
        + (4/12)[−(1/4) log₂(1/4) − (3/4) log₂(3/4)] = 0.77
Entropy decrease for Price = 1 − 0.77 = 0.23
Decision tree learning example
Type?  (French: 1 T, 1 F; Italian: 1 T, 1 F; Thai: 2 T, 2 F; Burger: 2 T, 2 F)
Entropy = (2/12)[−(1/2) log₂(1/2) − (1/2) log₂(1/2)]
        + (2/12)[−(1/2) log₂(1/2) − (1/2) log₂(1/2)]
        + (4/12)[−(2/4) log₂(2/4) − (2/4) log₂(2/4)]
        + (4/12)[−(2/4) log₂(2/4) − (2/4) log₂(2/4)] = 1
Entropy decrease for Type = 1 − 1 = 0
Decision tree learning example
Est. waiting time?  (0-10: 4 T, 2 F; 10-30: 1 T, 1 F; 30-60: 1 T, 1 F; >60: 2 F)
Entropy = (6/12)[−(4/6) log₂(4/6) − (2/6) log₂(2/6)]
        + (2/12)[−(1/2) log₂(1/2) − (1/2) log₂(1/2)]
        + (2/12)[−(1/2) log₂(1/2) − (1/2) log₂(1/2)]
        + (2/12)[−(0/2) log₂(0/2) − (2/2) log₂(2/2)] = 0.792
Entropy decrease for Est = 1 − 0.792 = 0.21
Entropy for each Attribute
Entropy decrease for Alternate = 1 − 1 = 0
Entropy decrease for Bar = 1 − 1 = 0
Entropy decrease for Fri/Sat = 1 − 0.98 = 0.02
Entropy decrease for Hungry = 1 − 0.804 = 0.19
Entropy decrease for Raining = 1 − 1 = 0
Entropy decrease for Reservation = 1 − 0.978 = 0.02
Entropy decrease for Patrons = 1 − 0.456 = 0.543
Entropy decrease for Price = 1 − 0.77 = 0.23
Entropy decrease for Est = 1 − 0.792 = 0.21
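One loop over the attributes reproduces this whole table (reusing the remainder sketch above; the root entropy is exactly 1, so the gain is 1 − remainder):

# Rank every attribute by its entropy decrease at the root.
gains = {attribute: 1 - remainder(attribute) for attribute in COLUMNS}
for attribute, gain in sorted(gains.items(), key=lambda item: -item[1]):
    print(f"{attribute:<6} entropy decrease = {gain:.3f}")
# "Pat" (Patrons) comes out on top, so it becomes the root test.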
Decision tree learning example
Patrons?  (None: 2 F; Some: 4 T; Full: 2 T, 4 F → next test X?)
Largest entropy decrease (0.543) is achieved by
splitting on Patrons.
Continue like this, making new splits,
always purifying nodes.
Next step
Given Patrons as root node, the next attribute chosen is
Hungry?







Within the Full branch (2 T, 4 F), Hungry? gives (Yes: 2 T, 2 F; No: 2 F):
Entropy = (4/12)[−(2/4) log₂(2/4) − (2/4) log₂(2/4)]
        + (2/12)[−(0/2) log₂(0/2) − (2/2) log₂(2/2)] = 0.33
Entropy decrease for Hungry = 1 − 0.33 = 0.67
Decision tree learning
Aim: find a small tree consistent with the training
examples
Idea: (recursively) choose "most significant" attribute
as root of (sub)tree.
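A compact sketch of that recursive procedure, in the style of ID3 (self-contained; the majority-vote default and tie-breaking are assumptions, not taken from the slides):

from collections import Counter
from math import log2

def set_entropy(examples):
    """Entropy of the labels of a list of (attributes, label) pairs."""
    counts = Counter(label for _, label in examples)
    n = len(examples)
    return -sum((c / n) * log2(c / n) for c in counts.values())

def split(examples, attribute):
    """Partition the examples by their value for `attribute`."""
    parts = {}
    for attrs, label in examples:
        parts.setdefault(attrs[attribute], []).append((attrs, label))
    return parts

def dtl(examples, attributes):
    """Return a leaf label, or a tree of the form {attribute: {value: subtree}}."""
    labels = [label for _, label in examples]
    if len(set(labels)) == 1:                      # node is pure: stop splitting
        return labels[0]
    if not attributes:                             # no tests left: majority label
        return Counter(labels).most_common(1)[0][0]
    # "Most significant" attribute = smallest weighted entropy after the split.
    def weighted_entropy(a):
        return sum(len(s) / len(examples) * set_entropy(s)
                   for s in split(examples, a).values())
    best = min(attributes, key=weighted_entropy)
    return {best: {value: dtl(subset, [a for a in attributes if a != best])
                   for value, subset in split(examples, best).items()}}

Called as dtl(EXAMPLES, list(COLUMNS)) with the dataset sketched earlier, this picks Pat as the root test, matching the entropy calculations above.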

Thank You
