RC Chakraborty, www.myreaders.info, e-mail rcchak@gmail.com, June 01, 2010
www.myreaders.info/html/artificial_intelligence.html
Learning System
Artificial Intelligence

Learning systems : what is learning; rote learning - learning by memorization, learning something by repeating; learning from examples - induction; explanation based learning - EBL approach, EBL architecture, EBL system, generalization; discovery; clustering - K-means algorithm, distance functions; learning by analogy; neural net learning - Perceptron; genetic learning - Genetic Algorithm; reinforcement learning - RL tasks.
Topics (4 hours)
1. What is Learning
2. Rote Learning
3. Learning from Example : Induction
4. Explanation Based Learning (EBL)
5. Discovery
6. Clustering
7. Analogy
8. Neural Net and Genetic Learning
9. Reinforcement Learning
10. References
What is Learning ? Some Quotations

Machine Learning (Mitchell, 1997) :
"A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."
1. What is Learning

Learning denotes changes in a system that enable the system to do the same task more efficiently the next time (Herbert Simon).

Following Mitchell's definition, a learning problem is described by :
- A task T,
- A performance measure P, and
- Some experience E with the task.
Goal : improve performance at the task T, as measured by P, through the experience E.
An agent perceives its environment through sensors and acts on it through actuators :

Agent            Sensors                        Actuators
Human agent      eyes, ears and other organs    hands, legs, mouth
Robotic agent    cameras, range finders         various motors
Software agent   keystrokes, file contents      displays to screen, write files
Learning Agent

An agent is characterized by its percept sequence, its agent function, and the agent program that implements it. A learning agent contains four conceptual components :
- Learning element,
- Performance element,
- Critic, and
- Problem generator.
Components of a Learning System

[Figure: Learning agent - the Critic, applying a performance standard to the percepts arriving from the Sensors, sends feedback to the Learning Element; the Learning Element exchanges knowledge with the Performance Element, makes changes to it, and sets learning goals for the Problem Generator; the Performance Element chooses the Actions carried out by the Effectors on the Environment.]
Learning Element : responsible for making improvements; it uses knowledge about the performance element and feedback from the critic to determine how the performance element should be modified.

Performance Element : selects the external actions to take.

Critic : tells the learning element how well the agent is doing with respect to a fixed performance standard.

Problem Generator : suggests actions that will lead to new examples or experiences that will aid in training the system further.

Example : Automated Taxi on city roads
- Performance Element : consists of the knowledge and procedures for driving actions; e.g., turning, accelerating, braking are performance elements on roads.
- Learning Element : formulates goals; e.g., learn rules for braking, accelerating, learn geography of the city.
- Critic : observes the world and passes information to the learning element; e.g., after a quick right turn across three lanes of traffic, it observes the reaction of other drivers.
- Problem Generator : suggests experiments to try out; e.g., try south city road.
Major Paradigms of Machine Learning

- Rote Learning : one-to-one mapping from inputs to a stored representation; learning by memorization; association-based storage and retrieval.
- Induction : learning from examples; extrapolate from a given set of examples to general rules.
- Analogy : determine the correspondence between two different representations and transfer knowledge between them.
- Discovery : learning that is both inductive and deductive; the system is inductive when it discovers new concepts and deductive when it proves theorems about those concepts.
- Genetic Algorithms : learning by natural evolution. In the natural world, the organisms that are poorly suited for an environment die off, while those well-suited for it prosper. Genetic algorithms search the space of individuals for good candidates. The "goodness" of an individual is measured by some fitness function. Search takes place in parallel, with many individuals in each generation.
- Reinforcement : reinforcement learning takes place in an environment where the agent cannot directly compare the results of its action to a desired result; instead, it receives some reward or penalty signal as feedback.
2. Rote Learning
Rote learning is a memorization technique; it focuses on memorizing the material so that it can be recalled by the learner exactly the way it was read or heard. The technique avoids understanding the inner complexities of the subject.

- Learning by Memorization : storing the material as-is, which avoids understanding the inner complexities of the subject.
- Learning by Repetition : saying the same thing over and over and over again; saying the same thing and trying to remember how to say it; it does not by itself bring understanding.
3. Learning from Example : Induction

Induction is learning from a set of observed instances. The learning methods extract rules and patterns out of the observed data.
Winston's Learning Program

Winston described a Blocks World learning program whose goal is to construct representations of the definitions of concepts in the blocks domain.

Each concept is learned through near misses. A near miss is an object that is not an instance of the concept but is very similar to such instances.

The program uses procedures to analyze the drawing and construct a semantic net representation.

An example of such a structural description, for the concept "house", is shown below.

[Semantic net for Object - house : the house node C has two has-part links, one to a node that isa Wedge and one to a node that isa Brick; the wedge node is supported-by the brick node.]
Version Space

Version space is another approach to concept learning from examples. The version space method keeps track of all the hypotheses consistent with the training examples seen so far, without enumerating them explicitly, and updates this set incrementally during the learning process.

Fundamental Assumptions
1. The data is correct, i.e., there are no erroneous instances.
2. A correct description is a conjunction of some of the attributes with values.
Version Space : Diagram

[Diagram: the hypotheses form a specialization tree. Each tree node is a model; the links record general-to-specific relations, and the bottom (1st) row holds the most specific hypothesis.]
The key idea in version space learning is that specialization of the general models and generalization of the specific models may ultimately lead to just one correct model, one that matches all observed positive examples and does not match any negative examples.
The version space method works over the space of all describable hypotheses and handles positive and negative examples symmetrically.

Given : a representation language, and a set of positive and negative examples expressed in that language.
Compute : a concept description that is consistent with all the positive examples and none of the negative examples.
The Candidate Elimination Algorithm

Let G be the set of maximally general hypotheses and S the set of maximally specific hypotheses.

Initialize G to contain one element, the null description (all features are variables), and S to contain one element, the first positive example.

For each new positive training example p do :
1. Delete from G all hypotheses that fail to match p;
2. Replace each hypothesis in S that fails to match p with its most specific generalizations that match p;
3. Delete from S any hypothesis that is subsumed by (more general than) some other hypothesis in S;
4. Delete from S any hypothesis that is not more specific than some hypothesis in G.

For each new negative training example n do :
1. Delete from S all hypotheses that match n;
2. Replace each hypothesis in G that matches n with its most general specializations that do not match n;
3. Delete from G any hypothesis that is more specific than some other hypothesis in G;
4. Delete from G any hypothesis that is not more general than some hypothesis in S.

If S and G are both singleton sets, then :
1. If they are identical, output their value and halt.
2. If they are different, the training cases were inconsistent; output this result and halt.
3. Otherwise, continue accepting new training examples.

The algorithm stops when it runs out of data or when the number of hypotheses remaining is :
1. If 0 , there is no description consistent with all the data.
2. If 1 , the answer is found; the version space has converged.
3. If 2 or more, all remaining descriptions are consistent with the data, which was insufficient to converge.
Example : learning the concept of a "Japanese economy car".

Examples 1 to 5 of features :

Example   Origin   Manufacturer   Color   Decade   Type      Example Type
1         Japan    Honda          Blue    1980     Economy   Positive
2         Japan    Toyota         Green   1970     Sports    Negative
3         Japan    Toyota         Blue    1990     Economy   Positive
4         USA      Chrysler       Red     1980     Economy   Negative
5         Japan    Honda          White   1980     Economy   Positive

Example 1 is positive. Initialize S to this first positive example and G to the most general model :
S = { (Japan, Honda, Blue, 1980, Economy) }
G = { ( ?, ?, ?, ?, ? ) }
These two models represent the most specific and the most general heuristics one might learn. The actual heuristic to be learned must lie between these two extremes.
Example 2 : (Japan, Toyota, Green, 1970, Sports)
It is a negative example. Specialize G to exclude it, keeping only specializations consistent with S :
G = { (?, Honda, ?, ?, ?), (?, ?, Blue, ?, ?), (?, ?, ?, 1980, ?), (?, ?, ?, ?, Economy) }
S is unchanged : S = { (Japan, Honda, Blue, 1980, Economy) }
Example 3 : (Japan, Toyota, Blue, 1990, Economy)
It is a positive example. Prune the G set against this positive example, i.e., remove from the G set any descriptions that are inconsistent with it :
G = { (?, ?, Blue, ?, ?), (?, ?, ?, ?, Economy) }
Generalize S just enough to cover the new example :
S = { (Japan, ?, Blue, ?, Economy) }
At this point, the new G and S sets specify a version space that can be translated roughly into English as : the concept may be as specific as "Japanese, blue economy car", and as general as "blue car" or "economy car".
Example 4 : (USA, Chrysler, Red, 1980, Economy)
It is a negative example. Prune away all the specific models that match the negative example : no match is found, therefore no change in S :
S = { (Japan, ?, Blue, ?, Economy) }
Specialize the G set to exclude the negative example : (?, ?, ?, ?, Economy) matches it and is specialized to (Japan, ?, ?, ?, Economy); (?, ?, Blue, ?, ?) already excludes it :
G = { (?, ?, Blue, ?, ?), (Japan, ?, ?, ?, Economy) }
Example 5 : (Japan, Honda, White, 1980, Economy)
It is a positive example. Prune the G set against it : (?, ?, Blue, ?, ?) fails to match (the color is White) and is removed :
G = { (Japan, ?, ?, ?, Economy) }
Generalize S just enough to cover the new example :
S = { (Japan, ?, ?, ?, Economy) }
S and G are now identical singleton sets; the version space has converged. It is now clear that the car must be a Japanese economy car : every description left in the version space contains Japan as origin and Economy as type.

Final result : S = G = { (Japan, ?, ?, ?, Economy) }
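
The slides give no code, but the trace above is easy to reproduce. Below is a minimal Python sketch (ours, not from the slides) of candidate elimination on this car data; the helper names are invented, and specialization is simplified to filling a "?" with the corresponding value from S, which is all this example needs.

WILD = '?'

def matches(h, x):
    # True if hypothesis h covers example (or hypothesis) x
    return all(hv == WILD or hv == xv for hv, xv in zip(h, x))

def generalize(s, x):
    # most specific generalization of s that covers positive example x
    return tuple(sv if sv == xv else WILD for sv, xv in zip(s, x))

def specialize(g, s, x):
    # minimal specializations of g that exclude negative example x,
    # staying more general than the specific boundary s
    return [g[:i] + (s[i],) + g[i + 1:]
            for i, gv in enumerate(g)
            if gv == WILD and s[i] != WILD and s[i] != x[i]]

examples = [
    (('Japan', 'Honda',    'Blue',  '1980', 'Economy'), True),
    (('Japan', 'Toyota',   'Green', '1970', 'Sports'),  False),
    (('Japan', 'Toyota',   'Blue',  '1990', 'Economy'), True),
    (('USA',   'Chrysler', 'Red',   '1980', 'Economy'), False),
    (('Japan', 'Honda',    'White', '1980', 'Economy'), True),
]

S = examples[0][0]        # S starts as the first positive example
G = [(WILD,) * 5]         # G starts as the most general hypothesis

for x, positive in examples[1:]:
    if positive:
        G = [g for g in G if matches(g, x)]      # prune G against the example
        S = generalize(S, x)                     # generalize S to cover it
    else:
        G = [h for g in G for h in
             ([g] if not matches(g, x) else specialize(g, S, x))]
        # drop any member of G more specific than another member of G
        G = [g for g in G if not any(h != g and matches(h, g) for h in G)]
    print('S =', S, ' G =', G)

The last line printed is S = G = (Japan, ?, ?, ?, Economy), matching the trace above.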
Decision Trees

Decision trees represent rules. Rules are easily expressed so that humans can understand them, or can even be used in a database access language like SQL so that records falling into a particular category may be retrieved.

Description
A decision tree is a classifier in the form of a tree structure where each node is either a leaf node or a decision node :
- a leaf node indicates the target attribute (class) value of examples;
- a decision node specifies a test to be carried out on a single attribute value.

A decision tree is a typical inductive approach to learn knowledge on classification. The conditions include :
- Attribute-value description : an object or case must be expressible as a fixed collection of attributes taking discrete or continuous values.
[Figure: an example decision tree. Decision nodes test attribute values (e.g., C = red / C = blue, B < 4.5 / B >= 4.5, B < 8.1 / B >= 8.1, C = true / C = false); each branch ends in a leaf node assigning a class value K.]
ID3 Algorithm (Iterative Dichotomiser 3)

ID3, developed by J. Ross Quinlan, constructs a decision tree T from a set of training cases.
Attribute Selection

There are different criteria to select which attribute will become a test node in the tree; ID3 uses Entropy and Information Gain.

Let Y be the class variable and X an attribute; both are discrete variables that take values in {y1, .., yl} and {x1, .., xm} respectively.

Entropy of Y :
H(Y) = - Σ i=1..l p(yi) log2 p(yi)

Conditional entropy of Y given X :
H(Y|X) = Σ j=1..m p(xj) H(Y | X = xj)

Information gain of a given attribute X with respect to the class Y :
Gain(Y, X) = H(Y) - H(Y|X)
de
ea
yr
.m
w
w
or
ty
,w
Example
Day
Outlook
Temp
Humidity
wind
Play
sunny
85
85
week
no
sunny
80
90
strong
no
cloudy
83
78
week
yes
rainy
70
96
week
yes
rainy
68
80
week
yes
rainy
65
70
strong
no
cloudy
64
65
strong
yes
sunny
72
95
week
no
sunny
69
70
week
yes
10
rainy
75
80
week
yes
11
sunny
75
70
strong
yes
12
cloudy
72
90
strong
yes
13
cloudy
81
75
week
yes
14
rainy
71
85
strong
no
ha
kr
ab
Training Data
Learning set
In the above example, two attributes, the Temperature and
Humidity
have
discrete like
indicates
continuous
the acceptable
values.
Possible Values
Attribute
27
Outlook
Sunny
Cloudy
Rainy
Temperature
Hot
Medium
Cold
Humidity
High
Normal
Wind
Strong
Week
Class
play
no play
Decision
n (negative)
p (positive)
The continuous attributes are partitioned into discrete ranges :
- Temperature : Hot (H) 80 to 85, Medium (M) 70 to 75, Cold (C) 64 to 69.
- Humidity : High (H) 81 to 96, Normal (N) 65 to 80.
- Class : Yes (Y) play, No (N) no play.

The training data with the categorized attribute values :

Day   Outlook   Temp         Humidity     Wind     Class (play)
1     Sunny     85  Hot      85  High     weak     no
2     Sunny     80  Hot      90  High     strong   no
3     Cloudy    83  Hot      78  High     weak     yes
4     Rainy     70  Medium   96  High     weak     yes
5     Rainy     68  Cold     80  Normal   weak     yes
6     Rainy     65  Cold     70  Normal   strong   no
7     Cloudy    64  Cold     65  Normal   strong   yes
8     Sunny     72  Medium   95  High     weak     no
9     Sunny     69  Cold     70  Normal   weak     yes
10    Rainy     75  Medium   80  Normal   weak     yes
11    Sunny     75  Medium   70  Normal   strong   yes
12    Cloudy    72  Medium   90  High     strong   yes
13    Cloudy    81  Hot      75  Normal   weak     yes
14    Rainy     71  Medium   85  High     strong   no
Attribute Selection : applying the definitions of Entropy and Information Gain to the training set.

Step 01 : Entropy of the whole learning set S (9 yes, 5 no out of 14) :
Entropy(S) = - (9/14) log2 (9/14) - (5/14) log2 (5/14) = 0.940

The gain of an attribute A with respect to the set S is :
Gain(S, A) = Entropy(S) - Σ v ∈ Values(A) ( |Sv| / |S| ) Entropy(Sv)
where Sv is the subset of S for which attribute A has value v, |Sv| is the number of elements in Sv, and |S| is the number of elements in S.

Example : attribute Wind.
For wind = weak, 6 of the examples are YES and 2 are NO :
Entropy(Sweak) = - (6/8) log2 (6/8) - (2/8) log2 (2/8) = 0.811
For wind = strong, 3 of the examples are YES and 3 are NO :
Entropy(Sstrong) = - (3/6) log2 (3/6) - (3/6) log2 (3/6) = 1.00
Gain(S, wind) = Entropy(S) - (8/14) Entropy(Sweak) - (6/14) Entropy(Sstrong)
             = 0.940 - (8/14) 0.811 - (6/14) 1.00 = 0.048

For each attribute, the gain is calculated and the highest gain is used in the decision node.
Step-by-Step Calculations : attribute Outlook.
Outlook = sunny, number of occurrences 5 (2 yes, 3 no)
Outlook = cloudy, number of occurrences 4 (4 yes, 0 no)
Outlook = rainy, number of occurrences 5 (3 yes, 2 no)
Entropy(Ssunny) = - (2/5) log2 (2/5) - (3/5) log2 (3/5) = 0.970
Entropy(Scloudy) = 0
Entropy(Srainy) = - (3/5) log2 (3/5) - (2/5) log2 (2/5) = 0.970
Gain(S, Outlook) = 0.940 - (5/14) 0.970 - (4/14) 0 - (5/14) 0.970 = 0.246
Attribute Temp.
Temp = hot, number of occurrences 4 (2 yes, 2 no)
Temp = medium, number of occurrences 6 (4 yes, 2 no)
Temp = cold, number of occurrences 4 (3 yes, 1 no)
Entropy(Shot) = 1.0
Entropy(Smedium) = - (4/6) log2 (4/6) - (2/6) log2 (2/6) = 0.918
Entropy(Scold) = - (3/4) log2 (3/4) - (1/4) log2 (1/4) = 0.811
Gain(S, Temp) = 0.940 - (4/14) 1.0 - (6/14) 0.918 - (4/14) 0.811
             = 0.940 - 0.2857 - 0.393555 - 0.2317937 = 0.029
Attribute Humidity.
Humidity = high, number of occurrences 7 (3 yes, 4 no)
Humidity = normal, number of occurrences 7 (6 yes, 1 no)
Entropy(Shigh) = - (3/7) log2 (3/7) - (4/7) log2 (4/7) = 0.985
Entropy(Snormal) = - (6/7) log2 (6/7) - (1/7) log2 (1/7) = 0.5916727
Gain(S, Humidity) = 0.940 - (7/14) x 0.985 - (7/14) x 0.5916727 = 0.152
Attribute Wind.
Wind = weak, number of occurrences 8 (6 yes, 2 no)
Wind = strong, number of occurrences 6 (3 yes, 3 no)
Entropy(Sweak) = 0.811
Entropy(Sstrong) = 1.00
Gain(S, Wind) = 0.940 - (8/14) 0.811 - (6/14) 1.00 = 0.048
Summary of the gains :
Entropy(S) = 0.940
Gain(S, Outlook) = 0.246
Gain(S, Temp) = 0.0289366072
Gain(S, Humidity) = 0.1515496
Gain(S, Wind) = 0.048

Outlook has the highest gain; therefore it is used as the root decision node.
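
As a cross-check, the entropy and gain figures above can be recomputed with a few lines of Python. This sketch is ours, not from the slides; the rows follow the categorized table, and the printed values agree with the slides' figures up to rounding.

from math import log2
from collections import Counter

data = [  # (Outlook, Temp, Humidity, Wind, Play)
    ('sunny', 'hot', 'high', 'weak', 'no'),
    ('sunny', 'hot', 'high', 'strong', 'no'),
    ('cloudy', 'hot', 'high', 'weak', 'yes'),
    ('rainy', 'medium', 'high', 'weak', 'yes'),
    ('rainy', 'cold', 'normal', 'weak', 'yes'),
    ('rainy', 'cold', 'normal', 'strong', 'no'),
    ('cloudy', 'cold', 'normal', 'strong', 'yes'),
    ('sunny', 'medium', 'high', 'weak', 'no'),
    ('sunny', 'cold', 'normal', 'weak', 'yes'),
    ('rainy', 'medium', 'normal', 'weak', 'yes'),
    ('sunny', 'medium', 'normal', 'strong', 'yes'),
    ('cloudy', 'medium', 'high', 'strong', 'yes'),
    ('cloudy', 'hot', 'normal', 'weak', 'yes'),
    ('rainy', 'medium', 'high', 'strong', 'no'),
]
ATTR = {'Outlook': 0, 'Temp': 1, 'Humidity': 2, 'Wind': 3}

def entropy(rows):
    n = len(rows)
    counts = Counter(r[-1] for r in rows)             # class frequencies
    return -sum(c / n * log2(c / n) for c in counts.values())

def gain(rows, attr):
    i, n = ATTR[attr], len(rows)
    values = Counter(r[i] for r in rows)              # attribute value counts
    return entropy(rows) - sum(
        cnt / n * entropy([r for r in rows if r[i] == v])
        for v, cnt in values.items())

print(round(entropy(data), 3))                        # 0.940
for a in ATTR:
    print(a, round(gain(data, a), 3))                 # Outlook 0.247, Temp 0.029,
                                                      # Humidity 0.152, Wind 0.048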
Step 02 : partition the training examples by the root attribute Outlook.

Outlook = sunny : D1 (85 Hot, 85 High, weak, no), D2 (80 Hot, 90 High, strong, no), D8 (72 Medium, 95 High, weak, no), D9 (69 Cold, 70 Normal, weak, yes), D11 (75 Medium, 70 Normal, strong, yes)
Outlook = cloudy : D3 (83 Hot, 78 High, weak, yes), D7 (64 Cold, 65 Normal, strong, yes), D12 (72 Medium, 90 High, strong, yes), D13 (81 Hot, 75 Normal, weak, yes)
Outlook = rainy : D4 (70 Medium, 96 High, weak, yes), D5 (68 Cold, 80 Normal, weak, yes), D6 (65 Cold, 70 Normal, strong, no), D10 (75 Medium, 80 Normal, weak, yes), D14 (71 Medium, 85 High, strong, no)
Step 03 onwards : recurse on each branch.
Ssunny = {D1, D2, D8, D9, D11} = 5 "examples". "D" represents "Days".
Entropy(Ssunny) = 0.970
Gain(Ssunny, Humidity) = 0.970
Gain(Ssunny, Temp) = 0.570
Gain(Ssunny, Wind) = 0.019
Humidity gives the highest gain, so it becomes the decision node under the sunny branch. This process goes on until all days' data are classified perfectly or there are no attributes left.
Thus, the final decision tree is shown below.

Decision tree :
Outlook
  sunny  → Humidity
             High   → No play  (D1, D2, D8)
             Normal → Yes play (D9, D11)
  cloudy → Yes play (D3, D7, D12, D13)
  rainy  → Wind
             Strong → No play  (D6, D14)
             Weak   → Yes play (D4, D5, D10)
Applications

ID3 has been incorporated in a number of commercial rule-induction packages. Some specific applications include :
- medical diagnosis,
- credit risk assessment of loan applications,
- classification of equipment malfunctions by their cause,
- classification of plant diseases, and
- web search classification.
4. Explanation Based Learning (EBL)

Explanation based learning produces a justified generalization of a concept from as little as a single training example.

[Figure: EBL architecture - the inputs (a specific goal / problem and a partial external solution) feed a problem solver (understander) backed by a knowledge base; the resulting explanation passes to a generalizer, which outputs a new general concept.]
[Figure: the standard EBL problem model - the goal concept, training example, domain theory and operationality criterion enter the problem solver; its proof trace goes to the generalizer, whose general trace is reduced by an operationality pruner to the final results.]

Input to EBL : four kinds of information.
- Domain theory : represents the facts and rules that constitute what the learner knows. The facts describe an instance of the goal concept and the rules describe relationships between objects and actions in a domain; e.g., the cup domain includes facts about concavities, bases, and lugs, as well as rules about liftability, stability and what makes an open vessel.
- Goal concept : a high-level description of the concept the program is supposed to learn.
- Training example : an instance of the goal concept.
- Operationality criterion : specifies which predicates may appear in the learned, usable concept definition.
EBL involves two steps :

1. Explain : construct an explanation (a proof) in terms of the domain theory that shows how the training example satisfies the goal concept definition. This explanation must be constructed so that each branch of the explanation structure terminates in an expression that satisfies the operationality criterion.

2. Generalize : determine a set of sufficient conditions under which the explanation holds, by regressing the goal concept through the explanation structure, replacing the constants of the training example with variables wherever the proof permits.
Example : learning the structural concept "cup". The notation used : "^" means and; "→" means implies (if .. then). Given are a goal concept, a training example, a domain theory, and an operationality criterion.

Goal Concept
cup(x)

Training Example
colour(Obj23, Blue) ^ has-part(Obj23, Handle16) ^ has-part(Obj23, Bottom19) ^
owner(Obj23, Ralph) ^ has-part(Obj23, Concavity12) ^ is(Obj23, Light) ^
is(Ralph, Male) ^ isa(Handle16, Handle) ^ isa(Bottom19, Bottom) ^
is(Bottom19, Flat) ^ isa(Concavity12, Concavity) ^
is(Concavity12, Upward-Pointing)

Domain Theory
has-part(x,y) ^ isa(y, Concavity) ^ is(y, Upward-Pointing) → open-vessel(x)
is(x, Light) ^ has-part(x,y) ^ isa(y, Handle) → liftable(x)
has-part(x,y) ^ isa(y, Bottom) ^ is(y, Flat) → stable(x)
liftable(x) ^ stable(x) ^ open-vessel(x) → cup(x)

Operationality Criterion
The learned concept definition must be expressed in terms of the predicates used in the training example.
The Explain step builds the explanation structure (proof tree) for the training example :

[Figure: proof tree - Cup(Obj1) follows from Stable(Obj1), Lift-able(Obj1) and Open-vessel(Obj1); each of these in turn bottoms out in operational facts about the object, such as Is(Obj1, Light), its flat bottom, its handle and its upward-pointing concavity.]
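
The Explain step can be made concrete with a few lines of Python. The sketch below (the representation and helper names are ours) forward-chains the domain-theory rules over the training example's facts to show cup(Obj23); note that irrelevant facts such as owner(Obj23, Ralph) and colour(Obj23, Blue) play no part in the proof, which is exactly what the generalization step exploits.

facts = {
    ('colour', 'Obj23', 'Blue'), ('owner', 'Obj23', 'Ralph'),
    ('is', 'Ralph', 'Male'), ('is', 'Obj23', 'Light'),
    ('has-part', 'Obj23', 'Handle16'), ('isa', 'Handle16', 'Handle'),
    ('has-part', 'Obj23', 'Bottom19'), ('isa', 'Bottom19', 'Bottom'),
    ('is', 'Bottom19', 'Flat'),
    ('has-part', 'Obj23', 'Concavity12'), ('isa', 'Concavity12', 'Concavity'),
    ('is', 'Concavity12', 'Upward-Pointing'),
}

def parts(x):
    return [y for (p, a, y) in facts if p == 'has-part' and a == x]

def open_vessel(x):   # has-part(x,y) ^ isa(y,Concavity) ^ is(y,Upward-Pointing)
    return any(('isa', y, 'Concavity') in facts and
               ('is', y, 'Upward-Pointing') in facts for y in parts(x))

def liftable(x):      # is(x,Light) ^ has-part(x,y) ^ isa(y,Handle)
    return (('is', x, 'Light') in facts and
            any(('isa', y, 'Handle') in facts for y in parts(x)))

def stable(x):        # has-part(x,y) ^ isa(y,Bottom) ^ is(y,Flat)
    return any(('isa', y, 'Bottom') in facts and
               ('is', y, 'Flat') in facts for y in parts(x))

def cup(x):           # liftable(x) ^ stable(x) ^ open-vessel(x)
    return liftable(x) and stable(x) and open_vessel(x)

print(cup('Obj23'))   # True: every branch ends in an operational predicate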
5. Discovery

Simon (1966) first proposed the idea that we might explain scientific discovery in computational terms and automate the processes involved on a computer.

Theory-Driven Discovery
Simon's theory driven science means AI-modeling for theory building. A classic theory-driven discovery system is AM, described below.
AM (Automated Mathematician)

AM, developed by Douglas Lenat (1976), discovers concepts in elementary mathematics and set theory.
- The system has around 115 basic elements such as sets, lists, and elementary relations. Around 250 heuristic rules are attached to slots in the concepts. The rules present hints as to how to employ functions, create new concepts, generalize, etc. - activities that might lead to interesting discoveries.
- The system operates from an agenda of tasks. It selects the most interesting task, as determined by a set of over 50 heuristics. It then performs all heuristics it can find which should help in executing it.
Data-Driven Discovery : the BACON Systems

Data driven science, in contrast to theory driven, starts with empirical data and ends with a theory. The modeler tries to write a computer program which generates the theory, i.e., the empirical laws, from the observational data.

The BACON systems (Langley, 1981) are a sequence of discovery programs credited with rediscovering laws like Ohm's law, Kepler's third law, and more. The next few slides show how BACON.1 rediscovers Kepler's third law and BACON.3 rediscovers the ideal gas law.
BACON.1

Given a set of observed values of two variables X and Y, BACON.1 finds a function Y = f(X) using a small set of heuristics :
- Heuristic 1 : if Y takes a constant value c, infer Y = c.
- Heuristic 2 : if X and Y are linearly related, infer that linear relation.
- Heuristics 3 and 4 : if Y increases as X decreases, define a new term for the product X·Y; if X and Y increase together, define a new term for the ratio X/Y; then examine the new term with the same heuristics.

Example : rediscovering Kepler's third law. The law is stated below; assume it is not yet discovered or known.
"The square of the orbital period T is proportional to the cube of the mean distance a from the Sun."
i.e., T² = k a³, where a is measured in astronomical units and T in Earth years, and k with these units just equals 1, i.e., T² = a³.

Input : Planets, Distance from Sun ( D ), orbit time Period ( P )

Planet    D        P
Mercury   0.382    0.241
Venus     0.724    0.616
Earth     1.0      1.0
Mars      1.524    1.881
Jupiter   5.199    11.855
Saturn    9.539    29.459
Apply the heuristics to the data :
- Try heuristic 1 : not applicable, neither D nor P is constant.
- Try heuristic 2 : not applicable, no linear relationship.
- Try heuristic 3 : not applicable.
- Try heuristic 4 : applicable, D increases as P increases, so add the new variable D/P to the data set.

Planet    D        P         D/P
Mercury   0.382    0.241     1.607
Venus     0.724    0.616     1.175
Earth     1.0      1.0       1.0
Mars      1.524    1.881     0.810
Jupiter   5.199    11.855    0.439
Saturn    9.539    29.459    0.324

D/P is not constant; it decreases as D increases, so heuristic 3 adds the product term D · (D/P) = D²/P :

Planet    D        P         D/P      D²/P
Mercury   0.382    0.241     1.607    0.622
Venus     0.724    0.616     1.175    0.851
Earth     1.0      1.0       1.0      1.0
Mars      1.524    1.881     0.810    1.234
Jupiter   5.199    11.855    0.439    2.280
Saturn    9.539    29.459    0.324    3.088
D²/P increases while D/P decreases, so heuristic 3 adds the product term (D²/P) · (D/P) = D³/P² :

Planet    D        P         D/P      D²/P     D³/P²
Mercury   0.382    0.241     1.607    0.622    1.0
Venus     0.724    0.616     1.175    0.851    1.0
Earth     1.0      1.0       1.0      1.0      1.0
Mars      1.524    1.881     0.810    1.234    1.0
Jupiter   5.199    11.855    0.439    2.280    1.0
Saturn    9.539    29.459    0.324    3.088    1.0

Conclusion : D³/P² is constant (= 1), i.e., D³ = P², which is Kepler's third law; BACON.1 has rediscovered it from the data alone.

A limitation of BACON.1 is that it relates only two variables at a time; laws over many variables require the later BACON versions.
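
BACON.1's loop is small enough to sketch in Python. The version below is a rough sketch of ours, simplified: it applies only the trend heuristics, in the fixed order seen above (D/P, then D²/P, then D³/P²), and stops when a term is constant; a 5% tolerance absorbs the rounded data.

data = {  # planet: (D, P) - distance in astronomical units, period in Earth years
    'Mercury': (0.382, 0.241), 'Venus': (0.724, 0.616), 'Earth': (1.0, 1.0),
    'Mars': (1.524, 1.881), 'Jupiter': (5.199, 11.855), 'Saturn': (9.539, 29.459),
}

def constant(vals, tol=0.05):
    return max(vals) - min(vals) < tol * max(vals)

terms = [('D/P', lambda d, p: d / p)]         # heuristic 4: D and P rise together
while True:
    name, f = terms[-1]
    vals = [f(d, p) for d, p in data.values()]
    if constant(vals):
        print(name, 'is constant ~', round(sum(vals) / len(vals), 3))
        break                                  # prints: D^3/P^2 is constant ~ 0.993
    elif name == 'D/P':                        # heuristic 3: multiply opposite trends
        terms.append(('D^2/P', lambda d, p: d * d / p))
    elif name == 'D^2/P':
        terms.append(('D^3/P^2', lambda d, p: d ** 3 / p ** 2))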
BACON.3

BACON.3 is a knowledge-based production system that discovers empirical laws. The main heuristics detect constancies and trends in data, leading to the formulation of hypotheses and the definition of theoretical terms.
Example : rediscovering the ideal gas law pV/nT = 8.32, where p is the pressure on a gas, V is its volume, n is the number of moles, and T is the temperature of the gas.

The step-by-step discovery proceeds by varying one quantity at a time : holding n = 1, when T = 300, pV = 2496.0; when T = 310, pV = 2579.2; when T = 320, pV = 2662.4. Similarly varying n, BACON.3 finds that pV/T is proportional to n, and arrives at pV/nT = 8.32, the gas constant in these units.
6. Clustering

Clustering is a way to form natural groupings, or clusters, of patterns : patterns within a cluster are more similar to each other than patterns belonging to different clusters.
Distance Functions

For two points X = (x1, x2, .., xn) and Y = (y1, y2, .., yn) :
- Inner product : X · Y = Σ i=1..n xi yi = x1 y1 + x2 y2 + .. + xn yn , which is a real number.
- Euclidean distance : d(X, Y) = sqrt( Σ i=1..n (xi - yi)² )
Euclidean Distance
Euclidean distance is the ordinary straight-line distance between two points that one would measure with a ruler. The Euclidean distance between two points P = (p1, p2, .., pi, .., pn) and Q = (q1, q2, .., qi, .., qn) is sqrt( Σ i=1..n (pi - qi)² ).

Manhattan Distance
The Manhattan distance between the point P1 with coordinates (x1, y1) and the point P2 with coordinates (x2, y2) is |x1 - x2| + |y1 - y2|.

Manhattan versus Euclidean distance :
[Figure: grid walk between two opposite corners of a 6 x 6 grid - the red, blue, and yellow lines each represent a Manhattan distance of length 12; the green line represents the Euclidean distance of length 6 sqrt(2) ≈ 8.48.]
Minkowski Metric
Let Xi = (xi1, .., xir) and Xj = (xj1, .., xjr) be two data points. The Minkowski distance of order h is
dist (Xi, Xj) = ( |xi1 - xj1|^h + |xi2 - xj2|^h + . . + |xir - xjr|^h )^(1/h)
It is the Manhattan distance if h = 1 :
dist (Xi, Xj) = |xi1 - xj1| + |xi2 - xj2| + . . + |xir - xjr|
and the Euclidean distance if h = 2.
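
These three distance measures are one-liners in Python; the sketch below (ours) treats points as equal-length coordinate sequences.

from math import sqrt

def euclidean(p, q):
    return sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def manhattan(p, q):
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

def minkowski(p, q, h):
    return sum(abs(pi - qi) ** h for pi, qi in zip(p, q)) ** (1 / h)

p, q = (0, 0), (6, 6)                  # opposite corners of the 6 x 6 grid above
print(manhattan(p, q))                 # 12, the red/blue/yellow paths
print(round(euclidean(p, q), 2))       # 8.49 ~ 6 * sqrt(2), the green line
print(minkowski(p, q, 1), round(minkowski(p, q, 2), 2))   # same two values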
K-Means Clustering

[Flowchart: Start → fix the number of clusters K → determine the centroids → compute each object's distance from the centroids → group the objects based on minimum distance → if any object moved to another group, recompute the centroids and repeat → End.]

The three steps mentioned below are iterated until the groupings converge, i.e., until no object moves between groups :
1. Determine the centroid coordinates.
2. Determine the distance of each object from the centroids.
3. Group the objects based on minimum distance.
Example : cluster 4 medicines A, B, C, D, each with two attributes, X (weight index) and Y (pH), into K = 2 groups.

Objects   X   Y
A         1   1
B         2   1
C         4   3
D         5   4

[Plot: the four objects on the X-Y plane, with C and D toward the upper right.]
1. Iteration 0
(a) Initial cluster centers : take the first two objects as centroids, P1 = A = (1, 1) and P2 = B = (2, 1). Objects A, B, C, D have attributes X and Y as tabulated above.

(b) Calculate the Euclidean distance from each centroid to each object; e.g., for object C :
D13 = sqrt( (4 - 1)² + (3 - 1)² ) = 3.61
D23 = sqrt( (4 - 2)² + (3 - 1)² ) = 2.83
Similarly calculate the other elements D11, D12, D14, D21, D22, D24.

(c) The distance matrix (row i holds the distances from centroid Pi to A, B, C, D) and the group matrix become :
D0 = | 0      1      3.61   5    |
     | 1      0      2.83   4.24 |
G0 = | 1      0      0      0    |    (A in group 1)
     | 0      1      1      1    |    (B, C, D in group 2)
2. Iteration 1
The cluster groups have new members. Compute the new centroid of each group as the average of its members' coordinates :
P1 = (1, 1), since group 1 has only A;
P2 = ( (2+4+5)/3 , (1+3+4)/3 ) = (11/3, 8/3) = (3.67, 2.67)

(b) 1st, calculate the Euclidean distances from centroid P1 to each point A, B, C, D : you get D11, D12, D13, D14 as the 1st row of the distance matrix. 2nd, calculate the Euclidean distances from centroid P2 to each point A, B, C, D : you get D21, D22, D23, D24 as the 2nd row of the distance matrix.

(c) The distance and group matrices become :
D1 = | 0      1      3.61   5    |
     | 3.14   2.36   0.47   1.89 |
G1 = | 1      1      0      0    |    (A, B in group 1)
     | 0      0      1      1    |    (C, D in group 2)
3. Iteration 2
The cluster groups again have new members. Compute the new centroids as the averages of the group members' coordinates :
P1 = ( (1+2)/2 , (1+1)/2 ) = (1.5, 1)
P2 = ( (4+5)/2 , (3+4)/2 ) = (4.5, 3.5)

(b) As before, calculate the Euclidean distances from centroid P1 (1st row) and from centroid P2 (2nd row) to each point A, B, C, D.

(c) The distance and group matrices become :
D2 = | 0.5    0.5    3.20   4.61 |
     | 4.30   3.54   0.71   0.71 |
G2 = | 1      1      0      0    |    (A, B in group 1)
     | 0      0      1      1    |    (C, D in group 2)
We obtain G2 = G1. This means the grouping of objects in this last iteration and the one before does not change anymore. Thus, the computation of the k-mean clustering has reached its stability and no more iterations are needed. The final grouping :

Objects      Feature 1 (X)   Feature 2 (Y)   Cluster Group
             weight index    pH
Medicine A   1               1               1
Medicine B   2               1               1
Medicine C   4               3               2
Medicine D   5               4               2
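
The whole computation above fits in a short Python sketch (ours); it starts from centroids A and B as in iteration 0, and it assumes no cluster ever goes empty, which holds for this data.

from math import dist    # Python 3.8+

points = {'A': (1, 1), 'B': (2, 1), 'C': (4, 3), 'D': (5, 4)}
centroids = [points['A'], points['B']]        # iteration 0 centroids

assignment = None
while True:
    new = {name: min(range(2), key=lambda i: dist(p, centroids[i]))
           for name, p in points.items()}     # group by minimum distance
    if new == assignment:                     # no object moved: converged
        break
    assignment = new
    for i in range(2):                        # centroid = mean of member coordinates
        members = [p for n, p in points.items() if assignment[n] == i]
        centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    print(assignment, [tuple(round(c, 2) for c in ct) for ct in centroids])

The two printed iterations reproduce the centroids (1, 1), (3.67, 2.67) and (1.5, 1), (4.5, 3.5) from the trace above.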
7. Analogy

Learning by analogy means acquiring new knowledge about an input entity by transferring it from a known similar entity.

[Figure: an electrical circuit in which currents combine as I3 = I1 + I2, set against an analogous hydraulic problem with flows Q, where Qb = 9 and Qc = ? ; the unknown flow is inferred by transferring the flow-addition law across the analogy.]
8. Neural Net and Genetic Learning

The neural net, the genetic learning and the reinforcement learning approaches remain; the first two are sketched in this section and reinforcement learning in the next.
Neural Net Learning : the Perceptron

[Figure: a perceptron - inputs x1, x2, x3, .., xn with weights w1, w2, w3, .., wn feeding a summation and threshold unit.]

The inputs x1, x2, .., xn are real numbers or boolean values; the perceptron forms the weighted sum Σ wi xi and outputs 1 if the sum exceeds its threshold, else 0.
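
A minimal perceptron in Python looks as follows; the learning rate, epoch count and the AND training set are our own illustrative choices, not from the slides.

def predict(w, b, x):
    # output 1 if the weighted sum exceeds the threshold (folded into bias b)
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def train(samples, epochs=10, rate=0.1):
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, target in samples:
            err = target - predict(w, b, x)          # perceptron learning rule
            w = [wi + rate * err * xi for wi, xi in zip(w, x)]
            b += rate * err
    return w, b

samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]   # boolean AND
w, b = train(samples)
print([predict(w, b, x) for x, _ in samples])        # [0, 0, 0, 1]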
Genetic Learning

Genetic algorithms borrow their vocabulary from genetics :
- Genes are connected together into long strings called chromosomes. The genes and their settings are referred to as an organism's genotype.
- When two organisms mate they share their genes. The resultant offspring may end up having half the genes from one parent and half from the other. This process is called crossover.
- A gene may be mutated and expressed in the organism as a completely new trait.

Thus, genetic algorithms are a way of solving problems by mimicking the same processes nature uses : selection, crossover and mutation.
Outline of the basic genetic algorithm :
(1) [Start] Generate a random population of n chromosomes.
(2) [Fitness] Evaluate the fitness f(x) of each chromosome x in the population.
(3) [New population] Create a new population by repeating selection, crossover, mutation and accepting until the new population is complete.
(4) [Replace] Use the newly generated population for a further run of the algorithm.
(5) [Test] If the end condition is satisfied, stop, and return the best solution in the current population.
(6) [Loop] Go to step 2.
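
A toy version of this loop in Python; the representation and fitness function are our own illustration (the chromosome is 8 bits and fitness counts 1-bits, so the optimum is all ones).

import random

def fitness(ch):
    return sum(ch)                        # count of 1-bits

def select(pop):                          # tournament selection of size 2
    return max(random.sample(pop, 2), key=fitness)

def crossover(a, b):                      # one-point crossover
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(ch, p=0.05):                   # flip each gene with probability p
    return [1 - g if random.random() < p else g for g in ch]

pop = [[random.randint(0, 1) for _ in range(8)] for _ in range(20)]   # step 1
for generation in range(50):                                          # steps 2-6
    pop = [mutate(crossover(select(pop), select(pop))) for _ in range(20)]
    best = max(pop, key=fitness)
    if fitness(best) == 8:                                            # step 5: test
        break
print(generation, best)                   # usually converges within a few generations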
9. Reinforcement Learning

Reinforcement learning refers to a class of problems in machine learning which postulate an agent exploring an environment : the agent perceives its current state and takes actions, and the environment, in return, provides a reward, which can be positive or negative. The RL problem and the tasks are illustrated in the next few slides.
The Agent-Environment Interaction

[Figure: the agent-environment loop - the Agent receives state st and reward rt and emits action at; the Environment returns the next reward rt+1 and the next state st+1.]

The agent and environment interact in a sequence of discrete time steps, t = 0, 1, 2, 3, 4, . . At each discrete time step t, the agent (learning system) observes the state st ∈ S and chooses an action at ∈ A(st); one time step later it receives a numerical reward rt+1 and finds itself in a new state st+1, accumulating its experience. The agent's goal is to maximize the total amount of reward it receives over the long run.
Key Features of RL

The learner is not told what actions to take; instead it finds out which actions yield the most reward by trying them.
RL Tasks

RL tasks involve a decision-making agent interacting with its environment so as to maximize the cumulative reward it receives. The agent perceives aspects of the environment's state and selects actions.
.in
rs
de
ea
or
ty
,w
.m
yr
ha
kr
ab
the
X(t)
is
random
Any realization of X
named
sample
path,
which
can
be
discrete or continuous.
Markov chain
all
the information that could influence the future evolution of the process.
Future states will be reached through a probabilistic process
instead
of a deterministic one.
At
each step
the system
may change
its
state
probability
distribution.
The
changes
in
state
are
called
Notations followed :
t        discrete time step
st       state at time t
at       action at time t
rt+1     reward received after action at
st+1     state following st
Rt       return : cumulative (discounted) reward following time t
Rt(n)    n-step return
Rt(λ)    λ-return
π        policy; π(s) is the action taken in state s, and π(s,a) the probability of taking action a in state s
S, S+    set of states (S+ including any terminal state)
A(s)     set of actions possible in state s
Pass'    probability of transition from state s to s' under action a
Rass'    expected reward on transition from s to s' under action a

The interaction unrolls as a trajectory :
st --at--> st+1, rt+1 --at+1--> st+2, rt+2 --at+2--> st+3, rt+3 --at+3--> . .
Markov System

A Markov system consists of a set of states; time passes in discrete steps, and at each step the system will randomly change states or stay where it is. The probability of going from State i to State j is called the transition probability.

Example : Laundry Detergent Switching

[Figure: two states, brand A and brand B; the arrows show the switching directions, labeled with the numbers 0.8 (stay with A), 0.2 (switch A to B), 0.1 (switch B to A) and 0.9 (stay with B).]

The transition matrix collects the transition probabilities pij from State i to State j :
P = | 0.8   0.2 |
    | 0.1   0.9 |
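
Given the transition matrix, the evolution of the market share is a repeated matrix-vector product. A small Python sketch (assuming the 0.8/0.2 and 0.1/0.9 reading of the diagram above):

P = [[0.8, 0.2],      # row i holds the probabilities of moving from state i
     [0.1, 0.9]]

def step(dist):
    # one time step: new_dist[j] = sum_i dist[i] * P[i][j]
    return [sum(dist[i] * P[i][j] for i in range(2)) for j in range(2)]

dist = [0.5, 0.5]     # initial market share of brands A and B
for _ in range(30):
    dist = step(dist)
print([round(x, 3) for x in dist])    # approaches the steady state [1/3, 2/3]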
Transition Probability

If the system is in state i at some time step, then with a fixed probability pij it will be going into state j in the next time step; pij is called a transition probability.

Transition diagram :
[Figure: a three-state transition diagram, arrows labeled by the transition probabilities collected in the matrix below.]

Transition matrix (the entry in row i, column j is pij, "From" state i "To" state j) :
P = | 0.2   0.8   0    |
    | 0     0.4   0.6  |
    | 0.5   0.35  0.15 |
Markov Decision Processes (MDPs)

An MDP has a set of states, and in each state there are several actions from which the decision maker must choose. The state transitions possess the Markov property, i.e., given the state of the MDP at time t, the transition probabilities to the state at time t+1 are independent of all previous states or actions.

MDPs are an extension of Markov chains; the differences are the addition of actions (allowing choice) and rewards (giving motivation). The interaction again unrolls as
st --at--> st+1, rt+1 --at+1--> st+2, rt+2 --at+2--> st+3, rt+3 --at+3--> . .
with state transitions, actions and rewards.

An MDP is defined by :
- a set of states S, and a set of actions A(s) available in each state s;
- the Markov assumption : st+1 and rt+1 depend only on st and at, not on anything earlier;
- transition probabilities Pass' = Pr { st+1 = s' | st = s, at = a };
- expected rewards Rass' = E { rt+1 | st = s, at = a, st+1 = s' }.
Policy

A policy defines the learning agent's way of behaving at a given time. It is a mapping from perceived states of the environment to actions to be taken in those states : either a deterministic mapping π : S → A with π(s) = a, or a stochastic one, π(s,a) = Pr { at = a | st = s } ∈ [0, 1], i.e., a distribution over actions.
Reward Function

A reward function maps each perceived state (or state-action pair) of the environment to a single number, a reward, indicating the intrinsic desirability of being in that situation. In general, the reward function may be stochastic.
Maximize Reward

The agent's goal is to maximize the reward it receives in the long run. When the agent-environment interaction breaks naturally into subsequences (episodes), the tasks are known as episodic tasks, and the return Rt is the sum of rewards
Rt = rt+1 + rt+2 + . . + rT ,
where T is the final time step of the episode. For continuing tasks, future rewards are discounted by a factor γ (0 ≤ γ ≤ 1) and the return is
Rt = Σ k=1..∞ γ^(k-1) rt+k .

Value Functions

The value of state s under policy π is the expected return when starting from s and choosing actions according to π :
Vπ(s) = Eπ { Rt | st = s } = Eπ { Σ k=1..∞ γ^(k-1) rt+k | st = s }
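
The return formula can be made concrete with a Monte-Carlo estimate of V(s) on a toy chain; the two-state dynamics and rewards below are our own made-up example, not from the slides.

import random

GAMMA = 0.9
P = {0: ([0, 1], [0.8, 0.2]),     # state: (next states, transition probabilities)
     1: ([0, 1], [0.1, 0.9])}
REWARD = {0: 1.0, 1: 0.0}         # reward received on entering each state

def episode_return(s, steps=100):
    g = 0.0
    for k in range(steps):        # truncated sum of gamma^k * r_{t+k+1}
        nexts, probs = P[s]
        s = random.choices(nexts, weights=probs)[0]
        g += GAMMA ** k * REWARD[s]
    return g

estimate = sum(episode_return(0) for _ in range(2000)) / 2000
print(round(estimate, 2))         # Monte-Carlo estimate of V(state 0)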
References
1. "Artificial Intelligence", by Elaine Rich and Kevin Knight, (2006), McGraw Hill
companies Inc., Chapter 17, page 447-484.
5. "AI: A New Synthesis", by Nils J. Nilsson, (1998), Morgan Kaufmann Inc., Chapter
10, Page 163-178.
An exhaustive list is