Machine Learning: Consistent and Complete H., The Space Is Restricted by

Machine Learning 2 1
Machine Learning
Basic definitions:
concept: often described implicitly (e.g. "good politician") using
examples, i.e. training data
hypothesis: an attempt to describe the concept explicitly
concept and hypothesis are each expressed in a corresponding language
a hypothesis is verified using testing data
background knowledge provides information about the context
(properties of the environment)
a learning algorithm searches the space of hypotheses to find a
consistent and complete hypothesis; the space is restricted by
introducing bias

Goal of inductive ML
Suggest a hypothesis characterizing a concept in a given domain
(= the set of objects in this domain), where the concept is described
implicitly through a limited set of classified examples $E^+$ and $E^-$.
The hypothesis:
has to cover $E^+$ while avoiding $E^-$
has to be applicable to objects which belong neither to $E^+$ nor to $E^-$.
Basic notions
$O$ - the domain of the concept $K$, i.e. $K \subseteq O$.
$E \subseteq O$ - a set of training examples, complemented by a
classification, i.e. a function $cl: E \to \{yes, no\}$.
$E^+$ denotes all elements of $E$ classified as yes; $E^-$ denotes all
elements of $E$ classified as no.
$E^+$ and $E^-$ form a disjoint cover of the set $E$.
Example 1, computer game: Is there a quick way to distinguish a
friendly robot from the others?
[Figure: friendly robots and unfriendly robots]
Concept Language and Background Knowledge
Examples of concept language:
A set of real or idealised examples expressed in the object language that
represent each of the concepts learned (Nearest Neighbour)
attribute-value pairs (propositional logic)
relational concepts (first order logic)

One can extend the concept language with user-defined
concepts or background knowledge (BK).
BK plays an important role in Inductive Logic Programming (ILP)
The use of certain BK predicates may be a necessary condition for
learning the right hypothesis.
Redundant or irrelevant BK slows down the learning.

Example 1: hypothesis and its testing
Test examples as classified by H1:

Head shape  Smiling face  Neck     Body shape  Holding  Friendly
circle      no            tie      circle      sword    yes
triangle    yes           nothing  square      nothing  yes

H1 in the form of a decision tree:
if neck(r) = bow then friendly
           = nothing then
               if head_shape(r) = triangle then friendly
               else unfriendly
           = tie then
               if body_shape(r) = square then unfriendly
               else if head_shape(r) = circle then friendly
               else unfriendly
Example 1: hypothesis and its testing

H2 using the binary relation equal (=):

if head_shape(r) = body_shape(r) then friendly
else unfriendly

Test examples as classified by H2:

Head shape  Smiling face  Neck     Body shape  Holding  Friendly
circle      no            tie      circle      sword    yes
triangle    yes           nothing  square      nothing  no

H1 and H2 both classify the training data correctly, but their
classifications differ on the test set.
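The two hypotheses can be written as plain functions and run on the two test robots. This is a minimal Python sketch: the dictionary keys and the function names `h1` and `h2` are illustrative choices, not something the slides prescribe, and only the attributes that H1 and H2 actually inspect are included.

```python
def h1(r):
    """H1: the decision tree over neck, head_shape and body_shape."""
    if r["neck"] == "bow":
        return "friendly"
    if r["neck"] == "nothing":
        return "friendly" if r["head_shape"] == "triangle" else "unfriendly"
    if r["neck"] == "tie":
        if r["body_shape"] == "square":
            return "unfriendly"
        return "friendly" if r["head_shape"] == "circle" else "unfriendly"
    raise ValueError(f"unexpected neck value: {r['neck']!r}")


def h2(r):
    """H2: friendly iff head and body have the same shape."""
    return "friendly" if r["head_shape"] == r["body_shape"] else "unfriendly"


# The two test robots (restricted to the attributes used by H1 and H2).
test_set = [
    {"head_shape": "circle", "neck": "tie", "body_shape": "circle"},
    {"head_shape": "triangle", "neck": "nothing", "body_shape": "square"},
]

predictions = [(h1(r), h2(r)) for r in test_set]
for r, (p1, p2) in zip(test_set, predictions):
    print(r, "H1:", p1, "H2:", p2)
```

The hypotheses agree on the first robot and disagree on the second, so the training data alone cannot decide which of them describes the concept better.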
Hypothesis - an attempt at a formal description
Both examples and hypotheses have to be specified in a
language. A hypothesis has the form of a formula $\varphi(X)$ with
a single free variable $X$.
Let us define the extension $Ext(\varphi)$ of a hypothesis $\varphi(X)$ wrt. the
domain $O$ as the set of all elements of $O$ which meet the
condition $\varphi$, i.e. $Ext(\varphi) = \{o \in O : \varphi(o) \text{ holds}\}$

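Properties of a hypothesis
a hypothesis $\varphi$ is complete iff it covers all positive examples, i.e.
$E^+ \subseteq Ext(\varphi)$
a hypothesis is consistent iff it covers no negative examples, i.e.
$Ext(\varphi) \cap E^- = \emptyset$
a hypothesis is correct iff it is complete and consistent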
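The three properties translate directly into set operations. The sketch below is only an illustration under stated assumptions: the toy domain (the integers 0 to 9), the example sets and the three candidate hypotheses are hypothetical and do not come from the slides; a hypothesis is modelled as a Python predicate.

```python
def extension(phi, domain):
    """Ext(phi): all objects of the domain that satisfy phi."""
    return {o for o in domain if phi(o)}


def is_complete(phi, domain, e_pos):
    """Complete: every positive example lies in the extension."""
    return e_pos <= extension(phi, domain)


def is_consistent(phi, domain, e_neg):
    """Consistent: no negative example lies in the extension."""
    return not (extension(phi, domain) & e_neg)


def is_correct(phi, domain, e_pos, e_neg):
    return is_complete(phi, domain, e_pos) and is_consistent(phi, domain, e_neg)


# Hypothetical toy data: domain 0..9, positives {2, 4}, negatives {3}.
domain = set(range(10))
e_pos, e_neg = {2, 4}, {3}


def even(o):
    return o % 2 == 0


def greater_than_one(o):
    return o > 1


def equals_two(o):
    return o == 2


for phi in (even, greater_than_one, equals_two):
    print(phi.__name__,
          "complete:", is_complete(phi, domain, e_pos),
          "consistent:", is_consistent(phi, domain, e_neg))
```

Here `even` is correct, `greater_than_one` is complete but not consistent (it covers 3), and `equals_two` is consistent but not complete (it misses 4).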
How many correct hypotheses can be
designed for a fixed training set E?
Fact: the number of possible concepts is much larger than the number of
possible hypotheses (formulas)
Consequence: most concepts cannot be
characterized by a corresponding hypothesis - we have to
accept hypotheses which are only approximately correct.
Uniqueness of an approximately correct hypothesis
cannot be ensured.
Choice of a hypothesis and Ockham's razor
William of Ockham recommends how to compare hypotheses:
Entia non sunt multiplicanda praeter necessitatem
(entities are not to be multiplied beyond necessity).
Einstein: the language should not be simpler than necessary.
Machine Learning Biases
The concept/hypothesis language specifies the
language bias, which limits the set of all
concepts/hypotheses that can be
expressed/considered/learned.
The preference bias allows us to decide between
two hypotheses (even if they both classify the
training data equally).
The search bias defines the order in which
hypotheses will be considered.
Important if one does not search the whole hypothesis
space.

Preference Bias, Search Bias & Version Space
Hypotheses are partially ordered (from the most specific to the most general).
Version space approach: searches for the subset of hypotheses that have zero
training error.

[Figure: positive (+) and negative (-) examples, with the version space lying between the most specific concept and the most general concept]
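For a small hypothesis language the version space can be found by brute-force enumeration. The sketch below makes several assumptions that are not in the slides: hypotheses are conjunctions with one constraint per attribute (a value or the wildcard `?`), and the four training robots form a hypothetical toy set over three attributes. It is not the candidate-elimination algorithm, only an exhaustive filter.

```python
from itertools import product

ANY = "?"  # wildcard: the attribute may take any value

# Assumed attribute domains (illustrative only).
VALUES = {
    "is_smiling": ["no", "yes"],
    "holding": ["sword", "balloon", "flag"],
    "has_tie": ["no", "yes"],
}
ATTRS = list(VALUES)

# Hypothetical training set: (attribute values, is_positive).
EXAMPLES = [
    (("yes", "balloon", "yes"), True),
    (("yes", "flag", "yes"), True),
    (("no", "sword", "no"), False),
    (("no", "flag", "no"), False),
]


def covers(h, x):
    """A conjunctive hypothesis covers x if every constraint matches."""
    return all(c == ANY or c == v for c, v in zip(h, x))


def more_general_or_equal(h1, h2):
    """Partial order: h1 covers everything that h2 covers."""
    return all(a == ANY or a == b for a, b in zip(h1, h2))


def version_space(examples):
    """All hypotheses with zero training error."""
    space = product(*[[ANY] + VALUES[a] for a in ATTRS])
    return [h for h in space
            if all(covers(h, x) == label for x, label in examples)]


vs = version_space(EXAMPLES)
most_specific = [h for h in vs
                 if not any(g != h and more_general_or_equal(h, g) for g in vs)]
most_general = [h for h in vs
                if not any(g != h and more_general_or_equal(g, h) for g in vs)]
print("version space:", vs)
print("most specific:", most_specific)
print("most general:", most_general)
```

The preference bias then decides which member of `vs` to report, and the search bias decides in which order the 36 candidate hypotheses are visited when enumeration is too expensive.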
Types of learning
- skill refinement (swimming, biking, ...)

- knowledge acquisition
- Rote Learning (chess, checkers): the aim is to find an appropriate
heuristic function evaluating the current state of the game, e.g. for the
MIN-MAX approach
- Case-Based Reasoning: past experience is stored in a database. To
solve a new problem, the system searches the DB to find the closest
(most similar) case; its solution is then modified for the current
problem
- Advice Taking: learning to "interpret" or "operationalize"
abstract advice, i.e. to search for its applicability conditions
- Induction, Difference Analysis: candidate-elimination or version
space approach, decision tree induction, etc.
Decision tree induction
Given: training examples uniformly described by a single set
of attributes and classified into a small set of
classes (most often into 2 classes: positive vs. negative
examples)
Find: a decision tree that can classify new instances

Simple example: robots described by 5 discrete attributes and classified
into 2 classes (friendly, unfriendly)
- Is_smiling $\in$ {no, yes},
- Holding $\in$ {sword, balloon, flag},
- Has_tie $\in$ {no, yes},
- Head_shape $\in$ {round, square, octagon},
- Body_shape $\in$ {round, square, octagon}.

            Attributes
Class       Is_smiling  Holding  Has_tie  Head_shape  Body_shape
friendly    yes         balloon  yes      square      square
friendly    yes         flag     yes      octagon     octagon
unfriendly  yes         sword    yes      round       octagon
unfriendly  yes         sword    no       square      octagon
unfriendly  no          sword    no       octagon     round
unfriendly  no          flag     no       round       octagon
TDIDT: Top-Down Induction of Decision Trees
given: $S$ ... the set of classified examples
goal: design a decision tree DT ensuring the same classification
as $S$
1. The root is denoted by $S$
2. Find the "best" attribute $at$ to be used for splitting the
current set $S$
3. Split the set $S$ into the subsets $S_1, S_2, \ldots, S_n$ wrt. the value of $at$
(all examples in the subset $S_i$ have the same value $at = v_i$).
Each of these subsets denotes a node of the DT
4. For each $S_i$ do:
If all examples in $S_i$ belong to the same class
then create a leaf labelled with that class,
else go to 1 with $S = S_i$
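The four steps map onto a short recursive function. The following is a sketch under stated assumptions rather than a reference implementation: the tree representation (a leaf is a class label, an inner node is a pair of attribute and branches), the names `tdidt` and `classify`, and the majority-label fallback when no attribute is left are all choices made here. The "best" attribute is supplied by the caller; the placeholder `first_attribute` simply takes attributes in table order. The data are the six robots from the table.

```python
from collections import Counter

ATTRIBUTES = ["is_smiling", "holding", "has_tie", "head_shape", "body_shape"]
ROWS = [
    ("friendly", "yes", "balloon", "yes", "square", "square"),
    ("friendly", "yes", "flag", "yes", "octagon", "octagon"),
    ("unfriendly", "yes", "sword", "yes", "round", "octagon"),
    ("unfriendly", "yes", "sword", "no", "square", "octagon"),
    ("unfriendly", "no", "sword", "no", "octagon", "round"),
    ("unfriendly", "no", "flag", "no", "round", "octagon"),
]
EXAMPLES = [(dict(zip(ATTRIBUTES, row[1:])), row[0]) for row in ROWS]


def tdidt(examples, attributes, choose_attribute):
    """Return a leaf (class label) or a node (attribute, {value: subtree})."""
    labels = [label for _, label in examples]
    if len(set(labels)) == 1:          # step 4: all examples in one class
        return labels[0]
    if not attributes:                 # assumption: fall back to majority label
        return Counter(labels).most_common(1)[0][0]
    at = choose_attribute(examples, attributes)   # step 2
    remaining = [a for a in attributes if a != at]
    subsets = {}                                  # step 3: split by value of at
    for x, label in examples:
        subsets.setdefault(x[at], []).append((x, label))
    return (at, {v: tdidt(s, remaining, choose_attribute)
                 for v, s in subsets.items()})


def classify(tree, x):
    """Follow the branches matching x until a leaf is reached."""
    while isinstance(tree, tuple):
        at, branches = tree
        tree = branches[x[at]]
    return tree


def first_attribute(examples, attributes):
    """Placeholder selection rule: take attributes in the given order."""
    return attributes[0]


tree = tdidt(EXAMPLES, ATTRIBUTES, first_attribute)
print(tree)
```

Any selection rule with the same signature can replace `first_attribute`; the choice affects the size of the tree, not its agreement with the training set.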

TDIDT: How to choose the "best" attribute?
minimize the entropy (Shannon)
$$H(S_i) = -p_i^+ \log p_i^+ - p_i^- \log p_i^-$$
$p_i^+$ = the probability that a random example in $S_i$ is positive,
estimated by frequency ($p_i^-$ analogously for negative examples)

Let the attribute $at$ split $S$ into the subsets $S_1, S_2, \ldots, S_n$. The
entropy of this system is defined as
$$H(S, at) = \sum_{i=1}^{n} P(S_i) \, H(S_i)$$
where $P(S_i)$ is the probability of the event $S_i$, approximated by the relative
size $|S_i| / |S|$

Choose $at$ with the minimal $H(S, at)$
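The selection rule can be checked on the six robots from the table. This sketch assumes base-2 logarithms (the slides write only "log") and treats a pure subset as having entropy 0; the function names `entropy` and `split_entropy` are illustrative.

```python
from collections import Counter
from math import log2

ATTRIBUTES = ["is_smiling", "holding", "has_tie", "head_shape", "body_shape"]
ROWS = [
    ("friendly", "yes", "balloon", "yes", "square", "square"),
    ("friendly", "yes", "flag", "yes", "octagon", "octagon"),
    ("unfriendly", "yes", "sword", "yes", "round", "octagon"),
    ("unfriendly", "yes", "sword", "no", "square", "octagon"),
    ("unfriendly", "no", "sword", "no", "octagon", "round"),
    ("unfriendly", "no", "flag", "no", "round", "octagon"),
]
EXAMPLES = [(dict(zip(ATTRIBUTES, row[1:])), row[0]) for row in ROWS]


def entropy(labels):
    """H(S_i), with class probabilities estimated by relative frequency."""
    total = len(labels)
    return -sum((c / total) * log2(c / total)
                for c in Counter(labels).values())


def split_entropy(examples, at):
    """H(S, at): entropy of the subsets, weighted by |S_i| / |S|."""
    subsets = {}
    for x, label in examples:
        subsets.setdefault(x[at], []).append(label)
    n = len(examples)
    return sum(len(s) / n * entropy(s) for s in subsets.values())


scores = {at: split_entropy(EXAMPLES, at) for at in ATTRIBUTES}
best = min(scores, key=scores.get)
for at, h in scores.items():
    print(f"{at:12s} H = {h:.4f}")
print("best attribute:", best)
```

On this table the minimum is reached by Holding with H(S, Holding) = 1/3: the balloon and sword subsets are pure, and only the two robots holding a flag remain mixed.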
Learning to fly simulator F16 [Samuel, 95]
Design an automatic controller for the F16 for the following complex task:
1. Take off and climb to a height of 2000 feet
2. Fly 32000 feet north
3. Turn right to a heading of 330°
4. When 42000 feet from the starting point (direction N-S), turn left and head
towards the starting point; the turn is finished when the course is between
140° and 180°.
5. Adjust the flight direction so that it is parallel to the landing course, with tolerance 5°
for flight direction and 10° for wing twist wrt. the horizon
6. Decrease the height and move towards the start of the landing path
7. Land
Training data: 3 skilled pilots performed the assigned mission, 30 times each.
Each flight is described by 1000 vectors (a total of 90000 training
examples) characterizing:
- Position and state of the plane
- Pilot's control action

Learning to fly simulator F16 [Samuel, 95]
Position and state:
on_ground boolean: is the plane on the ground?
g_limit boolean: acceleration limit exceeded?
wing_stall (is the plane stable?), twist (integer: 0-360, wings wrt. the horizon),
elevation (angle of the body wrt. the horizon), azimuth, roll_speed (wings deflection),
elevation_speed, azimuth_speed, airspeed, climbspeed, E/W distance, N/S
distance, fuel (weight of current supply)
Control:
rollers and elevator: position of horizontal/vertical deflection
thrust integer: 0-100%, force
flaps integer: 0, 10 or 20, wing twist
Each of the 7 phases calls for a specific type of control.
The training data are divided into 7 disjoint sets which are used to design specific
decision trees (independently for each task phase and each control action).
Control is ensured by 7 * 4 decision trees.

Tasks addressed by ML applications
Classification/prediction
diagnosis (troubleshooting motor pumps, medicine, ..., SKICAT -
astronomical cataloguing)
execution/control (GASOIL - separation of hydrocarbons)
configuration/design (Siemens: equipment configuration, Boeing)
language understanding
vision and speech
planning and scheduling
Why? Significant speed-up of development and maintenance
180 man-years to develop the expert system XCON with 8000 rules, 30 man-years needed for
maintenance
1 man-year to develop BP GASOIL (ML-based) with 2800 rules, 0.1 man-years
needed for maintenance
