
CHAPTER 2

Machine Learning: Lazy Learning

CONCEPT LEARNING TASK

The task of concept learning is to acquire concepts from specific training examples.
Training examples are instances that either belong to the target concept, and are therefore positive examples, or do not belong to the concept, and are therefore negative examples.

EXAMPLE

Consider the example task of learning the target concept "days on which my friend Aldo enjoys his favorite water sport." Table 1 describes a set of example days, each represented by a set of attribute values. The attribute EnjoySport indicates whether or not Aldo enjoys his favorite water sport on that day. The task is to learn to predict the value of EnjoySport for an arbitrary day, based on the values of its other attributes.

Each instance x is described by the attributes Sky, AirTemp, Humidity, Wind, Water, and Forecast.

ENJOY SPORT CONCEPT LEARNING TASK

Positive and negative training examples for the target concept EnjoySport

Instances for which c(x) = 1 are called positive examples, or members of the target concept. Instances for which c(x) = 0 are called negative examples, or nonmembers of the target concept. We will often write the ordered pair (x, c(x)) to describe the training example consisting of the instance x and its target concept value c(x), where c is the target concept and x the instance. For the training examples in Table 1: c(1) = 1 (positive), c(2) = 1 (positive), c(3) = 0 (negative), c(4) = 1 (positive).
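As an illustration of such (x, c(x)) pairs, the sketch below assumes the four training days of the classic EnjoySport example (consistent with the class values listed above); the variable name training_examples is chosen for illustration:

```python
# Assumed reconstruction of the four EnjoySport training days,
# stored as (instance x, target value c(x)) pairs.
training_examples = [
    ({"Sky": "Sunny", "AirTemp": "Warm", "Humidity": "Normal",
      "Wind": "Strong", "Water": "Warm", "Forecast": "Same"}, 1),   # c(1) = 1
    ({"Sky": "Sunny", "AirTemp": "Warm", "Humidity": "High",
      "Wind": "Strong", "Water": "Warm", "Forecast": "Same"}, 1),   # c(2) = 1
    ({"Sky": "Rainy", "AirTemp": "Cold", "Humidity": "High",
      "Wind": "Strong", "Water": "Warm", "Forecast": "Change"}, 0), # c(3) = 0
    ({"Sky": "Sunny", "AirTemp": "Warm", "Humidity": "High",
      "Wind": "Strong", "Water": "Cool", "Forecast": "Change"}, 1), # c(4) = 1
]
```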

CLASSIFICATION TASK

A set of training instances and their classes are given. We need to determine a class for a new query instance.

Learning Problem: Assume that a set of training instances is given, and we need to classify a new instance according to the closest given instance.

CLASSIFICATION TASK:
STRUCTURED REPRESENTATION

Learning Problem: Assume that a set of training instances is given, and we need to classify a new instance according to the closest given instance.

Input/Given:
- A set of training instances
- Classes
- Means/tools to measure distances
- A new instance

Output/Results:
- A class for the new instance

LAZY LEARNING: GENERAL DESCRIPTION

A lazy Learning (LL) method performs its computation in the following stages:

1. LL methods learn by storing instances and require some means of measuring the distance between instances (LL stores the set of training instances in its memory).

2. When a new query instance is encountered, a set of similar instances is retrieved from memory and used to classify the new query instance.
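A minimal sketch of these two stages, assuming instances are tuples of numeric attribute values and using hypothetical names (LazyLearner, fit, predict); for brevity the retrieved set is just the single closest stored instance:

```python
import math

class LazyLearner:
    """Stage 1: store the training instances; no model is built up front."""
    def __init__(self):
        self.memory = []

    def fit(self, instances, classes):
        self.memory = list(zip(instances, classes))   # just remember the data

    def predict(self, xq):
        """Stage 2: retrieve the most similar stored instance and use its class."""
        _, g_x = min(self.memory, key=lambda pair: math.dist(pair[0], xq))
        return g_x

learner = LazyLearner()
learner.fit([(1, 1), (2, 1), (8, 8)], ["A", "A", "B"])
print(learner.predict((1.5, 1.5)))   # "A"
```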

LAZY LEARNING: GENERAL DESCRIPTION


New instances are classified according to the closest example from memory. In order to define the closest example, LL uses a distance measure to compare new instances to those stored.

FORMAL DESCRIPTION OF AN INSTANCE AND A CLASS


LL stores a training instance as a pair <xi, g(xi)>, where xi describes the attributes of the instance and g(xi) denotes its class (or value).

CLASSIFICATION TASK:
FORMAL REPRESENTATION
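One way to state the task formally, as a sketch consistent with the surrounding slides (the original notation may differ): given stored pairs <xi, g(xi)> and a distance measure d, a query xq receives the class of its nearest stored instance,

$$\hat{g}(x_q) = g(x^{*}), \quad \text{where } x^{*} = \arg\min_{x_i} d(x_q, x_i)$$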

SIMILARITY OR DISTANCE MEASURE: AN EXAMPLE

Euclidean distance: let an arbitrary instance xi be described by the set of attribute values xi = {v1(xi), v2(xi), ..., vn(xi)}, where vr(xi) denotes the value of the rth attribute of instance xi. Then the distance between two instances xi and xj is defined to be

$$d(x_i, x_j) = \sqrt{\sum_{r=1}^{n} \bigl(v_r(x_i) - v_r(x_j)\bigr)^2}$$
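As a concrete illustration, a minimal Python sketch of this distance, assuming instances are plain sequences of numeric attribute values (the function name euclidean_distance is illustrative):

```python
import math

def euclidean_distance(xi, xj):
    """d(xi, xj) = square root of the sum over attributes r of (v_r(xi) - v_r(xj))^2."""
    return math.sqrt(sum((vi - vj) ** 2 for vi, vj in zip(xi, xj)))

# Two instances described by three numeric attributes:
print(euclidean_distance([1.0, 2.0, 3.0], [4.0, 6.0, 3.0]))   # 5.0
```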

MEASUREMENT OF DISTANCE: PROBLEMS

There are many distance measures that have been proposed to decide which training instance is closest to a given query instance [Michalski, Stepp & Diday, 1981; Diday, 1974]. Many of these metrics work well for numerical attributes but do not appropriately handle nominal (i.e. symbolic, unordered, e.g. letters of the alphabet, which have no natural inter-letter distance) attributes. When the features are numeric, Euclidean distance can be used to compare examples. When the feature values are symbolic and unordered, LL uses distance measures designed for nominal attributes (e.g. Stanfill & Waltz, 1986).
Nominal value: a value drawn from a set of unordered, named categories rather than measured as a real number.
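A common simple workaround, shown here as a sketch only (it is not the Stanfill & Waltz metric itself, and it ignores attribute scaling), is to give nominal attributes an overlap term, 0 for a match and 1 for a mismatch, while numeric attributes keep their squared difference:

```python
import math

def mixed_distance(xi, xj, nominal):
    """Distance over mixed attributes: squared difference for numeric attributes,
    a 0/1 overlap term for nominal ones.
    `nominal` is the set of attribute positions holding symbolic values."""
    total = 0.0
    for r, (vi, vj) in enumerate(zip(xi, xj)):
        if r in nominal:
            total += 0.0 if vi == vj else 1.0   # unordered: match or mismatch
        else:
            total += (vi - vj) ** 2             # numeric: squared difference
    return math.sqrt(total)

# Example: attributes = (height_cm, eye_colour)
print(mixed_distance((170, "blue"), (180, "brown"), nominal={1}))
```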

INSTANCE-BASED LEARNING ALGORITHM


1. Define the set of training examples.

2. Read a query example xq.

3. For each training example xi, determine the similarity between xi and xq, i.e. a distance measure is applied.

4. Label xq with the class of the closest example in the set of training examples.

5. Include xq in the set of training examples and go to step 2.
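A compact Python sketch of steps 1 to 5, assuming numeric attributes and the Euclidean distance defined above; the "read a query" loop of step 2 is replaced by a direct function call, and step 5 simply appends the newly labelled query to the stored examples:

```python
import math

def euclidean_distance(xi, xj):
    return math.sqrt(sum((vi - vj) ** 2 for vi, vj in zip(xi, xj)))

# Step 1: the set of training examples, stored as pairs <xi, g(xi)>.
training_examples = [
    ((1.0, 1.0), "A"),
    ((2.0, 1.5), "A"),
    ((8.0, 8.0), "B"),
    ((9.0, 7.5), "B"),
]

def classify(xq, examples):
    """Steps 3-4: compare xq to every stored instance and label it with
    the class of the closest one."""
    _, nearest_class = min(
        examples, key=lambda pair: euclidean_distance(pair[0], xq))
    examples.append((xq, nearest_class))   # step 5: grow the memory
    return nearest_class

# Step 2: read a query example xq (here passed in directly).
print(classify((1.5, 1.2), training_examples))   # "A"
```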

BUSINESS INTELLIGENCE EXAMPLE ONE


Machine Learning: Lazy Learning or Instance-Based Learning. Problem 1: Suppose you have a database of people: David, Daniel, Patrick and John are people's names, all stored in this database as instances. Some attributes from the database could be <weight, height, test_scores, age>, so the instances are the records (rows) of the People table and the attributes are its columns.

PROBLEM 1:
Name    | Weight (kg) | Height (cm) | Test Score (out of 100%) | Age (years)
--------|-------------|-------------|--------------------------|------------
David   | 108.4       | 177         | 78                       | 31
Daniel  | 88.2        | 183         | 60                       | 25
Patrick | 81          | 175         | 54                       | 27
John    | 104         | 198         | 55                       | 38

1. Find the Euclidean distance between the instances David and John.

2. Suppose that a target instance called Mark, who is not stored in our hypothetical database above, has the following attributes: <weight = 96, height = 187, test_scores = 91, age = 34>. Find the closest person whose attributes match Mark's attributes.
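A short sketch that works through both questions in plain Python, using the values from the table above and unscaled Euclidean distance (variable names are illustrative):

```python
import math

people = {
    "David":   (108.4, 177, 78, 31),
    "Daniel":  (88.2, 183, 60, 25),
    "Patrick": (81, 175, 54, 27),
    "John":    (104, 198, 55, 38),
}

def euclidean_distance(xi, xj):
    return math.sqrt(sum((vi - vj) ** 2 for vi, vj in zip(xi, xj)))

# Question 1: distance between David and John.
print(euclidean_distance(people["David"], people["John"]))   # ~32.22

# Question 2: closest stored person to Mark <96, 187, 91, 34>.
mark = (96, 187, 91, 34)
closest = min(people, key=lambda name: euclidean_distance(people[name], mark))
print(closest)   # David (distance ~20.8)
```

Note that weight, height, test score and age are on quite different scales, so in practice the attributes would usually be normalised before measuring distance; here the raw values are used, since the problem statement gives no scaling.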

K-NEAREST NEIGHBOUR ALGORITHM: DESCRIPTION

In its basic form the k-nearest neighbour algorithm stores all training instances. The first stage: at classification time, it computes a distance measure between the query instance and each of the training instances, and selects the k nearest training instances.

The second stage: A simple majority vote is used. The majority class of the k nearest training instances is predicted to be the class for the query instance.
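A minimal sketch of both stages, assuming numeric attributes, Euclidean distance and a simple majority vote (knn_classify is an illustrative name):

```python
import math
from collections import Counter

def euclidean_distance(xi, xj):
    return math.sqrt(sum((vi - vj) ** 2 for vi, vj in zip(xi, xj)))

def knn_classify(xq, training_examples, k=3):
    """Stage 1: find the k stored instances nearest to xq.
    Stage 2: predict the majority class among those k neighbours."""
    neighbours = sorted(
        training_examples, key=lambda pair: euclidean_distance(pair[0], xq))[:k]
    votes = Counter(g_x for _, g_x in neighbours)
    return votes.most_common(1)[0][0]

training_examples = [((1, 1), "+"), ((2, 1), "+"), ((8, 8), "-"),
                     ((9, 7), "-"), ((7, 9), "-")]
print(knn_classify((2, 2), training_examples, k=3))   # "+"
```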

FORMAL DESCRIPTION OF K-NEAREST NEIGHBOUR ALGORITHM
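For a discrete-valued target function f, the standard formulation is sketched below (notation follows the pair <xi, f(xi)> convention used above; V is the finite set of class values, x1, ..., xk are the k training instances nearest to xq, and δ(a, b) = 1 if a = b and 0 otherwise):

$$\hat{f}(x_q) \leftarrow \arg\max_{v \in V} \sum_{i=1}^{k} \delta\bigl(v, f(x_i)\bigr)$$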

K-NEAREST NEIGHBOUR ALGORITHM: EXAMPLE 1

The figure illustrates the operation of the k-NEAREST NEIGHBOUR algorithm for the case where the instances are points in a two-dimensional space and where the target function is Boolean-valued. The positive and negative training examples are shown by "+" and "-" respectively.


LAZY LEARNING: THREE DISTINGUISHING CHARACTERISTICS

Lazy Learning (LL) algorithms exhibit three characteristics that distinguish them from other learning algorithms (i.e. algorithms that lead to performance improvement over time). They:

1. defer processing of their instances until they receive requests for information; they simply store their instances for future use;
2. reply to information requests by combining their stored training instances/data;
3. discard any intermediate results.

DISTANCE-WEIGHTED NEAREST NEIGHBOUR ALGORITHM

The algorithm uses weights to indicate the contribution of each of the k neighbours according to their distance to the query point xq, giving greater weight to closer neighbours:

$$\hat{f}(x_q) \leftarrow \arg\max_{v \in V} \sum_{i=1}^{k} w_i \, \delta\bigl(v, f(x_i)\bigr)$$

where

$$w_i \equiv \frac{1}{d(x_q, x_i)^2}$$

and δ(a, b) = 1 if a = b and 0 otherwise.
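A minimal sketch of the distance-weighted vote, assuming Euclidean distance and the weight w_i = 1 / d(xq, xi)^2 defined above (an exact match, d = 0, is handled by returning that instance's class directly):

```python
import math
from collections import defaultdict

def euclidean_distance(xi, xj):
    return math.sqrt(sum((vi - vj) ** 2 for vi, vj in zip(xi, xj)))

def weighted_knn_classify(xq, training_examples, k=3):
    """Each of the k nearest neighbours votes with weight 1 / d(xq, xi)^2,
    so closer neighbours contribute more to the predicted class."""
    neighbours = sorted(
        training_examples, key=lambda pair: euclidean_distance(pair[0], xq))[:k]
    votes = defaultdict(float)
    for xi, f_xi in neighbours:
        d = euclidean_distance(xq, xi)
        if d == 0.0:                  # exact match: use its class directly
            return f_xi
        votes[f_xi] += 1.0 / d ** 2   # w_i = 1 / d(xq, xi)^2
    return max(votes, key=votes.get)

training_examples = [((1, 1), "+"), ((2, 1), "+"), ((4, 4), "-"),
                     ((5, 5), "-"), ((6, 4), "-")]
print(weighted_knn_classify((2, 2), training_examples, k=3))   # "+"
```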

PROBLEMS IN LAZY LEARNING


The central issues in lazy learning systems are:
- What training instances should be remembered?
- How can the similarity of instances be measured?
- How should the new instance be related to the remembered instances?
