The task of concept learning is to acquire concepts from specific training examples.
Training examples are instances that either belong to the target concept, and are therefore positive examples, or do not belong to the concept, and are therefore negative examples.
EXAMPLE
Consider the example task of learning the target concept "days on which my friend Aldo enjoys his favorite water sport." Table 1 describes a set of example days, each represented by a set of attributes. The attribute EnjoySport indicates whether or not Aldo enjoys his favorite water sport on that day. The task is to learn to predict the value of EnjoySport for an arbitrary day, based on the values of its other attributes.
Each instance x is described by the attributes Sky, AirTemp, Humidity, Wind, Water, and Forecast.
Table 1: Positive and negative training examples for the target concept EnjoySport
Instances for which c(x) = 1 are called positive examples, or members of the target concept. Instances for which c(x) = 0 are called negative examples, or nonmembers of the target concept. We will often write the ordered pair (x, c(x)) to describe the training example consisting of the instance x and its target concept value c(x), where c is the target concept and x is the instance. For the four example days: c(1) = 1 (positive), c(2) = 1 (positive), c(3) = 0 (negative), c(4) = 1 (positive).
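As a concrete illustration, the example days can be stored directly as ordered pairs (x, c(x)). The attribute values below are taken from the classic EnjoySport table in Mitchell's Machine Learning textbook, which the labels c(1)…c(4) above correspond to; they are shown here only as a sketch of the representation:

```python
# Each training example is an ordered pair (x, c(x)), where x is a tuple of
# attribute values (Sky, AirTemp, Humidity, Wind, Water, Forecast) and
# c(x) is 1 for a positive example, 0 for a negative one.
training_examples = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),   1),  # c(1) = 1
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"),   1),  # c(2) = 1
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), 0),  # c(3) = 0
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), 1),  # c(4) = 1
]

positives = [x for x, c in training_examples if c == 1]
print(len(positives))  # 3 positive examples
```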
CLASSIFICATION TASK
A set of training instances and their classes are given. We need to determine a class for a new query instance.
CLASSIFICATION TASK: STRUCTURED REPRESENTATION
Learning Problem: Assume that a set of training instances is given, and we need to classify a new instance according to the closest given instance.
Input/Given:
- A set of training instances
- Classes
- Means/tools to measure distances
- A new instance
A lazy Learning (LL) method performs its computation in the following stages:
1. LL methods learn by storing instances and require some means of measuring the distance between instances (LL stores a set of training instances in its memory).
2. When a new query instance is encountered, a set of similar instances is retrieved from memory and used to classify the new query instance.
Instances are classified according to the closest example from memory. In order to define the closest example, LL uses a distance measure to compare new instances to those stored.
LL stores a training instance as a pair <xi, g(xi)>, where xi describes the attributes of the instance and g(xi) denotes its class (or value).
CLASSIFICATION TASK: FORMAL REPRESENTATION
Euclidean distance: let an arbitrary instance xi be described by the set of attribute values xi = {v1(xi), v2(xi), …, vn(xi)}, where vr(xi) denotes the value of the rth attribute of instance xi. Then the distance between two instances xi and xj is defined to be

d(xi, xj) = sqrt( Σ_{r=1}^{n} ( vr(xi) − vr(xj) )^2 )
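This definition translates directly into code. A minimal sketch, assuming instances are plain sequences of numeric attribute values:

```python
import math

def euclidean_distance(xi, xj):
    """d(xi, xj) = sqrt of the sum over r of (v_r(xi) - v_r(xj))^2."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))

# Classic 3-4-5 right triangle as a quick sanity check.
print(euclidean_distance((0, 0), (3, 4)))  # 5.0
```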
There are many distance measures that have been proposed to decide which training instance is closest to a given query [Michalski, Stepp & Diday, 1981; Diday, 1974]. Many of these metrics work well for numerical attributes but do not appropriately handle nominal (i.e. symbolic, unordered, e.g. the letters of the alphabet, which have no natural inter-letter distance) attributes. When the features are numeric, Euclidean distance can be used to compare examples. When the feature values are symbolic and unordered, LL uses methods for scaling and comparing symbolic attribute values (e.g. Stanfill & Waltz, 1986).
Nominal value: a value that names a category rather than measuring an amount; it has no meaningful numeric ordering.
1. Store the set of training examples <xi, g(xi)>.
2. Compute the distance between the query instance xq and each stored instance xi to determine similarity.
3. Label xq with the class of the closest example in the set of training examples.
4. Include xq in the set of training examples and go to step 2.
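The steps above can be sketched as follows. This is a minimal 1-nearest-neighbour illustration; the helper names are illustrative, not from the source:

```python
import math

def euclidean(xi, xj):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))

def classify_nearest(training, xq):
    """Compute the distance from xq to every stored instance and
    label xq with the class of the closest one."""
    nearest_x, nearest_class = min(training, key=lambda pair: euclidean(pair[0], xq))
    return nearest_class

# Store the training instances as pairs <xi, g(xi)>.
training = [((1.0, 1.0), "+"), ((6.0, 6.0), "-")]

label = classify_nearest(training, (2.0, 2.0))
print(label)  # "+" -- (2, 2) is closest to (1, 1)

# The newly labelled instance may then be added back to memory.
training.append(((2.0, 2.0), label))
```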
PROBLEM 1:
Name     Weight (kg)   Height (cm)   Test score (out of 100%)   Age (years)
David    108.4         177           78                          31
Daniel   88.2          183           60                          25
Patrick  81            175           54                          27
John     104           198           55                          38
1. Find the Euclidean distance between the instances David and John.
2. Suppose that a target instance called Mark, who is not stored in our hypothetical database from the table above, has the following attributes: <weight = 96, height = 187, test_scores = 91, age = 34>. Find the closest person whose attributes match Mark's attributes.
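One way to work through both questions is shown below (a sketch; the attribute values come from the table above and are used unscaled):

```python
import math

# Attribute order: weight (kg), height (cm), test score (%), age (years).
people = {
    "David":   (108.4, 177, 78, 31),
    "Daniel":  (88.2,  183, 60, 25),
    "Patrick": (81,    175, 54, 27),
    "John":    (104,   198, 55, 38),
}

def d(xi, xj):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))

# Question 1: Euclidean distance between David and John.
print(round(d(people["David"], people["John"]), 2))  # 32.22

# Question 2: the closest stored person to Mark.
mark = (96, 187, 91, 34)
closest = min(people, key=lambda name: d(people[name], mark))
print(closest)
```

Note that with raw, unscaled attributes the largest-magnitude differences dominate the distance; in practice attributes are often normalised before comparing instances.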
In its basic form the k-nearest neighbour algorithm stores all training instances. The first stage: at classification time, it computes a distance measure between the query instance and each of the training instances, and selects the nearest k training instances.
The second stage: A simple majority vote is used. The majority class of the k nearest training instances is predicted to be the class for the query instance.
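A minimal sketch of these two stages (the function and variable names are illustrative, not from the source):

```python
import math
from collections import Counter

def euclidean(xi, xj):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))

def knn_classify(training, xq, k=3):
    # Stage 1: rank the stored instances by distance to xq and keep the nearest k.
    nearest = sorted(training, key=lambda pair: euclidean(pair[0], xq))[:k]
    # Stage 2: simple majority vote over the classes of the k neighbours.
    votes = Counter(cls for _, cls in nearest)
    return votes.most_common(1)[0][0]

training = [((1, 1), "+"), ((1, 2), "+"), ((2, 1), "+"),
            ((6, 6), "-"), ((6, 7), "-")]
print(knn_classify(training, (2, 2), k=3))  # "+"
```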
K-NEAREST NEIGHBOUR ALGORITHM: EXAMPLE 1
The figure illustrates the operation of the k-NEAREST NEIGHBOUR algorithm for the case where the instances are points in a two-dimensional space and where the target function is Boolean-valued. The positive and negative training examples are shown by "+" and "-" respectively.
Lazy learning methods:
1. defer processing of training instances until they receive requests for information; they simply store their instances for future use;
2. reply to information requests by combining their stored training instances/data;
3. discard any intermediate results.
The distance-weighted k-NN algorithm uses weights to indicate the contribution of each of the k neighbours according to their distance from the query point xq, giving greater weight to closer neighbours.
where the weight of neighbour xi is

wi = 1 / d(xq, xi)^2
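A sketch of distance-weighted voting under this weighting (a minimal illustration; when the query coincides with a stored instance, d = 0, the stored class is returned directly to avoid division by zero):

```python
import math
from collections import defaultdict

def euclidean(xi, xj):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))

def weighted_knn_classify(training, xq, k=3):
    ranked = sorted(training, key=lambda pair: euclidean(pair[0], xq))[:k]
    scores = defaultdict(float)
    for xi, cls in ranked:
        dist = euclidean(xi, xq)
        if dist == 0:                    # query matches a stored instance exactly
            return cls
        scores[cls] += 1.0 / dist ** 2   # wi = 1 / d(xq, xi)^2
    return max(scores, key=scores.get)

training = [((1, 1), "+"), ((1, 2), "+"), ((6, 6), "-")]
# Two "+" neighbours outvote one "-" by count, but the single close "-"
# neighbour dominates once the votes are weighted by 1/d^2.
print(weighted_knn_classify(training, (5, 5), k=3))  # "-"
```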