Supervised Learning
[Figure: labelled training points plotted in the (x1, x2) input space]
06/03/2014
Nearest Neighbours
Each training example is a point in the n-dimensional input space:

    X(j) = (x1(j), x2(j), ..., xn(j))

The distance between examples i and j is

    D(i, j) = sqrt( sum_{k=1}^{n} (xk(i) - xk(j))^2 )
Define a distance metric between points in input space. A common measure is the Euclidean distance:

    D(i, j) = sqrt( sum_{k=1}^{n} (xk(i) - xk(j))^2 )
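The Euclidean distance can be computed directly; a minimal sketch (the function name is ours):

```python
import math

def euclidean_distance(x, y):
    """D(i, j) = sqrt( sum_k (xk(i) - xk(j))^2 ) for two points
    given as equal-length sequences of coordinates."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
```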
k = 1: for a given query point q, assign the class of the single nearest neighbour.

k > 1: compute the k nearest neighbours of q and assign the class by majority vote.
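The majority-vote rule can be sketched as follows (a minimal illustration, assuming Euclidean distance and a training set of (point, label) pairs; names are ours):

```python
import math
from collections import Counter

def knn_classify(train, query, k=1):
    """Assign the majority class among the k nearest neighbours of
    query; train is a list of (point, label) pairs."""
    dist = lambda x, y: math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    # Sort training points by distance to the query and keep the k closest.
    neighbours = sorted(train, key=lambda p: dist(p[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]
```

With k=1 this reduces to the nearest-neighbour rule above.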
[Figure: decision boundaries for k = 1, k = 3 and k = 5]
KNN Flavors
Distance-weighted KNN: weight each neighbour's vote by its inverse squared distance to the query:

    f(xq) = argmax_c  sum_{i=1}^{k} wi * delta(c, f(xi))

where

    wi = 1 / d(xq, xi)^2
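A sketch of the weighted vote (illustrative only; ties and zero distances are not handled):

```python
import math
from collections import defaultdict

def weighted_knn(train, query, k=3):
    """Distance-weighted k-NN: each of the k nearest neighbours votes
    with weight w_i = 1 / d(x_q, x_i)^2; the class with the largest
    total weight wins."""
    dist = lambda x, y: math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    neighbours = sorted(train, key=lambda p: dist(p[0], query))[:k]
    scores = defaultdict(float)
    for point, label in neighbours:
        scores[label] += 1.0 / dist(point, query) ** 2
    return max(scores, key=scores.get)
```

Note that with three neighbours of which two share a far-away class, a plain majority vote and the weighted vote can disagree: one very close neighbour can outweigh two distant ones.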
Use N-fold cross-validation: pick the K that minimizes the cross-validation error.

For each of the N training examples:
- find its K nearest neighbours
- make a classification based on these K neighbours
- calculate the classification error
Output the average error over all examples.

Use the K that gives the lowest average error over the N training examples.
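The selection procedure can be sketched as leave-one-out cross-validation (a minimal illustration; the helper names are ours):

```python
import math
from collections import Counter

def knn_classify(train, query, k):
    """Majority vote among the k nearest neighbours (Euclidean)."""
    dist = lambda x, y: math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    nn = sorted(train, key=lambda p: dist(p[0], query))[:k]
    return Counter(lbl for _, lbl in nn).most_common(1)[0][0]

def pick_k(train, candidate_ks):
    """For each candidate K, classify every training example from the
    remaining N-1 points and keep the K with the lowest average error."""
    best_k, best_err = None, float('inf')
    for k in candidate_ks:
        errors = 0
        for i, (x, y) in enumerate(train):
            rest = train[:i] + train[i + 1:]
            if knn_classify(rest, x, k) != y:
                errors += 1
        err = errors / len(train)
        if err < best_err:
            best_k, best_err = k, err
    return best_k
```

For example, a single mislabelled point near one cluster tends to hurt k=1 but is voted down at k=3.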
Advantages/Disadvantages

Advantages:
- Training is very fast
- Can learn complex target functions
- Doesn't lose information

Disadvantages:
- Slow at query time
- Easily fooled by irrelevant attributes
Storage Requirements
- Remove redundant data (condensing)
- Pre-sorting often increases the storage requirements

Curse of Dimensionality
- The required amount of training data increases exponentially with dimension
- Computational cost also increases dramatically
Condensing

The aim is to reduce the number of stored training samples. For example, retain only the samples that are needed to define the decision boundary.
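One classic condensing heuristic in this spirit is Hart's condensed nearest neighbour: a point is kept only if the subset collected so far misclassifies it, so interior points far from the decision boundary tend to be dropped. A minimal sketch (names are ours):

```python
import math

def condense(train):
    """Hart-style condensing: grow a subset by adding each training
    point that the current subset's 1-NN rule misclassifies, repeating
    until a full pass adds nothing."""
    dist = lambda x, y: math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    subset = [train[0]]
    changed = True
    while changed:
        changed = False
        for x, y in train:
            nearest = min(subset, key=lambda p: dist(p[0], x))
            if nearest[1] != y:
                subset.append((x, y))
                changed = True
    return subset
```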
Applications of KNN
KNN in CF

User-item ratings matrix; "x" marks User1's rating to be predicted:

            User1   User2   User3   User4
    Item1     x       5       4       3
    Item2     4       5       -       -
    Item3     -       -       4       3
    Item4     1       1       -       -
    Item5     5       -       1       -
    Item6     -       -       -       5
KNN in CF

Using the same ratings table, compute User1's similarity to each other user (values lie in [-1, +1]):

    Sim(u1, u2) = 0.5
    Sim(u1, u3) = -0.1
    Sim(u1, u4) = 0.3
KNN in CF

Prediction = { Sim(u1,u2) * rating_User2 + Sim(u1,u3) * rating_User3 + Sim(u1,u4) * rating_User4 } / Sum(|sim|)

Prediction = { (0.5 * 5) + (-0.1 * 4) + (0.3 * 3) } / 0.9 = 3.0 / 0.9 ≈ 3.33
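The same prediction as a short sketch (an illustration assuming normalisation by the sum of absolute similarities, so negative correlations do not inflate the result; the function name is ours):

```python
def predict(sims, ratings):
    """Similarity-weighted prediction for the missing cell: a weighted
    average of the neighbours' ratings for the target item."""
    num = sum(s * r for s, r in zip(sims, ratings))
    den = sum(abs(s) for s in sims)
    return num / den
```

Here predict([0.5, -0.1, 0.3], [5, 4, 3]) reproduces the calculation above.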
Pearson correlation
- Select the set of K most similar users
- Use their votes for prediction
Pearson Correlation
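The Pearson correlation between two users, computed over their co-rated items, can be sketched as follows (an illustration; the slide's own formula did not survive extraction, and the function name is ours):

```python
import math

def pearson(ratings_a, ratings_b):
    """Pearson correlation over co-rated items:
    sum((a - mean_a)(b - mean_b)) divided by the product of the two
    standard-deviation terms; the result lies in [-1, +1]."""
    ma = sum(ratings_a) / len(ratings_a)
    mb = sum(ratings_b) / len(ratings_b)
    num = sum((a - ma) * (b - mb) for a, b in zip(ratings_a, ratings_b))
    den = math.sqrt(sum((a - ma) ** 2 for a in ratings_a)) * \
          math.sqrt(sum((b - mb) ** 2 for b in ratings_b))
    return num / den if den else 0.0
```

Users who rate items in the same relative order get a correlation near +1, opposite orders near -1.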
Questions?