12 Knn Perceptron

http://cs246.stanford.edu

We would like to do prediction: estimate a function f(x) so that y = f(x), where y can be:

- A real number: regression
- A category: classification
- A complex object

Data is labeled: we are given training pairs (x, y).

Jure Leskovec, Stanford C246: Mining Massive Datasets 2

2/14/2011

Methods covered:

- k-Nearest Neighbor (instance-based learning)
- Perceptron algorithm
- Support Vector Machines
- Decision trees


k-Nearest Neighbor (instance-based learning):

- Keep the whole training dataset: {(x, y)}
- A query example (vector) q arrives
- Find the closest training example(s) x*
- Predict y*

Example application: recommendation systems.


1-Nearest Neighbor:

- Distance metric: Euclidean
- How many neighbors to look at? One
- Weighting function (optional): unused
- How to fit with the local points? Just predict the same output as the nearest neighbor
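The 1-NN rule above is simple enough to sketch directly (a minimal illustration with made-up toy points, not the course's reference code):

```python
import math

def nearest_neighbor(train, q):
    """1-NN: return the label of the training example closest to query q.
    `train` is a list of (vector, label) pairs; the metric is Euclidean."""
    def dist(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    x_star, y_star = min(train, key=lambda xy: dist(xy[0], q))
    return y_star

# Toy 2-D data: +1 in the upper right, -1 near the origin
train = [((0.0, 0.0), -1), ((1.0, 0.5), -1),
         ((4.0, 4.0), +1), ((5.0, 3.5), +1)]
print(nearest_neighbor(train, (4.5, 4.0)))  # -> 1
print(nearest_neighbor(train, (0.5, 0.0)))  # -> -1
```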


Suppose x1, …, xm are two-dimensional: x1 = (x11, x12), x2 = (x21, x22), … One can draw the nearest-neighbor regions around each point.


k-Nearest Neighbor:

- Distance metric: Euclidean
- How many neighbors to look at? k (e.g., k = 9)
- Weighting function (optional): unused
- How to fit with the local points? Just predict the average output among the k nearest neighbors


Kernel regression:

- Distance metric: Euclidean
- How many neighbors to look at? All of them (!)
- Weighting function: wi = exp(-d(xi, q)² / Kw)
  - The weight approaches 1 as d(xi, q) → 0; the kernel width Kw (e.g., K = 10, 20, 80) controls how quickly the weights decay with distance
- How to fit with the local points? Predict the weighted average of the neighbors' outputs
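The all-neighbors weighted fit can be sketched as follows (a toy 1-D example with made-up data; Kw is the kernel-width parameter from the slide):

```python
import math

def kernel_regression(train, q, Kw):
    """Predict a weighted average over ALL training points,
    with weights w_i = exp(-d(x_i, q)^2 / Kw)."""
    def sqdist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    weights = [math.exp(-sqdist(x, q) / Kw) for x, _ in train]
    return sum(w * y for w, (_, y) in zip(weights, train)) / sum(weights)

# 1-D toy data sampled from y = 2x
train = [((0.0,), 0.0), ((1.0,), 2.0), ((2.0,), 4.0), ((3.0,), 6.0)]
print(kernel_regression(train, (1.5,), Kw=1.0))  # -> 3.0 (by symmetry)
```

A larger Kw flattens the weights toward a global average; a smaller Kw makes the prediction track the nearest points.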


Two query problems over a point set P:

- NN: find the nearest neighbor p of q in P
- Range search: find one/all points in P within distance r from q


Data structures for NN search:

Main memory:
- Linear scan
- Tree based: quadtree, kd-tree
- Hashing: Locality-Sensitive Hashing

Secondary storage:
- R-trees


Quadtree: the simplest spatial structure on Earth!

- Split the space into 2^d equal subsquares
- Repeat until done:
  - only one pixel left
  - only one point left
  - only a few points left
- Variants:
  - split only one dimension at a time: kd-trees (in a moment)


Range search over a quadtree:

- Put the root node on the stack
- Repeat:
  - pop the next node T from the stack
  - for each child C of T:
    - if C is a leaf, examine the point(s) in C
    - if C intersects the ball of radius r around q, add C to the stack

Nearest neighbor:

- Start a range search with r = ∞
- Whenever a point is found, update r
- Only investigate nodes with respect to the current r
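The shrinking-radius search can be sketched on a tiny kd-tree (splitting one dimension at a time, as mentioned earlier; the points are made up and this is an illustrative sketch, not production code):

```python
import math

def build(points, depth=0):
    """Build a kd-tree: cycle through the dimensions, split at the median point."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {"point": points[mid], "axis": axis,
            "left": build(points[:mid], depth + 1),
            "right": build(points[mid + 1:], depth + 1)}

def nearest(node, q, best=None, r=math.inf):
    """NN search: start with r = infinity, shrink r whenever a closer point
    is found, and only visit subtrees that can intersect the radius-r ball."""
    if node is None:
        return best, r
    d = math.dist(node["point"], q)
    if d < r:
        best, r = node["point"], d
    axis = node["axis"]
    diff = q[axis] - node["point"][axis]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best, r = nearest(near, q, best, r)
    if abs(diff) < r:  # the far side can still intersect the ball around q
        best, r = nearest(far, q, best, r)
    return best, r

pts = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build(pts)
print(nearest(tree, (9, 2))[0])  # -> (8, 1)
```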


Quadtree drawbacks:

- Empty spaces: if the points form sparse clouds, it takes a while to reach them
- Space exponential in the dimension
- Time exponential in the dimension, e.g., points on the hypercube


kd-trees: advantages include flexible choice of splits, e.g.:

- Pick the dimension of largest variance and split at the median (a balanced split)
- Do SVD or CUR, project, and split


kd-tree search: the same stack-based range search as before, with node-ordering variants Best-Bin-First (BBF) and Last-Bin-First (LBF).


Randomized kd-trees:

- Find the top few dimensions of largest variance
- Randomly select one of these dimensions; split on the median
- Construct many complete (i.e., one point per leaf) trees

Drawbacks:
- More memory
- An additional parameter to tune: the number of trees

Search:
- Descend through each tree until a leaf is reached
- Maintain a single priority queue across all the trees
- For approximate search, stop after a certain number of nodes have been examined

[Figure: experiments with d = 128, n = 100k (Muja-Lowe, 2010)]

Spill trees (children with overlapping regions):

- No backtracking necessary
- Increases tree depth: more memory, slower to build
- Better when the split passes through sparse regions
- Lower nodes may spill too much; fix with a hybrid of spill and non-spill nodes


Practical tips:

- For high-dimensional data, use randomized projections (CUR) or SVD
- Use Best-Bin-First (BBF):
  - Make a priority queue of all unexplored nodes
  - Visit them in order of their closeness to the query
- Space permitting:
  - Keep extra statistics on a lower and upper bound for each cell, and use the triangle inequality to prune the space
  - Use spilling to avoid backtracking
  - Use lookup tables for fast distance computation


R-trees:

- Start with a set of points/rectangles
- Partition the set into groups of small cardinality
- For each group, find the minimum bounding rectangle (MBR) containing the objects of that group
- Repeat

Advantages:
- Supports near(est) neighbor search (similar to before)
- Works for points and rectangles
- Avoids empty spaces


Group nearby rectangles into parent MBRs; every parent node completely covers its children.

[Figure: leaf rectangles A–J grouped into parent MBRs P1–P4, forming an R-tree whose root holds the entries P1, P2, P3, P4]

Insertion of a point x:

- Find an MBR intersecting with x and insert
- If a node is full, then split it:
  - Linear: choose two far-apart nodes as seeds; randomly choose the remaining nodes and assign each so that it requires the smallest MBR enlargement
  - Quadratic: choose the two nodes so that the dead space between them is maximized; insert the remaining nodes so that area enlargement is minimized

[Figure: a full node split, redistributing entries A–J among P1–P4]

In high-dimensional spaces, all tree-based indexing structures examine a large fraction of the leaves. If we need to visit so many nodes anyway, it is better to scan the whole data set and avoid performing seeks altogether (1 seek ≈ the transfer time of a few hundred KB).


Speeding up the linear scan:

- Use only i bits per dimension (and speed up the scan by a factor of 32/i)
- Identify all points which could be returned as an answer
- Verify the candidate points using the original data set


Example: spam filtering

- Binary feature vectors x of word occurrences; d features (words + other things), d ≈ 100,000
- y: Spam (+1), Ham (-1)


Binary classification:

- Input: vectors xi and labels yi
- Goal: find a vector w = (w1, w2, …, wn), where each wi is a real number, such that

  f(x) = 1 if w1 x1 + w2 x2 + … + wn xn ≥ θ, and 0 otherwise

- The decision boundary is the hyperplane w · x = θ
- Note: by mapping x → (x, 1) and w → (w, −θ), the threshold is absorbed into w and the boundary becomes w · x = 0


(Very) loose motivation: the neuron

- Inputs are feature values x1, x2, …
- Each feature has a weight wi
- Activation is the sum: f(x) = Σi wi xi = w · x
- If f(x) is positive: predict +1 (Spam); if negative: predict −1 (Ham)

[Figure: inputs x1…x4 with weights w1…w4 feeding a "> 0?" unit; the line w · x = 0 separates Spam (+1) from Ham (−1) in the (viagra, nigeria) feature plane]


Perceptron algorithm:

- Start with w0 = 0
- Pick training examples x one by one (from disk)
- Predict the class of x using the current weights: y′ = sign(wt · x)
- If y′ is correct (i.e., y′ = y): no change, wt+1 = wt
- If y′ is wrong: adjust wt, setting wt+1 = wt + η · y · x

where:
- η is the learning rate parameter
- x is the training example
- y is the true class label (∈ {+1, −1})

[Figure: the update adds η · y · x to wt, rotating it toward the misclassified example]
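The update loop can be sketched as follows (a minimal sketch on made-up, linearly separable 2-D data; the last coordinate is a constant-1 feature so the threshold is absorbed into w, as in the earlier note):

```python
def train_perceptron(data, eta=1.0, epochs=100):
    """Perceptron: start from w = 0; on every mistake set w <- w + eta * y * x."""
    w = [0.0] * len(data[0][0])
    for _ in range(epochs):
        mistakes = 0
        for x, y in data:
            y_hat = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1
            if y_hat != y:  # wrong prediction: adjust the weights
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
                mistakes += 1
        if mistakes == 0:  # a full clean pass: converged (data is separable)
            break
    return w

def predict(w, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1

# Linearly separable toy set; the third coordinate is the constant-1 feature
data = [((2.0, 1.0, 1.0), 1), ((3.0, 3.0, 1.0), 1),
        ((-1.0, -2.0, 1.0), -1), ((-2.0, 0.0, 1.0), -1)]
w = train_perceptron(data)
print([predict(w, x) for x, _ in data])  # -> [1, 1, -1, -1]
```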


If there exists a set of weights that is consistent (i.e., the data is linearly separable), the perceptron learning algorithm will converge.

If the training data is not linearly separable, the perceptron learning algorithm will eventually repeat the same set of weights, and therefore enters an infinite loop.

Properties of the perceptron:

- Separability: some set of parameters classifies the training set perfectly
- Convergence: if the training set is separable, the perceptron will converge (binary case)
- Mistake bound: the number of mistakes is at most 1/γ², where γ is the margin of the separator


Multiclass perceptron:

- Keep a weight vector wc for each class c
- Calculate the activation for each class: f(x, c) = Σi wc,i xi = wc · x
- Predict the class with the highest activation: c* = arg maxc f(x, c)
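The arg-max prediction rule looks like this in code (the class names and weight values here are hypothetical):

```python
def predict_multiclass(W, x):
    """Return the class c maximizing the activation f(x, c) = w_c . x."""
    def activation(w):
        return sum(wi * xi for wi, xi in zip(w, x))
    return max(W, key=lambda c: activation(W[c]))

# Hypothetical per-class weight vectors over 2 features
W = {"spam": [2.0, -1.0], "ham": [-1.0, 0.5], "ads": [0.5, 2.0]}
print(predict_multiclass(W, [3.0, 0.0]))  # -> spam
print(predict_multiclass(W, [0.0, 3.0]))  # -> ads
```

A common training variant (not spelled out on the slide) handles a mistake by promoting the true class's weight vector and demoting the wrongly predicted one.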


Issues with the perceptron:

- Overfitting
- Regularization: if the data is not separable, the weights dance around
- Mediocre generalization: finds a barely separating solution


Winnow algorithm: similar to the perceptron, just with multiplicative updates.

- Initialize: θ = n; wi = 1
- Prediction is 1 iff w · x ≥ θ
- If no mistake: do nothing
- If f(x) = 1 but w · x < θ: wi ← 2wi for each xi = 1 (promotion)
- If f(x) = 0 but w · x ≥ θ: wi ← wi/2 for each xi = 1 (demotion)
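The promotion/demotion loop can be sketched as follows (toy target x1 OR x2 over n = 4 Boolean features; the examples are made up):

```python
def train_winnow(data, n, epochs=50):
    """Winnow: theta = n, all weights start at 1, multiplicative updates."""
    theta = float(n)
    w = [1.0] * n
    for _ in range(epochs):
        for x, label in data:  # x is a 0/1 vector, label in {0, 1}
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else 0
            if label == 1 and pred == 0:    # promotion: double the active weights
                w = [wi * 2 if xi == 1 else wi for wi, xi in zip(w, x)]
            elif label == 0 and pred == 1:  # demotion: halve the active weights
                w = [wi / 2 if xi == 1 else wi for wi, xi in zip(w, x)]
    return w, theta

# Target concept: f(x) = x1 OR x2; features 3 and 4 are irrelevant
data = [((1, 0, 1, 0), 1), ((0, 1, 0, 1), 1), ((0, 0, 1, 1), 0),
        ((1, 1, 0, 0), 1), ((0, 0, 0, 1), 0)]
w, theta = train_winnow(data, n=4)
preds = [1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else 0 for x, _ in data]
print(preds)  # -> [1, 1, 0, 1, 0], matching the labels
```

Note how the relevant features end up with the largest weights, which is why Winnow suits problems with many irrelevant attributes.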


Extending Winnow beyond monotone functions:

Duplicate variables:
- To negate a variable xi, introduce a new variable xi′ = −xi
- Learn monotone functions over the 2n variables

Balanced version:
- Keep two weights wi⁺ and wi⁻ for each variable; the effective weight is the difference

Update rule:
- If f(x) = 1 but (w⁺ − w⁻) · x ≤ θ: wi⁺ ← 2wi⁺ and wi⁻ ← ½wi⁻ where xi = 1 (promotion)
- If f(x) = 0 but (w⁺ − w⁻) · x ≥ θ: wi⁺ ← ½wi⁺ and wi⁻ ← 2wi⁻ where xi = 1 (demotion)

Thick separator (applies to both Perceptron and Winnow): require a margin γ around the hyperplane w · x = θ:

- Promote if w · x < θ + γ
- Demote if w · x > θ − γ

Note: γ is a functional margin; its effect could disappear as w grows. Nevertheless, this has been shown to be a very effective algorithmic addition.


Hypothesis: w ∈ Rⁿ, predicting with w · x. Every update has the form w ← w + η yj xj, so the learned weight vector is a sum of (scaled) training examples.


Summary:

Perceptron
- Online: can adjust to a changing target over time
- Advantages: simple; guaranteed to learn a linearly separable problem
- Limitations: only linear separations; only converges for linearly separable data; not really efficient with many features

Winnow
- Online: can adjust to a changing target over time
- Advantages: simple; guaranteed to learn a linearly separable problem; suitable for problems with many irrelevant attributes
- Limitations: only linear separations; only converges for linearly separable data; not really efficient with many features
