CSL465/603 - Machine Learning
Fall 2016
Narayanan C Krishnan
ckn@iitrpr.ac.in
Introduction
Administrative Trivia
Course Structure: 3-0-2
Lecture Timings
Monday 9.55-10.45am
Tuesday 10.50-11.40am
Wednesday 11.45am-12.35pm
Lab hours
Monday 1.30-4.10pm
Tuesday 1.30-4.10pm
TA
Sanatan Sukhija
sanatan@iitrpr.ac.in
Second TA - TBD
Office Hours
csl603f2016@iitrpr.ac.in
Pseudonym
Reference Material
No fixed textbook.
Primary reference books/sources will be announced
Pre-requisites
Officially CSL201 Data Structures
However, we will be using concepts from
Probability
Statistics
Linear Algebra
Optimization (operations research)
Quizzes 30%
Almost every Thursday
9.00-10.00am
Room - L3
Covers material discussed from the previous quiz till the current week
Duration: 30-45 minutes
Top 6 out of 8 will be considered towards the final grade
Quiz  Date
Q1    4/8
Q2    11/8
Q3    25/8
Q4    1/9
Q5    6/10
Q6    13/10
Q7    27/10
Q8    3/11
Labs 30%
Due every third Friday, 11.55pm
Programming assignments
Start early, experiments will take time to run!
Individual labs
TA is available for any assistance
Lab   Date
L1    19/8
L2    9/9
L3    30/9
L4    21/10
L5    11/11
Students are encouraged to contact the TA for clarifications regarding the labs
Grading Scheme
Tentative Breakup
Passing criteria
A student must secure an overall score of 40 (out of 100) and a combined exam score of 60 (out of 200) to pass the course.
Honor Code
Unless explicitly stated otherwise, for all labs
Strictly individual effort
Group discussions at a high level are encouraged
You are forbidden from trawling the web for answers/code etc.
Course Website
http://cse.iitrpr.ac.in/ckn/courses/f2016/csl603/csl603.html
All class related material will be accessible from the webpage
Labs will be uploaded incrementally and will be notified through email
Lab submission is only on Moodle
Wikipedia
Machine learning deals with the construction and study of systems that can learn from data, rather than follow only explicitly programmed instructions.
http://www.gartner.com/newsroom/id/3114217
Related Disciplines
Probability and Statistics
Applied Mathematics
Operations Research
Pattern Recognition
Artificial Intelligence
Data Mining
Cognitive Science
Neuroscience
Big Data
General Architecture
Pedro Domingos
Evaluation
How would you like to measure the goodness of what is being learned?
Optimization
Given the evaluation and characterization, find the optimum representation.
General Architecture - Optimization
Combinatorial optimization - e.g., greedy search
Convex optimization - e.g., gradient descent
Constrained optimization - e.g., linear programming
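As a toy illustration of the gradient descent bullet above, here is a minimal sketch that minimizes the convex function f(w) = (w - 3)^2; the learning rate and starting point are illustrative choices, not values from the course.

```python
def gradient_descent(grad, w0, lr=0.1, steps=100):
    """Repeatedly step against the gradient of a differentiable function."""
    w = w0
    for _ in range(steps):
        w = w - lr * grad(w)  # move opposite to the slope
    return w

# f(w) = (w - 3)^2 has gradient f'(w) = 2 * (w - 3) and its minimum at w = 3.
w_star = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)
print(round(w_star, 4))  # → 3.0
```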
Supervised Learning
Classification
The data from this example come from the handwritten ZIP codes on envelopes from U.S. postal mail. Each image is a segment from a five digit ZIP code, isolating a single digit. The images are 16×16 eight-bit grayscale maps, with each pixel ranging in intensity from 0 to 255. Some sample images are shown in Figure 1.2. The images have been normalized to have approximately the same size and orientation. The task is to predict, from the 16×16 matrix of pixel intensities, the identity of each image (0, 1, ..., 9) quickly and accurately. If it is accurate enough, the resulting algorithm would be used as part of an automatic sorting procedure for envelopes. This is a classification problem for which the error rate needs to be kept very low to avoid misdirection of mail.
[Figure: eight ILSVRC-2010 test images]
Supervised Learning
Classification
Regression
https://www.flickr.com/photos/30686429@N07/sets/72157622330082619/
Reminder
If you have decided to credit this course and have not pre-registered
Send me an email at the earliest to add you to the Google group.
Unsupervised Learning
Clustering
Rule Mining
Semi-supervised Learning
Dimensionality Reduction
Reinforcement Learning
Active Learning
Learning algorithm interactively queries an oracle to obtain the desired outputs for new data points
Online Learning
Learning on the fly
Zero-shot learning
Representation Learning
Automatically learning the representation from raw data
Deep Learning
Unsupervised Learning
Clustering
Dimensionality reduction
Temporal models
Hidden Markov model
Overfitting
Things look rosy while training, but fail miserably when testing
Software
Weka (Java)
R (~ Python)
Machine learning open source software
(mloss.org/software)
LibSVM
Supervised Learning
Supervised Learning
Given a set of training examples (x, f(x) = y), for some unknown function f
Estimate a good approximation to f
Example applications
Face recognition
x: raw intensity face image
f(x): name of the person
Loan approval
x: properties of a customer (like age, income, liability, job, ...)
f(x): loan approved or not
Autonomous Steering
x: image of the road ahead
f(x): degrees to turn the steering wheel
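These applications can be made concrete with a tiny hedged sketch: the learner never sees the unknown function f, only labelled pairs, and builds an approximation from them. Here the data and the 1-nearest-neighbour rule are made-up illustrations, loosely mimicking a loan-approval threshold on a single customer feature.

```python
# Hypothetical training pairs (x, y) sampled from an unknown f (here, y = 1 iff x >= 5).
train = [(1, 0), (2, 0), (4, 0), (6, 1), (8, 1), (9, 1)]

def predict(x):
    # 1-nearest-neighbour: copy the label of the closest training example.
    nearest = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest[1]

print(predict(3))  # → 0, close to the negative examples
print(predict(7))  # → 1, close to the positive examples
```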
Representation
Each car is represented by two features (attributes): engine power and price
Training set
Several training examples of already classified cars
Goal
Learn a classifier that accurately classifies new (unseen) cars
Example: Cars
[Figure: scatter plot of training examples; x1: price, x2: engine power]
Definitions (1)
Feature (attribute): x_j
Instance: x = [x_1, x_2, ..., x_d]^T
Instance space:
Space of all possible instances
Class:
Categorical feature of an object
Set of instances of objects in this category
E.g., family car
[Figure: axis-aligned rectangle in the (price, engine power) plane, bounded by p1, p2 and e1, e2; x1: price]
Definitions (2)
Example: (x, y)
Instance along with its class membership
Positive example: member of class (y = 1)
Negative example: not a member of class (y = 0)
Target concept (c)
Correct expression of class
E.g., (e1 ≤ engine power ≤ e2) AND (p1 ≤ price ≤ p2)
Concept class C
Space of all possible target concepts
E.g., axis-aligned rectangles in instance space
E.g., power set of instance space
Definitions (3)
Hypothesis: h: X → {0, 1}
Approximation to target concept
Hypothesis class H:
Space of all possible hypotheses
E.g., axis-aligned rectangles
E.g., axis-aligned ellipses
Learning goal
Find hypothesis h ∈ H that closely approximates target concept c
h is the output classifier
Target concept c may not be in H
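A minimal sketch of one hypothesis from the axis-aligned rectangle class for the two-feature car example; the boundary values p1, p2, e1, e2 are hypothetical, chosen only to illustrate the idea.

```python
def make_rectangle_hypothesis(p1, p2, e1, e2):
    """h(x) = 1 iff the car's price and engine power both fall inside the rectangle."""
    def h(price, power):
        return 1 if (p1 <= price <= p2 and e1 <= power <= e2) else 0
    return h

# Hypothetical boundaries; learning would search the class for the best such rectangle.
h = make_rectangle_hypothesis(p1=10, p2=20, e1=100, e2=200)
print(h(15, 150))  # → 1, inside the rectangle (positive example)
print(h(25, 150))  # → 0, price falls outside [p1, p2]
```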
Definitions (4)
Empirical error
How well h classifies the training set X
E(h|X) = (1/N) Σ_{t=1..N} 1(h(x^t) ≠ y^t)
True error
How well h classifies the entire instance space 𝒳
E(h) = (1/|𝒳|) Σ_{x^t ∈ 𝒳} 1(h(x^t) ≠ y^t)
Generalization error
Version space
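The empirical error is just the fraction of training examples the hypothesis misclassifies; a minimal sketch on made-up data:

```python
def empirical_error(h, data):
    """E(h|X) = (1/N) * sum of 1(h(x) != y) over the N training pairs."""
    mistakes = sum(1 for x, y in data if h(x) != y)
    return mistakes / len(data)

h = lambda x: 1 if x >= 5 else 0                  # a simple threshold hypothesis
data = [(2, 0), (4, 0), (6, 1), (8, 1), (4, 1)]   # last pair is hypothetical label noise
print(empirical_error(h, data))  # → 0.2, one mistake out of five
```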
[Figure: version space in the (price, engine power) plane - hypotheses between the most specific (S) and most general (G) boundaries consistent with the training data, around the target concept C; x1: price]
In general
Model (hypothesis): h(x|θ)
Loss function: E(θ|X) = Σ_t L(y^t, h(x^t|θ))
Optimization procedure: θ* = argmin_θ E(θ|X)
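The three ingredients in miniature, on made-up data: a threshold model h(x|θ), a 0-1 loss summed over the training set, and argmin implemented as a brute-force grid search (a stand-in for a real optimization procedure).

```python
data = [(1, 0), (2, 0), (3, 0), (6, 1), (7, 1)]   # hypothetical labelled examples

def loss(theta):
    # E(theta|X): total 0-1 loss of the threshold model h(x|theta) = 1(x >= theta)
    return sum(1 for x, y in data if (1 if x >= theta else 0) != y)

theta_star = min(range(0, 10), key=loss)  # argmin over a grid of candidate thetas
print(theta_star, loss(theta_star))       # → 4 0: theta = 4 separates the classes
```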
Impact
Overfitting - trying too hard to fit the hypothesis h to the noisy data.
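Overfitting in miniature, under assumed toy data: a lookup table that memorizes the noisy training set gets empirical error zero, yet has nothing sensible to say about unseen points.

```python
train = {1: 0, 2: 0, 4: 1, 6: 1, 8: 1}   # hypothetical data; (4, 1) is label noise
def overfit_h(x):
    return train.get(x, 0)               # pure memorization, default 0 elsewhere

train_error = sum(overfit_h(x) != y for x, y in train.items()) / len(train)
test = [(5, 1), (7, 1), (9, 1)]          # unseen points from the same concept (x >= 5)
test_error = sum(overfit_h(x) != y for x, y in test) / len(test)
print(train_error, test_error)  # → 0.0 1.0: perfect on training, useless on test
```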
Underfitting vs Overfitting
[Figure: two hypotheses h1 and h2 in the (x1, x2) plane, one underfitting and one overfitting the data]
Bias vs Variance
[Figure 1: Bias and variance illustrated with dart-throwing - four panels for the combinations of low/high bias and low/high variance. Domingos, CACM 2012]
Characterization of Hypothesis Space
Is the hypothesis deterministic or stochastic?
Deterministic - training example is either consistent (correctly predicted) or inconsistent (incorrectly predicted)
Stochastic - training example is more or less likely (probabilistic output)
Pedro Domingos
Search procedure
Direct computation - solve for hypothesis directly
Local search - start with an initial hypothesis, make small improvements until a local optimum
Timing
Eager - analyze training data and construct an explicit hypothesis
Online - analyze each training example as it is presented
Batch - collect training examples and analyze them together
Lazy - store the training data and wait until a test data point is presented to construct the hypothesis
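The eager vs lazy distinction can be sketched on assumed toy data: the eager learner does its analysis up front and keeps only an explicit hypothesis, while the lazy learner stores the data and defers all work to query time.

```python
data = [(1, 0), (2, 0), (6, 1), (8, 1)]  # hypothetical one-feature training set

# Eager: analyze now, build an explicit hypothesis (midpoint between class means).
xs0 = [x for x, y in data if y == 0]
xs1 = [x for x, y in data if y == 1]
threshold = (sum(xs0) / len(xs0) + sum(xs1) / len(xs1)) / 2
eager_h = lambda x: 1 if x >= threshold else 0

# Lazy: just store the data; classify by 1-nearest-neighbour when queried.
def lazy_h(x):
    return min(data, key=lambda pair: abs(pair[0] - x))[1]

print(eager_h(5), lazy_h(5))  # → 1 1: both agree on this query point
```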