CSL465/603 - Machine Learning
Fall 2016
Narayanan C Krishnan
ckn@iitrpr.ac.in
Introduction
Administrative Trivia
Course Structure: 3-0-2
Lecture Timings
Monday 9.55-10.45am
Tuesday 10.50-11.40am
Wednesday 11.45am-12.35pm
Lab hours
Monday 1.30-4.10pm
Tuesday 1.30-4.10pm
TA
Sanatan Sukhija
sanatan@iitrpr.ac.in
Second TA - TBD
Office Hours
csl603f2016@iitrpr.ac.in
Pseudonym
Reference Material
No fixed textbook.
Primary reference books/sources will be announced
Pre-requisites
Officially CSL201 Data Structures
However, we will be using concepts from
Probability
Statistics
Linear Algebra
Optimization (operations research)
Quizzes 30%
Almost every Thursday
9.00-10.00am
Room - L3
Covers material discussed from the previous quiz till the current week
Duration: 30-45 minutes
Top 6 out of 8 will be considered towards the final grade
Quiz  Date
Q1    4/8
Q2    11/8
Q3    25/8
Q4    1/9
Q5    6/10
Q6    13/10
Q7    27/10
Q8    3/11
Labs 30%
Due every third Friday, 11.55pm
Programming assignments
Start early, experiments will take time to run!
Individual labs
TA is available for any assistance
Lab   Date
L1    19/8
L2    9/9
L3    30/9
L4    21/10
L5    11/11
Students are encouraged to contact the TA for clarifications regarding the labs
Grading Scheme
Tentative Breakup
Passing criteria
A student must secure an overall score of 40 (out of 100) and a combined exam score of 60 (out of 200) to pass the course.
Honor Code
Unless explicitly stated otherwise, for all labs
Strictly individual effort
Group discussions at a high level are encouraged
You are forbidden from trawling the web for answers/code etc.
Course Website
http://cse.iitrpr.ac.in/ckn/courses/f2016/csl603/csl603.html
All class related material will be accessible from the webpage
Labs will be uploaded incrementally and will be notified through email
Lab submission is only on Moodle
Wikipedia
Machine learning deals with the construction and study of systems that can learn from data, rather than follow only explicitly programmed instructions.
http://www.gartner.com/newsroom/id/3114217
Related Disciplines
Probability and Statistics
Applied Mathematics
Operations Research
Pattern Recognition
Artificial Intelligence
Data Mining
Cognitive Science
Neuroscience
Big Data
General Architecture
Pedro Domingos
Evaluation
How would you like to measure the goodness of what is being learned?
Optimization
Given the evaluation and characterization, find the optimum representation.
General Architecture - Optimization
Combinatorial optimization - e.g., greedy search
Convex optimization - e.g., gradient descent
Constrained optimization - e.g., linear programming
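As a toy illustration of the gradient descent bullet above, here is a minimal sketch that minimizes the convex function f(w) = (w - 3)^2; the learning rate and starting point are illustrative choices, not values from the course.

```python
def gradient_descent(grad, w0, lr=0.1, steps=100):
    """Repeatedly step against the gradient of a differentiable function."""
    w = w0
    for _ in range(steps):
        w = w - lr * grad(w)  # move opposite to the slope
    return w

# f(w) = (w - 3)^2 has gradient f'(w) = 2 * (w - 3) and its minimum at w = 3.
w_star = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)
print(round(w_star, 4))  # → 3.0
```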
Supervised Learning
Classification
The data from this example come from the handwritten ZIP codes on envelopes from U.S. postal mail. Each image is a segment from a five digit ZIP code, isolating a single digit. The images are 16×16 eight-bit grayscale maps, with each pixel ranging in intensity from 0 to 255. Some sample images are shown in Figure 1.2. The images have been normalized to have approximately the same size and orientation. The task is to predict, from the 16×16 matrix of pixel intensities, the identity of each image (0, 1, ..., 9) quickly and accurately. If it is accurate enough, the resulting algorithm would be used as part of an automatic sorting procedure for envelopes. This is a classification problem for which the error rate needs to be kept very low to avoid misdirection of mail.
[Figure: eight ILSVRC-2010 test images]
Supervised Learning
Classification
Regression
https://www.flickr.com/photos/30686429@N07/sets/72157622330082619/
Reminder
If you have decided to credit this course and have not pre-registered
Send me an email at the earliest to add you to the Google group.
Unsupervised Learning
Clustering
Rule Mining
Semi-supervised Learning
Dimensionality Reduction
Reinforcement Learning
Active Learning
Learning algorithm interactively queries an oracle to obtain the desired outputs for new data points
Online Learning
Learning on the fly
Zero-shot learning
Representation Learning
Automatically learning the representation from raw data
Deep Learning
Unsupervised Learning
Clustering
Dimensionality reduction
Temporal models
Hidden Markov model
Overfitting
Things look rosy while training, but fail miserably when testing
Software
Weka (Java)
R (~ Python)
Machine learning open source software
(mloss.org/software)
LibSVM
Supervised Learning
Supervised Learning
Given a set of training examples (x, f(x) = y), for some unknown function f
Estimate a good approximation to f
Example applications
Face recognition
x: raw intensity face image
f(x): name of the person
Loan approval
x: properties of a customer (like age, income, liability, job, ...)
f(x): loan approved or not
Autonomous Steering
x: image of the road ahead
f(x): degrees to turn the steering wheel
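These applications can be made concrete with a tiny hedged sketch: the learner never sees the unknown function f, only labelled pairs, and builds an approximation from them. Here the data and the 1-nearest-neighbour rule are made-up illustrations, loosely mimicking a loan-approval threshold on a single customer feature.

```python
# Hypothetical training pairs (x, y) sampled from an unknown f (here, y = 1 iff x >= 5).
train = [(1, 0), (2, 0), (4, 0), (6, 1), (8, 1), (9, 1)]

def predict(x):
    # 1-nearest-neighbour: copy the label of the closest training example.
    nearest = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest[1]

print(predict(3))  # → 0, close to the negative examples
print(predict(7))  # → 1, close to the positive examples
```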
Representation
Each car is represented by two features (attributes): engine power and price
Training set
Several training examples of already classified cars
Goal
Learn a classifier that accurately classifies new (unseen) cars
Example: Cars
[Figure: scatter plot of training examples; x1: price, x2: engine power]
Definitions (1)
Feature (attribute): x_j
Instance: x = [x_1, x_2, ..., x_d]^T
Instance space:
Space of all possible instances
Class:
Categorical feature of an object
Set of instances of objects in this category
E.g., family car
[Figure: axis-aligned rectangle in the (price, engine power) plane, bounded by p1, p2 and e1, e2; x1: price]
Definitions (2)
Example: (x, y)
Instance along with its class membership
Positive example: member of class (y = 1)
Negative example: not a member of class (y = 0)
Target concept (c)
Correct expression of class
E.g., (e1 ≤ engine power ≤ e2) AND (p1 ≤ price ≤ p2)
Concept class C
Space of all possible target concepts
E.g., axis-aligned rectangles in instance space
E.g., power set of instance space
Definitions (3)
Hypothesis: h: X → {0, 1}
Approximation to target concept
Hypothesis class H:
Space of all possible hypotheses
E.g., axis-aligned rectangles
E.g., axis-aligned ellipses
Learning goal
Find hypothesis h ∈ H that closely approximates target concept c
h is the output classifier
Target concept c may not be in H
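A minimal sketch of one hypothesis from the axis-aligned rectangle class for the two-feature car example; the boundary values p1, p2, e1, e2 are hypothetical, chosen only to illustrate the idea.

```python
def make_rectangle_hypothesis(p1, p2, e1, e2):
    """h(x) = 1 iff the car's price and engine power both fall inside the rectangle."""
    def h(price, power):
        return 1 if (p1 <= price <= p2 and e1 <= power <= e2) else 0
    return h

# Hypothetical boundaries; learning would search the class for the best such rectangle.
h = make_rectangle_hypothesis(p1=10, p2=20, e1=100, e2=200)
print(h(15, 150))  # → 1, inside the rectangle (positive example)
print(h(25, 150))  # → 0, price falls outside [p1, p2]
```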
Definitions (4)
Empirical error
How well h classifies the training set X
E(h|X) = (1/N) Σ_{t=1..N} 1(h(x^t) ≠ y^t)
True error
How well h classifies the entire instance space 𝒳
E(h) = (1/|𝒳|) Σ_{x^t ∈ 𝒳} 1(h(x^t) ≠ y^t)
Generalization error
Version space
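The empirical error is just the fraction of training examples the hypothesis misclassifies; a minimal sketch on made-up data:

```python
def empirical_error(h, data):
    """E(h|X) = (1/N) * sum of 1(h(x) != y) over the N training pairs."""
    mistakes = sum(1 for x, y in data if h(x) != y)
    return mistakes / len(data)

h = lambda x: 1 if x >= 5 else 0                  # a simple threshold hypothesis
data = [(2, 0), (4, 0), (6, 1), (8, 1), (4, 1)]   # last pair is hypothetical label noise
print(empirical_error(h, data))  # → 0.2, one mistake out of five
```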
[Figure: version space in the (price, engine power) plane - hypotheses between the most specific (S) and most general (G) boundaries consistent with the training data, around the target concept C; x1: price]
In general
Model (hypothesis): h(x|θ)
Loss function: E(θ|X) = Σ_t L(y^t, h(x^t|θ))
Optimization procedure: θ* = argmin_θ E(θ|X)
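The three ingredients in miniature, on made-up data: a threshold model h(x|θ), a 0-1 loss summed over the training set, and argmin implemented as a brute-force grid search (a stand-in for a real optimization procedure).

```python
data = [(1, 0), (2, 0), (3, 0), (6, 1), (7, 1)]   # hypothetical labelled examples

def loss(theta):
    # E(theta|X): total 0-1 loss of the threshold model h(x|theta) = 1(x >= theta)
    return sum(1 for x, y in data if (1 if x >= theta else 0) != y)

theta_star = min(range(0, 10), key=loss)  # argmin over a grid of candidate thetas
print(theta_star, loss(theta_star))       # → 4 0: theta = 4 separates the classes
```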
Impact
Overfitting - trying too hard to fit the hypothesis h to the noisy data.
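Overfitting in miniature, under assumed toy data: a lookup table that memorizes the noisy training set gets empirical error zero, yet has nothing sensible to say about unseen points.

```python
train = {1: 0, 2: 0, 4: 1, 6: 1, 8: 1}   # hypothetical data; (4, 1) is label noise
def overfit_h(x):
    return train.get(x, 0)               # pure memorization, default 0 elsewhere

train_error = sum(overfit_h(x) != y for x, y in train.items()) / len(train)
test = [(5, 1), (7, 1), (9, 1)]          # unseen points from the same concept (x >= 5)
test_error = sum(overfit_h(x) != y for x, y in test) / len(test)
print(train_error, test_error)  # → 0.0 1.0: perfect on training, useless on test
```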
Underfitting vs Overfitting
[Figure: two hypotheses h1 and h2 in the (x1, x2) plane, one underfitting and one overfitting the data]
Bias vs Variance
[Figure 1: Bias and variance illustrated with dart-throwing - four panels for the combinations of low/high bias and low/high variance. Domingos, CACM 2012]
Characterization of Hypothesis Space
Is the hypothesis deterministic or stochastic?
Deterministic - training example is either consistent (correctly predicted) or inconsistent (incorrectly predicted)
Stochastic - training example is more or less likely (probabilistic output)
Pedro Domingos
Search procedure
Direct computation - solve for hypothesis directly
Local search - start with an initial hypothesis, make small improvements until a local optimum
Timing
Eager - analyze training data and construct an explicit hypothesis
Online - analyze each training example as it is presented
Batch - collect training examples and analyze them together
Lazy - store the training data and wait until a test data point is presented to construct the hypothesis
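The eager vs lazy distinction can be sketched on assumed toy data: the eager learner does its analysis up front and keeps only an explicit hypothesis, while the lazy learner stores the data and defers all work to query time.

```python
data = [(1, 0), (2, 0), (6, 1), (8, 1)]  # hypothetical one-feature training set

# Eager: analyze now, build an explicit hypothesis (midpoint between class means).
xs0 = [x for x, y in data if y == 0]
xs1 = [x for x, y in data if y == 1]
threshold = (sum(xs0) / len(xs0) + sum(xs1) / len(xs1)) / 2
eager_h = lambda x: 1 if x >= threshold else 0

# Lazy: just store the data; classify by 1-nearest-neighbour when queried.
def lazy_h(x):
    return min(data, key=lambda pair: abs(pair[0] - x))[1]

print(eager_h(5), lazy_h(5))  # → 1 1: both agree on this query point
```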