
CSL465/603 Machine Learning

Fall 2016
Narayanan C Krishnan
ckn@iitrpr.ac.in

Introduction


Administrative Trivia
Course Structure 3-0-2
Lecture Timings

Monday 9.55-10.45am
Tuesday 10.50-11.40am
Wednesday 11.45am-12.35pm

Lab hours

Monday 1.30-4.10pm
Tuesday 1.30-4.10pm

TA

Sanatan Sukhija
sanatan@iitrpr.ac.in

Second TA - TBD


Office Hours

Instructor: Monday afternoon, during the lab hours or by appointment
TA: Monday and Tuesday lab hours

Course google group

csl603f2016@iitrpr.ac.in
Pre-registered students will be automatically added.
Others, please send an email by Friday July 29th.

Pseudonym

Email your 5-character key by July 29th.
Else we will assign a random one for you.


Reference Material
No fixed textbook.
The primary reference books/sources will be announced.

Other reference material
Copies of the reference material are available in the library.



Pre-requisites
Officially: CSL201 (Data Structures)
However, we will be using concepts from

Probability
Statistics
Linear Algebra
Optimization (operations research)

Revision might be helpful


Tentative Course Schedule


Quizzes 30%
Almost every Thursday, 9.00-10.00am
Room: L3
Covers material discussed from the previous quiz till the current week
Duration: 30-45 minutes
Top 6 out of 8 will be considered towards the final grade.

Quiz dates:
Q1 - 4/8
Q2 - 11/8
Q3 - 25/8
Q4 - 1/9
Q5 - 6/10
Q6 - 13/10
Q7 - 27/10
Q8 - 3/11

Additional quizzes will not be conducted.

Labs 30%
Due every third Friday, 11.55pm
Programming assignments
Start early, experiments will take time to run!!!
Individual labs
TA is available for any assistance

Lab due dates:
L1 - 19/8
L2 - 9/9
L3 - 30/9
L4 - 21/10
L5 - 11/11

Students are encouraged to contact the TA for clarifications regarding the labs.

Project 10% - Tentative
If a project is included, the contribution to the overall grade from quizzes will reduce to 20%.
Will be decided after the add and drop period is over.
Teams of 2 students.


Grading Scheme
Tentative Breakup

Quizzes (6 out of 8): 20-30%
Labs (5): 30%
Mid-semester exam: 20%
End-semester exam: 20%
Attendance Bonus: 1%
Attendance is not mandatory; however, attendance will be taken for every class and will count towards the bonus points.

Passing criteria
A student must secure an overall score of 40 (out of 100) and a combined exam score of 60 (out of 200) to pass the course.

Honor Code
Unless explicitly stated otherwise, for all labs:
Strictly individual effort
Group discussions at a high level are encouraged
You are forbidden from trawling the web for answers/code etc.

Any infraction will be dealt with in the severest terms allowed.
I reserve the right to question you with regard to your submission, if I suspect any misconduct.


Course Website
http://cse.iitrpr.ac.in/ckn/courses/f2016/csl603/csl603.html
All class-related material will be accessible from the webpage
Labs will be uploaded incrementally and will be notified through email
Lab submission is only on Moodle

No separate handouts; you are encouraged to take notes during the class.
PDF versions of the lecture slides will be available on the class website.

What is Machine Learning?

Herbert Simon (1970)
Any process by which a system improves its performance

Tom Mitchell (1990)
A computer program that improves its performance at some task through experience

Wikipedia
Deals with the construction and study of systems that can learn from data, rather than follow only explicitly programmed instructions


Why study machine learning?

Artificial Intelligence: design and analysis of intelligent agents
For an agent to exhibit intelligent behavior, it requires knowledge
Explicitly specifying the knowledge needed for specific tasks is hard, and often infeasible
Learning: an automated way to acquire knowledge


Why study machine learning?

http://www.gartner.com/newsroom/id/3114217

Related Disciplines
Probability and Statistics
Applied Mathematics
Operations Research
Pattern Recognition
Artificial Intelligence
Data Mining
Cognitive Science
Neuroscience
Big Data


General Architecture

Pedro Domingos

Hundreds (if not thousands) of machine learning algorithms
Generic architecture has three components (a small sketch follows after this list)
Representation
How would you like to characterize what is being learned?

Evaluation
How would you like to measure the goodness of what is being learned?

Optimization
Given the evaluation and characterization, find the optimum representation.
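
To make these three components concrete, here is a minimal sketch (my own toy example, not from the slides; the data and parameter names are made up) in which the representation is a linear model, the evaluation criterion is the squared error, and the optimization is gradient descent:

```python
# A minimal sketch mapping the three components onto one concrete learner:
# linear regression trained by gradient descent on toy data.
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 1))                 # toy inputs
y = 3.0 * X[:, 0] + 0.5 + rng.normal(0, 0.1, 100)     # toy targets

# Representation: a linear model y_hat = w * x + b
w, b = 0.0, 0.0

# Evaluation: squared error on the training data
def sse(w, b):
    return np.sum((y - (w * X[:, 0] + b)) ** 2)

# Optimization: gradient descent on the evaluation criterion
lr = 0.1
for _ in range(500):
    residual = y - (w * X[:, 0] + b)
    w += lr * 2 * np.mean(residual * X[:, 0])          # gradient step for w
    b += lr * 2 * np.mean(residual)                    # gradient step for b

print(f"learned w={w:.2f}, b={b:.2f}, SSE={sse(w, b):.3f}")
```

Swapping any one of the three components (e.g. a different loss or a different search procedure) gives a different learning algorithm within the same template.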


General Architecture - Representation


Decision Trees
Instances
Bayes Networks
Neural Networks
Support Vector Machines
Ensembles
Gaussian Clusters


General Architecture - Evaluation


Accuracy
Precision and recall
Sum of Squared Error
Likelihood
Posterior Probability
Margin
K-L Divergence
Entropy
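
As a small illustration of a few of these criteria (a sketch on made-up predictions, not material from the course), accuracy, precision, and recall can be computed directly from predicted and true labels:

```python
# Sketch: computing accuracy, precision, and recall for toy binary predictions.
import numpy as np

y_true = np.array([1, 1, 0, 1, 0, 0, 1, 0])   # ground-truth labels (made up)
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])   # classifier outputs (made up)

tp = np.sum((y_pred == 1) & (y_true == 1))    # true positives
fp = np.sum((y_pred == 1) & (y_true == 0))    # false positives
fn = np.sum((y_pred == 0) & (y_true == 1))    # false negatives

accuracy = np.mean(y_pred == y_true)          # 6/8 = 0.75
precision = tp / (tp + fp)                    # 3/4 = 0.75
recall = tp / (tp + fn)                       # 3/4 = 0.75
print(accuracy, precision, recall)
```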


General Architecture - Optimization
Combinatorial optimization
Greedy search

Convex optimization
Gradient descent

Constrained optimization
Linear programming


Learning Paradigms and Applications

Supervised Learning
Classification

[Example figures: handwritten digit recognition (LeCun et al., IEEE 1998) and image classification on ILSVRC-2010 test images (Krizhevsky et al., NIPS 2012). The digit data are 16x16 grayscale images of handwritten ZIP-code digits from U.S. postal envelopes.]

Learning Paradigms and Applications
Supervised Learning
Classification
Regression

[Image source: https://www.flickr.com/photos/30686429@N07/sets/72157622330082619/]

Learning Paradigms and Applications
Supervised Learning
Classification
Regression
Unsupervised Learning
Clustering

[Figure: Wiwie et al., Nature 2015]

Learning Paradigms and Applications
Supervised Learning
Classification
Regression
Unsupervised Learning
Clustering
Rule Mining

Learning Paradigms and Applications
Supervised Learning
Classification
Regression
Unsupervised Learning
Clustering
Rule Mining
Semi-supervised Learning

[Figure: Shah et al., Bioinformatics 2015]

Reminder
If you have decided to credit this course and have not pre-registered
Send me an email at the earliest to add you to the google group.

PG (MS, M.Tech, and PhD) students who are crediting the course, please meet me after today's class.
There is no audit option in the course
You can credit the course, or just attend the lectures

If you have pre-registered and have decided to drop the course
Please do so at the earliest, as it will help us organize the course and the TAs.

Learning Paradigms and Applications
Supervised Learning
Classification
Regression
Unsupervised Learning
Clustering
Rule Mining
Semi-supervised Learning
Dimensionality Reduction

[Figure: Tenenbaum et al., Science 2000]

Learning Paradigms and Applications
Supervised Learning
Classification
Regression
Unsupervised Learning
Clustering
Rule Mining
Semi-supervised Learning
Dimensionality Reduction
Reinforcement Learning

[Figure: Kormushev et al., Robotics 2013]

Other Learning Paradigms


Transfer Learning
Transfer of knowledge between multiple domains

Active Learning
Learning algorithm interactively queries an oracle to
obtain the desired outputs for new data points

Online Learning
Learning on the fly
Zero-shot learning

Representation Learning
Automatically learning the representation from raw data
Deep Learning


Topics to be covered in this course*
Supervised Learning
Decision trees, Naïve Bayes classifier, Instance-based learning (k-NN), Linear and Logistic regression, Artificial neural networks, Kernel methods, Ensembles.

Unsupervised Learning
Clustering

Dimensionality reduction
Temporal models
Hidden Markov model

Design and Analysis of Experiments


*Tentative

Machine Learning in Practice

Pedro Domingos

Understanding the domain, prior knowledge, and goals
Data collection, integration, selection, cleaning, preprocessing
Learning models
Interpreting results
Consolidating and deploying discovered knowledge
Loop...


Machine Learning Challenges

Curse of Dimensionality
Intuition fails in high dimensional spaces

Overfitting
Things look rosy during training, but fail miserably during testing

Sample size (number of examples)
Often obtaining good examples is a hard, cumbersome, and error-prone process

What algorithm to choose?
No clear answer on what approach to select from the different options.

Too many knobs (hyper-parameters) to turn
Requires carefully conducted experiments that search through the hyper-parameter space for the optimal setting

Machine Learning Resources


Data Repositories
UCI ML repository
Challenges
Kaggle, KDD cup,

Software
Weka (Java)
R (~ Python)
Machine learning open source software
(mloss.org/software)
LibSVM

Conferences and Journals


ICDM, ICML, KDD, IJCAI, AAAI, UAI, AISTATS, COLT, ...
ACM TKDD, IEEE TKDE, JMLR, MLJ, ...

Supervised Learning


Supervised Learning
Given a set of training examples $(\mathbf{x}, y = f(\mathbf{x}))$ for some unknown function $f$
Estimate a good approximation to $f$

Example applications
Face recognition
x: raw intensity face image
f(x): name of the person.

Loan approval
x: properties of a customer (like age, income, liability, job, ...)
f(x): loan approved or not.

Autonomous Steering
x: image of the road ahead
f(x): degrees to turn the steering wheel.

Example: Family Car

Learning Task
Learn to classify cars into one of two classes - family car or otherwise

Representation
Each car is represented by two features (attributes) - engine power and price

Training set
Several training examples of already classified cars

Goal
Learn a classifier that accurately classifies (new, unseen) cars

Example: Cars

[Scatter plot of cars in the feature space; axes: x1: Price, x2: Engine power; a training example t is the point (x1^t, x2^t).]

Definitions (1)
Feature (attribute)
A property of the object to be classified
Discrete or continuous
E.g., engine power, price

Instance: $\mathbf{x} = [x_1, x_2, \ldots, x_d]$
The feature values for a specific object
E.g., engine power = 100, price = high

Instance space
Space of all possible instances

Class
Categorical feature of an object
Set of instances of objects in this category
E.g., family car

Example: Family Car

[Plot: the family-car class shown as an axis-aligned rectangle in the feature space, bounded by e1 and e2 on the x2: Engine power axis and by p1 and p2 on the x1: Price axis.]

Definitions (2)
Example: (x, y)
Instance along with its class membership
Positive example: member of the class (y = 1)
Negative example: not a member of the class (y = 0)

Training set: $X = \{(\mathbf{x}^t, y^t)\}_{t=1}^{N}$
Set of N examples

Target concept (C)
Correct expression of the class
E.g., ($e_1 \leq$ engine power $\leq e_2$) AND ($p_1 \leq$ price $\leq p_2$)

Concept class
Space of all possible target concepts
E.g., axis-aligned rectangles in instance space
E.g., power set of instance space
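
To make the rectangle target concept concrete, here is a hypothetical sketch (the bounds p1, p2, e1, e2 are made up, not from the slides) of the concept as a membership test on the two features:

```python
# Sketch: the family-car concept as an axis-aligned rectangle in
# (price, engine_power) space. All bounds below are hypothetical.
p1, p2 = 10_000, 25_000      # price range for a family car (made up)
e1, e2 = 60, 150             # engine-power range for a family car (made up)

def concept(price, engine_power):
    """Return 1 if the car belongs to the family-car class, else 0."""
    return int(p1 <= price <= p2 and e1 <= engine_power <= e2)

print(concept(18_000, 100))  # 1: inside the rectangle -> positive example
print(concept(60_000, 300))  # 0: outside the rectangle -> negative example
```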

Definitions (3)
Hypothesis: $h(\mathbf{x}) \in \{0, 1\}$
Approximation to the target concept

Hypothesis class
Space of all possible hypotheses
E.g., axis-aligned rectangles
E.g., axis-aligned ellipses

Learning goal
Find a hypothesis h that closely approximates the target concept
h is the output classifier
The target concept may not be in the hypothesis class

Example: Hypothesis Error


Definitions (4)
Empirical error
How well h classifies the training set X
$E(h \mid X) = \frac{1}{N}\sum_{t=1}^{N} \mathbb{1}\left[h(\mathbf{x}^t) \neq y^t\right]$

Generalization error
How well h classifies instances not in X

True error
How well h classifies the entire instance space
$E(h) = \frac{1}{|\mathcal{X}|}\sum_{\mathbf{x} \in \mathcal{X}} \mathbb{1}\left[h(\mathbf{x}) \neq C(\mathbf{x})\right]$

Most specific hypothesis S
Consistent hypothesis covering the fewest instances

Most general hypothesis G
Consistent hypothesis covering the most instances

Version space
All hypotheses between S and G
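
The definitions above can be illustrated with a small sketch (my own toy data, not from the slides) that computes the empirical error of a rectangle hypothesis and constructs the most specific hypothesis S, i.e. the tightest axis-aligned rectangle around the positive examples:

```python
# Sketch: empirical error of a rectangle hypothesis, and the most specific
# hypothesis S for the family-car example (toy data, made up).
import numpy as np

# Toy training set: columns are (price, engine_power); label 1 = family car.
X = np.array([[15000, 90], [20000, 120], [12000, 80], [40000, 300], [8000, 40]])
y = np.array([1, 1, 1, 0, 0])

def rect_hypothesis(bounds):
    """Return h(x) for an axis-aligned rectangle given (p1, p2, e1, e2)."""
    p1, p2, e1, e2 = bounds
    return lambda x: int(p1 <= x[0] <= p2 and e1 <= x[1] <= e2)

def empirical_error(h, X, y):
    # E(h | X) = (1/N) * sum of 1[h(x_t) != y_t]
    return np.mean([h(x) != t for x, t in zip(X, y)])

# Most specific hypothesis S: tightest rectangle covering all positives.
pos = X[y == 1]
S = (pos[:, 0].min(), pos[:, 0].max(), pos[:, 1].min(), pos[:, 1].max())
h_S = rect_hypothesis(S)

print("S =", S)   # price in [12000, 20000], engine power in [80, 120]
print("empirical error of S:", empirical_error(h_S, X, y))  # 0.0 on this toy set
```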


Example: Version Space

[Plot: nested axis-aligned rectangles in the (x1: Price, x2: Engine power) plane: the most specific hypothesis S, the target concept C, and the most general hypothesis G.]

Thinking of Supervised Learning


Learning is the removal of our remaining uncertainty
Suppose we know that the concept is a rectangle; we can then use the training data to infer the correct rectangle.

In general
Model (hypothesis): $h(\mathbf{x})$
Loss function: $L(h \mid X) = \sum_{t=1}^{N} \ell\left(y^t, h(\mathbf{x}^t)\right)$
Optimization procedure: $h^* = \arg\min_{h \in \mathcal{H}} L(h \mid X)$
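
As a hedged sketch of this template (a toy example of my own, not from the slides): choose a model family, define a loss over the training set, and return the hypothesis that minimizes it. Here the hypothesis class is a discrete grid of price thresholds, so the argmin reduces to a simple combinatorial search:

```python
# Sketch: model / loss / optimization on a toy problem. The hypothesis class
# is "price <= theta => family car" for theta on a small grid (hypothetical).
import numpy as np

prices = np.array([8000, 12000, 15000, 20000, 40000])
labels = np.array([0, 1, 1, 1, 0])          # toy labels (made up)

def h(theta):
    # Model: a threshold hypothesis on the price feature.
    return lambda x: int(x <= theta)

def loss(theta):
    # Loss: number of misclassified training examples (0-1 loss).
    return sum(h(theta)(x) != y for x, y in zip(prices, labels))

# Optimization: argmin over a discrete grid of candidate thresholds.
grid = np.arange(5000, 50000, 1000)
best_theta = min(grid, key=loss)
print(best_theta, loss(best_theta))   # this class cannot reach zero loss here
```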


Learning under noisy conditions


Sources for noise
Incorrect feature values
Incorrect class labels
Hidden or latent features (missing)

Impact
Overfitting: trying too hard to fit the hypothesis h to the noisy data.
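
A brief sketch of this effect (my own toy example, not from the slides): fitting polynomials of increasing degree to a handful of noisy points drives the training error toward zero while the error on clean held-out points grows:

```python
# Sketch: overfitting noisy data with a high-degree polynomial (toy example).
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, 10)  # noisy labels
x_test = np.linspace(0, 1, 100)
y_test = np.sin(2 * np.pi * x_test)                             # noise-free truth

for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)   # fit polynomial of this degree
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE {train_err:.3f}, test MSE {test_err:.3f}")
# Typically the degree-9 fit has near-zero training error but a larger test error.
```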


Underfitting vs Overfitting

[Plot in the (x1, x2) feature plane comparing two hypotheses, h1 and h2, of different complexity: one underfits the data while the other overfits it.]

Bias vs Variance

[Figure: bias and variance illustrated as dart-throwing, with panels for the combinations of low/high bias and low/high variance. Source: Domingos, CACM 2012.]

Characterization of Hypothesis Space
Is the hypothesis deterministic or stochastic?
Deterministic - training example is either consistent (correctly predicted) or inconsistent (incorrectly predicted)
Stochastic - training example is more or less likely (probabilistic output)

Parametrization: discrete or continuous? (or mixed)
Discrete space - perform combinatorial search
Continuous space - perform numerical search


Framework for Learning Algorithms

Pedro Domingos

Search procedure
Direct computation - solve for the hypothesis directly
Local search - start with an initial hypothesis, make small improvements until a local optimum

Timing
Eager - analyze the training data and construct an explicit hypothesis
Online - analyze each training example as it is presented
Batch - collect training examples and analyze them together
Lazy - store the training data and wait until a test data point is presented to construct the hypothesis (see the sketch below)
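
As a small illustration of the lazy strategy (a sketch with made-up data, not from the slides), a 1-nearest-neighbour classifier does no work at training time; it only stores the data and defers all computation to prediction time:

```python
# Sketch: a lazy learner (1-nearest neighbour) - "training" just stores the data.
import numpy as np

class OneNN:
    def fit(self, X, y):
        # Lazy: no hypothesis is constructed here, the data are only stored.
        self.X = np.asarray(X, dtype=float)
        self.y = np.asarray(y)
        return self

    def predict(self, x):
        # All work happens at query time: find the closest stored example.
        dists = np.linalg.norm(self.X - np.asarray(x, dtype=float), axis=1)
        return self.y[np.argmin(dists)]

# Toy (price, engine power) data; label 1 = family car (made up).
clf = OneNN().fit([[15000, 90], [20000, 120], [40000, 300]], [1, 1, 0])
print(clf.predict([18000, 100]))   # nearest stored car is a family car -> 1
```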

