Le Song
Machine Learning I
CSE 6740, Fall 2013
What is Machine Learning (ML)?
Study of algorithms that improve their performance at some
task with experience
Common to industrial-scale problems
6 billion photos
Organizing Images
Image databases
Organizing documents
Reading, digesting, and categorizing a vast text database is too much for a human!
Predict numeric values
Example weather forecast: 40 °F, wind NE at 14 km/h, humidity 83%
Face Detection
Understanding brain activity
Product Recommendation
Handwritten digit recognition / text annotation
Inter-character dependency
Inter-word dependency
Example: "Aoccdrnig to a sudty at Cmabrigde Uinervtisy, it deosnt mttaer in waht oredr the ltteers in a wrod are, the olny iprmoetnt tihng is taht the frist and lsat ltteer be at the rghit pclae. The rset can be a ttoal mses and you can sitll raed it wouthit a porbelm. Tihs is bcuseae the huamn mnid deos not raed ervey lteter by istlef, but the wrod as a wlohe."
What are the inputs (data)?
What are the desired outcomes?
Audio signals
DNA sequence data:
cacatcgctgcgtttcggcagctaattgccttttagaaattattttcccatttcgagaaactcgtgtgggatgccggatgcggctttcaatcacttctggcccgggatcggattgggtcacattgtctgcgggctctattgtctcgatccgc
ggcgcagttcgcgtgcttagcggtcagaaaggcagagattcggttcggattgatgcgctggcagcagggcacaaagatctaatgactggcaaatcgctacaaataaattaaagtccggcggctaattaatgagcggactgaagccactttgg
attaaccaaaaaacagcagataaacaaaaacggcaaagaaaattgccacagagttgtcacgctttgttgcacaaacatttgtgcagaaaagtgaaaagcttttagccattattaagtttttcctcagctcgctggcagcacttgcgaatgta
Similar problem: webpage classification
Company homepage vs. University homepage
Robot Control
Now cars can find their own way!
[Figure: linear SVM decision boundaries vs. nonlinear decision boundaries]
Nonconventional clusters
Syllabus
Covers the most commonly used machine learning algorithms, with sufficient detail on their mechanisms.
Organization
Unsupervised learning (data exploration)
Clustering, dimensionality reduction, density estimation, novelty detection
Prerequisites
Probabilities
Distributions, densities, marginalization, conditioning
Basic statistics
Moments, classification, regression, maximum likelihood estimation
Algorithms
Dynamic programming, basic data structures, complexity
Programming
Mostly your choice of language, but Matlab will be very useful
The class will be fast-paced
Ability to deal with abstract mathematical concepts is required
Textbooks
Pattern Recognition and Machine Learning, Chris Bishop
The Elements of Statistical Learning: Data Mining, Inference, and
Prediction, Trevor Hastie, Robert Tibshirani, Jerome Friedman
Machine Learning, Tom Mitchell
Grading
6 assignments (60%)
Approximately 1 assignment every 4 lectures
Start early
Homeworks
Zero credit after each deadline
Collaboration
You may discuss the questions
Each student writes their own answers
Write on your homework the names of anyone with whom you collaborated
Each student must write their own code for the programming part
Staff
Instructor: Le Song, Klaus 1340
More information:
http://www.cc.gatech.edu/~lsong/teaching/CSE6740fall13.html
Today
Probabilities
Independence
Conditional Independence
Random Variables (RV)
Data may contain many different attributes
Age, grade, color, location, coordinate, time
For a continuous random variable X with density p(x): ∫ p(x) dx = 1 and p(x) ≥ 0
Shorthand: P(x) for P(X = x)
Interpretations of probability
Frequentist interpretation
P(A) is the frequency of A in the limit of infinitely many repeated trials
Many arguments against this interpretation
What is the frequency of the event "it will rain tomorrow"?
Subjective interpretation
P(A) is my degree of belief that A will happen
What does "degree of belief" mean?
If P(A) = 0.8, then I am willing to bet on A at the corresponding (4-to-1) odds
Conditional probability
After we have seen Y = y, how do we feel about X = x happening?
P(x | y) is shorthand for P(X = x | Y = y)
Definition: P(x | y) = P(x, y) / P(y)
Two of the most important rules: I. The chain rule
P(x, y) = P(x) P(y | x)
More generally:
P(x1, x2, …, xk) = P(x1) P(x2 | x1) ⋯ P(xk | x1, x2, …, x(k−1))
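As a sanity check, the chain rule factorization can be verified numerically. A minimal Python sketch (the joint table below is invented for illustration):

```python
import numpy as np

# Hypothetical joint distribution P(X, Y) over binary X (rows) and Y (columns).
P_xy = np.array([[0.1, 0.3],
                 [0.2, 0.4]])

P_x = P_xy.sum(axis=1)             # marginal P(x)
P_y_given_x = P_xy / P_x[:, None]  # conditional P(y|x)

# Chain rule: P(x, y) = P(x) * P(y|x)
reconstructed = P_x[:, None] * P_y_given_x
print(np.allclose(reconstructed, P_xy))  # True
```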
Two of the most important rules: II. Bayes rule
P(x | y) = P(x, y) / P(y) = P(y | x) P(x) / Σ over x′ ∈ Val(X) of P(x′, y)
Here P(y | x) is the likelihood and P(x) is the prior.
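A classic numeric illustration of Bayes' rule applied to a binary test (all numbers below are hypothetical):

```python
# Hypothetical disease-test example: x = has disease, y = test is positive.
p_disease = 0.01                  # prior P(x)
p_pos_given_disease = 0.95        # likelihood P(y|x)
p_pos_given_healthy = 0.05

# Evidence: P(y) = sum over Val(X) of P(y|x) P(x)
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Posterior: P(x|y) = P(y|x) P(x) / P(y)
posterior = p_pos_given_disease * p_disease / p_pos
print(round(posterior, 3))  # 0.161
```

Note how a rare prior keeps the posterior low even with an accurate test.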
Independence
X and Y are independent if P(x | y) = P(x) for all x, y
Equivalently, P(x, y) = P(x) P(y)
Conditional independence
Independence is rarely true; conditional independence is more prevalent
X and Y are conditionally independent given Z if and only if P(x, y | z) = P(x | z) P(y | z)
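A small sketch showing that a joint built as P(z) P(x|z) P(y|z) indeed satisfies this factorization (all tables below are made-up illustrative numbers):

```python
import numpy as np

# Made-up conditional tables over binary Z, X, Y.
P_z = np.array([0.4, 0.6])
P_x_given_z = np.array([[0.2, 0.8],   # row z: P(x|z)
                        [0.7, 0.3]])
P_y_given_z = np.array([[0.5, 0.5],   # row z: P(y|z)
                        [0.1, 0.9]])

# Joint P(x, y, z) = P(z) P(x|z) P(y|z), indexed [z, x, y].
P_xyz = P_z[:, None, None] * P_x_given_z[:, :, None] * P_y_given_z[:, None, :]

# Verify P(x, y | z) factorizes as P(x|z) P(y|z) for each z.
P_xy_given_z = P_xyz / P_xyz.sum(axis=(1, 2), keepdims=True)
factorized = P_x_given_z[:, :, None] * P_y_given_z[:, None, :]
print(np.allclose(P_xy_given_z, factorized))  # True
```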
Joint distribution, marginalization
Two random variables: Grade (G) and Intelligence (I)
P(G, I):
          I = VH   I = H
  G = A    0.7     0.1
  G = B    0.15    0.05
For binary variables, the table (multiway array) gets really big:
P(X1, X2, …, Xn) has 2^n entries!
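Using the table above, the marginals come from summing rows or columns. A minimal NumPy sketch:

```python
import numpy as np

# Joint table P(G, I) from the slide: rows G in {A, B}, columns I in {VH, H}.
P_GI = np.array([[0.7,  0.1],
                 [0.15, 0.05]])

P_G = P_GI.sum(axis=1)  # marginalize out I -> P(G) = [P(A), P(B)]
P_I = P_GI.sum(axis=0)  # marginalize out G -> P(I) = [P(VH), P(H)]
print(P_G, P_I)  # marginals: P(G) ≈ [0.8, 0.2], P(I) ≈ [0.85, 0.15]
```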
Marginalization: the general case
Compute the marginal distribution of a subset of variables from P(x1, …, xi, x(i+1), …, xn):
P(x1, …, xi) = Σ over x(i+1), …, xn of P(x1, …, xi, x(i+1), …, xn)
Similarly, P(xi, …, xn) = Σ over x1, …, x(i−1) of P(x1, …, xn)
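In code, the general case is just a sum over the axes of the unwanted variables. An illustrative sketch with a randomly generated joint:

```python
import numpy as np

rng = np.random.default_rng(0)

# A random joint P(x1, x2, x3) over three ternary variables (for illustration).
P = rng.random((3, 3, 3))
P /= P.sum()  # normalize so the table is a valid distribution

# Marginalize out x2 and x3 to get P(x1): sum over the trailing axes.
P_x1 = P.sum(axis=(1, 2))
print(np.isclose(P_x1.sum(), 1.0))  # True
```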
Example problem
Estimate the probability θ that a biased coin lands heads
Model: Bernoulli, P(x | θ) = θ^x (1 − θ)^(1 − x), i.e.
P(x | θ) = 1 − θ for x = 0, and θ for x = 1
Frequentist Parameter Estimation
Frequentists think of a parameter as a fixed, unknown
constant, not a random variable
MLE for Biased Coin
Objective function: the log likelihood
L(θ; D) = log P(D | θ) = log [θ^#heads (1 − θ)^#tails] = #heads log θ + #tails log(1 − θ)
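A quick sketch verifying that the closed-form MLE, the sample frequency of heads, maximizes this log likelihood (the coin-flip data below is made up):

```python
import numpy as np

# Coin flips: 1 = heads, 0 = tails (made-up sample with 7 heads, 3 tails).
flips = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 1])
n_heads = flips.sum()
n_tails = len(flips) - n_heads

# Log likelihood: #heads * log(theta) + #tails * log(1 - theta)
def log_lik(theta):
    return n_heads * np.log(theta) + n_tails * np.log(1 - theta)

# Closed-form MLE is the sample frequency of heads.
theta_mle = n_heads / len(flips)

# Numerical check against a grid search over (0, 1).
grid = np.linspace(0.01, 0.99, 99)
theta_grid = grid[np.argmax(log_lik(grid))]
print(theta_mle, theta_grid)  # both ≈ 0.7
```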
Bayesian Parameter Estimation
Bayesians treat the unknown parameter θ as a random variable whose distribution can be inferred using Bayes' rule:
P(θ | D) = P(D | θ) P(θ) / P(D)
For iid data, the likelihood is
P(D | θ) = ∏(i=1..n) P(xi | θ) = ∏(i=1..n) θ^xi (1 − θ)^(1 − xi) = θ^#heads (1 − θ)^#tails
Posterior distribution of θ:
P(θ | x1, …, xn) = P(x1, …, xn | θ) P(θ) / P(x1, …, xn)
With a Beta(α, β) prior P(θ) ∝ θ^(α − 1) (1 − θ)^(β − 1):
P(θ | x1, …, xn) ∝ θ^#heads (1 − θ)^#tails · θ^(α − 1) (1 − θ)^(β − 1) = θ^(#heads + α − 1) (1 − θ)^(#tails + β − 1)
α and β are hyperparameters and correspond to the number of virtual heads and tails (pseudo counts)
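A tiny sketch of the Beta-Bernoulli update with pseudo counts (the prior and data values below are invented for illustration):

```python
# Beta(alpha, beta) prior + Bernoulli likelihood -> Beta posterior.
alpha, beta_ = 2, 2          # prior: 2 virtual heads, 2 virtual tails
n_heads, n_tails = 7, 3      # observed data (made up)

# Posterior is Beta(alpha + #heads, beta + #tails).
post_a = alpha + n_heads
post_b = beta_ + n_tails

# Posterior mean pulls the MLE (0.7) toward the prior mean (0.5).
posterior_mean = post_a / (post_a + post_b)
print(posterior_mean)  # 9/14 ≈ 0.643
```

With more data, the pseudo counts matter less and the posterior mean approaches the MLE.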
Bayesian Estimation for Bernoulli
Posterior distribution
P(θ | x1, …, xn) = P(x1, …, xn | θ) P(θ) / P(x1, …, xn) ∝ θ^(#heads + α − 1) (1 − θ)^(#tails + β − 1)
Prior strength: A = α + β
A can be interpreted as the size of an imaginary dataset (α virtual heads, β virtual tails)