
Introduction

Le Song

Machine Learning I
CSE 6740, Fall 2013
What is Machine Learning (ML)?
The study of algorithms that improve their performance at some task with experience.
Common to industrial-scale problems

13 million Wikipedia pages
800 million users
6 billion photos
24 hours of video uploaded per minute
> 1 trillion webpages
Organizing Images

Image databases

What are the desired outcomes?
What are the inputs (data)?
What are the learning paradigms?
Visualize Image Relations
Each image has thousands or millions of pixels.

What are the desired outcomes?
What are the inputs (data)?
What are the learning paradigms?
Organizing documents
Reading, digesting, and categorizing a vast text database is too much for a human!

What are the desired outcomes?
What are the inputs (data)?
What are the learning paradigms?
Weather Prediction

Predict numeric values, e.g., temperature: 40 °F, wind: NE at 14 km/h, humidity: 83%.

What are the desired outcomes?
What are the inputs (data)?
What are the learning paradigms?
Face Detection

What are the desired outcomes?
What are the inputs (data)?
What are the learning paradigms?
Understanding brain activity

What are the desired outcomes?
What are the inputs (data)?
What are the learning paradigms?
Product Recommendation

What are the desired outcomes?
What are the inputs (data)?
What are the learning paradigms?
Handwritten digit recognition/text annotation
Inter-character and inter-word dependency:

"Aoccdrnig to a sudty at Cmabrigde Uinervtisy, it deosnt mttaer in waht oredr the ltteers in a wrod are, the olny iprmoetnt tihng is taht the frist and lsat ltteer be at the rghit pclae. The rset can be a ttoal mses and you can sitll raed it wouthit a porbelm. Tihs is bcuseae the huamn mnid deos not raed ervey lteter by istlef, but the wrod as a wlohe."

What are the desired outcomes?
What are the inputs (data)?
What are the learning paradigms?
Similar problem: speech recognition

Audio signals → text
Models: hidden Markov models
Machine learning is the preferred method for speech recognition
Similar problem: bioinformatics
Where is the gene?

cacatcgctgcgtttcggcagctaattgccttttagaaattattttcccatttcgagaaactcgtgtgggatgccggatgcggctttcaatcacttctggcccgggatcggattgggtcacattgtctgcgggctctattgtctcgatccgc
ggcgcagttcgcgtgcttagcggtcagaaaggcagagattcggttcggattgatgcgctggcagcagggcacaaagatctaatgactggcaaatcgctacaaataaattaaagtccggcggctaattaatgagcggactgaagccactttgg
attaaccaaaaaacagcagataaacaaaaacggcaaagaaaattgccacagagttgtcacgctttgttgcacaaacatttgtgcagaaaagtgaaaagcttttagccattattaagtttttcctcagctcgctggcagcacttgcgaatgta
ctgatgttcctcataaatgaaaattaatgtttgctctacgctccaccgaactcgcttgtttgggggattggctggctaatcgcggctagatcccaggcggtataaccttttcgcttcatcagttgtgaaaccagatggctggtgttttggca
cagcggactcccctcgaacgctctcgaaatcaagtggctttccagccggcccgctgggccgctcgcccactggaccggtattcccaggccaggccacactgtaccgcaccgcataatcctcgccagactcggcgctgataaggcccaatgtc
actccgcaggcgtctatttatgccaaggaccgttcttcttcagctttcggctcgagtatttgttgtgccatgttggttacgatgccaatcgcggtacagttatgcaaatgagcagcgaataccgctcactgacaatgaacggcgtcttgtca
tattcatgctgacattcatattcattcctttggttttttgtcttcgacggactgaaaagtgcggagagaaacccaaaaacagaagcgcgcaaagcgccgttaatatgcgaactcagcgaactcattgaagttatcacaacaccatatccata
catatccatatcaatatcaatatcgctattattaacgatcatgctctgctgatcaagtattcagcgctgcgctagattcgacagattgaatcgagctcaatagactcaacagactccactcgacagatgcgcaatgccaaggacaattgccg
tggagtaaacgaggcgtatgcgcaacctgcacctggcggacgcggcgtatgcgcaatgtgcaattcgcttaccttctcgttgcgggtcaggaactcccagatgggaatggccgatgacgagctgatctgaatgtggaaggcgcccagcaggc
aagattactttcgccgcagtcgtcatggtgtcgttgctgcttttatgttgcgtactccgcactacacggagagttcaggggattcgtgctccgtgatctgtgatccgtgttccgtgggtcaattgcacggttcggttgtgtaaccttcgtgt
tctttttttttagggcccaataaaagcgcttttgtggcggcttgatagattatcacttggtttcggtggctagccaagtggctttcttctgtccgacgcacttaattgaattaaccaaacaacgagcgtggccaattcgtattatcgctgtt
tacgtgtgtctcagcttgaaacgcaaaagcttgtttcacacatcggtttctcggcaagatgggggagtcagtcggtctagggagaggggcgcccaccagtcgatcacgaaaacggcgaattccaagcgaaacggaaacggagcgagcactat
agtactatgtcgaacaaccgatcgcggcgatgtcagtgagtcgtcttcggacagcgctggcgctccacacgtatttaagctctgagatcggctttgggagagcgcagagagcgccatcgcacggcagagcgaaagcggcagtgagcgaaagc
gagcggcagcgggtgggggatcgggagccccccgaaaaaaacagaggcgcacgtcgatgccatcggggaattggaacctcaatgtgtgggaatgtttaaatattctgtgttaggtagtgtagtttcatagactatagattctcatacagatt
gagtccttcgagccgattatacacgacagcaaaatatttcagtcgcgcttgggcaaaaggcttaagcacgactcccagtccccccttacatttgtcttcctaagcccctggagccactatcaaacttgttctacgcttgcactgaaaataga
accaaagtaaacaatcaaaaagaccaaaaacaataacaaccagcaccgagtcgaacatcagtgaggcattgcaaaaatttcaaagtcaagtttgcgtcgtcatcgcgtctgagtccgatcaagccgggcttgtaattgaagttgttgatgag
ttactggattgtggcgaattctggtcagcatacttaacagcagcccgctaattaagcaaaataaacatatcaaattccagaatgcgacggcgccatcatcctgtttgggaattcaattcgcgggcagatcgtttaattcaattaaaaggtag
aaaagggagcagaagaatgcgatcgctggaatttcctaacatcacggaccccataaatttgataagcccgagctcgctgcgttgagtcagccaccccacatccccaaatccccgccaaaagaagacagctgggttgttgactcgccagattg
attgcagtggagtggacctggtcaaagaagcaccgttaatgtgctgattccattcgattccatccgggaatgcgataaagaaaggctctgatccaagcaactgcaatccggatttcgattttctctttccatttggttttgtatttacgtac
aagcattctaatgaagacttggagaagacttacgttatattcagaccatcgtgcgatagaggatgagtcatttccatatggccgaaatttattatgtttactatcgtttttagaggtgttttttggacttaccaaaagaggcatttgttttc
ttcaactgaaaagatatttaaattttttcttggaccattttcaaggttccggatatatttgaaacacactagctagcagtgttggtaagttacatgtatttctataatgtcatattcctttgtccgtattcaaatcgaatactccacatctc
ttgtacttgaggaattggcgatcgtagcgatttcccccgccgtaaagttcctgatcctcgttgtttttgtacatcataaagtccggattctgctcgtcgccgaagatgggaacgaagctgccaaagctgagagtctgcttgaggtgctggtc
gtcccagctggataaccttgctgtacagatcggcatctgcctggagggcacgatcgaaatccttccagtggacgaacttcacctgctcgctgggaatagcgttgttgtcaagcagctcaaggagcgtattcgagttgacgggctgcaccacg
ctgctccttcgctggggattcccctgcgggtaagcgccgcttgcttggactcgtttccaaatcccatagccacgccagcagaggagtaacagagctcwhereisthegenetgattaaaaatatcctttaagaaagcccatgggtataactt
actgcgtcctatgcgaggaatggtctttaggttctttatggcaaagttctcgcctcgcttgcccagccgcggtacgttcttggtgatctttaggaagaatcctggactactgtcgtctgcctggcttatggccacaagacccaccaagagcg
aggactgttatgattctcatgctgatgcgactgaagcttcacctgactcctgctccacaattggtggcctttatatagcgagatccacccgcatcttgcgtggaatagaaatgcgggtgactccaggaattagcattatcgatcggaaagtg
ataaaactgaactaacctgacctaaatgcctggccataattaagtgcatacatacacattacattacttacatttgtataagaactaaattttatagtacataccacttgcgtatgtaaatgcttgtcttttctcttatatacgttttataa
cccagcatattttacgtaaaaacaaaacggtaatgcgaacataacttatttattggggcccggaccgcaaaccggccaaacgcgtttgcacccataaaaacataagggcaacaaaaaaattgttaagctgttgtttatttttgcaatcgaaa
cgctcaaatagctgcgatcactcgggagcagggtaaagtcgcctcgaaacaggaagctgaagcatcttctataaatacactcaaagcgatcattccgaggcgagtctggttagaaatttacatggactgcaaaaaggtatagccccacaaac
Spam Filtering

What are the desired outcomes?
What are the inputs (data)?
What are the learning paradigms?
Similar problem: webpage classification
Company homepage vs. University homepage

Robot Control
Now cars can find their own way!

What are the desired outcomes?
What are the inputs (data)?
What are the learning paradigms?
Nonlinear classifier

Linear SVM decision boundaries vs. nonlinear decision boundaries
Nonconventional clusters

Need more advanced methods, such as kernel methods or spectral clustering.
Syllabus
We cover the most commonly used machine learning algorithms, in sufficient detail to understand their mechanisms.

Organization:
Unsupervised learning (data exploration)
  Clustering, dimensionality reduction, density estimation, novelty detection
Supervised learning (predictive models)
  Classification, regression
Complex models (dealing with nonlinearity, combining models, etc.)
  Kernel methods, graphical models, boosting
Prerequisites
Probabilities
  Distributions, densities, marginalization, conditioning
Basic statistics
  Moments, classification, regression, maximum likelihood estimation
Algorithms
  Dynamic programming, basic data structures, complexity
Programming
  Mostly your choice of language, but Matlab will be very useful
The class will be fast-paced
Ability to deal with abstract mathematical concepts is expected
Textbooks
Pattern Recognition and Machine Learning, Chris Bishop
The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Trevor Hastie, Robert Tibshirani, Jerome Friedman
Machine Learning, Tom Mitchell
Grading
6 assignments (60%)
  Approximately 1 assignment every 4 lectures
  Start early
Midterm exam (20%)
Final exam (20%)
Project for advanced students
  Can be used to replace the exams
  Based on student experience and interests
Homeworks
Zero credit after the deadline
All homeworks must be handed in, even for zero credit

Collaboration:
  You may discuss the questions
  Each student writes their own answers
  Write on your homework anyone with whom you collaborate
  Each student must write their own code for the programming part
Staff
Instructor: Le Song, Klaus 1340

TAs: Joonseok (Klaus 1305) and Nan Du (Klaus 1305)

Guest Lecturer: TBD

Administrative Assistant: Mimi Haley, Klaus 1321

Mailing list: mlcda2013@gmail.com

More information:
http://www.cc.gatech.edu/~lsong/teaching/CSE6740fall13.html
Today
Probabilities
Independence
Conditional Independence

Random Variables (RV)
Data may contain many different attributes
  Age, grade, color, location, coordinate, time

Upper-case for random variables (e.g., $X$, $Y$); lower-case for values (e.g., $x$, $y$)

$P(\cdot)$ for a distribution, $p(\cdot)$ for a density

$Val(X)$ = possible values of random variable $X$
  For discrete (categorical) $X$: $\sum_{i=1}^{|Val(X)|} P(X = x_i) = 1$
  For continuous $X$: $\int p(x)\,dx = 1$ and $p(x) \ge 0$

Shorthand: $P(x)$ for $P(X = x)$
Interpretations of probability
Frequentist interpretation
  $P(\alpha)$ is the frequency of $\alpha$ in the limit
  Many arguments against this interpretation: what is the frequency of the event "it will rain tomorrow"?

Subjective interpretation
  $P(\alpha)$ is my degree of belief that $\alpha$ will happen
  What does "degree of belief" mean? If $P(\alpha) = 0.8$, then I am willing to bet

For this class, we don't care about the type of interpretation.
Conditional probability
After we have seen $Y = y$, how do we feel $X = x$ will happen?

$P(x \mid y)$ means $P(X = x \mid Y = y)$

A conditional distribution is a family of distributions: for each value $Y = y$, $P(X \mid y)$ is a distribution over $X$
Two of the most important rules: I. The chain rule

$P(X, Y) = P(X)\,P(Y \mid X)$

More generally:
$P(X_1, X_2, \ldots, X_n) = P(X_1)\,P(X_2 \mid X_1) \cdots P(X_n \mid X_1, X_2, \ldots, X_{n-1})$
Two of the most important rules: II. Bayes rule
$P(y \mid x) = \dfrac{P(x \mid y)\,P(y)}{P(x)} = \dfrac{P(x, y)}{\sum_{y' \in Val(Y)} P(x, y')}$

posterior = likelihood × prior / normalization constant

More generally, with an additional variable $z$:

$P(y \mid x, z) = \dfrac{P(x \mid y, z)\,P(y \mid z)}{P(x \mid z)}$
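To make the rule concrete, a small sketch with illustrative numbers (not from the slides) that recovers the posterior $P(y \mid x)$ from the likelihood and the prior:

import numpy as np

P_y = np.array([0.6, 0.4])            # prior P(Y), made-up values
P_x_given_y = np.array([[0.9, 0.1],   # likelihood P(X | Y = y1)
                        [0.5, 0.5]])  # likelihood P(X | Y = y2)

x = 0                                 # observed value of X
joint = P_x_given_y[:, x] * P_y       # P(x, y) = P(x | y) P(y) for each y
posterior = joint / joint.sum()       # normalize over Val(Y)
print(posterior)                      # P(Y | X = x) ≈ [0.73, 0.27]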
Independence
$X$ and $Y$ are independent if $P(x \mid y) = P(x)$ for all $x \in Val(X)$, $y \in Val(Y)$

Proposition: $X$ and $Y$ are independent if and only if $P(x, y) = P(x)\,P(y)$
Conditional independence
Independence is rarely true; conditional independence is more prevalent

$X$ and $Y$ are conditionally independent given $Z$ if
$P(x \mid y, z) = P(x \mid z)$ for all $x \in Val(X)$, $y \in Val(Y)$, $z \in Val(Z)$

This holds if and only if $P(x, y \mid z) = P(x \mid z)\,P(y \mid z)$
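A sketch of how the factorization criterion can be checked numerically on a small, randomly drawn (made-up) 2×2×2 joint table:

import numpy as np

# Hypothetical joint P(X, Y, Z), axes ordered (x, y, z)
P_xyz = np.random.default_rng(0).dirichlet(np.ones(8)).reshape(2, 2, 2)

P_z = P_xyz.sum(axis=(0, 1))            # P(z)
P_xy_given_z = P_xyz / P_z              # P(x, y | z)
P_x_given_z = P_xy_given_z.sum(axis=1)  # P(x | z), shape (x, z)
P_y_given_z = P_xy_given_z.sum(axis=0)  # P(y | z), shape (y, z)

# X is conditionally independent of Y given Z iff P(x, y | z) = P(x | z) P(y | z) everywhere
product = np.einsum('xz,yz->xyz', P_x_given_z, P_y_given_z)
print(np.allclose(P_xy_given_z, product))  # almost surely False for a random table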
Joint distribution, marginalization
Two random variables: Grade (G) and Intelligence (I)

P(G, I):
           I = VH   I = H
  G = A     0.7     0.1
  G = B     0.15    0.05

For binary variables, the table (multiway array) gets really big:
$P(X_1, X_2, \ldots, X_n)$ has $2^n$ entries!

Marginalization: compute the marginal over a single variable
$P(G = B) = P(G = B, I = VH) + P(G = B, I = H) = 0.15 + 0.05 = 0.2$
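The same computation in a couple of numpy lines, using the table above:

import numpy as np

# Joint P(G, I): rows G in {A, B}, columns I in {VH, H}
P_gi = np.array([[0.7, 0.1],
                 [0.15, 0.05]])

P_g = P_gi.sum(axis=1)   # marginalize out I: P(G) = [0.8, 0.2]
P_i = P_gi.sum(axis=0)   # marginalize out G: P(I) = [0.85, 0.15]
print(P_g[1])            # P(G = B) = 0.2, matching the slide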
Marginalization the general case
Compute the marginal distribution $P(X_1, \ldots, X_i)$ from
$P(X_1, \ldots, X_i, X_{i+1}, \ldots, X_n)$:

$P(x_1, \ldots, x_i) = \sum_{x_{i+1}, \ldots, x_n} P(x_1, \ldots, x_i, x_{i+1}, \ldots, x_n)$

Similarly, $P(x_n) = \sum_{x_1, \ldots, x_{n-1}} P(x_1, \ldots, x_{n-1}, x_n)$

For binary variables, this sums over $2^{n-1}$ terms!
Example problem
Estimate the probability $\theta$ of landing heads using a biased coin

Given a sequence of independently and identically distributed (iid) flips
  E.g., $D = \{x_1, x_2, \ldots, x_n\} = \{1, 0, 1, \ldots, 0\}$, with $x_i \in \{0, 1\}$

Model: $P(x \mid \theta) = \theta^x (1 - \theta)^{1 - x}$, i.e.,

$P(x \mid \theta) = \begin{cases} 1 - \theta, & x = 0 \\ \theta, & x = 1 \end{cases}$

Likelihood of a single observation $x_i$:
$P(x_i \mid \theta) = \theta^{x_i} (1 - \theta)^{1 - x_i}$
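A minimal sketch of the model (θ = 0.7 is an arbitrary choice for illustration): draw iid flips from a Bernoulli(θ) and evaluate the likelihood of the observed sequence:

import numpy as np

rng = np.random.default_rng(0)
theta = 0.7                          # true (unknown) head probability
x = rng.binomial(1, theta, size=10)  # iid flips, each x_i in {0, 1}

# Per-observation likelihood: P(x_i | theta) = theta^x_i (1 - theta)^(1 - x_i)
lik = theta**x * (1 - theta)**(1 - x)
print(x, lik.prod())                 # the data and the likelihood of the sequence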
Frequentist Parameter Estimation
Frequentists think of a parameter $\theta$ as a fixed, unknown constant, not a random variable

Hence they use different objective estimators, instead of Bayes rule
These estimators have different properties, such as being unbiased, minimum variance, etc.

A very popular estimator is the maximum likelihood estimator (MLE), which is simple and has good statistical properties

$\hat{\theta}_{MLE} = \arg\max_\theta P(D \mid \theta) = \arg\max_\theta \prod_{i=1}^{n} P(x_i \mid \theta)$
MLE for Biased Coin
Objective function, the log-likelihood:
$\ell(\theta; D) = \log P(D \mid \theta) = \log \theta^{n_h} (1 - \theta)^{n_t} = n_h \log \theta + n_t \log(1 - \theta)$
where $n_h$ and $n_t$ are the numbers of observed heads and tails

We need to maximize this w.r.t. $\theta$

Take derivatives w.r.t. $\theta$:
$\dfrac{\partial \ell}{\partial \theta} = \dfrac{n_h}{\theta} - \dfrac{n_t}{1 - \theta} = 0 \;\Rightarrow\; \hat{\theta}_{MLE} = \dfrac{n_h}{n_h + n_t} = \dfrac{n_h}{n}$
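In code the closed-form MLE is just the empirical frequency of heads; a sketch (with an example sequence) comparing it against a brute-force grid maximization of the log-likelihood:

import numpy as np

x = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 1])  # example flips, n_h = 7

theta_mle = x.mean()                           # closed form: n_h / n = 0.7

# Brute-force check: maximize the log-likelihood over a grid of theta
grid = np.linspace(0.001, 0.999, 999)
loglik = x.sum() * np.log(grid) + (len(x) - x.sum()) * np.log(1 - grid)
print(theta_mle, grid[np.argmax(loglik)])      # both 0.7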
Bayesian Parameter Estimation
Bayesians treat the unknown parameter $\theta$ as a random variable, whose distribution can be inferred using Bayes rule:

$p(\theta \mid D) = \dfrac{p(D \mid \theta)\,p(\theta)}{p(D)} = \dfrac{p(D \mid \theta)\,p(\theta)}{\int p(D \mid \theta)\,p(\theta)\,d\theta}$

The crucial equation can be written in words:

posterior = likelihood × prior / marginal likelihood

For iid data, the likelihood is
$p(D \mid \theta) = \prod_{i=1}^{n} p(x_i \mid \theta) = \prod_{i=1}^{n} \theta^{x_i} (1 - \theta)^{1 - x_i} = \theta^{n_h} (1 - \theta)^{n_t}$

The prior encodes our prior knowledge of the domain
Different priors end up with different estimates $p(\theta \mid D)$!
Bayesian estimation for biased coin
Prior over $\theta$: the Beta distribution
$P(\theta; \alpha, \beta) = \dfrac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, \theta^{\alpha - 1} (1 - \theta)^{\beta - 1}$

When $x$ is a positive integer, $\Gamma(x + 1) = x\,\Gamma(x) = x!$

Posterior distribution of $\theta$:
$P(\theta \mid x_1, \ldots, x_n) = \dfrac{P(x_1, \ldots, x_n \mid \theta)\,P(\theta)}{P(x_1, \ldots, x_n)} \propto \theta^{n_h} (1 - \theta)^{n_t} \cdot \theta^{\alpha - 1} (1 - \theta)^{\beta - 1} = \theta^{n_h + \alpha - 1} (1 - \theta)^{n_t + \beta - 1}$

$\alpha$ and $\beta$ are hyperparameters and correspond to the numbers of virtual heads and tails (pseudo-counts)
Bayesian Estimation for Bernoulli
Posterior distribution:
$P(\theta \mid x_1, \ldots, x_n) \propto \theta^{n_h + \alpha - 1} (1 - \theta)^{n_t + \beta - 1}$, i.e., $Beta(n_h + \alpha,\; n_t + \beta)$

Posterior mean estimation:
$\hat{\theta}_{Bayes} = \int \theta \, p(\theta \mid D)\, d\theta = C \int \theta \cdot \theta^{n_h + \alpha - 1} (1 - \theta)^{n_t + \beta - 1}\, d\theta = \dfrac{n_h + \alpha}{n + \alpha + \beta}$

Prior strength: $A = \alpha + \beta$
$A$ can be interpreted as the size of an imaginary dataset
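A sketch of the conjugate update with scipy; the prior hyperparameters α = β = 2 are an arbitrary illustration. The posterior is Beta(n_h + α, n_t + β), and its mean matches the formula above:

import numpy as np
from scipy import stats

x = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 1])  # example flips
alpha, beta = 2, 2                             # virtual heads/tails (pseudo-counts)
nh, nt = x.sum(), len(x) - x.sum()

posterior = stats.beta(nh + alpha, nt + beta)  # Beta posterior over theta
theta_bayes = (nh + alpha) / (len(x) + alpha + beta)

print(theta_bayes, posterior.mean())           # both 9/14, about 0.643
print(x.mean())                                # MLE 0.7, pulled toward the prior mean 0.5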
