Chapter 1
Introduction

EEE 485/585 Statistical Learning and Data Analytics
Introduction

Statistical learning: a framework for reasoning from data by making statistical assumptions on the way the data is generated.

Ex: predictions, decision making, structure identification.
Ex: what happened, why it happened, what will happen, what action to take...
Lecture & office hours, TAs

Instructor (Section 2): Asst. Prof. Dr. Cem Tekin
Email: cemtekin@ee.bilkent.edu.tr
Webpage: http://kilyos.ee.bilkent.edu.tr/~cemtekin/
Office: EE-203
Office hours: Wed 13:30-15:00 (or by appointment)
Syllabus - 1

Introduction [2 weeks]
- Intro & probability review
- Bayesian and frequentist machine learning

Supervised learning [4 weeks]
- Linear regression, ordinary least squares
- Ridge regression, lasso
- Parameter estimation, Bayesian linear regression
- How to perform validation
- Generalized linear models

Unsupervised learning
- Blind signal separation
- Clustering, Gaussian mixtures and expectation maximization
Syllabus - 2

- Deep learning
- Reinforcement learning
- Online learning
Recommended textbooks

… learning, 2011, Springer
Grading - Undergrad

Midterm: 25%
Final: 25%
4 quizzes: 20%
Term project (3 phases): 30%

Minimum requirements to qualify for the final exam:
- All project reports should be completed and submitted (no late submission).
Grading - Grad

Minimum requirements to qualify for the final exam:
- All project reports should be completed and submitted (no late submission).
- At least 20/100 must be obtained from the midterm.
Project

… final report.
Supervised learning

Model:

    Y = f(X) + ε

where Y is the output, X is the input, and ε is the noise.

Training data: (x_i, y_i), i = 1, ..., n, where x_i is the instance and y_i is the label.

Learning process: {(x_i, y_i)}_{i=1}^n → Learning algorithm → f̂(·)

Goal: ensure that f̂(X) is close to Y for all possible (X, Y) pairs.
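To make this pipeline concrete, here is a minimal sketch of a supervised learner: ordinary least squares fit to synthetic data generated from the model above. The choice f(x) = 2x + 1, the noise level, and the sample size are illustrative assumptions, not from the slides.

    import numpy as np

    rng = np.random.default_rng(0)

    # Generate training data from the model Y = f(X) + noise.
    # Here f(x) = 2x + 1 is an assumed ground truth for illustration.
    n = 100
    x = rng.uniform(0, 10, size=n)       # instances x_i
    noise = rng.normal(0, 1.0, size=n)   # noise epsilon_i
    y = 2.0 * x + 1.0 + noise            # labels y_i

    # Learning algorithm: ordinary least squares on features [x, 1].
    A = np.column_stack([x, np.ones(n)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)

    def f_hat(x_new):
        """The learned predictor f_hat(.)"""
        return coef[0] * x_new + coef[1]

    print("estimated slope, intercept:", coef)   # close to (2, 1)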
Supervised learning - Example: Advertising

[Figure: Sales plotted against TV, Radio, and Newspaper advertising budgets in three panels. From G. James et al., An Introduction to Statistical Learning with Applications in R, Springer, 2013.]
Supervised learning - Example: Ship recognition

[Diagram: training a Convolutional Neural Network to recognize ships from images.]
Unsupervised learning

Example: Image compression (vector quantization)

[Figure: photograph of Sir Ronald Fisher (statistician); original, 200-cluster, and 4-cluster compressed versions.]

- Original image: 1024x1024 pixels. Value of each pixel in {0, 1, ..., 255}. Size ≈ 1 MB.
- Break the image into 2x2 blocks of pixels (512x512 blocks in total).
- Each block is a point in R^4.
- Cluster these vectors into 200 clusters (middle image) and 4 clusters (right image).
- Find the center of each cluster.
- Each point is approximated by its closest cluster centroid, as sketched below.
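A minimal sketch of this vector-quantization idea with a from-scratch k-means, using random vectors in place of the actual 2x2 pixel blocks (the image is not reproduced here; k = 4 matches the right image above):

    import numpy as np

    rng = np.random.default_rng(0)

    # Stand-in for the 2x2 pixel blocks: points in R^4 with values in {0,...,255}.
    blocks = rng.integers(0, 256, size=(512 * 512, 4)).astype(float)

    def kmeans(points, k, iters=20):
        """Plain k-means: returns centroids and each point's closest-centroid index."""
        centroids = points[rng.choice(len(points), size=k, replace=False)]
        for _ in range(iters):
            # Assign each point to its nearest centroid.
            d = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
            labels = d.argmin(axis=1)
            # Move each centroid to the mean of its assigned points.
            for j in range(k):
                if np.any(labels == j):
                    centroids[j] = points[labels == j].mean(axis=0)
        return centroids, labels

    centroids, labels = kmeans(blocks, k=4)   # 4 clusters, as in the right image
    compressed = centroids[labels]            # each block replaced by its centroid
    print(compressed.shape)                   # (262144, 4)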
Unsupervised learning - Examples

Cocktail party problem (blind source separation):
http://research.ics.aalto.fi/ica/cocktail/cocktail_en.cgi

Deep learning

Example: Handwritten digit classification and generation
http://www.cs.toronto.edu/~hinton/digits.html
Reinforcement learning

Locomotion
Example: https://www.youtube.com/watch?v=hx_bgoTF7bs

Starcraft II
Example: https://youtu.be/cUTMhmVh1qs
Probability review

Learning algorithms are trained on limited data. We want algorithms to perform well on "unseen" data!
Two interpretations of probability

Frequentist: probabilities represent frequencies of events that happen many times.
Example: consider the experiment of flipping a fair coin; P(heads) = 1/2 means that in a long sequence of flips, roughly half of them come up heads.

Bayesian: probabilities represent degrees of belief.
Probability space

Ω (sample space): the set of all possible outcomes of the experiment.
F (event space): a collection of subsets of Ω (the actual definition is more advanced).
P (probability measure): a function P : F → R with the following properties (axioms):
1. P(A) ≥ 0 for all A ∈ F
2. P(Ω) = 1
3. For disjoint events (A_i ∩ A_j = ∅ for i ≠ j), P(∪_i A_i) = Σ_i P(A_i)
Probability space - Example

… independently.

1. What is the probability space of this experiment?
2. What is the probability that at least one liked the movie?
3. What is the probability that Ayse liked the movie?
4. What is the probability that only Ayse liked the movie?
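The setup sentence above is truncated; a plausible reading is that two viewers, one of them Ayse, each like the movie independently. Under that assumption, and with an illustrative probability of 0.5 per viewer (the slide's actual numbers are lost), questions 2-4 reduce to enumerating the four outcomes:

    from itertools import product

    p_ayse, p_other = 0.5, 0.5   # assumed; the slide's actual probabilities are lost

    # Sample space: each outcome is (Ayse liked?, other viewer liked?).
    outcomes = list(product([True, False], repeat=2))

    def prob(outcome):
        a, b = outcome
        return (p_ayse if a else 1 - p_ayse) * (p_other if b else 1 - p_other)

    at_least_one = sum(prob(o) for o in outcomes if o[0] or o[1])
    ayse_liked   = sum(prob(o) for o in outcomes if o[0])
    only_ayse    = sum(prob(o) for o in outcomes if o[0] and not o[1])

    print(at_least_one, ayse_liked, only_ayse)   # 0.75 0.5 0.25 for p = 0.5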
Basic properties

Union bound: for a collection of events A_1, A_2, ..., we have

    P(∪_i A_i) ≤ Σ_i P(A_i)
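An empirical check of the union bound on three arbitrarily chosen, overlapping events defined through a uniform random variable:

    import numpy as np

    rng = np.random.default_rng(0)

    # Events A_1, A_2, A_3 defined on samples of U ~ Uniform(0, 1);
    # the events themselves are illustrative choices.
    u = rng.random(100_000)
    events = [u < 0.3, (u > 0.2) & (u < 0.6), u > 0.5]

    p_union = np.logical_or.reduce(events).mean()   # P(A_1 ∪ A_2 ∪ A_3)
    sum_p   = sum(e.mean() for e in events)         # Σ_i P(A_i)

    print(p_union, sum_p, p_union <= sum_p)         # union ≤ sum: True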
Conditional probability and independence

Conditional probability: let B be an event with non-zero probability. Then for any event A,

    P(A|B) := P(A ∩ B) / P(B)

Independence: two events A and B are independent if and only if

    P(A ∩ B) = P(A)P(B)

or equivalently when P(A|B) = P(A).
Bayes' rule

For events A and B with P(B) > 0,

    P(A|B) = P(B|A) P(A) / P(B)
Random variables

- Discrete RV: takes a discrete set of values.
- Continuous RV: takes a continuum of values.
- Cumulative distribution function (CDF): F_X(x) := P(X ≤ x).
- Probability mass function (pmf): p_X(x) := P(X = x) (for discrete RVs).
- Probability density function (pdf): f_X(x) := dF_X(x)/dx (for continuous RVs).
Properties of the CDF, pmf and pdf

CDF properties:
- 0 ≤ F_X(x) ≤ 1
- lim_{x→−∞} F_X(x) = 0 and lim_{x→∞} F_X(x) = 1
- x ≤ y ⇒ F_X(x) ≤ F_X(y)

pmf properties:
- 0 ≤ p_X(x) ≤ 1
- Σ_x p_X(x) = 1
- Σ_{x∈A} p_X(x) = P(X ∈ A)

pdf properties:
- ∫_{x∈A} f_X(x) dx = P(X ∈ A)
Expectation

For a discrete RV X and a function g,

    E[g(X)] = Σ_x g(x) p_X(x)

For a continuous RV Y,

    E[g(Y)] = ∫_{y=−∞}^{∞} g(y) f_Y(y) dy

What is E[X]? What is E[Y]?
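A tiny worked example of the discrete formula with an illustrative pmf (support and probabilities are made up):

    import numpy as np

    # Discrete RV X with pmf p_X over its support (illustrative values).
    support = np.array([0, 1, 2])
    pmf = np.array([0.2, 0.5, 0.3])

    e_x  = np.sum(support * pmf)       # E[X]   = sum_x x   p_X(x)
    e_x2 = np.sum(support**2 * pmf)    # E[X^2] = sum_x x^2 p_X(x)

    print(e_x, e_x2)                   # 1.1 1.7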
Properties of expectation

- For a constant a, E[a] = a.
- Expectation is linear: E[a g(X) + b h(Y)] = a E[g(X)] + b E[h(Y)].
Variance

    Var(X) = E[(X − E[X])²] = E[X²] − E[X]²

Properties:
- For a constant a, Var(a) = 0.
- Var(a g(X)) = a² Var(g(X)).
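A quick numerical sanity check of the identity Var(X) = E[X²] − E[X]² on simulated data; the choice of distribution is arbitrary:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.exponential(scale=2.0, size=100_000)   # any distribution works

    mean = x.mean()
    var_def      = ((x - mean) ** 2).mean()        # E[(X - E[X])^2]
    var_identity = (x ** 2).mean() - mean ** 2     # E[X^2] - E[X]^2

    print(var_def, var_identity)   # both close to 4.0 for Exp(scale=2)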
Indicator function

For a RV X,

    I(X ∈ A) := 1 if X ∈ A, and 0 otherwise.
Examples of RVs - Discrete RVs - Bernoulli

Bernoulli: X ~ Ber(p), 0 ≤ p ≤ 1,

    p_X(y) := p if y = 1, and 1 − p if y = 0.

[Plot: pmf with mass 1 − p at y = 0 and mass p at y = 1.]
Examples of RVs - Discrete RVs - Binomial

Binomial: X ~ Bin(n, p), the number of successes in n independent Ber(p) trials,

    p_X(y) := (n choose y) p^y (1 − p)^(n−y)  for y ∈ {0, 1, ..., n}.
Examples of RVs - Discrete RVs - Poisson

Poisson: X ~ Poisson(λ),

    p_X(y) := e^(−λ) λ^y / y!  for y ∈ {0, 1, ...}

[Plot: Poisson pmf p_X(y) versus y.]
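A direct evaluation of this pmf, checking that it sums to 1 over y = 0, 1, ... (λ = 3 is an illustrative choice; the sum is truncated at 100):

    import math

    lam = 3.0   # illustrative rate parameter

    def poisson_pmf(y, lam):
        """p_X(y) = e^{-lambda} lambda^y / y!"""
        return math.exp(-lam) * lam**y / math.factorial(y)

    # The pmf sums to 1 over the support (truncated here at y = 100).
    print(sum(poisson_pmf(y, lam) for y in range(100)))   # ~= 1.0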
Joint distributions

- Joint pmf: p_XY(x, y) = P(X = x, Y = y)
- Joint pdf: f_XY(x, y) = ∂²F_XY(x, y) / ∂x∂y
- 0 ≤ F_XY(x, y) ≤ 1
- F_X(x) = lim_{y→∞} F_XY(x, y)
Conditional distribution

Bayes rule (discrete RV):

    p(y|x) = p(x, y) / p(x) = p(x|y) p(y) / Σ_{y'} p(x|y') p(y')

Bayes rule (continuous RV):

    p(y|x) = p(x, y) / p(x) = p(x|y) p(y) / ∫_{y'=−∞}^{∞} p(x|y') p(y') dy'
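A small sketch of the discrete Bayes rule as a computation: given a prior p(y) and likelihood p(x|y) (both illustrative numbers, not from the slides), the posterior is the normalized product:

    import numpy as np

    # Assumed prior over two classes y in {0, 1} and likelihoods p(x|y)
    # for one observed x; the numbers are illustrative only.
    prior = np.array([0.7, 0.3])        # p(y)
    likelihood = np.array([0.2, 0.9])   # p(x | y) for the observed x

    joint = likelihood * prior          # p(x, y) = p(x|y) p(y)
    posterior = joint / joint.sum()     # p(y|x); denominator = sum_y' p(x|y') p(y')

    print(posterior)                    # [0.3415 0.6585]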
Independence, identical distributions

- C1 (independence): F_XY(x, y) = F_X(x) F_Y(y) for all x and y in R.
- C2 (identical distribution): F_X(x) = F_Y(x) for all x in R.
- Random variables are independent and identically distributed (i.i.d.) if C1 and C2 hold for them.
Covariance

    Cov(X, Y) := E[(X − E[X])(Y − E[Y])]

Exercise: show that Cov(X, Y) = E[XY] − E[X]E[Y].

Properties:
- If X and Y are independent, then Cov(X, Y) = 0.
Random vectors

Mean vector of X = [X_1, ..., X_n]^T:

    E[X] = [E[X_1], ..., E[X_n]]^T

Covariance matrix of X: Σ = E[(X − E[X])(X − E[X])^T], with Σ_ij = Cov(X_i, X_j):

    Σ = [ Cov(X_1, X_1) ··· Cov(X_1, X_n) ]
        [      ⋮         ⋱        ⋮       ]
        [ Cov(X_n, X_1) ··· Cov(X_n, X_n) ]

- Σ is symmetric.
- Σ is positive semidefinite.
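A short sketch that estimates a covariance matrix from synthetic samples and checks the two stated properties (np.cov computes the sample version of E[(X − E[X])(X − E[X])^T]):

    import numpy as np

    rng = np.random.default_rng(0)

    # 10,000 samples of a random vector X in R^3 with correlated coordinates.
    z = rng.normal(size=(10_000, 3))
    x = z @ np.array([[1.0, 0.5, 0.0],
                      [0.0, 1.0, 0.3],
                      [0.0, 0.0, 1.0]])

    sigma = np.cov(x, rowvar=False)          # estimated covariance matrix

    print(np.allclose(sigma, sigma.T))       # symmetric: True
    print(np.linalg.eigvalsh(sigma).min())   # >= 0 (up to numerics): PSD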
Multivariate Gaussian distribution

X = [X_1, X_2, ..., X_n]^T is a Gaussian random vector if

    f_X(x_1, x_2, ..., x_n) = 1 / ((2π)^(n/2) |Σ|^(1/2)) · exp(−(1/2) (x − µ)^T Σ^(−1) (x − µ))

- µ = E[X] is the mean vector.
- Σ = E[(X − E[X])(X − E[X])^T] is the covariance matrix.
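A short sketch that samples from an illustrative multivariate Gaussian and recovers µ and Σ empirically (the particular µ and Σ are made up):

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative mean vector and covariance matrix (n = 2).
    mu = np.array([1.0, -1.0])
    sigma = np.array([[2.0, 0.8],
                      [0.8, 1.0]])

    # Draw samples from N(mu, sigma) and check the parameters empirically.
    x = rng.multivariate_normal(mu, sigma, size=100_000)

    print(x.mean(axis=0))             # close to mu
    print(np.cov(x, rowvar=False))    # close to sigma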
Law of large numbers

For i.i.d. random variables X_1, X_2, ... with common mean E[X],

    (X_1 + ··· + X_n) / n → E[X]  as n → ∞
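A minimal simulation of the law of large numbers with fair coin flips, i.e. X_i ~ Ber(0.5) so E[X] = 0.5 (the distribution is an illustrative choice):

    import numpy as np

    rng = np.random.default_rng(0)
    flips = rng.integers(0, 2, size=1_000_000)   # X_i ~ Ber(0.5)

    for n in (10, 1_000, 1_000_000):
        print(n, flips[:n].mean())               # running average -> E[X] = 0.5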
Convergence

Important! Convergence results are used to test how confident we are in the performance of a machine learning algorithm.

- X: feature space, Y: label space.
- Training data (X_i, Y_i)_{i=1}^n drawn i.i.d. from P_XY.
- Can we calculate the error rate of the classifier on "unseen" data? (A sketch of one answer follows.)
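One standard answer, sketched below under assumptions not stated on the slide: estimate the error rate on held-out data and attach a normal-approximation confidence interval, which is exactly where the convergence results above come in. The data and the "classifier" are simulated placeholders.

    import numpy as np

    rng = np.random.default_rng(0)

    # Placeholder: pretend each held-out prediction is wrong with probability 0.1.
    n_test = 1_000
    errors = rng.random(n_test) < 0.1   # indicators I(f_hat(X_i) != Y_i)

    p_hat = errors.mean()               # empirical error rate
    se = np.sqrt(p_hat * (1 - p_hat) / n_test)

    # Approximate 95% confidence interval for the true error rate.
    print(f"error rate: {p_hat:.3f} +/- {1.96 * se:.3f}")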