
Chapter 1
Introduction

EEE 485/585 Statistical Learning and Data Analytics

Introduction

Statistical Learning: A framework for reasoning from data by making statistical assumptions about the way the data are generated.
Ex: Prediction, decision making, structure identification.

Data Analytics: Using learning tools to analyze the data.
Ex: What happened, why it happened, what will happen, what action to take...

Lecture & office hours, TAs

Instructor (Section 2): Asst. Prof. Dr. Cem Tekin
Email: cemtekin@ee.bilkent.edu.tr
Webpage: http://kilyos.ee.bilkent.edu.tr/~cemtekin/
Classroom: EE-214
Reserved lecture hours: Mon 15:40-17:30, Thu 13:40-15:30
(Check Moodle for the exact schedule)
Office: EE-203
Office hours: Wed 13:30-15:00 (or by appointment)

Head TA: Alp Celik, email: alparslan@bilkent.edu.tr
TA office hours will be announced in Moodle.

Syllabus - 1

Introduction [2 weeks]
  Intro & probability review
  Bayesian and frequentist machine learning
Supervised learning [4 weeks]
  Linear regression, ordinary least squares
  Ridge regression, lasso
  Parameter estimation, Bayesian linear regression
  How to perform validation
  Generalized linear models
  Perceptron, neural networks and backpropagation
Unsupervised learning [3 weeks]
  Blind signal separation
  Clustering, Gaussian mixtures and expectation maximization
  Feature extraction and feature selection

Syllabus - 2

Graphical models [2 weeks]
  Inference and learning in graphical models
  Applications of graphical models
  Naive Bayes, Restricted Boltzmann Machines, Contrastive Divergence, etc.
Advanced topics [3 weeks]
  Deep learning
  Reinforcement learning
  Online learning

Recommended textbooks

T. Hastie, R. Tibshirani and J. Friedman, The Elements of Statistical Learning. Springer, 2003.
G. James, D. Witten, T. Hastie and R. Tibshirani, An Introduction to Statistical Learning. Springer, 2013.
K. P. Murphy, Machine Learning: A Probabilistic Perspective. MIT Press, 2012.
D. Koller and N. Friedman, Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009.
C. M. Bishop, Pattern Recognition and Machine Learning. Springer, 2011.

Grading - Undergrad

Midterm 25%
Final 25%
4 Quizzes 20%
Term project (3 phases) 30%

Minimum requirements to qualify for the final exam:
  All project reports should be completed and submitted (no late submission).
  At least 20/100 must be obtained from the midterm.

Grading - Grad

Midterm 20% (Tentative: Thursday, March 15, 2018)
Final 20%
4 Quizzes 20%
Term project (3 phases) 30%
Research essay on a chosen topic 10%

Minimum requirements to qualify for the final exam:
  All project reports should be completed and submitted (no late submission).
  At least 20/100 must be obtained from the midterm.

Project

Hands-on experience with statistical learning methods.
Use your creativity.
Work closely with your TAs.
You can work individually or in a group of 2.
Project proposal → phase 1 report → phase 2 report → final report.

Supervised learning

Model: Y = f(X) + ε, where Y is the output, X is the input, and ε is the noise.

Training data: (x_i, y_i), i = 1, . . . , n, where x_i is the instance and y_i is the label.

Learning process: {(x_i, y_i)}_{i=1}^n → Learning algorithm → f̂(·)

Goal: Ensure that f̂(X) is close to Y for all possible (X, Y) pairs.

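As a concrete illustration, here is a minimal Python sketch of this pipeline, assuming a linear f(X) = 2X + 1 and Gaussian noise (both are illustrative choices, not part of the slide):

# Minimal sketch of the supervised-learning pipeline: generate (x_i, y_i) pairs
# from Y = f(X) + noise, learn f_hat by ordinary least squares, check it on
# unseen data. The linear f and the noise level are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Training data: (x_i, y_i), i = 1, ..., n.
n = 50
x = rng.uniform(0, 10, size=n)
y = 2 * x + 1 + rng.normal(0, 1, size=n)

# Learning algorithm: least squares on [1, x] features, producing f_hat.
A = np.column_stack([np.ones(n), x])
w_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
f_hat = lambda x_new: w_hat[0] + w_hat[1] * x_new

# Goal check: f_hat(X) should be close to Y on inputs not seen during training.
x_test = rng.uniform(0, 10, size=20)
y_test = 2 * x_test + 1 + rng.normal(0, 1, size=20)
print("test MSE:", np.mean((f_hat(x_test) - y_test) ** 2))
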
Supervised learning - Example: Advertising

X: Advertising budget (thousands of dollars).
Y: Sales (thousands of units).
Each dot corresponds to a previous advertising campaign.

[Figure: scatter plots of Sales against TV, Radio and Newspaper advertising budgets]

Blue line: least squares estimate of sales given data.

*Figure from G. James et al., An Introduction to Statistical Learning with Applications in R, Springer, 2013.

Supervised learning - Example: Ship recognition

Havelsan ship recognition project: a Convolutional Neural Network is trained on ship silhouettes and then evaluated on test images.

[Figure: ship silhouettes used to train and test the convolutional neural network]

Unsupervised learning

Training data: x_i, i = 1, . . . , n (no labels).
Can we categorize data into different groups?

[Figure: K-means clustering with K = 3 clusters]

*Figure from http://pypr.sourceforge.net/kmeans
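
A minimal sketch of the K-means idea behind the figure, on synthetic 2-D data (the three cluster locations and the iteration count are illustrative assumptions):

# K-means with K = 3 on synthetic 2-D data: alternate between assigning points
# to their nearest centroid and moving each centroid to the mean of its points.
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.5, size=(100, 2)) for c in [(0, 0), (4, 0), (2, 3)]])

K = 3
centers = X[rng.choice(len(X), K, replace=False)]   # random initial centroids
for _ in range(20):
    # Assignment step: each point goes to its nearest centroid.
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # Update step: each centroid moves to the mean of its assigned points.
    centers = np.array([X[labels == k].mean(axis=0) for k in range(K)])

print("cluster centers:\n", centers)
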


Unsupervised learning - Example: Compression

[Figure: photograph of Sir Ronald Fisher (statistician), original and two clustered reconstructions]

Original image: 1024x1024 pixels. Value of each pixel in {0, 1, . . . , 255}. Size ≈ 1 MB.
Break the image into 2x2 blocks of pixels (512x512 blocks in total).
Each block is a point in R^4.
Cluster these vectors into 200 (middle) and 4 (right) clusters.
Find the center of each cluster.
Each point is approximated by its closest cluster centroid.

*Figure from Hastie et al., The Elements of Statistical Learning.
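
A rough sketch of this block-quantization pipeline using scikit-learn's KMeans; a small random array stands in for the photograph so the sketch runs quickly (the slide's image is 1024x1024):

# Vector quantization by clustering 2x2 pixel blocks, then replacing each block
# with its closest cluster centroid. The random "image" is an illustrative stand-in.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(256, 256)).astype(float)   # stand-in image

# Break the image into 2x2 blocks; each block is a point in R^4.
blocks = image.reshape(128, 2, 128, 2).transpose(0, 2, 1, 3).reshape(-1, 4)

# Cluster the blocks and replace each block by its cluster centroid.
km = KMeans(n_clusters=200, n_init=1, random_state=0).fit(blocks)
quantized = km.cluster_centers_[km.labels_]

# Reassemble the compressed image from the quantized blocks.
compressed = quantized.reshape(128, 128, 2, 2).transpose(0, 2, 1, 3).reshape(256, 256)
print("distinct blocks after quantization:", len(np.unique(km.labels_)))
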


Unsupervised learning - Example: Blind signal separation

No prior information about the source signals or the mixing process.
Music Information Retrieval: e.g., instrument identification, voice transcription.

[Figure: blind separation of mixed music signals]

Examples: http://research.ics.aalto.fi/ica/cocktail/cocktail_en.cgi

*Figure from http://music.cs.northwestern.edu



Deep learning - Deep belief network

Handwritten digit classification and generation.
Example: http://www.cs.toronto.edu/~hinton/digits.html

Reinforcement learning

Locomotion
Example: https://www.youtube.com/watch?v=hx_bgoTF7bs

Starcraft II
Example: https://youtu.be/cUTMhmVh1qs

Probability review

Probability theory studies uncertainty.
Real-world events that generate data are uncertain.
Learning algorithms are trained on limited data. We want algorithms to perform well on "unseen" data!

Two interpretations of probability

Frequentist: Probabilities represent long-run frequencies of events in experiments repeated many times.
Bayesian: Probabilities represent degrees of uncertainty (belief) about events.

Example: Consider the experiment of flipping a fair coin.

Probability space

A probability space is a triplet (Ω, F, P) that models uncertain real-world situations.

Ω (sample space): The set of all outcomes of a random experiment.
F (set of events, σ-algebra): A set whose elements are subsets of Ω (the actual definition is more advanced).
P (probability measure): A function P : F → R with the following properties (axioms):
  1. P(A) ≥ 0 for all A ∈ F
  2. P(Ω) = 1
  3. For disjoint events A_i ∩ A_j = ∅, i ≠ j:  P(∪_i A_i) = Σ_i P(A_i)

Probability space - Example

Movie recommendation to twins: Ayse and Zeynep.
The twins rate the recommended movie. Each is equally likely to like (L) or dislike (D) the movie.
The twins are unaware of each other's rating; they act independently.

1. What is the probability space of this experiment?
2. What is the probability that at least one liked the movie?
3. What is the probability that Ayse liked the movie?
4. What is the probability that only Ayse liked the movie?

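One way to check the answers is to enumerate the four equally likely outcomes; a short sketch (the event definitions below are just the questions restated in code):

# Enumerate the sample space of the twins example and compute the probabilities
# by summing over equally likely outcomes.
from itertools import product

# Sample space: each twin independently likes (L) or dislikes (D) the movie.
outcomes = list(product("LD", repeat=2))        # ordered as (Ayse, Zeynep)
p = 1 / len(outcomes)                           # each outcome has probability 1/4

prob = lambda event: sum(p for w in outcomes if event(w))

print("at least one liked:", prob(lambda w: "L" in w))           # 3/4
print("Ayse liked:",         prob(lambda w: w[0] == "L"))         # 1/2
print("only Ayse liked:",    prob(lambda w: w == ("L", "D")))     # 1/4
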
Basic properties

Order: If A ⊂ B, then P(A) ≤ P(B).
Union bound: For a collection of events A_1, A_2, . . ., we have

    P(∪_i A_i) ≤ Σ_i P(A_i)

Conditional probability and independence

Conditional probability: Let B be an event with non-zero probability. Then for any event A:

    P(A|B) := P(A ∩ B) / P(B)

Independence: Two events A and B are independent if and only if

    P(A ∩ B) = P(A)P(B)

or equivalently when P(A|B) = P(A).

Bayes' rule

Let B be an event with non-zero probability. Then for any event A:

    P(A|B) = P(B|A)P(A) / P(B)

Having observed B, what is our belief about A?

    P(A|B) ∝ P(A) × P(B|A)
    posterior ∝ prior × likelihood

Example:
  B: the data we have (information)
  A: our model of the real world (hypothesis)

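A tiny numerical sketch of the posterior ∝ prior × likelihood update, with an illustrative pair of hypotheses about a coin and the (made-up) data "three heads in three flips":

# Bayes' rule as prior times likelihood, normalized by the evidence P(B).
# Hypotheses A: the coin is fair, or it lands heads with probability 0.8.
prior = {"fair": 0.5, "biased": 0.5}                  # P(A)
likelihood = {"fair": 0.5 ** 3, "biased": 0.8 ** 3}   # P(B | A), B = 3 heads in 3 flips

unnormalized = {h: prior[h] * likelihood[h] for h in prior}
evidence = sum(unnormalized.values())                 # P(B)
posterior = {h: unnormalized[h] / evidence for h in prior}

print(posterior)   # belief about A after observing B
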
Random variables

A random variable (RV) X is a (Borel measurable) function X : Ω → R.

Discrete RV: takes a discrete set of values.
Continuous RV: takes a continuum of values.

Cumulative distribution function (CDF): F_X(x) := P(X ≤ x)
Probability mass function (pmf): p_X(x) := P(X = x) (for discrete RVs)
Probability density function (pdf): f_X(x) := dF_X(x)/dx (for continuous RVs)

Properties of the CDF, pmf and pdf

CDF properties:
  0 ≤ F_X(x) ≤ 1
  lim_{x→−∞} F_X(x) = 0 and lim_{x→∞} F_X(x) = 1
  x ≤ y ⇒ F_X(x) ≤ F_X(y)

pmf properties:
  0 ≤ p_X(x) ≤ 1
  Σ_x p_X(x) = 1
  Σ_{x∈A} p_X(x) = P(X ∈ A)

pdf properties:
  f_X(x) ≥ 0
  ∫_{−∞}^{∞} f_X(x) dx = 1
  ∫_{x∈A} f_X(x) dx = P(X ∈ A)

Expectation

Let g : R → R be an arbitrary function.

Let X be a discrete RV with pmf p_X(x). Then

    E[g(X)] = Σ_x g(x) p_X(x)

Let Y be a continuous RV with pdf f_Y(y). Then

    E[g(Y)] = ∫_{−∞}^{∞} g(y) f_Y(y) dy

What is E[X]? What is E[Y]?

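A short sketch of both formulas with the illustrative choice g(x) = x^2, using a Bernoulli(0.3) X and a standard normal Y:

# E[g(X)] as a sum over the pmf (discrete case) and as an integral against the
# pdf (continuous case). The distributions and g are illustrative assumptions.
import numpy as np
from scipy import integrate, stats

g = lambda x: x ** 2

# Discrete: E[g(X)] = sum_x g(x) p_X(x) for X ~ Ber(0.3).
values, pmf = np.array([0, 1]), np.array([0.7, 0.3])
E_gX = np.sum(g(values) * pmf)

# Continuous: E[g(Y)] = integral of g(y) f_Y(y) dy for Y ~ N(0, 1).
E_gY, _ = integrate.quad(lambda y: g(y) * stats.norm.pdf(y), -np.inf, np.inf)

print(E_gX)   # 0.3
print(E_gY)   # ≈ 1.0 (the variance of a standard normal)
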
Properties of expectation

For a constant a, E[a] = a.
Expectation is linear:

    E[a g(X) + b h(Y)] = a E[g(X)] + b E[h(Y)]

Variance

    Var(X) = E[(X − E[X])^2] = E[X^2] − E[X]^2

Properties:
  For a constant a, Var(a) = 0.
  Var(a g(X)) = a^2 Var(g(X)).

Indicator function

Very useful in calculating the expectation!
For a RV X,

    I(X ∈ A) := 1 if X ∈ A,  0 otherwise

    E[I(X ∈ A)] = P(X ∈ A)

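A quick Monte Carlo check of E[I(X ∈ A)] = P(X ∈ A), with the illustrative choice X ∼ N(0, 1) and A = [1, ∞):

# The sample average of the indicator I(X >= 1) should match P(X >= 1).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)

empirical = np.mean(x >= 1.0)           # average of the indicator values
exact = 1 - stats.norm.cdf(1.0)         # P(X >= 1)

print(empirical, exact)                 # both ≈ 0.1587
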
Examples of RVs - Discrete RVs - Bernoulli

Bernoulli: X ∼ Ber(p), 0 ≤ p ≤ 1,

    p_X(y) := p if y = 1,  1 − p if y = 0

[Figure: Bernoulli pmf p_X(y), with mass 1 − p at y = 0 and p at y = 1]

Examples of RVs - Discrete RVs - Binomial

Binomial: X ∼ Binomial(n, p), 0 ≤ p ≤ 1,

    p_X(y) := (n choose y) p^y (1 − p)^(n−y)  for y ∈ {0, . . . , n}

[Figure: Binomial pmf]

*Figure from Wikipedia


Examples of RVs - Discrete RVs - Poisson

Poisson: X ∼ Poisson(λ), λ > 0,

    p_X(y) := e^(−λ) λ^y / y!  for y ∈ {0, 1, . . .}

[Figure: Poisson pmf p_X(y) versus y]

*Figure from Wikipedia


Examples of RVs - Continuous RVs - Gaussian

Gaussian (Normal): X ∼ N(µ, σ^2),

    f_X(y) := (1 / (√(2π) σ)) exp(−(y − µ)^2 / (2σ^2)),  y ∈ R

[Figure: Gaussian pdf f_X(y) versus y]

Important! Widely used in statistical learning.

*Figure from Wikipedia


Examples of RVs - Continuous RVs - Beta

Beta: X ∼ Beta(α, β),

    f_X(y) := (1 / B(α, β)) y^(α−1) (1 − y)^(β−1),  y ∈ [0, 1]

[Figure: Beta pdf f_X(y) versus y]

Widely used in Bayesian inference!

*Figure from Wikipedia
Joint distribution

Joint CDF: F_XY(x, y) = P(X ≤ x, Y ≤ y)
Joint pmf: p_XY(x, y) = P(X = x, Y = y)
Joint pdf: f_XY(x, y) = ∂²F_XY(x, y) / ∂x∂y

Properties:
  0 ≤ F_XY(x, y) ≤ 1
  lim_{x,y→∞} F_XY(x, y) = 1,  lim_{x,y→−∞} F_XY(x, y) = 0
  F_X(x) = lim_{y→∞} F_XY(x, y)

Conditional distribution

For both discrete and continuous RVs, given that p_X(x) ≠ 0:

    p_{Y|X}(y|x) = p_XY(x, y) / p_X(x)

Bayes rule (discrete RV):

    p(y|x) = p(x, y) / p(x) = p(x|y)p(y) / Σ_{y'} p(x|y')p(y')

Bayes rule (continuous RV):

    p(y|x) = p(x, y) / p(x) = p(x|y)p(y) / ∫_{y'=−∞}^{∞} p(x|y')p(y') dy'

Independence, identical distributions

C1: X and Y are independent if and only if F_XY(x, y) = F_X(x)F_Y(y) for all x and y in R.
C2: X and Y are identically distributed if F_X(x) = F_Y(x) for all x in R.

Random variables are independent and identically distributed (i.i.d.) if C1 and C2 hold for them.

Covariance

    Cov(X, Y) = E[(X − E[X])(Y − E[Y])]

Exercise: Show that Cov(X, Y) = E[XY] − E[X]E[Y].

Properties:
  X and Y are called uncorrelated RVs when Cov(X, Y) = 0.
  If X and Y are independent, then Cov(X, Y) = 0.
  If X and Y are independent, then E[g(X)h(Y)] = E[g(X)]E[h(Y)].

Are uncorrelated RVs also independent?

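A numerical sketch suggesting the answer to the last question is no: with the illustrative pair X ∼ N(0, 1) and Y = X², the covariance is (approximately) zero even though Y is a deterministic function of X:

# Uncorrelated does not imply independent: estimate Cov(X, Y) for Y = X**2.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)
y = x ** 2

cov_xy = np.mean(x * y) - np.mean(x) * np.mean(y)   # sample estimate of E[XY] - E[X]E[Y]
print("Cov(X, Y) ≈", cov_xy)                        # ≈ 0, yet Y depends on X
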
Random vectors

X = [X_1, X_2, . . . , X_n]^T, where the X_i's are RVs. X : Ω → R^n

Properties:

    E[X] = [E[X_1], E[X_2], . . . , E[X_n]]^T

Covariance matrix of X: Σ = E[(X − E[X])(X − E[X])^T], with Σ_ij = Cov(X_i, X_j):

    Σ = [ Cov(X_1, X_1)  . . .  Cov(X_1, X_n) ]
        [      ...        ...        ...      ]
        [ Cov(X_n, X_1)  . . .  Cov(X_n, X_n) ]

Σ is symmetric.
Σ is positive semidefinite.

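A short sketch that estimates a covariance matrix from samples and checks the two properties above (the mixing matrix A used to create correlated components is an illustrative assumption):

# Estimate the covariance matrix of a 3-dimensional random vector and verify
# that it is symmetric with nonnegative eigenvalues (positive semidefinite).
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))
X = rng.standard_normal((100_000, 3)) @ A.T     # samples of a random vector in R^3

Sigma = np.cov(X, rowvar=False)                 # 3x3 sample covariance matrix

print("symmetric:", np.allclose(Sigma, Sigma.T))
print("eigenvalues (all >= 0):", np.linalg.eigvalsh(Sigma))
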
Multivariate Gaussian distribution

X = [X_1, X_2, . . . , X_n]^T is a Gaussian random vector if

    f_X(x_1, x_2, . . . , x_n) = (1 / ((2π)^(n/2) |Σ|^(1/2))) exp(−(1/2) (x − µ)^T Σ^(−1) (x − µ))

µ = E[X] is the mean vector.
Σ = E[(X − E[X])(X − E[X])^T] is the covariance matrix.
X ∼ N(µ, Σ)

[Figure: density of a multivariate Gaussian distribution]

*Figure from Wikipedia


The law of large numbers

Let X_1, X_2, . . . be independent and identically distributed (i.i.d.) random variables with finite mean E[X]. Then

    (X_1 + · · · + X_n) / n → E[X]  as n → ∞

[Figure: illustration of the law of large numbers]

*Figure from http://www.mathaholic.com
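
A quick sketch of the law of large numbers in action, using i.i.d. Bernoulli(0.3) samples (an illustrative choice) whose running average approaches E[X] = 0.3:

# The running sample mean of i.i.d. Ber(0.3) draws converges to 0.3 as n grows.
import numpy as np

rng = np.random.default_rng(0)
x = rng.binomial(1, 0.3, size=100_000)                 # i.i.d. Bernoulli(0.3)
running_mean = np.cumsum(x) / np.arange(1, len(x) + 1)

for n in [10, 100, 1_000, 100_000]:
    print(f"n = {n:>6}: sample mean = {running_mean[n - 1]:.4f}")
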


The Central Limit Theorem (CLT)

Let X_1, X_2, . . . be i.i.d. random variables with finite mean µ and finite variance σ^2. Let X̄_n = (X_1 + · · · + X_n) / n. We have

    (Σ_{i=1}^{n} X_i − nµ) / (σ√n) = (X̄_n − µ) / (σ/√n) → N(0, 1) in distribution as n → ∞

[Figure: illustration of the central limit theorem]

*Figure from http://flylib.com/books/en/2.528.1.68/1/
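
A sketch of the CLT with i.i.d. Uniform(0, 1) samples (an illustrative choice, so µ = 1/2 and σ^2 = 1/12): the standardized sample means have quantiles close to those of N(0, 1):

# Standardize the mean of n uniform samples many times and compare its
# empirical quantiles with the standard normal quantiles.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, trials = 50, 100_000
mu, sigma = 0.5, np.sqrt(1 / 12)

samples = rng.uniform(0, 1, size=(trials, n))
z = (samples.mean(axis=1) - mu) / (sigma / np.sqrt(n))   # standardized sample means

for q in [0.1, 0.5, 0.9]:
    print(f"q = {q}: empirical {np.quantile(z, q):+.3f}, normal {stats.norm.ppf(q):+.3f}")
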


Hoeffding's inequality

Let Z_1, . . . , Z_n be independent RVs such that 0 ≤ Z_i ≤ 1, and let Z̄ = (1/n)(Z_1 + Z_2 + . . . + Z_n). Then, for any ε > 0,

    P(|Z̄ − E[Z̄]| ≥ ε) ≤ 2 exp(−2nε^2)

Important! Used to test how confident we are in the performance of a machine learning algorithm.

Example:
  X: feature space, Y: label space.
  Training data (X_i, Y_i), i = 1, . . . , n, drawn i.i.d. from P_XY.
  Classifier f : X → Y.
  Can we calculate the error rate of the classifier on "unseen data"?
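
A sketch of how the inequality can bound a classifier's unseen-data error from its empirical error on held-out points; here the Z_i are the 0/1 losses I(f(X_i) ≠ Y_i), which lie in [0, 1], and the counts below are hypothetical:

# Invert Hoeffding's bound: find the smallest eps with 2*exp(-2*n*eps^2) <= delta,
# then report a (1 - delta)-confidence interval around the empirical error.
import numpy as np

def hoeffding_epsilon(n: int, delta: float) -> float:
    """Smallest eps such that 2 * exp(-2 * n * eps**2) <= delta."""
    return np.sqrt(np.log(2.0 / delta) / (2.0 * n))

# Hypothetical numbers for illustration: 40 mistakes on n = 1000 held-out points.
n, mistakes, delta = 1000, 40, 0.05
empirical_error = mistakes / n
eps = hoeffding_epsilon(n, delta)

# With probability at least 1 - delta, the true error rate lies in this interval.
print(f"true error in [{max(0.0, empirical_error - eps):.3f}, "
      f"{min(1.0, empirical_error + eps):.3f}] with prob >= {1 - delta}")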