
Chapter 1
Introduction

EEE 485/585 Statistical Learning and Data Analytics

Introduction

Statistical Learning: A framework for reasoning from data by making statistical assumptions about the way the data are generated.
Ex: Prediction, decision making, structure identification.

Data Analytics: Using learning tools to analyze the data.
Ex: What happened, why it happened, what will happen, what action to take...

Lecture & office hours, TAs

Instructor (Section 2): Asst. Prof. Dr. Cem Tekin
Email: cemtekin@ee.bilkent.edu.tr
Webpage: http://kilyos.ee.bilkent.edu.tr/~cemtekin/
Classroom: EE-214
Reserved lecture hours: Mon 15:40-17:30, Thu 13:40-15:30
(Check Moodle for the exact schedule)
Office: EE-203
Office hours: Wed 13:30-15:00 (or by appointment)

Head TA: Alp Celik, email: alparslan@bilkent.edu.tr
TA office hours will be announced in Moodle.

Syllabus - 1

Introduction [2 weeks]
  Intro & probability review
  Bayesian and frequentist machine learning
Supervised learning [4 weeks]
  Linear regression, ordinary least squares
  Ridge regression, lasso
  Parameter estimation, Bayesian linear regression
  How to perform validation
  Generalized linear models
  Perceptron, neural networks and backpropagation
Unsupervised learning [3 weeks]
  Blind signal separation
  Clustering, Gaussian mixtures and expectation maximization
  Feature extraction and feature selection

Syllabus - 2

Graphical models [2 weeks]
  Inference and learning in graphical models
  Applications of graphical models
  Naive Bayes, Restricted Boltzmann Machines, Contrastive Divergence, etc.
Advanced topics [3 weeks]
  Deep learning
  Reinforcement learning
  Online learning

Recommended textbooks

T. Hastie, R. Tibshirani and J. Friedman, The Elements of Statistical Learning. Springer, 2003.
G. James, D. Witten, T. Hastie and R. Tibshirani, An Introduction to Statistical Learning. Springer, 2013.
K. P. Murphy, Machine Learning: A Probabilistic Perspective. MIT Press, 2012.
D. Koller and N. Friedman, Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009.
C. M. Bishop, Pattern Recognition and Machine Learning. Springer, 2011.

Grading - Undergrad

Midterm 25%
Final 25%
4 Quizzes 20%
Term project (3 phases) 30%

Minimum requirements to qualify for the final exam:
  All project reports should be completed and submitted (no late submission).
  At least 20/100 must be obtained from the midterm.

Grading - Grad

Midterm 20% (Tentative: Thursday, March 15, 2018)
Final 20%
4 Quizzes 20%
Term project (3 phases) 30%
Research essay on a chosen topic 10%

Minimum requirements to qualify for the final exam:
  All project reports should be completed and submitted (no late submission).
  At least 20/100 must be obtained from the midterm.

Project

Hands-on experience with statistical learning methods.
Use your creativity.
Work closely with your TAs.
You can work individually or in a group of 2.
Project proposal → phase 1 report → phase 2 report → final report.

Supervised learning

Model: Y = f(X) + ε, where Y is the output, X is the input, and ε is the noise.

Training data: (x_i, y_i), i = 1, . . . , n, where x_i is the instance and y_i is the label.

Learning process: {(x_i, y_i)}_{i=1}^n → Learning algorithm → f̂(·)

Goal: Ensure that f̂(X) is close to Y for all possible (X, Y) pairs.

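As a concrete illustration, here is a minimal Python sketch of this pipeline, assuming a linear f(X) = 2X + 1 and Gaussian noise (both are illustrative choices, not part of the slide):

# Minimal sketch of the supervised-learning pipeline: generate (x_i, y_i) pairs
# from Y = f(X) + noise, learn f_hat by ordinary least squares, check it on
# unseen data. The linear f and the noise level are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Training data: (x_i, y_i), i = 1, ..., n.
n = 50
x = rng.uniform(0, 10, size=n)
y = 2 * x + 1 + rng.normal(0, 1, size=n)

# Learning algorithm: least squares on [1, x] features, producing f_hat.
A = np.column_stack([np.ones(n), x])
w_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
f_hat = lambda x_new: w_hat[0] + w_hat[1] * x_new

# Goal check: f_hat(X) should be close to Y on inputs not seen during training.
x_test = rng.uniform(0, 10, size=20)
y_test = 2 * x_test + 1 + rng.normal(0, 1, size=20)
print("test MSE:", np.mean((f_hat(x_test) - y_test) ** 2))
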
Supervised learning - Example: Advertising

X: Advertising budget (thousands of dollars).
Y: Sales (thousands of units).
Each dot corresponds to a previous advertising campaign.

[Figure: scatter plots of Sales against TV, Radio and Newspaper advertising budgets]

Blue line: least squares estimate of sales given data.

*Figure from G. James et al., An Introduction to Statistical Learning with Applications in R, Springer, 2013.

Supervised learning - Example: Ship recognition

Havelsan ship recognition project: a Convolutional Neural Network is trained on ship silhouettes and then evaluated on test images.

[Figure: ship silhouettes used to train and test the convolutional neural network]

Unsupervised learning

Training data: x_i, i = 1, . . . , n (no labels).
Can we categorize data into different groups?

[Figure: K-means clustering with K = 3 clusters]

*Figure from http://pypr.sourceforge.net/kmeans
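
A minimal sketch of the K-means idea behind the figure, on synthetic 2-D data (the three cluster locations and the iteration count are illustrative assumptions):

# K-means with K = 3 on synthetic 2-D data: alternate between assigning points
# to their nearest centroid and moving each centroid to the mean of its points.
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.5, size=(100, 2)) for c in [(0, 0), (4, 0), (2, 3)]])

K = 3
centers = X[rng.choice(len(X), K, replace=False)]   # random initial centroids
for _ in range(20):
    # Assignment step: each point goes to its nearest centroid.
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # Update step: each centroid moves to the mean of its assigned points.
    centers = np.array([X[labels == k].mean(axis=0) for k in range(K)])

print("cluster centers:\n", centers)
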


Unsupervised learning - Example: Compression

[Figure: photograph of Sir Ronald Fisher (statistician), original and two clustered reconstructions]

Original image: 1024x1024 pixels. Value of each pixel in {0, 1, . . . , 255}. Size ≈ 1 MB.
Break the image into 2x2 blocks of pixels (512x512 blocks in total).
Each block is a point in R^4.
Cluster these vectors into 200 (middle) and 4 (right) clusters.
Find the center of each cluster.
Each point is approximated by its closest cluster centroid.

*Figure from Hastie et al., The Elements of Statistical Learning.
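
A rough sketch of this block-quantization pipeline using scikit-learn's KMeans; a small random array stands in for the photograph so the sketch runs quickly (the slide's image is 1024x1024):

# Vector quantization by clustering 2x2 pixel blocks, then replacing each block
# with its closest cluster centroid. The random "image" is an illustrative stand-in.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(256, 256)).astype(float)   # stand-in image

# Break the image into 2x2 blocks; each block is a point in R^4.
blocks = image.reshape(128, 2, 128, 2).transpose(0, 2, 1, 3).reshape(-1, 4)

# Cluster the blocks and replace each block by its cluster centroid.
km = KMeans(n_clusters=200, n_init=1, random_state=0).fit(blocks)
quantized = km.cluster_centers_[km.labels_]

# Reassemble the compressed image from the quantized blocks.
compressed = quantized.reshape(128, 128, 2, 2).transpose(0, 2, 1, 3).reshape(256, 256)
print("distinct blocks after quantization:", len(np.unique(km.labels_)))
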


Unsupervised learning - Example: Blind signal separation

No prior information about the source signals or the mixing process.
Music Information Retrieval: e.g., instrument identification, voice transcription.

[Figure: blind separation of mixed music signals]

Examples: http://research.ics.aalto.fi/ica/cocktail/cocktail_en.cgi

*Figure from http://music.cs.northwestern.edu



Deep learning - Deep belief network

Handwritten digit classification and generation.
Example: http://www.cs.toronto.edu/~hinton/digits.html

Reinforcement learning

Locomotion
Example: https://www.youtube.com/watch?v=hx_bgoTF7bs

Starcraft II
Example: https://youtu.be/cUTMhmVh1qs

Probability review

Probability theory studies uncertainty.
Real-world events that generate data are uncertain.
Learning algorithms are trained on limited data. We want algorithms to perform well on "unseen" data!

Two interpretations of probability

Frequentist: Probabilities represent long-run frequencies of events in experiments repeated many times.
Bayesian: Probabilities represent degrees of uncertainty (belief) about events.

Example: Consider the experiment of flipping a fair coin.

Probability space

A probability space is a triplet (Ω, F, P) that models uncertain real-world situations.

Ω (sample space): The set of all outcomes of a random experiment.
F (set of events, σ-algebra): A set whose elements are subsets of Ω (the actual definition is more advanced).
P (probability measure): A function P : F → R with the following properties (axioms):
  1. P(A) ≥ 0 for all A ∈ F
  2. P(Ω) = 1
  3. For disjoint events A_i ∩ A_j = ∅, i ≠ j:  P(∪_i A_i) = Σ_i P(A_i)

Probability space - Example

Movie recommendation to twins: Ayse and Zeynep.
The twins rate the recommended movie. Each is equally likely to like (L) or dislike (D) the movie.
The twins are unaware of each other's rating; they act independently.

1. What is the probability space of this experiment?
2. What is the probability that at least one liked the movie?
3. What is the probability that Ayse liked the movie?
4. What is the probability that only Ayse liked the movie?

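One way to check the answers is to enumerate the four equally likely outcomes; a short sketch (the event definitions below are just the questions restated in code):

# Enumerate the sample space of the twins example and compute the probabilities
# by summing over equally likely outcomes.
from itertools import product

# Sample space: each twin independently likes (L) or dislikes (D) the movie.
outcomes = list(product("LD", repeat=2))        # ordered as (Ayse, Zeynep)
p = 1 / len(outcomes)                           # each outcome has probability 1/4

prob = lambda event: sum(p for w in outcomes if event(w))

print("at least one liked:", prob(lambda w: "L" in w))           # 3/4
print("Ayse liked:",         prob(lambda w: w[0] == "L"))         # 1/2
print("only Ayse liked:",    prob(lambda w: w == ("L", "D")))     # 1/4
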
Basic properties

Order: If A ⊂ B, then P(A) ≤ P(B).
Union bound: For a collection of events A_1, A_2, . . ., we have

    P(∪_i A_i) ≤ Σ_i P(A_i)

Conditional probability and independence

Conditional probability: Let B be an event with non-zero probability. Then for any event A:

    P(A|B) := P(A ∩ B) / P(B)

Independence: Two events A and B are independent if and only if

    P(A ∩ B) = P(A)P(B)

or equivalently when P(A|B) = P(A).

Bayes' rule

Let B be an event with non-zero probability. Then for any event A:

    P(A|B) = P(B|A)P(A) / P(B)

Having observed B, what is our belief about A?

    P(A|B) ∝ P(A) × P(B|A)
    posterior ∝ prior × likelihood

Example:
  B: the data we have (information)
  A: our model of the real world (hypothesis)

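A tiny numerical sketch of the posterior ∝ prior × likelihood update, with an illustrative pair of hypotheses about a coin and the (made-up) data "three heads in three flips":

# Bayes' rule as prior times likelihood, normalized by the evidence P(B).
# Hypotheses A: the coin is fair, or it lands heads with probability 0.8.
prior = {"fair": 0.5, "biased": 0.5}                  # P(A)
likelihood = {"fair": 0.5 ** 3, "biased": 0.8 ** 3}   # P(B | A), B = 3 heads in 3 flips

unnormalized = {h: prior[h] * likelihood[h] for h in prior}
evidence = sum(unnormalized.values())                 # P(B)
posterior = {h: unnormalized[h] / evidence for h in prior}

print(posterior)   # belief about A after observing B
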
Random variables

A random variable (RV) X is a (Borel measurable) function X : Ω → R.

Discrete RV: takes a discrete set of values.
Continuous RV: takes a continuum of values.

Cumulative distribution function (CDF): F_X(x) := P(X ≤ x)
Probability mass function (pmf): p_X(x) := P(X = x) (for discrete RVs)
Probability density function (pdf): f_X(x) := dF_X(x)/dx (for continuous RVs)

Properties of the CDF, pmf and pdf

CDF properties:
  0 ≤ F_X(x) ≤ 1
  lim_{x→−∞} F_X(x) = 0 and lim_{x→∞} F_X(x) = 1
  x ≤ y ⇒ F_X(x) ≤ F_X(y)

pmf properties:
  0 ≤ p_X(x) ≤ 1
  Σ_x p_X(x) = 1
  Σ_{x∈A} p_X(x) = P(X ∈ A)

pdf properties:
  f_X(x) ≥ 0
  ∫_{−∞}^{∞} f_X(x) dx = 1
  ∫_{x∈A} f_X(x) dx = P(X ∈ A)

Expectation

Let g : R → R be an arbitrary function.

Let X be a discrete RV with pmf p_X(x). Then

    E[g(X)] = Σ_x g(x) p_X(x)

Let Y be a continuous RV with pdf f_Y(y). Then

    E[g(Y)] = ∫_{−∞}^{∞} g(y) f_Y(y) dy

What is E[X]? What is E[Y]?

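A short sketch of both formulas with the illustrative choice g(x) = x^2, using a Bernoulli(0.3) X and a standard normal Y:

# E[g(X)] as a sum over the pmf (discrete case) and as an integral against the
# pdf (continuous case). The distributions and g are illustrative assumptions.
import numpy as np
from scipy import integrate, stats

g = lambda x: x ** 2

# Discrete: E[g(X)] = sum_x g(x) p_X(x) for X ~ Ber(0.3).
values, pmf = np.array([0, 1]), np.array([0.7, 0.3])
E_gX = np.sum(g(values) * pmf)

# Continuous: E[g(Y)] = integral of g(y) f_Y(y) dy for Y ~ N(0, 1).
E_gY, _ = integrate.quad(lambda y: g(y) * stats.norm.pdf(y), -np.inf, np.inf)

print(E_gX)   # 0.3
print(E_gY)   # ≈ 1.0 (the variance of a standard normal)
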
Properties of expectation

For a constant a, E[a] = a.
Expectation is linear:

    E[a g(X) + b h(Y)] = a E[g(X)] + b E[h(Y)]

Variance

    Var(X) = E[(X − E[X])^2] = E[X^2] − E[X]^2

Properties:
  For a constant a, Var(a) = 0.
  Var(a g(X)) = a^2 Var(g(X)).

Indicator function

Very useful in calculating the expectation!
For a RV X,

    I(X ∈ A) := 1 if X ∈ A,  0 otherwise

    E[I(X ∈ A)] = P(X ∈ A)

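A quick Monte Carlo check of E[I(X ∈ A)] = P(X ∈ A), with the illustrative choice X ∼ N(0, 1) and A = [1, ∞):

# The sample average of the indicator I(X >= 1) should match P(X >= 1).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)

empirical = np.mean(x >= 1.0)           # average of the indicator values
exact = 1 - stats.norm.cdf(1.0)         # P(X >= 1)

print(empirical, exact)                 # both ≈ 0.1587
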
Examples of RVs - Discrete RVs - Bernoulli

Bernoulli: X ∼ Ber(p), 0 ≤ p ≤ 1,

    p_X(y) := p if y = 1,  1 − p if y = 0

[Figure: Bernoulli pmf p_X(y), with mass 1 − p at y = 0 and p at y = 1]

Examples of RVs - Discrete RVs - Binomial

Binomial: X ∼ Binomial(n, p), 0 ≤ p ≤ 1,

    p_X(y) := (n choose y) p^y (1 − p)^(n−y)  for y ∈ {0, . . . , n}

[Figure: Binomial pmf]

*Figure from Wikipedia


Examples of RVs - Discrete RVs - Poisson

Poisson: X ∼ Poisson(λ), λ > 0,

    p_X(y) := e^(−λ) λ^y / y!  for y ∈ {0, 1, . . .}

[Figure: Poisson pmf p_X(y) versus y]

*Figure from Wikipedia


Examples of RVs - Continuous RVs - Gaussian

Gaussian (Normal): X ∼ N(µ, σ^2),

    f_X(y) := (1 / (√(2π) σ)) exp(−(y − µ)^2 / (2σ^2)),  y ∈ R

[Figure: Gaussian pdf f_X(y) versus y]

Important! Widely used in statistical learning.

*Figure from Wikipedia


Examples of RVs - Continuous RVs - Beta

Beta: X ∼ Beta(α, β),

    f_X(y) := (1 / B(α, β)) y^(α−1) (1 − y)^(β−1),  y ∈ [0, 1]

[Figure: Beta pdf f_X(y) versus y]

Widely used in Bayesian inference!

*Figure from Wikipedia
Joint distribution

Joint CDF: F_XY(x, y) = P(X ≤ x, Y ≤ y)
Joint pmf: p_XY(x, y) = P(X = x, Y = y)
Joint pdf: f_XY(x, y) = ∂²F_XY(x, y) / ∂x∂y

Properties:
  0 ≤ F_XY(x, y) ≤ 1
  lim_{x,y→∞} F_XY(x, y) = 1,  lim_{x,y→−∞} F_XY(x, y) = 0
  F_X(x) = lim_{y→∞} F_XY(x, y)

Conditional distribution

For both discrete and continuous RVs, given that p_X(x) ≠ 0:

    p_{Y|X}(y|x) = p_XY(x, y) / p_X(x)

Bayes rule (discrete RV):

    p(y|x) = p(x, y) / p(x) = p(x|y)p(y) / Σ_{y'} p(x|y')p(y')

Bayes rule (continuous RV):

    p(y|x) = p(x, y) / p(x) = p(x|y)p(y) / ∫_{y'=−∞}^{∞} p(x|y')p(y') dy'

Independence, identical distributions

C1: X and Y are independent if and only if F_XY(x, y) = F_X(x)F_Y(y) for all x and y in R.
C2: X and Y are identically distributed if F_X(x) = F_Y(x) for all x in R.

Random variables are independent and identically distributed (i.i.d.) if C1 and C2 hold for them.

Covariance

    Cov(X, Y) = E[(X − E[X])(Y − E[Y])]

Exercise: Show that Cov(X, Y) = E[XY] − E[X]E[Y].

Properties:
  X and Y are called uncorrelated RVs when Cov(X, Y) = 0.
  If X and Y are independent, then Cov(X, Y) = 0.
  If X and Y are independent, then E[g(X)h(Y)] = E[g(X)]E[h(Y)].

Are uncorrelated RVs also independent?

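A numerical sketch suggesting the answer to the last question is no: with the illustrative pair X ∼ N(0, 1) and Y = X², the covariance is (approximately) zero even though Y is a deterministic function of X:

# Uncorrelated does not imply independent: estimate Cov(X, Y) for Y = X**2.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)
y = x ** 2

cov_xy = np.mean(x * y) - np.mean(x) * np.mean(y)   # sample estimate of E[XY] - E[X]E[Y]
print("Cov(X, Y) ≈", cov_xy)                        # ≈ 0, yet Y depends on X
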
Random vectors

X = [X_1, X_2, . . . , X_n]^T, where the X_i's are RVs. X : Ω → R^n

Properties:

    E[X] = [E[X_1], E[X_2], . . . , E[X_n]]^T

Covariance matrix of X: Σ = E[(X − E[X])(X − E[X])^T], with Σ_ij = Cov(X_i, X_j):

    Σ = [ Cov(X_1, X_1)  . . .  Cov(X_1, X_n) ]
        [      ...        ...        ...      ]
        [ Cov(X_n, X_1)  . . .  Cov(X_n, X_n) ]

Σ is symmetric.
Σ is positive semidefinite.

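A short sketch that estimates a covariance matrix from samples and checks the two properties above (the mixing matrix A used to create correlated components is an illustrative assumption):

# Estimate the covariance matrix of a 3-dimensional random vector and verify
# that it is symmetric with nonnegative eigenvalues (positive semidefinite).
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))
X = rng.standard_normal((100_000, 3)) @ A.T     # samples of a random vector in R^3

Sigma = np.cov(X, rowvar=False)                 # 3x3 sample covariance matrix

print("symmetric:", np.allclose(Sigma, Sigma.T))
print("eigenvalues (all >= 0):", np.linalg.eigvalsh(Sigma))
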
Multivariate Gaussian distribution

X = [X_1, X_2, . . . , X_n]^T is a Gaussian random vector if

    f_X(x_1, x_2, . . . , x_n) = (1 / ((2π)^(n/2) |Σ|^(1/2))) exp(−(1/2) (x − µ)^T Σ^(−1) (x − µ))

µ = E[X] is the mean vector.
Σ = E[(X − E[X])(X − E[X])^T] is the covariance matrix.
X ∼ N(µ, Σ)

[Figure: density of a multivariate Gaussian distribution]

*Figure from Wikipedia


The law of large numbers

Let X_1, X_2, . . . be independent and identically distributed (i.i.d.) random variables with finite mean E[X]. Then

    (X_1 + · · · + X_n) / n → E[X]  as n → ∞

[Figure: illustration of the law of large numbers]

*Figure from http://www.mathaholic.com
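
A quick sketch of the law of large numbers in action, using i.i.d. Bernoulli(0.3) samples (an illustrative choice) whose running average approaches E[X] = 0.3:

# The running sample mean of i.i.d. Ber(0.3) draws converges to 0.3 as n grows.
import numpy as np

rng = np.random.default_rng(0)
x = rng.binomial(1, 0.3, size=100_000)                 # i.i.d. Bernoulli(0.3)
running_mean = np.cumsum(x) / np.arange(1, len(x) + 1)

for n in [10, 100, 1_000, 100_000]:
    print(f"n = {n:>6}: sample mean = {running_mean[n - 1]:.4f}")
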


The Central Limit Theorem (CLT)

Let X_1, X_2, . . . be i.i.d. random variables with finite mean µ and finite variance σ^2. Let X̄_n = (X_1 + · · · + X_n) / n. We have

    (Σ_{i=1}^{n} X_i − nµ) / (σ√n) = (X̄_n − µ) / (σ/√n) → N(0, 1) in distribution as n → ∞

[Figure: illustration of the central limit theorem]

*Figure from http://flylib.com/books/en/2.528.1.68/1/
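
A sketch of the CLT with i.i.d. Uniform(0, 1) samples (an illustrative choice, so µ = 1/2 and σ^2 = 1/12): the standardized sample means have quantiles close to those of N(0, 1):

# Standardize the mean of n uniform samples many times and compare its
# empirical quantiles with the standard normal quantiles.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, trials = 50, 100_000
mu, sigma = 0.5, np.sqrt(1 / 12)

samples = rng.uniform(0, 1, size=(trials, n))
z = (samples.mean(axis=1) - mu) / (sigma / np.sqrt(n))   # standardized sample means

for q in [0.1, 0.5, 0.9]:
    print(f"q = {q}: empirical {np.quantile(z, q):+.3f}, normal {stats.norm.ppf(q):+.3f}")
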


Hoeffding's inequality

Let Z_1, . . . , Z_n be independent RVs such that 0 ≤ Z_i ≤ 1, and let Z̄ = (1/n)(Z_1 + Z_2 + . . . + Z_n). Then, for any ε > 0,

    P(|Z̄ − E[Z̄]| ≥ ε) ≤ 2 exp(−2nε^2)

Important! Used to test how confident we are in the performance of a machine learning algorithm.

Example:
  X: feature space, Y: label space.
  Training data (X_i, Y_i), i = 1, . . . , n, drawn i.i.d. from P_XY.
  Classifier f : X → Y.
  Can we calculate the error rate of the classifier on "unseen data"?
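
A sketch of how the inequality can bound a classifier's unseen-data error from its empirical error on held-out points; here the Z_i are the 0/1 losses I(f(X_i) ≠ Y_i), which lie in [0, 1], and the counts below are hypothetical:

# Invert Hoeffding's bound: find the smallest eps with 2*exp(-2*n*eps^2) <= delta,
# then report a (1 - delta)-confidence interval around the empirical error.
import numpy as np

def hoeffding_epsilon(n: int, delta: float) -> float:
    """Smallest eps such that 2 * exp(-2 * n * eps**2) <= delta."""
    return np.sqrt(np.log(2.0 / delta) / (2.0 * n))

# Hypothetical numbers for illustration: 40 mistakes on n = 1000 held-out points.
n, mistakes, delta = 1000, 40, 0.05
empirical_error = mistakes / n
eps = hoeffding_epsilon(n, delta)

# With probability at least 1 - delta, the true error rate lies in this interval.
print(f"true error in [{max(0.0, empirical_error - eps):.3f}, "
      f"{min(1.0, empirical_error + eps):.3f}] with prob >= {1 - delta}")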