
Introduction to Machine Learning

Ibrahim Sabek
Computer and Systems Engineering Department, Faculty of Engineering, Alexandria University, Egypt


Agenda
1. Machine learning overview and applications
2. Supervised vs. Unsupervised learning
3. Generative vs. Discriminative models
4. Overview of Classification
5. The big picture
6. Bayesian inference
7. Summary
8. Feedback

Machine learning overview and applications

What is Machine Learning (ML)?


Definition: algorithms for inferring unknowns from knowns.
What do we mean by "inferring"? How do we get unknowns from knowns?

ML applications
Spam detection
Handwriting recognition
Speech recognition
Netflix recommendation system

Classes of ML models
Supervised vs. Unsupervised
Generative vs. Discriminative


Supervised vs. Unsupervised learning

Supervised vs. Unsupervised


Supervised: Given (x_1, y_1), (x_2, y_2), ..., (x_n, y_n), choose a function f such that f(x_i) = y_i
x_i ∈ R^2 (data points), y_i = class/value
Classification: y_i ∈ {finite set}
Regression: y_i ∈ R

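As a small illustration of the supervised setting above, here is a minimal sketch (assuming NumPy is available, with made-up toy data): a least-squares fit for the regression case (y_i ∈ R) and a nearest-class-mean rule for the classification case (y_i in a finite set). Both model choices are illustrative, not something prescribed by the slides.

```python
import numpy as np

# Toy supervised data: x_i in R^2, with a real-valued target and a class label.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                                             # data points x_i
y_reg = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=100)  # regression target, y_i in R
y_cls = (X[:, 0] + X[:, 1] > 0).astype(int)                               # classification target, y_i in {0, 1}

# Regression: choose f(x) = w . x by ordinary least squares.
w, *_ = np.linalg.lstsq(X, y_reg, rcond=None)

# Classification: a minimal "nearest class mean" rule.
means = np.array([X[y_cls == c].mean(axis=0) for c in (0, 1)])

x_new = np.array([0.5, -1.0])
print("regression prediction:", x_new @ w)
print("class prediction:", np.argmin(np.linalg.norm(means - x_new, axis=1)))
```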

Supervised vs. Unsupervised learning

Supervised vs. Unsupervised


Unsupervised: Given (x_1, x_2, ..., x_n), find patterns in the data.
x_i ∈ R^2 (data points)
Clustering
Density estimation
Dimensionality reduction

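As an illustration of the clustering bullet above, here is a minimal k-means (Lloyd's algorithm) sketch, assuming NumPy and made-up two-blob data; k = 2 and the iteration count are illustrative choices.

```python
import numpy as np

# Unlabeled points x_i in R^2 drawn from two illustrative blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=[-2, 0], size=(50, 2)),
               rng.normal(loc=[+2, 0], size=(50, 2))])

# k-means (Lloyd's algorithm): alternate cluster assignment and centroid updates.
# (A robust implementation would also handle clusters that become empty.)
k = 2
centers = X[rng.choice(len(X), size=k, replace=False)]
for _ in range(20):
    labels = np.argmin(np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2), axis=1)
    centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])

print("cluster centers:\n", centers)
```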

Supervised vs. Unsupervised learning

Variations on Supervised and Unsupervised


Semi-supervised: Given (x_1, y_1), (x_2, y_2), ..., (x_k, y_k), x_{k+1}, x_{k+2}, ..., x_n, predict y_{k+1}, y_{k+2}, ..., y_n
Active learning: the learner chooses which unlabeled points x_i to query for labels


Supervised vs. Unsupervised learning

Variations on Supervised and Unsupervised


Decision theory: measure the prediction performance on unlabeled data

Reinforcement learning:
maximize rewards (minimize losses) by taking actions
maximize the overall lifetime reward
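To make the reinforcement-learning bullets concrete, here is a minimal epsilon-greedy multi-armed-bandit sketch, assuming NumPy; the reward probabilities, epsilon = 0.1, and the horizon are made-up illustrative values. The agent picks actions, observes rewards, and accumulates lifetime reward.

```python
import numpy as np

rng = np.random.default_rng(2)
true_reward_prob = np.array([0.2, 0.5, 0.8])   # assumed environment, unknown to the agent
n_actions = len(true_reward_prob)
value_est = np.zeros(n_actions)                # running estimate of each action's reward
counts = np.zeros(n_actions)
total_reward = 0.0

for t in range(1000):
    # epsilon-greedy: mostly exploit the best-looking action, sometimes explore.
    a = rng.integers(n_actions) if rng.random() < 0.1 else int(np.argmax(value_est))
    r = float(rng.random() < true_reward_prob[a])    # Bernoulli reward for the chosen action
    counts[a] += 1
    value_est[a] += (r - value_est[a]) / counts[a]   # incremental mean update
    total_reward += r

print("estimated values:", value_est.round(2), "lifetime reward:", total_reward)
```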


Generative vs. Discriminative models

Generative vs. Discriminative models


Given (x_1, y_1), (x_2, y_2), ..., (x_n, y_n), and a new point (x, y)

Discriminative:
you want to estimate p(y = 1 | x) and p(y = 0 | x) for y ∈ {0, 1}

Generative:
you want to estimate the joint distribution p(x, y)

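A minimal sketch of the contrast, assuming scikit-learn is available and using made-up Gaussian data: logistic regression models p(y | x) directly (discriminative), while Gaussian naive Bayes models p(x | y) p(y) and derives p(y | x) via Bayes' rule (generative).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression   # discriminative: models p(y | x)
from sklearn.naive_bayes import GaussianNB             # generative: models p(x | y) p(y)

rng = np.random.default_rng(3)
X0 = rng.normal(loc=[-1, -1], size=(100, 2))
X1 = rng.normal(loc=[+1, +1], size=(100, 2))
X = np.vstack([X0, X1])
y = np.array([0] * 100 + [1] * 100)

disc = LogisticRegression().fit(X, y)
gen = GaussianNB().fit(X, y)

x_new = np.array([[0.3, 0.2]])
print("discriminative p(y=1|x):", disc.predict_proba(x_new)[0, 1])
print("generative     p(y=1|x):", gen.predict_proba(x_new)[0, 1])  # via Bayes' rule from p(x, y)
```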

Overview of Classification

k-Nearest Neighbor classification (kNN)


Given D = {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)} and a new point (x, y), where x_i ∈ R, y_i ∈ {0, 1}

Dissimilarity metric: d(x, x') = ||x − x'||_2 (k = 1 case)

Probabilistic interpretation:
Given fixed k, p(y | x, D) = fraction of points x_i in N_k(x) s.t. y_i = y
ŷ = argmax_y p(y | x, D)
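A minimal NumPy sketch of the kNN rule above, using the Euclidean dissimilarity, the neighborhood N_k(x), and the argmax over label fractions; the toy data and the choice k = 5 are illustrative assumptions.

```python
import numpy as np

def knn_predict(X, y, x_new, k=3):
    """Predict the label of x_new by majority vote among its k nearest points."""
    dists = np.linalg.norm(X - x_new, axis=1)         # d(x, x_i) = ||x - x_i||_2
    neighbors = np.argsort(dists)[:k]                 # indices of N_k(x)
    labels, votes = np.unique(y[neighbors], return_counts=True)
    return labels[np.argmax(votes)]                   # argmax_y p(y | x, D)

# Illustrative dataset with y_i in {0, 1}.
rng = np.random.default_rng(4)
X = np.vstack([rng.normal(-1, 1, size=(20, 2)), rng.normal(+1, 1, size=(20, 2))])
y = np.array([0] * 20 + [1] * 20)
print(knn_predict(X, y, np.array([0.8, 0.9]), k=5))
```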

Overview of Classification

Classification trees (CART)


Given D = {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)} and a new x, where x_i ∈ R, y_i ∈ {0, 1}
Build a binary tree by recursively splitting the data
Minimize the error in each leaf

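A minimal sketch of a classification tree, assuming scikit-learn's DecisionTreeClassifier as an off-the-shelf CART-style learner; the data and max_depth are illustrative. The same API with DecisionTreeRegressor covers the regression-tree case on the next slide.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier  # CART-style binary tree

rng = np.random.default_rng(5)
X = rng.uniform(-2, 2, size=(200, 1))            # x_i in R
y = (X[:, 0] > 0.5).astype(int)                  # y_i in {0, 1}

# Each split is chosen to reduce the impurity (error) of the resulting leaves.
tree = DecisionTreeClassifier(max_depth=3).fit(X, y)
print(tree.predict([[0.7], [-1.2]]))             # predicted classes for new x values
```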

Overview of Classification

Regression trees (CART)


Given D = {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)} and a new x, where x_i ∈ R, y_i ∈ R


Overview of Classification

Bootstrap aggregation (Bagging)


Given D = {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)} drawn i.i.d. from a distribution P, and a new x where x_i ∈ R, y_i ∈ R; we need to find its y value

Intuition: averaging makes your prediction closer to the true label
Different training datasets D_i are generated by drawing points (x_k, y_k) uniformly from D, i.i.d. (i.e., sampling with replacement)
The final label y is the average of the labels generated from the different datasets

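A minimal sketch of bagging, assuming NumPy and scikit-learn: each round draws a bootstrap sample uniformly from D with replacement, fits a tree to it, and the final prediction is the average over rounds; B = 25 and the base learner are illustrative choices.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(6)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.2, size=200)    # y_i in R

B = 25
x_new = np.array([[1.0]])
preds = []
for _ in range(B):
    idx = rng.integers(0, len(X), size=len(X))            # uniform(D), i.i.d. = bootstrap sample
    tree = DecisionTreeRegressor(max_depth=4).fit(X[idx], y[idx])
    preds.append(tree.predict(x_new)[0])

print("bagged prediction:", np.mean(preds), "vs sin(1.0) =", np.sin(1.0))
```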

Overview of Classification

Random forests
Given D = {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)} where x_i ∈ R, y_i ∈ R
For i = 1, ..., B:
Choose a bootstrap sample D_i from D
Construct a tree T_i using D_i s.t. at each node we choose a random subset of features and only consider splitting on these features.

Given x, take the majority vote (for classification) or the average (for regression).

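A minimal sketch assuming scikit-learn's RandomForestRegressor, which implements the recipe above: n_estimators plays the role of B, and max_features controls the random feature subset considered at each split; all settings and data are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(7)
X = rng.normal(size=(300, 5))
y = X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.1, size=300)

# n_estimators = B bootstrap trees; max_features = size of the random feature subset per split.
forest = RandomForestRegressor(n_estimators=100, max_features=2, random_state=0).fit(X, y)
print(forest.predict(X[:3]))      # averaged predictions over the B trees
```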

The big picture

The big picture


Given the expected loss E[L(y, f(x))] and D = {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)} where x_i ∈ R, y_i ∈ R, we want to estimate p(y | x)

Discriminative: Estimate p(y | x) directly using D.
KNN, Trees, SVM

Generative: Estimate p(x, y) directly using D, and then


p(y | x) = p(x, y) / p(x), and also p(x, y) = p(x | y) p(y)

Parameters/Latent variables θ: by including parameters, we have p(x, y | θ)


For a discrete parameter space: p(y | x, D) = Σ_θ p(y | x, D, θ) p(θ | x, D)
p(y | x, D, θ) is nice
p(θ | x, D) is nasty (called the posterior distribution on θ)
The summation (or integration, for a continuous θ) is nasty and often intractable
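A small worked example of the discrete-θ sum above, with a made-up three-point parameter space and an assumed posterior p(θ | x, D); it just weights each p(y | x, θ) by the posterior and sums.

```python
import numpy as np

# Illustrative discrete parameter space with three values of theta.
thetas = np.array([0.2, 0.5, 0.8])          # each theta is p(y=1 | x, theta) in this toy model
posterior = np.array([0.1, 0.3, 0.6])       # assumed p(theta | x, D), sums to 1

# p(y=1 | x, D) = sum_theta p(y=1 | x, D, theta) * p(theta | x, D)
p_y1 = np.sum(thetas * posterior)
print("p(y=1 | x, D) =", p_y1)              # 0.02 + 0.15 + 0.48 = 0.65
```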

The big picture

The big picture


p(y | x, D) = Σ_θ p(y | x, D, θ) p(θ | x, D)

Exact inference:
Multivariate Gaussian
Graphical models

Point estimate of θ
Maximum Likelihood Estimation (MLE)
Maximum A Posteriori (MAP) estimation: θ_MAP = argmax_θ p(θ | x, D)

Deterministic Approximation
Laplace approximation
Variational methods

Stochastic Approximation
Importance sampling
Gibbs sampling
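A small worked example of the two point estimates above for a Beta-Bernoulli coin model; the counts (7 heads in 10 flips) and the Beta(2, 2) prior are assumptions, and the closed-form MLE and MAP formulas for this model are used.

```python
# Coin-flip model: theta = p(heads); data D = k heads out of n flips (assumed counts).
k, n = 7, 10
a, b = 2, 2                    # assumed Beta(a, b) prior on theta

theta_mle = k / n                          # argmax_theta p(D | theta)
theta_map = (k + a - 1) / (n + a + b - 2)  # argmax_theta p(theta | D) with the Beta prior

print("MLE:", theta_mle)       # 0.7
print("MAP:", theta_map)       # 8 / 12 = 0.667, pulled toward the prior mean 0.5
```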

Bayesian inference

Bayesian inference
Put distributions on everything, then use the rules of probability to infer values

Aspects of Bayesian inference
Priors: assume a prior distribution p(θ)
Procedures: minimize the expected loss (averaging over θ)

Pros:
Directly answers questions
Avoids overfitting

Cons:
Must assume a prior
Exact computation can be intractable


Bayesian inference

Directed graphical models


Bayesian networks, a.k.a. conditional independence diagrams:
Why? Tractable inference.

Factorization of the probabilistic model
Notational device
Visualization for inference algorithms
Example of thinking graphically about p(a, b, c):
p(a, b, c) = p(c | a, b) p(a, b) = p(c | a, b) p(b | a) p(a)

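A small numeric check of the factorization above, with made-up conditional probability tables for binary a, b, c; multiplying p(c | a, b) p(b | a) p(a) gives a valid joint that sums to 1.

```python
import itertools

# Assumed conditional probability tables for binary a, b, c.
p_a = {0: 0.6, 1: 0.4}
p_b_given_a = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}               # p_b_given_a[a][b]
p_c_given_ab = {(0, 0): {0: 0.9, 1: 0.1}, (0, 1): {0: 0.5, 1: 0.5},
                (1, 0): {0: 0.4, 1: 0.6}, (1, 1): {0: 0.1, 1: 0.9}}     # p_c_given_ab[(a, b)][c]

def p_abc(a, b, c):
    # p(a, b, c) = p(c | a, b) p(b | a) p(a)
    return p_c_given_ab[(a, b)][c] * p_b_given_a[a][b] * p_a[a]

total = sum(p_abc(a, b, c) for a, b, c in itertools.product((0, 1), repeat=3))
print("p(a=1, b=0, c=1) =", p_abc(1, 0, 1))
print("sum over all (a, b, c) =", total)     # should be 1.0
```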

Summary

Summary
Machine learning is an essential field in our lives.
Machine learning is a broad world; we have only just started exploring it in this session :D


Feedback

Feedback
Your feedback is welcome at alex.acm.org/feedback/machine/

