CS189/CS289A

Introduction to Machine Learning


Lecture 1: Overview
Alexei Efros and Peter Bartlett

January 20, 2015

Organizational Issues

Instructors: Peter Bartlett and Alyosha Efros.


GSIs: Peter Gao, Yun Park, Faraz Tavakoli, Kevin Tee, Pat Virtue,
Christopher Xie, Daniel Xu, Yuchen Zhang.
Discussion sections: your choice. If a room is full, please go to
another one. (If necessary, we may offer some specialty
sections; watch the website for announcements.)
Office hours: see the website.
Website: http://www-inst.eecs.berkeley.edu/cs189
bCourses (plus Piazza and Kaggle), office hours, syllabus, assignments,
readings, lecture slides, announcements.

Organizational Issues

Assessment:
CS189

Homework 40%
Implementation and application of methods. (Kaggle)
Mathematical reinforcement of concepts.
Seven assignments total.
Late policy: 5 slip days total. That's it.

Midterm 20%
(Thursday, March 19, in the lecture slot.)
Final Exam 40%

Organizational Issues

Assessment:
CS289A (plus a project):
Homework 40%
Midterm 20%
Final Exam 20%
Final Project 20%
(Proposal due Friday, April 3; project due Friday, May 1.)

Organizational Issues

(Real) Prerequisites:
Math53 (vector calculus); Math54 (linear algebra); CS70 (discrete
math, probability); CS188 (more probability, decision theory).
No screens in lectures. (To see why, google "laptops in class".)
Ethics:
Discussion of homework problems with other students is encouraged.
All homeworks must be written individually (including programming
components).
Please read the department policy on academic dishonesty. We will be
actively checking for plagiarism.

Questions: use Piazza (public or private posts).

Texts
Trevor Hastie, Robert Tibshirani, and Jerome Friedman, The Elements of
Statistical Learning: Data Mining, Inference, and Prediction, Second Edition
(Springer Series in Statistics).

CS189: Introduction to Machine Learning


Machine Learning
Systems that learn to solve
information processing problems.

Learn
Use experience to improve performance:
data, queries, interaction, experiments
Statistical issues are central.

Systems
Computational issues are also central.
Algorithms, optimization.

An Overview of Machine Learning

1. Problems
2. Methods
3. Concepts

Classification Problems (Homework)

[Figures: email example. Source: ESL (The Elements of Statistical Learning)]

Classification

[Figures. Sources: microsoft.com, apple.com, ESL, ISLR (An Introduction to Statistical Learning)]

Regression

[Figures. Source: ESL]

Density Estimation

[Figures. Source: ESL]

Dimensionality Reduction

[Figures. Source: ESL]

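The dimensionality-reduction figures are not reproduced here. As a concrete illustration, here is a minimal sketch of principal components analysis, which also appears in the Methods list. It centers the data, forms the sample covariance matrix, and finds the leading eigenvector by power iteration. The function name and the toy points are illustrative assumptions. Numerical libraries typically compute PCA with a singular value decomposition instead.

```python
import math


def first_principal_component(points, iters=200):
    """Mean, leading principal direction, and the variance along it.

    The direction is found by power iteration on the sample covariance matrix.
    The start vector is all ones, so this sketch assumes it is not orthogonal
    to the leading eigenvector.
    """
    n, d = len(points), len(points[0])
    mean = [sum(p[i] for p in points) / n for i in range(d)]
    centered = [[p[i] - mean[i] for i in range(d)] for p in points]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v: the variance of the data along v.
    variance = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
    return mean, v, variance


# Toy 2-D points lying close to the line y = x (illustrative only).
points = [(-2.0, -2.1), (-1.0, -0.9), (0.0, 0.1), (1.0, 0.9), (2.0, 2.0)]
mean, direction, variance = first_principal_component(points)
print(mean, direction, variance)
# Projecting each centered point onto the direction gives a 1-D summary.
scores = [sum((p[i] - mean[i]) * direction[i] for i in range(2)) for p in points]
print(scores)
```

For these points the leading direction comes out close to (1, 1)/√2, and almost all of the variance lies along it. This is why a single coordinate per point is a reasonable summary here.
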
Clustering

[Figures. Source: ESL]

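As a concrete illustration of clustering, here is a minimal sketch of k-means clustering using Lloyd's algorithm. The algorithm alternates between two steps: assign each point to its nearest center, then move each center to the mean of its assigned points. Several choices are assumptions made for this sketch: the deterministic farthest-first initialization, the function names, and the toy data. Lloyd's algorithm only finds a local optimum, so with other initializations the result can differ.

```python
import math


def nearest(p, centers):
    """Index of the center closest to p (ties go to the lower index)."""
    return min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))


def kmeans(points, k, iters=100):
    """Lloyd's algorithm for k-means, with a farthest-first initialization."""
    # Farthest-first traversal: start at the first point, then repeatedly add
    # the point that is farthest from every center chosen so far.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        assign = [nearest(p, centers) for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:  # nothing moved: converged
            break
        centers = new_centers
    return centers, [nearest(p, centers) for p in points]


# Two well-separated blobs (illustrative data).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
centers, labels = kmeans(points, 2)
print(centers)  # [(0.5, 0.5), (10.5, 10.5)]
print(labels)   # [0, 0, 0, 0, 1, 1, 1, 1]
```

Each step can only lower the within-cluster sum of squared distances, so the loop terminates. Initialization still matters, which is why practical implementations usually restart from several initial centers and keep the best result.
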

Machine Learning Problems


Classification
Regression
Density estimation
Dimensionality reduction
Clustering
Ranking
Collaborative filtering
Sequential decision problems:
bandits
contextual bandits
dynamic pricing
reinforcement learning


Methods
Classification:
  Prediction, not based on a model:
    Linear classifiers: Perceptron
    Support vector machines
  Probabilistic modeling:
    Gaussian class conditionals
    Logistic regression
    Naive Bayes
    Linear discriminant analysis
Regression:
  Linear regression
Decision trees, regression trees
Ensemble methods
Neural networks
Nearest neighbor
Principal components analysis
k-means clustering

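The Perceptron heads the list of linear classifiers. Here is a minimal sketch of its mistake-driven update. It assumes labels in {-1, +1} and folds in the bias as a constant feature. The function names, the epoch cap, and the toy data are illustrative and are not taken from the course materials.

```python
def perceptron(points, labels, epochs=100):
    """Perceptron training for labels in {-1, +1}.

    A constant feature 1.0 is appended to every point, so the last
    weight plays the role of the bias term.
    """
    w = [0.0] * (len(points[0]) + 1)
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(points, labels):
            xb = list(x) + [1.0]
            score = sum(wi * xi for wi, xi in zip(w, xb))
            if y * score <= 0:  # misclassified or on the boundary: update
                w = [wi + y * xi for wi, xi in zip(w, xb)]
                mistakes += 1
        if mistakes == 0:  # a clean pass: the training data are separated
            break
    return w


def predict(w, x):
    """Sign of the linear score w . (x, 1)."""
    xb = list(x) + [1.0]
    return 1 if sum(wi * xi for wi, xi in zip(w, xb)) > 0 else -1


# Toy linearly separable data (illustrative only).
X = [(2.0, 2.0), (3.0, 1.5), (2.5, 3.0), (-1.0, -1.0), (-2.0, 0.5), (-1.5, -2.0)]
y = [1, 1, 1, -1, -1, -1]
w = perceptron(X, y)
print(w)                           # [2.0, 2.0, 1.0]
print([predict(w, x) for x in X])  # [1, 1, 1, -1, -1, -1]
```

The weights change only when a point is misclassified. For linearly separable data, the Perceptron convergence theorem guarantees that only finitely many updates occur. On non-separable data the loop simply runs for the full epoch budget.
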

Concepts

Prediction versus probabilistic modeling.

Probabilistic modeling:
Generative versus discriminative models.
Maximum likelihood estimation.
Bayesian inference.

Optimization:
Convexity.
(Stochastic) gradient methods.
Newton's method.

Controlling complexity:
Bias-variance/approximation-estimation trade-off.
Regularization.
Priors.

Practical issues:
Train/validate/test. Over-fitting.
Resampling methods.

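Logistic regression brings several of these concepts together. Fitting it by maximum likelihood means minimizing a convex negative log-likelihood, which can be done either with gradient descent or with Newton's method. Below is a minimal one-feature sketch. It assumes the model P(y = 1 | x) = σ(wx + b), where σ is the logistic sigmoid. The function names, step size, iteration counts, and data are illustrative assumptions. The two classes overlap, so the maximum likelihood estimate is finite; for perfectly separable data it would diverge.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def nll_grad_hess(w, b, xs, ys):
    """Negative log-likelihood of P(y=1|x) = sigmoid(w*x + b), its gradient and Hessian."""
    nll = gw = gb = hww = hwb = hbb = 0.0
    for x, y in zip(xs, ys):
        z = w * x + b
        s = sigmoid(z)
        nll += math.log1p(math.exp(z)) - y * z  # -log P(y | x) for one example
        gw += (s - y) * x
        gb += s - y
        r = s * (1.0 - s)  # nonnegative weight, so the Hessian is PSD (convexity)
        hww += r * x * x
        hwb += r * x
        hbb += r
    return nll, (gw, gb), ((hww, hwb), (hwb, hbb))


def gradient_descent(xs, ys, lr=0.1, iters=5000):
    """Fixed-step gradient descent on the negative log-likelihood."""
    w = b = 0.0
    for _ in range(iters):
        _, (gw, gb), _ = nll_grad_hess(w, b, xs, ys)
        w, b = w - lr * gw, b - lr * gb
    return w, b


def newton(xs, ys, iters=20):
    """Newton's method: step = H^{-1} grad, solved explicitly for the 2x2 Hessian."""
    w = b = 0.0
    for _ in range(iters):
        _, (gw, gb), ((a, c), (_, d)) = nll_grad_hess(w, b, xs, ys)
        det = a * d - c * c
        step_w = (d * gw - c * gb) / det
        step_b = (-c * gw + a * gb) / det
        w, b = w - step_w, b - step_b
    return w, b


xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 1, 0, 1, 1]  # overlapping classes, so the MLE is finite
print("gradient descent:", gradient_descent(xs, ys))
print("Newton:", newton(xs, ys))
```

Because the objective is convex, both methods should reach the same minimizer. Newton's method uses curvature information and needs only a handful of iterations, whereas gradient descent takes thousands of cheap steps. The stochastic variant of gradient descent would replace the full sum with a single example or a mini-batch at each step.
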

Overview (Part I: Bartlett)


Linear classification
Statistical learning background
Decision theory
Generative and discriminative models
Controlling complexity
Resampling, cross-validation
The multivariate normal distribution

Linear regression
Optimization

Linear classification revisited
Logistic regression
Linear Discriminant Analysis
Support vector machines

Statistical learning theory

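Controlling complexity and cross-validation can be illustrated together. Below is a minimal sketch that assumes one-feature ridge regression through the origin (no intercept). This model has the closed form w = Σ x_i y_i / (Σ x_i² + λ), and k-fold cross-validation then picks λ from a small grid. The grid, the fold count, the function names, and the synthetic data are all assumptions for illustration.

```python
import random


def ridge_fit(xs, ys, lam):
    """One-feature ridge regression through the origin.

    Minimizes sum_i (y_i - w x_i)^2 + lam * w^2, whose minimizer is
    w = sum_i x_i y_i / (sum_i x_i^2 + lam).
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def kfold_cv_error(xs, ys, lam, k=5):
    """Mean squared error on held-out points, pooled over k contiguous folds."""
    n = len(xs)
    total = 0.0
    for fold in range(k):
        lo, hi = fold * n // k, (fold + 1) * n // k
        w = ridge_fit(xs[:lo] + xs[hi:], ys[:lo] + ys[hi:], lam)  # train on the rest
        total += sum((y - w * x) ** 2 for x, y in zip(xs[lo:hi], ys[lo:hi]))
    return total / n


# Synthetic data (an assumption for illustration): true slope 2, Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(40)]
ys = [2.0 * x + rng.gauss(0.0, 0.5) for x in xs]

grid = [0.0, 0.1, 1.0, 10.0, 100.0]
errors = {lam: kfold_cv_error(xs, ys, lam) for lam in grid}
best = min(errors, key=errors.get)
print(errors)
print("chosen lambda:", best, "slope:", ridge_fit(xs, ys, best))
```

Larger λ shrinks w toward zero, trading variance for bias. Cross-validation estimates which setting predicts held-out points best. Contiguous folds are acceptable here only because the points are generated in random order; data that arrive sorted should be shuffled before splitting.
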

Overview (Part II: Efros)

Memory-based/Instance-based learning
k-nearest-neighbor
Properties of high-dimensional spaces
Distance (metric) learning
Efficient indexing and retrieval methods

Decision trees
Classification and regression trees
Random Forests

Boosting

Neural networks / Deep Learning
Multilayer perceptrons
Variations such as convolutional nets
Examples and applications

Unsupervised methods
Clustering
Density estimation
Dimensionality reduction
Applications: collaborative filtering, etc.

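Here is a minimal sketch of the k-nearest-neighbor classifier. It assumes Euclidean distance and a plain majority vote, and the function name and toy data are illustrative. The sketch scans every training point for each query, which takes linear time. That cost is what the efficient indexing and retrieval methods listed above aim to reduce.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Majority label among the k training points closest to `query` (Euclidean)."""
    # Brute force: measure the distance to every training point, then sort.
    by_distance = sorted(
        (math.dist(p, query), label) for p, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


train_points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
train_labels = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(train_points, train_labels, (0.5, 0.5)))  # a
print(knn_predict(train_points, train_labels, (5.5, 5.2)))  # b
```

With two classes, an odd k avoids voting ties. Training is just storing the data, which is why this is called memory-based or instance-based learning.
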