Lecture 4: Optimization
Image Classification
assume a given set of discrete labels
{dog, cat, truck, plane, ...}
(example image label: cat)
Data-driven approach
1. Score function
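As a concrete sketch (a toy illustration, assuming a linear classifier on CIFAR-10: images flattened to 3072-d vectors, 10 classes):

import numpy as np

def linear_score(x, W, b):
    # map a flattened image x (3072,) to 10 class scores
    return W.dot(x) + b

W = np.random.randn(10, 3072) * 0.0001   # toy weights
b = np.zeros(10)
x = np.random.randn(3072)                # stand-in for a flattened image
scores = linear_score(x, W, b)           # one score per class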
[Figure: color histogram feature; each pixel casts a +1 vote into one of several hue bins]
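A rough sketch of this kind of color-histogram feature (assuming RGB input in [0, 1] and, say, 16 hue bins; matplotlib's rgb_to_hsv is used only to extract the hue channel):

import numpy as np
from matplotlib.colors import rgb_to_hsv

def hue_histogram(img, n_bins=16):
    # img: (H, W, 3) RGB array with values in [0, 1]
    # returns an n_bins-d feature: count of pixels falling in each hue bin
    hue = rgb_to_hsv(img)[..., 0]                    # hue channel in [0, 1]
    hist, _ = np.histogram(hue, bins=n_bins, range=(0.0, 1.0))
    return hist

feat = hue_histogram(np.random.rand(32, 32, 3))      # toy 32x32 "image"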
[Figure: bag-of-visual-words pipeline; local patches are quantized against a learned dictionary, and each image becomes a histogram of visual words, a 1000-d vector]
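A minimal sketch of how such a histogram of visual words could be computed (assuming a precomputed codebook of 1000 centroids over patch descriptors; all names and sizes here are illustrative):

import numpy as np

def bow_histogram(descriptors, codebook):
    # descriptors: (N, d) local patch descriptors for one image
    # codebook: (K, d) visual-word centroids (e.g. K = 1000 from k-means)
    # squared distances from every descriptor to every centroid
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(axis=1)                        # nearest visual word per patch
    return np.bincount(words, minlength=len(codebook))

codebook = np.random.randn(1000, 128)                # stand-in learned dictionary
desc = np.random.randn(200, 128)                     # stand-in patch descriptors
feat = bow_histogram(desc, codebook)                 # 1000-d vector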
CNNs: end-to-end models
(slide from Yann LeCun)
Optimization
what's up with 0.0001?
Fun aside:
When W = 0, what is the CIFAR-10 loss for SVM and Softmax?
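One way to reason it out (assuming the usual setup: 10 classes and SVM margin 1): with W = 0 every score is 0, so each example's SVM loss sums max(0, 0 - 0 + 1) = 1 over the 9 incorrect classes, giving L = 9; the Softmax loss is -log(1/10) = log(10) ≈ 2.3. These make handy sanity-check values at the start of training.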
[Slide result: gives 21.4%!]
Evaluating the gradient numerically
finite difference approximation: df/dx ≈ [f(x + h) - f(x)] / h for small h
in practice: use the centered difference [f(x + h) - f(x - h)] / (2h), which is more accurate
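A sketch of such a numerical gradient in numpy (f and x are placeholders for the loss function and the current weights; this mirrors the common recipe rather than the slide's exact code):

import numpy as np

def eval_numerical_gradient(f, x, h=1e-5):
    # centered finite-difference estimate of the gradient of f at x
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'])
    while not it.finished:
        ix = it.multi_index
        old = x[ix]
        x[ix] = old + h
        fxph = f(x)                      # f(x + h)
        x[ix] = old - h
        fxmh = f(x)                      # f(x - h)
        x[ix] = old                      # restore original value
        grad[ix] = (fxph - fxmh) / (2 * h)
        it.iternext()
    return grad

# toy check: the gradient of 0.5*||x||^2 should be x itself
x = np.random.randn(5, 3)
g = eval_numerical_gradient(lambda w: 0.5 * np.sum(w ** 2), x)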
performing a parameter update
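In code, one vanilla update step looks like this (a sketch with a toy loss; step_size is a hyperparameter you choose, and the shapes are illustrative):

import numpy as np

W = np.random.randn(10, 3073) * 0.001    # CIFAR-10-shaped weights (illustrative)
step_size = 1e-2

grad = W.copy()                          # toy analytic gradient of 0.5*||W||^2
W += -step_size * grad                   # step in the negative gradient direction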
[Figure: weight space; starting from the original W, each step moves in the negative gradient direction]
Calculus
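For instance, for the multiclass SVM loss from the previous lecture, calculus gives the analytic gradient directly (a standard derivation, stated here for reference, with margin 1):

L_i = \sum_{j \neq y_i} \max\big(0,\; w_j^T x_i - w_{y_i}^T x_i + 1\big)

\nabla_{w_{y_i}} L_i = -\Big(\sum_{j \neq y_i} \mathbb{1}\big[\, w_j^T x_i - w_{y_i}^T x_i + 1 > 0 \,\big]\Big)\, x_i

\nabla_{w_j} L_i = \mathbb{1}\big[\, w_j^T x_i - w_{y_i}^T x_i + 1 > 0 \,\big]\; x_i \qquad (j \neq y_i)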
In summary:
- numerical gradient: approximate, slow, easy to write
- analytic gradient: exact, fast, error-prone
=> in practice: derive the analytic gradient, then check your implementation with the numerical gradient (a gradient check)
Gradient Descent
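Full-batch gradient descent is just that update in a loop (a sketch; in the real thing, loss_and_grad would evaluate the loss over the whole training set):

import numpy as np

W = np.random.randn(10, 3073) * 0.001
step_size = 0.1

def loss_and_grad(W):
    return 0.5 * np.sum(W ** 2), W       # toy loss and its analytic gradient

for it in range(100):                    # in practice: loop until convergence
    loss, grad = loss_and_grad(W)
    W += -step_size * grad               # perform parameter update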
Mini-batch Gradient Descent: only use a small portion of the training set to compute the gradient.
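A sketch of the mini-batch loop (batch size 256 is a common choice; the loss here is a toy stand-in, a real one would score X_batch against y_batch):

import numpy as np

X = np.random.randn(50000, 3073)         # stand-in training set (CIFAR-10 sized)
y = np.random.randint(0, 10, 50000)
W = np.random.randn(10, 3073) * 0.001
step_size, batch_size = 1e-2, 256

def loss_and_grad(W, X_batch, y_batch):
    return 0.5 * np.sum(W ** 2), W       # toy stand-in loss and gradient

for it in range(100):
    idx = np.random.choice(len(X), batch_size, replace=False)
    loss, grad = loss_and_grad(W, X[idx], y[idx])   # gradient from 256 examples only
    W += -step_size * grad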
Summary
- Always use mini-batch gradient descent
- Incorrectly refer to it as "doing SGD", as everyone else does
(or call it batch gradient descent)
[Quiz options: increase, decrease, stay the same, become zero]
Momentum Update
[Figure: the update direction combines the current gradient with the accumulated momentum]
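In the usual notation (mu is the momentum coefficient, commonly around 0.9; a sketch with a toy gradient):

import numpy as np

W = np.random.randn(10, 3073) * 0.001
v = np.zeros_like(W)                     # velocity, initialized to zero
step_size, mu = 1e-2, 0.9

for it in range(100):
    grad = W.copy()                      # toy gradient (of 0.5*||W||^2)
    v = mu * v - step_size * grad        # decay old velocity, add new gradient step
    W += v                               # update with the velocity, not the raw gradient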
Summary
- We looked at image features, and saw that CNNs can be thought of as learning the features in an end-to-end manner
- We explored intuition about what the loss surfaces of linear classifiers look like
- We introduced gradient descent as a way of optimizing loss functions, as well as batch gradient descent and SGD
- Numerical gradient: slow :(, approximate :(, easy to write :)
- Analytic gradient: fast :), exact :), error-prone :(
- In practice: gradient check (but be careful)
Next class: Becoming a backprop ninja