26-10-2015
Table of Contents
Neural Networks and loss surfaces
Problems of Deep Architectures
Optimization in Neural Networks
Underfitting
Proliferation of Saddle points
Analysis of Gradient and Hessian based Algorithms
Conclusions
[1] Extremetech.com
[2] Willamette.edu
[3] Wikipedia.com
[4] Allaboutcircuits.com
Curse of dimensionality
[5] Visiondummy.com
Compositionality
[7] Nature.com
[8] Shapeofdata.wordpress.com
Eigenvalue distribution as a function of error/energy
[9] Mathworld.wolfram.com
Effect of dimensionality
Single unconstrained draw of a Gaussian process
In one dimension (a single-valued Hessian): a critical point is a saddle point with probability 0 and a maximum/minimum with probability 1.
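In higher dimensions the situation flips: almost every critical point has curvature of both signs. This can be illustrated numerically; the sketch below uses random symmetric Gaussian matrices as stand-in Hessians (the matrix ensemble and sample counts are illustrative assumptions, not from the slides):

```python
import numpy as np

# Sample random symmetric (GOE-like) matrices as stand-ins for the Hessian at a
# critical point of a random Gaussian function, and measure how often the
# eigenvalues have mixed signs (a saddle) versus all one sign (a max/min).
rng = np.random.default_rng(0)

def saddle_fraction(n, trials=200):
    count = 0
    for _ in range(trials):
        g = rng.standard_normal((n, n))
        h = (g + g.T) / 2.0            # symmetric random "Hessian"
        w = np.linalg.eigvalsh(h)
        if w.min() < 0 < w.max():      # mixed signs -> saddle point
            count += 1
    return count / trials
```

At n = 1 the fraction is 0 (a single eigenvalue cannot have mixed signs), while already at n = 10 essentially every sample is a saddle.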
+ Solution 2: Momentum
[10] gist.github.com
Analysis of momentum
Idea: Add momentum in persistent directions
Formally:
+ Helps with pathological curvatures.
? Requires choosing an appropriate momentum coefficient.
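The "Formally" step is the classical momentum update, v ← μv − ε∇f(θ), θ ← θ + v, as in the Sutskever et al. reference below. A minimal sketch on an ill-conditioned quadratic (the learning rate, μ = 0.9, and the curvatures 100 and 1 are illustrative choices, not values from the slides):

```python
import numpy as np

# Ill-conditioned quadratic f(x) = 0.5 * x^T A x: a toy "pathological curvature".
A = np.diag([100.0, 1.0])

def grad(x):
    return A @ x

def gd(steps=200, lr=0.018):
    # Plain gradient descent: the learning rate is capped by the steep
    # direction, so progress along the shallow direction is slow.
    x = np.array([1.0, 1.0])
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

def momentum_gd(steps=200, lr=0.018, mu=0.9):
    # Classical momentum: velocity accumulates in persistent (shallow)
    # directions while oscillations in steep directions partly cancel.
    x = np.array([1.0, 1.0])
    v = np.zeros_like(x)
    for _ in range(steps):
        v = mu * v - lr * grad(x)
        x = x + v
    return x
```

After the same number of steps, the momentum iterate is much closer to the optimum at the origin than plain gradient descent.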
[11] Sutskever, Martens, Dahl, Hinton, "On the importance of initialization and momentum in deep learning", ICML 2013.
[12] netlab.unist.ac.kr
[13] Visiblegeology.com
Slower convergence near saddle points
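The slowdown can be seen on the toy saddle f(x, y) = (x² − y²)/2: the closer the start lies to the saddle's unstable direction, the longer gradient descent lingers (a hypothetical example; the learning rate and starting points are arbitrary choices):

```python
# Gradient descent on f(x, y) = (x**2 - y**2)/2, whose gradient is (x, -y).
# The origin is a saddle; escape along y is exponential but starts slowly
# when the initial |y| is tiny.
def steps_to_escape(y0, lr=0.1):
    x, y = 1.0, y0
    steps = 0
    while abs(y) < 1.0:
        x, y = x - lr * x, y + lr * y
        steps += 1
    return steps
```

Shrinking the initial offset from the saddle by three orders of magnitude roughly doubles the number of steps needed to escape.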
[14] Dauphin, Bengio, "Identifying and attacking the saddle point problem in high-dimensional non-convex optimization", arXiv 2014.
Results
Co-operative and competitive interactions across connectivity modes.
The network is driven to a decoupled regime.
The fixed points are saddle points.
There are no non-global minima.
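A minimal scalar analogue of these fixed-point results: for the two-layer linear loss L(a, b) = ½(ab − 1)², the origin is a critical point whose Hessian has eigenvalues of both signs, so it is a saddle rather than a non-global minimum (the scalar setup is an illustrative simplification, not the slides' full model):

```python
import numpy as np

# Two-layer linear "network" with scalar weights a, b and target 1:
#   L(a, b) = 0.5 * (a*b - 1)**2
# Gradient: (b*(a*b - 1), a*(a*b - 1)), which vanishes at (0, 0).
def hessian_at_origin():
    # Second derivatives at (a, b) = (0, 0):
    #   d2L/da2 = b**2 = 0,  d2L/db2 = a**2 = 0,  d2L/dadb = 2*a*b - 1 = -1
    return np.array([[0.0, -1.0],
                     [-1.0, 0.0]])

eigs = np.linalg.eigvalsh(hessian_at_origin())
```

The eigenvalues are ±1: one descent direction, one ascent direction, i.e. a saddle.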
Hyperbolic trajectories
Importance of initialization
Unsupervised pre-training
Conclusion
Backup Slides
Concentration of Measure
It is very difficult for N independent random variables to work together and pull their sum, or any function depending on them, far away from its mean. Informally, a random variable that depends in a Lipschitz way on many independent random variables is essentially constant.
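A quick numerical sanity check of this statement (the uniform distribution and sample sizes are illustrative assumptions): the empirical mean of N independent uniforms, a 1/N-Lipschitz function of its inputs, fluctuates on a scale that shrinks like 1/√N:

```python
import numpy as np

# Estimate the standard deviation of the mean of n i.i.d. Uniform(0, 1)
# variables; concentration predicts it scales like sqrt(1/12) / sqrt(n).
rng = np.random.default_rng(0)

def std_of_mean(n, trials=2000):
    samples = rng.random((trials, n)).mean(axis=1)
    return samples.std()
```

Going from 1 to 100 variables shrinks the fluctuation scale by roughly a factor of 10, matching the 1/√N prediction.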