Tricks of the Trade PPT


Marc'Aurelio Ranzato

Facebook, AI Group

Ideal Features

[Figure: an input scene image passed through an "Ideal Feature Extractor" yields semantic features:]

- window, right
- chair, left
- monitor, top of shelf
- carpet, bottom
- drums, corner
- …
- pillows, on couch

With such features, a question like "What is on the couch?" is easy to answer.

The Manifold of Natural Images

Ideal Feature Extraction

[Figure: the ideal feature extractor maps raw pixel space (axes Pixel 1, Pixel 2, …, Pixel n) to a representation whose coordinates correspond to meaningful factors of variation, e.g. pose and expression.]

Learning Non-Linear Features

Given a dictionary of simple non-linear functions $g_1, \dots, g_n$, how can they be combined to approximate a complicated function?

Proposal #1: linear combination $f(x) \approx \sum_j \alpha_j g_j(x)$

Learning Non-Linear Features

Proposal #1: linear combination $f(x) \approx \sum_j \alpha_j g_j(x)$ — SHALLOW
- Kernel learning
- Boosting
- ...

Proposal #2: composition $f(x) \approx g_1(g_2(\dots g_n(x) \dots))$ — DEEP
- Deep learning
- Scattering networks (wavelet cascade)
- S.C. Zhu & D. Mumford "grammar"

A toy sketch contrasting the two proposals follows.
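To make the contrast concrete, here is a purely illustrative Python sketch; the dictionary functions and weights are arbitrary stand-ins, not anything from the slides.

```python
import numpy as np

# Purely illustrative: a dictionary of simple non-linear functions g_j
# (random tanh units) combined in the two proposed ways.
rng = np.random.default_rng(0)
gs = [lambda x, w=w: np.tanh(w * x) for w in rng.standard_normal(4)]
alphas = rng.standard_normal(4)

def f_linear(x):
    """Proposal #1 (shallow): f(x) ~ sum_j alpha_j * g_j(x)."""
    return sum(a * g(x) for a, g in zip(alphas, gs))

def f_composed(x):
    """Proposal #2 (deep): f(x) ~ g_1(g_2(... g_n(x) ...))."""
    for g in reversed(gs):  # apply g_n first, g_1 last
        x = g(x)
    return x

print(f_linear(0.5), f_composed(0.5))
```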

Linear Combination

[Figure: input image → a pool of template matchers → weighted sum (+) → prediction of class]

BAD: this may require an exponential number of templates!

Composition

[Figure: input image → low-level parts → mid-level parts (distributed representations) → high-level parts → prediction of class]

GOOD: (exponentially) more efficient.

Lee et al. "Convolutional DBN's ..." ICML 2009; Zeiler & Fergus

A Potential Problem with Deep Learning

[Figure: a cascade of four trainable stages, 1 → 2 → 3 → 4. How should such a deep pipeline be trained?]

A Potential Problem with Deep Learning

[Figure: a hand-engineered pipeline: SIFT → Pooling → k-Means → Classifier; only the last stage is learned.]

Solution #1: freeze the first N-1 layers (engineer the features).

But that makes the system shallow!
- How do we design features of features?
- How do we design features for new kinds of imagery?

A Potential Problem with Deep Learning

[Figure: the four-stage pipeline again, now trained as a whole.]

Solution #2: live with it!

Training will converge to a local minimum, but the system is much more powerful. Given lots of data, engineer less and learn more!

Deep Learning in Practice

[Figure: the four-stage pipeline. Q: where does the feature extractor end and the classifier begin?]

A: No distinction, end-to-end learning!


KEY IDEAS: WHY DEEP LEARNING

- Distributed representations
- Compositionality
- End-to-end learning

What Is Deep Learning?


Buzz Words


(My) Definition

A deep learning method makes predictions by using a sequence of non-linear processing stages. The resulting intermediate representations can be interpreted as feature hierarchies, and the whole system is jointly learned from data.

Some deep learning methods are probabilistic, others are loss-based; some are supervised, others unsupervised... It's a large family!

THE SPACE OF MACHINE LEARNING METHODS

[Slide: 1957 — Rosenblatt's Perceptron. The map of methods begins with a single entry: Perceptron.]

[Slide: 80s — back-propagation & compute power. The map adds Neural Net and Autoencoder to the Perceptron.]

[Slide: 90s — LeCun's CNNs. The map adds Recurrent Neural Net, Convolutional Neural Net, Sparse Coding, and GMM.]

[Slide: 00s — SVMs. The map adds Boosting, SVM, and Restricted BM.]

[Slide: 2006 — Hinton's DBN. The map adds Deep Belief Net and BayesNP.]

[Slide: 2009 — ASR (data + GPU). The map adds Deep (sparse/denoising) Autoencoder and ΣΠ.]

[Slide: 2012 — CNNs (data + GPU). The map now contains: Recurrent Neural Net, Boosting, Convolutional Neural Net, SVM, Deep (sparse/denoising) Autoencoder, Autoencoder, Neural Net, Sparse Coding, ΣΠ, GMM, Deep Belief Net, Restricted BM, BayesNP.]

[Slide: the full map of methods, shown once more.]

[Slide: the same Convolutional Neural Net plotted over TIME — 1988, 1998, 2012.]

The main reason for the 2012 breakthrough: data and GPUs. But networks have also been made deeper and more non-linear.

ConvNets: History

- Fukushima 1980: designed a network with the same basic structure but did not train it by backpropagation.
- LeCun from late 80s: figured out backpropagation for CNNs; popularized and deployed CNNs for OCR and other applications.
- Poggio from 1999: same basic structure, but learning is restricted to the top layer (k-means at the second stage).
- LeCun from 2006: unsupervised feature learning.
- DiCarlo from 2008: large-scale experiments, normalization layer.
- LeCun from 2009: harsher non-linearities, normalization layer, unsupervised and supervised learning.
- Mallat from 2011: provides a theory behind the architecture.
- Hinton 2012: bigger nets, GPUs, more data.

ConvNets: till 2012

Common wisdom: training does not work because we "get stuck in local minima".

[Figure: loss as a function of a parameter, a landscape riddled with local minima.]

ConvNets: today

Local minima are all similar; there are long plateaus, and it can take a long time to break symmetries.

[Figure: loss as a function of a parameter; annotations: saturating units, ties between parameters that must be broken, and an input/output mapping that is invariant to permutations of hidden units.]

It is like walking on a ridge between valleys.

ConvNets: today

Local minima are all similar; there are long plateaus; it can take long to break symmetries.

Optimization is not the real problem when:
- the dataset is large
- units do not saturate too much
- a normalization layer is used

ConvNets: today

Today's belief is that the challenge is about:
- generalization: how many training samples are needed to fit 1B parameters? How many parameters/samples to model spaces with 1M dimensions?
- scalability

[Slide: the map of methods arranged along a SHALLOW–DEEP axis: Recurrent Neural Net, Boosting, Convolutional Neural Net, SVM, Autoencoder, Neural Net, Sparse Coding, ΣΠ, GMM, Deep Belief Net, Restricted BM, BayesNP.]

[Slide: the same map, now also split along a SUPERVISED–UNSUPERVISED axis; Deep (sparse/denoising) Autoencoder appears on the unsupervised side.]

Deep Learning is a very rich family!

I am going to focus on a few methods...


[Slide: the SHALLOW–DEEP / SUPERVISED–UNSUPERVISED map once more, to situate the methods discussed next.]

Deep Gated MRF

Layer 1: start from a pair-wise MRF over pixels $x_p, x_q$:

$E(x, h^c, h^m) = \frac{1}{2} x^\top \Sigma^{-1} x, \qquad p(x, h^c, h^m) \propto e^{-E(x, h^c, h^m)}$

Ranzato et al. "Modeling natural images with gated MRFs" PAMI 2013

Deep Gated MRF

Layer 1: factorize the precision matrix through $F$ filters $C$:

$E(x, h^c, h^m) = \frac{1}{2} x^\top C C^\top x$

Ranzato et al. "Modeling natural images with gated MRFs" PAMI 2013

Deep Gated MRF

Layer 1: gate each filter with a binary latent variable $h^c_k$ (gated MRF):

$E(x, h^c, h^m) = \frac{1}{2} x^\top C \,\mathrm{diag}(h^c)\, C^\top x$

Ranzato et al. "Modeling natural images with gated MRFs" PAMI 2013

Deep Gated MRF

Layer 1: add $N$ mean hidden units $h^m_j$ coupled to the image through weights $W$:

$E(x, h^c, h^m) = \frac{1}{2} x^\top C \,\mathrm{diag}(h^c)\, C^\top x + \frac{1}{2} x^\top x - x^\top W h^m$

$p(x) \propto \int_{h^c} \int_{h^m} e^{-E(x, h^c, h^m)}$

Ranzato et al. "Modeling natural images with gated MRFs" PAMI 2013

Deep Gated MRF

Layer 1:

$E(x, h^c, h^m) = \frac{1}{2} x^\top C \,\mathrm{diag}(h^c)\, C^\top x + \frac{1}{2} x^\top x - x^\top W h^m, \qquad p(x) \propto \int_{h^c} \int_{h^m} e^{-E(x, h^c, h^m)}$

Inference of latent variables: just a forward pass (see the sketch below).

Training: requires approximations (here we used MCMC methods).

Ranzato et al. "Modeling natural images with gated MRFs" PAMI 2013
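Since the energy above is linear in each binary latent variable, the posteriors factorize and can be read off in closed form; the following NumPy sketch shows that forward pass. This is my illustration derived from the stated energy, not the paper's code, and the biases b_c, b_m are assumptions (the slide omits them).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def infer_latents(x, C, W, b_c=0.0, b_m=0.0):
    """Posterior of the binary latents given an image x for
    E = 1/2 x'C diag(h^c) C'x + 1/2 x'x - x'W h^m."""
    # covariance units: gate the filters; active when (C_k'x)^2 is small
    p_hc = sigmoid(b_c - 0.5 * (C.T @ x) ** 2)
    # mean units: standard RBM-style hidden units
    p_hm = sigmoid(b_m + W.T @ x)
    return p_hc, p_hm

# toy usage: D pixels, F covariance filters, N mean units
D, F, N = 64, 32, 16
rng = np.random.default_rng(0)
x = rng.standard_normal(D)
C = 0.1 * rng.standard_normal((D, F))
W = 0.1 * rng.standard_normal((D, N))
p_hc, p_hm = infer_latents(x, C, W)
print(p_hc.shape, p_hm.shape)  # (32,) (16,)
```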

Deep Gated MRF

[Figure: a second layer is stacked on top of the layer-1 latents, producing hidden units $h^2$.]

Ranzato et al. "Modeling natural images with gated MRFs" PAMI 2013

Deep Gated MRF

[Figure: a third layer is added in the same way, producing hidden units $h^3$.]

Ranzato et al. "Modeling natural images with gated MRFs" PAMI 2013

Sampling High-Resolution Images

[Figures, over several slides: samples drawn from a Gaussian model, from a marginal wavelet model, and from the gMRF with 1 layer and then with 3 layers.]

Sampling After Training on Face Images

[Figure: unconstrained samples; columns labeled Original, Input, 1st layer, 2nd layer, 3rd layer, 4th layer, 10 times.]

Ranzato et al. PAMI 2013

Expression Recognition Under Occlusion

Ranzato et al. PAMI 2013

Pros:
- Feature extraction is fast
- Unprecedented generation quality
- Advances models of natural images
- Trains without labeled data

Cons:
- Training is inefficient, slow, and tricky
- Sampling scales badly with dimensionality

Conclusion

If generation is not required, other feature learning methods are more efficient (e.g., sparse auto-encoders). What's the use case of generative models?

[Slide: the SHALLOW–DEEP / SUPERVISED–UNSUPERVISED map again, with SPARSE CODING highlighted.]

CONV NETS: TYPICAL ARCHITECTURE

[Figure: one stage (zoom): convolutions → non-linearity → pooling. Whole system: input image → several such stages → fully connected layers → class labels.]

CONV NETS: TYPICAL ARCHITECTURE

[Figure: one stage (zoom), compared with classical hand-crafted pipelines.]

Lazebnik et al. "...Spatial Pyramid Matching..." CVPR 2006
Sanchez et al. "Image classification with F.V.: Theory and practice" IJCV 2012
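The stage structure above is easy to state in code. Here is a minimal NumPy sketch of one stage (filter bank → non-linearity → pooling), assuming a single-channel input and 'valid' convolution; filter sizes and values are arbitrary stand-ins.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Naive 2D 'valid' convolution: the filter-bank step."""
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.empty((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max pooling."""
    H, W = fmap.shape
    H, W = H - H % size, W - W % size  # crop to a multiple of `size`
    return fmap[:H, :W].reshape(H // size, size, W // size, size).max(axis=(1, 3))

def conv_stage(image, kernels):
    """One typical stage: filter bank -> non-linearity -> pooling."""
    return [max_pool(np.tanh(conv2d_valid(image, k))) for k in kernels]

# toy usage: a random 28x28 "image" and four random 5x5 filters
rng = np.random.default_rng(0)
maps = conv_stage(rng.standard_normal((28, 28)), rng.standard_normal((4, 5, 5)))
print(maps[0].shape)  # (12, 12): 28-5+1 = 24, pooled 2x2 -> 12
```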

CONV NETS: EXAMPLES

- OCR / house number & traffic sign classification — Wan et al. "Regularization of neural networks using dropconnect" ICML 2013
- Texture classification — Sifre et al. "Rotation, scaling and deformation invariant scattering..." CVPR 2013
- Pedestrian detection — Sermanet et al. "Pedestrian detection with unsupervised multi-stage..." CVPR 2013
- Scene parsing — Farabet et al. "Learning hierarchical features for scene labeling" PAMI 2013
- Segmentation of 3D volumetric images
- Action recognition from videos
- Robotics — Sermanet et al. "Mapping and planning ... with long range perception" IROS 2008
- Denoising — Burger et al. "Can plain NNs compete with BM3D?" CVPR 2012
- Dimensionality reduction / learning embeddings — Hadsell et al. "Dimensionality reduction by learning an invariant mapping" CVPR 2006
- Image classification (object recognizer: "railcar") — Krizhevsky et al. "ImageNet Classification with deep CNNs" NIPS 2012
- Deployed in commercial systems (Google & Baidu, spring 2013)

How To Use ConvNets...(properly)


CHOOSING THE ARCHITECTURE

- Task dependent
- Cross-validation
- The more data: the more layers and the more kernels
- Look at the number of parameters at each layer
- Look at the number of flops at each layer
- Computational cost
- Be creative :)

(A sketch of the parameter/flop bookkeeping follows.)
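For the two "look at" items, the bookkeeping is simple enough to script. A minimal sketch, with my own hypothetical helper name and AlexNet-like layer sizes chosen purely for illustration:

```python
def conv_layer_cost(in_ch, out_ch, k, out_h, out_w):
    """Parameters and approximate multiply-accumulate flops of one conv
    layer: out_ch kernels of size k x k over in_ch input channels."""
    params = out_ch * (in_ch * k * k + 1)           # +1 per-kernel bias
    flops = out_ch * in_ch * k * k * out_h * out_w  # MACs over the output map
    return params, flops

for name, (params, flops) in [
    ("conv1", conv_layer_cost(3, 96, 11, 55, 55)),
    ("conv2", conv_layer_cost(96, 256, 5, 27, 27)),
]:
    print(f"{name}: {params:,} params, {flops:,} flops")
```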

HOW TO OPTIMIZE

- SGD (with momentum) usually works very well — Bottou "Stochastic Gradient Tricks" Neural Networks 2012
- Start with a large learning rate and divide by 2 until the loss does not diverge
- Decay the learning rate by a factor of ~100 or more by the end of training
- Use a non-linearity
- Initialize parameters so that each feature across layers has similar variance. Avoid units in saturation.

(A toy sketch of this recipe follows.)
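The update rule and schedule are compact enough to show. A toy sketch of SGD with classical momentum on a simple quadratic loss (a stand-in for a network); the learning-rate numbers are illustrative only:

```python
import numpy as np

def sgd_momentum_step(w, grad, vel, lr, mu=0.9):
    """One SGD-with-momentum update."""
    vel = mu * vel - lr * grad
    return w + vel, vel

# toy loss L(w) = 0.5 * ||w||^2, so grad = w
w = np.array([5.0, -3.0])
vel = np.zeros_like(w)
lr = 0.5                # chosen by starting large and halving until stable
for step in range(100):
    w, vel = sgd_momentum_step(w, w, vel, lr)
    if step == 80:      # decay the rate near the end of training
        lr /= 100
print(w)                # close to the optimum [0, 0]
```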

HOW TO IMPROVE GENERALIZATION

- Weight sharing (greatly reduces the number of parameters)
- Dropout — Hinton et al. "Improving NNs by preventing co-adaptation of feature detectors" arXiv 2012

(A dropout sketch follows.)
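As a concrete illustration of dropout, here is a minimal sketch. Note it uses the now-common "inverted" variant that rescales at training time; the 2012 paper instead halves the outgoing weights at test time.

```python
import numpy as np

def dropout(activations, p_drop=0.5, train=True, rng=np.random.default_rng()):
    """Inverted dropout: zero each unit with probability p_drop during
    training and rescale, so nothing changes at test time."""
    if not train:
        return activations
    mask = rng.random(activations.shape) >= p_drop
    return activations * mask / (1.0 - p_drop)

h = np.ones(8)
print(dropout(h))               # about half the units zeroed, the rest = 2.0
print(dropout(h, train=False))  # unchanged at test time
```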

OTHER THINGS GOOD TO KNOW

- Check gradients numerically by finite differences (a sketch follows below)
- Visualize features: feature maps need to be uncorrelated and have high variance

[Figure: hidden-unit activations plotted across samples. Good training: hidden units are sparse across samples and across features.]
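The finite-difference check compares the analytic gradient against $(L(w + \epsilon e_i) - L(w - \epsilon e_i)) / 2\epsilon$. A minimal sketch on a toy loss (any differentiable loss works the same way):

```python
import numpy as np

def numerical_gradient(loss, w, eps=1e-5):
    """Central finite differences, one coordinate at a time."""
    grad = np.zeros_like(w)
    for i in range(w.size):
        w[i] += eps; lp = loss(w)
        w[i] -= 2 * eps; lm = loss(w)
        w[i] += eps  # restore w
        grad[i] = (lp - lm) / (2 * eps)
    return grad

# check the analytic gradient of L(w) = 0.5*||w||^2, which is w itself
w = np.array([1.0, -2.0, 3.0])
loss = lambda v: 0.5 * np.sum(v ** 2)
diff = np.max(np.abs(numerical_gradient(loss, w.copy()) - w))
print(diff)  # ~1e-10 or less: the analytic gradient is correct
```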

[Figure: the same plot for a badly trained net. Bad training: many hidden units ignore the input and/or exhibit strong correlations.]

- Visualize parameters [Figure: first-layer filters labeled GOOD and BAD]
- Measure error on both the training and the validation set
- Train on a small subset of the data and check that the error → 0

WHAT IF IT DOES NOT WORK?

Training diverges:
- The learning rate may be too large → decrease the learning rate
- BPROP is buggy → check gradients numerically

Check the loss function:
- Is it appropriate for the task you want to solve?
- Does it have degenerate solutions?

The network is underperforming:
- Compute flops and number of parameters → if too small, make the net larger
- Visualize hidden units/parameters → fix the optimization

The network is too slow:
- Compute flops and number of parameters → use a GPU or a distributed framework, or make the net smaller

SUMMARY

- Deep learning = learning hierarchical representations. Leverage compositionality to gain efficiency.
- Unsupervised learning: an active research topic. Supervised learning: the most successful setup today.
- Optimization: don't we get stuck in local minima? No, they are all similar! In large-scale applications, local minima are even less of an issue.
- Scaling: GPUs, distributed frameworks (Google), better optimization techniques.
- Generalization on small datasets (curse of dimensionality): input distortions, weight decay, dropout.

THANK YOU!

Deadline: 9 Feb. 2014.


SOFTWARE

- Torch7: learning library that supports neural net training
  - http://www.torch.ch
  - http://code.cogbits.com/wiki/doku.php (tutorial with demos by C. Farabet)
- http://deeplearning.net/software/theano/ (does automatic differentiation)
- http://eblearn.sourceforge.net/
- code.google.com/p/cuda-convnet

http://www.cs.toronto.edu/~ranzato/publications/ranzato_cvpr13.pdf
