Contents
Introduction
Artificial neuron model
Feed-forward and recurrent networks
Learning
Back-propagation
General comments
Introduction (1)
A neural network is a non-linear classifier/regressor. One strategy is to build networks of units: simple parameterized functions. This was inspired by the brain, but it has been pretty clearly demonstrated that this is not what the brain does.
Introduction (2)
Feed-forward network: data flows from input to output with no cycles.
[Figure: feed-forward network with an input layer, two hidden layers, and an output layer producing the outputs.]
Introduction (3)
Sometimes the activation function $g$ for output units is the same as for hidden units.
What if we let $g$ be the identity? Then we have a linear combination of linear combinations, which remains linear.
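This is easy to verify numerically; a minimal sketch (the layer sizes and random weights are arbitrary illustrations, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 3))   # first "layer" weights
W2 = rng.standard_normal((2, 4))   # second "layer" weights
x = rng.standard_normal(3)

# Two identity-activation layers...
two_layer = W2 @ (W1 @ x)
# ...collapse to a single linear map W2 @ W1.
one_layer = (W2 @ W1) @ x
assert np.allclose(two_layer, one_layer)
```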
Could use binary threshold units: $g(a) = 1$ if $a \ge 0$, else $g(a) = 0$.
If inputs are also in $\{0, 1\}$, then you can implement any Boolean function this way; neurons act as logic gates (a small sketch follows below).
If inputs are real-valued, you can make a 3-layer network that gets arbitrarily close to any decision boundary.
The problem with threshold units is that no one knows a good way to set the weights based on data. In fact, it's NP-hard.
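To make the logic-gate view concrete, a minimal sketch with hand-picked weights (the weight values are illustrative assumptions; XOR needs a hidden layer because it is not linearly separable):

```python
import numpy as np

def threshold_unit(w, b, x):
    """Binary threshold unit: fires iff w . x + b >= 0."""
    return int(np.dot(w, x) + b >= 0)

def AND(x1, x2):  # fires only when both inputs are 1
    return threshold_unit([1, 1], -1.5, [x1, x2])

def OR(x1, x2):   # fires when at least one input is 1
    return threshold_unit([1, 1], -0.5, [x1, x2])

def XOR(x1, x2):  # two layers: XOR = AND(OR, NAND)
    h1 = OR(x1, x2)
    h2 = threshold_unit([-1, -1], 1.5, [x1, x2])  # NAND
    return AND(h1, h2)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, AND(a, b), OR(a, b), XOR(a, b))
```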
[Figure: two activation functions $g(a)$ plotted over $a \in [-30, 30]$ on a $[-1, 1]$ vertical axis; the left panel is labeled "Log-sigmoid", the right panel's label was not recovered.]
Example (1)
The linear model for regression/classification is based on linear combinations of fixed nonlinear basis functions $\phi_j(\mathbf{x})$ and takes the form
$$y(\mathbf{x}, \mathbf{w}) = f\Big(\sum_{j=1}^{M} w_j \phi_j(\mathbf{x})\Big),$$
where $f(\cdot)$ is a nonlinear activation function for classification and the identity for regression.
Example (2)
For standard regression problems, the activation function of the output unit is the identity, so that $y_k = a_k$.
For multiple binary classification problems, each output unit uses the logistic sigmoid, $y_k = \sigma(a_k)$ with $\sigma(a) = \frac{1}{1 + e^{-a}}$.
For multiclass problems, a softmax activation is used: $y_k = \frac{\exp(a_k)}{\sum_j \exp(a_j)}$.
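A minimal sketch of the three output activations (the function names are illustrative, not from the slides):

```python
import numpy as np

def identity(a):            # regression: y_k = a_k
    return a

def logistic_sigmoid(a):    # one binary decision per output unit
    return 1.0 / (1.0 + np.exp(-a))

def softmax(a):             # multiclass: normalized exponentials
    e = np.exp(a - np.max(a))   # shift for numerical stability
    return e / e.sum()

a = np.array([2.0, -1.0, 0.5])
print(identity(a), logistic_sigmoid(a), softmax(a))
```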
[Figures: the same feed-forward diagram (input layer, two hidden layers, output layer) repeated over several slides, apparently stepping through the forward pass and back-propagation; the accompanying equations were not recovered.]
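A minimal sketch of the forward pass and back-propagation for a network of this shape, assuming sigmoid hidden layers, a linear output unit, and a squared-error objective (all assumptions here, not recovered from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = [3, 5, 4, 2]                       # input, hidden 1, hidden 2, output
Ws = [rng.normal(0, 0.1, (m, n)) for n, m in zip(sizes, sizes[1:])]
bs = [np.zeros(m) for m in sizes[1:]]

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward(x):
    """Forward pass; returns the activations of every layer."""
    zs = [x]
    for W, b in zip(Ws[:-1], bs[:-1]):
        zs.append(sigmoid(W @ zs[-1] + b))
    zs.append(Ws[-1] @ zs[-1] + bs[-1])    # linear output unit
    return zs

def backward(zs, t):
    """Back-propagate squared-error gradients, output to input."""
    grads = []
    delta = zs[-1] - t                     # dE/da at the linear output
    for i in range(len(Ws) - 1, -1, -1):
        grads.append((np.outer(delta, zs[i]), delta))
        if i > 0:                          # push delta through a sigmoid layer
            delta = (Ws[i].T @ delta) * zs[i] * (1 - zs[i])
    return grads[::-1]                     # [(dW, db), ...] per layer

x, t = rng.standard_normal(3), np.array([0.5, -0.5])
for dW, db in backward(forward(x), t):
    print(dW.shape, db.shape)
```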
General comments
Pragmatics:
- Can be very sensitive to initial conditions:
  - starting at zero: the weights never move
  - big weights: saturation, so gradients vanish and the weights never move
  - so start with small random weights
- Overfitting avoidance (see the sketch below):
  - stop training early (use a validation set to decide when): keeps weights from getting too big
  - weight decay: penalize the sum of squared weights in the error function
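A minimal sketch of both tricks on a simple gradient-descent fit (the model, data, decay constant, and patience threshold are all illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((60, 5))
y = X @ np.array([1.0, -2.0, 0.0, 0.5, 3.0]) + 0.1 * rng.standard_normal(60)
Xtr, ytr, Xva, yva = X[:40], y[:40], X[40:], y[40:]

lam, lr = 1e-3, 0.01
w = 0.01 * rng.standard_normal(5)          # small random initial weights
best_w, best_err, patience = w.copy(), np.inf, 0
for epoch in range(2000):
    # gradient of squared error plus weight-decay penalty lam * ||w||^2
    grad = Xtr.T @ (Xtr @ w - ytr) / len(ytr) + 2 * lam * w
    w -= lr * grad
    val_err = np.mean((Xva @ w - yva) ** 2)
    if val_err < best_err:
        best_w, best_err, patience = w.copy(), val_err, 0
    else:
        patience += 1
        if patience >= 20:                 # early stopping on validation error
            break
print(epoch, best_err)
```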
General comments (cont.)
- Don't use 0 and 1 as targets if you have a sigmoid output unit: hitting them exactly requires the weights to go to $\pm\infty$.
- Local optima: no guarantee that any given run will converge to anything, let alone the global optimum. Start multiple times from different initial conditions; use the apparent best, or vote the outcomes.
- Learning rate: should decrease over time; there are methods for adapting the learning rate (see the sketch below). Still, back-propagation is slow. Conjugate gradient is usually better (but hairier to implement).
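A minimal sketch of a decreasing learning rate combined with multiple random restarts (the 1/t schedule, restart count, and toy objective are illustrative assumptions):

```python
import numpy as np

def train_once(grad_fn, w0, epochs=500, lr0=0.1):
    """Gradient descent with a 1/t learning-rate decay."""
    w = w0.copy()
    for t in range(1, epochs + 1):
        w -= (lr0 / t) * grad_fn(w)        # step size shrinks over time
    return w

# Multiple restarts from different small random initializations;
# keep the run with the lowest final error.
target = np.array([3.0, -1.0])
grad_fn = lambda w: 2 * (w - target)       # toy quadratic bowl
err_fn = lambda w: np.sum((w - target) ** 2)
rng = np.random.default_rng(2)
runs = [train_once(grad_fn, 0.01 * rng.standard_normal(2)) for _ in range(5)]
print(min(runs, key=err_fn))
```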
BB-RBM Structure
- Undirected bipartite graphical model
- Generative model
- Special case of a product of experts
- $\mathbf{v} \in \{0,1\}^D$: visible units
- $\mathbf{h} \in \{0,1\}^F$: hidden units
- Model parameters: weights $\mathbf{W}$ and biases $\mathbf{b}$ (visible) and $\mathbf{c}$ (hidden)
- In this case:
$$E(\mathbf{v},\mathbf{h}) = -\mathbf{v}^{\top}\mathbf{W}\mathbf{h} - \mathbf{b}^{\top}\mathbf{v} - \mathbf{c}^{\top}\mathbf{h}, \qquad p(\mathbf{v},\mathbf{h}) = \frac{1}{Z}\, e^{-E(\mathbf{v},\mathbf{h})}$$
BB-RBM Inference
Given the visible layer $\mathbf{v}$, all the hidden units are conditionally independent, and vice versa: e.g.,
$$p(h_j = 1 \mid \mathbf{v}) = \sigma\Big(c_j + \sum_i W_{ij} v_i\Big), \qquad p(v_i = 1 \mid \mathbf{h}) = \sigma\Big(b_i + \sum_j W_{ij} h_j\Big),$$
where $\sigma(\cdot)$ is the logistic sigmoid.
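A minimal sketch of one block-Gibbs sampling step using these conditionals (the dimensions and initialization are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
D, F = 6, 4                                  # visible, hidden dimensions
W = 0.01 * rng.standard_normal((D, F))
b, c = np.zeros(D), np.zeros(F)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def sample_h_given_v(v):
    p = sigmoid(c + v @ W)                   # p(h_j = 1 | v), all j at once
    return (rng.random(F) < p).astype(float), p

def sample_v_given_h(h):
    p = sigmoid(b + W @ h)                   # p(v_i = 1 | h), all i at once
    return (rng.random(D) < p).astype(float), p

v = (rng.random(D) < 0.5).astype(float)
h, _ = sample_h_given_v(v)                   # one block-Gibbs half-step
v_new, _ = sample_v_given_h(h)               # and back
print(v, h, v_new)
```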
GB-RBM Structure
- Undirected bipartite graphical model
- $\mathbf{v} \in \mathbb{R}^D$: visible units (real-valued)
- $\mathbf{h} \in \{0,1\}^F$: hidden units
- Model parameters: weights $\mathbf{W}$ and biases $\mathbf{b}$, $\mathbf{c}$
- In this case (with unit-variance visible units):
$$E(\mathbf{v},\mathbf{h}) = \frac{1}{2}\|\mathbf{v}-\mathbf{b}\|^2 - \mathbf{v}^{\top}\mathbf{W}\mathbf{h} - \mathbf{c}^{\top}\mathbf{h}$$
GB-RBM Inference
Given the visible layer $\mathbf{v}$, all the hidden units are conditionally independent, and vice versa:
$$p(h_j = 1 \mid \mathbf{v}) = \sigma\Big(c_j + \sum_i W_{ij} v_i\Big), \qquad p(v_i \mid \mathbf{h}) = \mathcal{N}\Big(v_i \;\Big|\; b_i + \sum_j W_{ij} h_j,\; 1\Big).$$
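The only change from the BB-RBM sampler above is the visible reconstruction, which is now a Gaussian draw; a minimal self-contained sketch (again with arbitrary dimensions, assuming unit variance):

```python
import numpy as np

rng = np.random.default_rng(4)
D, F = 6, 4
W = 0.01 * rng.standard_normal((D, F))
b, c = np.zeros(D), np.zeros(F)
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

v = rng.standard_normal(D)                       # real-valued visibles
p_h = sigmoid(c + v @ W)                         # Bernoulli hiddens, as before
h = (rng.random(F) < p_h).astype(float)
v_new = b + W @ h + rng.standard_normal(D)       # Gaussian draw, unit variance
print(v_new)
```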
RBM Training
Maximum likelihood learning.
Objective: maximize the log-likelihood of the given training data,
$$\mathcal{L}(\theta) = \sum_{n=1}^{N} \log p(\mathbf{v}^{(n)}), \quad \text{where} \quad p(\mathbf{v}) = \frac{1}{Z}\sum_{\mathbf{h}} e^{-E(\mathbf{v},\mathbf{h})}.$$
Summary
Introduction
Artificial neuron model
Feed-forward and recurrent networks
Learning
Back-propagation
General comments
The gradient of the log-likelihood is the difference of two expectations, e.g.,
$$\frac{\partial \log p(\mathbf{v})}{\partial W_{ij}} = \langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{model}},$$
so we need to compute both: the expectation under the data distribution (the posterior of $\mathbf{h}$ given $\mathbf{v}$, which is tractable) and the expectation under the model distribution.
[Figure: alternating Gibbs sampling between the visible and hidden layers, run until the chain reaches its equilibrium distribution (G. Hinton et al., 2006).]
If $k \to \infty$, the samples from the Markov chain converge to samples from the model distribution, and the bias goes away.
The objective is to learn the parameters of the model (RBM) so that it represents the distribution of the given data well.
[Figure: the contrastive divergence chain started at a data vector; the state after k Gibbs steps is the "reconstruction".]
The CD-k update replaces the model expectation with the chain state after k steps,
$$\Delta W_{ij} \propto \langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{k},$$
where $\langle \cdot \rangle_k$ is a sample from the k-step CD chain.
- Similar update rules can be derived for the biases $\mathbf{b}$ and $\mathbf{c}$.
- A mini-batch strategy (around 100 samples) is used to reduce the variance of the gradient estimate.
- Implemented in ~25 lines of MATLAB code.
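The MATLAB listing itself is not reproduced here; the following is a minimal Python sketch of one CD-1 mini-batch update for a BB-RBM (the sizes, learning rate, and the use of probabilities rather than samples in the statistics are assumptions):

```python
import numpy as np

rng = np.random.default_rng(5)
D, F, lr = 6, 4, 0.05
W = 0.01 * rng.standard_normal((D, F))
b, c = np.zeros(D), np.zeros(F)
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

def cd1_update(V):
    """One CD-1 update from a mini-batch V of shape (batch, D)."""
    global W, b, c
    n = len(V)
    # Positive phase: hidden probabilities given the data.
    ph = sigmoid(c + V @ W)
    h = (rng.random(ph.shape) < ph).astype(float)
    # Negative phase: one Gibbs step to get the reconstruction.
    pv = sigmoid(b + h @ W.T)
    ph_recon = sigmoid(c + pv @ W)
    # <v h>_data - <v h>_1, averaged over the mini-batch.
    W += lr * (V.T @ ph - pv.T @ ph_recon) / n
    b += lr * (V - pv).mean(axis=0)
    c += lr * (ph - ph_recon).mean(axis=0)

batch = (rng.random((100, D)) < 0.5).astype(float)   # mini-batch of ~100
cd1_update(batch)
print(W.shape)
```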
RBMs are used as building blocks for constructing deep models, yielding more powerful generative models such as Deep Belief Networks (DBNs).
Deep models can represent hierarchical features.