
1/10/2017 1

Neural Networks
for
Pattern Classification

Neural Networks for Pattern Classification

General discussion
Linear separability
Hebb nets
Perceptron

General discussion
Pattern recognition
  Patterns: images, personal records, driving habits, etc.
  Represented as a vector of features (encoded as integers or real numbers in a NN)
Pattern classification:
  Classify a pattern into one of the given classes
  Form pattern classes
Pattern associative recall:
  Use a pattern to recall a related pattern
  Pattern completion: use a partial pattern to recall the whole pattern
  Pattern recovery: deal with noise, distortion, and missing information

General architecture
Single layer: input units x1, ..., xn, with weights w1, ..., wn, feed a
single output unit Y.
net input to Y:  net = b + Σ_{i=1}^n x_i w_i
The bias b is treated as the weight from a special unit with constant
output 1.
A threshold θ related to the output of Y:
  y = f(net) =  1  if net >= θ
               -1  if net < θ
This classifies (x1, ..., xn) into one of the two classes.
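The net-input and threshold computation above can be sketched as a small function (a minimal illustration, not from the slides; the bipolar output convention and the default θ = 0 are assumptions):

```python
def classify(x, w, b, theta=0.0):
    """Single threshold unit: net = b + sum(x_i * w_i); output +1 or -1."""
    net = b + sum(xi * wi for xi, wi in zip(x, w))
    return 1 if net >= theta else -1

# With w = (1, 1), b = -1 (the AND weights used later in the slides):
print(classify((1, 1), (1, 1), -1))   # -> 1, pattern in the positive region
print(classify((-1, 1), (1, 1), -1))  # -> -1, pattern in the negative region
```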

Decision region/boundary
n = 2, b != 0, θ = 0:
  b + x1*w1 + x2*w2 = 0, i.e., x2 = -(w1/w2)*x1 - b/w2
is a line, called the decision boundary, which partitions the plane into
two decision regions.
If a point/pattern (x1, x2) is in the positive region, then
b + x1*w1 + x2*w2 > 0, and the output is 1 (belongs to class one).
Otherwise, b + x1*w1 + x2*w2 <= 0, and the output is -1 (belongs to
class two).
n = 2, b = 0, θ != 0 would result in a similar partition.

If n = 3 (three input units), then the decision boundary is a
two-dimensional plane in a three-dimensional space.
In general, the decision boundary b + Σ_{i=1}^n x_i w_i = 0 is an
(n-1)-dimensional hyperplane in an n-dimensional space, which
partitions the space into two decision regions.
This simple network thus can classify a given pattern into one of the
two classes, provided one of these two classes is entirely in one
decision region (one side of the decision boundary) and the other class
is in the other region.
The decision boundary is determined completely by the weights W and the
bias b (or threshold θ).

Linear Separability Problem
If two classes of patterns can be separated by a decision boundary,
represented by the linear equation b + Σ_{i=1}^n x_i w_i = 0,
then they are said to be linearly separable. The simple network can
correctly classify any patterns in them.
The decision boundary (i.e., W, b, or θ) of linearly separable classes
can be determined either by some learning procedure or by solving
linear equation systems based on representative patterns of each class.
If such a decision boundary does not exist, then the two classes are
said to be linearly inseparable.
Linearly inseparable problems cannot be solved by the simple network;
a more sophisticated architecture is needed.

Examples of linearly separable classes
- Logical AND function
  patterns (bipolar)      decision boundary
  x1  x2   y              w1 = 1
  -1  -1  -1              w2 = 1
  -1   1  -1              b  = -1
   1  -1  -1              θ  = 0
   1   1   1              boundary: -1 + x1 + x2 = 0
  x: class I (y = 1), o: class II (y = -1)
- Logical OR function
  patterns (bipolar)      decision boundary
  x1  x2   y              w1 = 1
  -1  -1  -1              w2 = 1
  -1   1   1              b  = 1
   1  -1   1              θ  = 0
   1   1   1              boundary: 1 + x1 + x2 = 0
  x: class I (y = 1), o: class II (y = -1)

Examples of linearly inseparable classes
- Logical XOR (exclusive OR) function
  patterns (bipolar)
  x1  x2   y
  -1  -1  -1
  -1   1   1
   1  -1   1
   1   1  -1
  x: class I (y = 1), o: class II (y = -1)
  No line can separate these two classes.

XOR can be solved by a more complex network with hidden units: x1 and
x2 feed two hidden units z1 and z2 (connection weights of magnitude 2),
and z1, z2 in turn feed the output unit Y. The hidden layer re-encodes
the inputs so that the problem becomes linearly separable at the output:
  (x1, x2)    y
  (-1, -1)   -1
  (-1,  1)    1
  ( 1, -1)    1
  ( 1,  1)   -1
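One concrete weight assignment that realizes XOR with two hidden threshold units can be sketched as follows (these particular weights are an illustration chosen to satisfy the truth table, not necessarily the exact ones in the figure):

```python
def f(net):
    """Bipolar threshold activation with theta = 0."""
    return 1 if net >= 0 else -1

def xor(x1, x2):
    # Hidden unit z1 fires only for (+1, -1); z2 fires only for (-1, +1).
    z1 = f(2 * x1 - 2 * x2 - 2)
    z2 = f(-2 * x1 + 2 * x2 - 2)
    # Output fires if either hidden unit fires (an OR of z1, z2).
    return f(2 * z1 + 2 * z2 + 2)

for x1, x2 in [(-1, -1), (-1, 1), (1, -1), (1, 1)]:
    print((x1, x2), xor(x1, x2))  # -1, 1, 1, -1 respectively
```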

Different non-linearly separable problems

Types of decision regions by network structure (the exclusive-OR,
meshed-region, and most-general-region-shape columns of the original
figure are pictorial):
Single-layer:  half plane bounded by a hyperplane
Two-layer:     convex open or closed regions
Three-layer:   arbitrary regions (complexity limited by the number of
               nodes)

Can a single neuron learn a task?

Hebb Nets
Hebb, in his influential book The Organization of Behavior (1949),
claimed:
Behavior changes are primarily due to the changes of synaptic strengths
(w_ij) between neurons i and j.
w_ij increases only when both i and j are "on": the Hebbian learning
law.
In ANN, the Hebbian law can be stated: w_ij increases only if the
outputs of both units x_i and y_j have the same sign.
In our simple network (one output y and n input units):
  Δw_i = w_i(new) - w_i(old) = x_i * y
or, w_i(new) = w_i(old) + x_i * y

Hebb net (supervised) learning algorithm (p. 49)
Step 0. Initialization: b = 0, w_i = 0, i = 1 to n
Step 1. For each training sample s:t do Steps 2-4
        /* s is the input pattern, t the target output of the sample */
Step 2.   x_i := s_i, i = 1 to n    /* set s to input units */
Step 3.   y := t                    /* set y to the target */
Step 4.   w_i := w_i + x_i * y, i = 1 to n  /* update weights */
          b := b + y               /* update bias (bias unit input is 1) */
Notes: 1) the learning rate is 1; 2) each training sample is used only
once.

Example: AND function with binary units (1, 0); the third input
component is the constant bias unit:
  (x1, x2, 1)   y=t   w1  w2  b
  (1, 1, 1)      1     1   1  1
  (1, 0, 1)      0     1   1  1
  (0, 1, 1)      0     1   1  1
  (0, 0, 1)      0     1   1  1
An incorrect boundary, 1 + x1 + x2 = 0, is learned after using each
sample once.

With bipolar units (1, -1):
  (x1, x2, 1)   y=t   w1  w2   b
  (1, 1, 1)      1     1   1   1
  (1, -1, 1)    -1     0   2   0
  (-1, 1, 1)    -1     1   1  -1
  (-1, -1, 1)   -1     2   2  -2
A correct boundary, -1 + x1 + x2 = 0, is successfully learned.
It will fail to learn x1 ^ x2 ^ x3 (the AND of three inputs), even
though the function is linearly separable.
Stronger learning methods are needed:
  Error-driven: for each sample s:t, compute y from s based on the
  current W and b, then compare y and t.
  Use training samples repeatedly, and each time change the weights
  only slightly (learning rate << 1).
The learning methods of the Perceptron and Adaline are good examples.
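The bipolar run above can be reproduced directly (a short sketch of the algorithm as stated, with learning rate 1 and a single pass over the samples):

```python
def hebb_train(samples):
    """samples: list of ((x1, x2), t) pairs; returns (w, b) after one pass."""
    w, b = [0, 0], 0
    for x, t in samples:
        w = [wi + xi * t for wi, xi in zip(w, x)]
        b += t  # the bias unit has constant input 1
    return w, b

AND = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
print(hebb_train(AND))  # -> ([2, 2], -2), i.e. the boundary -2 + 2*x1 + 2*x2 = 0
```

Note that -2 + 2*x1 + 2*x2 = 0 is the same line as -1 + x1 + x2 = 0.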

The Perceptron
In 1958, Frank Rosenblatt introduced a training algorithm that provided
the first procedure for training a simple ANN: a perceptron.
The operation of Rosenblatt's perceptron is based on the McCulloch and
Pitts neuron model. The model consists of a linear combiner followed by
a hard limiter.
The weighted sum of the inputs is applied to the hard limiter, which
produces an output equal to +1 if its input is positive and -1 if it is
negative.

Single-layer two-input perceptron: inputs x1 and x2 (weights w1, w2)
feed a linear combiner, whose output, minus the threshold θ, passes
through a hard limiter to produce the output Y.

The aim of the perceptron is to classify inputs x1, x2, ..., xn into
one of two classes, say A1 and A2.
In the case of an elementary perceptron, the n-dimensional space is
divided by a hyperplane into two decision regions. The hyperplane is
defined by the linearly separable function:
  Σ_{i=1}^n x_i w_i - θ = 0

Linear separability in the perceptron:
(a) Two-input perceptron: the boundary x1*w1 + x2*w2 - θ = 0 is a line
separating Class A1 from Class A2 in the (x1, x2) plane.
(b) Three-input perceptron: the boundary x1*w1 + x2*w2 + x3*w3 - θ = 0
is a plane in (x1, x2, x3) space.

How does the perceptron learn its classification tasks?
This is done by making small adjustments in the weights to reduce the
difference between the actual and desired outputs of the perceptron.
The initial weights are randomly assigned, usually in the range
[-0.5, 0.5], and then updated to obtain an output consistent with the
training examples.

If at iteration p the actual output is Y(p) and the desired output is
Yd(p), then the error is given by:
  e(p) = Yd(p) - Y(p),  where p = 1, 2, 3, . . .
Iteration p here refers to the pth training example presented to the
perceptron.
If the error e(p) is positive, we need to increase the perceptron
output Y(p); if it is negative, we need to decrease Y(p).

The perceptron learning rule:
  w_i(p+1) = w_i(p) + α * x_i(p) * e(p),  where p = 1, 2, 3, . . .
α is the learning rate, a positive constant less than unity.
The perceptron learning rule was first proposed by Rosenblatt in 1960.
Using this rule we can derive the perceptron training algorithm for
classification tasks.

Perceptron's training algorithm
Step 1: Initialization
Set initial weights w1, w2, ..., wn and threshold θ to random numbers
in the range [-0.5, 0.5].
Step 2: Activation
Activate the perceptron by applying inputs x1(p), x2(p), ..., xn(p)
and desired output Yd(p). Calculate the actual output at iteration
p = 1:
  Y(p) = step[ Σ_{i=1}^n x_i(p) * w_i(p) - θ ]

Perceptron's training algorithm (continued)
where n is the number of the perceptron inputs, and step is a step
activation function.
Step 3: Weight training
Update the weights of the perceptron:
  w_i(p+1) = w_i(p) + Δw_i(p)
where Δw_i(p) is the weight correction at iteration p.
The weight correction is computed by the delta rule:
  Δw_i(p) = α * x_i(p) * e(p)

Perceptron's training algorithm (continued)
Step 4: Iteration
Increase iteration p by one, go back to Step 2, and repeat the process
until convergence.

Example of perceptron learning: the logical operation AND
Threshold: θ = 0.2; learning rate: α = 0.1

        Inputs   Desired  Initial     Actual         Final
Epoch   x1  x2   output   weights     output  Error  weights
                 Yd       w1    w2    Y       e      w1    w2
  1     0   0    0        0.3  -0.1   0        0     0.3  -0.1
        0   1    0        0.3  -0.1   0        0     0.3  -0.1
        1   0    0        0.3  -0.1   1       -1     0.2  -0.1
        1   1    1        0.2  -0.1   0        1     0.3   0.0
  2     0   0    0        0.3   0.0   0        0     0.3   0.0
        0   1    0        0.3   0.0   0        0     0.3   0.0
        1   0    0        0.3   0.0   1       -1     0.2   0.0
        1   1    1        0.2   0.0   1        0     0.2   0.0
  3     0   0    0        0.2   0.0   0        0     0.2   0.0
        0   1    0        0.2   0.0   0        0     0.2   0.0
        1   0    0        0.2   0.0   1       -1     0.1   0.0
        1   1    1        0.1   0.0   0        1     0.2   0.1
  4     0   0    0        0.2   0.1   0        0     0.2   0.1
        0   1    0        0.2   0.1   0        0     0.2   0.1
        1   0    0        0.2   0.1   1       -1     0.1   0.1
        1   1    1        0.1   0.1   1        0     0.1   0.1
  5     0   0    0        0.1   0.1   0        0     0.1   0.1
        0   1    0        0.1   0.1   0        0     0.1   0.1
        1   0    0        0.1   0.1   0        0     0.1   0.1
        1   1    1        0.1   0.1   1        0     0.1   0.1
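The run in the table can be reproduced with a short script (a sketch: weights are kept in tenths to avoid floating-point drift, and the hard limiter is taken as Y = 1 when the net input reaches the threshold):

```python
def train_and():
    # Work in tenths of the table's units: w = (0.3, -0.1) -> [3, -1],
    # theta = 0.2 -> 2, alpha = 0.1 -> 1.
    w, theta, alpha = [3, -1], 2, 1
    data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
    for epoch in range(50):
        errors = 0
        for x, yd in data:
            y = 1 if sum(xi * wi for xi, wi in zip(x, w)) - theta >= 0 else 0
            e = yd - y
            errors += abs(e)
            w = [wi + alpha * xi * e for wi, xi in zip(w, x)]
        if errors == 0:  # an error-free epoch means convergence
            break
    return [wi / 10 for wi in w]

print(train_and())  # -> [0.1, 0.1], the weights in the table's final epoch
```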

Two-dimensional plots of basic logical operations:
(a) AND (x1 ∧ x2), (b) OR (x1 ∨ x2), and (c) Exclusive-OR (x1 ⊕ x2),
each plotted over the unit square with x1, x2 ∈ {0, 1}.

A perceptron can learn the operations AND and OR, but not
Exclusive-OR.

Multilayer neural networks

A multilayer perceptron is a feedforward neural network with


one or more hidden layers.
The network consists of an input layer of source neurons, at
least one middle or hidden layer of computational neurons, and
an output layer of computational neurons.
The input signals are propagated in a forward direction on a
layer-by-layer basis.

Multilayer perceptron with two hidden layers: input signals flow from
the input layer through the first and second hidden layers to the
output layer, which produces the output signals.

What does the middle layer hide?


A hidden layer hides its desired output. Neurons in the hidden
layer cannot be observed through the input/output behavior of the
network. There is no obvious way to know what the desired
output of the hidden layer should be.
Commercial ANNs incorporate three and sometimes four layers,
including one or two hidden layers. Each layer can contain from
10 to 1000 neurons. Experimental neural networks may have five
or even six layers, including three or four hidden layers, and
utilize millions of neurons.

Back-propagation neural network


Learning in a multilayer network proceeds the same way as for a
perceptron.
A training set of input patterns is presented to the network.
The network computes its output pattern, and if there is an error - or
in other words a difference between actual and desired output
patterns - the weights are adjusted to reduce this error.
In a back-propagation neural network, the learning algorithm has
two phases.
First, a training input pattern is presented to the network input layer.
The network propagates the input pattern from layer to layer until
the output pattern is generated by the output layer.

If this pattern is different from the desired output, an error is


calculated and then propagated backwards through the network
from the output layer to the input layer. The weights are
modified as the error is propagated.

Three-layer back-propagation neural network: input signals x1, ..., xn
enter the input layer; weights w_ij connect input neuron i to hidden
neuron j, and weights w_jk connect hidden neuron j to output neuron k,
producing outputs y1, ..., yl. Error signals propagate in the reverse
direction, from the output layer back toward the input layer.

The back-propagation training algorithm
Step 1: Initialization
Set all the weights and threshold levels of the network to random
numbers uniformly distributed inside a small range:
  ( -2.4 / F_i , +2.4 / F_i )
where F_i is the total number of inputs of neuron i in the network.
The weight initialization is done on a neuron-by-neuron basis.

Step 2: Activation
Activate the back-propagation neural network by applying inputs x1(p),
x2(p), ..., xn(p) and desired outputs yd,1(p), yd,2(p), ..., yd,n(p).
(a) Calculate the actual outputs of the neurons in the hidden layer:
  y_j(p) = sigmoid[ Σ_{i=1}^n x_i(p) * w_ij(p) - θ_j ]
where n is the number of inputs of neuron j in the hidden layer, and
sigmoid is the sigmoid activation function.

Step 2: Activation (continued)
(b) Calculate the actual outputs of the neurons in the output layer:
  y_k(p) = sigmoid[ Σ_{j=1}^m x_jk(p) * w_jk(p) - θ_k ]
where m is the number of inputs of neuron k in the output layer.

Step 3: Weight training
Update the weights in the back-propagation network, propagating
backward the errors associated with the output neurons.
(a) Calculate the error gradient for the neurons in the output layer:
  δ_k(p) = y_k(p) * [1 - y_k(p)] * e_k(p)
where e_k(p) = y_d,k(p) - y_k(p)
Calculate the weight corrections:
  Δw_jk(p) = α * y_j(p) * δ_k(p)
Update the weights at the output neurons:
  w_jk(p+1) = w_jk(p) + Δw_jk(p)

Step 3: Weight training (continued)
(b) Calculate the error gradient for the neurons in the hidden layer:
  δ_j(p) = y_j(p) * [1 - y_j(p)] * Σ_{k=1}^l δ_k(p) * w_jk(p)
Calculate the weight corrections:
  Δw_ij(p) = α * x_i(p) * δ_j(p)
Update the weights at the hidden neurons:
  w_ij(p+1) = w_ij(p) + Δw_ij(p)

Step 4: Iteration
Increase iteration p by one, go back to Step 2, and repeat the process
until the selected error criterion is satisfied.

As an example, we may consider the three-layer back-propagation
network. Suppose that the network is required to perform the logical
operation Exclusive-OR. Recall that a single-layer perceptron could not
do this operation. Now we will apply the three-layer net.

Three-layer network for solving the Exclusive-OR operation: inputs x1
(neuron 1) and x2 (neuron 2) feed hidden neurons 3 and 4 through
weights w13, w14, w23, w24; hidden neurons 3 and 4 feed output neuron 5
(output y5) through weights w35 and w45. Each hidden and output neuron
also receives a threshold input from a unit fixed at -1.

The effect of the threshold applied to a neuron in the hidden or output
layer is represented by its weight, θ, connected to a fixed input equal
to -1.
The initial weights and threshold levels are set randomly as follows:
w13 = 0.5, w14 = 0.9, w23 = 0.4, w24 = 1.0, w35 = -1.2, w45 = 1.1,
θ3 = 0.8, θ4 = -0.1, and θ5 = 0.3.

We consider a training example where inputs x1 and x2 are equal to 1
and the desired output yd,5 is 0. The actual outputs of neurons 3 and 4
in the hidden layer are calculated as:
  y3 = sigmoid(x1*w13 + x2*w23 - θ3)
     = 1 / (1 + e^-(1*0.5 + 1*0.4 - 1*0.8)) = 0.5250
  y4 = sigmoid(x1*w14 + x2*w24 - θ4)
     = 1 / (1 + e^-(1*0.9 + 1*1.0 + 1*0.1)) = 0.8808
Now the actual output of neuron 5 in the output layer is determined as:
  y5 = sigmoid(y3*w35 + y4*w45 - θ5)
     = 1 / (1 + e^-(-0.5250*1.2 + 0.8808*1.1 - 1*0.3)) = 0.5097
Thus, the following error is obtained:
  e = yd,5 - y5 = 0 - 0.5097 = -0.5097

The next step is weight training. To update the weights and threshold
levels in our network, we propagate the error, e, from the output layer
backward to the input layer.
First, we calculate the error gradient for neuron 5 in the output
layer:
  δ5 = y5 * (1 - y5) * e = 0.5097 * (1 - 0.5097) * (-0.5097) = -0.1274
Then we determine the weight corrections, assuming that the learning
rate parameter, α, is equal to 0.1:
  Δw35 = α * y3 * δ5 = 0.1 * 0.5250 * (-0.1274) = -0.0067
  Δw45 = α * y4 * δ5 = 0.1 * 0.8808 * (-0.1274) = -0.0112
  Δθ5  = α * (-1) * δ5 = 0.1 * (-1) * (-0.1274) = 0.0127

Next we calculate the error gradients for neurons 3 and 4 in the hidden
layer:
  δ3 = y3 * (1 - y3) * δ5 * w35
     = 0.5250 * (1 - 0.5250) * (-0.1274) * (-1.2) = 0.0381
  δ4 = y4 * (1 - y4) * δ5 * w45
     = 0.8808 * (1 - 0.8808) * (-0.1274) * 1.1 = -0.0147
We then determine the weight corrections:
  Δw13 = α * x1 * δ3 = 0.1 * 1 * 0.0381 = 0.0038
  Δw23 = α * x2 * δ3 = 0.1 * 1 * 0.0381 = 0.0038
  Δθ3  = α * (-1) * δ3 = 0.1 * (-1) * 0.0381 = -0.0038
  Δw14 = α * x1 * δ4 = 0.1 * 1 * (-0.0147) = -0.0015
  Δw24 = α * x2 * δ4 = 0.1 * 1 * (-0.0147) = -0.0015
  Δθ4  = α * (-1) * δ4 = 0.1 * (-1) * (-0.0147) = 0.0015

At last, we update all weights and thresholds:
  w13 = w13 + Δw13 = 0.5 + 0.0038 = 0.5038
  w14 = w14 + Δw14 = 0.9 - 0.0015 = 0.8985
  w23 = w23 + Δw23 = 0.4 + 0.0038 = 0.4038
  w24 = w24 + Δw24 = 1.0 - 0.0015 = 0.9985
  w35 = w35 + Δw35 = -1.2 - 0.0067 = -1.2067
  w45 = w45 + Δw45 = 1.1 - 0.0112 = 1.0888
  θ3  = θ3 + Δθ3 = 0.8 - 0.0038 = 0.7962
  θ4  = θ4 + Δθ4 = -0.1 + 0.0015 = -0.0985
  θ5  = θ5 + Δθ5 = 0.3 + 0.0127 = 0.3127
The training process is repeated until the sum of squared errors is
less than 0.001.
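The forward and backward passes of this worked example can be checked with a few lines of code (a direct transcription of the formulas; t3, t4, t5 stand for the thresholds θ3, θ4, θ5):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Initial weights and thresholds from the example
w13, w14, w23, w24, w35, w45 = 0.5, 0.9, 0.4, 1.0, -1.2, 1.1
t3, t4, t5 = 0.8, -0.1, 0.3
x1, x2, yd5 = 1, 1, 0

# Forward pass
y3 = sigmoid(x1 * w13 + x2 * w23 - t3)
y4 = sigmoid(x1 * w14 + x2 * w24 - t4)
y5 = sigmoid(y3 * w35 + y4 * w45 - t5)
e = yd5 - y5

# Backward pass: output gradient, then hidden gradients
d5 = y5 * (1 - y5) * e
d3 = y3 * (1 - y3) * d5 * w35
d4 = y4 * (1 - y4) * d5 * w45

print(round(y3, 4), round(y4, 4), round(y5, 4))  # 0.525 0.8808 0.5097
print(round(d5, 4), round(d3, 4), round(d4, 4))  # -0.1274 0.0381 -0.0147
```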

Learning curve for operation Exclusive-OR: the sum-squared network
error, plotted on a logarithmic scale from 10^1 down to 10^-4, falls
below the 10^-3 criterion after 224 epochs.

Final results of three-layer network learning:
  Inputs    Desired output   Actual output
  x1  x2    yd               y5
  1   1     0                0.0155
  0   1     1                0.9849
  1   0     1                0.9849
  0   0     0                0.0175
  Sum of squared errors: 0.0010

Network represented by the McCulloch-Pitts model for solving the
Exclusive-OR operation: all connection weights from the inputs to
hidden neurons 3 and 4, and from neuron 4 to output neuron 5, are +1.0;
the weight from neuron 3 to neuron 5 is -2.0. The thresholds (weights
from a fixed input of -1) are +1.5 for neuron 3 and +0.5 for neurons 4
and 5.

Decision boundaries:
(a) Decision boundary constructed by hidden neuron 3: x1 + x2 - 1.5 = 0
(b) Decision boundary constructed by hidden neuron 4: x1 + x2 - 0.5 = 0
(c) Decision boundaries constructed by the complete three-layer
network: the region between the two lines is classified as one class,
the rest of the plane as the other.

Pattern Association
and
Associative Memory

Neural networks were designed on analogy with the brain. The brain's
memory, however, works by association.
For example, we can recognize a familiar face even in an unfamiliar
environment within 100-200 ms. We can also recall a complete sensory
experience, including sounds and scenes, when we hear only a few bars
of music. The brain routinely associates one thing with another.
Multilayer neural networks trained with the back-propagation algorithm
are used for pattern recognition problems. However, to emulate the
human memory's associative characteristics we need a different type of
network: a recurrent neural network.
A recurrent neural network has feedback loops from its outputs to its
inputs. The presence of such loops has a profound impact on the
learning capability of the network.

Associative-Memory Networks
Input: a pattern (often noisy/corrupted)
Output: the corresponding stored pattern (complete / relatively
noise-free)
Process:
1. Load the input pattern onto a core group of highly-interconnected
   neurons.
2. Run the core neurons until they reach a steady state.
3. Read the output off the states of the core neurons.
Example: input (1 0 1 -1 -1) retrieves output (1 -1 1 -1 -1).

Associative Network Types
1. Auto-associative (X = Y): recognize noisy versions of a stored
   pattern.
2. Hetero-associative bidirectional (X ≠ Y): BAM = Bidirectional
   Associative Memory; iterative correction of both input and output.
3. Hetero-associative input correcting (X ≠ Y): the input clique is
   auto-associative, so it repairs input patterns.
4. Hetero-associative output correcting (X ≠ Y): the output clique is
   auto-associative, so it repairs output patterns.

Hebb's Rule: Connection Weights ~ Correlations
"When one cell repeatedly assists in firing another, the axon of the
first cell develops synaptic knobs (or enlarges them if they already
exist) in contact with the soma of the second cell." (Hebb, 1949)
In an associative neural net, if we compare two pattern components
(e.g. pixels) within many patterns and find that they are frequently
in:
a) the same state, then the arc weight between their NN nodes should be
   positive;
b) different states, then it should be negative.
Matrix Memory:
The weights must store the average correlations between all pattern
components across all patterns. A net presented with a partial pattern
can then use the correlations to recreate the entire pattern.

Correlated Field Components
Each component is a small portion of the pattern field (e.g. a pixel).
In the associative neural network, each node represents one field
component.
For every pair of components a and b, their values are compared in each
of several patterns, and the weight w_ab on the arc between their NN
nodes is set to approximately the average correlation.

Quantifying Hebb's Rule
Compare two nodes to calculate a weight change that reflects the state
correlation (i = input component, o = output component):
  Auto-association:   Δw_jk = i_pk * i_pj
  Hetero-association: Δw_jk = i_pk * o_pj
When the two components are the same (different), increase (decrease)
the weight.
Ideally, the weights will record the average correlations across all
patterns:
  Auto:   w_jk = Σ_{p=1}^P i_pk * i_pj
  Hetero: w_jk = Σ_{p=1}^P i_pk * o_pj
Hebbian Principle: if all the input patterns are known prior to
retrieval time, then initialize the weights as:
  Auto:   w_jk = (1/P) Σ_{p=1}^P i_pk * i_pj
  Hetero: w_jk = (1/P) Σ_{p=1}^P i_pk * o_pj
Weights = average correlations.

Matrix Representation
Let X = the matrix of input patterns, where each ROW is a pattern, so
x_k,i = the ith bit of the kth pattern.
Let Y = the matrix of output patterns, where each ROW is a pattern, so
y_k,j = the jth bit of the kth pattern.
Then the average correlation between input bit i and output bit j
across all patterns is:
  w_i,j = (1/P) (x_1,i * y_1,j + x_2,i * y_2,j + ... + x_P,i * y_P,j)
To calculate all weights at once:
  Hetero-associative: W = X^T Y
  Auto-associative:   W = X^T X
(each entry of X^T Y is the dot product of a column of X, i.e. bit i
across all input patterns, with a column of Y, i.e. bit j across all
output patterns).
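The W = X^T Y construction and a recall step can be sketched in a few lines of pure Python (the 1/P scaling is deferred to the thresholding step, as the later Hopfield example also does; the sample pattern is invented for illustration):

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def sign(v):
    return [1 if x >= 0 else -1 for x in v]

# One stored hetero-associative pair (rows are patterns)
X = [[1, -1, 1]]
Y = [[1, -1]]
W = matmul(transpose(X), Y)   # W = X^T Y

# Recall from a noisy version of the stored input
noisy = [1, 1, 1]
print(sign(matmul([noisy], W)[0]))  # -> [1, -1], the stored output
```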

Auto-Associative Memory
1. Auto-associative patterns to remember: small grids of components
   over nodes 1-4 (in the original figure, dark cells are +1 or -1 and
   light cells are 0).
2. Distributed storage of all patterns: one node per pattern unit,
   fully connected (a clique), with each weight equal to the average
   correlation of the corresponding units across all patterns.
3. Retrieval: present a partial or noisy pattern and let the network
   settle to the stored pattern.

Hetero-Associative Memory
1. Hetero-associative patterns (pairs) to remember: each input pattern
   over nodes 1-3 is paired with an output pattern over nodes a-b.
2. Distributed storage of all patterns: one node per pattern unit for
   both X and Y, full inter-layer connection, with each weight equal to
   the average correlation of the corresponding units across all
   patterns.
3. Retrieval: present an input pattern and read off the associated
   output pattern.

The Hopfield Network
&
Bidirectional Associative Memory

The Hopfield Network
John Hopfield formulated the physical principle of storing information
in a dynamically stable network.
Auto-association network:
  Fully connected (a clique) with symmetric weights
  State of a node = f(inputs)
  Weight values based on the Hebbian principle
Performance: it must iterate a bit to converge on a pattern, but
generally requires much less computation than back-propagation
networks.

Hopfield Networks: a corrupted input pattern converges, after many
iterations, to the stored output pattern.

The Hopfield network uses McCulloch and Pitts neurons with the sign
activation function as its computing element:

             +1, if X > 0
  Y^sign  =  -1, if X < 0
              Y, if X = 0   (the output is unchanged)

The current state of the Hopfield network is determined by the current
outputs of all neurons, y1, y2, . . ., yn.
Thus, for a single-layer n-neuron network, the state can be defined by
the state vector Y = [y1, y2, ..., yn]^T.

The Hopfield Network: a single layer of n neurons, each feeding its
output back as input to all the other neurons.

Retrieval Algorithm
The output update rule for Hopfield auto-associative memory can be
expressed in the form
  v_i^(k+1) = sgn( w_i^T v^(k) )
where k is the index of recursion and i is the number of the neuron
currently undergoing an update.
The asynchronous update sequence considered here is random. Assuming
that the recursion starts at v^0 and a random sequence of updating
neurons m, p, q, ... is chosen, each update yields an output vector
that differs from the previous one in at most the single updated entry.

Storage Algorithm
In the Hopfield network, synaptic weights between neurons are usually
represented in matrix form.
Assume that the bipolar binary prototype vectors that need to be stored
are s(m), for m = 1, 2, . . ., p. The storage algorithm for calculating
the weight matrix is
  W = Σ_{m=1}^p s(m) s(m)^t - p I
where p is the number of states to be memorized by the network, I is
the n*n identity matrix, and superscript t denotes matrix
transposition.

Possible states for the three-neuron Hopfield network: the eight
vertices (±1, ±1, ±1) of a cube in (y1, y2, y3) space.

Hopfield Network
The stable state-vertex is determined by the weight matrix W, the
current input vector X, and the threshold matrix θ. If the input vector
is partially incorrect or incomplete, the initial state will converge
to the stable state-vertex after a few iterations.

Hopfield Network Example: Suppose, for instance, that our network is
required to memorize two opposite states, (1, 1, 1) and (-1, -1, -1).
Thus,
  Y1 = [1, 1, 1]^T,  Y2 = [-1, -1, -1]^T
or
  Y1^T = [1 1 1],  Y2^T = [-1 -1 -1]
where Y1 and Y2 are three-dimensional vectors.

The 3x3 identity matrix I is

      [1 0 0]
  I = [0 1 0]
      [0 0 1]

Thus, we can now determine the weight matrix as follows:

                                  [0 2 2]
  W = Y1 Y1^T + Y2 Y2^T - 2 I  =  [2 0 2]
                                  [2 2 0]

Next, the network is tested by the sequence of input vectors, X1 and
X2, which are equal to the output (or target) vectors Y1 and Y2,
respectively.

First, we activate the Hopfield network by applying the input vector X.
Then we calculate the actual output vector Y, and finally we compare
the result with the initial input vector X.

             [0 2 2] [ 1]   [0]     [ 1]
  Y1 = sign( [2 0 2] [ 1] - [0] ) = [ 1]
             [2 2 0] [ 1]   [0]     [ 1]

             [0 2 2] [-1]   [0]     [-1]
  Y2 = sign( [2 0 2] [-1] - [0] ) = [-1]
             [2 2 0] [-1]   [0]     [-1]

Both states, (1, 1, 1) and (-1, -1, -1), are therefore stable.
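The three-neuron example can be checked directly (a minimal sketch; per the sign rule above, a zero net input keeps the neuron's previous output):

```python
W = [[0, 2, 2],
     [2, 0, 2],
     [2, 2, 0]]

def update(state):
    """One synchronous pass of the sign rule; ties keep the old output."""
    out = []
    for i, row in enumerate(W):
        net = sum(w * s for w, s in zip(row, state))
        out.append(1 if net > 0 else -1 if net < 0 else state[i])
    return out

print(update([1, 1, 1]))    # -> [1, 1, 1]    (stable fundamental memory)
print(update([-1, 1, 1]))   # -> [1, 1, 1]    (single-bit error corrected)
print(update([1, -1, -1]))  # -> [-1, -1, -1] (attracted to the other memory)
```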

The remaining six states are all unstable. However, stable states
(also called fundamental memories) are capable of attracting
states that are close to them.
The fundamental memory (1, 1, 1) attracts unstable states (-1, 1, 1),
(1, -1, 1) and (1, 1, -1). Each of these unstable states represents a
single error, compared to the fundamental memory (1, 1, 1).
The fundamental memory (-1, -1, -1) attracts unstable states (-1, -1,
1), (-1, 1, -1) and (1, -1, -1).
Thus, the Hopfield network can act as an error correction
network.

Hopfield Network Example
1. Patterns to remember: p1, p2, p3, each a 2x2 grid over nodes 1-4.
3. Build the network: nodes 1-4, fully connected, with each weight set
   to the average correlation of the corresponding units (values of
   1/3 and -1/3 in the figure).
4. Enter a test pattern (node values +1, 0, or -1) and iterate.

Hopfield Network Example (2)
5. Synchronous iteration (update all nodes at once)
Goal: set weights such that an input vector Vi yields itself when
multiplied by the weights W.
X = [V1; V2; ...; Vp], where p = # input vectors (i.e., patterns).
So Y = X, and the Hebbian weight calculation is: W = X^T Y = X^T X

      [ 1  1  1 -1]          [ 1  1 -1]
  X = [ 1  1 -1  1]    X^T = [ 1  1  1]
      [-1  1  1 -1]          [ 1 -1  1]
                             [-1  1 -1]

          [ 3  1 -1  1]
  X^T X = [ 1  3  1 -1]
          [-1  1  3 -3]
          [ 1 -1 -3  3]

Matrices (2)
The upper and lower triangles of the product matrix represent the 6
weights w_i,j = w_j,i.
Scale the weights by dividing by p (i.e., averaging). Picton (ANN book)
subtracts p from each. Either method is fine, as long as we apply the
appropriate thresholds to the output values.
This produces the same weights as in the non-matrix description.
Testing with input (1 0 0 -1):

               [ 3  1 -1  1]
  (1 0 0 -1)   [ 1  3  1 -1]   =  (2 2 2 -2)
               [-1  1  3 -3]
               [ 1 -1 -3  3]

Scaling* by p = 3 and using 0 as a threshold gives:
  (2/3  2/3  2/3  -2/3)  =>  (1 1 1 -1)
*For illustrative purposes, it's easier to scale by p at the end
instead of scaling the entire weight matrix, W, prior to testing.
Asynchronous Iteration (one randomly-chosen node at a time)

Hopfield Network Example (4)
4c. Enter another test pattern. Asynchronous updating is central to
Hopfield's (1982) original model.
5c. Asynchronous iteration (one randomly-chosen node at a time):
updating nodes 3, 4, and then 2 settles the network into a state that
is stable but spurious (not one of the stored patterns).

Hopfield Network Example (5)
4d. Enter another test pattern.
5d. Asynchronous iteration: updating nodes 3, 4, and then 2 settles the
network into the stable stored pattern p3.

Hopfield Network Example (6)
4e. Enter the same test pattern as before.
5e. Asynchronous iteration, but in a different order: updating node 2
first, then node 3 or 4 (no change), settles the network into a
different state that is stable but spurious. The order of asynchronous
updates can thus determine which attractor is reached.

Bidirectional Associative Memory (BAM)

The Hopfield network represents an autoassociative type of


memory - it can retrieve a corrupted or incomplete memory but
cannot associate this memory with another different memory.
Human memory is essentially associative. One thing may remind
us of another, and that of another, and so on. We use a chain of
mental associations to recover a lost memory. If we forget where
we left an umbrella, we try to recall where we last had it, what
we were doing, and who we were talking to. We attempt to
establish a chain of associations, and thereby to restore a lost
memory.

To associate one memory with another, we need a recurrent neural


network capable of accepting an input pattern on one set of
neurons and producing a related, but different, output pattern on
another set of neurons.
Bidirectional Associative Memory (BAM), first proposed by
Bart Kosko, is a heteroassociative network. It associates patterns
from one set, set A, to patterns from another set, set B, and vice
versa. Like a Hopfield network, the BAM can generalize and also
produce correct outputs despite corrupted or incomplete inputs.

BAM operation: an n-neuron input layer is fully connected to an
m-neuron output layer. (a) In the forward direction, input vector X(p)
produces output vector Y(p); (b) in the backward direction, Y(p) is fed
back through the same weights to produce X(p+1).

The basic idea behind the BAM is to store pattern pairs so that when
the n-dimensional vector X from set A is presented as input, the BAM
recalls the m-dimensional vector Y from set B, but when Y is presented
as input, the BAM recalls X.
To develop the BAM, we need to create a correlation matrix for each
pattern pair we want to store. The correlation matrix is the matrix
product of the input vector X and the transpose of the output vector,
Y^T. The BAM weight matrix is the sum of all correlation matrices, that
is,
  W = Σ_{m=1}^M X_m Y_m^T
where M is the number of pattern pairs to be stored in the BAM.
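A minimal BAM sketch with two illustrative pattern pairs (the pairs are invented for the example and chosen orthogonal for clean recall; recall is sign(X W) in the forward direction and sign(Y W^T) in the backward direction):

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def sign(v):
    return [1 if x >= 0 else -1 for x in v]

# Two pattern pairs (X from set A, Y from set B)
pairs = [([1, 1, -1, -1], [1, 1]),
         ([1, -1, 1, -1], [1, -1])]

# W = sum over pairs of X_m^T Y_m (here X_m, Y_m are row vectors)
n, m = len(pairs[0][0]), len(pairs[0][1])
W = [[sum(x[i] * y[j] for x, y in pairs) for j in range(m)] for i in range(n)]
WT = [list(col) for col in zip(*W)]

x1, y1 = pairs[0]
print(sign(matmul([x1], W)[0]))   # forward:  -> [1, 1], recalls y1
print(sign(matmul([y1], WT)[0]))  # backward: -> [1, 1, -1, -1], recalls x1
```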

Setting the Weights:
The weight matrix to store a set of input and target vector pairs
s(p):t(p), p = 1, ..., P, can be determined by the Hebb rule. The
formulas for the entries depend on whether the training vectors are
binary or bipolar.
For binary input vectors:
  w_ij = Σ_{p=1}^P (2 s_i(p) - 1) (2 t_j(p) - 1)
For bipolar input vectors:
  w_ij = Σ_{p=1}^P s_i(p) t_j(p)

Discrete BAM activation functions: the output of each unit is +1 if its
net input is positive, -1 if it is negative, and unchanged if it is
zero.

Algorithms for Discrete BAM

Application:
Example: a BAM net to associate letters with simple bipolar codes.
