
1/10/2017 1

Neural Networks
for
Pattern Classification

Neural Networks for Pattern Classification

General discussion
Linear separability
Hebb nets
Perceptron

General discussion
Pattern recognition
  Patterns: images, personal records, driving habits, etc.
  Represented as a vector of features (encoded as integers or real numbers in a NN)
Pattern classification:
  Classify a pattern into one of the given classes
  Form pattern classes
Pattern associative recall:
  Use a pattern to recall a related pattern
  Pattern completion: use a partial pattern to recall the whole pattern
  Pattern recovery: deal with noise, distortion, and missing information

General architecture
Single layer: input units x1, ..., xn, with weights w1, ..., wn, feed a
single output unit Y.
net input to Y:  net = b + Σ_{i=1}^n x_i w_i
The bias b is treated as the weight from a special unit with constant
output 1.
A threshold θ related to the output of Y:
  y = f(net) =  1  if net >= θ
               -1  if net < θ
This classifies (x1, ..., xn) into one of the two classes.
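The net-input and threshold computation above can be sketched as a small function (a minimal illustration, not from the slides; the bipolar output convention and the default θ = 0 are assumptions):

```python
def classify(x, w, b, theta=0.0):
    """Single threshold unit: net = b + sum(x_i * w_i); output +1 or -1."""
    net = b + sum(xi * wi for xi, wi in zip(x, w))
    return 1 if net >= theta else -1

# With w = (1, 1), b = -1 (the AND weights used later in the slides):
print(classify((1, 1), (1, 1), -1))   # -> 1, pattern in the positive region
print(classify((-1, 1), (1, 1), -1))  # -> -1, pattern in the negative region
```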

Decision region/boundary
n = 2, b != 0, θ = 0:
  b + x1*w1 + x2*w2 = 0, i.e., x2 = -(w1/w2)*x1 - b/w2
is a line, called the decision boundary, which partitions the plane into
two decision regions.
If a point/pattern (x1, x2) is in the positive region, then
b + x1*w1 + x2*w2 > 0, and the output is 1 (belongs to class one).
Otherwise, b + x1*w1 + x2*w2 <= 0, and the output is -1 (belongs to
class two).
n = 2, b = 0, θ != 0 would result in a similar partition.

If n = 3 (three input units), then the decision boundary is a
two-dimensional plane in a three-dimensional space.
In general, the decision boundary b + Σ_{i=1}^n x_i w_i = 0 is an
(n-1)-dimensional hyperplane in an n-dimensional space, which
partitions the space into two decision regions.
This simple network thus can classify a given pattern into one of the
two classes, provided one of these two classes is entirely in one
decision region (one side of the decision boundary) and the other class
is in the other region.
The decision boundary is determined completely by the weights W and the
bias b (or threshold θ).

Linear Separability Problem
If two classes of patterns can be separated by a decision boundary,
represented by the linear equation b + Σ_{i=1}^n x_i w_i = 0,
then they are said to be linearly separable. The simple network can
correctly classify any patterns in them.
The decision boundary (i.e., W, b, or θ) of linearly separable classes
can be determined either by some learning procedure or by solving
linear equation systems based on representative patterns of each class.
If such a decision boundary does not exist, then the two classes are
said to be linearly inseparable.
Linearly inseparable problems cannot be solved by the simple network;
a more sophisticated architecture is needed.

Examples of linearly separable classes
- Logical AND function
  patterns (bipolar)      decision boundary
  x1  x2   y              w1 = 1
  -1  -1  -1              w2 = 1
  -1   1  -1              b  = -1
   1  -1  -1              θ  = 0
   1   1   1              boundary: -1 + x1 + x2 = 0
  x: class I (y = 1), o: class II (y = -1)
- Logical OR function
  patterns (bipolar)      decision boundary
  x1  x2   y              w1 = 1
  -1  -1  -1              w2 = 1
  -1   1   1              b  = 1
   1  -1   1              θ  = 0
   1   1   1              boundary: 1 + x1 + x2 = 0
  x: class I (y = 1), o: class II (y = -1)

Examples of linearly inseparable classes
- Logical XOR (exclusive OR) function
  patterns (bipolar)
  x1  x2   y
  -1  -1  -1
  -1   1   1
   1  -1   1
   1   1  -1
  x: class I (y = 1), o: class II (y = -1)
  No line can separate these two classes.

XOR can be solved by a more complex network with hidden units: x1 and
x2 feed two hidden units z1 and z2 (connection weights of magnitude 2),
and z1, z2 in turn feed the output unit Y. The hidden layer re-encodes
the inputs so that the problem becomes linearly separable at the output:
  (x1, x2)    y
  (-1, -1)   -1
  (-1,  1)    1
  ( 1, -1)    1
  ( 1,  1)   -1
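One concrete weight assignment that realizes XOR with two hidden threshold units can be sketched as follows (these particular weights are an illustration chosen to satisfy the truth table, not necessarily the exact ones in the figure):

```python
def f(net):
    """Bipolar threshold activation with theta = 0."""
    return 1 if net >= 0 else -1

def xor(x1, x2):
    # Hidden unit z1 fires only for (+1, -1); z2 fires only for (-1, +1).
    z1 = f(2 * x1 - 2 * x2 - 2)
    z2 = f(-2 * x1 + 2 * x2 - 2)
    # Output fires if either hidden unit fires (an OR of z1, z2).
    return f(2 * z1 + 2 * z2 + 2)

for x1, x2 in [(-1, -1), (-1, 1), (1, -1), (1, 1)]:
    print((x1, x2), xor(x1, x2))  # -1, 1, 1, -1 respectively
```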

Different non-linearly separable problems

Types of decision regions by network structure (the exclusive-OR,
meshed-region, and most-general-region-shape columns of the original
figure are pictorial):
Single-layer:  half plane bounded by a hyperplane
Two-layer:     convex open or closed regions
Three-layer:   arbitrary regions (complexity limited by the number of
               nodes)

Can a single neuron learn a task?

Hebb Nets
Hebb, in his influential book The Organization of Behavior (1949),
claimed:
Behavior changes are primarily due to the changes of synaptic strengths
(w_ij) between neurons i and j.
w_ij increases only when both i and j are "on": the Hebbian learning
law.
In ANN, the Hebbian law can be stated: w_ij increases only if the
outputs of both units x_i and y_j have the same sign.
In our simple network (one output y and n input units):
  Δw_i = w_i(new) - w_i(old) = x_i * y
or, w_i(new) = w_i(old) + x_i * y

Hebb net (supervised) learning algorithm (p. 49)
Step 0. Initialization: b = 0, w_i = 0, i = 1 to n
Step 1. For each training sample s:t do Steps 2-4
        /* s is the input pattern, t the target output of the sample */
Step 2.   x_i := s_i, i = 1 to n    /* set s to input units */
Step 3.   y := t                    /* set y to the target */
Step 4.   w_i := w_i + x_i * y, i = 1 to n  /* update weights */
          b := b + y               /* update bias (bias unit input is 1) */
Notes: 1) the learning rate is 1; 2) each training sample is used only
once.

Example: AND function with binary units (1, 0); the third input
component is the constant bias unit:
  (x1, x2, 1)   y=t   w1  w2  b
  (1, 1, 1)      1     1   1  1
  (1, 0, 1)      0     1   1  1
  (0, 1, 1)      0     1   1  1
  (0, 0, 1)      0     1   1  1
An incorrect boundary, 1 + x1 + x2 = 0, is learned after using each
sample once.

With bipolar units (1, -1):
  (x1, x2, 1)   y=t   w1  w2   b
  (1, 1, 1)      1     1   1   1
  (1, -1, 1)    -1     0   2   0
  (-1, 1, 1)    -1     1   1  -1
  (-1, -1, 1)   -1     2   2  -2
A correct boundary, -1 + x1 + x2 = 0, is successfully learned.
It will fail to learn x1 ^ x2 ^ x3 (the AND of three inputs), even
though the function is linearly separable.
Stronger learning methods are needed:
  Error-driven: for each sample s:t, compute y from s based on the
  current W and b, then compare y and t.
  Use training samples repeatedly, and each time change the weights
  only slightly (learning rate << 1).
The learning methods of the Perceptron and Adaline are good examples.
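The bipolar run above can be reproduced directly (a short sketch of the algorithm as stated, with learning rate 1 and a single pass over the samples):

```python
def hebb_train(samples):
    """samples: list of ((x1, x2), t) pairs; returns (w, b) after one pass."""
    w, b = [0, 0], 0
    for x, t in samples:
        w = [wi + xi * t for wi, xi in zip(w, x)]
        b += t  # the bias unit has constant input 1
    return w, b

AND = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
print(hebb_train(AND))  # -> ([2, 2], -2), i.e. the boundary -2 + 2*x1 + 2*x2 = 0
```

Note that -2 + 2*x1 + 2*x2 = 0 is the same line as -1 + x1 + x2 = 0.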

The Perceptron
In 1958, Frank Rosenblatt introduced a training algorithm that provided
the first procedure for training a simple ANN: a perceptron.
The operation of Rosenblatt's perceptron is based on the McCulloch and
Pitts neuron model. The model consists of a linear combiner followed by
a hard limiter.
The weighted sum of the inputs is applied to the hard limiter, which
produces an output equal to +1 if its input is positive and -1 if it is
negative.

Single-layer two-input perceptron: inputs x1 and x2 (weights w1, w2)
feed a linear combiner, whose output, minus the threshold θ, passes
through a hard limiter to produce the output Y.

The aim of the perceptron is to classify inputs x1, x2, ..., xn into
one of two classes, say A1 and A2.
In the case of an elementary perceptron, the n-dimensional space is
divided by a hyperplane into two decision regions. The hyperplane is
defined by the linearly separable function:
  Σ_{i=1}^n x_i w_i - θ = 0

Linear separability in the perceptron:
(a) Two-input perceptron: the boundary x1*w1 + x2*w2 - θ = 0 is a line
separating Class A1 from Class A2 in the (x1, x2) plane.
(b) Three-input perceptron: the boundary x1*w1 + x2*w2 + x3*w3 - θ = 0
is a plane in (x1, x2, x3) space.

How does the perceptron learn its classification tasks?
This is done by making small adjustments in the weights to reduce the
difference between the actual and desired outputs of the perceptron.
The initial weights are randomly assigned, usually in the range
[-0.5, 0.5], and then updated to obtain an output consistent with the
training examples.

If at iteration p the actual output is Y(p) and the desired output is
Yd(p), then the error is given by:
  e(p) = Yd(p) - Y(p),  where p = 1, 2, 3, . . .
Iteration p here refers to the pth training example presented to the
perceptron.
If the error e(p) is positive, we need to increase the perceptron
output Y(p); if it is negative, we need to decrease Y(p).

The perceptron learning rule:
  w_i(p+1) = w_i(p) + α * x_i(p) * e(p),  where p = 1, 2, 3, . . .
α is the learning rate, a positive constant less than unity.
The perceptron learning rule was first proposed by Rosenblatt in 1960.
Using this rule we can derive the perceptron training algorithm for
classification tasks.

Perceptron's training algorithm
Step 1: Initialization
Set initial weights w1, w2, ..., wn and threshold θ to random numbers
in the range [-0.5, 0.5].
Step 2: Activation
Activate the perceptron by applying inputs x1(p), x2(p), ..., xn(p)
and desired output Yd(p). Calculate the actual output at iteration
p = 1:
  Y(p) = step[ Σ_{i=1}^n x_i(p) * w_i(p) - θ ]

Perceptron's training algorithm (continued)
where n is the number of the perceptron inputs, and step is a step
activation function.
Step 3: Weight training
Update the weights of the perceptron:
  w_i(p+1) = w_i(p) + Δw_i(p)
where Δw_i(p) is the weight correction at iteration p.
The weight correction is computed by the delta rule:
  Δw_i(p) = α * x_i(p) * e(p)

Perceptron's training algorithm (continued)
Step 4: Iteration
Increase iteration p by one, go back to Step 2, and repeat the process
until convergence.

Example of perceptron learning: the logical operation AND
Threshold: θ = 0.2; learning rate: α = 0.1

        Inputs   Desired  Initial     Actual         Final
Epoch   x1  x2   output   weights     output  Error  weights
                 Yd       w1    w2    Y       e      w1    w2
  1     0   0    0        0.3  -0.1   0        0     0.3  -0.1
        0   1    0        0.3  -0.1   0        0     0.3  -0.1
        1   0    0        0.3  -0.1   1       -1     0.2  -0.1
        1   1    1        0.2  -0.1   0        1     0.3   0.0
  2     0   0    0        0.3   0.0   0        0     0.3   0.0
        0   1    0        0.3   0.0   0        0     0.3   0.0
        1   0    0        0.3   0.0   1       -1     0.2   0.0
        1   1    1        0.2   0.0   1        0     0.2   0.0
  3     0   0    0        0.2   0.0   0        0     0.2   0.0
        0   1    0        0.2   0.0   0        0     0.2   0.0
        1   0    0        0.2   0.0   1       -1     0.1   0.0
        1   1    1        0.1   0.0   0        1     0.2   0.1
  4     0   0    0        0.2   0.1   0        0     0.2   0.1
        0   1    0        0.2   0.1   0        0     0.2   0.1
        1   0    0        0.2   0.1   1       -1     0.1   0.1
        1   1    1        0.1   0.1   1        0     0.1   0.1
  5     0   0    0        0.1   0.1   0        0     0.1   0.1
        0   1    0        0.1   0.1   0        0     0.1   0.1
        1   0    0        0.1   0.1   0        0     0.1   0.1
        1   1    1        0.1   0.1   1        0     0.1   0.1
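The run in the table can be reproduced with a short script (a sketch: weights are kept in tenths to avoid floating-point drift, and the hard limiter is taken as Y = 1 when the net input reaches the threshold):

```python
def train_and():
    # Work in tenths of the table's units: w = (0.3, -0.1) -> [3, -1],
    # theta = 0.2 -> 2, alpha = 0.1 -> 1.
    w, theta, alpha = [3, -1], 2, 1
    data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
    for epoch in range(50):
        errors = 0
        for x, yd in data:
            y = 1 if sum(xi * wi for xi, wi in zip(x, w)) - theta >= 0 else 0
            e = yd - y
            errors += abs(e)
            w = [wi + alpha * xi * e for wi, xi in zip(w, x)]
        if errors == 0:  # an error-free epoch means convergence
            break
    return [wi / 10 for wi in w]

print(train_and())  # -> [0.1, 0.1], the weights in the table's final epoch
```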

Two-dimensional plots of basic logical operations:
(a) AND (x1 ∧ x2), (b) OR (x1 ∨ x2), and (c) Exclusive-OR (x1 ⊕ x2),
each plotted over the unit square with x1, x2 ∈ {0, 1}.

A perceptron can learn the operations AND and OR, but not
Exclusive-OR.

Multilayer neural networks

A multilayer perceptron is a feedforward neural network with


one or more hidden layers.
The network consists of an input layer of source neurons, at
least one middle or hidden layer of computational neurons, and
an output layer of computational neurons.
The input signals are propagated in a forward direction on a
layer-by-layer basis.

Multilayer perceptron with two hidden layers: input signals flow from
the input layer through the first and second hidden layers to the
output layer, which produces the output signals.

What does the middle layer hide?


A hidden layer hides its desired output. Neurons in the hidden
layer cannot be observed through the input/output behavior of the
network. There is no obvious way to know what the desired
output of the hidden layer should be.
Commercial ANNs incorporate three and sometimes four layers,
including one or two hidden layers. Each layer can contain from
10 to 1000 neurons. Experimental neural networks may have five
or even six layers, including three or four hidden layers, and
utilize millions of neurons.

Back-propagation neural network


Learning in a multilayer network proceeds the same way as for a
perceptron.
A training set of input patterns is presented to the network.
The network computes its output pattern, and if there is an error - or
in other words a difference between actual and desired output
patterns - the weights are adjusted to reduce this error.
In a back-propagation neural network, the learning algorithm has
two phases.
First, a training input pattern is presented to the network input layer.
The network propagates the input pattern from layer to layer until
the output pattern is generated by the output layer.

If this pattern is different from the desired output, an error is


calculated and then propagated backwards through the network
from the output layer to the input layer. The weights are
modified as the error is propagated.

Three-layer back-propagation neural network: input signals x1, ..., xn
enter the input layer; weights w_ij connect input neuron i to hidden
neuron j, and weights w_jk connect hidden neuron j to output neuron k,
producing outputs y1, ..., yl. Error signals propagate in the reverse
direction, from the output layer back toward the input layer.

The back-propagation training algorithm
Step 1: Initialization
Set all the weights and threshold levels of the network to random
numbers uniformly distributed inside a small range:
  ( -2.4 / F_i , +2.4 / F_i )
where F_i is the total number of inputs of neuron i in the network.
The weight initialization is done on a neuron-by-neuron basis.

Step 2: Activation
Activate the back-propagation neural network by applying inputs x1(p),
x2(p), ..., xn(p) and desired outputs yd,1(p), yd,2(p), ..., yd,n(p).
(a) Calculate the actual outputs of the neurons in the hidden layer:
  y_j(p) = sigmoid[ Σ_{i=1}^n x_i(p) * w_ij(p) - θ_j ]
where n is the number of inputs of neuron j in the hidden layer, and
sigmoid is the sigmoid activation function.

Step 2: Activation (continued)
(b) Calculate the actual outputs of the neurons in the output layer:
  y_k(p) = sigmoid[ Σ_{j=1}^m x_jk(p) * w_jk(p) - θ_k ]
where m is the number of inputs of neuron k in the output layer.

Step 3: Weight training
Update the weights in the back-propagation network, propagating
backward the errors associated with the output neurons.
(a) Calculate the error gradient for the neurons in the output layer:
  δ_k(p) = y_k(p) * [1 - y_k(p)] * e_k(p)
where e_k(p) = y_d,k(p) - y_k(p)
Calculate the weight corrections:
  Δw_jk(p) = α * y_j(p) * δ_k(p)
Update the weights at the output neurons:
  w_jk(p+1) = w_jk(p) + Δw_jk(p)

Step 3: Weight training (continued)
(b) Calculate the error gradient for the neurons in the hidden layer:
  δ_j(p) = y_j(p) * [1 - y_j(p)] * Σ_{k=1}^l δ_k(p) * w_jk(p)
Calculate the weight corrections:
  Δw_ij(p) = α * x_i(p) * δ_j(p)
Update the weights at the hidden neurons:
  w_ij(p+1) = w_ij(p) + Δw_ij(p)

Step 4: Iteration
Increase iteration p by one, go back to Step 2, and repeat the process
until the selected error criterion is satisfied.

As an example, we may consider the three-layer back-propagation
network. Suppose that the network is required to perform the logical
operation Exclusive-OR. Recall that a single-layer perceptron could not
do this operation. Now we will apply the three-layer net.

Three-layer network for solving the Exclusive-OR operation: inputs x1
(neuron 1) and x2 (neuron 2) feed hidden neurons 3 and 4 through
weights w13, w14, w23, w24; hidden neurons 3 and 4 feed output neuron 5
(output y5) through weights w35 and w45. Each hidden and output neuron
also receives a threshold input from a unit fixed at -1.

The effect of the threshold applied to a neuron in the hidden or output
layer is represented by its weight, θ, connected to a fixed input equal
to -1.
The initial weights and threshold levels are set randomly as follows:
w13 = 0.5, w14 = 0.9, w23 = 0.4, w24 = 1.0, w35 = -1.2, w45 = 1.1,
θ3 = 0.8, θ4 = -0.1, and θ5 = 0.3.

We consider a training example where inputs x1 and x2 are equal to 1
and the desired output yd,5 is 0. The actual outputs of neurons 3 and 4
in the hidden layer are calculated as:
  y3 = sigmoid(x1*w13 + x2*w23 - θ3)
     = 1 / (1 + e^-(1*0.5 + 1*0.4 - 1*0.8)) = 0.5250
  y4 = sigmoid(x1*w14 + x2*w24 - θ4)
     = 1 / (1 + e^-(1*0.9 + 1*1.0 + 1*0.1)) = 0.8808
Now the actual output of neuron 5 in the output layer is determined as:
  y5 = sigmoid(y3*w35 + y4*w45 - θ5)
     = 1 / (1 + e^-(-0.5250*1.2 + 0.8808*1.1 - 1*0.3)) = 0.5097
Thus, the following error is obtained:
  e = yd,5 - y5 = 0 - 0.5097 = -0.5097

The next step is weight training. To update the weights and threshold
levels in our network, we propagate the error, e, from the output layer
backward to the input layer.
First, we calculate the error gradient for neuron 5 in the output
layer:
  δ5 = y5 * (1 - y5) * e = 0.5097 * (1 - 0.5097) * (-0.5097) = -0.1274
Then we determine the weight corrections, assuming that the learning
rate parameter, α, is equal to 0.1:
  Δw35 = α * y3 * δ5 = 0.1 * 0.5250 * (-0.1274) = -0.0067
  Δw45 = α * y4 * δ5 = 0.1 * 0.8808 * (-0.1274) = -0.0112
  Δθ5  = α * (-1) * δ5 = 0.1 * (-1) * (-0.1274) = 0.0127

Next we calculate the error gradients for neurons 3 and 4 in the hidden
layer:
  δ3 = y3 * (1 - y3) * δ5 * w35
     = 0.5250 * (1 - 0.5250) * (-0.1274) * (-1.2) = 0.0381
  δ4 = y4 * (1 - y4) * δ5 * w45
     = 0.8808 * (1 - 0.8808) * (-0.1274) * 1.1 = -0.0147
We then determine the weight corrections:
  Δw13 = α * x1 * δ3 = 0.1 * 1 * 0.0381 = 0.0038
  Δw23 = α * x2 * δ3 = 0.1 * 1 * 0.0381 = 0.0038
  Δθ3  = α * (-1) * δ3 = 0.1 * (-1) * 0.0381 = -0.0038
  Δw14 = α * x1 * δ4 = 0.1 * 1 * (-0.0147) = -0.0015
  Δw24 = α * x2 * δ4 = 0.1 * 1 * (-0.0147) = -0.0015
  Δθ4  = α * (-1) * δ4 = 0.1 * (-1) * (-0.0147) = 0.0015

At last, we update all weights and thresholds:
  w13 = w13 + Δw13 = 0.5 + 0.0038 = 0.5038
  w14 = w14 + Δw14 = 0.9 - 0.0015 = 0.8985
  w23 = w23 + Δw23 = 0.4 + 0.0038 = 0.4038
  w24 = w24 + Δw24 = 1.0 - 0.0015 = 0.9985
  w35 = w35 + Δw35 = -1.2 - 0.0067 = -1.2067
  w45 = w45 + Δw45 = 1.1 - 0.0112 = 1.0888
  θ3  = θ3 + Δθ3 = 0.8 - 0.0038 = 0.7962
  θ4  = θ4 + Δθ4 = -0.1 + 0.0015 = -0.0985
  θ5  = θ5 + Δθ5 = 0.3 + 0.0127 = 0.3127
The training process is repeated until the sum of squared errors is
less than 0.001.
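The forward and backward passes of this worked example can be checked with a few lines of code (a direct transcription of the formulas; t3, t4, t5 stand for the thresholds θ3, θ4, θ5):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Initial weights and thresholds from the example
w13, w14, w23, w24, w35, w45 = 0.5, 0.9, 0.4, 1.0, -1.2, 1.1
t3, t4, t5 = 0.8, -0.1, 0.3
x1, x2, yd5 = 1, 1, 0

# Forward pass
y3 = sigmoid(x1 * w13 + x2 * w23 - t3)
y4 = sigmoid(x1 * w14 + x2 * w24 - t4)
y5 = sigmoid(y3 * w35 + y4 * w45 - t5)
e = yd5 - y5

# Backward pass: output gradient, then hidden gradients
d5 = y5 * (1 - y5) * e
d3 = y3 * (1 - y3) * d5 * w35
d4 = y4 * (1 - y4) * d5 * w45

print(round(y3, 4), round(y4, 4), round(y5, 4))  # 0.525 0.8808 0.5097
print(round(d5, 4), round(d3, 4), round(d4, 4))  # -0.1274 0.0381 -0.0147
```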

Learning curve for operation Exclusive-OR: the sum-squared network
error, plotted on a logarithmic scale from 10^1 down to 10^-4, falls
below the 10^-3 criterion after 224 epochs.

Final results of three-layer network learning:
  Inputs    Desired output   Actual output
  x1  x2    yd               y5
  1   1     0                0.0155
  0   1     1                0.9849
  1   0     1                0.9849
  0   0     0                0.0175
  Sum of squared errors: 0.0010

Network represented by the McCulloch-Pitts model for solving the
Exclusive-OR operation: all connection weights from the inputs to
hidden neurons 3 and 4, and from neuron 4 to output neuron 5, are +1.0;
the weight from neuron 3 to neuron 5 is -2.0. The thresholds (weights
from a fixed input of -1) are +1.5 for neuron 3 and +0.5 for neurons 4
and 5.

Decision boundaries:
(a) Decision boundary constructed by hidden neuron 3: x1 + x2 - 1.5 = 0
(b) Decision boundary constructed by hidden neuron 4: x1 + x2 - 0.5 = 0
(c) Decision boundaries constructed by the complete three-layer
network: the region between the two lines is classified as one class,
the rest of the plane as the other.

Pattern Association
and
Associative Memory

Neural networks were designed on analogy with the brain. The brain's
memory, however, works by association.
For example, we can recognize a familiar face even in an unfamiliar
environment within 100-200 ms. We can also recall a complete sensory
experience, including sounds and scenes, when we hear only a few bars
of music. The brain routinely associates one thing with another.
Multilayer neural networks trained with the back-propagation algorithm
are used for pattern recognition problems. However, to emulate the
human memory's associative characteristics we need a different type of
network: a recurrent neural network.
A recurrent neural network has feedback loops from its outputs to its
inputs. The presence of such loops has a profound impact on the
learning capability of the network.

Associative-Memory Networks
Input: a pattern (often noisy/corrupted)
Output: the corresponding stored pattern (complete / relatively
noise-free)
Process:
1. Load the input pattern onto a core group of highly-interconnected
   neurons.
2. Run the core neurons until they reach a steady state.
3. Read the output off the states of the core neurons.
Example: input (1 0 1 -1 -1) retrieves output (1 -1 1 -1 -1).

Associative Network Types
1. Auto-associative (X = Y): recognize noisy versions of a stored
   pattern.
2. Hetero-associative bidirectional (X ≠ Y): BAM = Bidirectional
   Associative Memory; iterative correction of both input and output.
3. Hetero-associative input correcting (X ≠ Y): the input clique is
   auto-associative, so it repairs input patterns.
4. Hetero-associative output correcting (X ≠ Y): the output clique is
   auto-associative, so it repairs output patterns.

Hebb's Rule: Connection Weights ~ Correlations
"When one cell repeatedly assists in firing another, the axon of the
first cell develops synaptic knobs (or enlarges them if they already
exist) in contact with the soma of the second cell." (Hebb, 1949)
In an associative neural net, if we compare two pattern components
(e.g. pixels) within many patterns and find that they are frequently
in:
a) the same state, then the arc weight between their NN nodes should be
   positive;
b) different states, then it should be negative.
Matrix Memory:
The weights must store the average correlations between all pattern
components across all patterns. A net presented with a partial pattern
can then use the correlations to recreate the entire pattern.

Correlated Field Components
Each component is a small portion of the pattern field (e.g. a pixel).
In the associative neural network, each node represents one field
component.
For every pair of components a and b, their values are compared in each
of several patterns, and the weight w_ab on the arc between their NN
nodes is set to approximately the average correlation.

Quantifying Hebb's Rule
Compare two nodes to calculate a weight change that reflects the state
correlation (i = input component, o = output component):
  Auto-association:   Δw_jk = i_pk * i_pj
  Hetero-association: Δw_jk = i_pk * o_pj
When the two components are the same (different), increase (decrease)
the weight.
Ideally, the weights will record the average correlations across all
patterns:
  Auto:   w_jk = Σ_{p=1}^P i_pk * i_pj
  Hetero: w_jk = Σ_{p=1}^P i_pk * o_pj
Hebbian Principle: if all the input patterns are known prior to
retrieval time, then initialize the weights as:
  Auto:   w_jk = (1/P) Σ_{p=1}^P i_pk * i_pj
  Hetero: w_jk = (1/P) Σ_{p=1}^P i_pk * o_pj
Weights = average correlations.

Matrix Representation
Let X = the matrix of input patterns, where each ROW is a pattern, so
x_k,i = the ith bit of the kth pattern.
Let Y = the matrix of output patterns, where each ROW is a pattern, so
y_k,j = the jth bit of the kth pattern.
Then the average correlation between input bit i and output bit j
across all patterns is:
  w_i,j = (1/P) (x_1,i * y_1,j + x_2,i * y_2,j + ... + x_P,i * y_P,j)
To calculate all weights at once:
  Hetero-associative: W = X^T Y
  Auto-associative:   W = X^T X
(each entry of X^T Y is the dot product of a column of X, i.e. bit i
across all input patterns, with a column of Y, i.e. bit j across all
output patterns).
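The W = X^T Y construction and a recall step can be sketched in a few lines of pure Python (the 1/P scaling is deferred to the thresholding step, as the later Hopfield example also does; the sample pattern is invented for illustration):

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def sign(v):
    return [1 if x >= 0 else -1 for x in v]

# One stored hetero-associative pair (rows are patterns)
X = [[1, -1, 1]]
Y = [[1, -1]]
W = matmul(transpose(X), Y)   # W = X^T Y

# Recall from a noisy version of the stored input
noisy = [1, 1, 1]
print(sign(matmul([noisy], W)[0]))  # -> [1, -1], the stored output
```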

Auto-Associative Memory
1. Auto-associative patterns to remember: small grids of components
   over nodes 1-4 (in the original figure, dark cells are +1 or -1 and
   light cells are 0).
2. Distributed storage of all patterns: one node per pattern unit,
   fully connected (a clique), with each weight equal to the average
   correlation of the corresponding units across all patterns.
3. Retrieval: present a partial or noisy pattern and let the network
   settle to the stored pattern.

Hetero-Associative Memory
1. Hetero-associative patterns (pairs) to remember: each input pattern
   over nodes 1-3 is paired with an output pattern over nodes a-b.
2. Distributed storage of all patterns: one node per pattern unit for
   both X and Y, full inter-layer connection, with each weight equal to
   the average correlation of the corresponding units across all
   patterns.
3. Retrieval: present an input pattern and read off the associated
   output pattern.

The Hopfield Network
&
Bidirectional Associative Memory

The Hopfield Network
John Hopfield formulated the physical principle of storing information
in a dynamically stable network.
Auto-association network:
  Fully connected (a clique) with symmetric weights
  State of a node = f(inputs)
  Weight values based on the Hebbian principle
Performance: it must iterate a bit to converge on a pattern, but
generally requires much less computation than back-propagation
networks.

Hopfield Networks: a corrupted input pattern converges, after many
iterations, to the stored output pattern.

The Hopfield network uses McCulloch and Pitts neurons with the sign
activation function as its computing element:

             +1, if X > 0
  Y^sign  =  -1, if X < 0
              Y, if X = 0   (the output is unchanged)

The current state of the Hopfield network is determined by the current
outputs of all neurons, y1, y2, . . ., yn.
Thus, for a single-layer n-neuron network, the state can be defined by
the state vector Y = [y1, y2, ..., yn]^T.

The Hopfield Network: a single layer of n neurons, each feeding its
output back as input to all the other neurons.

Retrieval Algorithm
The output update rule for Hopfield auto-associative memory can be
expressed in the form
  v_i^(k+1) = sgn( w_i^T v^(k) )
where k is the index of recursion and i is the number of the neuron
currently undergoing an update.
The asynchronous update sequence considered here is random. Assuming
that the recursion starts at v^0 and a random sequence of updating
neurons m, p, q, ... is chosen, each update yields an output vector
that differs from the previous one in at most the single updated entry.

Storage Algorithm
In the Hopfield network, synaptic weights between neurons are usually
represented in matrix form.
Assume that the bipolar binary prototype vectors that need to be stored
are s(m), for m = 1, 2, . . ., p. The storage algorithm for calculating
the weight matrix is
  W = Σ_{m=1}^p s(m) s(m)^t - p I
where p is the number of states to be memorized by the network, I is
the n*n identity matrix, and superscript t denotes matrix
transposition.

Possible states for the three-neuron Hopfield network: the eight
vertices (±1, ±1, ±1) of a cube in (y1, y2, y3) space.

Hopfield Network
The stable state-vertex is determined by the weight matrix W, the
current input vector X, and the threshold matrix θ. If the input vector
is partially incorrect or incomplete, the initial state will converge
to the stable state-vertex after a few iterations.

Hopfield Network Example: Suppose, for instance, that our network is
required to memorize two opposite states, (1, 1, 1) and (-1, -1, -1).
Thus,
  Y1 = [1, 1, 1]^T,  Y2 = [-1, -1, -1]^T
or
  Y1^T = [1 1 1],  Y2^T = [-1 -1 -1]
where Y1 and Y2 are three-dimensional vectors.

The 3x3 identity matrix I is

      [1 0 0]
  I = [0 1 0]
      [0 0 1]

Thus, we can now determine the weight matrix as follows:

                                  [0 2 2]
  W = Y1 Y1^T + Y2 Y2^T - 2 I  =  [2 0 2]
                                  [2 2 0]

Next, the network is tested by the sequence of input vectors, X1 and
X2, which are equal to the output (or target) vectors Y1 and Y2,
respectively.

First, we activate the Hopfield network by applying the input vector X.
Then we calculate the actual output vector Y, and finally we compare
the result with the initial input vector X.

             [0 2 2] [ 1]   [0]     [ 1]
  Y1 = sign( [2 0 2] [ 1] - [0] ) = [ 1]
             [2 2 0] [ 1]   [0]     [ 1]

             [0 2 2] [-1]   [0]     [-1]
  Y2 = sign( [2 0 2] [-1] - [0] ) = [-1]
             [2 2 0] [-1]   [0]     [-1]

Both states, (1, 1, 1) and (-1, -1, -1), are therefore stable.
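The three-neuron example can be checked directly (a minimal sketch; per the sign rule above, a zero net input keeps the neuron's previous output):

```python
W = [[0, 2, 2],
     [2, 0, 2],
     [2, 2, 0]]

def update(state):
    """One synchronous pass of the sign rule; ties keep the old output."""
    out = []
    for i, row in enumerate(W):
        net = sum(w * s for w, s in zip(row, state))
        out.append(1 if net > 0 else -1 if net < 0 else state[i])
    return out

print(update([1, 1, 1]))    # -> [1, 1, 1]    (stable fundamental memory)
print(update([-1, 1, 1]))   # -> [1, 1, 1]    (single-bit error corrected)
print(update([1, -1, -1]))  # -> [-1, -1, -1] (attracted to the other memory)
```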

The remaining six states are all unstable. However, stable states
(also called fundamental memories) are capable of attracting
states that are close to them.
The fundamental memory (1, 1, 1) attracts unstable states (-1, 1, 1),
(1, -1, 1) and (1, 1, -1). Each of these unstable states represents a
single error, compared to the fundamental memory (1, 1, 1).
The fundamental memory (-1, -1, -1) attracts unstable states (-1, -1,
1), (-1, 1, -1) and (1, -1, -1).
Thus, the Hopfield network can act as an error correction
network.

Hopfield Network Example
1. Patterns to remember: p1, p2, p3, each a 2x2 grid over nodes 1-4.
3. Build the network: nodes 1-4, fully connected, with each weight set
   to the average correlation of the corresponding units (values of
   1/3 and -1/3 in the figure).
4. Enter a test pattern (node values +1, 0, or -1) and iterate.

Hopfield Network Example (2)
5. Synchronous iteration (update all nodes at once)
Goal: set weights such that an input vector Vi yields itself when
multiplied by the weights W.
X = [V1; V2; ...; Vp], where p = # input vectors (i.e., patterns).
So Y = X, and the Hebbian weight calculation is: W = X^T Y = X^T X

      [ 1  1  1 -1]          [ 1  1 -1]
  X = [ 1  1 -1  1]    X^T = [ 1  1  1]
      [-1  1  1 -1]          [ 1 -1  1]
                             [-1  1 -1]

          [ 3  1 -1  1]
  X^T X = [ 1  3  1 -1]
          [-1  1  3 -3]
          [ 1 -1 -3  3]

Matrices (2)
The upper and lower triangles of the product matrix represent the 6
weights w_i,j = w_j,i.
Scale the weights by dividing by p (i.e., averaging). Picton (ANN book)
subtracts p from each. Either method is fine, as long as we apply the
appropriate thresholds to the output values.
This produces the same weights as in the non-matrix description.
Testing with input (1 0 0 -1):

               [ 3  1 -1  1]
  (1 0 0 -1)   [ 1  3  1 -1]   =  (2 2 2 -2)
               [-1  1  3 -3]
               [ 1 -1 -3  3]

Scaling* by p = 3 and using 0 as a threshold gives:
  (2/3  2/3  2/3  -2/3)  =>  (1 1 1 -1)
*For illustrative purposes, it's easier to scale by p at the end
instead of scaling the entire weight matrix, W, prior to testing.
Asynchronous Iteration (one randomly-chosen node at a time)

Hopfield Network Example (4)
4c. Enter another test pattern. Asynchronous updating is central to
Hopfield's (1982) original model.
5c. Asynchronous iteration (one randomly-chosen node at a time):
updating nodes 3, 4, and then 2 settles the network into a state that
is stable but spurious (not one of the stored patterns).

Hopfield Network Example (5)
4d. Enter another test pattern.
5d. Asynchronous iteration: updating nodes 3, 4, and then 2 settles the
network into the stable stored pattern p3.

Hopfield Network Example (6)
4e. Enter the same test pattern as before.
5e. Asynchronous iteration, but in a different order: updating node 2
first, then node 3 or 4 (no change), settles the network into a
different state that is stable but spurious. The order of asynchronous
updates can thus determine which attractor is reached.

Bidirectional Associative Memory (BAM)

The Hopfield network represents an autoassociative type of


memory - it can retrieve a corrupted or incomplete memory but
cannot associate this memory with another different memory.
Human memory is essentially associative. One thing may remind
us of another, and that of another, and so on. We use a chain of
mental associations to recover a lost memory. If we forget where
we left an umbrella, we try to recall where we last had it, what
we were doing, and who we were talking to. We attempt to
establish a chain of associations, and thereby to restore a lost
memory.

To associate one memory with another, we need a recurrent neural


network capable of accepting an input pattern on one set of
neurons and producing a related, but different, output pattern on
another set of neurons.
Bidirectional Associative Memory (BAM), first proposed by
Bart Kosko, is a heteroassociative network. It associates patterns
from one set, set A, to patterns from another set, set B, and vice
versa. Like a Hopfield network, the BAM can generalize and also
produce correct outputs despite corrupted or incomplete inputs.

BAM operation: an n-neuron input layer is fully connected to an
m-neuron output layer. (a) In the forward direction, input vector X(p)
produces output vector Y(p); (b) in the backward direction, Y(p) is fed
back through the same weights to produce X(p+1).

The basic idea behind the BAM is to store pattern pairs so that when
the n-dimensional vector X from set A is presented as input, the BAM
recalls the m-dimensional vector Y from set B, but when Y is presented
as input, the BAM recalls X.
To develop the BAM, we need to create a correlation matrix for each
pattern pair we want to store. The correlation matrix is the matrix
product of the input vector X and the transpose of the output vector,
Y^T. The BAM weight matrix is the sum of all correlation matrices, that
is,
  W = Σ_{m=1}^M X_m Y_m^T
where M is the number of pattern pairs to be stored in the BAM.
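A minimal BAM sketch with two illustrative pattern pairs (the pairs are invented for the example and chosen orthogonal for clean recall; recall is sign(X W) in the forward direction and sign(Y W^T) in the backward direction):

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def sign(v):
    return [1 if x >= 0 else -1 for x in v]

# Two pattern pairs (X from set A, Y from set B)
pairs = [([1, 1, -1, -1], [1, 1]),
         ([1, -1, 1, -1], [1, -1])]

# W = sum over pairs of X_m^T Y_m (here X_m, Y_m are row vectors)
n, m = len(pairs[0][0]), len(pairs[0][1])
W = [[sum(x[i] * y[j] for x, y in pairs) for j in range(m)] for i in range(n)]
WT = [list(col) for col in zip(*W)]

x1, y1 = pairs[0]
print(sign(matmul([x1], W)[0]))   # forward:  -> [1, 1], recalls y1
print(sign(matmul([y1], WT)[0]))  # backward: -> [1, 1, -1, -1], recalls x1
```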

Setting the Weights:
The weight matrix to store a set of input and target vector pairs
s(p):t(p), p = 1, ..., P, can be determined by the Hebb rule. The
formulas for the entries depend on whether the training vectors are
binary or bipolar.
For binary input vectors:
  w_ij = Σ_{p=1}^P (2 s_i(p) - 1) (2 t_j(p) - 1)
For bipolar input vectors:
  w_ij = Σ_{p=1}^P s_i(p) t_j(p)

Discrete BAM activation functions: the output of each unit is +1 if its
net input is positive, -1 if it is negative, and unchanged if it is
zero.

Algorithms for Discrete BAM

Application:
Example: a BAM net to associate letters with simple bipolar codes.
