
Storing Capacity of the Hopfield Network

Sachin H B
IISER Kolkata

November 24, 2016

Supervisor : Dr. Anandamohan Ghosh

Artificial Neural Network | Introduction


Motivation
- Inspired by the biological neuron.

Biological Neural Network (Neural Pathway)
- It is a series of interconnected neurons whose activation defines a recognizable linear pathway.
- The interface through which neurons interact with their neighbours usually consists of several axon terminals connected via synapses to the dendrites of other neurons.
- If the sum of the input signals into a neuron surpasses a certain threshold, the neuron fires an action potential at the axon hillock and transmits this electrical signal along the axon.

Artificial Neuron | Mathematical Model


Perceptron
- A computational model that tries to mimic the biological neuron.
- A weighted summation is performed over the inputs.
- A (non-linear) function then acts on this weighted summation.

Weights and Biases
- Weights determine the level of importance of a particular input.
- Biases determine the ease with which a neuron can fire, i.e., respond with an output above some threshold.
- The perceptron is like a device that makes decisions by weighing up the evidence.
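To make the weighted-summation-plus-threshold idea concrete, here is a minimal NumPy sketch of a single perceptron; the weights, bias and step activation below are illustrative choices, not values from the slides.

import numpy as np

def perceptron(x, w, b):
    """Weighted summation of the inputs, followed by a hard threshold."""
    s = np.dot(w, x) + b            # weighted sum plus bias
    return 1 if s >= 0 else 0       # fire only if the sum crosses the threshold

# Illustrative weights/bias: the decision leans more on the first input
w = np.array([0.6, 0.4])
b = -0.5
print(perceptron(np.array([1, 1]), w, b))   # 1: enough combined evidence
print(perceptron(np.array([0, 0]), w, b))   # 0: below threshold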

Human Memory
- Memory in a computer is stored at an address; recalling it without the complete address is not possible.
- Biological (human) memory is content addressable.
- There is no single location in the brain's neural network for a particular memory, say of an individual.
- Rather, the memory of that individual is retrieved by a string of associations about their physical features, personality characteristics and social relations, which are dealt with by different parts of the brain.
- Human beings are also able to recall a memory fully by remembering only particular aspects or features of it.

The Hopfield Model

- The Hopfield network is an artificial neural network proposed by John Hopfield to store and retrieve memories the way a human brain does.
- The state of a neuron is either +1 (firing at maximum rate) or -1/0 (not firing).
- The network is first trained to store a number of patterns (memories).
- It can then retrieve a stored pattern when given a distorted version of it as input.
- It works as an associative memory, just like the human brain.

The Hopfield Model | Architecture

Organization of the network
- It is a single-layer, fully connected feedback network: every neuron is connected to every other neuron, but there are no self-connections.
- It has N neurons in total, each node taking binary values x_i = \pm 1 or x_i \in \{0, 1\}.
- For any two neurons i and j, the connection weight between them is symmetric, W_{ij} = W_{ji}, and W_{ii} = 0.

Training the Network

How do we learn the weights?
- Inspired by the biological learning rule, the Hebb rule.
- We would like to store the pattern V = (x_1, x_2, ..., x_N), with x_i \in \{-1, +1\}, where N is the dimension of the network.
- Then the weights are

  W_{ij} = \eta\, x_i x_j

- We choose the learning rate \eta = \frac{1}{N}.
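A minimal NumPy sketch of this one-pattern learning step (the example pattern is arbitrary; the zeroed diagonal enforces W_ii = 0):

import numpy as np

def hebb_weights(x):
    """Hebb rule for one bipolar pattern x: W_ij = (1/N) x_i x_j, with W_ii = 0."""
    N = len(x)
    W = np.outer(x, x) / N
    np.fill_diagonal(W, 0.0)        # no self-connections
    return W

x = np.array([1, -1, 1, 1, -1])     # example pattern, x_i in {-1, +1}
W = hebb_weights(x)
assert np.allclose(W, W.T)          # the weights come out symmetric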

The Hebb Rule

How are neurons connected in the network?
- "When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased."
- Neurons that fire together, wire together. Neurons that fire out of sync, fail to link.

Updating
How do we update the network?
- Asynchronously: at each point in time, update one node.
- Synchronously: at every time step, update all nodes together.
- Asynchronous updating is more biologically realistic, so we update one node at a time.
- The nodes may be selected at random or in a given sequence.
- The state is updated accordingly:

  h_i = \sum_{j=1}^{N} W_{ij} x_j

  then

  V_i = \begin{cases} 1, & h_i \ge 0 \\ 0, & h_i < 0 \end{cases}

- We keep doing this until the system reaches a stable state.
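A sketch of this asynchronous updating loop, written here for bipolar states in {-1, +1} (for the {0, 1} convention used on some slides only the else branch changes):

import numpy as np

def update_async(W, x, n_updates=1000, rng=None):
    """Repeatedly pick a random node and threshold its local field."""
    rng = np.random.default_rng() if rng is None else rng
    x = x.copy()
    for _ in range(n_updates):
        i = rng.integers(len(x))       # node chosen at random
        h = W[i] @ x                   # local field h_i = sum_j W_ij x_j
        x[i] = 1 if h >= 0 else -1     # use 0 here instead of -1 for {0, 1} states
    return x

In practice the loop is stopped once a full sweep over the nodes produces no changes, i.e. once the state is stable.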

Energy of the Network


- Hopfield networks have an energy function that decreases, or stays unchanged, under asynchronous updating.
- The energy associated with each node is

  E_i = -\frac{1}{2} h_i x_i

- The energy of the whole network is then

  E(x) = \sum_i E_i = -\frac{1}{2} \sum_i h_i x_i = -\frac{1}{2} \sum_i \sum_{j=1}^{N} W_{ij} x_i x_j
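The corresponding energy computation is a one-liner; the factor 1/2 corrects for counting each symmetric pair (i, j) twice:

import numpy as np

def energy(W, x):
    """Network energy E(x) = -1/2 * sum_{i,j} W_ij x_i x_j."""
    return -0.5 * x @ W @ x

# Under the asynchronous update above this quantity never increases,
# so the dynamics settle into a local minimum of E.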

Hopfield Model | An Example


- Let's say we have a 5-node Hopfield net and want it to recognize the pattern (0, 1, 1, 0, 1).
- Since there are 5 nodes in the net, we need a 5 x 5 matrix of weights:

  W = \begin{pmatrix} 0 & -1 & -1 & 1 & -1 \\ -1 & 0 & 1 & -1 & 1 \\ -1 & 1 & 0 & -1 & 1 \\ 1 & -1 & -1 & 0 & -1 \\ -1 & 1 & 1 & -1 & 0 \end{pmatrix}

- Now we want to see if it can recognize (1, 1, 1, 0, 1), which differs from the stored pattern by only 1 bit.

Hopfield Model | Example


We start updating the pattern (1, 1, 1, 0, 1) using the weight matrix, with V_i^{in} = \sum_j W_{ij} V_j and V_i = 1 if V_i^{in} \ge 0, else V_i = 0. We update the nodes in random order.
- Take node 4: V_4^{in} = (1 \cdot 1) + (-1 \cdot 1) + (-1 \cdot 1) + (-1 \cdot 1) = -2.
- Since -2 < 0, V_4 = 0, giving (1, 1, 1, 0, 1) (it did not change).
- Take this updated state as the input now.
- V_2^{in} = 1; since 1 \ge 0, V_2 = 1, giving (1, 1, 1, 0, 1) (it did not change).
- V_1^{in} = -3; since -3 < 0, V_1 = 0, giving (0, 1, 1, 0, 1) (it changed).
- V_4^{in} = -3; since -3 < 0, V_4 = 0, giving (0, 1, 1, 0, 1) (it did not change).
- V_5^{in} = 2; since 2 \ge 0, V_5 = 1, giving (0, 1, 1, 0, 1) (it did not change).
- So the network converged to (0, 1, 1, 0, 1), the pattern we initially stored.
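A short script reproducing this worked example (a sketch: the weight matrix is built from the bipolar version of (0, 1, 1, 0, 1), the 0/1 thresholding rule above is used, and the node ordering simply mirrors the steps listed):

import numpy as np

v = np.array([0, 1, 1, 0, 1])        # pattern to store
x = 2 * v - 1                        # bipolar version: (-1, 1, 1, -1, 1)
W = np.outer(x, x)
np.fill_diagonal(W, 0)               # this is exactly the 5 x 5 matrix above

state = np.array([1, 1, 1, 0, 1])    # probe pattern, one bit away from v
for i in [3, 1, 0, 3, 4]:            # nodes 4, 2, 1, 4, 5 in the slide's 1-based numbering
    h = W[i] @ state                 # V_i^in
    state[i] = 1 if h >= 0 else 0
print(state)                         # [0 1 1 0 1]: the stored pattern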

Storing a Single Pattern | The problem

We shall now try to store a single pattern V = [V_1, V_2, ..., V_N].
- The condition for V to be a fixed point of the network is

  V_i = \mathrm{sgn}\Big(\sum_j W_{ij} V_j\Big) \quad \forall\, i \in \{1, 2, ..., N\}

- If the network is initialized with a pattern S = [S_1, S_2, ..., S_N] that is close, but not identical, to V, the network will still converge to V.
- Let n be the number of nodes in which S differs from V.
- But how does the success of retrieval change with varying n?
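This question can be put to the test numerically; the sketch below (an assumed reconstruction of the experiment behind the following plots, not the original script) stores one random pattern, flips n of its bits, runs asynchronous updates, and records how often the original pattern is recovered.

import numpy as np

def single_pattern_success(N=35, n=3, trials=200, sweeps=10, seed=0):
    """Fraction of trials in which an n-bit-distorted probe converges back to the stored pattern."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(trials):
        v = rng.choice([-1, 1], size=N)                 # pattern to store
        W = np.outer(v, v) / N
        np.fill_diagonal(W, 0.0)
        s = v.copy()
        s[rng.choice(N, size=n, replace=False)] *= -1   # flip n randomly chosen bits
        for _ in range(sweeps * N):                     # asynchronous updates
            i = rng.integers(N)
            s[i] = 1 if W[i] @ s >= 0 else -1
        hits += np.array_equal(s, v)
    return hits / trials

for n in (1, 5, 10, 17):
    print(n, single_pattern_success(N=35, n=n))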

Storing a Single Pattern | Varying n

(Plot: retrieval success vs. n)

Storing a Single Pattern | Varying n for Different Dimensions

(Plot: retrieval success vs. n for different dimensions N)

Storing Multiple Patterns


More generally, we have many patterns to store, but how do we learn those patterns?
- The trick is to calculate the weight matrix for each pattern as if the other patterns did not exist.
- Suppose we wish to store V^{(1)} and V^{(2)}. Then

  W_{ij}^{(1)} = \frac{1}{N} x_i^{(1)} x_j^{(1)} \quad \text{and} \quad W_{ij}^{(2)} = \frac{1}{N} x_i^{(2)} x_j^{(2)},

  and we add them:

  W_{ij} = W_{ij}^{(1)} + W_{ij}^{(2)}

- For p patterns the generalized Hebb rule becomes

  W_{ij} = \frac{1}{N} \sum_{k=1}^{p} x_i^{(k)} x_j^{(k)},

  where x_i^{(k)} is the value of node i in the k-th pattern.
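A sketch of this generalized Hebb rule in NumPy, with the p patterns stored as the rows of a p x N array of +-1 values:

import numpy as np

def hebb_weights_many(patterns):
    """W_ij = (1/N) * sum_k x_i^(k) x_j^(k), with W_ii = 0."""
    X = np.asarray(patterns)
    p, N = X.shape
    W = X.T @ X / N                  # adds up the p outer products in one step
    np.fill_diagonal(W, 0.0)
    return W

# Example: store p = 3 random patterns in an N = 20 node network
rng = np.random.default_rng(0)
W = hebb_weights_many(rng.choice([-1, 1], size=(3, 20)))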

Multiple Patterns | Stability of Stored Patterns


- We will have a fixed point V^{(l)} for each l iff \mathrm{sgn}(h_i^{(l)}) = x_i^{(l)} for all 1 \le i \le N, where

  h_i^{(l)} = \sum_{j=1}^{N} W_{ij} x_j^{(l)} = \frac{1}{N} \sum_{j=1}^{N} \sum_{k=1}^{p} x_i^{(k)} x_j^{(k)} x_j^{(l)}

- Separating out the k = l term,

  h_i^{(l)} = x_i^{(l)} + \frac{1}{N} \sum_{j=1}^{N} \sum_{k \ne l} x_i^{(k)} x_j^{(k)} x_j^{(l)},

  where the second term is called the crosstalk term, \epsilon_i^{(l)}.
- The crucial idea is that if the crosstalk term is smaller than 1 in magnitude, it cannot change the sign of h_i^{(l)}, and hence we can conclude that V^{(l)} is a fixed point of the network dynamics.

Multiple Patterns | Stability of Stored Patterns

- \epsilon_i^{(l)} is a random variable: it is 1/N times a sum of about Np terms, each equally likely to be +1 or -1.
- The crosstalk term therefore has a distribution with mean 0 and variance p/N.
- As long as the ratio p/N is small enough, the stored patterns are stable fixed points of the network.
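This variance claim can be checked directly by sampling the crosstalk term for independent random +-1 patterns; a small sketch (the choice N = 100, p = 10 is arbitrary):

import numpy as np

rng = np.random.default_rng(1)
N, p, trials = 100, 10, 2000
samples = []
for _ in range(trials):
    X = rng.choice([-1, 1], size=(p, N))      # p random patterns of dimension N
    # crosstalk at node i = 0 of pattern l = 0:
    #   (1/N) * sum_{j != i} sum_{k != l} x_i^(k) x_j^(k) x_j^(l)
    samples.append((X[1:, 0] * (X[1:, 1:] @ X[0, 1:])).sum() / N)
print(np.mean(samples), np.var(samples))       # mean ~ 0, variance roughly p/N = 0.1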

Multiple Patterns | Varying n

But how does the success of retrieval change with varying n?

(Plot: success vs. n for dimension N = 35)

Multiple Patterns | Varying n for Different Dimensions

Similarly, how does the success change for different dimensions?

(Plot: success vs. n for different dimensions N)


Multiple Patterns | Varying n for Different Dimensions

(Plot: success vs. dimension N, with n proportional to N)
- Although the success rate for storing different numbers of patterns decreases as we increase p, the behaviour of the graphs remains the same.

Storage Capacity of the Network

Similarly, how many patterns can we store before the error becomes too high?
- Hopfield, in the original paper, mentions that experimentally this number is around 0.15N, i.e.

  p_{max} \approx 0.15\, N

- The other states are not stored as stable patterns and may be difficult to retrieve.
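A quick way to probe this 0.15N figure is to check, as p grows, what fraction of the stored patterns remain exact fixed points; a sketch under the assumption of independent random patterns:

import numpy as np

def fraction_stable(N=100, p=10, seed=0):
    """Fraction of the p stored random patterns that are fixed points of the network."""
    rng = np.random.default_rng(seed)
    X = rng.choice([-1, 1], size=(p, N))
    W = X.T @ X / N
    np.fill_diagonal(W, 0.0)
    return np.mean([np.array_equal(np.where(W @ x >= 0, 1, -1), x) for x in X])

for p in (5, 10, 15, 20, 30):
    print(p, fraction_stable(N=100, p=p))      # stability degrades as p approaches ~0.15 N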

Storage Capacity of the Network | Learning

(Plot: success vs. number of patterns p, for a fixed dimension N)
- There is an exponential decay.
- The number of patterns that can be stored reliably is roughly 15-20.

Storage Capacity of the Network | Learning

(Plot: success vs. number of patterns p, for different dimensions N)

Spurious States
- Apart from the stored attractors, we also have so-called spurious states.
- Any linear combination of an odd number of stored patterns gives rise to the so-called mixture states.
- For large p, we get local minima that are not correlated with any linear combination of the stored patterns.
- Starting close to any of these spurious attractors, the network will converge to them.
Energy Landscape

(Figure: energy landscape of the network)

Strong Patterns
How can we increase the success of recalling a pattern?
- We know that human memory becomes stronger when something is learned multiple times. A pattern learned many times is a strong pattern.
- The degree, i.e. the number of times the pattern is learned, is called the multiplicity of the pattern.
- Strong patterns are strongly stable and have large basins of attraction.
- It seems that a strong pattern with multiplicity d can be retrieved in the presence of up to about 0.138\, d^2 N simple patterns.
- For d = 3 and dimension N = 100, the number of simple patterns that can be stored alongside it is 100 \times 0.138 \times 3^2 \approx 124. This is a massive increase over just storing 0.15N, which is around 15 simple patterns.
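In the Hebb sum, a strong pattern of multiplicity d simply enters the sum d times; a minimal sketch of building such a weight matrix (the sizes and d below are illustrative):

import numpy as np

def hebb_with_strong(simple, strong, d):
    """Hebbian weights in which `strong` is learned d times (multiplicity d)."""
    X = np.vstack([simple, np.tile(strong, (d, 1))])   # strong pattern repeated d times
    N = X.shape[1]
    W = X.T @ X / N
    np.fill_diagonal(W, 0.0)
    return W

rng = np.random.default_rng(2)
N = 100
simple = rng.choice([-1, 1], size=(100, N))    # 100 simple patterns
strong = rng.choice([-1, 1], size=N)           # one strong pattern, multiplicity d = 3
W = hebb_with_strong(simple, strong, d=3)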

Strong Patterns | Learning

(Plot: strong patterns, N = 110, with d = 3, d = 7, and d_1 = 4, d_2 = 3)

Strong Patterns | Learning

(Plot: strong patterns, d = 3, for different dimensions)

Strong Patterns | Learning

(Plot: strong patterns, d = 7, for different dimensions)
- Qualitatively the graphs are the same, but with a higher success rate. Why?

Strong Patterns | Learning

(Plot: success vs. changing n for N = 100 and p = 100)

Strong Patterns | Learning

(Plot: success vs. changing n for N = 100 and p = 120)

What can we infer?
- The more times we learn a pattern, the higher the success rate.
- The less distorted the input pattern, the higher the success rate.
- The more patterns we try to memorize, the lower the success rate.

Sequential Pattern Recognition


Until now we have been storing only static patterns.
- Each sequence of patterns is stored as a trajectory attractor in the energy landscape.

(Figure: energy landscape of a sequential pattern)

- Assume m pattern vectors S^0, S^1, ..., S^{m-1}, each S^{\mu} = (s_1, ..., s_N) with s_i = \pm 1, to be orthogonal to each other, i.e. (S^{\mu})^T S^{\nu} = 0 for all \mu \ne \nu.

Sequential Pattern Recognition


- Then the sequence S^0 \to S^1 \to ... \to S^{m-1} is stored in the network by setting

  W = \frac{1}{N} \sum_{\mu=0}^{m-2} S^{\mu+1} (S^{\mu})^T

- When S^0, or a similar pattern, is given as the initial state X(0),

  X(1) = \mathrm{sgn}\Big( \frac{1}{N} \sum_{\mu=0}^{m-2} \big( (S^{\mu})^T X(0) \big)\, S^{\mu+1} \Big) = S^1,

  that is, S^1 is recalled at t = 1.
- Similarly X(2) = S^2, X(3) = S^3, ..., i.e. the whole stored sequence is recalled.
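A sketch of this sequence recall, using a small set of exactly orthogonal +-1 patterns built with the Sylvester (Hadamard) construction and a synchronous sign update; the choice N = 8, m = 4 is just for illustration:

import numpy as np

# N = 8 mutually orthogonal bipolar vectors via the Sylvester/Hadamard construction
H = np.array([[1]])
for _ in range(3):
    H = np.kron(np.array([[1, 1], [1, -1]]), H)
S = H[:4]                                      # the sequence S^0, S^1, S^2, S^3
N = S.shape[1]

# Asymmetric weights that map each pattern onto its successor
W = sum(np.outer(S[mu + 1], S[mu]) for mu in range(len(S) - 1)) / N

x = S[0].copy()                                # start the network on S^0
for t in range(1, len(S)):
    x = np.where(W @ x >= 0, 1, -1)            # synchronous update: X(t) = sgn(W X(t-1))
    assert np.array_equal(x, S[t])             # S^1, S^2, S^3 are recalled in order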

Future Work
Sequential Pattern Recognition
- Try to implement the cross-correlation matrix memory method and see how many patterns can be stored in a sequence.
- The cross-correlation matrix memory method is not one of the best methods for sequential pattern recognition, as its capacity is only around 0.1N.
- It may also fail if the initial condition is not the first pattern that we stored.
- So, try to study other models which perform much better.

Static Pattern Recognition
- Are there better models that can store patterns more efficiently?
- Why? The storage capacity of the Hopfield model is quite low.
- The computing time needed to get the statistics is quite high.

References

- David J. C. MacKay, Information Theory, Inference and Learning Algorithms, Cambridge University Press.
- J. J. Hopfield (1982), Neural Networks and Physical Systems with Emergent Collective Computational Abilities.
- Donq-Liang Lee (2002), Pattern Sequence Recognition Using a Time-Varying Hopfield Network.
- Masahiko Morita (1996), Memory and Learning of Sequential Patterns by Nonmonotone Neural Networks.

Thank You.
