
Storing Capacity of the Hopfield Network

Sachin H B
IISER Kolkata

November 24, 2016

Supervisor : Dr. Anandamohan Ghosh

Artificial Neural Network | Introduction


Motivation
- Inspired by the biological neuron.

Biological Neural Network (Neural Pathway)
- It is a series of interconnected neurons whose activation defines a recognizable linear pathway.
- The interface through which neurons interact with their neighbours usually consists of several axon terminals connected via synapses to the dendrites of other neurons.
- If the sum of the input signals into a neuron surpasses a certain threshold, the neuron fires an action potential at the axon hillock and transmits this electrical signal along the axon.

Artificial Neuron | Mathematical Model


Perceptron
- A computational model that tries to mimic the biological neuron.
- A weighted summation is performed over the inputs.
- A (non-linear) function then acts on this weighted summation.

Weights and Biases
- Weights determine the level of importance of a particular input.
- Biases determine the ease with which a neuron can fire, i.e., respond with an output above some threshold.
- The perceptron is like a device that makes decisions by weighing up the evidence.
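To make the weighted-summation-plus-threshold idea concrete, here is a minimal NumPy sketch of a single perceptron; the weights, bias and step activation below are illustrative choices, not values from the slides.

import numpy as np

def perceptron(x, w, b):
    """Weighted summation of the inputs, followed by a hard threshold."""
    s = np.dot(w, x) + b            # weighted sum plus bias
    return 1 if s >= 0 else 0       # fire only if the sum crosses the threshold

# Illustrative weights/bias: the decision leans more on the first input
w = np.array([0.6, 0.4])
b = -0.5
print(perceptron(np.array([1, 1]), w, b))   # 1: enough combined evidence
print(perceptron(np.array([0, 0]), w, b))   # 0: below threshold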

Human Memory
- Memory in a computer is stored at an address; recalling it without the complete address is not possible.
- Biological (human) memory is content addressable.
- There is no single location in the brain's neural network for a particular memory, say of an individual.
- Rather, the memory of that individual is retrieved by a string of associations about their physical features, personality characteristics and social relations, which are dealt with by different parts of the brain.
- Human beings are also able to recall a memory fully by remembering only particular aspects or features of it.

The Hopfield Model

- The Hopfield network is an artificial neural network proposed by John Hopfield to store and retrieve memories the way a human brain does.
- The state of a neuron is either +1 (firing at maximum rate) or -1/0 (not firing).
- The network is first trained to store a number of patterns (memories).
- It can then retrieve a stored pattern when given a distorted version of it as input.
- It works as an associative memory, just like the human brain.

The Hopfield Model | Architecture

Organization of the network
- It is a single-layer, fully connected feedback network: every neuron is connected to every other neuron, but there are no self-connections.
- It has N neurons in total, each node taking binary values x_i = \pm 1 or x_i \in \{0, 1\}.
- For any two neurons i and j, the connection weight between them is symmetric, W_{ij} = W_{ji}, and W_{ii} = 0.

Training the Network

How do we learn the weights?
- Inspired by the biological learning rule, the Hebb rule.
- We would like to store the pattern V = (x_1, x_2, ..., x_N), with x_i \in \{-1, +1\}, where N is the dimension of the network.
- Then the weights are

  W_{ij} = \eta\, x_i x_j

- We choose the learning rate \eta = \frac{1}{N}.
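A minimal NumPy sketch of this one-pattern learning step (the example pattern is arbitrary; the zeroed diagonal enforces W_ii = 0):

import numpy as np

def hebb_weights(x):
    """Hebb rule for one bipolar pattern x: W_ij = (1/N) x_i x_j, with W_ii = 0."""
    N = len(x)
    W = np.outer(x, x) / N
    np.fill_diagonal(W, 0.0)        # no self-connections
    return W

x = np.array([1, -1, 1, 1, -1])     # example pattern, x_i in {-1, +1}
W = hebb_weights(x)
assert np.allclose(W, W.T)          # the weights come out symmetric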

The Hebb Rule

How are neurons connected in the network?
- "When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased."
- Neurons that fire together, wire together. Neurons that fire out of sync, fail to link.

Updating
How do we update the network?
- Asynchronously: at each point in time, update one node.
- Synchronously: at every time step, update all nodes together.
- Asynchronous updating is more biologically realistic, so we update one node at a time.
- The nodes may be selected at random or in a given sequence.
- The state is updated accordingly:

  h_i = \sum_{j=1}^{N} W_{ij} x_j

  then

  V_i = \begin{cases} 1, & h_i \ge 0 \\ 0, & h_i < 0 \end{cases}

- We keep doing this until the system reaches a stable state.
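A sketch of this asynchronous updating loop, written here for bipolar states in {-1, +1} (for the {0, 1} convention used on some slides only the else branch changes):

import numpy as np

def update_async(W, x, n_updates=1000, rng=None):
    """Repeatedly pick a random node and threshold its local field."""
    rng = np.random.default_rng() if rng is None else rng
    x = x.copy()
    for _ in range(n_updates):
        i = rng.integers(len(x))       # node chosen at random
        h = W[i] @ x                   # local field h_i = sum_j W_ij x_j
        x[i] = 1 if h >= 0 else -1     # use 0 here instead of -1 for {0, 1} states
    return x

In practice the loop is stopped once a full sweep over the nodes produces no changes, i.e. once the state is stable.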

Energy of the Network


- Hopfield networks have an energy function that decreases, or stays unchanged, under asynchronous updating.
- The energy associated with each node is

  E_i = -\frac{1}{2} h_i x_i

- The energy of the whole network is then

  E(x) = \sum_i E_i = -\frac{1}{2} \sum_i h_i x_i = -\frac{1}{2} \sum_i \sum_{j=1}^{N} W_{ij} x_i x_j
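The corresponding energy computation is a one-liner; the factor 1/2 corrects for counting each symmetric pair (i, j) twice:

import numpy as np

def energy(W, x):
    """Network energy E(x) = -1/2 * sum_{i,j} W_ij x_i x_j."""
    return -0.5 * x @ W @ x

# Under the asynchronous update above this quantity never increases,
# so the dynamics settle into a local minimum of E.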

Hopfield Model | An Example


- Let's say we have a 5-node Hopfield net and want it to recognize the pattern (0, 1, 1, 0, 1).
- Since there are 5 nodes in the net, we need a 5 x 5 matrix of weights:

  W = \begin{pmatrix} 0 & -1 & -1 & 1 & -1 \\ -1 & 0 & 1 & -1 & 1 \\ -1 & 1 & 0 & -1 & 1 \\ 1 & -1 & -1 & 0 & -1 \\ -1 & 1 & 1 & -1 & 0 \end{pmatrix}

- Now we want to see if it can recognize (1, 1, 1, 0, 1), which differs from the stored pattern by only 1 bit.

Hopfield Model | Example


We start updating the pattern (1, 1, 1, 0, 1) using the weight matrix, with V_i^{in} = \sum_j W_{ij} V_j and V_i = 1 if V_i^{in} \ge 0, else V_i = 0. We update the nodes in random order.
- Take node 4: V_4^{in} = (1 \cdot 1) + (-1 \cdot 1) + (-1 \cdot 1) + (-1 \cdot 1) = -2.
- Since -2 < 0, V_4 = 0, giving (1, 1, 1, 0, 1) (it did not change).
- Take this updated state as the input now.
- V_2^{in} = 1; since 1 \ge 0, V_2 = 1, giving (1, 1, 1, 0, 1) (it did not change).
- V_1^{in} = -3; since -3 < 0, V_1 = 0, giving (0, 1, 1, 0, 1) (it changed).
- V_4^{in} = -3; since -3 < 0, V_4 = 0, giving (0, 1, 1, 0, 1) (it did not change).
- V_5^{in} = 2; since 2 \ge 0, V_5 = 1, giving (0, 1, 1, 0, 1) (it did not change).
- So the network converged to (0, 1, 1, 0, 1), the pattern we initially stored.
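A short script reproducing this worked example (a sketch: the weight matrix is built from the bipolar version of (0, 1, 1, 0, 1), the 0/1 thresholding rule above is used, and the node ordering simply mirrors the steps listed):

import numpy as np

v = np.array([0, 1, 1, 0, 1])        # pattern to store
x = 2 * v - 1                        # bipolar version: (-1, 1, 1, -1, 1)
W = np.outer(x, x)
np.fill_diagonal(W, 0)               # this is exactly the 5 x 5 matrix above

state = np.array([1, 1, 1, 0, 1])    # probe pattern, one bit away from v
for i in [3, 1, 0, 3, 4]:            # nodes 4, 2, 1, 4, 5 in the slide's 1-based numbering
    h = W[i] @ state                 # V_i^in
    state[i] = 1 if h >= 0 else 0
print(state)                         # [0 1 1 0 1]: the stored pattern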

Storing a Single Pattern | The problem

We shall now try to store a single pattern V = [V_1, V_2, ..., V_N].
- The condition for V to be a fixed point of the network is

  V_i = \mathrm{sgn}\Big(\sum_j W_{ij} V_j\Big) \quad \forall\, i \in \{1, 2, ..., N\}

- If the network is initialized with a pattern S = [S_1, S_2, ..., S_N] that is close, but not identical, to V, the network will still converge to V.
- Let n be the number of nodes in which S differs from V.
- But how does the success of retrieval change with varying n?
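This question can be put to the test numerically; the sketch below (an assumed reconstruction of the experiment behind the following plots, not the original script) stores one random pattern, flips n of its bits, runs asynchronous updates, and records how often the original pattern is recovered.

import numpy as np

def single_pattern_success(N=35, n=3, trials=200, sweeps=10, seed=0):
    """Fraction of trials in which an n-bit-distorted probe converges back to the stored pattern."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(trials):
        v = rng.choice([-1, 1], size=N)                 # pattern to store
        W = np.outer(v, v) / N
        np.fill_diagonal(W, 0.0)
        s = v.copy()
        s[rng.choice(N, size=n, replace=False)] *= -1   # flip n randomly chosen bits
        for _ in range(sweeps * N):                     # asynchronous updates
            i = rng.integers(N)
            s[i] = 1 if W[i] @ s >= 0 else -1
        hits += np.array_equal(s, v)
    return hits / trials

for n in (1, 5, 10, 17):
    print(n, single_pattern_success(N=35, n=n))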

Storing a Single Pattern | Varying n

(Plot: retrieval success vs. n)

Storing a Single Pattern | Varying n for Different Dimensions

(Plot: retrieval success vs. n for different dimensions N)

Storing Multiple Patterns


More generally, we have many patterns to store, but how do we learn those patterns?
- The trick is to calculate the weight matrix for each pattern as if the other patterns did not exist.
- Suppose we wish to store V^{(1)} and V^{(2)}. Then

  W_{ij}^{(1)} = \frac{1}{N} x_i^{(1)} x_j^{(1)} \quad \text{and} \quad W_{ij}^{(2)} = \frac{1}{N} x_i^{(2)} x_j^{(2)},

  and we add them:

  W_{ij} = W_{ij}^{(1)} + W_{ij}^{(2)}

- For p patterns the generalized Hebb rule becomes

  W_{ij} = \frac{1}{N} \sum_{k=1}^{p} x_i^{(k)} x_j^{(k)},

  where x_i^{(k)} is the value of node i in the k-th pattern.
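A sketch of this generalized Hebb rule in NumPy, with the p patterns stored as the rows of a p x N array of +-1 values:

import numpy as np

def hebb_weights_many(patterns):
    """W_ij = (1/N) * sum_k x_i^(k) x_j^(k), with W_ii = 0."""
    X = np.asarray(patterns)
    p, N = X.shape
    W = X.T @ X / N                  # adds up the p outer products in one step
    np.fill_diagonal(W, 0.0)
    return W

# Example: store p = 3 random patterns in an N = 20 node network
rng = np.random.default_rng(0)
W = hebb_weights_many(rng.choice([-1, 1], size=(3, 20)))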

Multiple Patterns | Stability of Stored Patterns


- We will have a fixed point V^{(l)} for each l iff \mathrm{sgn}(h_i^{(l)}) = x_i^{(l)} for all 1 \le i \le N, where

  h_i^{(l)} = \sum_{j=1}^{N} W_{ij} x_j^{(l)} = \frac{1}{N} \sum_{j=1}^{N} \sum_{k=1}^{p} x_i^{(k)} x_j^{(k)} x_j^{(l)}

- Separating out the k = l term,

  h_i^{(l)} = x_i^{(l)} + \frac{1}{N} \sum_{j=1}^{N} \sum_{k \ne l} x_i^{(k)} x_j^{(k)} x_j^{(l)},

  where the second term is called the crosstalk term, \epsilon_i^{(l)}.
- The crucial idea is that if the crosstalk term is smaller than 1 in magnitude, it cannot change the sign of h_i^{(l)}, and hence we can conclude that V^{(l)} is a fixed point of the network dynamics.

Multiple Patterns | Stability of Stored Patterns

- \epsilon_i^{(l)} is a random variable: it is 1/N times a sum of about Np terms, each equally likely to be +1 or -1.
- The crosstalk term therefore has a distribution with mean 0 and variance p/N.
- As long as the ratio p/N is small enough, the stored patterns are stable fixed points of the network.
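This variance claim can be checked directly by sampling the crosstalk term for independent random +-1 patterns; a small sketch (the choice N = 100, p = 10 is arbitrary):

import numpy as np

rng = np.random.default_rng(1)
N, p, trials = 100, 10, 2000
samples = []
for _ in range(trials):
    X = rng.choice([-1, 1], size=(p, N))      # p random patterns of dimension N
    # crosstalk at node i = 0 of pattern l = 0:
    #   (1/N) * sum_{j != i} sum_{k != l} x_i^(k) x_j^(k) x_j^(l)
    samples.append((X[1:, 0] * (X[1:, 1:] @ X[0, 1:])).sum() / N)
print(np.mean(samples), np.var(samples))       # mean ~ 0, variance roughly p/N = 0.1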

Multiple Patterns | Varying n

But how does the success of retrieval change with varying n?

(Plot: success vs. n for dimension N = 35)

Multiple Patterns | Varying n for Different Dimensions

Similarly, how does the success change for different dimensions?

(Plot: success vs. n for different dimensions N)


Multiple Patterns | Varying n for Different Dimensions

(Plot: success vs. dimension N, with n proportional to N)
- Although the success rate for storing different numbers of patterns decreases as we increase p, the behaviour of the graphs remains the same.

Storage Capacity of the Network

Similarly, how many patterns can we store before the error becomes too high?
- Hopfield, in the original paper, mentions that experimentally this number is around 0.15N, i.e.

  p_{max} \approx 0.15\, N

- The other states are not stored as stable patterns and may be difficult to retrieve.
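A quick way to probe this 0.15N figure is to check, as p grows, what fraction of the stored patterns remain exact fixed points; a sketch under the assumption of independent random patterns:

import numpy as np

def fraction_stable(N=100, p=10, seed=0):
    """Fraction of the p stored random patterns that are fixed points of the network."""
    rng = np.random.default_rng(seed)
    X = rng.choice([-1, 1], size=(p, N))
    W = X.T @ X / N
    np.fill_diagonal(W, 0.0)
    return np.mean([np.array_equal(np.where(W @ x >= 0, 1, -1), x) for x in X])

for p in (5, 10, 15, 20, 30):
    print(p, fraction_stable(N=100, p=p))      # stability degrades as p approaches ~0.15 N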

Storage Capacity of the Network | Learning

(Plot: success vs. number of patterns p, for a fixed dimension N)
- There is an exponential decay.
- The number of patterns that can be stored reliably is roughly 15-20.

Storage Capacity of the Network | Learning

(Plot: success vs. number of patterns p, for different dimensions N)

Spurious States
- Apart from the stored attractors, we also have so-called spurious states.
- Any linear combination of an odd number of stored patterns gives rise to the so-called mixture states.
- For large p, we get local minima that are not correlated with any linear combination of the stored patterns.
- Starting close to any of these spurious attractors, the network will converge to them.
Energy Landscape

(Figure: energy landscape of the network)

Strong Patterns
How can we increase the success of recalling a pattern?
- We know that human memory becomes stronger when something is learned multiple times. A pattern learned many times is a strong pattern.
- The degree, i.e. the number of times the pattern is learned, is called the multiplicity of the pattern.
- Strong patterns are strongly stable and have large basins of attraction.
- It seems that a strong pattern with multiplicity d can be retrieved in the presence of up to about 0.138\, d^2 N simple patterns.
- For d = 3 and dimension N = 100, the number of simple patterns that can be stored alongside it is 100 \times 0.138 \times 3^2 \approx 124. This is a massive increase over just storing 0.15N, which is around 15 simple patterns.
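In the Hebb sum, a strong pattern of multiplicity d simply enters the sum d times; a minimal sketch of building such a weight matrix (the sizes and d below are illustrative):

import numpy as np

def hebb_with_strong(simple, strong, d):
    """Hebbian weights in which `strong` is learned d times (multiplicity d)."""
    X = np.vstack([simple, np.tile(strong, (d, 1))])   # strong pattern repeated d times
    N = X.shape[1]
    W = X.T @ X / N
    np.fill_diagonal(W, 0.0)
    return W

rng = np.random.default_rng(2)
N = 100
simple = rng.choice([-1, 1], size=(100, N))    # 100 simple patterns
strong = rng.choice([-1, 1], size=N)           # one strong pattern, multiplicity d = 3
W = hebb_with_strong(simple, strong, d=3)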

Strong Patterns | Learning

(Plot: strong patterns, N = 110, with d = 3, d = 7, and d_1 = 4, d_2 = 3)

Strong Patterns | Learning

(Plot: strong patterns, d = 3, for different dimensions)

Strong Patterns | Learning

(Plot: strong patterns, d = 7, for different dimensions)
- Qualitatively the graphs are the same, but with a higher success rate. Why?

Strong Patterns | Learning

(Plot: success vs. changing n for N = 100 and p = 100)

Strong Patterns | Learning

(Plot: success vs. changing n for N = 100 and p = 120)

What can we infer?
- The more times we learn a pattern, the higher the success rate.
- The less distorted the input pattern, the higher the success rate.
- The more patterns we try to memorize, the lower the success rate.

Sequential Pattern Recognition


Until now we have been storing only static patterns.
- Each sequence of patterns is stored as a trajectory attractor in the energy landscape.

(Figure: energy landscape of a sequential pattern)

- Assume m pattern vectors S^0, S^1, ..., S^{m-1}, each S^{\mu} = (s_1, ..., s_N) with s_i = \pm 1, to be orthogonal to each other, i.e. (S^{\mu})^T S^{\nu} = 0 for all \mu \ne \nu.

Sequential Pattern Recognition


- Then the sequence S^0 \to S^1 \to ... \to S^{m-1} is stored in the network by setting

  W = \frac{1}{N} \sum_{\mu=0}^{m-2} S^{\mu+1} (S^{\mu})^T

- When S^0, or a similar pattern, is given as the initial state X(0),

  X(1) = \mathrm{sgn}\Big( \frac{1}{N} \sum_{\mu=0}^{m-2} \big( (S^{\mu})^T X(0) \big)\, S^{\mu+1} \Big) = S^1,

  that is, S^1 is recalled at t = 1.
- Similarly X(2) = S^2, X(3) = S^3, ..., i.e. the whole stored sequence is recalled.
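A sketch of this sequence recall, using a small set of exactly orthogonal +-1 patterns built with the Sylvester (Hadamard) construction and a synchronous sign update; the choice N = 8, m = 4 is just for illustration:

import numpy as np

# N = 8 mutually orthogonal bipolar vectors via the Sylvester/Hadamard construction
H = np.array([[1]])
for _ in range(3):
    H = np.kron(np.array([[1, 1], [1, -1]]), H)
S = H[:4]                                      # the sequence S^0, S^1, S^2, S^3
N = S.shape[1]

# Asymmetric weights that map each pattern onto its successor
W = sum(np.outer(S[mu + 1], S[mu]) for mu in range(len(S) - 1)) / N

x = S[0].copy()                                # start the network on S^0
for t in range(1, len(S)):
    x = np.where(W @ x >= 0, 1, -1)            # synchronous update: X(t) = sgn(W X(t-1))
    assert np.array_equal(x, S[t])             # S^1, S^2, S^3 are recalled in order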

Future Work
Sequential Pattern Recognition
- Try to implement the cross-correlation matrix memory method and see how many patterns can be stored in a sequence.
- The cross-correlation matrix memory method is not one of the best methods for sequential pattern recognition, as its capacity is only around 0.1N.
- It may also fail if the initial condition is not the first pattern that we stored.
- So, try to study other models which perform much better.

Static Pattern Recognition
- Are there better models that can store patterns more efficiently?
- Why? The storage capacity of the Hopfield model is quite low.
- The computing time needed to get the statistics is quite high.

References

- David J. C. MacKay, Information Theory, Inference and Learning Algorithms, Cambridge University Press.
- J. J. Hopfield (1982), Neural Networks and Physical Systems with Emergent Collective Computational Abilities.
- Donq-Liang Lee (2002), Pattern Sequence Recognition Using a Time-Varying Hopfield Network.
- Masahiko Morita (1996), Memory and Learning of Sequential Patterns by Nonmonotone Neural Networks.

Thank You.
