Neural Networks for Pattern Classification
General discussion
Pattern recognition
Patterns: images, personal records, driving habits, etc.
Represented as a vector of features (encoded as integers or real
numbers in NN)
Pattern classification:
Classify a pattern to one of the given classes
Form pattern classes
Pattern associative recall
Using a pattern to recall a related pattern
Pattern completion: using a partial pattern to recall the whole
pattern
Pattern recovery: deals with noise, distortion, missing
information
General architecture
Single layer: input units x_1, . . ., x_n with weights w_1, . . ., w_n, a bias b (from a constant input of 1), and a single output unit Y.
Net input to Y:  net = b + Σ_{i=1}^{n} x_i w_i
Decision region/boundary
For n = 2, b != 0, θ = 0, the equation
b + x_1 w_1 + x_2 w_2 = 0,   or   x_2 = -(w_1 / w_2) x_1 - b / w_2,
is a line, called the decision boundary, which partitions the plane into two decision regions (marked + and -).
If a point/pattern (x_1, x_2) is in the positive region, then b + x_1 w_1 + x_2 w_2 > 0, and the output is one (the pattern belongs to class one).
This simple network can thus classify a given pattern into one of the two classes, provided one of the two classes lies entirely in one decision region (one side of the decision boundary) and the other class lies in the other region.
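To make the geometry concrete, here is a minimal sketch of such a single-layer classifier; the weights, bias, and test points are made-up values, not taken from the slides:

```python
import numpy as np

# Hypothetical weights and bias for a single-layer unit with two inputs
w = np.array([1.0, 1.0])      # w_1, w_2
b = -1.5                      # bias

def classify(x):
    """Return 1 if the pattern lies in the positive decision region, else 0."""
    net = b + x @ w           # net = b + sum_i x_i * w_i
    return 1 if net > 0 else 0

# Points on either side of the line b + x_1 w_1 + x_2 w_2 = 0
print(classify(np.array([1.0, 1.0])))   # net =  0.5 > 0 -> class one
print(classify(np.array([0.0, 0.0])))   # net = -1.5 < 0 -> class two
```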
Figure: a three-layer network can form arbitrary decision regions (complexity limited by the number of nodes), illustrated here with two classes A and B.
Hebb Nets
Hebb, in his influential book The Organization of Behavior (1949), claimed:
Behavior changes are primarily due to changes of the synaptic strengths (w_ij) between neurons i and j.
w_ij increases only when both i and j are on: the Hebbian learning law.
In ANNs, the Hebbian law can be stated as: w_ij increases only if the outputs of both units x_i and y_j have the same sign.
In our simple network (one output unit, n input units, and a bias unit):
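A minimal sketch of the Hebbian update for such a network, assuming the common form Δw_i = x_i · y with the bias treated as a constant input of 1 (the pattern and target below are illustrative, not from the slides):

```python
import numpy as np

def hebb_update(w, b, x, y):
    """One Hebbian step: w_i grows when input x_i and output y have the same sign."""
    w = w + x * y      # delta w_i = x_i * y
    b = b + y          # the bias unit is treated as a constant input of 1
    return w, b

# Illustrative bipolar training pattern and target
w, b = np.zeros(2), 0.0
w, b = hebb_update(w, b, x=np.array([1, -1]), y=1)
print(w, b)            # [ 1. -1.] 1.0
```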
The Perceptron
In 1958, Frank Rosenblatt introduced a training algorithm that
provided the first procedure for training a simple ANN: a
perceptron.
Figure: single-neuron perceptron: inputs x_1 and x_2 with weights w_1 and w_2 feed a linear combiner followed by a hard limiter with threshold θ, producing the output Y.
The aim of the perceptron is to classify inputs, x1, x2, . . ., xn, into
one of two classes, say A1 and A2.
In the case of an elementary perceptron, the n-dimensional space is divided by a hyperplane into two decision regions. The hyperplane is defined by the linearly separable function:
Σ_{i=1}^{n} x_i w_i - θ = 0
Figure: linear separability in the perceptron: the decision boundary divides the input space into two regions, one containing class A1 and the other containing class A2.
e(p) = Y_d(p) - Y(p),   where p = 1, 2, 3, . . .
The perceptron learning rule:
w_i(p+1) = w_i(p) + α · x_i(p) · e(p),   where p = 1, 2, 3, . . . and α is the learning rate
Step 2: Activation
Activate the perceptron by applying inputs x_1(p), x_2(p), . . ., x_n(p) and desired output Y_d(p). Calculate the actual output at iteration p = 1:
Y(p) = step[ Σ_{i=1}^{n} x_i(p) w_i(p) - θ ]
Step 3: Weight training
Update the weights of the perceptron:
w_i(p+1) = w_i(p) + Δw_i(p)
where Δw_i(p) = α · x_i(p) · e(p) is the weight correction at iteration p.
Step 4: Iteration
Increase iteration p by one, go back to Step 2 and repeat the
process until convergence.
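Putting the four steps together, here is a self-contained sketch of the training algorithm; the learning rate, threshold, and the AND training set are illustrative choices, not values from the slides:

```python
import numpy as np

def train_perceptron(X, Yd, alpha=0.1, theta=0.2, max_epochs=100, seed=0):
    """Perceptron training: Y(p) = step[sum_i x_i(p) w_i(p) - theta],
    e(p) = Yd(p) - Y(p),  w_i(p+1) = w_i(p) + alpha * x_i(p) * e(p)."""
    rng = np.random.default_rng(seed)
    w = rng.uniform(-0.5, 0.5, X.shape[1])       # small random initial weights
    for _ in range(max_epochs):
        total_error = 0
        for x, yd in zip(X, Yd):
            y = 1 if x @ w - theta >= 0 else 0   # Step 2: activation (hard limiter)
            e = yd - y                           # error at this iteration
            w = w + alpha * x * e                # Step 3: weight training
            total_error += abs(e)
        if total_error == 0:                     # Step 4: iterate until convergence
            break
    return w

# Usage: the logical AND operation, which is linearly separable
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
print(train_perceptron(X, np.array([0, 0, 0, 1])))
```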
Figure: two-dimensional plots of the AND, OR, and Exclusive-OR operations in the (x_1, x_2) plane.
A perceptron can learn the operations AND and OR, but not
Exclusive-OR.
Figure: multilayer perceptron with an input layer, two hidden layers, and an output layer; input signals enter at the input layer and output signals emerge from the output layer.
Figure: three-layer back-propagation network with input neurons i = 1, . . ., n, hidden neurons j = 1, . . ., m, and output neurons k = 1, . . ., l; weights w_ij connect the input and hidden layers, weights w_jk connect the hidden and output layers, and error signals propagate backwards from the output layer.
Step 1: Initialization
Set all the weights and threshold levels of the network to random numbers uniformly distributed inside a small range:
( -2.4 / F_i , +2.4 / F_i )
where F_i is the total number of inputs of neuron i.
Step 2: Activation
Activate the back-propagation neural network by applying inputs x_1(p), x_2(p), . . ., x_n(p) and desired outputs y_{d,1}(p), y_{d,2}(p), . . ., y_{d,n}(p).
(a) Calculate the actual outputs of the neurons in the hidden layer:
y_j(p) = sigmoid[ Σ_{i=1}^{n} x_i(p) w_ij(p) - θ_j ]
(b) Calculate the actual outputs of the neurons in the output layer:
y_k(p) = sigmoid[ Σ_{j=1}^{m} y_j(p) w_jk(p) - θ_k ]
Step 3: Weight training
(a) Calculate the error gradient for the neurons in the output layer:
δ_k(p) = y_k(p) · [1 - y_k(p)] · e_k(p)
where e_k(p) = y_{d,k}(p) - y_k(p)
Calculate the weight corrections:
Δw_jk(p) = α · y_j(p) · δ_k(p)
Update the weights at the output neurons:
w_jk(p+1) = w_jk(p) + Δw_jk(p)
(b) Calculate the error gradient for the neurons in the hidden layer:
δ_j(p) = y_j(p) · [1 - y_j(p)] · Σ_{k=1}^{l} δ_k(p) w_jk(p)
Calculate the weight corrections:
Δw_ij(p) = α · x_i(p) · δ_j(p)
Update the weights at the hidden neurons:
w_ij(p+1) = w_ij(p) + Δw_ij(p)
Step 4: Iteration
Increase iteration p by one, go back to Step 2 and repeat the
process until the selected error criterion is satisfied.
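A compact sketch of the whole procedure; the network size, learning rate, epoch limit, and the Exclusive-OR data set below are assumptions chosen to match the worked example that follows, not values from the slides:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_backprop(X, Yd, n_hidden=2, alpha=0.1, epochs=5000, seed=0):
    """Sequential back-propagation with sigmoid units; thresholds are treated
    as weights on a constant input of -1."""
    rng = np.random.default_rng(seed)
    n_in, n_out = X.shape[1], Yd.shape[1]
    # Step 1: initialize weights and thresholds to small random values
    W1 = rng.uniform(-2.4 / n_in, 2.4 / n_in, (n_in, n_hidden))
    W2 = rng.uniform(-2.4 / n_hidden, 2.4 / n_hidden, (n_hidden, n_out))
    th1 = rng.uniform(-0.5, 0.5, n_hidden)
    th2 = rng.uniform(-0.5, 0.5, n_out)
    for _ in range(epochs):
        for x, yd in zip(X, Yd):
            # Step 2: forward pass through the hidden and output layers
            yh = sigmoid(x @ W1 - th1)
            y = sigmoid(yh @ W2 - th2)
            # Step 3: error gradients and weight corrections
            dk = y * (1 - y) * (yd - y)          # output-layer gradients
            dj = yh * (1 - yh) * (W2 @ dk)       # hidden-layer gradients
            W2 += alpha * np.outer(yh, dk)
            th2 -= alpha * dk
            W1 += alpha * np.outer(x, dj)
            th1 -= alpha * dj
        # Step 4: in practice, stop when the sum-squared error is small enough
    return W1, th1, W2, th2

# Usage: the Exclusive-OR data used in the worked example that follows
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Yd = np.array([[0], [1], [1], [0]], dtype=float)
weights = train_backprop(X, Yd)
```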
Figure: three-layer back-propagation network for the Exclusive-OR operation: input neurons 1 and 2 (inputs x_1 and x_2), hidden neurons 3 and 4, output neuron 5 (output y_5); weights w_13, w_14, w_23, w_24 connect the input and hidden layers, and w_35, w_45 connect the hidden layer to the output neuron.
Figure: learning curve for the Exclusive-OR operation: sum-squared error (log scale, from 10^0 down to 10^-4) versus training epoch (0 to 200).
Figure: the network after training on the Exclusive-OR operation, with the final weight and threshold values shown on each connection.
Decision boundaries
Figure: the decision boundaries x_1 + x_2 - 1.5 = 0 and x_1 + x_2 - 0.5 = 0 constructed by the two hidden neurons, and the complete decision boundaries for the Exclusive-OR operation obtained by combining them.
Pattern Association and Associative Memory
Associative-Memory Networks
Input: Pattern (often noisy/corrupted)
Output: Corresponding pattern (complete / relatively noise-free)
Process
1. Load input pattern onto core group of highly-interconnected neurons.
2. Run core neurons until they reach a steady state.
3. Read output off of the states of the core neurons.
Hebb's Rule
Connection Weights ~ Correlations
"When one cell repeatedly assists in firing another, the axon of the first cell develops synaptic knobs (or enlarges them if they already exist) in contact with the soma of the second cell." (Hebb, 1949)
Matrix Memory:
The weights must store the average correlations between all pattern components
across all patterns. A net presented with a partial pattern can then use the
correlations to recreate the entire pattern.
Figure: the weight w_ab records the average correlation between pattern components a and b; these stored correlations are used to fill in the unknown components (marked ??) of a partial pattern.
Auto-Association: w_jk += i_pk · i_pj — when the two components are the same (different), increase (decrease) the weight.
Ideally, the weights will record the average correlations across all patterns:
Auto: w_jk = Σ_{p=1}^{P} i_pk · i_pj        Hetero: w_jk = Σ_{p=1}^{P} i_pk · o_pj
Hebbian Principle: If all the input patterns are known prior to retrieval time, then init weights as:
Auto: w_jk = (1/P) Σ_{p=1}^{P} i_pk · i_pj        Hetero: w_jk = (1/P) Σ_{p=1}^{P} i_pk · o_pj
Weights = Average Correlations
Matrix Representation
Let X = matrix of input patterns, where each ROW is a pattern. So xk,i = the ith bit
of the kth pattern.
Let Y = matrix of output patterns, where each ROW is a pattern. So yk,j = the jth
bit of the kth pattern.
Then, avg correlation between input bit i and output bit j across all patterns is:
1/P (x_1,i y_1,j + x_2,i y_2,j + · · · + x_P,i y_P,j) = w_i,j
Figure: with the patterns P_1, P_2, . . ., P_P stored as the rows of X (and of Y), the full weight matrix is the dot product X^T · Y, scaled by 1/P.
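A small NumPy sketch of this computation; the bipolar patterns below are made up for illustration:

```python
import numpy as np

# Rows are patterns: X[k, i] = i-th bit of the k-th input pattern
X = np.array([[ 1, -1,  1],
              [-1,  1,  1]])            # made-up bipolar input patterns
Y = np.array([[ 1, -1],
              [-1,  1]])                # made-up output patterns (hetero-association)
P = X.shape[0]

W_auto   = X.T @ X / P                  # w_jk = (1/P) * sum_p i_pk * i_pj
W_hetero = X.T @ Y / P                  # w_jk = (1/P) * sum_p i_pk * o_pj

# Recall: present an input pattern and threshold the weighted sums
print(np.sign(X[0] @ W_hetero))         # reproduces Y[0] for these patterns
```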
Auto-Associative Memory
Figure: 1. the auto-associative patterns to remember; 3. retrieval of a stored pattern from a degraded version of itself.
Hetero-Associative Memory
Figure: 1. the hetero-associative pattern pairs to remember; 3. retrieval of the associated output pattern from an input pattern.
Hopfield Networks
Y^sign = { +1, if X > 0
           -1, if X < 0
            Y, if X = 0
The current state of the Hopfield network is determined by the
current outputs of all neurons, y1, y2, . . ., yn.
Thus, for a single-layer n-neuron network, the state can be
defined by the state vector as:
Y = [ y_1, y_2, . . ., y_n ]^T
Retrieval Algorithm
The output update rule for the Hopfield autoassociative memory can be expressed in the form
y_i(p+1) = sign[ Σ_{j=1}^{n} w_ij y_j(p) - θ_i ]
where sign is the activation function defined above.
Storage Algorithm
In the Hopfield network, synaptic weights between neurons are usually represented in matrix form.
Assume that the bipolar binary prototype vectors that need to be stored are S(m), for m = 1, 2, . . ., p. The storage algorithm for calculating the weight matrix is
W = Σ_{m=1}^{p} S(m) S(m)^T - p I
where I is the identity matrix (so the self-connections on the diagonal are zero).
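A minimal NumPy sketch of these two steps (the zero thresholds, the synchronous update, and the convergence check are standard assumptions, not taken verbatim from the slides):

```python
import numpy as np

def store(prototypes):
    """W = sum_m S(m) S(m)^T - p*I  (bipolar prototypes; zero self-connections)."""
    S = np.array(prototypes)                 # each row is one prototype vector S(m)
    p, n = S.shape
    return S.T @ S - p * np.eye(n)

def retrieve(W, x, theta=None, max_iter=100):
    """Iterate y(p+1) = sign[W y(p) - theta], keeping y_i unchanged when its net input is 0."""
    theta = np.zeros(len(x)) if theta is None else theta
    y = np.array(x, dtype=float)
    for _ in range(max_iter):
        net = W @ y - theta
        y_new = np.where(net > 0, 1.0, np.where(net < 0, -1.0, y))
        if np.array_equal(y_new, y):         # a stable state has been reached
            return y_new
        y = y_new
    return y
```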
Figure: the possible states of the three-neuron Hopfield network, shown as the vertices of a cube in (y_1, y_2, y_3) space; the vertices (1, 1, 1) and (-1, -1, -1) are the stable states.
Hopfield Network
The stable state-vertex is determined by the weight matrix W, the
current input vector X, and the threshold matrix θ. If the input
vector is partially incorrect or incomplete, the initial state will
converge into the stable state-vertex after a few iterations.
For example, to store the two fundamental memories (1, 1, 1) and (-1, -1, -1):

Y_1 = [1, 1, 1]^T and Y_2 = [-1, -1, -1]^T,   or   Y_1^T = [1 1 1] and Y_2^T = [-1 -1 -1]

The weight matrix is

                                 | 1 1 1 |   | 1 1 1 |     | 1 0 0 |   | 0 2 2 |
W = Y_1 Y_1^T + Y_2 Y_2^T - 2I = | 1 1 1 | + | 1 1 1 | - 2 | 0 1 0 | = | 2 0 2 |
                                 | 1 1 1 |   | 1 1 1 |     | 0 0 1 |   | 2 2 0 |

Testing with the stored patterns confirms that both are stable states:

           | 0 2 2 | |  1 |   | 0 |       |  1 |
Y_1 = sign | 2 0 2 | |  1 | - | 0 |   =   |  1 |
           | 2 2 0 | |  1 |   | 0 |       |  1 |

           | 0 2 2 | | -1 |   | 0 |       | -1 |
Y_2 = sign | 2 0 2 | | -1 | - | 0 |   =   | -1 |
           | 2 2 0 | | -1 |   | 0 |       | -1 |
The remaining six states are all unstable. However, stable states
(also called fundamental memories) are capable of attracting
states that are close to them.
The fundamental memory (1, 1, 1) attracts unstable states (-1, 1, 1),
(1, -1, 1) and (1, 1, -1). Each of these unstable states represents a
single error, compared to the fundamental memory (1, 1, 1).
The fundamental memory (-1, -1, -1) attracts unstable states (-1, -1,
1), (-1, 1, -1) and (1, -1, -1).
Thus, the Hopfield network can act as an error correction
network.
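A quick numerical check of this behaviour, reusing the weight matrix computed in the example above (keeping a neuron's output when its net input is zero, as in the sign function defined earlier):

```python
import numpy as np

W = np.array([[0, 2, 2],
              [2, 0, 2],
              [2, 2, 0]])               # weight matrix from the example above

def update(y):
    net = W @ y
    return np.where(net > 0, 1, np.where(net < 0, -1, y))   # keep y_i when net is 0

corrupted = np.array([-1, 1, 1])        # single-error version of the memory (1, 1, 1)
print(update(corrupted))                # -> [1 1 1]: the fundamental memory is recovered
```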
        |  3  1 -1  1 |
X^T X = |  1  3  1 -1 |
        | -1  1  3 -3 |
        |  1 -1 -3  3 |
Matrices (2)
The upper and lower triangles of the product matrix represent the 6 weights w_i,j = w_j,i.
Scale the weights by dividing by p (i.e., averaging). Picton (ANN book) subtracts p from each. Either method is fine, as long as we apply the appropriate thresholds to the output values.
This produces the same weights as in the non-matrix description.
Testing with input = (1 0 0 -1):

              |  3  1 -1  1 |
(1 0 0 -1)  × |  1  3  1 -1 |  =  (2 2 2 -2)
              | -1  1  3 -3 |
              |  1 -1 -3  3 |
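The same product can be checked with a couple of lines of NumPy (purely a verification of the arithmetic above):

```python
import numpy as np

XTX = np.array([[ 3,  1, -1,  1],
                [ 1,  3,  1, -1],
                [-1,  1,  3, -3],
                [ 1, -1, -3,  3]])
print(np.array([1, 0, 0, -1]) @ XTX)    # -> [ 2  2  2 -2]
```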
Figure: the four-neuron network with the scaled connection weights (values of 1/3 and -1/3) and -1 inputs at the nodes.
Asynchronous updating is central to Hopfield's (1982) original model.
Figure: stable and spurious states of the four-neuron network.
BAM operation
Figure: BAM architecture: (a) forward direction, where the input layer x_1(p), . . ., x_n(p) drives the output layer to produce y_1(p), . . ., y_m(p); (b) backward direction, where the output layer drives the input layer to produce x_1(p+1), . . ., x_n(p+1).
The basic idea behind the BAM is to store pattern pairs so that
when n-dimensional vector X from set A is presented as input, the
BAM recalls m-dimensional vector Y from set B, but when Y is
presented as input, the BAM recalls X.
The weight matrix can be determined by the Hebb rule. The formulas for its entries depend on whether the training vectors are binary or bipolar.
For binary input vectors: w_ij = Σ_p (2 x_i(p) - 1)(2 y_j(p) - 1); for bipolar input vectors: w_ij = Σ_p x_i(p) y_j(p).
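A minimal BAM sketch using the bipolar form of the Hebb rule; the pattern pairs below are invented for illustration and are not the letter-code example mentioned next:

```python
import numpy as np

def bam_weights(X, Y):
    """Hebbian BAM weight matrix for bipolar pairs: W = sum_p x(p)^T y(p)."""
    return X.T @ Y                       # rows of X and Y are the paired patterns

def recall_forward(W, x):
    return np.sign(x @ W)                # X from set A -> associated Y from set B

def recall_backward(W, y):
    # note: np.sign(0) is 0; a full BAM would keep the previous output in that case
    return np.sign(W @ y)                # Y from set B -> associated X from set A

# Made-up bipolar pattern pairs (n = 4 inputs, m = 2 outputs)
X = np.array([[ 1, -1,  1, -1],
              [ 1,  1, -1, -1]])
Y = np.array([[ 1,  1],
              [ 1, -1]])
W = bam_weights(X, Y)
print(recall_forward(W, X[0]))           # -> [1 1] = Y[0]
print(recall_backward(W, Y[1]))          # -> [ 1  1 -1 -1] = X[1]
```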
Application example: a BAM net to associate letters with simple bipolar codes.