
2- KOHONEN SELF-ORGANIZING MAPS (SOM)

- The self-organizing neural networks assume a topological structure
among the cluster units.
- There are m cluster units, arranged in a one- or two-dimensional
array; the input signals are n-tuples [Kohonen, 1989a].
- The weight vector for a cluster unit serves as an exemplar of the input
patterns associated with that cluster.
- During the self-organization process, the cluster unit whose weight
vector matches the input pattern most closely (typically, in terms of the
minimum squared Euclidean distance) is chosen as the winner.
- The winning unit and its neighboring units (in terms of the topology
of the cluster units) update their weights.
- The weight vectors of the neighboring units are not, in general, close to
the input pattern, but they are nevertheless moved toward it.
- The architecture and algorithm that follow for the net can be used to
cluster a set of p continuous-valued vectors x = ( x1 , . . . , xi, . . . , xn)
into m clusters.
- Note that the connection weights do not multiply the signal sent from
the input units to the cluster units (unless the dot product measure of
similarity is being used).
Architecture
The architecture of the Kohonen self-organizing map is shown in Fig. 5.

Figure (5) Kohonen Self-Organizing Map

Neighborhoods of the unit designated by # of radius R = 2, 1, and 0 in a
one-dimensional topology (with 10 cluster units) are shown in Figure 6.

* * * {* (* [#] *) *} * *
Figure (6) Linear array of cluster units
The neighborhoods of radius R = 2, 1 and 0 are shown in Figure 7 for a
rectangular grid and in Figure 8 for a hexagonal grid (each with 49 units).
In each illustration, the winning unit is indicated by the symbol "#" and
the other units are denoted by "*". Note that each unit has eight nearest
neighbors in the rectangular grid, but only six in the hexagonal grid.
Winning units that are close to the edge of the grid will have some
neighborhoods containing fewer units than shown in the respective figure.

Figure (7) Neighborhoods for rectangular grid

Figure (8) Neighborhoods for hexagonal grid
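For reference, the neighborhood sets shown in Figures 6 and 7 can be generated
programmatically. The following Python sketch is illustrative only; the function
names and the 0-based indexing of units are our own assumptions:

def linear_neighborhood(J, R, m):
    # Indices of the cluster units within radius R of winner J on a
    # one-dimensional array of m units (units beyond the edges are clipped).
    return list(range(max(0, J - R), min(m - 1, J + R) + 1))

def grid_neighborhood(Jr, Jc, R, rows, cols):
    # Square neighborhood of radius R around winner (Jr, Jc) on a
    # rectangular grid, as in Figure 7; edge units get smaller neighborhoods.
    return [(r, c)
            for r in range(max(0, Jr - R), min(rows - 1, Jr + R) + 1)
            for c in range(max(0, Jc - R), min(cols - 1, Jc + R) + 1)]

# Example: linear_neighborhood(5, 2, 10) gives [3, 4, 5, 6, 7],
# the five units inside the braces in Figure 6.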
Algorithm
1- Initialize weights wij. (Random values may be assigned for the initial
weights as one possible choice.)
2- Set topological neighborhood parameters (e.g., the radius R, which takes
the values 2, 1, and 0 in the figures above).
3- Set the learning rate parameter α.
4- For each input vector x, and for each j, compute:
D(j) = ∑i (wij – xi)2

5- Find index J such that D(J) is minimum.


6- For all units j within a specified neighborhood of J, and for all i
update the weights:
wij(new) = wij(old) + α[xi – wij(old)]
7- Update the learning rate.
8- Reduce radius of topological neighborhood at specified times.
9- Test stopping condition.
Alternative structures are possible for reducing R and α. The learning rate
α is a slowly decreasing function of time (or training epochs).
The radius of the neighborhood around a cluster unit also decreases as the
clustering process progresses.
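As an illustration only, the algorithm above can be written compactly in
Python/NumPy for a one-dimensional array of cluster units. The function name,
the halving of α after each epoch, and the reduction of R by one per epoch are
assumptions chosen for this sketch; any slowly decreasing schedules would do:

import numpy as np

def train_som(X, m, alpha=0.6, R=0, epochs=1, rng=None):
    # X: (p, n) array of input vectors; m: number of cluster units.
    rng = np.random.default_rng() if rng is None else rng
    n = X.shape[1]
    W = rng.random((n, m))                 # step 1: random initial weights wij
    for _ in range(epochs):
        for x in X:                        # step 4: for each input vector
            D = ((W - x[:, None]) ** 2).sum(axis=0)   # D(j) = sum_i (wij - xi)^2
            J = int(np.argmin(D))          # step 5: winning unit
            lo, hi = max(0, J - R), min(m - 1, J + R)
            # step 6: move the winner and its neighbors toward x
            W[:, lo:hi + 1] += alpha * (x[:, None] - W[:, lo:hi + 1])
        alpha *= 0.5                       # step 7: reduce the learning rate
        R = max(0, R - 1)                  # step 8: shrink the neighborhood radius
    return W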

Random values may be assigned for the initial weights. In the next
example the weights are initialized to random values (chosen from the
same range of values as the components of the input vectors).
Example-4: A Kohonen self-organizing map (SOM) to cluster four
vectors
Let the vectors to be clustered be
(1, 1, 0, 0); (0, 0, 0, 1); (1, 0, 0, 0); (0, 0, 1, 1).
The maximum number of clusters to be formed is m = 2.
Suppose the learning rate (geometric decrease) is:
α(0) = 0.6
α(t + 1) = 0.5 α(t)
With only two clusters available, the neighborhood of node J (Step 5) is
set so that only one cluster updates its weights at each step (i.e., R = 0).
Sol:
Initial weight matrix (rows correspond to the weights from inputs x1, ..., x4;
columns to cluster units 1 and 2):

    .2   .8
    .6   .4
    .5   .7
    .9   .3
Initial radius: R = 0
Initial learning rate: α(0) = 0.6
D(j) = ∑i (wij – xi)2

wij(new) = wij(old) + α[xi – wij(old)]


Begin training
For the first vector, (1, 1, 0, 0):
D(1) = (w11 – x1)2 + (w21 – x2)2 + (w31 – x3)2 + (w41 – x4)2
D(1) = (.2 – 1)2 + (.6 – 1)2 + (.5 – 0)2 + (.9 – 0)2 = .64 + .16 + .25 + .81 = 1.86
D(2) = (w12 – x1)2 + (w22 – x2)2 + (w32 – x3)2 + (w42 – x4)2

D(2) = (.8 – 1)2 + (.4 – 1)2 + (.7 – 0)2 + (.3 – 0)2 = .04 + .36 + .49 + .09 = 0.98
The input vector is closest to output node 2, so J = 2.
The weights on the winning unit are updated; this gives the weight matrix

    .2   .92
    .6   .76
    .5   .28
    .9   .12
For the second vector, (0, 0, 0, 1):


D(1) = (w11 – x1)2 + (w21 – x2)2 + (w31 – x3)2 + (w41 – x4)2
D(1) = (.2 – 0)2 + (.6 – 0)2 + (.5 – 0)2 + (.9 – 1)2 = .04 + .36 + .25 + .01 = 0.66
D(2) = (w12 – x1)2 + (w22 – x2)2 + (w32 – x3)2 + (w42 – x4)2
D(2) = (.92 – 0)2 + (.76 – 0)2 + (.28 – 0)2 + (.12 – 1)2 = 2.2768
The input vector is closest to output node 1, so J = 1.
Update the first column of the weight matrix:

    .08   .92
    .24   .76
    .2    .28
    .96   .12
For the third vector, (1, 0, 0, 0),


D(1) = 1.8656
D(2) = 0.6768
The input vector is closest to output node 2, so J = 2.
Update the second column of the weight matrix:

    .08   .968
    .24   .304
    .2    .112
    .96   .048
For the fourth vector, (0, 0, 1, 1),


D(1) = 0.7056

D(2) = 2.724.
J = 1.
Update the first column of the weight matrix:

    .032   .968
    .096   .304
    .68    .112
    .984   .048
Reduce the learning rate:
α(t + 1) = 0.5 α(t)
α = 0.5 (0.6) = 0.3
The weight update equations are now
wij(new) = wij(old) + 0.3 [xi – wij(old)]
         = 0.7 wij(old) + 0.3 xi
The weight matrix after the second epoch of training is

    .016   .984
    .047   .359
    .633   .055
    .992   .024
Modifying the adjustment procedure for the learning rate so that it
decreases geometrically from .6 to .01 over 100 iterations (epochs) gives
the following results:

These weight matrices appear to be converging to the matrix

    0     1
    0     .5
    .5    0
    1     0
the first column of which is the average of the two vectors placed in
cluster 1 and the second column of which is the average of the two
vectors placed in cluster 2.
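The hand calculation above can be checked with a short script. This sketch is
illustrative only: it fixes the initial weights to the matrix used in the
example instead of drawing them at random, and applies the same update rule
with R = 0 and α halved after each epoch:

import numpy as np

X = np.array([[1, 1, 0, 0],
              [0, 0, 0, 1],
              [1, 0, 0, 0],
              [0, 0, 1, 1]], dtype=float)
W = np.array([[0.2, 0.8],       # initial weight matrix from the example
              [0.6, 0.4],
              [0.5, 0.7],
              [0.9, 0.3]])
alpha = 0.6
for epoch in range(2):
    for x in X:
        D = ((W - x[:, None]) ** 2).sum(axis=0)   # D(j) for j = 1, 2
        J = int(np.argmin(D))                     # winning cluster unit
        W[:, J] += alpha * (x - W[:, J])          # update only the winner (R = 0)
    alpha *= 0.5                                  # geometric decrease of alpha
    print("after epoch", epoch + 1)
    print(np.round(W, 3))
# Epoch 1 reproduces the matrix computed above:
# columns (.032, .096, .680, .984) and (.968, .304, .112, .048).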

3- LEARNING VECTOR QUANTIZATION
Learning vector quantization (LVQ) [Kohonen, 1989a, 1990a] is a pattern
classification method in which each output unit represents a particular
class or category. (Several output units should be used for each class).
The weight vector for an output unit is often referred to as a reference (or
codebook) vector for the class that the unit represents.
During training, it is assumed that a set of training patterns with known
classifications is provided, along with an initial distribution of reference
vectors (each of which represents a known classification).
After training, an LVQ net classifies an input vector by assigning it to the
same class as the output unit that has its weight vector (reference vector)
closest to the input vector.
Architecture
The architecture of an LVQ neural net, shown in Figure 9, is the same as
that of a Kohonen self-organizing map (without a topological structure
being assumed for the output units). In addition, each output unit has a
known class that it represents.

Figure (9) Learning vector quantization (LVQ) neural net

Algorithm
The motivation for the algorithm for the LVQ net is to find the output
unit that is closest to the input vector.
Toward that end, if x and wJ belong to the same class, then we move the
weights toward the new input vector; if x and wJ belong to different
classes, then we move the weights away from this input vector.
The nomenclature we use is as follows:
x          training vector (x1, . . . , xi, . . . , xn).
T          correct category or class for the training vector.
wj         weight vector for the jth output unit (w1j, . . . , wij, . . . , wnj).
Cj         category or class represented by the jth output unit.
||x – wj||  Euclidean distance between the input vector and the weight vector
           for the jth output unit.
1- Initialize reference vectors (several strategies are discussed shortly);
2- Initialize learning rate, α(0).
3- For each training input vector x, find J so that ||x – wJ|| is a minimum.
4- Update wJ as follows:
if T = CJ , then
wJ(new) = wJ(old) + α[x – wJ(old)]
if T ≠ CJ , then
wJ(new) = wJ(old) – α[x – wJ(old)]
5- Reduce learning rate.
6- Test stopping condition:
The condition may specify a fixed number of iterations or the learning
rate reaching a sufficiently small value.
The simplest method of initializing the weight (reference) vectors is to
take the first m training vectors and use them as weight vectors; the
remaining vectors are then used for training [Kohonen, 1989a]. Another
simple method is to assign the initial weights and classifications
randomly. Another possible method of initializing the weights is to use
the self-organizing map [Kohonen, 1989a] to place the weights. Each
weight vector is then calibrated by determining the input patterns that are
closest to it, finding the class that the largest number of these input
patterns belong to, and assigning that class to the weight vector.
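A minimal Python/NumPy sketch of the algorithm, using the simplest
initialization described above (the first m training vectors become the
reference vectors and the rest are used for training), is given below. The
function names and the halving of α per epoch are illustrative assumptions:

import numpy as np

def train_lvq(X, T, m, alpha=0.1, epochs=1):
    # Steps 1-2: initial reference vectors and their classes come from the
    # first m training vectors; the remaining vectors are used for training.
    W = [np.array(x, dtype=float) for x in X[:m]]
    C = list(T[:m])
    for _ in range(epochs):
        for x, t in zip(X[m:], T[m:]):
            x = np.asarray(x, dtype=float)
            d = [((x - w) ** 2).sum() for w in W]   # squared distance to each unit
            J = int(np.argmin(d))                   # step 3: closest reference vector
            if t == C[J]:
                W[J] += alpha * (x - W[J])          # step 4: move toward x (same class)
            else:
                W[J] -= alpha * (x - W[J])          # step 4: move away (different class)
        alpha *= 0.5                                # step 5: reduce the learning rate
    return W, C

def classify(x, W, C):
    # After training: assign x to the class of the nearest reference vector.
    d = [((np.asarray(x, dtype=float) - w) ** 2).sum() for w in W]
    return C[int(np.argmin(d))]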
Example-5: Learning vector quantization (LVQ): five vectors assigned to
two classes
In this example, two reference vectors will be used. The following input
vectors represent two classes, 1 and 2:
VECTOR          CLASS
(1, 1, 0, 0)      1
(0, 0, 0, 1)      2
(0, 0, 1, 1)      2
(1, 0, 0, 0)      1
(0, 1, 1, 0)      2
The first two vectors will be used to initialize the two reference vectors.
Thus, the first output unit represents class 1, the second class 2
(symbolically, C1= 1 and C2 = 2).
This leaves vectors (0, 0, 1, 1), (1, 0, 0, 0), and (0, 1, 1, 0) as the training
vectors. Only one iteration (one epoch) is shown:
Sol:
Initialize weights:
w1 = (1, 1, 0, 0)
w2 = (0, 0, 0, 1)
Initialize learning rate, α = 0.1
For input vector x = (0, 0, 1, 1) with T = 2,
D(1) = (w11 – x1)2 + (w21 – x2)2 + (w31 – x3)2 + (w41 – x4)2
D(1) = (1 – 0)2 + (1 – 0)2 + (0 – 1)2 + (0 – 1)2 = 4
D(2) = (w12 – x1)2 + (w22 – x2)2 + (w32 – x3)2 + (w42 – x4)2
D(2) = (0 – 0)2 + (0 – 0)2 + (0 – 1)2 + (1 – 1)2 = 1
J = 2, since x is closer to w2 than to w1.
Since T = 2 and C2 = 2, update w2 as follows:
wJ(new) = wJ(old) + α[x – wJ(old)]
w2 = (0, 0, 0, 1) + 0.1[(0, 0, 1, 1) – (0, 0, 0, 1)] = (0, 0, 0.1, 1)
For input vector x = (1, 0, 0, 0) with T = 1,
D(1) = (w11 – x1)2 + (w21 – x2)2 + (w31 – x3)2 + (w41 – x4)2
D(1) = (1 – 1)2 + (1 – 0)2 + (0 – 0)2 + (0 – 0)2 = 1
D(2) = (w12 – x1)2 + (w22 – x2)2 + (w32 – x3)2 + (w42 – x4)2
D(2) = (0 – 1)2 + (0 – 0)2 + (0.1 – 0)2 + (1 – 0)2 = 2.01
J = 1, since x is closer to w1 than to w2.
Since T = 1 and C1 = 1, update w1 as follows:
wJ(new) = wJ(old) + α[x – wJ(old)]
w1 = (1, 1, 0, 0) + 0.1[(1, 0, 0, 0) – (1, 1, 0, 0)] = (1, 0.9, 0, 0)
For input vector x = (0, 1, 1, 0) with T = 2,
D(1) = (w11 – x1)2 + (w21 – x2)2 + (w31 – x3)2 + (w41 – x4)2
D(1) = (1 – 0)2 + (0.9 – 1)2 + (0 – 1)2 + (0 – 0)2 = 2.01
D(2) = (w12 – x1)2 + (w22 – x2)2 + (w32 – x3)2 + (w42 – x4)2
D(2) = (0 – 0)2 + (0 – 1)2 + (0.1 – 1)2 + (1 – 0)2 = 2.81
J = 1, since x is closer to w1 than to w2.
Since T = 2 and C1 = 1 (T ≠ CJ), update w1 as follows:
wJ(new) = wJ(old) – α[x – wJ(old)]
w1 = (1, .9, 0, 0) – 0.1[(0, 1, 1, 0) – (1, .9, 0, 0)] = (1.1, 0.89, –0.1, 0)
This completes one epoch of training.
Reduce the learning rate and test the stopping condition.
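Assuming the train_lvq sketch from the previous section is available, the
single epoch worked above can be reproduced as follows (the values agree up to
floating-point rounding):

X = [(1, 1, 0, 0), (0, 0, 0, 1), (0, 0, 1, 1), (1, 0, 0, 0), (0, 1, 1, 0)]
T = [1, 2, 2, 1, 2]                      # classes of the five vectors
W, C = train_lvq(X, T, m=2, alpha=0.1, epochs=1)
# W[0] is approximately (1.1, 0.89, -0.1, 0) and W[1] is (0, 0, 0.1, 1),
# matching the reference vectors obtained after the epoch above.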

Variations:
Several algorithms have been developed as improvements to LVQ, called LVQ2,
LVQ2.1 [Kohonen, 1990a], and LVQ3 [Kohonen, 1990b]. In the original
LVQ algorithm, only the reference vector that is closest to the input
vector is updated. In the improved algorithms, two vectors (the winner
and a runner-up) learn if several conditions are satisfied. The idea is that
if the input is approximately the same distance from both the winner and
the runner-up, then each of them should learn.
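The conditions themselves are not spelled out here, but the usual LVQ2.1
formulation of this idea can be sketched as follows. The "window" test on the
ratio of the two distances and the value of eps are assumptions taken from
that formulation, not from the text above:

import numpy as np

def lvq2_update(x, W, C, t, alpha=0.1, eps=0.3):
    # One presentation of an input x with correct class t.  The winner and
    # the runner-up both learn only if x lies in a window between them and
    # exactly one of the two reference vectors has the correct class.
    x = np.asarray(x, dtype=float)
    d = np.array([np.linalg.norm(x - w) for w in W])   # distances to all units
    i, j = np.argsort(d)[:2]                           # winner i, runner-up j
    if d[j] == 0:                                      # degenerate case: skip
        return W
    in_window = d[i] / d[j] > (1 - eps) / (1 + eps)
    if in_window and (C[i] == t) != (C[j] == t):
        good, bad = (i, j) if C[i] == t else (j, i)
        W[good] += alpha * (x - W[good])               # correct class: move toward x
        W[bad] -= alpha * (x - W[bad])                 # wrong class: move away from x
    return W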
