2005
The Dynamics of Learning Vector Quantization
Rijksuniversiteit Groningen
Mathematics and Computing Science
Michael Biehl, Anarta Ghosh
TU Clausthal-Zellerfeld
Institute of Computing Science
Barbara Hammer
The Dynamics of Learning Vector Quantization, RUG, 10.01.2005
Introduction
- prototype-based learning from example data:
  representation, classification
- Vector Quantization (VQ)
- Learning Vector Quantization (LVQ)
The dynamics of learning
- a model situation: randomized data
- learning algorithms for VQ and LVQ
- analysis and comparison: dynamics, success of learning
Summary
Outlook
Vector Quantization (VQ)
aim:
representation of large amounts
of data by (few) prototype vectors
example:
identification and grouping
in clusters of similar data
assignment of a feature vector $\xi$
to the closest prototype $w$
(similarity or distance measure,
e.g. Euclidean distance)
unsupervised competitive learning
initialize K prototype vectors
present a single example
identify the closest prototype,
i.e. the so-called winner
move the winner even
closer towards the example
intuitively clear, plausible procedure
- places prototypes in areas with high density of data
- identifies the most relevant combinations of features
- (stochastic) on-line gradient descent with respect to
the cost function ...
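The procedure above can be sketched in a few lines of plain Python. This is a minimal illustration, not the implementation used for the talk: the function names, the toy data (two blobs in the plane) and the value of `learning_rate` are assumptions chosen for the example.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def vq_step(prototypes, example, learning_rate):
    """One on-line competitive learning step; returns the index of the winner.

    The winner is moved a fraction `learning_rate` of the way towards
    `example`; all other prototypes are left unchanged.
    """
    distances = [squared_distance(w, example) for w in prototypes]
    winner = distances.index(min(distances))
    prototypes[winner] = [
        wi + learning_rate * (xi - wi)
        for wi, xi in zip(prototypes[winner], example)
    ]
    return winner

# Usage (assumed toy data): two prototypes, two well separated blobs.
rng = random.Random(0)
prototypes = [[0.1, 0.0], [-0.1, 0.0]]
for _ in range(2000):
    center = rng.choice([(3.0, 0.0), (-3.0, 0.0)])
    example = [c + rng.gauss(0.0, 1.0) for c in center]
    vq_step(prototypes, example, learning_rate=0.05)
```

After training, each prototype sits close to one of the two blob centers, which is the "areas with high density of data" behavior listed above.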
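quantization error
$$H_{VQ} = \sum_{\mu=1}^{P} \sum_{j=1}^{K} \left( \xi^\mu - w_j \right)^2 \prod_{k \neq j} \Theta\!\left( d_k^\mu - d_j^\mu \right)$$
- the sums run over the data $\xi^\mu$ ($\mu = 1, \dots, P$) and the prototypes $w_j$ ($j = 1, \dots, K$)
- the product of step functions $\Theta$ equals 1 only if $w_j$ is the winner
- here: Euclidean distance, $d_j^\mu = (\xi^\mu - w_j)^2$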
aim: faithful representation (in general: clustering )
Result depends on - the number of prototype vectors
- the distance measure / metric used
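As a small worked example of the quantization error with the squared Euclidean distance: each data point contributes its squared distance to the closest prototype (the winner). The function name and the toy numbers are assumptions made for this sketch.

```python
def quantization_error(prototypes, data):
    """Sum over all data points of the squared distance to the winner."""
    total = 0.0
    for x in data:
        total += min(
            sum((xi - wi) ** 2 for xi, wi in zip(x, w)) for w in prototypes
        )
    return total

# Usage: the first point coincides with a prototype, the second one
# is at squared distance 1 from its closest prototype.
error = quantization_error([[0.0, 0.0], [2.0, 1.0]], [[0.0, 0.0], [2.0, 0.0]])
```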
Learning Vector Quantization (LVQ)
aim:
classification of data
learning from examples
Learning: choice of prototypes according to example data
example situation: 3 classes, 3 prototypes
classification:
assignment of a vector $\xi$
to the class of the closest
prototype $w$
aim: generalization ability, i.e. correct classification
of novel data after training
prominent example [Kohonen]: LVQ 2.1.
initialize prototype vectors
(for different classes)
present a single example
identify the closest correct
and the closest wrong prototype
move the corresponding winner
towards / away from the example
known convergence / stability problems,
e.g. for infrequent classes
mostly: heuristically motivated variations of competitive learning
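The LVQ 2.1 steps listed above might look as follows in plain Python. This is a sketch under assumptions: the function name and argument layout are hypothetical, and it implements only the steps named on the slide, without any window rule or other refinement.

```python
def lvq21_step(prototypes, labels, example, example_label, learning_rate):
    """Move the closest correct prototype towards the example and the
    closest wrong prototype away from it; returns their indices."""
    def dist(w):
        return sum((xi - wi) ** 2 for xi, wi in zip(example, w))

    correct = [i for i, c in enumerate(labels) if c == example_label]
    wrong = [i for i, c in enumerate(labels) if c != example_label]
    j = min(correct, key=lambda i: dist(prototypes[i]))
    k = min(wrong, key=lambda i: dist(prototypes[i]))
    for index, sign in ((j, +1.0), (k, -1.0)):
        prototypes[index] = [
            wi + sign * learning_rate * (xi - wi)
            for wi, xi in zip(prototypes[index], example)
        ]
    return j, k

# Usage: one prototype per class on a line; the example belongs to class +1.
protos = [[0.0], [4.0]]
pair = lvq21_step(protos, [1, -1], [1.0], 1, 0.5)
```

In the usage example the correct prototype moves from 0.0 to 0.5, while the wrong one is pushed from 4.0 to 5.5. The repulsive step has no built-in bound, which is one way to see where stability problems can come from.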
LVQ algorithms ...
- appear plausible, intuitive, flexible
- are fast, easy to implement
- are frequently applied in a variety of problems involving
the classification of structured data, a few examples:
  - real time speech recognition
  - medical diagnosis, e.g. from histological data
  - texture recognition and classification
  - gene expression data analysis
  - . . .
illustration: microscopic images of (pig) semen cells after freezing
and storage, c/o Lidia Sanchez-Gonzalez, Leon/Spain
[Figure: images of healthy cells and damaged cells, with the prototypes obtained by LVQ 1]
LVQ algorithms ...
- are often based on purely heuristic arguments,
or derived from a cost function with unclear
relation to the generalization ability
- almost exclusively use the Euclidean distance measure,
inappropriate for heterogeneous data
- lack, in general, a thorough theoretical understanding of
dynamics, convergence properties,
performance w.r.t. generalization, etc.
In the following:
analysis of LVQ algorithms w.r.t.
- dynamics of the learning process
- performance, i.e. generalization ability
- asymptotic behavior in the limit of many examples
typical behavior in a model situation
- randomized, high-dimensional data
- essential features of LVQ learning
aim: - contribute to the theoretical understanding
- develop efficient LVQ schemes
- test in applications
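model situation: two clusters of N-dimensional data
random vectors $\xi \in \mathbb{R}^N$ according to a mixture of two Gaussians:
$$P(\xi) = \sum_{\sigma = \pm 1} p_\sigma \, P(\xi \mid \sigma), \qquad P(\xi \mid \sigma) = \frac{1}{(2\pi)^{N/2}} \exp\left[ -\frac{1}{2} \left( \xi - \ell B_\sigma \right)^2 \right]$$
orthonormal center vectors: $B_+, B_- \in \mathbb{R}^N$, $(B_\sigma)^2 = 1$, $B_+ \cdot B_- = 0$
prior weights of classes $p_+, p_-$ with $p_+ + p_- = 1$
separation $\ell$: cluster $\sigma$ (weight $p_\sigma$) is centered at $\ell B_\sigma$
independent components:
$$\langle \xi_j \rangle_\sigma = \ell \, (B_\sigma)_j, \qquad \langle \xi_j^2 \rangle_\sigma - \langle \xi_j \rangle_\sigma^2 = 1, \qquad \langle \xi^2 \rangle_\sigma = \sum_{j=1}^{N} \langle \xi_j^2 \rangle_\sigma = N + \ell^2$$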
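A minimal sketch of how examples could be drawn from this model. The concrete choice of the center vectors (the first two unit vectors), the function name and the seed are assumptions made for illustration; any orthonormal pair of centers would serve equally well.

```python
import random

def sample_example(n_dim, separation, p_plus, rng):
    """Draw one labelled example (xi, sigma) from the two-cluster model.

    sigma = +1 with probability p_plus, else -1. Assumed centers:
    B_+ = first unit vector, B_- = second unit vector (orthonormal).
    Every component has unit variance; the cluster mean is separation * B_sigma.
    """
    sigma = 1 if rng.random() < p_plus else -1
    xi = [rng.gauss(0.0, 1.0) for _ in range(n_dim)]
    center_axis = 0 if sigma == 1 else 1
    xi[center_axis] += separation
    return xi, sigma

# Usage: 400 examples with N = 200, separation 1 and p_+ = 0.6.
rng = random.Random(1)
data = [sample_example(200, 1.0, 0.6, rng) for _ in range(400)]
y_plus = [xi[0] for xi, _ in data]    # projections B_+ . xi
y_minus = [xi[1] for xi, _ in data]   # projections B_- . xi
```

With the assumed centers, the projections onto the two center vectors are simply the first two components of each example.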
high-dimensional data (formally: $N \to \infty$)
400 examples $\xi^\mu \in \mathbb{R}^N$, N=200, separation $\ell = 1$, $p_+ = 0.6$
[Figure: projections $y_\pm = B_\pm \cdot \xi^\mu$ into the plane of center vectors $B_+, B_-$; 240 and 160 examples per cluster]
[Figure: projections $x_{1,2} = w_{1,2} \cdot \xi^\mu$ in two independent random directions $w_{1,2}$; 240 and 160 examples per cluster]
Note:
model for studying typical behavior of LVQ algorithms,
not: density-estimation based classification
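dynamics of on-line training
sequence of independent random data $\xi^\mu$, $\mu = 1, 2, 3, \dots$, with class labels $\sigma^\mu = \pm 1$, acc. to $P(\xi)$
update of prototype vectors:
$$w_s^\mu = w_s^{\mu-1} + \frac{\eta}{N} \, f_s\!\left[ d_+^\mu, d_-^\mu, \sigma^\mu, \dots \right] \left( \xi^\mu - w_s^{\mu-1} \right), \qquad d_s^\mu = \left( \xi^\mu - w_s^{\mu-1} \right)^2, \quad s = \pm 1$$
- $\eta$: learning rate, step size
- $f_s[\dots]$: competition, direction of update etc.
- $(\xi^\mu - w_s^{\mu-1})$: change of prototype towards or away from the current data
above examples:
unsupervised Vector Quantization: $f_s[\dots] = \Theta\!\left( d_{-s}^\mu - d_s^\mu \right)$
"The Winner Takes It All" (classes irrelevant/unknown)
Learning Vector Quantization 2.1.: $f_s[\dots] = s \, \sigma^\mu$, i.e. $+1$ if the class is correct, $-1$ if the class is wrong
here: two prototypes, no explicit competition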
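The generic update with a modulation function can be written once and reused for different algorithms. The sketch below assumes two prototypes indexed by s = +1 and s = -1 and stored in a dictionary; the names `online_step`, `f_vq` and `f_lvq21` are hypothetical, chosen to mirror the two examples given above.

```python
def theta(x):
    """Heaviside step function."""
    return 1.0 if x > 0 else 0.0

def f_vq(s, d, sigma):
    """Winner takes all: only the closer prototype is updated; label ignored."""
    return theta(d[-s] - d[s])

def f_lvq21(s, d, sigma):
    """LVQ 2.1: +1 for the prototype of the correct class, -1 for the wrong one."""
    return float(s * sigma)

def online_step(w, xi, sigma, eta, f):
    """Return updated prototypes: w[s] + (eta / N) * f_s * (xi - w[s])."""
    n = len(xi)
    # Distances are computed from the prototypes before the update.
    d = {s: sum((x - ws) ** 2 for x, ws in zip(xi, w[s])) for s in (+1, -1)}
    return {
        s: [ws + eta / n * f(s, d, sigma) * (x - ws) for x, ws in zip(xi, w[s])]
        for s in (+1, -1)
    }

# Usage: N = 2, example of class +1 close to prototype +1.
w0 = {+1: [0.0, 0.0], -1: [4.0, 0.0]}
w_vq = online_step(w0, [1.0, 0.0], +1, 1.0, f_vq)
w_lvq21 = online_step(w0, [1.0, 0.0], +1, 1.0, f_lvq21)
```

Passing the modulation function as an argument keeps the step itself identical for all algorithms, so only the function `f` needs to change when rules are compared.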
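mathematical analysis of the learning dynamics
1. description in terms of a few characteristic quantities
(here: $\mathbb{R}^{2N} \to \mathbb{R}^{7}$)
$$R_{s\sigma} = w_s \cdot B_\sigma, \qquad Q_{st} = w_s \cdot w_t, \qquad s, t, \sigma \in \{+1, -1\}$$
- $R_{s\sigma}$: projections in the $(B_+, B_-)$-plane
- $Q_{st}$: length and relative position of prototypes
update of prototype vectors:
$$w_s^\mu = w_s^{\mu-1} + \frac{\eta}{N} \, f_s\!\left[ d_+^\mu, d_-^\mu, \sigma^\mu, \dots \right] \left( \xi^\mu - w_s^{\mu-1} \right)$$
recursions:
$$\frac{R_{s\sigma}^\mu - R_{s\sigma}^{\mu-1}}{1/N} = \eta \, f_s[\dots] \left( y_\sigma^\mu - R_{s\sigma}^{\mu-1} \right)$$
$$\frac{Q_{st}^\mu - Q_{st}^{\mu-1}}{1/N} = \eta \, f_s[\dots] \left( x_t^\mu - Q_{st}^{\mu-1} \right) + \eta \, f_t[\dots] \left( x_s^\mu - Q_{st}^{\mu-1} \right) + \eta^2 f_s[\dots] \, f_t[\dots] + \mathcal{O}(1/N)$$
random vector $\xi^\mu$ enters only in the form of
- projections: $x_s^\mu = w_s^{\mu-1} \cdot \xi^\mu$, $y_\sigma^\mu = B_\sigma \cdot \xi^\mu$
- distances: $d_s^\mu = \left( \xi^\mu - w_s^{\mu-1} \right)^2 = (\xi^\mu)^2 - 2 x_s^\mu + Q_{ss}^{\mu-1}$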
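To make the reduction to a few characteristic quantities concrete, the following sketch computes the overlaps R and Q from explicit vectors and checks that a squared distance can be rebuilt from the projections alone. The dictionary layout and function names are assumptions for this example; the small integer vectors are arbitrary.

```python
def dot(a, b):
    """Scalar product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))

def order_parameters(w, B):
    """R[(s, sigma)] = w_s . B_sigma and Q[(s, t)] = w_s . w_t."""
    signs = (+1, -1)
    R = {(s, sig): dot(w[s], B[sig]) for s in signs for sig in signs}
    Q = {(s, t): dot(w[s], w[t]) for s in signs for t in signs}
    return R, Q

# Usage with N = 3 (arbitrary numbers, B_+ and B_- orthonormal).
w = {+1: [1, 2, 0], -1: [0, 1, 3]}
B = {+1: [1, 0, 0], -1: [0, 1, 0]}
xi = [2, 0, 1]
R, Q = order_parameters(w, B)

# Distance of xi to w_+ computed directly and from (xi^2, x_+, Q_++).
d_direct = sum((x - ws) ** 2 for x, ws in zip(xi, w[+1]))
d_from_projections = dot(xi, xi) - 2 * dot(w[+1], xi) + Q[(+1, +1)]
```

Since Q is symmetric, the dictionaries hold four values of R and three independent values of Q, i.e. the seven quantities mentioned above.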
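2. average over the current example
$$x_s^\mu = \sum_{j=1}^{N} \left( w_s^{\mu-1} \right)_j \xi_j^\mu, \qquad y_\sigma^\mu = \sum_{j=1}^{N} \left( B_\sigma \right)_j \xi_j^\mu$$
are correlated Gaussian random quantities; for $\xi^\mu$ from cluster $\sigma$ (with separation $\ell$ of the cluster centers):
$$\langle x_s \rangle_\sigma = \ell \, R_{s\sigma}, \qquad \langle y_\tau \rangle_\sigma = \ell \, \delta_{\tau\sigma}$$
$$\langle x_s x_t \rangle_\sigma - \langle x_s \rangle_\sigma \langle x_t \rangle_\sigma = Q_{st}$$
$$\langle x_s y_\tau \rangle_\sigma - \langle x_s \rangle_\sigma \langle y_\tau \rangle_\sigma = R_{s\tau}$$
$$\langle y_\tau y_\rho \rangle_\sigma - \langle y_\tau \rangle_\sigma \langle y_\rho \rangle_\sigma = \delta_{\tau\rho} = \begin{cases} 1 & \text{if } \tau = \rho \\ 0 & \text{else} \end{cases}$$
full average: $\langle \dots \rangle = \sum_{\sigma = \pm 1} p_\sigma \langle \dots \rangle_\sigma$
averaged recursions closed in $\{ R_{s\sigma}, Q_{st} \}$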
3. self-averaging properties
characteristic quantities $R_{s\sigma}$, $Q_{st}$
- depend on the random sequence of example data
- their variance vanishes with $N \to \infty$ (here: $\propto N^{-1}$)
learning dynamics is completely described in terms of averages
4. continuous learning time
$\alpha = \mu / N$: # of examples (# of learning steps) per degree of freedom
recursions $\to$ coupled, ordinary differential equations for $R_{s\sigma}(\alpha)$, $Q_{st}(\alpha)$
evolution of projections
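5. learning curve
probability for misclassification of a novel example:
$$\epsilon_g = p_+ \left\langle \Theta\!\left( d_+ - d_- \right) \right\rangle_+ + p_- \left\langle \Theta\!\left( d_- - d_+ \right) \right\rangle_-$$
$$\epsilon_g = \sum_{\sigma = \pm 1} p_\sigma \, \Phi\!\left( \frac{Q_{\sigma\sigma} - Q_{-\sigma,-\sigma} - 2 \ell \left( R_{\sigma\sigma} - R_{-\sigma,\sigma} \right)}{2 \sqrt{Q_{++} - 2 Q_{+-} + Q_{--}}} \right), \qquad \Phi(z) = \int_{-\infty}^{z} \frac{dx}{\sqrt{2\pi}} \, e^{-x^2/2}$$
generalization error $\epsilon_g(\alpha)$ after training with $\alpha N$ examples
investigation and comparison of given algorithms
- repulsive/attractive fixed points of the dynamics
- asymptotic behavior for $\alpha \to \infty$
- dependence on learning rate, separation, initialization
- ...
optimization and development of new prescriptions
- time-dependent learning rate $\eta(\alpha)$
- variational optimization w.r.t. $f_s[\dots]$: maximize the decrease $-\,d\epsilon_g / d\alpha$
- ...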
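The misclassification probability depends on the prototypes only through the overlaps R and Q. The sketch below evaluates it under the model assumptions used here (two unit-variance Gaussian clusters centered at separation times B_sigma, nearest-prototype classification); the closed form follows from the Gaussian statistics of the projections, and the function and argument names are hypothetical.

```python
import math

def phi(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def generalization_error(R, Q, p_plus, separation):
    """Error of nearest-prototype classification, from overlaps only.

    R[(s, sigma)] = w_s . B_sigma, Q[(s, t)] = w_s . w_t.
    Undefined if the two prototypes coincide (zero denominator).
    """
    p = {+1: p_plus, -1: 1.0 - p_plus}
    norm = 2.0 * math.sqrt(Q[(1, 1)] - 2.0 * Q[(1, -1)] + Q[(-1, -1)])
    return sum(
        p[s] * phi(
            (Q[(s, s)] - Q[(-s, -s)]
             - 2.0 * separation * (R[(s, s)] - R[(-s, s)])) / norm
        )
        for s in (+1, -1)
    )

# Usage: prototypes placed exactly at the cluster centers, separation 1.
R = {(1, 1): 1.0, (1, -1): 0.0, (-1, 1): 0.0, (-1, -1): 1.0}
Q = {(1, 1): 1.0, (1, -1): 0.0, (-1, 1): 0.0, (-1, -1): 1.0}
err = generalization_error(R, Q, p_plus=0.5, separation=1.0)
```

For prototypes at the cluster centers the argument of the error function reduces to minus the separation divided by the square root of 2, independent of the prior weights.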
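optimal classification with minimal generalization error
in the model situation (equal variances of clusters):
separation of classes by the plane with
$$p_+ \, P(\xi \mid \sigma = +1) = p_- \, P(\xi \mid \sigma = -1)$$
[Figure: clusters around $B_+$ (weight $p_+$) and $B_-$ (weight $p_- > p_+$) with the separating plane; label: excess error]
[Figure: minimal $\epsilon_g$ (from 0 to 0.5) as a function of the prior weight $p_+$ (from 0 to 1) for separations $\ell = 0, 1, 2$]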
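For two unit-variance Gaussian clusters the minimal error can be written down in closed form, because the optimal plane is perpendicular to the line connecting the centers. The sketch below is a derivation under those model assumptions, not a formula quoted from the slides; the function name and the symbol `delta` for the distance between the centers are introduced here for illustration.

```python
import math

def phi(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def optimal_error(p_plus, separation):
    """Minimal generalization error for unit-variance clusters whose
    centers are separation * B_+ and separation * B_- (orthonormal B)."""
    p_minus = 1.0 - p_plus
    delta = separation * math.sqrt(2.0)  # distance between the two centers
    if delta == 0.0 or p_plus <= 0.0 or p_minus <= 0.0:
        # No usable class information: always guess the more frequent class.
        return min(p_plus, p_minus)
    # Threshold along the connecting line, measured from the midpoint.
    threshold = math.log(p_minus / p_plus) / delta
    return (p_plus * phi(threshold - delta / 2.0)
            + p_minus * phi(-threshold - delta / 2.0))

# Usage: minimal error on a grid of prior weights for three separations.
curves = {
    sep: [optimal_error(k / 10.0, sep) for k in range(11)]
    for sep in (0.0, 1.0, 2.0)
}
```

The error is largest for equal prior weights and decreases with growing separation; for zero separation it reduces to the smaller of the two prior weights.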
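LVQ 2.1.: update the correct and wrong winner
$$w_s^\mu = w_s^{\mu-1} + \frac{\eta}{N} \, s \, \sigma^\mu \left( \xi^\mu - w_s^{\mu-1} \right)$$
[Seo, Obermayer]: LVQ 2.1. $\leftrightarrow$ cost function (likelihood ratios)
(analytical) integration for $w_s(0) = 0$, with $p_\sigma = (1 + m \sigma)/2$ ($m > 0$) and separation $\ell$:
$$R_{++} = \ell \, \frac{1+m}{2m} \left( 1 - e^{-m \eta \alpha} \right), \qquad R_{+-} = -\ell \, \frac{1-m}{2m} \left( 1 - e^{-m \eta \alpha} \right)$$
$$R_{-+} = -\ell \, \frac{1+m}{2m} \left( e^{m \eta \alpha} - 1 \right), \qquad R_{--} = \ell \, \frac{1-m}{2m} \left( e^{m \eta \alpha} - 1 \right)$$
with $\alpha \to \infty$: $Q_{--}$, $Q_{+-}$, $R_{-+}$, $R_{--}$ diverge; $Q_{++}$, $R_{++}$, $R_{+-}$ remain finite
[Figure: $R_{s\sigma}$ and $Q_{st}$ (range -6 to 6) vs. $\alpha$ (0 to 10); theory and simulation (N=100), $p_+ = 0.8$, $\ell = 1$, $\eta = 0.5$, averages over 100 independent runs]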
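The instability of LVQ 2.1 can be made explicit for the projections R. Averaging the LVQ 2.1 update over the two-cluster model gives, under the assumptions of this sketch, the linear equation dR/d(alpha) = eta * s * (separation * tau * p_tau - m * R) for R = w_s . B_tau, with m = p_+ - p_-. This equation and its solution are a derivation made for this example, and all function names are hypothetical. The code compares the closed-form solution for R(0) = 0 with a simple Euler integration.

```python
import math

def r_analytic(s, tau, alpha, eta, separation, p_plus):
    """Closed-form solution of the averaged LVQ 2.1 equation, R(0) = 0."""
    m = 2.0 * p_plus - 1.0            # p_sigma = (1 + m * sigma) / 2
    p_tau = (1.0 + m * tau) / 2.0
    return separation * tau * p_tau / m * (1.0 - math.exp(-s * m * eta * alpha))

def r_euler(s, tau, alpha, eta, separation, p_plus, steps=20000):
    """Euler integration of dR/dalpha = eta * s * (sep * tau * p_tau - m * R)."""
    m = 2.0 * p_plus - 1.0
    p_tau = (1.0 + m * tau) / 2.0
    r, h = 0.0, alpha / steps
    for _ in range(steps):
        r += h * eta * s * (separation * tau * p_tau - m * r)
    return r

# Usage with the parameters quoted above: p_+ = 0.8, separation 1, eta = 0.5.
table = {
    (s, tau): (r_analytic(s, tau, 4.0, 0.5, 1.0, 0.8),
               r_euler(s, tau, 4.0, 0.5, 1.0, 0.8))
    for s in (+1, -1) for tau in (+1, -1)
}
```

For p_+ > 1/2 the projections of w_+ saturate, while those of w_- grow exponentially in alpha: the prototype of the less frequent class is repelled more often than it is attracted.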
problem: instability of the algorithm
due to repulsion of wrong prototypes
trivial classification for $\alpha \to \infty$: $\epsilon_g = \min\{ p_+, p_- \}$
[Figure: clusters with prior weights $p_+ > p_-$]
strategies:
- selection of data in a window close to
the current decision boundary:
slows down the repulsion,
system remains unstable
- Soft Robust Learning Vector Quantization [Seo & Obermayer]:
density-estimation based cost function
limiting case "Learning from mistakes": LVQ 2.1 step only
if the example is currently misclassified:
slow learning, poor generalization
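The winner takes it all
I) LVQ 1 [Kohonen]: only the winner is updated
according to the class membership
$$w_s^\mu = w_s^{\mu-1} + \frac{\eta}{N} \, \Theta\!\left( d_{-s}^\mu - d_s^\mu \right) s \, \sigma^\mu \left( \xi^\mu - w_s^{\mu-1} \right)$$
($\Theta(\dots) = 1$ for the winner $w_s$, $s = \pm 1$)
numerical integration for $w_s(0) = 0$
[Figure: $R_{++}, R_{+-}, R_{-+}, R_{--}$ and $Q_{++}, Q_{+-}, Q_{--}$ vs. $\alpha$; theory and simulation (N=200), $p_+ = 0.2$, $\ell = 1.2$, $\eta = 1.2$, averaged over 100 indep. runs]
[Figure: trajectories of $w_+$, $w_-$ in the $(B_+, B_-)$-plane, axes $R_{s+}$ and $R_{s-}$; markers at $\alpha = 20, 40, \dots, 140$; dotted line: optimal decision boundary; solid line: asymptotic position]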
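learning curve $\epsilon_g(\alpha)$ for LVQ 1
($p_+ = 0.2$, $\ell = 1.2$)
[Figure: $\epsilon_g$ (between 0.14 and 0.26) vs. $\alpha$ (0 to 300) for $\eta = 1.2$]
- stationary state: $\epsilon_g(\alpha \to \infty)$ grows lin. with $\eta$
- role of the learning rate
[Figure: $\epsilon_g$ vs. $\alpha$ for $\eta = 2.0, 0.4, 0.2$]
- variable rate $\eta(\alpha)$ !?
- well-defined asymptotics: $\eta \to 0$, $\alpha \to \infty$, $(\eta \alpha) \to \infty$
(ODE linear in $\eta$)
[Figure: $\epsilon_g$ (between 0.14 and 0.26) vs. $\eta \alpha$ (0 to 50), compared with min. $\epsilon_g$: suboptimal]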
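The winner takes it all
II) LVQ+ (only positive steps without repulsion)
$$w_s^\mu = w_s^{\mu-1} + \frac{\eta}{N} \, \Theta\!\left( d_{-s}^\mu - d_s^\mu \right) \delta_{s, \sigma^\mu} \left( \xi^\mu - w_s^{\mu-1} \right)$$
(update only if the winner is correct)
asymptotic configuration
symmetric about $\ell \, (B_+ + B_-)/2$
[Figure: asymptotic positions of $w_+$, $w_-$ relative to $B_+$, $B_-$; $p_+ = 0.2$, $\ell = 1.2$, $\eta = 1.2$]
classification scheme and the
achieved generalization error are
independent of the prior weights $p_\pm$
(and optimal for $p_\pm = 1/2$)
LVQ+ $\approx$ VQ within the classes
($w_s$ updated only from class $s$)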
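The two winner-takes-all rules differ only in how the class label enters the update. In the sketch below both are written as modulation functions of the prototype index s, the squared distances d and the example label sigma; the function names and the dictionary layout of `d` are assumptions made for this illustration.

```python
def theta(x):
    """Heaviside step function."""
    return 1.0 if x > 0 else 0.0

def f_lvq1(s, d, sigma):
    """LVQ 1: only the winner moves, towards the example if its class is
    correct (s == sigma) and away from it otherwise."""
    return theta(d[-s] - d[s]) * s * sigma

def f_lvq_plus(s, d, sigma):
    """LVQ+: only the winner moves, and only if its class is correct."""
    return theta(d[-s] - d[s]) * (1.0 if s == sigma else 0.0)

# Usage: prototype +1 is the winner (squared distances 1 vs. 9).
d = {+1: 1.0, -1: 9.0}
lvq1_values = [f_lvq1(s, d, sigma) for s in (+1, -1) for sigma in (+1, -1)]
lvq_plus_values = [f_lvq_plus(s, d, sigma) for s in (+1, -1) for sigma in (+1, -1)]
```

For the winning prototype LVQ 1 yields +1 or -1 depending on the label, whereas LVQ+ yields +1 or 0, so LVQ+ never performs a repulsive step.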
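asymptotics: $\eta \to 0$, $(\eta \alpha) \to \infty$
[Figure: asymptotic $\epsilon_g$ as a function of $p_+$ for the algorithms below, compared with the optimal classification]
- LVQ 2.1.: trivial assignment to the more frequent class, $\epsilon_g = \min\{ p_+, p_- \}$
- LVQ 1: here: close to optimal classification
- LVQ+: min-max solution, $p_\pm$-independent classification
[Figure: learning curves $\epsilon_g(\alpha)$ for LVQ+ and LVQ1; $p_+ = 0.2$, $\ell = 1.0$, $\eta = 1.0$]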
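Vector Quantization
competitive learning
$$w_s^\mu = w_s^{\mu-1} + \frac{\eta}{N} \, \Theta\!\left( d_{-s}^\mu - d_s^\mu \right) \left( \xi^\mu - w_s^{\mu-1} \right)$$
($\Theta(\dots) = 1$ for the winner $w_s$)
class membership is unknown
or identical for all data
numerical integration for $w_s(0) \approx 0$
($p_+ = 0.2$, $\ell = 1.0$, $\eta = 1.2$)
[Figure: $R_{++}, R_{+-}, R_{-+}, R_{--}$ (range 0 to 1.0) vs. $\alpha$ (0 to 300)]
[Figure: $\epsilon_g(\alpha)$ for VQ, LVQ+ and LVQ1]
system is invariant under
exchange of the prototypes
$\to$ weakly repulsive fixed points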
interpretations:
- VQ, unsupervised learning,
unlabelled data
- LVQ, two prototypes of the
same class, identical labels
- LVQ, different classes, but
labels are not used in training
asymptotics ($\alpha \to \infty$, $\eta \to 0$, $\eta \alpha \to \infty$)
[Figure: $\epsilon_g$ as a function of $p_+$]
$p_+ \approx 0$, $p_- \approx 1$:
- low quantization error
- high gen. error $\epsilon_g$
Summary
prototype-based learning
Vector Quantization and Learning Vector Quantization
a model scenario: two clusters, two prototypes
dynamics of online training
comparison of algorithms:
LVQ 2.1.: instability, trivial (stationary) classification
LVQ 1 : close to optimal asymptotic generalization
LVQ + : min-max solution w.r.t. asymptotic generalization
VQ : symmetry breaking, representation
work in progress, outlook
regularization of LVQ 2.1, Robust Soft LVQ [Seo, Obermayer]
model: different cluster variances, more clusters/prototypes
optimized procedures: learning rate schedules,
variational approach / density estimation / Bayes optimal on-line
several classes and prototypes
Perspectives
Self-Organizing Maps (SOM)
(many) N-dim. prototypes form a (low) d-dimensional grid
representation of data in a topology preserving map
neighborhood preserving SOM Neural Gas (distance based)
Generalized Relevance LVQ [Hammer & Villmann]
adaptive metrics, e.g. distance measure
$$d_\lambda(w, \xi) = \sum_{i=1}^{N} \lambda_i \left( w_i - \xi_i \right)^2$$
training
applications