
KI2 - 7

Clustering Algorithms

Johan Everts

Kunstmatige Intelligentie / RuG


What is Clustering?

Find K clusters (or a classification that consists of K clusters) so that
the objects of one cluster are similar to each other, whereas objects of
different clusters are dissimilar. (Bacher 1996)
The Goals of Clustering

 Determine the intrinsic grouping in a set of unlabeled data.

 What constitutes a good clustering?

 All clustering algorithms will produce clusters,
regardless of whether the data actually contains them

 There is no gold standard; it depends on the goal:

 data reduction
 “natural clusters”
 “useful” clusters
 outlier detection
Stages in clustering
Taxonomy of Clustering Approaches
Hierarchical Clustering

Agglomerative clustering treats each data point as a singleton cluster, and
then successively merges clusters until all points have been merged into a
single remaining cluster. Divisive clustering works the other way around.
Agglomerative Clustering

Single link

In single-link hierarchical clustering, we merge in each step the two
clusters whose two closest members have the smallest distance.
Agglomerative Clustering

Complete link

In complete-link hierarchical clustering, we merge in each step the two
clusters whose merger has the smallest diameter.
Example – Single Link AC

      BA   FI   MI   NA   RM   TO
BA     0  662  877  255  412  996
FI   662    0  295  468  268  400
MI   877  295    0  754  564  138
NA   255  468  754    0  219  869
RM   412  268  564  219    0  669
TO   996  400  138  869  669    0
Example – Single Link AC

        BA   FI  MI/TO   NA   RM
BA       0  662    877  255  412
FI     662    0    295  468  268
MI/TO  877  295      0  754  564
NA     255  468    754    0  219
RM     412  268    564  219    0


Example – Single Link AC

        BA   FI  MI/TO  NA/RM
BA       0  662    877    255
FI     662    0    295    268
MI/TO  877  295      0    564
NA/RM  255  268    564      0


Example – Single Link AC

          BA/NA/RM   FI  MI/TO
BA/NA/RM         0  268    564
FI             268    0    295
MI/TO          564  295      0


Example – Single Link AC

             BA/FI/NA/RM  MI/TO
BA/FI/NA/RM            0    295
MI/TO                295      0
Example – Single Link AC
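The merge sequence above can be reproduced with a short pure-Python sketch of single-link agglomerative clustering (no external libraries; the helper names are my own). Swapping `min` for `max` in the cluster-distance rule turns single link into complete link:

```python
from itertools import combinations

def single_link(dist):
    """Single-link agglomerative clustering.

    dist maps frozenset({a, b}) -> distance for every pair of points.
    Returns the merges as (merged_cluster, merge_distance), in order.
    """
    points = set().union(*dist)
    clusters = [frozenset([p]) for p in points]
    merges = []
    while len(clusters) > 1:
        # single link: cluster distance = distance of the two closest members
        # (for complete link, replace min with max to get the diameter rule)
        def cdist(c1, c2):
            return min(dist[frozenset([a, b])] for a in c1 for b in c2)
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: cdist(*pair))
        merges.append((c1 | c2, cdist(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return merges

# distance matrix from the example above
names = ["BA", "FI", "MI", "NA", "RM", "TO"]
rows = [
    [0, 662, 877, 255, 412, 996],
    [662, 0, 295, 468, 268, 400],
    [877, 295, 0, 754, 564, 138],
    [255, 468, 754, 0, 219, 869],
    [412, 268, 564, 219, 0, 669],
    [996, 400, 138, 869, 669, 0],
]
dist = {frozenset([names[i], names[j]]): rows[i][j]
        for i in range(6) for j in range(6) if i != j}

for cluster, d in single_link(dist):
    print(sorted(cluster), d)
# first merge is MI/TO at distance 138; the final merge happens at 295
```

The printed merge distances (138, 219, 255, 268, 295) match the successive matrix reductions shown above.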
Taxonomy of Clustering Approaches
Square error
K-Means

 Step 0: Start with a random partition into K clusters

 Step 1: Generate a new partition by assigning each
pattern to its closest cluster center
 Step 2: Compute new cluster centers as the
centroids of the clusters
 Step 3: Repeat steps 1 and 2 until the memberships
no longer change (the cluster centers then also
remain the same)
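The four steps above can be sketched in a few lines of plain Python (a minimal illustration, not an optimized implementation; the function and parameter names are my own):

```python
import random

def kmeans(points, k, seed=0):
    """Minimal K-means sketch: points are tuples of equal dimension."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)          # Step 0: random initial centers
    while True:
        # Step 1: assign each pattern to its closest cluster center
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[i].append(p)
        # Step 2: recompute centers as the centroids of the clusters
        new_centers = [
            tuple(sum(xs) / len(xs) for xs in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        # Step 3: stop when the centers (hence memberships) no longer change
        if new_centers == centers:
            return centers, clusters
        centers = new_centers

centers, clusters = kmeans(
    [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9)], k=2)
```

On this toy data the loop separates the two tight pairs into two clusters regardless of the random initialization.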
K-Means
K-Means – How many K’s ?
K-Means – How many K’s ?
Locating the ‘knee’

The knee of a curve is defined as the point of
maximum curvature.
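On a discrete error-vs-K curve, the point of maximum curvature can be approximated by the largest second difference. A minimal sketch; the `sse` values here are made up for illustration:

```python
def knee_index(errors):
    """Index of the knee of a discrete curve, approximated as the
    point with the largest second difference (a curvature proxy)."""
    second_diff = [errors[i - 1] - 2 * errors[i] + errors[i + 1]
                   for i in range(1, len(errors) - 1)]
    return 1 + max(range(len(second_diff)), key=second_diff.__getitem__)

sse = [100, 40, 15, 12, 10, 9]   # hypothetical squared error for K = 1..6
k = knee_index(sse) + 1          # list index 1 corresponds to K = 2
```

This is only a heuristic: it assumes an evenly spaced K axis, and noisy error curves may need smoothing before the second difference is meaningful.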
Leader - Follower

 Online: instances are processed one at a time
 Specify a threshold distance

 Find the closest cluster center

 Distance above threshold? Create a new cluster
 Or else, add the instance to that cluster
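For 1-D instances, the procedure above can be sketched as follows. The update rate `eta` is an illustrative choice, since the slides do not fix how the winning center is moved:

```python
def leader_follower(stream, threshold, eta=0.5):
    """Online leader-follower sketch for 1-D instances.

    eta is an assumed update rate (not specified in the slides).
    """
    centers = []
    for x in stream:
        if not centers:
            centers.append(x)
            continue
        # find the closest cluster center
        i = min(range(len(centers)), key=lambda j: abs(x - centers[j]))
        if abs(x - centers[i]) > threshold:
            centers.append(x)                     # above threshold: new cluster
        else:
            centers[i] += eta * (x - centers[i])  # else: absorb, update center

    return centers

centers = leader_follower([0.0, 0.2, 10.0, 10.4], threshold=2.0)
# two clusters emerge, with centers near 0.1 and 10.2
```

Because the algorithm is online, the result depends on the presentation order of the instances, which is one source of the instability noted later in the performance analysis.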
Leader - Follower

Distance < Threshold

 Find the closest cluster center

 Distance above threshold? Create a new cluster
 Or else, add the instance to the cluster and update the
cluster center
Leader - Follower

Distance > Threshold

 Find the closest cluster center

 Distance above threshold? Create a new cluster
 Or else, add the instance to the cluster and update the
cluster center
Kohonen SOMs

The Self-Organizing Map (SOM) is an unsupervised
artificial neural network algorithm. It is a compromise
between biological modeling and statistical data processing.
Kohonen SOMs

 Each weight is representative of a certain input.

 Input patterns are shown to all neurons simultaneously.
 Competitive learning: the neuron with the largest response is chosen.
Kohonen SOMs

 Initialize weights
 Repeat until convergence
 Select next input pattern
 Find Best Matching Unit
 Update weights of winner and neighbours
 Decrease learning rate & neighbourhood size
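A minimal 1-D sketch of this training loop; the linear decay schedule and Gaussian neighbourhood function are common but illustrative choices, not prescribed by the slides:

```python
import math
import random

def train_som(data, n_units, epochs=50, seed=0):
    """1-D SOM sketch: units sit on a line, inputs are scalars in [0, 1]."""
    rng = random.Random(seed)
    weights = [rng.random() for _ in range(n_units)]      # initialize weights
    for t in range(epochs):
        frac = 1 - t / epochs
        lr = 0.5 * frac                                   # decreasing learning rate
        radius = max(0.5, (n_units / 2) * frac)           # shrinking neighbourhood
        for x in data:                                    # next input pattern
            # Best Matching Unit: neuron whose weight is closest to the input
            bmu = min(range(n_units), key=lambda i: abs(x - weights[i]))
            for i in range(n_units):
                # Gaussian neighbourhood over grid distance to the BMU:
                # the winner and its neighbours are pulled toward the input
                h = math.exp(-((i - bmu) ** 2) / (2 * radius ** 2))
                weights[i] += lr * h * (x - weights[i])
    return weights

w = train_som([0.1] * 5 + [0.9] * 5, n_units=4, seed=0)
```

Since every update moves a weight a fraction of the way toward an input in [0.1, 0.9], the weights stay bounded in [0, 1] throughout training.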

Learning rate & neighbourhood size


Kohonen SOMs

Distance-related learning


Kohonen SOMs
Some nice illustrations
Kohonen SOMs

 Kohonen SOM Demo (from ai-junkie.com):
mapping a 3D colorspace onto a 2D Kohonen map
Performance Analysis

 K-Means
 Depends a lot on a priori knowledge (K)
 Very Stable

 Leader Follower
 Depends a lot on a priori knowledge (Threshold)
 Faster but unstable
Performance Analysis

 Self Organizing Map


 Stability and Convergence Assured
 Principle of self-ordering

 Slow and many iterations needed for convergence


 Computationally intensive
Conclusion

 No Free Lunch theorem

 Any elevated performance over one class is
exactly paid for in performance over another class

 Ensemble clustering?
 Use a SOM and the basic Leader-Follower algorithm to identify
clusters, and then use k-means clustering to refine them.
Any Questions?
