Professional Documents
Culture Documents
Cluster Analysis
9
8
Eventually all nodes belong to the same cluster 9
8
9
7 7 7
6 6 6
5 5 5
4 4 4
3 3 3
2 2 2
1 1 1
0 0 0
0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10
9 9
9
8 8
8
7 7
7
6 6
6
5 5
5
4 4
4
3 3
3
2 2
2
1 1
1
0 0
0
0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10
0 1 2 3 4 5 6 7 8 9 10
clustering
BIRCH (1996): uses CF-tree and incrementally
9
(3,4)
(2,6)
8
4 (4,5)
3
1
(4,7)
(3,8)
0
0 1 2 3 4 5 6 7 8 9 10
Non-leaf node
CF1 CF2 CF3 CF5
child1 child2 child3 child5
x
y y
x x
x x
Data Mining: Concepts and
June 27, 2009 Techniques 13
Cure: Shrinking Representative
Points
y y
x x
Basic ideas:
Similarity function and neighbors: T ∩T
Sim( T , T ) = 1 2
T ∪T
1 2
{3} 1
Sim( T 1, T 2) = = =0.2
{1,2,3,4,5} 5
Data Mining: Concepts and
June 27, 2009 Techniques 15
Rock: Algorithm
Links: The number of common neighbours
for the two points.
{1,2,3}, {1,2,4}, {1,2,5}, {1,3,4}, {1,3,5}
{1,4,5}, {2,3,4}, {2,3,5}, {2,4,5}, {3,4,5}
{1,2,3} 3 {1,2,4}
Algorithm
Draw random sample
model
Two clusters are merged only if the
Data Set
Merge Partition
Final Clusters
Advantages:
Straightforward
Dendograms provide good visual guide for the
clustering
Works very well with domains that are naturally
nested
(e.g. plant and animal classifications)
Disadvantages: