C-Means Clustering
Professor Ashok Deshpande, PhD (Engineering)
Founding Chair: Berkeley Initiative in Soft Computing
(BISC) - Special Interest Group (SIG) - Environment
Management Systems (EMS)
Guest Faculty: University of California, Berkeley, CA, USA
Visiting Professor: University of New South Wales, Canberra,
Australia, and Indian Institute of Technology Mumbai
Adjunct Professor: NIT Silchar and College of Engineering Pune
Former Deputy Director: National Environmental Engineering
Research Institute (NEERI), India
So far as the laws of mathematics
refer to reality they are not
certain. And so far as they are
certain, they do not refer to
reality.
Albert Einstein
Theoretical Physicist and Nobel Laureate,
Geometrie und Erfahrung, Lecture to the Prussian Academy, 1921
Classification and Clustering
David Hume
Scottish philosopher
An Enquiry Concerning Human Understanding, 1748
A Commentary on Clustering
and Classification
Examples
Image recognition
Molecular biology applications, such as protein folding
and 3D molecular structure
Oil exploration
Cancer detection, and so on.
Sequence Alignment
Phylogenetic Tree Building
Ecological Data Analysis
Grouping patients into disease areas
Physiology and Anatomy
Taxonomy
Chemistry and so on.
Basic Methods
Clustering Formalism
Three Types
1. Based on classical methods:
Similarity measure:

  s_ijk = 1 − |xik − xjk| / Rk

where xik and xjk are the two individuals' values for
variable k, and Rk is the range of variable k, usually in
the set of individuals to be considered.
Data: Five Psychiatrically Ill Patients
Euclidean Distance

[Figure: two objects at (X1, Y1) and (X2, Y2), with horizontal leg X2 − X1 and vertical leg Y2 − Y1]

  Distance = [(X2 − X1)^2 + (Y2 − Y1)^2]^(1/2)

For the five-patient data, for example, d13 = 12.37.
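As a minimal sketch, the Euclidean distance above translates directly into Python (the function name is illustrative):

```python
import math

def euclidean(p, q):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# A 3-4-5 right triangle: legs 3 and 4 give distance 5
print(euclidean((1, 2), (4, 6)))  # 5.0
```

The same function works unchanged for feature vectors of any dimension, since it sums the squared differences coordinate by coordinate.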
Dissimilarity and Distance Measures
What to do?
Standardization: zik = xik / sk, where sk is the standard
deviation of the kth variable.
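A quick sketch of this standardization step (an illustrative helper; the population standard deviation is assumed):

```python
def standardize(column):
    """Divide each value x_ik by s_k, the standard deviation of variable k."""
    n = len(column)
    mean = sum(column) / n
    sk = (sum((x - mean) ** 2 for x in column) / n) ** 0.5
    return [x / sk for x in column]

# This column has standard deviation 2, so every value is halved
print(standardize([2, 4, 4, 4, 5, 5, 7, 9]))  # [1.0, 2.0, 2.0, 2.0, 2.5, 2.5, 3.5, 4.5]
```

After this rescaling, each variable contributes on a comparable scale to the distance computation, so no single variable dominates.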
Distance Measures
Other options include:
Mahalanobis Distance
Quantitative Variables
A distance measure that geneticists have used when
describing groups, or populations, in terms of gene
frequencies is the so-called genetic distance, dAB,
defined as

  dAB = 1 − cos θ,  where  cos θ = Σi √(piA piB)

The terms piA and piB are the gene frequencies for the
ith allele at a given locus in the two populations.
The angular transformation of the proportions has a
variance-stabilizing role. When several genetic loci are
considered, the dAB values are added together.
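The single-locus version of this distance can be sketched as follows (the function name and sample frequencies are illustrative):

```python
import math

def genetic_distance(pA, pB):
    """d_AB = 1 - cos(theta), with cos(theta) = sum_i sqrt(p_iA * p_iB).

    pA, pB: allele frequencies at one locus in populations A and B.
    """
    cos_theta = sum(math.sqrt(a * b) for a, b in zip(pA, pB))
    return 1.0 - cos_theta

# Identical frequency profiles give distance 0; disjoint ones give 1
print(genetic_distance([0.5, 0.5], [0.5, 0.5]))  # 0.0
print(genetic_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0
```

For several loci, as the text notes, the per-locus dAB values would simply be summed.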
Issue 1
How to measure (mathematical) similarity or
dissimilarity between pairs of observations?
A simple method could be: the distance between pairs
of feature vectors in a feature space. We then expect
that the distance between points in the same cluster
will be considerably less than the distance between
points in different clusters.
Issues of Importance in Cluster Analysis
Issue 2
How to evaluate the partitions (clusters) once they are
formed? This could be termed cluster validity.
In this case, it is necessary to identify the value of the
cluster count c that gives the most plausible number of
clusters in the data for the analysis at hand.
Nonhierarchical Methods
[Figure: data samples plotted in a two-dimensional feature space (v1, v2), with samples of two classes marked o and x]

  ∪i=1..c Ai = X
  Ai ∩ Aj = ∅, for all i ≠ j          (3)
  ∅ ⊂ Ai ⊂ X, for all i               (4)

where X = {x1, x2, x3, …, xn} is a finite set space of the
universe of data samples, and c is the number of classes,
or partitions, or clusters, into which we want to classify
the data.
Suppose we have the case where c = 2; the following are
the set expressions:

  A2 = Ā1,   A1 ∪ Ā1 = X   and   A1 ∩ Ā1 = ∅
Explanation:
Any sample xk can only, and definitely, belong to one of
the c classes (eqs. 7 and 8), while eq. 9 implies that no
class is empty and no class is the whole set X (that is,
the universe).
The membership of the jth data point in the ith cluster
or class is defined to be χij = χAi(xj). We now define the
matrix U comprising the elements χij (i = 1, 2, …, c;
j = 1, 2, …, n): that is, c rows and n columns. We define a
hard c-partition space for X as the matrix set Mc, whose
cardinality is

  ηMc = (1/c!) Σi=1..c C(c, i) (−1)^(c−i) i^n          (12)
Example
X = {x1, x2, x3, x4, x5}, that is, n = 5, and suppose we
want to cluster these points into c = 2 clusters. Then

  ηMc = (1/2!){2(−1)(1)^5 + (1)(2)^5} = 15
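The cardinality formula is easy to evaluate in code; a small sketch (the function name is illustrative):

```python
from math import comb, factorial

def hard_partition_count(c, n):
    """Eq. (12): (1/c!) * sum_{i=1}^{c} C(c, i) * (-1)^(c - i) * i^n."""
    total = sum(comb(c, i) * (-1) ** (c - i) * i ** n
                for i in range(1, c + 1))
    return total // factorial(c)  # always an integer

print(hard_partition_count(2, 5))  # 15, matching the example
print(hard_partition_count(2, 4))  # 7
```

Note how quickly the count grows with c and n, which is why exhaustive search over all hard partitions is impractical and an iterative optimization is used instead.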
The clustering objective function is

  J(U, v) = Σk=1..n Σi=1..c χik (dik)^2          (13)

where dik is a Euclidean distance measure (in the
m-dimensional feature space R^m) between the kth data
sample xk and the ith cluster center vi, given by

  dik = d(xk − vi) = ||xk − vi|| = [ Σj=1..m (xkj − vij)^2 ]^(1/2)          (14)
Since each data sample requires m coordinates to
describe its location in R^m, each cluster center also
requires m coordinates to describe its location in this
same space. Therefore, the ith cluster center is a vector
of length m,

  vi = {vi1, vi2, …, vim}          (15)

where the jth coordinate is calculated by

  vij = [ Σk=1..n χik xkj ] / [ Σk=1..n χik ]          (16)
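Equation (16) amounts to averaging the samples assigned to each cluster; a small sketch (the data values are illustrative):

```python
def cluster_centers(U, X):
    """v_ij = sum_k chi_ik * x_kj / sum_k chi_ik for a hard partition U (c x n)."""
    centers = []
    for row in U:                      # one row of U per cluster
        size = sum(row)                # number of samples with chi_ik = 1
        centers.append([sum(chi * x[j] for chi, x in zip(row, X)) / size
                        for j in range(len(X[0]))])
    return centers

X = [(0, 0), (2, 0), (10, 10)]         # n = 3 samples, m = 2 features
U = [[1, 1, 0],                        # cluster 1: first two samples
     [0, 0, 1]]                        # cluster 2: last sample
print(cluster_centers(U, X))  # [[1.0, 0.0], [10.0, 10.0]]
```

Because the memberships χik are 0 or 1, the weighted sum in eq. (16) reduces to the plain mean of the samples in each cluster.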
[Figure: four data points x1 = (1, 3), x2 = (1.5, 3.2), x3 = (1.3, 2.8), and x4 = (3, 1) in a two-dimensional feature space]

Four data points in a two-dimensional feature space.
To classify these data points into two classes (c = 2), it
is desirable first to compute the cardinality of the
possible number of crisp partitions for the system, i.e.,
to find ηMc.
Assume the initial hard partition

  U(0) = [ 1 0 0 0
           0 1 1 1 ]

We seek the optimal partition U* from among the seven
possible partitions. With a desired tolerance or
convergence level ε:  U(0) → U(1) → U(2) → … → U*
For class 1 we calculate the coordinates of the cluster
center:

  v1j = (χ11 x1j + χ12 x2j + χ13 x3j + χ14 x4j) / (χ11 + χ12 + χ13 + χ14)
      = ((1) x1j + (0) x2j + (0) x3j + (0) x4j) / (1 + 0 + 0 + 0)

and vi = {vi1, vi2, …, vim}.
For m = 2, which means we deal with two coordinates for
each data point: vi = {vi1, vi2},
where for c = 1 (class 1) v1 = {v11, v12}; similarly, for
c = 2 (class 2) v2 = {v21, v22}.
Therefore, using the expression for vij for c = 1 and
j = 1, 2, respectively:

  v11 = 1(1)/1 = 1    (x coordinate)
  v12 = 1(3)/1 = 3    (y coordinate)    v1 = {1, 3}

which just happen to be the coordinates of point x1,
since this is the only point in the class for the assumed
initial partition U(0).
For c = 2, we get the cluster center coordinates

  v2j = ((0) x1j + (1) x2j + (1) x3j + (1) x4j) / (0 + 1 + 1 + 1)
      = (x2j + x3j + x4j) / 3
d11 = 0 (v1 = x1), thus χ11 = 1
d12 = 0.54, min(d12, d22) = min(0.54, 0.97) = 0.54, thus χ12 = 1
d13 = 0.36, min(d13, d23) = min(0.36, 0.78) = 0.36, thus χ13 = 1
d14 = 2.83, min(d14, d24) = min(2.83, 1.70) = 1.70, thus χ14 = 0
Therefore, the updated partition is

  U(1) = [ 1 1 1 0
           0 0 0 1 ]
Since the partitions U(0) and U(1) are different, we
repeat the same procedure based on the new setup of
two classes. For c = 1, the center coordinates are

  v1j = (x1j + x2j + x3j) / (1 + 1 + 1 + 0), since χ14 = 0
  v11 = (x11 + x21 + x31)/3 = (1 + 1.5 + 1.3)/3 = 1.26
  v12 = (x12 + x22 + x32)/3 = (3 + 3.2 + 2.8)/3 = 3.0
  v1 = {1.26, 3.0}

and for c = 2, the center coordinates are

  v2j = x4j / (0 + 0 + 0 + 1), since χ21 = χ22 = χ23 = 0
  v21 = 3/1 = 3
  v22 = 1/1 = 1    v2 = {3, 1}
d11 = 0.26, min(d11, d21) = min(0.26, 2.83) = 0.26, thus χ11 = 1
d12 = 0.31, min(d12, d22) = min(0.31, 2.66) = 0.31, thus χ12 = 1
d13 = 0.20, min(d13, d23) = min(0.20, 2.47) = 0.20, thus χ13 = 1
d14 = 2.65, min(d14, d24) = min(2.65, 0) = 0.0, thus χ14 = 0
Because the partitions U(1) and U(2) are identical, we
can say the iterative process has converged; therefore,
the optimum hard (crisp) partition is

  U* = [ 1 1 1 0
         0 0 0 1 ]
The optimum partition tells us that, for the catalytic
converter example, the data points x1, x2, and x3 are
more indicative of a non-polluting converter than is
data point x4.
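The whole HCM iteration above can be condensed into a short sketch (assuming Euclidean distance and that no cluster empties during an update; the function name is illustrative). Starting from the centers implied by U(0), it reproduces the optimum partition:

```python
import math

def hcm(X, centers, tol=1e-6, max_iter=100):
    """Hard c-means: alternate nearest-center assignment and mean update."""
    labels = []
    for _ in range(max_iter):
        # Assignment step: each sample joins its nearest center
        labels = [min(range(len(centers)),
                      key=lambda i: math.dist(x, centers[i])) for x in X]
        # Update step: each center becomes the mean of its members
        new_centers = []
        for i in range(len(centers)):
            members = [x for x, lab in zip(X, labels) if lab == i]
            new_centers.append([sum(col) / len(members) for col in zip(*members)])
        if all(math.dist(a, b) <= tol for a, b in zip(centers, new_centers)):
            return labels, new_centers
        centers = new_centers
    return labels, centers

X = [(1, 3), (1.5, 3.2), (1.3, 2.8), (3, 1)]
# Initial centers from U(0): v1 = x1, v2 = mean of x2, x3, x4
labels, centers = hcm(X, centers=[(1, 3), (1.9333, 2.3333)])
print(labels)  # [0, 0, 0, 1] -- x1, x2, x3 together, x4 alone
```

The returned centers are approximately (1.27, 3.0) and (3, 1), matching the hand computation above.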
[Figure: the final two clusters under U(Final) in the two-dimensional feature space: cluster C1 contains x1 = (1, 3), x2 = (1.5, 3.2), and x3 = (1.3, 2.8); cluster C2 contains x4 = (3, 1).]
HCM is also known as k-means clustering.
What Next?
Fuzzy c-Means (FCM): A Primer

Iterative Optimization
To which class does this point belong?
  U(4) = [ 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0
           0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 ]
Example-3
Suppose you are a fruit geneticist interested in genetic
relationships among fruits. In particular, you know that
a tangelo is a cross between a grapefruit and a
tangerine. You describe the fruit with such features as
colour, weight, sphericity, sugar content, skin texture,
and so on. Hence, your feature space could be
high-dimensional.
We can show that the sum of each row of U is a number
between 0 and n:

  0 < Σk μ1k = 1.62 < 3  and  0 < Σk μ2k = 1.38 < 3

and that the pairwise minima are nonzero, so the two
fuzzy clusters overlap:

  μ11 ∧ μ21 = min(0.91, 0.09) = 0.09 ≠ 0
  μ12 ∧ μ22 = min(0.58, 0.42) = 0.42 ≠ 0, and μ13 ∧ μ23 = 0.13 ≠ 0
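These properties are easy to verify mechanically; a quick check using the membership values quoted above:

```python
# Membership matrix with mu_1 = (0.91, 0.58, 0.13), mu_2 = (0.09, 0.42, 0.87)
U = [[0.91, 0.58, 0.13],
     [0.09, 0.42, 0.87]]
n = len(U[0])

# Each column sums to 1: total membership of each sample is 1
col_ok = all(abs(sum(row[k] for row in U) - 1.0) < 1e-9 for k in range(n))

# Each row sum lies strictly between 0 and n
row_sums = [sum(row) for row in U]  # [1.62, 1.38]

# Pairwise minima are nonzero, so the fuzzy clusters overlap
overlaps = [min(U[0][k], U[1][k]) for k in range(n)]  # [0.09, 0.42, 0.13]
print(col_ok, row_sums, overlaps)
```

These are exactly the defining conditions of a fuzzy c-partition, in contrast to the crisp 0/1 memberships of the hard case.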
How??
Fuzzy c-Means Algorithm

The membership functions and cluster prototypes are

  Ci(x) = [ Σj=1..k ( ||x − vi||^2 / ||x − vj||^2 )^(1/(m−1)) ]^(−1),   1 ≤ i ≤ k, x ∈ X          [2]

  vi = Σx∈X [Ci(x)]^m · x  /  Σx∈X [Ci(x)]^m,   1 ≤ i ≤ k          [3]
Based on this theorem, FCM iteratively updates the
membership functions using equation [2] and the
prototypes using equation [3] until a convergence
criterion is reached.
The Fuzzy c-Means (FCM) Algorithm

FCM(X, C, m, ε)
X: an unlabeled data set; C: the number of clusters to
form; m: the fuzzifier parameter in the objective
function; ε: the threshold for the convergence criterion.

  Initialize the prototypes V = {v1, v2, …, vc}
  Repeat
    previous V ← V
    Compute the membership functions using equation [2]
    Update the prototypes vi in V using equation [3]
  Until Σi=1..c ||vi − previous vi|| ≤ ε
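A compact, self-contained sketch of this algorithm (Euclidean distances, prototypes initialized to the first c samples; all names are illustrative):

```python
import math

def fcm(X, c, m=2.0, eps=1e-5, max_iter=200):
    """Fuzzy c-means: alternate the membership update (eq. [2]) and the
    prototype update (eq. [3]) until the prototypes stop moving."""
    V = [list(x) for x in X[:c]]                 # initial prototypes
    U = []
    for _ in range(max_iter):
        prev_V = [v[:] for v in V]
        # Membership update (eq. [2])
        U = []
        for i in range(c):
            row = []
            for x in X:
                d_i = math.dist(x, V[i])
                if d_i == 0.0:
                    row.append(1.0)              # sample sits on prototype i
                    continue
                s = sum((d_i / max(math.dist(x, V[j]), 1e-12)) ** (2.0 / (m - 1.0))
                        for j in range(c))
                row.append(1.0 / s)
            U.append(row)
        # Prototype update (eq. [3]): membership-weighted means
        for i in range(c):
            w = [u ** m for u in U[i]]
            V[i] = [sum(wk * x[j] for wk, x in zip(w, X)) / sum(w)
                    for j in range(len(X[0]))]
        if sum(math.dist(v, pv) for v, pv in zip(V, prev_V)) <= eps:
            break
    return U, V

X = [(2, 12), (4, 9), (7, 13), (11, 5), (12, 7), (14, 4)]
U, V = fcm(X, c=2)
print([[round(v, 3) for v in p] for p in V])
```

Note the exponent 2/(m − 1) in the membership update: it comes from the squared-norm ratio in eq. [2] raised to the power 1/(m − 1).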
Problem on Fuzzy c-Means

[Figure: six data points in the feature space (f1, f2): x1 = (2, 12), x2 = (4, 9), x3 = (7, 13), x4 = (11, 5), x5 = (12, 7), x6 = (14, 4), with initial prototypes v1 = (5, 5) and v2 = (10, 10).]
[Figure: the same six data points with the updated prototypes V1 = (6.6273, 9.1484) and V2 = (9.7374, 8.4887).]
Problem on Fuzzy c-Means

With m = 2, the updated prototype is

  v2 = Σk=1..6 [C2(xk)]^2 · xk  /  Σk=1..6 [C2(xk)]^2

     = [ (0.4609)^2 (2,12) + (0.3148)^2 (4,9) + (0.7909)^2 (7,13)
       + (0.5806)^2 (11,5) + (0.803)^2 (12,7) + (0.6119)^2 (14,4) ]
       / [ (0.4609)^2 + (0.3148)^2 + (0.7909)^2 + (0.5806)^2 + (0.803)^2 + (0.6119)^2 ]

     = (22.326/2.2928, 19.4629/2.2928)
     = (9.7374, 8.4887)
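The same weighted-mean computation can be checked in a few lines. The membership values on the slide are rounded, so this recomputation reproduces the denominator closely but matches the quoted center (9.7374, 8.4887) only approximately:

```python
# Recomputing the prototype from the quoted memberships, with m = 2
mu = [0.4609, 0.3148, 0.7909, 0.5806, 0.803, 0.6119]
X = [(2, 12), (4, 9), (7, 13), (11, 5), (12, 7), (14, 4)]

w = [u ** 2 for u in mu]                     # [C2(x_k)]^m with m = 2
den = sum(w)                                 # approx. 2.293
center = tuple(sum(wk * x[j] for wk, x in zip(w, X)) / den
               for j in range(2))
print(den, center)
```

The small discrepancy is exactly what one expects from carrying only four decimal places in the memberships.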
Example of Fuzzy c-Means Algorithm

[Figure: the six data points plotted again with the updated prototypes V1 = (6.6273, 9.1484) and V2 = (9.7374, 8.4887).]
Fuzzy c-Means: Example

[Figure: four data points x1, x2, x3, x4 in a two-dimensional feature space; x-axis ticks at 1, 1.5, and 2, with y-values near 5, 4, 1.5, and 0.5.]

For c = 1, using vij = Σk μik xkj / Σk μik:

  v11 = (1(1) + 1(1)) / 2 = 1
  v12 = (5(1) + 1.5(1)) / 2 = 3.25

The updated membership matrix is

  U = [ 0.775  0.668  0.268  0.284
        0.2738 0.3353 0.732  0.716 ]

and the largest change in membership is

  μik^(1) − μik^(0) = 0.27 > 0.05, so the iteration continues.
A Commentary:
Limitations and Research Efforts
Limitations of FCM
Hard clustering partitions the dataset into clusters such
that each object belongs to exactly one cluster. This
procedure is unsuitable for real-world datasets in which
memberships are not strict and there are no distinct
boundaries between the clusters.