
Fuzzy Clustering Techniques: Fuzzy C-Means and
Fuzzy Min-Max Clustering Neural Networks


Benjamin James Bush
SSIE 617 Term Paper, Fall 2012

|1| INTRODUCTION
Data clustering is a data processing strategy which aims to organize a collection of
data points (hereafter simply called points) into groups. Traditionally, the data set is
partitioned so that each point belongs to one and only one cluster. However, unless
the data is very highly clustered, it is often the case that some points do not
completely belong to any one cluster. With the arrival of fuzzy clustering, such
points can be assigned a degree of membership in each cluster instead of being
artificially pigeonholed as belonging to only one. The volume of literature available
on fuzzy clustering is immense; a general review of the literature is outside the
scope of this term paper. This paper discusses only two approaches to fuzzy
clustering: the ubiquitous Fuzzy C-Means clustering algorithm and the less well
known but interesting Fuzzy Min-Max Clustering Neural Network. These approaches
are discussed in sections 2 and 3, respectively. In section 4 I will briefly discuss
several applications which use the fuzzy clustering techniques covered here.

|2| THE FUZZY C-MEANS (FCM) CLUSTERING ALGORITHM
Fuzzy C-Means, also known as Fuzzy K-Means and Fuzzy ISODATA, is one of the
oldest and most ubiquitous fuzzy clustering algorithms. FCM is a generalization of
the K-Means clustering algorithm, which is a simple and widely used method for
finding crisp clusters. Understanding FCM's crisp ancestor is instructive and is
discussed below.
|2.1| K-MEANS CLUSTERING
The K in K-Means refers to the fact that in K-Means clustering, the number of
clusters is decided before the process begins. The Means in K-Means refers to the
fact that each cluster is characterized by the mean of all the points that belong to
the cluster. Thus, in K-Means clustering our goal is literally to find K means, thereby
giving us the K clusters we seek. In particular, the means we seek are those which
minimize the cost function depicted in the following figure:

Figure 1: Cost function minimized during K-Means clustering. Equation taken from [1].
Annotations by me.
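Since Figure 1 is reproduced only as an image, the cost function is restated below in my own notation, which may differ cosmetically from [1]: K is the number of clusters, c_j is the j-th centroid, and C_j is the set of points currently assigned to it.

$$J \;=\; \sum_{j=1}^{K} \; \sum_{\mathbf{x}_i \in C_j} \left\lVert \mathbf{x}_i - \mathbf{c}_j \right\rVert^2$$

Minimizing J means choosing centroids (and the induced assignments) so that every point lies as close as possible to the centroid of its own cluster.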

The process is initialized by picking K different centroids at random from the
space in which the points are embedded. From here, the K-Means process can be
divided into two phases:
Phase 1: Form Clusters. Each centroid is associated with a different cluster. To
form these clusters, each point in the data set is evaluated in turn. When
evaluated, a point is assigned to the cluster corresponding to the closest centroid.
Phase 2: Move Centroids. Each of the centroids is now moved to the position
obtained by taking the mean of each of the points in the cluster associated with
the centroid.
These two phases are repeated in turn until convergence is reached (i.e. until the
value of the cost function stops decreasing significantly). It should be noted that
there is no guarantee that the cost function will be globally minimized; the outcome
depends on the initial conditions. A flow chart is provided below to aid the reader's
understanding of the algorithm.

Figure 2: A flow chart summarizing the K-Means Clustering Algorithm.
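For readers who prefer code to flow charts, a minimal Mathematica sketch of the two-phase loop is given below (Mathematica is also the language used in the appendix). The function name kMeans and all implementation details are my own illustrative choices rather than code from [1]; the sketch assumes points is a list of numeric vectors and that k does not exceed the number of points.

kMeans[points_, k_, maxIter_: 100] :=
 Module[{centroids = RandomSample[points, k], labels, newCentroids},
  Do[
   (* Phase 1: assign each point to the cluster of its nearest centroid *)
   labels = Table[First[Ordering[Table[Norm[p - c], {c, centroids}], 1]], {p, points}];
   (* Phase 2: move each centroid to the mean of the points assigned to it *)
   newCentroids = Table[
     With[{members = Pick[points, labels, j]},
      If[members === {}, centroids[[j]], Mean[members]]], {j, k}];
   (* stop once the centroids no longer move *)
   If[newCentroids === centroids, Break[]];
   centroids = newCentroids,
   {maxIter}];
  {centroids, labels}]

For example, kMeans[RandomReal[1, {200, 2}], 3] partitions 200 random two-dimensional points into three crisp clusters.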

Understanding of the K-Means clustering algorithm can be further enhanced by
viewing a series of animated GIF images produced by Andrey A. Shabalin, available
at http://shabal.in/visuals.html
Key frames from the animation are provided below for the reader's convenience.
Visually inspecting these key frames in conjunction with the above flow chart can
be very instructive.

Figure 3: Key frames from an animation on k-means clustering by Andrey A. Shabalin, PhD

|2.2| FUZZY C-MEANS CLUSTERING (FCM)


FCM is a generalization of K-Means. While K-Means assigns each point to one and
only one cluster, FCM allows clusters to be fuzzy sets, so that each point belongs to
all clusters to varying degrees, with the following restriction: The sum of all
membership degrees for any given data point is equal to 1. The cost function used
in FCM (shown in Figure 4) is very similar to the one used by K-Means, but there are
some key differences: the inner sum contains a term for every data point in the set,
and each of these terms is weighted by a membership degree raised to the power of
a fuzziness exponent.

Figure 4: Cost function for FCM. Figure adapted from [1]. Annotations by me. Compare with Figure
1.
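Since Figure 4 is likewise reproduced only as an image, the FCM cost function is restated below in my own notation, which may differ cosmetically from [1]: N is the number of points, C the number of clusters, u_ij the membership degree of point x_i in cluster j, c_j the j-th centroid, and m > 1 the fuzziness exponent.

$$J_m \;=\; \sum_{j=1}^{C} \sum_{i=1}^{N} u_{ij}^{\,m} \left\lVert \mathbf{x}_i - \mathbf{c}_j \right\rVert^2$$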

Applying the method of Lagrange multipliers to minimize the above cost function
yields the following necessary (but not sufficient) constraints [1]:
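In the notation introduced above (again, possibly differing cosmetically from [1]), these constraints are the standard FCM update equations:

$$\mathbf{c}_j = \frac{\sum_{i=1}^{N} u_{ij}^{\,m}\, \mathbf{x}_i}{\sum_{i=1}^{N} u_{ij}^{\,m}}, \qquad u_{ij} = \frac{1}{\displaystyle\sum_{k=1}^{C} \left( \frac{\lVert \mathbf{x}_i - \mathbf{c}_j \rVert}{\lVert \mathbf{x}_i - \mathbf{c}_k \rVert} \right)^{2/(m-1)}}$$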

Like K-means, FCM is initialized by choosing a fixed number of centroids at random.


Also like K-Means, after initialization FCM proceeds in two phases:
Phase 1: Form Clusters. Each centroid is associated with a different fuzzy
cluster. To form these clusters, each point in the data set is evaluated in turn.
When evaluated, a point is assigned a membership degree with respect to each
cluster. The numerical value of these degrees is given by the second of the above
constraints.
Phase 2: Move Centroids. Each of the centroids is now moved to the position
obtained via the first of the above constraints.
The reader should verify that the flow chart for FCM provided below closely
resembles the flow chart for K-Means above. Note also the incorporation of the
aforementioned constraints.

Figure 5: A flow chart summarizing the FCM Clustering Algorithm. Compare to Figure 2
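As with K-Means, a short code sketch may be helpful. The following Mathematica functions implement the two phases above; the names updateMemberships and updateCentroids are my own, and the sketch assumes no data point coincides exactly with a centroid (which would produce a zero distance).

(* Phase 1: recompute the membership matrix from the current centroids *)
updateMemberships[points_, centroids_, m_] :=
 Table[
  With[{d = Table[Norm[p - c], {c, centroids}]},  (* distances to every centroid *)
   Table[1/Total[(d[[j]]/d)^(2/(m - 1))], {j, Length[centroids]}]],
  {p, points}]

(* Phase 2: recompute each centroid as a membership-weighted mean of the points *)
updateCentroids[points_, u_, m_] :=
 Table[Total[(u[[All, j]]^m) points]/Total[u[[All, j]]^m], {j, Dimensions[u][[2]]}]

Alternating these two functions until the membership matrix stops changing appreciably reproduces the loop shown in Figure 5.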

It is instructive to visualize the fuzzy clusters produced by FCM. For this purpose, it is
convenient to use a one-dimensional data set, as in Figure 6 below.

Figure 6: Three fuzzy clusters produced by FCM on a 1 dimensional data set. Figure taken from
[2].

MATLAB's fcmdemo command provides a great way to interact with FCM using two-
dimensional data.1 One can run FCM on several preloaded data sets or provide a
custom data file. The number of clusters can be varied, as can the fuzziness
exponent and the stopping criteria. Once FCM has finished running, one can
directly view and manipulate each of the fuzzy clusters. Screenshots follow on the
next page.

1 MATLAB's fcmdemo depends on the Fuzzy Logic Toolbox, which is available for purchase from
MathWorks at the following URL: http://www.mathworks.com/products/fuzzy-logic/index.html
The laptops in the Enginet classrooms at Binghamton University already have the Fuzzy Logic Toolbox
installed. To start the demo, simply enter the command fcmdemo into the MATLAB command
window.

Figure 7: The main window of fcmdemo after running it on data set 2 with C = 3 and m = 2.

Figure 8: Membership function plots after running fcmdemo with fuzziness exponent m = 1.5

Figure 9: Membership function plots after running fcmdemo with fuzziness exponent m = 4.
Compare with figure 8.

|3| FUZZY MIN-MAX CLUSTERING NEURAL NETWORKS (FMMCNN)
FCM requires that the number of clusters be specified in advance. However, the
number of clusters that should be used is not always clear, as the figure below
illustrates.

Figure 10: A data set (top) can be clustered into 4 (bottom left) or 2 (bottom right) clusters.

There are many fuzzy clustering techniques which will automatically determine the
number of clusters that should be used. Among them is the Fuzzy Min-Max
Clustering Neural Network (FMMCNN), which we discuss in this section.
|3.1| HYPERBOX FUZZY SETS
The fuzzy clusters used in an FMMCNN are called hyperbox fuzzy sets. A hyperbox
fuzzy set has a hyperbox core, so that every point that lies within the hyperbox is
given a membership degree of 1. The membership function of the hyperbox fuzzy
set then decays linearly as one moves further away from the hyperbox core. A
systemwide parameter controls the rate of this decay.
A hyperbox is completely defined by its min point and its max point. The min point
is a vector whose components provide a lower bound for each dimension which
must be met to remain within the hyperbox. For example, suppose we have a
two-dimensional hyperbox with min point <5, 20>. Then for a data point <x, y> to lie
within the hyperbox, it is necessary that x ≥ 5 and y ≥ 20. Analogously, the max
point provides an upper bound for each dimension, which must be respected to
remain within the hyperbox. A formal definition of the membership
more practical / intuitive understanding of hyperbox fuzzy sets, contour plots can
be generated and manipulated using a Mathematica notebook created by me. With
this notebook, one can control the position of the min and max points, as well as

the gamma membership decay parameter. The notebook can also plot one-
dimensional hyperbox fuzzy sets, thereby revealing that hyperbox fuzzy sets can
be thought of as generalized symmetric trapezoidal fuzzy numbers. The notebook
is available from my website at the following URL:
http://www.benjaminjamesbush.com/fuzzyclustering
Screenshots are given on the following page for the reader's convenience.

Figure 11: Membership function of a Hyperbox Fuzzy Set. Adapted from [3]2
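Because Figure 11 is reproduced only as an image, the membership function is restated here in the notation of the appendix code (which may differ cosmetically from [3]): for an n-dimensional point a, a hyperbox with min point v and max point w, and decay parameter gamma,

$$b(\mathbf{a}) \;=\; \frac{1}{n} \sum_{i=1}^{n} \Big[\, 1 - f(a_i - w_i,\ \gamma) - f(v_i - a_i,\ \gamma) \,\Big], \qquad f(x,\gamma) = \min\!\big(1,\ \max(0,\ \gamma x)\big)$$

Inside the hyperbox both f terms vanish and the membership is 1; outside, the membership decays linearly at a rate controlled by gamma until it reaches 0.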

2 [3] contains some typographical errors. They have been corrected in Figure 11.

Figure 12: Manipulating hyperbox fuzzy sets in Mathematica. One dimensional (top) and two
dimensional (bottom).

|3.2| FUZZY MIN-MAX NEURAL NETWORKS


A major advantage of using hyperbox fuzzy sets for fuzzy clustering is the fact that
they can easily be implemented as two-layer artificial neural networks. The following
figure illustrates how this is done.

Figure 13: A hyperbox fuzzy set implemented as a two-layer artificial neural network.

The input layer contains one node per dimension of the space in which the data
points are embedded. Each input node is connected to the output node via a pair
of weighted links, which are weighted by the corresponding components of the
max point and min point, respectively. Implementing a clustering system in this
way allows for the development of massively parallel systems that can quickly
calculate membership values for incoming data.
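A minimal Mathematica sketch of this idea is given below: the only "weights" feeding the output node are the min and max vectors, and the output is the average of the per-dimension responses. The names f and hyperboxMembership are my own illustrative choices, written to be consistent with the appendix code rather than taken verbatim from [3].

(* ramp function: linear in gamma*x, clipped to the interval [0, 1] *)
f[x_, gamma_] := Clip[gamma x, {0, 1}]

(* a: input vector; vmin, vmax: min and max points acting as the link weights *)
hyperboxMembership[a_, vmin_, vmax_, gamma_] :=
 Mean[1 - f[a - vmax, gamma] - f[vmin - a, gamma]]

For example, hyperboxMembership[{0.7, 0.4}, {0.2, 0.3}, {0.5, 0.6}, 4] returns 0.6: the point lies inside the box along the second dimension but 0.2 beyond the max point along the first.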
|3.3| EVOLVING FUZZY CLUSTERS
Another advantage of hyperbox fuzzy sets is the relative simplicity with which
they can be expressed. As previously mentioned, a hyperbox fuzzy set can be
completely represented by a min point and a max point. This makes it very easy to
design an evolutionary algorithm which can be used to evolve sets of hyperbox
fuzzy sets for use within fuzzy min-max clustering neural networks. One such
algorithm was published by Fogel and Simpson in [3] and is outlined in the flow
chart on the next page. For their fitness function, Fogel and Simpson use the
minimum description length (MDL), which is in some sense an optimal compromise
between fitting the data and using the smallest possible number of clusters. For
more information on the MDL, see [4].

Figure 14: Flow chart summarizing the evolutionary algorithm used in [3].
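To convey the flavor of the approach without reproducing the flow chart, a toy Mathematica sketch is given below. It is emphatically not Fogel and Simpson's algorithm: it keeps the number of hyperboxes fixed and uses a simple placeholder fitness (the mean membership of each point in its best-matching box), whereas [3] also varies the number of boxes and scores candidates with the MDL. The data is assumed to lie in the unit hypercube.

(* membership of point a in the hyperbox {vmin, vmax} with decay gamma *)
boxMembership[a_, {vmin_, vmax_}, gamma_] :=
 Mean[1 - Clip[gamma (a - vmax), {0, 1}] - Clip[gamma (vmin - a), {0, 1}]]

(* placeholder fitness: how well, on average, the best box covers each point *)
fitness[boxes_, data_, gamma_] :=
 Mean[Table[Max[boxMembership[x, #, gamma] & /@ boxes], {x, data}]]

(* add Gaussian noise to every min/max component, clipped to the unit cube *)
mutate[boxes_, sigma_] :=
 Map[Clip[# + RandomVariate[NormalDistribution[0, sigma]], {0, 1}] &, boxes, {-1}]

(* (1+1)-style loop: mutate, repair min <= max, keep the mutant if not worse *)
evolveBoxes[data_, nBoxes_, steps_, gamma_: 4] :=
 Module[{parent = Table[With[{p = RandomChoice[data]}, {p, p}], {nBoxes}], child},
  Do[
   child = Map[{MapThread[Min, #], MapThread[Max, #]} &, mutate[parent, 0.05]];
   If[fitness[child, data, gamma] >= fitness[parent, data, gamma], parent = child],
   {steps}];
  parent]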

|4| APPLICATIONS
Fuzzy clustering is becoming an important data processing technique in many
scientific fields. While the use of FCM is widespread, fuzzy min-max clustering
neural networks are harder to come by. Below I list a few interesting applications
which I encountered in the literature.
GENETICS
Gasch and Eisen used FCM to find clusters of yeast genes [5].
POLITICS
Teran and Meier designed a fuzzy system that used FCM to simplify the complex
political landscape and recommend candidates to voters based on fuzzy data
obtained from surveys [6].
RADIOLOGY
John, Innocent and Barnes used a fuzzy min-max clustering neural network to
group x-ray images of the tibia into clusters [7].
INDUSTRIAL ENGINEERING
Dobado et al. used a fuzzy min-max clustering neural network to group parts into
part families, an important step in the formation of cells for cellular manufacturing
[8].

APPENDIX: MATHEMATICA CODE


The following Mathematica code can be used to create interactive plots of
hyperbox fuzzy sets in one and two dimensions. The code has been tested on
Mathematica 8.

(* ramp threshold function used by both hyperbox membership functions *)
f[x_, y_] := If[x y <= 0, 0, If[0 < x y <= 1, x y, 1]]

(* one dimensional hyperbox membership function *)
b1D[a_, min_, max_, gamma_] := 1 - f[a - max, gamma] - f[min - a, gamma]

(* interactive plot of a one dimensional hyperbox fuzzy set *)
Manipulate[
 Show[
  Plot[b1D[a, min, max, gamma], {a, 0, 1}, PlotRange -> {0, 1}],
  Graphics[{PointSize[0.05], Red, Point[{max, 0}]}],
  Graphics[{PointSize[0.05], Black, Point[{min, 0}]}]],
 {{min, 0.2}, 0, 1},
 {{max, 0.3}, 0, 1},
 {{gamma, 6}, 0.5, 40}]

(* two dimensional hyperbox membership function *)
b2D[a1_, a2_, min1_, min2_, max1_, max2_, gamma_] :=
 (1/2) ((1 - f[a1 - max1, gamma] - f[min1 - a1, gamma]) +
    (1 - f[a2 - max2, gamma] - f[min2 - a2, gamma]))

(* interactive contour plot of a two dimensional hyperbox fuzzy set *)
Manipulate[
 Show[
  ContourPlot[b2D[a1, a2, min1, min2, max1, max2, gamma], {a1, 0, 1}, {a2, 0, 1},
   Contours -> 5, ContourLabels -> True],
  Graphics[{EdgeForm[Thick], White, Rectangle[{min1, min2}, {max1, max2}]}],
  Graphics[Line[{{min1, min2}, {min1, max2}, {max1, max2}, {max1, min2}, {min1, min2}}]],
  Graphics[{PointSize[0.05], Red, Point[{max1, max2}]}],
  Graphics[{PointSize[0.05], Black, Point[{min1, min2}]}]],
 {{min1, 0.2}, 0, 1},
 {{min2, 0.3}, 0, 1},
 {{max1, 0.5}, 0, 1},
 {{max2, 0.6}, 0, 1},
 {{gamma, 6}, 2, 10}]

WORKS CITED
[1] J. S. R. Jang, C.-T. Sun, and E. Mizutani, Neuro-Fuzzy and Soft Computing: A
Computational Approach to Learning and Machine Intelligence. Prentice Hall, 1997.
[2] Matteo Matteucci. (2012, May) A Tutorial on Clustering Algorithms: Fuzzy C-Means
Clustering. [Online].
http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/cmeans.html
[3] D. B. Fogel and P. K. Simpson, "Evolving Fuzzy Clusters," in IEEE International
Conference on Neural Networks, 1993.
[4] Peter Grünwald. (2008, August) Videolectures.net: MDL Tutorial. [Online].
http://videolectures.net/icml08_grunwald_mld/
[5] A. P. Gasch and M. B. Eisen, "Exploring the conditional coregulation of yeast gene
expression through fuzzy k-means clustering," Genome Biology, vol. 3, no. 11,
October 2002.
[6] L. Teran and A. Meier, "A Fuzzy Recommender System for eElections," in
Electronic Government and the Information Systems Perspective, 2010, pp. 62-76.
[7] R. I. John, P. R. Innocent, and M. R. Barnes, "Neuro-fuzzy clustering of radiographic
tibia image data using type 2 fuzzy sets," Information Sciences, vol. 125, no. 1-4,
pp. 65-82, June 2000.
[8] D. Dobado, S. Lozano, J. M. Bueno, and J. Larrañeta, "Cell formation using a Fuzzy
Min-Max neural network," International Journal of Production Research, vol. 40,
no. 1, pp. 93-107, November 2010.
