Algorithm Its Clustering: Detecting

583
IEEE TRANSACTIONS ON COMPUTERS, VOL. C-19, NO. 7, JULY 1970
An Algorithm for Detecting Unimodal

Fuzzy Sets and Its Application as a
Clustering Technique
ISRAEL GITMAN
AND
MARTIN D. LEVINE, MEMBER, IEEE
Abstract-An algorithm is presented which partitions a given

sample from a multimodal fuzzy set into unimodal fuzzy sets. It is
proven that if certain assumptions are satisfied, then the algorithm
will derive the optimal partition in the sense of maximum separation.
The algorithm is applied to the problem of clustering data, defined
in multidimensional space, into homogeneous groups. An artificially
generated data set is used for experimental purposes and the results
and errors are discussed in detail. Methods for extending the algorithm to the clustering of very large sets of points are also described.
The advantages of the method (as a clustering technique) are that
it does not require large blocks of high speed memory, the amount of
computing time is relatively small, and the shape of the distribution of
points in a group can be quite general.
Index Terms -Clustering algorithms, multimodal data sets,
pattern recognition, symmetric fuzzy sets, unimodal fuzzy sets.
I. INTRODUCTION
rT HE PRIMARY objective of clustering techniques is

to partition a given data set into so-called homogeneous clusters (groups, categories). The term homogeneous is used in the sense that all points in the same group
are similar (according to some measure) to each other and
are not similar to points in other groups. The clusters generated by the partition are used to exhibit the data set and
to investigate the existence of families as is done in numerical taxonomy or alternatively, as categories for classifying
future data points as in pattern recognition. The role of
cluster analysis in pattern recognition is discussed in detail
in two excellent survey papers [1] [16].
The basic practical problems that clustering techniques
must address themselves to involve the following:
1) the availability of fast computer memory,
2) computational time,
3) the generality of the distributions of the detected
categories.
Clustering algorithms that satisfactorily overcome all of
these problems are not yet available. In general, techniques
that can handle a relatively large data set (say 1000 points)
are only capable of detecting very simple distributions of
points [Fig. 1(a)]; on the other hand, techniques that perform an extensive search in the feature space (the vector
Manuscript received September 22, 1969; revised December 15, 1969.
The research reported here was sponsored by the National Research
Council of Canada under Grant A4156.
I. Gitman was with the Department of Electrical Engineering, McGill
University, Montreal, Canada. He is now with the Research and Development Laboratories, Northern Electric Co. Ltd., Ottawa, Ontario, Canada.
M. D. Levine is with the Department of Electrical Engineering, McGill
University, Montreal, Canada.
62.
(b)
Fig. 1. Distribution of points in a two-dimensional feature space.

The curves represent the closure of the sets which exhibit a high
concentration of sample points.
space in which the points are represented) are only able to

handle a small data set.
Some authors [6], [9], [19], [20] have formulated the
clustering problem in terms of a minimization of a functional based on a distance measure applied to an underlying
model for the data. The clustering methods used to derive
this optimal partition perform an extensive search and are
therefore only applicable to small data sets (less than 200
points). In addition, there is no guarantee that the convergence is to the true minimum. Other methods [3], [12], [15]
use the so-called pairwise similarity matrix or sample covariance matrix [15]. These are memory-limited since, for
example, one-half million memory locations are required
just to store the matrix elements when clustering a data set
of 1000 points. Also, the methods in [12], [15] will generally
not give satisfactory results in detecting categories for an
input space of the type shown in Fig. 1(b).
It is rather difficult to make a fruitful comparison among
the many clustering techniques that have been reported in
the literature and this is not the aim of the paper. The difficulty may be attributed to the fact that many of the algorithms are heuristic in nature, and furthermore, have not
been tested on the same standard data sets. In general it
seems that most of the algorithms are not capable of detecting categories which exhibit complicated distributions in
the feature space [Fig. 1(c)] and that a great many are not
applicable to large data sets (greater than 2000 points).
This paper discusses an algorithm which partitions the
given data set into "unimodal fuzzy sets." The notion of a
unimodal fuzzy set has been chosen to represent the partition of a data set for two reasons. First, it is capable of detecting all the locations in the vector space where there exist
highly concentrated clusters of points, since these will appear as modes according to some measure of "cohesiveness." Second, the notion is general enough to represent
clusters which exhibit quite general distributions of points.
IEEE TRANSACTIONS ON COMPUTERS, JULY 1970
584
The generated partition is optimal in the sense that the program detects all of the existing unimodal fuzzy sets and
realizes the maximum separation [21] among them. The
algorithm attempts to solve problems 1), 2), and 3) mentioned above; that is, it is economical in memory space and
computational time requirements and also detects groups
which are fairly generally distributed in the feature space
[Fig. 1(c)]. The algorithm is a systematic procedure (as opposed to an iterative technique) which always terminates
and the computation time is reasonable.
An important distinction between this procedure and the
methods reported in the literature' is that the latter use a
distance measure (or certain average distances) as the only
means of clustering. We have introduced another "dimension," the dimension of the order of "importance" of every
point, as an aid in the clustering process. This is accomplished by associating with every point in the set a grade of
membership or characteristic value [21 ]. Thus the order of
the points according to their grade of membership, as well
as their order according to distance, are used in the algorithm. The latter partitions a sample from a multimodal
fuzzy set into unimodal fuzzy sets.
In Section II the concept of a fuzzy set is extended in order
to define both symmetric and unimodal fuzzy sets. The
basic algorithm consists of the two procedures, F and S,
which are described in detail in Sections III and IV, respectively. Section V deals with the application of the algorithm
to the clustering of data and the various practical implications. Section VI discusses the experimental results. Possible
extensions of the algorithm to handle very large data sets
(say greater than 30 000 points) are presented in Section
VII. The conclusions are given in Section VIII.
V
and
.-
i= {xif(x) f(xi)}2
2
=
x) < d(u, xi)}
{xdd(jl,
where xi is some point in B and d is a metric.

Definition: A fuzzy set B is symmetric if and only if, for
every point xi in B, xi = rxid*
Clearly, if B is symmetric, then for every two points xi
and Xk in B,
d(xi,,) < d(Xk, U):: tf(Xi) . f(x4
As an example of a symmetric fuzzy set, consider the set B

defined as "all the very tall buildings." B is a symmetric
fuzzy set, since the taller the building, the higher the grade
of membership it will have in B. Any symmetric (in the
ordinary sense) function, or a truncated symmetric function,
can represent a characteristic function of a symmetric
fuzzy set.
Definition: A fuzzy set B is unimodal if and only if the
set Fx, is connected for all xi in B (see Fig. 2).
In order to consider the problem of clustering data points
it will be necessary to define discrete fuzzy sets.
A sample point from B will be a point xeX with its associated characteristic value, f(x). Further, we will denote a
L)N}, where xi is a
sample of N points from B by Si{(xi,
of membership.
grade
corresponding
its
point in X and fi
which includes
set
fuzzy
discrete
a
as
S can be considered
shall require
We
the
sample.
by
given
only those points xi
to
in
comparison
S
is
large
in
particular,
a large sample S;
local
of
number
the
to
and
X,
of
the
space
the dimension
maxima in f.
{(Si, yi)m} denotes a partition of S into m subsets, where
II. DEFINITIONS
Si is a discrete fuzzy subset and pi the point in Si at which
[21].
Zadeh
by
introduced
was
set
fuzzy
a
of
The notion
the maximal grade of membership is attained. We refer to
Then:
xeX.
elements
with
of
points
Let X be a space
pi as the mode of Si. A mode will be called a local maximum
if it is a local maximum of f and will then be denoted by vi.
"a fuzzy set A in X is characterized by a membership
It will be assumed that every local maximal grade of mem(characteristic) function fA(x) which associates with
bership is unique [211.
each point in X a real number in the interval [0, 1], with
The notion of an interior point in a discrete fuzzy set
the value fA(x) at x representing the 'grade of memberas follows.
defined
is
ship' of x in A."
Definition: Let S be a sample from a fuzzy set B and Si a
In the rest of this section we shall introduce some nota- proper subset of S. For some point Xk in Si we associate a
tion and certain definitions required for the description of point x, in (S - Si) such that
the algorithm.
d(x,, Xk) = xje(SminSi) [d(xk, xi)].
Let B be a fuzzy set in X with the membership (charactermaximal
the
which
at
4u
be
the
point
let
and
f,
istic) function
grade of membership is attained, that is,
The point Xk is defined to be an interior point in Si if and
only if the set F = {xld(xi, x) < d(xi, Xk)} includes at least one
[f(x)].
f(=) Sup
xeB
sample point in Si (see Fig. 3).
Note that when the sample is of infinite size, in the sense
We may define two sets in B as follows:
that every point in X is also in S, this definition reduces to
that of an interior point in ordinary sets.
' Rogers and Tanimoto [17] introduced a certain order among the
Given a sample, the algorithm to be described in the next
points by associating with the point i, a value which is the number of points
at a constant finite distance from i. When the number of attributes is large, two sections is composed of two parts: procedure F which
-
this so-called "prime mode" will be the centroid of the data set, rather
than a "mode" of a cluster. A measure of inhomogeneity is used to detect
clusters one at a time.
2 This is equivalent to the set Fr
in [21 ] where
=f(x,i).
GITMAN AND LEVINE: ALGORITHM FOR DETECTING UNIMODAL FUZZY SETS
585
point in An is its mode and the points are ordered according

to their distance to this mode. Not all the points in An will
necessarily be assigned into group n at the termination of
X2
x)
Qrb
(a)
(b)
Xi
Fig. 2. An example of a unimodal fuzzy set in a two-dimensional space.

The curves indicate lines of equigrade of membership. (a) A unimodal
fuzzy set where for every point xi in the set, the set 17, is not disjoint.
(b) A multimodal fuzzy set, since there exists a point xr for which F,r
is disjoint.
000
X2
r2
Xi
Fig. 3. The points in Si are denoted by x. The point Xkl is on the boundary
of Si since IF includes no sample points in S.. The point Xk2 is an
interior point in Si, since F2 includes points in Si.
detects all the local maxima off, and procedure S which

uses these local maxima and partitions the given sample into
unimodal fuzzy sets.
PROCEDURE F
Given a sample S= {(xi, fi)N} from a multimodal fuzzy set,
subject to certain conditions on f and S (see Theorem 1),
procedure F detects all the local maxima of f. It is divided
into two parts: in the first part, the sample is partitioned into
symmetric subsets and in the second, a search for the local
maxima in the generated subsets is performed.
In order to make the steps in the procedure clear, some
preliminary explanations are given below. An example
which demonstrates the procedure is presented later.
The number of groups (subsets) into which the sample is
partitioned is not known beforehand. The procedure is
initialized by the construction of two sequences: a sequence
A in which the points are ordered according to their grade
of membership, and a sequence A1 in which they are ordered
according to their distance to the mode of A (the first point
in A). The order of the points in the sequence A is the order
in which the points are considered for assignment into
groups. This process will initiate new groups when certain
conditions are satisfied. Whenever a group, say n, is initiated,
a sequence of points An is formed of all the points in S which
might be considered for assignment into group n. The first
III.
the procedure. At every stage of assignment, every group i

displays a point from its sequence Ai, which is its candidate
point, to be accepted into the group. The point of A to be
assigned is compared (for identity) with each of the candidate points in turn and is either assigned into one of the
existing groups (if it is identical to the corresponding
candidate point) or initiates a new group. If, for example,
the point is assigned into groupj, then the candidate of this
group is replaced by its next point in the sequence A4. Thus
a point is assigned to the group in which its order according
to the grade of membership corresponds to its order according to the distance to its mode.
Part 1 of Procedure F
Let S = {(xi, fj)N} be a sample from a fuzzy set (assume,
for simplicity, that fi # fj for i #j).'
1) Initially it is required to generate the following two
sequences.
a) A = (yv1, Y2, YN) is a descending sequence of the
points in the sample ordered according to their grade
of membership; that is, fj 2 f, for 1< t, where fj and f,
are the grades of membership of yj and Yt, respectively.
where yI =y1,4 is the sequence
,
b) A1 =(yl, y' y',
Y3 *Y),
of the points ordered according to their distance to
that is, d(yl, yJ) < d(yl, y1) forj. t.5
We will also refer to A1 as the sequence of ordered
"candidate" points to be assigned into group 1. Thus y2
is the first candidate, and if it is assigned into group 1,
then y' becomes the next candidate, and so on. We can
therefore state that the current candidate point for group 1,
y1, is the nearest point to its mode yI(mY 1up1) except for
points that have already been assigned to group 1. This will
hold true for any sequence Ai; that is, y' -pi is the mode
for group i, and y' is its candidate point.
2) If yjmy=y, for i-2, 3,..., r-1, and Y 1, then yi,
i= 1, 2,"*, r-1, are assigned into group 1 and a new group
is initiated with y2- l2 Yr as its mode. That is, the sequence
A2=(yf y2 y32** y2) iS generated. The latter includes
from among the points that have not yet been assigned,
those points which are closer to Yr than the shortest distance
from Yr to the points that have already been assigned; this is
shown for one dimension in Fig. 4. The points in A2 are now
ordered according to their distance to Yr; that is, d(yr, y2)
y.d(ry2) forj.t.
3) Suppose that G groups have been initiated. Thus there

exist G sequences, Ai, i = 1, * , G, each of which displays a
candidate point, y'. Suppose that yq in the sequence A is the
point currently being considered for assignment (all the
3 The case in which there are equal grades of membership will be discussed in Section V.
4 We shall use the symbol "-" to mean "is identical to."
5 The sets A and A1 are sequences of the same N points; however, the
ordering principle is different. Thus the point y, is some point in A and its
label indicates that it is also in location k in the sequence Ai.
586

f
S3S2 St
S4
' ,,I
rp
~~~ri
Fig. 4. At the stage where the point xi(=ji,) initiates a new group, all the
sample points that have already been assigned are in the domain VP.
Thus the nearest point in Fp to xi is at a distance Ri, which defines the
domain Fj of all the points which are at a shorter distance to xi than Ri.
The sample points in Fi will be ordered as candidate points to be
assigned into the group in which xi is the mode.
points in A for i < q have already been assigned); then the

following holds.
a) Ifyq=my-and Yq #yJ j= 1,.* , G, j= i, then yq is assigned
into group i.
b) If yq Yi for some ieI, where I is a set of integers representing those groups whose candidate points are
identical to yq, then yq is assigned into that group to
which its nearest neighbor with a higher grade of
membership, has been assigned.
c) If Yq # yi for i = 1,*, G, then a new group is initiated
with Yq as its mode.
Part 1 of procedure F is terminated when the sequence A is
exhausted.
Theorem 16: Let f be a characteristic function of a fuzzy
set with K local maxima so that:
1) if VK is a local maximum of f, then there exists a finite
E >0 such that the set {xld(vk, x) < E } is a symmetric
fuzzy set.
Let S = {(xi, f1)N} be a large sample from f, such that:
2) for every xi in the domain of f, the set {xId(xi, x)<e/2}
includes at least one point in S, and
3) {(Vk, fk) }CS.
Let {(Si, pj)m} denote the partition generated by part 1 of
procedure F, where Si denotes the discrete fuzzy set, ji its
maximal grade of membership (mode), and m the number
of groups. Then pi is an interior point in Si if and only if
it is a local maximum of f.
Theorem 1 states the sufficient condition under which the
procedure will detect all the local maxima of f. The main
restriction is the requirement that every local maximum of
f shall have a small symmetric subset in its neighborhood
(condition 1). It is not necessary for the sample to be of
infinite size; it will be sufficient if it is large in the neighborhood of a local maximum. Condition 2 indirectly relates
the dimension of the space to the size ofthe sample set.
Using the result of Theorem 1, part 2 of the procedure is
employed to check all the modes pi in order to detect which
6
The proofs of the theorems are given in Appendix I.
XIx lX7X
IX9
1X3
X15
lX
X20
X2F3 X25 IX27 X2
Fig. 5. The characteristic functionf and the 30 point sample for the example are shown. The dotted lines indicate the partition (the sets Si)
resulting from the application of part I of procedure F. We can observe
that x15 and x25 are the only interior modes in the partition and thus
will be recognized as the local maxima points (vi) off.
are interior points. This is done according to the definition
given in the previous section.
Part 2
Let {Si,
yui)l}
be the partition generated by part 1 of
procedure F. For every mode pi and set Si, a point x,j and a
distance Ri can be found as follows:
Ri = d(,u, xpi) = min [d(i, Xk)J.

Xke(S - Si)
Ri is the minimum distance from the mode to a point in S

outside the set Si. We say that ,ui is a local maximum if the set
FRj = {xld(xpi, X) < Ri}

includes points in Si. Otherwise we decide that pi is not a
local maximum because it is a boundary point of Si.
To summarize this section, if f and S satisfy the conditions
stated in Theorem 1, then the procedure presented detects
all the local maxima of f .
Example: The following example demonstrates the various procedures associated with the algorithm. A sample of
30 points was taken from a one-dimensional characteristic
function. The latter, as well as the sample points with their
associated sample numbers, are shown in Fig. 5. The sequences A and A1 are given by
A
(Y1, Y2,
Y,Y30)
(x15, x14, x16, x13, (x2X,x1l,lx17

X25, X24, X10, X26, X9, X27, X23, X18,
X8, X28, X7, X6, X5, X4, X3, X19, X29,
A1 = (yll Y21 Y31*
X2, X30, X1, X20, X22, X21).

Y30) = XX514 X16 X13
X2 X18
Xll, XlO, X9,X19, X20, X8, X7,
,
X21, X6, X22, X5, X4, X23, X3, X24,

X2, X1, X25, X26, X27, X28, X29,
X30)-
Observing these sequences, we can see that the first four

points in A and A1 are pairwise identical and thus they are
assigned to group 1. Thereafter, the candidate point for
group 1 is y1 =x17, whereas the point to be assigned is x12.
Thus the latter will initiate a new group and a new sequence
A2 will be generated.
587
After the first four groups are initiated, the sequences Ai ture. Rather than an arbitrary order which is the usual case,
the points are finally assigned in the order in which they
and the resulting partition to this point are as follows:
appear in the sequence A.
i A - (Y1, Y2,
Y30)
Specifically, let S = {(xi, fi)N} be a sample from a fuzzy set,
=: (X15, X14, X16, X13, X12, Xll,
and
{(vi, fJ(v))K} c S be the sample of the K local maxima
X17, X25, X24,
, X26,
f Assume that f1(xi)Af+j(xj) for i #j, and f(vj)>f(vj)
of
vi
... IX21)
for i .j. Let A be the sequence of the points ordered accordI
(A (-Y'l,1 Y2L * . , Y30J
-:: X1 5, X14, X1l6, X 13, X17X127? ing to their grade of membership, and suppose that the K
i K in A. We
local maxima of f are in locations pi, i= 1,
can infer the following proposition.
SI = (XI 5, X1 4, X 1 6 1l 13 17,
Proposition: The point xj in location j in the sequence
(A (y2)
A, PM<j<P(M+1), M<K, can only be assigned into one of
(X12)
the groups ieIM= {1, 2, * * * , M}.
S2= (X12,)
If f(Xpr), r= M+ 1, M + 2,.* , K is the local maximum
(A3 (yS y, y3)
x9)
(Xi
of group r, then only points with a lower grade of membership can be assigned into group r. Since all the points that
S3 = (X11,)
A4 - (Y4,Y4 4, ,143) = (X25, X24,X26, X23, X27, X28, precede location p, in A have higher grades of membership,
none of them can be assigned into group r, r=M+ 1,
X22, X29, X21, X30, X20,
M+2, * **,K.
This proposition implies that all the points in A which are
X18)
found in the locations Pj .J<p will automatically be asS4 =(X25,X24)
signed into group 1; the points in locations P2<] <P3 will
be divided between group 1 and group 2, and so on.
In relation to the procedure described above, we note the
Procedure S uses the following rule: assign the point xj
following.
in location j in the sequence A into the group in which its
1) The sequence A2 includes only one point (its mode) nearest neighbor with a higher grade of membership (all
since the nearest point to x12 in S has already been the points preceding xj in A) has been assigned. This rule
assigned. Therefore there are no sample points in S to applies to all the points with the exception of the local
maxima that initiate new groups. Note that the rule is difgenerate a symmetric fuzzy set whose mode is x12.
2) At the stage shown, x1o in the sequence A is to be as- ferent from the "nearest neighbor classification rule" [5 ] besigned. The candidate points for the four groups that cause of the particular order in which the points are introhave already been initiated are x12, no candidate, x10, duced.
Theorem 2: Let f be a piecewise continuous characterand x26, respectively. Thus x1o will be assigned to
group 3 since it is identical with the latter's candidate istic function of a fuzzy set.
Let S = {(xi, fj)N} be an infinite sample from f, such that
point.
3) No more points will be assigned into group 1, since its
1) for every xi in the domain of f and for an oa 0, the set
candidate x12 has already been assigned to another
F = {xId(xi, x) < a/2} includes at least one sample
group, and thus cannot be replaced as a candidate for
S.
point
group 1.
If L-*O, then procedure S partitions the given sample into
The resulting symmetric fuzzy sets generated by the applicaunimodal
fuzzy sets.
2
of
5.
Part
shown
in
F
are
1
Fig.
tion of part of procedure
3: Let S be a sample from a fuzzy set with a
Theorem
13
modes
to
each
of
the
test
the procedure is now applied to
characteristic
function f. Let f and S be constrained as in
can
In
5
we
Fig.
are
interior points.
detect which of these
2.
Theorem
in
interior
points
and
are
modes
x25
see that only the
x15
If x-+0, then every final set is a union of the sets Si gentheir corresponding sets, and therefore only two local
erated
in part 1 of procedure F.
the
this
partial result,
maxima are discovered. Based on
=
X
If
E', a more powerful result than Theorem 2 can be
in
next
section
the
end
the
of
at
be
continued
will
example
for
simplicity we will state it for the case of two local
stated;
S.
demonstrate
to
order
procedure
maxima.
Theorem 4: Let f be a piecewise continuous characteristic
IV. PROCEDURE S
function of a fuzzy set and d the distance between its two
Procedure S partitions a sample from a fuzzy set into local maxima.
unimodal fuzzy sets, providing the local maxima of f are
Let S be a sample from f, such that,
known. Thus this procedure uses the information obtained
1) for every point xi in the domain of f and for a finite
from the application of procedure F; that is, the number,
ox > 0, a d, the set F = {xjd(x., x) < o} includes at least
location, and characteristic values of the local maxima off.
one point in S, and
The rule for assigning the points differs from the known classification rules appearing in the pattern recognition litera2) the local maxima, (v1, f(v1)), (v2, f (v2)) are in S.
=
11
I,
X19,
588
Let H = xo7 be the optimal hyperplane (point) separating

f into unimodal fuzzy sets and F, = {xld(xo, x) <oc/2}.
If S does not include any points in F7, then procedure S

derives the optimal partition of S for any finite cx, oc <<V
Theorem 2 states the sufficient conditions (but not necessary) under which procedure S derives the optimal partition
into unimodal fuzzy sets. Note that when cx = 0, the sample
S is identically equal to the domain of f. On the other hand,
given a characteristic function f, we can always find a finite
ac for which the result holds. Observing procedure S, we may
see that the sample must be large, particularly in the neighborhood of the separating hypersurface (see Theorem 4).
Utilizing the result of Theorem 3, we can modify procedure S to assign subsets Si, generated in part 1 of procedure
F, rather than individual points of S. That is, we can first
assign pi (the mode of Si), and then automatically all the
points in Si to this same group. In fact no further computation is necessary since, if pi is a mode, it will initiate a new
set (group) in part 1 of procedure F. In the latter, when
evaluating the distances to the points that have already
been assigned, we can record its nearest point (with a higher
grade of membership). Hence procedure S reduces to an
automatic classification of the points.
Theorem 4 implies that if a is finite, but ox << d, then only
poinis in S within a distance cx to H can be misclassified.
Example: To demonstrate procedure S, we again consider
the sequence A, where now it is assumed that the local
maxima are known.
A = ( ,x14,x16, x13,3x12,x Xll 7, 5 , X24, X10, x26,
X9, X27, (i), x18, X8, x28, x7, x6, x5, x4, x3, x19, x29,
x1
X2, X30, X1, X20, X22, X21).
All the points up to x25 are automatically assigned to the

group in which x15 is the local maximum (see proposition).
The other points are assigned either to the first group or to
the second (where x25 is the local maximum) according to
the classification of the nearest point to the point to be
assigned.
In particular, if x23 is the point to be currently assigned,
then the partial sets (S, and S2) are given by
Sl = (X15, X14, x16, X13, x12, xll, X17, X10, X%,)

S2 = (X25, X24, X26, X27,)Since the nearest point to x23 (among the ones that have
already been assigned) is x24, the former will be assigned into
S2. This stage of the process and the final partition are
shown in Fig. 6.
V. THE APPLICATION OF THE ALGORITHM
TO CLUSTERING DATA
The problem we have treated so far, which can be stated
as "the partition of a fuzzy set into unimodal fuzzy sets,"
is well defined. This is not, however, the case in the clustering
' If X= El, then a unimodal fuzzy set is also a convex fuzzy set [21],
and the hypersurface becomes a hyperplane.
8 Among other changes in the statement of the theorem in the case when
the number of local maxima is greater than two, we must replace the
distance d by the minimum distance between any two local maxima off.
f,
II
14
-23)4
--I
X15
X17
X23 X2025
r.
Fig. 6. This figure demonstrates procedure S. At a certain stage in the

procedure, all the points in the sequence A with higher grade of membership than f1 have been assigned, and X23 is the next point to be
assigned. In this case, all the points in the domains F1 and F2 have
already been assigned. The distance of x23 to all the points F, and F2
is evaluated and this point will eventually be assigned into the group
in which x24 is a member. The dotted line indicates the final partition
for this example.
problem [16] where a set of points {(xi)'} is given that must

be clustered into categories. In order to directly employ the
algorithm it is necessary to associate with every point xi a
grade of membership fi. In other words, a certain order of
importance is introduced to facilitate the discrimination
among the points not only on the basis of their location (in
the vector space) but also according to their "importance."
There are many possible ways to discriminate among the
points. One possibility is to use a clustering model to associate with every point a membership value according to its
"contribution" to the desired partition. By a clustering
model, we mean functionals which describe the properties
of the desired categories and the structure of the final partition [19].
In our experiments, we have used a threshold value T and
associated with every point xi, an integer number ni which is
the number of sample points in the set Fi = {xid(xi, x) < T}.
It is obvious that the resulting partition is dependent on T,
although for any T, a unique partition into unimodal fuzzy
sets is derived. A previous knowledge about the data to be
partitioned is not essential in order to choose T. The latter
must be determined in such a way that there is "sufficient"
discrimination among the points. For example, in the extreme, if a very large threshold is chosen, then every point
will have the same number ni(ni= N) and no discrimination
is achieved; on the other hand, this is also true for a very
small T, but in this case ni= 1. The threshold essentially
controls the resolution of the characteristic function f. It is
quite within the realm of possibility to automate this procedure but this was not done for the experiments reported in
Section VI.
It is also necessary to consider the practical situation
where many points have the same grade of membership
since this was explicitly excluded in the previous theoretical developments. This problem was solved by allowing
for a permutation of the points in the sequence A when they
have the same grade of membership. More specifically, consider part I of the procedure F in which the symmetric fuzzy
sets are derived. Suppose that G groups have already been
initiated and Yq is the point in the sequence A to be assigned
next. If YqYic, for i= 1, -, G and if f(Yq+I)=(Yq), then
589
the identity between Yq 1 and y', i= 1, , G, is checked.

Thus yq will initiate a new group only if none of the points
Yq+ 1' Yq+2, '', with the same grade of membership as Yq,
G.
is identical to y i= 1,
Another consideration is the case in which the maximal
grade of membership in a set Si is attained by a number of
points. To solve this problem, we have modified part II of
procedure F (in which a search for the local maxima is performed) in the following way. Let Sii be the subset of points
in Si which have the same (maximal) grade of membership as
pi; then every point in Sii is examined as the mode of Si.
If at least one of these points is on the boundary of Si, then
pi is not considered as a local maximum.
VI. EXPERIMENTS
Clustering techniques can be compared only if they are
applied to the same data set or to data sets in which the
categories are well-known beforehand. Such experiments
can therefore be performed either on artifically generated
data, or on data sets such as, for example, handprinted
characters [4].
In order to be able to reach some significant conclusions
concerning the performance of the algorithm we have applied it to artifically generated data sets. The latter consists
of points in a ten-dimensional vector space and belonging
to sets described by multimodal spherical and ellipsoidal
distributions. The samples from each category of the former
were generated by adding zero-mean Gaussian noise to a
prototype vector. The ellipsoidal data sets were determined
by subjecting the vectors of the spherical data sets to certain
linear transformations, stretching, and rotation. This data
set is a part of the version that was used in [7 ], [18 ] for pattern recognition experiments and is described in [8]. We
have taken the first ten prototype vectors and generated a
data set of 1000 points-100 points for each prototype
vector.9
Two series of experiments were performed. In the first
series, the algorithm was applied to six data sets; the spherical sets with = 15, 20, and 25, and the ellipsoidal data sets
derived from these. A summary of the results is given in
Tables I and II. In the second series, two additional runs
with the ellipsoidal data set (derived from the spherical
set with a = 15) obtained with different initial conditions for
the random number generator, were performed. The same
threshold T as in the first series was used, thus facilitating a
comparison of the results of three runs for different initial
conditions of the random number generator. The results
are shown in Tables III and IV.
The optimal partitions for these data sets are unknown
but will be characterized by the rates of error associated
with the optimal solution of the supervised pattern recognition problem. This classification is achieved by assuming a
knowledge of the functional form and the parameters of the
parent populations and using an optimal classifier (Bayes
sense). Although these solutions are known theoretically,
the computation for the ellipsoidal data sets is difficult be-
TABLE I*
Group
Number
Ellipsoidal
Spherical
u=25
a=15
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
6=25
a=15
101(1)
100
100
100
100
97
95
71
71
69
29
29
22
7
4
3
2
201(100, 1) 202(99, 3)
100
100
100
100
99
100
99
99(2)
97
97
94
96
88
94(2)
81
78(1)
79(1)
20
19
66
12
8
21
3
8
21
3
2
19
15
13
1
100
100
100
100
100
99
85
81
a=20
u=25
101(1) 271(89, 83, 2)

101(1)
100
100
99(1)
93(1)
97(1)
88
93
87
80
62(2)
78
62(8)
65
45
62
38
60
13
38
13(1)
26
10
24
9
18
12
5
3
10
10
5
4
4t
* n(nl, n2, n3) indicates that there is a total of n points in the corresponding group of which n1, n2, and n3 are from different categories.
t In this case 5 additional groups of 4, 2, 2, 2, 2 points, respectively,
were
generated.
TABLE II*
6
Spherical
15
20
25
4000
2500
4000
92
17
18
15
20
25
3500
4000
4500
65
31
15
Ellipsoidal
JEt
(percent)
(minute)
0.1
10.3
10.5
9.1
11.6
12.8
3.40
5.29
5.10
0.1
0.3
18.7
9.7
16.8
23.9
3.33
5.59
5.04
E.
Jv)
DeJ(v
rpercent)
Data
CPU
*f(vi) indicates the maximal grade of membership in the corresponding

test. CPU is the number of minutes required to cluster the data on an
IBM 360/75 and includes the time needed to generate the data set.
9 The prototype vectors which have been used for the data sets
listed in Appendix II.
are
cause the hyperellipsoids which indicate the hypersurfaces
of equal probability density have different shapes and orientations (see [7]). The reference partition that we have
used is the partition into the original ten categories of 100
points each. It is appreciated that this partition cannot be
achieved by any clustering technique because ofoverlapping
among the categories, in particular for the case of a= 25.
Two types of errors have been used to grade the partitions.
1) Em, the mixing error, defines the error caused by some
of the points of category i being assigned to categoryj,
i #1; it is therefore a result of the possible overlapping
among the categories or the linking of several categories.
2) Et, the total error, consists of Em plus the error produced by the generation of small clusters not in the
original set of ten. These small clusters are the result
of the fact that a finite sample from a Gaussian distribution can be made up of several modes.
IEEE TRANSACTIONS ON
590
TABLE III
Group
Number
Ellipsoidal
=15
1
2
3
4
5
6
7
8
9
10
101(1)
100
100
100
100
97
95
71
71
69
100
100
100
100
100
96
95
92
57
48
11
12
13
14
15
16
17
18
29
29
22
7
4
3
2
38
20
11
8
7
5
5
5
3500
158(58)
100
100
100
100
100
100
100
63
42
16
11
10
19
20
21
4
4
TABLE IV
Ellipsoidal
a=15, T2=3500
f(v1)
Em (percent)
E, (percent)
CPU
65
70
70
0.1
0
5.8
9.7
11.2
9.5
3.33
3.29
3.33
From Table I, we can see that nine to ten major categories

number of small clusters were generated in each
test. These clearly indicate that the samples of some of the
categories are in fact multimodal. The experiments show
that there is a small amount of overlapping among some of
the categories. The major mixing error can be attributed to
the fact that the algorithm did not detect a local maximum
in the neighborhood of the prototype vector for some of the
categories. This can be seen in Table I by the entries in the
first row, columns 2, 3, and 6, where 2, 2, and 3 categories,
respectively, have been linked together. The reason for this
seems to be that the sample was not large enough. This is
supported by the low values off(v1) in Table II where for
the above tests the entries are 17, 18, and 15, respectively.
We believe that a better choice for T could have eliminated
this mixing for the spherical data set with a = 20, although
it is doubtful that this could be achieved for the data sets
with v= 25, given the sample size. On the other hand, it is
reasonable to assume that the problem could be eliminated
using larger data sets.
A total of 25 experiments (3 to 5 per data set) have been
performed and the best results are included in the tables
The threshold T was varied coarsely over a wide range and
no fine adjustments were made in order to improve the results. The minimum value of T is constrained by the resolution, while the maximum is constrained by the possible
as well as a
COMPUTERS, JULY 1970
linkage of several categories; that is, if T is very large, then

a point which is not in a cluster at all, but in a space among
several categories, might have the largest grade of membership. However, even in this case, the point will usually not
become a local maximum, since the condition for having a
symmetric fuzzy set in its neighborhood will not be satisfied.
As a guide, a small T is preferred when no previous knowledge of the data set is available.
The required computing time lay between three and six
minutes on an IBM 360/75 computer and this depended on
the discrimination in the values off. If it is such that many
points have the same grade of membership, then procedure
F requires more computer time (see Section V). The value
f(v1) in Table II gives some indication as to the discrimination achieved; comparing the entries in this column with
the corresponding ones in the CPU column gives some support to the above statement. This factor could be eliminated
by, for example, using an additional measure for discriminating among the points which have the same grade of
membership, or possibly by using an underlying model to
evaluate the grade of membership and so yield a continuous
variation in f The computer program used the process of
assigning a point at a time in procedure S. All of the computing time in the latter which includes N(N- 1) computations of distance and the search for the minimum distance
for every point, could be saved by applying Theorem 3. It
is estimated that this would result in an approximate 25
percent reduction in computational time.
From the results of the second series of experiments (see
Tables III and IV) we can see that the partitions generated
with data sets obtained for different initial conditions of the
random number generator, are similar. In one of these experiments, a local maximum in the neighborhood of one of
the categories was not detected, thus linking 58 points of
this category with another. The difference in the error rates
is within 1.7 percent.
Generally speaking, the results are quite encouraging. In
the two series of experiments, 5 out of 80 local maxima in
the neighborhood of the prototype vectors were not detected. This problem could be eliminated if the size of the
sample were increased. In particular, the fact that the error
rates for the ellipsoidal data are comparable with those for
the spherical sets, indicates that the shape of distribution of
the points was not a major factor in causing the error. This
supports our claim that the algorithm is capable ofdetecting
categories with general distributions of points.
VII. THE EXTENSION OF THE ALGORITHM TO
VERY LARGE DATA SETS
The computers available now are generally not capable
of clustering very large data sets (say, greater than 30 000
points in a many dimensional space) because of both memory space and computing time limitations. We propose two
ways in which such sets could be treated to derive partitions
which are very similar (if not identical) to the ones discussed
in the previous sections. These have not yet been tested
experimentally.
Threshold Filtering
In this process we reduce the sample size before applying
procedures F and S. A small threshold T1 is employed for
filtering purposes while a large value T2 (equivalent to T in
the previous section) is used to evaluate the final grade of
membership.
The first point, say x1, is introduced. Then all the other
points are introduced sequentially and the distance from x1
to every point is measured. If d(x1, xi) < T1, then the grade
of membership of x1 is increased by 1; the corresponding
point xi is assignedfinally into the group into which x1 will
later be assigned. Thus xi is not considered further in the
application of procedures F and S. On the other hand, if
d(x1, xi)> T1, then xi will again be introduced until every
point has been assigned. When this process of filtering is
terminated, there remains a smaller set of points, x1,
X2,.* *, XN with the temporary grades of membership
nl, n2,. ,nN, where
N
Z ni = the number of points in the original data set.
591
(a)
xi
(b)
__El_
(c)
fd
i= 1
Now the usual discrimination procedure is employed; for

example, to evaluate f (xi), if
S
' {xld(xi, x) < T2}
{xi, Xi, Xm},
then set
f(xi) = ni + ni +, nm.
If N is of such a size that can be handled by the available
computer, then the algorithm can be employed; if not, a
further filtering stage can be imposed in the same manner.
Although threshold filtering has been used before, it has a
particular significance here. This is because the points
which are filtered out contribute to the partition of the entire set since they are represented in the grade of membership of the points which are included for clustering.
~N
(d)
K
x
Fig. 7. The truncation process. (a) The characteristic function f, truncated

at f1 and f11. (b), (c), (d). The resultant characteristic functions fb, f,
and fd.
mains and the grade of membership of the points is as in the

original functionf The entire domain and the three domains
resulting from the truncation are shown in Fig. 7(a), (b),
(c), and (d), respectively. It can be seen that a local maximum may sometimes not be detected if the truncation is
done immediately after a local maximum point.
VIII. CONCLUSIONS AND REMARKS
An algorithm is presented which partitions a given sample
from a multimodal fuzzy set into unimodal fuzzy sets. It is
proven that if certain conditions are satisfied, then the algorithm derives the optimal partition in the sense treated. This
partition is also optimal in the sense of maximum separation [21]. The use of this algorithm as a clustering technique
was also demonstrated. Experiments on artificially generated data sets were performed and both the results and
errors were discussed in detail.
The algorithm can also be applied effectively in supervised pattern recognition, in particular when the categories
are multimodal and this information is not known. Such
experiments have been reported in [7], [101, [18]. We can
use this algorithm to first partition every category independently into unimodal fuzzy sets. In this case we associate with every point xi the distance membership function
Truncating the Sequence A

It can be observed that the major memory space limitations are governed by the requirements of part 1 in procedure F. By truncating the sequence A, part 1 of procedure F
can be applied sequentially to the truncated parts. Once the
sample has been partitioned into symmetric fuzzy subsets,
then part 2 of procedure F and procedure S can be applied
to the entire set.
First the sequence A is generated in the usual way. Then
it is truncated at several points according to the desired sample size, and the truncated parts can then be introduced sequentially in order to generate the symmetric fuzzy subsets.
An example of the truncation process when X= E' is given
in Fig. 7. Here the sequence A is truncated at a point xl where
F(xi) =fJ and at xi, where f(x11) =Jji. This operation results
in the partition (division) of the domain off into three dismin [d(xi, xj)]
fi= Xje(S-Ci)
joint domains where each of the latter may be a union of
several disjoint subdomains. Every subset which is pro- where Ci is the set of points in the category in which xi is a
duced includes sample points in only one of the above do- member.
592
IEEE TRANSACTIONS ON COMPUTERS, JULY
1970
It is suggested that the clustering algorithm reported in groupj,]j i. Then there exists a subset Sii c Si which satisfies
this paper possesses three advantages over the ones dis- the condition d(4i, xr)>> (pi, xq) for xreSii. Thus xq precedes
cussed in the literature.
all the points in Sii in the sequence Ai. Since xq is assigned
to
group j,j =# i, it will not be replaced as the candidate point
1) It does not require a great amount of fast core memory
in
group i, and thus will block all the points in Sii from being
and therefore can be applied to large data sets. The
storage requirement is (20N+ CN+ S)10 bytes, where assigned into group i.
N is the number of points to be partitioned, 20N and Proof of Theorem 1
CN are required for the fixed portion of the program
The lemma implies that if ji is not a local maximum then
and the variable length data sequences (A, Ai), respec- it must be on the
boundary of Si. It remains to be shown that
tively, and S is the number of storage locations re- if it is a local maximum
of f, then it is an interior point.
quired for the given set of data points. Obviously, S
If pi is a local maximum, then assumption 1 of Theorem 1
depends on the particular resolution of the magnitude
implies that the subset Si is
of the components of the data vectors.
2) The amount of computing time is relatively small.
Si = {xjfd(ui, xi) < q}, where n >E/2.
(1)
3) The shape of the distribution of the points in a group Now let be the
xt
sample point such that
(category) can be quite general because of the distributions that the unimodal fuzzy sets include. This can be
Rt = d(ui,xt) = min [d(Ci,Xk)].
Xk(SthSi)
an advantage, especially in practical problems in
which the categories are not distributed in "round" Assumption 1 implies that R, . e.
clusters.
To show that the set F ={xld(x,, x) < R1} includes at least
one sample point in Si, we may consider the line segment
APPENDIX I
joining x, and ji,, and the point xi, on this line such that
Proof of Theorem I
d(xin, pi)=e/2. Defining the set F=
x)<e/2}, assumption 2 assures that F includes at least one sample
Lemma: The sets Si are disjoint symmetric fuzzy sets.
and (1) shows that this point is in Si.
Proof: Let Aii define a subsequence of A; of the points point
that have been assigned into group i, arranged in the order Proof of Theorem 2
that they stand in A. Let Ai be the sequence of candidate
Without loss of generality, let us assume that f has only
points to be assigned into group i. Bearing in mind pro- two local maxima. Let H be the optimal hypersurface
cedure F, any two points xp and xq can be assigned to the separating f into the two unimodal fuzzy sets, and S, and
same set Si if and only if their order in Aii corresponds to S2 the optimal partition of S.
their order in Ai. Suppose their order does not correspond,
Suppose that (n-1) points have already been assigned
that is,
correctly, thus generating the sets S(n -1) c S1 and S(n 1) S
and that xneS1 is the point to be assigned next. It is suffiAiiad * xp,
Xq
cient to show that there exists a sample point x eS(n1
and
such that
Ai = (.. Xq, * * * Xp * * .).
d(x0, x.) < min [d(xn, x)] XpOS2 -n1)
Then xp in Aii must be assigned first. But xq precedes xp as a
Let
candidate to be assigned into group i; thus xq will prevent
xp from being assigned into Si since it is not replaced as a
r1 = {xld(x,,, x) = oc/2}, f(u) = sup
[f(x)]
candidate point, unless it is assigned to Si. Thus if Ni is the
xer
number of points that have been assigned into group i, then and IF = {xjd(u, x) <. /2}. Clearly, f(u)
.f(xn), since xn is
for every n< Ni:
not a local maximum. In the limit when a- 0, f (xn) <(x) for
every xe- F,. Let xv be the point such that
d(x, xv) = min [d(Xn, Xj)].
d(i, x)), d(pi, xj)} for j = 1, 2, , n-1,xj, S
{xld(xin,
...
xjce(F;")
and
Assumption 1 implies that (IFnS) is
f(xn)
>
f(Xr)
d(Iti, x.) < d(Iti,Xr)J
forr=n + 1,
,Ni, xreSi
d (x,, xv) < a.
We have shown that there is a sample point

that
which proves that S is symmetric.

f(xv) > f(xn) and
Disjointness is demonstrated by the same argument. Suppose Xq is an interior point in Si and has been assigned into ocx-+O establishes the proof, since
"
C depends on the number of categories that the given data set

represents; in our experiments, C= 5 was found to be sufficient. Note that
this estimate of the total memory space is correct for N up to 32 767.
not empty; thus
min
XpES2
[d(x,, xp)]
if and only if d(xn, H) < a.
<
d(xn, xv) < 2.
min
X E
>#(n- 1)
[d(x,, xi)]
xveS such
Proof of Theorem 3
In this proof we make use of the lemma to Theorem 1.
Note that in the proof of this lemma, none of the constraints
of Theorem 1 were applied; thus the sets Si generated by
part 1 of procedure F are always symmetric and disjoint
fuzzy sets.
Let us assume that f has only two maxima and let H be
the optimal hypersurface separating f into the two unimodal fuzzy sets. It is sufficient to show that if Si is a set
generated by the above procedure, then it is on one side
(either inside or outside) of H. Then the application of
Theorem 2 will complete the proof.
Suppose that Si includes points on both sides of H, say
the sets Si, and Si2(Si1uSi2=Si) and suppose that pi (the
mode of Si) is in Si,. Let x2 be the point such that
d(pi,x2) = min [d(pi,xj)].

XjG5i2
Let L be the line segment joining x2 and pi, and Xk be the

point at which L intersects H (suppose that there is one
point of intersection; if not, let Xk be the point of intersection with the lowest value f). Define the following sets:
Fk= {xld(Xk, X)
(}
and
Sk
S n
Fk
Condition 1 (see Theorem 2) implies that Sk is not empty.

Now if c--+O and xr is any point in Sk, then
f(Xr) <
f(x2)
and
d(pi, X2),
which implies that Si is not symmetric. This contradicts the
above assumption. An application of Theorem 2 completes
the proof since if S, and S2 is the optimal partition, then
S, is on one side of H and S2 is on its other side.
d(pi, Xr)
<
Proof of Theorem 4
Let S, and S2 denote the optimal partition of S. Suppose
that (n-1) points have already been assigned correctly, thus
generating S(- 1) c S, and S(n - 1) c S2, and suppose xneS1 is
the next point to be assigned.
Let xu be the point such that
d(xn, Xu)
min
=
Xj E
(S'(n 1) ')tasun
-
1 ))
[d(xn, xj)].
Condition 1 implies that d(xn, xu) <c. Now xU and xn must

belong to S,, since if there are no sample points in 17, then
d(xn, xp)2co for every xp in S2.
APPENDIX
II
The following are the ten prototype vectors, given by

their integer components in the ten-dimensional space,
which were used to generate the data sets for the experi-
ments discussed in Section VI. The order of the vectors

bears no relation to the group numbers in Tables I and III.
593
The vectors are the first ten of the eighty prototype vectors
given in [8].
VI
v2
-77 -57
-67 -57
-131 19
27 -65
-63
83
53
51
67
-87
-73 -69
73
53
49
57
V3
-791
-13
69
- 11
65
33
11
V4 <5
-27
59
-55
25
-27
-33
-47
67j 33
31
53
- I
67
47
89
V_6 V7
13
-19 -37
- 5 -43
45
47
-75 -71
-93 -87
-49 -21
- 71 - 37
-71 -91
29
-59
69
65
-25
- 17
41
-65
- 37
-65
V8 V9
Vo
3 99
- 3 51
43
27
21
29
3
23
59
43
5
-65
-73
-97
-35
11
35 25
-43
51
-23
21
5
87
-17
15
-41
-85
REFERENCES
[1] G. H. Ball, "Data analysis in the social sciences: What about the
details?" 1965 Fall Joint Computer Conf. AFIPS Proc., vol. 27, pt. 1.
Washington, D. C.: Spartan, 1965, pp. 533-559.
[2] G. H. Ball and D. J. Hall, "ISODATA, A novel method of data analysis
and pattern classification," Stanford Research Institute, Menlo Park,
Calif., April 1965.
[3] R. E. Bonner, "On some clustering techniques," IBM J. Res. alnd
Develop., vol. 8, pp. 22-32, January 1964.
[4] R. G. Casey and G. Nagy, "An autonomous reading machine,"
IEEE Trans. Computers, vol. C-17, pp. 492-503, May 1968; also IBM
Corp., Yorktown Heights, N. Y., Research Rept. RC-1768, February 1967.
[5] T. M. Cover and P. E. Hart, "Nearest neighbor pattern classification," IEEE Trans. Information Theory, vol. IT-13, pp. 21-27,
January 1967.
[6] A. A. Dorofeyuk, "Teaching algorithms for a pattern recognition
machine without a teacher based on the method of potential functions," Automation and Remote Control, vol. 27, pp. 1728-1737,
December 1966.
[7] R. 0. Duda and H. Fossum, "Pattern classification by iteratively
determined linear and piecewise linear discriminant functions," IEEE
Trans. Electronic Computers, vol. EC-15, pp. 220-232, April 1966.
[8]
, "Computer-generated data for pattern recognition experiments," available from C. A. Rosen, Stanford Research Institute,
Menlo Park, Calif., 1966.
[9] W. D. Fisher, "On grouping for maximum homogenity," Amer. Stat.
Assoc. J., vol. 53, pp. 789-798. 1958.
[10] 0. Firschen and M. Fischler, "Automatic subclass determination for
pattern-recognition applications," IEEE Trans. Electronic Computers
(Correspondence), vol. EC-12, pp. 137-141, April 1963.
[11] E. W. Forgy, "Detecting natural clusters of individuals," presented
at the 1964 Western Psych. Assoc. Meeting, Santa Monica, Calif.,
September 1964.
[12] J. A. Gengerelli, "A method for detecting subgroups in a population
and specifying their membership," J. Psych., vol. 55, pp. 457-468,
1963.
[13] T. Kaminuma, T. Takekawa, and S. Watanabe, "Reduction of
clustering problem to pattern recognition," Pattern Recognition, vol.
1, pp. 195-205, 1969.
[14] J. MacQueen, "Some methods for classification and analysis of
multivariate observations," Proc. 5th Berkeley Symp. on Math.
Statist. and Prob. Berkeley, Calif.: University of California Press,
1967, pp. 281- 297.
[15] R. L. Mattson and J. E. Dammann, "A technique for determining
and coding subclasses in pattern recognition problems," IBM J. Res.
and Develop., vol. 9, pp. 294-302, July 1965.
[16] G. Nagy, "State of the art in pattern recognition," Proc. IEEE, vol.
56, pp. 836-862, May 1968.
[17] D. J. Rogers and T. T. Tanimoto, "A computer program for classifying plants," Science, vol. 132, pp. 115-118, October 1960.
[18] C. A. Rosen and D. J. Hall, A pattern recognition experiment with
near-optimum results," IEEE Trans. Electronic Computers (Correspondence), vol. EC-15, pp. 666-667, August 1966.
[19] J. Rubin, "Optimal classification into groups: An approach for solving the taxonomy problem," IBM Rept. 320-2915, December 1966.
[20] J. H. Ward, ";Hierarchical grouping to optimize an objective function," Amer. Stat. Assoc. J., vol. 58, pp. 236-244, 1963.
[21 ] L. A. Zadeh, "Fuzzy sets," Information and Control, vol. 8, pp. 338353, 1965.

Algorithm Its Clustering: Detecting

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Algorithm Its Clustering: Detecting

Uploaded by

Copyright:

Available Formats

583

IEEE TRANSACTIONS ON COMPUTERS, VOL. C-19, NO. 7, JULY 1970

An Algorithm for Detecting Unimodal

MARTIN D. LEVINE, MEMBER, IEEE

Abstract-An algorithm is presented which partitions a given

rT HE PRIMARY objective of clustering techniques is

Fig. 1. Distribution of points in a two-dimensional feature space.

space in which the points are represented) are only able to

IEEE TRANSACTIONS ON COMPUTERS, JULY 1970

where xi is some point in B and d is a metric.

d(xi,,) < d(Xk, U):: tf(Xi) . f(x4

As an example of a symmetric fuzzy set, consider the set B

2 This is equivalent to the set Fr

GITMAN AND LEVINE: ALGORITHM FOR DETECTING UNIMODAL FUZZY SETS

point in An is its mode and the points are ordered according

Fig. 2. An example of a unimodal fuzzy set in a two-dimensional space.

detects all the local maxima off, and procedure S which

the procedure. At every stage of assignment, every group i

3) Suppose that G groups have been initiated. Thus there

IEEE TRANSACTIONS ON COMPUTERS, JULY 1970

points in A for i < q have already been assigned); then the

The proofs of the theorems are given in Appendix I.

X2F3 X25 IX27 X2

be the partition generated by part 1 of

distance Ri can be found as follows:

Ri = d(,u, xpi) = min [d(i, Xk)J.

Ri is the minimum distance from the mode to a point in S

FRj = {xld(xpi, X) < Ri}

(x15, x14, x16, x13, (x2X,x1l,lx17

A1 = (yll Y21 Y31*

X2, X30, X1, X20, X22, X21).

X21, X6, X22, X5, X4, X23, X3, X24,

Observing these sequences, we can see that the first four

GITMAN AND LEVINE: ALGORITHM FOR DETECTING UNIMODAL FUZZY SETS

IEEE TRANSACTIONS ON COMPUTERS, JULY 1970

Let H = xo7 be the optimal hyperplane (point) separating

If S does not include any points in F7, then procedure S

X2, X30, X1, X20, X22, X21).

All the points up to x25 are automatically assigned to the

Sl = (X15, X14, x16, X13, x12, xll, X17, X10, X%,)

Fig. 6. This figure demonstrates procedure S. At a certain stage in the

problem [16] where a set of points {(xi)'} is given that must

GITMAN AND LEVINE: ALGORITHM FOR DETECTING UNIMODAL FUZZY SETS

the identity between Yq 1 and y', i= 1, , G, is checked.

101(1) 271(89, 83, 2)

*f(vi) indicates the maximal grade of membership in the corresponding

cause the hyperellipsoids which indicate the hypersurfaces

From Table I, we can see that nine to ten major categories

COMPUTERS, JULY 1970

linkage of several categories; that is, if T is very large, then

GITMAN AND LEVINE: ALGORITHM FOR DETECTING UNIMODAL FUZZY SETS

Z ni = the number of points in the original data set.

Now the usual discrimination procedure is employed; for

' {xld(xi, x) < T2}

{xi, Xi, Xm},

Fig. 7. The truncation process. (a) The characteristic function f, truncated

mains and the grade of membership of the points is as in the

Truncating the Sequence A

IEEE TRANSACTIONS ON COMPUTERS, JULY

Assumption 1 implies that (IFnS) is

d(Iti, x.) < d(Iti,Xr)J

d (x,, xv) < a.