A.K. JAIN
Michigan State University
M.N. MURTY
Indian Institute of Science
AND
P.J. FLYNN
The Ohio State University
Section 6.1 is based on the chapter “Image Segmentation Using Clustering” by A.K. Jain and P.J.
Flynn, Advances in Image Understanding: A Festschrift for Azriel Rosenfeld (K. Bowyer and N. Ahuja,
Eds.), 1996 IEEE Computer Society Press, and is used by permission of the IEEE Computer Society.
Authors’ addresses: A. Jain, Department of Computer Science, Michigan State University, A714 Wells
Hall, East Lansing, MI 48824; M. Murty, Department of Computer Science and Automation, Indian
Institute of Science, Bangalore, 560 012, India; P. Flynn, Department of Electrical Engineering, The
Ohio State University, Columbus, OH 43210.
Permission to make digital/hard copy of part or all of this work for personal or classroom use is granted without fee provided that the copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication, and its date appear, and notice is given that copying is by permission of the ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists requires prior specific permission and/or a fee.
© 2000 ACM 0360-0300/99/0900–0001 $5.00
Figure 1. Data clustering: (a) input patterns in two dimensions (axes X and Y); (b) the same patterns labeled by cluster (clusters numbered 1 through 7).
methods for grouping of unlabeled data. These communities have different terminologies and assumptions for the components of the clustering process and the contexts in which clustering is used. Thus, we face a dilemma regarding the scope of this survey. The production of a truly comprehensive survey would be a monumental task given the sheer mass of literature in this area. The accessibility of the survey might also be questionable given the need to reconcile very different vocabularies and assumptions regarding clustering in the various communities.

The goal of this paper is to survey the core concepts and techniques in the large subset of cluster analysis with its roots in statistics and decision theory. Where appropriate, references will be made to key concepts and techniques arising from clustering methodology in the machine-learning and other communities.

The audience for this paper includes practitioners in the pattern recognition and image analysis communities (who should view it as a summarization of current practice), practitioners in the machine-learning communities (who should view it as a snapshot of a closely related field with a rich history of well-understood techniques), and the broader audience of scientific professionals (who should view it as an accessible introduction to a mature field that is making important contributions to computing application areas).

1.2 Components of a Clustering Task

Typical pattern clustering activity involves the following steps [Jain and Dubes 1988]:

(1) pattern representation (optionally including feature extraction and/or selection),
(2) definition of a pattern proximity measure appropriate to the data domain,
(3) clustering or grouping,
(4) data abstraction (if needed), and
(5) assessment of output (if needed).

Figure 2 depicts a typical sequencing of the first three of these steps, including a feedback path where the grouping process output could affect subsequent feature extraction and similarity computations.
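To make the first three steps concrete, here is a minimal Python sketch (an illustration, not code from the survey: the Euclidean proximity measure, the k-means-style grouping, and all parameter choices are assumptions):

```python
import numpy as np

def proximity(a, b):
    # Step (2): a pattern proximity measure -- here, Euclidean distance.
    return np.linalg.norm(a - b)

def group(patterns, k, iters=20, seed=0):
    # Step (3): grouping -- a bare-bones k-means, one of many possible choices.
    rng = np.random.default_rng(seed)
    centers = patterns[rng.choice(len(patterns), size=k, replace=False)].copy()
    for _ in range(iters):
        # Assign each pattern to its nearest center.
        labels = np.array([min(range(k), key=lambda j: proximity(x, centers[j]))
                           for x in patterns])
        # Recompute each center as the mean of its assigned patterns.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = patterns[labels == j].mean(axis=0)
    return labels, centers

# Step (1): pattern representation -- an n x d matrix of feature vectors.
patterns = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.1], [4.8, 5.3]])
labels, centers = group(patterns, k=2)
```

Steps (4) and (5), data abstraction and assessment of output, would then operate on the resulting labels and centers.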
Figure 2. Stages in clustering (the feedback loop runs from the grouping output back to feature extraction/selection and similarity computation).

Pattern representation refers to the number of classes, the number of available patterns, and the number, type, and scale of the features available to the clustering algorithm. Some of this information may not be controllable by the
subjectively. Hence, little in the way of 'gold standards' exists in clustering except in well-prescribed subdomains. Validity assessments are objective [Dubes 1993] and are performed to determine whether the output is meaningful. A clustering structure is valid if it cannot reasonably have occurred by chance or as an artifact of a clustering algorithm. When statistical approaches to clustering are used, validation is accomplished by carefully applying statistical methods and testing hypotheses. There are three types of validation studies. An external assessment of validity compares the recovered structure to an a priori structure. An internal examination of validity tries to determine if the structure is intrinsically appropriate for the data. A relative test compares two structures and measures their relative merit. Indices used for this comparison are discussed in detail in Jain and Dubes [1988] and Dubes [1993], and are not discussed further in this paper.

1.3 The User's Dilemma and the Role of Expertise

The availability of such a vast collection of clustering algorithms in the literature can easily confound a user attempting to select an algorithm suitable for the problem at hand. In Dubes and Jain [1976], a set of admissibility criteria defined by Fisher and Van Ness [1971] are used to compare clustering algorithms. These admissibility criteria are based on: (1) the manner in which clusters are formed, (2) the structure of the data, and (3) sensitivity of the clustering technique to changes that do not affect the structure of the data. However, there is no critical analysis of clustering algorithms dealing with important questions such as

—How should the data be normalized?
—Which similarity measure is appropriate to use in a given situation?
—How should domain knowledge be utilized in a particular clustering problem?
—How can a very large data set (say, a million patterns) be clustered efficiently?

These issues have motivated this survey, and its aim is to provide a perspective on the state of the art in clustering methodology and algorithms. With such a perspective, an informed practitioner should be able to confidently assess the tradeoffs of different techniques, and ultimately make a competent decision on a technique or suite of techniques to employ in a particular application.

There is no clustering technique that is universally applicable in uncovering the variety of structures present in multidimensional data sets. For example, consider the two-dimensional data set shown in Figure 1(a). Not all clustering techniques can uncover all the clusters present here with equal facility, because clustering algorithms often contain implicit assumptions about cluster shape or multiple-cluster configurations based on the similarity measures and grouping criteria used.

Humans perform competitively with automatic clustering procedures in two dimensions, but most real problems involve clustering in higher dimensions. It is difficult for humans to obtain an intuitive interpretation of data embedded in a high-dimensional space. In addition, data hardly follow the "ideal" structures (e.g., hyperspherical, linear) shown in Figure 1. This explains the large number of clustering algorithms which continue to appear in the literature; each new clustering algorithm performs slightly better than the existing ones on a specific distribution of patterns.

It is essential for the user of a clustering algorithm to not only have a thorough understanding of the particular technique being utilized, but also to know the details of the data gathering process and to have some domain expertise; the more information the user has about the data at hand, the more likely the user would be able to succeed in assessing its true class structure [Jain and Dubes 1988]. This domain information can also be used to improve the quality of feature extraction, similarity computation, grouping, and cluster representation [Murty and Jain 1995].

Appropriate constraints on the data source can be incorporated into a clustering procedure. One example of this is mixture resolving [Titterington et al. 1985], wherein it is assumed that the data are drawn from a mixture of an unknown number of densities (often assumed to be multivariate Gaussian). The clustering problem here is to identify the number of mixture components and the parameters of each component. The concept of density clustering and a methodology for decomposition of feature spaces [Bajcsy 1997] have also been incorporated into traditional clustering methodology, yielding a technique for extracting overlapping clusters.

1.4 History

Even though there is an increasing interest in the use of clustering methods in pattern recognition [Anderberg 1973], image processing [Jain and Flynn 1996], and information retrieval [Rasmussen 1992; Salton 1991], clustering has a rich history in other disciplines [Jain and Dubes 1988] such as biology, psychiatry, psychology, archaeology, geology, geography, and marketing. Other terms more or less synonymous with clustering include unsupervised learning [Jain and Dubes 1988], numerical taxonomy [Sneath and Sokal 1973], vector quantization [Oehler and Gray 1995], and learning by observation [Michalski and Stepp 1983]. The field of spatial analysis of point patterns [Ripley 1988] is also related to cluster analysis. The importance and interdisciplinary nature of clustering are evident through its vast literature.

A number of books on clustering have been published [Jain and Dubes 1988; Anderberg 1973; Hartigan 1975; Spath 1980; Duran and Odell 1974; Everitt 1993; Backer 1995], in addition to some useful and influential review papers. A survey of the state of the art in clustering circa 1978 was reported in Dubes and Jain [1980]. A comparison of various clustering algorithms for constructing the minimal spanning tree and the short spanning path was given in Lee [1981]. Cluster analysis was also surveyed in Jain et al. [1986]. A review of image segmentation by clustering was reported in Jain and Flynn [1996]. Comparisons of various combinatorial optimization schemes, based on experiments, have been reported in Mishra and Raghavan [1994] and Al-Sultan and Khan [1996].

1.5 Outline

This paper is organized as follows. Section 2 presents definitions of terms to be used throughout the paper. Section 3 summarizes pattern representation, feature extraction, and feature selection. Various approaches to the computation of proximity between patterns are discussed in Section 4. Section 5 presents a taxonomy of clustering approaches, describes the major techniques in use, and discusses emerging techniques for clustering incorporating non-numeric constraints and the clustering of large sets of patterns. Section 6 discusses applications of clustering methods to image analysis and data mining problems. Finally, Section 7 presents some concluding remarks.

2. DEFINITIONS AND NOTATION

The following terms and notation are used throughout this paper.

—A pattern (or feature vector, observation, or datum) x is a single data item used by the clustering algorithm. It typically consists of a vector of d measurements: x = (x_1, . . ., x_d).

—The individual scalar components x_i of a pattern x are called features (or attributes).
from the original set. In either case, the goal is to improve classification performance

Figure 4. A and B are more similar than A and C.

Figure 5. After a change in context, B and C are more similar than B and A.
Figure 7. A taxonomy of clustering approaches (clustering divides into hierarchical and partitional methods; the partitional branch includes k-means and expectation maximization).
to be clustered is large, and constraints on execution time or memory space affect the architecture of the algorithm. The early history of clustering methodology does not contain many examples of clustering algorithms designed to work with large data sets, but the advent of data mining has fostered the development of

Figure 9. Points falling in three clusters.

Figure 10. The dendrogram obtained using the single-link algorithm (similarity on the vertical axis; patterns A through G on the horizontal axis).

It has a tendency to produce clusters that are straggly or elongated. There are two clusters in Figures 12 and 13 separated by a "bridge" of noisy patterns. The single-link algorithm produces the clusters shown in Figure 12, whereas the complete-link algorithm obtains the clustering shown in Figure 13. The clusters obtained by the complete-link algorithm are more compact than those obtained by the single-link algorithm; the cluster labeled 1 obtained using the single-link algorithm is elongated because of the noisy patterns labeled "*". In other respects, however, the single-link algorithm is more versatile than the complete-link algorithm. For example, the single-link algorithm can extract the concentric clusters shown in Figure 11, but the complete-link algorithm cannot. However, from a pragmatic viewpoint, it has been observed that the complete-link algorithm produces more useful hierarchies in many applications than the single-link algorithm [Jain and Dubes 1988].

Figure 11. Two concentric clusters.

Figure 12. A single-link clustering of a pattern set containing two classes (1 and 2) connected by a chain of noisy patterns (*).

Figure 13. A complete-link clustering of a pattern set containing two classes (1 and 2) connected by a chain of noisy patterns (*).

Agglomerative Single-Link Clustering Algorithm

(1) Place each pattern in its own cluster. Construct a list of interpattern distances for all distinct unordered pairs of patterns, and sort this list in ascending order.
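Step (1) produces a sorted list of interpattern distances; scanning that list and merging clusters as pairs are encountered completes the single-link procedure. The following Python sketch is my rendering of that standard formulation (the union-find bookkeeping and the stopping rule are illustrative, not the paper's listing):

```python
import numpy as np
from itertools import combinations

def single_link(patterns, k):
    """Merge clusters in order of increasing interpattern distance
    until only k clusters remain (single-link criterion)."""
    n = len(patterns)
    parent = list(range(n))            # union-find forest: one tree per cluster

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    # Step (1): all distinct unordered pairs, sorted by distance (ascending).
    pairs = sorted(combinations(range(n), 2),
                   key=lambda p: np.linalg.norm(patterns[p[0]] - patterns[p[1]]))

    clusters = n
    for i, j in pairs:                 # scan the sorted list, merging as we go
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri
            clusters -= 1
            if clusters == k:
                break
    return [find(i) for i in range(n)]  # cluster label for each pattern
```

Because merges happen strictly in order of increasing distance, a chain of noisy patterns can bridge two groups, producing exactly the straggly behavior seen in Figure 12.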
Figure 17. Representation of a cluster by points: by the centroid (left) or by three distant points (right).
Figure 18. Representation of clusters by a classification tree or by conjunctive statements. The tree splits first on X1 < 3 versus X1 > 3, then on X2 < 2 versus X2 > 2; equivalently, 1: [X1 < 3]; 2: [X1 > 3][X2 < 2]; 3: [X1 > 3][X2 > 2].
Figure 19. Data compression by clustering.

LVQ is addressed in Pal et al. [1993]. The learning algorithm in ART models is similar to the leader clustering algorithm [Moore 1988].

The SOM gives an intuitively appealing two-dimensional map of the multidimensional data set, and it has been successfully used for vector quantiza-
tion and speech recognition [Kohonen
1984]. However, like its sequential
counterpart, the SOM generates a sub-
(1) ANNs process numerical vectors and optimal partition if the initial weights
so require patterns to be represented are not chosen properly. Further, its
using quantitative features only. convergence is controlled by various pa-
rameters such as the learning rate and
(2) ANNs are inherently parallel and a neighborhood of the winning node in
distributed processing architec- which learning takes place. It is possi-
tures. ble that a particular input pattern can
(3) ANNs may learn their interconnec- fire different output units at different
tion weights adaptively [Jain and iterations; this brings up the stability
Mao 1996; Oja 1982]. More specifi- issue of learning systems. The system is
cally, they can act as pattern nor- said to be stable if no pattern in the
malizers and feature selectors by training data changes its category after
appropriate selection of weights. a finite number of learning iterations.
This problem is closely associated with
Competitive (or winner–take–all) the problem of plasticity, which is the
neural networks [Jain and Mao 1996] ability of the algorithm to adapt to new
are often used to cluster input data. In data. For stability, the learning rate
competitive learning, similar patterns should be decreased to zero as iterations
are grouped by the network and repre- progress and this affects the plasticity.
sented by a single unit (neuron). This The ART models are supposed to be
grouping is done automatically based on stable and plastic [Carpenter and
data correlations. Well-known examples Grossberg 1990]. However, ART nets
of ANNs used for clustering include Ko- are order-dependent; that is, different
honen’s learning vector quantization partitions are obtained for different or-
(LVQ) and self-organizing map (SOM) ders in which the data is presented to
[Kohonen 1984], and adaptive reso- the net. Also, the size and number of
nance theory models [Carpenter and clusters generated by an ART net de-
Grossberg 1990]. The architectures of pend on the value chosen for the vigi-
these ANNs are simple: they are single- lance threshold, which is used to decide
layered. Patterns are presented at the whether a pattern is to be assigned to
input and are associated with the out- one of the existing clusters or start a
put nodes. The weights between the in- new cluster. Further, both SOM and
put nodes and the output nodes are ART are suitable for detecting only hy-
iteratively changed (this is called learn- perspherical clusters [Hertz et al. 1991].
ing) until a termination criterion is sat- A two-layer network that employs regu-
isfied. Competitive learning has been larized Mahalanobis distance to extract
found to exist in biological neural net- hyperellipsoidal clusters was proposed
works. However, the learning or weight in Mao and Jain [1994]. All these ANNs
update procedures are quite similar to use a fixed number of output nodes
issues such as the convergence of these approaches were studied in Fogel and Fogel [1994].

Figure 21. GAs perform globalized search.

GAs perform a globalized search for solutions whereas most other clustering procedures perform a localized search. In a localized search, the solution obtained at the 'next iteration' of the procedure is in the vicinity of the current solution. In this sense, the k-means algorithm, fuzzy clustering algorithms, ANNs used for clustering, various annealing schemes (see below), and tabu search are all localized search techniques. In the case of GAs, the crossover and mutation operators can produce new solutions that are completely different from the current ones. We illustrate this fact in Figure 21. Let us assume that the scalar X is coded using a 5-bit binary representation, and let S_1 and S_2 be two points in the one-dimensional search space. The decimal values of S_1 and S_2 are 8 and 31, respectively. Their binary representations are S_1 = 01000 and S_2 = 11111. Let us apply the single-point crossover to these strings, with the crossover site falling between the second and third most significant bits as shown below:

01|000
11|111

This will produce a new pair of points or chromosomes S_3 and S_4 as shown in Figure 21. Here, S_3 = 01111 and S_4 = 11000. The corresponding decimal values are 15 and 24, respectively. Similarly, by mutating the most significant bit in the binary string 01111 (decimal 15), the binary string 11111 (decimal 31) is generated. These jumps, or gaps between points in successive generations, are much larger than those produced by other approaches.
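This worked example is easy to reproduce; a small illustrative script (not from the paper):

```python
def crossover(a, b, site):
    # Single-point crossover: exchange the tails after 'site' bits.
    return a[:site] + b[site:], b[:site] + a[site:]

def mutate(s, pos):
    # Flip the bit at 'pos' (0 = most significant bit).
    return s[:pos] + ('1' if s[pos] == '0' else '0') + s[pos + 1:]

s1, s2 = '01000', '11111'        # decimal 8 and 31
s3, s4 = crossover(s1, s2, 2)    # '01111' (15) and '11000' (24)
s5 = mutate(s3, 0)               # '11111' (31): a large jump in the search space
```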
Perhaps the earliest paper on the use of GAs for clustering is by Raghavan and Birchand [1979], where a GA was used to minimize the squared error of a clustering. Here, each point or chromosome represents a partition of N objects into K clusters and is represented by a K-ary string of length N. For example, consider six patterns—A, B, C, D, E, and F—and the string 101001. This six-bit binary (K = 2) string corresponds to placing the six patterns into two clusters. This string represents a two-partition, where one cluster has the first, third, and sixth patterns and the second cluster has the remaining patterns. In other words, the two clusters are {A,C,F} and {B,D,E} (the six-bit binary string 010110 represents the same clustering of the six patterns). When there are K clusters, there are K! different chromosomes corresponding to each K-partition of the data. This increases the effective search space size by a factor of K!. Further, if crossover is applied on two good chromosomes, the resulting offspring may be inferior in this representation. For example, let {A,B,C} and {D,E,F} be the clusters in the optimal 2-partition of the six patterns considered above. The corresponding chromosomes are 111000 and 000111. By applying single-point crossover at the location between the third and fourth bit positions on these two strings, we get 111111 and 000000 as offspring, and both correspond to an inferior partition. These problems have motivated researchers to design better representation schemes and crossover operators.

In Bhuyan et al. [1991], an improved representation scheme is proposed where an additional separator symbol is used along with the pattern labels to represent a partition. Let the separator symbol be represented by *. Then the chromosome ACF*BDE corresponds to a 2-partition {A,C,F} and {B,D,E}. Using this representation permits them to map the clustering problem into a permutation problem such as the traveling salesman problem, which can be solved by using the permutation crossover operators [Goldberg 1989]. This solution also suffers from permutation redundancy. There are 72 equivalent chromosomes (permutations) corresponding to the same partition of the data into the two clusters {A,C,F} and {B,D,E}.

More recently, Jones and Beltramo [1991] investigated the use of edge-based crossover [Whitley et al. 1989] to solve the clustering problem. Here, all patterns in a cluster are assumed to form a complete graph by connecting them with edges. Offspring are generated from the parents so that they inherit the edges from their parents. It is observed that this crossover operator takes O(K^6 + N) time for N patterns and K clusters, ruling out its applicability on practical data sets having more than 10 clusters. In a hybrid approach proposed in Babu and Murty [1993], the GA is used only to find good initial cluster centers and the k-means algorithm is applied to find the final partition. This hybrid approach performed better than the GA.

A major problem with GAs is their sensitivity to the selection of various parameters such as population size, crossover and mutation probabilities, etc. Grefenstette [1986] has studied this problem and suggested guidelines for selecting these control parameters. However, these guidelines may not yield good results on specific problems like pattern clustering. It was reported in Jones and Beltramo [1991] that hybrid genetic algorithms incorporating problem-specific heuristics are good for clustering. A similar claim is made in Davis [1991] about the applicability of GAs to other practical problems. Another issue with GAs is the selection of an appropriate representation which is low in order and short in defining length.

It is possible to view the clustering problem as an optimization problem that locates the optimal centroids of the clusters directly rather than finding an optimal partition using a GA. This view permits the use of ESs and EP, because centroids can be coded easily in both these approaches, as they support the direct representation of a solution as a real-valued vector. In Babu and Murty [1994], ESs were used on both hard and fuzzy clustering problems, and EP has been used to evolve fuzzy min-max clusters [Fogel and Simpson 1993]. It has been observed that they perform better than their classical counterparts, the k-means algorithm and the fuzzy c-means algorithm. However, all of these approaches suffer (as do GAs and ANNs) from sensitivity to control parameter selection. For each specific problem, one has to tune the parameter values to suit the application.
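Returning to the K-ary string representation of Raghavan and Birchand [1979] discussed above: decoding a chromosome into its partition is straightforward. A small illustrative sketch (not from the paper):

```python
def decode(chromosome, patterns):
    # Position i of the chromosome holds the cluster label of pattern i.
    clusters = {}
    for label, name in zip(chromosome, patterns):
        clusters.setdefault(label, []).append(name)
    return sorted(clusters.values())

decode('101001', 'ABCDEF')   # [['A', 'C', 'F'], ['B', 'D', 'E']]
decode('010110', 'ABCDEF')   # the same partition: the K! redundancy noted above
```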
5.9 Search-Based Approaches

Search techniques used to obtain the optimum value of the criterion function are divided into deterministic and stochastic search techniques. Deterministic search techniques guarantee an optimal partition by performing exhaustive enumeration. On the other hand, the stochastic search techniques generate a near-optimal partition reasonably quickly, and guarantee convergence to an optimal partition asymptotically. Among the techniques considered so far, evolutionary approaches are stochastic and the remainder are deterministic. Other deterministic approaches to clustering include the branch-and-bound technique adopted in Koontz et al. [1975] and Cheng [1995] for generating optimal partitions. This approach generates the optimal partition of the data at the cost of excessive computational requirements. In Rose et al. [1993], a deterministic annealing approach was proposed for clustering. This approach employs an annealing technique in which the error surface is smoothed, but convergence to the global optimum is not guaranteed. The use of deterministic annealing in proximity-mode clustering (where the patterns are specified in terms of pairwise proximities rather than multidimensional points) was explored in Hofmann and Buhmann [1997]; later work applied the deterministic annealing approach to texture segmentation [Hofmann and Buhmann 1998].

The deterministic approaches are typically greedy descent approaches, whereas the stochastic approaches permit perturbations to the solutions in non-locally optimal directions also, with nonzero probabilities. The stochastic search techniques are either sequential or parallel, while evolutionary approaches are inherently parallel. The simulated annealing approach (SA) [Kirkpatrick et al. 1983] is a sequential stochastic search technique, whose applicability to clustering is discussed in Klein and Dubes [1989]. Simulated annealing procedures are designed to avoid (or recover from) solutions which correspond to local optima of the objective functions. This is accomplished by accepting with some probability a new solution for the next iteration of lower quality (as measured by the criterion function). The probability of acceptance is governed by a critical parameter called the temperature (by analogy with annealing in metals), which is typically specified in terms of a starting (first iteration) and final temperature value. Selim and Al-Sultan [1991] studied the effects of control parameters on the performance of the algorithm, and Baeza-Yates [1992] used SA to obtain a near-optimal partition of the data. SA is statistically guaranteed to find the global optimal solution [Aarts and Korst 1989]. A high-level outline of an SA-based algorithm for clustering is given below.

Clustering Based on Simulated Annealing

(1) Randomly select an initial partition P_0 and compute its squared error value, E_{P0}. Select values for the control parameters: initial and final temperatures T_0 and T_f.
(2) Select a neighbor P_1 of P_0 and compute its squared error value, E_{P1}. If E_{P1} is larger than E_{P0}, then assign P_1 to P_0 with a temperature-dependent probability. Else assign P_1 to P_0. Repeat this step for a fixed number of iterations.
(3) Reduce the value of T_0, i.e., T_0 = cT_0, where c is a predetermined constant. If T_0 is greater than T_f, then go to step 2. Else stop.

The SA algorithm can be slow in reaching the optimal solution, because optimal results require the temperature to be decreased very slowly from iteration to iteration.
Tabu search [Glover 1986], like SA, is a method designed to cross boundaries of feasibility or local optimality and to systematically impose and release constraints to permit exploration of otherwise forbidden regions. Tabu search was used to solve the clustering problem in Al-Sultan [1995].

It was shown in Selim and Ismail [1984] that the k-means method converges to a locally optimal solution. This behavior is linked with the initial seed selection in the k-means algorithm. So if a good initial partition can be obtained quickly using any of the other techniques, then k-means would work well, even on problems with large data sets. Even though the various methods discussed in this section are comparatively weak, it was revealed through experimental studies that combining domain knowledge would improve their performance. For example, ANNs work better in classifying images represented using extracted features than with raw images, and hybrid classifiers work better than ANNs [Mohiuddin and Mao 1994]. Similarly, using domain knowledge to hybridize a GA improves its performance [Jones and Beltramo 1991]. So it may be useful in general to use domain knowledge along with approaches like GA, SA, ANN, and TS. However, these approaches (specifically, the criterion functions used in them) have a tendency to generate a partition of hyperspherical clusters, and this could be a limitation. For example, in cluster-based document retrieval, it was observed that the hierarchical algorithms performed better than the partitional algorithms [Rasmussen 1992].

5.11 Incorporating Domain Constraints in Clustering

As a task, clustering is subjective in nature. The same data set may need to be partitioned differently for different purposes. For example, consider a whale, an elephant, and a tuna fish [Watanabe 1985]. Whales and elephants form a cluster of mammals. However, if the user is interested in partitioning them based on the concept of living in water, then whale and tuna fish are clustered together. Typically, this subjectivity is incorporated into the clustering criterion by incorporating domain knowledge in one or more phases of clustering.

Every clustering algorithm uses some type of knowledge either implicitly or explicitly. Implicit knowledge plays a role in (1) selecting a pattern representation scheme (e.g., using one's prior experience to select and encode features), (2) choosing a similarity measure (e.g., using the Mahalanobis distance instead of the Euclidean distance to obtain hyperellipsoidal clusters), and (3) selecting a grouping scheme (e.g., specifying the k-means algorithm when it is known that clusters are hyperspherical). Domain knowledge is used implicitly in ANNs, GAs, TS, and SA to select the control/learning parameter values that affect the performance of these algorithms.

It is also possible to use explicitly available domain knowledge to constrain or guide the clustering process. Such specialized clustering algorithms have been used in several applications. Domain concepts can play several roles in the clustering process, and a variety of choices are available to the practitioner. At one extreme, the available domain concepts might easily serve as an additional feature (or several), and the remainder of the procedure might be otherwise unaffected. At the other extreme, domain concepts might be used to confirm or veto a decision arrived at independently by a traditional clustering algorithm, or used to affect the computation of distance in a clustering algorithm employing proximity. The incorporation of domain knowledge into clustering consists mainly of ad hoc approaches with little in common; accordingly, our discussion of the idea will consist mainly of motivational material and a brief survey of past work. Machine learning research and pattern recognition research intersect in this topical area, and the interested reader is referred to the prominent journals in machine learning (e.g., Machine Learning, J. of AI Research, or Artificial Intelligence) for a fuller treatment of this topic.
Table II. Distance computations for the single-link algorithm applied directly, n(n-1)/2, versus the two-level divide-and-conquer approach with p blocks (number of clusters fixed at 5):

    n        n(n-1)/2      p    two-level
    100      4,950         2    1,200
    500      124,750       2    10,750
    1,000    499,500       4    31,500
    10,000   49,995,000    10   1,013,750

Figure 23. Divide and conquer approach to clustering.

We transfer each of these blocks to the main memory and cluster it into k clusters using a standard algorithm. One or more representative samples from each of these clusters are stored separately; we have pk of these representative patterns if we choose one representative per cluster. These pk representatives are further clustered into k clusters and the cluster labels of these representative patterns are used to relabel the original pattern matrix. We depict this two-level algorithm in Figure 23. It is possible to extend this algorithm to any number of levels; more levels are required if the data set is very large and the main memory size is very small [Murty and Krishna 1980]. If the single-link algorithm is used to obtain 5 clusters, then there is a substantial savings in the number of computations, as shown in Table II for optimally chosen p when the number of clusters is fixed at 5. However, this algorithm works well only when the points in each block are reasonably homogeneous, which is often satisfied by image data.

A two-level strategy for clustering a data set containing 2,000 patterns was described in Stahl [1986]. In the first level, the data set is loosely clustered into a large number of clusters using the leader algorithm. Representatives from these clusters, one per cluster, are the input to the second level clustering, which is obtained using Ward's hierarchical method.

5.12.2 Incremental Clustering. Incremental clustering is based on the assumption that it is possible to consider patterns one at a time and assign them to existing clusters. Here, a new data item is assigned to a cluster without affecting the existing clusters significantly. A high-level description of a typical incremental clustering algorithm is given below.

An Incremental Clustering Algorithm

(1) Assign the first data item to a cluster.
(2) Consider the next data item. Either assign this item to one of the existing clusters or assign it to a new cluster. This assignment is done based on some criterion, e.g., the distance between the new item and the existing cluster centroids.
(3) Repeat step 2 till all the data items are clustered.

The major advantage of incremental clustering algorithms is that it is not necessary to store the entire pattern matrix in memory, so the space requirements of incremental algorithms are very small. Typically, they are noniterative, so their time requirements are also small. There are several incremental clustering algorithms:

(1) The leader clustering algorithm [Hartigan 1975] is the simplest in terms of time complexity, which is O(nk). It has gained popularity because of its neural network implementation, the ART network [Carpenter and Grossberg 1990]. It is very easy to implement as it requires only O(k) space.
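A minimal Python sketch of the leader algorithm follows (the distance threshold plays the role of ART's vigilance parameter; the threshold and distance function are illustrative choices):

```python
import numpy as np

def leader(patterns, threshold):
    """Single-pass incremental clustering: O(nk) time, and only the k
    leaders are stored (O(k) space). Result depends on presentation order."""
    leaders, labels = [], []
    for x in patterns:
        dists = [np.linalg.norm(x - l) for l in leaders]
        if dists and min(dists) <= threshold:
            labels.append(int(np.argmin(dists)))  # join nearest leader's cluster
        else:
            leaders.append(x)                     # x becomes a new leader
            labels.append(len(leaders) - 1)
    return labels, leaders
```

As with ART, the clustering produced depends on the order in which patterns are presented and on the threshold value.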
Figure 25. Feature representation for clustering. Image measurements and positions are transformed to features. Clusters in feature space correspond to image segments.
applications, and can be addressed as a clustering problem [Rosenfeld and Kak 1982]. The segmentation of the image(s) presented to an image analysis system is critically dependent on the scene to be sensed, the imaging geometry, configuration, and sensor used to transduce the scene into a digital image, and ultimately the desired output (goal) of the analysis.

If X = {x_ij, i = 1, . . ., N_r, j = 1, . . ., N_c} is the input image with N_r rows and N_c columns and measurement value x_ij at pixel (i, j), then the segmentation can be expressed as S = {S_1, . . ., S_k}, with the l-th segment S_l = {(i_l1, j_l1), . . ., (i_lN_l, j_lN_l)}.

Figure 26. Binarization via thresholding. (a): Original grayscale image. (b): Gray-level histogram. (c): Results of thresholding.
Dunn et al. 1974]. A postprocessing step separates the classes into connected regions. While simple gray level thresholding is adequate in some carefully controlled image acquisition environments and much research has been devoted to appropriate methods for thresholding [Weszka 1978; Trier and Jain 1995], complex images require more elaborate segmentation techniques.

Many segmenters use measurements which are both spectral (e.g., the multispectral scanner used in remote sensing) and spatial (based on the pixel's location in the image plane). The measurement at each pixel hence corresponds directly to our concept of a pattern.

6.1.2 Image Segmentation Via Clustering. The application of local feature clustering to segment gray-scale images was documented in Schachter et al. [1979]. This paper emphasized the appropriate selection of features at each pixel rather than the clustering methodology, and proposed the use of image plane coordinates (spatial information) as additional features to be employed in clustering-based segmentation. The goal of clustering was to obtain a sequence of hyperellipsoidal clusters starting with cluster centers positioned at maximum density locations in the pattern space, and growing clusters about these centers until a χ² test for goodness of fit was violated. A variety of features were discussed and applied to both grayscale and color imagery.

An agglomerative clustering algorithm was applied in Silverman and Cooper [1988] to the problem of unsupervised learning of clusters of coefficient vectors for two image models that correspond to image segments. The first image model is polynomial for the observed image measurements; the assumption here is that the image is a collection of several adjoining graph surfaces, each a polynomial function of the image plane coordinates, which are sampled on the raster grid to produce the observed image. The algorithm proceeds by obtaining vectors of coefficients of least-squares fits to the data in M disjoint image windows. An agglomerative clustering algorithm merges (at each step) the two clusters that have a minimum global between-cluster Mahalanobis distance. The same framework was applied to segmentation of textured images, but for such images the polynomial model was inappropriate, and a parameterized Markov Random Field model was assumed instead.

Wu and Leahy [1993] describe the application of the principles of network flow to unsupervised classification, yielding a novel hierarchical algorithm for clustering. In essence, the technique views the unlabeled patterns as nodes in a graph, where the weight of an edge (i.e., its capacity) is a measure of similarity between the corresponding nodes. Clusters are identified by removing edges from the graph to produce connected disjoint subgraphs. In image segmentation, pixels which are 4-neighbors or 8-neighbors in the image plane share edges in the constructed adjacency graph, and the weight of a graph edge is based on the strength of a hypothesized image edge between the pixels involved (this strength is calculated using simple derivative masks). Hence, this segmenter works by finding closed contours in the image, and is best labeled edge-based rather than region-based.

In Vinod et al. [1994], two neural networks are designed to perform pattern clustering when combined. A two-layer network operates on a multidimensional histogram of the data to identify 'prototypes' which are used to classify the input patterns into clusters. These prototypes are fed to the classification network, another two-layer network operating on the histogram of the input data, but trained to have differing weights from the prototype selection network. In both networks, the histogram of the image is used to weight the contributions of patterns neighboring the one under consideration to the location of prototypes or the ultimate classification; as such, it is likely to be more robust when compared to techniques which assume an underlying parametric density function for the pattern classes. This architecture was tested on gray-scale and color segmentation problems.

Jolion et al. [1991] describe a process for extracting clusters sequentially from the input pattern set by identifying hyperellipsoidal regions (bounded by loci of constant Mahalanobis distance) which contain a specified fraction of the unclassified points in the set. The extracted regions are compared against the best-fitting multivariate Gaussian density through a Kolmogorov-Smirnov test, and the fit quality is used as a figure of merit for selecting the 'best' region at each iteration. The process continues until a stopping criterion is satisfied. This procedure was applied to the problems of threshold selection for multithreshold segmentation of intensity imagery and segmentation of range imagery.

Clustering techniques have also been successfully used for the segmentation of range images, which are a popular source of input data for three-dimensional object recognition systems [Jain and Flynn 1993]. Range sensors typically return raster images with the measured value at each pixel being the coordinates of a 3D location in space. These 3D positions can be understood
as the locations where rays emerging from the image plane locations in a bundle intersect the objects in front of the sensor.

The local feature clustering concept is particularly attractive for range image segmentation since (unlike intensity measurements) the measurements at each pixel have the same units (length); this would make ad hoc transformations or normalizations of the image features unnecessary if their goal is to impose equal scaling on those features. However, range image segmenters often add additional measurements to the feature space, removing this advantage.

A range image segmentation system described in Hoffman and Jain [1987] employs squared error clustering in a six-dimensional feature space as a source of an "initial" segmentation which is refined (typically by merging segments) into the output segmentation. The technique was enhanced in Flynn and Jain [1991] and used in a recent systematic comparison of range image segmenters [Hoover et al. 1996]; as such, it is probably one of the longest-lived range segmenters which has performed well on a large variety of range images.

This segmenter works as follows. At each pixel (i, j) in the input range image, the corresponding 3D measurement is denoted (x_ij, y_ij, z_ij), where typically x_ij is a linear function of j (the column number) and y_ij is a linear function of i (the row number). A k x k neighborhood of (i, j) is used to estimate the 3D surface normal n_ij = (n_ij^x, n_ij^y, n_ij^z) at (i, j), typically by finding the least-squares planar fit to the 3D points in the neighborhood. The feature vector for the pixel at (i, j) is the six-dimensional measurement (x_ij, y_ij, z_ij, n_ij^x, n_ij^y, n_ij^z), and a candidate segmentation is found by clustering these feature vectors. For practical reasons, not every pixel's feature vector is used in the clustering procedure; typically 1000 feature vectors are chosen by subsampling.

The CLUSTER algorithm [Jain and Dubes 1988] was used to obtain segment labels for each pixel. CLUSTER is an enhancement of the k-means algorithm; it has the ability to identify several clusterings of a data set, each with a different number of clusters. Hoffman and Jain [1987] also experimented with other clustering techniques (e.g., complete-link, single-link, graph-theoretic, and other squared error algorithms) and found CLUSTER to provide the best combination of performance and accuracy. An additional advantage of CLUSTER is that it produces a sequence of output clusterings (i.e., a 2-cluster solution up through a K_max-cluster solution, where K_max is specified by the user and is typically 20 or so); each clustering in this sequence yields a clustering statistic which combines between-cluster separation and within-cluster scatter. The clustering that optimizes this statistic is chosen as the best one. Each pixel in the range image is assigned the segment label of the nearest cluster center. This minimum distance classification step is not guaranteed to produce segments which are connected in the image plane; therefore, a connected components labeling algorithm allocates new labels for disjoint regions that were placed in the same cluster. Subsequent operations include surface type tests, merging of adjacent patches using a test for the presence of crease or jump edges between adjacent segments, and surface parameter estimation.

Figure 27 shows this processing applied to a range image. Part a of the figure shows the input range image; part b shows the distribution of surface normals. In part c, the initial segmentation returned by CLUSTER and modified to guarantee connected segments is shown. Part d shows the final segmentation produced by merging adjacent patches which do not have a significant crease edge between them. The final clusters reasonably represent distinct surfaces present in this complex object.
Figure 27. Range image segmentation using clustering. (a): Input range image. (b): Surface normals for selected image pixels. (c): Initial segmentation (19 cluster solution) returned by CLUSTER using 1000 six-dimensional samples from the image as a pattern set. (d): Final segmentation (8 segments) produced by postprocessing.
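The per-pixel feature construction used by this segmenter can be sketched as follows (interior pixels only; the least-squares plane fit is one way to obtain the normal estimate, and all names are illustrative rather than the original implementation):

```python
import numpy as np

def pixel_feature(x, y, z, i, j, k=5):
    """Six-dimensional feature (x, y, z, nx, ny, nz) at pixel (i, j):
    the 3D position plus the normal of the least-squares plane fitted
    to the k x k neighborhood of 3D points."""
    h = k // 2
    xs = x[i - h:i + h + 1, j - h:j + h + 1].ravel()
    ys = y[i - h:i + h + 1, j - h:j + h + 1].ravel()
    zs = z[i - h:i + h + 1, j - h:j + h + 1].ravel()
    # Fit z = a*x + b*y + c by least squares; the plane's normal is then
    # proportional to (a, b, -1).
    A = np.column_stack([xs, ys, np.ones_like(xs)])
    (a, b, _), *_ = np.linalg.lstsq(A, zs, rcond=None)
    n = np.array([a, b, -1.0])
    n /= np.linalg.norm(n)
    return np.array([x[i, j], y[i, j], z[i, j], *n])
```

Clustering these six-dimensional vectors (subsampled, as described above) yields the candidate segmentation.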
The analysis of textured images has been of interest to researchers for several years. Texture segmentation techniques have been developed using a variety of texture models and image operations. In Nguyen and Cohen [1993], texture image segmentation was addressed by modeling the image as a hierarchy of two Markov Random Fields, obtaining some simple statistics from each image block to form a feature vector, and clustering these blocks using a fuzzy K-means clustering method. The clustering procedure here is modified to jointly estimate the number of clusters as well as the fuzzy membership of each feature vector to the various clusters.

A system for segmenting texture images was described in Jain and Farrokhnia [1991]; there, Gabor filters were used to obtain a set of 28 orientation- and scale-selective features that characterize the texture in the neighborhood of each pixel. These 28 features are reduced to a smaller number through a feature selection procedure, and the resulting features are preprocessed and then clustered using the CLUSTER program.

Figure 28. Texture image segmentation results. (a): Four-class texture mosaic. (b): Four-cluster solution produced by CLUSTER with pixel coordinates included in the feature set.
An index statistic [Dubes 1987] is used to select the best clustering. Minimum distance classification is used to label each of the original image pixels. This technique was tested on several texture mosaics including the natural Brodatz textures and synthetic images. Figure 28(a) shows an input texture mosaic consisting of four of the popular Brodatz textures [Brodatz 1966]. Part b shows the segmentation produced when the Gabor filter features are augmented to contain spatial information (pixel coordinates). This Gabor filter based technique has proven very powerful and has been extended to the automatic segmentation of text in documents [Jain and Bhattacharjee 1992] and segmentation of objects in complex backgrounds [Jain et al. 1997].

Clustering can be used as a preprocessing stage to identify pattern classes for subsequent supervised classification. Taxt and Lundervold [1994] and Lundervold et al. [1996] describe a partitional clustering algorithm and a manual labeling technique to identify material classes (e.g., cerebrospinal fluid, white matter, striated muscle, tumor) in registered images of a human head obtained at five different magnetic resonance imaging channels (yielding a five-dimensional feature vector at each pixel). A number of clusterings were obtained and combined with domain knowledge (human expertise) to identify the different classes. Decision rules for supervised classification were based on these obtained classes. Figure 29(a) shows one channel of an input multispectral image; part b shows the 9-cluster result.

The k-means algorithm was applied to the segmentation of LANDSAT imagery in Solberg et al. [1996]. Initial cluster centers were chosen interactively by a trained operator, and correspond to land-use classes such as urban areas, soil (vegetation-free) areas, forest, grassland, and water. Figure 30(a) shows the input image rendered as grayscale; part b shows the result of the clustering procedure.

6.1.3 Summary. In this section, the application of clustering methodology to image segmentation problems has been motivated and surveyed. The historical record shows that clustering is a powerful tool for obtaining classifications of image pixels. Key issues in the design of any clustering-based segmenter are the
Figure 29. Multispectral medical image segmentation. (a): A single channel of the input image. (b): 9-cluster segmentation.

Figure 30. LANDSAT image segmentation. (a): Original image (ESA/EURIMAGE/Sattelitbild). (b): Clustered scene.
strong assumptions (often multivariate Gaussian) about the multidimensional shape of clusters to be obtained. The ability of new clustering procedures to handle concepts and semantics in classification (in addition to numerical measurements) will be important for certain applications [Michalski and Stepp 1983; Murty and Jain 1995].

shape index values (which are related to surface curvature values) and accumulating all the object pixels that fall into each bin. By normalizing the spectrum with respect to the total object area, the scale (size) differences that may exist between different objects are removed. The first moment m_1 is computed as the weighted mean of H̄(h):
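The display equation itself is not reproduced in this excerpt; presumably m_1 = Σ_h h · H̄(h), with the higher-order moments taken as central moments about m_1. The sketch below is my reading of the surrounding text, not the paper's exact formula:

```python
import numpy as np

def spectrum_moments(H, h, num=10):
    """First `num` moments of a normalized shape spectrum H over bins h.
    H is assumed to sum to 1 (spectrum normalized by total object area)."""
    m1 = float(np.sum(h * H))                      # weighted mean of H(h)
    moments = [m1]
    for p in range(2, num + 1):                    # central moments about m1
        moments.append(float(np.sum(((h - m1) ** p) * H)))
    return np.array(moments)
```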
obtained from any arbitrary viewpoint. The system under consideration employed a viewpoint dependent (or view-centered) approach to the object recognition problem; each object to be recognized was represented in terms of a library of range images of that object. There are many possible views of a 3D object, and one goal of that work was to avoid matching an unknown input view against each image of each object. A common theme in the object recognition literature is indexing, wherein the unknown view is used to select a subset of views of a subset of the objects in the database for further comparison, and rejects all other views of objects. One of the approaches to indexing employs the notion of view classes; a view class is the set of qualitatively similar views of an object. In that work, the view classes were identified by clustering; the rest of this subsection outlines the technique.

Object views were grouped into classes based on the similarity of shape spectral features. Each input image of an object viewed in isolation yields a feature vector which characterizes that view. The feature vector contains the first ten central moments of a normalized shape spectrum. Then, the feature vector is denoted as R = (m_1, m_2, . . ., m_10), with the range of each of these moments being [-1, 1].

Let O = {O_1, O_2, . . ., O_n} be a collection of n 3D objects whose views are present in the model database, M_D. The i-th view of the j-th object, O_i^j, in the database is represented by <L_i^j, R_i^j>, where L_i^j is the object label and R_i^j is the feature vector. Given a set of object representations R^i = {<L_1^i, R_1^i>, . . ., <L_m^i, R_m^i>} that describes m views of the i-th object, the goal is to derive a partition of the views, P^i = {C_1^i, C_2^i, . . ., C_{k_i}^i}. Each cluster in P^i contains those views of the i-th object that have been adjudged similar based on the dissimilarity between the corresponding moment features of the shape spectra of the views. The measure of dissimilarity between R_j^i and R_k^i is defined as:

    D(R_j^i, R_k^i) = Σ_{l=1}^{10} (R_jl^i - R_kl^i)²    (3)
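In code, equation (3) is simply the squared Euclidean distance between two ten-dimensional moment vectors:

```python
import numpy as np

def view_dissimilarity(r_j, r_k):
    # Equation (3): sum of squared differences over the ten moments.
    return float(np.sum((np.asarray(r_j) - np.asarray(r_k)) ** 2))
```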
Figure 31. A subset of views of Cobra chosen from a set of 320 views.
The range images from 320 possible viewpoints (determined by the tessellation of the view-sphere using the icosahedron) of the objects were synthesized. Figure 31 shows a subset of the collection of views of Cobra used in the experiment.

The shape spectrum of each view is computed and then its feature vector is determined. The views of each object are clustered, based on the dissimilarity measure D between their moment vectors, using the complete-link hierarchical clustering scheme [Jain and Dubes 1988]. The hierarchical grouping obtained with 320 views of the Cobra object is shown in Figure 32. The view grouping hierarchies of the other nine objects are similar to the dendrogram in Figure 32. This dendrogram is cut at a dissimilarity level of 0.1 or less to obtain compact and well-separated clusters. The clusterings obtained in this manner demonstrate that the views of each object fall into several distinguishable clusters. The centroid of each of these clusters was determined by computing the mean of the moment vectors of the views falling into the cluster. Dorai and Jain [1995] demonstrated that this clustering-based view grouping procedure facilitates object matching
Figure 32. Hierarchical grouping of the 320 views of Cobra (dissimilarity axis from 0.0 to 0.25).
These clusters of books and the corresponding cluster descriptions can be used as follows: If a user is searching for books, say, on image segmentation (I46), then we select cluster C1 because its representation alone contains the string I46. Books B2 (Neurocomputing) and B18 (Sensory Neural Networks: Lateral Inhibition) are both members of cluster C1 even though their LCC numbers are quite different (B2 is QA76.5.H4442, B18 is QP363.3.N33).

Four additional books, labeled B101, B102, B103, and B104, have been used to study the problem of assigning classification numbers to new books. The LCC numbers of these books are: (B101) Q335.T39, (B102) QA76.73.P356C57, (B103) QA76.5.B76C.2, and (B104) QA76.9.D5W44. These books are assigned to clusters based on nearest-neighbor classification. The nearest neighbor of B101, a book on artificial intelligence, is B23, and so B101 is assigned to cluster C1. The assignment of all four books to their respective clusters is meaningful, demonstrating that knowledge-based clustering is useful in solving problems associated with document retrieval.

6.4 Data Mining

In recent years we have seen ever-increasing volumes of collected data of all sorts. With so much data available, it is necessary to develop algorithms which can extract meaningful information from these vast stores. Searching for useful nuggets of information among huge amounts of data has become known as the field of data mining.

Data mining can be applied to relational, transaction, and spatial databases, as well as to large stores of unstructured data such as the World Wide Web. There are many data mining systems in use today, and applications include the U.S. Treasury detecting money laundering, National Basketball Association coaches detecting trends and patterns of play for individual players and teams, and categorizing patterns of children in the foster care system [Hedberg 1996]. Several journals have had recent special issues on data mining [Cohen 1996, Cross 1996, Wah 1996].

6.4.1 Data Mining Approaches. Data mining, like clustering, is an exploratory activity, so clustering methods are well suited for data mining. Clustering is often an important initial step of several in the data mining process [Fayyad 1996]. Some of the data mining approaches which use clustering are database segmentation, predictive modeling, and visualization of large databases.

Segmentation. Clustering methods are used in data mining to segment databases into homogeneous groups. This can serve the purposes of data compression (working with the clusters rather than with individual items) or of identifying characteristics of subpopulations which can be targeted for specific purposes (e.g., marketing aimed at senior citizens).

A continuous k-means clustering algorithm [Faber 1994] has been used to cluster pixels in Landsat images [Faber et al. 1994]. Each pixel originally has 7 values from different satellite bands, including infra-red. These 7 values are difficult for humans to assimilate and analyze without assistance. Pixels with the 7 feature values are clustered into 256 groups, and each pixel is then assigned the value of its cluster centroid. The image can then be displayed with the spatial information intact. Human viewers can look at a single picture and identify a region of interest (e.g., highway or forest) and label it as a concept; the system then identifies other pixels in the same cluster as instances of that concept.
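This quantization step is easy to sketch. Below is a minimal illustration using a plain batch k-means on synthetic data; it is not the continuous k-means of Faber [1994], which updates centroids from random samples of the data, and the scene size and band values here are hypothetical.

import numpy as np

def kmeans(X, k, iters=20, seed=0):
    # Plain batch k-means: pick k random patterns as initial centroids,
    # then alternate nearest-centroid assignment and centroid update.
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared Euclidean distance of every pattern to every centroid.
        d = ((X**2).sum(1)[:, None] - 2.0 * X @ centroids.T
             + (centroids**2).sum(1)[None, :])
        labels = d.argmin(axis=1)
        # Move each centroid to the mean of the patterns assigned to it.
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return centroids, labels

# Hypothetical stand-in for a 7-band Landsat scene: 100 x 100 pixels,
# each described by 7 band values scaled to [0, 1].
pixels = np.random.default_rng(1).random((100 * 100, 7))
centroids, labels = kmeans(pixels, k=256)

# Replace every pixel by its cluster centroid: the displayed image then
# contains at most 256 distinct 7-band values, with the spatial layout
# left intact for a human viewer to label region by region.
quantized = centroids[labels].reshape(100, 100, 7)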
Predictive Modeling. Statistical methods of data analysis usually involve hypothesis testing of a model the analyst already has in mind. Data mining can aid the user in discovering potential
Figure 34. The seven smallest clusters found in the document set. These are stemmed words.
large 4000 member group. This takes care of spelling errors, proper names which are infrequent, and terms which are used in the same manner throughout the entire document set. Terms used in specific contexts (such as file in the context of filing a patent, rather than a computer file) will appear in the documents consistently with other terms appropriate to that context (patent, invent) and thus will tend to cluster together. Among the groups of words, unique contexts stand out from the crowd.

After discarding the largest cluster, the smaller set of features can be used to construct queries for seeking out other relevant documents on the Web using standard Web searching tools (e.g., Lycos, Alta Vista, Open Text). Searching the Web with terms taken from the word clusters allows discovery of finer-grained topics (e.g., family medical leave) within the broadly defined categories (e.g., labor).
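The query-construction step just described reduces to a few lines once term clusters are available. In the following sketch the clusters and their contents are hypothetical placeholders, not the actual word clusters of Figure 34.

# Hypothetical output of a term-clustering step: cluster id -> stemmed terms.
# Cluster 0 plays the role of the large catch-all group discussed above.
term_clusters = {
    0: ["report", "state", "year", "nation", "program", "work", "time"],
    1: ["famili", "medic", "leav"],
    2: ["patent", "invent", "file"],
}

# Discard the largest cluster: its terms occur everywhere in the document
# set and do not discriminate between topics.
largest = max(term_clusters, key=lambda c: len(term_clusters[c]))
queries = [" ".join(terms)
           for c, terms in term_clusters.items() if c != largest]

# Each remaining cluster yields one query string for a Web search tool;
# e.g., "famili medic leav" targets the family-medical-leave topic within
# the broader labor category.
print(queries)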
6.4.3 Data Mining in Geological Databases. Database mining is a critical resource in oil exploration and production. It is common knowledge in the oil industry that the typical cost of drilling a new offshore well is in the range of $30-40 million, but the chance of that site being an economic success is 1 in 10. More informed and systematic drilling decisions can significantly reduce overall production costs.

Advances in drilling technology and data collection methods have led to oil companies and their ancillaries collecting large amounts of geophysical/geological data from production wells and exploration sites, and then organizing them into large databases. Data mining techniques have recently been used to derive precise analytic relations between observed phenomena and parameters. These relations can then be used to quantify oil and gas reserves.

In qualitative terms, good recoverable reserves have high hydrocarbon saturation that is trapped by highly porous sediments (reservoir porosity) and surrounded by hard bulk rocks that prevent the hydrocarbon from leaking away. A large volume of porous sediments is crucial to finding good recoverable reserves; therefore, developing reliable and accurate methods for the estimation of sediment porosities from the collected data is key to estimating hydrocarbon potential.

The general rule of thumb experts use for porosity computation is that it is a quasiexponential function of depth:

Porosity = K · e^{-F(x_1, x_2, ..., x_m) · Depth}.   (4)

A number of factors, such as rock types, structure, and cementation, enter as parameters of the function F and confound this relationship. This necessitates the definition of proper contexts in which to attempt discovery of porosity formulas. Geological contexts are expressed in terms of geological phenomena, such as geometry, lithology, compaction, and subsidence, associated with a region. It is well known that geological context changes from basin to basin (different geographical areas in the world) and also from region to region within a basin [Allen and Allen 1990; Biswas 1995]. Furthermore, the underlying features of contexts may vary greatly. Simple model matching techniques, which work in engineering domains where behavior is constrained by man-made systems and well-established laws of physics, may not apply in the hydrocarbon exploration domain. To address this, data clustering was used to identify the relevant contexts, and then equation discovery was carried out within each context. The goal was to derive the subset x_1, x_2, ..., x_m from a larger set of geological features, and the functional relationship F, that best defined the porosity function in a region.

The overall methodology, illustrated in Figure 35, consists of two primary steps: (i) context definition using unsupervised clustering techniques, and (ii) equation discovery by regression analysis [Li and Biswas 1995]. Real exploration data collected from a region in the
Alaska basin was analyzed using the methodology developed. The data objects (patterns) are described in terms of 37 geological features, such as porosity, permeability, grain size, density, and sorting, amounts of different mineral fragments (e.g., quartz, chert, feldspar) present, nature of the rock fragments, pore characteristics, and cementation. All these feature values are numeric measurements made on samples obtained from well-logs during exploratory drilling processes.

The k-means clustering algorithm was used to identify a set of homogeneous primitive geological structures (g_1, g_2, ..., g_m). These primitives were then mapped onto the unit code versus stratigraphic unit map. Figure 36 depicts a partial mapping for a set of wells and four primitive structures. The next step in the discovery process identified sections of well regions that were made up of the same sequence of geological primitives. Every sequence defined a context C_i. From the partial mapping of Figure 36, the context C_1 = g_2 + g_1 + g_2 + g_3 was identified in two well regions (the 300 and 600 series). After the contexts were defined, data points belonging to each context were grouped together for equation derivation. The derivation procedure employed multiple regression analysis [Sen and Srivastava 1990].

This method was applied to a data set of about 2600 objects corresponding to sample measurements collected from wells in the Alaskan Basin. The k-means algorithm clustered this data set into seven groups. As an illustration, we selected a set of 138 objects representing a context for further analysis. The features that best defined this cluster were selected, and experts surmised that the context represented a low-porosity region, which was modeled using the regression procedure.
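A compressed sketch of this two-step methodology may help. Everything below is synthetic and simplified: two stand-in features replace the 37 well-log features, a small k-means stands in for the clustering actually used by Li and Biswas [1995], and ordinary least squares on log-transformed porosity stands in for their multiple regression procedure.

import numpy as np

rng = np.random.default_rng(2)

# Synthetic stand-ins for well-log samples: a depth measurement plus two
# geological features (the real study used 37 features and ~2600 samples).
n = 300
depth = rng.uniform(500.0, 3000.0, n)
features = rng.random((n, 2))

def kmeans(X, k, iters=25):
    # Step (i): context definition by a small batch k-means on the features.
    centroids = X[rng.choice(len(X), k, replace=False)].copy()
    for _ in range(iters):
        labels = ((X[:, None, :] - centroids[None, :, :])**2).sum(-1).argmin(1)
        for j in range(k):
            if (labels == j).any():
                centroids[j] = X[labels == j].mean(0)
    return labels

contexts = kmeans(features, k=3)  # seven contexts were found on the real data

# Synthetic porosity generated from equation (4), with a different decay
# rate F in each context.
F_true = np.array([4e-4, 7e-4, 1e-3])[contexts]
porosity = 0.5 * np.exp(-F_true * depth) * rng.lognormal(0.0, 0.05, n)

# Step (ii): equation discovery. Within one context, equation (4) becomes
# linear after a log transform: ln(porosity) = ln(K) - F * depth.
for c in range(3):
    m = contexts == c
    if m.sum() < 2:
        continue  # too few samples in this context to fit a line
    slope, intercept = np.polyfit(depth[m], np.log(porosity[m]), 1)
    print(f"context {c}: K ~ {np.exp(intercept):.3f}, F ~ {-slope:.2e}")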
7. SUMMARY

There are several applications where decision making and exploratory pattern analysis have to be performed on large data sets. For example, in document retrieval, a set of relevant documents has to be found among several million documents of dimensionality of more than 1000. It is possible to handle these problems if some useful abstraction of the data is obtained and used in decision making, rather than directly using the entire data set. By data abstraction, we mean a simple and compact representation of the data. This simplicity helps the machine in efficient processing, or a human in comprehending the structure in the data easily. Clustering algorithms are ideally suited for achieving data abstraction.

In this paper, we have examined various steps in clustering: (1) pattern representation, (2) similarity computation, (3) grouping process, and (4) cluster representation. Also, we have discussed
Figure 36. Area code versus stratigraphic unit map for part of the studied region.
use knowledge either implicitly or explicitly. Most of the knowledge-based clustering algorithms use explicit knowledge in similarity computation. However, if patterns are not represented using proper features, then it is not possible to get a meaningful partition, irrespective of the quality and quantity of knowledge used in similarity computation. There is no universally acceptable scheme for computing similarity between patterns represented using a mixture of both qualitative and quantitative features. Dissimilarity between a pair of patterns is represented using a distance measure that may or may not be a metric.

The next step in clustering is the grouping step. There are broadly two grouping schemes: hierarchical and partitional. The hierarchical schemes are more versatile, and the partitional schemes are less expensive. The partitional algorithms aim at minimizing the squared error criterion function.
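For reference, the criterion in question is the within-cluster squared error: for a partition of the patterns into K clusters with centroids c_j,

\[
e^2 \;=\; \sum_{j=1}^{K} \sum_{i=1}^{n_j} \left\lVert \mathbf{x}_i^{(j)} - \mathbf{c}_j \right\rVert^2 ,
\]

where x_i^{(j)} denotes the i-th of the n_j patterns assigned to the j-th cluster.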
Motivated by the failure of the squared error partitional clustering algorithms in finding the optimal solution to this problem, a large collection of approaches has been proposed and used to obtain the global optimum. However, these schemes are computationally prohibitive on large data sets. ANN-based clustering schemes are neural implementations of the clustering algorithms, and they share the undesired properties of these algorithms. However, ANNs have the capability to automatically normalize the data and extract features. An important observation is that even if a scheme can find the optimal solution to the squared error partitioning problem, it may still fall short of the requirements because of the possible non-isotropic nature of the clusters.

In some applications, for example in document retrieval, it may be useful to have a clustering that is not a partition; this means clusters may overlap. Fuzzy clustering and functional clustering are ideally suited for this purpose. Also, fuzzy clustering algorithms can handle mixed data types. However, a major problem with fuzzy clustering is that it is difficult to obtain the membership values. A general approach may not work because of the subjective nature of clustering. It is also necessary to represent the clusters obtained in a form that helps the decision maker. Knowledge-based clustering schemes generate intuitively appealing descriptions of clusters. They can be used even when the patterns are represented using a combination of qualitative and quantitative features, provided that knowledge linking a concept and the mixed features is available. However, implementations of the conceptual clustering schemes are computationally expensive and are not suitable for grouping large data sets.

The k-means algorithm and its neural implementation, the Kohonen net, are most successfully used on large data sets. This is because the k-means algorithm is simple to implement and computationally attractive because of its linear time complexity. However, it is not feasible to use even this linear-time algorithm on very large data sets. Incremental algorithms, such as the leader algorithm and its neural implementation, the ART network, can be used to cluster large data sets, but they tend to be order-dependent.
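The order dependence of the leader algorithm is easy to demonstrate with a sketch of the algorithm itself; the threshold and the three patterns below are illustrative only.

import numpy as np

def leader(patterns, threshold):
    # Single-pass incremental clustering: the first pattern of each cluster
    # serves as its "leader"; each new pattern joins the first cluster whose
    # leader lies within the distance threshold, else it founds a new cluster.
    leaders, labels = [], []
    for x in patterns:
        for j, l in enumerate(leaders):
            if np.linalg.norm(x - l) <= threshold:
                labels.append(j)
                break
        else:
            leaders.append(x)
            labels.append(len(leaders) - 1)
    return labels

X = np.array([[0.0, 0.0], [0.9, 0.0], [1.8, 0.0]])
print(leader(X, threshold=1.0))        # [0, 0, 1]: 0.9 is grouped with 0.0
print(leader(X[::-1], threshold=1.0))  # [0, 0, 1]: 0.9 is now grouped with 1.8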
Divide and conquer is a heuristic that has been rightly exploited by computer algorithm designers to reduce computational costs. However, it should be judiciously used in clustering to achieve meaningful results.

In summary, clustering is an interesting, useful, and challenging problem. It has great potential in applications like object recognition, image segmentation, and information filtering and retrieval. However, it is possible to exploit this potential only after making several design choices carefully.

ACKNOWLEDGMENTS

The authors wish to acknowledge the generosity of several colleagues who
REFERENCES

BRODATZ, P. 1966. Textures: A Photographic Album for Artists and Designers. Dover Publications, Inc., Mineola, NY.
CAN, F. 1993. Incremental clustering for dynamic information processing. ACM Trans. Inf. Syst. 11, 2 (Apr. 1993), 143–164.
CARPENTER, G. AND GROSSBERG, S. 1990. ART3: Hierarchical search using chemical transmitters in self-organizing pattern recognition architectures. Neural Networks 3, 129–152.
CHEKURI, C., GOLDWASSER, M. H., RAGHAVAN, P., AND UPFAL, E. 1997. Web search using automatic classification. In Proceedings of the Sixth International Conference on the World Wide Web (Santa Clara, CA, Apr.), http://theory.stanford.edu/people/wass/publications/Web Search/Web Search.html.
CHENG, C. H. 1995. A branch-and-bound clustering algorithm. IEEE Trans. Syst. Man Cybern. 25, 895–898.
CHENG, Y. AND FU, K. S. 1985. Conceptual clustering in knowledge organization. IEEE Trans. Pattern Anal. Mach. Intell. 7, 592–598.
CHENG, Y. 1995. Mean shift, mode seeking, and clustering. IEEE Trans. Pattern Anal. Mach. Intell. 17, 7 (July), 790–799.
CHIEN, Y. T. 1978. Interactive Pattern Recognition. Marcel Dekker, Inc., New York, NY.
CHOUDHURY, S. AND MURTY, M. N. 1990. A divisive scheme for constructing minimal spanning trees in coordinate space. Pattern Recogn. Lett. 11, 6 (Jun. 1990), 385–389.
COHEN, Ed. 1996. Special issue on data mining. Commun. ACM 39, 11.
COLEMAN, G. B. AND ANDREWS, H. C. 1979. Image segmentation by clustering. Proc. IEEE 67, 5, 773–785.
CONNELL, S. AND JAIN, A. K. 1998. Learning prototypes for on-line handwritten digits. In Proceedings of the 14th International Conference on Pattern Recognition (Brisbane, Australia, Aug.), 182–184.
CROSS, S. E., Ed. 1996. Special issue on data mining. IEEE Expert 11, 5 (Oct.).
DALE, M. B. 1985. On the comparison of conceptual clustering and numerical taxonomy. IEEE Trans. Pattern Anal. Mach. Intell. 7, 241–244.
DAVE, R. N. 1992. Generalized fuzzy C-shells clustering and detection of circular and elliptic boundaries. Pattern Recogn. 25, 713–722.
DAVIS, T., Ed. 1991. The Handbook of Genetic Algorithms. Van Nostrand Reinhold Co., New York, NY.
DAY, W. H. E. 1992. Complexity theory: An introduction for practitioners of classification. In Clustering and Classification, P. Arabie and L. Hubert, Eds. World Scientific Publishing Co., Inc., River Edge, NJ.
DEMPSTER, A. P., LAIRD, N. M., AND RUBIN, D. B. 1977. Maximum likelihood from incomplete data via the EM algorithm. J. Royal Stat. Soc. B. 39, 1, 1–38.
DIDAY, E. 1973. The dynamic cluster method in non-hierarchical clustering. J. Comput. Inf. Sci. 2, 61–88.
DIDAY, E. AND SIMON, J. C. 1976. Clustering analysis. In Digital Pattern Recognition, K. S. Fu, Ed. Springer-Verlag, Secaucus, NJ, 47–94.
DIDAY, E. 1988. The symbolic approach in clustering. In Classification and Related Methods, H. H. Bock, Ed. North-Holland Publishing Co., Amsterdam, The Netherlands.
DORAI, C. AND JAIN, A. K. 1995. Shape spectra based view grouping for free-form objects. In Proceedings of the International Conference on Image Processing (ICIP-95), 240–243.
DUBES, R. C. AND JAIN, A. K. 1976. Clustering techniques: The user's dilemma. Pattern Recogn. 8, 247–260.
DUBES, R. C. AND JAIN, A. K. 1980. Clustering methodology in exploratory data analysis. In Advances in Computers, M. C. Yovits, Ed. Academic Press, Inc., New York, NY, 113–125.
DUBES, R. C. 1987. How many clusters are best?—an experiment. Pattern Recogn. 20, 6 (Nov. 1, 1987), 645–663.
DUBES, R. C. 1993. Cluster analysis and related issues. In Handbook of Pattern Recognition & Computer Vision, C. H. Chen, L. F. Pau, and P. S. P. Wang, Eds. World Scientific Publishing Co., Inc., River Edge, NJ, 3–32.
DUBUISSON, M. P. AND JAIN, A. K. 1994. A modified Hausdorff distance for object matching. In Proceedings of the International Conference on Pattern Recognition (ICPR '94), 566–568.
DUDA, R. O. AND HART, P. E. 1973. Pattern Classification and Scene Analysis. John Wiley and Sons, Inc., New York, NY.
DUNN, S., JANOS, L., AND ROSENFELD, A. 1983. Bimean clustering. Pattern Recogn. Lett. 1, 169–173.
DURAN, B. S. AND ODELL, P. L. 1974. Cluster Analysis: A Survey. Springer-Verlag, New York, NY.
EDDY, W. F., MOCKUS, A., AND OUE, S. 1996. Approximate single linkage cluster analysis of large data sets in high-dimensional spaces. Comput. Stat. Data Anal. 23, 1, 29–43.
ETZIONI, O. 1996. The World-Wide Web: quagmire or gold mine? Commun. ACM 39, 11, 65–68.
EVERITT, B. S. 1993. Cluster Analysis. Edward Arnold, Ltd., London, UK.
FABER, V. 1994. Clustering and the continuous k-means algorithm. Los Alamos Science 22, 138–144.
FABER, V., HOCHBERG, J. C., KELLY, P. M., THOMAS, T. R., AND WHITE, J. M. 1994. Concept extraction: A data-mining technique. Los Alamos Science 22, 122–149.
FAYYAD, U. M. 1996. Data mining and knowledge discovery: Making sense out of data. IEEE Expert 11, 5 (Oct.), 20–25.
HUTTENLOCHER, D. P., KLANDERMAN, G. A., AND RUCKLIDGE, W. J. 1993. Comparing images using the Hausdorff distance. IEEE Trans. Pattern Anal. Mach. Intell. 15, 9, 850–863.
ICHINO, M. AND YAGUCHI, H. 1994. Generalized Minkowski metrics for mixed feature-type data analysis. IEEE Trans. Syst. Man Cybern. 24, 698–708.
1991. Proceedings of the International Joint Conference on Neural Networks (IJCNN'91).
1992. Proceedings of the International Joint Conference on Neural Networks.
ISMAIL, M. A. AND KAMEL, M. S. 1989. Multidimensional data clustering utilizing hybrid search strategies. Pattern Recogn. 22, 1 (Jan. 1989), 75–89.
JAIN, A. K. AND DUBES, R. C. 1988. Algorithms for Clustering Data. Prentice-Hall advanced reference series. Prentice-Hall, Inc., Upper Saddle River, NJ.
JAIN, A. K. AND FARROKHNIA, F. 1991. Unsupervised texture segmentation using Gabor filters. Pattern Recogn. 24, 12 (Dec. 1991), 1167–1186.
JAIN, A. K. AND BHATTACHARJEE, S. 1992. Text segmentation using Gabor filters for automatic document processing. Mach. Vision Appl. 5, 3 (Summer 1992), 169–184.
JAIN, A. K. AND FLYNN, P. J., Eds. 1993. Three Dimensional Object Recognition Systems. Elsevier Science Inc., New York, NY.
JAIN, A. K. AND MAO, J. 1994. Neural networks and pattern recognition. In Computational Intelligence: Imitating Life, J. M. Zurada, R. J. Marks, and C. J. Robinson, Eds. 194–212.
JAIN, A. K. AND FLYNN, P. J. 1996. Image segmentation using clustering. In Advances in Image Understanding: A Festschrift for Azriel Rosenfeld, N. Ahuja and K. Bowyer, Eds. IEEE Press, Piscataway, NJ, 65–83.
JAIN, A. K. AND MAO, J. 1996. Artificial neural networks: A tutorial. IEEE Computer 29 (Mar.), 31–44.
JAIN, A. K., RATHA, N. K., AND LAKSHMANAN, S. 1997. Object detection using Gabor filters. Pattern Recogn. 30, 2, 295–309.
JAIN, N. C., INDRAYAN, A., AND GOEL, L. R. 1986. Monte Carlo comparison of six hierarchical clustering methods on random data. Pattern Recogn. 19, 1 (Jan./Feb. 1986), 95–99.
JAIN, R., KASTURI, R., AND SCHUNCK, B. G. 1995. Machine Vision. McGraw-Hill series in computer science. McGraw-Hill, Inc., New York, NY.
JARVIS, R. A. AND PATRICK, E. A. 1973. Clustering using a similarity method based on shared near neighbors. IEEE Trans. Comput. C-22, 8 (Aug.), 1025–1034.
JOLION, J.-M., MEER, P., AND BATAOUCHE, S. 1991. Robust clustering with applications in computer vision. IEEE Trans. Pattern Anal. Mach. Intell. 13, 8 (Aug. 1991), 791–802.
JONES, D. AND BELTRAMO, M. A. 1991. Solving partitioning problems with genetic algorithms. In Proceedings of the Fourth International Conference on Genetic Algorithms, 442–449.
JUDD, D., MCKINLEY, P., AND JAIN, A. K. 1996. Large-scale parallel data clustering. In Proceedings of the International Conference on Pattern Recognition (Vienna, Austria), 488–493.
KING, B. 1967. Step-wise clustering procedures. J. Am. Stat. Assoc. 69, 86–101.
KIRKPATRICK, S., GELATT, C. D., JR., AND VECCHI, M. P. 1983. Optimization by simulated annealing. Science 220, 4598 (May), 671–680.
KLEIN, R. W. AND DUBES, R. C. 1989. Experiments in projection and clustering by simulated annealing. Pattern Recogn. 22, 213–220.
KNUTH, D. 1973. The Art of Computer Programming. Addison-Wesley, Reading, MA.
KOONTZ, W. L. G., FUKUNAGA, K., AND NARENDRA, P. M. 1975. A branch and bound clustering algorithm. IEEE Trans. Comput. 23, 908–914.
KOHONEN, T. 1989. Self-Organization and Associative Memory. 3rd ed. Springer information sciences series. Springer-Verlag, New York, NY.
KRAAIJVELD, M., MAO, J., AND JAIN, A. K. 1995. A non-linear projection method based on Kohonen's topology preserving maps. IEEE Trans. Neural Netw. 6, 548–559.
KRISHNAPURAM, R., FRIGUI, H., AND NASRAOUI, O. 1995. Fuzzy and probabilistic shell clustering algorithms and their application to boundary detection and surface approximation. IEEE Trans. Fuzzy Systems 3, 29–60.
KURITA, T. 1991. An efficient agglomerative clustering algorithm using a heap. Pattern Recogn. 24, 3 (1991), 205–209.
LIBRARY OF CONGRESS, 1990. LC classification outline. Library of Congress, Washington, DC.
LEBOWITZ, M. 1987. Experiments with incremental concept formation. Mach. Learn. 2, 103–138.
LEE, H.-Y. AND ONG, H.-L. 1996. Visualization support for data mining. IEEE Expert 11, 5 (Oct.), 69–75.
LEE, R. C. T., SLAGLE, J. R., AND MONG, C. T. 1978. Towards automatic auditing of records. IEEE Trans. Softw. Eng. 4, 441–448.
LEE, R. C. T. 1981. Cluster analysis and its applications. In Advances in Information Systems Science, J. T. Tou, Ed. Plenum Press, New York, NY.
LI, C. AND BISWAS, G. 1995. Knowledge-based scientific discovery in geological databases. In Proceedings of the First International Conference on Knowledge Discovery and Data Mining (Montreal, Canada, Aug. 20-21), 204–209.
LU, S. Y. AND FU, K. S. 1978. A sentence-to-sentence clustering procedure for pattern analysis. IEEE Trans. Syst. Man Cybern. 8, 381–389.
LUNDERVOLD, A., FENSTAD, A. M., ERSLAND, L., AND TAXT, T. 1996. Brain tissue volumes from multispectral 3D MRI: A comparative study of four classifiers. In Proceedings of the Conference of the Society on Magnetic Resonance.
MAAREK, Y. S. AND BEN SHAUL, I. Z. 1996. Automatically organizing bookmarks per contents. In Proceedings of the Fifth International Conference on the World Wide Web (Paris, May), http://www5conf.inria.fr/fich-html/paper-sessions.html.
MCQUEEN, J. 1967. Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 281–297.
MAO, J. AND JAIN, A. K. 1992. Texture classification and segmentation using multiresolution simultaneous autoregressive models. Pattern Recogn. 25, 2 (Feb. 1992), 173–188.
MAO, J. AND JAIN, A. K. 1995. Artificial neural networks for feature extraction and multivariate data projection. IEEE Trans. Neural Netw. 6, 296–317.
MAO, J. AND JAIN, A. K. 1996. A self-organizing network for hyperellipsoidal clustering (HEC). IEEE Trans. Neural Netw. 7, 16–29.
MEVINS, A. J. 1995. A branch and bound incremental conceptual clusterer. Mach. Learn. 18, 5–22.
MICHALSKI, R., STEPP, R. E., AND DIDAY, E. 1981. A recent advance in data analysis: Clustering objects into classes characterized by conjunctive concepts. In Progress in Pattern Recognition, Vol. 1, L. Kanal and A. Rosenfeld, Eds. North-Holland Publishing Co., Amsterdam, The Netherlands.
MICHALSKI, R., STEPP, R. E., AND DIDAY, E. 1983. Automated construction of classifications: conceptual clustering versus numerical taxonomy. IEEE Trans. Pattern Anal. Mach. Intell. PAMI-5, 5 (Sept.), 396–409.
MISHRA, S. K. AND RAGHAVAN, V. V. 1994. An empirical study of the performance of heuristic methods for clustering. In Pattern Recognition in Practice, E. S. Gelsema and L. N. Kanal, Eds. 425–436.
MITCHELL, T. 1997. Machine Learning. McGraw-Hill, Inc., New York, NY.
MOHIUDDIN, K. M. AND MAO, J. 1994. A comparative study of different classifiers for handprinted character recognition. In Pattern Recognition in Practice, E. S. Gelsema and L. N. Kanal, Eds. 437–448.
MOOR, B. K. 1988. ART 1 and pattern clustering. In 1988 Connectionist Summer School, Morgan Kaufmann, San Mateo, CA, 174–185.
MURTAGH, F. 1984. A survey of recent advances in hierarchical clustering algorithms which use cluster centers. Comput. J. 26, 354–359.
MURTY, M. N. AND KRISHNA, G. 1980. A computationally efficient technique for data clustering. Pattern Recogn. 12, 153–158.
MURTY, M. N. AND JAIN, A. K. 1995. Knowledge-based clustering scheme for collection management and retrieval of library books. Pattern Recogn. 28, 949–964.
NAGY, G. 1968. State of the art in pattern recognition. Proc. IEEE 56, 836–862.
NG, R. AND HAN, J. 1994. Very large data bases. In Proceedings of the 20th International Conference on Very Large Data Bases (VLDB'94, Santiago, Chile, Sept.), VLDB Endowment, Berkeley, CA, 144–155.
NGUYEN, H. H. AND COHEN, P. 1993. Gibbs random fields, fuzzy clustering, and the unsupervised segmentation of textured images. CVGIP: Graph. Models Image Process. 55, 1 (Jan. 1993), 1–19.
OEHLER, K. L. AND GRAY, R. M. 1995. Combining image compression and classification using vector quantization. IEEE Trans. Pattern Anal. Mach. Intell. 17, 461–473.
OJA, E. 1982. A simplified neuron model as a principal component analyzer. Bull. Math. Bio. 15, 267–273.
OZAWA, K. 1985. A stratificational overlapping cluster scheme. Pattern Recogn. 18, 279–286.
OPEN TEXT, 1999. http://index.opentext.net.
KAMGAR-PARSI, B., GUALTIERI, J. A., DEVANEY, J. A., AND KAMGAR-PARSI, K. 1990. Clustering with neural networks. Biol. Cybern. 63, 201–208.
LYCOS, 1999. http://www.lycos.com.
PAL, N. R., BEZDEK, J. C., AND TSAO, E. C.-K. 1993. Generalized clustering networks and Kohonen's self-organizing scheme. IEEE Trans. Neural Netw. 4, 549–557.
QUINLAN, J. R. 1990. Decision trees and decision making. IEEE Trans. Syst. Man Cybern. 20, 339–346.
RAGHAVAN, V. V. AND BIRCHAND, K. 1979. A clustering strategy based on a formalism of the reproductive process in a natural system. In Proceedings of the Second International Conference on Information Storage and Retrieval, 10–22.
RAGHAVAN, V. V. AND YU, C. T. 1981. A comparison of the stability characteristics of some graph theoretic clustering methods. IEEE Trans. Pattern Anal. Mach. Intell. 3, 393–402.
RASMUSSEN, E. 1992. Clustering algorithms. In Information Retrieval: Data Structures and Algorithms, W. B. Frakes and R. Baeza-Yates, Eds. Prentice-Hall, Inc., Upper Saddle River, NJ, 419–442.
RICH, E. 1983. Artificial Intelligence. McGraw-Hill, Inc., New York, NY.
RIPLEY, B. D., Ed. 1989. Statistical Inference for Spatial Processes. Cambridge University Press, New York, NY.
ROSE, K., GUREWITZ, E., AND FOX, G. C. 1993. Deterministic annealing approach to constrained clustering. IEEE Trans. Pattern Anal. Mach. Intell. 15, 785–794.
ROSENFELD, A. AND KAK, A. C. 1982. Digital Picture Processing. 2nd ed. Academic Press, Inc., New York, NY.
ROSENFELD, A., SCHNEIDER, V. B., AND HUANG, M. K. 1969. An application of cluster detection to text and picture processing. IEEE Trans. Inf. Theor. 15, 6, 672–681.
ROSS, G. J. S. 1968. Classification techniques for large sets of data. In Numerical Taxonomy, A. J. Cole, Ed. Academic Press, Inc., New York, NY.
RUSPINI, E. H. 1969. A new approach to clustering. Inf. Control 15, 22–32.
SALTON, G. 1991. Developments in automatic text retrieval. Science 253, 974–980.
SAMAL, A. AND IYENGAR, P. A. 1992. Automatic recognition and analysis of human faces and facial expressions: A survey. Pattern Recogn. 25, 1 (Jan. 1992), 65–77.
SAMMON, J. W. JR. 1969. A nonlinear mapping for data structure analysis. IEEE Trans. Comput. 18, 401–409.
SANGAL, R. 1991. Programming Paradigms in LISP. McGraw-Hill, Inc., New York, NY.
SCHACHTER, B. J., DAVIS, L. S., AND ROSENFELD, A. 1979. Some experiments in image segmentation by clustering of local feature values. Pattern Recogn. 11, 19–28.
SCHWEFEL, H. P. 1981. Numerical Optimization of Computer Models. John Wiley and Sons, Inc., New York, NY.
SELIM, S. Z. AND ISMAIL, M. A. 1984. K-means-type algorithms: A generalized convergence theorem and characterization of local optimality. IEEE Trans. Pattern Anal. Mach. Intell. 6, 81–87.
SELIM, S. Z. AND ALSULTAN, K. 1991. A simulated annealing algorithm for the clustering problem. Pattern Recogn. 24, 10 (1991), 1003–1008.
SEN, A. AND SRIVASTAVA, M. 1990. Regression Analysis. Springer-Verlag, New York, NY.
SETHI, I. AND JAIN, A. K., Eds. 1991. Artificial Neural Networks and Pattern Recognition: Old and New Connections. Elsevier Science Inc., New York, NY.
SHEKAR, B., MURTY, N. M., AND KRISHNA, G. 1987. A knowledge-based clustering scheme. Pattern Recogn. Lett. 5, 4 (Apr. 1, 1987), 253–259.
SILVERMAN, J. F. AND COOPER, D. B. 1988. Bayesian clustering for unsupervised estimation of surface and texture models. IEEE Trans. Pattern Anal. Mach. Intell. 10, 4 (July 1988), 482–495.
SIMOUDIS, E. 1996. Reality check for data mining. IEEE Expert 11, 5 (Oct.), 26–33.
SLAGLE, J. R., CHANG, C. L., AND HELLER, S. R. 1975. A clustering and data-reorganizing algorithm. IEEE Trans. Syst. Man Cybern. 5, 125–128.
SNEATH, P. H. A. AND SOKAL, R. R. 1973. Numerical Taxonomy. Freeman, London, UK.
SPATH, H. 1980. Cluster Analysis Algorithms for Data Reduction and Classification. Ellis Horwood, Upper Saddle River, NJ.
SOLBERG, A., TAXT, T., AND JAIN, A. 1996. A Markov random field model for classification of multisource satellite imagery. IEEE Trans. Geoscience and Remote Sensing 34, 1, 100–113.
SRIVASTAVA, A. AND MURTY, M. N. 1990. A comparison between conceptual clustering and conventional clustering. Pattern Recogn. 23, 9 (1990), 975–981.
STAHL, H. 1986. Cluster analysis of large data sets. In Classification as a Tool of Research, W. Gaul and M. Schader, Eds. Elsevier North-Holland, Inc., New York, NY, 423–430.
STEPP, R. E. AND MICHALSKI, R. S. 1986. Conceptual clustering of structured objects: A goal-oriented approach. Artif. Intell. 28, 1 (Feb. 1986), 43–69.
SUTTON, M., STARK, L., AND BOWYER, K. 1993. Function-based generic recognition for multiple object categories. In Three-Dimensional Object Recognition Systems, A. Jain and P. J. Flynn, Eds. Elsevier Science Inc., New York, NY.
SYMON, M. J. 1977. Clustering criterion and multi-variate normal mixture. Biometrics 77, 35–43.
TANAKA, E. 1995. Theoretical aspects of syntactic pattern recognition. Pattern Recogn. 28, 1053–1061.
TAXT, T. AND LUNDERVOLD, A. 1994. Multispectral analysis of the brain using magnetic resonance imaging. IEEE Trans. Medical Imaging 13, 3, 470–481.
TITTERINGTON, D. M., SMITH, A. F. M., AND MAKOV, U. E. 1985. Statistical Analysis of Finite Mixture Distributions. John Wiley and Sons, Inc., New York, NY.
TOUSSAINT, G. T. 1980. The relative neighborhood graph of a finite planar set. Pattern Recogn. 12, 261–268.
TRIER, O. D. AND JAIN, A. K. 1995. Goal-directed evaluation of binarization methods. IEEE Trans. Pattern Anal. Mach. Intell. 17, 1191–1201.
UCHIYAMA, T. AND ARBIB, M. A. 1994. Color image segmentation using competitive learning. IEEE Trans. Pattern Anal. Mach. Intell. 16, 12 (Dec. 1994), 1197–1206.
URQUHART, R. B. 1982. Graph theoretical clustering based on limited neighborhood sets. Pattern Recogn. 15, 173–187.
VENKATESWARLU, N. B. AND RAJU, P. S. V. S. K. 1992. Fast ISODATA clustering algorithms. Pattern Recogn. 25, 3 (Mar. 1992), 335–342.
VINOD, V. V., CHAUDHURY, S., MUKHERJEE, J., AND GHOSE, S. 1994. A connectionist approach for clustering with applications in image analysis. IEEE Trans. Syst. Man Cybern. 24, 365–384.
WAH, B. W., Ed. 1996. Special section on mining of databases. IEEE Trans. Knowl. Data Eng. (Dec.).
WARD, J. H. JR. 1963. Hierarchical grouping to optimize an objective function. J. Am. Stat. Assoc. 58, 236–244.
WATANABE, S. 1985. Pattern Recognition: Human and Mechanical. John Wiley and Sons, Inc., New York, NY.
WESZKA, J. 1978. A survey of threshold selection techniques. Pattern Recogn. 7, 259–265.
WHITLEY, D., STARKWEATHER, T., AND FUQUAY, D. 1989. Scheduling problems and traveling salesman: the genetic edge recombination. In Proceedings of the Third International Conference on Genetic Algorithms (George Mason University, June 4–7), J. D. Schaffer, Ed. Morgan Kaufmann Publishers Inc., San Francisco, CA, 133–140.
WILSON, D. R. AND MARTINEZ, T. R. 1997. Improved heterogeneous distance functions. J. Artif. Intell. Res. 6, 1–34.
WU, Z. AND LEAHY, R. 1993. An optimal graph theoretic approach to data clustering: Theory and its application to image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 15, 1101–1113.
WULFEKUHLER, M. AND PUNCH, W. 1997. Finding salient features for personal web page categories. In Proceedings of the Sixth International Conference on the World Wide Web (Santa Clara, CA, Apr.), http://theory.stanford.edu/people/wass/publications/Web Search/Web Search.html.
ZADEH, L. A. 1965. Fuzzy sets. Inf. Control 8, 338–353.
ZAHN, C. T. 1971. Graph-theoretical methods for detecting and describing gestalt clusters. IEEE Trans. Comput. C-20 (Apr.), 68–86.
ZHANG, K. 1995. Algorithms for the constrained editing distance between ordered labeled trees and related problems. Pattern Recogn. 28, 463–474.
ZHANG, J. AND MICHALSKI, R. S. 1995. An integration of rule induction and exemplar-based learning for graded concepts. Mach. Learn. 21, 3 (Dec. 1995), 235–267.
ZHANG, T., RAMAKRISHNAN, R., AND LIVNY, M. 1996. BIRCH: An efficient data clustering method for very large databases. SIGMOD Rec. 25, 2, 103–114.
ZUPAN, J. 1982. Clustering of Large Data Sets. Research Studies Press Ltd., Taunton, UK.