E-mail: tomas@theory.stanford.edu

…efficiently store and retrieve this information [40]. A major issue that document databases are now facing is the extremely high rate of update.

Formally, the clustering problem is: given n points in a metric space, partition the points into k clusters so as to minimize the maximum cluster diameter. The diameter of a cluster is defined to be the maximum inter-point distance in it. Sometimes the objective function is chosen to be the maximum cluster radius. In Euclidean spaces, radius denotes the radius of the minimum ball enclosing all points in the cluster. To extend the notion of radius to arbitrary metric spaces, we first select a center point in each cluster, whereupon the radius is defined as the maximum distance from the center to any point in the cluster. We will assume the diameter measure as the default.

We define the incremental clustering problem as follows: for an update sequence of n points in a metric space, maintain a collection of k clusters such that as each input point is presented, either it is assigned to one of the current clusters, or it starts off a new cluster while two existing clusters are merged into one.

Previous Work in Static Clustering. The closely-related problems of clustering to minimize diameter and radius are also called pairwise clustering and the k-center problem, respectively [2, 21]. Both are NP-hard [17, 28], and in fact hard to approximate to within factor 2 for arbitrary metric spaces [2, 21]. For Euclidean spaces, clustering on the line is easy [3], but in higher dimensions it is NP-hard to approximate to within factors close to 2, regardless of the metric used [14, 15, 19, 29, 30]. The furthest point heuristic due to Gonzalez [19] (see also Hochbaum and Shmoys [23, 24]) gives a 2-approximation in all metric spaces. This algorithm requires O(kn) distance computations; when the metric space is induced by shortest-path distances in weighted graphs, the shortest-path computations dominate the running time. Feder and Greene [14] gave an implementation for Euclidean spaces with running time O(n log k).
Overview of Results. Our results for incremental clustering show that it is possible to obtain algorithms that are comparable to the best possible in the static setting, both in terms of efficiency and performance ratio. We begin in Section 2 by considering natural greedy algorithms that choose clusters to merge based on some measure of the resulting cluster. We establish that greedy algorithms behave poorly by proving that a Center-Greedy algorithm has a tight performance ratio of 2k-1, and a Diameter-Greedy algorithm has a lower bound of Omega(log k). It seems likely that greedy algorithms behave better in geometric spaces, and we discover some evidence in the case of the line. We show that Diameter-Greedy has performance ratio 2 for k = 2 on the line. This analysis suggests a variant of Diameter-Greedy, and this is shown to achieve ratio 3 for all k on the line. In Section 3 we present the Doubling Algorithm and show that its performance ratio is 8, and that a randomized version has ratio 2e (about 5.44). While the obvious implementation of these algorithms is expensive, we show that they can be implemented so as to achieve amortized O(k log k) time per update. These results for the Doubling Algorithm carry over to the radius measure. Then, in Section 4, we present the Clique Algorithm and show that it has performance ratio 6, and that a randomized version has a smaller expected ratio. While the Clique Algorithm may appear to dominate the Doubling Algorithm, this is not the case since the former requires computing clique partitions, an NP-hard problem, although it must be said in its defense that the clique partitions need only be computed in graphs with k+1 vertices. While the performance ratio of the Clique Algorithm is 8 for the radius measure, improved bounds are possible for d-dimensional Euclidean spaces; specifically, we show that the radius performance ratio of the Clique Algorithm in R^d improves to 4 + 4*sqrt(d/(2(d+1))), which is 6 for d = 1, and is asymptotic to 6.83 for large d. In Section 5, we provide lower bounds for incremental clustering algorithms. We show that even for k = 2 and on the line, no deterministic or randomized algorithm can achieve a ratio better than 2. We improve this lower bound to 1 + sqrt(2) for deterministic algorithms in general metric spaces. Finally, in Section 6 we consider the dual clustering problem of minimizing the number of clusters of a fixed radius. Since it is impossible to achieve bounded ratios for general metric spaces, we focus on d-dimensional Euclidean spaces. We present an incremental algorithm that has performance ratio O(2^d d log d), and also provide a lower bound of Omega(log d / log log log d).

Then, there is the issue of handling deletions which, though not important for our motivating application of information retrieval, may be relevant in other applications. Finally, there is the question of formulating a model for adaptive clustering, wherein the clustering may be modified as a result of queries and user feedback, even without any updates.

2 Greedy Algorithms

We begin by examining some natural greedy algorithms. A greedy incremental clustering algorithm always merges clusters to minimize some fixed measure. Our results indicate that such algorithms perform poorly.

Definition 1 The Center-Greedy Algorithm associates a center with each cluster and merges the two clusters whose centers are closest. The center of the old cluster with the larger radius becomes the new center. It is possible to define variants of Center-Greedy based on how the centers of the clusters are picked, but we restrict ourselves to this definition for reasons of simplicity and intuitiveness.

Definition 2 The Diameter-Greedy Algorithm always merges those two clusters which minimize the diameter of the resulting merged cluster.

We can establish the following lower bounds on the performance ratio of these two greedy algorithms. We omit the proofs in this extended abstract.

Theorem 1 The Center-Greedy Algorithm has performance ratio at least 2k-1.

Theorem 2 The Diameter-Greedy Algorithm has performance ratio at least Omega(log k), even on the line.

We now give a tight upper bound for the Center-Greedy Algorithm. Note that for k = 3 it has ratio 5, but for larger k its performance is worse than that of the algorithms to be presented later.

Theorem 3 The Center-Greedy Algorithm has performance ratio 2k-1 in any metric space.

Proof: Suppose that a set S of n points is inserted. Let their optimal clustering be the partition P = {P_1, ..., P_k}, with d as the optimal diameter. We will show that the diameter of every cluster produced by Center-Greedy is at most (2k-1)d.
We define a graph G on the set P of the optimal clusters, where two clusters are connected by an edge if the minimum distance between them is at most d, where the distance between two clusters is the minimum distance between points in them. Consider the connected components of G. Note that two clusters in different connected components have minimum distance strictly greater than d. We say that a cluster C intersects a connected component consisting of the optimal clusters P_{i_1}, ..., P_{i_m} if C intersects their union.

We claim that at all times, any cluster produced by Center-Greedy intersects exactly one connected component of G. We prove this claim by induction over the update sequence. Suppose the claim is true before a new point p arrives. Initially, p is in a cluster of its own and Center-Greedy has k+1 clusters, each of which intersects exactly one connected component of G. Since there are k+1 cluster centers, two of them must be in the same optimal cluster. This implies that the distance between the two closest centers is at most d. If C_1 and C_2 are the clusters that Center-Greedy merges at this stage, the centers of C_1 and C_2 must be at most d apart. Hence, both clusters' centers must lie in the same connected component of G, say B. By the inductive hypothesis, all points in C_1 and C_2 must be in B. Hence, all points in the new cluster formed by merging C_1 and C_2 must lie in B, establishing the inductive hypothesis.

Since each cluster produced by Center-Greedy lies in exactly one connected component of G, the diameter is bounded by the maximum diameter of a connected component, which is at most (2k-1)d.

For Diameter-Greedy in general metric spaces, we only have the following weak upper bound; the proof is deferred to the final version.

Theorem 4 For fixed k, the Diameter-Greedy Algorithm has a bounded performance ratio in any metric space.

Unlike Diameter-Greedy, we can show that 2-Diameter-Greedy, the variant suggested by the analysis on the line, has a bounded performance ratio on the line.

Theorem 6 The 2-Diameter-Greedy Algorithm has performance ratio 3 on the line.

Proof: In fact, we show that it produces a clustering with 2-diameter at most the optimal diameter, and the factor of 3 follows. Assume this holds before the last two clusters are merged. Let I_1, ..., I_k be the intervals in the optimal clustering, with maximum diameter d. Let C_1, ..., C_{k+1} be the current clusters, each with 2-diameter at most d, of which two must be merged. We may assume that if a cluster C_i ends in the interval I_j, then C_{i+1} starts in I_j or in I_{j+1}; otherwise, we could replace the argument below by the same argument restricted either to the first few intervals, if there are at least as many current clusters C_i in that region, or to the last few intervals, if there are at least as many current clusters C_i in that region. Now, since there are k+1 clusters and only k intervals, the bounds imply that some pair of consecutive clusters C_i and C_{i+1} meets at most two consecutive optimal intervals. If C_i and C_{i+1} lie in a single interval I_j, then their merger is contained in I_j and has diameter at most d. If instead C_i ends in I_j and C_{i+1} starts in I_{j+1}, then the gap g between the two consecutive intervals I_j and I_{j+1} involved is at most d, since C_{i+1} has 2-diameter at most d; so the merger of C_i and C_{i+1} has 2-diameter at most d, given by the 2-partition that splits it at the gap g between I_j and I_{j+1}. This completes the proof.

Many further directions are suggested by our work. There are the obvious questions of improving our upper and lower bounds, particularly for the dual clustering problem. An important theoretical question is whether the geometric setting permits better ratios than do general metric spaces. Our model can be generalized in many different ways. Depending on the exact application, we may wish to consider other measures of clustering quality, such as the minimum variance in cluster diameter, or the sum of squares of the inter-point distances within a cluster.
We comment briefly on the running time of this algorithm. In the above proof, the 2-diameter of an interval may be replaced by an easily-computed upper bound: at the time of creation of an interval, record the gap that the merger bridged, and let the upper bound be the larger of the two spans on either side of this gap. Maintaining the points sorted in a balanced tree, the running time is O(log n) for each of the n points inserted.

In spite of the lower bounds for greedy algorithms, they may not be entirely useless, since some variant may perform well in geometric spaces. We obtain some positive evidence in this regard via the preceding analysis for the line. The upper bounds given here should be contrasted with the lower bound of 2 for the line shown in Section 5.
The update stage continues while the number of clusters is at most k. When a new point arrives, the algorithm attempts to place it in one of the current clusters without exceeding the radius bound 2*d_{i+1}; otherwise, a new cluster is formed with the update as the cluster center. When the number of clusters reaches k+1, phase i ends and the current set of k+1 clusters along with d_{i+1} are used as the input for the (i+1)st phase.

All that remains to be specified about the algorithm is the initialization. The algorithm waits until k+1 points have arrived and then enters phase 1 with each point as the center of a cluster containing just itself, and with d_1 set to the distance between the closest pair of points. It is easily verified that the invariants hold at the start of phase 1. The following lemma shows that the clusters at the end of the ith phase satisfy the invariants for the (i+1)st phase.

Lemma 3 The clusters at the end of the ith phase satisfy the following conditions:

1. The radius of the clusters is at most 2*d_{i+1}.

2. The pairwise distance between the cluster centers is at least d_{i+1}.

3. d_{i+1} <= 2*OPT, where OPT is the diameter of the optimal clustering for the current set of points.

…where the last inequality follows from the choice of the phase parameters.

Corollary 1 The Doubling Algorithm achieves the same performance ratio for the radius measure.

A simple modification of the Doubling Algorithm, in which we pick the new cluster centers by a simple left-to-right scan, improves the ratio for the case of the line.

While the obvious implementation of this algorithm appears to be inefficient, we can establish the following time bound, which is close to the best possible.

Theorem 8 The Doubling Algorithm can be implemented to run in O(k log k) amortized time per update.

Proof: First of all, we assume that there is a black-box for computing the distance between two points in the metric space in unit time. This is a reasonable assumption in most applications, and in any case even the static algorithms' analysis requires such an assumption. In the information retrieval application, the documents are represented as vectors and the black-box implementation will depend on the vector length as well as the exact definition of distance.

We now show how the Doubling Algorithm may be implemented so that the amortized time required for processing each new update is bounded by O(k log k). We maintain the edge lengths of the complete graph induced by the current cluster centers in a heap. Since there are at most k+1 clusters, the space requirement is O(k^2). When a new point arrives, we compute the distance of this point to each of the current cluster centers, which requires O(k) time. If the point is added to one of the current clusters, we are done. If, on the other hand, the new point initiates a new cluster, we insert into the heap edges labeled with the distances between this new center and the existing cluster centers, which takes O(k log k) time. For accounting purposes in the amortized analysis, we associate O(log k) credits with each inserted edge. We will show that it is possible to charge the cost of implementing the merging stage of the algorithm to the credits associated with the edges. This implies the desired time bound.

We can assume, without loss of generality, that the merging stage merges at least two clusters. Let t be the threshold used during the phase. The algorithm extracts all the edges from the heap which have length less than t. Let m be the number of edges deleted from the heap. The deletions from the heap cost O(m log k) time. The t-threshold graph on the cluster centers is exactly the graph induced by these m edges. It is easy to see that the procedure described to find the new cluster centers using the threshold graph takes time linear in the number of edges of the graph, assuming that the edges are given in the form of an adjacency list. Forming the adjacency list from the edges takes linear time. Therefore, the total cost of the merging stage is bounded by O(m log k) time. The credit of O(log k) placed with each edge when it is inserted into the heap accounts for this cost, completing the proof.

Finally, we describe a Randomized Doubling Algorithm with significantly better performance ratio. The algorithm is essentially the same as before, the main change being in the value of d_1, which is the lower bound for phase 1. In the deterministic case we chose d_1 to be the minimum pairwise distance of the first k+1 points, say l. We now choose a random value r from [1, 2] according to the probability density function 1/(r ln 2), set d_1 to r*l, and redefine the subsequent thresholds accordingly. Similar randomization of doubling algorithms was used earlier in scheduling [31], and later in other applications [7, 18].

Theorem 9 The Randomized Doubling Algorithm has expected performance ratio 2e in any metric space. The same bound is also achieved for the radius measure.

Proof: Let s be the sequence of updates, let the optimal cluster diameter for s be OPT, and let l be the minimum pairwise distance of the first k+1 points. The optimal value is at least l, hence OPT >= l. Suppose we choose d_1 = r*l for some r in [1, 2]. Let D be the maximum radius of the clusters created for s with this value of d_1. Using arguments similar to those in the proof of Theorem 7, we can bound D in terms of d_j, where j is the largest integer for which the threshold d_j does not exceed the corresponding multiple of OPT. Integrating over the random choice of r with density 1/(r ln 2), the expected value of D is bounded by the corresponding average of these thresholds, and the calculation shows that the expected diameter is at most 2e*OPT.

4 The Clique Partition Algorithm

We now describe the Clique Algorithm, which has performance ratio 6 in any metric space, and this is tight. This does not totally improve upon the Doubling Algorithm since the new algorithm involves solving the NP-hard clique partition problem, even though it is only on a graph with k+1 vertices. Finding a minimum clique partition is NP-hard even for graphs induced by points in the Euclidean plane [17], although it is in polynomial time for points on the line. Since the algorithm needs to solve the clique partition problem on graphs with k+1 vertices, this may not be too inefficient for small k.

Definition 6 Given an undirected unweighted graph G = (V, E), an l-clique partition is a partition of V into V_1, V_2, ..., V_l such that each induced graph G[V_i] is a clique. A minimum clique partition is an l-clique partition with the minimum possible value of l.

The Clique Algorithm is similar to the Doubling Algorithm in that it also operates in phases which have a merging stage followed by an update stage. The invariants maintained by the algorithm are different though. At the start of the ith phase we have k+1 clusters C_1, ..., C_{k+1} and a value d_i such that: (a) the radius of each cluster C_j is at most 2*d_i; (b) the diameter of each cluster C_j is at most 3*d_i; and, (c) d_i <= OPT.

The merging stage works as follows. Let d_{i+1} = 2*d_i. We form the minimum clique partition of the d_{i+1}-threshold graph G on the cluster centers. The new clusters are then formed by merging the clusters in each clique of the clique partition. We arbitrarily choose one cluster from each clique and make its center the cluster center of the new merged cluster. Let C'_1, ..., C'_m be the resulting clusters. In the rest of the phase we also need to know which old clusters merged to form each of the new clusters.

Lemma 4 The radius of the clusters after the merging stage is at most 2*d_{i+1} and the diameter is at most 3*d_{i+1}.

Proof: Let C_{j_1}, ..., C_{j_p} be the clusters whose union is the new cluster C'_j, and without loss of generality assume that the center of C_{j_1} was chosen as the new center. The old centers are pairwise at distance at most d_{i+1}, and each old cluster has radius at most 2*d_i = d_{i+1}. It follows that the new radius is at most d_{i+1} + 2*d_i = 2*d_{i+1}, and the diameter is at most 2*(2*d_i) + d_{i+1} = 3*d_{i+1}.

During the update stage, a new point p is handled as follows. Let the current number of clusters be j, where m <= j <= k. Recall that C'_1, ..., C'_m are the clusters formed during the merging stage. If p is within distance 2*d_{i+1} of the center of some current cluster (for a merged cluster C'_t, the center of one of the clusters which merged to form C'_t may also be used), then p is added to that cluster. If no such cluster exists, p starts a new cluster with p as the center. The phase ends when the number of clusters exceeds k, or if there are more than k clusters at the end of the merging stage.

The intuition behind the new algorithm is the following. At the beginning of the phase we have k+1 clusters and a lower bound d_i on the optimal. We use the lower bound to increase the radius of our existing clusters and merge some of them. To maintain the invariant for the lower bound in the next phase, we need to ensure during this merging that the number of clusters we have after the merging is no more than what the optimal algorithm can achieve using the lower bound for the next phase. The Doubling Algorithm achieved this by picking an independent set as the new cluster centers in the distance threshold graph. The weakness of this approach is that we have a bound on the diameter only as a function of the radius of the new cluster. We get the improvement by observing that a better bound on the number of clusters achievable by the optimal with diameter bounded by d_{i+1} is the size of the minimum clique partition of the distance threshold graph. We still need a condition on the radius in order to do the doubling, but now, since we use cliques, we can bound the diameter of the new clusters better than twice the radius.

The following lemmas show that the clusters at the end of phase i satisfy the invariants for phase i+1.

Lemma 5 The radius of the clusters at the end of the phase is at most 2*d_{i+1} and the diameter of the clusters is at most 3*d_{i+1}.

Since the radius of the clusters is within a factor 4 of the optimal diameter, we obtain the following corollary.

Corollary 2 The Clique Algorithm has performance ratio 8 in any metric space for the radius measure.

As in the case of the Doubling Algorithm, we can use randomization to improve the bound. Let l be the minimum distance among the first k+1 points. The randomized algorithm sets d_1 = r*l in phase 1 of the deterministic algorithm, where r is chosen from [1, 2] according to the probability density function 1/(r ln 2). The analysis is similar to that of Theorem 9 and we omit the details.

Theorem 11 The Randomized Clique Algorithm has an improved expected performance ratio in any metric space.

Corollary 3 The Randomized Clique Algorithm has a correspondingly improved performance ratio for the radius measure in any metric space.

The special structure of the clusters in the Clique Algorithm can be used to show that the performance ratio for the radius measure is better in the geometric case. This is based on the following result in geometry; we defer the proofs of the proposition and its consequence.

Proposition 12 Any convex set in R^d of diameter at most 1 can be circumscribed by a sphere of radius r_d, where r_d satisfies a recurrence with the base case r_1 = 1/2. The solution to this recurrence is r_d = sqrt(d / (2(d+1))).
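The merging stage requires a minimum clique partition of the d_{i+1}-threshold graph, which has at most k+1 vertices, so an exact search is feasible for small k. The following is an illustrative branch-and-bound sketch, not the paper's procedure; the vertex-index graph encoding is an assumption.

```python
def min_clique_partition(n, edges):
    """Exact minimum clique partition of an n-vertex graph, by placing
    vertices one at a time into existing cliques or a new one, pruning
    branches that already use as many cliques as the best found."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    best = None

    def search(remaining, parts):
        nonlocal best
        if best is not None and len(parts) >= len(best):
            return  # cannot improve on the best partition found
        if not remaining:
            best = [sorted(p) for p in parts]
            return
        v = min(remaining)
        rest = remaining - {v}
        for p in parts:                      # put v into an existing clique
            if all(v in adj[u] for u in p):  # v adjacent to the whole clique
                p.add(v)
                search(rest, parts)
                p.remove(v)
        parts.append({v})                    # or start a new clique with v
        search(rest, parts)
        parts.pop()

    search(set(range(n)), [])
    return best
```

The worst-case cost is exponential in the number of vertices, which is exactly why the Clique Algorithm is practical only because the threshold graphs have at most k+1 vertices.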
Lemma 6 At the end of phase i, d_{i+1} <= OPT.

Proof: Suppose to the contrary that d_{i+1} > OPT at the end of the phase. Let P = {c_1, ..., c_{k+1}} be the cluster centers at the beginning of the phase; note that the centers c'_1, ..., c'_m of the clusters formed in the merging stage belong to P. Each point that started a new cluster during the update stage is at distance greater than 2*d_{i+1} from all previous centers. Therefore, in the optimal solution, each such new center lies in a cluster which contains no other center, and so the k+1-m new centers occupy k+1-m distinct optimal clusters. The remaining at most m-1 optimal clusters must contain all the centers in P. Since each optimal cluster has diameter at most OPT < d_{i+1}, each of them induces a clique in the d_{i+1}-threshold graph on P. This yields a clique partition of P of size at most m-1, a contradiction since m was the size of the minimum clique partition of the d_{i+1}-threshold graph on P.

The diameter of the clusters during phase i is at most 3*d_{i+1}, and we maintain the invariant that d_i <= OPT at the start of the phase. Therefore, the performance ratio of this algorithm is at most 6.

Theorem 13 The Clique Algorithm has performance ratio 4 + 4*sqrt(d/(2(d+1))) for the radius measure in R^d. This implies performance ratio 6 for d = 1, about 6.3 for d = 2, and asymptotically 4 + 2*sqrt(2), about 6.83, for large d.

5 Lower Bounds

We present some lower bounds on the performance of incremental clustering. The lower bounds apply to both the diameter and radius measures, but our proofs are given for the diameter case. The following theorem shows that even for the simplest geometric space, we cannot expect a ratio better than 2; the proof is omitted.

Theorem 14 For k = 2, there is a lower bound of 2 on the performance ratio of both deterministic and randomized algorithms for incremental clustering on the line.
In the case of general metric spaces, we can establish a stronger lower bound.

Theorem 15 There is a lower bound of 1 + sqrt(2) on the performance ratio of any deterministic incremental clustering algorithm for arbitrary metric spaces.

Proof: Consider a metric space consisting of points x_{ij} arranged in k groups of two, where the distances are the shortest-path distances in a graph in which the two points of each group are at distance 1 and the distances between points of different groups are suitably chosen. Let X_i = {x_{i1}, x_{i2}}. Note that the sets X_i, for 1 <= i <= k, partition the metric space into k clusters of diameter 1. Let A be any deterministic algorithm for the incremental clustering problem. Consider the clusters produced by A after it is given the points x_{ij} described above.

Case 1: Suppose the maximum diameter of A's clusters is 1. Then A's clusters must be the k sets X_i. Now the adversary gives a point y such that the distance from y to every x_{ij} is very large (any large number will do). The optimal clustering is {y} together with a repartition of the remaining points into k-1 clusters, whose diameter is determined by the chosen inter-group distances. We claim that the maximum diameter of A's clusters now exceeds the optimal diameter by a factor of at least 1 + sqrt(2). If the cluster that contains y contains any other point, then our claim is clearly true. If, on the other hand, the cluster that contains y does not contain any other point, then A must have merged two of its existing clusters, and the maximum diameter of A's resulting clusters again exceeds the optimal by the claimed factor.

Case 2: Suppose the maximum diameter of A's clusters is greater than 1. Then some cluster of A contains points of two different groups. The adversary now provides further points chosen so that the optimal clustering has small diameter while A's clusters cannot be improved, and the same bound follows. In either case, the performance ratio of A is at least 1 + sqrt(2).

We can also establish a lower bound for randomized incremental clustering. The distribution on inputs is as follows. Initially, the adversary provides points x_1, ..., x_m such that the distance between any two of them is 1. Then the adversary partitions the points into k disjoint sets S_1, ..., S_k at random, such that all partitions are equally likely. Finally, the adversary provides points y_1, ..., y_k such that d(y_i, x_j) = 1 if x_j is in S_i, and d(y_i, x_j) = 2 if x_j is not in S_i. Now, the diameter of the optimal solution for any input in the distribution is 1, obtained by constructing the clusters S_i together with y_i. However, the incremental algorithm can produce a clustering with diameter 1 only if the clusters it produces after it sees the points x_1, ..., x_m are precisely the sets S_i (selected at random by the adversary). Let N be the number of ways to partition the m points into k sets. Then the probability that the incremental algorithm produces a clustering of diameter 1 is at most k!/N. With the remaining probability, the incremental algorithm produces a clustering of diameter at least 2. Thus the expected value of the diameter of the clustering produced, and hence the expected value of the performance ratio, is at least 2 - e for a quantity e depending on N. By choosing m suitably large, N can be made arbitrarily large, and hence e can be made arbitrarily small, in particular smaller than any fixed e > 0.

6 Dual Clustering

We now consider the dual clustering problem: for a sequence of points p_1, ..., p_n, cover each point with a unit ball in R^d as it arrives, so as to minimize the total number of balls used. In the static case this problem is NP-complete and there is a PTAS for any fixed dimension [22]. We note that in general metric spaces, it is not possible to achieve any bounded ratio. Our algorithm's analysis is based on a theorem from combinatorial geometry.
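For intuition, a simple grid strategy already gives a bounded (though far from optimal) ratio in fixed dimension. This sketch is an illustration, not the paper's algorithm, which achieves the O(2^d d log d) ratio stated earlier.

```python
import math

def incremental_unit_cover(points, dim=2):
    """Grid sketch for the dual problem: snap each arriving point to a
    grid whose cells fit inside a unit ball, and open one ball per
    nonempty cell, centered at the cell center."""
    side = 2.0 / math.sqrt(dim)  # cell diagonal = 2, so a radius-1 ball
                                 # centered at the cell center covers it
    balls = {}
    for p in points:
        cell = tuple(math.floor(x / side) for x in p)
        if cell not in balls:
            center = tuple((c + 0.5) * side for c in cell)
            balls[cell] = center
    return list(balls.values())
```

Every point is within distance 1 of the ball opened for its cell, and the number of balls is at most the number of nonempty cells, which a packing argument relates to the offline optimum up to a dimension-dependent factor.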
Since the centers are at pairwise distance greater than twice the unit radius, any solution must use a separate sphere to cover each center. Hence, the number of centers is a lower bound on the number of spheres used by the optimal offline algorithm. For each center, the incremental algorithm uses at most O(2^d d log d) spheres to cover the points in its region. Hence, the performance ratio of the incremental algorithm is bounded by O(2^d d log d).

The following theorem gives a lower bound for the dual clustering problem.

Theorem 18 For the dual clustering problem in R^d, any incremental algorithm must have performance ratio Omega(log d / log log log d).

Proof: The idea is as follows. At time t, when t points have been given by the adversary, it will be the case that the points p_1, ..., p_t can be covered by a ball of radius r_t. Then, the adversary will find a point p_{t+1} lying outside the unit balls laid down by the algorithm so as to minimize the radius r_{t+1} of the ball required to cover all t+1 points, and present that as a request. The game terminates when at some time T we have for the first time that r_{T+1} > 1. Clearly, T is a lower bound on the performance ratio, since the points p_1, ..., p_T can be covered by a single ball of radius r_T <= 1, while the algorithm has used T balls up to that point. It remains to analyze the worst-case growth rate of r_t as a function of t. Note that r_1 = 0.

Let V denote the volume of a unit ball in R^d. At time t, let B_t be any ball of radius (at most) r_t that covers the points p_1, ..., p_t. For a quantity e_t to be specified later, define the ball B'_t as a ball with the same center as B_t and with radius r_t + e_t. We choose e_t large enough that the t unit balls placed by the algorithm cannot cover the entire volume of B'_t. This implies that there is a choice of a point p_{t+1} inside B'_t which is not covered by the current balls. It is also clear that the new set of t+1 points can be covered by a ball of radius at most r_t + e_t, implying that r_{t+1} <= r_t + e_t.

Unfolding the recurrence for r_t and noting that r_1 = 0, we obtain an upper bound on r_{t+1} as a sum of the quantities e_t. The lower bound T is the largest value of t for which this sum remains at most 1, and a calculation shows that T = Omega(log d / log log log d). This gives the desired lower bound.

Acknowledgements

We thank Pankaj Agarwal and Leonidas Guibas for helpful discussions, and for suggesting that we consider the dual clustering problem.

References

[1] M.S. Aldenderfer and R.K. Blashfield. Cluster Analysis. Sage, Beverly Hills, 1984.

[2] M. Bern and D. Eppstein. Approximation algorithms for geometric problems. In: D.S. Hochbaum, editor, Approximation Algorithms for NP-Hard Problems. PWS Publishing Company, 1996.

[3] P. Brucker. On the complexity of clustering problems. In: R. Henn, B. Korte, and W. Oletti, editors, Optimization and Operations Research, Heidelberg, New York, 1977, pp. 45–54.

[4] F. Can. Incremental clustering for dynamic information processing. ACM Transactions on Information Systems, 11 (1993), pp. 143–164.

[5] F. Can and E.A. Ozkarahan. A dynamic cluster maintenance system for information retrieval. In Proceedings of the Tenth Annual International ACM SIGIR Conference, 1987, pp. 123–131.

[6] F. Can and N.D. Drochak II. Incremental clustering for dynamic document databases. In Proceedings of the 1990 Symposium on Applied Computing, 1990, pp. 61–67.

[7] S. Chakrabarti, C. Phillips, A. Schulz, D.B. Shmoys, C. Stein, and J. Wein. Improved scheduling algorithms for minsum criteria. In Proceedings of the 23rd International Colloquium on Automata, Languages and Programming, Springer, 1996.

[8] B.B. Chaudhri. Dynamic clustering for time incremental data. Pattern Recognition Letters, 13 (1994), pp. 27–34.

[24] D.S. Hochbaum and D.B. Shmoys. A unified approach to approximation algorithms for bottleneck problems. Journal of the ACM, 33 (1986), pp. 533–550.

[25] S. Irani and A. Karlin. Online computation. In: D.S. Hochbaum, editor, Approximation Algorithms for NP-Hard Problems. PWS Publishing Company, 1996.

[26] N. Jardine and C.J. van Rijsbergen. The use of hierarchical clustering in information retrieval. Information Storage and Retrieval, 7 (1971), pp. 217–240.

[27] A.K. Jain and R.C. Dubes. Algorithms for Clustering Data. Prentice-Hall, NJ, 1988.

[28] O. Kariv and S.L. Hakimi. An algorithmic approach to network location problems, part I: the k-centers problem. SIAM