
Clustering is an important data mining task and has been explored extensively by a
number of researchers for different application areas such as finding similarities in images, text
data and bioinformatics data. Various optimization techniques have been proposed to improve the
performance of clustering algorithms. In this paper we propose a novel algorithm for clustering
that we call the Evolutionary Particle Swarm Optimization (EPSO) clustering algorithm, which is
based on PSO. The proposed algorithm is based on the evolution of swarm generations: the
particles are initially distributed uniformly in the input data space, and after a specified number of
iterations a new generation of the swarm evolves. The swarm tries to dynamically adjust itself
after each generation toward optimal positions. The paper describes the new algorithm and its initial
implementation, and presents tests performed on real clustering benchmark data. The proposed
method is compared with k-means clustering, a benchmark clustering technique, and with a simple
particle swarm clustering algorithm. The results show that the algorithm is efficient and produces
compact clusters.
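The EPSO generation rules are not given in enough detail here to reproduce, so the following is only a minimal Python sketch of the simple particle swarm clustering baseline the abstract compares against. It assumes that each particle encodes k candidate centroids, that particles start uniformly inside the bounding box of the data, and that fitness is the mean distance from each point to its nearest centroid. The inertia and acceleration constants (`w`, `c1`, `c2`) are common textbook values, not taken from the paper, and the names `quantization_error` and `pso_cluster` are hypothetical.

```python
import math
import random


def quantization_error(centroids, data):
    """Mean distance from each point to its nearest centroid (lower is better)."""
    return sum(min(math.dist(p, c) for c in centroids) for p in data) / len(data)


def pso_cluster(data, k, n_particles=10, iters=100, w=0.72, c1=1.49, c2=1.49, seed=0):
    """Simple PSO clustering sketch; w, c1, c2 are assumed textbook constants."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Bounding box of the input space: particles start uniformly inside it.
    lo = [min(p[d] for p in data) for d in range(dim)]
    hi = [max(p[d] for p in data) for d in range(dim)]

    def random_centroid():
        return [rng.uniform(lo[d], hi[d]) for d in range(dim)]

    # Each particle position is one candidate set of k centroids.
    pos = [[random_centroid() for _ in range(k)] for _ in range(n_particles)]
    vel = [[[0.0] * dim for _ in range(k)] for _ in range(n_particles)]
    pbest = [[c[:] for c in p] for p in pos]
    pbest_err = [quantization_error(p, data) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_err[i])
    gbest, gbest_err = [c[:] for c in pbest[g]], pbest_err[g]

    for _ in range(iters):
        for i in range(n_particles):
            # Velocity update pulls each centroid toward the particle's own
            # best position and toward the swarm's global best position.
            for j in range(k):
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    vel[i][j][d] = (w * vel[i][j][d]
                                    + c1 * r1 * (pbest[i][j][d] - pos[i][j][d])
                                    + c2 * r2 * (gbest[j][d] - pos[i][j][d]))
                    pos[i][j][d] += vel[i][j][d]
            err = quantization_error(pos[i], data)
            if err < pbest_err[i]:
                pbest[i], pbest_err[i] = [c[:] for c in pos[i]], err
                if err < gbest_err:
                    gbest, gbest_err = [c[:] for c in pos[i]], err
    return gbest, gbest_err
```

For example, `pso_cluster([(0, 0), (0, 1), (10, 10), (10, 11)], k=2)` returns the best centroid set found together with its error. An EPSO-style variant would additionally start a new swarm generation after a specified number of iterations, which this sketch does not attempt.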
In this paper we consider the Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH)
algorithm. It is an unsupervised data mining algorithm used to perform hierarchical clustering over
particularly large data sets. Classical agglomerative hierarchical clustering starts with single-point
clusters (every point in the database is a cluster), then repeatedly merges the two closest clusters
until only one cluster remains. For n data points, the clusters are computed with the help of a
distance matrix of size O(n^2), in at least O(n^2) time, which becomes impractical for very large
data sets. BIRCH avoids this cost by first summarizing the data in a single scan as compact
sub-clusters, and only then clustering those sub-clusters.
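A minimal Python sketch of the naive agglomerative procedure described above, assuming single linkage (the distance between two clusters is the distance between their closest members) and Euclidean distance. The function name `agglomerative` and the optional `n_clusters` stopping point are illustrative. It makes the quadratic distance matrix explicit; because this straightforward version re-scans every pair of clusters at each merge, it runs in O(n^3) time, slower than the quadratic time quoted above, which needs more careful bookkeeping.

```python
import math


def agglomerative(points, n_clusters=1):
    """Naive single-linkage agglomerative clustering.

    Builds the full n x n distance matrix (O(n^2) memory), then repeatedly
    merges the two closest clusters. Each merge scans all cluster pairs,
    so this naive version takes O(n^3) time overall.
    """
    n = len(points)
    dist = [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]  # every point starts as its own cluster
    merges = []  # (cluster_a, cluster_b, linkage distance) for each merge
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters, merges
```

Calling `agglomerative(points)` with the default `n_clusters=1` runs until only one cluster remains; the returned `merges` list records the hierarchy.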

It is local in that each clustering decision is made without scanning all data points and currently existing clusters.

It exploits the observation that the data space is usually not uniformly occupied and that not every data point is equally important.

It makes full use of available memory to derive the finest possible sub-clusters while minimizing I/O costs.

It is also an incremental method that does not require the whole dataset in advance.
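Much of this rests on BIRCH summarizing each sub-cluster by a clustering feature (CF): the point count N, the linear sum LS of the points, and the sum of squared norms SS. The following Python sketch, assuming Euclidean data and a single radius threshold, illustrates why decisions can be local and incremental: CFs are additive, and a sub-cluster's centroid and radius follow from (N, LS, SS) alone. It is deliberately simplified: entries are kept in a flat list rather than a height-balanced CF tree, and there is no node splitting or memory-driven rebuilding. The names `CF` and `insert_point` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class CF:
    """Clustering feature of a sub-cluster: count, linear sum, squared sum."""
    n: int
    ls: list
    ss: float

    @classmethod
    def of_point(cls, p):
        return cls(1, list(p), sum(x * x for x in p))

    def __add__(self, other):
        # CF additivity: merging two sub-clusters just adds their summaries.
        return CF(self.n + other.n,
                  [a + b for a, b in zip(self.ls, other.ls)],
                  self.ss + other.ss)

    def centroid(self):
        return [x / self.n for x in self.ls]

    def radius(self):
        # Root-mean-square distance of members to the centroid, computed
        # from (N, LS, SS) alone without revisiting the original points.
        c2 = sum(x * x for x in self.centroid())
        return math.sqrt(max(self.ss / self.n - c2, 0.0))


def insert_point(entries, p, threshold):
    """Absorb p into the nearest entry if its radius stays within threshold,
    otherwise start a new entry. Only the entries' summaries are consulted."""
    new = CF.of_point(p)
    if entries:
        i = min(range(len(entries)),
                key=lambda k: math.dist(entries[k].centroid(), p))
        merged = entries[i] + new
        if merged.radius() <= threshold:
            entries[i] = merged
            return entries
    entries.append(new)
    return entries
```

Points can be fed one at a time, for example `for p in stream: insert_point(entries, p, threshold=1.0)`, so the whole dataset never has to be available in advance.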