A Software Tool for Data Clustering Using Particle Swarm Optimization
Abstract. Many universities around the world have offered courses on
swarm intelligence since the 1990s. Particle Swarm Optimization (PSO) is a swarm
intelligence technique. It is relatively young, with a pronounced need for a mature
teaching method. This paper presents an educational software tool in MATLAB to
aid the teaching of PSO fundamentals and their application to data clustering. The
software can run the classical K-Means clustering algorithm and also provides a
facility to simulate hybridizations of K-Means with PSO to explore better
clustering performance. The graphical user interfaces are user-friendly and offer
good learning scope to aspiring learners of PSO.
1 Introduction
Computational techniques inspired by nature, such as artificial neural networks [1],
fuzzy systems [2], evolutionary computation [3] and swarm intelligence [4], have
attracted considerable scholarly interest. Particle Swarm Optimization (PSO) is an
approach to swarm intelligence based on simplified simulations of animal social
behaviors such as fish schooling and bird flocking. It was first introduced by Kennedy
and Eberhart [5] as a self-adaptive search optimization technique. Its applications are
generally found in solving complex engineering problems, mainly non-linear function
minimization, optimal capacitor placement in distributed systems, shape optimization,
dynamic systems and game theory, constrained and unconstrained optimization,
multi-objective optimization problems, control systems and others.
Of late, research interest in PSO appears to be growing. It is therefore worthwhile
to consider how to give good-quality instruction to beginners in the field. Simulation
is among the more effective teaching methods. This paper introduces a software
tutorial for PSO, developed to aid the teaching of PSO concepts and their application
to data clustering. The software offers facilities to simulate the classical K-Means [6]
clustering algorithm, PSO clustering, and hybridizations of K-Means and PSO. The
software provides scope for experimentation by allowing the learner to choose
different tuning parameters for PSO along with suitable particle sizes
B.K. Panigrahi et al. (Eds.): SEMCCO 2010, LNCS 6466, pp. 278–285, 2010.
© Springer-Verlag Berlin Heidelberg 2010
and iterations to obtain better clustering performance. The software is GUI-based and
is supported by various plots and graphs for better presentation of the derived results.
This work was done using MATLAB (Matrix LABoratory). MATLAB is a computational
environment for modeling, analyzing and simulating a wide variety of intelligent
systems. It is also readily accessible to students, providing numerous design and
analysis tools in its Fuzzy Systems, Neural Networks and Optimization toolboxes.
The remainder of this paper is organized as follows. Section 2 discusses the three
clustering approaches (K-Means, PSO, and the hybrid algorithms) on three numerical
datasets: Iris, Wine, and Cancer (collected from the UCI Machine Learning
Repository). Section 3 presents the software for PSO-based data clustering, covering
the conventional K-Means clustering algorithm, PSO, and the hybrid clustering
algorithms. Section 4 gives a comparative analysis of all the clustering algorithms,
with experimental results, based on their intra- and inter-cluster similarities and
quantization error. Section 5 concludes the paper.
2 Data Clustering
Data clustering is a process of grouping a set of data vectors into a number of clusters
or bins such that elements or data vectors within the same cluster are similar to one
another and are dissimilar to the elements in other clusters. Broadly, there are two
classes of clustering algorithms, supervised and unsupervised. With supervised
clustering, the learning algorithm has an external teacher that indicates the target class
to which the data vector should belong. For unsupervised clustering, a teacher does
not exist, and data vectors are grouped based on distance from one another.
The fitness of the particle is easily measured as the intracluster distance (the
distance among the vectors of a given cluster) which needs to be minimized. It is
given by
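$$\frac{\sum_{j=1}^{N_k} \left[ \sum_{\forall z_p \in C_{ij}} d(z_p, a_j) \right]}{N_k} \tag{2}$$

Here $z_p$ denotes the $p$-th data vector, $C_{ij}$ is the $i$-th particle's $j$-th cluster, $a_j$ denotes the
centroid vector of cluster $j$, and $d(z_p, a_j) = \sqrt{\sum_{k=1}^{N_d} (z_{pk} - a_{jk})^2}$ denotes the Euclidean
distance between $z_p$ and $a_j$.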
$$v_{id}(I) = w \, v_{id}(I-1) + c_1 \, \mathrm{rand}() \, (p_{id} - x_{id}(I-1)) + c_2 \, \mathrm{rand}() \, (p_{gd} - x_{id}(I-1)) \tag{3}$$
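Equations (2) and (3) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' MATLAB code. It assumes that each data vector is assigned to its nearest centroid, a rule the surviving text does not state explicitly, and all function names are hypothetical.

```python
import math
import random

def euclidean(z, a):
    """Euclidean distance d(z, a) between a data vector and a centroid."""
    return math.sqrt(sum((zk - ak) ** 2 for zk, ak in zip(z, a)))

def assign(data, centroids):
    """Index of the nearest centroid for every data vector (assumed rule)."""
    return [min(range(len(centroids)), key=lambda j: euclidean(z, centroids[j]))
            for z in data]

def fitness(data, centroids):
    """Eq. (2): per-cluster sums of distances to the centroid, divided by N_k."""
    sums = [0.0] * len(centroids)
    for z, j in zip(data, assign(data, centroids)):
        sums[j] += euclidean(z, centroids[j])
    return sum(sums) / len(centroids)

def update_velocity(vel, pos, pbest, gbest, w, c1, c2, rng=random):
    """Eq. (3), applied to every dimension d of one particle."""
    return [w * v + c1 * rng.random() * (p - x) + c2 * rng.random() * (g - x)
            for v, x, p, g in zip(vel, pos, pbest, gbest)]

data = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centroids = [(0.0, 1.0), (10.0, 1.0)]
print(fitness(data, centroids))  # 2.0: every vector lies at distance 1 from its centroid
```

A particle sitting exactly on both its personal best and the global best keeps only the inertia term, so its velocity shrinks by the factor w at each step.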
the resulting centroids of which are used to seed the initial swarm, while the rest of
the swarm is initialized randomly. The PSO algorithm is then executed (as in Section 2.2).
The second is the PSO + K-Means technique. Here, the PSO algorithm is first executed
once; its resulting gbest is used as one of the centroids for K-Means, while the rest
of the centroids are initialized randomly. The K-Means algorithm is then executed.
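The two seeding schemes can be sketched as follows. This is only an illustration under stated assumptions: "initialized randomly" is taken to mean drawing distinct data vectors, gbest is treated as supplying a single centroid as the text describes, and the function names are hypothetical rather than taken from the software.

```python
import random

def random_centroids(data, n_clusters, rng):
    """Pick n_clusters distinct data vectors as random initial centroids (assumed scheme)."""
    return [list(z) for z in rng.sample(data, n_clusters)]

def seed_swarm(data, n_clusters, swarm_size, kmeans_centroids, rng):
    """K-Means + PSO: particle 0 holds the K-Means centroids, the rest are random."""
    swarm = [[list(c) for c in kmeans_centroids]]
    while len(swarm) < swarm_size:
        swarm.append(random_centroids(data, n_clusters, rng))
    return swarm

def seed_kmeans(data, n_clusters, gbest_centroid, rng):
    """PSO + K-Means: one centroid comes from gbest, the others are random."""
    return [list(gbest_centroid)] + random_centroids(data, n_clusters - 1, rng)

rng = random.Random(0)
data = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.1, 4.9), (9.0, 0.0), (9.2, 0.1)]
swarm = seed_swarm(data, 3, 4, [[0.05, 0.1], [5.05, 4.95], [9.1, 0.05]], rng)
start = seed_kmeans(data, 3, (5.0, 5.0), rng)
```

Either hybrid changes only the starting point of the second algorithm; the second algorithm itself then runs unmodified.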
Our software lets the user explore these possibilities by choosing different
parameters and numbers of iterations.
The results of clustering are reported in terms of intra-class and inter-class
similarities and quantization error [Table 1]. A confusion matrix is also given, from
which accuracy can be assessed by comparing the expected clusters with the actual
clusters. The time taken by the algorithm to cluster the data is also reported. The
K-Means clustering results of Table 1 are shown in Fig. 1 (as displayed by the software).
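A confusion matrix of this kind can be computed as in the sketch below. Because cluster indices are arbitrary, the accuracy shown here credits each cluster with its most frequent expected class. That mapping is an assumption made for illustration, and the source does not say how the software matches clusters to classes.

```python
def confusion_matrix(expected, actual, n_classes, n_clusters):
    """Rows are expected classes, columns are the clusters actually assigned."""
    matrix = [[0] * n_clusters for _ in range(n_classes)]
    for e, a in zip(expected, actual):
        matrix[e][a] += 1
    return matrix

def majority_accuracy(matrix):
    """Fraction of vectors that fall in the dominant class of their cluster."""
    total = sum(sum(row) for row in matrix)
    matched = sum(max(column) for column in zip(*matrix))
    return matched / total

expected = [0, 0, 0, 1, 1, 1]
actual = [1, 1, 0, 0, 0, 0]
matrix = confusion_matrix(expected, actual, n_classes=2, n_clusters=2)
print(matrix)                     # [[1, 2], [3, 0]]
print(majority_accuracy(matrix))  # 5 of 6 vectors match their cluster's dominant class
```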
Fig. 2 displays the options given to the user for specifying the PSO parameters, such
as swarm size, inertia weight, and acceleration coefficients. The results of clustering
are shown in the same way as for K-Means clustering [Table 2]. The sample results were
computed with a swarm size of 3, an inertia weight of 0.72, and c1 and c2 both set to 1.
However, the user can experiment with any other values to see how the PSO
clustering algorithm performs.
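To show where these parameters enter, the following is a small self-contained PSO clustering loop that uses the sample values above as defaults. It is a sketch, not the software's implementation: the position update x = x + v is the standard PSO rule and is assumed here, particles are initialized from randomly chosen data vectors, and the function names, iteration count and toy data are all hypothetical.

```python
import math
import random

def quantization_fitness(data, centroids):
    """Sum of nearest-centroid distances per cluster, divided by the cluster count."""
    sums = [0.0] * len(centroids)
    for z in data:
        dists = [math.dist(z, a) for a in centroids]
        j = dists.index(min(dists))
        sums[j] += dists[j]
    return sum(sums) / len(centroids)

def pso_cluster(data, n_clusters, swarm_size=3, w=0.72, c1=1.0, c2=1.0,
                iterations=50, seed=0):
    rng = random.Random(seed)
    dim = len(data[0])
    # Each particle is a list of n_clusters centroids drawn from the data.
    pos = [[list(z) for z in rng.sample(data, n_clusters)] for _ in range(swarm_size)]
    vel = [[[0.0] * dim for _ in range(n_clusters)] for _ in range(swarm_size)]
    pbest = [[c[:] for c in p] for p in pos]
    pbest_fit = [quantization_fitness(data, p) for p in pos]
    g = pbest_fit.index(min(pbest_fit))
    gbest, gbest_fit = [c[:] for c in pbest[g]], pbest_fit[g]
    for _ in range(iterations):
        for i in range(swarm_size):
            for j in range(n_clusters):
                for d in range(dim):
                    x = pos[i][j][d]
                    vel[i][j][d] = (w * vel[i][j][d]
                                    + c1 * rng.random() * (pbest[i][j][d] - x)
                                    + c2 * rng.random() * (gbest[j][d] - x))
                    pos[i][j][d] = x + vel[i][j][d]  # assumed standard position update
            fit = quantization_fitness(data, pos[i])
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = [c[:] for c in pos[i]], fit
                if fit < gbest_fit:
                    gbest, gbest_fit = [c[:] for c in pos[i]], fit
    return gbest, gbest_fit

data = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.9, 5.1)]
gbest, gbest_fit = pso_cluster(data, n_clusters=2, iterations=20)
```

Changing `swarm_size`, `w`, `c1` or `c2` in the call mirrors the kind of experimentation the GUI is described as offering.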
282 K. Manda et al.
Similarly, sample results and screen displays from the software for the two
proposed hybrid algorithms are presented below: K-Means+PSO [Table 3,
Fig. 3] and PSO+K-Means [Table 4, Fig. 4].
Table 5. (continued)
Fig. 5 shows the intra and inter cluster distances, quantization error and the time as
marked with blue, red, green, and black colors respectively, for all four algorithms.
learners can gain first-hand knowledge of PSO basics and can also proceed to
investigate the fundamentals of clustering algorithms. The entire software has been
developed using MATLAB. The GUIs generated using MATLAB are convenient
for users to work and experiment with. Users are also given various options to
choose suitable parameters and check their effect on the clustering results.
The fitness graph generated while comparing the four clustering algorithms discussed
earlier can provide better insight into their performance. The confusion matrices
generated indicate the accuracy of each algorithm on the investigated
dataset. The authors note that no such comprehensive tool has been developed
so far to explore PSO-based clustering using MATLAB. It is envisioned that the
presented software will offer a good learning environment to students interested
in this field.
As future work, other PSO models are to be included, together with facilities for
setting more parameters. Variants of PSO can also be explored for this purpose, and
a complete package can be developed for clustering applications.
References
1. Bishop, C.M.: Neural networks for pattern recognition. Oxford University Press, Oxford
(1995)
2. Yurkovich, S., Passino, K.M.: A laboratory course on fuzzy control. IEEE Trans.
Educ. 42(1), 15–21 (1999)
3. Coelho, L.S., Coelho, A.A.R.: Computational intelligence in process control: fuzzy,
evolutionary, neural, and hybrid approaches. Int. J. Knowl-Based Intell. Eng. Sys. 2(2),
80–94 (1998)
4. Bonabeau, E., Dorigo, M., Theraulaz, G.: Swarm intelligence: from natural to artificial
systems. Oxford University Press, Oxford (1999)
5. Kennedy, J.F., Eberhart, R.C.: Particle swarm optimization. In: Proceedings of the IEEE
International conference on neural networks, Perth, Australia, vol. 4, pp. 1942–1948 (1995)
6. MacQueen, J.B.: Some Methods for classification and Analysis of Multivariate
Observations. In: Proceedings of 5th Berkeley Symposium on Mathematical Statistics and
Probability, vol. 1, pp. 281–297. University of California Press (1967)