
Information Sciences 182 (2012) 169–183


Model order selection for multiple cooperative swarms clustering using stability analysis
Abbas Ahmadi a,*, Fakhri Karray b, Mohamed S. Kamel b

a Industrial Engineering Department, Amirkabir University of Technology, Tehran, Iran
b Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, Ontario, Canada

Article history: Available online 28 October 2010

Keywords: Model order selection; Data clustering; Particle swarm optimization; Cooperative swarms; Swarm intelligence

Abstract

Extracting different clusters of a given data set is an appealing topic in swarm intelligence applications. This paper introduces two main data clustering approaches based on particle swarm optimization, namely single swarm and multiple cooperative swarms clustering. A stability analysis is next introduced to determine the model order of the underlying data using multiple cooperative swarms clustering. The proposed approach is assessed using different data sets and its performance is compared with that of k-means, k-harmonic means, fuzzy c-means and single swarm clustering techniques. The obtained results indicate that the proposed approach fairly outperforms the other clustering approaches in terms of different cluster validity measures.

© 2010 Elsevier Inc. All rights reserved.

1. Introduction

Recognizing subgroups of the given data is of interest in data clustering. A vast number of clustering techniques have been developed to deal with unlabelled data based on different assumptions about the distribution, shape and size of the data. Most of the clustering techniques require a priori knowledge about the number of clusters [5,25], whereas some other approaches are capable of extracting such information [16].

Swarm intelligence approaches such as particle swarm optimization (PSO), biologically inspired by the social behavior of flocking birds [15], have been applied for clustering applications [1,3,7,8,19,22-24]. The goal of PSO-based clustering techniques is usually to find cluster centers. Most of the recent swarm clustering techniques use a single swarm approach to reach a final clustering solution [8,18,19]. Multiple swarms clustering has been recently proposed [3]. A multiple swarms clustering approach is useful to deal with high-dimensional data as it uses a divide-and-conquer strategy. In other words, it distributes the search space among multiple swarms, each of which explores its associated division while cooperating with the others. The novelty of this paper is to apply a stability analysis for determining the number of clusters in the underlying data using multiple cooperative swarms [4].

This paper is organized as follows. First, an introduction to cluster analysis is given. Particle swarm optimization and particle swarm clustering approaches are next explained. Then, model order selection using stability analysis is described. Finally, experiments using eight different data sets and concluding remarks are provided.

2. Cluster analysis

Organizing a set of unlabeled data points Y into several clusters using some similarity measure is known as clustering [9]. The notion of similarity between samples is usually represented using their corresponding distance.
* Corresponding author. Tel.: +98 21 222 39403. E-mail address: abbas.ahmadi@aut.ac.ir (A. Ahmadi).
doi:10.1016/j.ins.2010.10.010


Each cluster $C_k$ contains a set of similar data points given by $C_k = \{y_j^k\}_{j=1}^{n_k}$, where $y_j^k$ denotes data point j in cluster k, $n_k$ indicates the number of its associated data points and K is the number of clusters. Let us assume the solution of a clustering algorithm $A_K(Y)$ for the given data points Y of size N is presented by $T : A_K(Y)$, which is a vector of labels $T = \{t_i\}_{i=1}^{N}$, where $t_i$ denotes the obtained label for data point i and $t_i \in L = \{1, \ldots, K\}$.

The main approaches for grouping data are hierarchical and partitional clustering. The hierarchical clustering approach generates a hierarchy of clusters known as a dendrogram. The dendrogram can be broken at different levels to yield different clusterings of the data [13]. To build the dendrogram, agglomerative and divisive approaches are used. In an agglomerative approach, each data point is initially considered as a cluster. Then, the two closest clusters merge together and produce a new cluster. Merging close clusters is continued until all points form a single cluster. Different notions of closeness exist, namely single link, complete link and average link. In contrast to the agglomerative approach, the divisive approach begins with a single cluster containing all data points. It then splits the data points into two separate clusters. This procedure continues till each cluster includes a single data point [13].

In the partitional approach to data clustering, the aim is to partition a given data set into a pre-specified number of clusters. Various partitional clustering algorithms are available. The most famous one is the k-means algorithm. The k-means (KM) procedure initiates with k arbitrary random points as cluster centers. The algorithm assigns each data point to the nearest center. New centers are then computed based on the associated data points of each cluster. This procedure is repeated until no improvement is obtained after a certain number of iterations (a minimal sketch of this loop is given at the end of this overview). Unlike k-means, the k-harmonic means (KHM) algorithm, introduced by Zhang and Hsu [25], does not rely on the initial solution. It utilizes the harmonic average of distances from each data point to the centers. As compared to the k-means algorithm, it improves the quality of clustering results in certain cases [25]. Another extension to the k-means algorithm was suggested by Bezdek using fuzzy set theory [5]. This algorithm is known as fuzzy c-means (FCM) clustering, in which every data point is associated to each cluster with some degree of membership. Another class of partitional clustering is probabilistic clustering approaches, such as particle swarm-based clustering, which are developed using probability theory [14].

In particle swarm clustering, the intention is to find the centers of clusters such that an objective function is optimized. The objective function can be defined in terms of cluster validity measures. These measures are used to evaluate the quality of clustering techniques [11]. There are numerous measures, which mainly tend to minimize the intra-cluster distance and/or maximize the inter-cluster distance. Some widely used quality measures of clustering techniques are described next.
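For illustration only, the following is a minimal sketch of the k-means loop described above (not the paper's implementation). It assumes the data Y is a NumPy array of shape (N, d) and uses a fixed iteration budget as a simple stopping rule.

```python
import numpy as np

def kmeans(Y, k, n_iter=100, rng=np.random.default_rng(0)):
    # Start from k randomly chosen data points as cluster centers.
    centers = Y[rng.choice(len(Y), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign every data point to its nearest center (Euclidean distance).
        labels = np.argmin(np.linalg.norm(Y[:, None, :] - centers[None, :, :], axis=2), axis=1)
        # Recompute each center as the mean of its assigned points (keep old center if empty).
        new_centers = np.array([Y[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):   # no improvement: stop early
            break
        centers = new_centers
    return centers, labels
```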

2.1. Compactness measure

The compactness measure, also referred to as the within-cluster distance, indicates how compact the clusters are [9]. This measure is denoted by $F_c(m_1, \ldots, m_K)$ or simply by $F_c(M)$, where $M = (m_1, \ldots, m_K)$. Having K clusters, the compactness measure is defined as

$$F_c(M) = \frac{1}{K}\sum_{k=1}^{K}\frac{1}{n_k}\sum_{j=1}^{n_k} d(m_k, y_j^k), \qquad (1)$$
where $m_k$ denotes the center of cluster k and $d(\cdot)$ stands for the Euclidean distance. Clustering techniques tend to minimize this measure.

2.2. Separation measure

This measure, also known as the between-cluster distance, evaluates the separation of the clusters [9]. It is given by

$$F_s(M) = \frac{1}{K(K-1)}\sum_{j=1}^{K}\sum_{k=j+1}^{K} d(m_j, m_k). \qquad (2)$$

It is desirable to maximize this measure, or equivalently to minimize $-F_s(M)$.
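As a concrete illustration of Eqs. (1) and (2), the two measures can be computed as sketched below. This is only a sketch under the assumption that `centers` is a list of K center vectors and `clusters` is a list of (n_k, d) arrays holding the points assigned to each cluster; these variable names are not from the paper.

```python
import numpy as np

def compactness(centers, clusters):
    # F_c(M), Eq. (1): average, over clusters, of the mean distance to the cluster center.
    K = len(centers)
    total = 0.0
    for m_k, C_k in zip(centers, clusters):
        total += np.mean(np.linalg.norm(C_k - m_k, axis=1))
    return total / K

def separation(centers):
    # F_s(M), Eq. (2): average pairwise distance between cluster centers.
    K = len(centers)
    total = 0.0
    for j in range(K):
        for k in range(j + 1, K):
            total += np.linalg.norm(np.asarray(centers[j]) - np.asarray(centers[k]))
    return total / (K * (K - 1))
```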

2.3. Turi's validity index

Turi's validity index [21] is defined as

$$F_{Turi}(M) = (c \times \mathcal{N}(2,1) + 1) \times \frac{intra}{inter}, \qquad (3)$$

where c is a user-specified parameter, equal to unity in this paper, and $\mathcal{N}$ is a Gaussian distribution with $\mu = 2$ and $\sigma = 1$. The intra term denotes the within-cluster distance provided in Eq. (1). Also, the inter term is the minimum Euclidean distance between the cluster centers, computed by

$$inter = \min\{d(m_k, m_q)\}, \quad k = 1, 2, \ldots, K-1, \quad q = k+1, \ldots, K. \qquad (4)$$


The aim of the different clustering approaches is to minimize Turi's index.
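For illustration, Eqs. (3) and (4) might be evaluated as in the sketch below (same assumed `centers`/`clusters` layout as in the previous example). Drawing the Gaussian term once per evaluation is one possible reading of Eq. (3), not necessarily the paper's implementation.

```python
import numpy as np

def turi_index(centers, clusters, c=1.0, rng=np.random.default_rng(0)):
    # F_Turi(M) = (c * N(2, 1) + 1) * intra / inter, Eqs. (3) and (4).
    K = len(centers)
    # intra: the within-cluster distance of Eq. (1).
    intra = np.mean([np.mean(np.linalg.norm(C - m, axis=1))
                     for m, C in zip(centers, clusters)])
    # inter: minimum Euclidean distance between any two cluster centers, Eq. (4).
    inter = min(np.linalg.norm(np.asarray(centers[k]) - np.asarray(centers[q]))
                for k in range(K - 1) for q in range(k + 1, K))
    gaussian = rng.normal(loc=2.0, scale=1.0)   # one draw from N(mu = 2, sigma = 1)
    return (c * gaussian + 1.0) * intra / inter
```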

2.4. Dunn's index

Let us define $a(C_k, C_q)$ and $b(C_k)$ as

$$a(C_k, C_q) = \min_{x \in C_k,\, z \in C_q} d(x, z), \qquad (5)$$

$$b(C_k) = \max_{x, z \in C_k} d(x, z). \qquad (6)$$
Now, Dunn's index [10] can be computed as

$$F_{Dunn}(M) = \min_{1 \le k \le K} \left\{ \min_{k+1 \le q \le K} \left( \frac{a(C_k, C_q)}{\max_{1 \le \tilde{k} \le K} b(C_{\tilde{k}})} \right) \right\}. \qquad (7)$$
Clustering techniques are required to maximize Dunn's index.
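A minimal sketch of Eqs. (5)-(7), again assuming `clusters` is a list of per-cluster point arrays (an assumption of this example, not the paper's code):

```python
import numpy as np

def pairwise_distances(A, B):
    # Euclidean distance between every row of A and every row of B.
    return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)

def dunn_index(clusters):
    # F_Dunn(M), Eq. (7): smallest between-cluster distance a(C_k, C_q)
    # divided by the largest cluster diameter max_k b(C_k).
    K = len(clusters)
    max_diameter = max(pairwise_distances(C, C).max() for C in clusters)   # max b(C_k), Eq. (6)
    ratios = [pairwise_distances(clusters[k], clusters[q]).min() / max_diameter   # a(C_k, C_q), Eq. (5)
              for k in range(K - 1) for q in range(k + 1, K)]
    return min(ratios)
```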

2.5. S_Dbw index

Let the average scattering of the clusters be considered as a measure of compactness, expressed by

$$Scatt(K) = \frac{1}{K}\sum_{k=1}^{K} \frac{\|\sigma(C_k)\|}{\|\sigma(Y)\|}, \qquad (8)$$

where $\sigma(\cdot)$ stands for the variance of the data and $\|\cdot\|$ indicates the Euclidean norm. Then, the separation measure is given by

$$Sep(K) = \frac{1}{K(K-1)} \sum_{k=1}^{K} \sum_{\substack{q=1 \\ q \ne k}}^{K} \frac{D(z_{k,q})}{\max\{D(m_k), D(m_q)\}}, \qquad (9)$$

where $z_{k,q}$ is the middle point of the line segment defined by the cluster centers $m_k$ and $m_q$. Also, $D(m_k)$ denotes a density function around point $m_k$, which is estimated by $D(m_k) = \sum_{j=1}^{n_k} f(m_k, y_j^k)$, and

$$f(m_k, y_j^k) = \begin{cases} 1 & \text{if } d(m_k, y_j^k) < \tilde{\sigma}, \\ 0 & \text{otherwise}, \end{cases}$$

where $\tilde{\sigma} = \frac{1}{K}\sqrt{\sum_{k=1}^{K} \|\sigma(C_k)\|}$. Finally, the S_Dbw index [11,12] is defined as

$$F_{S\_Dbw}(M) = Scatt(K) + Sep(K). \qquad (10)$$

Minimizing this index is of interest when trying to cluster a set of data into several groups.
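One possible reading of Eqs. (8)-(10) in code is sketched below. Two assumptions are made that go beyond the text: the density around the midpoint z_{k,q} is taken over the points of clusters k and q (the paper only defines D(.) at cluster centers), and a guard against zero densities is added.

```python
import numpy as np

def s_dbw_index(centers, clusters, data):
    # F_S_Dbw(M) = Scatt(K) + Sep(K), Eqs. (8)-(10).
    K = len(centers)
    var_norms = [np.linalg.norm(np.var(C, axis=0)) for C in clusters]      # ||sigma(C_k)||
    scatt = np.mean(var_norms) / np.linalg.norm(np.var(data, axis=0))      # Eq. (8)
    sigma_tilde = np.sqrt(np.sum(var_norms)) / K                           # radius used by f(.)

    def density(point, members):
        # number of member points within distance sigma_tilde of `point`
        return np.sum(np.linalg.norm(members - np.asarray(point), axis=1) < sigma_tilde)

    sep = 0.0
    for k in range(K):
        for q in range(K):
            if q == k:
                continue
            z_kq = 0.5 * (np.asarray(centers[k]) + np.asarray(centers[q]))  # midpoint of the two centers
            union = np.vstack([clusters[k], clusters[q]])                   # assumption, see lead-in
            denom = max(density(centers[k], clusters[k]),
                        density(centers[q], clusters[q]), 1)                # guard against zero density
            sep += density(z_kq, union) / denom
    sep /= K * (K - 1)                                                      # Eq. (9)
    return scatt + sep                                                      # Eq. (10)
```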

3. Particle swarm clustering

Particle swarm optimization (PSO) is a search algorithm introduced for dealing with optimization problems [15]. The PSO procedure commences with an initial swarm of particles in an n-dimensional space and evolves through a number of iterations to find an optimal solution given a predefined objective function F. Each particle i is distinguished from others by its position and velocity vectors, denoted by $x_i$ and $v_i$, respectively. To choose a new velocity, each particle considers three components: its previous velocity, a personal best position and a global best position. The personal best and global best positions, denoted by $x_i^p$ and $x^*$, respectively, keep track of the best solutions obtained so far by the associated particle and by the swarm. Thus, the new velocity and position are updated as

$$v_i(t+1) = w\, v_i(t) + c_1 r_1 (x_i^p(t) - x_i(t)) + c_2 r_2 (x^*(t) - x_i(t)), \qquad (11)$$

$$x_i(t+1) = x_i(t) + v_i(t+1), \qquad (12)$$

where w indicates the impact of the previous history of velocities on the current velocity, $c_1$ and $c_2$ are the cognitive and social components, respectively, and $r_1$ and $r_2$ are generated randomly using a uniform distribution on the interval [0, 1]. If minimizing the objective function is of interest, the personal best position of particle i at iteration t can be computed as

$$x_i^p(t+1) = \begin{cases} x_i^p(t) & \text{if } F(x_i(t+1)) \ge F(x_i^p(t)), \\ x_i(t+1) & \text{otherwise}. \end{cases} \qquad (13)$$


Moreover, the global best position is updated as

$$x^*(t+1) = \arg\min_{x_i^p(t)} F(x_i^p(t)), \quad i = 1, 2, \ldots, n. \qquad (14)$$
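To make Eqs. (11)-(14) concrete, one PSO iteration over a swarm of n particles (stored as rows of x) for a minimization problem could look like the sketch below. The parameter values follow those reported in Section 5; F stands for any objective, e.g., a cluster validity measure. This is an illustrative sketch, not the authors' code.

```python
import numpy as np

def pso_step(x, v, x_pbest, x_gbest, F, w=1.2, c1=1.49, c2=1.49, rng=np.random.default_rng()):
    # One iteration of Eqs. (11)-(14) for n particles in d dimensions.
    n, d = x.shape
    r1 = rng.uniform(size=(n, d))
    r2 = rng.uniform(size=(n, d))
    v = w * v + c1 * r1 * (x_pbest - x) + c2 * r2 * (x_gbest - x)   # Eq. (11)
    x = x + v                                                        # Eq. (12)
    # Eq. (13): keep the personal best with the smaller objective value.
    fitness = np.array([F(p) for p in x])
    pbest_fitness = np.array([F(p) for p in x_pbest])
    improved = fitness < pbest_fitness
    x_pbest = np.where(improved[:, None], x, x_pbest)
    # Eq. (14): the global best is the best personal best of the swarm.
    best_fitness = np.array([F(p) for p in x_pbest])
    x_gbest = x_pbest[np.argmin(best_fitness)]
    return x, v, x_pbest, x_gbest
```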

The maximum number of iterations, the number of iterations with no improvement, and a minimum objective function value are common criteria used to terminate the PSO procedure [2]. The first strategy is adopted hereafter in this paper.

3.1. Single swarm clustering

In this approach, the position of particle i is expressed by $x_i = (m_1, \ldots, m_K)_i$ or simply by $x_i = (M)_i$, where $M = (m_1, \ldots, m_K)$ and $m_k$ denotes the center of cluster k. In other words, each particle contains a representative for the centers of all clusters. The representation of the particle position $x_i$ for the three-cluster case (K = 3) is illustrated in Fig. 1. To model the clustering problem as an optimization problem, it is required to formulate an objective function. The cluster validity measures described in Section 2 can be considered as the objective function. By considering $F(m_1, \ldots, m_K)$ or $F(M)$ as the required objective function, the PSO algorithm can explore the search space to find the cluster centers. When the dimensionality of the data increases and the number of clusters is large, a single swarm is not able to traverse all of the search space adequately. Instead, multiple cooperative particle swarms can be considered to determine the cluster centers [3].

3.2. Multiple cooperative swarms clustering

The multiple cooperative swarms clustering approach assumes that the number of swarms is equal to the number of clusters and that the particles of each swarm are candidates for the corresponding cluster's center. The procedure of multiple cooperative swarms clustering is completed through two main phases: distributing the search task among multiple swarms and building up a cooperation mechanism between the swarms. A more detailed description of the proposed distribution and cooperation strategies is given next.

3.2.1. Distribution strategy

The core idea is to divide the search space into different divisions $s_k$, $k \in [1, \ldots, K]$. Each division $s_k$ is denoted by its center $z^k$ and width $R_k$; i.e., $s_k = f(z^k, R_k)$. To distribute the search space into different divisions, a super-swarm is used. The super-swarm, which is a population of particles, aims to find the centers of the divisions $z^k$, $k \in [1, \ldots, K]$. Each particle of the super-swarm is defined as $(z^1, \ldots, z^K)$, where $z^k$ denotes the center of division k. By repeating the single swarm clustering procedure using one of the mentioned cluster validity measures as the objective function, the centers of the different divisions are obtained. Then, the widths of the divisions are computed by

$$R_k = a \lambda_{\max}^k, \quad k \in [1, \ldots, K], \qquad (15)$$

where a is a positive constant selected experimentally and $\lambda_{\max}^k$ is the square root of the biggest eigenvalue of the data points belonging to division k [3].
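As an illustration of Eq. (15), the widths can be derived from the points currently assigned to each division; treating $\lambda_{\max}^k$ as the square root of the largest eigenvalue of the covariance matrix of those points is an assumption of this sketch.

```python
import numpy as np

def division_widths(divisions, a=1.0):
    # R_k = a * lambda_max^k, Eq. (15); `divisions` is assumed to be a list of
    # (n_k, d) arrays holding the data points of each division.
    widths = []
    for D in divisions:
        cov = np.cov(D, rowvar=False)                       # covariance of the division's points
        lam_max = np.sqrt(np.max(np.linalg.eigvalsh(cov)))  # sqrt of largest eigenvalue
        widths.append(a * lam_max)
    return widths
```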

Fig. 1. Representation of particle position in single swarm clustering.


Fig. 2. Schematic representation of multiple swarms. First, the cooperation between multiple swarms initiates and each swarm investigates its associated division (a). When the particles of each swarm converge (b), the final solution for the cluster centers is revealed.

3.2.2. Cooperation strategy

After distributing the search space, each division is assigned to a swarm. That is, the number of swarms is equal to the number of divisions, or clusters, and the particles of each swarm are candidates for the corresponding cluster's center. In this stage, there is information exchange between the swarms and each swarm knows the global best, i.e., the best cluster center, of the other swarms obtained so far. Therefore, there is a cooperative search scheme in which each swarm explores its related division to find the best solution for the associated cluster center while interacting with the other swarms. A schematic representation of the multiple cooperative swarms is depicted in Fig. 2. In the multiple cooperative swarms clustering approach, the particles of each swarm are required to optimize the following problem:

$$\min F(M_i) \quad \text{s.t.} \quad m_i^1 \in s_1, \ldots, m_i^K \in s_K, \quad i \in [1, \ldots, n], \qquad (16)$$

where F denotes one of the cluster validity measures introduced in Section 2. The search procedure using multiple swarms is performed in a parallel scheme. First, n different solutions for the cluster centers are obtained using Eq. (16). The best solution is called a new candidate for the cluster centers, denoted by $M' = (m'_1, \ldots, m'_K)$. To update the cluster centers, the following rule is applied:

$$M^{(new)} = \begin{cases} M' & \text{if } F(M') \le F(M^{(old)}), \\ M^{(old)} & \text{otherwise}, \end{cases} \qquad (17)$$

where $M = (m_1, \ldots, m_K)$. In other words, if the objective value of the new candidate for the cluster centers ($M'$) is smaller than that of the former candidate ($M^{(old)}$), the new solution is accepted; otherwise, it is rejected. The overall algorithm of multiple cooperative swarms clustering is provided in Algorithm 1.

The PSO-based clustering approaches assume that the number of clusters is known in advance. In this paper, the notion of stability analysis is used to extract the number of clusters for the underlying data.

4. Model order selection using stability approach

Determining the number of clusters in data clustering is known as a model order selection problem. There exist two main stages in model order selection. First, a clustering algorithm should be chosen. Then, the model order needs to be extracted, given a set of data [16].


Algorithm 1: Multiple cooperative swarms clustering

Stage 1: Distribute the search space into K different divisions s_1, ..., s_K
- Obtain the centers of the divisions z^1, ..., z^K
- Obtain the widths of the divisions R_1, ..., R_K

Stage 2: Cooperate till convergence
- Explore the divisions by
  1.1. Computing the new positions and velocities of all particles of all swarms
  1.2. Determining the fitness value of all particles using the associated cluster validity measure
  1.3. Choosing a solution that minimizes the optimization problem provided in Eq. (16) and denoting it as the new candidate for the cluster centers (m'_1, ..., m'_K)
- Update the cluster centers
  2.1. If the objective value of the new candidate for the centers of the clusters (m'_1, ..., m'_K) is smaller than that of the previous iteration, accept the new solution; otherwise, reject it
  2.2. If the termination criterion is met, stop; otherwise, continue this stage
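A highly simplified sketch of one cooperation round (Stage 2 above, Eqs. (16) and (17)) is given next. It assumes `swarms` is a list of K swarm objects whose hypothetical `particles` attribute holds candidate centers already confined to their divisions, and it omits the velocity and personal-best bookkeeping of the full PSO update.

```python
def cooperative_swarms_step(swarms, validity_measure, M_old):
    # One cooperation round of Algorithm 1: build joint candidates, pick the best,
    # and accept it only if it improves on the previous centers (Eq. (17)).
    n = len(swarms[0].particles)          # assumed: every swarm holds n particles
    candidates = []
    for i in range(n):
        # assemble the i-th joint solution M_i = (m_i^1, ..., m_i^K) from all swarms
        M_i = [swarm.particles[i] for swarm in swarms]
        candidates.append(M_i)
    # choose the joint solution minimizing the validity measure, Eq. (16)
    M_new = min(candidates, key=validity_measure)
    return M_new if validity_measure(M_new) <= validity_measure(M_old) else M_old
```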

Most of the clustering approaches assume that the model order is known in advance. Here, we employ stability analysis to obtain the number of clusters when using the multiple cooperative swarms to cluster the underlying data. A description of stability analysis is provided before explaining the core algorithm.

The stability concept is used to evaluate the robustness of a clustering algorithm. In other words, the stability measure indicates how well the results of the clustering algorithm are reproducible on other data drawn from the same source. Some examples of stable and unstable clustering are shown in Fig. 3, where the aim is to cluster the presented data into two groups. As can be seen in Fig. 3, the data points shown in Fig. 3(a) provide a stable clustering solution in the sense that the same clustering results are obtained by repeating a clustering algorithm several times. However, the data points illustrated in Fig. 3(b) and (c) do not yield stable clustering solutions when two clusters are of interest. That is, different results are generated by running the clustering algorithm a number of times. Each line in Fig. 3 presents a possible clustering solution for the corresponding data. The reason for getting unstable clustering solutions in these cases is the inappropriate number of clusters. In other words, stable results are obtained for these data sets by choosing a suitable number of clusters; the proper numbers of clusters for these data are three and four, respectively.

As a result, one of the issues that affects the stability of the solutions produced by a clustering algorithm is the model order. For example, by assuming a large number of clusters, the algorithm generates random groups of data influenced by the changes observed in different samples. On the other hand, by choosing a very small number of clusters, the algorithm may compound separate structures together and return unstable clusters [16]. As a result, one can utilize the stability measure for estimating the model order of unlabeled data [4].

The multiple cooperative swarms clustering approach requires a priori knowledge of the model order. In order to enable this approach to estimate the number of clusters, the stability approach is taken into consideration. This paper uses the stability method introduced by Lange et al. [16] for the following reasons:

- it requires no information about the data,
- it can be applied to any clustering algorithm,
- it returns the correct model order using the notion of maximal stability.

The required procedure for model order selection using stability analysis is provided in Algorithm 2. A more precise schematic description of this algorithm is depicted in Fig. 4. The goal is to get the true cluster centers, denoted by (m_1, ..., m_K), for the given data Y. First, the underlying data is randomly divided into two halves Y_1 and Y_2. The multiple cooperative swarms approach is used to cluster these two halves and the obtained solutions are denoted by T_1 and T_2, respectively. Next, a classifier φ(Y_1) is trained using the first half of the data and its associated labels (Y_1, T_1).

Fig. 3. Examples of stable and unstable clustering when two clusters are desired: (a) stable; (b) unstable; (c) unstable.


Fig. 4. The schematic description of the model order selection algorithm.

Algorithm 2: Model order selection using stability analysis

for k ∈ [2 ... K] do
  for r ∈ [1 ... r_max] do
    Randomly split the given data Y into two halves Y_1, Y_2
    Cluster Y_1, Y_2 independently using an appropriate clustering approach; i.e., T_1 : A_k(Y_1), T_2 : A_k(Y_2)
    Use (Y_1, T_1) to train classifier φ(Y_1) and compute T'_2 = φ(Y_2)
    Calculate the distance of the two solutions T_2 and T'_2 for Y_2; i.e., d_r = d(T_2, T'_2)
    Again cluster Y_1, Y_2 by assigning random labels to the points
    Extend the random clustering as above, and obtain the distance of the solutions; i.e., dn_r
  end for
  Compute the stability stab(k) = mean_r(d)
  Compute the stability of random clusterings stab_rand(k) = mean_r(dn)
  s(k) = stab(k) / stab_rand(k)
end for
Select the model order k* such that k* = arg min_k {s(k)}
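A compact sketch of Algorithm 2 for a single model order k is shown below. Here `cluster_fn` stands for any clustering routine returning integer labels (multiple cooperative swarms clustering in the paper), `label_distance` is the permutation-minimized disagreement of Eq. (18) (a sketch is given in Section 4.2 below), and the KNN classifier with 25 neighbors follows the setting reported in Section 5. This is an assumed workflow, not the authors' implementation.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def stability(Y, cluster_fn, label_distance, k, r_max=30, n_neighbors=25,
              rng=np.random.default_rng(0)):
    # Returns s(k) = stab(k) / stab_rand(k), Eq. (20), for one model order k.
    d, d_rand = [], []
    for _ in range(r_max):
        idx = rng.permutation(len(Y))
        half = len(Y) // 2
        Y1, Y2 = Y[idx[:half]], Y[idx[half:]]
        T1, T2 = cluster_fn(Y1, k), cluster_fn(Y2, k)            # cluster both halves
        clf = KNeighborsClassifier(n_neighbors=n_neighbors).fit(Y1, T1)
        T2_pred = clf.predict(Y2)                                 # transfer the solution to Y2
        d.append(label_distance(T2, T2_pred, k))                  # Eq. (18)
        # baseline with random labels, used to normalize the stability
        R1, R2 = rng.integers(0, k, len(Y1)), rng.integers(0, k, len(Y2))
        clf_rand = KNeighborsClassifier(n_neighbors=n_neighbors).fit(Y1, R1)
        d_rand.append(label_distance(R2, clf_rand.predict(Y2), k))
    return np.mean(d) / np.mean(d_rand)

# Model order: k_star = min(range(2, K_max + 1), key=lambda k: stability(Y, cluster_fn, label_distance, k))
```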

Now, the trained classifier can be used to determine the labels for Y_2, denoted by T'_2. Consequently, there exist two different label sets for Y_2. The more similar the labels are, the more stable the results would be. The similarity of the obtained solutions can be stated in terms of their associated distance. Accordingly, if the distance is low, the obtained results are said to be stable. As revealed by the algorithm, the explained procedure is repeated several times (r_max) to ensure that the reported results are not generated by chance. Furthermore, we repeat the whole procedure for different values of k to extract the correct model order. Next, the most important aspects of the model order selection algorithm are explained.

4.1. Classifier φ(Y)

A set of labeled data is required for training a classifier φ. The data set Y_1 and its clustering solution from algorithm A_k, i.e., T_1 : A_k(Y_1), can be used to establish a classifier. A vast range of classifiers can be used for this classification task. In this paper, the k-nearest neighbor (KNN) classifier was chosen as it requires no assumption on the distribution of the data.


4.2. Distance of the solutions provided by clustering and by the classifier for the same data

Having a set of training data, the classifier can be tested using the test data Y_2. Its solution is given by T'_2 = φ(Y_2). However, there exists another solution for the same data obtained from the multiple cooperative swarms clustering technique, i.e., T_2 : A_k(Y_2). The distance between these two solutions is calculated by
$$d(T_2, T'_2) = \min_{\omega \in \varrho_k} \sum_{i=1}^{N} \vartheta(\omega(t_{2i}), t'_{2i}), \qquad (18)$$

where

$$\vartheta(t_{2i}, t'_{2i}) = \begin{cases} 1 & \text{if } t_{2i} \ne t'_{2i}, \\ 0 & \text{otherwise}. \end{cases} \qquad (19)$$

Also, $\varrho_k$ contains all permutations of the k labels and $\omega$ is the optimal permutation, which produces the maximum agreement between the two solutions [16].
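Under the assumption that labels are integers in {0, ..., k-1}, Eqs. (18) and (19) can be evaluated by brute force over all k! label permutations, as sketched below; this is practical only for small k, and the Hungarian algorithm is the usual scalable alternative.

```python
import numpy as np
from itertools import permutations

def label_distance(T2, T2_pred, k):
    # Eqs. (18)-(19): minimum number of label disagreements over all permutations
    # of the k cluster labels, so the comparison ignores how clusters are numbered.
    T2, T2_pred = np.asarray(T2), np.asarray(T2_pred)
    best = np.inf
    for perm in permutations(range(k)):
        mapping = np.array(perm)
        disagreements = np.sum(mapping[T2] != T2_pred)   # theta(.) summed over all points
        best = min(best, disagreements)
    return best
```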

Fig. 5. The effect of the random clustering on the selection of the model order: (a) stab(k), excluding random clustering; (b) s(k), including random clustering.

Table 1
Data sets selected from the UCI machine learning repository.

Data set                              Classes   Samples   Dimensionality
Iris                                  3         150       4
Wine                                  3         178       13
Teaching assistant evaluation (TAE)   3         151       5
Breast cancer                         2         569       30
Zoo                                   7         101       17
Glass identification                  7         214       9
Diabetes                              2         768       8


[Figure: eight panels (speech, iris, wine, teaching assistant evaluation, breast cancer, zoo, glass and diabetes data) plotting Turi's index against iterations for k-means, single swarm and multiple swarms clustering.]

Fig. 6. Comparing the performance of the multiple cooperative swarms clustering with k-means and single swarm clustering in terms of Turi's index.


4.3. Random clustering

The stability measure depends on the number of classes or clusters. For instance, an accuracy rate of 50% for binary classification is more or less the same as that of a random guess. However, this rate for k = 10 is much better than a random predictor. In other words, if a clustering approach yields the same accuracy for model orders k_1 and k_2, where k_1 < k_2, the clustering solution for k_2 is more reliable than the other solution. Hence, the primary stability measure obtained for a certain value k, stab(k) in Algorithm 2, should be normalized using the stability rate of a random clustering, stab_rand(k) [16]. Therefore, the final stability measure for the model order k is obtained as follows:

$$s(k) = \frac{stab(k)}{stab_{rand}(k)}. \qquad (20)$$

The effect of the random clustering is studied on the Zoo data set described in Section 5, where the model order of the data is determined using the k-means algorithm. The stability measure for different numbers of clusters, with and without using random clustering, is shown in Fig. 5. As depicted in Fig. 5, the model order of the zoo data using k-means clustering is recognized as 2 without considering random clustering, while it becomes 6, which is close to the true model order, by normalizing the primary stability measure by the stability of the random clustering.

4.4. Appropriate clustering approach

For a given data set, the algorithm does not provide the same result for multiple runs. Moreover, the model order is highly dependent on the type of clustering approach that is used in this algorithm (see Algorithm 2), and there is no specific emphasis in the work of Lange et al. [16] on the type of clustering algorithm that should be used. The k-means and k-harmonic means algorithms are either sensitive to the initial conditions or to the type of data. In other words, they cannot capture the true underlying patterns of the data, and consequently the estimated model order is not robust. However, PSO-based clustering methods such as single swarm or multiple cooperative swarms clustering do not rely on initial conditions; they are search schemes which can explore the search space more effectively and may escape from local optima. Moreover, as described earlier, multiple cooperative swarms clustering is more likely to reach the optimal solution compared with single swarm clustering, and it can provide more stable and robust solutions. The multiple cooperative swarms approach distributes the search space among multiple swarms and enables cooperation between the swarms, leading to an effective search strategy. Accordingly, we propose to use multiple cooperative swarms clustering in the stability analysis-based approach to find the model order of the given data.

5. Experimental results

The performance of the proposed approach is evaluated and compared with other approaches, namely single swarm clustering, k-means and k-harmonic means clustering, using eight different data sets. Seven of these are selected from the UCI machine learning repository [6], and the last one is a speech data set taken from the standard TIMIT corpus [17]. The names of the data sets chosen from the UCI machine learning repository and their associated numbers of classes, samples and dimensions are provided in Table 1. The speech data include four phonemes, /aa/, /ae/, /ay/ and /el/, from the TIMIT corpus. A total of 800 samples from these classes was selected, and twelve mel-frequency cepstral coefficients [20] have been considered as speech features.

Table 2
Average and standard deviation of different measures for speech data.

Method               Turi's index          Dunn's index       S_Dbw
K-means              0.8328 [0.8167]       0.0789 [0.0142]    3.3093 [0.327]
K-harmonic means     3.54e05 [2.62e05]     0.0769 [0.0001]    3.3242 [0.0001]
Single swarm         1.4539 [0.8788]       0.1098 [0.014]     1.5531 [0.0372]
Cooperative swarms   1.6345 [1.0694]       0.1008 [0.0153]    1.583 [0.0388]

Table 3
Average and standard deviation of different measures for iris data.

Method               Turi's index          Dunn's index       S_Dbw
K-means              0.4942 [0.3227]       0.1008 [0.0138]    3.0714 [0.2383]
K-harmonic means     0.82e05 [0.95e05]     0.0921 [0.0214]    3.0993 [0.0001]
Single swarm         0.8802 [0.4415]       0.3979 [0.0001]    1.4902 [0.0148]
Cooperative swarms   0.89 [1.0164]         0.3979 [0.0001]    1.48 [0.008]


The performance of the multiple cooperative swarms clustering approach is compared with the k-means and single swarm clustering techniques in terms of Turi's validity index over 80 iterations (Fig. 6). The results are obtained by repeating the algorithms over 30 independent runs. For these experiments, the parameters are set as w = 1.2 (decreasing gradually [2]), c1 = 1.49, c2 = 1.49 and n = 30 (for all swarms). Also, the number of clusters is considered to be equal to the number of classes.

Table 4
Average and standard deviation of different measures for wine data.

Method               Turi's index          Dunn's index       S_Dbw
K-means              0.2101 [0.3565]       0.016 [0.006]      3.1239 [0.4139]
K-harmonic means     2.83e07 [2.82e07]     190.2 [320.75]     2.1401 [0.0149]
Single swarm         0.3669 [0.4735]       0.1122 [0.0213]    1.3843 [0.0026]
Cooperative swarms   0.7832 [0.8564]       0.0848 [0.009]     1.3829 [0.0044]

Table 5
Average and standard deviation of different measures for TAE data.

Method               Turi's index          Dunn's index       S_Dbw
K-means              0.6329 [0.7866]       0.0802 [0.0306]    3.2321 [0.5205]
K-harmonic means     1.36e06 [1.23e06]     0.123 [0.0001]     2.7483 [0.0001]
Single swarm         0.5675 [0.6525]       0.1887 [0.0001]    1.4679 [0.0052]
Cooperative swarms   0.7661 [0.7196]       0.1887 [0.0001]    1.4672 [0.004]

Table 6
Average and standard deviation of different measures for breast cancer data.

Method               Turi's index          Dunn's index         S_Dbw
K-means              0.1711 [0.1996]       0.0173 [0.0001]      2.1768 [0.0001]
K-harmonic means     0.88e08 [0.55e08]     7.0664 [38.519]      1.8574 [0.0203]
Single swarm         0.62 [0.7997]         217.59 [79.079]      1.7454 [0.079]
Cooperative swarms   0.6632 [0.654]        245.4857 [53.384]    1.7169 [0.0925]

Table 7
Average and standard deviation of different measures for zoo data.

Method               Turi's index          Dunn's index       S_Dbw
K-means              0.8513 [1.0624]       0.2228 [0.0581]    2.5181 [0.2848]
K-harmonic means     1.239 [1.5692]        0.3168 [0.0938]    2.3048 [0.1174]
Single swarm         5.5567 [3.6787]       0.5427 [0.0165]    2.0528 [0.0142]
Cooperative swarms   6.385 [4.6226]        0.5207 [0.0407]    2.0767 [0.025]

Table 8
Average and standard deviation of different measures for glass identification data.

Method               Turi's index          Dunn's index       S_Dbw
K-means              0.7572 [0.9624]       0.0286 [0.001]     2.599 [0.2571]
K-harmonic means     0.89e05 [1.01e05]     0.0455 [0.0012]    2.0941 [0.0981]
Single swarm         4.214 [3.0376]        0.1877 [0.0363]    2.6797 [0.3372]
Cooperative swarms   6.0543 [4.5113]       0.225 [0.1034]     2.484 [0.1911]

Table 9
Average and standard deviation of different measures for diabetes data.

Method               Turi's index          Dunn's index       S_Dbw
K-means              0.243 [0.3398]        0.0137 [0.0001]    2.297 [0.0001]
K-harmonic means     1.88e07 [1.9e07]      153.68 [398.42]    2.0191 [0.353]
Single swarm         0.2203 [0.2621]       1298.1 [0.0001]    1.5202 [0.027]
Cooperative swarms   0.3053 [0.3036]       1298.1 [0.0001]    1.5119 [0.0043]


[Figure: stability measure s(k) plotted against model order k (k = 2, ..., 7) for the speech, iris, wine and TAE data sets (columns), using k-means, k-harmonic means, fuzzy c-means, single swarm and multiple swarms clustering (rows).]

Fig. 7. Stability measure as a function of model order: speech, iris, wine and TAE data sets.

As illustrated in Fig. 6, multiple cooperative swarms clustering provides better results than k-means, as well as single swarm clustering, in terms of Turi's index for the majority of the data sets. In Tables 2-9, the multiple cooperative swarms clustering is compared with the other clustering approaches using different cluster validity measures over 30 independent runs. The results presented for the different data sets are in terms of average and standard deviation ([σ]) values. As observed in Tables 2-9, multiple swarms clustering is able to provide better results in terms of the different cluster validity measures for most of the data sets. This is because it is capable of handling multiple-objective problems, in contrast to k-means and k-harmonic means clustering, and it distributes the search space between multiple swarms and solves the problem more effectively.


[Figure: stability measure s(k) plotted against model order k (k = 2, ..., 7) for the breast cancer, zoo, glass identification and diabetes data sets (columns), using k-means, k-harmonic means, fuzzy c-means, single swarm and multiple swarms clustering (rows).]

Fig. 8. Stability measure as a function of model order: breast cancer, zoo, glass identification and diabetes data sets.

Now, the stability-based approach for model order selection in multiple cooperative swarms clustering is studied. The PSO parameters are kept the same as before, r_max = 30, and k = 25 is used for the KNN classifier. The stability measures of different model orders for the multiple cooperative swarms and the other clustering approaches using the different data sets are presented in Figs. 7 and 8. The results for the speech, iris, wine and teaching assistant evaluation data sets are provided in Fig. 7, and those for the last four data sets, namely breast cancer, zoo, glass identification and diabetes, are shown in Fig. 8.

Table 10
The best model order (k*) for the data sets.

Data set        Real model order   KM   KHM   FCM   Single swarm         Multiple swarms
                                                    Turi  Dunn  S_Dbw    Turi  Dunn  S_Dbw
Speech          4                  2    2     2     7     2     2        4     2     2
Iris            3                  2    3     2     7     2     2        3     4     2
Wine            3                  2    7     2     4     4     2        3     5     3
TAE             3                  2    2     2     3     2     2        4     2     2
Breast cancer   2                  2    3     2     2     2     2        2     2     2
Zoo             7                  6    2     4     4     2     2        6     2     2
Glass           7                  2    3     2     4     2     2        7     2     2
Diabetes        2                  2    3     4     2     2     2        2     2     2
In these figures, k and s(k) indicate the model order and the stability measure for the given model order k, respectively. The corresponding curves for the single swarm and multiple swarms clustering approaches are obtained using Turi's validity index. According to Figs. 7 and 8, the proposed approach using multiple cooperative swarms clustering is able to identify the correct model order for most of the data sets. Moreover, the best model order for the different data sets can be obtained as provided in Table 10. The minimum value of the stability measure given any clustering approach is considered as the best model order (k*), i.e.,

$$k^* = \arg\min_k \{s(k)\}. \qquad (21)$$

As presented in Table 10, the k-means, k-harmonic means and fuzzy c-means clustering approaches do not converge to the true model order using the stability-based approach for most of the data sets. The performance of the single swarm clustering is partially better than that of k-means, k-harmonic means and fuzzy c-means clustering because it does not depend on initial conditions and it can escape from local optimal solutions. Moreover, the multiple cooperative swarms approach using Turi's index provides the true model order for the majority of the data sets. As a result, Turi's validity index is appropriate for model order selection using the proposed clustering approach. The performance of the proposed approach based on Dunn's index and the S_Dbw index is also considerable as compared to the other clustering approaches. Consequently, by using the introduced stability-based approach, the proposed multiple cooperative swarms can provide better estimates of the model order, as well as more stable clustering results, as compared to the other clustering techniques.

6. Conclusion

A new bio-inspired multiple cooperative swarms algorithm was described to deal with the clustering problem. A stability analysis-based approach was introduced to estimate the model order for the multiple cooperative swarms clustering. We proposed to use multiple cooperative swarms clustering to find the model order of the data, due to its robustness and stable solutions. The performance of the proposed approach has been evaluated using eight different data sets. The experiments indicate that the proposed approach produces better results as compared with the k-means, k-harmonic means, fuzzy c-means and single swarm clustering approaches. In the future, we will investigate other similarity measures, as the Euclidean distance works well only when a data set contains compact or isolated clusters. Furthermore, we will study other stability measures, since the measure used here requires a considerable computational burden to discover a suitable model order.

References
[1] A. Abraham, C. Grosan, V. Ramos (Eds.), Swarm Intelligence in Data Mining, Springer, 2006.
[2] A. Abraham, H. Guo, H. Liu, Swarm Intelligence: Foundations, Perspectives and Applications, Studies in Computational Intelligence, Springer-Verlag, Germany, 2006.
[3] A. Ahmadi, F. Karray, M. Kamel, Multiple cooperating swarms for data clustering, in: IEEE Swarm Intelligence Symposium, 2007, pp. 206-212.
[4] A. Ahmadi, F. Karray, M. Kamel, Model order selection for multiple cooperative swarms clustering using stability analysis, in: IEEE Congress on Evolutionary Computation within IEEE World Congress on Computational Intelligence, Hong Kong, 2008, pp. 3387-3394.
[5] J. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms, Plenum Press, New York, 1981.
[6] C. Blake, C. Merz, UCI Repository of Machine Learning Databases. <http://www.ics.uci.edu/mlearn/MLRepository.html>, 1998.
[7] C. Chen, F. Ye, Particle swarm optimization algorithm and its application to clustering analysis, in: IEEE International Conference on Networking, Sensing and Control, 2004, pp. 789-794.
[8] X. Cui, J. Gao, T.E. Potok, A flocking based algorithm for document clustering analysis, Journal of Systems Architecture 52 (8-9) (2006) 505-515.
[9] R. Duda, P. Hart, D. Stork, Pattern Classification, John Wiley and Sons, 2000.
[10] J. Dunn, A fuzzy relative of the isodata process and its use in detecting compact well-separated clusters, Cybernetics 3 (1973) 32-57.
[11] M. Halkidi, Y. Batistakis, M. Vazirgiannis, On clustering validation techniques, Intelligent Information Systems 17 (2-3) (2001) 107-145.
[12] M. Halkidi, M. Vazirgiannis, Clustering validity assessment: finding the optimal partitioning of a data set, in: International Conference on Data Mining, 2001, pp. 187-194.
[13] A. Jain, M. Murty, P. Flynn, Data clustering: a review, ACM Computing Surveys 31 (3) (1999) 264-323.
[14] M. Kazemian, Y. Ramezani, C. Lucas, B. Moshiri, Swarm Clustering Based on Flowers Pollination by Artificial Bees, Swarm Intelligence in Data Mining, Springer, 2006, pp. 191-202.
[15] J. Kennedy, R. Eberhart, Particle swarm optimization, in: IEEE International Conference on Neural Networks, vol. 4, 1995, pp. 1942-1948.


[16] T. Lange, V. Roth, M. Braun, J. Buhmann, Stability-based validation of clustering solutions, Neural Computation 16 (2004) 1299-1323.
[17] National Institute of Standards and Technology, TIMIT Acoustic-Phonetic Continuous Speech Corpus, Speech Disc 1-1.1, NTIS Order No. PB91-505065, 1990.
[18] M. Omran, A. Engelbrecht, A. Salman, Particle swarm optimization method for image clustering, International Journal of Pattern Recognition and Artificial Intelligence 19 (3) (2005) 297-321.
[19] M. Omran, A. Salman, A. Engelbrecht, Dynamic clustering using particle swarm optimization with application in image segmentation, Pattern Analysis and Applications 6 (2006) 332-344.
[20] M. Seltzer, Sphinx III signal processing front end specification, Technical Report, CMU Speech Group, 1999.
[21] R. Turi, Clustering-based colour image segmentation, Ph.D. Thesis, Monash University, Australia, 2001.
[22] D. van der Merwe, A. Engelbrecht, Data clustering using particle swarm optimization, in: Proceedings of the IEEE Congress on Evolutionary Computation, vol. 1, 2003, pp. 215-220.
[23] X. Xiao, E. Dow, R. Eberhart, Z. Miled, R. Oppelt, Gene clustering using self-organizing maps and particle swarm optimization, in: IEEE Proceedings of the International Parallel Processing Symposium, 2003, p. 10.
[24] F. Ye, C. Chen, Alternative KPSO-clustering algorithm, Tamkang Journal of Science and Engineering 8 (2) (2005) 165-174.
[25] B. Zhang, M. Hsu, K-harmonic means: a data clustering algorithm, Technical Report, Hewlett-Packard Labs, HPL-1999-124.
