Improved Clustering using Hierarchical Approach

1 Megha Gupta, 2 Vishal Shrivastava
1. M.Tech Scholar (Rajasthan Technical University, Kota)
2. M.Tech, Associate Professor (Rajasthan Technical University, Kota)
India

Abstract- This paper presents methods for finding better and improved clusters by using partitioning methods along with hierarchical methods. The k-means algorithm is used to find clusters, and the resulting clusters are further divided using the CFNG (Colored Farthest Neighbor Graph). The final clusters are better and more compact than the clusters formed by the k-means algorithm alone.

Keywords: Data mining, K-means algorithm, CFNG

I. INTRODUCTION

With the enormous amount of data stored in files, databases, and other repositories, it is increasingly important, if not necessary, to develop powerful means for the analysis and interpretation of such data, and for the extraction of interesting knowledge that could help in decision-making. Data mining [4], also popularly known as Knowledge Discovery in Databases (KDD), refers to the nontrivial extraction of implicit, previously unknown, and potentially useful information from data in databases. While data mining and knowledge discovery in databases (or KDD) are frequently treated as synonyms, data mining is actually one part of the knowledge discovery process.

This paper presents a new method for developing clusters using the k-means algorithm (a partitioning method) along with the CFNG (Colored Farthest Neighbor Graph). Cluster analysis studies the problem of finding groups of similar objects. Given a set of objects, the problem amounts to splitting the set into clusters or groups such that the objects in each group are more similar to each other than to the objects in other groups. Clustering is the organization of data into classes. In clustering, class labels are unknown, and it is up to the clustering algorithm to discover acceptable classes.
Clustering is also known as unsupervised classification, because the classification is not directed by given class labels. Many clustering approaches are based on the principle of maximizing the similarity between objects in the same class and minimizing the similarity between objects of different classes.

II. METHODS AND TECHNIQUES

The methods used in this paper are the k-means algorithm (a partitioning method) and the CFNG (a hierarchical clustering method).

2.1 K-means algorithm

The k-means algorithm [1] is a partitioning method in which each cluster's center is represented by the mean value of the objects in the cluster.

Input:
K: the number of clusters
D: a data set containing n objects
Output: a set of k clusters

2.1.1 Algorithm
1. arbitrarily choose k objects from D as the initial cluster centers;
2. repeat
3. (re)assign each object to the cluster to which the object is the most similar,
4. based on the mean value of the objects in the cluster;

International Journal of Computer Trends and Technology (IJCTT), Volume 4, Issue 6, June 2013. ISSN: 2231-2803, http://www.ijcttjournal.org, Page 1565
5. update the cluster means, that is, calculate the mean value of the objects for each cluster;
6. until no change.

2.2 CFNG (Colored Farthest Neighbor Graph)

The basic approach of the CFNG is to split one set of objects into two subsets, in two steps. In the first step, the objects are used to build a farthest neighbor graph (FNG). Then, using two different colors, the graph's vertices are colored, and finally the vertices are separated into two subsets, one for each color. Objects in each subset tend to be near each other and far from the objects in the other subset. This process continues, splitting the set into smaller and smaller subsets, until single objects are obtained. The following steps are followed to obtain the FNG.

Graph Construction

A graph is a set of objects, along with a set of links between some of the objects. Each object in a graph is known as a vertex, and each link is known as an edge. The set of vertices is usually denoted by V and the set of edges by E. The FNG consists of the objects and the edges between them: in this graph, each object is connected to its farthest neighbor.

Vertex Coloring

A graph can be colored so that the two endpoints of every edge receive different colors. To color a graph, vertices do not receive actual visible colors: a proper vertex coloring of a graph assigns each vertex a label such that adjacent vertices get different labels. If k different labels are used in the coloring, it is referred to as a k-coloring. The four color theorem states that any planar graph can be properly colored with at most four colors [1], that is, so that any two adjacent vertices (regions sharing a border, on a map) receive different colors. Once the FNG is built, the method uses such a partition coloring to split the objects into two well-separated subsets.
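As an illustrative sketch (our own, not the paper's implementation), the FNG construction and the forest-based two-coloring described above might look like the following; the function names and the use of NumPy are assumptions:

```python
import numpy as np

def farthest_neighbor_graph(X):
    """Build the FNG: connect each object to its farthest neighbor."""
    # pairwise Euclidean distances between all objects
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    edges = set()
    for i in range(len(X)):
        j = int(np.argmax(d[i]))           # farthest neighbor of object i
        edges.add((min(i, j), max(i, j)))  # store as an undirected edge
    return edges

def two_color(n, edges):
    """2-color a BFS spanning forest of the graph, so the endpoints of
    every tree edge get different colors (the forest trick noted in the
    text); non-tree edges are ignored."""
    adj = [[] for _ in range(n)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    color = [-1] * n
    for start in range(n):
        if color[start] == -1:             # root of a new tree in the forest
            color[start] = 0
            queue = [start]
            while queue:
                u = queue.pop(0)
                for w in adj[u]:
                    if color[w] == -1:     # tree edge: alternate the color
                        color[w] = 1 - color[u]
                        queue.append(w)
    return color
```

For two well-separated pairs of points, the two colors recover the pairs: objects near each other land in one subset, and far objects in the other, as the text describes.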
For an arbitrary graph, a coloring can be found quickly with a simple greedy algorithm, which starts with an uncolored graph and visits the vertices in some order. When it visits a vertex, it examines the colors of the neighboring vertices. If none of the neighbors has been colored yet, it chooses a color for the vertex arbitrarily; if some neighbors have been colored, it chooses a color different from theirs. Such an algorithm always produces a proper coloring, but finding a coloring that uses the minimum number of colors is very hard. So, instead of coloring the whole graph directly, a forest is used for this purpose. A forest is a collection of trees; a tree is a set of vertices and edges in which any two vertices are connected by a single path, so a forest contains no cycles.

The colored farthest neighbor graph shares many characteristics with the SFN (shared farthest neighbors) approach of Rovetta and Masulli [3]. This algorithm yields binary partitions of objects into subsets, whereas the number of subsets obtained by SFN can vary. The SFN algorithm can easily split a cluster where no natural partition exists, while the CFNG often avoids such splits.

2.2.1 Algorithm

S is a set of objects in a metric space.

BuildTreeWithStack(S)
  clear stack
  (A, B) <- split(S)
  push(A), push(B)
  root <- a new tree node
  root.(left, right) <- (A, B)
  while stack is not empty do
    B <- pop()
    A <- pop()
    if |A| > 1 or |B| > 1 then
      (c, d) <- split(A)
      (e, f) <- split(B)
      if one of c, d, e, f is in the wrong subset then
        adjust A and B (merge one of c, d, e, f into A or B)
        (c, d) <- split(A)
        (e, f) <- split(B)
      end
      A.(left, right) <- (c, d)
      B.(left, right) <- (e, f)
      push(c), push(d), push(e), push(f)
    end
  end
  return root

III. CONCLUSION

In this paper, we have analyzed the k-means algorithm and the CFNG for hierarchical clustering, with the aim of obtaining better and more compact clusters. Other algorithms can also be used for the hierarchical step. With hierarchical clustering, we arrive at tightly bonded clusters: with k-means we obtained two clusters, one strong and the other having a weak tie, and by further clustering the strong cluster with the CFNG we finally obtained three clusters, two strong and one having a weak tie.
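To summarize the overall procedure, the following sketch (our illustration, not the authors' code) runs k-means and then divides a resulting cluster further; the function names, the NumPy usage, and the simplified farthest-pair split standing in for the full CFNG split are all assumptions:

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Basic k-means (Section 2.1): assign each object to the nearest
    cluster mean, then recompute the means, until nothing changes."""
    rng = np.random.default_rng(seed)
    # arbitrarily choose k objects as the initial cluster centers
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(X[:, None] - centers[None], axis=-1)
        labels = d.argmin(axis=1)          # (re)assign to nearest center
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):      # until no change
            break
        centers = new
    return labels, centers

def split_farthest(S):
    """Split a set of points into two subsets seeded by the farthest pair,
    a simplified stand-in for the CFNG split of Section 2.2: each point
    joins the side of whichever seed it is closer to."""
    d = np.linalg.norm(S[:, None] - S[None], axis=-1)
    i, j = np.unravel_index(int(d.argmax()), d.shape)  # farthest pair
    near_i = d[:, i] <= d[:, j]
    return S[near_i], S[~near_i]
```

For example, after `labels, _ = kmeans(X, 2)`, the stronger cluster `X[labels == 0]` can be passed to `split_farthest` to obtain the finer partition the conclusion describes.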
REFERENCES
[1] Carl Endorf, Gene Schultz & Jim Mellander. Intrusion Detection & Prevention. McGraw-Hill Osborne Media, first edition, 2003.
[2] M. Charikar, C. Chekuri, T. Feder & R. Motwani. Incremental Clustering and Dynamic Information Retrieval. Proceedings of the 29th ACM Symposium on the Theory of Computing, 1997.
[3] S. Rovetta, F. Masulli. Shared farthest neighbor approach to clustering of high dimensionality, low cardinality data. Pattern Recognition 39 (12) (2006) 2415-2425.
[4] Jiawei Han, Micheline Kamber, Jian Pei. Data Mining: Concepts and Techniques.