
International Journal of Computer Science Engineering and Information Technology Research (IJCSEITR)


ISSN(P): 2249-6831; ISSN(E): 2249-7943
Vol. 6, Issue 5, Oct 2016, 37-52
TJPRC Pvt. Ltd.

COMPONENT BASED SELECTION TECHNIQUE USING CLUSTERING


NEHA BANSAL¹ & JAGDEEP KAUR²
¹Research Scholar, Department of CSE, The North Cap University, Gurgaon, Haryana, India
²Department of CSE, The North Cap University, Gurgaon, Haryana, India

ABSTRACT
Component Based Software Engineering (CBSE) is an expanding technology that offers various advantages in the field of software reuse. Software component selection is a major concern in this approach, since reuse depends on the efficient discovery of software components. The issue normally experienced is finding and fetching components from the repository, as a large number of similar components are available. This paper discusses a new similarity measure that narrows the search and assists users in selecting the component best suited to their objectives. The paper also contains a case study in which components are taken from a repository and the new similarity measure is applied to obtain the most suitable component. Finally, various similarity measures are compared in terms of time and efficiency, so that the fetched components can be reused in the future.

KEYWORDS: Clustering, Clustering Analysis, Component Based Software Engineering (CBSE), Commercial Off-the-Shelf (COTS), Similarity Measure

Received: Jul 07, 2016; Accepted: Aug 18, 2016; Published: Sep 22, 2016; Paper Id.: IJCSEITROCT20165

Original Article

INTRODUCTION

Software engineering is the application of a well-organized, controlled, measurable approach to the development, operation, and maintenance of software. The main aim of a software company is to deliver projects on time and within budget. Nowadays, projects are becoming very intricate because of the large number of functional and non-functional requirements and the need to integrate components that come from different vendors. Meeting the stated objectives therefore becomes difficult. To attain them, software engineering researchers have evolved procedures that help the software team move towards its target. Basically, a project begins with documenting the requirements and then reaches its important milestones, i.e. designing and implementing the documented requirements. Operation and maintenance are generally the last step in the development procedure. As the system grows, all suitable and required documents are enhanced and compiled. At the same time, software engineering stresses the significance of choosing a suitable technique and structural design. In reality, a software team often fails to follow these instructions [1] for a variety of reasons, such as: (1) Lack of project development time. (2) Requirements of the project are often altered. (3) The description of the project changes from time to time. (4) Scarcity of human resources. (5) Human factors such as idleness, inaccuracy in design or implementation, difficulty working as a team, workers changing companies, and differences among workers in knowledge, working style and technical skills. (6) Developers failing to follow the system's structural design instructions.

www.tjprc.org

editor@tjprc.org


To some extent, Component Based Software Engineering [2] tries to resolve these problems. It involves the rapid assembly and maintenance of component-based systems (CBS), where components and platforms have validated characteristics, and these validated characteristics provide the basis for estimating the characteristics of systems built from the components. The driving force behind this approach is to produce more reliable systems within a reasonable time limit.
CBSE deals with reusing code that was designed and tested earlier, in order to reduce the cost required to develop a new system. The system under development can depend partly or totally on components that have already been developed. The actual benefit of CBSE [3] is to minimize development and testing time and to produce robust, easily modified systems. The decision to use CBSE should be taken before development of a system begins, because it requires essential changes in the process, can create additional barriers, and also influences the decomposition of the problem at design time.
Building a system from already developed components raises various issues. It is very hard to discover components that exactly match the requirements of the system under development. Additionally, every component has its own functionality, and sometimes it is not possible to integrate a component with the rest of the system. However, if component developers take part in selection by adapting their assets to end-user needs, the probability of discovering and fetching suitable components rises. A clear understanding of the components and of the end-user requirements gives the best solution to these problems.
Reuse of components requires component repositories [4] (libraries) in which components can easily be saved, fetched and altered. The main aim of a component repository is to find and fetch components that can be reused. Reuse candidates can be executable objects, program code, records, documents, test suites, projects, libraries, etc. This information can be of various types, such as textual, pictorial, audio or video, and needs a visual presentation that enhances the user's perception of the components and maximizes the chance of fetching the appropriate ones. As the data in the repository grows day by day, the difficulty of fetching the right component also increases. Fetching a component from the repository requires a search algorithm that helps extract the component described by the user in the query.

Figure 1: Clustering Process


As shown in Figure 1, clustering [5] is a step-by-step method in which related components are grouped into one cluster and unrelated ones into another. The initial step in the clustering procedure is choosing the components and the characteristics that need to be clustered. Then a similarity measure [6] is used to find out which components are most alike, based on their characteristics. Next, a clustering algorithm is used to group the related components. Lastly, the outcome is analyzed. The outcome of clustering depends entirely on the description of the data, the chosen characteristics, the similarity measure applied and the clustering algorithm. The outcome may also vary depending on the domain in which clustering is employed. Clustering is a significant part of text mining, where it describes the method of grouping components with equivalent attributes into clusters, which improves the range and exactness of mining practices. One significant characteristic of clustering is that all the items within a cluster share related characteristics, while items in different clusters are unrelated in the same sense. In software engineering, components inside the same cluster have high cohesion and low coupling. These major points are to be kept in mind while clustering, with the focus on resolving user issues within time and cost constraints. The main reason for using a similarity measure with a clustering algorithm is to fetch components that can be reused in the future. The details are discussed in later sections.

Impact Factor (JCC): 7.1293
NAAS Rating: 3.63
The rest of the paper is structured as follows: section II covers related work, section III presents various similarity measures, section IV includes a case study, section V gives a detailed explanation of the proposed work, and section VI gives the results and experimental analysis using MATLAB R2013a. The conclusion of the paper is given in section VII.
RELATED WORK
Various clustering algorithms, such as k-means, agglomerative and density-based algorithms, have been discussed in the literature. Some of these algorithms give an exact solution but perform poorly on large search spaces. However, a number of authors have worked to improve the performance of these clustering algorithms.
Carina Frota Alves, et al. [6] state that CBSE is a crucial approach that places more emphasis on building larger systems. The main aim is to improve the quality of the software while reducing its cost and time. Various COTS techniques consider quality attributes only during the later stages of development, which increases the chances of uncertainty and failure of the final system. Non-functional requirements should therefore be captured early in the life cycle of the system. The authors accordingly proposed the CRE model, which helps in analyzing and selecting COTS components.
Anil Jadhav, et al. [7] proposed the HKBS method, which combines rule-based and case-based approaches. The authors also compared various COTS techniques, such as AHP (Analytical Hierarchical Processing) and WSM (Weight Score Method), with the proposed HKBS and found that HKBS overcomes the drawbacks of these techniques. HKBS is beneficial in situations such as a change in the similarity assessment criteria, a sudden change in requirements, changes in the analysis criteria, and the problem of ranking the components.
Syed Ahsan Fahmi, et al. [8] discussed a Case Based Reasoning (CBR) approach for selecting the best component among the available listed components. The selection procedure includes: 1) determining the similarity assessment criteria; 2) performing the component search; 3) filtering the search results according to the requirements; 4) analyzing the components; 5) evaluating the assessment data and selecting the component that best suits the criteria.
N. Md. Jubair Basha, et al. [9] proposed a method of component selection using clustering. To identify components, low coupling and high cohesion are required. It is easy to extract a component when it is loosely coupled, and this principle increases the repeated use of components. By identifying the components to be redesigned, the proposed CBO measure helps decrease the coupling between objects. This in turn helps increase the productivity of the organization.
Wen Zhang, et al. [10] proposed Maximum Capturing, an approach to document clustering. It comprises two strategies: 1) construction of document clusters and 2) assignment of document topics. In constructing document clusters, the objective is to group the pairs of documents that have the highest similarity, and an MST algorithm is used to generate the clusters. In assigning document topics, the topic is the most frequent item set of the cluster, and a topic reduction procedure is used to eliminate overlaps between cluster topics. Maximum Capturing shows better results in grouping documents into clusters.
Amit Kumar, et al. [11] promote a keyword classification approach that helps in classifying components. For classifying components or modules, the flexible approach is the faceted classification system. The enumerated system does not give the repository the flexibility to classify components or modules in more than one way; it only provides a fast way to look up a component within the library. The attribute value system permits classification of the same component in different ways, but restricts its ordering. The attribute value strategy is used to split the repository. A combination of the keyword approach and the faceted system is then used by the library to provide an interface through which users learn how to store, fetch and browse components. To fetch components, the user provides a query with all the details about the component, a search is performed in the repository, and the result is returned. The end user then selects the desired component according to its relevance. The keyword approach is used to arrange the components within clusters: components of a similar type are placed in the same cluster and components of different types in separate clusters.
Chintakindi Srinivas, et al. [12] introduced a new similarity measure function, XNOR, to find the similarity between documents and components. The clustering algorithm can take a set of components, documents or entities as input and produces its output in the form of clusters. The algorithm proceeds as follows: 1) the documents are preprocessed; 2) the frequent dataset of all documents is discovered; 3) the documents are clustered using the XNOR similarity function.
K. A. Abdul Nazir, et al. [13] introduce an improved K-means algorithm. The K-means algorithm is used for clustering components, but it lacks accuracy because of the unsystematic way in which the centroids are chosen. The improved algorithm provides a well-organized way to find the initial centroids and an effective method of assigning the data points to clusters.

Shweta Yadav, et al. [14] focus on the correct retrieval of software components or modules. Various approaches have been applied, but the results are not accurate from the user's perspective. The authors therefore defined an algorithm that combines keyword and semantic approaches for the accurate retrieval of components. With the semantic technique, each fetched component is reported with its percentage of similarity. Uncertainty was decreased by using the semantic method as opposed to the keyword-based one. This method assists the user in selecting a component according to need, as the ranking is based on the frequency and description of the component.
SIMILARITY MEASURE
Similarity measures play an important role in the field of mining. They are used to determine the similarity between components. The binary similarity measures [15] are listed in Table 1.


Table 1: Similarity Measure

Similarity Measure    Formula (F)
Simple Matching       $(p + s) / (p + q + r + s)$
Jaccard               $p / (p + q + r)$
Sorensen Dice         $2p / (2p + q + r)$

The purpose of a similarity measure is to assign a value between 0 and 1 to a pair of objects. The value 0 indicates that the objects are not similar at all, while values approaching 1 indicate that the two objects share more characteristics, according to the chosen similarity measure. Here, p counts the characteristics present in both objects, i.e. (1, 1); q and r count the characteristics present in one object (1) but absent in the other (0); and s counts the characteristics absent in both objects, i.e. (0, 0).
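The counts p, q, r and s and the three measures of Table 1 can be illustrated with a minimal Python sketch. The function names and the sample vectors are hypothetical, the convention that q counts (1, 0) and r counts (0, 1) is an assumption (the text only says both count mismatches), and returning 0.0 for an empty denominator is also an assumption.

```python
from typing import Sequence, Tuple


def binary_counts(x: Sequence[int], y: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (p, q, r, s) for two equal-length 0/1 feature vectors."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    p = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)  # present in both
    q = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)  # only in x (assumed convention)
    r = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)  # only in y (assumed convention)
    s = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)  # absent in both
    return p, q, r, s


def simple_matching(p: int, q: int, r: int, s: int) -> float:
    total = p + q + r + s
    return (p + s) / total if total else 0.0  # 0.0 on empty input is an assumption


def jaccard(p: int, q: int, r: int, s: int) -> float:
    denom = p + q + r
    return p / denom if denom else 0.0


def sorensen_dice(p: int, q: int, r: int, s: int) -> float:
    denom = 2 * p + q + r
    return 2 * p / denom if denom else 0.0


# Hypothetical feature vectors for two components over five attributes.
x = [1, 1, 0, 1, 0]
y = [1, 0, 0, 1, 1]
counts = binary_counts(x, y)
for measure in (simple_matching, jaccard, sorensen_dice):
    print(measure.__name__, round(measure(*counts), 3))
```

Note that only Simple Matching uses s; Jaccard and Sorensen Dice ignore joint absences entirely.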
CASE STUDY

Sorensen Dice
It is the similarity measure that is used to compute the similar features among components.

\[ \text{Sorensen Dice} = \frac{2p}{2p + q + r} \]

Two cases are presented that highlight shortcomings of the Sorensen Dice measure. A Sorensen Dice-like similarity measure is then proposed that resolves these shortcomings.
Case 1: The number of ps differs among component pairs, but the similarity is identical.
An instance of a feature matrix (FM) with six components (CM1-CM6) and five attributes (AF1-AF5) is shown in Table 2.
Table 2: Software System 1

The corresponding similarity table using Sorensen Dice is presented in Table 3. Table 3 shows that the Sorensen Dice similarity measure finds the pairs CM1 & CM2, CM3 & CM4 and CM5 & CM6 to be equally similar, although they have different numbers of ps. For instance, CM5 & CM6 have p=5, CM3 & CM4 have p=4 and CM1 & CM2 have p=2. A clustering algorithm that uses this similarity measure will therefore make an arbitrary (random) decision. To resolve this problem, it is more suitable to cluster first the components that share a large number of attributes, that is, a large number of ps.


Table 3: Similarity Measure Using Sorensen Dice for System 1

Case 2: The number of ps is very high among components, but they are not totally identical.
An instance of an FM with four components (CM1-CM4) and six attributes (AF1-AF5) is shown in Table 4.
The corresponding similarity table using Sorensen Dice is presented in Table 5. Table 5 shows that the Sorensen Dice similarity measure finds components CM1 & CM2 to be identical although they share only one attribute, AF1, whereas CM3 & CM4 are found to be less similar even though they share three attributes. To resolve this problem, it is more suitable to cluster the components that share a large number of attributes (a large number of ps), even when some qs and rs, which represent differences, are present.
Table 4: Software System 2

Table 5: Similarity Measure Using Sorensen Dice for System 2


PROPOSED SORENSEN DICE-HB

\[ \text{Sorensen Dice-HB} = \frac{2p}{2p + q + r + w} \]

where w is the total number of features:

\[ w = p + q + r + s \]

Substituting w gives

\[ \text{Sorensen Dice-HB} = \frac{2p}{2p + q + r + (p + q + r + s)} = \frac{2p}{3p + 2q + 2r + s} \]
It is important to note that Sorensen Dice-HB takes the absence of features, i.e. s, into account.
After applying Sorensen Dice-HB to the two cases above (the systems of Tables 2 and 4), it is noticed that the new measure ranks the similarity between the pairs CM1 & CM2, CM3 & CM4 and CM5 & CM6, so the choice of which components to cluster is no longer arbitrary (random), as shown in Table 6.
Table 6: Similarity Measure Using Sorensen Dice-HB for System 1
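The tie-breaking effect described for Case 1 can be sketched in Python. The p values (2, 4 and 5 out of five attributes) come from the text; setting q = r = 0 for every pair, and hence s = 5 - p, is an assumption made only for illustration, since the feature matrix of Table 2 is not reproduced here. The function names are hypothetical.

```python
def sorensen_dice(p: int, q: int, r: int) -> float:
    denom = 2 * p + q + r
    return 2 * p / denom if denom else 0.0


def sorensen_dice_hb(p: int, q: int, r: int, s: int) -> float:
    # 2p / (2p + q + r + w) with w = p + q + r + s
    denom = 3 * p + 2 * q + 2 * r + s
    return 2 * p / denom if denom else 0.0


# (p, q, r, s) per pair; q = r = 0 is an illustrative assumption.
pairs = {
    "CM1-CM2": (2, 0, 0, 3),
    "CM3-CM4": (4, 0, 0, 1),
    "CM5-CM6": (5, 0, 0, 0),
}

dice = {name: sorensen_dice(p, q, r) for name, (p, q, r, s) in pairs.items()}
hb = {name: sorensen_dice_hb(p, q, r, s) for name, (p, q, r, s) in pairs.items()}

# Plain Sorensen Dice ties all three pairs; the HB variant orders them by p.
ranking = sorted(hb, key=hb.get, reverse=True)
print(dice)
print({name: round(value, 3) for name, value in hb.items()})
print(ranking)
```

Under these assumed counts the HB score is largest for the pair with the most shared attributes. Because w is added to the denominator, the score peaks at 2/3 (when q = r = s = 0) rather than at 1.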

In case 2, it is noticed that CM3 and CM4 are mainly identical so CM3 and CM4 will be clustered first in
comparison to CM1 and CM2 as shown in Table 7.
Table 7: Similarity Measure Using Sorensen Dice-HB for System 2


PROPOSED ALGORITHM
The proposed algorithm has been devised to overcome the shortcomings of existing algorithms. Discovering components for efficient reuse is a major difficulty faced by researchers. Clustering helps minimize the component search space by grouping identical components together, and it also helps reduce search time. Here a new similarity measure, Sorensen Dice-HB, is proposed with the aim of discovering similarities between components. For the listed components, a similarity matrix is created using the proposed similarity measure. The clustering algorithm takes the similarity matrix as input and provides its output in the form of clusters. The choice of similarity measure can have a large influence on clustering quality and is therefore an essential factor in the result of clustering.
This algorithm can be used to cluster component or document data. It takes components as input and provides the output in the form of clusters.
Algorithm
Begin
Initialize the following:
  c  = component
  w  = word set
  f  = frequent dataset
  uw = unique words
  fw = frequent item sets
  the stop words, the stemming words and a counter

For each component c
{
  Eliminate the stop and stemming words from the component
  Search uw from the component
  Search fw from the component while framing the word set w so that it contains each word in fw of the component
}

Frame the DBM (Dependency Boolean Matrix) with one row per component and one column per word

For every component ci
{
  For every item set in fw
  {
    For every word wk in w do
      If (wk in w is in ci)
        Set c[ci, wk] = 1
      Else
        Set c[ci, wk] = 0
      End if
    End for
  }
  End for
}
End for

Search the feature vector (fv) similarity measure by determining the similarity value for every component pair.

Apply the Sorensen Dice-HB function in order to acquire the matrix with fv for every component pair.

Replace the corresponding matrix cells with the total number of 0s in the tri-state feature vector (fv).

At every step:
  Search the highest value among the cells and the component pairs having this value in the matrix.
  Perform clustering (uniform): collect such component pairs in order to frame clusters.
  If component pair (X, Y) falls in the same cluster and pair (Y, Z) in a different cluster, frame a new cluster consisting of (X, Y, Z) as its constituents.

Repeat
Until
{
  The component set is empty, or
  the equilibrium point is reached, i.e. the point where the minimum value for 0s is achieved
}

Output: Group of Clusters
End
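The first part of the algorithm, from word extraction to the Dependency Boolean Matrix, can be sketched in Python as follows. This is a simplified illustration under stated assumptions and not the authors' MATLAB implementation: the stop-word list, the whitespace tokenizer and the sample component descriptions are hypothetical, and stemming and frequent item set mining are omitted, so every remaining unique word becomes a column.

```python
from typing import Dict, List, Tuple

# Illustrative stop-word list (assumption); a real list would be much longer.
STOP_WORDS = {"the", "a", "of", "and", "to", "in"}


def tokenize(text: str) -> List[str]:
    """Lower-case, split on whitespace, and drop stop words and non-alphabetic tokens."""
    return [t for t in text.lower().split() if t.isalpha() and t not in STOP_WORDS]


def boolean_matrix(components: Dict[str, str]) -> Tuple[List[str], List[str], List[List[int]]]:
    """Return (component names, word set, matrix) with matrix[i][k] = 1 if word k occurs in component i."""
    word_sets = {name: set(tokenize(text)) for name, text in components.items()}
    names = list(components)
    words = sorted(set().union(*word_sets.values()))
    matrix = [[1 if word in word_sets[name] else 0 for word in words] for name in names]
    return names, words, matrix


# Hypothetical component descriptions.
components = {
    "C1": "sort the list of records",
    "C2": "search records in list",
    "C3": "draw a chart",
}
names, words, matrix = boolean_matrix(components)
print(words)
for name, row in zip(names, matrix):
    print(name, row)
```

Each row of the resulting matrix is the binary feature vector of one component, which is the form of input that the binary similarity measures expect.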


Block Diagram of Proposed Work


The procedure is computationally simple, is efficient because it works on a reduced search space, and can be used for component clustering. The algorithm searches for the unique words and frequent items in the component set, then frames a word set and a Dependency Boolean Matrix, from which the feature vector is obtained using Sorensen Dice-HB, resulting in a group of clusters. The similarity measure plays an important role in the clustering algorithm. Figure 2 explains the working of the proposed algorithm.

Figure 2: Block Diagram of Proposed Work

RESULT ANALYSIS

Experimental Analysis
The following experiment on the similarity measure was carried out using MATLAB 2013a in order to examine the performance of the clusters. Here we assume a group of 24 components with the frequent item sets (fw) acquired after mining. These components are sample data, shown in Table 8, and are used to determine which components are most similar to one another under various similarity measures.
Table 8: Sample Data

Create a Boolean matrix in which each row represents a component and each column represents a unique fw from the combined fw sets of all components, as in Table 9.


Table 9: Boolean Matrix Description of Components

Create a matrix C[n-1, n] for n components and keep only the lower triangular area. These matrix cells are filled in using the similarity measure, and the resulting matrix serves as the input, as presented in Table 10.
Table 10: Feature Vector by Using Similarity Measure
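The lower triangular matrix of pairwise similarities can be sketched as follows. The sketch stores each cell under a 1-based key (i, j) with i > j, mirroring pair labels such as {4, 3} used in the walk-through; the dictionary representation, the function names and the three sample rows are assumptions for illustration, not the data of Table 9.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def sorensen_dice_hb(x: Sequence[int], y: Sequence[int]) -> float:
    """Sorensen Dice-HB, 2p / (3p + 2q + 2r + s), for two 0/1 vectors of equal length."""
    p = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    s = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)
    mismatches = len(x) - p - s  # q + r
    denom = 3 * p + 2 * mismatches + s
    return 2 * p / denom if denom else 0.0


def lower_triangle(matrix: List[List[int]]) -> Dict[Tuple[int, int], float]:
    """Similarity for every unordered pair, keyed (i, j) with i > j and 1-based indices."""
    cells = {}
    for a, b in combinations(range(len(matrix)), 2):  # a < b
        cells[(b + 1, a + 1)] = sorensen_dice_hb(matrix[b], matrix[a])
    return cells


# Hypothetical Boolean matrix: three components, six words.
rows = [
    [0, 0, 1, 1, 0, 1],
    [0, 0, 1, 1, 1, 0],
    [1, 1, 0, 0, 0, 0],
]
similarity = lower_triangle(rows)
for pair, value in sorted(similarity.items()):
    print(pair, round(value, 2))
```

Only n(n-1)/2 cells are computed because the measure is symmetric, which is why the upper triangle and the diagonal can be left out.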

After obtaining the feature vector table, determine the cells of the matrix that hold the highest value, as shown in Table 11.
Table 11: Feature Vector with Highest Value

Search for the first highest value in the matrix and extract the cells that contain this value, to form the primary cluster.


Table 12: Search Highest Value

Search for the next highest value in the table above, which is 0.46 here, and extract the cells with the most optimal solution. This value lies in {4, 3}, which already exists in the previous cluster, as shown in Table 13.
Table 13: Search Highest Value

Search for the next highest value, which is 0.4 here, and extract the cells with the most optimal solution, as shown in Table 14.
Table 14: Next Highest Value from Remaining Components


Search for the next highest value, 0.36, and extract the cells with the most optimal solution. Examine only the un-clustered component set, {5, 7, 8}; 0.36 does not lie in this group.
Table 15: Next Highest Value

Search for the next highest value, 0.33, and extract the cells with the most optimal solution. Examine only the un-clustered component set, {5, 7, 8}; 0.33 does not lie in this group.
Table 16: Next Highest Value

The next highest value is 0.2, and the un-clustered component set is {5, 7, 8}. The value 0.2 lies in {7, 5} and {8, 7}, as shown in Table 17.
Table 17: Clusters Obtained
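The walk-through above can be approximated by a greedy procedure that visits the distinct similarity values in descending order. The sketch below is one possible interpretation and may differ from the authors' exact procedure: a pair of un-clustered components starts a new cluster, a pair with one clustered member pulls the other member into that cluster, and pairs whose members are both already clustered are skipped. The `min_similarity` cut-off, the singleton fallback and the sample similarity values are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def greedy_cluster(
    similarity: Dict[Tuple[int, int], float],
    n: int,
    min_similarity: float = 0.0,
) -> List[Set[int]]:
    """Group components 1..n by scanning similarity values from highest to lowest."""
    clusters: List[Set[int]] = []
    assigned: Dict[int, int] = {}  # component -> index of its cluster
    for value in sorted(set(similarity.values()), reverse=True):
        if len(assigned) == n or value <= min_similarity:
            break
        for (i, j), v in sorted(similarity.items()):
            if v != value:
                continue
            ci, cj = assigned.get(i), assigned.get(j)
            if ci is None and cj is None:      # both un-clustered: start a new cluster
                clusters.append({i, j})
                assigned[i] = assigned[j] = len(clusters) - 1
            elif ci is None:                   # attach i to j's cluster
                clusters[cj].add(i)
                assigned[i] = cj
            elif cj is None:                   # attach j to i's cluster
                clusters[ci].add(j)
                assigned[j] = ci
            # both already clustered: nothing to do (assumption: clusters are not merged)
    # Components never reached above the cut-off become singletons (assumption).
    clusters.extend({c} for c in range(1, n + 1) if c not in assigned)
    return clusters


# Hypothetical lower-triangle similarities for five components.
similarity = {
    (2, 1): 0.5, (3, 1): 0.4, (3, 2): 0.1,
    (4, 1): 0.1, (4, 2): 0.1, (4, 3): 0.1,
    (5, 1): 0.1, (5, 2): 0.1, (5, 3): 0.1, (5, 4): 0.3,
}
result = greedy_cluster(similarity, n=5)
print(result)
```

With these sample values, components 1, 2 and 3 are grouped through the values 0.5 and 0.4, and components 4 and 5 through 0.3, after which the scan stops because every component has been assigned.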


The final group of clusters produced is shown in Figure 3.

Cluster 1: 1, 2, 3, 4, 6, 9.
Cluster 2: 5, 7, 8.

Figure 3: Formation of Clusters

From the above clustering result, it is noticed that M1, M2, M3, M4, M6 and M9 are more similar to one another than to M5, M7 and M8. This can be applied to various sets of projects: if a new project is identical to a previous one, the cost of the project can be estimated.
Comparison between Various Similarity Measures
Table 18 compares the Jaccard, Sorensen Dice and Sorensen Dice-HB similarity measures on the basis of elapsed time and the number of efficient clusters formed, using a dataset of 50 components.

Jaccard: It is used to evaluate the analogous characteristics between sets of components. It is defined as $p / (p + q + r)$. While comparing two components, the following conditions occur: p = (1, 1); q = (0, 1); r = (1, 0).

Sorensen Dice: It is used to evaluate identical characteristics between the component sets. It is defined as $2p / (2p + q + r)$. While comparing two components, the following conditions occur: p = (1, 1); q = (0, 1); r = (1, 0).

Sorensen Dice-HB: It is used to evaluate the analogous characteristics between sets of components. It is defined as $2p / (3p + 2q + 2r + s)$. While comparing two components, the following conditions occur: p = (1, 1); q = (0, 1); r = (1, 0); s = (0, 0).

Table 18: Comparison among Similarity Measures

Similarity Measure    Time Elapsed    Number of Clusters
Jaccard               4.42            24
Sorensen Dice         3.90            24
Sorensen Dice-HB      3.64            21


Figure 4: Time Analysis of Similarity Measure


Figure 4 shows the time comparison among the various similarity measures. Sorensen Dice-HB has a lower elapsed time than Sorensen Dice and Jaccard.

CONCLUSIONS

This paper discussed efficient similarity measures for finding the similarities between components. The similarity matrix serves as the input to the clustering algorithm, which provides its output in the form of clusters. After implementing the work, the conclusions are:

Of the similarity measures compared, Sorensen Dice-HB is better because it provides a more refined clustering result.
Sorensen Dice-HB helps reduce the time and effort needed to find an appropriate match for the user's specification. The user does not need much effort to fetch the required components.

REFERENCES
1. Lung, Chung-Horng, Marzia Zaman, and Amit Nandi. "Application of clustering techniques to software partitioning, recovery and restructuring." Journal of Systems and Software 73.2 (2004): 227-244.
2. Khan, Adnan, et al. "A Component-Based Framework for Software Reusability." International Journal of Software Engineering and Its Applications 8.10 (2014): 13-24.
3. Divya Chaudhary. "Component Based Software Engineering Systems: Process and Metrices." International Journal of Advanced Research in Computer Science and Software Engineering, 2013.
4. Srinivas, Chintakindi, and C.V. Guru Rao. "Software Reusable Components with Repository System."
5. V. Susheela Devi, M. Narasimha Murty. Pattern Recognition: An Introduction. Universities Press.
6. Carina Frota Alves, et al. "Requirement Engineering for COTS." WER, 2010.
7. Kaur, Kulbir, and Harshpreet Singh. "Quantifying COTS Components Selection using Multi Criteria Decision Analysis Method-PROMETHEE." Global Journal of Computer Science and Technology 14 (2014).
8. Jadhav, Anil, and Rajendra Sonar. "Analytic hierarchy process (AHP), weighted scoring method (WSM), and hybrid knowledge based system (HKBS) for software selection: a comparative study." Emerging Trends in Engineering and Technology (ICETET), 2009 2nd International Conference on. IEEE, 2009.
9. Kaur, Arvinder, and Kulvinder Singh Mann. "Component selection for component based software engineering." International Journal of Computer Applications 2.1 (2010): 109-114.
10. Amit Kumar. "Software Reuse Libraries Based Proposed Classification for Efficient Retrieval of Components." International Journal of Advanced Research in Computer Science and Software Engineering, 2013.
11. Basha, N., and Chandra Mohan. "A strategy to identify components using clustering approach for component reusability." arXiv preprint arXiv:1406.4123 (2014).
12. Zhang, Wen, et al. "Text clustering using frequent itemsets." Knowledge-Based Systems 23.5 (2010): 379-388.
13. Srinivas, Chintakindi, Vangipuram Radhakrishna, and C.V. Guru Rao. "Clustering software components for program restructuring and component reuse using hybrid XNOR similarity function." Procedia Technology 12 (2014): 246-254.
14. C.V. Guru Rao, et al. "Clustering Software Components for Program Restructuring and Component Reuse Using Hybrid XOR Similarity Function." Procedia Computer Science 31 (2013): 1044-1050.
15. Shtern, Mark, and Vassilios Tzerpos. "Clustering methodologies for software engineering." Advances in Software Engineering 2012 (2012): 1.
