
International Journal in Foundations of Computer Science & Technology (IJFCST), Vol.4, No.5, September 2014

DOI:10.5121/ijfcst.2014.4505

A HYBRID FUZZY-ANN APPROACH FOR SOFTWARE
EFFORT ESTIMATION

Sheenu Rizvi¹, Dr. S.Q. Abbas² and Dr. Rizwan Beg³

¹Department of Computer Science, Amity University, Lucknow, India
²A.I.M.T., Lucknow, India
³Integral University, Lucknow, India

ABSTRACT

Software development effort estimation is one of the major activities in software project management.
During the project proposal stage there is a high probability that estimates will be inaccurate, but this
inaccuracy decreases as the project progresses. In software development, effort estimation is based on
certain metrics. To date, various methods have been proposed for software effort estimation, of which
the non-algorithmic methods, such as artificial intelligence techniques, have been very successful. A
hybrid Fuzzy-ANN model, known as the Adaptive Neuro Fuzzy Inference System (ANFIS), is well suited to
such situations. The present paper develops a software effort estimation model based on ANFIS and
evaluates its efficiency on the COCOMO81 dataset. The results have been compared with those of an
Artificial Neural Network (ANN) and the Intermediate COCOMO model developed by Boehm. The results were
analyzed using the Magnitude of Relative Error (MRE) and the Root Mean Square Error (RMSE). It is
observed that ANFIS provided better results than the ANN and COCOMO models.

KEYWORDS

Software Effort Estimation, RMSE, ANFIS, ANN, COCOMO, MRE.

1. INTRODUCTION

One of the key challenges in the software industry is the accurate estimation of development
effort, which is particularly important for risk evaluation, resource scheduling and progress
monitoring. Inaccurate estimates lead to problematic results: overestimation wastes resources,
whereas underestimation results in the approval of projects that will exceed their planned
budgets. Many models have therefore been proposed to make estimation cost effective. These
models can be grouped by the methodology used: expert-based, analogy-based and
regression-based. Expert-based models depend on expert knowledge and past experience of
software projects. According to a comprehensive review, expert-based estimation is one of the
most frequently applied estimation strategies. Regression-based methods, in contrast, use
statistical techniques such as least squares regression, in which a set of independent
variables explains the dependent variable with minimum error. Mathematical models such as
Barry Boehm's COCOMO [1] and COCOMO II [2] are widely investigated regression-based
methods. The parameters of these models are calibrated according to the projects in a company,
so they have the drawback of requiring local calibration. To address these problems, this paper
develops a hybrid Fuzzy-ANN model known as the Adaptive Neuro Fuzzy Inference System (ANFIS).

2. DATA USED

The data used is the COCOMO 81 dataset. The input and output variables used for ANFIS model
development are given in Table 1. In total, sixteen input variables have been used: fifteen
effort multipliers and the size, measured in thousands of delivered lines of code (KLOC).
Development Effort (DE), measured in man-months, has been used as the output of the model. The
data were collected from the analysis of sixty-three (63) software projects, as published by
Barry Boehm in 1981 [3] [16].

Table 1. Input and Output variables for ANFIS model.

Input Variables
RELY - Required software reliability
DATA - Data base size
CPLX - Product complexity
TIME - Execution time
STOR - Main storage constraint
VIRT - Virtual machine volatility
TURN - Computer turnaround time
ACAP - Analyst capability
AEXP - Applications experience
PCAP - Programmer capability
VEXP - Virtual machine experience
LEXP - Language experience
MODP - Modern programming
TOOL - Use of software tools
SCED - Required development schedule
SIZE - Size in KLOC

Output Variable
Development Effort (DE)

Source: COCOMO81 Dataset (PROMISE Software Engineering Repository data [16])
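
For orientation, the Intermediate COCOMO baseline used for comparison later in the paper combines these same sixteen inputs in closed form: a size term raised to a mode-dependent exponent, scaled by the product of the fifteen effort multipliers. The sketch below is not taken from the paper. It assumes the standard Intermediate COCOMO 81 coefficients, and the function name and the example project are hypothetical.

```python
import math

# Standard Intermediate COCOMO 81 coefficients (a, b) per development mode.
COEFFICIENTS = {
    "organic": (3.2, 1.05),
    "semidetached": (3.0, 1.12),
    "embedded": (2.8, 1.20),
}

def cocomo_effort(kloc, multipliers, mode="organic"):
    """Effort in man-months: a * KLOC**b * product of the effort multipliers."""
    a, b = COEFFICIENTS[mode]
    eaf = math.prod(multipliers)  # effort adjustment factor
    return a * kloc ** b * eaf

# Hypothetical project: 10 KLOC with all fifteen multipliers nominal (1.0).
nominal = [1.0] * 15
print(round(cocomo_effort(10, nominal), 2))
```

Because the coefficients are fixed constants, a model of this form has to be recalibrated for each organisation, which is the drawback the introduction attributes to regression-based methods.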

3. ANFIS MODEL DEVELOPMENT

3.1. Parameter Selection

ANFIS [9],[10] is a judicious integration of a Fuzzy Inference System (FIS) and an ANN. It is
capable of learning, high-level thinking and reasoning, and it combines the benefits of the two
techniques in a single framework [4]. The success of an FIS depends on finding a good rule base.
There are no specific techniques for converting human knowledge into a rule base, and the
membership functions need further fine-tuning to maximise the performance of the model and
minimise the output error. Thus, when generating an FIS using ANFIS, it is important to select
proper parameters, including the number of membership functions (MFs) for each antecedent
variable. It is also vital to select appropriate parameters for the learning and refining
process, including the initial step size (ss). In the present work, the rule extraction method
applied for FIS identification and refinement is subtractive clustering, which is commonly used
for this purpose. The MATLAB Fuzzy Logic Toolbox [7] has been used for ANFIS model development.

Here the initial parameters of the ANFIS are identified using the subtractive clustering method
[5]. It is vital to define the subtractive clustering parameters properly, of which the
clustering radius is the most important. It is determined through a trial-and-error approach:
the clustering radius r_a is varied with varying step size, and the optimal parameters are
obtained by minimizing the root mean squared error on the validation datasets. The clustering
radius r_b is selected as 1.5 r_a. Gaussian membership functions are used for each fuzzy set in
the fuzzy system. The number of membership functions and fuzzy rules required for a particular
ANFIS is determined through the subtractive clustering algorithm. The parameters of the Gaussian
membership functions are optimally determined using the hybrid learning algorithm. Each ANFIS is
trained for 10 epochs.
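
The role of the two radii can be illustrated with a minimal sketch of subtractive clustering. This is not the authors' MATLAB code. It assumes the inputs are scaled to [0, 1], and it replaces the usual accept/reject criteria with a single simplified stopping threshold (`stop_ratio`, a hypothetical parameter of this sketch). The function and argument names are likewise hypothetical.

```python
import math

def subtractive_clustering(points, ra, squash=1.5, stop_ratio=0.15):
    """Return cluster centres picked by a simplified subtractive clustering.

    points: non-empty list of equal-length tuples, assumed scaled to [0, 1].
    ra: clustering radius; rb = squash * ra is the penalty radius.
    stop_ratio: simplified stopping rule (an assumption of this sketch).
    """
    rb = squash * ra
    alpha = 4.0 / ra ** 2
    beta = 4.0 / rb ** 2

    def sqdist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    # Potential of each point: high when it has many close neighbours.
    potential = [
        sum(math.exp(-alpha * sqdist(p, q)) for q in points) for p in points
    ]
    first_peak = max(potential)
    centres = []
    while True:
        best = max(range(len(points)), key=lambda i: potential[i])
        peak = potential[best]
        if peak < stop_ratio * first_peak:
            break
        centre = points[best]
        centres.append(centre)
        # Subtract the influence of the new centre so that nearby points
        # are unlikely to be chosen as the next centre.
        potential = [
            p_i - peak * math.exp(-beta * sqdist(x, centre))
            for p_i, x in zip(potential, points)
        ]
    return centres

data = [(0.1, 0.1), (0.12, 0.1), (0.1, 0.12),
        (0.9, 0.9), (0.88, 0.9), (0.9, 0.88)]
print(subtractive_clustering(data, ra=0.5))
```

Each centre found this way would seed one fuzzy rule, so a smaller r_a tends to produce more rules. This is why the radius is tuned against the validation error.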

A Gaussian membership function has been used as the input membership function and a linear
function as the output membership function. Separate sets of input and output data have been
passed as input arguments. In MATLAB, genfis2 generates a Sugeno-type FIS structure using
subtractive clustering. genfis2 is generally used where there is only one output; hence it has
been used here to generate the initial FIS for training the ANFIS. genfis2 achieves this by
extracting a set of rules that models the data. The rule extraction method first uses the
subclust function to determine the number of rules and antecedent membership functions, and
then uses linear least squares estimation to determine each rule's consequent equations.
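
How such a Sugeno-type FIS turns an input vector into an effort estimate can be sketched as follows. This is an illustrative Python sketch rather than the MATLAB implementation. It assumes one Gaussian MF per input per rule, a product AND-method and weighted-average defuzzification. The rule representation and all numeric values are hypothetical.

```python
import math

def gaussmf(x, sigma, c):
    """Gaussian membership of x in a fuzzy set centred at c with width sigma."""
    return math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))

def sugeno_predict(x, rules):
    """Evaluate a first-order Sugeno FIS for the input vector x.

    Each rule is a dict with:
      'mfs'    : list of (sigma, c) pairs, one Gaussian MF per input
      'coeffs' : linear consequent coefficients, one per input
      'bias'   : constant term of the linear consequent
    """
    weights, outputs = [], []
    for rule in rules:
        # Firing strength: product of the membership degrees (AND = prod).
        w = math.prod(gaussmf(xi, s, c) for xi, (s, c) in zip(x, rule["mfs"]))
        # Linear consequent: y = p1*x1 + ... + pn*xn + r.
        y = sum(p * xi for p, xi in zip(rule["coeffs"], x)) + rule["bias"]
        weights.append(w)
        outputs.append(y)
    # Weighted-average defuzzification (fails if every rule has zero strength).
    return sum(w * y for w, y in zip(weights, outputs)) / sum(weights)

# Two hypothetical single-input rules.
rules = [
    {"mfs": [(1.0, 0.0)], "coeffs": [1.0], "bias": 0.0},
    {"mfs": [(1.0, 2.0)], "coeffs": [0.0], "bias": 10.0},
]
print(sugeno_predict([1.0], rules))
```

At x = 1 both rules fire equally, so the output is the plain average of the two consequents. In ANFIS training, the (sigma, c) pairs are the non-linear parameters and the consequent coefficients are the linear parameters.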

The parameters used for training the ANFIS model are given in Table 2, and the settings of the
rule extraction method are given in Table 3. Table 4 summarizes the resulting values of the
model parameters used for training ANFIS.

Table 2. Parameters used in all the models for training ANFIS

Parameter                      Value
Rule extraction method used    Subtractive clustering
Input MF type                  Gaussian membership (gaussmf)
Input partitioning             variable
Output MF type                 Linear
Number of output MFs           one
Training algorithm             Hybrid learning
Training epoch number          10
Initial step size              0.01

Table 3. Rule extraction method used for training ANFIS

Rule Extraction Method     Type
And method                 prod
Or method                  probor
Defuzzification method     wtaver
Implication method         prod
Aggregation method         max

Table 4. Values of parameters used for training ANFIS

No. of nodes 1311
No. of linear parameters 646
No. of non-linear parameters 1216
Total no. of parameters 1862
No. of training data pairs 40
No. of testing data pairs 23
No. of fuzzy rules 38
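
The parameter counts in Table 4 can be checked with simple arithmetic. The check assumes one Gaussian MF (two parameters) per input for each rule and one first-order linear consequent per rule. The function name is hypothetical.

```python
def anfis_parameter_counts(n_inputs, n_rules):
    # One Gaussian MF (sigma, centre) per input per rule.
    nonlinear = n_rules * n_inputs * 2
    # One linear consequent per rule: n_inputs coefficients plus a constant.
    linear = n_rules * (n_inputs + 1)
    return linear, nonlinear, linear + nonlinear

linear, nonlinear, total = anfis_parameter_counts(n_inputs=16, n_rules=38)
print(linear, nonlinear, total)  # 646 1216 1862
```

With sixteen inputs and 38 rules this reproduces the 646 linear, 1216 non-linear and 1862 total parameters listed in the table.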


4. RESULT AND DISCUSSION

The ANFIS model has been trained and tested, and its performance as the best prediction model
is evaluated and compared on the training and testing datasets separately. The RMSE of the
ANFIS model on the training and testing datasets is plotted in Fig. 1 and Fig. 2 respectively,
and the corresponding range of values (minimum and maximum) is summarized in Table 5.



Figure 1. Graphical plot of RMSE value variation during training


Figure 2. Graphical plot of RMSE value variation during testing


Table 5. Range of RMSE during training and testing phase

                     RMSE Value
                     Minimum     Maximum
Training datasets    0.4824      2.8096
Testing datasets     186.41      188.41
Further Table 6 gives the RMSE values using COCOMO, ANN and ANFIS techniques.

Table 6. Performance evaluation using RMSE criteria

RMSE Value
COCOMO       ANN          ANFIS
532.2147     353.1977     112.638

From the analysis of Fig. 1 and Fig. 2 and the data given in Table 5, it is inferred that during
the training phase (Fig. 1) the RMSE varies in a zigzag manner, with a minimum value of 0.4824
(at epoch 8) and a maximum value of 2.8096 (at epoch 3). Hence during the training phase the
RMSE initially rises, then falls to its minimum at epoch 8, after which there is again a slight
increase. During the testing phase (Fig. 2), on the other hand, the RMSE decreases up to
epoch 4, where it reaches a minimum of 186.41, and then rises steeply up to epoch 10, where it
reaches its maximum of 188.41. From Table 5 it can be inferred that ANFIS has performed better
during the training phase than during the testing phase, while its overall RMSE is 112.638.
This is a marked improvement over the values obtained with the ANN and COCOMO models, i.e.
353.1977 and 532.2147 respectively (Table 6).
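
The paper does not spell out the RMSE formula; the sketch below assumes the standard definition, the square root of the mean squared difference between actual and predicted effort. Applied to the ANFIS column of Table 8, this definition appears to reproduce the reported figures: about 112.6 over all 63 projects and about 186.4 over projects 41 to 63. The function name is hypothetical.

```python
import math

def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# First three testing projects from Table 8 (rows 41 to 43).
actual = [6, 45, 83]
anfis = [8.046612, 195.2396, 228.257]
print(round(rmse(actual, anfis), 2))
```

Because errors are squared before averaging, a few large projects with large absolute errors dominate the result, so RMSE and MRE can rank the models differently.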

Further, consider the absolute values of the Magnitude of Relative Error (MRE) calculated for
the COCOMO and ANFIS models (Table 7) and their comparative plots for the training and testing
datasets (Fig. 3 and Fig. 4). From the data and the plots it is seen that during the training
phase (projects 1 to 40) the absolute MRE values of the ANFIS model are far lower than those of
the COCOMO model. During the testing phase (projects 41 to 63) the values of the two models are
of comparable magnitude, with ANFIS lower for some projects and higher for others. Since the
absolute MRE is the absolute percentage error between the actual and predicted effort for each
project, it follows that for the training projects the percentage error of the ANFIS
predictions is far smaller than that of the COCOMO model.
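
As a worked example, the absolute MRE of a single project can be computed as below. The sketch assumes the usual definition, |actual - predicted| / actual, expressed as a percentage. The function name is hypothetical, and the input values are those of project 41 in Table 8.

```python
def abs_mre_percent(actual, predicted):
    """Absolute magnitude of relative error, expressed as a percentage."""
    return abs(actual - predicted) / actual * 100.0

# Project 41 from Table 8: actual effort 6, COCOMO 5.21, ANFIS 8.046612.
print(round(abs_mre_percent(6, 5.21), 2))      # 13.17
print(round(abs_mre_percent(6, 8.046612), 2))  # 34.11
```

These values agree with row 41 of Table 7 (13.16666667 and 34.11019307).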

Thus, it is clear that proper selection of the influence radius, which directly affects the
clustering results in ANFIS with the subtractive clustering rule extraction method, has reduced
the RMSE and MRE for both the training and testing datasets. Hence, it is seen that for a small
training dataset ANFIS has outperformed the ANN and COCOMO models.

Table 7. Comparative chart of Absolute values of MRE for COCOMO and ANFIS Model

S.No. ABS MRE COCOMO ABS MRE ANFIS
1. 8.651813725 0.000103189
2. 73.9110625 0.030832219
3. 1.377489712 0.00195532
4. 2.00825 0.000158388
5. 16.93939394 0.000202853
6. 40.51162791 1.22696E-05
7. 22.125 0.000142747
8. 41.41395349 1.94362E-05
9. 21.04728132 1.11052E-05
10. 14.17757009 5.40767E-05
11. 42.22018349 0.000783969
12. 0.646766169 9.3241E-05
13. 43.78481013 0.000854332
14. 16.41666667 6.95013E-07
15. 28.47540984 4.75704E-06
16. 45.575 1.81974E-05
17. 181.7777778 0.000109538
18. 18.50412281 0.009939471
19. 45.78439394 0.041568784
20. 10.5675 0.007541921
21. 24.53034623 0.006063228
22. 12.06767956 2.95788E-05
23. 15.71799629 0.000118637
24. 31.38852097 0.000124277
25. 49.22179732 0.000220024
26. 26.12428941 7.74201E-06
27. 19.43181818 0.000151894
28. 35.63265306 2.81222E-05
29. 5.342465753 0.003622306
30. 8.661016949 0.0064311
31. 14.31420508 2.2618E-05
32. 94.06980057 0.002576867
33. 8.978512397 5.71114E-05
34. 26.07826087 1.92174E-05
35. 51.81707317 7.19225E-06
36. 27.74545455 5.829E-06
37. 86.59574468 0.000106447
38. 64.25 1.23164E-05
39. 22.5 0.000423304
40. 22.25 1.11081E-06
41. 13.16666667 34.11019307
42. 142.8666667 33.128475
43. 24.97590361 17.5124589
44. 52.72413793 49.50818218
45. 3.018867925 96.87507342
46. 69.76984127 12.0325458
47. 8.972222222 60.61766094
48. 73.31996855 41.92811776
49. 9.288461538 114.7807153
50. 7.693181818 7.139281263
51. 32.18032787 23.15173707
52. 11.07317073 24.48625124
53. 60.07142857 40.28145
54. 41.1 73.28148424
55. 58.27777778 7.153429004
56. 59.40709812 59.77180117
57. 17.02531646 25.23833685
58. 11.68461538 117.211021
59. 18.25714286 22.62693271
60. 12.0877193 10.9231245
61. 5.48 18.00801248
62. 8.368421053 27.0459325
63. 14.2 31.29088085
[Plot: absolute MRE (y-axis, 0 to 200) against project number (x-axis, 1 to 40) for the training data; series: COCOMO MRE, ANFIS MRE]

Figure 3. Absolute MRE plot for COCOMO and ANFIS Output for training datasets

[Plot: absolute MRE (y-axis, 0 to 200) against project number (x-axis, 1 to 23) for the testing data; series: COCOMO MRE, ANFIS MRE]

Figure 4. Absolute MRE plot for COCOMO and ANFIS Output for testing datasets

In order to depict how well ANFIS has performed relative to the ANN and COCOMO models, a
comparative plot of actual effort versus the effort predicted by the COCOMO, ANN and ANFIS
techniques is shown in Fig. 5, using the data given in Table 8. From the graph it is seen that
the ANFIS line follows the actual effort line more closely than the COCOMO line does. This
again depicts the superiority of the ANFIS technique over the ANN and COCOMO models for effort
estimation.

Table 8. Comparative chart of Actual Effort Versus Estimated Effort using COCOMO, ANN and ANFIS

S.No.   Actual Effort   Estimated Effort using
                        COCOMO      ANN         ANFIS
1       2040            1863.503    2040.022    2040.002
2       1600            2782.577    3168.456    1599.507
3       243             246.3473    242.8827    242.9952
4       240             235.1802    240.167     240.0004
5       33              38.59       39.88948    32.99993
6       43              25.58       11.68468    42.99999
7       8               9.77        6.106686    7.999989
8       1075            629.8       1075.621    1075
9       423             333.97      197.3923    423
10      321             275.49      13.33255    320.9998
11      218             310.04      217.8293    218.0017
12      201             199.7       200.0765    200.9998
13      79              113.59      82.28573    78.99933
14      60              50.15       59.5612     60
15      61              43.63       56.88275    61
16      40              58.23       41.55418    39.99999
17      9               25.36       41.71533    9.00001
18      11400           9290.53     11384.8     11398.87
19      6600            9621.77     6599.016    6602.744
20      6400            5723.68     7108.591    6399.517
21      2455            1852.78     2454.785    2454.851
22      724             811.37      1036.327    724.0002
23      539             454.28      538.0881    539.0006
24      453             310.81      10.07177    452.9994
25      523             265.57      1214.319    522.9988
26      387             285.899     387.3988    387
27      88              70.9        88.77245    87.99987
28      98              132.92      96.47764    98.00003
29      7.3             7.69        15.74339    7.299736
30      5.9             6.411       20.11236    5.900379
31      1063            1215.16     1063.154    1063
32      702             1362.37     1129.184    701.9819
33      605             550.68      604.7895    605.0003
34      230             170.02      73.82972    230
35      82              124.49      30.58422    82.00001
36      55              39.74       7.026457    55
37      47              87.7        29.24169    46.99995
38      12              19.71       7.208678    12
39      8               6.2         66.48077    8.000034
40      8               9.78        8.401984    8
41      6               5.21        6.211204    8.046612
42      45              109.29      234.8325    195.2396
43      83              103.73      101.074     228.257
44      87              132.87      100.6351    130.0721
45      106             109.2       157.2179    3.31
46      126             213.91      122.6887    343.28
47      36              32.77       7.266029    57.82236
48      1272            2204.63     6.364794    738.6743
49      156             141.51      155.7227    335.0579
50      176             162.46      491.2995    188.5651
51      122             82.74       254.6255    93.75488
52      41              36.46       48.05263    51.03936
53      14              22.41       38.53126    104.7524
54      20              11.78       6.371402    34.6563
55      18              7.51        8.634863    16.71238
56      958             388.88      957.3443    385.3861
57      237             277.35      238.0535    177.1851
58      130             145.19      1540.691    282.375
59      70              82.78       6.243794    85.83885
60      57              50.11       132.3261    119.6359
61      50              47.26       6.030985    40.99599
62      38              41.18       38.24981    140.7745
63      15              17.13       6.164915    19.69363

Finally, Figures 6, 7 and 8 show scatter plots of actual effort versus estimated effort for the
ANFIS, ANN and COCOMO models. The figures show that the predictions are generally most precise
in the case of ANFIS, where the data points follow a linear trend line, and that the ANFIS
model performs better than the ANN and COCOMO models.

[Plot: effort (y-axis, 0 to 15000) against project number (x-axis, 1 to 63); series: Actual Effort, Estimated Effort using COCOMO, Estimated Effort using ANN, Estimated Effort using ANFIS]

Figure 5. Comparative plot of Actual Effort, COCOMO, ANN and ANFIS Output
[Scatter plot "Using ANFIS": Estimated Effort (y-axis, 0 to 15000) against Actual Effort (x-axis, 0 to 15000)]

Figure 6. Scatter Plot of Actual vs. Estimated Effort using ANFIS

[Scatter plot "Using ANN": Estimated Effort (y-axis, 0 to 15000) against Actual Effort (x-axis, 0 to 12000)]

Figure 7. Scatter Plot of Actual vs. Estimated Effort using ANN
[Scatter plot "Using COCOMO": Estimated Effort (y-axis, 0 to 15000) against Actual Effort (x-axis, 0 to 15000)]

Figure 8. Scatter Plot of Actual vs. Estimated Effort using COCOMO

5. CONCLUSION

In the present paper, the applicability and capability of the ANFIS technique for software
effort estimation have been investigated. It is seen that ANFIS models are robust, fast to
compute and capable of handling noisy and approximate data of the kind used in the present
study. Because it can capture the non-linearity present in the data, ANFIS is an efficient
quantitative tool for predicting effort. The studies have been carried out in the MATLAB
simulation environment. In all, sixteen input variables were used, consisting of fifteen effort
adjustment factors and the size of the project, with effort as the single output variable.

The initial parameters of the ANFIS are identified using the subtractive clustering method.
Gaussian membership functions (described in Section 3) are used for each fuzzy set in the fuzzy
system. The subtractive clustering algorithm has been used to determine the number of
membership functions and fuzzy rules required for ANFIS development. The hybrid learning
algorithm has been used to determine the parameters of the Gaussian membership functions. Each
ANFIS has been trained for 10 epochs.

From the analysis of the results given under the heading Result and Discussion, it is seen that
the effort estimation model developed using the ANFIS technique has performed better than the
ANN and COCOMO models. This can be concluded from the results given in Tables 5, 6, 7 and 8.
The RMSE value obtained from the ANFIS model (112.638) is lower than those from the ANN
(353.1977) and COCOMO (532.2147) models. Further, from Fig. 6, 7 and 8 and Table 8 it is seen
that the ANFIS estimates follow the actual effort more closely than those of ANN and COCOMO.
This again depicts the superiority of the ANFIS technique over the ANN and COCOMO models for
effort estimation.

REFERENCES

[1]. Alpaydin, E. 2004. Introduction to machine learning. Cambridge: MIT Press.
[2]. Boehm, B., Abts, C., Chulani, S. 2000. Software development cost estimation approaches: A survey.
[3]. Annals of Software Engineering (10): 177-205.
[4]. Boehm, B.W. 1981. Software Engineering Economics. Upper Saddle River, NJ, USA: Prentice Hall
PTR.
[5]. Chen, D.W. and Zhang, J.P., (2005), Time series prediction based on ensemble ANFIS,
Proceedings of the Fourth International Conference on Machine Learning and Cybernetics, IEEE, pp
3552-3556.
[6]. Chiu, S., (1994), Fuzzy Model Identification based on cluster estimation, Journal of Intelligent and
Fuzzy Systems, 2 (3), pp 267-278.
[7]. Fuller, R., (1995), Neural Fuzzy Systems, ISBN 951-650-624-0, ISSN 0358-5654.
[8]. Fuzzy Logic Toolbox, MATLAB version R2013a.
[9]. Hammouda, K. A., Comparative Study of Data Clustering Techniques.
[10]. Jang, J.-S.R., (1992), Neuro-Fuzzy Modelling: Architecture, Analyses and Applications, Ph.D. Thesis.
[11]. Jang, J.-S.R., (1993), ANFIS: Adaptive-Network-Based Fuzzy Inference System, IEEE Transactions
on Systems, Man and Cybernetics, 23 (3), pp 665-685.
[12]. Jang, J.-S.R., Sun, C.-T., (1995), Neuro-fuzzy modelling and control, Proceedings of the IEEE, 83 (3),
pp 378-406.
[13]. Jantzen, J., (1998), Neurofuzzy Modelling. Technical Report no. 98-H-874 (nfmod), Department of
Automation, Technical University of Denmark, 1-28.
[14]. Pendharkar, Parag C., et al., (2005), A Probabilistic Model for Predicting Software Development
Effort, IEEE Transactions on Software Engineering, Vol. 31, No. 7.
[15]. Priyono, A., Ridwan, M., et al., (2005), Generation of fuzzy rules with subtractive clustering,
Jurnal Teknologi, 43 (D), pp 143-153.
[16]. Sayyad Shirabad, J. and Menzies, T.J., (2005), The PROMISE Repository of Software Engineering
Databases. School of Information Technology and Engineering, University of Ottawa, Canada.
Available: http://promise.site.uottawa.ca/SERepository
[17]. Takagi, T. and Sugeno, M., (1983), Derivation of fuzzy control rules from human operator's control
actions, Proc. IFAC Symp. Fuzzy Inform., Knowledge Representation and Decision Analysis, pp 55-
60.
[18]. Vaidehi, V., Monica, S., Mohammad Sheikh Safeer, S., Deepika, M. and Sangeetha, S., (2008), A
Prediction System Based on Fuzzy Logic, Proceedings of World Congress on Engineering and
Computer Science.
[19]. Zadeh, L.A., (1965), Fuzzy sets, Information and Control, 8, pp 338-353.

Authors

Sheenu Rizvi is an Assistant Professor at Amity School of Engineering and Technology,
Lucknow, India. He received his M.Tech degree in Information Technology in 2005 and is
pursuing a Ph.D in Computer Application from Integral University.


Syed Qamar Abbas completed his Master of Science (MS) from BITS Pilani. His PhD
was on a computer-oriented study of queueing models. He has more than 20 years of
teaching and research experience in the field of Computer Science and Information
Technology. Currently, he is Director of Ambalika Institute of Management and
Technology, Lucknow.

Prof. Dr. M. Rizwan Beg holds an M.Tech and a Ph.D in Computer Science and Engineering.
He is presently working as Controller of Examination at Integral University, Lucknow,
Uttar Pradesh, India. He has more than 16 years of experience, which includes around 14
years of teaching experience. His areas of expertise are Software Engineering, Requirement
Engineering, Software Quality and Software Project Management. He has published more than
40 research papers in international journals and conferences. Presently 8 research scholars
are pursuing their Ph.D under his supervision.
