Abstract – With the growth of software and the increasing demand for applications, a rigorous and reliable model for effort estimation is necessary. In this paper a feature-extension method is proposed in order to increase estimation accuracy. Quadratic mapping is used as the feature expander; because it constructs more discriminative features, it can achieve better results. Although the mapping increases the dimensionality, the results become more accurate, especially when W-KNN (Weighted K-Nearest Neighbors) is used as the regression model. It should be mentioned that, given the shortage of data in this field, the dimensional increase does not cause a serious problem, and the processing can be run on an ordinary home PC.

Introduction
There are two broad approaches to data analysis: functional analysis and statistical analysis. The first approach is limited to a specific dataset and requires considerable intuition about that dataset, whereas the second is general and usable on any dataset; it is therefore more common than the first approach, and it is also simpler to implement. The Constructive Cost Model (COCOMO), one of the most famous functional approaches, was developed by Barry W. Boehm in the 1970s. It uses a predefined formula to compute the software effort estimate, but it was not accurate enough. COCOMO II was developed from COCOMO in 2000; as COCOMO II is more capable than the earlier version, it achieves a lower error rate [1].

We now review several recent papers. A successful approach to selecting important features is correlation; a decision tree can also be used intelligently to select the features most related to the output, and an evolutionary SVM can be a valid method for predicting new values with low MMRE [2]. One of the most important features in every effort-estimation dataset is LOC (lines of code). LOC can be predicted with a method named UCP (Use Case Points), and the output can then be predicted with the COCOMO II equation [1]. A novel cascade correlation neural network with cross validation that utilizes the use case diagram appears to be an alternative method for software effort estimation [3]. Integrating a fuzzy technique that weights the features with UCP can yield reliable predictions; the fuzzy technique is used to calculate the UCP coefficients [4]. Modifying UCP by defining a Mobile Complexity Factor (MCF) coefficient can give better results for mobile applications; MCF is obtained from the MTDI factor and from interviews with mobile developers [5]. Analogy-based effort estimation can be a reliable method for software effort estimation because of its capability of handling noisy datasets; this approach estimates effort based on the k nearest analogies [6]. Combining two different datasets with similar features from the same company significantly decreases the estimation error in statistical approaches [7]. A systematic mapping of ASEE techniques has achieved considerable results in software effort estimation, especially when the approach is combined with fuzzy logic or a genetic algorithm [8]. A comparison among a multilayer perceptron (MLP), a general regression neural network (GRNN), a cascade correlation neural network (CCNN), and a radial basis function neural network (RBFNN) showed that the CCNN outperformed the other models [9]. The Automatically Transformed Linear Model (ATLM) is a stringent baseline for comparison against software effort estimation methods and can be used as a baseline of effort prediction for every future model in effort estimation [10].

BASIC CONCEPTS
In this part we first explain quadratic mapping, which is a kind of feature expander, and then review four statistical methods in detail.
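To make the quadratic mapping concrete, the sketch below (illustrative code, not from the paper) expands a feature vector into all degree-2 monomials; a two-dimensional point (x1, x2) becomes {x1^2, sqrt(2)*x1*x2, x2^2}, so the dimensionality grows from n to n(n+1)/2:

```python
import math

def quadratic_map(x):
    """Expand a feature vector into its degree-2 monomial features.

    For x = (x1, x2) this returns [x1^2, sqrt(2)*x1*x2, x2^2].
    The sqrt(2) factor on cross terms preserves dot products,
    as in the homogeneous polynomial kernel.
    """
    expanded = []
    for i in range(len(x)):
        for j in range(i, len(x)):
            coeff = 1.0 if i == j else math.sqrt(2)
            expanded.append(coeff * x[i] * x[j])
    return expanded

# A 2-d point maps to 3 features; a 10-d point would map to 55.
print(quadratic_map([1.0, 2.0]))  # [1.0, 2.8284..., 4.0]
```

For the small datasets typical of effort estimation, this modest growth in dimensionality is cheap to compute, which matches the claim above that processing can run on an ordinary PC.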
All rights reserved by www.ijrdt.org
Paper Title:-Improve Software Effort Estimation using Weighted KNN in New Feature Space
Mapping a point (x1, x2) from two-dimensional space will create {x1^2, sqrt(2)*x1*x2, x2^2}.

K NEAREST NEIGHBORHOOD
1-nearest-neighbor measures the distance between the test data and each train data point, and takes the output of the nearest train point as the output of the new test point. Euclidean distance is usually used as a reliable way to measure distance [12]:

Dist(x_j, x_i) = ||x_j - x_i||    (1)

Eq. (1) shows how to calculate the Euclidean distance. In uniform KNN, the average of the outputs of the k nearest neighbors is calculated and used as the prediction.

In classification, the Random Forest output is the class with the maximum vote [14]; Eqs. (5) and (6) are used for the classifier. For regression, the tree outputs are averaged:

f(x) = (1/J) * sum_{j=1}^{J} f_j(x)    (7)

where the f_j(x), consisting of f_1(x), f_2(x), ..., f_J(x), are the outputs of the individual trees, and the average of these quantities is the output of the Random Forest regression model [15]. It should be mentioned that using Random Forest with cross validation usually gives better results [16].
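The uniform-KNN averaging just described extends naturally to the weighted variant. The sketch below is illustrative; the paper's exact weighting scheme is not shown in this excerpt, so inverse-distance weights are assumed. It uses the Euclidean distance of Eq. (1):

```python
import math

def euclidean(a, b):
    # Eq. (1): Dist(x_j, x_i) = ||x_j - x_i||
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def wknn_predict(train_x, train_y, query, k=3, eps=1e-12):
    """Weighted KNN regression: nearer neighbors get larger weights.

    With uniform weights this reduces to plain KNN averaging.
    """
    neighbors = sorted(
        (euclidean(x, query), y) for x, y in zip(train_x, train_y)
    )[:k]
    weights = [1.0 / (d + eps) for d, _ in neighbors]  # inverse-distance (assumed)
    return sum(w * y for w, (_, y) in zip(weights, neighbors)) / sum(weights)

# Toy usage: effort grows linearly with the single feature.
pred = wknn_predict([[0.0], [1.0], [2.0], [3.0]], [0.0, 1.0, 2.0, 3.0],
                    query=[1.1], k=2)
```

Applying `quadratic_map` to each row of `train_x` and to `query` before calling `wknn_predict` gives the combined pipeline the paper proposes.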
ISSN:-2349-3585 |www.ijrdt.org
Hamming Distance

HammingDist(x, y) = (number of unequal columns of x and y) / (total number of columns of x or y)    (12)

Eq. (12) shows how to measure the Hamming distance [20], where x is the test data and y is the train-data cluster nearest to x. Indeed, before prediction with the Hamming distance begins, all data (train and test) are labeled with the K-means approach mentioned above; for prediction, the Hamming distance is then measured between the test data and the part of the train data that has the same label as the test data.

Figure 1: A multilayer perceptron with 2 hidden layers.
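As a sketch of Eq. (12) and of the cluster-label matching just described (function and variable names are illustrative, not from the paper; returning the output of the single closest row is an assumed rule):

```python
def hamming_distance(x, y):
    """Eq. (12): fraction of columns in which x and y disagree."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of columns")
    unequal = sum(1 for a, b in zip(x, y) if a != b)
    return unequal / len(x)

def predict_by_hamming(test_row, test_label, train_rows, train_labels, train_y):
    """Among train rows sharing the test row's cluster label, return the
    output of the row with the smallest Hamming distance."""
    candidates = [
        (hamming_distance(test_row, row), y)
        for row, label, y in zip(train_rows, train_labels, train_y)
        if label == test_label
    ]
    return min(candidates)[1]
```

Restricting the search to rows with the same cluster label keeps the comparison within similar projects and reduces the number of distances computed.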
The main problem in a neural network is updating the weights and biases of each neuron, which is usually done with the backpropagation algorithm. Backpropagation starts from the sensitivity of the output layer M:

s^M = -2 * F'^M(n^M) * (t - a)    (8)
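Eq. (8) is the quantity that seeds backpropagation for a squared-error cost: the derivative of the transfer function at the net input n^M, scaled by the error (t - a). A minimal scalar sketch, assuming a log-sigmoid output activation (the paper's transfer function is not shown in this excerpt):

```python
import math

def logsig(n):
    """Log-sigmoid transfer function a = f(n)."""
    return 1.0 / (1.0 + math.exp(-n))

def output_sensitivity(n_M, t):
    """Eq. (8): s^M = -2 * F'(n^M) * (t - a), with a = f(n^M).

    For the log-sigmoid, f'(n) = a * (1 - a).
    """
    a = logsig(n_M)
    f_prime = a * (1.0 - a)
    return -2.0 * f_prime * (t - a)
```

Earlier layers' sensitivities are then obtained by propagating s^M backward through the weight matrices, which is where the algorithm gets its name.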
MMRE = (100 / T) * (MRE_1 + MRE_2 + ... + MRE_T)    (15)
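Eq. (15) as code (a minimal sketch; MRE_i is assumed to be the standard |actual_i - predicted_i| / actual_i, since its defining equation falls outside this excerpt):

```python
def mmre(actual, predicted):
    """Eq. (15): MMRE = (100 / T) * (MRE_1 + ... + MRE_T)."""
    mres = [abs(a - p) / a for a, p in zip(actual, predicted)]
    return 100.0 * sum(mres) / len(mres)

# Example: two projects, each predicted within 10% of actual -> MMRE = 10.0
```

Because each MRE_i is normalized by the actual effort, MMRE compares models fairly across projects of very different sizes.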