Neural Networks
journal homepage: www.elsevier.com/locate/neunet
Department of Computer Engineering, University of Adana Science and Technology, Adana, Turkey
Article info
Article history:
Received 14 April 2012
Received in revised form 7 November 2012
Accepted 3 December 2012
Keywords:
GCNN
GRNN
PNN
Classification neural networks
Gradient descent learning
Abstract
In this work, a new radial basis function based classification neural network, named the generalized classifier neural network (GCNN), is proposed. The proposed generalized classifier neural network has five layers, unlike other radial basis function based neural networks such as the generalized regression neural network and the probabilistic neural network. They are the input, pattern, summation, normalization and output layers. In addition to the topological difference, the proposed neural network has two calculation improvements: gradient descent based optimization of the smoothing parameter and an added diverge effect term. The diverge effect term is an improvement on the summation layer calculation that supplies additional separation ability and flexibility. Performance of the generalized classifier neural network is compared with that of the probabilistic neural network, the multilayer perceptron algorithm and the radial basis function neural network on 9 different data sets, and with that of the generalized regression neural network on 3 different data sets that include only two classes, in the MATLAB environment. Better classification performance, up to 89%, is observed. The improved classification performances prove the effectiveness of the proposed neural network.
© 2012 Elsevier Ltd. All rights reserved.
1. Introduction
Pattern classification problems are important application areas
of neural networks used as learning systems (Al-Daoud, 2009;
Bartlett, 1998; Specht, 1990). Multilayer perceptrons (MLP), radial basis functions (RBF), probabilistic neural networks (PNN), self-organizing maps (SOM), cellular neural networks (CNN), recurrent neural networks and conic section function neural networks (CSFNN) are some of these neural networks. In addition to classification problems, function approximation problems are also solved with neural networks. The generalized regression neural network (GRNN) is one of the most popular neural networks used for function approximation. GRNN and PNN are kinds of radial basis function neural networks (RBFNN) with one-pass learning (Al-Daoud, 2009). Although they are similar, PNN is used for classification whereas GRNN is used for continuous function approximation (Mosier & Jurs, 2002).
PNN, introduced by Donald F. Specht in 1990 (Specht, 1990), has been used for various classification problems ever since (Adeli & Panakkat, 2009; Hajmeer & Basheer, 2002; Kailun, Huijun, & Maohua, 2010; Zhu & Hao, 2009). The performance of PNN is related to the smoothing parameter and to the size of the training set.
Table 1
Properties of the data sets.

Data set | Attributes | Classes | Data
Glass | 10 | 7 | 214
Haberman's survival | 3 | 2 | 306
Two spiral problem | 2 | 2 | 328
Lenses | 4 | 3 | 24
Balance-scale | 4 | 3 | 625
Iris | 4 | 3 | 150
Breast cancer wisconsin | 10 | 2 | 699
E.coli | 8 | 8 | 336
Yeast | 8 | 10 | 1484
Glass, Haberman's survival, two spiral problem, lenses, balance-scale, iris, breast cancer wisconsin (Bennett & Mangasarian, 1992; Mangasarian, Setiono, & Wolberg, 1990; Mangasarian & Wolberg, 1990; Wolberg & Mangasarian, 1990), E.coli and yeast data sets (Frank & Asuncion, 2010) are used in the experiments; their attributes, class counts and sizes are listed in Table 1.
Table 2
10-fold cross validation classification performances (%). GCNN and optimized GRNN/PNN entries are given as σ/accuracy. GRNN is compared only on the three data sets that include two classes; "–" marks cells without a reported value.

Data sets/methods (%) | GCNN | GRNN σ = 0.3 | PNN σ = 0.3 | GRNN optimized | PNN optimized | GRNN σ = 1 | PNN σ = 0.1 | MLP | RBF
Glass | 0.2567/94.3925 | – | 52.8037 | – | 0.196/55.61 | – | 53.7383 | 48.130 | 49.0654
Haberman's survival | 0.2794/66.0131 | 59.4771 | 59.4771 | 0.2/59.8 | 0.25/59.48 | 58.8235 | 61.1111 | 64.0523 | 71.2418
Two spiral problem | 0.2998/89.0244 | 85.3659 | 85.3659 | 0.24/85.37 | 0.25/85.37 | 18.9024 | 85.3659 | 31.0976 | 79.2683
Lenses | 0.3/100 | – | 66.6667 | – | 0.3/66.6667 | – | 66.6667 | 70.83 | 75
Balance-scale | 0.2997/91.5064 | – | 72.1154 | – | 0.35/72.12 | – | 69.7115 | 87.1795 | 73.2372
Iris | 0.2823/100 | – | 94 | – | 0.26/95.33 | – | 95.33 | 92 | 92
Breast cancer wisconsin | 0.3/96.2751 | 95.4155 | 95.4155 | 0.265/95.13 | 0.265/95 | 95.7020 | 70.7736 | 96.1318 | 67.1920
E.coli | 0.2376/100 | – | 56.5476 | – | 0.14/77.68 | – | 78.5714 | 76.1905 | 71.7262
Yeast | 0.2171/100 | – | 11.1186 | – | 0.15/31.2 | – | 38.6792 | 43.7332 | 38.2749
Table 3
Training and test times (in seconds). GCNN, MLP and RBF entries are given as training time/test time; GRNN and PNN entries are test times. "–" marks cells without a reported value.

Data sets | GCNN | GRNN σ = 0.3 | PNN σ = 0.3 | GRNN optimized | PNN optimized | GRNN σ = 1 | PNN σ = 0.1 | MLP | RBF
Glass | 10.9110/0.0873 | – | 0.1888 | – | 0.2049 | – | 0.2014 | 2.0599/0.0607 | 10.1004/0.1005
Haberman's survival | 11.8723/0.1334 | 0.1838 | 0.1840 | 0.1817 | 0.1844 | 0.1824 | 0.1841 | 1.2340/0.0620 | 14.3833/0.1011
Two spiral problem | 14.3450/0.1955 | 0.2004 | 0.1989 | 0.1924 | 0.1987 | 0.1909 | 0.2112 | 1.2129/0.0606 | 11.0586/0.0854
Lenses | 0.1094/0.0014 | – | 0.1831 | – | 0.1915 | – | 0.1856 | 0.8263/0.0521 | 0.9545/0.0802
Balance-scale | 61.6890/0.4979 | – | 0.2064 | – | 0.2053 | – | 0.2069 | 2.7872/0.0598 | 21.8738/0.0954
Iris | 3.3015/0.0382 | – | 0.1772 | – | 0.1744 | – | 0.1762 | 1.3461/0.0563 | 0.6813/0.0983
Breast cancer wisconsin | 60.8600/0.5141 | 0.2174 | 0.2108 | 0.2101 | 0.2117 | 0.2169 | 0.2126 | 1.7337/0.0552 | 37.7400/0.0883
E.coli | 32.0276/0.3169 | – | 0.2001 | – | 0.1924 | – | 0.2289 | 3.1709/0.0637 | 1.9160/0.0949
Yeast | 680.5347/4.6442 | – | 0.2562 | – | 0.3951 | – | 0.4267 | 14.1513/0.0801 | 174.1730/0.1344
GRNN performs regression of a dependent variable Y on an independent vector variable x. The squared distance D_j^2 between the input vector x and the jth training vector t_j is defined in (1):

$$D_j^2 = (x - t_j)^T (x - t_j). \quad (1)$$

The regression estimate is the conditional mean of Y given x, computed as in (2), where f(x, Y) denotes the joint probability density function of x and Y (Specht, 1991):

$$\hat{Y}(x) = \frac{\int_{-\infty}^{\infty} Y f(x, Y)\, dY}{\int_{-\infty}^{\infty} f(x, Y)\, dY}. \quad (2)$$

When f(x, Y) is estimated with Gaussian kernels centered at the p training samples, the estimate takes the form given in (3), where y_j is the target value stored for the jth pattern:

$$\hat{Y}(x) = \frac{\sum_{j=1}^{p} y_j\, e^{-D_j^2 / 2\sigma^2}}{\sum_{j=1}^{p} e^{-D_j^2 / 2\sigma^2}}. \quad (3)$$

The pattern layer neurons use the radial basis activation function given in (4):

$$\varphi(x) = e^{-(t - x)^T (t - x) / 2\sigma^2}. \quad (4)$$

The smaller smoothing parameter σ narrows the radius of effective neighbors; on the other hand, the larger one extends the radius of effective neighbors (Amrouche & Rouvaen, 2006; Ren et al., 2010).

PNN is based on the Parzen window probability density estimator given in (5), where m is the input dimension and W is the weighting function:

$$g(x_1, x_2, \ldots, x_m) = \frac{1}{p\, \sigma_1 \sigma_2 \cdots \sigma_m} \sum_{j=1}^{p} W\!\left(\frac{x_1 - t_{1,j}}{\sigma_1}, \ldots, \frac{x_m - t_{m,j}}{\sigma_m}\right). \quad (5)$$

With a Gaussian window and a single smoothing parameter σ, (5) reduces to (6):

$$g(x) = \frac{1}{(2\pi)^{m/2} \sigma^m p} \sum_{j=1}^{p} e^{-\|x - t_j\|^2 / 2\sigma^2}. \quad (6)$$

In gradient descent learning, the squared error e between the desired output y and the network output f is defined in (7), and its derivative with respect to a weight w is given in (8):

$$e = (y - f)^2 \quad (7)$$

$$\frac{\partial e}{\partial w} = -2\, (y - f)\, \frac{\partial f}{\partial w}. \quad (8)$$

Finally, w is updated in the opposite direction of ∂e/∂w with step size μ as given in (9):

$$w_{t+1} = w_t - \mu\, \frac{\partial e}{\partial w}. \quad (9)$$
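To make these estimators concrete, the following NumPy sketch implements the GRNN estimate (3) and the Gaussian Parzen density (6). The names grnn_predict, parzen_density, train_x and sigma are illustrative choices, not identifiers from the paper or its MATLAB implementation.

```python
import numpy as np

def grnn_predict(x, train_x, train_y, sigma):
    """GRNN estimate, Eq. (3): kernel-weighted mean of the stored targets."""
    d2 = np.sum((train_x - x) ** 2, axis=1)        # D_j^2, Eq. (1)
    w = np.exp(-d2 / (2.0 * sigma ** 2))           # Gaussian kernel weights
    return np.dot(w, train_y) / np.sum(w)          # normalized weighted average

def parzen_density(x, train_x, sigma):
    """Parzen window density estimate with a Gaussian kernel, Eq. (6)."""
    p, m = train_x.shape
    d2 = np.sum((train_x - x) ** 2, axis=1)
    norm = (2.0 * np.pi) ** (m / 2.0) * sigma ** m * p
    return np.sum(np.exp(-d2 / (2.0 * sigma ** 2))) / norm

# Toy usage: regress y = x1 + x2 from noisy samples.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(50, 2))
y = X.sum(axis=1) + 0.05 * rng.standard_normal(50)
print(grnn_predict(np.array([0.2, -0.1]), X, y, sigma=0.3))
print(parzen_density(np.array([0.2, -0.1]), X, sigma=0.3))
```

Both estimators weight the training points with the same Gaussian kernel, which is why the smoothing parameter plays the neighborhood-radius role described above.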
At the pattern layer of GCNN, each neuron computes the Euclidean distance between the input vector x and its stored training vector t_j as given in (10) and applies the radial basis activation function in (11):

$$dist(j) = \sqrt{(x - t_j)^T (x - t_j)}, \quad 1 \le j \le p \quad (10)$$

$$r(j) = e^{-dist^2(j) / 2\sigma^2}, \quad 1 \le j \le p. \quad (11)$$

The desired output of the jth pattern neuron for the ith class is encoded as in (12):

$$y(j, i) = \begin{cases} 0.9, & \text{if } t_j \text{ belongs to class } i \\ 0.1, & \text{otherwise} \end{cases} \quad 1 \le j \le p,\ 1 \le i \le N. \quad (12)$$

The diverge effect term is calculated as in (13):

$$d(j, i) = e^{(y(j, i) - y_{max})}\, y(j, i), \quad 1 \le j \le p,\ 1 \le i \le N \quad (13)$$

where d(j, i) denotes the diverge effect term of the jth training datum and the ith class. y_max is initialized with 0.9, which denotes the maximum value of y(j, i), and is updated with the maximum value of the output layer at each iteration.

At the summation layer, N neurons calculate the sum of the products of the diverge effect terms and the pattern layer outputs as given in (14), while one other neuron calculates the denominator, the same as in GRNN, as given in (15):

$$u_i = \sum_{j=1}^{p} d(j, i)\, r(j), \quad 1 \le i \le N \quad (14)$$

$$D = \sum_{j=1}^{p} r(j). \quad (15)$$

The normalization layer divides each summation neuron output by D as in (16), and the output layer selects the class with the maximum normalized score as in (17):

$$c_i = \frac{u_i}{D}, \quad 1 \le i \le N \quad (16)$$

$$class(x) = \arg\max_{1 \le i \le N} c_i. \quad (17)$$

Since the smoothing parameter has an important effect on classification performance, a gradient descent based training approach is adapted to GCNN. During the training step, each training datum at the pattern layer is sequentially applied to the neural network. Firstly, for each input, the squared error e is calculated as given in (18):

$$e = (y(z, id) - c_{id})^2 \quad (18)$$

where y(z, id) represents the value of the zth training input datum for the idth class and c_{id} is the value of the winner class. Secondly, the first derivative of the error e is calculated according to (19)-(22) (Masters & Land, 1997):

$$\frac{\partial e}{\partial c_{id}} = -2\, (y(z, id) - c_{id}) \quad (19)$$

$$\frac{\partial c_{id}}{\partial \sigma} = \frac{b(id) - l(id)\, c_{id}}{D} \quad (20)$$

$$b(id) = 2 \sum_{j=1}^{p} d(j, id)\, r(j)\, \frac{dist^2(j)}{\sigma^3} \quad (21)$$

$$l(id) = 2 \sum_{j=1}^{p} r(j)\, \frac{dist^2(j)}{\sigma^3}. \quad (22)$$

By the chain rule, ∂e/∂σ = (∂e/∂c_{id})(∂c_{id}/∂σ). Finally, σ is updated in the opposite direction of the gradient with step size μ as given in (23):

$$\sigma_{new} = \sigma_{old} - \mu\, \frac{\partial e}{\partial \sigma}. \quad (23)$$

A sketch of this forward pass and training loop is given below.
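The following Python sketch shows one way (10)-(16) and the update (18)-(23) fit together. It is a best-effort reading of the equations with the 0.9/0.1 encoding of (12), not the authors' MATLAB code; the handling of y_max is simplified to its 0.9 initialization, and the guard keeping σ positive is an added assumption.

```python
import numpy as np

def gcnn_forward(x, train_x, labels, sigma, n_classes, y_max=0.9):
    """One GCNN forward pass, Eqs. (10)-(16)."""
    labels = np.asarray(labels)
    dist2 = np.sum((train_x - x) ** 2, axis=1)      # squared distances, Eq. (10)
    r = np.exp(-dist2 / (2.0 * sigma ** 2))         # pattern layer outputs, Eq. (11)
    y = np.where(labels[:, None] == np.arange(n_classes), 0.9, 0.1)  # Eq. (12)
    d = np.exp(y - y_max) * y                       # diverge effect terms, Eq. (13)
    u = d.T @ r                                     # summation layer, Eq. (14)
    D = np.sum(r)                                   # denominator neuron, Eq. (15)
    return u / D, r, d, dist2, D                    # normalized scores c_i, Eq. (16)

def train_sigma(train_x, labels, n_classes, sigma=0.3, mu=0.3, epochs=10):
    """Gradient descent on the smoothing parameter, Eqs. (18)-(23)."""
    for _ in range(epochs):
        for z in range(len(train_x)):
            c, r, d, dist2, D = gcnn_forward(train_x[z], train_x, labels,
                                             sigma, n_classes)
            idx = int(np.argmax(c))                  # winner class id
            y_zid = 0.9 if idx == labels[z] else 0.1 # y(z, id), assumed encoding
            de_dc = -2.0 * (y_zid - c[idx])                        # Eq. (19)
            b = 2.0 * np.sum(d[:, idx] * r * dist2) / sigma ** 3   # Eq. (21)
            l = 2.0 * np.sum(r * dist2) / sigma ** 3               # Eq. (22)
            dc_ds = (b - l * c[idx]) / D                           # Eq. (20)
            sigma -= mu * de_dc * dc_ds             # Eq. (23), opposite of gradient
            sigma = max(sigma, 1e-3)                # keep sigma positive (assumption)
    return sigma
```

The epoch count of 10 and step size of 0.3 follow the experimental setup reported below.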
Inputs of the test algorithm are the test data and the optimum smoothing parameter obtained from the training algorithm. Outputs are the estimated classes of the test data. Algorithm 2 shows the algorithm of the GCNN test step; a code sketch follows it.
Algorithm 2 Test of GCNN
inputs: smoothing_parameter, test_data
outputs: class
for each training datum t_j:
    find the Euclidean distance between the test and training data, dist(j)
    perform the RBF activation function, r(j)
    for each class i:
        calculate the diverge effect term, d(j, i) = e^{(y(j, i) - y_max)} y(j, i)
compute u_i = Σ_{j=1}^{p} d(j, i) r(j) and D = Σ_{j=1}^{p} r(j)
class = argmax_i (u_i / D)
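A self-contained sketch of this test step, assuming the training vectors, their class labels and the optimized smoothing parameter are available; the names are illustrative, not from the paper:

```python
import numpy as np

def gcnn_classify(test_x, train_x, labels, sigma, n_classes, y_max=0.9):
    """GCNN test step (Algorithm 2): estimated class for each test datum."""
    labels = np.asarray(labels)
    y = np.where(labels[:, None] == np.arange(n_classes), 0.9, 0.1)
    d = np.exp(y - y_max) * y                      # diverge effect terms d(j, i)
    out = np.empty(len(test_x), dtype=int)
    for k, x in enumerate(test_x):
        dist2 = np.sum((train_x - x) ** 2, axis=1) # squared Euclidean distances
        r = np.exp(-dist2 / (2.0 * sigma ** 2))    # RBF activations r(j)
        u = d.T @ r                                # u_i = sum_j d(j, i) r(j)
        D = np.sum(r)                              # D = sum_j r(j)
        out[k] = int(np.argmax(u / D))             # winner class
    return out
```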
GCNN has better classification performance than PNN and GRNN, both standard and with optimized smoothing parameter. In addition to the radial basis function based neural networks, GCNN provides better classification performance than both MLP and RBFNN, except on the Haberman's survival data set.
In GCNN, the smoothing parameter is optimized according to the training data of each fold. Fig. 4 shows the smoothing parameter values for each data set and fold. A sketch of this per-fold evaluation protocol is given below.
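A minimal harness for such a 10-fold protocol might look as follows; fit and predict are placeholders for any classifier with this interface (for GCNN, fit would run the σ optimization of (18)-(23) on the fold's training split), and nothing here is taken from the paper's MATLAB code.

```python
import numpy as np

def ten_fold_accuracy(X, y, fit, predict, seed=0):
    """10-fold CV: train on nine folds, test on the held-out fold, average."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    folds = np.array_split(order, 10)
    scores = []
    for i in range(10):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(10) if j != i])
        model = fit(X[train_idx], y[train_idx])    # e.g. optimize sigma per fold
        pred = predict(model, X[test_idx])
        scores.append(np.mean(pred == y[test_idx]))
    return 100.0 * np.mean(scores)
```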
The number of epochs and the learning step for the GCNN, optimized GRNN and optimized PNN models are chosen as 10 and 0.3, respectively. Table 3 gives the average training and test times for each method compared in Table 2.
Since GCNN has a training step, it requires more computational time than the other methods that do not include a training step. On the other hand, MLP also has a training step, yet requires less training time than GCNN. This is because GCNN has one neuron for each training datum in its hidden layer, whereas MLP has a smaller number of neurons. Computational times of RBFNN and GCNN are close to each other. The difference between the RBFNN of the MATLAB Toolbox and GCNN is that, while GCNN includes one neuron for each training datum, RBFNN adds hidden neurons incrementally until its error goal is met.
References
Bennett, K. P., & Mangasarian, O. L. (1992). Robust linear programming discrimination of two linearly inseparable sets. Optimization Methods and Software, 1, 23–34.
Berthold, M. R., & Diamond, J. (1998). Constructive training of probabilistic neural networks. Neurocomputing, 19, 167–183.
Erkmen, B., & Yildirim, T. (2008). Improving classification performance of sonar targets by applying general regression neural network with PCA. Expert Systems with Applications, 35, 472–475.
Firat, M., & Gungor, M. (2009). Generalized regression neural networks and feed forward neural networks for prediction of scour depth around bridge piers. Advances in Engineering Software, 40, 731–737.
Frank, A., & Asuncion, A. (2010). UCI machine learning repository. http://archive.ics.uci.edu/ml.
Hajmeer, M., & Basheer, I. (2002). A probabilistic neural network approach for modeling and classification of bacterial growth/no-growth data. Journal of Microbiological Methods, 51, 217–226.
Hoya, T., & Chambers, J. A. (2001). Heuristic pattern correction scheme using adaptively trained generalized regression neural networks. IEEE Transactions on Neural Networks, 12(1).
Kailun, H., Huijun, X., & Maohua, X. (2010). The application of probabilistic neural network model in the green supply chain performance evaluation for pig industry. In International conference on e-business and e-government.
Kayaer, K., & Yildirim, T. (2003). Medical diagnosis on Pima Indian diabetes using general regression neural networks. In Artificial neural networks and neural information processing.
Kiyan, T., & Yildirim, T. (2004). Breast cancer diagnosis using statistical neural networks. Journal of Electrical & Electronics Engineering, 4(2), 1149–1153.
Lee, E. W. M., Lim, C. P., Yuen, R. K. K., & Lo, S. M. (2004). A hybrid neural network model for noisy data regression. IEEE Transactions on Systems, Man, and Cybernetics, 34(2), 951–960.
Madsen, K., Nielsen, H. B., & Tingleff, O. (2004). Methods for non-linear least squares problems. Informatics and Mathematical Modeling, Technical University of Denmark.
Mangasarian, O. L., Setiono, R., & Wolberg, W. H. (1990). Pattern recognition via linear programming: theory and application to medical diagnosis. In Large-scale numerical optimization (pp. 22–30). SIAM Publications.
Mangasarian, O. L., & Wolberg, W. H. (1990). Cancer diagnosis via linear programming. SIAM News, 23(5), 1, 18.
Mao, K., Tan, K., & Set, W. (2000). Probabilistic neural-network structure determination for pattern classification. IEEE Transactions on Neural Networks, 11(4), 1009–1016.
Masters, T., & Land, W. (1997). A new training algorithm for the general regression neural network. In IEEE international conference on systems, man and cybernetics, computational cybernetics and simulation. Vol. 3 (pp. 1990–1994).
Montana, D. (1992). A weighted probabilistic neural network. Advances in Neural Information Processing Systems, 4, 1110–1117.
Mosier, P. D., & Jurs, P. C. (2002). QSAR/QSPR studies using probabilistic neural networks and generalized regression neural networks. Journal of Chemical Information and Computer Sciences, 42, 1460–1470.
Popescu, I., Kanatas, A., Constantinou, P., & Nafornita, I. (2002). Application of general regression neural networks for path loss prediction. In Proceedings of international workshop trends and recent achievements in information technology.
Ren, S., Yang, D., Ji, F., & Tian, X. (2010). Application of generalized regression neural network in prediction of cement properties. In 2010 International conference on computer design and applications.
Rutkowski, L. (2004). Adaptive probabilistic neural networks for pattern classification in time-varying environment. IEEE Transactions on Neural Networks, 15(4), 811–827.
Specht, D. F. (1990). Probabilistic neural networks. Neural Networks, 3, 109–118.
Specht, D. F. (1991). A general regression neural network. IEEE Transactions on Neural Networks, 2(6), 568–576.
Tomandl, D., & Schober, A. (2001). A modified general regression neural network with new efficient training algorithms as a robust "black box" tool for data analysis. Neural Networks, 14, 1023–1034.
Wang, Z., & Sheng, H. (2010). Rainfall prediction using generalized regression neural network: case study Zhengzhou. In 2010 International conference on computational and information sciences.
Wolberg, W. H., & Mangasarian, O. (1990). Multisurface method of pattern separation for medical diagnosis applied to breast cytology. Proceedings of the National Academy of Sciences, 87, 9193–9196.
Yildirim, T., & Cigizoglu, H. K. (2002). Comparison of generalized regression neural network and MLP performances on hydrologic data forecasting. In International conference on neural information processing.
Yoo, P. D., Sikder, A. R., Zhou, B. B., & Zomaya, A. Y. (2007). Improved general regression network for protein domain boundary prediction. In Sixth international conference on bioinformatics.
Zhao, S., Zhang, J., Li, X., & Song, W. (2007). A generalized regression neural network based on fuzzy means clustering and its application in system identification. In International symposium on information technology convergence.
Zhu, C., & Hao, Z. (2009). Application of probabilistic neural network model in evaluation of water quality. In International conference on environmental science and information application technology.