Abstract—The artificial bee colony (ABC) algorithm has been used in several optimization problems, including the optimization of the synaptic weights of an artificial neural network (ANN). However, this alone is not enough to generate a robust ANN. For that reason, some authors have proposed methodologies based on so-called metaheuristics that automatically design an ANN, taking into account not only the optimization of the synaptic weights but also the ANN's architecture and the transfer function of each neuron. However, those methodologies do not generate a reduced design (synthesis) of the ANN. In this paper, we present an ABC-based methodology that maximizes the accuracy and minimizes the number of connections of an ANN by evolving, at the same time, the synaptic weights, the ANN's architecture and the transfer functions of each neuron. The methodology is tested with several pattern recognition problems.

Keywords—artificial neural networks, artificial bee colony algorithm, methodology, standard deviation

I. INTRODUCTION

Artificial neural networks are commonly used in pattern classification, function approximation, optimization, pattern matching, machine learning and associative memories. They are currently an alternative to traditional statistical methods for mining data sets in order to classify data. ANNs are a well-established technology for solving prediction and classification problems, using training and testing data to build a model.

ANNs have been applied to a wide range of problems: speech synthesis, diagnostic problems, medicine, finance, robotic control, signal processing, computer vision and many other problems that fall under the category of pattern recognition [3]. Among many different neural network classifiers, multilayer feed-forward networks have mainly been used for solving classification tasks, due to their well-known universal approximation capabilities. The success of neural networks largely depends on their architecture, their training algorithm, and the choice of features used in training. ANNs are very important tools for solving different kinds of problems such as pattern classification, forecasting and regression. However, their design implies a trial-and-error mechanism that tests different architectures and transfer functions, together with the selection of a training algorithm that adjusts the synaptic weights of the ANN. This design stage is very important, because a wrong selection of any of these characteristics can cause the training algorithm to become trapped in a local minimum. Because of this, several metaheuristic-based methods for obtaining a good ANN design have been reported.

In [1], Xin Yao presents a state-of-the-art survey where evolutionary algorithms are used to evolve the synaptic weights and the architecture, in some cases with the help of classic techniques like the back-propagation algorithm. There are also works like [2] where the authors automatically evolve the design of an ANN using basic PSO, second-generation PSO (2GPSO) and a new technique (NMPSO). In [3], the authors design an ANN by means of the DE algorithm and compare it with other bio-inspired techniques. In these last two works, the authors evolve, at the same time, the principal features of an ANN: the synaptic weights, the architecture and the transfer functions.
In related work, several characteristics of this kind of ANN are optimized: the weights between the hidden layer and the output layer, the spread parameters of the hidden-layer basis function, the centre vectors of the hidden layer, and the bias parameters of the output-layer neurons. In [10], an ANN is trained to estimate and model the daily reference evapotranspiration of two USA stations. Other kinds of algorithms based on bee behaviour have also been applied to train ANNs. For example, in [11] the bees algorithm is used to identify wood defects, while in [12] the same algorithm is applied to optimize the synaptic weights of an ANN. In [13], a good review of this kind of bio-inspired algorithm and the problems it has been applied to is given; it states that the ABC algorithm is a good optimization technique. In this paper, we want to verify how this algorithm performs in the automatic design of an ANN, evolving not only the synaptic weights but also the architecture and the transfer functions of the neurons. As we will see, the architectures obtained are optimal in the sense that the number of connections is minimal without losing efficiency.

The paper is organized as follows: in Section 2 we briefly present the basics of ANNs. In Section 3 we explain the fundamental concepts of the ABC algorithm, while in Section 4 we describe how the ABC algorithm is used to design an ANN and how the ANN's architecture can be optimized. In Section 5, the experimental results using different pattern classification problems are presented.

II. ARTIFICIAL NEURAL NETWORKS

In a multilayer feed-forward ANN, the input nodes, which receive the pattern a ∈ IR^N, pass the information to the units in the first hidden layer; the outputs from this first hidden layer are passed to the next layer, and so on until the output layer is reached, thus producing an approximation of the desired output y ∈ IR^M.

Basically, learning is a process by which the free parameters (i.e., the synaptic weights W and the bias levels b) of an ANN are adapted through a continuous process of stimulation by the environment in which the network is embedded. The type of learning is determined by the manner in which the parameter changes take place. The learning process may be classified as supervised or unsupervised. In this paper we focus on supervised learning, which assumes the availability of a labeled set of training data made up of p input-output samples (see Eq. 2),

where a is the input pattern and d the desired response. Given the training sample T, the requirement is to compute the free parameters of the neural network so that the actual output y_ξ of the ANN due to input a_ξ is close enough to the desired d_ξ for all ξ in a statistical sense. To this end, we may use the mean-square error (MSE) given in Eq. 6 as the first objective function to be minimized.
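Although Eq. 6 itself is not reproduced in this copy, the mean-square error it refers to can be sketched as the usual average of the squared output errors over the p training samples (a hedged reconstruction, not necessarily the paper's exact normalization):

```python
import numpy as np

def mse(outputs, targets):
    """Mean-square error over p input-output samples: the squared distance
    between the actual output y and the desired response d, averaged over
    the p samples."""
    outputs = np.asarray(outputs, dtype=float)
    targets = np.asarray(targets, dtype=float)
    # Sum the squared error per sample, then average over the samples.
    return float(np.mean(np.sum((outputs - targets) ** 2, axis=1)))
```

For a perfect fit the value is 0; training drives this quantity down by adjusting the weights W and the bias levels.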
III. ARTIFICIAL BEE COLONY ALGORITHM

The ABC algorithm is a swarm optimization technique inspired by the foraging behaviour of honey bees [17]. It consists of a set of possible solutions x_i (the population) that are represented by the positions of the food sources. In order to find the best solution, three classes of bees are used: employed bees, onlooker bees and scout bees. These bees have different tasks in the colony, i.e., in the search space.

Employed bees: Each employed bee searches for a new food source in the neighbourhood of the one in its memory. After that, it compares the new food source against the old one using Eq. 4, and saves the better of the two in its memory.

4: Produce new solutions v_i for the employed bees by using Eq. 4 and evaluate them.
5: Apply the greedy selection process.
6: Calculate the probability values p_i for the solutions x_i by Eq. 5.
7: Produce the new solutions v_i for the onlookers from the solutions x_i selected depending on p_i, and evaluate them.
8: Apply the greedy selection process.
9: Determine the abandoned solution for the scout, if it exists, and replace it with a new randomly produced solution x_i by Eq. 6.
10: Memorize the best solution achieved so far.
11: cycle = cycle + 1
12: end for
---------------------------------------------------------------------------
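The employed/onlooker/scout cycle described in this section can be sketched in Python. This is an illustrative minimal version only: Eqs. 4 to 6 are not reproduced in this copy, so the neighbour update, the roulette probability and the scout reinitialization below follow the standard ABC formulations, and the parameter names (`sn`, `limit`, `cycles`) are assumptions:

```python
import numpy as np

def abc_minimize(f, dim, bounds=(-4.0, 4.0), sn=20, limit=30, cycles=200, seed=0):
    """Minimal ABC sketch: employed, onlooker and scout phases over SN food sources."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    x = rng.uniform(lo, hi, (sn, dim))            # food sources = candidate solutions
    cost = np.array([f(xi) for xi in x])
    trials = np.zeros(sn, dtype=int)              # stagnation counters for the "limit" rule

    def fitness(c):                               # usual ABC fitness transform of a cost
        return 1.0 / (1.0 + c) if c >= 0 else 1.0 + abs(c)

    def neighbor(i):                              # v_ij = x_ij + phi * (x_ij - x_kj)
        k = int(rng.choice([j for j in range(sn) if j != i]))
        j = int(rng.integers(dim))
        v = x[i].copy()
        v[j] = np.clip(v[j] + rng.uniform(-1.0, 1.0) * (x[i, j] - x[k, j]), lo, hi)
        return v

    def greedy(i, v):                             # keep the better of x_i and v_i
        c = f(v)
        if c < cost[i]:
            x[i], cost[i], trials[i] = v, c, 0
        else:
            trials[i] += 1

    for _ in range(cycles):
        for i in range(sn):                       # employed bee phase (steps 4-5)
            greedy(i, neighbor(i))
        fit = np.array([fitness(c) for c in cost])
        p = fit / fit.sum()                       # selection probabilities (step 6)
        for i in rng.choice(sn, size=sn, p=p):    # onlooker bee phase (steps 7-8)
            greedy(int(i), neighbor(int(i)))
        worst = int(np.argmax(trials))            # scout bee phase (step 9)
        if trials[worst] > limit:
            x[worst] = rng.uniform(lo, hi, dim)
            cost[worst] = f(x[worst])
            trials[worst] = 0
    best = int(np.argmin(cost))                   # step 10: best solution so far
    return x[best], float(cost[best])
```

For instance, `abc_minimize(lambda z: float(np.sum(z * z)), dim=3)` drives the sphere function toward its minimum at the origin.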
After that, the bee evaluates the quality of each food source based on the amount of nectar (the information), i.e., the fitness function is calculated. Finally, it returns to the dancing area in the hive, where the onlooker bees are.

Onlooker bees: These bees watch the dancing of the employed bees so as to know where each food source can be found, whether its nectar is of high quality, and the size of the food source. An onlooker bee probabilistically chooses a food source depending on the amount of nectar shown by each employed bee, see Eq. 5,

where fit_i is the fitness value of solution i and SN is the number of food sources, which is equal to the number of employed bees.

Scout bees: These bees help the colony to randomly create new solutions when a food source cannot be improved anymore, see Eq. 6. This phenomenon is governed by the so-called "limit" or "abandonment criterion".

IV. METHODOLOGY

The main aim of our methodology is to evolve, at the same time, the synaptic weights, the architecture (or topology), and the transfer functions of each neuron so as to obtain a minimum mean square error (MSE) as well as a minimum classification error (CER). At the same time, we seek to optimize the ANN's architecture by reducing the number of neurons and their connections. The problem to be solved can be defined as follows: given a set of input patterns X = {x1, ..., xp}, x ∈ IR^n, and a set of desired patterns D = {d1, ..., dp}, d ∈ IR^m, find an ANN represented by a matrix W ∈ IR^(q×(q+2)) such that the function F(D, X, W) is minimized.

The codification of the ANN design to be evolved by the ABC algorithm is given in Fig. 1, which shows how the food source's position represents the solution to the problem. The solution is defined by a matrix W ∈ IR^(q×(q+2)) composed of three main parts: the topology (T), the synaptic weights (SW), and the transfer functions (F), where q is the maximum number of neurons (MNN), defined as q = 2(m + n) (recall that n is the dimension of the input pattern vector and m is the dimension of the desired pattern vector). The three parts of the matrix W take values from three different ranges: for the topology, the range is [1, 2^MNN − 1]; for the synaptic weights, it is [−4, 4]; and for the transfer functions, it is [1, nF], where nF is the number of transfer functions. The ANN's topology is codified based on the binary square matrix representation of a graph x, where each component x_ij
indicates a connection between neuron i and neuron j when x_ij = 1. This information is codified into its decimal base. For example, suppose that the binary code "01101" represents the connections of the i-th neuron to five neurons. From this binary code, we can observe that only neurons two, three, and five are connected to neuron i. The binary code is transformed into its decimal base value, resulting in 13; this is the number we evolve instead of the binary value. This scheme is much faster to manipulate: instead of evolving a string of bits, we evolve a decimal base number. Note that the restriction j < i is used to avoid the generation of cycles in the ANN.

The synaptic weights of the ANN are again codified by a square matrix representation of a graph x, where each component x_ij represents the synaptic weight between neuron i and neuron j. Finally, the transfer function of each neuron is represented by an integer in the range [0, 5], codifying one of the six transfer functions used in this research: logsig, tansig, sin, radbas, purelin, and hardlim. These functions were selected because they are among the most popular and useful transfer functions for several kinds of problems.

When the aptitude of an individual is computed by means of the MSE function (Eq. 7), all the values of the matrix W are decodified so as to obtain the corresponding ANN. Moreover, each solution must be tested in order to evaluate its performance. Because the methodology is tested with several pattern classification problems, it is also necessary to define the classification error (CER) function, that is, to know how many patterns were correctly classified and how many were incorrectly classified.

For the case of the CER fitness function, the output of the ANN is transformed by means of the winner-take-all technique; this codification is then compared with the set of desired patterns. When the output of the ANN equals the corresponding desired pattern, the pattern has been correctly classified; otherwise, it was incorrectly classified.

Until now, we have defined two fitness functions that help to maximize the ANN's accuracy by minimizing its error (MSE or CER). Now we propose a function that helps not only to reach maximum accuracy but also to minimize the number of connections of the ANN. The reduction of the architecture can be represented as follows:

where NC is the number of connections in the ANN designed by the proposed methodology and NMaxC is the maximum number of connections that can be generated with MNN neurons. NMaxC is given as:

It is important to notice that if F3 alone is used as the fitness function in the ABC algorithm, the proposed methodology will synthesize the ANN but the accuracy will not be maximized. For that reason, we propose a fitness function that integrates both objectives: the minimization of the error and the synthesis of the ANN (the reduction of the number of connections). Two fitness functions are proposed to achieve this goal using the ABC algorithm; they are composed by combining functions F1, F2 and F3. The first fitness function (FF1) is represented by Eq. 11, while the second fitness function (FF2) is represented by Eq. 12.
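Three pieces of this codification lend themselves to a short sketch: the decimal topology gene, the winner-take-all CER, and the connection-reduction ratio. Everything below is a hedged reconstruction from the description in the text; in particular, since the paper's equations for F3 and NMaxC are not reproduced in this copy, `connection_reduction` assumes F3 = NC/NMaxC with NMaxC taken as the MNN(MNN−1)/2 pairs allowed by the j < i restriction:

```python
import numpy as np

def decode_topology(code, num_neurons):
    """Decode a decimal topology gene into the list of neurons feeding neuron i.
    The bit string is read left to right, so '01101' (decimal 13) means that
    neurons 2, 3 and 5 are connected to neuron i, as in the text's example."""
    bits = format(int(code), "0{}b".format(num_neurons))
    return [j + 1 for j, b in enumerate(bits) if b == "1"]

def classification_error(outputs, targets):
    """CER via winner-take-all: each output vector is mapped to the index of
    its largest component and compared with the desired class index."""
    predicted = np.argmax(np.asarray(outputs), axis=1)
    desired = np.argmax(np.asarray(targets), axis=1)
    return float(np.mean(predicted != desired))

def connection_reduction(adjacency, mnn):
    """Assumed form of F3 = NC / NMaxC: the fraction of connections actually
    used, where NMaxC = MNN*(MNN-1)/2 is the assumed count of connections
    permitted by the j < i (no-cycle) restriction."""
    nc = int(np.count_nonzero(adjacency))
    nmaxc = mnn * (mnn - 1) // 2
    return nc / nmaxc
```

Combining a low error with a low `connection_reduction` value is what the composite fitness functions FF1 and FF2 aim for.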
With these functions, we will next see that we are able to design ANNs with high accuracy and a very low number of connections.

Each dataset was divided into two sets, a training set and a testing set, with the aim of proving the robustness and the performance of the methodology. The same parameters were used throughout the whole experimentation.
Depending on the problem, the ABC algorithm approaches the minimum error at different rates during the evolutionary learning process. For instance, in the object recognition problem we observed that when evolving FF1 (the one using the MSE), the error tends to diminish quickly at first and, after a certain number of generations, diminishes slowly (Figure 2(a)). On the other hand, we also observed that in some cases, when FF2 is evolved, the error reaches its minimum in a small number of epochs; nonetheless, the error tends to diminish slowly (Figure 2(b)).
Fig. 2. Evolution of the error for the ten experiments for the object recognition problem. (a) Evolution of FF1 using the MSE function. (b) Evolution of FF2 using the CER function.
V. RESULTS
Several experiments were performed in order to evaluate the accuracy of the ANNs designed by means of the proposal. The accuracy was tested with four pattern classification problems. Three of them were taken from the UCI machine learning benchmark repository [19]: the iris plant, wine and breast cancer datasets. The other database was a real object recognition problem.

Fig. 3. Evolution of the error for the ten experiments for the iris plant problem. (a) Evolution of FF1 using the MSE function. (b) Evolution of FF2 using the CER function.

The main features of each dataset are given next. For the object recognition dataset, the dimension of the input vector is 7 and the number of classes is 5. For the iris plant dataset, the dimension of the input vector is 4 and the number of classes is 3. For the wine dataset, the dimension of
Fig. 4. Evolution of the error for the ten experiments for the wine problem. (a) Evolution of FF1 using the MSE function. (b) Evolution of FF2 using the CER function.

Fig. 5. Evolution of the error for the ten experiments for the breast cancer problem. (a) Evolution of FF1 using the MSE function. (b) Evolution of FF2 using the CER function.

Figures 6 to 9 show two different ANN designs obtained for each dataset. It is important to note that the ANNs in Figures 6(a), 7(a), 8(a) and 9(a) were automatically designed by the ABC algorithm, but the fitness functions FF1 and FF2 did not include the synthesis of the ANN; in other words, these fitness functions did not use the F3 function. On the contrary, the ANNs in Figures 6(b), 7(b), 8(b) and 9(b) were automatically designed by the ABC algorithm with fitness functions FF1 and FF2 that include the synthesis of the ANN through the F3 function. Furthermore, in some cases the dimensionality of the input pattern is reduced, because some features do not contribute to the ANN's output.

Fig. 6. Two different ANN designs for the object recognition problem. (a) ANN designed by the ABC algorithm without taking into account the F3 function. (b) ANN designed by the ABC algorithm taking into account the F3 function.

Fig. 7. Two different ANN designs for the iris plant problem. (a) ANN designed by the ABC algorithm without taking into account the F3 function. (b) ANN designed by the ABC algorithm taking into account the F3 function.

Fig. 8. Two different ANN designs for the wine problem. (a) ANN designed by the ABC algorithm without taking into account the F3 function. (b) ANN designed by the ABC algorithm taking into account the F3 function.

Fig. 9. Two different ANN designs for the breast cancer problem. (a) ANN designed by the ABC algorithm without taking into account the F3 function. (b) ANN designed by the ABC algorithm taking into account the F3 function.
Table I shows the average number of connections achieved with the proposed fitness functions (FF1 and FF2). In addition, we present the average number of connections achieved when F3 is not taken into account by the proposed fitness functions. As the reader can appreciate, the number of connections decreases when the F3 function is used.
TABLE I
AVERAGE CONNECTION NUMBER
Fig. 11. Percentage of recognition for the iris problem and the ten experiments during the training and testing stages for each fitness function. (a) Percentage of recognition minimizing the FF1 function. (b) Percentage of recognition minimizing the FF2 function.

Fig. 13. Percentage of recognition for the breast cancer problem and the ten experiments during the training and testing stages for each fitness function. (a) Percentage of recognition minimizing the FF1 function. (b) Percentage of recognition minimizing the FF2 function.
TABLE III
STANDARD DEVIATION OF THE RECOGNITION RATE

Dataset       FF1 Training   FF1 Testing   FF2 Training   FF2 Testing
Object rec.   0.0386         0.0962        0.0371         0.0842
Iris plant    0.0237         0.0378        0.0189         0.0373
Wine          0.0287         0.0575        0.0304         0.1164

Fig. 12. Percentage of recognition for the wine problem and the ten experiments during the training and testing stages for each fitness function. (a) Percentage of recognition minimizing the FF1 function. (b) Percentage of recognition minimizing the FF2 function.
The percentages of recognition obtained for the benchmark datasets were:

Dataset        FF1 Training   FF1 Testing   FF2 Training   FF2 Testing
Iris plant     0.92           0.8533        0.9333         0.84
Wine           0.8989         0.7865        0.8315         0.5169
Breast cancer  0.9648         0.9444        0.9501         0.9386

From these experiments, we observed that the ABC algorithm was able to find the best configuration for an ANN given a specific set of patterns that define a classification problem. Moreover, integrating the synthesis into the fitness function leads the ABC algorithm to generate ANNs with a small number of connections and high performance. Designing an ANN consists in providing a good architecture with the best set of transfer functions and synaptic weights. The experimentation shows that all the designs generated by the proposal present an acceptable percentage of recognition in both the training and testing phases with the two fitness functions.

VI. CONCLUSIONS

The design of an ANN is achieved using the proposed methodology: the synaptic weights, the architecture and the transfer function of each neuron are evolved by means of the ABC algorithm.

REFERENCES

[1] X. Yao, "Evolving artificial neural networks," Proceedings of the IEEE, vol. 87, 1999.
[2] B. A. Garro, H. Sossa, and R. A. Vazquez, "Design of artificial neural networks using a modified particle swarm optimization algorithm," in Proceedings of the 2009 International Joint Conference on Neural Networks, ser. IJCNN'09. Piscataway, NJ, USA: IEEE Press, 2009, pp. 2363–2370.
[3] ——, "Design of artificial neural networks using differential evolution algorithm," in Proceedings of the 17th International Conference on Neural Information Processing: Models and Applications - Volume Part II, ser. ICONIP'10. Berlin, Heidelberg: Springer-Verlag, 2010, pp. 201–208.
[4] D. Karaboga and B. Akay, "Artificial Bee Colony (ABC) algorithm on training artificial neural networks," in Signal Processing and Communications Applications, 2007. SIU 2007. IEEE 15th, 2007, pp. 1–4.
[5] D. Karaboga, B. Akay, and C. Ozturk, "Artificial bee colony (ABC) optimization algorithm for training feed-forward neural networks," in Proceedings of the 4th International Conference on Modeling Decisions for Artificial Intelligence, ser. MDAI '07. Berlin, Heidelberg: Springer-Verlag, 2007, pp. 318–329.
[6] D. Karaboga and C. Ozturk, "Neural networks training by artificial bee colony algorithm on pattern classification," Neural Network World, vol. 19, no. 10, pp. 279–292, 2009.
[7] D. Karaboga, C. Ozturk, and B. Akay, "Training neural networks with ABC optimization algorithm on medical pattern classification," in International Conference on Multivariate Statistical Modelling and High Dimensional Data Mining, 2008.
[8] C. Ozturk and D. Karaboga, "Classification by neural networks and clustering with artificial bee colony (ABC) algorithm," in International Symposium on Intelligent and Manufacturing Systems, Features, Strategies