Neuroevolutionary optimization
Eva Volna
Department of Computer Science, University of Ostrava
Ostrava, 70103, Czech Republic
case of supervised learning where the exact desired output is unknown; it is based only on the information of whether or not the actual output is correct. Unsupervised learning is based solely on the correlations among input data. The essence of a learning algorithm is the learning rule, i.e., a weight-updating rule that determines how connection weights are changed. Popular learning rules, such as the delta rule, the Hebbian rule, and the competitive learning rule, are discussed in numerous publications [1].

2. Evolution in artificial neural networks

Evolution has been introduced into artificial neural networks at roughly three different levels: connection weights, architectures, and learning rules. The evolution of connection weights introduces an adaptive and global approach to problem solution. The evolution of architectures enables artificial neural networks to adapt their topologies to different tasks without human intervention and thus provides an approach to automatic artificial neural network design. The evolution of learning rules can be regarded as a process of "learning to learn" in artificial neural networks, where the adaptation of learning rules is achieved through evolution.

2.1 The evolution of connection weights

The evolutionary approach to weight training in artificial neural networks consists of two major phases. The first phase is to decide on the representation of connection weights; the second is the evolutionary process itself, simulated by evolutionary algorithms.

From the evolutionary algorithm's perspective, the most convenient representation of connection weights is a binary string. In such a representation scheme, each connection weight is represented by a number of bits of a certain length, and an artificial neural network is encoded by concatenating all the connection weights of the network into the chromosome. The order of the concatenation is, however, essentially ignored, although it can affect the performance of evolutionary training, e.g. training time and accuracy. The advantages of the binary representation lie in its simplicity and generality: it is straightforward to apply classical crossover (such as one-point or uniform crossover) and mutation to binary strings. A limitation of the binary representation is the limited precision of the discretized connection weights. It is still an open question how to optimize the number of bits per connection weight, the range encoded, and the encoding method used, although dynamic techniques could be adopted to alleviate the problem.

To overcome some shortcomings of the binary representation scheme, real numbers have been proposed to represent connection weights directly, i.e. one real number per connection weight. The chromosome is the concatenation of these real numbers, where their order is important. As connection weights are represented by real numbers, each individual in an evolving population is a real vector. Standard search operators for binary strings cannot be applied directly in the real representation scheme. In such circumstances, an important task is to carefully design a set of genetic operators suitable both for the real representation and for artificial neural network training, in order to improve the speed and accuracy of evolutionary training. Single real numbers are often changed by average crossover, random mutation, or other domain-specific genetic operators; this is discussed in [2]. The major aim is to retain useful functional blocks during evolution, i.e., to form and keep useful feature detectors in an artificial neural network.

Evolutionary algorithms are based on global search and can thus escape from a local minimum, while a gradient descent algorithm can only find a local optimum in a neighborhood of the initial solution. An evolutionary algorithm sets no restriction on the type of artificial neural network being trained, as long as a suitable fitness function can be defined properly, and can thus deal with a wide range of artificial neural networks: recurrent artificial neural networks, high-order artificial neural networks, fuzzy artificial neural networks, etc. Assigning the most suitable evolutionary algorithm to a task always represents a big problem, because each search procedure is suitable only for a class of error (fitness) functions with certain types of landscape; which kind of search procedure is more suitable for which class of error (fitness) function is an important research topic of general interest. The efficiency of evolutionary training can be improved significantly by incorporating a local search procedure into the evolution, i.e., combining the evolutionary algorithm's global search ability with local search's ability to fine-tune. Evolutionary algorithms can be used to locate a good region in the space, and a local search procedure is then used to find a near-optimal solution in this region. Reported results showed that the hybrid GA/BP approach was more efficient than either the genetic or the backpropagation algorithm used alone, because genetic algorithms are much better at locating good initial weights than random-start backpropagation. Similar work on the evolution of initial weights has also been done on competitive learning neural networks and Kohonen networks [3]. One of the problems faced by evolutionary training of artificial neural networks is the permutation problem [4], also known as the competing convention problem. It is caused by the
IJCSI International Journal of Computer Science Issues, Vol. 7, Issue 2, No 4, March 2010 33
www.IJCSI.org
many-to-one mapping from the representation (genotype) to the actual artificial neural network (phenotype): two artificial neural networks that order their hidden neurons differently in their chromosomes are still functionally equivalent. The permutation problem makes the crossover operator very inefficient and ineffective at producing good offspring. It is generally very difficult to apply crossover operators in evolving connection weights: because hidden nodes are in essence feature extractors and detectors, crossover tends to destroy feature detectors found during the evolutionary process.

2.2 The evolution of architectures

The fitness surface over the space of architectures has several properties that make evolutionary search a promising approach:
- The surface is complex and noisy, since the mapping from an architecture to its performance is indirect and dependent on the evaluation method used.
- The surface is deceptive, since similar architectures may have quite different performance.
- The surface is multimodal, since different architectures may have similar performance.
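The competing conventions underlying the permutation problem can be made concrete with a small sketch. The network size and weight values below are arbitrary illustrative choices: two genotypes whose hidden neurons are stored in a different order encode exactly the same function, yet their chromosomes look very different to a crossover operator.

```python
import math

def mlp(x, W1, b1, v):
    """Single-hidden-layer network: output = v . tanh(W1 x + b1)."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return sum(vi * hi for vi, hi in zip(v, hidden))

# genotype A: a 2-2-1 network
W1 = [[0.5, -1.2], [0.3, 0.8]]   # input-to-hidden weights
b1 = [0.1, -0.4]                 # hidden biases
v  = [1.5, -0.7]                 # hidden-to-output weights

# genotype B: the same network with its two hidden neurons swapped
W1p, b1p, vp = [W1[1], W1[0]], [b1[1], b1[0]], [v[1], v[0]]

x = [0.9, -0.2]
# both genotypes map to one phenotype: the outputs coincide
# (up to floating-point rounding)
print(mlp(x, W1, b1, v), mlp(x, W1p, b1p, vp))
```

A one-point crossover of the two flattened chromosomes would pair the first hidden neuron of A with the second of B, so offspring are likely to lose both copies of one feature detector, which is exactly why crossover is so destructive here.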
time of evolution. In [8] it is shown that the length of the genotype is proportional to the complexity of the corresponding phenotype, and that the space to be searched by the evolutionary process increases exponentially with the size of the network. Another problem of direct encoding schemes is the impossibility of encoding repeated structures (such as a network composed of several sub-networks with similar local connectivity) in a compact way: in one-to-one mappings, elements that are repeated at the level of the phenotype must be repeated at the level of the genotype as well. The next problem to be considered in binary representation is that of variable-length genomes. For example, where two parents have different topologies, it is not obvious how their offspring should be formed. When determining which nodes and connections the offspring should inherit, it would be helpful to know which subnetworks from the parents perform the same functions and which represent disjoint concepts; unfortunately, this information is not readily apparent from the two different topologies. In handling a binary representation we strive to take advantage of combining different solutions, while facing the tension between the flexibility of representations and the compatibility of genotypes.

In order to reduce the length of the genotype representation of architectures, we can use an indirect encoding scheme, where only the most important characteristics of the architecture are encoded in the chromosome. The details of each connection in an artificial neural network are either predefined according to prior knowledge or specified by a set of deterministic developmental rules. Many indirect neural network encoding strategies are inspired by Lindenmayer systems. A typical approach of this kind is grammatical encoding [8], where evolutionary algorithms do not develop the network architecture directly but evolve rules of a formal grammar, which are subsequently used for generating the network topology. The shift from the direct optimization of architectures to the optimization of developmental rules has brought some benefits, such as a more compact genotype representation: the rules do not grow with the size of the artificial neural network, since the rule size does not change. A rule is usually described by a recursive equation, or by a generation rule similar to a production rule in a production system, with a left-hand side and a right-hand side. In [9], a genetic encoding scheme for neural networks based on a cellular duplication and differentiation process was proposed. Genomes are programs written in a specialized graph transformation language called the grammar tree, which is very compact. The genotype-to-phenotype mapping starts with a single cell that undergoes a number of duplication and transformation processes, ending up in a complete neural network. In this scheme the genotype is a collection of rules governing the process of cell divisions (a single cell is replaced by two "daughter" cells) and transformations (new connections can be added and the strengths of the connections departing from a cell can be modified). In this model, therefore, connection links are established during the cellular duplication process. This mechanism allows the genotype-to-phenotype process to produce repeated phenotype structures (e.g. repeated neural sub-networks) by reusing the same genetic information, which saves space in the genome and is useful even for preserving substructures when applying the crossover operator. The literature [10] introduces a new algorithm based on Gene Expression Programming that performs a total network induction using linear chromosomes of fixed length that map into complex neural networks of different sizes and shapes. The total induction of neural networks using gene expression programming requires further modification of the structural organization developed to manipulate numerical constants and domain-specific operators. The indirect encoding scheme is biologically more plausible, as well as more practical from the engineering point of view, than the direct encoding scheme, although some fine-tuning algorithms might be necessary to further improve the result of evolution. Other techniques of indirect encoding of neural network topology are listed in numerous publications, e.g. [11, 12, 13].

The representation of artificial neural network architectures always plays an important role in the evolutionary design of architectures. There is no single method which outperforms the others in all aspects; the best choice depends heavily on the application at hand and the available prior knowledge. A problem closely related to the representation issue is the design of genetic operators. The use of crossover, however, appears to be inconsistent: crossover works best when building blocks exist, but it is unclear what a building block might be in an artificial neural network, since artificial neural networks use a distributed (knowledge) representation. The knowledge in an artificial neural network is distributed among all its weights, so recombining one part of an artificial neural network with a part of another is likely to destroy both. However, if artificial neural networks do not use a distributed representation but rather a localized one, such as radial basis function networks or nearest-neighbor multilayer perceptrons, crossover might be a very useful operator [14]. In general, artificial neural networks using distributed representation are more compact and have a better generalization capability for most practical problems. As with the evolution of connection weights, here too we have to resolve the
permutation problem, which causes enormous redundancy in the architecture space. Unfortunately, no satisfying technique has been implemented to tackle this problem. An advantage of the evolutionary approach is that the fitness function can easily be defined in such a way that an artificial neural network with some special features is evolved. For example, artificial neural networks with better generalization can be obtained if testing results, instead of training results, are used in their fitness calculations. A penalty term in the fitness function for complex connectivity can also help improve an artificial neural network's generalization ability, besides the cost benefit of reducing the number of neurons and connections in the network.

The discussion of the evolution of architectures so far deals only with the topological structure of the architecture; the transfer function of each neuron has been assumed to be fixed and predefined by human experts. The transfer function, however, has been shown to be an important part of an artificial neural network architecture and to have a significant impact on the network's performance. In principle, transfer functions of different neurons in an artificial neural network can be different and decided automatically by an evolutionary process, instead of being assigned by human experts. The decision on how to encode transfer functions in the chromosome depends on how much prior knowledge and computation time is available. In general, neurons within a group, like a layer, of an artificial neural network tend to have the same type of transfer function, with possible differences in some parameters, while different groups of neurons might have different types of transfer functions. This suggests some kind of indirect encoding method which lets developmental rules specify function parameters once the function type has been obtained through evolution, so that a more compact chromosomal encoding and faster evolution can be achieved.

2.3 Simultaneous evolution of architectures and connection weights

The evolutionary approaches discussed so far in designing artificial neural network architectures evolve architectures only, without any connection weights. Connection weights have to be learned after a near-optimal architecture is found. This is especially true if one uses an indirect encoding scheme of the network architecture. One major problem with the evolution of architectures without connection weights is noisy fitness evaluation [15, 16]. In other words, fitness evaluation is very inaccurate and noisy because a phenotype's (i.e., an artificial neural network with a full set of weights) fitness is used to approximate its genotype's (i.e., an artificial neural network without any weight information) fitness. There are two major sources of noise [15]:
- The first source is the random initialization of the weights. Different random initial weights may produce different training results; hence, the same genotype may have quite different fitness due to different random initial weights used in training.
- The second source is the training algorithm. Different training algorithms may produce different training results even from the same set of initial weights. This is especially true for multimodal error functions.

In order to reduce such noise, an architecture usually has to be trained many times from different random initial weights. The average result is then used to estimate the genotype's mean fitness. This method increases the computation time for fitness evaluation dramatically, and it is one of the major reasons why only small artificial neural networks were evolved in this way. In essence, the noise is caused by the one-to-many mapping from genotypes to phenotypes. It is clear that the evolution of architectures without any weight information has difficulties in evaluating fitness accurately. One way to alleviate this problem is to evolve artificial neural network architectures and connection weights simultaneously [17]. In this case, each individual in a population is a fully specified artificial neural network with complete weight information. Since there is a one-to-one mapping between a genotype and its phenotype, fitness evaluation is accurate.

2.4 The evolution of learning rules

An artificial neural network training algorithm may perform differently when applied to different architectures. The design of training algorithms, and more fundamentally of the learning rules used to adjust connection weights, depends on the type of architectures and learning tasks under investigation. After selecting a training algorithm, there are still algorithm parameters, like the learning rate and momentum in backpropagation, which have to be specified. For example, genetic algorithms are suitable for training artificial neural networks with feedback connections and deep feedforward artificial neural networks (with many hidden layers), while backpropagation is good at training shallow ones. At present, this kind of search for an optimal (or near-optimal) learning rule can only be done by experts through their experience and trial and error. In fact, what is needed from an artificial neural network is the ability to adjust its learning rule adaptively according to its architecture and the task to be performed. Since evolution is one of the most fundamental forms of adaptation, evolution
may contribute to the development of an appropriate learning rule for a given application; the fact that the relationship between evolution and learning is extremely complex may also be exploited. Various models have been proposed, but most of them deal with the issue of how learning can guide evolution and with the relationship between the evolution of architectures and that of connection weights. Research into the evolution of learning rules is still in its early stages; see e.g. [18, 19]. This research is important not only in providing an automatic way of optimizing learning rules and in modeling the relationship between learning and evolution, but also in modeling the creative process, since newly evolved learning rules can deal with a complex and dynamic environment.

The learning rule is commonly assumed to be a linear combination of local variables and their products:

Δw(t) = Σ_{k=1}^{n} Σ_{i1,…,ik=1}^{n} θ_{i1…ik} x_{i1}(t−1) ⋯ x_{ik}(t−1)    (3)

where t is time, Δw is the weight change, x1, x2, …, xn are local variables, and the θ's are real-valued coefficients which will be determined by evolution. In other words, the evolution of learning rules in this case is equivalent to the evolution of real-valued vectors of θ's: the major aim of the evolution of learning rules is to decide these coefficients, and different θ's determine different learning rules. Because the large number of possible terms in (3) would make evolution very slow and impractical, only a few terms have been used in practice, chosen according to some biological or heuristic knowledge [22]. Research related to the evolution of learning rules is also included in [23, 24], although these works did not evolve learning rules explicitly; the researchers emphasized the crucial role of the environment in which the evolution occurred.
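A truncated instance of a rule of the form (3) can be sketched as follows. The choice of three local variables (pre-synaptic activity, post-synaptic activity, and the current weight) and the restriction to first- and second-order terms are illustrative assumptions, not a prescription from the text:

```python
from itertools import combinations

def delta_w(theta, x):
    """Evolved learning rule: a linear combination of the local
    variables and their pairwise products (a few low-order terms
    of the general expansion), with evolved coefficients theta."""
    terms = list(x) + [a * b for a, b in combinations(x, 2)]
    if len(theta) != len(terms):
        raise ValueError("one coefficient per term is required")
    return sum(t * term for t, term in zip(theta, terms))

# illustrative local variables: pre- and post-synaptic activity
# and the current weight value
pre, post, w = 0.8, 0.5, 0.1
x = (pre, post, w)

# a genome whose only nonzero coefficient multiplies pre*post
# recovers the plain Hebbian rule  delta_w = eta * pre * post
eta = 0.1
theta_hebb = [0.0, 0.0, 0.0, eta, 0.0, 0.0]
print(delta_w(theta_hebb, x))   # ~ eta * pre * post = 0.04
```

Evolving the learning rule then reduces to evolving the real vector theta, e.g. with the same real-coded operators used for weight evolution.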
(finding an optimal solution). Such designed artificial neural networks have been shown to be quite competitive in terms of both the quality of solutions found and the computational cost. With the increasing power of parallel computers, the evolution of large artificial neural networks becomes feasible. Not only can such evolution discover possible new artificial neural network architectures and learning rules, it also offers a way to model the creative process as a result of an artificial neural network's adaptation to a dynamic environment.

References
[1] L. V. Fausett, Fundamentals of Neural Networks. Prentice-Hall, Inc., Englewood Cliffs, New Jersey, 1994.
[2] A. Molfetas, and G. Bryan, "Structured genetic algorithm representations for neural network evolution". In Proceedings of the 25th IASTED International Multi-Conference Artificial Intelligence and Applications, Austria, 2007, pp. 486-491.
[3] R. Xu, and D. Wunsch, "Survey of clustering algorithms", IEEE Transactions on Neural Networks, Vol. 16 (3), 2005, pp. 645-678.
[4] K. O. Stanley, "Comparing artificial phenotypes with natural biological patterns". In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO) Workshop Program. New York, NY: ACM Press, 2006.
[5] G. F. Miller, P. M. Todd, and S. U. Hedge, "Designing neural networks using genetic algorithms". In Schaffer, J. D. (ed.) Proceedings of the Third International Conference on Genetic Algorithms, Morgan Kaufmann, San Mateo, CA, 1989, pp. 379-384.
[6] P. G. Benardos, and G. Vosniakos, "Optimizing feedforward artificial neural network architecture". Eng. Appl. Artif. Intell., Vol. 20 (3), 2007, pp. 365-382.
[7] M. H. Luerssen, "Graph grammar encoding and evolution of automata networks". In V. Estivill-Castro (ed.) Proceedings of the Twenty-Eighth Australasian Conference on Computer Science - Volume 38, Darlinghurst, Australia, 2005, pp. 229-238.
[8] H. Kitano, "Designing neural networks using genetic algorithms with graph generation system", Complex Systems, 4, 1990, pp. 461-476.
[9] F. Gruau, D. Whitley, and L. Pyeatt, "A comparison between cellular encoding and direct encoding for genetic neural networks". In Koza, J. R., Goldberg, D. E., Fogel, D. B., and Riolo, R. L. (eds.) Genetic Programming 1996: Proceedings of the First Annual Conference, Cambridge, MA, MIT Press, 1996, pp. 81-89.
[10] C. Ferreira, "Gene expression programming: A new adaptive algorithm for solving problems". Complex Systems, 13 (2), 2001, pp. 87-129.
[11] R. Miikkulainen, "Evolving neural networks". In Proceedings of the 2007 GECCO Conference Companion on Genetic and Evolutionary Computation GECCO '07. ACM, New York, NY, 2007, pp. 3415-3434.
[12] S. Nolfi, and D. Parisi, "Evolution of artificial neural networks". In M. A. Arbib (ed.), Handbook of Brain Theory and Neural Networks, Second Edition. Cambridge, MA: MIT Press, 2002, pp. 418-421.
[13] K. O. Stanley, and R. Miikkulainen, "Evolving adaptive neural networks with and without adaptive synapses". In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2002) Workshop Program. San Francisco, CA: Morgan Kaufmann, 2002.
[14] J. Zhang, H. S. Chung, and J. Zhong, "Adaptive crossover and mutation in genetic algorithms based on clustering technique". In H. Beyer (ed.) Proceedings of the 2005 Conference on Genetic and Evolutionary Computation GECCO '05. ACM, New York, NY, 2005, pp. 1577-1578.
[15] X. Yao, G. Lin, and Y. Liu, "A new evolutionary system for evolving artificial neural networks". IEEE Trans. Neural Networks, vol. 8, 1997, pp. 694-713.
[16] X. Yao, "Evolving artificial neural networks". Proceedings of the IEEE, 87 (9), 1999, pp. 1423-1447.
[17] P. A. Castillo, J. J. Merelo, M. G. Arenas, and G. Romero, "Comparing evolutionary hybrid systems for design and optimization of multilayer perceptron structure along training parameters", Information Sciences, Vol. 177 (14), 2007, pp. 2884-2905.
[18] B. Liu, H. A. Abbass, and B. McKay, "Classification rule discovery with ant colony optimization". In the IEEE/WIC International Conference on Intelligent Agent Technology IAT 2003, Halifax, Canada, 2003.
[19] K. Shafi, H. Abbass, and W. Zhu, "Real time signature extraction during adaptive rule discovery using UCS". In IEEE Congress on Evolutionary Computation (CEC), Singapore, 2007.
[20] R. A. Jacobs, "Increased rates of convergence through learning rate adaptation". Neural Networks, vol. 1, no. 3, 1988, pp. 295-307.
[21] X. Yao, "Evolutionary artificial neural networks". In Kent, A., and Williams, J. G. (eds.) Encyclopedia of Computer Science and Technology, vol. 33, New York: Marcel Dekker, 1995, pp. 137-170.
[22] D. J. Chalmers, "The evolution of learning: An experiment in genetic connectionism". In Touretzky, D. S., Elman, J. L., Sejnowski, T. J., and Hinton, G. E. (eds.) Proceedings of the 1990 Connectionist Models Summer School. Morgan Kaufmann, San Mateo, CA, 1990.
[23] S. Nolfi, J. L. Elman, and D. Parisi, Learning and Evolution in Neural Networks. Center for Research in Language, Univ. of California, San Diego, Tech. Rep. CRL-9019, July 1990.
[24] D. Parisi, F. Cecconi, and S. Nolfi, "Econets: neural networks that learn in an environment". Network, 1, 1990, pp. 119-168.

Eva Volna graduated from the Slovak Technical University in Bratislava and defended her PhD thesis entitled "Modular Neural Networks". She has been working as a lecturer at the Department of Computer Science, University of Ostrava (Czech Republic) since 1992. Her interests include artificial intelligence, artificial neural networks, evolutionary algorithms, and cognitive science. She is the author of more than 50 scientific publications.