
International Journal of Approximate Reasoning 54 (2013) 1434–1451


An efficient multi-objective evolutionary fuzzy system for regression problems

Michela Antonelli, Pietro Ducange, Francesco Marcelloni *
Dipartimento di Ingegneria dell'Informazione, University of Pisa, 56122 Pisa, Italy

Article history: Received 22 March 2013; Received in revised form 14 June 2013; Accepted 17 June 2013; Available online 26 June 2013

Keywords: Multi-objective evolutionary fuzzy systems; High-dimensional datasets; Large datasets; Regression problems

Abstract

During the last years, multi-objective evolutionary algorithms (MOEAs) have been extensively employed as optimization tools for generating fuzzy rule-based systems (FRBSs) with different trade-offs between accuracy and interpretability from data. Since the size of the search space and the computational cost of the fitness evaluation depend on the number of input variables and instances, respectively, managing high-dimensional and large datasets is a critical issue.

In this paper, we focus on MOEAs applied to learn concurrently the rule base and the data base of Mamdani FRBSs and propose to tackle the issue by exploiting the synergy between two different techniques. The first technique is based on a novel method which reduces the search space by learning rules not from scratch, but rather from a heuristically generated rule base. The second technique performs an instance selection by exploiting a co-evolutionary approach where cyclically a genetic algorithm evolves a reduced training set which is used in the evolution of the MOEA.

The effectiveness of the synergy has been tested on twelve datasets. Using non-parametric statistical tests we show that, although achieving statistically equivalent solutions, the adoption of this synergy allows saving up to 97.38% of the execution time with respect to a state-of-the-art multi-objective evolutionary approach which learns rules from scratch.

© 2013 Elsevier Inc. All rights reserved.

1. Introduction

Mamdani fuzzy rule-based systems (MFRBSs) [1,2] are widely used in different engineering fields such as control, pattern
recognition, system identification, and signal analysis, thanks to their capability of explaining how they elaborate the input
values for producing an output. An MFRBS consists of a completely linguistic rule base (RB), a database (DB) containing the
fuzzy sets associated with the linguistic terms used in the RB and a fuzzy logic inference engine. RB and DB compose the
knowledge base (KB) of the MFRBS.
The most natural approach to build an MFRBS is to elicit the knowledge from a human expert and to codify this knowl-
edge in the KB, but sometimes the complexity of the application domain can make this approach hardly viable. Thus, several
methods have been proposed in the literature to generate the KB from data (typically expressed as input-output patterns).
When MFRBSs are generated with the only objective of maximizing the accuracy, they are generally characterized by a high
number of rules and by linguistic fuzzy partitions with a low level of comprehensibility, thus losing that feature which
may make MFRBSs preferable to other approaches, namely their interpretability [2–5]. Thus, in the last years, the generation
of MFRBSs from data has been modelled as a multi-objective optimization problem, taking accuracy and interpretability as

* Corresponding author. Tel: +39 0502217678; fax: +39 0502217600.


E-mail address: f.marcelloni@iet.unipi.it (F. Marcelloni).

http://dx.doi.org/10.1016/j.ijar.2013.06.005

the objectives to be optimized. Multi-objective evolutionary algorithms have been so widely used as an optimization technique
in this framework that the term multi-objective evolutionary fuzzy systems (MOEFSs) has been coined [2,6,7].
In the first MOEFSs proposed in the literature, multi-objective evolutionary algorithms have been used to select [8,9] or
learn [10–12] rules, and to perform the tuning [13] of the DB with prefixed DB and RB, respectively. On the other hand, the
most recent MOEFSs perform the learning [14–18] or the selection [19–22] of the rules concurrently with the learning of
some elements of the DB, namely the granularity and the membership function parameters of the fuzzy partitions. While
in rule selection, rules are selected from an RB generated from data by some heuristic, in rule learning, rules are created
during the evolutionary process.
Two main drawbacks limit the effective use of the MOEFSs. First, the size of the search space grows with the increase of
the number of input variables, thus leading to a slow and possibly difficult convergence of the evolutionary algorithms. This
problem is particularly relevant in MOEFSs which employ rule learning [10,11,18]. Second, the computational cost of the
fitness evaluation increases linearly with the number of instances in the dataset [14,23,24], thus forcing us to
limit the number of evaluations especially when the dataset is large. In the literature, the first drawback has been tackled
in three different ways: (i) pre-processing the datasets by applying a feature selection algorithm [25,26], (ii) exploiting
ad-hoc modified multi-objective evolutionary algorithms [23], and/or (iii) using a reduced number of parameters for the DB
learning or tuning [14,23].
Three main approaches have been proposed to manage the second drawback, namely parallel implementation of the evo-
lutionary algorithms [27–29], fitness approximation approaches [30,31] and data reduction techniques [32,33]. In particular,
the last approach speeds up the evaluation of the fitness function by using a reduced training set (TS). Instance selection
is one of the most used data reduction techniques [32]. There are two main strategies in instance selection: prototype
selection and TS selection (see [33] for a recent review and taxonomy of instance selection techniques).
Prototype selection performs an instance removal from the TS so as to retain only those instances which allow the
1-NN classifier to achieve the maximum classification rate. TS selection aims to identify a reduced set of representative
instances: unlike prototype selection, which is specifically targeted to the nearest neighbor classification, the representative
instances extracted by TS selection can be used in different machine learning algorithms for different applications such as
regression, classification, subgroup discovery and clustering. The work in [33] has highlighted that evolutionary instance
selection outperforms non-evolutionary instance selection in terms of both instance reduction rate and classification accuracy. In [24] we have
introduced a co-evolutionary algorithm to concurrently perform the evolutionary instance selection and the multi-objective
evolutionary learning of MFRBSs. We have demonstrated that the co-evolutionary algorithm allows us to save up to 86% of
the execution time without any statistically relevant loss of accuracy.
In this paper, we propose a new method to perform efficient multi-objective evolutionary learning of MFRBSs from
high-dimensional and large datasets. To the best of our knowledge, only one other MOEFS has been recently proposed for
dealing with high-dimensional and large datasets [23]. This MOEFS performs a multi-objective evolutionary DB learning with
an embedded RB generation. To this aim, it employs a chromosome that codifies the number of involved input variables
(10 in the initial population) and, for each linguistic variable, the granularity and a lateral displacement of the overall fuzzy
partition with respect to an initial position. The RB is generated by using a modified version of the Wang and Mendel
algorithm [34], which uses a rule-cropping mechanism to speed up the RB generation: this mechanism allows generating
50 rules at most. The multi-objective optimization is carried out by using an evolutionary algorithm based on SPEA2, which
embeds both incest prevention and restarting mechanisms. A post-processing stage, consisting of a multi-objective rule
selection and membership function parameter tuning, improves the accuracy of the generated solutions. In order to deal
with large datasets, a method for estimating the training error from a reduced subset of the instances is also adopted.
Unlike the approach in [23], which adopts a number of techniques (input variable selection, a unique parameter for tuning
all the membership functions of a partition, a rule-cropping mechanism for speeding up the RB generation, instance set
selection, etc.) for managing high-dimensional and large datasets, we adopt two different strategies that work in synergy.
We manage the high-dimensionality issue by learning the rules in a constrained search space: during the evolutionary
process, we select a reduced number of rules from a heuristically generated RB and, unlike classical rule selection, a reduced
number of conditions for each selected rule. Thus, this approach can be considered a rule learning approach, since rules are generated during the evolutionary process, although chosen from a pre-defined set of conditions. We exploit the Wang and
Mendel algorithm [34] to generate the initial RB and a modified version of the classical (2 + 2)PAES [35] as multi-objective
evolutionary algorithm. In the following, we denote the approach as PAES-RCS, where RCS stands for rule and condition
selection.
The modified version of the (2 + 2)PAES has proved to be very effective in our previous works on MOEFSs based on RB
learning [11,14–17]. In particular, during the evolutionary process, PAES-RCS concurrently learns the RB and membership
function parameters of the linguistic variables by minimizing the prediction error and the RB complexity. We adopt a chro-
mosome consisting of two parts that are used to codify the RB and the DB, respectively. The first part of the chromosome
codifies the set of rules selected from the initial RB and, for each selected rule, the selected conditions. The second part of
the chromosome is a real value vector which, for each linguistic variable, codifies how the variable is partitioned. In partic-
ular, each gene codifies the position of the core of a triangular fuzzy set. Further, we impose that each core coincides with
the left and right extremes of the supports of the right and left adjacent fuzzy sets, respectively. This reduces the number
of parameters, especially in the case of high-dimensional data, and ensures that strong fuzzy partitions are always maintained during the evolutionary process.

To deal with large datasets we integrate PAES-RCS into the co-evolutionary model for instance selection we proposed
in [24]. A single objective genetic algorithm (SOGA) and the PAES-RCS are cyclically executed in sequence for a pre-fixed
number of iterations: the SOGA evolves a reduced TS which is used in the evolution of PAES-RCS.
Actually, we have discussed a preliminary version of the RCS learning method in [36] and of the integration of PAES-RCS
into the co-evolutionary model in [37]. This paper includes more details and discussions regarding the RCS approach
and shows a more complete and thorough experimental analysis. In particular, we discuss a number of simulations on a
higher number of real-world regression datasets (twelve instead of two), an extensive statistical study and an accurate
analysis of the computational times.
To discuss the major strengths of our approach, we first verify that PAES-RCS converges more rapidly to statistically
equivalent Pareto fronts than the multi-objective evolutionary approach based on learning rules from scratch we have intro-
duced in [11] and successfully employed also in other works [14–17]. We denote this multi-objective evolutionary learning
as PAES-RL in the following. By using non-parametric statistical tests, we have proved that PAES-RCS after 50,000 fitness
evaluations achieves results statistically equivalent, in terms of hypervolume and epsilon dominance on the training set
and accuracy on the test set, to the ones generated by the PAES-RL after 300,000 fitness evaluations. Then, we show that
the co-evolutionary approach with PAES-RCS executed using 10% of the overall TS achieves results statistically equivalent
to the ones generated by executing PAES-RCS and PAES-RL with the overall TS for 50,000 and 300,000 fitness evaluations,
respectively. Thus, the reduction of the number of evaluations, obtained by the rule and condition selection technique, and
the use of the co-evolutionary approach, which allows reducing the computational time of the fitness evaluation by using
only 10% of the overall TS, permit us to save up to 97.38% of the execution time needed by PAES-RL.
The paper is organized as follows: Section 2 briefly describes the MFRBSs and introduces some notation. In Section 3
we introduce the new rule and condition selection approach. Section 4 describes the integration of PAES-RCS into the
co-evolutionary instance selection approach. Section 5 shows the experimental results and Section 6 draws some conclusions.

2. Mamdani fuzzy rule-based systems

Let $X = \{X_1, \ldots, X_F\}$ be the set of input variables and $X_{F+1}$ be the output variable. Let $U_f$, with $f = 1, \ldots, F+1$, be the universe of the $f$th variable $X_f$. Let $P_f = \{A_{f,1}, \ldots, A_{f,T_f}\}$ be a strong fuzzy partition of $T_f$ fuzzy sets on variable $X_f$. The RB of an MFRBS is composed of $M$ rules expressed as:

$$R_m: \text{If } X_1 \text{ is } A_{1,j_{m,1}} \text{ and } \ldots \text{ and } X_f \text{ is } A_{f,j_{m,f}} \text{ and } \ldots \text{ and } X_F \text{ is } A_{F,j_{m,F}} \text{ then } X_{F+1} \text{ is } A_{F+1,j_{m,F+1}} \quad (1)$$

where $j_{m,f} \in [1, T_f]$, $f = 1, \ldots, F+1$, identifies the index of the fuzzy set (among the $T_f$ linguistic terms of partition $P_f$) which has been selected for $X_f$ in rule $R_m$.

We adopt triangular fuzzy sets $A_{f,j}$ defined by the tuple $(a_{f,j}, b_{f,j}, c_{f,j})$, where $a_{f,j}$ and $c_{f,j}$ correspond to the left and right extremes of the support of $A_{f,j}$, and $b_{f,j}$ to the core. Further, we assume that $a_{f,1} = b_{f,1}$ and $b_{f,T_f} = c_{f,T_f}$, and, for $j = 2, \ldots, T_f - 1$, $b_{f,j} = c_{f,j-1}$ and $b_{f,j} = a_{f,j+1}$.

To take the "don't care" condition into account, a new fuzzy set $A_{f,0}$ is added to all the $F$ input partitions $P_f$, $f = 1, \ldots, F$. This fuzzy set is characterized by a membership function equal to 1 on the overall universe [38]. The terms $A_{f,0}$ allow generating rules which contain only a subset of the input variables. It follows that $j_{m,f} \in [0, T_f]$, $f = 1, \ldots, F$, and $j_{m,F+1} \in [1, T_{F+1}]$.

As stated in [11], the RB of an MFRBS can be completely described by a matrix $J \in \mathbb{R}^{M \times (F+1)}$, whose generic element $j_{m,f}$ indicates that fuzzy set $A_{f,j_{m,f}}$ has been selected for variable $X_f$ in rule $R_m$.
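A minimal Python sketch of this representation (assuming normalized universes and the triangular, strong-partition constraints above; the helper names are ours):

```python
def triangular(x, a, b, c):
    """Membership degree of x in the triangular fuzzy set (a, b, c)."""
    if x < a or x > c:
        return 0.0
    if x == b:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def strong_partition(cores):
    """Triangular sets (a, b, c) from the sorted cores of a variable.

    Each core coincides with the support extremes of the adjacent
    fuzzy sets, so the membership degrees sum to 1 everywhere."""
    sets_ = []
    for j, b in enumerate(cores):
        a = cores[j - 1] if j > 0 else b               # a_{f,1} = b_{f,1}
        c = cores[j + 1] if j < len(cores) - 1 else b  # b_{f,T_f} = c_{f,T_f}
        sets_.append((a, b, c))
    return sets_

P = strong_partition([0.0, 0.5, 1.0])       # T_f = 3 uniform fuzzy sets
print(sum(triangular(0.3, *s) for s in P))  # 1.0: strong partition
```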

3. The rule and condition selection approach

Typically, in MOEFSs based on rule learning a chromosome codifies the overall RB: each gene in the chromosome identi-
fies the index of the fuzzy set selected for the corresponding linguistic variable in each rule of the RB. The dimension of the
chromosome, and consequently of the search space, increases with the increase of the number of the input variables. Thus,
when the number of the input variables is high, the multi-objective evolutionary algorithm generally needs a large amount
of evaluations to adequately explore the search space and therefore achieve good solutions.
Also in MOEFSs based on rule selection, a chromosome codifies the overall RB. Unlike MOEFSs based on rule learning,
however, each gene identifies a rule from an initial RB generated by exploiting some heuristic. Thus, MOEFSs based on
rule selection manage a smaller search space than MOEFSs based on rule learning. However, since heuristics, like the Wang
and Mendel algorithm, adopted to generate the initial RB do not use the “don’t care” term, rules always have a number of
conditions equal to the number of linguistic variables. Thus, when dealing with high-dimensional datasets, the generated
RBs turn out to be quite complex [19–21]. In [9,22], where a multi-objective evolutionary rule selection approach is applied
to classification problems, the authors deal with "don't care" terms, but they include these terms in the initial RB thanks to
specific RB generation heuristics. The work in [22] includes novel strategies for designing the semantics and syntax of the
fuzzy linguistic terms.

Fig. 1. An example of the $C_{RB}$ part of a chromosome.

To exploit the advantages of both rule learning (potentially better trade-offs between accuracy and complexity) and rule
selection (reduced search space and consequently faster convergence), in PAES-RCS we propose to learn rules not from
scratch, but rather from a heuristically generated initial RB. In particular, we apply the Wang and Mendel algorithm to
the data for generating this RB by using a uniform fuzzy partition with $T_f$ fuzzy sets for each linguistic variable $X_f$.
Then, during the evolutionary process, we learn RBs of MFRBSs by selecting rules and conditions from the initial RB, and
concurrently the corresponding DBs by determining the membership function parameters. In the following, we discuss in
detail the chromosome coding, the mating operators and the multi-objective evolutionary algorithm used to this aim.
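As an illustration of this step, the following sketch generates the initial RB with the Wang and Mendel heuristic (reusing the `triangular` helper sketched in Section 2; the degree-based conflict resolution follows [34]):

```python
import numpy as np

def wang_mendel(X, y, partitions):
    """Generate J_WM from data: X is an (N, F) array of inputs, y the
    (N,) outputs, all normalized in [0, 1]; partitions holds F+1 lists
    of triangular sets (a, b, c), one list per variable.  Returns the
    rules as tuples of F+1 fuzzy-set indices (1-based, as in matrix J)."""
    best = {}  # antecedent -> (importance degree, consequent index)
    for pattern in np.column_stack((X, y)):
        idx, degree = [], 1.0
        for f, v in enumerate(pattern):
            mus = [triangular(v, *s) for s in partitions[f]]
            j = int(np.argmax(mus))   # best-matching fuzzy set for X_f
            idx.append(j + 1)
            degree *= mus[j]
        ant, cons = tuple(idx[:-1]), idx[-1]
        if ant not in best or degree > best[ant][0]:
            best[ant] = (degree, cons)  # keep the highest-degree rule
    return [ant + (cons,) for ant, (_, cons) in best.items()]
```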

3.1. The chromosome coding

PAES-RCS employs an appropriate chromosome coding and properly defined mating operators. In particular, chromosome
$C$ is composed of two parts $(C_{RB}, C_{DB})$, which define the RB and the membership function parameters of the input variables,
respectively. We indicate with $J_{WM}$ and $M_{WM}$ the initial RB generated by applying the Wang and Mendel algorithm to the
training set and the number of rules of this RB, respectively. We recall that the antecedent of each candidate rule has a
condition for each linguistic variable, since the Wang and Mendel algorithm performs no selection of the conditions. The
$C_{RB}$ part of the chromosome is a vector of $M_{MAX}$ pairs $p_m = (k_m, \mathbf{v}_m)$, where $k_m \in [0, M_{WM}]$ identifies the index of the
rule in $J_{WM}$ selected for the current RB and $\mathbf{v}_m = [v_{m,1}, \ldots, v_{m,F}]$ is a binary vector which indicates, for each condition
in the rule, whether the condition has to be preserved in the rule ($v_{m,f} = 1$) or replaced by a "don't care" condition ($v_{m,f} = 0$).
If $k_m = 0$, the $m$th rule is not included in the RB. In this way we manage to generate RBs with a lower number of rules
than $M_{MAX}$. As an example, let us consider $M_{MAX} = 3$ and let us suppose to have a two-input fuzzy model with four rules
generated by the Wang and Mendel algorithm and described by the following $J_{WM}$ matrix:
$$J_{WM} = \begin{bmatrix} 3 & 2 & 2 \\ 5 & 4 & 1 \\ 1 & 5 & 5 \\ 2 & 2 & 4 \end{bmatrix} \quad (2)$$

Let us assume that, during the evolutionary process, the $C_{RB}$ chromosome part shown in Fig. 1 is generated. Then, the corresponding RB will be represented by the following matrix $J$:

$$J = \begin{bmatrix} 3 & 0 & 2 \\ 0 & 5 & 5 \end{bmatrix} \quad (3)$$
We note that, even though $M_{MAX} = 3$, only two rules have been selected in the final RB. Further, the first and the second
conditions have been selected for the first and the third rules of the initial RB, respectively.
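The decoding of the $C_{RB}$ part into the matrix $J$ can be sketched as follows (a hypothetical helper consistent with the example of Eqs. (2) and (3)):

```python
import numpy as np

def decode_rb(c_rb, J_WM):
    """c_rb: list of M_MAX pairs (k_m, v_m); k_m == 0 disables the rule,
    v_m is a binary mask over the F antecedent conditions."""
    rows = []
    for k_m, v_m in c_rb:
        if k_m == 0:
            continue                    # rule not included in the RB
        rule = list(J_WM[k_m - 1])      # k_m is 1-based
        for f, keep in enumerate(v_m):
            if not keep:
                rule[f] = 0             # index 0 encodes "don't care"
        rows.append(rule)               # the consequent is never masked
    return np.array(rows)

J_WM = np.array([[3, 2, 2], [5, 4, 1], [1, 5, 5], [2, 2, 4]])
print(decode_rb([(1, [1, 0]), (3, [0, 1]), (0, [1, 1])], J_WM))
# [[3 0 2]
#  [0 5 5]]   -> the matrix J of Eq. (3)
```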
In order to perform the membership function parameter learning, we exploit the $C_{DB}$ part, which codifies the positions
of the cores of each fuzzy set in each linguistic variable by using a real coding. Indeed, since we adopt strong fuzzy
partitions with, for $j = 2, \ldots, T_f - 1$, $b_{f,j} = c_{f,j-1}$ and $b_{f,j} = a_{f,j+1}$, in order to define each fuzzy set of the partition it
is sufficient to fix the positions of the cores $b_{f,j}$ along the universe $U_f$ of the $f$th variable (we normalize each variable
in $[0, 1]$). As $b_{f,1}$ and $b_{f,T_f}$ coincide with the extremes of the universe, the partition of each linguistic variable $X_f$ is
completely defined by $T_f - 2$ parameters. In our previous works [15–17], we have verified that this scheme of DB coding
reduces the number of parameters with respect to, for instance, the classical three-point approach, without affecting the
modeling capability. Fig. 2 shows the chromosome part which consists of $F + 1$ vectors of real numbers: the $f$th vector
contains the cores $[b_{f,2}, \ldots, b_{f,T_f-1}]$ which define the positions of the membership functions for the linguistic variable
$X_f$. To ensure a good integrity level of the membership functions, in terms of order, coverage and distinguishability [3,5],
$\forall j \in [2, T_f - 1]$ we force $b_{f,j}$ to vary in the definition interval $\left[b_{f,j} - \frac{b_{f,j} - b_{f,j-1}}{2},\ b_{f,j} + \frac{b_{f,j+1} - b_{f,j}}{2}\right]$.

3.2. The multi-objective evolutionary learning of the KB

As multi-objective evolutionary algorithm, PAES-RCS adopts the modified version of the classical (2 + 2)PAES we have
proposed in [11]. We denoted this version as (2 + 2)M-PAES, where M-PAES stands for modified PAES.¹ Unlike the classical
(2 + 2)PAES [35], which uses only mutation to generate new candidate solutions, (2 + 2)M-PAES exploits both crossover and
mutation. Further, in (2 + 2)M-PAES, current solutions are randomly extracted at each iteration rather than maintained until
they are replaced by solutions with particular characteristics.

¹ Actually, in the specialized literature, the Memetic PAES proposed in [39] is also denoted as M-PAES. However, in our papers we have always referred
to the modified PAES as (2 + 2)M-PAES so as to avoid possible misunderstandings.

Fig. 2. The $C_{DB}$ part of a chromosome.

(2 + 2)M-PAES determines an approximation of the optimal Pareto front by concurrently minimizing the MSE/2 and the
RB complexity. The MSE/2 is calculated as:

$$\mathrm{MSE}/2 = \frac{1}{2 \cdot |S|} \sum_{l=1}^{|S|} \left(F(\mathbf{x}^l) - y^l\right)^2, \quad (4)$$

where $|S|$ is the size of the dataset, $F(\mathbf{x}^l)$ is the output obtained from the MFRBS when the $l$th input pattern is considered,
and $y^l$ is the desired output.
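In code, Eq. (4) is simply (a sketch; `frbs_output` stands for the inferred outputs $F(\mathbf{x}^l)$):

```python
import numpy as np

def half_mse(frbs_output, y):
    """MSE/2 of Eq. (4): half the mean squared error over the dataset."""
    frbs_output = np.asarray(frbs_output, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.sum((frbs_output - y) ** 2) / (2.0 * len(y))
```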
The RB complexity is measured as the sum of the conditions which compose the antecedents of the rules included in the
RB. Thus, low values of complexity correspond to RBs characterized by a low number of rules and a low number of input
variables actually used in each rule. In order to generate the offspring populations, we exploit both crossover and mutation.
We apply the one-point crossover to the $C_{RB}$ part and the BLX-$\alpha$ crossover, with $\alpha = 0.5$, to the $C_{DB}$ part.
Let $s_1$ and $s_2$ be two selected parent chromosomes. As regards the crossover of the $C_{RB}$ part, we choose the common
gene by randomly extracting a number in $[1, \rho_{MAX} - 1]$, where $\rho_{MAX}$ is the maximum number of rules in $s_1$ and $s_2$. The
crossover point is always chosen between two rules and not within a rule. When we apply the one-point crossover to the
$C_{RB}$ part, we can generate an RB with one or more pairs of equal rules. In this case, we simply eliminate one of the rules
from each pair by setting the corresponding $k_m$ to zero. This allows us to reduce the total number of rules.
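A possible implementation of this crossover and of the duplicate-rule elimination (a sketch; as a simplification, two rules are considered equal when both $k_m$ and the mask $\mathbf{v}_m$ coincide):

```python
import random

def active_rules(c_rb):
    """Number of rules actually included in the RB (pairs with k_m != 0)."""
    return sum(1 for k_m, _ in c_rb if k_m != 0)

def crossover_rb(p1, p2, rng=random):
    """One-point crossover on two C_RB parts: the cut point is drawn in
    [1, rho_max - 1], so it always falls between rules, never inside one."""
    rho_max = max(active_rules(p1), active_rules(p2))
    cut = rng.randint(1, max(1, rho_max - 1))
    o1, o2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
    return drop_duplicates(o1), drop_duplicates(o2)

def drop_duplicates(c_rb):
    """Disable (k_m = 0) one rule of each pair of equal rules."""
    seen, out = set(), []
    for k_m, v_m in c_rb:
        key = (k_m, tuple(v_m))
        if k_m != 0 and key in seen:
            out.append((0, v_m))
        else:
            seen.add(key)
            out.append((k_m, v_m))
    return out
```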
As regards the mutation of the $C_{RB}$ part, we have defined two operators. The first step of each operator is to randomly
select a rule (i.e., a pair $p_m = (k_m, \mathbf{v}_m)$) in the chromosome. The first operator replaces the value of $k_m$ in the selected pair
with an integer value randomly generated in $[1, M_{WM}]$. If the old value of $k_m$ was equal to zero, the new chromosome
will contain an additional rule. The second operator modifies the antecedent $\mathbf{v}_m$ of the selected rule by complementing
each gene $v_{m,f}$ with a probability equal to $P_{cond}$ ($P_{cond} = 2/F$ in the experiments). The two operators are applied with
two different probabilities, $P_{MRB1}$ and $P_{MRB2}$. After applying the two mutation operators, we check the chromosome for
duplicate rules in the RB.

The mutation operator applied to $C_{DB}$ first randomly chooses a variable $X_f$, $f \in [1, F+1]$, and a fuzzy set $j \in [2, T_f - 1]$,
and then replaces the value of $b_{f,j}$ with a value randomly chosen within its definition interval.
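The three mutation operators can be sketched as follows (in-place on list-based chromosomes; `intervals` holds the definition intervals computed above):

```python
import random

def mutate_rb_first(c_rb, M_WM, rng=random):
    """Replace k_m of a random pair with a random index in [1, M_WM];
    if the old k_m was 0, this adds a rule to the RB."""
    m = rng.randrange(len(c_rb))
    k_m, v_m = c_rb[m]
    c_rb[m] = (rng.randint(1, M_WM), v_m)

def mutate_rb_second(c_rb, F, rng=random):
    """Complement each condition bit of a random pair with probability
    P_cond = 2/F (the value used in the experiments)."""
    p_cond = 2.0 / F
    m = rng.randrange(len(c_rb))
    k_m, v_m = c_rb[m]
    c_rb[m] = (k_m, [1 - v if rng.random() < p_cond else v for v in v_m])

def mutate_db(c_db, intervals, rng=random):
    """Redraw one interior core within its definition interval; c_db and
    intervals are indexed per variable and per interior core."""
    f = rng.randrange(len(c_db))
    j = rng.randrange(len(c_db[f]))
    lo, hi = intervals[f][j]
    c_db[f][j] = rng.uniform(lo, hi)
```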
Fig. 3 shows a pseudo-code which describes the application scheme of the different operators to generate the offspring
solutions $o_1$ and $o_2$ from the selected parents $s_1$ and $s_2$. Note that $P_{CRB}$ and $P_{CDB}$ represent the probabilities of applying
the crossover operator to the $C_{RB}$ and $C_{DB}$ parts, respectively, while $P_{MDB}$ represents the probability of applying the mutation
operator to $C_{DB}$. The values of the probabilities used in the experiments are reported in the experimental part.

At the beginning, we generate two solutions $s_1$ and $s_2$. The genes of the $C_{DB}$ part are randomly generated, while for
the $C_{RB}$ part, we randomly generate only the $k_m$ values and initialize $v_{m,f} = 1$ for all the antecedents of all the rules. At
each iteration, the application of the crossover and mutation operators produces two new candidate solutions from the current
solutions $s_1$ and $s_2$. These candidate solutions are added to the archive only if they are dominated by no solution contained
in the archive; possible solutions in the archive dominated by the candidate solutions are removed. Typically, the size of
the archive is fixed at the beginning of the execution of the (2 + 2)M-PAES. In this case, when the archive is full and a
new solution $z$ has to be added to the archive, if it dominates no solution in the archive, then we insert $z$ into the archive
and remove the solution (possibly $z$ itself) that belongs to the region with the highest crowding degree [35]. If the region
contains more than one solution, then the solution to be removed is randomly chosen.
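The archive update just described can be sketched as follows (`f` maps a solution to its objective vector; `region` stands for the adaptive-grid region of PAES [35], assumed given):

```python
import random

def dominates(u, v):
    """Pareto dominance between two minimized objective vectors."""
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))

def try_insert(archive, z, f, capacity, region, rng=random):
    """Candidate z enters only if no archived solution dominates it;
    archived solutions dominated by z are removed; on overflow, one
    solution (possibly z itself) is dropped from the most crowded region."""
    fz = f(z)
    if any(dominates(f(a), fz) for a in archive):
        return False
    archive[:] = [a for a in archive if not dominates(fz, f(a))]
    archive.append(z)
    if len(archive) > capacity:
        regions = [region(f(a)) for a in archive]
        crowded = max(set(regions), key=regions.count)
        victims = [i for i, r in enumerate(regions) if r == crowded]
        archive.pop(rng.choice(victims))
    return True
```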

4. The co-evolutionary approach

In order to deal with datasets characterized by a huge number of instances, we integrate PAES-RCS into the co-
evolutionary approach we proposed in [24]. Fig. 4 shows the execution flow of the co-evolutionary approach by using
the UML activity diagram notation. In the activity diagram, rectangles and rounded rectangles represent data objects and
activities, respectively, and diamonds are used as decision and merge points. During the execution of PAES-RCS, an SOGA
periodically evolves a population of reduced TSs for a fixed number $E_{SO}$ of fitness evaluations. The SOGA aims to maximize a
purposely-defined index which measures how close the Pareto fronts computed by using, respectively, the reduced TS and
the overall TS are to each other: the closer the fronts are, the more the reduced TS is representative of the overall TS.

Fig. 3. Application scheme of the evolutionary operators in PAES-RCS.

The reduced TS with the highest fitness value is used in PAES-RCS to perform the multi-objective learning of a population
of KBs. PAES-RCS is executed for a fixed number $E_{MO}$ of KB fitness evaluations. The resulting approximated Pareto front
(set of non-dominated KBs with different accuracy-complexity trade-offs) is used in the execution of the SOGA to calculate
the fitness function. At the end of its execution, the co-evolutionary approach returns a set of non-dominated KBs in the
complexity-accuracy plane. In [24], we have demonstrated the effectiveness of the co-evolutionary approach by using, in
place of PAES-RCS, the RB learning we introduced in [11].
In order to deal with a huge number of instances, we split the TS into $B$ disjoint blocks, where each block contains
$N/B$ instances randomly extracted from the TS. The chromosome $C_{TS}$ of the SOGA population codifies the set of $K$ blocks
selected for the reduced TS: each gene is an integer which varies from 1 to $B$ and represents the index of the block selected
to be included in the reduced TS. Individuals are selected for reproduction by using the roulette wheel selection. Classical
one-point crossover and uniform mutation operators are applied with probabilities $P_c^{TS}$ and $P_m^{TS}$, respectively. Further, we
use an elitist strategy where the worst individuals of the offspring population are replaced with the best individuals of the
parent population with a percentage of 5%. In our experiments, the values of $B$ and $K$ have been determined by fixing
the number $I$ of instances contained in a block and the desired percentage of instances of the overall TS contained in the
reduced TS, respectively. More details regarding the co-evolutionary approach can be found in [24].
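The $C_{TS}$ encoding and its operators are straightforward (a sketch):

```python
import random

def random_c_ts(B, K, rng=random):
    """A C_TS chromosome: K genes, each the 1-based index of one of the
    B blocks selected for the reduced TS."""
    return [rng.randint(1, B) for _ in range(K)]

def reduced_training_set(c_ts, blocks):
    """Concatenate the selected blocks into the reduced TS."""
    return [inst for k in c_ts for inst in blocks[k - 1]]

def uniform_mutation(c_ts, B, p_m, rng=random):
    """Uniform mutation: each gene is redrawn with probability P_m^TS."""
    return [rng.randint(1, B) if rng.random() < p_m else g for g in c_ts]
```

For instance, with $I = 5$ (Table 3) on Abalone ($N = 4177$), the TS is split into $B \approx 835$ blocks, and a 10% reduced TS corresponds to roughly $K \approx 84$ blocks.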

5. Experimental results

5.1. Experimental set up

We tested PAES-RCS on twelve regression problem datasets extracted from the Keel (available at http://sci2s.ugr.es/keel/
datasets.php) and Torgo’s (available at http://www.liaad.up.pt/~ltorgo/Regression/DataSets.html) repositories.
As shown in Table 1, the datasets are characterized by different numbers (from 6 to 40) of input variables and different
numbers (from 4052 to 40768) of input/output instances. In the following subsections, first we show how the constrained
RCS learning proposed in this paper speeds up the convergence with respect to classical rule learning. To this aim, we
compare the results obtained by PAES-RCS with the ones achieved by PAES-RL, a state-of-the-art MOEFS based on classical
rule learning we proposed in [11] and successfully used in our recent papers [15–17]. Both PAES-RCS and PAES-RL employ
the same multi-objective evolutionary algorithm, namely (2 + 2)M-PAES, and perform RB and DB learning. We point out
that the unique difference between the two MOEFSs lies on the generation of the RB: PAES-RCS uses a rule and condition
selection, while PAES-RL employs a classical rule learning approach. As regards the DB learning, both MOEFSs exploit the
approach described in Section 3.
In the experiments, we executed 50,000 and 300,000 fitness evaluations of PAES-RCS and PAES-RL, respectively. In the
comparison, we also consider the results generated by PAES-RL after 50,000 fitness evaluations. We denote PAES-RL executed
for 50,000 and 300,000 evaluations as PAES-RL50 and PAES-RL300, respectively.
Table 2 shows the values used in the experiments for the parameters of PAES-RCS. For each dataset and for each algo-
rithm, we carried out a five-fold cross-validation and executed six trials for each fold with different seeds for the random
function generator (30 trials in total). As regards PAES-RL we employed the same values of the parameters used in [24]. In

Fig. 4. Activity diagram of the co-evolutionary approach.

particular, the archive size, the number of fuzzy sets for each variable, the minimum and maximum numbers of rules are
the same for both PAES-RCS and PAES-RL.
To compare PAES-RCS with PAES-RL we adopt two well-known indicators, the epsilon dominance and the hypervolume
(see the review in [40] on the performance assessment of multi-objective stochastic optimizers for a detailed explanation
of these indicators). The aim of this analysis is to show that PAES-RCS, though performing only 50,000 fitness evaluations,
achieves Pareto front approximations comparable with the ones achieved by PAES-RL300, thus confirming the effectiveness
of the RCS approach in speeding up the convergence. Further, the solutions are also compared in terms of accuracy and
complexity by using the procedure adopted in our previous papers [14,15,24].
Once shown that the solutions achieved by PAES-RCS and PAES-RL after 50,000 and 300,000 evaluations, respectively,
are statistically equivalent, we present the effectiveness of embedding the PAES-RCS into the co-evolutionary approach for
reducing the computational cost without deteriorating the quality of the solutions. To this aim, we compare the set of so-
lutions generated, using the overall TS, by PAES-RCS and PAES-RL after 50,000 and 300,000 fitness evaluations, respectively,
with the ones generated, using 10% of the overall TS, by PAES-RCS embedded in our co-evolutionary approach (denoted
as PAES-RCS(10%) in the following) after 50,000 fitness evaluations. In Table 3, we show the specific parameters used to ex-
ecute the co-evolution in PAES-RCS(10%). We chose 10% of the overall TS because in [24] we showed that this reduction
is sufficient for achieving trade-off solutions statistically equivalent to the ones obtained by using the overall TS.

Table 1
Datasets used in the experiments (sorted by increasing number of input variables).

Dataset                           #Instances   #Input variables   Repository
Delta Ailerons (DA)               7129         6                  Torgo
Delta Elevators (DE)              9517         6                  Torgo
Analyzing Categorical Data (AN)   4052         7                  Keel
Kinematics (KI)                   8192         8                  Torgo
Pumadyn (PM)                      8192         8                  Torgo
California Housing (CH)           20460        8                  Keel
Abalone (AB)                      4177         9                  Keel
MV Artificial Domain (MV)         40768        10                 Keel
House_16H (HO)                    22784        16                 Keel
Elevators (EL)                    16559        18                 Keel
Computer Activity (CA)            8192         21                 Keel
Ailerons (AI)                     13750        40                 Keel

Table 2
Values of the parameters used in the experiments for PAES-RCS.

$AS$         (2 + 2)M-PAES archive size                                          64
$T_f$        Number of fuzzy sets in each variable $X_f$, $f = 1, \ldots, F+1$   3
$M_{MIN}$    Minimum number of rules in an RB                                    5
$M_{MAX}$    Maximum number of rules in an RB                                    30
$P_{CRB}$    Probability of applying the crossover operator to $C_{RB}$          0.2
$P_{CDB}$    Probability of applying the crossover operator to $C_{DB}$          0.5
$P_{MRB1}$   Probability of applying the first mutation operator to $C_{RB}$     0.1
$P_{MRB2}$   Probability of applying the second mutation operator to $C_{RB}$    0.7
$P_{MDB}$    Probability of applying the mutation operator to $C_{DB}$           0.2

Table 3
Values of the additional parameters used in the experiments for PAES-RCS(10%).

$N_{TS}$     SOGA population size                                         32
$E_{SO}$     Number of evaluations in the SOGA for each cycle             64
$E_{MO}$     Number of evaluations in the PAES-RCS block for each cycle   640
$P_c^{TS}$   Probability of applying the crossover operator to $C_{TS}$   0.1
$P_m^{TS}$   Probability of applying the mutation operator to $C_{TS}$    0.01
$I$          Number of instances in a block                               5

As regards PAES-RCS(10%), we highlight that the fitness evaluations of both PAES-RCS and the SOGA are computed by
using the reduced TS. Only when we switch from the execution of PAES-RCS to the SOGA do we evaluate each individual
in the Pareto front approximation by using the overall TS. Thus, PAES-RCS(10%) cyclically executes $E_{MO} + E_{SO}$ fitness
evaluations with the reduced TS and $N_{KB}$ (at most, $AS$) evaluations with the overall TS, where $N_{KB}$ is the number of
solutions contained in the PAES-RCS archive when the switch is performed. Both PAES-RCS and PAES-RCS(10%) stop their
execution when the total number of KB evaluations performed by PAES-RCS is equal to 50,000. In the case of PAES-RCS(10%),
therefore, to determine the stopping criterion we count neither the evaluations performed by the SOGA using the reduced TS
nor the evaluations performed using the overall TS. Also in this
case, a five-fold cross validation is carried out and the three MOEFSs are compared in terms of hypervolume and epsilon
dominance indicators, and accuracy and complexity of the solutions. Further, we discuss and compare the execution times
of PAES-RL300, PAES-RCS, and PAES-RCS(10%) in order to highlight the percentage of time saved by using PAES-RCS in place
of PAES-RL300, and PAES-RCS(10%) in place of PAES-RCS and PAES-RL300. We point out that actually the synergy between
the RCS technique in rule learning and the co-evolutionary approach to instance selection allows coping successfully with
large and high-dimensional datasets.

5.2. Analysis of the convergence speed of PAES-RCS

To assess whether the constrained rule learning speeds up the convergence with respect to classical rule learning without
deteriorating the quality of the solutions, we compare the Pareto front approximations generated by PAES-RCS, PAES-RL50
and PAES-RL300 in terms of the epsilon dominance and hypervolume indicators. In order to compute these indicators, the
objective space must either be bounded or a bounding reference point, which is (at least weakly) dominated by all the
solutions, must be defined. The computations of the two indicators were performed using the performance assessment
package provided in the PISA toolkit [41].
The Pareto front approximations generated by PAES-RCS, PAES-RL50 and PAES-RL300 are analyzed together. First, the
maximum values of MSE/2 and complexity among the 90 Pareto front approximations (30 for each algorithm) are computed
in order to obtain the bounds for normalizing in [0, 1] the objectives of each approximation. Then, the objectives are

Table 4
Results of the statistical tests on Epsilon Dominance and Hypervolume between PAES-RL300, PAES-RL50 and PAES-RCS.

Epsilon dominance
Algorithm     Friedman rank   Iman and Davenport p-value   Hypothesis
PAES-RL50     2.75
PAES-RCS      1.916           4.19E−04                     Rejected
PAES-RL300    1.333

Holm post-hoc procedure
i   Algorithm    z-value   p-value    alpha/i   Hypothesis
2   PAES-RL50    3.47      5.20E−04   0.025     Rejected
1   PAES-RCS     1.429     1.53E−01   0.05      Not rejected

Hypervolume
Algorithm     Friedman rank   Iman and Davenport p-value   Hypothesis
PAES-RL50     2.667
PAES-RCS      1.917           3.91E−03                     Rejected
PAES-RL300    1.417

Holm post-hoc procedure
i   Algorithm    z-value   p-value    alpha/i   Hypothesis
2   PAES-RL50    3.061     2.20E−03   0.025     Rejected
1   PAES-RCS     1.224     2.21E−01   0.05      Not rejected

normalized. The hypervolume is calculated by using (1, 1) as reference point. As a consequence of the normalization, the
values of the two indicators lie in [0, 1]. We recall that the approximated Pareto fronts are computed by using
the overall TS.
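For the two-objective case considered here, both indicators admit simple implementations (a sketch over normalized minimization fronts, i.e. lists of (MSE/2, complexity) pairs):

```python
def hypervolume_2d(front, ref=(1.0, 1.0)):
    """Hypervolume of a 2-objective minimization front w.r.t. the
    reference point (1, 1) used in the paper."""
    hv, prev_f2 = 0.0, ref[1]
    for f1, f2 in sorted(front):        # ascending first objective
        if f2 < prev_f2:                # skip dominated points
            hv += (ref[0] - f1) * (prev_f2 - f2)
            prev_f2 = f2
    return hv

def additive_epsilon(front_a, front_b):
    """Smallest eps such that every point of front_b is weakly dominated
    by some point of front_a translated by eps."""
    return max(min(max(a1 - b1, a2 - b2) for a1, a2 in front_a)
               for b1, b2 in front_b)
```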
In order to verify whether there exist statistical differences among the indicator values and, consequently, among the Pareto
front approximations generated by PAES-RCS, PAES-RL50 and PAES-RL300, we have performed a statistical analysis. As suggested
in [42], we have applied non-parametric statistical tests for multiple comparisons by combining all the datasets:
for each approach we have generated a distribution consisting of the mean values of the epsilon dominance and of the
hypervolume. We have first applied the Friedman test in order to compute a ranking among the distributions [43]. Then, we
have applied the Iman and Davenport test in order to evaluate whether there exist statistically relevant differences among
the mean values of the epsilon dominance and of the hypervolume computed for the three algorithms [44]. If there do,
we apply a post-hoc procedure, namely the Holm test [45]. This test allows detecting effective statistical differences
between the control approach, i.e. the one with the lowest Friedman rank, and the remaining approaches.
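A sketch of this testing pipeline (Friedman ranking with the Iman and Davenport correction, then Holm; here lower mean indicator values are assumed better, so for the hypervolume the sign can be flipped):

```python
import numpy as np
from scipy import stats

def iman_davenport(values):
    """values: N datasets x k algorithms matrix of mean indicator values.
    Returns the mean Friedman ranks, the Iman-Davenport statistic and
    its p-value (F distribution with (k-1, (k-1)(N-1)) dof)."""
    N, k = values.shape
    ranks = np.apply_along_axis(stats.rankdata, 1, values)
    R = ranks.mean(axis=0)                           # Friedman ranks
    chi2 = 12 * N / (k * (k + 1)) * (np.sum(R ** 2) - k * (k + 1) ** 2 / 4)
    ff = (N - 1) * chi2 / (N * (k - 1) - chi2)       # Iman-Davenport
    p = 1 - stats.f.cdf(ff, k - 1, (k - 1) * (N - 1))
    return R, ff, p

def holm(p_values, alpha=0.05):
    """Holm post-hoc: the i-th smallest p-value is compared with
    alpha/(m - i); testing stops at the first non-rejection."""
    rejected = [False] * len(p_values)
    for i, idx in enumerate(np.argsort(p_values)):
        if p_values[idx] <= alpha / (len(p_values) - i):
            rejected[idx] = True
        else:
            break
    return rejected
```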
Table 4 shows the results of the non-parametric statistical tests on both indicators: for each indicator and for each
MOEFS, we show the Iman and Davenport p-value and the Friedman rank. If the p-value is lower than the level of significance
α (in the experiments α = 0.05), we can reject the null hypothesis and affirm that there exist statistical differences between
the multiple distributions, associated with each approach, of the epsilon dominance and/or of the hypervolume indicators.
Otherwise, no statistical difference exists among the distributions and therefore the three different MOEFSs evolve towards
similar Pareto front approximations. For both the indicators, the Iman and Davenport statistical hypothesis of equivalence
is rejected and so statistical differences among the three MOEFSs are detected. Thus, we have to apply the Holm post-hoc
procedure considering the PAES-RL300 as control algorithm (associated with the lowest rank and in bold in the table).
For both the indicators, we observe that the statistical hypothesis of equivalence cannot be rejected for PAES-RCS but it is
rejected for PAES-RL50.
By analysing the results of the statistical tests, we can deduce that: (i) PAES-RL does not converge to accurate Pareto front
approximations by performing only 50,000 fitness evaluations; (ii) the Pareto front approximations generated by PAES-RCS
are statistically equivalent, in terms of epsilon dominance and of hypervolume indicators, to those generated by PAES-RL300.
Since PAES-RCS performed just 16.67% of the fitness evaluations executed by PAES-RL300, this result shows the effectiveness
of PAES-RCS in exploring the search space. On the other hand, these results can be explained by analyzing the different size
of the search spaces handled by the two MOEFSs. As regards PAES-RCS, it selects rules and conditions of these rules from
an initial RB generated by the Wang and Mendel algorithm: thus, rules are learned in a very constrained space. In contrast,
PAES-RL learns the rules by exploring a space consisting of all the possible valid combinations of propositions which can be
generated by the linguistic values defined on the universes of the input and output variables. This space rapidly becomes
difficult to manage as the number of input variables increases.
The analysis of the Pareto front approximations through the hypervolume and epsilon dominance indicators is useful
to show the effectiveness of the evolutionary process carried out by using PAES-RCS in terms of quality of Pareto fronts
computed on the TS. Since we employ MOEFSs to generate MFRBSs with different trade-offs between interpretability and
accuracy in regression problems, we are also interested in assessing the generalization capabilities of the generated MFRBSs.
To this aim, we use the procedure adopted in our previous papers [14,15,24]. The procedure is based on the analysis of
three representative solutions of the Pareto front approximations, namely the most accurate (denoted as FIRST), the least
accurate (denoted as LAST) and the median between the FIRST and the LAST (denoted as MEDIAN) solutions. In practice,
for each of the thirty trials, we compute the Pareto front approximations of each algorithm and order the solutions in

Table 5
Results of the statistical tests on the FIRST, MEDIAN and LAST solutions applied to the test sets of PAES-RL300, PAES-RL50 and PAES-RCS.

FIRST
Algorithm     Friedman rank   Iman and Davenport p-value   Hypothesis
PAES-RL50     2.75
PAES-RCS      1.667           2.33E−03                     Rejected
PAES-RL300    1.583

Holm post-hoc procedure
i   Algorithm    z-value   p-value    alpha/i   Hypothesis
2   PAES-RL50    2.858     4.27E−03   0.025     Rejected
1   PAES-RCS     0.204     8.38E−01   0.05      Not rejected

MEDIAN
Algorithm     Friedman rank   Iman and Davenport p-value   Hypothesis
PAES-RL50     2.75
PAES-RCS      1.749           1.78E−03                     Rejected
PAES-RL300    1.5

Holm post-hoc procedure
i   Algorithm    z-value   p-value    alpha/i   Hypothesis
2   PAES-RL50    3.061     2.20E−03   0.025     Rejected
1   PAES-RCS     0.612     5.40E−01   0.05      Not rejected

LAST
Algorithm     Friedman rank   Iman and Davenport p-value   Hypothesis
PAES-RL50     3
PAES-RCS      1.167           3.71E−10                     Rejected
PAES-RL300    1.833

Holm post-hoc procedure
i   Algorithm    z-value   p-value    alpha/i   Hypothesis
2   PAES-RL50    4.49      7.10E−06   0.025     Rejected
1   PAES-RL300   1.632     1.03E−01   0.05      Not rejected

each approximation for increasing MSE/2 values. Then, for each approximation, we select the first (the most accurate), the
median and the last (the least accurate) solutions (this is the reason why we denote the points as FIRST, MEDIAN and LAST).
Finally, for the FIRST, MEDIAN and LAST solutions, we compute the mean values over the 30 trials of the MSE/2 on the
training and test sets, and of the complexity. To provide the reader with a glimpse of the three representative solutions of
PAES-RCS, PAES-RL50 and PAES-RL300, in Figs. 5 and 6 we plot the mean values of the MSE/2 and the complexity for the
FIRST, MEDIAN and LAST solutions for all the datasets, on both the training and test sets.
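The extraction of the three representative solutions is immediate (a sketch, with each solution represented as a (MSE/2, complexity) pair):

```python
def representative_solutions(front):
    """FIRST, MEDIAN and LAST solutions of a Pareto front approximation."""
    ordered = sorted(front)   # increasing MSE/2
    return ordered[0], ordered[len(ordered) // 2], ordered[-1]
```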
From Figs. 5 and 6 we realize that PAES-RL300 generates more complex solutions except for datasets with a high num-
ber of input variables, namely EL, CA and AI. For almost all the datasets the Pareto front approximation generated by
PAES-RL50 is dominated by the Pareto front approximations generated by both PAES-RL300 and PAES-RCS, thus highlighting
that PAES-RCS generates better trade-off solutions than PAES-RL, both on the training and test sets, when the same number
of fitness evaluations is performed. By carefully analyzing the plots, we can observe that the benefits of using PAES-RCS
become more and more evident when the number of input variables increases. Indeed, as regards MV, HO, EL, CA and AI,
the FIRST and MEDIAN points of PAES-RCS are characterized by a lower MSE/2 than PAES-RL300. On the other hand, if we
consider the other datasets, in most cases the FIRST and the MEDIAN solutions generated by PAES-RCS and PAES-RL300 are
mutually non-dominated.
Finally, comparing the results obtained on the training and test sets in all the datasets, we can state that PAES-RCS is
not affected by overfitting. Indeed, the values of the MSE/2 corresponding to the FIRST, MEDIAN and LAST
solutions computed on the training set are approximately equal to the ones calculated on the test set. In order to assess
if there exist statistical differences among the MSE/2 corresponding to the representative solutions generated by PAES-RCS
and the two versions of PAES-RL, also in this case, we have applied non-parametric statistical tests by combining all the
datasets: for each approach and for each of the three representative solutions we have generated a distribution consisting of
the mean values of the MSE/2. In order to evaluate the generalization capability of the MFRBSs generated by the different
algorithms, in Table 5 we show the results of the tests obtained on the test set.
For the FIRST, MEDIAN and LAST solutions, the Iman and Davenport test rejects the statistical hypothesis of equivalence.
Subsequently, the Holm post-hoc procedure has been performed by considering as control algorithms PAES-RL300 for the
FIRST and MEDIAN solutions and PAES-RCS for the LAST solutions. For all the three solutions, the statistical hypothesis of
equivalence is rejected only for PAES-RL50. Thus, we can conclude that PAES-RL300 and PAES-RCS turn out to be statistically
equivalent in terms of MSE/2.
The results on the test set confirm that PAES-RCS after 50,000 fitness evaluations generates on average solutions with
MSE/2 similar to the ones generated by PAES-RL after 300,000 evaluations and that the MSE/2 of the solutions achieved by
PAES-RL after 50,000 fitness evaluations are not statistically equivalent to the ones obtained by PAES-RCS and PAES-RL300.

Fig. 5. Plots of the FIRST, MEDIAN and LAST solutions onto the Complexity-MSE/2 plane (first six datasets).

5.3. Embedding PAES-RCS in the co-evolutionary approach

In this section we analyze the results obtained by embedding PAES-RCS in our co-evolutionary approach. In particular,
we aim to assess whether the MFRBSs generated by the co-evolutionary approach with 10% of the overall TS are equivalent

Fig. 6. Plots of the FIRST, MEDIAN and LAST solutions onto the Complexity-MSE/2 plane (last six datasets).

to the MFRBSs generated by both PAES-RCS and PAES-RL300 with the overall TS. We denote the co-evolutionary approach
with embedded PAES-RCS as PAES-RCS(10%) in the following.
First of all, we compare the Pareto front approximations achieved by the three MOEFSs in terms of hypervolume and
epsilon dominance. Then, we evaluate the generalization capabilities of the MFRBSs generated by the three MOEFSs by
analyzing the FIRST, MEDIAN and LAST solutions. To make the comparison sound, all the indicators are computed by using
the MSE/2 calculated on the overall TS. Thus, for the PAES-RCS(10%), we re-compute the MSE/2 of the solutions contained

Table 6
Results of the statistical tests on Epsilon Dominance and Hypervolume between PAES-RL300, PAES-RCS and PAES-RCS(10%).

Epsilon dominance
Algorithm Friedman rank Iman and Davenport p-value Hypothesis
PAES-RCS(10%) 2.25
PAES-RCS 2.00 4.92E−01 Not rejected
PAES-RL300 1.75
Hypervolume
Algorithm Friedman rank Iman and Davenport p-value Hypothesis
PAES-RCS(10%) 2.25
PAES-RCS 2.083 3.53E−01 Not rejected
PAES-RL300 1.667

Table 7
Results of the statistical tests on the FIRST, MEDIAN, and LAST solutions applied to the test sets of PAES-RL300, PAES-RCS and PAES-RCS(10%).

FIRST
Algorithm Friedman rank Iman and Davenport p-value Hypothesis
PAES-RCS(10%) 2.249
PAES-RCS 1.792 5.44E−01 Not rejected
PAES-RL300 1.958
MEDIAN
Algorithm Friedman rank Iman and Davenport p-value Hypothesis
PAES-RCS(10%) 2.125
PAES-RCS 1.792 6.91E−01 Not rejected
PAES-RL300 2.083
LAST
Algorithm Friedman rank Iman and Davenport p-value Hypothesis
PAES-RCS(10%) 1.917
PAES-RCS 1.583 6.95E−02 Not rejected
PAES-RL300 2.5

in the final archive by using the overall TS in place of the reduced TS. The new values of MSE/2 might make some solution
dominated by other solutions in the archive. Thus, before computing the indicators, we select the non-dominated solutions
so as to generate again a Pareto front approximation.
We have again applied non-parametric statistical tests for multiple comparisons by combining all the datasets. Table 6
shows that the null hypothesis cannot be rejected for both the indicators. Thus, we can conclude that there do not exist
statistical differences between the three distributions.
Table 7 shows the results of non-parametric statistical tests applied to the MSE/2 computed for the FIRST, MEDIAN, and
LAST solutions on the test set. For all the three solutions, the null hypothesis is again not rejected. Thus, we can conclude
that 10% of the overall TS is sufficient for allowing PAES-RCS embedded in the co-evolutionary approach to evolve towards
Pareto front approximations statistically comparable with the ones generated by both PAES-RCS and PAES-RL300 with the
overall TS.
As shown in Appendix A, PAES-RCS generates solutions more complex than the ones generated by PAES-RCS(10%). Indeed,
PAES-RCS evolves using the overall TS and this induces the generation of MFRBSs with a larger number of rules and each
rule characterized by a larger number of conditions. Furthermore, we have to consider that, after re-computing the MSE/2
of the solutions contained in the final archive by using the overall TS in place of the reduced TS, we remove all the solutions
which are no longer non-dominated. These solutions often lie in the high complexity region.
To the best of our knowledge, only one other MOEFS proposed in the literature can effectively manage large and high-
dimensional datasets. In [23], the authors show the results of the most accurate solutions obtained on a set of datasets,
which partially overlaps the set used in this paper. By comparing the results in [23] with the ones in Tables A.1 and A.2,
we can conclude that the two approaches achieve similar performance in terms of trade-offs between accuracy and com-
plexity. Thus, our approach can be considered as an alternative to the method proposed in [23] for dealing with large and
high-dimensional datasets.

5.4. Analysis of the computational times

In Table 8 we show, for each dataset, the average execution times of PAES-RCS, PAES-RCS(10%) and PAES-RL300, in seconds
and in percentage (between parentheses) with respect to the time needed by PAES-RL300. The algorithms have been
executed on a PC equipped with an Intel(R) Core(TM) Duo E8500 3.16 GHz processor, 4 GB of RAM and the Ubuntu operating system.
The analysis of Table 8 highlights that PAES-RCS allows saving on average 83.30% of the time spent by PAES-RL300 to
achieve comparable solutions. We can observe that, although the numbers of fitness evaluations performed by PAES-RCS and

Table 8
Average execution times expressed in seconds and in percentage with respect to the time needed by PAES-RL for executing 300,000 evaluations.

Dataset   PAES-RCS          PAES-RCS(10%)    PAES-RL300
DA        474.0 (12.26%)    59.6 (1.54%)     3864.5 (100%)
DE        695.2 (13.33%)    99.4 (1.91%)     5212.1 (100%)
AN        307.5 (16.30%)    46.6 (2.47%)     1886.4 (100%)
KI        827.0 (15.57%)    121.0 (2.28%)    5309.4 (100%)
PM        406.4 (7.65%)     59.2 (1.11%)     5311.3 (100%)
CH        2127.6 (20.14%)   298.6 (2.83%)    10560.5 (100%)
AB        389.0 (16.83%)    58.2 (2.51%)     2310.5 (100%)
MV        3721.6 (19.62%)   598.6 (3.15%)    18961.8 (100%)
HO        3075.0 (15.32%)   505.4 (2.52%)    20070.2 (100%)
EL        3073.2 (17.99%)   502.0 (2.93%)    17083.0 (100%)
CA        1638.4 (14.37%)   284.4 (2.49%)    11396.4 (100%)
AI        4798.3 (18.07%)   747.7 (2.77%)    26940.0 (100%)

Mean      1794.4 (16.70%)   281.7 (2.62%)    10742.2 (100%)

PAES-RL300 are, for each dataset, 50,000 and 300,000, respectively, the time saved by using PAES-RCS is different from one
dataset to another. This behaviour is mainly due to the different complexity of the solutions which are generated during the
evolutionary process. In particular, the computational cost of the fitness evaluation depends on the complexity (number of
rules and number of conditions in the rules) of the rules which compose the solution under evaluation. If the MOEFS tends
to explore solutions with a high number of rules for a long time, its execution time increases. In contrast, if the MOEFS quickly
converges towards solutions with low complexity, its execution time decreases. In general, due to its strategy, PAES-RL300
is more sensitive to this phenomenon than PAES-RCS.
When PAES-RCS is embedded in the co-evolutionary approach, the use of only 10% of the overall TS allows us to save
on average 97.38% and 84.3% of the execution times needed by PAES-RL300 and PAES-RCS, respectively. In particular, by
analyzing the times expressed in seconds, PAES-RL300 requires execution times of the order of hours, while PAES-RCS(10%)
needs execution times of only the order of minutes for the datasets considered in this work.
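As a quick check, the mean savings follow directly from the last row of Table 8: $1 - 1794.4/10742.2 \approx 0.8330$ and $1 - 281.7/10742.2 \approx 0.9738$, i.e. the 83.30% and 97.38% savings of PAES-RCS and PAES-RCS(10%) with respect to PAES-RL300, while $1 - 281.7/1794.4 \approx 0.843$ gives the 84.3% saving of PAES-RCS(10%) with respect to PAES-RCS.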

6. Conclusion

In this paper we have presented a new approach to deal with high dimensional and large datasets in the framework
of multi-objective evolutionary learning of Mamdani fuzzy rule-based systems (MFRBSs) with different trade-offs between
accuracy and rule base complexity. We have focused on regression problems and on multi-objective evolutionary approaches
based on rule learning.
To cope with high-dimensional data, we have proposed to learn rules not from scratch, but rather from a heuristically
generated rule base: during the evolutionary process both rules and conditions in the rules are selected from this rule base.
The effect is to perform rule learning in a very constrained space. We have experimentally shown that the constrained
rule learning preserves the modeling power of the classical rule learning. Moreover, by restricting the search space, good
solutions are achieved with a lower number of fitness evaluations.
To manage large datasets, we have embedded the multi-objective evolutionary approach based on rule and condition
selection in a co-evolutionary approach where a single objective genetic algorithm and the multi-objective evolutionary
approach are cyclically executed in sequence: the genetic algorithm selects a reduced training set which is used in the
evolution of the multi-objective evolutionary approach.
We have used as multi-objective evolutionary algorithm the (2 + 2)M-PAES whose first version was proposed for only
RB learning by two of the authors of this paper in 2007 and successively adapted to both RB and DB learning. We have
experimented the (2 + 2)M-PAES with both classical (PAES-RL) and constrained (PAES-RCS) rule learning on twelve regression
problems characterized by a high number of both input variables and instances. We have shown that PAES-RCS needs
only 50,000 fitness evaluations to obtain solutions statistically equivalent in terms of epsilon dominance, hypervolume and
generalization capability to the ones obtained by PAES-RL after 300,000 fitness evaluations. On average, PAES-RCS allows us
to save up to 83.3% of the execution time on the twelve datasets.
Further, we have embedded PAES-RCS in the co-evolutionary approach we have proposed recently to perform instance
selection in multi-objective evolutionary fuzzy systems. We have shown that the co-evolutionary version of PAES-RCS, which
evaluates the fitness by using only 10% of the training set, generates Pareto front approximations that are statistically
equivalent to the ones generated by PAES-RCS and PAES-RL using the overall training set, thus saving on average 84.3%
and 97.38% of the execution time of PAES-RCS and PAES-RL, respectively.

Appendix A

Tables A.1 and A.2 show, for all the datasets, the results corresponding, respectively, to the FIRST and MEDIAN solutions
obtained by PAES-RL300, PAES-RL50, PAES-RCS and PAES-RCS(10%). Due to space constraints, we do not show the results
of the LAST solutions; on the other hand, these solutions are not, in general, particularly attractive for the users because
of their low accuracy. In the tables, for each solution, we present the mean and standard deviation of the MSE/2, both on
the training (MSE/2TR) and test (MSE/2TS) sets, the complexity (Comp), the number of rules (Rules) and the number of
variables (Var).
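As a reminder, the accuracy measure reported in the tables is the half mean squared error; a minimal sketch of its computation (a standard definition, consistent with the notation MSE/2TR and MSE/2TS) is:

import numpy as np

def half_mse(y_pred, y_true):
    # MSE/2 = (1/(2N)) * sum of squared differences between the MFRBS output
    # and the target values, on either the training or the test set.
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    return float(np.mean((y_pred - y_true) ** 2) / 2.0)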
Table A.1
Average results obtained by the FIRST solutions of the four MOEFSs.

Dataset | PAES-RL300 (MSE/2TR, MSE/2TS, Comp, Rules, Var) | PAES-RL50 (MSE/2TR, MSE/2TS, Comp, Rules, Var) | PAES-RCS (MSE/2TR, MSE/2TS, Comp, Rules, Var) | PAES-RCS(10%) (MSE/2TR, MSE/2TS, Comp, Rules, Var)

DA mean 1.39E−08 1.46E−08 76.23 28.40 5.0 | 1.44E−08 1.50E−08 55.10 21.20 5.0 | 1.52E−08 1.55E−08 28.67 13.00 5.0 | 1.58E−08 1.62E−08 20.60 10.40 4.9
DA SD 3.70E−10 8.40E−10 6.26 2.00 0.0 | 3.70E−10 9.90E−10 13.07 4.00 0.0 | 5.10E−10 1.20E−09 7.60 2.80 0.0 | 5.70E−10 1.14E−09 6.01 2.20 0.4

DE mean 1.04E−06 1.07E−06 90.97 28.60 6.0 | 1.06E−06 1.09E−06 55.83 19.30 6.0 | 1.08E−06 1.10E−06 36.13 14.60 6.0 | 1.10E−06 1.11E−06 24.33 11.20 5.9
DE SD 2.00E−08 5.60E−08 8.86 1.80 0.0 | 3.00E−08 6.60E−08 18.01 4.60 0.0 | 3.10E−08 5.50E−08 8.89 3.20 0.0 | 2.60E−08 5.00E−08 6.93 2.50 0.3

AN mean 2.40E−03 3.06E−03 83.60 24.90 7.0 | 2.08E−02 2.17E−02 25.53 8.20 6.3 | 2.64E−03 3.10E−03 36.83 13.70 7.0 | 3.60E−03 4.01E−03 20.90 9.20 6.1
AN SD 2.08E−04 1.27E−03 15.19 3.90 0.0 | 2.17E−02 2.31E−02 14.77 2.80 1.1 | 2.41E−04 1.35E−03 5.92 1.70 0.2 | 1.68E−03 2.98E−03 6.11 2.00 1.0

KI mean 1.29E−02 1.37E−02 116.63 29.50 8.0 | 1.59E−02 1.65E−02 102.23 24.40 8.0 | 1.70E−02 1.74E−02 41.90 17.40 8.0 | 1.85E−02 1.89E−02 32.30 14.10 7.9
KI SD 6.27E−04 8.30E−04 9.31 1.10 0.0 | 8.95E−04 1.11E−03 16.44 3.50 0.0 | 5.41E−04 1.09E−03 9.28 2.10 0.2 | 7.07E−04 1.33E−03 6.46 1.80 0.2

PM mean 5.57E+00 6.14E+00 105.33 24.00 8.0 | 6.10E+00 6.37E+00 44.90 13.20 7.9 | 5.88E+00 6.09E+00 36.83 13.60 7.8 | 6.33E+00 6.50E+00 23.80 9.60 7.2
PM SD 3.49E−01 4.18E−01 19.50 3.00 0.0 | 5.87E−01 5.92E−01 11.58 2.20 0.4 | 1.45E−01 2.59E−01 8.83 1.70 0.6 | 2.99E−01 3.85E−01 7.01 1.80 1.2

CH mean 2.87E+09 2.91E+09 86.87 22.70 8.0 | 3.09E+09 3.12E+09 58.03 15.00 8.0 | 2.59E+09 2.61E+09 43.07 17.00 7.9 | 2.60E+09 2.60E+09 36.64 14.90 7.9
CH SD 2.30E+08 2.34E+08 23.04 6.00 0.0 | 2.46E+08 2.41E+08 18.93 3.70 0.0 | 7.77E+07 1.49E+08 9.30 2.80 0.3 | 6.48E+07 1.15E+08 6.01 1.80 0.4

AB mean 2.39E+00 2.52E+00 92.93 24.70 8.0 | 2.55E+00 2.62E+00 58.03 15.40 8.0 | 2.55E+00 2.58E+00 33.43 12.80 7.9 | 2.64E+00 2.69E+00 25.23 10.50 7.8
AB SD 8.00E−02 1.88E−01 22.02 4.40 0.0 | 1.14E−01 2.11E−01 18.53 3.60 0.0 | 8.50E−02 1.92E−01 8.59 3.00 0.4 | 7.40E−02 2.21E−01 7.05 2.40 0.4

MV mean 4.03E+00 4.08E+00 48.29 11.90 9.2 | 5.62E+00 5.68E+00 32.04 8.40 8.6 | 1.66E+00 1.67E+00 29.67 10.80 8.0 | 1.43E+00 1.44E+00 30.50 11.40 7.8
MV SD 2.64E+00 2.68E+00 20.65 4.30 1.1 | 2.37E+00 2.39E+00 16.45 2.80 1.3 | 7.86E−01 8.28E−01 8.47 2.20 1.5 | 3.88E−01 4.15E−01 8.73 1.90 1.8

HO mean 1.01E+09 1.04E+09 70.10 10.80 15.2 | 1.16E+09 1.18E+09 65.40 8.30 15.4 | 8.82E+08 9.05E+08 55.03 13.10 15.1 | 9.09E+08 9.24E+08 45.27 10.90 15.0
HO SD 1.91E+08 2.15E+08 41.14 4.60 1.4 | 1.32E+08 1.58E+08 33.10 2.50 1.0 | 8.66E+07 1.08E+08 17.92 3.30 1.1 | 5.48E+07 9.09E+07 14.04 2.00 1.5

EL mean 9.19E−06 9.43E−06 64.50 12.30 16.6 | 1.10E−05 1.11E−05 37.10 8.00 15.9 | 7.18E−06 7.30E−06 76.40 17.80 17.5 | 7.63E−06 7.72E−06 67.50 16.80 17.0
EL SD 1.06E−06 1.11E−06 35.40 4.60 2.0 | 1.25E−06 1.40E−06 12.20 1.80 2.5 | 7.80E−07 7.38E−07 21.90 3.10 0.8 | 7.98E−07 7.99E−07 22.00 2.80 1.5

CA mean 9.15E+00 9.90E+00 64.50 10.10 19.0 | 2.34E+01 2.58E+01 87.10 8.60 19.7 | 4.05E+00 5.48E+00 82.30 15.60 20.2 | 4.60E+00 5.40E+00 58.90 12.70 19.1
CA SD 2.92E+00 3.56E+00 30.00 2.80 2.8 | 6.26E+00 8.20E+00 47.80 2.50 2.6 | 3.40E−01 2.55E+00 24.90 3.00 1.5 | 4.60E−01 1.83E+00 19.10 2.20 1.8

AI mean 2.39E−08 2.44E−08 69.30 8.50 25.3 | 3.94E−08 3.99E−08 54.30 6.90 25.3 | 1.69E−08 1.73E−08 122.80 14.80 37.9 | 1.77E−08 1.79E−08 98.20 13.30 34.5
AI SD 7.43E−09 7.69E−09 62.40 2.90 12.8 | 1.39E−08 1.45E−08 45.90 2.00 12.4 | 6.93E−10 1.05E−09 43.60 2.50 3.0 | 9.08E−10 9.66E−10 48.10 2.60 5.8
Table A.2
Average results obtained by the MEDIAN solutions of the four MOEFSs.

Dataset | PAES-RL300 (MSE/2TR, MSE/2TS, Comp, Rules, Var) | PAES-RL50 (MSE/2TR, MSE/2TS, Comp, Rules, Var) | PAES-RCS (MSE/2TR, MSE/2TS, Comp, Rules, Var) | PAES-RCS(10%) (MSE/2TR, MSE/2TS, Comp, Rules, Var)

DA mean 1.41E−08 1.47E−08 37.23 18.30 5.0 | 1.47E−08 1.51E−08 24.80 13.40 5.0 | 1.53E−08 1.57E−08 17.20 10.30 4.9 | 1.59E−08 1.63E−08 12.30 8.10 4.6
DA SD 3.70E−10 8.00E−10 3.46 2.20 0.0 | 3.60E−10 1.01E−09 5.10 2.40 0.2 | 4.60E−10 1.17E−09 3.90 2.10 0.3 | 5.80E−10 1.06E−09 2.62 1.60 0.5

DE mean 1.05E−06 1.07E−06 43.87 18.80 6.0 | 1.07E−06 1.09E−06 24.60 12.40 5.9 | 1.09E−06 1.10E−06 19.30 11.50 5.7 | 1.10E−06 1.12E−06 13.87 8.80 5.3
DE SD 1.90E−08 5.50E−08 5.06 2.40 0.0 | 3.10E−08 6.30E−08 6.40 2.70 0.3 | 3.00E−08 5.30E−08 3.70 2.60 0.5 | 2.50E−08 5.00E−08 3.62 1.90 0.7

AN mean 2.50E−03 3.06E−03 36.63 14.80 6.7 | 2.33E−02 2.45E−02 13.70 6.00 5.2 | 2.73E−03 3.15E−03 19.20 9.90 5.7 | 3.21E−03 3.35E−03 12.23 7.20 4.3
AN SD 2.17E−04 1.29E−03 5.86 2.20 0.6 | 2.38E−02 2.55E−02 6.10 1.20 1.5 | 2.69E−04 1.34E−03 2.60 0.90 0.8 | 4.20E−04 1.31E−03 2.70 1.30 0.7

KI mean 1.51E−02 1.57E−02 46.30 16.50 8.0 | 1.76E−02 1.81E−02 38.80 13.30 7.9 | 1.83E−02 1.86E−02 20.40 12.40 7.5 | 1.97E−02 2.00E−02 16.20 10.40 7.2
KI SD 7.95E−04 1.01E−03 5.55 1.90 0.2 | 1.06E−03 1.17E−03 6.10 1.90 0.4 | 7.25E−04 1.18E−03 2.80 1.10 0.6 | 8.23E−04 1.21E−03 2.45 1.50 0.7

PM mean 5.69E+00 6.09E+00 44.67 14.50 7.8 | 6.25E+00 6.43E+00 19.40 8.80 6.2 | 6.04E+00 6.21E+00 18.50 9.30 6.1 | 6.45E+00 6.56E+00 12.70 7.30 4.6
PM SD 3.58E−01 3.73E−01 9.25 1.80 0.7 | 5.84E−01 5.95E−01 4.80 1.30 1.5 | 1.96E−01 2.84E−01 3.50 1.20 1.3 | 3.42E−01 3.70E−01 2.79 1.00 1.3

CH mean 2.91E+09 2.95E+09 33.83 12.70 7.8 | 3.18E+09 3.20E+09 21.70 8.90 7.5 | 2.69E+09 2.70E+09 21.70 12.20 7.0 | 2.69E+09 2.70E+09 18.89 10.70 6.3
CH SD 2.32E+08 2.27E+08 7.62 3.00 0.5 | 2.83E+08 2.80E+08 4.90 1.70 0.7 | 9.63E+07 1.71E+08 4.00 1.60 1.0 | 6.38E+07 1.33E+08 3.41 1.70 0.9

AB mean 2.41E+00 2.51E+00 36.57 13.80 8.0 | 2.62E+00 2.68E+00 23.40 9.30 7.5 | 2.58E+00 2.60E+00 18.90 10.00 7.2 | 2.66E+00 2.72E+00 14.57 8.40 6.8
AB SD 8.50E−02 1.86E−01 7.50 1.90 0.0 | 1.23E−01 2.32E−01 5.50 1.20 0.7 | 8.70E−02 1.91E−01 4.50 2.10 0.9 | 7.60E−02 2.20E−01 3.37 1.90 0.8

MV mean 4.25E+00 4.29E+00 19.00 7.30 6.5 | 6.04E+00 6.08E+00 13.80 5.80 6.5 | 1.88E+00 1.90E+00 16.00 8.00 4.9 | 1.80E+00 1.81E+00 16.17 8.40 5.0
MV SD 2.78E+00 2.80E+00 5.04 1.90 1.3 | 2.45E+00 2.46E+00 3.90 0.90 1.8 | 8.96E−01 9.53E−01 2.90 1.30 0.9 | 6.03E−01 6.19E−01 3.44 1.50 1.1

HO mean 1.04E+09 1.05E+09 25.00 6.70 12.4 | 1.19E+09 1.20E+09 25.80 5.50 12.1 | 9.07E+08 9.26E+08 25.40 10.00 12.6 | 9.25E+08 9.39E+08 18.67 8.10 11.2
HO SD 2.00E+08 2.14E+08 11.28 2.10 2.1 | 1.27E+08 1.59E+08 15.10 1.10 2.6 | 8.43E+07 1.05E+08 6.00 2.40 1.5 | 5.45E+07 9.44E+07 3.45 1.40 1.9

EL mean 9.41E−06 9.58E−06 23.93 8.70 12.5 | 1.14E−05 1.15E−05 16.10 6.10 10.6 | 7.82E−06 7.95E−06 32.20 13.80 13.6 | 8.29E−06 8.33E−06 25.83 12.40 12.5
EL SD 1.01E−06 1.04E−06 7.77 2.10 2.2 | 1.31E−06 1.47E−06 5.00 1.10 2.6 | 6.39E−07 6.14E−07 7.10 2.10 1.9 | 8.16E−07 8.62E−07 6.02 2.00 2.2

CA mean 9.71E+00 1.04E+01 25.03 6.90 12.6 | 2.99E+01 3.10E+01 43.80 6.00 16.5 | 4.56E+00 5.86E+00 34.80 11.30 15.4 | 5.37E+00 6.28E+00 22.14 9.00 11.2
CA SD 3.00E+00 3.64E+00 17.08 1.50 3.1 | 9.61E+00 1.06E+01 25.60 1.00 5.1 | 4.30E−01 2.39E+00 7.70 1.80 2.5 | 4.90E−01 2.01E+00 3.38 1.40 2.0

AI mean 2.43E−08 2.47E−08 13.31 6.20 10.1 | 4.43E−08 4.52E−08 18.80 5.70 13.6 | 1.78E−08 1.81E−08 57.40 11.90 28.0 | 1.89E−08 1.90E−08 41.13 10.50 22.7
AI SD 7.92E−09 8.11E−09 3.04 1.00 2.1 | 2.21E−08 2.38E−08 17.30 0.80 9.9 | 7.89E−10 1.24E−09 21.60 2.40 6.5 | 1.44E−09 1.70E−09 17.48 1.90 6.6



References

[1] E.H. Mamdani, S. Assilian, An experiment in linguistic synthesis with a fuzzy logic controller, Int. J. Man Mach. Stud. 7 (1) (1975) 1–13.
[2] O. Cordón, A historical review of evolutionary learning methods for Mamdani-type fuzzy rule-based systems: Designing interpretable genetic fuzzy
systems, International Journal of Approximate Reasoning 52 (6) (2011) 894–913.
[3] S.-M. Zhou, J.Q. Gan, Low-level interpretability and high-level interpretability: a unified view of data-driven interpretable fuzzy system modelling,
Fuzzy Sets Syst. 159 (23) (2008) 3091–3131.
[4] J.M. Alonso, L. Magdalena, G. González-Rodríguez, Looking for a good fuzzy system interpretability index: An experimental approach, International
Journal of Approximate Reasoning 51 (1) (2009) 115–134.
[5] M. Gacto, R. Alcalá, F. Herrera, Interpretability of linguistic fuzzy rule-based systems: An overview of interpretability measures, Information Sciences 181 (20) (2011) 4340–4360.
[6] P. Ducange, F. Marcelloni, Multi-objective evolutionary fuzzy systems, in: Proceedings of the 9th International Conference on Fuzzy Logic and Applications, Springer-Verlag, Berlin, Heidelberg, 2011, pp. 83–90.
[7] M. Fazzolari, R. Alcalá, Y. Nojima, H. Ishibuchi, F. Herrera, A review of the application of multi-objective evolutionary fuzzy systems: Current status and
further directions, IEEE Trans. Fuzzy Syst. 21 (2) (2013) 45–65.
[8] H. Ishibuchi, T. Murata, I.B. Türksen, Single-objective and two-objective genetic algorithms for selecting linguistic rules for pattern classification problems, Fuzzy Sets Syst. 89 (2) (1997) 135–150.
[9] H. Ishibuchi, T. Yamamoto, Fuzzy rule selection by multi-objective genetic local search algorithms and rule evaluation measures in data mining, Fuzzy
Sets Syst. 141 (1) (2004) 59–88.
[10] H. Ishibuchi, T. Nakashima, T. Murata, Three-objective genetics-based machine learning for linguistic rule extraction, Inf. Sci. 136 (1–4) (2001) 109–133.
[11] M. Cococcioni, P. Ducange, B. Lazzerini, F. Marcelloni, A Pareto-based multi-objective evolutionary approach to the identification of Mamdani fuzzy
systems, Soft Comput. 11 (11) (2007) 1013–1031.
[12] P. Ducange, B. Lazzerini, F. Marcelloni, Multi-objective genetic fuzzy classifiers for imbalanced and cost-sensitive datasets, Soft Comput. 14 (7) (2010)
713–728.
[13] A. Botta, B. Lazzerini, F. Marcelloni, D.C. Stefanescu, Context adaptation of fuzzy systems through a multi-objective evolutionary approach based on a
novel interpretability index, Soft Comput. 13 (5) (2009) 437–449.
[14] R. Alcalá, P. Ducange, F. Herrera, B. Lazzerini, F. Marcelloni, A multiobjective evolutionary approach to concurrently learn rule and data bases of
linguistic fuzzy-rule-based systems, IEEE Trans. Fuzzy Syst. 17 (5) (2009) 1106–1122.
[15] M. Antonelli, P. Ducange, B. Lazzerini, F. Marcelloni, Multi-objective evolutionary learning of granularity, membership function parameters and rules of Mamdani fuzzy systems, Evolutionary Intelligence 2 (1–2) (2009) 21–37.
[16] M. Antonelli, P. Ducange, B. Lazzerini, F. Marcelloni, Learning concurrently data and rule bases of Mamdani fuzzy rule-based systems by exploiting a
novel interpretability index, Soft Comput. 15 (10) (2011) 1981–1998.
[17] M. Antonelli, P. Ducange, B. Lazzerini, F. Marcelloni, Learning knowledge bases of multi-objective evolutionary fuzzy systems by simultaneously optimizing accuracy, complexity and partition integrity, Soft Comput. 15 (12) (2011) 2335–2354.
[18] P. Pulkkinen, H. Koivisto, A dynamically constrained multiobjective genetic fuzzy system for regression problems, IEEE Trans. Fuzzy Syst. 18 (1) (2010)
161–177.
[19] R. Alcalá, M.J. Gacto, F. Herrera, J. Alcalá-Fdez, A multi-objective genetic algorithm for tuning and rule selection to obtain accurate and compact
linguistic fuzzy rule-based systems, Int. J. Uncertainty Fuzziness Knowledge Based Syst. 15 (5) (2007) 539–557.
[20] M.J. Gacto, R. Alcalá, F. Herrera, Adaptation and application of multi-objective evolutionary algorithms for rule reduction and parameter tuning of fuzzy
rule-based systems, Soft Comput. 13 (5) (2009) 419–436.
[21] M.J. Gacto, R. Alcalá, F. Herrera, Integration of an index to preserve the semantic interpretability in the multiobjective evolutionary rule selection and
tuning of linguistic fuzzy systems, IEEE Trans. Fuzzy Syst. 18 (3) (2010) 515–531.
[22] C.H. Nguyen, W. Pedrycz, T.L. Duong, T.S. Tran, A genetic design of linguistic terms for fuzzy rule based classifiers, International Journal of Approximate
Reasoning 54 (1) (2013) 1–21.
[23] R. Alcalá, M.J. Gacto, F. Herrera, A fast and scalable multi-objective genetic fuzzy system for linguistic fuzzy modeling in high-dimensional regression
problems, IEEE Trans. Fuzzy Syst. 19 (4) (2011) 666–681.
[24] M. Antonelli, P. Ducange, F. Marcelloni, Genetic training instance selection in multi-objective evolutionary fuzzy systems: A co-evolutionary approach,
IEEE Trans. Fuzzy Syst. 20 (2) (2012) 276–290.
[25] J. Casillas, O. Cordón, M.J. Del Jesus, F. Herrera, Genetic feature selection in a fuzzy rule-based classification system learning process for high-dimensional problems, Inf. Sci. 136 (1–4) (2001) 135–157.
[26] O. Cordón, A. Quirin, Comparing two genetic overproduce-and-choose strategies for fuzzy rule-based multiclassification systems generated by bagging
and mutual information-based feature selection, Int. J. Hybrid Intell. Syst. 7 (1) (2010) 45–64.
[27] Y. Nojima, H. Ishibuchi, I. Kuwajima, Parallel distributed genetic fuzzy rule selection, Soft Comput. 13 (5) (2008) 511–519.
[28] I. Robles, R. Alcalá, J. Benitez, F. Herrera, Evolutionary parallel and gradually distributed lateral tuning of fuzzy rule-based systems, Evolutionary
Intelligence 2 (2009) 5–19.
[29] H. Ishibuchi, S. Mihara, Y. Nojima, Parallel distributed hybrid fuzzy GBML models with rule set migration and training data rotation, IEEE Trans. Fuzzy
Systems 21 (2) (2013) 355–368.
[30] Y. Jin, A comprehensive survey of fitness approximation in evolutionary computation, Soft Comput. 9 (1) (2005) 3–12.
[31] M. Cococcioni, B. Lazzerini, F. Marcelloni, On reducing computational overhead in multi-objective genetic Takagi–Sugeno fuzzy systems, Appl. Soft
Comput. 11 (1) (2011) 675–688.
[32] H. Liu, H. Motoda, On issues of instance selection, Data Min. Knowl. Discov. 6 (2) (2002) 115–130.
[33] J. Derrac, S. García, F. Herrera, A survey on evolutionary instance selection and generation, Int. J. of Applied Metaheuristic Computing 1 (1) (2010)
60–92.
[34] L.-X. Wang, J. Mendel, Generating fuzzy rules by learning from examples, IEEE Transactions on Systems, Man and Cybernetics 22 (6) (1992) 1414–1427.
[35] J.D. Knowles, D.W. Corne, Approximating the nondominated front using the Pareto archived evolution strategy, Evol. Comput. 8 (2) (2000) 149–172.
[36] M. Antonelli, P. Ducange, B. Lazzerini, F. Marcelloni, Multi-objective evolutionary generation of Mamdani fuzzy rule-based systems based on rule and
condition selection, in: 5th IEEE International Workshop on Genetic and Evolutionary Fuzzy Systems (GEFS), 2011, pp. 47–53.

[37] M. Antonelli, P. Ducange, F. Marcelloni, A new approach to handle high-dimensional and large datasets in multi-objective evolutionary fuzzy systems,
in: IEEE International Conference on Fuzzy Systems, 2011, pp. 1286–1293.
[38] H. Ishibuchi, T. Nakashima, T. Murata, Performance evaluation of fuzzy classifier systems for multidimensional pattern classification problems, IEEE
Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 29 (5) (1999) 601–618.
[39] J. Knowles, D. Corne, M-PAES: A memetic algorithm for multiobjective optimization, in: Proceedings of the 2000 Congress on Evolutionary Computation,
vol. 1, 2000, pp. 325–332.
[40] E. Zitzler, L. Thiele, M. Laumanns, C. Fonseca, V. da Fonseca, Performance assessment of multiobjective optimizers: an analysis and review, IEEE
Transactions on Evolutionary Computation 7 (2) (2003) 117–132.
[41] S. Bleuler, M. Laumanns, L. Thiele, E. Zitzler, PISA – A Platform and Programming Language Independent Interface for Search Algorithms, Springer, 2003,
pp. 494–508.
[42] J. Derrac, S. García, D. Molina, F. Herrera, A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary
and swarm intelligence algorithms, Swarm and Evolutionary Computation 1 (1) (2011) 3–18.
[43] M. Friedman, The use of ranks to avoid the assumption of normality implicit in the analysis of variance, J. American Stat. Assoc. 32 (200) (1937)
675–701.
[44] R.L. Iman, J.M. Davenport, Approximations of the critical region of the Friedman statistic, Comm. Statist. 9 (6) (1980) 571–595.
[45] S. Holm, A simple sequentially rejective multiple test procedure, Scand. J. Statist. 6 (2) (1979) 65–70.
