
Applied Soft Computing 8 (2008) 646–656

www.elsevier.com/locate/asoc

MODENAR: Multi-objective differential evolution algorithm for


mining numeric association rules
Bilal Alatas *, Erhan Akin, Ali Karci
Department of Computer Engineering, Faculty of Engineering, Firat University, 23119 Elazig, Turkey
Received 23 June 2006; received in revised form 28 May 2007; accepted 30 May 2007
Available online 2 June 2007

Abstract
In this paper, a Pareto-based multi-objective differential evolution (DE) algorithm is proposed as a search strategy for mining accurate and
comprehensible numeric association rules (ARs) which are optimal in the wider sense that no other rules are superior to them when all objectives
are considered simultaneously. The proposed DE guides the search for ARs toward the global Pareto-optimal set while maintaining adequate
population diversity to capture as many high-quality ARs as possible. The ARs mining problem is formulated as a four-objective optimization problem:
support, confidence and comprehensibility of the rule are maximization objectives, while the amplitude of the intervals which conform
the itemset and rule is a minimization objective. The algorithm has been designed to simultaneously search for the intervals of numeric attributes and discover the
ARs which these intervals conform, in a single run of DE. Contrary to the usual methods, ARs are mined directly, without generating
frequent itemsets. The proposed DE follows a database-independent approach which does not rely upon the minimum support and minimum
confidence thresholds, which are hard to determine for each database. The efficiency of the proposed DE is validated on synthetic and real
databases.
© 2007 Elsevier B.V. All rights reserved.
Keywords: Data mining; Machine learning; Evolutionary computation; Multi-objective optimization; Differential evolution

1. Introduction
Data mining is the extraction of implicit, valid, and
potentially useful knowledge from large volumes of raw data
[1]. The extracted knowledge must be not only accurate but also
readable, comprehensible and easy to understand. There are
many data mining tasks, such as ARs, sequential patterns,
classification, clustering, time series, etc., and many techniques
and algorithms have been developed for these tasks and for different
types of data. When the data contain continuous
values, it becomes difficult to mine them, and special
techniques need to be developed.
DE is a new and powerful algorithm, owing to its
convergence characteristics and few control parameters, that
can be categorized into a novel class of floating-point
encoded evolutionary optimization algorithms. The population
reproduction scheme and selection scheme of DE differ from
those of other evolutionary algorithms.
* Corresponding author. Tel.: +90 424 237 00 00; fax: +90 424 218 19 07.
E-mail addresses: balatas@firat.edu.tr, bilalalatas@yahoo.com (B. Alatas),
eakin@firat.edu.tr (E. Akin), akarci@firat.edu.tr (A. Karci).
1568-4946/$ – see front matter © 2007 Elsevier B.V. All rights reserved.
doi:10.1016/j.asoc.2007.05.003

This work focuses on the AR task of data mining, where the
aim is to find rules expressing strong relations between attributes or
attribute values in databases. More precisely, this work
proposes a multi-objective differential evolution algorithm for
mining numeric association rules (MODENAR). Mining of
numeric association rules is characterized by the presence
of more than one objective, conflicting in nature. These
objectives may be high support and confidence values,
interestingness, comprehensibility, and narrow intervals for
numeric attributes. That is why this problem is a multi-objective optimization problem.
Since multi-objective optimization searches for an optimal
vector (rules in data mining) not just a single value (one rule), one
solution often cannot be said to be better than another and there
exists not only a single optimal solution, but a set of optimal
solutions, called the Pareto-optimal set [2,3]. Consequently, there
are two goals in multi-objective optimization in ARs mining task:
(i) to discover rules as close to the Pareto-optimal as possible, and
(ii) to find rules as diverse as possible in the obtained non-dominated set. Satisfying these two goals is a challenging task.
MODENAR is relatively simple, easy to implement and easy
to use. It is a Pareto-based multi-objective algorithm which


finds Pareto-optimal solutions which can provide flexibility for
the human decision maker. It simultaneously searches for the
intervals of numeric attributes and discovers the comprehensible ARs which these intervals conform, in a single run.
It is capable of optimizing both integer and continuous variables.
Furthermore, it follows a database-independent approach
which does not rely upon the minimum support and
minimum confidence thresholds, which are hard to determine
for each database.
The remainder of this paper is organized as follows. Section
2 presents a brief overview of related work on numeric ARs.
Section 3 is a brief introduction to differential evolution.
Section 4 introduces the multi-objective optimization problem.
Section 5 describes the proposed algorithm in detail.
Section 6 reports the computational results and, finally, Section
7 presents the conclusions and future research directions.
2. Numeric association rules and related works
The Boolean ARs mining problem over basket data was first
introduced in [4]. This algorithm, and many algorithms
proposed afterwards for mining ARs, proceed in two
stages: the first is to find the frequent itemsets; the second is to
use the frequent itemsets to generate ARs. The mined rules have
certain support and confidence values. Though Boolean ARs
are meaningful, there are many other situations where the data
items concerned are categorical or numeric. That is why
numeric AR mining algorithms have been proposed. In a
numeric AR, attributes are not limited to being Boolean but can
be quantitative (e.g., age, salary, and heat) or categorical (e.g.,
sex and brand). Thus, numeric ARs are more expressive and
informative than Boolean ARs [5].
An example of a numeric AR in an employee database is

Age ∈ [25, 36] ∧ Sex = Male ⇒ Salary ∈ [2000, 2400] ∧ Have_Car = Yes
(Support = 4%, Confidence = 80%)

In this numeric AR, Age ∈ [25, 36] ∧ Sex = Male is the
antecedent and Salary ∈ [2000, 2400] ∧ Have_Car = Yes
is the consequent part. This numeric AR states that 4% (support)
of the employees are males aged between 25 and 36, earning
a salary of between $2000 and $2400, and have a car, while
80% (confidence) of males aged between 25 and 36 earn
a salary of between $2000 and $2400 and have a car.
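For concreteness, the support and confidence of such a rule can be computed as follows. This is an illustrative sketch, not code from the paper; the record layout and helper names are our own:

```python
# Sketch: computing support and confidence of a numeric AR over a
# list of records. Field names and helpers are illustrative only.
def matches(record, conditions):
    """True if the record satisfies every (attribute, test) condition."""
    return all(test(record[attr]) for attr, test in conditions)

def support_confidence(records, antecedent, consequent):
    n_ante = sum(1 for r in records if matches(r, antecedent))
    n_both = sum(1 for r in records
                 if matches(r, antecedent) and matches(r, consequent))
    support = n_both / len(records)
    confidence = n_both / n_ante if n_ante else 0.0
    return support, confidence

# Rule: Age in [25, 36] and Sex = Male => Salary in [2000, 2400] and Have_Car = Yes
antecedent = [("Age", lambda v: 25 <= v <= 36),
              ("Sex", lambda v: v == "Male")]
consequent = [("Salary", lambda v: 2000 <= v <= 2400),
              ("Have_Car", lambda v: v == "Yes")]
```

Support is measured over all records, while confidence is measured only over the records matching the antecedent, which is why the two percentages in the example above differ.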
In [6], mining numeric ARs is performed by first
partitioning the attribute domains into small intervals and
combining adjacent intervals into larger ones such that the
combined intervals have enough support. In effect, the numeric
problem is transformed into a Boolean one. Some
researchers used geometric means to find intervals for
numeric values [7]. However, the antecedent of the rules is
restricted to exactly one categorical value. Their mined rules
are of the form:

A ∈ [v1, v2] ⇒ C


or of its expanded version:

A ∈ [v1, v2] ∧ C1 ⇒ C2

where A is a numeric attribute and C, C1, and C2 are
Boolean statements.
Fukuda et al. [8] and Yoda et al. [9] introduced variants with
two numeric variables in the antecedent and one Boolean item in
the consequent. Aumann and Lindell used the distribution of a
numeric value as the criterion for inclusion in the AR [10].
Their contention was that an AR could be thought of as a
population subset (the rule antecedent) exhibiting some
interesting behavior (the rule consequent). They investigated
two types of numeric rules:

categorical ⇒ numeric rules

and

numeric ⇒ numeric rules
Limitations of these numeric AR mining algorithms, in
general, were the numbers of variables allowed in either the
consequent or the antecedent of the rules. In addition, Boolean
and multiple numeric values are not allowed to appear together in
the consequent and/or the antecedent of the rule.
Discretizing the numeric attributes inevitably leads to a loss of
information. Discretizing may not reflect the original distribution of the attribute and discretized intervals may hide rules (if
intervals are too large, important rules at smaller resolution may
be missed; and if they are too small, there may not be enough
data to mine rules). Intervals may not be semantically
meaningful and may not make sense to human experts.
Furthermore, cumulative effects of several numeric variables
cannot easily be represented.
Several researchers have since used clustering
techniques. Miller and Yang [11] applied BIRCH clustering to
identify intervals and proposed a distance-based association
rule mining process, which improves the semantics of the
intervals. Lent et al. [12] proposed a geometry-based
algorithm, called BitOp, to perform clustering for numeric
attributes. They showed that clustering is a possible solution for
identifying meaningful regions and supporting the discovery of
association rules. Vannucci and Colla proposed a neural
network, a self-organizing map, to overcome the limitations of
the proposed methods for unsupervised discretization, which tries
to preserve the original sample distribution [13].
In [5], an information-theoretic approach that also uses
discretization for numeric ARs mining has been proposed. A
graph is constructed that indicates the strong informative relationships between the attributes. Cliques in the
graph are then used to prune unpromising attribute sets and hence the
joined intervals between these attributes.
Some researchers partitioned the numeric data by means of
fuzzy sets. The rules are of the form:

A is X ⇒ B is Y

where A and B contain itemsets that are subsets of attributes, and X
and Y contain the fuzzy sets associated with the corresponding
attribute sets in A and B.


However, all of these techniques have in common the fact
that they need a priori information from the user [14].
Boundaries and shapes of the fuzzy membership functions must
be established by human experts; thus, this method is not
applicable when automatic discretization is required.
The main problem of all these approaches is preparation of
the data before applying the algorithm. This preparation, either
by means of the user or by means of an automatic process,
conveys a loss of information because the rules will only be
generated departing from the partitions previously created. The
formed intervals for numeric data may not be concise and
meaningful enough for human experts for easily obtaining
valuable knowledge from discovered rules. Furthermore,
except for fuzzy sets, these approaches may have some drawbacks.
The first problem is caused by sharp boundary between
intervals which is not intuitive with respect to human
perception. The algorithms either ignore or over-emphasize
the elements near the boundary of the intervals. Furthermore,
distinguishing the degree of membership for the interval
technique without a priori knowledge is not easy. Similarly,
partitioning by means of fuzzy sets is not an easy task because it
is hard to determine most appropriate fuzzy sets for the numeric
attribute values [14,15]. Characteristics of numeric attributes
are in general unknown and it is unrealistic that the most
appropriate fuzzy sets can always be provided by domain
experts. That is why some researchers have proposed an
evolutionary algorithm for automatically obtaining the fuzzy
sets [16].
The idea of using an evolutionary algorithm (EA) for mining
only frequent sets was applied in [17]. However, the encoding
was not very effective for genetic operators to be performed on,
due to its variable size. Furthermore, frequent itemsets were mined
by running the EA as many times as the number of frequent itemsets
desired, and this has a large computational
cost. In [18], a more effective, modified EA has been proposed for mining
all rules in a single run, while in
[19] a Pareto-based EA has been used for mining ARs from
only market-basket type databases.
To our knowledge, there is no previous study which uses DE for mining ARs. DE is a
simple and effective single-objective optimization algorithm
which solves real-valued problems based on the principles of
natural evolution, and this study proposes a novel approach for
numeric ARs mining via DE. One advantage of multi-objective
optimization algorithms over classical algorithms is that many
non-dominated solutions can be obtained simultaneously in
a single run. This paper shows how this advantage can be
utilized in numeric AR mining, which is formulated as a four-objective optimization problem. A Pareto-based multi-objective DE algorithm is proposed as a novel search strategy for
mining numeric ARs which are optimal in the wider sense that
no other rules are superior to them when all objectives are
considered simultaneously.
3. Differential evolution algorithm
DE is a single-objective optimization algorithm which
solves real-valued problems based on the principles of natural
evolution, using a population P of Np floating-point encoded
individuals (1) which evolve over G generations to reach the
optimal solution(s). Each individual is a vector which contains
as many parameters as the problem decision variables D
[20,21].
P^G = [X_1^G, ..., X_Np^G]   (1)

X_i^G = [X_1,i^G, ..., X_D,i^G],   i = 1, ..., Np   (2)

The canonical EA, the GA [22], and population-based incremental learning [23] work with strings of bits or integers
(letters). Evolution strategy (ES) [24] and DE both work with
vectors of real numbers to represent the candidate solutions.
Mutation is the main step in ES; however, ES typically utilizes
adaptive mutation rates for the vectors themselves, whereas DE
utilizes mutations of the differences of the parameter vectors, as
described in the following subsection.
DE generates new offspring by forming a noisy replica
(trial vector) of each parent individual (target vector) of the
population. The population is successively improved by three
basic operators: mutation, crossover, and selection. Although
these names are the same as those used in EAs, the ways they are
performed are different: DE devises its own mutation,
crossover, and selection operators and redefines them in the present context.
First, the mutation operator, which plays the key role in the
optimization process, creates mutant vectors by perturbing each
target vector with the weighted difference of two other
randomly selected individuals. The perturbation, which can use
either one or two pairs of vectors, can be applied to either a
randomly selected vector from the population or the best
candidate solution found so far. Then, the crossover operator
generates trial vectors by mixing the parameters of the mutant
vectors with the target vectors according to a selected
probability distribution. Crossover can be based on binomial
or exponential distributions. Finally, the selection operator
forms the next generation by deterministically selecting,
between the trial vector and the corresponding target vector, the one which
fits the objective function better. The interesting point in
selection is that a trial vector is not compared against all the
individuals in the current population, but only against its one
counterpart target individual. These operators are repeated for
several generations until the termination criteria are met.
The schematic diagram in Fig. 1 provides a way to visualize
the working principle of DE and simple pseudo-code of DE for
solving single-objective optimization is given in Fig. 2.
3.1. Initial population
The initial population is created by assigning random values
which lie inside the feasible bounds of the decision variable to
each decision parameter of each individual of the population as
shown in the following equation:
X_j,i^0 = X_j^min + h_j (X_j^max − X_j^min),   i = 1, ..., Np;  j = 1, ..., D   (3)


optimization [26]. In this manner, mutant vectors are created
according to Eq. (6). This scheme is DE/best/2:

X_i^G' = X_best^G + F(X_a^G − X_b^G) + F(X_c^G − X_d^G),
i = 1, ..., Np;  a ≠ b ≠ c ≠ d ≠ i   (6)

Scheme DE/rand-to-best/1 places the perturbation at a
location between a randomly chosen vector and the best
performing vector:

X_i^G' = X_a^G + λ(X_best^G − X_a^G) + F(X_b^G − X_c^G),
i = 1, ..., Np;  best ≠ a ≠ b ≠ c ≠ i   (7)

λ controls the greediness of the scheme and is usually set to
λ = F to reduce the number of control variables of the algorithm.

3.3. Crossover
3.3. Crossover
Fig. 1. Schematic diagram of DE algorithm.

where X_j^min and X_j^max are the lower and upper bounds of the jth
decision parameter, respectively, and h_j ∈ [0,1] is a uniformly
distributed random number generated anew for each value of j.
3.2. Mutation

The parent vector is mixed with the mutated vector to create
a trial vector, X_j,i^G'', which is used in the selection process
according to the following equation:

X_j,i^G'' = X_j,i^G'   if h'_j ≤ C_R or j = q
X_j,i^G'' = X_j,i^G    otherwise,
i = 1, ..., Np;  j = 1, ..., D   (8)
The mutation operator creates mutant vectors by perturbing
a randomly selected vector, X_a, with the difference of two other
randomly selected vectors, X_b and X_c, according to the following
equation:

X_i^G' = X_a^G + F(X_b^G − X_c^G),   i = 1, ..., Np;  a ≠ b ≠ c ≠ i   (4)

F ∈ [0, 1.2] is known as the scaling constant and is used to control
the perturbation and improve convergence. X_a, X_b, and X_c
∈ {1, ..., Np} are randomly generated anew for each parent vector.
This mutation operator is known as scheme DE/rand/1; a
randomly selected vector is perturbed with one difference vector.
There are also practical variants of DE [25]. For example, the
best performing vector of the current generation can be selected
as the vector to be perturbed. This is known as scheme DE/best/1:

X_i^G' = X_best^G + F(X_a^G − X_b^G),   i = 1, ..., Np;  best ≠ a ≠ b ≠ i   (5)

X_best^G is the best solution found so far.
Perturbing the best solution found so far with two difference
vectors can present a higher convergence rate in global

Fig. 2. The main steps of DE algorithm.

C_R ∈ [0,1] is known as the crossover constant; q ∈ {1, ..., D} is
a randomly chosen index which ensures that the trial vector gets at
least one parameter from the mutant vector even if C_R = 0. Thus,
the trial vector does not become an exact replica of the original parent vector.
h'_j ∈ [0,1] is a uniformly distributed random number generated
anew for each value of j.
3.4. Selection
The selection operator compares the fitness of the trial vector
with the fitness of the corresponding target vector, and selects the
one which performs better (9). Here, better fitness implies a
bigger objective function value. The selection process is
repeated for each pair of target/trial vectors until the next
population is completed:

X_i^{G+1} = X_i^G''   if f(X_i^G'') ≥ f(X_i^G)
X_i^{G+1} = X_i^G     otherwise,
i = 1, ..., Np   (9)
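Putting Eqs. (3), (4), (8) and (9) together, a minimal single-objective DE using the DE/rand/1 scheme with binomial crossover might look as follows. This is a sketch under our own parameter names, not the authors' implementation; clipping out-of-range trial values to the bounds is one simple constraint-handling choice among several:

```python
# Minimal single-objective DE (DE/rand/1, binomial crossover, greedy
# one-to-one selection); maximizes f. A sketch, not the paper's code.
import random

def de_optimize(f, bounds, Np=20, F=0.8, CR=0.9, G=100, seed=1):
    rng = random.Random(seed)
    D = len(bounds)
    # Eq. (3): random initialization inside the feasible bounds
    pop = [[lo + rng.random() * (hi - lo) for lo, hi in bounds]
           for _ in range(Np)]
    fit = [f(x) for x in pop]
    for _ in range(G):
        for i in range(Np):
            # Eq. (4): perturb a random base vector with one difference vector
            a, b, c = rng.sample([j for j in range(Np) if j != i], 3)
            mutant = [pop[a][j] + F * (pop[b][j] - pop[c][j]) for j in range(D)]
            # Eq. (8): binomial crossover; index q guarantees at least one
            # parameter is taken from the mutant even when CR is small
            q = rng.randrange(D)
            trial = [mutant[j] if (rng.random() <= CR or j == q) else pop[i][j]
                     for j in range(D)]
            # clip to bounds (one simple constraint-handling choice)
            trial = [min(max(t, lo), hi) for t, (lo, hi) in zip(trial, bounds)]
            # Eq. (9): trial replaces its one counterpart target if not worse
            ft = f(trial)
            if ft >= fit[i]:
                pop[i], fit[i] = trial, ft
    best = max(range(Np), key=lambda i: fit[i])
    return pop[best], fit[best]
```

For example, maximizing f(x) = −(x − 1)² over [−5, 5] converges toward x = 1.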
4. Multi-objective optimization
Multi-objective optimization is the problem of simultaneously optimizing a set S of two or more objective functions.
The objective functions typically measure different features of
a desired solution. Often these objectives are conflicting in that
there is no single solution which simultaneously optimizes all
functions. Instead, one has a set of optimal solutions. This set
can be defined using the notion of Pareto-optimality and is
commonly referred to as the Pareto-optimal set [27].
Assuming that the functions in S should be maximized, then
a solution s is Pareto-optimal if there is no other solution s0 such


rules in a database, several conflicting objectives will often be


present. As the ultimate goal of data mining is to discover
unexpected, useful and comprehensible knowledge, it may not
be feasible to prioritize these objectives a priori. Simply
designing an aggregate fitness function in these cases could be
seen as a more or less ad hoc solution. In this work, an
alternative has been proposed using a well-established multi-objective DE.
5.1. Individual representation

Fig. 3. Concept of dominance and Pareto-optimality.

that f_i(s') ≥ f_i(s) for all f_i ∈ S and f_i(s') > f_i(s) for at least one
f_i ∈ S. Informally, this means that s is Pareto-optimal if and only
if there is no feasible solution s' which increases some objective
function without simultaneously decreasing at least one other
objective function. The solutions in the Pareto-optimal set are
called non-dominated. Given two solutions s' and s, s'
dominates s if f_i(s') ≥ f_i(s) for all f_i ∈ S and f_i(s') > f_i(s) for at
least one f_i ∈ S. In other words, s' is at least as good as s with
respect to all objectives and better than s with respect to at least
one objective.
The concepts of dominance and Pareto-optimality are simply
delineated in Fig. 3. Consider the case where there are
three solutions s1, s2, and s3, and assume that the two objectives
o1 and o2 are to be maximized. s1 is not dominated by any other
solution because it has the highest value for objective o2.
Similarly, s2 is not dominated by any other solution because it
has the highest value for objective o1. s3 is not dominated by
s1 because s3 has a higher value for objective o1. However, s3 is
dominated by s2 because it has lower values for both
objectives o1 and o2 compared to s2. Thus, there is no solution
that dominates s1 or s2, and there is one solution, s2, which
dominates s3. Therefore, the set of non-dominated or Pareto-optimal solutions is given by Pareto_Set = {s1, s2}.
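The dominance test and the construction of the non-dominated set can be sketched as follows, with all objectives maximized and objective vectors held as plain tuples (the function names are ours):

```python
# Sketch of the dominance relation and non-dominated filtering used to
# build the Pareto-optimal set; all objectives are to be maximized.
def dominates(s1, s2):
    """s1 dominates s2: at least as good everywhere, strictly better once."""
    return (all(a >= b for a, b in zip(s1, s2))
            and any(a > b for a, b in zip(s1, s2)))

def pareto_set(solutions):
    """Keep only the solutions not dominated by any other solution."""
    return [s for s in solutions
            if not any(dominates(t, s) for t in solutions if t is not s)]
```

Applied to the three solutions of Fig. 3, only s3 is filtered out, leaving {s1, s2}.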
The goal in multi-objective optimization is to find a diverse
set of Pareto-optimal solutions. In evolutionary multi-objective
optimization, such a set is typically produced by a
single EA run. In the rule mining task here, a set of
high-quality numeric ARs which are optimal in the wider sense
that no other rules are superior to them when all objectives are
considered simultaneously is mined in a single DE run.
5. The proposed differential evolution algorithm
(MODENAR)
In recent years, the techniques of evolutionary computation
have proven themselves useful in the area of data mining. For
the problem of rule mining, several objective functions have
been designed, relating to accuracy, comprehensibility and
interestingness in general [28]. However, when searching for

In this work, the individuals which are being produced and


modified along the evolution process represent rules. Each
individual consists of decision variables which represent the
items and intervals. A positional encoding, where the ith item is
encoded in the ith decision variable has been used. Each
decision variable has three parts. The first part of each decision
variable represents the antecedent or consequent of the rule and
can take three values: 0, 1 or 2. If the first part of the
decision variable is 0, it means that this item will be in the
antecedent of the rule and if it is 1, this item will be in the
consequent of the rule. If it is 2, it means that this item will not
be involved in the rule. All decision variables which have 0 on
their first parts will form the antecedent of the rule while
decision variables which have 1 on their first part will form the
consequent of the rule. While the second part represents the
lower bound, the third part represents the upper bound of the
item interval. The structure of an individual has been illustrated
in Fig. 4, where m is the number of attributes of data being
mined.
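The positional encoding above can be sketched as follows. The attribute names and the `decode` helper are illustrative devices of ours, not part of the paper; each attribute is held as a triple (role, lower bound, upper bound):

```python
# Illustrative encoding of one individual (rule) per the positional scheme
# described above. Each attribute i gets a triple (role, lower, upper):
#   role 0 -> item in the antecedent, 1 -> in the consequent,
#   role 2 -> attribute not involved in the rule.
def decode(individual, attribute_names):
    """Split an encoded individual into antecedent and consequent parts."""
    antecedent, consequent = [], []
    for name, (role, lo, hi) in zip(attribute_names, individual):
        if role == 0:
            antecedent.append((name, lo, hi))
        elif role == 1:
            consequent.append((name, lo, hi))
    return antecedent, consequent

# A rule over three attributes: Age in [25, 36] => Salary in [2000, 2400];
# the Sex attribute (role 2) does not appear in the rule.
individual = [(0, 25, 36), (2, 0, 0), (1, 2000, 2400)]
```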
In the implementation of this individual representation, the
second and third part of decision variables will be considered as
one value, similar to the rough value in [29]. Thus, the encoding
will consist of two parts, one for representing the antecedent or
consequent of ARs and the other is rough value which
represents the lower and upper bounds of attributes. Let x be a
rough value of an attribute and let x̲ and x̄ represent the lower and upper
bounds of x, respectively. A rough value of each attribute
variable consists of lower and upper bounds:

x = [x̲, x̄]   (10)

Thus, some operations on rough values, which will be used
for initializing the population and for mutation in the DE process,
can be implemented as

x + y = [x̲, x̄] + [y̲, ȳ] = [x̲ + y̲, x̄ + ȳ]   (11)

x − y = [x̲, x̄] − [y̲, ȳ] = [x̲ − ȳ, x̄ − y̲]   (12)

c · x = c · [x̲, x̄] = [c · x̲, c · x̄]  if c ≥ 0
c · x = c · [x̲, x̄] = [c · x̄, c · x̲]  if c < 0   (13)

Fig. 4. Individual representation.


5.2. Multi-objective DE and new operators


The mined rules have to acquire large support and
confidence. However, these objectives do not have the same
importance. In fact, all objectives have been weighted in order
to give them different importance and let the algorithm work
properly in every numeric database. The weight of the
confidence is smaller than that of the support because, in noisy
databases, there may exist rules whose antecedent and consequent
supports both have a value of 1; such a rule has a
confidence of 100% and may be declared a non-dominated
rule. If it is known a priori that the database on which MODENAR
will run contains too much noise, the weight of the confidence may
be set very small, or the confidence may be excluded from
the objectives.
Comprehensibility is another objective used for the multi-objective DE. The motivation behind this objective is to bias the
system towards slightly shorter rules, increasing the readability,
comprehensibility, and ease of understanding which are
important in data mining. Larger
rules are more likely to contain redundant or unimportant
information, and this can obscure the basic components which
make the rule successful and efficiently processable. Comprehensibility is a measure related to the number of attributes
involved in both the antecedent and consequent parts of the rule. If
the discovered rules have a large number of attributes, they are
difficult to understand and the user will not even be
able to use them. In this study, Ghosh and Nath's
comprehensibility expression [19] has been used. It has been
formulated as follows:
 
 
comprehensibility = log(1 + |C|) / log(1 + |A ∪ C|)   (14)

Here, |C| and |A ∪ C| are the numbers of attributes involved in
the consequent part and in the total rule, respectively.
The amplitudes of the intervals of the attributes which
conform interesting rules should be small. In this way, of
two individuals that cover the same number of records and have
the same number of attributes, the one whose intervals are
smaller provides better information. Note that support,
confidence and comprehensibility are maximization objectives, while
the amplitude of the intervals which conforms the itemset and
rule is a minimization objective. In this study, all objectives are
assumed to be maximized for clarity. Thus, the
minimization objective is simply transformed into a maximization one by subtracting it from 1. Note that all objective values
are in the interval [0,1].
amplitude of the intervals = 1 − (1/m) Σ_{i=1..m} (u_i − l_i) / (max(A_i) − min(A_i))   (15)
Here, m is the number of attributes in the itemsets, ui and li
are the upper and lower bounds encoded in the itemsets
corresponding to attribute i. max(Ai) and min(Ai) are the


allowable limits of the intervals corresponding to attribute i.


Thus, the rules with smaller intervals are intended to be
generated.
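Eqs. (14) and (15) translate directly into code; the function names are ours, and Eq. (15) is shown already flipped into a maximization objective as described in the text:

```python
# The comprehensibility (Eq. (14)) and interval-amplitude (Eq. (15))
# objectives; both return values in [0, 1] to be maximized.
from math import log

def comprehensibility(n_consequent, n_total):
    """Eq. (14): |C| attributes in the consequent, |A u C| in the whole rule."""
    return log(1 + n_consequent) / log(1 + n_total)

def amplitude(intervals, domains):
    """Eq. (15), already flipped to a maximization objective (1 - ...).

    intervals: encoded (l_i, u_i) per attribute in the itemset.
    domains:   allowable (min(A_i), max(A_i)) per attribute.
    """
    m = len(intervals)
    total = sum((u - l) / (dmax - dmin)
                for (l, u), (dmin, dmax) in zip(intervals, domains))
    return 1.0 - total / m
```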
Consequently, the total fitness has to appropriately reflect
these objective functions. In this regard, a solution defined by
the corresponding decision vector can be better than, worse than,
equal to, or indifferent from another solution with respect
to the objective values. Better means that a solution is not worse in
any objective and is better with respect to at least one objective
than another. Using this concept, an optimal solution can be
defined as a solution which is not dominated by any other
solution in the search space. Such a solution is called Pareto-optimal, and the entire set of optimal trade-offs is called the
Pareto-optimal set, which in the task of rule mining is the set of
high-quality numeric ARs [16,30]. Interestingness, which can use
objective or subjective measures, could also easily be included among
the objectives; however, it is excluded in this study because it is often
highly dependent on users.
5.2.1. Rounding operator
The individual representation consists of integer and
continuous values. In its canonical form, the DE algorithm can
only handle continuous variables. However, simple modifications allow DE to optimize integer variables. This is achieved
by a rounding operator which rounds the variable by truncating
when the value lies between two integer values, as described in
[31]. This operator is performed after the initialization and
mutation processes, to evaluate trial vectors and to handle
boundary constraints. The rounded values are not assigned back,
in order to let DE work with a population of
continuous variables regardless of the object variable type,
maintaining the diversity of the population and the robustness
of the algorithm. They are only used in objective function
evaluation:
X_{1,...,D} = [Y_{1,...,k}, round(Z_{k+1,...,D})]^T   (16)

Here, X is the D-dimensional parameter vector, Y the k-dimensional vector of continuous parameters, Z the vector of (D − k)
discrete parameters, and round() is a function for converting a
continuous value to an integer value by truncation. In the case of
integer variables, the population is initialized as follows:
X_j,i^0 = X_j^min + h_j (X_j^max − X_j^min + 1),   i = 1, ..., Np;  j = 1, ..., D   (17)
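A minimal sketch of the rounding operator described above: the population itself stays continuous, and the variables flagged as integer are only truncated at evaluation time. The boolean mask is our own device, not the paper's notation:

```python
# Sketch of the rounding operator of Eq. (16): integer-typed variables are
# truncated only when a vector is evaluated; the stored population keeps
# its continuous values, preserving diversity.
def round_for_evaluation(vector, integer_mask):
    """Truncate the variables flagged as integer; leave the rest continuous."""
    return [int(v) if is_int else v for v, is_int in zip(vector, integer_mask)]
```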

5.2.2. Repairing operator


After the mutation operator, if one or more of the variables in a
mutant vector are outside their boundaries, or if a lower bound
takes a bigger value than its upper bound, a constraint-handling
technique which uses a repairing operator is performed so that
only the feasible solution space is explored. If a lower bound takes a
bigger value than the upper bound while both are inside their boundaries,
their values are simply exchanged. If the variables in the new
solution are outside their boundaries, that is, if the lower bound
takes a value which is smaller than its allowable minimum
value, or if the upper bound takes a value which is bigger than
its allowable maximum value, the repairing rule is applied as
follows [32]:

X_j,i^G' = (X_j,i^G + X_j^min)/2   if X_j,i^G' < X_j^min
X_j,i^G' = (X_j,i^G + X_j^max)/2   if X_j,i^G' > X_j^max   (18)
Furthermore, the individuals must have at least two attributes,
one for the antecedent and one for the consequent, in order to form a
rule.
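Under the reading of Eq. (18) given above (midpoint between the violated bound and the corresponding feasible target-vector value), the repairing operator might look as follows; the helper names are ours and the exact rule is from [32]:

```python
# Sketch of the repairing operator: out-of-range mutant variables are
# pulled to the midpoint between the violated bound and the (feasible)
# target-vector value; crossed interval bounds are simply swapped.
# This follows one reading of Eq. (18), not a verbatim transcription.
def repair(mutant, target, lo, hi):
    repaired = []
    for m, t, l, h in zip(mutant, target, lo, hi):
        if m < l:
            m = (t + l) / 2.0
        elif m > h:
            m = (t + h) / 2.0
        repaired.append(m)
    return repaired

def fix_interval(lower, upper):
    """Exchange a lower bound that exceeds the upper bound."""
    return (upper, lower) if lower > upper else (lower, upper)
```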
5.2.3. Filtrating operator
The filtrating operator is used if the number of non-dominated
solutions exceeds some threshold. It uses a distance metric
(a nearest-neighbour distance function), computed in (19),
to remove the parents which are very close to each other [32].
The threshold is the maximum number of non-dominated
solutions in each generation. The main idea of using the nearest-neighbour distance consists in forming a representative subset of
Pareto-optimal solutions by selecting only those solutions
which are not close to each other in the space of objectives.
D(x) = ( min ||x − x_i|| + min ||x − x_j|| ) / 2,   x ≠ x_i ≠ x_j   (19)

That is, the nearest-neighbour distance is the average
Euclidean distance to the closest two points. The non-dominated solution with the smallest nearest-neighbour distance is
removed from the population until the total number of non-dominated solutions is reduced to the threshold. The threshold
should be chosen with great care; the choice of this value
depends on the problem to be solved. If it is too small, the
number of Pareto solutions might not be representative for the
problem and the DE may not reach the real Pareto frontier. If it
is too large, the effect of reducing the calculation time is lost.
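A sketch of the filtrating operator under these definitions; the function names are ours, and objective vectors are held as plain tuples:

```python
# Sketch of the filtrating operator: repeatedly drop the solution whose
# nearest-neighbour distance (Eq. (19)) is smallest, until at most
# `threshold` non-dominated solutions remain.
import math

def nn_distance(p, others):
    """Average Euclidean distance to the two closest other points."""
    d = sorted(math.dist(p, q) for q in others)
    return (d[0] + d[1]) / 2.0

def filtrate(solutions, threshold):
    kept = list(solutions)
    while len(kept) > threshold:
        i = min(range(len(kept)),
                key=lambda k: nn_distance(kept[k], kept[:k] + kept[k + 1:]))
        kept.pop(i)
    return kept
```

Removing the most crowded point first keeps the retained subset spread out over the objective space, which is exactly the diversity goal stated above.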
5.3. Algorithm
The algorithm works as follows: an initial population is randomly generated taking into account the boundary constraints. The rounding operator is applied to the first part of the decision variables. All dominated solutions are removed from the population, and the remaining non-dominated solutions are retained for mutation. The rounding and repairing operators are
retained for mutation. Rounding and repairing operators are
applied. Then, crossover is performed. If the number of nondominated solutions exceeds some threshold, filtrating operator
which uses a distance metric relation (a nearest neighbor
distance function) computed in (19) is applied to remove the
parents which are very close to each others. Three parents are
randomly selected. A child is generated from the three parents
and replaced into the population, if it dominates the first
selected parent; otherwise weighted sum fitness is calculated
for both the child and for the first selected parent as follows:
f(X_i^G) = \sum_{k=1}^{o} w_k f_k(X_i^G) \qquad (20)

Fig. 5. The main steps of MODENAR.

Here, o is the number of objectives, w_k is the weight for objective k, and f_k(·) is the fitness of the kth objective. The solution whose weighted-sum fitness is larger is selected. This process continues until the population is completed. Finally, to increase the quality of the rules, the intervals of the chosen individuals are adjusted; this is done by decreasing the size of their intervals while the number of covered records remains no smaller than the number of records covered by the original rules. The stopping criterion for the algorithm may be of two
kinds:
1. There may be no new better solution added to the non-dominated set for a specified number of generations.
2. An upper bound on the number of generations may be
assigned.
The stopping criterion may be a combination of the two as
well. However, in this work, the second criterion is applied.
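The dominance check and the weighted-sum fallback of (20) described above can be sketched as follows. This is an illustrative Python fragment, not the authors' code; it assumes all objectives have already been expressed as fitness values to be maximized, and the names are hypothetical:

```python
def dominates(a, b):
    # Pareto dominance for maximization: a is no worse than b in
    # every objective and strictly better in at least one.
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

def weighted_sum(objs, weights):
    # Eq. (20): scalar fitness as the weighted sum of the objectives.
    return sum(w * f for w, f in zip(weights, objs))

def select(child, parent, weights):
    # The child replaces the first selected parent if it dominates
    # it; otherwise the solution with the larger weighted-sum
    # fitness survives.
    if dominates(child, parent):
        return child
    if weighted_sum(child, weights) > weighted_sum(parent, weights):
        return child
    return parent
```

The weight vector in the usage example mirrors the values reported later for the synthetic database (0.8, 0.2, 0.1, 0.4), purely for illustration.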
Values for the weights of the objectives are indeed needed in numeric ARs mining; in fact, these values may be optimized and determined empirically according to the requirements. The user-specified thresholds of other ARs mining algorithms, minimum support and minimum confidence, must be determined for each database; under such database-dependent thresholds, no rules may be mined even though the database contains accurate and comprehensible rules according to its attribute values. MODENAR does not use these thresholds; it uses only weight values set according to the requirements, which can be optimized once. In other words, the same optimized values may be used for every database.
The main steps of MODENAR are shown in Fig. 5.
6. Experimental results
MODENAR was first evaluated in a synthetic database with
a size of 1000 records formed by four numeric attributes. All of
the domains of values were set to [0,100]. The values were
uniformly distributed in attributes in such a way that they were
grouped in pre-determined sets as shown in Table 1. This
distribution of the values was completely arbitrary. Some intervals had a small size and others a larger one.

Table 1
Synthetically created sets

A1 ∈ [1,10]  ∧ A2 ∈ [15,30]
A1 ∈ [15,45] ∧ A3 ∈ [60,75]
A2 ∈ [65,90] ∧ A4 ∈ [15,45]
A3 ∈ [80,100] ∧ A4 ∈ [80,100]

Table 3
ARs found by MODENAR

Rule                              Support (%)   Confidence (%)   Records (%)
A1 ∈ [1,10]  ⇒ A2 ∈ [15,30]      25            100              100 (all rules)
A1 ∈ [15,45] ⇒ A3 ∈ [60,75]      25            100
A3 ∈ [80,100] ⇒ A4 ∈ [80,98]     25            100
A2 ∈ [65,90] ⇒ A4 ∈ [15,43]      25            100
A2 ∈ [15,30] ⇒ A1 ∈ [1,10]       25            100
A3 ∈ [60,75] ⇒ A1 ∈ [15,45]      25            100
A4 ∈ [80,98] ⇒ A3 ∈ [80,100]     25            100
A4 ∈ [15,44] ⇒ A2 ∈ [65,89]      25            100

Support and
confidence values for these sets were 25% and 100%,
respectively. Other values outside these sets were distributed
in such a way that no better rules than these exist. By using the appropriate weights for the objectives in the ARs mining task, these rules were intended to be mined. The goal was to find, as accurately as possible, the intervals of each of the built regions, that is, to test whether MODENAR finds the association rules with the most accurate numeric intervals for each attribute in the rule. The DE/rand/1 scheme has been used for the
algorithm. The used parameter values have been shown in
Table 2. The empirically determined weight values for support, confidence, comprehensibility, and the objective for the amplitude of the intervals computed as in (15) were 0.8, 0.2, 0.1, and 0.4, respectively.
In Table 3, the ARs found by MODENAR are shown. It can be seen that it found the comprehensible rules that have high
support and confidence values according to the synthetically
created sets. Note that MODENAR is database-independent,
since it does not rely upon support/confidence thresholds which
are hard to choose for each database. If support and confidence
thresholds have been used and a support threshold that is higher
than 25% is selected, no rules will be able to be found according
to the values of the attributes in this database. However, it is
known that this database contains some accurate and
comprehensible rules. MODENAR is able to find all these
rules without relying upon the minimum support and the
minimum confidence thresholds.
To test the efficiency of the proposed algorithm, it has been executed on a noisy synthetic database. The noise is introduced by placing values that do not belong to the interval of the second item of the set; that is, a percentage r of the records does not fulfill the pre-established interval of the second item. For example, for the first set there is a percentage r of records that do not fulfill the second item A2 ∈ [15,30], but are instead distributed in the ranges [0,14] or [31,100].
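For illustration, noise of this kind could be injected as follows. This is a hypothetical Python sketch; the paper does not specify its exact generator, so the uniform sampling scheme below is an assumption:

```python
import random

def first_set_record(r):
    # One record of the first synthetic set: A1 in [1,10] and, with
    # probability r (the noise level), A2 drawn from outside its
    # pre-established interval [15,30], i.e. from [0,14] or
    # [31,100]; otherwise A2 is drawn from [15,30].
    a1 = random.uniform(1, 10)
    if random.random() < r:
        a2 = random.choice([random.uniform(0, 14),
                            random.uniform(31, 100)])
    else:
        a2 = random.uniform(15, 30)
    return a1, a2
```

With r = 0 every record fulfills the set; with r = 1 no record does, which brackets the 4-8% noise levels used in the experiments.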
The algorithm has tested whether it obtains the most
adequate ranges for antecedents and consequents of the rules.
Table 2
The used parameter values for the synthetic database

Parameters              Values
Pop. size               10
No. of generations      1000
Crossover rate (CR)     0.3
Step length (F)         Generated for each variable from a Gaussian distribution N(0,1)
Threshold               8

This test was carried out with three levels of noise (4%, 6%, and 8% for the value of r). The experimental results have been


demonstrated in Table 4. In this table, the mined rules, their support and confidence values, and the percentages of total records covered by the mined rules are shown. It can be seen that the ranges of the intervals almost exactly match the synthetically created ones. This shows that MODENAR is able to overcome certain levels of noise in the data.
MODENAR was also evaluated in seven public domain databases: basketball, bodyfat, bolts, pollution, quake, sleep, and stock price. These databases are available from the Bilkent University Function Approximation Repository [33]. The used parameter values are shown in Table 5. A characteristic of MODENAR is that it is stochastic; thus, the algorithm fluctuates between runs. To obtain a better result, the user may execute several trials of the algorithm and keep the best solutions. The algorithm was executed 10 times and the average values of these executions are presented.

Table 4
Rules mined under different levels of noise

Mined rules                        Support (%)   Confidence (%)   Records (%)
r = 4%
A1 ∈ [1,10]  ⇒ A2 ∈ [15,29]       24.1          100              96.0 (all rules)
A1 ∈ [15,45] ⇒ A3 ∈ [60,73]       24.0          100
A3 ∈ [80,100] ⇒ A4 ∈ [80,96]      23.7          96.7
A2 ∈ [65,90] ⇒ A4 ∈ [15,46]       24.2          98.3
A2 ∈ [15,29] ⇒ A1 ∈ [1,10]        24.1          100
A3 ∈ [60,73] ⇒ A1 ∈ [15,45]       24.0          100
A4 ∈ [80,96] ⇒ A3 ∈ [80,100]      23.7          96.7
A4 ∈ [15,46] ⇒ A2 ∈ [65,89]       24.2          98.3

r = 6%
A1 ∈ [1,11]  ⇒ A2 ∈ [14,31]       23.3          98.9             94.0 (all rules)
A1 ∈ [15,45] ⇒ A3 ∈ [56,73]       23.6          99.0
A3 ∈ [80,100] ⇒ A4 ∈ [84,95]      23.3          94.5
A2 ∈ [65,89] ⇒ A4 ∈ [14,49]       23.8          97.8
A2 ∈ [14,31] ⇒ A1 ∈ [1,11]        23.3          98.9
A3 ∈ [56,73] ⇒ A1 ∈ [15,45]       23.6          99.0
A4 ∈ [84,95] ⇒ A3 ∈ [80,100]      23.3          94.5
A4 ∈ [14,49] ⇒ A2 ∈ [65,89]       23.8          97.8

r = 8%
A1 ∈ [1,11]  ⇒ A2 ∈ [14,29]       22.4          97.6             91.8 (all rules)
A1 ∈ [15,45] ⇒ A3 ∈ [62,76]       22.9          98.0
A3 ∈ [79,100] ⇒ A4 ∈ [82,98]      22.8          93.4
A2 ∈ [65,90] ⇒ A4 ∈ [15,48]       23.7          95.8
A2 ∈ [14,29] ⇒ A1 ∈ [1,11]        22.4          97.6
A3 ∈ [62,76] ⇒ A1 ∈ [15,45]       22.9          98.0
A4 ∈ [82,98] ⇒ A3 ∈ [79,100]      22.8          93.4
A4 ∈ [15,48] ⇒ A2 ∈ [65,90]       23.7          95.8

Table 5
The used parameter values for the real databases

Parameters              Values
Pop. size               100
No. of generations      1000
Crossover rate (CR)     0.3
Step length (F)         Generated for each variable from a Gaussian distribution N(0,1)
Threshold               60

Comparing MODENAR with other AR mining algorithms is difficult owing to the lack of algorithms working with numeric values. Most of them need to establish the intervals of the numeric attributes before the mining process, so AR mining in these algorithms is conditioned by the manual discretization carried out by the user. That is why MODENAR is meaningfully compared only to the two evolutionary computation-based algorithms in the literature that discretize numeric attributes while searching for association rules.

Table 6 shows the number of records and the number of numeric attributes for each database, as well as the mean number of different high-quality rules and the mean confidence value of these rules (with standard deviation) found by the algorithm proposed in [18] and by MODENAR. The experimental comparison in terms of number of rules and confidence values has been performed because the algorithm proposed in [18] also mines numeric association rules directly, without finding frequent itemsets; furthermore, it is likewise based on an evolutionary computation technique (a genetic algorithm) and simultaneously searches for the intervals of the numeric attributes and the discovery of the ARs these intervals conform in only a single run. For the algorithm proposed in [18], the population size was set to 100 and it was modified to find only positive ARs. The number of high-quality rules found by MODENAR is greater than that reported in [18]. The results obtained in these domains indicate that MODENAR is competitive with the other algorithm in terms of confidence values.

Table 7 shows the comparison of the results obtained from MODENAR, the GAR (Genetic Association Rule mining) algorithm proposed in [17], and the work proposed in [18]. The GAR algorithm uses an EA for mining only frequent itemsets; that is why comparisons of the values relating to the rules themselves cannot be made. The value in the "Support (%)" column indicates the mean support, the "Size" column shows the mean number of attributes contained in the rules, and the "Amplitude (%)" column indicates the mean size of the intervals which conform the itemset.

MODENAR has found rules with higher values of support in five out of seven databases, and the difference is not significant. The size values obtained from MODENAR are smaller than those obtained from GAR in five out of seven databases, and smaller than those obtained from the algorithm proposed in [18] in three out of seven databases. The amplitude values obtained from MODENAR are smaller than or equal to those of GAR and of the work proposed in [18] in four out of seven databases.

Table 8 shows the mean sizes and the mean amplitudes of the antecedent and consequent of the rules mined by MODENAR. It can be concluded from the size columns
Table 6
Comparisons of the results with the work proposed in [18]

Database      No. of records   No. of attributes   No. of rules            Confidence (%)
                                                   Ref. [18]   MODENAR     Ref. [18]   MODENAR
Basketball    96               5                   33.8        48.0        60 ± 1.2    61 ± 2.1
Bodyfat       252              18                  44.2        52.4        59 ± 3.8    62 ± 3.2
Bolts         40               8                   39.0        55.4        65 ± 1.9    65 ± 1.8
Pollution     60               16                  41.2        54.2        68 ± 4.8    67 ± 2.7
Quake         2178             4                   43.8        55.4        62 ± 5.1    63 ± 2.8
Sleep         62               8                   32.8        48.8        64 ± 2.3    64 ± 3.4
Stock price   950              10                  48.2        53.8        52 ± 2.5    56 ± 1.9

Table 7
Comparisons of the results

Database      Support (%)                    Size                           Amplitude (%)
              MODENAR   GAR     Ref. [18]    MODENAR   GAR    Ref. [18]     MODENAR   GAR    Ref. [18]
Basketball    37.20     36.69   32.21        3.21      3.38   3.21          19        25     20
Bodyfat       65.22     65.26   63.29        6.87      7.45   7.06          25        29     27
Bolts         28.52     25.97   27.04        5.19      5.29   5.14          19        34     27
Pollution     44.85     46.55   38.95        6.24      7.32   6.21          15        15     14
Quake         39.86     38.65   36.96        2.03      2.33   2.1           17        25     19
Sleep         36.55     35.91   37.25        4.23      4.21   4.19          5         5      4
Stock price   45.29     45.25   46.21        6.01      5.80   6.20          22        26     22


Table 8
Mean number of sizes and mean size of amplitudes of the antecedent and consequent of the mined rules

Database      Size of antecedent   Size of consequent   Amplitude of antecedent (%)   Amplitude of consequent (%)
Basketball    1.19                 2.02                 17                            20
Bodyfat       2.47                 4.40                 27                            24
Bolts         2.18                 3.01                 20                            18
Pollution     2.84                 3.40                 13                            17
Quake         0.85                 1.18                 16                            18
Sleep         1.88                 2.35                 5                             5
Stock price   2.02                 3.99                 24                            21

Table 9
Percentages of records covered by the mined rules

Database      Records (%)
              MODENAR   GAR     Ref. [18]
Basketball    100       100     100
Bodyfat       86.11     86.0    84.12
Bolts         80.0      77.5    77.5
Pollution     95.0      95.0    95.0
Quake         88.9      87.5    87.6
Sleep         80.6      79.03   79.81
Stock price   98.73     99.26   98.99

of the table that MODENAR mines rules with a short antecedent and a long consequent, which is coherent with the comprehensibility definition (14) used in this work.
The last experimental result, presented in Table 9, shows the percentages of total records of the real databases covered by the mined rules. It can be concluded from the results that MODENAR is competitive with the other two evolutionary computation-based algorithms.
Consequently, MODENAR has found rules with high values of support and confidence without expanding the intervals in excess. Furthermore, the number of attributes in the rules is small. Thus, the rules discovered within these databases by MODENAR are accurate, readable, and comprehensible.
7. Conclusions and future works
In this paper, the problem of mining numeric ARs has been characterized as a multi-objective optimization problem, and a Pareto-based multi-objective DE, called MODENAR, has been proposed for mining all accurate and comprehensible ARs from the last population in only a single run. It is a relatively simple algorithm that is easy to implement and easy to use. It has been designed to simultaneously search for the intervals of the numeric attributes which conform a rule; in this way, the problem of finding rules only within intervals created before the mining process starts, as has been done in the literature, is avoided. Contrary to the usual techniques, comprehensible ARs with high support and confidence have been mined directly, without generating frequent itemsets and without relying upon the minimum support and minimum confidence thresholds, which are hard to determine for each database.

We plan to apply this technique, with more elaborate experiments using optimized parameters, to other data mining tasks such as mining sequential patterns, classification rules, and clustering rules.
References
[1] J. Han, M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers/Academic Press, 2001.
[2] E. Zitzler, L. Thiele, Multi-objective evolutionary algorithms: a comparative case study and the strength Pareto approach, IEEE Trans. Evol. Comput. 3 (4) (1999) 257–271.
[3] P. Strom, M.L. Hetland, Multi-objective evolution of temporal rules, in: Proceedings of the Eighth Scandinavian Conference on Artificial Intelligence, IOS Press, 2003.
[4] R. Agrawal, T. Imielinski, A.N. Swami, Mining association rules between sets of items in large databases, in: Proceedings of ACM SIGMOD, Washington, DC, 1993, pp. 207–216.
[5] K. Ke, J. Cheng, W. Ng, MIC framework: an information-theoretic approach to quantitative association rule mining, in: Proceedings of ICDE '06, 2006, pp. 112–114.
[6] R. Srikant, R. Agrawal, Mining quantitative association rules in large relational tables, in: Proceedings of ACM SIGMOD, 1996, pp. 1–12.
[7] T. Fukuda, M. Yasuhiko, M. Sinichi, T. Tokuyama, Mining optimized association rules for numeric attributes, in: Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, ACM Press, 1996, pp. 182–191.
[8] T. Fukuda, Y. Morimoto, S. Morishita, T. Tokuyama, Data mining using two-dimensional optimized association rules: scheme, algorithms, and visualization, in: Proceedings of the ACM SIGMOD International Conference on Management of Data, ACM Press, 1996, pp. 13–23.
[9] K. Yoda, T. Fukuda, Y. Morimoto, S. Morishita, T. Tokuyama, Computing optimized rectilinear regions for association rules, in: Proceedings of the Third International Conference on Knowledge Discovery and Data Mining, AAAI Press, 1997, pp. 96–103.
[10] Y. Aumann, Y. Lindell, A statistical theory for quantitative association rules, J. Intell. Inf. Syst. 20 (3) (2003) 255–283.
[11] R.J. Miller, Y. Yang, Association rules over interval data, in: Proceedings of the ACM SIGMOD International Conference on Management of Data, 29, 1997, pp. 452–461.
[12] B. Lent, A. Swami, J. Widom, Clustering association rules, in: Proceedings of the IEEE International Conference on Data Engineering, 1997, pp. 220–231.
[13] M. Vannucci, V. Colla, Meaningful discretization of continuous features for association rules mining by means of a SOM, in: Proceedings of ESANN 2004, European Symposium on Artificial Neural Networks, Belgium, 2004, pp. 489–494.
[14] B. Alatas, A. Arslan, A novel approach based on genetic algorithm and fuzzy logic for mining of association rules, J. Sci. Eng. (Firat University) 17 (1) (2005) 42–51 (in Turkish).
[15] B. Alatas, A. Arslan, Mining of fuzzy association rules with genetic algorithms, J. Polytech. (Gazi University) 7 (4) (2004) 269–276 (in Turkish).
[16] M. Kaya, R. Alhajj, Genetic algorithm based framework for mining fuzzy association rules, Fuzzy Sets Syst. 152 (3) (2005) 587–601.
[17] J. Mata, J.L. Alvarez, J.C. Riquelme, Discovering numeric association rules via evolutionary algorithm, in: Proceedings of the Sixth Pacific-Asia Conference on Knowledge Discovery and Data Mining PAKDD-02 (LNAI), Taiwan, 2002, pp. 40–51.
[18] B. Alatas, E. Akin, An efficient genetic algorithm for automated mining of both positive and negative quantitative association rules, Soft Comput. 10 (3) (2006) 230–237.
[19] A. Ghosh, B. Nath, Multi-objective rule mining using genetic algorithms, Inf. Sci. 163 (1–3) (2004) 123–133.
[20] R. Storn, K. Price, Differential evolution: a simple and efficient adaptive scheme for global optimization over continuous spaces, Technical Report TR-95-012, ICSI, 1995.
[21] D. Karaboga, S. Okdem, A simple and global optimization algorithm for engineering problems: differential evolution algorithm, Turkish J. Electr. Eng. Comput. Sci. 12 (1) (2004) 53–60.
[22] D.E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley, New York, 1989.
[23] S. Baluja, Population-based incremental learning: a method for integrating genetic search based function optimization and competitive learning, Technical Report CMU-CS-94-163, Comp. Sci. Dep., Carnegie Mellon University, 1994.
[24] I. Rechenberg, Evolution strategy, in: Zuarda et al. (1994), pp. 147–159.
[25] R. Storn, On the use of differential evolution for function optimization, Technical Report, ICSI, Berkeley, 1996.
[26] R. Perez-Guerrero, Differential evolution based power dispatch algorithms, Master's Thesis, University of Puerto Rico, 2004.
[27] C.A. Coello, An updated survey of GA-based multi-objective optimization techniques, ACM Comput. Surveys 32 (2) (2000) 109–143.
[28] P. Strom, M.L. Hetland, Multi-objective evolution of temporal rules, in: Proceedings of the Eighth Scandinavian Conference on Artificial Intelligence, SCAI, IOS Press, 2003.
[29] B. Alatas, E. Akin, Rough differential evolution algorithm, in: Proceedings of the Second International Conference on Electronics and Computer Engineering IKECCO 2005, Bishkek, Kyrgyzstan, 2005, pp. 173–178.
[30] R. Sarker, H.A. Abbass, Differential evolution for solving multi-objective optimization problems, Asia-Pacific J. Oper. Res. 21 (2) (2004) 225–240.
[31] J. Lampinen, I. Zelinka, Mixed integer-discrete-continuous optimization by differential evolution. Part 1. The optimization method, in: P. Osmera (Ed.), Proceedings of MENDEL'99, Fifth International Mendel Conference on Soft Computing, Brno, Czech Republic, 1999, pp. 71–76.
[32] H.A. Abbass, R. Sarker, C. Newton, PDE: a Pareto-frontier differential evolution approach for multi-objective optimization problems, in: Proceedings of the 2001 Congress on Evolutionary Computation, vol. 2, Seoul, South Korea, IEEE, Piscataway, NJ, USA, 2001, pp. 971–978.
[33] H.A. Guvenir, I. Uysal, Bilkent University Function Approximation Repository, 2000, http://funapp.cs.bilkent.edu.tr.
