
IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART B: CYBERNETICS, VOL. 37, NO. 5, OCTOBER 2007

A Neuro-Fuzzy Inference System Through Integration of Fuzzy Logic and Extreme Learning Machines

Zhan-Li Sun, Kin-Fan Au, and Tsan-Ming Choi, Member, IEEE

Abstract—This paper investigates the feasibility of applying a relatively novel neural network technique, i.e., the extreme learning machine (ELM), to realize a neuro-fuzzy Takagi–Sugeno–Kang (TSK) fuzzy inference system. The proposed method is an improved version of the regular neuro-fuzzy TSK fuzzy inference system. In the proposed method, the data to be processed are first grouped by the k-means clustering method. The membership of an arbitrary input for each fuzzy rule is then derived through an ELM, followed by a normalization method. At the same time, the consequent part of the fuzzy rules is obtained by multiple ELMs. Finally, the approximate prediction value is determined by a weight computation scheme. For the ELM-based TSK fuzzy inference system, two extensions are also proposed to improve its accuracy. The proposed methods can avoid the curse of dimensionality that is encountered in backpropagation and hybrid adaptive neuro-fuzzy inference system (ANFIS) methods. Moreover, the proposed methods have a competitive performance in training time and accuracy compared to three ANFIS methods.

Index Terms—Adaptive neuro-fuzzy inference system (ANFIS), extreme learning machine (ELM), k-means clustering, Takagi–Sugeno–Kang (TSK) fuzzy inference system.

Manuscript received September 6, 2006; revised April 18, 2007. The work of K.-F. Au was supported in part by the RGC Competitive Earmarked Research Grant PolyU 5101/05E. The work of T.-M. Choi was supported in part by the RGC Competitive Earmarked Research Grant PolyU 5145/06E and in part by the competitive grant of the Hong Kong Polytechnic University A-PH22. This paper was recommended by Associate Editor E. Santos. Z.-L. Sun was with the Institute of Textiles and Clothing, Hong Kong Polytechnic University, Kowloon, Hong Kong. He is currently with the School of Computer Engineering, Nanyang Technological University, Singapore 639798, Singapore. K.-F. Au and T.-M. Choi are with the Institute of Textiles and Clothing, Hong Kong Polytechnic University, Kowloon, Hong Kong (e-mail: tcjason@inet.polyu.edu.hk). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TSMCB.2007.901375. 1083-4419/$25.00 © 2007 IEEE

I. INTRODUCTION

ARTIFICIAL neural networks (ANNs) and fuzzy inference are widely used in the areas of prediction, identification, diagnostics, and control of linear or nonlinear systems [1]–[4], [20]. Fuzzy inference is the process of formulating the mapping from a given input to an output using fuzzy logic. The mapping then provides a basis from which decisions can be made or patterns can be discerned. Fuzzy inference systems have been successfully applied in fields such as automatic control, data classification, decision analysis, expert systems, and computer vision. There are mainly two types of fuzzy inference systems: 1) the Mamdani type and 2) the Sugeno type. The main difference between the Mamdani and Sugeno types is that the Sugeno output membership functions are either linear or constant [5], [6].

Inspired by the biological nervous system, ANNs have been proposed to solve problems that are difficult for conventional computers or human beings. Most ANNs have some sort of training rule whereby the weights of connections are adjusted on the basis of data. In other words, ANNs can learn from examples and exhibit some capability for generalization beyond the training data [7].

Both fuzzy systems and ANNs are soft-computing approaches to model expert behavior [21], [22]. The goal is to mimic the actions of an expert who solves relatively complex problems. In other words, instead of investigating the problem in detail, one observes how an expert successfully tackles the problem and obtains knowledge by instruction and learning. A learning process can be part of knowledge acquisition. In the absence of an expert, or of sufficient time or data, one can resort to reinforcement learning instead of supervised learning. If one has knowledge that is expressed as linguistic rules, one can build a fuzzy system. On the other hand, if one has enough data or can learn from a simulation or a real task, ANNs are very appropriate [8].

The merits of both neural and fuzzy systems can be integrated in a neuro-fuzzy approach. Combined with the learning ability of ANNs, the fuzzy inference system has proven to be a powerful mathematical construct, which also enables the symbolic expression of machine learning results. Over the past few years, the application of neuro-fuzzy methods to nonlinear process identification using input–output data has become a very active research area [9]. A comprehensive and insightful survey can be found in [8].

One of the most influential neuro-fuzzy reasoning models was proposed by Takagi and Sugeno in [10]–[12]. Since then, Sugeno et al. have established what is today called the Takagi–Sugeno–Kang (TSK) method. This neural-network-based fuzzy reasoning scheme is capable of learning the membership function of the "IF" part and determining the amount of control in the "THEN" part of the inference rules. It is well suited to mathematical analysis and usually works well with optimization and adaptive techniques. Subsequently, many improved algorithms and extensions were developed for the TSK model [8]. In particular, the adaptive neuro-fuzzy inference system (ANFIS) is an important approach to implement the TSK fuzzy system. The ANFIS is a five-layer network structure that constructs a fuzzy inference system. There are three methods that ANFIS learning employs for updating membership



function parameters: 1) backpropagation for all parameters (ANFIS1); 2) a hybrid method consisting of backpropagation for the parameters that are associated with the input membership functions and least squares estimation for the parameters that are associated with the output membership functions (ANFIS2) [5], [13]; and 3) the subtractive clustering method (ANFIS3).

The TSK and most of its modified versions, however, are based on gradient-based learning algorithms. The conventional gradient-based learning algorithms, such as backward propagation (BP) and its variant, i.e., the Levenberg–Marquardt method [23], have been extensively used in the training of multilayer feedforward neural networks. Although reasonable performance can be obtained when the networks are trained by BP, these gradient-based learning algorithms still learn relatively slowly. These learning algorithms may also easily converge to a local minimum. Moreover, the activation functions that are used in these gradient-based tuning methods need to be differentiable [14].

A novel learning algorithm for single-hidden-layer feedforward neural networks (SLFNs) called the extreme learning machine (ELM) [15], [24] has recently been proposed by Huang et al. In ELM, the input weights (linking the input layer to the hidden layer) and hidden biases are randomly chosen, and the output weights (linking the hidden layer to the output layer) are analytically determined by using the Moore–Penrose (MP) generalized inverse. ELM not only learns much faster with a higher generalization performance than the traditional gradient-based learning algorithms, but it also avoids many difficulties that are faced by gradient-based learning methods, such as stopping criteria, learning rate, learning epochs, local minima, and overtuning issues [14], [24], [25]. However, as the output weights are computed based on the prefixed input weights and hidden biases, there may exist a set of nonoptimal or unnecessary input weights and hidden biases. In [14], a hybrid approach named E-ELM is proposed by combining differential evolution (DE) and ELM. In E-ELM, a modified DE is used to search for the optimal input weights and hidden biases, while the MP generalized inverse is used to analytically calculate the output weights. The authors find that the hybrid method can achieve a good generalization performance with much more compact networks. One shortcoming of E-ELM, however, is that it may take much more training time than ELM because it incorporates DE. Moreover, there are more parameters to be adjusted in E-ELM than in ELM. In the fuzzy inference system, multiple neural networks are trained at the same time. Therefore, it may take a long training time if we use E-ELM in the fuzzy inference system. It is also difficult to obtain the optimum parameters of E-ELM.

Based on the previous analyses, we propose an ELM-based TSK (ETSK) fuzzy inference system in this paper. In the ETSK method, the membership of an arbitrary input for each fuzzy rule is derived through an ELM, followed by a normalization method, which transforms the outputs into the interval [0, 1]. At the same time, the consequent part of the fuzzy rules is identified by multiple ELMs. Due to the advantages of ELM, the ETSK can avoid the many difficulties that are faced by gradient-based learning methods, such as stopping criteria, learning rate, learning epochs, local minima, and overtuning issues, which are also encountered in the regular TSK fuzzy inference system.

To select the number of hidden neurons of the ELM, the data are first divided into training, validation, and prediction sets. We gradually increase the number of hidden neurons in a given interval and then select the result with the smallest validation error as the final one. Besides this selection criterion, the standard deviation of the validation error is also considered in the selection procedure as a stability performance index.

For the ETSK, we give two extensions that can obtain smaller training, validation, and prediction errors. In the first extension method, we first repeat a given number of trials of the ETSK with randomly selected input weights. Then, the set of input weights that gives the smallest validation error is selected. The prediction values of the training, validation, and prediction sets are computed based on the selected input weights. This extension is named ETSKE1 in this paper.

In the second extension method, we select several sets of input weights with the smallest validation errors instead of one set as in the first extension. The prediction values of the training, validation, and prediction sets are computed by using each respective set of input weights. The mean value of the prediction values is regarded as the final prediction value. This extension is named ETSKE2 in this paper.

The rest of the paper is organized as follows. Section II provides some necessary background information, and the proposed system is discussed in Section III. Section IV presents the simulation results and discussion of the ETSK. Finally, the summary of this paper is given in Section V.

II. BACKGROUND

Fig. 1 shows the block diagram of the proposed TSK fuzzy inference system. In this section, we first give a concise review of the regular TSK fuzzy inference system, ELMs, and the normalization method.

A. Regular TSK Fuzzy Inference System

The core of the TSK model [10] is a set of IF–THEN rules with fuzzy implications and first-order functional consequence parts, which has been proven to be a universal approximator. The format of fuzzy rule Ri is given as follows:

    Ri: IF x1 is Ai1, x2 is Ai2, ..., xM is AiM,
        THEN yi = ci0 + ci1 x1 + ··· + ciM xM.

The number of rules can be determined by the clustering method. The algorithm creates linear models that locally approximate the function to be learned. Structure identification sets a coarse fuzzy partitioning of the domain, while parameter identification optimally adjusts the premise and consequent parameters. The algorithm is divided into three major parts: 1) the partition of inference rules; 2) the identification of IF parts; and 3) the identification of THEN parts. For the TSK, an ANN represents a rule, while all the membership functions are represented by only one ANN.

B. ELM

ELM is a relatively new learning algorithm for SLFNs [14], [15]. It randomly chooses the input weights and analytically determines the output weights of SLFNs.
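As a concrete illustration of the ELM training rule just described (random, untuned input weights and hidden biases; output weights from a least squares solution), the following NumPy sketch may be helpful. The paper's experiments were run in Matlab, so this Python code, its function names, and its parameter choices are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def elm_train(X, T, K, seed=0):
    """ELM: input weights and hidden biases are drawn at random and never
    tuned; the output weights are the minimum norm least squares solution."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], K))    # input weights (input -> hidden)
    b = rng.standard_normal(K)                  # hidden biases
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))      # hidden-layer output matrix H
    beta, *_ = np.linalg.lstsq(H, T, rcond=None)  # beta = pinv(H) @ T
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Forward pass: sigmoid hidden layer, linear output layer."""
    return (1.0 / (1.0 + np.exp(-(X @ W + b)))) @ beta
```

Since no iterative tuning is involved, training reduces to one matrix product and one least squares solve, which is the source of ELM's speed advantage over BP-style algorithms.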
Fig. 1. Block diagram of the proposed TSK fuzzy inference system.

Suppose that we are training SLFNs with K hidden neurons and an activation function vector g(x) = (g1(x), g2(x), ..., gK(x)) to learn N distinct samples (xi, ti), where xi = [xi1, xi2, ..., xin]^T ∈ R^n and ti = [ti1, ti2, ..., tim]^T ∈ R^m. The SLFNs can approximate these N samples with zero error, i.e.,

    Σ_{j=1}^{N} ‖oj − tj‖ = 0    (1)

which means that there exist parameters βi, wi, and bi such that

    Σ_{i=1}^{K} βi gi(wi · xj + bi) = tj,  j = 1, ..., N    (2)

where wi = [wi1, ..., win]^T is the weight vector connecting the ith hidden neuron and the input neurons, βi = [βi1, ..., βim]^T, i = 1, ..., K, is the weight vector connecting the ith hidden neuron and the output neurons, and bi is the threshold of the ith hidden neuron. The operation wi · xj in (2) denotes the inner product of wi and xj. The preceding N equations can be written compactly as

    Hβ = T    (3)

where H = {hij} (i = 1, ..., N and j = 1, ..., K) is the hidden-layer output matrix, the expression

    hij = g(wj · xi + bj)    (4)

denotes the output of the jth hidden neuron with respect to xi, β = [β1, ..., βK]^T is the matrix of output weights, and T = [t1, t2, ..., tN]^T is the matrix of targets.

In ELM, the input weights and hidden biases are randomly generated instead of tuned. Thus, the determination of the output weights (linking the hidden layer to the output layer) is as simple as finding the least squares (LS) solution to the given linear system. The minimum norm LS solution to the linear system (3) is

    β = H† T    (5)

where H† is the MP generalized inverse of the matrix H [15]. The minimum norm LS solution is unique and has the smallest norm among all the LS solutions. As analyzed by Huang et al. [15], by using such an MP inverse method, ELM tends to obtain a good generalization performance with a dramatically increased learning speed.

C. Normalization Method

The input data can be projected into the interval [0, 1] [(6) and (7)] or [−1, 1] [(8) and (9)]. In addition, the output of the membership neural network should be projected into the interval [0, 1], i.e.,

    xpi := (xpi − min{xpi})/(max{xpi} − min{xpi}),  i = 1, 2, ..., n;  p = 1, ..., N    (6)
    yp := (yp − min{yp})/(max{yp} − min{yp}),  p = 1, ..., N    (7)
    xpi := xpi/max{abs(xpi)},  i = 1, 2, ..., n;  p = 1, ..., N    (8)
    yp := yp/max{abs(yp)},  p = 1, ..., N    (9)

where

    min{xpi} := min{xpi, p = 1, ..., N}    (10)
    max{xpi} := max{xpi, p = 1, ..., N}    (11)
    min{yp} := min{yp, p = 1, ..., N}    (12)
    max{yp} := max{yp, p = 1, ..., N}.    (13)

The operators min{·} and max{·} in the preceding expressions select the minimum and maximum values from the given data series, respectively. The symbol abs is the absolute value operator. For more details, please refer to [17].

III. PROPOSED TSK FUZZY INFERENCE SYSTEM

In this section, we discuss the proposed fuzzy inference system in more detail, including the general structure and the corresponding learning algorithm.

A. General Structure

Fig. 1 shows the general structure of the proposed TSK fuzzy inference system. In Fig. 1, block NNmf is used to determine the membership values of all rules, and blocks NN1, NN2, ..., NNr are used to determine the output gi for the "THEN" part of the ith rule, i = 1, 2, ..., r. The variables X
and y are the input vector and the final predicted value, respectively. From Fig. 1, it can be seen that the system mainly consists of the following four major parts:
1) division of the input space;
2) identification of the "IF" parts;
3) identification of the "THEN" parts;
4) computation of the predicted value.

There are two main tasks for the first part. One is determining the number of fuzzy inference rules. The other is to divide the input space so that similar samples are clustered into a group. The best number of partitions is decided in view of the distance between the clusters in a clustering dendrogram. The number of inference rules equals the number of groups. In this paper, following the approach in [10], the input data are grouped by the k-means method, which is an efficient clustering method. Certainly, other clustering methods can also be used here. For a comparison of various kinds of clustering methods, please refer to [18].

The second part consists of an ELM and a normalization procedure. This part is used to determine the membership of an arbitrary input for each rule, which derives the membership function for each rule and corresponds to the identification of the IF parts (the condition parts) of the rules. Generally, not all of the ELM's outputs lie in the interval [0, 1]. As we know, a membership value should be a point in the interval [0, 1]. Therefore, we use normalization here to project the output of the ELM into the interval [0, 1].

The third part of the system is the determination of the THEN parts (the conclusion parts). Here, multiple ELMs are used to obtain the function between the input and the output. Each ELM is trained with the learning data and the output value for its rule.

The last part weighs the outputs of the THEN parts by the membership values of the IF parts and computes the final output value.

B. Algorithm of the ETSK Fuzzy Inference System

Assume that the dimension of the input variable is n and that the number of samples is N. Define the input variables as xi = (xi1, ..., xin) and the observed value as ti, i = 1, ..., N. Here, we give the specific steps of ETSK [Steps 1)–5)], ETSKE1, and ETSKE2 [Steps 6)–8)].

Step 1) Divide the input/output data (xi, ti) into training data (TRD of size Nt), validation data (VAD of size Nc), and prediction data (PRD of size Np), where N = Nt + Nc + Np.

Step 2) Partition the TRD. The TRD is clustered by the k-means method. The best number of partitions is decided in view of the distance between the clusters in a clustering dendrogram, i.e., assuming that the best number of partitions is r, the sum of all distances from each sample point to its cluster center is smallest when the input data are divided into r groups. The division of the m-dimensional space into r groups here means that the number of inference rules is set to r. Denoting each group of the TRD by Rs, s = 1, ..., r, the TRD of Rs is expressed by (xsi, ysi), where i = 1, ..., (Nt)s, and (Nt)s is the number of TRD samples in each Rs.

Step 3) This step is the identification of each IF part. Here, the ELM and the normalization method are used to generate the membership functions in a supervised manner. The input vector and target vector can be constituted by the following technique. For the input xi, assume that the target vector is wi = [wi1, ..., wir]; if xi ∈ Rs, then wis = 1, and the other elements of wi are all 0. After training, the output of the ELM is normalized into the interval [0, 1].
The learning of NNmem is conducted so that these wis can be inferred from the input xi. Thus, NNmem becomes capable of inferring the degree of attribution wis of each training data item xi to Rs. We define the membership function of the IF part as the inferred value w̃is, which is the output of the learned NNmem, i.e.,

    µsA(xi) = w̃is,  i = 1, ..., N.    (14)

Step 4) This step is the identification of each THEN part. Let ELMs denote the sth neural network of the THEN part in Fig. 1. The structure of the THEN part of each inference rule is expressed by the input/output relationship of ELMs. The TRD input xsi1, ..., xsim and the output value ysi, i = 1, ..., (Nt)s, are assigned to the input and output of ELMs.

Step 5) The final output value y∗i is derived according to the following equation [10]:

    y∗i = [Σ_{s=1}^{r} µsA(xi) · us(xi)] / [Σ_{s=1}^{r} µsA(xi)],  i = 1, ..., N.    (15)

After training, we can directly obtain the corresponding prediction values when VAD and PRD are the inputs to the system.

Step 6) Repeat Steps 3)–5) P times for the same data to obtain P actual prediction series yij, i = 1, 2, ..., N; j = 1, 2, ..., P.

Step 7) For ETSKE1, select the set of input weights that gives the smallest validation error in the P trials. Then, the training and prediction errors are computed based on the selected input weights.

Step 8) For ETSKE2, set a selection ratio (ratio) first. Then, select the ratio × P sets of input weights that have the smallest validation errors in the P trials. From (2) and (5), we can see that the outputs of the ELM are determined once the input weights are chosen. Then, the actual output value y∗ is also determined by (15). For each selected set of input weights, compute the actual output value y∗ for the training, validation, and prediction sets. Finally, the mean of the actual output values over the ratio × P selected sets of input weights is regarded as the final output value.
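Steps 1)–5) can be sketched end to end in a few dozen lines. The following Python/NumPy code is a hedged illustration rather than the authors' Matlab implementation: the tiny k-means routine, the min–max normalization of the membership network's outputs, the hidden-neuron count, and all function names are assumptions introduced for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def elm_fit(X, T, K=20):
    """ELM training (Section II-B): random input weights and biases,
    least squares output weights; returns a prediction function."""
    W = rng.standard_normal((X.shape[1], K))
    b = rng.standard_normal(K)
    hidden = lambda Z: 1.0 / (1.0 + np.exp(-(Z @ W + b)))
    beta, *_ = np.linalg.lstsq(hidden(X), T, rcond=None)
    return lambda Z: hidden(Z) @ beta

def kmeans_labels(X, r, iters=20):
    """Step 2): a plain k-means partition of the training data."""
    C = X[rng.choice(len(X), size=r, replace=False)]
    for _ in range(iters):
        lab = np.argmin(((X[:, None, :] - C) ** 2).sum(-1), axis=1)
        C = np.array([X[lab == s].mean(0) if np.any(lab == s) else C[s]
                      for s in range(r)])
    return lab

def etsk_fit(X, y, r=3):
    lab = kmeans_labels(X, r)
    onehot = np.eye(r)[lab]                       # Step 3) targets w_is
    mem_net = elm_fit(X, onehot)                  # membership network NN_mem
    rule_nets = [elm_fit(X[lab == s], y[lab == s][:, None])  # Step 4) THEN parts
                 for s in range(r)]

    def predict(Z):
        mu = mem_net(Z)
        mu = (mu - mu.min(0)) / (mu.max(0) - mu.min(0) + 1e-12)  # into [0, 1]
        u = np.column_stack([net(Z)[:, 0] for net in rule_nets])
        return (mu * u).sum(1) / (mu.sum(1) + 1e-12)  # Step 5), Eq. (15)

    return predict
```

The extensions ETSKE1/ETSKE2 would simply wrap `etsk_fit` in a loop of P trials and keep the weight set(s) with the smallest validation error, as in Steps 6)–8).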

IV. EXPERIMENTAL RESULTS AND DISCUSSION

In order to test the validity and performance of the proposed algorithm, we present in this section the experimental results with artificial and real-world data sets. Specifically, as an example, we give a concise explanation of each step in Fig. 1, combined with the data that are processed in experiment 1. These explanations are not repeated in experiments 2–4 since they are similar to the first experiment. All simulations were conducted in a Matlab environment running on an ordinary personal computer with a dual-core central processing unit (3.20 and 3.19 GHz) and 1-GB memory.

In the following experiments, the activation function of the ELM is the sigmoidal function

    y = 1/(1 + e^{−x}).    (16)

The performance index is the mean squared error mse between the predicted value yi and the actual value ti, i.e.,

    mse = (1/N) Σ_{i=1}^{N} (yi − ti)².    (17)

To measure the stability of the algorithms, we give the definition of the standard deviation std of the performance index mse, i.e.,

    std = sqrt[ (1/Q) Σ_{i=1}^{Q} (msei − µ)² ]    (18)

where msei, i = 1, 2, ..., Q, is the mean squared error obtained when we run the algorithm for the ith time on the same data, and µ is the mean value of all msei, i = 1, 2, ..., Q, i.e.,

    µ = (1/Q) Σ_{i=1}^{Q} msei.    (19)

TABLE I
EXPERIMENTAL PARAMETERS IN EXAMPLE 1

TABLE II
COMPARISON OF mse RESULTS FOR THE SIX METHODS IN EXAMPLE 1

A. Experimental Results

Example 1: To compare the performance of the proposed algorithm with the three ANFIS methods, let us consider the following nonlinear three-input–single-output system:

    y = (1.0 + x1^{0.5} + x2^{−1} + x3^{−1.5})²,  1 ≤ x1, x2, x3 ≤ 5    (20)

which has been used in many references [2, p. 276], [10]. In the simulations, n-fold cross validation is used to compare the reliability of the proposed and existing methods. Here, the parameter n is set to 5. The data are divided equally into five sets (S1, S2, ..., S5). For these data, we repeat five sets of experiments that are denoted as E1, E2, ..., E5, respectively. The number of hidden neurons for ETSK nhidden, the selection ratio for ETSKE2 sratio, and the parameter "radii" for ANFIS3 pradii are given in Table I. Table II shows the corresponding experimental results. For simplicity, the training, validation, and prediction sets are abbreviated as "trnSet," "valSet," and "preSet," respectively, in the following experiments.

As an example, we give a detailed description of the first set of experiments, E1. First, set S1 is used as the prediction set, while the remaining samples are used as the training samples. Of the training samples, 60% of the whole data is used as the training set, and 20% is used as the validation set. We first normalize the inputs of the training samples into the interval [−1, 1] by (8) and (9). Then, the training set is partitioned into two clusters by using the k-means method. After that, the proposed system is trained according to Steps 3)–5), as shown in Section III-B, as the number of hidden neurons is increased from 1 to 50. As a result, the validation error is smallest in this example when the number of hidden neurons is 12. Therefore, we set the number of hidden neurons to 12 and repeat 20 trials of ETSK in which set S1 is used as the prediction set.

For ETSK, we compute the mean values of the training, validation, and prediction errors over the 20 trials.

For ETSKE1, we select the set of input weights that corresponds to the smallest validation error in the 20 trials. According to the description in Section II-B, we compute the output weights using (5) and the input weights. Then, the actual outputs of the ELMs in Fig. 1 are computed using the input and output weights. Finally, the final output of the system in Fig. 1 is obtained by (15). The training and prediction errors are computed according to the actual and expected outputs of the system.

Fig. 2. Actual prediction series y1(t), y2(t), ..., y8(t) for the validation set based on the selected eight sets of input weights.

Fig. 3. Final prediction series y(t) for the validation set.

For ETSKE2, we select eight (0.4 × 20) sets of input weights that have the smallest validation errors in the 20 trials. For every set of input weights, similar to ETSKE1, we compute the actual prediction series for the training, validation, and prediction sets. To keep the paper more compact, we only give the results for the validation set. Fig. 2 shows the actual prediction series y1(t), y2(t), ..., y8(t) for the validation set based on the selected eight sets of input weights. Finally, the mean prediction series y(t) (y(t) = (y1(t) + y2(t) + ··· + y8(t))/8) of the eight actual prediction series y1(t), y2(t), ..., y8(t) is regarded as the final prediction series. Fig. 3 shows the final prediction series for the validation set. In Figs. 2 and 3, the horizontal axis denotes the code number of the sample, and the vertical axis denotes the sample value. The error between the final prediction series and the expected value can be computed by (17).

In the same way, we can obtain the experimental results of E2, E3, E4, and E5 when we set each one of S2, S3, S4, and S5 as the prediction set in sequence and repeat the preceding steps.

For the three ANFIS methods, the experimental data are the same as for the ETSK and its two extensions.

For ANFIS1, the standard functions "genfis1" and "anfis" in Matlab 6.5 are used according to the examples in [5]. The parameter "optMethod" for the function "anfis" is set to zero. The training epoch number is set to 1000. To improve the generalization performance, the validation set is used as the "chkData" in the function "anfis."

For ANFIS2, the parameter "optMethod" for the function "anfis" is set to 1. The training epoch number is set to 1000. The validation set is used as the "chkData" in the function "anfis."

For ANFIS3, the standard function "genfis2" in Matlab 6.5 is used according to the examples in [5]. Following the suggestion of [5], the parameter "radii" is selected from the interval [0.2, 0.5].

For the experimental results in Table II, the mean values µ, standard deviations σ, and coefficients of variation σ/µ are given in Table III.

From Table III, it can be seen that ANFIS2 and ANFIS3 have smaller training, validation, and prediction errors than ANFIS1 and ETSK. ETSKE1 and ETSKE2 have smaller validation and prediction errors than ANFIS2 and ANFIS3 but a higher training error. In addition, ETSKE1 and ETSKE2 have smaller standard deviations σ than the other methods.

Table IV shows the time consumed by the six methods. The symbol µ denotes the mean training time of the five sets of experiments. Note that the training time of ETSKE1 or ETSKE2 is the sum of the 20 trials of ETSK. From Table IV, we can see that ETSK has the shortest training time.

TABLE III
MEAN VALUES, STANDARD DEVIATIONS, AND COEFFICIENTS OF VARIATION FOR THE SIX METHODS IN EXAMPLE 1

TABLE IV
COMPARISON OF TRAINING TIME FOR THE SIX METHODS IN EXAMPLE 1

TABLE V
EXPERIMENTAL PARAMETERS IN EXAMPLE 2

TABLE VI
COMPARISON OF mse RESULTS FOR THE SIX METHODS IN EXAMPLE 2

Example 2: In this example, all six algorithms are used to approximate the "SinC" function¹

    y(x) = sin(x)/x,  x ≠ 0;    y(x) = 1,  x = 0.    (21)

Here, 10 000 input/output pairs are used as the experimental data. Five-fold cross validation is used to verify the reliability of the six methods. The number of hidden neurons for ETSK nhidden, the selection ratio for ETSKE2 sratio, and the parameter "radii" for ANFIS3 pradii are given in Table V. The training, validation, and prediction errors for the six methods are given in Table VI. For ETSK, all the results shown in this paper are the means of 20 trials. The mean values, standard deviations, and the ratios between the standard deviations and mean values are given in Table VII.

Table VIII shows the training time of the six methods. From Tables V–VII, we can see that the six methods have similar accuracy. However, ETSK and its extensions have an obviously shorter training time than the three ANFIS methods in this experiment.

Example 3: In the following, we give the simulation results on the rice taste data [17], [19, p. 269]. The data consist of five inputs and a single output whose values are associated with subjective evaluations as follows: x1: flavor; x2: appearance; x3: taste; x4: stickiness; x5: toughness; y: overall evaluation. The input/output pairs can be written as

    ((xp1, xp2, xp3, xp4, xp5), yp),  p = 1, 2, ..., 105.    (22)

The first 70 data pairs are used as the training data, while the rest are used as the testing data. The number of hidden neurons for ETSK nhidden, the selection ratio for ETSKE2 sratio, and the parameter "radii" for ANFIS3 pradii are given in Table IX.

The training, validation, and prediction errors for the six methods are given in Table X. The mean values µ, standard deviations σ, and coefficients of variation σ/µ are given in Table XI. Table XII shows the training time for the six methods. Tables X–XII show that ETSKE2 is the best method with respect to both training time and errors.

¹http://www.ntu.edu.sg/home/egbhuang/.
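Generating the 10 000 "SinC" input/output pairs of (21) can be sketched as follows. The sampling interval and seed are assumptions, since the paper does not state them; note that NumPy's `np.sinc(t)` computes sin(πt)/(πt), so sin(x)/x is obtained as `np.sinc(x / np.pi)`, with the x = 0 value automatically defined as 1.

```python
import numpy as np

def sinc_pairs(n, lo=-10.0, hi=10.0, seed=0):
    """Draw n input/output pairs of the 'SinC' function of Eq. (21).
    The sampling interval [lo, hi] and the seed are our assumptions."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(lo, hi, n)
    y = np.sinc(x / np.pi)  # np.sinc(t) = sin(pi*t)/(pi*t), so this is sin(x)/x
    return x, y
```

Using `np.sinc` avoids the division-by-zero special case of (21) that a direct `np.sin(x)/x` would have to guard against.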

TABLE VII TABLE X


MEAN VALUES, STANDARD DEVIATIONS, AND COEFFICIENT COMPARISON OF mse RESULTS FOR THE SIX METHODS IN EXAMPLE 3
OF V ARIATION FOR THE S IX M ETHODS IN E XAMPLE 2

TABLE XI
MEAN VALUES, STANDARD DEVIATIONS, AND COEFFICIENTS
OF V ARIATION FOR THE S IX M ETHODS IN E XAMPLE 3

TABLE VIII
COMPARISON OF TRAINING TIME FOR THE SIX METHODS IN EXAMPLE 2

TABLE IX
EXPERIMENTAL PARAMETERS IN EXAMPLE 3

Example 4: The real-world benchmark data set "Abalone" is used in this example.2 There are 4177 samples being used. The number of hidden neurons for ETSK, the selection ratio for ETSKE2, and the parameter "radii" for ANFIS3 are given in Table XIII.

There are eight input variables for the benchmark data. In this case, ANFIS1 and ANFIS2 both fall prey to the curse of dimensionality. During the simulation, the training time for ANFIS1 is far longer than the training times of ANFIS3, ELM, or ELME. We do not have the experimental results of ANFIS2 because the algorithm cannot converge. The training, validation, and prediction errors of the five methods are given in Table XIV. The mean values µ, standard deviations σ, and coefficients of variation σ/µ are given in Table XV.

2 http://www.niaad.liacc.up.pt/~ltorgo/Regression/ds_menu.html.

TABLE XII: Comparison of Training Time for the Six Methods in Example 3
TABLE XIII: Experimental Parameters in Example 4
TABLE XIV: Comparison of mse Results for the Five Methods in Example 4
TABLE XV: Mean Values, Standard Deviations, and Coefficients of Variation for the Five Methods in Example 4
TABLE XVI: Comparison of Training Time for the Five Methods in Example 4
TABLE XVII: Number of Inputs Dim, Number of Samples N, and the Corresponding Mean Training Time ∆t1, . . . , ∆t4 of the Six Methods in the Preceding Four Experiments
TABLE XVIII: Time Ratio Between the First Experiment (∆t1) and the Other Three Experiments
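The per-run statistics reported in these tables follow the standard definitions: for a set of mse values from repeated runs, the mean µ, the sample standard deviation σ, and the coefficient of variation σ/µ can be computed as below. The numbers used here are hypothetical, not the paper's data.

```python
import numpy as np

mse_runs = np.array([0.012, 0.014, 0.011, 0.013, 0.015])  # hypothetical per-run errors

mu = mse_runs.mean()
sigma = mse_runs.std(ddof=1)  # sample standard deviation
cv = sigma / mu               # coefficient of variation, sigma/mu
```

A smaller σ/µ indicates that a method's error is more stable relative to its typical magnitude across runs.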
Table XVI shows the training time for the five methods. It can be seen from Tables XIII–XVI that ETSKE1 and ETSKE2 have shorter training times and smaller errors than ANFIS1 and ANFIS3. Since ANFIS1 falls prey to the curse of dimensionality, its training time increases dramatically.

From the preceding experimental results, we find that ETSKE1 and ETSKE2 generally have shorter training times and smaller errors than the three ANFIS methods. To analyze the performance of the six methods further, Table XVII shows the number of inputs Dim, the number of samples N, and the corresponding mean training times ∆t1, . . . , ∆t4 of the six methods in the preceding four experiments. The time ratio between the first experiment (∆t1) and the other experiments is given in Table XVIII.
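The short training times of the ELM-based methods stem from ELM's one-shot training scheme: hidden-layer weights are drawn at random and fixed, and only the output weights are solved by least squares, with no iterative tuning. A minimal single-output ELM regressor along these lines is sketched below; it is our own illustration of the scheme in [15] and [24], not the authors' implementation.

```python
import numpy as np

class ELM:
    """Minimal extreme learning machine for regression.

    Hidden weights are random and fixed; output weights are the
    least-squares solution, so training needs no iteration.
    """
    def __init__(self, n_hidden=50, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        d = X.shape[1]
        self.W = self.rng.normal(size=(d, self.n_hidden))  # random input weights
        self.b = self.rng.normal(size=self.n_hidden)       # random hidden biases
        H = np.tanh(X @ self.W + self.b)                   # hidden-layer output matrix
        # Output weights via the Moore-Penrose pseudoinverse of H.
        self.beta = np.linalg.pinv(H) @ y
        return self

    def predict(self, X):
        return np.tanh(X @ self.W + self.b) @ self.beta

# Fit a noisy 1-D sine curve.
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.01 * rng.normal(size=200)
model = ELM(n_hidden=40).fit(X, y)
train_mse = np.mean((model.predict(X) - y) ** 2)
```

Because the only fitted quantity is a linear system, training cost grows mildly with the numbers of inputs and samples, which is consistent with the slow growth of the ETSK training times in Table XVII.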
From Tables XVI–XVIII, we can see that the training times of ANFIS1 and ANFIS2 increase quickly when the numbers of inputs and samples increase. When the number of inputs is more than five, ANFIS1 and ANFIS2 fall prey to the curse of dimensionality, and we cannot obtain the experimental results of ANFIS2 on our computer. The training time of ANFIS3 also increases when the numbers of inputs and samples increase. It seems that ANFIS3 is more sensitive to the number of samples than to the number of inputs. The training times of ETSK, ETSKE1, and ETSKE2 increase slowly with the increasing numbers of inputs and samples. The training times of ETSKE1 and ETSKE2 are shorter than those of the three ANFIS methods in the last three experiments.

V. CONCLUSION

The TSK model is one of the most influential neuro-fuzzy reasoning models. In this paper, we investigate the feasibility of applying a relatively novel neural network technique, i.e., ELM, to realize a neuro-fuzzy TSK fuzzy inference system. For the ETSK fuzzy inference system, two extensions are proposed to improve its accuracy. The proposed methods can avoid the curse of dimensionality that is encountered in backpropagation and hybrid ANFIS methods. Moreover, when the numbers of inputs and samples are relatively large, the proposed extensions of ETSK usually achieve higher accuracy and shorter training times than the three ANFIS methods. The advantage of the ANFIS methods is that they can usually obtain a stable prediction value once training has converged, while the ETSK methods cannot. In general, the proposed methods offer competitive training time and accuracy compared with the three ANFIS methods.

ACKNOWLEDGMENT

The authors would like to thank the Editor-in-Chief, the Associate Editor, and the anonymous reviewers who have given many helpful comments and suggestions. They would also like to thank Dr. Y. Yu, Dr. C. Hung Chiu, and L. Chow for their valuable suggestions and inputs.

REFERENCES

[1] S. Cong, Neural Networks and Their Application in Moving Control. Hefei, China: Univ. Sci. and Technol. China Press, 2001.
[2] S.-T. Wang, Fuzzy System, Fuzzy Neural Network and Design of Application Program. Shanghai, China: Shanghai Sci. & Tech. Publishers, 1998.
[3] Z.-L. Sun, D.-S. Huang, C.-H. Zheng, and L. Shang, "Optimal selection of time lags for temporal blind source separation based on genetic algorithm," Neurocomputing, vol. 69, no. 7–9, pp. 884–887, Mar. 2006.
[4] H. Demuth and M. Beale, Neural Network Toolbox for Use With MATLAB, User's Guide ver. 4.0. Natick, MA: The MathWorks, Inc., 1998.
[5] J. S. R. Jang and N. Gulley, Fuzzy Logic Toolbox for Use With MATLAB. Natick, MA: The MathWorks, Inc., 2006.
[6] L. A. Zadeh, "Fuzzy sets," Inf. Control, vol. 8, no. 3, pp. 338–352, 1965.
[7] A. P. Paplinski, Neuro-fuzzy computing. [Online]. Available: http://www.csse.monash.edu.au/courseware/cse5301/04/index.html
[8] S. Mitra and Y. Hayashi, "Neuro-fuzzy rule generation: Survey in soft computing framework," IEEE Trans. Neural Netw., vol. 11, no. 3, pp. 748–767, May 2000.
[9] S. G. Tzafestas and K. C. Zikidis, "NeuroFAST: On-line neuro-fuzzy ART-based structure and parameter learning TSK model," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 31, no. 5, pp. 797–802, Oct. 2001.
[10] T. Takagi and I. Hayashi, "NN-driven fuzzy reasoning," Int. J. Approx. Reason., vol. 5, no. 3, pp. 191–212, May 1991.
[11] M. Sugeno and G. T. Kang, "Structure identification of fuzzy model," Fuzzy Sets Syst., vol. 28, no. 1, pp. 15–33, Oct. 1988.
[12] M. Sugeno and K. Tanaka, "Successive identification of a fuzzy model and its applications to prediction of a complex system," Fuzzy Sets Syst., vol. 42, no. 3, pp. 315–334, Aug. 1991.
[13] J.-S. R. Jang, "ANFIS: Adaptive-network-based fuzzy inference systems," IEEE Trans. Syst., Man, Cybern., vol. 23, no. 3, pp. 665–685, May 1993.
[14] Q.-Y. Zhu, A. K. Qin, P. N. Suganthan, and G.-B. Huang, "Evolutionary extreme learning machine," Pattern Recognit., vol. 38, no. 10, pp. 1759–1763, Oct. 2005.
[15] G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, "Extreme learning machine: A new learning scheme of feedforward neural networks," in Proc. IJCNN, Budapest, Hungary, Jul. 25–29, 2004, vol. 2, pp. 985–990.
[16] R. Storn and K. Price, "Differential evolution—A simple and efficient heuristic for global optimization over continuous spaces," J. Global Optim., vol. 11, no. 4, pp. 341–359, Dec. 1997.
[17] H. Ishibuchi, K. Nozaki, H. Tanaka, Y. Hosaka, and M. Matsuda, "Empirical study on learning in fuzzy systems by rice taste analysis," Fuzzy Sets Syst., vol. 64, no. 2, pp. 129–144, Jun. 1994.
[18] Z.-Q. Bian, X.-G. Zhang et al., Pattern Recognition. Beijing, China: Tsinghua Univ. Press.
[19] K. Nozaki, H. Ishibuchi, and H. Tanaka, "A simple but powerful heuristic method for generating fuzzy rules from numerical data," Fuzzy Sets Syst., vol. 86, no. 3, pp. 251–270, Mar. 1997.
[20] S. K. Pal and S. Mitra, "Multi-layer perceptron, fuzzy sets and classification," IEEE Trans. Neural Netw., vol. 3, no. 5, pp. 683–697, Sep. 1992.
[21] S. K. Pal and S. Mitra, Neuro-Fuzzy Pattern Recognition: Methods in Soft Computing. New York: Wiley, 1999.
[22] S. Mitra and S. K. Pal, "Fuzzy self organization, inferencing and rule generation," IEEE Trans. Syst., Man, Cybern. A, Syst., Humans, vol. 26, no. 5, pp. 608–620, Sep. 1996.
[23] J. S. R. Jang and E. Mizutani, "Levenberg–Marquardt method for ANFIS learning," in Proc. Biennial Conf. NAFIPS, Berkeley, CA, Jun. 19–22, 1996, pp. 87–91.
[24] G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, "Extreme learning machine: Theory and applications," Neurocomputing, vol. 70, no. 1–3, pp. 489–501, Dec. 2006.
[25] G.-B. Huang and H. A. Babri, "Universal approximation using incremental networks with random hidden computation nodes," IEEE Trans. Neural Netw., vol. 17, no. 4, pp. 879–892, Jul. 2006.

Zhan-Li Sun received the B.Sc. degree from Huainan Industrial University, Huainan, China, in 1997, the M.Sc. degree from the Hefei University of Technology, Hefei, China, in 2003, and the Ph.D. degree from the University of Science and Technology of China, Hefei, in 2005.
From March 2006 to March 2007, he was a Research Associate in the Institute of Textiles and Clothing, Hong Kong Polytechnic University, Kowloon, Hong Kong. He is currently with the School of Computer Engineering, Nanyang Technological University, Singapore. His current research interests include machine learning and signal and image processing.
Kin-Fan Au received the M.Sc. (Eng.) degree in industrial engineering from the University of Hong Kong, Hong Kong, and the Ph.D. degree from the Hong Kong Polytechnic University, Kowloon, Hong Kong.
He is currently an Associate Professor with the Institute of Textiles and Clothing, Hong Kong Polytechnic University. He has published many papers in textiles and related journals on topics of world trading, offshore production, and modeling of textiles and apparel trade. His research interests include the business aspects of fashion and textiles, particularly the global trading of fashion and textile products.

Tsan-Ming Choi (S'00–M'01) received the Ph.D. degree in supply chain management from the Chinese University of Hong Kong, Hong Kong.
He is currently an Assistant Professor with the Institute of Textiles and Clothing, Hong Kong Polytechnic University, Kowloon, Hong Kong. Over the past few years, he has actively participated in a variety of research projects in supply chain management. He has published in journals such as Computers and Operations Research, European Journal of Operational Research, IEEE TRANSACTIONS, International Journal of Production Economics, Journal of Industrial and Management Optimization, Journal of the Operational Research Society, and Omega. His main research interest is supply chain management.
Prof. Choi is a member of the Institute for Operations Research and the Management Sciences.