Abstract—This paper investigates the feasibility of applying a relatively novel neural network technique, i.e., the extreme learning machine (ELM), to realize a neuro-fuzzy Takagi–Sugeno–Kang (TSK) fuzzy inference system. The proposed method is an improved version of the regular neuro-fuzzy TSK fuzzy inference system. For the proposed method, the data to be processed are first grouped by the k-means clustering method. The membership of an arbitrary input for each fuzzy rule is then derived through an ELM, followed by a normalization method. At the same time, the consequent part of the fuzzy rules is obtained by multiple ELMs. Finally, the approximate prediction value is determined by a weight computation scheme. For the ELM-based TSK fuzzy inference system, two extensions are also proposed to improve its accuracy. The proposed methods can avoid the curse of dimensionality that is encountered in backpropagation and hybrid adaptive neuro-fuzzy inference system (ANFIS) methods. Moreover, the proposed methods have a competitive performance in training time and accuracy compared to three ANFIS methods.

Index Terms—Adaptive neuro-fuzzy inference system (ANFIS), extreme learning machine (ELM), k-means clustering, Takagi–Sugeno–Kang (TSK) fuzzy inference system.

I. INTRODUCTION

The main difference between the Mamdani and Sugeno types is that the Sugeno output membership functions are either linear or constant [5], [6].

Inspired by the biological nervous system, ANNs have been proposed to solve problems that are difficult for conventional computers or human beings. Most ANNs have some sort of training rule whereby the weights of connections are adjusted on the basis of data. In other words, ANNs can learn from examples and exhibit some capability for generalization beyond the training data [7].

Both fuzzy systems and ANNs are soft-computing approaches to model expert behavior [21], [22]. The goal is to mimic the actions of an expert who solves relatively complex problems. In other words, instead of investigating the problem in detail, one observes how an expert successfully tackles the problem and obtains knowledge by instruction and learning. A learning process can be part of knowledge acquisition. In the absence of an expert, or of sufficient time or data, one can resort to reinforcement learning instead of supervised learning. If one has knowledge that is expressed as linguistic rules, one can build a fuzzy system. On the other hand, if one has enough data or can learn from a simulation or a real task, ANNs are very
function parameters: 1) backpropagation for all parameters (ANFIS1); 2) a hybrid method consisting of backpropagation for the parameters that are associated with the input membership functions and least squares estimation for the parameters that are associated with the output membership functions (ANFIS2) [5], [13]; and 3) the subtractive clustering method (ANFIS3).

The TSK and most of its modified versions, however, are based on gradient-based learning algorithms. Conventional gradient-based learning algorithms, such as backpropagation (BP) and its variant, the Levenberg–Marquardt method [23], have been extensively used in the training of multilayer feedforward neural networks. Although reasonable performance can be obtained when the networks are trained by BP, these gradient-based learning algorithms still learn relatively slowly. They may also easily converge to a local minimum. Moreover, the activation functions that are used in these gradient-based tuning methods need to be differentiable [14].

A novel learning algorithm for single-hidden-layer feedforward neural networks (SLFNs) called the extreme learning machine (ELM) [15], [24] has recently been proposed by Huang et al. In ELM, the input weights (linking the input layer to the hidden layer) and hidden biases are randomly chosen, and the output weights (linking the hidden layer to the output layer) are analytically determined by using the Moore–Penrose (MP) generalized inverse. ELM not only learns much faster, with a higher generalization performance, than the traditional gradient-based learning algorithms, but it also avoids many difficulties that are faced by gradient-based learning methods, such as stopping criteria, learning rate, learning epochs, local minima, and overtuning issues [14], [24], [25]. However, as the output weights are computed based on the prefixed input weights and hidden biases, there may exist a set of nonoptimal or unnecessary input weights and hidden biases. In [14], a hybrid approach named E-ELM is proposed by combining differential evolution (DE) and ELM. In E-ELM, a modified DE is used to search for the optimal input weights and hidden biases, while the MP generalized inverse is used to analytically calculate the output weights. The authors find that the hybrid method can achieve a good generalization performance with much more compact networks. One shortcoming of E-ELM, however, is that it may take much more training time than ELM because it incorporates DE. Moreover, there are more parameters to be adjusted in E-ELM than in ELM. In the fuzzy inference system, multiple neural networks are trained at the same time; therefore, using E-ELM in the fuzzy inference system may take a long training time. It is also difficult to obtain the optimum parameters of E-ELM.

Based on the previous analyses, we propose an ELM-based TSK (ETSK) fuzzy inference system in this paper. In the ETSK method, the membership of an arbitrary input for each fuzzy rule is derived through an ELM, followed by a normalization method, which transforms the outputs into the interval [0, 1]. At the same time, the consequent part of the fuzzy rules is identified by multiple ELMs. Due to the advantages of ELM, the ETSK can avoid many difficulties that are faced by gradient-based learning methods, such as stopping criteria, learning rate, learning epochs, local minima, and overtuning issues, which are also encountered in the regular TSK fuzzy inference system.

To select the number of hidden neurons of the ELM, the data are first divided into training, validation, and prediction sets. We gradually increase the number of hidden neurons in a given interval and then select the result with the smallest validation error as the final one. Besides this selection criterion, the standard deviation of the validation error is also considered in the selection procedure as a stability performance index.

For the ETSK, we give two extensions that can obtain smaller training, validation, and prediction errors. In the first extension method, we first repeat a given number of trials for the ETSK with randomly selected input weights. Then, the set of input weights that gives the smallest validation error is selected. The prediction values of the training, validation, and prediction sets are computed based on the selected input weights. This extension is named ETSKE1 in this paper.

For the second extension method, we select several sets of input weights with the smallest validation errors instead of one set as in the first extension. The prediction values of the training, validation, and prediction sets are computed by using each respective set of input weights. The mean of these prediction values is regarded as the final prediction value. This extension is named ETSKE2 in this paper.

The rest of the paper is organized as follows. Section II provides some necessary background information, and the proposed system is discussed in Section III. Section IV presents the simulation results and discussion of the ETSK. Finally, the summary of this paper is given in Section V.

II. BACKGROUND

Fig. 1 shows the block diagram of the proposed TSK fuzzy inference system. In this section, we first give a concise review of the regular TSK fuzzy inference system, ELMs, and the normalization method.

A. Regular TSK Fuzzy Inference System

The core of the TSK model [10] is a set of IF–THEN rules with fuzzy implications and first-order functional consequence parts, which has been proven to be a universal approximator. The format of fuzzy rule R_i is given as follows:

  R_i: IF x_1 is A_i1, x_2 is A_i2, ..., x_M is A_iM,
  THEN y_i = c_i0 + c_i1 x_1 + ... + c_iM x_M.

The number of rules can be determined by the clustering method. The algorithm creates linear models that locally approximate the function to be learned. Structure identification sets a coarse fuzzy partitioning of the domain, while parameter identification optimally adjusts premise and consequent parameters. The algorithm is divided into three major parts: 1) the partition of inference rules; 2) the identification of IF parts; and 3) the identification of THEN parts. For the TSK, an ANN represents a rule, while all the membership functions are represented by only one ANN.

B. ELM

ELM is a relatively new learning algorithm for SLFNs [14], [15]. It randomly chooses the input weights and analytically determines the output weights of SLFNs.
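As a concrete illustration of the ELM training rule just reviewed (random input weights and hidden biases, output weights from the MP generalized inverse), consider the following NumPy sketch. The function names and the choice of a sigmoid activation are ours, not the authors'; this is an illustrative implementation, not the paper's code.

```python
import numpy as np

def elm_train(X, T, n_hidden, seed=0):
    """ELM training: random input weights/biases, analytic output weights."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))  # random input weights
    b = rng.standard_normal(n_hidden)                # random hidden biases
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))           # sigmoid hidden outputs
    beta = np.linalg.pinv(H) @ T                     # Moore-Penrose least-squares solution
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta
```

A single least-squares solve replaces the iterative epochs of BP, which is where ELM's speed advantage over gradient-based training comes from.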
SUN et al.: NEURO-FUZZY INFERENCE SYSTEM 1323
C. Normalization Method
The input data can be projected into the interval [0, 1] [(6)
and (7)] or [−1, 1] [(8) and (9)]. In addition, the output of
the membership neural network should be projected into the
interval [0, 1], i.e.,
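Equations (6)-(9) are not reproduced in this excerpt; a standard min-max scaling consistent with the description (projection into [0, 1] or [-1, 1]) would look like the following sketch, with our own function names.

```python
import numpy as np

def scale_01(x):
    """Project values into [0, 1] (in the spirit of the paper's (6) and (7))."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

def scale_pm1(x):
    """Project values into [-1, 1] (in the spirit of the paper's (8) and (9))."""
    return 2.0 * scale_01(x) - 1.0
```

The same [0, 1] projection is what must be applied to the raw outputs of the membership neural network before they can serve as membership values.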
and y are the input vector and final predicted value, respectively. From Fig. 1, it can be seen that the system mainly consists of the following four major parts:
1) division of the input space;
2) identification of the "IF" parts;
3) identification of the "THEN" parts;
4) computation of the predicted value.

There are two main tasks for the first part. One is determining the number of fuzzy inference rules. The other is to divide the input space so that similar samples are clustered into a group. The best number of partitions is decided in view of the distance between the clusters in a clustering dendrogram. The number of inference rules equals the number of groups. In this paper, following the approach in [10], the input data are grouped by the k-means method, which is an efficient clustering method. Certainly, other clustering methods can also be used here. For a comparison of various kinds of clustering methods, please refer to [18].

The second part consists of an ELM and a normalization processing procedure. This part is used to determine the membership of an arbitrary input for each rule, which derives the membership function for each rule and corresponds to the identification of the IF parts (the condition parts) of the rules. Generally, not all of the ELM's outputs are included in the interval [0, 1]. As we know, a membership value should be a data point in the interval [0, 1]. Therefore, we use normalization here to project the output of the ELM into the interval [0, 1].

The third part of the system is the determination of the THEN parts (the conclusion parts). Here, multiple ELMs are used to obtain the function between the input and the output. The ELM is trained by the learning data and the output value for each rule.

The last part weighs the output of the THEN parts by the membership values of the IF parts and computes the final output value.

B. Algorithm of the ETSK Fuzzy Inference System

Step 3) This step is the identification of the constitution of each IF part. Here, the ELM and the normalization method are used to generate the membership functions under a supervised method. The input vector and target vector can be constituted by the following technique. For the input x_i, assume that the target vector is w_i = [w_i1, ..., w_ir]; if x_i ∈ R_s, then w_is = 1, and the other elements of w_i are all 0. After training, the output of the ELM is normalized into the interval [0, 1].
The learning of NN_mem is conducted so that these w_is can be inferred from the input x_i. Thus, NN_mem becomes capable of inferring the degree of attribution w_is of each training data item x_i to R_s. We define the membership function of the IF part as the inferred value w̃_is, which is the output of the learned NN_mem, i.e.,

  µ_A^s(x_i) = w̃_i^s,  i = 1, ..., N.  (14)

Step 4) This step is the identification of each THEN part. Let ELM_s denote the sth neural network of the THEN part in Fig. 1. The structure of the THEN part of each inference rule is expressed by the input/output relationship of ELM_s. The TRD input x_i1^s, ..., x_im^s and the output value y_i^s, i = 1, ..., (N_t)_s, are assigned to the input and output of ELM_s.
Step 5) The final output value y_i^* is derived according to the following equation [10]:

  y_i^* = [ Σ_{s=1}^{r} µ_A^s(x_i) · u^s(x_i) ] / [ Σ_{s=1}^{r} µ_A^s(x_i) ],  i = 1, ..., N.  (15)

The prediction accuracy is measured by the mean squared error

  mse = (1/N) Σ_{i=1}^{N} (y_i − t_i)^2.  (17)
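Once the memberships µ_A^s(x_i) and the consequent outputs u^s(x_i) are available, Steps 3)-5) reduce to simple array operations. The following sketch computes the weighted average of (15) and the mse of (17); plain arrays stand in for the trained ELMs, and the function names are ours.

```python
import numpy as np

def etsk_output(memberships, consequents):
    """Eq. (15): membership-weighted average of the THEN-part outputs.
    memberships[i, s] = mu_A^s(x_i); consequents[i, s] = u^s(x_i)."""
    memberships = np.asarray(memberships, dtype=float)
    consequents = np.asarray(consequents, dtype=float)
    return (memberships * consequents).sum(axis=1) / memberships.sum(axis=1)

def mse(y, t):
    """Eq. (17): mean squared error over the N samples."""
    y, t = np.asarray(y, dtype=float), np.asarray(t, dtype=float)
    return np.mean((y - t) ** 2)
```

Note that the denominator in (15) makes an explicit [0, 1] normalization of the memberships unnecessary for the final weighting, although the paper normalizes them anyway so that they are valid membership values.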
Fig. 2. Actual prediction series y1(t), y2(t), ..., y8(t) for the validation set based on the selected eight sets of input weights.

TABLE IV. COMPARISON OF TRAINING TIME FOR THE SIX METHODS IN EXAMPLE 1.

TABLE VI. COMPARISON OF mse RESULTS FOR THE SIX METHODS IN EXAMPLE 2.
Example 2: In this example, all six algorithms are used to approximate the "SinC" function¹

  y(x) = sin(x)/x,  x ≠ 0;  y(x) = 1,  x = 0.  (21)

Here, 10 000 input/output pairs are used as the experimental data. Five-fold cross validation is used to verify the reliability of the six methods. The number of hidden neurons for ETSK nhidden, the selection ratio for ETSKE2 sratio, and the parameter "radii" for ANFIS3 pradii are given in Table V. The training, validation, and prediction errors for the six methods are given in Table VI. For ETSK, all the results that are shown in this paper are the mean of 20 trials. The mean values, standard deviations, and the ratios between the standard deviations and mean values are given in Table VII.

Table VIII shows the training time of the six methods. From Tables V–VII, we can see that the six methods have similar accuracy. However, ETSK and its extensions have an obviously shorter training time than the three ANFIS methods in this experiment.

Example 3: In the following, we give the simulation results on the rice taste data [17], [19, p. 269]. The data consist of five inputs and a single output whose values are associated with subjective evaluations as follows: x1: flavor, x2: appearance, x3: taste, x4: stickiness, x5: toughness, y: overall evaluation. The input/output pairs can be written as

  ((x_p1, x_p2, x_p3, x_p4, x_p5), y_p),  p = 1, 2, ..., 105.  (22)

The first 70 data pairs are used as the training data, while the rest are used as the testing data. The number of hidden neurons for ETSK nhidden, the selection ratio for ETSKE2 sratio, and the parameter "radii" for ANFIS3 pradii are given in Table IX.

The training, validation, and prediction errors for the six methods are given in Table X. The mean values µ, standard deviations σ, and coefficients of variation σ/µ are given in Table XI. Table XII shows the training time for the six methods. Tables X–XII show that ETSKE2 is the best method with respect to both training time and errors.

¹http://www.ntu.edu.sg/home/egbhuang/.
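The SinC target in (21) and the ETSKE1 idea (repeat trials with random input weights and keep the set with the smallest validation error) can be sketched as follows. A small inline ELM stands in for the full ETSK system here, and all sizes, names, and the data split are illustrative, not the paper's experimental setup.

```python
import numpy as np

def sinc(x):
    """Eq. (21): y = sin(x)/x for x != 0, and y = 1 at x = 0."""
    x = np.asarray(x, dtype=float)
    out = np.ones_like(x)
    nz = x != 0.0
    out[nz] = np.sin(x[nz]) / x[nz]
    return out

def fit_output_weights(X, t, W, b):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))  # sigmoid hidden layer
    return np.linalg.pinv(H) @ t            # analytic output weights

def predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta

rng = np.random.default_rng(1)
X = rng.uniform(-10.0, 10.0, size=(1000, 1))
t = sinc(X[:, 0])
Xtr, ttr = X[:600], t[:600]        # training set
Xva, tva = X[600:800], t[600:800]  # validation set
Xte, tte = X[800:], t[800:]        # prediction (test) set

best = None
for _ in range(10):  # ETSKE1: repeat trials with random input weights
    W = rng.standard_normal((1, 40))
    b = rng.standard_normal(40)
    beta = fit_output_weights(Xtr, ttr, W, b)
    val_err = np.mean((predict(Xva, W, b, beta) - tva) ** 2)
    if best is None or val_err < best[0]:
        best = (val_err, W, b, beta)  # keep weights with smallest validation error

val_err, W, b, beta = best
test_err = np.mean((predict(Xte, W, b, beta) - tte) ** 2)
```

ETSKE2 would instead keep the several best (W, b) sets and average their predictions to obtain the final value.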
1328 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART B: CYBERNETICS, VOL. 37, NO. 5, OCTOBER 2007
TABLE VIII. COMPARISON OF TRAINING TIME FOR THE SIX METHODS IN EXAMPLE 2.

TABLE IX. EXPERIMENTAL PARAMETERS IN EXAMPLE 3.

TABLE XI. MEAN VALUES, STANDARD DEVIATIONS, AND COEFFICIENTS OF VARIATION FOR THE SIX METHODS IN EXAMPLE 3.

TABLE XIII. EXPERIMENTAL PARAMETERS IN EXAMPLE 4.

TABLE XIV. COMPARISON OF mse RESULTS FOR THE FIVE METHODS IN EXAMPLE 4.

TABLE XV. MEAN VALUES, STANDARD DEVIATIONS, AND COEFFICIENTS OF VARIATION FOR THE FIVE METHODS IN EXAMPLE 4.

TABLE XVII. NUMBER OF INPUTS Dim, NUMBER OF SAMPLES N, AND THE CORRESPONDING MEAN TRAINING TIME ∆t1, ..., ∆t4 OF THE SIX METHODS IN THE PRECEDING FOUR EXPERIMENTS.

TABLE XVIII. TIME RATIO BETWEEN THE FIRST EXPERIMENT (∆t1) AND THE OTHER THREE EXPERIMENTS.
ANFIS2 because the algorithm cannot converge. The training,
validation, and prediction errors of the five methods are given
in Table XIV. The mean values µ, standard deviations σ, and
coefficients of variation σ/µ are given in Table XV.
Table XVI shows the training time for the five methods. It can be seen from Tables XIII–XVI that ETSKE1 and ETSKE2 have shorter training times and smaller errors than ANFIS1 and ANFIS3. Since ANFIS1 falls prey to the curse of dimensionality, its training time increases dramatically.
From the preceding experimental results, we can find that
ETSKE1 and ETSKE2 generally have a relatively shorter train-
ing time and smaller errors than the three ANFIS methods. To
analyze the performance of the six methods further, Table XVII
shows the number of inputs Dim, the number of samples N ,
and the corresponding mean training time ∆t1, . . . , ∆t4 of
six methods in the preceding four experiments. The time ratio
between the first experiment (∆t1) and other experiments is
given in Table XVIII.
From Tables XVI–XVIII, we can see that the training times of ANFIS1 and ANFIS2 increase quickly when the numbers of inputs and samples increase. When the number of inputs is more than five, ANFIS1 and ANFIS2 fall into the curse of dimensionality, and we cannot obtain the experimental results of ANFIS2 on our computer. The training time of ANFIS3 also increases when the numbers of inputs and samples increase. It seems that ANFIS3 is more sensitive to the number of samples than to the number of inputs. The training times of ETSK, ETSKE1, and ETSKE2 increase slowly with increasing numbers of inputs and samples. The training times of ETSKE1 and ETSKE2 are less than those of the three ANFIS methods in the last three experiments.

V. CONCLUSION

The TSK model is one of the most influential neuro-fuzzy reasoning models. In this paper, we investigate the feasibility of applying a relatively novel neural network technique, ELM, to realize a neuro-fuzzy TSK fuzzy inference system. For the ETSK fuzzy inference system, two extensions are proposed to improve its accuracy. The proposed methods can avoid the curse of dimensionality that is encountered in backpropagation and hybrid ANFIS methods. Moreover, when the numbers of inputs and samples are relatively large, the proposed extensions of ETSK usually have higher accuracy and shorter training time compared to the three ANFIS methods. The advantage of the ANFIS methods is that they usually can obtain a stable prediction value after the training has converged, while the ETSK methods cannot. In general, the proposed methods have a competitive performance in training time and accuracy compared to the three ANFIS methods.

ACKNOWLEDGMENT

The authors would like to thank the Editor-in-Chief, the Associate Editor, and the anonymous reviewers who have given many helpful comments and suggestions. They would also like to thank Dr. Y. Yu, Dr. C. Hung Chiu, and L. Chow for their valuable suggestions and inputs.

REFERENCES

[1] S. Cong, Neural Network and Their Application in the Moving Control. Hefei, China: Univ. Sci. and Technol. China Press, 2001.
[2] S.-T. Wang, Fuzzy System, Fuzzy Neural Network and Design of Application Program. Shanghai, China: Shanghai Sci. & Tech. Publishers, 1998.
[3] Z.-L. Sun, D.-S. Huang, C.-H. Zheng, and L. Shang, "Optimal selection of time lags for temporal blind source separation based on genetic algorithm," Neurocomputing, vol. 69, no. 7–9, pp. 884–887, Mar. 2006.
[4] H. Demuth and M. Beale, Neural Network Toolbox for Use With MATLAB, User's Guide ver. 4.0. Natick, MA: The MathWorks, Inc., 1998.
[5] J.-S. R. Jang and N. Gulley, Fuzzy Logic Toolbox for Use With MATLAB. Natick, MA: The MathWorks, Inc., 2006.
[6] L. A. Zadeh, "Fuzzy sets," Inf. Control, vol. 8, no. 3, pp. 338–352, 1965.
[7] A. P. Paplinski, Neuro-Fuzzy Computing. [Online]. Available: http://www.csse.monash.edu.au/courseware/cse5301/04/index.html
[8] S. Mitra and Y. Hayashi, "Neuro-fuzzy rule generation: Survey in soft computing framework," IEEE Trans. Neural Netw., vol. 11, no. 3, pp. 748–767, May 2000.
[9] S. G. Tzafestas and K. C. Zikidis, "NeuroFAST: On-line neuro-fuzzy ART-based structure and parameter learning TSK model," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 31, no. 5, pp. 797–802, Oct. 2001.
[10] T. Takagi and I. Hayashi, "NN-driven fuzzy reasoning," Int. J. Approx. Reason., vol. 5, no. 3, pp. 191–212, May 1991.
[11] M. Sugeno and G. T. Kang, "Structure identification of fuzzy model," Fuzzy Sets Syst., vol. 28, no. 1, pp. 15–33, Oct. 1988.
[12] M. Sugeno and K. Tanaka, "Successive identification of a fuzzy model and its applications to prediction of a complex system," Fuzzy Sets Syst., vol. 42, no. 3, pp. 315–334, Aug. 1991.
[13] J.-S. R. Jang, "ANFIS: Adaptive-network-based fuzzy inference systems," IEEE Trans. Syst., Man, Cybern., vol. 23, no. 3, pp. 665–685, May 1993.
[14] Q.-Y. Zhu, A. K. Qin, P. N. Suganthan, and G.-B. Huang, "Evolutionary extreme learning machine," Pattern Recognit., vol. 38, no. 10, pp. 1759–1763, Oct. 2005.
[15] G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, "Extreme learning machine: A new learning scheme of feedforward neural networks," in Proc. IJCNN, Budapest, Hungary, Jul. 25–29, 2004, vol. 2, pp. 985–990.
[16] R. Storn and K. Price, "Differential evolution—A simple and efficient heuristic for global optimization over continuous spaces," J. Global Optim., vol. 11, no. 4, pp. 341–359, Dec. 1997.
[17] H. Ishibuchi, K. Nozaki, H. Tanaka, Y. Hosaka, and M. Matsuda, "Empirical study on learning in fuzzy systems by rice taste analysis," Fuzzy Sets Syst., vol. 64, no. 2, pp. 129–144, Jun. 1994.
[18] Z.-Q. Bian, X.-G. Zhang et al., Pattern Recognition. Beijing, China: Tsinghua Univ. Press.
[19] K. Nozaki, H. Ishibuchi, and H. Tanaka, "A simple but powerful heuristic method for generating fuzzy rules from numerical data," Fuzzy Sets Syst., vol. 86, no. 3, pp. 251–270, Mar. 1997.
[20] S. K. Pal and S. Mitra, "Multi-layer perceptron, fuzzy sets and classification," IEEE Trans. Neural Netw., vol. 3, no. 5, pp. 683–697, Sep. 1992.
[21] S. K. Pal and S. Mitra, Neuro-Fuzzy Pattern Recognition: Methods in Soft Computing. New York: Wiley, 1999.
[22] S. Mitra and S. K. Pal, "Fuzzy self organization, inferencing and rule generation," IEEE Trans. Syst., Man, Cybern. A, Syst., Humans, vol. 26, no. 5, pp. 608–620, Sep. 1996.
[23] J.-S. R. Jang and E. Mizutani, "Levenberg–Marquardt method for ANFIS learning," in Proc. Biennial Conf. NAFIPS, Berkeley, CA, Jun. 19–22, 1996, pp. 87–91.
[24] G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, "Extreme learning machine: Theory and applications," Neurocomputing, vol. 70, no. 1–3, pp. 489–501, Dec. 2006.
[25] G.-B. Huang and H. A. Babri, "Universal approximation using incremental networks with random hidden computation nodes," IEEE Trans. Neural Netw., vol. 17, no. 4, pp. 879–892, Jul. 2006.

Zhan-Li Sun received the B.Sc. degree from Huainan Industrial University, Huainan, China, in 1997, the M.Sc. degree from the Hefei University of Technology, Hefei, China, in 2003, and the Ph.D. degree from the University of Science and Technology of China, Hefei, in 2005.
From March 2006 to March 2007, he was a Research Associate in the Institute of Textiles and Clothing, Hong Kong Polytechnic University, Kowloon, Hong Kong. He is currently with the School of Computer Engineering, Nanyang Technological University, Singapore. His current research interests include machine learning and signal and image processing.
Kin-Fan Au received the M.Sc. (Eng.) degree in industrial engineering from the University of Hong Kong, Hong Kong, and the Ph.D. degree from the Hong Kong Polytechnic University, Kowloon, Hong Kong.
He is currently an Associate Professor with the Institute of Textiles and Clothing, Hong Kong Polytechnic University. He has published many papers in textiles and related journals on topics of world trading, offshore production, and modeling of textiles and apparel trade. His research interests include the business aspects of fashion and textiles, particularly global trading of fashion and textile products.

Tsan-Ming Choi (S'00–M'01) received the Ph.D. degree in supply chain management from the Chinese University of Hong Kong, Hong Kong.
He is currently an Assistant Professor with the Institute of Textiles and Clothing, Hong Kong Polytechnic University, Kowloon, Hong Kong. Over the past few years, he has actively participated in a variety of research projects in supply chain management. He has published in journals such as Computers and Operations Research, European Journal of Operational Research, IEEE TRANSACTIONS, International Journal of Production Economics, Journal of Industrial and Management Optimization, Journal of the Operational Research Society, and Omega. His main research interest is supply chain management.
Prof. Choi is a member of the Institute for Operations Research and the Management Sciences.