A Neuro-Fuzzy Inference System Through Integration of Fuzzy Logic and Extreme Learning Machines

Zhan-Li Sun, Kin-Fan Au, and Tsan-Ming Choi, Member, IEEE

IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART B: CYBERNETICS, VOL. 37, NO. 5, OCTOBER 2007

Abstract—This paper investigates the feasibility of applying a relatively novel neural network technique, i.e., extreme learning machine (ELM), to realize a neuro-fuzzy Takagi–Sugeno–Kang (TSK) fuzzy inference system. The proposed method is an improved version of the regular neuro-fuzzy TSK fuzzy inference system. For the proposed method, first, the data that are processed are grouped by the k-means clustering method. The membership of arbitrary input for each fuzzy rule is then derived through an ELM, followed by a normalization method. At the same time, the consequent part of the fuzzy rules is obtained by multiple ELMs. At last, the approximate prediction value is determined by a weight computation scheme. For the ELM-based TSK fuzzy inference system, two extensions are also proposed to improve its accuracy. The proposed methods can avoid the curse of dimensionality that is encountered in backpropagation and hybrid adaptive neuro-fuzzy inference system (ANFIS) methods. Moreover, the proposed methods have a competitive performance in training time and accuracy compared to three ANFIS methods.

Index Terms—Adaptive neuro-fuzzy inference system (ANFIS), extreme learning machine (ELM), k-means clustering, Takagi–Sugeno–Kang (TSK) fuzzy inference system.

Manuscript received September 6, 2006; revised April 18, 2007. The work of K.-F. Au was supported in part by the RGC Competitive Earmarked Research Grant PolyU 5101/05E. The work of T.-M. Choi was supported in part by the RGC Competitive Earmarked Research Grant PolyU 5145/06E and in part by the competitive grant of the Hong Kong Polytechnic University A-PH22. This paper was recommended by Associate Editor E. Santos.

Z.-L. Sun was with the Institute of Textiles and Clothing, Hong Kong Polytechnic University, Kowloon, Hong Kong. He is currently with the School of Computer Engineering, Nanyang Technological University, Singapore 639798, Singapore.

K.-F. Au and T.-M. Choi are with the Institute of Textiles and Clothing, Hong Kong Polytechnic University, Kowloon, Hong Kong (e-mail: tcjason@inet.polyu.edu.hk).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TSMCB.2007.901375

I. INTRODUCTION

[...] are widely used in the areas of prediction, identification, diagnostics, and control of linear or nonlinear systems [1]–[4], [20]. Fuzzy inference is the process of formulating the mapping from a given input to an output using fuzzy logic. The mapping then provides a basis from which decisions can be made or patterns can be discerned. Fuzzy inference systems have been successfully applied in fields such as automatic control, data classification, decision analysis, expert systems, and computer vision. There are mainly two types of fuzzy inference systems: 1) the Mamdani type and 2) the Sugeno type. The main difference between the Mamdani and Sugeno types is that the Sugeno output membership functions are either linear or constant [5], [6].

Inspired by the biological nervous system, ANNs have been proposed to solve problems that are difficult for conventional computers or human beings. Most ANNs have some sort of training rules whereby the weights of connections are adjusted on the basis of data. In other words, ANNs can learn from examples and exhibit some capability for generalization beyond the training data [7].

Both the fuzzy system and ANNs are soft-computing approaches to model expert behavior [21], [22]. The goal is to mimic the actions of an expert who solves relatively complex problems. In other words, instead of investigating the problem in detail, one observes how an expert successfully tackles the problem and obtains knowledge by instruction and learning. A learning process can be part of knowledge acquisition. In the absence of an expert or sufficient time or data, one can resort to reinforcement learning instead of supervised learning. If one has knowledge that is expressed as linguistic rules, one can build a fuzzy system. On the other hand, if one has enough data or can learn from a simulation or a real task, ANNs are very appropriate [8].

The merits of both neural and fuzzy systems can be integrated in a neuro-fuzzy approach. Combined with the learning ability of ANNs, the fuzzy inference system has proven to be a powerful mathematical construct, which also enables the symbolic expression of machine learning results. Over the past few years, the application of neuro-fuzzy methods to nonlinear process identification using input–output data is a very active research area [9]. A comprehensive and insightful survey can be found in [8].

One of the most influential neuro-fuzzy reasoning models has been proposed by Takagi and Sugeno in [10]–[12]. Since then, Sugeno et al. have established what is called today the Takagi–Sugeno–Kang (TSK) method. This neural-network-based fuzzy reasoning scheme is capable of learning the membership function of the “IF” part and determining the amount of control in the “THEN” part of the inference rules. It is well suited to mathematical analysis and usually works well with optimization and adaptive techniques. Subsequently, many improved algorithms and extensions were developed for the TSK model [8]. In particular, the adaptive neuro-fuzzy inference system (ANFIS) is an important approach to implement the TSK fuzzy system. The ANFIS is a five-layer network structure that constructs a fuzzy inference system. There are three methods that ANFIS learning employs for updating membership function parameters: 1) backpropagation for all parameters (ANFIS1); 2) a hybrid method consisting of backpropagation for the parameters that are associated with the input membership functions and least squares estimation for the parameters that are associated with the output membership functions (ANFIS2) [5], [13]; and 3) the subtractive clustering method (ANFIS3).

The TSK and most of its modified versions, however, are based on gradient-based learning algorithms. The conventional gradient-based learning algorithms, such as backward propagation (BP) and its variant, i.e., the Levenberg–Marquardt method [23], have been extensively used in the training of multilayer feedforward neural networks. Although reasonable performance can be obtained when the networks are trained by BP, these gradient-based learning algorithms still learn relatively slowly. These learning algorithms may also easily converge to a local minimum. Moreover, the activation functions that are used in these gradient-based tuning methods need to be differentiable [14].

A novel learning algorithm for single-hidden-layer feedforward neural networks (SLFNs) called extreme learning machine (ELM) [15], [24] has recently been proposed by Huang et al. In ELM, the input weights (linking the input layer to the hidden layer) and hidden biases are randomly chosen, and the output weights (linking the hidden layer to the output layer) are analytically determined by using the Moore–Penrose (MP) generalized inverse. ELM not only learns much faster with a higher generalization performance than the traditional gradient-based learning algorithms but it also avoids many difficulties that are faced by gradient-based learning methods such as stopping criteria, learning rate, learning epochs, local minima, and overtuning issues [14], [24], [25]. However, as the output weights are computed based on the prefixed input weights and hidden biases, there may exist a set of nonoptimal or unnecessary input weights and hidden biases. In [14], a hybrid approach named E-ELM is proposed by combining differential evolution (DE) and ELM. In E-ELM, a modified DE is used to search for the optimal input weights and hidden biases, while the MP generalized inverse is used to analytically calculate the output weights. The authors find that the hybrid method can achieve a good generalization performance with much more compact networks. One shortcoming of E-ELM, however, is that it may take much more training time than ELM because it incorporates DE. Moreover, there are more parameters to be adjusted in E-ELM than ELM. For the fuzzy inference system, there are multiple neural networks that are trained at the same time. Therefore, it may take a long training time if we use E-ELM in the fuzzy inference system. It is also difficult to obtain the optimum parameters of E-ELM.

Based on the previous analyses, we propose an ELM-based TSK (ETSK) fuzzy inference system in this paper. In the ETSK method, the membership of arbitrary input for each fuzzy rule is derived through an ELM, followed by a normalization method, which transforms the outputs into the interval [0, 1]. At the same time, the consequent part of the fuzzy rules is identified by multiple ELMs. Due to the advantages of ELM, the ETSK can avoid the many difficulties that are faced by gradient-based learning methods such as stopping criteria, learning rate, learning epochs, local minima, and overtuning issues, which are also encountered in the regular TSK fuzzy inference system.

To select the number of hidden neurons of ELM, the data are first divided into training, validation, and prediction sets. We gradually increase the number of hidden neurons in a given interval and then select the result with the smallest validation error as the final one. Besides this selection criterion, as a stability performance index, the standard deviation of the validation error is also considered in the selection procedure.

For the ETSK, we give two extensions that can obtain smaller training, validation, and prediction errors. In the first extension method, we repeat a given number of trials for the ETSK with the randomly selected input weights at first. Then, the set of input weights that gives the smallest validation error is selected. The prediction values of the training, validation, and prediction sets are computed based on the selected input weights. This extension is named ETSKE1 in this paper.

For the second extension method, we select several sets of input weights with the smallest validation errors instead of one set in the first extension. The prediction values of the training, validation, and prediction sets are computed by using the respective set of input weights. The mean value of the prediction values is regarded as the final prediction value. This extension is named ETSKE2 in this paper.

The rest of the paper is organized as follows. Section II provides some necessary background information, and the proposed system is discussed in Section III. Section IV presents the simulation results and discussion of the ETSK. Finally, the summary of this paper is given in Section V.

II. BACKGROUND

Fig. 1 shows the block diagram of the proposed TSK fuzzy inference system. In this section, we first give a concise review of the regular TSK fuzzy inference system, ELMs, and the normalization method.

A. Regular TSK Fuzzy Inference System

The core of the TSK model [10] is a set of IF–THEN rules with fuzzy implications and first-order functional consequence parts, which has been proven to be a universal approximator. The format of fuzzy rule $R_i$ is given as follows:

$$R_i: \text{IF } x_1 \text{ is } A_{i1},\ x_2 \text{ is } A_{i2},\ \ldots,\ x_M \text{ is } A_{iM} \quad \text{THEN } y_i = c_{i0} + c_{i1} x_1 + \cdots + c_{iM} x_M.$$

The number of rules can be determined by the clustering method. The algorithm creates linear models that locally approximate the function to be learned. Structure identification sets a coarse fuzzy partitioning of the domain, while parameter identification optimally adjusts premise and consequent parameters. The algorithm is divided into three major parts: 1) the partition of inference rules; 2) the identification of IF parts; and 3) the identification of THEN parts. For the TSK, an ANN represents a rule, while all the membership functions are represented by only one ANN.

B. ELM

ELM is a relatively new learning algorithm for SLFNs [14], [15]. It randomly chooses the input weights and analytically determines the output weights of SLFNs.
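The ELM idea summarized above (random input weights and biases, output weights obtained analytically) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it handles a single output, uses the sigmoid activation, and replaces the MP generalized inverse with ridge-stabilized normal equations, which approach $H^{\dagger}T$ as the ridge term goes to zero when $H$ has full column rank. The names `train_elm`, `predict_elm`, `solve_linear`, and the `ridge` parameter are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def hidden_layer(X, W, B):
    """Hidden-layer output matrix H with entries g(w_j . x_i + b_j)."""
    return [[sigmoid(sum(w * x for w, x in zip(W[j], row)) + B[j])
             for j in range(len(W))] for row in X]


def train_elm(X, t, n_hidden, rng, ridge=1e-8):
    """Draw random input weights/biases, then solve for the output weights."""
    n_in = len(X[0])
    W = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_hidden)]
    B = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
    H = hidden_layer(X, W, B)
    # Assumption: (H^T H + ridge*I) beta = H^T t stands in for beta = pinv(H) t.
    HtH = [[sum(h[a] * h[b] for h in H) + (ridge if a == b else 0.0)
            for b in range(n_hidden)] for a in range(n_hidden)]
    Htt = [sum(h[a] * ti for h, ti in zip(H, t)) for a in range(n_hidden)]
    beta = solve_linear(HtH, Htt)
    return W, B, beta


def predict_elm(model, X):
    W, B, beta = model
    return [sum(b * h for b, h in zip(beta, row)) for row in hidden_layer(X, W, B)]
```

A typical call would be `model = train_elm(X, t, 12, random.Random(0))` followed by `predict_elm(model, X_new)`. No iterative tuning is involved, which is where the speed advantage claimed for ELM comes from.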

SUN et al.: NEURO-FUZZY INFERENCE SYSTEM 1323

Fig. 1. Block diagram of the proposed TSK fuzzy inference system.

Suppose that we are training SLFNs with $K$ hidden neurons and an activation function vector $g(x) = (g_1(x), g_2(x), \ldots, g_K(x))$ to learn $N$ distinct samples $(x_i, t_i)$, where $x_i = [x_{i1}, x_{i2}, \ldots, x_{in}]^T \in R^n$ and $t_i = [t_{i1}, t_{i2}, \ldots, t_{im}]^T \in R^m$. The SLFNs can approximate these $N$ samples with a zero error given by

$$\sum_{j=1}^{N} \| o_j - t_j \| = 0 \tag{1}$$

which means that there exist parameters $\beta_i$, $w_i$, and $b_i$ such that

$$\sum_{i=1}^{K} \beta_i\, g_i(w_i \cdot x_j + b_i) = t_j, \quad j = 1, \ldots, N \tag{2}$$

where $w_i = [w_{i1}, \ldots, w_{in}]^T$ is the weight vector connecting the $i$th hidden neuron and the input neurons, $\beta_i = [\beta_{i1}, \ldots, \beta_{im}]^T$, $i = 1, \ldots, K$, is the weight vector connecting the $i$th hidden neuron and the output neurons, and $b_i$ is the threshold of the $i$th hidden neuron. The operation $w_i \cdot x_j$ in (2) denotes the inner product of $w_i$ and $x_j$. The preceding $N$ equations can be written compactly as

$$H\beta = T \tag{3}$$

where $H = \{h_{ij}\}$ ($i = 1, \ldots, N$ and $j = 1, \ldots, K$) is the hidden-layer output matrix, the expression

$$h_{ij} = g(w_j \cdot x_i + b_j) \tag{4}$$

[...] $x_i$, $\beta = [\beta_1, \ldots, \beta_K]$ is the matrix of output weights, and $T = [t_1, t_2, \ldots, t_N]^T$ is the matrix of targets.

In ELM, the input weights and hidden biases are randomly generated instead of tuned. Thus, determination of the output weights (linking the hidden layer to the output layer) is as simple as finding the least squares (LS) solution to the given linear system. The minimum norm LS solution to the linear system (3) is

$$\beta = H^{\dagger} T \tag{5}$$

where $H^{\dagger}$ is the MP generalized inverse of $H$. The minimum norm LS solution is unique and has the smallest norm among all the LS solutions. As analyzed by Huang et al. [15], by using such an MP inverse method, ELM tends to obtain a good generalization performance with a dramatically increased learning speed.

C. Normalization Method

The input data can be projected into the interval [0, 1] [(6) and (7)] or [−1, 1] [(8) and (9)]. In addition, the output of the membership neural network should be projected into the interval [0, 1], i.e.,

$$x_{pi} := \frac{x_{pi} - \min\{x_{pi}\}}{\max\{x_{pi}\} - \min\{x_{pi}\}}, \quad i = 1, 2, \ldots, n;\ p = 1, \ldots, N \tag{6}$$

$$y_p := \frac{y_p - \min\{y_p\}}{\max\{y_p\} - \min\{y_p\}}, \quad p = 1, \ldots, N \tag{7}$$

$$x_{pi} := \frac{x_{pi}}{\max\{\mathrm{abs}(x_{pi})\}}, \quad i = 1, 2, \ldots, n;\ p = 1, \ldots, N \tag{8}$$

$$y_p := \frac{y_p}{\max\{\mathrm{abs}(y_p)\}}, \quad p = 1, \ldots, N \tag{9}$$

where

$$\min\{x_{pi}\} := \min\{x_{pi},\ p = 1, \ldots, N\} \tag{10}$$

$$\max\{x_{pi}\} := \max\{x_{pi},\ p = 1, \ldots, N\} \tag{11}$$

$$\min\{y_p\} := \min\{y_p,\ p = 1, \ldots, N\} \tag{12}$$

$$\max\{y_p\} := \max\{y_p,\ p = 1, \ldots, N\}. \tag{13}$$

The operators min{·} and max{·} in the preceding expressions are used to select the minimum and maximum values from the given data series, respectively. The symbol abs is the absolute value operator. For more details, please refer to [17].

III. PROPOSED TSK FUZZY INFERENCE SYSTEM

In this section, we discuss the proposed fuzzy inference system in more detail, including the general structure and the corresponding learning algorithm.

A. General Structure

Fig. 1 shows the general structure of the proposed TSK fuzzy inference system. In Fig. 1, block $NN_{mf}$ is used to determine the membership values of all rules, and blocks $NN_1, NN_2, \ldots, NN_r$ are used to determine output $g_i$ for the “THEN” part of the $i$th rule, $i = 1, 2, \ldots, r$. The variables $X$ and $y$ are the input vector and final predicted value, respectively. From Fig. 1, it can be seen that the system mainly consists of the following four major parts:

1) division of input space;
2) identification of the “IF” parts;
3) identification of the “THEN” parts;
4) computation of the predicted value.

There are two main tasks for the first part. One is determining the number of fuzzy inference rules. The other is to divide the input space, so that similar samples are clustered into a group. The best number of partitions is decided in view of the distance between the clusters in a clustering dendrogram. The number of inference rules equals the number of groups. In this paper, following the approach in [10], the input data are grouped by the k-means method, which is an efficient clustering method. Certainly, other clustering methods can also be used here. For a comparison of various kinds of clustering methods, please refer to [18].

The second part consists of an ELM and a normalization processing procedure. This part is used to determine the membership of arbitrary input for each rule, which derives the membership function for each rule and corresponds to the identification of the IF parts (the condition parts) of the rule. Generally, not all of the ELM’s outputs are included in the interval [0, 1]. As we know, the membership value should be a data point in the interval [0, 1]. Therefore, here, we use normalization to project the output of ELM into the interval [0, 1].

The third part of the system is the determination of the THEN parts (the conclusion parts). Here, multiple ELMs are used to get the function between the input and the output. The ELM is trained by the learning data and the output value for each rule.

The last part weighs the output of the THEN part by the membership values of the IF parts and computes the final output value.

B. Algorithm of the ETSK Fuzzy Inference System

Suppose that the number of the samples is $N$. Define the input variables as $x_i = (x_{i1}, \ldots, x_{in})$ and the observed value as $t_i$, $i = 1, \ldots, N$. Here, we give the specific steps of ETSK [Steps 1)–5)], ETSKE1, and ETSKE2 [Steps 6)–8)].

Step 1) Divide the input/output data $(x_i, t_i)$ into training data ($TRD$ of $N_t$), validation data ($VAD$ of $N_c$), and predicting data ($PRD$ of $N_p$), where $N = N_t + N_c + N_p$.

Step 2) Partition of the $TRD$. The $TRD$ is clustered by the k-means method. The best number of partitions is decided in view of the distance between the clusters in a clustering dendrogram, i.e., assuming that the best number of partitions is $r$, the sum of all distances from each sample point to the cluster center is smallest when the input data are divided into $r$ groups. The division of $m$-dimensional space into $r$ here means that the number of inference rules is set to be $r$. Denoting each group of the $TRD$ by $R^s$, $s = 1, \ldots, r$, the $TRD$ of $R^s$ is expressed by $(x_i^s, y_i^s)$, where $i = 1, \ldots, (N_t)^s$, and $(N_t)^s$ are the $TRD$ numbers in each $R^s$.

Step 3) This step is the identification of the constitution of each IF part. Here, the ELM and normalization method are used to generate the membership functions under a supervised method. The input vector and target vector can be constituted by the following technique. For the input $x_i$, assume that the target vector is $w_i = [w_{i1}, \ldots, w_{ir}]$; if $x_i \in R^s$, then $w_{is} = 1$, and the other elements of $w_i$ are all 0. After training, the output of the ELM is normalized into the interval [0, 1]. The learning of $NN_{mem}$ is conducted, so that these $w_{is}$ can be inferred from the input $x_i$. Thus, $NN_{mem}$ becomes capable of inferring the degree of attribution $w_{is}$ of each training data item $x_i$ to $R^s$. We define the membership function of the IF part as the inferred value $\tilde{w}_{is}$, which is the output of the learned $NN_{mem}$, i.e.,

$$\mu_A^s(x_i) = \tilde{w}_{is}, \quad i = 1, \ldots, N. \tag{14}$$

Step 4) This step is the identification of each THEN part. Let $ELM_s$ denote the $s$th neural network of the THEN part in Fig. 1. The structure of the THEN part of each inference rule is expressed by the input/output relationship of $ELM_s$. The $TRD$ input $x_{i1}^s, \ldots, x_{im}^s$ and the output value $y_i^s$, $i = 1, \ldots, (N_t)^s$, are assigned to the input and output of $ELM_s$.

Step 5) The final output value $y_i^*$ is derived according to the following equation [10]:

$$y_i^* = \frac{\sum_{s=1}^{r} \mu_A^s(x_i) \cdot u_s(x_i)}{\sum_{s=1}^{r} \mu_A^s(x_i)}, \quad i = 1, \ldots, N. \tag{15}$$

After training, we can directly get the corresponding prediction values when $VAD$ and $PRD$ are the inputs into the system.

Step 6) Repeat Steps 3)–5) $P$ times for the same data, then obtain $P$ actual prediction series $y_{ij}$, $i = 1, 2, \ldots, N$; $j = 1, 2, \ldots, P$.

Step 7) For ETSKE1, we select the set of input weights that gives the smallest validation error in the $P$ trials. Then, the training and prediction errors are computed based on the selected input weights.

Step 8) For ETSKE2, we set a selection ratio ($ratio$) first. Then, select $ratio \times P$ sets of input weights that have the smallest validation errors in the $P$ trials. From (2) and (5), we can see that the outputs of ELM are determined after the input weights are chosen. Then, the actual output value $y^*$ is also determined by (15). For each set of input weights, we compute the actual output value $y^*$ for the training, validation, and prediction sets. Finally, the mean of the actual output values for the selected sets of input weights is regarded as the final output value.
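The bookkeeping in Steps 3) and 5) can be sketched as follows. The code assumes that cluster labels from Step 2) and the per-rule THEN-part outputs are already available; it only shows how the 0/1 target vectors are built, how a series is projected into [0, 1] as in (7), and how the weighted output (15) is formed. The function names are hypothetical, and returning zeros for a constant series is an assumption made here to avoid division by zero.

```python
def one_hot_targets(labels, r):
    """Step 3): target vector w_i has w_is = 1 for the cluster R^s containing x_i."""
    return [[1.0 if s == lab else 0.0 for s in range(r)] for lab in labels]


def minmax_normalize(values):
    """Project a data series into [0, 1] as in (7)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant series maps to all zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def tsk_output(memberships, rule_outputs):
    """Step 5): membership-weighted mean of the THEN-part outputs, as in (15)."""
    denominator = sum(memberships)
    if denominator == 0:
        raise ValueError("all membership values are zero")
    weighted = sum(mu * u for mu, u in zip(memberships, rule_outputs))
    return weighted / denominator
```

For instance, with two rules, memberships 0.75 and 0.25, and rule outputs 2.0 and 4.0, `tsk_output([0.75, 0.25], [2.0, 4.0])` gives 2.5, the membership-weighted mean.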

In order to test the validity and performance of the proposed algorithm, we present in this section the experimental results with artificial and real-world data sets. Specifically, as an example, we give a concise explanation for each step in Fig. 1, combined with the data that are processed in experiment 1. These explanations are not repeated in experiments 2–4 since they are similar to the first experiment. All simulations were conducted in the Matlab environment running on an ordinary personal computer with a dual-core central processing unit (3.20 and 3.19 GHz) and 1-GB memory.

In the following experiments, the activation function of ELM is the sigmoidal function:

$$y = \frac{1}{1 + e^{-x}}. \tag{16}$$

The performance index is the mean squared error $mse$ between the predicted value $y_i$ and the actual value $t_i$, i.e.,

$$mse = \frac{1}{N} \sum_{i=1}^{N} (y_i - t_i)^2. \tag{17}$$

The standard deviation $std$ of the performance index $mse$ is defined as

$$std = \sqrt{\frac{1}{Q} \sum_{i=1}^{Q} (mse_i - \mu)^2} \tag{18}$$

where $mse_i$ is obtained when we run the algorithm for the $i$th time for the same data, and $\mu$ is the mean value of all $mse_i$, $i = 1, 2, \ldots, Q$, i.e.,

$$\mu = \frac{1}{Q} \sum_{i=1}^{Q} mse_i. \tag{19}$$

A. Experimental Results

Example 1: To compare the performance of the proposed algorithm with three ANFIS methods, let us consider the following nonlinear three-input–single-output system:

$$y = \left(1.0 + x_1^{0.5} + x_2^{-1} + x_3^{-1.5}\right)^2, \quad 1 \le x_1, x_2, x_3 \le 5 \tag{20}$$

which has been used in many references [2, p. 276], [10]. In simulations, n-fold cross validation is used to compare the reliability of the proposed and existing methods. Here, the parameter n is set to be 5. The data are divided equally into five sets (S1, S2, . . . , S5). For the data, we repeat five sets of experiments that are denoted as E1, E2, . . . , E5, respectively. The number of hidden neurons for ETSK nhidden, the selection ratio for ETSKE2 sratio, and the parameter “radii” for ANFIS3 pradii are given in Table I. Table II shows the corresponding experimental results. For simplicity, the training, validation, and prediction sets are abbreviated as “trnSet,” “valSet,” and “preSet,” respectively, in the following experiments.

TABLE I
EXPERIMENTAL PARAMETERS IN EXAMPLE 1

TABLE II
COMPARISON OF mse RESULTS FOR THE SIX METHODS IN EXAMPLE 1

As an example, we give a detailed description for the first set of experiments E1. First, set S1 is used as the prediction set, while the remaining samples are used as the training samples. In the training samples, 60% of the whole data is used as the training set, and 20% is used as the validation set. We first normalize the inputs of the training samples into the interval [−1, 1] by (8) and (9). Then, the training set is partitioned into two clusters by using the k-means method. After that, the proposed system is trained according to Steps 3)–5), as shown in Section III-B, while the number of hidden neurons is increased from 1 to 50. As a result, the validation error is the smallest in this example when the number of hidden neurons is 12. Therefore, we set the number of hidden neurons to be 12 and repeat 20 trials for ETSK, in which set S1 is used as the prediction set.

For ETSK, we compute the mean values of the training, validation, and prediction errors in the 20 trials.

For ETSKE1, we select a set of input weights that corresponds to the smallest validation error in the 20 trials. According to the description of Section II-B, we compute the output weights using (5) and the input weights. Then, the actual outputs of ELMs in Fig. 1 are computed using the input and output weights. Finally, the final output of the system in Fig. 1 is obtained by (15). The training and prediction errors are computed according to the actual and expected outputs of the system.
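The performance indices (17)–(19) and the ETSKE1 selection rule can be sketched as below. The sketch assumes that (18) is the population standard deviation (factor 1/Q under a square root) and that each trial has already produced a validation error; the function names are hypothetical.

```python
import math


def mse(predicted, actual):
    """Mean squared error between predictions y_i and targets t_i, as in (17)."""
    return sum((y - t) ** 2 for y, t in zip(predicted, actual)) / len(actual)


def mean_and_std(mse_values):
    """Mean (19) and standard deviation (18) of the mse over Q repeated runs."""
    q = len(mse_values)
    mu = sum(mse_values) / q
    # Assumption: population form, i.e., divide by Q rather than Q - 1.
    std = math.sqrt(sum((m - mu) ** 2 for m in mse_values) / q)
    return mu, std


def best_trial(validation_errors):
    """ETSKE1: index of the trial with the smallest validation error."""
    return min(range(len(validation_errors)), key=lambda j: validation_errors[j])
```

With 20 trials as in Example 1, `best_trial` would receive a list of 20 validation errors and return the index of the set of input weights to keep.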

Fig. 2. Actual prediction series y1(t), y2(t), . . . , y8(t) for the validation set based on the selected eight sets of input weights.

Fig. 3. Final prediction series y(t) for the validation set.

For ETSKE2, we select eight (0.4 × 20) sets of input weights that have the smallest validation errors in the 20 trials. For every set of input weights, similar to ETSKE1, we compute the actual prediction series for the training, validation, and prediction sets. To keep the paper more compact, we only give the results for the validation set. Fig. 2 shows the actual prediction series y1(t), y2(t), . . . , y8(t) for the validation set based on the selected eight sets of input weights. Finally, the mean prediction series y(t) (y(t) = (y1(t) + y2(t) + · · · + y8(t))/8) for the eight sets of actual prediction series y1(t), y2(t), . . . , y8(t) is regarded as the final prediction series. Fig. 3 shows the final prediction series for the validation set. In Figs. 2 and 3, the horizontal axis denotes the code number of the sample and the vertical axis denotes the sample value. The error between the final prediction series and the expected value can be computed by (17).

The results of E2, E3, E4, and E5 are obtained when we set each one of S2, S3, S4, and S5 as the prediction set in sequence and repeat the preceding steps.

For the three ANFIS methods, the experimental data are the same as for the ETSK and its two extensions.

For ANFIS1, the standard functions “genfis1” and “anfis” in Matlab 6.5 are used according to the examples in [5]. The parameter “optMethod” for the function “anfis” is set to be zero. The training epoch number is set to be 1000. To improve the generalization performance, the validation set is used as the “chkData” in the function “anfis.”

For ANFIS2, the parameter “optMethod” for the function “anfis” is set to be 1. The training epoch number is set to be 1000. The validation set is used as the “chkData” in the function “anfis.”

For ANFIS3, the standard function “genfis2” in Matlab 6.5 is used according to the examples in [5]. According to the suggestion of [5], the parameter “radii” is selected from the interval [0.2, 0.5].

For the experimental results in Table II, the mean values µ, standard deviations σ, and coefficients of variation σ/µ are given in Table III.

From Table III, it can be seen that ANFIS2 and ANFIS3 have smaller training, validation, and prediction errors than ANFIS1 and ETSK. ETSKE1 and ETSKE2 have smaller validation and prediction errors than ANFIS2 and ANFIS3 but a higher training error. In addition, ETSKE1 and ETSKE2 have a smaller standard deviation σ than the other methods.

Table IV shows the time that is consumed by the six methods. The symbol µ denotes the mean training time of the five sets of experiments. Note that the training time of ETSKE1 or ETSKE2 is the sum of 20 trials of ETSK. From Table IV, we can see that ETSK has the shortest training time.
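The ETSKE2 averaging described for Example 1 (keep the ratio × P trials with the smallest validation errors, then average their prediction series pointwise) can be sketched as follows. The function name and the rounding of ratio × P to the nearest integer, with a minimum of one, are assumptions made for illustration.

```python
def etske2_average(prediction_series, validation_errors, ratio):
    """Average the prediction series of the best ratio * P trials.

    prediction_series[j] is the series produced by trial j,
    validation_errors[j] is the validation error of trial j.
    """
    p = len(prediction_series)
    # Assumption: round ratio * P to the nearest integer, keep at least one trial.
    k = max(1, round(ratio * p))
    ranked = sorted(range(p), key=lambda j: validation_errors[j])
    chosen = ranked[:k]
    n = len(prediction_series[0])
    mean_series = [sum(prediction_series[j][i] for j in chosen) / k
                   for i in range(n)]
    return chosen, mean_series
```

With P = 20 and a ratio of 0.4, this keeps eight series, matching the eight sets of input weights used in Example 1.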

TABLE III
MEAN VALUES, STANDARD DEVIATIONS, AND COEFFICIENT OF VARIATION FOR THE SIX METHODS IN EXAMPLE 1

TABLE IV
COMPARISON OF TRAINING TIME FOR THE SIX METHODS IN EXAMPLE 1

Example 2: In this example, all the six algorithms are used to approximate the “SinC” function¹

$$y(x) = \begin{cases} \sin(x)/x, & x \neq 0 \\ 1, & x = 0. \end{cases} \tag{21}$$

¹ http://www.ntu.edu.sg/home/egbhuang/.

Here, 10 000 input/output pairs are used as the experimental data. The five-fold cross validation is used to verify the reliability of the six methods. The number of hidden neurons for ETSK nhidden, the selection ratio for ETSKE2 sratio, and the parameter “radii” for ANFIS3 pradii are given in Table V. The training, validation, and prediction errors for the six methods are given in Table VI. For ETSK, all the results that are shown in this paper are the mean of 20 trials. The mean values, standard deviations, and the ratios between the standard deviations and mean values are given in Table VII.

TABLE V
EXPERIMENTAL PARAMETERS IN EXAMPLE 2

TABLE VI
COMPARISON OF mse RESULTS FOR THE SIX METHODS IN EXAMPLE 2

Table VIII shows the training time of the six methods. From Tables V–VII, we can see that the six methods have similar accuracy. However, ETSK and its extensions have an obviously shorter training time than the three ANFIS methods in this experiment.

Example 3: In the following, we will give the simulation results on the rice taste data [17], [19, p. 269]. The data consist of five inputs and a single output whose values are associated with subjective evaluations as follows: $x_1$: flavor, $x_2$: appearance, $x_3$: taste, $x_4$: stickiness, $x_5$: toughness, $y$: overall evaluation. The input/output pairs can be written as

$$((x_{p1}, x_{p2}, x_{p3}, x_{p4}, x_{p5}), y_p), \quad p = 1, 2, \ldots, 105. \tag{22}$$

The first 70 data pairs are used as the training data, while the rest are used as the testing data. The number of hidden neurons for ETSK nhidden, the selection ratio for ETSKE2 sratio, and the parameter “radii” for ANFIS3 pradii are given in Table IX. The training, validation, and prediction errors for the six methods are given in Table X. The mean values µ, standard deviations σ, and coefficients of variation σ/µ are given in Table XI. Table XII shows the training time for the six methods. Tables X–XII show that ETSKE2 is the best method with respect to both training time and errors.
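The SinC target (21) of Example 2 and the five-fold partition used in Examples 1 and 2 can be sketched as follows. The helper names are hypothetical, and dropping leftover samples when the data do not divide evenly is an assumption; the paper only states that the data are divided equally into five sets.

```python
import math


def sinc(x):
    """SinC function of (21): sin(x)/x for x != 0, and 1 at x = 0."""
    return 1.0 if x == 0 else math.sin(x) / x


def five_fold_indices(n_samples, n_folds=5):
    """Split sample indices into equally sized sets S1, ..., S5."""
    fold = n_samples // n_folds  # Assumption: leftover samples are dropped.
    return [list(range(k * fold, (k + 1) * fold)) for k in range(n_folds)]


def experiment_split(sets, k):
    """Experiment E(k+1): set k is the prediction set, the rest are training samples."""
    prediction = sets[k]
    remaining = [i for j, s in enumerate(sets) if j != k for i in s]
    return remaining, prediction
```

Calling `experiment_split(five_fold_indices(10000), 0)` would yield 8000 training-sample indices and 2000 prediction indices for experiment E1; the training samples are then further divided into training and validation sets as described for Example 1.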

1328 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART B: CYBERNETICS, VOL. 37, NO. 5, OCTOBER 2007

MEAN VALUES, STANDARD DEVIATIONS, AND COEFFICIENT OF VARIATION FOR THE SIX METHODS IN EXAMPLE 2

TABLE VIII
COMPARISON OF TRAINING TIME FOR THE SIX METHODS IN EXAMPLE 2

TABLE IX
EXPERIMENTAL PARAMETERS IN EXAMPLE 3

COMPARISON OF mse RESULTS FOR THE SIX METHODS IN EXAMPLE 3

TABLE XI
MEAN VALUES, STANDARD DEVIATIONS, AND COEFFICIENTS OF VARIATION FOR THE SIX METHODS IN EXAMPLE 3

used in this example.2 In total, 4177 samples are used. The number of hidden neurons for ETSK, the selection ratio for ETSKE2, and the parameter “radii” for ANFIS3 are given in Table XIII.

2 http://www.niaad.liacc.up.pt/~ltorgo/Regression/ds_menu.html.

There are eight input variables for the benchmark data. In this case, ANFIS1 and ANFIS2 both fall prey to the curse of dimensionality. During the simulation, for ANFIS1, the training time is far longer than the training times of ANFIS3, ELM, or ELME. We do not have the experimental results of

SUN et al.: NEURO-FUZZY INFERENCE SYSTEM 1329

COMPARISON OF TRAINING TIME FOR THE SIX METHODS IN EXAMPLE 3

TABLE XIII
EXPERIMENTAL PARAMETERS IN EXAMPLE 4

TABLE XIV
COMPARISON OF mse RESULTS FOR THE FIVE METHODS IN EXAMPLE 4

TABLE XV
MEAN VALUES, STANDARD DEVIATIONS, AND COEFFICIENT OF VARIATION FOR THE FIVE METHODS IN EXAMPLE 4

COMPARISON OF TRAINING TIME FOR THE FIVE METHODS IN EXAMPLE 4

TABLE XVII
NUMBER OF INPUTS Dim, NUMBER OF SAMPLES N, AND THE CORRESPONDING MEAN TRAINING TIME ∆t1, . . . , ∆t4 OF THE SIX METHODS IN THE PRECEDING FOUR EXPERIMENTS

TABLE XVIII
TIME RATIO BETWEEN THE FIRST EXPERIMENT (∆t1) AND THE OTHER THREE EXPERIMENTS

ANFIS2 because the algorithm cannot converge. The training,

validation, and prediction errors of the five methods are given

in Table XIV. The mean values µ, standard deviations σ, and

coefficients of variation σ/µ are given in Table XV.
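The summary statistics reported for each method (mean µ, standard deviation σ, and coefficient of variation σ/µ) could be computed along the following lines. The error values below are hypothetical placeholders and not results from the paper, and the use of the population standard deviation (`ddof=0`) is an assumption, since the text does not say which estimator was used.

```python
import numpy as np


def summarize(errors, ddof=0):
    """Return (mean, standard deviation, coefficient of variation)."""
    errors = np.asarray(errors, dtype=float)
    mu = errors.mean()
    # ddof=0 gives the population standard deviation (an assumption here);
    # pass ddof=1 for the sample standard deviation instead.
    sigma = errors.std(ddof=ddof)
    return mu, sigma, sigma / mu


# Hypothetical per-fold mse values for one method, for illustration only.
hypothetical_mse = [0.010, 0.012, 0.011, 0.009, 0.013]
mu, sigma, cv = summarize(hypothetical_mse)
```

A smaller σ/µ indicates that a method's error varies less from run to run relative to its average error, which is what makes the ratio useful for comparing methods whose mean errors differ in scale.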

Table XVI shows the training time for the five methods. It

can be seen from Tables XIII–XVI that ETSKE1 and ETSKE2

have shorter training time and smaller errors than ANFIS1 and

ANFIS3. Since ANFIS1 falls prey to the curse of dimensional-

ity, the corresponding training time is increased dramatically.

From the preceding experimental results, we can find that

ETSKE1 and ETSKE2 generally have a relatively shorter train-

ing time and smaller errors than the three ANFIS methods. To

analyze the performance of the six methods further, Table XVII

shows the number of inputs Dim, the number of samples N ,

and the corresponding mean training time ∆t1, . . . , ∆t4 of

six methods in the preceding four experiments. The time ratio

between the first experiment (∆t1) and other experiments is

given in Table XVIII.
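One way to read the time ratio is as each later experiment's mean training time divided by that of the first experiment. The sketch below assumes this direction, ∆tk/∆t1 for k = 2, 3, 4 (the text does not state which way the ratio is taken), and uses hypothetical timings and a hypothetical method name rather than the measured values.

```python
def time_ratios(mean_times):
    """Ratios of the later experiments' mean training times to the first.

    mean_times holds [dt1, dt2, dt3, dt4] for one method. The direction
    dt_k / dt_1 is an assumption made for this sketch.
    """
    dt1 = mean_times[0]
    return [dt / dt1 for dt in mean_times[1:]]


# Hypothetical mean training times in seconds, for illustration only.
hypothetical_times = {"method A": [0.5, 2.0, 1.0, 4.0]}
ratios = {name: time_ratios(t) for name, t in hypothetical_times.items()}
```

Under this reading, a method whose ratios stay close to 1 scales gently with the numbers of inputs and samples, while rapidly growing ratios indicate a training time that is sensitive to problem size.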


From Tables XVI–XVIII, we can see that the training times of ANFIS1 and ANFIS2 increase quickly when the numbers of inputs and samples increase. When the number of inputs is more than five, ANFIS1 and ANFIS2 fall prey to the curse of dimensionality. We cannot obtain the experimental results of ANFIS2 on our computer. The training time of ANFIS3 also increases when the numbers of inputs and samples increase. It seems that ANFIS3 is more sensitive to the number of samples than to the number of inputs. The training times of ETSK, ETSKE1, and ETSKE2 increase slowly with the increasing numbers of inputs and samples. The training times of ETSKE1 and ETSKE2 are shorter than those of the three ANFIS methods in the last three experiments.

V. CONCLUSION

The TSK model is one of the most influential neuro-fuzzy reasoning models. In this paper, we investigate the feasibility of applying a relatively novel neural network technique, ELM, to realize a neuro-fuzzy TSK fuzzy inference system. For the ETSK fuzzy inference system, two extensions are proposed to improve its accuracy. The proposed methods can avoid the curse of dimensionality that is encountered in backpropagation and hybrid ANFIS methods. Moreover, when the numbers of inputs and samples are relatively large, the proposed extensions of ETSK usually have higher accuracy and shorter training time compared to the three ANFIS methods. The advantage of the ANFIS methods is that they usually can obtain a stable prediction value after the training has converged, while the ETSK methods cannot. In general, the proposed methods have a competitive performance in training time and accuracy compared to the three ANFIS methods.

ACKNOWLEDGMENT

The authors would like to thank the Editor-in-Chief, the Associate Editor, and the anonymous reviewers who have given many helpful comments and suggestions. They would also like to thank Dr. Y. Yu, Dr. C. Hung Chiu, and L. Chow for their valuable suggestions and inputs.

REFERENCES

[1] S. Cong, Neural Network, and Their Application in the Moving Control. Hefei, China: Univ. Sci. and Technol. China Press, 2001.

[2] S.-T. Wang, Fuzzy System, Fuzzy Neural Network and Design of Application Program. Shanghai, China: Shanghai Sci. & Tech. Publishers, 1998.

[3] Z.-L. Sun, D.-S. Huang, C.-H. Zheng, and L. Shang, “Optimal selection of time lags for temporal blind source separation based on genetic algorithm,” Neurocomputing, vol. 69, no. 7–9, pp. 884–887, Mar. 2006.

[4] H. Demuth and M. Beale, Neural Network Toolbox for Use With MATLAB, Users Guide ver. 4.0. Natick, MA: The Mathworks, Inc., 1998.

[5] J. S. R. Jang and N. Gulley, Fuzzy Logic Toolbox for Use With MATLAB. Natick, MA: The Mathworks, Inc., 2006.

[6] L. A. Zadeh, “Fuzzy sets,” Inf. Control, vol. 8, no. 3, pp. 338–352, 1965.

[7] A. P. Paplinski, Neuro-fuzzy computing. [Online]. Available: http://www.csse.monash.edu.au/courseware/cse5301/04/index.html

[8] S. Mitra and Y. Hayashi, “Neuro-fuzzy rule generation: Survey in soft computing framework,” IEEE Trans. Neural Netw., vol. 11, no. 3, pp. 748–767, May 2000.

[9] S. G. Tzafestas and K. C. Zikidis, “NeuroFAST: On-line neuro-fuzzy ART-based structure and parameter learning TSK model,” IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 31, no. 5, pp. 797–802, Oct. 2001.

[10] T. Takagi and I. Hayashi, “NN-driven fuzzy reasoning,” Int. J. Approx. Reason., vol. 5, no. 3, pp. 191–212, May 1991.

[11] M. Sugeno and G. T. Kang, “Structure identification of fuzzy model,” Fuzzy Sets Syst., vol. 28, no. 1, pp. 15–33, Oct. 1988.

[12] M. Sugeno and K. Tanaka, “Successive identification of a fuzzy model and its applications to prediction of a complex system,” Fuzzy Sets Syst., vol. 42, no. 3, pp. 315–334, Aug. 1991.

[13] J.-S. R. Jang, “ANFIS: Adaptive-network-based fuzzy inference systems,” IEEE Trans. Syst., Man, Cybern., vol. 23, no. 3, pp. 665–685, May 1993.

[14] Q.-Y. Zhu, A. K. Qin, P. N. Suganthan, and G.-B. Huang, “Evolutionary extreme learning machine,” Pattern Recognit., vol. 38, no. 10, pp. 1759–1763, Oct. 2005.

[15] G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, “Extreme learning machine: A new learning scheme of feedforward neural networks,” in Proc. IJCNN, Budapest, Hungary, Jul. 25–29, 2004, vol. 2, pp. 985–990.

[16] R. Storn and K. Price, “Differential evolution—A simple and efficient heuristic for global optimization over continuous spaces,” J. Global Optim., vol. 11, no. 4, pp. 341–359, Dec. 1997.

[17] H. Ishibuchi, K. Nozaki, H. Tanaka, Y. Hosaka, and M. Matsuda, “Empirical study on learning in fuzzy systems by rice taste analysis,” Fuzzy Sets Syst., vol. 64, no. 2, pp. 129–144, Jun. 1994.

[18] Z.-Q. Bian, X.-G. Zhang et al., Pattern Recognition. Beijing, China: Tsinghua Univ. Press.

[19] K. Nozaki, H. Ishibuchi, and H. Tanaka, “A simple but powerful heuristic method for generating fuzzy rules from numerical data,” Fuzzy Sets Syst., vol. 86, no. 3, pp. 251–270, Mar. 1997.

[20] S. K. Pal and S. Mitra, “Multi-layer perceptron, fuzzy sets and classification,” IEEE Trans. Neural Netw., vol. 3, no. 5, pp. 683–697, Sep. 1992.

[21] S. K. Pal and S. Mitra, Neuro-Fuzzy Pattern Recognition: Methods in Soft Computing. New York: Wiley, 1999.

[22] S. Mitra and S. K. Pal, “Fuzzy self organization, inferencing and rule generation,” IEEE Trans. Syst., Man, Cybern. A, Syst., Humans, vol. 26, no. 5, pp. 608–620, Sep. 1996.

[23] J. S. R. Jang and E. Mizutani, “Levenberg–Marquardt method for ANFIS learning,” in Proc. Biennial Conf. NAFIPS, Berkeley, CA, Jun. 19–22, 1996, pp. 87–91.

[24] G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, “Extreme learning machine: Theory and applications,” Neurocomputing, vol. 70, no. 1–3, pp. 489–501, Dec. 2006.

[25] G.-B. Huang and H. A. Babri, “Universal approximation using incremental networks with random hidden computation nodes,” IEEE Trans. Neural Netw., vol. 17, no. 4, pp. 879–892, Jul. 2006.

Zhan-Li Sun received the B.Sc. degree from Huainan Industrial University, Huainan, China, in 1997, the M.Sc. degree from the Hefei University of Technology, Hefei, China, in 2003, and the Ph.D. degree from the University of Science and Technology of China, Hefei, in 2005. From March 2006 to March 2007, he was a Research Associate in the Institute of Textiles and Clothing, Hong Kong Polytechnic University, Kowloon, Hong Kong. He is currently with the School of Computer Engineering, Nanyang Technological University, Singapore. His current research interests include machine learning and signal and image processing.

Kin-Fan Au received the M.Sc. (Eng.) degree in industrial engineering from the University of Hong Kong, Hong Kong, and the Ph.D. degree from the Hong Kong Polytechnic University, Kowloon, Hong Kong. He is currently an Associate Professor with the Institute of Textiles and Clothing, Hong Kong Polytechnic University. He has published many papers in textiles and related journals on topics of world trading, offshore production, and modeling of textiles and apparel trade. His research interests include the business aspects of fashion and textiles, particularly global trading of fashion and textile products.

Tsan-Ming Choi (S’00–M’01) received the Ph.D. degree in supply chain management from the Chinese University of Hong Kong, Hong Kong. He is currently an Assistant Professor with the Institute of Textiles and Clothing, Hong Kong Polytechnic University, Kowloon, Hong Kong. Over the past few years, he has actively participated in a variety of research projects in supply chain management. He has published in journals such as Computers and Operations Research, European Journal of Operational Research, IEEE TRANSACTIONS, International Journal of Production Economics, Journal of Industrial and Management Optimization, Journal of the Operational Research Society, and Omega. His main research interest is supply chain management. Prof. Choi is a member of the Institute for Operations Research and the Management Sciences.
