INTERNATIONAL JOURNAL OF WISDOM BASED COMPUTING, VOL. 1(3), DECEMBER 2011
J. Siva Prakash
Daffodills India Technologies, 211, TVS Nagar, Edayarpalayam, Coimbatore 25, India
email: siva5200@gmail.com

R. Rajesh
Department of Computer Applications, Bharathiar University, Coimbatore 641046, India
email: kollamrajeshr@ieee.org
Abstract—Recently, the Extreme Learning Machine (ELM) has been proposed, which significantly reduces the amount of time needed to train a Neural Network. The performance of ELM depends on the random input weights assigned. This paper reviews ELM and introduces the Random Iterative ELM (RIELM), which iteratively assigns random weights and hence finds the optimal weights for ELM. The performance on the classification example of the iris data set shows the effectiveness of RIELM. Finally, RIELM is used to effectively classify electronic nose data.

Index Terms—Single Layer Neural Network, Extreme Learning Machine, Classification, Electronic Nose Data
I. INTRODUCTION
Neural Networks have been extensively used in many fields due to their ability to approximate complex nonlinear mappings directly from the input samples, and to provide models for a large class of natural and artificial phenomena that are difficult to handle using classical parametric techniques. There are many algorithms for training Neural Networks, such as Back Propagation [18], Support Vector Machines (SVM) [4], [10], Hidden Markov Models (HMM) [14], etc. One of the disadvantages of Neural Networks is the learning time.

Recently, Huang et al. [6], [17] proposed a new learning algorithm for the Single Layer Feedforward Neural Network architecture called the Extreme Learning Machine (ELM), which overcomes the problems caused by gradient-descent-based algorithms such as Back Propagation [18] applied in ANNs [9], [10]. ELM can significantly reduce the time needed to train a Neural Network, and a number of papers based on the ELM algorithm have appeared in the literature [11], [16], [19].

This paper deals with the Random Iterative ELM, and its performance is compared with other methods. The paper is organized as follows. Section 2 reviews ELM. Section 3 presents RIELM. Section 4 deals with the classification of an electronic nose data set, and Section 5 concludes the paper.

II. EXTREME LEARNING MACHINE: A REVIEW

The Extreme Learning Machine proposed by Huang et al. [6], [7] uses the Single Layer Feedforward Neural Network (SLFN) architecture [1]. It randomly chooses the input weights and analytically determines the output weights of the SLFN. It has much better generalization performance with much faster learning speed. It requires less human intervention and can run thousands of times faster than conventional methods. It determines all the network parameters analytically, which avoids trivial human intervention and makes it efficient in online and real-time applications.
A. A Note on Single Hidden Layer Feedforward Neural Networks

The mathematical description of a Single Hidden Layer Feedforward Neural Network (SLFN) with L hidden nodes [8], [12], incorporating both additive and RBF hidden nodes in a unified way, is given as

f_L(x) = \sum_{i=1}^{L} \beta_i G(a_i, b_i, x),   x \in R^n,  a_i \in R^n,

where a_i and b_i are the learning parameters of the hidden nodes and \beta_i is the weight connecting the ith hidden node to the output node.
G(a_i, b_i, x) is the output of the ith hidden node with respect to the input x. For an additive hidden node with activation function g(x): R -> R (e.g., sigmoid and threshold), G(a_i, b_i, x) is given by

G(a_i, b_i, x) = g(a_i \cdot x + b_i),   b_i \in R,

where a_i is the weight vector connecting the input layer to the ith hidden node and b_i is the bias of the ith hidden node. a_i \cdot x denotes the inner product of the vectors a_i and x in R^n. For an RBF hidden node with activation function g(x): R -> R (e.g., Gaussian), G(a_i, b_i, x) is given by

G(a_i, b_i, x) = g(b_i \| x - a_i \|),   b_i \in R^+,

where a_i and b_i are the center and impact factor of the ith RBF node, and R^+ indicates the set of all positive real values. The RBF network is a special case of an SLFN with RBF nodes in its hidden layer.

Consider N arbitrary distinct samples (x_j, t_j) \in R^n \times R^m, where x_j is an n x 1 input vector and t_j is an m x 1 target vector. If an SLFN with L hidden nodes can approximate these N samples with zero error, then there exist \beta_i, a_i and b_i such that

f_L(x_j) = \sum_{i=1}^{L} \beta_i G(a_i, b_i, x_j) = t_j,   j = 1, ..., N.   (1)

Equation (1) can be written compactly as

H \beta = T,   (2)

where

H = [ G(a_1, b_1, x_1)  ...  G(a_L, b_L, x_1)
      ...
      G(a_1, b_1, x_N)  ...  G(a_L, b_L, x_N) ]_{N x L},   (3)

\beta = [ \beta_1^T ; ... ; \beta_L^T ]_{L x m}   (4)

and

T = [ t_1^T ; ... ; t_N^T ]_{N x m}.   (5)

H is the hidden layer output matrix of the SLFN, with the ith column of H being the ith hidden node's output with respect to the inputs x_1, x_2, ..., x_N.

B. Principles of ELM

ELM [6], [7], designed as an SLFN with L hidden neurons, can learn L distinct samples with zero error. Even if the number of hidden neurons L is much smaller than the number of distinct samples N, ELM can still assign random parameters to the hidden nodes and calculate the output weights using the pseudoinverse of H, giving only a small error \epsilon. The hidden node parameters a_i and b_i of ELM (input weights and biases, or centers and impact factors) need not be tuned during training and may simply be assigned random values. The following theorems state the same.

Theorem 1 (Liang et al. [12]): Let an SLFN with N additive or RBF hidden nodes and an activation function g(x) which is infinitely differentiable in any interval of R be given. Then, for N arbitrary distinct input vectors {x_j | x_j \in R^n, j = 1, ..., N} and hidden node parameters {(a_i, b_i)}_{i=1}^{N} randomly generated with any continuous probability distribution, the hidden layer output matrix H of the SLFN is invertible with probability one.

Theorem 2 (Liang et al. [12]): Given any small positive value \epsilon > 0 and an activation function g(x) which is infinitely differentiable in any interval, there exists L <= N such that for N arbitrary distinct input vectors {x_j | x_j \in R^n, j = 1, ..., N} and any {(a_i, b_i)}_{i=1}^{L} randomly generated according to any continuous probability distribution, \| H_{N x L} \beta_{L x m} - T_{N x m} \| < \epsilon with probability one.

Since the hidden node parameters of ELM need not be tuned during training and since they are simply assigned random values, eqn (2) becomes a linear system and the output weights can be estimated as

\hat{\beta} = H^{\dagger} T,   (6)

where H^{\dagger} is the Moore-Penrose generalized inverse of the hidden layer output matrix H, and can be calculated using several methods, including the orthogonal projection method, the orthogonalization method, the iterative method, singular value decomposition (SVD), etc. The orthogonal projection method can be used only when H^T H is nonsingular, in which case H^{\dagger} = (H^T H)^{-1} H^T. Due to the use of searching and iterations, the orthogonalization method and the iterative method have limitations. Implementations of ELM use SVD to calculate the Moore-Penrose generalized inverse of H, since it can be used in all situations.
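The ELM training procedure described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the data is synthetic, and the sigmoid activation and network sizes are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: N samples, n inputs, m outputs (arbitrary sizes).
N, n, m, L = 100, 4, 3, 20
X = rng.standard_normal((N, n))
T = rng.standard_normal((N, m))

# Randomly assign input weights a_i and biases b_i; they are never tuned.
A = rng.standard_normal((n, L))
b = rng.standard_normal(L)

# Hidden layer output matrix H (eq. 3), using sigmoid additive nodes.
H = 1.0 / (1.0 + np.exp(-(X @ A + b)))   # shape (N, L)

# Output weights via the Moore-Penrose pseudoinverse (eq. 6);
# np.linalg.pinv computes it through SVD, as the text recommends.
beta = np.linalg.pinv(H) @ T             # shape (L, m)

Y = H @ beta                             # network outputs
print("training error:", np.linalg.norm(Y - T))
```

Because the only trained quantity is beta, obtained from one linear solve, the whole "training" is a single matrix computation, which is where ELM's speed comes from.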
III. RANDOM ITERATIVE EXTREME LEARNING MACHINE

The working of the Random Iterative Extreme Learning Machine is shown below.
1) Initialize j = 1 and the maximum performance P_max = 0.
2) Randomly assign the input weights a_i and the biases b_i.
3) Calculate the hidden layer output matrix H, with the ith column of H being the ith hidden node's output with respect to the inputs x_1, ..., x_N, as given by eq. (3).
4) Calculate the output weight matrix \hat{\beta} = H^{\dagger} T, where H^{\dagger} is the Moore-Penrose generalized inverse of the hidden layer output matrix H, calculated using singular value decomposition (SVD).
5) Evaluate the performance P_j of the resulting ELM.
6) If P_j > P_max, save a_i, b_i, H and \hat{\beta}, and set P_max = P_j.
7) Repeat steps 2 to 6 iteratively for j = 1, 2, ..., n, where n is the predefined maximum number of iterations.
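The iterative procedure above can be sketched as follows. This is an illustrative NumPy sketch under assumed details (synthetic classification data, sigmoid additive nodes, and classification accuracy as the performance measure P); it is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy 3-class data with 4 features (iris-like dimensions, synthetic values).
N, n, m, L, n_iter = 120, 4, 3, 28, 100
X = rng.standard_normal((N, n))
labels = rng.integers(0, m, N)
T = np.eye(m)[labels]                      # one-hot targets

best = {"perf": 0.0}                       # step 1: P_max = 0
for j in range(n_iter):                    # step 7: iterate steps 2-6
    # Step 2: randomly assign input weights and biases.
    A = rng.standard_normal((n, L))
    b = rng.standard_normal(L)
    # Step 3: hidden layer output matrix H.
    H = 1.0 / (1.0 + np.exp(-(X @ A + b)))
    # Step 4: output weights via the SVD-based pseudoinverse.
    beta = np.linalg.pinv(H) @ T
    # Step 5: evaluate performance (training accuracy here).
    perf = np.mean(np.argmax(H @ beta, axis=1) == labels)
    # Step 6: keep the best random draw so far.
    if perf > best["perf"]:
        best = {"perf": perf, "A": A, "b": b, "beta": beta}

print("best training accuracy:", best["perf"])
```

Each iteration is one complete ELM training run, so RIELM costs n times a single ELM fit; the loop simply keeps the random draw of hidden node parameters that performed best.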
In order to demonstrate RIELM, the classification of Fisher's Iris data set is shown here. The data set [2] consists of 50 samples from each of three species of Iris flowers (Iris setosa, Iris virginica and Iris versicolor), with four features (sepal length, sepal width, petal length and petal width). RIELM with 28 hidden nodes is able to learn the data and achieve a testing accuracy of 98.67%. The performance comparison of the RIELM algorithm with other algorithms is shown in Table I.
IV. CLASSIFICATION OF ELECTRONIC NOSE DATA SET USING RIELM

Commercial coffees are blends which contain coffees of various origins, and their analysis and control are of great importance. Characterization of coffee is usually done using the chemical profile of one of its fractions, such as the headspace of green or roasted beans or the phenolic fraction. The relative abundance of the 700 diverse molecules identified so far in the headspace depends on the type, provenance and manufacturing of the coffee. None of these molecules can alone be identified as a marker; instead, one has to consider the whole spectrum, as given, for instance, by the gas chromatographic profile. Electronic noses capture this profile, and the related software helps to quantify the concentrations of the constituents of simple gaseous mixtures and to analyze gaseous mixtures for discriminating between different (but similar) mixtures. In this section, coffee data obtained from the Pico Electronic Nose is considered for classification. A blends group of 7 coffees was analyzed using the RIELM classification algorithm. The blends group has 5 features extracted from a sensor response curve. In our simulations, RIELM with 33 hidden nodes is able to achieve a testing accuracy of 90.86%. The performance comparison of the RIELM algorithm with PCA+MLP is shown in Table II.
V. CONCLUSION

This paper has reviewed the Extreme Learning Machine and has shown that it outperforms back-propagation neural networks. It has also introduced the Random Iterative ELM (RIELM), which is used to classify electronic nose data, and the results show the effectiveness of RIELM over other methods.
ACKNOWLEDGEMENT
This research is supported by the University Grants Commission, India, through a major research project grant (UGC F. No. 3362/2007 (SR) dated 28th Feb 2008). The authors are also thankful to all the staff of the SCSE and Bharathiar University for their valuable help.
REFERENCES
[1] A. J. Annema, K. Hoen and H. Wallinga, "Precision requirements for single-layer feedforward neural networks", Fourth International Conference on Microelectronics for Neural Networks and Fuzzy Systems, pp. 145-151, 1994.
[2] James C. Bezdek, James M. Keller, Raghu Krishnapuram, Ludmila I. Kuncheva and Nikhil R. Pal, "Will the Real Iris Data Please Stand Up?", IEEE Transactions on Fuzzy Systems, Vol. 7, No. 3, June 1999.
[3] Shyi-Ming Chen and Yao-De Fang, "A New Approach for Handling the Iris Data Classification Problem", International Journal of Applied Science and Engineering, Vol. 3, No. 1, pp. 37-49, 2005.
[4] Corinna Cortes and Vladimir Vapnik, "Support-Vector Networks", Machine Learning, Vol. 20, pp. 273-297, 1995.
[5] S. Dehuri and S.-B. Cho, "Multi-criterion Pareto based particle swarm optimized polynomial neural network for classification: A Review and State-of-the-Art", Computer Science Review, pp. 19-40, 2009.
[6] Guang-Bin Huang, Qin-Yu Zhu and Chee-Kheong Siew, "Extreme Learning Machine: A New Learning Scheme of Feedforward Neural Networks", International Joint Conference on Neural Networks, Vol. 2, pp. 985-990, 2004.
[7] Guang-Bin Huang, Qin-Yu Zhu and Chee-Kheong Siew, "Extreme Learning Machine: Theory and Applications", Neurocomputing, Vol. 70, pp. 489-501, 2006.
[8] Guang-Bin Huang, Lei Chen and Chee-Kheong Siew, "Universal Approximation Using Incremental Constructive Feedforward Networks with Random Hidden Nodes", IEEE Transactions on Neural Networks, Vol. 17, No. 4, pp. 879-892, 2006.
[9] N. B. Karayiannis and A. N. Venetsanopoulos, "Artificial Neural Networks: learning algorithms, performance evaluation, and applications", Kluwer Academic, Boston, MA, 1993.
[10] Derong Liu, Huaguang Zhang and Sanqing Hu (Guest Editors), "Neural networks: Algorithms and applications", Neurocomputing, Vol. 71, pp. 471-473, 2008.
[11] Nan-Ying Liang, Paramasivan Saratchandran, Guang-Bin Huang and Narasimhan Sundararajan, "Classification of mental tasks from EEG signals using Extreme Learning Machine", International Journal of Neural Systems, Vol. 16, No. 1, pp. 29-38, 2006.
TABLE I
CLASSIFICATION ACCURACY ON THE IRIS DATA SET

Algorithm                        | Accuracy %                     | Learning time
Chen and Fang method (2005) [3]  | 97.33                          | 1 min
ANN (2008) [15]                  | 94.87                          | 1 min
ELM                              | 93.68 (averaged over 100 runs) | 3.2 sec
RIELM (no. of iterations = 100)  | 98.67                          | 3.2 sec
TABLE II 
[12] Nan-Ying Liang, Guang-Bin Huang, Hai-Jun Rong, P. Saratchandran and N. Sundararajan, "A Fast and Accurate Online Sequential Learning Algorithm for Feedforward Networks", IEEE Transactions on Neural Networks, Vol. 17, No. 6, pp. 1411-1423, 2006.
[13] M. Pardo and G. Sberveglieri, "Coffee analysis with an Electronic Nose", IEEE Transactions on Instrumentation and Measurement, Vol. 51, No. 6, December 2002.
[14] Lawrence R. Rabiner, "A tutorial on Hidden Markov Models and selected applications in speech recognition", Proceedings of the IEEE, pp. 257-286, 1989.
[15] N. P. Suraweera and D. N. Ranasinghe, "Adaptive Structural Optimisation of Neural Networks", The International Journal on Advances in ICT for Emerging Regions, pp. 33-41, 2008.
[16] S. Suresh, R. Venkatesh Babu and H. J. Kim, "No-reference image quality assessment using modified extreme learning machine classifier", Applied Soft Computing, Vol. 9, pp. 541-552, 2009.
[17] Dianhui Wang and Guang-Bin Huang, "Protein Sequence Classification Using Extreme Learning Machine", Proceedings of the International Joint Conference on Neural Networks, Vol. 3, pp. 1406-1411, 2005.
[18] Paul John Werbos, "The Roots of Backpropagation: From Ordered Derivatives to Neural Networks and Political Forecasting", Wiley-Interscience, 1994.
[19] Chee-Wee Thomas Yeu, Meng-Hiot Lim and Guang-Bin Huang, "A New Machine Learning Paradigm for Terrain Reconstruction", IEEE Geoscience and Remote Sensing Letters, Vol. 3, No. 3, 2006.