
INTERNATIONAL JOURNAL OF WISDOM BASED COMPUTING, VOL. 1(3), DECEMBER 2011


Random Iterative Extreme Learning Machine for Classification of Electronic Nose Data

J. Siva Prakash
Daffodills India Technologies, 211, TVS Nagar, Edayarpalayam, Coimbatore - 25
email: siva5200@gmail.com

R. Rajesh
Department of Computer Applications, Bharathiar University, Coimbatore - 641046, India
email: kollamrajeshr@ieee.org

Abstract—Recently, the Extreme Learning Machine (ELM) has been proposed, which significantly reduces the amount of time needed to train a Neural Network. The performance of ELM depends on the random input weights assigned. This paper reviews ELM and introduces the Random Iterative ELM (RI-ELM), which iteratively assigns random weights and retains the best-performing set, thereby finding good weights for ELM. Performance on the classification of the Iris data set shows the effectiveness of RI-ELM. Finally, RI-ELM is used to effectively classify electronic nose data.

Index Terms—Single Layer Neural Network, Extreme Learning Machine, Classification, Electronic Nose Data

I. INTRODUCTION

Neural Networks have been extensively used in many fields due to their ability to approximate complex nonlinear mappings directly from input samples and to provide models for a large class of natural and artificial phenomena that are difficult to handle using classical parametric techniques. There are many algorithms for training Neural Networks, such as Back Propagation [18], Support Vector Machines (SVM) [4], [10] and Hidden Markov Models (HMM) [14]. One of the disadvantages of Neural Networks is the learning time.

Recently, Huang et al. [6], [17] proposed a new learning algorithm for the Single Layer Feedforward Neural Network architecture, called the Extreme Learning Machine (ELM), which overcomes the problems caused by gradient-descent-based algorithms such as Back Propagation [18] applied in ANNs [9], [10]. ELM can significantly reduce the time needed to train a Neural Network, and a number of papers based on the ELM algorithm have appeared in the literature [11], [16], [19].

This paper deals with the Random Iterative ELM, whose performance is compared with that of other methods. The paper is organized as follows. Section II reviews ELM, Section III presents RI-ELM, Section IV deals with the classification of an electronic nose data set, and Section V concludes the paper.

II. EXTREME LEARNING MACHINE - A REVIEW

The Extreme Learning Machine, proposed by Huang et al. [6], [7], uses the Single Layer Feedforward Neural Network (SLFN) architecture [1]. It randomly chooses the input weights and analytically determines the output weights of the SLFN, achieving much better generalization performance with much faster learning speed: it requires less human intervention and can run thousands of times faster than conventional methods. Because all the network parameters are determined analytically, ELM avoids trivial human intervention and is efficient in online and real-time applications.

A. A Note on Single Hidden Layer Feedforward Neural Network

The output of a Single Hidden Layer Feedforward Neural Network (SLFN) with $L$ hidden nodes [8], [12], incorporating both additive and RBF hidden nodes in a unified way, is given as

$$f_L(x) = \sum_{i=1}^{L} \beta_i G(a_i, b_i, x), \quad x \in \mathbb{R}^n, \; a_i \in \mathbb{R}^n,$$

where $a_i$ and $b_i$ are the learning parameters of the hidden nodes and $\beta_i$ is the weight connecting the $i$th hidden node to the output node. $G(a_i, b_i, x)$ is the output of the $i$th hidden node with respect to the input $x$.

For an additive hidden node with activation function $g(x): \mathbb{R} \to \mathbb{R}$ (e.g., sigmoid and threshold), $G$ is given by

$$G(a_i, b_i, x) = g(a_i \cdot x + b_i), \quad b_i \in \mathbb{R},$$

where $a_i$ is the weight vector connecting the input layer to the $i$th hidden node and $b_i$ is the bias of the $i$th hidden node; $a_i \cdot x$ denotes the inner product of the vectors $a_i$ and $x$ in $\mathbb{R}^n$.

For an RBF hidden node with activation function $g(x): \mathbb{R} \to \mathbb{R}$ (e.g., Gaussian), $G$ is given by

$$G(a_i, b_i, x) = g(b_i \, \| x - a_i \|), \quad b_i \in \mathbb{R}^+,$$

where $a_i$ and $b_i$ are the center and impact factor of the $i$th RBF node, and $\mathbb{R}^+$ indicates the set of all positive real values. The RBF network is a special case of an SLFN with RBF nodes in its hidden layer.

Consider $N$ arbitrary distinct samples $(x_j, t_j) \in \mathbb{R}^n \times \mathbb{R}^m$, where $x_j$ is an $n \times 1$ input vector and $t_j$ is an $m \times 1$ target vector. If an SLFN with $L$ hidden nodes can approximate these $N$ samples with zero error, then there exist $\beta_i$, $a_i$ and $b_i$ such that

$$f_L(x_j) = \sum_{i=1}^{L} \beta_i G(a_i, b_i, x_j) = t_j, \quad j = 1, \ldots, N. \qquad (1)$$

Equation (1) can be written compactly as

$$H \beta = T, \qquad (2)$$

where

$$H = \begin{bmatrix} G(a_1, b_1, x_1) & \cdots & G(a_L, b_L, x_1) \\ \vdots & & \vdots \\ G(a_1, b_1, x_N) & \cdots & G(a_L, b_L, x_N) \end{bmatrix}_{N \times L}, \qquad (3)$$

$$\beta = \begin{bmatrix} \beta_1^T \\ \vdots \\ \beta_L^T \end{bmatrix}_{L \times m} \qquad (4)$$

$$T = \begin{bmatrix} t_1^T \\ \vdots \\ t_N^T \end{bmatrix}_{N \times m}, \qquad (5)$$

with $a_i \in \mathbb{R}^n$ and $b_i \in \mathbb{R}$. $H$ is the hidden layer output matrix of the SLFN, with the $i$th column of $H$ being the $i$th hidden node's output with respect to the inputs $x_1, x_2, \ldots, x_N$.

B. Principles of ELM

ELM [6], [7], designed as an SLFN with $L$ hidden neurons, can learn $L$ distinct samples with zero error. Even if the number of hidden neurons $L$ is much smaller than the number of distinct samples $N$, ELM can still assign random parameters to the hidden nodes and calculate the output weights using the pseudoinverse of $H$, giving only a small error $\epsilon$. The hidden node parameters $a_i$ and $b_i$ of ELM (input weights and biases, or centers and impact factors) need not be tuned during training and may simply be assigned random values. The following theorems state the same.

Theorem 1: (Liang et al. [12]) Let an SLFN with $L$ additive or RBF hidden nodes and an activation function $g(x)$ which is infinitely differentiable in any interval of $\mathbb{R}$ be given. Then, for $N$ arbitrary distinct input vectors $\{x_j \mid x_j \in \mathbb{R}^n, j = 1, \ldots, N\}$ and $\{(a_i, b_i)\}_{i=1}^{L}$ randomly generated with any continuous probability distribution, the hidden layer output matrix $H$ of the SLFN is invertible with probability one and $\|H\beta - T\| = 0$.

Theorem 2: (Liang et al. [12]) Given any small positive value $\epsilon > 0$ and an activation function $g(x)$ which is infinitely differentiable in any interval, there exists $L \le N$ such that, for $N$ arbitrary distinct input vectors $\{x_j \mid x_j \in \mathbb{R}^n, j = 1, \ldots, N\}$ and any $\{(a_i, b_i)\}_{i=1}^{L}$ randomly generated according to any continuous probability distribution, $\|H_{N \times L}\beta_{L \times m} - T_{N \times m}\| < \epsilon$ with probability one.

Since the hidden node parameters of ELM need not be tuned during training and are simply assigned random values, eqn (2) becomes a linear system and the output weights can be estimated as

$$\hat{\beta} = H^{\dagger} T, \qquad (6)$$

where $H^{\dagger}$ is the Moore-Penrose generalized inverse of the hidden layer output matrix $H$. $H^{\dagger}$ can be calculated using several methods, including the orthogonal projection method, the orthogonalization method, the iterative method and the singular value decomposition (SVD). The orthogonal projection method can be used only when $H^T H$ is nonsingular, in which case $H^{\dagger} = (H^T H)^{-1} H^T$. Due to their use of searching and iterations, the orthogonalization and iterative methods have limitations. Implementations of ELM use the SVD to calculate the Moore-Penrose generalized inverse of $H$, since it can be used in all situations.
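To make the review concrete, the following is a minimal sketch of ELM training and prediction in Python with NumPy, assuming sigmoid additive hidden nodes and one-hot target rows in T. The helper names elm_train and elm_predict and the uniform [-1, 1] initialization are illustrative choices, not specified in the paper; np.linalg.pinv computes the Moore-Penrose inverse via SVD, matching eqn (6).

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def elm_train(X, T, L, rng):
        """Train an ELM with L sigmoid additive hidden nodes.

        X: (N, n) input matrix; T: (N, m) target matrix (e.g., one-hot labels).
        Returns the random hidden parameters (a, b) and the output weights beta.
        """
        n = X.shape[1]
        a = rng.uniform(-1.0, 1.0, size=(n, L))   # random input weights a_i
        b = rng.uniform(-1.0, 1.0, size=L)        # random biases b_i
        H = sigmoid(X @ a + b)                    # hidden layer output matrix, eqn (3)
        beta = np.linalg.pinv(H) @ T              # beta = pinv(H) T, eqn (6); pinv uses SVD
        return a, b, beta

    def elm_predict(X, a, b, beta):
        # f_L(x) = sum_i beta_i G(a_i, b_i, x), evaluated for all rows of X
        return sigmoid(X @ a + b) @ beta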

III. RANDOM ITERATIVE EXTREME LEARNING MACHINE

The working of the Random Iterative Extreme Learning Machine is as follows (a code sketch is given after the list).

1) Initialize j = 1 and the maximum performance $P_{max} = 0$.
2) Randomly assign the input weights $a_i$ and the bias weights $b_i$.
3) Calculate the hidden layer output matrix $H$ as in eqn (3).
4) Calculate the output weight matrix $\hat{\beta} = H^{\dagger} T$, where $H^{\dagger}$ is the Moore-Penrose generalized inverse of the hidden layer output matrix $H$, calculated using the singular value decomposition (SVD).
5) Evaluate the performance $P_j$ of the resulting ELM.
6) If $P_j > P_{max}$, save $a_i$, $b_i$, $H$ and $\hat{\beta}$, and set $P_{max} = P_j$.
7) Repeat steps 2 to 6 for n iterations (i.e., j = 1, 2, ..., n), where n is the predefined maximum number of iterations.
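A sketch of this loop, reusing the hypothetical elm_train and elm_predict helpers above, might look as follows; classification accuracy on a held-out evaluation set is assumed as the performance measure P_j, since the paper does not pin this down.

    def ri_elm(X_train, T_train, X_eval, y_eval, L, n_iter, seed=0):
        """Random Iterative ELM: keep the best of n_iter randomly initialized ELMs."""
        rng = np.random.default_rng(seed)
        best = None
        perf_max = 0.0                                        # step 1: P_max = 0
        for j in range(n_iter):                               # step 7: j = 1, ..., n
            a, b, beta = elm_train(X_train, T_train, L, rng)  # steps 2-4
            pred = elm_predict(X_eval, a, b, beta).argmax(axis=1)
            perf = (pred == y_eval).mean()                    # step 5: evaluate P_j
            if perf > perf_max:                               # step 6: save the best
                perf_max, best = perf, (a, b, beta)
        return best, perf_max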

In order to demonstrate RI-ELM, the classification of Fisher's Iris data set is shown here. The data set [2] consists of 50 samples from each of three species of Iris flowers (Iris setosa, Iris virginica and Iris versicolor), with four features (sepal length, sepal width, petal length and petal width). RI-ELM with 28 hidden nodes is able to learn the data and achieve a testing accuracy of 98.67%. A performance comparison of the RI-ELM algorithm with other algorithms is shown in Table I.

TABLE I
CLASSIFICATION ACCURACY OF IRIS DATA SET

Algorithm                            Accuracy (%)                      Learning time
Chen-and-Fang method (2005) [3]      97.33                             1 min
ANN (2008) [15]                      94.87                             1 min
ELM                                  93.68 (averaged over 100 runs)    3.2 sec
RI-ELM (no. of iterations = 100)     98.67                             3.2 sec
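Purely as an illustration (the paper does not state its train/test split), the RI-ELM sketch above could be driven on Fisher's Iris data via scikit-learn as follows; the 50/50 split, the random_state and the one-hot encoding are assumptions, while the 28 hidden nodes and 100 iterations follow the paper.

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    import numpy as np

    X, y = load_iris(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)
    T_tr = np.eye(3)[y_tr]                    # one-hot target matrix T
    (a, b, beta), acc = ri_elm(X_tr, T_tr, X_te, y_te, L=28, n_iter=100)
    print(f"best test accuracy over 100 random initializations: {acc:.4f}")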

IV. CLASSIFICATION OF ELECTRONIC NOSE DATA SET USING RI-ELM

Commercial coffees are blends which contain coffees of various origins, and their analysis and control are of great importance. Characterization of coffee is usually done using the chemical profile of one of its fractions, such as the headspace of green or roasted beans or the phenolic fraction. The relative abundance of the 700 diverse molecules identified so far in the headspace depends on the type, provenance and manufacturing of the coffee. None of these molecules can alone be identified as a marker; instead, one has to consider the whole spectrum, as given, for instance, by the gas chromatographic profile.

Electronic noses capture such a profile, and the related software helps to quantify the concentrations of the constituents of gaseous mixtures in the case of simple mixtures, and to analyze gaseous mixtures for discriminating between different (but similar) mixtures.

In this section, coffee data obtained from the Pico Electronic Nose is considered for classification. A blend group of 7 coffees was analyzed using the RI-ELM classification algorithm. The blend group has 5 features extracted from a sensor response curve. In our simulations, RI-ELM with 33 hidden nodes is able to achieve a testing accuracy of 90.86%. A performance comparison of the RI-ELM algorithm with PCA+MLP is shown in Table II.

TABLE II
CLASSIFICATION ACCURACY OF NOSE DATA SET

Algorithm                                  Accuracy (%)    Learning time
PCA + MLP, M. Pardo et al. (2002) [13]     87              1 min
RI-ELM (no. of iterations = 350)           90.86           11.06 sec

V. CONCLUSION

This paper has reviewed the Extreme Learning Machine and has shown that it outperforms backpropagation neural networks. It has also introduced the Random Iterative ELM (RI-ELM), which was used to classify electronic nose data; the results show the effectiveness of RI-ELM over the other methods considered.

ACKNOWLEDGEMENT

This research is supported by the University Grants Commission, India, through a major research project grant (UGC F. No. 33-62/2007 (SR) dated 28th Feb 2008). The authors are also thankful to all the staff of the SCSE and Bharathiar University for their valuable help.

REFERENCES

[1] A. J. Annema, K. Hoen and H. Wallinga, "Precision requirements for single-layer feedforward neural networks", Fourth International Conference on Microelectronics for Neural Networks and Fuzzy Systems, pp. 145-151, 1994.
[2] James C. Bezdek, James M. Keller, Raghu Krishnapuram, Ludmila I. Kuncheva and Nikhil R. Pal, "Will the Real Iris Data Please Stand Up?", IEEE Transactions on Fuzzy Systems, Vol. 7, No. 3, June 1999.
[3] Shyi-Ming Chen and Yao-De Fang, "A New Approach for Handling the Iris Data Classification Problem", International Journal of Applied Science and Engineering, Vol. 3, No. 1, pp. 37-49, 2005.
[4] Corinna Cortes and Vladimir Vapnik, "Support-Vector Networks", Machine Learning, Vol. 20, pp. 273-297, 1995.
[5] S. Dehuri and S.-B. Cho, "Multi-criterion Pareto based particle swarm optimized polynomial neural network for classification: A Review and State-of-the-Art", Computer Science Review, pp. 19-40, 2009.
[6] Guang-Bin Huang, Qin-Yu Zhu and Chee-Kheong Siew, "Extreme Learning Machine: A New Learning Scheme of Feedforward Neural Networks", International Joint Conference on Neural Networks, Vol. 2, pp. 985-990, 2004.
[7] Guang-Bin Huang, Qin-Yu Zhu and Chee-Kheong Siew, "Extreme Learning Machine: Theory and Applications", Neurocomputing, Vol. 70, pp. 489-501, 2006.
[8] Guang-Bin Huang, Lei Chen and Chee-Kheong Siew, "Universal Approximation Using Incremental Constructive Feedforward Networks with Random Hidden Nodes", IEEE Transactions on Neural Networks, Vol. 17, No. 4, pp. 879-892, 2006.
[9] N. B. Karayiannis and A. N. Venetsanopoulos, "Artificial Neural Networks: learning algorithms, performance evaluation, and applications", Kluwer Academic, Boston, MA, 1993.
[10] Derong Liu, Huaguang Zhang and Sanqing Hu (Guest Editors), "Neural networks: Algorithms and applications", Neurocomputing, Vol. 71, pp. 471-473, 2008.
[11] Nan-Ying Liang, Paramasivan Saratchandran, Guang-Bin Huang and Narasimhan Sundararajan, "Classification of mental tasks from EEG signals using Extreme Learning Machine", International Journal of Neural Systems, Vol. 16, No. 1, pp. 29-38, 2006.



[12] Nan-Ying Liang, Guang-Bin Huang, Hai-Jun Rong, P. Saratchandran and N. Sundararajan, "A Fast and Accurate On-line Sequential Learning Algorithm for Feedforward Networks", IEEE Transactions on Neural Networks, Vol. 17, No. 6, pp. 1411-1423, 2006.
[13] M. Pardo and G. Sberveglieri, "Coffee analysis with an Electronic Nose", IEEE Transactions on Instrumentation and Measurement, Vol. 51, No. 6, December 2002.
[14] Lawrence R. Rabiner, "A tutorial on Hidden Markov Models and selected applications in speech recognition", Proceedings of the IEEE, pp. 257-286, 1989.
[15] N. P. Suraweera and D. N. Ranasinghe, "Adaptive Structural Optimisation of Neural Networks", The International Journal on Advances in ICT for Emerging Regions, pp. 33-41, 2008.
[16] S. Suresh, R. Venkatesh Babu and H. J. Kim, "No-reference image quality assessment using modified extreme learning machine classifier", Applied Soft Computing, Vol. 9, pp. 541-552, 2009.
[17] Dianhui Wang and Guang-Bin Huang, "Protein Sequence Classification Using Extreme Learning Machine", Proceedings of the International Joint Conference on Neural Networks, Vol. 3, pp. 1406-1411, 2005.
[18] Paul John Werbos, "The Roots of Backpropagation: From Ordered Derivatives to Neural Networks and Political Forecasting", Wiley-Interscience, 1994.
[19] Chee-Wee Thomas Yeu, Meng-Hiot Lim and Guang-Bin Huang, "A New Machine Learning Paradigm for Terrain Reconstruction", IEEE Geoscience and Remote Sensing Letters, Vol. 3, No. 3, 2006.