
INTERNATIONAL JOURNAL OF WISDOM BASED COMPUTING, VOL. 1(1), 2011
Extreme Learning Machines - A Review and State-of-the-art

R. Rajesh, J. Siva Prakash
Abstract - Learning time is an important factor when designing any computational intelligence algorithm for classification, medication, control, etc. Extreme Learning Machine (ELM) has recently been proposed. It significantly reduces the amount of time needed to train a neural network and has been widely used in many applications. This paper surveys ELM and its applications.
Index Terms - Extreme learning machine, neural network, single layer feedforward neural network, classification, regression
I. INTRODUCTION
Neural networks have been used extensively in many fields because they can approximate complex nonlinear mappings directly from the input samples and can provide models for a large class of natural and artificial phenomena that are difficult to handle with classical parametric techniques. Many learning algorithms and models are in use, such as back-propagation for neural networks, Support Vector Machines (SVM) [41], and Hidden Markov Models (HMM). One of the main disadvantages of neural networks trained by gradient descent is their long learning time. Recently, Huang et al. [25], [67] proposed a learning algorithm for the single hidden layer feedforward neural network architecture, called the Extreme Learning Machine (ELM), which overcomes the problems caused by gradient-descent-based algorithms such as back-propagation in ANNs. ELM can significantly reduce the amount of time needed to train a neural network. This paper presents a survey of ELM. It is organized as follows: Section II describes the working of ELM, its theoretical foundations, and its extensions and applications; Section III presents simulation results on three benchmark classification datasets; and Section IV concludes the paper.
II. EXTREME LEARNING MACHINE - A REVIEW

Extreme Learning Machine, proposed by Huang et al. [25], [29], uses the single hidden layer feedforward neural network (SLFN) architecture [1]. It chooses the input weights randomly and determines the output weights of the SLFN analytically. Its authors report better generalization performance at a much faster learning speed: it requires less human intervention and can run thousands of times faster than conventional gradient-based methods. Because all network parameters are determined automatically and analytically, trivial human intervention is avoided, which makes ELM efficient in online and real-time applications. In summary, ELM offers ease of use, faster learning speed, higher generalization performance, and suitability for many nonlinear activation functions and kernel functions.

Dr. R. Rajesh is at the School of Computer Science and Engineering, Bharathiar University. He can be contacted at kollamrajeshr@ieee.org. Mr. J. Siva Prakash is at Daffodills India Technologies, 211, TVS Nagar, Edayarpalayam, Coimbatore - 25. He completed his Master of Philosophy in Computer Science at Bharathiar University. He can be contacted at siva5200@gmail.com.

2.1. A Note on Single Hidden Layer Feedforward Neural Network
The output of an SLFN with L hidden nodes [31], [49] can be described in a unified way that covers both additive and RBF hidden nodes, as follows.

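$$f_L(\mathbf{x}) = \sum_{i=1}^{L} \boldsymbol{\beta}_i\, G(\mathbf{a}_i, b_i, \mathbf{x}), \quad \mathbf{x} \in \mathbf{R}^n,\ \mathbf{a}_i \in \mathbf{R}^n \tag{1}$$

where $\mathbf{a}_i$ and $b_i$ are the learning parameters of the hidden nodes, $\boldsymbol{\beta}_i$ is the weight connecting the $i$th hidden node to the output node, and $G(\mathbf{a}_i, b_i, \mathbf{x})$ is the output of the $i$th hidden node with respect to the input $\mathbf{x}$. For an additive hidden node with activation function $g(x): \mathbf{R} \to \mathbf{R}$ (e.g., sigmoid and threshold), $G(\mathbf{a}_i, b_i, \mathbf{x})$ is given by

$$G(\mathbf{a}_i, b_i, \mathbf{x}) = g(\mathbf{a}_i \cdot \mathbf{x} + b_i), \quad b_i \in \mathbf{R} \tag{2}$$

where $\mathbf{a}_i$ is the weight vector connecting the input layer to the $i$th hidden node, $b_i$ is the bias of the $i$th hidden node, and $\mathbf{a}_i \cdot \mathbf{x}$ denotes the inner product of the vectors $\mathbf{a}_i$ and $\mathbf{x}$ in $\mathbf{R}^n$. For an RBF hidden node with activation function $g(x): \mathbf{R} \to \mathbf{R}$ (e.g., Gaussian), $G(\mathbf{a}_i, b_i, \mathbf{x})$ is given by

$$G(\mathbf{a}_i, b_i, \mathbf{x}) = g\left(b_i \|\mathbf{x} - \mathbf{a}_i\|\right), \quad b_i \in \mathbf{R}^+ \tag{3}$$

where $\mathbf{a}_i$ and $b_i$ are the center and impact factor of the $i$th RBF node, and $\mathbf{R}^+$ denotes the set of all positive real values. The RBF network is a special case of the SLFN with RBF nodes in its hidden layer. Consider $N$ arbitrary distinct samples $(\mathbf{x}_i, \mathbf{t}_i) \in \mathbf{R}^n \times \mathbf{R}^m$, where $\mathbf{x}_i$ is an $n \times 1$ input vector and $\mathbf{t}_i$ is an $m \times 1$ target vector. If an SLFN with $L$ hidden nodes can approximate these $N$ samples with zero error, then there exist $\boldsymbol{\beta}_i$, $\mathbf{a}_i$ and $b_i$ such that

$$f_L(\mathbf{x}_j) = \sum_{i=1}^{L} \boldsymbol{\beta}_i\, G(\mathbf{a}_i, b_i, \mathbf{x}_j) = \mathbf{t}_j, \quad j = 1, \dots, N \tag{4}$$

Equation (4) can be written compactly as

$$\mathbf{H}\boldsymbol{\beta} = \mathbf{T} \tag{5}$$

where

$$\mathbf{H} = \begin{bmatrix} G(\mathbf{a}_1, b_1, \mathbf{x}_1) & \cdots & G(\mathbf{a}_L, b_L, \mathbf{x}_1) \\ \vdots & \ddots & \vdots \\ G(\mathbf{a}_1, b_1, \mathbf{x}_N) & \cdots & G(\mathbf{a}_L, b_L, \mathbf{x}_N) \end{bmatrix}_{N \times L} \tag{6}$$

with

$$\boldsymbol{\beta} = \begin{bmatrix} \boldsymbol{\beta}_1^T \\ \vdots \\ \boldsymbol{\beta}_L^T \end{bmatrix}_{L \times m}, \qquad \mathbf{T} = \begin{bmatrix} \mathbf{t}_1^T \\ \vdots \\ \mathbf{t}_N^T \end{bmatrix}_{N \times m} \tag{7}$$

$\mathbf{H}$ is the hidden layer output matrix of the SLFN, the $i$th column of $\mathbf{H}$ being the $i$th hidden node's output with respect to the inputs $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_N$.

2.2. Principles of ELM - A Survey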
ELM [25], [29], designed as an SLFN with $L$ hidden neurons, can learn $L$ distinct samples with zero error. Even if the number of hidden neurons is much smaller than the number of distinct samples ($L \ll N$), ELM can still assign random parameters to the hidden nodes and calculate the output weights using the pseudoinverse of $\mathbf{H}$, giving only a small error $\epsilon > 0$. The hidden node parameters $\mathbf{a}_i$ and $b_i$ of ELM (input weights and biases, or centers and impact factors) need not be tuned during training and may simply be assigned random values. The following theorems state this formally.

Theorem 1 (Liang et al. [49]): Let an SLFN with $L$ additive or RBF hidden nodes and an activation function $g(x)$ which is infinitely differentiable in any interval of $\mathbf{R}$ be given. Then, for $L$ arbitrary distinct input vectors $\{\mathbf{x}_i \mid \mathbf{x}_i \in \mathbf{R}^n, i = 1, \dots, L\}$ and $\{(\mathbf{a}_i, b_i)\}_{i=1}^{L}$ randomly generated with any continuous probability distribution, respectively, the hidden layer output matrix $\mathbf{H}$ of the SLFN is invertible with probability one, and $\|\mathbf{H}\boldsymbol{\beta} - \mathbf{T}\| = 0$.

Theorem 2 (Liang et al. [49]): Given any small positive value $\epsilon > 0$ and an activation function $g(x): \mathbf{R} \to \mathbf{R}$ which is infinitely differentiable in any interval, there exists $L \le N$ such that, for $N$ arbitrary distinct input vectors $\{\mathbf{x}_i \mid \mathbf{x}_i \in \mathbf{R}^n, i = 1, \dots, N\}$ and for any $\{(\mathbf{a}_i, b_i)\}_{i=1}^{L}$ randomly generated according to any continuous probability distribution, $\|\mathbf{H}_{N \times L}\boldsymbol{\beta}_{L \times m} - \mathbf{T}_{N \times m}\| < \epsilon$ with probability one.

Since the hidden node parameters of ELM need not be tuned during training and are simply assigned random values, (5) becomes a linear system and the output weights can be estimated as

$$\hat{\boldsymbol{\beta}} = \mathbf{H}^{\dagger}\mathbf{T} \tag{8}$$

where $\mathbf{H}^{\dagger}$ is the Moore-Penrose generalized inverse [60] of the hidden layer output matrix $\mathbf{H}$. It can be calculated by several methods, including the orthogonal projection method, the orthogonalization method, iterative methods, and singular value decomposition (SVD) [60]. The orthogonal projection method can be used only when $\mathbf{H}^T\mathbf{H}$ is nonsingular, in which case $\mathbf{H}^{\dagger} = (\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T$. Because they rely on searching and iteration, the orthogonalization and iterative methods have limitations. Implementations of ELM therefore use SVD to calculate the Moore-Penrose generalized inverse of $\mathbf{H}$, since it can be used in all situations. ELM is thus a batch learning method.
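To make equations (1)-(8) concrete, here is a minimal NumPy sketch of batch ELM training. It assumes sigmoid additive nodes as in (2), or Gaussian RBF nodes as in (3) with $g(z) = e^{-z^2}$. Input weights and centres are drawn uniformly from [-1, 1], and impact factors from [0.1, 1]. These ranges and the helper names `hidden_output`, `elm_fit` and `elm_predict` are illustrative choices, not taken from [25], [29]. `np.linalg.pinv` computes $\mathbf{H}^{\dagger}$ through an SVD, in line with the remark above.

```python
import numpy as np


def hidden_output(X, A, b, node="sigmoid"):
    """Hidden layer output matrix H (N x L) of eq. (6).

    X: (N, n) inputs; A: (n, L) input weight vectors a_i (or RBF centres);
    b: (L,) biases b_i (or RBF impact factors, which must be positive).
    """
    if node == "sigmoid":
        # additive node, eq. (2): g(a_i . x + b_i) with the logistic sigmoid
        return 1.0 / (1.0 + np.exp(-(X @ A + b)))
    if node == "rbf":
        # RBF node, eq. (3): g(b_i ||x - a_i||) with Gaussian g(z) = exp(-z^2)
        dist = np.sqrt(((X[:, :, None] - A[None, :, :]) ** 2).sum(axis=1))
        return np.exp(-(b * dist) ** 2)
    raise ValueError(f"unknown node type: {node}")


def elm_fit(X, T, L, node="sigmoid", seed=0):
    """Assign random hidden parameters, then solve eq. (8) for the output weights."""
    rng = np.random.default_rng(seed)
    A = rng.uniform(-1.0, 1.0, size=(X.shape[1], L))   # random, never tuned
    if node == "rbf":
        b = rng.uniform(0.1, 1.0, size=L)               # impact factors in R^+
    else:
        b = rng.uniform(-1.0, 1.0, size=L)
    H = hidden_output(X, A, b, node)
    beta = np.linalg.pinv(H) @ T                         # beta_hat = H^dagger T (SVD based)
    return A, b, beta


def elm_predict(X, A, b, beta, node="sigmoid"):
    return hidden_output(X, A, b, node) @ beta


# XOR data (inputs x1, x2 and the class label used directly as a single target)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

A, b, beta = elm_fit(X, T, L=4)                          # L = N = 4, so H is square
pred = elm_predict(X, A, b, beta)
labels = (pred[:, 0] > 0.5).astype(int)
print(labels)
```

With $L = N = 4$, Theorem 1 says $\mathbf{H}$ is invertible with probability one, so the training outputs should reproduce the XOR targets up to rounding. This setup is deliberately different from the 3-node, two-output network of Fig. 1, whose random weights are not reproduced here.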
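The universal approximation capability of ELM has been analyzed in [31] using an incremental method. It has been shown that SLFNs with randomly generated additive or RBF nodes, using a wide range of piecewise continuous activation functions, can universally approximate any continuous target function on any compact subset of the Euclidean space $\mathbf{R}^n$.

Theorem 3 (Huang et al. [31]): Given any bounded nonconstant piecewise continuous function $g: \mathbf{R} \to \mathbf{R}$ for additive nodes, or any integrable piecewise continuous function $g: \mathbf{R} \to \mathbf{R}$ with $\int_{\mathbf{R}} g(x)\,dx \neq 0$ for RBF nodes, for any continuous target function $f$ and any randomly generated function sequence $\{g_n\}$, $\lim_{n \to \infty}\|f - f_n\| = 0$ holds with probability one if

$$\beta_n = \frac{\langle e_{n-1}, g_n \rangle}{\|g_n\|^2} \tag{9}$$

where $e_{n-1} = f - f_{n-1}$ is the residual error of the network with $n-1$ hidden nodes.

An incremental algorithm (I-ELM) has been proposed by Huang et al. [31] for SLFNs and TLFNs (two hidden layer feedforward neural networks). It adds hidden neurons one by one until the error becomes less than a predefined constant $\epsilon$. Convex incremental ELM (CI-ELM) [33] is another extension of ELM. In CI-ELM, when a new hidden node is randomly added, the output weights of the existing nodes are recalculated based on Barron's convex optimization concept [4], using $f_L = (1 - \beta_L) f_{L-1} + \beta_L g_L$, where $g_L$ is the output of the newly added node.

Theorem 4 (Huang et al. [33]): Given any nonconstant piecewise continuous function $G: \mathbf{R}^n \to \mathbf{R}$, if $\mathrm{span}\{G(\mathbf{x}, \mathbf{a}, b) : (\mathbf{a}, b) \in \mathbf{R}^n \times \mathbf{R}\}$ is dense in $L^2$, then for any continuous target function $f$ and any function sequence $\{g_L(\mathbf{x}) = G(\mathbf{x}, \mathbf{a}_L, b_L)\}$ randomly generated based on any continuous sampling distribution, $\lim_{L \to \infty}\|f - f_L\| = 0$ holds with probability one if

$$\beta_L = \frac{\langle f - f_{L-1},\ g_L - f_{L-1} \rangle}{\|g_L - f_{L-1}\|^2}$$

where $G(\mathbf{x}, \mathbf{a}, b)$ is the output of the hidden nodes. Based on this theorem, on the training set the output weight of the newly added hidden node is estimated as $\beta_L = \frac{\mathbf{E}\cdot[\mathbf{E} - (\mathbf{T} - \mathbf{H}_L)]^T}{[\mathbf{E} - (\mathbf{T} - \mathbf{H}_L)]\cdot[\mathbf{E} - (\mathbf{T} - \mathbf{H}_L)]^T}$. The output weights of the existing hidden nodes are recalculated by $\beta_i = (1 - \beta_L)\beta_i$, $i = 1, \dots, L-1$, and the residual error after adding the new hidden node $L$ is $\mathbf{E} = (1 - \beta_L)\mathbf{E} + \beta_L(\mathbf{T} - \mathbf{H}_L)$. Here $\mathbf{H}_L$ is the activation vector of the new node for all $N$ training samples, $\mathbf{E}$ is the residual vector before the new hidden node is added, and $\mathbf{T}$ is the target vector.

The following theorem states the universal approximation capability for any type of piecewise continuous computational hidden nodes.

Theorem 5 (Huang et al. [33]): Given any nonconstant piecewise continuous function $G: \mathbf{R}^n \to \mathbf{R}$, if $\mathrm{span}\{G(\mathbf{x}, \mathbf{a}, b) : (\mathbf{a}, b) \in \mathbf{R}^n \times \mathbf{R}\}$ is dense in $L^2$, then for any continuous target function $f$ and any function sequence $\{g_L\}$ randomly generated based on any continuous sampling distribution, $\lim_{L \to \infty}\|f - (\beta_1 g_1 + \cdots + \beta_L g_L)\| = 0$ holds with probability one if the output parameters $\beta_i$ are determined by ordinary least squares to minimize $\|f - \sum_{i=1}^{L}\beta_i g_i\|$.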
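The incremental schemes above can be sketched in a few lines. The NumPy code below is an illustration, not the reference implementation of [31], [33] or [35]. It grows a network of sigmoid additive nodes one node at a time:

- `mode="i"` applies the I-ELM weight (9) to the new node's activation vector.
- `mode="ci"` applies the CI-ELM convex update, rescaling the earlier output weights by $(1 - \beta_L)$.
- Setting `k > 1` draws k candidate nodes per step and keeps the one with the smallest new residual, following the EI-ELM idea of [35].

The function names, the uniform [-1, 1] sampling range, and the restriction to a single output are assumptions made for brevity.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def incremental_elm(X, T, L_max, mode="i", k=1, eps=1e-3, seed=0):
    """Grow an SLFN node by node; T is a 1-D target vector of length N.

    Returns the hidden nodes (a, b), their output weights, and the
    residual norm ||E|| recorded before the first and after every step.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    F = np.zeros(len(T))                 # current network output on the training set
    E = T - F                            # residual vector E
    nodes, betas, norms = [], [], [np.linalg.norm(E)]
    for _ in range(L_max):
        best = None
        for _ in range(k):               # k > 1: EI-ELM style candidate search
            a = rng.uniform(-1.0, 1.0, n)
            b = rng.uniform(-1.0, 1.0)
            h = sigmoid(X @ a + b)       # activation vector H_L of the candidate
            # I-ELM moves along h; CI-ELM moves along h - F, i.e. E - (T - H_L)
            d = h if mode == "i" else h - F
            beta = (E @ d) / (d @ d)
            new_E = E - beta * d
            norm = np.linalg.norm(new_E)
            if best is None or norm < best[0]:
                best = (norm, a, b, beta, new_E)
        norm, a, b, beta, E = best
        if mode == "ci":
            betas = [(1.0 - beta) * bi for bi in betas]   # beta_i = (1 - beta_L) beta_i
        nodes.append((a, b))
        betas.append(beta)
        F = T - E
        norms.append(norm)
        if norm < eps:
            break
    return nodes, betas, norms


def incremental_predict(X, nodes, betas):
    return sum(beta * sigmoid(X @ a + b) for (a, b), beta in zip(nodes, betas))


x = np.linspace(-1.0, 1.0, 100).reshape(-1, 1)
t = np.sin(3.0 * x[:, 0])
for mode in ("i", "ci"):
    nodes, betas, norms = incremental_elm(x, t, L_max=50, mode=mode, k=5)
    print(mode, len(nodes), norms[-1])
```

Both update rules subtract the projection of the residual onto a search direction, so the recorded residual norm can never increase from one step to the next.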
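Later, Huang et al. proposed EI-ELM (enhanced random search based incremental extreme learning machine) [35]. It is motivated by the observation that some hidden nodes in such networks play a very minor role in the network output and thus only increase the complexity of the network. In EI-ELM, several hidden nodes are randomly generated at each learning step, and among them the hidden node that decreases the residual error the most is added to the existing network. The output weight is calculated as in I-ELM. The following theorem states this.

Theorem 6 (Huang et al. [35]): Given an SLFN with any nonconstant piecewise continuous hidden nodes $G(\mathbf{x}, \mathbf{a}, b)$, if $\mathrm{span}\{G(\mathbf{x}, \mathbf{a}, b) : (\mathbf{a}, b) \in \mathbf{R}^n \times \mathbf{R}\}$ is dense in $L^2$, then for any continuous target function $f$, any randomly generated function sequence $\{g_n\}$ and any positive integer $k$, $\lim_{n \to \infty}\|f - f_n^*\| = 0$ holds with probability one if

$$\beta_n^* = \frac{\langle e_{n-1}^*, g_n^* \rangle}{\|g_n^*\|^2}$$

where $f_n^* = \sum_{i=1}^{n}\beta_i^* g_i^*$, $e_n^* = f - f_n^*$, and $g_n^*$ is the node among $\{g_i : (n-1)k+1 \le i \le nk\}$ that minimizes $\|(f - f_{n-1}^*) - \beta_i g_i\|$, with $\beta_i = \langle e_{n-1}^*, g_i \rangle / \|g_i\|^2$.

Gradient-based algorithms cannot directly train neural networks with threshold functions because these functions are nondifferentiable. Hence most of the literature uses the sigmoid function as an approximation to the threshold function. Lemma 1 and Theorems 7 and 8 by Huang et al. [32] concern the use of threshold functions in extreme learning machines.

Lemma 1 (Huang et al. [32]): An SLFN with $N$ hidden neurons with a sigmoid activation function $g(x)$ and with randomly chosen input weights and hidden biases can learn $N$ distinct observations with any arbitrarily small error.

Theorem 7 (Huang et al. [32]): For an SLFN with the sigmoid activation function $g(x)$ in the hidden layer, given any constant $\epsilon > 0$, there always exists an integer $\tilde{N} \le N$ such that an SLFN with $\tilde{N}$ hidden neurons and with randomly chosen input weights and hidden biases can learn $N$ distinct observations with a training error less than $\epsilon$.

Theorem 8 (Huang et al. [32]): Suppose that the threshold activation function is used in the hidden layer. Given any nonzero constant $\epsilon$, there always exists an integer $\tilde{N} \le N$ such that an SLFN with such $\tilde{N}$ hidden neurons and with randomly chosen input weights and hidden biases can learn $N$ distinct observations with its training error less than $\epsilon$.

Liang et al. proposed the online sequential learning algorithm OS-ELM [49], which can learn data one by one or chunk by chunk. First, the type of node (additive or RBF), the corresponding activation function $g$, and the number of hidden neurons $L$ are selected. The learning is then initialized using a small chunk of data $\aleph_0 = \{(\mathbf{x}_i, \mathbf{t}_i)\}_{i=1}^{N_0}$ from the given training set, with $N_0 \ge L$. The initial hidden layer output matrix $\mathbf{H}_0$ is computed, and the initial output weight is $\boldsymbol{\beta}^{(0)} = \mathbf{P}_0\mathbf{H}_0^T\mathbf{T}_0$, where $\mathbf{P}_0 = (\mathbf{H}_0^T\mathbf{H}_0)^{-1}$ and $\mathbf{T}_0 = [\mathbf{t}_1, \dots, \mathbf{t}_{N_0}]^T$. Then, for each $(k+1)$th chunk of data

$$\aleph_{k+1} = \{(\mathbf{x}_i, \mathbf{t}_i)\}_{i = (\sum_{j=0}^{k} N_j) + 1}^{\sum_{j=0}^{k+1} N_j} \tag{10}$$

where $N_{k+1}$ denotes the number of observations in the $(k+1)$th chunk, the sequential learning phase is as follows:
1) Calculate the partial hidden layer output matrix $\mathbf{H}_{k+1}$ for the new chunk.
2) Set $\mathbf{T}_{k+1} = [\mathbf{t}_{(\sum_{j=0}^{k} N_j) + 1}, \dots, \mathbf{t}_{\sum_{j=0}^{k+1} N_j}]^T$.
3) Calculate the output weight $\boldsymbol{\beta}^{(k+1)}$ using

$$\mathbf{P}_{k+1} = \mathbf{P}_k - \mathbf{P}_k\mathbf{H}_{k+1}^T\left(\mathbf{I} + \mathbf{H}_{k+1}\mathbf{P}_k\mathbf{H}_{k+1}^T\right)^{-1}\mathbf{H}_{k+1}\mathbf{P}_k \tag{11}$$

$$\boldsymbol{\beta}^{(k+1)} = \boldsymbol{\beta}^{(k)} + \mathbf{P}_{k+1}\mathbf{H}_{k+1}^T\left(\mathbf{T}_{k+1} - \mathbf{H}_{k+1}\boldsymbol{\beta}^{(k)}\right) \tag{12}$$

4) Repeat the same procedure for each subsequent chunk of new data.

Error Minimized ELM (EM-ELM), with automatic growth of hidden nodes and fast incremental learning of the output weights, has been proposed by Feng et al. [18], supported by Lemma 2 and Theorem 9.

Lemma 2 (Feng et al. [18]): Given an SLFN, let $\mathbf{H}_1$ denote the hidden layer output matrix of the SLFN with $L_0$ hidden nodes. If $L - L_0$ new hidden nodes are added to the SLFN, so that the new hidden layer output matrix becomes $\mathbf{H}_2 = [\mathbf{H}_1, \delta\mathbf{H}_1]$, then $E(\mathbf{H}_2) \le E(\mathbf{H}_1)$, where $E(\mathbf{H}) = \min_{\boldsymbol{\beta}}\|\mathbf{H}\boldsymbol{\beta} - \mathbf{T}\|$ denotes the output error function of the SLFN.

Theorem 9 (Feng et al. [18]) (Convergence Theorem): For a given set of $N$ distinct training samples $\{(\mathbf{x}_i, \mathbf{t}_i)\}_{i=1}^{N}$ and an arbitrary positive value $\epsilon$, there exists a positive integer $K$ such that $E(\mathbf{H}_K) < \epsilon$.

EM-ELM takes as input a set of training data, the maximum number of hidden nodes $L_{max}$, a small positive integer $L_0 < L_{max}$, and the expected learning accuracy $\epsilon$. The recursive algorithm randomly adds hidden nodes in groups ($\delta L_k$ new nodes at step $k$, so that the total becomes $L_{k+1} = L_k + \delta L_k$) until $L_k > L_{max}$ or $E(\mathbf{H}_k) < \epsilon$. The output weights are updated recursively by $\boldsymbol{\beta}_{k+1} = \mathbf{H}_{k+1}^{\dagger}\mathbf{T} = \begin{bmatrix}\mathbf{U}_k \\ \mathbf{D}_k\end{bmatrix}\mathbf{T}$, where $\mathbf{H}_{k+1} = [\mathbf{H}_k, \delta\mathbf{H}_k]$ is the hidden layer output matrix and

$$\mathbf{D}_k = \left((\mathbf{I} - \mathbf{H}_k\mathbf{H}_k^{\dagger})\,\delta\mathbf{H}_k\right)^{\dagger}, \qquad \mathbf{U}_k = \mathbf{H}_k^{\dagger}\left(\mathbf{I} - \delta\mathbf{H}_k\mathbf{D}_k\right) \tag{13}$$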
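The OS-ELM recursion, that is, the initialization phase followed by equations (11) and (12), can be sketched as below. This is an illustrative NumPy version with sigmoid additive nodes. The class name `OSELM`, the [-1, 1] sampling range and the synthetic data are assumptions. The code applies $(\mathbf{I} + \mathbf{H}\mathbf{P}_k\mathbf{H}^T)^{-1}$ through `np.linalg.solve` rather than forming the inverse explicitly, relying on $\mathbf{P}_k$ staying symmetric.

```python
import numpy as np


def sigmoid_hidden(X, A, b):
    return 1.0 / (1.0 + np.exp(-(X @ A + b)))


class OSELM:
    """Online sequential ELM with additive sigmoid nodes (illustrative sketch)."""

    def __init__(self, n_inputs, L, seed=0):
        rng = np.random.default_rng(seed)
        self.A = rng.uniform(-1.0, 1.0, (n_inputs, L))   # random input weights, never tuned
        self.b = rng.uniform(-1.0, 1.0, L)
        self.P = None
        self.beta = None

    def initialize(self, X0, T0):
        """Initialization phase; needs N0 >= L so that H0^T H0 is invertible."""
        H0 = sigmoid_hidden(X0, self.A, self.b)
        self.P = np.linalg.inv(H0.T @ H0)                # P0 = (H0^T H0)^{-1}
        self.beta = self.P @ H0.T @ T0                   # beta^(0) = P0 H0^T T0

    def update(self, Xk, Tk):
        """Sequential phase for one chunk (steps 1-3)."""
        H = sigmoid_hidden(Xk, self.A, self.b)           # partial H_{k+1}
        S = np.eye(H.shape[0]) + H @ self.P @ H.T
        self.P = self.P - self.P @ H.T @ np.linalg.solve(S, H @ self.P)   # eq. (11)
        self.beta = self.beta + self.P @ H.T @ (Tk - H @ self.beta)      # eq. (12)

    def predict(self, X):
        return sigmoid_hidden(X, self.A, self.b) @ self.beta


rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, (200, 8))
T = np.sin(X @ rng.uniform(-1.0, 1.0, (8, 1)))

model = OSELM(n_inputs=8, L=6)
model.initialize(X[:50], T[:50])                          # N0 = 50 >= L = 6
for start in range(50, 200, 30):                          # five chunks of 30 samples
    model.update(X[start:start + 30], T[start:start + 30])

# the sequential result should agree with the batch least-squares solution
H_all = sigmoid_hidden(X, model.A, model.b)
beta_batch = np.linalg.lstsq(H_all, T, rcond=None)[0]
print(np.max(np.abs(model.predict(X) - H_all @ beta_batch)))
```

Up to rounding, the recursion reproduces the batch ELM output weights on all data seen so far. This equivalence is what lets OS-ELM discard each chunk after it has been processed.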
2.2.1 Demonstrating XOR Classification
To demonstrate the working of ELM, an XOR problem (a two-class problem) with 4 instances, each containing 2 input attributes, is solved. Table I shows the data set for the problem. Fig. 1 shows the random input weights and the output weights of an SLFN with 3 hidden nodes generated by ELM. This network fully classifies the XOR problem.

TABLE I
INPUTS AND OUTPUTS OF XOR
x1 | x2 | class
0 | 0 | 0
0 | 1 | 1
1 | 0 | 1
1 | 1 | 0

[Fig. 1: SLFN with inputs x1, x2; input weights w1-w6; hidden nodes H1, H2, H3; output weights O1-O6; outputs Y1, Y2.]
Fig. 1. An SLFN with one hidden layer of 3 nodes for solving the XOR problem. The input weights are randomly generated and the output weights are calculated using ELM.

2.3. Extensions and Applications of ELM
A number of papers based on the ELM algorithm have appeared since its introduction by Huang in 2003. A brief outline of some of these works is given below. In [28], ELM is extended to the case of radial basis function (RBF) networks: the centres and impact widths of the RBF kernels are randomly generated and the output weights are calculated as in ELM. The authors show that it learns extremely fast and produces generalization performance close to that of SVM. Fully complex extreme learning machines (C-ELM) have been proposed by Li et al. [44], who extend the ELM algorithm from the real domain to the complex domain and apply it to the nonlinear channel equalization problem. Since a fuzzy inference system is equivalent to an SLFN, Rong et al. proposed online sequential fuzzy extreme learning machines (OS-Fuzzy-ELM) [57]. In OS-Fuzzy-ELM, the antecedent parameters (the membership function parameters) of the Takagi-Sugeno-Kang model are generated randomly and the consequent parameters are determined analytically. Amal Mohamed Aqlan [2] presents a hybrid extreme learning machine with the Levenberg-Marquardt algorithm using the AHP method, which provides better generalization performance and a faster convergence rate. ELM may need a larger number of hidden neurons because the input weights and hidden biases are determined randomly. Hence, in E-ELM [77], the input weights and hidden biases are determined using differential evolution. Each chromosome is composed of the input weights and hidden biases. The fitness of a chromosome is the root mean squared error on a validation set of $N_V$ samples,

$$E = \sqrt{\frac{\sum_{j=1}^{N_V}\left\|\sum_{i=1}^{L}\boldsymbol{\beta}_i\, g(\mathbf{a}_i \cdot \mathbf{x}_j + b_i) - \mathbf{t}_j\right\|_2^2}{m \times N_V}} \tag{14}$$

where the output weights $\boldsymbol{\beta}_i$ are obtained from the training set using the Moore-Penrose generalized inverse.
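A compact sketch of the E-ELM loop follows. It assumes the DE/rand/1/bin variant of differential evolution, sigmoid additive nodes, and genes bounded to [-1, 1]. Each chromosome is decoded into input weights and hidden biases. The output weights are then computed from the training set with the Moore-Penrose inverse, and the fitness is the validation RMSE of (14). Any additional selection rules used in [77] (for example, on the norm of the output weights) are not reproduced here. All function names and DE settings are illustrative.

```python
import numpy as np


def sigmoid(Z):
    return 1.0 / (1.0 + np.exp(-Z))


def decode(chrom, n_inputs, L):
    """Split a chromosome into input weights (n_inputs x L) and hidden biases (L,)."""
    W = chrom[: n_inputs * L].reshape(n_inputs, L)
    bias = chrom[n_inputs * L:]
    return W, bias


def fitness(chrom, Xtr, Ttr, Xva, Tva, L):
    """Validation RMSE of eq. (14); output weights come from the training set via H^dagger."""
    W, bias = decode(chrom, Xtr.shape[1], L)
    beta = np.linalg.pinv(sigmoid(Xtr @ W + bias)) @ Ttr
    err = sigmoid(Xva @ W + bias) @ beta - Tva
    return np.sqrt(np.sum(err ** 2) / (Tva.shape[1] * Tva.shape[0]))


def e_elm(Xtr, Ttr, Xva, Tva, L, pop=20, gens=30, scale=0.5, CR=0.8, seed=0):
    rng = np.random.default_rng(seed)
    D = Xtr.shape[1] * L + L                          # chromosome length
    P = rng.uniform(-1.0, 1.0, (pop, D))
    fit = np.array([fitness(p, Xtr, Ttr, Xva, Tva, L) for p in P])
    history = [fit.min()]
    for _ in range(gens):
        for i in range(pop):
            # DE/rand/1 mutation from three distinct members other than i
            r1, r2, r3 = rng.choice([j for j in range(pop) if j != i], 3, replace=False)
            mutant = np.clip(P[r1] + scale * (P[r2] - P[r3]), -1.0, 1.0)
            # binomial crossover, forcing at least one gene from the mutant
            mask = rng.random(D) < CR
            mask[rng.integers(D)] = True
            trial = np.where(mask, mutant, P[i])
            f_trial = fitness(trial, Xtr, Ttr, Xva, Tva, L)
            if f_trial <= fit[i]:                     # greedy one-to-one selection
                P[i], fit[i] = trial, f_trial
        history.append(fit.min())
    best = int(np.argmin(fit))
    return decode(P[best], Xtr.shape[1], L), fit[best], history


x = np.linspace(-1.0, 1.0, 60).reshape(-1, 1)
t = np.sin(3.0 * x)
Xtr, Ttr = x[::2], t[::2]            # 30 training samples
Xva, Tva = x[1::2], t[1::2]          # 30 validation samples
(W, bias), best_fit, history = e_elm(Xtr, Ttr, Xva, Tva, L=5, pop=10, gens=5)
print(best_fit, history)
```

Because a trial vector only replaces its parent when it is at least as fit, the best validation RMSE in `history` never increases across generations.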
Runxuan Zhang [75] applies multicategory classification using an extreme learning machine to microarray gene expression cancer diagnosis. It provides good classification accuracy, lower training time and a much more compact network compared to SVM-OVO and SANN. A real-coded genetic algorithm ELM (RCGA-ELM) is proposed in [65]. It selects the best number of hidden neurons and the corresponding input weights and biases using two genetic operators, namely weight based and network based, for both crossover and mutation. Because of the high computational time of RCGA-ELM, a sparse ELM (S-ELM) is also presented in [65], which searches for the best parameters of ELM using a K-fold validation scheme with less computational time. Suresh et al. applied both algorithms to multi-category sparse data classification and compared their performance. Nanying Liang applied a new algorithm, LVQ-ELM, to evoked potential detection in "Non-Identity Learning Vector Quantization Applied to Evoked Potential Detection" [47]. LVQ-ELM combines LVQ and ELM, and it provides the best testing accuracy using fewer hidden neurons than the original version of ELM. Chul Kwak applies ELM-based classification to cardiac disorder classification [40], using a segmentation algorithm on heart sound signals. It significantly improves the classification accuracy across cardiac disorder categories compared to HMM, MLP and SVM-based classifiers. Dianhui Wang presents protein sequence classification [67] using ELM. It can be used with many nonlinear activation functions and kernel functions and provides lower training time, and its classification accuracy is slightly better than that of BP. Table II lists other major application papers based on ELM, and Table III lists other major extension papers based on ELM.

TABLE II
OTHER MAJOR APPLICATIONS USING ELM
Work | Journ./Conf. | Year
Robust Object Tracking [3] | IEEE | 2007
Time Series Prediction [60], [61] | PICDM | 2008
Optimal Pruned KNN [74] | ICHIS | 2008
Reducing Effects of Outliers [36] | JDCTA | 2008
Variable Selection Approach [55] | ESTSP | 2008
Mental Tasks from EEG [48] | IJNS | 2006
Building Regression Models [53] | ESANN | 2008
Image Quality Assessment [64] | Soft Comp. | 2009
Text Classification [42] | ICIAAI | 2005
Land Cover Classification [56] | ICEGITA | 2008
Terrain Reconstruction [73] | IEEE | 2006
Channel Equalization [46] | ISNN | 2006
Predicting HLA-Peptide Binding [22] | ANN | 2006
Active Noise Control [76] | ISNN | 2008
QoS Violation Application [11] | Neural Process Lett | 2008
ELM and SVM [70] | ICISIP | 2006
Protein Secondary Structure Prediction [69] | Neurocomputing | 2008
Multicategory Classification Method [62], [75] | Bioinformatics | 2005
Melting Point of Organic Compounds [6] | ACSIECR | 2008
Medical Image Annotation and Retrieval [59] | LNCS | 2005
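The idea of choosing ELM parameters by K-fold validation, mentioned above for S-ELM, can be illustrated generically. The sketch below is a simplified stand-in, not the S-ELM procedure of [65]. It scores each candidate number of hidden nodes L by its mean K-fold validation error and returns the best one. The names, the candidate grid, the sigmoid nodes and the synthetic data are illustrative assumptions.

```python
import numpy as np


def elm_fit_predict(Xtr, Ttr, Xte, L, rng):
    """Train a sigmoid-node ELM on (Xtr, Ttr) and return its outputs on Xte."""
    A = rng.uniform(-1.0, 1.0, (Xtr.shape[1], L))
    b = rng.uniform(-1.0, 1.0, L)

    def hidden(X):
        return 1.0 / (1.0 + np.exp(-(X @ A + b)))

    beta = np.linalg.pinv(hidden(Xtr)) @ Ttr
    return hidden(Xte) @ beta


def select_hidden_nodes(X, T, candidates, K=5, seed=0):
    """Return the L with the lowest mean K-fold validation MSE, plus all scores."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), K)
    scores = {}
    for L in candidates:
        errors = []
        for k in range(K):
            val = folds[k]
            train = np.concatenate([folds[j] for j in range(K) if j != k])
            pred = elm_fit_predict(X[train], T[train], X[val], L, rng)
            errors.append(np.mean((pred - T[val]) ** 2))
        scores[L] = float(np.mean(errors))
    best_L = min(scores, key=scores.get)
    return best_L, scores


rng = np.random.default_rng(2)
X = rng.uniform(-1.0, 1.0, (150, 3))
T = np.sin(X.sum(axis=1, keepdims=True))
best_L, scores = select_hidden_nodes(X, T, candidates=[2, 5, 10, 20, 40])
print(best_L, scores)
```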
III. SIMULATIONS AND RESULTS
3.1. Classification of Fisher's Iris Dataset
In 1936, Sir Ronald Aylmer Fisher developed Fisher's Iris data set. It is sometimes called Anderson's Iris data set because Edgar Anderson collected the data to quantify the geographic variation of Iris flowers in the Gaspé Peninsula. The dataset [5] consists of 50 samples from each of three species of Iris flowers (Iris setosa, Iris virginica and Iris versicolor). Four features were measured from each sample: sepal length, sepal width, petal length and petal width. In our simulations, ELM with 25 hidden nodes learns the data within 1 minute on a Pentium dual-core machine (3.0 GHz) with 1 GB RAM and achieves a testing accuracy of 98.67%. Table IV compares the performance of ELM with that of other algorithms.

TABLE III
OTHER MAJOR EXTENSIONS OF ELM
Work | Journ./Conf. | Year
Ensembling ELM [10] | ISNN | 2007
Robust OS-ELM [23] | ISNN | 2007
Improved OS-ELM [45] | ISNN | 2007
Recursive C-ELM [50] | ISNN | 2006
Robust Recursive C-ELM [51] | ISNN | 2006
E-ELM based on PSO [72] | ISNN | 2006
Improved Learning Algorithms for SLFN [20] | ICIC: AICTA | 2008
A Priori Information in ELM [21] | ISNN | 2006
Multi-Stage ELM [24] | NCA | 2008
Extreme SVM [43] | PAKDD | 2008
Novel Algorithm for Feedforward NN [12] | ISNN | 2006
Fast Pruned-ELM [58] | Neurocomputing | 2008
Partial Lanczos ELM [66] | Neurocomputing | 2009
ELM-Bacterial Foraging [14] | EESRI | 2007

TABLE IV
CLASSIFICATION ACCURACY OF IRIS DATA SET
Algorithm | Accuracy %
Chen-and-Fang's method (2005) [9] | 97.33
Hong-and-Lee's method (1996) [37] | 96.67
Wu-and-Chen's method (1999) [71] | 96.28
Castro's method (1999) [7] | 96.72
Chang-and-Chen's method (2001) [15] | 96.07
ANN (2008) [63] | 94.87
Our simulation using ELM | 98.67

3.2. Classification of Liver Disorders
The data are obtained from blood tests that are thought to be sensitive to liver disorders that might arise from excessive alcohol consumption. The BUPA dataset from BUPA Medical Research Ltd. (created by Richard S. Forsyth in 1990) is used in our study for classification. It has 345 instances and 7 attributes, including the class attribute. The first 5 variables are all blood tests, and the sixth variable is the number of alcohol units consumed per day. Table V describes the attributes. 200 and 145 samples are used for training and testing, respectively. In our simulation, ELM with 3000 hidden nodes learns the data within 1.2810 minutes on a Pentium dual-core machine (3.0 GHz) with 1 GB RAM and achieves a testing accuracy of 76.50%. Table VI shows the performance comparison.

TABLE V
DESCRIPTION OF BUPA ATTRIBUTES
Attribute | Description
mcv | mean corpuscular volume
alkphos | alkaline phosphatase
sgpt | alanine aminotransferase
sgot | aspartate aminotransferase
gammagt | gamma-glutamyl transpeptidase
drinks | number of half-pint equivalents of alcoholic beverages drunk per day
selector | field used to split data into two sets

TABLE VI
COMPARISON OF PERFORMANCE OF BUPA DATASET
Algorithm | Accuracy
S. Dehuri et al., MOPPSO technique (2009) [17] | 70.3%
Our simulation using ELM | 76.5%

3.3. Classification of Lymphography Dataset - A Real-Life Medical Example
Lymphography is an x-ray study of lymph nodes and lymphatic vessels made visible by the injection of a special dye. Classifying (predicting) lymphography data into four classes (normal, metastases, malign lymph, and fibrosis) is one of the difficult tasks in machine learning. The data for our study were obtained from the University Medical Centre, Institute of Oncology, Ljubljana, Yugoslavia, and were provided by M. Zwitter and M. Soklic. The database consists of 148 instances (normal - 2, metastases - 81, malign lymph - 61, fibrosis - 4) and 19 attributes, including the class attribute. Table VII gives the attribute names and their possible values. In our simulations, ELM with 700 hidden nodes learns the data within 4.55 minutes on a Pentium dual-core machine (3.0 GHz) with 1 GB RAM and achieves a testing accuracy of 88.16%. Table VIII compares the performance of ELM with that of other algorithms.

TABLE VII
LYMPHOGRAPHY - ATTRIBUTE NAMES AND THEIR VALUES
Attribute Name | Attribute Value
class | normal find, metastases, malign lymph, fibrosis
lymphatics | normal, arched, deformed, displaced
block of affere | no, yes
block of lymph. c | no, yes
block of lymph. s | no, yes
by pass | no, yes
extravasates | no, yes
regeneration | no, yes
early uptake | no, yes
lym.nodes dimin | 0-3
lym.nodes enlar | 1-4
changes in lym. | bean, oval, round
defect in node | no, lacunar, lac. marginal, lac. central
changes in node | no, lacunar, lac. margin, lac. central
changes in stru | no, grainy, drop-like, coarse, diluted, reticular, stripped, faint
special forms | no, chalices, vesicles
dislocation | no, yes
exclusion of no | no, yes
no. of nodes | 0-9, 10-19, 20-29, 30-39, 40-49, 50-59, 60-69, >=70

TABLE VIII
COMPARISON OF CLASSIFYING LYMPHOGRAPHY DATA SET
Algorithm | Accuracy %
R. Michalski et al., Multi-Purpose Incremental Learning System (1986) [54]: a) Experts | 85
R. Michalski et al., Multi-Purpose Incremental Learning System (1986) [54]: b) AQ15 | 80-82
P. Clark et al. [13]: a) Simple Bayes (1987) | 83
P. Clark et al. [13]: b) CN2 | 82
B. Cestnik et al., Knowledge-Elicitation Tool (1987) [8] | 76
C4.5 (2007) | 80.11
Our simulation using ELM | 88.16
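The simulations in this section are multi-class problems, and the encoding used is not stated above. A common way to apply ELM to such problems, assumed here, is to encode each class label as a one-hot row of $\mathbf{T}$ and predict the class with the largest output. The sketch below uses synthetic, well-separated Gaussian clusters rather than the Iris, BUPA or Lymphography data. The helper names and sampling ranges are illustrative.

```python
import numpy as np


def one_hot(y, n_classes):
    T = np.zeros((len(y), n_classes))
    T[np.arange(len(y)), y] = 1.0
    return T


def elm_classifier_fit(X, y, L, seed=0):
    rng = np.random.default_rng(seed)
    A = rng.uniform(-1.0, 1.0, (X.shape[1], L))
    b = rng.uniform(-1.0, 1.0, L)
    H = 1.0 / (1.0 + np.exp(-(X @ A + b)))
    beta = np.linalg.pinv(H) @ one_hot(y, int(y.max()) + 1)   # one output column per class
    return A, b, beta


def elm_classifier_predict(X, A, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ A + b)))
    return np.argmax(H @ beta, axis=1)                        # class with the largest output


rng = np.random.default_rng(3)
centres = np.array([[0.0, 0.0], [5.0, 5.0], [-5.0, 5.0]])
X = np.vstack([c + 0.5 * rng.standard_normal((30, 2)) for c in centres])
y = np.repeat(np.arange(3), 30)

A, b, beta = elm_classifier_fit(X, y, L=20)
accuracy = np.mean(elm_classifier_predict(X, A, b, beta) == y)
print(accuracy)
```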
IV. CONCLUSION AND FUTURE WORKS
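Extreme Learning Machine, proposed by Huang et al., uses the single hidden layer feedforward neural network (SLFN) architecture with random input weights and determines the output weights analytically. It is reported to achieve better generalization performance at a much faster learning speed, running thousands of times faster than conventional methods. This paper has given an extensive survey of ELM. It also elaborates on the theory behind ELM, its approximation capabilities, and its extensions. The examples shown in the paper demonstrate the usefulness of ELM.

APPENDIX
Gradient-Based Learning Algorithm
In SLFNs, if the number of hidden neurons is much less than the number of distinct training samples ($L \ll N$), then $\mathbf{H}$ is a nonsquare matrix and there may not exist $\mathbf{a}_i, b_i, \boldsymbol{\beta}_i$ ($i = 1, \dots, L$) such that $\mathbf{H}\boldsymbol{\beta} = \mathbf{T}$. Thus, it is necessary to find specific $\hat{\mathbf{a}}_i, \hat{b}_i, \hat{\boldsymbol{\beta}}$ ($i = 1, \dots, L$) such that

$$\|\mathbf{H}(\hat{\mathbf{a}}_1, \dots, \hat{\mathbf{a}}_L, \hat{b}_1, \dots, \hat{b}_L)\hat{\boldsymbol{\beta}} - \mathbf{T}\| = \min_{\mathbf{a}_i, b_i, \boldsymbol{\beta}}\|\mathbf{H}(\mathbf{a}_1, \dots, \mathbf{a}_L, b_1, \dots, b_L)\boldsymbol{\beta} - \mathbf{T}\|$$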
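The above equation can be considered as a minimization problem with the cost function

$$E = \sum_{j=1}^{N}\left(\sum_{i=1}^{L}\boldsymbol{\beta}_i\, g(\mathbf{a}_i \cdot \mathbf{x}_j + b_i) - \mathbf{t}_j\right)^2$$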

This minimum is difficult to find analytically when $\mathbf{H}$ is unknown, so learning algorithms have to be used. Gradient-based learning algorithms are generally used to search for the minimum of $\|\mathbf{H}\boldsymbol{\beta} - \mathbf{T}\|$.
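For contrast with ELM's one-step solution, here is a gradient-based search written as plain batch gradient descent on all SLFN parameters. The rule $\mathbf{W}_k = \mathbf{W}_{k-1} - \eta\,\partial E/\partial \mathbf{W}$ is applied to the input weights, hidden biases and output weights. This minimal NumPy sketch assumes sigmoid hidden nodes and divides the cost $E$ by $N$ so that a fixed learning rate stays stable. The learning rate, epoch count and names are illustrative.

```python
import numpy as np


def train_slfn_gd(X, T, L, eta=0.05, epochs=500, seed=0):
    """Back-propagation style gradient descent on (a_i, b_i, beta_i); returns the loss history."""
    rng = np.random.default_rng(seed)
    N, n = X.shape
    A = rng.uniform(-1.0, 1.0, (n, L))                # input weights a_i
    b = rng.uniform(-1.0, 1.0, L)                     # hidden biases b_i
    beta = rng.uniform(-1.0, 1.0, (L, T.shape[1]))    # output weights beta_i
    losses = []
    for _ in range(epochs):
        H = 1.0 / (1.0 + np.exp(-(X @ A + b)))
        R = H @ beta - T                              # residuals
        losses.append(np.sum(R ** 2) / N)             # cost E divided by N
        dY = 2.0 * R / N                              # dE/d(output) for the averaged cost
        d_beta = H.T @ dY
        dZ = (dY @ beta.T) * H * (1.0 - H)            # chain rule through the sigmoid
        dA = X.T @ dZ
        db = dZ.sum(axis=0)
        # W_k = W_{k-1} - eta * dE/dW for every parameter group
        A -= eta * dA
        b -= eta * db
        beta -= eta * d_beta
    return A, b, beta, losses


x = np.linspace(-1.0, 1.0, 50).reshape(-1, 1)
t = np.sin(3.0 * x)
A, b, beta, losses = train_slfn_gd(x, t, L=10)
print(losses[0], losses[-1])
```

Every parameter here is revisited hundreds of times. ELM instead fixes $\mathbf{a}_i$ and $b_i$ at random and obtains $\boldsymbol{\beta}$ with a single pseudoinverse.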
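The parameter vector $\mathbf{W}$, the set of weights $(\mathbf{a}_i, \boldsymbol{\beta}_i)$ and biases $b_i$, can be iteratively adjusted to minimize $E$ using gradient-based algorithms as follows:

$$\mathbf{W}_k = \mathbf{W}_{k-1} - \eta\frac{\partial E(\mathbf{W})}{\partial \mathbf{W}}$$

where $\eta$ is a learning rate. The disadvantages here are the long learning time, slow convergence, and problems with over-training. The resolution of a general linear system $\mathbf{A}\mathbf{x} = \mathbf{y}$, where $\mathbf{A}$ may be singular and may not even be square, can be made very simple by the use of the Moore-Penrose generalized inverse.

Definition 1: A matrix $\mathbf{G}$ of order $n \times m$ is the Moore-Penrose generalized inverse of a matrix $\mathbf{A}$ of order $m \times n$ if

$$\mathbf{A}\mathbf{G}\mathbf{A} = \mathbf{A}, \quad \mathbf{G}\mathbf{A}\mathbf{G} = \mathbf{G}, \quad (\mathbf{A}\mathbf{G})^T = \mathbf{A}\mathbf{G}, \quad (\mathbf{G}\mathbf{A})^T = \mathbf{G}\mathbf{A}$$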
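The four conditions of Definition 1, together with the least-squares and minimum-norm properties discussed next, can be checked numerically. The sketch below assumes NumPy's SVD-based `np.linalg.pinv` as the computation of $\mathbf{A}^{\dagger}$ on a rank-deficient, non-square matrix. The explicit cutoff `rcond=1e-10`, below which singular values are treated as zero, is an illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 4))   # 6 x 4 matrix of rank 2
y = rng.standard_normal(6)

G = np.linalg.pinv(A, rcond=1e-10)        # Moore-Penrose generalized inverse A^dagger

# Definition 1: AGA = A, GAG = G, (AG)^T = AG, (GA)^T = GA
penrose = [
    np.allclose(A @ G @ A, A),
    np.allclose(G @ A @ G, G),
    np.allclose((A @ G).T, A @ G),
    np.allclose((G @ A).T, G @ A),
]

x_hat = G @ y
# NumPy's SVD-based least-squares solver returns the same (minimum norm) solution
x_lstsq = np.linalg.lstsq(A, y, rcond=1e-10)[0]

# adding a null-space direction of A keeps the residual but increases the norm
_, s, Vt = np.linalg.svd(A)
rank = int(np.sum(s > 1e-10 * s[0]))
x_other = x_hat + Vt[rank]


def residual(x):
    return np.linalg.norm(A @ x - y)


print(penrose, residual(x_hat), residual(x_other))
print(np.linalg.norm(x_hat), np.linalg.norm(x_other))
```

Since $\mathbf{A}^{\dagger}\mathbf{y}$ lies in the row space of $\mathbf{A}$, adding any unit null-space vector leaves the residual unchanged and strictly increases the norm. This is the minimum-norm behaviour that makes (8) well defined even when $\mathbf{H}$ is rank deficient.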

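For convenience, $\mathbf{G}$ will be denoted by $\mathbf{A}^{\dagger}$. A least-squares solution (l.s.s.) $\hat{\mathbf{x}}$ of a general linear system

$$\mathbf{A}\mathbf{x} = \mathbf{y} \tag{15}$$

is obtained if

$$\|\mathbf{A}\hat{\mathbf{x}} - \mathbf{y}\| = \min_{\mathbf{x}}\|\mathbf{A}\mathbf{x} - \mathbf{y}\| \tag{16}$$

where $\|\cdot\|$ is a norm in Euclidean space.

Definition 2: $\mathbf{x}_0 \in \mathbf{R}^n$ is said to be a minimum norm least-squares solution of a linear system $\mathbf{A}\mathbf{x} = \mathbf{y}$ if, for any $\mathbf{y} \in \mathbf{R}^m$,

$$\|\mathbf{x}_0\| \le \|\mathbf{x}\|, \quad \forall \mathbf{x} \in \{\mathbf{x} : \|\mathbf{A}\mathbf{x} - \mathbf{y}\| \le \|\mathbf{A}\mathbf{z} - \mathbf{y}\|,\ \forall \mathbf{z} \in \mathbf{R}^n\} \tag{17}$$

Theorem 10: Let there exist a matrix $\mathbf{G}$ such that $\mathbf{G}\mathbf{y}$ is a minimum norm least-squares solution of a linear system $\mathbf{A}\mathbf{x} = \mathbf{y}$. Then it is necessary and sufficient that $\mathbf{G} = \mathbf{A}^{\dagger}$, the Moore-Penrose generalized inverse of matrix $\mathbf{A}$.

ACKNOWLEDGEMENT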
The first author thanks UGC for their Major Project Grant. The second author thanks all the staff members of the School of Computer Science and Engineering, Bharathiar University. The authors are also thankful to Guang-Bin Huang, N. Sundararajan, and S. Suresh for their valuable help with this research work.
REFERENCES
[1] A. J. Annema, K. Hoen, and H. Wallinga, Precision requirements for single-layer feedforward neural networks, Fourth International Conference on Microelectronics for Neural Networks and Fuzzy Systems, pp. 145-151, 1994.
[2] Amal Mohamed Aqlan, Waiel Fathe Abd El-Wahed, Mohamed Amin Abd El-Wahed, Hybrid Extreme Learning Machine with Levenberg-Marquardt Algorithm using AHP Method, INFOS, pp. 110-117, 2008.
[3] R. Venkatesh Babu, S. Suresh, Anamitra Makur, Robust Object Tracking with Radial Basis Function Networks, IEEE International Conference on Acoustics, Speech and Signal Processing, Vol. 1, pp. 937-940, 2007.
[4] A. R. Barron, Universal approximation bounds for superpositions of a sigmoidal function, IEEE Transactions on Information Theory, Vol. 39, No. 3, pp. 930-945, 1993.
[5] James C. Bezdek, James M. Keller, Raghu Krishnapuram, Ludmila I. Kuncheva, and Nikhil R. Pal, Will the Real Iris Data Please Stand Up?, IEEE Transactions on Fuzzy Systems, Vol. 7, No. 3, June 1999.
[6] Akshay Uday Bhat, Shamel Merchant, Sunil S. Bhagwat, Prediction of Melting Point of Organic Compounds using Extreme Learning Machines, American Chemical Society's Industrial and Engineering Chemistry Research, Vol. 47, pp. 920-925, 2008.
[7] J. L. Castro, J. J. Castro-Schez, and J. M. Zurita, Learning maximal structure rules in fuzzy logic for knowledge acquisition in expert systems, Fuzzy Sets and Systems, pp. 331-342, 1999.
[8] B. Cestnik, I. Kononenko, and I. Bratko, ASSISTANT 86: A Knowledge-Elicitation Tool for Sophisticated Users, in: I. Bratko and N. Lavrac (Eds.), Progress in Machine Learning, Sigma Press, Wilmslow, England, 1987.
[9] Shyi-Ming Chen, Yao-De Fang, A New Approach for Handling the Iris Data Classification Problem, International Journal of Applied Science and Engineering, Vol. 3, No. 1, pp. 37-49, 2005.
[10] Huawei Chen, Huahong Chen, Xiaoling Nian, and Peipei Liu, Ensembling Extreme Learning Machines, Proceedings of the 4th International Symposium on Neural Networks: Advances in Neural Networks, pp. 1069-1076, 2007.
[11] Lei Chen, LiFeng Zhou, Hung Keng Pung, Universal Approximation and QoS Violation Application of Extreme Learning Machine, Neural Processing Letters, Vol. 28, pp. 81-95, 2008.
[12] Huawei Chen and Fan Jin, A Novel Learning Algorithm for Feedforward Neural Networks, in: J. Wang et al. (Eds.), ISNN 2006, LNCS 3971, pp. 509-514, 2006.
[13] P. Clark and T. Niblett, Induction in Noisy Domains, in: I. Bratko and N. Lavrac (Eds.), Progress in Machine Learning, pp. 11-30, Sigma Press, 1987.
[14] Jae-Hoon Cho, Myung-Geun Chun, Dae-Jong Lee, Parameter Optimization of Extreme Learning Machine Using Bacterial Foraging Algorithm, EESRI, pp. 742-747, 2007.
[15] C. H. Chang and S. M. Chen, Constructing membership functions and generating weighted fuzzy rules from training data, Proceedings of the 2001 Ninth National Conference on Fuzzy Theory and Its Applications, pp. 708-713, 2001.
[16] B. V. Dasarathy, Noise around the neighborhood: A new system structure and classification rule for recognition in partially exposed environments, IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 67-71, 1980.
[17] S. Dehuri, S.-B. Cho, Multi-criterion Pareto based particle swarm optimized polynomial neural network for classification: A Review and State-of-the-Art, Computer Science Review, pp. 19-40, 2009.
[18] Guorui Feng, Guang-Bin Huang, Qingping Lin, Robert Gay, Error Minimized Extreme Learning Machine with Growth of Hidden Nodes and Incremental Learning, IEEE Transactions on Neural Networks, 2009.
[19] Fei Han, De-Shuang Huang, Improved extreme learning machine for function approximation by encoding a priori information, Neurocomputing, Vol. 69, pp. 2369-2373, 2006.
[20] Fei Han, Improved Learning Algorithms of SLFN for Approximating Periodic Function, Proceedings of the 4th International Conference on Intelligent Computing: Advanced Intelligent Computing Theories and Applications - with Aspects of Artificial Intelligence, Vol. 5227, pp. 654-660, 2008.
[21] Fei Han, Tat-Ming Lok, and Michael R. Lyu, A New Learning Algorithm for Function Approximation Incorporating A Priori Information into Extreme Learning Machine, ISNN, pp. 631-636, 2006.
[22] Stephanus Daniel Handoko, Kwoh Chee Keong, Ong Yew Soon, Guang Lan Zhang, and Vladimir Brusic, Extreme Learning Machine for Predicting HLA-Peptide Binding, Advances in Neural Networks, pp. 716-721, 2006.
[23] Minh-Tuan T. Hoang, Hieu T. Huynh, Nguyen H. Vo, and Yonggwan Won, A Robust Online Sequential Extreme Learning Machine, Proceedings of the 4th International Symposium on Neural Networks: Advances in Neural Networks, Vol. 4, pp. 1077-1086, 2007.
[24] Xue-Fa Hu, Zhen Zhao, Shu Wang, Fu-Li Wang, Da-Kuo He, Shui-Kang Wu, Multi-stage extreme learning machine for fault diagnosis on hydraulic tube tester, NCA, Vol. 17, pp. 399-403, 2008.
[25] Guang-Bin Huang, Qin-Yu Zhu, Chee-Kheong Siew, Extreme Learning Machine: A New Learning Scheme of Feedforward Neural Networks, International Joint Conference on Neural Networks, Vol. 2, pp. 985-990, 2004.
[26] Guang-Bin Huang, Chee-Kheong Siew, Extreme Learning Machine with Randomly Assigned RBF Kernels, International Journal of Information Technology, Vol. 11, No. 1, pp. 16-24, 2005.
[27] Guang-Bin Huang, Nan-Ying Liang, Hai-Jun Rong, P. Saratchandran, N. Sundararajan, On-Line Sequential Extreme Learning Machine, The IASTED International Conference on Computational Intelligence, pp. 232-237, 2005.
[28] Guang-Bin Huang, Chee-Kheong Siew, Extreme Learning Machine: RBF Network Case, Proceedings of the Eighth International Conference on Control, Automation, Robotics and Vision, Vol. 2, pp. 1029-1036, 2005.
[29] Guang-Bin Huang, Qin-Yu Zhu, Chee-Kheong Siew, Extreme Learning Machine: Theory and Applications, Neurocomputing, Vol. 70, pp. 489-501, 2006.
[30] Guang-Bin Huang, Qin-Yu Zhu, Chee-Kheong Siew, Real-time learning capability of neural networks, IEEE Transactions on Neural Networks, Vol. 17, No. 4, pp. 863-878, 2006.
[31] Guang-Bin Huang, Lei Chen, Chee-Kheong Siew, Universal Approximation Using Incremental Constructive Feedforward Networks with Random Hidden Nodes, IEEE Transactions on Neural Networks, Vol. 17, No. 4, pp. 879-892, 2006.
[32] Guang-Bin Huang, Qin-Yu Zhu, K. Z. Mao, Chee-Kheong Siew, P. Saratchandran, N. Sundararajan, Can Threshold Networks Be Trained Directly?, IEEE Transactions on Circuits and Systems-II, Vol. 53, No. 3, pp. 187-191, 2006.
[33] Guang-Bin Huang, Lei Chen, Convex Incremental Extreme Learning Machine, Neurocomputing, Vol. 70, pp. 3056-3062, 2007.
[34] Guang-Bin Huang, Lei Chen, Chee-Kheong Siew, Incremental Extreme Learning Machine with Fully Complex Hidden Nodes, Neurocomputing, Vol. 71, pp. 576-583, 2008.
[35] Guang-Bin Huang, Lei Chen, Enhanced Random Search Based Incremental Extreme Learning Machine, Neurocomputing, Vol. 71, pp. 3460-3468, 2008.
[36] Hieu Trung Huynh, Yonggwan Won, Weighted Least Squares Scheme for Reducing Effects of Outliers in Regression based on Extreme Learning Machine, JDCTA: International Journal of Digital Content Technology and its Applications, Vol. 2, No. 3, pp. 40-46, 2008.
[37] T. P. Hong and C. Y. Lee, Induction of fuzzy rules and membership functions from training examples, Fuzzy Sets and Systems, pp. 33-47, 1996.
[38] T. P. Hong and C. Y. Lee, Effect of merging order on performance of fuzzy induction, Intelligent Data Analysis, pp. 139-151, 1999.
[39] T. P. Hong and J. B. Chen, Finding relevant attributes and membership functions, Fuzzy Sets and Systems, pp. 389-404, 1999.
[40] Chul Kwak, Oh-Wook Kwon, Cardiac Disorder Classification Based on Extreme Learning Machine, Proceedings of World Academy of Science, Engineering and Technology, Vol. 36, pp. 1260-1263, 2008.
[41] Derong Liu, Huaguang Zhang, Sanqing Hu (Guest Editors), Neural networks: Algorithms and applications, Neurocomputing, Vol. 71, pp: 471-473, 2008.
[42] Ying Liu, Han Tong Loh, Shu Beng Tor, Comparison of extreme learning machine with support vector machine for text classification, Proceedings of the 18th International Conference on Innovations in Applied Artificial Intelligence, pp: 390-399, 2005.
[43] Qiuge Liu, Qing He, Zhongzhi Shi, Extreme Support Vector Machine Classifier, PAKDD, pp: 222-235, 2008.
[44] Ming-Bin Li, Guang-Bin Huang, P. Saratchandran, N. Sundararajan, Fully Complex Extreme Learning Machine, Neurocomputing, Vol. 68, pp: 306-314, 2005.
[45] Bin Li, Jingming Wang, Yibin Li, Yong Song, An Improved On-Line Sequential Learning Algorithm for Extreme Learning Machine, in D. Liu et al. (Eds.): ISNN 2007, Part I, LNCS 4491, pp: 1087-1093, 2007.
[46] Ming-Bin Li, Guang-Bin Huang, Paramasivan Saratchandran, Channel Equalization Using Complex Extreme Learning Machine with RBF Kernels, ISNN, pp: 114-119, 2006.
[47] Nanying Liang, Laurent Bougrain, Non-identity Learning Vector Quantization applied to evoked potential detection, 2nd French Conference on Computational Neurosciences - NEUROCOMP, 2008.
[48] Nan-Ying Liang, Paramasivan Saratchandran, Guang-Bin Huang, Narasimhan Sundararajan, Classification of mental tasks from EEG signals using Extreme Learning Machine, International Journal of Neural Systems, Vol. 16, No. 1, pp: 29-38, 2006.
[49] Nan-Ying Liang, Guang-Bin Huang, Hai-Jun Rong, P. Saratchandran, N. Sundararajan, A Fast and Accurate On-line Sequential Learning Algorithm for Feedforward Networks, IEEE Transactions on Neural Networks, Vol. 17, No. 6, pp: 1411-1423, 2006.
[50] Junseok Lim, Jaejin Jeon, Sangwook Lee, Recursive Complex Extreme Learning Machine with Widely Linear Processing for Nonlinear Channel Equalizer, Proceedings of International Conference on Intelligent Sensing and Information Processing, pp: 128-134, 2006.
[51] Junseok Lim, Koeng Mo Sung, Joonil Song, Robust Recursive Complex Extreme Learning Machine Algorithm for Finite Numerical Precision, Advances in Neural Networks, Vol. 3971, pp: 637-643, 2006.
[52] Chen Ling, Zou Ling-Jun, Tu Li, Stream Data Classification Using Improved Fisher Discriminate Analysis, Journal of Computers, pp: 208-214, 2008.
[53] Yoan Miche, Patrick Bas, Christian Jutten, Olli Simula, Amaury Lendasse, A methodology for Building Regression Models using Extreme Learning Machine: OP-ELM, ESANN Proceedings, 2008.
[54] R. Michalski, I. Mozetic, J. Hong, N. Lavrac, The Multi-Purpose Incremental Learning System AQ15 and its Testing Applications to Three Medical Domains, Proceedings of the Fifth National Conference on Artificial Intelligence, pp: 1041-1045, Philadelphia, PA: Morgan Kaufmann, 1986.
[55] Fernando Mateo, Amaury Lendasse, A variable selection approach based on the Delta Test for Extreme Learning Machine models, European Symposium on Time Series Prediction, 2008.
[56] Mahesh Pal, Extreme Learning Machine for land cover classification, 11th Annual International Conference and Exhibition on Geospatial Information, Technology and Application, 2008.
[57] Hai-Jun Rong, Guang-Bin Huang, P. Saratchandran, N. Sundararajan, On-Line Sequential Fuzzy Extreme Learning Machine for Function Approximation and Classification Problems, IEEE Transactions on Systems, Man, and Cybernetics: Part B, Vol. 39, No. 4, pp: 1067-1072, 2009.
[58] H. J. Rong, Y. S. Ong, A. H. Tan, Zexuan Zhu, A fast pruned-extreme learning machine for classification problem, Neurocomputing, 2008.
[59] Mahmudur Rahman, Bipin C. Desai, Prabir Bhattacharya, Supervised Machine Learning Based Medical Image Annotation and Retrieval in ImageCLEFmed 2005, Lecture Notes in Computer Science, Vol. 4022, pp: 692-701, 2005.
[60] Rampal Singh, S. Balasundaram, Application of Extreme Learning Machine Method for Time Series Analysis, International Journal of Intelligent Technology, Vol. 2, No. 4, pp: 256-262, 2007.
[61] Rampal Singh, S. Balasundaram, On the Application of Extreme Learning Machine for Time Series Prediction, Proceedings of the International Conference on Data Management, pp: 317-324, 2008.
[62] Alexander Statnikov, Constantin F. Aliferis, Ioannis Tsamardinos, Douglas Hardin, Shawn Levy, A comprehensive evaluation of multicategory classification methods for microarray gene expression cancer diagnosis, Bioinformatics, Vol. 21, pp: 631-643, 2005.
[63] N. P. Suraweera, D. N. Ranasinghe, Adaptive Structural Optimisation of Neural Networks, The International Journal on Advances in ICT for Emerging Regions, pp: 33-41, 2008.
[64] S. Suresh, R. Venkatesh Babu, H. J. Kim, No-reference image quality assessment using modified extreme learning machine classifier, Applied Soft Computing, Vol. 9, pp: 541-552, 2009.
[65] S. Suresh, S. Saraswathi, N. Sundararajan, Performance enhancement of Extreme Learning Machine for Multi-category Sparse data Classification Problem, Pattern Analysis and Applications.
[66] Xiaoliang Tang, Min Han, Partial Lanczos extreme learning machine for single-output regression problems, Neurocomputing, Vol. 72, pp: 3066-3076, 2009.
[67] Dianhui Wang, Guang-Bin Huang, Protein Sequence Classification Using Extreme Learning Machine, Proceedings of International Joint Conference on Neural Networks, Vol. 3, pp: 1406-1411, 2005.
[68] Lipo P. Wang, Chunru R. Wan, Comments on the Extreme Learning Machine, IEEE Transactions on Neural Networks, Vol. 19, No. 8, 2008.
[69] Guoren Wang, Yi Zhao, Di Wang, A protein secondary structure prediction framework based on the Extreme Learning Machine, Neurocomputing, Vol. 72, pp: 262-268, 2008.
[70] Xun-Kai Wei, Ying-Hong Li, Yue Feng, Comparative Study of Extreme Learning Machine and Support Vector Machine, Proceedings of International Conference on Intelligent Sensing and Information Processing, pp: 1089-1095, 2006.
[71] T. P. Wu, S. M. Chen, A new method for constructing membership functions and fuzzy rules from training examples, IEEE Transactions on Systems, Man, and Cybernetics-Part B: Cybernetics, pp: 25-40, 1999.
[72] You Xu, Yang Shu, Evolutionary Extreme Learning Machine Based on Particle Swarm Optimization, Advances in Neural Networks - ISNN 2006, Vol. 3971, pp: 644-652, 2006.
[73] Chee-Wee Thomas Yeu, Meng-Hiot Lim, Guang-Bin Huang, A New Machine Learning Paradigm for Terrain Reconstruction, IEEE Geoscience and Remote Sensing Letters, Vol. 3, No. 3, 2006.
[74] Qi Yu, Antti Sorjamaa, Yoan Miche, Eric Severin, Amaury Lendasse, Optimal Pruned K-Nearest Neighbors: OP-KNN Application to Financial Modeling, Eighth International Conference on Hybrid Intelligent Systems, pp: 764-769, 2008.
[75] Runxuan Zhang, Guang-Bin Huang, Narasimhan Sundararajan, P. Saratchandran, Multi-Category Classification Using Extreme Learning Machine for Microarray Gene Expression Cancer Diagnosis, IEEE/ACM Transactions on Computational Biology and Bioinformatics, Vol. 4, No. 3, pp: 485-495, 2007.
[76] Qizhi Zhang, Yali Zhou, Active Noise Control Using a Feedforward Network with Online Sequential Extreme Learning Machine, Proceedings of the 5th International Symposium on Neural Networks: Advances in Neural Networks, Vol. 5263, pp: 410-416, 2008.
[77] Qin-Yu Zhu, A. K. Qin, P. N. Suganthan, Guang-Bin Huang, Evolutionary Extreme Learning Machine, Pattern Recognition, Vol. 38, pp: 1759-1763, 2005.

