
International Journal of Futuristic Trends in Engineering and Technology, Vol. 4 (01), 2014

A Comparative Study for Optimizing Feed Forward Neural Networks Using Pruning Methods
Shruti Kanani
Department of Computer Science & Engineering, B. H. Gardi College of Engg. & Tech., Rajkot, Gujarat, India. s.s.kanani@gmail.com

Kelvina Jivani
Department of Computer Science & Engineering, B. H. Gardi College of Engg. & Tech., Rajkot, Gujarat, India. kelvi.jivani@gmail.com

Abstract - Artificial neural networks (ANNs) are used to solve complex problems. Optimizing the structure of a neural network is an essential step in discovering knowledge from data, and researchers have developed various pruning techniques for this purpose. Small neural networks are better able to maximize generalization capability. This paper presents a comparative study of several pruning algorithms. The approach taken by these methods is to train a network larger than required and then discard its irrelevant or redundant parts. Finally, a comparison of all the pruning methods is given, which shows that the N2PS method performs somewhat better than the others.

Keywords: Neural network, pruning methods, input and hidden neuron pruning, generalization.

I. INTRODUCTION

Artificial neural networks (ANNs) are widely used in areas such as data mining, web mining, bio-informatics and multimedia data processing [1]. The power of a neural network depends on how well it generalizes to new data after training. The generalization capability of an ANN depends on the size of the training data, the number of training epochs, and the architecture of the network. The success of ANNs therefore largely depends on their architecture, which is usually determined by trial and error, but sometimes by a growing method or a pruning method. Many algorithms have been proposed to optimize the network architecture [2, 3]. In general, small networks offer a simpler structure and better generalization ability, but they may not learn the problem well. On the other hand, large networks learn easily but show poor generalization performance due to overtraining [4]. Research on network architecture can be grouped into two types of methods for building neural networks of optimal size. The first type starts with a small network and grows it by adding hidden units or hidden layers as learning progresses (growing methods). The second type starts with a large network containing more than the necessary number of hidden units and then prunes off the weights or units that are irrelevant for learning (pruning methods) [5]. This paper focuses on pruning approaches, which start with a large network and remove unnecessary neurons and interconnection weights.

The outline of this paper is as follows. Section 2 presents a brief overview of the feed forward neural network. Section 3 describes pruning methods in general; a survey of existing pruning approaches with their advantages and limitations is presented in Section 4. Conclusion and future work are presented in Section 5.

II. FEED FORWARD NEURAL NETWORK

Architecture of FFNN

In this section the elementary concepts of neural networks are presented. Fig. 1 depicts a feed forward NN having one hidden layer with N_h nonlinear units and an output layer with M linear units [7]. For the pth pattern, the jth hidden unit's net function and activation are

net_{pj} = \sum_{i=1}^{N+1} w_h(j, i) \, x_{p,i}    (1)

O_{pj} = f(net_{pj}) = \frac{1}{1 + e^{-net_{pj}}}    (2)

Fig. 1. NN with single hidden layer

Weight w_h(j, i) connects the ith input to the jth hidden unit. The threshold of the jth hidden unit is represented by w_h(j, N+1), by letting x_{p,N+1} be a constant equal to one. The kth output for the pth pattern is given by

y_{pk} = \sum_{i=1}^{N+1} w_{oi}(k, i) \, x_{p,i} + \sum_{j=1}^{N_h} w_{oh}(k, j) \, O_{pj}    (3)

where 1 ≤ k ≤ M. Here, w_{oi}(k, i) is the weight connecting the ith input to the kth output and w_{oh}(k, j) is the weight connecting the jth hidden unit to the kth output. There are N_v training patterns, where each pattern consists of an input vector x_p and a desired output vector t_p. For the pth pattern, the N input values are x_{pi} (1 ≤ i ≤ N) and the M desired output values are t_{pk} (1 ≤ k ≤ M). The squared error for the pth pattern is




E_p = \sum_{k=1}^{M} (t_{pk} - y_{pk})^2    (4)
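To make the notation concrete, the following Python/NumPy sketch evaluates equations (1)-(4) for a single training pattern. The dimensions, random weights and variable names (W_h, W_oi, W_oh) are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Illustrative dimensions: N inputs, N_h hidden units, M outputs (not from the paper).
N, N_h, M = 4, 3, 2
rng = np.random.default_rng(0)

W_h  = rng.normal(size=(N_h, N + 1))   # hidden-layer weights w_h(j, i); column N+1 holds the thresholds
W_oi = rng.normal(size=(M, N + 1))     # direct input-to-output weights w_oi(k, i)
W_oh = rng.normal(size=(M, N_h))       # hidden-to-output weights w_oh(k, j)

x_p = np.append(rng.normal(size=N), 1.0)  # input vector with x_{p,N+1} = 1 for the thresholds
t_p = rng.normal(size=M)                  # desired output vector

net_p = W_h @ x_p                        # eq. (1): net function of each hidden unit
O_p   = 1.0 / (1.0 + np.exp(-net_p))     # eq. (2): sigmoidal hidden activations
y_p   = W_oi @ x_p + W_oh @ O_p          # eq. (3): network outputs y_{pk}
E_p   = np.sum((t_p - y_p) ** 2)         # eq. (4): squared error for this pattern
print(E_p)
```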

In order to train the neural network, the mean squared error (MSE) for the ith output unit is defined as

E(i) = \frac{1}{N_v} \sum_{p=1}^{N_v} (t_{pi} - y_{pi})^2    (5)

The overall performance of a feed forward neural network, measured as the MSE, can be written as

E = \frac{1}{N_v} \sum_{p=1}^{N_v} E_p = \sum_{i=1}^{M} E(i)    (6)

Thus, only the number of hidden neurons is unknown. To find it, a pruning algorithm may be used.

Overtraining and Generalization

Overtraining is a type of overfitting. It occurs when the neural network is too powerful for the problem at hand: the network does not "recognize" the underlying trend in the data but learns the data by heart, including its noise [6]. The result is a very close fit to the training data but poor generalization. One way to avoid overtraining is to estimate the generalization ability during training and stop training when it begins to decrease. To do this, the data are divided into a training set and a validation set, and the error and weights are recorded during training. When training is finished, the set of weights giving the minimum error on the validation set is retrieved.
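A minimal sketch of this early-stopping scheme follows, assuming a hypothetical network object with a weights attribute and user-supplied train_one_epoch and validation_error callables (none of these names come from the paper).

```python
import copy

def train_with_early_stopping(net, train_one_epoch, validation_error, max_epochs=1000):
    """Keep the set of weights that gives the minimum error on the validation set."""
    best_error = float("inf")
    best_weights = copy.deepcopy(net.weights)
    for _ in range(max_epochs):
        train_one_epoch(net)              # one pass over the training set
        err = validation_error(net)       # error on the held-out validation set
        if err < best_error:              # record the weights whenever validation error improves
            best_error = err
            best_weights = copy.deepcopy(net.weights)
    net.weights = best_weights            # retrieve the best weights when training finishes
    return net
```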

Small networks generalize better and are faster and cheaper to build [4]. If the optimum size is unknown, one can train a number of networks of different sizes and select the smallest one that fits the data. Even when the optimum size is known, training a small network is difficult: it may be sensitive to the initial conditions and learning parameters, it learns slowly, and it may get stuck in a local minimum. It is therefore not easy to determine the smallest network that best fits the data. One way to determine the best network size with good generalization capability is pruning.

III. PRUNING METHOD


The neural network is a massively parallel architecture consisting of a number of neurons connected in a highly interconnected manner. The neurons are the computing units of the network. As shown in Fig. 1, the network consists of a number of layers. For a particular input vector at time t, let X(t) be the vector denoting the desired output of the neural network. Each layer consists of a number of nodes, with n nodes at the output layer. The aim of the learning procedure, shown in Fig. 2, is to adjust the weights and biases such that the error E(t) is minimized.

Fig. 2. Neural network training process

A brute force parameter pruning method would set each parameter to zero in turn and evaluate the change in E(t): if the error increases, the parameter value is restored; otherwise the parameter is removed [8]. However, the brute force method is not an effective approach to parameter elimination. Pruning algorithms have been classified into two broad groups: sensitivity based pruning and penalty term based pruning [8].

1. Sensitivity based pruning

As shown in Fig. 3 (a), sensitivity based pruning methods estimate the sensitivity S(t) of the error E(t) to the removal of a parameter. During the training process the sensitivity of every parameter of the network is calculated, and the parameters with the least sensitivity S(t) are removed from the network.

The network is then retrained, and this process is usually repeated several times.

Fig. 3: (a) Neural network pruning by sensitivity method
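The loop below is a naive sketch of this procedure for a network whose parameters are held in a flat NumPy vector; the error and retraining functions are placeholders supplied by the caller, and the sensitivity estimate simply zeroes each parameter in turn, in the spirit of the brute-force test described above.

```python
import numpy as np

def sensitivity_prune_step(weights, error_fn, retrain_fn, n_remove=1):
    """Estimate each parameter's sensitivity, remove the least sensitive ones, retrain."""
    base_error = error_fn(weights)
    sensitivity = np.empty_like(weights)
    for i in range(weights.size):
        saved = weights[i]
        weights[i] = 0.0                                 # temporarily remove parameter i
        sensitivity[i] = error_fn(weights) - base_error  # increase in E(t) caused by its removal
        weights[i] = saved                               # restore the parameter
    for i in np.argsort(sensitivity)[:n_remove]:         # least sensitive parameters
        weights[i] = 0.0                                 # prune them permanently
    retrain_fn(weights)                                  # retrain the pruned network
    return weights
```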




2. Penalty term based pruning

The penalty term pruning approach is represented in Fig. 3 (b). In these methods the forward calculation and backward propagation are the same as in standard learning. However, a penalty term is added to the objective function to reward the network for choosing an efficient solution. The penalty term modifies the cost function so that, as the parameters are updated, unnecessary parameters are driven to zero and are in effect removed during training.
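A minimal sketch of this idea follows, using a simple quadratic (weight-decay style) penalty; the penalty form, the coefficient lam and the pruning threshold are illustrative assumptions rather than a method prescribed in the paper.

```python
import numpy as np

def penalized_error(weights, error_fn, lam=1e-3):
    """Objective function = standard error + penalty term that rewards small weights."""
    return error_fn(weights) + lam * np.sum(weights ** 2)

def penalized_gradient(weights, error_grad_fn, lam=1e-3):
    """Gradient used in training: the usual backpropagated gradient plus 2*lam*w,
    which keeps driving unnecessary weights toward zero."""
    return error_grad_fn(weights) + 2.0 * lam * weights

def remove_driven_to_zero(weights, threshold=1e-2):
    """After training, weights the penalty has driven near zero are removed."""
    weights[np.abs(weights) < threshold] = 0.0
    return weights
```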

Fig. 3: (b) Neural network pruning by penalty term method

IV. REVIEWS OF OTHER PRUNING METHODS

Network pruning is often cast as three sub-procedures: (i) define and quantify the saliency of each element in the network; (ii) eliminate the least significant elements; (iii) readjust the remaining topology. According to these procedures, the different pruning methods can be classified as follows.

Lecun et al. [9] proposed the optimal brain damage (OBD) method, which approximates the saliency of a weight by estimating the second derivative of the network output error with respect to that weight. Pruning is carried out iteratively: train the network to a reasonable level, compute the saliencies, delete the low-saliency weights, and resume training. OBD assumes that the error function is quadratic and that the Hessian is diagonal, so the saliency is calculated using only the pivot (diagonal) elements of the Hessian matrix, without retraining after the pruning step. To overcome this disadvantage, Hassibi and Stork [10] developed the optimal brain surgeon (OBS) algorithm. OBS removes the diagonal assumption used in OBD, because if the diagonal assumption is inaccurate it can lead to the removal of the wrong weights. However, OBS is impractical for large networks.

An early stopping procedure monitors the error on a validation set and halts learning when this error starts to increase. There is no guarantee that the learning curve passes through the optimal point, and the final weights are sensitive to the learning dynamics. In the cross validation method the whole dataset is divided into two parts, a training set and a cross validation set [5]. This method combines the use of a penalty function during network training with a subset of the training samples for cross validation. The penalty is added to the error function so that the weights of network connections that are not useful have small magnitude. Such connections can be pruned if the resulting accuracy of the network does not change beyond a preset level. The training samples in the cross validation set are used to indicate when network pruning should be terminated.

The magnitude based pruning (MBP) methods assume that small weights are irrelevant. Hagiwara [11] suggests three simple and effective strategies, called goodness factor, consuming energy and weights power, for detecting both redundant hidden neurons and weights.

Xing and Hu [12] suggest a two-phase construction approach for pruning both input and hidden units of FFNNs based on mutual information (MI). First, all salient input units are determined according to their ranking order and their contributions to the network's performance, and the irrelevant input units are eliminated. Second, the redundant hidden units are removed from the trained network one after another, according to a relevance measure.

Yang, Bouzerdoum and Phung [13] suggest new algorithms for pruning elements (weights and hidden neurons) of neural networks based on compressive sampling (CS) theory. This approach makes it possible to locate the significant elements, and hence to find a sparse structure, without computing their saliency. The algorithms obtain the optimal topology for pruning weights and hidden neurons based on CS, and their main advantage is that they iteratively build up the sparse topology while maintaining the training accuracy of the original, larger architecture.

The neural network pruning by significance (N2PS) method [11] is based on a new significance measure calculated from the sigmoidal activation value of a node and all the weights of its outgoing connections. It considers all nodes with a significance value below a threshold as insignificant and eliminates them.

Table 1 summarizes all these pruning methods. The table lists the individual elements pruned by each method along with its advantages and limitations.
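As a concrete illustration of the OBD saliency described above (under its quadratic, diagonal-Hessian assumptions, the saliency of weight w_k is h_kk w_k^2 / 2), the sketch below assumes the diagonal Hessian entries have already been estimated by some other routine.

```python
import numpy as np

def obd_saliency(weights, hessian_diag):
    """OBD saliency under the diagonal approximation: s_k = h_kk * w_k^2 / 2."""
    return 0.5 * hessian_diag * weights ** 2

def obd_prune_step(weights, hessian_diag, n_remove=1):
    """Delete the lowest-saliency weights; training is then resumed on the smaller network."""
    s = obd_saliency(weights, hessian_diag)
    weights[np.argsort(s)[:n_remove]] = 0.0
    return weights
```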




TABLE I. COMPREHENSIVE SURVEY OF DIFFERENT PRUNING METHODS WITH PRUNED ELEMENTS, ADVANTAGES AND LIMITATIONS

1. OBD
Pruned elements: weights.
Advantages: easy to implement.
Limitations: low computational efficiency; relates additional local minima on the error surface during training.

2. OBS
Pruned elements: weights.
Advantages: avoids removing the wrong weights and has low impact on the learning process.
Limitations: low computational efficiency; relates additional local minima on the error surface during training.

3. Cross validation
Pruned elements: network connections.
Advantages: computes the accuracy of the network being pruned on the training samples as well as on the cross-validation samples to guide and stop the pruning process.
Limitations: may remove important parts of the network; relates additional local minima on the error surface during training.

4. MBP
Pruned elements: weights based on magnitude; redundant hidden neurons and small weights.
Advantages: less sophisticated and requires significantly less computational time.
Limitations: requires more iterations and pruning steps to find the optimal network, which decreases the speed of pruning.

5. MI
Pruned elements: input and hidden neurons based on MI.
Advantages: effective when there are dependencies between inputs.
Limitations: very time-consuming to train a large network.

6. Compressive Sampling (CS) based
Pruned elements: weights and hidden neurons based on CS.
Advantages: leads to quick convergence and a better topology, with effective accuracy and computational complexity.

7. N2PS
Pruned elements: all nodes and weights based on a significance threshold value.
Advantages: removes the insignificant input neurons; among the best optimization methods of FFNN for classifying large datasets.
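As a rough illustration of the N2PS significance measure summarized above, the sketch below scores each node by combining its average sigmoidal activation with the magnitudes of its outgoing weights, and returns the nodes scoring below a threshold for elimination. This is only one possible reading of the description in Section IV; the exact measure is defined in [11].

```python
import numpy as np

def n2ps_style_significance(activations, outgoing_weights):
    """Significance of each node, combining its mean sigmoidal activation over the
    training patterns (activations: patterns x nodes) with the absolute values of
    its outgoing weights (outgoing_weights: nodes x next-layer units)."""
    mean_activation = np.abs(activations).mean(axis=0)
    outgoing_strength = np.abs(outgoing_weights).sum(axis=1)
    return mean_activation * outgoing_strength

def insignificant_nodes(activations, outgoing_weights, threshold):
    """Indices of nodes whose significance falls below the threshold (to be eliminated)."""
    sig = n2ps_style_significance(activations, outgoing_weights)
    return np.where(sig < threshold)[0]
```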

V. CONCLUSION

In this paper several pruning algorithms based on different pruning techniques have been discussed, and Table I shows that each algorithm has its own advantages and limitations. Some pruning algorithms prune both irrelevant input neurons and irrelevant hidden neurons of the network, while others prune irrelevant hidden neurons only. Real-world applications prefer simpler and more efficient methods, but a significant drawback of most standard methods is their low efficiency. The OBD and OBS algorithms are older methods with low computational efficiency. The MBP and MI methods require more time and a larger number of iterations. The cross validation and N2PS methods are better than the other methods and are useful for data of any size.

REFERENCES
[1] Reitermanova Z (2008) Feedforward neural networks: architecture optimization and knowledge extraction. In: WDS'08 Proceedings of Contributed Papers, Part I, pp 159-164.
[2] Ahmmed S, Abdullah-Al-Mamun K, Islam M (2007) A novel algorithm for designing three layered artificial neural networks. Int J Soft Comput 2(3), pp 450-458.
[3] Chin-Teng Lin, C. S. George Lee, Neural Fuzzy Systems: A Neuro-Fuzzy Synergism to Intelligent Systems. PHI publication.
[4] S. Urolagin, K. V. Prema, N. V. S. Reddy (2012) Generalization capability of artificial neural network incorporated with pruning method. In: Springer, LNCS 7135, pp 171-178.
[5] Thuan Q. Huynh, Rudy Setiono (2005) Effective neural network pruning using cross-validation. In: Neural Networks, IEEE, pp 972-977.
[6] Russell Reed, Pruning algorithms - a survey. In: IEEE Transactions on Neural Networks, vol 4, pp 740-747.
[7] Pramod L. Narasimha, W. H. Delashmit, M. T. Manry, Jiang Li, F. Maldonado (2008) An integrated growing-pruning method for feedforward network training. In: Elsevier, Neurocomputing, vol 71, pp 2831-2847.
[8] Siddhaling Urolagin (2012) Multilayer feed-forward neural network integrated with dynamic learning algorithm by pruning of nodes and connections. In: International Journal of Computer Applications (0975-8887), vol 47, no 2, pp 9-17.

[9] Le Cun Y, Denker JS, Solla SA (1990) Optimal brain damage. In: Touretzky DS (ed) Advances in Neural Information Processing Systems, vol 2. Morgan Kaufmann, San Mateo, pp 598-605.
[10] Hassibi B, Stork DG, Wolff GJ (1993) Optimal brain surgeon and general network pruning. In: Proceedings of IEEE ICNN'93, vol 1, pp 293-299.
[11] M. Gethsiyal Augasta, T. Kathirvalavakumar (2011) A novel pruning algorithm for optimizing feedforward neural network of classification problems. In: Springer, Neural Processing Letters 34, pp 241-258.
[12] M. Gethsiyal Augasta, T. Kathirvalavakumar (2013) Pruning algorithms of neural networks - a comparative study. In: Springer, Central European Journal of Computer Science, vol 3, pp 105-115.
[13] Yang J, Bouzerdoum A, Phung S (2009) A neural network pruning approach based on compressive sampling. In: Proceedings of the International Joint Conference on Neural Networks 2009, pp 3428-3435. IEEE, New Jersey, USA.
[14] Slim Abid, Mohamed Chtourou, Mohamed Djemel (2013) Pruning approaches for selection of neural networks structure. In: 10th International Multi-Conference on Systems, Signals & Devices (SSD), pp 1-4, IEEE.



