
Electronics and Communications in Japan, Vol. 94, No. 7, 2011

Translated from Denki Gakkai Ronbunshi, Vol. 128-C, No. 7, July 2008, pp. 1137–1142

Fast Backpropagation Learning Using Optimization of Learning Rate for Pulsed Neural Networks

KENJI YAMAMOTO, SEIICHI KOAKUTSU, TAKASHI OKAMOTO, and HIRONORI HIRATA


Chiba University, Japan

SUMMARY Neural networks are widely applied to information processing because of their nonlinear processing capability. Digital hardware implementation of neural networks seems to be effective in the construction of neural network systems in which real-time operation and much broader applications are possible. However, the digital hardware implementation of analog neural networks is very difficult because of the need to satisfy restrictions concerning circuit resources such as circuit scale, arrangement, and wiring. A technique that uses a pulsed neuron model instead of an analog neuron model has been proposed as a method for solving this problem, and its effectiveness has been confirmed. To construct pulsed neural networks (PNN), backpropagation (BP) learning has been proposed. However, BP learning takes considerable time to construct a PNN compared with the learning of an analog neural network. Therefore, some method of speeding up BP learning in PNN is necessary. In this paper, we propose a fast BP learning method using optimization of the learning rate for a PNN. In the proposed method, the learning rate is optimized so as to speed up learning in every learning cycle. To evaluate the proposed method, we apply it to pattern recognition problems, such as XOR, 3-bit parity, and digit recognition. The results of the computer-based experiments demonstrate the validity of the proposed method. © 2011 Wiley Periodicals, Inc. Electron Comm Jpn, 94(7): 27–34, 2011; Published online in Wiley Online Library (wileyonlinelibrary.com). DOI 10.1002/ecj.10249

1. Introduction

Neural networks, an engineering model of the neuronal networks in the brains of living beings, are widely used, particularly in recognition, control, and prediction, because of their nonlinear processing capabilities [1]. However, most neural networks are implemented as software running on von Neumann computers. Implementing neural networks in hardware is desirable in order to widen the range of neural network applications and to improve speed. Moreover, in recent years, improvements in digital circuit devices, including programmable devices such as field programmable gate arrays (FPGA) and complex programmable logic devices (CPLD), have made it possible to implement circuits quickly and at low cost. In particular, dynamically reconfigurable hardware, which can change its circuit configuration dynamically in parallel with other processes, and evolvable hardware (EHW), which can autonomously acquire its own circuit configuration, are attracting attention. By implementing neural networks in such devices, the range of applications of neural networks can be expanded, and faster processing as well as rapid response to the results of learning become possible.

When considering the hardware implementation of a neural network in digital circuitry, constraints related to circuit resources in the device, such as circuit scale and wiring, must be satisfied, in contrast to computational processing in software. Moreover, the devices generally used as EHW are smaller than common VLSI in terms of circuit scale, for reasons of reproducibility and replacement. As a result, it is not desirable to transfer the computations performed in software unaltered to a multibit circuit configuration. A method using a pulsed neural network, with bit-serial transmission of 1-bit pulse-density signals over a single signal connection between neurons, has been proposed as a way of resolving these problems [2]. This approach appears to be highly effective when implementing

Key words: pulsed neural network; backpropagation learning; learning rate.


multiple neurons because the wiring region and circuit scale can be reduced in hardware implementation. The pulsed neuron model is originally based on a biological prototype, but in an engineering implementation it can be expected to yield superior neural networks from the standpoint of circuit scale and signal processing.

Previously, Hebbian learning rules [3] suitable for a pulsed neuron model and an error backpropagation method [4] specific to pulsed neural networks (PNN) have been proposed. However, because the input and output values are represented as pulse frequencies in the pulsed neuron model, the pulse frequency cannot be determined until a certain number of pulses has been received, and hence the input and output values cannot be determined immediately. As a result, error backpropagation learning in a PNN requires more time than learning in an analog neural network. One way to accelerate learning would be to increase the learning rate and thus the amount of change in a single update. However, if the learning rate is increased, convergence fluctuates because the updates during initial learning become excessively large, and the convergence rate deteriorates. Methods of varying the learning rate in order to accelerate learning in an analog neural network have been proposed [5, 6].

Thus, in this paper, we propose a backpropagation learning method for a PNN that accelerates convergence by optimizing, at every update of the connection weights and the attenuation rate, the learning rate that previously was a fixed value. With this method, the average number of learning cycles required can be reduced without a drop in the convergence rate. The validity of the proposed method is demonstrated through computer-based experiments.

Below, Section 2 describes the backpropagation learning method specific to PNNs. Section 3 explains the proposed method. Section 4 describes computer-based experiments, and Section 5 summarizes the paper and identifies topics for the future.

2. Pulsed Neural Networks

2.1 The pulsed neuron model

Figure 1 shows a schematic diagram of the pulsed neuron model, which simulates the electrical activity of a single neuron. The input and output signals are both pulse sequences evolving in time, and the magnitude of each signal is represented by its pulse frequency. When a pulse arrives at the n-th input of the pulsed neuron model, the local membrane potential p_n(t) at that location rises in accordance with the connection weight w_n and then attenuates toward the resting potential with a time constant τ. The internal potential I(t) of the pulsed neuron model is represented as

Fig. 1. The pulsed neuron model.

the sum of all the local membrane potentials at that time. The neuron fires (emits a pulse) when the internal potential exceeds the threshold I_th. This behavior is represented by the following four equations:

(1)

(2)

(3)

(4)

Here dt represents the minimum unit of circuit operating time, and a represents the attenuation rate, governed by the attenuation time constant. The Heaviside function is used as the output function H(·).

2.2 Backpropagation learning specific to the pulsed neuron model

A backpropagation learning method [4] specific to a PNN has been proposed as a method suitable for the pulsed neuron model. Because the input and output values are represented as pulse frequencies in the pulsed neuron model, the pulse frequency cannot be determined until a certain number of pulses has been received, and hence the input and output values cannot be determined immediately. For this reason, the input and output pulses in this model are treated in sets of N clocks, each set representing one number: N firings in N clocks are represented by 1, zero firings by 0, and q firings by q/N. In this learning method, the relationship between the output and the average net value over N clocks in each neuron is approximated by a sigmoid function, which is used as the output function during learning. Equation (5) gives the mean net value, and Eq. (6) gives the sigmoid approximation function used as the output function:

(5)


(6)

In addition to x, which represents the real-valued input and output, we also use x(t), which represents the pulsed input and output at each clock cycle, as a notation specific to the pulsed neuron. Here t represents the clock time and takes integer values from 1 to N, u is the net value, and the remaining parameter is the slope coefficient of the sigmoid approximation function. Equation (7) is the update equation for the connection weights of the output layer neurons, and Eq. (8) is the update equation for the connection weights of the hidden layer neurons in this learning method:

(7)

(8)

where η1 represents the learning rate during connection weight learning, a further coefficient expresses the relationship between the input and the average net value, and P̄_k represents the average value of the membrane potential, independent of the connection weight, over N clocks. These are given by the equations below:

(9)

(10)

(11)

2.3 Attenuation rate backpropagation learning method

An attenuation rate backpropagation learning method specific to the pulsed neuron model has been proposed, in which the attenuation rate is learned by the backpropagation method, utilizing the property that the output function of the pulsed neuron depends on the attenuation rate a. Because the output function of the pulsed neuron depends on a, Eq. (6) for the sigmoid approximation function used during learning is extended as follows:

(12)

Here g(a) is the slope of the sigmoid approximation function, which depends on a, and the remaining coefficient is the slope coefficient of the new sigmoid approximation function. Table 1 shows the relationship between the attenuation rate and the sigmoid approximation function obtained in computer experiments.

Table 1. Relationship between the attenuation rate and the approximation function

Based on Table 1, the trend function g(a) is approximated as follows:

(13)

We have devised an attenuation rate backpropagation learning method using this extended output function. In addition to x, h, and o, representing the real-valued inputs and outputs, x(t), h(t), and o(t), representing the pulsed inputs and outputs at each clock cycle, are also used as quantities specific to the pulsed neuron; t is the clock time, an integer value between 1 and N. With the learning rate during attenuation rate learning denoted η2, Eq. (14) is the update equation for the attenuation rate in the output layer neurons, and Eq. (15) is the update equation for the attenuation rate in the hidden layer neurons in this learning method:

(14)

(15)
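Before moving on to the proposed method, the behavior described in Section 2 can be illustrated in code. The following is a minimal C++ sketch, reflecting one reading of the neuron dynamics of Eqs. (1) to (4) and of the N-clock pulse-density coding used during learning; the class and function names are our own and do not appear in the original.

```cpp
#include <cstddef>
#include <vector>

// Minimal sketch of a single pulsed neuron (one reading of Eqs. (1)-(4)).
class PulsedNeuron {
public:
    PulsedNeuron(const std::vector<double>& weights, double attenuation, double threshold)
        : w(weights), p(weights.size(), 0.0), a(attenuation), Ith(threshold) {}

    // One clock step: x[n] is the input pulse (0 or 1) arriving on connection n.
    int step(const std::vector<int>& x) {
        double I = 0.0;                       // internal potential I(t)
        for (std::size_t n = 0; n < w.size(); ++n) {
            p[n] = a * p[n] + w[n] * x[n];    // local potential: decay plus weighted input pulse
            I += p[n];                        // internal potential is the sum of local potentials
        }
        return I > Ith ? 1 : 0;               // fire (emit a pulse) when I(t) exceeds the threshold
    }

private:
    std::vector<double> w;    // connection weights w_n
    std::vector<double> p;    // local membrane potentials p_n(t)
    double a;                 // attenuation rate
    double Ith;               // firing threshold I_th
};

// Pulse-density coding over N clocks: q output pulses in N clocks represent the value q/N.
double outputValue(PulsedNeuron& neuron, const std::vector<std::vector<int>>& inputPulses, int N) {
    int q = 0;
    for (int t = 0; t < N; ++t) q += neuron.step(inputPulses[t]);
    return static_cast<double>(q) / N;
}
```

In this sketch each clock step decays every local potential by the attenuation rate a, adds w_n when an input pulse arrives, and emits an output pulse whenever the summed internal potential exceeds the threshold; counting q output pulses over N clocks and dividing by N yields the real-valued output used by the learning rules.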


3. Learning Rate Optimization

In the backpropagation learning method for a PNN described in Section 2, the input and output values are represented as pulse frequencies. Unless a certain number of pulses has been received, the pulse frequency cannot be determined and learning cannot proceed. As a result, compared with the backpropagation learning method in an ordinary analog neural network, the problem arises that learning requires considerably more time. However, if the learning rate is simply increased, convergence fluctuates because the updates during initial learning become excessively large, and the convergence rate deteriorates. Thus, in this paper we propose a backpropagation learning method for a PNN that incorporates learning rate optimization: the learning rate, previously a fixed value, is made variable and is optimized at each update of the connection weights and the attenuation rate, thereby accelerating convergence.

First, let us consider an optimization method for the learning rate during connection weight learning. The root mean square error is used as the error evaluation function in the backpropagation learning method for a PNN. If the training signal for the k-th output o_k of the output layer is t_k, then the root mean square error E is given by

(16)

Next, consider the correction of the connection weight in the backpropagation learning method. If the learning rate is η1, then the correction Δw of the connection weight w can be found as follows:

(17)

In the proposed method, ∂E/∂w is found first. Thus, ∂E/∂w is determined unambiguously and is constant. Consequently, the correction Δw of the connection weight is a single-variable function of the learning rate η1, and therefore the error E is also a single-variable function of η1. E is a quadratic function, as shown in Eq. (16). Thus, the learning rate η1 that minimizes E can be found by satisfying

(18)

Below we derive the update equation for the connection weight learning rate η1. During the r-th learning step, if the k-th output after the update is o_k^+(r), then the error e_k^+(r) after the update is given by

(19)

(20)

Here o_k^+(r) is the k-th output after the update, h_j^+(r) is the j-th output of the hidden layer after the update, and w̃(r) is the magnitude of the update of the connection weight. A first-order approximation is used:

(21)

The change in the net value resulting from the update of the connection weight is so small that it can be ignored:

(22)

(23)

(24)

(25)

(26)

(27)

(28)

Based on the above, the update equation for η1(r) is as follows:

(29)

The attenuation rate learning rate is optimized by a similar method. The update equation for the attenuation rate learning rate η2(r) is

(30)

(31)

Fig. 2. The digit recognition problem.
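The essence of the learning rate optimization above can be restated compactly. The following is a sketch in our own notation, assuming a squared-error criterion and the first-order approximation of the post-update output mentioned in the derivation; it is not a reproduction of Eqs. (16) to (31).

```latex
% Sketch (our notation): why the optimal learning rate has a closed form.
% Assumes a squared-error criterion and a first-order approximation of the
% post-update output.
\begin{align}
  \Delta w &= -\eta_1 \frac{\partial E}{\partial w}, &
  o_k^{+} &\approx o_k + \sum_{w}\frac{\partial o_k}{\partial w}\,\Delta w
           = o_k - \eta_1 c_k,
  \qquad c_k \equiv \sum_{w}\frac{\partial o_k}{\partial w}\frac{\partial E}{\partial w},\\
  E(\eta_1) &= \frac{1}{2}\sum_k\bigl(t_k - o_k^{+}\bigr)^2
            = \frac{1}{2}\sum_k\bigl(t_k - o_k + \eta_1 c_k\bigr)^2 .
\end{align}
Since $E(\eta_1)$ is quadratic in $\eta_1$, setting $dE/d\eta_1 = 0$ gives
\begin{equation}
  \eta_1 = -\,\frac{\sum_k (t_k - o_k)\,c_k}{\sum_k c_k^{2}},
\end{equation}
which is recomputed at every learning cycle; the attenuation rate learning rate
$\eta_2$ is obtained in the same way, with the attenuation rate $a$ in place of $w$.
```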

4. Computer Experiments

4.1 Experimental results

The error backpropagation learning method specific to a PNN is called conventional method 1, the error backpropagation learning method using attenuation rate learning is called conventional method 2, the error backpropagation learning method using learning rate optimization during connection weight learning is called proposed method 1, and the error backpropagation learning method using learning rate optimization during attenuation rate learning is called proposed method 2. We performed comparative experiments with these methods. In the experiments, we compared the cases in which only the conventional methods were used, in which proposed method 1 alone was used, in which proposed method 2 alone was used, and in which both proposed methods were used, applying them to three problems: the XOR problem, the 3-bit parity problem, and the digit recognition problem. Table 2 lists the truth values for the 3-bit parity problem, and Fig. 2 shows the input patterns for the digit recognition problem. Table 3 lists the parameter values used

in each experiment. The learning rate and attenuation rate given in Table 3 are the values used in the fixed-rate methods. Each method was implemented in C++. The computer experiments were performed and evaluated on a personal computer (CPU: Pentium 4, 3.06 GHz; OS: Windows XP; compiler: Microsoft Visual Studio 2005, Ver. 8.0). In the experiments, we presented the training set given by the truth tables and performed one cycle of learning each time the presentation of one set was completed. In the attenuation rate error backpropagation learning method, after the presentation of one set, learning of the connection weights was performed first, followed by learning of the attenuation rate. In each trial, learning was judged successful when the root mean square error fell below the maximum permissible error, and judged to have failed when the number of learning cycles exceeded 5000 without the root mean square error falling below that limit. A total of 100 trials were performed. The network configuration consisted of 2 input units, 3 hidden units, and 1 output unit for the XOR problem; 3 input units, 5 hidden units, and 1 output unit for the 3-bit parity problem; and 35 input units, 5 hidden units, and 10 output units for the digit recognition problem. Tables 4, 5, and 6 list the results of the experiments on the XOR problem, the 3-bit parity problem, and the digit recognition problem, respectively. Figures 3, 4, and 5 show the variation of the learning rate in one trial for each problem.
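As a concrete illustration of the trial procedure just described, the following C++ sketch mirrors the success and failure criteria above; the training and initialization routines are hypothetical stubs, and the permissible-error value shown is illustrative rather than the value actually used in the experiments.

```cpp
#include <cstdio>

// Hypothetical stubs standing in for the actual PNN implementation:
// trainOneCycle() presents the whole training set once, updates the network,
// and returns the resulting root mean square error; reinitializeNetwork()
// re-randomizes the connection weights for a new trial.
static double trainOneCycle()       { return 0.0; }
static void   reinitializeNetwork() {}

int main() {
    const int    kTrials    = 100;     // number of independent trials
    const int    kMaxCycles = 5000;    // learning is judged failed beyond this
    const double kMaxError  = 0.01;    // maximum permissible error (illustrative value)

    int  successes   = 0;
    long totalCycles = 0;
    for (int trial = 0; trial < kTrials; ++trial) {
        reinitializeNetwork();
        for (int cycle = 1; cycle <= kMaxCycles; ++cycle) {
            if (trainOneCycle() < kMaxError) {   // success: error fell below the limit
                ++successes;
                totalCycles += cycle;
                break;
            }
        }
    }
    // With 100 trials, the success count equals the convergence rate in percent.
    std::printf("convergence rate: %d%%, average learning cycles: %.1f\n",
                successes,
                successes > 0 ? static_cast<double>(totalCycles) / successes : 0.0);
    return 0;
}
```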

Table 2. The 3-bit parity problem

Table 3. Parameters


4.2 Discussion

It is clear from Tables 4, 5, and 6 that using learning rate optimization during connection weight learning reduced the average number of learning cycles and the computation time in all problems. In addition, it is clear from Figs. 3, 4, and 5 that the learning rate during connection weight learning initially took low values and then took higher values as the amount of connection weight updating fell with the progress of training. Thus, error backpropagation learning in a PNN can be accelerated by using learning rate optimization during connection weight learning.

The average number of learning cycles was reduced substantially, by approximately 800, in the digit recognition problem, but only by approximately 200 in the 3-bit parity problem, a relatively small reduction by comparison. The reason the benefit of accelerating error backpropagation learning differs so markedly between problems appears to be the high similarity of the input and output patterns in the 3-bit parity problem, as shown in Table 2: many local solutions are generated during error backpropagation learning, and optimization becomes more difficult. Conversely, in the digit recognition problem the 10 outputs are separated in the patterns, so the similarity is low and optimization is simpler because there are few local solutions. This appears to be why the reduction in the average number of learning cycles is greatest in this problem. Furthermore, only in the digit recognition problem did the convergence rate improve when learning rate optimization was used during connection weight learning, reaching almost 100%. This appears to be because cases that previously could not converge within 5000 learning cycles did converge under the method using learning rate optimization, owing to the acceleration of learning.

On the other hand, even when learning rate optimization was used during attenuation rate learning, the variation in the average number of learning cycles was minimal, as can be seen from the tables. Furthermore, even when performing learning rate optimization during both connection weight learning and attenuation rate learning, the results were virtually the same as when performing learning rate optimization during connection weight learning alone. Hence, learning rate optimization during attenuation rate learning appears to be of little benefit to the acceleration of learning. The reason seems to be that the major contribution to convergence is made by the learning of the connection weights, and learning of the attenuation rate is at most supplementary, so that optimizing the learning rate of the attenuation rate cannot accelerate learning.

Table 4. Experimental results (XOR)

Table 5. Experimental results (3-bit parity)

Table 6. Experimental results (digit recognition)

Fig. 3. Variation of learning rate in connection weight learning (XOR).

Fig. 4. Variation of learning rate in connection weight learning (3-bit parity).

Fig. 5. Variation of learning rate in connection weight learning (digit recognition).

5. Conclusions

We have devised an error backpropagation learning method using optimization of the learning rate during connection weight learning and attenuation rate learning, for the purpose of accelerating backpropagation learning in a PNN. We applied the proposed method to the XOR problem, the 3-bit parity problem, and the digit recognition problem, and performed computer-based experiments and a comparison with conventional methods. The results showed that optimization of the learning rate during connection weight learning reduced the average number of learning cycles required in all of the problems, indicating the validity of the proposed method. On the other hand, optimization of the learning rate during attenuation rate learning produced little change in the results, confirming its lack of benefit. Future topics include further improvement of the learning rate optimization method so that it works effectively even in problems with a high degree of similarity in the input and output patterns, and an analysis of the hardware implementation of the proposed method.

REFERENCES

1. Sakawa M, Tanaka Y. Introduction to neurocomputing. Morikita Publishing; 1997. (in Japanese)
2. Tanaka Y, Kuroyanagi S, Iwata A. A technique for hardware implementation of neural networks using FPGA. Technical Report of the Neuro-Computing Research Group, IEICE, NC 2000-179, p 175–182, 2001. (in Japanese)
3. Motoki M, Hamagami T, Koakutsu S, Hirata H. A Hebbian learning rule restraining catastrophic forgetting in pulse neural networks. Trans IEICE 2003;123:1124–1133. (in Japanese)
4. Yamane Y, Koakutsu S, Hirata H. Neural network model for evolvable hardware. 16th Electrical and Electronic Systems Division Conference, p 445–448, 2004. (in Japanese)
5. Murai H, Omatsu S, Oe S. Improvement of convergence speed of back-propagation method by genetic algorithm and its application to remote sensing analysis. Trans IEICE 1997;J80-D2:1311–1313. (in Japanese)
6. Yoshikawa T, Kawaguchi Y. A high speed learning method for backpropagation rules in neural networks. Trans IEICE 1992;J75-D2:837–840. (in Japanese)


AUTHORS (from left to right)

Kenji Yamamoto (student member) received a bachelor's degree from the Department of Electronic and Mechanical Engineering of Chiba University in 2007 and began the first part of the doctoral program in artificial and systems engineering at the Graduate School of Engineering. He is engaged in research on neural networks.

Seiichi Koakutsu (member) completed the doctoral program in manufacturing science at the Graduate School of Natural Science of Chiba University in 1992 and became a lecturer in the Faculty of Engineering. He was appointed an associate professor in 1997, and subsequently an associate professor in the Graduate School of Natural Science. In 2007 he became an associate professor in the Graduate School of Engineering. In 1994–1995 he was a visiting researcher at the University of California, Santa Cruz. He is engaged in research on VLSI layout, stochastic optimization methods, and neural networks. He holds a D.Eng. degree and is a member of IEEE, INNS, IEICE, and SICE.

Takashi Okamoto (member) received a bachelor's degree from the Department of Physics and Informatics of Keio University in 2003 and completed the doctoral program at the Graduate School of Science and Engineering in 2007. He was a JSPS research fellow (DC2) from 2006. In 2007 he was appointed an assistant professor in the Graduate School of Engineering of Chiba University. He received a 2006 SICE Academic Scholarship and Research Award. He is engaged in research on optimization methods for computational modeling of nonlinear dynamic systems. He holds a D.Eng. degree and is a member of SICE.

Hironori Hirata (senior member) completed the doctoral program in electrical engineering at Tokyo Institute of Technology in 1976 and became a lecturer in the Faculty of Engineering at Chiba University. He became an associate professor in 1981 and a professor in 1994. He became a professor in the Graduate School of Natural Science in 1997, and a professor in the Graduate School of Engineering in 2007. He is interested in the basic theory of modeling, analysis, and design of large-scale systems, in particular ecological systems, VLSI layout, and distributed systems. He received an IEEJ Progress Prize in 2001. He holds a D.Eng. degree and is a member of IEEE (Fellow), INNS, IEICE, IPSJ, SICE, ISCIE, and the Japanese Society for Mathematical Biology (JSMB).


