
BP Neural Network Principle and MATLAB Simulation


Xiong Xin, Nie Mingxin
School of Information Engineering, Wuhan University of Technology
Wuhan, P. R. China 430070
sense1559@163.com, niemx@sohu.com
Abstract
This paper introduces the prevalent BP algorithm in neural networks, discusses the advantages and disadvantages and the training process of the BP neural network, and on this basis uses the MATLAB software to simulate digit recognition. Finally, several improved BP training algorithms are compared.
Keywords: BP neural network; digit recognition; MATLAB

1 Introduction

The development of neural networks has been rapid since the first neural network model, the MP model, came up in 1943 [1]. The Hopfield neural network proposed in 1982 and the error back-propagation algorithm proposed by Rumelhart in 1985 made the Hopfield model and the multilayer feedforward model the prevalent neural network models. They are effective in many fields of application, such as speech recognition, pattern recognition, image processing and industrial control.

A neural network imitates the biological processing model so as to obtain the function of intelligent information processing. It deals with pattern information that is hard to express in definite language, working bottom-up and in a parallel, distributed way formed through self-learning, self-organization and non-linear dynamics. A neural network is a parallel and distributed information-processing architecture. It is generally composed of massive neurons, each of which has a single output that can connect to many other neurons. The interaction between neurons is embodied in their connection weights, and the output of a neuron is a function of its input. The function types in common use are the linear function, the Sigmoid function and the threshold function.

There are two phases in the learning process of a BP neural network: forward transmission and error back-propagation [2]. The signal input from outside spreads to the output layer and gives the result after being processed layer by layer by the neurons of the input layer and the hidden layer. If the expected output cannot be obtained at the output layer, the process shifts to backward spreading: the error between the true value and the output of the network is returned along the former coupled connections, and the error is reduced by modifying the connection weights of the neurons in every layer. The process then shifts back to forward spreading, and the iteration revolves until the error is smaller than the given value. Take a three-layer network for example: the network is composed of N input neurons, K hidden neurons and M output neurons (as shown in Fig.1). O2pm and O1pk are the output values of the output layer and the hidden layer respectively; w2km and w1nk are the connection weights from the hidden layer to the output layer and from the input layer to the hidden layer respectively. Suppose the input learning sample is Xpn; its corresponding expected output value is tpm.

Fig.1 BP neural network configuration
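As a concrete illustration of the neuron model just described, the following short MATLAB sketch (our own illustration, not part of the paper) computes a single neuron's output from its weighted inputs under the three common activation-function types named above:

% A single neuron: output = f(weighted sum of inputs plus bias).
x = [0.5; -0.2; 0.8];                 % outputs of three upstream neurons
w = [0.1;  0.4; -0.3];                % connection weights to this neuron
b = 0.05;                             % bias (threshold) term
s = w.'*x + b;                        % weighted input of the neuron
linear_out    = s;                    % linear activation
sigmoid_out   = 1/(1 + exp(-s));      % Sigmoid activation
threshold_out = double(s >= 0);       % hard-limit (threshold) activation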


2 BP Neural Network

2.1 The Discussion about the Advantages and Disadvantages of BP Neural Network
The BP neural network is the form of neural network with the most applications currently [3], but it is not perfect. In order to understand how to apply the neural network to resolve problems, we discuss its advantages and disadvantages here.
The advantages of the BP neural network:
(1) The network realizes a mapping function from input to output, and mathematical theory has proved that it can achieve any complex non-linear mapping;
(2) The network can extract the solution rules automatically by studying examples with correct results, i.e. it has the ability of self-learning;
(3) The network has definite abilities of promotion and generalization.
The disadvantages of the BP neural network:
(1) The learning speed of the BP algorithm is very slow. The main causes are:
a. Because the BP algorithm is essentially a gradient-descent method and the objective function it optimizes is very complex, a sawtooth-shaped phenomenon is bound to appear, which makes the BP algorithm inefficient;
b. A torpid phenomenon exists. Because the optimized objective function is very complex, flat areas appear where the outputs of neurons approach 0 or 1. In these areas the error gradient with respect to the weights changes very little, so the training process can hardly make progress;
c. In order to execute the BP algorithm in the network, we cannot use the traditional one-dimensional search method to solve for the iterative step length every time; we must give the network the update rules of the step length in advance, which also makes the algorithm inefficient.
(2) The network training is much more likely to fail, for the reasons below:
a. From the perspective of mathematics, the BP algorithm is a kind of local searching optimization method, yet it is used to find the global extremum of a complex non-linear function, so the algorithm is likely to get trapped in a local extremum and make the training fail;
b. The approximating and generalizing abilities are closely linked with how representative the learning samples are, and it is a hard problem to choose a training collection composed of typical sample examples.
(3) The contradiction between the scale of the examples and the network is hard to resolve; it concerns the relationship between the possibility and the feasibility of the network capacity, viz. the problem of learning complexity.
(4) The choice of the network configuration still has no uniform and complete theoretical guidance, and it can only be selected by experience; therefore some people call the structure choice of a neural network a kind of art. The network structure affects the approximating ability and generalizing character directly, so how to choose an appropriate network structure is an important problem.
(5) New samples can affect a network that has already been trained successfully, and the number of features describing every input sample must be equal.
(6) There is a contradiction between the predictive ability of the network (also called generalization ability or promoting ability) and its training ability (also called approximating ability or learning ability). Usually when the training ability is poor, the predictive ability is poor too, and to a certain extent the predictive ability improves along with the training ability. However, this trend has a limit: beyond it, the predictive ability declines as the training ability keeps improving, which is called the over-fitting phenomenon. The network has then learned too much of the samples' detail and cannot reflect the laws embedded in the samples.
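A common countermeasure to this over-fitting trade-off is early stopping on a held-out validation set. The following MATLAB sketch is our own illustration, not part of the paper; the divideParam fields assume a relatively recent version of the neural network toolbox, and P and T are the sample and target matrices defined later in section 3.2.

% Hedged sketch: early stopping with a validation split to curb over-fitting.
net = newff(minmax(P),[10,4],{'logsig','logsig'},'trainlm');
net.divideFcn = 'dividerand';        % split the samples at random
net.divideParam.trainRatio = 0.70;   % used for the weight updates
net.divideParam.valRatio   = 0.15;   % monitored to stop training early
net.divideParam.testRatio  = 0.15;   % held out for a final check
net = train(net,P,T);                % stops when the validation error rises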


2.2 BP Network Algorithm
The training process of the BP network is as below [4]:
(1) Initialization: endow every connection weight and threshold value with a small random value.
(2) Input a component of an eigenvector Xpk = (Xpk1, Xpk2, Xpk3, ..., Xpkn) to the corresponding neurons of the input layer.
(3) Use the eigenvector of the input sample to calculate the corresponding output values O1pk = f(Xpkn) of the neurons of the hidden layer.
(4) Use each unit output O1pk of the hidden layer to calculate the input value of each unit of the output layer, and then further calculate the corresponding output O2pm = f(O1pk) of each unit of the output layer.
(5) Calculate the generalized error of each unit of the output layer via the teaching signals.
(6) Use the connection weights W2km between the middle layer and the output layer, the generalized error of each unit of the output layer and the output O1pk of each unit of the middle layer to calculate the generalized error of each unit of the middle layer.
(7) Use the generalized error of each unit of the output layer and the output O1pk of each unit of the middle layer to modify the weights w2km between the output layer and the middle layer and the threshold value of each unit of the output layer.
(8) Use the generalized error of each unit of the middle layer and the input Xpkn of each unit of the input layer to modify the weights w1nk between the input layer and the middle layer and the threshold value of each unit of the hidden layer.
(9) Select the next sample in order and return to (2) until all the samples of the training collection have been studied.
(10) Return to (2) over again until the error function is lower than the predetermined value, viz. the network converges, or until the number of study times is greater than the predetermined value.
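The steps above translate directly into code. The following MATLAB sketch is our own minimal from-scratch illustration of steps (1)~(10) for one hidden layer with Sigmoid units and a squared-error criterion; the variable names (W1, W2, eta and so on) and the parameter values are our assumptions, not the paper's, and X and T are assumed to hold the input samples and teaching signals column by column.

% Minimal from-scratch BP training loop following steps (1)~(10).
sigm = @(s) 1 ./ (1 + exp(-s));                  % Sigmoid activation
[N,numSamples] = size(X); K = 10; M = size(T,1); % layer sizes
eta = 0.5; goal = 0.1; maxEpochs = 5000;         % assumed parameters
W1 = 0.1*randn(K,N); b1 = 0.1*randn(K,1);        % step (1): small random values
W2 = 0.1*randn(M,K); b2 = 0.1*randn(M,1);
for epoch = 1:maxEpochs                          % step (10): repeat until done
    E = 0;
    for p = 1:numSamples                         % step (9): samples in order
        x = X(:,p); t = T(:,p);                  % step (2): present one sample
        o1 = sigm(W1*x + b1);                    % step (3): hidden-layer outputs
        o2 = sigm(W2*o1 + b2);                   % step (4): output-layer outputs
        d2 = (t - o2).*o2.*(1 - o2);             % step (5): output-layer error
        d1 = (W2.'*d2).*o1.*(1 - o1);            % step (6): hidden-layer error
        W2 = W2 + eta*d2*o1.'; b2 = b2 + eta*d2; % step (7): output-side update
        W1 = W1 + eta*d1*x.';  b1 = b1 + eta*d1; % step (8): input-side update
        E = E + 0.5*sum((t - o2).^2);            % accumulate the squared error
    end
    if E < goal, break; end                      % converged below the goal
end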
3 The Applications of BP Network in Digit Recognition
Firstly, the digits 0~9 should be digitized to constitute the input samples. A 0-1 image on a 5x5 matrix represents every digit clearly, so we design and train a neural network that can recognize the ten digits 0~9. When the trained network is given an input that represents some digit, the network reports this digit correctly through the 8421 BCD code at the output terminal. The network memorizes all ten digits through study and training; the training is supervised, using ten groups of arrays that represent the digits 0~9 and showing the corresponding four-bit binary numbers at the output terminal.

3.1 The Design of the Network Structure
From the analysis above, the neural network needs N = 5x5 input neurons and M = 4 output neurons. A two-layer logsig/logsig network adopting the logarithmic Sigmoid-type activation function with range (0,1) is quite effective for 0/1 Boolean values. So the network may adopt the N-K-M structure, where the hidden layer is a single layer, N is the number of neurons of the input layer, K is the number of neurons of the hidden layer and M is the number of neurons of the output layer. The digits 0~9 can be represented by 0-1 charts as:

le0 = [1 1 1 1 1    le1 = [0 0 1 0 0
       1 0 0 0 1           0 0 1 0 0
       1 0 0 0 1           0 0 1 0 0
       1 0 0 0 1           0 0 1 0 0
       1 1 1 1 1]          0 0 1 0 0]

le2 = [1 1 1 1 1    le3 = [1 1 1 1 1
       0 0 0 0 1           0 0 0 0 1
       1 1 1 1 1           1 1 1 1 1
       1 0 0 0 0           0 0 0 0 1
       1 1 1 1 1]          1 1 1 1 1]

le4 = [1 0 0 0 1    le5 = [1 1 1 1 1
       1 0 0 0 1           1 0 0 0 0
       1 1 1 1 1           1 1 1 1 1
       0 0 0 0 1           0 0 0 0 1
       0 0 0 0 1]          1 1 1 1 1]

le6 = [1 1 1 1 1    le7 = [1 1 1 1 1
       1 0 0 0 0           0 0 0 1 0
       1 1 1 1 1           0 0 1 0 0
       1 0 0 0 1           0 1 0 0 0
       1 1 1 1 1]          1 0 0 0 0]

le8 = [1 1 1 1 1    le9 = [1 1 1 1 1
       1 0 0 0 1           1 0 0 0 1
       1 1 1 1 1           1 1 1 1 1
       1 0 0 0 1           0 0 0 0 1
       1 1 1 1 1]          1 1 1 1 1]
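The 8421 BCD target codes that go with these charts need not be typed by hand; the one-line MATLAB sketch below (our own convenience, not from the paper) builds the 4x10 target matrix used later as T.

% Build the 8421 BCD target matrix: column p holds the code of digit p-1.
T = double(dec2bin(0:9,4).' == '1');   % 4-by-10 matrix of 0s and 1s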


3.2 Network MATLAB Simulation
For the applied BP network above, we can use the functions of the MATLAB neural network toolbox to simulate it [5]. The neural network toolbox is one of the toolboxes under the MATLAB environment. Based on artificial neural network theory, it constructs in the MATLAB language the excitation functions of typical neural networks, such as the S-type, linear, competitive and saturated-linear functions. With these functions the designer can turn the output calculation of the selected network into calls to the excitation (transfer) functions.
The hidden layer gets 10 neurons, relying on experience, so the network has a 25-10-4 structure. The input layer has 25 neurons because each digit is represented by the 0-1 chart of a 5x5 matrix whose elements constitute a numeric column matrix. The 10 digits are then represented by a 25x10 input matrix composed of the 10 column matrices, which is assigned to the variable named P:
P = [le0, le1, le2, le3, le4, le5, le6, le7, le8, le9]
The output layer has 4 neurons, because the objective vector expects that when each digit is input, the corresponding four binary elements appear correctly at the output terminal.
The design parameters of the network are: the largest training times 5000~20000 (according to the training function); training precision for noise-free recognition 0.1; rate of study 0.01; momentum constant 0.95; the biggest error ratio 1.05.
The program flow is as below:
le0=[1 1 1 1 1 1 0 0 0 1 1 0 0 0 1 1 0 0 0 1 1 1 1 1 1]';
le1=[0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0]';
le2=[1 1 1 1 1 0 0 0 0 1 1 1 1 1 1 1 0 0 0 0 1 1 1 1 1]';
le3=[1 1 1 1 1 0 0 0 0 1 1 1 1 1 1 0 0 0 0 1 1 1 1 1 1]';
le4=[1 0 0 0 1 1 0 0 0 1 1 1 1 1 1 0 0 0 0 1 0 0 0 0 1]';
le5=[1 1 1 1 1 1 0 0 0 0 1 1 1 1 1 0 0 0 0 1 1 1 1 1 1]';
le6=[1 1 1 1 1 1 0 0 0 0 1 1 1 1 1 1 0 0 0 1 1 1 1 1 1]';
le7=[1 1 1 1 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0]';
le8=[1 1 1 1 1 1 0 0 0 1 1 1 1 1 1 1 0 0 0 1 1 1 1 1 1]';
le9=[1 1 1 1 1 1 0 0 0 1 1 1 1 1 1 0 0 0 0 1 1 1 1 1 1]';
P=[le0,le1,le2,le3,le4,le5,le6,le7,le8,le9];
T=[0 0 0 0;0 0 0 1;0 0 1 0;0 0 1 1;0 1 0 0;0 1 0 1;0 1 1 0;0 1 1 1;1 0 0 0;1 0 0 1]';
% adopting the feedforward BP network with the
% Levenberg-Marquardt training function trainlm
net=newff(minmax(P),[10,4],{'logsig','logsig'},'trainlm');
net.trainParam.epochs=5000;   % the biggest training times is 5000 here
net.trainParam.goal=0.1;      % target error precision
% The remaining design parameters apply to the gradient-descent training
% functions compared in Table 1 (trainlm ignores them); the field names
% mc and max_perf_inc replace the nonexistent mm and er of the original:
% net.trainParam.lr=0.01;            % rate of study
% net.trainParam.mc=0.95;            % momentum constant
% net.trainParam.max_perf_inc=1.05;  % the biggest error ratio
net=train(net,P,T);
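Once training finishes, the result can be checked by simulating the network on the training patterns and decoding the 8421 BCD outputs back into digits. This verification snippet is our own addition, reusing the net and P of the listing above.

% Simulate the trained network and decode its BCD outputs to digits.
Y = sim(net,P);              % 4-by-10 outputs, each element in (0,1)
bits = round(Y);             % snap every output to 0 or 1
digits = [8 4 2 1]*bits;     % weighted sum decodes each BCD column
disp(digits)                 % should print 0 1 2 ... 9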
The training curve that comes out finally is shown in Fig.2, from which we can see that the network converges to the required precision at great speed. With the objective convergence precision and the other training parameters unchanged, we compare the simulations of the several other BP training functions; the results are shown in Table 1. From the table we can conclude that, on the condition of not affecting the precision, adopting the L-M optimized algorithm is the fastest.

Fig.2 Training curve of the L-M algorithm

Table 1 The comparison of training effects of different BP algorithms

Training   BP training algorithm                   Convergence times /     Precision
function                                           actual training times
traingd    gradient descent                        10000/6842              0.0999982
traingda   gradient descent with self-adaptive     5000/130                0.0971766
           learning rate
traingdm   gradient descent with momentum          20000/18373             0.0999995
traingdx   gradient descent with momentum and      5000/156                0.0957755
           self-adaptive learning rate
trainlm    Levenberg-Marquardt                     5000/7                  0.0994116

The weight adjustment of the Levenberg-Marquardt algorithm is [6]:

Δw = (JᵀJ + μI)⁻¹JᵀE

where J is the Jacobian matrix composed of the derivatives of the errors with respect to the weight values, E is the error vector and μ is a scalar quantity on which the step depends. The method ranges smoothly between two extreme cases: the Newton method (when μ→0) and the steepest-descent method (when μ→∞).
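To make the update rule concrete, the fragment below is our own sketch of a single Levenberg-Marquardt step for a generic error vector; it only restates the formula above and is not the toolbox's trainlm implementation. J, E and w are assumed to be the current Jacobian, error vector and weight vector.

% One Levenberg-Marquardt step following the formula above.
mu = 0.01;                                   % damping scalar, adapted in practice
dw = (J.'*J + mu*eye(size(J,2))) \ (J.'*E);  % dw = (J'J + mu*I)^(-1) * J'E
w  = w + dw;     % small mu: Newton-like step; large mu: steepest-descent-like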

4 Conclusions
Training the network with five kinds of BP algorithms, the results show that in terms of rapidity of convergence the L-M algorithm is the fastest and its precision is the highest, while the other algorithms are relatively not as good as the L-M algorithm. The L-M algorithm is fit for trainings that have a great quantity of samples.
In addition, the reliability of digit recognition using the neural network can be assessed with hundreds of vectors carrying random noise, as sketched below. If a higher recognition precision is required, one can lengthen the network training time so as to reach a higher training error precision, or increase the number of neurons of the hidden layer; otherwise, one can enhance the resolving power of the input vectors, for example by adopting 16x16 lattices.
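The noise test mentioned above can be scripted directly. The sketch below is our own illustration of how such a reliability estimate might be computed, reusing the net and P of section 3.2; the noise level 0.2 and the trial count are assumptions.

% Estimate the recognition rate on digit charts corrupted by random noise.
nTrials = 100; correct = 0;
for k = 1:nTrials
    d = randi([0 9]);                       % pick a random digit
    x = P(:,d+1) + 0.2*randn(25,1);         % add Gaussian noise to its chart
    y = round(sim(net,x));                  % network output snapped to bits
    if [8 4 2 1]*y == d                     % decode the BCD bits and compare
        correct = correct + 1;
    end
end
fprintf('Recognition rate: %.1f%%\n',100*correct/nTrials);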
References:
[1] Ying Liandong. The Design and Application of BP Neural Network [J]. Information Technology, 2003, 27(6): 18-20.
[2] NG S C, CHEUNG C C, LEUNG S H. Fast Convergence for Back-Propagation Network with Magnified Gradient Function [J]. IEEE, 2003, 9(3): 1903-1908.
[3] He Qingbi, Zhou Jianli. The Convergence and Improvements of BP Neural Network [J]. Journal of Chongqing Jiaotong University, 2005, 24(1): 143-145.
[4] Fan Lei, Zhang Yuntao, Chen Zhenjun. Application of Improved BP Neural Network Based on Matlab [J]. Journal of China West Normal University (Natural Sciences), 2005, 26(1): 70-73.
[5] Zhang Dexi, Bi Yuhua. The Application of MATLAB in Pattern Recognition [J]. Journal of Xuchang Teachers College, 2002, 21(5): 43-46.
[6] Yang Zhongjin, Shi Zhongke. Architecture Optimization for Neural Network [J]. Project and Application of Computer, 2004, 25: 52-53.
