
1 Brief Introduction
2 Backpropagation Algorithm
3 A Simple Illustration
1 Brief Introduction

1.1 History
1.2 Review of Decision Trees
• The learning process aims to reduce the error, which can be understood as the difference between the target values and the output values produced by the learning structure.
• The ID3 algorithm can be applied only to discrete-valued attributes.
• An Artificial Neural Network (ANN), in contrast, can approximate arbitrary functions.
1.3 Basic Structure
• This example of ANN learning is provided by Pomerleau's (1993) system ALVINN, which uses a learned ANN to steer an autonomous vehicle driving at normal speeds. The input to the ANN is a 30x32 grid of pixel intensities obtained from a forward-facing camera mounted on the vehicle. The output is the direction in which the vehicle is steered.
• As can be seen, four units receive inputs directly from all of the 30x32 pixels from the vehicle's camera. These are called "hidden" units because their outputs are available only to the units downstream in the network, and not as part of the global network output.
1.4 Ability
• Instances are represented by many attribute-value pairs. The target function to be learned is defined over instances that can be described by a vector of predefined features, such as the pixel values in the ALVINN example.
• The training examples may contain errors. As the following sections show, ANN learning methods are quite robust to noise in the training data.
• Long training times are acceptable. Compared to decision tree learning, network training algorithms typically require longer training times, depending on factors such as the number of weights in the network.
2 Backpropagation Algorithm
2.1 Sigmoid
• Like the perceptron, the sigmoid unit first computes a linear combination of its inputs:

    $net = \sum_{i=0}^{n} w_i x_i$    (1)

• The sigmoid unit then computes its output with the following function:

    $o = \sigma(net) = \frac{1}{1 + e^{-net}}$    (2)
• Equation (2) is often referred to as the squashing function, since it maps a very large input domain onto a small range of outputs.

• The sigmoid function has the useful property that its derivative is easily expressed in terms of its output:

    $\frac{d\,\sigma(net)}{d\,net} = \sigma(net)\,\big(1 - \sigma(net)\big)$

  As the following description of backpropagation shows, the algorithm makes use of this derivative.
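Because backpropagation relies on this derivative, here is a minimal Python sketch of the sigmoid and its derivative (the code is illustrative and not part of the original slides):

```python
import math

def sigmoid(net):
    """Squashing function: maps any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-net))

def sigmoid_derivative(net):
    """d(sigma)/d(net) = sigma(net) * (1 - sigma(net)):
    the derivative is expressed via the output itself."""
    out = sigmoid(net)
    return out * (1.0 - out)

# Large-magnitude inputs are squashed toward 0 or 1,
# and the derivative vanishes there.
for net in (-10.0, -1.0, 0.0, 1.0, 10.0):
    print(f"net={net:+5.1f}  sigma={sigmoid(net):.4f}  "
          f"sigma'={sigmoid_derivative(net):.4f}")
```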
2.2 Function
• The sigmoid is only one unit in the network; now we look at the whole function that the neural network computes. Figure 2.2 shows the network; if we consider an example (x, t), where x is called the input attribute and t the target attribute, then the network maps x to an output, as sketched below.
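As a sketch, assuming the network of Figure 2.2 has a single hidden layer of sigmoid units (the architecture the backpropagation discussion below also uses), the output of unit k for an input $\vec{x}$ is:

    $o_k(\vec{x}) = \sigma\Big(\sum_{h} w_{kh}\,\sigma\Big(\sum_{i} w_{hi}\, x_i\Big)\Big)$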
2.3 Squared Error
• As mentioned above, the whole learning process serves to reduce the error, but how can the error be described? Generally, the squared-error function is used:

    $E(\vec{w}) = \frac{1}{2} \sum_{d \in D} \sum_{k \in outputs} (t_{kd} - o_{kd})^2$    (3)
• Notice: Equation (3) sums the error over all of the network's output units after a whole set D of training examples has been processed.
• The weight vector can then be updated by

    $\vec{w} \leftarrow \vec{w} + \Delta\vec{w}$, with $\Delta\vec{w} = -\eta\,\nabla E(\vec{w})$

• where $\nabla E(\vec{w})$ is the gradient of E:

    $\nabla E(\vec{w}) = \Big[\frac{\partial E}{\partial w_0}, \frac{\partial E}{\partial w_1}, \dots, \frac{\partial E}{\partial w_n}\Big]$
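A minimal sketch of this batch update for a single linear unit (the learning rate, the toy data, and the initial weights are illustrative assumptions):

```python
# Batch gradient descent on the squared error E for one linear unit.
eta = 0.05                                          # assumed learning rate
examples = [([1.0, 0.5], 1.0), ([0.2, 0.9], 0.0)]   # toy (x, t) pairs
w = [0.1, -0.1]

for epoch in range(100):
    grad = [0.0] * len(w)                           # accumulate over all of D
    for x, t in examples:
        o = sum(wi * xi for wi, xi in zip(w, x))    # linear output
        for i in range(len(w)):
            grad[i] += -(t - o) * x[i]              # dE/dw_i, summed over D
    w = [wi - eta * g for wi, g in zip(w, grad)]    # w <- w - eta * grad E(w)

print("trained weights:", w)
```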
• In practice, however, because Equation (3) sums the error over the whole training set, an algorithm based on it needs more time per update and can easily be trapped in a local minimum. One therefore constructs a new function, the stochastic squared error:

    $E_d(\vec{w}) = \frac{1}{2} \sum_{k \in outputs} (t_k - o_k)^2$
• As can be seen, this function computes the error for only a single example d. The gradient of $E_d(\vec{w})$ is derived componentwise in the same way as $\nabla E(\vec{w})$, and the weights are updated after every example.
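The stochastic variant of the previous sketch updates the weights immediately after each example d rather than once per pass over D (same illustrative data and learning rate):

```python
# Stochastic (per-example) gradient descent for one linear unit.
eta = 0.05
examples = [([1.0, 0.5], 1.0), ([0.2, 0.9], 0.0)]
w = [0.1, -0.1]

for epoch in range(100):
    for x, t in examples:                           # one example at a time
        o = sum(wi * xi for wi, xi in zip(w, x))
        w = [wi + eta * (t - o) * xi for wi, xi in zip(w, x)]

print("trained weights:", w)
```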
2.4 Backpropagation Algorithm
• The learning problem faced by backpropagation is to search a large hypothesis space defined by all possible weight values for all the units in the network. The algorithm is summarized in the following diagram:

[Figure: diagram of the backpropagation algorithm]
• Notice: the error term for hidden unit h is calculated by summing the error terms $\delta_k$ for each output unit influenced by unit h, weighting each $\delta_k$ by $w_{kh}$, the weight from hidden unit h to output unit k:

    $\delta_h = o_h\,(1 - o_h) \sum_{k \in outputs} w_{kh}\,\delta_k$

  This weight characterizes the degree to which hidden unit h is "responsible for" the error in output unit k.
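A minimal Python sketch of one stochastic backpropagation update for a network with a single hidden layer of sigmoid units, following the $\delta$ rules above (bias weights are omitted for brevity; all names and sizes are illustrative):

```python
import math

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

def backprop_step(x, t, w_hidden, w_out, eta=0.1):
    """One backpropagation update for a single training example (x, t).
    w_hidden[h][i] is the weight from input i to hidden unit h;
    w_out[k][h] is the weight from hidden unit h to output unit k."""
    # Forward pass.
    h_out = [sigmoid(sum(w_hidden[h][i] * x[i] for i in range(len(x))))
             for h in range(len(w_hidden))]
    o_out = [sigmoid(sum(w_out[k][h] * h_out[h] for h in range(len(h_out))))
             for k in range(len(w_out))]

    # Output error terms: delta_k = o_k (1 - o_k) (t_k - o_k).
    delta_k = [o * (1 - o) * (tk - o) for o, tk in zip(o_out, t)]

    # Hidden error terms: delta_h = o_h (1 - o_h) * sum_k w_kh delta_k.
    delta_h = [h_out[h] * (1 - h_out[h]) *
               sum(w_out[k][h] * delta_k[k] for k in range(len(delta_k)))
               for h in range(len(h_out))]

    # Weight updates: w_ji <- w_ji + eta * delta_j * x_ji.
    for k in range(len(w_out)):
        for h in range(len(h_out)):
            w_out[k][h] += eta * delta_k[k] * h_out[h]
    for h in range(len(w_hidden)):
        for i in range(len(x)):
            w_hidden[h][i] += eta * delta_h[h] * x[i]
```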
3 A Simple Illustration

[Figure and table: an 8 x 3 x 8 network trained on the identity function, together with the learned hidden-unit values for each of the eight input patterns]
• This 8 x 3 x 8 network was trained to learn the identity function. After 5,000 training iterations, the three hidden-unit values encode the eight distinct inputs using the encoding shown in the table. Notice that if the encoded values are rounded to zero or one, the result is the standard binary encoding for eight distinct values.
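A hedged reconstruction of this experiment, reusing the backprop_step and sigmoid sketches from Section 2.4: only the 8 x 3 x 8 shape, the identity task, and the 5,000 iterations come from the text; the learning rate and weight initialization are assumptions, and since the sketch omits bias weights, the learned code may not round as cleanly as the original table.

```python
import random

random.seed(0)
inputs = [[1.0 if i == j else 0.0 for i in range(8)] for j in range(8)]
w_hidden = [[random.uniform(-0.5, 0.5) for _ in range(8)] for _ in range(3)]
w_out = [[random.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(8)]

for _ in range(5000):                       # 5,000 passes over the 8 patterns
    for x in inputs:
        backprop_step(x, x, w_hidden, w_out, eta=0.3)   # identity: t = x

# Print each input's learned three-value hidden encoding.
for x in inputs:
    h = [sigmoid(sum(w_hidden[j][i] * x[i] for i in range(8)))
         for j in range(3)]
    print(x.index(1.0), [round(v, 2) for v in h])
```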
