The XOR truth table:

x  y  z
0  0  0
0  1  1
1  0  1
1  1  0
Summary

The perceptron is an artificial neuron. It takes a weighted sum of its inputs and outputs 1 if the sum is greater than the threshold, else 0. Hebbian learning (increasing the effectiveness of active junctions) is the predominant approach: learning corresponds to adjusting the values of the weights. Perceptrons form feedforward, supervised networks. The values +1, -1 can be used instead of 0, 1. A single perceptron can only solve problems that are linearly separable, and therefore fails on XOR.

Further Reading
1. Parallel Distributed Processing, Volume 1. J.L. McClelland & D.E. Rumelhart. MIT Bradford Press, 1986. An excellent, broad-ranging book that covers many areas of neural networks. It was the book that signalled the resurgence of interest in neural systems.
2. The Organization of Behavior. Donald Hebb. 1949. Contains Hebb's original ideas regarding learning by reinforcement of active neurons.
3. Perceptrons. M. Minsky & S. Papert. MIT Press, 1969. The criticisms of single-layer perceptrons are laid out in this book. A very interesting read, if a little too mathematical in places for some tastes.
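Before moving on, the perceptron summarised above can be sketched in a few lines of Python. This is an illustrative sketch only; the function names and the learning-rate and epoch settings are our own choices, not from the notes.

```python
# A minimal perceptron: weighted sum of inputs, output 1 if the sum
# exceeds the threshold, else 0.

def perceptron(inputs, weights, threshold):
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > threshold else 0

def train(samples, n_inputs, rate=0.1, epochs=100):
    """Perceptron learning rule: nudge weights on misclassified samples."""
    weights = [0.0] * n_inputs
    threshold = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - perceptron(inputs, weights, threshold)
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            threshold -= rate * error  # the threshold moves opposite to a bias weight
    return weights, threshold

# AND is linearly separable, so the perceptron learns it...
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, t = train(AND, 2)
print([perceptron(x, w, t) for x, _ in AND])   # [0, 0, 0, 1]
# ...but XOR is not linearly separable, and the same procedure
# never converges to a correct classifier for it.
```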
THE MULTILAYER PERCEPTRON

The XOR Problem

An initial approach would be to use more than one perceptron, each set up to identify small, linearly separable sections of the inputs, then combine their outputs in another perceptron, which would produce a final indication of the class to which the input belongs.
Fig. 16. Combining perceptrons can solve the XOR problem. Perceptron 1 detects when the pattern corresponding to (0,1) is present, and perceptron 2 detects when (1,0) is there. Combined, these two facts allow perceptron 3 to classify the input correctly.
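The scheme of Fig. 16 can be hand-wired as three step-threshold units. The weights and thresholds below are one illustrative choice of our own, not values taken from the figure:

```python
# Three hand-wired perceptrons solving XOR, in the spirit of Fig. 16.

def step(total, threshold):
    return 1 if total > threshold else 0

def xor(x, y):
    p1 = step(-1 * x + 1 * y, 0.5)   # perceptron 1: fires only on (0, 1)
    p2 = step( 1 * x - 1 * y, 0.5)   # perceptron 2: fires only on (1, 0)
    return step(p1 + p2, 0.5)        # perceptron 3: fires if either fired

for x in (0, 1):
    for y in (0, 1):
        print(x, y, xor(x, y))   # reproduces the XOR truth table
```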
Note

The concept seems fine on first examination, but we have to modify the perceptron model for the following reasons. Each neuron in the structure takes the weighted sum of its inputs, thresholds it, and outputs a one or a zero. The perceptrons in the first layer take their inputs from the actual inputs to the network, while the perceptrons in the second layer take as their inputs the outputs from the first layer. In consequence, the perceptrons in the second layer do not know which of the real inputs were on and which were not. Since learning corresponds to strengthening the connections between active inputs and active units, it is impossible to strengthen the correct parts of the network: the actual inputs are effectively masked off from the second-layer units. The hard-limiting threshold function removes the information that is needed if the network is to learn successfully - this is the credit assignment problem.

The solution
If we smooth the thresholding process out, so that the unit still more or less turns on or off as before but has a sloping region in the middle, the output will carry some information about the inputs. We will then be able to determine when we need to strengthen or weaken the relevant weights - the network will be able to learn.
Fig. 17. The linear (hard-limiting) threshold function and the sigmoidal threshold function, both ranging from 0 to 1.
The multilayer perceptron consists of three layers: an input layer, a hidden layer, and an output layer. Each unit in the hidden layer and the output layer is like a perceptron unit, except that the thresholding function is the sigmoid function shown in figure 17, not the step function as before. The units in the input layer serve only to distribute the values they receive to the next layer, and so do not perform a weighted sum or threshold. Because of this new thresholding function, we are forced to alter our learning rule.
The new learning rule is known as the generalised delta rule, or the backpropagation rule (Rumelhart, Hinton and Williams 1986; Parker 1982; Werbos 1974).
The operation of the network is similar to that of the single-layer perceptron, but the learning rule is a little more complex than the previous one.
We need to define an error function that represents the difference between the network's current output and the correct output that we want it to produce. Because we need to know the correct pattern, this type of learning is known as supervised learning. In order to learn successfully we want to continually reduce the value of the error function; this is achieved by adjusting the weights on the links between units. The generalised delta rule does this by calculating the value of the error function for a particular input, and then backpropagating the error from one layer to the previous one in order to adjust the weights. For units on the output layer, the actual output and the desired output are both known, so adjusting the weights is relatively simple; for units in the middle layer, the adjustment is less obvious.
The Mathematics

Notation

$E_p$ - the error function for pattern p.
$t_{pj}$ - the target output for pattern p on node j.
$o_{pj}$ - the actual output for pattern p on node j.
$w_{ij}$ - the weight from node i to node j.
$E_p = \frac{1}{2} \sum_j (t_{pj} - o_{pj})^2$
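As a quick sanity check, this sum-of-squares error translates directly into code (the function name and the sample numbers are our own):

```python
# E_p = 0.5 * sum over output nodes j of (t_pj - o_pj)^2
def error(targets, outputs):
    return 0.5 * sum((t - o) ** 2 for t, o in zip(targets, outputs))

# e.g. targets (1, 0) against actual outputs (0.8, 0.3):
print(error([1.0, 0.0], [0.8, 0.3]))   # 0.5 * (0.04 + 0.09) ≈ 0.065
```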
$net_{pj} = \sum_i w_{ij} o_{pi}$

$o_{pj} = f_j(net_{pj})$
By the chain rule,

$\frac{\partial E_p}{\partial w_{ij}} = \frac{\partial E_p}{\partial net_{pj}} \frac{\partial net_{pj}}{\partial w_{ij}}$

and

$\frac{\partial net_{pj}}{\partial w_{ij}} = \frac{\partial}{\partial w_{ij}} \sum_k w_{kj} o_{pk} = o_{pi}$
Defining the change of error as a function of the change in the net input to a unit,

$\delta_{pj} = -\frac{\partial E_p}{\partial net_{pj}}$

gives

$-\frac{\partial E_p}{\partial w_{ij}} = \delta_{pj} o_{pi}$

Decreasing the value of $E_p$ therefore means making the weight changes proportional to $\delta_{pj} o_{pi}$:

$\Delta_p w_{ij} = \eta \, \delta_{pj} o_{pi}$
We now need to know what $\delta_{pj}$ is for each of the units - if we know this, then we can decrease $E_p$. By the chain rule again,

$\delta_{pj} = -\frac{\partial E_p}{\partial net_{pj}} = -\frac{\partial E_p}{\partial o_{pj}} \frac{\partial o_{pj}}{\partial net_{pj}}$

Now

$\frac{\partial o_{pj}}{\partial net_{pj}} = f_j'(net_{pj})$

and, for output units,

$\frac{\partial E_p}{\partial o_{pj}} = -(t_{pj} - o_{pj})$

so that

$\delta_{pj} = f_j'(net_{pj})(t_{pj} - o_{pj})$
If a unit is not an output unit we can write, by the chain rule again,

$\frac{\partial E_p}{\partial o_{pj}} = \sum_k \frac{\partial E_p}{\partial net_{pk}} \frac{\partial net_{pk}}{\partial o_{pj}} = \sum_k \frac{\partial E_p}{\partial net_{pk}} \frac{\partial}{\partial o_{pj}} \sum_i w_{ik} o_{pi} = -\sum_k \delta_{pk} w_{jk}$

so that

$\delta_{pj} = f_j'(net_{pj}) \sum_k \delta_{pk} w_{jk}$
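The two delta formulas translate directly into code. This is a bare transcription; the function names, the generic derivative argument, and the sample numbers are all our own:

```python
# delta for an output unit: delta_pj = f'(net_pj) * (t_pj - o_pj)
def delta_output(f_prime_net, target, output):
    return f_prime_net * (target - output)

# delta for a hidden unit: delta_pj = f'(net_pj) * sum_k delta_pk * w_jk
def delta_hidden(f_prime_net, deltas_next, weights_to_next):
    return f_prime_net * sum(d * w for d, w in zip(deltas_next, weights_to_next))

print(delta_output(0.25, 1.0, 0.5))         # 0.125
print(delta_hidden(0.25, [0.125], [2.0]))   # 0.0625
```

Note how the hidden-unit delta is built entirely out of the deltas of the layer above it, which is why the errors must be computed backwards from the output layer.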
The two equations above give the change in the error function with respect to the weights in the network, for output units and for hidden units respectively.
The sigmoid function

$o_{pj} = f(net) = \frac{1}{1 + \exp(-k \cdot net)}$

with

$f'(net) = \frac{k \exp(-k \cdot net)}{(1 + \exp(-k \cdot net))^2} = k f(net)(1 - f(net))$

and finally

$f'(net) = k \, o_{pj} (1 - o_{pj})$
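A short numerical check of the sigmoid and the derivative identity $f'(net) = k f(net)(1 - f(net))$, taking $k = 1$; the test point and finite-difference step are our own choices:

```python
import math

def sigmoid(net, k=1.0):
    return 1.0 / (1.0 + math.exp(-k * net))

def sigmoid_prime(net, k=1.0):
    # uses the identity f'(net) = k * f(net) * (1 - f(net))
    s = sigmoid(net, k)
    return k * s * (1.0 - s)

# Central finite difference at net = 0.7 agrees with the identity:
h = 1e-5
numeric = (sigmoid(0.7 + h) - sigmoid(0.7 - h)) / (2 * h)
print(abs(numeric - sigmoid_prime(0.7)) < 1e-8)   # True
```

The identity matters in practice: during backpropagation the derivative can be computed from the unit's output alone, with no extra call to exp.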
The error function is proportional to the errors $\delta_{pj}$ in subsequent units, so the error must be calculated in the output units first, and then passed back through the net to the earlier units to allow them to alter their connection weights. It is this passing back of the error value that leads to networks of this kind being referred to as back-propagation networks.
THE MULTILAYER PERCEPTRON ALGORITHM

1. Initialise weights and thresholds to small random values.

2. Present input $X_p = (x_0, x_1, x_2, \ldots, x_{n-1})$ and target output $T_p = (t_0, t_1, t_2, \ldots, t_{m-1})$, where n is the number of input nodes and m is the number of output nodes. Set $w_0 = -\theta$, the bias, and $x_0 = 1$. For pattern association, $X_p$ and $T_p$ represent the patterns to be associated. For classification, $T_p$ is set to zero except for one element, set to 1, that corresponds to the class that $X_p$ is in.

3. Calculate the actual output. Each layer calculates

$y_{pj} = f\left(\sum_{i=0}^{n-1} w_{ij} x_i\right)$

and passes that as input to the next layer. The final layer outputs values $o_{pj}$.

4. Adapt weights. Start from the output layer, and work backwards:
$w_{ij}(t+1) = w_{ij}(t) + \eta \, \delta_{pj} o_{pi}$

where $w_{ij}(t)$ represents the weight from node i to node j at time t, $\eta$ is a gain term, and $\delta_{pj}$ is an error term for pattern p on node j.

For output units,

$\delta_{pj} = k \, o_{pj}(1 - o_{pj})(t_{pj} - o_{pj})$

For hidden units,

$\delta_{pj} = k \, o_{pj}(1 - o_{pj}) \sum_k \delta_{pk} w_{jk}$
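The whole algorithm can be sketched end-to-end as a small 2-2-1 network trained on XOR. This is a from-scratch illustration under our own choices of learning rate, slope k, initial weight range, seed, and epoch count, not a reference implementation from the notes:

```python
import math
import random

random.seed(1)
K = 1.0      # sigmoid slope k
ETA = 0.5    # gain term (learning rate)

def f(net):
    return 1.0 / (1.0 + math.exp(-K * net))

# Step 1: small random weights; index 0 of each weight vector is the
# bias weight w_0 = -theta, paired with the constant input x_0 = 1.
hidden = [[random.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(2)]
output = [random.uniform(-0.5, 0.5) for _ in range(3)]

patterns = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def forward(x):
    # Step 3: each layer takes f of the weighted sum and passes it on.
    hx = [1.0] + list(x)
    h = [f(sum(w * v for w, v in zip(ws, hx))) for ws in hidden]
    o = f(sum(w * v for w, v in zip(output, [1.0] + h)))
    return h, o

def train_epoch():
    # Step 4: adapt weights, output layer first, then work backwards.
    for x, t in patterns:
        h, o = forward(x)
        d_out = K * o * (1 - o) * (t - o)                    # output delta
        d_hid = [K * h[j] * (1 - h[j]) * d_out * output[j + 1]
                 for j in range(2)]                          # hidden deltas
        for i, v in enumerate([1.0] + h):
            output[i] += ETA * d_out * v
        hx = [1.0] + list(x)
        for j in range(2):
            for i in range(3):
                hidden[j][i] += ETA * d_hid[j] * hx[i]

def total_error():
    return sum(0.5 * (t - forward(x)[1]) ** 2 for x, t in patterns)

before = total_error()
for _ in range(5000):
    train_epoch()
print(total_error() < before)   # True: the error falls as the network learns
```

Note that the hidden deltas are computed from the output delta and the old output-layer weights before those weights are updated, matching the backwards order of step 4.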
The XOR Problem Revisited

The two-layer net shown in figure 19 is able to produce the correct output. The connection weights are shown on the links, and the threshold of each unit is shown inside the unit.
Fig. 19. A hand-crafted solution to the XOR problem: the hidden unit has threshold 1.5 and weights +1, +1 from the inputs; the output unit has threshold 0.5, weights +1, +1 from the inputs, and weight -2 from the hidden unit.
Fig. 20. Weights and thresholds of a network that has learnt to solve the XOR problem.