
LIMITATIONS OF THE PERCEPTRON

XOR Problem

The failure of the perceptron to solve even a simple problem such as XOR was pointed out by Minsky and Papert.

x  y  z
0  0  0
0  1  1
1  0  1
1  1  0

Fig. 14. The exclusive-or logic symbol and function table.

[Figure: the four input patterns plotted at their (X, Y) coordinates in pattern space, each labelled with its output Z; the two output classes cannot be separated by a single straight line.]

Fig. 15. The XOR problem in pattern space.

Summary

- Perceptron: an artificial neuron. It takes the weighted sum of its inputs and outputs +1 if this is greater than the threshold, otherwise it outputs 0 (a code sketch follows the reading list below).
- Hebbian learning (increasing the effectiveness of active junctions) is the predominant approach; learning corresponds to adjusting the values of the weights.
- Feedforward, supervised networks. Can use +1, -1 instead of 0, 1 values.
- Can only solve problems that are linearly separable, and therefore fails on XOR.

Further Reading
1. Parallel Distributed Processing, Volume 1. J.L. McClelland & D.E. Rumelhart. MIT Bradford Press, 1986. An excellent, broad-ranging book that covers many areas of neural networks. It was the book that signalled the resurgence of interest in neural systems.
2. The Organization of Behavior. Donald Hebb. 1949. Contains Hebb's original ideas regarding learning by reinforcement of active neurons.
3. Perceptrons. M. Minsky & S. Papert. MIT Press, 1969. The criticisms of single-layer perceptrons are laid out in this book. A very interesting read, if a little too mathematical in places for some tastes.
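As flagged in the summary, the following is a minimal code sketch of a single perceptron unit: a weighted sum of the inputs followed by a hard threshold. The function name, weights and threshold are illustrative choices, not values from the text.

```python
# A minimal perceptron unit: weighted sum of inputs, hard threshold.
# Function name, weights and threshold are illustrative only.

def perceptron_output(inputs, weights, threshold):
    """Return 1 if the weighted sum exceeds the threshold, else 0."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum > threshold else 0

# AND is linearly separable, so one perceptron can compute it ...
for x, y in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, y, perceptron_output([x, y], weights=[1, 1], threshold=1.5))
# ... but no single choice of weights and threshold reproduces XOR.
```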

THE MULTILAYER PERCEPTRON

XOR Problem

An initial approach would be to use more than one perceptron, each set up to identify a small, linearly separable section of the inputs, and then to combine their outputs using another perceptron, which would produce a final indication of the class to which the input belongs.

Fig. 16. Combining perceptrons can solve the XOR problem.

Perceptron 1 detects when the pattern corresponding to (0,1) is present, and perceptron 2 detects when (1,0) is there. Combined, these two facts allow perceptron 3 to classify the input correctly.
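One possible hand-wired realisation of figure 16 is sketched below; the weights and thresholds are chosen by hand for illustration and are not given in the text.

```python
# Hand-wired illustration of Fig. 16: two first-layer perceptrons detect
# the patterns (0,1) and (1,0); a third perceptron ORs their outputs.
# All weights and thresholds here are hand-chosen examples.

def step(weighted_sum, threshold):
    return 1 if weighted_sum > threshold else 0

def xor_by_combination(x, y):
    p1 = step(-1 * x + 1 * y, 0.5)   # fires only for input (0, 1)
    p2 = step( 1 * x - 1 * y, 0.5)   # fires only for input (1, 0)
    return step(p1 + p2, 0.5)        # perceptron 3: OR of p1 and p2

for x, y in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, y, "->", xor_by_combination(x, y))   # prints 0, 1, 1, 0
```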

Note

The concept seems fine on first examination, but for the following reasons we have to modify the perceptron model. Each neuron in the structure takes the weighted sum of its inputs, thresholds it, and outputs a one or a zero. For the perceptrons in the first layer, the inputs come from the actual inputs to the network, while the perceptrons in the second layer take as their inputs the outputs from the first layer. In consequence, the perceptrons in the second layer do not know which of the real inputs were on or not. Since learning corresponds to strengthening the connections between active inputs and active units, it is impossible to strengthen the correct parts of the network, because the actual inputs are effectively masked off from the output units by the intermediate layer. The hard-limiting threshold function removes the information that is needed if the network is to learn successfully. This is the credit assignment problem.

The solution

If we smooth the thresholding process out so that it still more or less turns on or off, as before, but has a sloping region in the middle that gives us some information about the inputs, we will be able to determine when we need to strengthen or weaken the relevant weights, and the network will be able to learn.
[Figure: two thresholding functions plotted between 0 and 1: a hard linear (step) threshold and a smooth sigmoidal threshold.]

Fig. 17. Two possible thresholding functions.

THE NEW MODEL

The adapted perceptron units are arranged in layers, and so the new model is naturally enough termed the multilayer perceptron.

Fig. 18. The multilayer perceptron: our new model.


Our model has three layers:

- an input layer
- an output layer
- a hidden layer

Each unit in the hidden layer and the output layer is like a perceptron unit, except that the thresholding function is the sigmoid function shown in figure 17, not the step function used before. The units in the input layer serve only to distribute the values they receive to the next layer, and so do not perform a weighted sum or threshold. As a consequence, we are also forced to alter our learning rule.
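As a sketch of this structure, a forward pass through a small network of this kind might look as follows: the input layer only distributes values, while the hidden and output units take a weighted sum, subtract their threshold, and apply the sigmoid of figure 17. The layer sizes, random weights and function names are placeholders, not values from the text.

```python
import math
import random

# Forward pass through a small multilayer perceptron.
# Layer sizes and random weights are placeholders for illustration.

def sigmoid(net, k=1.0):
    return 1.0 / (1.0 + math.exp(-k * net))

def layer_output(inputs, weights, thresholds):
    """Each unit: weighted sum of its inputs minus its threshold, then sigmoid."""
    outputs = []
    for unit_weights, theta in zip(weights, thresholds):
        net = sum(w * x for w, x in zip(unit_weights, inputs)) - theta
        outputs.append(sigmoid(net))
    return outputs

random.seed(0)
n_in, n_hidden, n_out = 2, 2, 1
w_hidden = [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
w_out = [[random.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_out)]
theta_hidden = [random.uniform(-1, 1) for _ in range(n_hidden)]
theta_out = [random.uniform(-1, 1) for _ in range(n_out)]

x = [1, 0]                                 # the input layer just distributes values
hidden = layer_output(x, w_hidden, theta_hidden)
output = layer_output(hidden, w_out, theta_out)
print(output)
```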

The New Learning Model


The learning rule for the multilayer perceptron is called the generalised delta rule, or the backpropagation rule (Rumelhart, Hinton and Williams 1986; Parker 1982; Werbos 1974).

The operation of the network is similar to that of the single-layer perceptron.

The learning rule is a little more complex than the previous one.

We need to define an error function that represents the difference between the network's current output and the correct output that we want it to produce. Because we need to know the correct output pattern, this type of learning is known as supervised learning. In order to learn successfully we want to continually reduce the value of the error function; this is achieved by adjusting the weights on the links between the units. The generalised delta rule does this by calculating the value of the error function for a particular input pattern and then backpropagating the error from one layer to the previous one in order to adjust the weights. For units actually on the output layer, their output and desired output are both known, so adjusting the weights is relatively simple; for units in the hidden layer, the adjustment is not so obvious.

The Mathematics

Notation

- $E_p$ : the error function for pattern p
- $t_{pj}$ : the target output for pattern p on node j
- $o_{pj}$ : the actual output for pattern p on node j
- $w_{ij}$ : the weight from node i to node j

The error function

$$E_p = \tfrac{1}{2} \sum_j (t_{pj} - o_{pj})^2$$
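For illustration, a direct translation of this formula into code (the function name and example values are invented) might be:

```python
# E_p = 1/2 * sum over output nodes j of (t_pj - o_pj)^2
def pattern_error(targets, outputs):
    return 0.5 * sum((t - o) ** 2 for t, o in zip(targets, outputs))

print(pattern_error([1.0], [0.73]))   # 0.5 * (1 - 0.73)^2 = 0.03645
```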

The activation of each unit j, for pattern p

net pj = wij o pi
i

The output from each unit j is the threshold function $f_j$ acting on the weighted sum:

$$o_{pj} = f_j(\mathrm{net}_{pj})$$

The problem is to find weights that minimise the error function. By the chain rule,

$$\frac{\partial E_p}{\partial w_{ij}} = \frac{\partial E_p}{\partial \mathrm{net}_{pj}}\, \frac{\partial \mathrm{net}_{pj}}{\partial w_{ij}}$$


The second term in the above equation can be calculated directly:

$$\frac{\partial \mathrm{net}_{pj}}{\partial w_{ij}} = \frac{\partial}{\partial w_{ij}} \sum_k w_{kj}\, o_{pk} = \sum_k \frac{\partial w_{kj}}{\partial w_{ij}}\, o_{pk} = o_{pi}$$

since $\partial w_{kj} / \partial w_{ij} = 0$ except when $k = i$, when it equals 1.

Defining the change of error as a function of the change in the net input to a unit,

$$\delta_{pj} = -\frac{\partial E_p}{\partial \mathrm{net}_{pj}},$$

gives

$$\frac{\partial E_p}{\partial w_{ij}} = -\delta_{pj}\, o_{pi}.$$

Decreasing the value of $E_p$ therefore means making the weight changes proportional to $\delta_{pj}\, o_{pi}$:

$$\Delta_p w_{ij} = \eta\, \delta_{pj}\, o_{pi}$$

We now need to know what $\delta_{pj}$ is for each of the units; if we know this, then we can decrease $E_p$. Using the chain rule,

$$\delta_{pj} = -\frac{\partial E_p}{\partial \mathrm{net}_{pj}} = -\frac{\partial E_p}{\partial o_{pj}}\, \frac{\partial o_{pj}}{\partial \mathrm{net}_{pj}}.$$

Now considering the second factor,

$$\frac{\partial o_{pj}}{\partial \mathrm{net}_{pj}} = f_j'(\mathrm{net}_{pj})$$

and, from the definition of $E_p$,

$$\frac{\partial E_p}{\partial o_{pj}} = -(t_{pj} - o_{pj}).$$

So for output units we get

$$\delta_{pj} = f_j'(\mathrm{net}_{pj})\,(t_{pj} - o_{pj}).$$
If a unit is not an output unit we can write, by the chain rule again,

$$\frac{\partial E_p}{\partial o_{pj}} = \sum_k \frac{\partial E_p}{\partial \mathrm{net}_{pk}}\, \frac{\partial \mathrm{net}_{pk}}{\partial o_{pj}} = \sum_k \frac{\partial E_p}{\partial \mathrm{net}_{pk}}\, \frac{\partial}{\partial o_{pj}} \sum_i w_{ik}\, o_{pi} = -\sum_k \delta_{pk}\, w_{jk}$$

and finally

$$\delta_{pj} = f_j'(\mathrm{net}_{pj}) \sum_k \delta_{pk}\, w_{jk}.$$
k

The two equations for $\delta_{pj}$ above, one for output units and one for hidden units, give the change in the error function with respect to the weights in the network.

The sigmoid function is

$$o_{pj} = f(\mathrm{net}) = \frac{1}{1 + \exp(-k\,\mathrm{net})}$$

with

$$f'(\mathrm{net}) = \frac{k \exp(-k\,\mathrm{net})}{\big(1 + \exp(-k\,\mathrm{net})\big)^2} = k\, f(\mathrm{net})\,\big(1 - f(\mathrm{net})\big)$$

and finally

$$f'(\mathrm{net}) = k\, o_{pj}\,(1 - o_{pj}).$$
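A quick numerical check of the identity f'(net) = k f(net)(1 - f(net)), with an arbitrary choice of k and test point, could look like this:

```python
import math

# Sigmoid and its derivative, computed numerically and via the identity
# f'(net) = k f(net)(1 - f(net)).  The k value and test point are arbitrary.

def f(net, k=1.0):
    return 1.0 / (1.0 + math.exp(-k * net))

def f_prime_numeric(net, k=1.0, eps=1e-6):
    return (f(net + eps, k) - f(net - eps, k)) / (2 * eps)   # central difference

def f_prime_identity(net, k=1.0):
    return k * f(net, k) * (1.0 - f(net, k))

net = 0.3
print(f_prime_numeric(net), f_prime_identity(net))   # the two values agree
```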

Note

The error $\delta_{pj}$ for a hidden unit is proportional to the errors $\delta_{pk}$ in the subsequent units. The error should therefore be calculated in the output units first, and then passed back through the net to the earlier units to allow them to alter their connection weights. It is this passing back of the error value that leads to such networks being referred to as back-propagation networks.

THE MULTILAYER PERCEPTRON ALGORITHM

1. Initialise weights and thresholds to small random values.

2. Present input $X_p = (x_0, x_1, x_2, \ldots, x_{n-1})$ and target output $T_p = (t_0, t_1, t_2, \ldots, t_{m-1})$, where n is the number of input nodes and m is the number of output nodes. Set $w_0 = -\theta$, the bias, and $x_0 = 1$. For pattern association, $X_p$ and $T_p$ represent the patterns to be associated. For classification, $T_p$ is set to zero except for one element, set to 1, that corresponds to the class that $X_p$ is in.

3. Calculate the actual output. Each layer calculates
$$y_{pj} = f\Big(\sum_{i=0}^{n-1} w_i x_i\Big)$$
and passes that as input to the next layer. The final layer outputs values $o_{pj}$.

4. Adapt the weights. Start from the output layer, and work backwards:

$$w_{ij}(t+1) = w_{ij}(t) + \eta\, \delta_{pj}\, o_{pi}$$

where $w_{ij}(t)$ represents the weight from node i to node j at time t, $\eta$ is a gain term, and $\delta_{pj}$ is the error term for pattern p on node j.

For output units:
$$\delta_{pj} = k\, o_{pj}\,(1 - o_{pj})\,(t_{pj} - o_{pj})$$

For hidden units:
$$\delta_{pj} = k\, o_{pj}\,(1 - o_{pj}) \sum_k \delta_{pk}\, w_{jk}$$

where the sum is over the k nodes in the layer above node j.
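A compact sketch of the whole algorithm applied to the XOR problem is given below. It assumes k = 1, treats each unit's threshold as an extra weight fed by a constant input of 1 (one way of realising the w_0 = -θ, x_0 = 1 convention above), and uses an illustrative gain term, layer size and epoch count; none of these specific numbers come from the text.

```python
import math
import random

# Backpropagation on XOR: a 2-2-1 network with sigmoid units (k = 1).
# Thresholds are handled as an extra weight with a constant input of 1.
# The gain term (eta), seed and epoch count are illustrative; plain gradient
# descent on XOR can occasionally settle in a local minimum, in which case
# a different seed or a re-run may be needed.

random.seed(1)

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

# w_hidden[j]: weights into hidden unit j from the two inputs plus the bias input
w_hidden = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
w_out = [random.uniform(-1, 1) for _ in range(3)]

patterns = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
eta = 0.5

for epoch in range(20000):
    for x, target in patterns:
        # forward pass
        xb = x + [1]                                              # append bias input
        hidden = [sigmoid(sum(w * v for w, v in zip(ws, xb))) for ws in w_hidden]
        hb = hidden + [1]
        output = sigmoid(sum(w * v for w, v in zip(w_out, hb)))

        # backward pass: delta for the output unit, then for the hidden units
        delta_out = output * (1 - output) * (target - output)
        delta_hidden = [h * (1 - h) * delta_out * w_out[j]
                        for j, h in enumerate(hidden)]

        # weight updates: w_ij(t+1) = w_ij(t) + eta * delta_pj * o_pi
        w_out = [w + eta * delta_out * v for w, v in zip(w_out, hb)]
        for j in range(2):
            w_hidden[j] = [w + eta * delta_hidden[j] * v
                           for w, v in zip(w_hidden[j], xb)]

# after training, the outputs should be close to the XOR targets
for x, target in patterns:
    xb = x + [1]
    hidden = [sigmoid(sum(w * v for w, v in zip(ws, xb))) for ws in w_hidden]
    output = sigmoid(sum(w * v for w, v in zip(w_out, hidden + [1])))
    print(x, target, round(output, 3))
```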

The XOR Problem Revisited

The two-layer net shown in figure 19 is able to produce the correct output. The connection weights are shown on the links, and the threshold of each unit is shown inside the unit.

[Figure: a network with an output unit (threshold 0.5) and a hidden unit (threshold 1.5); the weights on the links are +1 except for a single -2 weight.]

Fig. 19. A solution to the XOR problem.
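Since only the caption of figure 19 survives here, the following check assumes the usual form of this solution: both inputs feed the hidden unit (threshold 1.5) and the output unit (threshold 0.5) with weight +1, and the hidden unit feeds the output unit with weight -2. Under that assumption the network reproduces XOR:

```python
# Checking the Fig. 19 solution, assuming the usual topology: both inputs
# connect (weight +1) to a hidden unit (threshold 1.5) and directly to the
# output unit (threshold 0.5); the hidden-to-output weight is -2.

def step(net, threshold):
    return 1 if net > threshold else 0

def xor_net(x, y):
    hidden = step(x + y, 1.5)                # fires only when both inputs are 1
    return step(x + y - 2 * hidden, 0.5)     # direct inputs minus the hidden veto

for x, y in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, y, "->", xor_net(x, y))         # prints 0, 1, 1, 0
```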


Another solution to the XOR problem is shown in figure 20.

[Figure: a trained network; the unit thresholds shown are -6.3 (output unit) and -2.2 (hidden unit), and the link weights are -4.2, -9.4, -4.2, -6.4 and -6.4.]

Fig. 20. Weights and thresholds of a network that has learnt to solve the XOR problem.

[Figure: a network with no direct input-output connections; the unit thresholds shown are 0.5, 0.5 and 1.5, and the link weights are +1 apart from a single -1 weight.]

Fig. 21. An XOR-solving network with no direct input-output connections.

[Figure: a network whose weights and thresholds (0.8, -4.5, 5.3, -1.8, 0.1, 8.8, 4.3, 9.2, -2.0) have settled into a stable state that does not solve the problem.]

Fig. 22. A stable solution that does not work.
