
Artificial Neural Networks

Weiyu Yi 339229
October 7, 2005

1 Brief Introduction
Artificial neural networks (ANNs) provide a general, practical method for learning real-valued, discrete-valued, and vector-valued functions from examples. The backpropagation algorithm, which is widely used in ANNs, uses gradient descent to tune network parameters to best fit a training set of input-output pairs.

1.1 History
1.2 Comparison with Decision Trees
Many functions can be represented by an ANN. Compared to the ID3 algorithm for decision trees, which can only represent discrete functions, an ANN can express much more complex continuous functions. As the coming sections show, an ANN approximates a continuous function with a small error; during the whole learning process, the ANN updates the weights of its units so as to reduce, as much as possible, the error, i.e. the difference between the output value of the ANN and the target attribute.

1.3 Structure
In order to give a first impression of ANNs and their working mechanism, consider an example. This example of ANN learning is provided by Pomerleau's (1993) system ALVINN, which uses a learned ANN to steer an autonomous vehicle driving at normal speeds on public highways. The input of the ANN is a 30x32 grid of pixel intensities obtained from a forward-facing camera mounted on the vehicle. The output is the direction in which the vehicle is steered. The ANN is trained to steer the vehicle after a human has first steered it for some time.
Figure 1: The ALVINN network: the 30x32 sensor input retina feeds four hidden units (h1 to h4), which in turn feed 30 output units (o1 to o30) ranging over steering directions from left to right.

Figure 1 shows the main structure of this ANN. In the figure, each node (circle) is a single network unit, and the lines entering a node from below are its inputs. As can be seen, four units receive their inputs directly from all of the 30x32 pixels of the camera mounted on the vehicle. These are called "hidden" units because their outputs are available only to the following units in the network, not as part of the global network output. The second layer, composed of 30 units that use the outputs of the hidden units as their inputs, consists of the "output" units. In this example, each output unit corresponds to a particular steering direction.
This example shows a simple ANN structure; more precisely, it is a feedforward network, one of the most common kinds of artificial neural network. A feedforward network is composed of a set of nodes and connections arranged in layers. The connections are typically formed by connecting each node in a given layer to all of the nodes in the next layer, so that every node in a given layer is connected to every node in the following layer. Typically there are (at least) three layers in a feedforward network: an input layer, a hidden layer, and an output layer. The input layer does no processing; it is simply where the data vector (a one-dimensional data array) is fed into the network. The input layer feeds into the hidden layer, and the hidden layer in turn feeds into the output layer. The actual processing in the network occurs in the nodes of the hidden layer and the output layer.
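To make this layered structure concrete, the following sketch builds the weight matrices of a fully connected feedforward network with the same shape as the ALVINN example. This is an illustration only, not ALVINN's actual code; the names (layer_sizes, weights) and the use of NumPy are my own assumptions.

```python
import numpy as np

# Hypothetical sketch of an ALVINN-shaped feedforward network: each
# connection layer is a weight matrix linking every node in one layer
# to every node in the next.
layer_sizes = [30 * 32, 4, 30]  # input retina, hidden units, output units

rng = np.random.default_rng(0)
weights = [rng.uniform(-0.5, 0.5, size=(n_out, n_in))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

# The input layer does no processing: the 30x32 pixel grid is simply
# flattened into a one-dimensional data vector and fed forward.
x = rng.random((30, 32)).ravel()
print([W.shape for W in weights])  # [(4, 960), (30, 4)]
```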

1.4 Ability
In this example, the ANN is trained with the backpropagation algorithm. In general, ANNs are appropriate for problems with the following characteristics:

1. Instances are represented by many attribute-value pairs. The target function to be learned is defined over instances that can be described by a vector of predefined features, such as the pixel values in the ALVINN example.

2. The target function output may be discrete-valued, real-valued, or a vector of several real- or discrete-valued attributes.

3. The training examples may contain errors. As the following sections show, ANN learning methods are quite robust to noise in the training data.

4. Long training times are acceptable. Compared to decision tree learning, network training algorithms typically require longer training times, depending on factors such as the number of weights in the network.

5. Fast evaluation of the learned target function may be required.

6. The ability of humans to understand the learned target function is not important.

2 Backpropagation Algorithm
2.1 Sigmoid
In the ANN, each node is a sigmoid unit. Like the perceptron, the sigmoid
unit first computes a linear combination of its inputs,

$$o' = \vec{w} \cdot \vec{x}$$

then the sigmoid unit computes its output with the following function,

$$o = \mathrm{sgn}(\vec{w} \cdot \vec{x}) \qquad (1)$$

where

$$\mathrm{sgn}(y) = \frac{1}{1 + e^{-y}} \qquad (2)$$
Equation (2) is often referred to as the squashing function, since it maps a very large input domain to a small range of outputs. In addition, the sigmoid function has the useful property that its derivative is easily expressed in terms of its own output:

$$\frac{d\,\mathrm{sgn}(y)}{dy} = \mathrm{sgn}(y) \cdot (1 - \mathrm{sgn}(y))$$

As the following description of backpropagation shows, the algorithm makes use of this derivative.

Figure 2: Main structure of the learning setup: the input attribute vector $\vec{x}$ is fed through the weights $w$ of the network to produce the output, which is compared with the target attribute vector $\vec{t}$.
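As a small sketch (in Python; the function names are my own), the squashing function, its derivative-from-output identity, and a complete sigmoid unit look like this:

```python
import math

def sgn(y):
    """The sigmoid 'squashing' function of equation (2)."""
    return 1.0 / (1.0 + math.exp(-y))

def sgn_derivative_from_output(o):
    """d sgn(y)/dy expressed purely in terms of the output o = sgn(y)."""
    return o * (1.0 - o)

def sigmoid_unit(w, x):
    """A sigmoid unit: linear combination of the inputs, then squashing."""
    return sgn(sum(wi * xi for wi, xi in zip(w, x)))

# Check the derivative identity at y = 0.7 with a central finite difference.
y, eps = 0.7, 1e-6
numeric = (sgn(y + eps) - sgn(y - eps)) / (2 * eps)
print(abs(numeric - sgn_derivative_from_output(sgn(y))) < 1e-9)  # True
```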

2.2 Function
The sigmoid is only one unit in the network; now we take a look at the whole function that the neural network computes. Figure 2 shows the setting: if we consider an example $(\vec{x}, \vec{t})$, where $\vec{x}$ is called the input attribute vector and $\vec{t}$ the target attribute vector, then

$$o_i = \begin{cases} x_i & \text{if } i \text{ is an input unit} \\ \mathrm{sgn}\bigl(\sum_{j \to i} w_{j \to i}\, o_j\bigr) & \text{otherwise} \end{cases}$$
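A minimal sketch of this recursive definition, assuming the network is stored layer by layer as a list of weight matrices (as in the structure sketch of Section 1.3):

```python
import numpy as np

def sgn(y):
    return 1.0 / (1.0 + np.exp(-y))

def forward(weights, x):
    """Compute o_i for every unit: input units pass x_i through unchanged;
    every other unit applies sgn to the weighted sum of its inputs."""
    outputs = [np.asarray(x, dtype=float)]    # o_i = x_i for input units
    for W in weights:                         # one weight matrix per layer
        outputs.append(sgn(W @ outputs[-1]))  # o_i = sgn(sum_j w_ji * o_j)
    return outputs                            # activations of every layer

# Tiny example: 2 inputs -> 2 hidden units -> 1 output unit.
weights = [np.array([[0.1, -0.2], [0.4, 0.3]]),
           np.array([[0.5, -0.5]])]
print(forward(weights, [1.0, 0.0])[-1])       # the network output
```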

2.3 Squared Error


As mentioned above, the whole learning process aims to reduce the error, but how can this error be described? Generally the squared error function is used:

$$E(\vec{w}) := \frac{1}{2} \sum_{d \in D} \sum_{k \in outputs} (t_{kd} - o_{kd})^2 \qquad (3)$$

Figure 3: Hypothesis surface: the error $E$ plotted over the weight space $(w_1, w_2)$.

Notice that this function (3) sums the errors over all of the network's output units after a whole set of training examples has been processed. The weight vector can then be updated by:

$$\vec{w}_{new} = \vec{w}_{old} - \eta \nabla E(\vec{w}) \qquad (4)$$

where $\nabla E(\vec{w})$ is called the gradient of $E$:

$$\nabla E(\vec{w}) = \left( \frac{\partial E}{\partial w_0}, \frac{\partial E}{\partial w_1}, \ldots, \frac{\partial E}{\partial w_n} \right) \qquad (5)$$

Thus each weight $w_k$ can be updated by:

$$w_k = w_k + \Delta w_k \qquad (6)$$

where

$$\Delta w_k = -\eta \frac{\partial E}{\partial w_k} \qquad (7)$$
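In code, equations (6) and (7) amount to one update line per weight. The sketch below is my own illustration: it estimates the partial derivatives of equation (5) numerically on a toy error surface, whereas backpropagation (next section) computes them analytically and far more cheaply.

```python
import numpy as np

def gradient_descent_step(E, w, eta=0.1, eps=1e-6):
    """One update w_k <- w_k - eta * dE/dw_k (equations 6 and 7), with each
    partial derivative estimated by a central finite difference."""
    grad = np.zeros_like(w)
    for k in range(len(w)):
        w_plus, w_minus = w.copy(), w.copy()
        w_plus[k] += eps
        w_minus[k] -= eps
        grad[k] = (E(w_plus) - E(w_minus)) / (2 * eps)
    return w - eta * grad

# Toy error surface E(w) = (w0 - 1)^2 + (w1 + 2)^2, minimum at (1, -2).
E = lambda w: (w[0] - 1) ** 2 + (w[1] + 2) ** 2
w = np.array([0.0, 0.0])
for _ in range(200):
    w = gradient_descent_step(E, w)
print(w)  # close to [ 1. -2.]
```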

2.4 Backpropagation Algorithm
The learning problem faced by backpropagation is to search a large hypothesis space defined by all possible weight values for all the units in the network. The outline of the algorithm is:

backpropagation(training_examples, $\eta$, $n_{in}$, $n_{out}$, $n_{hidden}$)

$(\vec{x}, \vec{t})$ each instance from the training examples, where $\vec{x}$ is the vector of network input values and $\vec{t}$ is the vector of target network output values.

$\eta$ the learning rate.

$n_{in}, n_{out}, n_{hidden}$ the number of input, output, and hidden units.

Notation: the input from unit $i$ into unit $j$ is denoted $x_{ji}$, and the weight from unit $i$ to unit $j$ is denoted $w_{ji}$.
• Create an ANN with $n_{in}$ input units, $n_{out}$ output units, and $n_{hidden}$ hidden units.
• Initialize all network weights to small random numbers (e.g. in $[-0.5, 0.5]$).
• Until the termination condition is met, do:
  – For each $(\vec{x}, \vec{t})$ in the training examples, do:
    1. Input the instance $\vec{x}$ to the network and compute the output $o_u$ of every unit $u$ in the network.
    2. For each network output unit $k$, calculate its error term $\sigma_k$:
       $$\sigma_k = o_k (1 - o_k)(t_k - o_k) \qquad (8)$$
    3. For each hidden unit $h$, calculate its error term $\sigma_h$:
       $$\sigma_h = o_h (1 - o_h) \sum_{k \in outputs(h)} w_{kh} \, \sigma_k \qquad (9)$$
    4. Update each network weight $w_{ji}$:
       $$w_{ji} = w_{ji} + \Delta w_{ji}, \qquad \Delta w_{ji} = \eta \, \sigma_j \, x_{ji}$$
Notice that the error term for hidden unit $h$ is calculated by summing the error terms $\sigma_k$ of all the output units influenced by unit $h$, weighting each $\sigma_k$ by $w_{kh}$, the weight from hidden unit $h$ to output unit $k$. This weight characterizes the degree to which hidden unit $h$ is "responsible for" the error in output unit $k$. Figure 4 makes this easier to picture.
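The following Python sketch implements one pass of the loop body above for a single-hidden-layer network: forward propagate, compute the error terms of equations (8) and (9), and update the weights. The function and variable names are my own; bias weights are omitted for brevity.

```python
import numpy as np

def sgn(y):
    return 1.0 / (1.0 + np.exp(-y))

def backprop_step(W_hidden, W_out, x, t, eta=0.3):
    """One training example (x, t); weight matrices are updated in place.
    W_hidden: (n_hidden, n_in), W_out: (n_out, n_hidden)."""
    # 1. Propagate the input forward and compute every unit's output o_u.
    o_h = sgn(W_hidden @ x)
    o_k = sgn(W_out @ o_h)
    # 2. Error term of each output unit k (equation 8).
    sigma_k = o_k * (1 - o_k) * (t - o_k)
    # 3. Error term of each hidden unit h (equation 9): the sigma_k of the
    #    output units it influences, weighted by w_kh and summed.
    sigma_h = o_h * (1 - o_h) * (W_out.T @ sigma_k)
    # 4. Update each weight w_ji by eta * sigma_j * x_ji.
    W_out += eta * np.outer(sigma_k, o_h)
    W_hidden += eta * np.outer(sigma_h, x)
    return o_k

# One step on a toy 2-2-2 network.
rng = np.random.default_rng(0)
Wh = rng.uniform(-0.5, 0.5, (2, 2))
Wo = rng.uniform(-0.5, 0.5, (2, 2))
print(backprop_step(Wh, Wo, np.array([1.0, 0.0]), np.array([0.0, 1.0])))
```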

Figure 4: The error terms of the output units $k_1, k_2, \ldots, k_m$ flow back to hidden unit $h$ through the weights $w_{k_1 h}, w_{k_2 h}, \ldots, w_{k_m h}$.

Figure 5: The 8x3x8 network for learning the identity function: eight input units, three hidden units, eight output units.

3 A Simple Illustration
Now we make an example to give a more inductive knowledge. How does
ANN learn the simpliest function, a identical function id. We construct the
network shown in figure 4. There are eight network input units, which are
connected to three hidden units, which are in turn connected to eight output
units. Because of this structure, the three hidden units will be forced to
represent the eight input values in some way that captures their relevant
features, so that this hidden layer representation can be used by the output
units to compute the correct target values.

Input Hidden Values Output
10000000 → .89 .04 .08 → 10000000
01000000 → .15 .99 .99 → 01000000
00100000 → .01 .97 .27 → 00100000
00010000 → .99 .97 .71 → 00010000
00001000 → .03 .05 .02 → 00001000
00000100 → .01 .11 .88 → 00000100
00000010 → .80 .01 .98 → 00000010
00000001 → .60 .94 .01 → 00000001

This 8x3x8 network was trained to learn the identity function. After 5000 training iterations, the three hidden unit values encode the eight distinct inputs using the encoding shown in the table above. Notice that if the encoded values are rounded to zero or one, the result is the standard binary encoding for eight distinct values.
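The experiment can be reproduced with the backpropagation sketch from Section 2.4 (again an illustration under my own naming; since the initialization is random and bias weights are omitted, the learned hidden encoding will differ from the table above, and more iterations may be needed to separate all eight inputs cleanly):

```python
import numpy as np

def sgn(y):
    return 1.0 / (1.0 + np.exp(-y))

rng = np.random.default_rng(1)
W_hidden = rng.uniform(-0.5, 0.5, (3, 8))  # 8 inputs -> 3 hidden units
W_out = rng.uniform(-0.5, 0.5, (8, 3))     # 3 hidden units -> 8 outputs
X = np.eye(8)                              # the eight one-hot examples

for _ in range(5000):                      # 5000 passes over the examples
    for x in X:                            # target t equals the input x
        o_h = sgn(W_hidden @ x)
        o_k = sgn(W_out @ o_h)
        sigma_k = o_k * (1 - o_k) * (x - o_k)            # equation (8)
        sigma_h = o_h * (1 - o_h) * (W_out.T @ sigma_k)  # equation (9)
        W_out += 0.3 * np.outer(sigma_k, o_h)
        W_hidden += 0.3 * np.outer(sigma_h, x)

# The hidden values form a learned 3-value code for the eight inputs.
print(np.round(sgn(X @ W_hidden.T), 2))
```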
