CSI643: Machine Learning
Slide Set 5: Neural Networks

Dr. G. Anderson
Department of Computer Science
University of Botswana
Semester 1 / 2017/2018

Outline

1 Neural Network Overview
2 Neural Network
3 Types of Neural Networks
4 Perceptron
5 Multi-Layer Perceptron (MLP)
6 Backpropagation

Introduction

Artificial Neural Networks (ANN) are inspired by the way biological neural systems process information.
A large number of highly interconnected processing elements (neurons) work together to solve specific problems.
ANN learning involves adjustment of the synaptic connections that exist between neurons.

Similarities Between Biological and ANN

The human brain contains about 100 billion cells.
The brain's neurons connect to one another in a complex network.
The number of synapses for a typical neuron varies from 1,000 to 10,000.
Connections are created and strengthened to remember habits and skills in a human, such as playing a piano.
When a human stops performing that activity, the associated network becomes weak and can eventually disappear.

Biology

A neuron consists of a soma (cell body), an axon (long fiber), and dendrites.
Axons send signals.
Dendrites receive signals.
A synapse connects an axon to a dendrite.
Given a signal, a synapse might increase (excite) or decrease (inhibit) electrical potential.
A neuron fires when its electrical potential reaches a threshold.

Biological Neural Network vs Artificial Neural Network

Artificial Neuron

Main Components

A set of processing units (neurons or cells).
A state of activation Y_i for every unit (its output).
Connections between units, defined by weights w_jk, which determine the effect the signal of unit j has on unit k: positive for excitation, negative for inhibition.
A propagation rule, which determines the effective input X_i of a unit from its external inputs.

Main Components (continued)

An activation function f, which determines the new level of activation based on the input X_i(t) and the current activation Y_i(t).
An external input (known as a bias), θ, for each unit.
A method for information gathering (the learning rule).
An environment within which the system must operate, providing input signals and error signals.

Neuron Types

Input units.
Hidden units.
Output units.
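
These components can be combined into a single artificial unit. Below is a minimal Python sketch, assuming a step-style activation against a threshold θ; the function and variable names are illustrative, not from the slides.

```python
import numpy as np

def neuron_output(inputs, weights, theta):
    # Propagation rule: effective input is the weighted sum of the inputs.
    effective_input = np.dot(inputs, weights)
    # Activation: the unit fires (outputs 1) when the threshold is reached.
    return 1 if effective_input >= theta else 0

# Two inputs: an excitatory weight (0.7) and an inhibitory weight (-0.3).
print(neuron_output(np.array([1.0, 1.0]), np.array([0.7, -0.3]), 0.2))  # 1
```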

Neural Network Parallelism and Goal

Many units can carry out their computations at the same time, hence the system is inherently parallel.
The neural network aims to train itself to achieve a balance between correctly responding to the patterns used in training and the ability to give reasonable responses to new inputs which are similar, but not identical, to those used for training.

Neural Network Types

Layered Feed-Forward Network.
The Perceptron.
Feed-Forward Radial Basis Function (RBF) Network.
Recurrent Networks:
  Simple Recurrent Network (SRN), Elman style.
  SRN, Jordan style.
Self-Organizing Maps.

Layered Feed-Forward Network

Characterized by a collection of input neurons.
This is followed by one or more hidden layers, then an output layer.
There are no connections from neurons to neurons in a previous layer, to neurons in the same layer, or to neurons more than one layer ahead.
The data from the input layer feeds into the next layer, the output of this layer feeds into the next layer, and so on.
A network with a single layer is called a perceptron.

The Perceptron

A single neuron.
Multiple inputs and a single output.
It has restricted processing capability.

Recurrent Networks

Outputs from a layer are fed back into a layer below.
The Hopfield Neural Network has symmetric connections. It indicates that a pattern is recognized by echoing it back.
Simple Recurrent Network (SRN), Elman style.
Simple Recurrent Network (SRN), Jordan style.

Self-Organizing Maps (Kohonen Networks)

Grid topology with unequal weights.
The topology provides for low-dimensional visualization of the data distribution.
Used in applications which involve browsing of large volumes of data.
Unsupervised learning.

Perceptron Architecture

Activation Functions

Step: Activation(X) = 0 if X < θ; 1 if X ≥ θ.
Sign: Activation(X) = −1 if X < θ; 1 if X ≥ θ.
Sigmoid (logistic): Activation(X) = 1 / (1 + e^(−X)).
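
A minimal sketch of the three activation functions in Python (function names are illustrative):

```python
import numpy as np

def step(x, theta=0.0):
    return 1.0 if x >= theta else 0.0    # step: outputs 0 or 1

def sign_act(x, theta=0.0):
    return 1.0 if x >= theta else -1.0   # sign: outputs -1 or 1

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))      # smooth and differentiable

print(step(0.3), sign_act(-0.2), round(sigmoid(0.0), 2))  # 1.0 -1.0 0.5
```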

Perceptron Notation

Input vector: X(t) = [X_1(t), X_2(t), ..., X_N(t)]^T
Synaptic weights vector: w(t) = [w_1(t), w_2(t), ..., w_N(t)]^T, where 0 ≤ w_i ≤ 1, i = 1, ..., N
Threshold: θ(t)
Actual output: Y(t)
Desired output: Y_d(t)
Learning rate: α(t), 0 < α < 1

Rosenblatt Perceptron Learning Algorithm (Perceptron Rule)

1  Set t = 1;
2  Initialize weights w_1, w_2, ..., w_N to random numbers in the range [−0.5, 0.5];
3  Initialize threshold θ to a number in the range [−0.5, 0.5];
4  repeat
5      Activate the perceptron. Inputs are X_1(t), X_2(t), ..., X_N(t) and the desired output is Y_d(t);
6      Actual output is Y(t) = step[Σ_{i=1}^{N} X_i(t) w_i(t) − θ];
7      Calculate the error: e(t) = Y_d(t) − Y(t);
8      Update the weights of the perceptron: w_i(t+1) = w_i(t) + α · X_i · e(t);
9      t = t + 1;
10     Go to line 5;
11 until convergence;

Perceptron Rule Remarks

The weight change is Δw_i = α · X_i · e(t), so w_i(t+1) = w_i(t) + Δw_i.
Provided the data is linearly separable and a small α is used, the rule is proven to classify all training examples correctly.
The Perceptron convergence theorem states that for any data set which is linearly separable, the Perceptron learning rule is guaranteed to find a solution in a finite number of steps.
An epoch is the presentation of the entire training set to the neural network.
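
Below is a minimal Python sketch of the perceptron rule above, assuming 0/1 targets. One deliberate deviation, flagged here: the threshold θ is also updated, treated as a weight on a constant input of −1, which is a common variant; the pseudocode above keeps θ fixed after initialization.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_perceptron(X, Yd, alpha=0.1, max_epochs=100):
    n = X.shape[1]
    w = rng.uniform(-0.5, 0.5, n)       # line 2: weights in [-0.5, 0.5]
    theta = rng.uniform(-0.5, 0.5)      # line 3: threshold in [-0.5, 0.5]
    for _ in range(max_epochs):
        converged = True
        for x, yd in zip(X, Yd):
            y = 1.0 if x @ w - theta >= 0 else 0.0  # line 6: step activation
            e = yd - y                              # line 7: error
            if e != 0:
                w += alpha * x * e                  # line 8: weight update
                theta -= alpha * e                  # threshold as a weight on input -1
                converged = False
        if converged:
            break
    return w, theta

# Example: AND is linearly separable, so the rule converges.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
w, theta = train_perceptron(X, np.array([0.0, 0.0, 0.0, 1.0]))
```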

Delta Rule

Instead of using the output of the threshold function, the delta rule uses the net output.
This is used when the data is not linearly separable.
The key idea is to use gradient descent search.
The algorithm tries to minimize the error e = Σ_d (Y_d − Y)^2, where the sum goes over all training examples.
Y is the inner product wX, and not sign(wX) as in the Perceptron rule.

Delta Rule (continued)

Weights are updated according to the rule: w_i = w_i + Δw_i
Δw_i = −α · e'(W)_i
e'(W)_i = Σ_d (Y_d − Y) · (−X_i), summed over all training examples.
Hence Δw_i = α · Σ_d (Y_d − Y) · X_i
α is the learning rate.
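
A minimal sketch of the batch delta rule under these formulas, with a linear output Y = w · X (variable and function names are illustrative):

```python
import numpy as np

def delta_rule(X, Yd, alpha=0.01, epochs=500):
    """Gradient descent on the squared error with a linear output."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        Y = X @ w              # net (linear) output, not thresholded
        grad = -(Yd - Y) @ X   # e'(W)_i = sum over examples of (Yd - Y)(-X_i)
        w += -alpha * grad     # Delta w = -alpha * e'(W)
    return w

# Example: noisy linear target; the rule approaches the least-squares weights.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
Yd = X @ np.array([1.0, -2.0, 0.5]) + 0.01 * rng.normal(size=50)
print(np.round(delta_rule(X, Yd), 2))
```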

Delta Rule vs Perceptron Rule

There are two differences between the Perceptron rule and the delta rule:
The Perceptron rule is based on the output of a step function, while the delta rule uses the linear combination of inputs directly.
The Perceptron rule is guaranteed to converge to a consistent hypothesis assuming the data is linearly separable. The delta rule converges but does not need the condition of linear separability of the data.

Delta Rule: Difficulties

There are two main difficulties with the gradient descent method:
Convergence to a minimum may take a long time.
There is no guarantee we will find the global minimum.
Solutions to these are using momentum and random perturbations of the weight vectors.

Perceptron Limitations

Learning is efficient if the weights are not very large.
Attributes are weighted independently.
Can only learn lines and hyperplanes.

MLP Overview

The Perceptron can be successfully used for functions such as AND and OR, but not XOR.
There is therefore a need for a network of perceptrons: a Multi-Layer Perceptron.

Typical MLP Architecture

MLP Architecture

Input layer.
One or more hidden layers.
Output layer.
Hidden units must use non-linear activation functions; otherwise the whole network reduces to one without hidden units.
An MLP can learn any continuous mapping to arbitrary accuracy.
One hidden layer is sufficient for most applications.
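
The need for non-linear hidden activations can be checked directly: with identity (linear) hidden activations, two layers compose into a single linear map. A small sketch, with illustrative matrix names:

```python
import numpy as np

rng = np.random.default_rng(2)
V = rng.normal(size=(3, 4))   # input-to-hidden weights
W = rng.normal(size=(4, 2))   # hidden-to-output weights
x = rng.normal(size=3)

# With identity activations, x -> (x @ V) @ W equals the
# single-layer network x -> x @ (V @ W):
two_layer = (x @ V) @ W
one_layer = x @ (V @ W)
print(np.allclose(two_layer, one_layer))  # True: the hidden layer adds nothing
```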

MLP Activation Function

Backpropagation Learning Algorithm

This involves:
The feed-forward of the input training patterns.
The calculation and backpropagation of the associated error.
The adjustment of the weights.

Backpropagation Notation

Input vector: x = [x_1, x_2, ..., x_i, ..., x_n]^T
Target output vector: t = [t_1, t_2, ..., t_k, ..., t_m]^T
Neurons in the input layer: X_1, X_2, ..., X_i, ..., X_n
Neurons in the hidden layer: Z_1, Z_2, ..., Z_j, ..., Z_p
Neurons in the output layer: Y_1, Y_2, ..., Y_k, ..., Y_m
Threshold of neuron in the hidden layer: θ_hid_j, j = 1, ..., p
Threshold of neuron in the output layer: θ_out_k, k = 1, ..., m
Weights between neurons in the input layer and hidden layer: v_ij, i = 1, ..., n, j = 1, ..., p
Weights between neurons in the hidden layer and output layer: w_jk, j = 1, ..., p, k = 1, ..., m

Backpropagation Notation (continued)

Net input for hidden layer neurons: z_input_j = θ_hid_j + Σ_i x_i v_ij, j = 1, ..., p
Net input for output layer neurons: y_input_k = θ_out_k + Σ_j z_j w_jk, k = 1, ..., m
Output signal (activation) of the hidden layer: z_j = f(z_input_j), j = 1, ..., p
Output signal (activation) of the output layer: y_k = f(y_input_k), k = 1, ..., m
δ_k, k = 1, ..., m: the portion of the error-correction weight adjustment for the weights w_jk between the hidden layer and output layer, due to the error at output neuron Y_k. The error at neuron Y_k is propagated back to the neurons in the hidden layer that feed into Y_k.
δ_j, j = 1, ..., p: similar to the above, for the weights v_ij into the hidden layer.
α: the learning rate.
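
The feed-forward equations above translate directly into code. A minimal sketch in the slides' notation (array names mirror the symbols: v[i, j], w[j, k], theta_hid[j], theta_out[k]):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def feedforward(x, v, theta_hid, w, theta_out, f=sigmoid):
    z_input = theta_hid + x @ v   # z_input_j = theta_hid_j + sum_i x_i v_ij
    z = f(z_input)                # hidden activations z_j = f(z_input_j)
    y_input = theta_out + z @ w   # y_input_k = theta_out_k + sum_j z_j w_jk
    y = f(y_input)                # output activations y_k = f(y_input_k)
    return z_input, z, y_input, y
```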

MLP Backpropagation Network

Backpropagation Activation Function

Should be continuous, easily differentiable, and monotonically non-decreasing.
The sigmoid function is commonly used:
f(x) = 1 / (1 + e^(−x))
f'(x) = f(x)[1 − f(x)]
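
The derivative identity is easy to verify numerically. A small sketch comparing f'(x) = f(x)[1 − f(x)] against a finite difference:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_prime(x):
    s = sigmoid(x)
    return s * (1.0 - s)          # f'(x) = f(x)[1 - f(x)]

x, h = 0.7, 1e-6
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
print(np.isclose(sigmoid_prime(x), numeric))  # True
```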

Backpropagation: 1. Randomly Initialize Weights

The weight initialization influences the speed with which the network reaches its goal.
Large weights might lead to very small derivatives of the sigmoid functions, which slows learning.
Too-small values will cause the net input to a neuron to be close to zero, which also slows learning.
The range [−0.5, 0.5] is commonly used.
Formulas are also used, e.g. (−2.4/F_i, 2.4/F_i), where F_i is the total number of inputs of neuron i in the network.

Backpropagation: 2. Feedforward

Each unit X_i receives an input signal and broadcasts this signal to the neurons in the hidden layer.
The neurons in the hidden layer then broadcast their outputs to the neurons in the output layer.
Each output neuron compares its output to the target value to determine the associated error for that pattern with that neuron.
The error then propagates back from layer to layer.
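
A minimal sketch of the (−2.4/F_i, 2.4/F_i) initialization rule (the function name and the uniform distribution are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)

def init_weights(fan_in, fan_out):
    """Uniform weights in (-2.4/F_i, 2.4/F_i), where F_i is the
    number of inputs of each receiving neuron."""
    bound = 2.4 / fan_in
    return rng.uniform(-bound, bound, size=(fan_in, fan_out))

v = init_weights(4, 3)  # 4 inputs feeding each of 3 hidden neurons
print(v.min() >= -0.6 and v.max() <= 0.6)  # True: bound is 2.4/4 = 0.6
```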

Backpropagation: 3. Backpropagation

For each neuron Y_k in the output layer, the term δ_k is computed based on the associated error. δ_k is used to distribute the error at output neuron Y_k back to all the neurons in the hidden layer which are connected to Y_k. It is also used to update the weights between the hidden layer and the output layer.
For each neuron Z_j in the hidden layer, the term δ_j is computed. δ_j is used to update the weights between the input layer and the hidden layer. In our case, since we only have one hidden layer, it is not necessary to propagate the error back to the input layer.

Backpropagation: 4. Weights Update

The adjustments of the weights w_jk, j = 1, ..., p, k = 1, ..., m, between neurons in the hidden layer and the output layer are based on the term δ_k and the activations z_j, j = 1, ..., p, of the neurons in the hidden layer.
The adjustments of the weights v_ij, i = 1, ..., n, j = 1, ..., p, between neurons in the input layer and the hidden layer are based on the term δ_j and the activations x_i, i = 1, ..., n, of the neurons in the input layer.

Backpropagation: Neurons in Output Layer

1 foreach neuron Y_k do
2     δ_k = (t_k − y_k) · f'(y_input_k);
3     Δw_jk = α · δ_k · z_j;
4     Δθ_out_k = α · δ_k;
5     Send δ_k to units in the layer below;
6 end

Backpropagation: Neurons in Hidden Layer

1 foreach neuron Z_j do
2     δ_inputs_j = Σ_{k=1}^{m} δ_k w_jk;
3     δ_j = δ_inputs_j · f'(z_input_j);
4     Δv_ij = α · δ_j · x_i;
5     Δθ_hid_j = α · δ_j;
6 end

Weights Update: Neurons in Output Layer

1 foreach neuron Y_k do
2     w_jk = w_jk + Δw_jk, j = 1, ..., p;
3     θ_out_k = θ_out_k + Δθ_out_k;
4 end

Weights Update: Neurons in Hidden Layer

1 foreach neuron Z_j do
2     v_ij = v_ij + Δv_ij, i = 1, ..., n;
3     θ_hid_j = θ_hid_j + Δθ_hid_j;
4 end
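
Putting steps 2–4 together, here is a minimal sketch of one backpropagation training step for a single pattern, following the update rules above (array names mirror the slides' notation; the sigmoid is assumed as the activation). XOR is used as the usage example; with only two hidden units, training can occasionally stall in a local minimum, in which case one reinitializes the weights.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_step(x, t, v, theta_hid, w, theta_out, alpha=0.5):
    """One backpropagation step for one pattern, in the slides'
    notation: v[i, j], w[j, k], theta_hid[j], theta_out[k]."""
    # Step 2: feedforward.
    z = sigmoid(theta_hid + x @ v)        # hidden activations z_j
    y = sigmoid(theta_out + z @ w)        # output activations y_k
    # Step 3: backpropagation of error.
    delta_k = (t - y) * y * (1 - y)       # delta_k = (t_k - y_k) f'(y_input_k)
    delta_inputs = w @ delta_k            # delta_inputs_j = sum_k delta_k w_jk
    delta_j = delta_inputs * z * (1 - z)  # delta_j = delta_inputs_j f'(z_input_j)
    # Step 4: weights update.
    w += alpha * np.outer(z, delta_k)     # Delta w_jk = alpha * delta_k * z_j
    theta_out += alpha * delta_k
    v += alpha * np.outer(x, delta_j)     # Delta v_ij = alpha * delta_j * x_i
    theta_hid += alpha * delta_j
    return v, theta_hid, w, theta_out

# Usage example: XOR with 2 inputs, 2 hidden units, 1 output.
rng = np.random.default_rng(4)
v = rng.uniform(-0.5, 0.5, (2, 2)); theta_hid = rng.uniform(-0.5, 0.5, 2)
w = rng.uniform(-0.5, 0.5, (2, 1)); theta_out = rng.uniform(-0.5, 0.5, 1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)
for _ in range(5000):                     # several thousand epochs are typical
    for x, t in zip(X, T):
        v, theta_hid, w, theta_out = train_step(x, t, v, theta_hid, w, theta_out)
```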

Backpropagation Comments

One cycle through the entire set of training vectors is called an epoch. Usually several, even many, epochs are required to train a backpropagation neural network.
Weights are updated after each training pattern is presented. Another approach is to apply the weight changes accumulated over an entire epoch.
A common stopping condition is when the total squared error reaches a minimum; however, this might not be efficient.
One can instead divide the training data into a training set and a validation set. Training continues until the error on the validation set reaches a minimum, just before it starts rising again.

Relationship Between Dataset, Number of Weights, and Classification Accuracy

P: number of patterns.
W: number of weights to be trained.
A: accuracy of classification expected.
If there are enough training patterns, the network will be able to classify previously unseen patterns correctly.
P = W/A. If A is 0.1, a network with 10 weights will require 100 training patterns.

Improving the Efficiency of Backpropagation Learning

If the weights are adjusted to very large values, the total input of a neuron can reach very high values, and because of the sigmoid activation function, the neuron will have an activation very close to zero or one.
Gradient descent and other optimization methods can get stuck in local minima when deeper minima are close by.
Probabilistic methods can help to avoid this trap, but can be slow.
The number of hidden units can be increased, leading to a higher dimensionality of the error space and a smaller chance of getting trapped, but after some number of hidden units there is again a high chance of getting trapped in local minima.

Improving the Efficiency of Backpropagation Learning (continued)

Momentum can be factored into the weight update equation, so that when some data very different from the majority of the training data is encountered, a small learning rate is used, in order not to disrupt the progress.
For momentum to be used, the weight changes from one or more previous training patterns must be preserved.
Large weight adjustments are made as long as the corrections are in the same general direction for several patterns.
A network with momentum proceeds in the direction of a combination of the current gradient and the previous direction of the weight correction, instead of proceeding only in the direction of the gradient.
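
A minimal sketch of a momentum update (mu is the momentum coefficient; the names and constants are illustrative):

```python
import numpy as np

def momentum_update(w, grad, velocity, alpha=0.1, mu=0.9):
    """Combine the current gradient with the previous weight change."""
    velocity = mu * velocity - alpha * grad  # blend previous direction and gradient
    w = w + velocity                         # large steps when directions agree
    return w, velocity
```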

Improving the Efficiency of Backpropagation Learning


The delta-bar-delta update algorithm allows each weight to
have its own learning rate.
The learning rates also vary with time as training
progresses.
If the weight change is in the same direction for several
time steps, then the learning rate for that weight should be
increased.
If the direction of the weight change alternates, the
learning rate should be decreased.
The weight change will be in the same direction if the
partial derivative of the error with respect to that weight has
the same sign for several time steps.
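
A sketch of the idea follows. The full delta-bar-delta rule compares the current gradient with an exponentially smoothed average of past gradients (the "delta bar"); this simplified version uses only the previous gradient's sign, and kappa and phi are illustrative constants.

```python
import numpy as np

def delta_bar_delta(w, grad, lr, prev_grad, kappa=0.01, phi=0.5):
    """Per-weight learning rates: increased when the gradient keeps its
    sign over time steps, decreased when the sign alternates."""
    same_sign = np.sign(grad) == np.sign(prev_grad)
    lr = np.where(same_sign, lr + kappa, lr * phi)  # grow additively, shrink multiplicatively
    w = w - lr * grad
    return w, lr, grad  # pass grad back in as prev_grad on the next step
```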