
Artificial Neural Network

Rama Mehta, Scientist


rama@nih.ernet.in

An artificial neural network (ANN) is a system based on the operation of biological
neural networks.

Neural networks have the ability to learn from
past experience, which makes them very flexible
and powerful.

These networks are also well suited to real-time
systems because of their fast response and
computation times.

The human brain is a wonderful processor. Its exact
workings are still a mystery.

The most basic element of the human brain is a
specific type of cell, known as a neuron, which
does not regenerate.

The human brain comprises about 100 billion


neurons. Each neuron can connect with 0 to
200,000 other neurons, although 1,000-10,000
interconnections are typical.

The power of the human mind comes
from the sheer number of
neurons and their multiple
interconnections.
It also comes from genetic programming
and learning.
There are multiple interconnections and
more than 100 different classes of
neurons.
The individual neurons are complicated.
Together these neurons and their
connections form a process which is not
binary, not stable, and not synchronous.

Adaptive learning: An ANN is endowed with the ability
to learn how to do tasks based on the data given for
training or initial experience.
Self organization: An ANN can create its own
organization or representation of the information it
receives during learning time.
Real time operation: ANN computations may be
carried out in parallel. Special hardware devices are
being designed and manufactured to take advantage of
this capability of ANNs.
Fault tolerance via redundant information coding:
Partial destruction of a neural network leads to the
corresponding degradation of performance. However,
some network capabilities may be retained even after
major network damage.

Adaptivity.
Evidential response.
Contextual information.
VLSI (very large scale integration) implementability.
Neurobiological analogy.
A neural network can perform tasks that a linear
program cannot.
When an element of the neural network fails, the
network can continue to operate.
A neural network learns and does not need to be
reprogrammed.
It can be implemented in a wide range of
applications.

The neural network needs training to operate.

The architecture of a neural network is different
from the architecture of microprocessors and
therefore needs to be emulated.

Large neural networks require high processing
time.

[Figure: single layer feed forward network with an input layer of source nodes and an output layer of neurons]

[Figure: 3-4-2 network with an input layer, a hidden layer, and an output layer]

The models of ANN are specified by three basic
entities, namely:
i). The model's synaptic interconnections;
ii). The training or learning rules adopted for
updating and adjusting the connection weights;
iii). Their activation functions.

An ANN consists of a set of highly
interconnected processing elements
(neurons) such that each processing element
output is connected through
weights to the other processing elements or
to itself.
The arrangement of these processing
elements and the geometry of their
interconnections are essential for an ANN.

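[Figure: model of a neuron. The input signals x1, x2, ..., xm are multiplied by the synaptic weights w1, w2, ..., wm and combined with the bias b by the summing function to give the local field v; the activation function maps v to the output y.]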
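The neuron model in the figure can be sketched in a few lines of Python. This is only an illustration: the function name `neuron_output`, the choice of tanh as the activation function, and the numeric values are assumptions, not part of the original slides.

```python
import math


def neuron_output(inputs, weights, bias, activation):
    """Return y = activation(v), where v = sum_j w_j * x_j + b is the local field."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    # Summing function: weighted inputs plus the bias give the local field v.
    local_field = sum(w * x for w, x in zip(weights, inputs)) + bias
    # The activation function maps the local field to the output y.
    return activation(local_field)


# Illustrative values only (assumed, not taken from the slides).
y = neuron_output([1.0, 2.0], [0.5, -0.25], 0.0, math.tanh)
print(y)
```

Passing the activation function as an argument keeps the summing step separate from the squashing step, which mirrors the two blocks of the figure.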

Single and Multi-layered perceptron

There are five basic types of neuron
connection architecture:

single layer feed forward network;
multilayer feed forward network;
single node with its own feedback;
single layer recurrent network;
multilayer recurrent network.

Recurrent network with hidden neuron(s): the unit
delay operator z^-1 implies a dynamic system.

[Figure: recurrent network with input, hidden, and output neurons and unit delay (z^-1) feedback connections]

The strengths of the connections can be
set explicitly through the weights, using
a priori knowledge.

Otherwise, the system can be trained
by feeding it teaching patterns and
letting it change its weights
according to some learning rule.

Parameter learning: It updates the
connecting weights in a neural net.

Structure learning: It focuses on the
change in network structure.

These can be performed simultaneously or separately.

Supervised learning;

Unsupervised learning;

Reinforcement learning.

The activation function acts as a squashing
function, such that the output of a neuron in a
neural network is between certain values
(usually 0 and 1, or -1 and 1).
When a signal is fed through a multilayer
network with linear activation functions, the
output obtained is the same as could
be obtained using a single layer network. For
this reason, nonlinear functions are more widely
used in multilayer networks than linear
functions.
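The claim about linear activations can be checked numerically. The sketch below is a Python illustration with made-up weight matrices (`W1`, `W2`) and no biases: it passes a signal through two linear layers and then through the single layer whose weight matrix is the product of the two.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    columns = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in columns] for row in a]


# Assumed example weights: a 2-3-1 network with identity (linear) activations.
W1 = [[1.0, 2.0], [0.0, -1.0], [3.0, 1.0]]  # hidden layer, 3 x 2
W2 = [[1.0, 0.5, -2.0]]                     # output layer, 1 x 3
x = [2.0, -1.0]

two_layer = matvec(W2, matvec(W1, x))        # signal fed through both layers
single_layer = matvec(matmul(W2, W1), x)     # one layer with weights W2 * W1

print(two_layer, single_layer)
```

Both calls give the same output, so the hidden layer adds nothing unless its activation function is nonlinear.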

There are several activation functions, such as:

Identity function:
It is a linear function and can
be defined as
f(x) = x for all x
The input layer uses the identity activation function
for a single layer network, so its output remains the
same as its input.
Binary step function:
f(x) = 1 if x >= theta
       0 if x < theta
where theta represents the threshold value. This
function is most widely used in single layer nets to
convert the net input to an output that is binary
(1 or 0). It is also known as the threshold function.
Bipolar step function:
f(x) = +1 if x >= theta
       -1 if x < theta
where theta represents the threshold value. This
function is also used in single layer nets to convert
the net input to an output that is bipolar (+1 or
-1).

Sigmoidal function: tanh.

Hard non-linearity: signum and step.
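The identity and hard-limiting functions listed above can be sketched as follows. This is a Python illustration, not toolbox code. The convention that the output switches at exactly x = theta, and the value signum returns at zero, are assumptions.

```python
def identity(x):
    """Identity function: f(x) = x for all x."""
    return x


def binary_step(x, theta=0.0):
    """Binary step (threshold) function: 1 at or above theta, otherwise 0."""
    return 1 if x >= theta else 0


def bipolar_step(x, theta=0.0):
    """Bipolar step function: +1 at or above theta, otherwise -1."""
    return 1 if x >= theta else -1


def signum(x):
    """Signum: -1, 0 or +1 according to the sign of x (0 at x = 0 is an assumed convention)."""
    return (x > 0) - (x < 0)


for net_input in (-2.0, 0.0, 0.7):
    print(net_input, identity(net_input), binary_step(net_input, theta=0.5),
          bipolar_step(net_input, theta=0.5), signum(net_input))
```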

Feed-forward neural networks:
The data processing can extend over multiple (layers
of) units,
but no feedback connections are present, that is,
connections extending from outputs of units to inputs
of units in the same layer or previous layers.
Recurrent neural networks:
These do contain feedback connections.
In some cases, the activation values of the units
undergo a relaxation process such that the neural
network will evolve to a stable state in which these
activations do not change anymore.
In other applications, the change of the activation
values of the output neurons is significant, such that
the dynamical behaviour constitutes the output of the
neural network.

Each unit performs a relatively simple job:
receive input from neighbours or external
sources and use this to compute an
output signal which is propagated to
other units.
Apart from this processing, a second task
is the adjustment of the weights.
The system is inherently parallel in the
sense that many units can carry out their
computations at the same time.

Within neural systems it is useful to distinguish three
types of units:
input units which receive data from outside the
neural network,
output units which send data out of the neural
network, and
hidden units whose input and output signals remain
within the neural network.
During operation, units can be updated either
synchronously or asynchronously.
With synchronous updating, all units update their
activation simultaneously;
with asynchronous updating, each unit has a (usually
fixed) probability of updating its activation at a
time t, and usually only one unit updates at a time.
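The difference between the two update schemes can be seen on a tiny example. The sketch below is a Python illustration under stated assumptions: a two-unit network with bipolar units, made-up weights `W`, and a fixed round-robin order standing in for the probabilistic choice of which unit updates.

```python
def sign(v):
    """Bipolar hard limiter used as the activation of each unit."""
    return 1 if v >= 0 else -1


def net_input(weights, state, unit):
    """Weighted sum of the current activations feeding one unit."""
    return sum(w * s for w, s in zip(weights[unit], state))


def synchronous_step(weights, state):
    """All units update simultaneously, each from the same old state."""
    return [sign(net_input(weights, state, i)) for i in range(len(state))]


def asynchronous_sweep(weights, state):
    """Units update one at a time; each update sees the latest activations."""
    state = list(state)
    for i in range(len(state)):
        state[i] = sign(net_input(weights, state, i))
    return state


# Assumed example: two units, each driven only by the other.
W = [[0, 1], [1, 0]]
start = [1, -1]

print(synchronous_step(W, start))    # the two units swap values
print(asynchronous_sweep(W, start))  # the units settle on a common value
```

With these weights, synchronous updating keeps swapping the two activations, while updating one unit at a time reaches a state that no longer changes.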

Weights:

The weights of the network are collected in a weight
matrix, where w_i is the weight vector of
processing element i and w_ij is the weight from
processing element i (source node) to processing
element j (destination node).

Bias

The bias included in the network has an impact on
the calculation of the net input.
The bias is included by adding a component x0 = 1
to the input vector X. Thus, the input vector
becomes
X = (1, X1, X2, ..., Xn)

Bias as Input

v = \sum_{j=0}^{m} w_j x_j, with w_0 = b

Bias is an external parameter of the neuron.
It can be modeled by adding an extra input.

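[Figure: neuron model with the bias as an extra input. The fixed input x0 = +1 has weight w0 = b; the input signals x0, x1, x2, ..., xm are multiplied by the synaptic weights w0, w1, w2, ..., wm and added by the summing function to give the local field v, which the activation function maps to the output y.]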
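That the two formulations give the same local field can be checked directly. The following Python sketch uses made-up inputs, weights, and bias; the function names are hypothetical.

```python
def local_field_with_bias(inputs, weights, bias):
    """v = sum_{j=1..m} w_j * x_j + b, with the bias kept as a separate parameter."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


def local_field_augmented(inputs, weights, bias):
    """v = sum_{j=0..m} w_j * x_j, with x_0 = +1 and w_0 = b."""
    augmented_inputs = [1.0] + list(inputs)    # prepend the extra input x0 = +1
    augmented_weights = [bias] + list(weights)  # prepend the weight w0 = b
    return sum(w * x for w, x in zip(augmented_weights, augmented_inputs))


# Illustrative values only (assumed).
x = [2.0, -1.0, 0.5]
w = [0.5, 1.0, -2.0]
b = 0.25

print(local_field_with_bias(x, w, b), local_field_augmented(x, w, b))
```

Treating the bias as a weight means a learning rule only has to update one weight vector.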

The bias is of two types:
(i) positive bias and
(ii) negative bias.

The positive bias helps in increasing the net input
of the network and the negative bias helps in
decreasing the net input of the network.

For each application, there is a
threshold limit. The activation function using a
threshold can be defined as a step function of the
net input, like the binary and bipolar step functions
described earlier, where theta is the fixed threshold value.

Function     Description
learncon     Conscience bias learning function
learngd      Gradient descent weight and bias learning function
learngdm     Gradient descent with momentum weight and bias learning function
learnh       Hebb weight learning rule
learnhd      Hebb with decay weight learning rule
learnis      Instar weight learning function
learnk       Kohonen weight learning function
learnlv1     LVQ1 weight learning function
learnlv2     LVQ2.1 weight learning function
learnos      Outstar weight learning function
learnp       Perceptron weight and bias learning function
learnpn      Normalized perceptron weight and bias learning function
learnsom     Self-organizing map weight learning function
learnsomb    Batch self-organizing map weight learning function
learnwh      Widrow-Hoff weight/bias learning function

The learning rate is used
to control the amount of weight adjustment
at each step of training. The learning rate,
ranging from 0 to 1, determines the rate of
learning at each time step.
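How the learning rate scales a weight adjustment can be seen in a single update step. The Python sketch below assumes a linear neuron and a Widrow-Hoff style update (each weight changes by learning rate times error times input). It illustrates the idea and is not the toolbox's `learnwh` implementation; the function name and data are made up.

```python
def widrow_hoff_update(weights, bias, inputs, target, learning_rate):
    """One weight/bias adjustment for a linear neuron (assumed update rule)."""
    output = sum(w * x for w, x in zip(weights, inputs)) + bias
    error = target - output
    # The learning rate scales how far each weight moves for a given error.
    new_weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
    new_bias = bias + learning_rate * error
    return new_weights, new_bias, error


# Same starting point and training pattern, two different learning rates.
small_w, small_b, _ = widrow_hoff_update([0.0, 0.0], 0.0, [1.0, 2.0], 1.0, 0.1)
large_w, large_b, _ = widrow_hoff_update([0.0, 0.0], 0.0, [1.0, 2.0], 1.0, 0.5)

print(small_w, small_b)
print(large_w, large_b)
```

A rate near 0 gives small, cautious adjustments; a rate near 1 gives large adjustments that can overshoot.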

Function    Description
compet      Competitive transfer function
hardlim     Hard-limit transfer function
hardlims    Symmetric hard-limit transfer function
logsig      Log-sigmoid transfer function
netinv      Inverse transfer function
poslin      Positive linear transfer function
purelin     Linear transfer function
radbas      Radial basis transfer function
radbasn     Normalized radial basis transfer function
satlin      Saturating linear transfer function
satlins     Symmetric saturating linear transfer function
softmax     Soft max transfer function
tansig      Hyperbolic tangent sigmoid transfer function
tribas      Triangular basis transfer function

Example:
Code to create a plot of the hardlim transfer
function:
n = -5:0.1:5;
a = hardlim(n);
plot(n,a)
Assign this transfer function to layer i of a network
as:
net.layers{i}.transferFcn = 'hardlim';
Algorithms
hardlim(n) = 1 if n >= 0
             0 otherwise

a = purelin(n)

Examples
code to create a plot of the purelin transfer function.
n = -5:0.1:5;
a = purelin(n);
plot(n,a)
Assign this transfer function to layer i of a network by
net.layers{i}.transferFcn = 'purelin';
Algorithms
a = purelin(n) = n

a= logsig(n)

Examples
Here is the code to create a plot of the logsig
transfer function.

n = -5:0.1:5;
a = logsig(n);
plot(n,a)
Assign this transfer function to layer i of a network.
net.layers{i}.transferFcn = 'logsig';
Algorithms
logsig(n) = 1 / (1 + exp(-n))

Examples

code to create a plot of the tansig transfer function
is:
n = -5:0.1:5;
a = tansig(n);
plot(n,a)
Assign this transfer function to layer i of a network.
net.layers{i}.transferFcn = 'tansig';
Algorithms
a = tansig(n) = 2/(1+exp(-2*n))-1
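The four formulas quoted in the Algorithms entries can be compared numerically. The Python functions below are stand-ins written from those formulas, not the MATLAB toolbox functions themselves; they reuse the toolbox names only for readability.

```python
import math


def hardlim(n):
    """1 if n >= 0, 0 otherwise."""
    return 1.0 if n >= 0 else 0.0


def purelin(n):
    """Linear transfer function: the output equals the input."""
    return n


def logsig(n):
    """Log-sigmoid: 1 / (1 + exp(-n)), with outputs between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-n))


def tansig(n):
    """Hyperbolic tangent sigmoid: 2 / (1 + exp(-2n)) - 1, outputs between -1 and 1."""
    return 2.0 / (1.0 + math.exp(-2.0 * n)) - 1.0


for n in (-5.0, -1.0, 0.0, 0.5, 5.0):
    # tansig agrees with tanh, and is a rescaled logsig: 2 * logsig(2n) - 1.
    print(n, hardlim(n), purelin(n), logsig(n), tansig(n), math.tanh(n))
```

The tansig formula is mathematically the same as tanh(n), and it is also logsig stretched to the range -1 to 1.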

Function     Description
train        Train neural network
trainb       Batch training with weight and bias learning rules
trainbfg     BFGS quasi-Newton backpropagation
trainbfgc    BFGS quasi-Newton backpropagation for use with NN model reference adaptive controller
trainbr      Bayesian regularization backpropagation
trainbu      Batch unsupervised weight/bias training
trainc       Cyclical order weight/bias training
traincgb     Conjugate gradient backpropagation with Powell-Beale restarts
traincgf     Conjugate gradient backpropagation with Fletcher-Reeves updates
traincgp     Conjugate gradient backpropagation with Polak-Ribière updates
traingd      Gradient descent backpropagation
traingda     Gradient descent with adaptive learning rate backpropagation
traingdm     Gradient descent with momentum backpropagation
traingdx     Gradient descent with momentum and adaptive learning rate backpropagation
trainlm      Levenberg-Marquardt backpropagation
trainoss     One-step secant backpropagation
trainr       Random order incremental training with learning functions
trainrp      Resilient backpropagation
trainru      Unsupervised random order weight/bias training
trains       Sequential order incremental training with learning functions
trainscg     Scaled conjugate gradient backpropagation

Training stops when any of these
conditions occurs:

The maximum number of epochs
(repetitions) is reached.
The maximum amount of time is
exceeded.
Performance is minimized to the goal.
The performance gradient falls below
min_grad.
mu exceeds mu_max.
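A training loop that checks these conditions could look roughly like the Python sketch below. It is an illustration, not the toolbox's training code: the function names, the default values, and the toy problem (minimising w^2 by gradient descent) are assumptions. The mu check is left out because mu belongs to specific algorithms such as Levenberg-Marquardt.

```python
import time


def train(step, max_epochs=1000, max_time=60.0, goal=0.0, min_grad=1e-10):
    """Run step() once per epoch until one of the stopping conditions occurs.

    step() must perform one epoch and return (performance, gradient_norm).
    """
    start = time.monotonic()
    for epoch in range(1, max_epochs + 1):
        performance, grad_norm = step()
        if performance <= goal:
            return epoch, "performance goal met"
        if grad_norm < min_grad:
            return epoch, "minimum gradient reached"
        if time.monotonic() - start > max_time:
            return epoch, "maximum time exceeded"
    return max_epochs, "maximum epochs reached"


def make_quadratic_step(w0, learning_rate):
    """Toy problem (assumed): gradient descent on performance = w**2."""
    state = {"w": w0}

    def step():
        gradient = 2.0 * state["w"]
        state["w"] -= learning_rate * gradient
        return state["w"] ** 2, abs(2.0 * state["w"])

    return step


print(train(make_quadratic_step(1.0, 0.25), goal=1e-3))
print(train(make_quadratic_step(1.0, 0.25), max_epochs=3))
```

In the first call the performance goal is met before the epoch limit; in the second the epoch limit is reached first.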

Back Propagation is a systematic method of training
multilayer artificial neural networks.

The Back Propagation algorithm is a generalization of the
Widrow-Hoff correction rule given as
w1 = (T - w2 x2)/x1 and w2 = (T - w1 x1)/x2.

For a typical neuron with inputs xi and weights Wi, the
summation of the weighted inputs I is given by
I = \sum_i W_i x_i

The sigmoidal function has been used as the nonlinear
activation function:
f(I) = 1 / (1 + exp(-I))

This sigmoidal function is a logistic function, which
monotonically increases from a lower limit (0 or -1)
to an upper limit (+1) as I increases. The values vary
between 0 and 1, with a value of 0.5 when I is
zero.
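For a single sigmoid neuron, the quantity that back propagation passes backwards is the derivative of the error with respect to each weight. The Python sketch below assumes the error measure E = 0.5 (T - y)^2, which the slides do not state, and uses made-up weights and inputs. It computes the gradient analytically and compares it with a finite-difference estimate.

```python
import math


def sigmoid(i):
    """Logistic function: between 0 and 1, equal to 0.5 at I = 0."""
    return 1.0 / (1.0 + math.exp(-i))


def forward(weights, inputs):
    """Output of the neuron: sigmoid of the weighted sum I = sum_i W_i * x_i."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)))


def squared_error(weights, inputs, target):
    """Assumed error measure: E = 0.5 * (T - y)^2."""
    return 0.5 * (target - forward(weights, inputs)) ** 2


def error_gradient(weights, inputs, target):
    """dE/dW_i = (y - T) * y * (1 - y) * x_i, using the sigmoid derivative y * (1 - y)."""
    y = forward(weights, inputs)
    delta = (y - target) * y * (1.0 - y)
    return [delta * x for x in inputs]


# Illustrative values only (assumed).
W = [0.2, -0.4, 0.1]
x = [1.0, 0.5, -1.5]
T = 1.0

analytic = error_gradient(W, x, T)
h = 1e-5
numeric = []
for i in range(len(W)):
    up = list(W)
    down = list(W)
    up[i] += h
    down[i] -= h
    numeric.append((squared_error(up, x, T) - squared_error(down, x, T)) / (2 * h))

print(analytic)
print(numeric)
```

Moving each weight a small step against its gradient component reduces the error; in a multilayer network the same chain-rule step is repeated layer by layer.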

%========================================================================%
% CODE FOR climate change in Allahabad
% Date: June 21, 2011
%========================================================================%

clear all
clc

t1 = cputime;
% Load the data file and split it into calibration and validation sets
data = xlsread('annallahabad.xls');
cali = data(2:150,:);   % calibration data
vali = data(151:end,:); % validation data
% Inputs (columns 1 to 5) and target (column 6) for the network
caliin = cali(:,1:5);
caliout = cali(:,6);
valiin = vali(:,1:5);
valiout = vali(:,6);

% Feed-forward neural network
n = 3; % three hidden neurons
% newff needs one [min max] row per input (five here). The original
% hard-coded range matrix had only four rows, so the ranges are taken
% from the calibration data with minmax instead.
net1 = newff(minmax(caliin'), [n 1], {'logsig', 'purelin'}, 'trainbr');
net1.IW{1,1} = 0.01*ones(n,5); % initial input weights, one row per hidden neuron
net1.b{1} = zeros(n,1);        % initial hidden-layer biases
net1.inputWeights{1,1}.learnFcn = 'learngd';
net1.layerWeights{2,1}.learnParam.lr = 0.1; % hidden-to-output weights (the network has two layers)
net1.trainParam.epochs = 500;
net1.trainParam.goal = 0.001;
net1.performFcn = 'msereg';
net1 = train(net1, caliin', caliout');

% Simulate the trained network on the validation inputs
model_output = sim(net1, valiin');
net1.IW{1,1}
net1.b{1}
model_output'
err = abs(valiout - model_output');
rmse = sqrt(sum(err.*err)/numel(valiout)) % divide by the number of validation samples

figure, plot(valiout) % plots the actual validation output as a blue line
hold on
plot(model_output,'r') % plots the model output as a red line
hold off
legend('observed PET', 'ANN model_output for PET')
ylabel('PET'); xlabel('No. of observations')
figure, plot(err) % plots the variation in error
ylabel('Error'); xlabel('No. of observations')
f = getx(net1); % stores the network parameters for future use, if any

Aerospace
Automotive
Banking
Defense
Electronics
Entertainment
Financial
Insurance
Manufacturing
Medical
Oil and Gas
Robotics
Speech
Securities
Telecommunications
Transportation
