MH1202
Karthik Murali Madhavan Rathai
Assistant Professor
Department of Mechatronics Engineering
SRM University, Kattankulathur
Room no. H316
E-mail – karthikmr1991@gmail.com
Phone – +91-9840291486
Website – https://sites.google.com/site/karthikmr091/
Syllabus for ANN in MH1202
Mathematical prerequisites
• Linear algebra (matrices, vectors, null space, rank, etc.)
• Optimization (convex/non-convex problems, gradient-free methods, etc.)
• Probability & statistics (distributions, data fitting, data analysis, etc.)
• Functional analysis (kernels, Hilbert/Banach spaces, etc.)
• Calculus (basic multivariate calculus, integration/differentiation, etc.)
Artificial neural network (ANN)
• ANNs are nonlinear information (signal) processing devices, built from
interconnected elementary processing units called neurons.
• ANNs are inspired by the way biological nervous systems, i.e. the brain,
process information. The key element of this paradigm is the novel
structure of the information processing system.
• A neural network is a massively parallel distributed processor that has a
natural propensity for storing experiential knowledge and making it
available for use. It resembles the brain in two respects:
Knowledge is acquired by the network through a learning process.
Inter-neuron connection strengths, known as synaptic weights, are used to store
the knowledge.
Artificial neural network (ANN)
• An artificial neural network is characterized by
Architecture (connection between neurons)
Training or learning (determining weights on the connections)
Activation function
• Example: a single neuron with inputs x1 and x2 (input layer), connection
weights w1 and w2, and output y (output layer), as sketched in the code below.
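A minimal sketch of this example in Python (the weight values and the threshold of the step activation are illustrative assumptions, not values from the course material):

def step(v, threshold=0.5):
    # Hard-limit activation function: fire only when the weighted sum
    # reaches the threshold.
    return 1 if v >= threshold else 0

def neuron(x1, x2, w1=0.5, w2=0.5):
    v = w1 * x1 + w2 * x2      # weighted sum of the inputs
    return step(v)             # output y of the single neuron

print(neuron(0, 0))  # -> 0
print(neuron(1, 1))  # -> 1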
Why artificial neural networks (ANNs)?
• The long course of evolution has given the human brain many
desirable characteristics not present in von Neumann or modern
parallel computers, including:
Massive parallelism
Distributed representation and computation
Learning ability
Generalization ability
Adaptivity
Inherent contextual information processing
Fault tolerance
Low energy consumption
NNs vs Computers
Digital Computers:
o Deductive reasoning – we apply known rules to input data to produce output.
o Computation is centralized, synchronous, and serial.
o Memory is packetted, literally stored, and location addressable.
o Not fault tolerant: one transistor goes and it no longer works.
o Exact.
o Static connectivity.
o Applicable if well-defined rules with precise input data exist.
Neural Networks:
o Inductive reasoning – given input and output data (training examples), we construct the rules.
o Computation is collective, asynchronous, and parallel.
o Memory is distributed, internalized, short-term, and content addressable.
o Fault tolerant: redundancy and sharing of responsibilities.
o Inexact.
o Dynamic connectivity.
o Applicable if rules are unknown or complicated, or if data are noisy or partial.
Other key advantages
• Adaptive learning – An ability to learn how to do tasks based on the
data given for training or initial experience.
• Self-organization – An ANN can create its own organization or
representation of the information it receives during learning time.
• Real-time operation – ANN computations may be carried out in
parallel, using special hardware devices designed and manufactured
to take advantage of this capability.
• Fault tolerance via redundant information coding – Partial destruction
of a network leads to a corresponding degradation of performance.
However, some network capabilities may be retained even after
major network damage due to this feature.
Historical tour on ANN
• 1943 – McCulloch & Pitts: Start of the modern era of neural networks
This work forms a logical calculus of neural networks. A network consisting
of a sufficient number of neurons with properly set synaptic connections can
compute any computable function. A simple logic function is performed by a
neuron in this scheme based upon the weights set in the McCulloch–Pitts
neuron, and an arrangement of such neurons may be represented as a
combination of logic functions. The most important feature of this neuron is
the concept of a threshold, as the example below illustrates.
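To make the threshold concept concrete, here is a minimal sketch of a McCulloch–Pitts unit computing logical AND (the unit weights and the threshold of 2 are assumptions chosen for the example, not values from the original paper):

def mcculloch_pitts(inputs, weights, threshold):
    # The unit fires (outputs 1) only when the weighted sum of its
    # binary inputs reaches the threshold.
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0

# AND truth table: the output is 1 only for inputs (1, 1).
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, mcculloch_pitts((x1, x2), (1, 1), threshold=2))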
Historical tour on ANN
• 1949 – Hebb’s book “The organization of behavior”
An explicit statement of a physiological learning rule for synaptic
modification was presented for the first time. Hebb proposed that the
connectivity of the brain is continually changing as an organism learns
differing functional tasks and that neural assemblies are created by such
changes.
Hebb's theory in simple words – "If two neurons are found to be active
simultaneously, the strength of the connection between the two neurons
should be increased."
Historical tour on ANN
• 1958 – Rosenblatt introduces the Perceptron
In the Perceptron network the weights on the connection paths can be adjusted.
A method of iterative weight adjustment can be used in the Perceptron net.
The Perceptron net is found to converge if the weights obtained allow the net
to reproduce exactly all the training input and target output vector pairs, as in
the sketch below.
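A minimal sketch of this iterative weight adjustment (the learning rate, bias handling, and training data are illustrative assumptions):

def train_perceptron(data, lr=0.1, epochs=100):
    # Adjust weights whenever the prediction differs from the target,
    # and stop once every training pair is reproduced exactly, which is
    # the convergence condition described above.
    n = len(data[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        errors = 0
        for x, target in data:
            y = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else 0
            if y != target:
                errors += 1
                w = [wi + lr * (target - y) * xi for wi, xi in zip(w, x)]
                b += lr * (target - y)
        if errors == 0:
            break
    return w, b

# Example: learn logical OR, a linearly separable task.
w, b = train_perceptron([((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)])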
Historical tour on ANN
Figure: analogy between a biological neuron (dendrites, cell body, input and
output signals flowing from neuron i to neuron j) and an artificial neuron
(inputs p2, p3 with weights w2, w3, an activation function f, and output a).
Hebbian learning
• Using Hebb's Law we can express the adjustment applied to the
weight w_ij at iteration p in the following form:
    Δw_ij(p) = F[ y_j(p), x_i(p) ]
• As a special case, we can represent Hebb's Law as follows, where α is the learning rate:
    Δw_ij(p) = α y_j(p) x_i(p)
• Hebb's Law with forgetting:
    Δw_ij(p) = α y_j(p) x_i(p) − φ y_j(p) w_ij(p)
• where φ is the forgetting factor.
• The forgetting factor usually falls in the interval between 0 and 1, typically between 0.01 and 0.1, to
allow only a little "forgetting" while limiting the weight growth.
• Oja's rule – The Hebbian learning rule has a severe problem: there is nothing to stop the
connections from growing all the time, finally leading to very large values. There should be another
term to balance this growth. In many neuron models, another term representing "forgetting" has
been used: the value of the weight itself is subtracted from the right-hand side. The central
idea in the Oja learning rule is to make this forgetting term proportional not only to the value of
the weight, but also to the square of the output of the neuron. The Oja rule reads
    Δw_ij(p) = α y_j(p) [ x_i(p) − y_j(p) w_ij(p) ]
(see the sketch below).
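A minimal sketch of one Oja-rule update for a single linear neuron (the learning rate and the input data are assumptions for illustration):

def oja_update(w, x, lr=0.01):
    # y is the output of a linear neuron; the term y*y*wi is the
    # forgetting term, proportional to the weight and to the square of
    # the output, which prevents unbounded weight growth.
    y = sum(wi * xi for wi, xi in zip(w, x))
    return [wi + lr * (y * xi - y * y * wi) for wi, xi in zip(w, x)]

w = [0.3, 0.7]
for _ in range(1000):
    w = oja_update(w, [1.0, 0.5])
# The norm of w tends toward 1 instead of growing without bound.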
Hebbian learning algorithm
• Step 1: Initialization - Set initial synaptic weights and thresholds to
small random values, say in an interval [0, 1].
• Step 2: Activation - Compute the neuron output at iteration p:
    y_j(p) = Σ_{i=1..n} x_i(p) w_ij(p) − θ_j
• where n is the number of neuron inputs, and θ_j is the threshold value
of neuron j.
Hebbian learning
• Step 3: Learning - Update the weights in the network:
    w_ij(p+1) = w_ij(p) + Δw_ij(p)
• where Δw_ij(p) is the weight correction at iteration p.
• The weight correction is determined by the generalized activity
product rule:
    Δw_ij(p) = φ y_j(p) [ λ x_i(p) − w_ij(p) ]
• where λ = α/φ, i.e. the forgetting-factor form of Hebb's Law above with
the common factor φ y_j(p) taken out (see the sketch below).
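A minimal sketch of Steps 1-3 for one output neuron (the rates phi and lam, the zero threshold, and the repeated input pattern are illustrative assumptions):

import random

def hebbian_train(samples, phi=0.1, lam=2.0, theta=0.0):
    n = len(samples[0])
    w = [random.random() for _ in range(n)]   # Step 1: random weights in [0, 1]
    for x in samples:
        y = sum(xi * wi for xi, wi in zip(x, w)) - theta   # Step 2: activation
        # Step 3: generalized activity product rule
        w = [wi + phi * y * (lam * xi - wi) for wi, xi in zip(w, x)]
    return w

w = hebbian_train([[1, 0, 0, 0, 1]] * 20)
# Weights on the active inputs grow toward lam; the others decay toward 0.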
Figure: Hebbian learning example – a five-neuron input layer (x1…x5) fully
connected to a five-neuron output layer (y1…y5), with binary activity
patterns shown before and after training.
Initial and final weight matrices
Initial weight matrix (rows: input neurons 1-5; columns: output neurons 1-5):
1 0 0 0 0
0 1 0 0 0
0 0 1 0 0
0 0 0 1 0
0 0 0 0 1
Final weight matrix after Hebbian training:
0 0      0      0      0
0 2.0204 0      0      2.0204
0 0      1.0200 0      0
0 0      0      0.9996 0
0 2.0204 0      0      2.0204
Testing
• A test input vector, or probe, is defined as
    X = [1 0 0 0 1]^T
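A minimal sketch of the test, multiplying the probe by the final weight matrix above and applying a step activation (the zero thresholds are an assumption):

# Rows of W are input neurons i, columns are output neurons j.
W = [
    [0, 0,      0,      0,      0     ],
    [0, 2.0204, 0,      0,      2.0204],
    [0, 0,      1.0200, 0,      0     ],
    [0, 0,      0,      0.9996, 0     ],
    [0, 2.0204, 0,      0,      2.0204],
]
x = [1, 0, 0, 0, 1]  # the probe X
y = [1 if sum(x[i] * W[i][j] for i in range(5)) > 0 else 0 for j in range(5)]
print(y)  # -> [0, 1, 0, 0, 1]: the probe recalls the stored pattern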
Boltzmann machine
• In the Boltzmann machine, a neuron k may flip its state x_k to −x_k with probability
    P(x_k → −x_k) = 1 / (1 + exp(∆E_k / T))
• where ∆E_k is the energy change of the machine resulting from such a flip (flip
from state x_k to state −x_k).
• If this rule is applied repeatedly, the machine reaches thermal equilibrium
(note that T is a pseudo-temperature).
• Two modes of operation (a sketch of the flip rule follows this list)
Clamped condition: visible neurons are clamped onto specific states determined by the
environment (i.e. under the influence of the training set).
Free-running condition: all neurons (visible and hidden) are allowed to operate freely
(i.e. with no environmental input).
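A minimal sketch of the stochastic flip rule for a bipolar (+1/−1) network (the quadratic energy function and symmetric weight matrix are standard assumptions, not given on these slides):

import math, random

def flip_neuron(x, W, k, T):
    # Energy change of flipping x_k, for energy E = -1/2 * sum_ij w_ij x_i x_j
    field = sum(W[k][j] * x[j] for j in range(len(x)) if j != k)
    dE = 2.0 * x[k] * field
    # Accept the flip x_k -> -x_k with probability 1 / (1 + exp(dE / T)).
    if random.random() < 1.0 / (1.0 + math.exp(dE / T)):
        x[k] = -x[k]
    return x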
Boltzmann Machine operation
• Such a network can be used for pattern completion.
• The goal of Boltzmann learning is to maximize the likelihood function
(using gradient descent on the negative log-likelihood).