Course grading
Duration: 45 hours
Grading:
Project: 50%, group report (maximum 3 students per group)
Theory and application of neural networks and fuzzy logic; simulation or real experiments on hardware.
Written exam: 50% (90 minutes)
Students with less than 80% attendance will not be allowed to take the exam.
Contents
Neural Network
Theory and simulation (using MATLAB or other languages)
Fuzzy
Theory and simulation (using the MATLAB Fuzzy Logic Toolbox)
Contents
Introduction
Characteristics
Nonlinear I/O mapping
Adaptivity
Generalization ability
Fault tolerance (graceful degradation)
Biological analogy
Hopfield Network
Boltzmann Machine
Time factor
Training takes a long time.
NETtalk [Sejnowski]
Inputs: English text
Output: Spoken phonemes
ALVINN System
[Figure: image from a forward-mounted camera, and the weight values for one of the hidden units]
Application: Error Correction by a Hopfield Network
[Figure: corrupted input data vs. original target data; corrected data after 10 and 20 iterations; fully corrected data after 35 iterations]
Perceptron and Gradient Descent Algorithm
Architecture of A Perceptron
Linear function:
$f(x) = w_0 + w_1 x_1 + \cdots + w_n x_n$
Perceptron and Decision Hyperplane
Perceptrons can represent all of the primitive boolean functions AND, OR, NAND, and NOR.
Note: some boolean functions cannot be represented by a single perceptron (e.g. XOR).
Why not? XOR is not linearly separable, so no single decision hyperplane can realize it.
Perceptron rule
Thresholded output
Converges after a finite number of iterations to a hypothesis that
perfectly classifies the training data, provided the training
examples are linearly separable.
Can deal with only linearly separable data
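A minimal sketch of the perceptron rule in Python/NumPy. The boolean AND training set and the learning rate are assumptions chosen for illustration:

```python
import numpy as np

# Perceptron training rule: w_i <- w_i + eta * (t - o) * x_i,
# where o is the *thresholded* output.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])   # assumed example: boolean AND
t = np.array([0, 0, 0, 1])                        # AND targets
w = np.zeros(2)
b = 0.0
eta = 0.1                                         # assumed learning rate

for epoch in range(100):
    errors = 0
    for x, target in zip(X, t):
        o = 1 if w @ x + b > 0 else 0             # thresholded output
        w += eta * (target - o) * x
        b += eta * (target - o)
        errors += int(o != target)
    if errors == 0:                               # converged: data are linearly separable
        break
print(w, b)
```

Replacing the targets with XOR's [0, 1, 1, 0] keeps the loop cycling forever, matching the linear-separability caveat above.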
Delta rule
Unthresholded output
Converges only asymptotically toward the error minimum,
possibly requiring unbounded time, but converges regardless of
whether the training data are linearly separable.
Can deal with linearly nonseparable data
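For contrast, a sketch of the delta rule on the linearly non-separable XOR targets (an assumed example). The update uses the unthresholded linear output, so it follows the error gradient even though no linear unit classifies XOR perfectly:

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([0.0, 1.0, 1.0, 0.0])                # XOR: not linearly separable
w = np.zeros(2)
b = 0.0
eta = 0.05                                        # assumed learning rate

# Delta rule: w_i <- w_i + eta * (t - o) * x_i with o = w . x (unthresholded).
for epoch in range(1000):
    for x, target in zip(X, t):
        o = w @ x + b
        w += eta * (target - o) * x
        b += eta * (target - o)

# Approaches the minimum-squared-error linear fit, even though no
# linear unit can classify XOR perfectly.
print(w, b)
```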
Multilayer Perceptron
Multilayer Network and Its Decision Boundaries
BP learns the weights for a multilayer network, given a network with a fixed set
of units and interconnections.
$E(\vec{w}) = \frac{1}{2} \sum_{d \in D} \sum_{k \in \mathrm{outputs}} (t_{kd} - o_{kd})^2$
Termination Criteria
After a fixed number of iterations (epochs)
Once the error falls below some threshold
Once the validation error meets some criterion
Adding Momentum
The momentum term:
$\Delta w_{ji}(n) = \eta\,\delta_j x_{ji} + \alpha\,\Delta w_{ji}(n-1), \quad 0 \le \alpha < 1$
Helps escape small local minima in the error surface.
Speeds up convergence.
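A toy sketch of the momentum update in Python, using a one-dimensional quadratic stand-in for the BP error surface (an assumption; in BP the gradient term would be $\delta_j x_{ji}$):

```python
# Gradient descent with momentum on a toy error surface E(w) = 0.5 * w^2
# (an assumed stand-in for the BP error surface).
eta, alpha = 0.1, 0.9          # learning rate and momentum, 0 <= alpha < 1
w, dw_prev = 5.0, 0.0
for n in range(50):
    grad = w                               # dE/dw for E = 0.5 * w^2
    dw = -eta * grad + alpha * dw_prev     # dw(n) = -eta*grad + alpha*dw(n-1)
    w += dw
    dw_prev = dw
print(w)                                   # approaches the minimum at w = 0
```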
Derivation of the BP Rule
Notations
x_ji : the ith input to unit j
w_ji : the weight associated with the ith input to unit j
net_j : the weighted sum of inputs for unit j
o_j : the output computed by unit j
t_j : the target output for unit j
σ : the sigmoid function
outputs : the set of units in the final layer of the network
Downstream(j) : the set of units whose immediate inputs include the output of unit j
Derivation of the BP Rule
Error measure:
$E_d(\vec{w}) = \frac{1}{2} \sum_{k \in \mathrm{outputs}} (t_k - o_k)^2$
Gradient descent:
$\Delta w_{ji} = -\eta\,\frac{\partial E_d}{\partial w_{ji}}$
Case 1: Rule for Output Unit Weights
Step 1: since $net_j = \sum_i w_{ji} x_{ji}$,
$\frac{\partial E_d}{\partial w_{ji}} = \frac{\partial E_d}{\partial net_j}\frac{\partial net_j}{\partial w_{ji}} = \frac{\partial E_d}{\partial net_j}\,x_{ji}, \qquad \frac{\partial E_d}{\partial net_j} = \frac{\partial E_d}{\partial o_j}\frac{\partial o_j}{\partial net_j}$
Step 2:
$\frac{\partial E_d}{\partial o_j} = \frac{\partial}{\partial o_j}\,\frac{1}{2}\sum_{k \in \mathrm{outputs}} (t_k - o_k)^2 = -(t_j - o_j)$
Step 3:
$\frac{\partial o_j}{\partial net_j} = \frac{\partial \sigma(net_j)}{\partial net_j} = o_j (1 - o_j)$
All together:
$\Delta w_{ji} = -\eta\,\frac{\partial E_d}{\partial w_{ji}} = \eta\,(t_j - o_j)\,o_j (1 - o_j)\,x_{ji}$
Case 2: Rule for Hidden Unit Weights
Step 1:
$\frac{\partial E_d}{\partial net_j} = \sum_{k \in Downstream(j)} \frac{\partial E_d}{\partial net_k}\frac{\partial net_k}{\partial o_j}\frac{\partial o_j}{\partial net_j}$
$= \sum_{k \in Downstream(j)} -\delta_k\,\frac{\partial net_k}{\partial o_j}\frac{\partial o_j}{\partial net_j}$
$= \sum_{k \in Downstream(j)} -\delta_k\,w_{kj}\,\frac{\partial o_j}{\partial net_j}$
$= \sum_{k \in Downstream(j)} -\delta_k\,w_{kj}\,o_j (1 - o_j)$
Thus:
$\Delta w_{ji} = \eta\,\delta_j x_{ji}, \quad \text{where } \delta_j = o_j (1 - o_j) \sum_{k \in Downstream(j)} \delta_k w_{kj}$
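The two δ rules above are enough for a complete implementation. Below is a hedged Python/NumPy sketch of BP for one hidden layer, trained on XOR; the data set, layer sizes, learning rate, and epoch count are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# XOR data (assumed example); one hidden layer of 2 sigmoid units.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0.0], [1.0], [1.0], [0.0]])

W1 = rng.normal(scale=0.5, size=(2, 2)); b1 = np.zeros(2)   # input -> hidden
W2 = rng.normal(scale=0.5, size=(2, 1)); b2 = np.zeros(1)   # hidden -> output
eta = 0.5                                                    # assumed learning rate

for epoch in range(20000):
    for x, t in zip(X, T):
        # Forward pass
        h = sigmoid(x @ W1 + b1)
        o = sigmoid(h @ W2 + b2)
        # Output units: delta_k = o_k (1 - o_k) (t_k - o_k)
        delta_o = o * (1 - o) * (t - o)
        # Hidden units: delta_j = o_j (1 - o_j) * sum_k delta_k w_kj
        delta_h = h * (1 - h) * (W2 @ delta_o)
        # Weight updates: delta_w_ji = eta * delta_j * x_ji
        W2 += eta * np.outer(h, delta_o); b2 += eta * delta_o
        W1 += eta * np.outer(x, delta_h); b1 += eta * delta_h

# Typically approaches the XOR targets [0, 1, 1, 0], though BP can
# occasionally stall in a local minimum (see the notes that follow).
print(np.round(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2), 2))
```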
Backpropagation for MLP: revisited
Convergence and Local Minima
Notes
Gradient descent over the complex error surfaces represented by
ANNs is still poorly understood
No methods are known to predict with certainty when local minima will cause difficulties.
We can only use heuristics for avoiding local minima.
Heuristics for Alleviating the Local Minima Problem
Train multiple networks using the same data, but initializing each network with
different random weights.
Select the best network w.r.t. the validation set
Make a committee of networks
Why Does BP Work in Practice? A Possible Scenario
Differentiable
Provides a useful structure for gradient search.
This structure is quite different from the general-to-specific ordering in candidate elimination (CE) or the simple-to-complex ordering in ID3 or C4.5.
Weight decay
Decrease each weight by some small factor during each iteration.
This is equivalent to modifying the definition of E to include a penalty term corresponding to the total magnitude of the network weights.
The motivation is to keep weight values small, biasing learning against complex decision surfaces.
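A one-step sketch of weight decay in Python; the learning rate, decay factor, and gradient values are assumptions. Shrinking each weight by the factor η·λ per step is exactly the gradient step for the penalty term (λ/2)‖w‖²:

```python
import numpy as np

# Weight decay: multiplying each weight by (1 - eta*lam) per step is the
# gradient step for the penalty term (lam/2) * ||w||^2 added to E.
eta, lam = 0.1, 0.01            # assumed learning rate and decay factor
w = np.array([2.0, -3.0])       # assumed current weights
grad_E = np.array([0.5, -0.2])  # assumed gradient of the data error
w = w - eta * grad_E - eta * lam * w    # decay shrinks every weight slightly
print(w)
```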
k-fold cross-validation
Cross validation is performed k different times, each time using a different partitioning of
the data into training and validation sets
The results are averaged over the k runs.
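A minimal k-fold skeleton in Python/NumPy; `evaluate` is a hypothetical callback standing in for one train-and-validate run, not a function from the slides:

```python
import numpy as np

# k-fold cross-validation skeleton; evaluate() is a hypothetical scorer.
def k_fold_scores(data, k, evaluate):
    rng = np.random.default_rng(0)
    idx = rng.permutation(len(data))
    folds = np.array_split(idx, k)          # k disjoint validation sets
    scores = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        scores.append(evaluate(train, val)) # train on k-1 folds, validate on 1
    return np.mean(scores)                  # results averaged over the k runs

# Toy usage with a stand-in evaluator:
print(k_fold_scores(np.arange(100), 5, lambda tr, va: len(va) / len(tr)))
```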
Designing an Artificial Neural Network for a Face Recognition Application
Problem Definition
Input encoding
Output encoding
Possible Solutions
Extract key features using preprocessing
Coarse-resolution
Feature extraction
edges, regions of uniform intensity, other local image features
Drawback: high preprocessing cost, variable number of features
Coarse-resolution
Encode the image as a fixed set of 30 × 32 pixel intensity values, with one network input per pixel.
The 30×32 pixel image is a coarse-resolution summary of the original 120×128 pixel image.
Coarse resolution reduces the number of inputs and weights to a much more manageable size, thereby reducing computational demands.
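A sketch of the coarse-resolution encoding in Python/NumPy, assuming each 30×32 output pixel is the mean of a 4×4 block of the 120×128 image (the slides do not specify the summarization method):

```python
import numpy as np

# Coarse-resolution encoding: summarize a 120x128 image as 30x32 mean
# intensities (each coarse pixel averages a 4x4 block), one input per pixel.
rng = np.random.default_rng(0)
img = rng.random((120, 128))                      # stand-in for a camera image
coarse = img.reshape(30, 4, 32, 4).mean(axis=(1, 3))
net_input = coarse.ravel()                        # 960 inputs instead of 15360
print(coarse.shape, net_input.shape)
```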
Output Coding for Face Recognition
Exercise 1
Given the network and initial weights
Compute the output according to inputs
Train the network
Activation functions are sigmoids
Exercise 2
Example Multi-layer ANN with Sigmoid Units
Exercise
Given a 3-layer neural network: an input layer with 3 neurons, a hidden layer with 2 neurons, and an output layer with 2 neurons. The activation function is the sigmoid. The weight matrices are W1 = [0.2 0.7; -0.1 -1.2; 0.4 1.2] and W2 = [1.1 3.1; 0.1 1.17]; the learning rate is 0.1.
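A hedged Python/NumPy sketch for this exercise, using the given weight matrices and learning rate. The input and target vectors are assumptions, since the exercise statement above does not include them; biases are omitted because none are given:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Weight matrices from the exercise (rows follow the MATLAB layout).
W1 = np.array([[0.2, 0.7], [-0.1, -1.2], [0.4, 1.2]])   # input(3) -> hidden(2)
W2 = np.array([[1.1, 3.1], [0.1, 1.17]])                 # hidden(2) -> output(2)
eta = 0.1

x = np.array([1.0, 0.0, 1.0])   # assumed input; the exercise's vector is not given
t = np.array([1.0, 0.0])        # assumed target for one training step

# Forward pass
h = sigmoid(x @ W1)
o = sigmoid(h @ W2)

# One backpropagation step with the rules derived earlier
delta_o = o * (1 - o) * (t - o)
delta_h = h * (1 - h) * (W2 @ delta_o)
W2 += eta * np.outer(h, delta_o)
W1 += eta * np.outer(x, delta_h)
print(o)
```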
[Figure: convolution example — a binary input image convolved with small filters (labels "2x2x3", "1x1x3" in the original) to produce 3 feature maps; the numeric values did not survive extraction]
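As a hedged reconstruction of what the lost figure computed, here is a small Python/NumPy convolution sketch: a binary image convolved with three 2×2 filters yields three feature maps. All values are assumptions, since the figure's numbers are not recoverable:

```python
import numpy as np

# Valid convolution of an assumed binary image with three assumed 2x2
# filters, producing three feature maps (mirroring the figure's layout).
rng = np.random.default_rng(0)
img = rng.integers(0, 2, size=(5, 5))            # assumed binary input image
filters = rng.integers(0, 2, size=(3, 2, 2))     # three assumed 2x2 filters

maps = np.zeros((3, 4, 4), dtype=int)            # valid-convolution output
for k, f in enumerate(filters):
    for i in range(4):
        for j in range(4):
            maps[k, i, j] = np.sum(img[i:i+2, j:j+2] * f)
print(maps)
```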