
Prediction Networks

Prediction
Predict f(t) based on values of f(t-1), f(t-2), ...
Two NN models: feedforward and recurrent
A simple example (section 3.7.3)
Forecasting the gold price for a month based on its prices in previous months
Using a BP net with a single hidden layer
1 output node: forecasted price for month t
k input nodes (using the prices of the previous k months for prediction)
k hidden nodes
Training samples, for k = 2: {(x_{t-2}, x_{t-1}), x_t}
Raw data: gold prices for 100 consecutive months; 90 for training, 10 for cross-validation testing
One-lag forecasting: predict x_t based on x_{t-2} and x_{t-1}

Multilag forecasting: use predicted values for further forecasting
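A minimal sketch of this setup, assuming the trained net is abstracted as a hypothetical predict function and that the price series and window size k are given:

```python
import numpy as np

def make_samples(series, k=2):
    """Build training pairs ((x[t-k], ..., x[t-1]), x[t]) from a time series."""
    X = np.array([series[t - k:t] for t in range(k, len(series))])
    y = np.array([series[t] for t in range(k, len(series))])
    return X, y

def one_lag_forecast(series, k, predict):
    """Predict each x[t] from the k *observed* previous values."""
    return [predict(series[t - k:t]) for t in range(k, len(series))]

def multilag_forecast(seed, steps, k, predict):
    """Feed predictions back in as inputs to forecast several steps ahead."""
    window = list(seed[-k:])
    out = []
    for _ in range(steps):
        nxt = predict(window)      # `predict` stands in for the trained 2-2-1 net
        out.append(nxt)
        window = window[1:] + [nxt]
    return out
```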

Prediction Networks
Training:
Three attempts: k = 2, 4, 6
Learning rate = 0.3, momentum = 0.6
25,000 to 50,000 epochs
The 2-2-1 net gave good predictions
The two larger nets were over-trained
Results (MSE):
Network   Training   One-lag   Multilag
2-2-1     0.0034     0.0044    0.0045
4-4-1     0.0034     0.0098    0.0100
6-6-1     0.0028     0.0121    0.0176

Prediction Networks
Generic NN model for prediction
Preprocessor prepares training samples from time series data
Train predictor using samples (e.g., by BP learning)



Preprocessor
In the previous example:
Let k = d + 1 (using the previous d + 1 data points to predict)
x(t) = (x_0(t), x_1(t), ..., x_d(t)), where x_i(t) = x(t - i), i = 0, ..., d
More general:
each x_i(t) is computed from past values of x by a kernel function c_i; different kernels give different memory models (how previous data are remembered)
Examples: exponential trace memory; gamma memory (see p. 141)
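A small sketch of two possible preprocessors; the tapped delay line matches the previous example, while the exponential trace update x_i(t) = (1 - mu_i) x(t) + mu_i x_i(t-1) is one common form of that memory and is an assumption here:

```python
import numpy as np

def tapped_delay_line(x, d):
    """Previous example: x_i(t) = x(t - i), i = 0..d (defined for t >= d)."""
    return np.array([[x[t - i] for i in range(d + 1)] for t in range(d, len(x))])

def exponential_trace(x, mus):
    """Assumed exponential trace memory: x_i(t) = (1 - mu_i)*x(t) + mu_i*x_i(t-1),
    one trace per kernel, mu_i in [0, 1)."""
    mus = np.asarray(mus, dtype=float)
    traces = np.zeros(len(mus))
    out = []
    for xt in x:
        traces = (1 - mus) * xt + mus * traces
        out.append(traces.copy())
    return np.array(out)
```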
Prediction Networks
Recurrent NN architecture
Cycles in the net
Output nodes with connections to hidden/input nodes
Connections between nodes at the same layer
Node may connect to itself
Each node receives external input as well as input from other
nodes
Each node may be affected by the output of every other node
With a given external input vector, the net often converges to an equilibrium state after a number of iterations (the output of every node stops changing)
An alternative NN model for function approximation
Fewer nodes, more flexible/complicated connections
Learning is often more complicated
Prediction Networks
Approach I: unfolding to a
feedforward net
Each layer represents a time delay
of the network evolution
Weights in different layers are
identical

Cannot directly apply BP learning
(because weights in different
layers are constrained to be
identical)
How many layers to unfold to?
Hard to determine
(Figure: a fully connected net of 3 nodes and its equivalent FF net of k layers)
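A sketch of the unfolding idea: running a fully connected recurrent net for k steps is the same computation as a k-layer feedforward net whose layers all share one weight matrix (the tanh node function and the random 3-node net are assumptions for illustration):

```python
import numpy as np

def unfolded_forward(W, x_ext, y0, k):
    """Evolve a fully connected recurrent net for k steps.
    Each step is one 'layer' of the unfolded feedforward net;
    every layer uses the *same* weight matrix W."""
    y = y0
    for _ in range(k):                  # k unfolded layers
        y = np.tanh(W @ y + x_ext)      # identical weights in every layer
    return y

# Example: a fully connected net of 3 nodes unfolded for 5 layers
W = np.random.randn(3, 3) * 0.5
print(unfolded_forward(W, x_ext=np.ones(3), y0=np.zeros(3), k=5))
```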
Prediction Networks
Approach II: gradient descent
A more general approach
Error driven: for a given external input,
E(t) = Σ_k (d_k(t) - o_k(t))^2 = Σ_k e_k(t)^2
where k ranges over the output nodes (whose desired outputs are known)
Weight update:
Δw_{i,j}(t) = -η ∂E(t)/∂w_{i,j}
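A sketch of the error-driven update Δw_{i,j}(t) = -η ∂E(t)/∂w_{i,j}; the gradient is estimated here by finite differences purely for illustration (actual recurrent-net learning computes it analytically), and the tanh node function, settling loop, and choice of output nodes are assumptions:

```python
import numpy as np

def settle(W, x_ext, steps=50):
    """Run the recurrent net until (approximately) an equilibrium state."""
    y = np.zeros(len(x_ext))
    for _ in range(steps):
        y = np.tanh(W @ y + x_ext)
    return y

def error(W, x_ext, d, out_idx):
    """E = sum over output nodes k of (d_k - o_k)^2."""
    o = settle(W, x_ext)[out_idx]
    return np.sum((d - o) ** 2)

def gd_step(W, x_ext, d, out_idx, eta=0.05, eps=1e-5):
    """delta w_ij = -eta * dE/dw_ij, with the gradient approximated numerically."""
    grad = np.zeros_like(W)
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            Wp = W.copy(); Wp[i, j] += eps
            grad[i, j] = (error(Wp, x_ext, d, out_idx) - error(W, x_ext, d, out_idx)) / eps
    return W - eta * grad
```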
NN of Radial Basis Functions
Motivations: better performance than the sigmoid function in
Some classification problems
Function interpolation
Definition
A function is radially symmetric (or is an RBF) if its output depends only on the distance between the input vector and a vector stored with that function
Distance = ||u - μ_i||, where u is the input vector and μ_i is the vector associated with the i-th RBF
Output decreases with that distance: φ(u_1) > φ(u_2) whenever distances u_1 < u_2
NNs with RBF node functions are called RBF-nets
NN of Radial Basis Functions
Gaussian function is the most widely used RBF:
φ_g(u) = e^{-(u/c)^2}
a bell-shaped function centered at u = 0
Continuous and differentiable:
if φ_g(u) = e^{-(u/c)^2}, then φ_g'(u) = e^{-(u/c)^2} (-(u/c)^2)' = -(2u/c^2) φ_g(u)
Other RBFs
Inverse quadratic function: φ_q(u) = (c^2 + u^2)^β, for β < 0
Hyperspheric function: φ_s(u) = 1 if u ≤ c, 0 if u > c
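The three node functions above, written out as a sketch (parameter names c and beta follow the formulas; the defaults are arbitrary):

```python
import numpy as np

def gaussian_rbf(u, c=1.0):
    """phi_g(u) = exp(-(u/c)^2): bell-shaped, centered at u = 0, differentiable."""
    return np.exp(-(u / c) ** 2)

def gaussian_rbf_deriv(u, c=1.0):
    """phi_g'(u) = -(2u/c^2) * phi_g(u)."""
    return -(2 * u / c ** 2) * gaussian_rbf(u, c)

def inverse_quadratic_rbf(u, c=1.0, beta=-1.0):
    """phi_q(u) = (c^2 + u^2)^beta, beta < 0."""
    return (c ** 2 + u ** 2) ** beta

def hyperspheric_rbf(u, c=1.0):
    """phi_s(u) = 1 if u <= c, 0 otherwise."""
    return np.where(u <= c, 1.0, 0.0)
```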
NN of Radial Basis Functions
Pattern classification
4 or 5 sigmoid hidden nodes
are required for a good
classification
Only 1 RBF node is required
if the function can
approximate the circle
(Figure: sample points separable by a circular decision boundary)
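A toy sketch of the claim: a single Gaussian RBF node, thresholded at the value it takes on the circle's boundary, implements a circular decision region (the center, radius, and threshold here are assumptions for illustration):

```python
import numpy as np

def rbf_classify(x, center, radius):
    """One Gaussian RBF node: outputs 1 when the input lies within `radius`
    of `center` -- a circular boundary that would take several sigmoid
    hidden nodes to approximate."""
    phi = np.exp(-np.sum((x - center) ** 2) / radius ** 2)
    return 1 if phi >= np.exp(-1.0) else 0   # threshold at distance == radius

center = np.array([0.0, 0.0])
print(rbf_classify(np.array([0.3, 0.2]), center, radius=1.0))  # 1: inside
print(rbf_classify(np.array([2.0, 1.5]), center, radius=1.0))  # 0: outside
```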
NN of Radial Basis Functions
XOR problem
2-2-1 network
2 hidden nodes are RBF:
φ_1(x) = e^{-||x - t_1||^2}, t_1 = [1, 1]
φ_2(x) = e^{-||x - t_2||^2}, t_2 = [0, 0]
Output node can be step or sigmoid
When input x is applied
Hidden node j calculates the distance ||x - t_j||, then its output φ_j(x)
All weights to hidden nodes set to 1
Weights to output node trained by LMS
t_1 and t_2 can also be trained

x        φ_1(x)   φ_2(x)
(1,1)    1        0.1353
(0,1)    0.3678   0.3678
(0,0)    0.1353   1
(1,0)    0.3678   0.3678
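A short sketch that reproduces the hidden-node outputs in the table (the middle values round to 0.3679 rather than 0.3678) and notes why a single output node then suffices:

```python
import numpy as np

t1, t2 = np.array([1, 1]), np.array([0, 0])

def hidden(x):
    """phi_j(x) = exp(-||x - t_j||^2) for the two RBF hidden nodes."""
    return (np.exp(-np.sum((x - t1) ** 2)),
            np.exp(-np.sum((x - t2) ** 2)))

for x in [(1, 1), (0, 1), (0, 0), (1, 0)]:
    p1, p2 = hidden(np.array(x))
    print(x, round(p1, 4), round(p2, 4))
# In (phi_1, phi_2) space the points for inputs (1,1) and (0,0) are separated
# from those for (0,1) and (1,0) by a straight line, so one step/sigmoid
# output node trained by LMS can solve XOR.
```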
NN of Radial Basis Functions
Function interpolation
Suppose you know f(x_1) and f(x_2); to approximate f(x_0) (with x_1 < x_0 < x_2) by linear interpolation:
f(x_0) = f(x_1) + (f(x_2) - f(x_1)) (x_0 - x_1) / (x_2 - x_1)
Let D_1 = |x_0 - x_1| and D_2 = |x_2 - x_0| be the distances of x_0 from x_1 and x_2, then
f(x_0) = [f(x_1) D_1^{-1} + f(x_2) D_2^{-1}] / [D_1^{-1} + D_2^{-1}]
i.e., a sum of the known function values, weighted and normalized by distances
Generalized to interpolating from more than 2 known f values:
f(x_0) = [D_1^{-1} f(x_1) + D_2^{-1} f(x_2) + ... + D_P^{-1} f(x_P)] / [D_1^{-1} + D_2^{-1} + ... + D_P^{-1}]
where P is the number of neighbors of x_0
Only those f(x_i) with small distance to x_0 are useful
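A sketch of the distance-weighted, normalized interpolation formula above (the exact-match shortcut when x_0 coincides with a sample is an added convenience, not part of the slide):

```python
import numpy as np

def interpolate(x0, xs, fs):
    """f(x0) = sum_i D_i^{-1} f(x_i) / sum_i D_i^{-1},  D_i = |x0 - x_i|."""
    D = np.abs(x0 - np.asarray(xs, dtype=float))
    if np.any(D == 0):                     # x0 is one of the samples
        return fs[int(np.argmin(D))]
    w = 1.0 / D                            # closer samples get larger weights
    return float(np.dot(w, fs) / np.sum(w))

# Two known values: behaves like the two-point formula
print(interpolate(1.5, [1.0, 2.0], [10.0, 20.0]))   # 15.0
```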
NN of Radial Basis Functions
Example:
8 samples with known function values
f(x_0) can be interpolated using only its 4 nearest neighbors (x_2, x_3, x_4, x_5):
f(x_0) = [D_2^{-1} f(x_2) + D_3^{-1} f(x_3) + D_4^{-1} f(x_4) + D_5^{-1} f(x_5)] / [D_2^{-1} + D_3^{-1} + D_4^{-1} + D_5^{-1}]
Using RBF nodes to achieve the neighborhood effect
One hidden node per sample x_i, with a node function φ(D) that decays with the distance D from the input (e.g., φ(D) = D^{-1})
Network output for approximating f(x_0) is then proportional to Σ_i φ(D_i) f(x_i)
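A sketch of the "one hidden node per sample" idea; the Gaussian kernel and the normalization of the output are assumptions here, chosen so that distant samples contribute almost nothing:

```python
import numpy as np

def rbf_net_output(x0, samples, values, c=1.0):
    """One Gaussian hidden node per stored sample x_i; output is the
    kernel-weighted, normalized combination of the stored f(x_i)."""
    d2 = np.sum((np.asarray(samples, dtype=float) - x0) ** 2, axis=1)
    phi = np.exp(-d2 / c ** 2)            # distant samples contribute ~0
    return float(np.dot(phi, values) / np.sum(phi))

samples = [[0, 0], [1, 0], [0, 1], [1, 1]]
values  = [0.0, 1.0, 1.0, 0.0]
print(rbf_net_output(np.array([0.1, 0.05]), samples, values, c=0.5))  # close to 0
```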
Clustering samples
Too many hidden nodes when # of samples is large
Group similar samples together into N clusters, each with
The center: vector μ_i
Desired mean output: the mean of the desired outputs of the samples in cluster i
Network output: the distance-weighted, normalized combination of the cluster values (as above, with the μ_i in place of the individual samples)
Suppose we know how to determine N and how to cluster all P samples (not an easy task itself); the centers and weights can then be determined by learning
NN of Radial Basis Functions
Learning in RBF net
Objective: learn the parameters (the centers μ_i and the output weights w_i) to minimize the total squared output error E over the training samples
Gradient descent approach: update both the weights and the centers along -∂E/∂w_i and -∂E/∂μ_i
One can also obtain the centers μ_i by other clustering techniques, then use GD learning for the output weights w_i only
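A sketch of the two-stage scheme just described: pick the centers with a clustering technique (plain k-means is assumed here as one option), then learn the output weights by gradient descent / LMS with the centers fixed; the Gaussian width c and the learning rate are arbitrary:

```python
import numpy as np

def kmeans(X, N, iters=20):
    """Choose N cluster centers mu_i (simple k-means as one possible technique)."""
    mu = X[np.random.choice(len(X), N, replace=False)].astype(float)
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - mu[None]) ** 2).sum(-1), axis=1)
        for i in range(N):
            if np.any(labels == i):
                mu[i] = X[labels == i].mean(axis=0)
    return mu

def train_output_weights(X, d, mu, c=1.0, eta=0.1, epochs=200):
    """With centers fixed, learn the output weights w by gradient descent (LMS)."""
    phi = np.exp(-((X[:, None, :] - mu[None]) ** 2).sum(-1) / c ** 2)  # P x N
    w = np.zeros(len(mu))
    for _ in range(epochs):
        err = d - phi @ w                  # e_p = d_p - o_p
        w += eta * phi.T @ err / len(X)    # delta w ~ eta * sum_p e_p * phi_p
    return w
```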
Polynomial Networks
Polynomial networks
Node functions allow direct computing of polynomials
of inputs
Approximating higher order functions with fewer nodes
(even without hidden nodes)
Each node has more connection weights
Higher-order networks
Node function is a weighted sum of products of inputs (repeated inputs, i.e., higher powers, are allowed):
net = w_0 + Σ_i w_i x_i + Σ_{i≤j} w_{i,j} x_i x_j + ... (terms up to order k)
# of weights per node: C(n,1) + C(n+1,2) + ... + C(n+k-1,k) (plus the bias w_0)
Can be trained by LMS
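A sketch of a k-th order node: enumerate the monomials of the inputs up to order k (repetition allowed, so higher powers occur), take a weighted sum, and train by LMS since the output is linear in the weights; the printed term count matches the binomial-sum formula above:

```python
import numpy as np
from itertools import combinations_with_replacement

def monomials(x, k):
    """All products of up to k inputs, repetition allowed (so x_i^2 etc. occur);
    the length of this vector is the node's weight count (including the bias)."""
    feats = [1.0]                                   # bias term
    for order in range(1, k + 1):
        for idx in combinations_with_replacement(range(len(x)), order):
            feats.append(np.prod([x[i] for i in idx]))
    return np.array(feats)

def higher_order_node(x, w, k):
    """Node output: weighted sum of the monomials -- linear in w, so LMS applies."""
    return float(w @ monomials(x, k))

print(len(monomials(np.zeros(3), 2)) - 1)   # n=3, k=2 -> C(3,1) + C(4,2) = 9 terms
```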
Polynomial Networks
Sigma-pi networks
Node function is a weighted sum of products of distinct inputs: net = w_0 + Σ_j w_j Π_{i ∈ S_j} x_i
Does not allow terms with higher powers of inputs, so they are not general function approximators
# of weights per node: C(n,1) + C(n,2) + ... + C(n,k) (plus the bias w_0)
Can be trained by LMS
Pi-sigma networks
One hidden layer with Sigma function (hidden nodes compute weighted sums of the inputs)
Output nodes with Pi function (each output is the product of its hidden-node inputs)
Product units:
Node computes a product: x_1^{P_{j,1}} x_2^{P_{j,2}} ... x_n^{P_{j,n}}
Integer powers P_{j,i} can be learned
Often mixed with other units (e.g., sigmoid)
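A sketch contrasting a product unit with a sigma-pi term; the power vector and the term list are illustrative assumptions:

```python
import numpy as np

def product_unit(x, p):
    """Product unit: output = prod_i x_i ** P_{j,i}; the integer powers p
    are trainable parameters (fixed here for illustration)."""
    return float(np.prod(np.asarray(x, dtype=float) ** np.asarray(p)))

def sigma_pi_node(x, terms):
    """Sigma-pi node: weighted sum of products of *distinct* inputs,
    e.g. terms = [(w, (0, 2)), ...] meaning w * x_0 * x_2 (no higher powers)."""
    return sum(w * np.prod([x[i] for i in idx]) for w, idx in terms)

x = [2.0, 3.0, 0.5]
print(product_unit(x, [2, 1, 0]))                      # 2^2 * 3^1 * 0.5^0 = 12.0
print(sigma_pi_node(x, [(1.0, (0, 1)), (0.5, (2,))]))  # 1*2*3 + 0.5*0.5 = 6.25
```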
