Neural Networks
Copyright © 1999-2006 Investment Analytics Forecasting Financial Markets – Neural Networks Slide: 1
Overview
► Overview of neural networks
► Design considerations
► Applications
A Neural Network
► Processing elements
   Neurons
   • Receives & processes input(s)
   • Delivers single output
► Network
   Collection of interlinked neurons
   Grouped in layers
   • Input
   • Intermediate (hidden)
   • Output
A Schematic Diagram of a Neuron
The Neuron Analogy
A 3-Layer Neural Network
[Diagram: network inputs feed a hidden layer, which produces the network outputs]
Processing Information
► Inputs
   Correspond to single attributes
   • Can include qualitative data
► Outputs
   Solution to a problem
   • E.g. forecast, or binary value
► Weights
   Express relative importance of data
   • On inputs or data transferred between layers
   “Learning” = adapting weights
Activation Function
► Determines whether a neuron will “fire”
   i.e., produce an output
► Weighted sum of inputs
   For N inputs i into neuron j:
   Y_j = Σ_{i=1..N} W_ij · X_i
Transfer Function
► Transforms or normalizes output
   Also called a transformation or squashing function
► Popular choice
   Sigmoid: f(x) = 1 / (1 + e^(−x))
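The two slides above can be combined into a minimal Python sketch of a single neuron: a weighted sum of the inputs passed through the sigmoid squashing function. The function name `neuron_output` is illustrative, not from the slides.

```python
import math

def neuron_output(inputs, weights):
    """Weighted sum of inputs followed by the sigmoid squashing function."""
    y = sum(w * x for w, x in zip(weights, inputs))  # Y_j = sum_i W_ij * X_i
    return 1.0 / (1.0 + math.exp(-y))                # f(x) = 1/(1 + e^-x)
```

With zero net input the sigmoid returns 0.5, and its output always stays strictly between 0 and 1.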
Architecture / Network Topology
► Number of neurons
► Number of hidden layers
► Connections
   Feedforward or feedback
   Fully or partially connected
► Static or adaptive architecture
Learning
► Supervised
   Uses set of inputs for which desired output is known
   Cost function f(desired − actual) used to change weights
   Example: Hopfield network
► Unsupervised
   Network shown only inputs
   No information on “correct” outputs
   Self-organizing
   Example: Kohonen self-organizing feature maps
Training
► Data divided into training & testing data sets
   Training set used to adapt weights
   • Many iterations or “epochs”
   • Training time dependent on data, network architecture, learning algorithm
   Forecasting performance tested on test data set
   • Cost function comparing desired vs. actual outputs
   Stopping rule
   • Determines when to terminate training
     – When weights stabilize
     – When cost function minimized
     – Danger of over-fitting
Applications in Finance
► Bankruptcy prediction
► Bond rating
► Consumer credit scoring
► Financial market forecasting
   Equities, currencies, commodities, bonds, derivatives
   Security selection
   Portfolio optimization
   Trading systems
A Simple NN Example
► Supervised learning of the OR operator
   Inputs X1, X2
   Outputs Z (desired), Y (actual)
   Weights W1, W2; initial values 0.1 and 0.3
   Transfer function
   • f(Y) = 1 if Y > threshold value (0.5); 0 otherwise
   Learning
   • Δ = (Z − Y)
   • W_i(final) = W_i(initial) + α·Δ·X_i
   • α is the learning coefficient (0.2)
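Under the parameters on this slide, the OR training loop might be sketched in Python as follows (function names are illustrative; training repeats over the four input patterns until every output matches the desired value):

```python
def step(y, threshold=0.5):
    """Transfer function: f(Y) = 1 if Y > 0.5, else 0."""
    return 1 if y > threshold else 0

def train_or(weights=(0.1, 0.3), alpha=0.2, max_epochs=100):
    """Supervised learning of OR: W_i <- W_i + alpha * (Z - Y) * X_i."""
    w = list(weights)
    patterns = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
    for _ in range(max_epochs):
        errors = 0
        for x, z in patterns:
            y = step(sum(wi * xi for wi, xi in zip(w, x)))
            delta = z - y                       # ∆ = (Z - Y)
            if delta != 0:
                errors += 1
                w = [wi + alpha * delta * xi for wi, xi in zip(w, x)]
        if errors == 0:
            break                               # all four patterns correct
    return w
```

After a few epochs the weights grow until both single-input patterns clear the 0.5 threshold, at which point the network reproduces OR exactly.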
A Simple NN Example
Design Considerations
► Network performance
► Control mechanisms
   Choice of activation function
   Choice of cost function
   Network architecture
   Gradient descent/ascent efficiency
   Learning times
Network Performance Measures
► Convergence
   Accuracy of model fit in-sample
► Generalization
   Accuracy of model fit out-of-sample
► Stability
   Variance in prediction accuracy
Convergence
► Is network capable of learning classification?
   Under what conditions?
   What are the computation requirements?
► Fixed topology networks
   Prove convergence by showing error tends to zero in the limit as t → ∞
   • Using gradient descent
► Other networks
   Show that network can classify the maximum # of possible mappings with arbitrarily large probability
Generalization
► Ability to classify data outside training set
   Most important performance criterion
► Analogy with curve fitting
   Two problems
   • Finding order of polynomial
   • Estimating coefficients
   Too low order (NN structure too simple)
   • Bad approximation both in- and out-of-sample
   Too high order (NN structure too complex)
   • “Over-fitting”: fits training data well, but out-of-sample performance poor
Stability
► Consistency of results
   When network parameters are varied
   • Networks often vary widely in predictive performance
   • “Chaotic”: highly sensitive to initial conditions
► Two components of error
   • Bias: due to parameterization & associated assumptions
   • Variance: sensitivity to changes in estimated parameters
   Regression: high bias, low variance
   Neural networks: low bias, high variance
   • No parameterization, but may fit entire family of polynomials to given data set
Choice of Activation Function
► Sigmoid functions
   Differentiable and well behaved
► Symmetric
   Typical: scaled hyperbolic tangent
   f(y) = A·tanh(S·y) = A·(e^(Sy) − e^(−Sy)) / (e^(Sy) + e^(−Sy)) = 2A / (1 + e^(−2Sy)) − A
   • A is the amplitude
   [Figure: the symmetric scaled hyperbolic tangent function, ranging between −A and +A]
Choice of Activation Function
► Choice of sigmoid parameters
   A = 1.7159, S = 2/3
   • f(−1) = −1 and f(1) = 1
     – Gain of the squashing transformation is normally around 1
   • ∂²f/∂y² is max at ±1
     – Improves convergence at end of learning session
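As a quick check of these parameter choices, a small Python sketch (the name `scaled_tanh` is illustrative) confirms that with A = 1.7159 and S = 2/3 the function passes approximately through (±1, ±1):

```python
import math

A, S = 1.7159, 2.0 / 3.0  # sigmoid parameters from the slide

def scaled_tanh(y):
    """f(y) = A * tanh(S*y); with these constants f(1) ~ 1 and f(-1) ~ -1."""
    return A * math.tanh(S * y)
```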
Cost Function
► Quadratic cost function is most common
   Least mean square error
   E = (1/2) · Σ_{i=1..n} (d_i − y_i)²
   • y_i is current output from unit i
   • d_i is desired output from unit i
   Discounted least square error
   E = (1/n) · Σ_{i=1..n} [1 / (1 + e^(a+bi))] · (d_i − y_i)²
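A minimal Python sketch of the quadratic cost (the name `mse_cost` is illustrative; the 1/2 factor matches the formula above):

```python
def mse_cost(desired, actual):
    """Quadratic cost: E = 1/2 * sum_i (d_i - y_i)^2."""
    return 0.5 * sum((d - y) ** 2 for d, y in zip(desired, actual))
```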
Learning
► Gradient descent used to minimize cost function
   Change weights in proportion to δ_i = ∂E/∂W_i
   • ΔW_ij(t+1) = λ · δ_i · y_ij
► Learning rate (step size, momentum) λ
   As λ → 0 and t → ∞ this procedure will find the minimum MSE
► Difficult to find appropriate rate
   Too small
   • Slow convergence
   • May get trapped in local minima
   Too large: unstable weights
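The weight update above can be sketched for a single linear unit in pure Python (names and the sample learning rate are illustrative; for E = ½(d − y)² the gradient with respect to W_i is −(d − y)·x_i, so descending the gradient adds λ·(d − y)·x_i):

```python
def gradient_step(w, x, d, lam=0.1):
    """One gradient-descent step for a single linear unit on cost
    E = 1/2 * (d - y)^2, where y = sum_i w_i * x_i."""
    y = sum(wi * xi for wi, xi in zip(w, x))
    delta = d - y                                   # error signal
    return [wi + lam * delta * xi for wi, xi in zip(w, x)]
```

Repeating the step on a fixed pattern drives the unit's output toward the desired value d.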
Learning Rate
► Optimal learning rate pattern
   Smooth MSE chart (for each layer)
   Smooth weight histograms (for each layer)
► Learning rate adjustment
   One rate for entire network
   Different rates for each layer
   Different rate for each weight
Learning Rate Rules of Thumb
► If no connections that jump layers
   Learning rate for hidden layer: λ_L = 0.5 · λ_(L+1)
► With connections that jump layers
   Learning rate for hidden layer: λ_L = 0.75 · λ_(L+1)
► Check sign of consecutive weight changes
   If same, increase λ
   If opposite, decrease λ
► If MSE chart is erratic
   Reduce learning rate (for that layer)
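The sign-checking rule of thumb might be sketched as follows; the increase/decrease factors (1.1 and 0.5) are illustrative assumptions, not from the slides:

```python
def adjust_rate(rate, prev_dw, dw, up=1.1, down=0.5):
    """Rule of thumb: consecutive weight changes with the same sign ->
    increase the rate; opposite signs -> decrease it."""
    if prev_dw * dw > 0:
        return rate * up      # same direction: safe to speed up
    if prev_dw * dw < 0:
        return rate * down    # oscillating: slow down
    return rate               # one change was zero: leave unchanged
```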
Network Architecture
► Hidden units
   In general, the fewer the better
   • Network will generalize better
   Another approach: weight sharing
   • Imposing equality constraints among connection strengths
   • Reduces # of free parameters while preserving network size and ability to recognize complex patterns
► Hidden layers
   Typically start with one
   If using more than one, make sure to connect each layer to all prior layers
Network Architecture
► Constructive techniques
   Hidden units added incrementally
► Pruning techniques
   Attempt to eliminate redundant units
► Genetic algorithms
   Select “fittest” of several competing networks
Constructive Techniques
► Tiling algorithm
   Divide training data set into “faithful” & “unfaithful” classes
   • i.e., those the network recognizes correctly and those it doesn’t
   Add ancillary unit and connect to layer above
   Select one unfaithful class and train new unit to subdivide it into faithful and unfaithful classes
   Repeat until no unfaithful classes remain
   • Always possible; worst case: one unit for each input pattern
   Add new master output unit and connect to all layers
   • Train the new unit to learn the mapping to the desired output
Other Constructive Techniques
► Cascade algorithm
   Adds hidden unit to maximize magnitude of correlation between new unit’s output and residual error signal to be minimized
► Dynamic node creation
   Add new unit if rate of error decrease falls below certain value
Pruning Techniques
► Multi-stage pruning
   Outputs of hidden units analysed to see if any are not contributing to solution
   • Output of unit doesn’t change for any input pattern
   • Output from two units is identical or opposite (for all inputs)
   Repeat for next layer
► Weight decay
   Weights without much influence subjected to time decay
   Equivalent: add penalty term to cost function
   • E* = MSE + b · Σ_{ij} w_ij²
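A minimal sketch of the penalized cost function (names and the value of b are illustrative):

```python
def cost_with_decay(mse, weights, b=0.001):
    """Weight-decay penalty: E* = MSE + b * sum of squared weights."""
    return mse + b * sum(w * w for w in weights)
```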
Genetically Evolved Neural Networks
► Initial population of randomly generated networks
► Proceed through training cycle with all networks
► At end of initial training cycle
   Worst performing networks are deleted
   Best performing networks are “mated”
► Continue training with all networks
► Occasional random mutations introduced
   Randomize weights of lowly ranked networks
   Change forecast horizon, lags on input variables
Genetic Evolution of a Neural Network
Training
► Epoch
   # of training cycles after which weights are updated
► Determining epoch size
   Start with an initial epoch size
   Train network for a large # of iterations (e.g. 10,000)
   Test network and record R² (for each output)
   Repeat for a variety of epoch sizes
   Pick epoch size that maximizes R²
► Controlling over-fitting
   Terminate training when MSE on test set starts to rise
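The over-fitting control described above, stopping when test-set MSE starts to rise, might be sketched like this; the `patience` parameter (how many consecutive non-improving epochs to tolerate) is an assumption added for robustness, not from the slides:

```python
def train_with_early_stopping(train_step, test_mse, max_epochs=1000, patience=3):
    """Run training epochs; stop once test-set MSE stops improving."""
    best = float("inf")
    worse = 0
    for _ in range(max_epochs):
        train_step()                 # one training epoch (adapts weights)
        e = test_mse()               # cost on the held-out test set
        if e < best:
            best, worse = e, 0       # still improving
        else:
            worse += 1               # test error rising: possible over-fit
            if worse >= patience:
                break
    return best
```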
Initial Weights
► Start with unequal initial weights
   Rumelhart, Hinton, Williams (1986)
   • With equal initial weights, network will not converge if solution requires unequal weights
► Testing stability
   Initial weight matrix defines starting point on the weight-error surface
   Need several training runs with different random weights to test for statistical stability
Data Modeling
► Detrending
   Removal of seasonality and trends to achieve stationarity
► Normalization
   Variables scaled to have zero mean, unit SD
   • Brings inputs into normal operating range of activation function
   • Otherwise activation values may tend to zero
     – Network paralysis
   X′_i(t) = (X_i(t) − X̄_i) / σ_(X_i)
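A minimal pure-Python sketch of the normalization formula above (using the population standard deviation; the name `normalize` is illustrative):

```python
import math

def normalize(series):
    """Scale a series to zero mean and unit standard deviation."""
    n = len(series)
    mean = sum(series) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    return [(x - mean) / sd for x in series]
```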
Data Modeling
► Scaling of outputs
   Some transfer functions reach max/min values only when inputs reach infinity
   Y′(t) = SCALE × Y(t) + OFFSET
   SCALE = (MAX − MIN) / (Y_Max − Y_Min)
   OFFSET = MAX − Y_Max × (MAX − MIN) / (Y_Max − Y_Min)
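The SCALE/OFFSET formulas can be sketched as a small factory function (names are illustrative; MIN and MAX here are the transfer function's working range, e.g. −1 to 1 for the symmetric scaled tanh):

```python
def make_scaler(y_min, y_max, lo=-1.0, hi=1.0):
    """Affine map of raw outputs [y_min, y_max] onto the transfer
    function's working range [lo, hi] (MIN, MAX in the formulas)."""
    scale = (hi - lo) / (y_max - y_min)   # SCALE = (MAX-MIN)/(Y_Max-Y_Min)
    offset = hi - y_max * scale           # OFFSET = MAX - Y_Max * SCALE
    return lambda y: scale * y + offset
```

The endpoints of the raw range land exactly on the endpoints of the target range.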
► Multi-collinearity
   Independent variables are correlated
   • Solution: use principal components analysis to orthogonalize inputs
Lab: Modeling Implied Volatility on IBEX Options
► Compare two forecasting techniques
   Regression
   Genetic neural network
► Forecast implied volatility
   Evaluate trading performance
Solution: Modeling Volatility on IBEX Options
[Chart: cumulative returns over 52 periods for Regression, Buy & Hold, and Neural Network strategies, ranging from −50% to 300%]
Solution: Modeling Volatility on IBEX Options
Summary: Neural Networks
► Pros
   Can capture non-linear effects
   • Pattern recognition
   Process model not required
   Wide range of applications in finance
► Cons
   ‘Black-box’ approach
   Sometimes poor stability & generalization characteristics