Forecasting with Artificial Neural Networks

Sven F. Crone
Centre for Forecasting
Department of Management Science
Lancaster University Management School
email: s.crone@neural-forecasting.com

→ slides, data & additional info on www.neural-forecasting.com
EVIC’05 © Sven F. Crone - www.bis-lab.com
Agenda
1. Forecasting?
1. Forecasting as predictive Regression
2. Time series prediction vs. causal prediction
3. Why NN for Forecasting?
2. Neural Networks?
3. Forecasting with Neural Networks …
4. How to write a good Neural Network forecasting paper!
Forecasting or Prediction?
ALGORITHMS by task:
- Clustering: K-means, Neural networks
- Feature Selection: Principal Component Analysis, Class Entropy
- Association rules: Link Analysis, temporal association rules
- Classification: Decision trees, Logistic regression, Discriminant Analysis, Neural networks (MLP, RBFN, GRNN)
- Regression: Linear Regression, Nonlinear Regression, Neural networks
- Forecasting: Exponential smoothing, (S)ARIMA(x), Neural networks
Forecasting or Classification
dependent \ independent | Metric scale | Ordinal scale | Nominal scale
Metric scale  | Regression, Time Series Analysis | DOWNSCALE | Analysis of Variance
Ordinal scale | DOWNSCALE | DOWNSCALE | DOWNSCALE
Nominal scale | Classification | DOWNSCALE | Contingency Analysis

(Each cell can be framed as supervised learning on a data matrix of cases (rows) with input variables and a target (columns).)

Forecasting or Classification?
Agenda
Forecasting Models
Definition
A time series is a series of chronologically ordered, comparable observations y_t recorded at equidistant time intervals.

Notation
Y_t represents the observation of period t, t = 1, 2, …, n.

Approach
A time series is assumed to consist of a systematic part and a random part; unfortunately, we can observe neither directly!
- Forecasting methods try to isolate the systematic part
- Forecasts are based on the systematic part
- The random part determines the distribution shape

Assumption
Data observed over time is comparable:
- The time periods are of identical lengths (check!)
- The units they are measured in do not change (check!)
- The definitions of what is being measured remain unchanged (check!)
- They are correctly measured (check!)
Data errors arise from sampling, from bias in the instruments or the responses, and from transcription.

Assumption
There exists a cause-effect relationship that keeps repeating itself with the yearly calendar.
The cause-effect relationship may be treated as a BLACK BOX.
The TIME-STABILITY HYPOTHESIS assumes no change:
→ the causal relationship remains intact indefinitely into the future!
Hence the time series can be explained & predicted solely from previous observations of the series.
[Figures: example time series illustrating components]
- Trend: long-term movement in the series
- Seasonality: regular fluctuation within a year (or shorter period), superimposed on trend and cycle
- Cycle: regular fluctuation superimposed on trend (the period may be random)
- Further patterns: PULSE, trended / seasonal development, seasonal pattern changes & shifts, STATIONARY time series with level shift

- Signal: level 'L', trend 'T', seasonality 'S'
- Noise: irregular, error 'e'
Time Series Components

- Time Series → decomposed into components:
  - REGULAR: level 'L', trend 'T', seasonality 'S'
  - IRREGULAR: error 'E'

[Figures: regular and irregular time series patterns (monthly series 2000-2002, yearly series 1961-1980; original vs. corrected values)]

Sales or observation Y_t of the time series at point t consists of a combination f( ) of the components; there are different possibilities to combine the components.

Additive Model
Y_t = L + S_t + T_t + E_t

with seasonal component S_t, trend component T_t, and irregular or error component E_t.
[Figure: additive trend effect vs. multiplicative trend effect]
Agenda
Forecasting with Artificial Neural Networks
1. Forecasting?
1. Forecasting as predictive Regression
2. Time series prediction vs. causal prediction
3. SARIMA-Modelling
1. SARIMA – Differencing
2. SARIMA – Autoregressive Terms
3. SARIMA – Moving Average Terms
4. SARIMA – Seasonal Terms
4. Why NN for Forecasting?
2. Neural Networks?
3. Forecasting with Neural Networks …
4. How to write a good Neural Network forecasting paper!
Φ_p(B) (1 − B)^d Z_t = δ + Θ_q(B) e_t
Model Selection
- Examine ACF & PACF
- Identify potential models (p,d,q)(P,D,Q)

Model Application
- Use the selected model to forecast
ARIMA-Modelling
ARIMA(p,d,q)-Models
- AR(p): autoregressive terms, with p = order of the autoregressive part
- I(d): order of integration, with d = degree of first differencing involved
- MA(q): moving average terms, with q = order of the moving average of the errors
- SARIMA(p,d,q)(P,D,Q)s: with (P,D,Q) the seasonal process at the seasonal lags of period s
Objective
Identify the appropriate ARIMA model for the time series
Identify AR-term
Identify I-term
Identify MA-term
Identification through
Autocorrelation Function
Partial Autocorrelation Function
Recap:
Let the mean of the time series at t be μ_t = E(Y_t),
the autocovariance λ_{t,t−τ} = cov(Y_t, Y_{t−τ}),
and the variance λ_{t,t} = var(Y_t).

Definition
A time series is stationary if its mean level μ_t is constant for all t
and its variances and covariances λ_{t,t−τ} are constant for all t.
In other words: all properties of the distribution (mean, variance, skewness, kurtosis etc.) of a random sample of the time series are independent of the absolute time t of drawing the sample → identity of mean & variance across time.
Integration / Differencing

First differences: Z_t = Y_t − Y_{t−1}
Differencing the differenced series again → 2nd order differences: d=2
Transforms: logarithms etc.
…
where Z_t is a transform of the variable of interest Y_t, chosen to make Z_t − Z_{t−1} − (Z_{t−1} − Z_{t−2}) − … stationary.
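The differencing step can be sketched numerically; a minimal example assuming NumPy is available, with an illustrative series rather than data from the slides.

```python
import numpy as np

# First and second differences, as used to remove trend before ARIMA
# modelling: z_t = y_t - y_{t-1}; applying the operation twice gives d=2.
y = np.array([1.0, 2.0, 4.0, 7.0, 11.0])  # series with a growing trend

z1 = np.diff(y, n=1)  # first differences (d=1)
z2 = np.diff(y, n=2)  # second differences (d=2)

print(z1)  # [1. 2. 3. 4.]
print(z2)  # [1. 1. 1.]
```

The second differences are constant, i.e. two rounds of differencing have made this series stationary.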
Agenda
Forecasting with Artificial Neural Networks
1. Forecasting?
1. Forecasting as predictive Regression
2. Time series prediction vs. causal prediction
3. SARIMA-Modelling
1. SARIMA – Differencing
2. SARIMA – Autoregressive Terms
3. SARIMA – Moving Average Terms
4. SARIMA – Seasonal Terms
4. Why NN for Forecasting?
2. Neural Networks?
3. Forecasting with Neural Networks …
4. How to write a good Neural Network forecasting paper!
Problems
- Independence of residuals often violated (heteroscedasticity)
- Determining the number of past values is problematic

Autocorrelation

ρ_k = Σ_{t=k+1..n} (Y_t − Ȳ)(Y_{t−k} − Ȳ) / Σ_{t=1..n} (Y_t − Ȳ)²
ρ_k denotes the correlation between lagged observations of distance k.
[Figure: scatter plot of x_t against x_{t−1}; graphical interpretation of (low) autocorrelation]
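The sample autocorrelation formula above can be computed directly; a minimal sketch assuming NumPy, using the 9-point example series from the next slide.

```python
import numpy as np

def acf(y, k):
    # rho_k = sum_{t=k+1..n} (y_t - ybar)(y_{t-k} - ybar) / sum_t (y_t - ybar)^2
    y = np.asarray(y, dtype=float)
    d = y - y.mean()
    if k == 0:
        return 1.0
    return float(np.sum(d[k:] * d[:-k]) / np.sum(d * d))

y = [7, 8, 7, 6, 5, 4, 5, 6, 4]
print(acf(y, 1))  # positive: neighbouring values move together
```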
ARIMA-Models: Parameter p

E.g. time series Y_t: 7, 8, 7, 6, 5, 4, 5, 6, 4.
→ autocorrelations r_k gathered at lags 1, 2, … make up the autocorrelation function (ACF)
[Figure: ACF bar chart over lags 1-3]
[Figure: sample ACF plots with confidence limits]
[Figures: ACF and PACF over lags 1-16 with confidence intervals, for several example processes]
[Figure: ACF and PACF with confidence intervals for an AR process]

AR(2) model: Y_t = c + φ_1 Y_{t−1} + φ_2 Y_{t−2} + e_t   = ARIMA(2,0,0)
AR(1) model: Y_t = c + φ_1 Y_{t−1} + e_t
e.g. Y_t = 1.1 + 0.8 Y_{t−1} + e_t
[Figure: simulated realisation of this AR(1) process over 50 periods]
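This AR(1) process is easy to simulate, which also illustrates its stationary mean c/(1 − φ_1) = 1.1/0.2 = 5.5; a sketch assuming NumPy, with a made-up seed and sample size.

```python
import numpy as np

# Simulate Y_t = 1.1 + 0.8*Y_{t-1} + e_t with standard normal errors.
# Since |phi_1| = 0.8 < 1 the process is stationary with mean 5.5.
rng = np.random.default_rng(0)
n = 5000
y = np.empty(n)
y[0] = 5.5
for t in range(1, n):
    y[t] = 1.1 + 0.8 * y[t - 1] + rng.normal()

print(y.mean())  # close to the theoretical mean 5.5
```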
Agenda
Forecasting with Artificial Neural Networks
1. Forecasting?
1. Forecasting as predictive Regression
2. Time series prediction vs. causal prediction
3. SARIMA-Modelling
1. SARIMA – Differencing
2. SARIMA – Autoregressive Terms
3. SARIMA – Moving Average Terms
4. SARIMA – Seasonal Terms
4. Why NN for Forecasting?
2. Neural Networks?
3. Forecasting with Neural Networks …
4. How to write a good Neural Network forecasting paper!
ARIMA(0,0,q)-model = MA(q)-model
Y_t = c + e_t − θ_1 e_{t−1} − θ_2 e_{t−2} − … − θ_q e_{t−q}
[Figure: ACF and PACF with confidence intervals for an MA(1) process]

MA(1) model: Y_t = c + e_t − θ_1 e_{t−1}   = ARIMA(0,0,1)

[Figure: ACF and PACF with confidence intervals]
[Figure: ACF with 1st, 2nd & 3rd lags significant; decaying pattern in the PACF]
Y_t = c + e_t − θ_1 e_{t−1}
e.g. Y_t = 10 + e_t + 0.2 e_{t−1}
[Figure: simulated realisation of this MA(1) process over 50 periods]

ARMA(1,1) model: Y_t = c + φ_1 Y_{t−1} + e_t − θ_1 e_{t−1}
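The MA(1) example can be simulated the same way; its ACF shows a single spike at lag 1 (0.2/(1 + 0.2²) ≈ 0.19 here) and essentially nothing beyond, which is exactly the identification pattern described above. A sketch assuming NumPy.

```python
import numpy as np

# Simulate Y_t = 10 + e_t + 0.2*e_{t-1}: lag-1 autocorrelation is
# 0.2 / (1 + 0.2^2) ~= 0.19, and ~0 at every higher lag.
rng = np.random.default_rng(1)
n = 20000
e = rng.normal(size=n + 1)
y = 10 + e[1:] + 0.2 * e[:-1]

def acf(y, k):
    d = y - y.mean()
    return float(np.sum(d[k:] * d[:-k]) / np.sum(d * d))

print(acf(y, 1))  # ~ 0.19
print(acf(y, 2))  # ~ 0
```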
[Figure: ACF and PACF with confidence intervals for the ARMA(1,1) process]
Agenda
Forecasting with Artificial Neural Networks
1. Forecasting?
1. Forecasting as predictive Regression
2. Time series prediction vs. causal prediction
3. SARIMA-Modelling
1. SARIMA – Differencing
2. SARIMA – Autoregressive Terms
3. SARIMA – Moving Average Terms
4. SARIMA – Seasonal Terms
5. SARIMAX – Seasonal ARIMA with Interventions
4. Why NN for Forecasting?
2. Neural Networks?
3. Forecasting with Neural Networks …
4. How to write a good Neural Network forecasting paper!
Seasonality in ARIMA-Models
[Figure: ACF with seasonal spikes at the seasonal lags (= monthly data) and PACF, each with upper and lower confidence limits]
Seasonality in ARIMA-Models
Seasonal ARIMA(1,1,1)(1,1,1)_4 model:

(1 − φ_1 B)(1 − Φ_1 B⁴)(1 − B)(1 − B⁴) Y_t = c + (1 − θ_1 B)(1 − Θ_1 B⁴) e_t
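The seasonal differencing operator (1 − B⁴) in the model above can be demonstrated directly; a sketch assuming NumPy, with an artificial quarterly pattern.

```python
import numpy as np

# (1 - B^s) means z_t = y_t - y_{t-s}. For a purely deterministic
# quarterly pattern (s = 4), seasonal differencing removes it completely.
s = 4
y = np.array([10.0, 20.0, 15.0, 30.0] * 5)  # repeating quarterly pattern

z = y[s:] - y[:-s]  # seasonal differences
print(z)  # all zeros: the seasonal pattern is gone
```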
Agenda
Forecasting with Artificial Neural Networks
1. Forecasting?
1. Forecasting as predictive Regression
2. Time series prediction vs. causal prediction
3. SARIMA-Modelling
1. SARIMA – Differencing
2. SARIMA – Autoregressive Terms
3. SARIMA – Moving Average Terms
4. SARIMA – Seasonal Terms
5. SARIMAX – Seasonal ARIMA with Interventions
4. Why NN for Forecasting?
2. Neural Networks?
3. Forecasting with Neural Networks …
4. How to write a good Neural Network forecasting paper!
Forecasting Models
Causal Prediction
ARX(p)-Models
Agenda
1. Forecasting?
1. Forecasting as predictive Regression
2. Time series prediction vs. causal prediction
3. SARIMA-Modelling
4. Why NN for Forecasting?
2. Neural Networks?
3. Forecasting with Neural Networks …
4. How to write a good Neural Network forecasting paper!
Pattern or noise?
Applications of Neural Networks

[Figure: number of publications by business forecasting domain, 1987-2003; trend fit R² = 0.9036]

Engineering
→ control applications in plants
→ automatic target recognition (DARPA)
→ explosive detection at airports
→ mineral identification (NASA Mars Explorer)
→ starting & landing of Jumbo Jets (NASA)

Meteorology / weather
→ El Niño effects

Corporate Business
→ credit card fraud detection
→ simulate forecasting methods
[Survey: forecasting methods used in practice, % of replies]

Objective (quantitative) methods → 23%:
- Trend extrapolation 35%
- Averages 25%
- Exponential Smoothing 24%
- Neural Networks 9%
- Autoregressive Methods 7%
- Causal methods: Regression 69%, Econometric Models 23%

Judgemental (subjective) methods → 2x%:
- Surveys 49%
- Analogies 23%
- Delphi 22%
- PERT 6%
Agenda
PQ-diagram

in-sample observations ↔ out-of-sample forecasts
training ↔ validation = test

[Figure: time series of actual values, NN forecasted values, and absolute forecasting errors]
Agenda
History
- Developed in interdisciplinary research (McCulloch/Pitts 1943)
- Motivation from the function of natural neural networks
  - neurobiological motivation
  - application-oriented motivation [Smith & Gupta, 2000]

Timeline: Turing 1936, McCulloch/Pitts 1943, Hebb 1949, Minsky builds 1st neurocomputer 1954, GE 1954 (1st computer payroll system), Dartmouth Project 1956, Rosenblatt 1959, Minsky/Papert 1969, INTEL 1971 (1st microprocessor), Kohonen 1972, Werbos 1974, IBM 1981 (introduces the PC), 1st IJCNN 1987, Neuralware founded 1987, 1st journals 1988, White 1988 (1st paper on forecasting), Rumelhart/Hinton/Williams 1986, SAS 1997 (Enterprise Miner), IBM 1998 ($70bn BI initiative)
Agenda
[Figure: multilayer perceptron with inputs o_i, weights w_i,j and biases θ_n+1 … θ_n+h]

Mathematics as abstract representations of reality
→ use in software simulators, hardware, engineering etc.

o_i = tanh( Σ_j w_ji o_j − θ_i )
% Export helper for a trained MATLAB Neural Network Toolbox net:
% read the network dimensions and decide whether it is an RBF or an MLP.
neural_net = eval(net_name);                          % resolve net by name
[num_rows, ins] = size(neural_net.iw{1});             % input weights -> no. of inputs
[outs, num_cols] = size(neural_net.lw{neural_net.numLayers, ...
                        neural_net.numLayers-1});     % last layer weights -> no. of outputs
if strcmp(neural_net.adaptFcn, '')
    net_type = 'RBF';
else
    net_type = 'MLP';
end
fid = fopen(path, 'w');                               % open export file for writing
Alternative notations: information processing in neurons / nodes
- biological representation
- graphical notation

Input → Input Function → Activation Function → Output

net_i = Σ_j w_ij in_j     a_i = f(net_i − θ_i)     out_i

(the weights w_ij are also written β_i; the neuron / node is also written u_i)

Unidirectional information processing, e.g. a binary threshold unit:

out_i = 1 if Σ_j w_ji o_j − θ_i ≥ 0
out_i = 0 if Σ_j w_ji o_j − θ_i < 0

Input Functions
Example: binary threshold node
- inputs o_1 = 2.2, o_2 = 4.0, o_3 = 1.0
- weights w_1,i = 0.71, w_2,i = −1.84, w_3,i = 9.01
- bias θ = 8.0 (modelled as weight w_0,i on a constant input o_0 = 1)

net_i = Σ_j w_ij o_j − θ = 2.2·0.71 + 4.0·(−1.84) + 1.0·9.01 − 8.0 = 3.212 − 8.0 = −4.788

−4.788 < 0 → o_i = 0.00

o_i = 1 if Σ_j w_ji o_j − θ_i ≥ 0, else o_i = 0; with a_i = f(net_i)
Activation functions: hyperbolic tangent, logistic function

Example: the same node with tanh activation
net_i = 3.212 − 8.0 = −4.788
o_i = tanh(−4.788) ≈ −0.9999

net_i = Σ_j w_ij o_j − θ_i     a_i = f(net_i)     o_i = tanh( Σ_j w_ji o_j − θ_i )
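The worked example can be reproduced in a few lines; a sketch in Python (the slides prescribe no language here), with the slide's numbers hard-coded.

```python
import math

# Single unit with inputs o = (2.2, 4.0, 1.0), weights w = (0.71, -1.84, 9.01)
# and bias theta = 8.0, as in the worked example.
o = [2.2, 4.0, 1.0]
w = [0.71, -1.84, 9.01]
theta = 8.0

net = sum(wi * oi for wi, oi in zip(w, o)) - theta  # 3.212 - 8.0 = -4.788

out_threshold = 1.0 if net >= 0 else 0.0  # binary threshold unit: 0.0
out_tanh = math.tanh(net)                 # saturated near -1

print(net, out_threshold, out_tanh)
```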
A single linear node implements the linear regression model
y = β_0 + β_1 x_1 + β_2 x_2 + … + β_n x_n + ε

[Figure: network with constant input 1 (weight β_0) and inputs x_1 … x_n (weights β_1 … β_n) feeding a summation node Σ β_n x_n → y]

With a tanh output node this becomes
o = ( e^(Σ β_i x_i) − e^(−Σ β_i x_i) ) / ( e^(Σ β_i x_i) + e^(−Σ β_i x_i) )
Also: [Figure: network diagram with bias units θ_n+1 … θ_n+5]
→ simplification for complex models!
Combination of Nodes
[Figure: nodes u_4, u_7, u_11 combined into a layered network with input X_4]
Agenda
Hebbian Learning
Δw_ij = η o_i a_j

[Figure: 3-5-2 network with 10×10 weight matrix W; weights such as w_3,8 and w_3,10 highlighted]

E = o − t
o: output vector, t: teaching output (target)
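The Hebbian rule above is simple enough to state as code; an illustrative sketch with a made-up activation sequence and learning rate.

```python
# Hebbian learning: delta w_ij = eta * o_i * a_j. The weight grows only
# when pre-synaptic output o_i and post-synaptic activation a_j are
# active at the same time.
eta = 0.5
w = 0.0
for o_i, a_j in [(1, 1), (1, 1), (1, 0), (0, 1)]:
    w += eta * o_i * a_j  # only the two co-active pairs contribute

print(w)  # 1.0
```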
Backpropagation of error:

δ_pj = ∂C(t_pj, o_pj)/∂o_pj · f'_j(net_pj)     if unit j is in the output layer
δ_pj = f'_j(net_pj) · Σ_k δ_pk w_pjk           if unit j is in a hidden layer
[Figure: error surface over weight w_j with two valleys and random starting points 1 and 2]

Gradient descent = local search:
- step size fixed, follow the steepest descent
- local optimum = any valley (local minimum)
- global optimum = deepest valley with the lowest error (GLOBAL minimum)
- which minimum is reached depends on the random starting point
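The dependence on the starting point can be shown with plain gradient descent on a one-dimensional error surface with two valleys; the function f(w) = w⁴ − 3w² + w is an illustrative choice, not from the slides.

```python
# Fixed-step gradient descent on f(w) = w^4 - 3w^2 + w, which has a
# shallow (local) valley near w = 1.13 and a deeper (global) valley
# near w = -1.30. The starting point decides which one is found.
def grad(w):
    return 4 * w**3 - 6 * w + 1  # f'(w)

def descend(w, step=0.01, iters=2000):
    for _ in range(iters):
        w -= step * grad(w)  # follow the steepest descent
    return w

w1 = descend(-2.0)  # starting point 1 -> global minimum near -1.30
w2 = descend(2.0)   # starting point 2 -> local minimum near 1.13
print(w1, w2)
```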
Agenda
1. Forecasting?
2. Neural Networks?
3. Forecasting with Neural Networks …
1. NN models for Time Series & Dynamic Causal Prediction
2. NN experiments
3. Process of NN modelling
4. How to write a good Neural Network forecasting paper!
ŷ_{t+h} = f(x_t) + ε_{t+h}

ŷ_{t+h} = forecast for t+h
f(·) = linear / non-linear function
x_t = vector of observations in t
ε_{t+h} = independent error term in t+h
→ Interpretation
- the input weights represent autoregressive terms
- same problems / shortcomings as standard AR-models!

→ Extensions
- multiple output nodes = simultaneous autoregression models
- non-linearity through a different activation function in the output node
- additional layers with nonlinear nodes, linear activation function in the output layer

ŷ_{t+1} = f(y_t, y_{t−1}, y_{t−2}, …, y_{t−n−1})

Linear autoregressive AR(p)-model:
ŷ_{t+1} = y_t w_{t,j} + y_{t−1} w_{t−1,j} + y_{t−2} w_{t−2,j} + … + y_{t−n−1} w_{t−n−1,j} − θ_j

Nonlinear autoregressive AR(p)-model:
ŷ_{t+1} = tanh( Σ_{i=t}^{t−n−1} y_i w_ij − θ_j )
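For illustration, the linear AR(p) forecast above is just a dot product over the lagged window; the weights here are arbitrary placeholders, not trained values.

```python
import numpy as np

# One linear output node over a window of p = 3 lags:
# y_hat_{t+1} = w1*y_t + w2*y_{t-1} + w3*y_{t-2} - theta
w = np.array([0.5, 0.3, 0.2])  # illustrative weights on y_t, y_{t-1}, y_{t-2}
theta = 0.0

y = np.array([10.0, 12.0, 11.0, 13.0, 12.0])
window = y[-1:-4:-1]  # (y_t, y_{t-1}, y_{t-2}) = (12, 13, 11)

y_hat = float(w @ window - theta)
print(y_hat)  # 0.5*12 + 0.3*13 + 0.2*11 = 12.1
```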
→ Interpretation
- autoregressive modelling, an AR(p)-approach WITHOUT the moving average terms of the errors
- ≠ nonlinear ARIMA
- similar problems / shortcomings as standard AR-models!

→ Extensions
- multiple output nodes = simultaneous autoregression models

[Figure: multilayer perceptron with biases θ_n+1 … θ_n+5]

Nonlinear autoregressive AR(p)-model:
ŷ_{t+1} = f(y_t, y_{t−1}, y_{t−2}, …, y_{t−n−1})
ŷ_{t+1} = tanh( Σ_k w_kj tanh( Σ_i w_ki tanh( Σ_j w_ji y_{t−j} − θ_j ) − θ_i ) − θ_k )
→ Interpretation
- as a single autoregressive model AR(p)
[Figure: network with multiple output nodes, biases θ_n+1 … θ_n+h]

→ Interpretation
- as a single autoregressive model AR(p)
- additional event term to explain external events

→ Extensions
- multiple output nodes = simultaneous multiple regression
Causal prediction with input variables, e.g. Max Temperature, Rainfall, Sunshine Hours:

ŷ = f(x_1, x_2, x_3, …, x_n)

Linear regression model:
ŷ = x_1 w_1j + x_2 w_2j + x_3 w_3j + … + x_n w_nj − θ_j

Nonlinear multiple (logistic) regression model:
ŷ = logistic( Σ_i x_i w_ij − θ_j )

→ Interpretation
- similar to linear multiple regression modelling
- without nonlinearity in the output: weighted expert regime on nonlinear regression
- with nonlinearity in the output layer: ???
Focus
Problem: feedforward NNs contain no moving average terms of the errors.
BUT:
- can model MA(q)-processes through an extended AR(p) window!
- can model SARMAX-processes through recurrent NNs
Agenda
1. Forecasting?
2. Neural Networks?
3. Forecasting with Neural Networks …
1. NN models for Time Series & Dynamic Causal Prediction
2. NN experiments
3. Process of NN modelling
4. How to write a good Neural Network forecasting paper!
→ Simulation of neural network prediction of artificial time series
Agenda
Data Pre-processing
Transformation
Scaling
Normalizing to [0;1] or [-1;1]
NN Modelling Process

Modelling of NN architecture
- number of INPUT nodes
- number of HIDDEN nodes
- number of HIDDEN LAYERS
- number of OUTPUT nodes
- information processing in nodes (activation functions)
- interconnection of nodes

Training
- initialisation of weights (how often?)
- training method (backprop, higher order …)
- training parameters
- evaluation of the best model (early stopping)

Evaluation
- evaluation criteria & selected dataset

→ Decisions require manual expert knowledge
The modelling decisions form a vector of design choices:

D = [DS_E DS_A]: dataset (selection, sampling)
P = [C N S]: preprocessing (correction, normalization, scaling)
A = [NI NS NL NO K T]: architecture (no. of input nodes, no. of hidden nodes, no. of hidden layers, no. of output nodes, connectivity / weight matrix, activation strategy)
U = [F_I F_A F_O]: signal processing (input function, activation function, output function)
L = [G P_{T,L} I_P I_N B]: learning (choice of learning algorithm, learning parameters per phase & layer, initialization procedure, number of initializations, stopping method & parameters)
O: objective function

→ Simulation Experiments
Data Preprocessing
Data Transformation
Verification, correction & editing (data entry errors etc.)
Coding of Variables
Scaling of Variables
Selection of independent Variables (PCA)
Outlier removal
Missing Value imputation
Data Coding
- binary coding of external events
- 1-of-n and 1-of-(n−1) coding have no significant impact; n-coding appears to be more robust (despite issues of multicollinearity)
Outliers
- extreme values, coding errors, data errors
[Figure: number line with outliers at 0, 10, 253 rescaled to [−1, +1]]

Actions
→ eliminate outliers (delete records)
→ replace / impute values as missing values
→ binning of variables = rescaling
→ normalisation of variables = scaling

Asymmetry of observations
→ transform data
- transformation of data (functional transformation of values): linearization or normalisation
→ rescale (DOWNSCALE) data to allow better analysis
- binning of data (grouping of data into groups) → ordinal scale!

Coding of a nominal variable (e.g. …, 2 = Woman):

Recode as 1-of-N coding → 3 new bit-variables
1 0 0 → Business Press
0 1 0 → Sports & Fun
0 0 1 → Woman

Recode as 1-of-(N−1) coding → 2 new bit-variables
1 0 → Business Press
0 1 → Sports & Fun
0 0 → Woman
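The two codings can be sketched as small functions; a minimal Python illustration of the magazine example.

```python
# 1-of-N vs 1-of-(N-1) coding of a nominal variable with the three
# categories from the example above.
categories = ["Business Press", "Sports & Fun", "Woman"]

def one_of_n(value):
    # one bit per category
    return [1 if value == c else 0 for c in categories]

def one_of_n_minus_1(value):
    # the last category becomes the all-zero reference code
    return [1 if value == c else 0 for c in categories[:-1]]

print(one_of_n("Sports & Fun"))   # [0, 1, 0]
print(one_of_n_minus_1("Woman"))  # [0, 0]
```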
Solutions for missing values
- missing value on an interval scale → mean, median, etc.
- missing value on a nominal scale → most frequent value in the feature set
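These imputation rules take only a few lines; a sketch using Python's statistics module with made-up data.

```python
import statistics

# Interval scale: impute the mean (or median);
# nominal scale: impute the most frequent (modal) value.
ages = [23, 35, None, 41, 35]
colors = ["red", "blue", None, "blue"]

age_mean = statistics.mean(v for v in ages if v is not None)      # 33.5
ages_filled = [v if v is not None else age_mean for v in ages]

color_mode = statistics.mode(v for v in colors if v is not None)  # 'blue'
colors_filled = [v if v is not None else color_mode for v in colors]

print(ages_filled)    # [23, 35, 33.5, 41, 35]
print(colors_filled)  # ['red', 'blue', 'blue', 'blue']
```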
→ Simulation Experiments
Interconnection of Nodes: ???
→ Simulation Experiments

Agenda
[Figure: MAE over the number of random weight initialisations (1, 5, 10, 25, 50, 100, 200, 400, 800, 1600), showing the band between lowest and highest error]

→ Simulation Experiments
Agenda
Experimental Results

[Figure: scatter plots of training, validation and test errors ("S-diagram of error correlations") for the TOP 1000 ANNs, ordered by validation error and by test error, with 20-ANN moving averages of train and test error]

→ significant positive correlations between errors, but decreasing:
- low validation error → high test error; higher validation error → lower test error
- high variance on the test error
MAPE & MSE are subject to upward bias by a single bad forecast.
Alternative measures may be based on the median instead of the mean:

Median Absolute Percentage Error
MdAPE_f = Med( |e_{f,t} / y_t| × 100 )

Median Squared Error
MdSE_f = Med( e²_{f,t} )
Theil's U statistic:

U = sqrt( Σ_t ( (ŷ_{t+f} − y_{t+f}) / y_t )² / Σ_t ( (y_t − y_{t+f}) / y_t )² )
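These error measures can be written down directly; a sketch assuming NumPy, with a tiny made-up actual/forecast pair. (This Theil's U form compares the forecast against the naïve no-change forecast; values below 1 beat the naïve benchmark.)

```python
import numpy as np

# Median-based error measures are robust to a single bad forecast,
# unlike MAPE/MSE.
def mdape(y, f):
    return float(np.median(np.abs((y - f) / y) * 100))

def mdse(y, f):
    return float(np.median((y - f) ** 2))

def theil_u(y, f):
    # forecast errors relative to the naive no-change forecast;
    # U < 1 means the forecast beats the naive benchmark
    num = np.sum(((f[1:] - y[1:]) / y[:-1]) ** 2)
    den = np.sum(((y[1:] - y[:-1]) / y[:-1]) ** 2)
    return float(np.sqrt(num / den))

y = np.array([100.0, 110.0, 105.0, 120.0])
f = np.array([100.0, 108.0, 107.0, 118.0])
print(mdape(y, f), mdse(y, f), theil_u(y, f))
```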
→ Simulation Experiments
Agenda
1. Forecasting?
2. Neural Networks?
3. Forecasting with Neural Networks …
1. NN models for Time Series & Dynamic Causal Prediction
2. NN experiments
3. Process of NN modelling
4. How to write a good Neural Network forecasting paper!
Valid Experiments
- evaluate using ex ante accuracy (HOLD-OUT data)
- use the training & validation sets for training & model selection
- NEVER use the test data except for the final evaluation of accuracy!
- evaluate across multiple time series
- evaluate against benchmark methods (NAÏVE + domain!)
- evaluate using multiple & robust error measures (not MSE!)
- evaluate using multiple out-of-samples (time series origins)
→ evaluate as an empirical forecasting competition!

Reliable Results
- document all parameter choices
- document all relevant modelling decisions in the process
→ rigorous documentation to allow re-simulation by others!
Forecasting Competition
- split the time series data → 2 sets PLUS multiple ORIGINS!
- select the forecasting model: select the best parameters on the IN-SAMPLE DATA
- forecast the next values for DIFFERENT HORIZONS t+1, t+3, t+18?
- evaluate the error on the hold-out OUT-OF-SAMPLE DATA
- choose the model with the lowest AVERAGE error on the OUT-OF-SAMPLE DATA

Results → M3-competition
- simple methods outperform complex ones
- exponential smoothing OK → neural networks not necessary
- forecasting VALUE depends on the VALUE of the INVENTORY DECISION

[Figure: forecasts from multiple origins for horizons t+1, t+2, t+3; simulated = ex post forecasts]
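The multiple-origins evaluation above can be sketched as a small rolling-origin loop; the naïve forecast stands in for any model, and the data is made up.

```python
import numpy as np

# Rolling-origin evaluation: forecast h steps ahead from several time
# origins and average the out-of-sample MAE, instead of trusting a
# single train/test split.
def naive_forecast(history, h):
    return np.repeat(history[-1], h)  # no-change benchmark forecast

def rolling_origin_mae(y, first_origin, h):
    errors = []
    for origin in range(first_origin, len(y) - h + 1):
        f = naive_forecast(y[:origin], h)
        errors.append(np.mean(np.abs(y[origin:origin + h] - f)))
    return float(np.mean(errors))

y = np.array([10.0, 11.0, 13.0, 12.0, 14.0, 15.0, 14.0, 16.0])
print(rolling_origin_mae(y, first_origin=4, h=2))  # average MAE over 3 origins
```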
Further Information
Journals
Forecasting … rather than technical Neural Networks literature!
JBF – Journal of Business Forecasting
IJF – International Journal of Forecasting
JoF – Journal of Forecasting
Agenda
1. Process of NN Modelling
2. Tips & Tricks for Improving Neural Networks based forecasts
a. Copper Price Forecasting
b. Questions & Answers and Discussion
a. Advantages & Disadvantages of Neural Networks
b. Discussion
Advantages:
- ANN can forecast any time series pattern (t+1!)
- without preprocessing
- no model selection needed!
- ANN offer many degrees of freedom in modelling
- freedom in forecasting with one single model
- complete model repository: linear models, nonlinear models, autoregression models, single & multiple regression, multiple step ahead, …

Disadvantages:
- many degrees of freedom: experience essential!
- research not consistent
- explanation & interpretation of ANN weights IMPOSSIBLE (nonlinear combination!)
- impact of events not directly deducible
- …
Contact Information
Sven F. Crone
Research Associate
Internet www.lums.lancs.ac.uk
eMail s.crone@lancaster.ac.uk