$$ y_t = w_0 + \sum_{j=1}^{q} w_j \, f\!\left( w_{0,j} + \sum_{i=1}^{p} w_{i,j} \, x_{i,t} \right) $$
where w_{i,j} (i = 0, 1, 2, ..., p; j = 1, 2, ..., q) and w_j (j = 0, 1, 2, ..., q) are the connection weights, p is the number of input nodes, q is the number of hidden nodes, and f is a nonlinear activation function that enables the system to learn nonlinear features. The most widely used activation functions for the output layer are the sigmoid and hyperbolic functions. In this paper, the hyperbolic transfer function is employed and is given by:
$$ f(x) = \frac{1 - e^{-2x}}{1 + e^{-2x}} $$
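To make the forward pass concrete, the following minimal Python sketch (NumPy assumed; the paper specifies no implementation, so all names here are illustrative) evaluates the network equation above with the hyperbolic transfer function applied at the hidden nodes:

import numpy as np

def tanh_activation(x):
    # Hyperbolic transfer function f(x) = (1 - e^{-2x}) / (1 + e^{-2x}),
    # algebraically identical to tanh(x).
    return (1.0 - np.exp(-2.0 * x)) / (1.0 + np.exp(-2.0 * x))

def mlp_forward(x, W_hidden, w0_hidden, w_out, w0_out):
    # x         : input vector of length p
    # W_hidden  : (q, p) matrix of hidden weights w_{i,j}
    # w0_hidden : length-q vector of hidden biases w_{0,j}
    # w_out     : length-q vector of output weights w_j
    # w0_out    : scalar output bias w_0
    hidden = tanh_activation(w0_hidden + W_hidden @ x)  # f(w_{0,j} + sum_i w_{i,j} x_{i,t})
    return w0_out + w_out @ hidden                      # y_t = w_0 + sum_j w_j hidden_j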
The MLP is trained using the backpropagation (BP) algorithm and the weights are optimized. The objective function to minimize is the sum of the squares of the differences between the desired output (y_{t,d}) and the predicted output (y_{t,p}), given by:
$$ E = 0.5 \sum_{t} \left( y_{t,d} - y_{t,p} \right)^2 = 0.5 \sum_{t} e_t^2 $$
The training of the network is performed by the well-known backpropagation algorithm (Rumelhart et al., 1986) with the steepest descent update given as follows:

$$ \Delta w_k = -\alpha_k g_k $$
where Δw_k is the vector of weight changes, g_k is the current gradient, and α_k is the learning rate that determines the length of the weight update. Thus, in the gradient descent learning rule, the update is made in the negative gradient direction. In order to avoid oscillations and to reduce the sensitivity of the network to fast changes of the error surface (Jang and Mizutani, 1997), the change in weight is made dependent on the past weight change by adding a momentum term:
$$ \Delta w_k = -\alpha_k g_k + p \, \Delta w_{k-1} $$
where p is the momentum parameter. Furthermore, the momentum allows escaping from small local minima on the error surface (Castillo and Soria, 2003). Unfortunately, gradient descent and gradient descent with momentum do not produce the fastest convergence, and are often too slow in practice. One solution to speed up the rate of convergence is to use numerical optimization techniques, which can be broken into three categories: conjugate gradient algorithms, quasi-Newton algorithms, and the Levenberg-Marquardt algorithm. In particular, this paper compares the prediction performance of the following algorithms: quasi-Newton (Broyden-Fletcher-Goldfarb-Shanno, BFGS), conjugate gradient (Fletcher-Reeves update, Polak-Ribière update, Powell-Beale restart), and the Levenberg-Marquardt algorithm. The algorithms are briefly described in Table 1, and are presented in detail in Scales (1985) and Nocedal and Wright (2006).
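As an illustration, the following Python sketch (NumPy assumed; the learning-rate and momentum defaults are illustrative, not taken from the paper) implements the objective E and one gradient-descent-with-momentum update exactly as defined above:

import numpy as np

def sse_objective(y_desired, y_predicted):
    # E = 0.5 * sum_t (y_{t,d} - y_{t,p})^2 = 0.5 * sum_t e_t^2
    e = y_desired - y_predicted
    return 0.5 * np.sum(e ** 2)

def momentum_update(w, g, prev_delta, alpha=0.01, p=0.9):
    # delta_w_k = -alpha_k * g_k + p * delta_w_{k-1}
    delta = -alpha * g + p * prev_delta
    return w + delta, delta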
Table 1. Description of the conjugate gradient, quasi-Newton (secant), and L-M algorithms

Fletcher-Reeves (conjugate): the initial search direction is the steepest descent direction, p_0 = -g_0; the weights are updated along the search direction, Δw_k = α_k p_k; and each new direction combines the current gradient with the previous direction, p_k = -g_k + β_k p_{k-1}, with

$$ \beta_k = \frac{g_k^T g_k}{g_{k-1}^T g_{k-1}} $$

Polak-Ribière (conjugate): same direction update, with

$$ \beta_k = \frac{\Delta g_{k-1}^T g_k}{g_{k-1}^T g_{k-1}} $$
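For illustration, a minimal NumPy sketch of the conjugate-gradient direction update in Table 1 (the gradients themselves are assumed to come from backpropagation; all names are illustrative) might read:

import numpy as np

def cg_direction(g_new, g_old, p_old, variant="fletcher-reeves"):
    # p_k = -g_k + beta_k * p_{k-1}; the first direction is p_0 = -g_0.
    if variant == "fletcher-reeves":
        # beta_k = (g_k' g_k) / (g_{k-1}' g_{k-1})
        beta = (g_new @ g_new) / (g_old @ g_old)
    else:
        # Polak-Ribiere: beta_k = (dg_{k-1}' g_k) / (g_{k-1}' g_{k-1})
        beta = ((g_new - g_old) @ g_new) / (g_old @ g_old)
    return -g_new + beta * p_old

The weight step is then Δw_k = α_k p_k, with the step length α_k chosen by a line search.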
The continuous wavelet transform of a signal x(t) is defined as:

$$ W(a,b) = \int_{-\infty}^{+\infty} x(t) \, \psi_{a,b}^{*}(t) \, dt $$
where x(t) represents the analyzed signal, a and b represent the scaling factor and the translation along the time axis, respectively, and the superscript (*) denotes the complex conjugate. The function ψ_{a,b}(·) is obtained by scaling the wavelet ψ at translation b and scale a as follows:

$$ \psi_{a,b}(t) = \frac{1}{\sqrt{a}} \, \psi\!\left( \frac{t-b}{a} \right) $$

where ψ(t) represents the mother wavelet. Following the methodology in Wang et al. (2011), the price index time series is decomposed using the discrete wavelet transform (DWT) with Daubechies (db3) as the mother wavelet and with six decomposition levels. Finally, the low-frequency components are used in this paper as inputs to artificial neural networks to predict the future stock market price index. Figure 1 shows the block diagram of the wavelet-BP approach.
Figure 1. DWT-BP approach: the price index P(t) is passed through the wavelet transform, and the resulting approximation coefficients A(t) are fed to neural networks that output the predicted price index P(t+1).
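A minimal sketch of this decomposition, assuming the PyWavelets library (the paper does not name a tool; the function name is illustrative), extracts the level-6 db3 approximation signal A(t) used as network input:

import numpy as np
import pywt  # PyWavelets, an assumed implementation choice

def wavelet_approximation(prices, wavelet="db3", level=6):
    # Decompose P(t) with the db3 mother wavelet at six levels,
    # then reconstruct using only the approximation coefficients.
    coeffs = pywt.wavedec(prices, wavelet, level=level)
    coeffs = [coeffs[0]] + [np.zeros_like(c) for c in coeffs[1:]]
    approx = pywt.waverec(coeffs, wavelet)
    return approx[: len(prices)]  # waverec may pad by one sample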
Lags-BP Approach
This model uses lagged price index values as inputs to the neural networks in order to predict future price values. To find the appropriate lags, the following methodology is considered. First, the stock market price level is transformed into log returns according to:
$$ R(t) = \log(P(t)) - \log(P(t-1)) $$
Then, the auto-correlation function is computed as
follows:
$$ \rho_k = \frac{\sum_{t=k+1}^{T} \left( R(t) - \bar{R} \right) \left( R(t-k) - \bar{R} \right)}{\sum_{t=1}^{T} \left( R(t) - \bar{R} \right)^2} $$
where t is the time index, k is the lag order, which is determined using the auto-correlation function, and R̄ is the sample mean of R(t). The appropriate k is determined following the methodology of Greene (2002) and Brockwell and Davis (2002). Indeed, it is important to include past returns to predict future market directions if the return series is auto-correlated. In other words, the history of returns may help predict future returns. Figure 2 shows the block diagram of the Lags-BP approach.
Figure 2. Lags-BP approach: the price index P(t) is transformed into log returns, the auto-correlation function is computed to find the appropriate lags, and the lagged values are fed to neural networks that output the predicted price index P(t+1).
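The lag-selection step can be sketched in Python (NumPy assumed; names are illustrative) by computing the log returns R(t) and the sample auto-correlation ρ_k defined above:

import numpy as np

def log_returns(prices):
    # R(t) = log(P(t)) - log(P(t-1))
    return np.diff(np.log(np.asarray(prices, dtype=float)))

def autocorrelation(r, k):
    # rho_k = sum_{t=k+1..T} (R(t)-Rbar)(R(t-k)-Rbar) / sum_{t=1..T} (R(t)-Rbar)^2
    r_bar = r.mean()
    num = np.sum((r[k:] - r_bar) * (r[:-k] - r_bar))
    den = np.sum((r - r_bar) ** 2)
    return num / den

Lags at which ρ_k is clearly different from zero are then retained as network inputs.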
Performance Measures
The prediction performance is evaluated using the following common statistics: root mean squared error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), average relative variance (ARV), coefficient of variation (CoV), and mean absolute deviation (MAD). They are given as follows:
$$ RMSE = \sqrt{ \frac{1}{N} \sum_{t=1}^{N} \left( P_t - F_t \right)^2 } $$

$$ MAE = \frac{1}{N} \sum_{t=1}^{N} \left| P_t - F_t \right| $$

$$ MAPE = \frac{1}{N} \sum_{t=1}^{N} \left| \frac{P_t - F_t}{P_t} \right| $$

$$ ARV = \frac{ \sum_{t=1}^{N} \left( P_t - F_t \right)^2 }{ \sum_{t=1}^{N} \left( F_t - \bar{P} \right)^2 } $$

$$ CoV = \frac{ \sqrt{ \frac{1}{N} \sum_{t=1}^{N} \left( F_t - P_t \right)^2 } }{ \bar{P} } $$

$$ MAD = \frac{1}{N} \sum_{t=1}^{N} \left| F_t - \bar{F} \right| $$
where P_t, F_t, F̄, and P̄ are respectively the true price, the forecasted price, the average of the forecasted prices, and the average of the true prices over the testing (out-of-sample) period t = 1 to N.
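For reference, the six statistics can be computed with a short NumPy sketch following the definitions reconstructed above (the function name is illustrative):

import numpy as np

def forecast_metrics(P, F):
    # P: true prices, F: forecasted prices over the out-of-sample period.
    P, F = np.asarray(P, dtype=float), np.asarray(F, dtype=float)
    e = P - F
    return {
        "RMSE": np.sqrt(np.mean(e ** 2)),
        "MAE": np.mean(np.abs(e)),
        "MAPE": np.mean(np.abs(e / P)),
        "ARV": np.sum(e ** 2) / np.sum((F - P.mean()) ** 2),
        "CoV": np.sqrt(np.mean((F - P) ** 2)) / P.mean(),
        "MAD": np.mean(np.abs(F - F.mean())),
    }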
Experimental Results
The initial sample of S&P500 daily prices runs from October 2003 to January 2008, with no missing values. All neural networks are trained with 80% of the entire sample and tested with the remaining 20%. Figure 3 shows the decomposition of the index price series P(t) using the wavelet transform, where s, a_i, and d_i represent respectively the price index, the approximation signal, and the detail signals at level i. Figure 4 shows the auto-correlation function up to 10 lags. Since ρ_k is nonzero for k=1 and k=2, the series is serially correlated. In addition, the values of the auto-correlation function die out quickly after the first three lags, which is a sign that the series obeys a low-order autoregressive (AR) process, for example an AR(2). Therefore, the number of lags included in the prediction models is up to k=2.
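A minimal sketch of the resulting setup (NumPy assumed; the helper is illustrative) builds the k = 2 lagged inputs and the chronological 80/20 train/test split described above:

import numpy as np

def lagged_dataset(prices, k=2, train_frac=0.8):
    # Inputs are [P(t-1), ..., P(t-k)]; the target is P(t).
    prices = np.asarray(prices, dtype=float)
    X = np.column_stack(
        [prices[k - i - 1 : len(prices) - i - 1] for i in range(k)]
    )
    y = prices[k:]
    split = int(train_frac * len(y))  # 80% train, 20% test, in time order
    return (X[:split], y[:split]), (X[split:], y[split:])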
Figure 3. Price Index Series Decomposed Using the DWT.
The performance of each numerical optimization technique, given the type of input (lags or DWT coefficients), is shown in terms of MAD, RMSE, and MAE in Figure 5; in terms of ARV and MAPE in Figure 6; and in terms of CoV in Figure 7. For instance, Figure 5 shows clearly that BFGS, L-M, and Powell-Beale outperform the Fletcher-Reeves and Polak-Ribière algorithms, whether using the lags or the DWT approach, according to the MAD, RMSE, and MAE statistics. In addition, BFGS performs slightly better than the L-M algorithm. The simulation results also show evidence that BFGS, L-M, and Powell-Beale provide higher accuracy with previous price index values as inputs than with DWT approximation coefficients. Similarly, based on the ARV and MAPE statistics, BFGS, L-M, and Powell-Beale outperform the Fletcher-Reeves and Polak-Ribière algorithms for both the lags and DWT approaches (Figure 6). In addition, BFGS performs clearly much better than the L-M algorithm. The simulation results also show
strong evidence of the superiority of the Lags-BP over the DWT-BP approach. Finally, as shown in Figure 7, the coefficient of variation statistic confirms all previous results. In sum, the BFGS algorithm has shown its superiority over the other numerical techniques in approximating the nonlinear relationship between the inputs and the S&P500 price index values. In addition, previous price index values outperform wavelet approximation signals in predicting future prices of the S&P500 market. In terms of convergence time, Figure 8 shows that the L-M algorithm is in general faster than the conjugate gradient algorithms. On the other hand, the BFGS quasi-Newton technique is slower to converge. However, all these algorithms can be implemented for real-time forecasting tasks, since the maximum convergence time is 6.29 seconds.
Figure 4. Auto-correlation function ρ_k of S&P500 returns at different lags k.
Figure 5. Performance in terms of MAD, RMSE, and MAE
Figure 6. Performance in terms of ARV and MAPE
Figure 7. Performance in terms of coefficient of variation
Figure 8. Running time.
Conclusion
In this paper, the problem of S&P500 price index prediction using the backpropagation algorithm is considered. Gradient descent and gradient descent with momentum are often too slow to converge. One solution to speed up the rate of convergence is to use numerical optimization techniques. Numerical methods can be broken into three categories: conjugate gradient, quasi-Newton, and the Levenberg-Marquardt algorithm. This paper compares the prediction performance of the following algorithms: conjugate gradient (Fletcher-Reeves update, Polak-Ribière update, Powell-Beale restart), quasi-Newton (Broyden-Fletcher-Goldfarb-Shanno, BFGS), and the Levenberg-Marquardt (L-M) algorithm. In addition, we examine the effectiveness of the discrete wavelet transform as a multiresolution technique to extract valuable information from the S&P500 time series. The performances of the learning algorithms are evaluated by comparing the statistical measures of the prediction error.
The simulation results show strong evidence of the superiority of the BFGS algorithm, followed by the L-M algorithm. However, the difference between the two numerical algorithms is very small when the ARV, MAPE, and CoV statistics are taken into account. In other words, the L-M algorithm is very powerful too. Thus, our findings confirm the results obtained by Kisi and Uncuoglu (2005) and Esugasini et al. (2005). On the other hand, the L-M algorithm is faster than the other algorithms, except Polak-Ribière using DWT approximation coefficients. Indeed, the L-M algorithm converges faster than conjugate gradient methods since the Hessian matrix is not computed but only approximated, and the use of the Jacobian requires less computation than the use of the Hessian matrix. Finally, in contrast to the literature, we found that previous price index values outperform wavelet approximation signals in predicting future prices of the S&P500 market. In other words, in comparison with wavelet approximation coefficients, more recent values of the price index contain more information about its behaviour on the following day. As a result, we conclude that a prediction system based on previous lags of the S&P500 as inputs to backpropagation neural networks trained with BFGS provides the lowest prediction errors. For future work, the effectiveness of wavelet-extracted signals in finance should be investigated further.
References
Ahmed WAM, Saad ESM, Aziz ESA (2001). Modified Back Propagation Algorithm for Learning Artificial Neural Networks. Proceedings of the Eighteenth National Radio Science Conference (NRSC): 345-352.
Atsalakis GS, Valavanis KP (2008). Surveying
Stock Market Forecasting Techniques Part II:
Soft Computing Methods. Expert Systems with
Applications. 36: 5932-5941.
Box G, Jenkins G (1970). Time Series Analysis:
Forecasting and Control. Holden-Day.
Cybenko G (1989). Approximations by Superpositions
of a Sigmoidal Function. Mathematics of Control,
Signals, and Systems. 2: 303-314.
Enke D, Thawornwong S (2005). The Use of Data
Mining and Neural Networks for Forecasting Stock
Market Returns. Expert Systems with Applications.
29: 927-940.
Esugasini E, Mashor MY, Isa N, Othman N (2005). Performance Comparison for MLP Networks Using Various Back Propagation Algorithms for Breast Cancer Diagnosis. Lecture Notes in Computer Science. 3682: 123-130.
Giordano F, La Rocca M, Perna C (2007). Forecasting
Nonlinear Time Series with Neural Network Sieve
Bootstrap. Computational Statistics and Data
Analysis. 51: 3871-3884.
Hornik K, Stinchcombe M, White H (1989). Multilayer Feedforward Networks are Universal Approximators. Neural Networks. 2: 359-366.
Huang W, Lai KK, Nakamori Y, Wang SY, Yu L (2007). Neural Networks in Finance and Economics Forecasting. International Journal of Information Technology and Decision Making. 6: 113-140.
Iftikhar A, Ansari MA, Mohsin S (2008). Performance
Comparison between Backpropagation Algorithms
Applied to Intrusion Detection in Computer Network
Systems. 9th WSEAS International Conference on
Neural Networks (NN08): 231-236.
Kisi O, Uncuoglu E (2005). Comparison of Three Back-Propagation Training Algorithms for Two Case Studies. Indian Journal of Engineering & Materials Sciences. 12: 434-442.
Mokhnache L, Boubakeur A (2002). Comparison of Different Back-Propagation Algorithms used in the Diagnosis of Transformer Oil. IEEE Annual Report Conference on Electrical Insulation and Dielectric Phenomena: 244-247.
Wang J-Z, Wang J-J, Zhang Z-G, Guo S-P (2011).
Forecasting Stock Indices with Back Propagation
Neural Network. Expert Systems with Applications.
38: 14346-14355.
Wen J, Zhao JL, Luo SW, Han Z (2000). The
Improvements of BP Neural Network Learning
Algorithm. Proceedings of the 5th Int. Conf. on
Signal Processing WCCC-ICSP. 3:1647-1649.
Nouir Z, Sayrac B, Fourestié B, Tabbara W, Brouaye F (2008). Comparison of Neural Network Learning Algorithms for Prediction Enhancement of a Planning Tool. http://confs.comelec.telecom-paristech.fr/EW2007/papers/1569014736.pdf
Zhang G, Patuwo BE, Hu MY (1998). Forecasting with Artificial Neural Networks: The State of the Art. International Journal of Forecasting. 14: 35-62.
Hsieh T-J, Hsiao HF, Yeh W-C (2010). Forecasting Stock Markets using Wavelet Transforms and Recurrent Neural Networks: An Integrated System Based on Artificial Bee Colony Algorithm. Applied Soft Computing. 11 (2): 2510-2525.
Li S-T, Kuo S-C (2008). Knowledge Discovery in
Financial Investment for Forecasting and Trading
Strategy Through Wavelet-Based SOM Networks.
Expert Systems with Applications. 34: 935-951.
Funahashi K (1989). On the Approximate Realization of Continuous Mappings by Neural Networks. Neural Networks. 2: 183-192.
Rumelhart DE, Hinton GE, Williams RJ (1986). Learning Representations by Back-Propagating Errors. Nature. 323: 533-536.
Jang J-SR, Sun C-T, Mizutani E (1997). Neuro-Fuzzy and Soft Computing: A Computational Approach to Learning and Machine Intelligence. NJ: Prentice-Hall.
Ramírez E, Castillo O, Soria J (2003). Hybrid System for Cardiac Arrhythmia Classification with Fuzzy k-Nearest Neighbors and Multi Layer Perceptrons Combined by a Fuzzy Inference System. The International Joint Conference on Neural Networks (IJCNN), 18-23 July, Barcelona, Spain.
Scales LE (1985). Introduction to Non-Linear
Optimization. New York, Springer-Verlag.
Nocedal J, Wright SJ (2006). Numerical Optimization. 2nd Edition. Springer.
Charalambous C (1992). Conjugate Gradient Algorithm for Efficient Training of Artificial Neural Networks. IEE Proceedings G. 139 (3): 301-310.
Daubechies I (1990). The Wavelet Transform, Time-Frequency Localization and Signal Analysis. IEEE Transactions on Information Theory. 36 (5): 961-1005.
Greene WH (2002). Econometric Analysis. Prentice
Hall; 5th edition.
Brockwell PJ, Davis RA (2002). Introduction to Time
Series and Forecasting. Springer; 8th Printing.