
Expert Systems with Applications 38 (2011) 4851-4859

Contents lists available at ScienceDirect

Expert Systems with Applications

journal homepage: www.elsevier.com/locate/eswa

An intelligent forecasting model based on robust wavelet ν-support vector machine


Qi Wu a,b,*, Rob Law b

a Jiangsu Key Laboratory for Design and Manufacture of Micro-Nano Biomedical Instruments, Southeast University, Nanjing 211189, China
b School of Hotel and Tourism Management, Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong

ARTICLE INFO

Keywords:
Support vector machine
Wavelet kernel
Robust loss function
Particle swarm optimization
Forecast

ABSTRACT
Product demand series exhibit small samples, seasonality, nonlinearity, randomness and fuzziness, and the existing support vector kernels cannot approximate the random curve of the demand time series in the L2(Rn) space (quadratic continuous integral space). A robust loss function is also proposed to overcome the shortcoming of the ε-insensitive loss function in handling hybrid noise. A novel robust wavelet support vector machine (RW ν-SVM) is then proposed based on wavelet theory and a modified support vector machine, and a particle swarm optimization algorithm is designed to select the optimal parameters of the RW ν-SVM model within the permitted constraint scope. The results of an application to car demand forecasting show that the forecasting approach based on the RW ν-SVM model is effective and feasible; a comparison with other models shows that the proposed method outperforms the standard Wν-SVM and other traditional methods.
© 2010 Elsevier Ltd. All rights reserved.

1. Introduction

Applications of time series prediction can be found in the areas of economic and business planning, inventory and production control, weather forecasting, signal processing and many other fields (Box & Jenkins, 1994; Engle, 1984; Hornik, Stinchcombe, & White, 1989; Hill, Connor, & Remus, 1996; Tuan & Lanh, 1981; Tong, 1983; Tang, Almedia, & Fishwick, 1991; Zhang, 2001). Product demand forecasting, as an application of time series forecasting, concerns a complex dynamic system whose demand behavior is affected by many factors, many of which have random, nonlinear, seasonal and uncertain characteristics. There is a nonlinear mapping relationship between the influencing factors and the demand series, and it is difficult to describe this relationship with definite mathematical models.
For linear series, Box and Jenkins (1994) developed the autoregressive integrated moving average (ARIMA) methodology for forecasting time series events. A basic tenet of the ARIMA modeling approach is the assumption of linearity among the variables. However, there are many time series events for which this assumption may not hold, and ARIMA models clearly cannot be used effectively to capture and explain nonlinear relationships. When ARIMA models are applied to processes that are nonlinear, forecasting errors often increase greatly as the forecasting horizon
* Corresponding author at: Jiangsu Key Laboratory for Design and Manufacture of Micro-Nano Biomedical Instruments, Southeast University, Nanjing 211189, China. Tel.: +86 25 51166581; fax: +86 25 511665260.
E-mail addresses: wuqi7812@163.com, hmwuqi@polyu.edu.hk (Q. Wu), hmroblaw@inet.polyu.edu.hk (R. Law).
0957-4174/$ - see front matter © 2010 Elsevier Ltd. All rights reserved.
doi:10.1016/j.eswa.2010.09.036

becomes longer. To improve the forecasting of nonlinear time series events, researchers have developed alternative modeling approaches, including nonlinear regression models, the bilinear model (Tuan & Lanh, 1981), the threshold autoregressive model (Tong, 1983), and the autoregressive conditional heteroscedastic (ARCH) model (Engle, 1984). Although these methods exhibit improvement over linear models in some specific cases, they tend to be application specific, lack generality, and are harder to implement (Zhang, 2001).
For nonlinear series, the artificial neural network (ANN) is a general-purpose model that has been used as a universal function approximator: it is supposed to be able to model easily any type of parametric or non-parametric process, including automatically and optimally transforming the input data. These claims have led to an increasing interest in neural networks (Hornik et al., 1989). Researchers have used ANN methodology to forecast a number of nonlinear time series events (Hill et al., 1996; Tang et al., 1991; Tang & Fishwick, 1993). The effectiveness of neural network models and their performance in comparison with traditional forecasting methods have also been the subject of many studies (Gorr, 1994; Zhang, Patuwo, & Hu, 1998). Bell, Ribar, and Verchio (1989) compared back-propagation networks against regression models in predicting commercial bank failures; the neural network model performed well in failure prediction, and the expected misclassification costs of the neural network models were found to be lower than those of the logistic regression model. Roy and Cosset (1990) also used neural network and logistic regression models to predict country risk ratings from economic and political indicators. The neural network models had lower mean absolute error in their predictions and reacted more evenly to the indicators


than their logistic counterparts. Duliba (1991) compared neural network models with four types of regression models in predicting the financial performance of transportation companies, finding that the neural network model outperformed the random-effects regression model but not the fixed-effects model. Though neural networks are more powerful than regression methods for time series prediction, their drawback is that designing an efficient architecture and choosing the parameters involved require long processing times. In fact, learning neural network weights can be considered a hard optimization problem whose learning time scales exponentially as the problem size grows. To overcome this disadvantage, a new approach should be explored.
Recently, a novel machine learning technique called the support vector machine (SVM) has drawn much attention in the fields of pattern classification and regression forecasting. SVM was first introduced by Vapnik (1995). It is a classifier-learning method grounded in statistical learning theory: the algorithm derives from the linear classifier and solves the two-class classification problem, and was later extended to nonlinear problems; that is, it finds the optimal (large-margin) hyperplane that separates the sample set. It is an approximate implementation of the structural risk minimization (SRM) principle of statistical learning theory, rather than of the empirical risk minimization (ERM) method (Kwok, 1999).

Compared with traditional neural networks, SVM uses structural risk minimization to avoid problems such as over-fitting, the curse of dimensionality, and local minima. For small sample sets the algorithm generalizes well, and SVM has also been used successfully for machine learning with large and high-dimensional data sets. These attractive properties make SVM a promising technique. This is due to the fact that the generalization property of an SVM does not depend on the complete training data but only on a subset thereof, the so-called support vectors. SVM has now been applied in many fields, including handwriting recognition, three-dimensional object recognition, face recognition, text image recognition, voice recognition, and regression analysis (Carbonneau, Laframboise, & Vahidov, 2008; Trontl, Smuc, & Pevec, 2007; Wohlberg, Tartakovsky, & Guadagnini, 2006).
For pattern recognition and regression analysis, the nonlinear ability of SVM is achieved by kernel mapping, and the kernel function must satisfy the condition of Mercer's theorem. The Gaussian function is the most commonly used kernel function and shows good generalization ability. However, with the kernel functions used so far, SVM cannot approximate every curve in the L2(Rn) space (quadratic continuous integral space), because these kernel functions do not form a complete orthonormal basis. For the same reason, the regression SVM cannot approximate every function.

Accordingly, we need to find a new kernel function that can build a complete basis through horizontal floating (translation) and flexing (dilation). Such functions already exist: the wavelet functions. Based on wavelet decomposition, this paper proposes a kind of allowable support vector kernel function named the wavelet kernel function, and we prove that such kernel functions exist. The Morlet and Mexican-hat wavelet kernel functions form approximately orthonormal bases of the L2(Rn) space. Based on wavelet analysis and the conditions of the support vector kernel function, a Morlet or Mexican-hat wavelet kernel function for the support vector regression machine (SVR) is proposed, which is a kind of approximately orthonormal function. This kernel function can simulate almost any curve in quadratic continuous integral space, thus enhancing the generalization ability of the SVR. Khandoker, Lai, Begg, and Palaniswami (2007) and Widodo and Yang (2008) study the wavelet ε-support vector machine, and much research indicates that the performance of ν-SVM is better than that of ε-SVM. Based on the wavelet kernel function and regularization theory, a ν-support vector machine with a wavelet kernel function (Wν-SVM) is proposed in this paper.
However, the standard SVM encounters certain difficulties in real applications, and some improved SVMs have been put forward to solve concrete problems (Kwok, 1999). Although the standard SVM, which adopts the ε-insensitive loss function, has good generalization capability in some applications, it has difficulty handling hybrid noise containing Gaussian (normally distributed) components. Therefore, this paper focuses on modeling a new wavelet SVM that can penalize the Gaussian noise parts of a series.
Based on the RW ν-SVM, an intelligent forecasting approach for car demand series with nonlinear and uncertain characteristics is proposed in this paper. Section 2 constructs an intelligent forecasting model based on a new ν-support vector regression machine with a wavelet kernel function and robust loss function (RW ν-SVM) and the particle swarm optimization (PSO) algorithm. Section 3 gives two algorithms to solve the intelligent forecasting problem. Section 4 presents an application of the intelligent forecasting system based on the RW ν-SVM model. Section 5 draws the conclusions.
2. Robust wavelet ν-support vector machine (RW ν-SVM)

2.1. Support vector machine
SVM represents a novel neural-network-like technique that has gained ground in classification, forecasting and regression analysis. One of its key properties is that training an SVM is equivalent to solving a linearly constrained quadratic programming problem, whose solution turns out to be unique and globally optimal. Therefore, unlike other network training techniques, SVM circumvents the problem of getting stuck in local minima. Another advantage of SVM is that the solution to the optimization problem depends only on a subset of the training data points, which are referred to as the support vectors.

Let us consider a set of data points (x1, y1), (x2, y2), ..., (xl, yl), which are independently and randomly generated from an unknown function. Specifically, xi is a column vector of attributes, yi is a scalar representing the dependent variable, and l denotes the number of data points in the training set. SVM approximates such an unknown function by mapping x into a higher-dimensional space through a function φ and determining a linear maximum-margin hyper-plane. In particular, the smallest distance to such a hyper-plane is called the margin of separation, and the hyper-plane is an optimal separating hyper-plane if the margin is maximized. The data points that are located exactly the margin distance away from the hyper-plane are denominated the support vectors.

Mathematically, SVM utilizes a classifying hyper-plane of the form f(x) = w·x + b = 0, where the coefficients w and b are estimated by minimizing a regularized risk function:
    min  (1/2)‖w‖² + C Σ_{i=1}^{l} L_ε(y_i),    (1)

where (1/2)‖w‖² is the regularized term, Σ_{i=1}^{l} L_ε(y_i) is the empirical error, and C > 0 is an arbitrary penalty parameter called the regularization constant. Basically, SVM penalizes f(x_i) when it departs from y_i by means of an ε-insensitive loss function:

    L_ε(y_i) = 0,                      if |f(x_i) − y_i| < ε,
               |f(x_i) − y_i| − ε,     otherwise,
so that the predicted values within the ε-tube have zero loss. In turn, the minimization of the regularized term implies maximizing the margin of separation to the hyper-plane. The ε-insensitive loss function is illustrated in Fig. 1.
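For concreteness, the ε-insensitive loss above can be sketched in a few lines of Python (an illustrative sketch, not part of the original paper; the function name and the default ε are our own choices):

```python
def eps_insensitive_loss(y_true, y_pred, eps=0.1):
    """Zero loss inside the eps-tube, linear loss |f(x) - y| - eps outside."""
    residual = abs(y_pred - y_true)
    return 0.0 if residual < eps else residual - eps
```

Predictions within ε of the target incur no penalty at all, which is what produces the sparse support-vector solution.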
The minimization of expression (1) is implemented by introducing the slack variables ξ_i and ξ_i*. Specifically, ν-support vector regression (ν-SVM) solves the following quadratic programming problem:

    min_{w, ξ^(*), ε, b}  τ(w, ξ^(*), ε) = (1/2)‖w‖² + C·(νε + (1/l) Σ_{i=1}^{l} (ξ_i + ξ_i*))

subject to

    (w·x_i + b) − y_i ≤ ε + ξ_i,
    y_i − (w·x_i + b) ≤ ε + ξ_i*,
    ξ_i^(*) ≥ 0,  ε ≥ 0.

The solution to this minimization problem is of the form

    f(x) = Σ_{i=1}^{l} (α_i* − α_i) K(x_i, x) + b,

where α_i and α_i* are the Lagrange multipliers associated with the constraints (w·x_i + b) − y_i ≤ ε + ξ_i and y_i − (w·x_i + b) ≤ ε + ξ_i*, respectively. The function K(x_i, x_j) = φ(x_i)′φ(x_j) represents a kernel, which is the inner product of the images φ(x_i) and φ(x_j) of the two vectors x_i and x_j in the feature space.

Well-known kernel functions are K(x_i, x_j) = x_i′x_j (linear), K(x_i, x_j) = (γ x_i′x_j + r)^d, γ > 0 (polynomial), K(x_i, x_j) = exp(−γ‖x_i − x_j‖²), γ > 0 (radial basis function), and K(x_i, x_j) = tanh(γ x_i′x_j + r) (sigmoid). The radial kernel is a popular choice in the SVM literature.

2.2. The conditions of the wavelet support vector kernel function

A support vector kernel function can be not only a dot-product function, such as K(x, x′) = K(x·x′), but also a horizontal floating (translation-invariant) function, such as K(x, x′) = K(x − x′). In fact, any function that satisfies Mercer's condition is an allowable support vector kernel function.

Lemma 1. The symmetric function K(x, x′) is a kernel function of SVM if and only if, for every function u ≠ 0 satisfying ∫_{R^d} u²(ξ) dξ < ∞, the following condition holds:

    ∫∫ K(x, x′) u(x) u(x′) dx dx′ ≥ 0.

This lemma provides a simple method to build kernel functions. For a horizontal floating function, which can hardly be divided into the product of two identical functions, the condition can be stated as follows.

Lemma 2. The horizontal floating function K(x − x′) is an allowable support vector kernel function if and only if the Fourier transform of K(x) satisfies

    F[K](ω) = (2π)^{−n/2} ∫_{R^n} exp(−j ω·x) K(x) dx ≥ 0.

If the wavelet function ψ(x) satisfies the conditions ψ(x) ∈ L²(R) ∩ L¹(R) and ψ̂(0) = 0, where ψ̂(ω) is the Fourier transform of ψ(x), the wavelet function group can be defined as

    ψ_{a,m}(x) = |a|^{−1/2} ψ((x − m)/a),    (10)

where a is the so-called scaling (dilation) parameter, m is the horizontal floating (translation) coefficient, and ψ(x) is called the mother wavelet. The translation parameter m ∈ R and the dilation parameter a > 0 may be continuous or discrete.

For a function f(x) ∈ L²(R), the wavelet transform of f(x) can be defined as

    W(a, m) = |a|^{−1/2} ∫ f(x) ψ*((x − m)/a) dx,    (11)

where ψ*(x) stands for the complex conjugate of ψ(x).

The wavelet transform W(a, m) can be considered a function of the translation m at each scale a. Eq. (11) indicates that wavelet analysis is a time-frequency analysis, or a time-scale analysis. Unlike the short-time Fourier transform, the wavelet transform can be used for multi-scale analysis of a signal through dilation and translation, so it can extract the time-frequency features of a signal effectively.

The wavelet transform is also reversible, which provides the possibility of reconstructing the original signal. A classical inversion formula for f(x) is

    f(x) = C_ψ^{−1} ∫∫ W(a, m) ψ_{a,m}(x) (da/a²) dm,    (12)

where

    C_ψ = ∫ (|ψ̂(ω)|² / |ω|) dω < ∞,    (13)

    ψ̂(ω) = ∫ ψ(x) exp(−jωx) dx.    (14)

In Eq. (12), C_ψ is a constant determined by ψ(x). The idea of wavelet decomposition is to approximate the function f(x) by a linear combination of the wavelet function group.

If the one-dimensional wavelet function is ψ(x), then, using tensor theory, the multi-dimensional wavelet function can be defined as

    ψ_d(x) = Π_{i=1}^{d} ψ(x_i).    (15)

Fig. 1. A pictorial illustration of the ε-insensitive loss function.
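The transform of Eq. (11) can be sketched numerically; the code below is our illustration (not from the paper), using the Mexican-hat mother wavelet, which appears later in the experiments, and a simple midpoint-rule quadrature with arbitrary grid limits:

```python
import math

def mexican_hat(x):
    # Mexican-hat mother wavelet (unnormalized second derivative of a Gaussian)
    return (1.0 - x * x) * math.exp(-x * x / 2.0)

def wavelet_transform(f, a, m, lo=-10.0, hi=10.0, n=4000):
    # W(a, m) = |a|^(-1/2) * integral of f(x) * psi((x - m)/a) dx
    dx = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * dx          # midpoint rule
        total += f(x) * mexican_hat((x - m) / a)
    return total * dx / math.sqrt(abs(a))

# A bump centred at x = 2 responds strongly at translation m = 2
signal = lambda x: math.exp(-((x - 2.0) ** 2))
```

The translation m localizes the analysis in time and the scale a in frequency: wavelet_transform(signal, 1.0, 2.0) is large, while at m = -2, far from the bump, the response is near zero. The zero-mean property ψ̂(0) = 0 corresponds to the wavelet integrating to zero.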

We can build a horizontal floating kernel function as follows:

    K(x, x′) = Π_{i=1}^{d} ψ((x_i − x_i′)/a_i),    (16)

where a_i > 0 is the scaling parameter of the wavelet. Because the wavelet kernel function must satisfy the conditions of Lemma 2, few wavelet kernel functions can be expressed in terms of existing functions. We now give an existing wavelet kernel function, the Morlet wavelet kernel function, and prove that it satisfies the condition of an allowable support vector kernel function. The Morlet wavelet function is defined as

    ψ(x) = cos(x₀ x) exp(−x²/2).    (17)

Theorem 1. The Morlet wavelet kernel function is defined as

    K(x, x′) = Π_{i=1}^{n} cos(x₀ (x_i − x_i′)/a) exp(−(x_i − x_i′)²/(2a²)),    (18)

and this kernel function is an allowable support vector kernel function.
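The kernel of Eq. (18) is straightforward to implement. The sketch below is our illustration (x₀ = 1.75 and a = 0.5 are arbitrary example values, not prescribed by the paper); it also spot-checks Mercer admissibility by verifying that a quadratic form zᵀKz on a sampled Gram matrix is non-negative:

```python
import math
import random

def morlet_kernel(x, z, a=0.5, x0=1.75):
    # K(x, z) = prod_i cos(x0 * (x_i - z_i)/a) * exp(-(x_i - z_i)^2 / (2 a^2))
    k = 1.0
    for xi, zi in zip(x, z):
        d = (xi - zi) / a
        k *= math.cos(x0 * d) * math.exp(-d * d / 2.0)
    return k

rng = random.Random(0)
points = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(8)]
gram = [[morlet_kernel(p, q) for q in points] for p in points]
z = [rng.uniform(-1.0, 1.0) for _ in range(8)]
quad_form = sum(z[i] * gram[i][j] * z[j] for i in range(8) for j in range(8))
```

By Theorem 1 the Gram matrix is positive semi-definite, so quad_form should be non-negative for any z; note also that K(x, x) = 1, since every factor reduces to cos(0)·exp(0).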


Proof. According to Lemma 2, we only need to prove that

    F[K](ω) = (2π)^{−n/2} ∫_{R^n} exp(−j ω·x) K(x) dx ≥ 0,    (19)

where K(x) = Π_{i=1}^{n} cos(x₀ x_i/a) exp(−x_i²/(2a²)) and j denotes the imaginary unit. Writing the cosine as a sum of complex exponentials, cos(x₀ x_i/a) = (exp(j x₀ x_i/a) + exp(−j x₀ x_i/a))/2, and evaluating the resulting Gaussian integrals by completing the square, we obtain

    ∫_{R^n} exp(−j ω·x) K(x) dx = Π_{i=1}^{n} (|a|√(2π)/2) [exp(−(x₀ − a ω_i)²/2) + exp(−(x₀ + a ω_i)²/2)].    (20)

Substituting formula (20) into Eq. (19), we obtain

    F[K](ω) = (2π)^{−n/2} Π_{i=1}^{n} (|a|√(2π)/2) [exp(−(x₀ − a ω_i)²/2) + exp(−(x₀ + a ω_i)²/2)],    (21)

where a ≠ 0, and hence

    F[K](ω) ≥ 0.    (22)

If we use this wavelet kernel function as the support vector kernel function, the regression estimation equation of the Wν-SVM is defined as

    f(x) = Σ_{i=1}^{l} (α_i* − α_i) Π_{j=1}^{d} ψ((x^j − x_i^j)/a) + b,    (23)

where x^j denotes the jth component of the input vector x. For wavelet analysis and theory, see Krantz (1994) and Liu and Di (1992). □
2.3. Robust loss function

For the standard wavelet ν-SVM, however, it is difficult to deal with the hybrid noise of a time series. To overcome this shortcoming of the ε-insensitive loss of the standard wavelet ν-SVM, a new hybrid function composed of the Gaussian function, the Laplace function and the ε-insensitive loss function is constructed as the loss function of the ν-SVM; it is called the robust loss function and is defined as follows:

    L(ξ) = 0,                        |ξ| ≤ ε,
           (1/2)(|ξ| − ε)²,         ε < |ξ| ≤ ε_μ,
           μ(|ξ| − ε) − (1/2)μ²,    |ξ| > ε_μ,    (24)

where ε_μ = ε + μ and ξ is the slack variable.

The middle part of the robust loss function curve is a quadratic error curve, which is used to inhibit (penalize) noise with a Gaussian distribution. The linear part is used to inhibit (penalize) singular points and large-magnitude noise in the time series. The curve of the robust loss function, which is divided into three parts, is illustrated in Fig. 2. The proposed robust loss function integrates the advantages of the Gaussian loss function, the Laplace loss function and the ε-insensitive loss function, and gives the support vector machine better robustness and good generalization ability.
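A direct transcription of Eq. (24) follows (our sketch; the ε and μ defaults are illustrative). The three pieces join continuously, with matching first derivatives, at |ξ| = ε and |ξ| = ε + μ, which is what makes the loss well behaved:

```python
def robust_loss(xi, eps=0.1, mu=0.5):
    # Eq. (24): insensitive inside the tube, quadratic (Gaussian-type) for
    # moderate residuals, linear (Laplace-type) for large residuals.
    r = abs(xi)
    if r <= eps:
        return 0.0
    if r <= eps + mu:
        return 0.5 * (r - eps) ** 2
    return mu * (r - eps) - 0.5 * mu ** 2
```

At the joint r = ε + μ, both the middle and outer branches evaluate to μ²/2, so the loss is continuous there; the linear tail then grows with constant slope μ, bounding the influence of outliers.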
Fig. 2. Robust loss function.

2.4. Robust wavelet ν-support vector machine

Integrating the wavelet kernel function, the robust loss function and the ν-support vector machine, a robust wavelet support vector machine is proposed in this part. The parameter b is taken into account in the confidence interval of the RW ν-SVM, and the new optimization problem is reformulated as

    min_{w, ξ^(*), ε, b}  (1/2)(‖w‖² + b²) + C·(νε + (1/l)[Σ_{i∈I₁} (1/2)(ξ_i² + ξ_i*²) + Σ_{i∈I₂} μ(ξ_i + ξ_i*)])    (25)

subject to

    (w·x_i + b) − y_i ≤ ε + ξ_i,    (26)
    y_i − (w·x_i + b) ≤ ε + ξ_i*,    (27)
    ξ_i^(*) ≥ 0,  ε ≥ 0,    (28)

where I₁ indexes the residuals falling in the quadratic part of the robust loss function and I₂ those falling in the linear part.

Problem (25) is a quadratic programming (QP) problem. By introducing Lagrangian multipliers, a Lagrangian function can be defined as follows:

    L(w, b, α^(*), β, ξ^(*), ε, η^(*)) = (1/2)(‖w‖² + b²) + Cνε + (C/2l) Σ_{i∈I₁} (ξ_i² + ξ_i*²) + (Cμ/l) Σ_{i∈I₂} (ξ_i + ξ_i*)
        − βε − Σ_{i∈I₂} (η_i ξ_i + η_i* ξ_i*)
        − Σ_{i=1}^{l} α_i (ε + ξ_i − w·x_i − b + y_i)
        − Σ_{i=1}^{l} α_i* (ε + ξ_i* + w·x_i + b − y_i),    (29)

where α_i^(*), η_i^(*), β ≥ 0 are the Lagrangian multipliers. Differentiating the Lagrangian function (29) with respect to w, b, ε and ξ^(*), we have

    ∂L/∂w = 0  ⇒  w = Σ_{i=1}^{l} (α_i* − α_i) x_i,    (30)

    ∂L/∂b = 0  ⇒  b = Σ_{i=1}^{l} (α_i* − α_i),    (31)

    ∂L/∂ε = 0  ⇒  β = Cν − Σ_{i=1}^{l} (α_i + α_i*),    (32)

    ∂L/∂ξ_i^(*) = 0  ⇒  η_i^(*) = Cμ/l − α_i^(*),  i ∈ I₂.    (33)

By substituting (30)-(33) into (29), we can obtain the corresponding dual form of problem (25) as follows:

    min_{α, α* ∈ R^l}  (1/2) Σ_{i=1}^{l} Σ_{j=1}^{l} (α_i* − α_i)(α_j* − α_j)(K(x_i, x_j) + 1) − Σ_{i=1}^{l} y_i (α_i* − α_i) + (l/2C) Σ_{i=1}^{l} (α_i² + α_i*²)

    s.t.  e^T (α + α*) ≤ C·ν,   0 ≤ α_i^(*) ≤ min(Cν, Cμ/l).    (34)

Written in matrix form, formula (34) becomes

    min_{α, α* ∈ R^l}  (1/2) [α^T, α*^T] [ Q + (l/C)E   −Q
                                           −Q            Q + (l/C)E ] [α; α*] − [−y^T, y^T] [α; α*]    (35)

    s.t.  e^T (α + α*) ≤ C·ν,   0 ≤ α_i^(*) ≤ min(Cν, Cμ/l),

where Q_ij = K(x_i, x_j) + 1, E is the identity matrix, e = [1, ..., 1]^T, and α and α* are the nonnegative Lagrangian multipliers.

Eq. (35) can be transformed into the compact formulation

    min  (1/2) ᾱ^T H ᾱ − ȳ^T ᾱ
    s.t.  e^T (α + α*) ≤ C·ν,   0 ≤ α_i, α_i* ≤ min(Cν, Cμ/l),    (36)

where ᾱ = [α; α*], H = [[Q + (l/C)E, −Q], [−Q, Q + (l/C)E]], and ȳ = [−y; y].

The output regression function of the RW ν-SVM is then

    f(x) = Σ_{i=1}^{l} (α_i* − α_i) (Π_{j=1}^{d} ψ((x^j − x_i^j)/a) + 1).    (37)

It is obvious that the RW ν-SVM (whose constraint conditions are fewer than those of the standard Wν-SVM by one) has a more concise dual problem. There is no parameter b in the estimation function (37), which reduces the complexity of the model.
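Assembling the Hessian H of the compact dual (36) from a Gram matrix is mechanical. The sketch below is our illustration (pure-Python lists, names of our own choosing); the resulting 2l-by-2l matrix can then be handed to any QP solver:

```python
def build_dual_hessian(gram, C):
    # H = [[Q + (l/C) E, -Q], [-Q, Q + (l/C) E]] with Q_ij = K(x_i, x_j) + 1;
    # the "+1" absorbs the bias b, so no equality constraint remains.
    l = len(gram)
    Q = [[gram[i][j] + 1.0 for j in range(l)] for i in range(l)]
    H = [[0.0] * (2 * l) for _ in range(2 * l)]
    for i in range(l):
        for j in range(l):
            diag = l / C if i == j else 0.0
            H[i][j] = Q[i][j] + diag
            H[i][l + j] = -Q[i][j]
            H[l + i][j] = -Q[i][j]
            H[l + i][l + j] = Q[i][j] + diag
    return H
```

Since Q is symmetric, H is symmetric as well, as a QP Hessian must be.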
2.5. The optimization algorithm for the unknown parameters of the RW ν-SVM model

Determining the unknown parameters of the RW ν-SVM is a complicated process; in fact, it is a multivariable optimization problem in a continuous space. An appropriate parameter combination can enhance the degree to which the model approximates the original series, so it is necessary to select an intelligent algorithm to obtain the optimal parameters of the proposed model. The parameters of the RW ν-SVM have a great effect on its generalization performance, and an appropriate parameter combination corresponds to high generalization performance. The PSO algorithm is considered an excellent technique for solving combinatorial optimization problems (Krusienski, 2006; Yamaguchi, 2007). The PSO algorithm, introduced by Kennedy and Eberhart (1995), is used here to determine the parameter combination of the RW ν-SVM.

Similarly to evolutionary computation techniques, PSO uses a set of particles representing potential solutions to the problem under consideration. The swarm consists of m particles; each particle has a position X_i = {x_i1, x_i2, ..., x_in} and a velocity V_i = {v_i1, v_i2, ..., v_in}, and moves through an n-dimensional search space. In the global variant of the PSO algorithm, each particle moves towards its best previous position and towards the best particle g in the swarm. Let us denote the best previously visited position of the ith particle (the one that gives the best fitness value) as p_c_i = {p_c_i1, p_c_i2, ..., p_c_in}, and the best previously visited position of the swarm as p_g = {p_g1, p_g2, ..., p_gn}.

The change of position of each particle from one iteration to the next can be computed from the distance between the current position and its previous best position, and the distance between the current position and the best position of the swarm. The updates of velocity and particle position are then obtained using the following equations:

    v_id^{k+1} = w v_id^k + c₁ r₁ (p_c_id − x_id^k) + c₂ r₂ (p_g_d − x_id^k),    (38)

    x_id^{k+1} = x_id^k + v_id^{k+1},    (39)

where w is called the inertia weight and is employed to control the impact of the previous history of velocities on the current one. Accordingly, the parameter w regulates the trade-off between the global and local exploration abilities of the swarm. A large inertia weight facilitates global exploration, while a small one tends to facilitate local exploration. A suitable value of the inertia weight w usually provides balance between the global and local exploration abilities and consequently reduces the number of iterations required to locate the optimum solution. Here k = 1, 2, ..., K_max denotes the iteration number, c₁ is the cognition learning factor, c₂ is the social learning factor, and r₁ and r₂ are random numbers uniformly distributed in [0, 1].

Thus, each particle flies through potential solutions towards p_c_i and p_g in a guided way while still exploring new areas by the stochastic mechanism to escape from local optima. Since there is no intrinsic mechanism for controlling the velocity of a particle, it is necessary to impose a maximum value V_max on it. If the velocity exceeds this threshold, it is set equal to V_max; this limits the maximum travel distance at each iteration and keeps particles from flying past good solutions.
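The updates (38)-(39), the velocity clamp V_max, and a linearly decreasing inertia weight fit in a short routine. The following is our illustrative sketch (function names, bounds and hyper-parameter values are example choices), shown here minimizing a simple sphere function rather than the RW ν-SVM fitness:

```python
import random

def pso(fitness, dim, n_particles=20, k_max=200, w_max=0.9, w_min=0.1,
        c1=2.0, c2=2.0, v_max=0.5, lo=-5.0, hi=5.0, seed=1):
    rnd = random.Random(seed)
    X = [[rnd.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    V = [[0.0] * dim for _ in range(n_particles)]
    P = [x[:] for x in X]                     # personal bests p_c_i
    p_fit = [fitness(x) for x in X]
    g = min(range(n_particles), key=lambda i: p_fit[i])
    G, g_fit = P[g][:], p_fit[g]              # swarm best p_g
    for k in range(k_max):
        w = w_max - (w_max - w_min) * k / k_max       # decreasing inertia
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rnd.random(), rnd.random()
                V[i][d] = (w * V[i][d]
                           + c1 * r1 * (P[i][d] - X[i][d])
                           + c2 * r2 * (G[d] - X[i][d]))    # Eq. (38)
                V[i][d] = max(-v_max, min(v_max, V[i][d]))  # clamp to V_max
                X[i][d] += V[i][d]                          # Eq. (39)
            f = fitness(X[i])
            if f < p_fit[i]:
                P[i], p_fit[i] = X[i][:], f
                if f < g_fit:
                    G, g_fit = X[i][:], f
    return G, g_fit

best, best_val = pso(lambda x: sum(t * t for t in x), dim=2)
```

On the 2-D sphere function the swarm contracts onto the origin well within the default 200 iterations.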
2.6. The intelligent forecasting system

In forecasting product demand series, two of the key problems are how to deal with noise and non-stationarity. A potential solution to these two problems is to use a mixture-of-experts (ME) architecture, illustrated in Fig. 3. The ME architecture is generalized into a two-stage architecture to handle the non-stationarity in the data. In the first stage, a mixture of experts, including an evolutionary algorithm, partial least squares, and k-nearest neighbors, compete to optimize the model in the second stage. To evaluate the forecasting capacity of the model in the second stage, the fitness function of the ME architecture is designed as follows:

    fitness = (1/l) Σ_{i=1}^{l} ((y_i − ŷ_i)/y_i)²,    (40)

where l is the size of the selected sample, ŷ_i denotes the forecast value, and y_i is the original data of the selected sample.
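Eq. (40) is simply the mean squared relative error; a minimal sketch (the function name is our own choice):

```python
def me_fitness(y_true, y_forecast):
    # Eq. (40): average of squared relative errors ((y_i - yhat_i) / y_i)^2
    l = len(y_true)
    return sum(((yt - yf) / yt) ** 2 for yt, yf in zip(y_true, y_forecast)) / l
```

Normalizing each residual by y_i keeps months with large demand from dominating the fitness value.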


3. Intelligent forecasting method based on RW ν-SVM and PSO

The ME architecture is an intelligent forecasting system that can handle the noise and non-stationarity of a time series and construct the nonlinear relation in a high-dimensional space effectively. According to the above idea, the particle swarm optimization algorithm can be described as follows:
Algorithm 1

Step (1) Data preparation: the training and testing sets are represented as Tr and Te, respectively.
Step (2) Particle initialization and PSO parameter setting: generate the initial particles. Set the PSO parameters, including the number of particles (n), particle dimension (m), maximal number of iterations (k_max), error limitation of the fitness function, velocity limitation (V_max), and inertia weight for particle velocity (w). Set the iterative variable k = 0, and perform the training process of Steps 3-7.
Step (3) Set the iterative variable: k = k + 1.
Step (4) Compute the fitness function value of each particle. Take the current position as the individual extremum point of each particle, and take the particle with the minimal fitness value as the global extremum point.
Step (5) Stopping-condition check: if the stopping criteria (the predefined maximum number of iterations or the error accuracy of the fitness function) are met, go to Step 7; otherwise, go to the next step.
Step (6) Update the particle positions by formulas (38) and (39) to form the new particle swarm, then go to Step 3.
Step (7) End the training procedure and output the optimal parameters (C, ν, a).

On the basis of the RW ν-SVM model, we can summarize a demand forecasting algorithm as follows.

Algorithm 2

Step (1) Initialize the original data by normalization and fuzzification, and then form the training and testing sets.
Step (2) Apply the wavelet transform to the demand series at different scales, and select the wavelet function ψ and scale scope a_i that best match the original series.
Step (3) Compute the wavelet kernel function by (16) and construct the QP problem (34) of the RW ν-SVM.
Step (4) Go to Algorithm 1 to get the optimal parameter combination vector (C, ν, a); solve the optimization problem (36) and obtain the parameters α^(*).
Step (5) For a new demand task, extract the product characteristics and form the set of input variables x.
Step (6) Compute the forecasting result f(x) by (37).
4. Experiments

To illustrate the proposed intelligent forecasting method, the forecasting of a car demand series is studied. The car is a type of consumption product influenced by macroeconomics in the manufacturing system, and its demand behavior is usually driven by many uncertain factors. Some factors with large influencing weights are gathered to develop a factor list, as shown in Table 1. The first four factors are expressed as linguistic information and the last two factors are expressed as numerical data.

In our experiments, the car demand series are selected from the past demand records of a typical company. The detailed characteristic data and demand series of these cars compose the corresponding

Fig. 3. The intelligent forecasting system based on RW ν-SVM and PSO.


Table 1
Influencing factors of car demand forecast.

Product characteristics       Unit           Expression               Weight
Brand famous degree (BF)      Dimensionless  Linguistic information   0.9
Performance parameter (PP)    Dimensionless  Linguistic information   0.8
Form beauty (FB)              Dimensionless  Linguistic information   0.8
Sales experience (SE)         Dimensionless  Linguistic information   0.5
Dweller deposit (DD)          Dimensionless  Numerical information    0.8
Oil price (OP)                Dimensionless  Numerical information    0.4

training and testing sample sets. In forecasting the car demand series, six influencing factors, viz. brand famous degree (BF), performance parameter (PP), form beauty (FB), sales experience (SE), dweller deposit (DD) and oil price (OP), are taken into account; the first four influencing factors are linguistic information and the last two are numerical information. All linguistic information for the gathered influencing factors is processed with fuzzy logic to form numerical information.

The proposed forecasting model has been implemented in the Matlab 7.1 programming language. The experiments are made on a 1.80 GHz Core(TM)2 CPU personal computer (PC) with 1.0 GB memory under Microsoft Windows XP Professional. Several criteria, such as the mean absolute error (MAE), mean absolute percentage error (MAPE) and mean square error (MSE), are adopted to evaluate
Fig. 4. Mexican-hat wavelet transform of the demand time series at different scales.

Fig. 5. Morlet wavelet transform of the demand time series at different scales.


Fig. 6. The car demand forecasting results from the RW ν-SVM model.

Table 2
Comparison of forecasting results from four different models.

Model       1     2     3     4     5     6     7     8     9     10    11    12
Real value  2967  3268  3300  1891  3489  3544  2708  1513  3411  3672  3483  1523
ν-SVM       2971  3240  3269  1964  3439  3448  2754  1661  3489  3587  3433  1669
Wν-SVM      2953  3257  3286  1981  3456  3465  2736  1678  3472  3605  3451  1687
Wg-SVM      2962  3258  3286  1936  3511  3465  2726  1632  3464  3605  3450  1641
RW ν-SVM    2967  3257  3286  1922  3519  3472  2734  1611  3467  3610  3453  1620

Table 3
Error statistics of the four forecasting models.

Model       MAE       MAPE     MSE
ν-SVM       69.5833   0.0309   6662
Wν-SVM      63.1667   0.0303   6673
Wg-SVM      48.5833   0.0223   3822
RW ν-SVM    43.9167   0.0194   2910

the performance of the intelligent forecasting system. The initial parameters of the intelligent forecasting system are given as follows: inertia weight w0 = 0.9; positive acceleration constants c1 = c2 = 2; l = 1; the fitness accuracy of the normalized samples is equal to 0.0005.
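The three evaluation criteria follow directly from their standard definitions and can be sketched in a few lines (plain Python here rather than the Matlab the paper used; MAPE is expressed as a fraction, matching the magnitudes in Table 3):

```python
def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mape(y_true, y_pred):
    """Mean absolute percentage error, as a fraction (0.0194 = 1.94%)."""
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean square error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Example with the first three months of Table 2 (real vs. RW m-SVM):
real = [2967, 3268, 3300]
pred = [2967, 3257, 3286]
print(mae(real, pred))  # 8.333...
```

Applied to all 12 testing months of Table 2, these definitions reproduce the error statistics reported in Table 3.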
The wavelet transforms of the original demand series on different scales are obtained by means of Steps 1 and 2 of Algorithm 2. The candidate wavelet functions are the Morlet, Haar, Mexican and Gaussian wavelets. To limit the length of this paper, only the representative Morlet and Mexican wavelet transforms on different scales are given in Figs. 4 and 5. Among all the candidates, the Mexican wavelet transform fits the original demand series best over the scale range from 0.01 to 2.
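A discretized continuous wavelet transform of a series at a set of scales can be sketched with NumPy alone. The Mexican hat mother wavelet below is the standard (1 − t²)e^(−t²/2) form; the scale grid and the test signal are illustrative, not the paper's data:

```python
import numpy as np

def mexican_hat(t):
    """Mexican hat (Ricker) mother wavelet, unnormalized."""
    return (1.0 - t**2) * np.exp(-t**2 / 2.0)

def cwt(x, scales, psi=mexican_hat):
    """Discretized CWT: W(a, b) = a^{-1/2} * sum_t x[t] * psi((t - b) / a)."""
    x = np.asarray(x, dtype=float)
    t = np.arange(len(x))
    out = np.empty((len(scales), len(x)))
    for i, a in enumerate(scales):
        for b in range(len(x)):
            out[i, b] = (x * psi((t - b) / a)).sum() / np.sqrt(a)
    return out

# A few scales spanning the paper's reported range of 0.01 to 2:
coeffs = cwt(np.sin(np.linspace(0, 6 * np.pi, 64)), scales=[0.5, 1.0, 2.0])
print(coeffs.shape)  # (3, 64)
```

Plotting each row of `coeffs` against the time axis gives scale-by-scale views analogous to Figs. 4 and 5.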
Therefore, the Mexican wavelet can be adopted as the kernel function of the RW m-SVM model, and three parameters are determined as follows:

v ∈ (0, 1],  a ∈ [0.001, 2]  and

C ∈ [(max x_{i,j} − min x_{i,j})/l × 10^{−3}, (max x_{i,j} − min x_{i,j})/l × 10^{3}].
The optimal parameter combination is obtained by Algorithm 1, viz., C = 525.57, v = 0.82 and a = 0.27. Fig. 6 illustrates the forecasting result for the original car demand series given by Algorithm 2.

To analyze the forecasting capability of the RW m-SVM model, two further models (the wavelet m-support vector machine with Gaussian loss function (Wg-SVM) and the wavelet m-support vector machine (Wm-SVM)) are trained on the original demand series, and the forecasting results of each model for the last 12 months (the testing samples) are shown in Table 2. The linear inertia weight of standard PSO is adopted:

w = w_max − ((w_max − w_min)/k_max) × k                (41)

where w_max = 0.9 is the maximal inertia weight, w_min = 0.1 is the minimal inertia weight, k is the iteration number of the control procedure, and k_max is the maximal iteration number.
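The linearly decreasing inertia weight of Eq. (41), together with the standard PSO velocity update it feeds into, can be sketched as follows. The constants c1 = c2 = 2 and the w_max/w_min values come from the text; the velocity update itself is the textbook rule, not code from the paper:

```python
import random

W_MAX, W_MIN = 0.9, 0.1   # inertia weight bounds from the text
C1 = C2 = 2.0             # acceleration constants from the text

def inertia(k, k_max):
    """Eq. (41): w decreases linearly from W_MAX to W_MIN over k_max steps."""
    return W_MAX - (W_MAX - W_MIN) * k / k_max

def update_velocity(v, x, pbest, gbest, k, k_max):
    """Standard PSO velocity update using the linear inertia weight."""
    w = inertia(k, k_max)
    return [w * vi
            + C1 * random.random() * (p - xi)
            + C2 * random.random() * (g - xi)
            for vi, xi, p, g in zip(v, x, pbest, gbest)]

print(inertia(0, 100))  # 0.9
```

Early iterations (w near 0.9) favor global exploration of the (C, v, a) space; later iterations (w near 0.1) favor local refinement around the best-found parameters.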
To evaluate the forecasting error of these models, Table 3 compares the error index distribution of the four different forecasting approaches. The indexes (MAE, MAPE and MSE) of the Wg-SVM model are better than those of the Wm-SVM model, and the indexes of RW m-SVM are better than those of both Wm-SVM and Wg-SVM. It is obvious that the robust loss function can improve the generalization ability of the support vector machine.
Experimental results show that the regression precision of RW m-SVM is improved by adopting the wavelet kernel and robust loss function, compared with the models (Wg-SVM and Wm-SVM) and the m-SVM whose kernel function is the Gaussian function, under the same conditions.
5. Conclusion
In this paper, a new version of WSVR, named RW m-SVM, is proposed to set up the nonlinear system of product demand series by integrating wavelet theory, the robust loss function and m-SVM. The new forecasting model based on RW m-SVM and PSO,

named PSO RW m-SVM, is presented to approximate an arbitrary demand curve in L2 space. The simulation results indicate that RW m-SVM can provide better forecasting precision for the product demand series.
The performance of the RW m-SVM is evaluated using car demand data, and the simulation results demonstrate that RW m-SVM is effective in dealing with uncertain data and hybrid noises. Moreover, the particle swarm optimization algorithm presented here is shown to be capable of seeking the optimal parameters of the RW m-SVM.
Compared to Wm-SVM and Wg-SVM, RW m-SVM has the best indexes (MAE, MAPE and MSE). RW m-SVM can overcome the curse of dimensionality and has some other attractive properties, such as strong learning capability for small samples, good generalization performance for hybrid noises, insensitivity to noise or outliers, and automatic selection of optimal parameters. Moreover, the wavelet transform can reduce noise in the data while preserving its detail and resolution. Therefore, in the process of establishing the forecasting models, the uncertain information in the scale data is not neglected but incorporated wholly into the wavelet kernel function. The forecasting accuracy is improved by adopting the wavelet technique.

Acknowledgements
This research was partly supported by the National Natural Science Foundation of China under Grant 60904043, a research grant
funded by the Hong Kong Polytechnic University, China Postdoctoral Science Foundation (20090451152), Jiangsu Planned Projects
for Postdoctoral Research Funds (0901023C) and Southeast University Planned Projects for Postdoctoral Research Funds.

References

Bell, T., Ribar, G., & Verchio, J. (1989). Neural nets vs. logistic regression. In Presented at the University of Southern California expert system symposium (Nov.).

Box, G. E. P., & Jenkins, G. M. (1994). Time series analysis: Forecasting and control (3rd ed.). Englewood Cliffs, NJ: Prentice-Hall.

Carbonneau, R., Laframboise, K., & Vahidov, R. (2008). Application of machine learning techniques for supply chain demand forecasting. European Journal of Operational Research, 184(3), 1140–1154.

Duliba, K. (1991). Contrasting neural nets with regression in predicting performance. In Proceedings of the 24th international conference on system science, Hawaii (Vol. 4, pp. 163–170).

Engle, R. F. (1984). Combining competing forecasts of inflation using a bivariate ARCH model. Journal of Economic Dynamics and Control, 18(2), 151–165.

Gorr, W. L. (1994). Research prospective on neural forecasting. International Journal of Forecasting, 10(1), 1–4.

Hill, T., Connor, M. O., & Remus, W. (1996). Neural network models for time series forecasts. Management Science, 42(7), 1082–1092.

Hornik, K., Stinchcombe, M., & White, H. (1989). Multilayer feedforward networks are universal approximators. Neural Networks, 2(5), 359–366.

Kennedy, J., & Eberhart, R. (1995). Particle swarm optimization. In Proceedings of the IEEE international conference on neural networks (pp. 1942–1948).

Khandoker, A. H., Lai, D. T. H., Begg, R. K., & Palaniswami, M. (2007). Wavelet-based feature extraction for support vector machines for screening balance impairments in the elderly. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 15(4), 587–597.

Krantz, S. G. (1994). Wavelet: Mathematics and application. Boca Raton, FL: CRC.

Krusienski, D. J. (2006). A modified particle swarm optimization algorithm for adaptive filtering. In IEEE international symposium on circuits and systems, Kos, Greece (pp. 137–140).

Kwok, J. T. (1999). Moderating the outputs of support vector machine classifiers. IEEE Transactions on Neural Networks, 10(5), 1018–1031.

Liu, G. Z., & Di, S. L. (1992). Wavelet analysis and application. Xi'an, China: Xidian Univ. Press.

Roy, J., & Cosset, J. (1990). Forecasting country risk ratings using a neural network. In Proceedings of the 23rd international conference on system science, Hawaii (Vol. 4, pp. 327–334).

Tang, Z., Almeida, C., & Fishwick, P. A. (1991). Time series forecasting using neural networks vs. Box–Jenkins methodology. Simulation, 57(5), 303–310.

Tang, Z., & Fishwick, P. A. (1993). Feedforward neural nets as models for time series forecasting. ORSA Journal of Computing, 5(4), 374–385.

Tong, H. (1983). Threshold models in non-linear time series analysis. New York: Springer-Verlag.

Trontl, K., Smuc, T., & Pevec, D. (2007). Support vector regression model for the estimation of γ-ray buildup factors for multi-layer shields. Annals of Nuclear Energy, 34(12), 939–952.

Tuan, D. P., & Lanh, T. T. (1981). On the first-order bilinear time series model. Journal of Applied Probability, 18(3), 617–627.

Vapnik, V. (1995). The nature of statistical learning. New York: Springer.

Widodo, A., & Yang, B. S. (2008). Wavelet support vector machine for induction machine fault diagnosis based on transient current signal. Expert Systems with Applications, 35(1–2), 307–316.

Wohlberg, B., Tartakovsky, D. M., & Guadagnini, A. (2006). Subsurface characterization with support vector machines. IEEE Transactions on Geoscience and Remote Sensing, 44(1), 47–57.

Yamaguchi, T. (2007). Adaptive particle swarm optimization: Self-coordinating mechanism with updating information. In IEEE international conference on systems, man and cybernetics, Taipei, Taiwan (Vol. 3, pp. 2303–2308).

Zhang, G. P. (2001). An investigation of neural networks for linear time-series forecasting. Computers and Operations Research, 28(12), 1183–1202.

Zhang, G., Patuwo, E. B., & Hu, M. Y. (1998). Forecasting with artificial neural network: The state of the art. International Journal of Forecasting, 14(1), 35–62.
