Jiangsu Key Laboratory for Design and Manufacture of Micro-Nano Biomedical Instruments, Southeast University, Nanjing 211189, China
School of Hotel and Tourism Management, Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong
Article info
Keywords:
Support vector machine
Wavelet kernel
Robust loss function
Particle swarm optimization
Forecast
Abstract
Product demand series exhibit small samples, seasonality, nonlinearity, randomness and fuzziness, and the existing support vector kernels cannot approximate the random curve of the demand time series in L2(Rn) space (quadratic continuous integral space). A robust loss function is also proposed to overcome the shortcoming of the ε-insensitive loss function in handling hybrid noises. A novel robust wavelet support vector machine (RW ν-SVM) is proposed based on wavelet theory and a modified support vector machine. A particle swarm optimization algorithm is designed to select the optimal parameters of the RW ν-SVM model within the permitted constraint scope. The results of an application to car demand forecasting show that the forecasting approach based on the RW ν-SVM model is effective and feasible; a comparison between the method proposed in this paper and other ones is also given, which proves that this method is better than Wν-SVM and other traditional methods.
© 2010 Elsevier Ltd. All rights reserved.
1. Introduction
Applications of time series prediction can be found in the areas of economic and business planning, inventory and production control, weather forecasting, signal processing and many other fields (Box & Jenkins, 1994; Engle, 1984; Hornik, Stinchcombe, & White, 1989; Hill, Connor, & Remus, 1996; Tuan & Lanh, 1981; Tong, 1983; Tang, Almedia, & Fishwick, 1991; Zhang, 2001). Product demand forecasting, as an application of time series forecasting, concerns a complex dynamic system whose demand behavior is affected by many factors. Many of these factors have random, nonlinear, seasonal, and uncertain characteristics. There is a nonlinear mapping relationship between the influencing factors and the demand series, and it is difficult to describe this relationship by definite mathematical models.
For linear series, Box and Jenkins (1994) developed the autoregressive integrated moving average (ARIMA) methodology for forecasting time series events. A basic tenet of the ARIMA modeling approach is the assumption of linearity among the variables. However, there are many time series events for which the assumption of linearity may not hold, and ARIMA models clearly cannot be effectively used to capture and explain nonlinear relationships. When ARIMA models are applied to processes that are nonlinear, forecasting errors often increase greatly as the forecasting horizon increases.
than their logistic counterparts. Duliba (1991) compared neural network models with four types of regression models in predicting the financial performance of transportation companies. She found that the neural network model outperformed the random-effects regression model but not the fixed-effects model. Though neural networks are more powerful than regression methods for time series prediction, their drawback is that the design of an efficient architecture and the choice of the parameters involved require long processing times. In fact, learning neural network weights can be considered a hard optimization problem for which the learning time scales exponentially as the problem size grows. To overcome this disadvantage, a new approach should be explored.
Recently, a novel machine learning technique, called the support vector machine (SVM), has drawn much attention in the fields of pattern classification and regression forecasting. SVM was first introduced by Vapnik (1995). It is a classification method based on statistical learning theory. The algorithm derives from the linear classifier and can solve two-class classification problems; it was later applied to nonlinear fields, that is, finding the optimal hyperplane (largest margin) to classify the sample set. It is an approximate implementation of the structural risk minimization (SRM) principle in statistical learning theory, rather than the empirical risk minimization (ERM) method (Kwok, 1999).
Compared with traditional neural networks, SVM uses the theory of structural risk minimization to avoid problems such as overfitting, the curse of dimensionality, and local minima. For small sample sets, the algorithm generalizes well. SVM has also been successfully used for machine learning with large and high-dimensional data sets. These attractive properties make SVM a promising technique. This is due to the fact that the generalization property of an SVM does not depend on the complete training data but only on a subset thereof, the so-called support vectors. SVM has now been applied in many fields, such as handwriting recognition, three-dimensional object recognition, face recognition, text image recognition, voice recognition, and regression analysis (Carbonneau, Laframboise, & Vahidov, 2008; Trontl, Smuc, & Pevec, 2007; Wohlberg, Tartakovsky, & Guadagnini, 2006).
For pattern recognition and regression analysis, the nonlinear ability of SVM is achieved through kernel mapping. The kernel function must satisfy the condition of Mercer's theorem. The Gaussian function is a generally used kernel function and shows good generalization ability. However, with the kernel functions used so far, SVM cannot approximate every curve in L2(Rn) space (quadratic continuous integral space), because these kernel functions do not form a complete orthonormal basis. As a result, the SVM cannot approximate every curve in L2(Rn) space; similarly, the regression SVM cannot approximate every function.
According to the above description, we need to find a new kernel function that can build a complete basis through horizontal translation and dilation. Such functions already exist: the wavelet functions. Based on wavelet decomposition, this paper proposes an admissible support vector kernel function, named the wavelet kernel function, and we prove that this kind of kernel function exists. The Morlet and Mexican-hat wavelet kernel functions are approximately orthonormal bases of L2(Rn) space. Based on wavelet analysis and the conditions of the support vector kernel function, a Morlet or Mexican-hat wavelet kernel for the support vector regression machine (SVR) is proposed, which is a kind of approximately orthonormal function. This kernel function can approximate almost any curve in quadratic continuous integral space, thus enhancing the generalization ability of the SVR. Khandoker, Lai, Begg, and Palaniswami (2007) and Widodo and Yang (2008) research wavelet ε-support vector machines. Much research indicates that the performance of ν-SVM is better than that of ε-SVM. According to the wavelet kernel function and regularization theory, a ν-support vector machine with a wavelet kernel function (Wν-SVM) is proposed in this paper.
However, the standard SVM encounters certain difficulties in real applications, and some improved SVMs have been put forward to solve concrete problems (Kwok, 1999). Though the standard SVM that adopts the ε-insensitive loss function has good generalization capability in some applications, it is difficult for it to handle hybrid noises, in particular the Gaussian (normally distributed) noise parts of a series. Therefore, this paper focuses on the modeling of a new wavelet SVM that can penalize the Gaussian noise parts of a series.
Based on the RW ν-SVM, an intelligent forecasting approach for car demand series with nonlinear and uncertain characteristics is proposed in this paper. Section 2 constructs an intelligent forecasting model based on a new ν-support vector regression machine with a wavelet kernel function and a robust loss function (RW ν-SVM) together with the particle swarm optimization (PSO) algorithm. Section 3 gives two algorithms to solve the intelligent forecasting problem. Section 4 gives an application of the intelligent forecasting system based on the RW ν-SVM model. Section 5 draws the conclusions.
2. Robust wavelet ν-support vector machine (RW ν-SVM)
2.1. Support vector machine
SVM represents a novel neural network technique, which has gained ground in classification, forecasting and regression analysis. One of its key properties is that training an SVM is equivalent to solving a linearly constrained quadratic programming problem, whose solution turns out to be unique and globally optimal. Therefore, unlike other network training techniques, SVM circumvents the problem of getting stuck at local minima. Another advantage of SVM is that the solution to the optimization problem depends only on a subset of the training data points, which are referred to as the support vectors.
Let us consider a set of data points (x1, y1), (x2, y2), ..., (xl, yl), which are independently and randomly generated from an unknown function. Specifically, xi is a column vector of attributes, yi is a scalar that represents the dependent variable, and l denotes the number of data points in the training set. SVM approximates such an unknown function by mapping x into a higher-dimensional space through a function φ and determining a linear maximum-margin hyper-plane. In particular, the smallest distance to such a hyper-plane is called the margin of separation. The hyper-plane is an optimal separating hyper-plane if the margin is maximized. The data points that are located exactly the margin distance away from the hyper-plane are denominated the support vectors.
Mathematically, SVM utilizes a classifying hyper-plane of the form f(x) = w·x + b = 0, where the coefficients w and b are estimated by minimizing a regularized risk function:

$$\frac{1}{2}\lVert w\rVert^{2}+C\sum_{i=1}^{l}L_{\varepsilon}(y_{i}), \qquad (1)$$

where $\lVert w\rVert^{2}$ is the regularized term, $\sum_{i=1}^{l}L_{\varepsilon}(y_{i})$ is the empirical error, and C > 0 is an arbitrary penalty parameter called the regularization constant. Basically, SVM penalizes f(xi) when it departs from yi by means of an ε-insensitive loss function:

$$L_{\varepsilon}(y_{i})=\begin{cases}0, & \text{if } |f(x_{i})-y_{i}|<\varepsilon,\\ |f(x_{i})-y_{i}|-\varepsilon, & \text{otherwise},\end{cases} \qquad (2)$$

so that the predicted values within the ε-tube have a zero loss. In turn, the minimization of the regularized term implies maximizing the margin of separation.
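As a quick illustration (not from the paper), the ε-insensitive loss of Eq. (2) can be transcribed directly from its definition; the function below is a minimal Python sketch, with the residual taken as f(xi) − yi:

```python
import numpy as np

def eps_insensitive_loss(residual: np.ndarray, eps: float) -> np.ndarray:
    """Eq. (2): zero inside the eps-tube, linear outside it."""
    abs_r = np.abs(residual)
    return np.where(abs_r < eps, 0.0, abs_r - eps)
```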
The ν-SVM solves the following primal problem:

$$\min_{w,\xi^{(*)},\varepsilon,b}\ \tau(w,\xi^{(*)},\varepsilon)=\frac{1}{2}\lVert w\rVert^{2}+C\left(\nu\varepsilon+\frac{1}{l}\sum_{i=1}^{l}\left(\xi_{i}+\xi_{i}^{*}\right)\right) \qquad (9)$$

subject to

$$(w\cdot x_{i}+b)-y_{i}\le\varepsilon+\xi_{i},\qquad y_{i}-(w\cdot x_{i}+b)\le\varepsilon+\xi_{i}^{*},\qquad \xi_{i},\xi_{i}^{*}\ge 0,\qquad \varepsilon\ge 0.$$

Its solution takes the form

$$f(x)=\sum_{i=1}^{l}\left(\alpha_{i}^{*}-\alpha_{i}\right)K(x_{i},x)+b, \qquad (10)$$

where αi and αi* are the Lagrange multipliers associated with the constraints (w·xi + b) − yi ≤ ε + ξi and yi − (w·xi + b) ≤ ε + ξi*, respectively. The function K(xi, xj) = φ(xi)ᵀφ(xj) represents a kernel, which is the inner product of the images of the two vectors xi and xj in the feature space φ(·).

Well-known kernel functions are K(xi, xj) = xiᵀxj (linear), K(xi, xj) = (γ xiᵀxj + r)^d with γ > 0 (polynomial), K(xi, xj) = exp(−γ‖xi − xj‖²) with γ > 0 (radial basis function), and K(xi, xj) = tanh(γ xiᵀxj + r) (sigmoid). The radial kernel is a popular choice in the SVM literature.
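For concreteness, the four kernels listed above can be sketched as follows (a minimal Python illustration; the values of gamma, r, and degree are free parameters, not values taken from the paper):

```python
import numpy as np

def linear_kernel(x, z):
    return x @ z

def polynomial_kernel(x, z, gamma=1.0, r=1.0, degree=3):
    return (gamma * (x @ z) + r) ** degree

def rbf_kernel(x, z, gamma=1.0):
    return np.exp(-gamma * np.sum((x - z) ** 2))

def sigmoid_kernel(x, z, gamma=1.0, r=0.0):
    return np.tanh(gamma * (x @ z) + r)
```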
2.2. The conditions of the wavelet support vector kernel function
The support vector kernel function can be not only a dot-product kernel, such as K(x, x′) = K(x·x′), but also a translation-invariant kernel, such as K(x, x′) = K(x − x′). In fact, if a function satisfies Mercer's condition, it is an admissible support vector kernel function.
For a function f ∈ L2(R), the wavelet transform with mother wavelet ψ, dilation a and translation m is

$$W(a,m)=|a|^{-1/2}\int f(x)\,\psi\!\left(\frac{x-m}{a}\right)dx. \qquad (11)$$

Writing $\psi_{a,m}(x)=|a|^{-1/2}\,\psi\!\left(\frac{x-m}{a}\right)$, the function f(x) can be reconstructed as

$$f(x)=C_{\psi}^{-1}\int\!\!\int W(a,m)\,\psi_{a,m}(x)\,\frac{da}{a^{2}}\,dm, \qquad (12)$$

where

$$C_{\psi}=\int\frac{|\hat{\psi}(\omega)|^{2}}{|\omega|}\,d\omega<\infty, \qquad (13)$$

$$\hat{\psi}(\omega)=\int\psi(x)\exp(-j\omega x)\,dx. \qquad (14)$$

For Eq. (12), Cψ is a constant with respect to ψ(x). The theory of wavelet decomposition is to approximate the function f(x) by a linear combination of a group of wavelet functions.

If the one-dimensional wavelet function is ψ(x), then, using tensor theory, the multi-dimensional wavelet function can be defined as

$$\psi_{d}(x)=\prod_{i=1}^{d}\psi(x_{i}). \qquad (15)$$
Accordingly, a translation-invariant wavelet kernel can be constructed from a one-dimensional mother wavelet ψ as

$$K(x,x')=\prod_{i=1}^{d}\psi\!\left(\frac{x_{i}-x_{i}'}{a}\right). \qquad (16)$$

We adopt the Morlet mother wavelet

$$\psi(x)=\cos(\omega_{0}x)\exp\!\left(-\frac{x^{2}}{2}\right). \qquad (17)$$

The corresponding Morlet wavelet kernel is

$$K(x,x')=\prod_{i=1}^{n}\cos\!\left(\omega_{0}\,\frac{x_{i}-x_{i}'}{a}\right)\exp\!\left(-\frac{(x_{i}-x_{i}')^{2}}{2a^{2}}\right). \qquad (18)$$

Theorem. The wavelet kernel (18) is an admissible support vector kernel, i.e., its Fourier transform is nonnegative:

$$F[K](\omega)=(2\pi)^{-n/2}\int_{R^{n}}\exp(-j\omega\cdot x)\,K(x)\,dx\ge 0. \qquad (19)$$
Proof. Writing $K(x)=\prod_{i=1}^{n}\cos\!\left(\frac{\omega_{0}x_{i}}{a}\right)\exp\!\left(-\frac{x_{i}^{2}}{2a^{2}}\right)$, where j denotes the imaginary unit, we have

$$\int_{R^{n}}\exp(-j\omega\cdot x)K(x)\,dx=\prod_{i=1}^{n}\int_{-\infty}^{\infty}\exp(-j\omega_{i}x_{i})\cos\!\left(\frac{\omega_{0}x_{i}}{a}\right)\exp\!\left(-\frac{x_{i}^{2}}{2a^{2}}\right)dx_{i}$$
$$=\prod_{i=1}^{n}\frac{1}{2}\int_{-\infty}^{\infty}\left[\exp\!\left(\frac{j\omega_{0}x_{i}}{a}\right)+\exp\!\left(-\frac{j\omega_{0}x_{i}}{a}\right)\right]\exp(-j\omega_{i}x_{i})\exp\!\left(-\frac{x_{i}^{2}}{2a^{2}}\right)dx_{i}$$
$$=\prod_{i=1}^{n}\frac{|a|\sqrt{2\pi}}{2}\left[\exp\!\left(-\frac{(\omega_{0}+\omega_{i}a)^{2}}{2}\right)+\exp\!\left(-\frac{(\omega_{0}-\omega_{i}a)^{2}}{2}\right)\right]. \qquad (20)$$

Substituting formula (20) into Eq. (19), we obtain

$$F[K](\omega)=\prod_{i=1}^{n}\frac{|a|}{2}\left[\exp\!\left(-\frac{(\omega_{0}+\omega_{i}a)^{2}}{2}\right)+\exp\!\left(-\frac{(\omega_{0}-\omega_{i}a)^{2}}{2}\right)\right]. \qquad (21)$$

Since a ≠ 0, we have

$$F[K](\omega)\ge 0, \qquad (22)$$

so the wavelet kernel (18) satisfies the admissibility condition (19). The wavelet ν-SVM regression function is then

$$f(x)=\sum_{i=1}^{l}\left(\alpha_{i}^{*}-\alpha_{i}\right)\prod_{j=1}^{d}\psi\!\left(\frac{x_{j}^{i}-x_{j}}{a}\right)+b. \qquad (23)$$

For wavelet analysis and theory, see Krantz (1994) and Liu and Di (1992). □
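As an illustration of Eqs. (16)-(18), a Morlet wavelet kernel can be sketched as below (a minimal Python sketch; omega0 = 1.75 is a common choice in the wavelet-kernel literature, not a value prescribed by this paper, and a = 0.27 in the example reuses the optimal dilation reported in Section 4):

```python
import numpy as np

def morlet_wavelet(u: np.ndarray, omega0: float = 1.75) -> np.ndarray:
    """Mother wavelet of Eq. (17): cos(omega0 * u) * exp(-u^2 / 2)."""
    return np.cos(omega0 * u) * np.exp(-u ** 2 / 2.0)

def morlet_wavelet_kernel(x: np.ndarray, z: np.ndarray,
                          a: float = 1.0, omega0: float = 1.75) -> float:
    """Wavelet kernel of Eq. (18): product of 1-D wavelets of (x_i - z_i)/a."""
    u = (x - z) / a
    return float(np.prod(morlet_wavelet(u, omega0)))

# Example: kernel value between two 3-dimensional points.
x = np.array([0.2, 0.5, 0.1])
z = np.array([0.3, 0.4, 0.0])
print(morlet_wavelet_kernel(x, z, a=0.27))
```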
2.3. Robust loss function
However, for the standard wavelet ν-SVM it is difficult to deal with the hybrid noise of time series. To overcome the shortcoming of the ε-insensitive loss of the standard wavelet ν-SVM, a new hybrid function composed of the Gaussian function, the Laplace function and the ε-insensitive loss function is constructed as the loss function of the ν-SVM, which is called the robust loss function. The robust loss function is defined as follows:

$$L(\xi)=\begin{cases}0, & |\xi|\le\varepsilon,\\[2pt] \dfrac{1}{2}\left(|\xi|-\varepsilon\right)^{2}, & \varepsilon<|\xi|\le\varepsilon+\mu,\\[2pt] \mu\left(|\xi|-\varepsilon\right)-\dfrac{1}{2}\mu^{2}, & |\xi|>\varepsilon+\mu,\end{cases} \qquad (24)$$

where ξ = yi − f(xi) is the residual and μ > 0 controls the transition from the quadratic (Gaussian-noise) zone to the linear (Laplace-noise) zone.
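A direct transcription of Eq. (24) may make the three zones easier to see (a minimal Python sketch; eps and mu are the ε and μ of the text):

```python
import numpy as np

def robust_loss(residual: np.ndarray, eps: float, mu: float) -> np.ndarray:
    """Eq. (24): insensitive zone, quadratic (Gaussian) zone, linear (Laplace) zone."""
    abs_r = np.abs(residual)
    quad = 0.5 * (abs_r - eps) ** 2           # eps < |r| <= eps + mu
    lin = mu * (abs_r - eps) - 0.5 * mu ** 2  # |r| > eps + mu
    return np.where(abs_r <= eps, 0.0, np.where(abs_r <= eps + mu, quad, lin))
```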
With the robust loss function (24), the primal problem of the RW ν-SVM becomes

$$\min_{w,\xi^{(*)},\varepsilon,b}\ \frac{1}{2}\left(\lVert w\rVert^{2}+b^{2}\right)+C\nu\varepsilon+\frac{C}{l}\left[\sum_{i\in I_{1}}\frac{1}{2}\left(\xi_{i}^{2}+\xi_{i}^{*2}\right)+\sum_{i\in I_{2}}\mu\left(\xi_{i}+\xi_{i}^{*}\right)\right] \qquad (25)$$

subject to

$$(w\cdot x_{i}+b)-y_{i}\le\varepsilon+\xi_{i}, \qquad (26)$$
$$y_{i}-(w\cdot x_{i}+b)\le\varepsilon+\xi_{i}^{*}, \qquad (27)$$
$$\xi_{i},\xi_{i}^{*}\ge 0,\qquad \varepsilon\ge 0, \qquad (28)$$

where I1 indexes the samples falling in the quadratic zone of the loss and I2 those in the linear zone. The Lagrangian of problem (25)-(28) is

$$L(w,b,\alpha^{(*)},\beta,\xi^{(*)},\varepsilon,\eta^{(*)})=\frac{1}{2}\left(\lVert w\rVert^{2}+b^{2}\right)+C\nu\varepsilon+\frac{C}{l}\sum_{i\in I_{1}}\frac{1}{2}\left(\xi_{i}^{2}+\xi_{i}^{*2}\right)+\frac{C}{l}\sum_{i\in I_{2}}\mu\left(\xi_{i}+\xi_{i}^{*}\right)$$
$$-\sum_{i}\left(\eta_{i}\xi_{i}+\eta_{i}^{*}\xi_{i}^{*}\right)-\beta\varepsilon-\sum_{i=1}^{l}\alpha_{i}\left(\varepsilon+\xi_{i}-w\cdot x_{i}-b+y_{i}\right)-\sum_{i=1}^{l}\alpha_{i}^{*}\left(\varepsilon+\xi_{i}^{*}+w\cdot x_{i}+b-y_{i}\right). \qquad (29)$$

Setting the partial derivatives of L to zero yields

$$\frac{\partial L}{\partial w}=0\ \Rightarrow\ w=\sum_{i=1}^{l}\left(\alpha_{i}^{*}-\alpha_{i}\right)x_{i}, \qquad (30)$$
$$\frac{\partial L}{\partial b}=0\ \Rightarrow\ b=\sum_{i=1}^{l}\left(\alpha_{i}^{*}-\alpha_{i}\right), \qquad (31)$$
$$\frac{\partial L}{\partial \varepsilon}=0\ \Rightarrow\ \beta=C\nu-\sum_{i=1}^{l}\left(\alpha_{i}+\alpha_{i}^{*}\right), \qquad (32)$$
$$\frac{\partial L}{\partial \xi_{i}^{(*)}}=0\ \Rightarrow\ \eta_{i}^{(*)}=C\mu/l-\alpha_{i}^{(*)}. \qquad (33)$$
By substituting (30)-(33) into (29), we can obtain the corresponding dual form of problem (25) as follows:
$$\min_{\alpha,\alpha^{*}\in R^{l}}\ \frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l}\left(\alpha_{i}^{*}-\alpha_{i}\right)\left(\alpha_{j}^{*}-\alpha_{j}\right)\left(K(x_{i},x_{j})+1\right)-\sum_{i=1}^{l}y_{i}\left(\alpha_{i}^{*}-\alpha_{i}\right)+\frac{1}{2C}\sum_{i=1}^{l}\left(\alpha_{i}^{2}+\alpha_{i}^{*2}\right)$$
$$\text{s.t.}\quad e^{T}\left(\alpha+\alpha^{*}\right)\le C\nu,\qquad 0\le\alpha_{i},\alpha_{i}^{*}\le\min\left(C\nu,\ C\mu/l\right). \qquad (34)$$

In matrix form, (34) can be written as

$$\min_{\alpha,\alpha^{*}\in R^{l}}\ \frac{1}{2}\left[\alpha^{*T},\ \alpha^{T}\right]\begin{bmatrix}Q+E/C & -Q\\ -Q & Q+E/C\end{bmatrix}\begin{bmatrix}\alpha^{*}\\ \alpha\end{bmatrix}-\left[y^{T},\ -y^{T}\right]\begin{bmatrix}\alpha^{*}\\ \alpha\end{bmatrix} \qquad (35)$$
$$\text{s.t.}\quad e^{T}\left(\alpha+\alpha^{*}\right)\le C\nu,\qquad 0\le\alpha_{i},\alpha_{i}^{*}\le\min\left(C\nu,\ C\mu/l\right),$$

where Qij = K(xi, xj) + 1, E is the identity matrix, e = [1, ..., 1]ᵀ, and α and α* are the Lagrangian multipliers, which are nonnegative.
Eq. (35) can be transformed into the compact formulation

$$\min_{\bar{\alpha}}\ \frac{1}{2}\bar{\alpha}^{T}H\bar{\alpha}-\bar{y}^{T}\bar{\alpha} \qquad (36)$$
$$\text{s.t.}\quad e^{T}\left(\alpha+\alpha^{*}\right)\le C\nu,\qquad 0\le\alpha_{i},\alpha_{i}^{*}\le\min\left(C\nu,\ C\mu/l\right),$$

where

$$\bar{\alpha}=\begin{bmatrix}\alpha^{*}\\ \alpha\end{bmatrix},\qquad H=\begin{bmatrix}Q+E/C & -Q\\ -Q & Q+E/C\end{bmatrix},\qquad \bar{y}=\begin{bmatrix}y\\ -y\end{bmatrix}.$$

With the wavelet kernel (16), the regression function of the RW ν-SVM is

$$f(x)=\sum_{i=1}^{l}\left(\alpha_{i}^{*}-\alpha_{i}\right)\left(\prod_{j=1}^{d}\psi\!\left(\frac{x_{j}^{i}-x_{j}}{a}\right)+1\right). \qquad (37)$$
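Given multipliers α, α* obtained from the QP (36) by any quadratic programming solver, the prediction of Eq. (37) is a weighted sum of wavelet-kernel evaluations. The following is a minimal sketch, reusing the morlet_wavelet_kernel function defined earlier; alpha, alpha_star, and the training matrix X_train are assumed to come from the solver:

```python
import numpy as np

def rw_nu_svm_predict(x_new, X_train, alpha, alpha_star, a=1.0, omega0=1.75):
    """Eq. (37): f(x) = sum_i (alpha*_i - alpha_i) * (K_wavelet(x_i, x) + 1)."""
    coef = alpha_star - alpha
    k_vals = np.array([morlet_wavelet_kernel(xi, x_new, a, omega0)
                       for xi in X_train])
    return float(coef @ (k_vals + 1.0))
```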
A low fitness value corresponds to a high generalization performance of the RW ν-SVM. The PSO algorithm is considered an excellent technique for solving combinatorial optimization problems (Krusienski, 2006; Yamaguchi, 2007). The PSO algorithm, introduced by Kennedy and Eberhart (1995), is used to determine the parameter combination of the RW ν-SVM.
Similarly to evolutionary computation techniques, PSO uses a set of particles representing potential solutions to the problem under consideration. The swarm consists of m particles; each particle i has a position Xi = {xi1, xi2, ..., xin} and a velocity Vi = {vi1, vi2, ..., vin}, and moves through an n-dimensional search space. According to the global variant of the PSO algorithm, each particle moves towards its best previous position and towards the best particle g in the swarm. Let us denote the best previously visited position of the ith particle, i.e., the one that gives the best fitness value, as p_ci = {p_ci1, p_ci2, ..., p_cin}, and the best previously visited position of the swarm as p_g = {p_g1, p_g2, ..., p_gn}.
The change of position of each particle from one iteration to another can be computed according to the distance between the current position and its previous best position and the distance between the current position and the best position of the swarm. The velocity and particle position are then updated using the following equations:
$$v_{ij}^{k+1}=w\,v_{ij}^{k}+c_{1}r_{1}\left(p\_c_{ij}-x_{ij}^{k}\right)+c_{2}r_{2}\left(p\_g_{j}-x_{ij}^{k}\right), \qquad (38)$$
$$x_{ij}^{k+1}=x_{ij}^{k}+v_{ij}^{k+1}, \qquad (39)$$

where w is the inertia weight, c1 and c2 are acceleration coefficients, and r1, r2 are random numbers uniformly distributed in [0, 1].
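A line-for-line transcription of the updates (38)-(39) follows (a Python sketch; the velocity clipping and the array shapes are assumptions, not specified at this point in the paper):

```python
import numpy as np

def pso_update(x, v, p_c, p_g, w, c1=2.0, c2=2.0, v_max=None):
    """Eqs. (38)-(39): velocity and position update for the whole swarm.

    x, v, p_c: arrays of shape (n_particles, dim); p_g: array of shape (dim,).
    """
    r1 = np.random.rand(*x.shape)
    r2 = np.random.rand(*x.shape)
    v_new = w * v + c1 * r1 * (p_c - x) + c2 * r2 * (p_g - x)  # Eq. (38)
    if v_max is not None:
        v_new = np.clip(v_new, -v_max, v_max)                  # velocity limitation
    return x + v_new, v_new                                    # Eq. (39)
```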
The fitness function is defined as the mean squared relative error

$$\mathrm{fitness}=\frac{1}{l}\sum_{i=1}^{l}\left(\frac{y_{i}-\hat{y}_{i}}{y_{i}}\right)^{2}, \qquad (40)$$

where l is the size of the selected sample, ŷi denotes the forecast value of the selected sample, and yi is the original data of the selected sample.
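Eq. (40) can be written as a one-line sketch:

```python
import numpy as np

def fitness(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Eq. (40): mean of squared relative errors over the selected sample."""
    return float(np.mean(((y_true - y_pred) / y_true) ** 2))
```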
Algorithm 1
Step (1) Data preparation: training and testing sets are represented as Tr and Te, respectively.
Step (2) Particle initialization and PSO parameter setting: generate initial particles. Set the PSO parameters, including the number of particles (n), particle dimension (m), number of maximal iterations (kmax), error limitation of the fitness function, velocity limitation (Vmax), and inertia weight for particle velocity (w). Set the iterative variable k = 0, and perform the training process of Steps 3-7.
Step (3) Set the iterative variable: k = k + 1.
Step (4) Compute the fitness function value of each particle. Take the current particle as the individual extremum point of every particle, and take the particle with the minimal fitness value as the global extremum point.
Step (5) Stopping condition check: if the stopping criteria (the predefined maximum iterations or the error accuracy of the fitness function) are met, go to Step 7. Otherwise, go to the next step.
Step (6) Update the particle positions by formulas (38) and (39) to form a new particle swarm, then go to Step 3.
Step (7) End the training procedure and output the optimal parameters (C, ν, a).
On the basis of the RW ν-SVM model, we can summarize a demand forecasting algorithm as follows.
Algorithm 2
Step (1) Initialize the original data by normalization and fuzzification, and then form the training and testing sets.
Step (2) Process the demand series with the wavelet transform at different scales, and select the wavelet function ψ and scale scope a that best match the original series.
Step (3) Compute the wavelet kernel function by (16). Construct the QP problem (34) of the RW ν-SVM.
Step (4) Go to Algorithm 1 to get the optimal parameter combination vector (C, ν, a); solve the optimization problem (36) and obtain the parameters α(*).
Step (5) For a new demand task, extract product characteristics and form a set of input variables x.
Step (6) Compute the forecasting result f(x) by (37).
4. Experiments
To illustrate the proposed intelligent forecasting method, the forecasting of a car demand series is studied. The car is a type of consumer product influenced by macroeconomics in the manufacturing system, and its demand behavior is usually driven by many uncertain factors. Some factors with large influencing weights are gathered to develop a factor list, as shown in Table 1. The first four factors are expressed as linguistic information and the last two factors are expressed as numerical data.
In our experiments, the car demand series are selected from the past demand records of a typical company. The detailed characteristic data and demand series of these cars compose the corresponding
Table 1
Influencing factor list.

Factor                      Unit           Expression              Weight
Brand famous degree (BF)    Dimensionless  Linguistic information  0.9
Performance parameter (PP)  Dimensionless  Linguistic information  0.8
Form beauty (FB)            Dimensionless  Linguistic information  0.8
Demand experience (SE)      Dimensionless  Linguistic information  0.5
Dweller deposit (nd)        Dimensionless  Numerical information   0.8
Oil price (np)              Dimensionless  Numerical information   0.4
training and testing sample sets. During the process of forecasting the car demand series, six influencing factors, viz., brand famous degree (BF), performance parameter (PP), form beauty (FB), demand experience (SE), dweller deposit (nd) and oil price (np), are taken into account. The first four influencing factors are linguistic information, and the latter two are numerical information. All linguistic information of the gathered influencing factors is processed with fuzzy logic to form numerical information.
The proposed forecasting model has been implemented in the Matlab 7.1 programming language. The experiments are performed on a personal computer with a 1.80 GHz Core(TM)2 CPU and 1.0 GB memory under Microsoft Windows XP Professional. Criteria such as the mean absolute error (MAE), the mean absolute percentage error (MAPE) and the mean square error (MSE) are adopted to evaluate the forecasting performance.
Fig. 4. Mexican-hat wavelet transform of the demand time series at different scales.
Fig. 5. Morlet wavelet transform of the demand time series at different scales.
Table 2
Comparison of forecasting results from four different models.

Model       1     2     3     4     5     6     7     8     9     10    11    12
Real value  2967  3268  3300  1891  3489  3544  2708  1513  3411  3672  3483  1523
ν-SVM       2971  3240  3269  1964  3439  3448  2754  1661  3489  3587  3433  1669
Wν-SVM      2953  3257  3286  1981  3456  3465  2736  1678  3472  3605  3451  1687
Wg-SVM      2962  3258  3286  1936  3511  3465  2726  1632  3464  3605  3450  1641
RW ν-SVM    2967  3257  3286  1922  3519  3472  2734  1611  3467  3610  3453  1620
Table 3
Error statistics of the four forecasting models.

Model      MAE      MAPE    MSE
ν-SVM      69.5833  0.0309  6662
Wν-SVM     63.1667  0.0303  6673
Wg-SVM     48.5833  0.0223  3822
RW ν-SVM   43.9167  0.0194  2910
The parameters are searched in the ranges ν ∈ (0, 1], a ∈ [0.001, 2] and

$$C\in\left[\frac{\max x_{i,j}-\min x_{i,j}}{l}\times 10^{-3},\ \frac{\max x_{i,j}-\min x_{i,j}}{l}\times 10^{3}\right].$$

The optimal combinational parameters obtained by Algorithm 1 are C = 525.57, ν = 0.82 and a = 0.27. Fig. 6 illustrates the forecasting result of the original car demand series given by Algorithm 2.
The inertia weight is decreased linearly with the iteration number:

$$w=w_{\max}-\left(w_{\max}-w_{\min}\right)\frac{k}{k_{\max}}, \qquad (41)$$

where wmax = 0.9 is the maximal inertia weight, wmin = 0.1 is the minimal inertia weight, and k is the iteration number of the controlling procedure.
To evaluate the forecasting error of these models, a comparison among the different forecasting approaches is shown in Table 3, which gives the error index distribution of the four different models. The indexes (MAE, MAPE and MSE) of the Wg-SVM model are better than those of the Wν-SVM model, and the indexes of RW ν-SVM are better than those of Wν-SVM and Wg-SVM. It is obvious that the robust loss function can improve the generalization ability of the support vector machine.
The experimental results show that the regression precision of RW ν-SVM is improved by adopting the wavelet kernel and the robust loss function, compared with the models (Wg-SVM and Wν-SVM) and the ν-SVM whose kernel function is the Gaussian function, under the same conditions.
5. Conclusion
In this paper, a new version of WSVR, named RW m-SVM, is
proposed to setup the nonlinear system of product demand series
by the integration of wavelet theory, robust loss function and
m-SVM. The new forecasting model based on RW m-SVM and PSO,
Acknowledgements
This research was partly supported by the National Natural Science Foundation of China under Grant 60904043, a research grant
funded by the Hong Kong Polytechnic University, China Postdoctoral Science Foundation (20090451152), Jiangsu Planned Projects
for Postdoctoral Research Funds (0901023C) and Southeast University Planned Projects for Postdoctoral Research Funds.
References
Bell, T., Ribar, G., & Verchio, J. (1989). Neural nets vs. logistic regression. Presented at the University of Southern California expert systems symposium, November.
Box, G. E. P., & Jenkins, G. M. (1994). Time series analysis: Forecasting and control (3rd ed.). Englewood Cliffs, NJ: Prentice-Hall.
Carbonneau, R., Laframboise, K., & Vahidov, R. (2008). Application of machine learning techniques for supply chain demand forecasting. European Journal of Operational Research, 184(3), 1140-1154.