
Pattern Prediction in Stock Market

Saroj Kaushik and Naman Singhal

Department of Computer Science and Engineering
Indian Institute of Technology, Delhi
Hauz Khas, New Delhi – 110019, India
saroj@cse.iitd.ernet.in, singhal.naman@gmail.com

Abstract. In this paper, we present a new approach to predicting the pattern of
the financial time series of a stock for the next 10 days and compare it with
the existing method of exact value prediction [2, 3, 4]. The proposed pattern
prediction technique performs better than value prediction: the average
prediction accuracy is 58.7% for pattern prediction against 51.3% for value
prediction, and the maximum accuracy is 100% and 88.9% respectively. It is of
more practical significance to predict an approximate pattern that can be
expected in the financial time series in the near future than to predict the
exact value. This way one can know the periods when the stock will be at a high
or at a low and use the information to buy or sell accordingly. We have used a
Support Vector Machine based prediction system as the basis for predicting
patterns. MATLAB has been used for implementation.

Keywords: Support Vector Machine, Pattern, Trend, Stock, Prediction, Finance.

1 Introduction

Mining stock market tendency is regarded as a challenging task due to its high
volatility and noisy environment. Prediction of accurate stock prices is a problem of
huge practical importance. There are two components to prediction. Firstly, historic
data of the firm in consideration needs to be preprocessed using various techniques to
create a feature set. This feature set is then used to train and test the performance of
the prediction system.
During the literature survey we found that past work has covered markets
around the globe using various preprocessing techniques such as those based
on financial indicators, genetic algorithms, principal component analysis and
variations of time series models. The better prediction systems are based on Artificial
Intelligence techniques such as Artificial Neural Networks [3] and Support Vector
Machines (SVM) [2]. More recent work has even tried to come up with hybrid
systems [6]. Overall, techniques based on Support Vector Machines and Artificial
Neural Networks have performed better than other statistical methods for prediction
[1, 2].

A. Nicholson and X. Li (Eds.): AI 2009, LNAI 5866, pp. 81–90, 2009.


© Springer-Verlag Berlin Heidelberg 2009

In the literature surveyed, the emphasis is on predicting the exact value of the
stock on the next day using the data of the previous N days. We have extended this
concept to predict the pattern for the next M days using the data of the previous
N days, where M and N are numbers of days. If we try to predict the actual values
of the next M days from the previous N days, the performance is not good, so we
propose the new concept of pattern prediction. The motivation of such an approach
is to know the periods of relative highs and lows to be expected rather than
the exact values. We analyzed the three cases M > N, M = N and M < N. The best
results were obtained for M = N.
The data set considered for the study is the financial time series of Reliance
Industries on the National Stock Exchange (NSE), India. We obtained historic
data of Reliance Industries for the last 8 years from the NSE [9], which
contains day-wise closing, high and low prices. The prediction system uses Least
Square Support Vector Regression (LS-SVR) based on Support Vector Machines [10].
In the next section we discuss the concepts of LS-SVR, followed by implementation,
results and observations.

2 Prediction Methods

In attempts to predict stock market behavior, many prediction methods have been
studied, such as Support Vector Machines and Artificial Neural Networks
[1, 2, 7, 8]. In our research, we have used an SVM based technique and have come
up with a new approach to training SVMs for prediction. The SVM used in the
proposed work is Least Square Support Vector Regression, which is an extension of
the support vector classification proposed by V. Vapnik [1].

2.1 Support Vector Regression

The basic idea of SVM is to use a linear model to implement nonlinear class
boundaries through a nonlinear mapping of the input vector into a high-dimensional
feature space. A linear model constructed in the new space can represent a
nonlinear decision boundary in the original space. In the new space, an optimal
separating hyper-plane is constructed. Thus SVM is known as the algorithm that
finds a special kind of linear model, the maximum margin hyper-plane. The maximum
margin hyper-plane gives the maximum separation between the decision classes. The
training examples that are closest to the maximum margin hyper-plane are called
support vectors. All other training examples are irrelevant for defining the
binary class boundaries.
Implementation of SVM is done using Support Vector Regression to predict the
output values. Given a set of data points $\{(x_1, z_1), \ldots, (x_l, z_l)\}$ such
that $x_i \in \mathbb{R}^n$ is an input and $z_i \in \mathbb{R}$ is a target output,
the standard form of support vector regression [10] is given below.

$$\min_{w,\, b,\, \varepsilon,\, \varepsilon^*} \; \frac{1}{2} w^T w + C \sum_{i=1}^{l} \varepsilon_i + C \sum_{i=1}^{l} \varepsilon_i^* \tag{1}$$

subject to

$$w^T \phi(x_i) + b - z_i \le \epsilon + \varepsilon_i,$$

$$z_i - w^T \phi(x_i) - b \le \epsilon + \varepsilon_i^*,$$

$$\varepsilon_i,\ \varepsilon_i^* \ge 0, \quad i = 1, \ldots, l.$$
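The dual of (1) is

$$\min_{\alpha,\, \alpha^*} \; \frac{1}{2} (\alpha - \alpha^*)^T Q (\alpha - \alpha^*) + \epsilon \sum_{i=1}^{l} (\alpha_i + \alpha_i^*) + \sum_{i=1}^{l} z_i (\alpha_i - \alpha_i^*) \tag{2}$$

subject to

$$\sum_{i=1}^{l} (\alpha_i - \alpha_i^*) = 0, \quad 0 \le \alpha_i,\ \alpha_i^* \le C, \quad i = 1, \ldots, l,$$

where

$$Q_{ij} = K(x_i, x_j).$$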

The approximate value function is given by the following equation.

$$y = \sum_{i=1}^{l} (-\alpha_i + \alpha_i^*) K(x_i, x) + b \tag{3}$$

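Here $w$ is the weight vector, $b$ is the bias, $C$ is the penalty parameter,
$\epsilon$ is the width of the insensitive tube and $\varepsilon_i$, $\varepsilon_i^*$
are the slack variables. $\phi$ is the nonlinear mapping into the feature space, $K$
is the kernel function with $K(x_i, x_j) = \phi(x_i)^T \phi(x_j)$, $Q$ is the kernel
matrix and $\alpha_i$, $\alpha_i^*$ are the SVM coefficients. From now on, with a
slight abuse of notation, let us denote the combined SVM coefficient
$(-\alpha_i + \alpha_i^*)$ by $\alpha_i$, with no restriction on $\alpha_i$ being
greater than zero.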
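
To make the value function of equation (3) concrete, the following minimal sketch evaluates it for given combined coefficients. The Gaussian (RBF) kernel, the bandwidth `sigma` and the function names are assumptions made for illustration; the paper does not state which kernel was used.

```python
import numpy as np

def rbf_kernel(a, b, sigma=1.0):
    """Gaussian kernel K(a, b) = exp(-||a - b||^2 / (2 sigma^2)) (assumed kernel)."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2)))

def svr_predict(x, support_x, alpha, b, sigma=1.0):
    """Evaluate y = sum_i alpha_i K(x_i, x) + b, where alpha_i is the
    combined coefficient (-alpha_i + alpha_i^*) of equation (3)."""
    total = 0.0
    for a_i, x_i in zip(alpha, support_x):
        total += a_i * rbf_kernel(x_i, x, sigma)
    return total + b

# Toy example with two training points and hand-picked coefficients.
support_x = np.array([[0.0], [1.0]])
alpha = np.array([2.0, -1.0])
y = svr_predict(np.array([0.0]), support_x, alpha, b=0.5)
```

At the query point x = 0 the first kernel value is 1 and the second is exp(-0.5), so `y` equals 2.5 - exp(-0.5).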

3 Feature Set Modeling


In this section, we will discuss the feature set modeling for pattern prediction and
value prediction [2] and subsequently, the performance of both the methods will be
compared.

3.1 Value Prediction

The basic concept of value prediction is to use the values of the previous N days
to predict the value of the next day [5, 6, 7]. We extend this concept to use the
values of the previous N days to predict the values of the next M days. We have
analyzed the cases N > M, N = M and N < M and found that the results are best when
N = M. A total of M SVMs are required to implement the proposed prediction
technique.

Let $x_i$ denote the closing price of the $i$-th day. Since one SVM is used to
predict one day, we have used M SVMs to predict the prices of the next M days
$\{x_{i+1}, x_{i+2}, \ldots, x_{i+M}\}$ from the prices of the previous N days
$\{x_{i-N+1}, x_{i-N+2}, \ldots, x_i\}$.
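
The windowing described above can be sketched as follows. This is only an illustration of how the training pairs might be laid out; the function name `make_windows` and the toy series are assumptions, not part of the original MATLAB implementation.

```python
import numpy as np

def make_windows(prices, n_prev, m_next):
    """Build (inputs, targets) pairs from a price series.

    Row t of X holds n_prev consecutive closing prices and row t of Y holds
    the m_next prices that follow them. Column k of Y is the target of the
    k-th SVM, so m_next separate regressors are trained in total.
    """
    prices = np.asarray(prices, dtype=float)
    n_samples = len(prices) - n_prev - m_next + 1
    if n_samples <= 0:
        raise ValueError("series too short for the requested window sizes")
    X = np.stack([prices[t:t + n_prev] for t in range(n_samples)])
    Y = np.stack([prices[t + n_prev:t + n_prev + m_next] for t in range(n_samples)])
    return X, Y

# Toy series 1..10 with N = 3 previous days and M = 2 future days.
prices = np.arange(1.0, 11.0)
X, Y = make_windows(prices, n_prev=3, m_next=2)
```

Here `X[0]` is `[1, 2, 3]` with target `Y[0] = [4, 5]`, and the last row pairs `[6, 7, 8]` with `[9, 10]`.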

3.2 Pattern Prediction

In the proposed technique for pattern prediction, first we learn all the patterns in the
time series then learn to predict a pattern for next M days using closing price of
previous N days from training data set and finally predict a pattern on test data.

Learn a Pattern in the Time Series. The pattern is represented as a vector of the
coefficients generated by an SVM, as given in equation (5). Since we want to
predict the pattern for the next M days, we first learn all patterns of size M by
sliding a window of the same size over the entire time series. To learn one
pattern, we create a training sample consisting of the (Day, Price) pairs in the
current window. Each Day in the current window is represented by an index from 1
to M, and each Price is the closing price $x_{i+k}$ of day $i+k$ (indexed with
respect to the complete time series). Learning one pattern therefore requires M
training pairs, as follows:

$$((1, x_{i+1}), (2, x_{i+2}), \ldots, (M, x_{i+M})), \quad i \in \text{TrainingSet}. \tag{4}$$
We train one SVM for each pattern in the training set of the time series.
Once each SVM is trained, we obtain the SVM coefficients corresponding to each
pattern. Let us denote the coefficients of the $i$-th SVM, say $\text{SVM}_i$, by
$(\alpha_{i+1}, \alpha_{i+2}, \ldots, \alpha_{i+M})$, one per training pair, and
represent the $i$-th pattern in the time series by these coefficients as given
below.

$$\alpha_i = \{\alpha_{i+1}, \alpha_{i+2}, \ldots, \alpha_{i+M}\}. \tag{5}$$
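
As a rough sketch of this step, the code below fits a least squares SVR to the (Day, Price) pairs of one window by solving the usual LS-SVM linear system and returns the M coefficients as the pattern vector. The RBF kernel and the values of `gamma` and `sigma` are assumptions chosen for illustration; the paper tunes its parameters with the LS-SVM package [10] and does not report them.

```python
import numpy as np

def lssvr_fit(x, z, gamma=10.0, sigma=2.0):
    """Fit a least squares SVR with an RBF kernel on 1-D inputs.

    Solves the linear system
        [ 0   1^T             ] [ b     ]   [ 0 ]
        [ 1   K + I / gamma   ] [ alpha ] = [ z ]
    and returns (alpha, b). gamma and sigma are illustrative values.
    """
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    z = np.asarray(z, dtype=float)
    m = len(z)
    K = np.exp(-((x - x.T) ** 2) / (2.0 * sigma ** 2))  # m x m kernel matrix
    A = np.zeros((m + 1, m + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(m) / gamma
    rhs = np.concatenate(([0.0], z))
    solution = np.linalg.solve(A, rhs)
    return solution[1:], solution[0]

def learn_pattern(window_prices, gamma=10.0, sigma=2.0):
    """Return the coefficient vector that represents one window of M prices."""
    window_prices = np.asarray(window_prices, dtype=float)
    days = np.arange(1, len(window_prices) + 1)  # Day index 1..M
    alpha, _ = lssvr_fit(days, window_prices, gamma, sigma)
    return alpha

# Toy window of M = 5 closing prices.
window = np.array([100.0, 102.0, 101.0, 105.0, 107.0])
pattern = learn_pattern(window)
shifted_pattern = learn_pattern(window + 50.0)
```

In this formulation, adding a constant to every price in the window changes only the bias b and leaves the coefficients unchanged, so `pattern` and `shifted_pattern` agree. That fits the idea of describing the shape of a window rather than its price level.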

Learn to Predict a Pattern. After we have learnt all the patterns in the training
set, we learn to predict the pattern of the next M days
$\{\alpha_{i+1}, \alpha_{i+2}, \ldots, \alpha_{i+M}\}$ using the closing prices of
the previous N days $\{x_{i-N+1}, x_{i-N+2}, \ldots, x_i\}$ from the training data
set. For this, a total of M new SVMs are required. These SVMs have nothing to do
with the SVMs that were used to learn the patterns above.

Prediction of a Pattern. For the test set, we compute the predicted coefficients
for the $j$-th test sample, represented as follows:

$$\beta_j = \{\beta_{j+1}, \beta_{j+2}, \ldots, \beta_{j+M}\}. \tag{6}$$

To obtain the pattern for the $j$-th test sample, we compute the squared error
between $\beta_j$ and each $\alpha_i$, for all $i \in \text{TrainingSet}$ and
$j \in \text{TestSet}$. We take the predicted pattern of the $j$-th day to be the
learned pattern of the $i$-th day for which the squared error between $\alpha_i$
and $\beta_j$ is minimum, computed as follows.

$$\text{error}_{\min} = \min_{i} \sum_{k=1}^{M} (\alpha_{i+k} - \beta_{j+k})^2 \tag{7}$$

where $i \in \text{TrainingSet}$ and $j \in \text{TestSet}$.
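
The matching step of equation (7) amounts to a nearest-neighbour search over the learned patterns. A minimal sketch, assuming the learned patterns are stacked as rows of a matrix (the function name and toy numbers are illustrative only):

```python
import numpy as np

def closest_pattern(train_patterns, beta):
    """Return (row index, squared error) of the learned pattern nearest to beta.

    train_patterns has one learned coefficient vector alpha_i per row;
    beta is the coefficient vector predicted for one test day.
    """
    train_patterns = np.asarray(train_patterns, dtype=float)
    beta = np.asarray(beta, dtype=float)
    errors = np.sum((train_patterns - beta) ** 2, axis=1)  # equation (7) for every i
    best = int(np.argmin(errors))
    return best, float(errors[best])

# Three learned patterns of length M = 3 and one predicted vector.
train_patterns = np.array([[0.0, 1.0, 2.0],
                           [1.0, 1.0, 1.0],
                           [2.0, 0.0, -2.0]])
beta = np.array([0.9, 1.2, 1.0])
best_row, err = closest_pattern(train_patterns, beta)
```

For these numbers the squared errors are 1.85, 0.05 and 11.65, so the second row (index 1) is selected.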

4 Implementation and Analysis of Results


The financial time series considered is that of Reliance Industries Ltd, and its
raw data is obtained from the NSE website [9].

4.1 Implementation

Data consists of the closing price, the highest price and the lowest price of each
trading day for the last 8 years. The latest 1500 values are selected for the
experiment: 1300 values are used for training and the remaining 200 values are
used for testing prediction accuracy. We have taken the closing price as the
feature set for the SVM since, in general, prediction is based on the closing
price only.
It is assumed that the initial training set is a large enough sample to represent
a complete set of patterns, so the process of learning patterns is not repeated
when we perform prediction for the same time series.
The LS-SVM package [10] is used for implementing the Support Vector Machine. For
each learning and prediction step, first the parameters of the SVM are optimized
and then the parameters of the features are optimized to obtain the best possible
prediction accuracy. During the optimization of the parameters of the feature set,
the best results were obtained when N = M and N = 10 [8]. Implementation is done
using MATLAB.

4.2 Analysis of Results

As mentioned earlier, the best results for predicting the values of the next M
days from the values of the previous N days are obtained when N = M [8]. The
simulations were done for N = 7, 10, 14 and 21, of which N = 10 gave the best
results.

Value Prediction. The actual and predicted closing prices are plotted for all the
days of the test set, for the next day and for the 10th day, using the value
prediction method discussed in Section 3.1. The Y-axis shows the closing price of
the stock in INR (Indian Rupee). The X-axis goes from 0 to 200, representing each
day of the test set. The plot in blue is the predicted price while the plot in
green is the actual closing price.

Fig. 1. Actual and predicted closing price for next day

Fig. 2. Actual and predicted closing price for next 10th day

It can be seen that value prediction is good for predicting the next day price. Its
performance deteriorates as we predict for more days in the future as shown in Figs 1
and 2.

Pattern Prediction. We now show the graphs obtained using the proposed technique
of pattern prediction. They are compared directly with the corresponding value
prediction plots. Pattern prediction graphs are plotted using the values
corresponding to the predicted pattern and the actual values. The learning data
and test data are the same as above.

Fig. 3. Pattern Prediction vs Value Prediction at j = 55

Fig. 4. Pattern Prediction vs Value Prediction at j = 82

Recall that $\alpha_i$, where $i = 1, 2, \ldots, 1300$, is the pattern learned at
the close of the $i$-th day and $\beta_j$, where $j = 1, 2, \ldots, 200$, is the
vector of coefficients predicted at the close of the $j$-th day. Consider the $i$
and $j$ for which the error between $\alpha_i$ and $\beta_j$ is minimum. The
pattern expected for the next M days after the $j$-th day will be similar to the
pattern at the $i$-th day (refer to subsection 3.2). The pattern prediction graph
plots $\{x_{i+1}, x_{i+2}, \ldots, x_{i+M}\}$ against
$\{x_{j+1}, x_{j+2}, \ldots, x_{j+M}\}$, where $i \in \text{TrainingSet}$ and
$j \in \text{TestSet}$. The value prediction graph plots the value predicted by
the SVM against the actual value.
The pattern prediction and value prediction graphs in Figs. 3 to 7 are shown side
by side for comparison. Pattern prediction graphs are on the left while value
prediction graphs are on the right. Prediction is done for the next M (= 10) days
following the close of each day of the test set. Out of the total test set of 200
values, only a few graphs are shown. The plot in blue is the predicted value while
the plot in green is the actual closing price.
We can conclude from these graphs that pattern prediction predicts the highs and
lows that can be expected in the near future more accurately than the approach
based on predicting actual values. In the next section, we compare the results
quantitatively.

Fig. 5. Pattern Prediction vs Value Prediction at j = 119

Fig. 6. Pattern Prediction vs Value Prediction at j = 147

Fig. 7. Pattern Prediction vs Value Prediction at j = 179

4.3 Prediction Accuracy

Prediction accuracy is computed as a percentage. For each test sample j, where
j = 1, ..., 200, we predict the next M days. Let us denote the predicted values by
p(k) and the corresponding actual values by a(k), where k = 1, ..., M. The
following formula is used to compute the prediction accuracy.

$$\text{acc} = \frac{100}{M-1} \sum_{k=2}^{M} \operatorname{sgn}\big[(p(k) - p(k-1)) \, (a(k) - a(k-1))\big] \tag{8}$$

where

$$\operatorname{sgn}(x) = \begin{cases} 1, & x > 0 \\ 0, & x \le 0 \end{cases}$$

The accuracy is thus the percentage of the M − 1 day-to-day moves for which the
predicted and actual prices move in the same direction.
Table 1 shows the performance of both methods computed with the above formula,
giving the minimum, maximum and average prediction accuracy obtained over the
complete test set. Pattern prediction performs better than value prediction on
all three measures. The average accuracy is 58.7% for pattern prediction and
51.3% for value prediction. Similarly, the maximum is 100% and 88.9% respectively,
whereas the minimum is 11.1% and 0% respectively.

Table 1. Performance of Pattern prediction vs Value prediction

Accuracy              Mean     Min      Max
Value prediction      51.3%    0%       88.9%
Pattern prediction    58.7%    11.1%    100%
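
Equation (8) can be checked with a few lines of code. The sketch below is a direct transcription of the formula; the function name and the toy price sequences are made up for illustration.

```python
def direction_accuracy(predicted, actual):
    """Percentage of consecutive-day moves whose direction is predicted correctly.

    Implements equation (8): a move counts as a hit only when the predicted
    and actual changes have a strictly positive product, so a flat move in
    either series counts as a miss.
    """
    if len(predicted) != len(actual) or len(predicted) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    m = len(predicted)
    hits = 0
    for k in range(1, m):
        predicted_change = predicted[k] - predicted[k - 1]
        actual_change = actual[k] - actual[k - 1]
        if predicted_change * actual_change > 0:
            hits += 1
    return 100.0 * hits / (m - 1)

# M = 10: the prediction gets 8 of the 9 moves right.
actual = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
predicted = [1, 2, 3, 4, 5, 6, 7, 8, 9, 8]
acc = direction_accuracy(predicted, actual)
```

With M = 10 there are nine moves, so the accuracy can only take values that are multiples of 100/9. Eight correct moves give about 88.9% and one correct move gives about 11.1%, which matches the granularity of the figures in Table 1.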

5 Conclusion
Value prediction is a good technique for predicting the next day's price. However,
if we want to predict the price for the next 10-15 days, we do not get good
results by predicting the actual value. To tackle such a scenario, we have
proposed the technique of pattern prediction. Although it does not attempt to
predict the exact value, it predicts the expected trend of the prices for the
next 10 days. Pattern prediction gives better results when predicting over longer
durations.
In the proposed work, we learnt all the patterns present in the time series. As a
result, the SVM coefficients corresponding to the patterns obtained while
learning are very noisy. As a future study, the performance of pattern prediction
can be improved by processing the patterns and using a finite set of patterns
rather than all the possible patterns found in the financial time series. To
further improve the performance of the predicted pattern, we can also apply some
statistical algorithm between the predicted SVM coefficients and the learned
pattern coefficients.

References
1. Vapnik, V.: The Nature of Statistical Learning Theory. Springer, Heidelberg (1995)
2. Kim, K.: Financial Time Series Forecasting using Support Vector Machines. Neurocomputing
55, 307–319 (2003)
3. Kim, K., Lee, W.B.: Stock Market Prediction using Artificial Neural Networks with
Optimal Feature Transformation. Neural Computing and Application 13, 255–260 (2004)
4. Ince, H., Trafalis, T.B.: Kernel Principal Component Analysis and Support Vector
Machines for Stock Price Prediction. In: Proceedings of IEEE International Joint
Conference on Neural Networks, vol. 3, pp. 2053–2058 (2004)
5. Yu, L., Wang, S., Lai, K.K.: Mining Stock Market Tendency Using GA-based Support
Vector Machines. In: Deng, X., Ye, Y. (eds.) WINE 2005. LNCS, vol. 3828, pp. 336–345.
Springer, Heidelberg (2005)

6. Li, W., Liu, J., Le, J.: Using GARCH-GRNN Model to Forecast Financial Time Series.
In: Yolum, P., Güngör, T., Gürgen, F., Özturan, C. (eds.) ISCIS 2005. LNCS, vol. 3733,
pp. 565–574. Springer, Heidelberg (2005)
7. Chen, W.H., Sheh, J.Y., Wu, S.: Comparison of Support Vector Machines and Back
Propagation Neural Networks in forecasting the six major Asian Stock Markets. Int. J.
Electronic Finance 1(1), 49–67 (2006)
8. Singhal, N.: Stock Price Prediction for Indian Market. Master's Thesis, Department of
Computer Science and Engineering, Indian Institute of Technology, Delhi, India (2008)
9. National Stock Exchange of India Ltd., http://www.nseindia.com
10. LS-SVM package, http://www.esat.kuleuven.ac.be/sista/lssvmlab