Oscar Chang, Yachay Tech (3 authors, including Oscar Chang)
Content uploaded by Oscar Chang on 11 September 2017.
Abstract— This work proposes a deep neural network (DNN) algorithm that accomplishes consistent sales forecasting for weekly data of pharmaceutical products. The resultant time series is used to train, step by step with backpropagation, a DNN where shallow nets face selected scenarios with different space-time data considerations. In each step, by using a sum of square differences and a peak-search procedure, a reasonable quality in the obtained abstract representations is pursued. First, an autoencoder is trained so as to develop in its hidden layer neural data abstractions about a random moving window. Thereafter, the autoencoder abstractions are used to train a second shallow net which operates in a restricted area and specializes in one-week-ahead predictions. Lastly, by using the abstractions of this second net plus recently captured information, a third shallow net is trained to produce its own one-week-ahead estimates, using new timing and data procedures. After training, the whole stacked system can produce stable weekly forecasting with hit rates between 55 % and 91 %, for assorted products and periods. The system has been tested in real time with real data.

Keywords— Deep Learning, Time Series Prediction, Sales Forecasting.

I. INTRODUCTION

In the deep learning world, state-of-the-art performance has gained a good reputation in fields like object recognition [1], speech recognition [2], natural language processing [3], physiological affect modelling [4] and many others. More recently, papers on time-series prediction or classification with deep neural networks have been reported [5] [6] [7] [8].

The search for depth

Both in biology and in circuit complexity theory it is maintained that deep architectures can be much more efficient (even exponentially more efficient) than shallow ones in terms of computational power and abstract representation of some functions [10] [11]. Unfortunately, well-established gradient descent methods such as backpropagation, which have proved effective when applied to shallow architectures, do not work well when applied to deep architectures.

In previous works [12] [13] [14] we have shown an innovative line of deep learning algorithms, with its own set of advantages and disadvantages, but eventually producing efficient neural computing processors. We have taken these ideas further, and in this paper we propose a DNN specialized in forecasting the sales of pharmaceutical products. The general problem is to find, for each outlet and for each product, an ideal balance that minimizes inventory costs and maximizes customer attention. For a distribution center with hundreds of outlets and thousands of products, this becomes a most entangled and important operation, where deep learning can contribute practical solutions.

Our methodology contemplates the training with backpropagation of shallow networks inside explicit scenarios, with specialized tasks, where predictive information about sales circulates freely and is used as immediate targets or rewards for local neural training. The final objective is to produce reliable abstract representations of the data behavior, at both short-term and long-term scales, codified in hidden layers, and then stack them together so as to produce forecasting information.

We also propose a basic method to measure the quality of the abstract representations generated in the different hidden layers used, by monitoring, while training is in progress, the neural activity of hidden neurons. This procedure requires a quadratic sum of differences over a selected period.
For operative purposes, the proposed stacked architecture is derived from three shallow networks called Autoencoder, Precursor and Gambler.

B. Data Handling

Given three years of daily sales grouped in weeks, the network unravels the problem of predicting sales one week ahead of the current input window (one product, one outlet). The dataset is taken from the database of a real pharmaceutical distribution company in Ecuador (IT Empresarial). For training purposes, the available data is divided into three mobile zones (Fig. 2), where time moves to the right.

The first initial zone, to the left, is reserved to train the first two shallow nets, the "autoencoder-precursor", which work as a coordinated duet.

The next zone, of about 10 weeks, is reserved to train the net "gambler", which holds the final network output and provides the final prediction information. Finally, the zone "unknown future" is used to test the performance of the system and to make a real prediction, when the unknown-future line reaches the end of the data. At any time, more data can be added and the system responds by creating new predictions.

C. Input Vector

The input vector is composed of a moving window of 16 consecutive weeks plus three other elements defined by the day/month/year where the top right of the moving window stays at a given time instant (Fig. 2). All 19 entries are normalized to neural values inside the analog segment [0,1]. When a target is needed, it is taken as the sale value of the week next to the right of the sample window (the near future). The shown data ranges from January 2014 to April 2017.
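To make the construction above concrete, here is a minimal sketch of how the 19-entry input vector could be assembled. The function name, the min-max scaling of sales, and the particular date normalization are our assumptions; the paper only states that all 19 entries are mapped into [0, 1].

```python
import numpy as np

def build_input_vector(weekly_sales, t, day, month, year):
    """Assemble the 19-entry input: 16 consecutive weekly sales ending
    at week index t, plus day/month/year of the window's right edge,
    all normalized into [0, 1].  (Illustrative; scaling scheme assumed.)"""
    window = np.asarray(weekly_sales[t - 15:t + 1], dtype=float)  # 16 weeks
    window /= max(weekly_sales) or 1.0           # scale sales into [0, 1]
    date_part = np.array([day / 31.0,            # day of month
                          month / 12.0,          # month of year
                          (year - 2014) / 3.0])  # years since start of data
    return np.concatenate([window, date_part])   # shape: (19,)
```

For a window ending at week index 20 and dated 15/4/2016, `build_input_vector(sales, 20, 15, 4, 2016)` returns a length-19 vector; the corresponding training target would be the sale value of week 21 (the near future), normalized the same way.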
V is computed as the quadratic sum of differences in the output of each hidden neuron for two consecutive, randomly selected images at times t and t−1. That is:

    V_t = (1/n) Σ_{i=1..n} (o_{i,t} − o_{i,t−1})²    (1)

where:
    V_t = hidden output variation between two consecutive inputs,
    n = number of hidden neurons,
    o_{i,t} = output of hidden neuron i at time t.

Figure 2. Data handling and input vector. Weekly sales behavior of a typical pharmaceutical product, with an erratic pattern of consumption and a moving window of 16 weeks of sales data plus window location (date) information.
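As a sketch, Eq. (1) is a one-liner; note that the 1/n normalization (mean of squared per-neuron differences) is our reading of the listed definition of n:

```python
def hidden_variation(o_t, o_prev):
    """V_t of Eq. (1): mean squared difference between the hidden-layer
    outputs for two consecutive inputs.  o_t and o_prev are sequences
    holding the n hidden-neuron outputs at times t and t-1."""
    n = len(o_t)
    return sum((a - b) ** 2 for a, b in zip(o_t, o_prev)) / n
```

During training, V is evaluated repeatedly over pairs of consecutive inputs while backpropagation proceeds.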
For training purposes, the moving window travels in different space-time patterns for diverse training scenarios.

III. FIRST SCENARIO: THE AUTOENCODER

Our autoencoder has 19 input, 11 hidden and 19 output neurons. To train it, the moving window is located at a random position inside the autoencoder zone and the same input vector is used as target. The job of the trained autoencoder is to reproduce in its output, as exactly as possible, the image of the moving window just loaded in its inputs, for any random position in the allowed area. Since there are fewer hidden neurons than input neurons, data compression and abstract representations must occur during training. Our stacked system works with abstractions that travel from layer to layer as the main source of information, so we take special care about the quality of these abstractions.

In a typical run, with small initial random weights in the hidden layer, V starts from a small value and then grows into a random oscillatory time series. We use this outcome and introduce a selective peak-search procedure where the last found peak value of V is stored until a bigger peak value is found. In pseudo code:

    cycles_count = 0;
    do
    {
        calculate net;
        backpropagation;
        calculate Vn;
        if (Vn > peak) { peak = Vn; cycles_count = 0; }
        cycles_count++;
    } while (cycles_count < 3000);

It turns out that after a while the period required to reach the next peak value (cycles_count) grows exponentially, probably as an overtraining signal. So, for our purposes, if in 3000 consecutive cycles no new bigger peak originates, the net is assumed to be done and training stops. In our experiments, this peak-search scheme produces a small output error and high hidden-layer activity in the autoencoder, taking an average of 50k cycles of backpropagation to complete.
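The loop above can be made runnable. This sketch stubs out the network update behind a `train_step` callback (our framing; the paper's "calculate net" and "backpropagation" steps would live inside it) and keeps only the peak-search stopping rule:

```python
def train_with_peak_search(train_step, patience=3000):
    """Call train_step() (one forward pass plus backpropagation,
    returning the current Vn) until no new, bigger peak of V has
    appeared for `patience` consecutive cycles."""
    peak = float("-inf")
    cycles_count = 0
    total_cycles = 0
    while cycles_count < patience:
        vn = train_step()   # one backprop cycle; returns Vn
        if vn > peak:       # new bigger peak found: reset the counter
            peak = vn
            cycles_count = 0
        else:
            cycles_count += 1
        total_cycles += 1
    return peak, total_cycles
```

Feeding it a V-series that rises to some value and then stays flat makes it stop after `patience` flat cycles, reporting the last peak reached.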
Figure 4. The behavior of the stacked Autoencoder-Precursor duet. After training, the quadratic error inside the allowed training zone shrinks to a minimum. The two curves look almost identical, so the hidden layers of the precursor should convey valuable feature abstractions about predicting. The predictive capacity fades away when the duet moves into the unknown future, with never-seen data.
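The hit rates quoted below are consistent with a simple proportion of correct one-week-ahead calls out of 22 trials; as a sketch (our helper, not from the paper):

```python
def hit_rate(fails, trials=22):
    """Percentage of trials whose one-week-ahead prediction was a hit."""
    return round(100.0 * (trials - fails) / trials, 1)
```

For instance, 3 fails in 22 trials gives 86.4 % and 10 fails gives 54.5 %, matching the figures reported for products A and C.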
Product A. High rotation, average sale is 72 units per week. There are 3 fails in 22 trials. The hit rate for this run is 86.4 %. Some peaks and valleys are correctly anticipated. Other runs may go down to 66.3 %.

Product B. Medium rotation, average sale 9.6 units per week. There are 4 fails in 22 trials. The hit rate is 81.8 %. Some peaks and valleys are correctly anticipated.

Product C. Low rotation, average sale 0.6 units per week. There are 10 fails in 22 trials. The hit rate is 54.5 %.

Product D. High rotation, average sale 110 units per week. There are 4 fails in 22 trials. The hit rate is 81.8 %.

VII. DISCUSSION

For some products, the system produces good hit-rate forecasting, where some peaks and valleys are well predicted. For other kinds of products, the hit rate barely keeps above 54 %. Further parameters and training strategies should yet be developed for these cases.

According to our results, the proposed peak-search scheme produces good enough abstractions to convey important information, raising the hit rates well above 50 %.

VIII. CONCLUSIONS

We presented a sales forecasting deep learning model that makes sales predictions by stacking abstract representations, whose quality is monitored by using a sum of square differences and a peak-search scheme. Abstractions are produced by three different shallow networks (autoencoder, precursor and gambler) trained inside explicit scenarios, with focused tasks, timing and reward procedures. Our training algorithm accomplishes a temporal analysis that uses both short-term and long-term features and, in experiments with real-world data, delivers good results for products with different consumption behaviors. Due to the many possibilities in training strategies, linking with other training techniques such as reinforcement learning and genetic algorithms is foreseen.

REFERENCES

[1] A. Krizhevsky, I. Sutskever, and G. E. Hinton. "ImageNet classification with deep convolutional neural networks". Advances in Neural Information Processing Systems, pages 1–9, 2012. (object recognition)
[2] O. Abdel-Hamid and A. Mohamed. "Applying convolutional neural network concepts to hybrid NN-HMM model for speech recognition". IEEE International Conference on Acoustics, Speech, and Signal Processing, 2012. (speech recognition)
[3] R. Collobert and J. Weston. "A unified architecture for natural language processing: Deep neural networks with multitask learning". Proceedings of the 25th International Conference on Machine Learning, 2008. (natural language processing)
[4] H. P. Martinez. "Learning deep physiological models of affect". IEEE Computational Intelligence Magazine, (April):20–33, 2013. (physiological affect modelling)
[5] A. F. Atiya, N. El Gayar, and H. El-Shishiny. "An empirical comparison of machine learning models for time series forecasting". Econometric Reviews, 29(5):594–621, 2010.
[6] S. Prasad and P. Prasad. "Deep recurrent neural networks for time series prediction".
[7] X. Ding, Y. Zhang, T. Liu, and J. Duan. "Deep learning for event-driven stock prediction".
[8] M. Dalto. "Deep neural networks for time series prediction with applications in ultra-short-term wind forecasting". 2015 IEEE International Conference on Industrial Technology (ICIT). University of Zagreb, Faculty of Electrical Engineering and Computing.
[9] C. Deep Prakash et al. "Data analytics based Deep Mayo predictor for IPL-9". International Journal of Computer Applications (0975–8887), 152(6), October 2016.
[10] Y. Bengio, A. Courville, and P. Vincent. "Representation learning: A review and new perspectives". IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013.
[11] Y. Bengio. "Learning deep architectures for AI". Foundations and Trends® in Machine Learning, 2(1):1–127, 2009.
[12] O. Chang, P. Constante, A. Gordon, and M. Singaña. "A novel deep neural network that uses space-time features for tracking and recognizing a moving object". Journal of Artificial Intelligence and Soft Computing Research, Poland, 2017. (in press)
[13] O. Chang, P. Constante, A. Gordon, M. Singaña, and F. Acuna. "A deep architecture to visually analyze Pap cells". IEEE 2nd Colombian Conference on Automatic Control (CCAC), Oct. 2015. DOI: 10.1109/CCAC.2015.7345210
[14] O. Chang. "A bio-inspired robot with visual perception of affordances". In Computer Vision – ECCV 2014 Workshops, vol. 8926, Lecture Notes in Computer Science, pp. 420–426, Springer International Publishing, 2015. http://link.springer.com/chapter/10.1007
[15] MathWorks. "Improve Neural Network Generalization and Avoid Overfitting". https://www.mathworks.com/help/nnet/ug/improve-neural-network-generalization-and-avoid-overfitting.html, 2017.