
Available online at www.sciencedirect.com

ScienceDirect
Procedia Engineering 186 (2017) 537 – 543

XVIII International Conference on Water Distribution Systems Analysis, WDSA2016

Support vector machines in urban water demand forecasting using phase space reconstruction
Sina Shabani(a), Peyman Yousefi(a), and Gholamreza Naser(a,*)
(a) Okanagan School of Engineering, The University of British Columbia, Canada

Abstract

The highly complex dynamics of water distribution systems (WDS) have led researchers to seek more sophisticated
statistical approaches to urban water demand forecasting. Given the serious threat to major water reserves and ongoing droughts,
water authorities are concerned with long-term analysis of water demand to deal with the uncertain future of this dynamic
infrastructure. Researchers have tried a wide range of modelling techniques in search of an accurate model. However, applications
of machine learning techniques are yet to be explored in detail. This research proposes a support vector machine (SVM) model
with a polynomial kernel function to forecast the monthly water demand of the City of Kelowna district (CKD), Canada. The prime
objective of this research is to assess the use of phase space reconstruction prior to the design of the models' input variable
combinations. Results of this study showed that the optimum lag time of the input variables can significantly improve the
performance of SVM models.

© 2016 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license
(http://creativecommons.org/licenses/by-nc-nd/4.0/).
Peer-review under responsibility of the organizing committee of the XVIII International Conference on Water Distribution Systems

Keywords: Water demand forecasting; phase space reconstruction; average mutual information; lag time; support vector machine.

1. Introduction

The scientific and technological awareness of engineers has been constantly improving, both practically and
theoretically. Like other engineering disciplines, water resources engineering is coupled with new scientific concepts
using data mining, management, computer programs, and artificial intelligence. Potable water reserves are
under serious threat from water scarcity, which concerns many countries in the world. Therefore, forecasting

* Corresponding author. Tel.: +1-250-807-8464; fax: +1-250-807-9850.


E-mail address: bahman.naser@ubc.ca

1877-7058 © 2016 The Authors. Published by Elsevier Ltd.
doi:10.1016/j.proeng.2017.03.267

municipal water demand is essential to all water utilities for water distribution system (WDS) planning, design,
operation, and asset management. With the current water scarcity, robust and accurate demand forecasting models
are needed to give WDS operators meaningful data for better judgment calls in design and operation.
Accurate predictions of water demand provide the water sector with tools for efficient management of reservoirs
and water resources, supply of water, and design of hydraulic structures. Researchers have proposed different
techniques for predicting water demand. Given the complexity of water demand analysis and the large set
of influential determinants, researchers have employed intelligent techniques to obtain more accurate models. These
emerging artificial intelligence techniques are receiving more attention than conventional parametric and stochastic
models. Conventional methods of demand forecasting have been time series prediction and linear regression [1-3],
which do not account for nonlinearity in the input data. Therefore, newer techniques such as
artificial neural networks (ANN) [4-6] and nonlinear regression [7] have recently been proposed, which can
outperform traditional linear models. The support vector machine (SVM) was first proposed as a classification method
in 1995 [8] and was later modified into a regression technique. Using kernel functions, SVM regression can
account for nonlinearity in the system. SVMs have recently been used in predictive models [9]; however, only a handful
of studies have deployed this technique in water demand forecasting [10, 11]. Therefore, there is a need for
further investigation of these models. This research is the first to apply SVMs to urban water demand
forecasting together with pre-processed water demand and climatic information. Time series of input data are
transformed using phase space reconstruction, which allows different lag times to be explored in the final output of
the model.

2. Methodology

The main purpose of this approach was to determine the impact of lag time on support vector machines using
phase space reconstruction. Average mutual information (AMI) was chosen to determine the optimum lag time. This
technique was selected over the auto-correlation function (ACF) and the correlation integral (CI), as these alternatives
require large data sets or fail to capture the nonlinearity of the models [12].

2.1. Phase Space Reconstruction: proper lag time

Equation (1) can be used to determine the appropriate lag time for a time series. The
approach uses the joint probability of sequential time series that succeed one another by an increment (equal to
1 unit of the lag time), together with the product of their marginal probabilities. Similar to Shannon's entropy,
this technique gives a good estimate of how entropy levels can change the
dynamics of the time series deployed in forecasting models. An optimum lag time is used to make sure that sufficient
information is added to the independent time series, which can magnify the behavior of these time series in
the desired phase space. The first local minimum of each curve in Figure 1 is taken as a fair estimate of the optimum
lag time. In this case, the optimum lag time was selected as 1 month for precipitation, whereas temperature
and water demand showed an optimum lag time of 3 months.

$$I_\tau = \sum_{i=1}^{n} P(X_i, X_{i+\tau}) \cdot \log_2 \left[ \frac{P(X_i, X_{i+\tau})}{P(X_i) \cdot P(X_{i+\tau})} \right] \qquad (1)$$

[Figure 1: average mutual information (AMI, vertical axis, 0 to 3.5) against lag time in months (horizontal axis, 0 to 6) for water demand, precipitation, and temperature.]

Fig. 1. Average mutual information.
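
As a rough illustration of equation (1) and the first-local-minimum rule, the sketch below estimates AMI from a simple equal-width histogram. The function names, the number of bins, and the binning scheme are assumptions made for illustration; the paper does not state how the probabilities were estimated.

```python
import math
from collections import Counter

def average_mutual_information(series, lag, bins=8):
    """Histogram estimate of equation (1) for one lag, in samples (assumed binning)."""
    if lag < 0 or lag >= len(series):
        raise ValueError("lag must be between 0 and len(series) - 1")
    lo, hi = min(series), max(series)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for a constant series

    def bin_of(value):
        # Clamp the maximum value into the last bin.
        return min(int((value - lo) / width), bins - 1)

    x = [bin_of(v) for v in series[:len(series) - lag]]  # X_i
    y = [bin_of(v) for v in series[lag:]]                # X_{i+tau}
    n = len(x)
    joint, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    ami = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        ami += p_ab * math.log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return ami

def first_local_minimum(values):
    """values[k] is the AMI at lag k (starting at lag 0); return the first k
    that is lower than its left neighbour and not higher than its right one."""
    for k in range(1, len(values) - 1):
        if values[k] < values[k - 1] and values[k] <= values[k + 1]:
            return k
    return None
```

A possible use is `first_local_minimum([average_mutual_information(demand, k) for k in range(7)])`, where `demand` is a hypothetical list of monthly values; lags 0 to 6 mirror the range plotted in Figure 1.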

2.2. Model Design

Phase space reconstruction improved the gene expression programming (GEP) models proposed in our
previous work [13]. However, the impact of feeding models explicitly with individual optimum lag times remained
to be explored. Therefore, this research considered 4 different design combinations, as shown in Table 1. The first
model, labelled τD opt, considers only the optimum lag time of demand (3 months). It is followed by τT opt and τP opt,
which consider the individual optimum lag times of temperature (3 months) and precipitation (1 month), respectively.
Finally, the last design combination, which is the main focus of this study, considers all lag times from 1 month up to
the optimum lag time of the demand. The aim is to investigate which design combination results in the best model performance.

Table 1. Design combinations

Model ID*    Input variables
τD opt       D_{t-3}, T_{t-1}, P_{t-1}
τT opt       D_{t-1}, T_{t-3}, P_{t-1}
τP opt       D_{t-1}, T_{t-1}, P_{t-1}
τ1,2,3       D_{t-1}, D_{t-2}, D_{t-3}, T_{t-1}, T_{t-2}, T_{t-3}, P_{t-1}, P_{t-2}, P_{t-3}

* t refers to the current month; τ represents the lag; D is demand in ML;
T is temperature in °C; P is total precipitation in mm.
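
The design combinations of Table 1 can be sketched as lagged input rows built from the three monthly series. The helper name `lagged_design`, the dictionary keys, and the choice of the current-month demand D_t as the target are assumptions for illustration; the paper lists only the input variables.

```python
def lagged_design(demand, temperature, precipitation, lags):
    """Build (inputs, targets) for one design combination.

    lags maps "D", "T", "P" to the lags (in months) to include,
    e.g. {"D": [3], "T": [1], "P": [1]}. The target is assumed to be D_t.
    """
    series = {"D": demand, "T": temperature, "P": precipitation}
    max_lag = max(lag for values in lags.values() for lag in values)
    inputs, targets = [], []
    # The first max_lag months have no complete history and are skipped.
    for t in range(max_lag, len(demand)):
        row = [series[name][t - lag]
               for name in ("D", "T", "P")
               for lag in lags[name]]
        inputs.append(row)
        targets.append(demand[t])
    return inputs, targets

# The four combinations of Table 1 (identifiers are illustrative).
DESIGNS = {
    "tau_D_opt": {"D": [3], "T": [1], "P": [1]},
    "tau_T_opt": {"D": [1], "T": [3], "P": [1]},
    "tau_P_opt": {"D": [1], "T": [1], "P": [1]},
    "tau_1_2_3": {"D": [1, 2, 3], "T": [1, 2, 3], "P": [1, 2, 3]},
}
```

Each row keeps the column order D, T, P, so the last combination yields nine inputs per month.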

2.3. Support Vector Machine (SVM)

SVM was introduced as a classification method in 1995 [8]. The technique was later extended to regression, and kernel
functions add another dimension to it by accounting for nonlinearity. The explanatory
variables, as time series data, are the input vectors, which play a major role as supports of the training models. Suppose
N samples of the population are given by $\{X_k, Y_k\}_{k=1}^{N}$ with $X \in \mathbb{R}^m$ and $Y \in \mathbb{R}$; then
equation (2) can serve as a regression function:

$$f(X) = W^{T} \phi(X) + b \qquad (2)$$

In this function, φ represents the mapping, defined through the kernel function, that transforms the input vectors into
a higher-dimensional space. X is an input vector with m components, Y is its response output variable, W is a weight
vector, and b represents a bias. Figure 2 illustrates the overall structure of a typical SVM model. A wide range of kernel
functions have been used in the literature, according to the nature of the explanatory variables. The most popular
functions are linear, radial basis, and polynomial. Our previous work showed that the polynomial kernel function
outperformed the other approaches mentioned [13]. Kernel functions allow nonlinearity in the representation of the input
space. Cortes and Vapnik introduced the optimization in equation (3), where $\xi_k$ and $\xi_k^*$ are slack variables that
account for training errors. The constraints of this optimization are given in equation (4) [8]:

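$$\min_{W,\, b,\, \xi,\, \xi^*} \; \frac{1}{2} \lVert W \rVert^2 + C \sum_{k=1}^{N} (\xi_k + \xi_k^*) \qquad (3)$$

$$\text{subject to} \quad \begin{cases} Y_k - W^{T} \phi(X_k) - b \le \varepsilon + \xi_k \\ W^{T} \phi(X_k) + b - Y_k \le \varepsilon + \xi_k^* \\ \xi_k,\ \xi_k^* \ge 0 \end{cases} \qquad k = 1, 2, \dots, N \qquad (4)$$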

[Figure 2: the inputs X1, ..., Xm (support vectors) feed the polynomial kernel functions K(x, x1), K(x, x2), ..., K(x, xm), which, together with the bias (b), give the output.]

Fig. 2. Structure of SVM model
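
To make the pieces of the SVM regression concrete, the sketch below evaluates a polynomial kernel and the standard ε-insensitive objective, in which the two slack terms are summed and penalised by C. The kernel parameters (`degree`, `gamma`, `coef0`) and all function names are illustrative assumptions; the paper does not report the values it used, and this is not a solver.

```python
def polynomial_kernel(x, z, degree=2, gamma=1.0, coef0=1.0):
    """K(x, z) = (gamma * <x, z> + coef0) ** degree (parameter values assumed)."""
    dot = sum(a * b for a, b in zip(x, z))
    return (gamma * dot + coef0) ** degree

def slack_variables(y_true, y_pred, epsilon):
    """Smallest xi_k and xi_k* that satisfy the constraints for given predictions."""
    xi = [max(0.0, y - f - epsilon) for y, f in zip(y_true, y_pred)]
    xi_star = [max(0.0, f - y - epsilon) for y, f in zip(y_true, y_pred)]
    return xi, xi_star

def svr_objective(weights, y_true, y_pred, epsilon, C):
    """0.5 * ||W||^2 + C * sum(xi_k + xi_k*) for a given weight vector."""
    xi, xi_star = slack_variables(y_true, y_pred, epsilon)
    return 0.5 * sum(w * w for w in weights) + C * (sum(xi) + sum(xi_star))
```

Predictions that fall within ε of the observation contribute no slack, so only the larger errors are penalised.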



3. Study Area and Data Collection

Average monthly urban water demand, in ML, in the City of Kelowna district (CKD) from 1996 to 2010 was used.
Of these data, 80% (140 records) were used for training the SVM models and the remaining 20% (36 records) for testing
them. Temperature and rainfall data were also used as average monthly recorded values.
This climatic information is available online from Environment Canada (http://kelowna.weatherstats.ca/).
Environment Canada reports such information from weather station A, located at the Kelowna airport (Latitude:
49° 57' 13" N; Longitude: 119° 22' 29" W).
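
A split of this kind can be sketched as below, assuming the records are kept in time order so that the testing months follow the training months; the paper does not say whether the data were shuffled, and the function name is illustrative.

```python
def chronological_split(samples, train_fraction=0.8):
    """Split time-ordered samples into (train, test) without shuffling."""
    cut = int(len(samples) * train_fraction)  # round down to a whole record
    return samples[:cut], samples[cut:]
```

With 176 records, rounding down gives the 140 training and 36 testing records quoted above.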

4. Results and Discussion

As mentioned earlier, 4 design combinations were used to assess the impact of lag time in the inputs of
the demand forecasting models. Table 2 compares the performance of the SVM models for each design.
The models are compared based on the coefficient of determination (R2) and the root mean square error (RMSE). Since
R2 alone can be a misleading measure of model performance [14], RMSE is also used to give a more solid basis for
comparing these models. R2 ranges from 0 to 1, with R2 = 1 indicating a model that fully replicates the field data.
Its complementary index, RMSE, gives an estimate of prediction error, with values close to 0
being the best possible outcome. Results showed that using the optimum lag time of demand (τD opt) can lead to promising
results. Using demand's lag time explicitly could slightly outperform the second model, which uses temperature's
optimum lag time (τT opt). However, explicit use of the optimum lag time for precipitation (τP opt) performed
markedly worse than the other two design combinations. Finally, the model that
considered all lag times of 1-3 months for the explanatory variables performed best, which agrees
with the outcome of previous research [13]. Figure 3 gives a visual comparison of the best model in this
research with actual water demand. It shows that SVM models are highly sensitive to the application of phase space
reconstruction.
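
The two indices can be computed as in the sketch below. R2 is taken here as the squared Pearson correlation between observed and predicted values, which is an assumption consistent with the 0-1 range stated above; the paper does not give its formula.

```python
import math

def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def r_squared(observed, predicted):
    """Squared Pearson correlation (assumed definition of R2)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("R2 is undefined for a constant series")
    return cov * cov / (var_o * var_p)
```

Under this definition, predictions that are exactly twice the observations still give R2 = 1 while RMSE is clearly non-zero, which illustrates why the two indices are reported together.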

Table 2. Performance of SVM model

Model ID     Training             Testing
             R2       RMSE        R2       RMSE
τD opt       0.9258   0.3778      0.9434   0.3551
τT opt       0.946    0.324       0.9521   0.314
τP opt       0.8476   0.5301      0.7374   0.6807
τ1,2,3       0.9582   0.2863      0.9662   0.2615
[Figure 3: (top) actual and predicted water demand (ML, 0 to 3000) plotted against sample number 1 to 36; (bottom) scatter plot of actual demand (ML) against predicted demand (ML), R² = 0.9334.]

Fig. 3. Actual demand vs. predicted demand by model τ1,2,3



5. Conclusion

Time series of weather information and water demand are considered as input variables of water demand
forecasting models. Presumably, a lag time can be associated with each such time series that better explains its
natural cycles or behavior. This research used AMI, a well-known technique, to determine the appropriate lag time
for water demand, temperature, and rainfall as explanatory variables in the developed models. Following our
previous work [13], this work first focused on the optimum lag time of each of these variables separately, and then on a
model which considered all lags up to the optimum value. Results showed that a model which uses the extra information
as independent time series can outperform models which target the individual optimum lag time of these variables.
Support vector machines showed a high level of sensitivity to phase space reconstruction of the input variables. In
general, this research highlights the importance of the design combination of inputs fed into demand forecasting
models, using the concept of optimum lag time determined by average mutual information.

Acknowledgements

The authors received financial support from the Natural Sciences and Engineering Research Council (NSERC) of
Canada. The Okanagan Basin Water Board and the City of Kelowna are thanked for providing water consumption
data.

References

[1] L. Brekke, M. Larsen, M. Ausburn, L. Takaichi, Suburban water demand modeling using stepwise regression, Journal of American Water
Works Association, 94 (10) (2002) 65–75.
[2] A. Polebitski, R. Palmer, P. Waddell, Evaluating water demands under climate change and transitions in the urban environment. Journal of
Water Resources Planning and Management, 137(3) (2010) 249-257.
[3] S.J. Lee, E.A. Wentz, P. Gober, Space-time forecasting using soft geostatistics: A case study in forecasting municipal water demand for
Phoenix, Arizona, Stochastic Environmental Research and Risk Assessment. 24(2) (2010) 283–295.
[4] L. Jentgen, H. Kiddler, R. Hill, S. Conrad, Energy management strategies use short-term water consumption forecasting to minimize cost of
pumping operations, Journal of American Water Works Association. 99(6) (2007) 86-94.
[5] M. Ghiassi, D. Zimbra, H. Saidane, Urban water demand forecasting with a dynamic artificial neural network model, Journal of Water
Resources Planning and Management. 134(2) (2008) 138–146.
[6] J.F. Adamowski, Peak daily water demand forecast modeling using artificial neural networks, Journal of Water Resources Planning and
Management. 134(2) (2008) 119–128.
[7] J.F. Adamowski, H. Fung Chan, S.O. Prasher, B. Ozga Zielinski, A. Sliusarieva, Comparison of multiple linear and nonlinear regression,
autoregressive integrated moving average, artificial neural network, and wavelet artificial neural network methods for urban water demand
forecasting in Montreal, Canada, Water Resources Research. 48(1) (2012).
[8] V. Vapnik, C. Cortes, Support Vector Networks, Machine Learning. 20(1995) 1-25.

[9] M.A. Mohandes, T.O. Halawani, S. Rehman, A.A. Hussain, Support vector machines for wind speed prediction, Renewable Energy. 29(6)
(2004) 939-947.
[10] M. Herrera, L. Torgo, J. Izquierdo, R. Pérez-García, Predictive models for forecasting hourly urban water demand, Journal of hydrology.
387(1)(2010) 141-150.
[11] B.M. Brentan, E. Luvizotto, M. Herrera, J. Izquierdo, R. Pérez-García, Hybrid regression model for near real-time urban water demand
forecasting, Journal of Computational and Applied Mathematics. 2016.
[12] R. Khatibi, B. Sivakumar, M.A. Ghorbani, O. Kisi, K. Kocak, D.F. Zadeh, Investigating chaos in river stage and discharge time series, J. of
Hydrology. 414-415(2012) 108-117.
[13] S. Shabani, P.Yousefi, J.F. Adamowski, Gh. Naser, Intelligent Soft Computing Models in Water Demand forecasting, Intech- Water Stress.
(2016).
[14] D.R. Legates, G.J. McCabe, Evaluating the use of goodness of fit measures in hydrologic and hydroclimatic model validation, Water
Resources Research. 35 (1) (1999) 233–241.
