
Ecological Engineering 100 (2017) 63–72


Application of artificial neural networks to the forecasting of dissolved oxygen content in the Hungarian section of the river Danube

Anita Csábrági a,∗, Sándor Molnár a, Péter Tanos a, József Kovács b

a Institute of Mathematics and Informatics, Szent István University, Páter K. u. 1, Gödöllő, H-2103, Hungary
b Department of Physical and Applied Geology, Eötvös Loránd University, Pázmány Péter sétány 1/C, Budapest, H-1117, Hungary

ARTICLE INFO

Article history:
Received 8 August 2016
Received in revised form 8 December 2016
Accepted 16 December 2016
Keywords:
Dissolved oxygen forecasting
General regression neural networks
Multilayer perceptron neural networks
Multivariate linear regression
Radial basis function neural network

ABSTRACT
Dissolved oxygen content is one of the most important parameters in the characterization of surface water conditions. Our goal is to forecast this parameter in Central Europe's most important river from other, easily measurable water quality parameters (pH, temperature, electrical conductivity and runoff), using linear and nonlinear models. We adapt four models for forecasting dissolved oxygen concentration, namely a Multivariate Linear Regression model, a Multilayer Perceptron Neural Network, a Radial Basis Function Neural Network and a General Regression Neural Network model. Data are available for Hungarian sampling locations on the River Danube (Mohács, Fajsz and Győrzámoly) for the period 1998–2003. The analysis was performed with four alternative combinations: the models were formulated using data from the period 1998–2002 and a dissolved oxygen forecast was made for 2003. Evaluating model performance with various statistical measures (root mean square error, mean absolute error, coefficient of determination, and Willmott's index of agreement), we found that nonlinear models gave better results than linear models. In two cases the General Regression Neural Network provided the best performance, and in the two other cases the Radial Basis Function Neural Network gave the best results. A further goal was to conduct a sensitivity analysis in order to identify the parameter with the highest influence on the performance of the created models. Sensitivity analysis, performed for the combination of all three sampling locations (4th combination), showed that for all three neural network models pH has the most important role in estimating dissolved oxygen content.

© 2016 Elsevier B.V. All rights reserved.

1. Introduction

To have a proper understanding of surface waters it is vital to know the water quality parameters provided by the data of monitoring networks. The operation of a monitoring network can be improved considering various criteria (e.g. cost efficiency), or can be facilitated by estimating certain parameters from other parameters.

Abbreviations: ANN, Artificial Neural Network; CA, CB, CC, CD, Combination A, B, C, D; DO, dissolved oxygen; EC, electrical conductivity; GRNN, General Regression Neural Network; hydro PP, hydro power plant; HNPP, Hungarian Nuclear Power Plant; IA, Willmott's index of agreement; MAE, mean absolute error; MLPNN, Multilayer Perceptron Neural Network; MLR, Multivariate Linear Regression; R², coefficient of determination; RBFNN, Radial Basis Function Neural Network; RF, runoff; rkm, river kilometres; RMSE, root mean square error; WT, water temperature.
∗ Corresponding author.
E-mail addresses: csabragi.anita@gmail.com, csabragi.anita@gek.szie.hu (A. Csábrági), molnar.sandor@gek.szie.hu (S. Molnár), tanospeter@gmail.com (P. Tanos), kevesolt@gmail.com (J. Kovács).
http://dx.doi.org/10.1016/j.ecoleng.2016.12.027
0925-8574/© 2016 Elsevier B.V. All rights reserved.

The current article examines several parameters which can be used for the estimation of dissolved oxygen.
Dissolved oxygen is a very significant parameter in the characterisation of the condition of aquatic ecosystems; thus, forecasting its concentration using easily available and measurable parameters may be considered an important scientific advantage. The sources of dissolved oxygen (DO) in a water body include re-aeration from the atmosphere, photosynthetic oxygen production, and DO loading. The sinks include the oxidation of carbonaceous and nitrogenous material, the oxygen demand of the sediment, and the respiration of aquatic plants (Kuo et al., 2007). The concentration of DO reflects the equilibrium, or the lack of one, between oxygen-producing and oxygen-consuming processes (Ahmed, 2014). Thus, the preservation of DO in water bodies is one of the primary concerns for water resource managers.

The estimation and forecasting of the major parameters of surface waters is typically performed using various types of artificial intelligence based techniques relying on machine learning. This requires training, validation (the latter can be omitted if data is

scarce) and test sets. The creation of these sets can be undertaken in various ways. Most mainstream sources suggest random creation of the respective sets (1); in this case the term estimation or modelling should be used (Ahmed, 2014; Antanasijevic et al., 2014; Basant et al., 2010; Emamgholizadeh et al., 2014; Heddam, 2014; Rankovic et al., 2012, 2010; Talib and Amat, 2012; Wen et al., 2013). Other sources divide sets according to sampling points (2), assigning the majority of sampling points to the training set, and a smaller proportion of sampling points to the test set; in this case the correct terminology is spatial forecasting (Dogan et al., 2009; He et al., 2011a; Palani et al., 2008). Finally, some sources divide the temporal interval of measurement by assigning multiple initial years to the training set and a couple of final years to the test set (3); in this case temporal forecasting is performed (Antanasijevic et al., 2013; Ay and Kisi, 2012; Csábrági et al., 2015; He et al., 2011b; Singh et al., 2009). This article presents examples of the 3rd case, temporal forecasting.
In the following, some results are presented from temporal forecasting studies. Antanasijevic et al. (2013) compared three Artificial Neural Networks (ANNs), namely, the Multilayer Perceptron Neural Network (MLPNN), the General Regression Neural Network (GRNN) and the Recurrent Neural Network, with Multivariate Linear Regression (MLR) for the forecasting of DO in the River Danube at a single location in Serbia, Bezdan. The data from the years 2004–2008 were used as a training data set, and the data from 2009 were applied as the test data set. The authors found that the Recurrent Neural Network performed much better than the others. Singh et al. (2009) developed two MLPNN models to forecast the biological oxygen demand and DO concentration in the River Gomti, India, with the help of 11 input water quality parameters. The entire water quality data set spanning 10 years was divided into three sub-sets; the training set contained data from the first 6 years, the validation set comprised data of the next 2 years, and the test data set consisted of the data from the remaining final 2 years. The authors established that the MLPNN was a powerful predictive tool in the computation of water quality parameters. Ay and Kisi (2012) developed and compared two ANNs (the MLPNN and RBFNN) and MLR for the forecasting of DO concentration by using four parameters (water temperature (WT), pH, electrical conductivity (EC) and runoff (RF)) as input in Foundation Creek, Colorado, USA. The whole data set was collected from upstream and downstream USGS stations, and the training, validation and test data sets were divided by date of the experimental data set. Comparison of the results showed that the Radial Basis Function Neural Network (RBFNN) model performed better than the MLPNN and MLR models, and that the RBFNN model was quite effective without the runoff parameter in DO concentration forecasting. Finally, the downstream DO concentration was successfully forecasted using only water temperature data of the upstream station. He et al. (2011b) applied MLPNN and MLR to forecast the daily DO minimum and the daily DO variation in the Bow River, Canada. The water quality parameters of 2006–2007, recorded at 15 or 30 min intervals at both sampling sites, were used for the training set, and the test set contained the data from 2008. The DO minimum was forecast using water temperature and runoff, and the input parameters for the estimation of daily DO variance were radiation, water temperature and runoff. In both cases the MLPNN model outperformed the linear model.
Our main goal is to aid water quality management using estimation procedures which optimise the operation of monitoring by ensuring cost efficiency and representativity. This may be attained by providing forecasts of DO concentration, which is one of the most important hydrochemical parameters, using easily measurable physical and chemical parameters. We use the mainstream approaches to reach our objective, i.e. MLR and the various ANN methods, and we provide (1) an efficiency ranking of these methods for different combinations of sampling locations (see the details in Section 2.1). (2) We examine whether there is a difference in the efficiency of the respective estimations of the alternative combinations. For reasons of economy, the reduction of the number of parameters used could be considered; in this case, the models discussed can effectively support decision-making. (3) Sensitivity analysis was performed to identify those parameters with the highest impact on estimation for all three non-linear models, and the results were evaluated.
2. Material and methods

2.1. Study area

The River Danube is a very important ecological and economic factor in the region. Thus, the conservation and improvement of its water quality is of primary importance to the future of the region. The Danube is the second longest river in Europe, with a length of 2817 km from the Black Forest (Germany) to the Black Sea (Romania). The section in Hungary is 417 km long, with an average RF of 2000 m³/s. The construction of the Gabčíkovo hydro power plant (hydro PP) on the Slovakian-Hungarian border significantly altered the Danube, with around 80% of the discharge being rerouted to the Slovakian side and an RF of only 400 m³/s remaining on the Hungarian side. The river returns to its original riverbed at 1806 river kilometres (rkm) (Kovács et al., 2015b). An additional noteworthy anthropogenic impact is that of the Hungarian Nuclear Power Plant (HNPP) (Paks, at 1526 rkm), whose effluent coolant water reaches the Danube; the effluent RF of the HNPP also has a direct effect on primary greenhouse gas emissions from the electricity grid (Molnár, 2002).

There are 12 sampling sites in the section of the River Danube flowing through Hungary (Fig. 1). The Mohács station (D11, 1451.7 rkm) was chosen as an undisturbed representative location, because this sampling site is not disturbed by tributaries or anthropogenic installations. The sampling location D11 belongs to Section type 6 according to the results of Sommerhäuser et al. (2003) and Liška et al. (2015). Two further locations, Győrzámoly (D2, 1806.2 rkm) and Fajsz (D9, 1507.6 rkm), were considered noisy locations and were classified as Section type 4 and Section type 5, respectively. The D2 sampling location is the first after the sub-channel of the Gabčíkovo hydro PP rejoins the Danube, while D9 is the first sampling location after the HNPP. The four combinations used for the analysis were as follows: the first three considered the individual data from Mohács, Fajsz and Győrzámoly, denoted respectively as CA, CB and CC. Finally, the fourth combination simultaneously assessed the data from all three sampling locations (CD).
2.2. Water quality data set

The complete river water quality data set was divided into two subsets. Data from 2003 were used as the test data set (26 data samples at each sampling location), and data from 1998 to 2002 were used as the training set (128 data samples at D11, 125 data samples at D9 and 130 data samples at D2). The same training and test sets were used in the application of each model.

The models required input parameters (in our case, measured pH, RF (m³ s⁻¹), WT (°C), EC (µS cm⁻¹)) to generate the output (forecasted DO (mg L⁻¹)). Input and target data (measured DO) were entered into the applied models after the z-score normalization technique had been applied (normalizing so that the inputs and targets have zero mean and unit standard deviation). The target parameter corresponding to the input parameters belonged to the same water sample, measured at the same time and location.
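As an illustration, a minimal Python sketch of this z-score step is given below; the array contents and variable names are illustrative placeholders, not values from the study, and forecasts made in normalized space are mapped back with the stored mean and standard deviation.

```python
import numpy as np

def zscore_fit(a):
    """Per-column mean and standard deviation of a 2-D array."""
    return a.mean(axis=0), a.std(axis=0)

def zscore_apply(a, mean, std):
    """Scale columns to zero mean and unit standard deviation."""
    return (a - mean) / std

# Hypothetical arrays: rows are water samples, columns are pH, RF, WT, EC.
X_train = np.array([[8.2, 2400.0, 12.5, 380.0],
                    [7.9,  950.0,  3.1, 300.0],
                    [8.4, 3100.0, 21.0, 410.0]])
y_train = np.array([11.3, 12.8, 9.6])        # measured DO, mg/L

x_mean, x_std = zscore_fit(X_train)
y_mean, y_std = y_train.mean(), y_train.std()

X_scaled = zscore_apply(X_train, x_mean, x_std)
y_scaled = (y_train - y_mean) / y_std
# A forecast z made in normalized space is mapped back as: do = z * y_std + y_mean
```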
The descriptive statistics of the available data (Table 1) highlighted the fact that the parameters with the highest variance are RF and WT, the most stable parameter was pH, while EC and DO show similar degrees of variance.

Fig. 1. The Hungarian section of the River Danube.

Table 1
Descriptive statistics of input and output data at the three sampling locations (CV – coefficient of variation, SD – standard deviation).

Station  Parameter   Max      Min     Mean      SD       CV
D11      RF          5400     910     2363.19   979.18   0.41
         WT          25.9     0.2     12.84     7.42     0.58
         pH          8.75     7.8     8.24      0.22     0.03
         EC          530      272     377.62    58.98    0.16
         DO          17.7     6.8     11.04     1.80     0.16
D9       RF          5310     920     2346.74   939.77   0.40
         WT          25.2     0.2     12.49     7.50     0.60
         pH          8.85     7.75    8.26      0.26     0.03
         EC          525      256     371.30    57.31    0.15
         DO          15.5     7       11.20     1.60     0.14
D2       RF          5130     618     1940.58   770.25   0.40
         WT          22.8     0       11.59     6.48     0.56
         pH          8.9      7.02    8.13      0.27     0.03
         EC          560      294     370.56    48.71    0.13
         DO          14.03    5.76    9.82      1.54     0.16

It can also be seen that there is little difference between sampling locations D11 and D9 (see e.g. Kovács et al., 2015a). On the other hand, D2 differs significantly from the other sampling locations, which can mainly be explained by the large distance (more than 298 rkm) between D2 and D9. The difference is well shown by the average values of runoff and temperature.

The methods presented in sub-chapters 2.3–2.6 were used in all combinations; the learning set consisted of data from 1998 to 2002, and the test set of data from 2003. Finally, sensitivity analysis is presented for CD.

2.3. Multivariate linear regression (MLR)

MLR is used to estimate the linear association between the dependent and one or more independent parameters. MLR is based on least squares (Draper and Smith, 1981); it expresses the value of the predicted parameter as a linear function of one or more predictor parameters:

y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_i x_i \quad (1)

where x_i is the value of the ith predictor parameter, \beta_0 is the regression constant, and \beta_i is the coefficient of the ith predictor parameter.
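For readers wishing to reproduce this kind of model selection, a hedged sketch using the OLS routine of the Python statsmodels package is given below; the data are synthetic stand-ins, and the column indices for pH and WT are illustrative assumptions, not the study's data layout.

```python
import numpy as np
import statsmodels.api as sm

# Synthetic stand-in data: columns 0..3 play the roles of pH, RF, WT, EC.
rng = np.random.default_rng(0)
X = rng.normal(size=(128, 4))
y = 11.0 + 1.5 * X[:, 0] - 0.4 * X[:, 2] + rng.normal(scale=0.5, size=128)

# Fit Eq. (1) by least squares and test the model and its coefficients.
mlr1 = sm.OLS(y, sm.add_constant(X)).fit()
print(mlr1.f_pvalue)        # overall F-test of the model
print(mlr1.pvalues)         # per-coefficient significance

# Predictors with p-values above 0.05 (RF and EC in the paper's CA case)
# are dropped and the reduced model is refitted, analogous to MLR2.
keep = [0, 2]               # illustrative indices for pH and WT
mlr2 = sm.OLS(y, sm.add_constant(X[:, keep])).fit()
```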
2.4. Multilayer perceptron neural networks (MLPNN)

ANNs are basically parallel computing systems similar to biological neural networks. Among the various types of ANN, the MLPNN structure is the most commonly used and is a well-researched basic ANN architecture. The MLPNN generally has three layers: input, output and one or more hidden layer(s), as shown in Fig. 2. Each layer consists of one or more basic element(s) called a neuron or a node (or a processing unit). Nodes are connected to each other by links, and these synapses are characterised by a weight factor denoting the strength of the connection between two nodes (w_ij). Each node in the input and inner layers receives input values, processes them, and passes them to the next layer. This process is conducted using weights (Dogan et al., 2009), meaning that the hidden layer sums the weighted inputs and its own bias values (b_k) and uses its own transfer function to create an output value (y_k). Typical transfer functions are the linear, sigmoid or hyperbolic tangent functions (Haykin, 1999).

Fig. 2. A schematic representation of MLPNN.

MLPNNs are trained on the input data using an error back-propagation algorithm (Antanasijevic et al., 2013). Back-propagation was proposed by Rumelhart et al. (1986), and it is the most popular algorithm for the training of an MLPNN network (Haykin, 1999). This back-propagation algorithm has two steps. The first step is a forward pass, in which the effect of the input is passed forward through the network to reach the output layer and to calculate the output value (y_k). After the error is computed, a second step starts backward through the network (Emamgholizadeh et al., 2014) to correct the initially assigned weights in such a way as to minimize the error. This represents one complete cycle, in which all data pass through the network, and is known as an epoch. The term feed-forward means that a node connection only exists from a node in the input layer to nodes in the hidden layer or from a node in the hidden layer to nodes in the output layer; nodes within a layer are not interconnected with each other, and there are no lateral or feedback connections. An MLPNN using a BP algorithm is sensitive to the randomly assigned initial connection weights (Kim and Kim, 2008). The initialization of weights and bias values for a layer is conducted using the Nguyen-Widrow method in the MATLAB environment (Pavelka and Procházka, 2004), and these initial values are dissimilar on every single run, so after the training process different predicted values are obtained.

In this study, the Levenberg-Marquardt algorithm is applied to the adjustment of the MLPNN weights (Marquardt, 1963), and the number of epochs was 1000. One hidden layer and a hyperbolic tangent sigmoid transfer function were used between the input and the hidden layers, and a linear transfer function was applied between the hidden and output layers. The Neural Network Toolbox of MATLAB was utilized for every one of the three ANNs.
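The study itself used MATLAB's Neural Network Toolbox; as an approximate open-source analogue, the sketch below configures a comparable one-hidden-layer network in scikit-learn. Note that scikit-learn offers no Levenberg-Marquardt trainer, so the L-BFGS solver stands in for it, and the data here are placeholders.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Placeholder z-scored inputs (pH, RF, WT, EC) and target (DO).
rng = np.random.default_rng(1)
X = rng.normal(size=(128, 4))
y = np.tanh(X[:, 0]) + 0.3 * X[:, 2] + rng.normal(scale=0.1, size=128)

# One hidden layer with a hyperbolic tangent transfer function; the output
# neuron of MLPRegressor is linear, matching the architecture described above.
# scikit-learn has no Levenberg-Marquardt trainer, so L-BFGS stands in for it.
net = MLPRegressor(hidden_layer_sizes=(5,), activation="tanh",
                   solver="lbfgs", max_iter=1000)
net.fit(X, y)
do_forecast = net.predict(X)
```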
2.5. Radial basis function neural networks (RBFNN)

The RBFNN was first introduced into the neural network literature by Broomhead and Lowe (1988) and Poggio and Girosi (1990). The RBFNN is an unsupervised learning neural network and has a feed-forward structure comprising one input layer, a single hidden layer, and one output layer, as shown in Fig. 3. RBFNNs have the advantage of not suffering from local minima, unlike MLPNNs (Haykin, 1999). RBFNNs are also good at modelling nonlinear data and can be trained in one stage rather than using an iterative process, as in the MLPNN, and also learn the given application quickly (Venkatesan and Anitha, 2006). The hidden layer is self-organizing; its parameters depend on the distribution of the inputs, not on the mapping from the input to the output (Ahmed, 2014). The training between the input layer and the hidden layer is performed by defining the weights (the centres of the RBFs) with the help of a special clustering algorithm, such as the k-means algorithm (Kim and Kim, 2008), and estimates the Euclidean distance of the ith input vector (x_i) and the weight vector of the jth hidden node (w_ij), where N is the number of input parameters:

Fig. 3. A schematic representation of RBFNN.

D_j = \sqrt{\sum_{i=1}^{N} (w_{ij} - x_i)^2} \quad (2)

The most popular RBF is the Gaussian kernel function, which is used as an activation function, with \sigma as the smoothing factor or spread:

f(D_j) = \exp\left(-\frac{D_j^2}{2\sigma^2}\right) \quad (3)

The smoothing factor must be given before training the model; it determines the shape of the calculated Gaussian kernel function. The built-in function in MATLAB (newrb) creates a radial basis network one neuron at a time. Neurons are added to the network until the sum-squared error falls beneath an error threshold or a maximum number of neurons has been reached (Demuth and Beale, 2000). A single output of an RBFNN with m hidden layer neurons can be described as

Y = \sum_{j=1}^{m} w_j f(D_j) + b \quad (4)

where w_j is the connecting weight between the hidden layer and the output layer and b is a bias (constant value).
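A compact sketch of this training scheme is given below, assuming k-means centres (via SciPy's kmeans2) and a least-squares fit of the output weights; the paper instead relies on MATLAB's newrb, which adds neurons incrementally, so this is an approximation of the same idea rather than a reimplementation.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def rbf_train(X, y, m, sigma):
    """Fit an RBF network: k-means centres, Gaussian activations per Eqs.
    (2)-(3), and output weights plus bias per Eq. (4) by least squares."""
    centres, _ = kmeans2(X, m, minit="++", seed=0)
    dist = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)  # Eq. (2)
    H = np.exp(-dist**2 / (2.0 * sigma**2))                             # Eq. (3)
    H = np.hstack([H, np.ones((len(X), 1))])     # extra column for the bias b
    w, *_ = np.linalg.lstsq(H, y, rcond=None)
    return centres, w

def rbf_predict(X, centres, w, sigma):
    dist = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
    H = np.exp(-dist**2 / (2.0 * sigma**2))
    return np.hstack([H, np.ones((len(X), 1))]) @ w                     # Eq. (4)
```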
2.6. General regression neural networks (GRNN)

The GRNN was first introduced by Specht (1991) as an alternative to the MLPNN. The GRNN is a modified form of the radial basis function neural network model (Kim and Kim, 2008). The GRNN is a one-pass supervised learning network, and it is a universal approximator for smooth functions. The GRNN is a four-layer feed-forward neural network, as shown in Fig. 4.

Fig. 4. A schematic representation of GRNN, based on Antanasijevic et al. (2014).

The first layer is fully connected to the second. Each input unit in the first layer corresponds to an independent parameter in the model, and the number of pattern neurons is equal to the number of data samples. The training between the input layer and the pattern layer corresponds to the learning between the input and the hidden layer of the RBFNN. The number of neurons in the summation layer can be expressed as No + 1, where No is the number of output neurons (Antanasijevic et al., 2014). Since the model has only one output, each pattern layer unit is connected to the two neurons in the summation layer: the S-summation neuron and the D-summation neuron. The weights between the summation neurons and the pattern neurons are equal to the measured values of the output parameter (target data). The S-summation neuron computes the sum of the weighted outputs of the pattern layer (S), while the D-summation neuron sums the unweighted outputs of the pattern neurons (D):

S=

K


yj f (Dj )

(5)

j=1

D=

R2 =

n


Oi O

i=1

n


Oi O

Pi P

n
2 

i=1

Pi P

(9)

2

(10)

i=1
n


IA = 1

n 


i=1

(Oi Pi )

|Pi O| + |Oi O|

i=1

where n is the number of input samples; and Oi and Pi are the


observed and predicted output values from the ith element, respectively, and O, P denote their respective averages.
3. Results

K


3.1. Prediction using multivariate linear regression on CA


(6)

f (Dj )

j=1

Finally, the output layer merely divides the S-summation neuron by the D-summation neuron (Heddam, 2014).
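Because the GRNN involves no iterative training at all, Eqs. (5) and (6) translate almost directly into code. A minimal sketch follows; the function and variable names are ours, not from the paper.

```python
import numpy as np

def grnn_predict(X_train, y_train, X_new, sigma):
    """GRNN forecast per Eqs. (5)-(6): a Gaussian-weighted average of the
    training targets, with one pattern neuron per training sample."""
    d2 = ((X_new[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    f = np.exp(-d2 / (2.0 * sigma ** 2))     # pattern-layer activations
    S = f @ y_train                          # S-summation neuron, Eq. (5)
    D = f.sum(axis=1)                        # D-summation neuron, Eq. (6)
    return S / D                             # output layer: S divided by D
```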

2.7. Performance criteria


The performance of the applied models can be assessed by several statistical error measures. The root mean square error (RMSE),
mean absolute error (MAE), coefcient of determination (R2 ) and
Willmotts index of agreement (IA) were used to provide an indication of goodness of t between the observed and predicted values.
Expressions of these error parameters are given as follows:


 n
1
2
RMSE = 
(Oi Pi )
n

(7)

i=1

1
|Oi Pi |
n
i=1
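These four measures are straightforward to compute; a small sketch implementing Eqs. (7)–(10) verbatim (function name is ours):

```python
import numpy as np

def scores(O, P):
    """RMSE, MAE, R2 and Willmott's IA exactly as defined in Eqs. (7)-(10)."""
    rmse = np.sqrt(np.mean((O - P) ** 2))                               # Eq. (7)
    mae = np.mean(np.abs(O - P))                                        # Eq. (8)
    Ob, Pb = O.mean(), P.mean()
    r2 = (np.sum((O - Ob) * (P - Pb)) ** 2
          / (np.sum((O - Ob) ** 2) * np.sum((P - Pb) ** 2)))            # Eq. (9)
    ia = 1.0 - (np.sum((O - P) ** 2)
                / np.sum((np.abs(P - Ob) + np.abs(O - Ob)) ** 2))       # Eq. (10)
    return rmse, mae, r2, ia
```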

3. Results

3.1. Prediction using multivariate linear regression on CA

In the first case, the MLR1 model with four predictor parameters for the training data set was significant (the result of the F-test was 8.3E-19). The performance of this model is characterised by the least square error (1.13 mg L⁻¹) and the value of R² (0.52). The p-values of RF and EC indicated that these parameters were not significant in the MLR1 model (p-value higher than 0.05, Table 2), and so another model, denoted MLR2, was used without these parameters.

In the second case, without RF and EC, the MLR2 model was also significant (3.25862E-20). The p-values of the two predictor parameters (pH and WT) were under 0.05, so these parameters were acceptable (Table 2). Eq. (11) describes the relationship between the DO value and the two predictor parameters:

DO = -0.18 \cdot WT + 5.91 \cdot pH - 35.46 \quad (11)

This equation was applied to the test data set; the least square error was 2.03 mg L⁻¹ and the value of R² was 0.4 (Table 3, first row).
3.2. Prediction using multilayer perceptron neural network on CA

As presented in the description of the MLPNN method, initial values are assigned randomly. We examined how this random assignment influences model results using 60 test runs with identical configuration (with an epoch count of 1000 for each run).


Table 2
Coefficients and errors of the MLR model on CA.

              First case MLR1                             Second case MLR2
              Coefficient  Standard error  p-value        Coefficient  Standard error  p-value
(Constant)    -30.251      6.228           4.85717        -35.460      4.746           1.2E-11
RF             0.000       0.000           0.339514       –            –               –
WT            -0.224       0.036           4.78E-09       -0.183       0.018           1.6E-18
pH             5.664       0.651           1.73E-14        5.912       0.591           1.2E-17
EC            -0.006       0.004           0.171113       –            –               –

Table 3
Evaluating model performance on training and test datasets in all four combinations.

               RMSE (mg L⁻¹)       MAE (mg L⁻¹)        R²                  IA
Comb.  Model   training  test      training  test      training  test      training  test
CA     MLR2    1.14      2.03      0.79      1.38      0.51      0.40      0.82      0.72
       MLPNN   0.65      1.57      0.43      1.28      0.85      0.57      0.93      0.77
       RBFNN   0.84      1.65      0.62      1.25      0.74      0.59      0.92      0.78
       GRNN    0.47      1.42      0.27      1.14      0.93      0.72      0.98      0.87
CB     MLR2    1.00      1.94      0.75      1.41      0.49      0.44      0.81      0.76
       MLPNN   0.64      1.72      0.46      1.22      0.80      0.58      0.93      0.79
       RBFNN   0.48      1.62      0.37      1.33      0.88      0.54      0.97      0.75
       GRNN    0.35      1.74      0.23      1.27      0.94      0.55      0.98      0.77
CC     MLR2    1.13      1.57      0.84      1.23      0.43      0.50      0.77      0.72
       MLPNN   0.97      1.46      0.74      1.21      0.58      0.59      0.84      0.72
       RBFNN   0.79      1.43      0.60      1.19      0.72      0.47      0.91      0.75
       GRNN    0.77      1.36      0.56      1.09      0.75      0.60      0.91      0.75
CD     MLR2    1.31      1.98      0.97      1.41      0.36      0.41      0.72      0.73
       MLPNN   0.96      1.70      0.67      1.21      0.66      0.59      0.88      0.78
       RBFNN   1.11      1.63      0.81      1.17      0.54      0.59      0.83      0.77
       GRNN    0.92      1.70      0.62      1.21      0.70      0.55      0.89      0.77

Fig. 5. Boxplot diagrams of sixty runs for the test dataset on CA.

Thus, different results could exclusively be attributed to different initial values. After assessing the resulting estimations of the test set, we found that for a given datum we received different predictions in each run. The largest differences were in the estimation of the 14th sample (Fig. 5), where the lowest prediction was 8.39 mg L⁻¹ and the highest reached 20.01 mg L⁻¹ (with a measured value of 17 mg L⁻¹). These significant differences highlighted the importance of making multiple runs with the MLPNN, as a single run could result in a misleading outcome. This attribute of the MLPNN can be managed if the runs are repeated 60 times with identical settings, the results of these 60 runs are averaged for each sample both on the training and on the test set, and the statistical indicators are calculated for the data estimated in this manner. We will use this approach and compare the MLPNN with the other neural networks below.
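A sketch of this averaging procedure, reusing the scikit-learn stand-in network from Section 2.4; the run count and neuron count mirror the settings above, and varying random_state loosely imitates the differing Nguyen-Widrow initialisations obtained in MATLAB.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def averaged_mlp_forecast(X_tr, y_tr, X_te, runs=60, neurons=5):
    """Average the test-set forecasts of `runs` identically configured
    MLPNNs that differ only in their random initial weights."""
    preds = np.empty((runs, len(X_te)))
    for r in range(runs):
        net = MLPRegressor(hidden_layer_sizes=(neurons,), activation="tanh",
                           solver="lbfgs", max_iter=1000, random_state=r)
        preds[r] = net.fit(X_tr, y_tr).predict(X_te)
    # Per-sample mean is the reported forecast; the variance exposes outliers.
    return preds.mean(axis=0), preds.var(axis=0)
```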
In the case of the MLPNN, we assessed the performance of the net for different neuron numbers, from 2 to 9 (Fig. 6). It was found that the RMSE of the test set decreased monotonically in the range of 2–5 neurons and increased between 5 and 9 neurons, reaching a minimum value at 5 neurons. Thus, for the current set of parameters the 5-neuron setting provides the most precise MLPNN estimation.

Fig. 6. RMSE of the test set with different neuron numbers using MLPNN on CA.

Therefore, we evaluated and compared the 60-fold iterated 5-neuron MLPNN with the other models (Table 3).
3.3. Prediction using radial basis function neural network on CA

The performance of the RBFNN depends exclusively on the value of the smoothing factor, as the newrb function increases the neuron count in the hidden layer until it reaches the preset RMSE value or the maximum neuron count, which by default equals the sample count of the input dataset. We evaluated the RMSE value for the test set as a function of the smoothing factor, starting from an initial value of 0.02 up to a value of 3 (Fig. 7). The results showed that while initially the RMSE declined decisively and reached its minimum at 0.26, with a minimal value of 1.65 mg L⁻¹ (Table 3), afterwards it started to grow and then remained at a constant level (2.01 mg L⁻¹) throughout the remaining section of the examined interval. The neuron count in the hidden layer was 18 for a smoothing factor of 0.26.
3.4. Prediction using general regression neural network on CA

The performance of the GRNN depends solely on the general smoothing factor applicable to all parameters; thus the determination of an optimal smoothing factor is of great importance. We evaluated the RMSE values for the test set as a function of the smoothing factor in the interval between 0.02 and 3 (Fig. 8). First, a significant decline is visible until the minimal value is reached (1.42 mg L⁻¹ RMSE at a smoothing factor of 0.3), which is followed by a steady increase (Table 3).
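This grid scan can be written generically for either the RBFNN or the GRNN. The sketch below assumes a predict callable supplied by the caller, e.g. a closure around the grnn_predict sketch from Section 2.6, and its default scan range follows the 0.02–3 interval used here.

```python
import numpy as np

def best_spread(predict, y_test, sigmas=np.arange(0.02, 3.0, 0.02)):
    """Scan the smoothing factor and return the value minimising test RMSE.
    `predict` maps a smoothing factor to test-set forecasts, e.g. a closure
    around the grnn_predict sketch from Section 2.6."""
    rmse = np.array([np.sqrt(np.mean((y_test - predict(s)) ** 2))
                     for s in sigmas])
    i = int(rmse.argmin())
    return sigmas[i], rmse[i]

# usage: sigma, err = best_spread(lambda s: grnn_predict(Xtr, ytr, Xte, s), yte)
```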
3.5. Modelling results for CB, CC and CD

3.5.1. CB
In the case of CB the D9 sampling location was examined. Two MLR models were formulated. In the first, the RF and EC parameters showed no significance among the four input parameters, so the second MLR model was developed without them, and all the resulting model parameters showed significance (Table 3, MLR2). The MLPNN method provided the best results with 4 neurons (Table 3). The performance of the RBFNN model was best at a smoothing factor of 0.12; in this case the neuron count of the hidden layer was 49. Last, the GRNN model was most efficient with a smoothing factor of 0.3. Of the four models the RBFNN model gave the best results (Table 3).

3.5.2. CC
In the case of CC the separate analysis of the D2 sampling location was conducted using two MLR models. In the first MLR model the pH and EC parameters showed no significance, so the second MLR was formulated without these parameters (Table 3). The MLPNN model gave the best results with 2 neurons. The RBFNN model application showed that the highest efficiency can be reached with a smoothing factor of 0.28; in this case the hidden layer neuron count was 36. For the GRNN model a smoothing factor of 0.5 ensured the highest efficiency. In the case of CC the GRNN method turned out to be the most efficient (Table 3).

3.5.3. CD
Finally, all three locations were simultaneously analysed (CD) as a composite system. As before, two MLR models were formulated, and as not all parameters turned out to be significant in the first model, the EC parameter was excluded from the second MLR model, which then had only significant parameters (Table 3). The MLPNN model was most efficient with 6 neurons; RBFNN model runs showed a smoothing factor of 0.46 to be ideal, with a hidden layer neuron count of 14 in this case. The best results could be achieved with the GRNN model if a smoothing factor of 0.44 was used as an input parameter. The composite system created in this manner was most efficiently forecasted with the RBFNN model.
3.6. Sensitivity analysis

Sensitivity analysis was performed to identify which of the four parameters is the most important in predicting DO values. We tested all three neural network models by omitting one parameter on each run and examining how this affects model performance. We analysed the RMSE values of the test set and compared them with the RMSE values obtained when the complete parameter range was used for the test set. This allowed us to develop a ranking of parameters, as it was obvious that the omission of the most important parameter would have the highest influence on the RMSE and result in the largest loss of model performance. Sensitivity analysis was performed for CD with all three neural networks (Table 4). Results with each neural network confirmed the importance of the pH parameter, as model performance significantly deteriorated without this parameter.
4. Discussion

4.1. Validity of multivariate linear regression

The validity of the linear model was successfully established through an F-test. However, when testing for the significance of the individual parameters it was found that some parameters are not statistically different from zero (e.g. RF and EC in the case of CA). Thus, the inclusion of these parameters does not improve model accuracy. Despite the general practice, which ignores these constraints (Akkoyunlu et al., 2011; Antanasijevic et al., 2013; Ay and Kisi, 2012; He et al., 2011b), we deemed it necessary to construct an MLR2 model without the respective parameters. In this case the models and their parameters are statistically significant.
4.2. Development of the MLPNN

The 60-fold repeated iterations of the MLPNN (Fig. 5) highlighted the fact that, due to the random initialisation, identical model settings can result in highly distinct results. Most of the earlier works in this field ignored this phenomenon (Dogan et al., 2009; Kuo et al., 2007; Rankovic et al., 2010; Singh et al., 2009; Talib and Amat, 2012), though Palani et al. (2008) raises this issue.

In an optimal scenario, the 60-fold reiteration should produce results with zero variance, that is, all of the results should be identical. Throughout the course of the analysis, the opposite was experienced: variances of the estimations of individual observations of the test set ranged between 0.11 and 32.69. Thus, this repeated iteration approach turned out to be well founded, as it allowed us to manage the outliers of certain MLPNN runs; without this, extremely incorrect results could randomly be generated and accepted.


Fig. 7. RMSE of the test set with different smoothing factors using RBFNN on CA (minimum at sigma = 0.26, test RMSE = 1.65 mg L⁻¹).

Fig. 8. RMSE of the test set with different smoothing factors using GRNN on CA (minimum at sigma = 0.3, test RMSE = 1.4 mg L⁻¹).
Table 4
Results of sensitivity analysis on combination CD for the test set.

           RMSE                         R²
           MLPNN   RBFNN   GRNN         MLPNN   RBFNN   GRNN
All        1.70    1.63    1.70         0.59    0.59    0.55
Skip pH    1.88    3.90    1.90         0.43    0.29    0.48
Skip WT    1.84    1.81    1.81         0.51    0.45    0.52
Skip EC    1.77    1.71    1.78         0.54    0.52    0.48
Skip RF    1.81    1.71    1.74         0.46    0.51    0.50

Concerning the neuron count in the hidden layer, Fletcher and Goss (1993) assert that it should be between 2·I^(1/2) + O and 2·I + 1, where I is the number of input parameters and O is the number of output parameters. In our case this means that the optimal neuron count should be between 5 and 9. This rule of thumb was mentioned and successfully applied in several papers (Basant et al., 2010; Emamgholizadeh et al., 2014; He et al., 2011a; Wen et al., 2013).
4.3. Comparison of results obtained by ANNs and MLR

4.3.1. The result of the CA
According to Eq. (13), the relative errors (RE) of the four applied models (MLR, MLPNN, RBFNN and GRNN) were evaluated in percentage terms (Najah et al., 2011):

RE = \frac{O_i - P_i}{O_i} \cdot 100\ (\%) \quad (13)

where O_i and P_i are the observed and predicted DO values for the ith element.

Based on the RE, it can be asserted that certain models have dynamic errors (Fig. 9). Our models differ in their degree of accuracy in describing the variance of dissolved oxygen. According to the test set, the smallest errors can be expected in those intervals where the DO variance is small (e.g. fall and winter); in these intervals the RE values of the models are smaller and do not differ significantly.

In spring and summer, when DO concentrations vary over a larger range, the models can generate significantly different errors, while sharing the tendency towards larger errors. Nevertheless, based on the RE values the GRNN again turned out to be the most efficient model.
The results of the four model applications showed (Tables 3 and 5, Fig. 9) that the GRNN was the best performer, both on the test and the training sets. In the learning phase, the GRNN improved RMSE values by 59%; in the test phase an improvement of 30% was attained over the MLR. A 66% reduction of the MAE values was achieved on the training set, and 17% on the test set, compared to the linear model. The coefficient of determination improved by 82% with the GRNN in the training phase, and by 80% in the test phase, compared to the MLR. The IA value of the GRNN model improved by 20% and 21% over the MLR model for the training and test sets, respectively.


4.3.2. Comparison of the results of the other combinations
Ultimately, the differences between the models created for the respective combinations showed that for CB and CD the RBFNN model provided the smallest RMSE value for the test dataset (Table 3). For the CC case the GRNN model gave the best results considering the RMSE of the test dataset. Under all constructions the neural networks could predict the DO parameter with a much higher efficiency than the linear model (Table 5). Of the three neural networks the most efficient were the GRNN and RBFNN, which provided better results than the MLPNN; the MLPNN almost always underperformed the other two neural network models.


Fig. 9. The error distribution of the four models for the test data set on CA.
Table 5
Values of RMSE and R² in the respective combinations as a fraction of the MLR model values.

       RMSE                                R²
       MLR    MLPNN   RBFNN   GRNN         MLR    MLPNN   RBFNN   GRNN
CA     100%   77%     81%     70%          100%   143%    148%    180%
CB     100%   89%     84%     90%          100%   133%    123%    126%
CC     100%   93%     91%     87%          100%   117%    94%     119%
CD     100%   86%     82%     86%          100%   135%    143%    143%

Another advantage of the GRNN and RBFNN is fast computation, as opposed to the MLPNN, which is slowed down by its double-iteration learning strategy.

If the results from all four combinations are considered, then the most significant improvement was achieved in the first case with the application of neural networks (Table 5). In all other cases the neural networks are also significantly more efficient than linear models, but the improvement is not as large as in the case of an undisturbed location. In the case of noisy sampling locations the anthropogenic effect disturbs and hampers estimation. In the case of the D2 sampling location the nearby hydro PP significantly alters the runoff (Klaver et al., 2007; Kovács et al., 2015b). This influences the complete ecological system (Liang et al., 2016; Onderka and Pekárová, 2008) and the DO parameter. At the sampling location D9 the cooling water of the HNPP locally influences the thermal conditions of the Danube and might influence DO (Turnpenny et al., 2010; Wetzel, 2001). These factors make prediction of DO at the D2 and D9 locations more difficult. In the case of CD a composite system is modelled, with both undisturbed and noisy locations, making the prediction of DO more difficult than at an undisturbed sampling location (D11). The improvement of the RMSE is only 18% over the MLR results, which is smaller than the 30% improvement experienced in the case of CA (Table 5).

4.4. Sensitivity analysis

Sensitivity analysis emphasises the importance of the pH parameter, thus confirming the results from the MLR model, where the analysis only allowed the exclusion of conductivity out of the four parameters.

Oxygen is indispensable for the vital processes of decomposing bacteria. Decomposition, inter alia, generates CO₂, which pushes pH values in the direction of acidity. In addition, the reduction of pH can cause decomposition of organic matter, which can also be aided by dissolved oxygen. A reduction of DO concentrations can frequently result in increasing pH values (Akkoyunlu et al., 2011).
5. Conclusions

Three alternative neural network models (MLPNN, RBFNN and GRNN) and a multivariate linear regression model (MLR) were applied at three sampling locations on the river Danube, at Mohács, Fajsz and Győrzámoly, with four combinations. The goal was to forecast the content of dissolved oxygen using temperature, runoff, pH and conductivity as explanatory parameters. Several statistical indicators (RMSE, MAE, IA, and R²) were used to assess model performance. A forecast was made for the year 2003 using data from previous years. The results of the four combinations showed that the best performing models were the GRNN and RBFNN, which outperformed the MLPNN. The worst performance was observed in the case of the MLR model in all four combinations. Sensitivity analysis of the three neural networks, which was conducted on CD, confirmed that the pH value is the most important of the four parameters in predicting dissolved oxygen levels. This is in concordance with the results of the MLR, where the model selected pH, runoff and temperature as the parameters most influencing dissolved oxygen levels. Our results also show that all three neural network models are highly efficient in predicting dissolved oxygen content in rivers,

and this provides a valuable input for water quality management experts in decision making.
Acknowledgement
The authors would like to thank Paul Thatcher for his work on
our English version.
References

Ahmed, A.A.M., 2014. Prediction of dissolved oxygen in Surma River by biochemical oxygen demand and chemical oxygen demand using the artificial neural networks (ANNs). J. King Saud Univ. Eng. Sci., http://dx.doi.org/10.1016/j.jksues.2014.05.001.

Akkoyunlu, A., Altun, H., Cigizoglu, H.K., 2011. Depth-integrated estimation of dissolved oxygen in a lake. J. Environ. Eng. 137, 961–967.

Antanasijević, D., Pocajt, V., Povrenović, D., Perić-Grujić, A., Ristić, M., 2013. Modelling of dissolved oxygen content using artificial neural networks: Danube River, North Serbia, case study. Environ. Sci. Pollut. Res. 20, 9006–9013.

Antanasijević, D., Pocajt, V., Perić-Grujić, A., Ristić, M., 2014. Modelling of dissolved oxygen in the Danube River using artificial neural networks and Monte Carlo Simulation uncertainty analysis. J. Hydrol. 519, 1895–1907.

Ay, M., Kisi, O., 2012. Modeling of dissolved oxygen concentration using different neural network techniques in Foundation Creek, El Paso County, Colorado. J. Environ. Eng. 138, 654–662.

Basant, N., Gupta, S., Malik, A., Singh, K.P., 2010. Linear and nonlinear modelling for simultaneous prediction of dissolved oxygen and biochemical oxygen demand of the surface water — a case study. Chemom. Intell. Lab. Syst. 104, 172–180.

Broomhead, D., Lowe, D., 1988. Multivariable functional interpolation and adaptive networks. Complex Syst. 2, 321–355.

Csábrági, A., Molnár, S., Tanos, P., Kovács, J., 2015. Forecasting of dissolved oxygen in the river Danube using neural networks. Hung. Agric. Eng. 27, 38–41.

Demuth, H., Beale, M., 2000. Neural Network Toolbox User's Guide: Matlab. The Mathworks Inc.

Dogan, E., Sengorur, B., Koklu, R., 2009. Modelling biochemical oxygen demand of the Melen River in Turkey using an artificial neural network technique. J. Environ. Manag. 90, 1229–1235.

Draper, N.R., Smith, H., 1981. Applied Regression Analysis. Wiley, New York.

Emamgholizadeh, S., Kashi, H., Marofpoor, I., Zalaghi, E., 2014. Prediction of water quality parameters of Karoon River (Iran) by artificial intelligence-based models. Int. J. Environ. Sci. Technol. 11, 645–656.

Fletcher, D., Goss, E., 1993. Forecasting with neural networks: an application using bankruptcy data. Inform. Manag. 24, 159–167.

Haykin, S., 1999. Neural Networks: A Comprehensive Foundation, 2nd ed. Prentice-Hall, Upper Saddle River, New Jersey.

He, B., Oki, T., Sun, F., Komori, D., Kanae, S., Wang, Y., Kim, H., Yamazaki, D., 2011a. Estimating monthly total nitrogen concentration in streams by using artificial neural network. J. Environ. Manag. 92, 172–177.

He, J., Chu, A., Ryan, M., Valeo, C., Zaitlin, B., 2011b. Abiotic influences on dissolved oxygen in a riverine environment. Ecol. Eng. 37, 1804–1814.

Heddam, S., 2014. Generalized regression neural network-based approach for modelling hourly dissolved oxygen concentration in the Upper Klamath River, Oregon, USA. Environ. Technol. 35, 1650–1657.

Kim, S., Kim, H.S., 2008. Neural networks and genetic algorithm approach for nonlinear evaporation and evapotranspiration modelling. J. Hydrol. 351, 299–317.

Klaver, G., van Os, B., Negrel, P., Petelet-Giraud, E., 2007. Influence of hydropower dams on the composition of the suspended and riverbank sediments in the Danube. Environ. Pollut. 148, 718–728.

Kovács, J., Kovács, S., Hatvani, I.G., Magyar, N., Tanos, P., Korponai, J., Blaschke, A., 2015a. Spatial optimization of monitoring networks on the examples of a river, a lake-wetland system and a sub-surface water system. Water Resour. Manag. 29, 5275–5294.

Kovács, J., Márkus, L., Szalai, J., Kovács, I.S.Z., 2015b. Detection and evaluation of changes induced by the diversion of River Danube in the territorial appearance of latent effects governing shallow-groundwater fluctuations. J. Hydrol. 520, 314–325.

Kuo, J., Hsieh, M., Lung, W., She, N., 2007. Using artificial neural network for reservoir eutrophication prediction. Ecol. Modell. 200, 171–177.

Liang, C., Xin, S., Dongsheng, W., Xiujing, Y., Guodong, J., 2016. The ecological benefit-loss evaluation in a riverine wetland for hydropower projects — a case study of Xiaolangdi reservoir in the Yellow River, China. Ecol. Eng. 96, 34–44.

Liška, I., Wagner, F., Sengl, M., Deutsch, K., Slobodník, J., 2015. Joint Danube Survey 3. International Commission for the Protection of the Danube River, Vienna. ISBN: 978-3-200-03795-3.

Marquardt, D., 1963. An algorithm for least squares estimation of nonlinear parameters. J. Soc. Ind. Appl. Math. 11, 431–441.

Molnár, M., 2002. Possible role of nuclear power in reducing greenhouse-gas emissions in the Hungarian power sector. In: Proceedings of the 4th International Conference on Nuclear Option in Countries with Small and Medium Electricity Grids, Croatian Nuclear Society, Zagreb, pp. 1–7.

Najah, A., El-Shafie, A., Karim, O.A., Jaafar, O., El-Shafie, A.H., 2011. An application of different artificial intelligence techniques for water quality prediction. Int. J. Phys. Sci. 6 (22), 5298–5308, http://dx.doi.org/10.5897/IJPS11.1180.

Onderka, M., Pekárová, P., 2008. Retrieval of suspended particulate matter concentrations in the Danube River from Landsat ETM data. Sci. Total Environ. 397 (1–3), 238–243.

Palani, S., Liong, S., Tkalich, P., 2008. An ANN application for water quality forecasting. Mar. Pollut. Bull. 56, 1586–1597.

Pavelka, A., Procházka, A., 2004. Algorithms for initialization of neural network weights. In: Sborník príspevku 11. Konference MATLAB, 2, 453–459.

Poggio, T., Girosi, F., 1990. Regularization algorithms for learning that are equivalent to multilayer networks. Science 247 (4945), 978–982.

Rankovic, V., Radulovic, J., Radojevic, I., Ostojic, A., Comic, L., 2010. Neural network modelling of dissolved oxygen in the Gruza reservoir, Serbia. Ecol. Modell. 221, 1239–1244.

Rankovic, V., Radulovic, J., Radojevic, I., Ostojic, A., Comic, L., 2012. Prediction of dissolved oxygen in reservoirs using adaptive network-based fuzzy inference system. J. Hydroinf. 14, 167–179.

Rumelhart, D.E., Hinton, G.E., Williams, R.J., 1986. Learning internal representations by error propagation. In: Rumelhart, D.E., McClelland, J.L. (Eds.), Parallel Distributed Processing. MIT Press, Cambridge, pp. 318–362. ISBN: 9780262680530.

Singh, K., Basant, A., Malik, A., Jain, G., 2009. Artificial neural network modeling of the river water quality — a case study. Ecol. Modell. 220, 888–895.

Sommerhäuser, M., Robert, S., Birk, S., Hering, D., Moog, O., Stubauer, I., Ofenböck, T., 2003. Final report for developing the typology of surface waters and defining the relevant reference conditions. UNDP/GEF Danube Regional Project, Vienna. (Accessed 1 December 2016) http://www.undp-drp.org/pdf/1.1 River%20Basin%20Management%20-%20Phase%201/1.1 UNDP-DRP Typology%20of%20SW 116 fr.pdf.

Specht, D.F., 1991. A general regression neural network. IEEE Trans. Neural Netw. 2, 568–576.

Talib, A., Amat, M.I., 2012. Prediction of chemical oxygen demand in Dondang River using artificial neural network. Int. J. Inform. Educ. Technol. 2, 259–261.

Turnpenny, A.W.H., Coughlan, J., Ng, B., Crews, P., Bamber, R.N., Rowles, P., 2010. Cooling Water Options for the New Generation of Nuclear Power Stations in the UK. Environment Agency, Bristol. (Accessed 3 December 2016) https://www.gov.uk/government/uploads/system/uploads/attachment data/file/291077/scho0610bsot-e-e.pdf.

Venkatesan, P., Anitha, S., 2006. Application of a radial basis function neural network for diagnosis of diabetes mellitus. Curr. Sci. 91, 1195–1199.

Wen, X., Fang, J., Diao, M., Zhang, C., 2013. Artificial neural network modelling of dissolved oxygen in the Heihe River, Northwestern China. Environ. Monit. Assess. 185, 4361–4371.

Wetzel, R.G., 2001. Oxygen. In: Wetzel, R.G., Limnology, third ed. Academic Press, San Diego, pp. 151–168. ISBN: 9780127447605.
