
Industrial Steam Consumption Prediction Through an Artificial Neural Networks (ANNs) Approach: The Case of Bahir Dar Textile, Ethiopia


Abstract
Recent research has demonstrated the capability of Artificial Neural Networks (ANNs) to learn and generalize in solving complex industrial problems. However, few such studies have investigated whether ANNs are also effective in identifying energy use patterns in industrial processes. In this work, gradient descent and learning-with-momentum variants of a multilayer ANN were developed to determine steam consumption patterns as a function of production rate in a textile factory. The models were tested using real plant data: the daily production of each steam-consuming machine and the meter readings of an electrical steam boiler. Part of these data (85%) was randomly selected to train the network; the remaining data were used to test the performance of the trained network. The correlation coefficient between the estimated and target values was R = 0.9781, and the error performance index was an acceptable 0.0674. The proposed neural network can therefore serve as a valuable energy use approximator in industrial production processes. Moreover, with more training data available, prediction capability can be increased further.
Key words: Artificial Neural Networks (ANNs), Industrial Processes, Steam Consumption, Gradient Descent, Learning with Momentum
Statement of the problem

Energy consumption determination is perhaps the first crucial element in demand-side energy management (DSM). Additionally, in the integration of renewables such as a solar plant into industries, knowledge of the load is a necessary requirement. To achieve this, direct measurement of generation and consumption at the load side can be done, otherwise known as an energy audit. This method, however, is costly and might mean persistent measurements under different industrial production conditions. Another way is to take the industrial process's average energy consumption from a manufacturer's specification. This method, even though simple, is usually not practically usable, because it does not take into account the energy utilization under changing scenarios such as off-nominal operation, changing input parameters in the production processes, and the changing behavior of machines through their life cycle. The last method, the one proposed in this study, is to use ANNs to predict energy use patterns under changing production processes in real time. This, however, requires substantial data and several model configuration trials in order to generalize well.

This work is part of a larger project called "Control and Optimization of a Large-Scale Solar Plant in Ethiopian Textile Industry". The aim of the project is primarily the smooth integration of an economically realizable solar plant for feed-water heating of an existing steam boiler. During the course of this project, determining the thermal energy demand was deemed necessary for optimal sizing and operation of the solar plant. This task was difficult to achieve due to the absence of funds for energy auditing. Nor was a determination of average thermal energy use possible, since the factory is very old and no specifications are known to be available. These combined factors led to the idea of predicting energy use patterns from other available related data through ANNs. These related data are the daily production of the steam-consuming machines and the kWh meter readings of a boiler.

The research work employs the well-known back-propagation (BP) algorithm for a multilayer feedforward neural network. The work has taken into consideration all issues pertinent to the practical implementation of these ANNs, as explained in section …. The work has also considered two efficient training paradigms, i.e., learning with momentum and gradient descent, and made a comparison of them. To this end, a Matlab script was written that incorporates all the above-mentioned issues and arrived at acceptable performance during run-time.

Overview of Ethiopian Textile Industry

The textile industry has always held central attention in all of Ethiopia's industrial policy and development phases. In the 2011 ranking of the top ten manufacturing exports in the country, the share of textile and apparel was 11.49% of the total [1]. There is a growing trend in the expansion of new textile industries and in capacitating the existing ones. One indication is the fact that the Ethiopian export value of the textile industry showed 55% growth in just one year, from 2010 to 2011 [1].

As of 2010, the contribution of Ethiopian textile industries in terms of GDP and of industrial output by value was 1.6% and 12.4%, respectively [2]. Table 1 and Figure 1 show the distribution of these textile sectors in the country and their production capacity.

Most of Ethiopia's textile processing units use fuel-fired boilers consuming furnace oil, diesel or coal as a source of energy, with the exception of two factories which partially run electric boilers. Currently, about 8,112,000 tons of coal and 10,256,803 liters of furnace oil are utilized every year by the sixteen processing units in the country. In addition, about 70,607,459 kilowatt-hours of electricity and 35,344,417 m³ of water were consumed by the sub-industry in 2011 [3].

Figure 1. Textile sector distribution in Ethiopia, by type: handloom, integrated mills, garment factories, weaving/knitting, dyeing & printing, spinning, and blanket factories.


Table 1. Annual production capacity of Ethiopian textiles

No.  Textile product                          Annual production
1    Yarn (thousand tonnes)                   102
2    Woven fabric (million meters)            207
3    Knitted fabric (million kilograms)       50
4    Knitted & woven garment (million pcs)    91

Steam Generation and Consumption in Bahir Dar Textile Factory

Bahir Dar Textile Share Company is a vertically integrated textile company manufacturing 100% cotton products, including yarns and fabrics. It was established in 1961, from the fund of Italian war reparations, in the town of Bahir Dar, 570 km north-west of Addis Ababa, Ethiopia. The factory utilizes a substantial quantity of cotton as raw material and supplies its products to the local market. In this respect, it plays a vital role in linking backward with cotton producers and forward with garment factories.
Table 2. Factory and boiler power consumption (year 2016)

           Electrical boiler           Factory                  Power factor (pf)
Month      Active    Reactive          Active    Reactive       Boiler   Factory   Unit cost
           (kWh)     (kVAr)            (kWh)     (kVAr)                            (Birr/kWh)
Jan-16     126       98                193       139            0.66     0.62      0.453
Feb-16     82        69                200       142            0.70     0.62      0.453
Mar-16     85        77                213       155            0.74     0.63      0.453
Apr-16     90        79                202       149            0.72     0.64      0.453
May-16     75        68                180       135            0.74     0.64      0.453
Jun-16     103       91                200       149            0.72     0.64      0.453
Jul-16     88        79                189       142            0.73     0.64      0.453
Aug-16     99        92                161       128            0.75     0.67      0.453
Sep-16     62        58                131       112            0.75     0.71      0.453
Oct-16     93        85                200       147            0.74     0.63      0.453
Nov-16     95        86                212       156            0.74     0.63      0.453
Dec-16     93        85                207       153            0.74     0.64      0.453

Bahir Dar Textile Share Company was chosen for this study due to the availability of data relating to the production process and to its existing electrical steam boiler, which is a prerequisite for further application of the results of this work.

Electricity consumption

The electrical energy consumption at Bahir Dar Textile comprises the factory and an electrical steam boiler. Monthly active and reactive power for both was collected from the respective meter readings for the year 2016. Table 2 depicts the detailed characteristics.

Figure 2. Average daily production in 2016, by textile sub-section (spinning, weaving, finishing, garment).


Table 3. COLLINE boiler specification

No.  Specification                        Value
1    Name and type                        COLLINE, electrical boiler (3) with softening plant
2    Permissible / working pressure       13 bar / 10.3 bar
3    Design / max. steam temperature      190 °C / 184 °C
4    Rated steam output                   3348 kg/hr per boiler
5    Power consumption                    2106 kW per boiler

Production Details

The production details of the Bahir Dar Textile Share Company were collected from the plant personnel. The average daily production of each sub-section over the last year, from Jan-2016 to Dec-2016, is shown in Figure 2.

Steam generation & Distribution

The factory has installed two electrical boilers, although one is currently not working, and a single furnace-oil-fired boiler to meet the steam demand at the various sections of the textile industry. Steam is generated at a pressure of 5 bar and utilized at around 2~4 bar as per the production process requirement. For this study only the steam production characteristics and associated loads of the working electrical boiler are considered. The steam from this boiler is used for the bleaching, washing dryer, calender machine, new jigger, and sizing processes. The brief specification of the boiler is given in Table 3.

Introduction and Fundamentals of Artificial Neural Networks (ANNs)

One common as well as important component that finds itself in many practical applications is function approximation. Function approximation ranges from determining a realizable feedback function that relates measured outputs to control inputs in control systems, to finding a function that correlates past values of an input signal to its output in adaptive filtering. Lately, Artificial Neural Networks (ANNs) have been used extensively in finding the underlying functional relation of engineering processes. This pertains to the ability of ANNs to predict or solve non-linear problems with a high degree of accuracy, given enough data to learn from. A wide variety of ANNs have been used, with varying configurations that suit the specific requirements of an application.

A widely used and efficient ANN function approximator is the MLP (multi-layer perceptron) network based on the BP (back-propagation) learning algorithm. Though researchers are still contributing to the understanding of these ANNs, several studies have exemplified the back-propagation learning algorithm as the forerunner among the multi-layer perceptron algorithms [6-8].

The accuracy and convergence speed of these MLPs usually depend on the neural network architectural configuration as well as the choice of tunable parameters during the implementation stage. In previous studies, researchers have used such techniques to solve real applications. However, hardly any examples of predicting industrial process energy consumption from production data have been reported. This paper is an attempt to fill this gap by implementing one of the most powerful ANNs, the MLP, while trying to consider issues relating to its practical application.

Realization of the perceptron concept by Rosenblatt in 1958 was the hallmark of ANNs. The perceptron unit is an individual processing unit that accepts weighted inputs and produces a rule-based threshold output. The MLP is a feed-forward ANN that is implemented by customizing these fundamental units. This customization introduced the addition of layers of neurons and a nonlinear transfer function [7-8].

ANNs (Artificial Neural Networks)

ANNs are defined as a collection of processing units with networks for interaction with each other through weighted interconnections [8]. The whole aim of these networks is to replicate, in a rather simplified manner, the workings of the human biological central nervous system. The performance of these ANNs depends, in a not clearly defined manner, on the number, interconnection and interaction of these constituent units.

The aforementioned units are known as neurons. Each neuron receives input signals from, and gives its output signal to, all other units to which it is connected.

A typical single-neuron model is shown in Figure 3. The output strength of the neuron is determined by the function f, which itself depends on the values of the weight (W) and bias (b) associated with each interconnection:

$a = f(WP + b)$

Figure 3. A single-neuron ANN: the input P is multiplied by the weight W, the bias b is added to give the net input n, and the transfer function f produces the output a.

The implementation process begins when an input is presented to the network and propagated through the network to an output by the transfer function, otherwise known as the activation function. For an MLP this process goes on from neuron to neuron and layer by layer, through the output layer that processes and gives the final value. A typical MLP neural network is depicted in Figure 4.
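To make this layer-by-layer propagation concrete, the following minimal Matlab sketch (our illustration, not the factory model; all sizes and values are hypothetical, and tanh stands in for the tansig transfer function) evaluates the single neuron of Figure 3 and then cascades the same rule through a small two-layer network:

```matlab
% Single neuron of Figure 3: a = f(W*p + b), here with f = tanh (tansig)
p = 0.5;  W = 1.3;  b = -0.48;        % hypothetical input, weight and bias
a = tanh(W*p + b)                     % neuron output

% Layer-by-layer forward pass of a small MLP: each layer's output becomes
% the next layer's input; the output layer uses a linear transfer function.
P  = [0.2; 0.7];                      % example input vector
W1 = randn(3,2);  b1 = randn(3,1);    % hidden layer: 3 tanh neurons
W2 = randn(1,3);  b2 = randn(1,1);    % output layer: 1 linear neuron
a1 = tanh(W1*P + b1);                 % hidden layer output
a2 = W2*a1 + b2                       % final network output
```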

In an MLP, training on examples is carried out prior to usage as a useful network. This training attempts to iteratively adjust the connection weights and biases using known training data. To facilitate this training, the outputs from the network are compared to the target examples through what is known as the error performance index (PI). This error is propagated back through the network to adjust the weights and biases until an acceptable PI is achieved.

The final stage of the network implementation involves fixing the adaptive weights and biases at the last values of the training stage. The network then computes the output directly to give an estimated value for the inputs.

The MLP Architecture and the Back Propagation (BP) Algorithm

It will simplify our development of the back-propagation algorithm if we use the abbreviated notation for the multilayer network, which is stated as follows:

Scalars: small italic letters a, b, c
Vectors: small bold non-italic letters a, b, c
Matrices: capital BOLD non-italic letters A, B, C

The three-layer network in abbreviated notation is shown below in Figure 4.

For multilayer networks the output of one layer becomes the input to the following layer. The equations that describe this operation are

$a^{m+1} = f^{m+1}(W^{m+1} a^m + b^{m+1})$ for $m = 0, 1, \ldots, M-1$   (1.1)

where $M$ is the number of layers in the network. The neurons in the first layer receive external inputs:

$a^0 = p$   (1.2)

which provides the starting point for Eq. (1.1). The outputs of the neurons in the last layer are considered the network outputs:

$a = a^M$   (1.3)

The algorithm is provided with a set of examples of proper network behavior:

$\{p_1, t_1\}, \{p_2, t_2\}, \ldots, \{p_Q, t_Q\}$   (1.4)

where $p_q$ is an input to the network and $t_q$ is the corresponding target output. We will approximate the mean square error by

$F(x) = (t(k) - a(k))^T (t(k) - a(k)) = e(k)^T e(k)$   (1.5)

The steepest descent algorithm for the approximate mean square error (stochastic gradient descent) is

$w_{i,j}^m(k+1) = w_{i,j}^m(k) - \alpha \dfrac{\partial F}{\partial w_{i,j}^m}$   (1.6)

$b_i^m(k+1) = b_i^m(k) - \alpha \dfrac{\partial F}{\partial b_i^m}$   (1.7)

where $\alpha$ is the learning rate.

Because the error is an indirect function of the weights in the hidden layers, we will use the chain rule of calculus to calculate the derivatives. To review the chain rule, suppose that we have a function $f$ that is an explicit function only of the variable $n$. We want to take the derivative of $f$ with respect to a third variable $w$.

Figure 4. Three-layer network, abbreviated notation. The layer outputs are:

$a^1 = f^1(W^1 p + b^1)$, $\quad a^2 = f^2(W^2 a^1 + b^2)$, $\quad a^3 = f^3(W^3 a^2 + b^3)$


The chain rule is then:

$\dfrac{df(n(w))}{dw} = \dfrac{df(n)}{dn} \times \dfrac{dn(w)}{dw}$   (1.8)

Applying this rule to the derivatives in Eqs. (1.6) and (1.7) gives

$\dfrac{\partial F}{\partial w_{i,j}^m} = \dfrac{\partial F}{\partial n_i^m} \times \dfrac{\partial n_i^m}{\partial w_{i,j}^m}$   (1.9)

$\dfrac{\partial F}{\partial b_i^m} = \dfrac{\partial F}{\partial n_i^m} \times \dfrac{\partial n_i^m}{\partial b_i^m}$   (1.10)

The second term in each of these equations can be easily computed, since the net input to layer m is an explicit function of the weights and bias in that layer:

$n_i^m = \sum_{j=1}^{S^{m-1}} w_{i,j}^m a_j^{m-1} + b_i^m$   (1.11)

Therefore

$\dfrac{\partial n_i^m}{\partial w_{i,j}^m} = a_j^{m-1}, \qquad \dfrac{\partial n_i^m}{\partial b_i^m} = 1$   (1.12)

If we now define the sensitivity

$s_i^m = \dfrac{\partial F}{\partial n_i^m}$   (1.13)

(the sensitivity of F to changes in the ith element of the net input at layer m), then Eqs. (1.9) and (1.10) can be simplified to:

$\dfrac{\partial F}{\partial w_{i,j}^m} = s_i^m a_j^{m-1}$   (1.14)

$\dfrac{\partial F}{\partial b_i^m} = s_i^m$   (1.15)

We can now express the approximate steepest descent algorithm as

$w_{i,j}^m(k+1) = w_{i,j}^m(k) - \alpha s_i^m a_j^{m-1}$   (1.16)

$b_i^m(k+1) = b_i^m(k) - \alpha s_i^m$   (1.17)

In matrix form this becomes:

$W^m(k+1) = W^m(k) - \alpha s^m (a^{m-1})^T$   (1.18)

$b^m(k+1) = b^m(k) - \alpha s^m$   (1.19)

where

$s^m = \dfrac{\partial F}{\partial n^m} = \begin{bmatrix} \dfrac{\partial F}{\partial n_1^m} \\ \dfrac{\partial F}{\partial n_2^m} \\ \vdots \\ \dfrac{\partial F}{\partial n_{S^m}^m} \end{bmatrix}$   (1.20)

Back propagating the Sensitivities

Lastly, the sensitivities $s^m$ must be calculated, which requires another application of the chain rule. The algorithm is known as backpropagation because it employs a recurrence relationship whereby the sensitivity at layer m is determined from the sensitivity at layer m+1. To derive the recurrence relationship for the sensitivities, we will use the following Jacobian matrix:

$\dfrac{\partial n^{m+1}}{\partial n^m} = \begin{bmatrix} \dfrac{\partial n_1^{m+1}}{\partial n_1^m} & \dfrac{\partial n_1^{m+1}}{\partial n_2^m} & \cdots & \dfrac{\partial n_1^{m+1}}{\partial n_{S^m}^m} \\ \dfrac{\partial n_2^{m+1}}{\partial n_1^m} & \dfrac{\partial n_2^{m+1}}{\partial n_2^m} & \cdots & \dfrac{\partial n_2^{m+1}}{\partial n_{S^m}^m} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial n_{S^{m+1}}^{m+1}}{\partial n_1^m} & \dfrac{\partial n_{S^{m+1}}^{m+1}}{\partial n_2^m} & \cdots & \dfrac{\partial n_{S^{m+1}}^{m+1}}{\partial n_{S^m}^m} \end{bmatrix}$   (1.21)

Next we want to find an expression for this matrix. Consider the (i, j) element of the matrix:

$\dfrac{\partial n_i^{m+1}}{\partial n_j^m} = \dfrac{\partial \left( \sum_{l=1}^{S^m} w_{i,l}^{m+1} a_l^m + b_i^{m+1} \right)}{\partial n_j^m} = w_{i,j}^{m+1} \dfrac{\partial f^m(n_j^m)}{\partial n_j^m} = w_{i,j}^{m+1} \dot{f}^m(n_j^m)$   (1.22)

where

$\dot{f}^m(n_j^m) = \dfrac{\partial f^m(n_j^m)}{\partial n_j^m}$   (1.23)

Therefore, the Jacobian matrix can be written:

$\dfrac{\partial n^{m+1}}{\partial n^m} = W^{m+1} \dot{F}^m(n^m)$   (1.24)

where

$\dot{F}^m(n^m) = \begin{bmatrix} \dot{f}^m(n_1^m) & 0 & \cdots & 0 \\ 0 & \dot{f}^m(n_2^m) & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \dot{f}^m(n_{S^m}^m) \end{bmatrix}$   (1.25)

The recurrence relation for the sensitivity, using the chain rule in matrix form, is:

$s^m = \dfrac{\partial F}{\partial n^m} = \left( \dfrac{\partial n^{m+1}}{\partial n^m} \right)^T \dfrac{\partial F}{\partial n^{m+1}} = \dot{F}^m(n^m)(W^{m+1})^T s^{m+1}$

The sensitivities are thus propagated backward through the network, from the last layer to the first layer:

$s^M \rightarrow s^{M-1} \rightarrow \cdots \rightarrow s^2 \rightarrow s^1$

Although the BP algorithm is the best among the MLP networks, in its basic form it has two major limitations: a long learning time and the possibility of local minima [6,8].

Variants of the BP algorithm

1. Learning with Momentum

This is a modification of the BP algorithm based on the observation that convergence might be improved if we could smooth out the oscillations in the trajectory. This can be done with a low-pass filter, which reduces the amount of oscillation while still tracking the average value. From the general BP algorithm, it is known that the parameter updates are:

$\Delta W^m(k) = -\alpha s^m (a^{m-1})^T$

$\Delta b^m(k) = -\alpha s^m$

When the momentum filter is added to the parameter changes, we obtain the following equations:

$\Delta W^m(k) = \gamma \Delta W^m(k-1) - (1-\gamma)\alpha s^m (a^{m-1})^T$

$\Delta b^m(k) = \gamma \Delta b^m(k-1) - (1-\gamma)\alpha s^m$

Thus, by using momentum we can use a larger learning rate while maintaining the stability of the algorithm, and we can also accelerate convergence when the trajectory is moving in a consistent direction.
2. Resilient Propagation algorithm

In this variant of the BP algorithm, only the sign of the derivative is used to determine the weight update value. The implementation of this algorithm follows these rules:

a) If the partial derivative of the corresponding weight has the same sign for two consecutive iterations, the weight update value is increased by a factor η+; otherwise the weight update value is decreased by a factor η−. If the derivative is zero, the weight update value remains the same.

b) However, if the weight continues to change in the same direction, i.e., if the derivative is positive, the weight is decreased by its update value; otherwise the update value is added.

A sketch of this sign-based update is given below. For this study both variants are considered, to see which best fits the task at hand.
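The following minimal Matlab sketch applies this sign-based rule to one weight matrix; η+ and η−, the step-size bounds, and the initial step are conventional resilient-propagation choices rather than values taken from this work, and the gradient recomputation is only indicated:

```matlab
% Resilient (sign-based) update for a single weight matrix W.
W  = randn(3, 2);                 % hypothetical weight matrix
gW = randn(3, 2);  gW_prev = zeros(3, 2);   % current / previous gradients
delta = 0.07*ones(3, 2);          % per-weight update values
etaP = 1.2;  etaM = 0.5;          % increase / decrease factors
for k = 1:100                     % stand-in for the training loop
    sameSign = sign(gW) .* sign(gW_prev);
    delta(sameSign > 0) = min(delta(sameSign > 0)*etaP, 50);    % rule (a): grow
    delta(sameSign < 0) = max(delta(sameSign < 0)*etaM, 1e-6);  % rule (a): shrink
    W = W - sign(gW) .* delta;    % step against the gradient sign only
    gW_prev = gW;
    gW = randn(3, 2);             % placeholder: recompute the gradient via BP here
end
```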

Implementation of BP for textile machine steam consumption prediction

The diagram in Figure 5 depicts the ANN training procedure followed. This procedure is a continuous iterative process, starting from the data collection and preprocessing stage, aimed at achieving a more efficient neural network training. At this first step, the data were partitioned into training and testing sets. Following this, selection of a suitable network type (a multilayer network) and architecture (e.g., number of hidden layers, number of nodes in these layers) was done. Then the choice of an appropriate training algorithm from the multitude of available paradigms was carried out to handle the task. Finally, once the ANN was trained, analysis to determine the network performance was done. This last stage revealed some practical issues with the data, the network architecture, and the training algorithm, which were dealt with as illustrated in the practical sections below. The whole procedure is then iterated until an acceptable performance is achieved.

Figure 5. Methodology of BP implementation: data collection & pre-processing → set network type & configuration → choose a training algorithm → initialize and train network → analyze network performance → implement network.

Pre-Training Steps

The pre-training steps comprise three separate tasks, namely data collection, data pre-processing, and choice of network type and architecture.

Data Collection

1. Daily production of steam-consuming machines - input data

Actual daily production from all steam-consuming machines was collected for the year 2016 at the Bahir Dar textile factory. Part of these data is shown in Table 4 for August 2016.

2. Daily total steam production from an electrical boiler (Collins Walker) - output data

The existing electrical steam boiler and its specification are given in Table 3. Meter readings for the same year and days as the input data were taken; the last column of Table 4 depicts these meter readings for August 2016.

Data pre-processing

The aim of this step is to lay a conducive ground for better network training. Though several data pre-processing steps exist in the literature, this work used feature extraction, normalization, and handling of missing data.

Feature extraction

The available data for the ANN output are the meter readings of an electrical boiler. These data show the total electrical energy (kWh) consumed by the boiler. To make these data useful, a manipulation is done to obtain the total steam delivered at the premises of the steam-consuming machines. The procedure is explained as follows. The total steam produced by the boiler is the sum of the steam delivered at the steam-consuming machines and the transmission loss:

$S_B = S_M + S_{loss}$

where $S_M$ is the steam delivered, $S_B$ is the total boiler steam produced and $S_{loss}$ is the steam transmission loss. The total daily steam produced by the boiler can be determined from

$S_B = b \times \dfrac{B_{kWh}}{B_{kW}}$

where $B_{kWh}$ is the daily electrical energy consumed by the boiler and $B_{kW}$ is the rated boiler power, which relates to the boiler steam production $b$ in kg as given in the boiler specification table.

The steam loss could range from 5-20% of the steam produced [4]. In the current model, a stochastic representation of this loss as a uniform distribution between the minimum and maximum values was used. This was done to reduce the uncertainty of quantifying the steam loss in the several varying steam distribution networks.
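As an illustration, this conversion can be coded as below, using the rated figures of Table 3; the daily energy vector and all variable names are ours (in practice the energy is the difference of successive meter readings in Table 4):

```matlab
% Convert daily boiler energy (kWh) into steam delivered at the machines.
b_kg  = 3348;  B_kW = 2106;           % rated steam output (kg/hr) and power (kW), Table 3
B_kWh = [16848; 12636; 14742];        % hypothetical daily boiler energy, kWh
S_B   = b_kg * B_kWh / B_kW;          % total daily boiler steam S_B, in kg
% Stochastic transmission loss, uniformly distributed between 5% and 20% [4]:
lossFrac = 0.05 + (0.20 - 0.05)*rand(size(S_B));
S_M   = S_B .* (1 - lossFrac);        % steam delivered at the machines, kg
```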
Normalization and shuffling

It is reported in [5] that rescaling or normalization of training data improves the learning and convergence of a network. The normalization procedure used in this work adjusts the data so that they fall between a specified minimum and maximum, typically 0 and 1. This can be done with the transformation

$D_n = \dfrac{D - D_{min}}{D_{max} - D_{min}}$

where $D_{min}$ is the minimum of the input vectors in the data set, and $D_{max}$ is the maximum value. Practically, what this normalization does is shift the zero of the scale and normalize the spread of the data. Shuffling of these data was also done, to decrease the effect of the network learning one similar set of data at the expense of another.

Missing Data

Because of the limited data, we simply cannot afford to throw out missing data. Rather, two strategies were used, depending on whether the missing data was from the input or the output. When there was a missing input value, a flag marking it (either 1 or 0) was set and the missing component was replaced with the average value of the input data. When missing data was instead present at the output, the error performance was modified in such a way that the performance calculation was skipped for that particular data point, to nullify its contribution to the learning process.

Finally, the collected data was divided into two sets: training and testing. The training set made up 85% of the full data set, with testing making up the remaining 15%. Care was taken to make each of these sets representative of the full data set, i.e., that the test set covers the same region of the input space as the training set. For this, random selection of each set from the full data set was done.
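A minimal sketch of this pre-processing chain follows, with a hypothetical data matrix D (rows are days, columns are machines; NaN marks a missing input value; implicit expansion requires Matlab R2016b or later):

```matlab
% Missing-input handling: flag the gaps, then fill them with column averages.
D = [19700 27294 7413; 17276 12161 18325; 15500 NaN 0; 10484 8765 4088];
flag = isnan(D);                              % 1 where a value is missing
colAvg = repmat(mean(D, 1, 'omitnan'), size(D,1), 1);
D(flag) = colAvg(flag);

% Min-max normalization of each column into [0, 1]:
Dmin = min(D, [], 1);  Dmax = max(D, [], 1);
Dn = (D - Dmin) ./ (Dmax - Dmin);

% Shuffle the days, then take an 85% / 15% train-test split:
Q = size(Dn, 1);  idx = randperm(Q);
nTr = round(0.85 * Q);
trainSet = Dn(idx(1:nTr), :);
testSet  = Dn(idx(nTr+1:end), :);
```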
Choice of Network Architecture

The universally accepted network architecture for fitting problems is the multilayer perceptron [6-7]. It was shown in [8] that this standard configuration uses the tansig function in the hidden layers and a linear function in the output layer. The tansig function is preferred in the hidden layers because it produces outputs (which are inputs to the next layer) that are centered near zero, whereas a logsig function always produces positive outputs.

The choice of the optimum number of hidden units depends on many factors whose interactions are not easy to understand. These factors are the amount of training data, the number of input and output units, the level of generalization required from the network, the type of transfer function, and the training algorithm [6]. Conflicting trends are observed when the number of hidden units varies: too few leads to under-fitting, while too many results in over-fitting and a slow learning process. However, it is highly unlikely that more than two hidden layers are needed for a standard function approximation problem [8].

To fix the number of neurons in the hidden layer, different authors suggest rules of thumb from their experience. In [9] it is given as

$n = \sqrt{n_i + n_o} + a$

where $n$ is the number of hidden neurons, $n_i$ and $n_o$ are the numbers of input and output neurons, and $a$ is a constant between 1 and 10. Another work [10] suggested using

$N_h = N_p \sqrt{N_i + N_o}$

where $N_h$ is the number of hidden neurons, $N_p$ is the number of training samples, and $N_i$ and $N_o$ are the numbers of input and output neurons.

The authors strongly believe that the best way is to try multiple runs over a range of hidden layer counts, with different numbers of neurons in each layer, and observe the network performance; a small sketch of these rules is given below. For the current work, two hidden layers with ten neurons in each layer achieved the set performance criterion.
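For illustration, the two rules of thumb can be evaluated for this problem's dimensions (five machine inputs, one boiler output); the sample count is a hypothetical assumption of ours:

```matlab
% Hidden-neuron counts suggested by the two rules of thumb above.
ni = 5;  no = 1;                      % input and output neurons of this work
Np = 365;                             % hypothetical number of training samples
a  = 1:10;                            % the constant a in [9] ranges over 1..10
n1 = round(sqrt(ni + no) + a)         % rule of [9]: roughly 3 to 12 neurons
n2 = round(Np * sqrt(ni + no))        % rule of [10], as printed above
% The paper instead scans configurations directly (see Table 5):
configs = {[10 10], [25 10], [25 25], [50 50], [100 100]};
```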

Training the Network

Now that we have done the preliminary work and set the network architecture, it is time to get on to the training stage.
get on the training stage.
Weight Initialization

ANN weights should be initialized with small random values. Since the BP algorithm works on all the weights in a similar fashion, initializing these weights identically would eventually make all units learn in the same way; at the same time, small random values keep the network output in the region where the weight updates are largest [11]. In this work, effort has been made to make the performance of the final trained neural network independent of the choice of initial weight values. For that, several runs of the network with different initial weight values were performed, which resulted in similar performance.

Choice of Training Algorithm

For multilayer networks used for function approximation, the learning with momentum and gradient descent training algorithms provide a guaranteed minimization of the error performance function with a relatively fast convergence rate [12-13]. In this work, both of these algorithms were tested to check their validity for the task at hand.

Stopping Criteria

For the majority of practical neural networks, the training error never converges identically to zero. As a result, other criteria for deciding when to stop the training are generally considered. Several methods are reported in the literature, such as stopping when the performance index reaches a certain level, setting a high training iteration number, training for a fixed number of iterations and then restarting the training with the initial weights taken from the previous training, and stopping when the gradient of the performance index is sufficiently low [14-15]. For this work, a criterion of stopping when either the performance index is met or a large number of iterations is reached is implemented, for the simple reason that it met the practical requirements of the task.

Post-Training Analysis

Prior to concluding the work, analysis of the trained network to see whether the training was successful is necessary. A powerful method of doing this is to fit a regression line between the trained network outputs and the corresponding targets [8]. For that, we fit a linear function of the form

$a_q = m t_q + c + \varepsilon_q$

where $m$ and $c$ are the slope and offset, respectively, of the linear function, $t_q$ is a target value, $a_q$ is a trained network output, and $\varepsilon_q$ is the residual error of the regression. The terms in the regression can be computed as follows:
$\hat{m} = \dfrac{\sum_{q=1}^{Q}(t_q - \bar{t})(a_q - \bar{a})}{\sum_{q=1}^{Q}(t_q - \bar{t})^2}$

$\hat{c} = \bar{a} - \hat{m}\,\bar{t}$

where

$\bar{a} = \dfrac{1}{Q}\sum_{q=1}^{Q} a_q, \qquad \bar{t} = \dfrac{1}{Q}\sum_{q=1}^{Q} t_q$
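In Matlab these regression terms, and the correlation coefficient R used later, reduce to a few lines; t and a below are the 1-by-Q targets and network outputs, with hypothetical sample values:

```matlab
% Post-training regression of network outputs a on targets t.
t = rand(1, 50);  a = 0.97*t + 0.02*randn(1, 50);   % hypothetical data
abar = mean(a);  tbar = mean(t);
m_hat = sum((t - tbar).*(a - abar)) / sum((t - tbar).^2);   % slope
c_hat = abar - m_hat*tbar;                                  % offset
Rc = corrcoef(t, a);  R = Rc(1, 2);                         % R value
plot(t, a, 'o', t, m_hat*t + c_hat, '-', t, t, ':');        % as in Figure 7
xlabel('Target value'); ylabel('Network output');
```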
Table 4. Production rate of the steam-consuming machines (m²) vs. boiler meter reading (kWh), August 2016

Date       Bleaching  Washing  Calendaring  Sizing  Jigger    Boiler (kWh)
1-Aug-16   19700      27294    7413         13443   10600     38934
2-Aug-16   17276      12161    18325        14820   13090     38939
3-Aug-16   15500      23199    0            11154   11900     38942
4-Aug-16   10484      8765     4088         15187   8600      38947
5-Aug-16   13198      17699    15944        15275   6730      38950
6-Aug-16   22546      12974    19427        13622   7300      38954
7-Aug-16   0          4326     0            3654    800       0
8-Aug-16   0          3100     884          4163    3800      38454
9-Aug-16   8400       0        0            3760    4450      38960
10-Aug-16  1300       7923     0            8852    11950     38961
11-Aug-16  7300       13267    5500         13828   4800      38964
12-Aug-16  11950      18240    0            15654   14600     38968
13-Aug-16  28695      3700     3601         16306   72466.32  38972
14-Aug-16  16200      19671    0            15392   7300      0
15-Aug-16  13600      2986     7484         14709   17100     38980
16-Aug-16  4460       24151    0            14362   16000     38983
17-Aug-16  15137      20968    22084        14027   11900     38987
18-Aug-16  22000      13246    0            13042   11200     38992
19-Aug-16  23960      28312    0            13533   13200     38995
20-Aug-16  13620      19793    10306        12531   2000      39000
21-Aug-16  4200       3167     0            4538    500       0
22-Aug-16  9000       5730     10935        3769    4550      39005
23-Aug-16  9500       20940    26936        9029    5900      39007
24-Aug-16  8200       19855    17364        12430   12250     39011
25-Aug-16  17500      23378    192          13260   6500      39014
26-Aug-16  9880       19764    0            14060   9700      39018
27-Aug-16  0          7300     10944        12865   2100      39022
28-Aug-16  5150       3755     3760         9873    6300      0
29-Aug-16  0          0        1698         3017    0         39027
30-Aug-16  0          3900     0            0       0         0
31-Aug-16  0          0        0            0       0         0

A plot of this fit, used to gauge the performance of the proposed ANNs, is discussed in the results section.

Results and Discussions

A Matlab script file implementing both the learning with momentum and resilient gradient variants of the BP algorithm was written. This code was run for different learning rates and varying numbers of hidden neurons, and the regression coefficient R and the Mean Square Error (MSE) were compared. As can be seen from Table 5, the resilient gradient method shows superior performance as the complexity of the neural network increases. On the other hand, learning with momentum gives a fair result only for the simpler network configurations.

Next, we consider the performance of the best resilient configuration. Figure 7 shows the regression analysis, where the solid line represents the linear regression, the thin dotted line the perfect match, and the circles the data points. From this figure it is possible to see that the match is good, although not perfect. A few points seem to diverge from the regressed line. This might arise from the presence of an incorrect data point, or because such data lies far from the other training points. The latter is the case here, since the data used is not representative of the whole input space; analysis of the scatter plot shown in Figure 6 clearly illustrates this.

Table 5. Gradient descent vs. learning with momentum: architectures and results (R and MSE per hidden-layer configuration)

Hidden neurons:               [10 10]        [25 10]        [25 25]         [50 50]         [100 100]
Algorithm / learning rate     R      MSE     R      MSE     R      MSE      R      MSE      R       MSE
Gradient descent, η1 = .1     .8458  .0396   .5540  .0335   .6761  .0356    .8545  .0312    .8592   .0579
Gradient descent, η2 = .15    .5758  .0429   .8220  .0591   .6574  .0428    .9428  .0661    .8113   .0687
Gradient descent, η3 = .2     .5802  .0312   .8712  .0460   .7647  .0814    .8464  .0884    .9745   .0697
Momentum, η1 = 1              .0358  .0376   .2248  .0411   .2823  .0349    .0570  .0885    -.1493  .0891
Momentum, η2                  .0289  .0353   .0653  .0367   -.1347 .0957    -.0771 .0996    .0059   .0950
Momentum, η3                  .2569  .0335   .2881  .0337   -.3239 .0955    -.0935 .0787    -.0257  .0863
Figure 6. Scatter graph of the normalized input data (washing, calendering, jigger, ...), with both axes scaled to [0, 1].

The addition of points that span the whole data space would improve the generalization capability of the proposed neural network. Additionally, the correlation coefficient between the estimated and target values, the R value, was computed. The R value varies from -1 to 1; for prediction applications of the BP algorithm it should be close to 1. R = 1 means all the data points lie exactly on the regression line, while an R near 0 means they are randomly scattered away from it. For this case, as can be seen from Figure 7, the data does not fall exactly on the regression line, but the variation is very small.

Figure 7. Regression results for the best BP

The MSE and the plot of the MLP output vs. target values are given in Figure 8 and Figure 9, respectively.

Figure 8. MSE of the best MLP

Figure 9. Trained vs. target output

Table 6. Textile machine steam consumption rates

Machine                     Bleaching  Washing  Calendaring  Jigger   Sizing
Steam consumption (kg/kg)   0.6-0.9    0.7-1.1  0.8-1.4      1.2-4.5  7.8-
Table 6 gives the final result, i.e., the steam consumption rate of each textile machine. For this, the average production value of each machine was presented to the trained network as input. The output value is given as a range because of the stochastic representation of the steam loss and the random weight initialization used.
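A sketch of how such a range can be produced: query the trained network repeatedly with a machine's average production while redrawing the stochastic loss term. Here predictSteam is a hypothetical stand-in for the trained model of this work, not the actual network:

```matlab
% Monte-Carlo style query of the trained network to obtain a consumption range.
predictSteam = @(p) 0.8*p .* (1 + 0.15*(rand(size(p)) - 0.5)); % stand-in model
pAvg = 12000;                              % a machine's average daily production
est  = predictSteam(pAvg * ones(1, 200));  % repeated stochastic predictions
rate = est / pAvg;                         % per-unit steam consumption
consumptionRange = [min(rate), max(rate)]  % reported as a range, as in Table 6
```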
References

1. Mulu Gebreeyesus, "Industrial policy and development in Ethiopia: Evolution and present experimentation", Working Paper No. 6, The Brookings Institution.
2. "An overview of facts and opportunities of Ethiopian textile industry", Ethiopian Textile Industry Development Institute (ETIDI), October 2014.
3. "An abstract to Ethiopia's Textile Chemical Processing/Finishing Industry", Ethiopian Textile Industry Development Institute, Finishing Technology Directorate.
4. "Energy Audit of Bahir Dar Textile Share Company, Ethiopia", The Energy and Resources Institute, Bangalore, Project Report No. 2013IB22, 53 pp., 2014.
5. J. Jiang, J. Zhang, G. Yang, D. Zhang and L. Zhang, "Application of Back Propagation Neural Network in the Classification of High Resolution Remote Sensing Image: Take Remote Sensing Image of Beijing for Instance", in Proceedings of the 18th International Conference on Geoinformatics, IEEE, 2010, pp. 1-6.
6. M. Alsmadi, K. Omar and S. Noah, "Back Propagation Algorithm: The Best Algorithm among the Multi-layer Perceptron Algorithm", IJCSNS International Journal of Computer Science and Network Security, vol. 9, no. 4, pp. 378-383, 2009.
7. K. M. Hornik, M. Stinchcombe and H. White, "Multilayer feedforward networks are universal approximators", Neural Networks, vol. 2, no. 5, pp. 359-366, 1989.
8. M. T. Hagan and H. B. Demuth, Neural Network Design, 2nd Edition.
9. S. Afzal and M. A. Wani, "Comparative Study of Adaptive Learning Rate with Momentum and Resilient Back Propagation Algorithms for Neural Net Classifier Optimization".
10. M. A. Wahed, "Adaptive learning rate versus Resilient back propagation for numeral recognition", Journal of Al-Anbar University for Pure Science, pp. 94-105, 2008.
11. D. Nguyen and B. Widrow, "Improving the learning speed of 2-layer neural networks by choosing initial values of the adaptive weights", Proceedings of the IJCNN, vol. 3, pp. 21-26, July 1990.
12. E. Barnard, "Optimization for training neural nets", IEEE Transactions on Neural Networks, vol. 3, no. 2, pp. 232-240, 1992.
13. T. P. Vogl, J. K. Mangis, A. K. Zigler, W. T. Zink and D. L. Alkon, "Accelerating the convergence of the backpropagation method", Biological Cybernetics, vol. 59, pp. 256-264, Sept. 1988.
14. W. S. Sarle, "Stopped training and other remedies for overfitting", in Proceedings of the 27th Symposium on the Interface, 1995.
15. C. Wang, S. S. Venkatesh and J. S. Judd, "Optimal Stopping and Effective Machine Complexity in Learning", in Advances in Neural Information Processing Systems (J. D. Cowan, G. Tesauro and J. Alspector, Eds.), vol. 6, pp. 303-310, 1994.
