…available related data through an ANN. These related data, the daily production from the steam consuming machines and a kWh meter reading, are modeled with a multilayer feedforward neural network. The work has taken into consideration all of these issues.

The share of the textile sub-industry in terms of GDP and industrial output by value is 1.6% and 12.4%, respectively [2]. Table 1 and Figure 1 show the breakdown of the sub-industry. The boilers consume furnace oil, diesel, or coal as a source of energy, with the exception of two factories which partially run electric boilers. Currently about 8,112,000 tons of coal and 10,256,803 liters of furnace oil are utilized every year by the sixteen processing units in the country. In addition, about 70,607,459 kWh of electrical energy and 35,344,417 m³ of water were consumed by the sub-industry in 2011 [3].

The factory produces textile products including yarns and fabrics. It was established in 1961 from the fund of the Italian war reparation, in the town of Bahir Dar, 570 km north-west of Addis Ababa, Ethiopia. The factory uses a substantial quantity of cotton as raw material and supplies its products to the local market.
[Figure 1. Distribution of textile processing units by sub-sector: handloom dominates at 54%, followed by integrated mills (18%), weaving/knitting (11%), and spinning (4%), with garment, dyeing and printing, and blanket factories making up the remainder (3-5% each).]
Table 2. Monthly active and reactive power consumption of the electric boiler and the factory, power factor, and unit energy cost, 2016.

Month  | Boiler Active (kWh) | Boiler Reactive (kVAr) | Factory Active (kWh) | Factory Reactive (kVAr) | Boiler pf | Factory pf | Unit Cost (Birr/kWh)
Jan-16 | 126 | 98 | 193 | 139 | 0.66 | 0.62 | 0.453
Feb-16 | 82  | 69 | 200 | 142 | 0.70 | 0.62 | 0.453
Mar-16 | 85  | 77 | 213 | 155 | 0.74 | 0.63 | 0.453
Apr-16 | 90  | 79 | 202 | 149 | 0.72 | 0.64 | 0.453
May-16 | 75  | 68 | 180 | 135 | 0.74 | 0.64 | 0.453
Jun-16 | 103 | 91 | 200 | 149 | 0.72 | 0.64 | 0.453
Jul-16 | 88  | 79 | 189 | 142 | 0.73 | 0.64 | 0.453
Aug-16 | 99  | 92 | 161 | 128 | 0.75 | 0.67 | 0.453
Sep-16 | 62  | 58 | 131 | 112 | 0.75 | 0.71 | 0.453
Oct-16 | 93  | 85 | 200 | 147 | 0.74 | 0.63 | 0.453
Nov-16 | 95  | 86 | 212 | 156 | 0.74 | 0.63 | 0.453
Dec-16 | 93  | 85 | 207 | 153 | 0.74 | 0.64 | 0.453
The factory was selected for this study due to the availability of data relating to the production process and an existing steam electric boiler, which is a prerequisite for further application of the results of this work.

The electrical energy consumption at Bahir Dar Textile comprises the factory and an electric steam boiler. Monthly active and reactive power of the two was collected from each meter reading for the year 2016. Table 2 depicts the detailed characteristics.
[Figure: production by textile sub-section (spinning, weaving, finishing, and garment).]
…and gives the final value. A typical MLP neural network is depicted in Figure 2.

In an MLP, training is carried out on examples prior to the usage as a useful network. This training iteratively adjusts the connection weights and biases using known training data. To facilitate the training, the outputs from the network are compared to the target examples; the measure of this mismatch is known as the error performance index (PI). The error is then propagated back through the network to adjust the weights and biases until an acceptable PI is achieved. The final stage of the network implementation involves fixing the adaptive weights and biases at the last values of the training stage. The network then computes the output directly, giving an estimated value for the inputs.

The MLP Architecture and the Back Propagation (BP) Algorithm

It will simplify our development of the back propagation algorithm if we use the abbreviated notation for the multilayer network, stated as follows:

Scalars: small italic letters (a, b, c)
Vectors: small bold non-italic letters (a, b, c)
Matrices: capital bold non-italic letters (A, B, C)

The three-layer network in abbreviated notation is shown in Figure 4. For multilayer networks, the output of one layer becomes the input to the following layer. The equation that describes this operation is

a^{m+1} = f^{m+1}(W^{m+1} a^{m} + b^{m+1}), \quad m = 0, 1, \ldots, M-1    (1.1)

where M is the number of layers in the network. The neurons in the first layer receive external inputs:

a^{0} = p    (1.2)

which provides the starting point for Eq. (1.1). The outputs of the neurons in the last layer are considered the network outputs:

a = a^{M}    (1.3)
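To make the recursion concrete, Eqs. (1.1)-(1.3) can be written as a short Matlab function. This is a minimal sketch under the notation above, not the authors' script; the transfer functions are passed in as handles (e.g., @(n) tanh(n) for a sigmoid hidden layer and @(n) n for a linear output layer).

    function a = mlp_forward(W, b, f, p)
    % Propagate input p through an M-layer network, Eqs. (1.1)-(1.3).
    % W, b, f: cell arrays of layer weights, biases and transfer functions.
    a = p;                           % a^0 = p, Eq. (1.2)
    for m = 1:numel(W)
        a = f{m}(W{m}*a + b{m});     % a^{m+1} = f^{m+1}(W^{m+1} a^m + b^{m+1})
    end                              % on exit, a = a^M, Eq. (1.3)
    end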
The algorithm is provided with a set of examples of proper network behavior:

\{p_1, t_1\}, \{p_2, t_2\}, \ldots, \{p_Q, t_Q\}    (1.4)

where p_q is an input to the network and t_q is the corresponding target output. We will approximate the mean square error by the squared error of the current iteration k,

\hat{F} = (t(k) - a(k))^{T} (t(k) - a(k))    (1.5)

The steepest descent algorithm for the approximate mean square error (stochastic gradient descent) is

w_{i,j}^{m}(k+1) = w_{i,j}^{m}(k) - \alpha \frac{\partial \hat{F}}{\partial w_{i,j}^{m}}    (1.6)

b_{i}^{m}(k+1) = b_{i}^{m}(k) - \alpha \frac{\partial \hat{F}}{\partial b_{i}^{m}}    (1.7)

where \alpha is the learning rate.

Because the error is an indirect function of the weights in the hidden layers, we will use the chain rule of calculus to calculate the derivatives. To review the chain rule, suppose that we have a function f that is an explicit function only of the variable n. We want to take the derivative of f with respect to a third variable w.
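As a concrete instance of this review (a standard textbook example in the spirit of [8], not data from this work): let f(n) = e^{n} and n(w) = 2w, so that f(n(w)) = e^{2w}. The chain rule then gives

\frac{d f(n(w))}{d w} = \frac{d f(n)}{d n} \cdot \frac{d n(w)}{d w} = e^{n} \cdot 2 = 2 e^{2w}

which matches differentiating e^{2w} directly.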
[Figure 4. Three-layer network in abbreviated notation: the R-element input p feeds layer 1; each layer m forms n^{m} = W^{m} a^{m-1} + b^{m} and outputs a^{m} = f^{m}(n^{m}), the network output being a^{3}.]
\frac{\partial n_i^{m+1}}{\partial n_j^{m}} = w_{i,j}^{m+1} \frac{\partial f^{m}(n_j^{m})}{\partial n_j^{m}} = w_{i,j}^{m+1} \dot{f}^{m}(n_j^{m})    (1.22)

s^{m} = \frac{\partial \hat{F}}{\partial n^{m}} = \left( \frac{\partial n^{m+1}}{\partial n^{m}} \right)^{T} \frac{\partial \hat{F}}{\partial n^{m+1}} = \dot{F}^{m}(n^{m}) (W^{m+1})^{T} \frac{\partial \hat{F}}{\partial n^{m+1}} = \dot{F}^{m}(n^{m}) (W^{m+1})^{T} s^{m+1}

Trained this way, the basic BP algorithm has two major limitations: a long learning time and the possibility of local minima [6,8].

Variants of the BP algorithm

1. Learning with momentum

\Delta W^{m}(k) = \gamma \Delta W^{m}(k-1) - (1-\gamma) \alpha s^{m} (a^{m-1})^{T}

\Delta b^{m}(k) = \gamma \Delta b^{m}(k-1) - (1-\gamma) \alpha s^{m}
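In Matlab, this update can be sketched as below. This is a minimal illustration, not the authors' script: the layer outputs are stored so that a{1} holds a^0 = p, and the sensitivities s{m} are assumed to come from the backpropagation recurrence above.

    function [W, b, dW, db] = momentum_step(W, b, dW, db, s, a, alpha, gamma)
    % Apply the momentum form of Eqs. (1.6)-(1.7) to every layer.
    % W, b   : cell arrays of weights and biases
    % dW, db : previous updates (initialized to zeros)
    % s      : layer sensitivities; a : layer outputs, with a{1} = p (= a^0)
    for m = 1:numel(W)
        dW{m} = gamma*dW{m} - (1 - gamma)*alpha*s{m}*a{m}';
        db{m} = gamma*db{m} - (1 - gamma)*alpha*s{m};
        W{m}  = W{m} + dW{m};    % take the smoothed descent step
        b{m}  = b{m} + db{m};
    end
    end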
Thus, by using momentum we can use a larger learning rate while maintaining the stability of the algorithm, and accelerate convergence when the trajectory is moving in a consistent direction.

2. Resilient propagation algorithm

In this variant of the BP algorithm, only the sign of the derivative is used to determine the weight update value. The implementation of this algorithm follows the rule below (a sketch of the rule is given after this list):

a) If the partial derivative of the corresponding weight has the same sign for two consecutive iterations, the weight update value is increased by a factor η+; otherwise, the weight update value is decreased by a factor η−. If the derivative is zero, the weight update value remains the same.
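A per-weight Matlab sketch of this rule follows. The increase and decrease factors (η+ = 1.2, η− = 0.5) and the step-size bounds are common choices from the resilient propagation literature, assumed here rather than taken from the paper.

    function [W, delta, gPrev] = rprop_step(W, g, gPrev, delta)
    % One resilient-propagation update for weight matrix W.
    % g: current gradient; gPrev: previous gradient; delta: step sizes.
    etaPlus = 1.2; etaMinus = 0.5;          % assumed typical factors
    deltaMax = 50; deltaMin = 1e-6;         % step-size bounds
    same = (g .* gPrev) > 0;                % same sign on two iterations
    flip = (g .* gPrev) < 0;                % sign changed
    delta(same) = min(delta(same)*etaPlus,  deltaMax);
    delta(flip) = max(delta(flip)*etaMinus, deltaMin);
    % zero product: delta is left unchanged
    W = W - sign(g).*delta;                 % step against the gradient sign
    gPrev = g;
    gPrev(flip) = 0;                        % suppress adaptation next step
    end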
The diagram in Figure 5 depicts the ANN training procedure followed. This procedure is a continuous iterative process, starting from the data collection and preprocessing stage, aimed at achieving a more efficient neural network training. At this first step, the data were partitioned into training and testing sets. Following this, the selection of a suitable network type (here, a multilayer network) and architecture (e.g., number of hidden layers, number of nodes in these layers) was done. Then the choice of an appropriate training algorithm, from the multitude of available paradigms, was carried out to handle the task. Finally, once the ANN is trained, an analysis to determine the network performance was done. This last stage revealed some practical issues with the data, the network architecture, and the training algorithm, which were dealt with as illustrated in the practical section.

[Figure 5. Flowchart of the ANN training procedure: collect and preprocess data, select network type and architecture, choose a training algorithm, implement and train the network, and analyze performance.]
Actual daily production from all steam consuming machines was collected for the year 2016 at the Bahir Dar textile factory. Part of these data is shown in Table 4 for August 2016.

The aim of this step is to lay a conducive ground for better network training. Though several data pre-processing steps exist in the literature, this work used feature extraction, normalization, and handling of missing data.
The total boiler steam produced relates to the steam delivered to the machines by

S_B = S_M + S_{loss}

where S_M is the steam delivered, S_B is the total boiler steam produced, and S_{loss} is the steam transmission loss.

The total daily steam produced by the boiler can be determined from

S_B = b \times \frac{B_{KWh}}{B_{KW}}

where B_{KWh} is the daily electrical energy consumed by the boiler and B_{KW} is the rated boiler power, which relates to the boiler steam production b in kg as given in the boiler specification table.

The steam loss could range from 5-20% of the steam produced [4]. In the current model, a stochastic representation of this loss as a uniform distribution between these minimum and maximum values was used. This was done to reduce the uncertainty of quantifying the steam loss in the several varying steam distribution networks.
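A few Matlab lines sketch this estimate under the stated assumptions; the boiler figures used below are placeholders standing in for the boiler specification table, not values from the paper.

    % Daily boiler steam with a stochastic 5-20% transmission loss [4].
    B_KWh = 850;                      % daily boiler energy, kWh (placeholder)
    B_KW  = 100;                      % rated boiler power, kW (placeholder)
    b     = 1500;                     % rated steam production, kg (placeholder)
    S_B   = b * B_KWh / B_KW;         % total daily steam produced, kg
    lossFrac = 0.05 + (0.20 - 0.05)*rand;   % uniform draw over the 5-20% range
    S_M   = S_B * (1 - lossFrac);     % steam delivered to the machines, kg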
Normalization and shuffling

It is reported in [5] that rescaling or normalization of the training data improves the learning and convergence of a network. The normalization procedure used in this work adjusts the data so that they fall into a specified range, here 0 to 1, according to

D_n = \frac{D - D_{min}}{D_{max} - D_{min}}

where D_{min} is the minimum of the input vectors in the data set and D_{max} is the maximum value. Practically, what this normalization does is shift the zero of the scale and normalize the spread of the data. The data were also shuffled, to reduce the tendency of the network to learn one group of similar data at the expense of another.

Missing Data

Because of the limited data, we cannot afford to simply throw out records with missing values. Instead, two strategies were used, depending on whether the missing value was an input or an output. When an input value was missing, a flag (either 1 or 0) was set to mark it, and the missing component was replaced with the average value of that input over the data set. When a value was missing at the output, the error performance was modified so that the performance calculation was skipped for that particular example, nullifying its contribution to the learning process.

Finally, the collected data were divided into two sets: training and testing. The training set made up 85% of the full data set, with testing making up the remaining 15%. Care was taken to make each of these sets representative of the full data set, i.e., that the test set covers the same region of the input space as the training set. To this end, each set was drawn from across the full data set.
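The preprocessing chain (normalization, shuffling, and the 85/15 split) can be sketched in a few Matlab lines; the data matrix below is a placeholder, while the split fractions follow the text.

    % D: one example per row (input columns followed by the target column).
    D  = rand(300, 6);                        % placeholder data set
    Dn = (D - min(D)) ./ (max(D) - min(D));   % min-max normalization per column
    Dn = Dn(randperm(size(Dn, 1)), :);        % shuffle the examples
    nTrain   = round(0.85 * size(Dn, 1));     % 85% for training
    trainSet = Dn(1:nTrain, :);
    testSet  = Dn(nTrain + 1:end, :);         % remaining 15% for testing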
Choice of Network Architecture

The universally accepted network architecture for fitting problems is the multilayer perceptron [6-7]. It was shown in [8] that this standard neural configuration uses the tansig function in the hidden layers and a linear function in the output layer. The choice of the optimum number of hidden units depends on many factors whose interactions are not easy to understand. These factors include the amount of training data, the number of input and output units, the level of generalization required from the network, and the learning process. However, it is highly unlikely that more than two hidden layers are needed for a standard function approximation problem [8].

To fix the number of neurons in the hidden layer, different authors suggest rules of thumb from their experience. In [9] it is given as

n = \sqrt{n_i + n_o} + a

where n is the number of hidden neurons, n_i and n_o are the numbers of neurons in the input and output, and a is a constant between 1 and 10. Another work [10] suggested a different rule. The authors strongly believe that the best way is to try multiple runs over a range of hidden layer counts, with different numbers of neurons in each layer, and observe the network performance. For the current work, two hidden layers with ten neurons in each layer achieved the set performance target.
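For illustration (an assumption for the sake of the example, since the exact input layout is set by the data): if the five machine production figures of Table 4 were the inputs (n_i = 5) and the boiler energy the single output (n_o = 1), the rule of [9] gives n = \sqrt{5 + 1} + a \approx 2.4 + a, i.e., roughly 3 to 12 hidden neurons as a ranges from 1 to 10, which is of the same order as the ten neurons per layer adopted here.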
Table 4. Daily production of the steam consuming machines (m², first five data columns) and boiler meter reading (kWh, last column), August 2016.

Date       Bleaching  Washing  Calendaring  Sizing  Jigger  Boiler (kWh)
1-Aug-16 19700 27294 7413 13443 10600 38934
2-Aug-16 17276 12161 18325 14820 13090 38939
3-Aug-16 15500 23199 0 11154 11900 38942
4-Aug-16 10484 8765 4088 15187 8600 38947
5-Aug-16 13198 17699 15944 15275 6730 38950
6-Aug-16 22546 12974 19427 13622 7300 38954
7-Aug-16 0 4326 0 3654 800 0
8-Aug-16 0 3100 884 4163 3800 38454
9-Aug-16 8400 0 0 3760 4450 38960
10-Aug-16 1300 7923 0 8852 11950 38961
11-Aug-16 7300 13267 5500 13828 4800 38964
12-Aug-16 11950 18240 0 15654 14600 38968
13-Aug-16 28695 3700 3601 16306 72466.32 38972
14-Aug-16 16200 19671 0 15392 7300 0
15-Aug-16 13600 2986 7484 14709 17100 38980
16-Aug-16 4460 24151 0 14362 16000 38983
17-Aug-16 15137 20968 22084 14027 11900 38987
18-Aug-16 22000 13246 0 13042 11200 38992
19-Aug-16 23960 28312 0 13533 13200 38995
20-Aug-16 13620 19793 10306 12531 2000 39000
21-Aug-16 4200 3167 0 4538 500 0
22-Aug-16 9000 5730 10935 3769 4550 39005
23-Aug-16 9500 20940 26936 9029 5900 39007
24-Aug-16 8200 19855 17364 12430 12250 39011
25-Aug-16 17500 23378 192 13260 6500 39014
26-Aug-16 9880 19764 0 14060 9700 39018
27-Aug-16 0 7300 10944 12865 2100 39022
28-Aug-16 5150 3755 3760 9873 6300 0
29-Aug-16 0 0 1698 3017 0 39027
30-Aug-16 0 3900 0 0 0 0
31-Aug-16 0 0 0 0 0 0
\hat{m} = \frac{\sum_{q=1}^{Q} (t_q - \bar{t})(a_q - \bar{a})}{\sum_{q=1}^{Q} (t_q - \bar{t})^2}

\hat{c} = \bar{a} - \hat{m}\,\bar{t}

where

\bar{a} = \frac{1}{Q} \sum_{q=1}^{Q} a_q, \qquad \bar{t} = \frac{1}{Q} \sum_{q=1}^{Q} t_q
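These quantities are direct to compute; a small Matlab sketch with placeholder target and output vectors is:

    % Regression of network outputs a on targets t, and the R value.
    t = rand(50, 1);                          % placeholder targets
    a = t + 0.05*randn(50, 1);                % placeholder network outputs
    tbar = mean(t);  abar = mean(a);
    m_hat = sum((t - tbar).*(a - abar)) / sum((t - tbar).^2);   % slope
    c_hat = abar - m_hat*tbar;                                  % intercept
    R = sum((t - tbar).*(a - abar)) ...
        / sqrt(sum((t - tbar).^2) * sum((a - abar).^2));        % correlation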
A plot of this fitting, used to gauge the performance of the proposed ANN, is discussed in the results section.
Results and Discussions

A Matlab script for the implementation of both the learning with momentum and the resilient gradient variants of the BP algorithm was written. This code was run for different learning rates and varying numbers of hidden neurons, and the regression coefficient R and the Mean Square Error (MSE) were compared. As can be seen from Table 5, the resilient gradient method shows superior performance as the complexity of the neural network increases. On the other hand, the learning with momentum gives a fair result only for the simpler network configurations.

Next we consider the performance of the best resilient configuration. Figure 7 shows the regression analysis, where the solid line represents the linear regression, the thin dotted line represents the perfect match, and the circles represent the data points. From this figure it is possible to see that the match is good, although not perfect. A few points seem to diverge from the regressed line. This might arise from the presence of an incorrect data point, or because the data lies far from the other training points. The latter is the case here, since the data used is not representative of the whole input space; analysis of the scatter plot shown in Figure 6 clearly illustrates this.
Addition of points that span the whole data space would improve the generalization capability of the proposed neural network. Additionally, the correlation coefficient between the estimated and target values, the R value, was computed. The R value varies from -1 to 1 and should be close to 1 for prediction applications of the BP algorithm: R = 1 means all the data points lie exactly on the regression line, while an R near 0 means they are scattered with no linear relation to the regression line.
References

1. M. Gebreeyesus, "Industrial policy and development in Ethiopia: Evolution and present experimentation," Working Paper No. 6, The Brookings Institution.
2. Ethiopian Textile Industry Development Institute (ETIDI), "An overview of facts and opportunities of Ethiopian textile industry," October 2014.
3. Ethiopian Textile Industry Development Institute, Finishing Technology Directorate, "An abstract to Ethiopia's Textile Chemical Processing/Finishing Industry."
4. The Energy and Resources Institute, "Energy Audit of Bahir Dar Textile Share Company, Ethiopia," Project Report No. 2013IB22, Bangalore, 53 pp., 2014.
5. J. Jiang, J. Zhang, G. Yang, D. Zhang, and L. Zhang, "Application of Back Propagation Neural Network in the Classification of High Resolution Remote Sensing Image: Take Remote Sensing Image of Beijing for Instance," in Proc. 18th International Conference on Geoinformatics, IEEE, 2010, pp. 1-6.
6. M. Alsmadi, K. Omar, and S. Noah, "Back Propagation Algorithm: The Best Algorithm among the Multi-layer Perceptron Algorithm," IJCSNS International Journal of Computer Science and Network Security, vol. 9, no. 4, pp. 378-383, 2009.
7. K. M. Hornik, M. Stinchcombe, and H. White, "Multilayer feedforward networks are universal approximators," Neural Networks, vol. 2, no. 5, pp. 359-366, 1989.
8. M. T. Hagan and H. B. Demuth, Neural Network Design, 2nd ed.
9. S. Afzal and M. A. Wani, "Comparative Study of Adaptive Learning Rate with Momentum and Resilient Back Propagation Algorithms for Neural Net Classifier Optimization."
10. M. A. Wahed, "Adaptive learning rate versus Resilient back propagation for numeral recognition," Journal of Al-Anbar University for Pure Science, pp. 94-105, 2008.
11. D. Nguyen and B. Widrow, "Improving the learning speed of 2-layer neural networks by choosing initial values of the adaptive weights," in Proc. IJCNN, vol. 3, pp. 21-26, July 1990.
12. E. Barnard, "Optimization for training neural nets," IEEE Transactions on Neural Networks, vol. 3, no. 2, pp. 232-240, 1992.
13. T. P. Vogl, J. K. Mangis, A. K. Zigler, W. T. Zink, and D. L. Alkon, "Accelerating the convergence of the backpropagation method," Biological Cybernetics, vol. 59, pp. 256-264, Sept. 1988.
14. W. S. Sarle, "Stopped training and other remedies for overfitting," in Proc. 27th Symposium on the Interface, 1995.
15. C. Wang, S. S. Venkatesh, and J. S. Judd, "Optimal Stopping and Effective Machine Complexity in Learning," in Advances in Neural Information Processing Systems, vol. 6, J. D. Cowan, G. Tesauro, and J. Alspector, Eds., pp. 303-310, 1994.