
electronics

Article
A Prediction Methodology of Energy Consumption
Based on Deep Extreme Learning Machine and
Comparative Analysis in Residential Buildings
Muhammad Fayaz and DoHyeun Kim *
Department of Computer Engineering, Jeju National University, Jejusi 63243, Jeju Special Self-Governing
Province, Korea; hamaz_khan@yahoo.com
* Correspondence: kimdh@jejunu.ac.kr; Tel.: +82-64-754-3658

Received: 6 August 2018; Accepted: 18 September 2018; Published: 28 September 2018 

Abstract: In this paper, we have proposed a methodology for energy consumption prediction in
residential buildings. The proposed method consists of four different layers, namely data acquisition,
preprocessing, prediction, and performance evaluation. For experimental analysis, we have collected
real data from four multi-storied residential buildings. The collected data are provided as input to
the acquisition layer. In the pre-processing layer, several data cleaning and preprocessing schemes
were deployed to remove abnormalities from the data. In the prediction layer, we have used the deep
extreme learning machine (DELM) for energy consumption prediction. Further, we have also used the
adaptive neuro-fuzzy inference system (ANFIS) and artificial neural network (ANN) in the prediction
layer. In the DELM, different numbers of hidden layers, different numbers of hidden neurons, and various
types of activation functions have been used to achieve the optimal structure of the DELM for energy
consumption prediction. Similarly, in the ANN, we have employed different combinations of hidden
neurons with different types of activation functions to get the optimal structure of the ANN. To obtain the
optimal structure of the ANFIS, we have employed different numbers and types of membership functions.
In the performance evaluation layer for the comparative analysis of three prediction algorithms,
we have used the mean absolute error (MAE), root mean square error (RMSE) and mean absolute
percentage error (MAPE). The results indicate that the performance of the DELM is far better than that of
the ANN and ANFIS for one-week and one-month hourly energy prediction on the given data.

Keywords: energy; prediction; residential building; machine learning algorithms; consumption

1. Introduction
The energy consumption in residential buildings has significantly increased in the last decade.
Energy is an essential part of our lives and almost all things in some way are associated with
electricity [1,2]. According to the report issued by the US Energy Information Administration (EIA),
global energy demand may grow by 28% by 2040 [3]. Due to improper usage, a tremendous
amount of energy is wasted annually; hence, energy wastage can be avoided by efficient utilization of
energy. Smart solutions are required to ensure the proper use of energy [4]. An energy consumption
prediction is very significant to achieve efficient energy maintenance and reduce environmental
effect [5–7]. However, prediction in residential buildings is quite challenging, as there are many types of
buildings and different forms of energy. Also, many factors influence the energy
behaviour of the building structures, such as weather circumstances, the physical material used in
the building construction, occupant behaviour, sub-level systems, i.e., lighting, heating, ventilating,
and air-conditioning (HVAC) systems, and the execution and routines of the sub-level components [8].
Technologies based on the Internet of Things (IoT) are immensely significant to comprehend
the notion of smart homes. Numerous solutions for obtaining energy consumption predictions in

buildings based on the IoT can be found in the literature [9]. Energy management and efficiency is the
next most crucial area for IoT applications in South Korea [9]. Since 2003, homes in this country have
been getting smarter and smarter with the inclusion of remote communication devices. The energy
demand in South Korea is growing day by day; in 2013 South Korea was the eighth largest energy
consuming country. The energy consumption in South Korea is distributed between the residential and
commercial sectors (38%), the industrial sector (55%), the transport sector (1%), and the public sector
(6%) [10], as shown in Figure 1.

Figure 1. Annual energy consumption distribution in the different zones of South Korea [10].
Many solutions based on machine learning algorithms have been developed for energy consumption
prediction. These models use historical data, which reflect the behaviour of the process to be
modelled [11,12]. The machine learning techniques that have been used repeatedly for prediction
purposes are artificial neural networks [7], the adaptive neuro-fuzzy inference system (ANFIS) [13],
the support vector machine (SVM) [14], the extreme learning machine (ELM) [15], and so forth.
The ELM method has some advantages over conventional NNs: it is easy to use, learns quickly,
provides good generalization results, can easily be applied, and can attain the least training inaccuracy
and a minimum norm of weights [16]. Nowadays, deep learning approaches have also been used in
various areas for prediction purposes [17], such as the deep neural network, the deep belief network,
and the recurrent neural network. The term deep learning refers to the number of layers through which
data are transferred [18]. Deep learning techniques are powerful tools for obtaining better modelling
and prediction performance. The datasets used in References [18–20] for time series prediction
applications are not large compared to the datasets in research areas such as image processing,
speech recognition, and machine vision. Even so, in these applications the deep learning methods
worked efficiently compared to conventional machine learning approaches due to their slightly deeper
architectures and novel learning methods.

In this paper, we have proposed a methodology for energy prediction having four layers, i.e.,
the data acquisition layer, the pre-processing layer, the prediction layer, and the performance evaluation
layer. We have performed different operations on the data in each layer of the proposed model. In the
prediction layer, we used the deep extreme learning machine (DELM) approach to improve the
performance of energy consumption prediction. The DELM takes the benefits of both extreme learning
and deep learning techniques. The DELM increases the number of hidden layers in the original ELM
network structure, arbitrarily initializes the input layer weights and the initial hidden layer weights
along with the bias of the initial hidden layer, uses this technique to calculate the parameters of the
remaining hidden layers (excluding the first hidden layer), and finally uses the least square technique
to calculate the output network weights. We have used the trial and error method to set the best number
of hidden layers, a suitable number of neurons in the hidden layers, and a compatible activation
function. A performance evaluation of the proposed DELM model against an adaptive neuro-fuzzy
inference system (ANFIS) and an artificial neural network (ANN) regarding energy consumption
prediction was carried out.

The rest of the paper is organized as follows. The related work is given in Section 2, and a detailed
explanation of the proposed model, comprising the data acquisition, preprocessing, prediction, and
performance evaluation modules, is given in Section 3. Section 4 discusses the experimental results
based on prediction algorithms in detail. The discussion and comparative analysis explanation are
provided in Section 5. The paper conclusion and future work are discussed in Section 6.

2. Related Work
Energy is an extremely vital resource and its demand is growing day-by-day. Saving energy
is not only significant to promote a green atmosphere for future sustainability but also vital for
household consumers and the energy production corporations. Electricity affects the user’s regular
expenses, and the user always wants to decrease their monthly bill. Energy production companies
are always under intense pressure to fulfil the growing energy demand of the commercial and
domestic sectors. Techniques for proficient energy consumption prediction are essential for all
stakeholders. Many researchers have made numerous efforts and developed several methods for
energy consumption prediction.
Generally, we can achieve more accurate results by using machine learning in different real-world
applications. Kalogirou [21] applied a back propagation neural network for the prediction of the
required heating load in buildings. The training of the algorithm was carried out on the energy
consumption data of 225 buildings; these buildings differed in size from small spaces to big rooms.
Olofsson [22] proposed a method to forecast the energy requirement on a yearly basis for a small
single-family building situated in Sweden. Yokoyama [23] suggested a method for a cooling demand
prediction in a building based on the back propagation neural network. Kreider [24] applied a recurrent
neural network to predict energy consumption hourly based on the heating and cooling energy
prediction in buildings. The recurrent neural network was also used by Ben-Nakhi [25] for the cooling load
prediction of three office buildings. The data used for model training and testing was collected from
1997 to 2000 for short-term energy prediction. Carpinteiro [26] used a hierarchical neural network based
model for short-term energy consumption prediction. They have used two self-organization maps
for load forecasting. The Euclidean distance was used to calculate the distance between two vectors.
They have used the Brazilian utility dataset for the energy consumption prediction. Their proposed
approach performed well for both short-term and long-term forecasting. Another technique was
suggested based on a regression model for short-term load forecasting [27]. Irisarri [28] proposed
a method of energy load prediction based on the summer and winter seasons. Ali, in Reference [29],
proposed a technique comprising six stages for smart home energy consumption prediction in South
Korea. They have used the Kalman filter as a predictor and the Mamdani fuzzy controller to control the
actuators. Wahid [30] proposed a technique for energy consumption prediction in residential buildings.
They have calculated the first two statistical moments namely mean and variance for the data that
consisted of hourly, daily, weekly and monthly energy consumption. Then, the multilayer perceptron
on the data with statistical moments was applied for energy consumption prediction. The trial and
test method was used to find a suitable combination of input, hidden, and output layer
neurons. Wahid [31] proposed another energy consumption prediction methodology for residential
buildings. The introduced method consisted of five stages, namely data source, data collection, feature
extraction, prediction, and performance assessment. Different machine learning algorithms, such as
Multi-Layer Perceptron, Random Forest, and K-Nearest Neighbors Algorithms (KNN) were used
to obtain the energy consumption predicted results. Arghira [32] presented an energy consumption
forecasting method for different appliances in homes. The technique used in this study was developed
to predict the day-ahead electricity demand for homes. In this study, the authors have used a historical
dataset for homes in France. In this paper, the authors have suggested a stochastic predictor and
tested two other predictors. Two pre-processing methods were also proposed namely, segmentation
and aggregation. Li [33] suggested an alternate method called the hybrid genetic algorithm-adaptive
network-based fuzzy inference system (GA-ANFIS) for energy consumption prediction. In their

proposed approach, the GA algorithm was used as an optimizer, which assisted in developing the
rule base and the premises and subsequent factors were adjusted through ANFIS for optimization of
the prediction performance. Kassa, in Reference [34], proposed a model based on ANFIS for one-day
ahead energy generation prediction. The proposed model was tested on real information of a wind
power generation profile and the results provided by this method were prominent. In another paper,
Ekici [35] proposed a technique using the ANFIS model to predict the energy demands of diverse
buildings with different characteristics.
Nowadays, deep learning approaches have been used extensively for energy consumption
prediction [18–20]. Due to their greater learning ability, deep learning (DL) methods have been
used to improve performance in modelling, classification, and visualization problems.
Collobert [36] proposed an approach based on a convolutional neural network (CNN) for natural
language processing (NLP). Hinton [37] used a deep auto-encoder to reduce the dimensionality and
the results indicate that the deep auto-encoder performs better compared to a principal component
analysis (PCA). Qiu [20] used a DL technique to predict time series small-batch data sets. Li [18] used
a deep learning technique to predict traffic flow based on time series data. After review of all these
applications, the results indicate that the performance of deep learning techniques is better compared
to other counterpart approaches.
Figure 2 illustrates the proposed conceptual model for energy consumption prediction.
The proposed methodology consists of four main modules, namely data acquisition, pre-processing,
prediction, and performance evaluation.

Figure 2. A conceptual model of the proposed approach.

3. Proposed Energy Consumption Prediction Methodology


Energy consumption prediction in residential buildings is extremely important; it assists the
manager in preserving energy and avoiding wastage. Due to unpredictability and noise in the data,
accurate energy consumption prediction in residential buildings is a challenging task. In this paper,

we have proposed a methodology based on a deep extreme learning machine (DELM) for energy
consumption prediction in residential buildings. We have divided the proposed method into four
main layers, namely data acquisition, preprocessing, prediction, and performance evaluation. In the
data acquisition layer, we describe in detail the data used in the experimental work. In the
preprocessing layer, the moving average has been used to remove abnormalities from the data.
In the prediction layer, the deep extreme learning machine (DELM) has been proposed to enhance
the accuracy of the energy consumption results. In the performance evaluation layer, the MAE, RMSE,
and MAPE [14] performance measures have been used to measure the performance of the prediction
algorithms. Figure 3 shows the detailed structure diagram of the proposed method.

Figure 3. Detailed processing diagram for the proposed energy consumption prediction approach.

3.1. Data Acquisition Layer

Figure 4 shows the data collection phase. The datasets from four residential buildings were
collected from January to December 2010 [10].

Figure 4. Data collection, the Year 2010.

We have completed the task of data collection in the data acquisition layer for the proposed
work. Sensors were mostly used for the collection of contextual information such as environmental
conditions, circumstances, temperature, humidity, user occupancy, and so forth. For user occupancy
detection, several Passive Infra-Red (PIR) sensors were used to obtain information in 0/1 form, such as
busy or not busy. To get the information about user occupancy, the installation of cameras in transition
positions between several regions of the building was required. The collection of the designated
residential buildings’ data was carried out from January 2010 to December 2010. The building had
33 floors (394 ft. tall); the floor-wise information was collected from smart meters and used for this
work. These meters were installed in a floor-wise manner in the chosen buildings. The dataset also
indicated a direct relationship between energy utilization and user occupancy. To better explain the
entire energy consumption data, a box plot was used. The x-axis represents the hours of the day (24 h)
and the y-axis represents the energy consumption in kWh. Each box represents the energy consumption
in a particular hour of the day for the whole year. A long box indicates high energy consumption and
a short box indicates low energy consumption. As residential buildings are very busy during noon
and night times, the energy consumption was higher during these timings. The entire dataset of hourly
energy consumption used in the proposed work is shown in Figure 5 for better observation.

Figure 5. Distribution of data, on an hourly basis, for energy consumption.

3.2. Preprocessing Layer

In the pre-processing layer, we have removed abnormalities from the data. The data were assumed
to have noise due to the inherent nature of data recording, where several external aspects affect the
readings. In the same way, many factors were involved in outliers, such as meter problems, human
mistakes, measurement errors, and so forth. Different smoothing filters can be used to remove
abnormalities from the data, such as the moving average, loess, lowess, Rloess, RLowess, and
Savitzky-Golay filters. In this study, we have used the moving average method, which is an important
filter widely used by various authors [14] for data smoothing. Equation (1) is the mathematical
representation of the moving average filter.

$$y[i] = \frac{1}{M} \sum_{j=0}^{M-1} X(i + j) \qquad (1)$$

where X[ ] represents the inputs, y[ ] denotes the outputs, and M indicates the number of points of the
moving average. In the proposed work, M was equal to 5, which was a suitable size for data smoothing [38].
Usually, data normalization is required when the sample data are scattered and the sample span
is large. Hence, the span of the data was minimized for building models and predictions.
The normalization was done, in essence, to have the same range of values for each of the inputs to the
machine learning models. It can guarantee a stable convergence of weights and biases. In machine
learning algorithm modelling, to increase the prediction accuracy and to improve the training process,
the complete sample data were normalized to fit in the interval [0, 1] by using Equation (2) given below:

$$P = \frac{x_i - x_{min}}{x_{max} - x_{min}}, \quad i = 1, 2, 3, \ldots, N \qquad (2)$$

where P represents the mapped value, $x_i$ is the ith input datum, and $x_{max}$ and $x_{min}$
indicate the maximum and minimum values of the starting data, respectively [39].
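The two preprocessing steps translate directly into code. The following is a minimal Python sketch, assuming the hourly kWh readings are held in a one-dimensional NumPy array and using M = 5 as in the text; the sample values are placeholders, not the paper's data.

```python
import numpy as np

def moving_average(x, M=5):
    # Equation (1): y[i] = (1/M) * sum_{j=0}^{M-1} X(i + j)
    return np.convolve(x, np.ones(M) / M, mode="valid")

def min_max_normalize(x):
    # Equation (2): map every value into the interval [0, 1]
    return (x - x.min()) / (x.max() - x.min())

# Placeholder hourly consumption values; 9.9 plays the role of an outlier.
raw = np.array([3.2, 3.5, 9.9, 3.1, 3.4, 3.6, 3.3])
smoothed = moving_average(raw, M=5)
normalized = min_max_normalize(smoothed)
```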
3.3. Prediction Layer

In the prediction layer, we have used three well-known machine learning algorithms to make
one-week and one-month energy consumption predictions for the residential buildings.

3.3.1. Deep Extreme Learning Machine (DELM)
The extreme learning machine (ELM) technique is a very famous technique, and it has been used
in different fields for energy consumption prediction. Conventional artificial neural network based
algorithms require more training samples, have slower learning times, and may lead to the over-fitting
of a learning model [40]. The idea of the ELM was first specified by Reference [41]. The ELM is used
widely in various areas for classification and regression purposes because an ELM learns very quickly
and is computationally efficient. The ELM model comprises an input layer, a single hidden layer,
and an output layer. The structural model of an ELM is shown in Figure 6, where p represents the
input layer nodes, q represents the hidden layer nodes, and r indicates the output layer nodes.

Figure 6. Structural diagram of an extreme learning machine (ELM).

Initially, take a training sample $[A, B] = \{a_k, b_k\}, (k = 1, 2, \ldots, Z)$, with input feature matrix
$A = [a_{k1}\; a_{k2}\; a_{k3} \ldots a_{kZ}]$ and targeted matrix $B = [b_{l1}\; b_{l2}\; b_{l3} \ldots b_{lZ}]$
consisting of the training samples; then the A and B matrices can be represented as in Equations (3)
and (4) respectively, where a and b represent the dimensions of the input matrix and the output matrix
respectively. Next, the ELM arbitrarily adjusts the weights between the input layer and the hidden
layer, where $w_{kl}$ is the weight between the kth input layer node and the lth hidden layer node,
as represented in Equation (5). Then, the ELM randomly fixes the weights between the hidden neurons
and the output layer neurons, which can be represented by Equation (6), where $\gamma_{kl}$ is the
weight between the hidden and output layer nodes.

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1z} \\ a_{21} & a_{22} & \cdots & a_{2z} \\ a_{31} & a_{32} & \cdots & a_{3z} \\ \vdots & \vdots & & \vdots \\ a_{p1} & a_{p2} & \cdots & a_{pz} \end{bmatrix} \qquad (3)$$

$$B = \begin{bmatrix} b_{11} & b_{12} & \cdots & b_{1z} \\ b_{21} & b_{22} & \cdots & b_{2z} \\ b_{31} & b_{32} & \cdots & b_{3z} \\ \vdots & \vdots & & \vdots \\ b_{r1} & b_{r2} & \cdots & b_{rz} \end{bmatrix} \qquad (4)$$

$$w = \begin{bmatrix} w_{11} & w_{12} & \cdots & w_{1p} \\ w_{21} & w_{22} & \cdots & w_{2p} \\ w_{31} & w_{32} & \cdots & w_{3p} \\ \vdots & \vdots & & \vdots \\ w_{i1} & w_{i2} & \cdots & w_{ip} \end{bmatrix} \qquad (5)$$

$$\gamma = \begin{bmatrix} \gamma_{11} & \gamma_{12} & \cdots & \gamma_{1r} \\ \gamma_{21} & \gamma_{22} & \cdots & \gamma_{2r} \\ \gamma_{31} & \gamma_{32} & \cdots & \gamma_{3r} \\ \vdots & \vdots & & \vdots \\ \gamma_{p1} & \gamma_{p2} & \cdots & \gamma_{pr} \end{bmatrix} \qquad (6)$$

Next, the biases of the hidden layer nodes were randomly selected by the ELM, as in Equation (7).
Further, the ELM selected a function g(x), which was the activation function of the network. Consider
Figure 4; the resultant matrix can be represented as in Equation (8). Respectively, the column vector of
the resultant matrix, V, is represented in Equation (9).
$$B = \left[ b_1, b_2, b_3, \cdots, b_p \right]^{T} \qquad (7)$$

$$V = \left[ v_1, v_2, v_3, \cdots, v_Z \right]_{r \times Z} \qquad (8)$$

$$v_l = \begin{bmatrix} v_{1j} \\ v_{2j} \\ v_{3j} \\ \vdots \\ v_{rj} \end{bmatrix} = \begin{bmatrix} \sum_{l=1}^{q} \gamma_{k1}\, g(w_k a_l + b_k) \\ \sum_{l=1}^{q} \gamma_{k2}\, g(w_k a_l + b_k) \\ \sum_{l=1}^{q} \gamma_{k3}\, g(w_k a_l + b_k) \\ \vdots \\ \sum_{l=1}^{q} \gamma_{kr}\, g(w_k a_l + b_k) \end{bmatrix}_{(l = 1, 2, 3, \ldots, Q)} \qquad (9)$$
Electronics 2018, 7, 222 9 of 22

Next, by considering Equations (8) and (9), we can obtain Equation (10). The hidden layer output
is expressed as H, the transposition of V is represented as V′, and the weight matrix values of γ [42,43]
were computed using the least square method as given in Equation (11).

$$H\gamma = V' \qquad (10)$$

$$\gamma = H^{+} V' \qquad (11)$$

The regularization term γ has been used in order to make the network more generalized and
more stable [44].
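As a concrete illustration of Equations (3)–(11), a minimal single-hidden-layer ELM can be written as below; the random weights, sigmoid activation, and pseudo-inverse solution follow the text, while the array shapes, helper names, and toy data are assumptions for the sketch, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def elm_train(A, V, q=20):
    # A: (Z, p) input samples, V: (Z, r) targets, q: hidden neurons
    p = A.shape[1]
    w = rng.uniform(-1, 1, size=(p, q))   # arbitrary input-to-hidden weights, Eq. (5)
    b = rng.uniform(-1, 1, size=q)        # random hidden biases, Eq. (7)
    H = sigmoid(A @ w + b)                # hidden layer output, Eqs. (8) and (9)
    gamma = np.linalg.pinv(H) @ V         # least square solution, Eq. (11)
    return w, b, gamma

def elm_predict(A, w, b, gamma):
    return sigmoid(A @ w + b) @ gamma

# Toy usage: 4 inputs (hour, weekday, day of month, month), 1 output (kWh).
A = rng.random((100, 4))
V = rng.random((100, 1))
w, b, gamma = elm_train(A, V)
predictions = elm_predict(A, w, b, gamma)
```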
Deep learning is emerging and is the most popular topic for researchers nowadays. A network
having at least four layers with input/output layers meets the requirement of a deep learning network.
In a deep neural network, the neurons of each layer are trained on a different set of parameters using
the prior layer’s output. It enables the deep learning networks (DLN) to handle extensive data sets.
Deep learning has grasped the attention of many researchers because it is very efficient to solve
real-world problems. In the proposed work, we have used the DELM to encapsulate the advantages of
both ELM and deep learning. The configuration model of the DELM, consisting of one input layer having
four neurons, six hidden layers where each hidden layer consists of 10 neurons, and one output
layer having one neuron, is illustrated in Figure 7. The trial and error method has been used to select
the number of nodes in the hidden layers due to the unavailability of any specific mechanism for
specifying hidden layers neurons. The projected output of the second hidden layer can be achieved as:

$$H_1 = V\gamma^{+} \qquad (12)$$

where $\gamma^{+}$ represents the generalized inverse of the matrix γ. Hence, the values of hidden layer 2
can be simply achieved by means of Equation (11) and the inverse of the activation function.

$$g(W_1 H + B_1) = H_1 \qquad (13)$$

In Equation (13), the parameters $W_1$, $B_1$, H, and $H_1$ represent the weight matrix between the first
two hidden layers, the bias of the first hidden layer neurons, the estimated output of the first hidden layer,
and the estimated output of the second hidden layer, respectively.

$$W_{HE} = g^{-1}(H_1)\, H_E^{+} \qquad (14)$$

$H_E^{+}$ represents the inverse of $H_E$, and the activation function g(x) is used to compute Equation (5).
So, by specifying any proper activation function g(x), the desired result of the second hidden layer is
updated as below:

$$H_2 = g(W_{HE} H_E) \qquad (15)$$

The update of the weight matrix γ between hidden layer 2 and hidden layer 3 is carried out as in
Equation (16), where $H_2^{+}$ indicates the inverse of $H_2$. Therefore, the estimated result of layer 3 is
represented as in Equation (17).

$$\gamma_{new} = H_2^{+} V \qquad (16)$$

$$H_3 = V\gamma_{new}^{+} \qquad (17)$$

$\gamma_{new}^{+}$ represents the inverse of the weight matrix $\gamma_{new}$. Then the DELM defines the matrix
$W_{HE1} = [B_2, W_2]$. The output of the third layer can be achieved by using Equations (10) and (11).

$$H_3 = g^{-1}(H_2 W_2 + B_2) = g(W_{HE1} H_{E1}) \qquad (18)$$

$$W_{HE1} = g^{-1}(H_3)\, H_{E1}^{+} \qquad (19)$$

In Equation (18), $H_2$ signifies the desired result of hidden layer 2, the weight between
hidden layer 2 and hidden layer 3 is represented by $W_2$, and $B_2$ is the bias of
the hidden layer 3 neurons. $H_{E1}^{+}$ represents the inverse of $H_{E1}$, and $g^{-1}(x)$ denotes the inverse of the
activation function g(x). The logistic sigmoid function represented in Equation (20) has been adopted.
The third hidden layer output is computed as in Equation (21).

$$g(x) = \frac{1}{1 + e^{-x}} \qquad (20)$$

$$H_3 = g(W_{HE1} H_{E1}) \qquad (21)$$

Finally, the resultant weighted matrix between the hidden layer 3 and the last layer output is
computed as in Equation (22). The estimated result of the hidden layer 3 is represented as in Equation (23).

$$\gamma_{new} = H_4^{T} \left( \frac{1}{\lambda} + H_4^{T} H_4 \right)^{-1} V \qquad (22)$$

$$H_4 = V\gamma_{new}^{+} \qquad (23)$$

$\gamma_{new}^{+}$ represents the inverse of the weight matrix $\gamma_{new}$. Then the DELM defines the matrix
$W_{HE2} = [B_3, W_3]$. The output of the fourth layer can be achieved by using Equations (15) and (24).

$$H_4 = g^{-1}(H_3 W_3 + B_3) = g(W_{HE2} H_{E2}) \qquad (24)$$

$$W_{HE2} = g^{-1}(H_4)\, H_{E2}^{+} \qquad (25)$$

In Equation (24), $H_3$ denotes the desired output of the third hidden layer, the weight between
the third hidden layer and the fourth hidden layer is represented by $W_3$, and $B_3$ is the bias
of the fourth hidden layer neurons. $H_{E2}^{+}$ represents the inverse of $H_{E2}$, and $g^{-1}(x)$ denotes the inverse
of the activation function g(x). The logistic sigmoid function has been adopted. The output of the
fourth hidden layer is computed as in Equation (26) below:

$$H_4 = g(W_{HE2} H_{E2}) \qquad (26)$$

Finally, the output weight matrix between the fourth layer and the output layer is computed as
in Equation (27). The estimated result of the fifth layer can be denoted by Equation (28). The desired
output of the DELM network is represented by Equation (29).

$$\gamma_{new} = H_5^{T} \left( \frac{1}{\lambda} + H_5^{T} H_5 \right)^{-1} V \qquad (27)$$

$$H_5 = V\gamma_{new}^{+} \qquad (28)$$

$$f(x) = H_5\, \beta_{new} \qquad (29)$$

So far, we have discussed the calculation process of the four hidden layers of the DELM
network. The cycle theory has been applied to demonstrate the calculation process of the DELM.
The recalculation of Equations (18)–(22) can be done to get and record each hidden layer’s parameters
and eventually the final result of the DELM network. If the number of hidden layers increases, the same
computation procedure can be reused and executed similarly. In the proposed work we have applied a
trial and error method [30,31] to determine the optimal neural network structure. Inputs to the DELM
as shown in Figure 7 are hours of the day (X1 ), days of the week (X2 ), days of the month (X3 ) and
month (X4 ) and the output is the energy consumption prediction (ECP).

Figure 7. Structural diagram of the proposed energy consumption prediction based on the deep
extreme learning machine (DELM) approach.
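The layer-wise calculation above can be summarized in code. The following is a hedged sketch of one reading of Equations (12)–(29): every additional hidden layer is solved with the pseudo-inverse and the inverted logistic sigmoid rather than gradient descent. The helper names and toy data are illustrative assumptions, not the authors' implementation; the 6-layer, 10-neuron configuration follows Figure 7.

```python
import numpy as np

rng = np.random.default_rng(1)

def g(x):                                      # logistic sigmoid, Equation (20)
    return 1.0 / (1.0 + np.exp(-x))

def g_inv(y):                                  # inverse sigmoid (logit), clipped for stability
    y = np.clip(y, 1e-6, 1.0 - 1e-6)
    return np.log(y / (1.0 - y))

def delm_fit(A, V, n_hidden=6, width=10):
    # First hidden layer: arbitrary weights and biases, as in the original ELM.
    W0 = rng.uniform(-1, 1, (A.shape[1], width))
    b0 = rng.uniform(-1, 1, width)
    H = g(A @ W0 + b0)
    for _ in range(n_hidden - 1):
        gamma = np.linalg.pinv(H) @ V            # least squares, Eqs. (11)/(16)
        H_target = V @ np.linalg.pinv(gamma)     # expected next-layer output, Eqs. (12)/(17)
        W = np.linalg.pinv(H) @ g_inv(H_target)  # weights reproducing that output, Eqs. (14)/(19)
        H = g(H @ W)                             # updated layer output, Eqs. (15)/(21)
    beta = np.linalg.pinv(H) @ V                 # final output weights
    return beta, H

A = rng.random((200, 4))                         # hour, weekday, day of month, month
V = rng.random((200, 1))                         # hourly consumption targets (placeholders)
beta, H = delm_fit(A, V)
prediction = H @ beta                            # Equation (29): f(x) = H5 * beta_new
```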
3.3.2. Artificial Neural Network (ANN)
ANNs are based on biological information processing and have been extensively used for energy
consumption prediction in residential buildings. ANNs have been commonly used because of their
robust nonlinear mapping capability. The ANN may be regarded as a regression method that captures
the sophisticated nonlinearity between independent and dependent variables [45].
In recent years, researchers have deployed ANN models for analyzing numerous types of prediction
problems in a variety of circumstances. The ANN model used in the proposed work is the multilayer
perceptron (MLPs). MLPs usually have three layers namely input, hidden, and the output consisting of
input nodes, neurons, and synaptic connections. In MLPs, the backpropagation method is used to
reduce the residual sum of squares (RSS) of the prediction. The mathematical representation of the
RSS is given in Equation (30).
$$RSS = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 \qquad (30)$$

where $Y_i$ represents the ith target value in the training data and $\hat{Y}_i$ indicates the predicted value.
The strength of the input signal is represented through synapse weights, and these weights
are initially allocated randomly. The sum of the products of each connection’s input value and synapse weight
is computed and provided as input to each neuron in the hidden layer. Commonly three types of
activation functions, namely linear, tan-sigmoid, and logarithmic sigmoid, as represented in Equations
(31)–(33) respectively, are used in the hidden layer and output layer of the MLP [30].

$$\chi(x) = linear(x) \qquad (31)$$

$$\Phi(x) = \frac{2}{1 + e^{-2x}} - 1 \qquad (32)$$

$$\psi(x) = \frac{1}{1 + e^{-x}} \qquad (33)$$
The tan-sigmoid function is used as the activation function in the hidden layer. The selection of the
best transfer function in the hidden layer is also somewhat a trial and test method [46]. In the proposed
work, we have tested five transfer functions: tan-sigmoid, linear, radial basis, symmetric, and saturating
linear. In the output layer, a linear function has been used, which is the most appropriate activation
function for the output neuron(s) of ANNs for regression problems. The structure diagram of the ANN
used in the proposed approach is shown in Figure 8.

Figure 8. Structural model of the artificial neural network (ANN) used in the proposed approach.

Different training algorithms, such as Levenberg-Marquardt (LM), Bayesian regularization, scaled
conjugate gradient, and so forth [47] have been used for network training. The development of an MLP
with a number of pre-defined hyper-parameters affects the fitness ability of the model.
The selection of the number of neurons in the hidden layer is a somewhat trial and test method [30].
3.3.3. Adaptive Neuro-Fuzzy Inference System (ANFIS)
ANFIS uses a feed-forward network having multiple layers, which uses NN algorithms for learning
and fuzzy reasoning for mapping the input space to the output space. ANFIS is used extensively in
various areas for predictions [48–50]. ANFIS is a fuzzy inference system (FIS), and its implementation
is carried out in an adaptive neural framework. The structure of ANFIS is shown in Figure 9 (two inputs
and one output) for the first order Sugeno fuzzy logic model. In this structure, two membership
functions have been defined for each input.
Figure 9. Structural diagram for an adaptive neuro-fuzzy inference system.

The adaptive neuro-fuzzy system consists of five different layers, each of which is explained below
in detail. Layer 1 nodes are adaptive and produce the degrees of the input membership functions
(MFs). Layer 2 nodes are fixed, and these nodes perform simple multiplication. Layer 3 nodes are also
fixed, and the role of these nodes in the network is normalization. Layer 4 nodes are adaptive; their
output is a simple multiplication of the normalized firing strength and the first-order Sugeno model.
The factors of this layer are named consequent factors. Layer 5 has a single permanent node where
the summation of all incoming signals is carried out.

Supervised learning is used to train the network. Hence, the purpose is to train the adaptive
network to approximate known functions supplied by the training data and then find the exact values
of the mentioned parameters. There is no hard and fast rule to determine a suitable number of
membership functions for a variable in ANFIS. In the proposed work, we have applied the trial and
error mechanism to determine the effective number of MFs for each variable. Similarly, there are many
types of membership functions, such as triangular, trapezoidal, and so forth [49]. In the proposed work,
we have considered the bell-shaped membership functions as illustrated in Equations (34) and (35).

$$\mu_{X_i}(a) = \frac{1}{1 + \left| \frac{a - c_i}{a_i} \right|^{2x_i}}, \quad i = 1, 2 \qquad (34)$$

$$\mu_{Y_j}(b) = \frac{1}{1 + \left| \frac{b - c_j}{a_j} \right|^{2x_j}}, \quad j = 1, 2 \qquad (35)$$

The bell-shaped membership functions are the most common and effective MFs used in the ANFIS
for prediction purposes [51].
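The generalized bell membership function of Equations (34) and (35) can be evaluated in a few lines of Python; the parameter values below (width a, slope b, center c) are illustrative assumptions, not the fitted values from the paper.

```python
import numpy as np

def gbell(x, a, b, c):
    # mu(x) = 1 / (1 + |(x - c)/a|^(2b)), the generalized bell MF
    return 1.0 / (1.0 + np.abs((x - c) / a) ** (2 * b))

hours = np.linspace(0, 23, 24)
mu_low = gbell(hours, a=6.0, b=2.0, c=6.0)    # membership of "morning" hours
mu_high = gbell(hours, a=6.0, b=2.0, c=18.0)  # membership of "evening" hours
```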

3.4. Performance Evaluation Layer


Several criteria are used for performance computation of different prediction algorithms. In the
performance evaluation layer of the proposed model, the MAE, RMSE, and MAPE performance indices
have been used to compare the predicted values and the target values. The MAE is a measure that is used
for the minimization of the error distribution. The RMSE measures the error between the predicted
power and the targeted power, and the MAPE is a measure which evaluates the prediction difference
as a percentage of the targeted power. The RMSE, MAE, and MAPE can be computed as in
Equations (36)–(38) respectively as:
$$RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{n} (T_i - P_i)^2} \qquad (36)$$

$$MAE = \frac{1}{N} \sum_{i=1}^{n} |T_i - P_i| \qquad (37)$$

$$MAPE = \frac{1}{N} \sum_{i=1}^{n} \frac{|T_i - P_i|}{T_i} \times 100 \qquad (38)$$

where N indicates the total number of values, T represents the target value, and P indicates the predicted value.
These metrics provide a single value to measure the accuracy of the outcomes of different algorithms.
These statistical measurements have been used in previous studies to analyze energy consumption
prediction models [34].
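The three indices translate directly into code. The following is a minimal sketch of Equations (36)–(38) with placeholder target and prediction vectors; the MAPE variant assumes no zero target values.

```python
import numpy as np

def rmse(T, P):
    return np.sqrt(np.mean((T - P) ** 2))   # Equation (36)

def mae(T, P):
    return np.mean(np.abs(T - P))           # Equation (37)

def mape(T, P):
    return np.mean(np.abs(T - P) / T) * 100  # Equation (38), assumes T != 0

T = np.array([2.0, 3.0, 4.0])   # target kWh values (placeholders)
P = np.array([2.1, 2.7, 4.4])   # predicted kWh values (placeholders)
print(rmse(T, P), mae(T, P), mape(T, P))
```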

4. Experimental Results Based on Prediction Algorithms

4.1. Model Validation of DELM


To validate the model and analyze the experiments, we have used the actual data collected by
different meters fixed in the designated multi-storied residential buildings. The data were collected
for a single year, i.e., 1 January 2010 to 31 December 2010. The size of the complete input data is equal
to 365 days × 24 h per day = 8760 records. The installation of smart meters at each floor sub-distribution
switchboard has been carried out, and these meters are connected to the central server. The energy
consumption for each hour is recorded for a year, and the unit used for measurement is the kilowatt
hour (kWh). The information contained in the dataset is the floor-wise hourly energy consumption.
An example view of two days of hourly collected data is illustrated in Figure 10 for anonymous
Building-04, which has 33 floors.


Figure 10. Example view of two day hourly energy consumption data collected in Building-IV.
viewof
oftwo
twoday
dayhourly
hourlyenergy
energyconsumption
consumptiondata
datacollected
collectedininBuilding-IV.
Building-IV.

We have used four important parameters, hours of the day (X1 ), days of the week (X2 ), days of
the month (X3 ) and month (X4 ) as input to machine learning algorithms used in the proposed work.
Further, to prevent overfitting, we used the k-fold cross-validation. It is a popular method because it is
simple to understand and generally results in a less biased or less optimistic estimate of the model
skill than other methods, such as a simple train/test split [52]. For one-week energy consumption
prediction the data is divided into 52 folds of approximately equal size. The first fold is treated as a
validation set, and the method is fit on the remaining folds. In the proposed work, we have carried
out energy consumption prediction for one-week and one-month. Hence, for a one-week energy
consumption prediction, one year of hourly data is divided into 52 folds. We have used one week
(7 days × 24 h = 168 h) of data for testing and the remaining data (358 days × 24 h = 8592 h) to
train the models for one-week energy consumption prediction. We recorded the results achieved
for one-week energy consumption prediction, then swapped the training and testing data and randomly
selected another data set (one week) for testing and the remaining for training.
This process continues for 52 iterations. Similarly, for one-month energy consumption prediction,
the data have been divided into 12 (k) sets of approximately equal size. We have used the one-month
(January) (31 days × 24 h = 744 h) data for testing and the remaining 11 months of data (8016 h) for
training. Next, we have selected another month’s (February) data (28 days × 24 h = 672 h) for testing and
the remaining 11 months of hourly data (8088 h) for training. The process continues until the 12th month’s
(December) hourly data (31 days × 24 h = 744 h) are selected for testing and the remaining 11 months
of data (8016 h) for training. Finally, the average of the testing results was determined.
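The week-wise scheme can be sketched as follows, assuming one year of hourly readings trimmed to 52 whole weeks (52 × 168 = 8736 h, a slight simplification of the 8760-h year described above); the model training call is left abstract.

```python
import numpy as np

hours_per_week = 7 * 24                      # 168 hourly readings per week
n_hours = 52 * hours_per_week                # 8736 h: 52 complete weeks

for k in range(52):
    # Hold out week k for testing; train on all remaining hours.
    test_idx = np.arange(k * hours_per_week, (k + 1) * hours_per_week)
    train_idx = np.setdiff1d(np.arange(n_hours), test_idx)
    # fit the model on train_idx, evaluate RMSE/MAE/MAPE on test_idx ...
```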
The optimum network configuration depends on the number of hidden layers, the number of
neurons in the hidden layer(s), and the type of activation function. We have applied the trial and error
method to achieve the optimum structure [46]. After applying the trial and error method, we achieved a
well-suited configuration model consisting of 6 hidden layers and 20 neurons in each hidden layer for
the proposed DELM approach. The sigmoid activation function is used because it is the most popular
activation function and has been used extensively over the last couple of years [51]. We have also tried
different iteration numbers from 1000 to 3000 in increments of 100 and set the iteration number to 2000.
Using the best-suited configuration model, the one-week and one-month hourly energy consumption
results were recorded, as shown in Figures 11 and 12 respectively.

Figure 11. Actual vs. DELM predicted results for one-week energy consumption.

Figure 12. Actual vs. DELM predicted results for one-month energy consumption.
In this work, we have used the ANN and ANFIS models for comparison with the DELM.
The reason behind the selection of ANFIS was its ability to seek useful features and develop the
prediction model. The ANN is also a very famous technique, and it is used for energy consumption
prediction purposes.
4.2. Model Validation of ANFIS

The structure diagram of the ANFIS for the proposed work is shown in Figure 13 [53]. We have
used a trial and error approach for the selection of the type and number of membership functions in
the proposed work. For each variable, we have considered two generalized bell-shaped MFs, as shown
in Figure 14. In total, 16 rules were specified; the rule viewer is shown in Figure 15.

Figure 13. Screenshot of the structure of the adaptive neuro-fuzzy inference system (ANFIS) for the
proposed work [53].

Figure 14. Screenshot of the membership functions used in ANFIS [53].

Figure 15. Screenshot of the If-then rules after training [53].

The output predicted results for one-week and one-month energy consumption are shown in
Figures 16 and 17 respectively.

Figure 16. Actual vs. ANFIS predicted results for one-week energy consumption.

Figure 17. Actual vs. ANFIS predicted results for one-month energy consumption.
4.3. Model Validation of ANN
In the proposed work, to achieve the best ANN prediction model, we tried different hidden layer
activation functions, different training functions, and different output layer transfer functions instead
of analyzing the sensitivity of the input parameters. All the network models have four neurons in the
input layer and a single neuron in the output layer; for the hidden layer, we tried numbers of neurons
from 5 to 30 in increments of five to find the best combination of input layer, hidden layer, and output
layer neurons. The trial and error method has been applied to determine the number of neurons in the
hidden layers [30]. We have considered the model shown in Figure 18 because it provides the least
MSE values, with the tan-sigmoid function in the hidden layer, a linear function in the output layer,
and the Levenberg-Marquardt algorithm for training.
layer and the Levenberg-Marquardt algorithm for training.
Figure 18. ANN structure model to predict energy consumption.
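The model-selection loop described above can be sketched as follows. This is a minimal illustration rather than the original MATLAB implementation: scikit-learn's MLPRegressor does not offer a Levenberg-Marquardt solver, so L-BFGS is used here as a stand-in, and X_train, y_train, X_val, y_val are assumed to hold the preprocessed energy data.

```python
from sklearn.metrics import mean_squared_error
from sklearn.neural_network import MLPRegressor

best_mse, best_model = float("inf"), None
for n_hidden in range(5, 31, 5):  # 5, 10, ..., 30 hidden neurons
    model = MLPRegressor(
        hidden_layer_sizes=(n_hidden,),
        activation="tanh",   # tan-sigmoid analogue for the hidden layer
        solver="lbfgs",      # stand-in: Levenberg-Marquardt is unavailable here
        max_iter=2000,
        random_state=0,
    )  # MLPRegressor's output layer is linear, matching the model above
    model.fit(X_train, y_train)
    mse = mean_squared_error(y_val, model.predict(X_val))
    if mse < best_mse:
        best_mse, best_model = mse, model
```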
The recorded energy consumption prediction results for one week and one month are shown in Figures 19 and 20, respectively.
Figure 19. Actual vs. ANN predicted results for one-week energy consumption.
Figure 20. Actual vs. ANN predicted results for one-month energy consumption.
5. Discussion and Comparative Results Analysis
In the proposed work, we have applied the deep extreme learning machine along with ANN and ANFIS to real data collected over one year to predict one-week and one-month energy consumption in buildings. The data were pre-processed to remove abnormalities and to make them smooth and error free. In the DELM, different numbers of hidden layers, numbers of hidden neurons, and combinations of activation functions were tried to find the best configuration for energy consumption prediction. For a fair comparison, we also applied the trial and error approach to the ANN to find its best configuration; hence, we tried different numbers of neurons in the hidden layer and different types of activation functions. Similarly, we tested different types and numbers of membership functions to achieve a suitable ANFIS structure for energy consumption prediction.

In this work, we have applied the proposed DELM, along with the optimized ANN and ANFIS approaches, to energy consumption prediction over two different periods of time in order to test the efficiency of these algorithms properly. For one-week energy consumption prediction, the share of the data falling into the training set is larger than for one-month prediction; hence, to evaluate the performance of the prediction algorithms properly, both short-term and long-term energy consumption predictions have been carried out.

We have used different statistical measures to assess the performance of the proposed DELM algorithm along with the counterpart algorithms. In Tables 1 and 2, the MAE, RMSE, and MAPE values of DELM, ANN, and ANFIS for one-week and one-month energy consumption prediction have been recorded. As we have computed both the one-week and one-month energy consumption predictions using the machine learning algorithms, the averages of the statistical measures over both periods are given in Table 3. These values indicate that the DELM performance is far better than that of the other counterpart algorithms, and that ANFIS performs better than ANN.
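For reference, the three error measures reported in Tables 1–3 can be computed as in the short NumPy sketch below, where y and yhat are assumed to be arrays of actual and predicted hourly consumption and MAPE is expressed as a percentage.

```python
import numpy as np

def mae(y, yhat):
    # MAE = (1/n) * sum(|y_i - yhat_i|)
    return np.mean(np.abs(y - yhat))

def rmse(y, yhat):
    # RMSE = sqrt((1/n) * sum((y_i - yhat_i)^2))
    return np.sqrt(np.mean((y - yhat) ** 2))

def mape(y, yhat):
    # MAPE = (100/n) * sum(|y_i - yhat_i| / y_i); assumes no zero actuals
    return 100.0 * np.mean(np.abs((y - yhat) / y))
```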
Table 1. Performance evaluation of deep extreme learning machine (DELM), adaptive neuro-fuzzy inference system (ANFIS) and artificial neural network (ANN) for one-week energy consumption prediction.

Statistical Measures    MAE       MAPE      RMSE
DELM                    2.0008    5.7077    2.2451
ANFIS                   2.2679    6.3884    2.4636
ANN                     2.3918    6.7097    2.6030
Table 2. Performance evaluation of DELM, ANFIS and ANN for one-month energy consumption prediction.

Statistical Measures    MAE       MAPE      RMSE
DELM                    2.3347    6.5464    2.6864
ANFIS                   2.6433    7.3798    3.1712
ANN                     2.5437    7.4562    3.2400
Table 3. Average values of statistical measures for one-week and one-month energy consumption prediction results of DELM, ANFIS, and ANN.

Statistical Measures    MAE       MAPE      RMSE
DELM                    2.1677    6.1271    2.4657
ANFIS                   2.4556    6.8841    2.8174
ANN                     2.4317    7.0830    4.8561
The statistical measures indicate that the performance of the proposed DELM is far better than that of the ANN and ANFIS for short-term (one-week) as well as long-term (one-month) hourly energy consumption prediction. Thus, the proposed DELM is the best choice for both short-term and long-term energy consumption prediction.
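To give a flavour of how such a model is trained, the sketch below implements a simplified multi-hidden-layer extreme learning machine in NumPy: hidden weights are drawn at random and kept fixed, and only the output weights are solved in closed form via the Moore-Penrose pseudo-inverse. This is an illustrative reading of the DELM idea, not the exact architecture or training procedure used in this paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_delm(X, y, hidden_sizes=(20, 20), seed=0):
    # Random, fixed hidden layers; least-squares output weights.
    rng = np.random.default_rng(seed)
    weights, H = [], X
    for size in hidden_sizes:
        W = rng.uniform(-1.0, 1.0, (H.shape[1], size))
        b = rng.uniform(-1.0, 1.0, size)
        weights.append((W, b))
        H = sigmoid(H @ W + b)       # fixed random projection
    beta = np.linalg.pinv(H) @ y     # closed-form output weights
    return weights, beta

def predict_delm(X, weights, beta):
    H = X
    for W, b in weights:
        H = sigmoid(H @ W + b)
    return H @ beta
```

In practice, the number of hidden layers, the number of neurons per layer, and the activation function would be tuned by trial and error, exactly as described for the DELM above.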
6. Conclusions and Future Work

Modelling energy consumption prediction in residential buildings is a challenging task because of randomness and noisy disturbances. To obtain better prediction accuracy, in this paper we have proposed a model for energy consumption prediction in residential buildings. The proposed model comprises four stages, namely the data acquisition layer, the preprocessing layer, the prediction layer, and the performance evaluation layer. In the data acquisition layer, the data were collected through smart meters in a designated building to validate the model and analyze the results. In the preprocessing layer, several pre-processing operations were carried out to remove abnormalities from the data. In the prediction layer, we have proposed the deep extreme learning machine and applied it to the pre-processed data for one-week and one-month energy consumption prediction in residential buildings. The purpose of using different machine learning algorithms on the collected data was to obtain better results in terms of accuracy for practical applications. To obtain the optimal structure of the DELM, various parameters, namely the number of hidden layers, the number of neurons in each hidden layer, and the activation functions, were tuned. We have also applied other well-known machine learning algorithms, such as ANN and ANFIS, to the same data for comparison with the proposed DELM. We have used different statistical measures to evaluate the performance of these machine learning algorithms; their values indicate that the performance of the proposed DELM is far better than that of the counterpart algorithms. These initial results give us confidence, and we are currently exploring various alternatives and collecting data to extend this work in the directions outlined above.
Author Contributions: M.F. designed the proposed scheme, implemented the system, performed the experimental work, and wrote the paper. D.K. conceived the overall idea for energy consumption prediction in residential buildings and supervised the overall work.
Funding: This research received no external funding.
Acknowledgments: This research was supported by the 2018 scientific promotion program funded by Jeju
National University.
Conflicts of Interest: The authors declare no conflicts of interest.
References
1. Fayaz, M.; Kim, D. Energy Consumption Optimization and user comfort management in residential buildings
using a bat algorithm and fuzzy logic. Energies 2018, 11, 161. [CrossRef]
2. Selin, R. The Outlook for Energy: A View to 2040; ExxonMobil: Irving, TX, USA, 2013.
3. Sieminski, A. International Energy Outlook; Energy Information Administration: Washington, DC, USA, 2014.
4. Mitchell, B.M.; Ross, J.W.; Park, R.E. A Short Guide to Electric Utility Load Forecasting; Rand Corporation: Santa
Monica, CA, USA, 1986.
5. Pérez-Lombard, L.; Ortiz, J.; Pout, C. A review on buildings energy consumption information. Energy Build.
2008, 40, 394–398. [CrossRef]
6. Zhao, H.-X.; Magoulès, F. A review on the prediction of building energy consumption. Renew. Sustain.
Energy Rev. 2012, 16, 3586–3592. [CrossRef]
7. Fumo, N. A review on the basics of building energy estimation. Renew. Sustain. Energy Rev. 2014, 31, 53–60.
[CrossRef]
8. Ahmad, A.; Hassan, M.; Abdullah, M.; Rahman, H.; Hussin, F.; Abdullah, H.; Saidur, R. A review on
applications of ANN and SVM for building electrical energy consumption forecasting. Renew. Sustain.
Energy Rev. 2014, 33, 102–109. [CrossRef]
9. Kim, S.; Kim, S. A multi-criteria approach toward discovering killer IoT application in Korea. Technol. Forecast.
Soc. Change 2016, 102, 143–155. [CrossRef]
10. Malik, S.; Kim, D. Prediction-learning algorithm for efficient energy consumption in smart buildings based
on particle regeneration and velocity boost in particle swarm optimization neural networks. Energies 2018,
11, 1289. [CrossRef]
11. Khosravani, H.R.; Castilla, M.D.M.; Berenguel, M.; Ruano, A.E.; Ferreira, P.M. A comparison of energy
consumption prediction models based on neural networks of a bioclimatic building. Energies 2016, 9, 57.
[CrossRef]
12. Kalogirou, S.A. Artificial neural networks in energy applications in buildings. Int. J. Low-Carbon Technol.
2006, 1, 201–216. [CrossRef]
13. Kampouropoulos, K.; Cárdenas, J.J.; Giacometto, F.; Romeral, L. An energy prediction method using
adaptive neuro-fuzzy inference system and genetic algorithms. In Proceedings of the 2013 IEEE International
Symposium on Industrial Electronics, Taipei, Taiwan, 28–31 May 2013.
14. Ullah, I.; Ahmad, R.; Kim, D. A prediction mechanism of energy consumption in residential buildings using
hidden Markov model. Energies 2018, 11, 358. [CrossRef]
15. Huang, G.B.; Zhu, Q.Y.; Siew, C.K. Extreme learning machine: Theory and applications. Neurocomputing
2006, 70, 489–501. [CrossRef]
16. Fan, C.; Xiao, F.; Zhao, Y. A short-term building cooling load prediction method using deep learning
algorithms. Appl. Energy 2017, 195, 222–233. [CrossRef]
17. Schmidhuber, J. Deep learning in neural networks: An overview. Neural Netw. 2015, 61, 85–117. [CrossRef]
[PubMed]
18. Li, L.; Lv, Y.; Wang, F.Y. Traffic signal timing via deep reinforcement learning. IEEE/CAA J. Autom. Sin. 2016,
3, 247–254.
19. Lv, Y.; Duan, Y.; Kang, W.; Li, Z.; Wang, F.Y. Traffic flow prediction with big data: A deep learning approach.
IEEE Trans. Intell. Transp. Syst. 2015, 16, 865–873. [CrossRef]
20. Qiu, X.; Zhang, L.; Ren, Y.; Suganthan, P.N.; Amaratunga, G. Ensemble deep learning for regression and time
series forecasting. In Proceedings of the 2014 IEEE Symposium on Computational Intelligence in Ensemble
Learning (CIEL), Orlando, FL, USA, 9–12 December 2014; pp. 21–26. [CrossRef]
21. Kalogirou, S.; Neocleous, C.; Schizas, C. Building heating load estimation using artificial neural networks.
In Proceedings of the Clima 2000 Conference, Brussels, Belgium, 30 August–2 September 1997.
22. Olofsson, T.; Andersson, S.; Östin, R. A method for predicting the annual building heating demand based on
limited performance data. Energy Build. 1998, 28, 101–108. [CrossRef]
23. Yokoyama, R.; Wakui, T.; Satake, R. Prediction of energy demands using neural network with model
identification by global optimization. Energy Convers. Manag. 2009, 50, 319–327. [CrossRef]
24. Kreider, J.; Claridge, D.; Curtiss, P.; Dodier, R.; Haberl, J.; Krarti, M. Building energy use prediction and
system identification using recurrent neural networks. J. Sol. Energy Eng. 1995, 117, 161–166. [CrossRef]
25. Ben-Nakhi, A.E.; Mahmoud, M.A. Cooling load prediction for buildings using general regression neural
networks. Energy Convers. Manag. 2004, 45, 2127–2141. [CrossRef]
26. Carpinteiro, O.A.; Reis, A.J.; da Silva, A.P. A hierarchical neural model in short-term load forecasting.
Appl. Soft Comput. 2004, 4, 405–412. [CrossRef]
27. Gross, G.; Galiana, F.D. Short-term load forecasting. Proc. IEEE 1987, 75, 1558–1573. [CrossRef]
28. Irisarri, G.; Widergren, S.; Yehsakul, P. On-line load forecasting for energy control center application.
IEEE Trans. Power App. Syst. 1982, 71–78. [CrossRef]
29. Ali, S.; Kim, D.-H. Effective and comfortable power control model using kalman filter for building energy
management. Wirel. Pers. Commun. 2013, 73, 1439–1453. [CrossRef]
30. Wahid, F.; Kim, D.H. Short-term energy consumption prediction in korean residential buildings using
optimized multi-layer perceptron. Kuwait J. Sci. 2017, 44, 67–77.
31. Wahid, F.; Kim, D.H. A prediction approach for demand analysis of energy consumption using K-nearest
neighbor in residential buildings. Int. J. Smart Home 2016, 10, 97–108. [CrossRef]
32. Arghira, N.; Hawarah, L.; Ploix, S.; Jacomino, M. Prediction of appliances energy use in smart homes. Energy
2012, 48, 128–134. [CrossRef]
33. Li, K.; Su, H.; Chu, J. Forecasting building energy consumption using neural networks and hybrid
neuro-fuzzy system: A comparative study. Energy Build. 2011, 43, 2893–2899. [CrossRef]
34. Kassa, Y.; Zhang, J.; Zheng, D.; Wei, D. Short term wind power prediction using ANFIS. In Proceedings of
the 2016 IEEE International Conference on Power and Renewable Energy (ICPRE), Shanghai, China, 21–23
October 2016; pp. 388–393.
35. Ekici, B.B.; Teoman Aksoy, U. Prediction of building energy needs in early stage of design by using ANFIS.
Expert Syst. Appl. 2011, 38, 5352–5358. [CrossRef]
36. Collobert, R.; Weston, J. A unified architecture for natural language processing: Deep neural networks with
multitask learning. In Proceedings of the 25th International Conference on Machine Learning, Helsinki,
Finland, 5–9 July 2008; pp. 160–167.
37. Hinton, G.E.; Salakhutdinov, R.R. Reducing the dimensionality of data with neural networks. Science 2006,
313, 504–507. [CrossRef] [PubMed]
38. Nau, R. Forecasting with Moving Averages. Fuqua School of Business, Duke University, 2014. Available
online: https://people.duke.edu/~rnau/Notes_on_forecasting_with_moving_averages--Robert_Nau.pdf
(accessed on 24 June 2018).
39. Niu, D.; Wang, H.; Chen, H.; Liang, Y. The general regression neural network based on the fruit fly
optimization algorithm and the data inconsistency rate for transmission line icing prediction. Energies 2017,
10, 66. [CrossRef]
40. Cheng, J.; Duan, Z.; Xiong, Y. QAPSO-BP algorithm and its application in vibration fault diagnosis for a
hydroelectric generating unit. Shock Vib. 2015, 34, 177–181.
41. Huang, G.-B.; Wang, D.H.; Lan, Y. Extreme learning machines: A survey. Int. J. Mach. Learn. Cybern. 2011, 2,
107–122. [CrossRef]
42. Wang, S.; Chen, H.; Yan, W.; Chen, Y.; Fu, X. Face recognition and micro-expression recognition based on
discriminant tensor subspace analysis plus extreme learning machine. Neural Process. Lett. 2014, 39, 25–43.
[CrossRef]
43. Huang, G. An insight into extreme learning machines: Random neurons, random features and kernels.
Cogn. Comput. 2014, 6, 376–390. [CrossRef]
44. Wei, J.; Liu, H.; Yan, G.; Sun, F. Robotic grasping recognition using multi-modal deep extreme learning
machine. Multidim. Syst. Signal Process. 2017, 28, 817–833. [CrossRef]
45. Geem, Z.W. Parameter estimation for the nonlinear Muskingum model using the BFGS technique. J. Irrig.
Drain. Eng. 2006, 132, 474–478. [CrossRef]
46. Shine, P.; Murphy, M.; Upton, J.; Scully, T. Machine-learning algorithms for predicting on-farm direct water
and electricity consumption on pasture based dairy farms. Comput. Electron. Agric. 2018, 150, 74–87.
[CrossRef]
47. Chau, K.W. Particle swarm optimization training algorithm for ANNs in stage prediction of Shing Mun
River. J. Hydrol. 2006, 329, 363–367. [CrossRef]
48. Lo, S.-P. An adaptive-network based fuzzy inference system for prediction of workpiece surface roughness
in end milling. J. Mater. Process. Technol. 2003, 142, 665–675. [CrossRef]
49. Elena Dragomir, O.; Dragomir, F.; Stefan, V.; Minca, E. Adaptive neuro-fuzzy inference systems as a strategy
for predicting and controling the energy produced from renewable sources. Energies 2015, 8, 13047–13061.
[CrossRef]
50. Chang, F.-J.; Chang, Y.-T. Adaptive neuro-fuzzy inference system for prediction of water level in reservoir.
Adv. Water Resour. 2006, 29, 1–10. [CrossRef]
51. Jang, J.-S.R. ANFIS: Adaptive-network-based fuzzy inference system. IEEE Trans. Syst. Man Cybern. 1993, 23,
665–685. [CrossRef]
52. Owda, H.; Omoniwa, B.; Shahid, A.; Ziauddin, S. Using Artificial Neural Network Techniques for
Prediction of Electric Energy Consumption. Available online: https://arxiv.org/abs/1412.2186 (accessed on
24 June 2018).
53. MATLAB, version 8.1.0 (R2013a); The MathWorks Inc.: Natick, MA, USA, 2013.
© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).