
REGULARIZATION NEURAL NETWORK FOR CONSTRUCTION COST ESTIMATION

By Hojjat Adeli¹ and Mingyang Wu²

¹Prof., Dept. of Civ. and Envir. Engrg. and Geodetic Sci., The Ohio State Univ., 470 Hitchcock Hall, Columbus, OH 43210.
²Grad. Res. Assoc., Dept. of Civ. and Envir. Engrg. and Geodetic Sci., The Ohio State Univ., 470 Hitchcock Hall, Columbus, OH.

ABSTRACT: Estimation of the cost of a construction project is an important task in the management of construction projects. The quality of construction management depends on accurate estimation of the construction cost. Highway construction costs are very noisy, and the noise is the result of many unpredictable factors. In this paper, a regularization neural network is formulated and a neural network architecture is presented for estimation of the cost of construction projects. The model is applied to estimate the cost of reinforced-concrete pavements as an example. The new computational model is based on a solid mathematical foundation, making the cost estimation consistently more reliable and predictable. Further, the result of estimation from the regularization neural network depends only on the training examples; it does not depend on the architecture of the neural network, the learning parameters, or the number of iterations required for training the system. Moreover, the problem of noise in the data is taken into account in a rational manner.

INTRODUCTION

Estimation of the cost of a construction project is an important task for the management of construction projects. The quality of construction management depends on the accurate estimation of the construction cost. This task is currently performed by "experienced" construction cost estimators in a highly subjective manner. Such a subjective analysis is subject to human errors, varying results depending on who the construction cost estimator is, and possible litigation consequences.

Automating the process of construction cost estimation based on objective data is desirable not only for improving the efficiency, but also for removing the subjective, questionable human factors as much as possible. The problem is not amenable to traditional problem-solving approaches. The costs of construction materials, equipment, and labor depend on numerous factors, with no explicit mathematical model or rule for price prediction. Recently, neural networks have been used for learning and prediction problems with no explicit model, such as securities price prediction (Hutchinson et al. 1994), system identification (Narendra and Parthasarathy 1990), engineering design (Hung and Adeli 1991), and image recognition (Adeli and Hung 1995).

Several authors have discussed potential applications of neural networks in construction in recent years. Moselhi et al. (1991) point out the potential applications of neural networks in the general area of construction. Moselhi et al. (1993) used back-propagation neural networks (Hegazy et al. 1994) and the genetic algorithm (Adeli and Hung 1995) to develop a decision-support system to aid contractors in preparing bids. The back-propagation algorithm has also been used for estimating construction productivity (Chao and Skibniewski 1994), evaluating the acceptability of new construction technology (Chao and Skibniewski 1995), and condition rating of roadway pavement sections (Eldin and Senouci 1995). Kartam (1996) uses neural networks to determine optimal equipment combinations for earth-moving operations. Pompe and Feelders (1997) use neural networks to predict corporate bankruptcy.

Learning from previous data or examples, neural networks can make very reasonable estimations without using specific experts and rules. For example, the price of a concrete pavement is influenced by a number of factors including the quantity, the dimension (thickness of the pavement), the local economic factors, and the year of construction. The problem to be investigated is whether, using the values for these factors obtained from past construction projects, a neural network model can estimate the price of a future construction project accurately.

In this paper, first the concepts of estimation, learning, and noisy curve fitting are described and formulated mathematically. Next, the special case of a radial-basis function neural network, called the regularization neural network, is formulated for estimating the cost of construction projects. Then, the model is applied to reinforced-concrete pavement cost estimation as an example.

ESTIMATION, LEARNING, AND NOISY CURVE FITTING

The most fundamental problem that neural networks have been used to solve is that of "learning." But it is very difficult, if not impossible, to define learning precisely. To model learning computationally, however, it has to be defined in a pragmatic manner rather than as an abstract concept. Learning can be defined as a self-organizing process, a mapping process, an optimization process, or a decision-making process. The last definition is based on the observation that, given a set of examples and a stimulus, one makes a decision about the response to the stimulus.

Consider the special case of supervised learning, where the system is first trained by a set of input-output examples. Then, given a new input, the learner decides the output. This is an ill-posed problem because the answer can be any of a multitude of values. The selected answer depends on the generalization criteria or constraints chosen for the decision process. The advantage of viewing the learning process as a decision-making process is its explicit representation of the generalization criteria.
An estimation problem can be formulated as a supervised learning problem. Consider a one-input-one-output noisy system with input x and output y. The system can be expressed mathematically as

y = f(x) + e   (1)

where f(x) = a function of the input variable x; and e = an independent noise function with zero mean. A set of input-output examples x_i and y_i (i = 1, 2, ..., N) is given. The estimation problem is: for any given input x, find the best estimate of the output y.


The best estimate can be defined as the estimate that minimizes the average error between the real and estimated outputs. Thus, such a supervised learning problem becomes equivalent to a mapping or curve-fitting problem in a multidimensional hyperspace. The traditional statistical approaches to curve fitting, such as regression analysis, fail to represent problems with no explicit mathematical model accurately in a multidimensional space of variables. The neural network approach, on the other hand, can solve such problems more effectively.
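To make this criterion concrete, the following sketch (Python/NumPy here; the paper's own implementation, described later, was in MATLAB, and the quadratic f and noise level below are illustrative assumptions, not the paper's data) draws noisy examples from the system of (1) and scores a candidate estimator by its average squared error:

import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Stand-in for the unknown "true" system of Eq. (1)
    return 0.5 * x**2 + x

N = 50
x = rng.uniform(0.0, 4.0, N)
y = f(x) + rng.normal(0.0, 0.5, N)   # y = f(x) + e, with zero-mean noise e

def avg_sq_error(F, x, y):
    # Average squared error between observed and estimated outputs
    return np.mean((y - F(x)) ** 2)

# Even the true f does not drive this to zero; the residual is the noise variance
print(avg_sq_error(f, x, y))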
Fig. 1 shows a very simple example of curve fitting and learning. The dots represent the example data points that include noise. The dashed line represents the properly learned curve. The solid line represents the overfitted learned curve. For this curve, the training error is very small (because the learned curve passes through all the training data points), but the estimation or generalization error is large. This is because the influence of the noise has not been taken into account at all. This problem is referred to as overfitting, which leads to less than satisfactory learning. Avoiding the overfitting problem is very important for accurate estimation and learning.

FIG. 1. Comparison of Properly Learned and Overfitted Learned Curve

A mathematical definition of learning is now formulated as a mapping (generalization of curve fitting) problem in a multidimensional hyperspace. A neural network is designed to perform a nonlinear mapping function s from a p-dimensional input space R^p to a one-dimensional output space R^1:

s: R^p \to R^1   (2)

The set of N available input-output data can be described as follows:

Input signal: x_i \in R^p, i = 1, 2, ..., N
Example output signal: d_i \in R^1, i = 1, 2, ..., N

where x_i = (x_i^1, x_i^2, ..., x_i^p) = ith example with p input attributes (x_i^n is the nth attribute of the ith example); and d_i = corresponding example output. The approximation mapping function is denoted by F(x).
What is the best fit? This is an important question. Because of the noise in the data examples, a perfect fit, that is, when F(x_i) = d_i, usually is not the best fit. In this case, the approximation function is often very curvy, with numerous steep peaks and valleys that lead to poor generalization. This is the overfitting problem mentioned earlier. Two other fitting situations can also be recognized: underfitting, with oversmooth surfaces resulting in poor generalization, and proper fitting. Only the last type of fitting can lead to accurate generalization and estimation, and this is the research challenge. A method to achieve proper fitting will be discussed in the following sections.

Highway construction costs are affected by many factors, but only a few main factors are usually recorded and can be considered in the mathematical modeling of the cost-estimation problem. As such, the highway construction data are very noisy, and the noise is the result of many unpredictable factors such as human judgmental factors, random market fluctuations, and weather conditions. Consequently, finding a properly fitted approximation is extremely important. Otherwise, the predicted cost will have a substantial error.

One approach to solve this problem is the multilayer feedforward back-propagation neural network (Haykin 1994). The problem with this approach is that the generalization properties (underfitted, overfitted, or properly fitted) depend on many factors, including the architecture (number of hidden layers and number of nodes in the hidden layers), the initial weights of the links connecting the nodes, and the number of iterations for training the system. The performance of the algorithm depends highly on the selection of these parameters. The problem of arbitrary trial-and-error selection of the learning and momentum ratios encountered in the momentum back-propagation learning algorithm can be circumvented by the adaptive conjugate gradient learning algorithm developed by Adeli and Hung (1994). However, that algorithm does not address the issue of noise in the data. In the highway construction estimation problem, the data have substantial noise, and the neural network algorithm must be able to address the issue of noise in the data properly. In this paper we use a neural network architecture, called the regularization network, to obtain the properly fitted approximation function and to solve the construction estimation problem accurately.
REGULARIZATION NETWORKS

According to regularization theory (Tikhonov and Arsenin 1977; Haykin 1994), the approximation mapping function F is determined by minimizing an error function, E(F), consisting of two terms in the following form:

E(F) = E_s(F) + E_c(F)   (3)

The first term is the standard error term measuring the error between the example response d_i and the corresponding computed response o_i, and is defined as follows:

E_s(F) = \frac{1}{2} \sum_{i=1}^{N} (d_i - o_i)^2 = \frac{1}{2} \sum_{i=1}^{N} [d_i - F(x_i)]^2   (4)

As discussed earlier, the perfect fit may not be the best answer due to the noise in the data. To overcome the overfitting problem, a regularization term is added to the standard error term; the function of this term is to smoothen the approximation function. This term is defined as

E_c(F) = \frac{1}{2} \|PF\|^2   (5)

where the symbol \|g\| denotes the norm of the function g(x), defined as

\|g\|^2 = \int_{R^p} [g(x)]^2 \, dx_1 \, dx_2 \cdots dx_p   (6)

and P = a linear differential operator defined as (Poggio and Girosi 1990; Al-Gwaiz 1992)

\|PF\|^2 = \sum_{k=0}^{K} b_k \|D^k F(x)\|^2   (7)


In (7), K = a positive integer; b_k (k = 0, 1, ..., K) = positive real numbers; and the norm of the differential operator D^k is defined as

\|D^k F\|^2 = \sum_{|\alpha|=k} \int_{R^p} [\partial^\alpha F(x)]^2 \, dx_1 \, dx_2 \cdots dx_p   (8)

The multiindex \alpha = (\alpha_1, \alpha_2, ..., \alpha_p) = a sequence of nonnegative integers whose order is defined as |\alpha| = \sum_{l=1}^{p} \alpha_l. In (8), the partial differential term inside the brackets is defined as

\partial^\alpha F(x) = \frac{\partial^{|\alpha|} F(x)}{\partial x_1^{\alpha_1} \, \partial x_2^{\alpha_2} \cdots \partial x_p^{\alpha_p}}   (9)

Therefore, the regularization term is

E_c(F) = \frac{1}{2} \sum_{k=0}^{K} b_k \sum_{|\alpha|=k} \int_{R^p} [\partial^\alpha F(x)]^2 \, dx_1 \, dx_2 \cdots dx_p   (10)

This function is simply the summation of integrals of the squared partial derivatives of the approximation function. As such, the regularization term is small when the function is smooth, because the derivatives tend to be small, and vice versa.

For b_k = \beta^{2k}/(k! \, 2^k) and K approaching infinity, where \beta is a positive real number, it can be proved that, by minimizing the error function, (3), with respect to the approximation function, the solution of the problem can be written in the following form (Poggio and Girosi 1990):

F(x) = \sum_{i=1}^{N} w_i \exp\left(-\frac{\|x - x_i\|^2}{2\beta^2}\right) = \sum_{i=1}^{N} w_i \exp(-\sigma \|x - x_i\|^2)   (11)

where \sigma = 1/(2\beta^2) and w_i = real numbers. Eq. (11) is a linear superposition of multivariate Gaussian functions with centers x_i located at the data points.
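Eq. (11) translates directly into code. A minimal sketch (Python/NumPy as before; the array-shape conventions are assumptions of this sketch, not the paper's):

import numpy as np

def rbf_predict(X_query, X_centers, w, sigma):
    """Evaluate the approximation F(x) of Eq. (11) at each row of X_query.

    X_query   : (M, p) inputs at which to estimate
    X_centers : (N, p) training inputs x_i, the Gaussian centers
    w         : (N,)   weights w_i
    sigma     : smoothing parameter, sigma = 1 / (2 beta^2)
    """
    # Squared Euclidean distances ||x - x_i||^2, shape (M, N)
    d2 = ((X_query[:, None, :] - X_centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sigma * d2) @ w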
The architecture of the regularization network is shown in Fig. 2. It consists of an input layer, a hidden layer, and an output layer. The number of nodes in the input layer is equal to p, the number of input attributes. The number of nodes in the hidden layer is equal to N, the number of training examples. The output node gives the estimated construction cost.

FIG. 2. Architecture of Regularization Network for Construction Cost Estimation

The network shown in Fig. 2 is a feedforward network. The nodes in the hidden layer represent the nonlinear multivariate Gaussian activation functions G_i(x) = exp[-\sigma \|x - x_i\|^2]; in other words, the output of the ith node in the hidden layer is G_i(x). The input and hidden layers are fully connected; that is, every node in the hidden layer receives inputs from all the nodes in the input layer. The links connecting the hidden layer to the output layer represent the weights w_i in the approximation function, (11).

The learning process of the regularization network consists of two steps. In the first step, the value of the parameter \sigma in (11) is found by a cross-validation procedure that is described later. The smoothness of the approximation function is primarily controlled by this parameter: the smaller the value of \sigma, the smoother the approximation function. We will call \sigma the smoothing parameter. In the second step, the weights w_i are found using the method described in the next section.

DETERMINATION OF THE REGULARIZATION NETWORK WEIGHTS

The smoothing parameter \sigma and the weights w_i depend on each other in a complicated way and consequently must be calculated iteratively. In this section, a method to find the weights w_i is presented. Define the following matrices:

d = [d_1, d_2, ..., d_N]^T   (12)

G = \begin{bmatrix} G(x_1;x_1) & G(x_1;x_2) & \cdots & G(x_1;x_N) \\ G(x_2;x_1) & G(x_2;x_2) & \cdots & G(x_2;x_N) \\ \vdots & \vdots & & \vdots \\ G(x_N;x_1) & G(x_N;x_2) & \cdots & G(x_N;x_N) \end{bmatrix}   (13)

w = [w_1, w_2, ..., w_N]^T   (14)

where G(x_i;x_j) = exp[-\sigma \|x_i - x_j\|^2]. It can be shown that the solution of the regularization problem, i.e., the weight vector w, satisfies the following equation (Haykin 1994):

(G + I)w = d   (15)

where I = the N × N identity matrix. If the matrix (G + I) is not ill-conditioned, the linear equations represented by (15) can be solved by linear equation solvers such as the Gauss-Jordan elimination or LU decomposition methods. The N × N matrix (G + I), however, is large and usually suffers from numerical ill-conditioning. This leads to zero pivots in the Gauss-Jordan elimination method, resulting in large errors in the solution. The aforementioned approach was used in this research but without any success; the elements of the optimum vector w were found to be very large due to numerical instability.
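In code, (13) and (15) look as follows; np.linalg.solve stands in for the Gauss-Jordan or LU solvers mentioned in the text and, as the paper reports, is only usable when (G + I) is well-conditioned (a sketch continuing the conventions above):

import numpy as np

def gaussian_matrix(X, sigma):
    # G[i, j] = G(x_i; x_j) = exp(-sigma * ||x_i - x_j||^2), Eq. (13)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sigma * d2)

def solve_weights_direct(X, d, sigma):
    # (G + I) w = d, Eq. (15); reliable only for a well-conditioned (G + I)
    G = gaussian_matrix(X, sigma)
    return np.linalg.solve(G + np.eye(len(d)), d)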
To overcome the numerical ill-conditioning problem, a singular value decomposition method is used to find the weights w_i (Press et al. 1988). In this approach, the matrix (G + I) is first decomposed as

G + I = UCV^T   (16)

where U and V = N × N orthonormal matrices (i.e., UU^T = I and VV^T = I); and C = an N × N diagonal matrix with diagonal entries c_i, called singular values, where |c_1| \geq |c_2| \geq \cdots \geq |c_N|. Having found the matrices U, V, and C, the weights vector can be found from

w = \sum_{i=1}^{J} \frac{U^{(i)T} d}{c_i} V^{(i)}   (17)

where U^{(i)}, i = 1, ..., N = the ith column of U; and V^{(i)}, i = 1, ..., N = the ith column of V. In (17), the summation is performed over J terms, and not over all the N terms, in order to avoid numerical ill-conditioning due to division by very small numbers and the resulting truncation error. The c_i terms in the denominator of (17) cannot take very small values. All of the singular values used in (17) are greater than \epsilon = N \epsilon_m |c_1|, where \epsilon_m is the machine precision and c_1 is the largest singular value. By selecting a predetermined value for \epsilon_m, the small values of c_i are excluded from the summation in (17); the summation is done over J terms such that |c_i| > \epsilon for any i \leq J and |c_i| \leq \epsilon for any i > J.
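A sketch of the truncated SVD solution of (16) and (17); np.linalg.svd returns the singular values in decreasing order, matching |c_1| \geq \cdots \geq |c_N|, and eps_m plays the role of the predetermined machine-precision parameter (its default value here is an assumption):

import numpy as np

def solve_weights_svd(G_plus_I, d, eps_m=1e-12):
    # Eq. (16): G + I = U C V^T; numpy returns the diagonal of C as a vector c
    U, c, Vt = np.linalg.svd(G_plus_I)
    eps = len(d) * eps_m * c[0]     # threshold epsilon = N * eps_m * |c_1|
    keep = c > eps                  # keep only the J terms with |c_i| > epsilon
    # Eq. (17): w = sum over i <= J of (U^(i)T d / c_i) V^(i)
    coeffs = (U[:, keep].T @ d) / c[keep]
    return Vt[keep, :].T @ coeffs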


PROPER GENERALIZATION AND ESTIMATION BY CROSS-VALIDATION

For a proper solution of the problem and accurate estimation, a trade-off is necessary between minimizing the standard error term, (4), and smoothening the approximation function. As mentioned earlier, the smoothness of the approximation function and the generalization properties of the network are influenced by the parameter \sigma. To obtain a proper value for \sigma, a method called cross-validation, used in statistical pattern recognition, is employed in this research (Fukunaga 1990).

In the cross-validation method, the available set of examples is divided randomly into two sets: a training set (x_t^n, d_t^n), n = 1, 2, ..., N_t, and a validation set (x_v^n, d_v^n), n = 1, 2, ..., N_v, where subscripts t and v refer to the training and validation sets, respectively, and N_t and N_v are the numbers of training and validation examples, respectively. The network is trained with the set (x_t^n, d_t^n), n = 1, 2, ..., N_t, using different values of \sigma within a given range. This range is problem dependent and is determined by experience and numerical experimentation.

For each value of \sigma, the weights w_i are found using (16) and (17), and an average training error is calculated in the following form:

E_t = \frac{1}{N_t} \sum_{n=1}^{N_t} [d_t^n - F(x_t^n)]^2   (18a)

Next, using the validation set (x_v^n, d_v^n), n = 1, 2, ..., N_v, an average validation error is calculated in the following form:

E_v = \frac{1}{N_v} \sum_{n=1}^{N_v} [d_v^n - F(x_v^n)]^2   (18b)

Typical trend relationships between the average training and validation errors and the smoothing factor \sigma are shown in Fig. 3. The average training error always decreases with an increase in the magnitude of \sigma for a numerically stable algorithm. In contrast, the average validation error curve does not have a continuously decreasing trend; rather, one can identify a minimum on this curve. As mentioned earlier, broadly speaking, a large \sigma indicates overfitting and a small \sigma indicates underfitting. The \sigma corresponding to the global minimum point on the validation curve represents the properly fitted estimation curve. The average validation error gives an estimate of the estimation/prediction error.

FIG. 3. Typical Trend Relationships between Average Training and Validation Errors and the Smoothing Factor \sigma
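Assembled from the sketches above, the cross-validation search over \sigma might read as follows (the grid of trial \sigma values is an assumption; as the text notes, the proper range is problem dependent):

import numpy as np

def cross_validate_sigma(X_t, d_t, X_v, d_v, sigmas):
    """Train for each trial sigma and keep the one minimizing Eq. (18b)."""
    best = None
    for sigma in sigmas:
        G = gaussian_matrix(X_t, sigma)
        w = solve_weights_svd(G + np.eye(len(d_t)), d_t)
        E_t = np.mean((d_t - rbf_predict(X_t, X_t, w, sigma)) ** 2)  # Eq. (18a)
        E_v = np.mean((d_v - rbf_predict(X_v, X_t, w, sigma)) ** 2)  # Eq. (18b)
        if best is None or E_v < best[2]:
            best = (sigma, E_t, E_v, w)
    return best  # (sigma at minimum validation error, E_t, E_v, weights)

# For example: cross_validate_sigma(X_t, d_t, X_v, d_v, np.logspace(-2, 1, 30))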
INPUT AND OUTPUT NORMALIZATION

Because the regularization network uses spherically symmetric multivariate Gaussian activation functions, to improve the performance the input variables are normalized so that they span similar ranges in the input space. Normalization is undertaken to ensure that all examples in the training set have a similar influence on the learned approximation function. In other words, it is statistically desirable to have variables with zero mean and the same unit standard deviation. This can be achieved by using the following change of variables and normalization procedure (Fukunaga 1990):

\bar{x}_i^n = \frac{x_i^n - \bar{x}_i}{\sigma_i}   (19)

where \bar{x}_i^n, n = 1, 2, ..., N = the normalized input data, and

\bar{x}_i = \frac{1}{N} \sum_{n=1}^{N} x_i^n, \quad i = 1, ..., p   (20)

\sigma_i^2 = \frac{1}{N-1} \sum_{n=1}^{N} (x_i^n - \bar{x}_i)^2, \quad i = 1, ..., p   (21)

are the means and standard deviations of the original set of variables.
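Eqs. (19)-(21) amount to an attribute-wise z-score; in NumPy, ddof=1 gives the N - 1 divisor of (21) (a sketch):

import numpy as np

def normalize_inputs(X):
    # Eqs. (20) and (21): per-attribute mean and standard deviation (N - 1 divisor)
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    return (X - mean) / std, mean, std   # Eq. (19): (x - mean) / std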


The Gaussian activation function is maximum at its center (data point) and approaches zero at large distances from the center. In other words, statistically speaking, use of the Gaussian activation function amounts to a large output near the center (data point) and zero output at large distances from the center, where there is no data point. But the lack of a data point does not necessarily mean the output is zero at large distances from the available sample data points.

One may argue that it is not possible to make an accurate estimate at large distances from the example data points. This is the well-known extrapolation problem. While the regularization theory solves the interpolation problem accurately, it is not concerned with the extrapolation problem. But a practical estimation system should not fail abruptly at the boundaries of the available data domain. Consequently, to improve the estimation accuracy at large distances from the available data points, first a linear trend (hyperplane) is found through the example data points by performing a linear regression analysis. Next, the output data are normalized with respect to this hyperplane (the outputs are measured from this plane instead of a zero base hyperplane). Finally, a regularization network is applied using the normalized output data. This process will bring the estimates at large distances from the available data points close to the linear trend hyperplane.

Mathematically, the function \sum_{n=1}^{N} (d^n - \sum_{i=1}^{p} a_i \bar{x}_i^n - a_0)^2 is minimized with respect to the linear parameters a_i (i = 0, 1, ..., p) in order to find the linear trend hyperplane. This hyperplane is represented by

y = \sum_{i=1}^{p} a_i x_i + a_0   (22)

Then the normalized output data are defined as

\bar{d}^n = d^n - \sum_{i=1}^{p} a_i \bar{x}_i^n - a_0, \quad n = 1, 2, ..., N   (23)
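One way to carry out this detrending step is ordinary least squares; the sketch below fits the hyperplane of (22) with np.linalg.lstsq and returns the normalized outputs of (23):

import numpy as np

def detrend_outputs(X_norm, d):
    """Fit the hyperplane of Eq. (22) and return the normalized outputs of Eq. (23)."""
    A = np.hstack([X_norm, np.ones((len(d), 1))])   # columns x_1..x_p plus a constant
    a, *_ = np.linalg.lstsq(A, d, rcond=None)       # minimizes sum (d^n - a.x^n - a_0)^2
    d_bar = d - A @ a                               # Eq. (23)
    return d_bar, a                                 # a = (a_1, ..., a_p, a_0)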


APPLICATION

The computational models presented in this article have been implemented in the programming language MATLAB (1992) and applied to the problem of estimating the cost of concrete pavements. MATLAB is selected because of the availability of a large number of built-in numerical analysis functions such as singular value decomposition.

The data set was collected from the files of previous projects at the Ohio Department of Transportation. It includes 242 examples of construction costs for reinforced-concrete pavements. The cost factors used in the examples are the quantity and the dimension (thickness) of the pavement.
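The pieces sketched in the preceding sections can be strung together as follows. The ODOT data set is not reproduced in the paper, so the snippet below runs on synthetic stand-in data with the same two attributes (quantity and thickness) and the same 121/121 split used in the examples that follow; the coefficients of the stand-in cost model are arbitrary assumptions:

import numpy as np

rng = np.random.default_rng(1)

# Stand-in for the 242 ODOT records: columns are quantity (m^3) and thickness (mm)
X = rng.uniform([10.0, 150.0], [10000.0, 300.0], size=(242, 2))
d = 40.0 - 0.001 * X[:, 0] + 0.02 * X[:, 1] + rng.normal(0.0, 7.0, 242)  # unit cost

X_norm, mean, std = normalize_inputs(X)      # Eqs. (19)-(21)
d_bar, a = detrend_outputs(X_norm, d)        # Eqs. (22)-(23)

idx = rng.permutation(242)                   # random 121/121 split
t, v = idx[:121], idx[121:]
sigma, E_t, E_v, w = cross_validate_sigma(
    X_norm[t], d_bar[t], X_norm[v], d_bar[v], np.logspace(-2, 1, 30))

# A cost estimate for a new project adds the linear trend back to the network output
x_new = (np.array([[5000.0, 250.0]]) - mean) / std
cost = rbf_predict(x_new, X_norm[t], w, sigma) + np.hstack([x_new, [[1.0]]]) @ a
print(sigma, float(cost[0]))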
Example 1

In this example only the quantity information is used. The variation of the unit cost versus the quantity is presented in Fig. 4. Because the reinforced-concrete pavement quantity has a large variation, the quantity scale in Fig. 4 is a logarithmic one. The figure clearly shows that the highway construction cost data are very noisy.

FIG. 4. Unit Cost versus Quantity for Example Data Set

The training set of 121 examples and the validation set of 121 examples are selected randomly from the available 242 data examples. Variations of the average training and validation errors with respect to the smoothing parameter \sigma are shown in Fig. 5. The trend for both curves is similar to the trends discussed in an earlier section and shown in Fig. 3. The minimum point on the average validation error curve corresponds to \sigma = 0.8, which represents the value needed for proper generalization. The corresponding average training and validation errors for the unit cost of the concrete pavement are $6.45/m³ and $7.22/m³, respectively. For comparison, the average unit cost of the concrete pavement for the 242 example data is $39.2/m³. Fig. 6 shows the learned curve along with the training and validation data sets.

FIG. 5. Average Training and Validation Errors for Different Values of \sigma Using Only Quantity Information

FIG. 6. Proper Generalized Learned Curve and Training/Validation Data Set
Example 2

In this example, the quantity and the dimension (thickness of the pavement) are used as input attributes. The unit cost versus the concrete pavement thickness for the 242 example data is shown in Fig. 7. Similar to example 1, the data set is divided into 121 training and 121 validation examples. Variations of the average training and validation errors with respect to the smoothing parameter \sigma are shown in Fig. 8. The smoothing parameter corresponding to the minimum point on the average validation error curve is found to be \sigma = 0.05. The corresponding average training and validation errors for the unit cost of the concrete pavement are $6.3/m³ and $6.7/m³, respectively. The learned approximation function is a surface in a three-dimensional space. The average validation error for this example is less than that for example 1. As the number of attributes is increased, the average validation error decreases, which means the construction cost is estimated more accurately.

FIG. 7. Unit Cost versus Dimension for Example Data Set

FIG. 8. Average Training and Validation Errors for Different Values of \sigma Using Quantity and Dimension Information


CONCLUSION

A regularization neural network for estimating the cost of construction projects was presented in this paper. The problem is formulated in terms of an error function consisting of a standard error term and a regularization term. The aim of the latter term is to compensate for the overfitting problem and to improve the cost estimation outside of the available data points.

The traditional regression analysis methods can fit the data to only certain types of functions, such as polynomial functions. Further, a major assumption is that the data must fit one of these functions. In the regularization neural network approach presented here, on the other hand, no assumption is made about the shape of the approximation function to be learned. The only assumptions made are the continuity and general smoothness of the function.

The neural network model presented in this paper has the following major advantages over other neural network algorithms such as the back-propagation (BP) neural networks:

• The regularization neural network is based on a solid mathematical foundation. This makes the cost-estimation model consistently reliable and predictable.
• The result of estimation from the regularization neural network depends only on the training examples. It does not depend on the architecture of the neural network (such as the number of nodes in the hidden layer), the learning parameters (such as the learning and momentum ratios in the BP algorithm), or the number of iterations required for training the system. As such, the regularization neural network presented in this paper is an objective cost estimator.
• The problem of noise in the data, which is important in the highway construction cost data, is taken into account in a rational manner.

The generalization error of the regularization network can be attributed to insufficient data examples, which can be remedied by increasing the database of examples from previous construction projects, and to intrinsic noise resulting from nonquantifiable and unpredictable factors that are impossible to avoid.

ACKNOWLEDGMENT

This paper is based on research sponsored by the Ohio Department of Transportation and the Federal Highway Administration.

APPENDIX I. REFERENCES

Adeli, H., and Hung, S. L. (1994). "An adaptive conjugate gradient learning algorithm for efficient training of neural networks." Appl. Math. and Computation, 62(1), 81-102.
Adeli, H., and Hung, S. L. (1995). Machine learning-neural networks, genetic algorithms, and fuzzy systems. John Wiley & Sons, Inc., New York, N.Y.
Al-Gwaiz, M. A. (1992). Theory of distributions. Marcel Dekker, Inc., New York, N.Y.
Bishop, C. M. (1995). Neural networks for pattern recognition. Clarendon Press, Oxford, U.K.
Chao, L. C., and Skibniewski, M. J. (1994). "Estimating construction productivity: neural-network-based approach." J. Comp. in Civ. Engrg., 8(2), 234-251.
Chao, L. C., and Skibniewski, M. J. (1995). "Neural network method of estimating construction technology acceptability." J. Constr. Engrg. and Mgmt., 121(1), 130-142.
Eldin, N. N., and Senouci, A. B. (1995). "A pavement condition rating model using backpropagation neural networks." Microcomputers in Civ. Engrg., 10(6), 433-441.
Fukunaga, K. (1990). Introduction to statistical pattern recognition, 2nd Ed., Academic Press, Boston, Mass.
Haykin, S. (1994). Neural networks: a comprehensive foundation. Macmillan College Publishing Co., Inc., New York, N.Y.
Hegazy, T., Fazio, P., and Moselhi, O. (1994). "Developing practical neural network applications using back propagation." Microcomputers in Civ. Engrg., 9(2), 145-159.
Hung, S. L., and Adeli, H. (1991). "A model of perceptron learning with a hidden layer for engineering design." Neurocomputing, 3(1), 3-14.
Hutchinson, J. M., Lo, A., and Poggio, T. (1994). "A nonparametric approach to pricing and hedging derivative securities via learning networks." MIT A.I. Memo No. 1471, Cambridge, Mass.
Kartam, N. (1996). "Neural network-spreadsheet integration for earth-moving operations." Microcomputers in Civ. Engrg., 11(4), 283-288.
MATLAB, high-performance numeric computation and visualization software: user's guide: for UNIX workstations. (1992). Math Works, Inc., Natick, Mass.
Moselhi, O., Hegazy, T., and Fazio, P. (1991). "Neural networks as tools in construction." J. Constr. Engrg. and Mgmt., 117(4), 606-623.
Moselhi, O., Hegazy, T., and Fazio, P. (1993). "DBID: analogy-based DSS for bidding in construction." J. Constr. Engrg. and Mgmt., 119(3), 466-479.
Narendra, K. S., and Parthasarathy, K. (1990). "Identification and control of dynamical systems using neural networks." IEEE Trans. on Neural Networks, 1(1), 4-27.
Poggio, T., and Girosi, F. (1990). "Networks for approximation and learning." Proc. IEEE, 78(9), 1481-1497.
Pompe, P. P. M., and Feelders, A. J. (1997). "Using machine learning and statistics to predict corporate bankruptcy." Microcomputers in Civ. Engrg., 12(4), 267-276.
Press, W. H., Flannery, B. P., Teukolsky, S. A., and Vetterling, W. T. (1988). Numerical recipes in C: the art of scientific computing. Cambridge University Press, New York, N.Y.
Tikhonov, A. N., and Arsenin, V. Y. (1977). Solution of ill-posed problems. W. H. Winston, Washington, D.C.

APPENDIX II. NOTATION

The following symbols are used in this paper:

a_i = parameters obtained from the linear least-squares regression algorithm for output normalization;
b_k = \beta^{2k}/(k! \, 2^k);
C = diagonal matrix in singular value decomposition;
c_i = singular values;
c_1 = largest singular value;
d = example output vector;
d_i = example output corresponding to input example x_i = (x_i^1, x_i^2, ..., x_i^p);
\bar{d}^n = normalized output data;
E(F) = error function;
E_c(F) = regularization term;
E_s(F) = standard error term;
E_t = average training error;
E_v = average validation error;
e = independent noise function with zero mean;
F(x) = approximation mapping function;
g(x) = nonlinear mapping function mapping a p-dimensional input space R^p to a one-dimensional output space R^1;
G = Gaussian matrix;
G_i(x) = exp[-\sigma \|x - x_i\|^2];
G(x_i;x_j) = exp[-\sigma \|x_i - x_j\|^2];
I = identity matrix;
N = number of data examples;
N_t = number of training examples;
N_v = number of validation examples;
o_i = computed response corresponding to x_i = (x_i^1, x_i^2, ..., x_i^p);
P = linear differential operator in the regularization term;
p = dimension of the input space (number of input attributes);
U = orthonormal matrix in singular value decomposition;
U^{(i)} = ith column of U;
V = orthonormal matrix in singular value decomposition;
V^{(i)} = ith column of V;
w = weight vector;
x_i = (x_i^1, x_i^2, ..., x_i^p) = input data;
x_i^n = nth attribute of the ith example input data;
\bar{x}_i = mean for the ith input attribute;
\bar{x}_i^n = normalized input data;
\alpha = (\alpha_1, \alpha_2, ..., \alpha_p) = a sequence of nonnegative integers;
\epsilon = N\epsilon_m|c_1| = threshold value for the c_i;
\epsilon_m = machine precision;
\sigma = smoothing parameter; and
\sigma_i = standard deviation for the ith input attribute.