
Using Neural Networks in Reliability Prediction

NACHIMUTHU KARUNANITHI, DARRELL WHITLEY, and YASHWANT K. MALAIYA, Colorado State University

The neural-network model requires only failure history as input and predicts future failures more accurately than some analytic models. But the approach is very new.

In reliability research, the concern is how to develop general prediction models. Existing models typically rely on assumptions about development environments, the nature of software failures, and the probability of individual failures occurring. Because all these assumptions must be made before the project begins, and because many projects are unique, the best you can hope for is statistical techniques that predict failure on the basis of failure data from similar projects. These models are called reliability-growth models because they predict when reliability has grown enough to warrant product release. Because reliability-growth models exhibit different predictive capabilities at different testing phases both within a project and across projects, researchers are finding it nearly impossible to develop a universal model that will provide accurate predictions under all circumstances.

A possible solution is to develop models that don't require making assumptions about either the development environment or external parameters. Recent advances in neural networks show that they can be used in applications that involve predictions. An interesting and difficult application is time-series prediction, which predicts a complex sequential process like reliability growth. One drawback of neural networks is that you can't interpret the knowledge stored in their weights in simple terms that are directly related to software metrics, which is something you can do with some analytic models.

Neural-network models have a significant advantage over analytic models, though, because they require only failure history as input, no assumptions. Using that input, the neural-network model automatically develops its own internal model of the failure process and predicts future failures. Because it adjusts model complexity to match the complexity of the failure history, it can be more accurate than some commonly used analytic models. In our experiments, we found this to be true.

TAILORING NEURAL NETWORKS FOR PREDICTION

Reliability prediction can be stated in the following way. Given a sequence of cumulative execution times (i1, ..., ik) in Ik(t), and the corresponding observed accumulated faults (o1, ..., ok) in Ok(t) up to the present time t, and the cumulative execution time at the end of a future test session k+h, i_k+h(t+Delta), predict the corresponding cumulative faults o_k+h(t+Delta).

For the prediction horizon h=1, the prediction is called the next-step prediction (also known as short-term prediction), and for h=n (n >= 2) consecutive test intervals, it is known as the n-step-ahead prediction, or long-term prediction. A type of long-term prediction is endpoint prediction, which involves predicting an output for some future fixed point in time. In endpoint prediction, the prediction window becomes shorter as you approach the fixed point of interest.

Here

    Delta = sum(j = k+1 to k+h) Delta_j

represents the cumulative execution time of h consecutive future test sessions. You can use Delta to predict the number of accumulated faults after some specified amount of testing. From the predicted accumulated faults, you can infer both the current reliability and how much testing may be needed to meet the particular reliability criterion.

This reliability-prediction problem can be stated in terms of a neural-network mapping:

    p: {(Ik(t), Ok(t)), i_k+h(t+Delta)} -> o_k+h(t+Delta)

where (Ik(t), Ok(t)) represents the failure history of the software system at time t used in training the network and o_k+h(t+Delta) is the network's prediction.

Training the network is the process of adjusting the neurons' (neurons are defined in the box below) interconnection strength using part of the software's failure history. After a neural network is trained, you can use it to predict the total number of faults to be detected at the end of a future test session k+h by inputting i_k+h(t+Delta).

The three steps of developing a neural network for reliability prediction are specifying a suitable network architecture, choosing the training data, and training the network.

Specifying an architecture. Both prediction accuracy and resource allocation to simulation can be compromised if the architecture is not suitable. Many of the algorithms used to train neural networks require you to decide the network architecture ahead of time or by trial and error. To provide a more suitable means of selecting the appropriate network architecture for a project, Scott Fahlman and colleagues developed the cascade-correlation learning algorithm.1
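The quantities in the problem statement above can be made concrete with a short sketch. All names and numbers here are illustrative, not data from the article:

```python
# Cumulative execution time at the end of each completed test session: I_k
i = [5.0, 11.0, 18.0, 26.0]       # i_1 .. i_k  (k = 4)
# Corresponding observed cumulative faults: O_k
o = [12, 21, 27, 31]              # o_1 .. o_k

# Execution time planned for each of the h future sessions (here h = 2)
future_sessions = [7.0, 6.0]      # session lengths Delta_{k+1}, Delta_{k+2}

# Delta: cumulative execution time of the h consecutive future sessions
delta = sum(future_sessions)

# Input for the (k+h)th prediction: cumulative time at the end of session k+h
i_k_plus_h = i[-1] + delta
print(delta, i_k_plus_h)          # 13.0 39.0
```

Feeding i_k_plus_h to a trained network would then yield the predicted cumulative faults o_k+h(t+Delta).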

WHAT ARE NEURAL NETWORKS?

Neural networks are a computational metaphor inspired by studies of the brain and nervous system in biological organisms. They are highly idealized mathematical models of how we understand the essence of these simple nervous systems. The basic characteristics of a neural network are:

+ It consists of many simple processing units, called neurons, that perform a local computation on their input to produce an output.
+ Many weighted neuron interconnections encode the knowledge of the network.
+ The network has a learning algorithm that lets it automatically develop internal representations.

One of the most widely used processing-unit models is based on the logistic function. The resulting transfer function is given by

    output = 1 / (1 + e^-Sum)

where Sum is the aggregate of weighted inputs. Figure A shows the actual I/O response of this unit model, where Sum is computed as a weighted sum of inputs, Sum = w0x0 + ... + wnxn. The unit is nonlinear and continuous.

Richard Lippmann describes many neural-network models and learning procedures.1 Two well-known classes suitable for prediction applications are feed-forward networks and recurrent networks. In the main text of the article, we are concerned with feed-forward networks and a variant class of recurrent networks, called Jordan networks. We selected these two model classes because we found them to be more accurate in reliability predictions than other network models.2,3

Figure A. The I/O response of a logistic processing unit.

REFERENCES
1. R. Lippmann, "An Introduction to Computing with Neural Nets," IEEE ASSP Magazine, Apr. 1987, pp. 4-22.
2. N. Karunanithi, Y. Malaiya, and D. Whitley, "Prediction of Software Reliability Using Neural Networks," Proc. Int'l Symp. Software Reliability Eng., May 1991, pp. 124-130.
3. N. Karunanithi, D. Whitley, and Y. Malaiya, "Prediction of Software Reliability Using Connectionist Approaches," IEEE Trans. Software Eng. (to appear).
54 JULY 1992
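The logistic processing unit described in the box can be sketched in a few lines; the function name and example inputs are illustrative, not from the article:

```python
import math

def neuron_output(inputs, weights, bias):
    """Logistic processing unit: computes Sum (a weighted sum of the inputs
    plus a bias) and squashes it through 1 / (1 + e^-Sum)."""
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-total))

# The unit is nonlinear and continuous: Sum = 0 gives exactly 0.5, large
# positive sums approach 1.0, and large negative sums approach 0.0.
print(neuron_output([0.0], [1.0], 0.0))   # 0.5
```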
The algorithm, which dynamically constructs feed-forward neural networks, combines the ideas of incremental architecture and learning in one training algorithm. It starts with a minimal network (consisting of an input and an output layer) and dynamically trains and adds hidden units one by one, until it builds a suitable multilayer architecture.

As the box on the facing page describes, we chose feed-forward and Jordan networks as the two classes of models most suitable for our prediction experiments. Figure 1a shows a typical three-layer feed-forward network; Figure 1b shows a Jordan network.

A typical feed-forward neural network comprises layers of simple neurons. The input neurons do not perform any computation; they merely copy the input values and associate them with weights, feeding the neurons in the (first) hidden layer. Feed-forward networks can propagate activations only in the forward direction; Jordan networks, on the other hand, have both forward and feedback connections. The feedback connection in the Jordan network in Figure 1b is from the output layer to the hidden layer through a recurrent input unit. At time t, the recurrent unit receives as input the output unit's output at time t-1. That is, the output of the additional input unit is the same as the output of the network that corresponds to the previous input pattern. In Figure 1b, the dashed line represents a fixed connection with a weight of 1.0. This weight copies the output to the additional recurrent input unit and is not modified during training.

We used the cascade-correlation algorithm to construct both feed-forward and Jordan networks. Figure 2 shows a typical feed-forward network developed by the cascade-correlation algorithm. The cascade network differs from the feed-forward network in Figure 1a because it has feed-forward connections between I/O layers, not just among hidden units.

In our experiments, all neural networks use one output unit. On the input layer, the feed-forward nets use one input unit; the Jordan networks use two units, the normal input unit and the recurrent input unit.

Choosing training data. A neural network's predictive ability can be affected by what it learns and in what sequence. Figure 3 shows two reliability-prediction training regimes: generalization training and prediction training.

Generalization training is the standard way of training feed-forward networks. During training, each input i_t at time t is associated with the corresponding output o_t. Thus the network learns to model the actual functionality between the independent (or input) variable and the dependent (or output) variable.

Prediction training, on the other hand, is the general approach for training recurrent networks. Under this training, the value of the input variable i_t at time t is associated with the actual value of the output variable at time t+1. Here, the network learns to predict outputs anticipated at the next time step.

Thus if you combine these two training regimes with the feed-forward network and the Jordan network, you get four neural-network prediction models: FFN generalization, FFN prediction, JN generalization, and JN prediction.
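The two training regimes can be sketched as pair-building functions; the function names and toy data here are illustrative assumptions, not the article's data:

```python
def generalization_pairs(times, faults):
    """Pair each input i_t with its own output o_t (standard feed-forward training)."""
    return list(zip(times, faults))

def prediction_pairs(times, faults):
    """Pair each input i_t with the NEXT output o_{t+1} (recurrent-style training)."""
    return list(zip(times[:-1], faults[1:]))

times = [1, 2, 3, 4]
faults = [10, 15, 18, 20]
print(generalization_pairs(times, faults))   # [(1, 10), (2, 15), (3, 18), (4, 20)]
print(prediction_pairs(times, faults))       # [(1, 15), (2, 18), (3, 20)]
```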
Figure 1. (A) A standard feed-forward network and (B) a Jordan network. In each, the input layer takes execution time and the output layer produces cumulative faults.

Figure 2. A feed-forward network developed by the cascade-correlation algorithm.
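How a Jordan network's recurrent input unit copies the previous output back in, through the fixed connection with weight 1.0, can be sketched as a loop; the toy "network" function is purely illustrative, not a trained model:

```python
def run_jordan(time_inputs, net, initial_recurrent=0.0):
    """Run a Jordan-style loop: 'net' maps (normal input, recurrent input) to an
    output, and the recurrent input at time t is the output from time t-1."""
    outputs = []
    recurrent = initial_recurrent
    for x in time_inputs:
        out = net(x, recurrent)      # two input units: normal + recurrent
        outputs.append(out)
        recurrent = out              # copied back through the fixed 1.0 connection
    return outputs

# Toy stand-in for a trained network, for illustration only:
print(run_jordan([1, 2, 3], lambda x, r: x + 0.5 * r))   # [1.0, 2.5, 4.25]
```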

IEEE SOFTWARE 55
Figure 3. Two network-training regimes: (A) generalization training and (B) prediction training.

Before you attempt to use a neural network, you may have to represent the problem's I/O variables in a range suitable for the neural network. In the simplest representation, you can use a direct scaling, which scales execution time and cumulative faults from 0.0 to 1.0. We did not use this simple representation.

Figure 4. Endpoint predictions of neural-network models.

Training the network. Most feed-forward networks and Jordan networks are trained using a supervised learning algorithm. Under supervised learning, the algorithm adjusts the network weights using a quantified error feedback. There are several supervised learning algorithms, but one of the most widely used is back propagation, an iterative procedure that adjusts network weights by propagating the error back into the network.2

Typically, training a neural network involves several iterations (also known as epochs). At the beginning of training, the algorithm initializes network weights with a set of small random values (between +1.0 and -1.0). During each epoch, the algorithm presents the network with a sequence of training pairs. We used cumulative execution time as input and the corresponding cumulative faults as the desired output to form a training pair. The algorithm then calculates a sum squared error between the desired outputs and the network's actual outputs. It uses the gradient of the sum squared error (with respect to weights) to adapt the network weights so that the error measure is smaller in future epochs. Training terminates when the sum squared error is below a specified tolerance limit.

PREDICTION EXPERIMENT

We used the testing and debugging data from an actual project described by Yoshiro Tohma and colleagues3 to illustrate the prediction accuracy of neural networks. In this data (Tohma's Table 4), execution time was reported in terms of days.

Method. Most training methods initialize neural-network weights with random values at the beginning of training, which causes the network to converge to different weight sets at the end of each training session. You can thus get different prediction results at the end of each training session. To compensate for these prediction variations, you can take an average over a large number of trials. In our experiment, we trained the network with 50 random seeds for each training-set size and averaged their predictions.
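The supervised training procedure described above can be sketched as a minimal gradient-descent loop on the zero-hidden-unit logistic model; the learning rate, tolerance, epoch limit, and toy data are illustrative assumptions, not the article's settings:

```python
import math
import random

def train_logistic(pairs, lr=0.5, tolerance=1e-3, max_epochs=20000, seed=0):
    """Minimal sketch of supervised training for the zero-hidden-unit model
    o = 1 / (1 + e^-(w0 + w1*t)). Inputs/outputs assumed scaled to [0, 1]."""
    rng = random.Random(seed)            # a different seed converges to a
    w0 = rng.uniform(-1.0, 1.0)          # different weight set, which is why
    w1 = rng.uniform(-1.0, 1.0)          # predictions are averaged over seeds
    for _ in range(max_epochs):
        sse, g0, g1 = 0.0, 0.0, 0.0
        for t, target in pairs:
            out = 1.0 / (1.0 + math.exp(-(w0 + w1 * t)))
            err = out - target
            sse += err * err             # sum squared error for this epoch
            grad = err * out * (1.0 - out)   # chain rule through the logistic
            g0 += grad
            g1 += grad * t
        if sse < tolerance:              # stop below the tolerance limit
            break
        w0 -= lr * g0                    # step against the error gradient
        w1 -= lr * g1
    return w0, w1

# Toy scaled (execution time, cumulative faults) pairs:
pairs = [(0.1, 0.05), (0.5, 0.5), (0.9, 0.95)]
w0, w1 = train_logistic(pairs)
p_low = 1.0 / (1.0 + math.exp(-(w0 + w1 * 0.1)))
p_high = 1.0 / (1.0 + math.exp(-(w0 + w1 * 0.9)))
```

After training, the fitted curve should rise with execution time, so p_high exceeds p_low on this toy data.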

56 JULY 1992
Results. After training the neural network with a failure history up to time t (where t is less than the total testing and debugging time of 44 days), you can use the network to predict the cumulative faults at the end of a future testing and debugging session.

To evaluate neural networks, you can use the following extreme prediction horizons: the next-step prediction (at t+1) and the endpoint prediction (at t=46). Since you already know the actual cumulative faults for those two future testing and debugging sessions, you can compute the network's prediction error at t. Then the relative prediction error is given by (predicted faults - actual faults)/actual faults.4

Table 1. Summary of endpoint-prediction errors (percent).

                        Average error                Maximum error
Model                1st half  2nd half  Overall   1st half  2nd half  Overall
Neural-net models
FFN generalization      7.34      1.19     3.36      10.48      2.85    10.48
FFN prediction          6.25      1.10     2.92       8.69      3.18     8.69
JN generalization       4.26      3.03     3.47      11.00      3.97    11.00
JN prediction           5.43      2.08     3.26       7.76      3.48     7.76
Analytic models
Logarithmic            21.59      6.16    11.61      35.75     13.48    35.75
Inverse polynomial     11.97      5.65     7.88      20.36     11.65    20.36
Exponential            23.81      6.88    12.85      40.85     15.25    40.85
Power                  38.30      6.39    17.66      76.52     15.64    76.52
Delayed S-shape        43.01      7.11    19.78      54.52     22.38    54.52
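The relative-prediction-error measure translates directly into code; the example values are illustrative:

```python
def relative_error(predicted_faults, actual_faults):
    """Relative prediction error: (predicted - actual) / actual."""
    return (predicted_faults - actual_faults) / actual_faults

# A model predicting 44 faults when 40 were actually observed is 10% high:
print(relative_error(44, 40))   # 0.1
```

A negative value means the model underestimated the fault count.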
Figures 4 and 6 show the relative prediction-error curves of the neural-network models. In these figures, the percentage prediction error is plotted against the percentage of normalized execution time.

Figures 4 and 5 show the relative error curves for endpoint predictions of neural networks and five well-known analytic models. Results from the analytic models are included because they can provide a better basis for evaluating neural networks. Yashwant Malaiya and colleagues give details about the analytic models and fitting.5,6 The graphs suggest that neural networks are more accurate than analytic models.

Figure 5. Endpoint predictions of analytic models.

Table 1 gives a summary of Figures 4 and 5 in terms of average and maximum error measures. The columns under Average error represent the following:
+ First half is the model's average prediction error in the first half of the testing and debugging session.
+ Second half is the model's average prediction error in the second half of the testing and debugging session.
+ Overall is the model's average prediction error for the entire testing and debugging session.

These average error measures also suggest that neural networks are more accurate than analytic models. First-half results are interesting because the neural-network models' average prediction errors are less than eight percent of the total defects disclosed at the end of the testing and debugging session. This result is significant because such reliable predictions at early stages of testing can be valuable in long-term planning. Among the neural-network models, the difference in accuracy is not significant, whereas the analytic models exhibit considerable variations. Among the analytic models, the inverse polynomial model and the logarithmic model seem to perform reasonably well. The maximum prediction errors in the table show how unrealistic a model can be. These values also suggest that the neural-network models have fewer worst-case predictions than the analytic models at various phases of testing and debugging.

Figure 6 represents the next-step predictions of both the neural networks and the analytic models. These graphs suggest that the neural-network models have only slightly less next-step prediction accuracy than the analytic models.

Figure 6. Next-step predictions of neural-network models and analytic models.

Table 2 shows the summary of Figure 6 in terms of average and maximum errors. Since the neural-network models' average errors are above the analytic models' in the first half by only two to four percent, and the difference in the second half is less than two percent, these two approaches don't appear to be that different. But worst-case prediction errors may suggest that the analytic models have a slight edge over the neural-network models. However, the difference in overall average errors is less than two percent, which suggests that both the neural-network models and the analytic models have a similar next-step prediction accuracy.

NEURAL NETWORKS VS. ANALYTIC MODELS

In comparing the five analytic models and the neural networks in our experiment, we used the number of parameters as a measure of complexity; the more parameters, the more complex the model. Since we used the cascade-correlation algorithm for evolving network architecture, the number of hidden units used to learn the problem varied, depending on the size of the training set. On average, the neural networks used one hidden unit when the normalized execution time was below 60 to 75 percent and zero hidden units afterward. However, occasionally two or three hidden units were used before training was complete.

Though we have not shown a similar comparison between Jordan network models and equivalent analytic models, extending the feed-forward network comparison is straightforward. However, the models developed by the Jordan network can be more complex because of the additional feedback connection and the weights from the additional input unit.

FFN generalization. In this method, with no hidden unit, the network's actual computation is the same as a simple logistic expression:

    o_i = 1 / (1 + e^-(w0 + w1*t_i))

where w0 and w1 are weights from the bias unit and the input unit, respectively, and t_i is the cumulative execution time at the end of the ith test session.

This expression is equivalent to a two-parameter logistic-function model, whose mu(t_i) is given by

    mu(t_i) = 1 / (1 + e^(B0 + B1*t_i))

where B0 and B1 are parameters. It is easy to see that B0 = -w0 and B1 = -w1. Thus, training neural networks (finding weights) is the same as estimating these parameters.

If the network uses one hidden unit, the model it develops is the same as a three-parameter model:

    mu(t_i) = 1 / (1 + e^(B0 + B1*t_i + B2*h_i))

where B0, B1, and B2 are the model parameters, which are determined by the weights feeding the output unit. In this model, B0 = -w0, B1 = -w1, and B2 = -wh (the weight from the hidden unit). However, the output of the hidden unit, h_i, is an intermediate value computed using another two-parameter logistic-function expression:

    h_i = 1 / (1 + e^-(w3 + w4*t_i))
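The equivalence between the zero-hidden-unit network and the two-parameter logistic-function model (B0 = -w0, B1 = -w1) can be checked numerically; the weight values here are illustrative, not fitted to any data:

```python
import math

def network_output(w0, w1, t):
    """Zero-hidden-unit feed-forward net: o = 1 / (1 + e^-(w0 + w1*t))."""
    return 1.0 / (1.0 + math.exp(-(w0 + w1 * t)))

def logistic_model(b0, b1, t):
    """Two-parameter logistic-function model: mu(t) = 1 / (1 + e^(b0 + b1*t))."""
    return 1.0 / (1.0 + math.exp(b0 + b1 * t))

# With b0 = -w0 and b1 = -w1, the two forms coincide for every t.
w0, w1 = 0.3, 1.7    # illustrative weights, not fitted values
max_diff = max(abs(network_output(w0, w1, t) - logistic_model(-w0, -w1, t))
               for t in (0.0, 0.25, 0.5, 1.0))
print(max_diff < 1e-12)   # True
```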
Thus, the model has five parameters that correspond to the five weights in the network.

FFN prediction. In this model, for the network with no hidden unit, the equivalent two-parameter model is

    mu(t_i) = 1 / (1 + e^(B0 + B1*t_i-1))

where t_i-1 is the cumulative execution time at the (i-1)th instant. For the network with one hidden unit, the equivalent five-parameter model is

    mu(t_i) = 1 / (1 + e^(B0 + B1*t_i-1 + B2*h_i))

Implications. These expressions imply that the neural-network approach develops models that can be relatively complex. These expressions also suggest that neural networks use models of varying complexity at different phases of testing. In contrast, the analytic models have only two or three parameters, and their complexity remains static. Thus, the main advantage of neural-network models is that model complexity is automatically adjusted to the complexity of the failure history.

We have demonstrated how you can use neural-network models and training regimes for reliability prediction. Results with actual testing and debugging data suggest that neural-network models are better at endpoint predictions than analytic models. Though the results presented here are for only one data set, the results are consistent with 13 other data sets we tested.

The major advantages in using the neural-network approach are:
+ It is a black-box approach; the user need not know much about the underlying failure process of the project.
+ It is easy to adapt models of varying complexity at different phases of testing within a project as well as across projects.
+ You can simultaneously construct a model and estimate its parameters if you use a training algorithm like cascade correlation.

We recognize that our experiments are only beginning to tap the potential of neural-network models in reliability, but we believe that this class of models will eventually offer significant benefits. We also recognize that our approach is very new and still needs research to demonstrate its practicality on a broad range of software projects.

ACKNOWLEDGMENTS
We thank IEEE Software reviewers for their useful comments and suggestions. We also thank Scott Fahlman for providing the code for his cascade-correlation algorithm. This research was supported in part by NSF grant IN-9010546 and in part by a project funded by the SDIO/IST and monitored by the Office of Naval Research.

REFERENCES
1. S. Fahlman and C. Lebiere, "The Cascade-Correlation Learning Architecture," Tech. Report CMU-CS-90-100, CS Dept., Carnegie Mellon Univ., Pittsburgh, Feb. 1990.
2. D. Rumelhart, G. Hinton, and R. Williams, "Learning Internal Representations by Error Propagation," in Parallel Distributed Processing, Volume I, MIT Press, Cambridge, Mass., 1986, pp. 318-362.
3. Y. Tohma et al., "Parameter Estimation of the Hyper-Geometric Distribution Model for Real Test/Debug Data," Tech. Report 901002, CS Dept., Tokyo Inst. of Technology, 1990.
4. J. Musa, A. Iannino, and K. Okumoto, Software Reliability: Measurement, Prediction, Application, McGraw-Hill, New York, 1987.
5. Y. Malaiya, N. Karunanithi, and P. Verma, "Predictability Measures for Software Reliability Models," IEEE Trans. Reliability Eng. (to appear).
6. Software Reliability Models: Theoretical Developments, Evaluations and Applications, Y. Malaiya and P. Srimani, eds., IEEE CS Press, Los Alamitos, Calif., 1990.

Nachimuthu Karunanithi is a PhD candidate in computer science at Colorado State University. His research interests are neural networks, genetic algorithms, and software-reliability modeling. Karunanithi received a BE in electrical engineering from PSG Tech., Madras University, in 1982 and an ME in computer science from Anna University, Madras, in 1984. He is a member of the subcommittee on software reliability engineering of the IEEE Computer Society's Technical Committee on Software Engineering.

Darrell Whitley is an associate professor of computer science at Colorado State University. He has published more than 30 papers on neural networks and genetic algorithms. Whitley received an MS in computer science and a PhD in anthropology, both from Southern Illinois University. He serves on the Governing Board of the International Society for Genetic Algorithms and is program chair of both the 1992 Workshop on Combinations of Genetic Algorithms and Neural Networks and the 1992 Foundations of Genetic Algorithms Workshop.

Yashwant K. Malaiya is a guest editor of this special issue. His photograph and biography appear elsewhere in this issue.

Address questions about this article to Karunanithi at CS Dept., Colorado State University, Fort Collins, CO 80523; Internet karunani@cs.colostate.edu.

