
European Journal of Operational Research 116 (1999) 16–32

Theory and Methodology

Artificial neural networks in bankruptcy prediction:
General framework and cross-validation analysis

Guoqiang Zhang a, Michael Y. Hu b,*, B. Eddy Patuwo b, Daniel C. Indro b

a Department of Decision Sciences, College of Business, Georgia State University, Atlanta, GA 30303, USA
b Graduate School of Management, College of Business Administration, Kent State University, Kent, OH 44240-0001, USA

* Corresponding author. Tel.: 001 330 672 2426; fax: 001 330 672 2448; e-mail: mhu@kentvm.kent.edu.

Received 10 March 1997; accepted 22 December 1997

Abstract

In this paper, we present a general framework for understanding the role of artificial neural networks (ANNs) in bankruptcy prediction. We give a comprehensive review of neural network applications in this area and illustrate the link between neural networks and traditional Bayesian classification theory. The method of cross-validation is used to examine the between-sample variation of neural networks for bankruptcy prediction. Based on a matched sample of 220 firms, our findings indicate that neural networks are significantly better than logistic regression models in prediction as well as classification rate estimation. In addition, neural networks are robust to sampling variations in overall classification performance. © 1999 Elsevier Science B.V. All rights reserved.

Keywords: Artificial intelligence; Neural networks; Bankruptcy prediction; Classification

1. Introduction

Prediction of bankruptcy has long been an important topic and has been studied extensively in the accounting and finance literature [2,3,6,16,29,30]. Since the criterion variable is categorical, bankrupt or nonbankrupt, the problem is one of classification. Thus, discriminant analysis, logit and probit models have typically been used for this purpose. However, the validity and effectiveness of these conventional statistical methods depend largely on some restrictive assumptions such as the linearity, normality and independence among predictor variables and a pre-existing functional form relating the criterion variable and predictor variables. These traditional methods work best only when all or most statistical assumptions are apt. Recent studies in artificial neural networks (ANNs) show that ANNs are powerful tools for pattern recognition and pattern classification due to their nonlinear nonparametric adaptive-learning properties. ANN models have already been used successfully for many financial problems including bankruptcy prediction [62,67].

Many researchers in bankruptcy forecasting, including Lacher et al. [33], Sharda and Wilson


[57], Tam and Kiang [61], and Wilson and Sharda [66], report that neural networks produce significantly better prediction accuracy than classical statistical techniques. However, why neural networks give superior classification is not clearly explained in the literature. In particular, the relationship between neural networks and traditional classification theory is not fully recognized [51]. In this paper, we provide an explanation: neural network outputs are estimates of Bayesian posterior probabilities, which play a very important role in traditional statistical classification and pattern recognition problems.

In using neural networks, the entire available data set is usually randomly divided into a training (in-sample) set and a test (out-of-sample) set. The training set is used for neural network model building and the test set is used to evaluate the predictive capability of the model. While this practice is adopted in many studies, the random division of a sample into training and test sets may introduce bias in model selection and evaluation in that the characteristics of the test set may be very different from those of the training set. The estimated classification rate can be very different from the true classification rate, particularly when small samples are involved. For this reason, one of the major purposes of this paper is to use a cross-validation scheme to accurately describe the predictive performance of neural networks. Cross-validation is a resampling technique which uses multiple random training and test subsamples. The advantage of cross-validation is that all observations or patterns in the available sample are used for testing and most of them are also used for training the model. The cross-validation analysis will yield valuable insights into the reliability of the neural networks with respect to sampling variation.

The remainder of the paper is organized as follows. In Section 2, we give a brief description of neural networks and a general discussion of Bayesian classification theory. The link between neural networks and the traditional classification theory is also presented. Following that is a survey of the literature on predicting bankruptcy using neural networks. The methodology section contains the variable description, the data used and the design of this study. We then discuss the cross-validation results, followed by the final section containing concluding remarks.

2. Neural networks for pattern classification

2.1. Neural networks

ANNs are flexible, nonparametric modeling tools. They can perform any complex function mapping with arbitrarily desired accuracy [14,23-25]. An ANN is typically composed of several layers of many computing elements called nodes. Each node receives an input signal from other nodes or external inputs and, after processing the signals locally through a transfer function, outputs a transformed signal to other nodes or to the final result. ANNs are characterized by the network architecture, that is, the number of layers, the number of nodes in each layer and how the nodes are connected. In a popular form of ANN called the multi-layer perceptron (MLP), all nodes and layers are arranged in a feedforward manner. The first or lowest layer is called the input layer, where external information is received. The last or highest layer is called the output layer, where the network produces the model solution. In between, there are one or more hidden layers, which are critical for ANNs to identify the complex patterns in the data. All nodes in adjacent layers are connected by acyclic arcs from a lower layer to a higher layer. A multi-layer perceptron with one hidden layer and one output node is shown in Fig. 1. This three-layer MLP is a commonly used ANN structure for two-group classification problems like bankruptcy prediction. We will focus on this particular type of neural network throughout the paper.

As in any statistical model, the parameters (arc weights) of a neural network model need to be estimated before the network can be used for prediction purposes. The process of determining these weights is called training. The training phase is a critical part of the use of neural networks. For classification problems, the network training is supervised in that the desired or target response of the network for each input pattern is always known a priori.

Fig. 1. A typical fully connected feedforward neural network (MLP) used for two-group classification problems.

During the training process, patterns or examples are presented to the input layer of a network. The activation values of the input nodes are weighted and accumulated at each node in the hidden layer. The weighted sum is transferred by an appropriate transfer function into the node's activation value. It then becomes an input into the nodes in the output layer. Finally, an output value is obtained to match the desired value. The aim of training is to minimize the differences between the ANN output values and the known target values for all training patterns.

Let x = (x_1, x_2, ..., x_n) be an n-vector of predictive or attribute variables, y be the output from the network, and w_1 and w_2 be the matrices of linking weights from the input to the hidden layer and from the hidden to the output layer, respectively. Then a three-layer MLP is in fact a nonlinear model of the form

    y = f_2(w_2 f_1(w_1 x)),                                        (1)

where f_1 and f_2 are the transfer functions for the hidden nodes and the output node, respectively. The most popular choice for f_1 and f_2 is the sigmoid function:

    f_1(x) = f_2(x) = (1 + e^{-x})^{-1}.                            (2)

The purpose of network training is to estimate the weight matrices in Eq. (1) such that an overall error measure such as the mean squared error (MSE) or the sum of squared errors (SSE) is minimized. The MSE can be defined as

    MSE = \frac{1}{N} \sum_{j=1}^{N} (a_j - y_j)^2,                 (3)

where a_j and y_j represent the target value and the network output for the jth training pattern, respectively, and N is the number of training patterns.
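
To make Eqs. (1)-(3) concrete, here is a minimal sketch of the forward pass and the error measure in Python with NumPy. It is our illustration rather than the authors' implementation; the weight shapes, the absence of bias terms and the toy data are all assumptions.

```python
import numpy as np

def sigmoid(z):
    # Eq. (2): f(z) = (1 + exp(-z))^(-1)
    return 1.0 / (1.0 + np.exp(-z))

def mlp_output(x, w1, w2):
    # Eq. (1): y = f2(w2 f1(w1 x)) for a three-layer MLP with one output node.
    # x: (n,) input pattern; w1: (h, n) input-to-hidden weights;
    # w2: (h,) hidden-to-output weights.
    return sigmoid(w2 @ sigmoid(w1 @ x))

def mse(targets, outputs):
    # Eq. (3): MSE = (1/N) * sum_j (a_j - y_j)^2
    targets, outputs = np.asarray(targets), np.asarray(outputs)
    return float(np.mean((targets - outputs) ** 2))

# Toy usage: six inputs (as with the six financial ratios) and five hidden nodes.
rng = np.random.default_rng(0)
w1, w2 = rng.normal(size=(5, 6)), rng.normal(size=5)
X = rng.normal(size=(10, 6))        # ten hypothetical input patterns
a = rng.integers(0, 2, size=10)     # hypothetical 0/1 group targets
y = np.array([mlp_output(x, w1, w2) for x in X])
print(mse(a, y))
```
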
From this perspective, network training is an unconstrained nonlinear minimization problem. The most popular training algorithm is the well-known backpropagation [54], which is basically a gradient steepest descent method with a constant step size. Due to problems of slow convergence and inefficiency with the steepest descent method, many variations of backpropagation have been introduced for training neural networks [5,13,41].
Recently, Hung and Denton [27] and Subramanian and Hung [59] have proposed using a general-purpose nonlinear optimizer, GRG2, in training neural networks. The benefits of GRG2 have been reported in the literature for many classification problems [35,42,59]. This study uses a GRG2-based system to train neural networks.

For a two-group classification problem, only one output node is needed. The output values from the neural network (the predicted outputs) are used for classification. For example, a pattern is classified into group 1 if the output value is greater than 0.5, and into group 2 otherwise. It has been shown that the least squares estimate, as in the neural networks used in this study, yields the posterior probability of the optimal Bayesian classifier [51]. In other words, outputs of neural networks are estimates of the Bayesian posterior probabilities [28]. As will be discussed in the following section, most classification procedures rely on posterior probabilities to classify observations into groups.

2.2. Neural networks and Bayesian classifiers

While neural networks have been successfully applied to many classification problems, the relationship between neural networks and the conventional classification methods is not fully understood in most applications. In this section, we first give a brief overview of Bayesian classifiers. Then the link between neural networks and Bayesian classifiers is discussed.

Statistical pattern recognition (classification) can be established through Bayesian decision theory [15]. In classification problems, a random pattern or observation x \in R^n is given and then a decision about its membership is made. Let \omega be the state of nature, with \omega = \omega_1 for group 1 and \omega = \omega_2 for group 2. Define

    P(\omega_j) = prior probability that an observation x belongs to group j;

    f(x|\omega_j) = conditional probability density function for x given that the pattern belongs to group j,

where j = 1, 2. Using Bayes rule, the posterior probability is

    P(\omega_j|x) = \frac{f(x|\omega_j) P(\omega_j)}{f(x|\omega_1) P(\omega_1) + f(x|\omega_2) P(\omega_2)},  j = 1, 2.    (4)

The Bayes decision rule in classification is a criterion such that the overall misclassification error rate is minimized. The misclassification rate for a given x is

    P(\omega_i|x) = 1 - P(\omega_j|x)  if x belongs to \omega_j, i \neq j; i, j = 1, 2.

Thus, the Bayesian classification rule can be stated as

    Assign x to group k if 1 - P(\omega_k|x) = \min_j [1 - P(\omega_j|x)],

or equivalently

    Assign x to group k if P(\omega_k|x) = \max_j P(\omega_j|x).                       (5)

It is now clear that the Bayesian classification rule is based on the posterior probabilities. In the case that the f(x|\omega_j) (j = 1, 2) are all normal distributions, the above Bayesian classification rule leads to the well-known linear or quadratic discriminant function. See [15] for a detailed discussion.
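
As a concrete illustration of Eqs. (4) and (5), the sketch below evaluates the posterior probabilities and the resulting group assignment for two assumed univariate normal class-conditional densities; the densities, priors and test point are our own illustrative choices, not the paper's data.

```python
import numpy as np
from scipy.stats import norm

def posteriors(x, priors, densities):
    # Eq. (4): P(w_j|x) = f(x|w_j)P(w_j) / sum_i f(x|w_i)P(w_i)
    joint = np.array([f(x) for f in densities]) * np.array(priors)
    return joint / joint.sum()

# Two hypothetical normal class-conditional densities f(x|w_1), f(x|w_2).
densities = [norm(0.0, 1.0).pdf, norm(2.0, 1.0).pdf]
priors = [0.5, 0.5]

x = 1.3
post = posteriors(x, priors, densities)
# Eq. (5): assign x to the group with the maximum posterior probability.
print(post, "-> assign to group", int(post.argmax()) + 1)
```
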
classiers. Then the link between neural networks
To see the relationship between neural net-
and Bayesian classiers is discussed.
works and Bayesian classiers, we need the fol-
Statistical pattern recognition (classication)
lowing theorem [40].
can be established through Bayesian decision the-
ory [15]. In classication problems, a random
pattern or observation x 2 Rn is given and then a Theorem 1. Consider the problem of predicting y
decision about its membership is made. Let x be from x, where x is an n-vector random variable and
the state of nature with x x1 for group 1 and y is a random variable. The function mapping
x x2 for group 2. Dene F : x ! y which minimizes the squared expected
error
P xj prior probability for an observation
Ey F x2 6
x belonging to group j;
is the conditional expectation of y given x,
f xjxj conditional probability density function
for x given that the pattern belongs to group j; F x Eyjx: 7

The result stated in the above theorem is the well-known least-squares estimation theory in statistics.

In the classification context, if x is the observed attribute vector and y is the true membership value, that is, y = 1 if x \in group 1 and y = 0 if x \in group 2, then F(x) becomes

    F(x) = E(y|x) = 1 \cdot P(y = 1|x) + 0 \cdot P(y = 0|x) = P(y = 1|x) = P(\omega_1|x).    (8)

Eq. (8) shows that the least-squares estimate of the mapping function in a classification problem is exactly the Bayesian posterior probability.

As mentioned earlier, neural networks are universal function approximators. A neural network in a classification problem can be viewed as a mapping function, F : R^n \to R (see Eq. (1)), where an n-dimensional input x is submitted to the network and a network output y is obtained to make the classification decision. If all the data in the entire population were available for training, then Eqs. (3) and (6) would be equivalent and the neural networks would produce the exact posterior probabilities in theory. In practice, however, the training data are almost always a sample from an unknown population. Thus it is clear that the network output is actually an estimate of the posterior probability, i.e. y estimates P(\omega_1|x).
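
This property can be checked numerically. In the sketch below (our illustration, with scikit-learn's MLPRegressor standing in for the paper's GRG2-trained network), a small MLP is fitted by least squares to 0/1 membership targets drawn from two known normal groups, and its outputs are compared with the analytic posterior from Eq. (4); the agreement is only approximate, since the fit uses a finite sample.

```python
import numpy as np
from scipy.stats import norm
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
n = 2000
x1 = rng.normal(0.0, 1.0, n)      # group 1 patterns, target y = 1
x2 = rng.normal(2.0, 1.0, n)      # group 2 patterns, target y = 0
X = np.concatenate([x1, x2]).reshape(-1, 1)
y = np.concatenate([np.ones(n), np.zeros(n)])

# Least-squares fit of the 0/1 membership variable, as in Eqs. (3) and (6).
net = MLPRegressor(hidden_layer_sizes=(10,), max_iter=5000,
                   random_state=0).fit(X, y)

grid = np.linspace(-2.0, 4.0, 7).reshape(-1, 1)
true_post = norm.pdf(grid, 0, 1) / (norm.pdf(grid, 0, 1) + norm.pdf(grid, 2, 1))
# The network outputs should roughly track the true posterior P(w_1|x).
print(np.c_[grid, net.predict(grid), true_post])
```
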
3. Bankruptcy prediction with neural networks

ANNs have been studied extensively as a useful tool in many business applications including bankruptcy prediction. In this section, we present a rather comprehensive review of the literature on the use of ANNs in bankruptcy prediction.

The first attempt to use ANNs to predict bankruptcy is made by Odom and Sharda [38]. In their study, three-layer feedforward networks are used and the results are compared to those of multivariate discriminant analysis. Using different ratios of bankrupt firms to nonbankrupt firms in the training samples, they test the effects of different mixture levels on the predictive capability of neural networks and discriminant analysis. Neural networks are found to be more accurate and robust in both training and test results.

Following [38], a number of studies further investigate the use of ANNs in bankruptcy or business failure prediction. For example, Rahimian et al. [49] test the same data set used by Odom and Sharda [38] using three neural network paradigms: the backpropagation network, Athena and the Perceptron. A number of network training parameters are varied to identify the most efficient training paradigm. The focus of this study is mainly on the improvement in efficiency of the backpropagation algorithm. Coleman et al. [12] also report improved accuracy over that of Odom and Sharda [38] by using their NeuralWare ADSS system.

Salchenberger et al. [55] present an ANN approach to predicting bankruptcy of savings and loan institutions. Neural networks are found to perform as well as or better than logit models across three different lead times of 6, 12 and 18 months. To test the sensitivity of the network to different cutoff values in the classification decision, they compare the results for thresholds of 0.5 and 0.2. This information is useful when one expects different costs related to Type I and Type II errors.

Tam and Kiang's paper [61] has had a greater impact on the use of ANNs in general business classification problems as well as in the application of bankruptcy prediction. Based on [60], they provide a detailed analysis of the potentials and limitations of neural network classifiers for business research. Using bank bankruptcy data, they compare neural network models to statistical methods such as linear discriminant analysis, logistic regression, k nearest neighbor and the machine learning method of decision trees. Their results show that neural networks are generally more accurate and robust for evaluating bank status.

Wilson and Sharda [66] and Sharda and Wilson [57] propose a rigorous experimental design methodology to test ANNs' effectiveness. Three mixture levels of bankrupt and nonbankrupt firms for training set composition crossed with three mixture levels for test set composition yield nine different experimental cells. Within each cell, a resampling scheme is employed to generate 20 different pairs of training and test samples. The results more convincingly show the advantages of ANNs relative to discriminant analysis and other statistical methods.

With a very small sample size (18 bankrupt and 18 nonbankrupt firms), Fletcher and Goss [19] employ an 18-fold cross-validation method for model selection. Although the training effort for building ANNs is much higher, ANNs yield much better model fitting and prediction results than logistic regression.

In a large scale study, Altman et al. [4] use over 1000 Italian industrial firms to compare the predictive ability of neural network models with that of linear discriminant analysis. Both discriminant analysis and neural networks produce comparable accuracy on holdout samples, with discriminant analysis producing slightly better predictions. As discussed in the paper, neural networks have potential capabilities for recognizing the health of companies, but the black-box approach of neural networks needs further study.

Poddig [44] reports the results from an ongoing study of bankruptcy prediction using two types of neural networks. The MLP networks with three different data preprocessing methods give overall better and more consistent results than those of discriminant analysis. The use of an extension of Kohonen's learning vector quantizer, however, does not show the same promising results as the MLP. Kerling [31], in a related study, compares bankruptcy prediction between France and the USA. He reports that there is no significant difference in the correct classification rates for American and French companies, although different accounting rules and financial ratios are employed.

Brockett et al. [10] introduce a neural network model as an early warning system for predicting insurer insolvency. Compared to discriminant analysis and other insurance ratings, neural networks have better predictability and generalizability, which suggests that neural networks can be a useful early warning system for solvency monitoring and prediction.

Boritz et al. [9] use the algorithms of backpropagation and optimal estimation theory in training neural networks. The benchmark models by Altman [2] and Ohlson [39] are employed. Results show that the performance of different classifiers depends on the proportions of bankrupt firms in the training and testing data sets, the variables used in the models, and the relative cost of Type I and Type II errors. Boritz and Kennedy [8,9] also investigate the effectiveness of several types of neural networks for bankruptcy prediction problems. Different types of ANNs do have varying effects on the levels of Type I and Type II errors. For example, the network based on optimal estimation theory has the lowest Type I error level and the highest Type II error level, and backpropagation networks have intermediate levels of Type I and II errors, while traditional statistical approaches generally have high Type I error and low Type II error levels. They also find that the performance of ANNs is sensitive to the choice of variables and to sampling errors.

Kryzanowski and Galler [32] employ the Boltzmann machine to evaluate the financial statements of 66 Canadian firms over seven years. Fourteen financial ratios are used in the analysis. The results indicate that the Boltzmann machine is an effective tool for neural network model building. Increasing the training sample size has a positive impact on the accuracy of neural networks.

Leshno and Spector [36] evaluate the prediction capability of various ANN models with different data spans, neural network architectures and numbers of iterations. Their main conclusions are that (1) the prediction capability of the model depends on the sample size used for training; (2) different learning techniques have significant effects on both model fitting and test performance; and (3) overfitting problems are associated with a large number of iterations.

Lee et al. [34] propose and compare three hybrid neural network models for bankruptcy prediction. These hybrid models combine statistical techniques such as multivariate discriminant analysis (MDA) and the ID3 method with neural networks, or combine two different neural networks. Using Korean bankruptcy data, they show that the hybrid systems provide significantly better predictions than the benchmark models of MDA and ID3, and that the hybrid model of an unsupervised network and a supervised network has the best performance.

Most studies use the backpropagation algorithm [11,38,55,61,64,66] or its variations [43,49] in training neural networks.
It is well known that training algorithms such as backpropagation have many undesirable features. Piramuthu et al. [43] address the efficiency of network training algorithms. They find that different algorithms do have effects on the performance of ANNs in several risk classification applications. Coats and Fant [11] and Lacher et al. [33] use a training method called ``Cascade-Correlation'' in a bankruptcy prediction analysis. Compared to MDA or Altman's Z-score model, ANNs provide significantly better discriminant ability. Fanning and Cogger [18] compare the performance of a generalized adaptive neural network algorithm (GANNA) and a backpropagation network. They find that GANNA and the backpropagation algorithm are comparable in terms of predictive capability, but GANNA saves them time and effort in building an appropriate network structure. Raghupathi [47] conducts an exploratory study to compare eight alternative neural network training algorithms in the domain of bankruptcy prediction. He finds that the Madaline algorithm is the best in terms of correct classifications. However, comparing the Madaline with the discriminant analysis model shows no significant advantage of one over the other. Lenard et al. [35] first apply the generalized reduced gradient (GRG2) optimizer for neural network training in an auditor's going concern assessment decision model. Using GRG2-trained neural networks results in better performance in terms of classification rates than using backpropagation-based networks.

Based on the pioneering work by Altman [2], most researchers simply use the same set of five predictor variables as in Altman's original model [11,33,38,49,57,66]. These financial ratios are (1) working capital/total assets; (2) retained earnings/total assets; (3) earnings before interest and taxes/total assets; (4) market value of equity/book value of total debt; and (5) sales/total assets. Other predictor variables are also employed. For example, Raghupathi et al. [48] use 13 financial ratios previously used successfully in other bankruptcy prediction studies. Salchenberger et al. [55] initially select 29 variables and perform stepwise regression to determine the final five predictors used in the neural networks. Tam and Kiang [61] choose 19 financial variables in their study. Piramuthu et al. [43] use 12 continuous variables and three nominal variables. Alici [1] employs two sets of financial ratios. The first set of 28 ratios is suggested by profile analysis, while the second set of nine variables is obtained by using principal component analysis. Boritz and Kennedy [9] test the neural networks with Ohlson's nine and 11 variables as well as Altman's five variables. Rudorfer [53] selects five financial ratios from a company's balance sheet. It is interesting to note that in the literature one study uses as many as 41 independent variables [36], while Fletcher and Goss [19] and Fanning and Cogger [18] use only three variables.

In order to detect the maximal difference between bankrupt and nonbankrupt firms, many studies employ matched samples based on some common characteristics in their data collection process. Characteristics used for this purpose include asset or capital size and sales [19,36,63], industry category or economic sector [48], geographic location [55], and number of branches, age, and charter status [61]. This sample selection procedure implies that the sample mixture ratio of bankrupt to nonbankrupt firms is 50% to 50%.

Most researchers in bankruptcy prediction using neural networks focus on the relative performance of neural networks over other classical statistical techniques. While empirical studies show that ANNs produce better results for many classification or prediction problems, they are not always uniformly superior [46]. Bell et al. [7] report disappointing findings in applying neural networks to predicting commercial bank failures. Boritz and Kennedy [9] find in their study that ANNs perform reasonably well in predicting business failure, but that their performance is not in any systematic way superior to conventional statistical techniques such as logit and discriminant analysis. As the authors discuss, there are many factors which can affect the performance of ANNs. Factors in the ANN model building process, such as network topology, training method and data transformation, are well known. On top of these ANN-related factors, other data-related factors include the choice of predictor variables, sample size and mixture proportion. It should be pointed out that in most studies commercial neural network packages are used, which do restrict the users from obtaining a clear understanding of the sensitivity of solutions with respect to initial starting conditions.

4. Design of the study

ANNs are used to study the relationship between the likelihood of bankruptcy and the relevant financial ratios. Two important questions need to be addressed:
- What is the appropriate neural network architecture for a particular data set?
- How robust is the neural network performance in predicting bankruptcy in terms of sampling variability?

For the first question, there are no definite rules to follow, since the choice of architecture also depends on the classification objective. For example, if the objective is to classify a given set of objects as well as possible, then a larger network may be desirable. On the other hand, if the network is to be used to predict the classification of unseen objects, then a larger network is not necessarily better. For the second question, we employ a fivefold cross-validation approach to investigate the robustness of the neural networks in bankruptcy prediction. This section will first define the variables and the data used in this study. Then a detailed description of the issues in our neural network model building is given. Finally, we illustrate the cross-validation methodology used in the study.

4.1. Measures and sample

As described in the previous section, most neural network applications to bankruptcy problems employ the five variables used by Altman [2], and often a few other variables are also injected into the model. This study utilizes a total of six variables. The first five are the same as those in Altman's study: working capital/total assets, retained earnings/total assets, earnings before interest and tax/total assets, market value of equity/total debt, and sales/total assets. The sixth variable, current assets/current liabilities, measures the ability of a firm to use liquid assets to cover short term obligations. This ratio is believed to have a significant influence on the likelihood of a firm's filing for bankruptcy.
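
For concreteness, the sketch below assembles the six predictors from a firm's statement items. The field names and figures are hypothetical stand-ins for illustration, not COMPUSTAT item codes.

```python
def financial_ratios(f):
    # f: dict of hypothetical balance-sheet and income-statement items.
    working_capital = f["current_assets"] - f["current_liabilities"]
    return {
        "working capital/total assets":       working_capital / f["total_assets"],
        "retained earnings/total assets":     f["retained_earnings"] / f["total_assets"],
        "EBIT/total assets":                  f["ebit"] / f["total_assets"],
        "market value equity/total debt":     f["market_value_equity"] / f["total_debt"],
        "sales/total assets":                 f["sales"] / f["total_assets"],
        "current assets/current liabilities": f["current_assets"] / f["current_liabilities"],
    }

firm = dict(current_assets=120.0, current_liabilities=80.0, total_assets=500.0,
            retained_earnings=60.0, ebit=45.0, market_value_equity=200.0,
            total_debt=150.0, sales=650.0)
print(financial_ratios(firm))
```
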
A sample of manufacturing firms that filed for bankruptcy from 1980 through 1991 is selected from the pool of publicly traded firms in the United States on the New York, American and NASDAQ exchanges. These cutoff dates for the 12-year sample period ensure that the provisions of the 1978 Bankruptcy Reform Act had been fully implemented and that the disposition of all bankrupt firms in the sample could be established by the 1994 year end. An extensive search for bankrupt firms is made of the list provided by the Office of the General Counsel of the Securities and Exchange Commission (SEC) and non-SEC sources such as the Wall Street Journal Index and the Commerce Clearing House's Capital Changes Reporter, as well as the COMPUSTAT research tapes. Company descriptions and characteristics required for the identification of filing dates are obtained from LEXIS/NEXIS news reports as well as other SEC filings.

The initial search netted a sample of 396 manufacturing firms that had filed for bankruptcy. The following editing procedures are further implemented to remove sources of confounding in the sample. Firms that (1) have operated in a regulated industry; (2) are foreign based and traded publicly in the US; or (3) have filed for bankruptcy previously are excluded from the sample. These sample screenings result in a total of 110 bankrupt manufacturing firms.

In order to highlight the effects of key financial characteristics on the likelihood that a firm may go bankrupt, a matched sample of nonbankrupt firms is selected. Financial information for the three years immediately preceding bankruptcy is obtained from the COMPUSTAT database. Nonbankrupt firms are selected to match the 110 bankrupt firms in our sample on two key characteristics: two-digit Standard Industrial Classification code and size. Size corresponds to the total assets of a bankrupt firm in the first of the three years before the bankruptcy filing. The six financial ratios for the year immediately before the filing of bankruptcy are constructed as the independent variables in this study. In summary, we obtained a matched sample of 220 firms, with 110 observations each in the bankrupt and nonbankrupt groups.

4.2. Design of the neural network model

Currently there are no systematic principles to guide the design of a neural network model for a particular classification problem, although heuristic methods such as the pruning algorithm [50], the polynomial time algorithm [52] and the network information technique [65] have been proposed. Since many factors, such as the hidden layers, hidden nodes, data normalization and training methodology, can affect the performance of neural networks, the best network architecture is typically chosen through experiments. In this sense, neural network design is more an art than a science.

ANNs are characterized by their architectures. Network architecture refers to the number of layers, the nodes in each layer and the number of arcs. Based on the results from [14,23,37,42], a network with one hidden layer is generally sufficient for most problems, including classification. All networks used in this study will have one hidden layer. For classification problems, the number of input nodes is the number of predictor variables, which is specified by the particular application. For example, in our bankruptcy prediction model, the networks will have six input nodes in the first layer, corresponding to the six predictor variables. Node biases will be used in the output nodes and the logistic activation function will be specified in the networks. In order to attain greater flexibility in modeling a variety of functional forms, direct connections from the input layer to the output layer will be added (see Fig. 2).
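
A minimal sketch of the forward pass of this augmented architecture, under our own naming and with random illustrative weights, is given below; note how dropping the hidden part leaves exactly a logistic regression on the inputs.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, w_hidden, w_out, w_direct, bias):
    # Standard hidden layer with logistic activations ...
    h = sigmoid(w_hidden @ x)
    # ... plus direct input-to-output arcs and an output-node bias.
    # With no hidden nodes this reduces to sigmoid(w_direct @ x + bias),
    # i.e., a logistic regression model.
    return sigmoid(w_out @ h + w_direct @ x + bias)

rng = np.random.default_rng(2)
x = rng.normal(size=6)      # the six financial ratios
h = 3                       # an illustrative hidden-layer size
print(forward(x, rng.normal(size=(h, 6)), rng.normal(size=h),
              rng.normal(size=6), 0.1))
```
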
The number of hidden nodes is not easy to determine a priori. Although several rules of thumb have been suggested for determining the number of hidden nodes, such as using n/2, n, n + 1 or 2n + 1, where n is the number of input nodes, none of them works well for all situations. Determining the appropriate number of hidden nodes usually involves lengthy experimentation, since this parameter is problem and/or data dependent.

Fig. 2. A completely connected neural network used in this study (direct links from the input nodes to the output node).

Huang and Lippmann [26] point out that the number of hidden nodes to use depends on the complexity of the problem at hand. More hidden nodes are called for in complex problems. The issue of the number of hidden nodes also depends on the objective of classification. If the objective is to classify a given set of observations in the training sample as well as possible, a larger network may be desirable. On the other hand, if the network is used to predict the classification of unseen objects in the test sample, then a larger network is not necessarily appropriate [42]. To see the effect of hidden nodes on the performance of neural network classifiers, we use 15 different levels of hidden nodes, ranging from 1 to 15, in this study.

Another issue in neural networks is the scaling of the variables before training. This so-called data preprocessing is claimed by some authors to be beneficial for the training of the network. Based on our experience (Shanker et al. [56] and also a preliminary study for this project), data transformation is not very helpful for this classification task. Raw data are hence used without any manipulation.

As discussed earlier, neural network training is essentially a nonlinear, nonconvex minimization problem and, mathematically speaking, global solutions cannot be guaranteed. Although our GRG2-based training system is more efficient than the backpropagation algorithm [27], it cannot completely eliminate the possibility of encountering local minima. To decrease the likelihood of being trapped in bad local minima, we train each neural network 50 times using 50 sets of randomly selected initial weights, and the best solution of weights among the 50 runs is retained for a particular network architecture.
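
The sketch below illustrates this multistart idea. SciPy's BFGS routine stands in for the GRG2 optimizer (an assumption made for illustration), and the toy multimodal loss stands in for a network's MSE surface.

```python
import numpy as np
from scipy.optimize import minimize

def train_with_restarts(loss, n_weights, n_restarts=50, seed=0):
    # Start from many random initial weight vectors and keep the best
    # local minimum found, mirroring the paper's 50-restart scheme.
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_restarts):
        w0 = rng.normal(scale=0.5, size=n_weights)
        result = minimize(loss, w0, method="BFGS")
        if best is None or result.fun < best.fun:
            best = result
    return best

# Toy multimodal loss with many local minima.
loss = lambda w: float(np.sum(np.sin(3.0 * w) + 0.1 * w ** 2))
print(train_with_restarts(loss, n_weights=4).fun)
```
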
4.3. Cross-validation

The cross-validation methodology is employed to examine the neural network performance in bankruptcy prediction in terms of sampling variation. Cross-validation is a useful statistical technique for determining the robustness of a model. One simple use of the cross-validation idea consists of randomly splitting a sample into two subsamples, a training set and a test set. The training sample is used for model fitting and/or parameter estimation, and the predictive effectiveness of the fitted model is evaluated using the test sample. Because the best model is tailored to fit one subsample, it often estimates the true error rate overly optimistically [17]. This problem can be eased by using so-called fivefold cross-validation, that is, carrying out the simple cross-validation five times. A good introduction to the ideas and methods of cross-validation can be found in [20,58].

Two cross-validation schemes will be implemented. First, as in most neural network classification problems, arc weights from the training sample will be applied to patterns in the test sample. In this study, a fivefold cross-validation is used. We split the total sample into five equal and mutually exclusive portions. Training is conducted on any four of the five portions, and testing is then performed on the remaining part. As a result, five overlapping training samples are constructed and testing is also performed five times. The average test classification rate over all five partitions is a good indicator of the out-of-sample performance of a classifier. Second, to get a better picture of the predictive capability of the classifier for the unknown population, we also test each case using the whole data set. The idea behind this scheme is that the total sample should be more representative of the population than a small test set which is only one fifth of the whole data set. In addition, when the whole data set is employed as the test sample, sampling variation in the testing environment is completely eliminated, since the same sample is tested five different times. The variability across the five test results reflects only the effect of the training samples.
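
Both schemes reduce to a few lines of bookkeeping, sketched below with generic fit/score placeholders (the toy demo uses a trivial majority-class rule, not the paper's networks; the 220-observation shape matches the matched sample).

```python
import numpy as np

def fivefold_indices(n, seed=0):
    # Split n observations into five equal, mutually exclusive portions.
    idx = np.random.default_rng(seed).permutation(n)
    return np.array_split(idx, 5)

def cross_validate(fit, score, X, y):
    folds = fivefold_indices(len(X))
    small_test, large_test = [], []
    for k in range(5):
        train = np.concatenate([folds[j] for j in range(5) if j != k])
        model = fit(X[train], y[train])
        small_test.append(score(model, X[folds[k]], y[folds[k]]))  # scheme 1
        large_test.append(score(model, X, y))                      # scheme 2
    return small_test, large_test

# Toy demo: a majority-class "classifier" on a 220-firm matched sample.
fit = lambda X, y: int(round(y.mean()))
score = lambda m, X, y: float(np.mean(y == m))
X = np.zeros((220, 6))
y = np.concatenate([np.ones(110), np.zeros(110)])
print(cross_validate(fit, score, X, y))
```
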
The results from neural networks will be compared to those of logistic regression. We choose this technique because it has been shown that logistic regression is often preferred over discriminant analysis in practice [22,45]. Furthermore, the statistical properties of logistic regression are well understood. We would like to know which method gives better estimates of the posterior probabilities and hence leads to better classification results. Since logistic regression is a special case of the neural network without hidden nodes, it is expected in theory that ANNs will produce more accurate estimates than logistic regression, particularly in the training sample.
Logistic regression is implemented using the SAS procedure LOGISTIC.
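
For readers without SAS, an equivalent fit can be sketched in Python with scikit-learn (our substitution, not the paper's code), including the 0.5 cutoff used for classification; the data below are synthetic stand-ins for the six ratios.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
X = rng.normal(size=(220, 6))              # synthetic stand-in for the six ratios
# Synthetic 0/1 labels (1 = bankrupt) with some dependence on the ratios.
y = (X[:, 0] + 0.5 * X[:, 5] + rng.normal(scale=0.5, size=220) > 0).astype(int)

logit = LogisticRegression(max_iter=1000).fit(X, y)
posterior = logit.predict_proba(X)[:, 1]   # estimated P(bankrupt | x)
predicted = (posterior > 0.5).astype(int)  # classify with the 0.5 cutoff
print((predicted == y).mean())             # in-sample classification rate
```
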
5. Results

Table 1 gives the results for the effect of hidden nodes on the overall classification performance for both the training and small test sets across the five subsamples. In general, as expected, one can see that as the number of hidden nodes increases, the overall classification rate in the training sets increases. This shows the neural network's powerful capability of approximating any function as more hidden nodes are used. However, as more hidden nodes are added, the neural network becomes more complex, which may cause the network to learn noise or idiosyncrasies in addition to the underlying rules or patterns. This is recognized as the notorious model overfitting or overspecification problem [21]. For neural networks, obtaining a model that fits the training sample very well is relatively easy if we increase the complexity of the network by, for example, increasing the number of hidden nodes. However, such a large network may have poor generalization capability; that is, it responds incorrectly to other patterns not used in the training process. It is not easy to know a priori when overfitting occurs. One practical way to see this is through the test samples. From Table 1, the best predictive results in the test samples are not necessarily those with the larger numbers of hidden nodes. In fact, the neural classifiers with nine or 10 hidden nodes produce the highest classification rates in the test samples, except for subsample 4, where the best test performance is achieved with four hidden nodes.

For the small test sets, cross-validation results on the predictive performance of both the neural network models and logistic regression are given in Table 2. This table shows that the overall classification rates of neural networks are consistently higher than those of logistic regression. In addition, neural networks seem to be as robust as logistic regression in predicting the overall classification rate. Across the five small test subsamples, the overall classification rate of the neural networks ranges from 77.27% to 84.09%, while logistic regression yields classification rates ranging from 75% to 81.82%. However, for each category of bankruptcy and nonbankruptcy, the results indicate no clear patterns. For some subsamples, neural networks predict much better than logistic regression; for others, logistic regression is better. Table 3 gives the pairwise comparison of these two methods in prediction performance. Overall, neural networks are better than logistic regression, and the difference of 2.28% is statistically significant at the 5% level (the p-value is 0.0342). For bankruptcy prediction, neural networks give an average of 81.82% over the five subsamples, higher than the 78.18% achieved by logistic regression. For nonbankruptcy prediction, the average neural network classification rate is 76.09%, lower than the average logistic regression classification rate of 78.18%. Paired t-test results show that the difference between ANNs and logistic regression is not significant in the prediction of bankrupt and nonbankrupt firms.

Tables 4 and 5 show the superiority of ANNs over logistic regression in estimating the true classification rate for the large test set. As we have indicated previously, the large test set is basically the available whole sample, which consists of a small test sample and a training sample. Hence, the correct classification rates in Table 4 for the large test set are derived directly from the results for both the small test sample and the training sample. For example, for training sample 1, the total number of correctly classified firms in the large test set is 191, which is equal to the best small test result (35) plus the corresponding training result (156).

For the large test set, ANNs consistently provide not only higher overall classification rates but also higher classification rates for each category of bankrupt and nonbankrupt firms across the five training samples. Furthermore, ANNs are more robust than logistic regression in estimating the overall classification rate across the five training samples. This is evidenced by the overall classification rate of 86.82% for each of subsamples 1, 2 and 5, 87.73% for subsample 3, and 85% for subsample 4. Results of the paired t-test in Table 5 clearly show that the differences between ANNs and logistic regression in the overall and individual class classification rates are statistically significant at the 0.05 level. The differences in the overall, bankruptcy and nonbankruptcy classification rates are 8.36%, 11.27% and 4.91%, respectively.

Table 1
The effect of hidden nodes on overall classification results for training and small test sets (a)

Hidden  Subsample 1              Subsample 2              Subsample 3              Subsample 4              Subsample 5
nodes   Training(b)  Test(c)     Training     Test        Training     Test        Training     Test        Training     Test
1       139 (78.98)  31 (70.46)  143 (81.25)  36 (81.82)  142 (80.68)  37 (84.09)  145 (82.39)  31 (70.46)  142 (80.68)  31 (70.46)
2       154 (87.50)  28 (63.64)  143 (81.25)  36 (81.82)  142 (80.68)  35 (79.55)  147 (83.52)  34 (77.27)  145 (82.39)  30 (68.18)
3       150 (85.23)  27 (61.36)  146 (82.96)  33 (75.00)  142 (80.68)  37 (84.09)  151 (85.80)  34 (77.27)  143 (81.25)  30 (68.18)
4       148 (84.09)  26 (59.09)  144 (81.82)  27 (61.36)  147 (83.52)  35 (79.55)  152 (86.36)  35 (79.55)  153 (86.93)  32 (72.73)
5       147 (83.52)  29 (65.91)  145 (82.39)  31 (70.46)  146 (82.96)  36 (81.82)  147 (83.52)  32 (72.73)  146 (82.96)  33 (75.00)
6       154 (87.50)  31 (70.46)  154 (87.50)  32 (72.73)  153 (86.93)  37 (84.09)  154 (87.50)  33 (75.00)  154 (87.50)  31 (70.46)
7       156 (88.64)  33 (75.00)  155 (88.07)  35 (79.55)  152 (86.36)  37 (84.09)  154 (87.50)  33 (75.00)  154 (87.50)  31 (70.46)
8       156 (88.64)  29 (65.91)  156 (88.64)  30 (68.18)  153 (86.93)  36 (81.82)  155 (88.07)  28 (63.64)  153 (86.93)  33 (75.00)
9       156 (88.64)  35 (79.55)  155 (88.07)  36 (81.82)  156 (88.64)  35 (79.55)  156 (88.64)  33 (75.00)  156 (88.64)  32 (72.73)
10      158 (89.77)  26 (59.09)  157 (89.21)  31 (70.46)  156 (88.64)  37 (84.09)  156 (88.64)  29 (65.91)  157 (89.21)  34 (77.27)
11      159 (90.34)  24 (54.55)  158 (89.77)  35 (79.55)  157 (89.21)  35 (79.55)  156 (88.64)  30 (68.18)  157 (89.21)  29 (65.91)
12      159 (90.34)  27 (61.36)  159 (90.34)  35 (79.55)  156 (88.64)  34 (77.27)  159 (90.34)  32 (72.73)  155 (88.07)  34 (77.27)
13      159 (90.34)  23 (52.27)  159 (90.34)  26 (59.09)  158 (89.77)  34 (77.27)  157 (89.21)  32 (72.73)  158 (89.77)  32 (72.73)
14      161 (91.48)  26 (59.09)  159 (90.34)  34 (77.27)  157 (89.21)  33 (75.00)  159 (90.34)  31 (70.46)  157 (89.21)  33 (75.00)
15      160 (90.91)  29 (65.91)  160 (90.91)  32 (72.73)  159 (90.34)  33 (75.00)  160 (90.91)  28 (63.64)  158 (89.77)  31 (70.46)

(a) The number in the table is the number correctly classified; the percentage is given in parentheses.
(b) Training sample size is 176.
(c) Test sample size is 44.

Table 2
Cross-validation results on the predictive performance for the small test set (a)

Method (b)           Subsample 1              Subsample 2              Subsample 3              Subsample 4              Subsample 5
                     B       NB      Overall  B       NB      Overall  B       NB      Overall  B       NB      Overall  B       NB      Overall
Neural network       15      20      35       20      16      36       20      17      37       18      17      35       17      17      34
                     (68.18) (90.91) (79.55)  (90.91) (72.73) (81.82)  (90.91) (77.27) (84.09)  (81.82) (77.27) (79.55)  (77.27) (77.27) (77.27)
Logistic regression  18      16      34       17      17      34       17      19      36       18      17      35       16      17      33
                     (81.82) (72.73) (77.27)  (77.27) (77.27) (77.27)  (77.27) (86.36) (81.82)  (81.82) (77.27) (79.55)  (72.73) (77.27) (75.00)

(a) The number in the table is the number correctly classified; the percentage is given in parentheses.
(b) B stands for the bankruptcy group; NB stands for the nonbankruptcy group.


Table 3
Pairwise comparison between ANNs and logistic regression for the small test set

Statistics    Overall               Bankrupt              Nonbankrupt
              ANN      Logistic     ANN      Logistic     ANN      Logistic
Mean          80.46    78.18        81.82    78.18        76.09    78.18
t-statistic      3.1609                0.7182                0.1963
p-value          0.0342                0.5124                0.8539

Comparing the results for the small test sets in Table 2 and those for the large test sets in Table 4, we make the following two observations. First, the variability in results across the five large test samples is much smaller than that of the small test sets. This is to be expected since, as we pointed out earlier, the large test set is the same for each of the five different training sets and the variability in the test results reflects only the difference in the training set. Second, the performance of the logistic regression models is stable, while the neural network performance improves significantly from the small test sets to the large test sets. The explanation lies in the fact that neural networks have much better classification rates in the training samples.

Tables 6 and 7 list the training results of the neural networks and logistic regression. The training results for the neural networks are selected according to the best overall classification rate in the small test set. Neural networks perform consistently and significantly better in all cases. The differences between ANNs and logistic regression in the overall, bankruptcy and nonbankruptcy classification rates are 9.54%, 13.18% and 5.90%, respectively.

6. Summary and conclusions

Bankruptcy prediction is a class of interesting and important problems. A better understanding of its causes will have tremendous financial and managerial consequences. We have presented a general framework for understanding the role of neural networks for this problem. While traditional statistical methods work well for some situations, they may fail miserably when the statistical assumptions are not met. ANNs are a promising alternative tool that should be given much consideration when solving real problems like bankruptcy prediction.

The application of neural networks has been reported in many recent studies of bankruptcy prediction. However, the mechanism of neural networks in predicting bankruptcy, or in classification generally, is not well understood. Without a clear understanding of how neural networks operate, it will be difficult to reap the full potential of this technique. This paper attempts to bridge the gap between the theoretical development and the real world applications of ANNs.

It has already been theoretically established that the outputs from neural networks are estimates of posterior probabilities. Posterior probabilities are important not only for traditional statistical decision theory but also for many managerial decision problems. Although there are many estimation procedures for posterior probabilities, ANNs are the only known method which estimates posterior probabilities directly when the underlying group population distributions are unknown. Based on the results in this study and in [28], neural networks, with their flexible nonlinear modeling capability, do provide more accurate estimates, leading to higher classification rates than other traditional statistical methods. The impact of the number of hidden nodes and other factors in neural network design on the estimation of posterior probabilities is a fruitful area for further research.

This study used a cross-validation technique to evaluate the robustness of neural classifiers with respect to sampling variation. Model robustness has important managerial implications, particularly when the model is used for prediction purposes. A useful model is one which is robust across different samples or time periods. The cross-validation technique provides decision makers with a simple method for examining predictive validity. Two schemes of fivefold cross-validation methodology are employed. Results show that neural networks are in general quite robust.

Table 4
Cross-validation results on the estimation of true classification rates for the large test set (a)

Method (b)           Subsample 1              Subsample 2              Subsample 3              Subsample 4              Subsample 5
                     B       NB      Overall  B       NB      Overall  B       NB      Overall  B       NB      Overall  B       NB      Overall
Neural network       95      96      191      98      93      191      102     91      193      94      93      187      97      94      191
                     (86.36) (87.27) (86.82)  (89.09) (84.55) (86.82)  (92.73) (82.73) (87.73)  (85.45) (84.55) (85.00)  (88.18) (85.45) (86.82)
Logistic regression  87      86      173      87      89      176      83      89      172      86      88      174      81      88      169
                     (79.09) (78.18) (78.64)  (79.09) (80.91) (80.00)  (75.45) (80.91) (78.18)  (78.18) (80.00) (79.09)  (73.64) (80.00) (76.82)

(a) The number in the table is the number correctly classified; the percentage is given in parentheses.
(b) B stands for the bankruptcy group; NB stands for the nonbankruptcy group.

Table 5
Pairwise comparison between ANNs and logistic regression for the large test set

Statistics    Overall               Bankrupt              Nonbankrupt
              ANN      Logistic     ANN      Logistic     ANN      Logistic
Mean          86.64    78.55        88.36    77.09        84.91    80.00
t-statistic     10.3807                5.6211                4.0737
p-value          0.0005                0.0049                0.0152

Table 6
Comparison of ANNs vs. logistic regression on the training sample (a)

Method (b)           Subsample 1              Subsample 2              Subsample 3              Subsample 4              Subsample 5
                     B       NB      Overall  B       NB      Overall  B       NB      Overall  B       NB      Overall  B       NB      Overall
Neural network       80      76      156      78      77      155      82      74      156      76      76      152      80      77      157
                     (90.91) (86.36) (88.64)  (88.64) (87.50) (88.07)  (93.18) (84.09) (88.64)  (86.36) (86.36) (86.36)  (90.91) (87.50) (89.20)
Logistic regression  69      70      139      70      72      142      66      70      136      68      71      139      65      71      136
                     (78.41) (79.55) (78.98)  (79.55) (81.82) (80.68)  (75.00) (79.55) (77.27)  (77.27) (80.68) (78.98)  (73.86) (80.68) (77.27)

(a) The number in the table is the number correctly classified; the percentage is given in parentheses.
(b) B stands for the bankruptcy group; NB stands for the nonbankruptcy group.


Table 7
Pairwise comparison between ANNs and logistic regression for the training sample

Statistics    Overall               Bankrupt              Nonbankrupt
              ANN      Logistic     ANN      Logistic     ANN      Logistic
Mean          88.18    78.64        90.00    76.82        86.36    80.46
t-statistic      9.9623                6.8578               13.8807
p-value          0.0006                0.0024                0.0002

It is encouraging to note that the variation across samples in the training and test classification rates is reasonably small. Much of the variation in the results is associated with the number of hidden nodes and the initial starting seeds. Users of ANNs will be well advised to use a large number of sets of random starting seeds and to experiment on the hidden nodes. After the ``optimal'' solution is identified and the appropriate number of hidden nodes is selected, the neural classifiers tend to provide consistent estimates.

We also compared neural networks with logistic regression, a well-known statistical method for classification. Neural networks provide significantly better estimates of the classification rate for the unknown population as well as for the unseen part of the population. It can easily be argued that the cost of not being able to predict a bankruptcy is much higher than that for a nonbankrupt firm. Neural networks in our study clearly show their superiority over logistic regression in the prediction of bankrupt firms.

References

[1] Y. Alici, Neural networks in corporate failure prediction: The UK experience, in: A.P.N. Refenes, Y. Abu-Mostafa, J. Moody, A. Weigend (Eds.), Neural Networks in Financial Engineering, World Scientific, Singapore, 1996, pp. 393–406.
[2] E.I. Altman, Financial ratios, discriminant analysis and the prediction of corporate bankruptcy, Journal of Finance 23 (3) (1968) 589–609.
[3] E.I. Altman, Accounting implications of failure prediction models, Journal of Accounting Auditing and Finance (1982) 4–19.
[4] E.I. Altman, G. Marco, F. Varetto, Corporate distress diagnosis: Comparisons using linear discriminant analysis and neural networks (the Italian experience), Journal of Banking and Finance 18 (1994) 505–529.
[5] R. Battiti, First- and second-order methods for learning: Between steepest descent and Newton's method, Neural Computation 4 (2) (1992) 141–166.
[6] W. Beaver, Financial ratios and predictors of failure, Empirical Research in Accounting: Selected Studies (1966) 71–111.
[7] T.B. Bell, G.S. Ribar, J. Verchio, Neural nets vs. logistic regression: A comparison of each model's ability to predict commercial bank failures, in: Proceedings of the 1990 Deloitte & Touche/University of Kansas Symposium on Auditing Problems, 1990, pp. 29–53.
[8] J.E. Boritz, D.B. Kennedy, Effectiveness of neural network types for prediction of business failure, Expert Systems with Applications 9 (4) (1995) 503–512.
[9] J.E. Boritz, D.B. Kennedy, A. de Miranda e Albuquerque, Predicting corporate failure using a neural network approach, Intelligent Systems in Accounting, Finance and Management 4 (1995) 95–111.
[10] P.L. Brockett, W.W. Cooper, L.L. Golden, U. Pitaktong, A neural network method for obtaining an early warning of insurer insolvency, The Journal of Risk and Insurance 61 (3) (1994) 402–424.
[11] P.K. Coats, L.F. Fant, Recognizing financial distress patterns using a neural network tool, Financial Management (1993) 142–155.
[12] K.G. Coleman, T.J. Graettinger, W.F. Lawrence, Neural networks for bankruptcy prediction: The power to solve financial problems, AI Review (1991) 48–50.
[13] M. Cottrell, B. Girard, Y. Girard, M. Mangeas, C. Muller, Neural modeling for time series: A statistical stepwise method for weight elimination, IEEE Transactions on Neural Networks 6 (6) (1995) 1355–1364.
[14] G. Cybenko, Approximation by superpositions of a sigmoidal function, Mathematics of Control, Signals and Systems 2 (1989) 303–314.
[15] R.O. Duda, P. Hart, Pattern Classification and Scene Analysis, Wiley, New York, 1973.
[16] R. Edmister, An empirical test of financial ratio analysis for small business failure prediction, Journal of Financial and Quantitative Analysis 7 (1972) 1477–1493.
[17] B. Efron, G. Gong, A leisurely look at the bootstrap, the jackknife and cross-validation, American Statistician 37 (1983) 36–48.
[18] K.M. Fanning, K.O. Cogger, A comparative analysis of artificial neural networks using financial distress prediction, Intelligent Systems in Accounting, Finance and Management 3 (1994) 241–252.
[19] D. Fletcher, E. Goss, Forecasting with neural networks: An application using bankruptcy data, Information and Management 24 (1993) 159–167.
[20] S. Geisser, The predictive sample reuse method with applications, Journal of the American Statistical Association 70 (1975) 320–328.
[21] S. Geman, E. Bienenstock, R. Doursat, Neural networks and the bias/variance dilemma, Neural Computation 4 (1992) 1–58.
[22] F.E. Harrell, K.L. Lee, A comparison of the discriminant analysis and logistic regression under multivariate normality, in: P.K. Sen (Ed.), Biostatistics: Statistics in Biomedical, Public Health, and Environmental Sciences, North-Holland, Amsterdam, 1985.
[23] K. Hornik, Approximation capabilities of multilayer feedforward networks, Neural Networks 4 (1991) 251–257.
[24] K. Hornik, Some new results on neural network approximation, Neural Networks 6 (1993) 1069–1072.
[25] K. Hornik, M. Stinchcombe, H. White, Multilayer feedforward networks are universal approximators, Neural Networks 2 (1989) 359–366.
[26] W.Y. Huang, R.P. Lippmann, Comparisons between neural net and conventional classifiers, in: IEEE First International Conference on Neural Networks, vol. IV, San Diego, CA, 1987, pp. 485–493.
[27] M.S. Hung, J.W. Denton, Training neural networks with the GRG2 nonlinear optimizer, European Journal of Operational Research 69 (1993) 83–91.
[28] M.S. Hung, M.Y. Hu, M. Shanker, B.E. Patuwo, Estimating posterior probabilities in classification problems with neural networks, International Journal of Computational Intelligence and Organizations 1 (1996) 49–60.
[29] C. Johnson, Ratio analysis and the prediction of firm failure, Journal of Finance 25 (1970) 1166–1168.
[30] F.L. Jones, Current techniques in bankruptcy prediction, Journal of Accounting Literature 6 (1987) 131–164.
[31] M. Kerling, Corporate distress diagnosis – An international comparison, in: A.P.N. Refenes, Y. Abu-Mostafa, J. Moody, A. Weigend (Eds.), Neural Networks in Financial Engineering, World Scientific, Singapore, 1996, pp. 407–422.
[32] L. Kryzanowski, M. Galler, Analysis of small-business financial statements using neural nets, Journal of Accounting, Auditing and Finance 10 (1995) 147–172.
[33] R.C. Lacher, P.K. Coats, S.C. Sharma, L.F. Fant, A neural network for classifying the financial health of a firm, European Journal of Operational Research 85 (1995) 53–65.
[34] K.C. Lee, I. Han, Y. Kwon, Hybrid neural network models for bankruptcy predictions, Decision Support Systems 18 (1996) 63–72.
[35] M.J. Lenard, P. Alam, G.R. Madey, The application of neural networks and a qualitative response model to the auditor's going concern uncertainty decision, Decision Sciences 26 (2) (1995) 209–226.
[36] M. Leshno, Y. Spector, Neural network prediction analysis: The bankruptcy case, Neurocomputing 10 (1996) 125–147.
[37] R. Lippmann, An introduction to computing with neural nets, IEEE ASSP Magazine 4 (1987) 2–22.
[38] M. Odom, R. Sharda, A neural network model for bankruptcy prediction, in: Proceedings of the IEEE International Conference on Neural Networks, II, 1990, pp. 163–168.
[39] J. Ohlson, Financial ratios and the probabilistic prediction of bankruptcy, Journal of Accounting Research 18 (1) (1980) 109–131.
[40] A. Papoulis, Probability, Random Variables, and Stochastic Processes, McGraw-Hill, New York, 1965.
[41] D.B. Parker, Optimal algorithms for adaptive networks: Second order back propagation, second order direct propagation, and second order Hebbian learning, in: Proceedings of the IEEE International Conference on Neural Networks, 1987, pp. 593–600.
[42] E. Patuwo, M.Y. Hu, M.S. Hung, Two-group classification using neural networks, Decision Sciences 24 (4) (1993) 825–845.
[43] S. Piramuthu, M.J. Shaw, J.A. Gentry, A classification approach using multi-layered neural networks, Decision Support Systems 11 (1994) 509–525.
[44] T. Poddig, Bankruptcy prediction: A comparison with discriminant analysis, in: A.P.N. Refenes (Ed.), Neural Networks in the Capital Markets, Wiley, Chichester, 1995, pp. 311–324.
[45] S.J. Press, S. Wilson, Choosing between logistic regression and discriminant analysis, Journal of the American Statistical Association 73 (1978) 699–705.
[46] J.R. Quinlan, Comparing connectionist and symbolic learning methods, in: G. Hanson, G. Drastal, R. Rivest (Eds.), Computational Learning Theory and Natural Learning Systems: Constraints and Prospects, MIT Press, Cambridge, MA, 1993.
[47] W. Raghupathi, Comparing neural network learning algorithms in bankruptcy prediction, International Journal of Computational Intelligence and Organizations 1 (3) (1996) 179–187.
[48] W. Raghupathi, L.L. Schkade, B.S. Raju, A neural network approach to bankruptcy prediction, in: Proceedings of the IEEE 24th Annual Hawaii International Conference on System Sciences, vol. 4, 1991, pp. 147–155.
[49] E. Rahimian, S. Singh, T. Thammachote, R. Virmani, Bankruptcy prediction by neural network, in: R. Trippi, E. Turban (Eds.), Neural Networks in Finance and Investing: Using Artificial Intelligence to Improve Real-World Performance, Probus, Chicago, IL, 1993, pp. 159–176.
[50] R. Reed, Pruning algorithms – A survey, IEEE Transactions on Neural Networks 4 (5) (1993) 740–747.
[51] M.D. Richard, R.P. Lippmann, Neural network classifiers estimate Bayesian a posteriori probabilities, Neural Computation 3 (1991) 461–483.
[52] A. Roy, L.S. Kim, S. Mukhopadhyay, A polynomial time algorithm for the construction and training of a class of multilayer perceptrons, Neural Networks 6 (1993) 535–545.
[53] G. Rudorfer, Early bankruptcy detection using neural networks, APL Quote Quad 25 (4) (1995) 171–176.
[54] D.E. Rumelhart, G.E. Hinton, R.J. Williams, Learning internal representations by error propagation, in: D.E. Rumelhart, J.L. McClelland (Eds.), Parallel Distributed Processing: Explorations in the Microstructure of Cognition, MIT Press, Cambridge, MA, 1986.
[55] L.M. Salchenberger, E.M. Cinar, N.A. Lash, Neural networks: A new tool for predicting thrift failures, Decision Sciences 23 (4) (1992) 899–916.
[56] M. Shanker, M.Y. Hu, M.S. Hung, Effect of data standardization on neural network training, Omega 24 (4) (1996) 385–397.
[57] R. Sharda, R.L. Wilson, Neural network experiments in business-failure forecasting: Predictive performance measurement issues, International Journal of Computational Intelligence and Organizations 1 (2) (1996) 107–117.
[58] M. Stone, Cross-validatory choice and assessment of statistical predictions, Journal of the Royal Statistical Society B 36 (1974) 111–147.
[59] V. Subramanian, M.S. Hung, A GRG2-based system for training neural networks: Design and computational experience, ORSA Journal on Computing 5 (4) (1993) 386–394.
[60] K.Y. Tam, Neural network models and the prediction of bank bankruptcy, OMEGA 19 (5) (1991) 429–445.
[61] K.Y. Tam, M.Y. Kiang, Managerial applications of neural networks: The case of bank failure predictions, Management Science 38 (7) (1992) 926–947.
[62] R.R. Trippi, E. Turban, Neural Networks in Finance and Investing: Using Artificial Intelligence to Improve Real-World Performance, Probus, Chicago, IL, 1993.
[63] J. Tsukuda, S. Baba, Predicting Japanese corporate bankruptcy in terms of financial data using neural network, Computers and Industrial Engineering 27 (1994) 445–448.
[64] G. Udo, Neural network performance on the bankruptcy classification problem, Computers and Industrial Engineering 25 (1993) 377–380.
[65] Z. Wang, C.D. Massimo, M.T. Tham, A.J. Morris, A procedure for determining the topology of multilayer feedforward neural networks, Neural Networks 7 (1994) 291–300.
[66] R.L. Wilson, R. Sharda, Bankruptcy prediction using neural networks, Decision Support Systems 11 (1994) 545–557.
[67] F. Zahedi, A meta-analysis of financial applications of neural networks, International Journal of Computational Intelligence and Organizations 1 (3) (1996) 164–178.
