
Information Sciences 180 (2010) 1257–1267


Failure prediction of dotcom companies using neural network–genetic programming hybrids

P. Ravisankar a, V. Ravi a,*, I. Bose b

a Institute for Development and Research in Banking Technology, Castle Hills Road #1, Masab Tank, Hyderabad 500 057, AP, India
b School of Business, The University of Hong Kong, Room 730, Meng Wah Complex, Pokfulam Road, Hong Kong, China

a r t i c l e   i n f o

Article history:
Received 28 November 2008
Received in revised form 5 October 2009
Accepted 22 December 2009

Keywords:
Dotcom companies
Failure prediction
Feature selection
Multilayer feed forward neural network
Probabilistic neural network
Genetic programming
t-Statistic
f-Statistic

a b s t r a c t

This paper presents novel neural network–genetic programming hybrids to predict the failure of dotcom companies. These hybrids comprise multilayer feed forward neural network (MLFF), probabilistic neural network (PNN), rough sets (RS) and genetic programming (GP) in a two-phase architecture. In each hybrid, one technique is used to perform feature selection in the first phase and another is used as a classifier in the second phase. Further, t-statistic and f-statistic are also used separately for feature selection in the first phase. In each of these cases, the top 10 features are selected and fed to the classifier. Also, the NN–GP hybrids are compared with MLFF, PNN and GP in their stand-alone mode without feature selection. The dataset analyzed here is collected from Wharton Research Data Services (WRDS). It consists of 240 dotcom companies, of which 120 failed and 120 are healthy. Ten-fold cross-validation is performed throughout the study. Results in terms of average accuracy, average sensitivity, average specificity and area under the receiver operating characteristic curve (AUC) indicate that GP outperformed all the techniques with or without feature selection. The superiority of GP–GP is demonstrated by t-test at the 10% level of significance. Furthermore, the results are much better than those reported in previous studies on the same dataset.

© 2009 Elsevier Inc. All rights reserved.

1. Introduction

The growth of Internet-based companies in the last two decades was so phenomenal that the Internet has assumed the form of an alternative shopping mall. Several companies have started selling products such as pet supplies, garden tools, cosmetics and books on the Web. Barnes & Noble, Wal-Mart, etc. are some of the companies that also have a physical presence in retailing. Companies which have online subsidiaries and which can produce their own financial reports separately from the parent company are known as brick-and-mortar corporations. Companies which have only an online existence on the Internet are known as dotcoms or click-and-mortar corporations, examples of which are Amazon.com, expedia.com, etc. As the number of Internet users across the world increases rapidly, retailing over the Internet continues to gain popularity at a steady pace. A report [18] stated that the online retail sector would experience an average annual growth rate of 21% between 2002 and 2007, and that more than 5% of US retail sales would be transacted online by 2007. Forrester Research predicted that e-commerce sales would grow at a steady rate of 19% per year, increasing from $95.7 billion in 2003 to $229.0 billion in 2008, with the total number of online shopping households in the US reaching 63 million by 2008 [38]. However, in the early years of the new millennium, the phenomenal growth in Internet-related companies suffered a severe setback.

* Corresponding author. Tel.: +91 40 23534981x2042; fax: +91 40 23535157.


E-mail addresses: ravisankar_hcu@yahoo.co.in (P. Ravisankar), rav_padma@yahoo.com (V. Ravi), bose@business.hku.hk (I. Bose).

doi:10.1016/j.ins.2009.12.022

Later, Forrester Research [17] indicated that most click-and-mortar companies would be driven out of business by 2001 due to various factors like weak financial strength, increased competition, and investor flight. At the same time, the Gartner Group predicted that almost 95% of all click-and-mortar companies would fail by 2002 [9]. Later, it became clear that the predictions of the financial pundits had come true and the dotcom bubble had burst. Within a very short time, several corporations that saw spectacular growth in their stock prices in the late 1990s went out of business. In 2000, when CNNfn.com asked the market data and research firm Birinyi Associates of Westport, Connecticut, to calculate the market value of the stocks in the Bloomberg US Internet Index, it found that the combined market value of all the stocks had fallen to $1.193 trillion from $2.948 trillion at their peak, a loss of $1.755 trillion in only seven months [19]. Factors like the dotcoms' inability to improve revenues and earnings, failure to post profits, attempts to capture a major market share in the shortest possible time, and a tendency to operate in limited geographical areas were mainly responsible for their demise [42].
The main objective of the present work is to propose new genetic programming based hybrids that accurately predict the failure of dotcom companies. The auxiliary aim is to select the most important financial ratios that yield accurate prediction, which is accomplished by resorting to feature selection. The results obtained here would be useful to emerging dotcom companies: such firms can carefully monitor the selected financial ratios to gain long-term advantages in a competitive market. Further, the results would also be very useful for investors who plan to invest in such companies.
The rest of the paper is organized as follows. Section 2 reviews the research done in bankruptcy prediction of banks and firms, including dotcom companies. Section 3 overviews the techniques applied in this paper. Section 4 describes the hybrid intelligent systems developed in this paper. Section 5 presents the results and discussion. Finally, Section 6 concludes the paper.

2. Literature review

Altman [1] was the first to study the failure prediction of financial firms and banks, in 1968. Since then, it has been one of the most extensively researched areas over the last few decades. As bankruptcy adversely affects creditors, auditors, stockholders and senior management, all of them are interested in bankruptcy prediction [52]. The Federal Deposit Insurance Corporation Improvement Act of 1991 used a six-part rating system to indicate the safety and soundness of an institution. This rating, called the CAMELS rating, evaluates banks and firms in terms of: Capital adequacy, Asset quality, Management expertise, Earnings strength, Liquidity, and Sensitivity to market risk. While CAMELS ratings clearly provide regulators with important information, Cole and Gunther [14] reported that these ratings decay rapidly.
Altman [2] used financial ratios and multiple discriminant analysis (MDA) for bankruptcy prediction. Later, Whalen and Schott [51] proposed a fuzzy production system as a powerful technique for performing automated preliminary diagnosis in a fuzzy network, analyzing the financial performance of a firm using linguistic assessments of financial indicator ratios. It was found that MDA could be used only if the assumptions of linear separability, multivariate normality and independence of the predictive variables are satisfied; unfortunately, most financial ratios violate these assumptions [2]. Incidentally, it was observed that the bankruptcy prediction problem can also be solved using other types of classifiers, and an overview of such works follows. The logit model was employed to predict firm failure by Ohlson [26]. Later, the backpropagation neural network (BPNN) and discriminant analysis were employed by Odom and Sharda [25]. Tam [47] reported that BPNN outperformed MDA, logistic regression, the k-nearest neighbor (k-NN) method and ID3. Similarly, Salchenberger [40] found that the BPNN performed better than the logit model.
In 1996, Serrano-Cinca [41] compared the performance of SOM with that of LDA and BPNN in financial diagnosis. He proposed two hybrid neural systems consisting of LDA, BPNN and SOM, viz., (i) a combination of LDA with SOM, where LDA calculated the Z-score for each firm, which was superimposed onto the SOM to obtain isosolvent regions, and (ii) a combination of BPNN with SOM. Rahimian et al. [31] compared the performance of (i) BPNN, (ii) Athena, an entropy-based neural network, and (iii) a single layer perceptron on the bankruptcy prediction problem. They also compared these with the BPNN of Odom and Sharda [25] and with discriminant analysis. Then, Olmeda and Fernandez [27] compared the accuracy of a number of classifiers in stand-alone mode and developed a hybrid intelligent system from those individual classifiers. Later, Varetto [50] employed a genetic algorithm (GA) for bankruptcy prediction and compared its performance with that of LDA. Gorzalczany and Piasta [16] presented two different hybrid intelligent decision support systems, viz., (i) a neuro-fuzzy classifier (N-FC) and (ii) a rough classifier (RC), for firm bankruptcy prediction and compared the results with previous works. They concluded that the N-FC outperformed rule induction techniques like C4.5 and CN2.
Further, a rough set based bankruptcy prediction model was reported by McKee [23]. Then, McKee and Lensberg [24] proposed a hybrid approach to bankruptcy prediction, using GP to construct a bankruptcy prediction model with variables from a rough sets model derived in prior research. They concluded that the hybrid consisting of rough sets and GP produced better results than the rough sets model alone. Later, Atiya [3] proposed some new financial indicators for the bankruptcy prediction problem. Further, Swicegood and Clark [46] used bank call reports to extract variables to design a new neural network, and compared DA, BPNN and human judgment in predicting bank failures. Then, Park and Han [28] proposed a hybrid of the analytic hierarchy process, case based reasoning and the k-nearest neighbor technique for bankruptcy prediction. Later, Tung et al. [49] proposed a new neural fuzzy system, viz., the generic self-organizing fuzzy neural network based on the compositional rule of inference, GenSoFNN-CRI(S), for bankruptcy prediction.

At the same time, the use of linear discriminant models, multi-layer perceptrons and wavelet networks for corporate financial distress prediction was reported by Becerra et al. [5]. They concluded that wavelet neural networks may have advantages over the multi-layer perceptron and may be a valid alternative to linear discriminant models. Then, Shin et al. [44] used the support vector machine (SVM) to predict corporate bankruptcy, concluding that SVM outperformed the BPNN in terms of accuracy along with a reduction in training dataset size. Further, Canbas et al. [8] proposed an early warning system for detecting banks suffering from serious problems and applied the methodology to a Turkish banks dataset. Later, isotonic separation for the prediction of firm bankruptcy was introduced by Ryu and Yue [39]. Further, the application of neural logic networks by means of genetic programming was proposed by Tsakonas et al. [48].
Further, Cheng et al. [13] proposed a distress prediction model that combines the approaches of radial basis function network (RBFN) learning and logit analysis, and concluded that the performance of the proposed RBFN is superior to traditional logit analysis and a backpropagation neural network. Later, Ravikumar and Ravi [35] proposed a fuzzy rule based classifier (FRBC) for bankruptcy prediction and reported that the FRBC outperformed the BPNN in the case of US banks data. Further, Ravi et al. [33] proposed a semi-online training algorithm for the radial basis function neural network (SORBF) and applied it to bankruptcy prediction in banks; Semi Online RBFN without linear terms (SORBF2) performed better than ANFIS, SVM, MLFF-BP, RBF, Semi Online RBFN with linear terms (SORBF1) and Orthogonal RBF. Later, Ravikumar and Ravi [36] proposed an ensemble classifier based on ANFIS, SVM, RBF, SORBF1, SORBF2, Orthogonal RBF and MLFF-BP for the bankruptcy prediction problem. They reported that ANFIS, SORBF2 and MLP were the most prominent models, as they appeared in the best ensemble classifier combinations.
Recently, Ravikumar and Ravi [37] conducted a comprehensive review of all the works reported using statistical and intelligent techniques to solve the bankruptcy prediction problem in banks and firms during 1968–2005, comparing the techniques in terms of prediction accuracy, data sources, and timeline of each study wherever available. Later, Zhu et al. [54] described a new computational intelligence algorithm, the self-organizing learning array (SOLAR), and applied it to a number of specialized economic and financial cases, including bankruptcy prediction, with great success. They also reported that their results are comparable with those of the method followed by Atiya [3]. Recently, Yen [53] used unsupervised Adaptive Resonance Theory (ART) to detect versatile accounting fraud and proposed warning signals to identify the possibility of potential fraud in a company. He also demonstrated that these early warning signals help identify the important features responsible for accounting fraud.
Further, Ravi et al. [34] developed a novel soft-computing system for bank performance prediction based on MLP, RBF, CART, PNN, FRBC and PCA based hybrid techniques. Most recently, the Modified Great Deluge Algorithm based Auto Associative Neural Network by Pramodh and Ravi [30], principal component neural networks by Ravi and Pramodh [32], a fuzzy rule based classifier with wavelet neural network based feature selection by Chandra and Ravi [10], and a differential evolution trained wavelet neural network by Chauhan et al. [12] were applied to bankruptcy prediction with great success. Most recently, Li and Sun [21] proposed a hybrid Gaussian Case Based Reasoning (GCBR) system and applied it to business failure prediction (BFP), with statistically significant results on empirical data in China. They concluded that GCBR showed superior performance compared to MDA, logistic regression and two classical CBR systems in terms of accuracy and coefficient of variation.
However, research in the area of click-and-mortar companies is relatively new. Bose and Pal [7] were the first to employ BPNN, DA and SVM to predict the failure of click-and-mortar corporations based on a set of financial ratios, and concluded that BPNN yielded the best results. Further, Bose [6] applied rough sets and concluded that three variables, RE/TA, S/MC and S/TA, appeared to be the three major predictors, as they occurred most frequently in the generated reducts. Most recently, Chandra et al. [11] reported different hybrid intelligent techniques along with the t-statistic on the same dataset and concluded that t-statistic based feature subset selection yielded very high accuracies for all the classification techniques, with results superior to those reported in the previous studies on the same dataset.

3. Overview of techniques

The data is analyzed using different stand-alone techniques, viz. MLFF (also known as BPNN or MLP), GP and PNN. The total dataset is initially analyzed using these classifiers in their stand-alone mode, and then GP-based hybrids are constructed under the soft-computing paradigm.

3.1. Multilayer feed forward neural network (MLFF)

The multilayer feed forward neural network (MLFF), by far the most popular neural network, is trained by the standard backpropagation algorithm. MLFFs are supervised networks, so they require a desired response to be trained; they learn how to transform input data into a desired response and are therefore widely used for pattern classification and prediction. A multilayer perceptron is made up of several layers of neurons, with each layer fully connected to the next. With one or two hidden layers, MLFFs can approximate virtually any input–output map, and they have been shown to yield accurate predictions in difficult problems. Most neural network applications involve MLFFs.

3.2. Probabilistic neural network (PNN)

The PNN was introduced by Specht [45] in 1990. It is a feed forward neural network with a one-pass training algorithm, used for classification and mapping of data. It is a pattern classification network based on the classical Bayes classifier, which is statistically an optimal classifier that seeks to minimize the risk of misclassification. Any pattern classifier places each observed data vector x = (x1, x2, x3, ..., xN)^T in one of the predefined classes ci, i = 1, 2, ..., m, where m is the number of possible classes. The effectiveness of any classifier is limited by the number of data elements that vector x can have and the number of possible classes m. The classical Bayes pattern classifier [43] implements the Bayes conditional probability rule that the probability P(ci|x) of x being in class ci is given by the following equation:

P(c_i \mid x) = \frac{P(x \mid c_i)\, P(c_i)}{\sum_{j=1}^{m} P(x \mid c_j)\, P(c_j)}   (1)

where P(x|ci) is the conditional probability density function of x given class ci and P(cj) is the prior probability of drawing data from class cj. Vector x is said to belong to a particular class ci if P(ci|x) > P(cj|x) for all j = 1, 2, ..., m, j ≠ i.
An advantage of the PNN [22] is that it is guaranteed to approach the Bayes-optimal decision surface provided that the class probability density functions are smooth and continuous. The PNN simplifies the Bayes classification procedure by using a training set from which the desired statistical information for implementing the Bayes classifier can be drawn. The desired probability density function of each class is approximated using the Parzen windows approach [22]. In particular, the PNN approximates the probability that vector x ∈ R^N belongs to a particular class ci as a sum of weighted Gaussian distributions centered at each training sample, given by the following equation:

P(c_i \mid x) = \frac{1}{(2\pi)^{N/2}\, \sigma^{N}\, n_{t_i}} \sum_{j=1}^{n_{t_i}} \exp\left[ -\frac{(x - x_{ij})^{T} (x - x_{ij})}{2\sigma^{2}} \right]   (2)

where xij is the jth training vector for the patterns in class i, σ is known as the smoothing factor, N is the dimension of the input pattern, and n_{t_i} is the number of training patterns in class i. For nonlinear decision boundaries, the smoothing factor σ needs to be as small as possible. The computational structure of the PNN is shown in Fig. 1. The network has an input layer, a pattern layer, a summation layer and an output layer.

[Fig. 1. Structure of probabilistic neural network: input vectors X1, X2, X3, ..., XN enter the input layer, followed by the pattern layer, the summation layer computing P(c1|x), P(c2|x), ..., P(cm|x), and the output layer, which emits the output class for the given input vector.]



The input x is fed to each of the patterns in the pattern layer. The summation layer computes the probability P(ci|x) of the given input x belonging to each of the classes ci represented by the patterns in the pattern layer. The output layer selects the class for which the highest probability was obtained in the summation layer, and the input is classified as belonging to this class. The effectiveness of the network in classifying input vectors depends strongly on the value of the smoothing parameter σ.
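To make Eq. (2) and the layer structure concrete, the following minimal Python sketch implements the pattern, summation and output layers directly from a training set; the smoothing factor value and the toy data are assumptions for illustration, not the Neuroshell 2.0 implementation used in this study:

import numpy as np

def pnn_predict(x, train_X, train_y, sigma=0.5):
    """Classify x with a probabilistic neural network per Eq. (2).

    train_X : (n_samples, N) training patterns (the pattern layer)
    train_y : (n_samples,) integer class labels
    sigma   : smoothing factor (assumed value; tuned in practice)
    """
    N = train_X.shape[1]
    norm = (2 * np.pi) ** (N / 2) * sigma ** N
    scores = {}
    for c in np.unique(train_y):
        Xc = train_X[train_y == c]               # patterns of class c
        d2 = np.sum((Xc - x) ** 2, axis=1)       # squared distances to x
        # Summation layer: Parzen-window estimate of the class density
        scores[c] = np.exp(-d2 / (2 * sigma ** 2)).sum() / (norm * len(Xc))
    return max(scores, key=scores.get)           # output layer: most probable class

# Toy usage with two 2-D classes
X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y = np.array([0, 0, 1, 1])
print(pnn_predict(np.array([0.95, 1.0]), X, y))  # -> 1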

3.3. Genetic programming (GP)

Genetic programming (GP) [20] is an extension of genetic algorithms (GA). GP is a search methodology belonging to the family of evolutionary computation (EC). It is a general search method that uses analogies from natural selection and evolution [15], and such algorithms have now been applied to a wide range of real-world problems. Genetic programming in its canonical form enables the automatic generation of mathematical expressions or programs. In the most common implementations, a population of candidate solutions is maintained, and after the completion of a generation the population is expected to be a better fit for the given problem. Usual termination criteria are based on the maximum number of generations, the achievement of a desired classification error, etc.
In contrast to GA, GP encodes multiple solutions to a specific problem as a population of programs or functions. The programs can be represented as parse trees, which are usually composed of internal nodes and leaf nodes. Internal nodes are called primitive functions, and leaf nodes are called terminals. The terminals can be viewed as the inputs to the specific problem; they might include the independent variables and a set of constants. The primitive functions are combined with the terminals or simpler function calls to form more complex function calls. Fitness is typically determined by the objective function for some optimization problem, e.g., the number of correctly classified cases in a set of bankrupt and nonbankrupt firms/banks.

GP randomly generates an initial population of solutions. The initial population is then manipulated using various genetic operators, including reproduction, crossover, mutation, dropping condition, etc., to produce new populations. The whole process of evolving from one population to the next is called a generation. A high-level description of the GP algorithm can be divided into a number of sequential steps [15]; a minimal code sketch follows the list:

• Create a random population of programs, or rules, using the symbolic expressions provided as the initial population.
• Evaluate each program or rule by assigning it a fitness value according to a predefined fitness function, given by Eq. (3), that measures the capability of the rule or program to solve the problem.
• Use the reproduction operator to copy existing programs into the new generation.
• Generate the new population with crossover and mutation from a randomly chosen set of parents.
• Repeat steps 2 onwards for the new population until a predefined termination criterion is satisfied or a fixed number of generations is completed.
• The solution to the problem is the genetic program with the best fitness within all the generations.

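The following toy Python sketch mirrors these steps. It is only an illustration: the study used the Discipulus tool, whose internals differ; the function set, tree depths and rates here are assumed values; and crossover is omitted for brevity, with subtree mutation as the sole variation operator besides elitist reproduction.

import operator
import random

FUNCS = {'+': operator.add, '-': operator.sub, '*': operator.mul}

def random_tree(depth, n_features, rng):
    """Grow a random expression tree over feature terminals and constants."""
    if depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.7:
            return ('x', rng.randrange(n_features))   # terminal: input feature
        return ('c', rng.uniform(-1.0, 1.0))          # terminal: constant
    op = rng.choice(list(FUNCS))
    return (op, random_tree(depth - 1, n_features, rng),
                random_tree(depth - 1, n_features, rng))

def evaluate(tree, x):
    """Evaluate an expression tree on one input vector x."""
    tag = tree[0]
    if tag == 'x':
        return x[tree[1]]
    if tag == 'c':
        return tree[1]
    return FUNCS[tag](evaluate(tree[1], x), evaluate(tree[2], x))

def mutate(tree, n_features, rng):
    """Subtree mutation: replace this node, or recurse into one child."""
    if tree[0] in ('x', 'c') or rng.random() < 0.2:
        return random_tree(2, n_features, rng)
    node = list(tree)
    i = rng.choice([1, 2])
    node[i] = mutate(node[i], n_features, rng)
    return tuple(node)

def evolve(fitness_fn, n_features, pop_size=50, n_gen=30, seed=0):
    """Generational loop: elitist reproduction, tournament selection, mutation."""
    rng = random.Random(seed)
    pop = [random_tree(3, n_features, rng) for _ in range(pop_size)]
    best = max(pop, key=fitness_fn)
    for _ in range(n_gen):
        new_pop = [best]                              # reproduction (elitism)
        while len(new_pop) < pop_size:
            a, b = rng.sample(pop, 2)                 # binary tournament
            parent = a if fitness_fn(a) >= fitness_fn(b) else b
            new_pop.append(mutate(parent, n_features, rng))
        pop = new_pop
        best = max(pop + [best], key=fitness_fn)      # track best over generations
    return best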
Thus, GP is a search technique [29] that explores the space of computer programs. In GP, the crossover operation is achieved first by reproduction of two parent trees; two crossover points are then randomly selected in the two offspring trees, and exchanging the sub-trees selected according to these crossover points generates the final offspring trees. The resulting offspring trees usually differ from their parents in size and shape. Mutation is also used in GP: a single parental tree is first chosen, a mutation point, which can be either a leaf node or a sub-tree, is randomly selected, and the leaf node or sub-tree is then replaced by a randomly generated new one. Fitness functions ensure that the evolution moves toward optimization by calculating a fitness value for each individual in the population; the fitness value evaluates the performance of each individual.

GP can perform classification by returning numeric (real) values and then translating these values into class labels. For binary classification problems, the division between negative and non-negative numbers acts as a natural boundary between the two classes, which means genetic programs can easily represent binary class problems. While evaluating the GP expression for an input datum, if the result is ≥ 0 the input is assigned to one class; otherwise it is assigned to the other class. Thus, the desired output D is +1 for one class and −1 for the other in the training set. During the genetic evolution of individuals, the best individual is the one that correctly classifies the most training samples: positive samples must give an output of +1 and negative samples −1. GP is guided by the fitness function to search for the most efficient computer program for a given problem. A simple measure of fitness [15] adopted for the binary classification problem is given by the following equation:

\mathrm{Fitness}(T) = \frac{\text{no. of samples classified correctly}}{\text{no. of samples used for training during evaluation}}   (3)
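The sign-based decision rule just described and the fitness of Eq. (3) translate directly into code. In the following sketch, a program is any callable mapping a feature vector to a real number, e.g., a wrapper around the evaluate function of the earlier sketch; the toy expression and data are made up for illustration:

def classify(program, x):
    """Sign rule: non-negative GP output -> class +1, negative -> class -1."""
    return 1 if program(x) >= 0 else -1

def gp_fitness(program, samples, labels):
    """Eq. (3): fraction of training samples classified correctly."""
    correct = sum(classify(program, x) == y for x, y in zip(samples, labels))
    return correct / len(samples)

# Toy usage with a hand-written stand-in for an evolved expression
program = lambda x: 0.8 * x[0] - 0.5 * x[1] + 0.1
print(gp_fitness(program, [(1.0, 0.2), (-0.3, 0.9)], [1, -1]))  # -> 1.0

# Wiring into the loop sketched earlier (assuming those definitions):
# best = evolve(lambda t: gp_fitness(lambda x: evaluate(t, x), X_train, y_train),
#               n_features=24)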

The major advantages in applying GP to pattern classification are:

• GP-based techniques are data-distribution-free, so no a priori knowledge is needed about the statistical distribution of the data.
• GP can directly operate on the data in its original form.
• GP can detect underlying but unknown relationships among the data and express them as mathematical expressions.
• GP can discover the most important discriminating features of a class during the training phase.
• The generated expression can be easily used in the application environment.

4. Neural network–GP hybrids

The fundamental premise of the soft-computing paradigm is that hybrid intelligent techniques tend to outperform stand-alone techniques. Bates and Granger [4] concluded that a composite set of forecasts could yield a lower mean square error than either of the original forecasts. Based on this conclusion, our main idea behind developing these hybrids is to improve each stand-alone technique's ability to correctly classify the data. The hybrids developed in our study include MLFF–GP, MLFF–MLFF, PNN–GP, PNN–PNN, RS–MLFF, RS–PNN, RS–GP and GP–GP, where the first intelligent technique takes care of feature selection and the second one takes care of classification. Accordingly, all these hybrids are loosely coupled hybrid architectures, comprising two separate and independent modules. The control flow is sequential in the sense that the processing in one module has to be finished before the processing in the next module can begin. These hybrids are described briefly as follows:
In the first phase, we designed a feature subset selection methodology as follows. We performed 10-fold cross-validation throughout the study. Since the dataset is well balanced (50% healthy and 50% failed companies), we ensured, through stratified random sampling, that each fold has an equal number of healthy and failed companies. In each of the 10 experiments, the training data is fed to MLFF, PNN, GP, t-statistic and f-statistic separately for feature selection purposes. The top 10 features are selected in each fold for a given technique, and it was observed that different folds yielded different top features. In order to arrive at a unified and optimal feature subset, we computed the frequency of occurrence of all the features in the top 10 slots across all folds. Then, all the features are sorted in descending order of frequency of occurrence, which lets us select the feature subset for that particular technique. We repeated the same method for every other technique. In the case of rough sets, we took the top 10 features from the results obtained by Bose [6].
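A minimal sketch of this frequency-of-occurrence aggregation follows; the function and variable names are ours, and feature names are assumed to be strings such as the ratio abbreviations of Table 1:

from collections import Counter

def unified_top_features(per_fold_top10, k=10):
    """Aggregate per-fold top-10 feature lists into one unified subset.

    per_fold_top10 : list with one list per fold, each holding the feature
                     names ranked best in that fold by a given technique.
    Returns the k features occurring most often across all folds.
    """
    counts = Counter(f for fold in per_fold_top10 for f in fold)
    ranked = sorted(counts, key=counts.get, reverse=True)  # descending frequency
    return ranked[:k]

# Toy usage with three folds and three features per fold
folds = [['RE/TA', 'S/MC', 'P/E'], ['RE/TA', 'S/TA', 'P/E'], ['RE/TA', 'P/E', 'CF/S']]
print(unified_top_features(folds, k=2))  # ['RE/TA', 'P/E']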
In the case of MLFF (as described in Neuroshell 2.0), a contribution factor is calculated for each of the inputs in the given dataset. This contribution factor is an approximate measure of the importance of the input variable in predicting the network's output, relative to the other input variables in the same network, and it is developed from an analysis of the weights of the trained neural network. In the case of PNN (as described in Neuroshell 2.0), an input smoothing factor is calculated for each of the input variables from the overall smoothing factor; the larger the factor for a given input, the more important that input is to the model. In each case, the top 10 features that contribute to high accuracy are selected from the 24 features. The feature subset so formed is fed separately to MLFF/PNN/GP for classification purposes in the second phase. The block diagram for these hybrids is shown in Fig. 2. In the case of the rough sets based approach, however, the top 10 features are taken from Bose [6], who followed the hold-out method with an 80–20 ratio. In addition to using the techniques MLFF/PNN/GP/RS separately for feature selection, we also employed the t-statistic and f-statistic for feature selection, which resulted in two new hybrids, described as follows:

4.1. t-Statistic–GP

In order to make the feature selection phase more exhaustive, we also employed the t-statistic for feature selection. The t-statistic is one of the simplest and most efficient techniques for feature selection. The features are ranked according to the formula given by the following equation:

[Fig. 2. Architecture of the different hybrids: the dataset with all features passes through the feature selection phase (MLFF/PNN/GP/RS/t-statistic/f-statistic), and the dataset with the reduced features then passes through the classification phase (MLFF/PNN/GP) to produce the final output.]



t\text{-statistic} = \frac{|\mu_1 - \mu_2|}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}   (4)

where, for a given feature, μ1 and μ2 are the means of the samples of bankrupt firms and nonbankrupt firms, respectively; σ1 and σ2 are the corresponding standard deviations; and n1 and n2 are the corresponding numbers of samples.
The t-statistic values are computed for each feature over all the 10 folds. Then, the top 10 features with the highest t-statistic values are considered, using the feature subset selection methodology discussed in the previous section. A high t-statistic value indicates that the feature discriminates well between the samples that failed and those that did not. The feature subset so formed is fed as input to GP for classification purposes. The block diagram for this hybrid is shown in Fig. 2.
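Eq. (4) and the resulting ranking can be sketched as follows; the array layout, label convention and toy data are assumptions for illustration:

import numpy as np

def t_statistic(bankrupt_vals, healthy_vals):
    """Eq. (4): absolute two-sample t-statistic for one feature."""
    m1, m2 = np.mean(bankrupt_vals), np.mean(healthy_vals)
    v1, v2 = np.var(bankrupt_vals, ddof=1), np.var(healthy_vals, ddof=1)
    n1, n2 = len(bankrupt_vals), len(healthy_vals)
    return abs(m1 - m2) / np.sqrt(v1 / n1 + v2 / n2)

def rank_features(X, y, score=t_statistic, k=10):
    """Rank columns of X by a score; y holds 1 (failed) / 0 (healthy)."""
    scores = [score(X[y == 1, j], X[y == 0, j]) for j in range(X.shape[1])]
    return np.argsort(scores)[::-1][:k]       # indices of the k top features

# Toy usage: 6 firms, 3 features; feature 0 separates the classes best
X = np.array([[1.0, 5.0, 0.2], [1.1, 4.8, 0.3], [0.9, 5.1, 0.1],
              [3.0, 5.0, 0.2], [3.2, 4.9, 0.4], [2.9, 5.2, 0.2]])
y = np.array([1, 1, 1, 0, 0, 0])
print(rank_features(X, y, k=2))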

4.2. f-Statistic–GP

In addition to the t-statistic, the f-statistic is also used for feature selection. The f-statistic is calculated using the formula given in the following equation:

f\text{-statistic} = \frac{(\mu_1 - \mu_2)^2}{\sigma_1^2 + \sigma_2^2}   (5)

where μ1, μ2, σ1, σ2, n1 and n2 have their usual meanings as discussed in the earlier section. The features are thus ranked and the top 10 features are selected. The feature subset so formed is fed as input to GP for classification purposes. The block diagram for this hybrid is shown in Fig. 2.
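Relative to the t-statistic sketch above, only the scoring function changes; under the same assumptions, the same rank_features driver applies (e.g., rank_features(X, y, score=f_statistic)):

import numpy as np

def f_statistic(bankrupt_vals, healthy_vals):
    """Eq. (5): squared mean difference over the sum of the class variances."""
    m1, m2 = np.mean(bankrupt_vals), np.mean(healthy_vals)
    v1, v2 = np.var(bankrupt_vals, ddof=1), np.var(healthy_vals, ddof=1)
    return (m1 - m2) ** 2 / (v1 + v2)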

5. Results and discussion

The dataset analyzed here is taken from Wharton Research Data Services (WRDS) for the year 2000. Bose [6], Bose and Pal [7] and Chandra et al. [11] analyzed the same dataset earlier. It comprises 24 financial ratios for 240 click-and-mortar companies, of which 120 firms failed and 120 firms did not fail. Of the 24 features, ratios 1–15 were the most popularly used ones in the literature related to bankruptcy prediction of financial firms, and the rest were constructed to capture the novelty of dotcoms. Though there were no guidelines on financial ratios specifically important to dotcoms, ratios 16–24 were used to reflect their sales, earnings, cash, income, market capitalization and stock prices. The financial ratios are given in Table 1. We employed the GP as implemented in the tool Discipulus (available at www.rmltech.com and downloaded on 20th August, 2008). As regards MLFF and PNN, we employed Neuroshell 2.0.
Sensitivity measures the proportion of bankrupt firms correctly predicted as bankrupt by a particular model to the total number of actual bankrupt firms. Specificity measures the proportion of nonbankrupt firms correctly predicted as nonbankrupt by a model to the total number of actual nonbankrupt firms.

Table 1
Financial ratios used in the dotcom data set.

S. No.   Predictor variable name                          Financial ratios
1        Working capital/total assets                     WC/TA
2        Total debt/total assets                          TD/TA
3        Current assets/current liabilities               CA/CL
4        Operating income/total assets                    OI/TA
5        Net income/total assets                          NI/TA
6        Cash flow/total debt                             CF/TD
7        Quick assets/current liabilities                 QA/CL
8        Cash flow/sales                                  CF/S
9        Retained earnings/total assets                   RE/TA
10       Sales/total assets                               S/TA
11       Gross profit/total assets                        GP/TA
12       Net income/shareholders' equity                  NI/SE
13       Cash/total assets                                C/TA
14       Inventory/sales                                  I/S
15       Quick assets/total assets                        QA/TA
16       Price per share/earnings per share               P/E
17       Sales/market capitalization                      S/MC
18       Current assets/total assets                      CA/TA
19       Long term debt/total assets                      LTD/TA
20       Operating income/sales                           OI/S
21       Operating income/market capitalization           OI/MC
22       Cash/sales                                       C/S
23       Current assets/sales                             CA/S
24       Net income/(total assets − total liabilities)    NI/(TA-TL)

Table 2
Average results of dataset with all features using 10-fold cross-validation.

Classifier   Accuracy   Sensitivity   Specificity   AUC
MLFF         73.33      71.66         75            7333
PNN          92.06      90.83         93.33         9208
GP           94.58      91.67         97.5          9458.5

In all cases, we present the average accuracies, sensitivities, specificities and areas under the receiver operating characteristic curve (AUC) on the test data obtained over the 10 folds, and we rank the classifiers based on AUC. First, the results of the 10-fold cross-validation method on the stand-alone techniques, viz. MLFF, PNN and GP without feature selection, are presented in Table 2. From Table 2, we observe that GP, with 94.58% accuracy and 91.67% sensitivity, outperformed PNN and MLFF. PNN yielded the next best results with 92.06% accuracy and 90.83% sensitivity.
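For concreteness, these metrics can be computed from confusion counts as sketched below. The AUC column in Tables 2 and 6 numerically equals (sensitivity + specificity)/2 scaled by 100, i.e., an AUC evaluated at a single operating point, so the sketch assumes that convention; labels 1 (bankrupt) and 0 (nonbankrupt) are our choice:

def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity (in %) and single-point AUC.

    y_true/y_pred hold 1 for bankrupt and 0 for nonbankrupt firms.
    """
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    sens = 100.0 * tp / (tp + fn)
    spec = 100.0 * tn / (tn + fp)
    acc = 100.0 * (tp + tn) / len(y_true)
    return acc, sens, spec, (sens + spec) / 2    # AUC at one operating point

print(classification_metrics([1, 1, 0, 0], [1, 0, 0, 0]))  # (75.0, 50.0, 100.0, 75.0)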
The feature subsets yielded by MLFF, PNN, GP, t-statistic, f-statistic and rough sets [6] are presented in Table 3. It is observed that the feature subsets selected by the different techniques have some overlaps. With the t-statistic, of the 10 features selected, 6 are from the first 15 financial ratios and the other 4 are from the 9 special financial ratios. With the f-statistic, of the 10 features selected, 5 are from the first 15 financial ratios and the other 5 are from the 9 special financial ratios. With MLFF, of the 10 features selected, 5 are from the first 15 financial ratios and the other 5 are from the 9 special financial ratios. With PNN, of the 10 features selected, 4 are from the first 15 financial ratios and the other 6 are from the 9 special financial ratios. With GP, of the 10 features selected, 6 are from the first 15 financial ratios and the other 4 are from the 9 special financial ratios. The 5 features that are not selected by any of the above feature selection techniques are shown in Table 4. Of those 5 left-over features, 4 are from the first 15 financial ratios and 1 is from the 9 special financial ratios. Also, the top 10 features that are selected by at least three techniques are presented in Table 5. Of these 10 features, 6 are from the 9 special financial ratios and the remaining 4 are from the first 15 financial ratios. From these observations we can conclude that the 9 special financial ratios contribute much more to the accuracy than the usual 15 financial ratios.
The average results of the hybrids over all folds with feature subset selection are presented in Table 6. From Table 6, we observe that the GP–GP hybrid outperformed all other hybrids with 95.42% average accuracy and 93.33% average sensitivity, while the t-statistic–GP hybrid came closely behind with 95% average accuracy and 93.33% average sensitivity. Furthermore, results based on AUC indicate that the GP–GP hybrid yielded very high accuracies, followed by the t-statistic–GP, PNN–GP and rough sets–GP hybrids, which yielded marginally lower AUC values. This makes us infer that the feature subsets selected have high discriminating power and that the left-over or redundant features contribute very little to the accuracy.

Further, we performed a t-test between the GP–GP hybrid and each of the other hybrids in order to test whether the differences in average accuracies are statistically significant. The t-test values between the average accuracy obtained by the GP–GP hybrid and those of the other hybrids are presented in Table 7.

Table 3
Selected feature subsets for different techniques.

#    t-Statistic   f-Statistic   MLFF         PNN          GP      RS a
1    RE/TA         OI/MC         S/TA         RE/TA        OI/TA   RE/TA
2    I/S           S/TA          TD/TA        P/E          NI/TA   S/MC
3    QA/TA         CA/TA         LTD/TA       LTD/TA       CF/S    P/E
4    P/E           TD/TA         S/MC         CF/S         RE/TA   S/TA
5    LTD/TA        NI/(TA-TL)    C/S          S/MC         S/TA    CF/S
6    CF/S          CF/S          NI/(TA-TL)   TD/TA        GP/TA   LTD/TA
7    TD/TA         LTD/TA        CA/CL        OI/TA        S/MC    CA/TA
8    CA/TA         P/E           QA/CL        OI/S         CA/TA   QA/TA
9    S/TA          QA/TA         I/S          C/S          OI/S    GP/TA
10   OI/MC         I/S           OI/MC        NI/(TA-TL)   OI/MC   CF/TD

a Features selected by the rough sets based approach, in the hold-out method with an 80–20 ratio [6].

Table 4
Financial ratios not selected by any of the feature selection techniques.

#   Predictor variable name           Financial ratios
1   Working capital/total assets      WC/TA
2   Cash flow/total debt              CF/TD
3   Net income/shareholders' equity   NI/SE
4   Cash/total assets                 C/TA
5   Current assets/sales              CA/S

Table 5
Most commonly selected financial ratios by the different feature selection techniques.

#    Predictor variable name                         Financial ratios
1    Long term debt/total assets                     LTD/TA
2    Cash flow/sales                                 CF/S
3    Total debt/total assets                         TD/TA
4    Sales/total assets                              S/TA
5    Operating income/market capitalization          OI/MC
6    Retained earnings/total assets                  RE/TA
7    Price per share/earnings per share              P/E
8    Current assets/total assets                     CA/TA
9    Net income/(total assets − total liabilities)   NI/(TA-TL)
10   Sales/market capitalization                     S/MC

Table 6
Average results of dataset with reduced features using 10-fold cross-validation.

Technique used for FSS a   Classifier   Accuracy   Sensitivity   Specificity   AUC
t-Statistic                GP           95         93.33         96.67         9500
f-Statistic                GP           92.09      90            94.17         9208.5
MLFF                       GP           93.33      90            96.67         9333.5
MLFF                       MLFF         73.33      82.16         64.99         7357.5
PNN                        GP           94.16      90.83         97.5          9416.5
PNN                        PNN          89.58      85.83         93.33         8958
GP                         GP           95.42      93.33         97.5          9541.5
RS                         GP           94.17      93.33         95            9416.5
RS                         MLFF         70         70            70            7000
RS                         PNN          83.75      81.34         93.33         8275.5

a Feature subset selection.

Table 7
t-Test values of average accuracies of the GP–GP hybrid compared to those of the other hybrids.

#   Hybrid compared   t-Test value
1   t-stat–GP         0.308282
2   f-stat–GP         2.553784
3   PNN–GP            0.721404
4   MLFF–GP           1.371362
5   MLFF–MLFF         7.400529
6   PNN–PNN           2.893897
7   RS–GP             0.720887
8   RS–PNN            4.964391
9   RS–MLFF           18.83484

These values are compared with 1.73, the t-distribution value at n1 + n2 − 2 = 10 + 10 − 2 = 18 degrees of freedom at the 10% level of significance. The hybrids whose accuracies differ statistically significantly from that of the GP–GP hybrid are highlighted in Table 7. From Table 7, we observe that the t-test values are more than 1.73 whenever GP is not used as the classifier; among the hybrids that do use GP as the classifier, the f-statistic–GP hybrid is the only one whose t-test value also exceeds 1.73. Therefore, we can conclude that GP, as a classifier, outperforms the other techniques.
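A sketch of this comparison follows. The paper does not state whether a pooled-variance or Welch form was used, so the sketch assumes the pooled (equal-variance) two-sample t-test, which is what yields the quoted n1 + n2 − 2 = 18 degrees of freedom; |t| > 1.734 is then significant at the 10% level (two-tailed). The fold accuracies shown are made up:

import math

def two_sample_t(acc_a, acc_b):
    """Pooled-variance two-sample t-statistic on per-fold accuracies."""
    n1, n2 = len(acc_a), len(acc_b)
    m1, m2 = sum(acc_a) / n1, sum(acc_b) / n2
    v1 = sum((a - m1) ** 2 for a in acc_a) / (n1 - 1)
    v2 = sum((b - m2) ** 2 for b in acc_b) / (n2 - 1)
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)  # pooled variance
    return (m1 - m2) / math.sqrt(pooled * (1 / n1 + 1 / n2))

# Toy usage with made-up 10-fold accuracies for two hybrids
gp_gp = [95, 96, 94, 97, 95, 96, 95, 94, 96, 96]
rs_mlff = [70, 69, 71, 70, 72, 68, 70, 71, 69, 70]
print(two_sample_t(gp_gp, rs_mlff))  # large positive value -> significant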
Based on our experiments, we can also conclude that GP without feature selection outperformed methods such as random forest, multi-layer perceptrons (MLFF-BP), support vector machines, classification and regression trees, logistic regression and other boosting methods employed by Chandra et al. [11], discriminant analysis, neural networks and support vector machines employed by Bose and Pal [7], and rough sets employed by Bose [6]. Hence, based on the experiments conducted, we conclude that GP with and without feature selection outperformed all the other techniques.

6. Conclusions

This paper presents novel neural network–genetic programming hybrids in the soft-computing paradigm to predict the failure of dotcom companies. The hybrids designed are (1) MLFF–GP, (2) MLFF–MLFF, (3) PNN–GP, (4) PNN–PNN, (5) t-statistic–GP, (6) f-statistic–GP, (7) rough sets–MLFF, (8) rough sets–PNN, (9) rough sets–GP and (10) GP–GP, wherein the first technique performs feature subset selection and the second one takes care of classification. The dataset analyzed consists of 240 dotcom companies, of which 120 failed and 120 are healthy. Ten-fold cross-validation is performed throughout the paper. In addition to MLFF, PNN, GP and RS, simpler techniques such as the t-statistic and f-statistic are also employed for feature subset selection, and the top 10 features are selected in each case. The reduced feature subset in each case is used to train MLFF, PNN and GP separately. We noticed that there is not much difference in the accuracies even after completely removing 14 features. Results based on the area under the receiver operating characteristic (AUC) curve indicate that GP–GP is the top performer, followed by the t-statistic–GP and PNN–GP hybrids, which yielded marginally lower accuracies. After performing the t-test at the 10% level of significance, we noticed that the differences in accuracies obtained by the GP–GP hybrid and those of the t-statistic–GP, PNN–GP, MLFF–GP and RS–GP hybrids are not statistically significant, whereas the differences in accuracies obtained by the GP–GP hybrid and those of the MLFF–MLFF, PNN–PNN, f-statistic–GP, RS–MLFF and RS–PNN hybrids are statistically significant. Of particular note is the observation that GP performed very well as a classifier compared to the other techniques. Furthermore, we conclude that the results yielded by these hybrids are superior to those reported in previous studies on the same dataset.

Acknowledgment

We are very thankful to Mr. Frank Francone for giving us permission to use the Discipulus tool (demo version) for conducting the various experiments reported in this paper.

References

[1] E. Altman, Financial ratios, discriminant analysis and the prediction of corporate bankruptcy, Journal of Finance 23 (1968) 589–609.
[2] E.I. Altman, The success of business failure prediction models: an international survey, Journal of Banking and Finance 8 (2) (1984) 171–198.
[3] F. Atiya, Bankruptcy prediction for credit risk using neural networks: a survey and new results, IEEE Transactions on Neural Networks 12 (2001) 929–935.
[4] J.M. Bates, C.W.J. Granger, The combination of forecasts, Operations Research Quarterly 20 (1969) 451–468.
[5] V.M. Becerra, R.K.H. Galvao, M. Abou-Seads, Neural and wavelet network models for financial distress classification, Data Mining and Knowledge Discovery 11 (2005) 35–55.
[6] I. Bose, Deciding the financial health of dot-coms using rough sets, Information and Management 43 (2006) 835–846.
[7] I. Bose, R. Pal, Predicting the survival or failure of click-and-mortar corporations: a knowledge discovery approach, European Journal of Operational Research 174 (2006) 959–982.
[8] S. Canbas, A. Caubak, S.B. Kilic, Prediction of commercial bank failure via multivariate statistical analysis of financial structures: the Turkish case, European Journal of Operational Research 166 (2005) 528–546.
[9] E. Cane, Online laughing stocks: betting on dotcom failures takes off, 2000. <http://www.financeasia.com/articles/428EB6C0-4BC2-11D4-C0F0008C72B383C.cfm>.
[10] D.K. Chandra, V. Ravi, Feature selection and fuzzy rule-based classifier applied to bankruptcy prediction in banks, International Journal of Information and Decision Sciences 1 (4) (2009) 343–365.
[11] D.K. Chandra, V. Ravi, I. Bose, Failure prediction of dotcom companies using hybrid intelligent techniques, Expert Systems with Applications 36 (3) (2009) 4830–4837.
[12] N.J. Chauhan, V. Ravi, D.K. Chandra, Differential evolution trained wavelet neural network: application to bankruptcy prediction in banks, Expert Systems with Applications 36 (4) (2008) 7659–7665.
[13] B. Cheng, C.L. Chen, C.J. Fu, Financial distress prediction by a radial basis function network with logit analysis learning, Computers and Mathematics with Applications 51 (2006) 579–588.
[14] R. Cole, J. Gunther, A CAMEL rating's shelf life, Federal Reserve Bank of Dallas Review (1995) 13–20.
[15] K.M. Faraoun, A. Boukelif, Genetic programming approach for multi-category pattern classification applied to network intrusion detection, International Journal of Computational Intelligence and Applications 6 (1) (2006) 77–99.
[16] M.B. Gorzalczany, Z. Piasta, Neuro-fuzzy approach versus rough-set inspired methodology for intelligent decision support, Information Sciences 120 (1999) 45–68.
[17] M.P. Grenier, High-tech downturn is no surprise to startups, 2003. <http://www.startupjournal.com/runbusiness/failure/200011220939-grenier.html>.
[18] Jupiter Research Corporation, Jupiter market forecast report: retail through 2007, 2003. <http://www.marketresearch.com/map/prod/867533.html>.
[19] Kleinbard, The $1.7 trillion dot-com lesson: index of 280 Internet stocks is down dramatically from its 52-week high, 2000. <http://www.cnn.com/2000/fyi/news/11/13/dot.com.economics>.
[20] J.R. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection, MIT Press, Cambridge, MA, 1992, ISBN: 0-262-11170-5.
[21] H. Li, J. Sun, Gaussian case-based reasoning for business failure prediction with empirical data in China, Information Sciences 179 (2009) 89–108.
[22] R. Low, R. Togneri, Speech recognition using probabilistic neural network, in: Proceedings of the Fifth International Conference on Spoken Language Processing (ICSLP'98), Sydney, Darling Harbour, 1998.
[23] T.E. McKee, Developing a bankruptcy prediction model via rough sets theory, International Journal of Intelligent Systems in Accounting, Finance and Management 9 (2000) 159–173.
[24] T.E. McKee, T. Lensberg, Genetic programming and rough sets: a hybrid approach to bankruptcy classification, European Journal of Operational Research 138 (2002) 436–451.
[25] M. Odom, R. Sharda, A neural network for bankruptcy prediction, in: Proceedings of the IJCNN International Conference on Neural Networks, San Diego, CA, 1990.
[26] J.A. Ohlson, Financial ratios and the probabilistic prediction of bankruptcy, Journal of Accounting Research 18 (1980) 109–131.
[27] I. Olmeda, E. Fernandez, Hybrid classifiers for financial multicriteria decision making: the case of bankruptcy prediction, Computational Economics 10 (1997) 317–335.
[28] C.S. Park, I. Han, Case-based reasoning with the feature weights derived by analytic hierarchy process for bankruptcy prediction, Expert Systems with Applications 23 (3) (2002) 255–264.
[29] R. Poli, W.B. Langdon, J.R. Koza, A Field Guide to Genetic Programming, Lulu.com, United Kingdom, 2008, ISBN: 978-1-4092-0073-4.
[30] C. Pramodh, V. Ravi, Modified great deluge algorithm based auto associative neural network for bankruptcy prediction in banks, International Journal of Computational Intelligence Research 3 (4) (2007) 363–370.
[31] R. Rahimian, S. Singh, T. Thammachote, R. Virmani, Bankruptcy prediction by neural network, in: R.R. Trippi, E. Turban (Eds.), Neural Networks in Finance and Investing, Irwin Professional Publishing, Burr Ridge, USA, 1996.
[32] V. Ravi, C. Pramodh, Threshold accepting trained principal component neural network and feature subset selection: application to bankruptcy prediction in banks, Applied Soft Computing 8 (4) (2008) 1539–1548.
[33] V. Ravi, P. Ravikumar, E. Ravi Srinivas, N.E. Kasabov, A semi-online training algorithm for the radial basis function neural networks: applications to bankruptcy prediction in banks, in: V. Ravi (Ed.), Advances in Banking Technology and Management: Impact of ICT and CRM, IGI Global, USA, 2007.
[34] V. Ravi, H. Kurniawan, P.N.K. Thai, P. Ravikumar, Soft computing system for bank performance prediction, Applied Soft Computing Journal 8 (1) (2008) 305–315.
[35] P. Ravikumar, V. Ravi, Bankruptcy prediction in banks by fuzzy rule based classifier, in: Proceedings of the First IEEE International Conference on Digital and Information Management, Bangalore, 2006, pp. 222–227.
[36] P. Ravikumar, V. Ravi, Bankruptcy prediction in banks by an ensemble classifier, in: Proceedings of the IEEE International Conference on Industrial Technology, Mumbai, 2006, pp. 2032–2036.
[37] P. Ravikumar, V. Ravi, Bankruptcy prediction in banks and firms via statistical and intelligent techniques – a review, European Journal of Operational Research 180 (1) (2007) 1–28.
[38] L. Rush, US e-commerce to see significant growth by 2008, 2003. <http://ecommerce.internet.com/research/stats/article/0,3371,10371_2245631,00.html>.
[39] Y.U. Ryu, W.T. Yue, Firm bankruptcy prediction: experimental comparison of isotonic separation and other classification approaches, IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans 35 (5) (2005) 727–737.
[40] L. Salchenberger, C. Mine, N. Lash, Neural networks: a tool for predicting thrift failures, Decision Sciences 23 (1992) 899–916.
[41] C. Serrano-Cinca, Self-organizing neural networks for financial diagnosis, Decision Support Systems (1996) 227–238.
[42] A. Sharma, Dot-coms' coma, The Journal of Systems and Software 26 (2001) 101–104.
[43] M.F. Selekwa, V. Kwigizile, R.N. Mussa, Setting up a probabilistic neural network for classification of highway vehicles, International Journal of Computational Intelligence and Applications 5 (4) (2005) 411–423.
[44] K.S. Shin, T.S. Lee, H.J. Kim, An application of support vector machines in bankruptcy prediction model, Expert Systems with Applications 28 (2005) 127–135.
[45] D.F. Specht, Probabilistic neural networks, Neural Networks 3 (1990) 110–118.
[46] P. Swicegood, J.A. Clark, Off-site monitoring for predicting bank underperformance: a comparison of neural networks, discriminant analysis and professional human judgment, International Journal of Intelligent Systems in Accounting, Finance and Management 10 (2001) 169–186.
[47] K.Y. Tam, Neural network models and the prediction of bank bankruptcy, OMEGA 19 (1991) 429–445.
[48] A. Tsakonas, G. Dounias, M. Doumpos, C. Zopounidis, Bankruptcy prediction with neural logic networks by means of grammar-guided genetic programming, Expert Systems with Applications 30 (2006) 449–461.
[49] W.L. Tung, C. Quek, P. Cheng, GenSo-EWS: a novel neural-fuzzy based early warning system for predicting bank failures, Neural Networks 17 (2004) 567–587.
[50] F. Varetto, Genetic algorithm applications in the analysis of insolvency risk, Journal of Banking and Finance 22 (1998) 1421–1439.
[51] T. Whalen, B. Schott, Generalized network modeling and diagnosis using financial ratios, Information Sciences 37 (1–3) (1985) 179–192.
[52] R.L. Wilson, R. Sharda, Bankruptcy prediction using neural networks, Decision Support Systems 11 (1994) 545–557.
[53] E.C. Yen, Warning signals for potential accounting frauds in blue chip companies – an application of adaptive resonance theory, Information Sciences 177 (20) (2007) 4515–4525.
[54] Z. Zhu, H. He, J.A. Starzyk, C. Tseng, Self-organizing learning array and its application to economic and financial problems, Information Sciences 177 (5) (2007) 1180–1192.
