CredScoring Micro India

Article

Modelling Credit Default in Microfinance—An Indian Case Study

Journal of Emerging Market Finance
16(3) 1–13
© 2017 Institute for Financial Management and Research
SAGE Publications
sagepub.in/home.nav
DOI: 10.1177/0972652717722084
http://emf.sagepub.com
P. K. Viswanathan1
S. K. Shanthi1
Abstract
Credit score models have been successfully applied in the traditional
credit card industry and by mortgage firms to distinguish defaulting
customers from non-defaulting customers. In the light of growing
competition in the microfinance industry, over-indebtedness and other
factors, the industry has come under increased regulatory supervision.
Our study provides evidence from a large microfinance institution (MFI)
in India; we have applied both the credit scoring method and the neural
network (NN) method and compared the results. In this article, we
demonstrate the capability of credit scoring models for an Indian-based
microfinance firm in terms of predicting default probability as well as
the relative importance of each of its associated drivers. A logistic
regression model and an NN have been used as the predictive analytic
tools for sifting the key drivers of default.
Keywords
Logistic regression, probability of default, MFI, neural network
1 Great Lakes Institute of Management, Kanchipuram, Tamil Nadu, India.
Corresponding author:
P. K. Viswanathan, Great Lakes Institute of Management, Kanchipuram, Tamil Nadu, India.
E-mail: viswanathan.pk@greatlakes.edu.in
1. Introduction
As a sequel to the microfinance crisis which took place in Andhra
Pradesh in India in October 2010, the microfinance institutions (MFIs)
in India have come under stringent regulations. Among other things, the
regulators have sought to put a cap on the interest margins that MFIs can
receive from their operations. Given this fact, assessment of credit risk
in any MFI assumes great relevance. The Indian microfinance industry is
also in that critical growth phase where it is attempting to make the
transition from non-profit making entities to profit making enterprises,
capable of economic viability over the years to come. Increasing
awareness among investors and lenders of commercial viability, growing
competition and shrinking returns in the microfinance market are the key
factors forcing MFIs to improve the efficiency of their lending
operations. It is in this context that predicting credit default looms
large. Predictive analytic models are being used to estimate the
probability of default as well as to differentiate defaulters from
non-defaulters in terms of important characteristics/variables. To our
knowledge, no study relating to risk measurement in the microfinance
sector in India has used credit scoring methodology.
Credit scoring models involve quantitative analysis of past loan data to
predict the probability of future default, on the assumption that the
relationships observed in the past will continue to hold. The models in
this regard do two things. First, they predict the probability of
default, and second, they provide a classification table that sets the
defaulters and non-defaulters predicted by the model alongside the actual
defaulters and non-defaulters, thus enabling us to evaluate the
predictive power of the model used. Scoring models rely primarily on the
enormous computing power available today to predict the probability of
default and to classify borrowers using advanced statistical techniques.
2. Review of Literature

According to Lewis (1992), credit scoring models are built using
statistical techniques that assign points to the variables that are part
of a credit system for deciding whether or not to grant a loan. These
models require identifying characteristics that may facilitate the
differentiation of potential credit defaulters from non-defaulters.
One of the earliest studies that applied credit scoring models to
microfinance is Vigano (1993). Empirical evidence in this area is rather
limited, particularly with respect to developing countries. Saunders
(1999) focussed on the use of credit scores for the purpose of
classifying credit categories into prompt payers, insolvent, good or bad
and desirable or not. This classification can help the credit analyst
decide whether or not to grant the requested credit. Schreiner (2000)
demonstrates that scoring models do have a role to play in microfinance.
Although models will not be a substitute for the judgement and sound
personal knowledge of loan officers or loan groups with regard to
characteristics that differentiate a defaulter from a non-defaulter, they
can certainly improve estimates of risk and thus predict the probability
of a loan default. Thomas (2000) explained the differences that exist
between the models of credit approval and behavioural scoring. While the
primary aim of credit approval models is to estimate the probability of a
new credit applicant becoming insolvent with the institution within a
particular time horizon, behavioural scoring models aim to estimate the
probability of insolvency of a client who has already availed a credit
facility from the institution.
Andrade (2004) pointed out that while some institutions still practise
judgemental credit scoring, models such as discriminant analysis,
logistic regression and neural network (NN) are being used extensively.
Schreiner (2004) discusses the advantages and limitations of credit
scoring applied to microfinance. Analytic models do have the predictive
capability to significantly improve the evaluation of the risk associated
with loans. Schreiner's article also points out the basic steps in a
scoring project. Carmona and Araujo (2011) used credit score models for a
microcredit institution for credit approval and behavioural scoring. The
multivariate model used for scoring was logistic regression. The results
of their study revealed a predictive accuracy of about 80 per cent. It
was also pointed out that the two critical problems that adversely impact
financial sustainability, namely, insolvency and high operational costs,
could be mitigated substantially, resulting in a reduction in the
incidence of insolvency and a decrease in operational costs.
Maves (1991) observes that because of their adaptability to change, NNs
can be ‘retrained’ much more quickly than discriminant analysis-based
techniques when markets, products and the economy change. Neural networks
have served as a versatile predictive analytic tool in a variety of
complex environments. In finance, they have been successfully applied to
predicting bankruptcy and loan default as well as to credit evaluation
(DeLurgio & Hays, 2001; Jain & Nag, 1995). Ghatge and Halkarnikar (2013)
point out that a feed-forward back-propagation NN, when used to predict
credit default based on selected parameters, is able to learn the
patterns and is robust in classification.
In this article, we attempt to fill an important gap in the empirical
literature pertaining to India. We use primary data collected from one of
the largest MFIs in India and try to evaluate the accuracy of forecasts
made with the credit scoring methodology, applying logistic regression
and NN techniques. In the current context, this article assumes a great
deal of importance since the potential for microfinance to ease the path
towards total financial inclusion and then on to inclusive growth is
undeniable (Gangopadhyay & Shanthi, 2012).
We also seek answers to four important questions. (i) What are the key
variables that can significantly predict a credit default in the
microfinance context? (ii) How effectively can we use the logistic
regression method for this purpose? (iii) Can we corroborate its
predictive accuracy using another modelling technique that is also used
in the literature, namely, the NN? (iv) Apart from identifying the key
variables, can we also identify the relative importance of the key
drivers?
3. Data
We have used primary data collected from a leading MFI, ABC, in Tamil
Nadu. The company has been in the business of extending microcredit to
people who are unable to get finance from the mainstream banking avenues.
In this context, the alternative source of finance is private money
lenders, whose rates of interest are in the vicinity of 30–100 per cent.
The mission of ABC is to make finance available at reasonable cost to
such customers in a transparent manner and, in the process, to achieve
acceptable returns on investment to ensure economic viability. ABC is one
of the largest microfinance companies in India. As of July 2014, the
company has a client/beneficiary base of close to two million, an
employee strength of about 3400 and total outstanding credit of about
₹1.73 billion. The company currently has about 335 branches spread over
India.
3.1. ABC’s Lending Model

Customers are formed into groups comprising five members. Three to six
groups are amalgamated into a centre. Each group and centre has one
leader. The groups have joint liability in the sense that each member of
the group stands guarantee for the loan repayment of the other members of
the group. Areas are identified for microfinance customers by the
company's teams, and the sales officer communicates the salient features
of the company's schemes to prospective borrowers. The applicants are
then screened for their credit risk. Criteria used include length of stay
in the same place of residence, nature of business, income, expenditure,
age and caste, among others.
3.2. ABC’s Operational Risk Management

ABC has an operational risk team, which conducts three types of
operational risk audit:
1. Member audit: this audit is done by a field risk officer. A random
sample is selected and a member audit form is filled in. Finally, a
member audit risk score is calculated.
2. Centre-meeting audit: a risk team picks a random sample, visits the
field and audits the centre meetings. A centre-meeting audit form is
filled in and finally the score is calculated.
3. Branch audit: there are a total of 280 branches in ABC. Every month
all 280 branches are audited, and a branch audit score is calculated.
These scores reflect the operational efficiency of the sales officers
and relationship officers working on the ground.
4. Methodology
Logistic regression is a variation of ordinary regression in which the
dependent variable is binary, taking the value 0 or 1. The dependent
variable is categorical and usually represents the occurrence or
non-occurrence of an event, and the independent variables can be
continuous, categorical or both. Logistic regression has been widely used
in the financial services industry for credit scoring models. On
theoretical grounds, logistic regression is a more appropriate
statistical tool than linear regression, given that the dependent
variable in credit risk is categorical with two discrete classes, namely,
a customer is either a defaulter or a non-defaulter. Ordinary least
squares (OLS) regression is fraught with problems in predicting the
probability of default, which has to lie between 0 and 1: it cannot
guarantee that the estimated probability will always fall in the range
0–1. Logistic regression, by contrast, ensures that the estimated
probability falls in the range 0–1 because it is based on a sigmoid
function. In logistic regression, the individual parameters can be tested
for statistical significance. The model also has clarity when it comes to
writing the equation connecting the dependent variable with a host of
independent variables. This facilitates predicting the default
probability for a new customer asking for a loan.
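The contrast between the two models can be illustrated with a minimal Python sketch. It is not the SPSS procedure used in this study: the intercept, the two coefficients and the two customers below are hypothetical values chosen only for illustration. The logistic model passes the linear predictor z through the sigmoid p = 1 / (1 + exp(-z)), whereas an OLS-style model reports z itself.

```python
import math

def linear_probability(intercept, coefs, x):
    """OLS-style linear predictor; nothing constrains it to [0, 1]."""
    return intercept + sum(b * v for b, v in zip(coefs, x))

def logistic_probability(intercept, coefs, x):
    """Logistic model: the same linear predictor passed through the sigmoid."""
    z = intercept + sum(b * v for b, v in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients for two predictors (illustrative, not from Table 2).
INTERCEPT = -0.4
COEFS = [0.05, 0.3]  # e.g. length of stay (months), number of family members

# Hypothetical customers described by the same two predictors.
customers = {
    "short stay, small family": [6, 2],
    "long stay, large family": [60, 6],
}

for label, x in customers.items():
    lin = linear_probability(INTERCEPT, COEFS, x)
    logit = logistic_probability(INTERCEPT, COEFS, x)
    print(f"{label}: linear = {lin:.3f}, logistic = {logit:.3f}")
```

For the second customer the linear predictor exceeds 1 and so cannot be read as a probability, while the logistic value stays strictly between 0 and 1.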
Neural networks can be used effectively in corporate credit decisions
and in fraud detection. The initial work on NNs was motivated by the
study of the human brain and the idea of neurons as its building blocks.
Artificial intelligence researchers introduced a computing neuron model
to simulate the way neurons work in the human brain. This model provided
the basis for many later NN developments. Neural networks are universal
approximators and extremely powerful as a predictive analytic tool. If
the main objective is hypothesis testing, then one should use traditional
and proven statistical modelling. If the main objective is predictive
power, then NN is a strong contender and can often provide more accurate
results than statistical regression modelling. A neural network cannot
directly assess the change in the dependent variable caused by a change
in an independent variable. In other words, it cannot provide a
satisfactory answer to the question ‘if the independent variable
increases by one unit, what is the change in the dependent variable?’. It
is also hard to write down the final equation of an NN-based model that
is required for predicting the dependent variable. For example, in
predicting the probability of default for a new customer, we need the
equation connecting the dependent and independent variables. That is hard
to find in an NN, which may have many layers. These shortcomings could
perhaps be overcome by performing sensitivity analysis on the independent
variables.
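One simple form of sensitivity analysis is to raise one input at a time, hold the others fixed, and record the change in the predicted probability. The Python sketch below does this for a toy network with one hidden layer. The weights, biases and baseline inputs are hypothetical and are not taken from the fitted SPSS model; the sketch only shows the mechanics.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical fixed weights: 2 inputs -> 2 hidden units -> 1 output.
W_HIDDEN = [[0.8, -0.5], [-0.3, 0.9]]  # one row of input weights per hidden unit
B_HIDDEN = [0.1, -0.2]
W_OUT = [1.2, -0.7]
B_OUT = 0.05

def predict(x):
    """Forward pass returning a default probability in (0, 1)."""
    hidden = [
        sigmoid(sum(w * v for w, v in zip(row, x)) + b)
        for row, b in zip(W_HIDDEN, B_HIDDEN)
    ]
    return sigmoid(sum(w * h for w, h in zip(W_OUT, hidden)) + B_OUT)

def sensitivity(x, index, delta=1.0):
    """Change in prediction when input `index` rises by `delta`, others fixed."""
    bumped = list(x)
    bumped[index] += delta
    return predict(bumped) - predict(x)

baseline = [0.5, 1.0]  # hypothetical (standardised) inputs
effects = [sensitivity(baseline, i) for i in range(len(baseline))]
for i, effect in enumerate(effects):
    print(f"input {i}: change in predicted probability = {effect:+.4f}")
```

Unlike a regression coefficient, each effect is local: it depends on the baseline at which it is evaluated, so in practice it would be computed at several representative customers.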
In this article, we would like to take advantage of both these
techniques to confirm the significant independent variables that impact
the behaviour of credit default. For prediction, logistic regression will
be used, in which the independent variables are more precisely defined.
We then use the NN method to find support for the results obtained using
logistic regression.
5. The Credit Risk Models

In this article, we focus on the analysis of credit risk, which is one
part of the financial risk to the service provider, the other part being
market risk. The main focus of this article is to find out the relative
merits of using NNs and logistic regression methods in modelling credit
risk. We have also explored whether these two methodologies reinforce one
another and give us a better model fit.
For the purposes of this study, the following 10 variables have been
identified in the context of predicting credit risk, based on a detailed
discussion with the organisation. These are the variables that
microfinance organisations use in trying to understand the credit risks
involved. The modelling techniques used here are different: they have
better scientific underpinnings and are expected to perform better. ABC
Ltd. does not use either NNs or logistic regression in its modelling
process. Therefore, the purpose of this work is also to provide it with a
modelling technique that performs better in predicting credit risk.
1. Age
2. Total family members
3. Length of stay: duration of stay in the house
4. Loan amount requested: loan principal amount
5. Total income of family
6. Monthly expenses
7. Toilet: attached or public toilet
8. Type of house: tiled, RCC (concrete), sheet or thatched
9. Religion
10. Caste
A sample of 640 customers, comprising 504 good accounts (non-defaulters)
and 136 bad accounts (defaulters), was selected from the company's
database to model the behaviour of credit default. For classification in
terms of prediction versus actual, and also for sifting the relative
importance of the independent variables that impact default behaviour,
logistic regression and a neural network based on the multilayer
perceptron (MLP) have been used. SPSS software was used to obtain the
results for both logistic regression and NN (MLP). As the scope of this
research article is confined to answering the four research questions set
out at the end of the literature review, the following points are
addressed succinctly in terms of the two techniques.
6. Discussion of Results

6.1. Logistic Regression Model
Our results show that the logistic regression model has an overall
predictive accuracy of 88.9 per cent (Table 1) in terms of correct
classification.
Table 1. Results–Logistic Regression
                        Predicted overdue
Observed overdue        No      Yes     Percentage correct
No                      479     25      95.0
Yes                     46      90      66.2
Overall percentage                      88.9
Source: SPSS Output.
The model performs well in terms of its overall predictive power. Out of
the 504 actual cases which belong to ‘no overdue’, the model has
incorrectly predicted 25 as ‘yes overdue’, which is only 5 per cent of
the ‘no overdue’ cases. This is a measure of type I error. Out of the 136
cases observed in the actual data which are in the overdue category, the
model has incorrectly predicted 46 as ‘no overdue’, which amounts to 33.8
per cent. This is a measure of type II error. The type II error is large,
and thus, we observe that the model has not been able to strike a proper
balance between type I and type II errors, though the overall predictive
accuracy is satisfactory (88.9 per cent).
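The percentages quoted above follow directly from the four cell counts in Table 1. As a minimal sketch, the arithmetic can be reproduced in Python. The function and variable names are ours, and the error definitions follow this article's convention: type I is a ‘no overdue’ case predicted as ‘yes overdue’, and type II is the reverse.

```python
def classification_metrics(tn, fp, fn, tp):
    """Accuracy and error rates from the four cells of a 2x2 classification table.

    tn: observed 'no' predicted 'no'    fp: observed 'no' predicted 'yes'
    fn: observed 'yes' predicted 'no'   tp: observed 'yes' predicted 'yes'
    """
    total = tn + fp + fn + tp
    return {
        "accuracy": (tn + tp) / total,
        "type_i_error": fp / (tn + fp),   # share of non-defaulters flagged as overdue
        "type_ii_error": fn / (fn + tp),  # share of defaulters missed by the model
    }

# Cell counts taken from Table 1 (logistic regression).
table1 = classification_metrics(tn=479, fp=25, fn=46, tp=90)

for name, value in table1.items():
    print(f"{name}: {100 * value:.1f} per cent")
```

Each error rate is expressed relative to its own observed class (504 ‘no overdue’ cases and 136 ‘overdue’ cases), not relative to the full sample of 640.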
From Table 2, the following insights could be drawn. Length of stay,
total income, loan amount required and expenses are overwhelmingly
significant predictors of loan default at the 5 per cent level (see the
p values under the column Sig. in Table 2).
1. Type of house and age are highly significant at the 5 per cent level,
indicating that they are good predictors of loan default.
2. Caste as a factor is an overwhelmingly significant predictor of
default (the p value is very small) at the 5 per cent level.
3. Total family (number of members) is moderately significant
(significant at 6.4 per cent) as a predictor of default.
4. The Exp(B) column in Table 2 has an interesting interpretation. These
are odds ratios: whenever the number is more than 1, the odds of default
increase for every additional unit of the independent variable concerned,
and whenever it is less than 1, the odds decrease. By this criterion, we
find that length of stay, total income, total family, expenses, type of
house and caste are critical in assessing default behaviour.
5. Loan amount required has an odds ratio (0.999) very close to 1 and
hence can be taken to be a critical predictor of risk.
Table 2. Logistic Regression—Relative Importance of Variables
Logistic Regression: Variables in the Equation

Variable              B         S.E.        Wald     df   Sig.    Exp(B)
lengthofstay          .004      .001        12.565   1    .000    1.004
total_income          .001      .000        48.877   1    .000    1.001
total_family          .295      .160        3.418    1    .064    1.343
Age                   –.039     .017        5.015    1    .025    .962
LoanAmtReq            –.001     .000        66.632   1    .000    .999
Expenses              .001      .000        12.683   1    .000    1.001
Typeofhouse                                 16.470   3    .001
typeofhousecode(1)    –1.033    .373        7.673    1    .006    .356
typeofhousecode(2)    –1.774    .566        9.838    1    .002    .170
typeofhousecode(3)    .081      .508        .026     1    .873    1.084
toiletcode                                  11.808   2    .003
toiletcode(1)         –.513     .354        2.101    1    .147    .599
toiletcode(2)         1.398     .602        5.391    1    .020    4.046
religioncode                                3.148    5    .677
religioncode(1)       –3.152    49202.786   .000     1    1.000   .043
religioncode(2)       14.124    40190.860   .000     1    1.000   1360739.815
religioncode(3)       15.350    40190.860   .000     1    1.000   4637565.815
religioncode(4)       14.678    40190.860   .000     1    1.000   2369502.829
religioncode(5)       –1.061    56839.952   .000     1    1.000   .346
castecode                                   28.194   4    .000
castecode(1)          1.733     .360        23.158   1    .000    5.660
castecode(2)          1.306     .367        12.693   1    .000    3.691
castecode(3)          –19.462   16104.113   .000     1    .999    .000
castecode(4)          1.495     .779        3.680    1    .055    4.459
Constant              –12.705   40190.860   .000     1    1.000   .000
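The Exp(B) column of Table 2 is the exponential of the B column, that is, the factor by which the odds of default are multiplied when the variable rises by one unit. The Python sketch below recomputes it for a few rows and then shows what such a factor does to a probability. The helper names and the 10 per cent baseline borrower are hypothetical and serve only as an illustration.

```python
import math

# B coefficients copied from a subset of rows in Table 2.
coefficients = {
    "total_family": 0.295,
    "Age": -0.039,
    "castecode(1)": 1.733,
    "castecode(2)": 1.306,
}

def odds_ratio(b):
    """Exp(B): multiplicative change in the odds per one-unit increase."""
    return math.exp(b)

def updated_probability(p, b, units=1.0):
    """Default probability after raising a predictor by `units`, starting from p."""
    odds = p / (1.0 - p) * math.exp(b * units)
    return odds / (1.0 + odds)

for name, b in coefficients.items():
    print(f"{name}: Exp(B) = {odds_ratio(b):.3f}")

# Hypothetical borrower with a 10 per cent baseline default probability
# whose family grows by one member.
p_after = updated_probability(0.10, coefficients["total_family"])
print(f"probability after one more family member: {p_after:.3f}")
```

The recomputed values agree with Table 2 up to the rounding of B. For the hypothetical borrower, an odds ratio of about 1.34 moves the default probability from 10 per cent to roughly 13 per cent.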
6.2. Neural Network Model
1. In Table 3, we present the results obtained using the NN methodology.
In terms of predictive power, NN outperforms logistic regression. The
predictive accuracy is in the vicinity of 93 per cent for the training
sample and 94 per cent for the testing sample.
2. It is significant to note that the type I error for both the training
data (3.3 per cent) and the testing data (4.2 per cent) is smaller than
that of logistic regression. The type II error is substantially lower
than that of logistic regression for both the training data (21.9 per
cent) and the testing data (15 per cent). NN also strikes a better
balance between type I and type II errors than logistic regression.
3. From Figure 1, which gives the relative importance of the independent
variables, we deduce that the variables rank in the following order of
importance in predicting default risk: total income, expenses, length of
stay, age, type of house, loan amount, caste, total family, toilet type
and religion.
4. As discussed earlier, because the shortcomings of NN with regard to
explanatory variables and the precise form of the equation cannot be
fully redressed, we confine ourselves to the logistic regression model to
predict loan default.
5. It may be seen that NN, used as a corroborative tool, confirms the
significant predictors identified by logistic regression.
Unifying both logistic regression and NN results, we confirm that the
significant predictors of credit risk for the case under consideration
are length of stay, total income, loan amount required, expenses, age,
type of house, total family, caste and toilet type.

Table 3. Neural Network Model

                                Predicted
Sample      Observed            No      Yes     Per cent correct
Training    No                  325     11      96.7
            Yes                 21      75      78.1
            Overall per cent                    92.6
Testing     No                  161     7       95.8
            Yes                 6       34      85.0
            Overall per cent                    93.8

Note: Dependent Variable: overdue.

Figure 1. Relative Importance of Variables
7. Conclusion
In this research article, we have successfully demonstrated the
capability of credit scoring models for an Indian-based microfinance firm
in terms of predicting default probability. Further, we have been able to
sift the relative importance of each of its associated drivers. The
strengths and limitations of logistic regression and NN have been
discussed in the context of the predictive power of credit risk
modelling. In terms of predictive power, NN outperforms logistic
regression. The predictive accuracy of NN is in the vicinity of 93 per
cent for the training sample and 94 per cent for the testing sample and
is higher than that of logistic regression (88.9 per cent). However,
because the shortcomings of NN with regard to explanatory variables and
the precise form of the equation cannot be fully redressed, we confine
ourselves to the logistic regression model to predict loan default. We
have combined the advantages of both these techniques to confirm the
significant independent variables that impact the behaviour of credit
default. We conclude that for prediction, logistic regression is
preferred over NN because it offers statistical rigour in interpreting
the significance of independent variables. The results of logistic
regression have been corroborated by NN. The key drivers of credit
default, unifying both logistic regression and NN, are length of stay,
total income, loan amount required, expenses, age, type of house, total
family, caste and toilet type.
Even though an analytic model is not necessarily a complete substitute
for the existing judgement-based practices of credit scoring prevalent in
the microfinance sector, it can be used alongside them as an important
decision support mechanism so that credit risk can be kept to a minimum.
References
Andrade, F. W. M. (2004). Development of risk model of portfolio of credit
portfolios of individuals. Doctorate thesis in Administration of Companies,
Escola de Administracao de Empresas of São Paulo, Fundacao Getúlio
Vargas.
Carmona, C. U. D. M., & Araújo, E. A. (2011). Application of credit
scoring models in the analysis of insolvency of a Brazilian microcredit
institution. Journal of Modern Accounting and Auditing, 7(8), 799–812.
DeLurgio, S. A., & Hays, F. (2001). Understanding the financial interests in
neural networks. Credit and Financial Management Review, 7(3), 27–53.
Gangopadhyay, S., & Shanthi, S. K. (2012). Governance issues in Indian
microfinance. In J. R. Barth, C. Lin & C. Wihlborg (Eds), Research
handbook on international banking and governance (pp. 696–706).
Cheltenham, UK: Edward Elgar Publishing.
Ghatge, A. R., & Halkarnikar, P. P. (2013). Ensemble neural network
strategy for predicting credit default evaluation. International Journal
of Engineering and Innovative Technology (IJEIT), 2(7), 223–225.
Jain, B. A., & Nag, B. N. (1995). Artificial neural network models for
pricing initial public offerings. Decision Sciences, 26(3), 283–302.
Lewis, E. (1992). An introduction to credit scoring. San Rafael, CA: The Athena
Press.
Maves, G. (1991). Perfecting prediction. Marketing. Retrieved from http://www.
accessmylibrary.com
Saunders, A. (1999). Credit risk measurement: New approaches to value at risk
and other paradigms. New York, NY: John Wiley & Sons.
Schreiner, M. (2000). A scoring model of the risk of costly arrears for loans from
affiliates of Women’s World Banking in Colombia. Women’s World Banking.
Retrieved 11 July 2013, from http://www.microfinance.com
Schreiner, M. (2004). Benefits and pitfalls of statistical credit scoring
for microfinance. Retrieved from http://www.microfinance.com
Thomas, L. C. (2000). A survey of credit and behavioral scoring:
Forecasting financial risk of lending to consumers. International Journal
of Forecasting, 16(2), 149–172.
Vigano, L. (1993). A credit scoring model for development banks: An African
case study. Savings and Development, 17(4), 441–482.
