CARD APPLICANTS
Irma Rohaiza Ibrahim¹ and Yap Bee Wah²
¹,²Faculty of Computer and Mathematical Sciences,
Universiti Teknologi MARA, 40450 Shah Alam, Selangor, Malaysia
Abstract
Credit scoring is very important for banks and financial institutions, as they need to
separate good credit risks from bad credit risks in terms of creditworthiness. With the advancement of
computer technology and statistical software such as SAS Enterprise Miner, banks can use credit scoring
to classify credit card applicants. Credit scoring involves building predictive models. The objective of this
study is to build a predictive model to classify credit card applications as accepted or rejected. The sample
size of this study is 4305. The data were first partitioned into training (70%) and validation (30%) samples.
Three credit scoring models were compared: Logistic Regression, Decision Tree and Neural
Network. The performance of the three credit scoring models was compared on the basis of
misclassification rate. Results show that the Decision Tree using the chi-square splitting criterion has the lowest
misclassification rate (LR = 28.07%, NN = 24.28%, CART = 20.18%). Results also show that female and
older applicants are more likely to be accepted. Applicants with more years of employment, shorter loan
duration and those who own a house or property are more likely to be accepted.
Keywords: credit scoring, logistic regression, decision tree, classification, predictive modeling
1. Introduction
In Malaysia, credit cards and charge cards were introduced in the mid-1970s. In the early days, holdings
of such payment cards were reserved for the rich. Prior to the introduction of payment cards, the cheque was
the dominant non-cash payment instrument. However, it was accessible only to a small number of
consumers and was used predominantly for large-value transactions. By the early 1990s, credit cards were more
accessible to the general public, but strict income requirements were imposed on credit cardholders.
In 1998, the entry of non-bank credit card issuers further intensified competition in an already
competitive credit card market in Malaysia.
Nowadays, millions of people rely on credit cards for purchasing goods and services. They enjoy the
advantage of 30 days' credit, an organized bill at the end of the month and, with the more prestigious “gold cards”
and “platinum cards”, additional services such as insurance plans and free air miles proportionate to their
purchases via credit cards. Due to the intense competition among banks issuing credit, more and more
people can easily obtain a credit card without the bank carefully examining their creditworthiness.
Presently, banks do have some screening procedures, such as using credit scorecards, to make decisions.
The credit vendor who evaluates new applications must use a set of profiles of past applicants
as a yardstick against which to evaluate new applicants, who may turn out to be either
good or bad credit risks. Credit scoring development is based on historical information from the
databank on existing clients, in order to assess whether a prospective client is more likely
to be a good or bad payer. The main idea in credit risk modeling is to build classification rules
that predict whether bank customers are good or bad credit risks. With data mining software,
a predictive model can easily be deployed by a bank to analyze a large number of applications quickly and
efficiently. Moreover, a good credit risk scoring model enables management to make better and more
accurate decisions when processing credit card applications. Hence, the objective of this paper is to
compare the predictive ability of three predictive models, Logistic Regression (LR), Decision Tree and
Neural Network (NN), in the classification of credit card applicants.
This paper is organized as follows. In Section 2, we briefly review the applications of predictive models
and the selection of variables. Section 3 presents the methodology for constructing the predictive models.
The results are discussed in Section 4. Finally, some concluding remarks are given in Section 5.
2. Predictive Models
Credit scoring was first introduced in the 1940s and has evolved and developed significantly
over the years. In the 1960s, with the creation of credit cards, banks and other credit card issuers realized
the advantages of credit scoring in the credit granting process. In the 1980s, banks started to use
credit scoring for other purposes such as personal loan applications. In recent years, credit scoring has
been used for home loans, small business loans and insurance applications and renewals (Thomas,
2000; Koh, 2004). Credit scoring is based on statistical or operational research methods. Historically,
discriminant analysis and linear regression have been the most widely used techniques for building
scorecards. Other techniques include logistic regression, probit analysis, nonparametric smoothing methods
(especially k-nearest neighbours), mathematical programming, Markov chain models, recursive partitioning,
expert systems, genetic algorithms and neural networks (Hand and Henley, 1997). Multivariate adaptive
regression splines (MARS), classification and regression trees (CART), case-based reasoning (CBR) and
support vector machines (SVM) are some recently developed techniques for building credit scoring
models (Huang et al., 2004; Lee et al., 2006; Huang et al., 2007).
Recently, with the development of data mining software, the process of building a credit scoring
model has been made much easier for credit analysts. However, the popular techniques in banking and
business enterprises remain credit scorecards, logistic regression and decision trees, as it is relatively easy
to identify the important input variables, interpret the results and deploy the model.
In building a scoring model, historical data on the performance of previously made loans and on borrower
characteristics are required. Vojtek and Kočenda (2006) provided a table of indicators that are typically
important in retail credit scoring models. They classify the indicators as demographic, financial,
employment and behavioral indicators. Mavri et al. (2008) used variables such as gender, age,
education, marital status and monthly income to estimate the risk level of credit card applicants.
This study used only the demographic, behavioral and employment indicators. The variables
and their categories for this study are shown in Table 1.
3. Methodology
Variable Name       Role    Variable Type  Description
APPLICATION STATUS  Target  Binary         Credit card application (0: Rejected, 1: Accepted)
HOUSING             Input   Binary         Applicant (0: Rent, 1: Own)
JOB                 Input   Ordinal        Nature of job (0: unemployed, 1: unskilled, 2: skilled employee / official, 3: management / self-employed / officer)
GENDER              Input   Binary         Applicant is (0: Male, 1: Female)
MARITAL_STATUS      Input   Nominal        Applicant is (0: Single, 1: Married, 2: divorced/widowed)
$$\log\left(\frac{P(Y = 1)}{1 - P(Y = 1)}\right) = \alpha + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n \qquad (1)$$
where P(Y=1) is the probability of the outcome of interest.
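Equation (1) can be inverted to recover the probability of acceptance from the linear predictor. The following is a minimal sketch of that calculation. The function names `acceptance_probability` and `log_odds`, the intercept and the coefficient values are all hypothetical, chosen only for illustration; they are not the estimates fitted in this study.

```python
import math


def acceptance_probability(alpha, betas, xs):
    """Invert Equation (1): P(Y=1) = 1 / (1 + exp(-(alpha + sum(beta_i * x_i))))."""
    if len(betas) != len(xs):
        raise ValueError("betas and xs must have the same length")
    linear_predictor = alpha + sum(b * x for b, x in zip(betas, xs))
    return 1.0 / (1.0 + math.exp(-linear_predictor))


def log_odds(p):
    """Left-hand side of Equation (1): the log of the odds P / (1 - P)."""
    return math.log(p / (1.0 - p))


# Hypothetical intercept and coefficients (NOT the study's estimates).
alpha = -1.0
betas = [0.8, 0.5]   # illustrative weights for HOUSING and GENDER
applicant = [1, 1]   # owns a home (HOUSING = 1), female (GENDER = 1)

p = acceptance_probability(alpha, betas, applicant)
print(f"P(accepted) = {p:.4f}")
```

Applying `log_odds` to the returned probability gives back the linear predictor, which is a quick check that the two forms of the model agree.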
The multilayer perceptron (MLP) is the most widely used neural network model in data analysis. An
illustration of the MLP is given in Figure 2. It is a feed-forward network composed of an input layer
(units corresponding to the input variables), hidden layers (neurons that each output a nonlinear
function of a linear combination of their inputs) and an output layer (neurons corresponding to the
target). If the target variable has multiple classes (>2), there are multiple neurons in the output layer (SAS
Institute Inc., 2005).
Figure 2: An Illustration of the Multilayer Perceptron NN model
(Source: SAS Institute Inc., 2005)
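The layer structure described above can be sketched as a single forward pass. This is a minimal illustration under stated assumptions, not the network fitted in this study. It assumes one hidden layer, a tanh activation in the hidden neurons, a logistic output neuron for the binary target, and hypothetical weights. The function name `mlp_forward` is also introduced here for illustration only.

```python
import math


def mlp_forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """One forward pass through a single-hidden-layer MLP with one output neuron."""
    # Hidden layer: each neuron applies a nonlinear function (tanh, an assumption)
    # to a linear combination of the inputs.
    hidden = [
        math.tanh(bias + sum(w * x for w, x in zip(weights, inputs)))
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    # Output layer: one neuron for a binary target, mapped to (0, 1)
    # with a logistic function (also an assumption).
    z = output_bias + sum(w * h for w, h in zip(output_weights, hidden))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights for 3 inputs and 2 hidden neurons (not fitted values).
hidden_weights = [[0.4, -0.2, 0.1], [-0.3, 0.5, 0.2]]
hidden_biases = [0.0, 0.1]
output_weights = [0.7, -0.6]
output_bias = 0.05

score = mlp_forward([1.0, 0.0, 2.0], hidden_weights, hidden_biases,
                    output_weights, output_bias)
print(f"predicted P(accepted) = {score:.4f}")
```

A target with more than two classes would need one output neuron per class, with the single logistic output replaced by a normalised set of class scores.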
4. Results
This section presents and discusses the results obtained.
Demographic Indicators
Variable  Description    Category                              Frequency  Percent
JOB       Nature of job  Unemployed                            88         2
                         Unskilled                             879        20.4
                         Skilled employee / official           2717       63.1
                         Management / self-employed / officer  631        14.5
Financial Indicators
Selected Model   Model Node   Valid: Misclassification Rate   Train: Misclassification Rate
Table 5 presents the STEPWISE model results. Results show that female and older applicants are more
likely to be accepted. Those who do not own homes or property are more likely to be rejected. Applicants
with many existing loans and longer duration loans are also more likely to be rejected.
Fit Statistics
Model selection based on _VMISC_
Table 10 displays the sensitivity, specificity and misclassification rate for each model. Sensitivity
is the true positive rate (the percentage of accepted applicants correctly predicted as accepted), while
specificity is the true negative rate (the percentage of rejected applicants correctly predicted as rejected). The decision tree
is the best predictive model, as it has the lowest misclassification rate and the highest sensitivity.
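The three measures defined above can be computed directly from actual and predicted labels. The sketch below uses a small made-up sample (1 = accepted, 0 = rejected) that is not taken from the study's data. The function name `classification_rates` is hypothetical.

```python
def classification_rates(actual, predicted):
    """Return (sensitivity, specificity, misclassification rate) for 0/1 labels."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equal in length")
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # accepted, predicted accepted
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # rejected, predicted rejected
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # rejected, predicted accepted
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # accepted, predicted rejected
    # A rate is undefined (None) when its class does not occur in the sample.
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    misclassification = (fp + fn) / len(pairs)
    return sensitivity, specificity, misclassification


# Made-up labels for eight applicants (illustration only).
actual = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 1, 0, 0, 1, 0, 1, 0]

sensitivity, specificity, misclassification = classification_rates(actual, predicted)
print(sensitivity, specificity, misclassification)
```

In this toy sample, one accepted applicant and one rejected applicant are misclassified out of eight, so the misclassification rate is 2/8 and both sensitivity and specificity are 3/4.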
5. Conclusion
This study focused on the construction and evaluation of three predictive models, namely logistic
regression, decision tree and neural network, to classify credit card applicants. Results revealed
that the decision tree has the lowest misclassification rate. The performance of predictive models
depends on the data structure, data quality and the objective of the classification. In practical applications,
classification methods such as decision trees and logistic regression, which are relatively easy to
understand and deploy, are more appealing to users. With the availability of data mining software, more
banks are finding data mining techniques useful in gaining a competitive advantage.
References
Berry, M.J.A. and G.S. Linoff (2004). Mastering Data Mining: The Art and Science of Customer
Relationship Management, New York: John Wiley & Sons, Inc.
Davis, R.E., K. Elder, D. Howlett and E. Bouzaglou (1999). Relating storm and weather factors to dry
slab activity at Alta, Utah and Mammoth Mountain, California, using classification and regression trees.
Cold Regions Science and Technology, Vol. 30, pp. 79.
Fu, L. (2004). Efficient evaluation of sparse data cubes. In Advances in Web-Age Information
Management: Fifth International Conference (WAIM’04), Dalian, China, July 15-17, (pp. 336-345).
Hand, D. J. and Henley, W. E. (1997). Statistical Classification Methods in Consumer Credit Scoring: a
Review. Journal of the Royal Statistical Society: Series A, Vol. 160(3), pp. 523-541.
Hian, C. K. and Chan, K.L. (2004). Going concern prediction using data mining techniques, Managerial
Auditing Journal, Vol 19, No 3, 462-476.
Huang, Z., Chen, H., Hsu, C. J, Chen W.H., Wu, S., (2004). Credit rating analysis with support vector
machines and neural networks: A market comparative study. Decision Support Systems, Vol.37, pp.
543– 558.
Huang, C.L, Chen, M. C and Wang, C. J. (2007). Credit scoring with a data mining approach based on
support vector machines. Expert Systems with Applications, Vol. 33(4), pp. 847–856.
Koh, H.C., Tan, W.C. and Goh, C.P. (2004). Credit scoring using data mining techniques. Singapore
Management Review. Vol. 26, No. 2, pp. 25-47.
Kurt, I., Ture, M., & Kurum, A. T. (2008). Comparing performances of logistic regression, classification
and regression tree, and neural networks for predicting coronary artery disease. Expert Systems with
Applications, Vol. 34, pp. 366–374.
Lee, T. S., Chiu, C. C., Chou, Y. C., & Lu, C. J. (2006). Mining the customer credit using classification and
regression tree and multivariate adaptive regression splines. Computational Statistics & Data Analysis,
Vol. 50, pp. 1113–1130.
Mavri, M., Angelis, V. and Ioannou, G. (2008). A two-stage dynamic credit scoring model based on
customers' profiles and time horizon. Journal of Financial Services Marketing, Vol. 13, No. 1, pp. 17-27.
Olson, D. and Yong, S. (2007). Introduction to Business Data Mining. McGraw Hill International Edition.
Roiger, R. J. and Geatz, M. W. (2003). Data Mining: A Tutorial-Based Primer. Pearson Education, Inc.
SAS Institute Inc. (2005). SAS® Training Course Notes: Applying Data Mining Techniques Using SAS
Enterprise Miner™, SAS Institute Inc., Cary, NC 27513, USA.
Thomas, L. C. (2000). A survey of credit and behavioural scoring: Forecasting financial risk of lending to consumers.
International Journal of Forecasting, Vol. 16, pp. 149-172.
Vojtek, M. and Kočenda, E. (2006). Credit-Scoring Methods (in English), Czech Journal of Economics
and Finance (Finance a uver), Charles University Prague, Faculty of Social Sciences, Vol. 56(3-4), pp.
152-167.