
Statistics III

FPM 2018

Assignment

Due: July 13, 2018

This assignment is to develop a credit score based on data. The data are from the book “Credit
Scoring and Its Applications” by Thomas, Edelman, and Crook. They are available in the attached
files public.xls and publicdict.xls. The first file has the data and the second file describes the variables.

Data from Excel can be read into R in a variety of ways. If each sheet is first saved as a CSV file,
>read.table or >read.csv can be used. The simplest may be >read.csv(file.choose()).
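As a minimal sketch of this step, assuming the data sheet has been exported to CSV: the example below writes a small toy file to a temporary location and reads it back. Only the variable Bad is named in this assignment; the columns Income and Age, the 0/1 coding of Bad, and the object names toy, csv_path and credit are hypothetical stand-ins for illustration.

```r
# Toy stand-in for a CSV export of public.xls. Only `Bad` is named in the
# assignment; `Income`, `Age` and the 0/1 coding are assumptions.
toy <- data.frame(Bad    = c(0, 1, 0, 0, 1),
                  Income = c(32000, 15000, 41000, 27000, 9000),
                  Age    = c(45, 23, 38, 51, 29))
csv_path <- tempfile(fileext = ".csv")
write.csv(toy, csv_path, row.names = FALSE)

# Read the CSV back. For the real data, replace csv_path with the path to
# the exported file, or with file.choose() in an interactive session.
credit <- read.csv(csv_path)

str(credit)        # check the column types
table(credit$Bad)  # count the cases in each class
```

Checking the structure and the class counts right after reading is a cheap way to catch import problems such as numeric columns read as text.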

The variable ‘Bad’ is a binary variable indicating bad credit. The task of credit scoring is to develop a
score by which we can predict whether an individual will end up with bad credit. You are welcome to
read up on the literature on credit scoring for additional information.

For this assignment, treat Bad as a variable for classification. Use the following methods to classify:

1. Logistic Regression (>glm, or >lrm in library(rms))
2. Classification Tree (>rpart in library(rpart))
3. Discriminant Analysis (>lda and >qda in library(MASS))
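The three methods listed above can be sketched as follows. This is only an illustration on simulated data, not the assignment data: the predictors x1 and x2, the 0/1 coding of Bad, and all object names are assumptions. Each fit is converted to a probability of Bad so that the outputs are on a common scale.

```r
library(rpart)  # classification trees
library(MASS)   # lda and qda

# Simulated stand-in data; x1 and x2 are hypothetical predictors.
set.seed(1)
n  <- 400
x1 <- rnorm(n)
x2 <- rnorm(n)
Bad <- rbinom(n, 1, plogis(-1 + 1.2 * x1 - 0.8 * x2))
credit <- data.frame(Bad = Bad, x1 = x1, x2 = x2)

# 1. Logistic regression: fitted probabilities of Bad = 1
fit_glm <- glm(Bad ~ x1 + x2, data = credit, family = binomial)
p_glm   <- predict(fit_glm, newdata = credit, type = "response")

# 2. Classification tree: the response is made a factor for method = "class"
fit_tree <- rpart(factor(Bad) ~ x1 + x2, data = credit, method = "class")
p_tree   <- predict(fit_tree, newdata = credit, type = "prob")[, "1"]

# 3. Linear and quadratic discriminant analysis: posterior probabilities
fit_lda <- lda(Bad ~ x1 + x2, data = credit)
p_lda   <- predict(fit_lda, newdata = credit)$posterior[, "1"]
fit_qda <- qda(Bad ~ x1 + x2, data = credit)
p_qda   <- predict(fit_qda, newdata = credit)$posterior[, "1"]

# All four scores are now on the same Prob(Bad) scale
head(round(cbind(p_glm, p_tree, p_lda, p_qda), 3))
```

Note that these probabilities are computed on the same data used for fitting, so they will tend to look better than they would on new cases.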

Each method gives its output differently, so you will need a way to compare and contrast them. One
way might be to convert the output to Prob(Bad). If binary predictions are made, misclassification
rates may also be reported, along with sensitivity, specificity, and associated measures like ROC
and AUC.

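As a rough sketch of these measures in base R, the example below uses a small made-up set of outcomes and predicted probabilities. The vectors bad and prob, the 0/1 coding, and the cutoff of 0.5 are all assumptions for illustration. The AUC is computed with the rank (Mann-Whitney) formula rather than by tracing the ROC curve.

```r
# Hypothetical observed outcomes (1 = bad) and predicted Prob(Bad)
bad  <- c(0, 0, 1, 1, 0, 1, 0, 0)
prob <- c(0.10, 0.40, 0.35, 0.80, 0.20, 0.70, 0.60, 0.05)

# Classify as bad when Prob(Bad) exceeds a cutoff (0.5 is an arbitrary choice)
pred <- as.integer(prob > 0.5)

tp <- sum(pred == 1 & bad == 1)  # bad cases predicted bad
tn <- sum(pred == 0 & bad == 0)  # good cases predicted good
fp <- sum(pred == 1 & bad == 0)  # good cases predicted bad
fn <- sum(pred == 0 & bad == 1)  # bad cases predicted good

sensitivity <- tp / (tp + fn)    # share of bad cases that are caught
specificity <- tn / (tn + fp)    # share of good cases that are cleared
misclass    <- (fp + fn) / length(bad)

# AUC via the rank (Mann-Whitney) formula: the probability that a randomly
# chosen bad case gets a higher score than a randomly chosen good case
r   <- rank(prob)
n1  <- sum(bad == 1)
n0  <- sum(bad == 0)
auc <- (sum(r[bad == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)

c(sensitivity = sensitivity, specificity = specificity,
  misclassification = misclass, auc = auc)
```

Sensitivity, specificity and the misclassification rate depend on the chosen cutoff, whereas the AUC summarises performance over all cutoffs.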
You will also need a way to judge how good each method is. One way might be to do cross-validation.
In its simplest form, this can be implemented by holding out a fraction of the data as a validation
set, building a score on the remainder, and then assessing performance on the validation set. The
>predict() function in R may be useful for this.
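A hold-out split of this kind might look like the sketch below. It uses simulated data and logistic regression purely as an example. The 30% validation fraction, the 0.5 cutoff, the predictors x1 and x2, and the object names are assumptions, not part of the assignment.

```r
# Simulated stand-in data; x1 and x2 are hypothetical predictors.
set.seed(2018)
n  <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
credit <- data.frame(Bad = rbinom(n, 1, plogis(-1 + x1 - 0.5 * x2)),
                     x1 = x1, x2 = x2)

# Hold out 30% of the rows as a validation set (the fraction is arbitrary)
val_idx <- sample(n, size = round(0.3 * n))
train   <- credit[-val_idx, ]
valid   <- credit[val_idx, ]

# Build the score on the training rows only
fit <- glm(Bad ~ x1 + x2, data = train, family = binomial)

# Score the held-out rows with predict()
p_valid <- predict(fit, newdata = valid, type = "response")

# Performance on the validation set at a 0.5 cutoff
pred_valid     <- as.integer(p_valid > 0.5)
conf_mat       <- table(observed = valid$Bad, predicted = pred_valid)
misclass_valid <- mean(pred_valid != valid$Bad)

conf_mat
misclass_valid
```

The same split can be reused for every method so that all of them are compared on identical validation cases. Repeating the split with different seeds gives a sense of how much the results vary.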

The end objective of the exercise is to suggest one credit scoring procedure and to justify its use. This
may be a single method, or it may be a combination of methods. How the procedure may be used or
implemented in practice can also be considered.

You are also free to try other methods like probit models, neural networks, random forests and so on
if you so wish, or compare with other software.
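Of the optional methods mentioned, a probit model needs no extra packages, since >glm accepts a probit link. The sketch below fits probit and logit models to simulated data and compares their fitted probabilities. The predictor x1, the 0/1 coding of Bad, and the object names are hypothetical.

```r
# Simulated stand-in data; x1 is a hypothetical predictor.
set.seed(3)
n  <- 300
x1 <- rnorm(n)
credit <- data.frame(Bad = rbinom(n, 1, pnorm(-0.5 + x1)), x1 = x1)

# Probit and logit fits differ only in the link function
fit_probit <- glm(Bad ~ x1, data = credit,
                  family = binomial(link = "probit"))
fit_logit  <- glm(Bad ~ x1, data = credit, family = binomial)

# Fitted Prob(Bad) from each model on the training data
p_probit <- predict(fit_probit, type = "response")
p_logit  <- predict(fit_logit, type = "response")

# Compare the two sets of fitted probabilities
summary(p_probit - p_logit)
cor(p_probit, p_logit)
```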

Please submit your analysis in the form of a report. Any R or other code can be included in
appendices. In the report, also mention what difficulties and data peculiarities you ran into, if any.
