
Title: FICO Score, Loan Duration and Amount, and Borrower's Income are Correlated with Lending Club Interest Rates

Introduction

Lending Club is a peer lending network that enables members to provide loans to other members at specified interest rates [1]. In the Lending Club model, borrowers apply for and obtain loans from investors, who are then paid back with interest. By cutting out the overhead of the traditional banking system, the Lending Club claims to reduce costs for borrowers and increase investment returns for lenders. By choosing loans of different interest rates and durations, investors can assemble a personalized portfolio of loans. To obtain a loan, a potential borrower completes an application with personal financial information. The Lending Club uses the provided information to calculate a personalized interest rate at which investors can fund that loan. Interest rates vary widely and depend primarily on the FICO score, but also on other variables. Using standard multiple regression analysis, we investigated the correlation between financial determinants and loan interest rate. Our analysis shows that, in addition to FICO score, the differences in interest rates can be predicted by loan duration, loan amount and the borrower's income. The purpose of the loan, years of employment, home ownership status and state of residence have minimal predictive value. Our results suggest that individual financial characteristics can be used to determine loan interest rates for Lending Club borrowers.

Methods

Data Collection

Data in CSV format and the codebook for the variables were downloaded from the Data Analysis course website [2, 3].

Exploratory Analysis

All statistical analysis was performed using the R statistical package, v2.15.2 [4]. Exploratory analysis was performed by examining data tables, distributions (using histograms and plots) and correlations (using scatter plots). Exploratory analysis identified (1) missing values, (2) data classes of variables and (3) potential correlations among variables. Transformations of the data required for ease of regression analysis were performed.

Statistical Modeling

To estimate the association between the measured variables and the interest rate, we built a standard multiple linear regression model [5]. In addition to FICO score, the selection of other variables to include in the model was based on exploratory analysis and on correlations between the residuals from the simple regression model and the other variables. Coefficients were estimated using ordinary least squares.

Reproducibility

The analysis reported here can be reproduced exactly using the R markdown file lendingclubloans.rmd and the data available on the course website [6].
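The ordinary least squares estimation described under Statistical Modeling can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the R code from lendingclubloans.rmd. The function names, the FICO band encoding (number of 5-point bands above 640), the synthetic loans and the "true" coefficients are all assumptions made for this example. The outcome is generated without noise, so the fit only checks that the solver recovers known coefficients; it says nothing about the Lending Club data.

```python
import math
import random


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is (numerically) rank deficient")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def design_row(fico_band, requested, funded, monthly_income, duration_months):
    """One design-matrix row: intercept, FICO band, three log10 terms, 60-month dummy."""
    return [
        1.0,
        float(fico_band),
        math.log10(requested),
        math.log10(funded),
        math.log10(monthly_income),
        1.0 if duration_months == 60 else 0.0,
    ]


# Synthetic loans (made up for illustration only).
rng = random.Random(1)
rows = []
for _ in range(300):
    requested = rng.uniform(1000, 35000)
    funded = requested * rng.uniform(0.7, 1.0)  # funded amount is at most the request
    rows.append(design_row(rng.randrange(0, 38), requested, funded,
                           rng.uniform(1500, 15000), rng.choice([36, 60])))

# Arbitrary coefficients chosen for the check, not estimates from the report.
true_coefs = [20.0, -0.5, 1.0, 0.5, -0.5, 3.0]
rates = [sum(b * x for b, x in zip(true_coefs, row)) for row in rows]

estimated = fit_ols(rows, rates)
```

The 60-month indicator plays the role of a two-level factor for loan duration, with 36 months as the reference level. A production analysis would normally rely on a tested routine (for example `lm` in R) rather than hand-written normal equations, which can lose precision when predictors are strongly collinear.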

Analysis

Data for 2,500 loans facilitated by the Lending Club were used for this analysis. In addition to the interest rate, the data include values for loan amount requested (LAR, dollars), loan amount funded (LAF, dollars), loan duration (LD, 36 months or 60 months), purpose of the loan, debt-to-income ratio (%), state, home ownership status, monthly income (MI, dollars), FICO score range (in bands of 5 points), number of open credit lines, revolving credit balance (dollars), number of credit inquiries in the last 6 months and length of employment (years).

Exploratory analysis found that two observations lacked values for monthly income, open credit lines, revolving credit balance and/or inquiries in the previous six months [5]. These two observations were removed from further analysis. The small number of observations with missing values is unlikely to influence the study's conclusions.

Interest rates ranged from 5.42% to 24.89%, with a median of 13.11% and a mean of 13.11%. The distribution of interest rates mostly followed a normal distribution pattern, with a maximum around 11% and a second peak around 5% (Figure a).

We first correlated the interest rates with the FICO score range using a simple linear regression model (model 1). The model for predicted interest rate (PIR) had the form:

PIR = b0 + b1*FICO + e

where b0 is an intercept term, b1 is the regression coefficient indicating the change in interest rate with a one-band change in FICO score, and e represents all un-modeled and unmeasured sources of variation. The model explained about 54% of the variance in the interest rates. The effect of FICO score was not uniform across bands: differences between bands in the range from 645 to 709 were not significant, whereas differences between bands above 709 were significant. Predicted interest rates showed a correlation coefficient of 0.74 (95% C.I. = 0.72, 0.75) with actual interest rates (Figure b).
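The link between the two summary numbers above can be checked directly. For a least-squares line with an intercept, the fraction of variance explained equals the squared correlation between actual and predicted values, and 0.74^2 is about 0.55, in line with the roughly 54% reported. The sketch below illustrates this in Python on synthetic data. It assumes FICO is entered as a single numeric predictor (the lower bound of each band), and the function names, sample values and noise level are made up for illustration rather than taken from the loan data.

```python
import math
import random


def fit_simple_ols(x, y):
    """Return (b0, b1) minimizing the squared residuals of y = b0 + b1*x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


def r_squared(actual, predicted):
    """Fraction of variance explained: 1 - SS_residual / SS_total."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Synthetic sample (illustrative only): FICO band lower bounds and noisy rates.
rng = random.Random(0)
fico = [rng.randrange(640, 830, 5) for _ in range(500)]
rate = [70.0 - 0.08 * f + rng.gauss(0, 2.5) for f in fico]

b0, b1 = fit_simple_ols(fico, rate)
predicted = [b0 + b1 * f for f in fico]

r = pearson_r(rate, predicted)
r2 = r_squared(rate, predicted)
print(f"slope={b1:.4f}  r={r:.3f}  r^2={r * r:.3f}  R^2={r2:.3f}")
```

The printed `r^2` and `R^2` agree up to floating-point error, which is the identity used to compare the reported correlation with the reported variance explained.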

Analysis of residuals from model 1 showed significant correlation with loan amount requested, loan amount funded and the borrower's monthly income. Values of these three variables were highly right-skewed. To facilitate accurate regression modeling, we applied a log base 10 transformation, and the transformed values were used for the multiple regression modeling. Model 1 residuals also showed a significant dependence on loan duration, with 60-month loans carrying a higher interest rate than 36-month loans (Figure c). These variables, which correlated with the residuals, were included in the final regression model.

The final regression model (model 2) was:

PIR = b0 + b1*FICO + b2*log10(LAR) + b3*log10(LAF) + b4*log10(MI) + f(LD) + e

where b0 is an intercept term, b1 is the regression coefficient indicating the change in interest rate with a one-band change in FICO score, b2, b3 and b4 represent the change in interest rate with a 10-fold change in loan amount requested, loan amount funded and monthly income, respectively, and f(LD) is a factor term with two levels for loan duration. The term e represents error from all sources, either un-modeled or unmeasured.

The predicted interest rates correlated closely with actual interest rates (correlation coefficient = 0.89) (Figure d). The residuals from model 2 did not appear to show any significant non-random association with any of the variables. In addition to the significant association with FICO score, we observed a strongly statistically significant association between the interest rate and loan amount requested (regression coefficient = 1.16; p-value < 0.001; 95% C.I. = 0.94, 1.37). Loan duration also showed strong statistical significance, with 60-month loans carrying higher interest rates (regression coefficient = 3.54; p-value < 0.001; 95% C.I. = 3.33, 3.74). Loan amount funded and the borrower's income showed weaker, but still statistically significant, associations, with regression coefficients of 0.22 (p-value = 0.009; 95% C.I. = 0.05, 0.38) and -0.18 (p-value = 0.04; 95% C.I. = -0.35, -0.06), respectively.

Therefore, for borrowers in a similar FICO score range, the interest rate can be predicted by modeling the loan amount requested, the loan duration and the borrower's income. A loan with a 10-fold higher requested amount would be expected to carry, on average, an interest rate 1.16 percentage points higher. A loan with a 60-month duration would be expected to carry, on average, an interest rate 3.54 percentage points higher than a 36-month loan.

Conclusions

Our analysis suggests that there is a statistically significant correlation between FICO score and the interest rate charged in the studied loan sample. The correlation is negative: an increase in FICO score range is associated with a lower interest rate. However, FICO score bands alone account for only slightly more than half of the interest rate variance. Other factors, such as loan duration, loan amount requested and funded, and the borrower's monthly income, contribute to the interest rate calculation. A multiple regression model that adds these four variables explains almost 80% of the variance in the loan sample. Predicted interest rates show a correlation coefficient of 0.89 with actual interest rates.
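As a worked illustration of how the log10 coefficients reported in the Analysis translate into rate differences: a coefficient b on log10(x) implies a change of b*log10(k) percentage points when x is multiplied by k, with all other terms held fixed. The Python sketch below applies this to the reported coefficient for loan amount requested. The function name is hypothetical, and holding loan amount funded fixed is a simplifying assumption, since in practice the funded amount usually moves with the requested amount.

```python
import math


def rate_change_for_fold_change(coef_per_log10, fold):
    """Change in predicted rate (percentage points) when a log10-transformed
    predictor is multiplied by `fold`, all other model terms held fixed."""
    return coef_per_log10 * math.log10(fold)


B_LAR = 1.16  # reported coefficient for log10(loan amount requested)

tenfold = rate_change_for_fold_change(B_LAR, 10)  # 1.16 points
doubling = rate_change_for_fold_change(B_LAR, 2)  # about 0.35 points

print(f"10-fold: {tenfold:.2f}  doubling: {doubling:.2f}")
```

Under these assumptions, doubling the requested amount corresponds to roughly a third of a percentage point, much less than the 3.54-point difference associated with a 60-month term.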

Although our regression modeling shows promising correlation with actual interest rates at the Lending Club, this analysis is based on a small sample of 2,500 loans. Whether this sample is representative of all loans originated at the Lending Club is unknown. Also unclear is the period over which these loans were funded. If the sample consisted of loans issued over a short time period, the model might not be applicable to a broader time period. There are potential confounding factors that were not included in the dataset. Such confounders could include the investment returns available to investors outside the Lending Club, such as Certificate of Deposit (CD) interest rates, treasury bills, etc. However, over a short period of time, when the interest rate environment is steady, a predictive model that determines interest rates from borrowers' financial parameters can be highly useful.

References

1. Lending Club home page. URL: https://www.lendingclub.com/home.action. Accessed 11/13/2013.
2. Coursera, Data Analysis Course. URL: https://sparkpublic.s3.amazonaws.com/dataanalysis/loansData.csv. Accessed 11/13/2013.
3. Coursera, Data Analysis Course. URL: https://sparkpublic.s3.amazonaws.com/dataanalysis/loansCodebook.pdf. Accessed 11/13/2013.
4. R Core Team (2012). R: A language and environment for statistical computing. URL: http://www.R-project.org
5. Seber, George AF, and Alan J. Lee. Linear regression analysis. Vol. 936. Wiley, 2012.
6. R Markdown Page. URL: http://www.rstudio.com/ide/docs/authoring/using_markdown. Accessed 1/31/2013.

Figure: Building a Multiple Regression Model to Predict Loan Interest Rates at the Lending Club. (a) A histogram shows the interest rate distribution in the 2,500-loan sample from the Lending Club. Interest rates are approximately normally distributed, with a second peak at around 7%. (b) A scatter plot compares actual and predicted interest rates (simple regression model using only FICO score range). The red line represents the least-squares fit of the data. Correlation coefficient = 0.74. (c) A side-by-side box plot shows the relationship between loan duration and the simple regression residuals. 60-month loans (green box plot) carry a higher interest rate than 36-month loans (red box plot). (d) A scatter plot compares actual and predicted interest rates from the multiple regression model. The red line represents the least-squares fit of the data. Correlation coefficient = 0.89.

(a) Histogram. x-axis: Interest Rate (%); y-axis: Frequency.
(b) Scatter plot. x-axis: Actual Interest Rate (%); y-axis: Predicted Interest Rate (%).
(c) Box plot. x-axis: Loan Duration (36 months, 60 months); y-axis: Residual Distribution.
(d) Scatter plot. x-axis: Actual Interest Rate (%); y-axis: Predicted Interest Rate (%).
