
Submitted by : Rachit Kulshrestha

The Curious Case of Online Services


Pilgrim Bank has around 5 million customers and is an established bank in the United States (in at least three districts). Management is engaged in a substantial debate over the Internet strategy for the bank's services. The online channel is attractive because it lowers the cost per transaction compared with the existing channels, but it carries a higher overall cost of implementation and maintenance. Alan Green, the newly appointed analyst, has to analyze the data for the year 1999 and present a report of his findings. The bank has two options: (1) start charging a fee for the use of the online banking channel, or (2) offer discounts or rebates to promote the use of the online channel. So how does Green reach a conclusion? Thanks to a course in analytics he attended during his school days.

Questions... on his way back home!


Is the data I have representative of the whole customer population?
What do I do with the missing data, and how important is it?
Is there a difference between the data for online and offline customers?
Which factors contribute to profitability: age, income, district?
Is there any correlation among the available variables in the dataset?
What are my null hypotheses, and which analysis technique do I use to test them?
Are online customers more profitable than offline customers?
Do I have enough coffee to slog through the weekend?

What do we know from the data!


We have data for more than 31,633 customers, of whom 3,854 (12%) use the online channel. Average profitability is $111.50 across all customers, $110.79 for offline customers, and $116.67 for online customers, so the raw difference between online and offline customers is small. The standard deviation of profitability is 272. The mode of the age bucket is 3 (25-34 years) and the mode of the income bucket is 6 ($50,000 - $74,999). The average tenure across all customers is approximately 10 years, which reflects the established nature of the bank. The district-wise distribution of customers is as follows:
District 1300: 4,150 customers; District 1200: 24,342 customers; District 1100: 3,142 customers

The bank determines the profitability of a customer from the following equation:
Profitability = (Balance in Deposit Accounts)*(Net Interest Spread) + (Fees) + (Interest From Loans) - (Cost to Serve)

Can we use the available data?


To check whether the dataset is representative of the overall population, we ran a Z-test and computed the 95% confidence interval for mean profit. The interval is 108.496 < Profit Mean < 114.509, a margin of ±3.01 around the sample mean, using a Z-value of 1.96. Because the interval is narrow, we conclude that the dataset is representative of the whole population. Our next null hypothesis was: the profitability of online and offline customers is the same. The two-tailed p-value is 0.2224. As it is greater than 0.05, we fail to reject the null hypothesis and conclude that there is no significant difference between the profitability of online and offline customers. The t-statistic also lies within the interval ±1.96 (the Z-value).
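The confidence interval above can be reproduced approximately from the rounded summary figures quoted in this report (mean $111.50, standard deviation 272, 31,633 customers). This is a minimal sketch; the function name is illustrative, and because the inputs are rounded the margin will differ slightly from the reported ±3.01.

```python
import math

def z_confidence_interval(mean, std_dev, n, z=1.96):
    """Return (lower, upper, margin) for a z-based confidence interval on a mean."""
    standard_error = std_dev / math.sqrt(n)  # standard error of the sample mean
    margin = z * standard_error              # half-width of the interval
    return mean - margin, mean + margin, margin

# Rounded summary figures quoted in the report (not the raw data).
lower, upper, margin = z_confidence_interval(mean=111.50, std_dev=272, n=31633)
print(f"95% CI: {lower:.2f} to {upper:.2f} (margin +/- {margin:.2f})")
```

With these rounded inputs the margin comes out at roughly 3.0, in line with the interval reported above.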

What do we do with the missing data?


Around 8,822 records have no data for the age or income bucket. Can we ignore these records? We again used a t-test to identify the significance of the missing data. Our null hypothesis is that no significant difference exists if we ignore the records with missing data. The two-tailed p-value from the t-test is < 0.05, so we reject the null hypothesis and conclude that the missing data is significant and should be replaced rather than dropped.
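A two-sample comparison of this kind can be sketched as follows. This is an assumption-laden illustration, not the exact test the report ran: it uses a Welch t-statistic with a normal approximation for the p-value (reasonable only at sample sizes like the report's thousands of records), and the two lists are made-up toy values, kept tiny for readability.

```python
import math
from statistics import NormalDist, mean, variance

def welch_test(sample_a, sample_b):
    """Welch t-statistic with a large-sample (normal) two-tailed p-value."""
    standard_error = math.sqrt(
        variance(sample_a) / len(sample_a) + variance(sample_b) / len(sample_b)
    )
    t_stat = (mean(sample_a) - mean(sample_b)) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))
    return t_stat, p_value

# Made-up profit values: rows with complete data vs. rows with missing buckets.
complete_rows = [120.0, 95.0, 140.0, 110.0, 130.0, 105.0]
missing_rows = [60.0, 85.0, 40.0, 75.0, 55.0, 70.0]

t_stat, p_value = welch_test(complete_rows, missing_rows)
print(f"t = {t_stat:.2f}, two-tailed p = {p_value:.4f}")
print("reject H0" if p_value < 0.05 else "fail to reject H0")
```

The same helper applies to the online versus offline comparison: a two-tailed p-value above 0.05 means the null hypothesis is not rejected.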

We decided to replace the missing values with the mode of each variable: all blanks in the age bucket were replaced with 3 and all blanks in the income bucket with 6.
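Mode imputation as described above can be sketched in a few lines. The column values below are made up for illustration, and `None` stands in for a blank cell; the function name is hypothetical.

```python
from collections import Counter

def impute_with_mode(values):
    """Replace None entries with the most common non-missing value."""
    observed = [v for v in values if v is not None]
    mode_value = Counter(observed).most_common(1)[0][0]
    filled = [mode_value if v is None else v for v in values]
    return filled, mode_value

# Toy age-bucket column (made-up values, None marks a blank cell).
age_bucket = [3, None, 5, 3, 2, None, 3, 4]
filled, mode_value = impute_with_mode(age_bucket)
print(mode_value, filled)
```

One design consequence worth keeping in mind: filling every blank with the mode concentrates more records in a single bucket, which reduces the variance of that variable in any later regression.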

Who are our profitable customers?


Plotting a frequency chart shows that most customers' profitability lies in the -$100 to $200 range.

Only a little more than 50% of customers were profitable, and around 20% of customers accounted for all the profits. More surprisingly, around 10% of customers generated around 70% of total profit. It is therefore important to retain the highly profitable customers and treat them as our prime customers (prime customers: customers with profitability above $300). These customers have an average tenure of 13.72 years.
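The concentration figures above come from ranking customers by profit and summing from the top. A minimal sketch of that calculation, using ten made-up profit values rather than the bank's data:

```python
def top_share(profits, fraction):
    """Share of total profit contributed by the top `fraction` of customers."""
    ranked = sorted(profits, reverse=True)
    k = max(1, round(len(ranked) * fraction))  # number of customers in the top group
    return sum(ranked[:k]) / sum(ranked)

# Made-up profits for ten customers (some are loss-making).
profits = [900, 250, 120, 60, 30, 10, -20, -40, -60, -90]

share_top_10pct = top_share(profits, 0.10)
profitable_fraction = sum(p > 0 for p in profits) / len(profits)
print(f"top 10% share: {share_top_10pct:.1%}, profitable: {profitable_fraction:.0%}")
```

Because loss-making customers reduce the total, a small top group can account for nearly all (or even more than all) of net profit, which is how 20% of customers can account for all the profits.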

Running Regressions
First we ran a linear regression with profitability as the dependent variable and the online flag as the independent variable, to check how the online channel is related to profitability.

The very low Adjusted R-square (close to 0) shows that the model has almost no predictive power and that we need to include more factors in the regression. The Significance F is high, and hence we reject this model completely.
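When the only regressor is a 0/1 flag, an ordinary least squares fit reduces to a comparison of group means: the intercept is the mean of the flag = 0 group and the slope is the difference between the two group means. The sketch below demonstrates this on made-up data; it assumes ordinary least squares, and the report does not state the coefficient it obtained.

```python
def simple_ols(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Made-up data: four offline customers (flag 0) and two online customers (flag 1).
online_flag = [0, 0, 0, 0, 1, 1]
profit = [100.0, 120.0, 90.0, 130.0, 150.0, 80.0]

intercept, slope = simple_ols(online_flag, profit)
offline_mean = sum(p for f, p in zip(online_flag, profit) if f == 0) / 4
online_mean = sum(p for f, p in zip(online_flag, profit) if f == 1) / 2
print(intercept, slope, offline_mean, online_mean - offline_mean)
```

Applied to the group means quoted earlier ($116.67 online, $110.79 offline), this identity implies a slope of about $5.88 for such a fit, small relative to a standard deviation of 272, which is consistent with an R-square close to 0.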

What other factors affect Profitability?

We ran a regression with profit as the dependent variable and online usage, age, income, tenure, and location as the independent variables. The Adjusted R-square is again very low at 0.5879 and thus shows the poor predictive power of this regression model. The interesting things to note are the p-values of the independent variables. The high p-value for district means that profitability is not affected by location, so we can safely remove this variable from the final regression equation. All the other variables have a positive effect on profitability, and their p-values below 0.05 show that we have to keep them in the regression analysis.

So we can safely say that location has no effect on profitability, but other variables such as age, income, and tenure do have a significant effect.

Finding a better regression model.


We ran a regression with profit as the dependent variable and online usage, age, income, tenure, and log tenure as the independent variables. The Adjusted R-square is still low, but the fit is better than the first model. The model still has very low predictive power. It has a low Significance F and is acceptable. The regression equation comes out to be:

Profitability = -105.69 + 18.34(online_flag) + 19.17(age) + 16.78(income) + 4.78(Tenure)


The model shows the following. Holding the other variables fixed, a customer who uses the online channel is $18.34 more profitable than one who does not. Average online usage is very low (0.122), so online usage must be promoted. Age is an important factor: each step up in age bucket (roughly 10 years) increases profitability by $19.17. Income also affects profitability, so we should focus on high-income groups. Each additional year of tenure adds $4.78 to profitability, which makes tenure an important factor.
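The regression equation above can be turned into a small calculator. This sketch assumes that age and income enter as bucket codes (as in the dataset description) and that tenure is measured in years; the function and constant names are illustrative, not from the original analysis.

```python
# Coefficients copied from the regression equation in this report.
INTERCEPT = -105.69
COEF_ONLINE = 18.34
COEF_AGE = 19.17
COEF_INCOME = 16.78
COEF_TENURE = 4.78

def predict_profit(online_flag, age_bucket, income_bucket, tenure_years):
    """Predicted profitability in dollars for one customer."""
    return (
        INTERCEPT
        + COEF_ONLINE * online_flag
        + COEF_AGE * age_bucket
        + COEF_INCOME * income_bucket
        + COEF_TENURE * tenure_years
    )

# A customer at the modal buckets (age 3, income 6) with 10 years of tenure.
online_customer = predict_profit(1, 3, 6, 10)
offline_customer = predict_profit(0, 3, 6, 10)
print(f"online: {online_customer:.2f}, offline: {offline_customer:.2f}")
```

The two predictions differ by exactly the online coefficient, which is what "holding the other variables fixed" means in practice. Given the model's low predictive power, such figures are indicative of direction rather than reliable forecasts.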

Correlation between variables.

It is interesting to look at the correlation among the variables in the dataset. Age is negatively correlated with the online flag, which means that younger customers are more interested in the online channel. The mean age bucket for online customers is 3.26, significantly lower than the overall mean of 4.04. There is a high correlation between tenure and profit. We calculated the average profitability of customers with more than 10 years of tenure and found it to be $172.40, significantly higher than the overall average. From our previous analysis we already know that most of the profit comes from the 10 percent of customers who are our prime customers.
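The correlations discussed above are pairwise Pearson coefficients. A minimal sketch of the calculation, using made-up age buckets and online flags chosen so that younger customers are more often online:

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    return cov / math.sqrt(var_x * var_y)

# Made-up sample: online users cluster in the lower age buckets.
age_bucket = [2, 3, 3, 4, 5, 6, 7]
online_flag = [1, 1, 0, 1, 0, 0, 0]

r = pearson_r(age_bucket, online_flag)
print(f"correlation(age, online) = {r:.2f}")
```

A negative value indicates that the online flag tends to be 1 at lower age buckets, matching the direction of the relationship described above.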

Conclusions and Suggestions.

As the online channel has a high installation cost, we should charge customers for it, since there is no significant evidence that the channel alone drives profitability. On the other hand, promoting the online channel is also important because it has a positive, however small, effect on profitability in our regression model. Only 12% of customers are online at present, so we should offer rebates to a few customer groups to promote online services:

Customers with a tenure of more than 10 years should be given free online service to reward their loyalty and promote the online channel among them. They have very high profitability.

Young customers (under age bucket 3) and new customers should be given a rebate or a free subscription for a few years until they become more profitable. They generally use online services elsewhere, and a discount or free service may attract them. Depending on their profitability after a few years, we can decide whether to charge a fee.

High-profitability customers (> $300) should be given free service, as they are our prime customers and our focus should be on retaining them and providing the best service.

These suggestions should improve online usage and also drive profitability. All other customers who fall outside these categories can be charged for the service, as the bank needs to recoup the investment made in the online channel.

Points to add for analysis!


The low predictive power of the regression model and its high standard errors led us to reject it as a predictive tool; we only picked out key observations from it. Profitability is not directly related to online usage, so our focus should be on involving other demographic variables.

We do not know the exact cost of setting up the online channel for all customers. It may be that the current infrastructure supports only a fraction of the customers and that increasing its capacity involves a high cost. We can therefore provide free online service to a section of customers now and move gradually towards completely free online service in the near future.

If we had the exact values of age and income instead of buckets, we could have built a regression model with better predictive power.

We do not know the retention behavior of customers, and there is no data to show whether all customers are active customers of the bank. Variables such as last transaction date could give us that picture.

Thank You!! Questions? If Any!
