Case Analysis - Pilgrim Bank

Group 10, Sec A, PGP I

Preliminaries:
An initial scan of the data set shows that it contains three types of data: a) Nominal: Online and District; b) Ordinal: Income and Age; and c) Ratio: Tenure and Profit.

Approach:
Since some records have missing values, we deleted every record that contained a missing value. This reduced the overall sample size to 22813, comprising 2954 online customers and 19859 non-online customers (around 88%).
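
The deletion step described above is commonly called listwise deletion. A minimal sketch follows, assuming records are held as dictionaries with `None` marking a missing value; the field names and the four sample rows are hypothetical and are not taken from the case data.

```python
# Hypothetical records: None marks a missing value. Field names are illustrative only.
records = [
    {"profit": 21, "online": 0, "age": 3, "income": 6, "tenure": 6.33, "district": 1200},
    {"profit": -6, "online": 0, "age": None, "income": None, "tenure": 29.5, "district": 1200},
    {"profit": 132, "online": 1, "age": 2, "income": 5, "tenure": 9.91, "district": 1100},
    {"profit": -49, "online": 1, "age": 5, "income": None, "tenure": 0.33, "district": 1300},
]


def drop_incomplete(rows):
    """Listwise deletion: keep only rows in which every field is present."""
    return [row for row in rows if all(value is not None for value in row.values())]


complete = drop_incomplete(records)
online_rows = [row for row in complete if row["online"] == 1]
offline_rows = [row for row in complete if row["online"] == 0]

# The two groups must add up to the cleaned sample, as in the report:
# 2954 online + 19859 non-online = 22813.
reported_total = 2954 + 19859

print(len(complete), len(online_rows), len(offline_rows), reported_total)
```

Listwise deletion is simple, but it discards every field of a record even when only one field is missing, which is why the sample shrinks noticeably.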
Descriptive Statistics and Histogram of Count of Age for Online and Offline Customers:

          Profit (Online)   Profit (Offline)   Income    Tenure
Mean      131.524           126.522            5.488     10.996
Median    20.500            27.000             6.000     8.250
Mode      -2.000            -31.000            6.000     7.410
SD        290.365           281.724            2.336     8.525
Range     2292.000          2199.000           8.000     41.000

[Histogram: count of customers by age group (1 to 7), Offline vs. Online; vertical axis from 0 to 5000]

Analysis:
First, we fitted simple regression models of profit on each independent variable in turn: Age, Income, Tenure, Online/Not Online, and District (using two dummy variables). Income, Age, Tenure and District 1200 each have a positive relationship with profitability. The regression of profit on online/offline status suggests that online customers are about $5 more profitable than offline customers (the slope of this model). However, the two-tailed t-test on this slope gives a p-value of 0.3697, which is above the 0.05 threshold for significance, so this model is not significant.

Simple regression results (dependent variable: Profit):

Variable        R Square   Adjusted R Sq   F value   Coeff Intercept   Coeff Variable   t Stat Variable   P val Variable
Online          3.53E-05   -8.6E-06        0.8044    126.52            5.0028           0.8968            0.3697
Age             0.0203     0.0202          473.56    26.787            24.6963          21.761            6.1E-104
Income          0.0214     0.0214          501.00    29.754            17.750           22.383            8.8E-110
Tenure          0.0288     0.0288          678.20    65.171            5.6380           26.042            2.3E-147
District 1200   0.0025     0.0024          29.051    95.504            39.243           6.1308            8.88E-10
District 1300   (same model as District 1200)                          9.6252           1.2066            0.2275
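
When the only regressor is a 0/1 dummy, the least-squares intercept equals the mean of the 0 group and the slope equals the difference between the two group means. This is why the Online model's intercept (126.52) and slope (5.0028) line up with the mean offline and online profits reported earlier (126.522 and 131.524). The sketch below illustrates this on hypothetical toy data and also checks that, in a one-variable regression, F is the square of the t statistic.

```python
from statistics import mean


def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical toy data: online is a 0/1 dummy, profit is in dollars.
online = [0, 0, 0, 0, 1, 1, 1]
profit = [100.0, 150.0, 90.0, 160.0, 120.0, 140.0, 145.0]

intercept, slope = simple_ols(online, profit)
offline_mean = mean([p for o, p in zip(online, profit) if o == 0])
online_mean = mean([p for o, p in zip(online, profit) if o == 1])

# With a single dummy regressor the fit reproduces the two group means.
print(intercept, offline_mean)            # both are the offline mean
print(slope, online_mean - offline_mean)  # both are the gap between groups

# Consistency checks on the figures reported for the Online model.
reported_gap = 131.524 - 126.522  # difference of reported means, close to 5.0028
t_squared = 0.8968 ** 2           # close to the reported F value of 0.8044
print(round(reported_gap, 3), round(t_squared, 4))
```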

After this, we carried out a stepwise regression on the independent variables. We conclude that profitability is related to tenure, income and age; the other variables turn out to be insignificant in the t-test. The R^2 value for this model is only 5.7%, so only a small part of the variation in profitability can be attributed to these independent variables. The equation obtained is as follows:
Profit = -87.86 + 4.014*tenure + 18.03*income + 17.69*age
The p-values of all the above independent variables are below 0.05.
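
To show how the stepwise equation is read, the sketch below evaluates it for one customer. The customer (tenure of 10 years, income bracket 5, age bracket 4) is hypothetical, and the function name is ours. With an R^2 of only 5.7%, such a prediction is a rough average for similar customers rather than a reliable individual estimate.

```python
def predicted_profit(tenure, income, age):
    """Evaluate the stepwise regression equation reported above."""
    return -87.86 + 4.014 * tenure + 18.03 * income + 17.69 * age


# Hypothetical customer: 10 years of tenure, income bracket 5, age bracket 4.
base = predicted_profit(tenure=10, income=5, age=4)

# Moving the same customer up one income bracket adds the income coefficient.
one_bracket_up = predicted_profit(tenure=10, income=6, age=4)

print(round(base, 2))                   # about 113.19
print(round(one_bracket_up - base, 2))  # about 18.03
```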

Scatter Plots for Variables against profit:

[Figures: scatter plots of profit against Age, Income and Tenure, each with a fitted polynomial trend line]

On observing the scatter plots of these variables, we were unable to categorically identify the applicable quadrant of Tukey's model. We therefore used a quick and dirty method that adds the square of each independent variable. After removing the variables that did not pass their individual t-tests, the model has an R^2 value of 6.3%.
The quick and dirty model is as follows:
Profit = -37.1 + 5.71*tenure - 0.05*tenure^2 - 17.09*income + 3.32*income^2 + 18.16*age + 16.86*online + 14.7*district1200
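
A squared term makes the effect of a variable depend on its level. As an illustration, the sketch below takes only the two tenure coefficients from the model above (5.71 and -0.05) and computes the marginal effect of one more year of tenure and the tenure at which that effect would reach zero. The function and constant names are ours.

```python
TENURE_COEF = 5.71      # linear tenure coefficient from the model above
TENURE_SQ_COEF = -0.05  # squared tenure coefficient from the model above


def tenure_marginal_effect(tenure):
    """Derivative of the tenure terms: approximate profit change per extra year."""
    return TENURE_COEF + 2 * TENURE_SQ_COEF * tenure


# Tenure at which the marginal effect reaches zero (vertex of the parabola).
turning_point = -TENURE_COEF / (2 * TENURE_SQ_COEF)

print(round(tenure_marginal_effect(0), 2))   # effect for a brand-new customer
print(round(tenure_marginal_effect(41), 2))  # effect at a tenure of 41 years
print(round(turning_point, 1))               # about 57.1 years
```

Since the reported tenure range is 41, the turning point lies beyond the observed data: under this model, profit keeps rising with tenure across the sample, but by less for each additional year.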
Besides this, we also fitted a regression model between age and online/not online to examine the relationship between these two independent variables. This exercise showed that there is a negative correlation between age and being online. The p-value of this model is also significant. The other multicollinearity data are as follows:

Multicollinearity (pairwise correlations):

          Age        Income     Tenure
Online    -0.1685    0.08069    -0.08078
Age                  -0.0699    0.42031
Income                          0.040002

As we see from the above table, there is a positive correlation between age and tenure (0.42).
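
For reference, pairwise figures of this kind are Pearson correlation coefficients. The sketch below shows the calculation on hypothetical toy data (the three short lists are invented for illustration) and also squares the reported age and tenure correlation to show how much variance the two variables share.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical toy data: age bracket, tenure in years, online dummy.
age = [1, 2, 3, 4, 5, 6, 7]
tenure = [2.0, 1.5, 6.0, 4.0, 12.0, 9.0, 20.0]
online = [1, 1, 0, 1, 0, 0, 0]

r_age_tenure = pearson(age, tenure)  # positive: older customers have longer tenure
r_age_online = pearson(age, online)  # negative: older customers are online less often

# Share of variance common to age and tenure at the reported correlation.
shared_variance = 0.42031 ** 2

print(round(r_age_tenure, 3), round(r_age_online, 3), round(shared_variance, 4))
```

At the reported correlation of 0.42, age and tenure share roughly 18% of their variance, so the two variables overlap but are far from interchangeable.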

There is also a negative correlation between age and online usage (-0.17), which means that younger customers are more likely to use the online service. Hence, in our opinion, since online customers are more profitable, the online channel should be promoted to young customers.
