
This document summarizes an exploratory data analysis for AJ Davis Department Stores.
AJ Davis wants to know more about its customers who pay by credit. We assess five
variables: location, income, household size, years lived at the current household, and
credit balance. The following data serve as a sample of 50 AJ Davis credit customers:
Location Income ($1000) Size Years Credit Balance ($)
Urban 54 3 12 4016
Rural 30 2 12 3159
Suburban 32 4 17 5100
Suburban 50 5 14 4742
Rural 31 2 4 1864
Urban 55 2 9 4070
Rural 37 1 20 2731
Urban 40 2 7 3348
Suburban 66 4 10 4764
Urban 51 3 16 4110
Urban 25 3 11 4208
Urban 48 4 16 4219
Rural 27 1 19 2477
Rural 33 2 12 2514
Urban 65 3 12 4214
Suburban 63 4 13 4965
Urban 42 6 15 4412
Urban 21 2 18 2448
Rural 44 1 7 2995
Urban 37 5 5 4171
Suburban 62 6 13 5678
Urban 21 3 16 3623
Suburban 55 7 15 5301
Rural 42 2 19 3020
Urban 41 7 18 4828
Suburban 54 6 14 5573
Rural 30 1 14 2583
Rural 48 2 8 3866
Urban 34 5 5 3586
Suburban 67 4 13 5037
Rural 50 2 11 3605
Urban 67 5 1 5345
Urban 55 6 16 5370
Urban 52 2 11 3890
Urban 62 3 2 4705
Urban 64 2 6 4157
Suburban 22 3 18 3579
Urban 29 4 4 3890
Suburban 39 2 18 2972
Rural 35 1 11 3121
Urban 39 4 15 4183
Suburban 54 3 9 3730
Suburban 23 6 18 4127
Rural 27 2 1 2921
Urban 26 7 17 4603
Suburban 61 2 14 4273
Rural 30 2 14 3067
Rural 22 4 16 3074
Suburban 46 5 13 4820
Suburban 66 4 20 5149


Location:
Using Minitab, we obtain the following distribution of customer locations:

URBAN: 42% SUBURBAN: 30% RURAL: 28%
Based on the sample, Urban customers form the largest group of AJ Davis credit customers
(21 of 50), although they fall short of a majority.
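
As a minimal sketch of how these shares could be recomputed outside Minitab, the Python snippet below tallies the Location column of the sample. The one-letter coding and the names CODES, NAMES and location_shares are illustrative assumptions, not part of the original analysis.

```python
from collections import Counter

# Location column of the 50 sample rows, in table order.
# The one-letter coding (U, S, R) is an illustrative shorthand, not from the source.
CODES = (
    "URSSRURUSU" "UURRUSUURU" "SUSRUSRRUS"
    "RUUUUUSUSR" "USSRUSRRSS"
)
NAMES = {"U": "Urban", "S": "Suburban", "R": "Rural"}


def location_shares(codes: str) -> dict[str, float]:
    """Return the percentage of customers in each location."""
    counts = Counter(codes)
    n = len(codes)
    return {NAMES[code]: 100 * count / n for code, count in counts.items()}


shares = location_shares(CODES)
# Print the locations from largest to smallest share.
for name, pct in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{name}: {pct:.0f}%")
```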
Size:
We now assess the statistics for household size. Household size can later be compared with
other factors such as credit carried, location and income. Looking at size alone, the Mean
household size is 3.42 persons. The Standard Deviation of 1.739 describes the typical distance
of a household's size from the Mean. The Variance of 3.024 is the square of the Standard
Deviation, that is, the average squared deviation from the Mean; it is not itself a distance
from the Mean. The Minimum household size is 1, the smallest value possible. The First
Quartile (Q1) is 2, which indicates that about 25% of households contain 2 or fewer
individuals. The Median of 3.000 is lower than the Mean of 3.420, which suggests the data are
positively skewed (skewed to the right): most households are small, with a tail of larger
households pulling the Mean upward. The Third Quartile (Q3) is 5, which indicates that about
75% of households contain 5 or fewer individuals. The Maximum household size is 7, giving a
Range of 6.

Variable Mean StDev Variance Minimum Q1 Median Q3 Maximum Range
Size 3.420 1.739 3.024 1.000 2.000 3.000 5.000 7.000 6.000
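
For readers without Minitab, the following is a hedged sketch of how the Size row above might be reproduced with Python's standard library. The function name describe is hypothetical, and the choice of the "exclusive" quartile method is an assumption about the rule Minitab applied; it happens to agree with the reported quartiles for this sample.

```python
import statistics

# Household sizes for the 50 sample rows, in table order.
SIZES = [3, 2, 4, 5, 2, 2, 1, 2, 4, 3, 3, 4, 1, 2, 3, 4, 6, 2, 1, 5,
         6, 3, 7, 2, 7, 6, 1, 2, 5, 4, 2, 5, 6, 2, 3, 2, 3, 4, 2, 1,
         4, 3, 6, 2, 7, 2, 2, 4, 5, 4]


def describe(values):
    """Return summary statistics, using the sample (n - 1) variance."""
    # method="exclusive" places quartiles at position p * (n + 1);
    # assumed here to match the quartile rule used in the Minitab output.
    q1, median, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
        "variance": statistics.variance(values),
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
        "range": max(values) - min(values),
    }


summary = describe(SIZES)
for name, value in summary.items():
    print(f"{name:>8}: {value:.3f}")
```

Note that the variance is simply the standard deviation squared, which is why the two figures carry the same information.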

Figure: Histogram of Size with fitted normal curve (x-axis: Size, y-axis: Frequency; Mean 3.42, StDev 1.739, N 50).

Years present at Household:
This variable is summarized in the same way as household size, but it measures how long each
household has lived at its current location. Years at the household can later be compared with
other factors such as income, location and credit carried. Looking at years alone, the Mean is
12.38 years at the current household. The Standard Deviation is 5.103 years, so time at a
location typically differs from the Mean by just over 5 years. The Variance of 26.036 is the
square of the Standard Deviation and is measured in squared years, so it is not directly
interpretable as a length of time. The Minimum is 1 year. The First Quartile (Q1) is 9, which
indicates that about 25% of households have been present for 9 years or less. The Median of 13
is larger than the Mean, which suggests negative skewness: most households have long tenures,
with a tail of shorter ones. This says nothing by itself about whether households are spending
less time in the area over time. The Third Quartile (Q3) is 16, so about 75% of households have
been in the area for 16 years or less and about 25% for 16 years or more. The Maximum time any
household has been present is 20 years.

Variable Mean StDev Variance Minimum Q1 Median Q3 Maximum
Years 12.380 5.103 26.036 1.000 9.000 13.000 16.000 20.000
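
The direction of skew is inferred above by comparing the Mean with the Median. One way to put a number on that comparison is Pearson's second (median) skewness coefficient, 3 * (mean - median) / stdev. This coefficient is not part of the original Minitab output; the sketch below, with the hypothetical helper pearson_median_skewness, is only an illustration of the idea.

```python
import statistics

# Years at current household for the 50 sample rows, in table order.
YEARS = [12, 12, 17, 14, 4, 9, 20, 7, 10, 16, 11, 16, 19, 12, 12, 13, 15,
         18, 7, 5, 13, 16, 15, 19, 18, 14, 14, 8, 5, 13, 11, 1, 16, 11, 2,
         6, 18, 4, 18, 11, 15, 9, 18, 1, 17, 14, 14, 16, 13, 20]


def pearson_median_skewness(values):
    """Pearson's second skewness coefficient: 3 * (mean - median) / stdev."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    stdev = statistics.stdev(values)  # sample standard deviation (n - 1)
    return 3 * (mean - median) / stdev


skew = pearson_median_skewness(YEARS)
# A negative value means the Mean sits below the Median (left skew).
print(f"mean = {statistics.mean(YEARS):.2f}, median = {statistics.median(YEARS)}")
print(f"Pearson median skewness = {skew:.2f}")
```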


Figure: Histogram of Years with fitted normal curve (x-axis: Years, y-axis: Frequency; Mean 12.38, StDev 5.103, N 50).
Income:
Using similar analysis, we summarize customer income. Income can later be compared with
location, amount of credit, household size and length of time spent in an area. For now, we
look at income alone. Note that these figures are in thousands of dollars.
The Mean is 43.48, so the average household income is $43,480. The Standard Deviation is
14.55, so a household's income typically differs from the Mean by about $14,550. Relative to
the Mean, this is about 33% (the coefficient of variation, 14.55 / 43.48). The Variance of
211.72 is the square of the Standard Deviation, in squared thousands of dollars, and carries no
information beyond it. The Minimum household income is $21,000, reported by two Urban
households. The First Quartile (Q1) indicates that about 25% of households earn $30,000 or
less. The Median of $42,000 is slightly lower than the Mean, which suggests mild positive
skewness. The Third Quartile (Q3) indicates that about 75% of households earn $55,000 or less.
The Maximum income reported is $67,000.

Variable Mean StDev Variance Minimum Q1 Median Q3 Maximum
Income ($1000) 43.48 14.55 211.72 21.00 30.00 42.00 55.00 67.00



Figure: Histogram of Income ($1000) with fitted normal curve (x-axis: Income ($1000), y-axis: Frequency; Mean 43.48, StDev 14.55, N 50).
Credit Balance:
Every household in the sample carries a credit balance. Credit balance can later be compared
across location, household size and income, and against how long the household has lived in
the home. The basic facts are as follows: The Mean balance carried is $3,964. The Standard
Deviation is $933, the typical distance of an account's balance from the Mean. The Variance of
871,411 looks large only because it is expressed in squared dollars: it is simply the square of
the Standard Deviation (933.5) and does not by itself indicate unpredictable usage. The Minimum
balance is $1,864. The First Quartile (Q1) indicates that about 25% of households carry $3,109
or less. The Median of $4,090 is larger than the Mean, which suggests negative skewness. The
Third Quartile (Q3) is $4,748, meaning that about 75% of households carry $4,748 or less. The
Maximum balance is $5,678.


Variable Mean StDev Variance Minimum Q1 Median Q3 Maximum
Credit Balance($) 3964 933 871411 1864 3109 4090 4748 5678
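
The size of the Variance can be checked against the Standard Deviation with a few lines of arithmetic. The sketch below uses only the two summary figures reported in the Minitab output; the constant names are illustrative.

```python
import math

# Summary figures reported for Credit Balance($) in the Minitab output.
REPORTED_VARIANCE = 871_411  # squared dollars
REPORTED_STDEV = 933.5       # dollars

# The standard deviation is the square root of the variance.
stdev_from_variance = math.sqrt(REPORTED_VARIANCE)
print(f"sqrt(variance) = {stdev_from_variance:.1f} (reported StDev: {REPORTED_STDEV})")
```

Taking the square root recovers the reported Standard Deviation to one decimal place, so the large Variance reflects its squared units and nothing more.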

Figure: Histogram of Credit Balance($) with fitted normal curve (x-axis: Credit Balance($), y-axis: Frequency; Mean 3964, StDev 933.5, N 50).
Comparison of Categories:
This section examines the relationship between pairs of variables. Because AJ Davis wants to
learn about its customers who carry credit, the comparisons focus on credit balance.
Amount of Credit vs Income:
Credit balance and household income appear to be positively related: in the scatterplot,
households with higher incomes tend to carry larger balances. Note that this plot pools all
households across all locations.






Figure: Scatterplot of Credit Balance($) vs Income ($1000) (x-axis: Income ($1000), y-axis: Credit Balance($)).
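
The strength of the relationship read off the scatterplot can be summarized with the Pearson correlation coefficient. The original analysis does not report this figure; the sketch below, with the hypothetical helper pearson_r, shows one way it might be computed from the sample data.

```python
import math

# Income ($1000) and Credit Balance ($) for the 50 sample rows, in table order.
INCOME = [54, 30, 32, 50, 31, 55, 37, 40, 66, 51, 25, 48, 27, 33, 65, 63, 42,
          21, 44, 37, 62, 21, 55, 42, 41, 54, 30, 48, 34, 67, 50, 67, 55, 52,
          62, 64, 22, 29, 39, 35, 39, 54, 23, 27, 26, 61, 30, 22, 46, 66]
CREDIT = [4016, 3159, 5100, 4742, 1864, 4070, 2731, 3348, 4764, 4110,
          4208, 4219, 2477, 2514, 4214, 4965, 4412, 2448, 2995, 4171,
          5678, 3623, 5301, 3020, 4828, 5573, 2583, 3866, 3586, 5037,
          3605, 5345, 5370, 3890, 4705, 4157, 3579, 3890, 2972, 3121,
          4183, 3730, 4127, 2921, 4603, 4273, 3067, 3074, 4820, 5149]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(INCOME, CREDIT)
# Values near +1 indicate a strong positive linear relationship.
print(f"r(Income, Credit Balance) = {r:.3f}")
```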
The following charts show location as a factor in credit balance and income:
Overlaid Locations:
Figure: Scatterplot of Credit Balance($) vs Income ($1000), with points grouped by Location (Rural, Suburban, Urban).

Individual Locations:
Figure: Scatterplot of Credit Balance($) vs Income ($1000), one panel per Location (Rural, Suburban, Urban). Panel variable: Location.

Overall, rural customers earn lower incomes and carry smaller credit balances. Urban customers
carry a wider range of balances and sit in the middle of the income range, above rural
customers and somewhat below suburban customers.
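
The location comparison can also be checked numerically by averaging income and credit balance within each location. This is a minimal sketch, not the author's Minitab procedure; the one-letter location coding and the helper group_means are illustrative assumptions.

```python
from collections import defaultdict
from statistics import mean

# Sample columns in table order. The one-letter location coding is illustrative.
CODES = (
    "URSSRURUSU" "UURRUSUURU" "SUSRUSRRUS"
    "RUUUUUSUSR" "USSRUSRRSS"
)
INCOME = [54, 30, 32, 50, 31, 55, 37, 40, 66, 51, 25, 48, 27, 33, 65, 63, 42,
          21, 44, 37, 62, 21, 55, 42, 41, 54, 30, 48, 34, 67, 50, 67, 55, 52,
          62, 64, 22, 29, 39, 35, 39, 54, 23, 27, 26, 61, 30, 22, 46, 66]
CREDIT = [4016, 3159, 5100, 4742, 1864, 4070, 2731, 3348, 4764, 4110,
          4208, 4219, 2477, 2514, 4214, 4965, 4412, 2448, 2995, 4171,
          5678, 3623, 5301, 3020, 4828, 5573, 2583, 3866, 3586, 5037,
          3605, 5345, 5370, 3890, 4705, 4157, 3579, 3890, 2972, 3121,
          4183, 3730, 4127, 2921, 4603, 4273, 3067, 3074, 4820, 5149]
NAMES = {"U": "Urban", "S": "Suburban", "R": "Rural"}


def group_means(codes, values):
    """Average the values within each location group."""
    groups = defaultdict(list)
    for code, value in zip(codes, values):
        groups[NAMES[code]].append(value)
    return {name: mean(vals) for name, vals in groups.items()}


income_by_loc = group_means(CODES, INCOME)
credit_by_loc = group_means(CODES, CREDIT)
for name in ("Rural", "Urban", "Suburban"):
    print(f"{name:<9} mean income ${income_by_loc[name] * 1000:,.0f}, "
          f"mean credit balance ${credit_by_loc[name]:,.0f}")
```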
Amount of Credit vs. Household Years:
The points are widely scattered; however, there appears to be a weak trend in which households
that have been present longer carry more credit, up to a point.

Figure: Scatterplot of Credit Balance($) vs Years (x-axis: Years, y-axis: Credit Balance($)).

Amount of Income vs. Household Years:
The scatterplot suggests that households that have been present longer tend to report lower
incomes. Because each household is observed only once, this is a comparison across households,
not evidence that any household's income falls over time.

Figure: Scatterplot of Income ($1000) vs Years (x-axis: Years, y-axis: Income ($1000)).
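
The direction of each trend against Years can be checked by fitting a least-squares line and reading the sign of its slope. The original analysis relies on the scatterplots alone; the sketch below, with the hypothetical helper ols_slope, is one possible numerical check and makes no claim about what the slopes turn out to be.

```python
# Sample columns for the 50 rows, in table order.
YEARS = [12, 12, 17, 14, 4, 9, 20, 7, 10, 16, 11, 16, 19, 12, 12, 13, 15,
         18, 7, 5, 13, 16, 15, 19, 18, 14, 14, 8, 5, 13, 11, 1, 16, 11, 2,
         6, 18, 4, 18, 11, 15, 9, 18, 1, 17, 14, 14, 16, 13, 20]
INCOME = [54, 30, 32, 50, 31, 55, 37, 40, 66, 51, 25, 48, 27, 33, 65, 63, 42,
          21, 44, 37, 62, 21, 55, 42, 41, 54, 30, 48, 34, 67, 50, 67, 55, 52,
          62, 64, 22, 29, 39, 35, 39, 54, 23, 27, 26, 61, 30, 22, 46, 66]
CREDIT = [4016, 3159, 5100, 4742, 1864, 4070, 2731, 3348, 4764, 4110,
          4208, 4219, 2477, 2514, 4214, 4965, 4412, 2448, 2995, 4171,
          5678, 3623, 5301, 3020, 4828, 5573, 2583, 3866, 3586, 5037,
          3605, 5345, 5370, 3890, 4705, 4157, 3579, 3890, 2972, 3121,
          4183, 3730, 4127, 2921, 4603, 4273, 3067, 3074, 4820, 5149]


def ols_slope(xs, ys):
    """Slope of the least-squares line of ys regressed on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Slope units: thousands of dollars per year, and dollars per year.
income_slope = ols_slope(YEARS, INCOME)
credit_slope = ols_slope(YEARS, CREDIT)
print(f"Income ($1000) per additional year: {income_slope:+.3f}")
print(f"Credit balance ($) per additional year: {credit_slope:+.2f}")
```

A slope close to zero relative to the spread of the data would indicate that years at the household explains little of the variation.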
Although income tends to be lower for households with more years at their location, credit
balances are slightly higher for those same households. Credit balances therefore appear to
rise with both years and income. These are associations observed in a sample of 50 households;
they do not show that credit will always increase with income, or that it is an unavoidable
response to reduced income.
