Two-sample t-test
a)
I want to know whether the average amount of money spent on BBBC books is the same for both genders. The answer will help determine which demographic groups to target in the direct mailing campaign for the upcoming book. My claim is that the average amount spent is the same for both genders. To test this claim, I created two samples from my data: the amount of money spent on BBBC books by male customers and the amount spent by female customers. Each sample has its own mean and standard deviation.
The two-sample t-test compares the two sample means, $\bar{x}_1$ and $\bar{x}_2$, where $\bar{x}$ denotes the average amount of money spent on BBBC books. If the difference between $\bar{x}_1$ and $\bar{x}_2$ is small enough to be explained by sampling variation, it is not statistically significant. In that case we fail to reject the null hypothesis that the average amount spent is the same for both genders.
The TTEST Procedure
Variable: Purchased

Gender        N      Mean     Std Dev   Std Err   Minimum   Maximum
0             721    198.1    95.5136   3.5571    15.0000   458.0
1             1579   194.0    96.3274   2.4241    15.0000   461.0
Diff (1-2)           4.1806   96.0732   4.3182

Gender       Method          Mean     95% CL Mean          Std Dev   95% CL Std Dev
0                            198.1    191.2     205.1      95.5136   90.8252   100.7
1                            194.0    189.2     198.7      96.3274   93.0811   99.8101
Diff (1-2)   Pooled          4.1806   -4.2875   12.6487    96.0732   93.3744   98.9338
Diff (1-2)   Satterthwaite   4.1806   -4.2635   12.6247

Method          Variances   DF       t Value   Pr > |t|
Pooled          Equal       2298     0.97      0.3331
Satterthwaite   Unequal     1405.7   0.97      0.3316

Equality of Variances
Method     Num DF   Den DF   F Value   Pr > F
Folded F   1578     720      1.02      0.7960
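The two-sample t-test gives t = 0.97, with p = 0.3331 (pooled) and p = 0.3316 (Satterthwaite). Both values are far above 0.05, so we fail to reject the null hypothesis. The folded F test (p = 0.7960) shows no evidence of unequal variances, so the pooled result can be used. Hence, there is no statistically significant difference in average spending between the genders.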
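As a cross-check, the SAS statistics can be reproduced from the summary rows alone. The sketch below is a minimal Python version that assumes only the standard pooled and Welch (Satterthwaite) formulas. The function names are illustrative. The two-sided p-value uses a normal approximation, which is adequate here because the degrees of freedom are in the thousands, but it is not the exact t-distribution p-value that SAS reports.

```python
import math


def two_sample_t(n1, sd1, n2, sd2, diff):
    """Pooled and Welch (Satterthwaite) t statistics from summary statistics."""
    v1, v2 = sd1 ** 2, sd2 ** 2
    # Pooled: assumes both groups share one population variance.
    df_pooled = n1 + n2 - 2
    sp = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / df_pooled)
    t_pooled = diff / (sp * math.sqrt(1 / n1 + 1 / n2))
    # Welch/Satterthwaite: separate variances and approximate degrees of freedom.
    a, b = v1 / n1, v2 / n2
    t_welch = diff / math.sqrt(a + b)
    df_welch = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    # Folded F: larger sample variance over the smaller one.
    f_folded = max(v1, v2) / min(v1, v2)
    return t_pooled, df_pooled, t_welch, df_welch, f_folded


def normal_two_sided_p(t):
    """Two-sided p-value from the standard normal (large-df approximation)."""
    return math.erfc(abs(t) / math.sqrt(2))


# Summary statistics from the SAS TTEST output (Gender 0 vs. Gender 1).
t_p, df_p, t_w, df_w, f = two_sample_t(721, 95.5136, 1579, 96.3274, 4.1806)
print(f"pooled t = {t_p:.2f} (DF {df_p}), Welch t = {t_w:.2f} (DF {df_w:.1f})")
print(f"folded F = {f:.2f}, approx. p = {normal_two_sided_p(t_p):.3f}")
```

With n = 721 and 1579, standard deviations 95.5136 and 96.3274, and a difference of 4.1806, the sketch gives t ≈ 0.97 on 2298 and about 1405.7 degrees of freedom, a folded F of about 1.02, and a p-value of about 0.333. These values are in line with the output.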

Logistic Regression
Recall that Choice represents whether the customer purchased the book titled The Art History of Florence: 1 corresponds to a purchase and 0 to a non-purchase.
I want to see how the probability of purchase changes with the amount of money spent on BBBC books.
Multiple Regression
So far, I have used models with one response and one predictor variable. However, I suspected that such a model was not sufficient to describe a functional relationship between the response and the predictor. It gave clues about how the response behaves in the presence of particular predictors, but it did not give a predictive model. Hence, I will use multiple regression analysis to see whether I can find a predictive model using all available predictors.
The REG Procedure
Number of Observations Read   2300
Number of Observations Used   2300

Descriptive Statistics
Variable         Sum          Mean        Uncorrected SS   Variance     Standard Deviation
Intercept        2300.00000   1.00000     2300.00000       0            0
PurchasePeriod   45514        19.78870    1319690          182.26503    13.50056
Frequency        30600        13.30435    562776           67.70899     8.22855
Childbook        1676.00000   0.72870     3578.00000       1.02510      1.01247
Youthbook        788.00000    0.34261     1186.00000       0.39844      0.63122
Cookbook         1807.00000   0.78565     3993.00000       1.11932      1.05798
DIYbook          934.00000    0.40609     1542.00000       0.50575      0.71116
Artbook          759.00000    0.33000     1115.00000       0.37605      0.61323
Purchased        449137       195.27696   108925429        9229.80538   96.07188

Correlation
Variable         PurchasePeriod   Frequency   Childbook   Youthbook   Cookbook   DIYbook   Artbook   Purchased
PurchasePeriod   1.0000           0.5835      0.5000      0.3716      0.4702     0.3782    0.3617    0.3221
Frequency        0.5835           1.0000      -0.0149     -0.0149     -0.0205    -0.0245   -0.0097   -0.0104
Childbook        0.5000           -0.0149     1.0000      0.2796      0.2807     0.2304    0.2592    0.3009
Youthbook        0.3716           -0.0149     0.2796      1.0000      0.2416     0.2161    0.1427    0.2383
Cookbook         0.4702           -0.0205     0.2807      0.2416      1.0000     0.2377    0.2190    0.2869
DIYbook          0.3782           -0.0245     0.2304      0.2161      0.2377     1.0000    0.2102    0.2429
Artbook          0.3617           -0.0097     0.2592      0.1427      0.2190     0.2102    1.0000    0.2323
Purchased        0.3221           -0.0104     0.3009      0.2383      0.2869     0.2429    0.2323    1.0000
The REG Procedure
Model: FULL
Dependent Variable: Purchased
Number of Observations Read   2300
Number of Observations Used   2300

Analysis of Variance
Source            DF     Sum of Squares   Mean Square   F Value   Pr > F
Model             7      3925791          560827        74.33     <.0001
Error             2292   17293531         7545.17076
Corrected Total   2299   21219323

Root MSE          86.86294    R-Square   0.1850
Dependent Mean    195.27696   Adj R-Sq   0.1825
Coeff Var         44.48192

Parameter Estimates
Variable         DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept        1    158.08858            4.05424          38.99     <.0001
PurchasePeriod   1    1.42308              0.31972          4.45      <.0001
Frequency        1    -1.39450             0.38515          -3.62     0.0003
Childbook        1    10.32082             2.34115          4.41      <.0001
Youthbook        1    11.83323             3.26920          3.62      0.0003
Cookbook         1    9.48416              2.17098          4.37      <.0001
DIYbook          1    10.92493             2.94038          3.72      0.0002
Artbook          1    12.47801             3.33942          3.74      0.0002
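All predictor coefficients are statistically significant (every p-value is 0.0003 or smaller), and the overall F test is significant (F = 74.33, p < .0001), so the proposed model is useful. Note, however, that R-Square is only 0.1850: the predictors explain about 18.5% of the variation in Purchased.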
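The fit statistics in the Analysis of Variance table follow directly from the sums of squares. Below is a minimal Python sketch that assumes the usual OLS definitions, R-Square = SS Model / SS Total and Adj R-Sq = 1 - (1 - R-Square)(n - 1)/(n - p - 1). The helper name is hypothetical.

```python
import math


def regression_fit_stats(ss_model, ss_error, df_model, df_error, dep_mean):
    """Recompute the REG fit statistics from the ANOVA sums of squares."""
    ss_total = ss_model + ss_error
    mse = ss_error / df_error          # Mean Square for Error
    r2 = ss_model / ss_total
    n_minus_1 = df_model + df_error    # Corrected Total DF
    root_mse = math.sqrt(mse)
    return {
        "F": (ss_model / df_model) / mse,
        "R2": r2,
        "AdjR2": 1 - (1 - r2) * n_minus_1 / df_error,
        "RootMSE": root_mse,
        "CoeffVar": 100 * root_mse / dep_mean,
    }


stats = regression_fit_stats(3925791, 17293531, 7, 2292, 195.27696)
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

Feeding it the Model and Error rows (3925791 and 17293531 on 7 and 2292 DF) gives F ≈ 74.33, R-Square ≈ 0.1850, Adj R-Sq ≈ 0.1825, and Root MSE ≈ 86.86.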
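The predictive model is:
$$E(\text{Purchased} \mid X = x) = 158.09 + 1.423\,\text{PurchasePeriod} - 1.394\,\text{Frequency} + 10.321\,\text{Childbook} + 11.833\,\text{Youthbook} + 9.484\,\text{Cookbook} + 10.925\,\text{DIYbook} + 12.478\,\text{Artbook}$$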
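To use the fitted equation for scoring, plug a customer's predictor values into it. The Python sketch below hard-codes the parameter estimates from the SAS output. The dictionary keys mirror the SAS variable names, and the function name is hypothetical. An OLS fit with an intercept passes through the point of means, so evaluating the equation at the means from the Descriptive Statistics table should return approximately the Dependent Mean, 195.28.

```python
# Parameter estimates copied from the SAS REG output (model FULL).
INTERCEPT = 158.08858
COEFFICIENTS = {
    "PurchasePeriod": 1.42308,
    "Frequency": -1.39450,
    "Childbook": 10.32082,
    "Youthbook": 11.83323,
    "Cookbook": 9.48416,
    "DIYbook": 10.92493,
    "Artbook": 12.47801,
}


def predict_purchased(customer):
    """Predicted Purchased for one customer (dict of predictor values)."""
    return INTERCEPT + sum(coef * customer[name] for name, coef in COEFFICIENTS.items())


# Predictor means from the Descriptive Statistics table.
means = {
    "PurchasePeriod": 19.78870,
    "Frequency": 13.30435,
    "Childbook": 0.72870,
    "Youthbook": 0.34261,
    "Cookbook": 0.78565,
    "DIYbook": 0.40609,
    "Artbook": 0.33000,
}
print(round(predict_purchased(means), 2))
```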
The residual plot is close to a null plot (no visible pattern), suggesting that the model's assumptions are reasonable.
Annotated SAS Program

Dave began a series of live market tests, each involving a random sample of customers from the database. An offer for the current book selection is sent to the sample. The sample customers' responses (purchase or no purchase) are then recorded and used to calibrate a response model for the current offering. The response model's results are used to score the remaining customers in the database and to select customers from the full customer database for the rollout mailing campaign.

Dave's first market tests relied on RFM (recency, frequency, monetary) analysis. Direct marketers have used this approach to predict customer behavior for more than 50 years. The approach is intuitive and easy to implement, and it produced significant improvements in response rates and profits compared with mass mailings to BookBinders' full database. Despite this initial success, Dave is eager to evaluate the effectiveness of alternative approaches. BookBinders offers books in different categories, including cooking, art and children's books. The number of previous book purchases in each category is recorded in each customer's record in the database. RFM analysis does not use this information or other customer information such as gender, and Dave suspects that a more sophisticated modeling approach could yield superior results.
Logistic regression offers a powerful method for modeling response. It is similar to linear regression; the key difference is that the dependent variable is binary (for example, purchase or no purchase) rather than continuous. For each customer, logistic regression predicts a probability of purchase or response between 0 and 1, which can be used for targeting and prediction decisions. Like linear regression, it can accommodate both continuous and categorical predictors, including interaction terms. Its use in database marketing has grown as software has become more readily available and familiarity with the approach has increased.

Dave has just received a dataset containing the responses of a random sample of
50,000 customers to a new offering from BookBinders titled The Art History of
Florence. Dave is eager to assess the potential value of logistic regression as a
method for predicting customer response and has asked you to complete the following
analyses.

1. Logistic Regression
Estimate a logistic regression model using BUYER as the dependent variable and the following as predictor variables. (Use Analyze/Regression/Binary Logistic in SPSS. Save the predicted probabilities by clicking the Save button and then Probabilities under Predicted Values.)
LAST
TOTAL$
GENDER
CHILD
YOUTH
COOK
DO_IT
REFERNCE
ART
GEOG

Technical Note:
PURCH is excluded from the set of predictor variables. Including it would lead to perfect collinearity, since PURCH (the number of books purchased) equals the sum of the number of books purchased in the 7 categories. Because the number of purchases in each category is already included, there is no need to include the total number of purchases.
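The collinearity argument can be checked numerically: appending a column equal to the sum of the category columns adds a column without adding rank. The Python sketch below uses exact Gaussian elimination on a handful of hypothetical customers. The counts are invented for illustration; only the column structure mirrors the note.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a matrix (list of equal-length rows) by exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below `rank` with a non-zero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


# Hypothetical customers: counts for CHILD, YOUTH, COOK, DO_IT, REFERNCE, ART, GEOG.
categories = [
    [0, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 1],
    [2, 1, 0, 3, 0, 1, 0],
]
X = [[1] + row for row in categories]                      # intercept + 7 categories
X_purch = [[1] + row + [sum(row)] for row in categories]   # ... plus PURCH = row total
print(matrix_rank(X), "of", len(X[0]), "columns")
print(matrix_rank(X_purch), "of", len(X_purch[0]), "columns")
```

The first design has full column rank (8 of 8). Adding PURCH gives 9 columns but still rank 8, which is exactly the perfect collinearity the note describes.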
a) Summarize and interpret the results so that a marketing manager can understand them. Which variables are significant? Which seem to be important? Interpret the coefficient of each predictor.

2. Decile Analysis of Logistic Regression Results

a) Assign each customer to a decile based on his or her predicted probability of purchase. (Hint: use the largest values to create deciles.)

b) Create a bar chart plotting the response rate by decile. (Hint: use deciles as the Category Axis and mean Bought on the vertical axis.)

c) Generate a report showing the number of customers, the number of buyers of The Art History of Florence, and the response rate to the offer by decile. (Hint: use Case Summaries; be sure to uncheck Display Cases.)

d) Generate a report showing the mean values of the following variables by probability-of-purchase decile:
Total $ spent
Months since last purchase, and
Number of books purchased for the seven categories (i.e., children, youth, cookbooks, do-it-yourself, reference, art and geography).
(Hint: use Case Summaries; be sure to uncheck Display Cases.)

e) Summarize and interpret the decile analysis results.
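Steps a) and c) can be sketched in a few lines of Python. The sketch assumes that decile 1 holds the customers with the highest predicted probabilities, following the hint, and that ties are broken by sort order; SPSS's own ranking may treat ties differently. The function names and the 20-customer example are hypothetical.

```python
def assign_deciles(probs):
    """Decile per customer: 1 = highest predicted probability, 10 = lowest."""
    n = len(probs)
    order = sorted(range(n), key=lambda i: probs[i], reverse=True)
    deciles = [0] * n
    for position, i in enumerate(order):
        deciles[i] = position * 10 // n + 1
    return deciles


def response_by_decile(deciles, bought):
    """Number of customers, number of buyers and response rate for each decile."""
    counts = {}
    for d, b in zip(deciles, bought):
        customers, buyers = counts.get(d, (0, 0))
        counts[d] = (customers + 1, buyers + b)
    return {d: (c, k, k / c) for d, (c, k) in sorted(counts.items())}


# Hypothetical scored file: 20 customers, predicted probabilities and actual purchases.
probs = [i / 20 for i in range(20)]
bought = [1 if i >= 16 else 0 for i in range(20)]
for decile, (customers, buyers, rate) in response_by_decile(assign_deciles(probs), bought).items():
    print(decile, customers, buyers, f"{rate:.0%}")
```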

3. Lift and Cumulative Lift

a) Use the information from the report in 2c) above to create a chart showing the lift and cumulative lift for each decile. Recall that the lift for a decile is the response rate for that decile divided by the overall response rate. Similarly, cumulative lift is the cumulative response rate (total buyers divided by total customers, up to and including that decile) divided by the overall response rate. You may want to use Excel for these calculations.

b) Create a graph showing the cumulative lift by decile.
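Below is a minimal Python sketch of the lift calculations. It takes per-decile customer and buyer counts like those in the report from 2c), and it assumes the cumulative response rate is cumulative buyers divided by cumulative customers. The function name and the sample counts are hypothetical.

```python
def lift_table(counts):
    """counts: (customers, buyers) per decile, decile 1 first.
    Returns (response rate, lift, cumulative lift) per decile."""
    overall_rate = sum(b for _, b in counts) / sum(c for c, _ in counts)
    rows, cum_customers, cum_buyers = [], 0, 0
    for customers, buyers in counts:
        cum_customers += customers
        cum_buyers += buyers
        rate = buyers / customers
        cum_rate = cum_buyers / cum_customers
        rows.append((rate, rate / overall_rate, cum_rate / overall_rate))
    return rows


# Hypothetical decile report: 100 customers per decile.
buyers_per_decile = [30, 20, 10, 10, 10, 5, 5, 5, 3, 2]
for d, (rate, lift, cum_lift) in enumerate(lift_table([(100, b) for b in buyers_per_decile]), start=1):
    print(f"decile {d}: rate {rate:.2f}, lift {lift:.2f}, cumulative lift {cum_lift:.2f}")
```

The last cumulative lift is always 1, because the cumulative response rate over all deciles equals the overall response rate.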

4. Gains, Cumulative Gains and Banana Charts


a) Use the information from the report in 2c) above to create a chart showing the gains and cumulative gains for each decile. Recall that the gain for a decile is the proportion of all responders who are in that decile. Similarly, cumulative gains are the sum of gains up through that decile. You may want to use Excel for these calculations.

b) Create a banana chart showing the cumulative gains by decile along with a reference line corresponding to no model. Interpret the banana chart.
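The gains and the no-model reference line for the banana chart can be computed in the same way. The Python sketch below takes buyers per decile, with decile 1 first. The names and counts are hypothetical. The no-model line assumes that a random mailing captures k/10 of the buyers in the first k deciles.

```python
def gains_table(buyers_per_decile):
    """Gain and cumulative gain per decile (decile 1 first)."""
    total = sum(buyers_per_decile)
    gains, cumulative, running = [], [], 0
    for buyers in buyers_per_decile:
        running += buyers
        gains.append(buyers / total)
        cumulative.append(running / total)
    return gains, cumulative


def no_model_line(n_deciles=10):
    """Reference line for a random (no-model) mailing: first k deciles capture k/n of buyers."""
    return [k / n_deciles for k in range(1, n_deciles + 1)]


buyers_per_decile = [30, 20, 10, 10, 10, 5, 5, 5, 3, 2]  # hypothetical counts
gains, cumulative = gains_table(buyers_per_decile)
for d, (g, c, ref) in enumerate(zip(gains, cumulative, no_model_line()), start=1):
    print(f"decile {d}: gain {g:.2f}, cumulative gain {c:.2f}, no model {ref:.2f}")
```

The farther the cumulative-gains curve bows above the no-model line, the more of the buyers the model concentrates in the top deciles. Both lines meet at 100% in decile 10.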

5. Profitability Analysis

Use the following cost information to assess the profitability of using logistic
regression to determine which customers will receive a specific offer:

Cost to mail offer to customer: $0.50
Selling price (shipping included): $18.00
Wholesale price paid by BookBinders: $9.00
Shipping costs: $3.00

a) Create a new variable (call it MAIL) with a value of 1 if the customer's predicted probability is .083 or greater, and 0 otherwise. The breakeven response rate is the mailing cost divided by the margin per book sold, 0.50 / (18.00 - 9.00 - 3.00) = 0.50 / 6.00 ≈ 8.3%. This variable will therefore be used to determine who gets mailed the offer and who doesn't.

b) Generate a report summarizing the number of customers, the number of buyers of The Art History of Florence, and the response rate to the offer by the MAIL variable.

c) What would the gross profit (in dollars, and also as a percentage of gross sales) and the return on marketing expenditures have been if BookBinders had mailed the offer to buy The Art History of Florence only to customers with a predicted probability of buying of 8.3% or greater?
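The breakeven rate and the profit figures in c) reduce to simple arithmetic on the cost information above. The Python sketch below makes two assumptions. First, gross profit means sales minus wholesale cost, shipping, and mailing cost. Second, return on marketing expenditures is gross profit divided by total mailing cost. The function name and the counts in the usage line are hypothetical.

```python
# Cost information from the case.
COST_PER_MAIL = 0.50
SELLING_PRICE = 18.00
WHOLESALE_PRICE = 9.00
SHIPPING_COST = 3.00

MARGIN_PER_SALE = SELLING_PRICE - WHOLESALE_PRICE - SHIPPING_COST  # 6.00 per book sold
BREAKEVEN_RATE = COST_PER_MAIL / MARGIN_PER_SALE                    # about 0.083


def campaign_profit(n_mailed, n_buyers):
    """Profit figures for mailing n_mailed customers, n_buyers of whom buy.
    Assumes gross profit = sales - wholesale - shipping - mailing cost and
    return on marketing = gross profit / mailing cost."""
    sales = n_buyers * SELLING_PRICE
    mailing_cost = n_mailed * COST_PER_MAIL
    profit = n_buyers * MARGIN_PER_SALE - mailing_cost
    return {
        "gross_sales": sales,
        "gross_profit": profit,
        "profit_pct_of_sales": 100 * profit / sales if sales else 0.0,
        "return_on_marketing_pct": 100 * profit / mailing_cost if mailing_cost else 0.0,
    }


print(f"breakeven response rate: {BREAKEVEN_RATE:.1%}")
print(campaign_profit(n_mailed=1000, n_buyers=100))  # hypothetical counts
```

Substituting the counts from the MAIL = 1 row of the report in b) gives the figures that c) asks for.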

6. What are the key learning points from this assignment? What are the
managerial implications of your findings?
