
Model Answers for Chapter 7: CLASSIFICATION AND REGRESSION TREES

To remain within the limitation of 30 predictors, we will combine some categories of
categorical variables (see the pivot table sheet and the “Data_Partition1” sheet in
7.1_eBayAuctions.xls).

Answer to 7.1.a:

Refer to the CT_PruneTree1 sheet in 7.1_eBayAuctions.xls.


Description:

If (open price < 2.4498) and (close price < 2.025) then class=0.
If (open price < 2.4498) and (close price > 2.025) then class=1.
If (2.4498 < open price < 4.9191) and (close price < 4.06) then class=0.
If (2.4498 < open price < 4.9191) and (4.06 < close price < 10) then class=1.
If (open price > 4.9191) and (close price < 10) then class=0.
If (2.4498 < open price < 11.0668) and (close price > 10) then class=1.
If (open price > 11.0668) and (close price < 39.995) then class=0.
If (open price > 11.0668) and (close price > 39.995) and (seller rating < 1107.5) then class=1.
If (open price > 11.0668) and (close price > 39.995) and (seller rating > 1107.5) then class=0.

Answer to 7.1.b:
No, because the close price is not known at the start of the auction.

Answer to 7.1.c:

In general, we see that auctions with low close prices, relative to open prices, tend
not to be competitive (they attract fewer than two bids). This is not interesting,
because obviously a single bid is likely to lead to a low close price. The most
interesting/surprising rule is the one that involves seller rating: among auctions with
high open and close prices, the competitive ones are associated with low seller
ratings. In other words, less experienced sellers who had high open and close prices
(i.e., expensive items) tended to generate competitive auctions.

Answer to 7.1.d:
Refer to the CT_PruneTree2 sheet in 7.1_eBayAuctions.xls.

Description:

If (open price < 1.23) then class=1.
If (1.23 < open price < 1.7373) then class=0.
If (1.7373 < open price < 2.4498) then class=1.
If (2.4498 < open price < 9.97) and (seller rating < 557) then class=0.
If (open price > 9.97) and (seller rating < 557) then class=1.
If (open price > 2.4498) and (557 < seller rating < 2295) then class=0.
If (open price > 2.4498) and (2295 < seller rating < 2370) then class=1.
If (open price > 2.4498) and (2370 < seller rating < 3350) then class=0.
If (open price > 2.4498) and (seller rating > 3350) then class=0.
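
The rules above can be written as a small function, which makes it easy to score a
new auction by hand. This is only a sketch of the listed rules, not the output of the
software: the function name is hypothetical, the fourth rule is read as
2.4498 < open price < 9.97, and values exactly on a split point are assumed to fall
into the upper branch (the rule list does not say which side they belong to).

```python
def predict_competitive(open_price: float, seller_rating: float) -> int:
    """Return 1 (competitive) or 0 (non-competitive) using the listed rules.

    Assumption: a value exactly equal to a split point goes to the upper branch.
    """
    # Low opening prices are decided by opening price alone.
    if open_price < 1.23:
        return 1
    if open_price < 1.7373:
        return 0
    if open_price < 2.4498:
        return 1

    # From here on open_price >= 2.4498, and seller rating drives the rule.
    if seller_rating < 557:
        # Low-rated sellers: only the higher opening prices are class 1.
        return 1 if open_price >= 9.97 else 0
    if 2295 <= seller_rating < 2370:
        return 1
    # Ratings in [557, 2295) and ratings of 2370 or more are all class 0.
    return 0


if __name__ == "__main__":
    # Made-up example auctions: (open price, seller rating).
    for auction in [(0.01, 100), (1.50, 100), (12.00, 100), (12.00, 5000)]:
        print(auction, predict_competitive(*auction))
```

Note that the last two rules both assign class 0, so they collapse into a single
branch in the code.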

Answer to 7.1.e:
Refer to the “Scatter plot” sheet in 7.1_eBayAuctions.xls.

A clearer visualization is achieved by plotting both variables on the log-scale (see
the figure below).

[Figure: scatter plot of Open price (vertical axis, log scale, 0.01 to 1000) against
Seller Rating (horizontal axis, log scale, 1 to 100000). Competitive and
Non-Competitive auctions are marked separately, with the tree splits split1 to split8
overlaid.]

Although it is hard to see a clear separation between the competitive and non-
competitive auctions on the scatter plot, we see a set of competitive auctions with
opening price > 10 held by sellers with rating < 1000. This is surprising, because we
would expect higher seller ratings to be associated with a higher chance of
competitive auctions. We also see that the bulk of auctions with opening price < $1
or so are competitive, and this is not surprising (lower opening bids attract bidders).

Answer to 7.1.f:

Lift-chart:

[Figure: Lift chart (validation dataset). Horizontal axis: # cases (0 to 1000).
Vertical axis: Cumulative (0 to 500). Two curves: cumulative Competitive? when sorted
using predicted values, and cumulative Competitive? using average.]

From the lift chart we see that the model’s predictive performance (i.e. correctly
capturing the auctions that are most likely to be competitive) is better than the
baseline model, since its lift curve is higher than that of the baseline model.
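
The two curves in the lift chart can be reproduced with a few lines of code. The
sketch below is illustrative only: the function name and the six-record toy data are
made up, and the actual validation records are not reproduced here. The model curve
accumulates actual outcomes after sorting by predicted probability; the baseline
accumulates the average rate of competitive auctions.

```python
def cumulative_gains(actual, predicted_prob):
    """Return (model_curve, baseline_curve) for a cumulative lift chart.

    actual         -- list of 0/1 outcomes (1 = competitive)
    predicted_prob -- list of predicted probabilities of class 1
    """
    # Sort case indices from highest to lowest predicted probability.
    order = sorted(range(len(actual)), key=lambda i: predicted_prob[i], reverse=True)

    # Model curve: running count of actual 1s in that sorted order.
    model_curve = []
    running = 0
    for i in order:
        running += actual[i]
        model_curve.append(running)

    # Baseline: each case contributes the overall average rate of 1s.
    rate = sum(actual) / len(actual)
    baseline_curve = [rate * (k + 1) for k in range(len(actual))]
    return model_curve, baseline_curve


if __name__ == "__main__":
    # Toy data, invented for illustration.
    actual = [1, 0, 1, 1, 0, 0]
    prob = [0.9, 0.2, 0.8, 0.6, 0.4, 0.1]
    model, baseline = cumulative_gains(actual, prob)
    print(model)     # model curve
    print(baseline)  # baseline curve
```

A model curve that lies above the baseline for most of the range corresponds to the
situation described above.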

Training Data scoring - Summary Report

Cut off Prob.Val. for Success (Updatable): 0.5

Classification Confusion Matrix
                  Predicted Class
Actual Class         1         0
1                  430       201
0                   79       473

Error Report
Class       # Cases   # Errors   % Error
1               631        201     31.85
0               552         79     14.31
Overall        1183        280     23.67

Validation Data scoring - Summary Report (Using Best Pruned Tree)

Cut off Prob.Val. for Success (Updatable): 0.5

Classification Confusion Matrix
                  Predicted Class
Actual Class         1         0
1                  279       156
0                   55       299

Error Report
Class       # Cases   # Errors   % Error
1               435        156     35.86
0               354         55     15.54
Overall         789        211     26.74

From the classification matrices we see that the overall accuracy is 76.33% for the
training data set and 73.26% for the validation data set, both well above the roughly
53%-55% that would result from classifying every auction as competitive. In both
cases the error percentage is higher for competitive auctions (about 32%-36%
misclassified) than for non-competitive auctions (about 14%-16% misclassified).
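
The error percentages and accuracies quoted here follow directly from the four counts
in each confusion matrix. As a quick check, the sketch below recomputes them; the
function and key names are hypothetical, and only the counts come from the reports
above.

```python
def error_report(n11, n10, n01, n00):
    """Compute error percentages from confusion-matrix counts.

    n11 -- actual 1, predicted 1    n10 -- actual 1, predicted 0
    n01 -- actual 0, predicted 1    n00 -- actual 0, predicted 0
    """
    cases_1 = n11 + n10          # all actual class-1 records
    cases_0 = n01 + n00          # all actual class-0 records
    total = cases_1 + cases_0
    errors = n10 + n01           # every misclassified record
    return {
        "class_1_error": round(100 * n10 / cases_1, 2),
        "class_0_error": round(100 * n01 / cases_0, 2),
        "overall_error": round(100 * errors / total, 2),
        "accuracy": round(100 * (total - errors) / total, 2),
    }


if __name__ == "__main__":
    print("training:  ", error_report(430, 201, 79, 473))
    print("validation:", error_report(279, 156, 55, 299))
```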

Answer to 7.1.g:

To get a competitive auction, the most important factor controlled by the seller is the
opening price, with lower opening prices attracting more bidders. From the tree we
see that if the opening price < $1.23, then it will lead to a competitive
auction. In particular, it appears that setting the opening price to the minimum of
$0.01 (which is eBay’s default) is most likely to lead to a competitive auction.
