
BAUDM

ASSIGNMENT
Assignment 2

QUESTION: Competitive Auctions on eBay.com.


The file eBayAuctions.xls contains information on 1972 auctions
transacted on eBay.com during May-June 2004. The goal is to
use these data to build a model that will classify competitive
auctions from noncompetitive ones. A competitive auction
is defined as an auction with at least two bids placed on the
item auctioned. The data include variables that describe the
item (auction category), the seller (his/her eBay rating), and the
auction terms that the seller selected (auction duration,
opening price, currency, day-of-week of auction close).
In addition, we have the price at which the auction closed. The
goal is to predict whether or not the auction will be competitive.
ANSWER:
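* Fix the random seed so the partition is reproducible.
SET SEED=14091992.
USE ALL.
* Flag roughly 70% of the cases as the selected (training) sample.
COMPUTE filter_$=(uniform(1)<=.70).
VARIABLE LABELS filter_$ 'Approximately 70% of the cases (SAMPLE)'.
FORMATS filter_$ (f1.0).
FILTER BY filter_$.
EXECUTE.
* Switch the filter off again. The variable filter_$ stays in the data set.
FILTER OFF.
USE ALL.
EXECUTE.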
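
For readers without SPSS, the partitioning idea can be sketched in Python. This is only an illustrative analogue: the function name `partition_mask` is hypothetical, and Python's random generator differs from the SPSS one, so the same seed will not select the same cases. In this report the SPSS split produced 1407 selected and 565 unselected cases.

```python
import random

def partition_mask(n_cases, train_share=0.70, seed=14091992):
    """Return one flag per case; True marks a 'selected' (training) case."""
    rng = random.Random(seed)
    # Each case is selected independently with probability train_share,
    # mirroring COMPUTE filter_$=(uniform(1)<=.70) in the SPSS syntax.
    return [rng.random() <= train_share for _ in range(n_cases)]

mask = partition_mask(1972)
n_selected = sum(mask)
print("selected:", n_selected, "unselected:", len(mask) - n_selected)
```

Because each case is drawn independently, the selected share is only approximately 70%, which is why the SPSS label says "Approximately 70% of the cases".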

Submitted By: Pranav Aggarwal, 14PGP030

1.Discriminant Analysis

Wilks' Lambda
Test of Function(s)   Wilks' Lambda   Chi-square   df   Sig.
1                     .924            111.572      4    .000

Standardized Canonical Discriminant Function Coefficients
                Function 1
sellerRating    -.142
ClosePrice      1.128
OpenPrice       -.926
Duration        -.039


Classification Results (a, b)
                                                Predicted Group Membership
Cases                     Measure  Competitive?     0        1       Total
Selected (Original)       Count    0                628      26      654
                                   1                452      301     753
                          %        0                96.0     4.0     100.0
                                   1                60.0     40.0    100.0
Not Selected (Original)   Count    0                245      7       252
                                   1                169      144     313
                          %        0                97.2     2.8     100.0
                                   1                54.0     46.0    100.0
a. 66.0% of selected original grouped cases correctly classified.
b. 68.8% of unselected original grouped cases correctly classified.
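According to the analysis, Wilks' lambda is .924, which is close to 1. The
discriminant function therefore separates the two groups only weakly, even
though the chi-square test is significant (Sig. = .000), so discriminant
analysis may not give an accurate model here.
The confusion matrix points the same way: only 66.0% of the selected cases
are classified correctly, and 60.0% of the competitive auctions are
misclassified as noncompetitive.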
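
As a check on the figures above, the accuracy and the per-class rates can be recomputed from the counts in the Classification Results table. The sketch below is a minimal illustration; the helper name `rates` and the variable names are hypothetical. It also uses the fact that, with a single discriminant function, 1 minus Wilks' lambda equals the squared canonical correlation.

```python
def rates(tn, fp, fn, tp):
    """Overall accuracy, specificity and sensitivity from a 2x2 confusion matrix."""
    total = tn + fp + fn + tp
    return {
        "accuracy": (tn + tp) / total,
        "specificity": tn / (tn + fp),  # noncompetitive auctions classified correctly
        "sensitivity": tp / (fn + tp),  # competitive auctions classified correctly
    }

# Counts copied from the Classification Results table.
selected = rates(tn=628, fp=26, fn=452, tp=301)
unselected = rates(tn=245, fp=7, fn=169, tp=144)

# With one discriminant function, 1 - lambda is the share of variance in the
# discriminant scores that is explained by group membership.
wilks_lambda = 0.924
explained_share = 1 - wilks_lambda

for name, r in (("selected", selected), ("unselected", unselected)):
    print(name, {k: round(100 * v, 1) for k, v in r.items()})
print("explained share:", round(explained_share, 3))
```

The low sensitivity shows where the model fails: it labels most auctions as noncompetitive, so the high specificity comes at the cost of missing the majority of competitive auctions.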
2.Logistic Regression
Block 0: Beginning Block

Classification Table (a, b)
                                Selected Cases (c)          Unselected Cases (d)
                                Predicted                   Predicted
Observed                        0     1     % Correct       0     1     % Correct
Step 0   Competitive?   0       0     654   .0              0     252   .0
                        1       0     753   100.0           0     313   100.0
         Overall Percentage                 53.5                        55.4
a. Constant is included in the model.
b. The cut value is .500
c. Selected cases Approximately 70% of the cases (SAMPLE) EQ 1
d. Unselected cases Approximately 70% of the cases (SAMPLE) NE 1


Block 1: Method = Enter

Classification Table (a)
                                Selected Cases (b)          Unselected Cases (c)
                                Predicted                   Predicted
Observed                        0     1     % Correct       0     1     % Correct
Step 1   Competitive?   0       526   128   80.4            203   49    80.6
                        1       193   560   74.4            70    243   77.6
         Overall Percentage                 77.2                        78.9
a. The cut value is .500
b. Selected cases Approximately 70% of the cases (SAMPLE) EQ 1
c. Unselected cases Approximately 70% of the cases (SAMPLE) NE 1

Variables in the Equation
Step 1 (a)      B        S.E.     Wald      df   Sig.   Exp(B)
Category                          53.801    17   .000
Category(1)     .141     .276     .262      1    .609   1.151
Category(2)     -1.029   .332     9.593     1    .002   .357
Category(3)     -.095    .400     .056      1    .813   .910
Category(4)     .168     .617     .074      1    .786   1.182
Category(5)     -1.321   .382     11.968    1    .001   .267
Category(6)     -1.149   .576     3.977     1    .046   .317
Category(7)     .000     .249     .000      1    .999   1.000
Category(8)     -.077    .574     .018      1    .893   .926
Category(9)     .974     .537     3.289     1    .070   2.649
Category(10)    -1.609   .715     5.064     1    .024   .200
Category(11)    -1.588   .459     11.957    1    .001   .204
Category(12)    -.139    .338     .170      1    .680   .870
Category(13)    -.446    .355     1.576     1    .209   .640
Category(14)    -.074    .228     .106      1    .745   .928
Category(15)    .823     1.269    .421      1    .517   2.277
Category(16)    .064     .673     .009      1    .925   1.066
Category(17)    -.528    .380     1.929     1    .165   .590
currency                          10.680    2    .005
currency(1)     -.536    .239     5.042     1    .025   .585
currency(2)     .953     .532     3.205     1    .073   2.593
sellerRating    .000     .000     7.872     1    .005   1.000
Duration                          10.442    4    .034
Duration(1)     -1.627   .846     3.694     1    .055   .197
Duration(2)     -.327    .344     .904      1    .342   .721
Duration(3)     -.059    .302     .038      1    .845   .943
Duration(4)     -.531    .267     3.965     1    .046   .588
endDay                            16.707    6    .010
endDay(1)       .408     .408     1.000     1    .317   1.504
endDay(2)       .977     .400     5.966     1    .015   2.657
endDay(3)       .185     .405     .208      1    .648   1.203
endDay(4)       .234     .397     .347      1    .556   1.264
endDay(5)       .074     .550     .018      1    .893   1.077
endDay(6)       .408     .411     .982      1    .322   1.503
ClosePrice      .091     .009     97.425    1    .000   1.095
OpenPrice       -.105    .010     102.684   1    .000   .901
Constant        .013     .443     .001      1    .977   1.013
a. Variable(s) entered on step 1: Category, currency, sellerRating,
Duration, endDay, ClosePrice, OpenPrice.

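According to the above analysis, accuracy on the selected cases improves from
53.5% for the constant-only model in Block 0, which assigns every auction to
the majority class (competitive), to 77.2% once the predictors are entered in
Block 1. Accuracy on the unselected cases is similar, at 78.9%.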

Also, not all variables are significant. The terms that are significant at the
5% level (Sig. < .05), with their B coefficients, are:

Term            B
Category(2)     -1.029
Category(5)     -1.321
Category(6)     -1.149
Category(10)    -1.609
Category(11)    -1.588
currency        (overall test)
currency(1)     -0.536
sellerRating    0.000
Duration        (overall test)
Duration(4)     -0.531
endDay          (overall test)
endDay(2)       0.977
ClosePrice      0.091
OpenPrice       -0.105
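
To read these coefficients, it helps to convert each B into an odds ratio, Exp(B) = e^B, which is the factor by which the odds of a competitive auction change for a one-unit increase in the variable (for a dummy term, relative to that variable's reference category). The following is a minimal sketch using the B values listed above; the dictionary name `coefficients` is hypothetical, and small differences from the SPSS Exp(B) column come from the rounding of B to three decimals.

```python
import math

# B coefficients of the individually significant terms, copied from the
# Variables in the Equation table.
coefficients = {
    "Category(2)": -1.029,
    "Category(5)": -1.321,
    "Category(6)": -1.149,
    "Category(10)": -1.609,
    "Category(11)": -1.588,
    "currency(1)": -0.536,
    "Duration(4)": -0.531,
    "endDay(2)": 0.977,
    "ClosePrice": 0.091,
    "OpenPrice": -0.105,
}

for name, b in coefficients.items():
    odds_ratio = math.exp(b)             # Exp(B) in the SPSS output
    pct_change = 100 * (odds_ratio - 1)  # percent change in the odds per unit
    print(f"{name:13s} B={b:+.3f}  Exp(B)={odds_ratio:.3f}  odds {pct_change:+.1f}%")
```

For example, each additional unit of OpenPrice multiplies the odds of a competitive auction by about 0.90, while each additional unit of ClosePrice multiplies them by about 1.10.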

3.Tree

Classification
                                  Predicted
Sample     Observed               0        1        Percent Correct
Training   0                      566      88       86.5%
           1                      135      618      82.1%
           Overall Percentage     49.8%    50.2%    84.2%
Test       0                      212      40       84.1%
           1                      45       268      85.6%
           Overall Percentage     45.5%    54.5%    85.0%
Growing Method: CHAID
Dependent Variable: Competitive?

So according to the analysis, most important relation is competitive.


The overall accuracy of this model is 84.2% on the training sample and 85.0%
on the test sample. Neither discriminant analysis (66.0%) nor logistic
regression (77.2%) reaches this accuracy, so the tree is kept for the final
comparison.
4.Neural Networks

Classification
                               Predicted
Sample     Observed            0        1        Percent Correct
Training   0                   551      103      84.3%
           1                   124      629      83.5%
           Overall Percent     48.0%    52.0%    83.9%
Testing    0                   215      37       85.3%
           1                   50       263      84.0%
           Overall Percent     46.9%    53.1%    84.6%
Dependent Variable: Competitive?

Area Under the Curve
                    Area
Competitive?   0    .907
               1    .907

The ROC curve plots sensitivity against 1 - specificity. The closer the curve
lies to the top-left corner, the better the model separates the two classes.
Area under the curve = 0.907
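
To make the definition concrete, the sketch below builds ROC points from predicted probabilities and computes the area with the trapezoidal rule. The labels and scores are made up for illustration only; they are not the auction data, and the function names `roc_points` and `auc` are hypothetical.

```python
def roc_points(labels, scores):
    """Return (1 - specificity, sensitivity) pairs, one per distinct threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    # Lower the threshold step by step; every case scoring at or above it
    # is predicted as class 1.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Made-up labels and predicted probabilities, for illustration only.
toy_labels = [1, 1, 0, 1, 0, 1, 0, 0]
toy_scores = [0.95, 0.85, 0.80, 0.70, 0.55, 0.45, 0.30, 0.10]

curve = roc_points(toy_labels, toy_scores)
toy_auc = auc(curve)
print(round(toy_auc, 4))
```

An area of 0.5 corresponds to random guessing and 1.0 to perfect separation, so the reported 0.907 indicates that the neural network ranks competitive auctions above noncompetitive ones most of the time.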
From the classification table, the overall accuracy of the neural network on
the training sample is 83.9%.
Comparing all four models, the highest accuracy is 84.2%, from the tree.
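
The comparison can be reproduced from the correctly classified counts in the four classification tables. This is a small sketch under the assumption that all four procedures used the same 1407 selected and 565 unselected cases, which the row totals in the tables suggest; the names `training`, `testing` and `accuracy_table` are hypothetical.

```python
# (correct on class 0, correct on class 1, total cases) per model,
# copied from the classification tables in sections 1 to 4.
training = {
    "Discriminant analysis": (628, 301, 1407),
    "Logistic regression": (526, 560, 1407),
    "Tree (CHAID)": (566, 618, 1407),
    "Neural network": (551, 629, 1407),
}
testing = {
    "Discriminant analysis": (245, 144, 565),
    "Logistic regression": (203, 243, 565),
    "Tree (CHAID)": (212, 268, 565),
    "Neural network": (215, 263, 565),
}

def accuracy_table(results):
    """Percent of cases classified correctly, rounded to one decimal."""
    return {model: round(100 * (c0 + c1) / n, 1)
            for model, (c0, c1, n) in results.items()}

train_acc = accuracy_table(training)
test_acc = accuracy_table(testing)
best_train = max(train_acc, key=train_acc.get)
best_test = max(test_acc, key=test_acc.get)

for model in training:
    print(f"{model:22s} training {train_acc[model]:5.1f}%  test {test_acc[model]:5.1f}%")
print("best on training:", best_train, "| best on test:", best_test)
```

The same ranking holds on the holdout cases, where the tree and the neural network are again close and well ahead of the other two models.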
