[Figure: Scatter Diagram for Cluster Observations. Vertical axis: frequency of eating out (low to high); horizontal axis: frequency of going to fast food restaurants (low to high).]
Criticisms of Cluster Analysis
The following must be addressed by
conceptual rather than empirical
support:
$d(x, y) = \sum_{i=1}^{n} |x_i - y_i|$
Chebyshev distance
$D(p, q) := \max_i |p_i - q_i|$
$D = \max(|x_2 - x_1|, |y_2 - y_1|)$
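
The two formulas above, the sum of absolute coordinate differences (commonly called the city-block or Manhattan distance) and the Chebyshev distance, can be sketched in a few lines of Python. This is only an illustration: the function names and the sample points are assumptions, not part of the slides.

```python
def manhattan(x, y):
    """Sum of absolute coordinate differences: sum_i |x_i - y_i|."""
    if len(x) != len(y):
        raise ValueError("points must have the same number of coordinates")
    return sum(abs(a - b) for a, b in zip(x, y))


def chebyshev(p, q):
    """Largest absolute coordinate difference: max_i |p_i - q_i|."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    return max(abs(a - b) for a, b in zip(p, q))


# Hypothetical sample points, chosen only for illustration.
p, q = (1, 2), (4, 6)
print(manhattan(p, q))  # 3 + 4 = 7
print(chebyshev(p, q))  # max(3, 4) = 4
```

For the same pair of points, the Chebyshev distance never exceeds the Manhattan distance, because the largest single difference is one of the terms in the sum.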
Mahalanobis distance
The Mahalanobis distance is a
measure of the distance between a
point P and a distribution D.
It is a multi-dimensional
generalization of the idea of
measuring how many standard
deviations away P is from the mean of
D. This distance is zero if P is at the
mean of D, and grows as P moves
away from the mean along each
principal component axis. If each of these
axes is rescaled to have unit variance,
then the Mahalanobis distance
corresponds to the standard Euclidean
distance in the transformed space.
The Mahalanobis distance is thus unitless
and scale-invariant, and it takes into
account the correlations of the data.
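
As a rough illustration of the idea, the sketch below computes the Mahalanobis distance for the bivariate case only, inverting the 2x2 sample covariance matrix by hand. The function names and the sample data are assumptions made for this example; a general implementation would use a linear algebra library.

```python
import math


def mean_and_cov(points):
    """Return the mean (mx, my) and sample covariance terms (sxx, sxy, syy)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis(point, mean, cov):
    """Bivariate Mahalanobis distance: sqrt(d^T S^-1 d) with d = point - mean."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    # S^-1 = (1 / det) * [[syy, -sxy], [-sxy, sxx]]
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(d2)


# Hypothetical data: four points with uncorrelated coordinates.
data = [(0, 0), (2, 0), (0, 2), (2, 2)]
mean, cov = mean_and_cov(data)
print(mahalanobis(mean, mean, cov))    # 0.0 at the mean
print(mahalanobis((3, 1), mean, cov))  # about 1.732 standard deviations

# Rescaling the first variable by 10 leaves the distance unchanged.
scaled = [(10 * x, y) for x, y in data]
mean_s, cov_s = mean_and_cov(scaled)
print(mahalanobis((30, 1), mean_s, cov_s))
```

The last call shows the scale-invariance property: multiplying one variable by a constant changes the Euclidean distance but not the Mahalanobis distance.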
Exercise
Three items have the following
bivariate measurements (y1, y2):
(2, 5), (4, 2), (7, 9).
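
The rest of the exercise prompt is not shown here, so the following is only a possible starting point. Assuming the task involves the distances between the three items, this sketch computes the pairwise Euclidean, city-block and Chebyshev distances. The labels A, B and C and the function names are assumptions introduced for the example.

```python
import math
from itertools import combinations

# The three items from the exercise; the labels A, B, C are hypothetical.
items = {"A": (2, 5), "B": (4, 2), "C": (7, 9)}


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))


def chebyshev(p, q):
    return max(abs(a - b) for a, b in zip(p, q))


for (name1, p), (name2, q) in combinations(items.items(), 2):
    print(
        f"{name1}-{name2}: "
        f"euclidean={euclidean(p, q):.3f}, "
        f"manhattan={manhattan(p, q)}, "
        f"chebyshev={chebyshev(p, q)}"
    )
```

Under all three measures the first two items, (2, 5) and (4, 2), are the closest pair.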
Representativeness of the
sample.
Impact of multicollinearity.
Rules of Thumb 9-3
ASSUMPTIONS IN CLUSTER ANALYSIS
Input variables should be examined for
substantial multicollinearity and if
present . . .
Reduce the variables to equal numbers in
each set of correlated measures.
Use a distance measure that
compensates for the correlation, like
Mahalanobis Distance.
Take a proactive approach and include
only cluster variables that are not highly
correlated.
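
One way to examine input variables for substantial multicollinearity is to screen the pairwise correlations before clustering. The sketch below is a minimal illustration: the variable names, the sample values and the 0.9 cutoff are all assumptions, and the slides do not prescribe a specific threshold.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def highly_correlated_pairs(data, threshold=0.9):
    """Return (name1, name2, r) for every pair with |r| >= threshold."""
    flagged = []
    for (name_a, xa), (name_b, xb) in combinations(data.items(), 2):
        r = pearson(xa, xb)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


# Hypothetical clustering variables; v2 is an exact multiple of v1.
data = {
    "v1": [1, 2, 3, 4, 5],
    "v2": [2, 4, 6, 8, 10],
    "v3": [5, 1, 4, 2, 3],
}
print(highly_correlated_pairs(data))  # flags only the (v1, v2) pair
```

A flagged pair is a candidate for one of the remedies listed above: dropping or balancing the correlated variables, or switching to a distance measure such as Mahalanobis distance.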
Stage 4: Deriving Clusters and
Assessing Overall Fit
1. Agglomerative Methods
(buildup)
Continue recursively
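
The buildup idea can be sketched as follows: start with every observation in its own cluster, merge the two closest clusters, and repeat. The slides do not say which linkage rule is used at this point, so single linkage (nearest neighbor) with Euclidean distance is assumed here purely for illustration; the function names and sample points are also hypothetical.

```python
import math


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def single_linkage(c1, c2, points):
    """Distance between two clusters = distance of their closest members."""
    return min(euclidean(points[i], points[j]) for i in c1 for j in c2)


def agglomerate(points, n_clusters=1):
    """Merge the two closest clusters until n_clusters remain.

    Returns the final clusters (lists of point indices) and the merge history.
    """
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    clusters = [[i] for i in range(len(points))]
    history = []
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = single_linkage(clusters[a], clusters[b], points)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        history.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return clusters, history


# Hypothetical observations: two tight pairs and one isolated point.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
clusters, history = agglomerate(points, n_clusters=2)
print(clusters)
for left, right, dist in history:
    print(f"merged {left} + {right} at distance {dist:.2f}")
```

The merge history records the distance at which each pair of clusters was joined, which is the information a dendrogram displays.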
Copyright 2010 Pearson Education, Inc., publishing as Prentice-
Step 1: Cluster Analysis Variable
Selection
Variables are typically measured
metrically, but the technique can also be
applied to non-metric variables.
Variables must be logically related
to a single underlying concept or
construct.
Description of HBAT Primary Database Variables

Variable  Description                                         Variable Type

Data Warehouse Classification Variables
X1        Customer Type                                       nonmetric
X2        Industry Type                                       nonmetric
X3        Firm Size                                           nonmetric
X4        Region                                              nonmetric
X5        Distribution System                                 nonmetric

Performance Perceptions Variables
X6        Product Quality                                     metric
X7        E-Commerce Activities/Website                       metric
X8        Technical Support                                   metric
X9        Complaint Resolution                                metric
X10       Advertising                                         metric
X11       Product Line                                        metric
X12       Salesforce Image                                    metric
X13       Competitive Pricing                                 metric
X14       Warranty & Claims                                   metric
X15       New Products                                        metric
X16       Ordering & Billing                                  metric
X17       Price Flexibility                                   metric
X18       Delivery Speed                                      metric

Outcome/Relationship Measures
X19       Satisfaction                                        metric
X20       Likelihood of Recommendation                        metric
X21       Likelihood of Future Purchase                       metric
X22       Current Purchase/Usage Level                        metric
X23       Consider Strategic Alliance/Partnership in Future   nonmetric
Cluster Analysis
Learning Checkpoint