Professional Documents
Culture Documents
20-2
Cluster Analysis
Cluster analysis is a class of techniques used to
classify objects or cases into relatively homogeneous
groups called clusters. Objects in each cluster tend
to be similar to each other and dissimilar to objects
in the other clusters. Cluster analysis is also called
classification analysis, or numerical taxonomy.
Both cluster analysis and discriminant analysis are
concerned with classification. However, discriminant
analysis requires prior knowledge of the cluster or
group membership for each object or case included,
to develop the classification rule. In contrast, in
cluster analysis there is no a priori information about
the group or cluster membership for any of the
objects. Groups or clusters are suggested by the
data, not defined a priori.
20-3
Variable 1
Variable 2
20-4
Variable 1
X
Variable 2
20-5
Case No. V1 V2 V3 V4 V5 V6
1 6 4 7 3 2 3
2 2 3 1 4 5 4
3 7 2 6 4 1 3
4 4 6 4 5 3 6
5 1 3 2 2 6 4
6 6 4 6 3 3 4
7 5 3 6 3 3 4
8 7 3 7 4 1 4
9 2 4 3 3 6 3
10 3 5 3 6 4 6
11 1 3 2 3 5 3
12 5 4 5 4 2 4
13 2 2 1 5 4 4
14 4 6 4 6 4 7
15 6 5 4 2 1 4
16 3 5 4 6 4 7
17 4 4 7 2 2 5
18 3 7 2 6 4 3
19 4 6 3 7 2 7
20 2 3 2 4 7 2
Conducting Cluster Analysis
20-10
Hierarchical Nonhierarchical
Agglomerative Divisive
Ward’s Method
Cluster 1 Cluster 2
Complete Linkage
Maximum
Distance
Cluster 1 Cluster 2
Average Linkage
Average Distance
Cluster 1 Cluster 2
Conducting Cluster Analysis
20-16
Centroid Method
Conducting Cluster Analysis
20-18
1 1 1 1
2 2 2 2
3 1 1 1
4 3 3 2
5 2 2 2
6 1 1 1
7 1 1 1
8 1 1 1
9 2 2 2
10 3 3 2
11 2 2 2
12 1 1 1
13 2 2 2
14 3 3 2
15 1 1 1
16 3 3 2
17 1 1 1
18 4 3 2
19 3 3 2
20 2 2 2
20-22
Cluster Centroids
Table 20.3
Means of Variables
Cluster No. V1 V2 V3 V4 V5 V6
Cluster
1 2 3
V1 4 2 7
V2 6 3 2
V3 3 2 6
V4 7 4 4
V5 2 7 1
V6 7 2 3
Iteration Historya
Cluster
1 2 3
V1 4 2 6
V2 6 3 4
V3 3 2 6
V4 6 4 3
V5 4 6 2
V6 6 3 4
Cluster 1 2 3
1 5.568 5.698
2 5.568 6.928
3 5.698 6.928
20-31
Cluster Error
Mean Square df Mean Square df F Sig.
V1 29.108 2 0.608 17 47.888 0.000
V2 13.546 2 0.630 17 21.505 0.000
V3 31.392 2 0.833 17 37.670 0.000
V4 15.713 2 0.728 17 21.585 0.000
V5 22.537 2 0.816 17 27.614 0.000
V6 12.171 2 1.071 17 11.363 0.001
The F tests should be used only for descriptive purposes because the clusters have been
chosen to maximize the differences among cases in different clusters. The observed
significance levels are not corrected for this, and thus cannot be interpreted as tests of the
hypothesis that the cluster means are equal.
Clustering Variables
In this instance, the units used for analysis are the
variables, and the distance measures are computed
for all pairs of variables.
Hierarchical clustering of variables can aid in the
identification of unique variables, or variables that
make a unique contribution to the data.
Clustering can also be used to reduce the number of
variables. Associated with each cluster is a linear
combination of the variables in the cluster, called the
cluster component. A large set of variables can often
be replaced by the set of cluster components with
little loss of information. However, a given number
of cluster components does not generally explain as
much variance as the same number of principal
components.
20-33
SPSS Windows
To select this procedures using SPSS for Windows click:
Analyze>Classify>Hierarchical Cluster …
Analyze>Classify>K-Means Cluster …