Ward's Hierarchical Clustering (Minimum Variance Method)

Objective
Join groups while minimizing the loss of information, measured by the error sum of squares (ESS). Equivalently, join cases into clusters such that the variance within each cluster is minimized.

Procedure
1. Each observation starts as its own cluster. For example, if there are 10 observations, there are initially 10 clusters.
2. For each of the k clusters, compute ESS_k, the sum of squared deviations of the observations in cluster k from the mean of the cluster.
3. Compute the total ESS: ESS = ESS_1 + ESS_2 + ... + ESS_(k-1) + ESS_k.
4. Consider the union of every possible pair of clusters, and compute the ESS that would result from each union.
5. Join the two clusters whose union gives the minimum ESS.
Steps 2 to 5 are repeated until all observations belong to a single cluster.
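The procedure above can be sketched in a few lines of Python. This is a minimal illustration, not the SPSS implementation: the function names (ess, total_ess, ward) and the one-dimensional toy data are assumptions made for this example, and the brute-force search over all pairs is only practical for small data sets.

```python
from itertools import combinations

def ess(cluster):
    """Sum of squared deviations of the points in one cluster from the cluster mean."""
    n = len(cluster)
    dims = len(cluster[0])
    means = [sum(p[d] for p in cluster) / n for d in range(dims)]
    return sum((p[d] - means[d]) ** 2 for p in cluster for d in range(dims))

def total_ess(clusters):
    """Step 3: ESS = ESS_1 + ESS_2 + ... + ESS_k."""
    return sum(ess(c) for c in clusters)

def ward(points):
    """Return one (cluster_a, cluster_b, total ESS after the merge) tuple per stage."""
    clusters = [[p] for p in points]  # step 1: every observation is a cluster
    schedule = []
    while len(clusters) > 1:
        best = None
        # step 4: try the union of every possible pair of clusters
        for i, j in combinations(range(len(clusters)), 2):
            rest = [c for k, c in enumerate(clusters) if k not in (i, j)]
            value = total_ess(rest + [clusters[i] + clusters[j]])
            if best is None or value < best[0]:
                best = (value, i, j)
        # step 5: join the pair that gives the minimum ESS
        value, i, j = best
        schedule.append((clusters[i], clusters[j], value))
        rest = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters = rest + [clusters[i] + clusters[j]]
    return schedule

# Hypothetical toy data: five observations measured on a single variable.
toy = [(1.0,), (2.0,), (6.0,), (8.0,), (15.0,)]
schedule = ward(toy)
for a, b, value in schedule:
    print(a, "+", b, "-> ESS =", round(value, 3))
```

On the toy data the two closest points (1 and 2) are joined first, and the ESS grows at every later stage because each merge can only add within-cluster variation.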

Example
Fifteen subjects were referred for treatment based on their psychological conditions. Four summary measures were computed for each of the 15 subjects: the Spielberger Trait Anxiety Inventory (STAI), the Beck Depression Inventory (BDI), a measure of Intrusive Thoughts and Rumination (IT), and a measure of Impulsive Thoughts and Actions (IMPULSE). Using these four measures, we cluster the observations to confirm each subject's condition. The rationale behind this analysis is that people with the same disorder should report a similar pattern of scores across the measures (so the profiles of their responses should be similar).

Subject  Diagnosis   STAI  BDI  IT  IMPULSE
1        GAD         74    30   20  10
2        Depression  50    70   23  5
3        OCD         70    5    58  29
4        GAD         76    35   23  12
5        OCD         68    23   66  37
6        OCD         62    8    59  39
7        GAD         71    35   27  17
8        OCD         67    12   65  35
9        Depression  35    60   15  8
10       Depression  33    58   11  16
11       GAD         80    36   30  16
12       Depression  30    62   9   13
13       GAD         65    38   17  10
14       OCD         78    15   70  40
15       Depression  40    55   10  2

SPSS Output
Ward Linkage
Agglomeration Schedule

       Cluster Combined                    Stage Cluster First Appears
Stage  Cluster 1  Cluster 2  Coefficients  Cluster 1  Cluster 2  Next Stage
1      10         12         19.000        0          0          5
2      1          4          40.000        0          0          6
3      7          11         86.000        0          0          6
4      6          8          132.500       0          0          7
5      9          10         185.500       0          1          9
6      1          7          274.000       2          3          10
7      3          6          364.167       0          4          11
8      5          14         458.667       0          0          11
9      9          15         599.917       5          0          12
10     1          13         759.217       6          0          13
11     3          5          1034.450      7          8          14
12     2          9          1456.400      0          9          13
13     1          2          6625.100      10         12         14
14     1          3          20275.333     13         11         0
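A quick arithmetic check helps to read the Coefficients column. The sketch below assumes the scores from the example data table and uses hypothetical names (scores, sq_dist). It relies on the fact that joining two single observations adds half of their squared Euclidean distance to the ESS, and that the ESS for one cluster is the total sum of squares about the overall mean.

```python
# Scores (STAI, BDI, IT, IMPULSE) keyed by subject number, copied from the data table.
scores = {
    1: (74, 30, 20, 10), 2: (50, 70, 23, 5), 3: (70, 5, 58, 29),
    4: (76, 35, 23, 12), 5: (68, 23, 66, 37), 6: (62, 8, 59, 39),
    7: (71, 35, 27, 17), 8: (67, 12, 65, 35), 9: (35, 60, 15, 8),
    10: (33, 58, 11, 16), 11: (80, 36, 30, 16), 12: (30, 62, 9, 13),
    13: (65, 38, 17, 10), 14: (78, 15, 70, 40), 15: (40, 55, 10, 2),
}

def sq_dist(a, b):
    """Squared Euclidean distance between two score profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Stage 1 joins subjects 10 and 12; stage 2 joins subjects 1 and 4.
stage1 = sq_dist(scores[10], scores[12]) / 2
stage2 = stage1 + sq_dist(scores[1], scores[4]) / 2

# Stage 14 leaves one cluster: ESS is the total sum of squares about the grand mean.
n = len(scores)
grand_mean = [sum(s[d] for s in scores.values()) / n for d in range(4)]
stage14 = sum((s[d] - grand_mean[d]) ** 2 for s in scores.values() for d in range(4))

print(stage1, stage2, round(stage14, 3))
```

The three values agree with the coefficients reported at stages 1, 2, and 14 (19.000, 40.000, and 20275.333), which suggests that the column holds the cumulative ESS after each merge.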

-The values in the Coefficients column give the ESS when there are k clusters. With one cluster (stage 14), ESS is 20275.333; with 2 clusters (stage 13), ESS is 6625.100; and so on. Note that with 15 clusters the ESS is equal to 0 (and hence is not included in the agglomeration schedule).
-In choosing the number of clusters, we need to consider the ESS values. One suggestion is to check the difference between the ESS of the previous clustering and that of the current one. If the change is small, it is recommended to stick with the previous clustering rather than create another level of clustering. In this example, the change between ESS(1 cluster) and ESS(2 clusters) is large (20275.333 - 6625.100 = 13650.233). The change between ESS(2 clusters) and ESS(3 clusters) is also large (6625.100 - 1456.400 = 5168.700). But ESS(3 clusters) and ESS(4 clusters) differ by only 1456.4 - 1034.45 = 421.95, which is small compared to the other differences. Hence, rather than choosing 4 clusters, we choose the clustering with 3 groups.
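The comparison of successive ESS values can be tabulated with a short script. This is only a sketch: the coefficients are copied from the agglomeration schedule, and the names coefficients, ess_by_k, and drops are made up for this illustration.

```python
# Coefficients for stages 1..14, copied from the agglomeration schedule.
coefficients = [19.000, 40.000, 86.000, 132.500, 185.500, 274.000, 364.167,
                458.667, 599.917, 759.217, 1034.450, 1456.400, 6625.100, 20275.333]
n_subjects = 15

# After stage s there are n_subjects - s clusters left.
ess_by_k = {n_subjects - stage: c for stage, c in enumerate(coefficients, start=1)}

# Reduction in ESS gained by going from k clusters to k + 1 clusters.
drops = {k: ess_by_k[k] - ess_by_k[k + 1] for k in range(1, 6)}
for k, drop in drops.items():
    print(f"{k} -> {k + 1} clusters: ESS drops by {drop:.3f}")
```

The reductions for 1 to 2 and 2 to 3 clusters are in the thousands, while the reduction for 3 to 4 clusters is 421.95, which is the pattern used above to settle on three clusters.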

Cluster Membership

Case  5 Clusters  4 Clusters  3 Clusters  2 Clusters
1     1           1           1           1
2     2           2           2           1
3     3           3           3           2
4     1           1           1           1
5     4           3           3           2
6     3           3           3           2
7     1           1           1           1
8     3           3           3           2
9     5           4           2           1
10    5           4           2           1
11    1           1           1           1
12    5           4           2           1
13    1           1           1           1
14    4           3           3           2
15    5           4           2           1

-This table gives the cluster membership according to the chosen number of clusters. In our case, we chose three clusters.

-This is a dendrogram. It serves as a graphical representation of the groupings of the subjects (observations).

Result

Subject  Diagnosis   STAI  BDI  IT  IMPULSE  Cluster
1        GAD         74    30   20  10       1
2        Depression  50    70   23  5        2
3        OCD         70    5    58  29       3
4        GAD         76    35   23  12       1
5        OCD         68    23   66  37       3
6        OCD         62    8    59  39       3
7        GAD         71    35   27  17       1
8        OCD         67    12   65  35       3
9        Depression  35    60   15  8        2
10       Depression  33    58   11  16       2
11       GAD         80    36   30  16       1
12       Depression  30    62   9   13       2
13       GAD         65    38   17  10       1
14       OCD         78    15   70  40       3
15       Depression  40    55   10  2        2
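One simple way to check how well the three clusters line up with the recorded diagnoses is to cross-tabulate the two columns. The sketch below assumes the Diagnosis and Cluster values from the result table; the variable names are hypothetical.

```python
from collections import Counter

# Diagnosis and 3-cluster membership for subjects 1..15, copied from the result table.
diagnosis = ["GAD", "Depression", "OCD", "GAD", "OCD", "OCD", "GAD", "OCD",
             "Depression", "Depression", "GAD", "Depression", "GAD", "OCD", "Depression"]
cluster = [1, 2, 3, 1, 3, 3, 1, 3, 2, 2, 1, 2, 1, 3, 2]

# Count how many subjects fall in each (diagnosis, cluster) combination.
crosstab = Counter(zip(diagnosis, cluster))
for (diag, c), count in sorted(crosstab.items()):
    print(f"{diag:<10} cluster {c}: {count} subjects")
```

Only three combinations occur, each with five subjects, so in this data set every cluster contains exactly one diagnosis.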

Cluster 1: GAD
High STAI
Medium BDI
Medium IT
Medium IMPULSE

Cluster 2: Depression
Low STAI
High BDI
Low IT
Low IMPULSE

Cluster 3: OCD
Medium STAI
Low BDI
High IT
High IMPULSE

STAI: Spielberger Trait Anxiety Inventory
BDI: Beck Depression Inventory
IT: Intrusive Thoughts and Rumination
IMPULSE: Impulsive Thoughts and Actions

What to do after clustering?


- Profile the clusters. Suggestion: compute summary measures of the variables, such as the means, for each of the clusters.
- Assess reliability and validity:
  o Perform cluster analysis using different distance measures; the results should be reasonably stable.
  o Use different clustering methods and compare the results.
  o Split the data into halves and perform cluster analysis separately on each half.
  o Delete variables randomly and repeat the cluster analysis. Compare the results.
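As an illustration of the profiling suggestion, the sketch below computes the mean of each measure within each of the three clusters. It assumes the scores and cluster memberships from the example; the names rows, groups, and profile are made up for this sketch.

```python
from collections import defaultdict

# (STAI, BDI, IT, IMPULSE, cluster) for subjects 1..15, copied from the example.
rows = [
    (74, 30, 20, 10, 1), (50, 70, 23, 5, 2), (70, 5, 58, 29, 3),
    (76, 35, 23, 12, 1), (68, 23, 66, 37, 3), (62, 8, 59, 39, 3),
    (71, 35, 27, 17, 1), (67, 12, 65, 35, 3), (35, 60, 15, 8, 2),
    (33, 58, 11, 16, 2), (80, 36, 30, 16, 1), (30, 62, 9, 13, 2),
    (65, 38, 17, 10, 1), (78, 15, 70, 40, 3), (40, 55, 10, 2, 2),
]
measures = ["STAI", "BDI", "IT", "IMPULSE"]

# Group the score profiles by cluster number.
groups = defaultdict(list)
for *values, c in rows:
    groups[c].append(values)

# Cluster profile: the mean of each measure within the cluster.
profile = {
    c: [sum(member[d] for member in members) / len(members) for d in range(len(measures))]
    for c, members in groups.items()
}
for c in sorted(profile):
    means = ", ".join(f"{m} = {v:.1f}" for m, v in zip(measures, profile[c]))
    print(f"Cluster {c}: {means}")
```

The means make the verbal labels concrete: for example, cluster 2 has the lowest mean STAI (37.6) and the highest mean BDI (61.0), while cluster 3 has the highest mean IT (63.6) and IMPULSE (36.0).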
