
Performing Cluster Analysis

Cluster analysis is a method of classifying survey respondents into groups whose members gave
similar responses to your survey. It is a more advanced topic of marketing research, and it will
not be necessary to perform a cluster analysis for the project you do in this class. Each group
might represent a particular segment of the market with unique attributes, attitudes, and
characteristics. It is generally best, but not necessary, to perform a cluster analysis after
performing a factor analysis. (You may find it best to read through the Performing Factor
Analysis tutorial before reading this tutorial.)
Performing a Cluster Analysis
It is possible to perform a cluster analysis using Excel, but you are limited to conducting cluster
analysis on only two variables. Therefore, if you conduct a cluster analysis using Excel, you will
probably need to conduct a factor analysis first to reduce your variables to two factors. To
perform a cluster analysis using Excel, you simply create a scatter graph of the responses each
respondent had to the two variables (or factors) used in the analysis. The Regression tutorial
contains instructions for creating a scatter graph. After you have created the graph, you can look
at where the respondents fall on the graph and see if there are any patterns or groupings of
respondents in the graph.
Example: Suppose you had factor scores for 9 respondents for factors you have named
"forceful" and "social". (Note that this example is a continuation of the example used in the
Performing Factor Analysis tutorial.) The following is a scatter graph of the factor scores for
the 9 respondents:
[Scatter graph: "Social" on the horizontal axis and "Forceful" on the vertical axis, each scaled from 1 to 5]
It can be seen that the respondents fall into four distinct sections of the graph. From this you
could conclude that you have four clusters, one cluster that represents respondents with "social"
and "forceful" factor scores both above four, another cluster that represents respondents with
both factor scores below two, and so on. It may be useful to get better descriptions of each
cluster by looking back at the original variables that make up each factor. This may help you in
obtaining a more complete description of the attributes, attitudes, and characteristics that make
up each cluster or market segment.
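The visual grouping described above can also be sketched in code. The following is a minimal illustration only: the nine factor scores are hypothetical (the tutorial's actual scores are not given), and the cut-off of 3, the midpoint of the 1-to-5 scale, is an assumption chosen for the sketch. The name quadrant is likewise made up for this example.

```python
# Hypothetical factor scores (social, forceful) for nine respondents.
# These values are illustrative only and are not the tutorial's data.
scores = {
    "R1": (4.5, 4.6),
    "R2": (4.2, 4.8),
    "R3": (1.3, 1.5),
    "R4": (1.6, 1.2),
    "R5": (4.4, 1.4),
    "R6": (4.7, 1.8),
    "R7": (1.5, 4.3),
    "R8": (1.2, 4.7),
    "R9": (1.8, 4.5),
}

MIDPOINT = 3.0  # assumed cut-off: the middle of the 1-to-5 scale


def quadrant(social, forceful, cut=MIDPOINT):
    """Label a respondent by which side of the cut-off each score falls on."""
    s = "high social" if social > cut else "low social"
    f = "high forceful" if forceful > cut else "low forceful"
    return f"{s}, {f}"


# Group respondents by the section of the graph they fall in.
clusters = {}
for respondent, (social, forceful) in scores.items():
    clusters.setdefault(quadrant(social, forceful), []).append(respondent)

for label, members in sorted(clusters.items()):
    print(f"{label}: {members}")
```

A fixed cut-off only works when the groups are as clearly separated as in this example; with less tidy data, looking at the graph itself is still the safer guide.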
A better way to perform a cluster analysis is to use SPSS, which can cluster on more than two
variables or factors. To run a cluster analysis using SPSS, follow these steps:
1. Enter your data into an SPSS data editor as described in the SPSS Tool Kit tutorial.
2. Click on "Analyze" from the list at the top of the Data Editor.
3. Select "Classify" from the drop down menu.
4. Click on "Hierarchical Cluster" from the drop down menu.
5. In the window that pops up, highlight the variables from the list on the left that you want
to include in the analysis and click on the arrow button to move the variables to the
"Variable(s):" list on the right. Note that you can highlight and move more than one
variable at a time by holding down the "Ctrl" key on the keyboard while selecting.
6. Under the "Display" options, uncheck the "Plots" box.
7. Click on the "OK" box.
Two new tables appear in the output viewer. To determine the number of clusters, look at the
table labeled "Agglomeration Schedule". This table has a column labeled "Coefficients" that
starts with the biggest number at the bottom of the column and decreases to the smallest number
at the top of the column. You determine how many clusters you have by reading this column
from the bottom up and finding the point at which there is a big drop in the numbers, followed by
very little change in the rest of the numbers above. Starting from the bottom row, count the rows
up to and including the row just after the big drop. This number of rows is the number of
clusters you have.
Example: Suppose after running a cluster analysis you get this agglomeration schedule:
Agglomeration Schedule

         Cluster Combined                       Stage Cluster First Appears
Stage    Cluster 1   Cluster 2   Coefficients   Cluster 1   Cluster 2         Next Stage
1        2           5           .000           0           0                 4
2        8           9           2.778E-02      0           0                 7
3        1           3           2.778E-02      0           0                 7
4        2           4           5.556E-02      1           0                 8
5        7           10          .222           0           0                 6
6        6           7           .333           0           5                 8
7        1           8           8.069          3           2                 9
8        2           6           8.485          4           6                 9
9        1           2           12.375         7           8                 0
Looking at the "Coefficients" column you see that there is a big drop between the first and
second rows from the bottom (12.375 to 8.485) followed by a very small drop between the
second and third rows (8.485 to 8.069). You might think this indicates the number of clusters
you have, but continuing up the column you notice an even bigger drop in value between the
third and fourth rows (8.069 to .333) followed by very little drop in number values in the rest of
the column (.333 to .000). This indicates the number of clusters you have. Counting the rows up
to the row just after the big drop (the row with the value of .333) indicates that you should have
four clusters.
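The counting rule can be checked with a short sketch that uses the coefficients from the example schedule. It simplifies the rule by taking the single largest drop when reading the column from the bottom up, and it does not test the "very little change afterwards" condition, so treat it as an aid to reading the table rather than a replacement for it. The function name clusters_from_largest_drop is made up for this illustration.

```python
# Coefficients from the example agglomeration schedule, stage 1 to stage 9.
coefficients = [0.000, 0.02778, 0.02778, 0.05556, 0.222, 0.333,
                8.069, 8.485, 12.375]


def clusters_from_largest_drop(coeffs):
    """Return the cluster count implied by the largest bottom-up drop."""
    bottom_up = list(reversed(coeffs))
    # drops[i] is the fall from bottom-up row i+1 to row i+2 (counting from 1).
    drops = [bottom_up[i] - bottom_up[i + 1] for i in range(len(bottom_up) - 1)]
    largest = max(range(len(drops)), key=lambda i: drops[i])
    # The row just after the largest drop is row largest + 2, and that row
    # count is the number of clusters.
    return largest + 2


n_clusters = clusters_from_largest_drop(coefficients)
print(n_clusters)
```

For the example table the largest drop is the one from 8.069 to .333, between the third and fourth rows from the bottom, so the sketch arrives at the same answer as the counting described above.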
Now that you know how many clusters you have, re-run the cluster analysis following the steps
above, but adding these additional steps between steps six and seven:
6a. Click on the box labeled "Save" at the bottom of the "Hierarchical Cluster Analysis"
window.
6b. In the new window that pops up, choose the "Single Solution" option.
6c. In the field next to this option, enter the number of clusters you have concluded that you
have (in the case of the example you would enter 4).
6d. Click on the "Continue" box.
SPSS re-runs the exact same cluster analysis. This time, however, if you return to your data
editor window you will see a new variable (the very last column of your data) labeled "clu4_1"
or something very similar to this. The numbers in this column indicate which cluster each
respondent belongs to. You now know which respondents make up each cluster and can
identify the attributes, attitudes, and characteristics that distinguish each of the different clusters.
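To give a feel for what a saved single solution contains, here is a minimal sketch of hierarchical clustering that merges respondents until a chosen number of clusters remains and then reports one cluster number per respondent. Several things here are assumptions: the ten pairs of factor scores are hypothetical, and the tutorial does not say which clustering method or distance measure SPSS applies, so the sketch uses average linkage with squared Euclidean distance. It is not a reproduction of the SPSS procedure.

```python
from itertools import combinations

# Hypothetical (social, forceful) factor scores for ten respondents.
data = [
    (4.5, 4.6), (4.2, 4.8), (4.6, 4.4),
    (1.3, 1.5), (1.6, 1.2),
    (4.4, 1.4), (4.7, 1.8),
    (1.5, 4.3), (1.2, 4.7), (1.8, 4.5),
]


def sq_dist(a, b):
    """Squared Euclidean distance between two respondents."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def average_linkage(c1, c2):
    """Mean squared distance over all cross-cluster pairs of respondents."""
    total = sum(sq_dist(data[i], data[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def hierarchical_clusters(n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    clusters = [[i] for i in range(len(data))]
    while len(clusters) > n_clusters:
        # combinations yields index pairs (a, b) with a < b.
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def membership(n_clusters):
    """Return one cluster number per respondent, numbered from 1."""
    labels = [0] * len(data)
    for number, members in enumerate(hierarchical_clusters(n_clusters), start=1):
        for i in members:
            labels[i] = number
    return labels


labels = membership(4)
print(labels)
```

The list labels plays the same role as the new column in the data editor: one cluster number per respondent, in the same order as the rows of data.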
