
Multidimensional scaling

Yee Jean 12524752, Andrea 12524807, Mohan 12524729

What is MDS?
Multidimensional scaling (MDS) is an exploratory technique used to visualize proximities in a low-dimensional space.
It belongs to the more general category of methods for multivariate data analysis.
A proximity is the relation between a pair of entities: a distance or a similarity/dissimilarity.
Correlations can be considered to be similarities, hence the use of a correlation matrix as input.
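A minimal sketch of how a correlation matrix could be turned into dissimilarities before scaling. The transform d = sqrt(2(1 - r)) is one common choice and is an assumption here, not something the slides prescribe; the ratings are made-up values for illustration.

```python
import numpy as np

# Hypothetical ratings: 5 respondents (rows) by 3 variables (columns).
ratings = np.array([
    [1.0, 2.0, 5.0],
    [2.0, 3.0, 4.0],
    [3.0, 5.0, 3.0],
    [4.0, 4.0, 2.0],
    [5.0, 6.0, 1.0],
])

# Correlations between the variables act as similarities.
R = np.corrcoef(ratings, rowvar=False)

# Assumed transform: d = sqrt(2 * (1 - r)), so r = 1 gives 0 and r = -1 gives 2.
# The clip guards against tiny negative values caused by rounding.
D = np.sqrt(np.clip(2.0 * (1.0 - R), 0.0, None))

print(np.round(R, 3))
print(np.round(D, 3))
```

Highly correlated variables end up close together and negatively correlated ones far apart, which is the form of input an MDS routine working on dissimilarities expects.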

Key Terms
Objects, also called variables or stimuli, are the products, candidates, opinions, or other choices to be compared.
Subjects are those doing the comparing.
Sometimes the subjects are termed the "source" and the objects are termed the "target".
It is possible for subjects to rate themselves, in which case subjects and objects are the same.
Dimensions: usually hierarchies, with one or more levels.

Key terms
Euclidean distance
The "ordinary" distance between two points on a plane, given by the Pythagorean formula.
With this formula as the distance, Euclidean space (or any inner product space) becomes a metric space.
The proximities are then represented in a geometrical space, e.g. in a Euclidean space.
Most commonly used space in MDS
Sum of squared distances
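The Pythagorean formula can be sketched for a whole set of points at once. The function name and the three example points are illustrative assumptions.

```python
import numpy as np

def euclidean_distance_matrix(points):
    """Return the n x n matrix of pairwise Euclidean distances between the rows of `points`."""
    points = np.asarray(points, dtype=float)
    # Coordinate differences for every pair of points: shape (n, n, dims).
    diff = points[:, None, :] - points[None, :, :]
    # Square root of the sum of squared coordinate differences.
    return np.sqrt((diff ** 2).sum(axis=-1))

# Three hypothetical points on a plane.
points = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
D = euclidean_distance_matrix(points)
print(D)
```

The result is symmetric with a zero diagonal, which is the shape of distance matrix that MDS tries to reproduce from the proximities.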

You've been asked to fill in this similarity data in two different ways. In the lecture we'll look at the multidimensional scaling of the results.

[Blank similarity grid to fill in, comparing KitKat, M&M, Snicker, Pocky, Mentos and TicTac]

Multidimensional scaling
Part of a family of techniques called Multidimensional Analyses (MDA)
Exploratory data analysis
Shepard (1962) and Kruskal (1964)

Method of ordination

Similarities

Dissimilarities

Goals of MDS
Reduce large amounts of data into easy-to-visualize structures
Attempt to find structure (a visual representation) in distance measures
Show how variables/objects are related perceptually
Assign causes to specific locations

How MDS works


An MDS algorithm starts with a matrix of item-item similarities

Assign a location to each item in N-dimensional space, where N is specified a priori

For sufficiently small N, the resulting locations may be displayed in a graph or 3D visualisation
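The three steps above can be sketched with classical (Torgerson) scaling, which finds the locations in closed form from an eigendecomposition. This is one possible algorithm, chosen here for brevity; iterative procedures such as those in SPSS work differently. The function name and the unit-square example are assumptions.

```python
import numpy as np

def classical_mds(D, n_dims=2):
    """Place items in `n_dims` dimensions from a matrix of distances D (classical scaling)."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    # Double-centre the squared distances to obtain an inner-product matrix.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    # Keep the eigenvectors belonging to the largest eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_dims]
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale

# Hypothetical input: distances between the four corners of a unit square.
square = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
D = np.sqrt(((square[:, None, :] - square[None, :, :]) ** 2).sum(axis=-1))

X = classical_mds(D, n_dims=2)  # N = 2 is specified a priori
print(np.round(X, 3))
```

The recovered coordinates reproduce the input distances, but only up to rotation, reflection and translation, so the axes of an MDS plot carry no meaning until they are interpreted.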

MDS process
Obtain data: type and source
Determine proximities
Transform/scale data
Fit an appropriate model
Find the stress level
If the stress is not acceptable: transform the data or change the model

Types of Data
Decompositional/ Attribute-free
Most common type
Rate objects on an overall basis, without reference to objective attributes
Perceptual map

Compositional

Rate objects on a variety of specific attributes
Object matrices
May involve specialized procedures

Data collection: how are raw proximities obtained?

Pairwise comparison method
Preference method
Confusion data method
Direct ranking method
Objective methods

Proximity Measures
Types of proximities: similarity/dissimilarity
Shape of the data matrix
The number of ways of a data matrix refers to the dimensionality of the data matrix.
The number of modes refers to the number of unique ways underlying the dissimilarities.
Symmetry of the proximities is often assumed in the multidimensional scaling of square matrices, but it is not always fulfilled.
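A minimal sketch of checking the symmetry assumption on a square proximity matrix. Averaging the two triangles is one simple remedy and is an assumption here; the matrix values are hypothetical.

```python
import numpy as np

# Hypothetical asymmetric dissimilarities between three objects,
# e.g. where "A compared with B" was rated differently from "B compared with A".
P = np.array([
    [0.0, 2.0, 5.0],
    [3.0, 0.0, 4.0],
    [5.0, 6.0, 0.0],
])

# Symmetry means P[i, j] == P[j, i] for every pair.
is_symmetric = np.allclose(P, P.T)

# Assumed remedy: replace each pair of entries by their average.
P_sym = (P + P.T) / 2.0

print(is_symmetric)
print(P_sym)
```

Averaging discards whatever information the asymmetry carried, so it is only appropriate when the asymmetry is treated as noise.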

Measurement characteristics of the data

The measurement level relates to the invariance of the proximities under transformations. The usual scales are ratio, interval, ordinal and nominal.
Multidimensional scaling is particularly suited to the analysis of ordinal data; this is the domain of the non-metric scaling models.
The measurement process comes down to the distinction between continuous and discrete: objects measured by a discrete process and belonging to the same category have the same number, while objects measured by a continuous process fall in a range of numbers when belonging to the same category.

MDS Models (proximity matrix)


Classical
One proximity matrix (metric or non-metric)

Replicated
Several matrices

Weighted
Aggregate proximities and individual differences in a common MDS space.

Metric or Non-Metric
MDS analyses which imply uniqueness on the interval level (or stronger levels of uniqueness such as the ratio or absolute level) are known as metric MDS or classical scaling. If weaker levels of uniqueness than the interval level are assumed, so-called non-metric MDS algorithms are used.
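Non-metric algorithms use only the rank order of the proximities. A typical ingredient is monotone (isotonic) regression, which replaces the distances by the closest non-decreasing sequence. The sketch below implements the pool-adjacent-violators procedure; the function name and example values are assumptions.

```python
def isotonic_regression(values):
    """Least-squares non-decreasing fit to `values` (pool-adjacent-violators)."""
    blocks = []  # each block holds [mean, count]
    for v in values:
        blocks.append([float(v), 1])
        # Merge neighbouring blocks while they violate the required ordering.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            mean2, count2 = blocks.pop()
            mean1, count1 = blocks.pop()
            total = count1 + count2
            blocks.append([(mean1 * count1 + mean2 * count2) / total, total])
    fitted = []
    for mean, count in blocks:
        fitted.extend([mean] * count)
    return fitted

# Hypothetical map distances, listed in order of increasing observed dissimilarity.
distances = [1.0, 3.0, 2.0, 4.0]
disparities = isotonic_regression(distances)
print(disparities)  # [1.0, 2.5, 2.5, 4.0]
```

The second and third distances are out of order relative to the data, so they are pooled to their mean; the gap between distances and fitted values is what a stress measure penalizes.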

MDS model summary

How MDS works (The iterative MDS-algorithm)

SPSS Case Study


Facial Expressions by Abelson and Sermat (1962)
Description:
Dissimilarities of facial expressions for 13 situations
30 students rated pairs of 13 pictures with facial expressions acted by a woman
9-point scale with respect to overall dissimilarity

Dissimilarity: difference in emotional expression or content

Method
For each subject, 78 proximities resulted.
These were rescaled over individuals by the method of successive intervals (Diederich et al., 1957).
The means of these intervals were taken as the proximity data.
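The figure of 78 proximities follows from counting the unordered pairs of 13 pictures, 13 × 12 / 2. A short check, with variable names chosen for illustration:

```python
from itertools import combinations

n_pictures = 13

# Every unordered pair of distinct pictures is rated once.
pairs = list(combinations(range(1, n_pictures + 1), 2))

print(len(pairs))  # 78
```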

Method - Measurements
The facial expressions are:
1 Grief at death of mother
2 Savoring a coke
3 Very pleasant surprise
4 Maternal love-baby in arms
5 Physical exhaustion
6 Something wrong with plane
7 Anger at seeing dog beaten
8 Pulling hard on seat of chair
9 Unexpectedly meets old boy friend
10 Revulsion
11 Extreme pain
12 Knows plane will crash
13 Light sleep

SPSS options: PROXSCAL and ALSCAL

PROXSCAL performs multidimensional scaling of proximity data to find a least-squares representation of the objects in a low-dimensional space.
Individual differences models are allowed for multiple sources.
A majorization algorithm guarantees monotone convergence for optionally transformed metric and nonmetric data under a variety of models and constraints.

ALSCAL: alternating least squares scaling


ALSCAL performs metric or nonmetric Multidimensional Scaling and Unfolding with individual differences options. It can analyze one or more matrices of dissimilarity or similarity data. The analysis represents the rows and columns of the data matrix as points in a Euclidean space. If a row and column are similar, then their points are close together, while if the row and column are dissimilar, they are far apart.

SPSS does not allow you to use proximities directly

Proximity matrix: Input data

Proximities (lower triangle; columns 1 to 12 follow the row numbering)

                     1       2       3       4       5       6       7       8       9      10      11      12
 1 Grief
 2 Savor         4.050
 3 Surprise      8.250   2.540
 4 Love          5.570   2.690   2.110
 5 Exhaustion    1.150   2.670   8.980   3.780
 6 Wrong         2.970   3.880   9.270   6.050   2.340
 7 Anger         4.340   8.530  11.870   9.780   7.120   1.360
 8 Pulling       4.900   1.310   2.560   4.210   5.900   5.180   8.470
 9 Meets         6.250   1.880    .740    .450   4.770   5.450  10.200   2.630
10 Revulsion     1.550   4.840   9.250   4.920   2.220   4.170   5.440   5.450   7.100
11 Pain          1.680   5.810   7.920   5.420   4.340   4.720   4.310   3.790   6.580   1.980
12 KnowFear      6.570   7.430   8.300   8.930   8.160   4.660   1.570   6.490   9.770   4.930   4.830
13 Sleep         3.930   4.510   8.470   3.480   1.600   4.890   9.180   6.050   6.550   4.120   3.510  12.650
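Proximity data entered as a lower triangle usually has to be expanded into a full symmetric matrix before further computation. A minimal sketch using only the first four expressions from the data above; the variable names are assumptions.

```python
import numpy as np

labels = ["Grief", "Savor", "Surprise", "Love"]

# Lower-triangle rows as entered: row i holds the values against objects 0 .. i-1.
lower_rows = [
    [],
    [4.050],
    [8.250, 2.540],
    [5.570, 2.690, 2.110],
]

n = len(labels)
D = np.zeros((n, n))  # the diagonal stays at zero
for i, row in enumerate(lower_rows):
    for j, value in enumerate(row):
        # Mirror each entry across the diagonal.
        D[i, j] = value
        D[j, i] = value

print(D)
```

The same loop extends to all 13 expressions by supplying the remaining rows of the triangle.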

Testing for validity/reliability


Split data tests
Data stability tests
Test-retest reliability

Analysis of results
Stress (phi) is a goodness of fit measure for MDS models
The smaller the stress, the better the fit.
High stress may reflect measurement error, but it may also reflect having too few dimensions.
There are two versions: Young's S-stress (based on squared distances) and Kruskal's stress (a.k.a. stress formula 1 or stress 1, based on distances).
SPSS generates both but uses S-stress as the stopping criterion: the iterations, in which it resets point coordinates to reduce stress, stop when the improvement in S-stress is .001 or less for that iteration. (The Model dialog lets the researcher adjust this cut-off; if "0" is entered, the algorithm computes 30 iterations.)

Overall stress is the SPSS label for average stress in RMDS models (because RMDS has more than one matrix). Average stress is the square root of the mean of squared Kruskal stress values.
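The stress measures described above can be sketched as follows. The normalizations used (sum of squared distances for stress 1, sum of fourth powers of the disparities for S-stress) follow common textbook definitions and are assumptions here; SPSS output may differ in detail. The average follows the definition given above, and all input values are hypothetical.

```python
import numpy as np

def kruskal_stress1(disparities, distances):
    """Kruskal's stress formula 1, based on distances."""
    disparities = np.asarray(disparities, dtype=float)
    distances = np.asarray(distances, dtype=float)
    return np.sqrt(((disparities - distances) ** 2).sum() / (distances ** 2).sum())

def s_stress(disparities, distances):
    """S-stress, based on squared distances (assumed normalization)."""
    disparities = np.asarray(disparities, dtype=float)
    distances = np.asarray(distances, dtype=float)
    numerator = ((disparities ** 2 - distances ** 2) ** 2).sum()
    return np.sqrt(numerator / (disparities ** 4).sum())

def average_stress(stress_values):
    """Square root of the mean of the squared stress values, one per matrix."""
    return np.sqrt(np.mean(np.square(stress_values)))

# Hypothetical disparities (transformed proximities) and fitted map distances.
disparities = [1.0, 2.0, 3.0]
distances = [1.0, 2.0, 4.0]

print(kruskal_stress1(disparities, distances))
print(s_stress(disparities, distances))
print(average_stress([0.3, 0.4]))
```

Both measures are zero for a perfect fit. Because S-stress works on squared values, it weights misfit on large distances more heavily than stress 1 does.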

Stress: badness of fit


Overall stress
Stress and Fit Measures
Normalized Raw Stress                 .02639
Stress-I                              .16246 (a)
Stress-II                             .38154 (a)
S-Stress                              .05219 (b)
Dispersion Accounted For (D.A.F.)     .97361
Tucker's Coefficient of Congruence    .98672
PROXSCAL minimizes Normalized Raw Stress.
a. Optimal scaling factor = 1.027.
b. Optimal scaling factor = 1.030.

Decomposition of Normalized Raw Stress by object
Objects: Grief, Savor, Surprise, Love, Exhaustion, Wrong, Anger, Pulling, Meets, Revulsion, Pain, KnowFear, Sleep
Values as listed (the SRC_1 and Mean columns are identical): .0187 .0181 .0456 .0357 .0306 .0264 .0229 .0292 .0183 .0279 .0648 .0171 .0077

Decomposition of stress table: individual stress values

Decomposition of Normalized Raw Stress
Source SRC_1: .0064
Mean: .0064

Common space
Final Coordinates
Object        Dimension 1   Dimension 2
Grief              .223         -.301
Savor             -.371          .113
Surprise          -.854          .485
Love              -.625         -.067
Exhaustion        -.016         -.463
Wrong              .514          .021
Anger              .991          .180
Pulling           -.308          .386
Meets             -.699          .183
Revulsion          .328         -.386
Pain               .271         -.078
KnowFear           .796          .632
Sleep             -.250         -.707
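The final coordinates can be turned back into map distances to see how the configuration reflects the input dissimilarities. A minimal sketch using the coordinates reported above; the helper name `map_distance` is an assumption.

```python
import math

# Final coordinates as reported above: (dimension 1, dimension 2).
coords = {
    "Grief": (0.223, -0.301),
    "Savor": (-0.371, 0.113),
    "Surprise": (-0.854, 0.485),
    "Love": (-0.625, -0.067),
    "Exhaustion": (-0.016, -0.463),
    "Wrong": (0.514, 0.021),
    "Anger": (0.991, 0.180),
    "Pulling": (-0.308, 0.386),
    "Meets": (-0.699, 0.183),
    "Revulsion": (0.328, -0.386),
    "Pain": (0.271, -0.078),
    "KnowFear": (0.796, 0.632),
    "Sleep": (-0.250, -0.707),
}

def map_distance(a, b):
    """Euclidean distance between two expressions in the common space."""
    return math.dist(coords[a], coords[b])

# Surprise and Meets had a small input dissimilarity (.740),
# Anger and Surprise a large one (11.870).
near = map_distance("Surprise", "Meets")
far = map_distance("Anger", "Surprise")
print(round(near, 3), round(far, 3))
```

Pairs rated as similar sit close together on the map and pairs rated as dissimilar sit far apart, which is what the perceptual map displays.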

MDS perceptual map

Shepard Diagram

R² greater than 0.6

Plot of transformation

As a practical strategy, we may start with a weaker assumption, but as soon as we find, as a result of the analysis, that a stronger measurement assumption can be justified, we switch to the stronger assumption. In this way we can get more reliable results while avoiding unaffordable scale level assumptions.

Contrasts
Three methods of analysis are closely related to MDS: principal component analysis (PCA), correspondence analysis (CA) and cluster analysis. This section gives a short description of each and of its relation to MDS.

6.1. Principal Components Analysis
Principal components analysis (PCA) is performed on a matrix A of n entities observed with respect to p variables. The aim is to find new variables, called principal components, which are linear combinations of the original variables chosen so that they account for most of the variation in the original variables. In metric CMDS a matrix of distances D between the n entities is given, and the aim is to find a low-dimensional configuration X of the entities such that the distances are approximated in a least-squares sense. When these distances are Euclidean distances, the coordinates contained in X are the principal coordinates which would be obtained by doing PCA on A. This approach is called principal coordinates analysis as well as classical scaling. A more detailed account of this correspondence can be found in Everitt and Rabe-Hesketh (1997).
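The correspondence between PCA and classical scaling can be checked numerically. The sketch below computes PCA scores from a data matrix A and classical scaling coordinates from the Euclidean distances between its rows, then compares them up to the sign of each axis. The random data and all variable names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 3))  # hypothetical data: n = 6 entities, p = 3 variables

# PCA scores: project the column-centred data onto its first two principal axes.
A_centred = A - A.mean(axis=0)
U, s, Vt = np.linalg.svd(A_centred, full_matrices=False)
pca_scores = U[:, :2] * s[:2]

# Classical scaling on the Euclidean distances between the rows of A.
diff = A[:, None, :] - A[None, :, :]
D = np.sqrt((diff ** 2).sum(axis=-1))
n = D.shape[0]
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ (D ** 2) @ J
eigvals, eigvecs = np.linalg.eigh(B)
top = np.argsort(eigvals)[::-1][:2]
X = eigvecs[:, top] * np.sqrt(eigvals[top])

# Each axis is only determined up to its sign, so compare absolute values.
same_up_to_sign = np.allclose(np.abs(pca_scores), np.abs(X), atol=1e-8)
print(same_up_to_sign)
```

The agreement holds only because the distances were Euclidean distances computed from A; with other dissimilarities the two methods generally give different configurations.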

Applications of MDS
6.2. Correspondence Analysis
Correspondence analysis is classically used on a two-way contingency table with the aim of visualizing the relations (i.e. deviations from statistical independence) between the row and column categories. The same is done by the unfolding models: subjects (row categories) and objects (column categories) are visualized in such a way that the order of the distances between a subject point and the object points reflects the preference ranking of the subject. The measure of "proximity" used in CA is the chi-square distance between the profiles. A short description of CA and its relation to MDS can be found in Borg and Groenen (1997).

6.3. Cluster Analysis
Cluster analysis models, or ultrametric tree models, are equally applicable to proximity data, including two-way (asymmetric) square and rectangular data as well as three-way two-mode data. The main difference from the MDS models is that most models for cluster analysis lead to a hierarchical structure. The dissimilarities are approximated by path distances under a number of restrictions. The path distances are chosen so as to minimize the sum of squared errors between the dissimilarities and the path distances.

Take Note
Analysis is not straightforward: many algorithms are built into the SPSS programs (PROXSCAL/ALSCAL), which makes the results seem easy to compute, but interpretation needs to take into account the processing the data underwent in order to reach a better understanding.
More dimensions mean greater complexity of analysis.

