What is MDS?
Multidimensional scaling (MDS) belongs to the more general category of methods for multivariate data analysis. It is an exploratory technique used to visualize proximities in a low-dimensional space. The relation between a pair of entities is expressed as a proximity (a distance, or a similarity/dissimilarity). Correlations can be considered similarities, hence a correlation matrix can serve as input.
Key Terms
Objects, also called variables or stimuli, are the products, candidates, opinions, or other choices to be compared. Subjects are those doing the comparing. Sometimes the subjects are termed the "source" and the objects the "target". It is possible for subjects to rate themselves, in which case subjects and objects are the same. Dimensions are usually hierarchies and have one or more levels.
Key terms
Euclidean distance
Euclidean distance is the "ordinary" distance between two points on a plane, given by the Pythagorean formula. Using this formula as the distance, Euclidean space (or indeed any inner product space) becomes a metric space. The proximities are then represented in a geometrical space, e.g. a Euclidean space, the most commonly used space in MDS. The fit is judged by a sum of squared distances.
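As a small sketch of this in Python (the coordinates below are invented for illustration, not taken from the lecture data), the Pythagorean formula for one pair of points and the full distance matrix that MDS typically takes as input can be computed with NumPy and SciPy:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Hypothetical 2-D coordinates for four objects (illustration only).
points = np.array([[0.0, 0.0],
                   [3.0, 4.0],
                   [6.0, 0.0],
                   [3.0, 1.0]])

# Pythagorean formula for a single pair: sqrt((x1-x2)^2 + (y1-y2)^2)
d01 = np.sqrt(np.sum((points[0] - points[1]) ** 2))
print(d01)  # 5.0 (a 3-4-5 right triangle)

# Full symmetric Euclidean distance matrix, the usual MDS input.
D = squareform(pdist(points, metric="euclidean"))
print(D.shape)  # (4, 4)
```

`pdist` returns the condensed (upper-triangle) form; `squareform` expands it to the square symmetric matrix that scaling software expects.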
You've been asked to fill in this similarity data in two different ways. In the lecture we'll look at the multidimensional scaling of the results.
Snicker
Pocky
Mentos
TicTac
Multidimensional scaling
MDS is part of a family of techniques called Multidimensional Analyses (MDA). It is a form of exploratory data analysis, developed by Shepard (1962) and Kruskal (1964).
Method of ordination
Similarities
Dissimilarities
Goals of MDS
MDS reduces large amounts of data into easy-to-visualize structures. It attempts to find structure (a visual representation) in distance measures, shows how variables/objects are related perceptually, and assigns meaning to specific locations in the space.
For sufficiently small N, the resulting locations may be displayed in a graph or 3D visualisation
MDS process
1. Obtain the data (type and source)
2. Determine the proximities
3. Transform/scale the data
4. Fit an appropriate model
5. Find the stress level
6. If the stress is not acceptable: transform the data or change the model
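The fitting step can be sketched with scikit-learn's MDS estimator; note this is a generic SMACOF-based implementation standing in for whatever software the lecture uses, and the 4x4 dissimilarity matrix below is invented for illustration:

```python
import numpy as np
from sklearn.manifold import MDS

# Hypothetical symmetric dissimilarity matrix for four objects.
D = np.array([[0.0, 2.0, 5.0, 4.0],
              [2.0, 0.0, 3.0, 6.0],
              [5.0, 3.0, 0.0, 1.0],
              [4.0, 6.0, 1.0, 0.0]])

# Fit a 2-D metric MDS model on the precomputed proximities.
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
X = mds.fit_transform(D)

print(X.shape)      # (4, 2): one 2-D location per object
print(mds.stress_)  # raw stress: lower means a better fit
```

If the stress is too high, the process loops back: transform the proximities differently, add a dimension, or change the model.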
Types of Data
Decompositional/ Attribute-free
The most common type: subjects rate objects on an overall basis, without reference to objective attributes. The result is a perceptual map.
Compositional
Subjects rate objects on a variety of (ALL) specific attributes. This yields object matrices and may involve specialized procedures.
Proximity Measures
Proximities are characterized by their type (similarity vs. dissimilarity) and by the shape of the data matrix.
The number of ways of a data matrix refers to the dimensionality of the data matrix; the number of modes refers to the number of unique sets of entities underlying the dissimilarities. Symmetry of the proximities is often assumed in the multidimensional scaling of square matrices, but is not always fulfilled.
The measurement level relates to the invariance of the proximities under transformations. The usual scales are the ratio, interval, ordinal, and nominal scales. Multidimensional scaling is particularly suited to the analysis of ordinal data; these are the non-metric scaling models. The measurement process comes down to the distinction between continuous and discrete: objects measured by a discrete process that belong to the same category receive the same number, while objects measured by a continuous process fall within a range of numbers when belonging to the same category.
Replicated
Several matrices
Weighted
Aggregate proximities and individual differences in a common MDS space.
Metric or Non-Metric
MDS analyses which imply uniqueness at the interval level (or stronger levels of uniqueness, such as the ratio or absolute level) are known as metric MDS or classical scaling. If weaker levels of uniqueness than the interval level are assumed, so-called non-metric MDS algorithms are used.
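A minimal sketch of this distinction, again assuming scikit-learn's MDS estimator with an invented ordinal dissimilarity matrix: setting metric=False tells the algorithm to respect only the rank order of the proximities, not their numerical values.

```python
import numpy as np
from sklearn.manifold import MDS

# Hypothetical ordinal dissimilarities: only the rank order is trusted.
D = np.array([[0, 1, 4, 3],
              [1, 0, 2, 5],
              [4, 2, 0, 6],
              [3, 5, 6, 0]], dtype=float)

# metric=True  -> metric MDS (interval level or stronger)
# metric=False -> non-metric MDS (only ordinal information is used)
nmds = MDS(n_components=2, metric=False,
           dissimilarity="precomputed", random_state=0)
X = nmds.fit_transform(D)
print(X.shape)  # (4, 2)
```

Under metric=False the fitted disparities are a monotone transformation of the input, which is exactly the weaker uniqueness assumption described above.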
Method
For each subject, 78 proximities resulted. These were rescaled over individuals by the method of successive intervals (Diederich et al., 1957), and the means of these intervals were taken as the proximity data.
Method - Measurements
The facial expressions are:
1. Grief at death of mother
2. Savoring a coke
3. Very pleasant surprise
4. Maternal love, baby in arms
5. Physical exhaustion
6. Something wrong with plane
7. Anger at seeing dog beaten
8. Pulling hard on seat of chair
9. Unexpectedly meets old boy friend
10. Revulsion
11. Extreme pain
12. Knows plane will crash
13. Light sleep
PROXSCAL performs multidimensional scaling of proximity data to find a least-squares representation of the objects in a low-dimensional space. Individual differences models are allowed for multiple sources. A majorization algorithm guarantees monotone convergence for optionally transformed metric and non-metric data under a variety of models and constraints.
Proximity data (lower triangle of the matrix; rows Surprise through Sleep, columns Grief through KnowFear):

            Grief   Savor  Surpr.   Love  Exhau.  Wrong  Anger  Pull.  Meets  Revul.   Pain  KnowF.
Surprise    8.250   2.540
Love        5.570   2.690   2.110
Exhaustion  1.150   2.670   8.980  3.780
Wrong       2.970   3.880   9.270  6.050   2.340
Anger       4.340   8.530  11.870  9.780   7.120  1.360
Pulling     4.900   1.310   2.560  4.210   5.900  5.180  8.470
Meets       6.250   1.880    .740   .450   4.770  5.450 10.200  2.630
Revulsion   1.550   4.840   9.250  4.920   2.220  4.170  5.440  5.450  7.100
Pain        1.680   5.810   7.920  5.420   4.340  4.720  4.310  3.790  6.580   1.980
KnowFear    6.570   7.430   8.300  8.930   8.160  4.660  1.570  6.490  9.770   4.930  4.830
Sleep       3.930   4.510   8.470  3.480   1.600  4.890  9.180  6.050  6.550   4.120  3.510  12.650
Analysis of results
Stress (phi) is a goodness-of-fit measure for MDS models.
The smaller the stress, the better the fit. High stress may reflect measurement error, but may also reflect having too few dimensions. There are two versions: Young's S-stress (based on squared distances) and Kruskal's stress (a.k.a. stress formula 1 or stress-1, based on distances).
SPSS generates both, but uses S-stress as the criterion for stopping the iterations by which it resets point coordinates to reduce stress: iteration stops when the improvement in S-stress is .001 or less. (The Model dialog lets the researcher adjust this cut-off; if "0" is entered, the algorithm computes 30 iterations.)
Overall stress is the SPSS label for average stress in RMDS models (because RMDS has more than one matrix). Average stress is the square root of the mean of squared Kruskal stress values.
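These two quantities can be sketched in a few lines of Python (the distance and disparity values below are invented for illustration):

```python
import numpy as np

def kruskal_stress1(d, dhat):
    """Kruskal's stress-1: sqrt( sum (d - dhat)^2 / sum d^2 ),
    where d are configuration distances and dhat the disparities
    (optimally transformed proximities)."""
    d, dhat = np.asarray(d, float), np.asarray(dhat, float)
    return np.sqrt(np.sum((d - dhat) ** 2) / np.sum(d ** 2))

# Hypothetical distances and disparities for one matrix.
d    = np.array([1.0, 2.0, 3.0, 4.0])
dhat = np.array([1.1, 1.9, 3.2, 3.8])
print(kruskal_stress1(d, dhat))

# Average stress over several matrices (RMDS): the square root of
# the mean of the squared per-matrix stress-1 values.
stresses = [0.05, 0.07, 0.06]
avg = np.sqrt(np.mean(np.square(stresses)))
print(avg)
```

S-stress has the same form but is computed on squared distances and squared disparities rather than on the distances themselves.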
Common space
Final coordinates in the common space:

Expression   Dim 1   Dim 2
Grief         .223   -.301
Savor        -.371    .113
Surprise     -.854    .485
Love         -.625   -.067
Exhaustion   -.016   -.463
Wrong         .514    .021
Anger         .991    .180
Pulling      -.308    .386
Meets        -.699    .183
Revulsion     .328   -.386
Pain          .271   -.078
KnowFear      .796    .632
Sleep        -.250   -.707
Shepard Diagram
Plot of transformation
As a practical strategy, we may start with a weaker assumption, but as soon as we find, as a result of the analysis, that a stronger measurement assumption can be justified, we switch to the stronger assumption. In this way we obtain more reliable results while avoiding unwarranted scale-level assumptions.
Contrasts
Three methods of analysis are closely related to MDS: principal component analysis (PCA), correspondence analysis (CA), and cluster analysis. This section gives a short description of each and of its relation to MDS.

6.1. Principal Components Analysis
Principal components analysis (PCA) is performed on a matrix A of n entities observed with respect to p variables. The aim is to find new variables, called principal components, which are linear combinations of the original variables chosen so that they account for most of the variation in the original variables. In metric CMDS, a matrix of distances D between the n entities is given, and the aim is to find a low-dimensional configuration of the entities such that the distances are approximated in a least-squares sense. When these distances are Euclidean distances, the coordinates contained in X represent the principal coordinates that would be obtained by doing PCA on A. This approach is called principal coordinates analysis, as well as classical scaling. A more detailed account of this correspondence can be found in Everitt and Rabe-Hesketh (1997).
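The classical-scaling side of this correspondence can be sketched in a few lines of NumPy: double-centering the squared Euclidean distance matrix and taking eigenvectors scaled by the square roots of their eigenvalues gives the principal coordinates, which reproduce the original interpoint distances up to rotation. The points below are invented for illustration.

```python
import numpy as np

def classical_mds(D, k=2):
    """Classical scaling / principal coordinates analysis:
    double-center the squared distance matrix, then return the top-k
    eigenvectors scaled by the square roots of their eigenvalues."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    B = -0.5 * J @ (D ** 2) @ J           # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)        # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:k]      # pick the k largest
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))

# Hypothetical planar points; their Euclidean distances should be
# reproduced exactly by the recovered 2-D configuration.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 1.0]])
D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
Y = classical_mds(D, k=2)
D2 = np.sqrt(((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1))
print(np.allclose(D, D2))  # True: distances recovered up to rotation
```

When D is not Euclidean, some eigenvalues of B become negative and the reconstruction is only approximate, which is where the least-squares MDS algorithms take over.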
6.2. Correspondence Analysis
Correspondence analysis is classically used on a two-way contingency table with the aim of visualizing the relations (i.e. deviations from statistical independence) between the row and column categories. The same is done by the unfolding models: subjects (row categories) and objects (column categories) are visualized in such a way that the order of the distances between a subject point and the object points reflects the preference ranking of the subject. The measure of "proximity" used in CA is the chi-square distance between the profiles. A short description of CA and its relation to MDS can be found in Borg and Groenen (1997).

6.3. Cluster Analysis
Cluster analysis models, or ultrametric tree models, are equally applicable to proximity data, including two-way (asymmetric) square and rectangular data as well as three-way two-mode data. The main difference from the MDS models is that most models for cluster analysis lead to a hierarchical structure. The dissimilarities are approximated by path distances under a number of restrictions, with the path distances chosen so as to minimize the sum of squared errors.
Take Note
The analysis is not straightforward: many algorithms feed into the SPSS program (PROXSCAL/ALSCAL), which makes it seem easy to compute, but interpretation needs to take into account the process the data underwent in order to elicit a better understanding. The more dimensions, the greater the complexity of the analysis.
References
http://www.mathpsyc.uni-bonn.de/doc/delbeke/delbeke.htm
http://repub.eur.nl/res/pub/1274/ei200415.pdf
http://forrest.psych.unc.edu/teaching/p208a/mds/mds.html
http://www.analytictech.com/borgatti/mds.htm
http://www.terry.uga.edu/~pholmes/MARK9650/Classnotes4.pdf
http://publib.boulder.ibm.com/infocenter/spssstat/v20r0m0/index.jsp?topic=%2Fcom.ibm.spss.statistics.help%2Fsyn_proxscal_overview.htm
http://forrest.psych.unc.edu/research/alscal.html
http://www.statsoft.com/textbook/multidimensional-scaling/
http://faculty.chass.ncsu.edu/garson/PA765/mds.htm#ALSCAL
http://takane.brinkster.net/Yoshio/c045.pdf