
ARVIND BANGER

ASSISTANT PROFESSOR
DEPARTMENT OF MANAGEMENT
FACULTY OF SOCIAL SCIENCES
DEI

MULTIVARIATE ANALYSIS TECHNIQUE


It is used to analyze more than two variables simultaneously on a sample of observations.

Objective: to represent a large collection of data in a simplified way, i.e., to transform a mass of observations into a smaller number of composite scores in such a way that they reflect as much as possible of the information contained in the raw data obtained for the research study.

All multivariate methods

Are some variables dependent?

  YES -> Dependence methods
    How many variables are dependent?
      ONE -> Is it metric?
        YES -> Multiple regression
        NO  -> Multiple discriminant analysis
      SEVERAL -> Are they metric?
        YES -> Multivariate analysis of variance (MAV)
        NO  -> Canonical analysis

  NO -> Interdependence methods
    Are all the inputs metric?
      YES -> Factor analysis, Cluster analysis, Metric MDS
      NO  -> Non-metric MDS, Latent structure analysis

VARIABLES IN MULTIVARIATE ANALYSIS

Explanatory Variable & Criterion Variable
If X may be considered the cause of Y, then X is described as the explanatory variable and Y as the criterion variable. In some cases both may consist of sets of many variables, in which case the set (X1, X2, X3, ..., Xp) is called the set of explanatory variables and the set (Y1, Y2, Y3, ..., Yp) the set of criterion variables, provided the variation of the former may be supposed to cause the variation of the latter as a whole.

OBSERVABLE VARIABLES & LATENT VARIABLES
Explanatory variables as described above are directly observable in some situations; if so, they are termed observable variables. However, some unobservable variables may also influence the criterion variables. We call such unobservable variables latent variables.

DISCRETE VARIABLE & CONTINUOUS VARIABLE
A discrete variable, when measured, can take only integer values, whereas a continuous variable, when measured, can assume any real value.

DUMMY VARIABLE
This term is used in a technical sense and is useful in algebraic manipulations in the context of multivariate analysis. We call Xi (i = 1, ..., m) dummy variables if exactly one of the Xi is 1 and the others are all zero.

IMPORTANT MULTIVARIATE
TECHNIQUES


MULTIPLE DISCRIMINANT ANALYSIS

Through this method we classify individuals or objects into two or more mutually exclusive & exhaustive groups on the basis of a set of independent variables.

Used for a single dependent variable which is nonmetric.

E.g. brand preference, which depends on an individual's age, education, income, etc.

Contd.
E.g. if an individual is 20 years old, has an income of Rs 12,000 and 10 years of formal education, and b1, b2, b3 are the weights given to these independent variables, then his discriminant score would be

Z = b1(20) + b2(12000) + b3(10)
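The score computation above can be sketched as follows. The weights b1, b2, b3 here are hypothetical placeholders, not from the source; in practice they are estimated from grouped training data:

```python
def discriminant_score(weights, values):
    # Z = b1*X1 + b2*X2 + ... + bn*Xn
    return sum(b * x for b, x in zip(weights, values))

# hypothetical weights (assumed for illustration only)
b = [0.05, 0.0001, 0.1]
x = [20, 12000, 10]        # age, income (Rs), years of formal education
z = discriminant_score(b, x)
```

The individual is then assigned to a group by comparing Z against the cutoff score between the groups.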


FACTOR ANALYSIS
APPLICABLE WHEN there is systematic interdependence among a set of observed variables and the researcher wants to find something more fundamental/latent which creates this commonality.
E.g. observed variables: income, education, occupation, dwelling area
latent factor: social class

i.e. a large set of measured variables is resolved into a few categories called factors

MATHEMATICAL BASIS OF FACTOR ANALYSIS

SCORE MATRIX (rows = objects, columns = measures/variables)

Objects     a     b     c    ...    k
1           a1    b1    c1   ...    k1
2           a2    b2    c2   ...    k2
3           a3    b3    c3   ...    k3
...
N           aN    bN    cN   ...    kN

FEW BASIC TERMS

FACTOR: An underlying dimension that accounts for several observed variables.

FACTOR LOADING: The values which explain how closely the variables are related to each of the factors discovered. The absolute size of a loading helps in interpreting the factor.

COMMUNALITY (h2): Shows how much of each variable is accounted for by the underlying factors taken together.
h2 of the ith variable = (ith loading on factor A)2 + (ith loading on factor B)2 + ...
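The communality formula generalizes to any number of factors; a one-line sketch:

```python
def communality(loadings):
    # h^2 for one variable = sum of its squared loadings across all factors
    return sum(l * l for l in loadings)

# variable 1 of the worked example below: loadings 0.693 (A) and 0.563 (B)
h2 = communality([0.693, 0.563])   # close to 0.797
```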

Contd.
EIGENVALUE: The sum of the squared factor loadings relating to a factor. It indicates the relative importance of each factor in accounting for the particular set of variables being analyzed.
TOTAL SUM OF SQUARES: The sum of the eigenvalues of all the factors extracted.
FACTOR SCORES: Represent the degree to which each respondent scores high on the group of items that load high on each factor.

METHODS OF FACTOR ANALYSIS


Centroid method
Principal-components method
Maximum likelihood method


CENTROID METHOD
This method tends to maximize the sum of loadings.

ILLUSTRATION:
Given the following correlation matrix R, relating to 8 variables with unities in the diagonal spaces, work out the first & second centroid factors:

Correlation matrix R (unities in the diagonal):

Var     1      2      3      4      5      6      7      8
1     1.000   .709   .204   .081   .626   .113   .155   .774
2      .709  1.000   .051   .089   .581   .098   .083   .652
3      .204   .051  1.000   .671   .123   .689   .582   .072
4      .081   .089   .671  1.000   .022   .798   .613   .111
5      .626   .581   .123   .022  1.000   .047   .201   .724
6      .113   .098   .689   .798   .047  1.000   .801   .120
7      .155   .083   .582   .613   .201   .801  1.000   .152
8      .774   .652   .072   .111   .724   .120   .152  1.000

SOLUTION:
As the matrix is a positive manifold, the weights of the various variables are taken as +1, i.e., the variables are simply summed.

a) The sum of the coefficients in each column of the correlation matrix is worked out.
b) The sum of these column sums (T) is obtained.
c) Each column sum obtained in (a) is divided by the square root of T obtained in (b) to get the centroid loadings. The full set of loadings so obtained constitutes the first centroid factor (say A).

The correlation matrix R with its column sums and the first centroid factor:

Var     1      2      3      4      5      6      7      8
1     1.000   .709   .204   .081   .626   .113   .155   .774
2      .709  1.000   .051   .089   .581   .098   .083   .652
3      .204   .051  1.000   .671   .123   .689   .582   .072
4      .081   .089   .671  1.000   .022   .798   .613   .111
5      .626   .581   .123   .022  1.000   .047   .201   .724
6      .113   .098   .689   .798   .047  1.000   .801   .120
7      .155   .083   .582   .613   .201   .801  1.000   .152
8      .774   .652   .072   .111   .724   .120   .152  1.000

Column sum (Si):   3.662  3.263  3.392  3.385  3.324  3.666  3.587  3.605

Sum of column sums: T = 27.884, √T = 5.281

First centroid factor A (Si/√T):  0.693  0.618  0.642  0.641  0.629  0.694  0.679  0.683
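The three steps (a)-(c) can be verified with a short script. The matrix entries follow the illustration's correlation matrix (a few garbled entries reconstructed so that the stated column sums match):

```python
import math

# correlation matrix R of the illustration (unities on the diagonal)
R = [
    [1.000, 0.709, 0.204, 0.081, 0.626, 0.113, 0.155, 0.774],
    [0.709, 1.000, 0.051, 0.089, 0.581, 0.098, 0.083, 0.652],
    [0.204, 0.051, 1.000, 0.671, 0.123, 0.689, 0.582, 0.072],
    [0.081, 0.089, 0.671, 1.000, 0.022, 0.798, 0.613, 0.111],
    [0.626, 0.581, 0.123, 0.022, 1.000, 0.047, 0.201, 0.724],
    [0.113, 0.098, 0.689, 0.798, 0.047, 1.000, 0.801, 0.120],
    [0.155, 0.083, 0.582, 0.613, 0.201, 0.801, 1.000, 0.152],
    [0.774, 0.652, 0.072, 0.111, 0.724, 0.120, 0.152, 1.000],
]

col_sums = [sum(row[j] for row in R) for j in range(8)]   # step (a): Si
T = sum(col_sums)                                         # step (b): T = 27.884
factor_A = [s / math.sqrt(T) for s in col_sums]           # step (c): Si / sqrt(T)
```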



We can also state the information as:

Variables   Factor loadings on first centroid factor A
1           0.693
2           0.618
3           0.642
4           0.641
5           0.629
6           0.694
7           0.679
8           0.683

FINDING THE SECOND CENTROID FACTOR: The loadings of the variables on the first centroid factor are multiplied pairwise. This is done for all possible pairs of variables, and the resulting matrix is named Q1.

First matrix of factor cross-products (Q1)

First centroid     .693   .618   .642   .641   .629   .694   .679   .683
factor A
.693               .480   .428   .445   .444   .436   .481   .471   .473
.618               .428   .382   .397   .396   .389   .429   .420   .422
.642               .445   .397   .412   .412   .404   .446   .436   .438
.641               .444   .396   .412   .411   .403   .445   .435   .438
.629               .436   .389   .404   .403   .396   .437   .427   .430
.694               .481   .429   .446   .445   .437   .482   .471   .474
.679               .471   .420   .436   .435   .427   .471   .461   .464
.683               .473   .422   .438   .438   .430   .474   .464   .466

Now, Q1 is subtracted element by element from the original correlation matrix R, resulting in the matrix of residual coefficients R1.
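A minimal check of this cross-product-and-residual step, taking the first-factor loadings computed above:

```python
# first centroid factor loadings A
A = [0.693, 0.618, 0.642, 0.641, 0.629, 0.694, 0.679, 0.683]

# Q1[i][j] = Ai * Aj, the cross-product of loadings
Q1 = [[ai * aj for aj in A] for ai in A]

# each residual is r_ij - Q1[i][j]; e.g. the (1,1) element, with r11 = 1:
r1_11 = 1.0 - Q1[0][0]   # about 0.520
```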


First matrix of residual coefficients (R1)

Var     1      2      3      4      5      6      7      8
1     .520   .281  -.241  -.363   .190  -.368  -.316   .301
2     .281   .618  -.346  -.307   .192  -.331  -.337   .230
3    -.241  -.346   .588   .259  -.281   .243   .146  -.366
4    -.363  -.307   .259   .589  -.381   .353   .178  -.327
5     .190   .192  -.281  -.381   .604  -.390  -.226   .294
6    -.368  -.331   .243   .353  -.390   .518   .330  -.354
7    -.316  -.337   .146   .178  -.226   .330   .539  -.312
8     .301   .230  -.366  -.327   .294  -.354  -.312   .534

Now, reflecting variables 3, 4, 6 and 7 (i.e., changing the signs of their rows and columns), we obtain the reflected matrix of residual coefficients R1 as given below. The same method as before is then repeated on it to get the second centroid factor B.
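Reflection flips the sign of a variable's row and column; an entry between two reflected variables is flipped twice and therefore keeps its sign. A sketch of the whole step, using the residual matrix rounded to three decimals (the rounding makes T come out near, not exactly at, the text's 20.987):

```python
import math

# residual matrix R1 (rounded to three decimals)
R1 = [
    [ 0.520,  0.281, -0.241, -0.363,  0.190, -0.368, -0.316,  0.301],
    [ 0.281,  0.618, -0.346, -0.307,  0.192, -0.331, -0.337,  0.230],
    [-0.241, -0.346,  0.588,  0.259, -0.281,  0.243,  0.146, -0.366],
    [-0.363, -0.307,  0.259,  0.589, -0.381,  0.353,  0.178, -0.327],
    [ 0.190,  0.192, -0.281, -0.381,  0.604, -0.390, -0.226,  0.294],
    [-0.368, -0.331,  0.243,  0.353, -0.390,  0.518,  0.330, -0.354],
    [-0.316, -0.337,  0.146,  0.178, -0.226,  0.330,  0.539, -0.312],
    [ 0.301,  0.230, -0.366, -0.327,  0.294, -0.354, -0.312,  0.534],
]
reflect = {2, 3, 5, 6}   # zero-based indices of variables 3, 4, 6, 7

# flip the sign only when exactly one of the two variables is reflected
R1r = [[(-1 if (i in reflect) != (j in reflect) else 1) * R1[i][j]
        for j in range(8)] for i in range(8)]

col_sums = [sum(row[j] for row in R1r) for j in range(8)]
T = sum(col_sums)
B = [s / math.sqrt(T) for s in col_sums]
B = [-b if i in reflect else b for i, b in enumerate(B)]  # restore signs
```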


Reflected matrix of residual coefficients (R1; variables 3*, 4*, 6*, 7* reflected)

Var     1      2      3*     4*     5      6*     7*     8
1     .520   .281   .241   .363   .190   .368   .316   .301
2     .281   .618   .346   .307   .192   .331   .337   .230
3*    .241   .346   .588   .259   .281   .243   .146   .366
4*    .363   .307   .259   .589   .381   .353   .178   .327
5     .190   .192   .281   .381   .604   .390   .226   .294
6*    .368   .331   .243   .353   .390   .518   .330   .354
7*    .316   .337   .146   .178   .226   .330   .539   .312
8     .301   .230   .366   .327   .294   .354   .312   .534

Column sums: 2.580  2.642  2.470  2.757  2.558  2.887  2.375  2.718

Sum of column sums: T = 20.987, √T = 4.581

We can now write the matrix of factor loadings as under:

Variables   Centroid factor A   Centroid factor B
1           0.693                0.563
2           0.618                0.577
3           0.642               -0.539
4           0.641               -0.602
5           0.629                0.558
6           0.694               -0.630
7           0.679               -0.518
8           0.683                0.593

Hence centroid factor B and the communality (h2) are as follows:

Variables   Centroid factor A   Centroid factor B   Communality (h2) = A2+B2
1           0.693                0.563               0.797
2           0.618                0.577               0.715
3           0.642               -0.539               0.703
4           0.641               -0.602               0.773
5           0.629                0.558               0.707
6           0.694               -0.630               0.879
7           0.679               -0.518               0.729
8           0.683                0.593               0.818

Proportion of variance:

                                Centroid factor A   Centroid factor B   Total
Eigenvalue                      3.490               2.631               6.121
Proportion of total variance    0.44 [44%]          0.33 [33%]          0.77 [77%]
Proportion of common variance   0.57 [57%]          0.43 [43%]          1.00 [100%]

PRINCIPAL-COMPONENTS METHOD
This method seeks to maximize the sum of squared loadings of each factor. Hence the factors obtained by this method explain more variance than the loadings obtained from any other method of factoring.
Principal components are constructed as linear combinations of the given set of variables:

p1 = a11X1 + a12X2 + ... + a1nXn
p2 = a21X1 + a22X2 + ... + a2nXn
and so on up to pn.

The aij are called loadings and are worked out in such a way that the principal components are uncorrelated (orthogonal) and the first principal component has the maximum variance.
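The linear-combination idea can be illustrated directly. The weight vectors below are hypothetical, chosen only to be orthogonal to each other; in a real analysis they would be the eigenvectors of R:

```python
# hypothetical weight vectors for two components; note a1 . a2 = 0
a1 = [0.4, 0.5, 0.6]
a2 = [0.6, -0.48, 0.0]

def pc_score(a, x):
    # p = a1*X1 + a2*X2 + ... + an*Xn
    return sum(w * xi for w, xi in zip(a, x))

x = [1.2, -0.3, 0.8]   # one standardized observation (made up)
p1 = pc_score(a1, x)
p2 = pc_score(a2, x)
```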

ILLUSTRATION:
Take the correlation matrix R for the 8 variables and compute:
(i) the first two principal component factors;
(ii) the communality for each variable on the basis of the said two component factors;
(iii) the proportion of total variance as well as the proportion of common variance explained by each of the two component factors.

SOLUTION:
As the correlation matrix is a positive manifold, we work out the first principal component factor as follows:
The vector of column sums is referred to as Ua1; when normalized, we call it Va1.
To normalize: square the column sums in Ua1, and then divide each element of Ua1 by the square root of the sum of those squares (the normalizing factor).

The correlation matrix R with its column sums (Ua1) and the normalized vector Va1:

Var     1      2      3      4      5      6      7      8
1     1.000   .709   .204   .081   .626   .113   .155   .774
2      .709  1.000   .051   .089   .581   .098   .083   .652
3      .204   .051  1.000   .671   .123   .689   .582   .072
4      .081   .089   .671  1.000   .022   .798   .613   .111
5      .626   .581   .123   .022  1.000   .047   .201   .724
6      .113   .098   .689   .798   .047  1.000   .801   .120
7      .155   .083   .582   .613   .201   .801  1.000   .152
8      .774   .652   .072   .111   .724   .120   .152  1.000

Column sums (Ua1):              3.662  3.263  3.392  3.385  3.324  3.666  3.587  3.605
Va1 = Ua1/normalizing factor:   0.371  0.331  0.344  0.343  0.337  0.372  0.363  0.365

Normalizing factor
= √{(3.662)2 + (3.263)2 + (3.392)2 + (3.385)2 + (3.324)2 + (3.666)2 + (3.587)2 + (3.605)2}
= 9.868

We now obtain Ua2 by accumulatively multiplying Va1 row by row into R, resulting in:

Ua2: [1.296, 1.143, 1.201, 1.201, 1.165, 1.308, 1.280, 1.275]

Normalizing it we obtain:

Va2: [0.371, 0.327, 0.344, 0.344, 0.334, 0.374, 0.366, 0.365]

Va1 and Va2 are almost equal, i.e., convergence has occurred. Finally we compute the loadings as follows:
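The Ua/Va iteration above is the power method for extracting the first eigenvector of R. A self-contained sketch, using the illustration's matrix (a few garbled entries reconstructed from the column sums):

```python
import math

R = [
    [1.000, 0.709, 0.204, 0.081, 0.626, 0.113, 0.155, 0.774],
    [0.709, 1.000, 0.051, 0.089, 0.581, 0.098, 0.083, 0.652],
    [0.204, 0.051, 1.000, 0.671, 0.123, 0.689, 0.582, 0.072],
    [0.081, 0.089, 0.671, 1.000, 0.022, 0.798, 0.613, 0.111],
    [0.626, 0.581, 0.123, 0.022, 1.000, 0.047, 0.201, 0.724],
    [0.113, 0.098, 0.689, 0.798, 0.047, 1.000, 0.801, 0.120],
    [0.155, 0.083, 0.582, 0.613, 0.201, 0.801, 1.000, 0.152],
    [0.774, 0.652, 0.072, 0.111, 0.724, 0.120, 0.152, 1.000],
]

def normalize(u):
    norm = math.sqrt(sum(x * x for x in u))
    return [x / norm for x in u], norm

u = [sum(row[j] for row in R) for j in range(8)]   # Ua1: column sums
v, _ = normalize(u)                                # Va1
for _ in range(50):                                # iterate to convergence
    u = [sum(v[i] * R[i][j] for i in range(8)) for j in range(8)]
    v, norm = normalize(u)

# at convergence, norm is the largest eigenvalue (about 3.49); the loadings
# of principal component I are the eigenvector scaled by its square root
loadings = [x * math.sqrt(norm) for x in v]
```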


Variables   Characteristic vector Va1   × √(normalizing factor of Ua2)   Principal component I
1           0.371                        1.868                            0.69
2           0.331                        1.868                            0.62
3           0.344                        1.868                            0.64
4           0.343                        1.868                            0.64
5           0.337                        1.868                            0.63
6           0.372                        1.868                            0.70
7           0.363                        1.868                            0.68
8           0.365                        1.868                            0.68

We now find principal component II (according to the method followed to obtain centroid factor B earlier) to get:

Variables   Principal component II
1            0.57
2            0.59
3           -0.52
4           -0.59
5            0.57
6           -0.61
7           -0.49
8           -0.61

Variables   Principal component I   Principal component II   Communality (h2) = I2+II2
1           0.69                     0.57                    0.801
2           0.62                     0.59                    0.733
3           0.64                    -0.52                    0.680
4           0.64                    -0.59                    0.758
5           0.63                     0.57                    0.722
6           0.70                    -0.61                    0.862
7           0.68                    -0.49                    0.703
8           0.68                    -0.61                    0.835

                                Component I      Component II     Total
Eigenvalue                      3.4914           2.6007           6.0921
Proportion of total variance    0.436 [43.6%]    0.325 [32.5%]    0.761 [76.1%]
Proportion of common variance   0.573 [57%]      0.427 [43%]      1.00 [100%]

MAXIMUM LIKELIHOOD METHOD

If Rs is the correlation matrix actually obtained from the sample data, and Rp is the correlation matrix that would be obtained if the entire population were tested, then the ML method seeks to extrapolate what is known in Rs in the best possible way to estimate Rp.

CLUSTER ANALYSIS
Unlike techniques for analyzing relationships between variables, it attempts to reorganize a differentiated group of people, events or objects into homogeneous subgroups.

STEPS:
1. Selection of the sample to be clustered (buyers, products, etc.).
2. Definition of the variables on which to measure the objects, events, etc. (e.g. market segment characteristics, product competition definitions).
3. Computation of similarities among the entities (through correlation, Euclidean distances and other techniques).
4. Selection of mutually exclusive clusters (maximization of within-cluster similarity and between-cluster differences).
5. Cluster comparison & validation.
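Step 3 can be sketched with Euclidean distance on made-up buyer profiles (values are hypothetical; in practice the variables would be standardized first so that income does not dominate):

```python
import math

# hypothetical buyers: (income in Rs '000, age, family size)
buyers = {"A": (35.0, 34, 5), "B": (120.0, 28, 1), "C": (38.0, 36, 4)}

def euclidean(p, q):
    # one common (dis)similarity measure for clustering
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

d_ac = euclidean(buyers["A"], buyers["C"])   # small: A and C cluster together
d_ab = euclidean(buyers["A"], buyers["B"])   # large: B sits in another cluster
```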

[Figure: cluster analysis used to segment the car-buying population into distinct markets, e.g. cluster A (minivan buyers) and cluster B (sports-car buyers), plotted on income, age and family size.]

MULTIDIMENSIONAL SCALING (MDS)

This creates a spatial description of a respondent's perception of a product or service and helps the business researcher understand difficult-to-measure constructs like product quality.

Method:
We may take three types of attribute space, each representing a multidimensional space:
1. Objective space: objects positioned in terms of measurable attributes such as an object's weight, flavor and nutritional value.
2. Subjective space: perceptions of an object's flavor, weight and nutritional value.
3. Preference space: describes respondents' preferences in terms of the objects' attributes (ideal point). All objects close to this ideal point are interpreted as preferred.
Ideal points from many people can be positioned in this preference space to reveal the pattern and size of preference clusters. Thus cluster analysis and MDS can be combined to map market segments and then examine products designed for those segments.

CONJOINT ANALYSIS
Used in market research & product development.

Takes non-metric & independent variables as input.

E.g. considering the purchase decision for a computer: if we have 3 prices, 3 brands, 3 speeds, 2 levels of educational value, 2 categories of games, & 2 categories of work assistance, then the model will have 3*3*3*2*2*2 = 216 decision levels.
The objective of conjoint analysis is to secure utility scores that represent the importance of each aspect of the product in the buyer's overall preference rating.
Utility scores are computed from the buyers' ratings of a set of cards. Each card in the deck describes one possible configuration of combined product attributes.
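The 216 decision levels are simply the product of the factor levels, and enumerating them is how the full deck of candidate cards would be built:

```python
from itertools import product

# factor levels of the computer-purchase example
levels = {"price": 3, "brand": 3, "speed": 3,
          "educational value": 2, "games": 2, "work assistance": 2}

n_profiles = 1
for n in levels.values():
    n_profiles *= n        # 3*3*3*2*2*2 = 216

# each tuple is one candidate product configuration (one card)
profiles = list(product(*(range(n) for n in levels.values())))
```

In practice only a fractional subset of these profiles, not all 216, is presented to respondents for rating.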

Steps followed in Conjoint Analysis

1. Select the attributes most pertinent to the purchase decision (called factors).
2. Find the possible values of each attribute (called factor levels).
3. After the factors and their levels are selected, SPSS determines the number of product descriptions necessary to estimate the utilities. It also builds a file structure for all possible combinations, generates the subset required for testing, produces the card descriptions and analyzes the results.

THANK YOU
