
MULTIVARIATE ANALYSIS

Nature of Multivariate Analysis


Most business problems are multi-dimensional, and MVA helps to solve such complex problems.
Investigation of one variable – univariate analysis
Investigation of two variables – bivariate analysis
Investigation of three or more variables – multivariate analysis
E.g., sales rarely depend on a single factor alone; several variables act together.
Classification of MV techniques
1. Dependence methods
2. Interdependence methods

Are some of the variables dependent on others?
Yes – Dependence methods
No – Interdependence methods
If a multivariate technique attempts to explain or predict a dependent variable on the
basis of two or more independent variables, then we are analyzing dependence. Multiple
regression analysis, multiple discriminant analysis, multivariate analysis of variance and
canonical correlation analysis are all dependence methods.

Analysis of Interdependence
The goal of interdependence methods is to give meaning to a set of variables or to seek to
group things together. No one variable or variable subset is to be predicted from the
others or explained by them. The most common of these methods are factor analysis,
cluster analysis and multidimensional scaling. A manager might utilize these techniques
to identify profitable market segments or clusters. These methods can also be used to
classify similar cities on the basis of population size, income distribution, etc.
As in other forms of data analysis, the nature of the measurement scales will determine
which MV technique is appropriate for the data. The exhibits below show that selecting an
MV technique requires consideration of the measurement levels of the dependent and
independent variables.
Non-metric – nominal and ordinal scales
Metric – interval and ratio scales
Exhibit 1 – Classification of dependence methods

How many variables are dependent?
One dependent variable:
    Metric – Multiple regression
    Non-metric – Multiple discriminant analysis
Several dependent variables:
    Metric – MANOVA
    Non-metric – Conjoint analysis
Multiple dependent and independent variables:
    Canonical analysis

ANALYSIS OF DEPENDENCE
Multiple Regression analysis is an extension of bivariate regression analysis, which
allows for the simultaneous investigation of the effect of two or more independent
variables on a single interval-scaled dependent variable. In reality several factors are
likely to affect such a dependent variable.
An example of a multiple regression equation is
Y = B0 + B1X1 + B2X2 + B3X3 + … + BnXn + e
where B0 = a constant, the value of Y when all X values = 0
Bi = slope of the regression surface; each Bi is the regression coefficient associated
with the corresponding Xi
e = an error term, normally distributed about a mean of 0
Let us look at a forecasting example. Suppose a toy manufacturer wishes to forecast sales
by sales territory. It is thought that competitors' sales, the presence or absence of a
company salesperson in the territory (a binary variable) and grammar school enrollment
are the independent variables that might explain the variation in the sales of the toy. The
data are fitted, and the results of the computations are as follows:
Y = 102.18 + .387X1 + 115.2X2 + 6.73X3
R2 = 0.845
F value 14.6
The regression equation indicates that sales are positively related to X1, X2 and X3.
The coefficients (B) show the effect on the dependent variable of a one-unit increase in
any independent variable. The value B2 = 115.2 indicates that an increase of Rs 115,200 in
toy sales is expected with an additional unit of X2. Thus it appears that adding a company
salesperson has a very positive effect on sales. Grammar school enrollments also help
predict sales: an increase of one unit in enrollment (1,000 students) indicates a sales
increase of Rs 6,730. A one-unit increase in competitor sales volume (X1) does not add
much to the toy manufacturer's sales.
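As a sketch, the fitted equation can be used to forecast a territory's sales. The territory values below are hypothetical, and the competitor-sales coefficient is taken as .387 (consistent with the interpretation that a unit increase adds little):

```python
def forecast_sales(competitor_sales, has_salesperson, enrollment):
    """Territory sales forecast from the fitted equation (Rs '000)."""
    return (102.18
            + 0.387 * competitor_sales     # X1: competitor sales
            + 115.2 * has_salesperson      # X2: salesperson present (0/1)
            + 6.73 * enrollment)           # X3: enrollment (in 000s)

# Hypothetical territory: competitor sales of 100, a salesperson present,
# and 200 (thousand) students enrolled
print(round(forecast_sales(100, 1, 200), 2))   # 1602.08
```

Note how the binary X2 simply switches the salesperson effect (115.2) on or off.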
The regression coefficients can be stated either in raw score units (actual X values) or as
standardized coefficients (X values in terms of their standard deviations). When regression
coefficients are standardized they are called beta weights (β), and their values indicate the
relative importance of the associated X values, especially when the predictors are unrelated.
If β1 = .60 and β2 = .20, then X1 has three times the influence on Y that X2 has.
In multiple regression the coefficients B1, B2, etc. are called coefficients of partial
regression because the independent variables are correlated with the other independent
variables. The correlation between Y and X1, with the correlation that X1 and X2 have in
common with Y held constant, is a partial correlation. Because the partial correlation
between sales and X1 has been adjusted for the effect produced by variation in X2 (and the
other independent variables), the correlation coefficient obtained from the bivariate
regression will not be the same as the partial coefficient in the multiple regression. In
multiple regression the coefficient B1 is defined as a partial regression coefficient, for
which the other independent variables are held constant.
The coefficient of multiple determination indicates the percentage of variation in Y
explained by the variation in the independent variables. R2 = .845 tells us that the
variation in the independent variables accounted for 84.5% of the variance in the dependent
variable. Adding more independent variables to the equation explains more of the
variation in Y.
To test for statistical significance, an F-test comparing the different sources of variation
is necessary. The F-test assesses the relative magnitudes of the sum of squares due to the
regression (SSr) and the error sum of squares (SSe), with their appropriate degrees of
freedom:
F = (SSr / k) / (SSe / (n − k − 1))
where k = number of independent variables
n = number of respondents or observations
Refer F tables and test hypothesis at .05 level of significance
In the example, the F ratio = 14.6,
df for the numerator = k = 3,
df for the denominator = n − k − 1 = 8.
Accept or reject H0 on the basis of a comparison between the calculated value and the
table value.
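The table lookup can also be reproduced programmatically. As a sketch, using the example's figures (F = 14.6 with 3 and 8 degrees of freedom) and SciPy's F distribution:

```python
from scipy.stats import f

# Degrees of freedom from the example: k = 3 predictors, n - k - 1 = 8
f_calculated = 14.6
f_critical = f.ppf(0.95, dfn=3, dfd=8)   # upper 5% point of F(3, 8)

print(round(f_critical, 2))        # about 4.07, the table value
print(f_calculated > f_critical)   # True: reject H0 at the .05 level
```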
A continuous, interval-scaled dependent variable is required in multiple regression, as in
bivariate regression; interval scaling is also required for the independent variables.
However, dummy variables, such as the binary variable in our example, may be utilized. A
dummy variable is one coded with two distinct levels, 0 and 1 (a categorical variable with
more levels is represented by a set of such dummies).
Multiple regression is used as a descriptive tool in three types of situations:
1. It is often used to develop a self-weighting estimating equation by which to predict
   values for a criterion variable (DV) from the values of several predictor variables
   (IV).
2. A descriptive application of multiple regression calls for controlling for confounding
   variables to better evaluate the contribution of other variables – e.g., control for
   brand and study the effect of price alone.
3. To test and explain causal theories – referred to as path analysis, in which regression
   is used to describe an entire structure of linkages advanced from causal theories.
Multiple regression is also used as an inference tool, to test hypotheses and estimate
population values.
Let us look at the following example using SPSS.
Let us assume that we use multiple regression to arrive at the key drivers of customer
usage for hybrid mail. Among the explanatory variables are customer perceptions of (1)
cost/speed evaluation, (2) security, (3) reliability, (4) receiver technology, (5)
impact/emotional value. Let us choose the first three variables, all measured on a 5-point
scale:
Y=customer usage
X1=cost/speed evaluation
X2= security
X3= reliability
SPSS computes the model and the regression coefficients. The equation can be built with:
1. specific variables
2. all variables
3. select a method that sequentially adds or removes variables. Forward selection
   starts with the constant and adds the variables that result in the largest R2
   increases. Backward elimination begins with a model containing all independent
   variables and removes the variable that changes R2 the least. Stepwise selection,
   the most popular method, combines the two: the independent variable that
   contributes the most to explaining the dependent variable is added first, and
   subsequent variables are added based on their incremental contribution over the
   first variable whenever they meet the criterion for entering the equation (e.g., a
   significance level of .01). Variables may be removed at each step if they meet the
   removal criterion, which is a larger significance level than that for entry.
The standard elements of a stepwise regression output, an important indicator of the
relative importance of the predictor variables, are shown in the exhibit.
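The sequential logic above can be sketched outside SPSS as well. The following uses scikit-learn's sequential selector (not the document's SPSS procedure) on made-up data in which only the first two of three candidate predictors actually matter:

```python
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                 # three candidate predictors
# Only X1 and X2 drive y; X3 is pure noise
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Forward selection: start from the constant, add the variable giving
# the largest improvement in fit at each step
sfs = SequentialFeatureSelector(LinearRegression(),
                                n_features_to_select=2,
                                direction="forward")
sfs.fit(X, y)
print(sfs.get_support())   # the two informative predictors are selected
```

Note that scikit-learn selects on cross-validated fit rather than the significance-level entry criterion SPSS uses; the add-one-variable-at-a-time logic is the same.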

Collinearity and Multicollinearity


Collinearity is a situation where two or more of the independent variables are highly
correlated (multicollinearity when more than two are involved), and this can have a
damaging effect on the multiple regression. When this condition exists, the estimated
regression coefficients can fluctuate widely from sample to sample, making it risky to
interpret the coefficients as indicators of the relative importance of the predictor
variables. Just how high can acceptable correlations between independent variables be?
There is no definitive answer, but correlations at .80 or above should be dealt with in one
of the following two ways:
1. Choose one of the variables and delete the other.
2. Create a new variable that is a composite of the highly intercorrelated variables and
   use this variable in place of its components. Making this decision with a correlation
   matrix alone is not sufficient. The exhibit shows a VIF index, which is a measure of
   the effect of the other independent variables on a regression coefficient. Large
   values, of 10 or more, suggest collinearity or multicollinearity. With only three
   predictors, this is not a problem here.
Another difficulty with regression occurs when researchers fail to evaluate the equation
with data beyond those used originally to calculate it. A solution would be to set aside a
portion of the data and use only the remainder to estimate the equation. This is called a
holdout sample. One then uses the equation on the holdout data to calculate R2, which can
be compared to the original R2 to determine how well the equation predicts beyond its
database.
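The VIF rule of thumb can be illustrated directly: the VIF for predictor j is 1/(1 − R²ⱼ), where R²ⱼ comes from regressing Xⱼ on the other predictors. A minimal NumPy sketch, with made-up data in which two predictors are near duplicates:

```python
import numpy as np

def vif(X, j):
    """Variance inflation factor: regress column j on the other columns."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(X)), others])   # add an intercept
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1 - resid.var() / y.var()
    return 1.0 / (1.0 - r2)

rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.05, size=100)   # nearly collinear with x1
x3 = rng.normal(size=100)                    # independent of the others
X = np.column_stack([x1, x2, x3])

print(vif(X, 0) > 10)   # True: x1 and x2 are highly collinear
print(vif(X, 2) < 10)   # True: x3 poses no problem
```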

DISCRIMINANT ANALYSIS
In a myriad of situations the researcher's purpose is to classify objects, by a set of
independent variables, into two or more mutually exclusive categories. A manager might
want to distinguish between applicants to hire and those not to hire. The challenge is to
find the discriminating variables to be utilized in a predictive equation that will produce
a better-than-chance assignment of individuals to the groups.
The prediction of a categorical variable (rather than a continuous, interval-scaled
variable as in multiple regression) is the purpose of multiple discriminant analysis. In
each of the above problems the researcher must determine which variables are associated
with the probability of an object falling into a particular group. In a statistical sense,
the problem of studying the direction of group differences is the problem of finding a
linear combination of independent variables – the discriminant function – that shows large
differences in group means. Discriminant analysis is a statistical tool for determining
such linear combinations; deriving the coefficients of the linear function is the task of
the researcher.
We will consider a two-group discriminant analysis problem, where the dependent variable
Y is measured on a nominal scale (n-way discriminant analysis is also possible). Suppose a
personnel manager believes that it is possible to predict whether an applicant will be
successful on the basis of age, sales aptitude test scores and mechanical ability test
scores. As stated at the outset, the problem is to find a linear combination of the
independent variables that shows large differences in group means. The first task is to
estimate the coefficients and calculate each individual's discriminant score. The
following linear function is used:
Zi = b1X1i + b2X2i + … + bnXni
where
Xni = applicant's value on the nth independent variable
bn = discriminant coefficient (weight) for the nth variable
Zi = ith applicant's discriminant score
Using scores for all individuals in the sample, a discriminant function is determined based
on the criterion that the groups be maximally discriminated on the set of independent
variables. Returning to the example with three independent variables, suppose the
personnel manager calculates the standardized weights in the equation to be
Z = b1X1 + b2X2 + b3X3
  = .069X1 + .013X2 + .0007X3
This means that age (X1) is much more important than the sales aptitude test score (X2),
and mechanical ability (X3) has relatively little discriminating power.
In the computation of the linear discriminant function, weights are assigned to the
variables such that the ratio of the difference between the means of the two groups to the
standard deviation within the groups is maximized. The standardized discriminant
coefficients, or weights, provide information about the relative importance of each of
these variables in discriminating between the groups.
An important goal of discriminant analysis is to perform a classification function. The
object of classification in our example is to predict which applicants will be successful
and which will be unsuccessful, and to group them accordingly. To determine whether
discriminant analysis can be used as a good predictor, the information provided in the
'confusion matrix' is utilized. Suppose the personnel manager has 40 successful and 45
unsuccessful employees in the sample.

Confusion Matrix

                          Predicted Group
Actual Group         Successful   Unsuccessful   Total
Successful               34             6          40
Unsuccessful              7            38          45

The confusion matrix shows that the rate of correctly classified employees (72 of 85, or
about 85%) is much higher than would be expected by chance. Tests can be performed to
determine whether the rate of correct classification is statistically significant.
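The hit rate can be computed directly from the matrix and compared with the proportional chance criterion (the hit rate expected from assigning cases at random in proportion to group sizes), a quick sketch of the better-than-chance check:

```python
import numpy as np

# Rows: actual (successful, unsuccessful); columns: predicted
confusion = np.array([[34, 6],
                      [7, 38]])

total = confusion.sum()                    # 85 employees
hit_rate = np.trace(confusion) / total     # (34 + 38) / 85

# Proportional chance criterion: sum of squared group proportions
group_sizes = confusion.sum(axis=1)        # 40 and 45
chance = ((group_sizes / total) ** 2).sum()

print(round(hit_rate, 3))   # 0.847
print(round(chance, 3))     # 0.502
print(hit_rate > chance)    # True: well above chance classification
```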
A second example will allow us to portray DA from a graphic perspective. Suppose a bank
loan officer wants to segregate corporate loan applicants into those likely to default and
those not likely to default. Assume that some data are available on a group of firms that
went bankrupt and another group that did not. For simplicity we assume that only the
current ratio and the debt/asset ratio are analysed. The ratios for the sample firms are
given.
The data in the table have been plotted in the graph, with Xs representing firms that went
bankrupt and dots representing firms that remained solvent. For example, point A in the
upper left segment is the point for firm 2, which had a current ratio of 3.0 and a
debt/asset ratio of 20%. The dot at point A indicates that the firm did not go bankrupt.
From a graphic perspective, we construct a boundary line (the discriminant function)
through the graph such that if a firm is to the left of the line it is not likely to become
insolvent. In our example the line takes this form:
Z= a + b1(current ratio) +b2(debt/asset ratio)
Here a is a constant term and b1 and b2 indicate the effect that the current ratio and the
debt/asset ratio have on the probability of a firm going bankrupt.
The following discriminant function is obtained

Z = -.3877 - 1.0736(current ratio) + .0579(debt/asset ratio)


This equation may be plotted in the graph as the locus of points for which Z = 0; all
combinations of current ratio and debt/asset ratio on the line result in Z = 0. Companies
that lie to the left of the line are not likely to go bankrupt, while those to the right
are likely to fail. It can be seen from the graph that one X (indicating a failed company)
lies to the left of the line, while two dots (indicating non-bankrupt companies) lie to the
right of the line. Thus the DA failed to properly classify three companies.
Once we have determined the parameters of the discriminant function, we can calculate Z
scores for our hypothetical companies. These may be interpreted as follows:
• Z = 0: a 50-50 probability of future bankruptcy (say, within 2 years). The company lies
  on the boundary line.
• Z < 0: there is less than a 50% probability of bankruptcy. The smaller the Z score, the
  lower the probability of bankruptcy.
• Z > 0: the probability of bankruptcy is greater than 50%. The larger the Z score, the
  greater the probability of bankruptcy.
The mean Z score of the companies that did not go bankrupt is −.583, while that of the
bankrupt firms is +.648. These means, along with approximations of the Z-score probability
distributions of the two groups, are graphed. We may interpret this graph as indicating
that if Z is less than about −.3 there is a small probability that a firm will turn out to
be bankrupt, while if Z is greater than about +.3 there is only a small probability that it
will remain solvent. If Z is in the range ±.3 we are highly uncertain as to how the firm
should be classified – the zone of ignorance.
The signs of the coefficients of the discriminant function are logical. Since its
coefficient is negative, the larger the current ratio, the lower a company's Z score, and
the lower the Z score, the smaller the probability of failure. Similarly, high debt ratios
produce high Z scores, which mean a higher probability of bankruptcy.
In the discriminant function we have been discussing only two variables, but other
variables, such as ROA (rate of return on assets), can be introduced:
Z = a + b1(current ratio) + b2(D/A) + b3(ROA)
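The fitted two-variable function can be applied directly. For instance, firm 2 from the example (current ratio 3.0, debt/asset ratio 20%) scores well below zero, consistent with its position to the left of the boundary line:

```python
def z_score(current_ratio, debt_asset_pct):
    """Discriminant score from the fitted two-variable function."""
    return -0.3877 - 1.0736 * current_ratio + 0.0579 * debt_asset_pct

z = z_score(3.0, 20.0)
print(round(z, 4))   # -2.4505: far left of the Z = 0 boundary
print(z < -0.3)      # True: outside the zone of ignorance, likely solvent
```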
Multivariate Analysis of Variance (MANOVA)
MANOVA is used when there are multiple interval- or ratio-scaled dependent variables.
There may be one or more nominally scaled independent variables. By manipulating the
sales compensation system in an experimental situation and holding the compensation
system constant in a controlled situation, a researcher may be able to identify the effect
of a new compensation system on sales volume as well as on job satisfaction and turnover.
With MANOVA, a significance test of mean differences between groups can be made
simultaneously for two or more dependent variables.
MANOVA assesses the relationship between two or more dependent variables and
classificatory variables or factors. In business research MANOVA can be used to test
differences among samples of employees, customers, manufactured items, etc.
MANOVA is similar to univariate ANOVA, with the added ability to handle several
dependent variables. It employs sum-of-squares and cross-products (SSCP) matrices to test
for differences among groups. The variance between groups is determined by partitioning
the total SSCP matrix and testing for significance. The F ratio, generalized to a ratio
involving the within-group and total-group variance matrices, tests for equality among
treatment groups.
The central hypothesis of MANOVA is that all centroids (multivariate means) are equal.
When H0 is rejected, additional tests are done to better understand the data. Several
alternatives may be considered:
1. Univariate F tests can be run on the dependent variables.
2. Simultaneous confidence intervals can be produced for each variable.
3. Stepdown analysis, like stepwise regression, can be run by computing F values
   successively, each value being computed after the effects of the previous dependent
   variables are eliminated.
4. Multiple discriminant analysis can be used on the SSCP matrices. This aids in the
   discovery of which variables contribute to MANOVA significance.
When MANOVA is applied properly, the dependent variables are correlated. If the dependent
variables were unrelated there would be no need for a multivariate test, and we could use
separate F tests for each characteristic.
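A minimal sketch of the first follow-up option above (comparing group centroids, then running a univariate F test on each dependent variable), using made-up two-group data with two dependent variables:

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(42)
# Two groups with clearly different centroids on two dependent variables
group_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(30, 2))
group_b = rng.normal(loc=[2.0, 1.5], scale=0.5, size=(30, 2))

# Centroids (multivariate means) -- the quantities MANOVA compares
print(group_a.mean(axis=0))   # near [0, 0]
print(group_b.mean(axis=0))   # near [2, 1.5]

# Univariate F test on each dependent variable separately
for j in range(2):
    stat, p = f_oneway(group_a[:, j], group_b[:, j])
    print(f"DV {j + 1}: F = {stat:.1f}, p = {p:.2g}")
```

Note this shows only the univariate follow-up, not the multivariate SSCP test itself; with correlated dependent variables, the full MANOVA test is still needed first.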
CONJOINT ANALYSIS
In management research the most common applications for conjoint analysis are market
research and product development. A customer buying a computer may evaluate a set of
attributes to choose a product that best meets their needs. They may consider brand,
speed, price, educational values games or capacity for work-related risks. The attributes
and their features require the buyer to make trade-offs in the final decision making.
Method
CA uses input from nonmetric independent variables. Normally we would use cross-
classification tables to handle such data, but even multiway tables become complex. If
there were three prices, three brands, three speeds, two levels of educational value, two
categories for games, and two categories for work assistance, the model would have
3 × 3 × 3 × 2 × 2 × 2 = 216 possible combinations. This poses enormous difficulties for
respondents and researchers. CA solves this problem with various optimal scaling
approaches, often with log-linear models, to provide reliable answers.
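The count of profiles can be verified by enumerating the full factorial. The level names below are hypothetical; a real study would have respondents rate only a fractional subset of these cards:

```python
from itertools import product

# Hypothetical factor levels matching the example's 3x3x3x2x2x2 design
prices = ["low", "medium", "high"]
brands = ["A", "B", "C"]
speeds = ["1 GHz", "2 GHz", "3 GHz"]
educational = ["yes", "no"]
games = ["yes", "no"]
work = ["yes", "no"]

cards = list(product(prices, brands, speeds, educational, games, work))
print(len(cards))   # 216 possible full-profile cards
print(cards[0])     # one card: ('low', 'A', '1 GHz', 'yes', 'yes', 'yes')
```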
The objective of CA is to obtain utility scores that represent the importance of each
aspect of the product in the subject's overall preference rankings or ratings of a set of
cards. Each card in the deck describes one possible configuration of combined product
attributes.
The first step in a conjoint analysis is to select the attributes most pertinent to the
purchase decision. This may require an exploratory study, such as a focus group, or may be
done by an expert. The attributes selected are called factors, and the possible values for
an attribute are called factor levels. Factors like speed can be quantified, while others,
like brand, are discrete variables.
After selecting the factors and their factor levels, a computer programme determines the
number of product descriptions necessary to estimate the utilities. SPSS ORTHOPLAN,
PLANCARDS, and CONJOINT build a file structure for all possible combinations, generate the
subset required for testing, produce the card descriptions and analyse the results. The
command structure within these procedures provides for holdout sampling, simulations and
other requirements frequently used in commercial applications.

INTERDEPENDENCE METHODS

Are the inputs metric?
Metric (interval or ratio scales) – Factor analysis, Cluster analysis,
    Multidimensional scaling
Non-metric (nominal or ordinal scales) – Non-metric multidimensional scaling

Factor Analysis
FA is a general term for several specific computational techniques. It has the objective
of reducing to a manageable number many variables that belong together and have
overlapping measurement characteristics. The predictor-criterion relationship found in the
dependence situation is replaced by a matrix of intercorrelations among several variables,
none of which is viewed as dependent on the others. For example, one may have data on 100
employees with scores on six attitude scale items.
Method
FA begins with the construction of a new set of variables based on the relationships in
the correlation matrix. While this can be done in a number of ways, the most frequently
used approach is principal component analysis. This method transforms a set of variables
into a new set of composite variables, or principal components, that are not correlated
with one another. These linear combinations of variables, called factors, account for the
variance in the data as a whole. The best combination makes up the first principal
component. The second principal component is defined as the best linear combination of
variables for the variance not explained by the first factor. In turn there may be a
third, fourth and kth component, each being the best linear combination of variables not
accounted for by the previous factors.
The process continues until all the variance has been accounted for, but it is usually
stopped after a few factors have been extracted.
The values in the table are correlation coefficients between the factors and the variables
(.70 is the correlation between factor 1 and variable A). These correlation coefficients
are called loadings. Eigenvalues are the sums of squares of the factor loadings
(.70² + .60² + … + .60²). When divided by the number of variables, an eigenvalue yields an
estimate of the amount of total variance explained by the factor; e.g., factor 1 accounts
for 36% of the total variance. The column h² gives the communalities, or estimates of the
variance in each variable explained by the two factors. For variable A the communality is
.70² + (−.40)² = .65, indicating that 65% of the variance in A is statistically explained
in terms of factors 1 and 2.
In this case the unrotated factor loadings are not enlightening. We need to find some
pattern in which factor I has high loadings on some variables and factor II on others. We
can attempt to secure this less ambiguous condition between factors and variables by
rotation. This procedure can be carried out by either orthogonal or oblique methods.
The interpretation of factor loadings is largely subjective. There is no way to calculate
the meaning of the factors; they are what one sees in them. For this reason factor
analysis is largely used for exploration: one can detect patterns in latent variables,
discover new concepts and reduce data.
In order to further clarify the factors, a varimax rotation is used on the matrix. Varimax
can clarify relationships, but interpretation remains largely subjective.
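The link between eigenvalues and explained variance can be sketched with NumPy on made-up data. For a correlation matrix, the eigenvalues sum to the number of variables, so each eigenvalue divided by that count is the proportion of total variance a factor explains:

```python
import numpy as np

rng = np.random.default_rng(7)
# Made-up scores for 100 respondents on four overlapping attitude items,
# all driven by one shared latent factor plus noise
base = rng.normal(size=100)
data = np.column_stack([base + rng.normal(scale=0.5, size=100)
                        for _ in range(4)])

R = np.corrcoef(data, rowvar=False)          # 4 x 4 correlation matrix
eigenvalues = np.linalg.eigvalsh(R)[::-1]    # largest first

print(round(eigenvalues.sum(), 6))           # 4.0: the number of variables
print(round(eigenvalues[0] / 4, 2))          # share of variance, factor 1
```

Because the four items share one latent driver, the first eigenvalue dominates, mirroring a strong first factor in FA.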

CLUSTER ANALYSIS
CA is a technique for grouping similar objects or people. CA shares some similarities with
FA, especially when FA is applied to people instead of variables. It differs from
discriminant analysis in that DA begins with well-defined groups, composed of two or more
distinct sets of characteristics, in search of a set of variables to separate them. CA
starts with an undifferentiated group of people, events or objects and attempts to
reorganize them into homogeneous subgroups.
Method
Five steps are basic to the application of cluster studies:
1. Selection of the sample to be clustered (e.g., buyers, employees, etc.)
2. Definition of the variables on which to measure the objects, events or people
   (financial status, political affiliation, etc.)
3. Computation of similarities among the entities through correlation and other
   techniques
4. Selection of mutually exclusive clusters (maximization of within-cluster similarity
   and between-cluster differences) or hierarchically arranged clusters
5. Cluster comparison and validation
Different clustering methods can and do produce different solutions. It is important to
have enough information about the data to know when the derived groups are real and not
merely imposed on the data by the method.
CA can be used to plan marketing campaigns and develop strategies.
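The five steps can be sketched end-to-end with scikit-learn on made-up data (standing in for, say, customers measured on two variables):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)
# Steps 1-2: sample and measurement variables (two well-separated groups)
segment_a = rng.normal(loc=[0, 0], scale=0.4, size=(40, 2))
segment_b = rng.normal(loc=[5, 5], scale=0.4, size=(40, 2))
X = np.vstack([segment_a, segment_b])

# Steps 3-4: similarity (Euclidean distance) and mutually exclusive clusters
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Step 5: compare -- each derived cluster should be internally homogeneous
labels = km.labels_
print(len(set(labels[:40])), len(set(labels[40:])))  # 1 1: a clean split
```

K-means is only one of many clustering methods; as the text notes, a different method (e.g., hierarchical clustering) can produce a different solution on the same data.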
MULTIDIMENSIONAL SCALING
MDS creates a spatial description of a respondent's perception about a product, service or
any other object of interest. This helps the business researcher to understand difficult-
to-measure constructs, such as product quality or desirability. In contrast to variables
that can be measured directly, many constructs are perceived and cognitively mapped in
different ways by individuals. With MDS, items that are perceived to be similar fall close
together in multidimensional space, while items that are dissimilar are farther apart.
METHOD
We may think of three types of attribute space, each representing a multidimensional map:
1. Objective space, in which an object can be positioned in terms of its measurable
   attributes: flavour, weight, nutritional value.
2. Subjective space, in which perceptions of the object's flavour, weight and nutritional
   value can be positioned. Objective and subjective attribute assessments may coincide,
   but often they do not. A comparison of the two allows us to judge how accurately an
   object is being perceived. Individuals may hold different perceptions of an object
   simultaneously, and these may be averaged to present a summary measure of perception.
   A person's perception may also vary over time and in different circumstances. Such
   measurements are valuable for gauging the impact of various perception-affecting
   actions, such as advertising programmes.
3. Preference space, which describes respondents' preferences using the object's
   attributes. A respondent's position here represents his or her ideal; all objects
   close to this ideal point are interpreted as preferred to those that are more distant.
   Ideal points from many people can be positioned in this preference space to reveal the
   pattern and size of preference clusters. These can be compared to the subjective space
   to see how well the preferences correspond to perception clusters. In this way CA and
   MDS can be combined to map market segments and then design products for those segments.
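A minimal MDS sketch with scikit-learn, using a hypothetical dissimilarity matrix for four products (two similar pairs, far apart from each other); items judged similar land close together in the recovered two-dimensional map:

```python
import numpy as np
from sklearn.manifold import MDS

# Hypothetical dissimilarity judgments: products 0 and 1 are alike,
# products 2 and 3 are alike, and the two pairs are far apart
D = np.array([[0.0, 1.0, 8.0, 8.0],
              [1.0, 0.0, 8.0, 8.0],
              [8.0, 8.0, 0.0, 1.0],
              [8.0, 8.0, 1.0, 0.0]])

mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(D)              # one 2-D point per product

within = np.linalg.norm(coords[0] - coords[1])   # a similar pair
between = np.linalg.norm(coords[0] - coords[2])  # a dissimilar pair
print(within < between)   # True: the map preserves the judgments
```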
