
AN INTRODUCTION TO MULTIVARIATE ANALYSIS
Some problems in biology involve many variables, that is, for each unit examined
there may be observations on two or more related measurements. When more than
two variables are analysed simultaneously, the analysis is termed multivariate analysis.
Some research questions may be answered with simpler statistical methods, especially
where the data were experimentally generated under controlled conditions. We are
already familiar with multiple regression analysis in which several variables are
used. In multiple regression one variable, the dependent variable, is generally predicted
or explained by means of other variables, the independent variables. However, many
research questions are so complex that they demand multivariate methods such as
discriminant function analysis, factor analysis and canonical correlation.
32.1 DISCRIMINANT FUNCTION ANALYSIS
Discriminant function analysis is essentially a multiple regression where the
dependent variable Y is a categorical variable. Biologists use the discriminant
function analysis first to determine which variables discriminate between k groups and to derive optimal predictive functions; this step is usually termed predictive discriminant analysis. The predictive functions are then used to allocate new or unknown specimens to the predetermined groups. For example, a researcher may want to investigate which variables discriminate between short duration, medium duration and long duration rice varieties. When a new variety is evolved, the functions can be used to determine whether it is a short duration, medium duration or long duration variety. Similarly, an
extension worker may try to classify the farmers as adopters and non-adopters of
new agricultural techniques.
32.1.1 Assumptions
All the usual assumptions of regression analysis apply to discriminant analysis. The independent variables are measured on an interval scale. However, as with other regression analyses, dummy variables and ordinal variables are commonly used as well. The independent variables must be uncorrelated; if they are correlated, the standardized discriminant function coefficients will not reliably assess their relative importance. The dependent variable is a true dichotomy. Within each group the independent variables may have different variances, but for the same independent variable the variance should be similar between groups. The sample size should be adequate: it is usually recommended that the number of observations be at least five times the number of independent variables.
Discriminant analysis conducted for predictive purposes is based on an initial set of observations whose group membership is known. The general problem is to develop a weighted combination of the predictors, that is, a discriminant function of the form
Z = λ1 X1 + λ2 X2 + ... + λp Xp
Fisher has shown how to devise the weighted coefficients (the λ's) such that, if we were to make an analysis of variance of the Z-values, the ratio of the variance between groups to that within groups would be a maximum.
32.1.2 Computational methods
First let us consider the two-group case. Suppose we have a short duration rice variety (group A) and a long duration rice variety (group B). Assume that there are na and nb observations in group A and group B respectively. Also suppose that three variables are studied: panicle length in cm (X1), number of grains per panicle (X2) and grain weight in grams (X3). The measurements and the basic calculations are presented in Table 32.1.
Table 32.1 Data related to two rice varieties.

                      Group A                          Group B
             X1        X2        X3          X1        X2        X3
            25.6      127        37         22.6      118        36
            22.5       88        20         22.8      109        23
            23.7      139        30         23.4      110        22
            24.6      175        38         22.6      101        23
            24.9      151        32         21.3       93        22
            28.7      178        65         21.6       84        16
            22.5       96        10         20.9       76        10
            24.3      121        20         21.0       87        17
            26.0      134        20         20.2       80         8
            25.2      174        26         22.8      115        16
            25.0      145        40         20.6       81        14
            20.9       84        22         20.5      103        16
Σxi        293.9     1612       360        260.3     1157       223
Σxi²      7241.75   228474     13022      5659.47   113951      4739
x̄i         24.49    134.33      30.0       21.69     96.42     18.58
SS(xi)     43.65   11928.67    2222.0      13.13    2396.92    594.92
Σx1xj        -      40048.7    9051.3        -      25237.4    4895.6
SP(x1xj)     -       568.13     234.3        -       140.14     58.36
Σx2xj        -         -       51913         -         -       22389
SP(x2xj)     -         -       3553.0        -         -       888.08
The sum of squares (SS) and sum of products (SP) are computed in the usual
way.
For the combined groups the following calculations are then made.
SS(x1) = SS(x1a) + SS(x1b) = 43.65 + 13.13 = 56.78
Similarly, SS(x2) = 14325.59; SS(x3) = 2816.92
SP(x1x2) = 568.13 + 140.14 = 708.27
SP(x1x3) = 234.30 + 58.36 = 292.66
SP(x2x3) = 3553.00 + 888.08 = 4441.08
d1 = x̄1a - x̄1b = 24.49 - 21.69 = 2.80
d2 = x̄2a - x̄2b = 134.33 - 96.42 = 37.91
d3 = x̄3a - x̄3b = 30.00 - 18.58 = 11.42
The coefficients of the discriminant function (the λ's) arise from maximizing the ratio
G = Between groups variance / Within groups variance
Maximizing the ratio G yields the following set of simultaneous equations:
λ1 SS(x1) + λ2 SP(x1x2) + λ3 SP(x1x3) = d1
λ1 SP(x1x2) + λ2 SS(x2) + λ3 SP(x2x3) = d2
λ1 SP(x1x3) + λ2 SP(x2x3) + λ3 SS(x3) = d3
In matrix form these equations give

[λ1]   [ SS(x1)    SP(x1x2)  SP(x1x3) ]⁻¹ [d1]   [ C11  C12  C13 ] [d1]
[λ2] = [ SP(x1x2)  SS(x2)    SP(x2x3) ]   [d2] = [ C12  C22  C23 ] [d2]
[λ3]   [ SP(x1x3)  SP(x2x3)  SS(x3)   ]   [d3]   [ C13  C23  C33 ] [d3]

where the Cij are the elements of the inverse of the first matrix on the right hand side. For our example,

[λ1]   [  0.0554   -0.0019   -0.0028 ] [  2.80 ]
[λ2] = [ -0.0019    0.0002   -0.0001 ] [ 37.91 ]
[λ3]   [ -0.0028   -0.0001    0.0008 ] [ 11.42 ]
Multiplying the matrices on the right hand side, we get
λ1 = 0.0511; λ2 = 0.0011; λ3 = -0.0025
For the sake of convenience all values are divided by the least absolute value, here λ2. Hence we get
λ1 = 46.45; λ2 = 1.00; λ3 = -2.27
The discriminant function is
Z = 46.45 X1 + X2 - 2.27 X3
The significance of the discriminant function can be tested by means of analysis of variance. For this we first calculate the within groups sum of squares, which is given by
D = λ1 d1 + λ2 d2 + λ3 d3
with (na + nb - p - 1) degrees of freedom (d.f.), where p = number of characters measured. Then the between groups SS is computed as
[na nb / (na + nb)] D², with p d.f.
For our example
D = (0.0511)(2.80) + (0.0011)(37.91) + (-0.0025)(11.42) = 0.1562
Between groups SS = [(12)(12) / (12 + 12)] (0.1562)² = 0.1464
The analysis of variance table is as follows:

Sources of variation     d.f.      SS        MS        F
Between groups             3     0.1464    0.0488    6.256
Within groups             20     0.1562    0.0078

Since F is significant at the 1% level we conclude that the groups are not homogeneous and can be discriminated. The larger the λ, the larger the respective variable's unique contribution to the discrimination specified by the discriminant function. In our example, panicle length is the most important variable contributing to the discrimination between the two groups.
The classification function for the ith group is given by
Si = λ1 x̄1i + λ2 x̄2i + .... + λp x̄pi
Each function allows us to compute a classification score for each group. For our example,
Sa = (0.0511)(24.49) + (0.0011)(134.33) + (-0.0025)(30.00) = 1.3242
Sb = (0.0511)(21.69) + (0.0011)(96.42) + (-0.0025)(18.58) = 1.1680
There are as many classification functions as there are groups.
Suppose that a new individual has the measurements
X1 = 28; X2 = 170; X3 = 38
We wish to know to which group this individual should be assigned. For this we compute the score using the given values:
U = (0.0511)(28) + (0.0011)(170) + (-0.0025)(38) = 1.5228
This value is compared with the mean of the scores for the two groups, that is,
(Sa + Sb)/2 = (1.3242 + 1.1680)/2 = 1.2461
Since U > (Sa + Sb)/2, it is evident that this individual should be assigned to group A. By substituting the values of group A in Sa, we can see that 9 out of 12 cases (75%) are classified correctly. Similarly, using Sb we can see that for group B the correct classification rate is 83.3%, and the overall correct classification rate is 79%.
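The two-group computation above can also be carried out directly with a short NumPy sketch. This is only an illustration of the arithmetic, not the text's own program; because it solves the simultaneous equations exactly, the coefficients differ slightly from those in the text, which works with an inverse rounded to four decimals.

```python
import numpy as np

# Pooled within-group SS/SP matrix and the vector of mean differences (Table 32.1).
S = np.array([[  56.78,   708.27,  292.66],
              [ 708.27, 14325.59, 4441.08],
              [ 292.66,  4441.08, 2816.92]])
d = np.array([2.80, 37.91, 11.42])

lam = np.linalg.solve(S, d)              # discriminant coefficients (the lambdas)
lam_scaled = lam / np.min(np.abs(lam))   # scaled so that the smallest coefficient is 1

D = lam @ d                              # D = sum of lambda_i * d_i
na = nb = 12
between_SS = na * nb / (na + nb) * D**2  # between groups SS

# Classification scores for the group means and for the new plant.
xbar_a = np.array([24.49, 134.33, 30.00])
xbar_b = np.array([21.69,  96.42, 18.58])
Sa, Sb = lam @ xbar_a, lam @ xbar_b
U = lam @ np.array([28.0, 170.0, 38.0])
print("group A" if U > (Sa + Sb) / 2 else "group B")   # the new plant falls in group A
```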
This procedure can be extended to any number of groups. When there are
more than two groups, then we can estimate more than one discriminant function.
In general the total possible number of discriminant functions will be equal to the
number of groups minus one. For example, when there are three groups we would
estimate two functions, (1) a function for discriminating between group 1 and groups
2 and 3 combined, and (2) another function for discriminating between group 2 and
group 3. When interpreting multiple discriminant functions we would first test the
different functions for statistical significance, and consider only the significant
functions for further examination.
Closely allied with discriminant analysis is the standardized distance between two population means. Consider the mean vectors X̄1 and X̄2 of two populations. The squared standardized distance between the two means is defined as
D² = (X̄1 - X̄2)′ S⁻¹ (X̄1 - X̄2)
where S⁻¹ is the inverse of the within group variance-covariance matrix, and X̄1 and X̄2 are vectors of means.
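The squared standardized distance can be sketched in the same way. The divisor (na + nb - 2) used below to turn the pooled SS/SP matrix of the example into a covariance matrix is an assumption made only for this illustration; the text does not carry out this calculation.

```python
import numpy as np

xbar1 = np.array([24.49, 134.33, 30.00])   # group A means
xbar2 = np.array([21.69,  96.42, 18.58])   # group B means

# Pooled SS/SP matrix of the example divided by (na + nb - 2) as an estimate of
# the within-group variance-covariance matrix (assumed divisor).
S = np.array([[  56.78,   708.27,  292.66],
              [ 708.27, 14325.59, 4441.08],
              [ 292.66,  4441.08, 2816.92]]) / (12 + 12 - 2)

diff = xbar1 - xbar2
D2 = diff @ np.linalg.inv(S) @ diff        # squared standardized (Mahalanobis) distance
```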
32.2 CLUSTER ANALYSIS
Cluster analysis is a method for grouping objects into categories where members
of the groups share properties in common. For instance, biologists have to arrange
the different species of plants into meaningful structures, that is, to develop
taxonomies. Cluster analysis is very useful in the fields of medicine and marketing
as well.
Establishment of clusters may begin with all cases in one cluster. This cluster
is gradually broken down into smaller and smaller clusters. It is called divisive
clustering. Another technique of clustering starts with single member clusters and
they are gradually fused until one large cluster is formed. It is known as agglomerative
clustering.
Cluster membership may be based on the presence or absence of a single characteristic; such a scheme is known as monothetic. When more than one characteristic is used for grouping, the scheme is known as polythetic.
The cluster analysis is different from discriminant analysis. In discriminant
function analysis the groups are determined before hand and the object is to
determine the linear combination of different variables which best discriminates
among the groups. In cluster analysis the groups are not predetermined and the
object is to determine the best way in which cases may be clustered into groups.
Cluster analysis classifies unknown groups while discriminant function analysis
classifies known groups. The procedure for doing a discriminant function analysis
is well established. On the other hand cluster analysis allows many choices about
the nature of the algorithm for combining groups. Each choice may result in a
different grouping structure.
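Agglomerative clustering of the kind described above can be sketched with SciPy's hierarchical clustering routines. The data, the choice of 'average' linkage and the decision to cut the tree into two groups are all assumptions made only for this illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical measurements: rows are plants, columns are characters
# (panicle length, grains per panicle, grain weight).
X = np.array([[25.6, 127, 37],
              [22.5,  88, 20],
              [23.7, 139, 30],
              [22.6, 118, 36],
              [21.3,  93, 22],
              [20.2,  80,  8]], dtype=float)

# Standardize so that no single character dominates the distances.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)

# Agglomerative clustering: start with single-member clusters and fuse the
# closest pair at each step.
Z = linkage(Xs, method='average', metric='euclidean')

# Cut the resulting tree into two groups.
labels = fcluster(Z, t=2, criterion='maxclust')
print(labels)
```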
32.3 FACTOR ANALYSIS
In agricultural research we often obtain information on several related variables. When the number of variables is too large, analysis and interpretation become complicated. In such situations we try to exclude some of the original variables, that is, we try to reduce the dimensionality in order to attain simplicity of understanding and interpretation, while at the same time retaining sufficient detail for adequate representation. Reduction in dimensionality can also lead to economy of description, of measurement, or of both.
Dimensionality may be reduced either by exclusion of variables by judgement
or by statistical techniques. Exclusion of variables by judgement may be done
before the experiment or after specific experiment. Judgement approaches may lead
to excessive elimination of variables. It is undesirable because in such cases
meaningful statistical analysis will not be possible. Hence in order to reduce the
dimensionality, statistical techniques such as factor analysis are used.
In statistical techniques we start with one set of p variables. The variables are
generally correlated with one another. We wish to reduce the large number of
variables to a smaller number, say k, of components or factors that capture most of
the variance in the observed variables. Each component or factor is estimated as
being a linear weighted combination of the observed variables.
Factor analysis is used (1) to reduce the dimensionality of a large number of variables to a smaller number of factors and (2) to confirm a hypothesized factor structure by testing hypotheses about the structuring of the variables in terms of the expected number of significant factors and factor loadings. Hence,
basically there are two types of factor analysis, namely, exploratory factor analysis
and confirmatory factor analysis. Both types of factor analyses are based on
Common Factor Model.
The primary objectives of an exploratory factor analysis are to determine the
number of common factors influencing a set of variables and to determine the
strength of relationship between each factor and each observed variable. In other
words, it is used to explore the underlying structure of a set of observed variables
when there are no a priori hypotheses about the factor structure. The confirmatory
factor analysis is used to test or confirm specific hypotheses about the factor
structure for a set of variables. Exploratory factor analysis is simpler to perform than
confirmatory factor analysis. A larger sample size is required for a confirmatory
factor analysis than for an exploratory factor analysis. For these reasons the
commonly used type is exploratory factor analysis.
Exploratory factor analysis is often confused with principal component analysis.
However, there are important differences between the two. In principal component analysis all the variability (common, specific and error) in the variables is used in the analysis. In factor analysis specific and error variances are excluded and only the common variance is taken into account. The exploratory factor
analysis is used when we are interested in making statements about the factors that
are responsible for a set of observed responses whereas the principal component
analysis is used when we are simply interested in performing data reduction.
There are three customary steps in factor analysis. First, we have to collect
data. Then we have to work out the correlations between variables (or attributes)
and form the correlation matrix.
The second step is to explore the possibilities of data reduction, that is, the
initial set of factors are to be explored. There are different methods to extract the
factors. The common methods are principal component analysis, maximum likelihood
and principal axis.
As already stated, principal component analysis is used to reduce the dimensionality. When data are collected on p variables, the variables are often correlated. This correlation indicates that some of the information contained in one variable is also contained in some of the other p - 1 variables. The objective of principal component analysis is to transform the p original correlated variables into p independent or orthogonal components. These components are linear functions of the original variables. However, we may select only k components, where k is less than p. It may be found that the selected k components explain the greatest portion of the total variance (common, specific and error). The remaining p - k components explain a very small portion of the total variance and hence they are discarded from the analysis.
The question of how many components are needed to satisfactorily explain
the total variance is an unresolved one. There are no statistical tests to determine
the significance of a component so that it can be included in the analysis.
However, some rules of thumb are available to determine the optimal number of factors. Among them, Kaiser's eigen value criterion is the most widely used method.
Eigen value represents the total variance explained by a particular component
or factor. It is the sum of squared values in the column of a factor matrix. Factor
matrix is the matrix of factor structure coefficients in which the factors are presented
as columns and the variables are presented as rows. The factor structure coefficient
is the coefficient of correlation between a variable and a factor. The structure
coefficients are usually termed factor loadings. The eigen value is denoted by λ. Kaiser proposed that only factors with an eigen value of at least 1 (λ ≥ 1) should be retained. In many practical applications we can expect the number of eigen values greater than 1.00 to be approximately p/3, where p is the number of original X variables. Thus 15 variables could be expected to reduce to about 5 principal components.
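The Kaiser criterion is easy to apply once the eigen values of the correlation matrix are available, as the following sketch shows; the correlation matrix here is hypothetical.

```python
import numpy as np

# Hypothetical correlation matrix of p = 4 observed variables.
R = np.array([[1.00, 0.45, 0.14, 0.55],
              [0.45, 1.00, 0.56, 0.79],
              [0.14, 0.56, 1.00, 0.57],
              [0.55, 0.79, 0.57, 1.00]])

eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]   # eigen values, largest first
k = int(np.sum(eigvals >= 1.0))                  # number of components kept by the Kaiser criterion
print(np.round(eigvals, 3), k)
```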
In principal components analysis the derived components may be used as
independent variables in a multiple regression analysis. However, in reporting the
results of a regression on principal components, it is generally desirable to transform
the resulting regression equation into an equation in terms of the original variables.
There are advantages as well as disadvantages of using principal components in a
regression analysis.
The third step is to rotate the selected factors to final solution. The rotation
simplifies the factor structure and therefore makes its interpretation easier and more
meaningful. Thurstone suggested five criteria to achieve the goal of rotation. They
are listed below.
1. Each row of the factor matrix should contain at least one zero.
2. If k factors are retained each column of the factor matrix should have at
least k zeros.
3. For every pair of factors (that is, pair of columns of the factor matrix) there
should be some variables with zero loadings on one factor and large
loadings on the other factor.
4. For every pair of factors there should be a sizable portion of the variables
with zero loadings.
5. For every pair of factors, there should be only a small number of large
loadings.
There are two major methods of rotation. They are orthogonal rotations and
oblique rotations. The orthogonal rotations produce uncorrelated factors while the
oblique rotations produce correlated factors. The orthogonal factors are simpler to
handle while oblique factors are empirically more realistic. The choice between the
two methods is at the discretion of the researcher based on the needs of a given
research problem.
Common methods of orthogonal rotation are varimax, quartimax and equimax.
Varimax rotation maximizes the variance of the factor loadings in each column.
Hence the name varimax method. After varimax rotation each original variable tends
to be associated with a small number of factors and each factor represents only a
small number of variables. Varimax is the most widely used method of rotation.
Quartimax rotation minimizes the cross product of factor loadings for a given
variable. Thus it minimizes the number of factors needed to explain each variable.
The equimax rotation is a compromise between quartimax and varimax. Among the
methods of rotation the popular one is varimax.
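The varimax rotation itself can be written compactly. The following is a sketch of Kaiser's iterative algorithm in the SVD form commonly quoted in the literature; it is not the particular routine used to produce the rotated loadings reported later in this section.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-6):
    """Orthogonally rotate a p x k loading matrix so as to (approximately)
    maximize the varimax criterion; returns the rotated loadings."""
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    R = np.eye(k)                  # accumulated rotation matrix
    crit_old = 0.0
    for _ in range(max_iter):
        Lr = L @ R
        # Gradient of the varimax criterion, then its polar factor via the SVD.
        G = L.T @ (Lr ** 3 - Lr @ np.diag((Lr ** 2).sum(axis=0)) / p)
        u, s, vt = np.linalg.svd(G)
        R = u @ vt
        crit_new = s.sum()
        if crit_new < crit_old * (1 + tol):
            break
        crit_old = crit_new
    return L @ R
```

Applied to an unrotated loading matrix, varimax() returns loadings in which the variance of the squared loadings within each column is as large as possible, which is what makes the rotated factors easier to interpret and name.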
Example 32.1
In a study on farmers in a locality ten variables were considered. For the initial
orthogonal components the eigen values were as in Table 32.2.
Table 32.2 Table of eigen values.
Factor Eigen value Factor Eigen value
1 2.768 6 0.823
2 1.685 7 0.551
3 1.506 8 0.345
4 1.185 9 0.263
5 0.874 10 0.000
For the first four factors the eigen values were greater than 1. Hence, they were
selected for further analysis. The selected factors and their loadings were as in
Table 32.3.
Table 32.3 Table of factor loadings.

Variable               Factor 1    Factor 2    Factor 3    Factor 4
1.  AGE                  0.221       0.707       0.390       0.224
2.  EDUCAN               0.354       0.243       0.016       0.657
3.  EXPERINC             0.131       0.471       0.215       0.512
4.  FAM TYPE             0.890       0.010       0.054       0.137
5.  FAM SIZE             0.874       0.027       0.131       0.189
6.  TOTMEM               0.981       0.021       0.041       0.181
7.  SOC PART             0.220       0.596       0.012       0.556
8.  OCC STA              0.041      -0.152       0.868       0.126
9.  SIZE                 0.088       0.656       0.526       0.162
10. TRAINING             0.033       0.309       0.506       0.049
Varimax rotation resulted in the following factor matrix.
Table 32.4 Factor matrix.

Variable    Factor 1   Factor 2   Factor 3   Factor 4   Communality h²   (1 - h²)
1.            0.148      0.839      0.015      0.155        0.750          0.250
2.            0.151      0.170      0.033      0.751        0.564          0.436
3.            0.015      0.708      0.097      0.192        0.548          0.452
4.            0.889      0.092      0.071      0.099        0.814          0.186
5.            0.898      0.004      0.091      0.057        0.818          0.182
6.            0.993      0.050      0.010      0.087        0.996          0.004
7.            0.041      0.148      0.156      0.815        0.712          0.288
8.            0.043      0.434      0.734      0.254        0.793          0.207
9.            0.088      0.322      0.770      0.191        0.741          0.259
10.           0.048      0.034      0.588      0.076        0.355          0.645
Variance
explained     2.642      1.560      1.526      1.416
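The communality, unique variance and variance-explained entries of Table 32.4 are simple functions of the loadings, as this sketch shows (the loadings are entered as printed; signs do not affect the squared quantities).

```python
import numpy as np

A = np.array([[0.148, 0.839, 0.015, 0.155],
              [0.151, 0.170, 0.033, 0.751],
              [0.015, 0.708, 0.097, 0.192],
              [0.889, 0.092, 0.071, 0.099],
              [0.898, 0.004, 0.091, 0.057],
              [0.993, 0.050, 0.010, 0.087],
              [0.041, 0.148, 0.156, 0.815],
              [0.043, 0.434, 0.734, 0.254],
              [0.088, 0.322, 0.770, 0.191],
              [0.048, 0.034, 0.588, 0.076]])

h2 = (A ** 2).sum(axis=1)          # communalities: row sums of squared loadings
unique = 1 - h2                    # unique variances (1 - h^2)
explained = (A ** 2).sum(axis=0)   # variance explained by each factor: column sums
print(np.round(h2, 3))             # compare with the h^2 column of Table 32.4
print(np.round(explained, 3))      # ~ (2.642, 1.560, 1.526, 1.416)
```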
After obtaining the factor loadings by rotation, the factor matrix can be interpreted. Each of the variables will be linearly related to each of the k hypothetical factors. The factor loadings in a given row of the factor matrix are the standardized regression coefficients obtained when that variable is regressed on the factors. A variable with a high loading, usually 0.60 and above, on an extracted factor is said to load significantly on that factor. Each row of the factor matrix may be written as the regression model
Zj = aj1 F1 + aj2 F2 + ... + ajk Fk + uj
where Zj = variable j
      Fi = hypothetical factors
      aji = factor loadings
      uj = unique factor for variable j
The sum of the squares of the loadings in a row is known as the communality and is denoted by h². The proportion of unique variance of a variable is given by (1 - h²).
The factorial complexity of a variable is defined by the number of factors on which it loads significantly. If it loads on only one factor, the factorial complexity of the variable is said to be 1; if it loads on two factors, the complexity is said to be 2; and so on. If the complexity of a variable is more than one, the meaning of the variable is no longer simple.
In our example, for variable 1 (V1) we have
Z1 = 0.148 F1 + 0.839 F2 - 0.015 F3 + 0.155 F4 + 0.250 U1
In a regression equation the importance of a given factor for a given variable can be expressed in terms of the variance in the variable that can be accounted for by the factor. In our example, about 70% (0.839²) of the total variance of V1 is accounted for by F2. Hence factor 2 is the most important determinant of V1. The proportion of variance in V1 accounted for by all the common factors (h²) is 75%. The contribution of factors 1, 3 and 4 to the variance of V1 is only 5% (h² - 0.70). For V1 the unique variance is 1 - 0.75 = 0.25, or 25%. It may be noted that this interpretation of the communality is valid only for an orthogonal rotation.
Finally, the factors should be given meaningful names. Factor names should be
brief and should communicate the nature of the underlying construct. For example,
the factor consisting of income, occupational status and educational level may be
named as socio-economic factor. In many cases meaningful names to factors cannot
be given since no reasonable explanation can be offered as to why certain variables
cluster together.
32.4 CANONICAL CORRELATION
Canonical analysis belongs to the family of regression methods for data
analysis. Multiple regression is used to assess the relationship between one
dependent variable and many independent variables. The canonical correlation is
used to assess the relationship between two sets of variables, one set of many
independent variables and another set of many dependent variables.
32.4.1 Concepts and Terms
A canonical variable, also called a variate, is a linear combination of a set of
original variables in which the within set correlation has been controlled. Through
least squares analysis two linear combinations are formed, one for the independent
or predictor variables and the other for the dependent or criterion variables. Let
there be p independent variables X1, X2, ..., Xp and q dependent variables Y1, Y2, ..., Yq. A pair of linear combinations may be generated as
U = a1 X1 + a2 X2 + ... + ap Xp
V = b1 Y1 + b2 Y2 + ... + bq Yq
Similarly a number of pairs of such linear combinations may be generated. They may be denoted as (U1, V1), (U2, V2), (U3, V3), and so on. The number of pairs will be equal to p or q, whichever is less. The combinations are generated in such a way that the correlation coefficient between U1 and V1 is maximized; the second pair is then formed so that the correlation coefficient between U2 and V2 is maximized subject to the restriction that the correlation coefficients between U1 and U2, U1 and V2, V1 and U2, and V1 and V2 are all zero. Subsequent combinations are formed with similar restrictions. The new variables U and V are called canonical variables. We will use the notation X for the independent canonical variable and Y for the dependent canonical variable in further discussion.
The weights a1, a2, ..., ap and b1, b2, ..., bq in the linear equations which create the canonical variables are called canonical weights. They are standardized weights, since the standard deviations of both canonical variables are equal to one. The weights are analogous to beta coefficients in multiple regression. They are estimated so as to maximize the correlation coefficient between the canonical variables X and Y. In matrix notation,
[A] = [a1, a2, ..., ap]
[B] = [b1, b2, ..., bq]
The canonical weight for a variable will be zero when that variable is totally redundant with another variable in the same set.
The value of a canonical variable obtained by applying the canonical weights to the original variables is known as the canonical score for that canonical variable.
The correlation between the canonical variables is called the canonical correlation. It is denoted by Rc. The square of the canonical correlation, R²c, is the eigen value or characteristic root. It is denoted by λ. There will be as many eigen values as there are canonical correlations. The number of eigen values extracted in a canonical analysis will be equal to the minimum number of variables in either set. For example, if an independent canonical variable consists of five original variables and a dependent canonical variable consists of three original variables, then three eigen values are extracted.
The eigen values are extracted successively. They will be uncorrelated with each other. Each eigen value will be smaller than the last; symbolically, λ1 ≥ λ2 ≥ λ3 ≥ ... ≥ λq ≥ 0. The eigen values reflect the proportion of variance explained by each canonical correlation. Each successive eigen value will explain a unique additional proportion of variability in the two sets of variables.
The correlation of a canonical variable with an original variable in its set is known as a structure correlation. For the first pair of canonical variables the structure correlations are obtained as
rx = [RXX] [A]
ry = [RYY] [B]
where RXX = the correlation matrix of the p independent variables in a set
      RYY = the correlation matrix of the q dependent variables in a set
      [A] = [a1, a2, ..., ap]
      [B] = [b1, b2, ..., bq]
The structure correlations are used for interpreting the canonical variables. The larger the structure correlation of an original variable with the canonical variable, the more important that variable is considered to be in contributing to the canonical variable. The structure correlation for a variable might be high even though its canonical weight is zero, because a variable that is totally redundant with another variable in the set may still be correlated with the canonical variable.
The average of all squared structure correlations for one set of variables with
respect to a given canonical variable is known as the adequacy coefficient. It is a
measure of how well a given canonical variable represents the original variance in
that set of original variables.
The product of the adequacy coefficient and the squared canonical correlation coefficient (R²c) is termed the redundancy coefficient. High redundancy means high
ability to predict. It will show how well the independent canonical variable predicts
values of the original dependent variable. Hence for successful canonical analysis
the canonical variables X and Y should be redundant as much as possible.
32.4.2 Assumptions
The assumptions of multiple regression such as multivariate normality, interval
data, homoscedasticity and absence of multicollinearity hold good for canonical
analysis also. The relationships between independent and dependent canonical
variables as well as the relationships within the independent set of variables and
within the dependent set of variables must be linear. There should be no outliers
(extreme values) in the data. The outliers can greatly affect the magnitudes of
canonical correlation coefficients. The sample size should be sufficiently large so
as to minimize the influence of sampling variability in canonical analysis. A rule of
thumb is that the sample size (n) should be greater than 10(p + q). Some authors suggest that it should be more than 20(p + q).
32.4.3 Computational Methods
The basic data for the canonical correlation analysis will be as follows:
      Independent variables                    Dependent variables
X11   X12   ...   X1p                    Y11   Y12   ...   Y1q
X21   X22   ...   X2p                    Y21   Y22   ...   Y2q
X31   X32   ...   X3p                    Y31   Y32   ...   Y3q
...   ...   ...   ...                    ...   ...   ...   ...
Xn1   Xn2   ...   Xnp                    Yn1   Yn2   ...   Ynq
where p = number of independent variables
q = number of dependent variables
The inter-correlations are worked out and presented in the form of a partitioned matrix

[R] = [ RXX   RXY ]
      [ RYX   RYY ]

where RXX = the correlation matrix of the p independent variables
      RYY = the correlation matrix of the q dependent variables
      RXY = the matrix of correlations between the independent and dependent variables
      RYX = the transpose of RXY
The inter-correlation matrix [RXY] is examined to see if any of the Y variables show some correlation with the X variables. If most of the correlations in [RXY] are zero or close to zero, the final results will most likely be ambiguous. In such situations the canonical correlation analysis may be abandoned. As a rule of thumb, if a number of the correlation coefficients in [RXY] are larger than 0.40, the canonical analysis can be continued.
We then obtain the matrix
[C] = [RYY]⁻¹ [RYX] [RXX]⁻¹ [RXY]
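The extraction of the eigen values from the blocks of the partitioned correlation matrix can be sketched as follows; RXX, RYY and RXY would be supplied from the data.

```python
import numpy as np

def canonical_eigenvalues(Rxx, Ryy, Rxy):
    """Eigen values of [RYY]^-1 [RYX] [RXX]^-1 [RXY] (largest first) and the
    corresponding canonical correlations."""
    Ryx = Rxy.T                                   # RYX is the transpose of RXY
    C = np.linalg.inv(Ryy) @ Ryx @ np.linalg.inv(Rxx) @ Rxy
    lam = np.sort(np.real(np.linalg.eigvals(C)))[::-1]
    return lam, np.sqrt(np.clip(lam, 0.0, None))  # eigen values and R_c values
```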
The eigen values are extracted by solving the characteristic equation
| C - λI | = 0
This equation can be solved using a computer. The largest eigen value λ1 equals the square of the correlation coefficient between the first pair of canonical variables, X1 and Y1 (that is, R²c1). The remaining canonical correlation coefficients decrease in value, since
λ1 ≥ λ2 ≥ ... ≥ λq > 0
In order to determine whether two canonical variables X and Y are related significantly, the null hypothesis
H0: λ1 = λ2 = ... = λq = 0
is tested using Bartlett's χ² test. The test makes use of Wilks' lambda criterion,
Λ = (1 - λ1)(1 - λ2) ... (1 - λq)
χ² = -[(n - 1) - 0.5(p + q + 1)] (2.3026) log10 Λ
with pq degrees of freedom.
If no eigen value differs significantly from zero, Λ will be equal to one; otherwise it will be less than one. Hence Λ measures the strength of the association between the pairs of canonical variables X and Y. A significant χ² implies that a significant relationship exists between the canonical variables X and Y.
The significance of each of the successive eigen values λ2, λ3, ..., λq can be tested using the χ² test in sequence, leaving out one λ at each step. Thus,
χ²2 = -[(n - 1) - 0.5(p + q + 1)] (2.3026) log10 Λ2, with (p - 1)(q - 1) d.f.
where Λ2 = (1 - λ2)(1 - λ3) ... (1 - λq)
χ²3 = -[(n - 1) - 0.5(p + q + 1)] (2.3026) log10 Λ3, with (p - 2)(q - 2) d.f.
where Λ3 = (1 - λ3)(1 - λ4) ... (1 - λq)
We can proceed up to χ²q.
By the additive property of χ²,
χ²1 + χ²2 + ... + χ²q = χ², with pq d.f.
It may be noted that the above procedure is analogous to the backward elimination method of multiple regression, where reductions in R² are examined as variables are dropped from the model.
Instead of the χ² test, Rao's F-test can also be used. It is given by
F = [(1 - Λ^(1/s)) / Λ^(1/s)] [(ms - pq/2 + 1) / pq]
where
p = number of variables in the independent canonical variable X
q = number of variables in the dependent canonical variable Y
m = (1/2)(2n - p - q - 2)
s = √[(p²q² - 4) / (p² + q² - 5)]
The degrees of freedom associated with F are v1 = pq and v2 = ms - pq/2 + 1.
It may be noted that m and s need not be integers.
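A sketch of Rao's F approximation as reconstructed above is given below. Because the printed formula is damaged in the source, the exact form of v2 should be treated as an assumption and checked against a standard reference before use.

```python
import numpy as np

def rao_F(wilks_lambda, n, p, q):
    """Rao's F approximation for Wilks' lambda (assumes p*q > 2 so that s is defined)."""
    m = 0.5 * (2 * n - p - q - 2)
    s = np.sqrt((p**2 * q**2 - 4) / (p**2 + q**2 - 5))
    L = wilks_lambda ** (1.0 / s)
    v1 = p * q
    v2 = m * s - p * q / 2 + 1
    F = (1 - L) / L * (v2 / v1)
    return F, v1, v2
```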
Bartlett's χ² test is useful in specifying how many canonical variable pairs should be retained. With the F-test such specification is not possible.
32.4.4 Interpretations
For interpreting the results of canonical analysis the canonical correlation
coefficient, canonical weights and structure correlations are used.
The squared canonical correlation represents the proportion of variance of
each canonical variable that is predictable from the other canonical variable. It is,
therefore, used when explaining the relationships between the independent and
dependent set of variables. A canonical correlation of 0.90 indicates that 81% of the
dependent canonical variable is predictable from the independent canonical variable.
It does not indicate that 81% of the variance of the original dependent variable is
predictable from the original independent variable.
The canonical weights may be interpreted in the same way as the beta
coefficients are interpreted in a multiple regression equation. They are used to
assess the relative importance of the contribution of an individual variable to a given canonical score. The larger the absolute value of the canonical weight, the greater is the respective variable's positive or negative contribution to the variability in another
variable in the set.
Canonical weights may be subject to multicollinearity leading to incorrect
judgements. Hence the structure correlations are used for interpreting the relations
of the original variables to a canonical variable. The magnitudes of the structure
correlations help in interpreting the meaning of the canonical variables with which
they are associated. A rule of thumb is that a variable with a structure correlation of less than 0.30 is not to be considered part of the canonical variable. The square of the structure
correlation is the proportion of variance in a given original variable accounted for
by a given canonical variable.
If redundancy is high it indicates that independent canonical variable accounts
for a high proportion of the variance in the dependent set of original variables. In
other words, high redundancy means high ability of the independent canonical
variable to predict values of the original dependent variables.
Example 32.2
In a study on the innovativeness and adoption behaviour of farmers the following correlation matrix was obtained. The independent (X) and dependent (Y) variables were
X1 = Age                          Y1 = Innovativeness
X2 = Education                    Y2 = Adoption
X3 = Experience
X4 = Social participation
X5 = Occupational status
The variables X2, X4 and X5 are coded as 1, 2, etc. They are acceptable as variables for canonical analysis.
The correlation matrix and the standardized weights are given in Tables 32.5 and 32.6 respectively; the structure correlations are computed further below.
Table 32.5 Correlation matrix.

       X1      X2      X3      X4      X5      Y1      Y2
X1    1.000   0.104   0.318   0.198   0.367   0.123   0.034
X2            1.000   0.014   0.328   0.104   0.105   0.089
X3                    1.000   0.043   0.110   0.044   0.008
X4                            1.000   0.089   0.192   0.351
X5                                    1.000   0.166   0.025
Y1    0.123   0.105   0.044   0.192   0.166   1.000   0.372
Y2    0.034   0.089   0.008   0.351   0.025   0.372   1.000
Table 32.6 Standardized weights.

Variable    Canonical variable (Set 1)    Canonical variable (Set 2)
X1                  0.483                         0.174
X2                  0.371                         0.621
X3                  0.133                         0.039
X4                  1.072                         0.145
X5                  0.060                         0.585
Y1                  0.504                         0.952
Y2                  0.696                         0.822
An examination of the RXY matrix in Table 32.5 reveals that the correlations are less
than 0.40 and all of them are very close to zero. Hence the analysis has to be
stopped at this stage itself because the final results will be ambiguous. However,
for the sake of illustration we proceed further.
The product of the matrices
[RYY]⁻¹ [RYX] [RXX]⁻¹ [RXY]
results in the matrix

[ 0.100   0.046 ]
[ 0.055   0.124 ]

Solving the characteristic equation
| C - λI | = 0, that is, λ² - 0.224 λ + 0.0099 = 0,
we get λ1 = 0.1635; λ2 = 0.0605.
Hence Rc1 = 0.404; Rc2 = 0.246.
Λ1 = (1 - 0.1635)(1 - 0.0605) = 0.7859
χ²1 = -[(60 - 1) - 0.5(5 + 2 + 1)] (2.3026) log10 0.7859 = 13.010
For pq = 5 × 2 = 10 d.f. at α = 0.05, χ² = 18.307. Since χ²1 is not significant we accept the null hypothesis that λ1 = λ2 = 0.
Λ2 = (1 - 0.0605) = 0.9395
χ²2 = -[54] (2.3026) log10 0.9395 = 3.370
For (p - 1)(q - 1) = 4 × 1 = 4 d.f. at α = 0.05, χ² = 9.488. Since χ²2 is not significant we accept the null hypothesis λ2 = 0. Since the largest eigen value, λ1, is not significant, the testing procedure can be terminated at this stage itself.
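The eigen value extraction and the first Bartlett test of Example 32.2 can be checked with a few lines of NumPy. The chi-square value obtained this way (about 13.3) differs slightly from the printed 13.010; the conclusion is the same either way.

```python
import numpy as np

C = np.array([[0.100, 0.046],
              [0.055, 0.124]])
lam = np.sort(np.real(np.linalg.eigvals(C)))[::-1]   # ~ (0.163, 0.061)
Rc = np.sqrt(lam)                                    # ~ (0.404, 0.246)

n, p, q = 60, 5, 2
Lambda1 = np.prod(1 - lam)                           # Wilks' lambda ~ 0.786
chi2_1 = -((n - 1) - 0.5 * (p + q + 1)) * np.log(Lambda1)
print(round(chi2_1, 2))   # ~ 13.3 on pq = 10 d.f.; below 18.307, so H0 is not rejected
```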
The structure correlations are obtained by multiplying the correlation matrix by the vector of standardized weights. For the first set of canonical variables X and Y we have the following results.

Correlations of the X variables with the canonical variable X = [RXX] [WX]:

[ 1.000   0.104   0.318   0.198   0.367 ]   [ 0.483 ]     [ 0.212 ]
[         1.000   0.014   0.328   0.104 ]   [ 0.371 ]     [ 0.023 ]
[                 1.000   0.043   0.110 ]   [ 0.133 ]  =  [ 0.068 ]
[                         1.000   0.089 ]   [ 1.072 ]     [ 0.844 ]
[                                 1.000 ]   [ 0.060 ]     [ 0.166 ]

Correlations of the Y variables with the canonical variable Y = [RYY] [WY]:

[ 1.000   0.372 ]   [ 0.504 ]     [ 0.763 ]
[ 0.372   1.000 ]   [ 0.696 ]  =  [ 0.883 ]
Among the original X variables, X4 is highly correlated with the canonical variable X. This implies that X4 is the important variable contributing to the canonical variable X. It can be seen that the original variables Y1 and Y2 are both highly correlated with the canonical variable Y. Hence both are important variables contributing to the canonical variable Y.
32.5 PATH ANALYSIS
Path analysis is closely related to multiple regression. Unlike straight regression, it deals explicitly with cause, and it produces clear and explicit estimates of the strengths of the mathematical relationships between the variables. Path analysis is a method of decomposing correlations into the direct and indirect effects of variables that are hypothesised as causal. Such decomposition is not possible in regression analysis. The method was developed by Sewall Wright.
32.5.1 Applications of Path Analysis
In agricultural sciences the path analysis is used to select a favourable trait.
For example in order to improve the grain yield of a rice variety we have to find out
which of the following traits have to be selected: length of panicle, plant height,
number of productive tillers and the like. In the field of genetics it is used to separate
out genetic and environmental influences.
Path analysis is used to find out whether there are causal or spurious correlations
within the independent variables. The best regression model can be found out by
elimination of variables that contribute little to the equation.
32.5.2 Assumptions
The assumptions under path analysis are
(1) Relationships among variables are linear and additive. Curvilinear or
interaction relations are excluded.
(2) The residual or disturbance term is uncorrelated with all other variables.
(3) The variables are measured on interval scale.
(4) The independent variables are measured without error.
(5) A weak causal order among the variables is known.
(6) The relationships among the variables are causally closed.
For a given pair of variables X and Y, it may be possible to assume that X can affect Y but Y cannot affect X. For example, panicle length may affect the grain yield of paddy, but grain yield cannot affect panicle length. Then it is possible to establish a weak causal order X → Y.
The observed correlation between X and Y may be due solely to the causal dependence of Y on X, with the external variable E entering only as an outside cause of Y. In that case the correlation between X and Y is causally closed to outside influence.
Consider instead causal structures in which the external variable E also influences X, for example X ← E → Y. In these cases the correlation between X and Y is influenced by the external variable E, and hence the correlation between X and Y is not closed to outside influence.
32.5.3 Concepts and Terms
In a regression model the independent variable is denoted by X and the
dependent variable by Y. In path analysis X may be termed as exogenous variable
and Y as endogenous variable. A diagram relating independent and dependent
variables is known as path diagram.
The path diagram for the variables grain yield of paddy (Y), number of productive tillers (X1) and plant height (X2) consists of single-headed arrows running from X1 to Y and from X2 to Y:
X1 → Y ← X2
The arrows connecting the causes (X) and effect (Y) are referred to as paths. A
single-headed arrow points from cause (X) to effect (Y). A dashed line is used to
represent a negative causal relation. A double-headed arrow indicates that the
causal variables are merely correlated but no causal relations are assumed.
The coefficient measuring the influence along a path is known as the path coefficient. The path coefficients are nothing but standardized partial regression coefficients. Hence the path coefficients may be called beta weights, following the usage in multiple regression. The path coefficient for the path from X to Y is denoted by Pyx. It is given by
Pyx = σyx / σy
where σy = standard deviation of Y
      σyx = standard deviation of Y due to the influence of X, while other causes are kept constant.
Note that the effect Y is listed first in the subscript. The square of the path coefficient is known as the coefficient of determination. It is denoted by dyx or R². The regression coefficient byx and the path coefficient Pyx are equivalent if and only if the assumptions of causal order and closure are met.
A path analysis in which the causal flow is unidirectional (one-way causal
flow) is called recursive. In a recursive model, there are no reciprocal effects, that is,
a variable cannot be both cause and effect at the same time.
32.5.4 Theorems
There are some elementary theorems under path analysis. They depend upon
the systems of relationships between the causal variables.
(1) When the causal variables X1, X2, ... are independent, the path diagram consists of separate arrows running from each cause to Y (X1 → Y ← X2). In such cases, the correlation coefficient between a causal variable and the effect variable Y will be equal to the corresponding path coefficient, and the sum of the coefficients of determination will be equal to unity. Symbolically,
ry1 = Py.1; ry2 = Py.2; and so on, and
dy.1 + dy.2 + ... = 1
where 1, 2, etc. denote X1, X2, etc.
(2) When the effect Y is caused by a chain of independent causes, the path diagram may, for example, have X3 and X4 acting on Y through X1, with X2 acting on Y directly. In such cases
(dy.1 d1.3) + (dy.1 d1.4) + dy.2 = 1
This can be extended to any number of chains of independent causes.
(3) When the effects Y and X are caused by common causes Z1, Z2, Z3, ..., the path diagram has arrows running from each Z to both X and Y. In this case, the total correlation between the two effects Y and X is equal to the sum of the products of the pairs of path coefficients connecting the two effects with each common cause (Z), when all the common causes are independent. Symbolically,
ryx = (Px.1 Py.1) + (Px.2 Py.2) + ... = Σ Px.i Py.i
where i represents a common cause Zi.
(4) Suppose an effect Y is influenced by the immediate causes X1 and X2, which are correlated because of their common causes X3, X4 and X5. In the path diagram the remote causes X3, X4 and X5 act on the immediate causes X1 and X2, which in turn act on Y. In such cases, the combined path coefficient for all paths connecting the effect with a remote cause equals the sum of the products of the intermediate individual path coefficients along all connecting paths. Symbolically,
Py.j = Σ Py.i Pi.j
where i represents the intermediate causes and j represents the remote cause connected through the intermediate causes.
32.5.5 Computation of Path Coefficients
Before computing the path coefficients the causal scheme has to be formulated.
We may consider Y as the effect and X as the cause, or X as the effect and Y as the cause. It may be remembered that the path coefficient Py.x is not in general equal to Px.y. Hence, whenever we change the direction of a path, the directions of the other paths in the scheme must also be changed accordingly. Unless the causal scheme is formulated properly, the values of the path coefficients will be erroneous and misleading. Appropriate correlations have to be used for path analysis. Although it is desirable that the variables are interval variables, they may be ordinal or nominal, or a combination of them. Pearson's correlation coefficient is used for two interval variables; biserial correlation for an interval variable and a dichotomy; polyserial correlation for an interval variable and an ordinal variable; and tetrachoric correlation for two dichotomies.
From the path diagram a set of simultaneous equations can be written directly.
A solution of the simultaneous equations provides information on the direct and
indirect contribution of the causal variables to the effect. For example, consider the
grain yield of paddy (g/panicle, Y), number of primary branches per panicle (X1), number of secondary branches per panicle (X2) and length of panicle (cm, X3). In the path diagram for these variables the correlated causes X1, X2 and X3 each point to Y, and the residual R also points to Y.
R
The correlation between Y and X
1
can be partitioned into three parts, namely,
(1) due to direct effect of X
1
on Y (= P
Y.1
)
(2) due to indirect effect of X
1
on Y through X
2
(= P
Y.2
r
12
)
(3) due to indirect effect on X
1
on Y through X
3
(=P
Y.3
r
13
)
The equation is given by
r
Y1
= P
Y.1
+ P
Y.2
r
12
+ P
Y.3
r
13
Similarly we can write the equations for r
Y2
and r
Y3
r
Y2
= P
Y.1
r
12
+ P
Y.2
+ P
Y.3
r
23
r
Y3
= P
Y.1
r
13
+ P
Y.2
r
23
+ P
Y.3
These equations may be written in matrix form as

[ rY1 ]   [ 1     r12   r13 ] [ PY.1 ]
[ rY2 ] = [ r21   1     r23 ] [ PY.2 ]
[ rY3 ]   [ r31   r32   1   ] [ PY.3 ]

We can obtain the path coefficients by solving

[ PY.1 ]           [ rY1 ]
[ PY.2 ] = [r]⁻¹   [ rY2 ]
[ PY.3 ]           [ rY3 ]

where [r]⁻¹ is the inverse of the correlation matrix of the causal variables.
The effect of the residual variable on Y is calculated as
PY.R = √(1 - R²)
Usually the residual variable is denoted by h or U.
32.5.6. Interpretation of Path Coefficients
The interpretation of path analysis results will be unambiguous only when the
assumptions of causal order and closure are unambiguous. In order to assess the
significance of path coefficients adequate sample size is required. Some authors
suggest ten to twenty times as many cases as variables. The significance of path
coefficients is tested employing t-test that is commonly used for individual
regression coefficients.
The interpretation of path coefficients depends upon its magnitude and
direction as well as its value compared to the total correlation. The factor with
maximum positive direct effect is considered to be the primary causal factor. The
causal factor which has high direct effect and close to the correlation coefficient
between the causal factor and the effect is considered to be more effective.
The direct effect of a causal factor may be positive and high but the correlation
coefficient between that factor and effect may be negative. It is an indication that
undesirable indirect effects have to be nullified in order to make use of the direct
effect.
The direct effect may be negative or may be negligible whereas the correlation
coefficient may be positive. In such situations the indirect causal factors have to be
considered simultaneously.
Example 32.3
In a study on the architecture of the rice panicle, a random sample of 50 panicles of the IR-50 rice variety was selected. Measurements were made on grain yield (g/panicle, Y), number of primary branches per panicle (X1), number of secondary branches per panicle (X2), length of panicle (cm, X3) and total length of primary branches (cm, X4).
The inter-correlations between the variables are given below in the form of a matrix.

Table 32.7 Correlation matrix.

       X1       X2       X3       X4        Y
X1     1      0.4485   0.1387   0.5541   0.3310
X2             1       0.5557   0.7865   0.7534
X3                      1       0.5697   0.6177
X4                               1       0.7319
In the input path diagram for these variables the correlated causes X1, X2, X3 and X4 each point to Y.
The inverse of the correlation matrix of the causal factors is given in Table 32.8.

Table 32.8 Inverse of correlation matrix.

         [ 1.5576   -0.1749    0.4360   -0.9739 ]
[r]⁻¹ =  [           2.7649   -0.4864   -1.8006 ]
         [                     1.6722   -0.8117 ]
         [                               3.4182 ]
The correlation coefficients of Y with the causal factors are

        [ 0.3310 ]
ryx  =  [ 0.7534 ]
        [ 0.6177 ]
        [ 0.7319 ]
Multiplying the two matrices we get the path coefficients. For example,
PY1 = (1.5576)(0.3310) + (-0.1749)(0.7534) + (0.4360)(0.6177) + (-0.9739)(0.7319) = -0.0597
Similarly, we get
PY2 = 0.4069; PY3 = 0.2167; PY4 = 0.3215
The indirect effect of X1 on Y through X2 is
PY2 r12 = (0.4069)(0.4485) = 0.1825
The indirect effect of X2 on Y through X1 is
PY1 r21 = (-0.0597)(0.4485) = -0.0268
Similarly, all the other indirect effects can be calculated. The results are presented in the form of a matrix.

Table 32.9 Table of direct and indirect effects.

        X1         X2        X3        X4       Total
X1    -0.0597    0.1825    0.0301    0.1781    0.3310
X2    -0.0268    0.4069    0.1204    0.2529    0.7534
X3    -0.0083    0.2261    0.2167    0.1832    0.6177
X4    -0.0330    0.3200    0.1234    0.3215    0.7319
The total of the direct and indirect effects of a causal factor is the correlation
coefficient of Y on that factor. The diagonal values in the matrix are the direct
effects.
The coefficient of determination is
R² = PY1 rY1 + PY2 rY2 + PY3 rY3 + PY4 rY4
   = (-0.0597)(0.3310) + (0.4069)(0.7534) + (0.2167)(0.6177) + (0.3215)(0.7319) = 0.6560
The residual effect is
U = √(1 - R²) = √(1 - 0.6560) = 0.5865
The standard errors of the path coefficients are given by
SE(PYi) = √(s²e Cii)
where s²e = (1 - R²) / (n - p - 1)
      p = number of causal factors
      n = number of observations
      Cii = the diagonal elements of the inverse of the correlation matrix
For our example
s²e = (1 - 0.6560) / (50 - 4 - 1) = 0.0076
SE(PY1) = √((0.0076)(1.5576)) = 0.1088
SE(PY2) = √((0.0076)(2.7649)) = 0.1450
Similarly,
SE(PY3) = 0.1127; SE(PY4) = 0.1612
To test the significance of the path coefficients we use the t-test
ti = PYi / SE(PYi), with (n - p - 1) d.f.
For our example
t1 = -0.0597 / 0.1088 = -0.549
t2 = 0.4069 / 0.1450 = 2.806
t3 = 0.2167 / 0.1127 = 1.923
t4 = 0.3215 / 0.1612 = 1.994
It can be verified that only PY2 is significant. Hence it can be stated that the number of secondary branches per panicle is the major causal factor for grain yield per panicle.
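The whole of Example 32.3 can be reproduced with a short NumPy sketch; small differences from the printed values arise only from rounding.

```python
import numpy as np

# Correlation matrix of the causal factors X1..X4 (Table 32.7) and their
# correlations with grain yield Y.
R = np.array([[1.0000, 0.4485, 0.1387, 0.5541],
              [0.4485, 1.0000, 0.5557, 0.7865],
              [0.1387, 0.5557, 1.0000, 0.5697],
              [0.5541, 0.7865, 0.5697, 1.0000]])
ry = np.array([0.3310, 0.7534, 0.6177, 0.7319])

P = np.linalg.solve(R, ry)      # path (direct) effects ~ (-0.060, 0.407, 0.217, 0.321)

# Direct and indirect effects: entry (i, j) is the effect of Xi on Y routed
# through Xj; each row sums to the corresponding correlation in ry (Table 32.9).
effects = R * P

R2 = P @ ry                     # coefficient of determination ~ 0.656
U = np.sqrt(1 - R2)             # residual effect ~ 0.586

n, p = 50, 4
s2e = (1 - R2) / (n - p - 1)
se = np.sqrt(s2e * np.diag(np.linalg.inv(R)))   # standard errors of the path coefficients
t = P / se                                      # t-values with (n - p - 1) d.f.
print(np.round(P, 4), np.round(t, 2))
```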
