
Psyx 522 Multivariate Statistics Autumn 2013


Chapter 3 Characterizing and Displaying Multivariate Data

Rencher, A. C., & Christensen, W. F. (2012). Methods of multivariate analysis (3rd ed.). Hoboken, NJ: Wiley.

Sketchy Notes by Daniel J. Denis, Ph.D.
Data & Decision Lab, University of Montana

Last revised: Thursday, September 12, 2013

Synopsis: These notes are meant to summarize the primary technical details of the textbook. The way to use
these notes is to read through the equations below and see if you can place them in a conceptual framework. Do
not simply memorize equations and derivations. Rather, use the technical details to aid in your understanding of
the concepts.

In this chapter we review fundamental concepts in statistics and generalize these concepts to the multivariate
domain. In addition, selected graphical approaches to data analysis are reviewed.

Primary Topics - Overview

3.1. Mean and Variance of a Univariate Random Variable
3.2. Covariance and Correlation of Bivariate Random Variables
3.3. Scatterplots of Bivariate Samples
3.4. Graphical Displays for Multivariate Samples
3.5. Dynamic Graphics
3.6. Mean Vectors
3.7. Covariance Matrices
3.8. Correlation Matrices
3.9. Mean Vectors and Covariance Matrices for Subsets of Variables
3.10. Linear Combinations of Variables
3.11. Measures of Overall Variability
3.12. Estimation of Missing Values
3.13. Distance Between Vectors


3.1. Mean and Variance of a Univariate Random Variable

Definition of random variable (formal vs. informal definitions)
Continuous vs. Discrete random variables


Density Functions $f(y)$

Indicates the relative frequency (density) of occurrence of the random variable $y$.
If $f(y_1) > f(y_2)$, points in the neighborhood of $y_1$ are more likely to occur than points in the neighborhood of $y_2$ (What is a neighborhood?).
The population mean of a random variable is the mean of all possible values of $y$, denoted $\mu$; it is the expected value $E(y)$ of $y$.
If the density $f(y)$ is known, the mean can be found using (integral) calculus; if $f(y)$ is unknown, use past experience or real data to estimate $\mu$ using the sample mean from random sample(s).


The sample mean of a random variable is defined as the arithmetic average:

$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$

The probability is zero that $\bar{y} = \mu$; however, $E(\bar{y}) = \mu$ (unbiased estimator), and $\mathrm{var}(\bar{y}) = \frac{\sigma^2}{n}$, so $\bar{y}$ has a smaller variance than a single observation $y$.
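A minimal simulation sketch (Python/NumPy, not from the text; the normal distribution and the values of $\mu$, $\sigma$, and $n$ are arbitrary choices) illustrating that $E(\bar{y}) = \mu$ and $\mathrm{var}(\bar{y}) = \sigma^2/n$:

```python
import numpy as np

# Draw many samples of size n and look at the distribution of the sample mean.
rng = np.random.default_rng(522)
mu, sigma, n, reps = 50.0, 10.0, 25, 100_000

samples = rng.normal(mu, sigma, size=(reps, n))
ybars = samples.mean(axis=1)      # one sample mean per replication

print(ybars.mean())               # approx mu = 50   (E(ybar) = mu)
print(ybars.var(ddof=1))          # approx sigma^2 / n = 100 / 25 = 4
```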


Transformation by a Constant

$E(ay) = aE(y) = a\mu$ (if every $y$ in the population is multiplied by a constant $a$, the expected value is also multiplied by $a$)
Analogously, the above holds for the sample mean as well: if $z_i = a y_i$ for $i = 1, 2, \ldots, n$, then $\bar{z} = a\bar{y}$

Variance
Population variance: $\sigma^2 = \mathrm{var}(y) = E(y - \mu)^2$

$$
\begin{aligned}
\sigma^2 = E(y - \mu)^2 &= E(y^2 - 2\mu y + \mu^2) \\
&= E(y^2) - 2\mu E(y) + \mu^2 \\
&= E(y^2) - 2\mu^2 + \mu^2 \\
&= E(y^2) - \mu^2
\end{aligned}
$$


Sample Variance and Sample Standard Deviation

$$s^2 = \frac{\sum_{i=1}^{n}(y_i - \bar{y})^2}{n - 1}$$

which can also be written as

$$s^2 = \frac{\sum_{i=1}^{n} y_i^2 - n\bar{y}^2}{n - 1}$$

$E(s^2) = \sigma^2$ (unbiased estimator, although $P(s^2 = \sigma^2) = 0$)
$\sigma = \sqrt{\sigma^2}$ and $s = \sqrt{s^2}$ are the standard deviations for the population and sample, respectively
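A quick numeric check (Python/NumPy sketch; the data vector is invented for illustration) that the two forms of $s^2$ agree:

```python
import numpy as np

# Invented data, purely for illustration.
y = np.array([3.0, 7.0, 5.0, 9.0, 6.0])
n = y.size
ybar = y.mean()

s2_deviation = ((y - ybar) ** 2).sum() / (n - 1)           # definitional form
s2_shortcut = ((y ** 2).sum() - n * ybar ** 2) / (n - 1)   # computational form

print(s2_deviation, s2_shortcut, np.var(y, ddof=1))        # all three agree
```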






Transformation by a Constant

If each $y$ is multiplied by a constant $a$, the population variance is multiplied by $a^2$; that is, $\mathrm{var}(ay) = a^2\sigma^2$.
Likewise, if $z_i = a y_i$, $i = 1, 2, \ldots, n$, then the sample variance of $z$ is given by $s_z^2 = a^2 s^2$ (Why?)
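Continuing the sketch above (same invented data), the scaling property can be verified directly:

```python
import numpy as np

# Verify var(a*y) = a^2 * var(y) on the invented data.
y = np.array([3.0, 7.0, 5.0, 9.0, 6.0])
a = 3.0

print(np.var(a * y, ddof=1))        # sample variance of z = a*y
print(a ** 2 * np.var(y, ddof=1))   # a^2 * s^2 -- identical
```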


3.2. Covariance and Correlation of Bivariate Random Variables

3.2.1. Covariance

Bivariate random variable $(x, y)$ (two variables are measured on each research unit)
Population covariance: $\sigma_{xy} = \mathrm{cov}(x, y) = E[(x - \mu_x)(y - \mu_y)]$ (When is the cross-product positive? When is it negative?)
$\sigma_{xy} = E(xy) - \mu_x\mu_y$ because

$$
\begin{aligned}
\sigma_{xy} = E[(x - \mu_x)(y - \mu_y)] &= E(xy - \mu_y x - \mu_x y + \mu_x\mu_y) \\
&= E(xy) - \mu_y E(x) - \mu_x E(y) + \mu_x\mu_y \\
&= E(xy) - \mu_y\mu_x - \mu_x\mu_y + \mu_x\mu_y \\
&= E(xy) - \mu_x\mu_y
\end{aligned}
$$



Addition and Multiplication of Random Variables

$E(x + y) = E(x) + E(y)$ (regardless of whether $x$ and $y$ are independent)
$E(xy) = E(x)E(y)$ (given that $x$ and $y$ are independent)
What is independence? Two variables $x$ and $y$ are independent if their joint density is the product of their individual densities: $f(x, y) = g(x)h(y)$
Independence implies $\sigma_{xy} = 0$, but $\sigma_{xy} = 0$ does not imply independence (Why not?)


Proof that Independence $\Rightarrow \sigma_{xy} = 0$:

$$
\begin{aligned}
\sigma_{xy} &= E(xy) - \mu_x\mu_y \\
&= E(x)E(y) - \mu_x\mu_y \\
&= \mu_x\mu_y - \mu_x\mu_y \\
&= 0
\end{aligned}
$$

HOWEVER, if $x, y$ are bivariate normal, then $\sigma_{xy} = 0 \Rightarrow$ independence (Why?)

Sample Covariance

$$s_{xy} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{n - 1} \quad \text{or} \quad s_{xy} = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{n - 1}$$

$E(s_{xy}) = \sigma_{xy}$
$s_{xy} \neq \sigma_{xy}$ in any given sample, even when $\sigma_{xy} = 0$
Covariance measures linear relationship (not curvilinear relationships such as parabolic, exponential, or logarithmic ones)
Why does covariance measure only linear relationships? Because it is proportional to the slope in the computation of the regression coefficient:

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2} = \frac{s_{xy}}{s_x^2}$$



Variables with zero sample covariance are orthogonal ($\sum_{i=1}^{n} a_i b_i = 0$); that is, for the centered variables, $\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}) = 0$.

Covariance is not invariant to change of scale or variable variability.
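A small sketch (Python/NumPy; the paired data are invented for illustration) computing $s_{xy}$ both ways and confirming its link to the regression slope:

```python
import numpy as np

# Invented paired data, for illustration only.
x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
y = np.array([3.0, 7.0, 5.0, 9.0, 12.0])
n = x.size

sxy_dev = ((x - x.mean()) * (y - y.mean())).sum() / (n - 1)      # definitional form
sxy_short = ((x * y).sum() - n * x.mean() * y.mean()) / (n - 1)  # computational form
print(sxy_dev, sxy_short, np.cov(x, y)[0, 1])                    # all agree

beta1 = sxy_dev / np.var(x, ddof=1)   # slope = s_xy / s_x^2
print(beta1)
```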



Correlation

Pearson's $r$ is the standardized covariance
Pearson's $r$ is invariant to change of scale (it is a "pure" or dimensionless measure of linear relationship)
Population correlation for variables $x, y$:

$$\rho_{xy} = \mathrm{corr}(x, y) = \frac{\sigma_{xy}}{\sigma_x\sigma_y} = \frac{E[(x - \mu_x)(y - \mu_y)]}{\sqrt{E(x - \mu_x)^2}\sqrt{E(y - \mu_y)^2}}$$



Sample correlation for variables $x, y$:

$$r_{xy} = \frac{s_{xy}}{s_x s_y} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}$$
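Continuing the covariance sketch above (same invented data), $r_{xy}$ as the standardized covariance:

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
y = np.array([3.0, 7.0, 5.0, 9.0, 12.0])

sxy = np.cov(x, y)[0, 1]
r = sxy / (np.std(x, ddof=1) * np.std(y, ddof=1))   # r = s_xy / (s_x * s_y)
print(r, np.corrcoef(x, y)[0, 1])                   # same value
```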






Correlation is related to the cosine of the angle between two centered vectors (p. 53)


3.3. Scatterplots of Bivariate Samples

See pp. 55-56, Figures 3.5 and 3.6 for scatterplots.

Quote from text: "If the origin is shifted to $(\bar{x}, \bar{y})$, as indicated by the dashed lines, then the first and third quadrants contain most of the points" (p. 55). Comments?

"But because of the independence assumed for these variables, each quadrant is likely to have as many points as any other quadrant." (pp. 55-56)


3.4. Graphical Displays for Multivariate Samples

Plotting bivariate data is relatively easy
Plotting multivariate data is much more difficult because instead of two (bivariate) observations per case, we now have three or more in a given observation vector $\mathbf{y}$
How can we plot the value of a third variable in a bivariate scatterplot?


Representing $p$ Dimensions where $p > 2$

1. Profiles
2. Stars
3. Glyphs
4. Faces
5. Boxes


3.5. Dynamic Graphics

Using software, we can obtain the "grand tour" of a data set.
GGobi software for visualization (http://www.ggobi.org/)


3.6. Mean Vectors

If there are $n$ individuals in the sample, the $n$ observation vectors are denoted by $\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_n$, where:

$$\mathbf{y}_i = \begin{pmatrix} y_{i1} \\ y_{i2} \\ \vdots \\ y_{ip} \end{pmatrix}$$


Sample mean vector $\bar{\mathbf{y}}$:

$$\bar{\mathbf{y}} = \frac{1}{n}\sum_{i=1}^{n}\mathbf{y}_i = \begin{pmatrix} \bar{y}_1 \\ \bar{y}_2 \\ \vdots \\ \bar{y}_p \end{pmatrix}$$


The observation vectors can be collected into the data matrix $\mathbf{Y}$, whose rows are the (transposed) observation vectors (rows = units, columns = variables):

$$\mathbf{Y} = \begin{pmatrix} \mathbf{y}'_1 \\ \mathbf{y}'_2 \\ \vdots \\ \mathbf{y}'_i \\ \vdots \\ \mathbf{y}'_n \end{pmatrix} = \begin{pmatrix} y_{11} & y_{12} & \cdots & y_{1j} & \cdots & y_{1p} \\ y_{21} & y_{22} & \cdots & y_{2j} & \cdots & y_{2p} \\ \vdots & \vdots & & \vdots & & \vdots \\ y_{i1} & y_{i2} & \cdots & y_{ij} & \cdots & y_{ip} \\ \vdots & \vdots & & \vdots & & \vdots \\ y_{n1} & y_{n2} & \cdots & y_{nj} & \cdots & y_{np} \end{pmatrix}$$

We can also obtain $\bar{\mathbf{y}}$ from $\mathbf{Y}$ (see pp. 64-65 for details)
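A brief sketch (Python/NumPy, invented data matrix) of computing $\bar{\mathbf{y}}$ from $\mathbf{Y}$ by averaging down the columns:

```python
import numpy as np

# Invented 4 x 3 data matrix Y: n = 4 units (rows), p = 3 variables (columns).
Y = np.array([[35.0,  3.5, 2.80],
              [35.0,  4.9, 2.70],
              [40.0, 30.0, 4.38],
              [10.0,  2.8, 3.21]])

ybar = Y.mean(axis=0)   # sample mean vector: the column means of Y
print(ybar)             # one mean per variable
```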


Population Mean Vector

$$\boldsymbol{\mu} = E(\mathbf{y}) = E\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_p \end{pmatrix} = \begin{pmatrix} E(y_1) \\ E(y_2) \\ \vdots \\ E(y_p) \end{pmatrix} = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_p \end{pmatrix}$$

The expected value of $\bar{\mathbf{y}}$ is:

$$E(\bar{\mathbf{y}}) = E\begin{pmatrix} \bar{y}_1 \\ \bar{y}_2 \\ \vdots \\ \bar{y}_p \end{pmatrix} = \begin{pmatrix} E(\bar{y}_1) \\ E(\bar{y}_2) \\ \vdots \\ E(\bar{y}_p) \end{pmatrix} = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_p \end{pmatrix} = \boldsymbol{\mu} \qquad (\bar{\mathbf{y}}\text{ is an unbiased estimator of }\boldsymbol{\mu})$$


3.7. Covariance Matrices

Sample covariance matrix $\mathbf{S} = (s_{jk})$ is the matrix of sample variances and covariances of the $p$ variables:

$$\mathbf{S} = (s_{jk}) = \begin{pmatrix} s_{11} & s_{12} & \cdots & s_{1p} \\ s_{21} & s_{22} & \cdots & s_{2p} \\ \vdots & \vdots & & \vdots \\ s_{p1} & s_{p2} & \cdots & s_{pp} \end{pmatrix}$$

Approaches to computing the sample covariance matrix (see pp. 66-67)
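One common approach, sketched in Python/NumPy with the invented $\mathbf{Y}$ from above (the centered-matrix form is one of several equivalent formulas): center the columns of $\mathbf{Y}$ and divide the cross-products by $n - 1$:

```python
import numpy as np

# Same invented data matrix as above.
Y = np.array([[35.0,  3.5, 2.80],
              [35.0,  4.9, 2.70],
              [40.0, 30.0, 4.38],
              [10.0,  2.8, 3.21]])
n = Y.shape[0]

Yc = Y - Y.mean(axis=0)     # center each column at its mean
S = Yc.T @ Yc / (n - 1)     # p x p sample covariance matrix

print(S)
print(np.allclose(S, np.cov(Y, rowvar=False)))   # True: matches np.cov
```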


Population Covariance Matrix

$$\boldsymbol{\Sigma} = \mathrm{cov}(\mathbf{y}) = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2p} \\ \vdots & \vdots & & \vdots \\ \sigma_{p1} & \sigma_{p2} & \cdots & \sigma_{pp} \end{pmatrix}$$

The population covariance matrix can be found by $\boldsymbol{\Sigma} = E[(\mathbf{y} - \boldsymbol{\mu})(\mathbf{y} - \boldsymbol{\mu})']$ (see p. 68; the derivation looks complicated, but is straightforward)



3.8. Correlation Matrices

Sample correlation defined as:

$$r_{jk} = \frac{s_{jk}}{s_j s_k} = \frac{s_{jk}}{\sqrt{s_{jj} s_{kk}}}$$

$$\mathbf{R} = (r_{jk}) = \begin{pmatrix} 1 & r_{12} & \cdots & r_{1p} \\ r_{21} & 1 & \cdots & r_{2p} \\ \vdots & \vdots & & \vdots \\ r_{p1} & r_{p2} & \cdots & 1 \end{pmatrix}$$


Obtaining the Correlation Matrix from the Covariance Matrix (and vice versa)


$$\mathbf{D}_s = \mathrm{diag}(\sqrt{s_{11}}, \sqrt{s_{22}}, \ldots, \sqrt{s_{pp}}) = \mathrm{diag}(s_1, s_2, \ldots, s_p) = \begin{pmatrix} s_1 & 0 & \cdots & 0 \\ 0 & s_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & s_p \end{pmatrix}$$

$$\mathbf{R} = \mathbf{D}_s^{-1}\mathbf{S}\mathbf{D}_s^{-1} \qquad\qquad \mathbf{S} = \mathbf{D}_s\mathbf{R}\mathbf{D}_s$$
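A sketch of the conversion in both directions (Python/NumPy, continuing the invented example):

```python
import numpy as np

Y = np.array([[35.0,  3.5, 2.80],
              [35.0,  4.9, 2.70],
              [40.0, 30.0, 4.38],
              [10.0,  2.8, 3.21]])
S = np.cov(Y, rowvar=False)

Ds = np.diag(np.sqrt(np.diag(S)))     # D_s = diag(s_1, ..., s_p)
Ds_inv = np.linalg.inv(Ds)

R = Ds_inv @ S @ Ds_inv               # R = D_s^{-1} S D_s^{-1}
print(np.allclose(R, np.corrcoef(Y, rowvar=False)))   # True

S_back = Ds @ R @ Ds                  # S = D_s R D_s recovers S
print(np.allclose(S_back, S))         # True
```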



Population Correlation Matrix

$$\mathbf{P}_\rho = (\rho_{jk}) = \begin{pmatrix} 1 & \rho_{12} & \cdots & \rho_{1p} \\ \rho_{21} & 1 & \cdots & \rho_{2p} \\ \vdots & \vdots & & \vdots \\ \rho_{p1} & \rho_{p2} & \cdots & 1 \end{pmatrix}, \qquad \text{where } \rho_{jk} = \frac{\sigma_{jk}}{\sigma_j\sigma_k}$$


3.9. Mean Vectors and Covariance Matrices for Subsets of Variables and Subvectors

We skip this section, as not much new is introduced, other than how to partition vectors and covariance
matrices. The discussion continues into section 3.9.2 as well. We resume with the extremely important discussion
of linear combinations of variables in section 3.10. Linear combinations are at the heart of multivariate (and
univariate) analysis.


3.10. Linear Combinations of Variables

3.10.1. Sample Properties

We are interested in linear combinations that maximize some function
We are interested in linear combinations that compare variables, e.g., $y_1 - y_2$
Because linear combinations are of such interest, we need to know their statistical properties, such as means, variances, and covariances.


Linear combination

A linear combination of the elements of the vector $\mathbf{y}$:

$$z = a_1 y_1 + a_2 y_2 + \cdots + a_p y_p = \mathbf{a}'\mathbf{y}, \quad \text{where } \mathbf{a}' = (a_1, a_2, \ldots, a_p)$$

If the same coefficient vector $\mathbf{a}$ is applied to each $\mathbf{y}_i$ in a sample, we have:

$$z_i = a_1 y_{i1} + a_2 y_{i2} + \cdots + a_p y_{ip} = \mathbf{a}'\mathbf{y}_i, \quad i = 1, 2, \ldots, n$$

Sample mean of the linear combination $z_i$:

$$\bar{z} = \frac{1}{n}\sum_{i=1}^{n} z_i = \mathbf{a}'\bar{\mathbf{y}} \quad (\text{analogous to the univariate result } \bar{z} = a\bar{y})$$



Sample variance of the linear combination $z_i = \mathbf{a}'\mathbf{y}_i$, $i = 1, 2, \ldots, n$:

$$s_z^2 = \frac{\sum_{i=1}^{n}(z_i - \bar{z})^2}{n - 1} = \mathbf{a}'\mathbf{S}\mathbf{a} \quad (\text{multivariate analog of } s_z^2 = a^2 s^2)$$

Variance is nonnegative, so $s_z^2 \geq 0$ and $\mathbf{a}'\mathbf{S}\mathbf{a} \geq 0$ for every $\mathbf{a}$; hence $\mathbf{S}$ is at least positive semidefinite


Correlating Linear Combinations

Define another linear combination $w = \mathbf{b}'\mathbf{y} = b_1 y_1 + b_2 y_2 + \cdots + b_p y_p$, where $\mathbf{b}' = (b_1, b_2, \ldots, b_p)$ is a vector of constants different from $\mathbf{a}'$. Then the sample covariance between the linear combinations $z, w$ is:

$$s_{zw} = \frac{\sum_{i=1}^{n}(z_i - \bar{z})(w_i - \bar{w})}{n - 1} = \mathbf{a}'\mathbf{S}\mathbf{b}$$

In summary, the variance of a linear combination is given by $\mathbf{a}'\mathbf{S}\mathbf{a}$, while the covariance between linear combinations is given by $\mathbf{a}'\mathbf{S}\mathbf{b}$.

The correlation between these linear combinations can then be obtained:

$$r_{zw} = \frac{s_{zw}}{\sqrt{s_z^2 s_w^2}} = \frac{\mathbf{a}'\mathbf{S}\mathbf{b}}{\sqrt{(\mathbf{a}'\mathbf{S}\mathbf{a})(\mathbf{b}'\mathbf{S}\mathbf{b})}}$$
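A sketch (Python/NumPy; the data and coefficient vectors are invented) checking $\bar{z} = \mathbf{a}'\bar{\mathbf{y}}$, $s_z^2 = \mathbf{a}'\mathbf{S}\mathbf{a}$, $s_{zw} = \mathbf{a}'\mathbf{S}\mathbf{b}$, and $r_{zw}$:

```python
import numpy as np

Y = np.array([[35.0,  3.5, 2.80],
              [35.0,  4.9, 2.70],
              [40.0, 30.0, 4.38],
              [10.0,  2.8, 3.21]])
a = np.array([1.0, -1.0, 2.0])    # arbitrary coefficient vectors
b = np.array([0.5, 1.0, -1.0])

S = np.cov(Y, rowvar=False)
z, w = Y @ a, Y @ b               # z_i = a'y_i and w_i = b'y_i for each row y_i'

print(np.isclose(z.mean(), a @ Y.mean(axis=0)))     # zbar = a'ybar
print(np.isclose(np.var(z, ddof=1), a @ S @ a))     # s_z^2 = a'Sa
print(np.isclose(np.cov(z, w)[0, 1], a @ S @ b))    # s_zw = a'Sb

r_zw = (a @ S @ b) / np.sqrt((a @ S @ a) * (b @ S @ b))
print(r_zw, np.corrcoef(z, w)[0, 1])                # same value
```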




Toward Multiple Linear Combinations

In multivariate methodology, we may wish to produce several linear combinations for a given problem.
Denote the constant vectors $\mathbf{a}$ and $\mathbf{b}$ as $\mathbf{a}_1$ and $\mathbf{a}_2$:

$$\mathbf{A} = \begin{pmatrix} \mathbf{a}'_1 \\ \mathbf{a}'_2 \end{pmatrix}$$

Define

$$\mathbf{z} = \begin{pmatrix} z_1 \\ z_2 \end{pmatrix} = \begin{pmatrix} \mathbf{a}'_1\mathbf{y} \\ \mathbf{a}'_2\mathbf{y} \end{pmatrix}$$

Factor $\mathbf{y}$ from the above, to get:

$$\mathbf{z} = \begin{pmatrix} \mathbf{a}'_1 \\ \mathbf{a}'_2 \end{pmatrix}\mathbf{y} = \mathbf{A}\mathbf{y}$$

Evaluating the bivariate $\mathbf{z}_i$ for each $p$-variate $\mathbf{y}_i$ in the sample, we obtain $\mathbf{z}_i = \mathbf{A}\mathbf{y}_i$, $i = 1, 2, \ldots, n$
The average of $\mathbf{z}$ over the sample can be found from:

$$\bar{\mathbf{z}} = \begin{pmatrix} \bar{z}_1 \\ \bar{z}_2 \end{pmatrix} = \begin{pmatrix} \mathbf{a}'_1\bar{\mathbf{y}} \\ \mathbf{a}'_2\bar{\mathbf{y}} \end{pmatrix} = \begin{pmatrix} \mathbf{a}'_1 \\ \mathbf{a}'_2 \end{pmatrix}\bar{\mathbf{y}} = \mathbf{A}\bar{\mathbf{y}}$$



Sample Covariance Matrix for Linear Combinations

We can specify the sample covariance matrix for linear combinations similarly to how we specified the sample covariance matrix for variables, only now the elements are the variances of the linear combinations along the diagonal and their covariances in the off-diagonal positions (the upper triangle mirrors the lower):

$$\mathbf{S}_z = \begin{pmatrix} s_{z_1}^2 & s_{z_1 z_2} \\ s_{z_2 z_1} & s_{z_2}^2 \end{pmatrix} = \begin{pmatrix} \mathbf{a}'_1\mathbf{S}\mathbf{a}_1 & \mathbf{a}'_1\mathbf{S}\mathbf{a}_2 \\ \mathbf{a}'_2\mathbf{S}\mathbf{a}_1 & \mathbf{a}'_2\mathbf{S}\mathbf{a}_2 \end{pmatrix}$$

We can factor the above into

$$\mathbf{S}_z = \begin{pmatrix} \mathbf{a}'_1 \\ \mathbf{a}'_2 \end{pmatrix}\mathbf{S}\,(\mathbf{a}_1, \mathbf{a}_2) = \mathbf{A}\mathbf{S}\mathbf{A}'$$


We can now use the above to generalize to several linear combinations, as we'll see in techniques such as principal components analysis.

Suppose we have $k$ linear transformations; these can be expressed as:

$$
\begin{aligned}
z_1 &= a_{11}y_1 + a_{12}y_2 + \cdots + a_{1p}y_p = \mathbf{a}'_1\mathbf{y} \\
z_2 &= a_{21}y_1 + a_{22}y_2 + \cdots + a_{2p}y_p = \mathbf{a}'_2\mathbf{y} \\
&\;\;\vdots \\
z_k &= a_{k1}y_1 + a_{k2}y_2 + \cdots + a_{kp}y_p = \mathbf{a}'_k\mathbf{y}
\end{aligned}
$$


More compactly, we can write the above system of equations as:

$$\mathbf{z} = \begin{pmatrix} z_1 \\ z_2 \\ \vdots \\ z_k \end{pmatrix} = \begin{pmatrix} \mathbf{a}'_1\mathbf{y} \\ \mathbf{a}'_2\mathbf{y} \\ \vdots \\ \mathbf{a}'_k\mathbf{y} \end{pmatrix} = \begin{pmatrix} \mathbf{a}'_1 \\ \mathbf{a}'_2 \\ \vdots \\ \mathbf{a}'_k \end{pmatrix}\mathbf{y} = \mathbf{A}\mathbf{y}$$


We can write the sample mean vector as:

$$\bar{\mathbf{z}} = \begin{pmatrix} \mathbf{a}'_1\bar{\mathbf{y}} \\ \mathbf{a}'_2\bar{\mathbf{y}} \\ \vdots \\ \mathbf{a}'_k\bar{\mathbf{y}} \end{pmatrix} = \begin{pmatrix} \mathbf{a}'_1 \\ \mathbf{a}'_2 \\ \vdots \\ \mathbf{a}'_k \end{pmatrix}\bar{\mathbf{y}} = \mathbf{A}\bar{\mathbf{y}}$$




Extending the Covariance Matrix $\mathbf{S}_z$ to $k$ Linear Combinations

Recall that for two linear combinations, we defined the covariance matrix as:

$$\mathbf{S}_z = \begin{pmatrix} s_{z_1}^2 & s_{z_1 z_2} \\ s_{z_2 z_1} & s_{z_2}^2 \end{pmatrix} = \begin{pmatrix} \mathbf{a}'_1\mathbf{S}\mathbf{a}_1 & \mathbf{a}'_1\mathbf{S}\mathbf{a}_2 \\ \mathbf{a}'_2\mathbf{S}\mathbf{a}_1 & \mathbf{a}'_2\mathbf{S}\mathbf{a}_2 \end{pmatrix}$$

We can easily extend the above covariance matrix to encompass several linear combinations:

$$
\mathbf{S}_z = \begin{pmatrix} \mathbf{a}'_1\mathbf{S}\mathbf{a}_1 & \mathbf{a}'_1\mathbf{S}\mathbf{a}_2 & \cdots & \mathbf{a}'_1\mathbf{S}\mathbf{a}_k \\ \mathbf{a}'_2\mathbf{S}\mathbf{a}_1 & \mathbf{a}'_2\mathbf{S}\mathbf{a}_2 & \cdots & \mathbf{a}'_2\mathbf{S}\mathbf{a}_k \\ \vdots & \vdots & & \vdots \\ \mathbf{a}'_k\mathbf{S}\mathbf{a}_1 & \mathbf{a}'_k\mathbf{S}\mathbf{a}_2 & \cdots & \mathbf{a}'_k\mathbf{S}\mathbf{a}_k \end{pmatrix}
= \begin{pmatrix} \mathbf{a}'_1(\mathbf{S}\mathbf{a}_1, \mathbf{S}\mathbf{a}_2, \ldots, \mathbf{S}\mathbf{a}_k) \\ \mathbf{a}'_2(\mathbf{S}\mathbf{a}_1, \mathbf{S}\mathbf{a}_2, \ldots, \mathbf{S}\mathbf{a}_k) \\ \vdots \\ \mathbf{a}'_k(\mathbf{S}\mathbf{a}_1, \mathbf{S}\mathbf{a}_2, \ldots, \mathbf{S}\mathbf{a}_k) \end{pmatrix}
= \begin{pmatrix} \mathbf{a}'_1 \\ \mathbf{a}'_2 \\ \vdots \\ \mathbf{a}'_k \end{pmatrix}\mathbf{S}\,(\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_k) = \mathbf{A}\mathbf{S}\mathbf{A}'
$$


The trace of this covariance matrix of linear combinations is $\mathrm{tr}(\mathbf{A}\mathbf{S}\mathbf{A}') = \sum_{i=1}^{k}\mathbf{a}'_i\mathbf{S}\mathbf{a}_i$
The text gives an example of a more general linear transformation; for details, see p. 78.
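A sketch (Python/NumPy; $\mathbf{A}$ stacks the two arbitrary coefficient vectors used earlier on the invented data) of $\mathbf{S}_z = \mathbf{A}\mathbf{S}\mathbf{A}'$ and its trace:

```python
import numpy as np

Y = np.array([[35.0,  3.5, 2.80],
              [35.0,  4.9, 2.70],
              [40.0, 30.0, 4.38],
              [10.0,  2.8, 3.21]])
A = np.array([[1.0, -1.0, 2.0],
              [0.5, 1.0, -1.0]])    # k x p: coefficient vectors as rows

S = np.cov(Y, rowvar=False)
Sz = A @ S @ A.T                    # covariance matrix of z = Ay

Z = Y @ A.T                         # n x k matrix of linear combination scores
print(np.allclose(Sz, np.cov(Z, rowvar=False)))               # True
print(np.isclose(np.trace(Sz), sum(a @ S @ a for a in A)))    # tr(ASA') = sum of a_i'S a_i
```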



3.10.2. Population Properties

The sample results can easily be extended to expressions about the population:

Population mean of the linear combination $z = \mathbf{a}'\mathbf{y}$:

$$E(z) = E(\mathbf{a}'\mathbf{y}) = \mathbf{a}'E(\mathbf{y}) = \mathbf{a}'\boldsymbol{\mu}$$

Population variance:

$$\sigma_z^2 = \mathrm{var}(\mathbf{a}'\mathbf{y}) = \mathbf{a}'\boldsymbol{\Sigma}\mathbf{a}$$

Population covariance for linear combinations $z = \mathbf{a}'\mathbf{y}$ and $w = \mathbf{b}'\mathbf{y}$:

$$\sigma_{zw} = \mathrm{cov}(z, w) = \mathbf{a}'\boldsymbol{\Sigma}\mathbf{b}$$

Population correlation of $z$ and $w$:

$$\rho_{zw} = \mathrm{corr}(\mathbf{a}'\mathbf{y}, \mathbf{b}'\mathbf{y}) = \frac{\sigma_{zw}}{\sigma_z\sigma_w} = \frac{\mathbf{a}'\boldsymbol{\Sigma}\mathbf{b}}{\sqrt{(\mathbf{a}'\boldsymbol{\Sigma}\mathbf{a})(\mathbf{b}'\boldsymbol{\Sigma}\mathbf{b})}}$$



3.11. Measures of Overall Variability

A single numerical measure of overall multivariate variability is given by the generalized sample variance, computed as the determinant of the sample covariance matrix: $|\mathbf{S}|$.
The volume of the ellipsoid of data scatter is proportional to $|\mathbf{S}|^{1/2}$
A zero eigenvalue indicates redundancy in the linear relationships among variables (the corresponding eigenvector reveals the form of the linear dependency); if the smallest eigenvalue $\lambda_p$ is zero, no axis exists in that direction, and the ellipsoid lies not in $p$ but in a $(p - 1)$-dimensional subspace of $p$-space.
Another measure of overall variability in the multivariate setting is the total sample variance, which is the trace of $\mathbf{S}$, computed as $\mathrm{tr}(\mathbf{S}) = s_{11} + s_{22} + \cdots + s_{pp}$.
Note that the trace ignores covariation and instead focuses on variation in the sample; in principal components analysis, the focus is on total variation, whereas in factor analysis, the focus is on total covariation.
Generally, large values of $|\mathbf{S}|$ and $\mathrm{tr}(\mathbf{S})$ reflect broad scatter of the observations about $\bar{\mathbf{y}}$
HOWEVER, a very small value of $|\mathbf{S}|$, or of its standardized version $|\mathbf{R}|$, may indicate either small scatter OR multicollinearity (which may be due to high pair-wise correlations or to the multiple correlation of one variable with a linear combination of the others; recall the variance inflation factor and tolerance)
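A sketch (Python/NumPy, same invented data) computing the generalized sample variance and the total sample variance:

```python
import numpy as np

Y = np.array([[35.0,  3.5, 2.80],
              [35.0,  4.9, 2.70],
              [40.0, 30.0, 4.38],
              [10.0,  2.8, 3.21]])
S = np.cov(Y, rowvar=False)

gen_var = np.linalg.det(S)        # generalized sample variance |S|
total_var = np.trace(S)           # total sample variance tr(S) = s11 + ... + spp

eigvals = np.linalg.eigvalsh(S)   # an eigenvalue near zero would signal linear redundancy
print(gen_var, total_var, eigvals)
```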


3.12. Estimation of Missing Values

If your data have a large number of missing values, your data collection procedure is problematic; the problem is no longer one of having missing data, it's a problem of how to collect data well in the first place.
Randomly missing values are more easily justified for replacement than values missing in a non-random pattern.
If missing data occur at random, imputation or regression approaches may be used (if you really must).
Missing value procedures may be technically interesting, but scientifically, you must be very cautious when deciding to use such a procedure. Is it very scientific to make up data points?


3.13. Distance Between Vectors

Standardized distance (or statistical distance):

$$\frac{|y - \mu|}{\sigma} \quad \text{OR} \quad \frac{|y - \bar{y}|}{s}$$

What are the above measures called in applied research?
The (squared) Euclidean distance between two vectors is $(\mathbf{y}_1 - \mathbf{y}_2)'(\mathbf{y}_1 - \mathbf{y}_2)$
$(\mathbf{y}_1 - \mathbf{y}_2)'(\mathbf{y}_1 - \mathbf{y}_2)$ is standardized by "dividing" by the sample covariance matrix $\mathbf{S}$ (remember we aren't actually dividing as in scalar algebra, but rather are multiplying by the inverse, $\mathbf{S}^{-1}$):

$$d^2 = (\mathbf{y}_1 - \mathbf{y}_2)'\mathbf{S}^{-1}(\mathbf{y}_1 - \mathbf{y}_2)$$

We can generalize the above to various situations in which sample means and population means are used instead:

$$
\begin{aligned}
D^2 &= (\bar{\mathbf{y}} - \boldsymbol{\mu})'\mathbf{S}^{-1}(\bar{\mathbf{y}} - \boldsymbol{\mu}) \\
\Delta^2 &= (\mathbf{y} - \boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\mathbf{y} - \boldsymbol{\mu}) \\
\Delta^2 &= (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)'\boldsymbol{\Sigma}^{-1}(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)
\end{aligned}
$$

Squared distances between two vectors are known generally as Mahalanobis distances (after Mahalanobis's work in 1936)
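A sketch (Python/NumPy; the vectors and covariance matrix are invented) of the squared Mahalanobis distance between two observation vectors:

```python
import numpy as np

# Invented observation vectors and a positive definite covariance matrix.
y1 = np.array([70.0, 160.0])
y2 = np.array([66.0, 140.0])
S = np.array([[20.0,  100.0],
              [100.0, 1000.0]])

diff = y1 - y2
d2_euclid = diff @ diff                       # squared Euclidean distance
d2_mahal = diff @ np.linalg.solve(S, diff)    # d^2 = (y1 - y2)' S^{-1} (y1 - y2)
print(d2_euclid, d2_mahal)
```

Using np.linalg.solve rather than explicitly inverting S is the usual, numerically stabler choice.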


Determinants of Mahalanobis Distance

If a random variable has a larger variance than another, it receives relatively less weight in a Mahalanobis
distance.
Two highly correlated variables do not contribute as much as two variables that are less correlated.
The use of the inverse of the covariance matrix has the effect of standardizing all variables to the same
variance.
The use of the inverse also has the effect of eliminating correlations (proof on p. 85).

End of Notes.
