
Temporal Basis Functions

&
Correlated Regressors

Gary Price & Patti Adank

fMRI for Dummies 29-03-06


Correlated Regressors
(or: the trouble with multicollinearity)

by (a slightly puzzled) Patti Adank



Sources:
Will Penny
Rik Henson's slides: www.mrc-cbu.cam.ac.uk/Imaging/Common/rikSPM-GLM.ppt
previous years' presenters' slides



Correlations between regressors

in multiple regression analysis:
- problems for behavioural data
- behavioural example (fictional)
- solutions

in the General Linear Model:
- problems for neuroimaging data
- PET example
- solutions?



Multiple Regression Analysis
&
Correlated Regressors



Multiple regression analysis

Multiple regression characterises the relationship between several independent variables (or regressors), X1, X2, X3, etc., and a single dependent variable, Y:

Y = β1X1 + β2X2 + ... + βLXL + ε

The X variables are combined linearly and each has its own regression coefficient (weight).
The βs reflect the independent contribution of each regressor, X, to the value of the dependent variable, Y, i.e. the proportion of the variance in Y accounted for by each regressor after all other regressors are accounted for.
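
As a concrete illustration, here is a minimal sketch (not from the slides; all values are made up) of estimating the βs by ordinary least squares with NumPy:

import numpy as np

rng = np.random.default_rng(0)
n = 100
X = rng.normal(size=(n, 2))                         # two regressors, X1 and X2
beta_true = np.array([0.8, -0.3])
y = X @ beta_true + rng.normal(scale=0.5, size=n)   # Y = β1X1 + β2X2 + ε

# Ordinary least squares: beta_hat = (X'X)^-1 X'y
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)   # close to [0.8, -0.3]
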
Multiple regression analysis

Fit a straight line through the points for Y and X.

Some statistics: if the model fits the data well:
- R² is high (it reflects the proportion of variance in Y explained by the regressor X)
- the corresponding p value will be low
Multiple regression analysis: multicollinearity

Multiple regression results are sometimes difficult to interpret:
- the overall p value of a fitted model is very low,
- but the individual p values for the regressors are high
This means that the model fits the data well, even though none of the X variables on its own has a significant impact on predicting Y.
How is this possible?
It happens when two (or more) regressors are highly correlated: a problem known as multicollinearity.
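
A minimal simulation (made-up numbers, not from the slides) shows the mechanism: with nearly collinear regressors, the coefficient standard errors, taken from the diagonal of σ²(X'X)⁻¹, are inflated, so the individual t-values are small even though Y clearly depends on the regressors.

import numpy as np

rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)   # x2 nearly duplicates x1 (r ≈ 0.995)
X = np.column_stack([x1, x2])
y = x1 + x2 + rng.normal(size=n)          # Y really does depend on both

beta, res, *_ = np.linalg.lstsq(X, y, rcond=None)
sigma2 = res[0] / (n - 2)                 # residual variance
se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
print(beta / se)   # individual t-values: small, despite a strong overall fit
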



Regression analysis: multicollinearity example

When is multicollinearity between regressors a problem?
- no: when you just want to predict Y from X1 and X2, the values of R² and p will be correct
- yes: when you want to assess how the individual regressors affect the dependent variable:
  - individual p values can be misleading: a p value can be high even though the variable is important
  - the confidence intervals on the regression coefficients are very wide and may include zero: you cannot be confident whether an increase in the X value is associated with an increase, or a decrease, in Y



Regression analysis: multicollinearity example

Measures for multicollinearity:

In general:
- if r > 0.8 between regressors, they can be expected to show multicollinearity
In SPSS:
- Tolerance: the proportion of a regressor's variance not accounted for by the other regressors in the model; low tolerance values are an indicator of multicollinearity
- Variance Inflation Factor (VIF): the reciprocal of the tolerance; large VIF values are an indicator of multicollinearity
Both measures are easy to compute yourself, as in the sketch below.
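
A minimal sketch of both measures (assuming roughly centred regressors, and ignoring the intercept that SPSS would include):

import numpy as np

def tolerance_and_vif(X):
    """X: (n_samples, n_regressors). Regress each column on the others;
    tolerance = 1 - R² of that regression, VIF = 1 / tolerance."""
    k = X.shape[1]
    tol = np.empty(k)
    for j in range(k):
        target = X[:, j] - X[:, j].mean()
        others = np.delete(X, j, axis=1)
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ beta
        tol[j] = (resid @ resid) / (target @ target)   # = 1 - R²
    return tol, 1.0 / tol

rng = np.random.default_rng(2)
x1 = rng.normal(size=200)
X = np.column_stack([x1, x1 + rng.normal(scale=0.1, size=200)])
print(tolerance_and_vif(X))   # low tolerance, large VIF: multicollinearity
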
Regression analysis: multicollinearity example

Example:
Question: how can the perceived clarity of an auditory stimulus be predicted from the loudness and frequency of that stimulus?
- perception experiment in which subjects had to judge the clarity of an auditory stimulus
- model to be fit:
Y = β1X1 + β2X2 + ε
Y = judged clarity of stimulus
X1 = loudness
X2 = frequency
Regression analysis: multicollinearity example

What happens when X1 (loudness) and X2 (frequency) are collinear, i.e., strongly correlated?

[Scatter plot: loudness (y-axis) against frequency (x-axis)]

Correlation between loudness and frequency: 0.945 (p < 0.001)
High loudness values correspond to high frequency values.
Regression analysis: multicollinearity example

Contribution of individual predictors:

X1 (loudness) entered as sole predictor:
Y = 0.859X1 + 24.41
R² = 0.74 (74% explained variance in Y)
p < 0.001

X2 (frequency) entered as sole predictor:
Y = 0.824X2 + 26.94
R² = 0.68 (68% explained variance in Y)
p < 0.001
Regression analysis: multicollinearity example

Collinear regressors X1 and X2 entered together:

Resulting model:
Y = 0.756X1 + 26.94 (the X2 term was not reported)
R² = 0.74 (74% explained variance in Y: no more than X1 alone)
p < 0.001

Individual regressors within the combined model:
X1 (loudness): R² = ?, p < 0.001
X2 (frequency): R² = 0.555, p = 0.594
The collinear X2 adds nothing once X1 is in the model, and its p value is no longer significant.



Regression analysis: removing multicollinearity

How to deal with collinearity:

1. Increase the sample size (no data like more data)
2. Orthogonalise the correlated regressor variables
- using factor analysis
- this will produce linearly independent regressors and corresponding factor scores
- these factor scores can subsequently be used instead of the original correlated regressor values (see the sketch below)
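
A minimal sketch of the idea, using PCA via the SVD as a simple stand-in for factor analysis (variable names and values are made up):

import numpy as np

rng = np.random.default_rng(3)
loudness = rng.normal(size=50)
frequency = loudness + rng.normal(scale=0.2, size=50)   # a correlated pair
X = np.column_stack([loudness, frequency])

# Centre the regressors, then take principal components via the SVD.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = U * s   # orthogonal component scores, one column per component

# The columns of `scores` are uncorrelated and can replace the original
# correlated regressors in the model.
print(np.corrcoef(scores.T)[0, 1])   # ~0
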



General Linear Model
&
Correlated Regressors



General Linear Model

The General Linear Model can be seen as an extension of multiple regression (or: multiple regression is just a simple form of the General Linear Model):
- multiple regression only looks at ONE Y variable
- the GLM allows you to analyse several Y variables in a linear combination (e.g. the time series in a voxel)
- ANOVA, t-tests, F-tests, etc. are also forms of the GLM



General Linear Model and fMRI

Y = X·β + ε

Y (observed data): the BOLD signal at various time points at a single voxel
X (design matrix): the components which explain the observed data, i.e. the BOLD time series for the voxel:
- timing information: onset vectors, Omj, and duration vectors, Dmj
- the HRF, hm, which describes the shape of the expected BOLD response over time
- other regressors, e.g. realignment parameters
- experimental manipulations
β (parameters): define the contribution of each component of the design matrix to the value of Y
ε (error/residual): the difference between the observed data, Y, and that predicted by the model, X·β
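
For illustration, a minimal sketch of how one design-matrix column might be built (a simple gamma-shaped HRF and made-up timings; SPM's canonical HRF is more elaborate):

import numpy as np

TR = 2.0                      # seconds per scan
n_scans = 120
onsets = [10, 40, 70, 100]    # condition onsets, in seconds
duration = 5                  # stimulus duration, in seconds

# Boxcar for the condition, sampled once per TR.
t = np.arange(n_scans) * TR
boxcar = np.zeros(n_scans)
for onset in onsets:
    boxcar[(t >= onset) & (t < onset + duration)] = 1.0

# Simple gamma-shaped HRF, peaking around 5 s.
ht = np.arange(0, 30, TR)
hrf = ht**5 * np.exp(-ht)
hrf /= hrf.sum()

# One design-matrix column: the expected BOLD response for this condition.
regressor = np.convolve(boxcar, hrf)[:n_scans]
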



fMRI: constructing the design matrix

In analysing fMRI data, the problem of multicollinearity occurs when specifying the regressors in the design matrix:
- if the regressors are linearly dependent (correlated), the results of the GLM are not easy to interpret, because variance attributable to an individual regressor may be confounded with the other regressor(s)
- this may lead to misinterpretations of activations in certain brain areas



fMRI: an example

For example:
- suppose that the response to a stimulus, Sr, is highly correlated with the associated motor response, Mr
- and suppose it is hypothesised that a specific region's activity for Sr is not influenced by Mr
- then this region should be tested only after removing from the regressor for Mr all the variance that can be explained by Sr
- this is dangerous: if the motor response does influence the signal in the region, the test will be overly significant, because that variance is wrongly assigned to Sr
fMRI: PET example

Andrade et al. (1999). Ambiguous Results in Functional Neuroimaging Data Analysis Due to Covariate Correlation. NeuroImage, 10, 483-486.

Andrade et al. show how correlated regressors can lead to misinterpretations:
- they collected PET data from a single subject and generated a covariate (regressor) that correlated strongly with the activation conditions used in the experiment (0 for rest, 1-6 increasing linearly with the activation levels in the experiment)



fMRI: PET example

Two purposes:
1. detect areas where the signal correlated with the generated covariate
2. search for differences in activation versus control periods

This implies fitting two models:
- one with the activation-vs-rest regressor and the covariate entered together (r = 0.845):
M = C1 (activation-rest) + C2 (covariate)
- one with the variance shared with C1 removed from the covariate:
M* = C1 + C2*, where C2* = C2 - 0.845(SSC2/SSC1)C1


fMRI: PET example

For both model M and model M*:
- standard SPM processing
- the parameters for C1 and C2/C2* were tested using t-tests and transformed into z-scores

Results:
- differences between M and M* occurred only for activation related to C1 (the rest/activation regressor)
- e.g., parahippocampal activation was significant in M but not in M*
- left precuneal, superior temporal, and medial frontal activity were significant in M* but not in M
fMRI: PET example

Example voxels:
- (54, -56, 34): activated in M (p = 0.004), not in M* (p = 0.901)
- (6, 28, -28): activated in M* (p = 0.014), not in M (p = 0.337)



fMRI: dealing with multicollinearity

Andrade et al. suggest a technique using the F-statistic to orthogonalise correlated regressors without having to re-estimate the parameters (which can be very time-consuming), using principles from linear model theory (Christensen, 1996).

Another technique used to remove correlations between regressors is Gram-Schmidt orthogonalisation (cf. Rik Henson's slides); see the sketch below.
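
A minimal sketch of Gram-Schmidt orthogonalisation of design-matrix columns; note that the column order matters, since each later column loses the variance it shares with the earlier ones:

import numpy as np

def gram_schmidt(X):
    """Orthogonalise the columns of X from left to right."""
    X = X.astype(float).copy()
    for j in range(X.shape[1]):
        for i in range(j):
            # Remove from column j its projection onto column i.
            X[:, j] -= (X[:, i] @ X[:, j]) / (X[:, i] @ X[:, i]) * X[:, i]
    return X

rng = np.random.default_rng(5)
a = rng.normal(size=100)
X = np.column_stack([a, a + rng.normal(scale=0.1, size=100)])
Q = gram_schmidt(X)
print(Q[:, 0] @ Q[:, 1])   # ~0: the columns are now orthogonal
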

Christensen, R. (1996). Plane Answers to Complex Questions: The Theory of Linear Models. Springer-Verlag, Berlin.



Dealing with multicollinearity in SPM

Use the toolbox "Design Magic - Multicollinearity assessment for fMRI", for SPM99 (SPM5?):
- author: Matthijs Vink
- URL: http://www.matthijs-vink.com/tools.html
- allows you to assess the multicollinearity in your fMRI design by calculating the amount of factor variance that is also accounted for by the other factors in the design (expressed as R²)
- also allows you to reduce correlations between regressors through the use of high-pass filters (see the sketch below)
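
A minimal sketch of the high-pass idea (assuming a discrete-cosine filter of the kind SPM uses, with made-up drift and cutoff): when the variance two regressors share is mostly slow drift, filtering it out lowers their correlation.

import numpy as np

def highpass_dct(x, cutoff_s, TR):
    """Remove discrete-cosine components slower than cutoff_s seconds from x."""
    n = x.size
    t = np.arange(n)
    order = int(np.floor(2 * n * TR / cutoff_s)) + 1   # slow DCT terms to drop
    dct = np.column_stack(
        [np.cos(np.pi * k * (2 * t + 1) / (2 * n)) for k in range(1, order)]
    )
    xc = x - x.mean()
    beta, *_ = np.linalg.lstsq(dct, xc, rcond=None)
    return xc - dct @ beta

rng = np.random.default_rng(6)
TR, n = 2.0, 200
drift = np.sin(2 * np.pi * np.arange(n) / n)   # slow drift shared by both
r1 = drift + rng.normal(scale=0.5, size=n)
r2 = drift + rng.normal(scale=0.5, size=n)
print(np.corrcoef(r1, r2)[0, 1])               # high: the shared drift
f1 = highpass_dct(r1, 128, TR)
f2 = highpass_dct(r2, 128, TR)
print(np.corrcoef(f1, f2)[0, 1])               # much lower after filtering
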



Conclusion

When fitting a model in multiple regression analysis, or when constructing your design matrix, correlations between regressors can lead to misinterpretations of the influence of the independent variables on the dependent variable.

Multicollinearity is a hassle, but it can be dealt with, usually through orthogonalisation procedures involving (groups of) regressors.



Assessing multicollinearity in SPM

The end

