
MULTICOLLINEARITY: WHAT HAPPENS IF EXPLANATORY VARIABLES ARE CORRELATED?

Multicollinearity: a situation in which two or more explanatory variables are highly linearly related.


Perfect multicollinearity: perfect collinearity exists when two variables are exactly linearly related, that is, R² = 1.
Imperfect multicollinearity: imperfect or near collinearity exists when two variables are approximately linearly related, that is, R² < 1.

THEORETICAL CONSEQUENCES
Unbiasedness is a property of repeated sampling.
Minimum variance does not mean a small numerical value.
Multicollinearity is a sample phenomenon.

PRACTICAL CONSEQUENCES
Large variances and standard errors of OLS estimators
Wider confidence intervals
Insignificant t ratios
High R² but few significant t ratios
High sensitivity of estimators and standard errors to small changes in the data
Wrong signs for regression coefficients
Difficulty in assessing the individual contribution of each variable

DETECTION OF MULTICOLLINEARITY
High R² but few significant t ratios
High pairwise correlations among explanatory variables
Examination of partial correlations
Subsidiary or auxiliary regressions
The variance inflation factor (VIF): VIF_j = 1 / (1 − R²_j), where R²_j comes from regressing the j-th explanatory variable on the remaining ones
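As an illustration of the VIF, here is a minimal numpy sketch on simulated data (the variable names and the simulated sample are hypothetical, chosen only to show a nearly collinear pair):

```python
import numpy as np

def r_squared(y, X):
    """R² from an OLS regression of y on X (X includes a constant column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

def vif(X):
    """VIF_j = 1 / (1 - R²_j), where R²_j comes from regressing
    regressor j on the remaining regressors (plus a constant)."""
    n, k = X.shape
    out = []
    for j in range(k):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        out.append(1.0 / (1.0 - r_squared(X[:, j], others)))
    return np.array(out)

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)   # nearly collinear with x1
x3 = rng.normal(size=200)                    # unrelated regressor
print(vif(np.column_stack([x1, x2, x3])))    # large VIFs for x1, x2; x3 near 1
```

A common rule of thumb treats a VIF above about 10 as a sign of serious collinearity; here x1 and x2 trip that rule while x3 does not.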

IS MULTICOLLINEARITY NECESSARILY BAD?
The answer depends on the purpose of the study.
Collinearity is not a problem if the purpose is to forecast the future mean value of the dependent variable or to estimate a group of coefficients jointly.
Collinearity is a problem if the purpose is reliable estimation of the individual parameters.

REMEDIAL MEASURES
Dropping a variable (or variables) from the model
Acquiring additional data or a new sample
Rethinking the model
Using prior information about some parameters
Transforming the variables

HETEROSCEDASTICITY: WHAT HAPPENS IF THE ERROR VARIANCE IS NONCONSTANT?

HETEROSCEDASTICITY
The variance of the error term ui varies from observation to observation.
The nature of heteroscedasticity:
The variance of the dependent variable around its mean does not remain the same at all levels of the explanatory variable(s).
Heteroscedasticity is usually found in cross-sectional data.

CONSEQUENCES OF HETEROSCEDASTICITY
OLS estimators remain linear.
OLS estimators remain unbiased.
The estimators are not BLUE because they no longer have minimum variance.
A bias arises because the conventional estimator of the true error variance is no longer unbiased.
The usual confidence intervals and hypothesis tests based on the t and F distributions become unreliable.

DETECTION OF HETEROSCEDASTICITY
Nature of the problem
1. Prior studies
2. Use of cross-sectional data

Graphical examination of the residuals

Park test
1. Run the original regression.
2. Obtain the residuals, square them, and take their logs.
3. Regress the log squared residuals on the explanatory variable(s), or alternatively on the dependent variable:
   ln e²i = B1 + B2 ln Xi + vi
4. If the null hypothesis that B2 = 0 is rejected, heteroscedasticity exists.
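The four steps of the Park test can be sketched in numpy; this is an illustrative simulation (the data-generating process, with error standard deviation proportional to X, is an assumption made to build in heteroscedasticity):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x = rng.uniform(1.0, 10.0, size=n)
# heteroscedastic by construction: error std grows with x
y = 2.0 + 3.0 * x + rng.normal(scale=x, size=n)

# Steps 1-2: run the original regression, square the residuals, take logs
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta

# Step 3: ln e²_i = B1 + B2 ln X_i + v_i
Z = np.column_stack([np.ones(n), np.log(x)])
g, *_ = np.linalg.lstsq(Z, np.log(e**2), rcond=None)

# Step 4: t test on B2
r = np.log(e**2) - Z @ g
s2 = r @ r / (n - 2)
se_b2 = np.sqrt(s2 * np.linalg.inv(Z.T @ Z)[1, 1])
t_b2 = g[1] / se_b2
print(t_b2)  # large |t| -> reject B2 = 0 -> heteroscedasticity detected
```

With Var(ui) proportional to Xi², the auxiliary slope should come out near 2 and its t ratio well above any conventional critical value.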

Glejser test
Regress the absolute value of the residuals on the explanatory variable:
|ei| = B1 + B2 Xi + vi
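The Glejser regression differs from the Park test only in using |e| rather than ln e²; a minimal numpy sketch on simulated heteroscedastic data (the data-generating process is an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x = rng.uniform(1.0, 10.0, size=n)
y = 2.0 + 3.0 * x + rng.normal(scale=x, size=n)  # error std rises with x

# Original regression and its residuals
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta

# Glejser: regress |e_i| on X_i and t-test the slope B2
g, *_ = np.linalg.lstsq(X, np.abs(e), rcond=None)
r = np.abs(e) - X @ g
s2 = r @ r / (n - 2)
t_slope = g[1] / np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
print(t_slope)  # significant slope -> heteroscedasticity detected
```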
White's General Heteroscedasticity Test
1. Run the original regression and obtain the residuals.
2. Square the residuals and regress them on all the original explanatory variables, their squared values, and their cross-products.
3. According to White, the R² from this auxiliary regression times the sample size, n·R², follows a chi-square distribution with degrees of freedom equal to the number of regressors (excluding the constant) in the auxiliary regression.
4. If the computed chi-square value exceeds the critical chi-square value at the chosen level of significance, reject the null hypothesis of no heteroscedasticity.
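The steps above can be sketched in numpy for a model with two regressors; the simulated data and the 5% critical value for 5 degrees of freedom (11.07) are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
# heteroscedastic by construction: error std depends on x1
y = 1.0 + 2.0 * x1 - x2 + rng.normal(size=n) * (1.0 + np.abs(x1))

# Step 1: original regression, squared residuals
X = np.column_stack([np.ones(n), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
e2 = (y - X @ b) ** 2

# Step 2: auxiliary regression on levels, squares, and the cross-product
Z = np.column_stack([np.ones(n), x1, x2, x1**2, x2**2, x1 * x2])
g, *_ = np.linalg.lstsq(Z, e2, rcond=None)
r = e2 - Z @ g
R2 = 1 - r @ r / ((e2 - e2.mean()) @ (e2 - e2.mean()))

# Steps 3-4: n*R² ~ chi-square with df = 5 auxiliary regressors here
LM = n * R2
crit = 11.07  # 5% chi-square critical value, df = 5
print(LM, LM > crit)  # LM above crit -> reject homoscedasticity
```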

AUTOCORRELATION: WHAT HAPPENS IF ERROR TERMS ARE CORRELATED?

AUTOCORRELATION
Correlation between members of observations ordered in time (time-series data) or in space (cross-sectional data):
E(ui uj) ≠ 0, i ≠ j

Reasons for Autocorrelation
Inertia
The Cobweb Phenomenon
Model Specification Error
Data Manipulation

CONSEQUENCES OF AUTOCORRELATION
OLS estimators are still linear and unbiased.
The estimators are not BLUE because they no longer have minimum variance.
The estimated variances of the OLS estimators are biased; they sometimes underestimate the true variances and inflate the t values.
The usual t and F tests are not generally reliable.
The computed error variance is a biased estimator of the true variance.
The computed R² may be an unreliable measure of the true R².
The computed variances and standard errors of forecasts may also be inefficient.

DETECTING AUTOCORRELATION
Graphical method: plot the residuals against time.

The Durbin-Watson d test:
The ratio of the sum of squared differences of successive residuals to the RSS:
d = Σ(et − et-1)² / Σet²
Underlying assumptions
1. The regression includes an intercept.
2. The X variables are non-stochastic.
3. The disturbances are generated by the first-order autoregressive mechanism:
   ut = ρut-1 + vt,  −1 ≤ ρ ≤ 1
4. No lagged value of the dependent variable appears as an explanatory variable in the regression.

For large samples, d ≈ 2(1 − ρ̂), where ≈ means "approximately" and ρ̂ is an estimator of ρ.
If ρ̂ = −1 and d = 4, there is perfect negative correlation.
If ρ̂ = 0 and d = 2, there is no autocorrelation.
If ρ̂ = 1 and d = 0, there is perfect positive correlation.
There are two critical d values: the upper limit dU and the lower limit dL.

Steps involved in the test:
1. Run the OLS regression and obtain the residuals et.
2. Compute d.
3. Find the critical dL and dU for the given sample size and number of explanatory variables.
4. Apply the decision rules:

Null hypothesis               Decision      If
No positive autocorrelation   Reject        0 < d < dL
No positive autocorrelation   No decision   dL ≤ d ≤ dU
No negative autocorrelation   Reject        4 − dL < d < 4
No negative autocorrelation   No decision   4 − dU ≤ d ≤ 4 − dL
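A numpy sketch of the d statistic on simulated AR(1) errors (the sample size, the true ρ = 0.8, and the model coefficients are illustrative assumptions):

```python
import numpy as np

def durbin_watson(e):
    """d = Σ(e_t - e_{t-1})² / Σ e_t²"""
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

rng = np.random.default_rng(4)
n = 300
x = rng.normal(size=n)

# Build AR(1) disturbances: u_t = 0.8 u_{t-1} + v_t
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.8 * u[t - 1] + rng.normal()
y = 1.0 + 2.0 * x + u

# Run OLS and compute d from the residuals
X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b

d = durbin_watson(e)
print(d)          # well below 2 -> evidence of positive autocorrelation
print(1 - d / 2)  # implied rho estimate, from d ≈ 2(1 - rho)
```

With true ρ = 0.8, d should come out near 2(1 − 0.8) = 0.4, far below any tabulated dL, so the null of no positive autocorrelation is rejected.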

REMEDIAL MEASURES
Transformation of the model.
Transformation process:
Two-variable regression model:
Yt = B1 + B2 Xt + ut                              ...(i)
Regression lagged one period:
Yt-1 = B1 + B2 Xt-1 + ut-1                        ...(ii)
Multiply equation (ii) by ρ on both sides:
ρYt-1 = ρB1 + ρB2 Xt-1 + ρut-1                    ...(iii)
Subtract equation (iii) from (i):
Yt − ρYt-1 = B1(1 − ρ) + B2(Xt − ρXt-1) + vt,  where vt = ut − ρut-1

HOW TO ESTIMATE ρ
ρ = 1: the first-difference method
ρ estimated from the d statistic:
ρ̂ ≈ 1 − d/2
ρ estimated from the OLS residuals:
et = ρet-1 + vt
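The residual-based estimate of ρ and the quasi-difference transformation from the previous section can be combined into a single pass (one step of what is commonly called the Cochrane-Orcutt procedure); the simulated data with true ρ = 0.7 are an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 400
x = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.7 * u[t - 1] + rng.normal()  # AR(1) errors, true rho = 0.7
y = 1.0 + 2.0 * x + u                      # true B1 = 1, B2 = 2

# Step 1: OLS on the original model, keep the residuals
X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b

# Step 2: estimate rho from e_t = rho * e_{t-1} + v_t (through the origin)
rho = (e[1:] @ e[:-1]) / (e[:-1] @ e[:-1])

# Step 3: quasi-difference, then OLS on the transformed model
y_star = y[1:] - rho * y[:-1]
x_star = x[1:] - rho * x[:-1]
Xs = np.column_stack([np.ones(n - 1), x_star])
bs, *_ = np.linalg.lstsq(Xs, y_star, rcond=None)

b1 = bs[0] / (1 - rho)  # recover B1 from the intercept B1*(1 - rho)
print(rho, b1, bs[1])   # rho near 0.7; B1 near 1; B2 near 2
```

In practice the procedure is often iterated: re-estimate ρ from the residuals of the transformed regression until it stabilizes.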

Thank you