
Gujarati (2003): Chapters 10 and 11

Multicollinearity
One of the assumptions of the CNLRM
requires that THERE IS NO EXACT
LINEAR RELATIONSHIP between any of
the explanatory variables
i.e., none of the X variables is perfectly
correlated with another. If an exact linear
relationship does exist, we say there is perfect
multicollinearity.
The nature of multicollinearity
If multicollinearity is perfect, the regression coefficients of
the $X_i$ variables, the $\beta_i$'s, are indeterminate and their standard
errors, $\mathrm{se}(\beta_i)$'s, are infinite.
In general:
When there is some exact linear relationship among the
independent variables, that is,

$$\sum_i \lambda_i X_i = 0 \quad\text{or}\quad \lambda_1 X_1 + \lambda_2 X_2 + \lambda_3 X_3 + \cdots + \lambda_k X_k = 0,$$

where not all of the $\lambda_i$ are zero.

For example, $1\cdot X_1 + 2\cdot X_2 = 0$, so that $X_1 = -2X_2$.
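To see perfect multicollinearity concretely, here is a minimal numpy sketch (data and seed invented for illustration): with $X_1 = -2X_2$, the design matrix has only two independent columns, so $X'X$ is singular, the normal equations have no unique solution, and the $\beta_i$'s are indeterminate.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x2 = rng.normal(size=n)
x1 = -2 * x2                          # exact linear relation: X1 = -2*X2
X = np.column_stack([np.ones(n), x1, x2])
y = 1.0 + 0.5 * x2 + rng.normal(size=n)

# The design matrix has only 2 independent columns, not 3,
# so X'X is singular and the OLS coefficients are indeterminate.
print(np.linalg.matrix_rank(X))                       # -> 2
coef, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
print(rank, coef)  # lstsq also reports rank 2; it returns only the
                   # minimum-norm solution out of infinitely many.
```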

The nature of multicollinearity
If all the Xs are uncorrelated, there is ABSENCE
OF MULTICOLLINEARITY.

Cases in between are described by various
degrees of multicollinearity.

A high degree of multicollinearity occurs when an
X is highly correlated with another X or with a
linear combination of other Xs.
The nature of multicollinearity
So, we study
- the nature of multicollinearity
- the consequences of multicollinearity
- detection and remedies


Multicollinearity
NOTE:
1) Multicollinearity is a question of degree, not
of kind. It is not a matter of its absence or
presence; its degree is what matters.

2) It is a feature of the sample, not of the
population. It is due to the assumption of
nonstochastic Xs. We do not test for
multicollinearity but we can measure its
degree in a particular sample.
Multicollinearity
If the X variables in a multiple regression
model are highly correlated, we say that the
degree of multicollinearity is HIGH.

Then the variances of the estimated coefficients
are large, t-values are low, and so the
estimates are highly unreliable.
Consequences of imperfect multicollinearity
1. Although the estimated coefficients are BLUE, the OLS
estimators have large variances and covariances,
making precise estimation difficult.
2. Confidence intervals tend to be much wider, leading
us to accept the zero null hypothesis more readily.
3. The t-statistics of the coefficients tend to be
statistically insignificant.
4. The $R^2$ can nevertheless be very high.
5. The OLS estimators and their standard errors can be
sensitive to small changes in the data.
(All of these symptoms can be detected from the
regression results.)
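As an illustration of symptoms 2-4, the following numpy sketch (all numbers invented) fits a regression in which $X_3$ is nearly a copy of $X_2$: the $R^2$ typically comes out high while the individual slope t-ratios are small.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 30
x2 = rng.normal(size=n)
x3 = x2 + 0.05 * rng.normal(size=n)      # X3 nearly a copy of X2
X = np.column_stack([np.ones(n), x2, x3])
y = 1.0 + x2 + x3 + rng.normal(size=n)

beta = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta
s2 = resid @ resid / (n - 3)                         # estimate of sigma^2
se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
r2 = 1 - (resid @ resid) / np.sum((y - y.mean())**2)
print("R^2      :", round(r2, 3))                    # typically high
print("t-ratios :", (beta / se).round(2))            # slopes often small
```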
Detection of high degree of multicollinearity
Symptoms
Using the feature of imperfect (or high-degree)
multicollinearity summarised under item 5, we can
delete or add a few observations from the data set and
examine the sensitivity of the coefficient estimates and
their standard errors to the change, i.e., do the
coefficient estimates and their standard errors change
substantially?
With large $\mathrm{se}(\hat\beta_i)$, t-values are low, leading to
imprecise estimates. This is why we may get a highly
significant F-test (and a high $R^2$) while the individual
t-tests are insignificant.
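A minimal sketch of this sensitivity check, under assumed data (a nearly collinear regressor pair): refit the model after deleting a few observations and compare the coefficient estimates.

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients and standard errors."""
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta
    s2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
    return beta, se

rng = np.random.default_rng(2)
n = 40
x2 = rng.normal(size=n)
x3 = x2 + 0.05 * rng.normal(size=n)               # highly collinear pair
X = np.column_stack([np.ones(n), x2, x3])
y = 1 + x2 + x3 + rng.normal(size=n)

b_full, se_full = ols(X, y)
keep = np.arange(n) >= 3                          # drop the first 3 rows
b_sub, se_sub = ols(X[keep], y[keep])
print("full sample :", b_full.round(2))
print("3 rows less :", b_sub.round(2))            # often changes a lot
```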
Difficulty in detecting high degree of
multicollinearity
1) However, even when we have significant t-ratios
and a significant F-test, it is still possible to
have a high degree of multicollinearity.

2) High variances (and hence standard errors) and
covariances of the estimated coefficients may also
occur because of small variation in the Xs or
because of a large error variance.
Difficulty in detecting high degree of
multicollinearity
In a three-variable case,

$$Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + u,$$

the formula for $\mathrm{Var}(\hat\beta_2)$ is

$$\mathrm{Var}(\hat\beta_2) = \frac{\sigma_u^2 \sum x_3^2}{\sum x_2^2 \sum x_3^2 - \left(\sum x_2 x_3\right)^2} = \frac{\sigma_u^2}{\sum x_2^2\,(1 - r_{23}^2)},$$

where lower-case letters denote deviations from sample means.
Difficulty in detecting high degree of
multicollinearity
where $r_{23}^2$ is obtained from the OLS regression of
$X_2$ on $X_3$,

$$X_2 = \alpha_1 + \alpha_2 X_3 + \varepsilon.$$

It can be seen that a high variance of $\hat\beta_2$ depends on 3
factors:
1) a high error variance, $\sigma_u^2$;
2) small dispersion in the Xs, i.e., a small $\sum x_i^2$;
3) high correlation between $X_2$ and $X_3$, i.e., $r_{23}^2$ close to one.
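The formula can be evaluated directly. In this illustrative sketch ($\sigma_u$ assumed known, data simulated), tightening the link between $X_2$ and $X_3$ drives $r_{23}^2$ toward one and inflates $\mathrm{Var}(\hat\beta_2)$:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
sigma_u = 1.0
x2 = rng.normal(size=n)

for noise in (1.0, 0.3, 0.05):                 # smaller noise => higher r23
    x3 = x2 + noise * rng.normal(size=n)
    d2 = x2 - x2.mean()                        # deviations from the mean
    r23_sq = np.corrcoef(x2, x3)[0, 1] ** 2
    var_b2 = sigma_u**2 / ((d2 @ d2) * (1 - r23_sq))
    print(f"r23^2 = {r23_sq:.3f}  Var(beta2_hat) = {var_b2:.4f}")
```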
Remedial Measures
1. Utilise a priori information
2. Combining cross-sectional and time-series data
3. Dropping a variable (or variables) and re-specifying the regression
4. Transformation of variables:
(i) First-difference form
(ii) Ratio transformation
5. Additional or new data
6. Reducing collinearity in polynomial regression
7. Do nothing (if the objective is only for prediction)
Illustrations:

1. A priori information: given $\beta_3 = 0.1\,\beta_2$,

$$Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + u = \beta_1 + \beta_2 X_2 + 0.1\beta_2 X_3 + u = \beta_1 + \beta_2 (X_2 + 0.1 X_3) + u = \beta_1 + \beta_2 Z + u,$$

where $Z = X_2 + 0.1 X_3$.

3. Dropping a variable: when two regressors are highly related
(for example, GDP and GNP), the individual coefficients come
out insignificant even though the regression as a whole is
significant; dropping one of them re-specifies the model.
Other examples: CPI <=> WPI; CD rate <=> TB rate; M2 <=> M3.

4(i). First-difference form (the intercept drops out on differencing):

$$\Delta Y = \beta_2 \Delta X_2 + \beta_3 \Delta X_3 + u'$$

4(ii). Ratio transformation:

$$\frac{Y}{X_3} = \beta_1\left(\frac{1}{X_3}\right) + \beta_2\frac{X_2}{X_3} + \beta_3 + u'$$

6. Polynomial regression: in a model such as $Y = \beta_1 + \beta_2 X + \beta_3 X^2 + u'$,
the regressors $X$ and $X^2$ tend to be highly collinear; expressing
$X$ in deviation (mean-centred) form reduces the collinearity.
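For example, remedy 1 (a priori information) can be sketched as follows, with invented data in which the restriction $\beta_3 = 0.1\beta_2$ holds by construction; regressing on $Z = X_2 + 0.1X_3$ recovers $\beta_2$ even though $X_2$ and $X_3$ are nearly collinear:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 60
x2 = rng.normal(size=n)
x3 = x2 + 0.05 * rng.normal(size=n)            # nearly collinear with x2
u = rng.normal(size=n)
y = 1.0 + 2.0 * x2 + 0.2 * x3 + u              # true beta3 = 0.1 * beta2

# Impose the a priori restriction beta3 = 0.1*beta2: regress y on Z
Z = x2 + 0.1 * x3
Xr = np.column_stack([np.ones(n), Z])
b = np.linalg.solve(Xr.T @ Xr, Xr.T @ y)
print("beta2_hat =", round(b[1], 3))           # ~2.0; beta3_hat = 0.1*b[1]
```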
Heteroscedasticity
One of the assumptions of the CNLRM was that the errors
are homoscedastic (equally spread):

$$\mathrm{Var}(u_i) = E(u_i^2) = \sigma^2$$
Homoscedasticity Case
The probability density functions for $Y_i$ at different levels of
family income, $X_i$ (e.g., $X = 80, 90, 100$), are identical:
$\mathrm{Var}(u_i) = E(u_i^2) = \sigma^2$.

[Figure: homoscedastic pattern of errors — in the scatter of $y_i$
against $x_i$, the points spread out quite equally.]
Heteroscedasticity Case
The variance of $Y_i$ increases as family income, $X_i$, increases:
$\mathrm{Var}(u_i) = E(u_i^2) = \sigma_i^2$.

[Figure: probability density functions for $Y_i$ that grow wider
as income rises.]
Heteroscedastic pattern of errors
[Figure: in the scatter of $y_t$ against $x_t$, the points spread
out quite unequally, fanning out as $x_t$ increases.]
Definition of Heteroscedasticity:

$$\mathrm{Var}(u_i) = E(u_i^2) = \sigma_i^2 \neq \sigma^2$$

Two-variable regression: $Y = \beta_1 + \beta_2 X + u_i$.

$$\hat\beta_2 = \frac{\sum x_i y_i}{\sum x_i^2} = \sum k_i Y_i = \sum k_i(\beta_1 + \beta_2 X_i + u_i) = \beta_2 + \sum k_i u_i,$$

where $k_i = x_i / \sum x_i^2$, using $\sum k_i = 0$ and $\sum k_i X_i = 1$. Hence

$$E(\hat\beta_2) = \beta_2 \quad\text{(unbiased)}.$$

If $\sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \cdots$, i.e., homoscedasticity:

$$\mathrm{Var}(\hat\beta_2) = \frac{\sigma^2}{\sum x_i^2}$$

If $\sigma_1^2 \neq \sigma_2^2 \neq \sigma_3^2 \neq \cdots$, i.e., heteroscedasticity:

$$\mathrm{Var}(\hat\beta_2) = \frac{\sum x_i^2 \sigma_i^2}{\left(\sum x_i^2\right)^2}$$
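A small Monte Carlo sketch (illustrative design, $\sigma_i = 0.5X_i$ assumed) confirms the heteroscedastic variance formula: the simulated variance of $\hat\beta_2$ matches $\sum x_i^2\sigma_i^2 / (\sum x_i^2)^2$.

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps = 50, 20_000
X = np.linspace(1, 10, n)
x = X - X.mean()                                # deviations from the mean
sig = 0.5 * X                                   # sigma_i grows with X
b2_hats = np.empty(reps)
for r in range(reps):
    u = sig * rng.normal(size=n)                # heteroscedastic errors
    y = 1.0 + 2.0 * X + u
    b2_hats[r] = (x @ (y - y.mean())) / (x @ x)

theory = (x**2 @ sig**2) / (x @ x) ** 2         # sum(x^2 sig^2)/(sum x^2)^2
print("simulated Var:", b2_hats.var().round(5))
print("formula   Var:", theory.round(5))
```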
Consequences of heteroscedasticity
1. The OLS estimators are still linear and unbiased.
2. The $\mathrm{Var}(\hat\beta_i)$'s are no longer minimum
=> not the best => not efficient => not BLUE.
3. In the two-variable case ($Y = \beta_0 + \beta_1 X + u$),
$\mathrm{Var}(\hat\beta_2) = \dfrac{\sum x_i^2 \sigma_i^2}{\left(\sum x_i^2\right)^2}$ instead of $\mathrm{Var}(\hat\beta_2) = \dfrac{\sigma^2}{\sum x_i^2}$.
4. $\hat\sigma^2 = \dfrac{\sum \hat u_i^2}{n-k}$ is biased: $E(\hat\sigma^2) \neq \sigma^2$,
so $\mathrm{SEE} = \hat\sigma$ and $\mathrm{RSS} = \sum \hat u^2$ are unreliable as well.
5. The t and F statistics are unreliable.
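To illustrate consequences 3-5 numerically, this sketch (invented data) contrasts the conventional OLS standard errors with White's HC0 heteroscedasticity-consistent ones; the White correction is a standard remedy from the literature and is named here as a comparison, not as part of the slide's derivation.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200
X1 = np.linspace(1, 10, n)
X = np.column_stack([np.ones(n), X1])
u = (0.5 * X1) * rng.normal(size=n)             # variance grows with X1
y = 1.0 + 2.0 * X1 + u

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
e = y - X @ b

# Conventional OLS standard errors (assume constant sigma^2): unreliable here
s2 = e @ e / (n - 2)
se_ols = np.sqrt(np.diag(s2 * XtX_inv))

# White (HC0) heteroscedasticity-consistent standard errors:
# (X'X)^{-1} X' diag(e_i^2) X (X'X)^{-1}
meat = X.T @ (X * (e**2)[:, None])
se_hc0 = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))
print("OLS se :", se_ols.round(4))
print("HC0 se :", se_hc0.round(4))
```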
Detection of heteroscedasticity
1. Graphical method:
Plot the estimated residuals ($\hat u_i$), or their squares ($\hat u_i^2$),
against the predicted dependent variable ($\hat Y_i$) or against any
independent variable ($X_i$).

Observe whether the graph shows a systematic pattern. For
example, if $\hat u^2$ rises steadily with $\hat Y$: yes,
heteroscedasticity exists.
Detection of heteroscedasticity: Graphical method
[Figure: six scatter plots of $\hat u^2$ against $\hat Y$ — one shows
no systematic pattern (no heteroscedasticity), while the other five
show systematic patterns (yes, heteroscedasticity exists).]
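A minimal matplotlib sketch of the graphical method, with simulated data whose error spread grows with $X$ (so the plot should fan out):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)
n = 150
X1 = np.linspace(1, 10, n)
X = np.column_stack([np.ones(n), X1])
y = 1.0 + 2.0 * X1 + (0.5 * X1) * rng.normal(size=n)   # spread grows with X

b = np.linalg.lstsq(X, y, rcond=None)[0]
y_hat = X @ b
u_hat_sq = (y - y_hat) ** 2

plt.scatter(y_hat, u_hat_sq, s=12)
plt.xlabel(r"$\hat{Y}$")
plt.ylabel(r"$\hat{u}^2$")
plt.title("Squared residuals vs fitted values")  # fan shape => heteroscedasticity
plt.show()
```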
