
Assumptions of OLS (Continued)

Assumption 6:
No Perfect Multicollinearity: The explanatory variables are not perfectly correlated
with each other. If one variable is a (perfect) linear combination of the others, then
we have fewer linearly independent variables than unknown parameters, and we
cannot find a unique solution for the OLS estimates.

The following example is a case where X2 is a perfect linear combination of X3 and X4:

X2 = 3X3 − 4X4

Yi = β0 + β1X1 + β2X2 + β3X3 + β4X4
   = β0 + β1X1 + β2(3X3 − 4X4) + β3X3 + β4X4
   = β0 + β1X1 + (β3 + 3β2)X3 + (β4 − 4β2)X4

Only the combinations (β3 + 3β2) and (β4 − 4β2) appear in the equation, so β2, β3, and β4
cannot be estimated separately.
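As a quick illustration in Stata (a minimal sketch using Stata's built-in auto data, which the
output below also appears to use; the generated regressor x2 is hypothetical), a regressor that
is an exact linear combination of other regressors is detected and dropped automatically:

. sysuse auto, clear
. gen x2 = 3*weight - 4*length
. * x2 is an exact linear combination of weight and length, so Stata
. * omits one of the collinear regressors rather than estimate all three
. reg price x2 weight length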

Question:
If one variable is a perfect nonlinear function of another, for instance X2 = X1², what is the
effect of including both variables in a regression?

Consider an example of tobacco consumption, where we include age and age² after
controlling for other variables. With a linear (levels) specification there is no problem in
including both variables, because age² is not a linear function of age. However, if age and
age² are entered in log form, the regression cannot be estimated, because ln(age²) = 2·ln(age)
is an exact linear combination of ln(age).
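A minimal Stata sketch of this point, assuming a hypothetical dataset with variables smoke
(tobacco consumption) and age; the variable names are illustrative only:

. * levels: age2 is a nonlinear function of age, so both can be included
. gen age2 = age^2
. reg smoke age age2

. * logs: ln(age^2) = 2*ln(age) is an exact linear function of ln(age),
. * so one of the two regressors is dropped for perfect collinearity
. gen lnage = ln(age)
. gen lnage2 = ln(age^2)
. reg smoke lnage lnage2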

One thing to note:


Maddala (2001): high intercorrelation among the explanatory variables is neither
necessary nor sufficient to cause a multicollinearity problem. In practice,
multicollinearity is sensitive to the addition or deletion of observations.

(Explain the meaning of the quadratic specification in the tobacco consumption model.)

Residual Interpretation of Multiple Regression Estimates


Consider the following regression

Yi = β0 + β1X1 + β2X2 + β3X3

β1 is interpreted as the effect of X1 on Y after controlling for the other variables; in other
words, it is the effect of X1 on Y when the other explanatory variables remain unchanged.
In most observational studies, however, the other variables rarely stay fixed. In what sense,
then, can we show that β1 is the effect of X1 on Y holding the other variables constant?

Claim 1:
1. Run the regression of X1 on all the other Xs and obtain the residuals:

   X1 = α0 + α2X2 + α3X3
   X1 = X̂1 + v̂1

   This step cleans the effects of the other Xs out of X1.
2. Run the simple regression of Y on v̂1. The resulting slope coefficient is identical to the
   estimate of β1 from the multiple regression.

The Stata output below illustrates this with price regressed on displacement and mpg: the
coefficient on the residual v2 (10.50885) equals the coefficient on displacement in the
multiple regression.

. reg price displacement mpg

      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  2,    71) =   13.35
       Model |   173587098     2  86793549.2           Prob > F      =  0.0000
    Residual |   461478298    71  6499694.33           R-squared     =  0.2733
-------------+------------------------------           Adj R-squared =  0.2529
       Total |   635065396    73  8699525.97           Root MSE      =  2549.4

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
displacement |   10.50885    4.58548     2.29   0.025     1.365658    19.65203
         mpg |  -121.1833   72.78844    -1.66   0.100    -266.3193    23.95276
       _cons |   6672.766    2299.72     2.90   0.005     2087.254    11258.28
------------------------------------------------------------------------------

. reg displacement mpg

      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  1,    72) =   71.41
       Model |  306570.157     1  306570.157           Prob > F      =  0.0000
    Residual |  309117.303    72  4293.29587           R-squared     =  0.4979
-------------+------------------------------           Adj R-squared =  0.4910
       Total |  615687.459    73  8434.07479           Root MSE      =  65.523

------------------------------------------------------------------------------
displacement |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -11.20114    1.32554    -8.45   0.000    -13.84356   -8.558728
       _cons |   435.8514   29.23994    14.91   0.000     377.5626    494.1401
------------------------------------------------------------------------------

. predict displacement2
(option xb assumed; fitted values)

. gen v2=displacement-displacement2

. reg price v2

      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  1,    72) =    4.09
       Model |  34137625.8     1  34137625.8           Prob > F      =  0.0468
    Residual |   600927770    72  8346219.03           R-squared     =  0.0538
-------------+------------------------------           Adj R-squared =  0.0406
       Total |   635065396    73  8699525.97           Root MSE      =    2889

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          v2 |   10.50885   5.196168     2.02   0.047     .1504726    20.86722
       _cons |   6165.257   335.8374    18.36   0.000     5495.777    6834.736
------------------------------------------------------------------------------

Claim 2:
1. Run the regression of Y on all the other Xs (excluding X1) and obtain the fitted values Ŷ
   and the residuals ŵ. Together with step 1 of Claim 1, this cleans both Y and X1 of the
   effects of the other Xs.
2. Run the simple regression of ŵ on v̂1. The resulting slope coefficient is again the
   multiple-regression estimate of β1.

In the output below, price is first regressed on mpg to obtain the residual w2; regressing w2
on v2 then gives the same slope, 10.50885.
. reg price mpg

      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  1,    72) =   20.26
       Model |   139449474     1   139449474           Prob > F      =  0.0000
    Residual |   495615923    72  6883554.48           R-squared     =  0.2196
-------------+------------------------------           Adj R-squared =  0.2087
       Total |   635065396    73  8699525.97           Root MSE      =  2623.7

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -238.8943   53.07669    -4.50   0.000    -344.7008   -133.0879
       _cons |   11253.06   1170.813     9.61   0.000     8919.088    13587.03
------------------------------------------------------------------------------

. predict price2
(option xb assumed; fitted values)

. gen w2=price-price2

. reg w2 v2

      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  1,    72) =    5.33
       Model |  34137628.5     1  34137628.5           Prob > F      =  0.0239
    Residual |   461478287    72  6409420.65           R-squared     =  0.0689
-------------+------------------------------           Adj R-squared =  0.0559
       Total |   495615915    73  6789259.11           Root MSE      =  2531.7

------------------------------------------------------------------------------
          w2 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          v2 |   10.50885   4.553525     2.31   0.024     1.431559    19.58613
       _cons |   .0000148   294.3022     0.00   1.000    -586.6807    586.6808
------------------------------------------------------------------------------

R-Squared in Multiple Regression

Recall that

R² = Σ(Ŷi − Ȳ)² / Σ(Yi − Ȳ)² = 1 − Σ(Yi − Ŷi)² / Σ(Yi − Ȳ)²

When we add a variable to the model, the residual sum of squares Σ(Yi − Ŷi)² cannot increase,
so the ratio Σ(Yi − Ŷi)² / Σ(Yi − Ȳ)² is non-increasing and R² is a non-decreasing function of
the number of regressors.

This means that even if the variable we add to the equation is irrelevant or uncorrelated with
the dependent variable, R-squared will not decrease. In the multiple regression case, the
ordinary R-squared is therefore not an appropriate measure of the explanatory power of the
model. R-squared must be adjusted, i.e., we must weigh the benefit of adding new information
(a higher R-squared) against the reduction in degrees of freedom:
Adjusted R² = 1 − [ Σ(Yi − Ŷi)² / (n − K) ] / [ Σ(Yi − Ȳ)² / (n − 1) ]

where n is the number of observations and K is the number of estimated coefficients
(including the intercept).
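A small Stata sketch of both points, reusing the auto data that the output above appears to be
based on, with a generated pure-noise regressor (noise is a hypothetical variable): adding an
irrelevant variable cannot lower R-squared, while the adjusted R-squared penalizes the lost
degree of freedom. The last line recomputes the adjusted R-squared by hand from the stored
results.

. sysuse auto, clear
. reg price displacement mpg
. set seed 12345
. gen noise = runiform()
. * noise is unrelated to price, yet R-squared does not fall;
. * adjusted R-squared can fall because a degree of freedom is lost
. reg price displacement mpg noise
. display e(r2_a)
. display 1 - (e(rss)/e(df_r)) / ((e(mss) + e(rss))/(e(N) - 1))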
Dummy Variable
Consider the case where we want to test whether there is gender discrimination in wages:

wagei = β0 + β1Genderi + β2edui + β3expi + εi

Gender = 1 if man
       = 0 if woman

Intercept for men:    β0 + β1(1) = β0 + β1
Intercept for women:  β0 + β1(0) = β0

(Draw a scatter plot of wages for men and women.)
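In Stata, with a hypothetical wage dataset containing wage, gender (1 = man, 0 = woman),
edu, and exp, this model is a single regression with the dummy included as a regressor
(a sketch only, not output from real data):

. * the coefficient on gender is the shift in the intercept for men
. reg wage gender edu exp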

Interaction Effects
wagei = β0 + β1Genderi + β2edui + β3expi + β4Genderi·expi + εi

Slope coefficient on experience for men:    β3 + β4(1) = β3 + β4
Slope coefficient on experience for women:  β3 + β4(0) = β3
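Continuing the same hypothetical wage data, the interaction term can be constructed by hand
before running the regression (gender_exp is an illustrative generated variable):

. gen gender_exp = gender*exp
. * the coefficient on exp is the experience slope for women (gender = 0);
. * the slope for men adds the coefficient on gender_exp
. reg wage gender edu exp gender_exp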

Dummy variables for more than two groups


The rule of thumb: the number of dummy variables = number of groups − 1.
For instance, if we measure education by the highest completed degree instead of years of
education, we have the following categories: (1) no schooling, (2) high school, (3) college,
and (4) graduate. In this case we use three dummy variables, with no schooling as the
omitted base category.

Educ1 = 1 if the highest degree is high school
      = 0 otherwise
Educ2 = 1 if the highest degree is college
      = 0 otherwise
Educ3 = 1 if the highest degree is graduate
      = 0 otherwise
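A sketch of how these dummies could be created in Stata, assuming a hypothetical
categorical variable degree coded 1 = no schooling, 2 = high school, 3 = college,
4 = graduate:

. gen educ1 = (degree == 2)
. gen educ2 = (degree == 3)
. gen educ3 = (degree == 4)
. * no schooling (degree == 1) is the omitted base category
. reg wage educ1 educ2 educ3 exp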
