Assumption 6:
No Perfect Multicollinearity: the explanatory variables are not perfectly correlated
with each other. If one variable is a (perfect) linear combination of the other variables,
then we have fewer linearly independent variables than unknown parameters, and we
cannot find a unique solution.
The following example is a case where X_2 is a perfect linear combination of X_3 and X_4:

X_2 = 3X_3 − 4X_4

Y_i = β_0 + β_1 X_1 + β_2 X_2 + β_3 X_3 + β_4 X_4
    = β_0 + β_1 X_1 + β_2 (3X_3 − 4X_4) + β_3 X_3 + β_4 X_4
    = β_0 + β_1 X_1 + (β_3 + 3β_2) X_3 + (β_4 − 4β_2) X_4
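This rank deficiency can be checked directly. A minimal numpy sketch (synthetic data, purely illustrative): when X_2 = 3X_3 − 4X_4, the design matrix has five columns but only four linearly independent ones, so X'X is singular and the normal equations have no unique solution.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x3 = rng.normal(size=n)
x4 = rng.normal(size=n)
x2 = 3 * x3 - 4 * x4  # perfect linear combination of x3 and x4

# Design matrix with intercept: columns [1, X1, X2, X3, X4]
X = np.column_stack([np.ones(n), x1, x2, x3, x4])

# Five columns, but only four are linearly independent
print(np.linalg.matrix_rank(X))                     # 4
print(np.linalg.matrix_rank(X.T @ X) < X.shape[1])  # True: X'X is singular
```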
Question:
If one variable is a perfect nonlinear function of another, for instance X_2 = X_1²,
what would be the effect of including both variables in a regression?

Consider an example of tobacco consumption, where we include age and age² after
controlling for other variables. If the specification is linear in the variables, including
both causes no problem, because age² is not a linear function of age. However, if age
and age² are entered in log form, then log(age²) = 2·log(age): the two regressors are
perfectly collinear and the regression cannot be estimated.
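The contrast between the two specifications can be seen from the rank of the design matrix. An illustrative numpy sketch (synthetic ages, hypothetical ranges):

```python
import numpy as np

rng = np.random.default_rng(1)
age = rng.uniform(20, 70, size=200)

# In levels: age and age**2 are NOT linearly related, so the matrix has full rank
X_linear = np.column_stack([np.ones_like(age), age, age**2])
print(np.linalg.matrix_rank(X_linear))  # 3 -- estimable

# In logs: log(age**2) = 2*log(age), a perfect linear combination
X_log = np.column_stack([np.ones_like(age), np.log(age), np.log(age**2)])
print(np.linalg.matrix_rank(X_log))     # 2 -- rank deficient, not estimable
```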
Consider the model

Y_i = β_0 + β_1 X_1 + β_2 X_2 + β_3 X_3

Claim 1:
1. Run the regression of X_1 on all the other Xs and obtain the residuals:
   X_1 = α_0 + α_2 X_2 + α_3 X_3
   X_1 = X̂_1 + v̂_1
   This step cleans the effects of the other Xs out of X_1.
2. Run the simple regression of Y on v̂_1. The resulting slope estimate equals the
   estimate of β_1 from the full regression.
      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  2,    71) =   13.35
       Model |   173587098     2  86793549.2           Prob > F      =  0.0000
    Residual |   461478298    71  6499694.33           R-squared     =  0.2733
-------------+------------------------------           Adj R-squared =  0.2529
       Total |   635065396    73  8699525.97           Root MSE      =  2549.4
      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  1,    72) =   71.41
       Model |  306570.157     1  306570.157           Prob > F      =  0.0000
    Residual |  309117.303    72  4293.29587           R-squared     =  0.4979
-------------+------------------------------           Adj R-squared =  0.4910
       Total |  615687.459    73  8434.07479           Root MSE      =  65.523
. predict displacement2
(option xb assumed; fitted values)
. gen v2=displacement-displacement2
. reg price v2
      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  1,    72) =    4.09
       Model |  34137625.8     1  34137625.8           Prob > F      =  0.0468
    Residual |   600927770    72  8346219.03           R-squared     =  0.0538
-------------+------------------------------           Adj R-squared =  0.0406
       Total |   635065396    73  8699525.97           Root MSE      =    2889
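The partialling-out steps of Claim 1 can be sketched with numpy on illustrative synthetic data (not the Stata auto dataset shown above): the slope from regressing Y on the residualized X_1 coincides with β̂_1 from the full multiple regression.

```python
import numpy as np

def ols(y, X):
    """OLS coefficients via least squares."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(2)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n) + 0.5 * x1  # correlated with x1
x3 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + 0.5 * x3 + rng.normal(size=n)
ones = np.ones(n)

# Full multiple regression: y on (1, x1, x2, x3)
b1_full = ols(y, np.column_stack([ones, x1, x2, x3]))[1]

# Step 1: regress x1 on the other Xs, keep the residuals v1
Z = np.column_stack([ones, x2, x3])
v1 = x1 - Z @ ols(x1, Z)

# Step 2: simple regression of y on v1 (and a constant)
b1_partial = ols(y, np.column_stack([ones, v1]))[1]

print(b1_full, b1_partial)  # identical up to floating-point error
```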
Claim 2:
1. Run the regression of Y on all the other Xs and obtain the fitted values Ŷ and
   residuals ŵ. Together with step 1 of Claim 1, this cleans both Y and X_1 of the
   effect of the other Xs.
2. Run the simple linear regression of ŵ on v̂_1. The resulting slope estimate again
   equals the estimate of β_1 from the full regression.
. reg price mpg
      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  1,    72) =   20.26
       Model |   139449474     1   139449474           Prob > F      =  0.0000
    Residual |   495615923    72  6883554.48           R-squared     =  0.2196
-------------+------------------------------           Adj R-squared =  0.2087
       Total |   635065396    73  8699525.97           Root MSE      =  2623.7

       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -238.8943   53.07669    -4.50   0.000    -344.7008   -133.0879
       _cons |   11253.06   1170.813     9.61   0.000     8919.088    13587.03
. predict price2
(option xb assumed; fitted values)
. gen w2=price-price2
. reg w2 v2
      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  1,    72) =    5.33
       Model |  34137628.5     1  34137628.5           Prob > F      =  0.0239
    Residual |   461478287    72  6409420.65           R-squared     =  0.0689
-------------+------------------------------           Adj R-squared =  0.0559
       Total |   495615915    73  6789259.11           Root MSE      =  2531.7
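Claim 2 can likewise be verified numerically. An illustrative numpy sketch on synthetic data (again not the Stata output above): residualize both Y and X_1 on the other Xs, then regress the one set of residuals on the other.

```python
import numpy as np

def ols(y, X):
    """OLS coefficients via least squares."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(3)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n) + 0.5 * x1
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)
ones = np.ones(n)
Z = np.column_stack([ones, x2])  # "all other Xs"

b1_full = ols(y, np.column_stack([ones, x1, x2]))[1]

# Clean x1 of the other Xs ...
v1 = x1 - Z @ ols(x1, Z)
# ... and clean y of the other Xs as well
w = y - Z @ ols(y, Z)

# Simple regression of w on v1 recovers the same slope coefficient
b1_fwl = ols(w, np.column_stack([ones, v1]))[1]
print(b1_full, b1_fwl)  # identical up to floating-point error
```

This is the Frisch-Waugh-Lovell result behind both claims.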
Recall that

R² = Σ(Ŷ_i − Ȳ)² / Σ(Y_i − Ȳ)²  =  1 − Σ(Y_i − Ŷ_i)² / Σ(Y_i − Ȳ)²

When we add information to the model, Σ(Y_i − Ŷ_i)² cannot increase, so R² is a
non-decreasing function of the number of regressors.
This means that even if the variable we add to the equation is irrelevant (uncorrelated
with the dependent variable), the R-squared will not decrease. In the multiple
regression case, the standard R-squared is therefore not an appropriate measure of the
explanatory power of our model. R-squared must be adjusted in the multiple regression,
i.e., it must trade off the benefit of adding new information (an increased R-squared)
against the reduction in degrees of freedom:
R̄² = 1 − [ Σ(Y_i − Ŷ_i)² / (n − K) ] / [ Σ(Y_i − Ȳ)² / (n − 1) ]
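Both measures can be computed in a few lines. A minimal numpy sketch on synthetic data (the regressor named `junk` is a deliberately irrelevant variable): adding it can never lower R², while the adjusted R̄² penalizes the lost degree of freedom.

```python
import numpy as np

def r2_stats(y, X):
    """Return (R-squared, adjusted R-squared); X includes the constant, K = X.shape[1]."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    ssr = ((y - X @ beta) ** 2).sum()       # sum of squared residuals
    sst = ((y - y.mean()) ** 2).sum()       # total sum of squares
    n, k = X.shape
    return 1 - ssr / sst, 1 - (ssr / (n - k)) / (sst / (n - 1))

rng = np.random.default_rng(4)
n = 100
x1 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + rng.normal(size=n)
junk = rng.normal(size=n)                   # irrelevant regressor
ones = np.ones(n)

r2_a, adj_a = r2_stats(y, np.column_stack([ones, x1]))
r2_b, adj_b = r2_stats(y, np.column_stack([ones, x1, junk]))

print(r2_b >= r2_a)   # True: R-squared never falls when a regressor is added
print(adj_a, adj_b)   # adjusted R-squared can fall
```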
Dummy Variable
Consider the case where we want to test for gender discrimination in wages:

wage_i = β_0 + β_1 Gender_i + β_2 edu_i + β_3 exp_i + ε_i

Gender_i = 1 if man, 0 if woman

(Draw a graph with a scatter plot of wages for males and females)
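To build intuition for the dummy coefficient, here is an illustrative numpy simulation with the gender dummy alone (no controls, synthetic wages): in a simple regression on a dummy, the slope equals exactly the difference in group means.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200
gender = rng.integers(0, 2, size=n).astype(float)  # 1 = man, 0 = woman
wage = 10.0 + 3.0 * gender + rng.normal(size=n)    # synthetic wage data

# Simple regression of wage on the dummy (plus a constant)
X = np.column_stack([np.ones(n), gender])
b0, b1 = np.linalg.lstsq(X, wage, rcond=None)[0]

# The dummy coefficient equals the male-female gap in mean wages
gap = wage[gender == 1].mean() - wage[gender == 0].mean()
print(b1, gap)  # identical up to floating-point error
```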
Interaction Effects
wage_i = β_0 + β_1 Gender_i + β_2 edu_i + β_3 exp_i + β_4 (Gender_i × exp_i) + ε_i
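With the interaction term, the return to experience is β_3 for women and β_3 + β_4 for men. A numpy sketch with noise-free synthetic data (hypothetical coefficient values, chosen so OLS recovers them exactly):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 300
gender = rng.integers(0, 2, size=n).astype(float)  # 1 = man, 0 = woman
edu = rng.uniform(8, 20, size=n)
exper = rng.uniform(0, 30, size=n)

# Noise-free construction so the coefficients are recovered exactly
b = np.array([2.0, 1.5, 0.8, 0.3, 0.2])  # beta0 .. beta4 (hypothetical)
wage = b[0] + b[1]*gender + b[2]*edu + b[3]*exper + b[4]*gender*exper

X = np.column_stack([np.ones(n), gender, edu, exper, gender * exper])
beta = np.linalg.lstsq(X, wage, rcond=None)[0]

# Return to experience differs by gender:
#   women: beta3        men: beta3 + beta4
print(beta[3], beta[3] + beta[4])  # ~0.3 and ~0.5
```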