Multiple Linear Regression II

EC114 Introduction to Quantitative Economics

18. Multiple Linear Regression II
Marcus Chambers
Department of Economics
University of Essex
6/8 March 2012
Outline
1. Multicollinearity
2. Inference: F-tests
Reference: R. L. Thomas, Using Statistics in Economics,
McGraw-Hill, 2005, sections 13.3 and 13.4.
Multicollinearity
Recall Assumption ID of the Classical multiple regression
model:
ID (no collinearity): There exist no exact linear relationships
between the sample values of any two
or more of the explanatory variables.
If this assumption is not satisfied, then the OLS estimation
procedure itself will break down.
To see why this is the case, suppose we wish to estimate a
population regression equation with just two explanatory
variables,
$$E(Y) = \beta_1 + \beta_2 X_2 + \beta_3 X_3.$$
Furthermore, suppose that estimation is attempted using
sample data in which the linear relationship
$$X_{2i} = 5 + 2X_{3i}$$
holds exactly for every sample observation.
If this is true then Assumption ID has clearly been violated.
Now let us assume, for the moment, that an OLS sample
regression equation were to exist that minimizes the sum
of squared residuals, $SSR = \sum e_i^2$, and has the form
$$\hat{Y} = 6 + 12X_2 + 8X_3.$$
It may appear that the numbers 6, 12 and 8 must represent
the OLS estimates of $\beta_1$, $\beta_2$ and $\beta_3$ respectively.
But there is a problem. . .
We can write the sample regression as
$$\hat{Y} = 6 + 2X_2 + 10X_2 + 8X_3,$$
where we have used $12X_2 = 2X_2 + 10X_2$.
Since, for all sample observations, we have $X_2 = 5 + 2X_3$,
this can be written as
$$\hat{Y} = 6 + 2X_2 + 10(5 + 2X_3) + 8X_3 = 56 + 2X_2 + 28X_3.$$
We therefore have two different representations of the
sample regression line, with apparently different estimated
coefficients on $X_2$ and $X_3$, derived from the same minimum
of SSR.
This suggests that when regressors are collinear then
there is not a unique minimum to the sum of squared
residuals, SSR.
Hence there is no unique solution to our OLS minimisation
problem and no unique set of OLS estimators.
Indeed, given that $X_{2i} = 5 + 2X_{3i}$ holds for all sample
observations, it is easy to construct any number of sample
regressions all giving the same minimum for $\sum e_i^2$.
For example, $12X_2 = 6X_2 + 6X_2$ and so
$$\hat{Y} = 6 + 6X_2 + 6(5 + 2X_3) + 8X_3 = 36 + 6X_2 + 20X_3.$$
In fact, there must be an infinite number of sets of values
for the estimated $b_j$ all giving the same minimum SSR.
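The non-uniqueness can be checked numerically. The sketch below is a minimal illustration, assuming a small made-up sample (the values of `x3` and `y` are hypothetical, not from the lecture) in which $X_{2i} = 5 + 2X_{3i}$ holds exactly. It evaluates the SSR for the three coefficient sets 6, 12, 8; 56, 2, 28; and 36, 6, 20. These sets are not claimed to be the minimizers for the made-up `y`; the point is only that SSR cannot tell them apart.

```python
# Hypothetical sample in which X2 = 5 + 2*X3 holds exactly (perfect collinearity).
x3 = [1, 2, 3, 4, 5, 6]
x2 = [5 + 2 * v for v in x3]
# Arbitrary illustrative values of Y (an assumption, not lecture data).
y = [97, 131, 160, 195, 228, 257]

# Three coefficient sets (b1, b2, b3) that differ only by re-using X2 = 5 + 2*X3.
candidates = [(6, 12, 8), (56, 2, 28), (36, 6, 20)]


def ssr(b1, b2, b3):
    """Sum of squared residuals for the line b1 + b2*X2 + b3*X3."""
    return sum(
        (yi - (b1 + b2 * x2i + b3 * x3i)) ** 2
        for yi, x2i, x3i in zip(y, x2, x3)
    )


ssr_values = [ssr(*b) for b in candidates]
for b, s in zip(candidates, ssr_values):
    print(b, s)  # every coefficient set gives the same SSR
```

All three sets produce identical fitted values, $66 + 32X_3$, so no criterion based on the residuals can choose between them.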
When an exact linear relationship between explanatory
variables exists, the situation is referred to as one of
complete or perfect multicollinearity.
It implies a situation of complete uncertainty regarding the
values of $\beta_1$, $\beta_2$ and $\beta_3$ in the population regression.
We can have no idea whether to estimate them by, for
example, 6, 12 and 8, or 56, 2 and 28, or 36, 6 and 20.
Such complete multicollinearity almost never occurs in
practice (unless you construct new variables by combining
existing ones!)
But sometimes, particularly with time-series, there can be
an approximate linear relationship (rather than an exact
one) between two or more of the explanatory variables in a
multiple regression equation.
This situation can cause serious estimation problems, and
the closer the approximation is to a linear relationship, the
more serious these problems tend to become.
Suppose that when estimating our model we find that the
relationship $X_{2i} = 5 + 2X_{3i}$ holds not exactly but merely
approximately for all sample observations.
In this case there will be a unique set of estimates $b_1$, $b_2$
and $b_3$ that minimize $\sum e_i^2$.
That is, the OLS method does not break down completely
and there will be a unique sample regression equation.
But the problem now is that there are many other sets of
values for $b_1$, $b_2$ and $b_3$ (i.e., many sample regressions), all
with residual sums of squares not equal to, but very close
to, the minimum $\sum e_i^2$ yielded by the OLS regression
equation.
In such a situation, we will lack confidence in, and be
uncertain about, the precision of the OLS estimates
obtained because there are so many other sets of values
for $b_1$, $b_2$ and $b_3$ which are almost as good.
In fact, the closer the approximation is to a linear
relationship between the regressors, the higher is the
degree of the multicollinearity present and, other things
being equal, the greater is the extent of our uncertainty
about the true population values $\beta_1$, $\beta_2$ and $\beta_3$.
Thus, while complete multicollinearity results in complete
uncertainty about population values, a high degree of
multicollinearity leads to a high degree of uncertainty about
population values.
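A rough numerical illustration of this point follows. It relies on the standard two-regressor result $Var(b_2) = \sigma^2 / [\sum x_{2i}^2 (1 - r_{23}^2)]$, where lower-case $x$ denotes deviations from sample means and $r_{23}$ is the sample correlation between $X_2$ and $X_3$; that formula is brought in here as an assumption and is not stated on the slides. The data, the `noise` values and the `scale` parameter are all hypothetical.

```python
# Hypothetical data: X3 takes the values 1..10 and X2 = 5 + 2*X3 + scale*noise,
# so a smaller `scale` means a closer approximation to exact collinearity.
x3 = [float(v) for v in range(1, 11)]
noise = [0.3, -0.5, 0.2, 0.4, -0.1, -0.6, 0.5, 0.1, -0.4, 0.1]


def variance_factor(scale):
    """Return Var(b2)/sigma^2 = 1 / (sum(x2^2) * (1 - r23^2)),
    where lower-case x denotes deviations from the sample mean."""
    x2 = [5 + 2 * a + scale * e for a, e in zip(x3, noise)]
    m2 = sum(x2) / len(x2)
    m3 = sum(x3) / len(x3)
    s22 = sum((a - m2) ** 2 for a in x2)
    s33 = sum((b - m3) ** 2 for b in x3)
    s23 = sum((a - m2) * (b - m3) for a, b in zip(x2, x3))
    r_squared = s23 ** 2 / (s22 * s33)
    return 1.0 / (s22 * (1.0 - r_squared))


# The variance factor grows as the relationship gets closer to exact.
for scale in (10.0, 1.0, 0.1):
    print(scale, variance_factor(scale))
```

As `scale` shrinks, $r_{23}^2$ approaches one and the variance of $b_2$ grows without bound, which is the numerical counterpart of the increasing uncertainty described above.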
Example. To illustrate what can happen under collinearity
let's return to the money demand data set on 30 countries
in 1985.
We have already considered the regression
$$Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \epsilon$$
where
Y: money stock;
$X_2$: GDP;
$X_3$: interest rate;
$X_4$: rate of price inflation.
The Stata output for this regression is as follows:
. regress m g ir pi
Source | SS df MS Number of obs = 30
-------------+------------------------------ F( 3, 26) = 30.70
Model | 20.5893701 3 6.86312337 Prob > F = 0.0000
Residual | 5.81286631 26 .223571781 R-squared = 0.7798
-------------+------------------------------ Adj R-squared = 0.7544
Total | 26.4022364 29 .910421946 Root MSE = .47283
------------------------------------------------------------------------------
m | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
g | .1703745 .0189433 8.99 0.000 .1314361 .2093129
ir | -.0001693 .0012483 -0.14 0.893 -.0027353 .0023967
pi | -.002197 .0037733 -0.58 0.565 -.0099531 .0055592
_cons | .0893538 .1388419 0.64 0.525 -.1960399 .3747475
------------------------------------------------------------------------------
Now suppose we construct a new variable, $X_5$, which is the
real rate of interest (i.e. the difference between the nominal
rate of interest and the rate of inflation):
$$X_5 = X_3 - X_4;$$
this holds at all points in the sample.
This is the Stata output:
. regress m g ir pi realir
note: realir omitted because of collinearity
Source | SS df MS Number of obs = 30
-------------+------------------------------ F( 3, 26) = 30.70
Model | 20.5893701 3 6.86312337 Prob > F = 0.0000
Residual | 5.81286631 26 .223571781 R-squared = 0.7798
-------------+------------------------------ Adj R-squared = 0.7544
Total | 26.4022364 29 .910421946 Root MSE = .47283
------------------------------------------------------------------------------
m | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
g | .1703745 .0189433 8.99 0.000 .1314361 .2093129
ir | -.0001693 .0012483 -0.14 0.893 -.0027353 .0023967
pi | -.002197 .0037733 -0.58 0.565 -.0099531 .0055592
realir | (omitted)
_cons | .0893538 .1388419 0.64 0.525 -.1960399 .3747475
------------------------------------------------------------------------------
Stata has realised that the variables are collinear and has
omitted the real interest rate!
The preceding example was a very obvious one in which
a new variable was created from existing variables and the
software was able to spot the collinearity.
In cases where variables show trends over time such
perfect collinearity is not evident but there are likely to be
high correlations between the variables and we might have
a case of approximate collinearity.
In this situation the software will carry out the regression,
but what might be the implications?
Multicollinearity will often result in large standard errors for
the estimated coefficients, reflecting the uncertainty that
we have mentioned above.
The regression $R^2$ may also look reasonable, suggesting
that the regressors have a high degree of explanatory
power.
But beware of mistaking insignificance for multicollinearity:
a large standard error can indicate that a variable is an
insignificant determinant of Y (yielding an insignificant
t-ratio) rather than the regression suffering from
multicollinearity!
There is no formal statistical test for (the absence of)
multicollinearity, but it is something that we must be aware
of when using regression analysis.
Example. Consider the following two OLS regression
results, obtained from two different countries:
$$\hat{Y} = \underset{(3.12)}{0.356} + \underset{(0.431)}{0.765}\,X_2 + \underset{(0.654)}{1.342}\,X_3, \qquad R^2 = 0.854;$$
$$\hat{Y} = \underset{(2.76)}{0.644} + \underset{(0.644)}{1.112}\,X_2 + \underset{(0.482)}{0.943}\,X_3, \qquad R^2 = 0.187.$$
Figures in parentheses are t-ratios.
For each regression result consider the extent to which $X_2$
and $X_3$ really influence Y (the interpretation of the variables
themselves is not important).
Consider the first regression:
$$\hat{Y} = \underset{(3.12)}{0.356} + \underset{(0.431)}{0.765}\,X_2 + \underset{(0.654)}{1.342}\,X_3, \qquad R^2 = 0.854.$$
Here, we observe that the variables $X_2$ and $X_3$ have low
t-ratios and as a consequence it seems they do not have
individual explanatory power.
However, we observe a high $R^2$ of 0.854.
This implies that 85.4% of the total variation in Y can be
attributed to the variation in $X_2$ and $X_3$ and, hence, that the
two variables seem to have explanatory power for Y.
We should suspect a problem of multicollinearity which is
masking the actual explanatory power of $X_2$ and/or $X_3$ as
determinants of Y.
Now consider the second regression:
$$\hat{Y} = \underset{(2.76)}{0.644} + \underset{(0.644)}{1.112}\,X_2 + \underset{(0.482)}{0.943}\,X_3, \qquad R^2 = 0.187.$$
Here, the variables are also not significant (they have low
t-ratios) which means we cannot reject the null hypotheses
of the coefficients being equal to zero, which seems to
suggest that they do not explain the variation in Y.
The $R^2$ is also very low (18.7%), which means that the joint
explanatory power of the two variables is very low.
In this case we do not suspect that the low t-ratios
associated with the variables are a result of
multicollinearity, and we should conclude that the variables
$X_2$ and $X_3$ do not influence Y.
Inference: F-tests
We have seen how to test hypotheses concerning
individual coefficients in the multiple regression model
$$Y = \beta_1 + \beta_2 X_2 + \ldots + \beta_k X_k + \epsilon;$$
we conduct such tests using t-statistics.
For example, to test $H_0: \beta_j = 0$ against $H_A: \beta_j \neq 0$ we use
$$TS = \frac{b_j}{s_{b_j}} \sim t_{n-k} \text{ under } H_0,$$
where $b_j$ is the estimated coefficient and $s_{b_j}$ its standard
error.
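As a quick check of this formula, the short sketch below recomputes the t-ratios from the coefficients and standard errors reported in the Stata output for the money-demand regression (`regress m g ir pi`). The dictionary layout and variable names are illustrative choices, not part of the lecture.

```python
# Coefficients and standard errors copied from the Stata output for
# "regress m g ir pi" shown earlier in these notes.
estimates = {
    "g": (0.1703745, 0.0189433),
    "ir": (-0.0001693, 0.0012483),
    "pi": (-0.002197, 0.0037733),
    "_cons": (0.0893538, 0.1388419),
}

# The t-ratio for H0: beta_j = 0 is the coefficient divided by its standard error.
t_ratios = {name: b / se for name, (b, se) in estimates.items()}

for name, t in t_ratios.items():
    print(f"{name:>6}: t = {t:.2f}")
```

The results agree with the `t` column of the Stata output (8.99, -0.14, -0.58 and 0.64). With $n - k = 26$ degrees of freedom the two-sided 5% critical value is roughly 2.06, so only the coefficient on `g` is individually significant.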
But in Economics we often wish to test hypotheses
concerning more than one parameter. . .
As an example, consider a production function
$$Y = A X_2^{\beta_2} X_3^{\beta_3},$$
where Y denotes output and $X_2$ and $X_3$ are the inputs
(e.g. capital and labour).
Under constant returns to scale we require $\beta_2 + \beta_3 = 1$,
and this is a testable hypothesis.
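To see why the exponents must sum to one, note that scaling both inputs by a factor $\lambda$ scales output by $\lambda^{\beta_2 + \beta_3}$. The sketch below checks this numerically; the parameter values `A`, `beta2`, `beta3` and the input levels are hypothetical and chosen only for illustration.

```python
# Hypothetical Cobb-Douglas parameters satisfying beta2 + beta3 = 1.
A, beta2, beta3 = 1.5, 0.3, 0.7


def output(x2, x3):
    """Production function Y = A * X2^beta2 * X3^beta3."""
    return A * x2 ** beta2 * x3 ** beta3


base = output(4.0, 9.0)
doubled = output(8.0, 18.0)  # both inputs scaled by 2
ratio = doubled / base       # equals 2 ** (beta2 + beta3)
print(ratio)
```

With $\beta_2 + \beta_3 = 1$, doubling both inputs doubles output; any other sum would give increasing or decreasing returns to scale.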
Taking logarithms and adding a disturbance our regression
model would be
$$\ln(Y) = \beta_1 + \beta_2 \ln(X_2) + \beta_3 \ln(X_3) + \epsilon,$$
where $\beta_1 = \ln(A)$; we would then want to test
$$H_0: \beta_2 + \beta_3 = 1 \text{ against } H_A: \beta_2 + \beta_3 \neq 1.$$
We shall focus here on:
(i) tests of overall significance; and
(ii) variable addition/deletion tests.
Tests of overall significance are concerned with the joint
significance of all the variables in the regression (excluding
the constant).
In the multiple regression model
$$Y = \beta_1 + \beta_2 X_2 + \ldots + \beta_k X_k + \epsilon,$$
a test of overall significance is a test of
$$H_0: \beta_2 = \beta_3 = \ldots = \beta_k = 0$$
against $H_A$: at least one of $\beta_2, \ldots, \beta_k$ is non-zero.
The test is one of whether $X_2, \ldots, X_k$ are jointly significant
determinants of Y.
Note that, under $H_0$, $Y = \beta_1 + \epsilon$, because all the other $\beta$s
are zero.
The test itself is based around the decomposition of the
total sum of squares, $SST = \sum y_i^2 = \sum (Y_i - \bar{Y})^2$, into the
explained sum of squares, SSE, and the residual sum of
squares, $SSR = \sum e_i^2$:
$$SST = SSE + SSR.$$
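The decomposition can be verified against the money-demand Stata output, where the `Model`, `Residual` and `Total` rows correspond to SSE, SSR and SST respectively. The sketch below is a simple arithmetic check; the variable names are illustrative.

```python
# Sums of squares copied from the Stata output for "regress m g ir pi":
# Model = SSE (explained), Residual = SSR, Total = SST.
sse = 20.5893701
ssr = 5.81286631
sst = 26.4022364

gap = sst - (sse + ssr)  # zero up to rounding in the printed figures
r_squared = sse / sst    # share of the total variation that is explained
print(gap, round(r_squared, 4))
```

The ratio SSE/SST reproduces the `R-squared` value of 0.7798 reported by Stata.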
If the regressors were unimportant determinants of Y (as
under $H_0$), SSR would be large compared to SSE.
Conversely, if (at least one of) the regressors were
important determinants of Y (as under $H_A$), then SSR
would be small compared to SSE.
In order to carry out a statistical test we need to be able to
determine what "large" and "small" mean statistically.
The actual test statistic is based on the ratio SSE/SSR;
more precisely,
$$TS = \frac{SSE/(k-1)}{SSR/(n-k)} \sim F_{k-1,\,n-k}$$
under $H_0$, where $k-1$ is the number of degrees of freedom for the
numerator, and $n-k$ is the number of degrees of freedom for the
denominator.
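Because $R^2 = SSE/SST$ and $SSR = (1 - R^2)\,SST$, the same statistic can be expressed through $R^2$ alone. This rearrangement is not on the slide itself; it follows by dividing numerator and denominator by SST:

```latex
TS = \frac{SSE/(k-1)}{SSR/(n-k)}
   = \frac{(SSE/SST)/(k-1)}{(SSR/SST)/(n-k)}
   = \frac{R^2/(k-1)}{(1-R^2)/(n-k)}.
```

This makes explicit that a high $R^2$ relative to the degrees of freedom used is what pushes the statistic into the upper tail.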
A graph of a particular F-distribution, $F_{20,20}$, follows:
[Figure: the $F_{20,20}$ distribution.]

We can use tables (e.g. Table A.4 of Thomas) to determine
the appropriate critical values against which to compare TS.
Our test statistic will be significant at the 5% level if it lies in
the upper 5% of the $F_{k-1,\,n-k}$ distribution.
Denote the critical value by $F_{0.05}$ for a 5% level test.
The test criterion is:
if $TS > F_{0.05}$ then reject $H_0$ in favour of $H_A$;
if $TS < F_{0.05}$ then do not reject $H_0$ (reserve judgment).
Example. Let's return to the money-demand example
where we estimated the regression equation
$$Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \epsilon$$
where
Y: money stock;
$X_2$: GDP;
$X_3$: interest rate;
$X_4$: rate of price inflation.
The Stata output for this regression is as follows:
. regress m g ir pi
Source | SS df MS Number of obs = 30
-------------+------------------------------ F( 3, 26) = 30.70
Model | 20.5893701 3 6.86312337 Prob > F = 0.0000
Residual | 5.81286631 26 .223571781 R-squared = 0.7798
-------------+------------------------------ Adj R-squared = 0.7544
Total | 26.4022364 29 .910421946 Root MSE = .47283
------------------------------------------------------------------------------
m | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
g | .1703745 .0189433 8.99 0.000 .1314361 .2093129
ir | -.0001693 .0012483 -0.14 0.893 -.0027353 .0023967
pi | -.002197 .0037733 -0.58 0.565 -.0099531 .0055592
_cons | .0893538 .1388419 0.64 0.525 -.1960399 .3747475
------------------------------------------------------------------------------
We find that SSE = 20.5894 and SSR = 5.8129 so, with
n = 30 and k = 4, the test statistic is
$$TS = \frac{20.5894/3}{5.8129/26} = 30.6975 \sim F_{3,26}$$
under $H_0: \beta_2 = \beta_3 = \beta_4 = 0$.
From Table A.4 in Thomas we find that, with 3 degrees of
freedom (d.f.) for the numerator and 26 d.f. for the
denominator, $F_{0.05} = 2.9752$.
We have TS = 30.6975 > 2.9752 and so we reject $H_0$ in
favour of $H_A$, i.e. there is evidence to suggest that at least
one of $X_2$, $X_3$ and $X_4$ is a significant determinant of Y.
Note that this test statistic is reported in the second row in
the upper right-hand part of the Stata output:
F( 3, 26) = 30.70.
The next row reports
Prob > F = 0.0000
which means that there is (virtually) nothing to the right of
30.70 in the $F_{3,26}$ distribution.
This implies that we clearly reject the null hypothesis at the
5% level!
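The arithmetic of this example can be reproduced in a few lines. The sketch below uses the rounded sums of squares quoted in the text and the critical value quoted from Thomas's Table A.4; the variable names are illustrative.

```python
# Values from the money-demand regression in the text.
sse, ssr = 20.5894, 5.8129  # explained and residual sums of squares
n, k = 30, 4                # observations and estimated coefficients

# Overall-significance statistic: [SSE/(k-1)] / [SSR/(n-k)].
ts = (sse / (k - 1)) / (ssr / (n - k))

f_crit = 2.9752             # 5% critical value of F(3, 26), as quoted in the text
reject_h0 = ts > f_crit
print(round(ts, 4), reject_h0)
```

The computed statistic matches the `F( 3, 26) = 30.70` entry in the Stata output to two decimal places, and the decision rule rejects $H_0$.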
Variable addition/deletion tests are concerned with the joint
significance of a subset of variables in the model.
For example, in our money demand regression
$$Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \epsilon,$$
suppose we want to test the joint significance of $X_3$
(interest rate) and $X_4$ (price inflation).
Our null hypothesis is therefore
$$H_0: \beta_3 = \beta_4 = 0$$
and the alternative hypothesis is
$H_A$: either $\beta_3 \neq 0$ or $\beta_4 \neq 0$ or both are non-zero.
Under the null hypothesis Y (money demand) only
depends on $X_2$ (GDP):
$$Y = \beta_1 + \beta_2 X_2 + \epsilon.$$
The test statistic compares the values of the sum of
squared residuals, SSR, under the null and alternative
hypotheses.
Let $SSR_R$ denote the value of SSR under $H_0$; this is the
restricted model obtained by imposing the two restrictions
$\beta_3 = 0$ and $\beta_4 = 0$.
Similarly, let $SSR_U$ denote the value of SSR under $H_A$;
this is the unrestricted model because no restrictions have
been imposed.
In general, suppose h restrictions are imposed under $H_0$.
Then the test statistic is
$$TS = \frac{(SSR_R - SSR_U)/h}{SSR_U/(n-k)} \sim F_{h,\,n-k}$$
under $H_0$.
This is an upper one-tail test and so the decision rule is:
if $TS > F_{0.05}$ then reject $H_0$ in favour of $H_A$;
if $TS < F_{0.05}$ then do not reject $H_0$ (reserve judgment),
where $F_{0.05}$ is the 5% critical value from the F-distribution.
Returning to our money demand example, from the earlier
(unrestricted) regression we have $SSR_U = 5.8129$.
The restricted regression is:
. regress m g
-------------+------------------------------ F( 1, 28) = 94.88
Model | 20.3862321 1 20.3862321 Prob > F = 0.0000
-------------+------------------------------ Adj R-squared = 0.7640
Total | 26.4022364 29 .910421946 Root MSE = .46353
------------------------------------------------------------------------------
-------------+----------------------------------------------------------------
g | .1748489 .0179502 9.74 0.000 .1380795 .2116182
_cons | .0212579 .1157594 0.18 0.856 -.2158645 .2583803
------------------------------------------------------------------------------
We find that $SSR_R = 6.0160$ so, with h = 2, n = 30 and
k = 4, the test statistic is
$$TS = \frac{(6.0160 - 5.8129)/2}{5.8129/26} = 0.4542 \sim F_{2,26}$$
under $H_0: \beta_3 = \beta_4 = 0$.
The 5% critical value from the $F_{2,26}$ distribution is
$F_{0.05} = 3.3690$.
We have TS = 0.4542 < 3.3690 and so we do not reject $H_0$,
i.e. there is insufficient evidence to reject the hypothesis
that $X_3$ (interest rate) and $X_4$ (price inflation) are
insignificant determinants of money demand.
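The restricted-versus-unrestricted comparison can be wrapped in a small helper. The function name `f_statistic` and its argument names are hypothetical; the figures are those quoted in the text and in the Stata output. As a sketch, it also shows that the overall-significance test is the special case in which the restricted model contains only the constant, so that its SSR equals SST and $h = k - 1$.

```python
def f_statistic(ssr_r, ssr_u, h, n, k):
    """F statistic for h restrictions: [(SSR_R - SSR_U)/h] / [SSR_U/(n-k)]."""
    return ((ssr_r - ssr_u) / h) / (ssr_u / (n - k))


# Joint test of beta3 = beta4 = 0 with the rounded figures used in the text.
ts_subset = f_statistic(ssr_r=6.0160, ssr_u=5.8129, h=2, n=30, k=4)

# Overall significance as a special case: the restricted model contains only
# the constant, so its SSR is the total sum of squares and h = k - 1.
ts_overall = f_statistic(ssr_r=26.4022364, ssr_u=5.81286631, h=3, n=30, k=4)

print(round(ts_subset, 4), round(ts_overall, 2))
```

The first value reproduces the statistic of 0.4542 computed above, and the second reproduces the `F( 3, 26) = 30.70` entry from the unrestricted Stata output.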
Some further points to note:
Imposing restrictions cannot reduce the SSR, so the restricted SSR is
at least as large as the unrestricted SSR, i.e. $SSR_R \geq SSR_U$.
The test statistic can also be written
$$TS = \frac{(R_U^2 - R_R^2)/h}{(1 - R_U^2)/(n-k)}$$
where $R_U^2$ and $R_R^2$ are the values of $R^2$ from the unrestricted
and restricted regressions, respectively.
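The equivalence of the two forms follows because both regressions share the same dependent variable and hence the same SST, so that $SSR_R = (1 - R_R^2)\,SST$ and $SSR_U = (1 - R_U^2)\,SST$. A short derivation, spelled out here as a sketch rather than taken from the slides:

```latex
TS = \frac{(SSR_R - SSR_U)/h}{SSR_U/(n-k)}
   = \frac{\left[(1-R_R^2) - (1-R_U^2)\right] SST/h}{(1-R_U^2)\,SST/(n-k)}
   = \frac{(R_U^2 - R_R^2)/h}{(1-R_U^2)/(n-k)}.
```

The SST terms cancel, which is why the statistic can be computed from the two $R^2$ values alone.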
Summary
multicollinearity
inference
Next week:
dummy variables; Chow tests; heteroskedasticity
