
Econ107 Applied Econometrics

Topics 2-4: discussed under the classical Assumptions 1-6 (or 1-7 when normality is
needed for finite-sample inference)

Question: what if some of the classical assumptions are violated?

Topic 5: deals with violations of Assumption 1 (A1 hereafter). Topics 6-8: deal with
three cases of violations of the classical assumptions: multicollinearity (A6), serial
correlation (A4), and heteroskedasticity (A5). Questions to be addressed:
• what is the nature of the problem?
• what are the consequences of the problem?
• how is the problem diagnosed?
• how to remedy the problem?

6 Multicollinearity (Studenmund, Chapter 8)

6.1 The Nature of Multicollinearity

6.1.1 Perfect Multicollinearity

1. Definition: Perfect multicollinearity exists in the following regression

Yi = β 0 + β 1X1i + · · · + β k Xki + εi, (1)


if there exists a set of parameters λj (j = 0, 1, · · · , k), not all equal to zero, such
that
λ0X0i + λ1X1i + · · · + λk Xki = 0, (2)
where X0i ≡ 1. (2) must hold for all observations.

Alternatively, we could write an independent variable as an exact linear combination
of the others, e.g., if λk ≠ 0, we can write (2) as

Xki = −(λ0/λk)X0i − (λ1/λk)X1i − · · · − (λk−1/λk)Xk−1,i. (3)
The last expression says, essentially, that Xki is redundant: it contains no information
beyond that already contained in X0i, X1i, · · · , Xk−1,i for explaining Yi.

Example. Consider the following regression model for a consumption function

C = β1 + β2N + β3S + β4T + ε,

where C is consumption, N is nonlabor income, S is salary, T is total income, and
ε is the error term. Since T = N + S, it is not possible to separate the individual effects of
the income components (N, S) from that of total income (T). According to the model,

E(C) = β1 + β2N + β3S + β4T.

But if we let c be any nonzero value, and let β′2 = β2 − c, β′3 = β3 − c, and
β′4 = β4 + c, then

E(C) = β1 + β′2N + β′3S + β′4T

as well, for a different set of parameters. The same value of E(C) is therefore consistent
with many different values of the parameters.
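A minimal simulation (all variable values and coefficients below are made up for illustration) confirms the point: with T constructed as exactly N + S, the regressor matrix is rank deficient, so OLS cannot assign unique coefficients to N, S and T.

# Sketch: perfect multicollinearity with T = N + S exactly (hypothetical data).
import numpy as np

rng = np.random.default_rng(0)
n = 100
N = rng.uniform(0, 10, n)        # nonlabor income
S = rng.uniform(10, 50, n)       # salary
T = N + S                        # total income: an exact linear combination of N and S
C = 5 + 0.3 * N + 0.6 * S + rng.normal(0, 1, n)   # consumption, made-up coefficients

X = np.column_stack([np.ones(n), N, S, T])
print(np.linalg.matrix_rank(X))   # 3, not 4: the columns are linearly dependent
# X'X is singular (up to rounding), so the normal equations have no unique solution;
# the individual coefficients on N, S and T are not identified.
print(np.linalg.cond(X.T @ X))    # astronomically large condition number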

2. Problems
(1) Coefficients can’t be estimated. Consider the regression:

Yi = β 0 + β 1X1i + β 2X2i + εi. (4)


If X2i = λX1i (λ ≠ 0), the parameters β1 and β2 cannot be separately identified or
estimated. To see why, define

β∗1 = β1 + cλ, and β∗2 = β2 − c,

where c can be any constant. (4) is equivalent to

Yi = β0 + β∗1X1i + β∗2X2i + εi. (5)

This means that there are an infinite number of c's that make (5) hold; in other
words, there are an infinite number of pairs (β∗1, β∗2) such that (5) holds. We cannot
separate the influence of X1i from that of X2i on Yi.
The above analysis extends to a generic MLR model where a regressor can be written
as a linear combination of others.

(2) Standard errors can’t be estimated. In the regression model (4), the standard
error of β̂ 1 can be written:
std(β̂1) = sqrt{ σ² / [ Σ_{i=1}^n (X1i − X̄1)² · (1 − r12²) ] },
where r12 is the sample correlation between X1i and X2i. In the case of perfect
multicollinearity (e.g., X2i = λX1i + a), r12 = 1 or −1, so that the denominator
is zero. Thus, std(β̂ 1)= ∞.

Solution:
The solution to perfect multicollinearity is trivial: Drop one or several of the regres-
sors. In the above example, we can drop either X2i or X1i so that (4) can be written
as
Yi = β 0 + (β 1 + λβ 2) X1i + εi,
or
Yi = β 0 + (β 2 + β 1/λ) X2i + εi.

By regressing Yi on X1i, we are estimating β 1 + λβ 2. Analogously, by regressing Yi
on X2i, we are estimating β 2 + β 1/λ. In either case, we cannot estimate β 1 or β 2.
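A quick numerical check of this point, with made-up values β0 = 1, β1 = 2, β2 = 3 and λ = 0.5, so the estimated slope on X1i should be near β1 + λβ2 = 3.5:

# Sketch: with X2 = lambda * X1 exactly, regressing Y on X1 alone estimates beta1 + lambda*beta2.
import numpy as np

rng = np.random.default_rng(2)
n, beta0, beta1, beta2, lam = 5_000, 1.0, 2.0, 3.0, 0.5
X1 = rng.normal(size=n)
X2 = lam * X1
Y = beta0 + beta1 * X1 + beta2 * X2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), X1])
print(np.linalg.lstsq(X, Y, rcond=None)[0][1])   # ≈ beta1 + lam*beta2 = 3.5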

Remarks.
Perfect multicollinearity is fairly easy to avoid. Econometricians almost never talk
about perfect multicollinearity. Instead, when we use the word multicollinearity, we
are really talking about severe imperfect multicollinearity.

6.1.2 Imperfect Multicollinearity

Imperfect multicollinearity can be defined as a linear functional relationship between
two or more independent variables that is so strong that it can significantly affect
the estimation of the coefficients of the variables.

Definition: Imperfect multicollinearity exists in a k-variate regression if

λ0X0i + λ1X1i + · · · + λk Xki + vi = 0


for some stochastic variable vi.

Remarks.

1) As Var(vi)→0, imperfect multicollinearity tends to perfect multicollinearity.

2) Alternatively, we could write any particular independent variable as an “almost”
exact linear function of the others. E.g., if λk ≠ 0, then

Xki = −(λ0/λk)X0i − (λ1/λk)X1i − · · · − (λk−1/λk)Xk−1,i − vi/λk. (6)

The above equation implies that although the relationship between Xki and
X0i, X1i, · · · , Xk−1,i might be fairly strong, it is not strong enough to allow Xki to be
completely explained by X0i, X1i, · · · , Xk−1,i; some unexplained variation still remains.

3) Imperfect multicollinearity indicates a strong linear relationship among the
regressors. The stronger the relationship among two or more regressors, the more
likely it is that they will be considered significantly multicollinear.

6.2 The Consequences of (Imperfect) Multicollinearity

1. Coefficient estimators will remain unbiased. Imperfect multicollinearity does
not violate the classical assumptions. If all the classical Assumptions 1-6 are met, we
can still estimate the coefficients, and the estimators will remain centered around the
true values of the β's; the OLS estimators are still unbiased and are still BLUE.

2. The variances/standard errors of the coefficient estimators “blow up”.
They increase with the degree of multicollinearity. Since two or more of the regressors
are strongly related, it becomes difficult to identify the separate effects of the
multicollinear variables, and we are much more likely to make large estimation errors
than we would be in the absence of multicollinearity. So imperfect multicollinearity
reduces the “precision” of our coefficient estimates.

For example, in the 2-variate regression case

std(β̂1) = sqrt{ σ² / [ Σ_{i=1}^n (X1i − X̄1)² · (1 − r12²) ] }.

As |r12| → 1, the standard error → ∞.
Numerical example: Suppose σ² / Σ_{i=1}^n (X1i − X̄1)² = 1, i.e., std(β̂1) = 1 when r12 = 0.
If r12=0.10, then the standard error=1.01.
If r12=0.25, then the standard error=1.03.
If r12=0.50, then the standard error=1.15.
If r12=0.75, then the standard error=1.51.
If r12=0.90, then the standard error=2.29.
If r12=0.99, then the standard error=7.09.
The standard error increases at an increasing rate with the multicollinearity between the
explanatory variables. As a result, we will have wider confidence intervals and possibly
insignificant t-ratios on our coefficient estimates, because t1 = β̂1/se(β̂1).
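These numbers can be reproduced directly from the formula; a short check in Python:

# With sigma^2 / sum((X1i - X1bar)^2) normalized to 1, std(beta1_hat) = 1 / sqrt(1 - r12^2).
import math

for r in [0.10, 0.25, 0.50, 0.75, 0.90, 0.99]:
    print(f"r12 = {r:.2f}  ->  std = {1 / math.sqrt(1 - r**2):.2f}")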

Figure 1: A consequence of imperfect multicollinearity: std(β̂1) blows up as r12 → 1.

3. The computed t-ratios will fall. This means that we'll have more difficulty in
rejecting the null hypothesis that a slope coefficient is equal to zero.
This problem is closely related to the problem of a “small sample size”. In both cases,
standard errors “blow up”: with a small sample size, the denominator is small because
there is little variation in the explanatory variable (the term Σ(X1i − X̄1)² is small).

4. Coefficient estimates become very sensitive to changes in the specification
and in the number of observations.
• The coefficient estimates may be very sensitive to the addition of one or a small
number of observations.
• The coefficient estimates may be sensitive to the deletion of a statistically insignif-
icant variable.
• One may get very odd coefficient estimates possibly with wrong signs due to the
high variance of the estimator.

5. The overall fit of the model will be largely unaffected. Even though the
individual t-ratios are often quite low in the case of imperfect multicollinearity, the
overall fit of the equation (R² or R̄²) will not fall much.

A hypothetical example.
Suppose we want to estimate a student’s consumption function. After some prelim-
inary work, we come up with the following equation

Ci = β 0 + β 1Yd,i + β 2LAi + εi,


where Ci =the annual consumption expenditures of the ith student
Yd,i =the annual disposable income (including gifts) of the ith student
LAi =the liquid asset (cash, savings, etc.) of the ith student.
Please analyze the following regression outputs:

Ĉi = −367.83 + 0.5133Yd,i + 0.0427LAi (7)
               (1.0307)     (0.0492)
        t:     [0.496]      [0.868]
n = 9, R̄² = 0.835.

Ĉi = −471.43 + 0.9714Yd,i (8)
               (0.157)
        t:     [6.187]
n = 9, R̄² = 0.861.
An empirical example: petroleum consumption
Suppose that we are interested in building a cross-sectional model of the demand for
gasoline by state:

Ci = β 0 + β 1Milei + β 2T axi + β 3Regi + εi,


where Ci = the petroleum consumption in the ith state
Milei = the urban highway miles within the ith state
Taxi = the gasoline tax rate in the ith state
Regi = the motor vehicle registrations in the ith state.

Please analyze the following regression outputs:

Ĉi = 389.0 + 60.8Milei − 36.5Taxi − 0.061Regi (9)
             (10.3)      (13.2)     (0.043)
      t:     [5.92]      [-2.77]    [-1.43]
n = 50, R̄² = 0.919.

Ĉi = 551.7 − 53.6Taxi + 0.186Regi (10)
             (16.9)     (0.012)
      t:     [-3.18]    [15.88]
n = 50, R̄² = 0.861.

6.3 The Detection of Multicollinearity

It is worth mentioning that multicollinearity exists in almost all equations. It is
virtually impossible in the real world to find a set of independent variables that are
totally uncorrelated with each other. Our purpose is to learn to determine how much
multicollinearity exists by using three general indicators or diagnostic tools.

1. t-ratios versus R2. Look for a ‘high’ R2, but ‘few’ significant t ratios.
Remarks.
(1) Common “rule of thumb”. Can’t reject the null hypotheses that coefficients are
individually equal to zero (t tests), but can reject the null hypothesis that they are
simultaneously equal to zero (F test).
(2) This is not an exact test. What do we mean by “few” significant t ratios, and a
“high” R2? Too imprecise. Also depends on other factors like the sample size.

2. Correlation matrix of regressors. Look for high pair-wise correlation coeffi-
cients. Look at the correlation matrix for the regressors.

Remarks.
(1) How high is high? As a rule of thumb, we can use 0.8. If the sample correlation
exceeds 0.8 in absolute value, we should be concerned about multicollinearity.
(2) Multicollinearity refers to a linear relationship among all or some of the regressors.
Any pair of independent variables may not be highly correlated, but one variable may
be a linear function of a number of others. In a 2-variate regression, multicollinearity
is the correlation between the 2 explanatory variables.
(3) This is a “... sufficient, but not a necessary condition for multicollinearity.” In
other words, if you've got a high pairwise correlation, you've got problems. However,
the absence of high pairwise correlations is not conclusive evidence that multicollinearity is absent.
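As a practical aid (not from the textbook), a small helper can scan the pairwise correlations among the regressors and flag any pair whose absolute correlation exceeds the 0.8 rule of thumb; here X is assumed to be a pandas DataFrame containing only the explanatory variables.

# Sketch: flag regressor pairs with |correlation| above a threshold (default 0.8).
import pandas as pd

def flag_high_correlations(X: pd.DataFrame, threshold: float = 0.8):
    corr = X.corr()
    cols = corr.columns
    flagged = [(cols[i], cols[j], corr.iloc[i, j])
               for i in range(len(cols))
               for j in range(i + 1, len(cols))
               if abs(corr.iloc[i, j]) > threshold]
    return corr, flagged

Remark (3) above still applies: an empty list of flagged pairs does not rule out multicollinearity involving three or more regressors.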

3. High variance inflation factors (VIFs).

The variance inflation factor (VIF) is a method of detecting the degree of multi-
collinearity by looking at the extent to which a given explanatory variable can be
explained by all other explanatory variables in the equation. So there is a VIF for
each regressor.

Suppose we want to use VIF to detect multicollinearity in the following regression:

Yi = β 0 + β 1X1i + · · · + β k Xki + εi. (11)


Let β̂j denote the OLS estimator of βj in the above regression. We need to calculate
k different VIFs, one for each Xji (j = 1, · · · , k).
1) Run the following k regressions:

X1i = γ 0 + γ 2X2i + · · · + γ k Xk,i + v1i


···
Xki = α0 + α1X1i + · · · + αk−1Xk−1,i + vki

2) Calculate the R² for each of the above k regressions, and let R²j denote the R²
from the linear regression of Xji on all the other regressors in (11). The VIF for β̂j is
defined by

VIF(β̂j) = 1 / (1 − R²j).

The higher VIF(β̂j), the higher the variance of β̂j (holding constant the variance of
the error term) and the more severe the effects of multicollinearity.
Remarks.
1) How high is high? As a common rule of thumb, if VIF(β̂j) > 5 for some j, then the
multicollinearity is severe.
2) As the number of regressors increases, it makes sense to increase the above number (5) slightly.
3) In EViews we can calculate VIF(β̂j) after the jth auxiliary regression (i.e., run
Xji = α0 + α1X1i + · · · + αj−1Xj−1,i + αj+1Xj+1,i + · · · + αkXk,i + vji, and name the
equation eqj after the regression) by typing in the command window

scalar VIFj=1/(1-eqj.@R2)
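For readers working outside EViews, a rough equivalent in Python (statsmodels, whose variance_inflation_factor carries out the same 1/(1 − R²j) calculation) might look like the sketch below; X is assumed to be a DataFrame of the regressors only.

# Sketch: VIF_j = 1 / (1 - R_j^2) for each regressor, computed with statsmodels.
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

def vif_table(X: pd.DataFrame) -> pd.Series:
    Xc = sm.add_constant(X)                       # include the intercept, as in regression (11)
    vifs = {col: variance_inflation_factor(Xc.values, i)
            for i, col in enumerate(Xc.columns)
            if col != "const"}                    # skip the constant's own VIF
    return pd.Series(vifs, name="VIF")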
Summary: No single test for multicollinearity.

6.4 Remedies for Multicollinearity

Once we're convinced that multicollinearity is present, what can we do about it? The
diagnosis of the ailment isn't clear-cut, and neither is the treatment. The appropriateness
of the following remedial measures varies from one situation to another.

Example. Estimating the labour supply of married women from 1950-1999:

Hourst = β 0 + β 1Ww,t + β 2Wm,t + εt, (12)


where: Hourst = Average annual hours of work of married women
Ww,t= Average wage rate for married women
Wm,t = Average wage rate for married men.
Suppose the regression output is

Ĥourst = 733.71 + 48.37Ww,t − 22.91Wm,t
                  (34.97)     (29.01)
n = 50, R² = 0.847

Multicollinearity is a problem here. The t-ratios are less than 1.5 and 1 in absolute value, respectively
(insignificant at the 10% level). Yet, R² is 0.847. It is easy to confirm multicollinearity
in this case. The correlation between the two wage rates is as high as 0.99 over
our sample period! Standard errors blow up. We can’t separate the wage effects on
labour supply of married women.

Possible Solutions?

1. A Priori Information

If we know the relationship between the slope coefficients, we can substitute this
restriction into the regression and eliminate the multicollinearity. This relies heavily
on economic theory.

Example. If we use time series data to estimate the Cobb-Douglas production
function, or the elasticities of output (Y) with respect to capital (K) and labor
(L), we may have a multicollinearity problem: as time evolves, both K and L
increase and they can be highly correlated. Suppose that we have constant returns to
scale in the Cobb-Douglas production function Yt = A Kt^β1 Lt^β2 e^εt (β1 + β2 = 1).
We can impose the restriction β1 + β2 = 1 in the following regression:

ln Yt = β0 + β1 ln Kt + β2 ln Lt + εt

by plugging β 2 = 1 − β 1 into the above equation to obtain

ln Yt = β 0 + β 1 ln Kt + (1 − β 1) ln Lt + εt
⇐⇒
ln Yt − ln Lt = β 0 + β 1 (ln Kt − ln Lt) + εt
⇐⇒
ln (Yt/Lt) = β 0 + β 1 ln (Kt/Lt) + εt.
That is, we can estimate β1 by regressing ln(Yt/Lt) on a constant and ln(Kt/Lt).
After we obtain the estimate β̂1 of β1, we can obtain the estimate of β2 by

β̂2 = 1 − β̂1.
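A minimal sketch of this restricted estimation in Python (statsmodels), where Y, K and L are assumed to be arrays holding the output, capital and labor series:

# Sketch: impose constant returns to scale by regressing ln(Y/L) on ln(K/L).
import numpy as np
import statsmodels.api as sm

def restricted_cobb_douglas(Y, K, L):
    y = np.log(Y / L)                    # ln(Y_t / L_t)
    x = sm.add_constant(np.log(K / L))   # constant and ln(K_t / L_t)
    res = sm.OLS(y, x).fit()
    beta1_hat = res.params[1]            # capital elasticity
    beta2_hat = 1.0 - beta1_hat          # labor elasticity implied by the restriction
    return beta1_hat, beta2_hat, res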

Remarks. Unfortunately, such a priori information is extremely rare.

2. Dropping a Variable
In the example of labour supply of married women, suppose we omit the wage of
married men and estimate the following model
Hourst = α0 + α1Ww,t + vt. (13)
In this example, it seems natural to drop the variable Wm,t. In other cases, it may
make little statistical difference which variable is dropped; one has to rely on the
theoretical underpinnings of the model or on common sense.

A cautionary note. We have to be careful when we consider dropping
a variable in the case of multicollinearity. If a variable that belongs in the regression
is dropped, then we will encounter the problem of “omitted variable
bias”. So we are substituting one problem for another; the remedy may be worse
than the disease.
Suppose that Wm,t should appear in (12); then the OLS estimator α̂1 in (13) is
likely to be biased for β1:

E(α̂1) = β1 + β2·b12,

where b12 is the slope coefficient from a regression of Wm,t on Ww,t, which reflects the
correlation between Ww,t and Wm,t.
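A small simulation (with entirely made-up parameter values and variable names) illustrates the bias formula: the slope from the short regression settles near β1 + β2·b12 rather than β1.

# Sketch: omitted-variable bias when the omitted regressor is correlated with the included one.
import numpy as np

rng = np.random.default_rng(1)
n, beta1, beta2 = 100_000, 48.0, -23.0
w_women = rng.normal(10, 2, n)
w_men = 1.2 * w_women + rng.normal(0, 0.3, n)    # b12 ≈ 1.2 (slope of w_men on w_women)
hours = 700 + beta1 * w_women + beta2 * w_men + rng.normal(0, 5, n)

X_short = np.column_stack([np.ones(n), w_women])     # regression omitting w_men
alpha_hat = np.linalg.lstsq(X_short, hours, rcond=None)[0]
print(alpha_hat[1])    # close to beta1 + beta2 * 1.2 = 48 - 27.6 = 20.4, not 48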

3. Transformation of the Variables
One of the simplest things to do with time series regressions is to run the regression
on the “first differences” data.
Start with the original specification at time t:
Hourst = β 0 + β 1Ww,t + β 2Wm,t + εt, (14)
The same linear relationship holds for the previous period (t − 1) as well:
Hourst−1 = β 0 + β 1Ww,t−1 + β 2Wm,t−1 + εt−1. (15)
Subtracting (15) from (14) yields

Hourst − Hourst−1 = β1(Ww,t − Ww,t−1) + β2(Wm,t − Wm,t−1) + (εt − εt−1), (16)

or

ΔHourst = β1ΔWw,t + β2ΔWm,t + Δεt, (17)

where, e.g., ΔHourst = Hourst − Hourst−1.
The advantage is that “changes” in wage rates may not be as highly correlated as
their “levels”.
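As a sketch, regression (17) could be estimated in Python (statsmodels) as follows, assuming a DataFrame df with hypothetical columns hours, w_women and w_men indexed by year:

# Sketch: estimate the first-difference regression (17); note that it has no intercept.
import pandas as pd
import statsmodels.api as sm

def first_difference_regression(df: pd.DataFrame):
    d = df[["hours", "w_women", "w_men"]].diff().dropna()   # differencing loses the first year
    return sm.OLS(d["hours"], d[["w_women", "w_men"]]).fit()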

The disadvantages are:
(i) The number of observations is reduced (i.e., a loss of one degree of freedom). The
sample period changes from 1950-1999 to 1951-1999, say.
(ii) It may introduce serial correlation. Even if the εt are uncorrelated, the Δεt are not,
because

Cov(Δεt, Δεt−1)
= Cov(εt − εt−1, εt−1 − εt−2)
= Cov(εt, εt−1) − Cov(εt, εt−2) − Cov(εt−1, εt−1) + Cov(εt−1, εt−2)
= 0 − 0 − Var(εt−1) + 0 = −Var(εt−1) ≠ 0.
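A quick simulation check of this result, using i.i.d. standard normal errors so that −Var(εt−1) = −1:

# Sketch: for i.i.d. errors with unit variance, Cov(d_eps_t, d_eps_{t-1}) is about -1.
import numpy as np

rng = np.random.default_rng(3)
eps = rng.normal(0, 1, 200_000)
d_eps = np.diff(eps)
print(np.cov(d_eps[1:], d_eps[:-1])[0, 1])   # ≈ -1 = -Var(eps_t)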
Again, the cure may be worse than the disease. It violates one of the classical
assumptions, and the new problems will need to be addressed (in a later topic).

4. Get More Data. Two possibilities here:
(1) Extend the data set. Multicollinearity is a “sample phenomenon”. Wage rates
may be correlated over the period 1950-1999. Add more years. For example, go
back to 1940. Correlation may be reduced. The problem is that more data may
not be available, or the relationship among the variables may have changed (i.e., the
regression function isn’t stable over time). More likely that the data are not there.
If they are there, why not include them initially?

(2) Change Nature or Source of Data. If possible, we can switch from time-series
to cross-sectional analysis or to panel data analysis.
• The sample correlation in the cross-sectional data is usually different from that in
the time series data.
• The use of panel data potentially reduces the multicollinearity in the total sample.

For example, we can use a random sample of many households at a point in time.
The degree of multicollinearity in wages may be relatively lower ‘between’ spouses.
Or, we can use a random sample of households over a number of years.

5. Do Nothing (A Remedy!)
• Multicollinearity is not a problem if the objective of the analysis is forecasting. It
doesn’t affect the overall explanatory power of the regression (i.e., R2).
• It is a problem if the objective is to test the significance of individual coefficients
because of the inflated variances/standard errors.

Multicollinearity is often given too much emphasis in the list of common problems
with regression analysis. If it’s imperfect multicollinearity, which is almost always
going to be the case, then it doesn’t violate the classical assumptions.

• Exercise: Q8.11
• Questions for Discussion: Example 8.5.2
Example 8.5.2: Does the Pope’s 1966 decision to allow Catholics to eat meat on
non-Lent Fridays cause a shift in the demand function for fish?
Consider the regression

Ft = β 0 + β 1P Ft + β 2P Bt + β 3 ln Y dt + β 4Nt + β 5Pt + εt,


where Ft : average pounds of fish consumed per capita in year t
PFt : price index for fish in year t
PBt : price index for beef in year t
Ydt : real per capita disposable income in year t (in billions of dollars)
Nt : the number of Catholics in the US in year t (tens of thousands)
Pt : =1 after the Pope’s 1966 decision and 0 otherwise
Question 1: State the null and alternative hypotheses to test whether the Pope’s
decision plays a negative role in the consumption of fish.
Question 2: Some economic theory suggests that as income rises, the portion of
that extra income devoted to the consumption of fish will decrease. Is the choice of a
semilog function relating disposable income to the consumption of fish consistent
with this theory?
Question 3: Suppose the regression output is

Ft = −1.99 + 0.0395PFt − 0.000777PBt + 1.770 ln Ydt − 0.000031Nt − 0.355Pt
             (0.031)     (0.0202)      (1.87)         (0.000033)    (0.353)
      t:     [1.27]      [-0.0384]     [0.945]        [-0.958]      [-1.01]
R² = 0.736, R̄² = 0.666, n = 25.
Evaluate the above regression results.
Question 4: Are there any signs of multicollinearity in the above regression model?
How do you check for this by using simple correlation coefficients? What is the
drawback of this approach? [Hint. To detect multicollinearity with simple correlation
coefficients: After you run the regression, select Procs/Make Regressor group on
the equation window menu bar. Select View/Correlation/Common Sample on
the group object menu bar.]

Question 5: How do you check for the presence of multicollinearity by using the VIF?
Verify that the VIFs for PFt and ln Ydt are about 43.4 and 23.3, respectively. What
does this suggest to us?
Question 6: Given the high correlation between ln Ydt and Nt, it is reasonable to
drop one of them. Since the logic behind including the number of Catholics in
a per capita fish consumption equation is fairly weak, we can decide to drop Nt:

Ft = 7.961 + 0.028PFt + 0.0047PBt + 0.360 ln Ydt − 0.124Pt
             (0.03)     (0.019)      (1.15)        (0.26)
      t:     [0.98]     [0.24]       [0.31]        [-0.48]
R² = 0.723, R̄² = 0.667, n = 25.
Does this solve the problem?
Question 7: In the case of prices, both PFt and PBt are theoretically important,
so it is not advisable to drop either one. As an alternative, the textbook author
suggests using RPt = PFt/PBt to replace both price variables. Does it make any
sense to do so? If so, what is the expected sign of the coefficient on RPt? The regression
output now becomes

Ft = −5.17 − 1.93RPt + 2.71 ln Ydt + 0.0052Pt
             (1.43)    (0.66)        (0.281)
      t:     [-1.35]   [4.13]        [0.019]
R² = 0.640, R̄² = 0.588, n = 25.
Question 8: Based on the last regression output, can you reject the null hypothesis
in Question 1?
Remark. To calculate VIF(β̂j) (j = 1, · · · , k) in EViews:
Step 1: Run the regression of Xji on (1, X1i, · · · , Xj−1,i, Xj+1,i, · · · , Xki) and
name the equation eqj, for example.
Step 2: In the command window type
scalar vifj=1/(1-eqj.@r2)
or genr vifj=1/(1-eqj.@r2). The former generates a scalar value for vifj, whereas the
latter generates a series of values for vifj.

