
Intro to Multicollinearity.xls
This workbook demonstrates perfect and near multicollinearity between two independent variables.
It uses a subset of the data from Multireg.xls
Both the Example1 and the Example2 sheets demonstrate perfect multicollinearity.
In Example1, one variable is directly proportional to a second variable.
In Example2, there is a slightly more complicated linear relationship between the two X variables.
The Table sheet uses the data from Example1 to demonstrate graphically that many different combinations of
b1 and b2 yield exactly the same minimum sum of squared residuals.
The NearMulti sheet demonstrates a case of near multicollinearity.
The Q&A sheet has questions pertaining to multicollinearity.

Excel 2003's LINEST function finds one solution, but doesn't directly warn the user about the multicollinearity.
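
One way to see why software needs to warn about perfect multicollinearity is to check the X'X matrix directly. The sketch below is illustrative only and is not part of the workbook. It uses the Example1 data (Income is exactly 0.2 times Price) and the Python standard library; the helper name det3 is an assumption made for this example.

```python
# Example1 data: Income is exactly 0.2 * Price.
price = [50, 50, 60, 60, 70, 70, 80, 80]
income = [10, 10, 12, 12, 14, 14, 16, 16]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion (hypothetical helper)."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


# Columns of the design matrix X: intercept, Price, Income.
cols = [[1] * len(price), price, income]

# X'X: each entry is the dot product of two columns of X.
xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]

# Integer arithmetic throughout, so a zero determinant here is exact.
det_xtx = det3(xtx)
print("X'X =", xtx)
print("det(X'X) =", det_xtx)
```

A zero determinant means X'X cannot be inverted, so the normal equations have no unique solution for b0, b1 and b2.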

Example1

The Data Set, Summary Statistics, and Bivariate Regression Analyses

Price           Income            Quantity Demanded
(cents/gal.)    ($1000s/person)   (100s gals./month)
50              10                10.3
50              10                11.8
60              12                11.5
60              12                12.6
70              14                16.0
70              14                9.3
80              16                15.6
80              16                15.8

Summary Statistics

        Price           Income            QD
        (cents/gal.)    ($1000s/person)   (100s gals./month)
Mean    65.00           13.00             12.86
SD      11.95           2.39              2.63

LinEst QD = f(Price)

                Slope     Intercept
Coefficients    0.146     3.405
Estimated SE    0.067     4.433
R-Squared       0.439     2.126     RMSE
F               4.686     6         df
Reg SS          21.170    27.109    SSR

LinEst QD = f(Income)

                Slope     Intercept
Coefficients    0.728     3.405
Estimated SE    0.336     4.433
R-Squared       0.439     2.126     RMSE
F               4.686     6         df
Reg SS          21.170    27.109    SSR

Scroll Right for Multivariate Analyses

Multivariate Analysis: A Pivot Table

Average of Quantity Demanded

               Income
Price          10       12       14       16       Grand Total
50             11.05                               11.05
60                      12.05                      12.05
70                               12.65             12.65
80                                        15.7     15.7
Grand Total    11.05    12.05    12.65    15.7     12.8625

[Chart: Q=f(P): A Demand Curve? Quantity demanded against Price (cents/gallon); fitted line f(x) = 0.15x + 3.41]

[Chart: Q=f(Income): An Engel Curve? Quantity demanded against Income ($1000 p.c. per year); fitted line f(x) = 0.73x + 3.40]

Multivariate Analysis: Multiple Regression

Change b1 and then use Solver to find a least squares solution.

b0 (Intercept)    3.405
b1 (Price)        1
b2 (Income)       -4.273
SSR               27.109

Price of       Income Per    Quantity Demanded    Predicted Y    Residual    Squared
Heating Oil    Capita        of Heating Oil                                  Residual
50             10            10.3                 10.68          -0.38       0.14
50             10            11.8                 10.68          1.12        1.25
60             12            11.5                 12.14          -0.64       0.40
60             12            12.6                 12.14          0.46        0.22
70             14            16.0                 13.59          2.41        5.81
70             14            9.3                  13.59          -4.29       18.40
80             16            15.6                 15.04          0.56        0.31
80             16            15.8                 15.04          0.76        0.57

LinEst Output

                b2          b1        b0
Coefficients    1.3E+014    ###       3.6125
Estimated SE    3.0E+015    ###       ###
R-Squared       0.441       2.328     #N/A      RMSE
F               1.974       5         #N/A      df
Reg SS          21.401      27.099    #N/A      SSR

Excel's Output In this Example Depends on the Version

The bolded zeros tell the user that Excel has "zeroed out" Income.
Excel 2000 or less can't deal with multicollinearity.

The Cause of Multicollinearity: A Perfectly Linear Relationship Between Price and Income in the Data Set

[Chart: Price and Income. Income ($1000 p.c. per year) against Price (cents/gallon); fitted line f(x) = 0.2x + 5.02429586778808E-015]
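
The "change b1, then use Solver" exercise can be imitated outside Excel. The sketch below is a stand-in for Solver, not the workbook's own procedure. It assumes the Example1 data, fixes b1 at a chosen value, and finds the least squares b0 and b2 by regressing QD minus b1 times Price on Income. The function names are hypothetical.

```python
# Example1 data (Income = 0.2 * Price).
price = [50, 50, 60, 60, 70, 70, 80, 80]
income = [10, 10, 12, 12, 14, 14, 16, 16]
qd = [10.3, 11.8, 11.5, 12.6, 16.0, 9.3, 15.6, 15.8]


def ssr(b0, b1, b2):
    """Sum of squared residuals for QD = b0 + b1*Price + b2*Income."""
    return sum((y - (b0 + b1 * p + b2 * i)) ** 2
               for p, i, y in zip(price, income, qd))


def solve_given_b1(b1):
    """Least squares b0 and b2 when b1 is held fixed (stand-in for Solver)."""
    n = len(qd)
    # Move the fixed Price term to the left-hand side.
    z = [y - b1 * p for p, y in zip(price, qd)]
    mean_i = sum(income) / n
    mean_z = sum(z) / n
    sxx = sum((i - mean_i) ** 2 for i in income)
    sxz = sum((i - mean_i) * (v - mean_z) for i, v in zip(income, z))
    b2 = sxz / sxx
    b0 = mean_z - b2 * mean_i
    return b0, b2


for b1 in (0.0, 1.0, 2.0):
    b0, b2 = solve_given_b1(b1)
    print(f"b1={b1:4.1f}  b0={b0:.3f}  b2={b2:.4f}  SSR={ssr(b0, b1, b2):.4f}")
```

Every choice of b1 leads to a different b2 but the same intercept and the same SSR, which is why no single least squares answer exists for this data set.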

Table

b0 (Intercept)    3.405
b1 (Price)        2
b2 (Income)       -9.27

Price    Income    Y       Fitted Y    Residuals    Squared Residuals
50       10        10.3    10.7        -0.4         0.1
50       10        11.8    10.7        1.1          1.3
60       12        11.5    12.1        -0.6         0.4
60       12        12.6    12.1        0.5          0.2
70       14        16.0    13.6        2.4          5.8
70       14        9.3     13.6        -4.3         18.4
80       16        15.6    15.0        0.6          0.3
80       16        15.8    15.0        0.8          0.6
SSR                                                 27.1085

SSR for combinations of b1 (Price, rows) and b2 (Income, columns)

b1 \ b2    11           6            1            -4           -9           -14          -19          -24          -29
-2         27           34,827       139,227      313,227      556,827      870,027      1,252,827    1,705,227    2,227,227
-1         34,827       27           34,827       139,227      313,227      556,827      870,027      1,252,827    1,705,227
0          139,227      34,827       27           34,827       139,227      313,227      556,827      870,027      1,252,827
1          313,227      139,227      34,827       27           34,827       139,227      313,227      556,827      870,027
2          556,827      313,227      139,227      34,827       27           34,827       139,227      313,227      556,827
3          870,027      556,827      313,227      139,227      34,827       27           34,827       139,227      313,227
4          1,252,827    870,027      556,827      313,227      139,227      34,827       27           34,827       139,227
5          1,705,227    1,252,827    870,027      556,827      313,227      139,227      34,827       27           34,827
6          2,227,227    1,705,227    1,252,827    870,027      556,827      313,227      139,227      34,827       27

[Chart: 3-D graph of SSR against b1 Price and b2 Income; vertical axis from 0 to 2,500,000]

intercept    3.4
slope        15

x     y              fitted line    residuals       residuals squared
1     87.31525263    18.4           68.91525263     4749.312046
2     133.3523188    33.4           99.95231878     9990.46603
3     115.3881732    48.4           66.9881732      4487.415349
4     89.51438906    63.4           26.11438906     681.9613161
5     183.0803842    78.4           104.6803842     10957.98284
6     94.71872612    93.4           1.318726119     1.739038577
7     94.87461567    108.4          -13.5253843     182.9360212
8     225.8659367    123.4          102.4659367     10499.26818
9     91.62931046    138.4          -46.7706895     2187.4974
10    153.7234328    153.4          0.323432778     0.104608762
11    180.0987084    168.4          11.69870836     136.8597773
12    211.6802011    183.4          28.28020107     799.7697726
13    173.5201145    198.4          -24.8798855     619.0087033
14    203.3097495    213.4          -10.0902505     101.8131551
SUM                                                 45396.13423
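
The SSR grid on the Table sheet can be recomputed from the Example1 data. The sketch below is an illustration under stated assumptions, not the workbook's formulas. It assumes the displayed b2 values (11, 6, 1, ...) are rounded, and that the unrounded values are the ones that exactly offset each b1 on the grid, namely (slope - b1) / 0.2, where slope is the bivariate Price slope.

```python
# Example1 data (Income = 0.2 * Price).
price = [50, 50, 60, 60, 70, 70, 80, 80]
income = [10, 10, 12, 12, 14, 14, 16, 16]
qd = [10.3, 11.8, 11.5, 12.6, 16.0, 9.3, 15.6, 15.8]


def ssr(b0, b1, b2):
    """Sum of squared residuals for QD = b0 + b1*Price + b2*Income."""
    return sum((y - (b0 + b1 * p + b2 * i)) ** 2
               for p, i, y in zip(price, income, qd))


# Bivariate regression of QD on Price (slope reported as 0.146, intercept 3.405).
n = len(price)
mean_p = sum(price) / n
mean_q = sum(qd) / n
slope = (sum((p - mean_p) * (q - mean_q) for p, q in zip(price, qd))
         / sum((p - mean_p) ** 2 for p in price))
intercept = mean_q - slope * mean_p

b1_values = list(range(-2, 7))
# Assumption: unrounded b2 grid chosen so that b1 + 0.2*b2 equals the slope.
b2_values = [(slope - b1) / 0.2 for b1 in b1_values]

# Rows are b1 values, columns are b2 values; the intercept is held at its fitted value.
grid = [[ssr(intercept, b1, b2) for b2 in b2_values] for b1 in b1_values]

for b1, row in zip(b1_values, grid):
    print(f"{b1:>3}", " ".join(f"{v:>11,.0f}" for v in row))
```

The minimum of about 27 repeats all along the diagonal, so the SSR surface has a flat valley floor rather than a single lowest point.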

Example2

The Data Set, Summary Statistics, and Bivariate Regression Analyses

Price           Income            Quantity Demanded
(cents/gal.)    ($1000s/person)   (100s gals./month)
50              9                 7.1
50              9                 5.7
60              11                5.7
60              11                6.6
70              13                13.2
70              13                9.7
80              15                11.3
80              15                15.6

Summary Statistics

        Price           Income            QD
        (cents/gal.)    ($1000s/person)   (100s gals./month)
Mean    65.00           12.00             9.36
SD      11.95           2.39              3.72

LinEst QD = f(Price)

                Slope     Intercept
Coefficients    0.265     -7.830
Estimated SE    0.067     4.434
R-Squared       0.721     2.126     RMSE
F               15.479    6         df
Reg SS          69.960    27.119    SSR

LinEst QD = f(Income)

                Slope     Intercept
Coefficients    1.323     -6.508
Estimated SE    0.336     4.103
R-Squared       0.721     2.126     RMSE
F               15.479    6         df
Reg SS          69.960    27.119    SSR

Multivariate Analysis: A Pivot Table

Average of Quantity Demanded

               Income
Price          9        11       13       15       Grand Total
50             11.05                               11.05
60                      12.05                      12.05
70                               12.65             12.65
80                                        15.7     15.7
Grand Total    11.05    12.05    12.65    15.7     12.8625

[Chart: Q=f(P): A Demand Curve? Quantity demanded against Price (cents/gallon); fitted line f(x) = 0.26x - 7.83]

[Chart: Q=f(Income): An Engel Curve? Quantity demanded against Income ($1000 p.c. per year); fitted line f(x) = 1.32x - 6.51]

Multivariate Analysis: Multiple Regression

Change b1 and then use Solver to find a least squares solution.

b0 (Intercept)    -26.507
b1 (Price)        4.000
b2 (Income)       -18.678
SSR               27.119

Price of       Income Per    Quantity Demanded    Predicted Y    Residual    Squared
Heating Oil    Capita        of Heating Oil                                  Residual
50             9             7.1                  5.40           1.70        2.91
50             9             5.7                  5.40           0.30        0.09
60             11            5.7                  8.04           -2.34       5.48
60             11            6.6                  8.04           -1.44       2.07
70             13            13.2                 10.68          2.52        6.33
70             13            9.7                  10.68          -0.98       0.97
80             15            11.3                 13.33          -2.03       4.12
80             15            15.6                 13.33          2.27        5.15

LINEST

                b2       b1        Intercept
Coefficients    ###      ###       ###
Estimated SE    ###      ###       ###
R-Squared       0.710    2.299     #N/A      RMSE
F               6.135    5         #N/A      df
Reg SS          64.833   26.418    #N/A      SSR

LINEST results depend on the version of Excel

Excel 2003: Income is zeroed out.
Earlier Versions of Excel: LINEST gives no explicit warning that there is a problem.

The Cause of Multicollinearity: A Perfectly Linear Relationship Between Price and Income in the Data Set

[Chart: Price and Income. Income ($1000 p.c. per year) against Price (cents/gallon); fitted line f(x) = 0.2x - 1]
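
The "slightly more complicated" linear relationship in Example2 can be recovered by regressing Income on Price and checking that the fit is exact. The sketch below is an illustration, not part of the workbook. It also substitutes that relationship into the Solver coefficients reported on the sheet (b0 = -26.507, b1 = 4.000, b2 = -18.678) to show that they collapse to the bivariate Price regression.

```python
# Example2 data.
price = [50, 50, 60, 60, 70, 70, 80, 80]
income = [9, 9, 11, 11, 13, 13, 15, 15]

# Simple regression of Income on Price.
n = len(price)
mean_p = sum(price) / n
mean_i = sum(income) / n
slope = (sum((p - mean_p) * (i - mean_i) for p, i in zip(price, income))
         / sum((p - mean_p) ** 2 for p in price))
intercept = mean_i - slope * mean_p

# Largest gap between actual and fitted Income; zero (up to rounding) means
# the two X variables are perfectly collinear.
max_gap = max(abs(i - (intercept + slope * p)) for p, i in zip(price, income))
print(f"Income = {intercept:.4f} + {slope:.4f} * Price, max gap = {max_gap:.2e}")

# Coefficients as reported on the Example2 sheet (rounded to three decimals).
b0, b1, b2 = -26.507, 4.000, -18.678

# Substitute Income = intercept + slope*Price into b0 + b1*Price + b2*Income.
implied_intercept = b0 + b2 * intercept
implied_slope = b1 + b2 * slope
print(f"Implied QD = {implied_intercept:.3f} + {implied_slope:.4f} * Price")
```

The implied intercept and slope agree, up to rounding of the reported coefficients, with the -7.830 and 0.265 from LinEst QD = f(Price).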

NearMulti

Near-multicollinearity and the 3-D Graph of the Sum of Squared Residuals

b0 (Intercept)    4.3896
b1 (Price)        -2.1856
b2 (Income)       11.5689

Price    Income    Y       Fitted Y    Residuals    Squared Residuals
50       10        10.3    10.8        -0.5         0.2
50       10        11.8    10.8        1.0          1.0
60       12        11.5    12.1        -0.6         0.3
60       12        12.6    12.1        0.5          0.3
70       14        16.0    13.4        2.6          7.0
70       14        9.3     13.4        -4.1         16.5
80       16        15.6    14.6        1.0          0.9
80       16.1      15.8    15.8        0.0          0.0
SSR                                                 26.232

We used the data below to construct the 3D graph of SSR's.

SSR for combinations of b1 (Price, rows) and b2 (Income, columns)

b1 \ b2    22           17         12         7          2          -3         -8
-4.2       27           34,747     139,228    313,469    557,470    871,232    1,254,755
-3.2       34,987       26         34,827     139,388    313,709    557,791    871,634
-2.2       139,546      34,906     26         34,907     139,549    313,951    558,113
-1.2       313,705      139,385    34,826     26         34,988     139,710    314,192
-0.2       557,464      313,464    139,225    34,746     27         35,069     139,872
0.8        870,824      557,144    313,224    139,065    34,667     28         35,151
1.8        1,253,783    870,423    556,823    312,984    138,906    34,588     30

Unrounded b1 values: -4.18564372 -3.18564372 -2.18564372 -1.18564372 -0.18564372 0.814356276 1.814356276

[Chart: 3-D graph of SSR against b1 Price and b2 Income; vertical axis from 0 to 1,400,000]
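
With near multicollinearity the normal equations do have a unique solution, although the SSR surface is very flat around it. The sketch below is illustrative and not the workbook's method. It solves the NearMulti normal equations exactly with Python's fractions module; the function name solve_normal_equations is hypothetical. The exact coefficients it returns differ slightly from the values shown on the sheet (4.3896, -2.1856, 11.5689) while the SSR matches to the displayed precision, which is consistent with a numerical search stopping on a nearly flat surface; that explanation is an assumption, since the source does not say how the sheet values were obtained.

```python
from fractions import Fraction as F

# NearMulti data: identical to Example1 except the last Income value (16.1).
price = [50, 50, 60, 60, 70, 70, 80, 80]
income = ["10", "10", "12", "12", "14", "14", "16", "16.1"]
qd = ["10.3", "11.8", "11.5", "12.6", "16.0", "9.3", "15.6", "15.8"]

X = [[F(1), F(p), F(i)] for p, i in zip(price, income)]
y = [F(v) for v in qd]


def solve_normal_equations(X, y):
    """Solve (X'X) b = X'y by Gauss-Jordan elimination in exact arithmetic."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[a] * r[b] for r in X) for b in range(k)]
         + [sum(r[a] * t for r, t in zip(X, y))]
         for a in range(k)]
    for c in range(k):
        # A perfectly collinear data set would leave no nonzero pivot here.
        pivot = next(r for r in range(c, k) if A[r][c] != 0)
        A[c], A[pivot] = A[pivot], A[c]
        A[c] = [v / A[c][c] for v in A[c]]
        for r in range(k):
            if r != c:
                factor = A[r][c]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[c])]
    return [A[r][k] for r in range(k)]


b0, b1, b2 = solve_normal_equations(X, y)
residuals = [t - (b0 * r[0] + b1 * r[1] + b2 * r[2]) for r, t in zip(X, y)]
ssr = sum(e * e for e in residuals)

print(f"b0={float(b0):.4f}  b1={float(b1):.4f}  b2={float(b2):.4f}")
print(f"SSR={float(ssr):.4f}")
```

Changing the single Income value back from 16.1 to 16 makes X'X singular again, and the pivot search in the function above would then fail.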

Q&A for Multicollinearity.xls

1. Explain how the two bivariate regression results in Example1 are related to each other. Do the same for Example2.

Using the data below, LINEST reports the regression results below. Answer the questions that follow.

X1    X2      Y
22    47      345
28    62      450
28    62      450
29    64.5    467.5
22    47      345
21    44.5    327.5
27    59.5    432.5
27    59.5    432.5
24    52      380
23    49.5    362.5
29    64.5    467.5
21    44.5    327.5
26    57      415
23    49.5    362.5
27    59.5    432.5
20    42      310
24    52      380
21    44.5    327.5
27    59.5    432.5
27    59.5    432.5

LINEST Results

2.666667    10.83333    -18.66667
0.845428    2.11357     6.763424
1           1.7E-014    #N/A
9.0E+031    17          #N/A
52430       5.0E-027    #N/A

Note: If you are not using Excel 2003, you may get different results.
In Excel 2003, the results look like this:

2. What does the zero mean in the top row (cell F7)?
3. Use the regression results to predict the value of Y for the first observation. Show your work and comment on how well the prediction did.
4. Does R-Squared = 1 mean that there is perfect multicollinearity between the X's? Explain.
5. Can you recover the linear relationship between X1 and X2? If so, what is it?
