Carlos M. Carvalho
The University of Texas McCombs School of Business
mccombs.utexas.edu/faculty/carlos.carvalho/teaching
The Multiple Regression Model
Many problems involve more than one independent variable or factor which affects the dependent or response variable.

$$\beta_j = \frac{\partial E[Y \mid X_1, \ldots, X_p]}{\partial X_j}$$

Holding all other variables constant, $\beta_j$ is the average change in $Y$ per unit change in $X_j$.
The MLR Model
If p = 2, we can plot the regression surface in 3D.
Consider sales of a product as predicted by price of this product
(P1) and the price of a competing product (P2).
$$\text{Sales} = \beta_0 + \beta_1\, P1 + \beta_2\, P2 + \varepsilon$$
Least Squares
$$Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2)$$
How do we estimate the MLR model parameters?
The principle of Least Squares is exactly the same as before:
$$b_0 = \hat\beta_0 = 115.72, \quad b_1 = \hat\beta_1 = -97.66, \quad b_2 = \hat\beta_2 = 108.80, \quad s = \hat\sigma = 28.42$$
Plug-in Prediction in MLR
Suppose that by using advanced corporate espionage tactics, I
discover that my competitor will charge $10 the next quarter.
After some marketing analysis I decided to charge $8. How much
will I sell?
Our model is
Sales =
0
+
1
P1 +
2
P2 +
with N(0,
2
)
Our estimates are b
0
= 115, b
1
= 97, b
2
= 109 and s = 28
which leads to
Sales = 115 +97 P1 + 109 P2 +
with N(0, 28
2
)
Plug-in Prediction in MLR
By plugging in the numbers,
$$\text{Sales} = 115 - 97 \times 8 + 109 \times 10 + \varepsilon = 437 + \varepsilon$$
so that
$$\text{Sales} \mid P1 = 8, P2 = 10 \;\sim\; N(437, 28^2)$$
and the 95% prediction interval is $437 \pm 2 \times 28$:
$$381 < \text{Sales} < 493$$
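The interval arithmetic above can be checked in a few lines of Python, taking the slide's point prediction (437) and residual standard error ($s = 28$) as given:

```python
# 95% prediction interval for Sales given P1 = 8, P2 = 10,
# using the slide's point prediction and residual standard error.
point_prediction = 437.0
s = 28.0

# Approximate 95% interval: point prediction +/- 2 residual standard errors.
lower = point_prediction - 2 * s
upper = point_prediction + 2 * s
print(lower, upper)  # -> 381.0 493.0
```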
Least Squares
Just as before, each $b_j$ is our estimate of $\beta_j$.

Fitted values: $\hat Y_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_p X_{pi}$.

Residuals: $e_i = Y_i - \hat Y_i$.

Least squares: find $b_0, b_1, b_2, \ldots, b_p$ to minimize $\sum_{i=1}^n e_i^2$.

In MLR the formulas for the $b_j$'s are too complicated, so we won't talk about them...
Least Squares
p1        p2         Sales      yhat       residuals   b0    b1    b2
5.13567   5.204186   144.4879   184.0963   -39.60838   115   -97   109
3.49546   8.059732   637.2452   654.4512   -17.20597
7.275341  11.67598   620.7869   681.9736   -61.18671
4.662816  8.364421   549.0071   574.4288   -25.42163
3.584537  2.150292   20.42542   1.681753   18.74367
5.167917  10.15304   713.0067   720.3931   -7.386461
3.384091  4.946569   346.7068   325.9192   20.78763
4.293064  7.760569   595.7762   544.4749   51.30139
4.369094  7.428897   457.6469   500.9477   -43.30072
7.2266    10.71132   591.4548   581.5542   9.900659
Excel break: fitted values, residuals, ...
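The same fitted values and residuals can be reproduced outside Excel. A minimal sketch in Python, using the rows of the table above and the slide's rounded estimates $b_0 = 115$, $b_1 = -97$, $b_2 = 109$:

```python
import numpy as np

# The ten rows of the Sales data shown in the table (p1, p2, Sales).
p1 = np.array([5.13567, 3.49546, 7.275341, 4.662816, 3.584537,
               5.167917, 3.384091, 4.293064, 4.369094, 7.2266])
p2 = np.array([5.204186, 8.059732, 11.67598, 8.364421, 2.150292,
               10.15304, 4.946569, 7.760569, 7.428897, 10.71132])
sales = np.array([144.4879, 637.2452, 620.7869, 549.0071, 20.42542,
                  713.0067, 346.7068, 595.7762, 457.6469, 591.4548])

# Fitted values and residuals using the slide's rounded estimates.
b0, b1, b2 = 115.0, -97.0, 109.0
yhat = b0 + b1 * p1 + b2 * p2
resid = sales - yhat

print(yhat[0])   # ~184.10, matching the table's first fitted value
print(resid[0])  # ~-39.61, matching the table's first residual
```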
Residual Standard Error
The calculation of $s^2$ is exactly the same:

$$s^2 = \frac{\sum_{i=1}^n e_i^2}{n - p - 1} = \frac{\sum_{i=1}^n (Y_i - \hat Y_i)^2}{n - p - 1}$$

where $\hat Y_i = b_0 + b_1 X_{1i} + \cdots + b_p X_{pi}$, and $s^2$ is our estimate of $\sigma^2$.
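As a check of the formula, we can recover the residual standard error for the Sales regression from the sums of squares in its Excel ANOVA table (shown a few slides later): SSE = 78335.60 with $n = 100$ and $p = 2$.

```python
import math

# Residual standard error from the ANOVA numbers for the Sales regression:
# SSE = 78335.60, n = 100 observations, p = 2 predictors.
sse = 78335.60
n, p = 100, 2

s2 = sse / (n - p - 1)   # 78335.60 / 97
s = math.sqrt(s2)

print(round(s2, 2))  # -> 807.58 (the "MS Residual" in the Excel output)
print(round(s, 2))   # -> 28.42 (the "Standard Error" in the Excel output)
```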
Residuals in MLR
As in the SLR model, the residuals in multiple regression are purged of any linear relationship to the independent variables. Once again, they are on average zero.

Because the fitted values are an exact linear combination of the Xs, they are not correlated with the residuals.

We decompose $Y$ into the part predicted by $X$ and the part due to idiosyncratic error:

$$Y = \hat Y + e$$
$$\bar e = 0; \quad \text{corr}(X_j, e) = 0; \quad \text{corr}(\hat Y, e) = 0$$
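These properties hold for any least-squares fit with an intercept. A minimal sketch with simulated data (the coefficients and sample size are arbitrary choices for illustration):

```python
import numpy as np

# After a least-squares fit with an intercept, the residuals average zero
# and are uncorrelated with each X and with the fitted values.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0, -3.0]) + rng.normal(size=n)

b, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares coefficients
yhat = X @ b
e = y - yhat

print(e.mean())                       # ~0 (up to floating-point error)
print(np.corrcoef(X[:, 1], e)[0, 1])  # ~0
print(np.corrcoef(yhat, e)[0, 1])     # ~0
```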
Residuals in MLR
Consider the residuals from the Sales data:
[Figure: three scatterplots of the residuals against the fitted values, P1, and P2.]
Fitted Values in MLR
Another great plot for MLR problems is to look at $Y$ (true values) against $\hat Y$ (fitted values).
[Figure: y = Sales plotted against y.hat (MLR: p1 and p2).]
If things are working, these values should form a nice straight line. Can
you guess the slope of the blue line?
Fitted Values in MLR
With just P1...
[Figure: three scatterplots of y = Sales against p1, against y.hat (SLR: p1), and against y.hat (MLR: p1 and p2).]
... and $R^2$ is once again defined as

$$R^2 = \frac{\text{SSR}}{\text{SST}} = 1 - \frac{\text{SSE}}{\text{SST}}$$

telling us the percentage of variation in $Y$ explained by the Xs.

In Excel, $R^2$ is in the same place and "Multiple R" refers to the correlation between $\hat Y$ and $Y$.
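Both forms of the definition give the same answer on the Sales regression's ANOVA numbers (SSR = 6004047.24, SSE = 78335.60, SST = 6082382.84):

```python
# R^2 from the sums of squares in the Excel ANOVA table for the Sales
# regression, computed both ways.
ssr = 6004047.24
sse = 78335.60
sst = 6082382.84

r2_from_ssr = ssr / sst
r2_from_sse = 1 - sse / sst

print(round(r2_from_ssr, 2))  # -> 0.99
print(round(r2_from_sse, 2))  # -> 0.99
```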
Least Squares
$$\text{Sales}_i = \beta_0 + \beta_1\, P1_i + \beta_2\, P2_i + \varepsilon_i$$
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.99
R Square 0.99
Adjusted R Square 0.99
Standard Error 28.42
Observations 100.00
ANOVA
df SS MS F Significance F
Regression 2.00 6004047.24 3002023.62 3717.29 0.00
Residual 97.00 78335.60 807.58
Total 99.00 6082382.84
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 115.72 8.55 13.54 0.00 98.75 132.68
p1 -97.66 2.67 -36.60 0.00 -102.95 -92.36
p2 108.80 1.41 77.20 0.00 106.00 111.60
$R^2 = 0.99$

Multiple R $= r_{Y,\hat Y} = \text{corr}(Y, \hat Y) = 0.99$

Note that $R^2 = \text{corr}(Y, \hat Y)^2$
Back to Baseball
$$R/G = \beta_0 + \beta_1\, \text{OBP} + \beta_2\, \text{SLG} + \varepsilon$$
SUMMARY OUTPUT

Regression Statistics
Multiple R         0.955698
R Square           0.913359
Adjusted R Square  0.906941
Standard Error     0.148627
Observations       30

ANOVA
            df  SS        MS        F          Significance F
Regression  2   6.28747   3.143735  142.31576  4.56302E-15
Residual    27  0.596426  0.02209
Total       29  6.883896

            Coefficients  Standard Error  t Stat     P-value      Lower 95%    Upper 95%
Intercept   -7.014316     0.81991         -8.554984  3.60968E-09  -8.69663241  -5.332
OBP         27.59287      4.003208        6.892689   2.09112E-07  19.37896463  35.80677
SLG         6.031124      2.021542        2.983428   0.005983713  1.883262806  10.17899
$R^2 = 0.913$

Multiple R $= r_{Y,\hat Y} = \text{corr}(Y, \hat Y) = 0.955$

Note that $R^2 = \text{corr}(Y, \hat Y)^2$
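The Multiple R / R Square relationship is easy to verify from the baseball output itself:

```python
# R^2 should equal corr(Y, Yhat)^2, i.e. (Multiple R)^2 should match R Square.
multiple_r = 0.955698
r_square = 0.913359

print(round(multiple_r ** 2, 6))  # -> 0.913359, matching R Square
```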
Intervals for Individual Coefficients

As in SLR, the sampling distribution tells us how close we can expect $b_j$ to be to $\beta_j$.

The LS estimators are unbiased: $E[b_j] = \beta_j$ for $j = 0, \ldots, p$.

The t-stat: $t_j = \dfrac{b_j - \beta_j^0}{s_{b_j}}$ is the number of standard errors between the LS estimate and the null value ($\beta_j^0$).
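For example, the t-stat for testing $\beta_{\text{OBP}} = 0$ can be recomputed from the coefficient and standard error in the baseball output:

```python
# t-stat for OBP under the null beta_OBP = 0, from the Excel output.
b_obp = 27.59287
se_obp = 4.003208

t_obp = (b_obp - 0) / se_obp
print(round(t_obp, 4))  # -> 6.8927, matching the "t Stat" column

# A rough 95% CI using +/- 2 standard errors; the Excel interval
# (19.38, 35.81) uses the exact t cutoff for 27 degrees of freedom.
lower = b_obp - 2 * se_obp
upper = b_obp + 2 * se_obp
print(round(lower, 2), round(upper, 2))
```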
$X_1$ = Handles employee complaints
$X_2$ = Does not allow special privileges
$X_3$ = Opportunity to learn new things
$X_4$ = Raises based on performance
$X_5$ = Too critical of poor performance
$X_6$ = Rate of advancing to better jobs
Supervisor Performance Data
Week V. Slide 19
Applied Regression Analysis
Carlos M. Carvalho
F-tests
Is there any relationship here? Are all the coefficients significant? What about the group of variables considered together?
Why not look at $R^2$?

$R^2$ in MLR is still a measure of goodness of fit. But even when there is no true relationship between the Xs and $Y$, the sample $R^2 > 0$!!

$R^2$ is 26%!!

We usually reject the null when the p-value is less than 5%. Big F $\Rightarrow$ REJECT!
It looks like we should just raise our prices, right? NO, not if
you have taken this statistics class!
Understanding Multiple Regression
How can we see what is going on? Let's compare Sales in two different observations: weeks 82 and 99.

Summary:

1. A larger P1 is associated with larger P2, and the overall effect leads to bigger sales
2. With P2 held fixed, a larger P1 leads to lower sales
3. MLR does the trick and unveils the correct economic
relationship between Sales and prices!
Understanding Multiple Regression
Beer Data (from an MBA class)
Recall from SLR that $\text{Var}(b_1)$ is proportional to $\dfrac{1}{\sum_{i=1}^n (X_i - \bar X)^2}$: the larger the spread $\sum_{i=1}^n (X_i - \bar X)^2$, the smaller the standard error of $b_1$ is.
Understanding Multiple Regression
$$s_{b_1} = \sqrt{\frac{s^2}{\sum_{i=1}^n (X_i - \bar X)^2}} = \sqrt{\frac{2.78^2}{201.85}} = 0.20 \qquad \text{Is this right?}$$
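The arithmetic checks out under the SLR standard-error formula, taking the slide's $s = 2.78$ and $\sum (X_i - \bar X)^2 = 201.85$ as given:

```python
import math

# Standard error of b1 via the SLR formula
# s_b1 = sqrt(s^2 / sum((X_i - Xbar)^2)).
s = 2.78
sxx = 201.85

s_b1 = math.sqrt(s ** 2 / sxx)
print(round(s_b1, 2))  # -> 0.2
```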
Understanding Multiple Regression