Ch. 14: The Multiple Regression Model
Idea: Examine the linear relationship between 1 dependent variable (Y) and 2 or more independent variables (Xi).

Multiple Regression Model with k Independent Variables:

Yi = β0 + β1X1i + β2X2i + … + βkXki + εi

where β0 is the Y-intercept, β1 … βk are the population slopes, and εi is the random error. The coefficients of the multiple regression model are estimated using sample data with k independent variables.






Interpretation of the Slopes (referred to as Net Regression Coefficients):

b1 = the change in the mean of Y per unit change in X1, taking into account the effect of X2 (the effect of X1 net of X2).

b0 = the Y-intercept, interpreted the same way as in simple regression.

The estimated multiple regression equation:

Ŷi = b0 + b1X1i + b2X2i + … + bkXki

where Ŷi is the estimated (predicted) value of Y, b0 is the estimated intercept, and b1 … bk are the estimated slope coefficients.

Graph of a Two-Variable Model (a plane in three dimensions, with axes Y, X1, X2):

Ŷ = b0 + b1X1 + b2X2
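The least-squares fit for a two-variable model can be sketched numerically. The data below are made up for illustration (they are not the chapter's actual sample); the technique is ordinary least squares via numpy:

```python
import numpy as np

# Hypothetical sample: appraised value Y ($ thousands), lot size X1
# (thousands of sq ft), and number of rooms X2.  Illustrative values only.
X1 = np.array([5.0, 6.5, 7.0, 7.5, 8.0, 9.0])
X2 = np.array([5.0, 6.0, 7.0, 7.0, 8.0, 9.0])
Y = np.array([120.0, 150.0, 170.0, 180.0, 200.0, 230.0])

# Design matrix: a column of ones for b0, then one column per X variable.
X = np.column_stack([np.ones_like(X1), X1, X2])

# Least-squares estimates (b0, b1, b2) minimize the sum of squared errors.
b, *_ = np.linalg.lstsq(X, Y, rcond=None)
Y_hat = X @ b   # predicted values
```

Because the design matrix includes an intercept column, the residuals Y − Ŷ always sum to zero.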
Example:

Simple Regression Results (appraised value on Lotsize):

                 Coefficients    Standard Error   t Stat
Intercept (b0)   165.0333581     16.50316094      10.000106
Lotsize (b1)       6.931792143    2.203156234      3.1463008

F-Value: 9.89
Adjusted R Square: 0.108
Standard Error: 36.34

Multiple Regression Results (appraised value on Lotsize and Rooms):

                 Coefficients    Standard Error   t Stat
Intercept         59.32299284    20.20765695      2.935669
Lotsize            3.580936283    1.794731507     1.995249
Rooms             18.25064446     2.681400117     6.806386

F-Value: 31.23
Adjusted R Square: 0.453
Standard Error: 28.47

Check the size and significance level of the coefficients, the F-value, the R-square, etc. You will see what the net effects are.
Using the Equation to Make Predictions

Predict the appraised value at the average lot size (7.24, in thousands of sq ft) and average number of rooms (7.12):

App. Val. = 59.32 + 3.58(7.24) + 18.25(7.12) = 215.18, i.e. $215,180

What is the total effect of a 2,000 sq ft increase in lot size and 2 additional rooms?

Increase in app. value = (3.58)(2) + (18.25)(2) = 43.66, i.e. $43,660
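The prediction arithmetic can be checked directly with the fitted coefficients from the multiple regression output:

```python
# Coefficients from the multiple regression output (value in $ thousands).
b0, b1, b2 = 59.32, 3.58, 18.25   # intercept, Lotsize, Rooms

# Predicted appraised value at the average lot size (7.24) and rooms (7.12).
app_val = b0 + b1 * 7.24 + b2 * 7.12

# Total effect of 2 more lot-size units (2,000 sq ft) and 2 more rooms.
increase = b1 * 2 + b2 * 2
```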
Coefficient of Multiple Determination, r², and Adjusted r²

r² reports the proportion of total variation in Y explained by all the X variables taken together (the model):

r²_Y.12..k = SSR / SST = (regression sum of squares) / (total sum of squares)

Adjusted r²: r² never decreases when a new X variable is added to the model. This can be a disadvantage when comparing models.
What is the net effect of adding a new variable? We lose a degree of freedom when a new X variable is added. Did the new X variable add enough explanatory power to offset the loss of one degree of freedom?

Adjusted r² shows the proportion of variation in Y explained by all X variables, adjusted for the number of X variables used. It penalizes excessive use of unimportant independent variables, is smaller than r², and is useful in comparing models:

r²_adj = 1 − (1 − r²_Y.12..k) × (n − 1) / (n − k − 1)

(where n = sample size, k = number of independent variables)
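A minimal sketch of the r² and adjusted r² computation, using hypothetical sums of squares (not the chapter's data):

```python
# Hypothetical sums of squares and sample size (illustrative only).
SSR, SST = 29460.0, 66150.0   # regression and total sums of squares
n, k = 20, 2                  # sample size, number of X variables

r2 = SSR / SST                                  # coefficient of determination
r2_adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)   # adjusted for k and n
```

Since (n − 1)/(n − k − 1) > 1 whenever k ≥ 1, the adjusted value is always below r².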
Multiple Regression Assumptions

Assumptions:
The errors are normally distributed
Errors have a constant variance
The model errors are independent

Errors (residuals) from the regression model: ei = (Yi − Ŷi)

These residual plots are used in multiple regression:
Residuals vs. Ŷi
Residuals vs. X1i
Residuals vs. X2i
Residuals vs. time (if time series data)

Two-variable model (figure: the regression plane Ŷ = b0 + b1X1 + b2X2 in Y, X1, X2 space, showing a sample observation Yi and its fitted value Ŷi).

The best-fit equation, Ŷ, is found by minimizing the sum of squared errors, Σe².

Residual = ei = (Yi − Ŷi)

Are Individual Variables Significant?

Use t tests of the individual variable slopes. Each test shows whether there is a linear relationship between the variable Xi and Y.

Hypotheses:
H0: βi = 0 (no linear relationship)
H1: βi ≠ 0 (linear relationship does exist between Xi and Y)

Test statistic (with n − k − 1 degrees of freedom):

t = (bi − 0) / S_bi

Confidence interval for the population slope βi:

bi ± t_(n−k−1) S_bi
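The t statistic for the Lotsize slope can be reproduced from the multiple regression output shown earlier:

```python
# Lotsize slope and its standard error from the multiple regression output.
b1, S_b1 = 3.580936283, 1.794731507

# t statistic with n - k - 1 degrees of freedom; matches the printed t Stat.
t_stat = (b1 - 0) / S_b1
```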

Is the Overall Model Significant?

F-Test for Overall Significance of the Model: shows whether there is a linear relationship between all of the X variables considered together and Y. Use the F test statistic.

Hypotheses:
H0: β1 = β2 = … = βk = 0 (no linear relationship)
H1: at least one βi ≠ 0 (at least one independent variable affects Y)

Test statistic:

F = MSR / MSE = (SSR / k) / (SSE / (n − k − 1))
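The F statistic is built from the sums of squares; a short sketch with hypothetical values (not the chapter's data):

```python
# Hypothetical sums of squares for an overall F test (illustrative only).
SSR, SSE = 29460.0, 36690.0
n, k = 20, 2

MSR = SSR / k              # mean square due to regression
MSE = SSE / (n - k - 1)    # mean square error
F = MSR / MSE
```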
Testing Portions of the Multiple Regression Model

To find out whether inclusion of an individual Xj, or a set of X's, significantly improves the model, given that the other independent variables are included in the model. Two measures:
1. Partial F-test criterion
2. The coefficient of partial determination

Contribution of a Single Independent Variable Xj:

SSR(Xj | all variables except Xj)
  = SSR(all variables) − SSR(all variables except Xj)

This measures the contribution of Xj in explaining the total variation in Y (SST). Consider a three-variable model:

SSR(X1 | X2 and X3) = SSR(X1, X2, X3) − SSR(X2 and X3)

Here SSR(X1, X2, X3) is the SSR of the unrestricted (UR) model and SSR(X2 and X3) is the SSR of the restricted (R) model.
The Partial F-Test Statistic

Consider the hypothesis test:
H0: variable Xj does not significantly improve the model after all other variables are included
H1: variable Xj significantly improves the model after all other variables are included

F = [(SSR_UR − SSR_R) / (number of restrictions)] / MSE_UR,  where MSE_UR = SSE_UR / (n − k − 1)

Note that the numerator is the contribution of Xj to the regression.
If the actual F statistic is greater than the critical F, the conclusion is: reject H0; adding Xj does improve the model.
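A minimal sketch of the partial F computation, again with hypothetical sums of squares (the restricted/unrestricted split mirrors dropping Rooms from a Lotsize + Rooms model):

```python
# Partial F test: does adding one variable (say Rooms) improve a model
# that already contains Lotsize?  All sums of squares are hypothetical.
SSR_UR, SSE_UR = 29460.0, 36690.0   # unrestricted model (both variables)
SSR_R = 14300.0                     # restricted model (Lotsize only)
n, k = 20, 2                        # k = number of X's in the unrestricted model
restrictions = 1                    # one variable was dropped

MSE_UR = SSE_UR / (n - k - 1)
partial_F = ((SSR_UR - SSR_R) / restrictions) / MSE_UR
```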
Coefficient of Partial Determination for One or a Set of Variables

Measures the proportion of total variation in the dependent variable (SST) that is explained by Xj while controlling for (holding constant) the other explanatory variables:

r²_Yj.(all variables except j) = (SSR_UR − SSR_R) / (SST − SSR_R)
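Using the same hypothetical sums of squares as above, the coefficient of partial determination is a one-liner:

```python
# Hypothetical sums of squares: unrestricted (both X's) and restricted models.
SST = 66150.0
SSR_UR, SSR_R = 29460.0, 14300.0

# Share of the variation not explained by the restricted model that the
# added variable accounts for.
r2_partial = (SSR_UR - SSR_R) / (SST - SSR_R)
```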
Using Dummy Variables
A dummy variable is a categorical
explanatory variable with two levels:
yes or no, on or off, male or female
coded as 0 or 1
Regression intercepts are different if the
variable is significant
Assumes equal slopes for other variables
If more than two levels, the number of
dummy variables needed is (number of
levels - 1)

Different Intercepts, Same Slope

Example: Y = sales (appraised value), X1 = lot size, X2 = fire place dummy.

Fire place (X2 = 1):    Ŷ = b0 + b1X1 + b2(1) = (b0 + b2) + b1X1
No fire place (X2 = 0): Ŷ = b0 + b1X1 + b2(0) = b0 + b1X1

The two lines have the same slope b1 but different intercepts, b0 + b2 and b0. If H0: β2 = 0 is rejected, then Fire Place has a significant effect on values.
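The intercept-shift behavior is easy to see numerically. The coefficients below are hypothetical, chosen only to illustrate the dummy-variable mechanics:

```python
# Hypothetical fitted coefficients: b2 is the fire-place dummy's coefficient.
b0, b1, b2 = 50.0, 3.5, 12.0

def predicted_value(lotsize, fireplace):
    # fireplace is the 0/1 dummy variable
    return b0 + b1 * lotsize + b2 * fireplace

# At any lot size, the two lines differ by exactly b2: same slope,
# different intercepts (b0 + b2 vs. b0).
gap = predicted_value(7.0, 1) - predicted_value(7.0, 0)
```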
Interaction Between Explanatory
Variables
Hypothesizes interaction between pairs of X variables
Response to one X variable may vary at different levels of
another X variable
Contains two-way cross product terms



Effect of Interaction

Without an interaction term, the effect of X1 on Y is measured by β1. With an interaction term, the effect of X1 on Y is measured by β1 + β3X2, so the effect changes as X2 changes:

Ŷ = b0 + b1X1 + b2X2 + b3X3 = b0 + b1X1 + b2X2 + b3(X1X2)
Example: Suppose X2 is a dummy variable and the estimated regression equation is

Ŷ = 1 + 2X1 + 3X2 + 4X1X2

The slopes are different if the effect of X1 on Y depends on the value of X2:

X2 = 1: Ŷ = 1 + 2X1 + 3(1) + 4X1(1) = 4 + 6X1
X2 = 0: Ŷ = 1 + 2X1 + 3(0) + 4X1(0) = 1 + 2X1
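The slide's example equation can be evaluated directly to confirm the two slopes:

```python
# The slide's estimated equation with an interaction term.
def y_hat(x1, x2):
    return 1 + 2 * x1 + 3 * x2 + 4 * x1 * x2

# Slope of X1 = change in Y-hat per unit change in X1, holding X2 fixed.
slope_when_x2_is_0 = y_hat(1, 0) - y_hat(0, 0)   # b1 = 2
slope_when_x2_is_1 = y_hat(1, 1) - y_hat(0, 1)   # b1 + b3 = 6
```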
