
Important Questions

Discuss the purpose of regression analysis.

Explain how hypothesis testing can be used to determine the significance of a linear relationship between X and Y.

Discuss the role that partial regression coefficients and beta coefficients play in the regression equation.

Explain the procedure for measuring the strength of association in multiple regression and its interpretation.

What approaches exist for determining the relative importance of predictor variables in accounting for variation in the dependent variable?

Discuss the problems arising out of multicollinearity while measuring the relative importance of factors.
Multiple Regression
The general form of the multiple regression model is as follows:

Y = β0 + β1X1 + β2X2 + β3X3 + . . . + βkXk + e

which is estimated by the following equation:

Ŷ = a + b1X1 + b2X2 + b3X3 + . . . + bkXk

As before, the coefficient a represents the intercept, but the b's are now the partial regression coefficients.
Statistics Associated with Multiple
Regression
Adjusted R². The coefficient of multiple determination, R², is adjusted for the number of independent variables and the sample size to account for diminishing returns. After the first few variables, additional independent variables do not make much of a contribution.
Coefficient of multiple determination. The strength of association in multiple regression is measured by the square of the multiple correlation coefficient, R², which is also called the coefficient of multiple determination.
F test. The F test is used to test the null hypothesis that the coefficient of multiple determination in the population, R²pop, is zero. This is equivalent to testing the null hypothesis that all the partial regression coefficients are zero. The test statistic has an F distribution with k and (n − k − 1) degrees of freedom.
Statistics Associated with Multiple
Regression
Partial F test. The significance of a partial regression coefficient, βi, of Xi may be tested using an incremental F statistic. The incremental F statistic is based on the increment in the explained sum of squares resulting from the addition of the independent variable Xi to the regression equation after all the other independent variables have been included.
Partial regression coefficient. The partial regression coefficient, b1, denotes the change in the predicted value, Ŷ, per unit change in X1 when the other independent variables, X2 to Xk, are held constant.
Conducting Multiple Regression Analysis
Partial Regression Coefficients

Let us consider a case in which there are two independent variables, so that:

Ŷ = a + b1X1 + b2X2
The interpretation of the partial regression coefficient, b1, is that it represents the expected change in Y when X1 is changed by one unit but X2 is held constant or otherwise controlled. Likewise, b2 represents the expected change in Y for a unit change in X2 when X1 is held constant. Thus, calling b1 and b2 partial regression coefficients is appropriate.

It can also be seen that the combined effects of X1 and X2 on Y are additive. In other words, if X1 and X2 are each changed by one unit, the expected change in Y would be (b1 + b2).
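A minimal numeric sketch (Python) of this additivity, using made-up coefficients a = 5, b1 = 2, and b2 = 3 purely for illustration:

```python
# Hypothetical two-predictor equation: Y-hat = a + b1*X1 + b2*X2,
# with illustrative coefficients a = 5, b1 = 2, b2 = 3.
def y_hat(x1, x2, a=5.0, b1=2.0, b2=3.0):
    return a + b1 * x1 + b2 * x2

base = y_hat(10.0, 20.0)
# Increasing X1 and X2 by one unit each shifts the prediction by b1 + b2.
print(y_hat(11.0, 21.0) - base)  # 5.0
```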
Conducting Multiple Regression Analysis
Strength of Association
The strength of association is measured by the square of the multiple correlation coefficient, R², which is also called the coefficient of multiple determination.

R² = SSreg / SSy

R² is adjusted for the number of independent variables and the sample size by using the following formula:

Adjusted R² = R² − k(1 − R²)/(n − k − 1)
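A minimal sketch (Python) of these two formulas, plugged with the sums of squares from Table 17.3 later in these notes (k = 2 predictors, n = 12 cases):

```python
# R-squared and adjusted R-squared from sums of squares.
def r_squared(ss_reg, ss_y):
    return ss_reg / ss_y

def adjusted_r_squared(r2, n, k):
    return r2 - k * (1 - r2) / (n - k - 1)

# Figures from Table 17.3: SSreg = 114.26425, SSres = 6.65241.
r2 = r_squared(114.26425, 114.26425 + 6.65241)
print(r2)                             # ~0.94498
print(adjusted_r_squared(r2, 12, 2))  # ~0.93276
```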
Conducting Multiple Regression Analysis
Significance Testing
H0: R²pop = 0

This is equivalent to the following null hypothesis:

H0: β1 = β2 = β3 = . . . = βk = 0

The overall test can be conducted by using an F statistic:

F = (SSreg / k) / (SSres / (n − k − 1))
  = (R² / k) / ((1 − R²) / (n − k − 1))

which has an F distribution with k and (n − k − 1) degrees of freedom.
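A minimal sketch (Python) of the overall F test computed from R² alone, again using the Table 17.3 figures (R² = 0.94498, k = 2, n = 12); scipy is used only for the p-value:

```python
from scipy import stats

def overall_f(r2, n, k):
    # F = (R^2 / k) / ((1 - R^2) / (n - k - 1))
    return (r2 / k) / ((1 - r2) / (n - k - 1))

F = overall_f(0.94498, 12, 2)
p = stats.f.sf(F, 2, 12 - 2 - 1)  # upper-tail probability
print(F, p)                       # F ~ 77.29, p ~ 0.0000
```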


Conducting Multiple Regression Analysis
Significance Testing
Testing for the significance of the βi's can be done in a manner similar to that in the bivariate case by using t tests. The significance of a partial coefficient, for example the coefficient for importance attached to weather, may be tested by the following statistic:

t = b / SEb

which has a t distribution with n − k − 1 degrees of freedom.
Multiple Linear Regression Equation
Too complicated by hand! Ouch!
Multiple Regression Model: Example

Develop a model for estimating the heating oil used for a single-family home in the month of January, based on average temperature and amount of insulation in inches.

Oil (Gal)   Temp (°F)   Insulation (in)
275.30      40          3
363.80      27          3
164.30      40          10
40.80       73          6
94.30       64          6
230.90      34          6
366.70      9           6
300.60      8           10
237.80      23          10
121.40      63          3
31.40       65          10
203.50      41          6
441.10      21          3
323.00      38          3
52.50       58          10
Interpretation of Estimated Coefficients
Slope (bi)
The estimated average value of Y changes by bi for each one-unit increase in Xi, holding all other variables constant.
Example: if b1 = −2, then fuel oil usage (Y) is expected to decrease by an estimated two gallons for each one-degree increase in temperature (X1), given the inches of insulation (X2).
Y-intercept (b0)
The estimated average value of Y when all Xi = 0.
Sample Multiple Regression Equation: Example

Ŷi = b0 + b1X1i + b2X2i + . . . + bkXki

Excel Output:
               Coefficients
Intercept      562.1510092
X Variable 1    -5.436580588
X Variable 2   -20.01232067

Ŷi = 562.151 − 5.437X1i − 20.012X2i

For each degree increase in temperature, the estimated average amount of heating oil used decreases by 5.437 gallons, holding insulation constant. For each one-inch increase in insulation, the estimated average use of heating oil decreases by 20.012 gallons, holding temperature constant.
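A minimal sketch (Python + numpy) that refits this model by ordinary least squares on the 15 observations listed earlier; it should reproduce coefficients close to the Excel output above:

```python
import numpy as np

oil   = np.array([275.3, 363.8, 164.3, 40.8, 94.3, 230.9, 366.7,
                  300.6, 237.8, 121.4, 31.4, 203.5, 441.1, 323.0, 52.5])
temp  = np.array([40, 27, 40, 73, 64, 34, 9, 8, 23, 63, 65, 41, 21, 38, 58],
                 dtype=float)
insul = np.array([3, 3, 10, 6, 6, 6, 6, 10, 10, 3, 10, 6, 3, 3, 10],
                 dtype=float)

# Design matrix with an intercept column, then ordinary least squares.
X = np.column_stack([np.ones(len(oil)), temp, insul])
b, *_ = np.linalg.lstsq(X, oil, rcond=None)
print(b)  # expect approximately [562.151, -5.437, -20.012]
```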
Multiple Regression in PHStat
PHStat | Regression | Multiple Regression
See the Excel spreadsheet for the heating oil example.
t Test Statistic
Excel Output: Example

               Coefficients    Standard Error   t Stat
Intercept      562.1510092     21.09310433       26.65093769
X Variable 1    -5.436580588    0.336216167     -16.16989642   (t test statistic for X1, Temperature)
X Variable 2   -20.01232067     2.342505227      -8.543127434  (t test statistic for X2, Insulation)

t = bi / Sbi
t Test: Example Solution
Does temperature have a significant effect on monthly consumption of heating oil? Test at α = 0.05.

H0: β1 = 0
H1: β1 ≠ 0
df = 12
Critical values: ±2.1788 (two-tailed, .025 in each tail)

Test statistic: t = −16.1699
Decision: Reject H0 at α = 0.05.
Conclusion: There is evidence of a significant effect of temperature on oil consumption.
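A minimal sketch (Python + scipy) of this t test, using the values from the Excel output above:

```python
from scipy import stats

b, se, df = -5.436580588, 0.336216167, 12   # slope, std. error, n - k - 1
t = b / se                                  # ~ -16.1699
crit = stats.t.ppf(0.975, df)               # ~ 2.1788 for alpha = 0.05
p = 2 * stats.t.sf(abs(t), df)              # two-tailed p-value
print(t, crit, p, abs(t) > crit)            # abs(t) > crit, so reject H0
```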
Coefficient of Partial Determination for Xk

r²(Yk.all others) = SSR(Xk | all others) / [SST − SSR(all) + SSR(Xk | all others)]

Measures the proportion of variation in the dependent variable that is explained by Xk while controlling for (holding constant) the other independent variables.
Coefficient of Partial Determination for Xk (continued)
Example: two-independent-variable model

r²(Y1.2) = SSR(X1 | X2) / [SST − SSR(X1, X2) + SSR(X1 | X2)]
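A minimal sketch (Python + numpy) computing this quantity for the heating-oil data via two nested least-squares fits; the incremental (partial) F statistic for adding X1 falls out of the same pieces. The data arrays repeat the ones listed earlier:

```python
import numpy as np

oil   = np.array([275.3, 363.8, 164.3, 40.8, 94.3, 230.9, 366.7,
                  300.6, 237.8, 121.4, 31.4, 203.5, 441.1, 323.0, 52.5])
temp  = np.array([40, 27, 40, 73, 64, 34, 9, 8, 23, 63, 65, 41, 21, 38, 58],
                 dtype=float)
insul = np.array([3, 3, 10, 6, 6, 6, 6, 10, 10, 3, 10, 6, 3, 3, 10],
                 dtype=float)

def sse(X, y):
    # Residual sum of squares from an OLS fit.
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ b) ** 2))

ones = np.ones(len(oil))
sse_reduced = sse(np.column_stack([ones, insul]), oil)        # X2 only
sse_full    = sse(np.column_stack([ones, temp, insul]), oil)  # X1 and X2
ssr_x1_given_x2 = sse_reduced - sse_full

# The denominator SST - SSR(X1, X2) + SSR(X1 | X2) simplifies to the
# reduced model's SSE, so:
r2_partial = ssr_x1_given_x2 / sse_reduced
# Incremental F for adding X1, with (1, n - k - 1) degrees of freedom:
F_inc = ssr_x1_given_x2 / (sse_full / (len(oil) - 2 - 1))
print(r2_partial, F_inc)
```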
Venn Diagrams and Explanatory Power of Regression
[Venn diagram: circles for Oil and Temp. The part of the Oil circle not overlapping Temp is the variation in oil explained by the error term (SSE); the part of the Temp circle not overlapping Oil is the variation in temp not used in explaining variation in oil; the overlap is the variation in oil explained by temp, i.e., the variation in temp used in explaining variation in oil (SSR).]
Venn Diagrams and Explanatory Power of Regression (continued)
[Venn diagram: the same Oil and Temp circles, with the overlap highlighted.]

r² = SSR / (SSR + SSE)
Venn Diagrams and Explanatory Power of Regression
[Venn diagram: circles for Oil, Temp, and Insulation. The part of Oil overlapping neither predictor is the variation NOT explained by Temp nor Insulation (SSE); the overlapping variation in both Temp and Insulation is used in explaining the variation in Oil, but NOT in the estimation of β1 nor β2.]
Coefficient of Multiple Determination
Proportion of total variation in Y explained by all X variables taken together:

r²(Y.12...k) = SSR / SST = Explained Variation / Total Variation

It never decreases when a new X variable is added to the model, which is a disadvantage when comparing models.
Venn Diagrams and Explanatory Power of Regression
[Venn diagram: circles for Oil, Temp, and Insulation, with all regions of Oil overlapping either predictor highlighted.]

r²(Y.12) = SSR / (SSR + SSE)
Venn Diagrams and Estimation of Regression Model
[Venn diagram: only the variation in Oil shared exclusively with Temp is used in the estimation of β1, and only the variation in Oil shared exclusively with Insulation is used in the estimation of β2; the variation that Temp and Insulation share with each other is NOT used in the estimation of β1 nor β2.]
Venn Diagrams and Coefficient of Partial Determination for Xk

r²(Y1.2) = SSR(X1 | X2) / [SST − SSR(X1, X2) + SSR(X1 | X2)]

[Venn diagram: the corresponding regions of the Oil, Temp, and Insulation circles.]
Table 17.3
Multiple Regression

Multiple R        0.97210
R²                0.94498
Adjusted R²       0.93276
Standard Error    0.85974

ANALYSIS OF VARIANCE
             df   Sum of Squares   Mean Square
Regression    2   114.26425        57.13213
Residual      9   6.65241          0.73916
F = 77.29364     Significance of F = 0.0000

VARIABLES IN THE EQUATION
Variable     b         SEb       Beta (β)   T       Significance of T
IMPOR        0.28865   0.08608   0.31382    3.353   0.0085
DURATION     0.48108   0.05895   0.76363    8.160   0.0000
(Constant)   0.33732   0.56736              0.595   0.5668
Stepwise Regression
The purpose of stepwise regression is to select, from a large number of predictor variables, a small subset of variables that account for most of the variation in the dependent or criterion variable. In this procedure, the predictor variables enter or are removed from the regression equation one at a time. There are several approaches to stepwise regression.

Forward inclusion. Initially, there are no predictor variables in the regression equation. Predictor variables are entered one at a time, only if they meet certain criteria specified in terms of the F ratio. The order in which the variables are included is based on their contribution to the explained variance. (A sketch of forward inclusion follows below.)

Backward elimination. Initially, all the predictor variables are included in the regression equation. Predictors are then removed one at a time based on the F ratio for removal.
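A minimal sketch (Python + numpy) of forward inclusion. F_ENTER is a hypothetical entry threshold chosen here for illustration; statistical packages expose a similar F-to-enter setting:

```python
import numpy as np

F_ENTER = 4.0  # hypothetical F-to-enter threshold

def sse(X, y):
    # Residual sum of squares from an OLS fit.
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ b) ** 2))

def forward_inclusion(y, predictors):
    """predictors: dict mapping variable name -> 1-D numpy array."""
    n = len(y)
    selected, remaining = [], dict(predictors)
    while remaining:
        cols = [np.ones(n)] + [predictors[s] for s in selected]
        sse_current = sse(np.column_stack(cols), y)
        best_name, best_F = None, F_ENTER
        for name, x in remaining.items():
            sse_new = sse(np.column_stack(cols + [x]), y)
            df_res = n - (len(selected) + 1) - 1
            F = (sse_current - sse_new) / (sse_new / df_res)
            if F > best_F:  # track the candidate with the largest F ratio
                best_name, best_F = name, F
        if best_name is None:
            break  # no remaining variable meets the entry criterion
        selected.append(best_name)
        del remaining[best_name]
    return selected
```

For the heating-oil data this would be called as, e.g., forward_inclusion(oil, {"temp": temp, "insul": insul}).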
Multicollinearity
How to detect it?
Multicollinearity arises when intercorrelations among the predictors are very high.
A low tolerance (below 0.5) indicates a multicollinearity problem.
Tolerance = 1 − Ri², where Ri² is the squared multiple correlation of the variable with the other independent variables. When tolerance is 0, the variable is a linear combination of the other independent variables, so the estimate of the variable's regression coefficient is unstable.
VIF = 1/Tolerance; a value close to 1 is desirable.
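A minimal sketch (Python + numpy) of these diagnostics, regressing each predictor on all the others to obtain its Ri²:

```python
import numpy as np

def tolerance_and_vif(X):
    """X: (n, p) matrix of predictor columns (no intercept column)."""
    n, p = X.shape
    results = {}
    for i in range(p):
        others = np.column_stack([np.ones(n), np.delete(X, i, axis=1)])
        b, *_ = np.linalg.lstsq(others, X[:, i], rcond=None)
        resid = X[:, i] - others @ b
        r2_i = 1.0 - resid.var() / X[:, i].var()  # squared multiple correlation
        tol = 1.0 - r2_i                          # tolerance = 1 - Ri^2
        results[i] = (tol, 1.0 / tol)             # (tolerance, VIF)
    return results
```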
Multicollinearity
IMPLICATIONS
Multicollinearity can result in several problems, including:
The partial regression coefficients may not be estimated precisely.
It becomes difficult to assess the relative importance of the independent variables in explaining the variation in the dependent variable.
Predictor variables may be incorrectly included in or removed from stepwise regression.
Multicollinearity
A simple procedure for adjusting for multicollinearity
consists of using only one of the variables in a highly
correlated set of variables.
Alternatively, the set of independent variables can be
transformed into a new set of predictors that are mutually
independent by using techniques such as principal
components analysis.
More specialized techniques, such as ridge regression and
latent root regression, can also be used.
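As one concrete illustration of the second remedy, here is a minimal principal-components sketch (Python; scikit-learn's PCA is assumed as the decomposition tool): the correlated predictors are replaced by mutually uncorrelated component scores, and Y is regressed on those scores instead.

```python
import numpy as np
from sklearn.decomposition import PCA

def principal_components_fit(X, y, n_components):
    """X: (n, p) correlated predictors; returns OLS coefficients on the scores."""
    scores = PCA(n_components=n_components).fit_transform(X)  # uncorrelated columns
    A = np.column_stack([np.ones(len(y)), scores])
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    return b
```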
Relative Importance of Predictors
Unfortunately, because the predictors are correlated, there is no unambiguous measure of the relative importance of the predictors in regression analysis. However, several approaches are commonly used to assess the relative importance of predictor variables.

Statistical significance. If the partial regression coefficient of a variable is not significant, as determined by an incremental F test, that variable is judged to be unimportant. An exception to this rule is made if there are strong theoretical reasons for believing that the variable is important.
Relative Importance of Predictors
Measures based on standardized coefficients or beta weights. The most commonly used measures are the absolute values of the beta weights, |βi|, or the squared values, βi².
Stepwise regression. The order in which the predictors enter or are removed from the regression equation is used to infer their relative importance.
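A minimal sketch (Python + numpy) of beta weights obtained by rescaling the unstandardized slopes, βi = bi · (sXi / sY):

```python
import numpy as np

def beta_weights(b_slopes, X, y):
    """b_slopes: fitted slopes (no intercept); X: (n, p) predictors; y: criterion."""
    return b_slopes * X.std(axis=0, ddof=1) / y.std(ddof=1)
```

For the heating-oil fit from the earlier sketch, beta_weights(np.array([-5.437, -20.012]), np.column_stack([temp, insul]), oil) would return the standardized importance weights for temperature and insulation.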
