Let's start by generating a basic Multiple Linear Regression Analysis for data supplied by
Levine et al. from an investigation of Heating Oil Usage (for the
month of January) based upon two suggested Independent Variables:
- Average daily temperature, in degrees Fahrenheit, where the randomly selected home was
located; and
- Amount of insulation, in inches, in the randomly selected home during the
month the data were gathered.
We can begin by generating the sample model using the ENTER method, which enters all of
the variables at once. (The Forward, Backward, or Stepwise methods would usually be
preferable, since we obtain an R-squared change at each step.) The output is:
[Tables: Variables Entered/Removed; Model Summary]
ANOVA (b)
Model                   Sum of Squares  df   Mean Square   F         Sig.
1  Regression (SSR)     228014.6        2    114007.3      168.471   .000 (a)
   Residual (SSE)       8120.603        12   676.717
   Total (SST)          236135.2        14
a. Predictors: (Constant), Inches of Attic Insulation, Average Daily Temperature (F) - January
b. Dependent Variable: Oil Consumption - January
Coefficients (a)
                                            Unstandardized        Standardized                   95% Confidence Interval for B
Model                                       B         Std. Error  Beta          t         Sig.   Lower Bound  Upper Bound  Symbol
1  (Constant)                               562.151   21.093                    26.651    .000   516.193      608.109      b0
   Average Daily Temperature (F) - January  -5.437    .336        -.866         -16.170   .000   -6.169       -4.704       b1
   Inches of Attic Insulation               -20.012   2.343       -.457         -8.543    .000   -25.116      -14.908      b2
a. Dependent Variable: Oil Consumption - January
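So, we can interpret the results as:
$$\hat{Y}_i = 562.151 - 5.437\,X_{1i} - 20.012\,X_{2i}$$
where X1 is the average daily temperature (F) and X2 is the inches of attic insulation.
So, for example, for houses with 6 inches of insulation that experienced an
average temperature of 30 degrees for the month of January, we would
infer that the average heating oil used would be 278.9798 gallons. (The coefficients as
rounded in the table give approximately 278.97; the small difference presumably reflects
rounding.)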
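The prediction above can be reproduced from the coefficients printed in the Coefficients table. The following is a minimal sketch; the function name `predict_oil` is illustrative and not part of the SPSS output, and the coefficients are the three-decimal values shown in the table, so the result agrees with the quoted figure only to about two decimal places.

```python
# Coefficients as printed in the SPSS Coefficients table (three decimals).
B0 = 562.151    # (Constant)
B1 = -5.437     # Average Daily Temperature (F) - January
B2 = -20.012    # Inches of Attic Insulation


def predict_oil(temp_f, insulation_in):
    """Predicted January oil consumption (gallons) from the fitted equation."""
    return B0 + B1 * temp_f + B2 * insulation_in


# A home with a 30 degree F average temperature and 6 inches of attic insulation.
predicted = predict_oil(30, 6)
print(f"Predicted oil consumption: {predicted:.2f} gallons")
```

Because both slopes are negative, raising either the temperature or the insulation depth lowers the predicted consumption.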
* Using Table Curve 3, we can portray the Multiple Regression equation as:
Forward Inclusion
The Forward Inclusion method of Multiple Regression enters each significant variable
into the equation, in the order of greatest to least magnitude of effect on the dependent
variable. It has three advantages over the previous method displayed:
- we can judge the relative importance of each variable, even if all ‘end up’ in the
equation;
Variables Entered/Removed (a)
Model  Variables Entered                        Variables Removed  Method
1      Average Daily Temperature (F) - January  .                  Forward (Criterion: Probability-of-F-to-enter <= .050)
2      Inches of Attic Insulation               .                  Forward (Criterion: Probability-of-F-to-enter <= .050)
a. Dependent Variable: Oil Consumption - January
Model Summary
                                                                          Change Statistics
Model  R         R Square  Adjusted R Square  Std. Error of the Estimate  R Square Change  F Change  df1  df2  Sig. F Change
1      .870 (a)  .756      .738               66.5125                     .756             40.377    1    13   .000
2      .983 (b)  .966      .960               26.0138                     .209             72.985    1    12   .000
a. Predictors: (Constant), Average Daily Temperature (F) - January
b. Predictors: (Constant), Average Daily Temperature (F) - January, Inches of Attic Insulation
ANOVA (c)
Model             Sum of Squares  df   Mean Square   F         Sig.
1  Regression     178624.4        1    178624.424    40.377    .000 (a)
   Residual       57510.805       13   4423.908
   Total          236135.2        14
2  Regression     228014.6        2    114007.313    168.471   .000 (b)
   Residual       8120.603        12   676.717
   Total          236135.2        14
a. Predictors: (Constant), Average Daily Temperature (F) - January
b. Predictors: (Constant), Average Daily Temperature (F) - January, Inches of Attic
Insulation
c. Dependent Variable: Oil Consumption - January
Coefficients (a)
                                            Unstandardized        Standardized                   95% Confidence Interval for B
Model                                       B         Std. Error  Beta          t         Sig.   Lower Bound  Upper Bound
1  (Constant)                               436.438   38.640                    11.295    .000   352.962      519.914
   Average Daily Temperature (F) - January  -5.462    .860        -.870         -6.354    .000   -7.319       -3.605
2  (Constant)                               562.151   21.093                    26.651    .000   516.193      608.109
   Average Daily Temperature (F) - January  -5.437    .336        -.866         -16.170   .000   -6.169       -4.704
   Inches of Attic Insulation               -20.012   2.343       -.457         -8.543    .000   -25.116      -14.908
a. Dependent Variable: Oil Consumption - January
Excluded Variables (b)
                                                                               Collinearity Statistics
Model                          Beta In   t        Sig.   Partial Correlation  Tolerance
1  Inches of Attic Insulation  -.457 (a) -8.543   .000   -.927                1.000
a. Predictors in the Model: (Constant), Average Daily Temperature (F) - January
b. Dependent Variable: Oil Consumption - January
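As shown by this output, Temperature is far more influential in its effect on Heating Oil
Consumption than Attic Insulation: Temperature enters first and by itself yields an
R Square of .756, while adding Attic Insulation raises R Square by a further .209 (to .966).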
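The Model Summary figures can be cross-checked from the ANOVA sums of squares. The sketch below assumes the standard textbook formulas for R Square, Adjusted R Square, and the F Change statistic; the function names are illustrative, and the sample size of 15 is inferred from the Total df of 14.

```python
SST = 236135.2        # Total sum of squares
SSR_TEMP = 178624.4   # Regression SS, Model 1 (Temperature only)
SSR_BOTH = 228014.6   # Regression SS, Model 2 (Temperature and Insulation)
N = 15                # number of homes (Total df of 14, plus 1)


def r_square(ssr, sst=SST):
    """Proportion of total variation explained by the model."""
    return ssr / sst


def adjusted_r_square(r2, k, n=N):
    """R Square adjusted for k predictors and n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def f_change(r2_full, r2_reduced, k_full, k_added=1, n=N):
    """F statistic for the gain in R Square when k_added predictors enter."""
    return ((r2_full - r2_reduced) / k_added) / ((1 - r2_full) / (n - k_full - 1))


r2_model1 = r_square(SSR_TEMP)
r2_model2 = r_square(SSR_BOTH)
r2_change = r2_model2 - r2_model1
f_change_model2 = f_change(r2_model2, r2_model1, k_full=2)

print(f"Model 1: R Square {r2_model1:.3f}, adjusted {adjusted_r_square(r2_model1, 1):.3f}")
print(f"Model 2: R Square {r2_model2:.3f}, adjusted {adjusted_r_square(r2_model2, 2):.3f}")
print(f"R Square Change {r2_change:.3f}, F Change {f_change_model2:.3f}")
```

The F Change for Model 2 is also the square of the t statistic for Attic Insulation in the Coefficients table, which is why the two tests lead to the same conclusion.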
Residual Analysis
- Residuals / Standardized Residuals versus Y' (the predicted values): a pattern in this
plot would show that the data may not be linear, and that a transformation of at least
one X variable, or the Y variable, may be in order.
- If the data were collected in time order, a plot of the Residuals by Time, together
with the calculation of the Durbin-Watson statistic, would be employed.
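For reference, the Durbin-Watson statistic mentioned above is the sum of squared differences between successive residuals divided by the sum of squared residuals. The sketch below uses hypothetical residuals, not the Heating Oil residuals (which are not listed in this handout), and the function name is illustrative.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in time order."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residuals in time order (NOT taken from the Heating Oil data).
hypothetical_residuals = [4.0, 3.0, 1.5, -0.5, -2.0, -3.5, -1.0, 2.0]
dw = durbin_watson(hypothetical_residuals)
print(f"Durbin-Watson statistic: {dw:.3f}")
```

The statistic lies between 0 and 4. Values near 2 suggest no first-order autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation.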
We first note the absence of any pattern in the plot of the Predicted Values versus the
Residuals:
[Figures: three scatterplots with Unstandardized Residual on the vertical axis; the horizontal-axis scales run from 0 to 400, 0 to 80, and 2 to 12]
To test the other underlying assumptions detailed in the Simple Regression
material, we can also show the 'standard' plots employed at this point. Three
plots have been generated to show that the observed Y values plotted against the
Unstandardized, Standardized, and Studentized Residuals produce the same general
distribution and lead to the same conclusions:
[Figures: residual plots with the axis labels Frequency, Standardized Residual, Regression Studentized Residual, Expected Cum Prob, and Unstandardized Residual]
Next, we can use the Table below (also presented earlier) to make the following
observations:
Coefficients (a)
                                            Unstandardized        Standardized                   95% Confidence Interval for B
Model                                       B         Std. Error  Beta          t         Sig.   Lower Bound  Upper Bound
1  (Constant)                               436.438   38.640                    11.295    .000   352.962      519.914
   Average Daily Temperature (F) - January  -5.462    .860        -.870         -6.354    .000   -7.319       -3.605
2  (Constant)                               562.151   21.093                    26.651    .000   516.193      608.109
   Average Daily Temperature (F) - January  -5.437    .336        -.866         -16.170   .000   -6.169       -4.704
   Inches of Attic Insulation               -20.012   2.343       -.457         -8.543    .000   -25.116      -14.908
a. Dependent Variable: Oil Consumption - January
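The t statistics and 95% confidence bounds in the Model 2 rows follow from B and its Std. Error. The sketch below assumes the usual formulas (t = B / Std. Error, interval = B plus or minus the critical t times Std. Error) and a critical t of 2.1788 for the 12 residual degrees of freedom, taken from a standard t table; the function name is illustrative. Because the inputs are the rounded table values, the results match the table only to about two decimals.

```python
T_CRIT_12_DF = 2.1788  # two-sided 95% critical value of t with 12 df (t table)


def t_and_interval(b, std_error, t_crit=T_CRIT_12_DF):
    """Return (t statistic, lower bound, upper bound) for one coefficient."""
    t_stat = b / std_error
    half_width = t_crit * std_error
    return t_stat, b - half_width, b + half_width


# Model 2 rows of the Coefficients table (rounded values as printed).
temp_t, temp_lower, temp_upper = t_and_interval(-5.437, 0.336)
insul_t, insul_lower, insul_upper = t_and_interval(-20.012, 2.343)

print(f"Temperature: t = {temp_t:.2f}, 95% CI ({temp_lower:.3f}, {temp_upper:.3f})")
print(f"Insulation:  t = {insul_t:.2f}, 95% CI ({insul_lower:.3f}, {insul_upper:.3f})")
```

Neither interval contains zero, which agrees with the Sig. values of .000 reported for both variables.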
If we had not conducted a Forward Inclusion analysis, we would at this point wish to
test for the contribution of each individual variable to the Regression Model. This
approach provides slightly more information than the Forward method, in that two
independent and distinct Model and ANOVA Tables are generated, and an F statistic can be
generated for each variable. Many statisticians find, however, that a Forward Inclusion
approach provides all of the data necessary for an analysis of Independent Variable
contribution.
[Tables: Variables Entered/Removed; Model Summary]
ANOVA (b)
Model             Sum of Squares  df   Mean Square   F         Sig.
1  Regression     178624.4        1    178624.424    40.377    .000 (a)
   Residual       57510.805       13   4423.908
   Total          236135.2        14
a. Predictors: (Constant), Average Daily Temperature (F) - January
b. Dependent Variable: Oil Consumption - January
Coefficients (a)
                                            Unstandardized        Standardized                   95% Confidence Interval for B
Model                                       B         Std. Error  Beta          t         Sig.   Lower Bound  Upper Bound
1  (Constant)                               436.438   38.640                    11.295    .000   352.962      519.914
   Average Daily Temperature (F) - January  -5.462    .860        -.870         -6.354    .000   -7.319       -3.605
a. Dependent Variable: Oil Consumption - January
[Tables: Variables Entered/Removed; Model Summary]
ANOVA (b)
Model             Sum of Squares  df   Mean Square   F         Sig.
1  Regression     51076.465       1    51076.465     3.588     .081 (a)
   Residual       185058.8        13   14235.290
   Total          236135.2        14
a. Predictors: (Constant), Inches of Attic Insulation
b. Dependent Variable: Oil Consumption - January
Coefficients (a)
                                            Unstandardized        Standardized                   95% Confidence Interval for B
Model                                       B         Std. Error  Beta          t         Sig.   Lower Bound  Upper Bound
1  (Constant)                               345.378   74.691                    4.624     .000   184.019      506.738
   Inches of Attic Insulation               -20.350   10.743      -.465         -1.894    .081   -43.560      2.859
a. Dependent Variable: Oil Consumption - January
Then, at this point, we would use the Sums of Squares from the three ANOVA tables to
test for the contribution of each variable, after its companion variable has been added to
the Multiple Regression equation.
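Given:
SSR(X1 and X2) = 228,014.63 @ 2 df
MSE = 676.72 @ 12 df
SSR(X1) = SSRTemp = 178,624.42 @ 1 df
SSR(X2) = SSRInsulation = 51,076.46 @ 1 df
Then, for the contribution of variable X2 (Insulation) given X1 (Temperature) has been included:
$$SSR(X_2 \mid X_1) = SSR(X_1 \text{ and } X_2) - SSR(X_1) = 228{,}014.63 - 178{,}624.42 = 49{,}390.21$$
so
$$F = \frac{SSR(X_2 \mid X_1)}{MSE} = \frac{49{,}390.21}{676.72} \approx 72.98$$
Contribution of variable X1 given X2 has been included:
$$SSR(X_1 \mid X_2) = SSR(X_1 \text{ and } X_2) - SSR(X_2) = 228{,}014.63 - 51{,}076.46 = 176{,}938.17$$
So:
$$F = \frac{SSR(X_1 \mid X_2)}{MSE} = \frac{176{,}938.17}{676.72} \approx 261.46$$
Each F statistic has 1 and 12 df, and both far exceed the .05 critical value of 4.75.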
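The partial F tests can be sketched in a few lines using the quantities listed under "Given". This assumes the standard partial F statistic, SSR(Xj | other) / MSE, with 1 and 12 degrees of freedom; the critical value of 4.75 is the tabled upper 5% point of F(1, 12), and the function name is illustrative.

```python
SSR_BOTH = 228014.63    # SSR with Temperature and Insulation in the model
SSR_TEMP = 178624.42    # SSR with Temperature only
SSR_INSUL = 51076.46    # SSR with Insulation only
MSE = 676.72            # residual mean square of the two-variable model (12 df)
F_CRIT_1_12 = 4.75      # tabled upper 5% point of F with 1 and 12 df


def partial_f(ssr_full, ssr_other, mse=MSE):
    """Return (SSR contributed by the added variable, partial F statistic)."""
    ssr_contribution = ssr_full - ssr_other
    return ssr_contribution, ssr_contribution / mse


ssr_temp_given_insul, f_temp = partial_f(SSR_BOTH, SSR_INSUL)
ssr_insul_given_temp, f_insul = partial_f(SSR_BOTH, SSR_TEMP)

for name, f_stat in [("Temperature | Insulation", f_temp),
                     ("Insulation | Temperature", f_insul)]:
    verdict = "significant" if f_stat > F_CRIT_1_12 else "not significant"
    print(f"{name}: F = {f_stat:.2f} ({verdict} at the .05 level)")
```

Note that Insulation on its own is not significant (F = 3.588 in its single-variable ANOVA table), yet it contributes significantly once Temperature is already in the model.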
Similar to the results shown when conducting a Forward Inclusion analysis, both
variables are significant. Having found this, we can also calculate Coefficients of Partial
Determination, which measure the proportion of the variation in Y explained by each
variable while the other variable is held constant. SPSS will automatically calculate
the corresponding Partial correlations for us when that option is toggled:
Coefficients (a)
                                            Unstandardized        Standardized                   95% Confidence Interval for B  Correlations
Model                                       B         Std. Error  Beta          t         Sig.   Lower Bound  Upper Bound  Zero-order  Partial  Part
1  (Constant)                               562.151   21.093                    26.651    .000   516.193      608.109
   Average Daily Temperature (F) - January  -5.437    .336        -.866         -16.170   .000   -6.169       -4.704       -.870       -.978    -.866
   Inches of Attic Insulation               -20.012   2.343       -.457         -8.543    .000   -25.116      -14.908      -.465       -.927    -.457
a. Dependent Variable: Oil Consumption - January
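Squaring the Partial correlations in this table gives the Coefficients of Partial Determination:
$$r^2_{Y1.2} = (-.978)^2 \approx .956$$
and
$$r^2_{Y2.1} = (-.927)^2 \approx .859$$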
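The same coefficients can be obtained directly from the sums of squares. The sketch below assumes the textbook definition, SSR(Xj | other) / (SST - SSR(both) + SSR(Xj | other)); the function name is illustrative, and the results agree with the squared Partial correlations only up to rounding.

```python
SST = 236135.2          # Total sum of squares
SSR_BOTH = 228014.63    # SSR with both variables in the model
SSR_TEMP = 178624.42    # SSR with Temperature only
SSR_INSUL = 51076.46    # SSR with Insulation only


def partial_determination(ssr_full, ssr_other, sst=SST):
    """Share of the variation left unexplained by the other variable
    that is explained by adding this variable."""
    ssr_contribution = ssr_full - ssr_other
    sse_full = sst - ssr_full
    return ssr_contribution / (sse_full + ssr_contribution)


r2_temp_given_insul = partial_determination(SSR_BOTH, SSR_INSUL)
r2_insul_given_temp = partial_determination(SSR_BOTH, SSR_TEMP)

print(f"Temperature, holding Insulation constant: {r2_temp_given_insul:.3f}")
print(f"Insulation, holding Temperature constant: {r2_insul_given_temp:.3f}")
```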
One of the major threats to Multiple Regression analysis concerns the possibility of
collinearity, the situation where independent variables are highly correlated with one
another, which can lead to spurious predictions. For years, statisticians (refer to the
handout from the old SPSS manual) recommended that, as a first step, a correlation
matrix be generated among the Independent Variables so that (by observation) one could
determine whether a "very high" relationship existed among the variables. Of course, it
was always difficult to determine objectively how high was "too high".
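The Variance Inflation Factor (VIF) offers a more objective measure. For each explanatory variable Xj:
$$VIF_j = \frac{1}{1 - R_j^2}$$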
where $R_j^2$ is the coefficient of multiple determination of explanatory variable Xj with all
other explanatory variables. In the case of just two variables, $R_1^2$ is simply the Coefficient
of Determination between the two. In the case of the Heating Oil data, $r_{T \times I} = 0.00892$, so
$$VIF_1 = VIF_2 = \frac{1}{1 - (0.00892)^2} \approx 1.00$$
SPSS, when 'Collinearity Diagnostics' is toggled, will calculate this value for us:
Coefficients (a)
                                            Unstandardized        Standardized                   Collinearity Statistics
Model                                       B         Std. Error  Beta          t         Sig.   Tolerance  VIF
1  (Constant)                               562.151   21.093                    26.651    .000
   Average Daily Temperature (F) - January  -5.437    .336        -.866         -16.170   .000   1.000      1.000
   Inches of Attic Insulation               -20.012   2.343       -.457         -8.543    .000   1.000      1.000
a. Dependent Variable: Oil Consumption - January
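The Tolerance and VIF columns can be checked by hand for the two-variable case. The sketch below assumes the definitions VIF = 1 / (1 - R-squared) and Tolerance = 1 / VIF, with R-squared taken as the square of the correlation between Temperature and Insulation quoted above; the function name is illustrative.

```python
def vif_from_r_squared(r_squared):
    """Variance Inflation Factor for a predictor whose coefficient of
    determination with the other predictors is r_squared."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must be in [0, 1)")
    return 1.0 / (1.0 - r_squared)


r_temp_insul = 0.00892                  # correlation between the two predictors
vif = vif_from_r_squared(r_temp_insul ** 2)
tolerance = 1.0 / vif                   # equivalently 1 - R-squared

print(f"VIF = {vif:.3f}, Tolerance = {tolerance:.3f}")
```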
Here’s an example from another Multiple Regression Analysis, where there were more
than two independent variables involved:
Coefficients (a)
                Unstandardized        Standardized                   Collinearity Statistics
Model           B         Std. Error  Beta          t         Sig.   Tolerance  VIF
1  (Constant)   -330.832  110.895                   -2.983    .007
   Total Staff  1.246     .412        .529          3.023     .006   .586       1.707
   REMOTE       -.118     .054        -.324         -2.180    .041   .811       1.233
   DUBNER       -.297     .118        -.408         -2.519    .020   .685       1.459
   Total Labor  .131      .059        .417          2.200     .039   .500       1.999
a. Dependent Variable: STANDBY
If a set of variables is uncorrelated, then each VIF will be equal to (or very close to) 1.00. If
a set of variables is highly correlated, a VIF might exceed 10. Some statisticians suggest
that if the VIF exceeds 10, then alternatives to the generated model should be explored.
More conservative statisticians suggest that 5 is a more appropriate maximum threshold
value.
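Screening a model against these thresholds is a simple comparison. The sketch below applies both cutoffs to the VIF values from the STANDBY example above; the function name and the dictionary layout are illustrative choices, not SPSS output.

```python
# VIF values copied from the STANDBY Coefficients table.
standby_vifs = {
    "Total Staff": 1.707,
    "REMOTE": 1.233,
    "DUBNER": 1.459,
    "Total Labor": 1.999,
}


def flag_collinear(vifs, threshold=10.0):
    """Return the names of predictors whose VIF exceeds the threshold."""
    return [name for name, vif in vifs.items() if vif > threshold]


for threshold in (10.0, 5.0):
    flagged = flag_collinear(standby_vifs, threshold)
    print(f"VIF > {threshold:g}: {flagged if flagged else 'none'}")
```

With every VIF below 2, no predictor in the STANDBY model is flagged under either the 10 or the more conservative 5 cutoff.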