
Example of a Sequence of Activities Associated with Multiple Regression Analysis

Sample Problem: Predicting Home Heating Oil Usage
(Statistics for Managers - Levine et al.)

Let's start by generating a basic Multiple Linear Regression Analysis for data supplied by
Levine et al. from an investigation of Heating Oil Usage (for the Month of January),
based upon two suggested Independent Variables:

- Average daily temperature in Fahrenheit where the randomly selected home was
located; and
- Amount (in Inches) of Insulation in the randomly selected home during the
month the data were gathered.

We can begin by generating the sample model using the ENTER command (versus the
Forward, Backward, or Stepwise commands, which would usually be preferable, since
they report an R-squared change at each step), where:

Yi' = b0 + b1X1i + b2X2i
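Outside of SPSS, the same ENTER-style fit can be reproduced in a few lines of Python
with statsmodels. This is only a sketch: the file name heating_oil.csv and the column
names oil, temp, and insulation are assumptions for illustration, not part of the Levine
data set as distributed.

    # Sketch of the ENTER (all-variables-at-once) fit using statsmodels.
    # File and column names are assumed for illustration.
    import pandas as pd
    import statsmodels.api as sm

    heating = pd.read_csv("heating_oil.csv")                  # 15 sampled homes (assumed file)
    X = sm.add_constant(heating[["temp", "insulation"]])      # adds b0; b1 = temp, b2 = insulation
    model = sm.OLS(heating["oil"], X).fit()

    print(model.summary())    # R, R Square, ANOVA F, coefficients, and 95% CIs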

Variables Entered/Removed (b)

  Model   Variables Entered                            Variables Removed   Method
  1       Inches of Attic Insulation,                  .                   Enter
          Average Daily Temperature (F) - January (a)

  a. All requested variables entered.
  b. Dependent Variable: Oil Consumption - January

Model Summary

  Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
  1       .983 (a)  .966       .960                26.0138

  a. Predictors: (Constant), Inches of Attic Insulation,
     Average Daily Temperature (F) - January
ANOVA (b)

  Model 1             Sum of Squares   df   Mean Square   F         Sig.
  Regression (SSR)        228014.6      2     114007.3    168.471   .000 (a)
  Residual   (SSE)          8120.603   12        676.717
  Total      (SST)        236135.2     14

  a. Predictors: (Constant), Inches of Attic Insulation, Average Daily Temperature (F) - January
  b. Dependent Variable: Oil Consumption - January

Coefficients (a)

                                        Unstandardized          Standardized                      95% Confidence
                                        Coefficients            Coefficients                      Interval for B
  Model 1                               B          Std. Error   Beta          t         Sig.      Lower Bound   Upper Bound
  (Constant)                     (b0)   562.151    21.093                     26.651    .000      516.193       608.109
  Average Daily Temperature
    (F) - January                (b1)    -5.437      .336       -.866        -16.170    .000       -6.169        -4.704
  Inches of Attic Insulation     (b2)   -20.012     2.343       -.457         -8.543    .000      -25.116       -14.908

  a. Dependent Variable: Oil Consumption - January
So, we can interpret the results as:

* Y'i = 562.151 - 5.43658 Tempi - 20.0123 Inchesi

     so, for example, for a house with 6 inches of insulation that experienced an
     average temperature of 30 degrees for the month of January, we would
     infer that the expected heating oil used would be 278.9798 gallons
     (see the computational sketch following this list)

* Oil Consumption is expected to decrease by 5.44 Gallons per Month for every
     increase of 1 degree in average Temperature, for any given amount of attic
     insulation (that is, with amount of insulation accounted for, or held constant)

* Oil Consumption is expected to decrease by 20.01 Gallons per Month for every
     increase of 1 inch of insulation, for any given house experiencing an average
     temperature at a given value (that is, with temperature accounted for, or
     held constant)

* R2, the Coefficient of Multiple Determination, shows that 96.56% of the
     variability in Heating Oil Usage is explained by Temperature and Insulation.
     Adjusting for the number of predictors in the model and the sample size
     employed, our estimate of adjusted R2 = 0.96
* Using TableCurve 3D, we can portray the Multiple Regression equation as a
     response surface (surface plot not reproduced here)
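As a quick arithmetic check of the worked example above (278.98 gallons), here is a
short calculation using the fitted coefficients copied from the Coefficients table:

    # Predicted January oil use for a 30-degree month and 6 inches of insulation.
    b0, b1, b2 = 562.151, -5.43658, -20.0123

    temp_f, insulation = 30, 6
    predicted_gallons = b0 + b1 * temp_f + b2 * insulation
    print(round(predicted_gallons, 4))    # 278.9798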
Forward Inclusion

The Forward Inclusion method of Multiple Regression enters each significant variable
into the equation, in the order of greatest to least magnitude of effect on the dependent
variable. It has three advantages over the previous method displayed:

- we can judge the relative importance of each variable, even if all 'end up' in the
     equation;

- we obtain the R2 change at each step, allowing us to judge if the variability
     explained by the inclusion of each variable makes it worth considering /
     monitoring; and

- insignificant variables are not included in the equation.

For this example, we would obtain:

Variables Entered/Removed (a)

  Model   Variables Entered                           Variables Removed   Method
  1       Average Daily Temperature (F) - January     .                   Forward (Criterion:
                                                                          Probability-of-F-to-enter <= .050)
  2       Inches of Attic Insulation                  .                   Forward (Criterion:
                                                                          Probability-of-F-to-enter <= .050)

  a. Dependent Variable: Oil Consumption - January

Model Summary

                                               Std. Error of   R Square
  Model   R         R Square   Adjusted R Sq.  the Estimate    Change     F Change   df1   df2   Sig. F Change
  1       .870 (a)  .756       .738            66.5125         .756       40.377     1     13    .000
  2       .983 (b)  .966       .960            26.0138         .209       72.985     1     12    .000

  a. Predictors: (Constant), Average Daily Temperature (F) - January
  b. Predictors: (Constant), Average Daily Temperature (F) - January, Inches of Attic Insulation

ANOVA (c)

  Model                Sum of Squares   df   Mean Square   F         Sig.
  1   Regression          178624.4       1    178624.424   40.377    .000 (a)
      Residual             57510.805    13      4423.908
      Total               236135.2      14
  2   Regression          228014.6       2    114007.313   168.471   .000 (b)
      Residual              8120.603    12       676.717
      Total               236135.2      14

  a. Predictors: (Constant), Average Daily Temperature (F) - January
  b. Predictors: (Constant), Average Daily Temperature (F) - January, Inches of Attic Insulation
  c. Dependent Variable: Oil Consumption - January

Coefficients (a)

                                        Unstandardized          Standardized                      95% Confidence
                                        Coefficients            Coefficients                      Interval for B
  Model                                 B          Std. Error   Beta          t         Sig.      Lower Bound   Upper Bound
  1   (Constant)                        436.438    38.640                     11.295    .000      352.962       519.914
      Average Daily Temperature
        (F) - January                    -5.462      .860       -.870         -6.354    .000       -7.319        -3.605
  2   (Constant)                        562.151    21.093                     26.651    .000      516.193       608.109
      Average Daily Temperature
        (F) - January                    -5.437      .336       -.866        -16.170    .000       -6.169        -4.704
      Inches of Attic Insulation        -20.012     2.343       -.457         -8.543    .000      -25.116       -14.908

  a. Dependent Variable: Oil Consumption - January
Excluded Variables (b)

                                                               Partial       Collinearity Statistics
  Model 1                         Beta In     t        Sig.    Correlation   Tolerance
  Inches of Attic Insulation      -.457 (a)   -8.543   .000    -.927         1.000

  a. Predictors in the Model: (Constant), Average Daily Temperature (F) - January
  b. Dependent Variable: Oil Consumption - January

As shown by this output, Temperature is far more influential in its effect on Heating Oil
Consumption than Attic Insulation.
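For readers working outside of SPSS, the Forward criterion (probability-of-F-to-enter
<= .050) can be approximated with a short selection loop. This is a rough sketch, not
SPSS's exact implementation, and it reuses the assumed heating DataFrame and column
names from the earlier sketch.

    # Rough forward-inclusion loop: at each step, add the remaining variable with the
    # smallest p-value, provided it meets the entry criterion (p <= .05). For a single
    # added variable, the coefficient's t-test p-value equals the F-to-enter p-value.
    import statsmodels.api as sm

    def forward_select(df, response, candidates, p_enter=0.05):
        selected, remaining = [], list(candidates)
        while remaining:
            pvals = {}
            for var in remaining:
                X = sm.add_constant(df[selected + [var]])
                pvals[var] = sm.OLS(df[response], X).fit().pvalues[var]
            best = min(pvals, key=pvals.get)
            if pvals[best] > p_enter:
                break
            selected.append(best)
            remaining.remove(best)
        return selected

    print(forward_select(heating, "oil", ["temp", "insulation"]))   # expect temp first, then insulation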

Residual Analysis

Generally, the next step in Multiple Regression Analysis is to generate a series of
Residual Plots:

- Residuals / Standardized Residuals versus Y': a pattern here shows that the data
     may not be linear, and that a transformation of at least one X variable, or the
     Y variable, may be in order

- Residuals / Standardized Residuals versus each X (Independent Variable): the
     presence of a pattern would show the need to transform the variable, given
     evidence of a non-linear effect

- If the data were collected in a time order, Residuals would be plotted by Time,
     and the Durbin-Watson statistic would be calculated (a sketch of these checks
     in Python appears below).
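A sketch of those checks, reusing the fitted model and assumed column names from the
earlier sketch (matplotlib for the plots; the Durbin-Watson statistic from statsmodels):

    # Residuals versus predicted values and versus each X, plus Durbin-Watson.
    import matplotlib.pyplot as plt
    from statsmodels.stats.stattools import durbin_watson

    resid = model.resid
    panels = [(model.fittedvalues, "Unstandardized Predicted Value"),
              (heating["temp"], "Average Daily Temperature (F) - January"),
              (heating["insulation"], "Inches of Attic Insulation")]

    for x, label in panels:
        plt.figure()
        plt.scatter(x, resid)
        plt.axhline(0, linestyle="--")
        plt.xlabel(label)
        plt.ylabel("Unstandardized Residual")

    print("Durbin-Watson:", durbin_watson(resid))   # meaningful only for time-ordered data
    plt.show()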

We first note the absence of any pattern in the plot of the Predicted Values versus the
Residuals:
[Scatterplot: Unstandardized Residual versus Unstandardized Predicted Value]

Next, as previously described:

[Scatterplot: Unstandardized Residual versus Average Daily Temperature (F) - January]


[Scatterplot: Unstandardized Residual versus Inches of Attic Insulation]

Testing for our other underlying assumptions as detailed in the Simple Regression
material, we can also show the 'standard' plots employed at this point. Three plots have
been generated to show that the observed Y values plotted against the Unstandardized,
Standardized, and Studentized Residuals produce the same general distribution and lead
to the same conclusions:

[Normal P-P Plot of Regression Standardized Residual - Dependent Variable: Oil Consumption - January]
[Histogram of Regression Standardized Residual - Dependent Variable: Oil Consumption - January (Std. Dev = .93, Mean = 0.00, N = 15)]
[Scatterplots: Standardized, Studentized, and Unstandardized Residuals versus Oil Consumption - January]


Testing the Significance of the Entire (Final) Multiple Regression Model
and Inferences About the Population Regression Coefficients

Testing the hypothesis that:

     H0: β1 = β2 = 0    (there is no linear relationship between the dependent
                         variable and the explanatory variables)
     H1: at least one βj ≠ 0

Seeing that F = 168.47 and p = 0.000, we reject the Null Hypothesis.
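The overall F statistic is simply the ratio of the two mean squares in the ANOVA table;
a quick check (scipy is used only for the p-value):

    # Overall F test: F = MS(Regression) / MS(Residual), with (2, 12) degrees of freedom.
    from scipy.stats import f

    msr, mse = 114007.3, 676.717
    F = msr / mse                       # about 168.47
    p = f.sf(F, 2, 12)                  # upper-tail p-value, effectively 0.000
    print(round(F, 2), p)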

Next, we can use the Table below (also presented earlier) to make the following
observations:

Coefficients (a)

                                        Unstandardized          Standardized                      95% Confidence
                                        Coefficients            Coefficients                      Interval for B
  Model                                 B          Std. Error   Beta          t         Sig.      Lower Bound   Upper Bound
  1   (Constant)                        436.438    38.640                     11.295    .000      352.962       519.914
      Average Daily Temperature
        (F) - January                    -5.462      .860       -.870         -6.354    .000       -7.319        -3.605
  2   (Constant)                        562.151    21.093                     26.651    .000      516.193       608.109
      Average Daily Temperature
        (F) - January                    -5.437      .336       -.866        -16.170    .000       -6.169        -4.704
      Inches of Attic Insulation        -20.012     2.343       -.457         -8.543    .000      -25.116       -14.908

  a. Dependent Variable: Oil Consumption - January

* we reject the hypothesis that β1 = 0 (t = -16.17; p = 0.000). Our point estimate
     for this value is -5.437, and our 95% CI for the Slope (β1) is -6.169 to
     -4.704, taking into account the effect of Insulation

* we reject the hypothesis that β2 = 0 (t = -8.543; p = 0.000). Our point estimate
     for this value is -20.012, and our 95% CI for the Slope (β2) is -25.116 to
     -14.908, taking into account the effect of Temperature
Testing Portions or Sub-Components of the Multiple Regression Model

If we have not conducted a Forward Inclusion analysis, we would at this point wish to
test for the contribution of each individual variable to the Regression Model. This
approach provides slightly more information than the Forward method, in that a separate
Model Summary and ANOVA table is generated for each variable, and an F statistic can
be computed for each. Many statisticians find, however, that a Forward Inclusion
approach provides all of the data necessary for an analysis of Independent Variable
contribution.

The ‘ENTER / Component’ approach would yield:

Variables Entered/Removed (b)

  Model   Variables Entered                           Variables Removed   Method
  1       Average Daily Temperature (F) - January     .                   Enter

  a. All requested variables entered.
  b. Dependent Variable: Oil Consumption - January

Model Summary

  Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
  1       .870 (a)  .756       .738                66.5125

  a. Predictors: (Constant), Average Daily Temperature (F) - January

ANOVA (b)

  Model 1          Sum of Squares   df   Mean Square   F        Sig.
  Regression          178624.4       1    178624.424   40.377   .000 (a)
  Residual             57510.805    13      4423.908
  Total               236135.2      14

  a. Predictors: (Constant), Average Daily Temperature (F) - January
  b. Dependent Variable: Oil Consumption - January

Coefficients (a)

                                    Unstandardized          Standardized                      95% Confidence
                                    Coefficients            Coefficients                      Interval for B
  Model 1                           B          Std. Error   Beta          t         Sig.      Lower Bound   Upper Bound
  (Constant)                        436.438    38.640                     11.295    .000      352.962       519.914
  Average Daily Temperature
    (F) - January                    -5.462      .860       -.870         -6.354    .000       -7.319        -3.605

  a. Dependent Variable: Oil Consumption - January
Variables Entered/Removed (b)

  Model   Variables Entered            Variables Removed   Method
  1       Inches of Attic Insulation   .                   Enter

  a. All requested variables entered.
  b. Dependent Variable: Oil Consumption - January

Model Summary

  Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
  1       .465 (a)  .216       .156                119.3117

  a. Predictors: (Constant), Inches of Attic Insulation

ANOVA (b)

  Model 1          Sum of Squares   df   Mean Square   F       Sig.
  Regression           51076.465     1     51076.465   3.588   .081 (a)
  Residual            185058.8      13     14235.290
  Total               236135.2      14

  a. Predictors: (Constant), Inches of Attic Insulation
  b. Dependent Variable: Oil Consumption - January

Coefficients (a)

                                    Unstandardized          Standardized                      95% Confidence
                                    Coefficients            Coefficients                      Interval for B
  Model 1                           B          Std. Error   Beta          t         Sig.      Lower Bound   Upper Bound
  (Constant)                        345.378    74.691                      4.624    .000      184.019       506.738
  Inches of Attic Insulation        -20.350    10.743       -.465         -1.894    .081      -43.560         2.859

  a. Dependent Variable: Oil Consumption - January

Then, at this point, we would use the Sums of Squares from the three ANOVA tables to
test for the contribution of each variable, after its companion variable has been added to
the Multiple Regression equation.

Given:

     SSR              = 228,014.63 @ 2 df
     MSE              =     676.72 @ 12 df

     SSR(Temp)        = 178,624.42 @ 1 df
     SSR(Insulation)  =  51,076.46 @ 1 df

Then:

     SSR(Xk | all other variables) = SSR(all variables) - SSR(all variables except Xk)

so

Contribution of variable X1 given X2 has been included:

     SSR(X1 | X2) = SSR(X1 and X2) - SSR(X2)

and

Contribution of variable X2 given X1 has been included:

     SSR(X2 | X1) = SSR(X1 and X2) - SSR(X1)

So:

Contribution of Temperature (X1) After Insulation Has Been Added:

     SSR(X1 | X2) = 228,015 - 51,076 = 176,939

     then FPartial = 176,939 / 676.717 = 261.47; p = 0.000

Contribution of Insulation (X2) After Temperature Has Been Added:

     SSR(X2 | X1) = 228,015 - 178,624 = 49,391

     then FPartial = 49,391 / 676.717 = 72.99; p = 0.000
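The same arithmetic in a short sketch (sums of squares copied from the three ANOVA
tables; scipy supplies the p-values):

    # Partial F tests: extra regression sum of squares divided by the full-model MSE.
    from scipy.stats import f

    ssr_full   = 228014.63      # SSR(X1 and X2)
    ssr_temp   = 178624.42      # SSR(X1): Temperature alone
    ssr_insul  =  51076.46      # SSR(X2): Insulation alone
    mse_full   = 676.717        # full-model MSE, 12 df

    for label, extra in [("Temperature | Insulation", ssr_full - ssr_insul),
                         ("Insulation | Temperature", ssr_full - ssr_temp)]:
        F_partial = extra / mse_full
        print(label, round(F_partial, 2), f.sf(F_partial, 1, 12))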

Similar to the results shown when conducting a Forward Inclusion analysis, both
variables are significant. Having found this, we can also calculate Coefficients of Partial
Determination, which break down the Coefficient of Multiple Determination into the
component coefficients associated with each variable. SPSS will automatically calculate
these values for us when that option is toggled.

The Coefficients may be obtained as:

Coefficients (a)

                                    Unstandardized         Standardized                  95% Confidence Interval     Correlations
                                    Coefficients           Coefficients                  for B
  Model 1                           B         Std. Error   Beta      t        Sig.       Lower Bound  Upper Bound    Zero-order  Partial  Part
  (Constant)                        562.151   21.093                 26.651   .000        516.193      608.109
  Average Daily Temperature
    (F) - January                    -5.437     .336       -.866    -16.170   .000         -6.169       -4.704       -.870       -.978    -.866
  Inches of Attic Insulation        -20.012    2.343       -.457     -8.543   .000        -25.116      -14.908       -.465       -.927    -.457

  a. Dependent Variable: Oil Consumption - January

     r2Y1.2 = (-.978)2 = .9564 = 95.64%

and

     r2Y2.1 = (-.927)2 = .8593 = 85.93%
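The same values (up to rounding of the reported partial correlations) follow from the
sums of squares, since each coefficient of partial determination equals SSR(Xj | other)
divided by [SSE + SSR(Xj | other)]; a small sketch using the figures above:

    # Coefficients of partial determination from the ANOVA sums of squares.
    sse_full = 8120.603
    ssr_temp_given_insul  = 228014.63 - 51076.46     # SSR(X1 | X2)
    ssr_insul_given_temp  = 228014.63 - 178624.42    # SSR(X2 | X1)

    r2_y1_2 = ssr_temp_given_insul / (sse_full + ssr_temp_given_insul)   # about .956
    r2_y2_1 = ssr_insul_given_temp / (sse_full + ssr_insul_given_temp)   # about .859
    print(round(r2_y1_2, 4), round(r2_y2_1, 4))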

Other Important Measures and Analyses in Multiple Regression Analysis

One of the major threats to Multiple Regression analysis concerns the possibility of
collinearity: the situation where independent variables are highly correlated with one
another, which can lead to spurious predictions. For years, statisticians (refer to the
handout from the old SPSS manual) recommended that, as a first step, a correlation
matrix be generated among the Independent Variables so that (by observation) one could
determine whether a “very high” relationship existed among the variables. Of course, it
was always difficult to objectively determine how high was “too high”.

To solve this problem, we use the Variance Inflationary Factor (VIF):

     VIFj = 1 / (1 - R2j)

where R2j is the coefficient of multiple determination of explanatory variable Xj with all
other explanatory variables. In the case of just two variables, R21 is simply the Coefficient
of Determination between the two. In the case of the Heating Oil data, rT x I = 0.00892, so

     VIF1 = VIF2 = 1 / { 1 - (0.00892)2 } ≅ 1.00
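statsmodels can compute the same quantity directly; a sketch, again reusing the assumed
heating DataFrame and column names from the earlier sketches:

    # VIF for each predictor: regress it on the other predictor(s) and apply 1 / (1 - R2).
    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    X = sm.add_constant(heating[["temp", "insulation"]])
    for i, name in enumerate(X.columns):
        if name != "const":
            print(name, round(variance_inflation_factor(X.values, i), 3))   # both about 1.00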

SPSS, when ‘Collinearity Diagnostics’ is toggled, will calculate this value for us:

Coefficients (a)

                                    Unstandardized          Standardized                      Collinearity Statistics
                                    Coefficients            Coefficients
  Model 1                           B          Std. Error   Beta          t         Sig.      Tolerance   VIF
  (Constant)                        562.151    21.093                     26.651    .000
  Average Daily Temperature
    (F) - January                    -5.437      .336       -.866        -16.170    .000      1.000       1.000
  Inches of Attic Insulation        -20.012     2.343       -.457         -8.543    .000      1.000       1.000

  a. Dependent Variable: Oil Consumption - January

Here’s an example from another Multiple Regression Analysis, where there were more
than two independent variables involved:
Coefficients (a)

                       Unstandardized           Standardized                      Collinearity Statistics
                       Coefficients             Coefficients
  Model 1              B          Std. Error    Beta          t         Sig.      Tolerance   VIF
  (Constant)           -330.832   110.895                     -2.983    .007
  Total Staff             1.246      .412        .529          3.023    .006      .586        1.707
  REMOTE                  -.118      .054       -.324         -2.180    .041      .811        1.233
  DUBNER                  -.297      .118       -.408         -2.519    .020      .685        1.459
  Total Labor              .131      .059        .417          2.200    .039      .500        1.999

  a. Dependent Variable: STANDBY

If a set of variables is uncorrelated, then the VIFs will all be approximately equal to 1.00.
If a set of variables is highly correlated, a VIF might exceed 10. Some statisticians suggest
that if the VIF exceeds 10, then alternatives to the generated model should be explored.
More conservative statisticians suggest that 5 is a more appropriate maximum threshold
value.

The Cp Statistic and Model Building

In building a model through Forward, Backward, or Stepwise inclusion, the statistician
may have a number of potential models which could be used to describe a predictive
value for the Dependent Variable (actually, its Criterion Measure). This is particularly
true when a large number of Independent Variables are significant. To optimize the
model employed, particularly when more than 2 variables are involved, the Cp Statistic
may be employed:
     Cp = [ (1 - R2p)(n - T) / (1 - R2T) ] - [ n - 2(p + 1) ]

where

     p   = number of independent variables included in a model
     T   = total number of parameters (Intercept included) available to be estimated in
           the full regression model
     R2p = coefficient of multiple determination for a regression model with p
           independent variables included
     R2T = coefficient of multiple determination for the full regression model containing
           the Intercept and all T estimated parameters

and where the goal is to find models whose Cp is close to or below ( p + 1 ).
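A direct translation of the formula; the example call plugs in the Temperature-only model
from this handout (p = 1, R2p = .756) against the full model (T = 3 parameters, R2T = .966,
n = 15):

    # Mallows' Cp for a candidate model with p independent variables.
    def mallows_cp(r2_p, r2_T, n, T, p):
        return (1 - r2_p) * (n - T) / (1 - r2_T) - (n - 2 * (p + 1))

    # Temperature-only model: Cp is far above p + 1 = 2, so that model is inadequate
    # and the two-variable model is preferred.
    print(round(mallows_cp(r2_p=0.756, r2_T=0.966, n=15, T=3, p=1), 1))

Candidate models with Cp near or below p + 1 would be retained for further study.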
