Multiple Regression
[Figure: plot of the example X and Y data; values shown: 14, 18, 10, 22, 14, 26]
Coefficients(a)

                  Unstandardized Coefficients   Standardized Coefficients
Model 1           B          Std. Error         Beta                        t     Sig.
(Constant)        2.000      .000                                           .     .
X                 4.000      .000               1.000                       .     .

a. Dependent Variable: Y
The constant, representing the intercept, is the value that the dependent variable would take when all the predictors are at a value of zero. In some treatments this is called B0 instead of a.
In the bivariate case, where there is only one X and one Y, the
standardized beta weight equals the correlation coefficient.
Let's confirm this by seeing what happens if we convert
our raw scores to Z scores.
In SPSS I have converted X and Y to two new variables, ZX and ZY, expressed in
standard score units. You do this by going to Analyze / Descriptive Statistics / Descriptives
(don't do this now), moving the variables you want to convert into the Variables box,
and selecting "Save standardized values as variables". This creates the new variables
expressed as Z scores. Note that if you rerun the linear regression analysis that we
just did on the raw scores, this time on the Z scores, the constant drops out of the
regression equation for predicting the standard scores on Y, and the equation
is now of the form y = Beta x, where Beta is equal to 1. In this case the Z scores
are identical on X and Y, although they certainly wouldn't always be.
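The claim that the beta weight equals r in the bivariate case can be checked outside SPSS. The following is a minimal sketch in plain Python, not the SPSS procedure itself; the x and y values are made-up illustration data (not the slide's data set), and the helper names are hypothetical.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def sample_sd(values):
    # Sample standard deviation (n - 1 in the denominator).
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def z_scores(values):
    # Convert raw scores to standard scores.
    m, sd = mean(values), sample_sd(values)
    return [(v - m) / sd for v in values]

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    cross = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cross / ((len(x) - 1) * sample_sd(x) * sample_sd(y))

def least_squares(x, y):
    # Returns (constant, slope) of the bivariate regression of y on x.
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope

# Hypothetical data, chosen so the relationship is strong but not perfect.
x = [1, 2, 3, 4, 5, 6, 7]
y = [5, 11, 12, 19, 21, 28, 29]

r = pearson_r(x, y)
constant, beta = least_squares(z_scores(x), z_scores(y))

print("r    =", round(r, 4))
print("beta =", round(beta, 4))          # same value as r
print("constant =", round(constant, 4))  # zero, up to rounding error
```

With standardized variables the constant is zero and the slope is the correlation, which is why the equation reduces to y = Beta x.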
Coefficients(a)

                  Unstandardized Coefficients   Standardized Coefficients
Model 1           B          Std. Error         Beta                        t     Sig.
(Constant)        .000       .000                                           .     .
Zscore(X)         1.000      .000               1.000                       .     .

Correlations

                                    Zscore(Y)    Zscore(X)
Zscore(Y)   Pearson Correlation     1            1.000**
            Sig. (2-tailed)         .            .
            N                       7            7
Zscore(X)   Pearson Correlation     1.000**      1
            Sig. (2-tailed)         .            .
            N                       7            7
From your output you can obtain the regression equation for predicting
Average Female Life Expectancy from Daily Calorie Intake. The equation is Y
= 25.904 + .016X + e, where e is the error term. Thus for a country where
the average daily calorie intake is 3000 calories, the predicted average female life
expectancy is about 25.904 + (.016)(3000), or 73.904 years. This is a raw
score regression equation.
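The arithmetic above can be reproduced with a few lines of Python. This is only a sketch of the raw-score equation using the coefficients reported in the output; the function name is a hypothetical choice, and the error term e is omitted because a prediction uses only the fitted part of the equation.

```python
def predict_life_expectancy(daily_calories, constant=25.904, slope=0.016):
    # Raw-score regression equation: Y' = a + bX
    return constant + slope * daily_calories

predicted = predict_life_expectancy(3000)
print(round(predicted, 3))
```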
The significance of the constant is of little use. It just says that the constant differs significantly from zero (e.g., when X is zero, Y is not zero).
Coefficients(a)

                         Unstandardized Coefficients   Standardized Coefficients
Model 1                  B          Std. Error         Beta                        t         Sig.
(Constant)               25.904     4.175                                          6.204     .000
Daily calorie intake     .016       .001               .775                        10.491    .000
Model 1        Sum of Squares    df    Mean Square    F          Sig.
Regression     5792.910          1     5792.910       110.055    .000(a)
Residual       3842.477          73    52.637
Total          9635.387          74
Model Summary

Model    R          R Square    Adjusted R Square    Std. Error of the Estimate
1        .775(a)    .601        .596                 7.255
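The Model Summary figures follow directly from the sums of squares in the ANOVA table. The sketch below recomputes them with the standard formulas; the variable names are my own, and small differences in the last decimal place are expected because the inputs are the rounded values printed in the output.

```python
from math import sqrt

# Values copied from the ANOVA table (one predictor).
ss_regression = 5792.910
ss_residual = 3842.477
ss_total = 9635.387
df_regression = 1
df_residual = 73
df_total = 74

r_squared = ss_regression / ss_total
# Adjusted R Square penalizes R Square for the number of predictors.
adjusted_r_squared = 1 - (1 - r_squared) * df_total / df_residual
mean_square_regression = ss_regression / df_regression
mean_square_residual = ss_residual / df_residual
f_ratio = mean_square_regression / mean_square_residual
# Standard error of the estimate is the square root of the residual mean square.
std_error_of_estimate = sqrt(mean_square_residual)

print("R Square           ", round(r_squared, 3))
print("R                  ", round(sqrt(r_squared), 3))
print("Adjusted R Square  ", round(adjusted_r_squared, 3))
print("F                  ", round(f_ratio, 3))
print("Std. Error of Est. ", round(std_error_of_estimate, 3))
```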
Look at the columns headed 95% confidence intervals. These columns put
confidence intervals, based on each coefficient's standard error, around the regression
coefficients a and b. Thus, for example, in the table below we can say with 95%
confidence that the value of the constant a lies somewhere between 17.583 and
34.225, and the value of the unstandardized regression coefficient b lies
somewhere between .013 and .019.
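As a rough check on the interval for the constant, the sketch below applies the usual formula, coefficient ± t × standard error. The critical value 1.993 is an assumption on my part: it is the approximate two-tailed .05 value of t for the 73 residual degrees of freedom in this analysis, hard-coded here so the example needs no statistics library. The interval for b cannot be reproduced as closely this way, because the printed B (.016) and standard error (.001) are rounded to three decimals.

```python
def confidence_interval(coefficient, std_error, t_critical):
    # 95% CI: coefficient plus or minus t * standard error of that coefficient.
    margin = t_critical * std_error
    return coefficient - margin, coefficient + margin

T_CRITICAL_DF73 = 1.993  # assumed approximate value for df = 73, two-tailed .05

lower, upper = confidence_interval(25.904, 4.175, T_CRITICAL_DF73)
print(round(lower, 3), round(upper, 3))
```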
Coefficients(a)

                         Unstandardized Coefficients   Standardized Coefficients
Model 1                  B          Std. Error         Beta                        t         Sig.
(Constant)               25.904     4.175                                          6.204     .000
Daily calorie intake     .016       .001               .775                        10.491    .000
Multivariate Analysis
Multiple regression is a relative of simple bivariate or zero-order correlation (two interval-level variables).
In multiple regression, the investigator is concerned with
predicting a dependent or criterion variable from two or more
independent variables. The regression equation (raw score
version) takes the form Y = a + b1X1 + b2X2 + b3X3 + ... + bnXn + e
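To make the general form concrete, here is a minimal sketch of the raw-score equation as a function of any number of predictors. The constant and weights are the ones reported in this deck's two-predictor SPSS output (25.838, .007 for daily calorie intake, .315 for percentage of people who read); the predictor values 3000 and 80 are hypothetical inputs chosen only for illustration, and the error term e is left out because it is not part of a prediction.

```python
def predict(constant, weights, predictors):
    # Y' = a + b1*X1 + b2*X2 + ... + bn*Xn
    if len(weights) != len(predictors):
        raise ValueError("need one weight per predictor")
    return constant + sum(b * x for b, x in zip(weights, predictors))

# Hypothetical country: 3000 calories per day, 80% of people read.
predicted_y = predict(25.838, [0.007, 0.315], [3000, 80])
print(round(predicted_y, 3))
```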
Columns: Caucasian, African-American
Subject 1: Caucasian
Subject 2: African-American
Subject 3: Other

Columns: High Status, Medium Status
Subject 1: High Status Attire Condition
Subject 2: Medium Status Attire Condition
Subject 3: Low Status Attire Condition
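Correlations

                                                        Average female     Daily calorie    People who
                                                        life expectancy    intake           read (%)
Pearson Correlation   Average female life expectancy    1.000              .776             .869
                      Daily calorie intake              .776               1.000            .682
                      People who read (%)               .869               .682             1.000
Sig. (1-tailed)       Average female life expectancy    .                  .000             .000
                      Daily calorie intake              .000               .                .000
                      People who read (%)               .000               .000             .
N                     Average female life expectancy    74                 74               74
                      Daily calorie intake              74                 74               74
                      People who read (%)               74                 74               74

r YX1 = .776, r YX2 = .869, r X1X2 = .682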
                        Unstandardized          Standardized
                        Coefficients            Coefficients                     Correlations                     Collinearity Statistics
Model 1                 B         Std. Error    Beta            t        Sig.    Zero-order   Partial   Part      Tolerance   VIF
(Constant)              25.838    2.882                         8.964    .000
People who read (%)     .315      .034          .636            9.202    .000    .869         .738      .465      .535        1.868
Daily calorie intake    .007      .001          .342            4.949    .000    .776         .506      .250      .535        1.868
Above are the raw (unstandardized) and standardized regression weights for
the regression of female life expectancy on daily calorie intake and
percentage of people who read. Consistent with our hand calculation, the
standardized regression coefficient (beta weight) for daily calorie intake is
.342. The beta weight for percentage of people who read is much larger,
.636. What this weight means is that for every increase of one standard
deviation on the people who read variable, with daily calorie intake held
constant, Y (female life expectancy) is predicted to
increase by .636 standard deviations. Note that both beta
coefficients are significant at p < .001.
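For two predictors, the beta weights can be recovered from the three zero-order correlations alone. The sketch below uses the standard two-predictor formula with the correlations reported in the output (.776, .869, .682); the function name is hypothetical, and the results differ slightly in the third decimal from the SPSS values because the input correlations are rounded.

```python
def beta_weights(r_y1, r_y2, r_12):
    # Standardized weights for a two-predictor regression.
    denominator = 1 - r_12 ** 2
    beta_1 = (r_y1 - r_y2 * r_12) / denominator
    beta_2 = (r_y2 - r_y1 * r_12) / denominator
    return beta_1, beta_2

r_y_calories = 0.776   # Y with daily calorie intake (X1)
r_y_reading = 0.869    # Y with people who read (X2)
r_predictors = 0.682   # X1 with X2

beta_calories, beta_reading = beta_weights(r_y_calories, r_y_reading, r_predictors)
# R Square is the sum of each beta weight times its zero-order correlation.
r_squared = beta_calories * r_y_calories + beta_reading * r_y_reading

print("beta, daily calorie intake:", round(beta_calories, 3))
print("beta, people who read:     ", round(beta_reading, 3))
print("R Square:                  ", round(r_squared, 3))
```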
                                                                    Change Statistics
R          R Square    Adjusted R Square    Std. Error of the Estimate    R Square Change    F Change    df1    df2    Sig. F Change
.905(a)    .818        .813                 4.948                         .818               159.922     2      71     .000
               Sum of Squares    df    Mean Square    F          Sig.
Regression     7829.451          2     3914.726       159.922    .000(a)
Residual       1738.008          71    24.479
Total          9567.459          73
Coefficients(a)

                        Unstandardized          Standardized
                        Coefficients            Coefficients                     Correlations
Model 1                 B         Std. Error    Beta            t        Sig.    Zero-order   Partial   Part
(Constant)              25.838    2.882                         8.964    .000
Daily calorie intake    .007      .001          .342            4.949    .000    .776         .506      .250
People who read (%)     .315      .034          .636            9.202    .000    .869         .738      .465
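The Partial and Part columns can also be reproduced from the zero-order correlations. This sketch uses the standard two-predictor formulas for the part (semipartial) and partial correlation; the function and variable names are my own, and agreement with the table is only to about two decimals because the input correlations are rounded.

```python
from math import sqrt

def part_and_partial(r_y_target, r_y_other, r_predictors):
    # Correlation of Y with the target predictor after removing the other predictor.
    numerator = r_y_target - r_y_other * r_predictors
    # Part: the other predictor is removed from the target predictor only.
    part = numerator / sqrt(1 - r_predictors ** 2)
    # Partial: the other predictor is removed from both Y and the target predictor.
    partial = numerator / sqrt((1 - r_y_other ** 2) * (1 - r_predictors ** 2))
    return part, partial

part_calories, partial_calories = part_and_partial(0.776, 0.869, 0.682)
part_reading, partial_reading = part_and_partial(0.869, 0.776, 0.682)

print("Daily calorie intake: part", round(part_calories, 3),
      "partial", round(partial_calories, 3))
print("People who read (%):  part", round(part_reading, 3),
      "partial", round(partial_reading, 3))
```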
Multicollinearity
From our SPSS output we note that the correlation between our two predictors,
Daily Calorie Intake (X1) and People who Read (X2), is .682. This is a pretty high
correlation for two predictors that are to be interpreted independently: it means each
explains about half the variation in the other (r squared is about .47). If you look at the zero-order
correlation of our Y variable, average female life expectancy, with % people who read,
you note that the correlation is quite high, .869. However, the multiple R for the
two-predictor combination was .905, which is an improvement.
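Correlations

                                                        Average female     Daily calorie    People who
                                                        life expectancy    intake           read (%)
Pearson Correlation   Average female life expectancy    1.000              .776             .869
                      Daily calorie intake              .776               1.000            .682
                      People who read (%)               .869               .682             1.000
Sig. (1-tailed)       Average female life expectancy    .                  .000             .000
                      Daily calorie intake              .000               .                .000
                      People who read (%)               .000               .000             .
N                     Average female life expectancy    74                 74               74
                      Daily calorie intake              74                 74               74
                      People who read (%)               74                 74               74

r YX1 = .776, r YX2 = .869, r X1X2 = .682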
The table below is excerpted from the more complete table on Slide 32. Look
at the tolerance value. Recall that a tolerance near zero means very high
multicollinearity (high intercorrelation among the predictors, which is bad).
Tolerance is .535 for both variables (since there are only two, the value is the
same for either one predicting the other).
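With only two predictors, tolerance is simply 1 minus the squared correlation between them, and VIF is its reciprocal. The sketch below checks the reported values from r = .682; the names are illustrative, and the VIF differs from the SPSS figure in the third decimal because r is rounded.

```python
def tolerance_and_vif(r_predictors):
    # Tolerance = 1 - R^2 from regressing one predictor on the other(s);
    # with two predictors that R^2 is just the squared correlation.
    tolerance = 1 - r_predictors ** 2
    vif = 1 / tolerance
    return tolerance, vif

tolerance, vif = tolerance_and_vif(0.682)
print("Tolerance:", round(tolerance, 3))
print("VIF:      ", round(vif, 3))
```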
Specification Errors