Multiple Regression
The test you choose depends on level of measurement:

Independent Variables                            Dependent Variable    Test
Two or more interval-ratio and/or dichotomous    Interval-ratio        Multiple Regression
Multiple Regression
Multiple Regression is very popular among sociologists.
Most social phenomena have more than one cause.
It is very difficult to manipulate just one social variable through experimentation.
Sociologists must attempt to model complex social realities to explain them.
Multiple Regression
Multiple Regression allows us to:
Use several variables at once to explain the variation in a continuous dependent variable.
Isolate the unique effect of one variable on the continuous dependent variable while taking into consideration that other variables are affecting it too.
Write a mathematical equation that tells us the overall effects of several variables together and the unique effects of each on a continuous dependent variable.
Control for other variables to demonstrate whether bivariate relationships are spurious.
Multiple Regression
For example, a sociologist may be interested in the relationship between Education, Family Income, and Number of Children in a family.

[Diagram: Education and Family Income (independent variables) → Number of Children (dependent variable)]
Multiple Regression
For example:
Null Hypothesis: There is no relationship between education of respondents and the number of children in families. Ho: b1 = 0
Null Hypothesis: There is no relationship between family income and the number of children in families. Ho: b2 = 0

Case:                 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
Children (Y):         2  5  1  9  6  3  0  3  7  7  2  5  1  9  6  3  0  3  7 14  2  5  1  9  6
Education (X1):      12 16 20 12  9 18 16 14  9 12 12 10 20 11  9 18 16 14  9  8 12 10 20 11  9
Income, 1=$10K (X2):  3  4  9  5  4 12 10  1  4  3 10  4  9  4  4 12 10  6  4  1 10  3  9  2  4
Multiple Regression
What multiple regression does is fit a plane to these coordinates.

[Figure: three-dimensional scatter plot of the first ten cases, with Children (Y) plotted against Education (X1) and Income (X2)]
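A minimal sketch of what "fitting a plane" means, using NumPy's least-squares solver on hypothetical data constructed to lie exactly on the plane Y = 1 + 2X1 − X2 (all numbers here are made up for illustration, not the 25 cases above):

```python
import numpy as np

# Hypothetical data that lies exactly on the plane y = 1 + 2*x1 - x2,
# so ordinary least squares should recover those coefficients.
x1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([1.0, 0.0, 2.0, 1.0, 3.0, 2.0])
y = 1 + 2 * x1 - x2

# Design matrix: a column of ones for the intercept a, then x1 and x2.
X = np.column_stack([np.ones_like(x1), x1, x2])

# Least squares places the plane so the sum of squared vertical
# distances from the plane to each (x1, x2, y) point is minimized.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
a, b1, b2 = coef
print(a, b1, b2)  # approximately 1.0, 2.0, -1.0
```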
Multiple Regression
Mathematically, that plane is:

Ŷ = a + b1X1 + b2X2

a = y-intercept, where the X’s equal zero
b = coefficient or slope for each variable

For our problem, SPSS says the equation is:

Ŷ = 11.8 − .36X1 − .40X2
Expected # of Children = 11.8 − .36*Educ − .40*Income
Multiple Regression
Conducting a Test of Significance for the Slopes of the Regression Shape
By laying the sampling distribution for the slopes over a guess at the population’s slopes, Ho, we can find out whether our sample could have been drawn from a population where the slopes equal our guess.
ANOVA(b)
Model            Sum of Squares    df    Mean Square      F       Sig.
1  Regression        161.518        2       80.759      14.776   .000(a)
   Residual          120.242       22        5.466
   Total             281.760       24
a. Predictors: (Constant), Income, Education
b. Dependent Variable: Children

Ŷ = 11.8 − .36X1 − .40X2

Coefficients(a)
                  Unstandardized     Standardized
                  Coefficients       Coefficients
Model             B       Std. Error     Beta        t       Sig.
1  (Constant)   11.770      1.734                   6.787    .000
   Education     -.364       .173        -.412     -2.105    .047
   Income        -.403       .194        -.408     -2.084    .049
a. Dependent Variable: Children

The t-scores and P-values are the significance tests for each slope.
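The t-scores in the Coefficients table come from dividing each slope by its standard error. A quick check using the rounded B and Std. Error values above (the results differ from SPSS’s in the third decimal because SPSS works with unrounded estimates; 2.074 is the two-tailed .05 critical value for t with df = n − k − 1 = 22):

```python
def t_stat(b, se):
    """t-score for a slope: the estimate divided by its standard error."""
    return b / se

t_educ = t_stat(-0.364, 0.173)  # about -2.104
t_inc = t_stat(-0.403, 0.194)   # about -2.077
t_crit = 2.074  # two-tailed .05 critical value, df = 25 - 2 - 1 = 22

# Both t-scores exceed the critical value, matching the table's
# p-values of .047 and .049: reject Ho for both slopes.
print(abs(t_educ) > t_crit, abs(t_inc) > t_crit)  # True True
```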
Multiple Regression
R²

R² = (TSS − SSE) / TSS

TSS = total sum of squares: the sum of squared distances from the mean of Y to each case’s value on Y
SSE = sum of squared errors: the sum of squared distances from the regression shape to each case’s value on Y

R² can be interpreted the same way for multiple regression: the joint explanatory value of all of your variables (or “your model”).
You can request an R²-change test from SPSS to see whether adding new variables improves the fit of your model.
Ŷ = 11.8 − .36X1 − .40X2

ANOVA(b)
Model            Sum of Squares    df    Mean Square      F       Sig.
1  Regression        161.518        2       80.759      14.776   .000(a)
   Residual          120.242       22        5.466
   Total             281.760       24
a. Predictors: (Constant), Income, Education
b. Dependent Variable: Children
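The R² and adjusted R² that SPSS reports in the Model Summary can be reproduced from the sums of squares in the ANOVA output above:

```python
# Values from the ANOVA output.
tss = 281.760  # Total sum of squares
sse = 120.242  # Residual (error) sum of squares
n, k = 25, 2   # cases and independent variables

r2 = (tss - sse) / tss                         # proportion of variation explained
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)  # penalized for extra predictors
print(round(r2, 3), round(adj_r2, 3))  # 0.573 0.534
```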
Multiple Regression
R²

Model Summary
Model    R        R Square   Adjusted R Square   Std. Error of the Estimate
1        .757(a)  .573       .534                2.33785
a. Predictors: (Constant), Income, Education

R² = [Σ(Y − Ȳ)² − Σ(Y − Ŷ)²] / Σ(Y − Ȳ)²

Ŷ = 11.8 − .36X1 − .40X2
[Figure: partial-slope plots. Holding income constant, the Education line falls from 11.40 at X1 = 0 to 6.00 at X1 = 15 (b = −.36); holding education constant, the Income line falls from 11.44 at X2 = 0 to 5.44 at X2 = 15 (b = −.40).]
Multiple Regression
An interesting effect of controlling for other variables is “Simpson’s Paradox”: the direction of the relationship between two variables can change when you control for another variable.

Education →(+) Crime Rate     Ŷ = −51.3 + 1.5X1
Multiple Regression
“Simpson’s Paradox”

Bivariate: Education →(+) Crime Rate     Ŷ = −51.3 + 1.5X1

Urbanization is related positively to both:
Urbanization →(+) Education
Urbanization →(+) Crime Rate

Controlling for Urbanization, the sign on Education reverses:
Education →(−) Crime Rate     Ŷ = 58.9 − .6X1 + .7X2
Urbanization →(+) Crime Rate
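Simpson’s Paradox can be seen in a tiny made-up dataset: within each urbanization level the outcome falls as X rises, but the more urban group is higher on both variables, so the bivariate slope comes out positive. Everything below is hypothetical, built only to show the sign flip:

```python
import numpy as np

# z is an urbanization dummy (0 = less urban, 1 = more urban).
# Within each group y falls as x rises, but the z = 1 group is
# shifted up on both x and y.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
z = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
y = 2 - x + 8 * z

# Bivariate slope of y on x, ignoring z: positive.
slope_biv = np.polyfit(x, y, 1)[0]

# Partial slope of y on x, controlling for z: negative.
X = np.column_stack([np.ones_like(x), x, z])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(round(slope_biv, 2), round(coef[1], 2))  # 1.06 -1.0
```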
Multiple Regression
[Figure: scatter plot of Crime by Education. The original bivariate regression line slopes upward, but the points cluster by community type (rural, small town, suburban, city); within each type the education-crime relationship is negative.]
Multiple Regression
Now… More Variables!
The social world is very complex.
For example:
A sociologist may be interested in the effects of Education, Income, Sex, and Gender Attitudes on the Number of Children in a family.

Independent Variables: Education, Family Income, Sex, Gender Attitudes
Dependent Variable: Number of Children
Multiple Regression
Null Hypotheses:
1. There will be no relationship between education of respondents and the number of children in families. Ho: b1 = 0   Ha: b1 ≠ 0
2. There will be no relationship between family income and the number of children in families. Ho: b2 = 0   Ha: b2 ≠ 0
3. There will be no relationship between sex and number of children. Ho: b3 = 0   Ha: b3 ≠ 0
4. There will be no relationship between gender attitudes and number of children. Ho: b4 = 0   Ha: b4 ≠ 0

Independent Variables: Education, Family Income, Sex, Gender Attitudes
Dependent Variable: Number of Children
Multiple Regression
Bivariate regression is based on fitting a line as close as possible to the plotted coordinates of your data on a two-dimensional graph.
Trivariate regression is based on fitting a plane as close as possible to the plotted coordinates of your data on a three-dimensional graph.
Regression with more than two independent variables is based on fitting a shape to your constellation of data on a multidimensional graph.
Multiple Regression
The shape will be placed so that it minimizes the
distance (sum of squared errors) from the shape to
every data point.
The shape is no longer a line, but if you hold all other
variables constant, it is linear for each independent
variable.
Multiple Regression
[Figure: regression planes fit to data in three-dimensional (Y, X1, X2) space; holding one X constant leaves a straight line in the remaining two dimensions]
Multiple Regression
For our problem, our equation could be:

Ŷ = 7.5 − .30X1 − .40X2 + 0.5X3 + 0.25X4
E(Children) = 7.5 − .30*Educ − .40*Income + 0.5*Sex + 0.25*Gender Att.
Multiple Regression
So what does our equation tell us?

Ŷ = 7.5 − .30X1 − .40X2 + 0.5X3 + 0.25X4
E(Children) = 7.5 − .30*Educ − .40*Income + 0.5*Sex + 0.25*Gender Att.

Education   Income   Sex   Gender Att.   Children
10           5       0     0             2.5
10           5       0     5             3.75
10          10       0     5             1.75
10           5       1     0             3.0
10           5       1     5             4.25
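The predictions in the table can be reproduced by plugging values into the equation; a minimal sketch (the function name is made up for illustration):

```python
def expected_children(educ, income, sex, gender_att):
    """Predicted number of children from
    Y-hat = 7.5 - .30*Educ - .40*Income + .5*Sex + .25*GenderAtt."""
    return 7.5 - 0.30 * educ - 0.40 * income + 0.5 * sex + 0.25 * gender_att

# Rows of the table above:
print(round(expected_children(10, 5, 0, 0), 2))   # 2.5
print(round(expected_children(10, 5, 0, 5), 2))   # 3.75
print(round(expected_children(10, 10, 0, 5), 2))  # 1.75
print(round(expected_children(10, 5, 1, 0), 2))   # 3.0
print(round(expected_children(10, 5, 1, 5), 2))   # 4.25
```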
Multiple Regression
Each variable, holding the other variables constant, has a linear, two-dimensional graph of its relationship with the dependent variable. Here we hold every other variable constant at zero.

[Figure: with the other variables at zero, the Education line falls from 7.5 at X1 = 0 to 4.5 at X1 = 10 (b = −.3), and the Income line falls from 7.5 at X2 = 0 to 3.5 at X2 = 10 (b = −.4).]

Ŷ = 7.5 − .30X1 − .40X2 + 0.5X3 + 0.25X4
Multiple Regression
Each variable, holding the other variables constant, has a linear, two-dimensional graph of its relationship with the dependent variable. Here we hold every other variable constant at zero.

[Figure: with the other variables at zero, the Sex line rises from 7.5 at X3 = 0 to 8 at X3 = 1 (b = .5), and the Gender Attitudes line rises from 7.5 at X4 = 0 to 8.75 at X4 = 5 (b = .25).]

Ŷ = 7.5 − .30X1 − .40X2 + 0.5X3 + 0.25X4
Multiple Regression
For a dummy-coded predictor such as Race (three categories, coded as two dummies with white excluded):

Ŷ = a + b1X1 + b2X2

a = the y-intercept, which in this case is the predicted value of self-esteem for the excluded group, white
b1 = the slope for variable X1, black
b2 = the slope for variable X2, other
Multiple Regression
If our equation were, for Race (with its dummies) predicting self-esteem:

Ŷ = 28 + 5X1 − 2X2

then plugging in values for the dummies tells you each group’s predicted self-esteem average.
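Plugging the dummy values into this hypothetical equation gives each group’s predicted self-esteem average:

```python
def self_esteem(black, other):
    """Predicted self-esteem from Y-hat = 28 + 5*X1 - 2*X2, where
    X1 = black (0/1) and X2 = other (0/1); white is the excluded
    reference category (both dummies zero)."""
    return 28 + 5 * black - 2 * other

print(self_esteem(0, 0))  # 28 -> white (the intercept a)
print(self_esteem(1, 0))  # 33 -> black (a + b1)
print(self_esteem(0, 1))  # 26 -> other (a + b2)
```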
One thing you can do is “standardize” your slopes so that you can
compare the standard deviation increase in your dependent variable for
each standard deviation increase in your independent variables.
You might find that Wages go up 0.3 standard deviations for each
standard deviation increase in education, but 0.4 standard deviations
for each standard deviation increase in age.
Multiple Regression
Standardized Coefficients

Recall that standardizing regression coefficients is accomplished by the formula: beta = b(Sx/Sy)

Coefficients(a)
                  Unstandardized     Standardized
                  Coefficients       Coefficients
Model             B       Std. Error     Beta        t       Sig.
1  (Constant)   11.770      1.734                   6.787    .000
   Education     -.364       .173        -.412     -2.105    .047
   Income        -.403       .194        -.408     -2.084    .049
a. Dependent Variable: Children

In the example above, education and income have very comparable effects on the number of children: each lowers the number of children by about .4 standard deviations for a standard-deviation increase in the independent variable.
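A quick illustration of the formula beta = b(Sx/Sy). The standard deviations below are illustrative guesses (the sample SDs are not shown in the output), chosen so the result lands near the table’s reported Beta of −.412 for education:

```python
def standardize(b, s_x, s_y):
    """Standardized slope: beta = b * (S_x / S_y)."""
    return b * (s_x / s_y)

b_educ = -0.364                  # unstandardized slope from the table
s_educ, s_children = 3.88, 3.43  # illustrative standard deviations
print(round(standardize(b_educ, s_educ, s_children), 2))  # -0.41
```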