
Multiple Regression

Multiple Regression
The test you choose depends on level of measurement:
Independent Variable(s)                          Dependent Variable          Test
Dichotomous                                      Interval-Ratio              Independent Samples t-test
Nominal (or Dichotomous)                         Nominal (or Dichotomous)    Cross Tabs
Nominal (or Dichotomous)                         Interval-Ratio              ANOVA
Interval-Ratio (or Dichotomous)                  Interval-Ratio              Bivariate Regression/Correlation
Two or more Interval-Ratio (or Dichotomous)      Interval-Ratio              Multiple Regression
Multiple Regression
 Multiple Regression is very popular among
sociologists.
 Most social phenomena have more than one
cause.
 It is very difficult to manipulate just one social
variable through experimentation.
 Sociologists must attempt to model complex
social realities to explain them.
Multiple Regression
 Multiple Regression allows us to:
 Use several variables at once to explain the variation in a
continuous dependent variable.
 Isolate the unique effect of one variable on the continuous
dependent variable while taking into consideration that
other variables are affecting it too.
 Write a mathematical equation that tells us the overall
effects of several variables together and the unique effects
of each on a continuous dependent variable.
 Control for other variables to demonstrate whether
bivariate relationships are spurious.
Multiple Regression
 For example:
A sociologist may be interested in the
relationship between Education and Income
and Number of Children in a family.

Independent Variables: Education, Family Income  →  Dependent Variable: Number of Children
Multiple Regression
 For example:
 Null Hypothesis: There is no relationship between
education of respondents and the number of children in
families. Ho : b1 = 0
 Null Hypothesis: There is no relationship between family
income and the number of children in families. Ho : b2 = 0

Independent Variables: Education, Family Income  →  Dependent Variable: Number of Children
Multiple Regression
 Bivariate regression is based on fitting a line as close
as possible to the plotted coordinates of your data on
a two-dimensional graph.
 Trivariate regression is based on fitting a plane as
close as possible to the plotted coordinates of your
data on a three-dimensional graph.
Case:                 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
Children (Y):         2  5  1  9  6  3  0  3  7  7  2  5  1  9  6  3  0  3  7 14  2  5  1  9  6
Education (X1):      12 16 20 12  9 18 16 14  9 12 12 10 20 11  9 18 16 14  9  8 12 10 20 11  9
Income, 1=$10K (X2):  3  4  9  5  4 12 10  1  4  3 10  4  9  4  4 12 10  6  4  1 10  3  9  2  4
Multiple Regression
[3-D scatterplot: plotted coordinates for Education (X1), Income (X2), and Number of Children (Y) from the 25 cases above]
Multiple Regression
What multiple regression does is fit a plane to these coordinates.

[3-D scatterplot: the same coordinates with a fitted regression plane; axes Y, X1, X2]
Multiple Regression
 Mathematically, that plane is:

   Ŷ = a + b1X1 + b2X2

   a = y-intercept, where the X's equal zero
   b = coefficient or slope for each variable

 For our problem, SPSS says the equation is:

   Ŷ = 11.8 - .36X1 - .40X2
   Expected # of Children = 11.8 - .36*Educ - .40*Income
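For readers who want to check this outside SPSS, here is a minimal sketch in Python (not part of the original slides) that fits the same plane to the 25 cases listed earlier by ordinary least squares; the coefficients should come out close to the SPSS values above.

```python
import numpy as np

# Data from the slides: 25 cases of children, education, and income (in $10Ks)
children  = np.array([2, 5, 1, 9, 6, 3, 0, 3, 7, 7, 2, 5, 1, 9, 6, 3, 0, 3, 7, 14, 2, 5, 1, 9, 6])
education = np.array([12, 16, 20, 12, 9, 18, 16, 14, 9, 12, 12, 10, 20, 11, 9, 18, 16, 14, 9, 8, 12, 10, 20, 11, 9])
income    = np.array([3, 4, 9, 5, 4, 12, 10, 1, 4, 3, 10, 4, 9, 4, 4, 12, 10, 6, 4, 1, 10, 3, 9, 2, 4])

# Design matrix: a column of 1s for the intercept (a), then X1 and X2
X = np.column_stack([np.ones_like(education), education, income])

# Ordinary least squares: minimizes the sum of squared errors from the plane
coefs, *_ = np.linalg.lstsq(X, children, rcond=None)
a, b1, b2 = coefs
print(f"Y-hat = {a:.1f} + {b1:.2f}*Educ + {b2:.2f}*Income")
# Expect roughly: Y-hat = 11.8 - .36*Educ - .40*Income
```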
Multiple Regression
Conducting a Test of Significance for the Slopes of the Regression Shape

By slapping the sampling distribution for the slopes over a guess of the population's slopes, Ho, we can find out whether our sample could have been drawn from a population where the slopes are equal to our guess.

1. Two-tailed significance test for α-level = .05
2. Critical t = +/- 1.96
3. To find if there is a significant slope in the population,
      Ho: β1 = 0 ; β2 = 0
      Ha: β1 ≠ 0 ; β2 ≠ 0
4. Collect data.
5. Calculate t (z) for each slope:
      t = (b – βo) / s.e.,   where   s.e. = sqrt[ Σ(Y – Ŷ)² / (n – 2) ] / sqrt[ Σ(X – X̄)² ]
6. Make a decision about the null hypotheses.
7. Find P-values.
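As a quick illustration of steps 5-7 (not from the slides), here is a sketch for the Education slope, assuming the slope and standard error reported in the SPSS output on the next slide and the residual degrees of freedom from its ANOVA table (22):

```python
from scipy import stats

b, se = -0.364, 0.173            # Education slope and standard error from the SPSS output
t = (b - 0) / se                 # test against Ho: beta = 0
df = 25 - 2 - 1                  # residual degrees of freedom reported by SPSS (n minus 3 estimated parameters)
p = 2 * stats.t.sf(abs(t), df)   # two-tailed P-value
print(f"t = {t:.3f}, p = {p:.3f}")   # roughly t = -2.10, p = .047
```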
Multiple Regression
Model Summary
  Model 1:  R = .757(a)   R Square = .573   Adjusted R Square = .534   Std. Error of the Estimate = 2.33785
  a. Predictors: (Constant), Income, Education

ANOVA(b)
  Model 1           Sum of Squares   df   Mean Square       F      Sig.
    Regression           161.518      2       80.759    14.776   .000(a)
    Residual             120.242     22        5.466
    Total                281.760     24
  a. Predictors: (Constant), Income, Education
  b. Dependent Variable: Children

  Ŷ = 11.8 - .36X1 - .40X2

Coefficients(a)
  Model 1        Unstandardized B   Std. Error   Standardized Beta        t     Sig.
    (Constant)           11.770        1.734                          6.787    .000
    Education             -.364         .173             -.412       -2.105    .047
    Income                -.403         .194             -.408       -2.084    .049
  a. Dependent Variable: Children

  Sig. Tests: t-scores and P-values
Multiple Regression
 R²
 R² = (TSS – SSE) / TSS
 TSS = distance from the mean to the value on Y for each case
 SSE = distance from the regression shape to the value on Y for each case
 Can be interpreted the same way for multiple regression: the joint explanatory value of all of your variables (or "your model")
 Can request a change-in-R² test from SPSS to see if adding new variables improves the fit of your model (a sketch of this test follows the table below)

Model Summary
  Model 1:  R = .757(a)   R Square = .573   Adjusted R Square = .534   Std. Error of the Estimate = 2.33785
  a. Predictors: (Constant), Income, Education
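The change-in-R² test mentioned above can also be run outside SPSS. A hedged sketch with the statsmodels formula API, using the 25 cases from the earlier slides (the column and model names here are my own, not from the slides):

```python
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# The 25 cases from the earlier slides
df = pd.DataFrame({
    "children":  [2, 5, 1, 9, 6, 3, 0, 3, 7, 7, 2, 5, 1, 9, 6, 3, 0, 3, 7, 14, 2, 5, 1, 9, 6],
    "education": [12, 16, 20, 12, 9, 18, 16, 14, 9, 12, 12, 10, 20, 11, 9, 18, 16, 14, 9, 8, 12, 10, 20, 11, 9],
    "income":    [3, 4, 9, 5, 4, 12, 10, 1, 4, 3, 10, 4, 9, 4, 4, 12, 10, 6, 4, 1, 10, 3, 9, 2, 4],
})

reduced = smf.ols("children ~ education", data=df).fit()            # model without income
full    = smf.ols("children ~ education + income", data=df).fit()   # model with income added

# F-test for whether adding income significantly improves fit (the change-in-R-square test)
print(anova_lm(reduced, full))
print(f"R-square change: {full.rsquared - reduced.rsquared:.3f}")
```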
Multiple Regression
Model Summary
  Model 1:  R = .757(a)   R Square = .573   Adjusted R Square = .534   Std. Error of the Estimate = 2.33785
  a. Predictors: (Constant), Income, Education

57% of the variation in number of children is explained by education and income!

(ANOVA and Coefficients tables as on the previous slide.)
Multiple Regression
r²

  r² = [ Σ(Y – Ȳ)² – Σ(Y – Ŷ)² ] / Σ(Y – Ȳ)²

From the ANOVA table:  (281.760 – 120.242) ÷ 281.760  =  161.518 ÷ 281.76  =  .573

Model Summary
  Model 1:  R = .757(a)   R Square = .573   Adjusted R Square = .534   Std. Error of the Estimate = 2.33785
  a. Predictors: (Constant), Income, Education

ANOVA(b)
  Model 1           Sum of Squares   df   Mean Square       F      Sig.
    Regression           161.518      2       80.759    14.776   .000(a)
    Residual             120.242     22        5.466
    Total                281.760     24
  a. Predictors: (Constant), Income, Education
  b. Dependent Variable: Children

  Ŷ = 11.8 - .36X1 - .40X2

(Coefficients table as on the previous slides.)
Multiple Regression
So what does our equation tell us?

Ŷ = 11.8 - .36X1 - .40X2
Expected # of Children = 11.8 - .36*Educ - .40*Income

 Try “plugging in” some values for your variables.
Multiple Regression
So what does our equation tell us?
Ŷ = 11.8 - .36X1 - .40X2
Expected # of Children = 11.8 - .36*Educ - .40*Income

If Education equals:   If Income equals:   Then, children equals:
         0                     0                  11.8
        10                     0                   8.2
        10                    10                   4.2
        20                    10                   0.6
        20                    11                   0.2
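One way to reproduce this table (and the two that follow) is simply to evaluate the equation; a small sketch, where predict_children is just an illustrative name and not from the slides:

```python
def predict_children(educ: float, income: float) -> float:
    """Predicted number of children from Y-hat = 11.8 - .36*Educ - .40*Income."""
    return 11.8 - 0.36 * educ - 0.40 * income

# The combinations shown in the table above
for educ, income in [(0, 0), (10, 0), (10, 10), (20, 10), (20, 11)]:
    print(f"Educ={educ:2d}, Income={income:2d} -> {predict_children(educ, income):.1f} children")
```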
Multiple Regression
So what does our equation tell us?
Ŷ = 11.8 - .36X1 - .40X2
Expected # of Children = 11.8 - .36*Educ - .40*Income

If Education equals:   If Income equals:   Then, children equals:
         1                     0                  11.44
         1                     1                  11.04
         1                     5                   9.44
         1                    10                   7.44
         1                    15                   5.44
Multiple Regression
So what does our equation tell us?
Ŷ = 11.8 - .36X1 - .40X2
Expected # of Children = 11.8 - .36*Educ - .40*Income

If Education equals:   If Income equals:   Then, children equals:
         0                     1                  11.40
         1                     1                  11.04
         5                     1                   9.60
        10                     1                   7.80
        15                     1                   6.00
Multiple Regression
If graphed, holding one variable constant produces a two-dimensional graph for the other variable.

[Graphs: holding Income at 1, the Education line (b = -.36) falls from 11.40 at X1 = 0 to 6.00 at X1 = 15; holding Education at 1, the Income line (b = -.40) falls from 11.44 at X2 = 0 to 5.44 at X2 = 15.]
Multiple Regression
 An interesting effect of controlling for other
variables is “Simpson’s Paradox.”
 The direction of relationship between two
variables can change when you control for
another variable.

  Education  --(+)-->  Crime Rate        Ŷ = -51.3 + 1.5X
Multiple Regression
 “Simpson’s Paradox”

Bivariate relationship:
  Education  --(+)-->  Crime Rate        Ŷ = -51.3 + 1.5X1

Urbanization (is related to both):
  Urbanization  --(+)-->  Education
  Urbanization  --(+)-->  Crime Rate

Regression Controlling for Urbanization:
  Education     --(-)-->  Crime Rate     Ŷ = 58.9 - .6X1 + .7X2
  Urbanization  --(+)-->  Crime Rate
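To see the sign flip concretely, here is a sketch using synthetic, made-up data in which urbanization drives both education and crime; the generated numbers are illustrative only and are not the data behind the slide's equations:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Synthetic data: urbanization raises both education and crime (illustrative values only)
urban = rng.uniform(0, 100, n)
educ  = 0.5 * urban + rng.normal(0, 10, n)
crime = 58.9 - 0.6 * educ + 0.7 * urban + rng.normal(0, 5, n)

# Bivariate regression: crime on education only
b_bivariate = np.polyfit(educ, crime, 1)[0]

# Multiple regression: crime on education and urbanization
X = np.column_stack([np.ones(n), educ, urban])
b_controlled = np.linalg.lstsq(X, crime, rcond=None)[0][1]

print(f"Bivariate education slope: {b_bivariate:+.2f}")             # positive
print(f"Education slope controlling for urbanization: {b_controlled:+.2f}")  # near -0.6
```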
Multiple Regression

[Graph: Crime Rate by Education. The original bivariate regression line slopes upward, but looking at each level of urbanization (rural, small town, suburban, city) produces new, downward-sloping lines.]
Multiple Regression
Now… More Variables!
 The social world is very complex.

 What happens when you have even more variables?

 For example:
A sociologist may be interested in the effects of Education, Income,
Sex, and Gender Attitudes on Number of Children in a family.
Independent Variables: Education, Family Income, Sex, Gender Attitudes  →  Dependent Variable: Number of Children
Multiple Regression
 Null Hypotheses:
1. There will be no relationship between education of respondents and
the number of children in families. Ho : b1 = 0 Ha : b1 ≠ 0
2. There will be no relationship between family income and the number
of children in families. Ho : b2 = 0 Ha : b2 ≠ 0
3. There will be no relationship between sex and number of children.
Ho: b3 = 0 Ha : b3 ≠ 0
4. There will be no relationship between gender attitudes and number
of children. Ho : b4 = 0 Ha : b4 ≠ 0
Independent Variables: Education, Family Income, Sex, Gender Attitudes  →  Dependent Variable: Number of Children
Multiple Regression
 Bivariate regression is based on fitting a line as close
as possible to the plotted coordinates of your data on a
two-dimensional graph.
 Trivariate regression is based on fitting a plane as
close as possible to the plotted coordinates of your
data on a three-dimensional graph.
 Regression with more than two independent variables
is based on fitting a shape to your constellation of data
on a multi-dimensional graph.
Multiple Regression
 Regression with more than two independent variables
is based on fitting a shape to your constellation of
data on a multi-dimensional graph.
 The shape will be placed so that it minimizes the
distance (sum of squared errors) from the shape to
every data point.
Multiple Regression
 Regression with more than two independent variables
is based on fitting a shape to your constellation of data
on a multi-dimensional graph.
 The shape will be placed so that it minimizes the
distance (sum of squared errors) from the shape to
every data point.
 The shape is no longer a line, but if you hold all other
variables constant, it is linear for each independent
variable.
Multiple Regression
Imagining a graph with four dimensions!

[Figure: a stack of three-dimensional Y-by-X1-by-X2 plots, one for each value of the additional variables, suggesting what a four-or-more-dimensional graph would look like.]
Multiple Regression
For our problem, our equation could be:

  Ŷ = 7.5 - .30X1 - .40X2 + 0.5X3 + 0.25X4

  E(Children) = 7.5 - .30*Educ - .40*Income + 0.5*Sex + 0.25*Gender Att.
Multiple Regression
So what does our equation tell us?

  Ŷ = 7.5 - .30X1 - .40X2 + 0.5X3 + 0.25X4
  E(Children) = 7.5 - .30*Educ - .40*Income + 0.5*Sex + 0.25*Gender Att.

Education:   Income:   Sex:   Gender Att:   Children:
    10           5       0         0           2.5
    10           5       0         5           3.75
    10          10       0         5           1.75
    10           5       1         0           3.0
    10           5       1         5           4.25
Multiple Regression
Each variable, holding the other variables constant, has a linear,
two-dimensional graph of its relationship with the dependent variable.
Here we hold every other variable constant at “zero.”

[Graphs: the Education line (b = -.30) falls from 7.5 at X1 = 0 to 4.5 at X1 = 10; the Income line (b = -.40) falls from 7.5 at X2 = 0 to 3.5 at X2 = 10.]

  Ŷ = 7.5 - .30X1 - .40X2 + 0.5X3 + 0.25X4
Multiple Regression
Each variable, holding the other variables constant, has a linear,
two-dimensional graph of its relationship with the dependent variable.
Here we hold every other variable constant at “zero.”

[Graphs: the Sex line (b = .5) rises from 7.5 at X3 = 0 to 8 at X3 = 1; the Gender Attitudes line (b = .25) rises from 7.5 at X4 = 0 to 8.75 at X4 = 5.]

  Ŷ = 7.5 - .30X1 - .40X2 + 0.5X3 + 0.25X4
Multiple Regression
 Okay, we’re almost through with regression!
Multiple Regression
 Dummy Variables

  What are dummy variables?!

 They are simply dichotomous variables that are entered into regression. They have 0 – 1 coding where 0 = absence of something and 1 = presence of something. E.g., Female (0 = M; 1 = F) or Southern (0 = Non-Southern; 1 = Southern).
Multiple Regression
Dummy variables are especially nice because they allow us to use nominal variables in regression.

  “But YOU said we CAN’T do that!”

A nominal variable has no rank or order, rendering the numerical coding scheme useless for regression.
Multiple Regression
 The way you use nominal variables in regression is by converting them to a series of dummy variables.

Nominal Variable: Race (1 = White, 2 = Black, 3 = Other)

Recode into different dummy variables:
  1. White:  0 = Not White; 1 = White
  2. Black:  0 = Not Black; 1 = Black
  3. Other:  0 = Not Other; 1 = Other
Multiple Regression
 The way you use nominal variables in regression is by converting them to a series of dummy variables.

Nominal Variable: Religion (1 = Catholic, 2 = Protestant, 3 = Jewish, 4 = Muslim, 5 = Other Religions)

Recode into different dummy variables:
  1. Catholic:         0 = Not Catholic; 1 = Catholic
  2. Protestant:       0 = Not Prot.; 1 = Protestant
  3. Jewish:           0 = Not Jewish; 1 = Jewish
  4. Muslim:           0 = Not Muslim; 1 = Muslim
  5. Other Religions:  0 = Not Other; 1 = Other Relig.
Multiple Regression
 When you need to use a nominal variable in
regression (like race), just convert it to a
series of dummy variables.
 When you enter the variables into your
model, you MUST LEAVE OUT ONE OF
THE DUMMIES.
Leave Out One      Enter Rest into Regression
White              Black
                   Other
Multiple Regression
 The reason you MUST LEAVE OUT ONE OF THE
DUMMIES is that regression is mathematically
impossible without an excluded group.
 If all were in, holding one of them constant would
prohibit variation in all the rest.
Leave Out One      Enter Rest into Regression
Catholic           Protestant
                   Jewish
                   Muslim
                   Other Religion
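In practice, statistical software can build the dummies and drop the reference category for you. A sketch using pandas (the data and names here are hypothetical), where drop_first=True leaves one category out as the excluded reference group:

```python
import pandas as pd

# Hypothetical data with a nominal religion variable
df = pd.DataFrame({"religion": ["Catholic", "Protestant", "Jewish", "Muslim", "Other", "Catholic"]})

# One dummy per category, leaving out the first category (here Catholic) as the excluded group
dummies = pd.get_dummies(df["religion"], prefix="rel", drop_first=True)
print(dummies)
```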
Multiple Regression
 The regression equations for dummies will look the same.

For Race, with 3 dummies, predicting self-esteem:

  Ŷ = a + b1X1 + b2X2

  a  = the y-intercept, which in this case is the predicted value of self-esteem for the excluded group, white
  b1 = the slope for variable X1, black
  b2 = the slope for variable X2, other
Multiple Regression
 If our equation were:

For Race, with 3 dummies, predicting self-esteem:

  Ŷ = 28 + 5X1 – 2X2

  a  = 28 = the y-intercept, which in this case is the predicted value of self-esteem for the excluded group, white
  5  = the slope for variable X1, black
  -2 = the slope for variable X2, other

Plugging in values for the dummies tells you each group’s self-esteem average:
  White = 28
  Black = 33
  Other = 26

When cases’ values for X1 = 0 and X2 = 0, they are white;
when X1 = 1 and X2 = 0, they are black;
when X1 = 0 and X2 = 1, they are other.
Multiple Regression
 Dummy variables can be entered into multiple
regression along with other dichotomous and
continuous variables.
 For example, you could regress self-esteem on sex,
race, and education:
  Ŷ = a + b1X1 + b2X2 + b3X3 + b4X4

How would you interpret this?

  Ŷ = 30 – 4X1 + 5X2 – 2X3 + 0.3X4

  X1 = Female   X2 = Black   X3 = Other   X4 = Education
Multiple Regression
How would you interpret this?

  Ŷ = 30 – 4X1 + 5X2 – 2X3 + 0.3X4

  X1 = Female   X2 = Black   X3 = Other   X4 = Education
1. Women’s self-esteem is 4 points lower than men’s.
2. Blacks’ self-esteem is 5 points higher than whites’.
3. Others’ self-esteem is 2 points lower than whites’ and
consequently 7 points lower than blacks’.
4. Each year of education improves self-esteem by 0.3 units.
Multiple Regression
How would you interpret this?

  Ŷ = 30 – 4X1 + 5X2 – 2X3 + 0.3X4

  X1 = Female   X2 = Black   X3 = Other   X4 = Education

Plugging in some select values, we’d get self-esteem for select groups:
 White males with 10 years of education = 33
 Black males with 10 years of education = 38
 Other females with 10 years of education = 27
 Other females with 16 years of education = 28.8
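A small sketch that reproduces these group predictions from the equation (the function and argument names are just for illustration, not from the slides):

```python
def self_esteem(female: int, black: int, other: int, education: float) -> float:
    """Predicted self-esteem from Y-hat = 30 - 4*Female + 5*Black - 2*Other + 0.3*Education."""
    return 30 - 4 * female + 5 * black - 2 * other + 0.3 * education

print(self_esteem(female=0, black=0, other=0, education=10))  # white male, 10 yrs of education: 33
print(self_esteem(female=0, black=1, other=0, education=10))  # black male, 10 yrs: 38
print(self_esteem(female=1, black=0, other=1, education=10))  # other female, 10 yrs: 27
print(self_esteem(female=1, black=0, other=1, education=16))  # other female, 16 yrs: 28.8
```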
Multiple Regression
How would you interpret this?

  Ŷ = 30 – 4X1 + 5X2 – 2X3 + 0.3X4

  X1 = Female   X2 = Black   X3 = Other   X4 = Education

The same regression rules apply. The slopes represent the linear relationship of each independent variable to the dependent variable while holding all other variables constant.

 Make sure you get into the habit of saying the slope is the effect of an independent variable on the dependent variable “while holding everything else constant.”
Multiple Regression
Standardized Coefficients
 Sometimes you want to know whether one variable has a
larger impact on your dependent variable than another.
 If your variables have different units of measure, it is hard
to compare their effects.
 For example, if wages go up one thousand dollars for
each year of education, is that a greater effect than if
wages go up five hundred dollars for each year increase in
age?
Multiple Regression
Standardized Coefficients
 So which is better for increasing wages, education or aging?

 One thing you can do is “standardize” your slopes so that you can
compare the standard deviation increase in your dependent variable for
each standard deviation increase in your independent variables.
 You might find that Wages go up 0.3 standard deviations for each
standard deviation increase in education, but 0.4 standard deviations
for each standard deviation increase in age.
Multiple Regression
Standardized Coefficients
 Recall that standardizing regression coefficients is accomplished by the formula: Beta = b(Sx / Sy)

Coefficients(a)
  Model 1        Unstandardized B   Std. Error   Standardized Beta        t     Sig.
    (Constant)           11.770        1.734                          6.787    .000
    Education             -.364         .173             -.412       -2.105    .047
    Income                -.403         .194             -.408       -2.084    .049
  a. Dependent Variable: Children

 In the example above, education and income have very comparable effects on number of children.
 Each lowers the number of children by .4 standard deviations for a standard deviation increase in each, controlling for the other.
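A sketch (not from the slides) showing how the Beta column follows from Beta = b(Sx/Sy), using the unstandardized slopes from the SPSS output and the 25-case data from earlier; the results should be close to -.412 and -.408:

```python
import numpy as np

children  = np.array([2, 5, 1, 9, 6, 3, 0, 3, 7, 7, 2, 5, 1, 9, 6, 3, 0, 3, 7, 14, 2, 5, 1, 9, 6])
education = np.array([12, 16, 20, 12, 9, 18, 16, 14, 9, 12, 12, 10, 20, 11, 9, 18, 16, 14, 9, 8, 12, 10, 20, 11, 9])
income    = np.array([3, 4, 9, 5, 4, 12, 10, 1, 4, 3, 10, 4, 9, 4, 4, 12, 10, 6, 4, 1, 10, 3, 9, 2, 4])

b_educ, b_inc = -0.364, -0.403   # unstandardized slopes from the SPSS output

# Beta = b * (Sx / Sy): the slope rescaled to standard-deviation units
beta_educ = b_educ * education.std(ddof=1) / children.std(ddof=1)
beta_inc  = b_inc  * income.std(ddof=1)    / children.std(ddof=1)
print(f"Beta (Education) = {beta_educ:.3f}")   # should be close to -.412
print(f"Beta (Income)    = {beta_inc:.3f}")    # should be close to -.408
```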


Multiple Regression
Standardized Coefficients
 One last note of caution...
 It does not make sense to standardize slopes for
dichotomous variables.
 It makes no sense to refer to standard deviation increases
in sex or in race; these values are either 0 or 1.
Multiple Regression

 Give yourself a hand…

 You now understand more statistics than 99% of the population!

 You are well-qualified for understanding most sociological research papers.
