
Multiple Regression

Multiple Regression
The test you choose depends on level of measurement:
Independent Variable(s)                          Dependent Variable          Test
Dichotomous                                      Interval-Ratio              Independent Samples t-test
Nominal (or Dichotomous)                         Nominal (or Dichotomous)    Cross Tabs
Nominal (or Dichotomous)                         Interval-Ratio              ANOVA
Interval-Ratio (or Dichotomous)                  Interval-Ratio              Bivariate Regression/Correlation
Two or more Interval-Ratio (or Dichotomous)      Interval-Ratio              Multiple Regression
Multiple Regression
 Multiple Regression is very popular among
sociologists.
 Most social phenomena have more than one
cause.
 It is very difficult to manipulate just one social
variable through experimentation.
 Sociologists must attempt to model complex
social realities to explain them.
Multiple Regression
 Multiple Regression allows us to:
 Use several variables at once to explain the variation in a
continuous dependent variable.
 Isolate the unique effect of one variable on the continuous
dependent variable while taking into consideration that
other variables are affecting it too.
 Write a mathematical equation that tells us the overall
effects of several variables together and the unique effects
of each on a continuous dependent variable.
 Control for other variables to demonstrate whether
bivariate relationships are spurious.
Multiple Regression
 For example:
A sociologist may be interested in the
relationship between Education and Income
and Number of Children in a family.

Independent Variables: Education, Family Income  →  Dependent Variable: Number of Children
Multiple Regression
 For example:
 Null Hypothesis: There is no relationship between
education of respondents and the number of children in
families. Ho : b1 = 0
 Null Hypothesis: There is no relationship between family
income and the number of children in families. Ho : b2 = 0

Independent Variables: Education, Family Income  →  Dependent Variable: Number of Children
Multiple Regression
 Bivariate regression is based on fitting a line as close
as possible to the plotted coordinates of your data on
a two-dimensional graph.
 Trivariate regression is based on fitting a plane as
close as possible to the plotted coordinates of your
data on a three-dimensional graph.
Case:                 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
Children (Y):         2  5  1  9  6  3  0  3  7  7  2  5  1  9  6  3  0  3  7 14  2  5  1  9  6
Education (X1):      12 16 20 12  9 18 16 14  9 12 12 10 20 11  9 18 16 14  9  8 12 10 20 11  9
Income, 1=$10K (X2):  3  4  9  5  4 12 10  1  4  3 10  4  9  4  4 12 10  6  4  1 10  3  9  2  4
Multiple Regression
[3-D scatterplot: plotted coordinates for Education (X1), Income (X2), and Number of Children (Y) from the 25 cases above]
Multiple Regression
What multiple regression does is fit a plane to these coordinates.

[3-D scatterplot: the same coordinates with a fitted regression plane; axes Y, X1, X2]
Multiple Regression
 Mathematically, that plane is:

   Ŷ = a + b1X1 + b2X2

   a = y-intercept, where the X's equal zero
   b = coefficient or slope for each variable

 For our problem, SPSS says the equation is:

   Ŷ = 11.8 - .36X1 - .40X2
   Expected # of Children = 11.8 - .36*Educ - .40*Income
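For readers who want to check this outside SPSS, here is a minimal sketch in Python (not part of the original slides) that fits the same plane to the 25 cases listed earlier by ordinary least squares; the coefficients should come out close to the SPSS values above.

```python
import numpy as np

# Data from the slides: 25 cases of children, education, and income (in $10Ks)
children  = np.array([2, 5, 1, 9, 6, 3, 0, 3, 7, 7, 2, 5, 1, 9, 6, 3, 0, 3, 7, 14, 2, 5, 1, 9, 6])
education = np.array([12, 16, 20, 12, 9, 18, 16, 14, 9, 12, 12, 10, 20, 11, 9, 18, 16, 14, 9, 8, 12, 10, 20, 11, 9])
income    = np.array([3, 4, 9, 5, 4, 12, 10, 1, 4, 3, 10, 4, 9, 4, 4, 12, 10, 6, 4, 1, 10, 3, 9, 2, 4])

# Design matrix: a column of 1s for the intercept (a), then X1 and X2
X = np.column_stack([np.ones_like(education), education, income])

# Ordinary least squares: minimizes the sum of squared errors from the plane
coefs, *_ = np.linalg.lstsq(X, children, rcond=None)
a, b1, b2 = coefs
print(f"Y-hat = {a:.1f} + {b1:.2f}*Educ + {b2:.2f}*Income")
# Expect roughly: Y-hat = 11.8 - .36*Educ - .40*Income
```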
Multiple Regression
Conducting a Test of Significance for the Slopes of the Regression Shape

By slapping the sampling distribution for the slopes over a guess of the population's slopes, Ho, we can find out whether our sample could have been drawn from a population where the slopes are equal to our guess.

1. Two-tailed significance test for α-level = .05
2. Critical t = +/- 1.96
3. To find if there is a significant slope in the population,
      Ho: β1 = 0 ; β2 = 0
      Ha: β1 ≠ 0 ; β2 ≠ 0
4. Collect data.
5. Calculate t (z) for each slope:
      t = (b – βo) / s.e.,   where   s.e. = sqrt[ Σ(Y – Ŷ)² / (n – 2) ] / sqrt[ Σ(X – X̄)² ]
6. Make a decision about the null hypotheses.
7. Find P-values.
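As a quick illustration of steps 5-7 (not from the slides), here is a sketch for the Education slope, assuming the slope and standard error reported in the SPSS output on the next slide and the residual degrees of freedom from its ANOVA table (22):

```python
from scipy import stats

b, se = -0.364, 0.173            # Education slope and standard error from the SPSS output
t = (b - 0) / se                 # test against Ho: beta = 0
df = 25 - 2 - 1                  # residual degrees of freedom reported by SPSS (n minus 3 estimated parameters)
p = 2 * stats.t.sf(abs(t), df)   # two-tailed P-value
print(f"t = {t:.3f}, p = {p:.3f}")   # roughly t = -2.10, p = .047
```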
Multiple Regression
Model Summary
  Model 1:  R = .757(a)   R Square = .573   Adjusted R Square = .534   Std. Error of the Estimate = 2.33785
  a. Predictors: (Constant), Income, Education

ANOVA(b)
  Model 1           Sum of Squares   df   Mean Square       F      Sig.
    Regression           161.518      2       80.759    14.776   .000(a)
    Residual             120.242     22        5.466
    Total                281.760     24
  a. Predictors: (Constant), Income, Education
  b. Dependent Variable: Children

  Ŷ = 11.8 - .36X1 - .40X2

Coefficients(a)
  Model 1        Unstandardized B   Std. Error   Standardized Beta        t     Sig.
    (Constant)           11.770        1.734                          6.787    .000
    Education             -.364         .173             -.412       -2.105    .047
    Income                -.403         .194             -.408       -2.084    .049
  a. Dependent Variable: Children

  Sig. Tests: t-scores and P-values
Multiple Regression
 R²
 R² = (TSS – SSE) / TSS
 TSS = distance from the mean to the value on Y for each case
 SSE = distance from the regression shape to the value on Y for each case
 Can be interpreted the same way for multiple regression: the joint explanatory value of all of your variables (or "your model")
 Can request a change-in-R² test from SPSS to see if adding new variables improves the fit of your model (a sketch of this test follows the table below)

Model Summary
  Model 1:  R = .757(a)   R Square = .573   Adjusted R Square = .534   Std. Error of the Estimate = 2.33785
  a. Predictors: (Constant), Income, Education
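The change-in-R² test mentioned above can also be run outside SPSS. A hedged sketch with the statsmodels formula API, using the 25 cases from the earlier slides (the column and model names here are my own, not from the slides):

```python
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# The 25 cases from the earlier slides
df = pd.DataFrame({
    "children":  [2, 5, 1, 9, 6, 3, 0, 3, 7, 7, 2, 5, 1, 9, 6, 3, 0, 3, 7, 14, 2, 5, 1, 9, 6],
    "education": [12, 16, 20, 12, 9, 18, 16, 14, 9, 12, 12, 10, 20, 11, 9, 18, 16, 14, 9, 8, 12, 10, 20, 11, 9],
    "income":    [3, 4, 9, 5, 4, 12, 10, 1, 4, 3, 10, 4, 9, 4, 4, 12, 10, 6, 4, 1, 10, 3, 9, 2, 4],
})

reduced = smf.ols("children ~ education", data=df).fit()            # model without income
full    = smf.ols("children ~ education + income", data=df).fit()   # model with income added

# F-test for whether adding income significantly improves fit (the change-in-R-square test)
print(anova_lm(reduced, full))
print(f"R-square change: {full.rsquared - reduced.rsquared:.3f}")
```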
Multiple Regression
Model Summary
  Model 1:  R = .757(a)   R Square = .573   Adjusted R Square = .534   Std. Error of the Estimate = 2.33785
  a. Predictors: (Constant), Income, Education

57% of the variation in number of children is explained by education and income!

(ANOVA and Coefficients tables as on the previous slide.)
Multiple Regression
r²

  r² = [ Σ(Y – Ȳ)² – Σ(Y – Ŷ)² ] / Σ(Y – Ȳ)²

From the ANOVA table:  (281.760 – 120.242) ÷ 281.760  =  161.518 ÷ 281.76  =  .573

Model Summary
  Model 1:  R = .757(a)   R Square = .573   Adjusted R Square = .534   Std. Error of the Estimate = 2.33785
  a. Predictors: (Constant), Income, Education

ANOVA(b)
  Model 1           Sum of Squares   df   Mean Square       F      Sig.
    Regression           161.518      2       80.759    14.776   .000(a)
    Residual             120.242     22        5.466
    Total                281.760     24
  a. Predictors: (Constant), Income, Education
  b. Dependent Variable: Children

  Ŷ = 11.8 - .36X1 - .40X2

(Coefficients table as on the previous slides.)
Multiple Regression
So what does our equation tell us?

Ŷ = 11.8 - .36X1 - .40X2
Expected # of Children = 11.8 - .36*Educ - .40*Income

 Try “plugging in” some values for your variables.
Multiple Regression
So what does our equation tell us?
Ŷ = 11.8 - .36X1 - .40X2
Expected # of Children = 11.8 - .36*Educ - .40*Income

If Education equals:   If Income equals:   Then, children equals:
         0                     0                  11.8
        10                     0                   8.2
        10                    10                   4.2
        20                    10                   0.6
        20                    11                   0.2
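One way to reproduce this table (and the two that follow) is simply to evaluate the equation; a small sketch, where predict_children is just an illustrative name and not from the slides:

```python
def predict_children(educ: float, income: float) -> float:
    """Predicted number of children from Y-hat = 11.8 - .36*Educ - .40*Income."""
    return 11.8 - 0.36 * educ - 0.40 * income

# The combinations shown in the table above
for educ, income in [(0, 0), (10, 0), (10, 10), (20, 10), (20, 11)]:
    print(f"Educ={educ:2d}, Income={income:2d} -> {predict_children(educ, income):.1f} children")
```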
Multiple Regression
So what does our equation tell us?
Ŷ = 11.8 - .36X1 - .40X2
Expected # of Children = 11.8 - .36*Educ - .40*Income

If Education equals:   If Income equals:   Then, children equals:
         1                     0                  11.44
         1                     1                  11.04
         1                     5                   9.44
         1                    10                   7.44
         1                    15                   5.44
Multiple Regression
So what does our equation tell us?
Ŷ = 11.8 - .36X1 - .40X2
Expected # of Children = 11.8 - .36*Educ - .40*Income

If Education equals:   If Income equals:   Then, children equals:
         0                     1                  11.40
         1                     1                  11.04
         5                     1                   9.60
        10                     1                   7.80
        15                     1                   6.00
Multiple Regression
If graphed, holding one variable constant produces a two-dimensional graph for the other variable.

[Graphs: holding Income at 1, the Education line (b = -.36) falls from 11.40 at X1 = 0 to 6.00 at X1 = 15; holding Education at 1, the Income line (b = -.40) falls from 11.44 at X2 = 0 to 5.44 at X2 = 15.]
Multiple Regression
 An interesting effect of controlling for other
variables is “Simpson’s Paradox.”
 The direction of relationship between two
variables can change when you control for
another variable.

  Education  --(+)-->  Crime Rate        Ŷ = -51.3 + 1.5X
Multiple Regression
 “Simpson’s Paradox”

Bivariate relationship:
  Education  --(+)-->  Crime Rate        Ŷ = -51.3 + 1.5X1

Urbanization (is related to both):
  Urbanization  --(+)-->  Education
  Urbanization  --(+)-->  Crime Rate

Regression Controlling for Urbanization:
  Education     --(-)-->  Crime Rate     Ŷ = 58.9 - .6X1 + .7X2
  Urbanization  --(+)-->  Crime Rate
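To see the sign flip concretely, here is a sketch using synthetic, made-up data in which urbanization drives both education and crime; the generated numbers are illustrative only and are not the data behind the slide's equations:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Synthetic data: urbanization raises both education and crime (illustrative values only)
urban = rng.uniform(0, 100, n)
educ  = 0.5 * urban + rng.normal(0, 10, n)
crime = 58.9 - 0.6 * educ + 0.7 * urban + rng.normal(0, 5, n)

# Bivariate regression: crime on education only
b_bivariate = np.polyfit(educ, crime, 1)[0]

# Multiple regression: crime on education and urbanization
X = np.column_stack([np.ones(n), educ, urban])
b_controlled = np.linalg.lstsq(X, crime, rcond=None)[0][1]

print(f"Bivariate education slope: {b_bivariate:+.2f}")             # positive
print(f"Education slope controlling for urbanization: {b_controlled:+.2f}")  # near -0.6
```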
Multiple Regression

[Graph: Crime Rate by Education. The original bivariate regression line slopes upward, but looking at each level of urbanization (rural, small town, suburban, city) produces new, downward-sloping lines.]
Multiple Regression
Now… More Variables!
 The social world is very complex.

 What happens when you have even more variables?

 For example:
A sociologist may be interested in the effects of Education, Income,
Sex, and Gender Attitudes on Number of Children in a family.
Independent Variables: Education, Family Income, Sex, Gender Attitudes  →  Dependent Variable: Number of Children
Multiple Regression
 Null Hypotheses:
1. There will be no relationship between education of respondents and
the number of children in families. Ho : b1 = 0 Ha : b1 ≠ 0
2. There will be no relationship between family income and the number
of children in families. Ho : b2 = 0 Ha : b2 ≠ 0
3. There will be no relationship between sex and number of children.
Ho: b3 = 0 Ha : b3 ≠ 0
4. There will be no relationship between gender attitudes and number
of children. Ho : b4 = 0 Ha : b4 ≠ 0
Independent Variables: Education, Family Income, Sex, Gender Attitudes  →  Dependent Variable: Number of Children
Multiple Regression
 Bivariate regression is based on fitting a line as close
as possible to the plotted coordinates of your data on a
two-dimensional graph.
 Trivariate regression is based on fitting a plane as
close as possible to the plotted coordinates of your
data on a three-dimensional graph.
 Regression with more than two independent variables
is based on fitting a shape to your constellation of data
on a multi-dimensional graph.
Multiple Regression
 Regression with more than two independent variables
is based on fitting a shape to your constellation of
data on a multi-dimensional graph.
 The shape will be placed so that it minimizes the
distance (sum of squared errors) from the shape to
every data point.
Multiple Regression
 Regression with more than two independent variables
is based on fitting a shape to your constellation of data
on a multi-dimensional graph.
 The shape will be placed so that it minimizes the
distance (sum of squared errors) from the shape to
every data point.
 The shape is no longer a line, but if you hold all other
variables constant, it is linear for each independent
variable.
Multiple Regression
Imagining a graph with four dimensions!

[Figure: a stack of three-dimensional Y-by-X1-by-X2 plots, one for each value of the additional variables, suggesting what a four-or-more-dimensional graph would look like.]
Multiple Regression
For our problem, our equation could be:

  Ŷ = 7.5 - .30X1 - .40X2 + 0.5X3 + 0.25X4

  E(Children) = 7.5 - .30*Educ - .40*Income + 0.5*Sex + 0.25*Gender Att.
Multiple Regression
So what does our equation tell us?

  Ŷ = 7.5 - .30X1 - .40X2 + 0.5X3 + 0.25X4
  E(Children) = 7.5 - .30*Educ - .40*Income + 0.5*Sex + 0.25*Gender Att.

Education:   Income:   Sex:   Gender Att:   Children:
    10           5       0         0           2.5
    10           5       0         5           3.75
    10          10       0         5           1.75
    10           5       1         0           3.0
    10           5       1         5           4.25
Multiple Regression
Each variable, holding the other variables constant, has a linear,
two-dimensional graph of its relationship with the dependent variable.
Here we hold every other variable constant at “zero.”

[Graphs: the Education line (b = -.30) falls from 7.5 at X1 = 0 to 4.5 at X1 = 10; the Income line (b = -.40) falls from 7.5 at X2 = 0 to 3.5 at X2 = 10.]

  Ŷ = 7.5 - .30X1 - .40X2 + 0.5X3 + 0.25X4
Multiple Regression
Each variable, holding the other variables constant, has a linear,
two-dimensional graph of its relationship with the dependent variable.
Here we hold every other variable constant at “zero.”

[Graphs: the Sex line (b = .5) rises from 7.5 at X3 = 0 to 8 at X3 = 1; the Gender Attitudes line (b = .25) rises from 7.5 at X4 = 0 to 8.75 at X4 = 5.]

  Ŷ = 7.5 - .30X1 - .40X2 + 0.5X3 + 0.25X4
Multiple Regression
 Okay, we’re almost through with regression!
Multiple Regression
 Dummy Variables

  What are dummy variables?!

 They are simply dichotomous variables that are entered into regression. They have 0 – 1 coding where 0 = absence of something and 1 = presence of something. E.g., Female (0 = M; 1 = F) or Southern (0 = Non-Southern; 1 = Southern).
Multiple Regression
Dummy variables are especially nice because they allow us to use nominal variables in regression.

  “But YOU said we CAN’T do that!”

A nominal variable has no rank or order, rendering the numerical coding scheme useless for regression.
Multiple Regression
 The way you use nominal variables in regression is by converting them to a series of dummy variables.

Nominal Variable: Race (1 = White, 2 = Black, 3 = Other)

Recode into different dummy variables:
  1. White:  0 = Not White; 1 = White
  2. Black:  0 = Not Black; 1 = Black
  3. Other:  0 = Not Other; 1 = Other
Multiple Regression
 The way you use nominal variables in regression is by converting them to a series of dummy variables.

Nominal Variable: Religion (1 = Catholic, 2 = Protestant, 3 = Jewish, 4 = Muslim, 5 = Other Religions)

Recode into different dummy variables:
  1. Catholic:         0 = Not Catholic; 1 = Catholic
  2. Protestant:       0 = Not Prot.; 1 = Protestant
  3. Jewish:           0 = Not Jewish; 1 = Jewish
  4. Muslim:           0 = Not Muslim; 1 = Muslim
  5. Other Religions:  0 = Not Other; 1 = Other Relig.
Multiple Regression
 When you need to use a nominal variable in
regression (like race), just convert it to a
series of dummy variables.
 When you enter the variables into your
model, you MUST LEAVE OUT ONE OF
THE DUMMIES.
Leave Out One      Enter Rest into Regression
White              Black
                   Other
Multiple Regression
 The reason you MUST LEAVE OUT ONE OF THE
DUMMIES is that regression is mathematically
impossible without an excluded group.
 If all were in, holding one of them constant would
prohibit variation in all the rest.
Leave Out One      Enter Rest into Regression
Catholic           Protestant
                   Jewish
                   Muslim
                   Other Religion
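In practice, statistical software can build the dummies and drop the reference category for you. A sketch using pandas (the data and names here are hypothetical), where drop_first=True leaves one category out as the excluded reference group:

```python
import pandas as pd

# Hypothetical data with a nominal religion variable
df = pd.DataFrame({"religion": ["Catholic", "Protestant", "Jewish", "Muslim", "Other", "Catholic"]})

# One dummy per category, leaving out the first category (here Catholic) as the excluded group
dummies = pd.get_dummies(df["religion"], prefix="rel", drop_first=True)
print(dummies)
```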
Multiple Regression
 The regression equations for dummies will look the same.

For Race, with 3 dummies, predicting self-esteem:

  Ŷ = a + b1X1 + b2X2

  a  = the y-intercept, which in this case is the predicted value of self-esteem for the excluded group, white
  b1 = the slope for variable X1, black
  b2 = the slope for variable X2, other
Multiple Regression
 If our equation were:

For Race, with 3 dummies, predicting self-esteem:

  Ŷ = 28 + 5X1 – 2X2

  a  = 28 = the y-intercept, which in this case is the predicted value of self-esteem for the excluded group, white
  5  = the slope for variable X1, black
  -2 = the slope for variable X2, other

Plugging in values for the dummies tells you each group’s self-esteem average:
  White = 28
  Black = 33
  Other = 26

When cases’ values for X1 = 0 and X2 = 0, they are white;
when X1 = 1 and X2 = 0, they are black;
when X1 = 0 and X2 = 1, they are other.
Multiple Regression
 Dummy variables can be entered into multiple
regression along with other dichotomous and
continuous variables.
 For example, you could regress self-esteem on sex,
race, and education:
  Ŷ = a + b1X1 + b2X2 + b3X3 + b4X4

How would you interpret this?

  Ŷ = 30 – 4X1 + 5X2 – 2X3 + 0.3X4

  X1 = Female   X2 = Black   X3 = Other   X4 = Education
Multiple Regression
How would you interpret this?

  Ŷ = 30 – 4X1 + 5X2 – 2X3 + 0.3X4

  X1 = Female   X2 = Black   X3 = Other   X4 = Education
1. Women’s self-esteem is 4 points lower than men’s.
2. Blacks’ self-esteem is 5 points higher than whites’.
3. Others’ self-esteem is 2 points lower than whites’ and
consequently 7 points lower than blacks’.
4. Each year of education improves self-esteem by 0.3 units.
Multiple Regression
How would you interpret this?

  Ŷ = 30 – 4X1 + 5X2 – 2X3 + 0.3X4

  X1 = Female   X2 = Black   X3 = Other   X4 = Education

Plugging in some select values, we’d get self-esteem for select groups:
 White males with 10 years of education = 33
 Black males with 10 years of education = 38
 Other females with 10 years of education = 27
 Other females with 16 years of education = 28.8
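A small sketch that reproduces these group predictions from the equation (the function and argument names are just for illustration, not from the slides):

```python
def self_esteem(female: int, black: int, other: int, education: float) -> float:
    """Predicted self-esteem from Y-hat = 30 - 4*Female + 5*Black - 2*Other + 0.3*Education."""
    return 30 - 4 * female + 5 * black - 2 * other + 0.3 * education

print(self_esteem(female=0, black=0, other=0, education=10))  # white male, 10 yrs of education: 33
print(self_esteem(female=0, black=1, other=0, education=10))  # black male, 10 yrs: 38
print(self_esteem(female=1, black=0, other=1, education=10))  # other female, 10 yrs: 27
print(self_esteem(female=1, black=0, other=1, education=16))  # other female, 16 yrs: 28.8
```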
Multiple Regression
How would you interpret this?

  Ŷ = 30 – 4X1 + 5X2 – 2X3 + 0.3X4

  X1 = Female   X2 = Black   X3 = Other   X4 = Education

The same regression rules apply. The slopes represent the linear relationship of each independent variable to the dependent variable while holding all other variables constant.

 Make sure you get into the habit of saying the slope is the effect of an independent variable on the dependent variable “while holding everything else constant.”
Multiple Regression
Standardized Coefficients
 Sometimes you want to know whether one variable has a
larger impact on your dependent variable than another.
 If your variables have different units of measure, it is hard
to compare their effects.
 For example, if wages go up one thousand dollars for
each year of education, is that a greater effect than if
wages go up five hundred dollars for each year increase in
age?
Multiple Regression
Standardized Coefficients
 So which is better for increasing wages, education or aging?

 One thing you can do is “standardize” your slopes so that you can
compare the standard deviation increase in your dependent variable for
each standard deviation increase in your independent variables.
 You might find that Wages go up 0.3 standard deviations for each
standard deviation increase in education, but 0.4 standard deviations
for each standard deviation increase in age.
Multiple Regression
Standardized Coefficients
 Recall that standardizing regression coefficients is accomplished by the formula: Beta = b(Sx / Sy)

Coefficients(a)
  Model 1        Unstandardized B   Std. Error   Standardized Beta        t     Sig.
    (Constant)           11.770        1.734                          6.787    .000
    Education             -.364         .173             -.412       -2.105    .047
    Income                -.403         .194             -.408       -2.084    .049
  a. Dependent Variable: Children

 In the example above, education and income have very comparable effects on number of children.
 Each lowers the number of children by .4 standard deviations for a standard deviation increase in each, controlling for the other.
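A sketch (not from the slides) showing how the Beta column follows from Beta = b(Sx/Sy), using the unstandardized slopes from the SPSS output and the 25-case data from earlier; the results should be close to -.412 and -.408:

```python
import numpy as np

children  = np.array([2, 5, 1, 9, 6, 3, 0, 3, 7, 7, 2, 5, 1, 9, 6, 3, 0, 3, 7, 14, 2, 5, 1, 9, 6])
education = np.array([12, 16, 20, 12, 9, 18, 16, 14, 9, 12, 12, 10, 20, 11, 9, 18, 16, 14, 9, 8, 12, 10, 20, 11, 9])
income    = np.array([3, 4, 9, 5, 4, 12, 10, 1, 4, 3, 10, 4, 9, 4, 4, 12, 10, 6, 4, 1, 10, 3, 9, 2, 4])

b_educ, b_inc = -0.364, -0.403   # unstandardized slopes from the SPSS output

# Beta = b * (Sx / Sy): the slope rescaled to standard-deviation units
beta_educ = b_educ * education.std(ddof=1) / children.std(ddof=1)
beta_inc  = b_inc  * income.std(ddof=1)    / children.std(ddof=1)
print(f"Beta (Education) = {beta_educ:.3f}")   # should be close to -.412
print(f"Beta (Income)    = {beta_inc:.3f}")    # should be close to -.408
```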


Multiple Regression
Standardized Coefficients
 One last note of caution...
 It does not make sense to standardize slopes for
dichotomous variables.
 It makes no sense to refer to standard deviation increases
in sex or in race; these values are either 0 or 1.
Multiple Regression

 Give yourself a hand…

 You now understand more statistics than 99% of the population!

 You are well-qualified for understanding most sociological research papers.
