
Chapter 9, page 1

Math 445 Chapter 9 Multiple Regression

Multiple regression refers to regression with multiple explanatory variables (but just one response
variable). Multiple regression is an amazingly flexible tool which can be used to model linear and
nonlinear relationships. Don’t be fooled by the “linear” in “linear regression”: we’ve already seen
how simple linear regression can be used to model nonlinear relationships by transforming one or both
of the explanatory and response variables. Multiple regression offers still more ways to model such
relationships. It's even possible to incorporate categorical variables into multiple regression models.

Examples:
1. One explanatory variable, but a quadratic relationship
µ(Y | X) = β0 + β1X + β2X²
We can include higher order powers of X, although this is unusual unless there is a theoretical reason
for it. Note: we always include lower order terms when a higher order term is in a model. For
example, we always include X if X² is in the model.

2. Two explanatory variables:


µ(Y | X1, X2) = β0 + β1X1 + β2X2

3. Two explanatory variables with an interaction


µ(Y | X1, X2) = β0 + β1X1 + β2X2 + β3X1X2

The term X1X2 is the product of the two variables. We'll see why this is called an interaction below.

4. The explanatory variables can be binary (0,1). In fact, the ANOVA and pooled two-sample t models
can be written as special cases of the linear regression model.
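All four model forms fit into the same least-squares machinery; they differ only in which columns go into the design matrix. A minimal numpy sketch with made-up data (the values here are for illustration only, not from any example in these notes):

```python
import numpy as np

# Hypothetical data, for illustration only.
x1 = np.array([1.0, 2.0, 3.0, 4.0])
x2 = np.array([0.5, 1.5, 2.5, 3.5])
ones = np.ones_like(x1)

# Model 1: quadratic in one variable (the lower-order X column stays in).
X_quad = np.column_stack([ones, x1, x1 ** 2])

# Model 2: two explanatory variables.
X_two = np.column_stack([ones, x1, x2])

# Model 3: the interaction column is just the elementwise product X1*X2.
X_inter = np.column_stack([ones, x1, x2, x1 * x2])

print(X_quad.shape, X_two.shape, X_inter.shape)  # (4, 3) (4, 3) (4, 4)
```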

For normal-based inferences, we have the usual assumptions:


• Normality: the Y values at any particular combination of X values are normally distributed.

• Constant variance: the variance of the Y values is the same at every combination of X values.

• Independence: the Y's are independent draws from their respective distributions.

These assumptions can also be summarized by writing a linear regression model in the following way,
using model 2 above as an example:

Y = β0 + β1X1 + β2X2 + ε

where the ε's are independent N(0, σ) random variables (the subscript i has been omitted).

How do we fit the models? Least squares can still be used: find the values of the β's to minimize the
sum of squared residuals, ∑(Yᵢ − Ŷᵢ)² = ∑ resᵢ², summing over i = 1, …, n. It's not necessary to
examine the formulas for the least squares estimators of all the β's, but the formulas can be obtained
fairly easily using calculus, no matter how many β's there are. Formulas for standard errors of the
estimates can also be derived.
Confidence intervals and tests for individual coefficients can be computed under the assumptions of
the model. These will be covered in Chapter 10.
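The least-squares machinery can be sketched directly. This is a generic illustration with simulated data (the true coefficients 1, 2, and −3 and the seed are made up): β̂ solves the normal equations (XᵀX)β = Xᵀy, and the standard errors come from σ̂² times the diagonal of (XᵀX)⁻¹.

```python
import numpy as np

# Simulated data from a model like model 2: Y = b0 + b1*X1 + b2*X2 + eps.
rng = np.random.default_rng(1)
n = 40
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 5, n)
y = 1.0 + 2.0 * x1 - 3.0 * x2 + rng.normal(0, 1.0, n)

X = np.column_stack([np.ones(n), x1, x2])

# Least squares: solve the normal equations (X'X) beta = X'y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Standard errors: sigma^2 estimated from the residuals on n - 3 degrees
# of freedom, times the diagonal of (X'X)^{-1}.
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (n - X.shape[1])
se = np.sqrt(sigma2_hat * np.diag(np.linalg.inv(X.T @ X)))

print(np.round(beta_hat, 2))  # close to [1.0, 2.0, -3.0]
```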

Example 1: Ozone data again. Four variables measured: ozone, max temperature, wind speed, solar
radiation. Examine ozone vs. wind speed; loess fit on left.
[Two scatterplots of Ozone (ppb), 0–200, versus Wind speed (mph), 0–20: the left panel shows a loess
fit, the right panel a quadratic fit.]

A log transformation of ozone could be tried, but if the variance looks approximately constant, we
might not want to transform ozone. Instead, we might try a quadratic relationship (right panel, above):

µ(Ozone | Wind) =

Coefficients (Dependent Variable: Ozone(ppb))

                    B         Std. Error   Beta      t        Sig.
(Constant)          166.733   14.306                 11.655   .000
Wind speed (mph)    -19.958    2.735       -2.135    -7.298   .000
Wind^2                 .662     .124        1.564     5.347   .000

(B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient.)

Fitted model is

µ̂(Ozone | Wind) =

What is the predicted ozone level when Wind speed is 10 mph?

What do you think of the quadratic model based on the graph above?
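For the first question, the prediction just plugs Wind = 10 into the fitted quadratic. A quick check using the coefficients from the SPSS table above (a sketch, not SPSS output):

```python
# Coefficients from the fitted quadratic model above.
b0, b1, b2 = 166.733, -19.958, 0.662

def ozone_hat(wind):
    """Fitted mean ozone (ppb) at a given wind speed (mph)."""
    return b0 + b1 * wind + b2 * wind ** 2

print(round(ozone_hat(10), 3))  # 33.353
```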

Notes
• Interpretation of the coefficients in a quadratic model is not straightforward. In particular, we
cannot interpret β̂1 the way we did in a simple linear regression model, since the change in
Ozone when Wind speed changes is affected by both β̂1 and β̂2.

• If you include a quadratic term then you must also include a linear term for that variable. It
does not matter whether the coefficient on the linear term is statistically significant or not. You
cannot interpret the statistical significance of the coefficient on a variable if a higher order term
involving that variable is included in the model.

Example 2: Four variables were measured at each of thirty meteorological stations scattered
throughout California. These variables were: average annual precipitation (in inches), altitude (in
feet), latitude (in degrees), and whether or not the station was on the leeward side of the mountains in
the rain shadow (1 = in rain shadow, 0 = not in rain shadow). The goal was to examine the relationship
between precipitation and the other variables and also to create a model to predict precipitation.

Location   Precip (in)   Elevation (ft)   Latitude   Rain Shadow
1 39.57 43 40.8 0
2 23.27 341 40.2 1
3 18.20 4152 33.8 1
4 37.48 74 39.4 0
5 49.26 6752 39.3 0
6 21.82 52 37.8 0
7 18.07 25 38.5 1
8 14.17 95 37.4 1
9 42.63 6360 36.6 0
10 13.85 74 36.7 1
11 9.44 331 36.7 1
12 19.33 57 35.7 0
13 15.67 740 35.7 1
14 6.00 489 35.4 1
15 5.73 4108 37.3 1
16 47.82 4850 40.4 0
17 17.95 120 34.4 0
18 18.20 4152 40.3 1
19 10.03 4036 41.9 1
20 4.63 913 34.8 1
21 14.74 699 34.2 0
22 15.02 312 34.1 0
23 12.36 50 33.8 0
24 8.26 125 37.8 1
25 4.05 268 33.6 1
26 9.94 19 32.7 0
27 4.25 2105 34.1 1
28 1.66 -178 36.5 1
29 74.87 35 41.7 0
30 15.95 60 39.2 1

Scatterplot matrix: Graphs…Scatter…Matrix. Put in a categorical variable under Set Markers By. The
default is different colors, but you can edit the scatterplot to use different symbols.
[Scatterplot matrix of Precipitation (in), Altitude (ft), and Latitude (degrees), with markers coded by
rain shadow status (0 = not in rain shadow, 1 = in rain shadow).]

Ignore the Rainshadow variable for the time being. It also looks like transformations might be needed,
but for now, let’s ignore that also.

Consider the model µ(Precip | Latitude, Altitude) = β0 + β1Latitude + β2Altitude

Coefficients (Dependent Variable: Precipitation (in))

                     B          Std. Error   Beta    t        Sig.
(Constant)           -105.733   36.165               -2.924   .007
Latitude (degrees)      3.338     .984       .536     3.392   .002
Altitude (ft)           .0014     .0013      .178     1.129   .269

µ̂(Precip | Latitude, Altitude) =

According to the fitted model, what’s the predicted precipitation for a location at latitude 40 degrees
and 1000 feet in elevation?
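Plugging Latitude = 40 and Altitude = 1000 into the fitted model gives the answer directly; a quick check with the table's coefficients (a sketch, not SPSS output):

```python
# Coefficients from the fitted two-variable model above.
b0, b_lat, b_alt = -105.733, 3.338, 0.0014

precip_hat = b0 + b_lat * 40 + b_alt * 1000
print(round(precip_hat, 3))  # 29.187
```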

Interpreting the coefficients in the model:


• β1 represents the increase in mean precipitation for every one degree increase in latitude, given
that altitude remains fixed.

• β 2 represents the increase in mean precipitation for every one foot increase in altitude, given
that latitude remains fixed. It would be more natural to express this change for every 100 or
1000 feet increase in altitude.

• These interpretations are valid only within the range of combinations of latitude and altitude
that we have observed in our data.

[Scatterplot of Altitude (ft), 0–6000, versus Latitude (degrees), 32–42, showing the combinations of
the two variables observed at the 30 stations.]

Further interpretation of the model


The model assumes that there is a linear relationship between mean precipitation and latitude for every
altitude. The slope of the line is the same for all altitudes, but the intercept changes.

Altitude = 1000 feet

µ(Precip | Latitude, Altitude = 1000) =

µ̂(Precip | Latitude, Altitude = 1000) =

Altitude = 3000 feet

µ(Precip | Latitude, Altitude = 3000) =

µ̂(Precip | Latitude, Altitude = 3000) =



Similarly, the model assumes that there is a linear relationship between mean precipitation and altitude
for every latitude. The slope of the line is the same for all latitudes, but the intercept changes.

Latitude = 34 degrees
µ(Precip | Latitude = 34, Altitude) =

µ̂(Precip | Latitude = 34, Altitude) =

Latitude = 40 degrees
µ(Precip | Latitude = 40, Altitude) =

µ̂(Precip | Latitude = 40, Altitude) =

We can also add an interaction term to the model. An interaction term is the product of two (or more)
explanatory variables. In SPSS, we can create a new variable which is the product of Altitude and
Latitude using Transform…Compute.
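Outside SPSS, the same Compute step is a single elementwise product. For example, with the first three stations from the data table:

```python
# First three rows of the data table (Latitude, Elevation).
latitude = [40.8, 40.2, 33.8]
altitude = [43.0, 341.0, 4152.0]

# The interaction variable is just the product, row by row.
alt_x_lat = [a * l for a, l in zip(altitude, latitude)]
print([round(v, 1) for v in alt_x_lat])  # [1754.4, 13708.2, 140337.6]
```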

Coefficients (Dependent Variable: Precipitation (in))

                     B          Std. Error   Beta     t        Sig.
(Constant)           -144.230   44.487                -3.242   .003
Latitude (degrees)      4.375    1.206        .702     3.628   .001
Altitude (ft)           .0304     .0202      3.830     1.501   .145
Altitude*Latitude     -.00076     .00053    -3.700    -1.434   .163

The model is µ(Precip | Latitude, Altitude) = β0 + β1Latitude + β2Altitude + β3Latitude*Altitude

According to this model, the relationship between Precipitation and Latitude is linear for any Altitude,
but both the intercept and slope of the relationship depend on the Altitude:

Altitude = 1000 feet

µ(Precip | Latitude, Altitude = 1000) =

µ̂(Precip | Latitude, Altitude = 1000) =

Altitude = 3000 feet

µ(Precip | Latitude, Altitude = 3000) =

µ̂(Precip | Latitude, Altitude = 3000) =
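These fill-ins come from rearranging the fitted interaction model as µ̂ = (β̂0 + β̂2·Altitude) + (β̂1 + β̂3·Altitude)·Latitude, so both the intercept and the slope shift with altitude. A quick computation with the table's estimates (a sketch, not SPSS output):

```python
# Estimates from the interaction-model table above.
b0, b_lat, b_alt, b_int = -144.230, 4.375, 0.0304, -0.00076

def line_at_altitude(alt):
    """(intercept, slope) of the fitted Precip-vs-Latitude line at a fixed altitude."""
    return round(b0 + b_alt * alt, 3), round(b_lat + b_int * alt, 3)

print(line_at_altitude(1000))  # (-113.83, 3.615)
print(line_at_altitude(3000))  # (-53.03, 2.095)
```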


Similarly, the model assumes that there is a linear relationship between mean precipitation and altitude
for every latitude. Both the intercept and slope of the line depend on the particular value of the
latitude.

Latitude = 34 degrees
µ(Precip | Latitude = 34, Altitude) =

µ̂(Precip | Latitude = 34, Altitude) =

Latitude = 40 degrees
µ(Precip | Latitude = 40, Altitude) =

µ̂(Precip | Latitude = 40, Altitude) =

Notes
• Interpretation of the model is easier in the absence of interactions, so we usually avoid
including interactions unless (a) there is strong evidence that an interaction is present, (b) the
interaction is expected on subject-matter grounds, or (c) a test of the interaction term is
scientifically meaningful in the context of the problem. If prediction (and not interpretation) is
the only goal, then we don't need to worry about the lack of interpretability of interactions.

• If an interaction between two variables is included in the model, then each of the variables
individually must be included. It doesn’t make sense not to. It does not matter whether the
coefficients on the individual variables are statistically significant or not. You cannot interpret
the statistical significance of the coefficient on individual variables if there is also an
interaction between those variables in the model.

Indicator variables
0/1 indicator variables, like Rainshadow, can be used in a multiple regression model to distinguish
between two groups.

µ(Precip | Latitude, Rainshadow) = β0 + β1Latitude + β2Rainshadow

This implies there are two separate models relating Precipitation to Latitude, one for locations in the
rain shadow, and one for those not in the rain shadow.

µ(Precip | Latitude, Rainshadow = 1) =

µ(Precip | Latitude, Rainshadow = 0) =


What would these two models look like on a graph of Precipitation versus Latitude?

Estimating the model:


Coefficients (Dependent Variable: Precipitation (in))

                     B          Std. Error   Beta    t        Sig.
(Constant)           -103.575   24.514               -4.225   .000
Latitude (degrees)      3.637     .659       .584     5.521   .000
Rainshadow            -19.942    3.486      -.605    -5.720   .000

µ̂(Precip | Latitude, Rainshadow) =

µ̂(Precip | Latitude, Rainshadow = 1) =

µ̂(Precip | Latitude, Rainshadow = 0) =
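With the estimates above, the two fitted lines share the slope 3.637 and differ in intercept by the Rainshadow coefficient. A quick check at latitude 38 degrees (the latitude value is just a convenient example):

```python
# Estimates from the indicator-variable model table above.
b0, b_lat, b_rs = -103.575, 3.637, -19.942

def precip_hat(lat, rainshadow):
    """Fitted mean precipitation (in); same slope, shifted intercept."""
    return b0 + b_lat * lat + b_rs * rainshadow

print(round(precip_hat(38, 0), 3))  # 34.631
print(round(precip_hat(38, 1), 3))  # 14.689
```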

Interpretation of β̂1 and β̂2:


We can also add an interaction term between Latitude and Rainshadow:

µ(Precip | Latitude, Rainshadow) = β0 + β1Latitude + β2Rainshadow + β3Latitude*Rainshadow

µ(Precip | Latitude, Rainshadow = 1) =

µ(Precip | Latitude, Rainshadow = 0) =

What would these two models look like on a graph of Precipitation versus Latitude?

Coefficients (Dependent Variable: Precipitation (in))

                      B          Std. Error   Beta     t        Sig.
(Constant)            -175.457   26.177                -6.703   .000
Latitude (degrees)       5.581     .705        .895     7.912   .000
Rainshadow             139.839   39.019       4.240     3.584   .001
Latitude*Rainshadow     -4.315    1.051      -4.871    -4.105   .000

µ̂(Precip | Latitude, Rainshadow) =

µ̂(Precip | Latitude, Rainshadow = 1) =

µ̂(Precip | Latitude, Rainshadow = 0) =
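With the interaction term, the two groups get different intercepts and different slopes. Computing both lines from the table's estimates (a sketch, not SPSS output):

```python
# Estimates from the interaction model table above.
b0, b_lat, b_rs, b_int = -175.457, 5.581, 139.839, -4.315

def group_line(rainshadow):
    """(intercept, slope) of the fitted Precip-vs-Latitude line for a group."""
    return round(b0 + b_rs * rainshadow, 3), round(b_lat + b_int * rainshadow, 3)

print(group_line(0))  # (-175.457, 5.581)
print(group_line(1))  # (-35.618, 1.266)
```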


We can graph these two fitted lines in SPSS by graphing Precipitation versus Latitude with
Rainshadow entered into Set Markers By (this gives what the Sleuth calls a “coded scatterplot,” p.
254). Then get into Chart Editor, select one of the groups of points by clicking on the symbol on the
legend and then click Add Fit Line. Repeat for the other group. The plotting symbols and colors can
also be changed.
[Coded scatterplot of Precipitation (in), 0–80, versus Latitude (degrees), 32–42, with separate fitted
lines for Rainshadow = 0 and Rainshadow = 1.]

Interpretation of the coefficients in the model:

• β 0 represents the intercept of the model relating Precipitation to Latitude for locations not in
the rain shadow. The intercept isn’t of much interest, though, since Latitude of 0 is not
meaningful for these data.

• β1 represents the slope of the model relating Precipitation to Latitude for locations not in the
rain shadow. Thus, according to the model, mean precipitation increases by β1 for every one
degree increase in latitude for locations not in the rain shadow.

• β 2 represents the difference in mean precipitation for locations at Latitude 0 in and not in the
rain shadow. This isn’t meaningful since Latitude of 0 isn’t meaningful.

• β3 represents the difference in the slope on Latitude between locations in and not in the rain
shadow. More meaningful is the quantity β1 + β3: according to the model, mean precipitation
increases by β1 + β3 for every one degree increase in latitude for locations in the rain shadow.

Question: what is the difference between fitting the above 3-variable model with an interaction and
fitting two separate linear regression models, one for locations in the rain shadow and one for those not
in the rain shadow? Are the assumptions of the two sets of models different?

Model with all three variables

µ(Precip | Latitude, Altitude, Rainshadow) = β0 + β1Latitude + β2Altitude + β3Rainshadow

How do you interpret the coefficients in this model?

What if you added all 2-way interactions?

More on indicator variables

Consider the model only with Rainshadow:

µ(Precip | Rainshadow) = β0 + β1Rainshadow

What does this model imply about locations in and not in the rain shadow? If we make the usual
assumptions of normal distributions with constant variance and independent observations, which model
that we have already studied is it equivalent to?
Regression results:
Coefficients (Dependent Variable: Precipitation (in))

              B         Std. Error   Beta    t        Sig.
(Constant)    30.984    3.760                8.240    .000
Rainshadow   -19.723    4.995        -.598   -3.949   .000

Here’s some output from the two-sample t procedure. What’s the correspondence with the linear
regression model results?

Group Statistics

Precipitation (in)   Rainshadow    N    Mean      Std. Deviation   Std. Error Mean
                     0            13    30.9838   19.35004         5.36674
                     1            17    11.2606    6.38787         1.54929

Independent Samples Test (t-test for Equality of Means, Precipitation (in))

                                                          Mean         Std. Error   95% CI of the Difference
                              t       df       Sig. (2-t) Difference   Difference   Lower     Upper
Equal variances assumed       3.949   28       .000       19.7233      4.9948        9.4919   29.9547
Equal variances not assumed   3.531   14.010   .003       19.7233      5.5859        7.7436   31.7030
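The correspondence can be checked by hand: the regression intercept equals the Rainshadow = 0 group mean, and the slope on Rainshadow equals the difference in group means (and its t statistic matches the equal-variances two-sample t, up to sign). A sketch using the values from the Group Statistics table:

```python
# Group means from the Group Statistics table above.
mean0, mean1 = 30.9838, 11.2606

print(round(mean0, 3))           # 30.984   (the regression intercept)
print(round(mean1 - mean0, 4))   # -19.7232 (the Rainshadow coefficient, -19.723)
```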
Categorical variables with more than 2 levels
Categorical variables in linear regression models are called factors. How do we incorporate a factor
with 3 or more levels? In other words, how do we allow a separate effect for each level of the factor?

• A factor with k levels needs k-1 indicator variables to represent its effects in a regression
model.

Example: Meadowfoam case study, Chap. 9, p. 246. Light level (6 levels) can be treated as either a
quantitative variable or a categorical variable represented by 5 indicator variables. What’s the
difference, both in terms of the number of parameters in the model, and what the model says about the
relationship between number of flowers and light level?

We can create 6 indicator variables for light level, as seen in Display 9.7 on p. 246. Only 5 of them are
needed in the model because the constant term represents the omitted level. The level that is omitted is
called the reference level; the coefficients on the indicator variables represent differences from the
reference level.
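A sketch of this coding, with the six light levels from the case study and 150 as the reference level (the helper function itself is hypothetical, not SPSS syntax):

```python
levels = [150, 300, 450, 600, 750, 900]   # 6 light levels; 150 is the reference

def indicators(value):
    """k - 1 = 5 indicator columns; the reference level maps to all zeros."""
    return [1 if value == lev else 0 for lev in levels[1:]]

print(indicators(150))  # [0, 0, 0, 0, 0]
print(indicators(900))  # [0, 0, 0, 0, 1]
```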

Compare an ANOVA of Flowers on Light level to a regression of Flowers on the indicator variables
L300, L450, L600, L750, and L900.

One-way ANOVA output:


Descriptives: Flowers

                                                     95% Confidence Interval for Mean
Light    N    Mean     Std. Deviation   Std. Error   Lower Bound   Upper Bound
150      4    73.275    7.379           3.689        61.533        85.017
300      4    64.150   11.455           5.727        45.923        82.377
450      4    59.900    9.017           4.509        45.551        74.249
600      4    50.050   10.035           5.017        34.082        66.018
750      4    45.525   11.847           5.923        26.674        64.376
900      4    43.925    6.592           3.296        33.436        54.414
Total   24    56.138   13.733           2.803        50.338        61.937

ANOVA: Flowers

                 Sum of Squares   df   Mean Square   F       Sig.
Between Groups   2683.514          5   536.703       5.839   .002
Within Groups    1654.423         18    91.912
Total            4337.936         23

Multiple Comparisons (LSD); Dependent Variable: Flowers

(I) Light intensity   (J) Light intensity   Mean Difference (I-J)   Std. Error   Sig.   95% CI Lower   95% CI Upper
150                   300                    9.12500                6.77910      .195    -5.1174       23.3674
                      450                   13.37500                6.77910      .064     -.8674       27.6174
                      600                   23.22500*               6.77910      .003     8.9826       37.4674
                      750                   27.75000*               6.77910      .001    13.5076       41.9924
                      900                   29.35000*               6.77910      .000    15.1076       43.5924
*. The mean difference is significant at the .05 level.

Regression output for model


µ(Flowers | LIGHT) = β0 + β1L300 + β2L450 + β3L600 + β4L750 + β5L900
ANOVA (Predictors: (Constant), L900, L750, L600, L450, L300; Dependent Variable: Flowers)

             Sum of Squares   df   Mean Square   F       Sig.
Regression   2683.514          5   536.703       5.839   .002
Residual     1654.423         18    91.912
Total        4337.936         23

Coefficients (Dependent Variable: Flowers)

              B         Std. Error   t        Sig.   95% CI Lower   95% CI Upper
(Constant)    73.275    4.794        15.286   .000    63.204         83.346
L300          -9.125    6.779        -1.346   .195   -23.367          5.117
L450         -13.375    6.779        -1.973   .064   -27.617           .867
L600         -23.225    6.779        -3.426   .003   -37.467         -8.983
L750         -27.750    6.779        -4.093   .001   -41.992        -13.508
L900         -29.350    6.779        -4.329   .000   -43.592        -15.108
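The match between the two outputs can be verified directly: the regression constant is the reference-level (150) mean, each remaining coefficient is that group's mean minus 73.275, and the ANOVA tables are identical. A quick check using the Descriptives means:

```python
# Group means from the Descriptives table above.
means = {150: 73.275, 300: 64.150, 450: 59.900,
         600: 50.050, 750: 45.525, 900: 43.925}

ref = means[150]                                   # reference level = (Constant)
coefs = {lev: round(m - ref, 3) for lev, m in means.items() if lev != 150}
print(coefs)  # {300: -9.125, 450: -13.375, 600: -23.225, 750: -27.75, 900: -29.35}
```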