Uploaded by Fanny Sylvia C.

Multiple regression refers to regression with multiple explanatory variables (but just one response

variable). Multiple regression is an amazingly flexible tool which can be used to model linear and

nonlinear relationships. Don’t be fooled by the “linear” in “linear regression”: we’ve already seen

how simple linear regression can be used to model nonlinear relationships by transforming one or both

of the explanatory and response variables. Multiple regression offers further ways to do this. It’s even

possible to incorporate categorical variables into multiple regression models.

Examples:

1. One explanatory variable, but a quadratic relationship

$\mu(Y \mid X) = \beta_0 + \beta_1 X + \beta_2 X^2$

We can include higher-order powers of X, although this is unusual unless there is a theoretical reason for it. Note: we always include lower-order terms when a higher-order term is in a model. For example, we always include $X$ if $X^2$ is in the model.

2. Two explanatory variables

$\mu(Y \mid X_1, X_2) = \beta_0 + \beta_1 X_1 + \beta_2 X_2$

3. Two explanatory variables and their interaction

$\mu(Y \mid X_1, X_2) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2$

The term $X_1 X_2$ is the product of the two variables. We’ll see why this is called an interaction below.

4. The explanatory variables can be binary (0,1). In fact, the ANOVA and pooled two-sample t models can be written as special cases of the linear regression model.

• Normality: the Y values at any particular combination of X values are normally distributed.

• Constant variance: the variance of the Y values is the same at every combination of X values.

• Independence: the Y’s are independent draws from their respective distributions.

These assumptions can also be summarized by writing a linear regression model in the following way,

using model 2 above as an example:

$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$

where the ε’s are independent N(0, σ) random variables (the subscript i has been omitted).

How do we fit the models? Least squares can still be used: find the values of the β’s to minimize the sum of squared residuals,

$\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} \mathrm{res}_i^2$.

It’s not necessary to examine the formulas for the least squares estimators of all the β’s, but the formulas can be obtained fairly easily using calculus, no

Chapter 9, page 2

matter how many β’s there are. Formulas for standard errors of the estimates can also be derived.

Confidence intervals and tests for individual coefficients can be computed under the assumptions of

the model. These will be covered in Chapter 10.
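The least squares idea above can be sketched in a few lines of Python. This is only an illustration, not the procedure SPSS uses internally: the function names `fit_least_squares` and `sum_squared_residuals` and the small data set are made up for the example, and the coefficients are found by solving the normal equations with plain Gauss-Jordan elimination.

```python
def fit_least_squares(X, y):
    """Least-squares coefficients [b0, b1, ..., bp] for rows X and responses y."""
    rows = [[1.0] + [float(v) for v in r] for r in X]  # prepend intercept column
    p = len(rows[0])
    # Augmented normal equations: (X'X | X'y)
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(A[k][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for k in range(p):
            if k != col:
                f = A[k][col] / A[col][col]
                A[k] = [a - f * b for a, b in zip(A[k], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def sum_squared_residuals(X, y, beta):
    """Sum of (Y_i - fitted Y_i)^2 for the coefficients in beta."""
    fitted = [beta[0] + sum(b * v for b, v in zip(beta[1:], r)) for r in X]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


# Made-up illustration data generated exactly from Y = 1 + 2*X1 + 3*X2
X = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 3)]
y = [1, 3, 4, 6, 14]

beta_hat = fit_least_squares(X, y)
print(beta_hat)
print(sum_squared_residuals(X, y, beta_hat))
```

Because the illustration data lie exactly on a plane, the fitted coefficients recover 1, 2 and 3 up to rounding and the sum of squared residuals is essentially zero. With real data the minimum is positive.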

Example 1: Ozone data again. Four variables measured: ozone, max temperature, wind speed, solar

radiation. Examine ozone vs. wind speed; loess fit on left.

[Figure: two scatterplots of Ozone (ppb) versus Wind speed (mph). Left: loess fit. Right: quadratic fit.]

A log transformation of ozone could be tried, but if the variance looks approximately constant, we might not want to transform ozone. We might instead try a quadratic relationship (right, above):

$\mu(\text{Ozone} \mid \text{Wind}) = \beta_0 + \beta_1\,\text{Wind} + \beta_2\,\text{Wind}^2$

Coefficients (a)

                       Unstandardized           Standardized
Model                  B          Std. Error    Beta       t         Sig.
1  (Constant)          166.733    14.306                   11.655    .000
   Wind speed (mph)    -19.958    2.735         -2.135     -7.298    .000
   Wind^2              .662       .124          1.564      5.347     .000

a. Dependent Variable: Ozone(ppb)

Fitted model is

$\hat{\mu}(\text{Ozone} \mid \text{Wind}) = 166.733 - 19.958\,\text{Wind} + 0.662\,\text{Wind}^2$

What do you think of the quadratic model based on the graph above?


Notes

• Interpretation of the coefficients in a quadratic model is not straightforward. In particular, we cannot interpret β̂1 the way we did in a simple linear regression model, since the change in Ozone when Wind speed changes is affected by both β̂1 and β̂2.

• If you include a quadratic term then you must also include a linear term for that variable. It

does not matter whether the coefficient on the linear term is statistically significant or not. You

cannot interpret the statistical significance of the coefficient on a variable if a higher order term

involving that variable is included in the model.
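To see why β̂1 alone does not describe the effect of wind speed, the following sketch evaluates the fitted quadratic from the coefficient table (166.733, -19.958, 0.662). The function names are illustrative only. The instantaneous slope is β̂1 + 2·β̂2·Wind, so it differs at every wind speed.

```python
# Coefficients copied from the SPSS table for the quadratic ozone model
B0, B1, B2 = 166.733, -19.958, 0.662


def mean_ozone(wind):
    """Estimated mean ozone (ppb) at a given wind speed (mph)."""
    return B0 + B1 * wind + B2 * wind ** 2


def slope(wind):
    """Derivative of the fitted curve with respect to wind speed."""
    return B1 + 2 * B2 * wind


for w in (5, 10):
    change = mean_ozone(w + 1) - mean_ozone(w)
    print(w, round(mean_ozone(w), 3), round(slope(w), 3), round(change, 3))
```

At 5 mph a one-mph increase lowers the estimated mean ozone by about 12.7 ppb, while at 10 mph the same increase lowers it by only about 6.1 ppb.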

Example 2: Four variables were measured at each of thirty meteorological stations scattered

throughout California. These variables were: average annual precipitation (in inches), altitude (in

feet), latitude (in degrees), and whether or not the station was on the leeward side of the mountains in

the rain shadow (1 = in rain shadow, 0 = not in rain shadow). The goal was to examine the relationship

between precipitation and the other variables and also to create a model to predict precipitation.

Location   Precip (in)   Elevation (ft)   Latitude   Rain Shadow

1 39.57 43 40.8 0

2 23.27 341 40.2 1

3 18.20 4152 33.8 1

4 37.48 74 39.4 0

5 49.26 6752 39.3 0

6 21.82 52 37.8 0

7 18.07 25 38.5 1

8 14.17 95 37.4 1

9 42.63 6360 36.6 0

10 13.85 74 36.7 1

11 9.44 331 36.7 1

12 19.33 57 35.7 0

13 15.67 740 35.7 1

14 6.00 489 35.4 1

15 5.73 4108 37.3 1

16 47.82 4850 40.4 0

17 17.95 120 34.4 0

18 18.20 4152 40.3 1

19 10.03 4036 41.9 1

20 4.63 913 34.8 1

21 14.74 699 34.2 0

22 15.02 312 34.1 0

23 12.36 50 33.8 0

24 8.26 125 37.8 1

25 4.05 268 33.6 1

26 9.94 19 32.7 0

27 4.25 2105 34.1 1

28 1.66 -178 36.5 1

29 74.87 35 41.7 0

30 15.95 60 39.2 1


Scatterplot matrix: Graphs…Scatter…Matrix. Put in a categorical variable under Set Markers By. The

default is different colors, but you can edit the scatterplot to use different symbols.

[Figure: scatterplot matrix of Precipitation (in), Altitude (ft), and Latitude (degrees), with markers set by In rain shadow (0, 1).]

Ignore the Rainshadow variable for the time being. It also looks like transformations might be needed,

but for now, let’s ignore that also.

Coefficients (a)

                         Unstandardized           Standardized
Model                    B          Std. Error    Beta       t         Sig.
1  (Constant)            -105.733   36.165                   -2.924    .007
   Latitude (degrees)    3.338      .984          .536       3.392     .002
   Altitude (ft)         .0014      .0013         .178       1.129     .269

a. Dependent Variable: Precipitation (in)

According to the fitted model, what’s the predicted precipitation for a location at latitude 40 degrees

and 1000 feet in elevation?
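As a check on the arithmetic, the prediction can be computed directly from the coefficients in the table. This is a minimal sketch; the function name `predicted_precip` is made up for the example, and the result inherits the rounding of the printed coefficients.

```python
# Coefficients copied from the SPSS table (Latitude and Altitude, no interaction)
B0, B_LAT, B_ALT = -105.733, 3.338, 0.0014


def predicted_precip(latitude, altitude):
    """Estimated mean precipitation (in) at a latitude (degrees) and altitude (ft)."""
    return B0 + B_LAT * latitude + B_ALT * altitude


print(round(predicted_precip(40, 1000), 3))  # about 29.2 inches
```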


• β1 represents the increase in mean precipitation for every one-degree increase in latitude, given that altitude remains fixed.

• β2 represents the increase in mean precipitation for every one-foot increase in altitude, given that latitude remains fixed. It would be more natural to express this change for every 100- or 1000-foot increase in altitude.

• These interpretations are valid only in the range of combinations of latitude and altitude that we have observed in our data.

[Figure: scatterplot of Altitude (ft) versus Latitude (degrees).]

The model assumes that there is a linear relationship between mean precipitation and latitude for every

altitude. The slope of the line is the same for all altitudes, but the intercept changes.

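$\mu(\text{Precip} \mid \text{Latitude}, \text{Altitude} = 1000) = (\beta_0 + 1000\beta_2) + \beta_1\,\text{Latitude}$, estimated as $-104.333 + 3.338\,\text{Latitude}$

$\mu(\text{Precip} \mid \text{Latitude}, \text{Altitude} = 3000) = (\beta_0 + 3000\beta_2) + \beta_1\,\text{Latitude}$, estimated as $-101.533 + 3.338\,\text{Latitude}$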


Similarly, the model assumes that there is a linear relationship between mean precipitation and altitude

for every latitude. The slope of the line is the same for all latitudes, but the intercept changes.

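Latitude = 34 degrees

$\mu(\text{Precip} \mid \text{Latitude} = 34, \text{Altitude}) = (\beta_0 + 34\beta_1) + \beta_2\,\text{Altitude}$, estimated as $7.759 + 0.0014\,\text{Altitude}$

Latitude = 40 degrees

$\mu(\text{Precip} \mid \text{Latitude} = 40, \text{Altitude}) = (\beta_0 + 40\beta_1) + \beta_2\,\text{Altitude}$, estimated as $27.787 + 0.0014\,\text{Altitude}$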

We can also add an interaction term to the model. An interaction term is the product of two (or more)

explanatory variables. In SPSS, we can create a new variable which is the product of Altitude and

Latitude using Transform…Compute.

Coefficients (a)

                         Unstandardized           Standardized
Model                    B          Std. Error    Beta       t         Sig.
1  (Constant)            -144.230   44.487                   -3.242    .003
   Latitude (degrees)    4.375      1.206         .702       3.628     .001
   Altitude (ft)         .0304      .0202         3.830      1.501     .145
   Altitude*Latitude     -.00076    .00053        -3.700     -1.434    .163

a. Dependent Variable: Precipitation (in)

According to this model, the relationship between Precipitation and Latitude is linear for any Altitude,

but both the intercept and slope of the relationship depend on the Altitude:

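$\mu(\text{Precip} \mid \text{Latitude}, \text{Altitude} = 1000) = (\beta_0 + 1000\beta_2) + (\beta_1 + 1000\beta_3)\,\text{Latitude}$, estimated as $-113.830 + 3.615\,\text{Latitude}$

$\mu(\text{Precip} \mid \text{Latitude}, \text{Altitude} = 3000) = (\beta_0 + 3000\beta_2) + (\beta_1 + 3000\beta_3)\,\text{Latitude}$, estimated as $-53.030 + 2.095\,\text{Latitude}$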


Similarly, the model assumes that there is a linear relationship between mean precipitation and altitude

for every latitude. Both the intercept and slope of the line depend on the particular value of the

latitude.

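Latitude = 34 degrees

$\mu(\text{Precip} \mid \text{Latitude} = 34, \text{Altitude}) = (\beta_0 + 34\beta_1) + (\beta_2 + 34\beta_3)\,\text{Altitude}$, estimated as $4.520 + 0.00456\,\text{Altitude}$

Latitude = 40 degrees

$\mu(\text{Precip} \mid \text{Latitude} = 40, \text{Altitude}) = (\beta_0 + 40\beta_1) + (\beta_2 + 40\beta_3)\,\text{Altitude}$, estimated as $30.770 + 0.00000\,\text{Altitude}$ (the slope is approximately zero with the rounded coefficients)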
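With an interaction in the model, both the intercept and the slope of each conditional line have to be recomputed for every fixed value of the other variable. The sketch below does this from the rounded coefficients in the interaction table; the function names are illustrative and the results are only as precise as the printed coefficients.

```python
# Coefficients copied from the SPSS table for the Altitude*Latitude model
B0, B_LAT, B_ALT, B_INT = -144.230, 4.375, 0.0304, -0.00076


def line_in_latitude(altitude):
    """(intercept, slope) of mean precipitation versus Latitude at a fixed altitude."""
    return B0 + B_ALT * altitude, B_LAT + B_INT * altitude


def line_in_altitude(latitude):
    """(intercept, slope) of mean precipitation versus Altitude at a fixed latitude."""
    return B0 + B_LAT * latitude, B_ALT + B_INT * latitude


for alt in (1000, 3000):
    print("Altitude", alt, line_in_latitude(alt))
for lat in (34, 40):
    print("Latitude", lat, line_in_altitude(lat))
```

The slope on Latitude shrinks from about 3.6 at 1000 ft to about 2.1 at 3000 ft, which is what the negative interaction coefficient implies.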

Notes

• Interpretation of the model is easier in the absence of interactions, so we usually avoid interactions unless a) there is strong evidence of an interaction, b) the interaction is expected to be present, or c) a test of the interaction term is scientifically meaningful in the context of the problem. If prediction (and not interpretation) is the only goal, then we don’t need to worry about the lack of interpretability of interactions.

• If an interaction between two variables is included in the model, then each of the variables

individually must be included. It doesn’t make sense not to. It does not matter whether the

coefficients on the individual variables are statistically significant or not. You cannot interpret

the statistical significance of the coefficient on individual variables if there is also an

interaction between those variables in the model.

Indicator variables

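0/1 indicator variables, like Rainshadow, can be used in a multiple regression model to distinguish between two groups. For example, with Latitude and Rainshadow as the explanatory variables:

$\mu(\text{Precip} \mid \text{Latitude}, \text{Rainshadow}) = \beta_0 + \beta_1\,\text{Latitude} + \beta_2\,\text{Rainshadow}$

This implies there are two separate models relating Precipitation to Latitude, one for locations in the rain shadow and one for those not in the rain shadow.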


What would these two models look like on a graph of Precipitation versus Latitude?

Coefficients (a)

                         Unstandardized           Standardized
Model                    B          Std. Error    Beta       t         Sig.
1  (Constant)            -103.575   24.514                   -4.225    .000
   Latitude (degrees)    3.637      .659          .584       5.521     .000
   Rainshadow            -19.942    3.486         -.605      -5.720    .000

a. Dependent Variable: Precipitation (in)
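One way to picture the two models is to evaluate the fitted equation separately for Rainshadow = 0 and Rainshadow = 1. The sketch below uses the coefficients from the table above; the function name `mean_precip` is made up for the example.

```python
# Coefficients copied from the SPSS table (Latitude and Rainshadow, no interaction)
B0, B_LAT, B_SHADOW = -103.575, 3.637, -19.942


def mean_precip(latitude, rainshadow):
    """Estimated mean precipitation (in); rainshadow is 0 or 1."""
    return B0 + B_LAT * latitude + B_SHADOW * rainshadow


for lat in (34, 37, 40):
    not_in = mean_precip(lat, 0)
    in_shadow = mean_precip(lat, 1)
    print(lat, round(not_in, 3), round(in_shadow, 3), round(in_shadow - not_in, 3))
```

The two lines are parallel: they share the slope 3.637, and the rain shadow line sits 19.942 inches lower at every latitude.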


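We can also add an interaction term between Latitude and Rainshadow:

$\mu(\text{Precip} \mid \text{Latitude}, \text{Rainshadow}) = \beta_0 + \beta_1\,\text{Latitude} + \beta_2\,\text{Rainshadow} + \beta_3\,(\text{Latitude} \times \text{Rainshadow})$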

What would these two models look like on a graph of Precipitation versus Latitude?

Coefficients (a)

                         Unstandardized           Standardized
Model                    B          Std. Error    Beta       t         Sig.
1  (Constant)            -175.457   26.177                   -6.703    .000
   Latitude (degrees)    5.581      .705          .895       7.912     .000
   Rainshadow            139.839    39.019        4.240      3.584     .001
   Latitude*Rainshadow   -4.315     1.051         -4.871     -4.105    .000

a. Dependent Variable: Precipitation (in)


We can graph these two fitted lines in SPSS by graphing Precipitation versus Latitude with

Rainshadow entered into Set Markers By (this gives what the Sleuth calls a “coded scatterplot,” p.

254). Then get into Chart Editor, select one of the groups of points by clicking on the symbol on the

legend and then click Add Fit Line. Repeat for the other group. The plotting symbols and colors can

also be changed.

[Figure: coded scatterplot of Precipitation (in) versus Latitude (degrees), with a separate fit line for each Rainshadow group (0, 1).]

• β0 represents the intercept of the model relating Precipitation to Latitude for locations not in the rain shadow. The intercept isn’t of much interest, though, since a Latitude of 0 is not meaningful for these data.

• β1 represents the slope of the model relating Precipitation to Latitude for locations not in the rain shadow. Thus, according to the model, mean precipitation increases by β1 for every one-degree increase in latitude for locations not in the rain shadow.

• β2 represents the difference in mean precipitation between locations in and not in the rain shadow at Latitude 0. This isn’t meaningful since a Latitude of 0 isn’t meaningful.

• β3 represents the difference in the slope on Latitude between locations in and not in the rain shadow. More meaningful is the quantity β1 + β3. According to the model, mean precipitation increases by β1 + β3 for every one-degree increase in latitude for locations in the rain shadow.
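These interpretations can be checked numerically by turning the four estimated coefficients into one fitted line per group. This is a small sketch using the rounded values from the interaction table; the function name `fitted_line` is illustrative.

```python
# Coefficients copied from the SPSS table with the Latitude*Rainshadow term
B0, B1, B2, B3 = -175.457, 5.581, 139.839, -4.315


def fitted_line(rainshadow):
    """(intercept, slope) of Precipitation versus Latitude for rainshadow = 0 or 1."""
    return B0 + B2 * rainshadow, B1 + B3 * rainshadow


print("not in rain shadow:", fitted_line(0))
print("in rain shadow:    ", fitted_line(1))
```

For locations in the rain shadow the estimated slope is β̂1 + β̂3, roughly 1.27 inches per degree, compared with roughly 5.58 inches per degree elsewhere.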


Question: what is the difference between fitting the above 3-variable model with an interaction and

fitting two separate linear regression models, one for locations in the rain shadow and one for those not

in the rain shadow? Are the assumptions of the two sets of models different?

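Now consider a model with Rainshadow as the only explanatory variable:

$\mu(\text{Precip} \mid \text{Rainshadow}) = \beta_0 + \beta_1\,\text{Rainshadow}$

What does this model imply about locations in and not in the rain shadow? If we have the assumptions of normal distributions with constant variance and independent observations, which model that we have already studied is it equivalent to?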


Regression results:

Coefficients (a)

                   Unstandardized           Standardized
Model              B          Std. Error    Beta       t         Sig.
1  (Constant)      30.984     3.760                    8.240     .000
   Rainshadow      -19.723    4.995         -.598      -3.949    .000

a. Dependent Variable: Precipitation (in)

Here’s some output from the two-sample t procedure. What’s the correspondence with the linear

regression model results?

Group Statistics

                     Rainshadow   N    Mean      Std. Deviation   Std. Error Mean
Precipitation (in)   0            13   30.9838   19.35004         5.36674
                     1            17   11.2606   6.38787          1.54929

Two-sample t results for Precipitation (in); the 95% confidence interval is for the difference.

                              t       df       Sig. (2-tailed)   Mean Difference   Std. Error Difference   Lower     Upper
Equal variances assumed       3.949   28       .000              19.7233           4.9948                  9.4919    29.9547
Equal variances not assumed   3.531   14.010   .003              19.7233           5.5859                  7.7436    31.7030
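The correspondence can be verified from the Group Statistics alone. In a regression on a single 0/1 indicator, the least squares intercept is the mean of the group coded 0, the slope is the difference in group means, and the standard error of the slope is the pooled two-sample standard error. The sketch below recomputes these from the printed summary values, so small discrepancies in the last digit are due to rounding.

```python
from math import sqrt

# Summary values copied from the Group Statistics table
n0, mean0, sd0 = 13, 30.9838, 19.35004   # Rainshadow = 0
n1, mean1, sd1 = 17, 11.2606, 6.38787    # Rainshadow = 1

intercept = mean0                 # regression constant
slope = mean1 - mean0             # coefficient on Rainshadow

# Pooled variance, as in the equal-variance two-sample t procedure
pooled_var = ((n0 - 1) * sd0 ** 2 + (n1 - 1) * sd1 ** 2) / (n0 + n1 - 2)
se_slope = sqrt(pooled_var * (1 / n0 + 1 / n1))
t_stat = slope / se_slope

print(round(intercept, 3), round(slope, 3), round(se_slope, 3), round(t_stat, 3))
```

The slope, standard error, and t statistic match the Rainshadow row of the regression table and, apart from sign, the "Equal variances assumed" row of the t output, with 28 degrees of freedom in both cases.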


Categorical variables with more than 2 levels

Categorical variables in linear regression models are called factors. How do we incorporate a factor

with 3 or more levels? In other words, how do we allow a separate effect for each level of the factor?

• A factor with k levels needs k-1 indicator variables to represent its effects in a regression

model.

Example: Meadowfoam case study, Chap. 9, p. 246. Light level (6 levels) can be treated as either a

quantitative variable or a categorical variable represented by 5 indicator variables. What’s the

difference, both in terms of the number of parameters in the model, and what the model says about the

relationship between number of flowers and light level?

We can create 6 indicator variables for light level, as seen in Display 9.7 on p. 246. Only 5 of them are

needed in the model because the constant term represents the omitted level. The level that is omitted is

called the reference level; the coefficients on the indicator variables represent differences from the

reference level.
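As an illustration of this coding, the sketch below builds the k-1 indicator columns for a factor, leaving out the reference level. The function name `indicator_columns` is hypothetical, and the naming pattern `L300`, ..., `L900` simply follows the variable names used in these notes.

```python
def indicator_columns(values, levels, reference):
    """Return (names, rows): one 0/1 column per non-reference level."""
    kept = [lv for lv in levels if lv != reference]
    names = [f"L{lv}" for lv in kept]
    rows = [[1 if v == lv else 0 for lv in kept] for v in values]
    return names, rows


light_levels = [150, 300, 450, 600, 750, 900]
names, rows = indicator_columns([150, 450, 900], light_levels, reference=150)
print(names)
for r in rows:
    print(r)
```

An observation at the reference level (150 here) has zeros in every indicator column, which is why the constant term represents that level.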

Compare an ANOVA of Flowers on Light level to a regression of Flowers on the indicator variables

L300, L450, L600, L750, and L900.

Descriptives: Flowers

Light level   N    Mean     Std. Deviation   Std. Error   95% CI for Mean: Lower Bound   Upper Bound
150           4    73.275   7.379            3.689        61.533                         85.017
300           4    64.150   11.455           5.727        45.923                         82.377
450           4    59.900   9.017            4.509        45.551                         74.249
600           4    50.050   10.035           5.017        34.082                         66.018
750           4    45.525   11.847           5.923        26.674                         64.376
900           4    43.925   6.592            3.296        33.436                         54.414
Total         24   56.138   13.733           2.803        50.338                         61.937


ANOVA: Flowers

                 Sum of Squares   df   Mean Square   F       Sig.
Between Groups   2683.514         5    536.703       5.839   .002
Within Groups    1654.423         18   91.912
Total            4337.936         23

Multiple Comparisons (LSD)

(I) Light intensity   (J) Light intensity   Mean Difference (I-J)   Std. Error   Sig.   95% CI Lower Bound   95% CI Upper Bound
150                   300                   9.12500                 6.77910      .195   -5.1174              23.3674
                      450                   13.37500                6.77910      .064   -.8674               27.6174
                      600                   23.22500*               6.77910      .003   8.9826               37.4674
                      750                   27.75000*               6.77910      .001   13.5076              41.9924
                      900                   29.35000*               6.77910      .000   15.1076              43.5924

*. The mean difference is significant at the .05 level.

$\mu(\text{Flowers} \mid \text{LIGHT}) = \beta_0 + \beta_1\,\text{L300} + \beta_2\,\text{L450} + \beta_3\,\text{L600} + \beta_4\,\text{L750} + \beta_5\,\text{L900}$

ANOVA (b)

Model           Sum of Squares   df   Mean Square   F       Sig.
1  Regression   2683.514         5    536.703       5.839   .002 (a)
   Residual     1654.423         18   91.912
   Total        4337.936         23

a. Predictors: (Constant), L900, L750, L600, L450, L300

b. Dependent Variable: Flowers

Coefficients (a)

                 Unstandardized                                95% Confidence Interval for B
Model            B         Std. Error   t        Sig.   Lower Bound   Upper Bound
1  (Constant)    73.275    4.794        15.286   .000   63.204        83.346
   L300          -9.125    6.779        -1.346   .195   -23.367       5.117
   L450          -13.375   6.779        -1.973   .064   -27.617       .867
   L600          -23.225   6.779        -3.426   .003   -37.467       -8.983
   L750          -27.750   6.779        -4.093   .001   -41.992       -13.508
   L900          -29.350   6.779        -4.329   .000   -43.592       -15.108

a. Dependent Variable: Flowers
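The regression output can be reproduced from the ANOVA summaries. With 150 as the reference level, the constant is the mean of the 150 group, each indicator coefficient is that group's mean minus the reference mean, and the standard errors follow from the within-groups mean square with 4 observations per group. The sketch below is a check using the printed values; the variable names are illustrative.

```python
from math import sqrt

# Group means from the Descriptives table, and the Within Groups mean square
group_means = {150: 73.275, 300: 64.150, 450: 59.900,
               600: 50.050, 750: 45.525, 900: 43.925}
n_per_group = 4
mse = 91.912
reference = 150

constant = group_means[reference]
coefs = {f"L{lv}": m - constant
         for lv, m in group_means.items() if lv != reference}

se_constant = sqrt(mse / n_per_group)          # SE of a single group mean
se_indicator = sqrt(mse * (2 / n_per_group))   # SE of a difference of two means

print(constant, round(se_constant, 3))
for name, b in coefs.items():
    print(name, round(b, 3), round(se_indicator, 3))
```

The coefficients are the negatives of the LSD mean differences (I-J) for I = 150, and the standard error 6.779 is the same in both outputs.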
