Math 445 Chapter 8: A Closer Look at Assumptions for Simple Linear Regression

1. Linearity

2. Constant variance

3. Normality

4. Independence

Assumptions 1, 2 and 4 are the most important. Violation of 1 can bias estimates of means and
predictions. Violations of 2 and 4 can lead to under- or over-estimates of standard errors and to
misleading inferences and confidence intervals. Violation of 3 is only a problem when sample sizes are
small. An exception is prediction intervals for an individual response, which depend critically on the
normality assumption (confidence intervals for the mean response at a particular X are robust to
non-normality because of the Central Limit Theorem).

Assessing assumptions

Linearity and constant variance assumptions: assess through scatterplots, smoothing (loess, for example),
and residual plots.

Example: Ozone level and maximum temperature on 111 days at a location in New Jersey, summer 1973

[Figure: scatterplot of Ozone (ppb) versus Maximum temperature (F), and plot of Unstandardized Residual versus Unstandardized Predicted Value]

• The relationship is not linear and the variance appears to increase as temperature increases. These
  violations suggest transforming the response variable (transforming the explanatory variable will
  not solve the nonconstant variance problem).

• When deciding whether to transform the response variable or the explanatory variable (or both),
  it is sometimes helpful to look at histograms of each variable individually. If the distribution of
  either variable is skewed, this suggests transforming that variable. In this example, the
  distribution of ozone is skewed to the right while the distribution of temperature is roughly
  symmetric.

• See Display 8.6 on p. 213 for suggested courses of action for other patterns.


• Recall the ladder of powers: the family of power transformations (Chapter 10 of DeVeaux,
  Velleman and Bock). Examples:

    2     represents squaring ($y^2$)
    1     represents no transformation ($y$)
    ½     represents square root ($\sqrt{y}$)
    0     by convention, represents $\log(y)$ (to any base)
    -1/2  represents reciprocal square root ($-1/\sqrt{y}$) (the negative preserves the original order)
    -1    represents reciprocal ($-1/y$)

For univariate data, powers less than 1 are often used for variables whose distribution is skewed to the
right; the stronger the skew, often the smaller the power needed (0 is smaller than ½, -1/2 is smaller
than 0, etc.).
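The ladder of powers can be sketched in a few lines of Python. This is a minimal illustration, not part of the course material: the function name `ladder` and the sample values are assumptions chosen for the example, and base-10 logs are used for power 0 only because the notes later work with Log10(Ozone).

```python
import math

def ladder(y, power):
    """Apply one rung of the ladder of powers to a positive value y.

    Hypothetical helper for illustration only.
    """
    if y <= 0:
        raise ValueError("y must be positive for these transformations")
    if power == 0:
        return math.log10(y)   # power 0 stands for the log, by convention
    if power < 0:
        return -(y ** power)   # the minus sign preserves the original order
    return y ** power

# Illustrative positive values (made up, not the ozone data)
values = [1.0, 4.0, 9.0, 100.0]

for p in (2, 1, 0.5, 0, -0.5, -1):
    transformed = [ladder(v, p) for v in values]
    print(p, [round(t, 3) for t in transformed])
```

For every power the transformed values stay in increasing order, which is the reason for the minus sign on the negative rungs.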

Log transformation is generally the most interpretable, though other transformations are sometimes
interpretable in special situations (see bottom of p. 216; in particular, the inverse transformation is
interpretable for ratios: miles per gallon, for example, becomes gallons per mile).

It is easy to try different transformations (in the SPSS Chart Editor, power transformations with
non-negative exponents can be applied to X, Y or both).

A log transformation works well for the Ozone data, making the relationship more linear and the variance
more constant. There is one moderate outlier which we’ll address later.

[Figure: scatterplot of Log10(Ozone) versus Maximum temperature (F), and plot of Unstandardized Residual versus Unstandardized Predicted Value for the log model]

Coefficients (a)

                             Unstandardized        Standardized
                             Coefficients          Coefficients                    95% Confidence Interval for B
Model                        B        Std. Error   Beta           t        Sig.    Lower Bound   Upper Bound
1  (Constant)                -.8028   .1976                       -4.062   .000    -1.1945       -.4111
   Maximum temperature (F)    .0294   .0025        .745           11.654   .000     .0244         .0344

a. Dependent Variable: Log10(Ozone)


Before proceeding to the interpretation of this model, we first address the other assumptions: normality
and independence. Normality is not crucial with larger sample sizes, but we should make sure that there
is not strong skewness or outliers. The assumption of a normal distribution at each value of X means that
the residuals $\varepsilon_i = Y_i - (\beta_0 + \beta_1 X_i)$ are assumed to be $N(0, \sigma)$. Thus we
can look at the distribution of the observed residuals
$\text{res}_i = e_i = Y_i - (\hat{\beta}_0 + \hat{\beta}_1 X_i)$ with a histogram and/or normal
probability plot.

[Figure: histogram of Unstandardized Residual (vertical axis: Frequency), and normal probability plot of the residuals (horizontal axis: Observed Value)]

The residuals for the log(Ozone) model appear quite symmetrically distributed with only one mild outlier
on the negative end.
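As a rough sketch of how the observed residuals $\text{res}_i = Y_i - (\hat{\beta}_0 + \hat{\beta}_1 X_i)$ are obtained before being plotted, the following Python fragment fits a least-squares line and lists the sorted residuals. The data are made-up illustrative values, not the ozone data, and all variable names are assumptions for this example.

```python
import math

# Made-up illustrative data (NOT the ozone data): x = temperature, y = log10 response
x = [60.0, 65.0, 70.0, 75.0, 80.0, 85.0, 90.0]
y = [1.05, 1.10, 1.32, 1.35, 1.60, 1.68, 1.90]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares estimates of the slope and intercept
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Observed residuals: res_i = Y_i - (b0 + b1 * X_i)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Estimate of sigma uses n - 2 degrees of freedom
sigma_hat = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, sigma_hat = {sigma_hat:.4f}")
# Sorted residuals are what a histogram or normal probability plot would display
print([round(r, 3) for r in sorted(residuals)])
```

By construction the least-squares residuals sum to zero, so a histogram of them is always centered at zero; what the plots are checking is skewness and outliers.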

The assumption of independence of the residuals can only be judged from the sampling plan and/or from
plotting the residuals versus time order or other covariates that may have been measured. For example, if
these observations had come from two different locations, then the independence assumption would be
violated. We would want to examine a scatterplot with the points from the two locations identified to see
if the relationship were different at the two locations. We would also want to plot the residuals versus
day number to see if there were patterns in the residuals.

[Figure: Unstandardized Residual versus Sequence number (observations 1 to 111)]


Interpretation of transformed model

$$\hat{\mu}[\log_{10}(\text{Ozone}) \mid \text{Temp}] = -0.8028 + 0.0294\,\text{Temp}$$

If we transform back by raising 10 to the power of each side, the left-hand side does not become the mean
of Y, because the mean of the logged data is not the log of the mean of the raw data. However, if the
transformation has succeeded in making the distribution of the log(Y) values symmetric about their mean,
then

$$\text{Median}[\log(Y) \mid X] = \mu[\log(Y) \mid X]$$

Medians can be transformed back: the median of the logged data is the log of the median of the original
data. Therefore, we can say:

$$\text{Estimated Median}(\text{Ozone} \mid \text{Temp}) = 10^{-0.8028 + 0.0294\,\text{Temp}} = (0.1575)\,10^{0.0294\,\text{Temp}}$$

Note that

$$\frac{\text{Estimated Median}(\text{Ozone} \mid \text{Temp} + 1)}{\text{Estimated Median}(\text{Ozone} \mid \text{Temp})} = \frac{(0.1575)\,10^{0.0294(\text{Temp}+1)}}{(0.1575)\,10^{0.0294\,\text{Temp}}} = 10^{0.0294} = 1.070$$

This means that median ozone level is estimated to increase by a factor of 1.070, or 7.0%, for every one
degree increase in maximum temperature (95% confidence interval 5.8% to 8.2%, since
$10^{0.0244} = 1.058$ and $10^{0.0344} = 1.082$).
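The back-transformation arithmetic can be checked with a short Python sketch. The coefficients and confidence bounds are copied from the Coefficients table; the function name `estimated_median_ozone` and the example temperature of 80 F are assumptions made for illustration.

```python
# Values from the SPSS Coefficients table for the Log10(Ozone) model
b0, b1 = -0.8028, 0.0294
ci_low, ci_high = 0.0244, 0.0344   # 95% confidence bounds for the slope

def estimated_median_ozone(temp_f):
    """Back-transformed fitted value: estimated median ozone (ppb) at temp_f.

    Hypothetical helper name, used only in this sketch.
    """
    return 10 ** (b0 + b1 * temp_f)

multiplier = 10 ** b0                          # the 0.1575 in the notes
factor = 10 ** b1                              # multiplicative change per degree
factor_ci = (10 ** ci_low, 10 ** ci_high)      # back-transformed slope interval

# The ratio of medians one degree apart does not depend on the starting temperature
ratio = estimated_median_ozone(81) / estimated_median_ozone(80)

print(f"10^b0 = {multiplier:.4f}")
print(f"factor per degree = {factor:.3f}")
print(f"95% CI for factor = ({factor_ci[0]:.3f}, {factor_ci[1]:.3f})")
print(f"estimated median ozone at 80 F = {estimated_median_ozone(80):.1f} ppb")
```

Because the interval endpoints are transformed with the same monotone function as the estimate, the interval for the multiplicative factor is obtained directly from the interval for the slope.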

Model Summary

Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .745 (a)   .555       .551                .25207

a. Predictors: (Constant), Maximum temperature (F)

ANOVA (b)

Model            Sum of Squares   df    Mean Square   F         Sig.
1  Regression     8.629             1   8.629         135.813   .000 (a)
   Residual       6.926           109    .064
   Total         15.555           110

a. Predictors: (Constant), Maximum temperature (F)

b. Dependent Variable: Log10(Ozone)

Coefficients (a)

                             Unstandardized        Standardized
                             Coefficients          Coefficients                    95% Confidence Interval for B
Model                        B        Std. Error   Beta           t        Sig.    Lower Bound   Upper Bound
1  (Constant)                -.8028   .1976                       -4.062   .000    -1.1945       -.4111
   Maximum temperature (F)    .0294   .0025        .745           11.654   .000     .0244         .0344

a. Dependent Variable: Log10(Ozone)


The t-statistics and P-values (“Sig.”) reported in the Coefficients table are for testing the hypothesis
$H_0: \beta_0 = 0$ and the hypothesis $H_0: \beta_1 = 0$. The former is usually not of interest, but the
latter is a test of the equal-means model.

The ANOVA table is precisely analogous to the ANOVA table for comparing several groups. It
compares the linear regression model with 2 parameters for the means ($\beta_0$ and $\beta_1$), which is
the full model, to the equal-means model $\mu(Y \mid X) = \beta_0$, which is the reduced model.

• Total sum of squares = residual sum of squares for equal-means (reduced) model = $\sum_{i=1}^{n} (Y_i - \bar{Y})^2$.

• Residual sum of squares = residual sum of squares for full model = $\sum_{i=1}^{n} \text{res}_i^2$.

• Mean square residual = $\frac{1}{n-2} \sum_{i=1}^{n} \text{res}_i^2 = \hat{\sigma}^2$

The F-test is a test of the simple linear regression model versus the equal-means model. Since the only
difference between the two models is the parameter $\beta_1$, this is a two-sided test of the hypothesis
$H_0: \beta_1 = 0$. This is mathematically equivalent to the t-test of this hypothesis that is reported
in the regression coefficients table.
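The equivalence of the F-test and the t-test can be checked numerically from the SPSS output for the ozone model. The sketch below uses only the rounded values printed in the ANOVA and Coefficients tables, so agreement is expected only up to rounding; the variable names are chosen for this illustration.

```python
import math

# Rounded values from the ANOVA table for Log10(Ozone)
ss_regression, df_regression = 8.629, 1
ss_residual, df_residual = 6.926, 109

# t-statistic for the slope from the Coefficients table
t_slope = 11.654

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual       # estimate of sigma^2
f_stat = ms_regression / ms_residual          # extra-sum-of-squares F statistic

sigma_hat = math.sqrt(ms_residual)            # "Std. Error of the Estimate"

print(f"mean square residual = {ms_residual:.4f}")
print(f"F from sums of squares = {f_stat:.2f}")
print(f"t squared = {t_slope ** 2:.2f}")
print(f"sigma_hat = {sigma_hat:.5f}")
```

With one numerator degree of freedom, the F statistic equals the square of the slope's t-statistic, which is why the two P-values in the output agree.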

The R-squared statistic, or coefficient of determination, gives us the percentage of the total variation
in the response, y, that is explained by the explanatory variable, x, which for our example yields:

$$R^2 = \frac{\text{total sum of squares} - \text{residual sum of squares}}{\text{total sum of squares}} = \frac{15.555 - 6.926}{15.555} = 0.555$$

The residual sum of squares measures the deviation of y from the regression model; hence the difference
between the total variation and the residual variation represents the reduction in variation achieved by
modeling y in terms of x.

For linear regression, $R^2$ is identical to the square of the sample correlation coefficient for the
response and the explanatory variable. Hence, this quantity is only a valid measure if the assumptions
are met (i.e., that the data are a random sample), and it should never be used to evaluate the adequacy
of the linear model.
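The R-squared arithmetic for the ozone model can be reproduced from the ANOVA table with a few lines of Python. This sketch assumes that the second "R Square" column of the Model Summary is SPSS's adjusted R Square, and it uses the standard adjusted formula; all values are the rounded ones printed in the output.

```python
# Rounded values from the ANOVA and Model Summary tables for Log10(Ozone)
ss_total, df_total = 15.555, 110
ss_residual, df_residual = 6.926, 109
r = 0.745   # sample correlation reported as R

# Proportion of total variation explained by the regression
r_squared = (ss_total - ss_residual) / ss_total

# Adjusted R-squared compares mean squares instead of sums of squares
# (assumed here to be the .551 column of the Model Summary)
adjusted_r_squared = 1 - (ss_residual / df_residual) / (ss_total / df_total)

print(f"R squared = {r_squared:.3f}")
print(f"r squared from the correlation = {r ** 2:.3f}")
print(f"adjusted R squared = {adjusted_r_squared:.3f}")
```

The first two printed values agree to three decimals, illustrating that for simple linear regression R-squared is the squared correlation.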


Case Study 8.2: Breakdown times for Insulating Fluid

ANOVA: Log(Time)

                 Sum of Squares   df   Mean Square   F        Sig.
Between Groups   196.477           6   32.746        13.004   .000
Within Groups    173.749          69    2.518
Total            370.226          75

ANOVA (b)

Model            Sum of Squares   df   Mean Square   F        Sig.
1  Regression    190.151           1   190.151       78.141   .000 (a)
   Residual      180.075          74     2.433
   Total         370.226          75

a. Predictors: (Constant), Voltage (kV)

b. Dependent Variable: Log(Time)

Questions:

1) How much is the residual sum of squares lowered by going from the 2-parameter regression model
to the 7-parameter ‘separate-means model’?

Between

Regression

Lack of fit

Within

Total
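One way to work Question 1 is to take differences between the two ANOVA tables for Log(Time). The Python sketch below does that arithmetic and also forms a lack-of-fit F statistic; treating the Between, Regression, Lack of fit, Within and Total rows as a composite ANOVA table is an assumption about the intended layout, and the variable names are illustrative.

```python
# Separate-means (one-way ANOVA) model: 7 parameters
ss_between, df_between = 196.477, 6
ss_within, df_within = 173.749, 69

# Simple linear regression model: 2 parameters
ss_regression, df_regression = 190.151, 1
ss_residual_reg, df_residual_reg = 180.075, 74

# Drop in residual sum of squares when moving from regression to separate means
ss_lack_of_fit = ss_residual_reg - ss_within
df_lack_of_fit = df_residual_reg - df_within

# The same quantity arises as Between minus Regression
ss_check = ss_between - ss_regression

# Lack-of-fit F statistic: extra sum of squares per df over the within-group mean square
f_lack_of_fit = (ss_lack_of_fit / df_lack_of_fit) / (ss_within / df_within)

print(f"lack-of-fit SS = {ss_lack_of_fit:.3f} on {df_lack_of_fit} df")
print(f"Between minus Regression = {ss_check:.3f}")
print(f"lack-of-fit F = {f_lack_of_fit:.3f}")
```

A small F statistic here would indicate that the separate-means model does not fit appreciably better than the straight line; the P-value would come from an F distribution with 5 and 69 degrees of freedom.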
