Uploaded by Ali Elattar

BUSINESS STATISTICS

by Amir D. Aczel & Jayavel Sounderpandian

7th edition.

University

Chapter 10

Simple Linear Regression and Correlation

McGraw-Hill/Irwin Copyright © 2009 by The McGraw-Hill Companies, Inc. All rights reserved.

10-2

• Using Statistics

• The Simple Linear Regression Model

• Estimation: The Method of Least Squares

• Error Variance and the Standard Errors of Regression Estimators

• Correlation

• Hypothesis Tests about the Regression Relationship

• How Good is the Regression?

• Analysis of Variance Table and an F Test of the Regression Model

• Residual Analysis and Checking for Model Inadequacies

• Use of the Regression Model for Prediction

• The Solver Method for Regression

10-3

10 LEARNING OBJECTIVES

After studying this chapter, you should be able to:

• Determine whether a regression experiment would be useful in a given instance

• Formulate a regression model

• Compute a regression equation

• Compute the covariance and the correlation coefficient of two random variables

• Compute confidence intervals for regression coefficients

• Compute a prediction interval for the dependent variable

10-4

• Test hypotheses about regression coefficients

• Conduct an ANOVA experiment using regression results

• Analyze residuals to check whether the assumptions about the regression model are valid

• Solve regression problems using spreadsheet templates

• Use the LINEST function to carry out a regression

10-5

• Regression refers to modeling the relationship between variables.

• In simple linear regression, we model the relationship between two variables.

• One of the variables, denoted by Y, is called the dependent variable and the other, denoted by X, is called the independent variable.

• The model we will use to depict the relationship between X and Y will be a straight-line relationship.

• A graphical sketch of the pairs (X, Y) is called a scatter plot.

10-6

[Figure: Scatterplot of Advertising Expenditures (X) and Sales (Y)]

This scatterplot locates pairs of observations of advertising expenditures on the x-axis and sales on the y-axis.

Larger (smaller) values of sales tend to be associated with larger (smaller) values of advertising.

The scatter of points tends to be distributed around a positively sloped straight line.

The pairs of values of advertising expenditures and sales are not located exactly on a straight line.

The scatter plot reveals a more or less strong tendency rather than a precise linear relationship.

The line represents the nature of the relationship on average.

10-7


10-8

Model Building

The relationship between advertising and sales suggests that a statistical model might be useful in analyzing the relationship.

A statistical model separates the systematic component of a relationship from the random component.

Statistical model = Systematic component + Random errors

In ANOVA, the systematic component is the variation of means between samples or treatments (SSTR) and the random component is the unexplained variation (SSE).

In regression, the systematic component is the overall linear relationship, and the random component is the variation around the line.

10-9

The Simple Linear Regression Model

The population simple linear regression model:

$$Y = \beta_0 + \beta_1 X + \varepsilon$$

Here $\beta_0 + \beta_1 X$ is the nonrandom or systematic component and $\varepsilon$ is the random component, where

Y is the dependent variable, the variable we wish to explain or predict

X is the independent variable, also called the predictor variable

$\varepsilon$ is the error term, the only random component in the model, and thus the only source of randomness in Y

$\beta_0$ is the intercept of the systematic component

$\beta_1$ is the slope of the systematic component.

10-10

Regression Model

The simple linear regression model gives an exact linear relationship between the expected or average value of Y, the dependent variable, and X, the independent or predictor variable:

$$E[Y_i] = \beta_0 + \beta_1 X_i$$

Actual observed values of Y differ from the expected value by an unexplained or random error:

$$Y_i = E[Y_i] + \varepsilon_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$

[Figure: Regression plot of the line $E[Y] = \beta_0 + \beta_1 X$, with intercept $\beta_0$, slope $\beta_1$, and the error $\varepsilon_i$ shown as the vertical distance between $Y_i$ and the line at $X_i$.]

10-11

Regression Model

Assumptions of the Simple Linear Regression Model

• The relationship between X and Y is a straight-line relationship.

• The values of the independent variable X are assumed fixed (not random); the only randomness in the values of Y comes from the error term $\varepsilon_i$.

• The errors $\varepsilon_i$ are normally distributed with mean 0 and variance $\sigma^2$. The errors are uncorrelated (not related) in successive observations. That is:

$$\varepsilon \sim N(0, \sigma^2)$$

[Figure: the line $E[Y] = \beta_0 + \beta_1 X$ with identical normal distributions of errors, all centered on the regression line.]

10-12

Estimation: The Method of Least Squares

Estimation of a simple linear regression relationship involves finding estimated or predicted values of the intercept and slope of the linear regression line.

$$Y = b_0 + b_1 X + e$$

where $b_0$ estimates the intercept of the population regression line, $\beta_0$; $b_1$ estimates the slope of the population regression line, $\beta_1$; and $e$ stands for the observed errors, the residuals from fitting the estimated regression line $b_0 + b_1 X$ to a set of n points.

The estimated regression line:

$$\hat{Y} = b_0 + b_1 X$$

where $\hat{Y}$ (Y-hat) is the value of Y lying on the fitted regression line for a given value of X.

10-13

[Figure: four panels: the data; three errors from a fitted line; three errors from the least squares regression line; errors from the least squares regression line are minimized.]

10-14

Errors in Regression

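[Figure: an observed data point $Y_i$ and the fitted regression line at $X_i$, with the error $e_i$ as the vertical gap between them.]

$Y_i$ is the observed data point

$\hat{Y} = b_0 + b_1 X$ is the fitted regression line

$\hat{Y}_i$ is the predicted value of Y for $X_i$

$$e_i = Y_i - \hat{Y}_i$$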

10-15

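$$SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

The least squares regression line is that which minimizes the SSE with respect to the estimates $b_0$ and $b_1$.

The normal equations:

$$\sum_{i=1}^{n} y_i = n b_0 + b_1 \sum_{i=1}^{n} x_i$$

$$\sum_{i=1}^{n} x_i y_i = b_0 \sum_{i=1}^{n} x_i + b_1 \sum_{i=1}^{n} x_i^2$$

[Figure: SSE as a function of $b_0$ and $b_1$. At the least squares $b_0$ and the least squares $b_1$, SSE is minimized with respect to $b_0$ and $b_1$.]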
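The two normal equations form a 2×2 linear system in b0 and b1. As a minimal sketch, they can be solved directly with Cramer's rule. The data values and the function name `solve_normal_equations` below are made up for illustration and do not come from the text.

```python
def solve_normal_equations(x, y):
    """Solve the two least squares normal equations for (b0, b1)."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    # System:  sum_y  = n*b0     + b1*sum_x
    #          sum_xy = b0*sum_x + b1*sum_x2
    det = n * sum_x2 - sum_x ** 2  # zero only when all x values are equal
    if det == 0:
        raise ValueError("x values must not all be identical")
    b0 = (sum_y * sum_x2 - sum_x * sum_xy) / det
    b1 = (n * sum_xy - sum_x * sum_y) / det
    return b0, b1


# Hypothetical data, chosen only for illustration
x = [1, 2, 3, 4]
y = [2, 4, 5, 8]
b0, b1 = solve_normal_equations(x, y)
print(b0, b1)
```

For this small data set the sums are n = 4, Σx = 10, Σy = 19, Σxy = 57 and Σx² = 30, so the system can also be checked by hand.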

10-16

and Least Squares Estimators

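Sums of Squares and Cross Products:

$$SS_X = \sum (x - \bar{x})^2 = \sum x^2 - \frac{\left(\sum x\right)^2}{n}$$

$$SS_Y = \sum (y - \bar{y})^2 = \sum y^2 - \frac{\left(\sum y\right)^2}{n}$$

$$SS_{XY} = \sum (x - \bar{x})(y - \bar{y}) = \sum xy - \frac{\left(\sum x\right)\left(\sum y\right)}{n}$$

Least squares regression estimators:

$$b_1 = \frac{SS_{XY}}{SS_X}$$

$$b_0 = \bar{y} - b_1 \bar{x}$$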
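The computational forms of the sums of squares translate almost line for line into code. The sketch below assumes plain Python lists; the function name `least_squares` and the four data points are hypothetical and serve only to show the mechanics.

```python
def least_squares(x, y):
    """Return SS_X, SS_Y, SS_XY and the least squares estimates b0, b1."""
    n = len(x)
    ss_x = sum(v * v for v in x) - sum(x) ** 2 / n
    ss_y = sum(v * v for v in y) - sum(y) ** 2 / n
    ss_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    b1 = ss_xy / ss_x                   # slope
    b0 = sum(y) / n - b1 * sum(x) / n   # intercept: y-bar minus b1 times x-bar
    return {"ss_x": ss_x, "ss_y": ss_y, "ss_xy": ss_xy, "b0": b0, "b1": b1}


# Hypothetical data, chosen only for illustration
fit = least_squares([1, 2, 3, 4], [2, 4, 5, 8])
print(fit)
```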

10-17

Example 10-1

Miles (x)   Dollars (y)   x^2            xy
1211        1802          1466521        2182222
1345        2405          1809025        3234725
1422        2005          2022084        2851110
1687        2511          2845969        4236057
1849        2332          3418801        4311868
2026        2305          4104676        4669930
2133        3016          4549689        6433128
2253        3385          5076009        7626405
2400        3090          5760000        7416000
2468        3694          6091024        9116792
2699        3371          7284601        9098329
2806        3998          7873636        11218388
3082        3555          9498724        10956510
3209        4692          10297681       15056628
3466        4244          12013156       14709704
3643        5298          13271449       19300614
3852        4801          14837904       18493452
4033        5147          16265089       20757851
4267        5738          18207289       24484046
4498        6420          20232004       28877160
4533        6059          20548089       27465447
4804        6426          23078416       30870504
5090        6321          25908100       32173890
5233        7026          27384289       36767058
5439        6964          29582721       37877196
Total
79,448      106,605       293,426,946    390,185,014

$$SS_X = \sum x^2 - \frac{\left(\sum x\right)^2}{n} = 293{,}426{,}946 - \frac{(79{,}448)^2}{25} = 40{,}947{,}557.84$$

$$SS_{XY} = \sum xy - \frac{\left(\sum x\right)\left(\sum y\right)}{n} = 390{,}185{,}014 - \frac{(79{,}448)(106{,}605)}{25} = 51{,}402{,}852.4$$

$$b_1 = \frac{SS_{XY}}{SS_X} = \frac{51{,}402{,}852.4}{40{,}947{,}557.84} = 1.255333776 \approx 1.26$$

$$b_0 = \bar{y} - b_1 \bar{x} = \frac{106{,}605}{25} - (1.255333776)\frac{79{,}448}{25} = 274.85$$
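As a check on the arithmetic of Example 10-1, the 25 (miles, dollars) pairs from the table can be run through the same formulas. This is only a sketch: the variable names are illustrative, and the pairs are read off the table above.

```python
# (miles, dollars) pairs read from the Example 10-1 table
data = [
    (1211, 1802), (1345, 2405), (1422, 2005), (1687, 2511), (1849, 2332),
    (2026, 2305), (2133, 3016), (2253, 3385), (2400, 3090), (2468, 3694),
    (2699, 3371), (2806, 3998), (3082, 3555), (3209, 4692), (3466, 4244),
    (3643, 5298), (3852, 4801), (4033, 5147), (4267, 5738), (4498, 6420),
    (4533, 6059), (4804, 6426), (5090, 6321), (5233, 7026), (5439, 6964),
]
x = [miles for miles, _ in data]
y = [dollars for _, dollars in data]
n = len(data)

# Computational forms of the sums of squares and cross products
ss_x = sum(v * v for v in x) - sum(x) ** 2 / n
ss_xy = sum(a * b for a, b in data) - sum(x) * sum(y) / n

b1 = ss_xy / ss_x                   # least squares slope
b0 = sum(y) / n - b1 * sum(x) / n   # least squares intercept

print(f"SS_X = {ss_x:.2f}, SS_XY = {ss_xy:.1f}")
print(f"b1 = {b1:.6f}, b0 = {b0:.2f}")
```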

10-18

used to carry out a Simple Regression

10-19

to carry out a Simple Regression

10-20

to carry out a Simple Regression

between the residuals and the X-values (miles).

10-21

to carry out a Simple Regression

would indicate that the normality assumption for the errors has not

been violated.

10-22

[Figure: two panels. One shows what you see when looking at the total variation of Y. The other shows what you see when looking along the regression line at the error variance of Y.]

10-23

Errors of Regression Estimators

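Degrees of Freedom in Regression:

$$df = n - 2$$

(one degree of freedom is lost for each parameter estimated, $b_0$ and $b_1$)

Square and sum all regression errors to find SSE.

$$SSE = \sum (Y - \hat{Y})^2 = SS_Y - \frac{(SS_{XY})^2}{SS_X} = SS_Y - b_1 SS_{XY}$$

An unbiased estimator of $\sigma^2$, denoted by $s^2$:

$$MSE = \frac{SSE}{n - 2}$$

Example 10-1:

$$SSE = SS_Y - b_1 SS_{XY} = 66855898 - (1.255333776)(51402852.4) = 2328161.2$$

$$MSE = \frac{SSE}{n - 2} = \frac{2328161.2}{23} = 101224.4$$

$$s = \sqrt{MSE} = \sqrt{101224.4} = 318.158$$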
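The error variance calculation needs only the three sums of squares and n. The sketch below plugs in the Example 10-1 values quoted on the slides; the function name `error_variance` is illustrative.

```python
import math


def error_variance(ss_y, ss_xy, ss_x, n):
    """Return (SSE, MSE, s) for a simple linear regression."""
    sse = ss_y - ss_xy ** 2 / ss_x   # same as SS_Y - b1 * SS_XY
    mse = sse / (n - 2)              # n - 2 degrees of freedom for error
    return sse, mse, math.sqrt(mse)


# Summary values taken from Example 10-1
sse, mse, s = error_variance(ss_y=66855898.0, ss_xy=51402852.4,
                             ss_x=40947557.84, n=25)
print(f"SSE = {sse:.1f}, MSE = {mse:.1f}, s = {s:.3f}")
```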

10-24

Regression

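The standard error of $b_0$ (intercept):

$$s(b_0) = s \sqrt{\frac{\sum x^2}{n\, SS_X}}, \quad \text{where } s = \sqrt{MSE}$$

The standard error of $b_1$ (slope):

$$s(b_1) = \frac{s}{\sqrt{SS_X}}$$

Example 10-1:

$$s(b_0) = 318.158 \sqrt{\frac{293426946}{(25)(40947557.84)}} = 170.338$$

$$s(b_1) = \frac{318.158}{\sqrt{40947557.84}} = 0.04972$$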
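A short sketch of the two standard error formulas, using the Example 10-1 quantities (s = 318.158, Σx² = 293,426,946, SS_X = 40,947,557.84, n = 25). The function names are hypothetical.

```python
import math


def se_intercept(s, sum_x2, n, ss_x):
    """Standard error of b0: s * sqrt(sum of x^2 / (n * SS_X))."""
    return s * math.sqrt(sum_x2 / (n * ss_x))


def se_slope(s, ss_x):
    """Standard error of b1: s / sqrt(SS_X)."""
    return s / math.sqrt(ss_x)


# Values from Example 10-1
s_b0 = se_intercept(s=318.158, sum_x2=293426946, n=25, ss_x=40947557.84)
s_b1 = se_slope(s=318.158, ss_x=40947557.84)
print(f"s(b0) = {s_b0:.3f}, s(b1) = {s_b1:.5f}")
```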

10-25

Regression Parameters

A $(1 - \alpha)100\%$ confidence interval for $\beta_0$:

$$b_0 \pm t_{\alpha/2,\,(n-2)}\, s(b_0)$$

A $(1 - \alpha)100\%$ confidence interval for $\beta_1$:

$$b_1 \pm t_{\alpha/2,\,(n-2)}\, s(b_1)$$

Example 10-1, 95% Confidence Intervals:

$$b_0 \pm t_{0.025,\,(25-2)}\, s(b_0) = 274.85 \pm (2.069)(170.338) = 274.85 \pm 352.43 = [-77.58,\ 627.28]$$

$$b_1 \pm t_{0.025,\,(25-2)}\, s(b_1) = 1.25533 \pm (2.069)(0.04972) = 1.25533 \pm 0.10287 = [1.15246,\ 1.35820]$$

[Figure: the least-squares point estimate $b_1 = 1.25533$ drawn as the slope of a line (height = slope for length = 1), with the bounds on the regression slope at 95%.]
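Both intervals are the point estimate plus or minus the critical t value times the standard error. In the sketch below the critical value 2.069 is taken from the example rather than computed, and the helper name `confidence_interval` is an assumption for illustration.

```python
def confidence_interval(estimate, std_error, t_crit):
    """Return (lower, upper) for estimate +/- t_crit * std_error."""
    margin = t_crit * std_error
    return estimate - margin, estimate + margin


T_CRIT = 2.069  # t(0.025, 23), as quoted in Example 10-1

b0_low, b0_high = confidence_interval(274.85, 170.338, T_CRIT)
b1_low, b1_high = confidence_interval(1.25533, 0.04972, T_CRIT)

print(f"beta0: [{b0_low:.2f}, {b0_high:.2f}]")
print(f"beta1: [{b1_low:.5f}, {b1_high:.5f}]")
```

The interval for the intercept contains 0, while the interval for the slope does not.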

10-26

to obtain Confidence Intervals for $\beta_0$ and $\beta_1$

10-27

10-5 Correlation

Correlation measures the degree of linear association between the two variables.

The population correlation, denoted by $\rho$, can take on any value from -1 to 1.

$\rho = -1$ indicates a perfect negative linear relationship

$-1 < \rho < 0$ indicates a negative linear relationship

$\rho = 0$ indicates no linear relationship

$0 < \rho < 1$ indicates a positive linear relationship

$\rho = 1$ indicates a perfect positive linear relationship

10-28

Illustrations of Correlation

[Figure: six scatterplots of Y against X, with $\rho = -1$, $\rho = 0$, $\rho = 1$, $\rho = -0.8$, $\rho = 0$, and $\rho = 0.8$.]

10-29

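The covariance of two random variables X and Y:

$$Cov(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]$$

where $\mu_X$ and $\mu_Y$ are the population means of X and Y respectively.

The population correlation coefficient:

$$\rho = \frac{Cov(X, Y)}{\sigma_X \sigma_Y}$$

The sample correlation coefficient:

$$r = \frac{SS_{XY}}{\sqrt{SS_X\, SS_Y}}$$

Example 10-1:

$$r = \frac{SS_{XY}}{\sqrt{SS_X\, SS_Y}} = \frac{51402852.4}{\sqrt{(40947557.84)(66855898)}} = \frac{51402852.4}{52321943.29} = 0.9824$$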
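The sample correlation coefficient follows directly from the three sums of squares. A minimal sketch with the Example 10-1 values (the function name `sample_correlation` is illustrative):

```python
import math


def sample_correlation(ss_xy, ss_x, ss_y):
    """Sample correlation coefficient r = SS_XY / sqrt(SS_X * SS_Y)."""
    return ss_xy / math.sqrt(ss_x * ss_y)


# Sums of squares from Example 10-1
r = sample_correlation(ss_xy=51402852.4, ss_x=40947557.84, ss_y=66855898.0)
print(f"r = {r:.4f}")
```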

10-30

Coefficient

$H_0$: $\rho = 0$ (No linear relationship)

$H_1$: $\rho \neq 0$ (Some linear relationship)

Test Statistic:

$$t_{(n-2)} = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$$

Example 10-1:

$$t_{(n-2)} = \frac{0.9824}{\sqrt{\dfrac{1 - 0.9651}{25 - 2}}} = \frac{0.9824}{0.0389} = 25.25$$

$$t_{0.005} = 2.807 < 25.25$$

$H_0$ is rejected at the 1% level.
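The test statistic for the correlation coefficient can be sketched as below, using r = 0.9824 and n = 25 from Example 10-1 and the critical value 2.807 quoted there. The function name is hypothetical, and because the slide rounds intermediate values, the computed statistic may differ slightly from 25.25.

```python
import math


def correlation_t_statistic(r, n):
    """t statistic with n - 2 degrees of freedom for H0: rho = 0."""
    return r / math.sqrt((1.0 - r ** 2) / (n - 2))


t_stat = correlation_t_statistic(r=0.9824, n=25)
T_CRIT = 2.807  # two-tailed critical value at the 1% level, 23 df (from the example)

reject_h0 = abs(t_stat) > T_CRIT
print(f"t = {t_stat:.2f}, reject H0 at the 1% level: {reject_h0}")
```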

10-31

Regression Relationship

[Figure: three panels titled Constant Y, Unsystematic Variation, and Nonlinear Relationship.]

A hypothesis test for the existence of a linear relationship between X and Y:

$$H_0: \beta_1 = 0$$

$$H_1: \beta_1 \neq 0$$

Test statistic for the existence of a linear relationship between X and Y:

$$t_{(n-2)} = \frac{b_1}{s(b_1)}$$

where $b_1$ is the least-squares estimate of the regression slope and $s(b_1)$ is the standard error of $b_1$.

When the null hypothesis is true, the statistic has a t distribution with n - 2 degrees of freedom.

10-32

Slope

Example 10-1:

$$H_0: \beta_1 = 0 \qquad H_1: \beta_1 \neq 0$$

$$t_{(n-2)} = \frac{b_1}{s(b_1)} = \frac{1.25533}{0.04972} = 25.25$$

$$t_{(0.005,\,23)} = 2.807 < 25.25$$

$H_0$ is rejected at the 1% level and we may conclude that there is a relationship between charges and miles traveled.

Example 10-4:

$$H_0: \beta_1 = 1 \qquad H_1: \beta_1 \neq 1$$

$$t_{(n-2)} = \frac{b_1 - 1}{s(b_1)} = \frac{1.24 - 1}{0.21} = 1.14$$

$$t_{(0.05,\,58)} = 1.671 > 1.14$$

$H_0$ is not rejected at the 10% level. We may not conclude that the beta coefficient is different from 1.
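Both examples use the same statistic, the estimated slope minus its hypothesized value, divided by the standard error. A minimal sketch with the numbers from Examples 10-1 and 10-4 (the function name and the `beta1_null` parameter are illustrative):

```python
def slope_t_statistic(b1, se_b1, beta1_null=0.0):
    """t statistic for H0: beta1 = beta1_null."""
    return (b1 - beta1_null) / se_b1


# Example 10-1: H0: beta1 = 0, critical value t(0.005, 23) = 2.807
t_ex1 = slope_t_statistic(1.25533, 0.04972)
reject_ex1 = abs(t_ex1) > 2.807

# Example 10-4: H0: beta1 = 1, critical value t(0.05, 58) = 1.671
t_ex4 = slope_t_statistic(1.24, 0.21, beta1_null=1.0)
reject_ex4 = abs(t_ex4) > 1.671

print(f"Example 10-1: t = {t_ex1:.2f}, reject H0: {reject_ex1}")
print(f"Example 10-4: t = {t_ex4:.2f}, reject H0: {reject_ex4}")
```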

10-33

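The coefficient of determination, $r^2$, is a descriptive measure of the strength of the regression relationship, a measure of how well the regression line fits the data.

$$(y - \bar{y}) = (y - \hat{y}) + (\hat{y} - \bar{y})$$

Total Deviation = Unexplained Deviation (Error) + Explained Deviation (Regression)

$$\sum (y - \bar{y})^2 = \sum (y - \hat{y})^2 + \sum (\hat{y} - \bar{y})^2$$

$$SST = SSE + SSR$$

$$r^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$

$r^2$ is the percentage of total variation explained by the regression.

[Figure: a data point and the fitted line, with the total deviation split into the unexplained deviation and the explained deviation.]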

10-34

[Figure: three panels showing how SST splits into SSE and SSR for $r^2 = 0$, $r^2 = 0.50$, and $r^2 = 0.90$.]

Example 10-1:

$$r^2 = \frac{SSR}{SST} = \frac{64527736.8}{66855898} = 0.96518$$

[Figure: scatterplot of Dollars against Miles for Example 10-1.]
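The two equivalent forms of the coefficient of determination can be checked numerically. This sketch uses the Example 10-1 sums of squares (SSR = 64527736.8, SSE = 2328161.2, SST = 66855898.0); the variable names are illustrative.

```python
# Sums of squares from Example 10-1
ssr = 64527736.8   # explained (regression) sum of squares
sse = 2328161.2    # unexplained (error) sum of squares
sst = 66855898.0   # total sum of squares

r2_from_ssr = ssr / sst
r2_from_sse = 1.0 - sse / sst

print(f"SSR + SSE = {ssr + sse:.1f}")
print(f"r^2 = {r2_from_ssr:.5f} (via SSR), {r2_from_sse:.5f} (via SSE)")
```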

10-35

an F Test of the Regression Model

Source of     Sum of      Degrees of
Variation     Squares     Freedom       Mean Square    F Ratio
Regression    SSR         (1)           MSR            MSR/MSE
Error         SSE         (n-2)         MSE
Total         SST         (n-1)         MST

Example 10-1

Source of     Sum of        Degrees of
Variation     Squares       Freedom       Mean Square    F Ratio    p Value
Regression    64527736.8    1             64527736.8     637.47     0.000
Error         2328161.2     23            101224.4
Total         66855898.0    24
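The F ratio in the ANOVA table is the regression mean square divided by the error mean square. A minimal sketch using the Example 10-1 sums of squares; the function name `regression_f_ratio` is an assumption for illustration.

```python
def regression_f_ratio(ssr, sse, n):
    """Return (MSR, MSE, F) for a simple linear regression ANOVA table."""
    msr = ssr / 1          # one degree of freedom for regression
    mse = sse / (n - 2)    # n - 2 degrees of freedom for error
    return msr, mse, msr / mse


msr, mse, f_ratio = regression_f_ratio(ssr=64527736.8, sse=2328161.2, n=25)
print(f"MSR = {msr:.1f}, MSE = {mse:.1f}, F = {f_ratio:.2f}")
```

In simple linear regression this F ratio equals the square of the t statistic for the slope, which is why 637.47 is close to 25.25 squared.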

10-36

Variance and an F Test of the Regression Model

10-37

for Model Inadequacies

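[Figure: four residual plots, with residuals on the vertical axis (centered at 0) plotted against x or $\hat{y}$, or against time.]

• Residuals appear random. No indication of model inadequacy.

• Variance of the residuals increases when x changes.

• Residuals exhibit a linear trend with time.

• Pattern in the residuals resulting from an underlying nonlinear relationship.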

10-38

Residuals

Flatter than Normal

10-39

Residuals

10-40

Residuals

Positively Skewed

10-41

Residuals

Negatively Skewed

10-42

Use of the Regression Model for Prediction

• Point Prediction
  A single-valued estimate of Y for a given value of X obtained by inserting the value of X in the estimated regression equation.

• Prediction Interval
  For a value of Y given a value of X:
    Variation in regression line estimate
    Variation of points around regression line
  For an average value of Y given a value of X:
    Variation in regression line estimate

10-43

[Figure: two panels showing the regression line with limits on the slope of the regression line and limits on the intercept of the regression line.]

10-44

[Figure: the regression line with its prediction band.]

• The prediction band is narrowest at the mean value of X.

• The prediction band widens as the distance from the mean of X increases.

• Predictions become very unreliable when we extrapolate beyond the range of the sample itself.

10-45

Value of Y

[Figure: two panels. One shows 3) variation around the regression line. The other shows the regression line with the prediction band for E[Y|X].]

10-46

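A $(1 - \alpha)100\%$ prediction interval for Y:

$$\hat{y} \pm t_{\alpha/2}\, s \sqrt{1 + \frac{1}{n} + \frac{(x - \bar{x})^2}{SS_X}}$$

Example 10-1 (X = 4,000), the term under the square root:

$$1 + \frac{1}{25} + \frac{(4{,}000 - 3{,}177.92)^2}{40{,}947{,}557.84}$$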

10-47

Value of Y

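A $(1 - \alpha)100\%$ prediction interval for E[Y|X]:

$$\hat{y} \pm t_{\alpha/2}\, s \sqrt{\frac{1}{n} + \frac{(x - \bar{x})^2}{SS_X}}$$

Example 10-1 (X = 4,000), the term under the square root:

$$\frac{1}{25} + \frac{(4{,}000 - 3{,}177.92)^2}{40{,}947{,}557.84}$$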
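The two intervals differ only by the extra 1 under the square root, which accounts for the variation of individual points around the line. The sketch below evaluates both half-widths at X = 4,000 using the Example 10-1 quantities (n = 25, mean of X = 3,177.92, SS_X = 40,947,557.84, s = 318.158, t = 2.069). The function name and the rounded estimates used for the point prediction are assumptions for illustration.

```python
import math


def interval_half_widths(x0, n, x_bar, ss_x, s, t_crit):
    """Return (half-width for an individual Y, half-width for E[Y|X]) at x0."""
    leverage = 1.0 / n + (x0 - x_bar) ** 2 / ss_x
    hw_individual = t_crit * s * math.sqrt(1.0 + leverage)
    hw_mean = t_crit * s * math.sqrt(leverage)
    return hw_individual, hw_mean


# Quantities from Example 10-1
x0 = 4000
y_hat = 274.85 + 1.25533 * x0   # point prediction from the fitted line
hw_individual, hw_mean = interval_half_widths(
    x0, n=25, x_bar=3177.92, ss_x=40947557.84, s=318.158, t_crit=2.069
)

print(f"y-hat = {y_hat:.2f}")
print(f"individual Y: +/- {hw_individual:.2f}")
print(f"E[Y|X]:       +/- {hw_mean:.2f}")
```

The interval for an individual value of Y is always wider than the interval for the average value of Y at the same X.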

10-48

Intervals

10-49

The Solver Method for Regression

The Solver macro available in Excel can also be used to conduct a simple linear regression. See the text for instructions.
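The idea behind a solver-based fit is to minimize SSE numerically rather than through the closed-form estimators. The sketch below is not Excel's Solver algorithm; it assumes plain gradient descent on SSE, with a hypothetical data set, step size, and iteration count, only to illustrate that a numerical search reaches the same least squares estimates.

```python
def fit_by_gradient_descent(x, y, learning_rate=0.01, steps=20000):
    """Minimize SSE over (b0, b1) with plain gradient descent."""
    b0, b1 = 0.0, 0.0
    for _ in range(steps):
        residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
        # Partial derivatives of SSE = sum(residual^2)
        grad_b0 = -2.0 * sum(residuals)
        grad_b1 = -2.0 * sum(r * xi for r, xi in zip(residuals, x))
        b0 -= learning_rate * grad_b0
        b1 -= learning_rate * grad_b1
    return b0, b1


# Hypothetical data, chosen only for illustration
x = [1, 2, 3, 4]
y = [2, 4, 5, 8]
gd_b0, gd_b1 = fit_by_gradient_descent(x, y)
print(f"b0 = {gd_b0:.4f}, b1 = {gd_b1:.4f}")
```

The step size has to be small relative to the scale of X. For data like the miles in Example 10-1, the variables would need rescaling or a much smaller step size before this simple search converges.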

10-50

Regression

[Figure: fitted line plot of Y against X.]

Y = -0.8465 + 1.352 X

S = 0.184266, R-Sq = 95.2%, R-Sq(adj) = 94.8%
