
COMPLETE BUSINESS STATISTICS
by
AMIR D. ACZEL & JAYAVEL SOUNDERPANDIAN
7th edition
Prepared by Lloyd Jaisingh, Morehead State
University

Chapter 10

Simple Linear Regression and Correlation


McGraw-Hill/Irwin

Copyright © 2009 by The McGraw-Hill Companies, Inc. All rights reserved.


10 Simple Linear Regression and Correlation

Using Statistics
The Simple Linear Regression Model
Estimation: The Method of Least Squares
Error Variance and the Standard Errors of Regression Estimators
Correlation
Hypothesis Tests about the Regression Relationship
How Good is the Regression?
Analysis of Variance Table and an F Test of the Regression Model
Residual Analysis and Checking for Model Inadequacies
Use of the Regression Model for Prediction
The Solver Method for Regression


10 LEARNING OBJECTIVES
After studying this chapter, you should be able to:

Determine whether a regression experiment would be useful in a given instance
Formulate a regression model
Compute a regression equation
Compute the covariance and the correlation coefficient of two random
variables
Compute confidence intervals for regression coefficients
Compute a prediction interval for the dependent variable


10 LEARNING OBJECTIVES (continued)

After studying this chapter, you should be able to:


Test hypotheses about regression coefficients
Conduct an ANOVA experiment using regression results
Analyze residuals to check if the assumptions about the
regression model are valid
Solve regression problems using spreadsheet templates
Use the LINEST function to carry out a regression


10-1 Using Statistics


Regression refers to the statistical technique of modeling the relationship between variables. In simple linear regression, we model the relationship between two variables.

One of the variables, denoted by Y, is called the dependent variable and the other, denoted by X, is called the independent variable.

The model we will use to depict the relationship between X and Y will be a straight-line relationship.

A graphical sketch of the pairs (X, Y) is called a scatter plot.


10-1 Using Statistics


This scatterplot locates pairs of observations of advertising expenditures on the x-axis and sales on the y-axis. We notice that:

[Figure: Scatterplot of Advertising Expenditures (X) and Sales (Y); x-axis: Advertising, y-axis: Sales.]

Larger (smaller) values of sales tend to be associated with larger (smaller) values of advertising.
The scatter of points tends to be distributed around a positively sloped straight line.
The pairs of values of advertising expenditures and sales are not located exactly on a straight line.
The scatter plot reveals a more or less strong tendency rather than a precise linear relationship.
The line represents the nature of the relationship on average.


Examples of Other Scatterplots




Model Building

The inexact nature of the relationship between advertising and sales suggests that a statistical model might be useful in analyzing the relationship. A statistical model separates the systematic component of a relationship from the random component.

Statistical model: Data = Systematic component + Random errors

In ANOVA, the systematic component is the variation of means between samples or treatments (SSTR) and the random component is the unexplained variation (SSE). In regression, the systematic component is the overall linear relationship, and the random component is the variation around the line.

10-2 The Simple Linear Regression Model

The population simple linear regression model:

Y = β₀ + β₁X + ε

where β₀ + β₁X is the nonrandom or systematic component and ε is the random component, and where

Y is the dependent variable, the variable we wish to explain or predict;
X is the independent variable, also called the predictor variable;
ε is the error term, the only random component in the model, and thus the only source of randomness in Y;
β₀ is the intercept of the systematic component of the regression relationship;
β₁ is the slope of the systematic component.

The conditional mean of Y:  E[Y|X] = β₀ + β₁X


Picturing the Simple Linear Regression Model

[Figure: regression plot of E[Y] = β₀ + β₁X, showing the intercept β₀, the slope β₁, and the error εᵢ at the point (Xᵢ, Yᵢ).]

The simple linear regression model gives an exact linear relationship between the expected or average value of Y, the dependent variable, and X, the independent or predictor variable:

E[Yᵢ] = β₀ + β₁Xᵢ

Actual observed values of Y differ from the expected value by an unexplained or random error:

Yᵢ = E[Yᵢ] + εᵢ = β₀ + β₁Xᵢ + εᵢ

Assumptions of the Simple Linear Regression Model

The relationship between X and Y is a straight-line relationship.
The values of the independent variable X are assumed fixed (not random); the only randomness in the values of Y comes from the error term εᵢ.
The errors εᵢ are normally distributed with mean 0 and variance σ². The errors are uncorrelated (not related) in successive observations. That is: ε ~ N(0, σ²).


Assumptions of the Simple Linear Regression Model

[Figure: the regression line E[Y] = β₀ + β₁X with identical normal distributions of errors, all centered on the regression line.]


10-3 Estimation: The Method of Least Squares

Estimation of a simple linear regression relationship involves finding estimated or predicted values of the intercept and slope of the linear regression line.

The estimated regression equation:

Y = b₀ + b₁X + e

where b₀ estimates the intercept of the population regression line, β₀; b₁ estimates the slope of the population regression line, β₁; and e stands for the observed errors, the residuals from fitting the estimated regression line b₀ + b₁X to a set of n points.

The estimated regression line:

Ŷ = b₀ + b₁X

where Ŷ (Y-hat) is the value of Y lying on the fitted regression line for a given value of X.


Fitting a Regression Line

[Figure: the same data fitted two ways; three errors from an arbitrary fitted line versus three errors from the least squares regression line. Errors from the least squares regression line are minimized.]


Errors in Regression

[Figure: the observed data point Yᵢ, the fitted regression line Ŷ = b₀ + b₁X, and the predicted value Ŷᵢ of Y for Xᵢ.]

Error: eᵢ = Yᵢ − Ŷᵢ, the difference between the observed data point and the predicted value of Y for Xᵢ.


Least Squares Regression

The sum of squared errors in regression is:

SSE = Σ eᵢ² = Σ (yᵢ − ŷᵢ)², summing over i = 1, …, n

The least squares regression line is that which minimizes the SSE with respect to the estimates b₀ and b₁.

The normal equations:

Σy = nb₀ + b₁Σx
Σxy = b₀Σx + b₁Σx²

[Figure: SSE as a surface over (b₀, b₁); at the least squares values of b₀ and b₁, SSE is minimized with respect to b₀ and b₁.]

Sums of Squares, Cross Products, and Least Squares Estimators

Sums of squares and cross products:

SS_X = Σ(x − x̄)² = Σx² − (Σx)²/n
SS_Y = Σ(y − ȳ)² = Σy² − (Σy)²/n
SS_XY = Σ(x − x̄)(y − ȳ) = Σxy − (Σx)(Σy)/n

Least squares regression estimators:

b₁ = SS_XY / SS_X
b₀ = ȳ − b₁x̄
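These formulas translate directly into code. The following is a minimal Python sketch (an addition to these slides, not from the book); the function name and variable names are illustrative.

```python
import numpy as np

def least_squares(x, y):
    """Slope b1 and intercept b0 from the sums of squares."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = len(x)
    ss_x = np.sum(x**2) - np.sum(x)**2 / n             # SS_X
    ss_xy = np.sum(x * y) - np.sum(x) * np.sum(y) / n  # SS_XY
    b1 = ss_xy / ss_x                # slope estimate
    b0 = y.mean() - b1 * x.mean()    # intercept estimate
    return b0, b1
```

Example 10-1 on the next slide carries out exactly this computation by hand.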


Example 10-1

Miles    Dollars   Miles²       Miles × Dollars
1211     1802      1466521      2182222
1345     2405      1809025      3234725
1422     2005      2022084      2851110
1687     2511      2845969      4236057
1849     2332      3418801      4311868
2026     2305      4104676      4669930
2133     3016      4549689      6433128
2253     3385      5076009      7626405
2400     3090      5760000      7416000
2468     3694      6091024      9116792
2699     3371      7284601      9098329
2806     3998      7873636      11218388
3082     3555      9498724      10956510
3209     4692      10297681     15056628
3466     4244      12013156     14709704
3643     5298      13271449     19300614
3852     4801      14837904     18493452
4033     5147      16265089     20757852
4267     5738      18207288     24484046
4498     6420      20232004     28877160
4533     6059      20548088     27465448
4804     6426      23078416     30870504
5090     6321      25908100     32173890
5233     7026      27384288     36767056
5439     6964      29582720     37877196
Total:   79,448   106,605   293,426,946   390,185,014

SS_X = Σx² − (Σx)²/n = 293,426,946 − (79,448)²/25 = 40,947,557.84

SS_XY = Σxy − (Σx)(Σy)/n = 390,185,014 − (79,448)(106,605)/25 = 51,402,852.4

b₁ = SS_XY / SS_X = 51,402,852.4 / 40,947,557.84 = 1.255333776 ≈ 1.26

b₀ = ȳ − b₁x̄ = 106,605/25 − (1.255333776)(79,448/25) = 274.85
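As a check, the same numbers can be reproduced in a short Python sketch (an addition, not part of the original slides), using the Miles and Dollars data from the table above:

```python
import numpy as np

miles = np.array([1211, 1345, 1422, 1687, 1849, 2026, 2133, 2253, 2400,
                  2468, 2699, 2806, 3082, 3209, 3466, 3643, 3852, 4033,
                  4267, 4498, 4533, 4804, 5090, 5233, 5439], dtype=float)
dollars = np.array([1802, 2405, 2005, 2511, 2332, 2305, 3016, 3385, 3090,
                    3694, 3371, 3998, 3555, 4692, 4244, 5298, 4801, 5147,
                    5738, 6420, 6059, 6426, 6321, 7026, 6964], dtype=float)

n = len(miles)
ss_x = np.sum(miles**2) - np.sum(miles)**2 / n                         # 40,947,557.84
ss_xy = np.sum(miles * dollars) - np.sum(miles) * np.sum(dollars) / n  # 51,402,852.4
b1 = ss_xy / ss_x                        # about 1.2553
b0 = dollars.mean() - b1 * miles.mean()  # about 274.85
print(b1, b0)
```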


Template (partial output) that can be used to carry out a Simple Regression


Template (continued) that can be used to carry out a Simple Regression


Template (continued) that can be used to carry out a Simple Regression

Residual Analysis. The plot shows the absence of a relationship between the residuals and the X-values (miles).


Template (continued) that can be used to carry out a Simple Regression

Note: The normal probability plot is approximately linear. This would indicate that the normality assumption for the errors has not been violated.


Total Variance and Error Variance

[Figure: what you see when looking at the total variation of Y, versus what you see when looking along the regression line at the error variance of Y.]


10-4 Error Variance and the Standard Errors of Regression Estimators

Degrees of freedom in regression: df = (n − 2), the n total observations less one degree of freedom for each parameter estimated (b₀ and b₁).

SSE = Σ(Y − Ŷ)² = SS_Y − (SS_XY)²/SS_X = SS_Y − b₁SS_XY

(Square and sum all regression errors to find SSE.)

An unbiased estimator of σ², denoted by s²:

MSE = SSE / (n − 2)

Example 10-1:

SSE = SS_Y − b₁SS_XY = 66,855,898 − (1.255333776)(51,402,852.4) = 2,328,161.2
MSE = SSE / (n − 2) = 2,328,161.2 / 23 = 101,224.4
s = √MSE = √101,224.4 = 318.158

Standard Errors of Estimates in Regression

The standard error of b₀ (intercept):

s(b₀) = s √( Σx² / (n SS_X) ),  where s = √MSE

The standard error of b₁ (slope):

s(b₁) = s / √SS_X

Example 10-1:

s(b₀) = 318.158 √( 293,426,946 / ((25)(40,947,557.84)) ) = 170.338
s(b₁) = 318.158 / √40,947,557.84 = 0.04972
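Plugging the Example 10-1 summary statistics into these formulas can be scripted as a quick check; the sketch below is an addition to the slides, with the values hard-coded from the example.

```python
import math

n = 25
ss_x = 40_947_557.84      # SS_X
sum_x2 = 293_426_946.0    # sum of squared x values
mse = 101_224.4

s = math.sqrt(mse)                         # about 318.16
s_b0 = s * math.sqrt(sum_x2 / (n * ss_x))  # standard error of b0, about 170.34
s_b1 = s / math.sqrt(ss_x)                 # standard error of b1, about 0.0497
```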


Confidence Intervals for the Regression Parameters

A (1 − α)100% confidence interval for β₀:

b₀ ± t(α/2, n−2) s(b₀)

A (1 − α)100% confidence interval for β₁:

b₁ ± t(α/2, n−2) s(b₁)

Example 10-1, 95% confidence intervals:

b₀ ± t(0.025, 23) s(b₀) = 274.85 ± (2.069)(170.338) = 274.85 ± 352.43 = [−77.58, 627.28]

b₁ ± t(0.025, 23) s(b₁) = 1.25533 ± (2.069)(0.04972) = 1.25533 ± 0.10287 = [1.15246, 1.35820]

[Figure: the 95% interval for the slope runs from the lower bound 1.15246 to the upper bound 1.35820 around the least squares point estimate b₁ = 1.25533; values outside this interval are not possible values of the regression slope at 95% confidence.]
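The same intervals can be computed with scipy's t quantiles; this sketch is an addition to the slides and hard-codes the Example 10-1 estimates.

```python
from scipy import stats

n = 25
b0, s_b0 = 274.85, 170.338
b1, s_b1 = 1.25533, 0.04972

t_crit = stats.t.ppf(0.975, df=n - 2)             # about 2.069
ci_b0 = (b0 - t_crit * s_b0, b0 + t_crit * s_b0)  # about (-77.6, 627.3)
ci_b1 = (b1 - t_crit * s_b1, b1 + t_crit * s_b1)  # about (1.1525, 1.3582)
```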


Template (partial output) that can be used to obtain Confidence Intervals for β₀ and β₁


10-5 Correlation

The correlation between two random variables, X and Y, is a measure of the degree of linear association between the two variables.

The population correlation, denoted by ρ, can take on any value from −1 to 1:

ρ = −1 indicates a perfect negative linear relationship
−1 < ρ < 0 indicates a negative linear relationship
ρ = 0 indicates no linear relationship
0 < ρ < 1 indicates a positive linear relationship
ρ = 1 indicates a perfect positive linear relationship

The absolute value of ρ indicates the strength or exactness of the relationship.


Illustrations of Correlation

[Figure: scatterplots illustrating ρ = −1, ρ = −0.8, ρ = 0, ρ = 0.8, and ρ = 1.]


Covariance and Correlation

The covariance of two random variables X and Y:

Cov(X, Y) = E[(X − μ_X)(Y − μ_Y)]

where μ_X and μ_Y are the population means of X and Y respectively.

The population correlation coefficient:

ρ = Cov(X, Y) / (σ_X σ_Y)

The sample correlation coefficient*:

r = SS_XY / √(SS_X SS_Y)

*Note: if ρ < 0 then b₁ < 0; if ρ = 0 then b₁ = 0; if ρ > 0 then b₁ > 0.

Example 10-1:

r = SS_XY / √(SS_X SS_Y) = 51,402,852.4 / √((40,947,557.84)(66,855,898)) = 51,402,852.4 / 52,321,943.29 = 0.9824
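In code, r follows directly from the three sums of squares, and the t statistic for testing ρ = 0 (used on the next slide) follows from r. A minimal sketch, an addition to the slides with the Example 10-1 values hard-coded:

```python
import math
from scipy import stats

n = 25
ss_x, ss_y, ss_xy = 40_947_557.84, 66_855_898.0, 51_402_852.4

r = ss_xy / math.sqrt(ss_x * ss_y)               # about 0.9824
t_stat = r / math.sqrt((1 - r**2) / (n - 2))     # about 25.25
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)  # two-tailed p-value, near 0
```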

Hypothesis Tests for the Correlation Coefficient

H₀: ρ = 0 (no linear relationship)
H₁: ρ ≠ 0 (some linear relationship)

Test statistic:

t(n−2) = r / √((1 − r²)/(n − 2))

Example 10-1:

t(n−2) = r / √((1 − r²)/(n − 2)) = 0.9824 / √((1 − 0.9651)/(25 − 2)) = 0.9824 / 0.0389 = 25.25

t(0.005, 23) = 2.807 < 25.25, so H₀ is rejected at the 1% level.

10-6 Hypothesis Tests about the Regression Relationship

[Figure: three cases in which the slope may be zero; constant Y, unsystematic variation, and a nonlinear relationship.]

A hypothesis test for the existence of a linear relationship between X and Y:

H₀: β₁ = 0
H₁: β₁ ≠ 0

Test statistic for the existence of a linear relationship between X and Y:

t(n−2) = b₁ / s(b₁)

where b₁ is the least squares estimate of the regression slope and s(b₁) is the standard error of b₁. When the null hypothesis is true, the statistic has a t distribution with n − 2 degrees of freedom.

Hypothesis Tests for the Regression Slope

Example 10-1:

H₀: β₁ = 0
H₁: β₁ ≠ 0

t(n−2) = b₁ / s(b₁) = 1.25533 / 0.04972 = 25.25

t(0.005, 23) = 2.807 < 25.25

H₀ is rejected at the 1% level and we may conclude that there is a relationship between charges and miles traveled.

Example 10-4:

H₀: β₁ = 1
H₁: β₁ ≠ 1

t(n−2) = (b₁ − 1) / s(b₁) = (1.24 − 1)/0.21 = 1.14

t(0.05, 58) = 1.671 > 1.14

H₀ is not rejected at the 10% level. We may not conclude that the beta coefficient is different from 1.
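Both tests are one-liners once b₁ and s(b₁) are known. A sketch (an addition to the slides; the numbers are hard-coded from the two examples above):

```python
from scipy import stats

# Example 10-1: H0: beta1 = 0
t1 = 1.25533 / 0.04972               # about 25.25
p1 = 2 * stats.t.sf(abs(t1), df=23)  # two-tailed p-value, near 0

# Example 10-4: H0: beta1 = 1 (subtract the hypothesized value)
t2 = (1.24 - 1) / 0.21               # about 1.14
p2 = 2 * stats.t.sf(abs(t2), df=58)  # not significant at the 10% level
```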


10-7 How Good is the Regression?

The coefficient of determination, r², is a descriptive measure of the strength of the regression relationship, a measure of how well the regression line fits the data.

(y − ȳ) = (y − ŷ) + (ŷ − ȳ)
Total deviation = Unexplained deviation (error) + Explained deviation (regression)

[Figure: the total deviation at a point split into its unexplained and explained parts.]

Squaring and summing over all points:

SST = SSE + SSR

r² = SSR/SST = 1 − SSE/SST, the percentage of total variation explained by the regression.


The Coefficient of Determination

[Figure: the split of SST into SSE and SSR for r² = 0, r² = 0.50, and r² = 0.90, and the scatterplot of Dollars against Miles with the fitted regression line.]

Example 10-1:

r² = SSR/SST = 64,527,736.8 / 66,855,898 = 0.96518
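The decomposition is easy to verify numerically; the following sketch (an addition to the slides) uses the Example 10-1 sums of squares:

```python
sst = 66_855_898.0   # total sum of squares
sse = 2_328_161.2    # sum of squared errors
ssr = sst - sse      # regression sum of squares: 64,527,736.8
r_sq = ssr / sst     # about 0.96518, matching 1 - sse/sst
```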

10-8 Analysis-of-Variance Table and an F Test of the Regression Model

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   F Ratio
Regression            SSR              1                    MSR           MSR/MSE
Error                 SSE              n − 2                MSE
Total                 SST              n − 1                MST

Example 10-1:

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square    F Ratio   p Value
Regression            64,527,736.8     1                    64,527,736.8   637.47    0.000
Error                 2,328,161.2      23                   101,224.4
Total                 66,855,898.0     24
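The F ratio and its p-value can be reproduced with scipy's F distribution; a sketch with the table's numbers hard-coded (an addition to the slides):

```python
from scipy import stats

ssr, sse, n = 64_527_736.8, 2_328_161.2, 25
msr = ssr / 1                            # mean square for regression
mse = sse / (n - 2)                      # mean square error, about 101,224.4
f_ratio = msr / mse                      # about 637.47
p_value = stats.f.sf(f_ratio, 1, n - 2)  # essentially 0.000
```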


Template (partial output) that displays Analysis of Variance and an F Test of the Regression Model

10-9 Residual Analysis and Checking for Model Inadequacies

[Figure: four residual plots.]

Homoscedasticity: the residuals appear completely random; no indication of model inadequacy.
Heteroscedasticity: the variance of the residuals increases as x changes.
Residuals exhibit a linear trend with time.
A curved pattern in the residuals results from an underlying nonlinear relationship.


Normal Probability Plot of the Residuals: Flatter than Normal


Normal Probability Plot of the Residuals: More Peaked than Normal


Normal Probability Plot of the Residuals: Positively Skewed


Normal Probability Plot of the Residuals: Negatively Skewed


10-10 Use of the Regression Model for Prediction

Point prediction: a single-valued estimate of Y for a given value of X, obtained by inserting the value of X in the estimated regression equation.

Prediction interval:

For a value of Y given a value of X, it accounts for
  the variation in the regression line estimate, and
  the variation of points around the regression line.

For an average value of Y given a value of X, it accounts for
  the variation in the regression line estimate.


Errors in Predicting E[Y|X]

[Figure: the regression line with upper and lower limits on its slope, and with upper and lower limits on its intercept.]

1) Uncertainty about the slope of the regression line
2) Uncertainty about the intercept of the regression line


Prediction Interval for E[Y|X]

[Figure: the prediction band for E[Y|X] around the regression line.]

The prediction band for E[Y|X] is narrowest at the mean value of X. The prediction band widens as the distance from the mean of X increases. Predictions become very unreliable when we extrapolate beyond the range of the sample itself.


Additional Error in Predicting an Individual Value of Y

[Figure: the prediction band for an individual value of Y is wider than the prediction band for E[Y|X].]

3) Variation around the regression line


Prediction Interval for a Value of Y

A (1 − α)100% prediction interval for Y:

ŷ ± t(α/2, n−2) s √(1 + 1/n + (x − x̄)²/SS_X)

Example 10-1 (X = 4,000):

{274.85 + (1.2553)(4,000)} ± 2.069 × 318.16 × √(1 + 1/25 + (4,000 − 3,177.92)²/40,947,557.84)
= 5,296.05 ± 676.62 = [4,619.43, 5,972.67]


Prediction Interval for the Average Value of Y

A (1 − α)100% prediction interval for E[Y|X]:

ŷ ± t(α/2, n−2) s √(1/n + (x − x̄)²/SS_X)

Example 10-1 (X = 4,000):

{274.85 + (1.2553)(4,000)} ± 2.069 × 318.16 × √(1/25 + (4,000 − 3,177.92)²/40,947,557.84)
= 5,296.05 ± 156.48 = [5,139.57, 5,452.53]
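The two intervals differ only by the leading 1 under the square root. A Python sketch for X = 4,000 (an addition to the slides, with the Example 10-1 values hard-coded):

```python
import math
from scipy import stats

n, x_bar, ss_x, s = 25, 3_177.92, 40_947_557.84, 318.16
b0, b1, x0 = 274.85, 1.2553, 4_000.0

y_hat = b0 + b1 * x0                   # point prediction, about 5,296.05
t_crit = stats.t.ppf(0.975, df=n - 2)  # about 2.069
d2 = (x0 - x_bar) ** 2 / ss_x          # squared distance from the mean of X

half_y = t_crit * s * math.sqrt(1 + 1/n + d2)     # individual Y: about 676.6
half_mean = t_crit * s * math.sqrt(1/n + d2)      # average Y: about 156.5
pi_y = (y_hat - half_y, y_hat + half_y)           # about [4,619, 5,973]
pi_mean = (y_hat - half_mean, y_hat + half_mean)  # about [5,140, 5,453]
```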

Template Output with Prediction Intervals


10-11 The Excel Solver Method for Regression

The Solver macro available in Excel can also be used to conduct a simple linear regression. See the text for instructions.
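The slides defer to the text for the Solver steps; as an illustrative analogue only (not the book's procedure), the same idea of numerically minimizing SSE over (b₀, b₁) can be sketched in Python with scipy.optimize. The data here are made up for illustration.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)                # illustrative data
y = 2.0 + 1.5 * x + rng.normal(0, 1, size=50)

def sse(params):
    """Sum of squared errors for a candidate intercept and slope."""
    b0, b1 = params
    return np.sum((y - (b0 + b1 * x)) ** 2)

result = minimize(sse, x0=[0.0, 0.0])  # numerical search, like Solver
b0_hat, b1_hat = result.x              # close to the least squares estimates
```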

Using Minitab Fitted-Line Plot for Regression

[Fitted Line Plot: Y = −0.8465 + 1.352 X, with S = 0.184266, R-Sq = 95.2%, R-Sq(adj) = 94.8%.]

