BUSINESS
STATISTICS
by
AMIR D. ACZEL
&
JAYAVEL SOUNDERPANDIAN
7th edition.
Chapter 10
Simple Linear Regression and Correlation
McGraw-Hill/Irwin Copyright © 2009 by The McGraw-Hill Companies, Inc. All rights reserved.
10-2
10 LEARNING OBJECTIVES
After studying this chapter, you should be able to:
• Determine whether a regression experiment would be useful in a given
instance
• Formulate a regression model
• Compute a regression equation
• Compute the covariance and the correlation coefficient of two random
variables
• Compute confidence intervals for regression coefficients
• Compute a prediction interval for the dependent variable
10-4
[Figure: scatter plot of Sales (vertical axis, 0 to 100) against Advertising (horizontal axis, 0 to 50)]
Larger (smaller) values of sales tend to be associated with larger (smaller) values of advertising.
The scatter of points tends to be distributed around a positively sloped straight line.
The pairs of values of advertising expenditures and sales are not located exactly on a
straight line.
The scatter plot reveals a more or less strong tendency rather than a precise linear
relationship.
The line represents the nature of the relationship on average.
10-7
10-8
Model Building
$E[Y_i] = \beta_0 + \beta_1 X_i$
where $\beta_0$ is the intercept and $\beta_1$ is the slope.
Actual observed values of Y differ from the expected value by an unexplained or random error:
$Y_i = E[Y_i] + \varepsilon_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
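As an illustrative sketch (not part of the original slides), the model can be simulated in Python. The parameter values, the normal error distribution, and the function name `simulate_y` are assumptions chosen only for demonstration.

```python
import random

def simulate_y(xs, beta0, beta1, sigma, seed=0):
    """Draw Y_i = beta0 + beta1 * X_i + eps_i, with eps_i ~ Normal(0, sigma).

    The normal error distribution is an assumption of this sketch.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    ys = []
    for x in xs:
        expected = beta0 + beta1 * x   # E[Y_i], the point on the line
        eps = rng.gauss(0.0, sigma)    # unexplained random error
        ys.append(expected + eps)
    return ys

# Hypothetical values, chosen only for illustration
xs = [10, 20, 30, 40, 50]
ys = simulate_y(xs, beta0=5.0, beta1=1.5, sigma=4.0)

# Each observed Y differs from its expected value by the random error
errors = [y - (5.0 + 1.5 * x) for x, y in zip(xs, ys)]
```

Setting `sigma=0.0` removes the error term, so every simulated point falls exactly on the line.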
10-11
10-12
$\hat{Y} = b_0 + b_1 X$
where $\hat{Y}$ (Y-hat) is the value of Y lying on the fitted regression line for a given value of X.
10-13
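[Figure: scatter of data points, with three errors measured from the least squares regression line]
Errors in Regression
[Figure: an observed data point $(X_i, Y_i)$ and the fitted regression line $\hat{Y} = b_0 + b_1 X$]
Error: $e_i = Y_i - \hat{Y}_i$
where $Y_i$ is the observed data point and $\hat{Y}_i$ is the predicted value of Y for $X_i$.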
10-15
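The least squares regression line is the line that minimizes the SSE with respect to the estimates $b_0$ and $b_1$.
The normal equations:
$\sum_{i=1}^{n} y_i = n b_0 + b_1 \sum_{i=1}^{n} x_i$
$\sum_{i=1}^{n} x_i y_i = b_0 \sum_{i=1}^{n} x_i + b_1 \sum_{i=1}^{n} x_i^2$
[Figure: SSE plotted against $b_0$ and $b_1$; SSE is minimized with respect to $b_0$ and $b_1$ at the point given by the least squares $b_0$ and the least squares $b_1$]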
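The two normal equations form a 2-by-2 linear system in $b_0$ and $b_1$. A minimal sketch of solving them with Cramer's rule follows; the function name and the data are hypothetical and are not taken from Example 10-1.

```python
def solve_normal_equations(xs, ys):
    """Solve the two normal equations for (b0, b1) by Cramer's rule."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # System:  n * b0     + sum_x * b1  = sum_y
    #          sum_x * b0 + sum_x2 * b1 = sum_xy
    det = n * sum_x2 - sum_x * sum_x
    if det == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    b0 = (sum_y * sum_x2 - sum_x * sum_xy) / det
    b1 = (n * sum_xy - sum_x * sum_y) / det
    return b0, b1

# Hypothetical data, chosen only for illustration
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1 = solve_normal_equations(xs, ys)
```

For this small data set the fitted line is $\hat{Y} = 2.2 + 0.6X$.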
10-16
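$SS_X = \sum (x - \bar{x})^2 = \sum x^2 - \dfrac{\left(\sum x\right)^2}{n}$
$SS_Y = \sum (y - \bar{y})^2 = \sum y^2 - \dfrac{\left(\sum y\right)^2}{n}$
$SS_{XY} = \sum (x - \bar{x})(y - \bar{y}) = \sum xy - \dfrac{\left(\sum x\right)\left(\sum y\right)}{n}$
Least squares regression estimators:
$b_1 = \dfrac{SS_{XY}}{SS_X}$
$b_0 = \bar{y} - b_1 \bar{x}$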
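The sums of squares and the two estimators translate directly into code. The sketch below uses the computational shortcut forms; the function names and the data are hypothetical.

```python
def sums_of_squares(xs, ys):
    """Return (SS_X, SS_Y, SS_XY) using the computational shortcut formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    ss_x = sum(x * x for x in xs) - sum_x ** 2 / n
    ss_y = sum(y * y for y in ys) - sum_y ** 2 / n
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    return ss_x, ss_y, ss_xy

def least_squares(xs, ys):
    """Return (b0, b1) with b1 = SS_XY / SS_X and b0 = y_bar - b1 * x_bar."""
    n = len(xs)
    ss_x, _, ss_xy = sums_of_squares(xs, ys)
    b1 = ss_xy / ss_x
    b0 = sum(ys) / n - b1 * sum(xs) / n
    return b0, b1

# Hypothetical data, not taken from Example 10-1
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
ss_x, ss_y, ss_xy = sums_of_squares(xs, ys)
b0, b1 = least_squares(xs, ys)
```

Here $SS_X = 10$, $SS_Y = 6$, and $SS_{XY} = 6$, so $b_1 = 0.6$ and $b_0 = 4 - 0.6(3) = 2.2$.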
10-17
Example 10-1
$1.25533 \pm 0.10287 = [1.15246, 1.35820]$
10-5 Correlation
The population correlation, denoted by $\rho$, can take on any value from -1 to 1.
Illustrations of Correlation
[Figure: six scatter plots of Y against X, with $\rho = -1$, $\rho = 0$, $\rho = 1$, $\rho = -0.8$, $\rho = 0$, and $\rho = 0.8$]
10-29
Example 10-1:
$H_0$: $\rho = 0$ (No linear relationship)
$H_1$: $\rho \neq 0$ (Some linear relationship)
Test statistic:
$t_{(n-2)} = \dfrac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$
$t_{(n-2)} = \dfrac{0.9824}{\sqrt{\dfrac{1 - 0.9651}{25 - 2}}} = \dfrac{0.9824}{0.0389} = 25.25$
$t_{0.005} = 2.807 < 25.25$
$H_0$ rejected at the 1% level.
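The test statistic can be checked with a few lines of Python. This is a sketch using the Example 10-1 values quoted above (r = 0.9824, n = 25, critical value 2.807); the function name is hypothetical.

```python
import math

def correlation_t(r, n):
    """t statistic with n - 2 degrees of freedom for H0: rho = 0."""
    return r / math.sqrt((1.0 - r * r) / (n - 2))

t = correlation_t(0.9824, 25)

# Critical value t_{0.005} with 23 degrees of freedom, as quoted above
critical = 2.807
reject_h0 = abs(t) > critical
```

Without intermediate rounding the statistic is about 25.2; the 25.25 on the slide comes from rounding the denominator to 0.0389 first. The conclusion is the same either way.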
10-31
A hypothesis test for the existence of a linear relationship between X and Y:
$H_0$: $\beta_1 = 0$
$H_1$: $\beta_1 \neq 0$
Test statistic for the existence of a linear relationship between X and Y:
$t_{(n-2)} = \dfrac{b_1}{s(b_1)}$
where $b_1$ is the least-squares estimate of the regression slope and $s(b_1)$ is the standard error of $b_1$.
When the null hypothesis is true, the statistic has a t distribution with n - 2 degrees of freedom.
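A minimal sketch of the slope test follows. It assumes the standard formulas $s = \sqrt{SSE/(n-2)}$ and $s(b_1) = s/\sqrt{SS_X}$, which are not written out on this slide; the function name and the data are hypothetical.

```python
import math

def slope_t_statistic(xs, ys):
    """Return (t, degrees of freedom) for H0: beta1 = 0."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = ss_xy / ss_x
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))   # standard error of estimate (assumed formula)
    s_b1 = s / math.sqrt(ss_x)     # standard error of the slope (assumed formula)
    return b1 / s_b1, n - 2

# Hypothetical data, not taken from Example 10-1
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
t, df = slope_t_statistic(xs, ys)
```

For this tiny sample t is about 2.12 with 3 degrees of freedom, which is below the two-tailed 5% critical value of 3.182, so $H_0$ would not be rejected.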
10-32
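[Figure: for a data point, the total deviation from $\bar{y}$ splits into the unexplained deviation and the explained deviation]
Total Deviation = Unexplained Deviation + Explained Deviation
$(y - \bar{y}) = (y - \hat{y}) + (\hat{y} - \bar{y})$
$\sum (y - \bar{y})^2 = \sum (y - \hat{y})^2 + \sum (\hat{y} - \bar{y})^2$
SST = SSE + SSR
$r^2 = \dfrac{SSR}{SST} = 1 - \dfrac{SSE}{SST}$
Percentage of total variation explained by the regression.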
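The decomposition can be verified numerically. The sketch below fits a line to hypothetical data (not from Example 10-1) and computes the three sums of squares; the function name is an assumption.

```python
def decompose_variation(xs, ys):
    """Return (SST, SSE, SSR) for the least squares line through the data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = ss_xy / ss_x
    b0 = y_bar - b1 * x_bar
    y_hats = [b0 + b1 * x for x in xs]
    sst = sum((y - y_bar) ** 2 for y in ys)                  # total
    sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hats))    # unexplained
    ssr = sum((yh - y_bar) ** 2 for yh in y_hats)            # explained
    return sst, sse, ssr

# Hypothetical data, chosen only for illustration
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
sst, sse, ssr = decompose_variation(xs, ys)
r_squared = ssr / sst
```

Here SST = 6, SSE = 2.4, and SSR = 3.6, so $r^2 = 0.6$: the regression explains 60% of the total variation in this sample.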
10-34
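[Figure: three scatter plots of Y against X, each with a bar splitting SST into SSE and SSR, for $r^2 = 0$, $r^2 = 0.50$, and $r^2 = 0.90$]
Example 10-1:
$r^2 = \dfrac{SSR}{SST} = \dfrac{64527736.8}{66855898.0} = 0.96518$
[Figure: scatter plot of Dollars (vertical axis) against Miles (horizontal axis, 1000 to 5500)]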
10-35
Example 10-1
Source of     Sum of        Degrees of   Mean
Variation     Squares       Freedom      Square        F Ratio   p Value
Regression    64527736.8     1           64527736.8    637.47    0.000
Error          2328161.2    23             101224.4
Total         66855898.0    24
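The mean squares and the F ratio in the table follow from the sums of squares and degrees of freedom. A short sketch reproducing them from the Example 10-1 figures follows; the function name is hypothetical, and the p value is not computed here.

```python
def anova_summary(ssr, sse, df_regression, df_error):
    """Return (SST, MSR, MSE, F, r squared) from the ANOVA sums of squares."""
    sst = ssr + sse
    msr = ssr / df_regression   # mean square for regression
    mse = sse / df_error        # mean square for error
    f_ratio = msr / mse
    r_squared = ssr / sst
    return sst, msr, mse, f_ratio, r_squared

# Values from the Example 10-1 ANOVA table
sst, msr, mse, f_ratio, r_squared = anova_summary(
    ssr=64527736.8, sse=2328161.2, df_regression=1, df_error=23
)
```

The results agree with the table: MSE is about 101224.4, F is about 637.47, and SSR/SST is about 0.96518.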
10-36
[Figure: four residual plots, with residuals on the vertical axis centered at 0; three are plotted against x or $\hat{y}$ and one against time]
Positively Skewed
10-41
Negatively Skewed
10-42
• Point Prediction
  A single-valued estimate of Y for a given value of X, obtained by inserting the value of X in the estimated regression equation.
• Prediction Interval
  For a value of Y given a value of X:
    Variation in regression line estimate
    Variation of points around regression line
  For an average value of Y given a value of X:
    Variation in regression line estimate
10-43
[Figure: the regression line with the prediction band for E[Y|X]]
[Figure: 3) Variation around the regression line; prediction interval for E[Y|X]]
10-46
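Prediction interval for a value of Y given a value of X:
$\hat{y} \pm t_{\alpha/2}\, s \sqrt{1 + \dfrac{1}{n} + \dfrac{(x - \bar{x})^2}{SS_X}}$
Example 10-1 (X = 4,000): $x = 4{,}000$ and $\bar{x} = 3{,}177.92$, so the last term has numerator $(4{,}000 - 3{,}177.92)^2$.
Prediction interval for the average value of Y given a value of X, E[Y|X]:
$\hat{y} \pm t_{\alpha/2}\, s \sqrt{\dfrac{1}{n} + \dfrac{(x - \bar{x})^2}{SS_X}}$
Example 10-1 (X = 4,000): the same numerator, $(4{,}000 - 3{,}177.92)^2$, appears in the last term.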
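The two interval formulas differ only by the extra 1 under the square root. The sketch below computes both for hypothetical data (not Example 10-1); the function name is an assumption, and the critical value `t_crit` must be supplied by the caller, for instance from a t table.

```python
import math

def regression_intervals(xs, ys, x_new, t_crit):
    """Return (y_hat, interval for Y, interval for E[Y|X]) at x_new."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = ss_xy / ss_x
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))   # standard error of estimate
    y_hat = b0 + b1 * x_new        # point prediction
    core = 1 / n + (x_new - x_bar) ** 2 / ss_x
    half_y = t_crit * s * math.sqrt(1 + core)   # individual value of Y
    half_mean = t_crit * s * math.sqrt(core)    # average value E[Y|X]
    return (
        y_hat,
        (y_hat - half_y, y_hat + half_y),
        (y_hat - half_mean, y_hat + half_mean),
    )

# Hypothetical data; 3.182 is t_{0.025} with n - 2 = 3 degrees of freedom
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
y_hat, interval_y, interval_mean = regression_intervals(xs, ys, 4, 3.182)
```

The interval for an individual Y is always wider than the interval for E[Y|X], because it also accounts for the variation of points around the regression line.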
[Figure: plot of Y (vertical axis, 6.0 to 9.0) against X (horizontal axis, 5.5 to 7.5), with S = 0.184266, R-Sq = 95.2%, R-Sq(adj) = 94.8%]