Chapter 12
Simple Regression
Chapter Goals
After completing this chapter, you should be able to:
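Correlation Analysis

The sample correlation coefficient between x and y is

$$r = \frac{s_{xy}}{s_x s_y}$$

where

$$s_{xy} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$

To test the null hypothesis of no linear association, H0: ρ = 0, use the test statistic

$$t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}$$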
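Decision Rules: Hypothesis Test for Correlation

The test statistic $t = r\sqrt{n-2}/\sqrt{1-r^2}$ has n − 2 d.f.

Lower-tail test: H0: ρ ≥ 0, H1: ρ < 0. Reject H0 if $t < -t_{n-2,\alpha}$.
Upper-tail test: H0: ρ ≤ 0, H1: ρ > 0. Reject H0 if $t > t_{n-2,\alpha}$.
Two-tail test: H0: ρ = 0, H1: ρ ≠ 0. Reject H0 if $t < -t_{n-2,\alpha/2}$ or $t > t_{n-2,\alpha/2}$.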
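The two-tail decision rule can be checked numerically. The sketch below is a minimal, standard-library-only Python illustration that borrows the house price data introduced later in this chapter as sample input; the function name `correlation_t_test` is hypothetical, and the critical value $t_{8,.025} = 2.3060$ is hard-coded from a t table because Python's standard library has no t distribution.

```python
import math

# House price example data from this chapter: y = price ($1000s), x = square feet.
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]


def correlation_t_test(x, y, t_crit):
    """Two-tail test of H0: rho = 0. Returns (r, t, reject_h0)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sample covariance and standard deviations, each with divisor n - 1.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
    s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))
    r = s_xy / (s_x * s_y)
    # Test statistic with n - 2 degrees of freedom.
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    return r, t, abs(t) > t_crit


# Assumed critical value t_{8,.025} = 2.3060 (n = 10, alpha = .05, two-tail).
r, t, reject_h0 = correlation_t_test(sqft, price, t_crit=2.3060)
print(f"r = {r:.5f}, t = {t:.5f}, reject H0: {reject_h0}")
```

In simple regression this t value equals the t statistic for the slope coefficient, so testing ρ = 0 and testing β1 = 0 lead to the same decision.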
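Introduction to Regression Analysis

The population regression model is

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$

where $\beta_0$ is the population Y intercept, $\beta_1$ is the population slope coefficient, $X_i$ is the independent variable, and $\varepsilon_i$ is the random error term. The term $\beta_0 + \beta_1 X_i$ is the linear component of the model and $\varepsilon_i$ is the random error component.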
[Figure: the population regression line with intercept $\beta_0$ and slope $\beta_1$; at a given $X_i$ it marks the observed value of Y, the predicted value of Y, and the random error $\varepsilon_i$ for this $X_i$ value.]
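The sample regression line estimates the population regression line:

$$\hat{y}_i = b_0 + b_1 x_i$$

where $\hat{y}_i$ is the predicted value of y, $b_0$ is the estimate of the regression intercept, $b_1$ is the estimate of the regression slope, and $x_i$ is the value of x for observation i. The residual for observation i is

$$e_i = y_i - \hat{y}_i = y_i - (b_0 + b_1 x_i)$$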
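The least squares estimates, which minimize the sum of squared residuals $\sum e_i^2$, are

$$b_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2} = r_{xy}\frac{s_Y}{s_X}$$

$$b_0 = \bar{y} - b_1 \bar{x}$$

The model assumes that the random errors satisfy

$$E[\varepsilon_i] = 0 \quad \text{and} \quad E[\varepsilon_i^2] = \sigma^2 \quad \text{for } i = 1, \ldots, n$$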
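The least squares formulas translate directly into code. A minimal sketch in standard-library Python, assuming the chapter's house price data as input; `least_squares` is an illustrative name, not a library function. The last part cross-checks the equivalent form $b_1 = r_{xy} s_Y / s_X$ using `statistics.stdev`.

```python
import statistics

# House price example data: y = price ($1000s), x = size (square feet).
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]


def least_squares(x, y):
    """Return (b0, b1) minimizing the sum of squared residuals."""
    x_bar = statistics.mean(x)
    y_bar = statistics.mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = least_squares(sqft, price)

# Cross-check: b1 = r_xy * s_y / s_x, using sample standard deviations (divisor n - 1).
n = len(sqft)
x_bar, y_bar = statistics.mean(sqft), statistics.mean(price)
cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, price)) / (n - 1)
s_x, s_y = statistics.stdev(sqft), statistics.stdev(price)
r_xy = cov_xy / (s_x * s_y)
b1_alt = r_xy * s_y / s_x

print(f"b0 = {b0:.5f}, b1 = {b1:.5f}, r_xy * s_y / s_x = {b1_alt:.5f}")
```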
Interpretation of the Slope and the Intercept
House price ($1000s), y    Square feet, x
245                        1400
312                        1600
279                        1700
308                        1875
199                        1100
219                        1550
405                        2350
324                        2450
319                        1425
255                        1700
Graphical Presentation
[Figure: scatter plot of house price against square feet for the sample data.]
Excel Output

Regression Statistics
Multiple R           0.76211
R Square             0.58082
Adjusted R Square    0.52842
Standard Error       41.33032
Observations         10

ANOVA
              df   SS           MS           F         Significance F
Regression    1    18934.9348   18934.9348   11.0848   0.01039
Residual      8    13665.5652   1708.1957
Total         9    32600.5000

              Coefficients   Standard Error   t Stat    P-value   Lower 95%    Upper 95%
Intercept     98.24833       58.03348         1.69296   0.12892   -35.57720    232.07386
Square Feet   0.10977        0.03297          3.32938   0.01039   0.03374      0.18580
Graphical Presentation
[Figure: house price data with the fitted regression line; intercept = 98.248.]
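house price = 98.24833 + 0.10977 (square feet)

Interpretation of the Intercept, b0

b0 is the estimated mean value of y when x is zero. Because no house in the sample is anywhere near 0 square feet, b0 = 98.24833 has no useful direct interpretation here; it simply positions the fitted line over the observed range of x.

Interpretation of the Slope Coefficient, b1

b1 = 0.10977 estimates the change in the mean value of y for each one-unit increase in x: the mean house price rises by about 0.10977 × $1000 ≈ $109.77 for each additional square foot.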
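Measures of Variation

Total variation is made up of two parts:

$$SST = SSR + SSE$$

Total Sum of Squares, Regression Sum of Squares, and Error Sum of Squares:

$$SST = \sum_{i=1}^{n}(y_i - \bar{y})^2 \qquad SSR = \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2 \qquad SSE = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2$$

where $\bar{y}$ is the average value of the dependent variable, $y_i$ is the observed value of the dependent variable, and $\hat{y}_i$ is the predicted value of y for the given $x_i$.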
[Figure: at a given $x_i$, the deviation of $y_i$ from $\bar{y}$ split into the part explained by the regression line and the residual; $SSE = \sum (y_i - \hat{y}_i)^2$.]
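Coefficient of Determination, R²

The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable:

$$R^2 = \frac{SSR}{SST} = \frac{\text{regression sum of squares}}{\text{total sum of squares}}$$

Note: $0 \le R^2 \le 1$.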
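The decomposition SST = SSR + SSE and the ratio R² = SSR/SST can be verified on the house price data. A minimal standard-library Python sketch, assuming that data as input; `variation` is an illustrative helper name.

```python
# House price example data: y = price ($1000s), x = size (square feet).
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]


def variation(x, y):
    """Return (SST, SSR, SSE) for the least squares line fitted to (x, y)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained by the line
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained (residual)
    return sst, ssr, sse


sst, ssr, sse = variation(sqft, price)
r_squared = ssr / sst
print(f"SST = {sst:.4f}, SSR = {ssr:.4f}, SSE = {sse:.4f}, R^2 = {r_squared:.5f}")
```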
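Examples of Approximate r² Values

r² = 1 (r = ±1): perfect linear relationship between X and Y; all of the variation in Y is explained by variation in X.

0 < r² < 1: weaker linear relationship between X and Y; some, but not all, of the variation in Y is explained by variation in X.

r² = 0: no linear relationship between X and Y; the value of Y does not depend linearly on X.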
From the Excel output:

$$R^2 = \frac{SSR}{SST} = \frac{18934.9348}{32600.5000} = 0.58082$$

That is, 58.08% of the variation in house prices is explained by variation in square feet.
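Correlation and R²

For simple regression, the coefficient of determination equals the square of the simple correlation coefficient:

$$R^2 = r_{xy}^2$$

For the house price data, Multiple R = 0.76211 and 0.76211² ≈ 0.58082.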
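Estimation of Model Error Variance

An estimator of the variance of the population model error is

$$\hat{\sigma}^2 = s_e^2 = \frac{\sum_{i=1}^{n} e_i^2}{n - 2} = \frac{SSE}{n - 2}$$

The divisor is n − 2 because two parameters, $b_0$ and $b_1$, are estimated from the data. Its square root, $s_e$, is the standard error of the estimate.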
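A minimal standard-library Python sketch of this estimator, assuming the house price data as input; the helper name `standard_error_of_estimate` is illustrative. It fits the line, sums the squared residuals, and divides by n − 2.

```python
import math

# House price example data: y = price ($1000s), x = size (square feet).
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]


def standard_error_of_estimate(x, y):
    """Return (SSE, s_e) where s_e^2 = SSE / (n - 2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b0 = y_bar - b1 * x_bar
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)
    # Divide by n - 2 because two parameters (b0 and b1) were estimated.
    return sse, math.sqrt(sse / (n - 2))


sse, s_e = standard_error_of_estimate(sqft, price)
mse = sse / (len(sqft) - 2)
print(f"s_e^2 = {mse:.4f}, s_e = {s_e:.5f}")
```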
From the Excel output, the standard error of the estimate is $s_e = 41.33032$, the square root of the residual mean square (MSE = 1708.1957).
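[Figure: two scatter plots contrasting a small $s_e$, with points close to the regression line, and a large $s_e$, with points widely scattered about it.]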
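The standard error of the regression slope coefficient, $b_1$, is estimated by

$$s_{b_1}^2 = \frac{s_e^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2} = \frac{s_e^2}{(n - 1)s_x^2}$$

where

$$s_e = \sqrt{\frac{SSE}{n - 2}}$$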
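Both forms of the slope's standard error can be computed side by side. A minimal standard-library Python sketch, assuming the house price data as input; `statistics.variance` supplies the sample variance $s_x^2$ for the second form.

```python
import math
import statistics

# House price example data: y = price ($1000s), x = size (square feet).
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]

n = len(sqft)
x_bar = statistics.mean(sqft)
y_bar = statistics.mean(price)
s_xx = sum((x - x_bar) ** 2 for x in sqft)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, price)) / s_xx
b0 = y_bar - b1 * x_bar

# Standard error of the estimate, s_e = sqrt(SSE / (n - 2)).
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(sqft, price))
s_e = math.sqrt(sse / (n - 2))

# Two equivalent forms of the slope's standard error.
s_b1 = s_e / math.sqrt(s_xx)
s_b1_alt = s_e / math.sqrt((n - 1) * statistics.variance(sqft))
print(f"s_b1 = {s_b1:.5f} (alternative form: {s_b1_alt:.5f})")
```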
From the Excel output, the standard error of the slope is $s_{b_1} = 0.03297$.
[Figure: two scatter plots contrasting a small $s_{b_1}$ and a large $s_{b_1}$.]
Test statistic:

$$t = \frac{b_1 - \beta_1}{s_{b_1}}, \qquad \text{d.f.} = n - 2$$

where $b_1$ is the regression slope coefficient, $\beta_1$ is the hypothesized slope, and $s_{b_1}$ is the standard error of the slope.
t test example: for the house price data, test H0: β1 = 0 against H1: β1 ≠ 0. From the Excel output, $b_1 = 0.10977$ and $s_{b_1} = 0.03297$, so

$$t = \frac{b_1 - \beta_1}{s_{b_1}} = \frac{0.10977 - 0}{0.03297} = 3.32938$$
With d.f. = 10 − 2 = 8 and α/2 = .025, the critical value is $t_{8,.025} = 2.3060$. Reject H0 if $t < -2.3060$ or $t > 2.3060$; otherwise do not reject H0. The test statistic, t = 3.329, exceeds 2.3060 and falls in the upper rejection region.
Decision: Reject H0.
Conclusion: There is sufficient evidence that square footage affects house price.
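The slope t test can be reproduced from the raw data. A minimal standard-library Python sketch, assuming the house price data as input; `slope_t_test` is a hypothetical helper, and the critical value 2.3060 is hard-coded from a t table (8 d.f., α/2 = .025) since the standard library has no t distribution.

```python
import math

# House price example data: y = price ($1000s), x = size (square feet).
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]


def slope_t_test(x, y, beta1_h0=0.0):
    """Return (b1, s_b1, t) for H0: beta1 = beta1_h0 in simple regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_b1 = math.sqrt(sse / (n - 2)) / math.sqrt(s_xx)
    return b1, s_b1, (b1 - beta1_h0) / s_b1


T_CRIT = 2.3060  # assumed t_{8,.025} from a t table, valid for n = 10
b1, s_b1, t = slope_t_test(sqft, price)
reject_h0 = abs(t) > T_CRIT
print(f"b1 = {b1:.5f}, s_b1 = {s_b1:.5f}, t = {t:.5f}, reject H0: {reject_h0}")
```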
P-value approach: for H0: β1 = 0 versus H1: β1 ≠ 0, the Excel output gives P-value = 0.01039 for the Square Feet coefficient. Because 0.01039 < α = .05, reject H0: there is sufficient evidence that square footage affects house price.
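Confidence Interval Estimate for the Slope

$$b_1 - t_{n-2,\alpha/2}\, s_{b_1} < \beta_1 < b_1 + t_{n-2,\alpha/2}\, s_{b_1}$$

From the Lower 95% and Upper 95% columns of the Excel output, the 95% confidence interval for the slope is (0.03374, 0.18580), that is, $0.10977 \pm 2.3060 \times 0.03297$. Because this interval does not include 0, there is a significant linear relationship between house price and square feet at the .05 level. Since price is measured in $1000s, each additional square foot is estimated to add between about $33.74 and $185.80 to the mean house price.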
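The interval $b_1 \pm t_{n-2,\alpha/2}\, s_{b_1}$ can be sketched in standard-library Python, assuming the house price data as input and the hard-coded critical value $t_{8,.025} = 2.3060$ (an assumption taken from a t table). Small differences from Excel in the last digit come from Excel using an unrounded critical value.

```python
import math

# House price example data: y = price ($1000s), x = size (square feet).
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]

n = len(sqft)
x_bar = sum(sqft) / n
y_bar = sum(price) / n
s_xx = sum((x - x_bar) ** 2 for x in sqft)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, price)) / s_xx
b0 = y_bar - b1 * x_bar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(sqft, price))
s_b1 = math.sqrt(sse / (n - 2)) / math.sqrt(s_xx)

T_CRIT = 2.3060  # assumed t_{8,.025} for a 95% interval with n = 10
lower = b1 - T_CRIT * s_b1
upper = b1 + T_CRIT * s_b1
print(f"95% CI for beta1: ({lower:.5f}, {upper:.5f})")
```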
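F test statistic:

$$F = \frac{MSR}{MSE}$$

where

$$MSR = \frac{SSR}{k} \qquad MSE = \frac{SSE}{n - k - 1}$$

and k is the number of independent variables in the regression model (k = 1 for simple regression). F follows an F distribution with k numerator and n − k − 1 denominator degrees of freedom.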
From the Excel output:

$$F = \frac{MSR}{MSE} = \frac{18934.9348}{1708.1957} = 11.0848$$

with 1 and 8 degrees of freedom. The P-value for the F test (Significance F) is 0.01039.
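F test for the house price example:

H0: β1 = 0
H1: β1 ≠ 0
α = .05, df1 = 1, df2 = 8

Test statistic: F = MSR/MSE = 11.08
Critical value: F.05 = 5.32 (reject H0 if F > 5.32)
Decision: Reject H0 at α = 0.05, since 11.08 > 5.32.
Conclusion: There is sufficient evidence that house size affects selling price.

In simple regression the F test and the t test for the slope are equivalent: F = t² (3.32938² ≈ 11.0848), and both give P-value = 0.01039.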
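A minimal standard-library Python sketch of the F test, assuming the house price data as input and a critical value F.05 = 5.32 (df1 = 1, df2 = 8) hard-coded from an F table. It also confirms numerically that F equals the square of the slope's t statistic.

```python
import math

# House price example data: y = price ($1000s), x = size (square feet).
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]

n = len(sqft)
k = 1  # number of independent variables in simple regression
x_bar = sum(sqft) / n
y_bar = sum(price) / n
s_xx = sum((x - x_bar) ** 2 for x in sqft)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, price)) / s_xx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * x for x in sqft]

ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
sse = sum((y - yh) ** 2 for y, yh in zip(price, y_hat))
msr = ssr / k
mse = sse / (n - k - 1)
f_stat = msr / mse

F_CRIT = 5.32  # assumed F_{.05} with df1 = 1, df2 = 8
reject_h0 = f_stat > F_CRIT

# Slope t statistic, t = b1 / s_b1 with s_b1 = sqrt(MSE / s_xx); F should equal t^2.
t_slope = b1 / math.sqrt(mse / s_xx)
print(f"F = {f_stat:.4f}, t^2 = {t_slope ** 2:.4f}, reject H0: {reject_h0}")
```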
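Prediction

The regression equation can be used to predict y for a given $x_{n+1}$:

$$\hat{y}_{n+1} = b_0 + b_1 x_{n+1}$$

Predictions Using Regression Analysis

Predict the price for a house with 2000 square feet:

house price = 98.25 + 0.1098 (2000) = 317.85

The predicted price for a house with 2000 square feet is 317.85 ($1000s), or $317,850. (With the unrounded coefficients the prediction is about 317.78.)

Only predict within the relevant range of the data: it is risky to extrapolate far beyond the range of observed x values.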
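A minimal standard-library Python sketch of this prediction, assuming the house price data as input. The helper `in_relevant_range` is hypothetical; it only checks whether the new x lies between the smallest and largest observed x, a simple guard against extrapolation.

```python
# House price example data: y = price ($1000s), x = size (square feet).
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]


def fit(x, y):
    """Least squares intercept and slope."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - b1 * x_bar, b1


def in_relevant_range(x_new, x_observed):
    """True if x_new lies within the observed range of x (no extrapolation)."""
    return min(x_observed) <= x_new <= max(x_observed)


b0, b1 = fit(sqft, price)
x_new = 2000
y_hat = b0 + b1 * x_new                 # unrounded coefficients
y_hat_rounded = 98.25 + 0.1098 * x_new  # rounded coefficients, as in the text
print(f"y_hat = {y_hat:.2f} (rounded coefficients: {y_hat_rounded:.2f}), "
      f"within observed range: {in_relevant_range(x_new, sqft)}")
```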
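Confidence Interval Estimate for $E(Y_{n+1} \mid X_{n+1})$, the expected value of y given $x_{n+1}$:

$$\hat{y}_{n+1} \pm t_{n-2,\alpha/2}\, s_e \sqrt{\frac{1}{n} + \frac{(x_{n+1} - \bar{x})^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}}$$

Prediction Interval for a single observed y, given $x_{n+1}$:

$$\hat{y}_{n+1} \pm t_{n-2,\alpha/2}\, s_e \sqrt{1 + \frac{1}{n} + \frac{(x_{n+1} - \bar{x})^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}}$$

The extra 1 under the square root accounts for the variation of an individual y around its mean, so the prediction interval is always wider than the confidence interval. Both intervals widen as $x_{n+1}$ moves away from $\bar{x}$.

For a house with 2000 square feet at the 95% level, the confidence interval for the mean price is 317.85 ± 37.12, and the prediction interval for the price of an individual house is 317.85 ± 102.28.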
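Both interval half-widths can be sketched in standard-library Python, assuming the house price data as input and the hard-coded critical value $t_{8,.025} = 2.3060$ from a t table; `interval_half_widths` is an illustrative name. The center printed here uses unrounded coefficients (about 317.78), while the half-widths match the ±37.12 and ±102.28 quoted above.

```python
import math

# House price example data: y = price ($1000s), x = size (square feet).
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]


def interval_half_widths(x, y, x_new, t_crit):
    """Return (y_hat, CI half-width for E(Y|x_new), PI half-width for one y)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(sse / (n - 2))
    core = 1 / n + (x_new - x_bar) ** 2 / s_xx
    ci_half = t_crit * s_e * math.sqrt(core)      # mean response
    pi_half = t_crit * s_e * math.sqrt(1 + core)  # individual observation
    return b0 + b1 * x_new, ci_half, pi_half


T_CRIT = 2.3060  # assumed t_{8,.025}, 95% level with n = 10
y_hat, ci_half, pi_half = interval_half_widths(sqft, price, 2000, T_CRIT)
print(f"y_hat = {y_hat:.2f}, CI: +/- {ci_half:.2f}, PI: +/- {pi_half:.2f}")
```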
In Excel, use PHStat | regression | simple linear regression, check the "confidence and prediction interval for x =" box, and enter the x-value and the desired confidence level.
[Figure: PHStat output showing the input values, the predicted y, and the confidence interval estimate for $E(Y_{n+1} \mid X_{n+1})$.]
Graphical Analysis
Chapter Summary