ŷ = b0 + b1x

where:
ŷ is the estimated value of y for a given x value.
b1 is the slope of the line.
b0 is the y-intercept of the line.

The graph is called the estimated regression line.
Estimation Process
Regression Model
y = β0 + β1x + ε

Regression Equation
E(y) = β0 + β1x

Unknown Parameters
β0, β1

Sample Data:
x     y
x1    y1
.     .
.     .
xn    yn

b0 and b1 provide estimates of β0 and β1

Estimated Regression Equation
ŷ = b0 + b1x

Sample Statistics
b0, b1
Least Squares Method
Least Squares Criterion
min Σ(yi − ŷi)²

where:
yi = observed value of the dependent variable for the ith observation
ŷi = estimated value of the dependent variable for the ith observation
Slope for the Estimated Regression Equation
b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²
Slope for the Estimated Regression Equation
b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)² = 20/4 = 5

y-Intercept for the Estimated Regression Equation
b0 = ȳ − b1x̄ = 20 − 5(2) = 10

Estimated Regression Equation
ŷ = 10 + 5x
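The slope and intercept calculations above can be sketched in Python. The (x, y) data are reconstructed from the residual-output slide later in the deck (predicted ŷ = 10 + 5x with residuals −1, −1, −2, 2, 2), so treat them as the assumed TV ads / cars sold sample:

```python
# Least squares estimates b1 and b0 for the TV ads example.
# Data reconstructed from the residual-output slide (assumed):
x = [1, 3, 2, 1, 3]          # number of TV ads
y = [14, 24, 18, 17, 27]     # number of cars sold

n = len(x)
x_bar = sum(x) / n           # 2.0
y_bar = sum(y) / n           # 20.0

# b1 = sum((xi - x_bar)(yi - y_bar)) / sum((xi - x_bar)^2) = 20/4 = 5
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
     / sum((xi - x_bar) ** 2 for xi in x)
# b0 = y_bar - b1 * x_bar = 20 - 5(2) = 10
b0 = y_bar - b1 * x_bar

print(f"y-hat = {b0} + {b1}x")   # y-hat = 10.0 + 5.0x
```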
Scatter Diagram and Trend Line
[Scatter diagram with trend line y = 5x + 10: Cars Sold (0 to 30) against TV Ads (0 to 4)]
Coefficient of Determination

Relationship Among SST, SSR, SSE
SST = SSR + SSE
Σ(yi − ȳ)² = Σ(ŷi − ȳ)² + Σ(yi − ŷi)²

where:
SST = total sum of squares
SSR = sum of squares due to regression
SSE = sum of squares due to error

Sample Correlation Coefficient
r_xy = (sign of b1) √r²

The sign of b1 in the equation ŷ = 10 + 5x is +.
r_xy = +√.8772 = +.9366
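The sum-of-squares decomposition and the .8772 value can be checked numerically. The (x, y) data below are the assumed sample reconstructed from the later residual-output slide:

```python
import math

x = [1, 3, 2, 1, 3]                  # assumed TV ads data
y = [14, 24, 18, 17, 27]             # assumed cars sold data
y_bar = sum(y) / len(y)              # 20.0
y_hat = [10 + 5 * xi for xi in x]    # fitted values from y-hat = 10 + 5x

sst = sum((yi - y_bar) ** 2 for yi in y)                 # 114
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)             # 100
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))    # 14
assert sst == ssr + sse              # SST = SSR + SSE

r_sq = ssr / sst                     # coefficient of determination
r_xy = math.sqrt(r_sq)               # sign of b1 is +, so r_xy = +sqrt(r^2)
print(round(r_sq, 4), round(r_xy, 4))   # 0.8772 0.9366
```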
Assumptions About the Error Term ε

1. The error ε is a random variable with mean of zero.
2. The variance of ε, denoted by σ², is the same for all values of the independent variable.
3. The values of ε are independent.
4. The error ε is a normally distributed random variable.
Testing for Significance

To test for a significant regression relationship, we must conduct a hypothesis test to determine whether the value of β1 is zero.

Two tests are commonly used: the t test and the F test.

Both the t test and the F test require an estimate of σ², the variance of ε in the regression model.
Testing for Significance

An Estimate of σ²
SSE = Σ(yi − ŷi)² = Σ(yi − b0 − b1xi)²

where:
s² = MSE = SSE/(n − 2)

The mean square error (MSE) provides the estimate of σ², and the notation s² is also used.
Testing for Significance

An Estimate of σ
s = √MSE = √(SSE/(n − 2))

To estimate σ we take the square root of s². The resulting s is called the standard error of the estimate.
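With SSE = 14 and n = 5 from the example, the estimate of σ² and the standard error of the estimate work out as:

```python
# Estimate of sigma^2 and standard error of the estimate
# for the TV ads example (SSE = 14, n = 5).
sse, n = 14, 5
mse = sse / (n - 2)      # s^2 = MSE = 14/3 = 4.667
s = mse ** 0.5           # s = 2.16025, the standard error of the estimate
print(round(mse, 3), round(s, 5))   # 4.667 2.16025
```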
Testing for Significance: t Test

Hypotheses
H0: β1 = 0
Ha: β1 ≠ 0

Test Statistic
t = b1/s_b1

where:
s_b1 = s / √Σ(xi − x̄)²
Testing for Significance: t Test

Rejection Rule
Reject H0 if p-value < α or t < −t_α/2 or t > t_α/2

where:
t_α/2 is based on a t distribution with n − 2 degrees of freedom
Testing for Significance: t Test

1. Determine the hypotheses.
   H0: β1 = 0
   Ha: β1 ≠ 0

2. Specify the level of significance.
   α = .05

3. Select the test statistic.
   t = b1/s_b1

4. State the rejection rule.
   Reject H0 if p-value < .05 or |t| > 3.182 (with 3 degrees of freedom)
Testing for Significance: t Test

5. Compute the value of the test statistic.
   t = b1/s_b1 = 5/1.08 = 4.63

6. Determine whether to reject H0.
   t = 4.541 provides an area of .01 in the upper tail. Hence, the p-value is less than .02. (Also, t = 4.63 > 3.182.) We can reject H0.
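Steps 5 and 6 can be sketched numerically, using s = √(14/3) from the earlier slide and the assumed x data:

```python
# t statistic for H0: beta1 = 0 in the TV ads example.
x = [1, 3, 2, 1, 3]                  # assumed TV ads data
x_bar = sum(x) / len(x)
s = (14 / 3) ** 0.5                  # standard error of the estimate, 2.16025
b1 = 5

s_b1 = s / sum((xi - x_bar) ** 2 for xi in x) ** 0.5   # 2.16025/2 = 1.08
t = b1 / s_b1
print(round(s_b1, 2), round(t, 2))   # 1.08 4.63
```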
Confidence Interval for β1

We can use a 95% confidence interval for β1 to test the hypotheses just used in the t test.

H0 is rejected if the hypothesized value of β1 is not included in the confidence interval for β1.

The form of a confidence interval for β1 is:
b1 ± t_α/2 · s_b1

Confidence Interval for β1

Rejection Rule
Reject H0 if 0 is not included in the 95% confidence interval for β1.

95% Confidence Interval for β1
b1 ± t_α/2 · s_b1 = 5 ± 3.182(1.08) = 5 ± 3.44, or 1.56 to 8.44

Conclusion
0 is not included in the confidence interval. Reject H0.
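The interval endpoints follow directly from b1 = 5, s_b1 = 1.08, and t_.025 = 3.182:

```python
# 95% confidence interval for beta1: b1 +/- t_{.025} * s_b1.
b1, s_b1 = 5, 1.08
t_025 = 3.182                        # t distribution, 3 degrees of freedom
margin = t_025 * s_b1                # 3.44
lo, hi = b1 - margin, b1 + margin
print(round(lo, 2), round(hi, 2))    # 1.56 8.44
# 0 is outside the interval, so H0: beta1 = 0 is rejected.
```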
Testing for Significance: F Test

Hypotheses
H0: β1 = 0
Ha: β1 ≠ 0

Test Statistic
F = MSR/MSE
Testing for Significance: F Test

Rejection Rule
Reject H0 if p-value < α or F > F_α

where:
F_α is based on an F distribution with 1 degree of freedom in the numerator and n − 2 degrees of freedom in the denominator
Testing for Significance: F Test

1. Determine the hypotheses.
   H0: β1 = 0
   Ha: β1 ≠ 0

2. Specify the level of significance.
   α = .05

3. Select the test statistic.
   F = MSR/MSE

4. State the rejection rule.
   Reject H0 if p-value < .05 or F > 10.13 (with 1 d.f. in numerator and 3 d.f. in denominator)
Testing for Significance: F Test

5. Compute the value of the test statistic.
   F = MSR/MSE = 100/4.667 = 21.43

6. Determine whether to reject H0.
   F = 17.44 provides an area of .025 in the upper tail. Thus, the p-value corresponding to F = 21.43 is less than 2(.025) = .05. Hence, we reject H0.

The statistical evidence is sufficient to conclude that we have a significant relationship between the number of TV ads aired and the number of cars sold.
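The F statistic above follows from SSR = 100 and SSE = 14, with 1 regression degree of freedom and n − 2 = 3 error degrees of freedom:

```python
# F statistic for the TV ads example.
ssr, sse, n = 100, 14, 5
msr = ssr / 1            # MSR = SSR / (regression d.f. = 1) = 100
mse = sse / (n - 2)      # MSE = 14/3 = 4.667
F = msr / mse
print(round(F, 2))       # 21.43
# 21.43 > 10.13 (F with 1 and 3 d.f. at alpha = .05), so H0 is rejected.
```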
Some Cautions about the Interpretation of Significance Tests

Just because we are able to reject H0: β1 = 0 and demonstrate statistical significance does not enable us to conclude that there is a linear relationship between x and y.

Rejecting H0: β1 = 0 and concluding that the relationship between x and y is significant does not enable us to conclude that a cause-and-effect relationship is present between x and y.
Using the Estimated Regression Equation for Estimation and Prediction

Confidence Interval Estimate of E(yp)
ŷp ± t_α/2 · s_ŷp

Prediction Interval Estimate of yp
ŷp ± t_α/2 · s_ind

where:
confidence coefficient is 1 − α and t_α/2 is based on a t distribution with n − 2 degrees of freedom

Estimate of the Standard Deviation of ŷp
s_ŷp = s √(1/n + (xp − x̄)² / Σ(xi − x̄)²)
Confidence Interval for E(yp)

s_ŷp = 2.16025 √(1/5 + (3 − 2)² / ((1 − 2)² + (3 − 2)² + (2 − 2)² + (1 − 2)² + (3 − 2)²))

s_ŷp = 2.16025 √(1/5 + 1/4) = 1.4491

The 95% confidence interval estimate of the mean number of cars sold when 3 TV ads are run is:

ŷp ± t_α/2 · s_ŷp
25 ± 3.1824(1.4491)
25 ± 4.61
20.39 to 29.61 cars
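The interval can be reproduced from the assumed x data and s = √(14/3):

```python
# Confidence interval for E(y_p) at x_p = 3 TV ads.
x = [1, 3, 2, 1, 3]                  # assumed TV ads data
n = len(x)
x_bar = sum(x) / n                   # 2.0
s = (14 / 3) ** 0.5                  # 2.16025
x_p = 3
y_p_hat = 10 + 5 * x_p               # 25

ssx = sum((xi - x_bar) ** 2 for xi in x)                # 4
s_yp = s * (1 / n + (x_p - x_bar) ** 2 / ssx) ** 0.5    # 1.4491
margin = 3.1824 * s_yp                                  # 4.61
print(round(y_p_hat - margin, 2), round(y_p_hat + margin, 2))  # 20.39 29.61
```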
Prediction Interval for yp

Estimate of the Standard Deviation of an Individual Value of yp
s_ind = s √(1 + 1/n + (xp − x̄)² / Σ(xi − x̄)²)

s_ind = 2.16025 √(1 + 1/5 + 1/4)
s_ind = 2.16025(1.20416) = 2.6013

The 95% prediction interval estimate of the number of cars sold in one particular week when 3 TV ads are run is:

ŷp ± t_α/2 · s_ind
25 ± 3.1824(2.6013)
25 ± 8.28
16.72 to 33.28 cars
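The prediction interval differs from the confidence interval only by the extra 1 under the square root, which accounts for the variability of an individual observation:

```python
# Prediction interval for an individual y_p at x_p = 3 TV ads.
x = [1, 3, 2, 1, 3]                  # assumed TV ads data
n = len(x)
x_bar = sum(x) / n
s = (14 / 3) ** 0.5                  # 2.16025
x_p, y_p_hat = 3, 25

ssx = sum((xi - x_bar) ** 2 for xi in x)                      # 4
s_ind = s * (1 + 1 / n + (x_p - x_bar) ** 2 / ssx) ** 0.5     # 2.6013
margin = 3.1824 * s_ind                                       # 8.28
print(round(y_p_hat - margin, 2), round(y_p_hat + margin, 2)) # 16.72 33.28
```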
Residual Analysis

Residual for Observation i
yi − ŷi

Much of the residual analysis is based on an examination of graphical plots.

The residuals provide the best information about ε.

If the assumptions about the error term ε appear questionable, the hypothesis tests about the significance of the regression relationship and the interval estimation results may not be valid.
Residual Plot Against x

If the assumption that the variance of ε is the same for all values of x is valid, and the assumed regression model is an adequate representation of the relationship between the variables, then the residual plot should give an overall impression of a horizontal band of points.

[Residual plot of y − ŷ against x: Good Pattern, a horizontal band around 0]
Residual Plot Against x

[Residual plot of y − ŷ against x: Nonconstant Variance]
Residual Plot Against x

[Residual plot of y − ŷ against x: Model Form Not Adequate]
Residual Plot Against x

Residuals
Observation   Predicted Cars Sold   Residuals
1             15                    -1
2             25                    -1
3             20                    -2
4             15                     2
5             25                     2
Residual Plot Against x

[TV Ads Residual Plot: residuals (−3 to 3) against TV Ads (0 to 4)]
Standardized Residuals

Standardized Residual for Observation i
(yi − ŷi) / s_(yi − ŷi)

where:
s_(yi − ŷi) = s √(1 − hi)

hi = 1/n + (xi − x̄)² / Σ(xi − x̄)²
Standardized Residual Plot

The standardized residual plot can provide insight about the assumption that the error term ε has a normal distribution.

If this assumption is satisfied, the distribution of the standardized residuals should appear to come from a standard normal probability distribution.
Standardized Residual Plot

Standardized Residuals
Observation   Predicted Y   Residuals   Standard Residuals
1             15            -1          -0.535
2             25            -1          -0.535
3             20            -2          -1.069
4             15             2           1.069
5             25             2           1.069
Standardized Residual Plot

RESIDUAL OUTPUT
Observation   Predicted Y   Residuals   Standard Residuals
1             15            -1          -0.534522
2             25            -1          -0.534522
3             20            -2          -1.069045
4             15             2           1.069045
5             25             2           1.069045

[Standardized residual plot: standard residuals (−1.5 to 1.5) against Cars Sold (0 to 30)]
Standardized Residual Plot

All of the standardized residuals are between −1.5 and +1.5, indicating that there is no reason to question the assumption that ε has a normal distribution.
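The standard residuals in the output above appear to follow Excel's simpler convention, each residual divided by √(SSE/(n − 1)), rather than the leverage-adjusted formula s√(1 − hi) given earlier; a sketch reproducing the table values under that assumption:

```python
# Excel-style standard residuals for the TV ads example:
# residual divided by sqrt(SSE / (n - 1)).
residuals = [-1, -1, -2, 2, 2]
n = len(residuals)
sse = sum(e ** 2 for e in residuals)     # 14
scale = (sse / (n - 1)) ** 0.5           # sqrt(3.5) = 1.87083
std_res = [round(e / scale, 6) for e in residuals]
print(std_res)   # [-0.534522, -0.534522, -1.069045, 1.069045, 1.069045]
```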
Outliers and Influential Observations
Detecting Outliers
An outlier is an observation that is unusual in
comparison with the other data.
Minitab classifies an observation as an outlier if its
standardized residual value is < -2 or > +2.
This standardized residual rule sometimes fails to
identify an unusually large observation as being
an outlier.
This rule's shortcoming can be circumvented by using studentized deleted residuals.
The |ith studentized deleted residual| will be larger than the |ith standardized residual|.