
Chapter 7

Simple Linear Regression


Models
Representation of some phenomenon
Mathematical model is a mathematical
expression of some phenomenon
Often describe relationships between variables
Types
Deterministic models
Probabilistic models
Deterministic Models
Hypothesize exact relationships
Suitable when prediction error is
negligible
Example: force is exactly mass times
acceleration
F = ma

Probabilistic Models
Hypothesize two components
Deterministic
Random error
Example: sales volume (y) is 10 times
advertising spending (x) + random error
$y = 10x + \varepsilon$
Random error may be due to factors
other than advertising
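The two components of the sales example can be made concrete with a small simulation. This is only a sketch: the normal error distribution, its standard deviation of 5, the seed, and the function name `simulate_sales` are assumptions chosen for illustration, not part of the example itself.

```python
import random

def simulate_sales(ad_spending, sigma=5.0, seed=0):
    """Return (x, y, error) triples generated from y = 10x + error."""
    rng = random.Random(seed)              # seeded so the run is repeatable
    rows = []
    for x in ad_spending:
        error = rng.gauss(0.0, sigma)      # random component (assumed normal)
        y = 10 * x + error                 # deterministic component plus error
        rows.append((x, y, error))
    return rows

for x, y, error in simulate_sales([1, 2, 3, 4, 5]):
    print(f"x={x}  10x={10 * x}  error={error:+.2f}  y={y:.2f}")
```

Each simulated y differs from the deterministic value 10x only by its error term.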
Types of Probabilistic Models
Regression models
Correlation models
Regression Models
Answers "What is the relationship between the variables?"
Equation used
One numerical dependent (response) variable
What is to be predicted
One or more numerical or categorical
independent (explanatory) variables
Used mainly for prediction and estimation
Regression Modeling Steps
1. Hypothesize deterministic component
2. Estimate unknown model parameters
3. Specify probability distribution of random error
term
Estimate standard deviation of error
4. Evaluate model
5. Use model for prediction and estimation
Model Specification
Specifying the Model
1. Define variables
Conceptual (e.g., Advertising, price)
Empirical (e.g., List price, regular price)
Measurement (e.g., $, Units)
2. Hypothesize nature of relationship
Expected effects (i.e., Coefficients signs)
Functional form (linear or non-linear)
Interactions
Model Specification
Is Based on Theory
Theory of field (e.g.,
Sociology)
Mathematical theory
Previous research
Common sense
Thinking Challenge: Which Is More Logical?
[Figure: four candidate plots of Sales against Advertising]
Types of Regression Models
Simple (1 explanatory variable): linear or non-linear
Multiple (2+ explanatory variables): linear or non-linear
Linear Regression Model
Relationship between variables is a linear function
$y = \beta_0 + \beta_1 x + \varepsilon$
$y$ = dependent (response) variable
$x$ = independent (explanatory) variable
$\beta_0$ = population y-intercept
$\beta_1$ = population slope
$\varepsilon$ = random error
Line of Means
[Figure: line with y-intercept $\beta_0$ and slope $\beta_1$ = (change in y) / (change in x)]
Population & Sample Regression
Models
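Population: unknown relationship $y = \beta_0 + \beta_1 x + \varepsilon$
Random sample: $y = \hat\beta_0 + \hat\beta_1 x + \hat\varepsilon$
[Figure: a random sample drawn from the population and plotted as y against x]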
Population Linear Regression
Model
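$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$
$E(y) = \beta_0 + \beta_1 x$
[Figure: observed values scattered about the line $E(y)$; $\varepsilon_i$ = random error]
Sample Linear Regression Model
$y_i = \hat\beta_0 + \hat\beta_1 x_i + \hat\varepsilon_i$
$\hat{y}_i = \hat\beta_0 + \hat\beta_1 x_i$
[Figure: observed values and an unsampled observation about the fitted line; $\hat\varepsilon_i$ = random error]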
Estimating Parameters:
Least Squares Method
Scattergram
1. Plot of all $(x_i, y_i)$ pairs
2. Suggests how well model will fit
[Figure: two example scattergrams of y against x, both axes running from 0 to 60]
Thinking Challenge
How would you draw a line through the points?
How do you determine which line fits best?
Least Squares
Best fit means the differences between actual y values and predicted y values are a minimum
But positive differences offset negative ones, so the differences are squared:
$\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} \hat\varepsilon_i^2$
Least squares minimizes the sum of the squared differences (SSE)
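The offsetting problem can be seen numerically. The sketch below uses the five (advertising, sales) pairs from this chapter's worked example and compares two candidate lines: a flat line at the mean of y and the line $\hat{y} = -.1 + .7x$. The helper names `residuals` and `sse` are illustrative choices.

```python
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]

def residuals(b0, b1):
    """Actual y minus predicted y for the line y-hat = b0 + b1 * x."""
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

def sse(b0, b1):
    """Sum of squared differences between actual and predicted y."""
    return sum(e ** 2 for e in residuals(b0, b1))

candidates = {"flat line at mean of y": (2.0, 0.0),
              "line -0.1 + 0.7x": (-0.1, 0.7)}
for name, (b0, b1) in candidates.items():
    raw_sum = sum(residuals(b0, b1))       # positives and negatives cancel
    print(f"{name}: sum of differences = {raw_sum:.2f}, SSE = {sse(b0, b1):.2f}")
```

Both lines have raw differences that sum to zero, so the plain sum cannot tell them apart; only the squared criterion separates them.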
Least Squares Graphically
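[Figure: four data points and the fitted line $\hat{y}_i = \hat\beta_0 + \hat\beta_1 x_i$, with vertical residuals $\hat\varepsilon_1$ to $\hat\varepsilon_4$; for example, $y_2 = \hat\beta_0 + \hat\beta_1 x_2 + \hat\varepsilon_2$]
LS minimizes $\sum_{i=1}^{n} \hat\varepsilon_i^2 = \hat\varepsilon_1^2 + \hat\varepsilon_2^2 + \hat\varepsilon_3^2 + \hat\varepsilon_4^2$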
Coefficient Equations
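Prediction equation: $\hat{y} = \hat\beta_0 + \hat\beta_1 x$
Slope: $\hat\beta_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{\sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}}{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}$
y-intercept: $\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}$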
Computation Table
x_i      y_i      x_i^2      y_i^2      x_i y_i
x_1      y_1      x_1^2      y_1^2      x_1 y_1
x_2      y_2      x_2^2      y_2^2      x_2 y_2
:        :        :          :          :
x_n      y_n      x_n^2      y_n^2      x_n y_n
Σx_i     Σy_i     Σx_i^2     Σy_i^2     Σx_i y_i
Interpretation of Coefficients
1. Slope ($\hat\beta_1$)
Estimated y changes by $\hat\beta_1$ for each 1-unit increase in x
If $\hat\beta_1 = 2$, then Sales (y) is expected to increase by 2 for each 1-unit increase in Advertising (x)
2. Y-intercept ($\hat\beta_0$)
Average value of y when x = 0
If $\hat\beta_0 = 4$, then average Sales (y) is expected to be 4 when Advertising (x) is 0
Least Squares Example
You're a marketing analyst for Hasbro Toys. You gather the following data:
Ad $    Sales (Units)
1       1
2       1
3       2
4       2
5       4
Find the least squares line relating sales and advertising.
[Figure: scattergram of Sales vs. Advertising]
Parameter Estimation Solution Table
       x_i    y_i    x_i^2    y_i^2    x_i y_i
       1      1      1        1        1
       2      1      4        1        2
       3      2      9        4        6
       4      2      16       4        8
       5      4      25       16       20
Sum    15     10     55       26       37
Parameter Estimation Solution
$\hat\beta_1 = \frac{\sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}}{\sum x_i^2 - \frac{(\sum x_i)^2}{n}} = \frac{37 - \frac{(15)(10)}{5}}{55 - \frac{(15)^2}{5}} = .70$
$\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x} = 2 - (.70)(3) = -.10$
$\hat{y} = -.1 + .7x$
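The arithmetic above can be checked with a few lines of code. This is a minimal sketch of the computational formulas for $SS_{xy}$ and $SS_{xx}$; the function name `least_squares` and the variable names are illustrative.

```python
def least_squares(x, y):
    """Return (intercept, slope) using the SS_xy / SS_xx computational formulas."""
    n = len(x)
    ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
    ss_xx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n
    slope = ss_xy / ss_xx
    intercept = sum(y) / n - slope * sum(x) / n   # y-bar minus slope times x-bar
    return intercept, slope

ad_dollars = [1, 2, 3, 4, 5]
sales_units = [1, 1, 2, 2, 4]
b0, b1 = least_squares(ad_dollars, sales_units)
print(f"y-hat = {b0:.2f} + {b1:.2f}x")
```

The printed coefficients should agree with the hand calculation, an intercept of -.10 and a slope of .70.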
Parameter Estimation
Computer Output
Parameter Estimates

                 Parameter    Standard    T for H0:
Variable    DF   Estimate     Error       Param=0      Prob>|T|
INTERCEP    1    -0.1000      0.6350      -0.157       0.8849
ADVERT      1     0.7000      0.1914       3.656       0.0354

The INTERCEP estimate is $\hat\beta_0$ and the ADVERT estimate is $\hat\beta_1$
$\hat{y} = -.1 + .7x$
Coefficient Interpretation Solution
1. Slope ($\hat\beta_1$)
Sales Volume (y) is expected to increase by .7 units for each $1 increase in Advertising (x)
2. Y-intercept ($\hat\beta_0$)
Average value of Sales Volume (y) is -.10 units when Advertising (x) is 0
Difficult to explain to marketing manager
Expect some sales without advertising
Regression Line Fitted to the Data
[Figure: Sales vs. Advertising with the fitted line $\hat{y} = -.1 + .7x$]
Least Squares
Thinking Challenge
You're an economist for the county cooperative. You gather the following data:
Fertilizer (lb.)    Yield (lb.)
4                   3.0
6                   5.5
10                  6.5
12                  9.0
Find the least squares line relating crop yield and fertilizer.
[Figure: scattergram of Crop Yield vs. Fertilizer*; axes Yield (lb.) and Fertilizer (lb.)]
Parameter Estimation Solution Table*
       x_i    y_i     x_i^2    y_i^2     x_i y_i
       4      3.0     16       9.00      12
       6      5.5     36       30.25     33
       10     6.5     100      42.25     65
       12     9.0     144      81.00     108
Sum    32     24.0    296      162.50    218
Parameter Estimation Solution*
$\hat\beta_1 = \frac{\sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}}{\sum x_i^2 - \frac{(\sum x_i)^2}{n}} = \frac{218 - \frac{(32)(24)}{4}}{296 - \frac{(32)^2}{4}} = .65$
$\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x} = 6 - (.65)(8) = .80$
$\hat{y} = .8 + .65x$
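As a cross-check on the fertilizer solution, the same coefficients can be obtained from deviations about the means, $\sum (x_i - \bar{x})(y_i - \bar{y})$ and $\sum (x_i - \bar{x})^2$, which are algebraically equivalent to the computational formulas. The sketch below takes that route; the function name `fit_by_deviations` is hypothetical.

```python
def fit_by_deviations(x, y):
    """Return (intercept, slope) using deviations about the sample means."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = ss_xy / ss_xx
    return y_bar - slope * x_bar, slope

fertilizer_lb = [4, 6, 10, 12]
yield_lb = [3.0, 5.5, 6.5, 9.0]
b0, b1 = fit_by_deviations(fertilizer_lb, yield_lb)
print(f"y-hat = {b0:.2f} + {b1:.2f}x")
```

Here the deviation sums work out to 26 and 40, the same $SS_{xy}$ and $SS_{xx}$ as in the hand calculation.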
Coefficient Interpretation Solution*
1. Slope ($\hat\beta_1$)
Crop Yield (y) is expected to increase by .65 lb. for each 1 lb. increase in Fertilizer (x)
2. Y-intercept ($\hat\beta_0$)
Average Crop Yield (y) is expected to be 0.8 lb. when no Fertilizer (x) is used
Regression Line Fitted to the Data*
[Figure: Yield (lb.) vs. Fertilizer (lb.) with the fitted line $\hat{y} = .8 + .65x$]
Probability Distribution
of Random Error
Linear Regression Assumptions
1. Mean of probability distribution of error, $\varepsilon$, is 0
2. Probability distribution of error has constant variance
3. Probability distribution of error, $\varepsilon$, is normal
4. Errors are independent
Error Probability Distribution
[Figure: error distributions centered on the line $E(y) = \beta_0 + \beta_1 x$ at $x_1$, $x_2$ and $x_3$]
Random Error Variation
Variation of actual y from predicted y, $\hat{y}$
Measured by standard error of regression model
Sample standard deviation of $\hat\varepsilon$: $s$
Affects several factors
Parameter significance
Prediction accuracy
Variation Measures
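[Figure: a point $(x_i, y_i)$, the fitted line $\hat{y}_i = \hat\beta_0 + \hat\beta_1 x_i$ and the mean $\bar{y}$]
Total sum of squares: $(y_i - \bar{y})^2$
Unexplained sum of squares: $(y_i - \hat{y}_i)^2$
Explained sum of squares: $(\hat{y}_i - \bar{y})^2$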
Estimation of $\sigma^2$
$s^2 = \frac{SSE}{n - 2}$, where $SSE = \sum (y_i - \hat{y}_i)^2$
$s = \sqrt{s^2} = \sqrt{\frac{SSE}{n - 2}}$
Calculating SSE, $s^2$, $s$ Example
You're a marketing analyst for Hasbro Toys. You gather the following data:
Ad $    Sales (Units)
1       1
2       1
3       2
4       2
5       4
Find SSE, $s^2$, and $s$.
Calculating SSE Solution
Using $\hat{y} = -.1 + .7x$:
x_i    y_i    y-hat    y - y-hat    (y - y-hat)^2
1      1      .6       .4           .16
2      1      1.3      -.3          .09
3      2      2        0            0
4      2      2.7      -.7          .49
5      4      3.4      .6           .36
SSE = 1.1
Calculating $s^2$ and $s$ Solution
$s^2 = \frac{SSE}{n - 2} = \frac{1.1}{5 - 2} = .36667$
$s = \sqrt{.36667} = .6055$
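The SSE, $s^2$ and $s$ calculations can be reproduced as follows. This is a sketch that takes the fitted coefficients -.1 and .7 as given; the function name `error_variation` is an illustrative choice.

```python
import math

def error_variation(x, y, b0, b1):
    """Return (SSE, s squared, s) for the fitted line y-hat = b0 + b1 * x."""
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_squared = sse / (len(x) - 2)      # n - 2 degrees of freedom
    return sse, s_squared, math.sqrt(s_squared)

sse, s_squared, s = error_variation([1, 2, 3, 4, 5], [1, 1, 2, 2, 4], -0.1, 0.7)
print(f"SSE = {sse:.2f}, s^2 = {s_squared:.5f}, s = {s:.4f}")
```

The divisor is n - 2 rather than n because two parameters, the intercept and the slope, were estimated from the data.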
Testing for Significance
Evaluating the Model
Test of Slope Coefficient
Shows if there is a linear relationship between x and y
Involves population slope $\beta_1$
Hypotheses
$H_0$: $\beta_1 = 0$ (No Linear Relationship)
$H_a$: $\beta_1 \neq 0$ (Linear Relationship)
Theoretical basis is sampling distribution of slope
[Figure: population line with Sample 1 and Sample 2 lines, y against x]
Sampling Distribution of Sample Slopes
All possible sample slopes
Sample 1: 2.5
Sample 2: 1.6
Sample 3: 1.8
Sample 4: 2.1
: :
Very large number of sample slopes
[Figure: sampling distribution of $\hat\beta_1$, centered at $\beta_1$ with standard error $S_{\hat\beta_1}$]
Slope Coefficient Test Statistic
$t = \frac{\hat\beta_1}{S_{\hat\beta_1}}$, df = n - 2
where $S_{\hat\beta_1} = \frac{s}{\sqrt{SS_{xx}}}$ and $SS_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}$
Test of Slope Coefficient Example
You're a marketing analyst for Hasbro Toys.
You find $\hat\beta_0 = -.1$, $\hat\beta_1 = .7$ and $s = .6055$.
Ad $    Sales (Units)
1       1
2       1
3       2
4       2
5       4
Is the relationship significant at the .05 level of significance?
Test of Slope Coefficient Solution: Setup
$H_0$: $\beta_1 = 0$
$H_a$: $\beta_1 \neq 0$
$\alpha = .05$
df = 5 - 2 = 3
Critical value(s): t = -3.182 and t = 3.182, with area .025 in each rejection region
Solution Table
       x_i    y_i    x_i^2    y_i^2    x_i y_i
       1      1      1        1        1
       2      1      4        1        2
       3      2      9        4        6
       4      2      16       4        8
       5      4      25       16       20
Sum    15     10     55       26       37
Test Statistic Solution
$S_{\hat\beta_1} = \frac{s}{\sqrt{SS_{xx}}} = \frac{.6055}{\sqrt{55 - \frac{(15)^2}{5}}} = .1914$
$t = \frac{\hat\beta_1}{S_{\hat\beta_1}} = \frac{.70}{.1914} = 3.657$
Test of Slope Coefficient Solution
$H_0$: $\beta_1 = 0$
$H_a$: $\beta_1 \neq 0$
$\alpha = .05$
df = 5 - 2 = 3
Critical value(s): t = -3.182 and t = 3.182, with area .025 in each rejection region
Test statistic: $t = \frac{\hat\beta_1}{S_{\hat\beta_1}} = \frac{.70}{.1914} = 3.657$
Decision: Reject $H_0$ at $\alpha = .05$
Conclusion: There is evidence of a relationship
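The slope test can be sketched in code as below. The critical value 3.182 is taken from the solution above rather than computed, since the standard library has no t distribution; the function name `slope_t_test` is hypothetical.

```python
import math

def slope_t_test(x, slope, s, critical_value):
    """Return (standard error of slope, t statistic, reject H0?)."""
    n = len(x)
    ss_xx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n
    se_slope = s / math.sqrt(ss_xx)
    t = slope / se_slope
    return se_slope, t, abs(t) > critical_value   # two-tailed rejection rule

# s = .6055 and slope = .7 come from the earlier steps of the example;
# 3.182 is the two-tailed critical value for alpha = .05 with 3 df.
se_slope, t, reject = slope_t_test([1, 2, 3, 4, 5], 0.7, 0.6055, 3.182)
print(f"S_b1 = {se_slope:.4f}, t = {t:.3f}, reject H0: {reject}")
```

Small differences in the last digit from the hand calculation come from rounding the standard error before dividing.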
Test of Slope Coefficient Computer Output
Parameter Estimates

                 Parameter    Standard    T for H0:
Variable    DF   Estimate     Error       Param=0      Prob>|T|
INTERCEP    1    -0.1000      0.6350      -0.157       0.8849
ADVERT      1     0.7000      0.1914       3.656       0.0354

For ADVERT: Parameter Estimate is $\hat\beta_1$, Standard Error is $S_{\hat\beta_1}$, T for H0 is $t = \hat\beta_1 / S_{\hat\beta_1}$, and Prob>|T| is the p-value
Correlation Models
Answers "How strong is the linear relationship between two variables?"
Coefficient of correlation
Sample correlation coefficient denoted $r$
Values range from -1 to +1
Measures degree of association
Does not indicate cause-effect relationship
Coefficient of Correlation
$r = \frac{SS_{xy}}{\sqrt{SS_{xx}\, SS_{yy}}}$
where
$SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n}$
$SS_{yy} = \sum y^2 - \frac{(\sum y)^2}{n}$
$SS_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n}$
Coefficient of Correlation Values
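[Figure: scale of r from -1.0 through -.5, 0 and +.5 to +1.0]
r = -1.0: perfect negative correlation
r = 0: no linear correlation
r = +1.0: perfect positive correlation
Degree of negative correlation increases from 0 toward -1.0
Degree of positive correlation increases from 0 toward +1.0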
Coefficient of Correlation Example
You're a marketing analyst for Hasbro Toys.
Ad $    Sales (Units)
1       1
2       1
3       2
4       2
5       4
Calculate the coefficient of correlation.
Solution Table
       x_i    y_i    x_i^2    y_i^2    x_i y_i
       1      1      1        1        1
       2      1      4        1        2
       3      2      9        4        6
       4      2      16       4        8
       5      4      25       16       20
Sum    15     10     55       26       37
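Coefficient of Correlation Solution
$SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 55 - \frac{(15)^2}{5} = 10$
$SS_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 26 - \frac{(10)^2}{5} = 6$
$SS_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 37 - \frac{(15)(10)}{5} = 7$
$r = \frac{SS_{xy}}{\sqrt{SS_{xx}\, SS_{yy}}} = \frac{7}{\sqrt{10 \cdot 6}} = .904$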
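The three sums of squares and r can be computed directly from the data. The sketch below follows the formulas used in the solution; the function name `correlation` is an illustrative choice.

```python
import math

def correlation(x, y):
    """Return the sample correlation coefficient r = SS_xy / sqrt(SS_xx * SS_yy)."""
    n = len(x)
    ss_xx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n
    ss_yy = sum(yi ** 2 for yi in y) - sum(y) ** 2 / n
    ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
    return ss_xy / math.sqrt(ss_xx * ss_yy)

r = correlation([1, 2, 3, 4, 5], [1, 1, 2, 2, 4])
print(f"r = {r:.3f}")
```

The same function applies unchanged to any paired data set, such as the fertilizer and yield data used in the thinking challenges.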

Coefficient of Correlation Thinking Challenge
You're an economist for the county cooperative. You gather the following data:
Fertilizer (lb.)    Yield (lb.)
4                   3.0
6                   5.5
10                  6.5
12                  9.0
Find the coefficient of correlation.
Solution Table*
       x_i    y_i     x_i^2    y_i^2     x_i y_i
       4      3.0     16       9.00      12
       6      5.5     36       30.25     33
       10     6.5     100      42.25     65
       12     9.0     144      81.00     108
Sum    32     24.0    296      162.50    218
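Coefficient of Correlation Solution*
$SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 296 - \frac{(32)^2}{4} = 40$
$SS_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 162.5 - \frac{(24)^2}{4} = 18.5$
$SS_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 218 - \frac{(32)(24)}{4} = 26$
$r = \frac{SS_{xy}}{\sqrt{SS_{xx}\, SS_{yy}}} = \frac{26}{\sqrt{40 \cdot 18.5}} = .956$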

Coefficient of Determination
Proportion of variation explained by relationship between x and y
$r^2 = \frac{\text{Explained Variation}}{\text{Total Variation}} = \frac{SS_{yy} - SSE}{SS_{yy}}$
$0 \le r^2 \le 1$
$r^2 = (\text{coefficient of correlation})^2$

Coefficient of Determination Example
You're a marketing analyst for Hasbro Toys.
You know $r = .904$.
Ad $    Sales (Units)
1       1
2       1
3       2
4       2
5       4
Calculate and interpret the coefficient of determination.
Coefficient of Determination Solution
$r^2 = (\text{coefficient of correlation})^2$
$r^2 = (.904)^2$
$r^2 = .817$
Interpretation: About 81.7% of the sample variation in Sales (y) can be explained by using Ad $ (x) to predict Sales (y) in the linear model.
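The same value can be reached through the explained-over-total form, $(SS_{yy} - SSE) / SS_{yy}$, without first computing r. The sketch below does this for the advertising data, taking the fitted line $\hat{y} = -.1 + .7x$ as given; the variable names are illustrative.

```python
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
b0, b1 = -0.1, 0.7          # fitted coefficients from the example

n = len(y)
ss_yy = sum(yi ** 2 for yi in y) - sum(y) ** 2 / n            # total variation
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) # unexplained variation
r_squared = (ss_yy - sse) / ss_yy                             # explained / total

print(f"SS_yy = {ss_yy:.1f}, SSE = {sse:.1f}, r^2 = {r_squared:.4f}")
```

With $SS_{yy} = 6$ and $SSE = 1.1$ this gives about .8167, which matches $(.904)^2$ up to rounding.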
$r^2$ Computer Output

Root MSE     0.60553    R-square    0.8167
Dep Mean     2.00000    Adj R-sq    0.7556
C.V.        30.27650

R-square is $r^2$
Adj R-sq is $r^2$ adjusted for number of explanatory variables and sample size
Using the Model for Prediction &
Estimation
Prediction With Regression Models
Types of predictions
Point estimates
Interval estimates
What is predicted
Population mean response $E(y)$ for given x
Point on population regression line
Individual response ($y_i$) for given x
What Is Predicted
[Figure: at $x = x_p$, the mean y, $E(y)$, on the population line and the individual prediction $\hat{y}$ on the fitted line]
Confidence Interval Estimate for Mean Value of y at $x = x_p$
$\hat{y} \pm t_{\alpha/2}\, S \sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}}$
df = n - 2
Factors Affecting Interval Width
1. Level of confidence $(1 - \alpha)$
Width increases as confidence increases
2. Data dispersion ($s$)
Width increases as variation increases
3. Sample size
Width decreases as sample size increases
4. Distance of $x_p$ from mean $\bar{x}$
Width increases as distance increases
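The fourth factor can be illustrated numerically. The sketch below evaluates the half-width $t_{\alpha/2}\, s \sqrt{1/n + (x_p - \bar{x})^2 / SS_{xx}}$ at several values of $x_p$ for the advertising data, using $s = .6055$ and the critical value 3.182 from this chapter's example as given inputs; the function name `half_width` is hypothetical.

```python
import math

x = [1, 2, 3, 4, 5]
s = 0.6055        # standard error of the regression, from the example
t_crit = 3.182    # t value for 95% confidence with 3 df, from the example

n = len(x)
x_bar = sum(x) / n
ss_xx = sum((xi - x_bar) ** 2 for xi in x)

def half_width(x_p):
    """Half-width of the confidence interval for mean y at x = x_p."""
    return t_crit * s * math.sqrt(1 / n + (x_p - x_bar) ** 2 / ss_xx)

for x_p in x:
    print(f"x_p = {x_p}: distance from mean = {abs(x_p - x_bar):.0f}, "
          f"half-width = {half_width(x_p):.3f}")
```

The interval is narrowest at the mean of x and widens symmetrically as $x_p$ moves away from it in either direction.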
Why Distance from Mean?
[Figure: predicted y values show greater dispersion at $x_2$, far from $\bar{x}$, than at $x_1$]
Confidence Interval Estimate Example
You're a marketing analyst for Hasbro Toys.
You find $\hat\beta_0 = -.1$, $\hat\beta_1 = .7$ and $s = .6055$.
Ad $    Sales (Units)
1       1
2       1
3       2
4       2
5       4
Find a 95% confidence interval for the mean sales when advertising is $4.
Solution Table
       x_i    y_i    x_i^2    y_i^2    x_i y_i
       1      1      1        1        1
       2      1      4        1        2
       3      2      9        4        6
       4      2      16       4        8
       5      4      25       16       20
Sum    15     10     55       26       37
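Confidence Interval Estimate Solution
$\hat{y} \pm t_{\alpha/2}\, s \sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}}$
$\hat{y} = -.1 + (.7)(4) = 2.7$, where $x_p = 4$ is the x to be predicted
$2.7 \pm 3.182(.6055)\sqrt{\frac{1}{5} + \frac{(4 - 3)^2}{10}}$
$1.645 \le E(Y) \le 3.755$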
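The interval above can be reproduced with a short function. This is a sketch under the example's own inputs (coefficients -.1 and .7, $s = .6055$, critical value 3.182); the function name `mean_confidence_interval` and its parameter names are illustrative.

```python
import math

def mean_confidence_interval(x, b0, b1, s, t_crit, x_p):
    """Return (lower, upper) limits for the mean of y at x = x_p."""
    n = len(x)
    x_bar = sum(x) / n
    ss_xx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n
    y_hat = b0 + b1 * x_p                                        # point estimate
    margin = t_crit * s * math.sqrt(1 / n + (x_p - x_bar) ** 2 / ss_xx)
    return y_hat - margin, y_hat + margin

low, high = mean_confidence_interval([1, 2, 3, 4, 5], -0.1, 0.7, 0.6055, 3.182, 4)
print(f"{low:.3f} <= E(Y) <= {high:.3f}")
```

The limits should agree with the hand calculation to three decimal places.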
Prediction Interval of Individual Value of y at $x = x_p$
$\hat{y} \pm t_{\alpha/2}\, S \sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}}$
Note the extra 1 under the square root!
df = n - 2
Why the Extra S?
[Figure: at $x = x_p$, the individual y we're trying to predict varies about the expected (mean) y, which in turn differs from the prediction $\hat{y}$]
Prediction Interval Example
You're a marketing analyst for Hasbro Toys.
You find $\hat\beta_0 = -.1$, $\hat\beta_1 = .7$ and $s = .6055$.
Ad $    Sales (Units)
1       1
2       1
3       2
4       2
5       4
Predict the sales when advertising is $4. Use a 95% prediction interval.
Solution Table
       x_i    y_i    x_i^2    y_i^2    x_i y_i
       1      1      1        1        1
       2      1      4        1        2
       3      2      9        4        6
       4      2      16       4        8
       5      4      25       16       20
Sum    15     10     55       26       37
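Prediction Interval Solution
$\hat{y} \pm t_{\alpha/2}\, s \sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}}$
$\hat{y} = -.1 + (.7)(4) = 2.7$, where $x_p = 4$ is the x to be predicted
$2.7 \pm 3.182(.6055)\sqrt{1 + \frac{1}{5} + \frac{(4 - 3)^2}{10}}$
$.503 \le y_4 \le 4.897$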
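To see what the extra 1 under the square root does, the sketch below computes the margin of error at $x_p = 4$ with and without it, using the example's inputs. The function name `interval_margin` and the `individual` flag are hypothetical.

```python
import math

def interval_margin(x, s, t_crit, x_p, individual):
    """Margin of error at x_p; individual=True adds the extra 1 for a single y."""
    n = len(x)
    x_bar = sum(x) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    inside = 1 / n + (x_p - x_bar) ** 2 / ss_xx
    if individual:
        inside += 1          # variation of one observation about its mean
    return t_crit * s * math.sqrt(inside)

x = [1, 2, 3, 4, 5]
y_hat = -0.1 + 0.7 * 4       # point prediction at x_p = 4
pi_margin = interval_margin(x, 0.6055, 3.182, 4, individual=True)
ci_margin = interval_margin(x, 0.6055, 3.182, 4, individual=False)
print(f"prediction interval: {y_hat - pi_margin:.3f} to {y_hat + pi_margin:.3f}")
print(f"margin for individual y = {pi_margin:.3f}, margin for mean y = {ci_margin:.3f}")
```

The prediction interval is always wider than the confidence interval at the same $x_p$, because it must also cover the random error of a single observation.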
Interval Estimate Computer Output

      Dep Var   Pred     Std Err   Low95%   Upp95%   Low95%    Upp95%
Obs   SALES     Value    Predict   Mean     Mean     Predict   Predict
1     1.000     0.600    0.469     -0.892   2.092    -1.837    3.037
2     1.000     1.300    0.332      0.244   2.355    -0.897    3.497
3     2.000     2.000    0.271      1.138   2.861    -0.111    4.111
4     2.000     2.700    0.332      1.644   3.755     0.502    4.897
5     4.000     3.400    0.469      1.907   4.892     0.962    5.837

Obs 4, Pred Value: predicted y when x = 4
Std Err Predict: $S_{\hat{Y}}$
Low95% Mean and Upp95% Mean: confidence interval
Low95% Predict and Upp95% Predict: prediction interval
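Confidence Intervals versus Prediction Intervals
[Figure: confidence and prediction interval bands about the fitted line, y against x, with $\bar{x}$ marked]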