
Engineering Statistics

Regression Analysis
Chapter 11

Learning Objectives
1. Describe the linear regression model
2. State the regression modeling steps
3. Explain the least squares method
4. Compute regression coefficients
5. Predict the response variable
6. Interpret computer output
1/8/2016

R. Ali | Regression Analysis

Curve Fitting
To establish a relationship that makes it possible to predict one or more variables in terms of others.
The problem of predicting the average value of one variable in terms of known values of another variable is called the problem of regression.
Curve Fitting
In the procedure of curve fitting we face three kinds of problems:
1. We must decide what kind of curve (equation) we want to use.
2. We must find the particular equation that is best in some sense.
3. We must investigate certain questions regarding the merits of that equation and the predictions made from it.

Models
1. Representation of some phenomenon
2. Mathematical model is a mathematical expression of some phenomenon
3. Often describe relationships between variables
4. Types
   - Deterministic models
   - Probabilistic models

Deterministic Models
1. Conjecture exact relationships
2. No prediction error
3. Example: Force is exactly mass times acceleration
   F = ma

Probabilistic Models
1. Conjecture 2 components
   - Deterministic
   - Random error
   Y = deterministic component + random error
2. Example: Sales volume is 10 times advertising spending plus random error
   $Y = 10X + \varepsilon$
   Random error may be due to factors other than advertising
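The split into a deterministic component and a random error can be sketched with a short simulation of the sales example. The normally distributed error, its standard deviation `noise_sd`, the `seed`, and the spending levels are all illustrative assumptions, not values from the slides.

```python
import random


def simulate_sales(advertising, noise_sd=2.0, seed=0):
    """Simulate Y = 10X + random error for each advertising value X.

    noise_sd and seed are illustrative assumptions, not values from the slides.
    """
    rng = random.Random(seed)
    # Deterministic component: exactly 10 times advertising spending.
    deterministic = [10 * x for x in advertising]
    # Random error: assumed normal with mean 0 (factors other than advertising).
    errors = [rng.gauss(0.0, noise_sd) for _ in advertising]
    observed = [d + e for d, e in zip(deterministic, errors)]
    return deterministic, errors, observed


advertising = [1, 2, 3, 4, 5]  # hypothetical spending levels
deterministic, errors, observed = simulate_sales(advertising)
for x, d, e, y in zip(advertising, deterministic, errors, observed):
    print(f"X={x}: deterministic={d}, error={e:+.2f}, observed Y={y:.2f}")
```

With the error term removed, the same code reduces to a deterministic model in which every observed value sits exactly on the line Y = 10X.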

Types of Probabilistic Models
Probabilistic models
   - Regression models
   - Correlation models
   - Other models

Regression Models
1. Answer "What is the relationship between the variables?"
2. Equation used
   - 1 numerical dependent (response) variable: what is to be predicted
   - 1 or more numerical or categorical independent (explanatory) variables
3. Used mainly for prediction & estimation

Regression Modeling Steps


1. Conjecture deterministic component
2. Estimate unknown model parameters
3. Specify probability distribution of random error term
   - Estimate standard deviation of error
4. Evaluate model
5. Use model for prediction & estimation

Model Specification Is Based on Theory
1. Theory of field (e.g., microeconomics)
2. Mathematical theory
3. Previous research
4. Common sense

Thinking Challenge: Which Is More Logical?
[Figure: four candidate plots of Sales against Advertising]

Types of Regression Models
Regression models
   - Simple (1 explanatory variable)
      - Linear
      - Nonlinear
   - Multiple (2+ explanatory variables)
      - Linear
      - Nonlinear

Linear Regression Model


Linear Equations
$Y = mX + b$
m = slope = (change in Y) / (change in X)
b = Y-intercept
High school knowledge

Linear Regression Model
1. Relationship between variables is a linear function
   $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
   - $Y_i$: dependent (response) variable
   - $\beta_0$: population Y-intercept
   - $\beta_1$: population slope
   - $X_i$: independent (explanatory) variable
   - $\varepsilon_i$: random error

Population & Sample Regression Models
Population (unknown relationship): $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
Random sample: $Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{\varepsilon}_i$

Population Linear Regression Model
[Figure: observed values plotted as Y against X around the line $E(Y) = \beta_0 + \beta_1 X_i$]
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
$\varepsilon_i$ = random error
$E(Y) = \beta_0 + \beta_1 X_i$

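Sample Linear Regression Model
[Figure: observed values and one unsampled observation plotted as Y against X around the fitted line]
$Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{\varepsilon}_i$
$\hat{\varepsilon}_i$ = random error
$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$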

Estimating Parameters:
Least Squares Method

Scattergram
1. Plot of all $(X_i, Y_i)$ pairs
2. Suggests how well model will fit
[Figure: scattergram of Y against X, both axes running from 0 to 60]

Thinking Challenge
How would you draw a line through the points? How do you determine which line fits best?
[Figure: scattergram of Y against X, both axes running from 0 to 60]


Least Squares Method
1. Best fit means the differences between actual Y values and predicted Y values are a minimum
   - But positive differences offset negative ones, so the differences are squared:
     $\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} \hat{\varepsilon}_i^2$
2. LS minimizes the sum of the squared differences (SSE)

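Least Squares Method Graphically
LS minimizes $\sum_{i=1}^{n} \hat{\varepsilon}_i^2 = \hat{\varepsilon}_1^2 + \hat{\varepsilon}_2^2 + \hat{\varepsilon}_3^2 + \hat{\varepsilon}_4^2$
[Figure: four observed points plotted as Y against X, with residuals $\hat{\varepsilon}_1$ to $\hat{\varepsilon}_4$ measured vertically from the fitted line $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$; for example, $Y_2 = \hat{\beta}_0 + \hat{\beta}_1 X_2 + \hat{\varepsilon}_2$]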
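The reason for squaring the differences can be seen in a small sketch. The four data points below are hypothetical, chosen only for illustration. A flat line through the mean of Y and the least squares line (which for these points works out to intercept 0 and slope 1.9) both have raw differences that sum to zero, yet their SSE values differ sharply.

```python
def residuals(xs, ys, intercept, slope):
    """Differences between actual Y and the Y predicted by a candidate line."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def sse(xs, ys, intercept, slope):
    """Sum of squared differences for a candidate line."""
    return sum(r * r for r in residuals(xs, ys, intercept, slope))


xs = [1, 2, 3, 4]  # hypothetical data, not from the slides
ys = [2, 4, 5, 8]

flat = (4.75, 0.0)  # (intercept, slope): horizontal line through the mean of Y
ls = (0.0, 1.9)     # (intercept, slope): least squares line for these points

for name, (b0, b1) in [("flat line", flat), ("least squares line", ls)]:
    total = sum(residuals(xs, ys, b0, b1))
    print(f"{name}: sum of differences = {total:.2f}, SSE = {sse(xs, ys, b0, b1):.2f}")
```

The sum of raw differences cannot tell the two lines apart, which is why the squared criterion is used instead.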

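Coefficient Equations
Prediction equation:
$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$
Sample slope:
$\hat{\beta}_1 = \dfrac{\sum_{i=1}^{n} X_i Y_i - \dfrac{\left(\sum_{i=1}^{n} X_i\right)\left(\sum_{i=1}^{n} Y_i\right)}{n}}{\sum_{i=1}^{n} X_i^2 - \dfrac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}}$
Sample Y-intercept:
$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$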
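The coefficient equations translate directly into a few lines of code. The following is a minimal sketch, not a library routine; the function name `fit_line` and the sanity-check points are hypothetical.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) from the sample slope and Y-intercept formulas."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (X, Y) pairs of equal length")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = sum_x2 - sum_x ** 2 / n
    if denominator == 0:
        raise ValueError("all X values are identical; slope is undefined")
    slope = (sum_xy - sum_x * sum_y / n) / denominator
    intercept = sum_y / n - slope * (sum_x / n)  # Y-bar minus slope times X-bar
    return intercept, slope


# Sanity check on hypothetical points lying exactly on Y = 1 + 2X.
b0, b1 = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(f"intercept = {b0:.2f}, slope = {b1:.2f}")
```

Because the check points lie exactly on a line, the formulas recover that line with no error, which is a quick way to confirm the implementation before using it on noisy data.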

Computation Table
Xi       Yi       Xi^2       Yi^2       XiYi
X1       Y1       X1^2       Y1^2       X1Y1
X2       Y2       X2^2       Y2^2       X2Y2
:        :        :          :          :
Xn       Yn       Xn^2       Yn^2       XnYn
ΣXi      ΣYi      ΣXi^2      ΣYi^2      ΣXiYi

Interpretation of Coefficients
1. Slope ($\hat{\beta}_1$)
   - Estimated Y changes by $\hat{\beta}_1$ for each 1 unit increase in X
   - If $\hat{\beta}_1 = 2$, then Sales (Y) is expected to increase by 2 for each 1 unit increase in Advertising (X)
2. Y-intercept ($\hat{\beta}_0$)
   - Average value of Y when X = 0
   - If $\hat{\beta}_0 = 4$, then average Sales (Y) is expected to be 4 when Advertising (X) is 0

Parameter Estimation Example
You're a marketing analyst for Hasbro Toys. You gather the following data:
Ad $ (X)    Sales (Units) (Y)
1           1
2           1
3           2
4           2
5           4
What is the relationship between sales & advertising?

Scattergram Sales vs. Advertising
[Figure: scattergram of Sales (vertical axis) against Advertising (horizontal axis)]

Parameter Estimation Solution Table
Xi       Yi       Xi^2       Yi^2       XiYi
1        1        1          1          1
2        1        4          1          2
3        2        9          4          6
4        2        16         4          8
5        4        25         16         20
15       10       55         26         37

Parameter Estimation Solution
$\hat{\beta}_1 = \dfrac{\sum_{i=1}^{n} X_i Y_i - \dfrac{\left(\sum_{i=1}^{n} X_i\right)\left(\sum_{i=1}^{n} Y_i\right)}{n}}{\sum_{i=1}^{n} X_i^2 - \dfrac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}} = \dfrac{37 - \dfrac{(15)(10)}{5}}{55 - \dfrac{(15)^2}{5}} = 0.70$
$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} = 2 - (0.70)(3) = -0.10$
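The arithmetic in this solution can be checked with a short script that builds the computation-table totals and applies the coefficient equations. It takes advertising as X = 1, ..., 5 and sales as Y = 1, 1, 2, 2, 4, the assignment consistent with the totals of 15 and 10 used in the solution; the variable names are illustrative.

```python
ad = [1, 2, 3, 4, 5]     # X: advertising ($)
sales = [1, 1, 2, 2, 4]  # Y: sales (units)
n = len(ad)

# Column totals of the computation table.
sum_x = sum(ad)
sum_y = sum(sales)
sum_x2 = sum(x * x for x in ad)
sum_y2 = sum(y * y for y in sales)
sum_xy = sum(x * y for x, y in zip(ad, sales))
print("totals:", sum_x, sum_y, sum_x2, sum_y2, sum_xy)

# Sample slope and Y-intercept from the coefficient equations.
slope = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
intercept = sum_y / n - slope * (sum_x / n)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```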

Coefficient Interpretation Solution
1. Slope ($\hat{\beta}_1$)
   - Sales Volume (Y) is expected to increase by 0.7 units for each $1 increase in Advertising (X)
2. Y-intercept ($\hat{\beta}_0$)
   - Average value of Sales Volume (Y) is -0.10 units when Advertising (X) is 0
   - Difficult to explain to Marketing Manager
   - Expect some sales without Advertising

Parameter Estimation Thinking Challenge
You're an economist for the county cooperative. You gather the following data:
Fertilizer (lb.)    Yield (lb.)
4                   3.0
6                   5.5
10                  6.5
12                  9.0
What is the relationship between fertilizer & crop yield?

Scattergram Crop Yield vs. Fertilizer*
[Figure: scattergram of Yield (lb.), axis from 0 to 10, against Fertilizer (lb.)]

Parameter Estimation Solution Table*
Xi       Yi       Xi^2       Yi^2       XiYi
4        3.0      16         9.00       12
6        5.5      36         30.25      33
10       6.5      100        42.25      65
12       9.0      144        81.00      108
32       24.0     296        162.50     218

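Parameter Estimation Solution*
$\hat{\beta}_1 = \dfrac{\sum_{i=1}^{n} X_i Y_i - \dfrac{\left(\sum_{i=1}^{n} X_i\right)\left(\sum_{i=1}^{n} Y_i\right)}{n}}{\sum_{i=1}^{n} X_i^2 - \dfrac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}} = \dfrac{218 - \dfrac{(32)(24)}{4}}{296 - \dfrac{(32)^2}{4}} = 0.65$
$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} = 6 - (0.65)(8) = 0.80$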
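Once the coefficients are known, the fitted line can be used for prediction and for measuring how far each observation falls from it. The sketch below uses the fertilizer data and the coefficients 0.80 and 0.65 from this solution; the helper name `predict` and the 8 lb. input are hypothetical additions for illustration.

```python
fertilizer = [4, 6, 10, 12]        # X: fertilizer (lb.)
crop_yield = [3.0, 5.5, 6.5, 9.0]  # Y: yield (lb.)
b0, b1 = 0.80, 0.65                # coefficients from the solution


def predict(x):
    """Predicted yield from the fitted line Y-hat = b0 + b1 * X."""
    return b0 + b1 * x


predicted = [predict(x) for x in fertilizer]
residuals = [y - p for y, p in zip(crop_yield, predicted)]
sse = sum(r * r for r in residuals)

for x, y, p, r in zip(fertilizer, crop_yield, predicted, residuals):
    print(f"X={x}: observed={y}, predicted={p:.2f}, residual={r:+.2f}")
print(f"sum of residuals = {sum(residuals):.2f}, SSE = {sse:.2f}")
print(f"predicted yield at 8 lb. of fertilizer = {predict(8):.2f}")
```

The residuals cancel to zero while the SSE does not, and the prediction at the mean fertilizer level (8 lb.) equals the mean yield, as expected for a least squares line.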

Coefficient Interpretation Solution*
1. Slope ($\hat{\beta}_1$)
   - Crop Yield (Y) is expected to increase by 0.65 lb. for each 1 lb. increase in Fertilizer (X)
2. Y-intercept ($\hat{\beta}_0$)
   - Average Crop Yield (Y) is expected to be 0.8 lb. when no Fertilizer (X) is used

Exercise 10.1
Why do we generally prefer a probabilistic model to a deterministic model? Give examples for which the two types of models might be appropriate.
Most variables do not have an exact relationship.
If you are trying to determine how much you will pay to rent a car, a deterministic model would be appropriate. You would pay a fixed amount plus so much per mile. Therefore, the number of miles you drive would determine how much you pay.
If you are trying to determine a person's weight based on his/her height, a probabilistic model is appropriate. You can't determine weight directly from height. There would be a deterministic component and a random error.

Exercise 10.2, 10.3

What is the line of means?

If a straight-line probabilistic relationship relates the mean E(y) to an independent variable x, does it imply that every value of the variable y will always fall exactly on the line of means? Why or why not?

Exercise 10.4
Columns 3 and 4 are for the preliminary computations to find the LS line for the given pairs of x and y values. After the LS line has been obtained, columns 5, 6, and 7 are used to compare the observed and predicted values of y and to calculate the SSE.
a. Complete columns 3 and 4 of the table. Calculate the totals for columns 1-4.
b. Find SSxy.
c. Find SSxx.
d. Find $\hat{\beta}_1$.
e. Find $\bar{x}$ and $\bar{y}$.
f. Find $\hat{\beta}_0$.
g. Find the least squares line and write it at the top of column 5.
h. Complete columns 5, 6, and 7 of the table.

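Exercise 10.4
Table columns: (1) $x_i$, (2) $y_i$, (3) $x_i^2$, (4) $x_i y_i$, (5) $\hat{y} =$ (least squares line), (6) $(y - \hat{y})$, (7) $(y - \hat{y})^2$
Totals row: $\sum x_i$, $\sum y_i$, $\sum x_i^2$, $\sum x_i y_i$, $\sum (y - \hat{y})$, $\sum (y - \hat{y})^2$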

Measures of Variation in Regression
1. Total sum of squares (SSyy)
   - Measures variation of observed $Y_i$ around the mean $\bar{Y}$
2. Explained variation (SSR)
   - Variation due to relationship between X & Y
3. Unexplained variation (SSE)
   - Variation due to other factors

Variation Measures
[Figure: a single observation $(X_i, Y_i)$ plotted with the fitted line $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$ and the mean $\bar{Y}$, showing the three deviations listed here]
Total sum of squares: $(Y_i - \bar{Y})^2$
Unexplained sum of squares: $(Y_i - \hat{Y}_i)^2$
Explained sum of squares: $(\hat{Y}_i - \bar{Y})^2$

Coefficient of Determination
1. Proportion of variation explained by relationship between X & Y
   $0 \le r^2 \le 1$
   $r^2 = \dfrac{\text{Explained Variation}}{\text{Total Variation}} = \dfrac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2 - \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}$

Coefficient of Determination Examples
[Figure: four scattergrams of Y against X with $r^2 = 1$, $r^2 = 1$, $r^2 = 0.8$, and $r^2 = 0$]

Coefficient of Determination Example
You're a marketing analyst for Hasbro Toys. You find $\hat{\beta}_0 = -0.1$ & $\hat{\beta}_1 = 0.7$.
Ad $ (X)    Sales (Units) (Y)
1           1
2           1
3           2
4           2
5           4
Interpret a coefficient of determination of 0.8167.
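The value 0.8167 can be reproduced from the data and the fitted coefficients. The sketch below computes the total, unexplained, and explained sums of squares and then the ratio of explained to total variation, taking advertising as X = 1, ..., 5 and sales as Y = 1, 1, 2, 2, 4; variable names are illustrative.

```python
ad = [1, 2, 3, 4, 5]     # X: advertising ($)
sales = [1, 1, 2, 2, 4]  # Y: sales (units)
b0, b1 = -0.1, 0.7       # fitted coefficients given in the example

mean_y = sum(sales) / len(sales)
predicted = [b0 + b1 * x for x in ad]

ss_yy = sum((y - mean_y) ** 2 for y in sales)             # total variation
sse = sum((y - p) ** 2 for y, p in zip(sales, predicted))  # unexplained variation
ssr = sum((p - mean_y) ** 2 for p in predicted)            # explained variation

r_squared = (ss_yy - sse) / ss_yy
print(f"SSyy = {ss_yy:.2f}, SSE = {sse:.2f}, SSR = {ssr:.2f}")
print(f"r^2 = {r_squared:.4f}")
```

Read as a proportion, the result says that about 82% of the variation in sales around its mean is accounted for by the linear relationship with advertising.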

Correlation Models

Correlation Models
1. Answer "How strong is the linear relationship between 2 variables?"
2. Coefficient of correlation used
   - Population coefficient denoted $\rho$ (rho)
   - Values range from -1 to +1
   - Measures degree of association
3. Used mainly for understanding

Sample Coefficient of Correlation
1. Pearson product moment coefficient of correlation, r
   $r = \sqrt{\text{Coefficient of Determination}}$ (with the sign of the slope $\hat{\beta}_1$)
   $= \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \sum_{i=1}^{n} (Y_i - \bar{Y})^2}}$
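As a rough check of the Pearson formula, the sketch below computes r for the Hasbro advertising and sales data used earlier (X = 1, ..., 5 and Y = 1, 1, 2, 2, 4) and confirms that its square matches the coefficient of determination of 0.8167. The function name `pearson_r` is illustrative.

```python
import math


def pearson_r(xs, ys):
    """Pearson product moment coefficient of correlation for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


ad = [1, 2, 3, 4, 5]     # X: advertising ($)
sales = [1, 1, 2, 2, 4]  # Y: sales (units)

r = pearson_r(ad, sales)
print(f"r = {r:.4f}, r^2 = {r * r:.4f}")
```

The positive sign of r agrees with the positive slope of 0.7 found for these data.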

Coefficient of Correlation
Equivalently: [equation missing in source]

Coefficient of Correlation Values
[Figure: scale of r running from -1.0 through -.5, 0, and +.5 to +1.0]
   - r = -1.0: perfect negative correlation
   - r = 0: no correlation
   - r = +1.0: perfect positive correlation
   - Moving from 0 toward -1.0: increasing degree of negative correlation
   - Moving from 0 toward +1.0: increasing degree of positive correlation

Coefficient of Correlation Examples
[Figure: four scattergrams of Y against X with $r = 1$, $r = -1$, $r = 0.89$, and $r = 0$]
