Regression
17-2
Regression
Regression: It is to explain the variation in one variable
(dependent variable), based on the variation in one or
more other variables (called independent variable)
Ex: To explain the variations in sales of a product based
on advertising expenses, number of sales people,
number of sales offices. Correlation & Regression are
generally performed together.
Correlation: to measure the degree of association
between two sets of quantitative data. (EX: How the
advertising expenditure correlated with other
promotional expenditure). Correlation is usually
followed by regression analysis.
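To make the two ideas concrete, the sketch below uses a small set of made-up advertising and sales figures (not from the text) to compute a correlation coefficient and fit a simple regression with NumPy.

```python
import numpy as np

# Hypothetical data: advertising expenses (X) and product sales (Y), illustrative only
adv = np.array([2.0, 3.5, 4.0, 5.5, 6.0, 7.5])
sales = np.array([20.0, 26.0, 29.0, 35.0, 38.0, 45.0])

# Correlation: degree of association between the two variables
r = np.corrcoef(adv, sales)[0, 1]

# Regression: explain variation in sales (dependent) from advertising (independent)
slope, intercept = np.polyfit(adv, sales, 1)

print(f"r = {r:.3f}")
print(f"sales = {intercept:.2f} + {slope:.2f} * advertising (approximately)")
```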
17-3
Forward selection: add one variable at a time to the model, on the basis of its F statistic.

Backward elimination: remove one variable at a time, on the basis of its F statistic.

Stepwise regression: adds variables to the model and subtracts variables from the model, on the basis of the F statistic.
17-4
Using Statistics
Lines
Any two points (A and B), or an intercept and slope (β0 and β1), define a line on a two-dimensional surface.
Slope: β1
Intercept: β0

Planes
Any three points (A, B, and C), or an intercept and the coefficients of x1 and x2 (β0, β1, and β2), define a plane in three-dimensional space.
17-5
Normal equations (two independent variables):
Σy   = nb0   + b1Σx1   + b2Σx2
Σx1y = b0Σx1 + b1Σx1²  + b2Σx1x2
Σx2y = b0Σx2 + b1Σx1x2 + b2Σx2²
17-6
Example
Y     X1    X2    X1X2   X1²    X2²    X1Y    X2Y
72    12     5      60    144     25    864    360
76    11     8      88    121     64    836    608
78    15     6      90    225     36   1170    468
70    10     5      50    100     25    700    350
68    11     3      33    121      9    748    204
80    16     9     144    256     81   1280    720
82    14    12     168    196    144   1148    984
65     8     4      32     64     16    520    260
62     8     3      24     64      9    496    186
90    18    10     180    324    100   1620    900
---   ---   ---    ----   ----   ----   ----   ----
743   123    65     869   1615    509   9382   5040
Normal equations:
743  = 10b0  + 123b1  + 65b2
9382 = 123b0 + 1615b1 + 869b2
5040 = 65b0  + 869b1  + 509b2

Solution:
b0 = 47.164942
b1 = 1.5990404
b2 = 1.1487479

Estimated regression equation:
Ŷ = 47.164942 + 1.5990404 X1 + 1.1487479 X2
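The three normal equations form a linear system in b0, b1, and b2; a short NumPy sketch (assuming NumPy is available) solves the system and reproduces the coefficients above.

```python
import numpy as np

# Coefficient matrix and right-hand side of the normal equations above
A = np.array([[10.0,   123.0,  65.0],
              [123.0, 1615.0, 869.0],
              [65.0,   869.0, 509.0]])
rhs = np.array([743.0, 9382.0, 5040.0])

b0, b1, b2 = np.linalg.solve(A, rhs)
print(b0, b1, b2)   # approximately 47.1649, 1.5990, 1.1487
```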
17-7
y = β0 + β1x1 + β2x2 + ε

where β0 is the Y-intercept of the regression surface and each βi, i = 1, 2, ..., k, is the slope of the regression surface (sometimes called the response surface) with respect to Xi.

Model assumptions:
1. ε ~ N(0, σ²), independent of other errors.
2. The variables Xi are uncorrelated with the error term.
17-8
In a simple regression model, the least-squares estimators minimize the sum of squared errors from the estimated regression line:
ŷ = b0 + b1x

In a multiple regression model, the least-squares estimators minimize the sum of squared errors from the estimated regression plane:
ŷ = b0 + b1x1 + b2x2
17-9
where Ŷ is the predicted value of Y, the value lying on the estimated regression surface. The terms bi, for i = 0, 1, ..., k, are the least-squares estimates of the population regression parameters βi.

The actual, observed value of Y is the predicted value plus an error:
yj = b0 + b1x1j + b2x2j + ... + bkxkj + e,  j = 1, ..., n.
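To see this decomposition on real numbers, the sketch below plugs the example data from slide 17-6 into the estimated equation and recovers the errors as observed minus predicted values.

```python
import numpy as np

y  = np.array([72, 76, 78, 70, 68, 80, 82, 65, 62, 90], dtype=float)
x1 = np.array([12, 11, 15, 10, 11, 16, 14,  8,  8, 18], dtype=float)
x2 = np.array([ 5,  8,  6,  5,  3,  9, 12,  4,  3, 10], dtype=float)

# Predicted values from the estimated regression surface
y_hat = 47.164942 + 1.5990404 * x1 + 1.1487479 * x2

# Observed value = predicted value + error
errors = y - y_hat
print(np.round(y_hat, 2))
print(np.round(errors, 2))
```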
17-10
17-11
r = Σ(Xi − X̄)(Yi − Ȳ) / √[ Σ(Xi − X̄)² · Σ(Yi − Ȳ)² ]

  = [ Σ(Xi − X̄)(Yi − Ȳ) / (n − 1) ] / [ √(Σ(Xi − X̄)² / (n − 1)) · √(Σ(Yi − Ȳ)² / (n − 1)) ]

  = COV(X, Y) / (Sx Sy)

where the sums run over i = 1, ..., n.
17-12
17-13
Partial Correlation
A partial correlation coefficient measures the
association between two variables after controlling for,
or adjusting for, the effects of one or more additional
variables.
r_xy.z = (r_xy − r_xz · r_yz) / √[ (1 − r_xz²)(1 − r_yz²) ]
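Translating the formula directly into Python (a minimal sketch; the function name is ours) gives the partial correlation of X and Y controlling for Z from the three pairwise correlations.

```python
from math import sqrt

def partial_corr(r_xy: float, r_xz: float, r_yz: float) -> float:
    """Partial correlation r_xy.z computed from the three pairwise correlations."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Example with made-up pairwise correlations
print(round(partial_corr(r_xy=0.60, r_xz=0.40, r_yz=0.50), 3))
```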
17-14
17-15
17-16
17-17
t = b / SE_b
17-18
Multiple Regression
The general form of the multiple regression model is as follows:

Y = β0 + β1X1 + β2X2 + β3X3 + . . . + βkXk + e

which is estimated by the regression equation:

Ŷ = a + b1X1 + b2X2 + b3X3 + . . . + bkXk
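Although the slide writes the model term by term, for any number of predictors the least-squares estimates are usually computed in matrix form. The standard expression (a general result, not shown on the slide) is:

```latex
\hat{\boldsymbol\beta} = (\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathbf{y},
\qquad
\hat{\mathbf{y}} = \mathbf{X}\hat{\boldsymbol\beta}
```

where X is the n × (k + 1) matrix of observations with a leading column of ones for the intercept.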
17-19
17-20
Stepwise Regression
The purpose of stepwise regression is to select, from a large
number of predictor variables, a small subset of variables
that account for most of the variation in the dependent or
criterion variable. In this procedure, the predictor variables
enter or are removed from the regression equation one at a
time.
17-21
Multicollinearity
17-22
Total deviation: Y − Ȳ
Regression deviation: Ŷ − Ȳ
Error deviation: Y − Ŷ

Total deviation = Regression deviation + Error deviation
SST = SSR + SSE
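Using the example data and the coefficients from slide 17-6, a short sketch confirms the identity SST = SSR + SSE up to rounding.

```python
import numpy as np

y  = np.array([72, 76, 78, 70, 68, 80, 82, 65, 62, 90], dtype=float)
x1 = np.array([12, 11, 15, 10, 11, 16, 14,  8,  8, 18], dtype=float)
x2 = np.array([ 5,  8,  6,  5,  3,  9, 12,  4,  3, 10], dtype=float)
y_hat = 47.164942 + 1.5990404 * x1 + 1.1487479 * x2

sst = np.sum((y - y.mean()) ** 2)      # total deviation, squared and summed
ssr = np.sum((y_hat - y.mean()) ** 2)  # regression deviation
sse = np.sum((y - y_hat) ** 2)         # error deviation
print(sst, ssr + sse)                  # the two totals should agree (up to rounding)
```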
17-23
Source of    Sum of     Degrees of      Mean Square                F Ratio
Variation    Squares    Freedom
Regression   SSR        k               MSR = SSR / k              F = MSR / MSE
Error        SSE        n − (k + 1)     MSE = SSE / (n − (k + 1))
Total        SST        n − 1           MST = SST / (n − 1)
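Given SSR, SSE, n, and k, the table reduces to a few lines of arithmetic; the sketch below also reads the p-value off the F distribution (assuming SciPy is available; the numbers are illustrative).

```python
from scipy import stats

def overall_f_test(ssr: float, sse: float, n: int, k: int):
    """F ratio and p-value for the overall regression, per the ANOVA table."""
    msr = ssr / k
    mse = sse / (n - (k + 1))
    f = msr / mse
    p = stats.f.sf(f, k, n - (k + 1))   # right-tail probability
    return f, p

print(overall_f_test(ssr=800.0, sse=50.0, n=10, k=2))
```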
17-24
MSE = Σ(y − ŷ)² / (n − (k + 1)) = SSE / (n − (k + 1))

s = √MSE

Errors: y − ŷ
The multiple coefficient of determination, R², measures the proportion of the variation in the dependent variable that is explained by the combination of the independent variables in the multiple regression model:

R² = SSR / SST = 1 − SSE / SST
17-25
R² = SSR / SST = 1 − SSE / SST

Example 1:
s = 1.911
R-sq = 96.1%
R-sq(adj) = 95.0%
17-26
Source of    Sum of     Degrees of      Mean Square                F Ratio
Variation    Squares    Freedom
Regression   SSR        k               MSR = SSR / k              F = MSR / MSE
Error        SSE        n − (k + 1)     MSE = SSE / (n − (k + 1))
Total        SST        n − 1           MST = SST / (n − 1)

R² = SSR / SST = 1 − SSE / SST

Adjusted R² = 1 − MSE / MST = 1 − [SSE / (n − (k + 1))] / [SST / (n − 1)] = 1 − (1 − R²)(n − 1) / (n − (k + 1))
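The adjusted R² line is easy to verify numerically; the sketch below applies it to the R-sq of 96.1% from Example 1, which with n = 10 observations and k = 2 predictors (consistent with the earlier example) reproduces R-sq(adj) of about 95.0%.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - (k + 1))."""
    return 1 - (1 - r2) * (n - 1) / (n - (k + 1))

print(round(adjusted_r2(r2=0.961, n=10, k=2), 3))   # ~0.95
```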
17-27
Test statistic for H0: βi = 0:

t(n − (k + 1)) = (bi − 0) / s(bi)
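The corresponding two-sided p-value comes from the t distribution with n − (k + 1) degrees of freedom; a small sketch (SciPy assumed available; the standard error shown is illustrative):

```python
from scipy import stats

def slope_t_test(b_i: float, se_b_i: float, n: int, k: int):
    """t statistic and two-sided p-value for H0: beta_i = 0."""
    t = (b_i - 0) / se_b_i
    df = n - (k + 1)
    p = 2 * stats.t.sf(abs(t), df)
    return t, p

print(slope_t_test(b_i=1.599, se_b_i=0.28, n=10, k=2))
```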
17-28
It appears that the residuals are randomly distributed, with no pattern and with equal variance, as M1 increases.
17-29
It appears that the residuals are increasing as the Price increases. The
variance of the residuals is not constant.
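Plots like these are straightforward to reproduce; a minimal Matplotlib sketch (the data are placeholders) plots residuals against a predictor to check for patterns and for non-constant variance.

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder data: predictor values and residuals from some fitted model
x_pred = np.array([10, 12, 15, 18, 20, 22, 25, 28, 30, 33], dtype=float)
residuals = np.array([0.5, -0.8, 0.3, -0.2, 1.1, -1.0, 0.4, -0.6, 0.9, -0.5])

plt.scatter(x_pred, residuals)
plt.axhline(0, linestyle="--")   # reference line at zero
plt.xlabel("Predictor")
plt.ylabel("Residual")
plt.title("Residuals vs. predictor")
plt.show()
```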
17-30
17-31
Outliers and Influential Observations

[Figure: scatterplots illustrating (a) the regression line with and without an outlier, and (b) an influential observation: the regression line when all data are included versus no relationship within the main cluster alone.]
17-32
[Figure: scatterplot in which a more appropriate curvilinear relationship becomes apparent when the in-between data are known.]
17-33
Stepwise Regression (flowchart):
1. Compute the F statistic for each variable not in the model.
2. If none is significant, stop.
3. Otherwise, enter the most significant variable (smallest p-value) into the model.
4. Check whether any variable already in the model should be removed, then return to step 1.
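A bare-bones forward pass of this procedure, sketched with statsmodels (assuming it is installed; entry here is by smallest p-value below a threshold rather than an explicit F table, and the removal step is omitted for brevity):

```python
import numpy as np
import statsmodels.api as sm

def forward_select(y, X, candidates, alpha_enter=0.05):
    """Greedy forward selection: add the most significant candidate at each pass."""
    selected = []
    while True:
        best_p, best_var = None, None
        for var in candidates:
            if var in selected:
                continue
            cols = selected + [var]
            model = sm.OLS(y, sm.add_constant(X[:, cols])).fit()
            p = model.pvalues[-1]        # p-value of the newly added variable
            if best_p is None or p < best_p:
                best_p, best_var = p, var
        if best_p is None or best_p >= alpha_enter:
            break                        # no remaining variable is significant: stop
        selected.append(best_var)
    return selected
```

Here `y` is the response vector, `X` a NumPy matrix of predictors, and `candidates` a list of column indices to consider.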
17-34
Partial F test:

F(r, n − (k + 1)) = [ (SSE_R − SSE_F) / r ] / MSE_F

where SSE_R is the sum of squared errors of the reduced model, SSE_F is the sum of squared errors of the full model, MSE_F is the mean square error of the full model [MSE_F = SSE_F / (n − (k + 1))], and r is the number of variables dropped from the full model.
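Translated directly into a function (names follow the slide: SSE of the reduced and full models, and r dropped variables; the numbers below are illustrative):

```python
def partial_f(sse_reduced: float, sse_full: float, n: int, k: int, r: int) -> float:
    """Partial F statistic for dropping r variables from the full model."""
    mse_full = sse_full / (n - (k + 1))
    return ((sse_reduced - sse_full) / r) / mse_full

print(round(partial_f(sse_reduced=80.0, sse_full=50.0, n=30, k=4, r=2), 3))
```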
17-35
[Table: data for 12 respondents, with columns for Duration of Residence and Importance Attached to Weather; the X and Y values used in the computation appear on the next slide.]
17-36
X̄ = ΣXi / n = (10 + 12 + 12 + 4 + 12 + 6 + 8 + 2 + 18 + 9 + 17 + 2)/12 = 9.333
Ȳ = ΣYi / n = (6 + 9 + 8 + 3 + 10 + 4 + 5 + 2 + 11 + 9 + 10 + 2)/12 = 6.583

Σ(Xi − X̄)(Yi − Ȳ) = 179.6668
17-37
r = Σ(Xi − X̄)(Yi − Ȳ) / √[ Σ(Xi − X̄)² · Σ(Yi − Ȳ)² ]

Thus,
r = 179.6668 / √((304.6668)(120.9168)) = 0.9361
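The same value follows in a few lines with NumPy, using the X and Y values listed on slide 17-36.

```python
import numpy as np

x = np.array([10, 12, 12, 4, 12, 6, 8, 2, 18, 9, 17, 2], dtype=float)
y = np.array([ 6,  9,  8, 3, 10, 4, 5, 2, 11, 9, 10, 2], dtype=float)

num = np.sum((x - x.mean()) * (y - y.mean()))
den = np.sqrt(np.sum((x - x.mean()) ** 2) * np.sum((y - y.mean()) ** 2))
print(round(num / den, 4))                 # 0.9361
print(round(np.corrcoef(x, y)[0, 1], 4))   # same value via np.corrcoef
```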
17-38
r² = (Total variation − Error variation) / Total variation
   = (SSy − SSerror) / SSy