
3-Nov-10

Simple Regression & Correlation Analysis
Correlation and Regression
Correlation is a measure of the
degree of relatedness of two
variables.
Regression analysis is the process
of constructing a mathematical
model or function that can be used
to predict or determine one variable
by another variable.
Measures of Association
Measuring the strength of association between two variables.
Non-Linear Association
    Correlation Ratio
Linear Association
    Correlation
        Ratio/Interval Data: Pearson Corr. Coef.
        Ordinal Data: Spearman Rank Corr., Kendall's Tau, Concordant Pairs
        Nominal Data: Contingency Coeff. (Chi Sq.)
Pearson Correlation
$r_{xy} = \dfrac{SS_{xy}}{\sqrt{SS_{xx} \cdot SS_{yy}}} = \dfrac{\mathrm{Cov}(x, y)}{\sigma_x \cdot \sigma_y}$
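As a minimal sketch of the formula above, the sums of squares can be computed directly in Python. The function name `pearson_r` and the five data points are made up for illustration; they are not from the slides.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: r = SS_xy / sqrt(SS_xx * SS_yy)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products about the means
    ss_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_xx = sum((xi - mean_x) ** 2 for xi in x)
    ss_yy = sum((yi - mean_y) ** 2 for yi in y)
    return ss_xy / math.sqrt(ss_xx * ss_yy)

# Illustrative data only (hypothetical, not from the slides)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(f"r = {r:.4f}")
```

Dividing both the numerator and the denominator by n - 1 gives the equivalent covariance form, so the two expressions for r agree.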
Interpretation of Correlation
r Interpretation
1.00 Perfect Positive
.80 Strong Positive
.50 Moderate Positive
.20 Weak Positive
0.0 No Relationship
-.20 Weak Negative
-.50 Moderate Negative
-.80 Strong Negative
-1.00 Perfect Negative
Ref: Newton & Rudestam,1999, Your Statistical Consultant, Sage Publications, NY.
Testing of Correlation
Does linear correlation in a sample imply correlation in the population?

Small Sample
  t Test
    $t_{(n-2)\,df} = \dfrac{r}{\sqrt{1 - r^2}} \cdot \sqrt{n - 2}$
    Here:
    $t_{x_1,y} = \dfrac{0.50}{\sqrt{1 - 0.50^2}} \cdot \sqrt{8} = 1.63$
    $t_{x_2,y} = \dfrac{0.77}{\sqrt{1 - 0.77^2}} \cdot \sqrt{8} = 3.42$* (significant)
    $t_{x_1,x_2} = \dfrac{0.185}{\sqrt{1 - 0.185^2}} \cdot \sqrt{8} = 0.53$
    Tabulated $t_{8,\,0.025} = 2.306$
  r table (Biometrika Table)
    Rule: if $r > r_{n,\,\alpha/2}$, reject $H_0$ at $\alpha$ (n = no. of samples)
    Here, $r_{10,\,.025} = .632$; $r_{x_2,y}$ is significant

Large Sample
  $Z = \dfrac{r}{1/\sqrt{n - 1}}$
  Rule: if $z > z_{\alpha/2}$, reject $H_0$ at $\alpha$
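The small-sample t values above can be reproduced with a short sketch. The function names `correlation_t` and `correlation_z` are hypothetical; n = 10 is inferred from the 8 degrees of freedom, and the critical value 2.306 is the tabulated figure quoted on the slide.

```python
import math

def correlation_t(r, n):
    """t statistic for H0: no population correlation, with n - 2 df."""
    if n <= 2 or not -1 < r < 1:
        raise ValueError("need n > 2 and -1 < r < 1")
    return r / math.sqrt(1 - r ** 2) * math.sqrt(n - 2)

def correlation_z(r, n):
    """Large-sample statistic from the slide: Z = r / (1 / sqrt(n - 1))."""
    return r / (1 / math.sqrt(n - 1))

T_CRIT = 2.306  # tabulated t for 8 df, alpha/2 = 0.025 (value from the slide)
n = 10          # assumed sample size, inferred from df = n - 2 = 8

results = {}
for label, r in [("x1,y", 0.50), ("x2,y", 0.77), ("x1,x2", 0.185)]:
    t = correlation_t(r, n)
    results[label] = (t, abs(t) > T_CRIT)
    print(f"t[{label}] = {t:.2f}, significant: {abs(t) > T_CRIT}")
```

With r rounded to 0.77 the second statistic comes out near 3.41; the slide's 3.42 matches the unrounded r = .771 shown in the SPSS correlation matrix.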
Application of SPSS
Corr Matrix

              x1       x2       Y
x1            1.00     .184     .502
    Sig                .611     .140
x2            .184     1.00     .771*
    Sig       .611              .009
Y             .502     .771*    1.00
    Sig       .140     .009

* Significant at 1% sig. level
Regression Models
Deterministic Regression Model
$Y = \beta_0 + \beta_1 X$
Probabilistic Regression Model
$Y = \beta_0 + \beta_1 X + \varepsilon$
$\beta_0$ & $\beta_1$ are population parameters
$\beta_0$ & $\beta_1$ are estimated by sample statistics $b_0$ & $b_1$
Assumptions in Regression
The Relationship is of the Linear Form
Errors should have constant variance
Errors should be independent
The Distribution of Errors about the
Regression Line is approximately Normal
Strong relationship between the Dependent variable &
the Predictors
Predictors should be independent of each
other.
Equation of Simple Regression Line
$\hat{Y} = b_0 + b_1 X$
where:
$b_0$ = the sample intercept
$b_1$ = the sample slope
$\hat{Y}$ = the predicted value of Y
Least Squares Analysis
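$SS_{XY} = \sum (X - \bar{X})(Y - \bar{Y}) = \sum XY - \dfrac{(\sum X)(\sum Y)}{n}$
$SS_{XX} = \sum (X - \bar{X})^2 = \sum X^2 - \dfrac{(\sum X)^2}{n}$
$b_1 = \dfrac{SS_{XY}}{SS_{XX}}$
$b_0 = \bar{Y} - b_1 \bar{X} = \dfrac{\sum Y}{n} - b_1 \dfrac{\sum X}{n}$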
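A minimal sketch of the least squares computation follows, using the airline cost data. The COST values come from the Casewise Diagnostics table and the NPASS values from the axis of the final plot in these slides; pairing them in ascending order is an assumption, checked only by the fact that the result matches the SPSS coefficients. The function name `least_squares` is hypothetical.

```python
def least_squares(x, y):
    """Return (b0, b1) using the computational sums-of-squares formulas."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    sum_x, sum_y = sum(x), sum(y)
    # SS_XY = sum(XY) - (sum X)(sum Y) / n
    ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n
    # SS_XX = sum(X^2) - (sum X)^2 / n
    ss_xx = sum(xi ** 2 for xi in x) - sum_x ** 2 / n
    b1 = ss_xy / ss_xx
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1

# Airline cost example; pairing of npass with cost is inferred (see above)
npass = [61, 63, 67, 69, 70, 74, 76, 81, 86, 91, 95, 97]
cost = [4280, 4080, 4420, 4170, 4480, 4300, 4820, 4700, 5110, 5130, 5640, 5560]

b0, b1 = least_squares(npass, cost)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
```

Under that pairing the estimates agree with the SPSS Coefficients table (1569.793 and 40.702) to the precision shown there.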

Evaluating the Model
Overall significance of Model
SE of Estimate ($\sigma$)
Coefficient of Determination ($R^2$)
Significance tests of Regression
Coefficients
Residual Analysis
Coefficient of Determination
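$SS_{YY} = \sum (Y - \bar{Y})^2 = \sum Y^2 - \dfrac{(\sum Y)^2}{n}$
$SS_{YY}$ = Explained Variation + Unexplained Variation
$SS_{YY} = SSR + SSE$
$1 = \dfrac{SSR}{SS_{YY}} + \dfrac{SSE}{SS_{YY}}$
$r^2 = \dfrac{SSR}{SS_{YY}} = 1 - \dfrac{SSE}{SS_{YY}} = 1 - \dfrac{SSE}{\sum Y^2 - \dfrac{(\sum Y)^2}{n}}$
$0 \le r^2 \le 1$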
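The decomposition above can be checked numerically with the sums of squares reported in the SPSS ANOVA table for the airline example. This is a sketch with hypothetical function names; the adjusted R-squared formula is the standard one and is an addition, since the slides report the value but do not state the formula.

```python
def r_squared(ssr, sse):
    """r^2 = SSR / SS_YY, where SS_YY = SSR + SSE."""
    ss_yy = ssr + sse
    return ssr / ss_yy

def adjusted_r_squared(r2, n, k=1):
    """Standard adjustment for n observations and k predictors (not on the slides)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Values from the SPSS ANOVA table for the airline cost example
SSR = 2798031.394
SSE = 314060.272
N = 12  # total df = 11, so n = 12

r2 = r_squared(SSR, SSE)
adj_r2 = adjusted_r_squared(r2, N)
print(f"r^2 = {r2:.3f}, 1 - SSE/SS_YY = {1 - SSE / (SSR + SSE):.3f}")
print(f"adjusted r^2 = {adj_r2:.3f}")
```

Both forms of r-squared give about .899, and the adjusted value is about .889, in line with the SPSS Model Summary.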
Standard Error of the Estimate
Sum of Squares Error:
$SSE = \sum (Y - \hat{Y})^2 = \sum Y^2 - b_0 \sum Y - b_1 \sum XY$
Standard Error of the Estimate:
$S_e = \sqrt{\dfrac{SSE}{n - 2}}$
Standard Error of the Estimate
for the Airline Cost Example
Sum of Squares Error:
$SSE = \sum (Y - \hat{Y})^2 = 0.31434$
Standard Error of the Estimate:
$S_e = \sqrt{\dfrac{SSE}{n - 2}} = \sqrt{\dfrac{0.31434}{10}} = 0.1773$
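The arithmetic for the standard error can be sketched as below. The function name is hypothetical. The slide's SSE of 0.31434 appears to be on a different cost scale (apparently thousands) than the SPSS output, whose residual sum of squares is 314060.272; both are evaluated here with n = 12.

```python
import math

def standard_error_of_estimate(sse, n):
    """S_e = sqrt(SSE / (n - 2)) for simple regression."""
    if n <= 2:
        raise ValueError("need n > 2")
    return math.sqrt(sse / (n - 2))

# SSE as shown on the slide (n - 2 = 10, so n = 12)
se_slide = standard_error_of_estimate(0.31434, 12)
# Residual sum of squares as reported in the SPSS ANOVA table
se_spss = standard_error_of_estimate(314060.272, 12)

print(f"S_e (slide scale) = {se_slide:.4f}")
print(f"S_e (SPSS scale)  = {se_spss:.3f}")
```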
SPSS Output: Airline Cost Example
Model    R          R^2     Adj R^2    SE of Estimate
1        .948(a)    .899    .889       177.217

a Predictors: (Constant), npass
b Dependent Variable: cost
ANOVA

Model            Sum of Squares    df    Mean Square    F         Sig.
1  Regression    2798031.39        1     2798031.394    89.092    .000(a)
   Residual      314060.272        10    31406.027
   Total         3112091.66        11

a Predictors: (Constant), npass
b Dependent Variable: cost
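The mean squares and F ratio in this table follow from the sums of squares and degrees of freedom. A minimal sketch, with a hypothetical function name and the table's own figures as input:

```python
def anova_f(ssr, sse, df_reg, df_res):
    """Return (MSR, MSE, F), where F = MSR / MSE."""
    msr = ssr / df_reg
    mse = sse / df_res
    return msr, mse, msr / mse

# Sums of squares and df from the table above
msr, mse, f_stat = anova_f(2798031.394, 314060.272, df_reg=1, df_res=10)
print(f"MSR = {msr:.3f}, MSE = {mse:.3f}, F = {f_stat:.3f}")
```

The Sig. value would then come from the F distribution with 1 and 10 degrees of freedom; that lookup is left out here to keep the sketch dependency-free.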
Coefficients(a)

                 Unstandardized Coefficients    Standardized Coefficients
Model            B           Std. Error         Beta                         t        Sig.
1  (Constant)    1569.793    338.083                                         4.643    .001
   npass         40.702      4.312              .948                         9.439    .000

a. Dependent Variable: cost
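Each t value in the table is the coefficient divided by its standard error, and the fitted line gives predictions directly. The sketch below uses the table's figures; the helper `predict_cost` is a hypothetical name, and 61 passengers is the smallest NPASS value shown in the final plot of these slides.

```python
# (B, Std. Error) pairs from the Coefficients table
coefficients = {
    "(Constant)": (1569.793, 338.083),
    "npass": (40.702, 4.312),
}

# t = B / Std. Error
t_values = {name: b / se for name, (b, se) in coefficients.items()}
for name, t in t_values.items():
    print(f"t[{name}] = {t:.3f}")

def predict_cost(npass):
    """Predicted cost from the fitted line: b0 + b1 * npass."""
    b0 = coefficients["(Constant)"][0]
    b1 = coefficients["npass"][0]
    return b0 + b1 * npass

predicted = predict_cost(61)
print(f"predicted cost at 61 passengers = {predicted:.2f}")
```

In simple regression the square of the slope's t value equals the ANOVA F ratio (9.439 squared is about 89.09), which is a quick consistency check between the two tables.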
Casewise Diagnostics(a)

Case Number    Std. Residual    COST    Predicted Value    Residual
1              1.283            4280    4052.59            227.41
2              -.305            4080    4133.99            -53.99
3              .695             4420    4296.80            123.20
4              -1.175           4170    4378.20            -208.20
5              .345             4480    4418.90            61.10
6              -1.590           4300    4581.71            -281.71
7              .885             4820    4663.11            156.89
8              -.940            4700    4866.62            -166.62
9              .225             5110    5070.13            39.87
10             -.811            5130    5273.64            -143.64
11             1.149            5640    5436.44            203.56
12             .238             5560    5517.85            42.15

a. Dependent Variable: COST
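The Residual and Std. Residual columns can be reproduced from COST and Predicted Value. In this sketch the standardized residual is taken as the residual divided by the standard error of the estimate (177.217 from the Model Summary), which matches the tabulated values; only the first three cases are shown.

```python
SE_ESTIMATE = 177.217  # SE of Estimate from the SPSS Model Summary

# (COST, Predicted Value) for cases 1 to 3 of the table above
cases = [(4280, 4052.59), (4080, 4133.99), (4420, 4296.80)]

rows = []
for cost, predicted in cases:
    residual = cost - predicted                 # observed minus predicted
    std_residual = residual / SE_ESTIMATE       # residual in units of S_e
    rows.append((residual, std_residual))
    print(f"residual = {residual:8.2f}, std. residual = {std_residual:6.3f}")
```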
Residuals Statistics(a)

                                      Min         Max        Mean       S.D.       N
Predicted Value                       4052.59     5517.85    4724.17    504.348    12
Std. Predicted Value                  -1.332      1.574      .000       1.000      12
Standard Error of Predicted Value     51.566      98.426     70.705     16.015     12
Adjusted Predicted Value              3978.99     5499.05    4716.44    504.060    12
Residual                              -281.711    227.410    .000       168.970    12
Std. Residual                         -1.590      1.283      .000       .953       12
Stud. Residual                        -1.667      1.476      .020       1.041      12
Deleted Residual                      -309.772    301.015    7.723      202.237    12
Stud. Deleted Residual                -1.861      1.584      .011       1.093      12
Mahal. Distance                       .015        2.476      .917       .829       12
Cook's Distance                       .004        .353       .101       .119       12
Centered Leverage Value               .001        .225       .083       .075       12
Graph of Residuals
for the Airline Cost Example
[Figure: residual plot. Horizontal axis: Number of Passengers (60 to 100); vertical axis: Residual (-0.3 to 0.2)]

[Figure: COST and Unstandardized Predicted Value plotted against NPASS. Horizontal axis: NPASS (61 to 97); vertical axis: Value (4000 to 6000)]
