Covariance as a statistical construct is unbounded and thus difficult to interpret in its raw form
Correlation (Pearson's r) is a measure of the direction and degree of a linear association between two variables
Correlation is the standardized covariance between two variables
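$$\mathrm{cov}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$
$$r_{xy} = \frac{\mathrm{cov}(x, y)}{s_x s_y}$$
$$-1 \le r_{xy} \le 1$$
$$r_{xy} = \frac{\sum_{i=1}^{n} Z_{x_i} Z_{y_i}}{n - 1}$$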
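A minimal sketch of the formulas above, assuming small made-up data (the values of `x` and `y` and all helper names are illustrative, not from the slides). It shows that rescaling X changes the covariance but leaves r unchanged, and that the covariance form and the z-score form of r agree.

```python
import math

def mean(values):
    return sum(values) / len(values)

def sample_sd(values):
    # Sample standard deviation with the n - 1 denominator
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def covariance(x, y):
    # cov(x, y) = sum((x_i - x_bar)(y_i - y_bar)) / (n - 1)
    mx, my = mean(x), mean(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)

def pearson_r(x, y):
    # r = cov(x, y) / (s_x * s_y)
    return covariance(x, y) / (sample_sd(x) * sample_sd(y))

def pearson_r_from_z(x, y):
    # r = sum(Z_x * Z_y) / (n - 1)
    mx, my, sx, sy = mean(x), mean(y), sample_sd(x), sample_sd(y)
    zx = [(xi - mx) / sx for xi in x]
    zy = [(yi - my) / sy for yi in y]
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)

# Hypothetical data, chosen only for illustration
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
x_rescaled = [100 * xi for xi in x]  # same variable in different units

print(covariance(x, y), covariance(x_rescaled, y))  # covariance depends on units
print(pearson_r(x, y), pearson_r(x_rescaled, y))    # r does not
print(pearson_r_from_z(x, y))                       # z-score form gives the same r
```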
$$\hat{Y} = bX + a$$
[Figure: Variable X, Variable Y, Variable Z, Criterion]
The intercept is $a = \bar{Y} - b\bar{X}$, where $\bar{Y}$ and $\bar{X}$ are the means based on the sets of the Y and X values respectively, and b is the estimated slope
These calculations ensure that the regression line passes through the point on the scatterplot defined by the two means
Alternatively, slope
$$b = r \frac{s_y}{s_x}$$
so, by substituting we get
$$\hat{Y} = r \frac{s_y}{s_x} X + \bar{Y} - r \frac{s_y}{s_x} \bar{X}$$
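As a rough check on the slope and intercept formulas, the sketch below computes b from r and the two standard deviations, then confirms that the fitted line passes through the point defined by the two means. The data and the `fit_line` helper are hypothetical, introduced only for this illustration.

```python
from statistics import mean, stdev

def fit_line(x, y):
    """Return (a, b) for Y-hat = bX + a, using b = r * s_y / s_x."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sx, sy = stdev(x), stdev(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
    r = cov / (sx * sy)
    b = r * sy / sx      # slope
    a = my - b * mx      # intercept
    return a, b

# Hypothetical data, chosen only for illustration
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = fit_line(x, y)
predicted_at_mean_x = a + b * mean(x)

print(a, b)
print(predicted_at_mean_x, mean(y))  # the line passes through (mean of X, mean of Y)
```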
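Total variance = predicted variance + error variance
$$s_y^2 = s_{\hat{y}}^2 + s_{(y - \hat{y})}^2$$
$$s_y^2 = \frac{\sum (y - \bar{y})^2}{n - 1}$$
$$s_{\hat{y}}^2 = r^2 s_y^2 \qquad r^2 = \frac{s_{\hat{y}}^2}{s_y^2}$$
For Y predicted from X
$$s_{Y \cdot X} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{N - 2}} = \sqrt{\frac{SS_{residual}}{df_{residual}}}$$
$$s_{Y \cdot X} = s_Y \sqrt{(1 - R^2) \frac{N - 1}{N - 2}}$$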
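The variance partition can be checked numerically. The sketch below assumes made-up data and a least-squares fit, and assumes the error variance is computed with the same n - 1 denominator as the other two variances so that the parts sum exactly. It also compares the two forms of the standard error of estimate.

```python
import math
from statistics import mean, variance

# Hypothetical data, chosen only for illustration
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

# Least-squares slope and intercept
mx, my = mean(x), mean(y)
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
a = my - b * mx

y_hat = [a + b * xi for xi in x]
residuals = [yi - yh for yi, yh in zip(y, y_hat)]
ss_residual = sum(e ** 2 for e in residuals)

total_var = variance(y)              # s_y^2
predicted_var = variance(y_hat)      # variance of the predicted values
error_var = ss_residual / (n - 1)    # error variance, n - 1 denominator (assumption)

r_squared = predicted_var / total_var

# Standard error of estimate, two equivalent forms
see = math.sqrt(ss_residual / (n - 2))
see_from_r2 = math.sqrt(total_var * (1 - r_squared) * (n - 1) / (n - 2))

print(total_var, predicted_var + error_var)
print(r_squared)
print(see, see_from_r2)
```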
Intercept
Value of Y if X is 0
Often not meaningful, particularly if it's practically
Slope
$R^2$
Regression Example

Group   DV
0       3
0       5
0       7
0       2
0       3
1       6
1       7
1       7
1       8
1       9
Graphical display
The R-square is $.76^2 = .577$
The regression equation is $\hat{Y} = 4 + 3.4X$
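The example can be reproduced by hand. The sketch below uses the ten Group and DV values from the example and ordinary least squares written out in plain Python (the variable names are illustrative). The intercept comes out as the mean of group 0, and the slope as the difference between the two group means.

```python
import math
from statistics import mean

# Data from the regression example
group = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
dv = [3, 5, 7, 2, 3, 6, 7, 7, 8, 9]

mx, my = mean(group), mean(dv)
sxy = sum((x - mx) * (y - my) for x, y in zip(group, dv))
sxx = sum((x - mx) ** 2 for x in group)
syy = sum((y - my) ** 2 for y in dv)

slope = sxy / sxx
intercept = my - slope * mx
r = sxy / math.sqrt(sxx * syy)

mean_group0 = mean(y for x, y in zip(group, dv) if x == 0)
mean_group1 = mean(y for x, y in zip(group, dv) if x == 1)

print(round(intercept, 3), round(slope, 3))  # 4.0 3.4
print(round(r, 2), round(r ** 2, 3))         # 0.76 0.577
print(mean_group0, mean_group1)
```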
Descriptive Statistics

group   Variable   Mean     Std. Deviation
.00     dv         4.0000   2.00000
1.00    dv         7.4000   1.14018

Coefficients (Dependent Variable: dv)

Model 1      B       Std. Error   Beta   t       Sig.   Lower Bound   Upper Bound
(Constant)   4.000   .728                5.494   .001   2.321         5.679
group        3.400   1.030        .760   3.302   .011   1.026         5.774
Analysis of variance
Recall that in regression we are trying to account for the variance in the DV
That total variance reflects the sum of the squared deviations of values from the DV mean
Sums of squares regression
Error variance
ANOVA

Model 1      Sum of Squares   df   Mean Square   F        Sig.
Regression   28.900           1    28.900        10.906   .011
Residual     21.200           8    2.650
Total        50.100           9
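The sums of squares in the ANOVA table can be rebuilt from the example data. This sketch assumes the fitted equation from the example (intercept 4, slope 3.4) and computes the total, regression, and residual sums of squares and the F ratio; the variable names are illustrative.

```python
from statistics import mean

# Data from the regression example
group = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
dv = [3, 5, 7, 2, 3, 6, 7, 7, 8, 9]
n = len(dv)

# Predicted values from the fitted equation Y-hat = 4 + 3.4X
y_hat = [4 + 3.4 * x for x in group]
grand_mean = mean(dv)

ss_total = sum((y - grand_mean) ** 2 for y in dv)          # deviations from the DV mean
ss_regression = sum((yh - grand_mean) ** 2 for yh in y_hat)
ss_residual = sum((y - yh) ** 2 for y, yh in zip(dv, y_hat))

df_regression = 1        # one predictor
df_residual = n - 2      # n minus the two estimated parameters

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_ratio = ms_regression / ms_residual

print(round(ss_regression, 3), round(ss_residual, 3), round(ss_total, 3))
print(round(ms_residual, 3), round(f_ratio, 3))
```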
Source            Sum of Squares   df   Mean Square   F        Sig.   Partial Eta Squared
group             28.900           1    28.900        10.906   .011   .577
Error             21.200           8    2.650
Total             375.000          10
Corrected Total   50.100           9
Compare to regression
The t, standard error, CI and p-value are the same, and again the coefficient is the difference between means
Independent Samples Test
t-test for Equality of Means

     t        df   Sig. (2-tailed)   Mean Difference   Std. Error Difference   Lower      Upper
dv   -3.302   8    .011              -3.40000          1.02956                 -5.77418   -1.02582
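The match between the t-test and the regression can be verified with a pooled-variance t-test written out by hand. The sketch below uses the example data; the critical value `T_CRIT` (two-tailed, alpha = .05, df = 8) is hard-coded as an assumption rather than computed.

```python
import math
from statistics import mean, variance

# DV values from the regression example, split by group
g0 = [3, 5, 7, 2, 3]
g1 = [6, 7, 7, 8, 9]
n0, n1 = len(g0), len(g1)

# Pooled variance equals the residual mean square from the regression
df = n0 + n1 - 2
pooled_var = ((n0 - 1) * variance(g0) + (n1 - 1) * variance(g1)) / df

mean_diff = mean(g0) - mean(g1)
se_diff = math.sqrt(pooled_var * (1 / n0 + 1 / n1))
t_stat = mean_diff / se_diff

# Assumed critical t for df = 8, two-tailed alpha = .05
T_CRIT = 2.306004
ci_lower = mean_diff - T_CRIT * se_diff
ci_upper = mean_diff + T_CRIT * se_diff

print(round(mean_diff, 5), round(se_diff, 5), round(t_stat, 3))
print(round(ci_lower, 5), round(ci_upper, 5))
print(round(t_stat ** 2, 3))  # squared t equals the regression F
```

The sign of t and of the interval bounds is reversed relative to the regression output only because the t-test subtracts group 1 from group 0.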
Let's assume that we believe there is a linear relationship between X and Y.
Which set of parameter values will bring us closest to representing the data accurately?
$$\hat{Y} = 2 + 2X$$
$$y - \hat{y}$$
We begin by picking some values, plugging them into the equation, and seeing how well the implied values correspond to the observed values
We can quantify what we mean by "how well" by examining the difference between the model-implied Y and the actual Y value
This difference, $y - \hat{y}$, between our observed value and the one predicted is often called error in prediction, or the residual
$$\hat{Y} = 2 - 1X$$
$$\hat{Y} = 2 + 0X$$
Things are getting better, but certainly things could improve
$$\hat{Y} = 2 + 1X$$
Ah, much better
$$\hat{Y} = 2 + 2X$$
Now that's very nice
There is a perfect correspondence between the predicted values of Y and the actual values of Y
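The trial-and-error search can be sketched by scoring each candidate line with its sum of squared residuals. The slide data are not recoverable here, so the sketch assumes hypothetical data that fall exactly on Y = 2 + 2X; the helper name `sum_squared_residuals` is also illustrative.

```python
def sum_squared_residuals(intercept, slope, x, y):
    """Sum of squared differences between observed y and model-implied y."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical data lying exactly on Y = 2 + 2X (an assumption for illustration)
x = [0, 1, 2, 3, 4]
y = [2 + 2 * xi for xi in x]

# Hold the intercept at 2 and try a few slopes
results = {}
for slope in (0, 1, 2):
    results[slope] = sum_squared_residuals(2, slope, x, y)
    print(f"Y-hat = 2 + {slope}X  ->  sum of squared residuals = {results[slope]}")

best_slope = min(results, key=results.get)
print("best slope:", best_slope)
```

A smaller sum of squared residuals means the implied values sit closer to the observed values; a sum of zero is the perfect correspondence described above.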