Linear regression

Choosing a statistical test: continuous outcome (means)

Outcome variable: continuous (e.g., pain scale, cognitive function)

Are the observations independent or correlated?

Independent:
- T-test: compares means between two independent groups
- ANOVA: compares means between more than two independent groups
- Pearson's correlation coefficient (linear correlation): shows linear correlation between two continuous variables
- Linear regression: multivariate regression technique used when the outcome is continuous; gives slopes

Correlated:
- Paired t-test: compares means between two related groups (e.g., the same subjects before and after)
- Repeated-measures ANOVA: compares changes over time in the means of two or more groups (repeated measurements)
- Mixed models/GEE modeling: multivariate regression techniques to compare changes over time between two or more groups; gives rate of change over time

Alternatives if the normality assumption is violated (and small sample size): non-parametric statistics
- Wilcoxon signed-rank test: non-parametric alternative to the paired t-test
- Wilcoxon rank-sum test (= Mann-Whitney U test): non-parametric alternative to the t-test
- Kruskal-Wallis test: non-parametric alternative to ANOVA
- Spearman rank correlation coefficient: non-parametric alternative to Pearson's correlation coefficient
Recall: Covariance

cov(x,y) = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{n-1}

Interpreting covariance:
cov(X,Y) > 0: X and Y are positively correlated
cov(X,Y) < 0: X and Y are inversely correlated
cov(X,Y) = 0: X and Y are uncorrelated (note that zero covariance does not by itself guarantee independence)

Correlation coefficient: a unit-less (standardized) measure of linear correlation.
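The covariance formula above can be checked with a short sketch in plain Python; the data points here are invented purely for illustration:

```python
# Sample covariance from the definition on this slide:
# cov(x, y) = sum((x_i - xbar)(y_i - ybar)) / (n - 1)

def sample_cov(x, y):
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    return sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)

# Invented illustrative data:
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]
print(sample_cov(x, y))  # positive, so x and y are positively correlated
```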
Slide from: Statistics for Managers Using Microsoft Excel, 4th Edition, 2004, Prentice-Hall
Linear Correlation
[Figure: scatterplots contrasting linear relationships with curvilinear relationships]
[Figure: scatterplots contrasting strong relationships with weak relationships]
[Figure: scatterplots showing no relationship]
Calculating by hand:

\hat{r} = \frac{\mathrm{cov}(x,y)}{\sqrt{\mathrm{var}\,x \cdot \mathrm{var}\,y}} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})/(n-1)}{\sqrt{\left[\sum_{i=1}^{n}(x_i-\bar{x})^2/(n-1)\right]\left[\sum_{i=1}^{n}(y_i-\bar{y})^2/(n-1)\right]}}
Simpler calculation formula (the 1/(n-1) terms cancel):

\hat{r} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2 \sum_{i=1}^{n}(y_i-\bar{y})^2}} = \frac{SS_{xy}}{\sqrt{SS_{xx}\,SS_{yy}}}

Here SS_{xy} is the numerator of the covariance, and SS_{xx} and SS_{yy} are the numerators of the variances.
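The simpler SS form can be sketched directly in Python; the data points are invented for illustration:

```python
import math

# Pearson's r via the simpler formula: r = SS_xy / sqrt(SS_xx * SS_yy)
def pearson_r_ss(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    ss_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - xbar) ** 2 for xi in x)
    ss_yy = sum((yi - ybar) ** 2 for yi in y)
    return ss_xy / math.sqrt(ss_xx * ss_yy)

# Invented illustrative data:
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]
print(pearson_r_ss(x, y))
```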
Distribution of the correlation coefficient:

SE(\hat{r}) = \sqrt{\frac{1-\hat{r}^2}{n-2}}

*Note: like a proportion, the variance of the correlation coefficient depends on the correlation coefficient itself, so substitute in the estimated \hat{r}.
The sample correlation coefficient follows a T-distribution with n-2 degrees of freedom (since you have to estimate the standard error).
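A small sketch of the standard-error and T computation; the values r = 0.5 and n = 20 are assumed purely for illustration:

```python
import math

# SE(r) = sqrt((1 - r^2) / (n - 2)); T = r / SE(r) follows T with n-2 df
def r_t_test(r, n):
    se = math.sqrt((1 - r ** 2) / (n - 2))
    return se, r / se

se, t = r_t_test(0.5, 20)  # illustrative: r = 0.5 estimated from n = 20 pairs
print(se, t)
```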
Linear regression

In correlation, the two variables are treated as equals. In regression, one variable is considered the independent (= predictor) variable (X) and the other the dependent (= outcome) variable (Y).

What is "linear"?
Remember this: Y = mX + B? (m is the slope, B is the intercept.)

What's slope?
A slope of 2 means that every 1-unit change in X yields a 2-unit change in Y.

Prediction
If you know something about X, this knowledge helps you predict something about Y. (Sound familiar? ...sound like conditional probabilities?)
Regression equation:

E(y_i \mid x_i) = \alpha + \beta x_i

Expected value of y at a given level of x: fixed, exactly on the line.

Predicted value for an individual:

y_i = \alpha + \beta x_i + \varepsilon_i

where the random error \varepsilon_i follows a normal distribution.
Assumptions (or the fine print)

SS_{residual}: the variance around the regression line, i.e., the additional variability not explained by x; this is what the least-squares method aims to minimize.

\sum_{i=1}^{n}(y_i-\bar{y})^2 = \sum_{i=1}^{n}(\hat{y}_i-\bar{y})^2 + \sum_{i=1}^{n}(y_i-\hat{y}_i)^2

(total sum of squares = regression sum of squares + residual sum of squares)
Regression Picture

R^2 = \frac{SS_{reg}}{SS_{total}}
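The sum-of-squares decomposition and R² can be verified numerically. This sketch fits the least-squares line (using the slope and intercept formulas from the slides that follow) to invented data and checks that SS_total = SS_reg + SS_residual:

```python
# Verify SS_total = SS_reg + SS_residual and R^2 = SS_reg / SS_total

def fit(x, y):
    # Least-squares slope and intercept
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    beta = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
           sum((xi - xbar) ** 2 for xi in x)
    alpha = ybar - beta * xbar
    return alpha, beta

# Invented illustrative data:
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]
alpha, beta = fit(x, y)
ybar = sum(y) / len(y)
yhat = [alpha + beta * xi for xi in x]

ss_total = sum((yi - ybar) ** 2 for yi in y)
ss_reg = sum((yh - ybar) ** 2 for yh in yhat)
ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))
print(ss_total, ss_reg + ss_res)  # equal, up to floating-point rounding
print(ss_reg / ss_total)          # R^2, the square of Pearson's r
```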
Recall example: cognitive function and vitamin D

The least-squares estimates come from setting the derivative of the residual sum of squares to zero:

\frac{d}{d\beta}\sum_{i=1}^{n}\left(y_i - (\alpha + \beta x_i)\right)^2 = -2\sum_{i=1}^{n} x_i\left(y_i - \alpha - \beta x_i\right) = 0
Resulting formulas:

Slope (beta coefficient):

\hat{\beta} = \frac{\mathrm{Cov}(x,y)}{\mathrm{Var}(x)}

Intercept: calculate \hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}

The regression line always goes through the point (\bar{x}, \bar{y}).
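These formulas in a minimal Python sketch on invented data, checking that the fitted line passes through (x̄, ȳ) and anticipating the relationship β̂ = r̂·SD_y/SD_x:

```python
import math

# Invented illustrative data:
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
cov = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)
var_x = sum((xi - xbar) ** 2 for xi in x) / (n - 1)
var_y = sum((yi - ybar) ** 2 for yi in y) / (n - 1)

beta = cov / var_x           # slope = Cov(x, y) / Var(x)
alpha = ybar - beta * xbar   # intercept
print(beta, alpha)

# The line passes through the point of means:
print(abs(alpha + beta * xbar - ybar) < 1e-12)

# Relationship with correlation: beta = r * SD_y / SD_x
r = cov / math.sqrt(var_x * var_y)
print(abs(beta - r * math.sqrt(var_y) / math.sqrt(var_x)) < 1e-12)
```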
Relationship with correlation

\hat{\beta} = \hat{r}\,\frac{SD_y}{SD_x}
Example: dataset C

SD_x = 33 nmol/L
SD_y = 10 points
Cov(x,y) = 163 points*nmol/L

\hat{\beta} = \frac{163}{33^2} = 0.15 points per nmol/L (= 1.5 points per 10 nmol/L)

\hat{r} = \frac{163}{10 \times 33} = 0.49

or, equivalently: \hat{r} = 0.15 \times \frac{33}{10} = 0.49
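A quick arithmetic check of this example, taking SD_x = 33 nmol/L, SD_y = 10 points, and Cov(x,y) = 163 as read from the slide:

```python
# Dataset C arithmetic: beta = Cov/SD_x^2, r = Cov/(SD_x*SD_y) = beta*SD_x/SD_y
sd_x, sd_y, cov_xy = 33.0, 10.0, 163.0  # values as read from the slide

beta = cov_xy / sd_x ** 2
r = cov_xy / (sd_x * sd_y)
print(round(beta, 2))                # 0.15 points per nmol/L
print(round(r, 2))                   # 0.49
print(round(beta * sd_x / sd_y, 2))  # 0.49 again: r = beta * SD_x / SD_y
```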
Significance testing: slope

Distribution of the slope: T_{n-2}(\beta, s.e.(\hat{\beta}))

H_0: \beta_1 = 0 (no linear relationship)
H_a: \beta_1 \neq 0 (linear relationship does exist)

T_{n-2} = \frac{\hat{\beta} - 0}{s.e.(\hat{\beta})}

Formula for the standard error of beta (you will not have to calculate this by hand!):

s.e.(\hat{\beta}) = \sqrt{\frac{s_{y/x}^2}{SS_x}}, \quad \text{where } s_{y/x}^2 = \frac{\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}{n-2}, \; SS_x = \sum_{i=1}^{n}(x_i-\bar{x})^2, \; \text{and } \hat{y}_i = \hat{\alpha} + \hat{\beta}x_i
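These standard-error and T formulas in a short Python sketch, on invented data:

```python
import math

# s_{y|x}^2 = sum((y_i - yhat_i)^2)/(n-2); SS_x = sum((x_i - xbar)^2)
# s.e.(beta) = sqrt(s_{y|x}^2 / SS_x); T = beta / s.e.(beta) ~ T(n-2)
def slope_t(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    ss_x = sum((xi - xbar) ** 2 for xi in x)
    beta = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / ss_x
    alpha = ybar - beta * xbar
    s2 = sum((yi - (alpha + beta * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    se = math.sqrt(s2 / ss_x)
    return beta, se, beta / se

# Invented illustrative data:
beta, se, t = slope_t([1, 2, 3, 4, 5], [2, 4, 5, 4, 6])
print(beta, se, t)
```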
Example: dataset C

\hat{\beta} = 0.15; the slope is statistically significantly different from zero (p < 0.001).
Residual Analysis for Linearity
[Figure: scatterplots of Y vs. x with their residual plots; a curved pattern in the residuals indicates a non-linear relationship, a patternless band indicates linearity]
Residual Analysis for Homoscedasticity
[Figure: residual plots contrasting non-constant variance with constant variance]
Residual Analysis for Independence
[Figure: residual plots contrasting residuals that are not independent with independent residuals]
Residual plot, dataset C
Multiple linear regression:

Functions of multivariate analysis:
- Improve predictions

A t-test is linear regression!

Interpretation:
- Multi-collinearity
- Residual confounding
- Overfitting

Multicollinearity
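The claim that a t-test is linear regression can be illustrated with a sketch: regressing the outcome on a 0/1 group indicator gives an intercept equal to group 0's mean and a slope equal to the difference in group means. The groups below are invented numbers:

```python
# Regress y on a 0/1 group indicator and compare with the group means.
group0 = [4.0, 5.0, 6.0]
group1 = [7.0, 9.0, 8.0]
x = [0] * len(group0) + [1] * len(group1)
y = group0 + group1

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
beta = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
       sum((xi - xbar) ** 2 for xi in x)
alpha = ybar - beta * xbar

mean0 = sum(group0) / len(group0)
mean1 = sum(group1) / len(group1)
# alpha equals the mean of group 0; beta equals mean1 - mean0,
# which is exactly the quantity a two-sample t-test examines
print(alpha, beta, mean0, mean1 - mean0)
```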