
Linear correlation and linear regression

Continuous outcome (means)

Outcome variable: Continuous (e.g., pain scale, cognitive function)

Are the observations independent or correlated?

If independent:
- T-test: compares means between two independent groups
- ANOVA: compares means between more than two independent groups
- Pearson's correlation coefficient (linear correlation): shows linear correlation between two continuous variables
- Linear regression: multivariate regression technique used when the outcome is continuous; gives slopes

If correlated:
- Paired T-test: compares means between two related groups (e.g., the same subjects before and after)
- Repeated-measures ANOVA: compares changes over time in the means of two or more groups (repeated measurements)
- Mixed models/GEE modeling: multivariate regression techniques to compare changes over time between two or more groups; gives rate of change over time

Alternatives if the normality assumption is violated (and small sample size) -- non-parametric statistics:
- Wilcoxon signed-rank test: non-parametric alternative to the paired T-test
- Wilcoxon rank-sum test (= Mann-Whitney U test): non-parametric alternative to the T-test
- Kruskal-Wallis test: non-parametric alternative to ANOVA
- Spearman rank correlation coefficient: non-parametric alternative to Pearson's correlation coefficient

Recall: Covariance

$$\mathrm{cov}(x,y)=\frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{n-1}$$

Interpreting Covariance
- cov(X,Y) > 0 : X and Y are positively correlated
- cov(X,Y) < 0 : X and Y are inversely correlated
- cov(X,Y) = 0 : X and Y are uncorrelated (independence implies zero covariance, but the converse need not hold)
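A minimal sketch (my addition, not from the slides) of this formula on made-up numbers, checked against numpy's built-in estimator:

```python
# Sample covariance by hand: cov(x,y) = sum((x_i - xbar)(y_i - ybar)) / (n - 1)
import numpy as np

x = np.array([10.0, 20.0, 30.0, 40.0, 50.0])   # made-up X values
y = np.array([12.0, 18.0, 33.0, 39.0, 48.0])   # made-up Y values

n = len(x)
cov_by_hand = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)

print(cov_by_hand)          # positive: x and y rise together
print(np.cov(x, y)[0, 1])   # numpy uses the same n - 1 denominator by default
```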

Correlation coefficient

Pearson's correlation coefficient is standardized covariance (unitless):

$$r=\frac{\mathrm{cov}(x,y)}{\sqrt{\mathrm{var}(x)\,\mathrm{var}(y)}}$$

Correlation
- Measures the relative strength of the linear relationship between two variables
- Unitless
- Ranges between -1 and +1
- The closer to -1, the stronger the negative linear relationship
- The closer to +1, the stronger the positive linear relationship
- The closer to 0, the weaker any linear relationship

Scatter Plots of Data with Various Correlation Coefficients

[Figure: scatter plots of Y vs. X illustrating r = -1, r = -.6, r = 0, r = +.3, r = +1, and r = 0]

Slide from: Statistics for Managers Using Microsoft Excel, 4th Edition, 2004, Prentice-Hall

Linear Correlation

[Figure: scatter plots of Y vs. X contrasting linear relationships with curvilinear relationships]

Slide from: Statistics for Managers Using Microsoft Excel, 4th Edition, 2004, Prentice-Hall

Linear Correlation

[Figure: scatter plots of Y vs. X contrasting strong relationships with weak relationships]

Slide from: Statistics for Managers Using Microsoft Excel, 4th Edition, 2004, Prentice-Hall

Linear Correlation

[Figure: scatter plots of Y vs. X showing no relationship]

Slide from: Statistics for Managers Using Microsoft Excel, 4th Edition, 2004, Prentice-Hall

Calculating by hand...

$$\hat{r}=\frac{\widehat{\mathrm{cov}}(x,y)}{\sqrt{\widehat{\mathrm{var}}(x)\,\widehat{\mathrm{var}}(y)}}=\frac{\dfrac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{n-1}}{\sqrt{\dfrac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1}}\;\sqrt{\dfrac{\sum_{i=1}^{n}(y_i-\bar{y})^2}{n-1}}}$$
Simpler calculation formula...

The n - 1 terms cancel, leaving the numerator of the covariance over the numerators of the variances:

$$\hat{r}=\frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2\;\sum_{i=1}^{n}(y_i-\bar{y})^2}}=\frac{SS_{xy}}{\sqrt{SS_x\,SS_y}}$$
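A small check of the shortcut (my own sketch, with invented data): r computed from the three sums of squares matches numpy's correlation:

```python
# r = SS_xy / sqrt(SS_x * SS_y); the n - 1 denominators cancel.
import numpy as np

x = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
y = np.array([12.0, 18.0, 33.0, 39.0, 48.0])

ss_xy = np.sum((x - x.mean()) * (y - y.mean()))   # numerator of covariance
ss_x  = np.sum((x - x.mean()) ** 2)               # numerator of var(x)
ss_y  = np.sum((y - y.mean()) ** 2)               # numerator of var(y)

r = ss_xy / np.sqrt(ss_x * ss_y)
print(r, np.corrcoef(x, y)[0, 1])                 # should agree
```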

Distribution of the correlation coefficient:

$$SE(\hat{r})=\sqrt{\frac{1-r^2}{n-2}}$$

The sample correlation coefficient follows a T-distribution with n-2 degrees of freedom (since you have to estimate the standard error).

*Note: like a proportion, the variance of the correlation coefficient depends on the correlation coefficient itself; substitute in the estimated r.
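A hedged sketch of the significance test this implies, on simulated data (my own, not the course dataset); scipy's built-in p-value should reproduce the hand-computed T(n-2) p-value:

```python
# Hand-compute the T(n-2) test for r via SE(r) = sqrt((1 - r^2)/(n - 2))
# and compare with scipy.stats.pearsonr.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(size=30)
y = 0.5 * x + rng.normal(size=30)
n = len(x)

r, p_scipy = stats.pearsonr(x, y)
se_r = np.sqrt((1 - r**2) / (n - 2))
t = r / se_r
p_by_hand = 2 * stats.t.sf(abs(t), df=n - 2)
print(t, p_by_hand, p_scipy)   # the two p-values should agree
```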


Linear regression

In correlation, the two variables are treated as equals. In regression, one variable is considered the independent (= predictor) variable (X) and the other the dependent (= outcome) variable (Y).

What is "Linear"?

Remember this: Y = mX + B? Here m is the slope and B is the intercept.

What's Slope?

A slope of 2 means that every 1-unit change in X yields a 2-unit change in Y.

Prediction

If you know something about X, this knowledge helps you predict something about Y. (Sound familiar?... sounds like conditional probabilities?)

Regression equation:

$$E(y_i)=\alpha+\beta x_i$$

Expected value of y at a given level of x.

Predicted value for an individual:

$$y_i=\alpha+\beta x_i+\varepsilon_i$$

where the random error ε_i follows a normal distribution, and the fixed part α + βx_i sits exactly on the line.

Assumptions (or the fine print)

Linear regression assumes that:
1. The relationship between X and Y is linear
2. Y is distributed normally at each value of X
3. The variance of Y at every value of X is the same (homogeneity of variances)
4. The observations are independent

The standard error of Y given X (σ_y|x) is the average variability around the regression line at any given value of X. It is assumed to be equal at all values of X.

Regression Picture

[Figure: scatter of y vs. x showing the fitted regression line ŷ_i = α̂ + β̂x_i, the naive mean ȳ, and an observation y_i; the labeled distances A, B, and C mark the three sources of variability.]

$$\sum_{i=1}^{n}(y_i-\bar{y})^2=\sum_{i=1}^{n}(\hat{y}_i-\bar{y})^2+\sum_{i=1}^{n}(y_i-\hat{y}_i)^2$$

- A² = SS_total: total squared distance of observations from the naive mean of y (total variation).
- B² = SS_reg: distance from the regression line to the naive mean of y (variability due to x, i.e., the regression).
- C² = SS_residual: variance around the regression line (additional variability not explained by x; what the least squares method aims to minimize).

*Least squares estimation gave us the line (β̂) that minimized C².

$$R^2=\frac{SS_{reg}}{SS_{total}}$$

Recall example: cognitive function and vitamin D
- Hypothetical data loosely based on [1]; cross-sectional study of 100 middle-aged and older European men.
- Cognitive function is measured by the Digit Symbol Substitution Test (DSST).

1. Lee DM, Tajar A, Ulubaev A, et al. Association between 25-hydroxyvitamin D levels and cognitive performance in middle-aged and older European men. J Neurol Neurosurg Psychiatry. 2009 Jul;80(7):722-9.

Distribution of vitamin D
- Mean = 63 nmol/L
- Standard deviation = 33 nmol/L

Distribution of DSST
- Normally distributed
- Mean = 28 points
- Standard deviation = 10 points

Four hypothetical datasets

I generated four hypothetical datasets, with increasing TRUE slopes (between vit D and DSST):
- 0 points per 10 nmol/L
- 0.5 points per 10 nmol/L
- 1.0 points per 10 nmol/L
- 1.5 points per 10 nmol/L

Dataset 1: no relationship
Dataset 2: weak relationship
Dataset 3: weak to moderate relationship
Dataset 4: moderate relationship
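A sketch of how one could generate such datasets (my own simulation, using the means and SDs given above; the instructor's actual code is not shown in the slides):

```python
# Simulate vitamin D ~ Normal(63, 33) and DSST with a chosen true slope,
# then confirm the fitted slope roughly recovers the truth.
import numpy as np

rng = np.random.default_rng(42)
n = 100
true_slopes = [0.0, 0.05, 0.10, 0.15]   # points per nmol/L (0 to 1.5 per 10 nmol/L)

vit_d = rng.normal(63, 33, size=n)
for slope in true_slopes:
    dsst = 28 + slope * (vit_d - 63) + rng.normal(0, 10, size=n)
    b = np.polyfit(vit_d, dsst, 1)[0]
    print(f"true slope {slope:.2f}, fitted slope {b:.2f} points per nmol/L")
```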

The "Best fit" line

Regression equation: E(Y_i) = 28 + 0 × vit D_i (in 10 nmol/L)

The "Best fit" line

Regression equation: E(Y_i) = 26 + 0.5 × vit D_i (in 10 nmol/L)

Note how the line is a little deceptive; it draws your eye, making the relationship appear stronger than it really is!

The "Best fit" line

Regression equation: E(Y_i) = 22 + 1.0 × vit D_i (in 10 nmol/L)

The "Best fit" line

Regression equation: E(Y_i) = 20 + 1.5 × vit D_i (in 10 nmol/L)

Note: all the lines go through the point (63, 28)!

Estimating the intercept and slope: least squares estimation

** Least Squares Estimation

A little calculus...

What are we trying to estimate? β, the slope. What's the constraint? We are trying to minimize the squared distance (hence the "least squares") between the observations themselves and the predicted values (also called the "residuals", or left-over unexplained variability):

Difference_i = y_i - (βx_i + α)     Difference_i² = (y_i - (βx_i + α))²

Find the β that gives the minimum sum of the squared differences. How do you minimize a function? Take the derivative, set it equal to zero, and solve. A typical max/min problem from calculus...

$$\frac{d}{d\beta}\sum_{i=1}^{n}\bigl(y_i-(\beta x_i+\alpha)\bigr)^2=\sum_{i=1}^{n}2\bigl(y_i-(\beta x_i+\alpha)\bigr)(-x_i)=-2\sum_{i=1}^{n}\bigl(x_iy_i-\beta x_i^{2}-\alpha x_i\bigr)=0\;...$$

From here it takes a little math trickery to solve for β.

Resulting formulas:

Slope (beta coefficient):

$$\hat{\beta}=\frac{\mathrm{Cov}(x,y)}{\mathrm{Var}(x)}$$

Intercept: calculate

$$\hat{\alpha}=\bar{y}-\hat{\beta}\bar{x}$$

The regression line always goes through the point (x̄, ȳ).
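A minimal sketch of these formulas on simulated data (my own numbers); numpy's fitted line should agree:

```python
# Slope = cov(x,y)/var(x); intercept = ybar - slope * xbar.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(63, 33, size=100)                 # vitamin D-like predictor
y = 20 + 0.15 * x + rng.normal(0, 10, size=100)  # DSST-like outcome

beta = np.cov(x, y)[0, 1] / np.var(x, ddof=1)    # slope
alpha = y.mean() - beta * x.mean()               # line passes through (xbar, ybar)

print(beta, alpha)
print(np.polyfit(x, y, 1))                       # [slope, intercept] -- should agree
```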

/elationship "ith
correlation
y
x
SD
SD
r
#
#
=
+n correlation, the t-o varia.les are treated as e8'als* +n regression, one varia.le is considered
inde%endent (2%redictor) varia.le (X) and the other the de%endent (2o'tcome) varia.le Y*

Example: dataset 4
- SD_x = 33 nmol/L
- SD_y = 10 points
- Cov(x,y) = 163 points*nmol/L
- Beta = 163/33² = 0.15 points per nmol/L = 1.5 points per 10 nmol/L
- r = 163/(10 × 33) = 0.49
- Or r = 0.15 × (33/10) = 0.49
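A quick check of the dataset 4 arithmetic (numbers as given on the slide):

```python
# beta = cov/var(x), and r = beta * SDx / SDy = cov / (SDx * SDy).
sd_x, sd_y, cov_xy = 33.0, 10.0, 163.0   # nmol/L, points, points*nmol/L

beta = cov_xy / sd_x**2                  # about 0.15 points per nmol/L
r = cov_xy / (sd_x * sd_y)               # about 0.49
print(beta, r, beta * sd_x / sd_y)       # last value equals r exactly
```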

Significance testing...

Slope: the distribution of the slope estimate ~ T_{n-2}(β, s.e.(β̂))

H0: β1 = 0 (no linear relationship)
H1: β1 ≠ 0 (linear relationship does exist)

$$T_{n-2}=\frac{\hat{\beta}-0}{s.e.(\hat{\beta})}$$

Formula for the standard error of beta (you will not have to calculate this by hand!):

$$s.e.(\hat{\beta})=\sqrt{\frac{s_{y/x}^{2}}{SS_x}}$$

where

$$s_{y/x}^{2}=\frac{\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}{n-2},\qquad SS_x=\sum_{i=1}^{n}(x_i-\bar{x})^2,\qquad \hat{y}_i=\hat{\alpha}+\hat{\beta}x_i$$

Example: dataset 4
- Standard error (beta) = 0.03
- T98 = 0.15/0.03 = 5, p < 0.0001
- 95% confidence interval = 0.09 to 0.21
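In practice the slope test comes packaged with the fit; a hedged sketch using scipy on simulated data (not the course dataset):

```python
# scipy.stats.linregress implements exactly the T(n-2) slope test above.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.normal(63, 33, size=100)
y = 20 + 0.15 * x + rng.normal(0, 10, size=100)

fit = stats.linregress(x, y)
t = fit.slope / fit.stderr                      # T statistic, df = n - 2
print(fit.slope, fit.stderr, fit.pvalue)
print(2 * stats.t.sf(abs(t), df=len(x) - 2))    # reproduces fit.pvalue
```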

Residual Analysis: check assumptions

$$e_i=y_i-\hat{y}_i$$

The residual for observation i, e_i, is the difference between its observed and predicted value.

Check the assumptions of regression by examining the residuals:
- Examine for the linearity assumption
- Examine for constant variance at all levels of X (homoscedasticity)
- Evaluate the normal distribution assumption
- Evaluate the independence assumption

Graphical analysis of residuals: can plot residuals vs. X.

Predicted values:

$$\hat{y}_i=20+1.5\,x_i$$

For vitamin D = 95 nmol/L (or 9.5 in 10 nmol/L):

$$\hat{y}_i=20+1.5(9.5)=34$$

Residual = observed - predicted. For a man with X = 95 nmol/L and an observed DSST of 48:

$$e_i=y_i-\hat{y}_i=48-34=14$$
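A sketch of the graphical residual check described above (my own simulated data; the course data are not reproduced here):

```python
# Compute e_i = y_i - yhat_i and plot residuals against x to eyeball
# linearity (no pattern) and constant variance (even spread).
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
x = rng.normal(63, 33, size=100)
y = 20 + 0.15 * x + rng.normal(0, 10, size=100)

slope, intercept = np.polyfit(x, y, 1)
residuals = y - (intercept + slope * x)   # observed - predicted

plt.scatter(x, residuals)
plt.axhline(0, color="gray")
plt.xlabel("vitamin D (nmol/L)")
plt.ylabel("residual (points)")
plt.show()
```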

Residual Analysis for Linearity

[Figure: paired panels of Y vs. x and residuals vs. x -- Not Linear (curved residual pattern) vs. Linear (patternless residuals)]

Slide from: Statistics for Managers Using Microsoft Excel, 4th Edition, 2004, Prentice-Hall

Residual Analysis for Homoscedasticity

[Figure: paired panels of Y vs. x and residuals vs. x -- Non-constant variance (fanning residuals) vs. Constant variance (even spread)]

Slide from: Statistics for Managers Using Microsoft Excel, 4th Edition, 2004, Prentice-Hall

Residual Analysis for Independence

[Figure: residuals vs. X -- Not Independent (systematic patterns) vs. Independent (no pattern)]

Slide from: Statistics for Managers Using Microsoft Excel, 4th Edition, 2004, Prentice-Hall

Residual plot, dataset 4: [Figure: residuals vs. x]

Multiple linear regression...

What if age is a confounder here?
- Older men have lower vitamin D
- Older men have poorer cognition

"Adjust" for age by putting age in the model:

DSST score = intercept + slope1 × vitamin D + slope2 × age

2 predictors: age and vit D...

Different 3D view...

Fit a plane rather than a line...

On the plane, the slope for vitamin D is the same at every age; thus, the slope for vitamin D represents the effect of vitamin D when age is held constant.

Equation of the "Best fit" plane:
- DSST score = 53 + 0.0039 × vitamin D (in 10 nmol/L) - 0.46 × age (in years)
- P-value for vitamin D >> 0.05
- P-value for age < 0.0001

Thus, the relationship with vitamin D was due to confounding by age!
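A sketch of the adjustment idea on simulated data (my own numbers, loosely echoing the slide's story; age drives both vitamin D and DSST, so the crude vitamin D slope shrinks toward 0 once age enters the model):

```python
# Crude vs. age-adjusted slope for vitamin D in a two-predictor model.
import numpy as np

rng = np.random.default_rng(5)
n = 100
age = rng.uniform(40, 80, size=n)
vit_d = 120 - 0.9 * age + rng.normal(0, 20, size=n)   # older men: lower vitamin D
dsst = 55 - 0.46 * age + rng.normal(0, 8, size=n)     # older men: poorer cognition

crude = np.polyfit(vit_d, dsst, 1)[0]
X = np.column_stack([np.ones(n), vit_d, age])         # intercept + 2 predictors
coefs, *_ = np.linalg.lstsq(X, dsst, rcond=None)

print(f"crude vitamin D slope:    {crude:.3f}")
print(f"age-adjusted vit D slope: {coefs[1]:.3f}")    # near 0: confounding by age
```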

Multiple Linear Regression

More than one predictor...

$$E(y)=\alpha+\beta_1 X+\beta_2 W+\beta_3 Z\;...$$

Each regression coefficient is the amount of change in the outcome variable that would be expected per one-unit change of the predictor, if all other variables in the model were held constant.


Functions of multivariate analysis:
- Control for confounders
- Test for interactions between predictors (effect modification)
- Improve predictions

A T-test is linear regression!

Divide vitamin D into two groups:
- Insufficient vitamin D (< 50 nmol/L)
- Sufficient vitamin D (>= 50 nmol/L), the reference group

We can evaluate these data with a T-test or a linear regression:

$$T_{98}=\frac{40.0-32.5}{\sqrt{\frac{10.8^2}{45}+\frac{10.8^2}{55}}}=3.46,\quad p=0.0008$$

As a linear regression...

                     Parameter      Standard
Variable              Estimate         Error    t Value    Pr > |t|
Intercept             40.07407       1.47511      27.17      <.0001
insuff                -7.53060       2.17493      -3.46      0.0008

The intercept represents the mean value in the sufficient (reference) group. The slope represents the difference in means between the two groups; the difference is significant.
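A sketch of the equivalence on simulated data (group sizes and SDs are my assumptions, loosely matching the example): a pooled-variance t-test and a regression on a 0/1 indicator give the same estimate and p-value.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
sufficient = rng.normal(40, 10.8, size=55)
insufficient = rng.normal(32.5, 10.8, size=45)

t, p = stats.ttest_ind(insufficient, sufficient)   # equal-variance t-test
print(t, p)

y = np.concatenate([sufficient, insufficient])
insuff = np.concatenate([np.zeros(55), np.ones(45)])   # 1 = insufficient group
fit = stats.linregress(insuff, y)
print(fit.slope, fit.pvalue)   # slope = difference in means; same p-value
```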

ANOVA is linear regression!

Divide vitamin D into three groups:
- Deficient (< 25 nmol/L)
- Insufficient (>= 25 and < 50 nmol/L)
- Sufficient (>= 50 nmol/L), the reference group

DSST = α (= value for the sufficient group) + β1 × (1 if insufficient) + β2 × (1 if deficient)

This is called "dummy coding" -- multiple binary variables are created to represent being in each category (or not) of a categorical variable. (A minimal sketch follows.)
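A minimal sketch of dummy coding (hypothetical labels and made-up DSST scores, my own):

```python
# Two 0/1 indicators encode three categories; "sufficient" is the reference.
import numpy as np

groups = np.array(["sufficient", "deficient", "insufficient", "sufficient",
                   "deficient", "insufficient", "sufficient"])
y = np.array([40.0, 30.0, 33.0, 41.0, 31.0, 34.0, 39.0])  # made-up DSST scores

deficient = (groups == "deficient").astype(float)         # 1 if deficient
insufficient = (groups == "insufficient").astype(float)   # 1 if insufficient
X = np.column_stack([np.ones(len(y)), deficient, insufficient])

coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
# [reference mean, deficient - reference, insufficient - reference]
print(coefs)   # approx [40.0, -9.5, -6.5] for these toy numbers
```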

The picture:

[Figure: group means with comparisons -- Sufficient vs. Insufficient and Sufficient vs. Deficient]

Results:

Parameter Estimates
                           Parameter      Standard
Variable         DF         Estimate         Error    t Value    Pr > |t|
Intercept         1         40.07407       1.47817      27.11      <.0001
deficient         1         -9.87407       3.73950      -2.64      0.0096
insufficient      1         -6.87963       2.33719      -2.94      0.0041

Interpretation:
- The deficient group has a mean DSST 9.87 points lower than the reference (sufficient) group.
- The insufficient group has a mean DSST 6.87 points lower than the reference (sufficient) group.

Other types of multivariate regression
- Multiple linear regression is for normally distributed outcomes
- Logistic regression is for binary outcomes
- Cox proportional hazards regression is used when time-to-event is the outcome
Common multivariate regression models.

Outcome (dependent variable): Continuous
- Example outcome variable: blood pressure
- Appropriate multivariate regression model: linear regression
- Example equation: blood pressure (mmHg) = α + β_salt × salt consumption (tsp/day) + β_age × age (years) + β_smoker × ever smoker (yes=1/no=0)
- What the coefficients give you: slopes -- tell you how much the outcome variable increases for every 1-unit increase in each predictor.

Outcome (dependent variable): Binary
- Example outcome variable: high blood pressure (yes/no)
- Appropriate multivariate regression model: logistic regression
- Example equation: ln(odds of high blood pressure) = α + β_salt × salt consumption (tsp/day) + β_age × age (years) + β_smoker × ever smoker (yes=1/no=0)
- What the coefficients give you: odds ratios -- tell you how much the odds of the outcome increase for every 1-unit increase in each predictor.

Outcome (dependent variable): Time-to-event
- Example outcome variable: time-to-death
- Appropriate multivariate regression model: Cox regression
- Example equation: ln(rate of death) = α + β_salt × salt consumption (tsp/day) + β_age × age (years) + β_smoker × ever smoker (yes=1/no=0)
- What the coefficients give you: hazard ratios -- tell you how much the rate of the outcome increases for every 1-unit increase in each predictor.
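A tiny illustrative sketch (toy coefficient, my own): the same fitted coefficient is read directly as a slope in linear regression, but must be exponentiated to get an odds ratio or a hazard ratio.

```python
import numpy as np

b_salt = 0.8   # hypothetical coefficient for salt (per tsp/day)

print("linear regression slope:", b_salt)          # outcome units per tsp/day
print("logistic regression OR:", np.exp(b_salt))   # odds multiply by exp(b)
print("Cox regression HR:", np.exp(b_salt))        # event rate multiplies by exp(b)
```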

Multivariate regression pitfalls
- Multicollinearity
- Residual confounding
- Overfitting

Multicollinearity
- Multicollinearity arises when two variables that measure the same thing or similar things (e.g., weight and BMI) are both included in a multiple regression model; they will, in effect, cancel each other out and generally destroy your model.
- Model building and diagnostics are tricky business! (A small simulation sketch follows.)
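The simulation sketch (my own, not from the slides): a near-copy of weight stands in for BMI; jointly the two predictors split the true effect unstably, even though either alone works fine.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 100
weight = rng.normal(80, 12, size=n)
bmi_like = weight / 1.75**2 + rng.normal(0, 0.3, size=n)  # nearly collinear
y = 0.5 * weight + rng.normal(0, 5, size=n)

X = np.column_stack([np.ones(n), weight, bmi_like])
coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
print("joint-model coefficients:", coefs[1:])   # unstable, offsetting values
print("predictor correlation:", np.corrcoef(weight, bmi_like)[0, 1])
```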

Residual confounding
- You cannot completely wipe out confounding simply by adjusting for variables in multiple regression unless the variables are measured with zero error (which is usually impossible).
- Example: meat eating and mortality. Men who eat a lot of meat are unhealthier for many reasons!

Sinha R, Cross AJ, Graubard BI, Leitzmann MF, Schatzkin A. Meat intake and mortality: a prospective study of over half a million people. Arch Intern Med 2009;169:562-71.

Mortality risks:

[Figure: mortality risk estimates from the meat intake study]

Sinha R, Cross AJ, Graubard BI, Leitzmann MF, Schatzkin A. Meat intake and mortality: a prospective study of over half a million people. Arch Intern Med 2009;169:562-71.

Overfitting
- In multivariate modeling, you can get highly significant but meaningless results if you put too many predictors in the model.
- The model is fit perfectly to the quirks of your particular sample, but has no predictive ability in a new sample.

Overfitting: class data example

I asked SAS to automatically find predictors of optimism in our class dataset. Here's the resulting linear regression model:

                Parameter      Standard
Variable         Estimate         Error    Type II SS    F Value    Pr > F
Intercept        11.80175       2.98341      11.96067      15.65    0.0019
exercise         -0.29106       0.09798       6.74599       8.83    0.0117
sleep            -1.91592       0.39494      17.98818      23.53    0.0004
obama             1.73993       0.24352      39.01944      51.05    <.0001
Clinton          -0.83128       0.17066      18.13489      23.73    0.0004
mathLove          0.45653       0.10668      13.99925      18.32    0.0011

Exercise, sleep, and high ratings for Clinton are negatively related to optimism (highly significant!) and high ratings for Obama and high love of math are positively related to optimism (highly significant!).

If something seems too good to be true...

Clinton, univariate:
                               Parameter      Standard
Variable      Label      DF     Estimate         Error    t Value    Pr > |t|
Intercept     Intercept   1      5.43688       2.13476       2.55      0.0188
Clinton       Clinton     1      0.24973       0.27111       0.92      0.3675

Sleep, univariate:
                               Parameter      Standard
Variable      Label      DF     Estimate         Error    t Value    Pr > |t|
Intercept     Intercept   1      8.30817       4.36984       1.90      0.0711
sleep         sleep       1     -0.14484       0.65451      -0.22      0.8270

Exercise, univariate:
                               Parameter      Standard
Variable      Label      DF     Estimate         Error    t Value    Pr > |t|
Intercept     Intercept   1      6.65189       0.89153       7.46      <.0001
exercise      exercise    1      0.19161       0.20709       0.93      0.3658

More univariate models...

Obama, univariate:
                               Parameter      Standard
Variable      Label      DF     Estimate         Error    t Value    Pr > |t|
Intercept     Intercept   1      0.82107       2.43137       0.34      0.7389
obama         obama       1      0.87276       0.31973       2.73      0.0126

(Compare with the multivariate result: p < .0001.)

Love of Math, univariate:
                               Parameter      Standard
Variable      Label      DF     Estimate         Error    t Value    Pr > |t|
Intercept     Intercept   1      3.70270       1.25302       2.96      0.0076
mathLove      mathLove    1      0.59459       0.19225       3.09      0.0055

(Compare with the multivariate result: p = .0011.)

Overfitting

Pure noise variables still produce good R² values if the model is overfitted. [Figure: the distribution of R² values from a series of simulated regression models containing only noise variables.]

(Figure 1 from: Babyak, MA. What You See May Not Be What You Get: A Brief, Nontechnical Introduction to Overfitting in Regression-Type Models. Psychosomatic Medicine 66:411-421 (2004).)

Rule of thumb: you need at least 10 subjects for each additional predictor variable in the multivariate regression model.
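A sketch of that phenomenon (my own simulation, in the spirit of Babyak's figure): regressing pure noise on many pure-noise predictors still yields a respectable R².

```python
import numpy as np

rng = np.random.default_rng(9)
n, p = 30, 15                      # far fewer than 10 subjects per predictor
X = rng.normal(size=(n, p))        # predictors: pure noise
y = rng.normal(size=n)             # outcome: unrelated to every predictor

X1 = np.column_stack([np.ones(n), X])
coefs, *_ = np.linalg.lstsq(X1, y, rcond=None)
resid = y - X1 @ coefs
r2 = 1 - np.sum(resid**2) / np.sum((y - y.mean())**2)
print(f"R^2 from pure noise: {r2:.2f}")   # roughly p/(n-1), around 0.5 here
```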
/evie" of statistical tests
8he follo"ing table gives the appropriate
choice of a statistical test or measure of
association for various types of data
(outcome variables and predictor
variables) by study design
;ontin'o's o'tcome
Linar& %redictor
;ontin'o's %redictors
e*g*, .lood %ress're2 %o'nds 3 age 3 treatment (110)
Types of variables to be analyzed, and the statistical procedure or measure of association. Each row: predictor variable(s) | outcome variable | procedure.

Cross-sectional/case-control studies:
- Categorical (>2 groups) | Continuous | ANOVA
- Continuous | Continuous | Simple linear regression
- Multivariate (categorical and continuous) | Continuous | Multiple linear regression
- Categorical | Categorical | Chi-square test (or Fisher's exact)
- Binary | Binary | Odds ratio, risk ratio
- Multivariate | Binary | Logistic regression

Cohort studies/clinical trials:
- Binary | Binary | Risk ratio
- Categorical | Time-to-event | Kaplan-Meier / log-rank test
- Multivariate | Time-to-event | Cox proportional hazards regression, hazard ratio
- Binary (two groups) | Continuous | T-test
- Binary | Ranks/ordinal | Wilcoxon rank-sum test
- Categorical | Continuous | Repeated-measures ANOVA
- Multivariate | Continuous | Mixed models; GEE modeling

Alternative summary: statistics for various types of outcome data

Continuous outcome (e.g., pain scale, cognitive function):
- Independent observations: T-test; ANOVA; linear correlation; linear regression
- Correlated observations: paired T-test; repeated-measures ANOVA; mixed models/GEE modeling
- Assumptions: outcome is normally distributed (important for small samples); outcome and predictor have a linear relationship

Binary or categorical outcome (e.g., fracture yes/no):
- Independent observations: difference in proportions; relative risks; chi-square test; logistic regression
- Correlated observations: McNemar's test; conditional logistic regression; GEE modeling
- Assumptions: chi-square test assumes sufficient numbers in each cell (>= 5)

Time-to-event outcome (e.g., time to fracture):
- Independent observations: Kaplan-Meier statistics; Cox regression
- Correlated observations: n/a
- Assumptions: Cox regression assumes proportional hazards between groups

Continuous outcome (means); HRP 259/HRP 262

Outcome variable: Continuous (e.g., pain scale, cognitive function)

Are the observations independent or correlated?

If independent:
- T-test: compares means between two independent groups
- ANOVA: compares means between more than two independent groups
- Pearson's correlation coefficient (linear correlation): shows linear correlation between two continuous variables
- Linear regression: multivariate regression technique used when the outcome is continuous; gives slopes

If correlated:
- Paired T-test: compares means between two related groups (e.g., the same subjects before and after)
- Repeated-measures ANOVA: compares changes over time in the means of two or more groups (repeated measurements)
- Mixed models/GEE modeling: multivariate regression techniques to compare changes over time between two or more groups; gives rate of change over time

Alternatives if the normality assumption is violated (and small sample size) -- non-parametric statistics:
- Wilcoxon signed-rank test: non-parametric alternative to the paired T-test
- Wilcoxon rank-sum test (= Mann-Whitney U test): non-parametric alternative to the T-test
- Kruskal-Wallis test: non-parametric alternative to ANOVA
- Spearman rank correlation coefficient: non-parametric alternative to Pearson's correlation coefficient

Binary or categorical outcomes (proportions); HRP 259/HRP 261

Outcome variable: Binary or categorical (e.g., fracture, yes/no)

Are the observations correlated?

If independent:
- Chi-square test: compares proportions between two or more groups
- Relative risks: odds ratios or risk ratios
- Logistic regression: multivariate technique used when the outcome is binary; gives multivariate-adjusted odds ratios

If correlated:
- McNemar's chi-square test: compares a binary outcome between correlated groups (e.g., before and after)
- Conditional logistic regression: multivariate regression technique for a binary outcome when groups are correlated (e.g., matched data)
- GEE modeling: multivariate regression technique for a binary outcome when groups are correlated (e.g., repeated measures)

Alternatives to the chi-square test if sparse cells:
- Fisher's exact test: compares proportions between independent groups when there are sparse data (some cells < 5)
- McNemar's exact test: compares proportions between correlated groups when there are sparse data (some cells < 5)

Time-to-event outcome (survival data); HRP 262

Outcome variable: Time-to-event (e.g., time to fracture)

Are the observation groups independent or correlated?

If independent:
- Kaplan-Meier statistics: estimates survival functions for each group (usually displayed graphically); compares survival functions with the log-rank test
- Cox regression: multivariate technique for time-to-event data; gives multivariate-adjusted hazard ratios

If correlated: n/a (already over time)

Modifications to Cox regression if proportional hazards is violated:
- Time-dependent predictors or time-dependent hazard ratios (tricky!)
