
5/6/2016

Introduction to Correlation and Regression Analysis

Multivariable Methods

In this section we will first discuss correlation analysis, which is used to quantify the
association between two continuous variables (e.g., between an independent and a
dependent variable or between two independent variables). Regression analysis is
a related technique to assess the relationship between an outcome variable and
one or more risk factors or confounding variables. The outcome variable is also
called the response or dependent variable, and the risk factors and confounders
are called the predictors, or explanatory or independent variables. In regression
analysis, the dependent variable is denoted "y" and the independent variables are
denoted by "x".
[NOTE: The term "predictor" can be misleading if it is interpreted as the ability
to predict even beyond the limits of the data. Also, the term "explanatory
variable" might give an impression of a causal effect in a situation in which
inferences should be limited to identifying associations. The terms
"independent" and "dependent" variable are less subject to these
interpretations as they do not strongly imply cause and effect.]

Correlation Analysis


In correlation analysis, we estimate a sample correlation coefficient, more
specifically the Pearson Product Moment correlation coefficient. The sample
correlation coefficient, denoted r, ranges between -1 and +1 and quantifies the
direction and strength of the linear association between the two variables. The
correlation between two variables can be positive (i.e., higher levels of one
variable are associated with higher levels of the other) or negative (i.e., higher
levels of one variable are associated with lower levels of the other).
The sign of the correlation coefficient indicates the direction of the association. The
magnitude of the correlation coefficient indicates the strength of the
association.
For example, a correlation of r = 0.9 suggests a strong, positive association
between two variables, whereas a correlation of r = -0.2 suggests a weak, negative
association. A correlation close to zero suggests no linear association between two
continuous variables.
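
As a rough illustration of how the sign and magnitude of r behave, the sketch below computes the sample correlation for two small data sets. The values and the helper name `pearson_r` are invented for this example and do not come from the module.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squared deviations and of cross-products.
    # The (n - 1) divisors of the variances and covariance cancel in r.
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    sp_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sp_xy / math.sqrt(ss_x * ss_y)

# Hypothetical values, invented for illustration only.
x = [1, 2, 3, 4, 5, 6]
y_up = [2, 3, 5, 4, 6, 7]     # tends to rise as x rises
y_down = [7, 6, 4, 5, 3, 2]   # tends to fall as x rises

r_up = pearson_r(x, y_up)
r_down = pearson_r(x, y_down)
print(f"r (rising pattern)  = {r_up:+.3f}")
print(f"r (falling pattern) = {r_down:+.3f}")
```

The two patterns are mirror images of each other, so the coefficients have the same magnitude and opposite signs: the sign carries the direction, the magnitude carries the strength.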

LISA: [I find this description confusing. You say that the correlation
coefficient is a measure of the "strength of association", but if you think
about it, isn't the slope a better measure of association? We use risk ratios
and odds ratios to quantify the strength of association, i.e., how many times
more likely the outcome is when an exposure is present. The analogous
quantity in correlation is the slope, i.e., for a given increment in the
independent variable, by how much is the dependent variable going to
increase? And "r" (or perhaps better R squared) is a measure of how much of
the variability in the dependent variable can be accounted for by differences
in the independent variable. The analogous measure for a dichotomous
variable and a dichotomous outcome would be the attributable proportion,
i.e., the proportion of Y that can be attributed to the presence of the
exposure.]
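
The distinction raised in the comment above can be made concrete with a small sketch. It assumes the usual least-squares slope of y on x, Cov(x,y) divided by the variance of x; the data and the function name `summarize` are hypothetical. Rescaling y (for example, changing its units) changes the slope but leaves r and R squared unchanged.

```python
import math

def summarize(x, y):
    """Return (r, slope, r_squared) for the simple linear relation of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((xi - mean_x) ** 2 for xi in x) / (n - 1)
    var_y = sum((yi - mean_y) ** 2 for yi in y) / (n - 1)
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
    r = cov_xy / math.sqrt(var_x * var_y)
    slope = cov_xy / var_x          # least-squares slope: change in y per unit of x
    return r, slope, r ** 2

# Hypothetical values, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
y_rescaled = [1000 * yi for yi in y]   # same data expressed in smaller units

r, slope, r2 = summarize(x, y)
r_s, slope_s, r2_s = summarize(x, y_rescaled)

print(f"original units: r = {r:.4f}, R^2 = {r2:.4f}, slope = {slope:.3f}")
print(f"rescaled units: r = {r_s:.4f}, R^2 = {r2_s:.4f}, slope = {slope_s:.3f}")
```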

It is important to note that there may be a nonlinear association between two
continuous variables, but computation of a correlation coefficient does not detect
this. Therefore, it is always important to evaluate the data carefully before
computing a correlation coefficient. Graphical displays are particularly useful to
explore associations between variables.
The figure below shows four hypothetical scenarios in which one continuous
variable is plotted along the X axis and the other along the Y axis.
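
A small sketch of this point, using made-up numbers: y is completely determined by x through y = x squared, yet the correlation coefficient over a range symmetric about zero is zero. The helper `pearson_r` is a hypothetical name for this example.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    sp_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sp_xy / math.sqrt(ss_x * ss_y)

# Hypothetical data: a perfect but nonlinear (quadratic) relationship.
x_full = [-3, -2, -1, 0, 1, 2, 3]
y_full = [xi ** 2 for xi in x_full]

# The same relationship seen only on the right-hand half of the curve.
x_half = [0, 1, 2, 3]
y_half = [xi ** 2 for xi in x_half]

r_full = pearson_r(x_full, y_full)
r_half = pearson_r(x_half, y_half)
print(f"r over the symmetric range = {r_full:.3f}")
print(f"r over x >= 0 only         = {r_half:.3f}")
```

Over the symmetric range the rising and falling halves cancel, so r gives no hint of the relationship. A scatter plot would show the curve immediately.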

Scenario 1 depicts a strong positive association (r = 0.9), similar to what we
might see for the correlation between infant birth weight and birth length.
Scenario 2 depicts a weaker association (r = 0.2) that we might expect to see
between age and body mass index (which tends to increase with age).
Scenario 3 might depict the lack of association (r approximately 0) between the
extent of media exposure in adolescence and age at which adolescents initiate
sexual activity.
Scenario 4 might depict the strong negative association (r = -0.9) generally
observed between the number of hours of aerobic exercise per week and
percent body fat.
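
To get a feel for what the four scenarios look like numerically, the sketch below simulates pairs with population correlations of 0.9, 0.2, 0 and -0.9 and reports the sample r for each. The simulation recipe (standard normal x plus independent noise), the seed, the sample size and the function names are all assumptions made for this illustration, not part of the module.

```python
import math
import random

def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    sp_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sp_xy / math.sqrt(ss_x * ss_y)

def simulate_pairs(rho, n, rng):
    """Draw n (x, y) pairs whose population correlation is rho."""
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        noise = rng.gauss(0.0, 1.0)
        # Mixing x with independent noise in these proportions gives Corr(x, y) = rho.
        y = rho * x + math.sqrt(1.0 - rho ** 2) * noise
        xs.append(x)
        ys.append(y)
    return xs, ys

rng = random.Random(2016)   # fixed seed so the run is repeatable
targets = {"Scenario 1": 0.9, "Scenario 2": 0.2, "Scenario 3": 0.0, "Scenario 4": -0.9}

results = {}
for name, rho in targets.items():
    xs, ys = simulate_pairs(rho, 5000, rng)
    results[name] = pearson_r(xs, ys)
    print(f"{name}: target r = {rho:+.1f}, sample r = {results[name]:+.3f}")
```

The sample values scatter around the targets rather than matching them exactly, which is the usual behavior of an estimate computed from a finite sample.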

Example: Correlation of Gestational Age and Birth Weight

A small study is conducted involving 17 infants to investigate the association
between gestational age at birth, measured in weeks, and birth weight, measured in
grams.

We wish to estimate the association between gestational age and infant birth
weight. In this example, birth weight is the dependent variable and gestational age
is the independent variable. Thus y = birth weight and x = gestational age. The data
are displayed in a scatter diagram in the figure below.

Each point represents an (x,y) pair (in this case the gestational age, measured in
weeks, and the birth weight, measured in grams). Note that the independent
variable is on the horizontal axis (or X axis), and the dependent variable is on the
vertical axis (or Y axis). The scatter plot shows a positive or direct association
between gestational age and birth weight. Infants with shorter gestational ages are
more likely to be born with lower weights and infants with longer gestational ages

are more likely to be born with higher weights.
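The formula for the sample correlation coefficient is

\[ r = \frac{\operatorname{Cov}(x, y)}{\sqrt{s_x^2 \, s_y^2}} \]

where Cov(x,y) is the covariance of x and y, defined as

\[ \operatorname{Cov}(x, y) = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} \]

and \(s_x^2\) and \(s_y^2\) are the sample variances of x and y, defined as

\[ s_x^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}, \qquad s_y^2 = \frac{\sum (y_i - \bar{y})^2}{n - 1} \]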

The variances of x and y measure the variability of the x scores and y scores
around their respective sample means (\(\bar{x}\) and \(\bar{y}\), considered
separately). The covariance measures the variability of the
(x,y) pairs around the mean of x and mean of y, considered simultaneously.
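
The definitions above translate almost line for line into code. The following is a minimal sketch under stated assumptions: the function names are hypothetical, and the gestational ages and birth weights are invented values, not the 17 infants from the study. The variance helper is cross-checked against `statistics.variance` from the Python standard library.

```python
import math
import statistics

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

def sample_covariance(x, y):
    """Sum of products of paired deviations from the means, divided by n - 1."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)

def sample_correlation(x, y):
    """r = Cov(x, y) / sqrt(s_x^2 * s_y^2)."""
    return sample_covariance(x, y) / math.sqrt(sample_variance(x) * sample_variance(y))

# Hypothetical data, invented for illustration (NOT the study data).
weeks = [34.5, 36.0, 37.5, 38.0, 39.0, 40.0, 41.0]   # gestational age
grams = [1900, 2400, 2800, 3000, 3200, 3400, 3600]   # birth weight

r = sample_correlation(weeks, grams)
print(f"s_x^2 = {sample_variance(weeks):.3f}")
print(f"s_y^2 = {sample_variance(grams):.1f}")
print(f"Cov   = {sample_covariance(weeks, grams):.2f}")
print(f"r     = {r:.3f}")

# Cross-check the hand-written variance against the standard library.
assert math.isclose(sample_variance(weeks), statistics.variance(weeks))
```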
To compute the sample correlation coefficient, we need to compute the variance of
gestational age, the variance of birth weight and also the covariance of gestational
age and birth weight.
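We first summarize the gestational age data. The mean gestational age is:

\[ \bar{x} = \frac{\sum x_i}{n}, \qquad n = 17 \]

To compute the variance of gestational age, we need to sum the squared deviations
(or differences) between each observed gestational age and the mean gestational
age. The computations are summarized below.

The variance of gestational age is:

\[ s_x^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}, \qquad n - 1 = 16 \]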


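Next, we summarize the birth weight data. The mean birth weight is:

\[ \bar{y} = \frac{\sum y_i}{n}, \qquad n = 17 \]

The variance of birth weight is computed just as we did for gestational age as
shown in the table below.

The variance of birth weight is:

\[ s_y^2 = \frac{\sum (y_i - \bar{y})^2}{n - 1}, \qquad n - 1 = 16 \]

Next we compute the covariance,

\[ \operatorname{Cov}(x, y) = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} \]

To compute the covariance of gestational age and birth weight, we need to multiply
the deviation from the mean gestational age by the deviation from the mean birth
weight for each participant (i.e., \((x_i - \bar{x})(y_i - \bar{y})\)).

The computations are summarized below. Notice that we simply copy the deviations
from the mean gestational age and birth weight from the two tables above into the
table below and multiply.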
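
The tabular procedure described above (deviations, squared deviations, then products of deviations) can be sketched as follows. The five pairs are hypothetical values chosen so the arithmetic is easy to follow by hand; they are not the study data, and the variable names are illustrative only.

```python
import math

# Hypothetical (gestational age in weeks, birth weight in grams) pairs,
# invented for illustration; they are NOT the 17 infants from the study.
ages = [36, 38, 39, 40, 42]
weights = [2400, 3000, 3000, 3400, 3700]

n = len(ages)
mean_age = sum(ages) / n
mean_weight = sum(weights) / n

# One row per infant: the deviations are computed once, then reused
# for the squared deviations and for the cross-product.
rows = []
for age, weight in zip(ages, weights):
    d_age = age - mean_age
    d_weight = weight - mean_weight
    rows.append((age, weight, d_age, d_weight, d_age ** 2, d_weight ** 2, d_age * d_weight))

print(f"{'x':>4} {'y':>6} {'x-xbar':>7} {'y-ybar':>7} {'(x-xbar)^2':>11} {'(y-ybar)^2':>11} {'product':>9}")
for row in rows:
    print(f"{row[0]:>4} {row[1]:>6} {row[2]:>7.1f} {row[3]:>7.1f} {row[4]:>11.1f} {row[5]:>11.1f} {row[6]:>9.1f}")

# Column totals divided by n - 1 give the variances and the covariance.
var_age = sum(row[4] for row in rows) / (n - 1)
var_weight = sum(row[5] for row in rows) / (n - 1)
cov = sum(row[6] for row in rows) / (n - 1)
r = cov / math.sqrt(var_age * var_weight)

print(f"s_x^2 = {var_age}, s_y^2 = {var_weight}, Cov(x,y) = {cov}, r = {r:.3f}")
```

As a sanity check on any such table, the deviation columns should each sum to zero (up to rounding).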


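The covariance of gestational age and birth weight is:

\[ \operatorname{Cov}(x, y) = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}, \qquad n - 1 = 16 \]

We now compute the sample correlation coefficient:

\[ r = \frac{\operatorname{Cov}(x, y)}{\sqrt{s_x^2 \, s_y^2}} \]

Not surprisingly, the sample correlation coefficient indicates a strong positive
correlation.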
As we noted, sample correlation coefficients range from -1 to +1. In practice,
meaningful correlations (i.e., correlations that are clinically or practically important)
can be as small as 0.4 (or -0.4) for positive (or negative) associations. There are
also statistical tests to determine whether an observed correlation is statistically
significant or not (i.e., statistically significantly different from zero). Procedures to
test whether an observed sample correlation is suggestive of a statistically
significant correlation are described in detail in Kleinbaum, Kupper and Muller [1].
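
The text defers the details of such tests to the cited reference. As a hedged sketch, one commonly used procedure is the t test of the null hypothesis that the population correlation is zero, with t = r * sqrt((n - 2) / (1 - r^2)) on n - 2 degrees of freedom. Whether this is the exact procedure the reference describes is an assumption here. The function name and the value r = 0.5 are hypothetical; only n = 17 matches the study.

```python
import math

def correlation_t_statistic(r, n):
    """t statistic for H0: population correlation = 0 (n - 2 degrees of freedom)."""
    if n < 3:
        raise ValueError("need at least 3 (x, y) pairs")
    if not -1.0 < r < 1.0:
        raise ValueError("r must be strictly between -1 and +1")
    return r * math.sqrt((n - 2) / (1.0 - r ** 2))

r_example = 0.5    # hypothetical sample correlation, not a result from the study
n_example = 17     # number of infants in the study

t_stat = correlation_t_statistic(r_example, n_example)
df = n_example - 2
print(f"t = {t_stat:.3f} on {df} degrees of freedom")
```

The statistic would then be compared with a critical value of the t distribution with n - 2 degrees of freedom, taken from a table or a statistics package.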

Content ©2013. All Rights Reserved.
Date last modified: January 17, 2013.
Boston University School of Public Health

http://sphweb.bumc.bu.edu/otlt/MPHModules/BS/BS704_Multivariable/BS704_Multivariable5.html
