Introduction to Correlation and Regression Analysis
Multivariable Methods
In this section we will first discuss correlation analysis, which is used to quantify the association between two continuous variables (e.g., between an independent and a dependent variable or between two independent variables). Regression analysis is a related technique to assess the relationship between an outcome variable and one or more risk factors or confounding variables. The outcome variable is also called the response or dependent variable, and the risk factors and confounders are called the predictors, or explanatory or independent variables. In regression analysis, the dependent variable is denoted "y" and the independent variables are denoted by "x".
[NOTE: The term "predictor" can be misleading if it is interpreted as the ability to predict even beyond the limits of the data. Also, the term "explanatory variable" might give an impression of a causal effect in a situation in which inferences should be limited to identifying associations. The terms "independent" and "dependent" variable are less subject to these interpretations, as they do not strongly imply cause and effect.]
Correlation Analysis
In correlation analysis, we estimate a sample correlation coefficient, more specifically the Pearson Product Moment correlation coefficient. The sample correlation coefficient, denoted r, ranges between -1 and +1 and quantifies the direction and strength of the linear association between the two variables. The correlation between two variables can be positive (i.e., higher levels of one variable are associated with higher levels of the other) or negative (i.e., higher levels of one variable are associated with lower levels of the other).

The sign of the correlation coefficient indicates the direction of the association. The magnitude of the correlation coefficient indicates the strength of the association.

For example, a correlation of r = 0.9 suggests a strong, positive association between two variables, whereas a correlation of r = -0.2 suggests a weak, negative association. A correlation close to zero suggests no linear association between two continuous variables.
http://sphweb.bumc.bu.edu/otlt/MPHModules/BS/BS704_Multivariable/BS704_Multivariable5.html
5/6/2016

LISA: [I find this description confusing. You say that the correlation coefficient is a measure of the "strength of association", but if you think about it, isn't the slope a better measure of association? We use risk ratios and odds ratios to quantify the strength of association, i.e., when an exposure is present, how many times more likely the outcome is. The analogous quantity in correlation is the slope, i.e., for a given increment in the independent variable, how many times is the dependent variable going to increase? And "r" (or perhaps better R squared) is a measure of how much of the variability in the dependent variable can be accounted for by differences in the independent variable. The analogous measure for a dichotomous variable and a dichotomous outcome would be the attributable proportion, i.e., the proportion of Y that can be attributed to the presence of the exposure.]
It is important to note that there may be a nonlinear association between two continuous variables, but computation of a correlation coefficient does not detect this. Therefore, it is always important to evaluate the data carefully before computing a correlation coefficient. Graphical displays are particularly useful to explore associations between variables.
The figure below shows four hypothetical scenarios in which one continuous variable is plotted along the X-axis and the other along the Y-axis.

Scenario 1 depicts a strong positive association (r = 0.9), similar to what we might see for the correlation between infant birth weight and birth length.

Scenario 2 depicts a weaker association (r = 0.2) that we might expect to see between age and body mass index (which tends to increase with age).

Scenario 3 might depict the lack of association (r approximately 0) between the extent of media exposure in adolescence and age at which adolescents initiate sexual activity.

Scenario 4 might depict the strong negative association (r = -0.9) generally observed between the number of hours of aerobic exercise per week and percent body fat.
Example: Correlation of Gestational Age and Birth Weight
A small study is conducted involving 17 infants to investigate the association between gestational age at birth, measured in weeks, and birth weight, measured in grams.

We wish to estimate the association between gestational age and infant birth weight. In this example, birth weight is the dependent variable and gestational age is the independent variable. Thus y = birth weight and x = gestational age. The data are displayed in a scatter diagram in the figure below.
Each point represents an (x,y) pair (in this case the gestational age, measured in weeks, and the birth weight, measured in grams). Note that the independent variable is on the horizontal axis (or X-axis), and the dependent variable is on the vertical axis (or Y-axis). The scatter plot shows a positive or direct association between gestational age and birth weight. Infants with shorter gestational ages are more likely to be born with lower weights, and infants with longer gestational ages are more likely to be born with higher weights.
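The formula for the sample correlation coefficient is

$$r = \frac{\mathrm{Cov}(x,y)}{\sqrt{s_x^2 \, s_y^2}}$$

where Cov(x,y) is the covariance of x and y defined as

$$\mathrm{Cov}(x,y) = \frac{\sum (x - \bar{x})(y - \bar{y})}{n - 1}$$

and $s_x^2$ and $s_y^2$ are the sample variances of x and y, defined as

$$s_x^2 = \frac{\sum (x - \bar{x})^2}{n - 1} \qquad \text{and} \qquad s_y^2 = \frac{\sum (y - \bar{y})^2}{n - 1}$$

The variances of x and y measure the variability of the x scores and y scores around their respective sample means ($\bar{x}$ and $\bar{y}$, considered separately). The covariance measures the variability of the (x,y) pairs around the mean of x and mean of y, considered simultaneously.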
To compute the sample correlation coefficient, we need to compute the variance of gestational age, the variance of birth weight, and also the covariance of gestational age and birth weight.
We first summarize the gestational age data. The mean gestational age is:
To compute the variance of gestational age, we need to sum the squared deviations (or differences) between each observed gestational age and the mean gestational age. The computations are summarized below.
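The study's data table is not reproduced in this text. The sketch below therefore uses five hypothetical gestational ages (in weeks) to show the layout of the variance computation. It builds one row per infant with the observation, its deviation from the mean and the squared deviation, then divides the sum of squares by n - 1. The function name `deviation_table` is illustrative.

```python
def deviation_table(values: list[float]) -> tuple[float, list[tuple[float, float, float]], float]:
    """Return (mean, rows of (value, deviation, squared deviation), sample variance)."""
    n = len(values)
    mean = sum(values) / n
    rows = [(v, v - mean, (v - mean) ** 2) for v in values]
    variance = sum(sq for _, _, sq in rows) / (n - 1)
    return mean, rows, variance


# Hypothetical gestational ages in weeks (invented; not the 17 infants in the study).
ages = [36.0, 38.0, 39.0, 40.0, 42.0]
mean_age, rows, var_age = deviation_table(ages)

print(f"{'x':>6} {'x - mean':>9} {'(x - mean)^2':>13}")
for x, dev, sq in rows:
    print(f"{x:6.1f} {dev:9.1f} {sq:13.2f}")
sum_sq = sum(row[2] for row in rows)
print(f"mean = {mean_age:.1f}, sum of squares = {sum_sq:.2f}, variance = {var_age:.2f}")
```

The same function applies unchanged to the birth weight column.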
The variance of gestational age is:
Next, we summarize the birth weight data. The mean birth weight is:

The variance of birth weight is computed just as we did for gestational age, as shown in the table below.

The variance of birth weight is:
Next we compute the covariance, $\mathrm{Cov}(x,y)$.

To compute the covariance of gestational age and birth weight, we need to multiply the deviation from the mean gestational age by the deviation from the mean birth weight for each participant (i.e., $(x - \bar{x})(y - \bar{y})$).

The computations are summarized below. Notice that we simply copy the deviations from the mean gestational age and birth weight from the two tables above into the table below and multiply.
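The cross-product step can be sketched with five hypothetical infants: gestational ages in weeks and birth weights in grams, both invented for illustration and not taken from the study. The deviations are paired up, multiplied, summed and divided by n - 1. The result is then combined with the two sample variances to give r.

```python
import math

# Hypothetical data, invented for illustration only (not the study's 17 infants).
ages = [36.0, 38.0, 39.0, 40.0, 42.0]               # x: gestational age, weeks
weights = [2600.0, 3000.0, 3200.0, 3300.0, 3900.0]  # y: birth weight, grams

n = len(ages)
x_bar = sum(ages) / n
y_bar = sum(weights) / n

# One row per infant: deviation in x, deviation in y, and their product.
rows = [(x - x_bar, y - y_bar, (x - x_bar) * (y - y_bar)) for x, y in zip(ages, weights)]

cov_xy = sum(product for _, _, product in rows) / (n - 1)
var_x = sum(dx ** 2 for dx, _, _ in rows) / (n - 1)
var_y = sum(dy ** 2 for _, dy, _ in rows) / (n - 1)
r = cov_xy / math.sqrt(var_x * var_y)

print(f"{'x - x_bar':>9} {'y - y_bar':>10} {'product':>10}")
for dx, dy, product in rows:
    print(f"{dx:9.1f} {dy:10.1f} {product:10.1f}")
print(f"Cov(x, y) = {cov_xy:.1f}, r = {r:.3f}")
```

Positive products dominate when large x deviations tend to pair with large y deviations of the same sign. That is what makes both the covariance and r positive here.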
The covariance of gestational age and birth weight is:

We now compute the sample correlation coefficient:

Not surprisingly, the sample correlation coefficient indicates a strong positive correlation.
As we noted, sample correlation coefficients range from -1 to +1. In practice, meaningful correlations (i.e., correlations that are clinically or practically important) can be as small as 0.4 (or -0.4) for positive (or negative) associations. There are also statistical tests to determine whether an observed correlation is statistically significant or not (i.e., statistically significantly different from zero). Procedures to test whether an observed sample correlation is suggestive of a statistically significant correlation are described in detail in Kleinbaum, Kupper and Muller.¹
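One widely used test of H0: population correlation = 0 converts r to a t statistic: t = r * sqrt(n - 2) / sqrt(1 - r^2). This statistic is compared with a t distribution with n - 2 degrees of freedom, under the usual assumption that the data are approximately bivariate normal. Details and alternatives are in the reference cited above. The sketch below is minimal; the function name is hypothetical and the r and n values are made up.

```python
import math


def correlation_t_statistic(r: float, n: int) -> tuple[float, int]:
    """Return (t, df) for testing H0: population correlation = 0.

    t = r * sqrt(n - 2) / sqrt(1 - r^2), with df = n - 2.
    Assumes |r| < 1 and n > 2; the p-value then comes from a t table or software.
    """
    if n <= 2:
        raise ValueError("need at least 3 pairs")
    if not -1.0 < r < 1.0:
        raise ValueError("r must be strictly between -1 and 1")
    t = r * math.sqrt(n - 2) / math.sqrt(1.0 - r ** 2)
    return t, n - 2


# Hypothetical values, for illustration only.
t, df = correlation_t_statistic(r=0.5, n=27)
print(f"t = {t:.3f} on {df} degrees of freedom")
```

The same r yields a larger t as n grows. A modest correlation can therefore be statistically significant in a large sample without being practically meaningful.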
Content © 2013. All Rights Reserved.
Date last modified: January 17, 2013.
Boston University School of Public Health