2012 Updates in Therapeutics:
The Pharmacotherapy Preparatory Review & Recertification Course
Biostatistics: A Refresher
Kevin M. Sowinski, Pharm.D., FCCP
Purdue University, College of Pharmacy
Indiana University, School of Medicine
West Lafayette and Indianapolis, IN
Conflict of Interest Disclosures
No conflicts of interest to disclose related to this presentation
Outline
Purpose: What this is and isn't
Introduction: What do I need to know?
Variables
Descriptive statistics
Inferential statistics
Hypothesis testing
Statistical tests
Decision errors
09/05/2012
Statistics
...collecting, classifying, summarizing, and analyzing data (demystifying?)
Tools for quantifying clinical and laboratory data in a meaningful way
Assists in determining whether/how much a treatment or procedure affects a group
Why do pharmacists need to know statistics?
Hopefully obvious to this group
More importantly: WHAT do I need to know?
Page 2-126
What do you need to know?
Descriptive statistics/simple statistics
Mean, median, frequency, SD, range, CI
Chi-square; Fisher exact test
t-test(s)
Kaplan-Meier, Cox proportional hazards
Analysis of variance
Correlation
Regression (linear, multiple, logistic, other)
Multivariate analysis
Wilcoxon rank sum test (nonparametric)
Pages 2-126-7
Statistics: WHY do you need to know it?
Domain 2: Retrieval, Generation, Interpretation, and Dissemination of Knowledge in Pharmacotherapy (25%)
Interpret biomedical literature with respect to study design and methodology, statistical analysis, and significance of reported data and conclusions.
Knowledge of biostatistical methods, clinical and statistical significance, research hypothesis generation, research design and methodology, and protocol and proposal development
Page 2-126
Types of Variables/Data
Discrete variables
Can only take a limited number of values within a given range
Nominal: Classified into groups in an unordered manner and with no indication of relative severity
Sex (M/F), mortality (yes/no), disease state (present/absent)
Ordinal: Ranked in a specific order but with no consistent level of magnitude of difference between ranks
NYHA functional class: I, II, III, IV
COMMON ERROR:
Use of means (SDs) with ordinal data.
Page 2-127
Types of Variables/Data
Continuous Variables
Counting variables, can take on any value within a given range
Interval Scaled: Data ranked in a specific order with a consistent change in magnitude between units; the zero point is arbitrary
degrees Fahrenheit
Ratio Scaled: Like interval but with an absolute zero
degrees Kelvin, pulse, BP, time, distance
Page 2-127-8
Types of Statistics: Descriptive statistics
Visual methods of describing data
Frequency distribution
Histogram
Scatterplot
Page 2-128
Types of Statistics: Descriptive statistics
[Figure: Histogram; y-axis: Frequency (0 to 16); x-axis: values from 0 to 1.1]
Descriptive statistics: Numerical methods
Measures of Central Tendency
Mean
Used only for continuous and normally distributed data
Very sensitive to outliers (tends toward the tail)
Most commonly used/well understood
Median (a.k.a. 50th percentile)
Midpoint of the values when placed in order from highest to lowest. Half above and below.
Used for ordinal or continuous data (especially for skewed populations)
Insensitive to outliers
Page 2-128
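The contrast between the mean and the median is easiest to see with a single outlier. The sketch below uses made-up values (not data from this course) and Python's standard `statistics` module; replacing one value with an outlier moves the mean a long way while the median stays put.

```python
from statistics import mean, median

# Hypothetical values, chosen only to illustrate outlier sensitivity.
typical = [4, 5, 5, 6, 7]
with_outlier = [4, 5, 5, 6, 40]  # same data, largest value replaced by an outlier

mean_typical, median_typical = mean(typical), median(typical)
mean_outlier, median_outlier = mean(with_outlier), median(with_outlier)

print("no outlier:   mean =", mean_typical, " median =", median_typical)
print("with outlier: mean =", mean_outlier, " median =", median_outlier)
```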
Descriptive statistics: Numerical methods
Measures of Central Tendency
Mode
Most common value in a distribution
Used for nominal, ordinal, or continuous data
Data may have >1 mode (bimodal, trimodal)
Describes meaningful distributions with a large range of values
Page 2-128
Measures of Data Spread and Variability
Standard Deviation
Measure of the variability about the mean
Applied to continuous data that are ~normally distributed or transformed to be
Empirical rule: 68% within ±1 SD, 95% within ±2 SD, and 99.7% within ±3 SD
Coefficient of variation (CV) relates the mean to the SD
CV = (SD/mean) × 100%
Variance = SD²
Page 2-128
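The percentages in the empirical rule are rounded. As a rough check, the sketch below uses `statistics.NormalDist` from the Python standard library to compute the fraction of a normal population within 1, 2, and 3 SDs of the mean, and applies the CV formula to the HDL summary values reported in this handout (mean 60.8, SD 10.4). The helper names are illustrative only.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def fraction_within(k_sd):
    """Fraction of a normal population lying within +/- k_sd SDs of the mean."""
    return std_normal.cdf(k_sd) - std_normal.cdf(-k_sd)

within = {k: fraction_within(k) for k in (1, 2, 3)}
for k, frac in within.items():
    print(f"within {k} SD: {frac:.1%}")

def coefficient_of_variation(sd, mean):
    """CV as a percentage: (SD / mean) x 100%."""
    return sd / mean * 100

# HDL summary values from this handout: mean 60.8 mg/dL, SD 10.4 mg/dL.
cv_hdl = coefficient_of_variation(10.4, 60.8)
print(f"CV for the HDL example: {cv_hdl:.1f}%")
```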
Measures of Data Spread and Variability
Range
Difference between the smallest and largest values
Applied to parametric and nonparametric data
Easy to compute
Size of range is very sensitive to outliers
Often reported as the actual values rather than the difference between the two extreme values
Page 2-128-9
Measures of Data Spread and Variability
Percentiles
Point in a distribution at which a value is larger than some percentage of the other values
75th percentile: 75% of the values are smaller
Does not assume the population has a normal or any other distribution
IQR: percentile range that describes the middle 50% of values; spans the 25th to the 75th percentile.
Page 2-128-9
Example: Pharm.D. students were asked the following questions...

                2006 (n=119)     2007 (n=127)     2008 (n=134)
"The examination questions in this course were appropriate to the material that was covered"
Mean (SD)       2.48 (1.08)      3.80 (0.91)      4.04 (0.82)
Median (IQR)    2.0 (2.0-3.0)    4.0 (3.0-4.0)    4.0 (4.0-5.0)
"I understand the importance of this course to the profession of Pharmacy"
Mean (SD)       3.51 (1.08)      3.57 (0.92)      3.90 (0.90)
Median (IQR)    4.0 (3.0-4.0)    4.0 (3.0-4.0)    4.0 (4.0-4.0)
Measures of Data Spread and Variability
Summary
Measures of central tendency should be presented along with measures of variability
What measures of central tendency should be presented with
Continuous, interval scaled data?
Ordinal data?
What measures of spread and variability should be presented with
Means?
Medians?
Page 2-128-9
Dataset
HDL cholesterol example
20 HDL concentrations measured as part of a clinical study...
64  54  59  60  68  65  59  67  87  65
79  49  64  55  46  62  48  46  54  65
Dataset
HDL cholesterol example
Measure of Central Tendency      Measure of Spread
Mean     60.8                    SD      10.4
Median   61                      Range   41 (46-87)
Mode     65                      IQR     (54-65)
                                 SEM     2.3
Evaluate the visual presentation of the data...
Page 2-129
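The summary values for the HDL example can be reproduced from the 20 listed concentrations. This is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative, and the quartiles depend on the percentile convention used (the default here happens to agree with the reported IQR of 54 to 65).

```python
from math import sqrt
from statistics import mean, median, mode, quantiles, stdev

# The 20 HDL concentrations (mg/dL) listed in the dataset.
hdl = [64, 54, 59, 60, 68, 65, 59, 67, 87, 65,
       79, 49, 64, 55, 46, 62, 48, 46, 54, 65]

hdl_mean = mean(hdl)
hdl_median = median(hdl)
hdl_mode = mode(hdl)
hdl_sd = stdev(hdl)                   # sample SD (n - 1 in the denominator)
hdl_range = max(hdl) - min(hdl)
q1, _, q3 = quantiles(hdl, n=4)       # quartiles; other conventions can differ slightly
hdl_sem = hdl_sd / sqrt(len(hdl))     # SEM = SD / sqrt(n)

print(f"mean {hdl_mean:.1f}, median {hdl_median}, mode {hdl_mode}")
print(f"SD {hdl_sd:.1f}, range {hdl_range} ({min(hdl)}-{max(hdl)}), "
      f"IQR ({q1:.0f}-{q3:.0f}), SEM {hdl_sem:.1f}")
```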
HDL cholesterol example
Grouped Histogram
[Figure: grouped histogram; x-axis: HDL Concentrations (mg/dL), bins from 40 to 88 plus "More"; y-axis: Frequency (0 to 7)]
Normally distributed?
Inferential statistics
Conclusions made about a population from a study of a sample of that population
Choosing/evaluating statistical methods depends on the type of data used
Educated statement about an unknown population is commonly referred to as an inference
Statistical inference can be made by estimation or hypothesis testing
Page 2-129
Population Distributions
Discrete
Binomial distribution
Poisson distribution
Page 2-129
Population Distributions
Normal (Gaussian) Distribution
Most common model for population distributions
Symmetric or bell shaped
Important landmarks (standard normal distribution)
μ: Population mean is equal to zero.
σ: Population SD is equal to 1.
x̄ and s represent the sample mean and SD.
Page 2-129-30
Population Distributions
Normal (Gaussian) Distribution
[Figure: normal curve with landmarks at -2 SD, -1 SD, mean, +1 SD, and +2 SD]
Normal (Gaussian) Distribution
How do we assess?
Frequency distribution and histograms
Median ~ mean (most practical and easiest to use)
HDL Example: 61 vs. 60.8 mg/dL
Formal test: Kolmogorov-Smirnov test
Challenging to evaluate when you are reading a paper
Mean/SD define a normal distribution... termed parametric
Page 2-129-30
Normal (Gaussian) Distribution
Estimation and sampling variability
Separate samples from a population will give different estimates
Distribution of means approximates a normal distribution.
Mean of this distribution of means = μ (population mean)
SD of means is estimated by the SEM.
95% of the sample means lie within ±2 SEM of μ
Distribution of means from these random samples is ~normal regardless of the underlying population distribution
Pages 2-129-30
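A small simulation can make the sampling-variability points concrete. The sketch below is illustrative only: it draws repeated samples from a strongly skewed (exponential) population, which is an assumption chosen for the demonstration, and checks that the sample means center on the population mean with a spread close to SD/sqrt(n).

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(2012)   # fixed seed so the sketch is repeatable

POP_MEAN = 1.0      # exponential population with rate 1: mean 1, SD 1, right-skewed
POP_SD = 1.0
SAMPLE_SIZE = 30    # observations per sample
N_SAMPLES = 5000    # number of repeated samples

# Mean of each repeated sample drawn from the skewed population.
sample_means = [
    sum(random.expovariate(1.0) for _ in range(SAMPLE_SIZE)) / SAMPLE_SIZE
    for _ in range(N_SAMPLES)
]

mean_of_means = mean(sample_means)
sd_of_means = stdev(sample_means)
expected_sem = POP_SD / sqrt(SAMPLE_SIZE)

# Share of sample means falling within 2 SEM of the population mean.
share_within_2_sem = sum(
    abs(m - POP_MEAN) <= 2 * expected_sem for m in sample_means
) / N_SAMPLES

print(f"mean of sample means: {mean_of_means:.3f} (population mean {POP_MEAN})")
print(f"SD of sample means:   {sd_of_means:.3f} (SD/sqrt(n) = {expected_sem:.3f})")
print(f"share within 2 SEM:   {share_within_2_sem:.1%}")
```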
Normal (Gaussian) Distribution
Standard Error of the Mean (SEM)
SEM = SD/sqrt(n)
The SEM quantifies uncertainty in the estimate of the mean, not variability in the sample.
Why is all of this worth knowing about the difference between the SEM and SD?
Application: 95% CI is ~ mean ± 2 × SEM
Deception?
Page 2-130
Dataset: HDL cholesterol example
SD or SEM?
A = Mean (SD)
B = Mean (SEM)
Yes?
No?
Confidence Intervals
95% CIs are the most commonly reported CIs
In repeated samples, 95% of all CIs include the true population value.
Why are 95% CIs most often reported?
Assume a baseline birth weight in a group with a mean ± SD of 1.18 ± 0.4 kg
95% CI ~ mean ± 1.96 × SEM (or 2 × SEM)
What is the 95% CI? (1.07, 1.29)
SD, SEM, and CIs are often used interchangeably (incorrectly)
Page 2-130
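The birth weight interval can be checked with the formulas on the SEM and CI slides. The slide text here does not give the group size, so the sketch below first back-calculates the n implied by the reported interval and then assumes n = 51 for the forward calculation; that value is an inference from the numbers, not something stated in the slide.

```python
from math import sqrt

def ci95(mean, sd, n, z=1.96):
    """Approximate 95% CI for a mean: mean +/- z * SEM, with SEM = SD / sqrt(n)."""
    sem = sd / sqrt(n)
    return mean - z * sem, mean + z * sem

mean_kg, sd_kg = 1.18, 0.4

# Back-calculate the sample size implied by the reported CI (1.07, 1.29).
half_width = (1.29 - 1.07) / 2
implied_n = (1.96 * sd_kg / half_width) ** 2

ASSUMED_N = 51  # assumption: not stated in the slide, inferred from the interval
low, high = ci95(mean_kg, sd_kg, ASSUMED_N)

print(f"implied n is about {implied_n:.1f}")
print(f"95% CI with n = {ASSUMED_N}: ({low:.2f}, {high:.2f})")
```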
CIs Instead of Standard Hypothesis Testing?
Hypothesis testing and calculation of p-values tell us whether there is (or is not) a statistically significant difference, but nothing about the magnitude
CIs
Help to determine the importance of a finding and its application
Provide an idea of the magnitude of the difference
Difference between two continuous variables:
CI that includes 0 (no difference) is not statistically significant (p>0.05)
There is no need to show both the 95% CI and the p-value
CIs for OR and RR are evaluated differently (the no-difference value is 1, not 0)
Page 2-131
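To illustrate reading a CI for a difference between two means, the sketch below uses the LDL change values from the statin example in this handout (14 ± 6 and 16 ± 5 mg/dL, n = 25 per group). It assumes the ± values are SDs and uses a normal approximation, neither of which is stated in the slides; the function name is illustrative.

```python
from math import sqrt

def diff_ci95(mean_a, sd_a, n_a, mean_b, sd_b, n_b, z=1.96):
    """Approximate 95% CI for the difference between two independent means."""
    diff = mean_a - mean_b
    se = sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)   # SE of the difference
    return diff - z * se, diff + z * se

# Assumption: the +/- values in the statin example are SDs.
low, high = diff_ci95(16, 5, 25, 14, 6, 25)
includes_zero = low <= 0 <= high

print(f"difference in LDL change: 2 mg/dL, 95% CI ({low:.2f}, {high:.2f})")
print("CI includes 0, so not statistically significant"
      if includes_zero else "CI excludes 0")
```

The interval also shows the magnitude: differences from about 1 mg/dL in one direction to about 5 mg/dL in the other remain compatible with these data.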
Hypothesis Testing
Null hypothesis (H0):
No difference between comparator groups (Tx A = Tx B)
Alternative hypothesis (Ha):
States that there is a difference (Tx A ≠ Tx B)
Results of hypothesis testing will indicate whether there is enough evidence to reject H0
H0 is rejected = statistically significant (SS) difference
H0 is not rejected = no SS difference
We are not concluding that the treatments are equal.
Pages 2-131-2
Statistical Tests and Choosing a Statistical Test
Dependent on:
Type of data (nominal, ordinal, continuous)
Distribution of data (normal, etc.)
Study design (parallel, crossover, etc.)
Presence of confounding variables
One-tailed versus two-tailed
Parametric vs. nonparametric tests
Page 2-132
Parametric vs. Nonparametric
Parametric tests assume
Data being investigated have an underlying ~normal distribution
Data are continuous
Data being investigated have variances that are ~equal
Nonparametric tests
Data are not normally distributed
Data do not meet other criteria (discrete data)
Page 2-132
Parametric Tests
Student's t-test
One-sample test:
Compares the mean of the study sample with the population mean
[Diagram: Group 1 mean]
Page 2-132
Parametric Tests
Student's t-test(s)
Two-sample, independent samples, or unpaired test:
Compares the means of two independent samples.
[Diagram: Group 1 vs. Group 2]
Pages 2-132-3
Parametric Tests
Student's t-test(s)
Two-sample, independent samples, or unpaired test:
Equal variance test
Rule of thumb for variances: If the ratio of the larger to the smaller variance is greater than 2, we conclude the variances are different
Formal test for differences in variances: F test
Adjustments can be made for cases of unequal variance.
Unequal variance test
Correction employed to account for unequal variances
Pages 2-132-3
Parametric Tests
Student's t-test(s)
Paired test: Compares the mean difference of paired or matched samples. This is a related samples test.
[Diagram: Group 1: Measurement 1, Measurement 2]
Pages 2-132-3
Parametric Tests
Student's t-test(s)
COMMON ERROR:
Use of multiple t-tests to compare more than two groups
Pages 2-132-3
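Why multiple t-tests are an error can be shown with simple arithmetic: each test carries its own 5% chance of a type I error, so the chance of at least one false positive grows with the number of pairwise comparisons. The sketch below treats the tests as independent, which is a simplifying assumption, and also shows the Bonferroni-adjusted per-test alpha.

```python
from math import comb

def familywise_error(n_groups, alpha=0.05):
    """Number of pairwise tests and the approximate chance of >= 1 type I error.

    Assumes the pairwise tests are independent (a simplification).
    """
    n_tests = comb(n_groups, 2)
    return n_tests, 1 - (1 - alpha) ** n_tests

for groups in (3, 4, 5):
    n_tests, fwer = familywise_error(groups)
    bonferroni_alpha = 0.05 / n_tests   # per-test alpha that keeps the overall rate near 0.05
    print(f"{groups} groups: {n_tests} t-tests, familywise error {fwer:.3f}, "
          f"Bonferroni alpha {bonferroni_alpha:.4f}")

tests_3, fwer_3 = familywise_error(3)
```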
Parametric Tests
Analysis of Variance (ANOVA)
One-way (single-factor) ANOVA:
Compares the means of ≥3 groups
Independent samples test
[Diagram: Young: Group 1, Group 2, Group 3]
Two-way (two-factor) ANOVA:
Additional factor added
[Diagram: Young: Group 1, Group 2, Group 3; Elderly: Group 1, Group 2, Group 3]
Page 2-133
Parametric Tests
Analysis of Variance (ANOVA)
Repeated Measures ANOVA:
Related samples test, extension of paired t-test
[Diagram: Young (Group 1), related measurements: Measurement 1, Measurement 2, Measurement 3]
Page 2-133
Parametric Tests
Post hoc tests
Remember multiple t-test error
Maintains appropriate error rate
Determine which groups actually differ
Conducted if the ANOVA is statistically significant
Post hoc tests (examples):
Tukey HSD (Honestly Significant Difference)
Bonferroni
Scheffé
Newman-Keuls
Page 2-133
Nonparametric Tests
Tests for ordinal data or continuous data (that do not meet appropriate assumptions for parametric tests)
Tests for independent samples
Wilcoxon rank sum and Mann-Whitney U test
Compares 2 independent samples (nonparametric counterpart of the independent samples t-test)
Kruskal-Wallis one-way ANOVA by ranks
Compares ≥3 independent groups (nonparametric counterpart of one-way ANOVA)
Post hoc testing
Page 2-133
Nonparametric Tests
Tests for related or paired samples
Sign test and Wilcoxon signed rank test: Compare 2 matched or paired samples (nonparametric counterpart of the paired t-test)
Friedman ANOVA by ranks: Compares ≥3 matched/paired groups
Page 2-133
Nonparametric Tests
Nominal Data
Chi-square (χ²) test: Compares expected and observed proportions between ≥2 groups
Test of independence
Test of goodness of fit
Fisher exact test: Used in place of the chi-square test for small groups (cells) containing <5 observations
McNemar: Paired samples
Mantel-Haenszel: Controls for the influence of confounders
Page 2-134
Choosing the Most Appropriate Statistical Test: Example

Group                    Rosuvastatin (n=25)    Simvastatin (n=25)    p-value
Baseline LDL (mg/dL)     152 ± 5                151 ± 4               > 0.05
Final LDL (mg/dL)        138 ± 7                135 ± 5               > 0.05
Page 2-134
Choosing the Most Appropriate Statistical Test: Example

                          Rosuvastatin (n=25)    Simvastatin (n=25)
Men/Women                 12/13                  10/15
Smokers                   10                     13
Baseline LDL-C (mg/dL)    152 ± 5                151 ± 4
Appropriate test to determine baseline differences in...
1. Sex distribution?
2. Low-density lipoprotein cholesterol?
3. Percentage of smokers and nonsmokers?
A. Wilcoxon signed rank test
B. Chi-square test
C. ANOVA
D. Two-sample t-test
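For the two nominal baseline variables (sex and smoking status), a chi-square test on a 2 × 2 table is one reasonable choice. The sketch below is a hand-rolled illustration, not a library routine: it computes the Pearson chi-square statistic without continuity correction and gets the 1-df p-value from the complementary error function. It assumes the "Smokers" row gives counts out of 25 per group.

```python
from math import erfc, sqrt

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns the statistic and its p-value for 1 degree of freedom.
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = erfc(sqrt(stat / 2))   # upper tail of chi-square with 1 df
    return stat, p_value

# Rows: rosuvastatin, simvastatin. Columns: men, women.
sex_stat, sex_p = chi_square_2x2(12, 13, 10, 15)

# Rows: rosuvastatin, simvastatin. Columns: smokers, nonsmokers
# (assumes the "Smokers" row gives counts out of n = 25 per group).
smoke_stat, smoke_p = chi_square_2x2(10, 15, 13, 12)

print(f"sex distribution: chi-square = {sex_stat:.3f}, p = {sex_p:.2f}")
print(f"smoking status:   chi-square = {smoke_stat:.3f}, p = {smoke_p:.2f}")
```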
Choosing the Most Appropriate Statistical Test: Example

                        Rosuvastatin (n=25)    Simvastatin (n=25)
Baseline LDL (mg/dL)    152 ± 5                151 ± 4
Final LDL (mg/dL)       138 ± 7                135 ± 5
Δ LDL (mg/dL)           14 ± 6                 16 ± 5

Appropriate test to determine
Effect of rosuvastatin on LDL-C
Primary endpoint: 3-month change in LDL-C
A. Wilcoxon signed rank test
B. Chi-square test
C. ANOVA
D. Two-sample t-test
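If the comparison of interest is the LDL-C change between the two arms, a two-sample t-test can be computed directly from the summary values. The sketch below is a minimal equal-variance (pooled) version; it assumes the ± values are SDs, which the slides do not state, and compares |t| with the tabulated two-sided 5% critical value for 48 degrees of freedom rather than computing an exact p-value. Both variance ratios (25/16 and 36/25) are below 2, so the equal-variance form is consistent with the rule of thumb given earlier.

```python
from math import sqrt

T_CRIT_DF48 = 2.011   # two-sided 5% critical value of t with 48 df (standard tables)

def pooled_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Equal-variance two-sample t statistic and degrees of freedom."""
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df
    se = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / se, df

# Assumption: the +/- values in the example are SDs.
t_baseline, df = pooled_t(152, 5, 25, 151, 4, 25)   # baseline LDL, rosuvastatin vs. simvastatin
t_change, _ = pooled_t(16, 5, 25, 14, 6, 25)        # LDL change, simvastatin vs. rosuvastatin

for label, t in (("baseline LDL", t_baseline), ("LDL change", t_change)):
    verdict = "SS difference" if abs(t) > T_CRIT_DF48 else "no SS difference"
    print(f"{label}: t = {t:.2f} on {df} df -> {verdict}")
```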
Decision Errors
Type I Error
Probability of making a type I error = significance level (α)
Convention is to set α to 0.05
5.0% of the time, we will conclude there is a SS difference when actually one does not exist.
Calculated chance that a type I error has occurred is called the p-value (more precisely, the probability of a result at least this extreme if H0 is true).
Lower p-value does not suggest more importance, only SS and less likely attributable to chance
Pages 2-134-5
Decision Errors
Type II Error
Probability of making a type II error: β
Convention: 0.10-0.20
Concluding that no difference exists when one truly does (not rejecting H0 when it should be rejected)
Pages 2-134-5
Decision Errors
Power (1 - β)
Ability to detect differences between groups if one actually exists
Dependent on the following factors:
Predetermined α
Sample size
Size of the difference between the outcomes you wish to detect
Variability of the outcomes that are being measured
Power is decreased by...
As above, and
Poor study design
Incorrect statistical tests (use of nonparametric tests when parametric tests are appropriate)
Pages 2-134-5
Decision Errors: Statistical power analysis and sample size calculation
Should be performed in all studies a priori
Necessary components for estimating appropriate sample size
Acceptable type II error rate (usually 0.10-0.20)
Observed difference in predicted study outcomes that is clinically significant
Expected variability in above
Acceptable type I error rate (usually 0.05)
Pages 2-134-5
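The four components listed above map directly onto a textbook sample size formula for comparing two means, n per group = 2 × ((z for α + z for β) × SD / difference)². The sketch below applies that normal-approximation formula; the formula choice and the example inputs (a 5 mg/dL difference with an SD of 6 mg/dL) are illustrative assumptions, not values from the course.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided comparison of two means.

    delta: smallest clinically significant difference to detect
    sd:    expected SD of the outcome (assumed equal in both groups)
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # two-sided type I error rate
    z_beta = z.inv_cdf(power)            # power = 1 - type II error rate
    return ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)

# Hypothetical inputs for illustration only.
n_80 = n_per_group(delta=5, sd=6)               # 80% power
n_90 = n_per_group(delta=5, sd=6, power=0.90)   # 90% power
n_small = n_per_group(delta=2.5, sd=6)          # halving the difference to detect

print(n_80, n_90, n_small)
```

Higher power, a smaller difference to detect, or more variability each push the required sample size up.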
Statistical significance versus clinical significance
Size of the p-value is not related to the importance of the result.
Statistically significant is not necessarily clinically significant
Lack of statistical significance does not mean results are not important.
With nonsignificant findings, consider sample size, estimated power, and observed variability
Pages 2-134-5
Correlation and Regression
Introduction
Correlation examines the strength of the association between two variables.
It does not necessarily assume that one variable is useful in predicting the other.
Regression examines the ability of one or more variables to predict another variable.
Pages 2-135-6
Correlation
Pearson Correlation
Strength of the relationship between two variables that are...
normally distributed
ratio or interval scaled
linearly related
Often referred to as the degree of association between the two variables
Does not necessarily imply that one variable is dependent on the other
Pages 2-135-6
Correlation Coefficient
Pearson correlation coefficient (r) ranges from -1 to +1 and can take any value in between.
-1: Perfect negative linear relationship
0: No linear relationship
+1: Perfect positive linear relationship
Hypothesis testing is performed to determine whether the correlation coefficient is different from zero. This test is highly influenced by sample size
Spearman Rank Correlation: Nonparametric test that does not assume a normal distribution or continuous data. Can be used for ordinal data or non-normally distributed continuous data
Pages 2-135-6
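The difference between Pearson and Spearman correlation shows up when a relationship is monotone but not linear. The sketch below implements both by hand on made-up data; the simple ranking helper assumes there are no tied values, and all names are illustrative.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """Ranks from 1 to n; this simple version assumes no tied values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson_r(ranks(x), ranks(y))

# Hypothetical data: response rises with dose, but along a curve.
dose = [1, 2, 3, 4, 5, 6]
response = [1, 4, 9, 16, 25, 36]

r = pearson_r(dose, response)
rho = spearman_rho(dose, response)
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```

Here rho is 1 because the ordering is perfectly preserved, while r falls short of 1 because the relationship is not a straight line.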
Correlation Coefficient
[Figure: panels illustrating r = +1 (perfect positive linear relationship), r = 0 (no linear relationship), and r = -1 (perfect negative linear relationship)]
Correlation Pearls
The closer r is to 1 (either + or -), the more highly correlated the two variables
No consistent interpretation of the value of r
Pay more attention to the magnitude of the correlation than to the p-value
VIEW the relationship between the two variables
Pages 2-135-6
Regression
Statistical technique related to correlation
There are many different types
Simple linear regression:
continuous outcome (dependent) variable
continuous independent (causative) variable
Two main purposes of regression:
Development of prediction model
Accuracy of prediction
Pages 2-136-7
Regression
Development of prediction model
Making predictions of the dependent variable from the independent variable
Y = mx + b (dependent variable = slope × independent variable + intercept)
Pages 2-136-7
Regression
Accuracy of prediction: How well the independent variable predicts the dependent variable.
Determines the extent of variability in the dependent variable that can be explained by the independent variable.
Coefficient of determination (r²) describes this relationship. Values of r² can range between 0 and 1.
An r² of 0.80: 80% of the variability in Y is explained by the variability in X.
Statistical tests associated with regression
Pages 2-136-7
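Both purposes of regression, building the prediction model and judging its accuracy, can be sketched with ordinary least squares for Y = mx + b. The data below are made up for illustration and are not taken from the course example; the function name is likewise hypothetical.

```python
def fit_line(x, y):
    """Least-squares slope, intercept, and r-squared for simple linear regression."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy ** 2 / (sxx * syy)   # share of variability in y explained by x
    return slope, intercept, r_squared

# Hypothetical dose (x) and response (y) values.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept, r_squared = fit_line(x, y)
predicted_at_2_5 = slope * 2.5 + intercept   # prediction from Y = mx + b

print(f"Y = {slope:.2f}x + {intercept:.2f}, r-squared = {r_squared:.3f}")
print(f"predicted Y at x = 2.5: {predicted_at_2_5:.3f}")
```

Predictions are most defensible within the range of x values used to fit the line.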
Coefficient of determination (r²)
[Figure: three panels with r² ~ 0.25, r² ~ 0.5, and r² ~ 0.80]
Types of Regression
Simple linear regression
Multiple linear regression
Simple logistic regression
Multiple logistic regression
Nonlinear regression
Polynomial regression
Pages 2-136-7
Regression
Example
What you should know
Slope and intercept?
Required assumptions?
r² interpretation?
Predict antifactor Xa concentrations at doses of 2 and 3.75 mg/kg
What does the p<0.05 value indicate?
Page 2-138
Survival Analysis
Studies the time between entry in a study and some event (e.g., death, myocardial infarction)
Censoring makes survival methods unique
Subjects do not enter the study at the same time
Page 2-138
Survival Analysis
Kaplan-Meier method
Uses survival times to estimate the proportion of people who would survive a length of time
Log-Rank Test
Compares the survival distributions of ≥2 groups
Cox proportional hazards model
Evaluates the impact of covariates on survival in two or more groups
Allows calculation of a hazard ratio (and CI)
Pages 2-138-9
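The Kaplan-Meier idea, including how censored subjects are handled, can be sketched in a few lines. The follow-up times below are hypothetical, and the function is an illustrative product-limit calculation rather than a library routine: at each event time, survival is multiplied by the fraction of at-risk subjects who did not have the event, and censored subjects simply leave the at-risk group.

```python
def kaplan_meier(observations):
    """Product-limit survival estimate.

    observations: list of (time, event) pairs, where event is True if the
    event occurred at that time and False if the subject was censored.
    Returns a list of (time, estimated survival) at each event time.
    """
    at_risk = len(observations)
    survival = 1.0
    curve = []
    for time in sorted({t for t, _ in observations}):
        events = sum(1 for t, e in observations if t == time and e)
        leaving = sum(1 for t, _ in observations if t == time)
        if events:
            survival *= (at_risk - events) / at_risk
            curve.append((time, survival))
        at_risk -= leaving   # events and censored subjects both leave the risk set
    return curve

# Hypothetical follow-up times in months; False marks a censored subject.
data = [(6, True), (7, False), (10, True), (15, True), (19, False), (25, True)]
curve = kaplan_meier(data)

for time, surv in curve:
    print(f"month {time}: estimated survival {surv:.3f}")
```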
Survival Analysis
Kaplan-Meier method
Log-rank test
HR: 0.54 (0.23-1.00)
p=0.05
Pages 2-138-9
Survival Analysis
Cox proportional hazards model
Most popular method to evaluate the impact of covariates
Investigates several variables at a time
Actual method of construction/calculation is complex
Compares survival in two or more groups after adjusting for other variables
Allows calculation of a hazard ratio (and CI)
Page 2-139