Two Running Examples
For our first set of tests, we're going to use two running examples: campaign spending and a fun comparison of two towns' citizens' heights. Here are the two scenarios:

One thing that's been claimed about the 2008 election is that President Obama raised smaller quantities from a larger group of donors than Senator McCain, who raised a smaller number of large contributions. Statistical techniques will help us determine how true this statement is.

Imagine two towns that only differ in that one of the towns had something in the water the year a bunch of kids were born. Did that something in the water affect the height of these kids? (Note: This situation is unrealistic. It's never the case that the only difference between two communities is the one you want to measure, but it's a nice goal!) We'll use statistics to determine whether the two communities have meaningfully different heights.
Comparing Averages
Let's start by comparing a simple statistic, to see if in the data we observe there's any difference. We'll start by comparing the average heights of the two towns. (As an aside: it would help if you wrote and ran your code in dataiap/day3/ today, since several modules like ols.py are available in that directory.)
import numpy

town1_heights = [5, 6, 7, 6, 7.1, 6, 4]
town2_heights = [5.5, 6.5, 7, 6, 7.1, 6]

town1_mean = numpy.mean(town1_heights)
town2_mean = numpy.mean(town2_heights)

print("Town 1 avg. height", town1_mean)
print("Town 2 avg. height", town2_mean)

# The effect size is the absolute difference between the two averages
print("Effect size:", abs(town1_mean - town2_mean))
Graph The Data
If you finished yesterday's histogram exercise, then feel free to skip down to the box plot section.

The effect size in both of our examples seems large. It would be nice to do more than just compare averages. Let's try to look at a histogram of the distributions. We created a histogram of the two campaigns' contributions, binned by $100 increments.
import matplotlib.pyplot as plt
from collections import Counter

increment = 1
width = .25

# Round each height down to its bucket; shift town 2 right by one bar width
# so the two towns' bars sit side by side
town1_bucketted = map(lambda ammt: ammt - ammt % increment, town1_heights)
town2_bucketted = map(lambda ammt: ammt - ammt % increment + width, town2_heights)
town1_hist = Counter(town1_bucketted)
town2_hist = Counter(town2_bucketted)

minamount = min(min(town1_heights), min(town2_heights))
maxamount = max(max(town1_heights), max(town2_heights))
buckets = range(int(minamount), int(maxamount) + 1, increment)

fig = plt.figure()
sub = fig.add_subplot(111)
sub.bar(list(town1_hist.keys()), list(town1_hist.values()), color='b', width=width, label="town 1")
sub.bar(list(town2_hist.keys()), list(town2_hist.values()), color='r', width=width, label="town 2")
sub.legend()
plt.savefig('figures/town_histograms.png', format='png')
This results in a histogram that looks like this:

For the campaign contribution histogram, add the line

sub.set_xlim((-20000, 20000))

before displaying the plot in order to set the x-values of the histogram to cut off donations larger than $20,000 or smaller than -$20,000 (refunds). With bar widths of 50 and increments of $100, your histogram will look something like this:
Here's what we see:

Let's interpret this plot. We show town 1 on the left and town 2 on the right. Each town is represented by a box with a red line and whiskers.

The red line in the box represents the median, or 50th percentile value of the distribution. If we sort the dataset, 50% of the values will be below this line, and 50% will be above it.

The bottom edge of the box represents the 25th percentile (the value larger than 25% of your dataset), and the top edge represents the 75th percentile (the value larger than 75% of your dataset). The difference between the 75th and 25th percentile is called the interquartile range (IQR).

The whiskers represent the extremes of our dataset: the largest and smallest values we're willing to consider in our dataset before calling a value an outlier. In our case, we set whis=1, requesting that the whiskers extend to the most extreme value at a distance of at most 1x the IQR from the bottom and top edges of the box.

If normal distributions are your thing, this image might help you interpret the box and whiskers plot.

Like in the histogram, we see that the towns' height distributions don't look all that different from one another. Generally, if the boxes of each distribution overlap, and you haven't taken something on the order of a buttload (metric units) of measurements, you should doubt the differences in distribution averages. It looks like a single height measurement for town 1 is pretty far away from the others, and you should investigate such measurements as potential outliers.

Exercise: Build a box and whiskers plot of the McCain and Obama campaign contributions. Again, outliers make this a difficult task. With whis=1, and by setting the y range of the plots like so
sub.set_ylim((250,1250))
we got the following plot:

Obama is on the left, and McCain on the right. Real data sure is more confusing than fake data! Obama's box plot is a lot tighter than McCain's, who has a larger spread of donation sizes. Both of Obama's whiskers are visible on this chart, whereas only the top whisker of McCain's plot is visible. Another feature we haven't seen before is the stream of blue dots after each of the whiskers on each of Obama and McCain's plots. These represent potential outliers, or values that are extreme and do not represent the majority of the dataset.

It was easy to say that the histograms and box plots for the town heights overlapped heavily. So while the effect size for town heights was pretty large, the distributions don't actually look all that different from one another.

The campaign plots are a bit harder to discern. The histogram told us virtually nothing. The box plot showed us that Obama's donations seemed more concentrated on the smaller end, whereas McCain's seemed to span a larger range. There was overlap between the boxes in the plot, but we don't really have a sense for just how much overlap or similarity there is between these distributions. In the next section, we'll quantify the difference using statistics!
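To make the box plot quantities described above concrete, here is a minimal sketch that computes them directly for the two towns. The helper name `box_plot_summary` is made up for this illustration, and the sketch assumes numpy's default (linearly interpolated) percentiles, so the numbers may differ slightly from what a particular plotting library draws.

```python
import numpy

town1_heights = [5, 6, 7, 6, 7.1, 6, 4]
town2_heights = [5.5, 6.5, 7, 6, 7.1, 6]

def box_plot_summary(values, whis=1.0):
    """Hypothetical helper: the numbers behind a box and whiskers plot."""
    data = numpy.asarray(values, dtype=float)
    q1, median, q3 = numpy.percentile(data, [25, 50, 75])
    iqr = q3 - q1
    # Whiskers stop at the most extreme data points that still lie
    # within whis * IQR of the bottom and top edges of the box.
    low_limit = q1 - whis * iqr
    high_limit = q3 + whis * iqr
    inside = data[(data >= low_limit) & (data <= high_limit)]
    outliers = data[(data < low_limit) | (data > high_limit)]
    return {
        "q1": float(q1),
        "median": float(median),
        "q3": float(q3),
        "iqr": float(iqr),
        "whisker_low": float(inside.min()),
        "whisker_high": float(inside.max()),
        "outliers": sorted(outliers.tolist()),
    }

for name, heights in [("Town 1", town1_heights), ("Town 2", town2_heights)]:
    print(name, box_plot_summary(heights))
```

With whis=1, the only value flagged for town 1 is the 4-foot measurement, which is the far-away point mentioned in the discussion of the town plot.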
Run a Statistical Test
We have two population height averages. We know that they are different, but charts show that overall the two towns look similar. We have two campaign contribution averages that are also different, but with a murkier story after looking at our box and whisker plots. How will we definitively say whether the differences we observe are meaningful?

In statistics, what we are asking is whether differences we observed are reliable indicators of some trend, or just happened by lucky chance. For example, we might simply have measured particularly short members of town 1 and tall members of town 2. Statistical significance is a measure of the probability that, for whatever reason, we stumbled upon the results we did by chance.

There are several tests for statistical significance, each applying to a different question. Our question is: "Is the difference between the average height of people in town 1 and town 2 statistically significant?" We ask a similar question about the difference in average campaign contributions. The test that answers this question is the T-Test. There are several flavors of T-Test and we will discuss these soon, but for now we'll focus on Welch's T-Test.
import welchttest

print("Welch's T-Test p-value:", welchttest.ttest(town1_heights, town2_heights))
The Welch's T-Test emitted a p-value of .349. A p-value is the probability of seeing an effect size at least as large as the .479 feet we measured between town 1 and town 2 if the two towns truly had the same average height. In this case, there's a 34.9% chance that a difference this large would show up by chance alone.

What's a good cutoff for p-values to know whether we should trust the effect size we're seeing? Two popular values are .05 or .01: if there is less than a 5% or 1% chance that we arrived at our answer by chance, we're willing to say that we have a statistically significant result.

So in our case, our result is not significant. Had we taken more measurements, or if the differences in heights were farther apart, we might have reached significance. But, given our current results, let's not jump to conclusions. After all, it was just food coloring in the water!

Exercise: Run Welch's T-Test on the campaign data. Is the effect size between McCain and Obama significant? By our measurements, the p-value reported is within rounding error of 0. That's significant by anyone's measure: there's a near-nonexistent chance we're seeing this difference between the candidates by some random fluke in the universe. Time to write an article!
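For readers curious what a Welch's T-Test computes, here is a rough sketch of the standard textbook formulas applied to the town heights. It is not the course's welchttest module (whose internals are not shown here); the function name `welch_t_test` is made up for this illustration, and the p-value is taken from scipy's Student's t distribution.

```python
import math

import numpy
from scipy import stats

town1_heights = [5, 6, 7, 6, 7.1, 6, 4]
town2_heights = [5.5, 6.5, 7, 6, 7.1, 6]

def welch_t_test(sample_a, sample_b):
    """Hypothetical helper: two-sided Welch's t-test from the textbook formulas."""
    a = numpy.asarray(sample_a, dtype=float)
    b = numpy.asarray(sample_b, dtype=float)
    # Squared standard error of each mean, using the unbiased (n - 1) variance
    se2_a = a.var(ddof=1) / len(a)
    se2_b = b.var(ddof=1) / len(b)
    # t statistic: difference in means measured in standard errors
    t_stat = (a.mean() - b.mean()) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom
    dof = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    # Two-sided p-value: chance of a |t| at least this large if the means were equal
    p_value = 2 * stats.t.sf(abs(t_stat), dof)
    return float(t_stat), float(dof), float(p_value)

t_stat, dof, p_value = welch_t_test(town1_heights, town2_heights)
print("t statistic:", t_stat)
print("degrees of freedom:", dof)
print("p-value:", p_value)
```

The p-value this prints should land close to the one reported above, well above the .05 cutoff.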
Can You Have a Very Significant Result?
No. There is no such thing as "very" or "almost" significant. Remember: the effect size is the interesting observation, and it's up to you what makes for an impressive effect size depending on the situation. You can have small effects, large effects, and everything in between. Significance testing tells us whether to believe that the observations we made happened by anything more than random chance. While people disagree about whether a p-value of .05 or .01 is required, they all agree that significance is a binary value.

Strictly speaking, you've learned about T-Tests at this point. If you are pressed for time, read Putting it All Together below and move on to the next section. For the overachievers in our midst, there's lots of important information to follow, and you can instead keep reading until the end.
Types of T-Test
The T-Test has two major flavors: paired and unpaired.

Sometimes your datasets are paired (also called dependent). For example, you may be measuring the performance of the same set of students on an exam before and after teaching them the course content. To use a paired T-Test, you have to be able to measure an item twice, usually before and after some treatment. This is the ideal condition: by having before and after measurements of a treatment, you control for other potential differences in the items you measured, like performance between students.

Other times, you are measuring the difference between two sets of measured data, but the individual measurements in each dataset are unpaired (sometimes called independent). This was the case in our tests: different people contributed to each campaign, and different people live in town 1 and 2. With unpaired datasets, we lose the ability to control for differences between individuals, so we'll likely need more data to achieve statistical significance.

Unpaired datasets come in all flavors. Depending on whether the sizes of the sets are equal or unequal, and depending on whether the variances of both sets are equal, you will run different versions of an unpaired T-Test. In our case, we made no assumptions about the sizes of our datasets, and no assumptions on their variances, either. So we went with an unpaired, unequal size, unequal variance test. That's Welch's T-Test.

As with all life decisions, if you want more details, check out the Wikipedia article on T-Tests. There are implementations of paired T-Tests and unpaired ones in scipy. The unequal variance case was not available in scipy when this was written (newer scipy releases expose it as scipy.stats.ttest_ind with equal_var=False), which is why we included welchsttest.py. Enjoy it!
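As a small illustration of the paired versus unpaired distinction, the sketch below runs both kinds of test on the same numbers. The exam scores are invented for this example, and it assumes a reasonably recent scipy, where `scipy.stats.ttest_ind` accepts `equal_var=False` for the unequal variance (Welch) case.

```python
from scipy import stats

# Hypothetical exam scores for the same five students,
# before and after taking a course (made up for illustration).
before = [62, 70, 55, 81, 68]
after = [68, 74, 61, 85, 70]

# Paired test: works on each student's before/after difference,
# so differences between students cancel out.
paired = stats.ttest_rel(before, after)

# Unpaired tests: treat the two lists as unrelated groups.
unpaired_equal_var = stats.ttest_ind(before, after)           # assumes equal variances
welch = stats.ttest_ind(before, after, equal_var=False)       # Welch's T-Test

print("Paired p-value:", paired[1])
print("Unpaired (equal variance) p-value:", unpaired_equal_var[1])
print("Unpaired (Welch) p-value:", welch[1])
```

Every student improved by a few points, so the paired test sees a consistent effect, while the unpaired tests see that small shift buried under the much larger spread between students. This is why unpaired data usually needs more measurements to reach significance.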
T-Test Assumptions we Broke :(
We've managed to sound like smartypantses that do all the right things until this moment, but now we have to admit we broke a few rules. The math behind T-Tests makes assumptions about the datasets that make it easier to achieve statistical significance if those assumptions are true. The big assumption is that the data we used came from a normal distribution.

The first thing we should have done is check whether or not our data is actually normal. Luckily, the fine scipy folks have implemented the Shapiro-Wilk test for normality. This test calculates a p-value that, if low enough (usually < 0.05), tells us there is a low chance the distribution is normal.
import scipy.stats

# shapiro() returns (test statistic, p-value); we only need the p-value
print("Town 1 Shapiro-Wilks p-value", scipy.stats.shapiro(town1_heights)[1])
With a p-value of .380, we don't have enough evidence that our town heights are not normally distributed, so it's probably fine to run Welch's T-Test.

Exercise: Test the campaign contribution datasets for normality. We found them to not be normal (p=.003 for Obama and .014 for McCain), which means we likely broke the normality assumption of Welch's T-Test. The statistics police are going to be paying us a visit.

This turns out to be OK for two reasons: T-Tests are resilient to breaking of the normality assumption, and, if you're really serious about your statistics, there are nonparametric equivalents that don't make normality assumptions. They are more conservative since they can't make assumptions about the data, and thus likely require a larger sample size to reach significance. If you're alright with that, feel free to run the Mann-Whitney U nonparametric version of the T-Test, which has a wonderful name.
import scipy.stats

# mannwhitneyu() returns (U statistic, p-value); we only need the p-value
print("Mann-Whitney U p-value", scipy.stats.mannwhitneyu(town1_heights, town2_heights)[1])
Putting it All Together
So far, we've learned the steps to test a hypothesis:

Compute summary statistics, like averages or medians, and see if these numbers match your intuition.

Look at the distribution histograms or summary visualizations like box plots to understand whether your hypothesis appears to be backed up by the data.

If it's not immediately clear your hypothesis was wrong, test it using the appropriate statistical test to 1) quantify the effect size, and 2) ensure the data you observed couldn't have happened by chance.

There's a lot more to statistics than T-Tests, which compare two datasets' averages. Next, we'll cover correlation between two datasets using linear regression.
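The numeric parts of the steps above can be sketched as a single function. This is one possible arrangement rather than the course's own code: the name `compare_samples`, the `alpha` parameter, and the returned dictionary keys are all made up for this illustration, the plotting step is left out, and it assumes a scipy recent enough to support `equal_var=False` and `alternative="two-sided"`.

```python
import numpy
from scipy import stats

town1_heights = [5, 6, 7, 6, 7.1, 6, 4]
town2_heights = [5.5, 6.5, 7, 6, 7.1, 6]

def compare_samples(sample_a, sample_b, alpha=0.05):
    """Hypothetical helper: summary statistics plus the tests from this section."""
    a = numpy.asarray(sample_a, dtype=float)
    b = numpy.asarray(sample_b, dtype=float)

    # Step 1: summary statistics and the effect size
    effect_size = abs(a.mean() - b.mean())

    # Step 2 (histograms and box plots) is visual and omitted here.

    # Step 3: Welch's T-Test, plus a normality check on each sample and
    # a nonparametric cross-check in case normality looks doubtful.
    welch_p = stats.ttest_ind(a, b, equal_var=False)[1]
    shapiro_p = (stats.shapiro(a)[1], stats.shapiro(b)[1])
    mann_whitney_p = stats.mannwhitneyu(a, b, alternative="two-sided")[1]

    return {
        "mean_a": float(a.mean()),
        "mean_b": float(b.mean()),
        "effect_size": float(effect_size),
        "welch_p": float(welch_p),
        "shapiro_p": tuple(float(p) for p in shapiro_p),
        "mann_whitney_p": float(mann_whitney_p),
        # Significance is a yes/no answer at the chosen cutoff
        "significant": bool(welch_p < alpha),
    }

result = compare_samples(town1_heights, town2_heights)
for key, value in result.items():
    print(key, value)
```

For the town heights this reports the same effect size as before and a result that is not significant at the .05 cutoff.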
The following may not correspond to a particular course on MIT OpenCourseWare, but has been provided by the author as an individual learning resource.
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.