
A GUIDE TO SABERMETRIC RESEARCH

We're often asked, "I'd like to know more about sabermetrics, but where do I begin?" Longtime SABR
member Phil Birnbaum has authored A Guide to Sabermetric Research to help answer your questions. We're
pleased to publish it at SABR.org/sabermetrics. Birnbaum is the editor of the SABR Statistical Analysis
Committee newsletter, "By the Numbers", and he can be found writing on various topics at his
blog, Sabermetric Research.

First, let's go over some basics:

What is sabermetrics? As originally defined by Bill James in 1980, sabermetrics is "the search for
objective knowledge about baseball". James coined the phrase in part to honor the Society for American
Baseball Research.

Who invented sabermetrics? Statistical analysis has been around as long as baseball has been played
competitively. Long before Moneyball became a worldwide phenomenon in the 21st century and before
Bill James' baseball writings gained mainstream popularity in the 1980s, Hall of Fame manager Earl
Weaver was using index cards to fine-tune his platooning system and pitching changes with the Baltimore
Orioles in the 1960s, while Branch Rickey hired statistician Allan Roth in the 1940s to evaluate player
performance with the Brooklyn Dodgers. A generation before that, Baseball Magazine editor F.C.
Lane was creating new statistical methods to measure offensive production, culminating in his classic
book of essays, Batting. In the mid-19th century, Henry Chadwick is credited with developing the box
score, and his tabulation of hits, home runs and total bases led to the formulation of metrics such as
batting average and slugging percentage.

SABR or sabermetrics? With more than 6,000 members around the world, SABR is a membership
organization comprised of passionate and knowledgeable baseball fans with a variety of interests, one
of them being statistical analysis. SABR members Bill James, Pete Palmer and Dick Cramer co-founded
SABR's Statistical Analysis Committee in 1974 and helped popularize the study of sabermetrics. The
phrase "sabermetrics" itself is in the public domain and is generally used to describe any mathematical or
statistical study of baseball.

Sabermetric researchers often use statistical analysis to question traditional measures of baseball evaluation
such as batting average and pitcher wins. Early on, James' theories were largely mocked (or ignored) by the
baseball establishment, but as Joe Posnanski wrote in The Ballad of Bill James, over time his work started to
be recognized. Time Magazine once named him one of the 100 most influential people in the world. The
Boston Red Sox hired him in 2003 and subsequently won two World Series. James is still asking relevant
questions today at billjamesonline.com, and so are legions of his disciples such as Rob Neyer, baseball editor
at SB Nation; Birnbaum; and all the great writers at Baseball Analysts, Baseball Prospectus, Beyond the Box
Score, FanGraphs, The Hardball Times and other sites.

Want a primer on sabermetrics? Check out the FanGraphs Library for down-to-earth explanations of
advanced metrics such as wOBA (weighted on-base average), FIP (fielding independent pitching) and WAR
(wins above replacement), written by Steve Slowinski. SABR members can also read cutting-edge articles on
statistical analysis in every issue of the Baseball Research Journal, such as "The Many Flavors of DIPS: A History
and Overview", by Dan Basco and Michael Davies. We've got a full list of resources on our Related Links page
at the end of this section.

Be sure to check out the annual SABR Analytics Conference, where we bring together the top minds of the
baseball analytic community under one roof to discuss, debate and share insightful ways to analyze and
examine the great game of baseball.

Whether you're just starting out or you'd like a refresher course, whether you're a numbers wizard or you
consider yourself math-phobic, we hope you'll find Phil Birnbaum's Guide to Sabermetric Research informative
and interesting.

The Basics

Sabermetrics was first introduced to a wide public in 1982, with the first mass-market publication of the Bill
James Baseball Abstract. And for my generation of sabermetricians of a certain age, this was the very first
sentence about sabermetrics that we ever read:

"If you sometimes get the feeling between here and the back cover that
you are coming in on the middle of a discussion, it is because you are."

That is: Bill James and a handful of colleagues, mostly SABR members, had been working on a body of
knowledge for a few years. There was an established, although private, literature of sabermetrics, and part of
James' task was to explain what had already been discovered, and how.

That was a number of people who could congregate peacefully in the restrooms in the left field bleachers of
Yankee Stadium, working for a few years, without computers or formal publication. Still, those few
researchers had built a considerable base of knowledge that we had to be caught up on.

Imagine, then, the situation today. Sabermetrics has been in full force since the mid-1970s. By The
Numbers, the SABR Statistical Analysis Committee newsletter, has been publishing since the late 1980s.
Before that, there was Baseball Analyst, Bill James' own sabermetrics journal in the 1980s. With the advent
of Rotisserie/Fantasy Baseball, an industry of professional sabermetrics research sprang up. Publications
like Baseball Prospectus and Baseball Forecaster do their own proprietary research and publish some of it in
their annual books.

And, most importantly, in the past few years, amateur sabermetrics has found its stride and, in my opinion,
taken over the lead. In the past half-decade, a vast number of researchers have published to websites and
blogs, giving us serious, state-of-the-art results that are instantly seen by thousands in the community, who
often build on the findings and take them further.

Five years ago, I would have argued that the main outlets for sabermetric research were print publications,
and that a few books and websites could bring you reasonably well up to date on what sabermetricians had
learned over the years. But, now, things have moved so fast that it's hard to keep up, especially with articles
and papers and studies spread all over the web.

It's a little like the software industry. In the 1990s, almost all software came shrink-wrapped from retail
stores, and most of it was by big industry players, such as Microsoft and IBM. Today that still exists, but with
open source, shareware, file sharing, and hundreds of third-party iPhone apps created every year. Well, now
it takes some effort to keep track.

Still, the basics haven't changed that much. As with any science, the earliest discovered principles tend to be
the most fundamental, and, over time, there gets to be a bit of an unwritten consensus of what findings are
most important. So I'm going to do my best here to give you a short reading list of classical sabermetrics, a
way to try to get a good feel for what sabermetrics has been up to over the past few decades.

The Bill James Baseball Abstract (1982): This work is three decades old and counting, and it's getting harder
to find. Still, it remains the best place to learn what sabermetrics is, how it works, and how sabermetricians
think.

That's all attributable to Bill James himself. Not only did he make most of the discoveries in the book (there
were other sabermetricians active at the time, but James was well over 90% of the field), but his writing style
makes the explanations effortless. Anything by Bill James is a joy to read.

If you can't find the 1982 edition, try whatever other years you can find. Generally, the earlier the year, the
more space is devoted to the basics.

Mathletics (Princeton University Press, 2009): Wayne Winston, a professor and consultant to the NBA's
Dallas Mavericks, wrote this 2009 summary of sabermetrics findings in various sports. The baseball section
comprises seventeen basic explanations of various sabermetric principles, such as runs created, streakiness
and momentum, pitcher evaluation, and situational strategy.

There's no original baseball research in Mathletics, but if you want a quick and concise introduction to some of
the basic findings in the field, this is the book to get.

The Hidden Game of Baseball (Doubleday, 1984): This book, by sabermetrician Pete Palmer and baseball
historian John Thorn, is considered by many to be the bible of sabermetrics. I'd consider it a complement to
the Bill James books.

While James developed some methods and formulas by trial and error, Palmer mines historical data and
shows the theoretical underpinnings of the methods he uses. If you like a more mathematical approach to
sabermetrics, this is the work that lays the foundation.

Thorn and Palmer's book will tell you, for instance, that a leadoff double helps the team by an average of .614
runs. How do they know that? Well, they looked at many years of play-by-play data, and they found that, on
average, .454 runs are scored in the average inning. But, with a runner on second and nobody out, an average
1.068 runs were scored. And so, the double is worth the difference between the two situations, which is 0.614
runs.
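
The subtraction in that example can be sketched in a few lines of Python. The two run-expectancy figures are the ones quoted above; the table layout, the situation labels, and the function name `run_value` are illustrative assumptions, not anything taken from The Hidden Game of Baseball.

```python
# Average runs scored in the rest of the inning from a given situation.
# Only the two figures quoted in the text are included; keys are (bases, outs).
RUN_EXPECTANCY = {
    ("bases empty", 0): 0.454,    # start of an average inning
    ("runner on 2nd", 0): 1.068,  # after a leadoff double
}

def run_value(before, after, table=RUN_EXPECTANCY):
    """Value of an event: run expectancy after it minus run expectancy before it."""
    return table[after] - table[before]

leadoff_double = run_value(("bases empty", 0), ("runner on 2nd", 0))
print(f"A leadoff double is worth about {leadoff_double:.3f} runs")
```

No run scores on a leadoff double itself, so the difference in expectancy is the whole value here; for an event that does drive in runs, those runs would have to be added to the difference.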

While the Bill James Baseball Abstract is the philosopher and theoretician of sabermetric thought, The Hidden
Game of Baseball is its engineering department.

The Book (Potomac, 2007): A collaboration by three exceptional sabermetricians, The Book studies over 100
different questions on baseball strategy. While it does cover topics that have previously been studied by
others, it does so, usually, with much greater rigor. For instance, when looking at player performance in
various situations, The Book will often correct for park, home/road, the identity of the opposing pitcher, the
ball/strike count. As a result, its conclusions are very detailed and very well considered.

The Book is intended more for fans with a hardcore interest in sabermetrics and strategy issues. It's included
here because it has been so influential among current researchers, and you will see its ways of thinking,
especially as described in Chapter 1, repeatedly surface in emerging research.

If the three books above comprise the reading list for Sabermetrics 101, then The Book could be the text for
Sabermetrics 301 or 401.

Asking the Right Questions

Sabermetrics is the search for objective knowledge about baseball through analysis of the statistical record.
At its most basic, the evidence is just simple observation and counting. For instance, in early 2010,
sabermetrician Dave Allen wondered if better hitters get fewer good pitches to hit. He looked at twenty of
the best hitters in baseball, and twenty of the worst. He found that, at almost every ball-strike count, the
better hitters were thrown fewer strikes and fewer fastballs. For instance, on 0-0, the worst hitters got 66%
fastballs, while the best hitters saw only about 63%.

Not every question in sabermetrics is that simple. One of the most controversial questions in baseball
statistical analysis is that of clutch hitting. Do some hitters have the ability to "turn it up" when the game is
on the line, and perform better than usual? Do other hitters have the opposite tendency, hitting better when
it doesn't matter as much?

Many studies have been done on the topic, starting as many as 30 years ago. In 1977, Dick Cramer analyzed
batting records from 1969 and 1970, and found only a very slight tendency for clutch players to repeat their
performance in subsequent seasons. In 1990, Pete Palmer studied clutch hitting over multiple seasons, and
found that there were almost exactly as many apparent "clutch" and "choke" hitters as you would expect by
luck, if clutch hitting skill didn't exist at all.

A few years later, Tom Ruane repeated a version of Palmer's study, using a larger dataset, and got roughly
the same result.

Finally, in 2006, Andy Dolphin did a more sophisticated mathematical analysis, and found evidence for a very,
very slight variation in how players varied in the clutch. But he concluded that it's impossible to discover, with
any degree of accuracy, which players are which, and that for all practical purposes, players should be
expected to hit no better or worse in the clutch than their normal performance would suggest.
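
To get a feel for how much apparent "clutch" and "choke" hitting luck alone can produce, here is a minimal sketch. It assumes a simple binomial model (every at-bat is an independent trial with the same chance of a hit), which is a simplification and not the method any of the studies above actually used. The .270 hitter and the 150 clutch at-bats are hypothetical numbers chosen only for illustration.

```python
import math

def luck_sd(true_average, at_bats):
    """Standard deviation of an observed batting average from chance alone.

    Assumes each at-bat is an independent trial with hit probability
    `true_average` (a binomial model).
    """
    return math.sqrt(true_average * (1 - true_average) / at_bats)

# Hypothetical hitter: a true .270 hitter with 150 clutch at-bats in a season.
sd = luck_sd(0.270, 150)
print(f"Chance alone moves his clutch average by about {sd * 1000:.0f} points (one SD)")
```

Under those assumptions one standard deviation is roughly 36 points of batting average, so a hitter with no clutch skill at all could easily hit .306 or .234 in the clutch in a given season. That is why the studies compare the observed spread across many players to the spread luck would produce.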

Sabermetrics is a science, which means that it follows the scientific method. Conclusions must be based on
evidence and logic, and any conclusions can be reevaluated or overturned if new, contradictory, evidence
turns up. Right now, the evidence suggests that clutch skill is a very minor factor in player performance, if
indeed it exists at all. It's certainly possible that some future sabermetrician will find differing data, or a flaw
in the previous studies, and force us to change our minds on the question. But if we do change our minds, it
will be because of empirical evidence for the other side.

A Primer on Statistics

For the typical fan, sabermetrics doesn't represent anything as theoretical as scientific inquiry. Rather,
sabermetrics is associated with new and unfamiliar statistics. OPS is the most famous of those new stats. It's
gone from a nearly unknown statistic in the early '80s, to barely used a decade ago, to mainstream now (it
even appears on Topps baseball cards). There have also been stats like Linear Weights, Runs
Created, Extrapolated Runs, WAR, and so on.

I'd still argue that sabermetrics isn't really about those statistics; rather, the statistics have been proven to be
useful based on evidence that sabermetricians have uncovered. Runs Created, for instance, is a statistic
that was created by Bill James in the late 1970s. James' thinking went this way: a team's job on offense is to
score runs: the more runs, the better. Suppose you didn't know how many runs a team scored, and wanted
to make an estimate, based on its batting line. For instance, here's a real team batting line:

G AB H 2B 3B HR BB K AVG
161 5517 1451 234 22 214 604 908 0.263

How many runs would you guess that team scored that year? If I made you guess, you'd probably look over a
few years of team statistics, try to find some team that was reasonably close, and use that as a baseline. You
might find a team that hit .267 with less power, and scored 788 runs. You'd figure, "Well, this team hit only
.263, but they had a few more home runs, so I guess maybe they'd cancel out, so I'd guess the same 788 runs.
But, wait, this team had about 20 more walks than the other team, so maybe I should bump up my estimate
to 800 or something."

What Bill James probably did was work through logic like that, and, after some trial and error, come up with
the Runs Created (RC) formula. That statistic is intended to provide a formal way of estimating how a batting
line translates into runs. In its most basic form, RC looks like:

Runs Created = (TB)(H + BB) / (AB + BB)

If you plug the numbers in from the above batting line, you get

Runs Created = (2461)(2055) / (6121)

which gives 826 runs.

As it turns out, that was actually the batting line for the 1985 Baltimore Orioles. They actually scored 818
runs. The estimate is off by 8 runs, which is very good, a little better than typical.
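
As a minimal sketch, the basic formula and the plug-in step above look like this in Python. The function and argument names are illustrative. Total bases is passed in directly, using the 2461 that the text plugs in; recomputing it from the printed batting line (H + 2B + 2×3B + 3×HR) gives 2371 instead, so the sketch follows the worked numbers rather than deriving TB itself.

```python
def basic_runs_created(total_bases, hits, walks, at_bats):
    """Basic Runs Created: (TB)(H + BB) / (AB + BB)."""
    return total_bases * (hits + walks) / (at_bats + walks)

# Inputs as plugged in above: TB = 2461, H + BB = 1451 + 604, AB + BB = 5517 + 604.
estimate = basic_runs_created(total_bases=2461, hits=1451, walks=604, at_bats=5517)
actual_runs = 818  # runs the team actually scored, per the text

print(f"Estimated runs: {estimate:.0f}")
print(f"Difference from actual: {estimate - actual_runs:+.0f}")
```

The same function works on an individual's batting line, which is how the formula is used in the paragraphs that follow.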

Why is Runs Created important? Why do we need RC if we already know the Orioles scored 818 runs? Well,
knowing that there is a predictable relationship between a batting line and runs is useful when we don't know
how many runs we actually have. For instance, we can use RC on an individual player's batting line. Here's
Albert Pujols in 2009:

G AB H 2B 3B HR BB K AVG
160 568 186 45 1 47 115 64 0.327

Using the basic RC formula, we can estimate that if a given major league team had a batting line like Pujols
did, it would score about 149 runs. That batting line would comprise about 15 games, which gives about 10
runs per game.

What we can conclude, then, is that if you put together a lineup of nine Albert Pujols clones, on average
they'd score 10 runs per game. That's a huge total: the average MLB team scores somewhere between 4.5
and 5.0.

We can compare Pujols to Joe Mauer, or Adam Lind, or Alex Rodriguez, to help inform our conclusions on
how much each contributed to his team, or even to our arguments about which player deserves the MVP
award.

Runs Created is one of the most famous of the statistics used to evaluate offense. Others include Pete
Palmer's Linear Weights, Jim Furtado's Extrapolated Runs, and David Smyth's Base Runs. All are very good
estimators. But which is the best? Well, that depends. No estimator is perfect, and all have their strengths
and weaknesses.

One way to compare the various estimators is to test them for accuracy. Apply them to the last (say) fifty
years of baseball, which should give you around 700 team-seasons. Have them each estimate runs for all 700
teams, and see which ones do the best.
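
One possible way to score such a test is root-mean-square error (RMSE): the typical size of the miss, in runs, across all team-seasons. The text does not say which error measure to use, so RMSE is an assumption here, and the estimator names and numbers below are made up purely to show the mechanics.

```python
import math

def rmse(estimates, actuals):
    """Root-mean-square error between estimated and actual runs."""
    if len(estimates) != len(actuals):
        raise ValueError("need one estimate per team-season")
    squared = [(e - a) ** 2 for e, a in zip(estimates, actuals)]
    return math.sqrt(sum(squared) / len(squared))

def rank_estimators(estimator_outputs, actual_runs):
    """Return (name, rmse) pairs sorted from most to least accurate."""
    scores = {name: rmse(est, actual_runs) for name, est in estimator_outputs.items()}
    return sorted(scores.items(), key=lambda item: item[1])

# Made-up numbers for three team-seasons; a real test would use hundreds.
actual_runs = [818, 700, 750]
estimator_outputs = {
    "estimator_A": [826, 690, 760],
    "estimator_B": [800, 720, 745],
}

ranking = rank_estimators(estimator_outputs, actual_runs)
for name, error in ranking:
    print(f"{name}: typical miss of {error:.1f} runs")
```

Whatever measure is chosen, each estimator should be fed the same set of team-seasons so the comparison is fair.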

Offensive Statistics: A Caution

What does all this have to do with how to do baseball research? Well, it brings me to my first suggestion: if
you're just starting out, you might want to consider researching something other than coming up with new
ways to evaluate player offenses.

It's just that it's been done to death. I've listed four different statistics that evaluate offenses, and there are
even more than those. All of them are pretty good, and all of them are pushing the limits of how accurate a
statistic can possibly be.

Now, I'm not saying that there's no way you'll do better. I would have thought the same thing maybe 20
years ago, that there was no way to beat Linear Weights and Runs Created, but then David Smyth came to
invent Base Runs, which, by some measures, is the best yet. My advice is not to suggest that you can't do
better, but, rather, that your research effort may yield more fruit if applied elsewhere.

But, on the other hand, evaluating players is fun. And if this area of sabermetrics is something that you find
most interesting, then go ahead! But if you come up with a new statistic, you will be expected to come up
with hard evidence that yours works better than any that are already out there. It's not enough to argue
theoretically why it should work; you have to prove it does.

There's a sabermetric adage: "Just because a statistic has Babe Ruth on top and Mario Mendoza on the
bottom, that doesn't mean it's accurately measuring what it's supposed to measure."

So, as you work on your new statistic, keep these points in mind:

It's possible to get more and more accurate by including more and more information. The basic version of
Runs Created shown earlier includes only six data items: AB, H, 2B, 3B, HR and BB. Obviously, you can get
more accurate if you include SB and CS, and HBP, and SF, and other information. Indeed, some of the other
statistics already include those categories, so when you compare your statistic to others, make sure you use
the equivalent version, to ensure you're comparing apples to apples. If you show that your statistic that
includes 20 categories is more accurate than a statistic that includes only six categories, that's not
necessarily a breakthrough.

It is possible to get very accurate if you include situational statistics that give information
about when the various events happened. For instance, if you were to add batting average with runners
in scoring position, you'd increase the accuracy of your estimates quite a bit. But you wouldn't
necessarily increase your statistic's usefulness.

If you're trying to show how various factors lead to runs scored, you can't include categories that are
based on how many runs actually scored! For instance, you can do a lot better than Runs Created if you
include runners left on base. For instance, (H + BB - CS - DP - runners left on) is almost exactly equal to
runs! That's because it's almost equal to (runners reaching base - runners who didn't score), which
is exactly the definition of runs.

After keeping all this in mind, if you do come up with a statistic that you can demonstrate is more accurate
than its counterparts, you'll have something of very high interest to the sabermetric community. But, again,
as I said, you have an uphill climb. This is the one area of sabermetrics that has had the most effort poured
into it over the past three or four decades, and a better mousetrap will not be easy to invent.

A similar caution applies to any new statistic, especially one that's supposed to evaluate or rank players or
teams in some dimension. If your new stat is trying to estimate something that can be measured, show how
well it does that, especially compared to any other stats that are out there. And if it's trying to estimate
something ethereal, like "consistency" or "durability," something that doesn't have a real definition, how do
you know that you're measuring it the best way possible? There's nothing wrong with a statistic like that
(Bill James has "speed score," which estimates the fuzzy notion of a player's baseball speed), but be aware
that those kinds of things are rough tools, not strong empirical findings.

What To Research

In sabermetrics, as probably in any other discipline, there's no official list of topics to research. Most
sabermetricians just study what they're interested in. Often, ideas for subjects come up during conversations
with other fans. You'll be talking baseball over a beer, and someone will say, "Well, I'm worried about the
Indians next year. They went 7-25 in September and October, and that's probably a bad sign of things to
come."

And you think, hmmm, I wonder if that's true, that a bad September is likely to be a negative indicator for
next year's performance? And, suddenly, you have a topic to study.

Another common source for ideas is baseball broadcasters: they'll make some claim on the air, without
giving evidence, and you spot an opportunity to check if what they say is true. Bill James used to do this a lot.
Or, you might be reading a certain study on one of the many sabermetric internet sites, and someone makes
a suggestion in the comments, or the study raises a question in your mind that you think would be
interesting to investigate.

If you're just starting out, my suggestion would be to start fairly simple. One possibility is to find a bunch of
old Bill James Abstracts, and read through them (which I recommend you do anyway, if you're new to
sabermetrics). Those books are full of little studies that Bill James throws in when a question occurs to him,
and those might lead you to related questions that you can test. Even repeating one of Bill's studies with
more current data can be useful.

For instance, in the 1982 Bill James Baseball Abstract (Ballantine, 1982), Bill lists the average attendance for
every starting pitcher in the major leagues, and finds that the only pitcher who reliably seemed to draw fans,
in 1981, was rookie phenom Fernando Valenzuela. It immediately occurred to me: is it still true that the
starting pitcher doesn't affect attendance? I'd love to see a similar study for recent years.[1] I'd also love to see
someone take this a bit further. Bill just eyeballed the data before concluding that there didn't seem to be an
effect. But might there be a small effect that you'd find if you looked harder? You might check whether the
better pitchers tended to draw more fans than the worse pitchers, after adjusting for day, weather, and
opponent. Maybe there's a small effect, but maybe there's not.

The nice thing about using the Bill James Abstracts for ideas is that Bill tends to use straightforward
techniques that don't require any formal statistical expertise. His techniques may not be formal enough for,
say, academic journals, but they're excellent nonetheless, and they have enabled Bill James to teach us more
about baseball than any other sabermetrician.

Of course, if you do have some expertise in statistical techniques, that will help too. For the attendance
study, you might run a regression to predict attendance based on team, day of the week, opponent, and
starting pitcher's quality. But, even if you don't use a formal statistical technique (and, for the record, I think
in all of Bill James's work, he's used linear regression maybe twice), with a bit of creativity you can usually still
figure out what's going on.
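
As an illustration of the non-regression route, here is a minimal sketch of one way to adjust attendance for context without any formal statistics. Each game is compared to the average crowd for the same day of the week and opponent, and the leftovers are averaged per starting pitcher. This is a simple stand-in for a regression, not a reconstruction of any published study; the field names and all of the game data are made up to show the mechanics.

```python
from collections import defaultdict
from statistics import mean

def starter_attendance_effect(games):
    """Average attendance above or below the context baseline for each starter.

    Each game is a dict with keys 'weekday', 'opponent', 'starter', 'attendance'.
    The baseline is the mean attendance of all games sharing weekday and opponent.
    """
    by_context = defaultdict(list)
    for g in games:
        by_context[(g["weekday"], g["opponent"])].append(g["attendance"])
    baseline = {ctx: mean(vals) for ctx, vals in by_context.items()}

    residuals = defaultdict(list)
    for g in games:
        ctx = (g["weekday"], g["opponent"])
        residuals[g["starter"]].append(g["attendance"] - baseline[ctx])
    return {starter: mean(vals) for starter, vals in residuals.items()}

# Made-up games, only to show the mechanics.
games = [
    {"weekday": "Sat", "opponent": "NYY", "starter": "ace", "attendance": 42000},
    {"weekday": "Sat", "opponent": "NYY", "starter": "fifth starter", "attendance": 38000},
    {"weekday": "Tue", "opponent": "KC", "starter": "ace", "attendance": 21000},
    {"weekday": "Tue", "opponent": "KC", "starter": "fifth starter", "attendance": 19000},
]

effects = starter_attendance_effect(games)
for starter, extra in effects.items():
    print(f"{starter}: {extra:+.0f} fans per start versus similar games")
```

With real data, many contexts would contain only one or two games, so the baselines would be noisy; pooling seasons or using a regression is the usual answer to that.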

[1] UPDATE: it turns out that someone has followed up Bill's study! In an excellent piece in The Hardball Times 2012
Baseball Annual, Max Marchi looked at all pitchers since 1947, adjusted for overall trends, and found many great
starters who drew in the fans. Nolan Ryan was the career leader, with 641,000 estimated extra tickets sold, while Mark
Fidrych had the highest season average, with a total of around 300,000 tickets over three years.

Once you've settled on a question, you have to figure out how you're going to work your way to an answer.
That'll be difficult without some knowledge of sabermetrics. There's no field of human knowledge where you
can just jump in without some basic understanding of how the field works and what's already been done.
Indeed, if there were only one piece of advice I was allowed to give to aspiring researchers, it would be: learn
some sabermetrics first. As my friend John Matthew IV said, "If you were interested in astronomy, you
would read at least a few books before trying to predict the path of a comet."

And so: know some of the sabermetric canon. In the next section, I'll outline what might be a reading list for
Sabermetrics 101.

Also, before you start working on your problem, you're going to want to check whether others have worked
on the problem before. Maybe they've already done the exact same thing you're planning to do. Maybe
they've gone only part of the way, and you can expand on what they've done. And maybe they've thought of
some things that you haven't, or maybe you won't agree on how they did it.

In any case, no matter how knowledgeable you are in sabermetrics, nobody is aware of everything. Before
you start, you'll want to search the literature, to see what progress has already been made on your problem.
We'll talk about that a bit later too.

Literature Search

So you're at the point where you have a research idea in mind. Your next step, then, is to find any previous
work that's already been done on your topic.

In academia, there's a conventional wisdom on how to do a literature search, and a lot of it involves indexes to
scholarly journals that cover your topic. In sabermetrics, however, it's not quite so simple: much of the best
research is published online, on any one of hundreds of websites, without a formal peer review process to
separate the good from the flawed.

So, as much as we might wish there were a step-by-step process for finding existing work, the reality is that it
becomes a bit of a seat-of-the-pants thing. Some suggestions, though, for how to proceed:

1. Scan the research repositories

While most sabermetric work of recent vintage is web-published, there are still several more formal
repositories of studies. The advantage of those is that, if they're all at one specific website, you can search
them online by using any normal search engine (such as Google), but using the "advanced search" feature to
ask for results only from that one site.

Some specific places to look:

Every back issue of SABR's By The Numbers is available. There is a repository at the SABR website and at
my own website, www.philbirnbaum.com.

In the 1980s, Bill James edited the Baseball Analyst, a sabermetrics newsletter that went out to what I
think were only a few dozen subscribers. In 2012, SABR published those online for the first time
at sabr.org/research/baseballanalystarchives.

Tom Tango, one of the leading active sabermetric researchers today, has some of his own studies at his
website, tangotiger.net.

Tango and his coauthors of The Book have set up a wiki, an open-source encyclopedia of sabermetric
subjects. There has been some talk of abandoning the project, but, at time of writing, it's still active
at tangotiger.net/wiki/index.php?title=Main_Page.

Charlie Pavitt, a SABR member and regular contributor to By the Numbers, has compiled a bibliography
of published sabermetric papers. It's dedicated to only the more formal publication outlets, so it's missing
a large part of the recent explosion in web research. Still, it's a worthy source. A description can be found
here and the bibliography itself can be found here.

2. Search the biggest websites dedicated to sabermetric research

My advice would be to start by searching "The Book" blog. There, Tom Tango reviews, or at least mentions, a
large proportion of the most significant studies. Also, the site has, in my opinion, the densest population of
knowledgeable commenters; almost always, you learn more from the comment discussion than from the
studies themselves. Comments do show up in the searches, I believe.

From there, consider these other sites:

The Hardball Times
Baseball Prospectus (subscription required for some content)
Baseball Analysts
Beyond the Box Score
FanGraphs
TangoTiger

In 2010, Beyond the Box Score held a poll to vote for the best sabermetric websites and studies. All the
nominee websites are worth a look and a search, and can be found at
http://www.beyondtheboxscore.com/2010/1/21/1263306/yourbtbsabermetricawardvoting.

3. Ask

Perhaps the best way to find research on a certain topic is to ask around. There are various places to ask, but,
before doing so, please spend some time looking for yourself. That's just a courtesy to those to whom you
are requesting assistance. I have had people email me about finding research on topic X, when they could
have found what they're looking for by doing the simplest search for X on Google.

People are generally very willing to help when you show what you've tried, and you let them know what
you've found so far.

Places to ask:

One good bet is to write to authors of studies on topics that are close to yours. If you're thinking of doing
a study on how accurate scouts are when they evaluate pitchers, and you find a study on how accurate
scouts are when they evaluate batters, well, the author is probably as interested in the subject as you
are, and is likely to be able to help. Even if the answer is, "Sorry, I don't know of anything," that's a sign
that your topic may indeed be a fresh one.

Most websites allow comments on the studies they publish. If there's a topic that's similar in some ways,
post a comment asking about your topic.

Ask on email forums. SABR has SABR-L, which is probably a bit too general for many detailed
sabermetric inquiries, but still worth a shot. A better place is the Yahoo group "statistical analysis," which
is free to join for SABR members with an interest in sabermetrics.

Finally, you can try specific people. I don't mind an occasional inquiry, and I'm sure many others are
happy to answer too. If you're stuck, you can always try writing to someone who you know is an active
researcher. Many of the sabermetric websites have links to contact authors. Sabermetrician John Doe
may not have published anything that touches on your specific topic, but if he publishes a column every
week and a research paper once a month, you wouldn't be out of line occasionally sending a courteous
request for assistance.

How to Find Raw Data

Back in the beginning days of sabermetrics, data was hard to come by. Some things weren't too bad: if you
wanted to know Bill Terry's batting average in 1933, there were two encyclopedias, Macmillan and
Neft/Cohen, that would tell you. But if you wanted more esoteric statistics, like Joe Morgan's career
performance with the bases loaded, you were out of luck.

When Bill James started writing his self-published Baseball Abstracts back in the late 1970s, he had to compile
situational statistics himself, from the daily box scores, without a computer. At the time, Bill marketed his
book as featuring "18 categories of statistical information that you just can't get anywhere else."

James found that he had to keep compiling those stats even into the 1980s; famously, in his 1981 book, he
reprinted a letter from the Chicago Cubs refusing to provide him with such "intelligence-type" stats.

Now, of course, things are different. There is no shortage of almost any kind of data. My four favorites, in
rough order of increasing detail, are:

MLB.com
Baseball-Reference.com
The Lahman Database
Retrosheet.org

MLB's website provides copious statistical data, sortable and printable, updated instantly as games progress.
But that stuff can be found elsewhere. The main attraction of the MLB website is that it provides PITCHf/x
data. That is, for every pitch thrown by any pitcher in MLB, they'll tell you the type of pitch, where it crossed
the plate, and how much it broke vertically and horizontally. As a result, and not surprisingly, much of the
groundbreaking research these days has to do with pitch analysis.

Easily the best source for precalculated historical statistics is Baseball-Reference.com (BR). That site has
pretty much rendered printed baseball encyclopedias obsolete. Not only do you get the regular "Bill Terry's
batting average" data, but you also get a large selection of sabermetric stats, breakdowns by dozens of different
criteria (left/right, day/night, April/September, and so on), and the ability to manipulate the data in ways that
other websites don't allow. You can also do absurdly specific searches. Want to know Joe Morgan's longest
consecutive streak of games where he came to the plate at least twice? The answer: 235 games. (If you want
the details, you have to subscribe, but the overwhelming majority of the information on the site can be had
for free.)

For those of us who want to do more complicated things, Baseball-Reference, awesome as it is, just isn't
enough. We need the raw data on our own computers, so we can manipulate it in ways that BR never
thought of. There are two main sources of raw data: the Lahman Database and Retrosheet.

The Lahman Database can be obtained for free at seanlahman.com/baseballarchive/statistics, the website of
its creator, Sean Lahman. It's basically a standard Baseball Encyclopedia in downloadable form. You can get
it in text form, for loading into Excel, but, more importantly, it also comes in relational database format
(Microsoft Access). If you're familiar with Access and with SQL database queries, you know how convenient it
is to use it to do powerful, specific data searches quickly. (If you're not familiar with SQL, there have
been a few tutorials on sabermetric sites recently.)
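
For readers who have not seen SQL, here is a minimal sketch of the kind of query meant. It uses Python's built-in SQLite module rather than Access, purely so the example is self-contained. The table is a tiny stand-in whose name and column names (`Batting`, `playerID`, `yearID`, `AB`, `H`, `HR`) are modeled on the Lahman batting table but should be treated as assumptions, as should the player ID strings. One row reuses the Pujols 2009 line quoted earlier; the other is a made-up player added for contrast.

```python
import sqlite3

# Build a tiny in-memory stand-in for a season-by-season batting table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Batting (playerID TEXT, yearID INTEGER, AB INTEGER, H INTEGER, HR INTEGER)"
)
conn.executemany(
    "INSERT INTO Batting VALUES (?, ?, ?, ?, ?)",
    [
        ("pujolal01", 2009, 568, 186, 47),  # line quoted in the text; ID is illustrative
        ("example01", 2009, 500, 125, 10),  # hypothetical player
    ],
)

# Who hit for the highest average in 2009 among players with at least 400 at-bats?
query = """
    SELECT playerID, yearID, ROUND(1.0 * H / AB, 3) AS batting_avg
    FROM Batting
    WHERE yearID = 2009 AND AB >= 400
    ORDER BY batting_avg DESC
"""
rows = conn.execute(query).fetchall()
for player_id, year, batting_avg in rows:
    print(player_id, year, batting_avg)
```

The point of the relational format is that changing the question usually means changing one line of the query, not rebuilding a spreadsheet.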

Anyway, the Lahman Database has every player's standard batting and pitching line for every year. It's got
managers, birth dates, awards, all-star games, and other good stuff. Its limitation is that data is available only
for single seasons: if you want to know how Eddie Murray hit in July 1979, there's no way the Lahman
Database will tell you. For that, you have to turn to Retrosheet.

Retrosheet is, basically, a miracle. It's the result of a small army of volunteers, combing historical sources to
try to recreate the play-by-play of every game in baseball history and digitizing it for download and analysis.
I can't begin to imagine how difficult it is to find all that information, to reconstruct the top of the 6th inning
of the Cardinals/Phillies game of April 29, 1953. But they did. (D. Rice grounded out (shortstop to first);
Presko popped to first in foul territory; Hemus popped to first in foul territory.)

You can also see the entire career of any player, game by game. You can see the standings and results
from any date in baseball history. You can see a coach's career, which teams he coached for and what he
coached, and even how many times he was ejected.

You can see this stuff online, or, if you have computer data manipulation skills, you can download it and work
with it yourself. You can load the data into Excel and write macros to manipulate it. Or, you can write
programs to analyze it; I use Visual Basic, but any language will do. There's a 2006 book called Baseball
Hacks (O'Reilly), which explains how to use a computer language called R to download and analyze
Retrosheet data (and, actually, lots of other baseball data that can be found on the internet).
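
As a rough sketch of what "write programs to analyze it" can look like, the Python below counts strikeouts per batter from play-by-play text. It assumes the comma-separated "play" record layout used in Retrosheet event files (inning, home/visitor flag, batter ID, count, pitch sequence, event) and that strikeout events begin with the letter K. The sample records and the batter IDs are invented for illustration, not taken from a real game file.

```python
import csv
from collections import Counter
from io import StringIO

def strikeouts_by_batter(event_text):
    """Count strikeouts per batter ID from Retrosheet-style 'play' records."""
    counts = Counter()
    for record in csv.reader(StringIO(event_text)):
        # Skip blank lines and every record type other than 'play'.
        if len(record) < 7 or record[0] != "play":
            continue
        batter_id, event = record[3], record[6]
        if event.startswith("K"):  # assumption: strikeout events start with K
            counts[batter_id] += 1
    return counts

# Invented sample records in the assumed layout.
sample = """play,1,0,batta001,12,CBS,K
play,1,0,battb001,00,X,S8
play,3,0,batta001,22,BCBFS,K
"""

counts = strikeouts_by_batter(sample)
for batter_id, strikeouts in counts.items():
    print(batter_id, strikeouts)
```

Real event files carry many other record types and far richer event codes, so anything beyond a simple count like this calls for a careful read of Retrosheet's own file documentation.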

Not all of baseball history is available on Retrosheet yet. The volunteers are still working on it, though.
(Want to help? Click here for details.) For now, you can see game-by-game summaries from 1871 on. You
can see box scores for more than 90 percent of games since 1916. And, if you want full play-by-play data, it's
available for any game after 1952, and a large number of games before that. Some years even include pitch-
by-pitch data, in terms of ball, strike, foul.

The result of literally tens of thousands of hours of volunteer labor, Retrosheet is the greatest sabermetric
resource ever.

Computer-Aided Research

Before the 1990s, a significant proportion of sabermetric research was done without the benefit of computers,
or at least, without the kind of computer power and software we have today. A great deal of statistical
information had to be compiled by hand, or typed by hand into spreadsheets. As a result, many studies used
only a small amount of data, in order to keep the workload manageable.

Things are different now, of course, and it's harder to study new areas without the benefit of a computer and
a good baseball database. That's because a lot of the low-hanging fruit has been picked, and we're now
looking for more and more subtle effects. In 1977, Dick Cramer's clutch hitting study consisted of only two
years of batting average data, entered by hand. From that, he was able to find that clutch hitting consistency
was next to nothing. But that was only two years' data, not enough for a definitive conclusion. It took
others, with more sophisticated computers, and existing baseball databases, to refine that result to the level
of understanding we have today.

In an early-2000s essay on this topic, Neal Traven wrote, "the computer is almost an obligatory tool for
sabermetric research." That holds even more today.

It's unfortunately true that you're going to need a certain amount of computer skills in order to be able to
take a huge mountain of baseball data and try to squeeze conclusions out of it.

Benefits

Sabermetrics has a mixed reputation in the outside world. In mainstream sportswriting, it's sometimes seen
as something nerds do from their parents' basements, something real sportswriters don't need because they
see all the games and know all the players. In academia, it's not always respected as serious research,
because it often doesn't fit into any specific established discipline (although economists are starting to get
involved), because it often doesn't use enough fancy math, and because it's "only" about baseball. And it
used to be that in baseball itself, sabermetrics was not perceived to be anything that would be of use to the
insiders of a major league team.

But the situation in MLB is changing, perhaps due to Moneyball (Norton, 2004), Michael Lewis' story of how
Billy Beane's Oakland Athletics used sabermetrics to build a winning team on the cheap. In 2003, the Red Sox
hired Bill James. Since then, other teams have hired statistical analysts and begun advertising for similar
positions.

Still, the serious study of baseball through its statistics isn't taken all that seriously outside of the
"Moneyball" crowd. Over the past couple of years, there have been several university professors who have had
their schools issue a press release when they came up with something sabermetric. Usually, those academic
studies aren't any more worthy of special recognition than many other studies published on the Internet at
the same time. But I guess baseball is a subject that many consider less serious than, say, sociology, so the
idea that people study it in earnest becomes a bit of a novelty.

Even if the wider world doesn't see sabermetrics as completely serious, its practitioners do. In one recent
university press release, the professor expresses his interest in someday getting his dream job doing
sabermetric consulting for a major league team. That's something a lot of sabermetricians would be
interested in, obviously. Many have already gotten there, in recent years.

But there will probably always be more sabermetricians than employment opportunities. For most of us, the
motivation for sabermetrics is not the glamour of having an inside job with a baseball team, but just our
interest in baseball. And scientific curiosity is a big factor too. Because of the abundance of cheap data, its
relative neglect by the academic community, and the fact that the science is so young, sabermetrics is
perhaps the best serious field where part-time researchers can routinely make the most significant
discoveries. And there's a certain thrill in creating new knowledge, discovering something that nobody knew
before.

And if the thrill of scientific discovery isn't enough, the fact that those discoveries are about baseball (for
many, our favorite subject on earth) is icing on the cake.
