In-Class Exercise: Simple Predictive Analytics Using Tableau
Objective: Analyze a data set to make inferences about future outcomes
Learning Outcomes:
Forecast future sales based on order transaction data
Perform association analysis to determine which products are purchased together
Interpret the meaning of the results from these analyses
In this exercise, you'll once again be working with a data set of orders for an imaginary company, Vandelay Industries.
The data set contains 102,531 line items for 60,011 orders placed between January 1, 2009 and December 31, 2013.
Part 1: Download the data file
1) Go to the Community Site post for this in-class exercise. Right-click VandelayOrdersAll.xlsx and save it to your computer.
2) Open the data file in Excel. Take a quick look through the data and the Data Dictionary tab.
Part 2: Forecast future sales in Tableau
The first thing we'll do is use Tableau to predict future sales based on daily sales from 2009 through 2013. Tableau has a forecasting feature built in, so it's easy to do.
1) Start Tableau and click Connect to data.
2) Click Microsoft Excel.
3) Open the VandelayOrdersAll.xlsx workbook.
4) Drag the VandelayOrders(All) sheet to the white space. Then click Go to Worksheet.
5) Drag the Order Short Date dimension to the Columns shelf and Total Product Price to the Rows shelf.
6) Click the line graph under the Show Me area.
7) You'll see a line graph of the year-to-year aggregate sales.
Notice Order Short Date appears as YEAR(Order Short Date). Tableau automatically presents dates as hierarchies so you can drill down to Quarter (or Month or Day).
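To make the idea of a date hierarchy concrete, here is a minimal Python sketch of what drilling down amounts to: the same rows are summed under progressively finer date keys. The sample rows and the function names are made up for illustration; they are not taken from the Vandelay data, and this is not how Tableau is implemented.

```python
from collections import defaultdict
from datetime import date


def quarter_of(d: date) -> int:
    """Return the calendar quarter (1-4) that a date falls in."""
    return (d.month - 1) // 3 + 1


def aggregate(rows, level):
    """Sum prices at one level of the Year > Quarter > Month hierarchy."""
    totals = defaultdict(float)
    for order_date, price in rows:
        if level == "year":
            key = (order_date.year,)
        elif level == "quarter":
            key = (order_date.year, quarter_of(order_date))
        elif level == "month":
            key = (order_date.year, quarter_of(order_date), order_date.month)
        else:
            raise ValueError(f"unknown level: {level}")
        totals[key] += price
    return dict(sorted(totals.items()))


# Hypothetical (order date, total product price) rows.
sample_rows = [
    (date(2012, 1, 15), 100.0),
    (date(2012, 5, 3), 50.0),
    (date(2013, 2, 20), 75.0),
    (date(2013, 2, 25), 25.0),
    (date(2013, 11, 1), 40.0),
]

by_year = aggregate(sample_rows, "year")
by_quarter = aggregate(sample_rows, "quarter")
by_month = aggregate(sample_rows, "month")
```

Each drill-down keeps the grand total the same and only splits it across more keys.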
8) Click on the plus sign next to YEAR to drill down to quarters. You'll see this:
9) Now we can run a forecast by selecting the Analysis menu and then Forecast/Show Forecast. You'll see this:
There is a gap because Tableau doesn't count the last data period in its analysis. In this case, our last data period is the fourth quarter of 2013.
Let's talk about some other aspects of this chart:
The solid line to the right of the gap shows the forecasted values: the prediction of future sales.
The shaded area is the 95% prediction interval. This means that the actual values are expected to fall somewhere in the shaded range 95% of the time. Note that the solid line is right down the middle of the prediction interval.
The prediction interval is pretty wide. This means it is difficult for Tableau to be confident about its prediction using quarterly data. There's just not enough of it to make a good prediction.
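If it helps to see where a forecast line and its shaded band can come from, here is a rough Python sketch. It uses simple exponential smoothing and a normal approximation based on past one-step-ahead errors. This is an illustration under those assumptions, not a reproduction of Tableau's forecasting; the smoothing factor, the function name, and the sales figures are all made up.

```python
from statistics import NormalDist, stdev


def ses_forecast(series, alpha=0.5, level=0.95):
    """Forecast the next value and return (forecast, low, high).

    The band is forecast +/- z * (standard deviation of past one-step
    errors), which assumes the errors are roughly normally distributed.
    """
    smoothed = series[0]
    errors = []
    for actual in series[1:]:
        errors.append(actual - smoothed)  # how far off the last forecast was
        smoothed = alpha * actual + (1 - alpha) * smoothed
    z = NormalDist().inv_cdf(0.5 + level / 2)  # two-sided multiplier
    half_width = z * stdev(errors)
    return smoothed, smoothed - half_width, smoothed + half_width


# Hypothetical quarterly sales totals.
quarterly_sales = [100.0, 120.0, 110.0, 130.0, 125.0, 140.0]

forecast, low, high = ses_forecast(quarterly_sales)
```

The forecast sits exactly halfway between `low` and `high`, which matches the solid line running down the middle of the shaded area.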
10) Click on the plus sign next to QUARTER to drill down to MONTH. You'll see this:
Notice that the gap is smaller (because it is only leaving out one month, not one quarter), and that the prediction interval is much narrower. The main reason for this is that Tableau has much more data to work with (60 months instead of 20 quarters).
In general, the more data points you use, the better your predictions become.
11) Let's change the confidence level of the prediction interval. Go to the Analysis menu and select Forecast/Forecast Options.
12) Change the prediction interval to 99%. Then click OK.
13) You'll see the prediction interval get slightly wider, since now you're asking Tableau to present a range of values that will contain the actual value 99% of the time (instead of 95%).
To see why this is true, think about a game where you throw crumpled-up paper into a wastebasket. Say you successfully get the paper into the wastebasket 95% of the time. If you want to make sure you get it into the wastebasket 99% of the time, one option is to buy a larger wastebasket!
A larger prediction interval is like a larger wastebasket.
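The wastebasket idea can also be put into numbers. Assuming forecast errors follow a normal distribution (an assumption made here for illustration, not a statement about how Tableau computes its interval), the half-width of the interval is a multiplier times the standard error, and that multiplier grows with the confidence level. The function name below is hypothetical.

```python
from statistics import NormalDist


def interval_multiplier(level):
    """Two-sided normal multiplier for a given confidence level."""
    return NormalDist().inv_cdf(0.5 + level / 2)


z95 = interval_multiplier(0.95)  # about 1.96
z99 = interval_multiplier(0.99)  # about 2.58

# How much wider the 99% interval is than the 95% one, all else equal.
widening = z99 / z95
```

Under this assumption the 99% interval is roughly 30% wider than the 95% interval for the same data.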
14) Save your Tableau workbook and close it.
Part 3: Perform an association analysis in Tableau
(Adapted from kb.tableausoftware.com/articles/knowledgebase/marketbasketanalysis)
Association analysis is discovering when events occur at the same time. In this case, we're looking for which products are purchased together (within the same order).
Tableau doesn't have an association analysis function, but with some clever table joining we can do a simple version of the type of analysis more sophisticated data mining programs do.
1) Open Tableau again. Make sure you're starting a new Tableau file.
2) Click Connect to data.
3) Click Microsoft Excel.
4) Click ONCE on VandelayOrdersAll.xlsx. Just select the file, don't open it!
5) Click the down arrow next to Open and select Open with Legacy Connection.
6) Drag the VandelayOrders(All) sheet to the white space.
7) Drag the VandelayOrders(All) sheet to the white space a second time. It should look like this, but the Join dialog may cover up the second VandelayOrders(All) sheet:
8) If you don't see the join dialog, click on the join area between the two sheets:
9) You'll create two joins:
Select Product Name from Data Source and VandelayOrders(All)$1. Select the <> symbol from the middle drop-down box.
Select Order ID from Data Source and VandelayOrders(All)$1. Select the = symbol from the middle drop-down box.
It should look EXACTLY like this:
So what does this mean? It's called a self-join: you're connecting the table with itself. You're asking Tableau to match up any combination of different products (Product Name <> Product Name) that are part of the same order (Order ID = Order ID).
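The two join conditions can be written out in a few lines of Python. This is a minimal sketch using made-up order IDs and placeholder product names, not the Vandelay data, and the function name is hypothetical. It pairs every row with every other row of the same table and keeps a pair only when the order matches and the products differ.

```python
# Hypothetical (order ID, product name) line items.
line_items = [
    (1001, "Product A"),
    (1001, "Product B"),
    (1002, "Product A"),
    (1002, "Product B"),
    (1002, "Product C"),
    (1003, "Product C"),
]


def self_join(rows):
    """Join the table with itself on same order, different product."""
    return [
        (order_left, product_left, product_right)
        for order_left, product_left in rows
        for order_right, product_right in rows
        if order_left == order_right       # Order ID = Order ID
        and product_left != product_right  # Product Name <> Product Name
    ]


pairs = self_join(line_items)
```

Note that each combination shows up twice, once in each direction (A with B, and B with A), and that an order with a single product, like 1003 here, produces no rows at all.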
10) When you have this set up like the image above, click Go to Worksheet.
11) Drag the Product Name dimension from VandelayOrders(All)$ (from the first set of dimensions) to the Columns shelf.
Then drag the Product Name dimension from VandelayOrders(All)$1 (from the second set of dimensions) to the Rows shelf.
12) You'll see something like this:
13) Under Measures, drag Number of Records to the Text icon under the Marks area.
14) You'll now see this:
This shows how many orders contained both products. For example, look at the first row. We now know that Anti-Dentite Jeans and Anytown, USA Sweatshirts appeared together in the same order 3 times (hover your mouse over the product name to see the whole thing).
Here are a few more:
Bad Breaker Upper Socks and Armoire T-Shirts appeared in the same order 43 times.
Baby Boxers and Astronaut Pen Boxers appeared in the same order 5 times.
BOSCO T-Shirts and Anti-Dentite Jeans appeared in the same order 2 times.
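Counts like these can be reproduced outside Tableau with a short Python sketch. The data below is made up and the function name is hypothetical. The sketch groups products by order and counts each unordered pair once per order, so it answers the question "how many orders contained both products?" directly. It also assumes a product counts once per order even if it appears on more than one line item.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical (order ID, product name) line items.
line_items = [
    (1001, "Product A"),
    (1001, "Product B"),
    (1002, "Product A"),
    (1002, "Product B"),
    (1002, "Product C"),
    (1003, "Product C"),
]


def pair_counts(rows):
    """Count how many orders contain each pair of distinct products."""
    products_by_order = defaultdict(set)
    for order_id, product in rows:
        products_by_order[order_id].add(product)

    counts = Counter()
    for products in products_by_order.values():
        # sorted() gives every pair one canonical (alphabetical) order.
        for pair in combinations(sorted(products), 2):
            counts[pair] += 1
    return counts


counts = pair_counts(line_items)
top_pair, top_count = counts.most_common(1)[0]
```

Here `top_pair` is the most frequently co-purchased combination, which corresponds to the largest number in the Tableau table.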
15) It's not difficult to understand, but it would be easier if we could generate an easy-to-read visual of this data.
Drag Number of Records to the Color icon in the Marks area.
16) Click on the Color icon in the Marks area, then click Edit Colors.
17) Choose Area Red for the Palette.
18) Go back to Measures and drag Number of Records to the Size icon in the Marks area.
19) Click the Size icon in the Marks area and move the slider about two-thirds of the way to the right.
20) It's now very easy to see the product combinations that are most popular.
21) If you want to see detailed information about a product combination, hover your mouse over a square.
22) Save your Tableau workbook.