
MIS 0855: Data Science

In-Class Exercise: Simple Predictive Analytics Using Tableau
Objective: Analyze a data set to make inferences about future outcomes

Learning Outcomes:

Forecast future sales based on order transaction data
Perform association analysis to determine which products are purchased together
Interpret the meaning of the results from these analyses
In this exercise, you'll once again be working with a data set of orders for an imaginary
company, Vandelay Industries.
The data set contains 102,531 line items for 60,011 orders placed between January 1, 2009 and
December 31, 2013.

Part 1: Download the data file
1) Go to the Community Site post for this in-class exercise. Right-click VandelayOrdersAll.xlsx
and save it to your computer.

2) Open the data file in Excel. Take a quick look through the data and the Data Dictionary tab.

Part 2: Forecast future sales in Tableau
The first thing we'll do is use Tableau to predict future sales based on daily sales from 2009
through 2013. Tableau has a forecasting feature built in, so it's easy to do.

1) Start Tableau and click Connect to data.

2) Click Microsoft Excel.

3) Open the VandelayOrdersAll.xlsx workbook.

4) Drag the VandelayOrders(All) sheet to the white space. Then click Go to Worksheet.



5) Drag the OrderShortDate dimension to the Columns shelf and TotalProductPrice to
the Rows shelf.

6) Click the line graph under the Show Me area.

7) You'll see a line graph of the year-to-year aggregate sales.

Notice OrderShortDate appears as YEAR(OrderShortDate). Tableau automatically presents
dates as hierarchies so you can drill down to Quarter (or Month or Day).
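To make the idea of a date hierarchy concrete, here is a minimal Python sketch of what drilling down amounts to: the same line items are summed under a coarser or finer date key. The line items and the roll_up helper are hypothetical and only for illustration; they are not taken from the Vandelay data, and this is not how Tableau itself is implemented.

```python
from collections import defaultdict
from datetime import date

# Hypothetical line items: (order date, total product price).
# These values are made up; they are not from the Vandelay data set.
line_items = [
    (date(2009, 1, 15), 20.0),
    (date(2009, 2, 3), 35.5),
    (date(2009, 5, 20), 12.0),
    (date(2010, 11, 2), 50.0),
    (date(2010, 12, 24), 8.5),
]


def roll_up(items, level):
    """Sum prices by 'year', 'quarter', or 'month' (hypothetical helper)."""
    totals = defaultdict(float)
    for day, price in items:
        if level == "year":
            key = (day.year,)
        elif level == "quarter":
            key = (day.year, (day.month - 1) // 3 + 1)
        elif level == "month":
            key = (day.year, day.month)
        else:
            raise ValueError(f"unknown level: {level}")
        totals[key] += price
    return dict(sorted(totals.items()))


by_year = roll_up(line_items, "year")
by_quarter = roll_up(line_items, "quarter")
by_month = roll_up(line_items, "month")

print(by_year)
print(by_quarter)
print(by_month)
```

Each drill-down level keeps the same total but splits it across more, smaller buckets.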



8) Click on the plus sign next to YEAR to drill down to quarters. You'll see this:

9) Now we can run a forecast by selecting the Analysis menu and then Forecast/Show
Forecast. You'll see this:



There is a gap because Tableau doesn't count the last data period in its analysis. In this case,
our last data period is the fourth quarter of 2013.

Let's talk about some other aspects of this chart:

The solid line to the right of the gap shows the forecasted values: the prediction of future sales.
The shaded area is the 95% prediction interval. This means that the actual values are expected to fall somewhere in the shaded range 95% of the time. Note that the solid line is right down the middle of the prediction interval.
The prediction interval is pretty wide. This means it is difficult for Tableau to be confident about its prediction using quarterly data. There's just not enough of it to make a good prediction.
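To see how a forecast line and a prediction interval fit together, here is a minimal sketch in Python. Tableau's forecasts are based on exponential smoothing, but this sketch is a much simpler stand-in and not Tableau's actual procedure: it uses simple exponential smoothing with a fixed smoothing weight, forecasts one period ahead, and assumes the forecast errors are roughly normal. The ses_forecast function, the alpha value, and the sales figures are all hypothetical.

```python
from statistics import NormalDist, stdev


def ses_forecast(values, alpha=0.5, level=0.95):
    """One-step-ahead forecast with a rough symmetric prediction interval.

    Assumption: forecast errors are roughly normal with constant spread.
    This is an illustration only, not Tableau's forecasting method.
    """
    smoothed = values[0]
    errors = []
    for actual in values[1:]:
        # The current smoothed value is the forecast for this period.
        errors.append(actual - smoothed)
        smoothed = alpha * actual + (1 - alpha) * smoothed
    # Multiplier for the requested level (about 1.96 for 95%).
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * stdev(errors)
    return smoothed, smoothed - half_width, smoothed + half_width


# Hypothetical quarterly sales totals, not from the Vandelay data.
quarterly_sales = [100.0, 120.0, 90.0, 130.0, 110.0, 125.0]

forecast, low, high = ses_forecast(quarterly_sales)
print(f"forecast: {forecast:.1f}, interval: {low:.1f} to {high:.1f}")
```

The forecast sits in the middle of the interval, just like the solid line in the chart, and the interval's width is driven by how far past forecasts missed the actual values.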

10) Click on the plus sign next to QUARTER to drill down to MONTH. You'll see this:



Notice that the gap is smaller (because it is only leaving out one month, not one quarter),
and that the prediction interval is much narrower. The main reason for this is Tableau has
much more data to work with (60 months instead of 20 quarters).

The more data points you use, the better your predictions become.

11) Let's change the confidence level of the prediction interval. Go to the Analysis menu and
select Forecast/Forecast Options.

12) Change the prediction interval to 99%. Then click OK.

13) You'll see the prediction interval get slightly wider, since now you're asking Tableau to
present a range of values that will contain the actual value 99% of the time (instead of 95%).




To see why this is true, think about a game where you throw crumpled-up paper into a
wastebasket. Say you successfully get the paper into the wastebasket 95% of the time.
If you want to make sure you get it into the wastebasket 99% of the time, one option is
to buy a larger wastebasket!

A larger prediction interval is like a larger wastebasket.
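The size of the "larger wastebasket" can be sketched with a quick calculation. Assuming the forecast errors follow a normal distribution (an assumption made for this illustration, not something Tableau's dialog states), the half-width of the interval is a multiplier times the error spread, and only the multiplier changes with the confidence level. The z_multiplier helper is a hypothetical name.

```python
from statistics import NormalDist


def z_multiplier(level):
    """Standard-normal multiplier for a two-sided interval at the given level."""
    return NormalDist().inv_cdf(0.5 + level / 2)


z95 = z_multiplier(0.95)
z99 = z_multiplier(0.99)

print(round(z95, 2), round(z99, 2))  # 1.96 2.58
print(f"99% interval is about {z99 / z95 - 1:.0%} wider than the 95% one")
```

Under this assumption the 99% interval is roughly a third wider than the 95% interval, which matches the modest widening you see in the chart.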


14) Save your Tableau workbook and close it.

Part 3: Perform an association analysis in Tableau
(Adapted from kb.tableausoftware.com/articles/knowledgebase/marketbasketanalysis)

Association analysis is discovering when events occur at the same time. In this case, we're
looking for which products are purchased together (within the same order).
Tableau doesn't have an association analysis function, but with some clever table joining we
can do a simple version of the type of analysis more sophisticated data mining programs do.
1) Open Tableau again. Make sure you're starting a new Tableau file.

2) Click Connect to data.

3) Click Microsoft Excel.

4) Click ONCE on VandelayOrdersAll.xlsx. Just select the file; don't open it!

5) Click the down arrow next to Open and select Open with Legacy Connection.



6) Drag the VandelayOrders(All) sheet to the white space.

7) Drag the VandelayOrders(All) sheet to the white space a second time. It should
look like this, although the Join dialog may cover up the second VandelayOrders(All) sheet:

8) If you don't see the Join dialog, click on the join area between the two sheets:


9) You'll create two joins:

Select ProductName from Data Source and VandelayOrders(All)$1.
Select the <> symbol from the middle drop-down box.

Select OrderID from Data Source and VandelayOrders(All)$1.
Select the = symbol from the middle drop-down box.

It should look EXACTLY like this:



So what does this mean? It's called a self-join: you're connecting the table with itself.

You're asking Tableau to match up any combination of different products
(ProductName <> ProductName)
that are part of the same order
(OrderID = OrderID).
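If you'd like to see the self-join outside of Tableau, here is a minimal sketch using Python's built-in sqlite3 module. The two join conditions are the same ones set up in the Join dialog. The table name orders and the handful of rows are hypothetical; only the OrderID and ProductName field names come from this exercise.

```python
import sqlite3

# Hypothetical line items (OrderID, ProductName); not the Vandelay data.
rows = [
    (1, "Jeans"), (1, "Socks"), (1, "T-Shirt"),
    (2, "Jeans"), (2, "Socks"),
    (3, "T-Shirt"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (OrderID INTEGER, ProductName TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)

# Join the table with itself: same order, different products.
pairs = conn.execute(
    """
    SELECT a.OrderID, a.ProductName, b.ProductName
    FROM orders AS a
    JOIN orders AS b
      ON a.OrderID = b.OrderID
     AND a.ProductName <> b.ProductName
    ORDER BY a.OrderID, a.ProductName, b.ProductName
    """
).fetchall()
conn.close()

for row in pairs:
    print(row)
```

Notice that every pair shows up twice, once in each direction (Jeans with Socks and Socks with Jeans), and that an order with only one product produces no rows at all.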

10) When you have this set up like the image above, click Go to Worksheet.

11) Drag the ProductName dimension from VandelayOrders(All)$ (from the first set of
dimensions) to the Columns shelf.

Then drag the ProductName dimension from VandelayOrders(All)$1 (from the second
set of dimensions) to the Rows shelf.



12) You'll see something like this:

13) Under Measures, drag Number of Records to the Text icon under the Marks area.



14) You'll now see this:



This shows how many orders contained both products. For example, look at the first row.
We now know that Anti-Dentite Jeans and Anytown, USA Sweatshirts appeared together in
the same order 3 times (hover your mouse over the product name to see the whole thing).

Here are a few more:

Bad Breaker Upper Socks and Armoire T-Shirts appeared in the same order 43 times.
Baby Boxers and Astronaut Pen Boxers appeared in the same order 5 times.
BOSCO T-Shirts and Anti-Dentite Jeans appeared in the same order 2 times.
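The same kind of count can be sketched in a few lines of Python. The orders below are hypothetical, not the Vandelay data, and the approach is a simplification: each pair of products is counted once per order, regardless of direction, whereas the Tableau table shows every pair in both directions.

```python
from collections import Counter
from itertools import combinations

# Hypothetical orders: order ID -> products in that order.
orders = {
    1: ["Jeans", "Socks", "T-Shirt"],
    2: ["Jeans", "Socks"],
    3: ["Socks", "T-Shirt"],
    4: ["Jeans"],
}

pair_counts = Counter()
for products in orders.values():
    # sorted(set(...)) drops repeated products within an order and gives
    # each unordered pair a single, consistent spelling.
    for pair in combinations(sorted(set(products)), 2):
        pair_counts[pair] += 1

for pair, count in pair_counts.most_common():
    print(pair, count)
```

Here Jeans and Socks appear together in two orders, as do Socks and T-Shirt, while the single-product order contributes nothing.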

15) It's not difficult to understand, but it would be easier if we could generate an easy-to-read
visual of this data.

Drag Number of Records to the Color icon in the Marks area.

16) Click on the Color icon in the Marks area, then click Edit Colors.

17) Choose Area Red for the Palette.

18) Go back to Measures and drag SUM(Number of Records) in the Marks area to the Size
icon in the Marks area.

19) Click the Size icon in the Marks area and move the slider about two-thirds of the way to the
right.



20) It's now very easy to see the product combinations that are most popular.

21) If you want to see detailed information about a product combination, hover your mouse
over a square.



22) Save your Tableau workbook.
