Ian H. Witten
Department of Computer Science, University of Waikato, New Zealand
weka.waikato.ac.nz
Data Mining with Weka
A practical course on how to use Weka for data mining
Explains the basic principles of several popular algorithms
What's data mining?
We are overwhelmed with data
Data mining is about going from data to information, information that can give you useful predictions
Examples?
You're at the supermarket checkout. You're happy with your bargains, and the supermarket is happy you've bought some more stuff.
Say you want a child, but you and your partner can't have one. Can data mining help?
Data mining vs. machine learning
What's Weka?
A bird found only in New Zealand?
Data mining workbench
Waikato Environment for Knowledge Analysis
Machine learning algorithms for data mining tasks
100+ algorithms for classification
75 for data preprocessing
25 to assist with feature selection
20 for clustering, finding association rules, etc.
What will you learn?
Load data into Weka and look at it
Use filters to preprocess it
Explore it using interactive visualization
Apply classification algorithms
Interpret the output
Understand evaluation methods and their implications
Understand various representations for models
Explain how popular machine learning algorithms work
Be aware of common pitfalls with data mining
Class 1: Getting started with Weka
Install Weka
Explore the Explorer interface
Explore some datasets
Build a classifier
Interpret the output
Use filters
Visualize your dataset
Course organization
Class 1 Getting started with Weka
Class 2 Evaluation
Class 3 Simple classifiers
Class 4 More classifiers
Class 5 Putting it all together
Class 1 consists of Lessons 1.1 to 1.6, each paired with an activity (Activity 1 to Activity 6)
Post-class assessment 2/3
Textbook
This textbook discusses data mining, and Weka, in depth:
Data Mining: Practical machine learning tools and techniques, by Ian H. Witten, Eibe Frank and Mark A. Hall. Morgan Kaufmann, 2011
Lesson 1.2: Exploring the Explorer
Download from http://www.cs.waikato.ac.nz/ml/weka (for Windows, Mac, Linux)
Weka 3.6.10 (the latest stable version of Weka; includes datasets for the course)
It's important to get the right version, 3.6.10
The weather data (columns are attributes, rows are instances):
No.  Outlook   Temp  Humidity  Windy  Play
1    Sunny     Hot   High      False  No
2    Sunny     Hot   High      True   No
3    Overcast  Hot   High      False  Yes
4    Rainy     Mild  High      False  Yes
5    Rainy     Cool  Normal    False  Yes
6    Rainy     Cool  Normal    True   No
7    Overcast  Cool  Normal    True   Yes
8    Sunny     Mild  High      False  No
9    Sunny     Cool  Normal    False  Yes
10   Rainy     Mild  Normal    False  Yes
11   Sunny     Mild  Normal    True   Yes
12   Overcast  Mild  High      True   Yes
13   Overcast  Hot   Normal    False  Yes
14   Rainy     Mild  High      True   No
Open file weather.nominal.arff
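The file opened here is in Weka's ARFF format: a header that names the relation and declares each attribute, followed by the data, one comma-separated instance per line. As a rough sketch (not the Explorer's own loader), the Python below writes the 14 weather instances as ARFF text and reads them back. The relation name, the spelling of the values, and the helper names `to_arff` and `parse_arff` are assumptions made for illustration; the file shipped with Weka may differ in such details.

```python
# Minimal ARFF writer/reader for nominal data (illustrative sketch only).

# (attribute name, allowed nominal values); spellings are assumed.
ATTRIBUTES = [
    ("outlook", ["sunny", "overcast", "rainy"]),
    ("temperature", ["hot", "mild", "cool"]),
    ("humidity", ["high", "normal"]),
    ("windy", ["TRUE", "FALSE"]),
    ("play", ["yes", "no"]),
]

# The 14 instances of the weather table, one tuple per instance.
ROWS = [
    ("sunny", "hot", "high", "FALSE", "no"),
    ("sunny", "hot", "high", "TRUE", "no"),
    ("overcast", "hot", "high", "FALSE", "yes"),
    ("rainy", "mild", "high", "FALSE", "yes"),
    ("rainy", "cool", "normal", "FALSE", "yes"),
    ("rainy", "cool", "normal", "TRUE", "no"),
    ("overcast", "cool", "normal", "TRUE", "yes"),
    ("sunny", "mild", "high", "FALSE", "no"),
    ("sunny", "cool", "normal", "FALSE", "yes"),
    ("rainy", "mild", "normal", "FALSE", "yes"),
    ("sunny", "mild", "normal", "TRUE", "yes"),
    ("overcast", "mild", "high", "TRUE", "yes"),
    ("overcast", "hot", "normal", "FALSE", "yes"),
    ("rainy", "mild", "high", "TRUE", "no"),
]


def to_arff(relation, attributes, rows):
    """Render nominal attributes and rows as ARFF text."""
    lines = [f"@relation {relation}", ""]
    for name, values in attributes:
        lines.append(f"@attribute {name} {{{', '.join(values)}}}")
    lines += ["", "@data"]
    lines += [",".join(row) for row in rows]
    return "\n".join(lines) + "\n"


def parse_arff(text):
    """Read back the subset of ARFF produced by to_arff."""
    attributes, rows, in_data = [], [], False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blanks and comments
            continue
        lower = line.lower()
        if in_data:
            rows.append(tuple(v.strip() for v in line.split(",")))
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            values = [v.strip() for v in spec.strip("{}").split(",")]
            attributes.append((name, values))
        elif lower.startswith("@data"):
            in_data = True
    return attributes, rows


arff_text = to_arff("weather", ATTRIBUTES, ROWS)
parsed_attributes, parsed_rows = parse_arff(arff_text)
print(arff_text)
```

Opening the real file in a text editor shows the same three parts, which is a quick way to check what the Explorer's attribute list is reporting.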
Install Weka
Get datasets
Open Explorer
Open a dataset (weather.nominal.arff)
Look at attributes and their values
Edit the dataset
Save it?
Course text
Section 1.2 The weather problem
Chapter 10 Introduction to Weka
Lesson 1.3: Exploring datasets
The same weather data, with the last attribute (Play) marked as the class
Classification
Sometimes called supervised learning
Classified example: an instance (a fixed set of features) together with its class
Attributes are discrete (nominal) or continuous (numeric)
Discrete class: classification problem
Continuous class: regression problem
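The distinction above can be made mechanical. The sketch below assumes attribute types written the way ARFF declares them: a brace-enclosed value list for a nominal attribute, or one of the keywords numeric, real or integer for a numeric one. It then names the kind of learning problem from the type of the class attribute. The function names and the example declarations are hypothetical, and other ARFF types (such as string or date) are deliberately not handled.

```python
NUMERIC_KEYWORDS = {"numeric", "real", "integer"}


def attribute_kind(type_spec):
    """Return 'nominal' or 'numeric' for an ARFF-style type declaration."""
    spec = type_spec.strip()
    if spec.startswith("{") and spec.endswith("}"):
        return "nominal"
    if spec.lower() in NUMERIC_KEYWORDS:
        return "numeric"
    raise ValueError(f"unsupported attribute type: {type_spec!r}")


def problem_kind(class_type_spec):
    """Discrete class means classification; continuous class means regression."""
    kind = attribute_kind(class_type_spec)
    return "classification" if kind == "nominal" else "regression"


print(problem_kind("{yes, no}"))  # a nominal class such as Play
print(problem_kind("numeric"))    # a hypothetical numeric class
```

Note that only the class attribute decides the problem type: a dataset with numeric attributes and a nominal class is still a classification problem.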
Open file weather.numeric.arff
[Screenshot with the attributes, the attribute values and the class labelled]
Open file glass.arff
The classification problem
weather.nominal, weather.numeric
Nominal vs numeric attributes
ARFF file format
glass.arff dataset
Sanity checking attributes
Lesson 1.4: Building a classifier
Use J48 to analyze the glass dataset
Open file glass.arff (or leave it open from the last lesson)
Investigate J48
Open the configuration panel
Check the More information
Examine the options
Use an unpruned tree
Look at leaf sizes
Set minNumObj to 15 to avoid small leaves
Visualize tree using right-click menu
From C4.5 to J48
ID3 (1979)
C4.5 (1993)
C4.8 (1996?)
C5.0 (commercial)
J48
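ID3, the oldest entry in this lineage, grows a tree by repeatedly splitting on the attribute with the highest information gain. The sketch below computes that quantity for the four non-class attributes of the weather data. It is not J48: C4.5 and its descendants refine the splitting criterion and add pruning, so treat this only as an illustration of the basic idea. The function names are hypothetical.

```python
from collections import Counter
from math import log2

# Weather data: (outlook, temperature, humidity, windy, play)
DATA = [
    ("sunny", "hot", "high", "false", "no"),
    ("sunny", "hot", "high", "true", "no"),
    ("overcast", "hot", "high", "false", "yes"),
    ("rainy", "mild", "high", "false", "yes"),
    ("rainy", "cool", "normal", "false", "yes"),
    ("rainy", "cool", "normal", "true", "no"),
    ("overcast", "cool", "normal", "true", "yes"),
    ("sunny", "mild", "high", "false", "no"),
    ("sunny", "cool", "normal", "false", "yes"),
    ("rainy", "mild", "normal", "false", "yes"),
    ("sunny", "mild", "normal", "true", "yes"),
    ("overcast", "mild", "high", "true", "yes"),
    ("overcast", "hot", "normal", "false", "yes"),
    ("rainy", "mild", "high", "true", "no"),
]
NAMES = ["outlook", "temperature", "humidity", "windy"]


def entropy(labels):
    """Entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attr_index, class_index=-1):
    """Entropy of the class minus its weighted entropy after the split."""
    labels = [row[class_index] for row in rows]
    groups = {}
    for row in rows:
        groups.setdefault(row[attr_index], []).append(row[class_index])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


gains = {name: information_gain(DATA, i) for i, name in enumerate(NAMES)}
for name, gain in sorted(gains.items(), key=lambda item: -item[1]):
    print(f"{name:12s} {gain:.3f} bits")
```

On this data outlook has the highest gain (about 0.247 bits), so a tree grown this way would split on outlook first.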
Classifiers in Weka
Classifying the glass dataset
Interpreting J48 output
J48 configuration panel
option: pruned vs unpruned trees
option: avoid small leaves
J48 ~ C4.5
Course text
Section 11.1 Building a decision tree
Examining the output
Lesson 1.5: Using a filter
Use a filter to remove an attribute
Open weather.nominal.arff (again!)
Check the filters: supervised vs unsupervised, attribute vs instance
Choose the unsupervised attribute filter Remove
Check the More information; look at the options
Set attributeIndices to 3 and click OK
Apply the filter
Recall that you can Save the result
Press Undo
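What the Remove step does to the data can be sketched in a few lines of Python. This is an illustration, not Weka's implementation: the function `remove_attributes` is hypothetical, and the only detail carried over from the filter is that the index counts from 1, so index 3 refers to the third attribute of the weather data, Humidity.

```python
HEADER = ["Outlook", "Temp", "Humidity", "Windy", "Play"]
ROWS = [
    ("Sunny", "Hot", "High", "False", "No"),
    ("Sunny", "Hot", "High", "True", "No"),
    ("Overcast", "Hot", "High", "False", "Yes"),
    ("Rainy", "Mild", "High", "False", "Yes"),
    ("Rainy", "Cool", "Normal", "False", "Yes"),
    ("Rainy", "Cool", "Normal", "True", "No"),
    ("Overcast", "Cool", "Normal", "True", "Yes"),
    ("Sunny", "Mild", "High", "False", "No"),
    ("Sunny", "Cool", "Normal", "False", "Yes"),
    ("Rainy", "Mild", "Normal", "False", "Yes"),
    ("Sunny", "Mild", "Normal", "True", "Yes"),
    ("Overcast", "Mild", "High", "True", "Yes"),
    ("Overcast", "Hot", "Normal", "False", "Yes"),
    ("Rainy", "Mild", "High", "True", "No"),
]


def remove_attributes(header, rows, indices):
    """Return copies of header and rows without the given 1-based columns."""
    drop = {i - 1 for i in indices}  # convert to 0-based positions
    for position in drop:
        if not 0 <= position < len(header):
            raise IndexError(f"no attribute at index {position + 1}")
    keep = [i for i in range(len(header)) if i not in drop]
    new_header = [header[i] for i in keep]
    new_rows = [tuple(row[i] for i in keep) for row in rows]
    return new_header, new_rows


new_header, new_rows = remove_attributes(HEADER, ROWS, [3])
print(new_header)
print(new_rows[0])
```

Because the function returns new lists and leaves `HEADER` and `ROWS` untouched, going back to the original data is trivial, which loosely mirrors what Undo offers in the Explorer.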
Remove instances where humidity is high
Supervised or unsupervised?
Attribute or instance?
Look at them
Select RemoveWithValues
Set attributeIndex
Set nominalIndices
Apply
Undo
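The effect of this step on the weather data can be sketched as follows. The sketch is an assumption-laden stand-in for the filter: the hypothetical function `remove_with_value` takes the value itself ("High") rather than a label index, which is how nominalIndices selects values in Weka, and it keeps the 1-based attribute numbering used in the Explorer.

```python
# Weather data: (Outlook, Temp, Humidity, Windy, Play)
ROWS = [
    ("Sunny", "Hot", "High", "False", "No"),
    ("Sunny", "Hot", "High", "True", "No"),
    ("Overcast", "Hot", "High", "False", "Yes"),
    ("Rainy", "Mild", "High", "False", "Yes"),
    ("Rainy", "Cool", "Normal", "False", "Yes"),
    ("Rainy", "Cool", "Normal", "True", "No"),
    ("Overcast", "Cool", "Normal", "True", "Yes"),
    ("Sunny", "Mild", "High", "False", "No"),
    ("Sunny", "Cool", "Normal", "False", "Yes"),
    ("Rainy", "Mild", "Normal", "False", "Yes"),
    ("Sunny", "Mild", "Normal", "True", "Yes"),
    ("Overcast", "Mild", "High", "True", "Yes"),
    ("Overcast", "Hot", "Normal", "False", "Yes"),
    ("Rainy", "Mild", "High", "True", "No"),
]


def remove_with_value(rows, attribute_index, value):
    """Drop every instance whose attribute (1-based index) equals value."""
    column = attribute_index - 1
    return [row for row in rows if row[column] != value]


remaining = remove_with_value(ROWS, 3, "High")
print(len(ROWS), "instances before,", len(remaining), "after")
for row in remaining:
    print(row)
```

Half of the 14 instances have high humidity, so 7 remain. The number of attributes is unchanged, which is the difference between an instance filter like this one and an attribute filter like Remove.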
Fewer attributes, better classification!
Open glass.arff
Run J48 (trees > J48)
Remove Fe
Remove all attributes except RI and Mg
Look at the decision trees
Use right-click menu to visualize decision trees
Filters in Weka
Supervised vs unsupervised, attribute vs instance
To find the right one, you need to look!
Filters can be very powerful
Judiciously removing attributes can improve performance and increase comprehensibility
Course text
Section 11.2 Loading and filtering files
Lesson 1.6: Visualizing your data
Using the Visualize panel
Open iris.arff
Bring up Visualize panel
Click one of the plots; examine some instances
Set x axis to petalwidth and y axis to petallength
Click on Class colour to change the colour
Bars on the right correspond to attributes: click for x axis; right-click for y axis
Jitter slider
Show Select Instance: Rectangle option
Submit, Reset, Clear and Save
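Jitter, in general, means nudging each plotted point by a small random amount so that instances with identical coordinates no longer sit exactly on top of one another. The sketch below shows the idea only. How Weka scales its slider is not covered here, so the uniform noise, the `amount` parameter and the sample points are all assumptions.

```python
import random


def jitter(points, amount, seed=None):
    """Shift each (x, y) point by uniform noise in [-amount, +amount]."""
    rng = random.Random(seed)  # seeded so the result is repeatable
    return [
        (x + rng.uniform(-amount, amount), y + rng.uniform(-amount, amount))
        for x, y in points
    ]


# Three coincident points plus one distinct point (made-up values).
points = [(0.2, 1.4), (0.2, 1.4), (0.2, 1.4), (1.3, 4.0)]
shaken = jitter(points, amount=0.05, seed=1)

print(len(set(points)), "distinct positions before jitter")
print(len(set(shaken)), "distinct positions after jitter")
```

The jittered coordinates are for display only; the underlying data values are not changed.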
Visualizing classification errors
Run J48 (trees > J48)
Visualize classifier errors (from Results list)
Plot predicted class against class
Identify errors shown by confusion matrix
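The link between the confusion matrix and the individual errors in the plot can be sketched as follows. Rows stand for the actual class and columns for the predicted class, so every off-diagonal count corresponds to misclassified instances. The labels below are made up for illustration and are not real J48 output; the function name is hypothetical.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


# Made-up class values for six instances (not real classifier output).
actual = ["yes", "yes", "no", "no", "yes", "no"]
predicted = ["yes", "no", "no", "yes", "yes", "no"]
labels = ["yes", "no"]

matrix = confusion_matrix(actual, predicted, labels)
for label, row in zip(labels, matrix):
    print(f"actual {label:3s}", row)

# The instances behind the off-diagonal cells: (position, actual, predicted).
errors = [
    (i, a, p) for i, (a, p) in enumerate(zip(actual, predicted)) if a != p
]
print("misclassified:", errors)
```

Plotting predicted class against actual class puts these same errors off the diagonal of the plot, which is why the two views can be matched up cell by cell.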
Get down and dirty with your data
Visualize it
Clean it up by deleting outliers
Look at classification errors (there's a filter that allows you to add classifications as a new attribute)
Creative Commons Attribution 3.0 Unported License
creativecommons.org/licenses/by/3.0/