
Data Mining with Weka

Class 1, Lesson 1: Introduction

Ian H. Witten
Department of Computer Science, University of Waikato, New Zealand

weka.waikato.ac.nz

A practical course on how to use Weka for data mining
Explains the basic principles of several popular algorithms


What's data mining?
We are overwhelmed with data.
Data mining is about going from data to information: information that can give you useful predictions.

Examples?
You're at the supermarket checkout. You're happy with your bargains, and the supermarket is happy you've bought some more stuff.
Say you want a child, but you and your partner can't have one. Can data mining help?

Data mining vs. machine learning

What's Weka?
A bird found only in New Zealand?

Data mining workbench
Waikato Environment for Knowledge Analysis
Machine learning algorithms for data mining tasks:
  100+ algorithms for classification
  75 for data preprocessing
  25 to assist with feature selection
  20 for clustering, finding association rules, etc.

What will you learn?
Load data into Weka and look at it
Use filters to preprocess it
Explore it using interactive visualization
Apply classification algorithms
Interpret the output
Understand evaluation methods and their implications
Understand various representations for models
Explain how popular machine learning algorithms work
Be aware of common pitfalls with data mining

Use Weka on your own data and understand what you are doing!

Class 1: Getting started with Weka
Install Weka
Explore the Explorer interface
Explore some datasets
Build a classifier
Interpret the output
Use filters
Visualize your dataset

Course organization
Class 1  Getting started with Weka
  Lesson 1.1  Activity 1
  Lesson 1.2  Activity 2
  Lesson 1.3  Activity 3
  Lesson 1.4  Activity 4
  Lesson 1.5  Activity 5
  Lesson 1.6  Activity 6
Class 2  Evaluation
Class 3  Simple classifiers
Class 4  More classifiers
Class 5  Putting it all together

Course organization
Class 1  Getting started with Weka
Class 2  Evaluation
Class 3  Simple classifiers
Class 4  More classifiers
Class 5  Putting it all together
Mid-class assessment  1/3
Post-class assessment  2/3

Textbook
This textbook discusses data mining, and Weka, in depth:
Data Mining: Practical machine learning tools and techniques, by Ian H. Witten, Eibe Frank and Mark A. Hall. Morgan Kaufmann, 2011.
The publisher has made available parts relevant to this course in ebook format.

World Map by David Niblack, licensed under a Creative Commons Attribution 3.0 Unported License

Class 1, Lesson 2: Exploring the Explorer


Class 1  Getting started with Weka
  Lesson 1.1  Introduction
  Lesson 1.2  Exploring the Explorer
  Lesson 1.3  Exploring datasets
  Lesson 1.4  Building a classifier
  Lesson 1.5  Using a filter
  Lesson 1.6  Visualizing your data
Class 2  Evaluation
Class 3  Simple classifiers
Class 4  More classifiers
Class 5  Putting it all together

Download from http://www.cs.waikato.ac.nz/ml/weka (for Windows, Mac, Linux)
Weka 3.6.10: the latest stable version of Weka, which includes the datasets for the course
It's important to get the right version, 3.6.10

[Screenshot labels: performance comparisons, graphical interface, command-line interface]


The weather data: 14 instances (rows), each described by 5 attributes (columns)

 #   Outlook   Temp  Humidity  Windy  Play
 1   Sunny     Hot   High      False  No
 2   Sunny     Hot   High      True   No
 3   Overcast  Hot   High      False  Yes
 4   Rainy     Mild  High      False  Yes
 5   Rainy     Cool  Normal    False  Yes
 6   Rainy     Cool  Normal    True   No
 7   Overcast  Cool  Normal    True   Yes
 8   Sunny     Mild  High      False  No
 9   Sunny     Cool  Normal    False  Yes
10   Rainy     Mild  Normal    False  Yes
11   Sunny     Mild  Normal    True   Yes
12   Overcast  Mild  High      True   Yes
13   Overcast  Hot   Normal    False  Yes
14   Rainy     Mild  High      True   No
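When this dataset is opened in the Explorer, the Preprocess panel shows, for the selected attribute, each of its values and how many instances have it. That tally can be sketched in a few lines of Python. The lower-case value names and the attribute name temperature are assumptions of this sketch, not copied from the shipped ARFF file.

```python
from collections import Counter

# The 14 weather instances from the table above, stored column by column.
# Lower-case value names are an assumption of this sketch.
WEATHER = {
    "outlook": "sunny sunny overcast rainy rainy rainy overcast sunny sunny rainy sunny overcast overcast rainy".split(),
    "temperature": "hot hot hot mild cool cool cool mild cool mild mild mild hot mild".split(),
    "humidity": "high high high high normal normal normal high normal normal normal high normal high".split(),
    "windy": "false true false false false true true false false false true true false true".split(),
    "play": "no no yes yes yes no yes no yes yes yes yes yes no".split(),
}


def value_counts(columns):
    """For every attribute, count how many instances take each of its values."""
    return {name: Counter(values) for name, values in columns.items()}


counts = value_counts(WEATHER)
for name, tally in counts.items():
    print(f"{name:12} {dict(tally)}")
```

For the class attribute play, this reports 9 instances with yes and 5 with no.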

Open file weather.nominal.arff

[Screenshot labels: attributes, attribute values]

Install Weka
Get datasets
Open Explorer
Open a dataset (weather.nominal.arff)
Look at attributes and their values
Edit the dataset
Save it?
Course text: Section 1.2 The weather problem; Chapter 10 Introduction to Weka

Class 1, Lesson 3: Exploring datasets




Open file weather.nominal.arff
[Screenshot labels: attributes, attribute values, class]

Classification (sometimes called supervised learning)
Dataset: classified examples; result: a model that classifies new examples
Each instance, or classified example, is a fixed set of features: attribute 1, attribute 2, ..., attribute n, plus the class
Values can be discrete (nominal) or continuous (numeric)
A discrete class makes it a classification problem; a continuous class makes it a regression problem
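Whether a task is classification or regression depends only on the type of the class. A minimal sketch, assuming an in-memory representation invented for this example: the Instance type and the problem_type helper are hypothetical, not Weka classes, and the numeric class values are made up.

```python
from dataclasses import dataclass
from typing import Tuple, Union

Value = Union[str, float]


@dataclass(frozen=True)
class Instance:
    """A classified example: a fixed set of attribute values plus a class value."""
    attributes: Tuple[Value, ...]
    label: Value


def problem_type(instances):
    """A nominal (discrete) class gives classification; a numeric (continuous) class gives regression."""
    numeric = all(isinstance(inst.label, (int, float)) and not isinstance(inst.label, bool)
                  for inst in instances)
    return "regression" if numeric else "classification"


# Instances 1 and 3 of the weather data: the class "play" is nominal.
weather = [Instance(("sunny", "hot", "high", "false"), "no"),
           Instance(("overcast", "hot", "high", "false"), "yes")]

# Made-up numbers, for illustration only: a numeric class value.
made_up = [Instance(("sunny", "hot"), 12.5), Instance(("rainy", "mild"), 7.0)]

print(problem_type(weather))   # classification
print(problem_type(made_up))   # regression
```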

Open file weather.numeric.arff
[Screenshot labels: attributes, attribute values, class]

Open file glass.arff

The classification problem
weather.nominal, weather.numeric
Nominal vs. numeric attributes
ARFF file format
glass.arff dataset
Sanity-checking attributes
Course text: Section 11.1 Preparing the data; Loading the data into the Explorer
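The ARFF format is plain text: a @relation line, one @attribute declaration per attribute (the set of possible values in braces for a nominal attribute, the keyword numeric for a numeric one), then @data followed by one comma-separated line per instance. A small writer can be sketched as follows; the relation name and attribute names are illustrative assumptions, and the sketch ignores quoting, missing values and other ARFF details.

```python
def to_arff(relation, attributes, rows):
    """Render a dataset as ARFF text.

    attributes: list of (name, spec) pairs, where spec is either the string
    "numeric" or a list of the nominal values the attribute can take.
    rows: one tuple of values per instance, in attribute order.
    """
    lines = [f"@relation {relation}", ""]
    for name, spec in attributes:
        if spec == "numeric":
            lines.append(f"@attribute {name} numeric")
        else:
            lines.append(f"@attribute {name} {{{', '.join(spec)}}}")
    lines += ["", "@data"]
    lines += [",".join(str(value) for value in row) for row in rows]
    return "\n".join(lines) + "\n"


weather_attributes = [
    ("outlook", ["sunny", "overcast", "rainy"]),
    ("temperature", ["hot", "mild", "cool"]),
    ("humidity", ["high", "normal"]),
    ("windy", ["true", "false"]),
    ("play", ["yes", "no"]),
]
# The first three instances of the weather data.
first_rows = [
    ("sunny", "hot", "high", "false", "no"),
    ("sunny", "hot", "high", "true", "no"),
    ("overcast", "hot", "high", "false", "yes"),
]
arff_text = to_arff("weather", weather_attributes, first_rows)
print(arff_text)
```

In a numeric version of the data, an attribute such as temperature would be declared with the spec "numeric" instead of a value list.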

Class 1, Lesson 4: Building a classifier



Use J48 to analyze the glass dataset
Open file glass.arff (or leave it open from the last lesson)
Check the available classifiers
Choose the J48 decision tree learner (trees > J48)
Run it
Examine the output
Look at the correctly classified instances and the confusion matrix
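Weka's output reports the percentage of correctly classified instances and a confusion matrix whose rows are the actual classes and whose columns are the predicted classes. Both can be sketched from lists of actual and predicted labels; the ten predictions below are made up for illustration and are not J48 results on the glass data.

```python
def confusion_matrix(actual, predicted, labels):
    """Count instances per (actual, predicted) pair: rows = actual class, columns = predicted class."""
    position = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[position[a]][position[p]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of instances on the diagonal, i.e. correctly classified."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Hypothetical labels for ten test instances.
actual    = ["yes", "yes", "no", "yes", "no", "no", "yes", "yes", "no", "yes"]
predicted = ["yes", "no",  "no", "yes", "yes", "no", "yes", "yes", "no", "yes"]

m = confusion_matrix(actual, predicted, ["yes", "no"])
print(m)                                                  # [[5, 1], [1, 3]]
print(f"{100 * accuracy(m):.1f}% correctly classified")   # 80.0% correctly classified
```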

Investigate J48
Open the configuration panel
Click More to read the information about J48
Examine the options
Use an unpruned tree
Look at leaf sizes
Set minNumObj to 15 to avoid small leaves
Visualize the tree using the right-click menu
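The minNumObj option sets the minimum number of instances per leaf; J48's default is 2. One simplified way to picture its effect, assuming (as in C4.5's minimum-cases test) that a candidate split is accepted only if at least two of its branches receive minNumObj or more training instances. split_allowed is a hypothetical helper, not Weka code.

```python
def split_allowed(branch_sizes, min_num_obj=2):
    """Simplified minimum-leaf-size test: require at least two branches that
    each receive min_num_obj or more training instances."""
    big_enough = sum(1 for size in branch_sizes if size >= min_num_obj)
    return big_enough >= 2


# Splitting the 14 weather instances on outlook gives branches of 5, 4 and 5.
print(split_allowed([5, 4, 5]))                   # True with the default of 2
print(split_allowed([5, 4, 5], min_num_obj=15))   # False: no branch reaches 15
```

Raising the threshold rules out splits that would isolate only a handful of instances, so the tree ends up with fewer, larger leaves.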

From C4.5 to J48
ID3 (1979)
C4.5 (1993)
C4.8 (1996?); C5.0 (commercial)
J48
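ID3 chose splits by information gain; C4.5, and therefore J48, uses the gain ratio, which divides the gain by the split information of the attribute. The sketch below computes both for outlook on the weather data. It is a simplified illustration: it handles nominal attributes only and leaves out numeric attributes, missing values and the other heuristics C4.5 applies.

```python
from collections import Counter
from math import log2

# outlook and play columns of the weather data (instances 1 to 14).
OUTLOOK = "sunny sunny overcast rainy rainy rainy overcast sunny sunny rainy sunny overcast overcast rainy".split()
PLAY = "no no yes yes yes no yes no yes yes yes yes yes no".split()


def entropy(values):
    """Information in bits: -sum p * log2(p) over the observed value frequencies."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def info_gain(attribute, labels):
    """Class entropy minus the weighted class entropy after splitting on the attribute."""
    n = len(labels)
    remainder = 0.0
    for value in set(attribute):
        subset = [label for a, label in zip(attribute, labels) if a == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def gain_ratio(attribute, labels):
    """Information gain divided by the split information (entropy of the attribute itself)."""
    return info_gain(attribute, labels) / entropy(attribute)


print(f"gain(outlook)       = {info_gain(OUTLOOK, PLAY):.3f} bits")   # 0.247 bits
print(f"gain ratio(outlook) = {gain_ratio(OUTLOOK, PLAY):.3f}")       # 0.156
```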

Classifiers in Weka
Classifying the glass dataset
Interpreting J48 output
J48 configuration panel
Option: pruned vs. unpruned trees
Option: avoid small leaves
J48 ~ C4.5
Course text: Section 11.1 Building a decision tree; Examining the output

Class 1, Lesson 5: Using a filter



Use a filter to remove an attribute
Open weather.nominal.arff (again!)
Check the filters: supervised vs. unsupervised, attribute vs. instance
Choose the unsupervised attribute filter Remove
Check the More information; look at the options
Set attributeIndices to 3 and click OK
Apply the filter
Recall that you can Save the result
Press Undo
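What the Remove filter does with attributeIndices set to 3 can be sketched as dropping the third column. Weka counts attribute indices from 1 and accepts range lists such as "1,4-5"; the parser here supports only plain numbers and hyphenated ranges (not keywords such as first or last), and remove_attributes is a hypothetical name. With the weather attributes in the order of the table, index 3 is humidity.

```python
def parse_range(spec, n_attributes):
    """Turn a 1-based range list such as "3" or "1,4-5" into a set of 0-based positions."""
    positions = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            low, high = part.split("-")
            positions.update(range(int(low) - 1, int(high)))
        else:
            positions.add(int(part) - 1)
    return {p for p in positions if 0 <= p < n_attributes}


def remove_attributes(header, rows, spec):
    """Return a copy of the data without the attributes selected by spec."""
    drop = parse_range(spec, len(header))
    keep = [i for i in range(len(header)) if i not in drop]
    return [header[i] for i in keep], [tuple(row[i] for i in keep) for row in rows]


header = ["outlook", "temperature", "humidity", "windy", "play"]
rows = [("sunny", "hot", "high", "false", "no"),
        ("overcast", "hot", "high", "false", "yes")]

new_header, new_rows = remove_attributes(header, rows, "3")
print(new_header)   # ['outlook', 'temperature', 'windy', 'play']
print(new_rows[0])  # ('sunny', 'hot', 'false', 'no')
```

Because the function returns a new copy and leaves the input untouched, the original data stays available, much as Undo restores it in the Explorer.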

Remove instances where humidity is high
Supervised or unsupervised? Attribute or instance?
Look at them
Select RemoveWithValues
Set attributeIndex
Set nominalIndices
Apply
Undo
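RemoveWithValues is an unsupervised instance filter: it deletes whole rows rather than columns, and its nominalIndices option picks values by their position in the attribute's declaration, not by name. This sketch assumes humidity is attribute 3 and that its values are declared in the order high, normal, so a nominal index of 1 means high; remove_with_values is a hypothetical helper.

```python
# Weather data from the table, column by column (lower-case names assumed).
OUTLOOK = "sunny sunny overcast rainy rainy rainy overcast sunny sunny rainy sunny overcast overcast rainy".split()
TEMPERATURE = "hot hot hot mild cool cool cool mild cool mild mild mild hot mild".split()
HUMIDITY = "high high high high normal normal normal high normal normal normal high normal high".split()
WINDY = "false true false false false true true false false false true true false true".split()
PLAY = "no no yes yes yes no yes no yes yes yes yes yes no".split()
ROWS = list(zip(OUTLOOK, TEMPERATURE, HUMIDITY, WINDY, PLAY))


def remove_with_values(rows, attribute_index, declared_values, nominal_indices):
    """Drop every instance whose attribute (1-based attribute_index) has one of the
    values picked out by the 1-based nominal_indices into declared_values."""
    column = attribute_index - 1
    unwanted = {declared_values[k - 1] for k in nominal_indices}
    return [row for row in rows if row[column] not in unwanted]


kept = remove_with_values(ROWS, attribute_index=3,
                          declared_values=["high", "normal"], nominal_indices=[1])
print(len(kept))   # 7 instances left, all with humidity = normal
```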

Fewer attributes, better classification!
Open glass.arff
Run J48 (trees > J48)
Remove Fe
Remove all attributes except RI and Mg
Look at the decision trees
Use the right-click menu to visualize decision trees

Filters in Weka
Supervised vs. unsupervised, attribute vs. instance
To find the right one, you need to look!
Filters can be very powerful
Judiciously removing attributes can improve performance and increase comprehensibility
Course text: Section 11.2 Loading and filtering files

Class 1, Lesson 6: Visualizing your data



Using the Visualize panel
Open iris.arff
Bring up the Visualize panel
Click one of the plots; examine some instances
Set the x-axis to petalwidth and the y-axis to petallength
Click on Class colour to change the colour
The bars on the right correspond to attributes: click for the x-axis, right-click for the y-axis
Jitter slider
Show the Select Instance: Rectangle option
Submit, Reset, Clear and Save
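The Jitter slider helps when many instances share the same coordinates and would otherwise be drawn on top of one another. The idea can be sketched as adding a small random offset to each point; uniform noise, the amount parameter and the fixed seed are assumptions of this sketch, not a description of Weka's implementation.

```python
import random


def jitter(points, amount, seed=0):
    """Offset each (x, y) point by uniform random noise in [-amount, amount] on both axes."""
    rng = random.Random(seed)
    return [(x + rng.uniform(-amount, amount), y + rng.uniform(-amount, amount))
            for x, y in points]


# Three instances with identical coordinates (illustrative values) plot as one dot...
overlapping = [(0.2, 1.4)] * 3
# ...but after jittering they separate into three nearby dots.
spread = jitter(overlapping, amount=0.05)
print(spread)
```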

Visualizing classification errors
Run J48 (trees > J48)
Visualize classifier errors (from the Results list)
Plot predicted class against class
Identify errors shown by the confusion matrix
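In a plot of predicted class against actual class, correctly classified instances are those where the two agree; the errors are the remaining points, which correspond to the off-diagonal entries of the confusion matrix. A sketch that lists those errors, numbering instances from 1; the labels are made up for illustration.

```python
from collections import Counter


def classification_errors(actual, predicted):
    """List (instance number, actual class, predicted class) for every misclassified instance."""
    return [(i, a, p)
            for i, (a, p) in enumerate(zip(actual, predicted), start=1)
            if a != p]


# Hypothetical labels for ten test instances.
actual    = ["yes", "yes", "no", "yes", "no", "no", "yes", "yes", "no", "yes"]
predicted = ["yes", "no",  "no", "yes", "yes", "no", "yes", "yes", "no", "yes"]

errors = classification_errors(actual, predicted)
print(errors)  # [(2, 'yes', 'no'), (5, 'no', 'yes')]

# Grouping the errors by (actual, predicted) gives the off-diagonal confusion-matrix cells.
print(Counter((a, p) for _, a, p in errors))
```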

Get down and dirty with your data
Visualize it
Clean it up by deleting outliers
Look at classification errors
(there's a filter that allows you to add classifications as a new attribute)
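Adding a model's classifications as a new attribute puts each prediction next to the actual class, which makes errors easy to pick out. A sketch of that idea on a reduced copy of the weather data (outlook, humidity, windy and the class play). add_classification is hypothetical, and toy_model is a hand-written rule standing in for a trained classifier; on these 14 instances it happens to classify every one correctly.

```python
# Reduced weather data from the table: outlook, humidity, windy and the class play.
OUTLOOK = "sunny sunny overcast rainy rainy rainy overcast sunny sunny rainy sunny overcast overcast rainy".split()
HUMIDITY = "high high high high normal normal normal high normal normal normal high normal high".split()
WINDY = "false true false false false true true false false false true true false true".split()
PLAY = "no no yes yes yes no yes no yes yes yes yes yes no".split()


def add_classification(header, rows, classify, name="classification"):
    """Return a copy of the data with classify(row) appended as a new last attribute."""
    return header + [name], [row + (classify(row),) for row in rows]


def toy_model(row):
    """Hand-written stand-in for a trained classifier, for illustration only."""
    outlook, humidity, windy, _ = row
    if outlook == "sunny":
        return "no" if humidity == "high" else "yes"
    if outlook == "rainy":
        return "no" if windy == "true" else "yes"
    return "yes"  # overcast


header = ["outlook", "humidity", "windy", "play"]
rows = list(zip(OUTLOOK, HUMIDITY, WINDY, PLAY))
new_header, labelled = add_classification(header, rows, toy_model)

# Instances whose predicted class (last column) differs from the actual class.
mistakes = [row for row in labelled if row[3] != row[4]]
print(new_header)     # ['outlook', 'humidity', 'windy', 'play', 'classification']
print(len(mistakes))  # 0
```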

Course text: Section 11.2 Visualization

Creative Commons Attribution 3.0 Unported License
creativecommons.org/licenses/by/3.0/
