
1/5/2017 Data Mining: Basic Methods

WVU, Data Mining: Fall '06
CS591O :: LCSEE :: WVU

Lectures
Lecture 3a

(These notes extended from Witten & Frank, chapter 4.)

1R: simplicity first
Simple algorithms often work surprisingly well.

In any case, try the simplest before trying the most complicated. Why try simple? Well... comparing a supposedly more sophisticated approach against a seemingly more stupid one (the "straw man") is good science.

(Warning: sometimes, straw men don't burn.)

1R ([Holte93]) assumes that all attributes are independent and that one attribute is more powerful than the rest.

It builds a 1-level decision tree. In other words, it generates a set of rules that all test on one particular attribute.

Basic version (assuming nominal attributes):

One branch for each of the attribute's values
Each branch assigns the most frequent class
Error rate: proportion of instances that don't belong to the majority class of their corresponding branch
Choose the attribute with the lowest error rate

Pseudocode for 1R:

For each attribute,
    For each value of the attribute, make a rule as follows:
        count how often each class appears
        find the most frequent class
        make the rule assign that class to this attribute value
    Calculate the error rate of the rules
Choose the rules with the smallest error rate

Note: "missing" is always treated as a separate attribute value.

Example:

OUTLOOK   TEMP  HUMIDITY  WINDY  PLAY

Sunny     Hot   High      False  No
Sunny     Hot   High      True   No
Overcast  Hot   High      False  Yes
Rainy     Mild  High      False  Yes
Rainy     Cool  Normal    False  Yes
Rainy     Cool  Normal    True   No
Overcast  Cool  Normal    True   Yes
Sunny     Mild  High      False  No
Sunny     Cool  Normal    False  Yes
Rainy     Mild  Normal    False  Yes
Sunny     Mild  Normal    True   Yes
Overcast  Mild  High      True   Yes
Overcast  Hot   Normal    False  Yes
Rainy     Mild  High      True   No

OneR:

ATTRIBUTE    RULES               ERRORS  TOTAL ERRORS
Outlook      Sunny    => No      2/5     4/14
             Overcast => Yes     0/4
             Rainy    => Yes     2/5
Temperature  Hot      => No*     2/4     5/14
             Mild     => Yes     2/6
             Cool     => Yes     1/4
Humidity     High     => No      3/7     4/14
             Normal   => Yes     1/7
Windy        False    => Yes     2/8     5/14
             True     => No*     3/6

(* marks an arbitrary choice between two equally frequent classes.)
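The 1R procedure above is short enough to sketch in full. This is an illustrative Python version (not Holte's code; the helper name `one_r` and the data layout are inventions of these notes), applied to the weather data:

```python
from collections import Counter, defaultdict

def one_r(instances, classes):
    """For each attribute, build one rule per value (predict the majority
    class for that value) and keep the attribute whose rules make the
    fewest errors on the training data."""
    best = None
    for attr in instances[0]:
        counts = defaultdict(Counter)           # value -> class -> frequency
        for inst, klass in zip(instances, classes):
            counts[inst[attr]][klass] += 1      # "missing" is just another value
        rules = {v: c.most_common(1)[0][0] for v, c in counts.items()}
        errors = sum(sum(c.values()) - max(c.values()) for c in counts.values())
        if best is None or errors < best[2]:
            best = (attr, rules, errors)
    return best

rows = [("sunny","hot","high","false","no"),     ("sunny","hot","high","true","no"),
        ("overcast","hot","high","false","yes"), ("rainy","mild","high","false","yes"),
        ("rainy","cool","normal","false","yes"), ("rainy","cool","normal","true","no"),
        ("overcast","cool","normal","true","yes"),("sunny","mild","high","false","no"),
        ("sunny","cool","normal","false","yes"), ("rainy","mild","normal","false","yes"),
        ("sunny","mild","normal","true","yes"),  ("overcast","mild","high","true","yes"),
        ("overcast","hot","normal","false","yes"),("rainy","mild","high","true","no")]
attrs = ["outlook", "temp", "humidity", "windy"]
instances = [dict(zip(attrs, r[:4])) for r in rows]
classes = [r[4] for r in rows]

attr, rules, errors = one_r(instances, classes)
print(attr, errors)   # outlook 4  (i.e. 4/14 total errors, as in the table)
```

Note that humidity ties outlook at 4/14; 1R just keeps whichever it saw first.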

Numeric attributes are discretized: the range of the attribute is divided into a set of intervals.

Instances are sorted according to the attribute's values
Breakpoints are placed where the (majority) class changes (so that the total error is minimized)

Example: temperature from the weather data:

64    65   68   69   70    71   72   72    75   75    80   81   83    85
Yes | No | Yes  Yes  Yes | No   No   Yes | Yes  Yes | No | Yes  Yes | No
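One way to sketch the breakpoint placement (an illustration, not Holte's exact procedure; note that two identical values, like the two 72s above, can never be separated, so the break is deferred to the next distinct value):

```python
def split_points(pairs):
    """Place a breakpoint wherever the class changes, deferring the break
    to the next change in value when identical values straddle a class
    change. `pairs` is a list of (value, class) tuples sorted by value."""
    intervals, current, pending = [], [pairs[0]], False
    for prev, cur in zip(pairs, pairs[1:]):
        changed = (cur[1] != prev[1]) or pending
        if changed and cur[0] != prev[0]:   # class changed, and we can split here
            intervals.append(current)
            current, pending = [], False
        elif changed:                       # same value on both sides: defer
            pending = True
        current.append(cur)
    intervals.append(current)
    return intervals

# The sorted temperature column above, paired with its class:
temps = [(64,"yes"),(65,"no"),(68,"yes"),(69,"yes"),(70,"yes"),(71,"no"),
         (72,"no"),(72,"yes"),(75,"yes"),(75,"yes"),(80,"no"),(81,"yes"),
         (83,"yes"),(85,"no")]
sizes = [len(i) for i in split_points(temps)]
print(sizes)   # [1, 1, 3, 3, 2, 1, 2, 1] -- the eight intervals shown above
```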

The problem of overfitting

The discretization procedure is very sensitive to noise
A single instance with an incorrect class label will most likely result in a separate interval
Simple solution: enforce a minimum number of instances in the majority class per interval
Weather data example (with the minimum set to B=3), showing the result of overfitting avoidance:

64   65   68   69   70    71   72   72   75   75    80   81   83   85
Yes  No   Yes  Yes  Yes | No   No   Yes  Yes  Yes | No   Yes  Yes  No

http://www.csee.wvu.edu/~timm/cs591o/old/BasicMethods.html

Repeating the above:

ATTRIBUTE    RULES                      ERRORS  TOTAL ERRORS
Outlook      Sunny    => No             2/5     4/14
             Overcast => Yes            0/4
             Rainy    => Yes            2/5
Temperature  <= 77.5  => Yes            3/10    5/14
             > 77.5   => No*            2/4
Humidity     <= 82.5  => Yes            1/7     3/14
             > 82.5 and <= 95.5 => No   2/6
             > 95.5   => Yes            0/1
Windy        False    => Yes            2/8     5/14
             True     => No*            3/6

1R was described in a paper by [Holte93].

It contains an experimental evaluation on 16 datasets (using cross-validation so that results were representative of performance on future data)
The minimum number of instances was set to 6 after some experimentation
1R's simple rules performed not much worse than much more complex decision trees
Simplicity first pays off!

Bayes Classifiers 101
A Bayes classifier is a simple learning scheme.

Advantages:

Tiny memory footprint
Fast training, fast learning
Simplicity
Often works surprisingly well

Assumptions:

Learning is done best via statistical modeling
Attributes are
    equally important
    statistically independent (given the class value)
This means that knowledge about the value of a particular attribute doesn't tell us anything about the value of another attribute (if the class is known)

Although based on assumptions that are almost never correct, this scheme works well in practice [Domingos97]:

Example
weather.symbolic.arff

outlook   temperature  humidity  windy  play

rainy     cool         normal    TRUE   no
rainy     mild         high      TRUE   no
sunny     hot          high      FALSE  no
sunny     hot          high      TRUE   no
sunny     mild         high      FALSE  no
overcast  cool         normal    TRUE   yes
overcast  hot          high      FALSE  yes
overcast  hot          normal    FALSE  yes
overcast  mild         high      TRUE   yes
rainy     cool         normal    FALSE  yes
rainy     mild         high      FALSE  yes
rainy     mild         normal    FALSE  yes
sunny     cool         normal    FALSE  yes
sunny     mild         normal    TRUE   yes

This data can be summarized as follows:

          Outlook          Temperature          Humidity
          ==============================================
          Yes  No          Yes  No              Yes  No
Sunny     2    3    Hot    2    2    High       3    4
Overcast  4    0    Mild   4    2    Normal     6    1
Rainy     3    2    Cool   3    1

Sunny     2/9  3/5  Hot    2/9  2/5  High       3/9  4/5
Overcast  4/9  0/5  Mild   4/9  2/5  Normal     6/9  1/5
Rainy     3/9  2/5  Cool   3/9  1/5

          Windy            Play
          =========================
          Yes  No          Yes   No
False     6    2           9     5
True      3    3

False     6/9  2/5         9/14  5/14
True      3/9  3/5

So, what happens on a new day:

Outlook  Temp.  Humidity  Windy  Play
Sunny    Cool   High      True   ?

First find the likelihood of the two classes:

For "yes" = 2/9 * 3/9 * 3/9 * 3/9 * 9/14 = 0.0053
For "no"  = 3/5 * 1/5 * 4/5 * 3/5 * 5/14 = 0.0206
Conversion into a probability by normalization:
P("yes") = 0.0053 / (0.0053 + 0.0206) = 0.205
P("no")  = 0.0206 / (0.0053 + 0.0206) = 0.795

So, we aren't playing golf today.
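The arithmetic above can be double-checked mechanically. A quick sketch (not part of the original notes) using Python's `fractions` module so the ratios stay exact until the final rounding:

```python
from fractions import Fraction as F

# Counts read off the summary tables above ("yes" has 9 instances, "no" has 5).
likelihood_yes = F(2, 9) * F(3, 9) * F(3, 9) * F(3, 9) * F(9, 14)
likelihood_no  = F(3, 5) * F(1, 5) * F(4, 5) * F(3, 5) * F(5, 14)

p_yes = likelihood_yes / (likelihood_yes + likelihood_no)
p_no  = likelihood_no  / (likelihood_yes + likelihood_no)

print(round(float(likelihood_yes), 4),  # 0.0053
      round(float(likelihood_no), 4))   # 0.0206
print(round(float(p_yes), 3),           # 0.205
      round(float(p_no), 3))            # 0.795
```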

Bayes' rule
More generally, the above is just an application of Bayes' Theorem.

Probability of event H given evidence E:

              Pr[E|H] * Pr[H]
    Pr[H|E] = ---------------
                   Pr[E]

A priori probability of H = Pr[H]
    Probability of the event before evidence has been seen
A posteriori probability of H = Pr[H|E]
    Probability of the event after evidence has been seen
Classification learning: what's the probability of the class given an instance?
    Evidence E = instance
    Event H = class value for the instance
Naive Bayes assumption: the evidence can be split into independent parts (i.e. the attributes of the instance!):

              Pr[E1|H] * Pr[E2|H] * ... * Pr[En|H] * Pr[H]
    Pr[H|E] = --------------------------------------------
                                Pr[E]

We used this above. Here's our evidence:

Outlook  Temp.  Humidity  Windy  Play
Sunny    Cool   High      True   ?

Here's the probability for "yes":

Pr[yes|E] = Pr[Outlook=Sunny|yes] *
            Pr[Temperature=Cool|yes] *
            Pr[Humidity=High|yes] *
            Pr[Windy=True|yes] *
            Pr[yes] / Pr[E]
          = (2/9 * 3/9 * 3/9 * 3/9 * 9/14) / Pr[E]

Numerical errors:
These come from multiplying lots of small numbers.

Use the standard fix: don't multiply the numbers, add their logs.
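A toy illustration of the underflow (an aside, not from the original notes):

```python
import math

# Multiplying many small probabilities underflows to zero in floating point;
# summing their logs does not.
probs = [1e-5] * 100
product = 1.0
for p in probs:
    product *= p          # the true product is 1e-500, far below float range
log_sum = sum(math.log(p) for p in probs)

print(product)            # 0.0 -- the information is gone
print(log_sum)            # about -1151.3 -- still perfectly usable
```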

Missing values
Missing values are a problem for any learner. Naive Bayes' treatment of missing values is particularly elegant.

During training: the instance is not included in the frequency count for that attribute-value/class combination

During classification: the attribute will be omitted from the calculation

Example:

Outlook  Temp.  Humidity  Windy  Play
?        Cool   High      True   ?

Likelihood of "yes" = 3/9 * 3/9 * 3/9 * 9/14 = 0.0238
Likelihood of "no"  = 1/5 * 4/5 * 3/5 * 5/14 = 0.0343
P("yes") = 0.0238 / (0.0238 + 0.0343) = 41%
P("no")  = 0.0343 / (0.0238 + 0.0343) = 59%

The"lowfrequenciesproblem"
Whatifanattributevaluedoesn'toccurwitheveryclassvalue(e.g."Humidity=high"forclass"yes")?

Probabilitywillbezero!
Pr[Humidity=High|yes]=0
Aposterioriprobabilitywillalsobezero!Pr[yes|E]=0(Nomatterhowlikelytheothervaluesare!)

Souseanestimatorsforlowfrequencyattributeranges

Addalittle"m"tothecountforeveryattributevalueclasscombination
TheLaplaceestimator
Result:probabilitieswillneverbezero!

Anduseanstimatorforlowfrequencyclasses

Addalittle"k"toclasscounts
TheMestimate

Magicnumbers:m=2,k=1
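A sketch of the two estimators (the function names are invented here; the formulas match the [Yang03] pseudocode in the next section: add k to the class counts for the prior, and m * prior to the frequency counts):

```python
from fractions import Fraction as F

m, k = 2, 1   # the magic numbers above

def class_prior(n_class, n_instances, n_classes):
    # M-estimate for the class prior: add a little "k" to each class count.
    return F(n_class + k, n_instances + k * n_classes)

def cond_prob(n_value_and_class, n_class, prior):
    # Laplace-style estimate for Pr[attr=value | class]: add m * prior.
    return (n_value_and_class + m * prior) / (n_class + m)

prior_yes = class_prior(9, 14, 2)      # 10/16 rather than 9/14
unseen = cond_prob(0, 9, prior_yes)    # a value never seen with "yes"...
print(unseen)                          # 5/44 -- small, but never zero
```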

Pseudocode
Here's the pseudocode of the Naive Bayes classifier preferred by [Yang03] (p. 4). It uses these globals:

# "F": frequency tables
# "I": number of instances
# "C": how many classes?
# "N": instances per class

When learning from training examples, update a frequency table:

function update(class, train) {
  # OUTPUT: changes to the globals.
  # INPUT:  a "train"ing example containing
  #         attribute/value pairs in some "class".
  I++                              # update number of instances
  if (++N[class] == 1)             # update counts for each class
  then C++                         # maybe, increase number of classes
  fi
  for <attr, value> in train
  do if (value != "?")             # skip missing values
     then F[class, attr, value]++  # increment frequency counts
     fi
  done
}

When testing, find the likelihood of each hypothetical class and return the one that is most likely.

function classify(test) {
  # OUTPUT: the most likely hypothesis for the test case.
  # INPUT:  a "test" case containing attribute/value pairs.
  m = 2                            # control for Laplace estimates
  k = 1                            # control for M-estimates
  like = -100000                   # initial, impossibly low likelihood
  for (H in N)                     # check all hypotheses
  do prior = (N[H] + k) / (I + (k * C))            # here's P[H]
     temp = log(prior)                             # use logs for small values
     for <attr, value> in test                     # for all items in the test case
     do if (value != "?")                          # skip missing values
        then inc = (F[H, attr, value] + (m * prior)) / (N[H] + m)   # P[Ei|H]
             temp += log(inc)                      # adding logs = multiplication
        fi
     done
     if (temp >= like)             # if we've got a better likelihood
     then like = temp
          class = H                # save this hypothesis and its likelihood
     fi
  done
  return class                     # return the class with the most likelihood
}

Handling Numerics
The above code assumes that the attributes are discrete. The usual approximation is to assume a "gaussian" (i.e. a "normal" or "bell-shaped" curve) for the numerics.

The probability density function for the normal distribution is defined by the mean and standardDev (standard deviation).

Given:

n: the number of values
sum: the sum of the values, i.e. sum = sum + value
sumSq: the sum of the squares of the values, i.e. sumSq = sumSq + value*value

Then:

function mean(sum, n) {
  return sum / n
}
function standardDeviation(sumSq, sum, n) {
  return sqrt((sumSq - ((sum * sum) / n)) / (n - 1))
}
function gaussianPdf(mean, standardDev, x) {
  pi = 1068966896 / 340262731   # good to 17 decimal places
  return 1 / (standardDev * sqrt(2 * pi)) *
         e^(-1 * ((x - mean)^2) / (2 * standardDev * standardDev))
}

For example:

outlook   temperature  humidity  windy  play

sunny     85           85        FALSE  no
sunny     80           90        TRUE   no
overcast  83           86        FALSE  yes
rainy     70           96        FALSE  yes
rainy     68           80        FALSE  yes
rainy     65           70        TRUE   no
overcast  64           65        TRUE   yes
sunny     72           95        FALSE  no
sunny     69           70        FALSE  yes
rainy     75           80        FALSE  yes
sunny     75           70        TRUE   yes
overcast  72           90        TRUE   yes
overcast  81           75        FALSE  yes
rainy     71           91        TRUE   no

This generates the following statistics:

          Outlook            Temperature            Humidity
          ===================================================
          Yes  No            Yes    No              Yes   No
Sunny     2    3             83     85              86    85
Overcast  4    0             70     80              96    90
Rainy     3    2             68     65              80    70
                             64     72              65    95
                             69     71              70    91
                             75                     80
                             75                     70
                             72                     90
                             81                     75

Sunny     2/9  3/5  mean     73     74.6   mean     79.1  86.2
Overcast  4/9  0/5  stddev   6.2    7.9    stddev   10.2  9.7
Rainy     3/9  2/5

          Windy              Play
          ==============================
          Yes  No            Yes   No
False     6    2             9     5
True      3    3

False     6/9  2/5           9/14  5/14
True      3/9  3/5

Example density value:

f(temperature=66|yes) = gaussianPdf(73, 6.2, 66) = 0.0340

Classifying a new day:

Outlook  Temp.  Humidity  Windy  Play
Sunny    66     90        true   ?

Likelihood of "yes" = 2/9 * 0.0340 * 0.0221 * 3/9 * 9/14 = 0.000036
Likelihood of "no"  = 3/5 * 0.0291 * 0.0380 * 3/5 * 5/14 = 0.000136
P("yes") = 0.000036 / (0.000036 + 0.000136) = 20.9%
P("no")  = 0.000136 / (0.000036 + 0.000136) = 79.1%

Note: missing values during training are not included in the calculation of the mean and standard deviation.

BTW, an alternative to the above is to apply some discretization policy to the data, e.g. [Yang03]. Such discretization is good practice since it can dramatically improve the performance of a Naive Bayes classifier (see [Dougherty95]).

Notso"Naive"Bayes
WhydoesNaiveBayesworksowell?[Domingos97]offeroneanalysis:

Theyofferoneexamplewiththreeattributeswheretheperformancewherea"Naive"anda"optimal"Bayespeformnearlythesame.
Theygeneralizedthattoconcludethat"Naive"bayesisonlyreallyNaiveinavanishinglysmallnumberofcases.

Therethreeattributeexampleisgivenbelow.Forthegeneralizedexample,seetheirpaper.

ConsideraBooleanconcept,describedbythreeattributesA,BandC.

Assumethatthetwoclasses,denotedby+andareequiprobable

(P(+)=P()=1/2).

Let A and C be independent, and let A = B (i.e., A and B are completely dependent). Therefore B should be ignored, and the optimal classification procedure for a test instance is to assign it to (i) class + if

P(A|+) * P(C|+) - P(A|-) * P(C|-) > 0,

(ii) to class - if the inequality has the opposite sign, and (iii) to an arbitrary class if the two sides are equal.

Note that the Bayesian classifier will take B into account as if it was independent from A, and this will be equivalent to counting A twice. Thus, the Bayesian classifier will assign the instance to class + if

P(A|+)^2 * P(C|+) - P(A|-)^2 * P(C|-) > 0,

and to - otherwise.

Applying Bayes' theorem, P(A|+) can be re-expressed as

P(A) * P(+|A) / P(+)

and similarly for the other probabilities.

Since P(+) = P(-), after canceling like terms this leads to the equivalent expressions

P(+|A) * P(+|C) - P(-|A) * P(-|C) > 0

for the optimal decision, and

P(+|A)^2 * P(+|C) - P(-|A)^2 * P(-|C) > 0

for the Bayesian classifier. Let

P(+|A) = p
P(+|C) = q.

Then class + should be selected when

p*q - (1-p)*(1-q) > 0

which is equivalent to

q > 1 - p     [Optimal Bayes]

With the Bayesian classifier, it will be selected when

p^2*q - (1-p)^2*(1-q) > 0

which is equivalent to

q > (1-p)^2 / (p^2 + (1-p)^2)     [Simple Bayes]

The two curves are shown in the following figure. The remarkable fact is that, even though the independence assumption is decisively violated because B = A, the Bayesian classifier disagrees with the optimal procedure only in the two narrow regions that are above one of the curves and below the other; everywhere else it performs the correct classification.

Thus, for all problems where (p, q) does not fall in those two small regions, the Bayesian classifier is effectively optimal.
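Since the figure is hard to reproduce here, a brute-force sweep over (p, q) makes the same point numerically (a sketch, not from [Domingos97]): the two decision rules disagree on only a small fraction of the unit square.

```python
steps = 101
disagree = total = 0
for i in range(steps):
    for j in range(steps):
        p, q = i / (steps - 1), j / (steps - 1)
        optimal = p * q - (1 - p) * (1 - q) > 0        # i.e. q > 1 - p
        naive = p**2 * q - (1 - p)**2 * (1 - q) > 0    # the simple-Bayes rule
        disagree += (optimal != naive)
        total += 1

frac = disagree / total
print(frac)   # roughly a tenth of the (p, q) square
```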

Review Questions
OneR
Discuss: "OneR offers an inductive generalization but Naive Bayes does not."
State the pseudocode of 1R.
What is overfitting? Discuss how the discretization method of 1R tries to avoid overfitting. Give an example.
Distinguish "supervised discretization" from "unsupervised discretization". Which kind of discretization does OneR use? Explain your answer.
Apply the 1R pseudocode to the given dataset and determine the attribute and rules which are to be used for classification.


age            spectacle     astigmatism  tear rate  recommended lenses
=======================================================================
young          myope         no           reduced    none
young          myope         yes          normal     soft
young          myope         yes          normal     hard
young          hypermetrope  yes          reduced    none
prepresbyopic  myope         no           reduced    none
prepresbyopic  myope         no           normal     soft
prepresbyopic  hypermetrope  yes          reduced    none
presbyopic     myope         no           reduced    none
presbyopic     hypermetrope  yes          normal     none
presbyopic     hypermetrope  no           normal     soft

Show all your working. Leave fractions as fractions.
Naive Bayes
State Bayes' rule and discuss how it can infer future beliefs as a function of past beliefs plus new evidence.
What are the drawbacks of the Naive Bayes method?
How are missing values handled in Naive Bayes?
Suppose we have a table of data:

Make        Size    Convertible  Type

Mitsubishi  small   yes          coup
Mitsubishi  medium  no           suv
Toyota      small   yes          coup
Toyota      large   no           coup
Toyota      large   no           suv
Benz        small   yes          coup
Benz        large   no           suv
BMW         small   yes          coup
BMW         medium  yes          coup
Ford        small   yes          coup
Ford        large   no           suv
Honda       small   no           coup

and we see a new example:

Make  Size    Convertible  Type

Ford  medium  no           ?

Calculate the following for the new example, given the database of old examples:

1. A = Likelihood of SUV
2. B = Likelihood of Coup
3. C = Probability of SUV
4. D = Probability of Coup

Show all your working. Leave fractions as fractions.

Site design by GetTemplate.com. tim@menzies.us 2006, Tim Menzies, Attribution-ShareAlike 2.5

