WVU, Data Mining: Fall '06
CS591O :: LCSEE :: WVU
Lectures
Lecture 3a

(These notes are extended from Witten & Frank, chapter 4.)
1R: simplicity first
Simple algorithms often work surprisingly well.
In any case, try the simplest before trying the most complicated. Why try simple? Well... comparing a supposedly more sophisticated approach against a seemingly more stupid one (the "strawman") is good science.
(Warning: sometimes, straw men don't burn.)
1R ([Holte93]) assumes that all attributes are independent and one attribute is more powerful than the rest.
It builds a 1-level decision tree. In other words, it generates a set of rules that all test on one particular attribute.
Basic version (assuming nominal attributes):
One branch for each of the attribute's values
Each branch assigns the most frequent class
Error rate: proportion of instances that don't belong to the majority class of their corresponding branch
Choose the attribute with the lowest error rate
Pseudocode for 1R:

For each attribute,
    For each value of the attribute, make a rule as follows:
        count how often each class appears
        find the most frequent class
        make the rule assign that class to this attribute value
    Calculate the error rate of the rules
Choose the rules with the smallest error rate

Note: "missing" is always treated as a separate attribute value.
Example:

OUTLOOK   TEMP  HUMIDITY  WINDY  PLAY
Sunny     Hot   High      False  No
Sunny     Hot   High      True   No
Overcast  Hot   High      False  Yes
Rainy     Mild  High      False  Yes
Rainy     Cool  Normal    False  Yes
Rainy     Cool  Normal    True   No
Overcast  Cool  Normal    True   Yes
Sunny     Mild  High      False  No
Sunny     Cool  Normal    False  Yes
Rainy     Mild  Normal    False  Yes
Sunny     Mild  Normal    True   Yes
Overcast  Mild  High      True   Yes
Overcast  Hot   Normal    False  Yes
Rainy     Mild  High      True   No
OneR:

ATTRIBUTE    RULES            ERRORS  TOTAL ERRORS
Outlook      Sunny => No      2/5     4/14
             Overcast => Yes  0/4
             Rainy => Yes     2/5
Temperature  Hot => No*       2/4     5/14
             Mild => Yes      2/6
             Cool => Yes      1/4
Humidity     High => No       3/7     4/14
             Normal => Yes    1/7
Windy        False => Yes     2/8     5/14
             True => No*      3/6

(A "*" marks an arbitrary choice between equally frequent classes.)
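The table above can be generated mechanically. Below is a minimal Python sketch of 1R for nominal attributes (my rendering, not Holte's original code). On the weather data it recovers Outlook with 4/14 errors; note that Humidity also scores 4/14, and this sketch breaks the tie by keeping the first attribute tried, where 1R proper may choose arbitrarily.

```python
from collections import Counter

def one_r(rows, attributes, target):
    """Return (attribute, rules, errors) for the best 1-level tree."""
    best = None
    for attr in attributes:
        by_value = {}                      # value -> Counter of classes
        for row in rows:
            by_value.setdefault(row[attr], Counter())[row[target]] += 1
        # each value predicts its most frequent class...
        rules = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
        # ...and every non-majority instance is an error
        errors = sum(sum(c.values()) - max(c.values())
                     for c in by_value.values())
        if best is None or errors < best[2]:
            best = (attr, rules, errors)
    return best

names = ("Outlook", "Temp", "Humidity", "Windy", "Play")
weather = [
    ("Sunny", "Hot", "High", "False", "No"),
    ("Sunny", "Hot", "High", "True", "No"),
    ("Overcast", "Hot", "High", "False", "Yes"),
    ("Rainy", "Mild", "High", "False", "Yes"),
    ("Rainy", "Cool", "Normal", "False", "Yes"),
    ("Rainy", "Cool", "Normal", "True", "No"),
    ("Overcast", "Cool", "Normal", "True", "Yes"),
    ("Sunny", "Mild", "High", "False", "No"),
    ("Sunny", "Cool", "Normal", "False", "Yes"),
    ("Rainy", "Mild", "Normal", "False", "Yes"),
    ("Sunny", "Mild", "Normal", "True", "Yes"),
    ("Overcast", "Mild", "High", "True", "Yes"),
    ("Overcast", "Hot", "Normal", "False", "Yes"),
    ("Rainy", "Mild", "High", "True", "No"),
]
rows = [dict(zip(names, r)) for r in weather]
attr, rules, errors = one_r(rows, names[:-1], "Play")
print(attr, errors)   # Outlook 4
```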
Numeric attributes are discretized: the range of the attribute is divided into a set of intervals.
Instances are sorted according to the attribute's values.
Breakpoints are placed where the (majority) class changes (so that the total error is minimized).

Example: temperature from the weather data:

64  65  68  69  70  71  72  72  75  75  80  81  83  85
Yes|No |Yes Yes Yes|No  No  Yes|Yes Yes|No |Yes Yes|No
The problem of overfitting

The discretization procedure is very sensitive to noise: a single instance with an incorrect class label will most likely result in a separate interval.
Simple solution: enforce a minimum number of instances in the majority class per interval.
Weather data example (with the minimum set to B=3):

64  65  68  69  70  71  72  72  75  75  80  81  83  85
Yes No  Yes Yes Yes|No  No  Yes Yes Yes|No  Yes Yes No
Result of overfitting avoidance (B=3):
http://www.csee.wvu.edu/~timm/cs591o/old/BasicMethods.html 1/7
1/5/2017 DataMining:BasicMethods
64  65  68  69  70  71  72  72  75  75  80  81  83  85
Yes No  Yes Yes Yes No  No  Yes Yes Yes|No  Yes Yes No
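This procedure can be sketched in Python. The following is my reading of the method (not code from the notes): grow each interval until its majority class has B instances, keep extending it while the majority-class run (or a run of equal attribute values) continues, then merge adjacent intervals that predict the same class. On the temperature column it reproduces the partitions above and the 77.5 split used in the rule table below.

```python
def discretize(pairs, B=3):
    """pairs: (value, label) sorted by value. Returns a list of intervals."""
    intervals, current = [], []
    for i, (value, label) in enumerate(pairs):
        current.append((value, label))
        counts = {}
        for _, l in current:
            counts[l] = counts.get(l, 0) + 1
        majority = max(counts, key=counts.get)   # ties: first label seen
        close = counts[majority] >= B
        if close and i + 1 < len(pairs):
            nxt_value, nxt_label = pairs[i + 1]
            # extend through a majority-class run, and never split
            # identical attribute values across intervals
            if nxt_label == majority or nxt_value == value:
                close = False
        if close:
            intervals.append(current)
            current = []
    if current:
        intervals.append(current)
    return intervals

def merge(intervals):
    """Fuse neighbouring intervals whose majority class is the same."""
    def majority(iv):
        counts = {}
        for _, l in iv:
            counts[l] = counts.get(l, 0) + 1
        return max(counts, key=counts.get)
    merged = [intervals[0]]
    for iv in intervals[1:]:
        if majority(iv) == majority(merged[-1]):
            merged[-1] = merged[-1] + iv
        else:
            merged.append(iv)
    return merged

temps = [(64, "Yes"), (65, "No"), (68, "Yes"), (69, "Yes"), (70, "Yes"),
         (71, "No"), (72, "No"), (72, "Yes"), (75, "Yes"), (75, "Yes"),
         (80, "No"), (81, "Yes"), (83, "Yes"), (85, "No")]
final = merge(discretize(temps, B=3))
print([len(iv) for iv in final])                 # [10, 4]
print((final[0][-1][0] + final[1][0][0]) / 2)    # 77.5
```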
Repeating the 1R procedure with the discretized numeric attributes:
ATTRIBUTE    RULES                      ERRORS  TOTAL ERRORS
Outlook      Sunny => No                2/5     4/14
             Overcast => Yes            0/4
             Rainy => Yes               2/5
Temperature  <= 77.5 => Yes             3/10    5/14
             > 77.5 => No*              2/4
Humidity     <= 82.5 => Yes             1/7     3/14
             > 82.5 and <= 95.5 => No   2/6
             > 95.5 => Yes              0/1
Windy        False => Yes               2/8     5/14
             True => No*                3/6
1R was described in a paper by [Holte93]:
It contains an experimental evaluation on 16 datasets (using cross-validation so that results were representative of performance on future data).
The minimum number of instances was set to 6 after some experimentation.
1R's simple rules performed not much worse than much more complex decision trees.
Simplicity first pays off!
Bayes Classifiers 101

A Bayes classifier is a simple learning scheme.

Advantages:
Tiny memory footprint
Fast training, fast learning
Simplicity
Often works surprisingly well

Assumptions:
Learning is done best via statistical modeling
Attributes are
    equally important
    statistically independent (given the class value)

This means that knowledge about the value of a particular attribute doesn't tell us anything about the value of another attribute (if the class is known).
Although based on assumptions that are almost never correct, this scheme works well in practice [Domingos97].
Example

weather.symbolic.arff

outlook   temperature  humidity  windy  play
rainy     cool         normal    TRUE   no
rainy     mild         high      TRUE   no
sunny     hot          high      FALSE  no
sunny     hot          high      TRUE   no
sunny     mild         high      FALSE  no
overcast  cool         normal    TRUE   yes
overcast  hot          high      FALSE  yes
overcast  hot          normal    FALSE  yes
overcast  mild         high      TRUE   yes
rainy     cool         normal    FALSE  yes
rainy     mild         high      FALSE  yes
rainy     mild         normal    FALSE  yes
sunny     cool         normal    FALSE  yes
sunny     mild         normal    TRUE   yes
This data can be summarized as follows:

          Outlook            Temperature             Humidity
          ===================================================
          Yes  No            Yes  No                 Yes  No
Sunny      2    3    Hot      2    2      High        3    4
Overcast   4    0    Mild     4    2      Normal      6    1
Rainy      3    2    Cool     3    1
Sunny     2/9  3/5   Hot     2/9  2/5     High       3/9  4/5
Overcast  4/9  0/5   Mild    4/9  2/5     Normal     6/9  1/5
Rainy     3/9  2/5   Cool    3/9  1/5

          Windy              Play
          =========================
          Yes  No        Yes   No
False      6    2         9     5
True       3    3
False     6/9  2/5       9/14  5/14
True      3/9  3/5
So, what happens on a new day:

Outlook  Temp.  Humidity  Windy  Play
Sunny    Cool   High      True   ?

First find the likelihood of the two classes:

For "yes" = 2/9 * 3/9 * 3/9 * 3/9 * 9/14 = 0.0053
For "no"  = 3/5 * 1/5 * 4/5 * 3/5 * 5/14 = 0.0206

Conversion into a probability by normalization:

P("yes") = 0.0053 / (0.0053 + 0.0206) = 0.205
P("no")  = 0.0206 / (0.0053 + 0.0206) = 0.795

So, we aren't playing golf today.
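The arithmetic above, kept as exact fractions (the counts are read straight off the summary table):

```python
from fractions import Fraction as F

# P[Sunny|yes] * P[Cool|yes] * P[High|yes] * P[True|yes] * P[yes]
like_yes = F(2, 9) * F(3, 9) * F(3, 9) * F(3, 9) * F(9, 14)
# P[Sunny|no] * P[Cool|no] * P[High|no] * P[True|no] * P[no]
like_no = F(3, 5) * F(1, 5) * F(4, 5) * F(3, 5) * F(5, 14)

# normalize the two likelihoods into probabilities
p_yes = like_yes / (like_yes + like_no)
p_no = like_no / (like_yes + like_no)

print(float(like_yes), float(like_no))  # ~0.0053  ~0.0206
print(float(p_yes), float(p_no))        # ~0.205   ~0.795
```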
Bayes' rule

More generally, the above is just an application of Bayes' Theorem.

Probability of event H given evidence E:

              Pr[E|H] * Pr[H]
    Pr[H|E] = ---------------
                   Pr[E]

A priori probability of H = Pr[H]:
    probability of the event before evidence has been seen.
A posteriori probability of H = Pr[H|E]:
    probability of the event after evidence has been seen.

Classification learning: what's the probability of the class given an instance?
    Evidence E = instance
    Event H = class value for instance

Naive Bayes assumption: evidence can be split into independent parts (i.e., the attributes of the instance!):

              Pr[E1|H] * Pr[E2|H] * ... * Pr[En|H] * Pr[H]
    Pr[H|E] = --------------------------------------------
                                Pr[E]
We used this above. Here's our evidence:

Outlook  Temp.  Humidity  Windy  Play
Sunny    Cool   High      True   ?

Here's the probability for "yes":

Pr[yes|E] = Pr[Outlook=Sunny|yes] *
            Pr[Temperature=Cool|yes] *
            Pr[Humidity=High|yes] *
            Pr[Windy=True|yes] *
            Pr[yes] / Pr[E]
          = (2/9 * 3/9 * 3/9 * 3/9 * 9/14) / Pr[E]
Numerical errors:
From multiplication of lots of small numbers
Use the standard fix: don't multiply the numbers, add the logs
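A quick illustration of why, in Python: two tiny probabilities multiply to a hard zero in floating point, while their logs add without trouble.

```python
import math

# Direct multiplication can underflow to exactly zero...
assert 1e-200 * 1e-200 == 0.0

# ...so sum logs instead, and exponentiate only at the end (if at all).
probs = [2/9, 3/9, 3/9, 3/9, 9/14]       # the "yes" factors from above
log_like = sum(math.log(p) for p in probs)
print(math.exp(log_like))                 # ~0.0053, same as before
```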
Missing values

Missing values are a problem for any learner. Naive Bayes' treatment of missing values is particularly elegant.
During training: the instance is not included in the frequency count for that attribute value/class combination.
During classification: the attribute will be omitted from the calculation.

Example:

Outlook  Temp.  Humidity  Windy  Play
?        Cool   High      True   ?

Likelihood of "yes" = 3/9 * 3/9 * 3/9 * 9/14 = 0.0238
Likelihood of "no"  = 1/5 * 4/5 * 3/5 * 5/14 = 0.0343
P("yes") = 0.0238 / (0.0238 + 0.0343) = 41%
P("no")  = 0.0343 / (0.0238 + 0.0343) = 59%
The "low frequencies problem"

What if an attribute value doesn't occur with every class value (e.g. "Humidity = High" for class "yes")? Then the probability will be zero:

    Pr[Humidity=High|yes] = 0

The a posteriori probability will also be zero: Pr[yes|E] = 0 (no matter how likely the other values are!).

So use an estimator for low-frequency attribute ranges:
    add a little "m" to the count for every attribute value/class combination
    (the Laplace estimator).
    Result: probabilities will never be zero!
And use an estimator for low-frequency classes:
    add a little "k" to class counts
    (the M-estimate).

Magic numbers: m=2, k=1.
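Plugging in the magic numbers, a small sketch of the two estimators (they match the pseudocode in the next section; the `0` count stands for a hypothetical never-seen attribute value):

```python
def class_prior(n_class, n_total, n_classes, k=1):
    # add a little "k" to the class counts: (N[H]+k) / (I + k*C)
    return (n_class + k) / (n_total + k * n_classes)

def likelihood(count, n_class, prior, m=2):
    # add a little "m*prior" to the frequency: (F + m*prior) / (N[H] + m)
    return (count + m * prior) / (n_class + m)

prior_yes = class_prior(9, 14, 2)     # (9+1)/(14+2) = 0.625
p_zero = likelihood(0, 9, prior_yes)  # a zero count no longer kills the product
print(prior_yes, p_zero)              # 0.625 0.1136...
```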
Pseudocode

Here's the pseudocode of the Naive Bayes classifier preferred by [Yang03] (p4). It uses these globals:

# "F": frequency tables
# "I": number of instances
# "C": how many classes?
# "N": instances per class
When learning from training examples, update a frequency table:

function update(class, train) {
    # OUTPUT: changes to the globals.
    # INPUT:  a "train"ing example containing
    #         attribute/value pairs in some "class"
    I++                            # update number of instances
    if (++N[class] == 1)           # update counts for each class
    then C++                       # maybe, increase number of classes
    fi
    for <attr,value> in train
    do  if (value != "?")          # skip missing values
        then F[class,attr,value]++ # increment frequency counts
        fi
    done
}
When testing, find the likelihood of each hypothetical class and return the one that is most likely.

function classify(test) {
    # OUTPUT: "class", the most likely hypothesis for the test case.
    # INPUT:  a "test" case containing attribute/value pairs.
    m = 2                          # control for Laplace estimates
    k = 1                          # control for M-estimates
    like = -100000                 # initial, impossibly low likelihood
    for (H in N)                   # check all hypotheses
    do  prior = (N[H]+k) / (I+(k*C))   # here's P[H]
        temp = log(prior)          # use logs for small values
        for <attr,value> in test   # for all items in the test case
        do  if (value != "?")      # skip missing values
            then inc = (F[H,attr,value] + (m*prior)) / (N[H]+m)  # P[Ei|H]
                 temp += log(inc)  # adding logs = multiplication
            fi
        done
        if (temp >= like)          # if we've got a better likelihood
        then like = temp
             class = H             # save this hypothesis and max likelihood
        fi
    done
    return class                   # return the class with maximum likelihood
}
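A direct Python translation of the two functions above (same globals, same m and k, logs instead of raw products), checked on the symbolic weather data:

```python
import math
from collections import defaultdict

class NaiveBayes:
    def __init__(self, m=2, k=1):
        self.m, self.k = m, k
        self.F = defaultdict(int)  # (class, attr, value) -> frequency
        self.N = defaultdict(int)  # class -> number of instances
        self.I = 0                 # total number of instances

    def update(self, klass, train):
        self.I += 1
        self.N[klass] += 1
        for attr, value in train.items():
            if value != "?":                 # skip missing values
                self.F[(klass, attr, value)] += 1

    def classify(self, test):
        best, like = None, float("-inf")     # impossibly low likelihood
        C = len(self.N)
        for h in self.N:                     # check all hypotheses
            prior = (self.N[h] + self.k) / (self.I + self.k * C)
            temp = math.log(prior)           # logs, not raw products
            for attr, value in test.items():
                if value != "?":             # skip missing values
                    inc = (self.F[(h, attr, value)] + self.m * prior) \
                          / (self.N[h] + self.m)
                    temp += math.log(inc)
            if temp >= like:
                best, like = h, temp
        return best

names = ("outlook", "temperature", "humidity", "windy")
data = [
    ("rainy", "cool", "normal", "TRUE", "no"),
    ("rainy", "mild", "high", "TRUE", "no"),
    ("sunny", "hot", "high", "FALSE", "no"),
    ("sunny", "hot", "high", "TRUE", "no"),
    ("sunny", "mild", "high", "FALSE", "no"),
    ("overcast", "cool", "normal", "TRUE", "yes"),
    ("overcast", "hot", "high", "FALSE", "yes"),
    ("overcast", "hot", "normal", "FALSE", "yes"),
    ("overcast", "mild", "high", "TRUE", "yes"),
    ("rainy", "cool", "normal", "FALSE", "yes"),
    ("rainy", "mild", "high", "FALSE", "yes"),
    ("rainy", "mild", "normal", "FALSE", "yes"),
    ("sunny", "cool", "normal", "FALSE", "yes"),
    ("sunny", "mild", "normal", "TRUE", "yes"),
]
nb = NaiveBayes()
for *attrs, play in data:
    nb.update(play, dict(zip(names, attrs)))
print(nb.classify(dict(zip(names, ("sunny", "cool", "high", "TRUE")))))  # no
```

With the m- and k-estimates switched on, the probabilities shift slightly from the hand computation earlier, but the decision is the same: don't play.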
Handling Numerics

The above code assumes that the attributes are discrete. The usual approximation is to assume a "gaussian" (i.e. a "normal" or bell-shaped curve) for the numerics.
The probability density function for the normal distribution is defined by the mean and standardDev (standard deviation).

Given:

n: the number of values
sum: the sum of the values, i.e. sum = sum + value
sumSq: the sum of the squares of the values, i.e. sumSq = sumSq + value*value

Then:

function mean(sum, n) {
    return sum/n
}

function standardDeviation(sumSq, sum, n) {
    return sqrt((sumSq - (sum*sum)/n) / (n-1))
}

function gaussianPdf(mean, standardDev, x) {
    pi = 1068966896/340262731  # good to 17 decimal places
    return 1/(standardDev*sqrt(2*pi)) *
           e^(-1 * (x-mean)^2 / (2*standardDev*standardDev))
}
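The same three functions in Python (using math.pi and math.exp directly). The checks at the end reproduce the mean/standard-deviation row of the statistics table and the density value used in the worked example below.

```python
import math

def mean(total, n):
    return total / n

def standard_deviation(sum_sq, total, n):
    # incremental form: only n, sum, and sum-of-squares are needed
    return math.sqrt((sum_sq - (total * total) / n) / (n - 1))

def gaussian_pdf(mu, sigma, x):
    return (1 / (sigma * math.sqrt(2 * math.pi))) \
           * math.exp(-((x - mu) ** 2) / (2 * sigma * sigma))

# temperature values for class "yes" in the numeric weather data
temps_yes = [83, 70, 68, 64, 69, 75, 75, 72, 81]
s = sum(temps_yes)
ss = sum(t * t for t in temps_yes)
n = len(temps_yes)
print(mean(s, n), round(standard_deviation(ss, s, n), 1))  # 73.0 6.2
print(round(gaussian_pdf(73, 6.2, 66), 4))                 # 0.034
```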
For example:

outlook   temperature  humidity  windy  play
sunny     85           85        FALSE  no
sunny     80           90        TRUE   no
overcast  83           86        FALSE  yes
rainy     70           96        FALSE  yes
rainy     68           80        FALSE  yes
rainy     65           70        TRUE   no
overcast  64           65        TRUE   yes
sunny     72           95        FALSE  no
sunny     69           70        FALSE  yes
rainy     75           80        FALSE  yes
sunny     75           70        TRUE   yes
overcast  72           90        TRUE   yes
overcast  81           75        FALSE  yes
rainy     71           91        TRUE   no
This generates the following statistics:

          Outlook              Temperature              Humidity
          ======================================================
          Yes  No              Yes   No                 Yes    No
Sunny      2    3              83    85                 86     85
Overcast   4    0              70    80                 96     90
Rainy      3    2              68    65                 80     70
Sunny     2/9  3/5   mean      73    74.6    mean       79.1   86.2
Overcast  4/9  0/5   std dev   6.2   7.9     std dev    10.2   9.7
Rainy     3/9  2/5

          Windy                Play
          ==============================
          Yes  No          Yes   No
False      6    2           9     5
True       3    3
False     6/9  2/5        9/14  5/14
True      3/9  3/5
Example density value:

f(temperature=66 | yes) = gaussianPdf(73, 6.2, 66) = 0.0340

Classifying a new day:

Outlook  Temp.  Humidity  Windy  Play
Sunny    66     90        true   ?
Likelihood of "yes" = 2/9 * 0.0340 * 0.0221 * 3/9 * 9/14 = 0.000036
Likelihood of "no"  = 3/5 * 0.0279 * 0.0380 * 3/5 * 5/14 = 0.000136
P("yes") = 0.000036 / (0.000036 + 0.000136) = 20.9%
P("no")  = 0.000136 / (0.000036 + 0.000136) = 79.1%
Note: missing values during training are not included in the calculation of the mean and standard deviation.

BTW, an alternative to the above is to apply some discretization policy to the data, e.g. [Yang03]. Such discretization is good practice since it can dramatically improve the performance of a Naive Bayes classifier (see [Dougherty95]).
Not so "Naive" Bayes

Why does Naive Bayes work so well? [Domingos97] offers one analysis:
They offer one example with three attributes where a "Naive" and an "optimal" Bayes perform nearly the same.
They generalized that to conclude that "Naive" Bayes is only really naive in a vanishingly small number of cases.
The three-attribute example is given below. For the generalized example, see their paper.

Consider a Boolean concept, described by three attributes A, B and C.
Assume that the two classes, denoted by + and -, are equiprobable (P(+) = P(-) = 1/2).
Let A and C be independent, and let A = B (i.e., A and B are completely dependent). Therefore B should be ignored, and the optimal classification procedure for a test instance is to assign it to (i) class + if

    P(A|+)*P(C|+) - P(A|-)*P(C|-) > 0,
and (ii) to class - if the inequality has the opposite sign, and (iii) to an arbitrary class if the two sides are equal.

Note that the Bayesian classifier will take B into account as if it was independent from A, and this will be equivalent to counting A twice. Thus, the Bayesian classifier will assign the instance to class + if

    P(A|+)^2 * P(C|+) - P(A|-)^2 * P(C|-) > 0,

and to - otherwise.

Applying Bayes' theorem, P(A|+) can be re-expressed as

    P(A) * P(+|A) / P(+)

and similarly for the other probabilities. Since P(+) = P(-), after canceling like terms this leads to the equivalent expressions

    P(+|A)*P(+|C) - P(-|A)*P(-|C) > 0

for the optimal decision, and

    P(+|A)^2 * P(+|C) - P(-|A)^2 * P(-|C) > 0

for the Bayesian classifier. Let

    P(+|A) = p
    P(+|C) = q.

Then class + should be selected when

    p*q - (1-p)*(1-q) > 0

which is equivalent to

    q > 1 - p    [Optimal Bayes]

With the Bayesian classifier, it will be selected when

    p^2*q - (1-p)^2*(1-q) > 0

which is equivalent to

    q > (1-p)^2 / (p^2 + (1-p)^2)    [Simple Bayes]

The two curves are shown in the following figure. The remarkable fact is that, even though the independence assumption is decisively violated because B = A, the Bayesian classifier disagrees with the optimal procedure only in the two narrow regions that are above one of the curves and below the other; everywhere else it performs the correct classification.
Thus, for all problems where (p, q) does not fall in those two small regions, the Bayesian classifier is effectively optimal.
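The size of those disagreement regions can be estimated numerically. Sampling (p, q) on a grid suggests the two decision rules disagree on roughly a tenth of the unit square (the exact fraction depends on the grid, so treat the number as an estimate, not a result from [Domingos97]):

```python
# Count how often the Simple Bayes decision p^2*q > (1-p)^2*(1-q)
# disagrees with the Optimal Bayes decision q > 1-p on a grid.
steps = 201
disagree = 0
for i in range(steps):
    for j in range(steps):
        p, q = i / (steps - 1), j / (steps - 1)
        optimal = q > 1 - p                         # Optimal Bayes
        naive = p * p * q > (1 - p) ** 2 * (1 - q)  # Simple Bayes
        disagree += (optimal != naive)
frac = disagree / (steps * steps)
print(round(frac, 3))   # ~0.1: the two rules agree on ~90% of (p, q) pairs
```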
Review Questions

OneR

Discuss: "OneR offers an inductive generalization but Naive Bayes does not."
State the pseudocode of 1R.
What is overfitting? Discuss how the discretization method of 1R tries to avoid overfitting. Give an example.
Distinguish "supervised discretization" from "unsupervised discretization". Which kind of discretization does OneR use? Explain your answer.
Apply the 1R pseudocode to the given dataset and determine the attribute and rules which are to be used for classification.
                                           tear     recommended
Age            Spectacle     astigmatism   rate     lenses
===============================================================
young          myope         no            reduced  none
young          myope         yes           normal   soft
young          myope         yes           normal   hard
young          hypermetrope  yes           reduced  none
Prepresbyopic  myope         no            reduced  none
Prepresbyopic  myope         no            normal   soft
Prepresbyopic  hypermetrope  yes           reduced  none
presbyopic     myope         no            reduced  none
presbyopic     hypermetrope  yes           normal   none
presbyopic     hypermetrope  no            normal   soft

Show all your working. Leave fractions as fractions.
Naive Bayes

State Bayes' rule and discuss how it can infer future beliefs as a function of past beliefs plus new evidence.
What are the drawbacks of the Naive Bayes method?
How are missing values handled in Naive Bayes?
Suppose we have a table of data:

Make        Size    Convertible  Type
Mitsubishi  small   yes          coup
Mitsubishi  medium  no           suv
Toyota      small   yes          coup
Toyota      large   no           coup
Toyota      large   no           suv
Benz        small   yes          coup
Benz        large   no           suv
BMW         small   yes          coup
BMW         medium  yes          coup
Ford        small   yes          coup
Ford        large   no           suv
Honda       small   no           coup

and we see a new example:

Make  Size    Convertible  Type
Ford  medium  no           ?

Calculate the following for the new example, given the database of old examples:

1. A = Likelihood of SUV:
2. B = Likelihood of Coup:
3. C = Probability of SUV:
4. D = Probability of coup:

Show all your working. Leave fractions as fractions.