Bayesian Learning

Topics:
- Bayes theorem; ML and MAP hypotheses
- Bayes optimal classifier, Gibbs algorithm
- Naive Bayes learning
- Bayesian belief network learning

Bayes Theorem

Given a hypothesis space H and observed training data D, we want the most probable hypothesis h in H given D, i.e. the h that maximizes the posterior probability P(h|D). The quantities involved:

- P(h): prior probability of hypothesis h
- P(D): prior probability that data D will be observed (the evidence)
- P(D|h): likelihood of observing D given that h holds
- P(h|D): posterior probability of h given D

Bayes theorem relates them:

P(h|D) = P(D|h) P(h) / P(D)
(posterior = likelihood × prior / evidence)

The posterior P(h|D) increases with the prior P(h) and with the likelihood P(D|h), and decreases as P(D) increases: the more probable it is that D would be observed independently of h, the less evidence D provides in support of h.


Maximum A Posteriori (MAP) hypothesis

The most probable hypothesis h in H given the data D is called the maximum a posteriori (MAP) hypothesis:

h_MAP ≡ argmax_{h∈H} P(h|D)
      = argmax_{h∈H} P(D|h) P(h) / P(D)
      = argmax_{h∈H} P(D|h) P(h)

The evidence P(D) is dropped in the last step because it is a constant independent of h.

Maximum likelihood (ML) hypothesis

If we further assume every hypothesis in H is equally probable a priori (P(hi) = P(hj) for all hi, hj in H), the P(h) term can also be dropped, and the MAP hypothesis reduces to the maximum likelihood (ML) hypothesis, the h that maximizes the likelihood P(D|h) of the data:

h_ML = argmax_{h∈H} P(D|h)

Example: medical diagnosis

A patient takes a lab test and the result comes back positive. The test returns a correct positive result in 98% of the cases in which the disease is actually present, and a correct negative result in 97% of the cases in which the disease is not present. Furthermore, 0.008 of the entire population have this cancer. The two hypotheses are cancer and ~cancer:

P(cancer) = 0.008        P(~cancer) = 0.992
P(+|cancer) = 0.98       P(-|cancer) = 0.02
P(+|~cancer) = 0.03      P(-|~cancer) = 0.97

Which is the MAP hypothesis given a positive test result?

P(cancer|+) = P(+|cancer) P(cancer) / P(+) = (0.98)(0.008) / P(+) = 0.0078 / P(+)
P(~cancer|+) = P(+|~cancer) P(~cancer) / P(+) = (0.03)(0.992) / P(+) = 0.0298 / P(+)

Since 0.0298 > 0.0078, h_MAP = ~cancer.
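This computation is easy to check in code. A minimal Python sketch using only the numbers from the example (variable names are illustrative):

# Unnormalized posteriors P(h|+) = P(+|h) P(h) / P(+); the shared
# divisor P(+) cannot change the argmax, so it is omitted.
priors = {"cancer": 0.008, "~cancer": 0.992}
likelihood_pos = {"cancer": 0.98, "~cancer": 0.03}   # P(+|h)

scores = {h: likelihood_pos[h] * priors[h] for h in priors}
h_map = max(scores, key=scores.get)
print(scores)   # {'cancer': 0.00784, '~cancer': 0.02976}
print(h_map)    # ~cancer

Normalizing gives P(cancer|+) = 0.0078 / (0.0078 + 0.0298) ≈ 0.21: even after a positive test, ~cancer remains the more probable hypothesis.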

Brute-force MAP Learning

Brute-force MAP Learning Algorithm:
1. For each hypothesis h in H, calculate the posterior probability
   P(h|D) = P(D|h) P(h) / P(D)
2. Output the hypothesis h_MAP with the highest posterior probability
   h_MAP = argmax_{h∈H} P(h|D)

This may require significant computation, since it applies Bayes theorem to every hypothesis in the space.
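To make the two steps concrete, here is a hypothetical sketch in which the hypotheses are candidate biases of a coin, the data D are independent flips, and the prior is uniform; all names and numbers are invented for illustration:

hypotheses = [0.1, 0.3, 0.5, 0.7, 0.9]               # candidate P(heads)
prior = {h: 1.0 / len(hypotheses) for h in hypotheses}
D = [1, 1, 0, 1, 1, 1, 0, 1]                          # 1 = heads, 0 = tails

def likelihood(data, h):
    # P(D|h) for independent Bernoulli flips
    p = 1.0
    for d in data:
        p *= h if d == 1 else (1.0 - h)
    return p

# Step 1: P(h|D) is proportional to P(D|h) P(h) for each h in H.
posterior = {h: likelihood(D, h) * prior[h] for h in hypotheses}
# Step 2: output the hypothesis with the highest posterior.
h_map = max(posterior, key=posterior.get)
print(h_map)   # 0.7 (the data contain 6 heads in 8 flips)

The cost is one likelihood evaluation per hypothesis, which is why the brute-force approach is feasible only for small hypothesis spaces.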


 

Bayes Optimal Classifier

Brute-force MAP learning answers the question "what is the most probable hypothesis given the data?" Often, though, what we actually want is the most probable classification of a new instance. These are not the same thing: the classification given by the MAP hypothesis can differ from the most probable classification.

Example: consider three hypotheses with posteriors
P(h1|D) = 0.4, P(h2|D) = 0.3, P(h3|D) = 0.3
The MAP hypothesis is h1. Given a new instance x, suppose h1(x) = +, h2(x) = -, and h3(x) = -. Weighting each hypothesis by its posterior, the probability that x is + is only 0.4, while the probability that it is - is 0.6, so the most probable classification disagrees with the classification given by the MAP hypothesis.

Bayes Optimal Classifier

The most probable classification is obtained by combining the predictions of all hypotheses, each weighted by its posterior probability. The probability P(vj|D) that vj is the correct classification of the new instance:

P(vj|D) = Σ_{hi∈H} P(vj|hi) P(hi|D)

The Bayes optimal classification:

argmax_{vj∈V} Σ_{hi∈H} P(vj|hi) P(hi|D)

Bayes Optimal Classification

Example (continuing from above):
P(h1|D) = 0.4    P(-|h1) = 0    P(+|h1) = 1
P(h2|D) = 0.3    P(-|h2) = 1    P(+|h2) = 0
P(h3|D) = 0.3    P(-|h3) = 1    P(+|h3) = 0

therefore

Σ_i P(+|hi) P(hi|D) = 0.4   and   Σ_i P(-|hi) P(hi|D) = 0.6

so the Bayes optimal classification is -.
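A minimal sketch of this weighted vote, using the numbers from the example:

posteriors = {"h1": 0.4, "h2": 0.3, "h3": 0.3}    # P(hi|D)
predictions = {"h1": "+", "h2": "-", "h3": "-"}    # each hypothesis's vote

# P(vj|D) = sum over hi of P(vj|hi) P(hi|D); here P(vj|hi) is 1 when
# hi predicts vj and 0 otherwise, so each hypothesis contributes its
# full posterior weight to the value it predicts.
votes = {}
for h, v in predictions.items():
    votes[v] = votes.get(v, 0.0) + posteriors[h]
print(votes)                        # {'+': 0.4, '-': 0.6}
print(max(votes, key=votes.get))    # -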

Naive Bayes Classifier

The naive Bayes classifier is one of the most practical Bayesian learning methods. It applies when each instance x is described by a conjunction of attribute values and the target function f(x) can take any value from a finite set V. The learner is given a set of training examples of the target function and must predict the target value for new instances.

Naive Bayes Classifier

Consider a target function f: X → V, where each instance x is described by a tuple of attribute values (a1, a2, ..., an). The most probable target value for a new instance is:

v_MAP = argmax_{vj∈V} P(vj | a1, ..., an)

Applying Bayes theorem:

v_MAP = argmax_{vj∈V} P(a1, ..., an | vj) P(vj) / P(a1, ..., an)
      = argmax_{vj∈V} P(a1, ..., an | vj) P(vj)

Naive Bayes Classifier

P(vj) is easy to estimate by counting the frequency of each target value in the training data, but estimating the joint term P(a1, ..., an | vj) directly is infeasible unless the training set is enormous, because the number of distinct attribute-value combinations is exponential in n. The naive Bayes classifier rests on the simplifying (naive) assumption that the attribute values are conditionally independent given the target value:

P(a1, ..., an | vj) = Π_i P(ai | vj)

which yields the naive Bayes classifier:

v_NB = argmax_{vj∈V} P(vj) Π_{i=1}^{n} P(ai | vj)

Naive Bayes Classifier

Naive Bayes learning: estimate each P(vj) and P(ai|vj) term by its frequency over the training data. This set of estimates is the learned hypothesis, which classifies a new instance by:

v_NB = argmax_{vj∈V} P(vj) Π_{i=1}^{n} P(ai | vj)

Note there is no explicit search through a hypothesis space; the hypothesis is formed simply by counting.

Day    Outlook   Temperature  Humidity  Wind    PlayTennis
Day1   Sunny     Hot          High      Weak    No
Day2   Sunny     Hot          High      Strong  No
Day3   Overcast  Hot          High      Weak    Yes
Day4   Rain      Mild         High      Weak    Yes
Day5   Rain      Cool         Normal    Weak    Yes
Day6   Rain      Cool         Normal    Strong  No
Day7   Overcast  Cool         Normal    Strong  Yes
Day8   Sunny     Mild         High      Weak    No
Day9   Sunny     Cool         Normal    Weak    Yes
Day10  Rain      Mild         Normal    Weak    Yes
Day11  Sunny     Mild         Normal    Strong  Yes
Day12  Overcast  Mild         High      Strong  Yes
Day13  Overcast  Hot          Normal    Weak    Yes
Day14  Rain      Mild         High      Strong  No

Classify the new instance:
x = (Outlook=Sunny, Temp=Cool, Humidity=High, Wind=Strong)

v_NB = argmax_{vk∈{yes,no}} P(vk) Π_i P(ai | vk)
     = argmax_{vk∈{yes,no}} P(vk) P(Outlook=sunny|vk) P(Temp=cool|vk) P(Humidity=high|vk) P(Wind=strong|vk)

Estimates from the 14 training examples:

P(PlayTennis=yes) = 9/14 = 0.64
P(PlayTennis=no) = 5/14 = 0.36
P(Wind=strong | PlayTennis=yes) = 3/9 = 0.33
P(Wind=strong | PlayTennis=no) = 3/5 = 0.60
etc.

P(yes) P(sunny|yes) P(cool|yes) P(high|yes) P(strong|yes) = 0.0053
P(no) P(sunny|no) P(cool|no) P(high|no) P(strong|no) = 0.0206

answer: PlayTennis(x) = no
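The whole calculation can be reproduced by a short Python sketch that estimates every term by frequency counts over the table above (helper names are illustrative):

from collections import Counter, defaultdict

data = [  # (Outlook, Temperature, Humidity, Wind, PlayTennis)
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]

class_counts = Counter(row[-1] for row in data)    # counts for P(vj)
cond_counts = defaultdict(Counter)                  # counts for P(ai|vj)
for row in data:
    for i, a in enumerate(row[:-1]):
        cond_counts[(i, row[-1])][a] += 1

def v_nb(x):
    scores = {}
    for v, n_v in class_counts.items():
        p = n_v / len(data)                         # P(vj)
        for i, a in enumerate(x):
            p *= cond_counts[(i, v)][a] / n_v       # P(ai|vj)
        scores[v] = p
    return max(scores, key=scores.get), scores

print(v_nb(("Sunny", "Cool", "High", "Strong")))
# ('No', {'No': 0.0206, 'Yes': 0.0053}) up to rounding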

m-estimate of probability

Estimating P(ai|vj) by the raw fraction nc/n gives poor estimates when nc is very small, and if nc = 0 the estimate P(ai|vj) = 0 zeroes out the entire naive Bayes product. The m-estimate of probability avoids this:

P(ai|vj) = (nc + m·p) / (n + m)

where
- n: number of training examples for which v = vj
- nc: number of training examples for which v = vj and a = ai
- p: prior estimate of the probability (in the absence of other information, assume a uniform prior p = 1/k for an attribute with k possible values)
- m: a constant called the equivalent sample size, which determines how heavily to weight p relative to the observed data

With m = 0 this reduces to the raw frequency nc/n; as m grows, the estimate is pulled toward the prior p.
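As a Python function, assuming the uniform prior p = 1/k:

def m_estimate(nc, n, k, m=1.0):
    # (nc + m*p) / (n + m) with uniform prior p = 1/k
    p = 1.0 / k
    return (nc + m * p) / (n + m)

print(3 / 9)                     # 0.33  raw frequency; would be 0 if nc = 0
print(m_estimate(3, 9, k=2))     # 0.35  with m = 1
print(m_estimate(0, 9, k=2))     # 0.05  a zero count no longer zeroes the product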

Learning to Classify Text

Naive Bayes is among the most effective algorithms for learning to classify text documents, for tasks such as:
1. learning which news articles are of interest, and
2. learning to classify documents or web pages by topic.

Target concept, e.g.: Interesting? : Document → {like, dislike}

Represent each document by a vector of words, with one attribute per word position in the document: doc = (a1=w1, a2=w2, ..., an=wn). English text draws these values from a vocabulary of roughly 50,000 distinct words.

Learning: use the training examples to estimate P(vj) and P(ai=wk|vj). Suppose we are given 1000 training documents, say 700 labeled like and 300 labeled dislike, and must classify a new document of 100 words:

v_NB = argmax_{vk∈{like,dislike}} P(vk) Π_i P(ai | vk)
     = argmax_{vk∈{like,dislike}} P(vk) P(a1="our" | vk) P(a2="approach" | vk) ... P(a100="trouble" | vk)

Two independence assumptions are at work. First, the naive Bayes assumption that word occurrences are conditionally independent given the target value; this is clearly violated in real text, yet naive Bayes performs well in practice. Second, we assume the probability of encountering a word is independent of its position: P(a1=wk|vj) = P(a2=wk|vj) = ... = P(wk|vj). Without this assumption we would need a separate distribution for every word position; with it, we need only the class priors P(vj) plus one term P(wk|vj) per word and class, i.e. 2 × 50,000 terms in our example.

To estimate P(wk|vj) we use the m-estimate with a uniform prior and m equal to the vocabulary size:

P(wk|vj) = (nk + 1) / (n + |Vocabulary|)

where n is the total number of word positions in all training documents with target value vj, and nk is the number of times word wk occurs among them.

LEARN_NAIVE_BAYES_TEXT(Examples, V)
1. Collect all words and other tokens that occur in Examples
   - Vocabulary ← all distinct words and other tokens in Examples
2. Calculate the required P(vj) and P(wk|vj) probability terms
   For each target value vj in V do
   - docsj ← subset of Examples for which the target value is vj
   - P(vj) ← |docsj| / |Examples|
   - Textj ← a single document created by concatenating all members of docsj
   - n ← total number of words in Textj (counting duplicate words multiple times)
   - for each word wk in Vocabulary
     - nk ← number of times word wk occurs in Textj
     - P(wk|vj) ← (nk + 1) / (n + |Vocabulary|)

CLASSIFY_NAIVE_BAYES_TEXT(Doc)
- positions ← all word positions in Doc that contain tokens found in Vocabulary
- Return vNB, where
  vNB = argmax_{vj∈V} P(vj) Π_{i∈positions} P(ai|vj)
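A compact Python sketch of both procedures follows. It works in log-space to avoid floating-point underflow on long documents (an implementation detail, not part of the algorithm as stated), and the three training documents are invented for illustration:

import math
from collections import Counter

def learn_naive_bayes_text(examples):   # examples: list of (tokens, label)
    vocabulary = {w for tokens, _ in examples for w in tokens}
    log_prior, log_cond = {}, {}
    for v in {label for _, label in examples}:
        docs = [tokens for tokens, label in examples if label == v]
        log_prior[v] = math.log(len(docs) / len(examples))
        text = [w for tokens in docs for w in tokens]   # concatenated Textj
        n, counts = len(text), Counter(text)
        log_cond[v] = {w: math.log((counts[w] + 1) / (n + len(vocabulary)))
                       for w in vocabulary}
    return vocabulary, log_prior, log_cond

def classify_naive_bayes_text(doc, vocabulary, log_prior, log_cond):
    # only word positions whose token appears in Vocabulary contribute
    scores = {v: log_prior[v] + sum(log_cond[v][w] for w in doc
                                    if w in vocabulary)
              for v in log_prior}
    return max(scores, key=scores.get)

examples = [("our approach works".split(), "like"),
            ("serious trouble again".split(), "dislike"),
            ("approach avoids trouble".split(), "like")]
model = learn_naive_bayes_text(examples)
print(classify_naive_bayes_text("our approach".split(), *model))   # like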

Example: twenty newsgroups

Given 1000 training documents from each of 20 usenet newsgroups, learn to classify new documents according to the newsgroup they came from:

comp.graphics             misc.forsale
comp.os.ms-windows.misc   rec.autos
comp.sys.ibm.pc.hardware  rec.motorcycles
comp.sys.mac.hardware     rec.sport.baseball
comp.windows.x            rec.sport.hockey
alt.atheism               sci.space
sci.med                   soc.religion.christian
sci.crypt                 talk.religion.misc
sci.electronics           talk.politics.mideast
talk.politics.misc        talk.politics.guns

Naive Bayes achieved 89% classification accuracy on this task. Vocabulary details: the 100 most frequent words (such as "the") and words occurring fewer than 3 times were removed, leaving a vocabulary of about 38,500 words.
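The vocabulary pruning described above might be sketched as follows (prune_vocabulary and its parameters are hypothetical names, not from the original experiment):

from collections import Counter

def prune_vocabulary(documents, top_k=100, min_count=3):
    # documents: list of token lists; drop the top_k most frequent
    # words and any word occurring fewer than min_count times
    counts = Counter(w for doc in documents for w in doc)
    most_frequent = {w for w, _ in counts.most_common(top_k)}
    return {w for w, c in counts.items()
            if c >= min_count and w not in most_frequent}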

 
