Bayesian Learning

Topics:
- Bayes theorem; ML and MAP hypotheses
- Bayes optimal classifier, Gibbs algorithm
- Naive Bayes learning
- Bayesian belief network learning

Bayes Theorem

Given a hypothesis space H and observed training data D, we want the most probable hypothesis h in H given D, i.e. the h that maximizes the posterior probability P(h|D). The quantities involved:

- P(h): prior probability of hypothesis h
- P(D): prior probability that data D will be observed (the evidence)
- P(D|h): likelihood of observing D given that h holds
- P(h|D): posterior probability of h given D

Bayes theorem relates them:

P(h|D) = P(D|h) P(h) / P(D)
(posterior = likelihood × prior / evidence)

The posterior P(h|D) increases with the prior P(h) and with the likelihood P(D|h), and decreases as P(D) increases: the more probable it is that D would be observed independently of h, the less evidence D provides in support of h.


Maximum A Posteriori (MAP) hypothesis

The most probable hypothesis h in H given the data D is called the maximum a posteriori (MAP) hypothesis:

h_MAP ≡ argmax_{h∈H} P(h|D)
      = argmax_{h∈H} P(D|h) P(h) / P(D)
      = argmax_{h∈H} P(D|h) P(h)

The evidence P(D) is dropped in the last step because it is a constant independent of h.

Maximum likelihood (ML) hypothesis

If we further assume every hypothesis in H is equally probable a priori (P(hi) = P(hj) for all hi, hj in H), the P(h) term can also be dropped, and the MAP hypothesis reduces to the maximum likelihood (ML) hypothesis, the h that maximizes the likelihood P(D|h) of the data:

h_ML = argmax_{h∈H} P(D|h)

Example: medical diagnosis

A patient takes a lab test and the result comes back positive. The test returns a correct positive result in 98% of the cases in which the disease is actually present, and a correct negative result in 97% of the cases in which the disease is not present. Furthermore, 0.008 of the entire population have this cancer. The two hypotheses are cancer and ~cancer:

P(cancer) = 0.008        P(~cancer) = 0.992
P(+|cancer) = 0.98       P(-|cancer) = 0.02
P(+|~cancer) = 0.03      P(-|~cancer) = 0.97

Which is the MAP hypothesis given a positive test result?

P(cancer|+) = P(+|cancer) P(cancer) / P(+) = (0.98)(0.008) / P(+) = 0.0078 / P(+)
P(~cancer|+) = P(+|~cancer) P(~cancer) / P(+) = (0.03)(0.992) / P(+) = 0.0298 / P(+)

Since 0.0298 > 0.0078, h_MAP = ~cancer.
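This computation is easy to check in code. A minimal Python sketch using only the numbers from the example (variable names are illustrative):

# Unnormalized posteriors P(h|+) = P(+|h) P(h) / P(+); the shared
# divisor P(+) cannot change the argmax, so it is omitted.
priors = {"cancer": 0.008, "~cancer": 0.992}
likelihood_pos = {"cancer": 0.98, "~cancer": 0.03}   # P(+|h)

scores = {h: likelihood_pos[h] * priors[h] for h in priors}
h_map = max(scores, key=scores.get)
print(scores)   # {'cancer': 0.00784, '~cancer': 0.02976}
print(h_map)    # ~cancer

Normalizing gives P(cancer|+) = 0.0078 / (0.0078 + 0.0298) ≈ 0.21: even after a positive test, ~cancer remains the more probable hypothesis.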

Brute-force MAP Learning

Brute-force MAP Learning Algorithm:
1. For each hypothesis h in H, calculate the posterior probability
   P(h|D) = P(D|h) P(h) / P(D)
2. Output the hypothesis h_MAP with the highest posterior probability
   h_MAP = argmax_{h∈H} P(h|D)

This may require significant computation, since it applies Bayes theorem to every hypothesis in the space.
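To make the two steps concrete, here is a hypothetical sketch in which the hypotheses are candidate biases of a coin, the data D are independent flips, and the prior is uniform; all names and numbers are invented for illustration:

hypotheses = [0.1, 0.3, 0.5, 0.7, 0.9]               # candidate P(heads)
prior = {h: 1.0 / len(hypotheses) for h in hypotheses}
D = [1, 1, 0, 1, 1, 1, 0, 1]                          # 1 = heads, 0 = tails

def likelihood(data, h):
    # P(D|h) for independent Bernoulli flips
    p = 1.0
    for d in data:
        p *= h if d == 1 else (1.0 - h)
    return p

# Step 1: P(h|D) is proportional to P(D|h) P(h) for each h in H.
posterior = {h: likelihood(D, h) * prior[h] for h in hypotheses}
# Step 2: output the hypothesis with the highest posterior.
h_map = max(posterior, key=posterior.get)
print(h_map)   # 0.7 (the data contain 6 heads in 8 flips)

The cost is one likelihood evaluation per hypothesis, which is why the brute-force approach is feasible only for small hypothesis spaces.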


 

Bayes Optimal Classifier

Brute-force MAP learning answers the question "what is the most probable hypothesis given the data?" Often, though, what we actually want is the most probable classification of a new instance. These are not the same thing: the classification given by the MAP hypothesis can differ from the most probable classification.

Example: consider three hypotheses with posteriors
P(h1|D) = 0.4, P(h2|D) = 0.3, P(h3|D) = 0.3
The MAP hypothesis is h1. Given a new instance x, suppose h1(x) = +, h2(x) = -, and h3(x) = -. Weighting each hypothesis by its posterior, the probability that x is + is only 0.4, while the probability that it is - is 0.6, so the most probable classification disagrees with the classification given by the MAP hypothesis.

Bayes Optimal Classifier

The most probable classification is obtained by combining the predictions of all hypotheses, each weighted by its posterior probability. The probability P(vj|D) that vj is the correct classification of the new instance:

P(vj|D) = Σ_{hi∈H} P(vj|hi) P(hi|D)

The Bayes optimal classification:

argmax_{vj∈V} Σ_{hi∈H} P(vj|hi) P(hi|D)

Bayes Optimal Classification

Example (continuing from above):
P(h1|D) = 0.4    P(-|h1) = 0    P(+|h1) = 1
P(h2|D) = 0.3    P(-|h2) = 1    P(+|h2) = 0
P(h3|D) = 0.3    P(-|h3) = 1    P(+|h3) = 0

therefore

Σ_i P(+|hi) P(hi|D) = 0.4   and   Σ_i P(-|hi) P(hi|D) = 0.6

so the Bayes optimal classification is -.
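A minimal sketch of this weighted vote, using the numbers from the example:

posteriors = {"h1": 0.4, "h2": 0.3, "h3": 0.3}    # P(hi|D)
predictions = {"h1": "+", "h2": "-", "h3": "-"}    # each hypothesis's vote

# P(vj|D) = sum over hi of P(vj|hi) P(hi|D); here P(vj|hi) is 1 when
# hi predicts vj and 0 otherwise, so each hypothesis contributes its
# full posterior weight to the value it predicts.
votes = {}
for h, v in predictions.items():
    votes[v] = votes.get(v, 0.0) + posteriors[h]
print(votes)                        # {'+': 0.4, '-': 0.6}
print(max(votes, key=votes.get))    # -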

Naive Bayes Classifier

The naive Bayes classifier is one of the most practical Bayesian learning methods. It applies when each instance x is described by a conjunction of attribute values and the target function f(x) can take any value from a finite set V. The learner is given a set of training examples of the target function and must predict the target value for new instances.

Naive Bayes Classifier

Consider a target function f: X → V, where each instance x is described by a tuple of attribute values (a1, a2, ..., an). The most probable target value for a new instance is:

v_MAP = argmax_{vj∈V} P(vj | a1, ..., an)

Applying Bayes theorem:

v_MAP = argmax_{vj∈V} P(a1, ..., an | vj) P(vj) / P(a1, ..., an)
      = argmax_{vj∈V} P(a1, ..., an | vj) P(vj)

Naive Bayes Classifier

P(vj) is easy to estimate by counting the frequency of each target value in the training data, but estimating the joint term P(a1, ..., an | vj) directly is infeasible unless the training set is enormous, because the number of distinct attribute-value combinations is exponential in n. The naive Bayes classifier rests on the simplifying (naive) assumption that the attribute values are conditionally independent given the target value:

P(a1, ..., an | vj) = Π_i P(ai | vj)

which yields the naive Bayes classifier:

v_NB = argmax_{vj∈V} P(vj) Π_{i=1}^{n} P(ai | vj)

Naive Bayes Classifier

Naive Bayes learning: estimate each P(vj) and P(ai|vj) term by its frequency over the training data. This set of estimates is the learned hypothesis, which classifies a new instance by:

v_NB = argmax_{vj∈V} P(vj) Π_{i=1}^{n} P(ai | vj)

Note there is no explicit search through a hypothesis space; the hypothesis is formed simply by counting.

Day    Outlook   Temperature  Humidity  Wind    PlayTennis
Day1   Sunny     Hot          High      Weak    No
Day2   Sunny     Hot          High      Strong  No
Day3   Overcast  Hot          High      Weak    Yes
Day4   Rain      Mild         High      Weak    Yes
Day5   Rain      Cool         Normal    Weak    Yes
Day6   Rain      Cool         Normal    Strong  No
Day7   Overcast  Cool         Normal    Strong  Yes
Day8   Sunny     Mild         High      Weak    No
Day9   Sunny     Cool         Normal    Weak    Yes
Day10  Rain      Mild         Normal    Weak    Yes
Day11  Sunny     Mild         Normal    Strong  Yes
Day12  Overcast  Mild         High      Strong  Yes
Day13  Overcast  Hot          Normal    Weak    Yes
Day14  Rain      Mild         High      Strong  No

Classify the new instance:
x = (Outlook=Sunny, Temp=Cool, Humidity=High, Wind=Strong)

v_NB = argmax_{vk∈{yes,no}} P(vk) Π_i P(ai | vk)
     = argmax_{vk∈{yes,no}} P(vk) P(Outlook=sunny|vk) P(Temp=cool|vk) P(Humidity=high|vk) P(Wind=strong|vk)

Estimates from the 14 training examples:

P(PlayTennis=yes) = 9/14 = 0.64
P(PlayTennis=no) = 5/14 = 0.36
P(Wind=strong | PlayTennis=yes) = 3/9 = 0.33
P(Wind=strong | PlayTennis=no) = 3/5 = 0.60
etc.

P(yes) P(sunny|yes) P(cool|yes) P(high|yes) P(strong|yes) = 0.0053
P(no) P(sunny|no) P(cool|no) P(high|no) P(strong|no) = 0.0206

answer: PlayTennis(x) = no
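The whole calculation can be reproduced by a short Python sketch that estimates every term by frequency counts over the table above (helper names are illustrative):

from collections import Counter, defaultdict

data = [  # (Outlook, Temperature, Humidity, Wind, PlayTennis)
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]

class_counts = Counter(row[-1] for row in data)    # counts for P(vj)
cond_counts = defaultdict(Counter)                  # counts for P(ai|vj)
for row in data:
    for i, a in enumerate(row[:-1]):
        cond_counts[(i, row[-1])][a] += 1

def v_nb(x):
    scores = {}
    for v, n_v in class_counts.items():
        p = n_v / len(data)                         # P(vj)
        for i, a in enumerate(x):
            p *= cond_counts[(i, v)][a] / n_v       # P(ai|vj)
        scores[v] = p
    return max(scores, key=scores.get), scores

print(v_nb(("Sunny", "Cool", "High", "Strong")))
# ('No', {'No': 0.0206, 'Yes': 0.0053}) up to rounding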

m-estimate of probability

Estimating P(ai|vj) by the raw fraction nc/n gives poor estimates when nc is very small, and if nc = 0 the estimate P(ai|vj) = 0 zeroes out the entire naive Bayes product. The m-estimate of probability avoids this:

P(ai|vj) = (nc + m·p) / (n + m)

where
- n: number of training examples for which v = vj
- nc: number of training examples for which v = vj and a = ai
- p: prior estimate of the probability (in the absence of other information, assume a uniform prior p = 1/k for an attribute with k possible values)
- m: a constant called the equivalent sample size, which determines how heavily to weight p relative to the observed data

With m = 0 this reduces to the raw frequency nc/n; as m grows, the estimate is pulled toward the prior p.
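As a Python function, assuming the uniform prior p = 1/k:

def m_estimate(nc, n, k, m=1.0):
    # (nc + m*p) / (n + m) with uniform prior p = 1/k
    p = 1.0 / k
    return (nc + m * p) / (n + m)

print(3 / 9)                     # 0.33  raw frequency; would be 0 if nc = 0
print(m_estimate(3, 9, k=2))     # 0.35  with m = 1
print(m_estimate(0, 9, k=2))     # 0.05  a zero count no longer zeroes the product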

Learning to Classify Text

Naive Bayes is among the most effective algorithms for learning to classify text documents, for tasks such as:
1. learning which news articles are of interest, and
2. learning to classify documents or web pages by topic.

Target concept, e.g.: Interesting? : Document → {like, dislike}

Represent each document by a vector of words, with one attribute per word position in the document: doc = (a1=w1, a2=w2, ..., an=wn). English text draws these values from a vocabulary of roughly 50,000 distinct words.

Learning: use the training examples to estimate P(vj) and P(ai=wk|vj). Suppose we are given 1000 training documents, say 700 labeled like and 300 labeled dislike, and must classify a new document of 100 words:

v_NB = argmax_{vk∈{like,dislike}} P(vk) Π_i P(ai | vk)
     = argmax_{vk∈{like,dislike}} P(vk) P(a1="our" | vk) P(a2="approach" | vk) ... P(a100="trouble" | vk)

Two independence assumptions are at work. First, the naive Bayes assumption that word occurrences are conditionally independent given the target value; this is clearly violated in real text, yet naive Bayes performs well in practice. Second, we assume the probability of encountering a word is independent of its position: P(a1=wk|vj) = P(a2=wk|vj) = ... = P(wk|vj). Without this assumption we would need a separate distribution for every word position; with it, we need only the class priors P(vj) plus one term P(wk|vj) per word and class, i.e. 2 × 50,000 terms in our example.

To estimate P(wk|vj) we use the m-estimate with a uniform prior and m equal to the vocabulary size:

P(wk|vj) = (nk + 1) / (n + |Vocabulary|)

where n is the total number of word positions in all training documents with target value vj, and nk is the number of times word wk occurs among them.

LEARN_NAIVE_BAYES_TEXT(Examples, V)
1. Collect all words and other tokens that occur in Examples
   - Vocabulary ← all distinct words and other tokens in Examples
2. Calculate the required P(vj) and P(wk|vj) probability terms
   For each target value vj in V do
   - docsj ← subset of Examples for which the target value is vj
   - P(vj) ← |docsj| / |Examples|
   - Textj ← a single document created by concatenating all members of docsj
   - n ← total number of words in Textj (counting duplicate words multiple times)
   - for each word wk in Vocabulary
     - nk ← number of times word wk occurs in Textj
     - P(wk|vj) ← (nk + 1) / (n + |Vocabulary|)

CLASSIFY_NAIVE_BAYES_TEXT(Doc)
- positions ← all word positions in Doc that contain tokens found in Vocabulary
- Return vNB, where
  vNB = argmax_{vj∈V} P(vj) Π_{i∈positions} P(ai|vj)
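A compact Python sketch of both procedures follows. It works in log-space to avoid floating-point underflow on long documents (an implementation detail, not part of the algorithm as stated), and the three training documents are invented for illustration:

import math
from collections import Counter

def learn_naive_bayes_text(examples):   # examples: list of (tokens, label)
    vocabulary = {w for tokens, _ in examples for w in tokens}
    log_prior, log_cond = {}, {}
    for v in {label for _, label in examples}:
        docs = [tokens for tokens, label in examples if label == v]
        log_prior[v] = math.log(len(docs) / len(examples))
        text = [w for tokens in docs for w in tokens]   # concatenated Textj
        n, counts = len(text), Counter(text)
        log_cond[v] = {w: math.log((counts[w] + 1) / (n + len(vocabulary)))
                       for w in vocabulary}
    return vocabulary, log_prior, log_cond

def classify_naive_bayes_text(doc, vocabulary, log_prior, log_cond):
    # only word positions whose token appears in Vocabulary contribute
    scores = {v: log_prior[v] + sum(log_cond[v][w] for w in doc
                                    if w in vocabulary)
              for v in log_prior}
    return max(scores, key=scores.get)

examples = [("our approach works".split(), "like"),
            ("serious trouble again".split(), "dislike"),
            ("approach avoids trouble".split(), "like")]
model = learn_naive_bayes_text(examples)
print(classify_naive_bayes_text("our approach".split(), *model))   # like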

Example: twenty newsgroups

Given 1000 training documents from each of 20 usenet newsgroups, learn to classify new documents according to the newsgroup they came from:

comp.graphics             misc.forsale
comp.os.ms-windows.misc   rec.autos
comp.sys.ibm.pc.hardware  rec.motorcycles
comp.sys.mac.hardware     rec.sport.baseball
comp.windows.x            rec.sport.hockey
alt.atheism               sci.space
sci.med                   soc.religion.christian
sci.crypt                 talk.religion.misc
sci.electronics           talk.politics.mideast
talk.politics.misc        talk.politics.guns

Naive Bayes achieved 89% classification accuracy on this task. Vocabulary details: the 100 most frequent words (such as "the") and words occurring fewer than 3 times were removed, leaving a vocabulary of about 38,500 words.
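The vocabulary pruning described above might be sketched as follows (prune_vocabulary and its parameters are hypothetical names, not from the original experiment):

from collections import Counter

def prune_vocabulary(documents, top_k=100, min_count=3):
    # documents: list of token lists; drop the top_k most frequent
    # words and any word occurring fewer than min_count times
    counts = Counter(w for doc in documents for w in doc)
    most_frequent = {w for w, _ in counts.most_common(top_k)}
    return {w for w, c in counts.items()
            if c >= min_count and w not in most_frequent}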

 
