
ELECTRIC POWER UNIVERSITY

MACHINE LEARNING

Lecturer (M.Sc.): Phm c Hng

Outline

- Concepts
- Types of algorithms
- Machine learning examples
- The process of solving a problem with machine learning
- Data representation
- Machine learning algorithms

The concept of machine learning

Machine learning is a method for creating computer programs by analyzing data sets. It is closely related to statistics, since both fields study data analysis. Unlike statistics, however, machine learning focuses on the computational complexity of the algorithms that carry out the analysis.

Applications of machine learning

Applications of machine learning include:

- Search engines
- Medical diagnosis
- Credit card fraud detection
- Stock market analysis
- DNA sequence classification
- Speech and handwriting recognition
- Machine translation
- Game playing
- Robot locomotion

Types of machine learning algorithms

Supervised learning: the algorithm produces a function that maps input data to the desired output. A standard formulation of supervised learning is the classification problem: the program must learn (to approximate the behavior of) a function that maps a vector to one of several classes by examining a number of input-output examples of that function.
Unsupervised learning: the algorithm models a data set for which no labeled examples are available.

Types of machine learning algorithms (continued)

Semi-supervised learning: combines labeled and unlabeled examples to produce a suitable function or classifier.
Reinforcement learning: the algorithm learns a policy for acting based on its observations of the world. Every action affects the environment, and the environment provides feedback that guides the learning algorithm.

Types of machine learning algorithms (continued)

Transduction: similar to supervised learning, but does not explicitly construct a function. Instead, it tries to predict new outputs from the training inputs, the training outputs, and the test inputs that are available during training.
Learning to learn: the algorithm learns its own inductive bias from previous experience.

Representing a machine learning problem

Machine learning examples


The machine learning process

The process for solving a supervised learning problem

Learning algorithms

Learning algorithms include:

- Bayes (Mitchell, 1996)
- Decision trees (Fuhr et al., 1991)
- Centroid-based vector (Han and Karypis, 2000)
- k-nearest neighbors (Yang, 1994)
- Neural networks (Wiener et al., 1995)
- Support vector machines (Joachims, 1998)

Data representation

Unstructured data must first be converted into a structured representation. Here the data is represented with the vector space model.

Example: representing an image as a vector

Example: representing a text document as a vector

Let the document D = "When everyone thought the two strongest teams in Southeast Asia were about to go into extra time, a golden header from Le Cong Vinh came at just the right moment and brought the AFF Cup home for the Vietnam national team..."

Suppose the dictionary consists of: Sports, Football, National_team, Southeast_Asia, AFF_Cup, Vietnam.

Then, using term frequencies, D is represented as: D = (0,0,1,1,1,1)
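The term-frequency representation in this example can be sketched in a few lines of Python. This is a minimal illustration, not the lecturer's implementation: the function name `term_frequency_vector`, the English rendering of the sentence, and the convention that `_` in a dictionary entry stands for a space are all assumptions made for this sketch.

```python
import re

def term_frequency_vector(text, vocabulary):
    """Count case-insensitive, whole-phrase occurrences of each term in text."""
    lowered = text.lower()
    vector = []
    for term in vocabulary:
        # Assumption: "_" in a dictionary entry stands for a space in the text.
        phrase = term.replace("_", " ").lower()
        pattern = r"\b" + re.escape(phrase) + r"\b"
        vector.append(len(re.findall(pattern, lowered)))
    return vector

# English rendering of the example document (an assumption of this sketch).
document = (
    "When everyone thought the two strongest teams in Southeast Asia were "
    "about to go into extra time, a golden header from Le Cong Vinh "
    "brought the AFF Cup home for the Vietnam national team"
)
vocabulary = ["Sports", "Football", "National_team",
              "Southeast_Asia", "AFF_Cup", "Vietnam"]

print(term_frequency_vector(document, vocabulary))  # [0, 0, 1, 1, 1, 1]
```

Each position in the vector corresponds to one dictionary entry, so documents of any length map to vectors of the same dimension.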

Data representation (continued)

Figure: document vectors in a space with only two terms

The values wij are computed from the frequency (number of occurrences) of a term in a document. Let fij be the number of times term ti appears in document dj; then wij is computed by one of three formulas:

where log(X) is the base-10 logarithm of X.
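The three formulas themselves are not reproduced in the text. As a hedged illustration only, the sketch below implements three term-frequency weightings that are commonly used with the vector space model: the raw count, a logarithmic damping with a base-10 logarithm, and a square-root damping. Whether these are exactly the three formulas meant here is an assumption, and the function names are hypothetical.

```python
import math

def tf_raw(f):
    """Raw count: w = f."""
    return float(f)

def tf_log(f):
    """Log-damped count: w = 1 + log10(f) for f > 0, else 0 (assumed variant)."""
    return 1.0 + math.log10(f) if f > 0 else 0.0

def tf_sqrt(f):
    """Square-root-damped count: w = sqrt(f) (assumed variant)."""
    return math.sqrt(f)

# Compare how each weighting grows with the number of occurrences f.
for f in (0, 1, 10, 100):
    print(f, tf_raw(f), tf_log(f), tf_sqrt(f))
```

The damped variants keep a term that appears 100 times from outweighing a term that appears once by a factor of 100.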



Machine learning algorithms

Choose an effective learning model to develop:

- Bayes (Mitchell, 1996)
- Decision trees (Fuhr et al., 1991)
- Centroid-based vector (Han and Karypis, 2000)
- k-nearest neighbors (Yang, 1994)
- Neural networks (Wiener et al., 1995)
- Support vector machines (Joachims, 1998)

Bayes classification

Bayes' theorem gives the probability of a random event A given that a related event B has occurred. This probability is written P(A|B) and read "the probability of A given B".

Bayes (continued)

According to Bayes' theorem, the probability of A given B depends on three factors:

- The probability of A on its own, regardless of B. It is written P(A) and read "the probability of A".
- The probability of B on its own, regardless of A. It is written P(B) and read "the probability of B".
- The probability of B given that A has occurred. It is written P(B|A) and read "the probability of B given A".

Given these three quantities, the probability of A given B is:

P(A|B) = P(B|A) * P(A) / P(B)
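As a minimal sketch, Bayes' theorem maps directly onto a one-line function of the three quantities just listed. The function name `bayes_posterior` and the numbers in the usage lines are hypothetical and chosen only for illustration.

```python
def bayes_posterior(p_b_given_a, p_a, p_b):
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    if p_b <= 0:
        raise ValueError("P(B) must be positive for P(A|B) to be defined")
    return p_b_given_a * p_a / p_b

# Hypothetical inputs: P(B|A) = 0.5, P(A) = 0.3, P(B) = 0.25.
posterior = bayes_posterior(0.5, 0.3, 0.25)
print(round(posterior, 3))
```

With these illustrative inputs the posterior comes out at roughly 0.6, so observing B doubles the probability of A relative to the prior of 0.3.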


Example: suppose we want to predict whether a person will play tennis, based on the following data set:

Example (continued)

Event A: he plays tennis.
Event B: the outlook is sunny and the wind is strong.

- P(A): the probability that he plays tennis (regardless of the outlook and the wind).
- P(B): the probability that the outlook is sunny and the wind is strong.
- P(B|A): the probability that the outlook is sunny and the wind is strong, given that he plays tennis.
- P(A|B): the probability that he plays tennis, given that the outlook is sunny and the wind is strong.

The conditional probability P(A|B) is the value used to predict whether he plays tennis.

When A is "he plays tennis": P(A) = 8/12, P(B|A) = 1/2.
When A is "he does not play tennis": P(A) = 4/12, P(B|A) = 1/2.
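The figures above are enough to finish the prediction. The sketch below uses exact fractions to multiply each prior by its likelihood, normalizes over the two hypotheses, and picks the larger posterior. The dictionary keys and variable names are assumptions made for this illustration; the probabilities are the ones stated in the example.

```python
from fractions import Fraction

# Priors and likelihoods taken from the example.
prior = {"plays": Fraction(8, 12), "does_not_play": Fraction(4, 12)}
likelihood = {"plays": Fraction(1, 2), "does_not_play": Fraction(1, 2)}

# Numerator of Bayes' theorem for each hypothesis: P(B|A) * P(A).
joint = {h: likelihood[h] * prior[h] for h in prior}

# P(B) is the sum of the numerators over both hypotheses.
evidence = sum(joint.values())

posterior = {h: joint[h] / evidence for h in joint}
prediction = max(posterior, key=posterior.get)

print(posterior["plays"], posterior["does_not_play"], prediction)
```

Because both likelihoods equal 1/2, the evidence B does not shift the balance here: the posterior for "plays" is 2/3, the same as its prior of 8/12, and the prediction is that he plays.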


Naive Bayes classification

- Formulating the classification problem
  +) A training set D_train, in which each training example x is labeled and represented as an n-dimensional vector: (x1, x2, ..., xn)
  +) A fixed set of class labels: C = {c1, c2, ..., cm}
  +) Given a (new) example z, which class should z be assigned to?
- Goal: determine the most probable (most suitable) class for z
  +) c_MAP = argmax over ci in C of P(ci | z)
           = argmax over ci in C of P(ci | z1, z2, ..., zn)
           = argmax over ci in C of [ P(z1, z2, ..., zn | ci) * P(ci) ] / P(z1, z2, ..., zn)   (by Bayes' theorem)

The Bayes classification algorithm

P(d | Ci) = product over k = 1..n of P(xk | Ci) = P(x1 | Ci) * P(x2 | Ci) * ... * P(xn | Ci)

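Why P(d) is constant:

The observed data D is a subset of the hypothesis space, so P(D) can be decomposed as follows:

P(D) = sum over i of P(D | Ci) * P(Ci)

This sum is the same whichever class is being scored, so it can be dropped when comparing classes.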
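The factorization of the likelihood and the argument that P(d) is constant can be checked numerically. The sketch below is an illustration under stated assumptions: the function names `naive_likelihood` and `map_class` are hypothetical, and so are the two-class priors and per-attribute probabilities in the usage lines.

```python
import math

def naive_likelihood(conditionals):
    """P(d|C) under the naive assumption: the product of the P(x_k|C)."""
    return math.prod(conditionals)

def map_class(priors, conditionals_by_class):
    """Return the highest-scoring class and the normalized posteriors."""
    # Unnormalized scores: P(d|C) * P(C).
    scores = {c: priors[c] * naive_likelihood(conditionals_by_class[c])
              for c in priors}
    # P(d) by total probability; it is identical for every class.
    # Assumes at least one class has a nonzero score.
    evidence = sum(scores.values())
    posteriors = {c: s / evidence for c, s in scores.items()}
    best = max(scores, key=scores.get)
    # Dividing by a positive constant cannot change the argmax.
    assert best == max(posteriors, key=posteriors.get)
    return best, posteriors

# Hypothetical two-class problem with two attributes.
priors = {"c1": 0.5, "c2": 0.5}
conditionals = {"c1": [0.2, 0.5], "c2": [0.4, 0.5]}
best, posteriors = map_class(priors, conditionals)
print(best, posteriors)
```

In practice the classifier only needs the unnormalized scores; the normalization is shown here to make the constant-evidence argument concrete.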


Example: training data table

age      income   student   credit_rating   buys_computer
<=30     high     no        fair            no
<=30     high     no        excellent       no
31..40   high     no        fair            yes
>40      medium   no        fair            yes
>40      low      yes       fair            yes
>40      low      yes       excellent       no
31..40   low      yes       excellent       yes
<=30     medium   no        fair            no
<=30     low      yes       fair            yes
>40      medium   yes       fair            yes
<=30     medium   yes       excellent       yes
31..40   medium   no        excellent       yes
31..40   high     yes       fair            yes
>40      medium   no        excellent       no

Will a young student with a medium income and a fair credit rating buy a computer?

Training classes:
C1: buys_computer = yes
C2: buys_computer = no

Instance to classify: X = (age <= 30, income = medium, student = yes, credit_rating = fair)

Prior probabilities P(Ci):

P(buys_computer = yes) = 9/14 = 0.643
P(buys_computer = no) = 5/14 = 0.357

Compute P(X|Ci) for each class:

P(age <= 30 | buys_computer = yes) = 2/9 = 0.222
P(age <= 30 | buys_computer = no) = 3/5 = 0.6
P(income = medium | buys_computer = yes) = 4/9 = 0.444
P(income = medium | buys_computer = no) = 2/5 = 0.4
P(student = yes | buys_computer = yes) = 6/9 = 0.667
P(student = yes | buys_computer = no) = 1/5 = 0.2
P(credit_rating = fair | buys_computer = yes) = 6/9 = 0.667
P(credit_rating = fair | buys_computer = no) = 2/5 = 0.4

X = (age <= 30, income = medium, student = yes, credit_rating = fair)

P(X|Ci):

P(X|buys_computer = yes) = 0.222 x 0.444 x 0.667 x 0.667 = 0.044
P(X|buys_computer = no) = 0.6 x 0.4 x 0.2 x 0.4 = 0.019

P(X|Ci) * P(Ci):

P(X|buys_computer = yes) * P(buys_computer = yes) = 0.028
P(X|buys_computer = no) * P(buys_computer = no) = 0.007

Since 0.028 > 0.007, X is assigned to the class buys_computer = yes.
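The whole worked example can be reproduced by counting directly from the training table. The sketch below is a minimal count-based naive Bayes without smoothing, which is enough here because no count is zero. The names `ROWS`, `COLUMNS`, `train`, and `score` are assumptions for this illustration, and the `31..40` label stands for the middle age bracket in the table.

```python
from collections import Counter

COLUMNS = ("age", "income", "student", "credit_rating")

# The 14 training rows; the last field is the class label buys_computer.
ROWS = [
    ("<=30", "high", "no", "fair", "no"),
    ("<=30", "high", "no", "excellent", "no"),
    ("31..40", "high", "no", "fair", "yes"),
    (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),
    (">40", "low", "yes", "excellent", "no"),
    ("31..40", "low", "yes", "excellent", "yes"),
    ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),
    (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),
    ("31..40", "medium", "no", "excellent", "yes"),
    ("31..40", "high", "yes", "fair", "yes"),
    (">40", "medium", "no", "excellent", "no"),
]

def train(rows):
    """Count class frequencies and (attribute, value, class) frequencies."""
    class_counts = Counter(row[-1] for row in rows)
    feature_counts = Counter()
    for row in rows:
        label = row[-1]
        for column, value in zip(COLUMNS, row[:-1]):
            feature_counts[(column, value, label)] += 1
    return class_counts, feature_counts

def score(x, class_counts, feature_counts):
    """Return P(X|Ci) * P(Ci) for every class (no smoothing)."""
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        s = n / total  # prior P(Ci)
        for column, value in zip(COLUMNS, x):
            s *= feature_counts[(column, value, label)] / n  # P(x_k|Ci)
        scores[label] = s
    return scores

class_counts, feature_counts = train(ROWS)
x = ("<=30", "medium", "yes", "fair")
scores = score(x, class_counts, feature_counts)
prediction = max(scores, key=scores.get)
print({k: round(v, 3) for k, v in scores.items()}, prediction)
```

Rounded to three decimals, the scores match the hand calculation (0.028 for yes, 0.007 for no). With sparser data a zero count would wipe out a class score entirely, which is why practical implementations usually add smoothing.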

