
Word Sense Disambiguation

2000. 3. 24.

Contents
Introduction and preliminaries
Supervised Learning
Bayesian Classification
Information Theoretic Approach
Dictionary Based Disambiguation
Disambiguation based on sense definitions
Thesaurus-based Disambiguation
Disambiguation based on translations in a second-language corpus
One Sense/Discourse, One Sense/Collocation
Unsupervised Learning
Introduction
Word Sense disambiguation
Word sense ambiguity
Bank : ,
Title :
, , , ,
In gallery : This work doesn't have a title
butter :
Semantic Tagging
Preliminaries
Supervised vs. Unsupervised learning
Supervised : classification
Unsupervised : clustering
Pseudowords
Large training/test collection
banana-door : corpus banana door
ambiguity
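A minimal sketch of how a pseudoword test collection could be built, assuming the corpus is already tokenized. The function name `make_pseudoword_corpus` and the sample sentence are hypothetical; only the banana-door pairing comes from the slide.

```python
def make_pseudoword_corpus(tokens, word_a, word_b):
    """Merge two words into one artificial ambiguous word.

    Returns the rewritten token list and the gold labels
    (position, original word) needed to score a disambiguator.
    """
    pseudoword = f"{word_a}-{word_b}"
    new_tokens, gold = [], []
    for position, token in enumerate(tokens):
        if token in (word_a, word_b):
            new_tokens.append(pseudoword)     # ambiguity is introduced here
            gold.append((position, token))    # the "sense" is the original word
        else:
            new_tokens.append(token)
    return new_tokens, gold


# Hypothetical example sentence
tokens = "she ate a banana near the door".split()
new_tokens, gold = make_pseudoword_corpus(tokens, "banana", "door")
```

Because the original word is known at every replaced position, the gold labels come for free, which is why pseudowords give large training and test sets without hand tagging.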
Upper and lower bounds
Upper bound : Human power.
Gale et al.'s work :
(97%~99%)
Lower bound :
Supervised Learning
Two Approaches
Bayesian Classification
Context window source
Structure
Information-theoretic approach
Context information
feature(indicator) sense
Bayesian Classification
Bayes' decision rule
$$s' = \arg\max_{s_k} P(s_k \mid c)$$
Bayes' rule
$$P(s_k \mid c) = \frac{P(c \mid s_k)}{P(c)} P(s_k)$$
$$s' = \arg\max_{s_k} P(c \mid s_k) P(s_k) = \arg\max_{s_k} [\log P(c \mid s_k) + \log P(s_k)]$$
Bag of words
Naive Bayes assumption
context window c
$$P(c \mid s_k) = P(\{v_j \mid v_j \text{ in } c\} \mid s_k) = \prod_{v_j \text{ in } c} P(v_j \mid s_k)$$
Use MLE
P(vj|sk) = C(vj,sk)/C(sk)
P(sk) = C(sk)/C(w)
sense s' (p.238 Fig 7.1)
$$s' = \arg\max_{s_k} [\log P(c \mid s_k) + \log P(s_k)]$$
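The decision rule s' = argmax [log P(c|sk) + log P(sk)] with the bag-of-words assumption can be sketched as follows. This is an illustrative sketch, not the implementation of Fig 7.1: the function names, the toy training data, and the add-one smoothing (`alpha`) are assumptions. Smoothing is added because the pure MLE estimate is zero for unseen words and its logarithm is undefined, and P(vj|sk) is normalized here by the total number of context words seen with sense sk.

```python
from collections import Counter, defaultdict
from math import log


def train(labelled_contexts):
    """Count C(sk) and C(vj, sk) from (sense, context words) pairs."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    for sense, context in labelled_contexts:
        sense_counts[sense] += 1
        word_counts[sense].update(context)
    return sense_counts, word_counts


def disambiguate(context, sense_counts, word_counts, alpha=1.0):
    """Return argmax over sk of log P(sk) + sum_j log P(vj | sk)."""
    vocab = {v for counts in word_counts.values() for v in counts}
    total = sum(sense_counts.values())              # C(w)

    def score(sense):
        n_words = sum(word_counts[sense].values())
        s = log(sense_counts[sense] / total)        # log P(sk)
        for v in context:                           # Naive Bayes: independent words
            s += log((word_counts[sense][v] + alpha)
                     / (n_words + alpha * len(vocab)))
        return s

    return max(sense_counts, key=score)


# Hypothetical toy training set for the ambiguous word "drug"
data = [
    ("medication", ["prices", "prescription", "patent"]),
    ("medication", ["consumer", "pharmaceutical", "prices"]),
    ("illegal substance", ["abuse", "cocaine", "traffickers"]),
    ("illegal substance", ["alcohol", "illicit", "abuse"]),
]
sense_counts, word_counts = train(data)
```

Working in log space turns the product over context words into a sum, which avoids numeric underflow for long context windows.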
Gale, Church and Yarowsky (1992)
Hansard corpus
duty, drug, land, language, position, sentence
90%
Sense [drug]         Clues for sense
medication           prices, prescription, patent, increase, consumer, pharmaceutical
illegal substance    abuse, paraphernalia, illicit, alcohol, cocaine, traffickers
Information-theoretic approach
Brown et al.'s (1991) work




I(P; Q) Indicator
P: , Q : indicator value
Mutual information

Ambiguous word    Indicator           Examples: value → sense
prendre           object              mesure → to take
                                      décision → to make
vouloir           tense               present → to want
                                      conditional → to like
cent              word to the left    per → %
                                      number → c. [money]

$$I(X; Y) = \sum_{x \in X} \sum_{y \in Y} p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)}$$
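As a worked illustration of the mutual information I(X; Y), the sketch below computes it from a joint distribution given as a dictionary. The function name and the toy distributions are assumptions, and the logarithm is taken to base 2 (bits), which the slide does not specify.

```python
from math import log2


def mutual_information(joint):
    """I(X; Y) in bits for joint = {(x, y): p(x, y)} with probabilities summing to 1."""
    px, py = {}, {}
    for (x, y), p in joint.items():       # marginals p(x) and p(y)
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)


# The indicator value fully determines the translation: 1 bit
dependent = {("take", "mesure"): 0.5, ("make", "décision"): 0.5}

# The indicator value says nothing about the translation: 0 bits
independent = {(t, v): 0.25
               for t in ("take", "make")
               for v in ("mesure", "décision")}
```

An indicator is useful for disambiguation exactly when its values share a lot of information with the translations, as in the first distribution.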
Algorithm
Maximize I(P; Q)
indicator
I(P;Q) indicator Q
partition set
Flip-Flop algorithm (p.240, Fig 7.2)



Find random partition P = {P1, P2} of {t1, ..., tm}
While (improving) do
  Find partition Q = {Q1, Q2} of {x1, ..., xn} that maximizes I(P;Q)
  Find partition P = {P1, P2} of {t1, ..., tm} that maximizes I(P;Q)
End
(t1, ..., tm : translations of the ambiguous word, x1, ..., xn : possible values of the indicator)
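The alternating search of the Flip-Flop algorithm can be sketched as below. Several things here are assumptions for illustration: the co-occurrence counts are invented, the best partition is found by brute-force enumeration (practical only for small sets), and a fixed initial partition replaces the random one so that the run is reproducible.

```python
from itertools import combinations
from math import log2


def mutual_information(counts, p1, q1):
    """I(P;Q) for two-way partitions, from counts[(translation, value)]."""
    total = sum(counts.values())
    joint = [[0.0, 0.0], [0.0, 0.0]]
    for (t, x), n in counts.items():
        joint[int(t in p1)][int(x in q1)] += n / total
    p_marg = [sum(row) for row in joint]
    q_marg = [joint[0][j] + joint[1][j] for j in (0, 1)]
    return sum(joint[i][j] * log2(joint[i][j] / (p_marg[i] * q_marg[j]))
               for i in (0, 1) for j in (0, 1) if joint[i][j] > 0)


def two_way_partitions(items):
    """Yield one side of every split of items (needs at least two items)."""
    first, *rest = sorted(items)
    for size in range(len(rest)):         # never take all of rest: other side stays non-empty
        for extra in combinations(rest, size):
            yield frozenset((first, *extra))


def flip_flop(counts, initial_p1):
    translations = {t for t, _ in counts}
    values = {x for _, x in counts}
    p1, best = frozenset(initial_p1), -1.0
    while True:
        q1 = max(two_way_partitions(values),
                 key=lambda q: mutual_information(counts, p1, q))
        p1 = max(two_way_partitions(translations),
                 key=lambda p: mutual_information(counts, p, q1))
        score = mutual_information(counts, p1, q1)
        if score <= best + 1e-12:         # no further improvement
            return p1, q1, score
        best = score


# Hypothetical counts: translations of "prendre" by the value of its object
counts = {("take", "mesure"): 10, ("take", "note"): 8, ("take", "exemple"): 6,
          ("make", "décision"): 9, ("speak", "parole"): 7, ("rise", "parole"): 5}
p1, q1, score = flip_flop(counts, {"take", "rise"})
```

Each step can only keep or raise I(P;Q), so the loop stops once a full pass brings no gain. On these counts it separates "take" from the other translations.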
Dictionary-Based
Disambiguation


(Lesk, 1986)
(Yarowsky, 1992)
Bilingual dictionary corpus
(Dagan and Itai,1994)

Disambiguation based on
sense definitions

D1, ..., DK : dictionary definitions of the senses s1, ..., sK
Algorithm (p.243, Fig 7.3)





Accuracy : 50% ~ 70%
comment: Given context c
for all senses sk of w do
  score(sk) = overlap(Dk, ∪_{vj in c} E_vj)
end
s' = arg max_{sk} score(sk)
*. E_vj : dictionary definition of the context word vj


Example
word ash




Sense               Definition
s1 tree             a tree of the olive family
s2 burned stuff     the solid residue left when combustible material is burned
Scoring
Scores      Context
s1  s2
0   1       This cigar burns slowly and creates a stiff ash.
1   0       The ash is one of the last trees to come into leaf.
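The scoring in the ash example can be reproduced with a small overlap counter. This is a simplified sketch: it compares each sense definition with the context words themselves rather than with the dictionary definitions of the context words. The stopword list and the crude suffix stripping (so that "burns" matches "burned") are assumptions made only for this example.

```python
import re

STOPWORDS = {"a", "the", "of", "is", "and", "to", "this"}   # assumed, minimal list


def stem(word):
    """Very crude stemmer: strip a trailing 'ed' or 's' from longer words."""
    for suffix in ("ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def content_stems(text, exclude=()):
    words = re.findall(r"[a-z]+", text.lower())
    return {stem(w) for w in words if w not in STOPWORDS and w not in exclude}


def lesk(target, context, definitions):
    """Return (best sense, overlap score per sense) for target in context."""
    context_stems = content_stems(context, exclude={target})
    scores = {sense: len(content_stems(definition) & context_stems)
              for sense, definition in definitions.items()}
    return max(scores, key=scores.get), scores


definitions = {
    "s1": "a tree of the olive family",
    "s2": "the solid residue left when combustible material is burned",
}
first = lesk("ash", "This cigar burns slowly and creates a stiff ash.", definitions)
second = lesk("ash", "The ash is one of the last trees to come into leaf.", definitions)
```

With so few overlapping words, scores of 0 and 1 decide the sense, which helps explain the modest accuracy reported for this method.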
Thesaurus-based
Disambiguation

Walker's algorithm (1987) (p.245, Fig. 7.4)





Yarowsky's algorithm
Bayes classifier
context category ,
category

comment: given context c
for all senses sk of w do
  score(sk) = Σ_{vj in c} δ(t(sk), vj)
end
s' = arg max_{sk} score(sk)
*. δ(t(sk), vj) = 1 iff t(sk) is one of the subject codes of vj
               = 0 otherwise
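A minimal sketch of this thesaurus-based scoring, in which each sense is scored by the number of context words that list the sense's category among their subject codes. The function name and the small category dictionaries are hypothetical; real subject codes would come from a thesaurus such as Roget's.

```python
def walker_disambiguate(context, sense_category, word_categories):
    """Pick the sense whose thesaurus category t(sk) covers the most context words."""
    def score(sense):
        category = sense_category[sense]
        # delta(t(sk), vj) = 1 iff the category is a subject code of vj
        return sum(1 for v in context if category in word_categories.get(v, ()))

    return max(sense_category, key=score)


# Hypothetical thesaurus entries for the ambiguous word "bass"
sense_category = {"musical": "MUSIC", "fish": "ANIMAL"}
word_categories = {
    "guitar": {"MUSIC"},
    "play": {"MUSIC", "SPORT"},
    "trout": {"ANIMAL"},
    "caught": {"ANIMAL", "SPORT"},
    "river": {"GEOGRAPHY"},
}
musical = walker_disambiguate(["play", "guitar", "loud"], sense_category, word_categories)
fish = walker_disambiguate(["caught", "river", "trout"], sense_category, word_categories)
```

Words missing from the thesaurus simply contribute nothing to any sense.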
Yarowsky's algorithm
context score (p.246, Fig 7.5)
Naive Bayes assumption
score(ci,tl) = P(tl|ci)
$$P(t_l \mid c_i) = \frac{P(c_i \mid t_l)}{P(c_i)} P(t_l) = \frac{\prod_{v \text{ in } c_i} P(v \mid t_l)}{\prod_{v \text{ in } c_i} P(v)} P(t_l)$$
sense s',
$$s' = \arg\max_{s_k} [\log P(c \mid t(s_k)) + \log P(t(s_k))]$$
Some Results
Roget categories
Word        Sense                 Roget category     Accuracy
bass        musical senses        MUSIC              99%
            fish                  ANIMAL, INSECT     100%
star        space object          UNIVERSE           96%
            celebrity             ENTERTAINER        95%
            star shaped object    INSIGNIA           82%
interest    curiosity             REASONING          88%
            advantage             INJUSTICE          34%
            financial             DEBT               90%
            share                 PROPERTY           38%
Disambiguation based on translations
in a second-language corpus
Dagan and Itai (1994)

Algorithm (p.249, Fig 7.6)








comment: Given: a context c in which w occurs in relation R(w, v)
for all senses sk of w do
  score(sk) = |{c ∈ S | ∃ w' ∈ T(sk), v' ∈ T(v) : R(w', v') ∈ c}|
end
s' = arg max_{sk} score(sk)
*. S : second-language corpus
*. T(x) : possible translations of x
Example
interest




show interest : "show" translates to "zeigen"
"zeigen" occurs with "Interesse" in the second-language corpus
so sense2 is chosen
                       sense1                  sense2
Definition             legal share             attention, concern
Translation            Beteiligung             Interesse
English collocation    acquire an interest     show interest
Translation            Beteiligung erwerben    Interesse zeigen
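The translation-based scoring can be sketched on the interest example as follows. The function name, the representation of each second-language context as a set of (word, related word) pairs, and the tiny German corpus are all assumptions made for illustration.

```python
def dagan_itai(sense_translations, partner_translations, corpus):
    """Pick the sense whose translations co-occur most often with a
    translation of the related word v in the second-language corpus.

    corpus is a list of contexts, each a set of (word, related word) pairs.
    """
    def score(sense):
        return sum(
            1 for context in corpus
            if any((w2, v2) in context
                   for w2 in sense_translations[sense]
                   for v2 in partner_translations)
        )

    return max(sense_translations, key=score)


# T(sk) for the two senses of "interest", and T(v) for v = "show"
sense_translations = {
    "legal share": ["Beteiligung"],
    "attention, concern": ["Interesse"],
}
partner_translations = ["zeigen"]

# Hypothetical German corpus, already reduced to relation pairs
corpus = [
    {("Interesse", "zeigen")},
    {("Interesse", "zeigen"), ("Haus", "kaufen")},
    {("Beteiligung", "erwerben")},
]
chosen = dagan_itai(sense_translations, partner_translations, corpus)
```

Here "Interesse zeigen" is found in two contexts and "Beteiligung zeigen" in none, so the "attention, concern" sense is selected for "show interest".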
One Sense per Discourse,
One Sense per Collocation
One sense per discourse
sense

One sense per collocation
sense

collocation sense
(collocation word f : )
$$\frac{P(s_{k_1} \mid f)}{P(s_{k_2} \mid f)}$$
Unsupervised
Disambiguation
Completely unsupervised
disambiguation
sense tagging
context-group
clustering grouping
Gale et al.'s Bayes classifier

K s1 sK group(sense)
P(sk|c)
EM algorithm (p.254 Fig 7.8)
Unsupervised
Disambiguation (cont.)
K
K sense
training corpus
corpus
, tagging corpus
sense .

Word Sense
Word Sense ?

sense : ?
Systematic Polysemy
Co-activation (p.258 7.9, 7.10)
the act of X and the people doing X
Organization, administration, formation
Proper nouns : Brown, Bush, Army
Application
