Guillaume Wisniewski
guillaume.wisniewski@limsi.fr
March 2018
Université Paris Sud & LIMSI
Natural Language Understanding
Dependency parsing
Why is NLU difficult?
Zipf's law
A plot of rank versus frequency for the first 10 million words in 30 Wikipedias
Structure Prediction
Dependency parsing
• given a sentence x = x_1, ..., x_n, predict its dependency tree...
• ...knowing a set of annotated sentences (i.e. sentences with their gold dependency tree)
⇒ structure prediction
Structure prediction
The task
• input: sequences
• output: sequences, trees, ...
• both input and output can be divided into sub-parts ⊕ relations/dependencies between these parts
Understanding structure prediction
A new application
Starting with a simpler problem
A first handwriting recognition system
→ non-optimal
A better way...
B C A C A
P R O R E
G T E T O
Why is it easy?
Transition-Based Dependency Parsing
General Principle
The Arc-Eager Parser [Nivre 2003]
built
Notations:
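The four arc-eager transitions can be sketched in a few lines. This is a minimal illustration, not the definition given on the slide: the `Configuration` class, the function names and the choice of token index 0 for Root are assumptions made for the example, and Root is placed at the front of the buffer as in the worked example of this deck.

```python
from dataclasses import dataclass, field

ROOT = 0  # assumption: the artificial Root token has index 0


@dataclass
class Configuration:
    """Parser state: a stack, a buffer and the set of arcs built so far."""
    stack: list
    buffer: list
    arcs: set = field(default_factory=set)  # triples (head, label, dependent)

    def has_head(self, token):
        return any(dep == token for _, _, dep in self.arcs)


def initial(n_tokens):
    # Empty stack, all tokens (Root first) waiting in the buffer.
    return Configuration(stack=[], buffer=list(range(n_tokens)))


def shift(c):
    """Move the first buffer token onto the stack."""
    c.stack.append(c.buffer.pop(0))


def left_arc(c, label):
    """Add the arc buffer-front -> stack-top, then pop the stack."""
    s, b = c.stack[-1], c.buffer[0]
    if s == ROOT or c.has_head(s):
        raise ValueError("Left-Arc needs a non-root stack top without a head")
    c.arcs.add((b, label, s))
    c.stack.pop()


def right_arc(c, label):
    """Add the arc stack-top -> buffer-front, then push the buffer front."""
    s, b = c.stack[-1], c.buffer[0]
    c.arcs.add((s, label, b))
    c.stack.append(c.buffer.pop(0))


def reduce_(c):
    """Pop the stack top, which must already have a head."""
    if not c.has_head(c.stack[-1]):
        raise ValueError("Reduce needs a stack top that already has a head")
    c.stack.pop()


# Tokens: 0 Root, 1 Economic, 2 news, 3 had, 4 little, 5 effect,
#         6 on, 7 financial, 8 market, 9 "."
c = initial(10)
shift(c)
shift(c)
left_arc(c, "amod")    # news -> Economic
shift(c)
left_arc(c, "nsubj")   # had -> news
right_arc(c, "root")   # Root -> had
shift(c)
left_arc(c, "amod")    # effect -> little
print(c.stack, c.buffer)  # [0, 3] [5, 6, 7, 8, 9]
```

The replayed sequence is S-S-L(amod)-S-L(nsubj)-R(root)-S-L(amod); the preconditions in `left_arc` and `reduce_` are what make some transitions illegal in a given configuration.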
Example
Each step shows the stack, the buffer, the arcs built so far and the transition sequence; the last transition listed is the one applied next (S = Shift, L = Left-Arc, R = Right-Arc).

Stack: []
Buffer: [Root, Economic, news, had, little, effect, on, financial, market, .]

Stack: [Root]
Buffer: [Economic, news, had, little, effect, on, financial, market, .]
Transitions: S-S

Stack: [Root, Economic]
Buffer: [news, had, little, effect, on, financial, market, .]
Transitions: S-S-L(amod)

Stack: [Root]
Buffer: [news, had, little, effect, on, financial, market, .]
Arcs: amod
Transitions: S-S-L(amod)-S

Stack: [Root, news]
Buffer: [had, little, effect, on, financial, market, .]
Arcs: amod
Transitions: S-S-L(amod)-S-L(nsubj)

Stack: [Root]
Buffer: [had, little, effect, on, financial, market, .]
Arcs: amod, nsubj
Transitions: S-S-L(amod)-S-L(nsubj)-R(root)

Stack: [Root, had]
Buffer: [little, effect, on, financial, market, .]
Arcs: amod, nsubj, root
Transitions: S-S-L(amod)-S-L(nsubj)-R(root)-S

Stack: [Root, had, little]
Buffer: [effect, on, financial, market, .]
Arcs: amod, nsubj, root
Transitions: S-S-L(amod)-S-L(nsubj)-R(root)-S-L(amod)
Exercise
Non-Determinism
1. SH-RA-LA-SH-RA-SH-LA-RE-RA-RE-RA
2. SH-RA-LA-SH-RA-RE-SH-LA-RA-RE-RA
Open question
A multi-class classification problem
• sequence of decisions
• among all legal actions: pick the ‘best’ one
⇒ multi-class problem
Formally:
where:
• A = set of legal actions
• φ = joint representation of configuration & action
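One common way to instantiate this choice, assumed here rather than taken from the slide, is a linear model that returns argmax over a in A of w · φ(c, a), where the weight vector w and the configuration c are names introduced for this sketch. The feature strings, the weights and the way `phi` pairs features with actions are all hypothetical.

```python
from collections import defaultdict


def score(weights, features):
    """Linear score w . phi, with phi given as a list of active binary features."""
    return sum(weights[f] for f in features)


def best_action(weights, legal_actions, phi, config):
    """Return the legal action with the highest score."""
    return max(legal_actions, key=lambda a: score(weights, phi(config, a)))


def phi(config, action):
    # Hypothetical joint representation: pair every configuration
    # feature with the candidate action.
    return [(feat, action) for feat in config]


weights = defaultdict(float)  # unseen features have weight 0
weights[("pos(S0)=noun", "Right-Arc")] = 1.5
weights[("pos(S0)=noun", "Shift")] = 0.5
weights[("pos(B0)=prep", "Shift")] = 0.2

config = ["pos(S0)=noun", "pos(B0)=prep"]  # toy configuration features
print(best_action(weights, ["Shift", "Right-Arc"], phi, config))  # Right-Arc
```

Restricting the `max` to the legal actions is what keeps the classifier from proposing a transition that the current configuration does not allow.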
In practice: greedy inference
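Greedy inference commits to the locally best legal action at every step and never revisits a decision. The loop below is a generic sketch; the toy transition system and the hand-written `choose` rule are placeholders invented for the example, standing in for the arc-eager system and a learned classifier.

```python
def greedy_decode(config, is_final, legal_actions, choose, apply_action):
    """Apply the locally best legal action until a final configuration."""
    history = []
    while not is_final(config):
        action = choose(config, legal_actions(config))
        config = apply_action(config, action)
        history.append(action)
    return config, history


# Toy system (hypothetical): a configuration is a (stack, buffer) pair of tuples.
def is_final(c):
    stack, buffer = c
    return not buffer


def legal_actions(c):
    stack, buffer = c
    actions = ["Shift"]          # the buffer is non-empty whenever this is called
    if len(stack) > 1:
        actions.append("Reduce")
    return actions


def apply_action(c, action):
    stack, buffer = c
    if action == "Shift":
        return stack + buffer[:1], buffer[1:]
    return stack[:-1], buffer    # Reduce


def choose(c, actions):
    # Stand-in for the argmax of a learned scoring function.
    scores = {"Shift": 1.0, "Reduce": 2.0 if len(c[0]) > 2 else 0.0}
    return max(actions, key=scores.get)


final, history = greedy_decode(
    (("Root",), ("a", "b", "c")), is_final, legal_actions, choose, apply_action
)
print(history)  # ['Shift', 'Shift', 'Reduce', 'Shift']
```

Each decision costs one classifier call, so decoding is fast, but an early wrong choice is never undone.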
What kind of features?
• pos(S2) = ROOT
• pos(S1) = verb
• pos(S0) = noun
• pos(B2) = prep
• pos(B1) = adj
• pos(B0) = noun
• word(S2) = ROOT
• word(S1) = had
• word(S0) = effect
• word(B2) = on
• word(B1) = financial
• word(B0) = market
• dep(S1) = root
• dep(lc(S1)) = nsubj
• dep(rc(S1)) = dobj
• dep(S0) = dobj
• dep(lc(S0)) = amod
• dep(rc(S0)) = None
• t_{i-1} = Right-Arc(dobj)
• t_{i-2} = Left-Arc(amod)
• t_{i-3} = Shift
• t_{i-4} = Right-Arc(root)
• t_{i-5} = Left-Arc(nsubj)
• t_{i-6} = Shift
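A feature extractor for templates of this kind might look like the sketch below. It covers only the word, part-of-speech and transition-history templates; the function name, the string format and the tags of tokens not listed above are assumptions. It also assumes that S0 is the stack top and B0 the next buffer token, which is a common convention but may not match the buffer numbering used in the list above.

```python
def extract_features(stack, buffer, words, pos, history, k=3):
    """Return string-valued features for one configuration.

    stack and buffer hold token indices; words and pos are indexed by token.
    Assumption: S0 is the stack top and B0 the next buffer token.
    """
    def at(seq, tok):
        return seq[tok] if tok is not None else "None"

    feats = []
    for i in range(k):
        s = stack[-1 - i] if i < len(stack) else None
        b = buffer[i] if i < len(buffer) else None
        feats.append(f"word(S{i})={at(words, s)}")
        feats.append(f"pos(S{i})={at(pos, s)}")
        feats.append(f"word(B{i})={at(words, b)}")
        feats.append(f"pos(B{i})={at(pos, b)}")
    # Most recent transition first: t(i-1), t(i-2), ...
    for j, t in enumerate(reversed(history[-k:]), start=1):
        feats.append(f"t(i-{j})={t}")
    return feats


words = ["ROOT", "Economic", "news", "had", "little", "effect",
         "on", "financial", "market", "."]
# Tags for Economic, news, little and "." are illustrative guesses.
pos = ["ROOT", "adj", "noun", "verb", "adj", "noun",
       "prep", "adj", "noun", "punct"]
history = ["Shift", "Left-Arc(nsubj)", "Right-Arc(root)",
           "Shift", "Left-Arc(amod)", "Right-Arc(dobj)"]

feats = extract_features([0, 3, 5], [6, 7, 8, 9], words, pos, history)
print(feats[:2])  # ['word(S0)=effect', 'pos(S0)=noun']
```

Returning features as strings keeps the extractor independent of the learner: a sparse linear model can use them directly as keys of its weight table.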
Training a dependency parser
First idea
• perceptron-like training
• decode
• as soon as an error is made → correct it
• go to next example / continue decoding
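The "correct it" step can be read as a standard multi-class perceptron update: add the features of the gold action and subtract those of the predicted one. The sketch below is a simplification that trains on isolated (configuration, gold action) pairs instead of decoding whole sentences; the feature strings, the action set and the `phi` function are hypothetical.

```python
from collections import defaultdict


def phi(config, action):
    # Hypothetical joint representation of configuration and action.
    return [(feat, action) for feat in config]


def predict(weights, phi, config, legal_actions):
    return max(legal_actions,
               key=lambda a: sum(weights[f] for f in phi(config, a)))


def perceptron_update(weights, phi, config, gold_action, predicted_action):
    """Move the weights toward the gold action, away from the prediction."""
    if predicted_action == gold_action:
        return False  # no error, nothing to correct
    for f in phi(config, gold_action):
        weights[f] += 1.0
    for f in phi(config, predicted_action):
        weights[f] -= 1.0
    return True


ACTIONS = ["Shift", "Left-Arc", "Right-Arc"]
examples = [  # toy (configuration features, gold action) pairs
    (["pos(S0)=adj", "pos(B0)=noun"], "Left-Arc"),
    (["pos(S0)=verb", "pos(B0)=noun"], "Right-Arc"),
    (["pos(S0)=ROOT", "pos(B0)=adj"], "Shift"),
]

weights = defaultdict(float)
for epoch in range(5):
    errors = 0
    for config, gold in examples:
        guess = predict(weights, phi, config, ACTIONS)
        errors += perceptron_update(weights, phi, config, gold, guess)
    if errors == 0:  # every example is classified correctly
        break
```

In a full parser, the same update would be triggered inside the decoding loop at the first configuration where the predicted transition differs from the reference one.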
Training of an arc-eager parser
Static oracle
Expert policy
Static oracle
⇒ error propagation
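A static oracle maps each configuration on the reference derivation to a single gold transition, so it gives no guidance once the parser has left that derivation. The sketch below is one usual formulation for unlabeled arc-eager parsing of projective trees, not necessarily the one used in the lecture. Root starts in the buffer as in the worked example, and the gold heads assumed for "on", "financial", "market" and "." do not appear in the slides.

```python
def static_oracle(stack, buffer, heads):
    """Return the gold transition for a configuration on the gold path.

    heads[i] is the gold head of token i (None for Root).
    """
    if not stack:
        return "Shift"
    s, b = stack[-1], buffer[0]
    if heads[s] == b:
        return "Left-Arc"
    if heads[b] == s:
        return "Right-Arc"
    # b still has its head or a dependent deeper in the stack: clear the top.
    if any(heads[b] == k or heads[k] == b for k in stack[:-1]):
        return "Reduce"
    return "Shift"


def oracle_sequence(heads):
    """Follow the oracle from the initial configuration until the buffer is empty."""
    stack, buffer, actions = [], list(range(len(heads))), []
    while buffer:
        action = static_oracle(stack, buffer, heads)
        if action in ("Shift", "Right-Arc"):
            stack.append(buffer.pop(0))  # both push the buffer front
        else:
            stack.pop()                  # Left-Arc and Reduce pop the stack
        actions.append(action)
    return actions


# 0 Root, 1 Economic, 2 news, 3 had, 4 little, 5 effect,
# 6 on, 7 financial, 8 market, 9 "."
# Heads of tokens 6 to 9 are assumptions made for this example.
heads = [None, 2, 3, 0, 5, 3, 5, 8, 6, 3]
actions = oracle_sequence(heads)
print(actions[:8])
# ['Shift', 'Shift', 'Left-Arc', 'Shift', 'Left-Arc', 'Right-Arc', 'Shift', 'Left-Arc']
```

Because the oracle only looks at the gold tree, the arcs need not be stored here; a parser trained only on these configurations never sees the states it reaches after its own mistakes, which is where error propagation comes from.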
Building on the shoulders of giants...
Dynamic oracle [Goldberg & Nivre, 2013]
New definition
Why is it important?
In practice...
Training of an arc-eager parser