Collins Parser
(Collins 1997, 2005)
What is a supervised parser? When is it lexicalized? How are dependencies used for
CFG parsing? What is a generative model? Why discriminative reranking? How is it
evaluated? How good are the results?
Jens Illig
2011-11-24
Outline
basics
(P)CFG
supervised learning
(lexicalized) PCFG
conclusion
Probabilistic CFG
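CFG
S → NP VP
PCFG
S → NP VP (90%)
which means:
$$P(\mathit{rule}_r = \text{NP VP} \mid \mathit{rule}_l = \text{S}) = 0.9$$
with normalization:
$$\sum_{\mathit{rule}_r} P(\mathit{rule}_r \mid \mathit{rule}_l) = 1$$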
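The normalization constraint can be checked mechanically. The following is a minimal sketch in Python; the toy grammar, every probability other than the 90% for S → NP VP, and the helper name `is_normalized` are illustrative assumptions and not part of the slides.

```python
import math
from collections import defaultdict

# Toy PCFG: (left-hand side, right-hand side) -> probability.
# Only S -> NP VP (0.9) comes from the slide; all other rules are made up.
PCFG = {
    ("S", ("NP", "VP")): 0.9,
    ("S", ("VP",)): 0.1,
    ("NP", ("DT", "NN")): 0.6,
    ("NP", ("NN",)): 0.4,
    ("VP", ("VB", "NP")): 0.7,
    ("VP", ("VB",)): 0.3,
}

def is_normalized(pcfg, tol=1e-9):
    """Check that the rule probabilities sum to 1 for every left-hand side."""
    totals = defaultdict(float)
    for (lhs, _rhs), prob in pcfg.items():
        totals[lhs] += prob
    return all(math.isclose(total, 1.0, abs_tol=tol) for total in totals.values())

print(is_normalized(PCFG))
```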
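Supervised Learning
Treebank: (S1,T1), (S2,T4), (S3,T5), (S4,T8)
Training Data: (S3,T5), (S4,T8) → Training Algorithm → Model
Test Data: (S1,T1), (S2,T4)
Parser (using the Model) on the test sentences (S1,?), (S2,?) → candidate parses (S1,T1'), (S1,T2'), (S2,T3'), (S2,T4')
Evaluation: candidate parses are compared against the Test Data trees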
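Generative Model
$$T_{best} = \arg\max_T P(T \mid S) = \arg\max_T \frac{P(T, S)}{P(S)} = \arg\max_T P(T, S)$$
discriminative: estimation of $P(T \mid S)$
generative: estimation of $P(T, S)$
PCFG: $P(T, S) = \prod_{\mathit{rule} \in S} P(\mathit{rule}_r \mid \mathit{rule}_l)$, the product over all rule applications in the derivation of S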
Lexicalization of Rules
add head word and its PoS tag to each nonterminal
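S → NP VP
becomes, schematically (w: head word, t: its PoS tag, both inherited from the head child VP):
S(w, t) → NP(w', t') VP(w, t)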
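Lexicalization can be sketched as a bottom-up pass that copies the head word and its PoS tag from the head child to the parent. The snippet below is only an illustration: the head table `HEAD_RULES`, the tuple-based tree encoding, and the example sentence are assumptions made for this sketch, and real head-finding rules are considerably more detailed.

```python
# Hypothetical, heavily simplified head table: parent label -> preferred head-child labels.
HEAD_RULES = {"S": ["VP"], "VP": ["VBD", "VB"], "NP": ["NN", "NNP"]}

def lexicalize(tree):
    """Return (label, head_word, head_tag, children) for a (label, children) tree.

    A preterminal is written as (tag, word) with word being a string.
    """
    label, rest = tree
    if isinstance(rest, str):              # preterminal: the word is its own head
        return (label, rest, label, [])
    children = [lexicalize(child) for child in rest]
    head = next(
        (c for wanted in HEAD_RULES.get(label, []) for c in children if c[0] == wanted),
        children[-1],                       # fallback: rightmost child
    )
    return (label, head[1], head[2], children)

tree = ("S", [("NP", [("NNP", "IBM")]),
              ("VP", [("VBD", "bought"), ("NP", [("NNP", "Lotus")])])])
lexicalized = lexicalize(tree)
print(lexicalized[:3])
```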
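Model 1
The right-hand side of a lexicalized rule P(h) → L_n(l_n) … L_1(l_1) H(h) R_1(r_1) … R_m(r_m) is generated from the head outwards, with L_{n+1} = R_{m+1} = STOP:
$$P(L_{n+1}(l_{n+1}) \dots L_1(l_1)\, H(h)\, R_1(r_1) \dots R_{m+1}(r_{m+1}) \mid P(h))$$
$$= P(H(h) \mid P(h)) \cdot \prod_{i=1}^{n+1} P(L_i(l_i) \mid P(h), H(h), \vec{\Delta}(i)) \cdot \prod_{i=1}^{m+1} P(R_i(r_i) \mid P(h), H(h), \vec{\Delta}(i))$$
where $\vec{\Delta}(i)$ denotes the distance features of modifier position $i$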
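The head-outward factorization can be made concrete with a small sketch. Everything in it is an assumption for illustration: the lookup tables and their values are invented, the nonterminal strings are hypothetical, and the distance variable is reduced to a single flag saying whether a modifier is the first one on its side.

```python
import math

STOP = "STOP"

def rule_log_prob(parent, head, left_mods, right_mods, p_head, p_left, p_right):
    """Log-probability of one lexicalized rule under the head-outward factorization.

    left_mods / right_mods: modifiers ordered from the head outwards.
    p_head, p_left, p_right: hypothetical tables of conditional probabilities.
    """
    logp = math.log(p_head[(head, parent)])
    for mods, table in ((left_mods, p_left), (right_mods, p_right)):
        # n+1 (or m+1) factors per side, the last one generating STOP.
        for i, mod in enumerate(list(mods) + [STOP]):
            adjacent = (i == 0)             # crude stand-in for the distance features
            logp += math.log(table[(mod, parent, head, adjacent)])
    return logp

P, H = "S(bought,VBD)", "VP(bought,VBD)"
p_head = {(H, P): 0.8}
p_left = {("NP(IBM,NNP)", P, H, True): 0.5, (STOP, P, H, False): 0.9}
p_right = {(STOP, P, H, True): 0.7}

prob = math.exp(rule_log_prob(P, H, ["NP(IBM,NNP)"], [], p_head, p_left, p_right))
print(prob)
```

Working in log space avoids numerical underflow when many factors are multiplied.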
Parameter Estimation
Each factor is estimated from treebank counts C(·):
$$P(H(h) \mid P(h)) = \frac{C(H(h), P(h))}{C(P(h))}$$
$$P(L_i(l_i) \mid P(h), H(h), \vec{\Delta}(i)) = \frac{C(L_i(l_i), P(h), H(h), \vec{\Delta}(i))}{C(P(h), H(h), \vec{\Delta}(i))}$$
and analogously for $R_i(r_i)$.
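The count ratios above are relative-frequency estimates. A minimal sketch, assuming the treebank has already been turned into a list of (outcome, context) events; the function name `mle` and the toy events are hypothetical:

```python
from collections import Counter

def mle(events):
    """Build p(outcome, context) = C(outcome, context) / C(context) from observed events."""
    joint = Counter()
    context_count = Counter()
    for outcome, context in events:
        joint[(outcome, context)] += 1
        context_count[context] += 1

    def p(outcome, context):
        if context_count[context] == 0:
            return 0.0                      # unseen context: no estimate available
        return joint[(outcome, context)] / context_count[context]

    return p

# Toy head events: which child label heads a parent label.
p_head = mle([("VP", "S"), ("VP", "S"), ("NP", "S")])
print(p_head("VP", "S"))
```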
Parsing
Bottom-Up chart parsing
PoS tag sentence
each word is a potential head of a phrase
calculate probabilities of modifiers
repeat until a phrase covers the whole sentence
Dataset
Penn Treebank: Wall Street Journal portion
Evaluation
PARSEVAL evaluation measures:
Crossing Brackets (CB) = number of constituents that violate constituent boundaries in the gold parse
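As an illustration, the bracket-based measures can be computed from sets of labeled spans. Labeled precision and recall are the standard PARSEVAL companions of crossing brackets, though their definitions are not in the text above. The function name `parseval`, the span encoding, and the toy parses are assumptions of this sketch.

```python
def parseval(gold, predicted):
    """gold, predicted: sets of labeled constituents (label, start, end), end exclusive.

    Returns (labeled precision, labeled recall, crossing brackets).
    """
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0

    def crosses(a, b):
        # The spans overlap, but neither contains the other.
        (_, s1, e1), (_, s2, e2) = a, b
        return s1 < s2 < e1 < e2 or s2 < s1 < e2 < e1

    crossing = sum(1 for p in predicted if any(crosses(p, g) for g in gold))
    return precision, recall, crossing

gold = {("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5)}
predicted = {("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("NP", 1, 4)}
result = parseval(gold, predicted)
print(result)
```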
Results Model 1
Subcategorization Problem
consider this parse:
due to the independence of modifiers, Model 1 may parse:
Solution: divide modifiers into complements (marked -C) and adjuncts
Model 2
Extend Model 1:
$$\prod_{i=1}^{n+1} P(L_i(l_i) \mid P(h), H(h), \vec{\Delta}(i), LC_i)$$
$$\prod_{i=1}^{m+1} P(R_i(r_i) \mid P(h), H(h), \vec{\Delta}(i), RC_i)$$
draw sets of allowed complements (subcat sets) for the left (LC) and right (RC) side
generate each complement in LC/RC exactly once.
no STOP before the subcat set is satisfied
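The two subcat constraints can be expressed as a simple admissibility check on one side of a rule. This is a sketch under assumptions: in the model the constraints presumably act through the probabilities (inadmissible choices get probability 0), whereas the hypothetical function `admissible` below only returns a Boolean, and it treats the subcat requirement as a multiset.

```python
from collections import Counter

STOP = "STOP"

def admissible(modifiers, subcat):
    """Check a head-outward modifier sequence against a subcat requirement.

    modifiers: labels ending in STOP; complements carry the suffix "-C".
    subcat:    required complements, e.g. ["NP-C"].
    """
    remaining = Counter(subcat)
    for mod in modifiers:
        if mod == STOP:
            # No STOP before the subcat requirement is satisfied.
            return sum(remaining.values()) == 0
        if mod.endswith("-C"):
            if remaining[mod] == 0:
                return False                # complement not required, or generated twice
            remaining[mod] -= 1
    return False                            # the sequence never generated STOP

print(admissible(["NP-C", "ADVP", STOP], ["NP-C"]))
```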
Results Model 2
Log-Linear Models
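for PCFG, one step is an application of a CFG-rule:
$$P(T, S) = \prod_{\mathit{rule} \in S} P(\mathit{rule}_r \mid \mathit{rule}_l) = \prod_{\mathit{rule} \in G} P(\mathit{rule}_r \mid \mathit{rule}_l)^{C_S(\mathit{rule})}$$
$$\log(P(T, S)) = \sum_{\mathit{rule} \in G} C_S(\mathit{rule}) \cdot \log P(\mathit{rule}_r \mid \mathit{rule}_l)$$
where G is the set of grammar rules and $C_S(\mathit{rule})$ counts how often a rule is applied in the derivation of S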
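A short numerical check of this view: the product of rule probabilities equals the exponential of a weighted sum in which the features are rule counts and the weights are log probabilities. The rule probabilities and the toy derivation are invented for illustration.

```python
import math
from collections import Counter

# Hypothetical rule probabilities P(rule_r | rule_l).
RULE_PROB = {
    ("S", ("NP", "VP")): 0.9,
    ("NP", ("NN",)): 0.4,
    ("VP", ("VB", "NP")): 0.7,
}

# Rule applications in one derivation (the NP rule is used twice).
derivation = [
    ("S", ("NP", "VP")),
    ("NP", ("NN",)),
    ("VP", ("VB", "NP")),
    ("NP", ("NN",)),
]

def product_form(rules):
    """P(T, S) as a product over rule applications."""
    return math.prod(RULE_PROB[r] for r in rules)

def log_linear_form(rules):
    """log P(T, S) as a sum over grammar rules: count (feature) times log probability (weight)."""
    counts = Counter(rules)
    return sum(counts[r] * math.log(p) for r, p in RULE_PROB.items())

print(product_form(derivation), math.exp(log_linear_form(derivation)))
```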
Conclusion
Lexicalized parser
Head-centric generative process
Extensions for subcategorization (and wh-movement)
Discriminative Reranking of results
Parsing 1/3
bottom-up chart parsing:
choose a complete(+) phrase as head for a new phrase
Parsing 2/3
add completed neighbouring phrases as modifiers
Parsing 3/3
complete by adding STOP modifiers
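The three parsing steps can be sketched as operations on chart edges. This is a structural illustration only: probabilities, the chart itself, and the search over alternatives are left out, and the `Edge` class, the function names, and the example sentence are assumptions of this sketch.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Edge:
    label: str        # nonterminal or PoS tag
    head: str         # head word
    start: int        # span start (inclusive)
    end: int          # span end (exclusive)
    complete: bool    # True once STOP has been generated on both sides

def project_head(child: Edge, parent_label: str) -> Edge:
    """Step 1: a complete phrase becomes the head of a new, incomplete phrase."""
    assert child.complete
    return Edge(parent_label, child.head, child.start, child.end, complete=False)

def attach_modifier(phrase: Edge, modifier: Edge) -> Edge:
    """Step 2: add a complete neighbouring phrase as a left or right modifier."""
    assert not phrase.complete and modifier.complete
    assert modifier.end == phrase.start or phrase.end == modifier.start
    return Edge(phrase.label, phrase.head,
                min(phrase.start, modifier.start), max(phrase.end, modifier.end),
                complete=False)

def add_stop(phrase: Edge) -> Edge:
    """Step 3: generate the STOP modifiers, which completes the phrase."""
    assert not phrase.complete
    return Edge(phrase.label, phrase.head, phrase.start, phrase.end, complete=True)

# "IBM bought Lotus", starting from complete edges for the words.
ibm = Edge("NP", "IBM", 0, 1, True)
bought = Edge("VBD", "bought", 1, 2, True)
lotus = Edge("NP", "Lotus", 2, 3, True)

vp = add_stop(attach_modifier(project_head(bought, "VP"), lotus))
s = add_stop(attach_modifier(project_head(vp, "S"), ibm))
print(s)
```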
wh-Movement Rules
Solution: Account for (+gap) rules separately. Allow generation of a TRACE under
a (+gap)-version of a nonterminal.
Model 3
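Extend Model 2: new random variable G with values Head, Left, Right (the direction in which the gap is passed on):
$$\prod_{i=1}^{n+1} P(L_i(l_i) \mid P(h), H(h), \vec{\Delta}(i), LC_i)$$
$$\prod_{i=1}^{m+1} P(R_i(r_i) \mid P(h), H(h), \vec{\Delta}(i), RC_i)$$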
Results Model 3
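Smoothing
linear combination: $p = \lambda\, p_{mle} + (1 - \lambda)\, p_{backoff}$
recursively stacked: $p_{backoff} = \lambda'\, p'_{mle} + (1 - \lambda')\, p'_{backoff}$, with the primed estimates taken from a less specific conditioning context
all words occurring fewer than 5 times are replaced by UNKNOWN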
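Both ideas fit in a few lines. The sketch below assumes the interpolation weights are simply given (how they are chosen is not covered here); the function names and the numbers in the example are hypothetical.

```python
from collections import Counter

def replace_rare(tokens, min_count=5, unknown="UNKNOWN"):
    """Replace every word seen fewer than min_count times by the UNKNOWN token."""
    freq = Counter(tokens)
    return [t if freq[t] >= min_count else unknown for t in tokens]

def interpolate(estimates, lambdas):
    """Recursively stacked linear interpolation.

    estimates: MLE estimates ordered from the most to the least specific context.
    lambdas:   one weight per level except the last, whose estimate is used as is.
    """
    p = estimates[-1]
    for p_mle, lam in zip(reversed(estimates[:-1]), reversed(lambdas)):
        p = lam * p_mle + (1 - lam) * p
    return p

smoothed = interpolate([0.5, 0.2, 0.1], [0.6, 0.5])
print(smoothed)
print(replace_rare(["the"] * 5 + ["zebra"]))
```

With these numbers the inner level gives 0.5 · 0.2 + 0.5 · 0.1 = 0.15 and the outer level 0.6 · 0.5 + 0.4 · 0.15 = 0.36.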
History-Based Models
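history-based model (generative, structured):
$$P(T, S) = \prod_{i=1}^{n} P(d_i \mid \Phi(d_1, \dots, d_{i-1}))$$
where the tree is built by a sequence of decisions $d_1 \dots d_n$ and $\Phi$ maps each history to its conditioning context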
Boosting
machine-learning algorithm
composition of (typically) simple classifiers
repeatedly add a new classifier which is trained with particular focus on the samples
that are incorrectly classified by the previous zoo of classifiers
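To make the reweighting idea concrete, here is a minimal AdaBoost sketch with threshold classifiers ("stumps") on one-dimensional toy data. It is a generic binary-classification illustration under assumed names and data, not necessarily the boosting variant used for parse reranking.

```python
import math

def train_adaboost(xs, ys, rounds=3):
    """AdaBoost with threshold stumps on 1-D inputs; labels in ys are +1 or -1."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []                                   # list of (alpha, threshold, sign)
    for _ in range(rounds):
        # Pick the stump with the lowest weighted error.
        best = None
        for threshold in sorted(set(xs)):
            for sign in (1, -1):
                preds = [sign if x >= threshold else -sign for x in xs]
                err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
                if best is None or err < best[0]:
                    best = (err, threshold, sign, preds)
        err, threshold, sign, preds = best
        err = min(max(err, 1e-10), 1 - 1e-10)       # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, threshold, sign))
        # Raise the weight of misclassified samples, lower the weight of the others.
        weights = [w * math.exp(-alpha * y * p) for w, y, p in zip(weights, ys, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    """Weighted vote of all stumps."""
    score = sum(alpha * (sign if x >= threshold else -sign)
                for alpha, threshold, sign in ensemble)
    return 1 if score >= 0 else -1

xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]                           # no single stump separates this
ensemble = train_adaboost(xs, ys, rounds=3)
print([predict(ensemble, x) for x in xs])
```

No single threshold classifies this toy data correctly, but the weighted vote of three stumps does, because each round concentrates on the samples the earlier stumps got wrong.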
Here: