
Seminar: Recent Advances in Parsing Technology (WS 2011/2012)


Collins Parser
(Collins 1997, 2005)
What is a supervised parser? When is it lexicalized? How are dependencies used for
CFG parsing? What is a generative model? Why discriminative reranking? How is it
evaluated? How good are the results?

Jens Illig

2011-11-24



Outline

- basics
  - (P)CFG
  - supervised learning
  - (lexicalized) PCFG
- Collins 1997: Probabilistic parser
  - model 1: generative version of (Collins 1996)
  - model 2: + complement/adjunct distinction
  - (model 3: + wh-movement model)
- Collins 2005: Reranking
  - reranker architecture
  - generative / discriminative, (log-)linear
- conclusion

Probabilistic CFG

CFG:

S → NP VP

PCFG:

S → NP VP (90%)

which means:

P(Rule_r = \langle NP, VP \rangle \mid Rule_l = S) = 0.9

with normalization:

\sum_{rule_r} P(rule_r \mid rule_l) = 1
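To make the normalization constraint concrete, here is a minimal Python sketch (not from the slides; the toy grammar is invented) that checks that the rule probabilities for each left-hand side sum to one:

```python
from collections import defaultdict

# hypothetical toy PCFG: maps (lhs, rhs) to P(rule_r | rule_l)
pcfg = {
    ("S", ("NP", "VP")): 0.9,
    ("S", ("VP",)): 0.1,
    ("NP", ("NNP",)): 1.0,
}

# sum the probability mass per left-hand side
totals = defaultdict(float)
for (lhs, rhs), p in pcfg.items():
    totals[lhs] += p

# normalization: every left-hand side must sum to 1
for lhs, total in totals.items():
    assert abs(total - 1.0) < 1e-9, f"rules for {lhs} sum to {total}"
```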


Supervised Parsing Architecture

Treebank: (S1,T1),(S2,T4),(S3,T5),(S4,T8), split into
- Training Data: (S3,T5),(S4,T8) → Training Algorithm → Model
- Test Data: (S1,T1),(S2,T4)

the Parser uses the Model to parse the test sentences (S1,?),(S2,?); its
predicted parses (S1,T1'),(S1,T2'),(S2,T3'),(S2,T4') are compared against
the held-out gold trees in the Evaluation step
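A minimal runnable sketch of this pipeline (the trivial "training algorithm" and "parser" below are invented placeholders, not Collins' components):

```python
# treebank of (sentence, gold tree) pairs, split as in the diagram
treebank = [("S1", "T1"), ("S2", "T4"), ("S3", "T5"), ("S4", "T8")]
test_data, train_data = treebank[:2], treebank[2:]

def train(pairs):
    # placeholder model: memorize the training trees
    return dict(pairs)

def parse(model, sentence):
    # placeholder parser: look up the sentence, else emit a dummy tree
    return model.get(sentence, "T?")

model = train(train_data)
predictions = [(s, parse(model, s)) for s, _ in test_data]

# evaluation: compare predictions against the held-out gold trees
correct = sum(pred == gold for (_, pred), (_, gold) in zip(predictions, test_data))
print(f"exact match: {correct}/{len(test_data)}")
```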


Finding the Best Parse

T_{best} = \arg\max_T P(T \mid S) = \arg\max_T \frac{P(T, S)}{P(S)} = \arg\max_T P(T, S)

Two types of models

discriminative:
- P(T \mid S) estimated directly
- P(T, S) distribution not available
- no model parameters for generating S

generative:
- estimation of P(T, S)
- PCFG: P(T, S) = \prod_{rule \in S} P(rule_r \mid rule_l)
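As a tiny illustration (candidate trees and probabilities invented): dividing the joint probability by the constant P(S) cannot change the argmax, so the best parse can be found from P(T, S) alone:

```python
import math

# hypothetical candidate trees for one sentence with joint log-probabilities
candidates = {"T1": math.log(3e-6), "T2": math.log(7e-6), "T3": math.log(1e-6)}

# argmax over P(T, S) (log space); P(S) is the same for all candidates
t_best = max(candidates, key=candidates.get)
print(t_best)  # -> T2
```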


Lexicalization of Rules

add the head word and its PoS tag to each nonterminal:

S → NP VP

becomes

S(loves, VB) → NP(John, NNP) VP(loves, VB)

let's write this schema as

P(h) \rightarrow L_n(l_n) \ldots L_1(l_1)\ H(h)\ R_1(r_1) \ldots R_m(r_m)
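One natural way to represent such lexicalized nonterminals in code is a small value type; this is an illustrative sketch, not Collins' data structure:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LexNT:
    label: str  # nonterminal category, e.g. "S", "NP", "VP"
    word: str   # head word
    tag: str    # PoS tag of the head word

# S(loves, VB) -> NP(John, NNP) VP(loves, VB)
lhs = LexNT("S", "loves", "VB")
rhs = (LexNT("NP", "John", "NNP"), LexNT("VP", "loves", "VB"))
print(lhs, "->", *rhs)
```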


Collins 1997: Model 1

Tell a head-driven (lexicalized) generative story:

P(rule_r \mid rule_l) = P(L_n(l_n), \ldots, L_1(l_1), H(h), R_1(r_1), \ldots, R_m(r_m) \mid P(h))

generate the head first, then the left and right modifiers (independently):

= P(H(h) \mid P(h)) \cdot \prod_{i=1}^{n+1} P(L_i(l_i) \mid P(h), H(h), \Delta(i)) \cdot \prod_{i=1}^{m+1} P(R_i(r_i) \mid P(h), H(h), \Delta(i))

stop generating modifiers when

L_{n+1}(l_{n+1}) = STOP or R_{m+1}(r_{m+1}) = STOP

distance features: \Delta : \mathbb{N} \rightarrow \langle neighbour?, verb in between?, (0, 1, 2, >2) commas in between? \rangle
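A minimal sketch of this decomposition (illustrative only; P_head, P_left, P_right, and distance stand for the estimated parameter lookups and the Δ features, which are assumptions here):

```python
import math

STOP = "STOP"

def rule_log_prob(parent, head, left_mods, right_mods,
                  P_head, P_left, P_right, distance):
    """log P(rule_r | rule_l) under Model 1's head-driven story."""
    # 1. generate the head nonterminal given the parent
    logp = math.log(P_head(head, parent))
    # 2. generate left modifiers independently, ending with STOP
    for i, mod in enumerate(left_mods + [STOP], start=1):
        logp += math.log(P_left(mod, parent, head, distance(i)))
    # 3. generate right modifiers independently, ending with STOP
    for i, mod in enumerate(right_mods + [STOP], start=1):
        logp += math.log(P_right(mod, parent, head, distance(i)))
    return logp
```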


Parameter Estimation

= P(H(h) \mid P(h)) \cdot \prod_{i=1}^{n+1} P(L_i(l_i) \mid P(h), H(h), \Delta(i)) \cdot \prod_{i=1}^{m+1} P(R_i(r_i) \mid P(h), H(h), \Delta(i))

parameters estimated by relative frequency in the training set (maximum likelihood):

P(H(h) \mid P(h)) = \frac{C(H(h), P(h))}{C(P(h))}

P(L_i(l_i) \mid P(h), H(h), \Delta(i)) = \frac{C(L_i(l_i), P(h), H(h), \Delta(i))}{C(P(h), H(h), \Delta(i))}

linearly smoothed with counts from less specific conditioning sets (backoff)
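A minimal sketch of the relative-frequency estimate for the head parameter class (the event tuples are invented for illustration):

```python
from collections import Counter

# hypothetical head-generation events: (head child, parent), both lexicalized
events = [("VP(loves,VB)", "S(loves,VB)"),
          ("VP(sleeps,VB)", "S(sleeps,VB)"),
          ("VP(loves,VB)", "S(loves,VB)")]

joint = Counter(events)                   # C(H(h), P(h))
parents = Counter(p for _, p in events)   # C(P(h))

def p_head(h, p):
    # maximum-likelihood estimate by relative frequency
    return joint[(h, p)] / parents[p]

print(p_head("VP(loves,VB)", "S(loves,VB)"))  # -> 1.0 in this toy sample
```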



Parsing

bottom-up chart parsing:
- PoS tag the sentence
- treat each word as a potential head of a phrase
- calculate the probabilities of its modifiers
- continue until complete parses span the sentence
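An illustrative sketch of the kind of chart item such a parser manipulates (schematic, not Collins' actual edge representation):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Edge:
    start: int      # span start (word index)
    end: int        # span end (exclusive)
    label: str      # constituent label, e.g. "NP"
    head: str       # lexical head word of the constituent
    complete: bool  # True once STOP was generated on both sides

# seed the chart: each PoS-tagged word is a complete one-word phrase
words, tags = ["John", "loves", "Mary"], ["NNP", "VB", "NNP"]
chart = {Edge(i, i + 1, t, w, True)
         for i, (w, t) in enumerate(zip(words, tags))}
```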


Dataset

Penn Treebank: Wall Street Journal portion
- sections 2-21 for training (~40k sentences)
- section 23 for testing (2,416 sentences)


Evaluation

PARSEVAL evaluation measures:

Labeled Precision (LP) = \frac{\text{nr of correctly predicted constituents}}{\text{nr of all predicted constituents}}

Labeled Recall (LR) = \frac{\text{nr of correctly predicted constituents}}{\text{nr of all correct constituents in the gold parse}}

where a constituent counts as correct iff it has the same boundaries and the same label as a gold constituent

Crossing Brackets (CB) = nr of constituents violating the boundaries in the gold parse
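A minimal sketch of LP/LR: constituents as (label, start, end) triples, where a predicted constituent is correct iff the identical triple occurs in the gold parse (the toy trees are invented):

```python
# gold and predicted constituents as (label, start, end) triples
gold = {("S", 0, 3), ("NP", 0, 1), ("VP", 1, 3), ("NP", 2, 3)}
pred = {("S", 0, 3), ("NP", 0, 1), ("VP", 1, 3), ("PP", 2, 3)}

correct = len(gold & pred)       # same boundaries and same label
lp = correct / len(pred)         # labeled precision
lr = correct / len(gold)         # labeled recall
print(f"LP={lp:.2f} LR={lr:.2f}")  # -> LP=0.75 LR=0.75
```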


Results Model 1


Subcategorization Problem
consider this parse:


Subcategorization Problem
due to the independence of modifiers, Model 1 may parse:


Subcategorization Problem

Solution: distinguish modifiers into complements (-C) and adjuncts
- estimate separate probabilities
- learn that VP(was) prefers only one complement
- complement information might also help to identify functional information such as the subject


Model 2

Extend Model 1:

P(H(h) \mid P(h)) \cdot P(LC \mid P(h), H(h)) \cdot P(RC \mid P(h), H(h))
\quad \cdot \prod_{i=1}^{n+1} P(L_i(l_i) \mid P(h), H(h), \Delta(i), LC_i)
\quad \cdot \prod_{i=1}^{m+1} P(R_i(r_i) \mid P(h), H(h), \Delta(i), RC_i)

where LC_i / RC_i are the subcat requirements still open before the i-th modifier

- draw sets of allowed complements (subcat sets) for the left (LC) and right (RC) side
- generate each complement in LC/RC exactly once
- no STOP before the subcat set is satisfied
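An illustrative sketch of the subcat bookkeeping (invented helper, not Collins' code): complements are checked off as they are generated, and STOP is only legal once the side's subcat set is empty:

```python
def side_is_valid(mods, subcat):
    """Check one side's modifier sequence against its subcat set."""
    remaining = set(subcat)       # e.g. {"NP-C"} for a verb needing a subject
    for label in mods:
        if label == "STOP":
            return not remaining  # STOP is illegal while complements remain
        remaining.discard(label)  # each complement is generated exactly once
    return False                  # every side must end with STOP

print(side_is_valid(["NP-C", "STOP"], {"NP-C"}))  # -> True
print(side_is_valid(["STOP"], {"NP-C"}))          # -> False: subcat unsatisfied
```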


Results Model 2


Reranking (Collins 2005) Architecture


Why rank again?

consider more features of a parse tree:
- CFG rule occurrence (lexicalized / with grandparent node)
- bigram (nonterminals only / lexicalized) occurrence
- ...

parser: generative model
- new random variables needed for every feature
- nr of joint-probability parameters grows exponentially with the nr of features
- (must be avoided by a generative story introducing conditional independencies)

reranker: discriminative (log-)linear classifier
- treats every feature independently
- simple to extend the feature set
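A toy sketch of such feature extraction (the encoding of a tree as a rule list is invented): each candidate parse becomes a sparse vector of count features:

```python
from collections import Counter

# hypothetical flat encoding of one candidate tree's rule applications
rules = [("S", ("NP", "VP")), ("VP", ("VB", "NP"))]

features = Counter()
for lhs, rhs in rules:
    features[("rule", lhs, rhs)] += 1    # CFG rule occurrence
    for a, b in zip(rhs, rhs[1:]):
        features[("bigram", a, b)] += 1  # nonterminal bigram occurrence

print(features)
```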


Log-Linear Models

for a PCFG, one step is an application of a CFG rule:

P(T, S) = \prod_{rule \in S} P(rule_r \mid rule_l) = \prod_{rule \in G} P(rule_r \mid rule_l)^{C_S(rule)}

\log P(T, S) = \sum_{rule \in G} \log(P(rule_r \mid rule_l)) \cdot C_S(rule)

i.e. a linear combination in log space

call \log(P(rule_r \mid rule_l)) the feature weight and C_S(rule) the feature value
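A minimal numeric sketch (toy numbers): scoring a parse as the dot product of feature values (rule counts) with feature weights (log rule probabilities) recovers the PCFG log-probability:

```python
import math

# feature weights: log P(rule_r | rule_l) per grammar rule
log_weights = {("S", ("NP", "VP")): math.log(0.9),
               ("VP", ("VB", "NP")): math.log(0.5)}
# feature values: C_S(rule), the rule counts in this parse
counts = {("S", ("NP", "VP")): 1, ("VP", ("VB", "NP")): 1}

score = sum(log_weights[r] * c for r, c in counts.items())
print(math.isclose(score, math.log(0.9 * 0.5)))  # -> True
```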


Results after Reranking


Conclusion

- lexicalized parser
- head-centric generative process
- extensions for subcategorization (and wh-movement)
- discriminative reranking of results


Thanks for your attention!


questions
discussion


Parsing 1/3

bottom-up chart parsing:
choose a complete(+) phrase as the head for a new phrase


Parsing 2/3
add completed neighbouring phrases as modifiers


Parsing 3/3
complete by adding STOP modifiers


wh-Movement Rules
Solution: Account for (+gap) rules separately. Allow generation of a TRACE under
a (+gap)-version of a nonterminal.


wh-Movement Rule Analysis

we observe: a gap can be
- passed down the head (rule 3)
- passed down to one of the left / right modifiers
- discharged as a TRACE


Model 3

Extend Model 2: a new random variable G with values:

- Head: the gap is passed down the head (rule 3)
- Left/Right: the gap is passed down to one of the left / right modifiers (LC += gap / RC += gap);
  the gap entry in LC/RC is discharged by a TRACE or a (+gap) modifier phrase

P(H(h) \mid P(h)) \cdot P(LC \mid P(h), H(h)) \cdot P(RC \mid P(h), H(h)) \cdot P(G \mid P(h), H(h))
\quad \cdot \prod_{i=1}^{n+1} P(L_i(l_i) \mid P(h), H(h), \Delta(i), LC_i)
\quad \cdot \prod_{i=1}^{m+1} P(R_i(r_i) \mid P(h), H(h), \Delta(i), RC_i)


Results Model 3


Practical Issues - Smoothing

sparse data for the full conditioning set needs backoff:

- linear combination: p = \lambda \cdot p_{mle} + (1 - \lambda) \cdot p_{backoff}
- recursively stacked: p_{backoff} = \lambda' \cdot p'_{mle} + (1 - \lambda') \cdot p'_{backoff}
- all words occurring less than 5 times are replaced by UNKNOWN
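A minimal sketch of the recursively stacked interpolation (the λ values and estimates are invented; real systems derive λ from the context counts):

```python
def smoothed(mle_levels, lambdas):
    """mle_levels: MLE estimates from most to least specific context;
    lambdas: one interpolation weight per backoff step."""
    p = mle_levels[-1]  # start from the most general estimate
    for p_mle, lam in zip(reversed(mle_levels[:-1]), reversed(lambdas)):
        p = lam * p_mle + (1 - lam) * p  # stack one backoff level
    return p

# three conditioning levels, most specific first (unseen at the top level)
print(smoothed([0.0, 0.2, 0.05], [0.6, 0.8]))  # -> approx. 0.068
```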


History-Based Models

history-based model (generative, structured):

P(T, S) = \prod_{i=1}^{n} P(d_i \mid d_1, \ldots, d_{i-1})

i.e. a pair (T, S) is generated by a sequence of decisions D = \langle d_1, \ldots, d_n \rangle
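A tiny sketch of this chain rule (cond_prob stands for an assumed model lookup P(d_i | history)):

```python
import math

def derivation_log_prob(decisions, cond_prob):
    """log P(T, S) as a sum of per-decision log-probabilities,
    each conditioned on the full history of earlier decisions."""
    return sum(math.log(cond_prob(d, decisions[:i]))
               for i, d in enumerate(decisions))
```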


Boosting

- machine-learning algorithm
- composition of (typically) simple classifiers
- repeatedly add a new classifier which is trained with particular focus on the samples that are incorrectly classified by the previous zoo of classifiers

Here:
- each simple classifier has exactly one binary feature
- learning finds the feature that helps the most to improve the results of the previous classifier zoo
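A toy sketch of one round of this feature selection (the data layout and the decision-stump form are assumptions, not the paper's exact algorithm):

```python
def best_feature(samples, weights):
    """samples: list of (feature_set, label) with labels in {-1, +1};
    weights: current per-sample weights from earlier boosting rounds."""
    features = set().union(*(fs for fs, _ in samples))
    def weighted_error(feat):
        # weak classifier with one binary feature: predict +1 iff it fires
        return sum(w for (fs, y), w in zip(samples, weights)
                   if (1 if feat in fs else -1) != y)
    # pick the feature whose stump minimizes weighted error
    return min(features, key=weighted_error)

samples = [({"f1", "f2"}, +1), ({"f2"}, -1), ({"f1"}, +1)]
print(best_feature(samples, [1.0, 1.0, 1.0]))  # -> "f1"
```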
