
Seminar: Recent Advances in Parsing Technology (WS 2011/2012)


Collins Parser
(Collins 1997, 2005)
What is a supervised parser? When is it lexicalized? How are dependencies used for
CFG parsing? What is a generative model? Why discriminative reranking? How is it
evaluated? How good are the results?

Jens Illig

2011-11-24



Outline

- basics
  - (P)CFG
  - supervised learning
  - (lexicalized) PCFG
- Collins 1997: Probabilistic parser
  - model 1: generative version of (Collins 1996)
  - model 2: + complement/adjunct distinction
  - (model 3: + wh-movement model)
- Collins 2005: Reranking
  - reranker architecture
  - generative / discriminative, (log-)linear
- conclusion

Probabilistic CFG

CFG:

S → NP VP

PCFG:

S → NP VP (90%)

which means:

P(Rule_r = \langle NP, VP \rangle \mid Rule_l = S) = 0.9

with normalization:

\sum_{rule_r} P(rule_r \mid rule_l) = 1
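To make the normalization constraint concrete, here is a minimal Python sketch (not from the slides; the toy grammar is invented) that checks that the rule probabilities for each left-hand side sum to one:

```python
from collections import defaultdict

# hypothetical toy PCFG: maps (lhs, rhs) to P(rule_r | rule_l)
pcfg = {
    ("S", ("NP", "VP")): 0.9,
    ("S", ("VP",)): 0.1,
    ("NP", ("NNP",)): 1.0,
}

# sum the probability mass per left-hand side
totals = defaultdict(float)
for (lhs, rhs), p in pcfg.items():
    totals[lhs] += p

# normalization: every left-hand side must sum to 1
for lhs, total in totals.items():
    assert abs(total - 1.0) < 1e-9, f"rules for {lhs} sum to {total}"
```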


Supervised Parsing Architecture

Treebank: (S1,T1),(S2,T4),(S3,T5),(S4,T8), split into
- Training Data: (S3,T5),(S4,T8) → Training Algorithm → Model
- Test Data: (S1,T1),(S2,T4)

the Parser uses the Model to parse the test sentences (S1,?),(S2,?); its
predicted parses (S1,T1'),(S1,T2'),(S2,T3'),(S2,T4') are compared against
the held-out gold trees in the Evaluation step
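A minimal runnable sketch of this pipeline (the trivial "training algorithm" and "parser" below are invented placeholders, not Collins' components):

```python
# treebank of (sentence, gold tree) pairs, split as in the diagram
treebank = [("S1", "T1"), ("S2", "T4"), ("S3", "T5"), ("S4", "T8")]
test_data, train_data = treebank[:2], treebank[2:]

def train(pairs):
    # placeholder model: memorize the training trees
    return dict(pairs)

def parse(model, sentence):
    # placeholder parser: look up the sentence, else emit a dummy tree
    return model.get(sentence, "T?")

model = train(train_data)
predictions = [(s, parse(model, s)) for s, _ in test_data]

# evaluation: compare predictions against the held-out gold trees
correct = sum(pred == gold for (_, pred), (_, gold) in zip(predictions, test_data))
print(f"exact match: {correct}/{len(test_data)}")
```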


Finding the Best Parse

T_{best} = \arg\max_T P(T \mid S) = \arg\max_T \frac{P(T, S)}{P(S)} = \arg\max_T P(T, S)

Two types of models

discriminative:
- P(T \mid S) estimated directly
- P(T, S) distribution not available
- no model parameters for generating S

generative:
- estimation of P(T, S)
- PCFG: P(T, S) = \prod_{rule \in S} P(rule_r \mid rule_l)
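As a tiny illustration (candidate trees and probabilities invented): dividing the joint probability by the constant P(S) cannot change the argmax, so the best parse can be found from P(T, S) alone:

```python
import math

# hypothetical candidate trees for one sentence with joint log-probabilities
candidates = {"T1": math.log(3e-6), "T2": math.log(7e-6), "T3": math.log(1e-6)}

# argmax over P(T, S) (log space); P(S) is the same for all candidates
t_best = max(candidates, key=candidates.get)
print(t_best)  # -> T2
```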


Lexicalization of Rules

add the head word and its PoS tag to each nonterminal:

S → NP VP

becomes

S(loves, VB) → NP(John, NNP) VP(loves, VB)

let's write this schema as

P(h) \rightarrow L_n(l_n) \ldots L_1(l_1)\ H(h)\ R_1(r_1) \ldots R_m(r_m)
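One natural way to represent such lexicalized nonterminals in code is a small value type; this is an illustrative sketch, not Collins' data structure:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LexNT:
    label: str  # nonterminal category, e.g. "S", "NP", "VP"
    word: str   # head word
    tag: str    # PoS tag of the head word

# S(loves, VB) -> NP(John, NNP) VP(loves, VB)
lhs = LexNT("S", "loves", "VB")
rhs = (LexNT("NP", "John", "NNP"), LexNT("VP", "loves", "VB"))
print(lhs, "->", *rhs)
```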


Collins 1997: Model 1

Tell a head-driven (lexicalized) generative story:

P(rule_r \mid rule_l) = P(L_n(l_n), \ldots, L_1(l_1), H(h), R_1(r_1), \ldots, R_m(r_m) \mid P(h))

generate the head first, then the left and right modifiers (independently):

= P(H(h) \mid P(h)) \cdot \prod_{i=1}^{n+1} P(L_i(l_i) \mid P(h), H(h), \Delta(i)) \cdot \prod_{i=1}^{m+1} P(R_i(r_i) \mid P(h), H(h), \Delta(i))

stop generating modifiers when

L_{n+1}(l_{n+1}) = STOP or R_{m+1}(r_{m+1}) = STOP

distance features: \Delta : \mathbb{N} \rightarrow \langle neighbour?, verb in between?, (0, 1, 2, >2) commas in between? \rangle
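A minimal sketch of this decomposition (illustrative only; P_head, P_left, P_right, and distance stand for the estimated parameter lookups and the Δ features, which are assumptions here):

```python
import math

STOP = "STOP"

def rule_log_prob(parent, head, left_mods, right_mods,
                  P_head, P_left, P_right, distance):
    """log P(rule_r | rule_l) under Model 1's head-driven story."""
    # 1. generate the head nonterminal given the parent
    logp = math.log(P_head(head, parent))
    # 2. generate left modifiers independently, ending with STOP
    for i, mod in enumerate(left_mods + [STOP], start=1):
        logp += math.log(P_left(mod, parent, head, distance(i)))
    # 3. generate right modifiers independently, ending with STOP
    for i, mod in enumerate(right_mods + [STOP], start=1):
        logp += math.log(P_right(mod, parent, head, distance(i)))
    return logp
```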


Parameter Estimation

= P(H(h) \mid P(h)) \cdot \prod_{i=1}^{n+1} P(L_i(l_i) \mid P(h), H(h), \Delta(i)) \cdot \prod_{i=1}^{m+1} P(R_i(r_i) \mid P(h), H(h), \Delta(i))

parameters estimated by relative frequency in the training set (maximum likelihood):

P(H(h) \mid P(h)) = \frac{C(H(h), P(h))}{C(P(h))}

P(L_i(l_i) \mid P(h), H(h), \Delta(i)) = \frac{C(L_i(l_i), P(h), H(h), \Delta(i))}{C(P(h), H(h), \Delta(i))}

linearly smoothed with counts from less specific conditioning sets (backoff)
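A minimal sketch of the relative-frequency estimate for the head parameter class (the event tuples are invented for illustration):

```python
from collections import Counter

# hypothetical head-generation events: (head child, parent), both lexicalized
events = [("VP(loves,VB)", "S(loves,VB)"),
          ("VP(sleeps,VB)", "S(sleeps,VB)"),
          ("VP(loves,VB)", "S(loves,VB)")]

joint = Counter(events)                   # C(H(h), P(h))
parents = Counter(p for _, p in events)   # C(P(h))

def p_head(h, p):
    # maximum-likelihood estimate by relative frequency
    return joint[(h, p)] / parents[p]

print(p_head("VP(loves,VB)", "S(loves,VB)"))  # -> 1.0 in this toy sample
```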



Parsing

bottom-up chart parsing:
- PoS tag the sentence
- treat each word as a potential head of a phrase
- calculate the probabilities of its modifiers
- continue until complete parses span the sentence
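An illustrative sketch of the kind of chart item such a parser manipulates (schematic, not Collins' actual edge representation):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Edge:
    start: int      # span start (word index)
    end: int        # span end (exclusive)
    label: str      # constituent label, e.g. "NP"
    head: str       # lexical head word of the constituent
    complete: bool  # True once STOP was generated on both sides

# seed the chart: each PoS-tagged word is a complete one-word phrase
words, tags = ["John", "loves", "Mary"], ["NNP", "VB", "NNP"]
chart = {Edge(i, i + 1, t, w, True)
         for i, (w, t) in enumerate(zip(words, tags))}
```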


Dataset

Penn Treebank: Wall Street Journal portion
- sections 2-21 for training (~40k sentences)
- section 23 for testing (2,416 sentences)


Evaluation

PARSEVAL evaluation measures:

Labeled Precision (LP) = \frac{\text{nr of correctly predicted constituents}}{\text{nr of all predicted constituents}}

Labeled Recall (LR) = \frac{\text{nr of correctly predicted constituents}}{\text{nr of all correct constituents in the gold parse}}

where a constituent counts as correct iff it has the same boundaries and the same label as a gold constituent

Crossing Brackets (CB) = nr of constituents violating the boundaries in the gold parse
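A minimal sketch of LP/LR: constituents as (label, start, end) triples, where a predicted constituent is correct iff the identical triple occurs in the gold parse (the toy trees are invented):

```python
# gold and predicted constituents as (label, start, end) triples
gold = {("S", 0, 3), ("NP", 0, 1), ("VP", 1, 3), ("NP", 2, 3)}
pred = {("S", 0, 3), ("NP", 0, 1), ("VP", 1, 3), ("PP", 2, 3)}

correct = len(gold & pred)       # same boundaries and same label
lp = correct / len(pred)         # labeled precision
lr = correct / len(gold)         # labeled recall
print(f"LP={lp:.2f} LR={lr:.2f}")  # -> LP=0.75 LR=0.75
```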


Results Model 1


Subcategorization Problem
consider this parse:


Subcategorization Problem
due to the independence of modifiers, Model 1 may parse:


Subcategorization Problem

Solution: distinguish modifiers into complements (-C) and adjuncts
- estimate separate probabilities
- learn that VP(was) prefers only one complement
- complement information might also help to identify functional information such as the subject


Model 2

Extend Model 1:

P(H(h) \mid P(h)) \cdot P(LC \mid P(h), H(h)) \cdot P(RC \mid P(h), H(h))
\quad \cdot \prod_{i=1}^{n+1} P(L_i(l_i) \mid P(h), H(h), \Delta(i), LC_i)
\quad \cdot \prod_{i=1}^{m+1} P(R_i(r_i) \mid P(h), H(h), \Delta(i), RC_i)

where LC_i / RC_i are the subcat requirements still open before the i-th modifier

- draw sets of allowed complements (subcat sets) for the left (LC) and right (RC) side
- generate each complement in LC/RC exactly once
- no STOP before the subcat set is satisfied
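An illustrative sketch of the subcat bookkeeping (invented helper, not Collins' code): complements are checked off as they are generated, and STOP is only legal once the side's subcat set is empty:

```python
def side_is_valid(mods, subcat):
    """Check one side's modifier sequence against its subcat set."""
    remaining = set(subcat)       # e.g. {"NP-C"} for a verb needing a subject
    for label in mods:
        if label == "STOP":
            return not remaining  # STOP is illegal while complements remain
        remaining.discard(label)  # each complement is generated exactly once
    return False                  # every side must end with STOP

print(side_is_valid(["NP-C", "STOP"], {"NP-C"}))  # -> True
print(side_is_valid(["STOP"], {"NP-C"}))          # -> False: subcat unsatisfied
```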


Results Model 2


Reranking (Collins 2005) Architecture


Why rank again?

consider more features of a parse tree:
- CFG rule occurrence (lexicalized / with grandparent node)
- bigram (nonterminals only / lexicalized) occurrence
- ...

parser: generative model
- new random variables needed for every feature
- nr of joint-probability parameters grows exponentially with the nr of features
- (must be avoided by a generative story introducing conditional independencies)

reranker: discriminative (log-)linear classifier
- treats every feature independently
- simple to extend the feature set
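A toy sketch of such feature extraction (the encoding of a tree as a rule list is invented): each candidate parse becomes a sparse vector of count features:

```python
from collections import Counter

# hypothetical flat encoding of one candidate tree's rule applications
rules = [("S", ("NP", "VP")), ("VP", ("VB", "NP"))]

features = Counter()
for lhs, rhs in rules:
    features[("rule", lhs, rhs)] += 1    # CFG rule occurrence
    for a, b in zip(rhs, rhs[1:]):
        features[("bigram", a, b)] += 1  # nonterminal bigram occurrence

print(features)
```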


Log-Linear Models

for a PCFG, one step is an application of a CFG rule:

P(T, S) = \prod_{rule \in S} P(rule_r \mid rule_l) = \prod_{rule \in G} P(rule_r \mid rule_l)^{C_S(rule)}

\log P(T, S) = \sum_{rule \in G} \log(P(rule_r \mid rule_l)) \cdot C_S(rule)

i.e. a linear combination in log space

call \log(P(rule_r \mid rule_l)) the feature weight and C_S(rule) the feature value
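A minimal numeric sketch (toy numbers): scoring a parse as the dot product of feature values (rule counts) with feature weights (log rule probabilities) recovers the PCFG log-probability:

```python
import math

# feature weights: log P(rule_r | rule_l) per grammar rule
log_weights = {("S", ("NP", "VP")): math.log(0.9),
               ("VP", ("VB", "NP")): math.log(0.5)}
# feature values: C_S(rule), the rule counts in this parse
counts = {("S", ("NP", "VP")): 1, ("VP", ("VB", "NP")): 1}

score = sum(log_weights[r] * c for r, c in counts.items())
print(math.isclose(score, math.log(0.9 * 0.5)))  # -> True
```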


Results after Reranking


Conclusion

- lexicalized parser
- head-centric generative process
- extensions for subcategorization (and wh-movement)
- discriminative reranking of results


Thanks for your attention!


questions
discussion


Parsing 1/3

bottom-up chart parsing:
choose a complete(+) phrase as the head for a new phrase


Parsing 2/3
add completed neighbouring phrases as modifiers


Parsing 3/3
complete by adding STOP modifiers


wh-Movement Rules
Solution: Account for (+gap) rules separately. Allow generation of a TRACE under
a (+gap)-version of a nonterminal.


wh-Movement Rule Analysis

we observe: a gap can be
- passed down the head (rule 3)
- passed down to one of the left / right modifiers
- discharged as a TRACE


Model 3

Extend Model 2: a new random variable G with values:

- Head: the gap is passed down the head (rule 3)
- Left/Right: the gap is passed down to one of the left / right modifiers (LC += gap / RC += gap);
  the gap entry in LC/RC is discharged by a TRACE or a (+gap) modifier phrase

P(H(h) \mid P(h)) \cdot P(LC \mid P(h), H(h)) \cdot P(RC \mid P(h), H(h)) \cdot P(G \mid P(h), H(h))
\quad \cdot \prod_{i=1}^{n+1} P(L_i(l_i) \mid P(h), H(h), \Delta(i), LC_i)
\quad \cdot \prod_{i=1}^{m+1} P(R_i(r_i) \mid P(h), H(h), \Delta(i), RC_i)


Results Model 3


Practical Issues - Smoothing

sparse data for the full conditioning set needs backoff:

- linear combination: p = \lambda \cdot p_{mle} + (1 - \lambda) \cdot p_{backoff}
- recursively stacked: p_{backoff} = \lambda' \cdot p'_{mle} + (1 - \lambda') \cdot p'_{backoff}
- all words occurring less than 5 times are replaced by UNKNOWN
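A minimal sketch of the recursively stacked interpolation (the λ values and estimates are invented; real systems derive λ from the context counts):

```python
def smoothed(mle_levels, lambdas):
    """mle_levels: MLE estimates from most to least specific context;
    lambdas: one interpolation weight per backoff step."""
    p = mle_levels[-1]  # start from the most general estimate
    for p_mle, lam in zip(reversed(mle_levels[:-1]), reversed(lambdas)):
        p = lam * p_mle + (1 - lam) * p  # stack one backoff level
    return p

# three conditioning levels, most specific first (unseen at the top level)
print(smoothed([0.0, 0.2, 0.05], [0.6, 0.8]))  # -> approx. 0.068
```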


History-Based Models

history-based model (generative, structured):

P(T, S) = \prod_{i=1}^{n} P(d_i \mid d_1, \ldots, d_{i-1})

i.e. a pair (T, S) is generated by a sequence of decisions D = \langle d_1, \ldots, d_n \rangle
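A tiny sketch of this chain rule (cond_prob stands for an assumed model lookup P(d_i | history)):

```python
import math

def derivation_log_prob(decisions, cond_prob):
    """log P(T, S) as a sum of per-decision log-probabilities,
    each conditioned on the full history of earlier decisions."""
    return sum(math.log(cond_prob(d, decisions[:i]))
               for i, d in enumerate(decisions))
```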


Boosting

- machine-learning algorithm
- composition of (typically) simple classifiers
- repeatedly add a new classifier which is trained with particular focus on the samples that are incorrectly classified by the previous zoo of classifiers

Here:
- each simple classifier has exactly one binary feature
- learning finds the feature that helps the most to improve the results of the previous classifier zoo
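A toy sketch of one round of this feature selection (the data layout and the decision-stump form are assumptions, not the paper's exact algorithm):

```python
def best_feature(samples, weights):
    """samples: list of (feature_set, label) with labels in {-1, +1};
    weights: current per-sample weights from earlier boosting rounds."""
    features = set().union(*(fs for fs, _ in samples))
    def weighted_error(feat):
        # weak classifier with one binary feature: predict +1 iff it fires
        return sum(w for (fs, y), w in zip(samples, weights)
                   if (1 if feat in fs else -1) != y)
    # pick the feature whose stump minimizes weighted error
    return min(features, key=weighted_error)

samples = [({"f1", "f2"}, +1), ({"f2"}, -1), ({"f1"}, +1)]
print(best_feature(samples, [1.0, 1.0, 1.0]))  # -> "f1"
```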
