
Inferring Logical Forms From Denotations

Panupong Pasupat
Computer Science Department
Stanford University
ppasupat@cs.stanford.edu

Percy Liang
Computer Science Department
Stanford University
pliang@cs.stanford.edu

arXiv:1606.06900v2 [cs.CL] 15 Nov 2016

Abstract

A core problem in learning semantic parsers from denotations is picking out consistent logical forms—those that yield the correct denotation—from a combinatorially large space. To control the search space, previous work relied on a restricted set of rules, which limits expressivity. In this paper, we consider a much more expressive class of logical forms, and show how to use dynamic programming to efficiently represent the complete set of consistent logical forms. Expressivity also introduces many more spurious logical forms which are consistent with the correct denotation but do not represent the meaning of the utterance. To address this, we generate fictitious worlds and use crowdsourced denotations on these worlds to filter out spurious logical forms. On the WikiTableQuestions dataset, we increase the coverage of answerable questions from 53.5% to 76%, and the additional crowdsourced supervision lets us rule out 92.1% of spurious logical forms.

1 Introduction

Consider the task of learning to answer complex natural language questions (e.g., "Where did the last 1st place finish occur?") using only question-answer pairs as supervision (Clarke et al., 2010; Liang et al., 2011; Berant et al., 2013; Artzi and Zettlemoyer, 2013). Semantic parsers map the question into a logical form (e.g., R[Venue].argmax(Position.1st, Index)) that can be executed on a knowledge source to obtain the answer (denotation). Logical forms are very expressive since they can be recursively composed, but this very expressivity makes it more difficult to search over the space of logical forms. Previous work sidesteps this obstacle by restricting the set of possible logical form compositions, but this is limiting. For instance, for the system in Pasupat and Liang (2015), in only 53.5% of the examples was the correct logical form even in the set of generated logical forms.

The goal of this paper is to solve two main challenges that prevent us from generating more expressive logical forms. The first challenge is computational: the number of logical forms grows exponentially as their size increases. Directly enumerating all logical forms becomes infeasible, and pruning techniques such as beam search can inadvertently prune out correct logical forms. The second challenge is the large increase in spurious logical forms—those that do not reflect the semantics of the question but coincidentally execute to the correct denotation. For example, while logical forms z1, ..., z5 in Figure 1 are all consistent (they execute to the correct answer y), the logical forms z4 and z5 are spurious and would give incorrect answers if the table were to change.

We address these two challenges by solving two interconnected tasks. The first task, which addresses the computational challenge, is to enumerate the set Z of all consistent logical forms given a question x, a knowledge source w ("world"), and the target denotation y (Section 4). Observing that the space of possible denotations grows much more slowly than the space of logical forms, we perform dynamic programming on denotations (DPD) to make search feasible. Our method is guaranteed to find all consistent logical forms up to some bounded size.

Given the set Z of consistent logical forms, the second task is to filter out spurious logical forms from Z (Section 5). Using the property that spurious logical forms ultimately give a wrong answer when the data in the world w changes, we create

fictitious worlds to test the denotations of the logical forms in Z. We use crowdsourcing to annotate the correct denotations on a subset of the generated worlds. To reduce the amount of annotation needed, we choose the subset that maximizes the expected information gain. The pruned set of logical forms would provide a stronger supervision signal for training a semantic parser.

We test our methods on the WikiTableQuestions dataset of complex questions on Wikipedia tables. We define a simple, general set of deduction rules (Section 3), and use DPD to confirm that the rules generate a correct logical form in 76% of the examples, up from the 53.5% in Pasupat and Liang (2015). Moreover, unlike beam search, DPD is guaranteed to find all consistent logical forms up to a bounded size. Finally, by using annotated data on fictitious worlds, we are able to prune out 92.1% of the spurious logical forms.

Year  Venue     Position  Event  Time
2001  Hungary   2nd       400m   47.12
2003  Finland   1st       400m   46.69
2005  Germany   11th      400m   46.62
2007  Thailand  1st       relay  182.05
2008  China     7th       relay  180.32

x: "Where did the last 1st place finish occur?"
y: Thailand

Consistent, correct:
z1: R[Venue].argmax(Position.1st, Index)
    Among rows with Position = 1st, pick the one with maximum index, then return the Venue of that row.
z2: R[Venue].Index.max(R[Index].Position.1st)
    Find the maximum index of rows with Position = 1st, then return the Venue of the row with that index.
z3: R[Venue].argmax(Position.Number.1, R[λx.R[Date].R[Year].x])
    Among rows with Position number 1, pick the one with the latest date in the Year column and return the Venue.

Consistent, spurious:
z4: R[Venue].argmax(Position.Number.1, R[λx.R[Number].R[Time].x])
    Among rows with Position number 1, pick the one with maximum Time number. Return the Venue.
z5: R[Venue].Year.Number.(R[Number].R[Year].argmax(Type.Row, Index) − 1)
    Subtract 1 from the Year in the last row, then return the Venue of the row with that Year.

Inconsistent:
z̃: R[Venue].argmin(Position.1st, Index)
    Among rows with Position = 1st, pick the one with minimum index, then return the Venue. (= Finland)

Figure 1: Six logical forms generated from the question x. The first five are consistent: they execute to the correct answer y. Of those, correct logical forms z1, z2, and z3 are different ways to represent the semantics of x, while spurious logical forms z4 and z5 get the right answer y for the wrong reasons.

[Figure 2 graph omitted: row nodes r1, r2, r3 connected by Next edges, with Venue, Position, Number, and Index edges to cell values such as Finland, 1st, and the row indices 1, 2, 3, and with z1 = R[Venue].argmax(Position.1st, Index) traced over the graph.]

Figure 2: The table in Figure 1 is converted into a graph. The recursive execution of logical form z1 is shown via the different colors and styles.

2 Setup

The overarching motivation of this work is allowing people to ask questions involving computation on semi-structured knowledge sources such as tables from the Web. This section introduces how the knowledge source is represented, how the computation is carried out using logical forms, and our task of inferring correct logical forms.

Worlds. We use the term world to refer to a collection of entities and relations between entities. One way to represent a world w is as a directed graph with nodes for entities and directed edges for relations. (For example, a world about geography would contain a node Europe with an edge Contains to another node Germany.)

In this paper, we use data tables from the Web as knowledge sources, such as the one in Figure 1. We follow the construction in Pasupat and Liang (2015) for converting a table into a directed graph (see Figure 2). Rows and cells become nodes (e.g., r0 = first row and Finland) while columns become labeled directed edges between them (e.g., Venue maps r1 to Finland). The graph is augmented with additional edges Next (from each row to the next) and Index (from each row to its index number).
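To make this conversion concrete, the following minimal sketch (our own illustration, not the authors' code; only three of the table's columns are shown) turns the rows of Figure 1 into (source, relation, target) triples with column, Next, and Index edges:

```python
# Minimal sketch (not the paper's implementation) of the table-to-graph
# conversion: rows become nodes r0..r4, each column label becomes a
# directed edge from a row node to a cell value, and Next/Index edges
# are added between/onto row nodes.
ROWS = [
    {"Year": "2001", "Venue": "Hungary", "Position": "2nd"},
    {"Year": "2003", "Venue": "Finland", "Position": "1st"},
    {"Year": "2005", "Venue": "Germany", "Position": "11th"},
    {"Year": "2007", "Venue": "Thailand", "Position": "1st"},
    {"Year": "2008", "Venue": "China", "Position": "7th"},
]

def build_graph(rows):
    """Return a set of (source, relation, target) triples."""
    edges = set()
    for i, row in enumerate(rows):
        node = f"r{i}"
        for col, cell in row.items():              # column -> labeled edge
            edges.add((node, col, cell))
        edges.add((node, "Index", i))              # row-to-index edge
        if i + 1 < len(rows):
            edges.add((node, "Next", f"r{i+1}"))   # row-to-next-row edge
    return edges

graph = build_graph(ROWS)
```

Representing the world as a flat triple set keeps relation traversal (and its reverse) uniform, which the logical form operations below rely on.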
In addition, we add normalization edges to cell nodes, including Number (from the cell to the first number in the cell), Num2 (the second number), Date (interpretation as a date), and Part (each list item if the cell represents a list). For example, a cell with content "3-4" has a Number edge to the integer 3, a Num2 edge to 4, and a Date edge to XX-03-04.

Logical forms. We can perform computation on a world w using a logical form z, a small program that can be executed on the world, resulting in a denotation JzKw.

We use lambda DCS (Liang, 2013) as the language of logical forms. As a demonstration, we will use z1 in Figure 2 as an example. The smallest units of lambda DCS are entities (e.g., 1st) and relations (e.g., Position). Larger logical forms can be constructed using logical operations, and the denotation of the new logical form can be computed from the denotations of its constituents. For example, applying the join operation on Position and 1st gives Position.1st, whose denotation is the set of entities with relation Position pointing to 1st. With the world in Figure 2, the denotation is JPosition.1stKw = {r1, r3}, which corresponds to the 2nd and 4th rows in the table. The partial logical form Position.1st is then used to construct argmax(Position.1st, Index), the denotation of which can be computed by mapping the entities in JPosition.1stKw = {r1, r3} using the relation Index ({r0: 0, r1: 1, ...}), and then picking the one with the largest mapped value (r3, which is mapped to 3). The resulting logical form is finally combined with R[Venue] with another join operation. The relation R[Venue] is the reverse of Venue, which corresponds to traversing Venue edges in the reverse direction.

Semantic parsing. A semantic parser maps a natural language utterance x (e.g., "Where did the last 1st place finish occur?") into a logical form z. With denotations as supervision, a semantic parser is trained to put high probability on z's that are consistent—logical forms that execute to the correct denotation y (e.g., Thailand). When the space of logical forms is large, searching for consistent logical forms z can become a challenge.

As illustrated in Figure 1, consistent logical forms can be divided into two groups: correct logical forms represent valid ways for computing the answer, while spurious logical forms accidentally get the right answer for the wrong reasons (e.g., z4 picks the row with the maximum time but gets the correct answer anyway).

Tasks. Denote by Z and Zc the sets of all consistent and correct logical forms, respectively. The first task is to efficiently compute Z given an utterance x, a world w, and the correct denotation y (Section 4). With the set Z, the second task is to infer Zc by pruning spurious logical forms from Z (Section 5).

3 Deduction rules

The space of logical forms given an utterance x and a world w is defined recursively by a set of deduction rules (Table 1). In this setting, each constructed logical form belongs to a category (Set, Rel, or Map). These categories are used for type checking in a similar fashion to categories in syntactic parsing. Each deduction rule specifies the categories of the arguments, the category of the resulting logical form, and how the logical form is constructed from the arguments.

Deduction rules are divided into base rules and compositional rules. A base rule follows one of the following templates:

    TokenSpan[span] → c [f(span)]   (1)
    ∅ → c [f()]                     (2)

A rule of Template 1 is triggered by a span of tokens from x (e.g., to construct z1 in Figure 2 from x in Figure 1, Rule B1 from Table 1 constructs 1st of category Set from the phrase "1st"). Meanwhile, a rule of Template 2 generates a logical form without any trigger (e.g., Rule B5 generates Position of category Rel from the graph edge Position without a specific trigger in x).

Compositional rules then construct larger logical forms from smaller ones:

    c1[z1] + c2[z2] → c [g(z1, z2)]   (3)
    c1[z1] → c [g(z1)]                (4)

A rule of Template 3 combines partial logical forms z1 and z2 of categories c1 and c2 into g(z1, z2) of category c (e.g., Rule C1 uses 1st of category Set and Position of category Rel to construct Position.1st of category Set). Template 4 works similarly.
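As an illustration of the semantics that a compositional rule manipulates, here is a hypothetical sketch (not the paper's implementation) that executes a Rule C1-style join and a reverse relation directly on edge triples from a fragment of Figure 2's graph:

```python
# Hypothetical sketch of Rule C1 semantics (Set + Rel -> Set): the join
# Position.1st denotes the rows whose Position edge points to 1st, and
# R[Venue] traverses Venue edges out of a set of rows. Data is a small
# fragment of the Figure 2 graph, for illustration only.
EDGES = {
    ("r0", "Position", "2nd"), ("r1", "Position", "1st"),
    ("r2", "Position", "11th"), ("r3", "Position", "1st"),
    ("r1", "Venue", "Finland"), ("r3", "Venue", "Thailand"),
}

def join(relation, values):
    """Denotation of relation.values: sources whose edge lands in values."""
    return {s for (s, r, t) in EDGES if r == relation and t in values}

def reverse(relation, sources):
    """Denotation of R[relation].sources: follow edges out of sources."""
    return {t for (s, r, t) in EDGES if r == relation and s in sources}

rows_with_1st = join("Position", {"1st"})   # denotation of Position.1st
venues = reverse("Venue", rows_with_1st)    # denotation of R[Venue].Position.1st
```

Because every operation maps denotations to denotations, larger logical forms can be executed bottom-up from their constituents, which is the property the deduction rules build on.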
Most rules construct logical forms without requiring a trigger from the utterance x. This is crucial for generating implicit relations (e.g., generating Year from "what's the venue in 2000?" without a trigger "year"), and generating operations without a lexicon (e.g., generating argmax from "where's the longest competition"). However, the downside is that the space of possible logical forms becomes very large.

Base Rules:
B1  TokenSpan → Set: fuzzymatch(span)
    (entity fuzzily matching the text: "chinese" → China)
B2  TokenSpan → Set: val(span)
    (interpreted value: "march 2015" → 2015-03-XX)
B3  ∅ → Set: Type.Row
    (the set of all rows)
B4  ∅ → Set: c ∈ ClosedClass
    (any entity from a column with few unique entities; e.g., 400m or relay from the Event column)
B5  ∅ → Rel: r ∈ GraphEdges
    (any relation in the graph: Venue, Next, Num2, ...)
B6  ∅ → Rel: != | < | <= | > | >=

Compositional Rules:
C1  Set + Rel → Set: z2.z1 | R[z2].z1
    (R[z] is the reverse of z; i.e., flip the arrow direction)
C2  Set → Set: a(z1)
    (a ∈ {count, max, min, sum, avg})
C3  Set + Set → Set: z1 ⊓ z2 | z1 ⊔ z2 | z1 − z2
    (subtraction is only allowed on numbers)

Compositional Rules with Maps:
Initialization:
M1  Set → Map: (z1, x)  (identity map)
Operations on Map:
M2  Map + Rel → Map: (u1, z2.b1) | (u1, R[z2].b1)
M3  Map → Map: (u1, a(b1))
    (a ∈ {count, max, min, sum, avg})
M4  Map + Set → Map: (u1, b1 ⊓ z2) | ...
M5  Map + Map → Map: (u1, b1 ⊓ b2) | ...
    (allowed only when u1 = u2)
(Rules M4 and M5 are repeated for ⊔ and −)
Finalization:
M6  Map → Set: argmin(u1, R[λx.b1]) | argmax(u1, R[λx.b1])

Table 1: Deduction rules define the space of logical forms by specifying how partial logical forms are constructed. The logical form of the i-th argument is denoted by zi (or (ui, bi) if the argument is a Map). The set of final logical forms contains any logical form with category Set.

The Map category. The technique in this paper requires execution of partial logical forms. This poses a challenge for argmin and argmax operations, which take a set and a binary relation as arguments. The binary could be a complex function (e.g., in z3 from Figure 1). While it is possible to build the binary independently from the set, executing a complex binary is sometimes impossible (e.g., the denotation of λx.count(x) is impossible to write explicitly without knowledge of x).

We address this challenge with the Map category. A Map is a pair (u, b) of a finite set u (unary) and a binary relation b. The denotation of (u, b) is (JuKw, JbK′w), where the binary JbK′w is JbKw with the domain restricted to the set JuKw. For example, consider the construction of argmax(Position.1st, Index). After constructing Position.1st with denotation {r1, r3}, Rule M1 initializes (Position.1st, x) with denotation ({r1, r3}, {r1: {r1}, r3: {r3}}). Rule M2 is then applied to generate (Position.1st, R[Index].x) with denotation ({r1, r3}, {r1: {1}, r3: {3}}). Finally, Rule M6 converts the Map into the desired argmax logical form with denotation {r3}.

Generality of deduction rules. Using domain knowledge, previous work restricted the space of logical forms by manually defining the categories c or the semantic functions f and g to fit the domain. For example, the category Set might be divided into Records, Values, and Atomic when the knowledge source is a table (Pasupat and Liang, 2015). Another example is when a compositional rule g (e.g., sum(z1)) must be triggered by some phrase in a lexicon (e.g., words like "total" that align to sum in the training data). Such restrictions make search more tractable but greatly limit the scope of questions that can be answered.

Here, we have increased the coverage of logical forms by making the deduction rules simple and general, essentially following the syntax of lambda DCS. The base rules only generate entities that approximately match the utterance, but all possible relations, and all possible further combinations.

Beam search. Given the deduction rules, an utterance x, and a world w, we would like to generate all derived logical forms Z. We first present the floating parser (Pasupat and Liang, 2015), which uses beam search to generate Zb ⊆ Z, a usually incomplete subset. Intuitively, the algorithm first constructs base logical forms based on spans of the utterance, and then builds larger logical forms of increasing size in a "floating" fashion—without requiring a trigger from the utterance.

Formally, partial logical forms with category c and size s are stored in a cell (c, s). The algorithm first generates base logical forms from base deduction rules and stores them in cells (c, 0) (e.g., the cell (Set, 0) contains 1st, Type.Row, and so on).
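The chart organization can be sketched as a toy (our illustration, not PL15's code; beam pruning and every rule except a C1-style join are omitted), with cells keyed by (category, size) and base cells at size 0:

```python
# Schematic sketch of the floating parser's chart (illustrative only):
# partial logical forms live in cells keyed by (category, size); base
# rules fill size 0, and compositional rules combine smaller cells.
# Real beam search would also truncate each cell's list by model score.
from collections import defaultdict

cells = defaultdict(list)                  # (category, size) -> logical forms
cells[("Set", 0)] = ["1st", "Type.Row"]    # from base rules (e.g., B1, B3)
cells[("Rel", 0)] = ["Position", "Venue"]  # from base rule B5

MAX_SIZE = 2
for s in range(1, MAX_SIZE + 1):
    for s1 in range(s):                    # argument sizes sum to s - 1
        s2 = s - 1 - s1
        for rel in cells[("Rel", s1)]:
            for st in cells[("Set", s2)]:
                # Rule C1 (Set + Rel -> Set), written rel.set
                cells[("Set", s)].append(f"{rel}.{st}")
```

Without pruning this chart grows combinatorially, which is exactly why beam search truncates cells and why essential partial forms can be lost.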
Then for each size s = 1, ..., smax, we populate the cells (c, s) by applying compositional rules on partial logical forms with size less than s. For instance, when s = 2, we can apply Rule C1 on logical forms Number.1 from cell (Set, s1 = 1) and Position from cell (Rel, s2 = 0) to create Position.Number.1 in cell (Set, s1 + s2 + 1 = 2). After populating each cell (c, s), the list of logical forms in the cell is pruned based on the model scores to a fixed beam size in order to control the search space. Finally, the set Zb is formed by collecting logical forms from all cells (Set, s) for s = 1, ..., smax.

Due to the generality of our deduction rules, the number of logical forms grows quickly as the size s increases. As such, partial logical forms that are essential for building the desired logical forms might fall off the beam early on. In the next section, we present a new search method that compresses the search space using denotations.

4 Dynamic programming on denotations

Our first step toward finding all correct logical forms is to represent all consistent logical forms (those that execute to the correct denotation). Formally, given x, w, and y, we wish to generate the set Z of all logical forms z such that JzKw = y.

As mentioned in the previous section, beam search does not recover the full set Z due to pruning. Our key observation is that while the number of logical forms explodes, the number of distinct denotations of those logical forms is much more controlled, as multiple logical forms can share the same denotation. So instead of directly enumerating logical forms, we use dynamic programming on denotations (DPD), which is inspired by similar methods from program induction (Lau et al., 2003; Liang et al., 2010; Gulwani, 2011).

[Figure 3 hypergraph omitted: square cell nodes such as (Set, 7, {Thailand}) and (Set, 7, {Finland}) connected through circular rule nodes.]

Figure 3: The first pass of DPD constructs cells (c, s, d) (square nodes) using denotationally invariant semantic functions (circle nodes). The second pass enumerates all logical forms along paths that lead to the correct denotation y (solid lines).

The main idea of DPD is to collapse logical forms with the same denotation together. Instead of using cells (c, s) as in beam search, we perform dynamic programming using cells (c, s, d) where d is a denotation. For instance, the logical form Position.Number.1 will now be stored in cell (Set, 2, {r1, r3}).

For DPD to work, each deduction rule must have a denotationally invariant semantic function g, meaning that the denotation of the resulting logical form g(z1, z2) only depends on the denotations of z1 and z2:

    Jz1Kw = Jz1′Kw ∧ Jz2Kw = Jz2′Kw ⇒ Jg(z1, z2)Kw = Jg(z1′, z2′)Kw

All of our deduction rules in Table 1 are denotationally invariant, but a rule that, for instance, returns the argument with the larger logical form size would not be. Applying a denotationally invariant deduction rule on any pair of logical forms from (c1, s1, d1) and (c2, s2, d2) always results in a logical form with the same denotation d in the same cell (c, s1 + s2 + 1, d).¹ (For example, the cell (Set, 4, {r3}) contains z1 := argmax(Position.1st, Index) and z1′ := argmin(Event.Relay, Index). Combining each of these with Venue using Rule C1 gives R[Venue].z1 and R[Venue].z1′, which belong to the same cell (Set, 5, {Thailand}).)

[¹ Semantic functions f with one argument work similarly.]

Algorithm. DPD proceeds in two forward passes. The first pass finds the possible combinations of cells (c, s, d) that lead to the correct denotation y, while the second pass enumerates the logical forms in the cells found in the first pass. Figure 3 illustrates the DPD algorithm.

In the first pass, we are only concerned about finding relevant cell combinations and not the actual logical forms. Therefore, any logical form that belongs to a cell could be used as an argument of a deduction rule to generate further logical forms. Thus, we keep at most one logical form per cell; subsequent logical forms that are generated for that cell are discarded.

After populating all cells up to size smax, we list all cells (Set, s, y) with the correct denotation y, and then note all possible rule combinations (cell1, rule) or (cell1, cell2, rule) that lead to those final cells, including the combinations that yielded discarded logical forms.
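A hand-driven toy illustration of the first pass (our sketch, not the actual parser; the sizes used here are illustrative rather than computed by the size arithmetic above) shows how keying cells by (category, size, denotation) collapses logical forms and keeps a single representative per cell:

```python
# Toy illustration (not the paper's implementation) of DPD's first pass:
# cells are keyed by (category, size, denotation), so logical forms that
# denote the same set fall into the same cell, and setdefault keeps only
# one representative logical form per cell.
EDGES = {("r1", "Position", "1st"), ("r3", "Position", "1st"),
         ("r1", "Venue", "Finland"), ("r3", "Venue", "Thailand")}

def reverse(rel, sources):
    """Denotation of R[rel].sources: follow rel edges out of sources."""
    return frozenset(t for (s, r, t) in EDGES if r == rel and s in sources)

def argmax_index(rows):
    """Pick the last row; row names like 'r3' sort by index here."""
    return frozenset({max(rows)})

cells = {}
def put(cat, size, denotation, form):
    cells.setdefault((cat, size, denotation), form)  # keep one per cell

d1 = frozenset({"r1", "r3"})                   # denotation of Position.1st
put("Set", 1, d1, "Position.1st")
d2 = argmax_index(d1)                          # {"r3"}
put("Set", 2, d2, "argmax(Position.1st, Index)")
put("Set", 2, d2, "another consistent form")   # same cell: discarded
put("Set", 3, reverse("Venue", d2), "R[Venue].argmax(Position.1st, Index)")
```

The discarded form's cell combination is still recorded in the real algorithm, which is what lets the second pass recover every logical form rather than just the representatives.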
The second pass retrieves the actual logical forms that yield the correct denotation. To do this, we simply populate the cells (c, s, d) with all logical forms, using only rule combinations that lead to final cells. This elimination of irrelevant rule combinations effectively reduces the search space. (In Section 6.2, we empirically show that the number of cells considered is reduced by 98.7%.)

The parsing chart is represented as a hypergraph as in Figure 3. After eliminating unused rule combinations, each of the remaining hyperpaths from base predicates to the target denotation corresponds to a single logical form, making the remaining parsing chart a compact implicit representation of all consistent logical forms. This representation is guaranteed to cover all possible logical forms under the size limit smax that can be constructed by the deduction rules.

In our experiments, we apply DPD on the deduction rules in Table 1 and explicitly enumerate the logical forms produced by the second pass. For efficiency, we prune logical forms that are clearly redundant (e.g., applying max on a set of size 1). We also restrict a few rules that might otherwise create too many denotations. For example, we restricted the union operation (⊔) except unions of two entities (e.g., we allow Germany ⊔ Finland but not Venue.Hungary ⊔ ...), subtraction when building a Map, and count on a set of size 1.²

[² While we technically can apply count on sets of size 1, the number of spurious logical forms explodes as there are too many sets of size 1 generated.]

5 Fictitious worlds

After finding the set Z of all consistent logical forms, we want to filter out spurious logical forms. To do so, we observe that semantically correct logical forms should also give the correct denotation in worlds w′ other than w. In contrast, spurious logical forms will fail to produce the correct denotation on some other world.

Generating fictitious worlds. With the observation above, we generate fictitious worlds w1, w2, ..., where each world wi is a slight alteration of w. As we will be executing logical forms z ∈ Z on wi, we should ensure that all entities and relations in z ∈ Z appear in the fictitious world wi (e.g., z1 in Figure 1 would be meaningless if the entity 1st does not appear in wi). To this end, we impose that all predicates present in the original world w should also be present in wi as well.

In our case where the world w comes from a data table t, we construct wi from a new table ti as follows: we go through each column of t and resample the cells in that column. The cells are sampled using random draws without replacement if the original cells are all distinct, and with replacement otherwise. Sorted columns are kept sorted. To ensure that predicates in w exist in wi, we use the same set of table columns and enforce that any entity fuzzily matching a span in the question x must be present in ti (e.g., for the example in Figure 1, the generated ti must contain "1st"). Figure 4 shows an example fictitious table generated from the table in Figure 1.

Year  Venue    Position  Event  Time
2001  Finland  7th       relay  46.62
2003  Germany  1st       400m   180.32
2005  China    1st       relay  47.12
2007  Hungary  7th       relay  182.05

Figure 4: From the example in Figure 1, we generate a table for the fictitious world w1.

Fictitious worlds are similar to test suites for computer programs. However, unlike manually designed test suites, we do not yet know the correct answer for each fictitious world or whether a world is helpful for filtering out spurious logical forms. The next subsections introduce our method for choosing a subset of useful fictitious worlds to be annotated.

      w         w1       w2       ...
z1    Thailand  China    Finland  ...   q1
z2    Thailand  China    Finland  ...   q1
z3    Thailand  China    Finland  ...   q1
z4    Thailand  Germany  China    ...   q2
z5    Thailand  China    China    ...   q3
z6    Thailand  China    China    ...   q3
...

Figure 5: We execute consistent logical forms zi ∈ Z on fictitious worlds to get denotation tuples. Logical forms with the same denotation tuple are grouped into the same equivalence class qj.

Equivalence classes. Let W = (w1, ..., wk) be the list of all possible fictitious worlds. For each z ∈ Z, we define the denotation tuple JzKW = (JzKw1, ..., JzKwk).
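Grouping by denotation tuple can be sketched as follows, with toy tuples mirroring Figure 5 (in practice each tuple comes from executing z on the generated tables):

```python
# Sketch of grouping consistent logical forms into equivalence classes by
# their denotation tuples over fictitious worlds (w1, w2). The tuples
# below mirror Figure 5 and are illustrative only.
from collections import defaultdict

denotation_tuples = {
    "z1": ("China", "Finland"),
    "z2": ("China", "Finland"),   # same tuple as z1 (algebraic equivalence)
    "z4": ("Germany", "China"),   # spurious: diverges on fictitious worlds
    "z5": ("China", "China"),
}

classes = defaultdict(list)
for form, tup in denotation_tuples.items():
    classes[tup].append(form)     # same tuple -> same equivalence class

equivalence_classes = sorted(classes.values())
```

Annotating a world then only needs to distinguish equivalence classes, not individual logical forms, which is what makes the subsequent subset selection tractable.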
We observe that some logical forms produce the same denotation across all fictitious worlds. This may be due to an algebraic equivalence in logical forms (e.g., z1 and z2 in Figure 1) or due to the constraints in the construction of fictitious worlds (e.g., z1 and z3 in Figure 1 are equivalent as long as the Year column is sorted). We group logical forms into equivalence classes based on their denotation tuples, as illustrated in Figure 5. When the question is unambiguous, we expect at most one equivalence class to contain correct logical forms.

Annotation. To pin down the correct equivalence class, we acquire the correct answers to the question x on some subset W′ = (w′1, ..., w′ℓ) ⊆ W of ℓ fictitious worlds, as it is impractical to obtain annotations on all fictitious worlds in W. We compile equivalence classes that agree with the annotations into a set Zc of correct logical forms.

We want to choose a W′ that gives us as much information about the correct equivalence class as possible. This is analogous to standard practices in active learning (Settles, 2010).³ Let Q be the set of all equivalence classes q, and let JqKW′ be the denotation tuple computed by executing an arbitrary z ∈ q on W′. The subset W′ divides Q into partitions Ft = {q ∈ Q : JqKW′ = t} based on the denotation tuples t (e.g., from Figure 5, if W′ contains just w2, then q2 and q3 will be in the same partition F(China)). The annotation t*, which is also a denotation tuple, will mark one of these partitions Ft* as correct. Thus, to prune out many spurious equivalence classes, the partitions should be as numerous and as small as possible.

More formally, we choose a subset W′ that maximizes the expected information gain (or equivalently, the reduction in entropy) about the correct equivalence class given the annotation. With random variables Q ∈ Q representing the correct equivalence class and T*W′ representing the annotation on worlds W′, we seek to find argminW′ H(Q | T*W′). Assuming a uniform prior on Q (p(q) = 1/|Q|) and accurate annotation (p(t* | q) = I[q ∈ Ft*]):

    H(Q | T*W′) = Σ_{q,t} p(q, t) log [p(t) / p(q, t)]
                = (1/|Q|) Σ_t |Ft| log |Ft|.   (*)

[³ The difference is that we are obtaining partial information about an individual example rather than partial information about the parameters.]

We exhaustively search for the W′ that minimizes (*). The objective value follows our intuition since Σ_t |Ft| log |Ft| is small when the terms |Ft| are small and numerous.

In our experiments, we approximate the full set W of fictitious worlds by generating k = 30 worlds to compute equivalence classes. We choose a subset of ℓ = 5 worlds to be annotated.

6 Experiments

For the experiments, we use the training portion of the WikiTableQuestions dataset (Pasupat and Liang, 2015), which consists of 14,152 questions on 1,679 Wikipedia tables gathered by crowd workers. Answering these complex questions requires different types of operations. The same operation can be phrased in different ways (e.g., "best", "top ranking", or "lowest ranking number"), and the interpretation of some phrases depends on the context (e.g., "number of" could be a table lookup or a count operation). The lexical content of the questions is also quite diverse: even excluding numbers and symbols, the 14,152 training examples contain 9,671 unique words, only 10% of which appear more than 10 times.

We attempted to manually annotate the first 300 examples with lambda DCS logical forms. We successfully constructed correct logical forms for 84% of these examples, which is a good number considering the questions were created by humans who could use the table however they wanted. The remaining 16% reflect limitations in our setup—for example, non-canonical table layouts, answers appearing in running text or images, and common sense reasoning (e.g., knowing that "Quarterfinal" is better than "Round of 16").

6.1 Generality of deduction rules

We compare our set of deduction rules with the one given in Pasupat and Liang (2015) (henceforth PL15). PL15 reported generating the annotated logical form in 53.5% of the first 200 examples. With our more general deduction rules, we use DPD to verify that the rules are able to generate the annotated logical form in 76% of the first 300 examples, within the logical form size limit smax of 7. This is 90.5% of the examples that were successfully annotated. Figure 6 shows some examples of logical forms we cover that PL15 could not. Since DPD is guaranteed to find all consistent logical forms, we can be sure that the logical
forms not covered are due to limitations of the deduction rules. Indeed, the remaining examples either have logical forms with size larger than 7 or require other operations such as addition, union of arbitrary sets, etc.

"which opponent has the most wins"
z = argmax(R[Opponent].Type.Row, R[λx.count(Opponent.x ⊓ Result.Lost)])

"how long did ian armstrong serve?"
z = R[Num2].R[Term].Member.IanArmstrong − R[Number].R[Term].Member.IanArmstrong

"which players came in a place before lukas bauer?"
z = R[Name].Index.<.R[Index].Name.LukasBauer

"which players played the same position as ardo kreek?"
z = R[Player].Position.R[Position].Player.Ardo ⊓ !=.Ardo

Figure 6: Several example logical forms our system can generate that are not covered by the deduction rules from the previous work PL15.

6.2 Dynamic programming on denotations

Search space. To demonstrate the savings gained by collapsing logical forms with the same denotation, we track the growth of the number of unique logical forms and denotations as the logical form size increases. The plot in Figure 7 shows that the space of logical forms explodes much more quickly than the space of denotations.

[Figure 7 plot omitted: counts from 0 to 1.0K against logical form size 0 to 7, for logical forms and denotations.]

Figure 7: The median of the number of logical forms (dashed) and denotations (solid) as the formula size increases. The space of logical forms grows much faster than the space of denotations.

The use of denotations also saves us from considering a significant amount of irrelevant partial logical forms. On average over the 14,152 training examples, DPD generates approximately 25,000 consistent logical forms. The first pass of DPD generates ≈ 153,000 cells (c, s, d), while the second pass generates only ≈ 2,000 cells resulting from ≈ 8,000 rule combinations, resulting in a 98.7% reduction in the number of cells that have to be considered.

Comparison with beam search. We compare DPD to beam search on the ability to generate (but not rank) the annotated logical forms. We consider two settings: when the beam search parameters are uninitialized (i.e., the beams are pruned randomly), and when the parameters are trained using the system from PL15 (i.e., the beams are pruned based on model scores). The plot in Figure 8

[Figure 8 plot omitted: annotated-LF coverage from 0.0 to 0.8 against the number of final LFs produced, 0 to 25,000.]

Figure 8: The number of annotated logical forms that can be generated by beam search, both uninitialized (dashed) and initialized (solid), increases with the number of candidates generated (controlled by beam size), but lags behind DPD (star).

6.3 Fictitious worlds

We now explore how fictitious worlds divide the set of logical forms into equivalence classes, and how the annotated denotations on the chosen worlds help us prune spurious logical forms.

Equivalence classes. Using 30 fictitious worlds per example, we produce an average of 1,237 equivalence classes. One possible concern with using a limited number of fictitious worlds is that we may fail to distinguish some pairs of non-equivalent logical forms. We verify the equivalence classes against the ones computed using 300 fictitious worlds. We found that only 5% of the logical forms are split from the original equivalence classes.

Ideal Annotation. After computing equivalence classes, we choose a subset W′ of 5 fictitious worlds to be annotated based on the information-theoretic objective. For each of the 252 examples with an annotated logical form z*, we use the denotation tuple t* = Jz*KW′ as the annotated
shows that DPD generates more annotated logical answers on the chosen fictitious worlds. We are
forms (76%) compared to beam search (53.7%), able to rule out 98.7% of the spurious equivalence
even when beam search is guided heuristically by classes and 98.3% of spurious logical forms. Fur-
learned parameters. Note that DPD is an exact al- thermore, we are able to filter down to just one
gorithm and does not require a heuristic. equivalence class in 32.7% of the examples, and
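The equivalence-class computation above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `execute` and the arithmetic "logical forms" are toy stand-ins for executing table logical forms on fictitious worlds.

```python
from collections import defaultdict

def equivalence_classes(logical_forms, worlds, execute):
    """Partition logical forms by their denotation tuple across worlds.

    Two logical forms fall in the same class iff they produce identical
    denotations on every fictitious world in `worlds`.
    """
    classes = defaultdict(list)
    for z in logical_forms:
        key = tuple(execute(z, w) for w in worlds)
        classes[key].append(z)
    return dict(classes)

# Toy stand-ins: "logical forms" are arithmetic expressions, and a "world"
# is an assignment to x; execute(z, w) evaluates z under w.
forms = ["x + x", "2 * x", "x * x"]
worlds = [{"x": 2}, {"x": 3}]
classes = equivalence_classes(forms, worlds, lambda z, w: eval(z, {}, w))
# On the world x = 2 alone, all three forms yield the same denotation 4;
# the extra world x = 3 separates the spurious "x * x" into its own class.
```

Adding more fictitious worlds can only refine this partition, which is consistent with the observation that moving from 30 to 300 worlds splits just 5% of the logical forms.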
Ideal Annotation. After computing equivalence classes, we choose a subset W′ of 5 fictitious worlds to be annotated based on the information-theoretic objective. For each of the 252 examples with an annotated logical form z*, we use the denotation tuple t* = ⟦z*⟧_W′ as the annotated answers on the chosen fictitious worlds. We are able to rule out 98.7% of the spurious equivalence classes and 98.3% of spurious logical forms. Furthermore, we are able to filter down to just one equivalence class in 32.7% of the examples, and to at most three equivalence classes in 51.3% of the examples. If we instead choose the 5 fictitious worlds randomly rather than maximizing information gain, these statistics drop to 22.6% and 36.5%, respectively. When more than one equivalence class remains, usually only one is a dominant class with many equivalent logical forms, while the other classes are small and contain logical forms with unusual patterns (e.g., z5 in Figure 1).

The average size of the correct equivalence class is ≈ 3,000 with a standard deviation of ≈ 8,000. Because we have an expressive logical language, there are fundamentally many equivalent ways of computing the same quantity.

Crowdsourced Annotation. Data from crowdsourcing is more susceptible to errors. Of the 252 annotated examples, we use the 177 examples where at least two crowd workers agree on the answer for the original world w. When the crowdsourced data is used to rule out spurious logical forms, the entire set Z of consistent logical forms is pruned out in 11.3% of the examples, and the correct equivalence class is removed in 9% of the examples. These issues are due to annotation errors, inconsistent data (e.g., a date of death preceding the date of birth), and different interpretations of the question on the fictitious worlds. For the remaining examples, we are able to prune out 92.1% of spurious logical forms (or 92.6% of spurious equivalence classes).

To prevent the entire Z from being pruned, we can relax our assumption and keep logical forms z that disagree with the annotation in at most 1 fictitious world. The number of times Z is pruned out is then reduced to 3%, but the number of spurious logical forms pruned also decreases to 78%.

7 Related Work and Discussion

This work evolved from a long tradition of learning executable semantic parsers, initially from annotated logical forms (Zelle and Mooney, 1996; Kate et al., 2005; Zettlemoyer and Collins, 2005; Zettlemoyer and Collins, 2007; Kwiatkowski et al., 2010), but more recently from denotations (Clarke et al., 2010; Liang et al., 2011; Berant et al., 2013; Kwiatkowski et al., 2013; Pasupat and Liang, 2015). A central challenge in learning from denotations is finding consistent logical forms (those that execute to a given denotation). As Kwiatkowski et al. (2013) and Berant and Liang (2014) both noted, a chief difficulty with executable semantic parsing is the “schema mismatch”: words in the utterance do not map cleanly onto the predicates in the logical form. This mismatch is especially pronounced in the WIKITABLEQUESTIONS dataset of Pasupat and Liang (2015). In the second example of Figure 6, “how long” is realized by a logical form that computes a difference between two dates. The ramification of this mismatch is that finding consistent logical forms cannot proceed solely from the language side. This paper is about using annotated denotations to drive the search over logical forms.

This takes us into the realm of program induction, where the goal is to infer a program (logical form) from input-output pairs (for us, world–denotation pairs). Here, previous work has also leveraged the idea of dynamic programming on denotations (Lau et al., 2003; Liang et al., 2010; Gulwani, 2011), though for more constrained spaces of programs. Continuing the program analogy, generating fictitious worlds is similar in spirit to fuzz testing for generating new test cases (Miller et al., 1990), but the goal there is coverage in a single program rather than identifying the correct (equivalence class of) programs. This connection can potentially improve the flow of ideas between the two fields.

Finally, the effectiveness of dynamic programming on denotations relies on having a manageable set of denotations. For more complex logical forms and larger knowledge graphs, there are many possible angles worth exploring: performing abstract interpretation to collapse denotations into equivalence classes (Cousot and Cousot, 1977), relaxing the notion of getting the correct denotation (Steinhardt and Liang, 2015), or working in a continuous space and relying on gradient descent (Guu et al., 2015; Neelakantan et al., 2016; Yin et al., 2016; Reed and de Freitas, 2016). This paper, by virtue of exact dynamic programming, sets the standard.

Acknowledgments. We gratefully acknowledge the support of the Google Natural Language Understanding Focused Program and the Defense Advanced Research Projects Agency (DARPA) Deep Exploration and Filtering of Text (DEFT) Program under Air Force Research Laboratory (AFRL) contract no. FA8750-13-2-0040. In addition, we would like to thank the anonymous reviewers for their helpful comments.
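To make the pruning procedure of Section 6.3 concrete, the following sketch filters equivalence classes against annotated denotations, including the relaxation that tolerates disagreement on at most one fictitious world. The function and data are illustrative stand-ins, not the released code.

```python
def prune_classes(classes, annotated, max_disagreements=0):
    """Drop equivalence classes whose denotation tuple disagrees with the
    annotated denotations on more than `max_disagreements` worlds.

    `classes` maps a denotation tuple (one entry per fictitious world) to
    the logical forms in that class; `annotated` is the tuple of
    crowdsourced denotations on the same worlds.
    """
    return {
        key: forms
        for key, forms in classes.items()
        if sum(d != a for d, a in zip(key, annotated)) <= max_disagreements
    }

# Three equivalence classes over three fictitious worlds.
classes = {("a", "b", "c"): ["z1", "z2"],   # matches the annotation exactly
           ("a", "b", "d"): ["z3"],         # wrong on one world only
           ("e", "f", "g"): ["z4"]}         # spurious on every world
annotated = ("a", "b", "c")
strict = prune_classes(classes, annotated)       # exact agreement required
relaxed = prune_classes(classes, annotated, 1)   # tolerate one mismatch
```

With noisy annotations, the strict filter can wipe out the entire set Z; the relaxed filter trades some pruning power for robustness, mirroring the 92.1% versus 78% trade-off reported above.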
Reproducibility. Code and experiments for this paper are available on the CodaLab platform at https://worksheets.codalab.org/worksheets/0x47cc64d9c8ba4a878807c7c35bb22a42/.

References

[Artzi and Zettlemoyer 2013] Y. Artzi and L. Zettlemoyer. 2013. UW SPF: The University of Washington semantic parsing framework. arXiv preprint arXiv:1311.3011.

[Berant and Liang 2014] J. Berant and P. Liang. 2014. Semantic parsing via paraphrasing. In Association for Computational Linguistics (ACL).

[Berant et al. 2013] J. Berant, A. Chou, R. Frostig, and P. Liang. 2013. Semantic parsing on Freebase from question-answer pairs. In Empirical Methods in Natural Language Processing (EMNLP).

[Clarke et al. 2010] J. Clarke, D. Goldwasser, M. Chang, and D. Roth. 2010. Driving semantic parsing from the world’s response. In Computational Natural Language Learning (CoNLL), pages 18–27.

[Cousot and Cousot 1977] P. Cousot and R. Cousot. 1977. Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints. In Principles of Programming Languages (POPL), pages 238–252.

[Gulwani 2011] S. Gulwani. 2011. Automating string processing in spreadsheets using input-output examples. ACM SIGPLAN Notices, 46(1):317–330.

[Guu et al. 2015] K. Guu, J. Miller, and P. Liang. 2015. Traversing knowledge graphs in vector space. In Empirical Methods in Natural Language Processing (EMNLP).

[Kate et al. 2005] R. J. Kate, Y. W. Wong, and R. J. Mooney. 2005. Learning to transform natural to formal languages. In Association for the Advancement of Artificial Intelligence (AAAI), pages 1062–1068.

[Kwiatkowski et al. 2010] T. Kwiatkowski, L. Zettlemoyer, S. Goldwater, and M. Steedman. 2010. Inducing probabilistic CCG grammars from logical form with higher-order unification. In Empirical Methods in Natural Language Processing (EMNLP), pages 1223–1233.

[Kwiatkowski et al. 2013] T. Kwiatkowski, E. Choi, Y. Artzi, and L. Zettlemoyer. 2013. Scaling semantic parsers with on-the-fly ontology matching. In Empirical Methods in Natural Language Processing (EMNLP).

[Lau et al. 2003] T. Lau, S. Wolfman, P. Domingos, and D. S. Weld. 2003. Programming by demonstration using version space algebra. Machine Learning, 53:111–156.

[Liang et al. 2010] P. Liang, M. I. Jordan, and D. Klein. 2010. Learning programs: A hierarchical Bayesian approach. In International Conference on Machine Learning (ICML), pages 639–646.

[Liang et al. 2011] P. Liang, M. I. Jordan, and D. Klein. 2011. Learning dependency-based compositional semantics. In Association for Computational Linguistics (ACL), pages 590–599.

[Liang 2013] P. Liang. 2013. Lambda dependency-based compositional semantics. arXiv.

[Miller et al. 1990] B. P. Miller, L. Fredriksen, and B. So. 1990. An empirical study of the reliability of UNIX utilities. Communications of the ACM, 33(12):32–44.

[Neelakantan et al. 2016] A. Neelakantan, Q. V. Le, and I. Sutskever. 2016. Neural programmer: Inducing latent programs with gradient descent. In International Conference on Learning Representations (ICLR).

[Pasupat and Liang 2015] P. Pasupat and P. Liang. 2015. Compositional semantic parsing on semi-structured tables. In Association for Computational Linguistics (ACL).

[Reed and de Freitas 2016] S. Reed and N. de Freitas. 2016. Neural programmer-interpreters. In International Conference on Learning Representations (ICLR).

[Settles 2010] B. Settles. 2010. Active learning literature survey. Technical report, University of Wisconsin, Madison.

[Steinhardt and Liang 2015] J. Steinhardt and P. Liang. 2015. Learning with relaxed supervision. In Advances in Neural Information Processing Systems (NIPS).

[Yin et al. 2016] P. Yin, Z. Lu, H. Li, and B. Kao. 2016. Neural enquirer: Learning to query tables with natural language. arXiv.

[Zelle and Mooney 1996] M. Zelle and R. J. Mooney. 1996. Learning to parse database queries using inductive logic programming. In Association for the Advancement of Artificial Intelligence (AAAI), pages 1050–1055.

[Zettlemoyer and Collins 2005] L. S. Zettlemoyer and M. Collins. 2005. Learning to map sentences to logical form: Structured classification with probabilistic categorial grammars. In Uncertainty in Artificial Intelligence (UAI), pages 658–666.

[Zettlemoyer and Collins 2007] L. S. Zettlemoyer and M. Collins. 2007. Online learning of relaxed CCG grammars for parsing to logical form. In Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP/CoNLL), pages 678–687.