Abstract

A core problem in learning semantic parsers from denotations is picking out consistent logical forms—those that yield the correct denotation—from a combinatorially large space. To control the search space, previous work relied on a restricted set of rules, which limits expressivity. In this paper, we consider a much more expressive class of logical forms, and show how to use dynamic programming to efficiently represent the complete set of consistent logical forms. Expressivity also introduces many more spurious logical forms, which are consistent with the correct denotation but do not represent the meaning of the utterance. To address this, we generate fictitious worlds and use crowdsourced denotations on these worlds to filter out spurious logical forms. On the WikiTableQuestions dataset, we increase the coverage of answerable questions from 53.5% to 76%, and the additional crowdsourced supervision lets us rule out 92.1% of spurious logical forms.

1 Introduction

Consider the task of learning to answer complex natural language questions (e.g., “Where did the last 1st place finish occur?”) using only question–answer pairs as supervision (Clarke et al., 2010; Liang et al., 2011; Berant et al., 2013; Artzi and Zettlemoyer, 2013). Semantic parsers map the question into a logical form (e.g., R[Venue].argmax(Position.1st, Index)) that can be executed on a knowledge source to obtain the answer (denotation). Logical forms are very expressive since they can be recursively composed, but this very expressivity makes it more difficult to search over the space of logical forms. Previous work controlled the search space with a restricted set of rules, but this is limiting. For instance, for the system in Pasupat and Liang (2015), in only 53.5% of the examples was the correct logical form even in the set of generated logical forms.

The goal of this paper is to solve two main challenges that prevent us from generating more expressive logical forms. The first challenge is computational: the number of logical forms grows exponentially as their size increases. Directly enumerating all logical forms becomes infeasible, and pruning techniques such as beam search can inadvertently prune out correct logical forms. The second challenge is the large increase in spurious logical forms—those that do not reflect the semantics of the question but coincidentally execute to the correct denotation. For example, while logical forms z1, …, z5 in Figure 1 are all consistent (they execute to the correct answer y), the logical forms z4 and z5 are spurious and would give incorrect answers if the table were to change.

We address these two challenges by solving two interconnected tasks. The first task, which addresses the computational challenge, is to enumerate the set Z of all consistent logical forms given a question x, a knowledge source w (“world”), and the target denotation y (Section 4). Observing that the space of possible denotations grows much more slowly than the space of logical forms, we perform dynamic programming on denotations (DPD) to make search feasible. Our method is guaranteed to find all consistent logical forms up to some bounded size.

Given the set Z of consistent logical forms, the second task is to filter out spurious logical forms from Z (Section 5). Using the property that spurious logical forms ultimately give a wrong answer when the data in the world w changes, we create fictitious worlds and use crowdsourced denotations on these worlds to filter out the spurious logical forms.
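As a concrete sketch of the first task, the following Python snippet performs dynamic programming on denotations in a toy arithmetic domain. The domain, the `constants`/`ops` inputs, and the size accounting are simplifications of our own (the paper's system operates over lambda DCS logical forms on tables); the part that carries over is indexing the chart by (size, denotation), so that all expressions sharing a denotation collapse into one cell and search never touches the exponentially many individual expressions.

```python
from collections import defaultdict

def dpd_enumerate(constants, ops, target, max_size):
    """Sketch of dynamic programming on denotations (DPD).

    chart[size][denotation] stores expressions grouped by what they
    evaluate to.  Many distinct expressions share a denotation, so the
    chart grows far more slowly than the raw space of expressions.
    """
    chart = defaultdict(lambda: defaultdict(list))
    for c in constants:                     # size-1 expressions
        chart[1][c].append(str(c))
    for size in range(2, max_size + 1):
        for lsize in range(1, size):        # split the size budget
            rsize = size - lsize
            for d1, exprs1 in chart[lsize].items():
                for d2, exprs2 in chart[rsize].items():
                    for name, fn in ops.items():
                        # combine denotations directly; one representative
                        # expression per child cell is enough
                        chart[size][fn(d1, d2)].append(
                            f"({exprs1[0]} {name} {exprs2[0]})")
    # chart[s][target] holds consistent expressions of each size s
    return {s: chart[s].get(target, []) for s in range(1, max_size + 1)}

ops = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}
print(dpd_enumerate([1, 2, 3], ops, target=6, max_size=3))
```

Keeping one representative per combination of child cells makes the work proportional to the number of distinct denotations rather than the number of expressions; the remaining members of each cell can be expanded afterwards, so the complete set of consistent forms up to the size bound is still compactly represented (compare Figure 7).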
Annotation. To pin down the correct equivalence class, we acquire the correct answers to the question x on some subset W′ = (w′_1, …, w′_ℓ) ⊆ W of ℓ fictitious worlds, as it is impractical to obtain annotations on all fictitious worlds in W. We compile the equivalence classes that agree with the annotations into a set Z_c of correct logical forms.

We want to choose a subset W′ that gives us as much information about the correct equivalence class as possible. This is analogous to standard practice in active learning (Settles, 2010).³ Let Q be the set of all equivalence classes q, and let ⟦q⟧_{W′} be the denotation tuple computed by executing an arbitrary z ∈ q on W′. The subset W′ divides Q into partitions F_t = {q ∈ Q : ⟦q⟧_{W′} = t} based on the denotation tuples t (e.g., from Figure 5, if W′ contains just w_2, then q_2 and q_3 will be in the same partition F_{(China)}). The annotation t*, which is also a denotation tuple, marks one of these partitions F_{t*} as correct. Thus, to prune out many spurious equivalence classes, the partitions should be as numerous and as small as possible.

More formally, we choose a subset W′ that maximizes the expected information gain (or, equivalently, the reduction in entropy) about the correct equivalence class given the annotation. With random variables Q ∈ Q representing the correct equivalence class and T*_{W′} representing the annotation on the worlds W′, we seek arg min_{W′} H(Q | T*_{W′}). Assuming a uniform prior on Q (p(q) = 1/|Q|) and accurate annotation (p(t* | q) = I[q ∈ F_{t*}]):

$$H(Q \mid T^{*}_{W'}) \;=\; \sum_{q,t} p(q,t)\,\log\frac{p(t)}{p(q,t)} \;=\; \frac{1}{|Q|}\sum_{t} |F_t|\,\log|F_t|. \tag{$*$}$$

³ The difference is that we are obtaining partial information about an individual example rather than partial information about the parameters.
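As a concrete sketch of how (∗) can be used, the following snippet evaluates the conditional entropy of a candidate subset W′ and picks the most informative subset by brute force. The `classes` data, the function names, and the exhaustive search over subsets are our own illustration, not the paper's implementation; in the actual pipeline, Q would be the equivalence classes produced by DPD and the denotations would come from executing them on the fictitious worlds.

```python
import math
from itertools import combinations

def conditional_entropy(classes, worlds):
    """Evaluate H(Q | T*_W') = (1/|Q|) * sum_t |F_t| log |F_t| from (*).

    classes: maps each equivalence class q to {world: denotation}.
    worlds:  the candidate subset W'.  Classes whose denotation tuples
    agree on W' fall into the same partition F_t.
    """
    partitions = {}
    for q, denotations in classes.items():
        t = tuple(denotations[w] for w in worlds)   # denotation tuple on W'
        partitions.setdefault(t, []).append(q)
    n = len(classes)
    return sum(len(f) * math.log(len(f)) for f in partitions.values()) / n

def choose_worlds(classes, world_pool, k):
    """Pick the k worlds whose annotation is expected to be most
    informative, i.e. that minimize the conditional entropy (brute force)."""
    return min(combinations(world_pool, k),
               key=lambda ws: conditional_entropy(classes, ws))

# Hypothetical equivalence classes and their denotations on three
# fictitious worlds, loosely in the style of Figure 5.
classes = {
    "q1": {"w1": "China", "w2": "China",  "w3": "2001"},
    "q2": {"w1": "Japan", "w2": "China",  "w3": "2001"},
    "q3": {"w1": "Japan", "w2": "China",  "w3": "2004"},
    "q4": {"w1": "Japan", "w2": "Brazil", "w3": "2005"},
}
print(choose_worlds(classes, ["w1", "w2", "w3"], k=2))  # ('w1', 'w3')
```

In this toy data, {w1, w3} separates all four classes into singleton partitions and drives the entropy to zero, whereas any subset that lumps classes together leaves residual uncertainty. Exhaustive search is exponential in the pool size, so a larger pool of fictitious worlds would call for greedy or sampled selection instead.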
6 Experiments

For the experiments, we use the training portion of the WikiTableQuestions dataset (Pasupat and Liang, 2015), which consists of 14,152 questions on 1,679 Wikipedia tables gathered by crowd workers. Answering these complex questions requires different types of operations. The same operation can be phrased in different ways (e.g., “best”, “top ranking”, or “lowest ranking number”), and the interpretation of some phrases depends on the context (e.g., “number of” could be a table lookup or a count operation). The lexical content of the questions is also quite diverse: even excluding numbers and symbols, the 14,152 training examples contain 9,671 unique words, only 10% of which appear more than 10 times.

We attempted to manually annotate the first 300 examples with lambda DCS logical forms. We successfully constructed correct logical forms for 84% of these examples, which is a good number considering that the questions were created by humans who could use the table however they wanted. The remaining 16% reflect limitations in our setup—for example, non-canonical table layouts, answers appearing in running text or images, and common-sense reasoning (e.g., knowing that “Quarterfinal” is better than “Round of 16”).

6.1 Generality of deduction rules

We compare our set of deduction rules with the one given in Pasupat and Liang (2015) (henceforth PL15). PL15 reported generating the annotated logical form in 53.5% of the first 200 examples. With our more general deduction rules, we use DPD to verify that the rules are able to generate the annotated logical form in 76% of the first 300 examples, within the logical form size limit s_max of 7. This is 90.5% of the examples that were successfully annotated. Figure 6 shows some examples of logical forms we cover that PL15 could not. Since DPD is guaranteed to find all consistent logical forms, we can be sure that the logical forms not covered are due to limitations of the deduction rules.

“which opponent has the most wins”
z = argmax(…, R[λx.count(Opponent.x ⊓ Result.Lost)])

“how long did ian armstrong serve?”
z = R[Num2].R[Term].Member.IanArmstrong − R[Number].R[Term].Member.IanArmstrong

“which players came in a place before lukas bauer?”
z = R[Name].Index.<.R[Index].Name.LukasBauer

“which players played the same position as ardo kreek?”
z = R[Player].Position.R[Position].Player.Ardo ⊓ ≠.Ardo

Figure 6: Several example logical forms our system covers that PL15 could not.

[Figure 7 plot: count (y-axis, 0–1.0K) versus logical form size (x-axis, 0–7); dashed curve: logical forms; solid curve: denotations.]
Figure 7: The median of the number of logical forms (dashed) and denotations (solid) as the formula size increases. The space of logical forms grows much faster than the space of denotations.
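To make composed relational forms like those in Figure 6 concrete, here is a minimal set-based executor run on a hypothetical three-row table, evaluating the “lukas bauer” example above. The reverse/join semantics is the standard lambda DCS reading of R[·] and composition; the table contents and helper names are our own illustration, not the paper's implementation.

```python
# A binary relation is a set of (x, y) pairs; R[r] reverses a relation,
# and r.S joins a relation with a set of values.

def reverse(rel):
    """R[r]: swap the two coordinates of every pair."""
    return {(y, x) for (x, y) in rel}

def join(rel, values):
    """r.S = {x : (x, y) in rel for some y in S}."""
    return {x for (x, y) in rel if y in values}

# Hypothetical table: three rows with Name and Index columns.
name = {("r1", "Marek"), ("r2", "LukasBauer"), ("r3", "Ivan")}      # row -> Name
index = {("r1", 1), ("r2", 2), ("r3", 3)}                           # row -> Index
less = {(i, j) for i in range(1, 4) for j in range(1, 4) if i < j}  # "<" pairs

# z = R[Name].Index.<.R[Index].Name.LukasBauer, evaluated inside out:
rows_lukas = join(name, {"LukasBauer"})        # Name.LukasBauer -> {r2}
idx_lukas = join(reverse(index), rows_lukas)   # R[Index]. ...   -> {2}
earlier = join(less, idx_lukas)                # <. ...          -> {1}
rows_before = join(index, earlier)             # Index. ...      -> {r1}
print(join(reverse(name), rows_before))        # R[Name]. ...    -> {'Marek'}
```

Running consistent candidate forms through exactly this kind of executor on perturbed copies of the table (the fictitious worlds of Section 5) is what exposes spurious forms: a spurious logical form will disagree with the annotated answer on some perturbed table even though it matches on the original one.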
References

[Berant et al.2013] J. Berant, A. Chou, R. Frostig, and P. Liang. 2013. Semantic parsing on Freebase from question-answer pairs. In Empirical Methods in Natural Language Processing (EMNLP).

[Clarke et al.2010] J. Clarke, D. Goldwasser, M. Chang, and D. Roth. 2010. Driving semantic parsing from the world's response. In Computational Natural Language Learning (CoNLL), pages 18–27.

[Cousot and Cousot1977] P. Cousot and R. Cousot. 1977. Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints. In Principles of Programming Languages (POPL), pages 238–252.

[Gulwani2011] S. Gulwani. 2011. Automating string processing in spreadsheets using input-output examples. ACM SIGPLAN Notices, 46(1):317–330.

[Guu et al.2015] K. Guu, J. Miller, and P. Liang. 2015. Traversing knowledge graphs in vector space. In Empirical Methods in Natural Language Processing (EMNLP).

[Kate et al.2005] R. J. Kate, Y. W. Wong, and R. J. Mooney. 2005. Learning to transform natural to formal languages. In Association for the Advancement of Artificial Intelligence (AAAI), pages 1062–1068.

[Kwiatkowski et al.2010] T. Kwiatkowski, L. Zettlemoyer, S. Goldwater, and M. Steedman. 2010. Inducing probabilistic CCG grammars from logical form with higher-order unification. In Empirical Methods in Natural Language Processing (EMNLP), pages 1223–1233.

[Kwiatkowski et al.2013] T. Kwiatkowski, E. Choi, Y. Artzi, and L. Zettlemoyer. 2013. Scaling semantic parsers with on-the-fly ontology matching. In Empirical Methods in Natural Language Processing (EMNLP).

[Lau et al.2003] T. Lau, S. Wolfman, P. Domingos, and D. S. Weld. 2003. Programming by demonstration using version space algebra. Machine Learning, 53:111–156.

[Neelakantan et al.2016] A. Neelakantan, Q. V. Le, and I. Sutskever. 2016. Neural programmer: Inducing latent programs with gradient descent. In International Conference on Learning Representations (ICLR).

[Pasupat and Liang2015] P. Pasupat and P. Liang. 2015. Compositional semantic parsing on semi-structured tables. In Association for Computational Linguistics (ACL).

[Reed and de Freitas2016] S. Reed and N. de Freitas. 2016. Neural programmer-interpreters. In International Conference on Learning Representations (ICLR).

[Settles2010] B. Settles. 2010. Active learning literature survey. Technical report, University of Wisconsin, Madison.

[Steinhardt and Liang2015] J. Steinhardt and P. Liang. 2015. Learning with relaxed supervision. In Advances in Neural Information Processing Systems (NIPS).

[Yin et al.2016] P. Yin, Z. Lu, H. Li, and B. Kao. 2016. Neural enquirer: Learning to query tables with natural language. arXiv.

[Zelle and Mooney1996] M. Zelle and R. J. Mooney. 1996. Learning to parse database queries using inductive logic programming. In Association for the Advancement of Artificial Intelligence (AAAI), pages 1050–1055.

[Zettlemoyer and Collins2005] L. S. Zettlemoyer and M. Collins. 2005. Learning to map sentences to logical form: Structured classification with probabilistic categorial grammars. In Uncertainty in Artificial Intelligence (UAI), pages 658–666.

[Zettlemoyer and Collins2007] L. S. Zettlemoyer and M. Collins. 2007. Online learning of relaxed CCG grammars for parsing to logical form. In Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP/CoNLL), pages 678–687.