Percy Liang
Foundations of ML Bootcamp
January 26, 2017
The Imitation Game (1950)
end-to-end interactivity
SHRDLU (1971)
Person: Pick up a big red block.
Computer: OK.
Person: Grasp the pyramid.
Computer: I don't understand which pyramid you mean.
Person (changing their mind): Find a block which is taller than the one you are holding and put it into the box.
Computer: By "it", I assume you mean the block which is taller than the one I am holding.
Computer: OK.
Person: What does the box contain?
Computer: The blue pyramid and the blue block.
Person: What is the pyramid supported by?
Computer: The box.
end-to-end
The Complexity Barrier
"A number of people have suggested to me that large programs like the SHRDLU program for understanding natural language represent a kind of dead end in AI programming. Complex interactions between its components give the program much of its power, but at the same time they present a formidable obstacle to understanding and extending it. In order to grasp any part, it is necessary to understand how it fits with other parts; [this] presents a dense mass, with no easy footholds. Even having written the program, I find it near the limit of what I can keep in mind at once." — Terry Winograd
1990s: statistical revolution
Compute, Data
[McDonald et al., 2005; de Marneffe et al., 2008]
[Schütze, 1993; Bengio et al., 2003; Mikolov, 2013; etc.]
Breadth
Depth
Relevance for ML
Opportunity for transfer of ideas between ML and NLP
Outline
Properties of language
Distributional semantics
Frame semantics
Model-theoretic semantics
Interactive learning
Reflections
Levels of linguistic analysis
Analogy with programming languages
Syntax: no compiler errors
Semantics: no implementation bugs
Pragmatics: implemented the right algorithm
Different syntax, same semantics (5):
2 + 3 = 3 + 2
Same syntax, different semantics (1 and 1.5):
3 / 2 (Python 2.7) ≠ 3 / 2 (Python 3)
Good semantics, bad pragmatics:
correct implementation of deep neural network for estimating coin flip prob.
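A two-line illustration of the same-syntax, different-semantics point above (this snippet assumes a Python 3 interpreter; the Python 2.7 behavior is noted in comments):

```python
# Same syntax, different semantics across language versions.
print(3 / 2)   # Python 3: true division -> 1.5 (Python 2.7 prints 1)
print(3 // 2)  # floor division -> 1 in both versions
```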
What's a word?
light
Multi-word expressions: meaning unit beyond a word
light bulb
Morphology: meaning unit within a word
light, lighten, lightening, relight
Polysemy: one word has multiple meanings (word senses)
The light was filtered through a soft glass window.
He stepped into the light.
This lamp lights up the room.
The load is not light.
Synonymy
Words: confusing, ... [word pairs on slide]
Sentences: [sentence pairs on slide]
Other lexical relations
Hyponymy (is-a): a cat is a mammal
Meronymy (has-a): [example on slide]
I am speaking.
Compositional semantics
Two ideas: model theory and compositionality
Block 2 is blue.
[blocks 1 2 3 4]
Quantifiers
Universal and existential quantification:
Every block is blue.  [blocks 1 2 3 4]
Some block is blue.  [blocks 1 2 3 4]
Quantifier scope ambiguity:
Every non-blue block is next to some blue block.  [blocks 1 2 3 4]
Every non-blue block is next to some blue block.  [blocks 1 2 3]
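A minimal sketch of model-theoretic evaluation for the sentences above: the toy world, the color assignment, and the `next_to` predicate are illustrative assumptions, but the two readings of the scope-ambiguous sentence come out differently, as on the slide.

```python
# Toy world: four blocks in a row, identified by position 1..4.
world = {1: "red", 2: "blue", 3: "blue", 4: "green"}
blocks = list(world)

def is_blue(b):
    return world[b] == "blue"

def next_to(a, b):
    return abs(a - b) == 1

# "Every block is blue."  (universal quantification)
print(all(is_blue(b) for b in blocks))   # False

# "Some block is blue."   (existential quantification)
print(any(is_blue(b) for b in blocks))   # True

# "Every non-blue block is next to some blue block."
# Surface scope: each non-blue block has its own blue neighbor.
print(all(any(is_blue(b) and next_to(a, b) for b in blocks)
          for a in blocks if not is_blue(a)))    # True

# Inverse scope: one single blue block is next to every non-blue block.
print(any(is_blue(b) and all(next_to(a, b) for a in blocks if not is_blue(a))
          for b in blocks))                      # False
```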
Multiple possible worlds
Modality:
Block 2 must be blue. Block 1 could be red.
[three possible worlds over blocks 1 2]
Beliefs: [example on slide]
Pragmatics
Conversational implicature: new material suggested (not logically implied) by a sentence
(e.g., "I ate some of the cookies" suggests, but does not logically imply, that I did not eat all of them)
Pragmatics
Semantics: what does it mean literally?
Pragmatics: [contrast shown on slide]
Vagueness, ambiguity, uncertainty
Vagueness: does not specify full information
Summary so far
Analyses: syntax, semantics, pragmatics
Outline
Properties of language
Distributional semantics
Frame semantics
Model-theoretic semantics
Interactive learning
Reflections
Distributional semantics: warmup
The new design has lines.
Distributional semantics
The new design has lines.
Roots in linguistics:
Distributional hypothesis: semantically similar words occur in similar contexts [Harris, 1954]
"You shall know a word by the company it keeps." [Firth, 1957]
Contrast: Chomsky's generative grammar (lots of hidden prior structure, no data)
Upshot: data-driven!
General recipe
1. Form a word-context matrix of counts (data): N, with one row per word w and one column per context c
2. Perform dimensionality reduction (generalize)
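A minimal sketch of step 1 on a toy corpus; the corpus, the ±1-word window, and the dense list-of-lists layout are illustrative assumptions:

```python
from collections import Counter

corpus = [
    "cats have tails".split(),
    "dogs have tails".split(),
    "cats chase mice".split(),
]

window = 1  # contexts = words within +/-1 position (an assumption)
counts = Counter()  # (word, context) -> co-occurrence count
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if j != i:
                counts[(w, sent[j])] += 1

words = sorted({w for w, _ in counts})
contexts = sorted({c for _, c in counts})
# N[w][c] is the word-context count matrix from the recipe above.
N = [[counts[(w, c)] for c in contexts] for w in words]
for w, row in zip(words, N):
    print(f"{w:>6}", row)
```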
[Deerwester/Dumais/Furnas/Landauer/Harshman, 1990]

       Doc1  Doc2
cats     1     0
dogs     0     1
have     1     1
tails    1     1
[Deerwester/Dumais/Furnas/Landauer/Harshman, 1990]
Factorize the word-document matrix N (rows: words w; columns: documents c) via truncated SVD: N ≈ U S Vᵀ
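A sketch of this factorization on the toy matrix from the previous slide, using numpy; keeping k = 2 latent dimensions here is an illustrative choice (with a larger matrix one would truncate well below full rank):

```python
import numpy as np

# Rows: cats, dogs, have, tails; columns: Doc1, Doc2.
N = np.array([[1, 0],
              [0, 1],
              [1, 1],
              [1, 1]], dtype=float)

U, S, Vt = np.linalg.svd(N, full_matrices=False)
k = 2  # number of latent dimensions to keep
word_vectors = U[:, :k] * S[:k]  # each row is a word embedding
for word, vec in zip(["cats", "dogs", "have", "tails"], word_vectors):
    print(f"{word:>6}", np.round(vec, 3))
```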
[Mikolov/Sutskever/Chen/Corrado/Dean, 2013 (word2vec)]

p(g = 1 | w, c) = (1 + exp(−w · c))⁻¹
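A minimal sketch of this model: the probability above plus one stochastic gradient step of skip-gram with negative sampling. The toy vocabulary, embedding dimension, and learning rate are assumptions; a real implementation trains over a large corpus with many negative samples.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5  # embedding dimension (illustrative)
vocab = ["cats", "dogs", "tails"]
W = {w: rng.normal(scale=0.1, size=d) for w in vocab}  # word vectors
C = {c: rng.normal(scale=0.1, size=d) for c in vocab}  # context vectors

def p_pair(w, c):
    # p(g = 1 | w, c) = 1 / (1 + exp(-w . c)): probability the pair is real
    return 1.0 / (1.0 + np.exp(-(W[w] @ C[c])))

def sgd_step(w, c, g, lr=0.1):
    # g = 1 for an observed pair, g = 0 for a sampled negative pair;
    # (p - g) is the gradient of the logistic loss w.r.t. the dot product.
    err = p_pair(w, c) - g
    W[w], C[c] = W[w] - lr * err * C[c], C[c] - lr * err * W[w]

sgd_step("cats", "tails", g=1)  # positive: observed co-occurrence
sgd_step("cats", "dogs", g=0)   # negative: randomly sampled pair
print(round(p_pair("cats", "tails"), 3))
```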
Other models
Multinomial models:
HMM word clustering [Brown et al., 1992]
Latent Dirichlet Allocation [Blei et al., 2003]
Neural network models:
Multi-tasking neural network [Weston/Collobert, 2008]
Recurrent/recursive models (can embed phrases too):
Neural language models [Bengio et al., 2003]
Neural machine translation [Sutskever/Vinyals/Le, 2014; Cho/Merrienboer/Bahdanau/Bengio, 2014]
Recursive neural networks [Socher/Lin/Ng/Manning, 2011]
2D visualization of word vectors
[figure: 2D projection of word vectors]
Nearest neighbors
cherish (words): adore, love, admire, embrace, rejoice
cherish (contexts): cherish, both, love, pride, thy
→ quasi-synonyms
tiger (words): leopard, dhole, warthog, rhinoceros, lion
tiger (contexts): tiger, leopard, panthera, woods, puma
→ co-hyponyms
good (words): [entries on slide]
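A sketch of how such nearest-neighbor lists are computed: cosine similarity between word vectors. The toy vectors below are assumptions standing in for a trained model's embeddings.

```python
import numpy as np

# Toy embeddings standing in for trained word vectors.
vecs = {
    "cherish": np.array([0.9, 0.1, 0.0]),
    "adore":   np.array([0.8, 0.2, 0.1]),
    "tiger":   np.array([0.1, 0.9, 0.3]),
    "leopard": np.array([0.2, 0.8, 0.4]),
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def nearest(word, k=3):
    # Rank all other words by similarity to the query word.
    others = [(cosine(vecs[word], v), w) for w, v in vecs.items() if w != word]
    return sorted(others, reverse=True)[:k]

print(nearest("cherish"))  # adore ranks above tiger/leopard
```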
Effect of context
Suppose "Barack" and "Obama" always appear together (a collocation).
Summary so far
Premise: semantics = context of word/phrase
Recipe: form word-context matrix N (rows: words w; columns: contexts c) + dimensionality reduction
Pros:
Simple models, leverage tons of raw text
Context captures nuanced information about usage
Word vectors useful in downstream tasks
Food for thought
What contexts?
Examples to ponder: [examples on slide]
Outline
Properties of language
Distributional semantics
Frame semantics
Model-theoretic semantics
Interactive learning
Reflections
Word meaning revisited
sold
Distributional semantics: all the contexts in which sold occurs
...was sold by... ...sold me that piece of...
Can find similar words/contexts and generalize (dimensionality reduction), but monolithic (no internal structure on word vectors)
Frame semantics: meaning given by a frame, a stereotypical situation
Commercial transaction
SELLER: ?
BUYER: ?
GOODS: ?
PRICE: ?
An example
Cynthia sold the bike for $200.
Commercial transaction
SELLER: Cynthia
GOODS: the bike
PRICE: $200
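One way to view a frame computationally is as a record with optional role slots; this minimal sketch (field names follow the slide, the class itself is an illustrative assumption) fills the slots extracted from the sentence above, leaving BUYER unfilled because the sentence does not mention one:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CommercialTransaction:
    seller: Optional[str] = None  # SELLER
    buyer: Optional[str] = None   # BUYER
    goods: Optional[str] = None   # GOODS
    price: Optional[str] = None   # PRICE

# "Cynthia sold the bike for $200."  (BUYER stays None: not mentioned)
frame = CommercialTransaction(seller="Cynthia", goods="the bike", price="$200")
print(frame)
```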
[Fillmore, 1977; Langacker, 1987]
Historical developments
Linguistics:
Case grammar [Fillmore, 1968]: introduced idea of deep semantic roles (agents, themes, patients) which are tied to surface syntax (subjects, objects)
AI / cognitive science:
Frames [Minsky, 1975]: "a data-structure for representing a stereotyped situation, like... a child's birthday party"
Scripts [Schank & Abelson, 1977]: represent procedural knowledge (going to a restaurant)
Frames [Fillmore, 1977]: coherent individuatable perception, memory, experience, action, or object
NLP:
FrameNet (1998) and PropBank (2002)
From syntax to semantics
Commercial transaction
SELLER: Cynthia
BUYER: Bob
GOODS: the bike
PRICE: $200
Semantic role labeling
Task: [annotated example on slide]
Subtasks: [listed on slide]
A brief history
First system (on FrameNet) [Gildea/Jurafsky, 2002]
CoNLL shared tasks [2004, 2005]
Use ILP to enforce constraints on arguments [Punyakanok/Roth/Yih, 2008]
No feature engineering or parse trees [Collobert/Weston, 2008]
Semi-supervised frame identification [Das/Smith, 2011]
Embeddings for frame identification [Hermann/Das/Weston/Ganchev, 2014]
Dynamic programming for some argument constraints [Täckström/Ganchev/Das, 2015]
Abstract Meaning Representation (AMR) [Banarescu et al., 2013]
Coreference resolution: [example on slide]
[Flanigan/Thomson/Carbonell/Dyer/Smith, 2014]
Summary so far
Frames: stereotypical situations that provide rich structure for understanding
Food for thought
Both distributional semantics (DS) and frame semantics (FS) involve compression/abstraction
Examples to ponder: [examples on slide]
Outline
Properties of language
Distributional semantics
Frame semantics
Model-theoretic semantics
Interactive learning
Reflections
Types of semantics
Every non-blue block is next to some blue block.
[blocks 1 2 3 4] and [blocks 1 2 3 4]
Executable semantic parsing
[database]
What is the largest city in Europe by population?
semantic parsing
argmax(Cities ∩ ContainedBy(Europe), Population)
execute
Istanbul
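A minimal sketch of the execute step against a toy database; the database contents, the `ContainedBy` helper, and the population figures are illustrative assumptions, not part of any real system:

```python
# Toy database of cities with containment and population facts.
cities = {"Istanbul", "Paris", "Tokyo"}
contained_by = {"Istanbul": "Europe", "Paris": "Europe", "Tokyo": "Asia"}
population = {"Istanbul": 15_000_000, "Paris": 11_000_000, "Tokyo": 37_000_000}

def ContainedBy(region):
    return {x for x, r in contained_by.items() if r == region}

def argmax(entities, score):
    return max(entities, key=lambda x: score[x])

# argmax(Cities ∩ ContainedBy(Europe), Population)
print(argmax(cities & ContainedBy("Europe"), population))  # Istanbul
```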
Executable semantic parsing
[calendar]
semantic parsing
execute
[reminder added]
Executable semantic parsing
[context]
[sentence]
semantic parsing
[program]
execute
[behavior]
A brief history of semantic parsing
GeoQuery [Zelle & Mooney 1996]
Compositional semantics
Richard Montague
cities in Europe
cities ⇒ Cities
in ⇒ ContainedBy
Europe ⇒ Europe
in Europe ⇒ ContainedBy(Europe)
cities in Europe ⇒ Cities ∩ ContainedBy(Europe)
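A minimal sketch of compositionality: each word denotes something, and denotations combine bottom-up exactly as in the derivation above. The lexicon entries and the toy facts are illustrative assumptions.

```python
# Denotations are sets of entities and functions over them.
cities = {"Istanbul", "Paris", "Tokyo"}
contained_by = {"Istanbul": "Europe", "Paris": "Europe", "Tokyo": "Asia"}

lexicon = {
    "cities": cities,                                                  # a set
    "in": lambda region: {x for x, r in contained_by.items() if r == region},
    "Europe": "Europe",                                                # an entity
}

# in Europe -> ContainedBy(Europe)
in_europe = lexicon["in"](lexicon["Europe"])
# cities in Europe -> Cities ∩ ContainedBy(Europe)
print(lexicon["cities"] & in_europe)  # {'Istanbul', 'Paris'}
```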
Language variation
cities in Europe
European cities
cities that are in Europe
cities located in Europe
cities on the European continent
all ⇒ Cities ∩ ContainedBy(Europe)
no!
Deep learning
Object recognition: Krizhevsky/Sutskever/Hinton (2012)
[image] ⇒ car
accuracy, simplicity
[Jia & Liang, 2016]
[bar chart of parsing accuracy: WM07, ZC07, KZGS11, RNN, RNN+recomb; values shown include 86.1, 86.6, 88.9, 89.3]
state-of-the-art, simpler
Summary so far
[context]
What is the largest city in Europe by population?
semantic parsing
execute
Istanbul
[Zelle & Mooney, 1996; Zettlemoyer & Collins, 2005; Clarke et al., 2010; Liang et al., 2011]
Training intuition
Where did Mozart tupress?
PlaceOfBirth(WolfgangMozart) ⇒ Salzburg
PlaceOfDeath(WolfgangMozart) ⇒ Vienna
PlaceOfMarriage(WolfgangMozart) ⇒ Vienna
Vienna
Given only the answer (Vienna), upweight the logical forms whose denotation matches it.
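A minimal sketch of that learning signal: keep the candidate logical forms whose denotation matches the observed answer (the candidate-to-denotation table below just restates the slide; a real system would execute each candidate against a knowledge base):

```python
# Candidate logical forms paired with their denotations (from the slide).
candidates = {
    "PlaceOfBirth(WolfgangMozart)": "Salzburg",
    "PlaceOfDeath(WolfgangMozart)": "Vienna",
    "PlaceOfMarriage(WolfgangMozart)": "Vienna",
}
answer = "Vienna"

# Candidates consistent with the observed denotation get credit; the two
# survivors stay ambiguous until more training examples arrive.
consistent = [lf for lf, denotation in candidates.items() if denotation == answer]
print(consistent)  # PlaceOfDeath(...) and PlaceOfMarriage(...)
```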
Searching...
Greece held its last Summer Olympics in which year? ⇒ 2004
Candidate logical forms considered along the way:
R[Index].Country.Greece
R[Nations].Country.Greece
argmax(Country.Greece, Nations)
argmax(Country.Greece, Index)
R[Date].R[Year].argmax(Country.Greece, Index)

Year  City       Country  Nations
1896  Athens     Greece   14
1900  Paris      France   24
1904  St. Louis  USA      12
...   ...        ...      ...
2004  Athens     Greece   201
2008  Beijing    China    204
2012  London     UK       204
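A minimal sketch of executing the winning logical form over the table, with rows as dictionaries and Index as row position; the date-normalization step (R[Date]) is skipped, and the row contents are abbreviated, so this is an illustration rather than the paper's executor:

```python
rows = [
    {"Index": 1,  "Year": 1896, "City": "Athens",  "Country": "Greece", "Nations": 14},
    {"Index": 2,  "Year": 1900, "City": "Paris",   "Country": "France", "Nations": 24},
    {"Index": 28, "Year": 2004, "City": "Athens",  "Country": "Greece", "Nations": 201},
    {"Index": 29, "Year": 2008, "City": "Beijing", "Country": "China",  "Nations": 204},
]

# Country.Greece: rows whose Country column is Greece
greece = [r for r in rows if r["Country"] == "Greece"]
# argmax(Country.Greece, Index): the last such row
last = max(greece, key=lambda r: r["Index"])
# R[Year].(...): read off the Year column
print(last["Year"])  # 2004
```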
[Pasupat & Liang 2015]
WikiTableQuestions
Language & world
language ↔ world
Summary so far
[context]
What is the largest city in Europe by population?
semantic parsing
execute
Istanbul
Food for thought
Learning from denotations is hard; implicitly moving from easy to harder examples; we don't have a good formalism yet
Outline
Properties of language
Distributional semantics
Frame semantics
Model-theoretic semantics
Interactive learning
Reflections
Language game
Wittgenstein (1953):
Language derives its meaning from use.
[Wang et al., 2016]
SHRDLURN
remove red
candidate programs:
add(hascolor(red))
add(hascolor(brown))
remove(hascolor(red))
remove(hascolor(brown))
SHRDLURN
shrdlurn.sidaw.xyz/acl16
Experiments
100 players from Amazon Mechanical Turk
Results: top players (rank 1-20)
[figure: example utterances from top players; scores 3.01, 2.78, 2.72]
Results: average players (rank 21-50)
[figure: example utterances; scores 9.17, 8.37]
Results: interesting players
(Polish)
usuń brązowe klocki ("remove the brown blocks")
postaw pomarańczowy klocek na pierwszym klocku ("put an orange block on the first block")
postaw czerwone klocki na pomarańczowych ("put the red blocks on the orange ones")
usuń pomarańczowe klocki w górnym rzędzie ("remove the orange blocks in the top row")
(Polish notation)
rm scat + 1 c; +1c; rm sh; + 1 2 4 sh; +1c; -4o; rm 1 r; +13o; full fill c; rm o; full fill sh; -13; full fill sh; rm sh; rm r; +23r; rm o; + 3 sh; + 2 3 sh
Pragmatics: motivation
remove red ⇒ remove(hascolor(red))
remove cyan ⇒ candidates:
remove(hascolor(red))
remove(hascolor(cyan))
remove(hascolor(brown))
remove(hascolor(orange))
[Golland et al. 2010; Frank/Goodman, 2012]
Pragmatics: model
Paul Grice
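A minimal numpy sketch in the style of the cited rational speech acts model [Frank/Goodman, 2012]: a literal listener, a speaker that picks informative utterances, and a pragmatic listener. The utterances, candidate actions, compatibility matrix, and uniform priors are illustrative assumptions.

```python
import numpy as np

# Rows: utterances; columns: intended actions. 1 = literally compatible.
utterances = ["remove red", "remove cyan"]
actions = ["remove(hascolor(red))", "remove(hascolor(cyan))"]
L = np.array([[1.0, 0.0],    # "remove red" has a known meaning
              [1.0, 1.0]])   # "remove cyan" is unknown: compatible with both

literal = L / L.sum(axis=1, keepdims=True)                 # P(action | utterance)
speaker = literal / literal.sum(axis=0, keepdims=True)     # P(utterance | action)
pragmatic = speaker / speaker.sum(axis=1, keepdims=True)   # listener re-inverts

# The pragmatic listener reasons: "if you meant red, you would have said
# 'remove red'", so 'remove cyan' shifts probability onto the cyan action.
print(np.round(pragmatic, 2))  # row for "remove cyan" -> [0.25, 0.75]
```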
Pragmatics: results
Online accuracy:
No pragmatics (all): 33.3
Pragmatics (all): 33.8
No pragmatics (top 10): 48.6
Pragmatics (top 10): 52.8
Summary so far
Outline
Properties of language
Distributional semantics
Frame semantics
Model-theoretic semantics
Interactive learning
Reflections
Three types of semantics
1. Distributional semantics:
Pro: Most broadly applicable, ML-friendly
Con: Monolithic representations
2. Frame semantics:
Pro: More structured representations
Con: Not full representation of world
3. Model-theoretic semantics:
Pro: Full world representation, rich semantics, end-to-end
Con: Narrower in scope
⇒ many opportunities for synthesis
[Rajpurkar et al., 2016]
Reading comprehension (SQuAD)

Team             F1
MSR-A            82.2%
AI2              81.1%
Salesforce       80.4%
...              ...
Log. regression  51.0%
Dialogue
Takeaway 2/2
Open questions
Questions?