basic aspects of QA: question representation, document indexing, question-document matching, response production
The framework
Multidisciplinary cooperation: linguistics, psychology, artificial intelligence, computer science, logic, philosophy. Huge problems with multilingualism. Scientific and applied facets are interleaved.
The goal is not to reproduce how the human brain works but to simulate human performance: the system just needs to do, where possible, as well as a human on (simple) tasks that involve language understanding and language production.
A paradox?
We are potentially capable of generating and understanding an infinite number of situationally appropriate, meaningful utterances, of extracting knowledge, etc., from just a few examples. In doing so, we draw on huge amounts of knowledge and on subtle categories that we perceive, some of which are postulated to be innate and some of which have been acquired over the course of our lifetime. We also use other cognitive faculties in understanding and producing language, such as the ability to reason about what we've heard or read.
Compositionality
Run has an underspecified meaning: move along a trajectory. It is crucial to capture this fact! Thus run to school, run from the store, etc. all involve the same meaning of run, with different trajectories: the aim is to tune exactly the semantic contribution of a word or concept. Compositionality should handle combinations of isolated senses, including modifiers. It should also take syntax into account, in particular syntactic alternations. In a large number of cases it is a monotonic operation; some cases, however, are more complex (they require type coercion, type shifting, or other forms of inference).
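The idea that run keeps one generic sense while the preposition supplies the trajectory can be sketched with plain functions standing in for lambda terms. This is only an illustration: the names Trajectory, to, from_ and the labels "goal" and "source" are assumptions made for the sketch, not notation from the slides.

```python
from typing import Callable, NamedTuple


class Trajectory(NamedTuple):
    kind: str   # hypothetical labels: "goal" or "source"
    place: str


def to(place: str) -> Trajectory:
    """Preposition 'to' as a function from a place to a trajectory."""
    return Trajectory("goal", place)


def from_(place: str) -> Trajectory:
    """Preposition 'from' as a function from a place to a trajectory."""
    return Trajectory("source", place)


def run(trajectory: Trajectory) -> Callable[[str], dict]:
    """Single generic sense of run: move along a trajectory, in the running manner."""
    def apply_subject(agent: str) -> dict:
        return {"pred": "move", "manner": "run",
                "agent": agent, "trajectory": trajectory}
    return apply_subject


# Same lexical entry for run; only the trajectory argument differs.
run_to_school = run(to("school"))("Mary")
run_from_store = run(from_("store"))("Mary")
print(run_to_school)
print(run_from_store)
```

Both results share the predicate and manner; the difference is carried entirely by the trajectory, which is the monotonic case described above. Coercion or type shifting would need extra machinery not shown here.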
Lexical descriptions
At the syntactic as well as the semantic level, lexical descriptions must remain generic: this is an essential property, at least in NLP circles! Dedicated mechanisms then operate during sentence processing to elaborate meaning and derived meanings (e.g. metaphors), or to deal with unexpected situations (e.g. metonymies, sense variations). Crucial aspects: selectional restrictions, and semantic representations that support underspecification and allow for term combinations (types + lambda-calculus is a good solution). A sufficiently expressive language must be defined to deal with meaning. The choice depends on the objectives (granularity to be adjusted): from basic roles to incrementally more complex languages based on primitives (LCS). Frame-based languages are an intermediate step.
- Frames (Minsky, Schank, etc., 70)
- Semantic nets, graphs (Winograd, Sowa, 75)
- Primitive-based languages (Wilks, Jackendoff, 80)
- Specialized languages: situational semantics, temporal semantics
- Dedicated formalisms: DRT, etc.
Implementations: logic-based forms, dependencies?
Structure of WN
95,600 word forms (quite recently):
- 51,500 simple words
- 44,100 collocations
70,100 word meanings
WordNet relations:
- Lexical relations (between word forms): synonymy, antonymy
- Semantic relations (between word meanings): hyponymy/hypernymy, meronymy/holonymy, entailment
An example
S: (v) dance (move in a graceful and rhythmical way) "The young girl danced into the room"
  direct troponym / full troponym
    S: (v) glissade (perform a glissade, in ballet)
    S: (v) chasse, sashay (perform a chasse step, in ballet)
    S: (v) capriole (perform a capriole, in ballet)
  verb group
    S: (v) dance, trip the light fantastic, trip the light fantastic toe (move in a pattern; usually to musical accompaniment; do or perform a dance) "My husband and I like to dance at home to the radio"
  entailment
    S: (v) step (shift or move by taking a step) "step back"
  direct hypernym / inherited hypernym / sister term
    S: (v) move (move so as to change position, perform a nontranslational motion) "He moved his hand slightly to the right"
  derivationally related form
    W: (n) dancer [Related to: dance] (a person who participates in a social gathering .)
    W: (n) dancer [Related to: dance] (a performer who dances professionally)
  sentence frame
    Somebody ----s
    [Applies to dance] The crowds dance in the streets
    [Applies to dance] The streets dance with crowds
Synonymy: two expressions are synonymous if the substitution of one for the other never changes the truth value of a sentence in which the substitution is made.
Cruse: substitution must be possible in any context. A synset is the set of word forms that share the same sense; in fact, these are usually quasi-synonyms. But synsets do not explain what the concepts are; they just say that the concepts exist.
A hyponym is a word whose meaning contains the entire meaning of another word, known as the superordinate (weak is-a relation).
Two words overlap in meaning if they have the same value for some (but not all) of the semantic features (yet to be defined! They are described here from a paradigmatic perspective). Hyponymy is a special case of overlap in which all the features of the superordinate are contained in the hyponym.
Meronymy
A word w1 is a meronym of another word w2 (the holonym) if the relation is-part-of holds between the meanings of w1 and w2:
- Meronymy is transitive and asymmetric.
- A meronym can have many holonyms.
Problems: cardinality, optionality of parts, levels of generality, inheritance?
Ex. If beak and wing are meronyms of bird, and if canary is a hyponym of bird, then (by inheritance) beak and wing must be meronyms of canary.
- Limited transitivity (functional domain): Ex. A house has a door and a door has a handle. Does a house then have a handle?
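The inheritance of parts along hyponymy, and the questionable result of naive transitivity, can be sketched on a toy lexicon. The two dictionaries below are hand-made assumptions for illustration, not data taken from WordNet.

```python
# Toy lexicon (assumed for illustration only).
HYPERNYM = {"canary": "bird", "bird": "animal"}          # hyponym -> direct hypernym
MERONYMS = {"bird": {"beak", "wing"},                    # holonym -> direct parts
            "house": {"door"},
            "door": {"handle"}}


def inherited_meronyms(word):
    """Collect the parts of a word and of all its hypernyms (inheritance)."""
    parts = set()
    current = word
    while current is not None:
        parts |= MERONYMS.get(current, set())
        current = HYPERNYM.get(current)
    return parts


def transitive_parts(word):
    """Naive transitive closure of is-part-of, with no functional-domain check."""
    result, stack = set(), [word]
    while stack:
        for part in MERONYMS.get(stack.pop(), set()):
            if part not in result:
                result.add(part)
                stack.append(part)
    return result


print(inherited_meronyms("canary"))   # parts inherited from bird
print(transitive_parts("house"))      # includes handle: the doubtful inference
```

The second function returns handle as a part of house, which is exactly the case where transitivity should be limited; a realistic system would need some notion of functional domain to block it.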
Categories of words
Treated independently in WN, unfortunately!
- Nouns: organized as topical hierarchies with lexical inheritance (hyponymy/hypernymy and meronymy/holonymy).
- Verbs: organized by a variety of entailment relations.
- Adjectives: organized on the basis of bipolar opposition (antonymy relations), syno
- Adverbs: organized e.g. according to scales and opposition.
- Function words: currently omitted, stored separately as part of syntactic elements.
{act, activity} {animal, fauna} {artifact} {attribute} {body} {cognition, knowledge} {communication} {event, happening} {feeling, emotion} {food} {group, grouping} {location} {motivation, motive} {natural object}
{natural phenomenon} {person, human being} {plant, flora} {possession} {process} {quantity, amount} {relation} {shape} {state} {substance} {time}
Adjectives
19,500 adjective forms, 10,000 word meanings (synsets)
Main types:
- Descriptive adjectives (scalars or booleans)
  - Clusters based on antonymy
  - Used to modify/give attribute values of a noun
- Relevance: X is Adj presupposes there is an attribute A s.t. A(x) = Adj.
- Relational adjectives: similar to nouns used as modifiers
- Reference-modifying adjectives, negative adjectives. Ex. former, alleged, ...
Antonymy of adjectives
Two words are antonyms if their meanings differ only in the value of a single semantic feature:
- dead/alive, above/below, hot/cold, fat/skinny
- Binary antonyms (dead/alive: [+/- living])
- Gradable antonyms (scalar, notion of scale): hot, warm, cool, cold
Antonyms w.r.t. a context (reversives):
- arrive: stay/leave
- charge: accept/contest
Verbs
21,000 verb word forms, of which 13,000 are unique strings
8,400 word meanings (synsets)
Includes phrasal verbs
Divided into 12 semantic domains, e.g. verbs of: body care, change, cognition, communication, competition, consumption, contact, creation, emotion, motion, perception, possession, social interaction, and weather verbs.
Verbs cannot easily be arranged into the kind of tree structure onto which nouns are mapped, but they can be related by semantic relations such as:
- Entailment
- Temporal inclusion
- Causation
Lexical entailment
A verb V1 logically entails a verb V2 when the sentence Someone V1 (logically) entails the sentence Someone V2.
- Ex. snore lexically entails sleep.
- The first sentence presupposes the second.
Negation reverses the direction of entailment:
- Ex. Not sleeping entails not snoring.
Lexical entailment is a non-symmetric relation:
- Only synonymous verbs can be mutually entailing. Ex. A defeated B and A beat B.
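The reversal of entailment under negation and the test for mutual entailment can be sketched over a small set of verb pairs. The pair set is a toy assumption built from the examples above, not an extract of WordNet.

```python
# Toy entailment pairs (V1, V2) meaning "Someone V1" entails "Someone V2".
ENTAILS = {("snore", "sleep"), ("defeat", "beat"), ("beat", "defeat")}


def entails(v1, v2):
    """True if v1 lexically entails v2 (every verb entails itself)."""
    return v1 == v2 or (v1, v2) in ENTAILS


def negation_entails(v1, v2):
    """'not v1' entails 'not v2' exactly when v2 entails v1 (contraposition)."""
    return entails(v2, v1)


def mutually_entailing(v1, v2):
    """Entailment in both directions, expected only for synonymous verbs."""
    return entails(v1, v2) and entails(v2, v1)


print(negation_entails("sleep", "snore"))     # not sleeping entails not snoring
print(mutually_entailing("defeat", "beat"))
print(mutually_entailing("snore", "sleep"))
```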
Troponymy
The troponymy relation between two verbs V1 and V2 can be expressed by the formula:
- To V1 is to V2 in some particular manner.
Ex. Troponyms of communication: say, tell
- Some encode the speaker's intention, as in examine, confess, preach, ...
- Some encode the medium of communication, as in fax, email, phone, telex.
- Troponymy is a particular kind of entailment: every troponym V1 of a (more general) verb V2 also entails V2.
The activities referred to by a troponym and by its more general hypernym are always temporally coextensive. snore is not a troponym of sleep (because the temporal inclusion is not coextensive).
Causation
The causation relation relates two verb concepts:
- a causative (like give)
- a resultative (like have)
Constraints:
(1) The subject of the causative verb usually has a referent that is distinct from that of the subject of the resultative verb.
(2) The subject of the resultative verb must be the object of the causative verb (which is therefore necessarily transitive).
Causation is anti-symmetric: for someone to have something does not entail that he was given it.
Causation is a specific case of entailment:
- If V1 necessarily causes V2, then V1 also entails V2.
- Causal entailment lacks temporal inclusion.
Semantic relations in WN
Selectional restrictions
The synset hierarchy can be used to define selectional restrictions by means of objects viewed as types (but this is not sufficient): componential analysis. Typing objects: simple types and complex types (e.g. dot objects); a simple logical language is often needed to express restrictions. Relational/predicative terms may have their own type, but they also express the types of the arguments they expect:
Eat: verb of consumption, NP1: human, NP2: concrete object + edible + solid.
Contrast with drink: NP2: + liquid.
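A selectional restriction of this kind combines a type check against the hierarchy with a check on componential features. The sketch below assumes a tiny hand-made hierarchy and feature table; the restriction for eat follows the entry above, while the NP1 and type constraints given for drink are assumptions added to make the example complete.

```python
# Toy is-a hierarchy and feature table (assumptions for illustration).
ISA = {"apple": "food", "water": "beverage",
       "food": "concrete_object", "beverage": "concrete_object",
       "mary": "human", "idea": "abstraction"}
FEATURES = {"apple": {"edible", "solid"}, "water": {"edible", "liquid"}}

# verb -> slot -> (required type, required features)
RESTRICTIONS = {
    "eat":   {"NP1": ("human", set()),
              "NP2": ("concrete_object", {"edible", "solid"})},
    "drink": {"NP1": ("human", set()),                       # assumed
              "NP2": ("concrete_object", {"liquid"})},       # type assumed
}


def is_a(word, type_):
    """Walk up the hierarchy from word looking for type_."""
    while word is not None:
        if word == type_:
            return True
        word = ISA.get(word)
    return False


def satisfies(verb, np1, np2):
    """Check both arguments against the verb's type and feature restrictions."""
    for slot, filler in (("NP1", np1), ("NP2", np2)):
        required_type, required_features = RESTRICTIONS[verb][slot]
        if not is_a(filler, required_type):
            return False
        if not required_features <= FEATURES.get(filler, set()):
            return False
    return True


print(satisfies("eat", "mary", "apple"))
print(satisfies("eat", "mary", "water"))
print(satisfies("drink", "mary", "water"))
```

A strict check like this rejects metaphor and metonymy outright, which is why the type hierarchy alone is described above as insufficient.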
Unexpected situations
Unexpected situations abound: new usage of a term; metaphor and related images; metonymy; other kinds of meaning change, e.g. via co-composition, co-predication, etc., where an argument affects the meaning of the predicate (several views, many debates!). These situations are very difficult to predict, they are not as regular and systematic as sometimes claimed, and they need specific forms of semantic interpretation.
Thematic roles
A large number of role types, each relating a predicate and one of its arguments. Granularity to be adjusted. Allows for partial parsing. Problematic for capturing inter-argument relations.
Proto-roles
Defined by means of clusters of properties.
Proto-role agent:
* volitional involvement
* sentience or perception
* causes an event or change of state
* causes a movement
Proto-role patient:
* undergoes a change of state
* incremental theme
* causally affected by another participant
* stationary relative to movement
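Because proto-roles are clusters of properties, an argument can be scored by how many properties of each cluster it shows. The sketch below uses short property labels that paraphrase the two lists above; the comparison step (picking the argument with the largest agent-minus-patient margin) and the annotation of the example sentence are illustrative assumptions.

```python
# Property labels paraphrase the two clusters above (labels are assumed names).
PROTO_AGENT = {"volitional", "sentient", "causes_event", "causes_movement"}
PROTO_PATIENT = {"changes_state", "incremental_theme",
                 "causally_affected", "stationary"}


def proto_scores(properties):
    """Return (number of proto-agent properties, number of proto-patient properties)."""
    return len(properties & PROTO_AGENT), len(properties & PROTO_PATIENT)


def most_agent_like(arguments):
    """Pick the argument with the largest agent-minus-patient margin."""
    def margin(name):
        agent, patient = proto_scores(arguments[name])
        return agent - patient
    return max(arguments, key=margin)


# Hand annotation of "John broke the window" (assumed for the example).
arguments = {
    "John": {"volitional", "sentient", "causes_event"},
    "the window": {"changes_state", "causally_affected"},
}
print(proto_scores(arguments["John"]))
print(proto_scores(arguments["the window"]))
print(most_agent_like(arguments))
```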
Frame Semantics
Declarative representations:
- frames (Marvin Minsky)
- schemata (David Rumelhart)
- scripts (Roger Schank, Abelson)
Procedural representations (productions): conditionals that specify actions to be performed if certain conditions are met
- a frame is to be understood as a cognitive structuring device
- frames characterise situations or states of affairs
- they are in principle independent of their linguistic realisation
- parts of frames are connected to specific words/constructions
- since verbs refer to whole situations, they are most closely associated with frames
- verbs refer to whole situations, but they focus on different aspects of the same frame: e.g. buy, sell, pay, spend all evoke the same frame
- to know the meaning of a verb, one has to know the frame as a whole: this brings these verbs together in one semantic group
- thus, frames introduce a perspective on situations, even if the same arguments are realised differently syntactically:
  buy X from Y for Z: perspective of buyer
  sell X to Y for Z: perspective of seller
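The buy/sell contrast can be sketched as two verb entries that map different syntactic positions onto the same set of frame roles. The role names (buyer, seller, goods, money) and the slot names are assumptions chosen for the sketch; they are not copied from the FrameNet database.

```python
# Each verb maps its syntactic slots onto roles of one shared frame
# (role and slot names are hypothetical).
VERB_LINKING = {
    "buy":  {"subject": "buyer", "object": "goods", "from": "seller", "for": "money"},
    "sell": {"subject": "seller", "object": "goods", "to": "buyer", "for": "money"},
}


def to_frame(verb, syntactic_args):
    """Translate slot -> filler pairs into role -> filler pairs for the verb."""
    linking = VERB_LINKING[verb]
    return {linking[slot]: filler for slot, filler in syntactic_args.items()}


# "Kim bought a car from Lee for $500" / "Lee sold a car to Kim for $500"
bought = to_frame("buy", {"subject": "Kim", "object": "a car",
                          "from": "Lee", "for": "$500"})
sold = to_frame("sell", {"subject": "Lee", "object": "a car",
                         "to": "Kim", "for": "$500"})
print(bought)
print(bought == sold)
```

The two sentences yield the same frame instance; what differs is which role is linked to the subject, that is, the perspective.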
structure
The database serves as a dictionary:
- definition (from the Concise Oxford Dictionary, 10th Edition, Oxford University Press, or a definition written by a FrameNet staff member)
- tables showing how frame elements are syntactically expressed in sentences containing each word
- this includes a complete characterisation of the headword's grammar and combinatorial properties
- annotated examples from the corpus
- alphabetical indexes
It is also used as a thesaurus.
There are three layers of annotation on a tagged constituent: the frame element realisation consists of a frame element (say, patient), a grammatical function (say, object) and a phrase type (e.g. NP). Valence descriptions of predicating words are generalisations over such structures.
FrameNet examples
Base for change_direction:
veer.v
Judgment_communication: a COMMUNICATOR communicates a judgment of an EVALUEE to an ADDRESSEE.
Kernel: COMMUNICATOR (semantic type), EVALUEE (judgement on source), EXPRESSOR (body part that informs), MEDIUM (mode of expression: telephone, etc.), REASON, TOPIC
Outside the kernel: ADDRESSEE, DEGREE, FREQUENCY, GROUNDS, etc.
acclaim.n, acclaim.v, accusation.n, accuse.v, belittle.v, belittlement.n, belittling.n, blame.v, ...
Annotated text:
Frame identification and their meaning:
It CAN[Capability] be HOPED[Desiring] that Spanish PRIME MINISTER[Leadership] Felipe Gonzalez will draw the RIGHT[Suitability] CONCLUSION[Coming_to_believe] from his NARROW[Clarity_of_resolution] ELECTION[Change_of_leadership] VICTORY[Finish_competition] Sunday. A STRONG[Exertive_force] CHALLENGE[Competition] from the FAR .
Detail: Spanish ORIGIN Prime-Minister IDENT Felipe Gonzalez etc MINISTER Felipe Gonzalez
VerbNet
Links syntactic classes, WN, PropBank and FrameNet, applied to verbs (M. Palmer et al., 97). 5245 verbs of English, about 5000 links, 237 main classes, 3412 links towards FrameNet. Extensions to Portuguese and Korean.
Example: abandon
Class: Leave 51-2 (sense number)
Roles: Theme [+animate], Source [+location & -region]
Frame: basically transitive, with drop of the locative preposition "from"
"We abandoned the area."
syntax: Theme V Source
semantics:
motion(during(E), Theme)
location(start(E), Theme, Source)
not(location(end(E), Theme, Source))
direction(during(E), from, Theme, Source)
Sense stratifications
Stratification 2:
[1.2.1] VIA UNDER, from generic 'An entity X moving via under a location Y'
X: concrete entity, ACTION: movement verb, Y: location with a form of passage under it
representation: X : via(loc, under(loc,Y))
French synset: {par dessous}
example: Jean passe par dessous le pont.
[Figure: Language realization. Labels: SFi, synset1, synset3, synsets ??, + frequency measures]
[Figure: A map. Labels: domain specific, open domain, structured data, the Web; old, the renewal, and now]
Question contents
Several levels:
- question conceptual category,
- expected response type,
- representation of question body.
Who invented the telephone? In logical form: (entity, X: person, invent(X, telephone)).
On the response side, unify the question body with text fragments; however, the forms are very often quite different:
Vazquez, coach of Johnny: <person>, <role> of <entity>
In Tchaikovsky's Eugene Onegin: Prep <person>'s <entity>
In TextMap:
1. Query generation:
How did Mahatma Gandhi die?
- Mahatma Gandhi die <HOW>
- Mahatma Gandhi die of <HOW>
- Mahatma Gandhi lost his life in <WHAT>
550 patterns, grouped into 105 equivalence classes
2. Response extraction:
When was Mozart born?
- P=1 <PERSON> (<BIRTHDATE> - DATE)
- P=.69 <PERSON> was born on <BIRTHDATE>
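The response extraction step can be approximated with regular expressions, one per surface pattern, each carrying its precision score. This is a rough sketch, not TextMap's implementation: the regular expressions for dates, the function name extract_birthdate and the sample text are assumptions; only the two pattern shapes and their scores (1 and .69) come from the slide.

```python
import re

# (precision, template); {person} is filled in before compiling.
# Doubled braces survive str.format as literal regex quantifiers.
PATTERNS = [
    (1.0, r"{person} \((?P<answer>\d{{4}}) ?- ?\d{{4}}\)"),
    (0.69, r"{person} was born on (?P<answer>[A-Z][a-z]+ \d{{1,2}}, \d{{4}})"),
]


def extract_birthdate(person, text):
    """Return (precision, answer) candidates, highest precision first."""
    candidates = []
    for precision, template in PATTERNS:
        regex = template.format(person=re.escape(person))
        for match in re.finditer(regex, text):
            candidates.append((precision, match.group("answer")))
    return sorted(candidates, reverse=True)


# Sample text written for the example.
text = ("Mozart (1756 - 1791) was a composer. "
        "Mozart was born on January 27, 1756.")
for precision, answer in extract_birthdate("Mozart", text):
    print(precision, answer)
```

Ranking by pattern precision means an answer found by the parenthesised life-span pattern is preferred over one found by the looser "was born on" pattern.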
What [Arg1: kind of nuclear materials] were [Predicate: stolen] [Arg2: from the Russian Navy]?
Text in which the response is found: [ArgM-TMP(Predicate 2): in 1/96], [Arg1(Predicate 2): approximately 7 kg of HEU] was [ArgM-ADV(Predicate 2): reportedly] [Predicate 2: stolen] [Arg2(Predicate 2): from a naval base] [Arg3(Predicate 2): in Sovetskaya Gavan]
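With predicate-argument annotation on both sides, answer extraction reduces to finding a sentence with the same predicate and returning the filler of the argument slot the question asks about. The sketch below hand-codes the annotations of the example (a subset of the arguments); the lemma "steal" and the dictionary layout are assumptions.

```python
# Hand-coded annotations of the example (layout and lemma are assumed).
question = {
    "predicate": "steal",
    "asked": "Arg1",                               # the slot holding the wh-phrase
    "args": {"Arg2": "from the Russian Navy"},
}
sentence = {
    "predicate": "steal",
    "args": {
        "ArgM-TMP": "in 1/96",
        "Arg1": "approximately 7 kg of HEU",
        "ArgM-ADV": "reportedly",
        "Arg2": "from a naval base",
    },
}


def extract_answer(question, sentence):
    """Return the filler of the asked slot if the predicates match, else None."""
    if question["predicate"] != sentence["predicate"]:
        return None
    return sentence["args"].get(question["asked"])


print(extract_answer(question, sentence))
```

Note that the Arg2 fillers ("from the Russian Navy" vs. "from a naval base") do not match literally; deciding whether they are compatible needs lexical knowledge that this sketch leaves out.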
Semantic dependencies: thematic roles versus LCS primitives, rhetorical operators (IRIT-ILPL)
Question: What is the first university of Thailand?
(1) Thematic roles: Theme, Loc-Temp, Loc-spatial (over: the first university of Thailand)
Response 1: (Kasetsart University) (has been recognized as (the first university of Thailand))
(1) roles: Th, Th, Th
(2) LCS: At+temp, Be, From+Loc
Response 2: (the first university in Thailand) (namely (Chulalongkorn university))
(1) roles: Loc, Ident
(2) LCS + rhetorical structure: in+Loc, Value, Be, Elaboration, At+Temp
Response 3: (When (the king Vajiravudh (Rama VI) founded the first university in Thailand) (in commemoration of his father, King Chulalongkorn,))
(1) roles: Ag, Th, Goal
(2) LCS + RST: At+temp, in+Loc, At+temp, Cause, Goal, Value
Index:
disease-name(bakanae),
symptoms(bakanae, [list of major symptoms]),
origin(disease: bakanae, place: California, date: 1999),
spreading(disease: bakanae, period: winter, medium: [soil, water]),
treatment(disease: bakanae, product: XX).
SYMPTOMS
Symptoms of bakanae first appear about a month after planting. Infected seedlings appear to be taller, more slender, and slightly chlorotic when compared to healthy seedlings. The rapid elongation of infected plants is caused by the pathogen's production of the plant hormone, gibberellin. Plants with bakanae are often visible arching above healthy rice plants; infected plants senesce early and eventually die before reaching maturity. If they do survive to heading, they produce mostly empty panicles.
COMMENTS ON THE DISEASE
Bakanae is one of the oldest known diseases of rice in Asia but has only been observed in California rice since 1999 and now occurs in all California rice-growing regions. While very damaging in Asia, the extent to which bakanae may affect California rice production is unknown. As diseased plants senesce and die, mycelium of the fungus may emerge from the nodes and may be visible above the water level. After the water is drained, the fungus sporulates profusely on the stems of diseased plants. The sporulation appears as a cottony mass and contaminates healthy seed during harvest. The bakanae pathogen overwinters as spores on the coat of infested seeds. It can also overwinter in the soil and plant residue. However, infested seed is the most important source of inoculum.
MANAGEMENT
The most effective means of control for this disease is the use of noninfested seed. Also, when possible, burning plant residues with known infection in fall may help limit the disease. Research is under way to identify effective seed treatments. Field trials indicate that a seed treatment with sodium hypochlorite (Ultra Clorox Germicidal Bleach) is effective at reducing the incidence of this disease. Using a thoroughly premixed solution of 5 gallons of bleach to 100 gallons of water, seed is soaked for 2 hours, then drained and soaked in fresh water.
spread: {germinate, flower, infest, etc.}, which are sub-events of spread. The matching is then done via as many of the above sub-events as possible, given that the disease has already been identified. You are looking for SqE (sequences of events), which are annotated below:
In spring as the fields are flooded, chlamydospores float, <action> germinate, and produce other spore and mycelial stages </action>. <action> At flowering (heading), secondary airborne spores (sporidia) infect individual florets or kernels.</action>
So the response is the above text, where the relevant actions have been identified.
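Matching via sub-events can be sketched as counting, for each sentence, how many sub-events of spread it mentions. The stem table is a hand-made assumption (in particular, treating "infect" as a realisation of infest), and simple prefix matching stands in for real morphological analysis.

```python
import re

# Sub-events of "spread" with hypothetical stems used for matching.
SUB_EVENTS = {
    "germinate": ("germinat",),
    "flower": ("flower",),
    "infest": ("infest", "infect"),
}


def matched_sub_events(sentence):
    """Return the sub-events whose stems start some word of the sentence."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return {event for event, stems in SUB_EVENTS.items()
            if any(word.startswith(stem) for word in words for stem in stems)}


def rank_sentences(sentences):
    """Keep sentences matching at least one sub-event, best match first."""
    scored = [(len(matched_sub_events(s)), s) for s in sentences]
    return [s for score, s in sorted(scored, key=lambda p: -p[0]) if score > 0]


sentences = [
    "In spring as the fields are flooded, chlamydospores float, germinate, "
    "and produce other spore and mycelial stages.",
    "At flowering (heading), secondary airborne spores (sporidia) infect "
    "individual florets or kernels.",
    "Research is under way to identify effective seed treatments.",
]
for sentence in rank_sentences(sentences):
    print(sorted(matched_sub_events(sentence)), sentence)
```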