Linearization in Natural Language Generation: linguistic and implementation issues
Ciprian-Virgil Gerstenberger, University of Tromsø, Norway
ciprian.gerstenberger@uit.no

One of the challenges for Natural Language Generation (NLG) systems is presenting information in a flexible, adequate way. One specific means of achieving context-sensitive output is to choose the most appropriate word order. In NLG, this is carried out by linearization, a process within surface realization (cf. [CR99]). Linguistic theories and NLG face similar linearization problems; in addition, NLG has to cope with practical issues of computability (cf. [Pul09]).

The first question concerns the quality of atomic input items for linearization: are these inflected words or lemmata plus a bundle of features? This amounts to a temporal ordering between syntax and inflectional morphology. Different models are conceptually possible: syntax comes before morphology (Minimalist Program [Cho95]); syntax comes after morphology (LFG [Bre01], HPSG [PS94]); syntax and morphology are synchronous (RCG [Cro01]), parallel (Parallel Morphology [Bor88]), or alternating processes; or morphology is distributed among syntax, semantics, and phonology (DM [HM93]). However, only those with a clear sequential ordering have been implemented (e.g., LFG or HPSG).

The Romanian this-NP shows different marking patterns depending on the position of the demonstrative relative to the noun. To obtain only the grammatical variants (ex. 1 and 5), both the morpho-syntactic specification and the relative positions are required. This fact speaks for linearization before inflectional morphology. HPSG- or LFG-based NLG systems might not be able to generate all grammatical variants of a Romanian this-NP without explicitly coding linearization-relevant information in modules that are not supposed to handle linearization. Using the cross-linguistically motivated, dependency-based linearization model proposed in [Ger07], it will be shown that, assuming linearization before inflectional morphology, a modular constraint-based processing of the Romanian this-NP is possible.
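To make the argument for linearization before inflectional morphology concrete, the following is a minimal sketch, not the model of [Ger07]. It assumes a toy lexicon covering only the forms in examples (1)-(8); the names FORMS, linearize, inflect, and realize are hypothetical. The order of the uninflected items is fixed first, and the marking is then chosen from their relative position.

```python
from itertools import permutations

# Assumption: a toy lexicon limited to the forms in examples (1)-(8),
# keyed by (lemma, carries the -def marking?).
FORMS = {
    ("acest", False): "acest",
    ("acest", True): "acesta",
    ("om", False): "om",
    ("om", True): "omul",
}


def linearize(items):
    """Step 1: enumerate candidate orders of the uninflected items."""
    return [list(order) for order in permutations(items)]


def inflect(order):
    """Step 2: choose the marking from the relative position of the items."""
    demonstrative_first = order.index("acest") < order.index("om")
    # Postnominal demonstrative: both items carry -def, as in example (5).
    # Prenominal demonstrative: neither item does, as in example (1).
    marked = not demonstrative_first
    return " ".join(FORMS[(lemma, marked)] for lemma in order)


def realize(items):
    """Linearize first, then inflect each candidate order."""
    return sorted(inflect(order) for order in linearize(items))


print(realize(["acest", "om"]))
```

Because inflection sees the finished order, the starred combinations in (2)-(4) and (6)-(8) are never built, and no linearization-specific information has to be stored in the lexicon.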
The second question concerns the size of atomic input items for linearization: are these as small as morphemes or as big as lexemes? Taking into account phenomena such as separable particle verbs in German, a linearization test is proposed. As sketched in [Ger07], assuming two items [...] and [...] at the morpho-syntactic level in a specific language: if the language allows for both [...] and [...], then these items are linearization primitives. Applied to Polish person-number markers in the past tense (ex. 9-11), weak pronouns in Romanian (ex. 12-14), separable particle verbs in German (ex. 24-25), or phrasal verbs in English (ex. 20-21), the test classifies them as atomic items for linearization. This is not the case with a genuine suffix (ex. 22-23).

The third question concerns the complexity of linearization units: are these as complex as a nominal/verbal phrase, less complex, or perhaps more complex? Considering what traditional analyses call discontinuous constituents, as well as the macro-structural clause organization in Germanic languages (generally labeled the Topological Field Model), a permutation test for forming complex linearization units is proposed.

In terms of Immediate Dominance/Linear Precedence (ID/LP), the linearization task can be formulated as follows: given an ID structure (a dependency tree without explicit linearization information), find all corresponding LP variants (all grammatical output sequences). Using only horizontal rules (controlling the order among sibling nodes) and vertical rules (controlling the order between mother and daughter nodes) would not allow for a straightforward, flexible linearization of, for instance, extraposed relative clauses in German (ex. 18 and 19). To this end, a further rule type is proposed: diagonal rules, which control the linearization between nodes (or node groups) that relate to each other neither as mother and daughter nor as siblings.

The current work investigates basic questions related to linearization.
The main claim is that, in order to implement flexible NLG systems, linguistic phenomena have to be reconsidered and linguistic theoretical models should be consulted. However, the plausibility of a model and its implementability in an NLG system have to be weighed against each other.
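The ID/LP formulation of the linearization task can be illustrated with a small sketch. It is a toy under stated assumptions, not the implementation of [Ger07]: the Node class, the role labels, the rule tables VERTICAL and HORIZONTAL, and the dependency analysis of the sentence in examples (15)-(19) are all invented for illustration. The sketch enumerates every order of a mother and its daughters, filters the orders by vertical and horizontal rules, and keeps each subtree contiguous.

```python
from dataclasses import dataclass, field
from itertools import permutations, product

HEAD = -1  # slot index standing for the mother among its daughters


@dataclass
class Node:
    word: str
    role: str
    daughters: list = field(default_factory=list)


# Assumption: toy rules written for this one sentence only.
# Vertical rules: daughter role -> required side relative to its mother.
VERTICAL = {"subj": "before", "part": "after", "adv": "before",
            "obj": "before", "det": "before", "rel": "after",
            "relpron": "before", "pred": "before"}
# Horizontal rules: (a, b) means a sibling with role a precedes one with role b.
HORIZONTAL = {("relpron", "pred")}


def allowed(order, daughters):
    """Check one order of mother and daughters against both rule types."""
    pos = {slot: i for i, slot in enumerate(order)}
    for i, d in enumerate(daughters):
        side = VERTICAL.get(d.role)
        if side == "before" and pos[i] > pos[HEAD]:
            return False
        if side == "after" and pos[i] < pos[HEAD]:
            return False
    for i, a in enumerate(daughters):
        for j, b in enumerate(daughters):
            if (a.role, b.role) in HORIZONTAL and pos[i] > pos[j]:
                return False
    return True


def linearize(node):
    """Return all word sequences for the subtree rooted at node."""
    variants = [linearize(d) for d in node.daughters]
    slots = [HEAD] + list(range(len(node.daughters)))
    results = []
    for order in permutations(slots):
        if not allowed(order, node.daughters):
            continue
        # Each daughter contributes one of its own contiguous variants.
        choices = [[[node.word]] if s == HEAD else variants[s] for s in order]
        for combo in product(*choices):
            results.append([w for part in combo for w in part])
    return results


# ID structure only: the order of the daughters lists carries no meaning.
relative = Node("ist", "rel", [Node("das", "relpron"), Node("schön", "pred")])
book = Node("Buch", "obj", [Node("ein", "det"), relative])
bought = Node("gekauft", "part", [Node("gestern", "adv"), book])
tree = Node("hat", "root", [Node("Peter", "subj"), bought])

for sequence in linearize(tree):
    print(" ".join(sequence))
```

With these toy rules the sketch yields the sequences of (15) and (16), without the commas. Because every subtree stays contiguous, the extraposed variants (18) and (19) are out of reach, which is the gap that diagonal rules are meant to close.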

(1) acest om
    this man
(2) *acesta omul
    this-def man-def
(3) *acesta om
    this-def man
(4) *acest omul
    this man-def

(5) omul acesta
    man-def this-def
(6) *om acest
    man this
(7) *omul acest
    man-def this
(8) *om acesta
    man this-def

(9) Nie widzieliśmy tego.
    not see-pst-m-pl-1pl this
    [We didn't see this.]
(10) Tegośmy nie widzieli.
    this-1pl not see-pst-m-pl
    [We didn't see this.]
(11) Myśmy tego nie widzieli.
    we-1pl this not see-pst-m-pl
    [We didn't see this.]

(12) Să îl faceți!
    that it do-conj-2pl
    [Do it!]
(13) Să-l faceți!
    that it do-conj-2pl
(14) Faceți-l!
    do-imp-pl it

(15) Peter hat gestern ein Buch, das schön ist, gekauft.
    Peter has yesterday a book that nice is bought
(16) Peter hat ein Buch, das schön ist, gestern gekauft.
    Peter has a book that nice is yesterday bought
(17) *Peter hat ein Buch gestern, das schön ist, gekauft.
    Peter has a book yesterday that nice is bought
(18) Peter hat gestern ein Buch gekauft, das schön ist.
    Peter has yesterday a book bought that nice is
(19) Peter hat ein Buch gestern gekauft, das schön ist.
    Peter has a book yesterday bought that nice is
    [Yesterday, Peter bought a nice book.]

(20) They call up John.
(21) They call John up.
(22) Sie sollte kochen.
    [She should cook.]
(23) *Sie kochente.

(24) Sie will das Fenster aufmachen.
    she wants the window open make
    [She wants to open the window.]
(25) Sie macht das Fenster auf.
    she makes the window open
    [She opens the window.]

References
[Bor88] Hagit Borer. On the morphological parallelism between compounds and constructs. In Yearbook of Morphology 1, pages 45-65. Dordrecht: Foris, 1988.
[Bre01] Joan Bresnan. Lexical Functional Syntax. Blackwell, 2001.
[Cho95] Noam Chomsky. The Minimalist Program. MIT Press, 1995.
[CR99] Lynne Cahill and Mike Reape. Component tasks in applied NLG systems. Technical Report ITRI-99-05, University of Edinburgh, 1999.
[Cro01] William Croft. Radical Construction Grammar: Syntactic Theory in Typological Perspective. Oxford University Press, Oxford, 2001.
[Ger07] Ciprian Gerstenberger. A mereology-based general linearization model for surface realization. In Proceedings of the EUROLAN 2007 Doctoral Consortium, Iași, România, 2007.
[HM93] Morris Halle and Alec Marantz. Distributed morphology and the pieces of inflection. In Kenneth Hale and S. Jay Keyser, editors, The View from Building 20, pages 111-176. MIT Press, Cambridge, 1993.
[PS94] Carl Pollard and Ivan A. Sag. Head-Driven Phrase Structure Grammar. University of Chicago Press, 1994.
[Pul09] Geoffrey Pullum. Computational linguistics and generative linguistics: The triumph of hope over experience. In Proceedings of the EACL 2009 Workshop on the Interaction between Linguistics and Computational Linguistics: Virtuous, Vicious or Vacuous?, pages 12-21, Athens, Greece, March 2009. Association for Computational Linguistics.