
CLIPP Christiani Lehmanni inedita, publicanda, publicata

Title: Interlinear morphemic glossing

Location on the World Wide Web: http://www.christianlehmann.eu/publ/img.pdf
Date of last modification of the manuscript: 06.10.2011
Volume containing the publication: Booij, Geert & Lehmann, Christian & Mugdan, Joachim & Skopeteas, Stavros (eds.), Morphologie. Ein internationales Handbuch zur Flexion und Wortbildung. 2. Halbband. Berlin: W. de Gruyter (Handbücher der Sprach- und Kommunikationswissenschaft, 17.2)
Year of publication: 2004
Pages: 1834-1857

Interlinear morphemic glossing


1. Basic concepts
2. Prerequisites of morphological analysis
3. Principles of interlinear glossing
4. Boundary symbols
5. Typographic conventions
6. Summary
7. References

1. Basic concepts
1.1. Purpose

Given an object language L1 and a metalanguage L2, an interlinear morphemic gloss (IMG) is a representation of a text in L1 by a string of elements taken from L2, where, ideally, each morph of the L1 text is rendered by a morpheme of L2 or by a configuration of symbols representing its meaning, and where the sequence of the units of the gloss corresponds to the sequence of the morphs which they render. Its primary aim is to make the reader understand the grammatical structure of the L1 text by identifying aspects of the free translation with meaningful elements of the L1 text. The ultimate purpose may be to help the reader grasp the spirit of the language, to let the reader check the linguistic argument the author is making by means of the L1 example, or to search a corpus for a certain gloss in order to find relevant examples.

(1) Latin
exeg-i              monumentum         aer-e         perennius
implement\PRF-1.SG  monument.N:ACC.SG  ore.F-ABL.SG  lasting:CMPR:ACC.SG.N
'I have executed a monument more durable than ore'
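The correspondence requirement in this definition can be checked mechanically. The following is a minimal Python sketch, not part of any established tool, and its function names are hypothetical. It assumes that words are separated by blanks and that morph boundaries are marked by a hyphen in both lines, while '.', ':' and '\' merely join several labels inside one gloss unit. It verifies that the text and gloss lines of example (1) have the same number of words and, word by word, the same number of hyphen-separated units.

```python
def split_words(line: str) -> list[str]:
    """Split a text or gloss line into words at blanks."""
    return line.split()


def morph_count(word: str) -> int:
    """Count segmentable units, assuming '-' is the only segment boundary.

    Symbols such as '.', ':' and '\\' join several labels inside one gloss
    unit, so they do not add to the count.
    """
    return len(word.split("-"))


def check_alignment(text: str, gloss: str) -> list[str]:
    """Return a list of mismatch messages; an empty list means aligned."""
    text_words, gloss_words = split_words(text), split_words(gloss)
    problems = []
    if len(text_words) != len(gloss_words):
        problems.append(f"word count differs: {len(text_words)} vs {len(gloss_words)}")
    for t, g in zip(text_words, gloss_words):
        if morph_count(t) != morph_count(g):
            problems.append(f"{t!r} has {morph_count(t)} morphs, {g!r} has {morph_count(g)}")
    return problems


text = "exeg-i monumentum aer-e perennius"
gloss = "implement\\PRF-1.SG monument.N:ACC.SG ore.F-ABL.SG lasting:CMPR:ACC.SG.N"
print(check_alignment(text, gloss))  # → []
```

A non-empty result points to the word where text and gloss diverge; it cannot, of course, tell whether the segmentation itself is correct.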

(1) illustrates the typical use of an IMG. The first line of (1) contains the L1 text line; the second line contains the IMG, and the third line contains an idiomatic translation into L2. Interlinear morphemic glossing is at the intersection of different communicative purposes. On the one hand, it is a kind of translation that accompanies the original. In this sense, it is comparable to the arrangement that one finds in synoptic editions of original and translation. On the other hand, it is a kind of linguistic analysis. In this sense, it competes with a fragment of a grammar. Its hybrid character leads to a number of problems and to different styles in interlinear morphemic glossing. The aim of the following treatment is a standardization of an aspect of linguistic methodology on the basis of widespread usage as developed in the 20th century. To the extent that linguistics is a science, its methods are susceptible to and in need of standardization. Interlinear morphemic glossing has to do with the representation of linguistic data,

Lehmann, Interlinear morphemic glosses

comparable in this respect to a phonetic transcription. Just as the latter has been successfully standardized by the IPA, so interlinear morphemic glossing should be standardized. This will be done in the present article in the form of a set of rules, which are listed in section 6.1. Such a standardization concerns only linguistic science. Linguistic data are often presented to a lay public for the purposes of education, entertainment or popularization of the achievements of our science. Here, too, some kind of interlinear glossing may be necessary. However, scientific formalism tends to damage rather than serve the good cause in such contexts. An example of how interlinear glossing has been handled in a book directed at a non-specialist public is quoted in the next section (Finck 1909). The present article is biased in favor of a more formalized treatment, on the assumption that it will be easier to derive a less formal representation from the proposals made here than the other way round. The treatment is, however, not fully formal, since it focuses on interlinear glossing in printed texts. The annotation of texts with markup languages for automatic retrieval raises the same conceptual problems but very different technical ones, which will not be dealt with here. Data are commonly quoted from sources in which they are already provided with an analysis. In linguistic publications, it has been widespread usage to quote data together with their IMG and their translation, even if their form or language differs from the one used in the quoting context. That is, such composite data representations have been treated as indecomposable blocks. Such scruples do not seem to be warranted. Primary data may be quoted and provided with the quoting author's analysis and translation (cf. Bickel et al. 2004:1).

1.2. Precursors

Interlinear glossing has precursors in the descriptive tradition which link it not with some kind of morphological representation, but with efforts to bring out the spirit of the language. The point there is not to provide a formal representation of a piece of linguistic data, but to make the language-specific construal of the world intelligible. To this end, literal translations were provided. For instance, G. Gabelentz (1901:460), in a passage arguing that the personal verb suffixes in Semitic languages are possessive pronouns, gives the following Arabic example: ya-kf-ka-hm 'he suffices you against them (literally he-suffices-your-their)'. The IMG is a latecomer in linguistics. Early grammars were intended as primers; the user was expected to work through them and learn the morphemes, so no glossing was necessary. Many scientific grammars, e.g. of Latin, Greek, Arabic etc., were meant for the initiated, who needed no glossing either (often even the free translations were omitted). Even comparative studies, historical or typological, left the analysis of the examples from diverse languages to the reader. H. C. Gabelentz, in the middle of a discussion of Lule, Osage and other languages, presents the following passage: "In Dakota (my grammar of the Dakota language 34), the 3rd person plural active serves to express the passive, even when an actor in the singular is to be understood, e.g. Jesus Jan e hi q ix Jordan watpa ohna baptizapi, 'Jesus came to John and they baptized him (instead of: he was baptized) in the river Jordan.'" (H. C. Gabelentz 1861:465) Here the reader who does not have the grammar mentioned on his desk is given no chance.

Pace Gabelentz, IMGs are needed when two conditions coincide: the level of analysis is above morphology, and the reader is not expected to be familiar with the languages under discussion (which is generally the case in typology, but not in descriptive or historical-comparative linguistics). W. v. Humboldt (1836[1963]:534) invented his own device to help the reader identify L2 meaningful elements with L1 morphemes. He gives the following example from Classical Nahuatl, numbering the L1 morphs and the corresponding German words:

1    2   3       4     5    6     7        8    9
ni-  c-  chihui  -lia  in   no-   piltzin  ce   calli
1    3      2   4    5    6     7     8    9
ich  mache  es  für  der  mein  Sohn  ein  Haus
'I make a house for my son'

While dispensing with the IMG proper, this method fails for L1 elements which cannot be rendered by L2 words. Besides the literal translation illustrated above, G. Gabelentz (1901) uses a variety of techniques. He also has interlinear glosses, as when he says: "The sentence Ich bin Dein Sohn ('I am your son') is in Maya:

a     meen  en.
Dein  Sohn  ich,"

(Gabelentz 1901:383)

and occasionally (e.g. Gabelentz 1901:400) he uses Latin as L2 in IMGs. Finck (1909) is one of the first linguistic publications that illustrate the working of a language with a sizable text provided with a free translation and an IMG. The following sentence from his Turkish text (Finck 1909:83) illustrates his glossing style:

xoda-da       esbb-n         dzmle-si-ni
Meister=auch  Kleider=(der)  Gesamtheit=ihre=die
ate-e     vur-up           yak-ar
Feuer=zu  werf=enderweise  verbrenn=end
'Der Meister warf nun sämtliche Kleider ins Feuer und verbrannte sie.'
('The master then threw all the clothes into the fire and burned them.')

As may be seen, these forerunners have no grammatical category labels yet. Finck glosses the Turkish suffix -n 'GEN' by Germ. der because this word displays a morphological trace of the genitive. Similarly, Turkish -up 'GER' is glossed by enderweise, perhaps the closest to a gerund that German can muster. This procedure is a tribute to the non-specialist readership at which the booklet aims, but it necessarily misrepresents the working of the language by attributing lexical meanings to its grammatical morphemes. It took a long time for interlinear morphemic glossing to become firmly established. In Bloomfield's Language (1933), examples abound, but they are presented like this: "Some languages have here one word, regardless of gender, as Tagalog [kapatid]; our brother corresponds to a Tagalog phrase [kapatid na lala:ki], where the last word means 'male', and our sister to [kapatid na baba:ji], with the attribute 'female'" (Bloomfield 1933:278). IMGs that fulfill most of the requirements set out below first appear in the 1960s. From the 1980s on, they become standard in publications dealing with languages whose knowledge is not presupposed. Editors and publishers increasingly require them even for languages like Latin, French and German that used to be well known to linguists. The development is towards glossing (not just translating) every language except English. This is apparently a symptom of a global development in which every language except English becomes exotic. Good IMGs are relatively costly, both for the scientist and for the typesetter. Authors and publishers are therefore not too eager to produce them (well). There is at least one software package on the market that aids the linguist in generating systematic IMGs for his texts: the interlinearizer that comes with the program Shoebox, from the Summer Institute of Linguistics (cf. Simons & Versaw 1988; Art. 168). Since IMGs are fairly recent in linguistics, they have seldom been treated by linguistic methodology. The first treatise on the present subject is Lehmann (1982). Subsequent work includes Simons & Versaw (1988), Lehmann et al. (²1994), Lieb & Drude (2000) and Bickel et al. (2004). They have been freely made use of in the present treatment.

1.3. Levels of representation

Interlinear morphemic glossing must be seen in the larger context of the representation of linguistic data and, more comprehensively, of the documentation of a language (cf. Lieb & Drude 2000). Against this background, an isolated example given in a descriptive context is a particularly constrained case of the edition and annotation (also called markup for technical purposes) of a piece of primary linguistic data for posterity. In other words, a general-purpose edition of a linguistic corpus is a kind of maximum model, subject to the full set of rules for explicitness, detail and elaboration, of which the quotation of an isolated example in the context of some grammatical discussion represents a subset delimited by considerations of feasibility, usefulness and the like. Every linguistic representation of some piece of raw data, even if it limits itself to a phonetic transcription, involves some linguistic analysis (Lehmann 2004). To that extent, no sharp boundary is to be drawn between the mere representation of data and their analysis. Bearing this in mind, we can speak of various levels at which linguistic data may be represented. Presupposing spoken language data, at least the following are relevant:
(a) raw data recording (video or audio tape),
(b) phonetic transcription of the utterance,
(c) orthographic representation of the utterance,
(d) morph(ophon)emic representation of the utterance,
(e) IMG of the utterance,
(f) free translation of the utterance into the background language,
(g) descriptive and explanatory comment on pragmatic or cultural aspects of the utterance.

This set may be supplemented by even more representations (cf. Lieb & Drude 2000). There may be a phonological representation distinct from both levels (b) and (d). There may be a syntactic representation, e.g. in the form of a labeled bracketing. And there may be a semantic representation instead of, or in addition to, representation (f). In such representations, the share of linguistic analysis is probably even greater than in the seven levels enumerated. The raw data have a temporal structure which is projected onto a spatial line in written representations. These representations are synchronized more or less closely. For instance, representation (f) generally matches L1 sentences, units of level (g) may be associated with L1 units of any size, and representation (e) may match representation (d) morpheme by morpheme. This has different consequences for the typographic layout. For instance, units of level (g) may be associated with the running text by making full use of a multidimensional display, while representation (f) may be placed in a lateral column at the same height as its original, as is usual in synoptic editions and as is also done in the example from Finck (1909) given in section 1.2. Other representations should be arranged in parallel lines, one beneath the other. For the purposes of descriptive and typological grammatical analysis and exemplification, the seven-level set is generally reduced to only three. What may be called the canonical trilinear representation of linguistic examples involves (i) a representation of L1 at one of the levels (b), (c) or (d), (ii) an IMG in L2 (level e), and (iii) an idiomatic translation into L2 (level f). An IMG will seldom be paired with a phonetic representation, because the latter serves phonetics, while an IMG serves grammar. They therefore form an unequal pair. If both are required, they will generally be mediated by another representation, a morphophonemic or orthographic one. It makes a difference for the glossing whether L1 is rendered in a morphophonemic representation or in conventional orthography. In the former case, the rules of orthography do not apply, and the linguist may arrange the representation in such a way that a biunique mapping onto the IMG is facilitated. In the latter case, morpheme boundaries may be obscured by the orthography, and there will be delimiters such as blanks, hyphens and punctuation marks which do not necessarily represent grammatical boundaries and may interfere with the glossing. However, the choice between an orthographic and a scientific representation of a text is generally a higher-order choice which cannot depend on glossing requirements. In particular, an example may be quoted unchanged from a primary source (think of Sanskrit examples). It may then not be possible to insert boundary symbols and the like in the L1 text.
Glossing conventions therefore have to be adapted for use with orthographic representations. If the first line representing the L1 text differs too much from a morphophonemic representation, then it is advisable to expand the canonical trilinear representation by an additional morphophonemic representation. It will then be this line that the IMG refers to. The two languages involved will be called L1 and L2 throughout. However, it should be clear that the relationship between them is asymmetric: L1 is the object language, L2 is the metalanguage. The symbols occurring in an IMG have a different status from the elements of the text line that they gloss: for present purposes, the L1 text line consists of morphs, while the IMG consists of names of L2 morphemes and of grammatical categories (cf. section 3.2). There can, thus, be no question of mirroring the structure of the L1 expression by the sequence of the L2 elements. Instead, an element in an IMG serves as a kind of mnemonic hint to the meaning or function of its corresponding L1 element.
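To illustrate the requirement that parallel representation lines run one beneath the other, here is a hypothetical Python helper that pads each word of the L1 line and of the IMG to a common column width and appends the free translation, yielding a plain-text version of the canonical trilinear representation. It assumes a monospaced font and blank-separated words; it is only a sketch, and printed publications would normally rely on the alignment facilities of their typesetting system instead.

```python
def format_trilinear(text: str, gloss: str, translation: str, gap: int = 2) -> str:
    """Align the L1 line and its IMG word by word in a monospaced layout
    and add the free translation, in single quotes, as a third line."""
    text_words, gloss_words = text.split(), gloss.split()
    if len(text_words) != len(gloss_words):
        raise ValueError("text and gloss must contain the same number of words")
    # Each column is as wide as the longer of the two words it holds.
    widths = [max(len(t), len(g)) for t, g in zip(text_words, gloss_words)]
    sep = " " * gap
    line1 = sep.join(t.ljust(w) for t, w in zip(text_words, widths)).rstrip()
    line2 = sep.join(g.ljust(w) for g, w in zip(gloss_words, widths)).rstrip()
    return "\n".join([line1, line2, f"'{translation}'"])


print(format_trilinear(
    "exeg-i monumentum aer-e perennius",
    "implement\\PRF-1.SG monument.N:ACC.SG ore.F-ABL.SG lasting:CMPR:ACC.SG.N",
    "I have executed a monument more durable than ore",
))
```

Refusing unequal word counts, rather than padding silently, keeps a misaligned gloss from being printed as if it were correct.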

1.4. Delimitation

The complete set of representations rendering an L1 text may be sufficient to derive a grammatical description from it (as postulated in Lieb & Drude 2000, 1.1). However, given its inherent restrictions, an IMG cannot by itself substitute for a grammar (or even just a morphology). Apart from the form of presentation, the most important substantive difference between a grammatical description and an IMG lies in the fact that the grammar deals with categories in the sense of classes, while the IMG identifies individual morphemes. For instance, a grammar treats the verbal category of aspect. An IMG contains a gloss for an individual aspect morpheme, e.g. PERF, leaving aside the question of whether this is actually an aspect morpheme or rather a tense morpheme, and also leaving unanswered questions concerning other members of the paradigm, let alone the construction and use of the PERF morpheme. Some of these kinds of information may be given in other representations, e.g. a syntactic representation. By the same token, the IMG does not indicate the syntactic category of a word form. For instance, the IMG of Germ. laufend is run:PART.PRS, showing that the form contains a morpheme whose function is to mark a present participle. The gloss is not run(part.prs) or anything of the sort, meaning that laufend is a present participle. While the latter is true, it is not the task of an IMG to give this information. Moreover, the type of morphological unit is not an object of an IMG. Thus, concepts like stem, root and prefix do not appear in IMGs. Such information may, to a large extent, be inferred from a proper IMG, since the gloss of a root differs typographically from the gloss of a grammatical formative. Similarly, an IMG cannot replace a lexicon. Here again, elements appearing in an IMG are but names of elements appearing in the L1 line. They are not meant to exhaust the meaning of such an element. Finally, an IMG is not meant to replace an idiomatic translation. Thus, it cannot and should not closely render the sense of an L1 item in the given context. An IMG is regularly accompanied by a free translation which fulfills precisely this purpose.

2. Prerequisites of morphological analysis


Interlinear glossing might appear to be just an elementary form of representing data. As a matter of fact, it presupposes a morphological analysis. The following analytic problems are directly reflected by the glosses.

2.1. Unmarkedness and zero morphemes

Where the L1 text contains a morph, the IMG contains an element rendering it. Where the L1 text contains nothing, the question of whether and how to render it is complicated by markedness theory. Germ. Herr may be glossed by master or by master(NOM.SG). Latin mone-t may be glossed by warn-3.SG or by warn(IND.ACT)-3.SG (according to R16). Moreover, one may believe that such forms contain zero morphemes and write: Herr-∅ master-NOM.SG, mone-∅-∅-t warn-IND-ACT-3.SG. All of these IMGs are formally correct. The choice among them is not a matter of appropriate glossing, but of morphological theory. For interlinear glossing, only the general rule R1 is relevant.

2.2. Allomorphy

If the L1 representation to be glossed corresponds to standard orthography, the analyst has no decisions to make in this regard. Otherwise, a good option for the representation (as well as for any writing system) is a morphophonemic representation which steers a middle course as far as allomorphy is concerned: phonologically conditioned allomorphy is resolved (ignored), morphologically conditioned allomorphy is not resolved (is rendered). The IMG, on the other hand, shows morphemes, not allomorphs. In order to understand what this implies, consider three examples. Modern Yucatec Maya expresses completive and incompletive aspect by suffixes on transitive and (one conjugation class of) intransitive verbs as follows:

valence \ aspect   completive   incompletive
transitive         -ah          -ik
intransitive       -∅           -Vl

Tab. 169.1: Aspectual suffixes in Yucatec Maya

For instance, t-u hats-ah PAST-SBJ.3 beat-CMPL ('he beat it'). One might think that the table contains four morphemes. Actually, however, transitivity is inherent in the verb stem and conditions allomorphy in the aspect suffix. The conditioning factor should not be part of the gloss. That is, the correct gloss for -ah is not TR.CMPL, but simply CMPL. See also 4.5. Yucatec Maya also has personal clitics that precede nouns as possessive cross-reference markers and verbs as subject cross-reference markers. If the noun or verb starts with a vowel, a glide is inserted in front of it. The choice between the two glides w and y is morphologically conditioned: if the pronoun is of first person singular or of second person, it is w; if the pronoun is of third person, the glide is y. For instance, in watan POSS.1.SG :wife ('my wife'), u yatan POSS.3.SG :wife ('his wife'). It is also possible to regard the noun forms modified by the initial glide as stem allomorphs, in which case the glide would not even receive the gloss by . However, in the third person, the pronominal clitic preceding the glide can be omitted. Thus, yatan by itself means 'his wife'. (Historically, the glide is indeed a reflex of an older cross-reference marker.) We therefore have u y-atan POSS.3 -wife ~ yatan POSS.3-wife, and we face the problem that the same element is not even a morph in one context, but a full-fledged morpheme in another. Whatever the correct morphological analysis may be, the IMG presupposes it and brings it out. Finally, consider gender marking in a language such as Latin (cf. Art. 48). Puellae bonae means 'good girls', pueri boni 'good boys'. Apart from motion (gender-changing derivation), gender is inherent in a noun stem. It is, however, recognizable by the declension suffixes. Nevertheless, the gloss of the morph in question does not contain the conditioning category.
The noun forms will be glossed girl.F:NOM.PL, boy.M:NOM.PL, implying that gender is a category of the stem, not of the suffix. What about the adjectives? Gender is not inherent in an adjective stem. We may therefore gloss them by good:NOM.PL.F and good:NOM.PL.M. Then one and the same element would be a morpheme on adjectives, but a conditioned allomorph on nouns, and therefore it would get two different glosses. Since two different glosses for the same element are not admissible in interlinear glossing (R4), this would entail that there are two homonymous declension suffixes -ae in Latin, which is obviously undesirable. We may stop this consideration here, since the problem is obviously not one of glossing, but one of morphological analysis. R2 codifies the convention that IMG expressions represent morphemes, not allomorphs.
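Rule R2 can be pictured as a lookup from allomorphs to morphemes. The Python sketch below, with hypothetical names, encodes the Yucatec Maya aspect suffixes of Tab. 169.1, treating the intransitive completive as a zero suffix written ∅. The label INCMPL for the incompletive is an assumption; the text itself only uses CMPL. Four allomorphs map onto two glosses, and the transitivity of the stem, which conditions the choice of allomorph, never enters the gloss.

```python
# Hypothetical lexicon for illustration: each allomorph points to the
# morpheme it realizes. INCMPL is an assumed label; the text only uses CMPL.
ASPECT_SUFFIXES = {
    "-ah": "CMPL",    # completive after transitive stems
    "-∅": "CMPL",     # completive after intransitive stems (zero suffix)
    "-ik": "INCMPL",  # incompletive after transitive stems
    "-Vl": "INCMPL",  # incompletive after intransitive stems
}

# Hypothetical stem glosses.
STEMS = {"hats": "beat"}


def gloss_verb(stem: str, suffix: str) -> str:
    """Gloss stem + aspect suffix by morphemes, not by allomorphs:
    the conditioning factor (transitivity) is deliberately not shown."""
    return f"{STEMS[stem]}-{ASPECT_SUFFIXES[suffix]}"


print(gloss_verb("hats", "-ah"))              # beat-CMPL
print(sorted(set(ASPECT_SUFFIXES.values())))  # ['CMPL', 'INCMPL']
```

Because the table maps to morpheme names only, a gloss such as TR.CMPL cannot arise from it.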

3. Principles of interlinear glossing


3.1. General

In the canonical trilinear representation, one L1 text line is matched by two L2 lines, the IMG and the free translation. This entails a division of labor between the two L2 representations. The free translation is the idiomatic semantic equivalent of the L1 line; the IMG is a representation of its morphological structure. There is consequently no need for the translation to be particularly literal, just as there is no need for the IMG to repeat the morphemes that appear in the translation. For instance, a polysemous L1 item will be rendered by its contextual sense in the free translation, but by its basic meaning in the IMG (R8). Parallelism between the two L2 lines is redundant; the canonical trilinear representation instead offers an occasion to provide additional information. In principle, the degree of detail displayed in an IMG depends on the purpose the example with its gloss is meant to serve. However, the author cannot foresee the purposes to which others will want to put his examples. A morphological detail that is not at stake in the current discussion may be essential for the argument another linguist may wish to base on the example. For this reason, the principle is to allow for as much precision and detail as seems tolerable (R3). The following rules specify the properties of a complete IMG. They do not exclude less detailed IMGs where these suffice. Cf. R13 and R23 for possibilities of underspecifying morphological structure. The IMG of a morpheme is some sort of name for it, a name that alludes to its meaning or function and is to that extent mnemonic or, at least, more helpful to the non-specialist than the L1 morph itself. It must therefore have a certain recognition value. R4, which is actually a tightening of R1, therefore requires that a given L1 morpheme receive the same gloss in all contexts and that, apart from full synonymy, no two morphemes of L1 receive the same gloss. These points will be elaborated in the following subsections.
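R4 lends itself to a consistency check over all IMGs of one publication. The Python sketch below is a hypothetical helper, assuming that each occurrence has already been identified with an L1 morpheme, and that homonyms carry distinct identifiers (cf. R7). It reports morphemes that received more than one gloss and glosses shared by several morphemes. Since the check cannot recognize full synonymy, which R4 tolerates, the second result is only a list of candidates for review.

```python
from collections import defaultdict


def check_r4(occurrences):
    """occurrences: iterable of (morpheme_id, gloss) pairs collected from
    all IMGs of one publication; homonyms need distinct morpheme_ids.

    Returns (inconsistent, shared):
    - inconsistent: morphemes that received more than one gloss,
    - shared: glosses used for more than one morpheme (possibly legitimate
      in the case of full synonymy, so these need manual review).
    """
    glosses_of = defaultdict(set)
    morphemes_of = defaultdict(set)
    for morpheme, gloss in occurrences:
        glosses_of[morpheme].add(gloss)
        morphemes_of[gloss].add(morpheme)
    inconsistent = {m: sorted(gs) for m, gs in glosses_of.items() if len(gs) > 1}
    shared = {g: sorted(ms) for g, ms in morphemes_of.items() if len(ms) > 1}
    return inconsistent, shared


inconsistent, shared = check_r4([("kin", "sun"), ("kin", "day"), ("-a", "DAT")])
print(inconsistent)  # {'kin': ['day', 'sun']}
print(shared)        # {}
```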

3.2. Glossing vocabulary

Glosses are taken from a language L2 that serves as a metalanguage of L1. In this article, L2 is based on a natural language, English, but with far-reaching deviations from natural language use. The glossing vocabulary consists of the following kinds of symbols: (a) vocables, namely (i) L2 morphemes and stems and (ii) grammatical category labels; and (b) boundary symbols. The difference between the two kinds of vocables is the following: morphemes and stems are taken from natural L2 vocabulary and are meant to be translation equivalents (in a sense to be made precise below) of L1 items. For instance, the notation Germ. Schreib-tisch write-table 'desk' is to be interpreted thus: the German word form Schreibtisch 'desk' consists of two morphs, of which schreib- means 'write' and tisch means 'table'. Grammatical category labels, on the other hand, are taken from scientific terminology and are meant to categorize the function of L1 items. For instance, Germ. schreib-en write-INF 'write (inf.)' is to be interpreted thus: the German word form schreiben 'write (inf.)' consists of two morphs, of which schreib- means 'write', while -en is an infinitive marker (that is, -en does not mean 'infinitive'; it is the German word Infinitiv which means 'infinitive'). To bring out this essential difference between the two kinds of IMG vocables, L2 morphemes and stems are written in ordinary orthography, while grammatical category labels are written in (small) capitals (R29). A grammatical category label represents (i.e. is the name of) the value of a grammatical category (the latter being taken, technically, as a parameter or attribute). For instance, the label ACC is the name of the value accusative of the morphological category case. Just as a grammatical category label is the name of a value of a grammatical category, what are called L2 morphemes and stems are actually names of L2 morphemes and stems. In the following, we will keep to the simpler way of speaking. The choice and use of vocables are treated in the following subsections; boundary symbols are treated in section 4.
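Because R29 separates the two kinds of vocables typographically, they can be told apart mechanically in a plain-text gloss. The following Python sketch, with hypothetical names, splits a gloss word at '-', '.', ':' and '\' (the boundary symbols occurring in the examples of this article; their full inventory is the subject of section 4) and treats any token written only in capitals and digits as a category label, everything else as an L2 morpheme or stem. This heuristic is an assumption: it would, for example, misclassify a cardinal numeral glossed by an Arabic number (section 3.3) as a label.

```python
import re

# Assumed boundary symbols, taken from the examples in this article.
BOUNDARY = re.compile(r"[-.:\\]")
# Heuristic: a token written only in capitals and digits is a category label.
LABEL = re.compile(r"[A-Z0-9]+")


def classify(gloss_word: str) -> list[tuple[str, str]]:
    """Split one gloss word into tokens and tag each as 'label'
    (grammatical category label) or 'vocable' (L2 morpheme or stem)."""
    result = []
    for token in BOUNDARY.split(gloss_word):
        if not token:  # e.g. a leading hyphen yields an empty token
            continue
        kind = "label" if LABEL.fullmatch(token) else "vocable"
        result.append((token, kind))
    return result


print(classify("write-INF"))
print(classify("implement\\PRF-1.SG"))
```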

3.3. Lexemes

An L1 lexeme is, in principle, glossed by an L2 lexeme (R5(a)). Sometimes more than one L2 word is necessary, for instance in Germ. fabulieren invent.stories. However, profusion is to be avoided. Adjectives that do not require a copula in predicative function are often glossed by adding a copula, e.g. West Greenlandic anurli 'windy' is glossed as be.windy in Fortescue (1984:65). This is only correct if a word of this class requires an attributor in attributive function. Otherwise it wrongly implies that there is no difference between adjectives and verbs, and it tends to obscure the fact that the language does not use a copula with adjectival predicates. L1 cardinal numerals are glossed by Arabic numerals. An issue arises for proper names, which are often not glossed at all. However, there is no room here for an exception to the general rule: a proper name is rendered by its counterpart in L2. Some proper names have conventional counterparts that are specific to L2; Engl. John corresponds to Germ. Hans, and Engl. Munich corresponds to Germ. München. These then appear in the IMG. Whenever there is no such language-specific convention, the counterpart of an L1 name is usually the same word in L2. If L2 is English, no problem arises for the form in which L2 lexemes are quoted in the IMG. In other languages, lexemes have a citation form determined by L2 conventions. If this is an inflected form, like the nominative for nouns or the infinitive for verbs, then it is excluded from an IMG by R5(b), and the bare stem must be used instead. The reason is that such a gloss would seem to imply that there is a nominative, or an infinitive, in the L1 line where actually just a stem is being glossed.

3.4. Grammatical formatives

L1 morphs are, in principle, glossed by citation forms of L2 morphemes. However, interlinear morphemic glossing crucially revolves around grammatical properties of L1 items. These will differ between L1 and L2. Even if, in a number of cases, the L2 stem appearing in a gloss has the same grammatical properties as the L1 morph that it represents, this cannot be expected and therefore cannot be relied upon. For instance, Latin eum could be glossed by Engl. him, and at the typological level, they do share a number of features. However, eum is accusative and thus cannot be an indirect object, while him is the form for both direct and indirect object. Therefore, grammatical items of L1 are generally not glossed by grammatical items of L2, but by a configuration of symbols taken from the scientific metalanguage and representing their grammatical features, i.e. by grammatical category labels (R6). Thus, Latin eum may be glossed by ANA:ACC.SG.M. No bound grammatical or derivational morphemes of L2 should appear in IMGs. Free grammatical morphemes may be used to render free grammatical morphemes. However, use of those in the second column of Tab. 169.2 is discouraged unless L1 happens to exhibit the same ambiguity as English:

word class            instead of                            use
copulas, auxiliaries  be                                    COP, PASS, PROG ...
                      have (except to mean 'possess, own')  PF, OBLG ...
prepositions          by                                    AG, ERG ...
                      with                                  INST, COM, ASSOC ...
                      for                                   BEN, DEST ...
                      as                                    EQT, ESS ...
                      from                                  ABL, DEL ...
                      to                                    DAT, ALL, DEST, TERM, INF ...
                      of                                    GEN, ASSOC ...
subordinators         that                                  COMP, SR (, D3)
                      if                                    INT, COND.SR
relativizers          that                                  REL
                      who                                   REL.HUM.NOM ...
                      which                                 REL.NHUM.NOM ...

Tab. 169.2: Free grammatical morphemes

Some morphemes are extremely deeply entrenched in the semantic or pragmatic system of the language and simply have no translation equivalent in L2. Two common ways out are a) to repeat the significans (i.e. the form) of the item in the gloss, and b) to indicate the class of the item instead of its meaning. Thus, we find the German modal particle eben glossed either as EBEN or as PTL. Both glosses are inadequate. If there is no translation equivalent in natural L2, then the linguist has a specialized metalanguage to describe such functions. For the sake of an IMG that is not devoted to modal particles in particular, a gloss like REAFF ('reaffirmed') will be fully sufficient and more helpful than either of the aforementioned. A gloss is a proper name of an L1 morpheme. It does not give information on the grammatical class of the morpheme in question other than what is implied by the name itself. If a gloss is ACC, one assumes that the morpheme belongs to the grammatical class of case morphemes. It is the task of the grammar to clarify whether or not this implication is correct in a particular case. The gloss will not be CASE.ACC or anything of this sort. For the same reason, the gloss of the perfective aspect is simply PFV and not PFV.ASP, and so on. From this it follows that the gloss will not be ASP either. In the literature, one frequently encounters glosses such as PTCL (particle), AGR (agreement) and ART (article). If L1 possesses only one particle, agreement morpheme (hardly imaginable) or article (this is possible), then these glosses are sufficient. In all other cases, this kind of gloss is not helpful because it does not give the information on the meaning or function of the morpheme that a gloss is supposed to give. Moreover, the whole glossing becomes inconsistent, as some glosses name particular morphemes, while others name the class a morpheme belongs to. More on this in section 3.9.1.
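The two kinds of class-level glosses criticized here can be flagged automatically. In the following Python sketch, the label sets are assumptions compiled from the examples in this section, and the function and set names are hypothetical. CASE and ASP name a category rather than a value and are always flagged, whereas PTCL, PTL, AGR and ART are flagged only when more than one morpheme in the data carries them, approximating the condition that L1 possesses only one such morpheme.

```python
from collections import Counter

# Assumed label sets, compiled from the examples discussed in the text.
CATEGORY_NAMES = {"CASE", "ASP"}              # name a category, never a value
CLASS_LABELS = {"PTCL", "PTL", "AGR", "ART"}  # tolerable only if unique in L1


def lint_glosses(glosses: dict[str, str]) -> list[str]:
    """glosses maps each L1 morpheme (by identifier) to its gloss.
    Returns warnings for glosses that name a class instead of a morpheme."""
    parts_of = {m: set(g.split(".")) for m, g in glosses.items()}
    # Count how many distinct morphemes carry each class label.
    class_use = Counter(
        part for parts in parts_of.values() for part in parts if part in CLASS_LABELS
    )
    warnings = []
    for morpheme, parts in parts_of.items():
        for part in sorted(parts):
            if part in CATEGORY_NAMES:
                warnings.append(f"{morpheme}: {part} names a category, not a value")
            elif part in CLASS_LABELS and class_use[part] > 1:
                warnings.append(f"{morpheme}: {part} is shared by {class_use[part]} morphemes")
    return warnings


for warning in lint_glosses({"-um": "CASE.ACC", "eben": "PTL", "ja": "PTL"}):
    print(warning)
```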

3.5. Ambiguity

Each morpheme of L1 should be recognizable by its gloss. The reader is supported in this task if glosses are consistent within one publication. It will rather confuse him if Yucatec Maya kin is glossed 'sun' in one place and 'day' in the next. Polysemy is resolved in the idiomatic translation; the gloss renders neither the contextual sense nor the full meaning range of an item. Naturally, this does not apply to homonymy. Homonymous L1 morphs represent different morphemes and therefore receive different glosses. This is stipulated by R7, which follows from R4.

If the senses of an item are reducible to a Gesamtbedeutung, then this should be used in the gloss (R8). For instance, the Turkish dative/allative suffix -a is glossed by DAT. The Gesamtbedeutung rather than the Grundbedeutung should appear in the gloss, because it has better chances to fit all the diverse contexts in which the item occurs. Sometimes there is no Gesamtbedeutung, or, if there is one, L2 has no term for it. In cases like Yucatec Maya kin 'sun, day', there are various alternatives. First, the Grundbedeutung may be used as the gloss; thus Yucatec Maya kin 'sun'. Second, if all the occurrences of a polysemous morpheme in a particular publication reflect the same (derived) reading, then generally no useful purpose is served by consistently glossing it with its basic meaning. For instance, all the occurrences of Yucatec Maya kin in a particular text might mean 'day'; then this would be the appropriate gloss. Finally, any kind of reduction may seem misleading. Then two or even more senses may be indicated in the gloss, separated by a slash, e.g. Yucatec Maya kin 'sun/day'. (2) illustrates the same convention.

(2) Korean
Toli-nŭn kae-hako cal non-ta.
Toli-TOP dog-ADD often/well play:PRS-DECL
'Toli likes to play with the dog.'
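Where glossed examples are kept in machine-readable form, the consistency requirement can be checked mechanically. The following is only a minimal Python sketch, not part of any established tool: it assumes a hypothetical input of (morpheme identifier, gloss) pairs in which homonyms have already been given distinct identifiers, so that only a single morpheme glossed in two different ways is reported.

```python
from collections import defaultdict


def inconsistent_glosses(tokens):
    """Report morphemes that are glossed in more than one way.

    `tokens` is an iterable of (morpheme_id, gloss) pairs. The identifiers
    are hypothetical and assumed to separate homonyms already (e.g. "kin"
    vs. "kin#2"), so homonyms with different glosses are not flagged.
    """
    glosses = defaultdict(set)
    for morpheme_id, gloss in tokens:
        glosses[morpheme_id].add(gloss)
    # Keep only morphemes that received two or more distinct glosses.
    return {m: sorted(g) for m, g in glosses.items() if len(g) > 1}


# Hypothetical corpus: kin glossed 'sun' in one example and 'day' in another.
corpus = [("kin", "sun"), ("kin", "day"), ("hako", "ADD"), ("hako", "ADD")]
print(inconsistent_glosses(corpus))  # {'kin': ['day', 'sun']}
```

Glossing kin uniformly as sun/day, as suggested above, would leave the report empty.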

Syncretism often involves extensive polysemy and/or homonymy. If it were to be made explicit in an IMG, then e.g. the gloss for Lat. ancillae would have to be maid.F:GEN.SG/DAT.SG/NOM.PL. This may be appropriate if the discussion in the context deals with syncretism. Otherwise, only the category actually required by the context may be shown, e.g.:

(3) Latin
ancillae orant
maid.F:NOM.PL pray:3.PL
'the maids pray'

In other words, in cases of syncretism the last two bullet points of R8 must be resorted to.

A whole paradigm of markers may be used in two clearly distinct functions. For instance, a set of cross-reference markers may combine with a verb to reference its subject, and with a noun to reference its possessor. Here again, the two alternatives mentioned are open: either gloss the verb markers by SBJ and the noun markers by POSS, or gloss them by SBJ/POSS in both positions (which is, actually, never done). A third alternative (one that is actually resorted to in Mayan linguistics; cf. Art. 170, section 6.1.2) is to coin a concept and a term for a paradigm that is used in these two functions and to use this term in the IMG.

3.6. Features and functions

As remarked in section 1.4, an IMG cannot take the place of a grammar. In particular, the grammatical category label that represents a morpheme in the gloss cannot possibly represent the full functionality of that morpheme. It can only serve as a mnemonic identifier for the reader. We just saw that the full polysemy of an item cannot be accounted for in a gloss. The same goes for functional information associated with a morphological position. If the slot filler is a verb agreement affix or cross-reference marker, then its meaning is in the sphere of person, number and gender. Consider conjugation endings as in Germ. lieb-e 'love-SBJ.1.SG', lieb-st 'love-SBJ.2.SG', lieb-t 'love-SBJ.3.SG'. The information that these suffixes cross-reference the subject is functional information associated with the morphological slot. It must be given in the grammar; the IMG may simply read lieb-e 'love-1.SG' etc. The same would apply, in principle, if the verb cross-references more than one of its dependents. Here, however, it has become customary to distinguish the referents of the cross-reference markers by indicating their syntactic function, as in (4).

(4) Swahili
ni-li-mw-ona m-toto
SBJ.1.SG-PST-OBJ.CL.1-see CL.1-child
'I saw the/a child'

The information that the initial prefix references the subject, while the one following the tense prefix references the direct object, must be contained in the grammar. The task of the gloss is to identify the particular element, not to specify the rules of its use. In this respect, adding functional information concerning the morphological slot itself (SBJ and OBJ in (4)) is a service to the reader that may be useful, but that also clutters up the gloss (cf. R3).

The distinction between morphological categories and syntactic or semantic functions is also relevant in the domain of case and valence. The frequent confusion among syntactic/semantic functions, cases and valence-derivational functions also manifests itself in glossing habits. One frequently encounters glosses such as Turkish ateş-in 'fire-POSS' instead of 'fire-GEN', ateş-e 'fire-IO' instead of 'fire-DAT', or '...-send-DAT...' instead of (5). The quality of the glossing reflects the quality of the morphological analysis.

(5) Swahili
Musa a-li-ni-andik-ia barua
Musa SBJ.CL.1-PST-OBJ.1.SG-send-APPL letter
'Musa sent me a letter'

3.7. Derived stems

The morpho-semantic structure of a derived stem may be completely regular and transparent, as in Germ. wolk-ig 'cloud-ADJVR' ('cloudy'), or it may be opaque, as in Germ. heil-ig 'salvation-ADJVR' ('holy'). If the discussion focuses on word-formation, then both of these words will be glossed as indicated. If the internal structure of stems is of no relevance, then it will not be shown in the L1 text line, and consequently the glosses can be reduced to 'cloudy' and 'holy', respectively. For opaque complex stems, morphological segmentation plus corresponding gloss often amounts more to etymology than to morphological analysis. It also needlessly obscures the correspondence of the gloss to the idiomatic translation. This should be borne in mind before one applies segmentation as a general principle in text editions.

In an ideal methodological situation, an IMG is taken from a lexicon, where the gloss constitutes one of the fields in the microstructure of each lexical entry. The German lexicon may contain, e.g., the three entries Huf 'hoof', Eisen 'iron' and Hufeisen 'horseshoe'. If the latter occurs in an L1 text, then it may either be analyzed or not. In the former case, Huf and Eisen will be looked up in the lexicon and matched by their glosses, while in the latter case Hufeisen will be looked up and glossed accordingly.
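The lexicon-based procedure just described can be sketched in a few lines of Python. The dictionary below is a hypothetical three-entry lexicon whose values stand for the gloss field of each entry; `gloss_word` is an illustrative name, and lowercasing segments before lookup is a simplifying assumption, not part of the method itself.

```python
# Hypothetical mini-lexicon; each value stands for the gloss field of an entry.
# Keys are lowercase so that a word-initial capitalized segment ("Huf") and a
# non-initial segment ("eisen") are looked up the same way.
LEXICON = {
    "huf": "hoof",
    "eisen": "iron",
    "hufeisen": "horseshoe",
}


def gloss_word(segments, boundary="-"):
    """Return the L1 form and its gloss for a word given as a list of segments.

    An unanalyzed word is a one-element list; an analyzed word lists its
    parts. Each segment is looked up separately, and the same boundary symbol
    is used in the L1 line and in the gloss.
    """
    l1 = boundary.join(segments)
    gloss = boundary.join(LEXICON[s.lower()] for s in segments)
    return l1, gloss


print(gloss_word(["Hufeisen"]))      # ('Hufeisen', 'horseshoe')
print(gloss_word(["Huf", "eisen"]))  # ('Huf-eisen', 'hoof-iron')
```

Whether a word is passed in analyzed or unanalyzed form remains the analyst's decision, made before lookup; passing boundary="+" yields the plus-sign notation for compounds discussed in section 4.1.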

3.8. Submorphemic units

There are two kinds of submorphemic units: parts of morphemes with a sound-symbolic value, and strings of phonemes inserted between morphemes for euphonic or similar reasons. The former kind is not generally subjected to morphemic analysis and may therefore be left out of consideration here. The latter kind may be illustrated by the second element in forms such as French a-t-il 'has he' and Germ. Weihnacht-s-gans 'Christmas goose'. If the submorphemic unit is not at stake in the context, then the first choice is to abstain from an analysis by regarding the submorphemic unit as part of a stem alternant: Weihnachts-gans 'Christmas-goose'. The second choice is to render the submorphemic unit by Ø, e.g. a-t-il 'has-Ø-he'. A euphonic submorphemic unit may also be glossed by EU instead of Ø.

3.9. Grammatical category labels

3.9.1. General

As was said in section 3.4, the gloss for a grammatical item is generally not a grammatical item of L2, but a grammatical category label (R6). For instance, Yucatec Maya yan is not rendered by 'be', but by EXIST, one reason being that L2 be is a copula, while Yucatec Maya yan is not. While this poses few problems for those categories for which the European grammaticographic tradition possesses terms, it does pose a problem for certain classes of semi-grammaticalized items such as function verbs and coverbs. Coverbs are words which have been grammaticalized from verbs to minor parts of speech, mostly adpositions. If they function as the latter, they may express a semantic role. In Mandarin, for instance, yòng has the lexical meaning 'use' and the grammatical meaning INSTR, as in (6).

(6) Chinese
Tā yòng shǒu zǒu lù.
he use/INSTR hand walk road
'He walks on his hands.'

This kind of problem is not solved by putting the lexical meaning in upper case (USE), since 'use' is neither a grammatical concept in L2 nor a term of the grammatical metalanguage. Applying R8 in such cases would imply opting in favor of the Gesamtbedeutung of the item, which here is the grammatical meaning. The gloss would then be INSTR (or some more language-specific grammatical category which may better suit this particular function). The problem remains, however, that the same word can occur as the sole predicate of a clause, in the meaning 'use' (e.g. tā yòng shǒu 'he uses his hand'). A gloss INSTR would hardly be intelligible there. The alternative of using only the Grundbedeutung 'use' in (6) and throughout would conflict with the principle that morphological analysis must be kept distinct from etymology. Here the third alternative offered by rule R8 may be resorted to, viz. providing both meanings in the gloss of each occurrence of the item, thus: yòng 'use/INSTR'.

An IMG identifies an L1 morpheme. It names a value, not a parameter. Mentioning the name of the generic category in the gloss instead of the specific value is nevertheless widespread usage. One finds both Japanese yom-i and yon-de glossed by read-CONV (converb), which hinders the reader in his attempt to keep the converb forms apart. One finds Onondaga waha-yekwa-hn:-nu 'he bought tobacco' glossed as TNS:he/it-tobacco:buyASP (Woodbury 1975:10), which is of no use for somebody studying the interdependence of incorporation with tense and aspect.

IMGs frequently contain labels that do not correspond to the principles introduced so far. Sometimes, elements without morphological status are separated and glossed. Sometimes, the parameter instead of the particular value of a grammatical category is identified. Sometimes, syntactic or semantic instead of morphological information is given. Here is an incomplete list of labels that have repeatedly been found in glosses but which should be avoided.

label  intended meaning    comment
A      transitive subject  in morphemic glosses, the abbreviation is ERG
ADV    adverb              specify meaning
AGR    agreement           specify agreement categories
AGT    agent               this is not a value of a morphological category
ART    article             only if it has no determinative properties
ASP    aspect              specify particular aspect
AUX    auxiliary           only if there is only one auxiliary morpheme in the language
CARD   cardinal            only if it is a morpheme or grammatical feature
CLF    classifier          this is a word class
CLT    clitic              this is neither a morphological category nor a value of one
EP     epenthetic          has no morphological status, should not be separated in the first place
EVID   evidential          specify particular evidential
PAT    patient             this is not a value of a morphological category
PREP   preposition         this is a word class
PTL    particle            this is (at best) a word class
TNS    tense               specify particular tense

Tab. 169.3: Labels to be avoided

3.9.2. List of grammatical categories and their glossing labels

No list of grammatical category labels can be complete. The list in Tab. 169.4 (which incorporates the list in Lehmann et al. ²1994) contains only the most widespread categories. Where more than one abbreviation is given, they are listed in order of preference. To the extent that these abbreviations are or become widespread, they acquire the status of linguistic abbreviations like NP, which need not be defined when used. If a publication uses labels not contained in the following list, it must explain them in an individual list of abbreviations.

Grammatical category labels are subject to two conflicting requirements: they must be both distinct and short. The former requirement takes precedence. It is, for instance, not possible to use COMP in one and the same publication to mean both completive and complementizer. The list in Tab. 169.4 avoids such clashes. However, in an individual publication that has nothing to do with complementation, the aspect may, of course, be abbreviated by COMP (instead of CMP(L), as in the list). Parenthesized parts of an abbreviation are only necessary if a distinctness conflict arises.

Tab. 169.4 contains only those terms that may appear in an IMG. In other publications, similar lists of terms for syntactic categories and functions and for semantic and pragmatic functions may be found. In the category column, "cross-reference position" means a morphological slot, usually on a verb, occupied by pronominal elements that agree with or refer to a dependent in a specific syntactic function. "Case" means a case relator that may take the form of a case affix or an adposition. Verb derivational morphemes get these glosses only if they are homonymous with nominal case relators.
value 1st person 2nd person 3rd person abessive ablative absolute absolutive abstract accusative action nominalizer active actor actor topic additive addressee-honorific addressee-humble adelative abbrev. 1 2 3 (PRV) (AVERS) ABL
ABSL ABS ABSTR ACC ACNNR ACT ACR A ADD

category person person person

comment

use privative and aversive local case nominal grammatical case or crossreference position nominal grammatical case deverbal nominal derivation voice; case or cross-reference position grammatical case or cross-reference position voice case honorification honorification local case from (= separative) free non-incorporated form of noun in ergative system

in active system

2HON 2HML
ADEL



adessive adhortative aditive adjectiv(al)izer admonitive adverbializer adversative affirmative agent nominalizer agentive alienable allative allocutive anaphoric andative animate anterior anticausative antipassive aorist applicative apprehensional assertive associative assumed attenuative attributor auditory augmentative auxiliary benefactive cardinal caritive causative circumstantial clamative classifier cohortative collective comitative common comparative complementizer completive conative concessive conditional
ADESS


local case use hortative use allative derivational or syntactic mood derivational or syntactic interpropositional relation opposite to negative deverbal nominal derivation possessive attribution morpheme local case honorification pronominal deictic tense deverbal verb derivation voice tense-aspect deverbal verbal derivation interpropositional relation modality adnominal case evidential deverbal verb derivation nominal evidential denominal nominal derivation case numeral deverbal verb derivation interpropositional relation nominal

(HORT) (ALL)
ADJR ADM ADVR ADRVS AFFMT AGNR AG AL ALL ALLOC ANA AND AN ANT ACAUS APASS AOR APPL APPR ASRT ASS(OC) ASSUM ATTEN AT AUD AUG AUX BEN CARD

whereas normally unmarked

to kind of addressee-honorific

relative tense = deagentive, blocking of actor argument perfective past (as opposed to imperfect) subtypes may be distinguished by APPL.REC, APPL.INST etc. lest subtype of declarative: high degree of commitment with,

links an attribute to the head

(PRV)
CAUS CIRC

if it is the only auxiliary root for if marked grammatically use privative in, by use exclamative followed by class identifier, e.g. HUM use hortative with, in the company of either masc. or fem.; cf. human and animate = SR normally = perfective

(EXCL)
CLF

(HORT)
COLL COMIT COMM CMPR COMP CMPL, CMP CNTV CONC COND

case gender degree of comparison subordinator aspect mood interpropositional relation interpropositional relation; mood evidential

although if; would

conjectural

CONJC



conjunctive connector, -ive consecutive construct converb continuous copula crastinal dative deagentive debitive declarative deferential definite deictic of 12 person deictic of 1st person deictic of 2nd person deictic of 3rd person delative demonstrative dependent verb form desiderative destinative
CONJ CONN CONSEC CONST

of non-finite predicate if there is only one so that construct state use gerund if there is only one tomorrow use anticausative use obligative normally unmarked ~ speaker-humble

interpropositional relation interpropositional relation nominal aspect/aktionsart tense grammatical case

(GER)
CONT COP CRAS DAT

(ACAUS) (OBLG)
DECL DEFR DEF D12 D1 D2 D3 DEL DEM

sentence-type honorification determination determination determination determination determination local case determination deverbal verb derivation local case; also on non-finite verb forms (= supine) pronominal deverbal verb derivation

down from use subjunctive to; if typically for human destinations, use benefactive will normally be DEF, INDEF, GNR, SPEC, NSPEC see also anticausative and introversive

(SUBJ)
DES DEST

determiner detransitivizer different subject diminutive direct direct evidential direct object directional distal distributive donative dual dual exclusive dual inclusive dubitative durative dynamic egressive elative emphasizer/emphatic equative

DET DETR DS DIM DR DIREV DO DIR DIST DISTR DON DU, DL DE DI DUB DUR DYN EGR ELAT EMPH EQT

denominal noun derivation voice evidential cross-reference position case or verb derivation determination nominal or verbal number number number mood aktionsart aktionsart aktionsart local case funct. sentence perspective 1. case; 2. predicative grammatical case or crossreference position case verbal mood

vs. inverse

towards; use AND and VEN for deictic directionals remote from deictic center auxiliary of benefactive construction

vs. stative out of e.g., class of pronoun as; feature/marker of adjective in nominal clause in ergative system as; see also transformative

ergative essive evidential exclamative

ERG ESS EVID EXCL



exclusive exist(ential) experiential extrafocal extraversive factitive

use dual exclusive, plural exclusive

EXIST EXPER EXFOC EXTRV FACT

grammatical verb aspect verbal deverbal verb derivation denominal/deadjectival verb derivation pronominal gender verbal

status of subordinate clause of cleftsentence transitivization by addition of undergoer A-FACT NP make NP A

familiar FAM F feminine finite FIN first person dual inclu- 12 sive FOC focus FRM formal frequentative FREQ FUT future GNR generic genitive GEN GER gerund gerundive (OBLG) HABIT habitual habitual-generic habitual-past HESIT hesitative HEST hesternal hodiernal future HODFUT HODPST hodiernal past HON honorific hortative HORT HUM human HML humble hypocoristic hypothetical illative immediate immediate/imminent future immediate past imperative imperfect imperfective impersonal impersonal passive inactive inalienable inanimate inceptive inchoative inclusive incompletive,
HCR HYP ILL IMM IMMFUT

if treated as a quasi-singular; otherwise dual inclusive funct. sentence perspective mood aktionsart tense determination grammatical case verbal aktionsart

multiple times on several occasions

verbal adverb or converb use obligative ~ customary use habitual, generic use habitual, past yesterdays past todays future todays past 1st person imperative comprises speaker-humble, addresseehumble, referent-humble

funct. sentence perspective tense tense tense honorification mood honorification affect mood local case tense tense

into specifier of other tenses

(RECPST)
IMP IMPF IPFV IMPR IPS INACT INAL INAN

use recent past mood tense-aspect aspect imperfective past; vs. aorist only if formally distinct from the specific persons passive without promotion to subject in active system possessive attribution morpheme or feature use ingressive N/A-INCH become N/A use dual inclusive, plural inclusive normally = imperfective

voice grammatical case or crossreference position nominal

(INGR)
INCH INCMP(L)

denominal verbal derivation aspect



noncompletive inconsequential indefinite independent indicative indirect object inessive inferential infinitive ingressive injunctive instructive instrument nominalizer instrumental intensive interrogative intransitive intransitive subject introversive inverse invisible irrealis iterative jussive lative ligature linker


INCONS INDEF INDEP IND IO INESS INFR INF INGR INJ

interpropositional relation determination mood mood cross-reference position local case mood or evidential verbal aktionsart mood deverbal nominal derivation case verbal sentence type verbal cross-reference position deverbal verb derivation usually verbal determination mood aktionsart mood local case nominal nominal

only if distinct from indicative

inside

(MAN)
INSTNR INST(R) INTS INT INTR S INTRV INV INVS IRR ITER JUSS LAT LIG LNK

use manner

often aktionsart particle or morphological category morpheme or grammatical category only if opposed to both A and P; use SBJ otherwise blocking of undergoer argument vs. direct

several times on one occasion 3rd ps. imperative or dependent mood to ~ from ~ via links subconstituents of a phrase, typically an NP; properly includes attributor

locative locative topic logophoric malefactive manner manner nominalizer masculine masculine personal medial medial mediative mediopassive middle motivative narrative near future negative neuter nominalizer nominative nonnon-finite non-future non-human non-masculine personal

LOC LT LOG MAL MAN MANNR M MHUM MED MEDV MEDT MEDP MID MTV NARR NRFUT NEG N NR NOM N NFIN NFUT NHUM NM

local case voice pronominal or verbal deverbal verb derivation case deverbal nominal derivation gender gender determination verbal case voice voice case tense tense gender deverbal nominal derivation or syntactic subordination grammatical case verbal tense gender gender

also on non-finite verbs

medial distance from deictic center verb form in a chain between, among; by means of excludes passive by; sometimes called causal after immediate future

see also the more specific ones

e.g. NPST



non-past non-plural non-singular non-specific non-visual non-volitional noun class n object obligative oblique obviative optative ordinal participle (marker) partitive passive past patient nominalizer patient topic paucal pejorative perfect perfective pergressive perlative place nominalizer pluperfect plural plural exclusive plural inclusive pluritive polite positional positive possessive postcrastinal postelative posterior postessive post-hodiernal potential precative predicative present preterite pre-hesternal primary object privative processive, -ual progressive prohibitive prolative proprietive prospective proximal
NPST NPL NSG NSPEC NVIS NVOL CLn OBJ OBLG OBL OBV OPT ORD PART PRTV PASS PST PATNR PT PAU PEJ P(R)F PFV


tense number number determination evidential verbal cross-reference position mood case person mood numeral verbal case voice tense deverbal nominal derivation voice number affect tense-aspect aspect local case deverbal nominal derivation tense number number number

<3 > 1; only if there is a plural for > 2 non-eye-witness where n is a number or a feature

vs. proximate

(PERL)
PERL LOCNR PLUP PL PE PI

use perlative through past or perfect of a past

(PL) (FRM)
POSIT

plural of a singulative; use plural use formal verbal possessive adjective, pronoun and cross-reference position tense local case relative tense local case tense mood mood nominal tense tense cross-reference position case denominal verb derivation aspect mood local case case or derivational category tense-aspect determination use affirmative not for an adnominal case relation; that is GEN or AT future after tomorrow from behind behind future after today for requesting predicative form use past past before yesterday without

(AFFM)
POSS POCRAS POSTEL POST POSTESS POHOD POT PREC PRED PRS

(PST)
PRHEST PO PR(I)V PROC PROG PROH PROLAT PROPR PROSP PROX

negative imperative along, by (way of) having, provided with going to; opposite of perfect near the deictic center



proximate punctual purposive quality nominalizer quotative realis recent past reciprocal reduplicative referent-honorific referent-humble referentive reflexive reinforcement relational(izer) relative relative remote remote past repetitive reportative resultative reversive same subject secondary object semelfactive sensory separative sequential simultaneous singular singulative sociative speaker-honorific speaker-humble specific speculative stative subelative subessive subject subjunctive sublative subordinator superdirective superelative superessive superlative super-lative terminative topic transformative transitive transitive patient
PRX PNCT

vs. obviative use destinative

person aspect or aktionsart deverbal nominal derivation marking indirect speech mood tense voice or pronominal honorification honorification case voice or pronominal nominal subordinative and/or pronominal

(DEST)
QUALNR QUOT RLS RECPST REC(P)

vs. irrealis = immediate past gloss by function

3HON 3HML
RFR R(E)FL

about use intensive in relative clause use referentive use distal only if distinct from iterative

(INTNS)
RELL REL

(RFR) (DIST)
REMPST REP RPRT RES RVRS SS SO SMLF SENS

tense aktionsart evidential aspect or aktionsart aktionsart cross-reference position aktionsart evidential interpropositional relation interpropositional relation number nominal verbal honorification honorification determination evidential aktionsart local case local case cross-reference position mood local case interpropositional relation

(ABL)
SEQ SIM SG SGT SOC

use ablative vs. simultaneous vs. sequential restricted vs. collective together

1HON 1HML
SPEC SPECL STAT SUBEL SUBESS SBJ SUBJ SUBL SR

from under under

(SUPL)
SUPEL SUPESS SUP SUPL TERM TOP TRNSF TR P

local case local case degree of comparison local case local case or aktionsart funct. sentence perspective case verbal cross-reference position

to under only for the single universal subordinator (that) use super-lative from above above to above up to becoming; dynamic counterpart of essive morpheme or grammatical category only if opposed to both S and A; use OBJ



otherwise only if opposed to both S and P; use ERG otherwise across only if distinct from paucal


transitive subject transitivizer translative trial undergoer unrestricted unspecified validator venitive verbalizer visible visual vocative volitional, volitive zero

A TRR TRNSL TRL UGR

cross-reference position deverbal verb derivation local case number cross-reference position person deictic verbal derivation determination evidential case verbal

(PL)
UNSPEC VEN VR, VBZ VS VIS VOC VOL

use plural unspecified argument of relational base use assertive, declarative

eyewitness

making no contribution to sentence meaning

Tab. 169.4: Grammatical category labels
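The interplay of the two requirements stated in section 3.9.2 (distinctness before brevity) and the role of parenthesized parts can be made concrete with a small Python sketch. It is only an illustration under simplifying assumptions: each category has a single label specification in the notation of Tab. 169.4 (e.g. R(E)FL), the function names are hypothetical, and the specification CMP(R) for comparative in the second call is invented for the example (the table lists CMPR).

```python
import re
from collections import Counter


def expand(spec):
    """Return the (short, full) forms of a label such as 'CMP(L)' or 'R(E)FL'."""
    short = re.sub(r"\([^)]*\)", "", spec)         # drop parenthesized parts
    full = spec.replace("(", "").replace(")", "")  # keep them, drop brackets
    return short, full


def choose_labels(specs):
    """Pick, for each category, the short label unless it clashes.

    `specs` maps category names to label specifications. Parenthesized parts
    are spelled out only for categories whose short forms coincide within
    the same publication; clashes among full forms are not checked here.
    """
    shorts = Counter(expand(spec)[0] for spec in specs.values())
    chosen = {}
    for category, spec in specs.items():
        short, full = expand(spec)
        chosen[category] = short if shorts[short] == 1 else full
    return chosen


print(choose_labels({"completive": "CMP(L)", "reflexive": "R(E)FL"}))
# {'completive': 'CMP', 'reflexive': 'RFL'}
print(choose_labels({"completive": "CMP(L)", "comparative": "CMP(R)"}))
# {'completive': 'CMPL', 'comparative': 'CMPR'}
```

The sketch expands only the labels involved in a clash, mirroring the rule that parenthesized parts are needed only when a distinctness conflict arises.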

4. Boundary symbols
4.1. Basic rules

Rules R1 and R4 guarantee correspondence between units in the L1 text and in the IMG. They do not, however, ensure that the vertical alignment works mechanically. Mechanical alignment is desirable in certain contexts such as automatic parsing. It can be guaranteed in a fully formalized representation, which would then take the form of a table (see Lieb & Drude 2000). In less formal situations, it cannot be fully guaranteed, because there may be good reasons not to insert morpheme boundaries in the L1 text while still representing each morph by a separate gloss (cf. R13). Correspondence of boundary symbols in the L1 and the IMG lines is therefore not generally an equivalence, but only an implication: boundary symbols in the L1 line are matched by corresponding boundary symbols in the IMG (R9). We will review the kinds of boundaries and their delimiters in turn.

The word boundary is shown by a blank in L1. This is repeated in the IMG, and conversely there is a blank in an IMG only if there is a corresponding blank in the L1 line. This particular rule (R10) is therefore stricter than R9. R10 prohibits two situations: a word being rendered by a sequence of two words, and a sequence of two words being rendered by one word. The first situation will be discussed in section 4.5.

Sometimes a sequence of two L1 units (words or morphemes) corresponds to one L2 unit. In principle, this situation should not arise in the IMG because each of the L1 units should have its own gloss. However, it is possible that the L1 units either have no meaning in isolation or mean something totally different from their combination, the latter being idiomatized. In such cases, glossing them separately might give a misleading impression of the workings of the grammar. When the bisected L1 unit forms an orthographic unit (e.g. a compound), one may simply dispense with the analysis (cf. section 3.7). For instance, instead of Germ. be-komm-en 'APPL-come-INF', one can write bekomm-en 'get-INF'. If the orthography requires a boundary, as in Yucatec Maya le kah 'when', the first choice is to gloss the items separately (in this case, DEF SR) and to leave the semantic interpretation to the idiomatic translation. The second choice is to indicate the semantic unity of the two L1 items typographically by replacing the blank by a boundary symbol that does not interfere with the orthography, e.g. by an underscore: le_kah 'when' (R11). If L1 orthography links the two items by another symbol that is also an IMG boundary symbol, as in Engl. vis-à-vis 'facing', no satisfactory solution is known.

Apart from special cases to be noted, the morpheme boundary is shown by a hyphen in L1 (R12). This is repeated in the IMG; and here again the converse applies, too. Apart from the vis-à-vis type exception, this does not pose any problems. It does, however, happen that the L1 text contains a combination of two morphemes, but no boundary is shown between them. Various motivations for this are conceivable: two morphemes may be fused in a portmanteau morph, the position of the boundary may be unclear or irrelevant, or the analyst may not want to disfigure L1 orthography with boundary symbols. In such cases, a colon in the IMG indicates a morpheme boundary that exists but is not shown in the L1 line (R13). The purpose of R13 is to allow the analyst to forgo a segmentation while still satisfying R1 and ensuring biuniqueness of the other boundary symbols. Several examples may be seen in (1). The colon is also used to render a portmanteau morph, e.g. French au 'DAT:DEF'. More on this in section 4.5.

Special symbols may be introduced to distinguish kinds of morpheme boundaries. For instance, the use of the plus sign to signal a boundary in compounding, as in German Weihnachts+gans 'Christmas+goose', is rather widespread; occasionally it is also found in derivation, as in German wolk+ig 'cloud+ADJVR' ('cloudy') (R14).
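Because the correspondence rules are stated in terms of symbols, part of them can be checked mechanically. The following Python sketch is a hypothetical checker, not a standard tool: it assumes whitespace-separated words and tests only R10 and the word-by-word correspondence of hyphens (R12) and clitic equal signs (R15), ignoring colons as R13 permits. Underscores (R11), plus signs (R14) and the vis-à-vis type exception are deliberately left out.

```python
def check_alignment(l1_line, gloss_line):
    """Return a list of boundary mismatches; an empty list means none found.

    Checks R10 (one gloss word per L1 word) and, word by word, that hyphens
    (R12) and clitic equal signs (R15) occur equally often in both lines.
    Colons in the gloss are not counted, since R13 allows them without an
    L1 counterpart. Underscores (R11), plus signs (R14) and other special
    symbols are not checked in this sketch.
    """
    l1_words, gloss_words = l1_line.split(), gloss_line.split()
    if len(l1_words) != len(gloss_words):
        return [f"word count differs: {len(l1_words)} vs {len(gloss_words)}"]
    problems = []
    for i, (word, gloss) in enumerate(zip(l1_words, gloss_words), start=1):
        for symbol in "-=":
            if word.count(symbol) != gloss.count(symbol):
                problems.append(
                    f"word {i} ({word} / {gloss}): '{symbol}' count differs"
                )
    return problems


print(check_alignment("ni-li-mw-ona m-toto", "SBJ.1.SG-PST-OBJ.CL.1-see CL.1-child"))  # []
print(check_alignment("le kah", "when"))  # ['word count differs: 2 vs 1']
print(check_alignment("le_kah", "when"))  # []
```

The check is one-sided in the way R9 is: a colon in the gloss never counts as a mismatch, whereas a hyphen or equal sign present on only one side does.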
No orthography distinguishes clitic boundaries from word and morpheme boundaries. If L1 is represented in conventional orthography, then the simplest solution for an IMG is not to distinguish them either. Thus French je le sais 'I know it' will be glossed as SBJ.1.SG DO.3.SG.M know.SG, while Latin itaque 'and so' will be glossed by so:and. If clisis is important or the L1 representation is non-orthographic, then the clitic boundary will be shown by an equal sign both in the L1 text and in the IMG, thus: ita=que 'so=and' (R15).

If a zero morph or morpheme is represented in L1 by Ø (cf. section 2.1), no special measures need be taken. If it is not represented there, then its gloss is enclosed in parentheses (R16), like this: Lat. timor 'fear.M(NOM.SG)'. In this example, a stem is accompanied by two (complexes of) grammatical category labels, M and NOM.SG. The first is separated by a period because it corresponds to an inherent feature of the stem. The second is enclosed in parentheses because it corresponds to a separate morpheme.
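To show how the symbols discussed so far divide up a single gloss word, here is a minimal Python sketch. `parse_gloss_word` is a hypothetical helper: it separates overt morph glosses at hyphens and equal signs and collects parenthesized zero glosses (R16). It keeps periods and colons inside a morph gloss as written and, as a simplification, does not record where within the word a zero morph stands.

```python
import re

ZERO_GLOSS = re.compile(r"\(([^()]*)\)")  # a parenthesized gloss, e.g. (NOM.SG)


def parse_gloss_word(gloss):
    """Split one gloss word into overt morph glosses and zero-morph glosses.

    Overt glosses are separated by '-' (morpheme boundary) or '=' (clitic
    boundary). Labels joined by '.' or ':' stay together, since they belong
    to a single L1 morph.
    """
    zeros = ZERO_GLOSS.findall(gloss)
    overt = ZERO_GLOSS.sub("", gloss)
    parts = [part for part in re.split(r"[-=]", overt) if part]
    return parts, zeros


print(parse_gloss_word("fear.M(NOM.SG)"))  # (['fear.M'], ['NOM.SG'])
print(parse_gloss_word("so=and"))          # (['so', 'and'], [])
print(parse_gloss_word("DAT:DEF"))         # (['DAT:DEF'], [])
```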

4.2. Discontinuity

Discontinuous units (words or morphemes) are like bisected units in that one semantic unit is represented by two expression units. However, they present the added difficulty that their parts are not adjacent, so the IMG has to make explicit what belongs together. For a discontinuous stem or affix, diverse solutions have been proposed in the literature. Among them is the proposal (Bickel et al. 2004) to repeat the same gloss under each part of the discontinuous item. However, this seems misleading, as the syntagmatic co-occurrence of synonymous L1 items is not at all rare (e.g. in hypercharacterization) and must be distinguished from discontinuity. An unambiguous solution for a circumfix is to set it off by angled brackets, like this: Germ. ge>lauf<en <PART.PRF>run 'run (perfect participle)' (R17).

Discontinuous words are rare. The first choice is to try and gloss each part independently, as done for the German circumposition um ... willen 'for' in (7).

(7) German
um   unser-es    Heil-es           willen
for  our-GEN.SG  salvation-GEN.SG  sake
'for (the sake of) our salvation'

The second choice is to treat them with the same formalism as circumfixes. Consider the case of preverbs. In several Indo-European languages, they may be separated from their host verb to yield a discontinuous verb stem. There are two options for glossing such discontinuous compounds. If the compounding is relatively transparent, one may prefer to give the preverb and the base each its own gloss. If the compound is completely lexicalized, this might be misleading, and it may be preferable to treat it as a discontinuous morpheme in the gloss, as in (8).

(8) German
es  hör>-t       jetzt  <auf
it  <stop>-3.SG  now
'it stops now'

Infixes, too, require a special boundary symbol to ensure that the root bisected by them is perceived as a unit. This is achieved by enclosing them in angled brackets, as shown in (9)-(10) (R18).

(9) Latin
vi<n>c-o
conquer<PRS>-1.SG
'I conquer'

(10) Indonesian
t<el>unjuk
<AGNR>point
'forefinger'

The gloss of a left-peripheral infix precedes the gloss of its host; the gloss of a right-peripheral infix follows it (Bickel et al. 2004).
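The bracket conventions of R17 and R18 differ only in the orientation of the brackets, which makes them easy to tell apart mechanically. The following Python sketch recovers the host of an infix and the stem of a circumfix from an L1 form. The function `parse_bracketed` is a hypothetical name, and only one bracketed element per word form is assumed.

```python
import re

# a<x>b : x is an infix in the host ab (R18)
INFIX = re.compile(r"^(?P<left>[^<>]*)<(?P<affix>[^<>]+)>(?P<right>[^<>]*)$")
# P1>x<P2 : P1 ... P2 is a circumfix around x (R17)
CIRCUMFIX = re.compile(r"^(?P<p1>[^<>]+)>(?P<stem>[^<>]+)<(?P<p2>[^<>]+)$")


def parse_bracketed(form: str) -> dict:
    """Classify a word form as infixed, circumfixed or plain and split it accordingly."""
    m = INFIX.match(form)
    if m:
        return {"kind": "infix", "affix": m["affix"], "host": m["left"] + m["right"]}
    m = CIRCUMFIX.match(form)
    if m:
        return {"kind": "circumfix", "affix": (m["p1"], m["p2"]), "stem": m["stem"]}
    return {"kind": "plain", "form": form}


print(parse_bracketed("vi<n>c-o"))    # infix 'n' in the host 'vic-o'
print(parse_bracketed("t<el>unjuk"))  # infix 'el' in the host 'tunjuk'
print(parse_bracketed("ge>lauf<en"))  # circumfix ('ge', 'en') around 'lauf'
```

Applied to an IMG, the same pattern cannot tell an infix gloss such as <AGNR> from the gloss of the first part of a discontinuous item such as <stop> in (8), since R17 and R18 both use angled brackets there; the L1 form is needed to decide. Word-spanning discontinuities like hör> ... <auf are outside the scope of this sketch and come out as plain forms.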

4.3. Reduplication

Reduplicative segments may have the same kinds of grammatical functions as affixes, and sometimes they are not easily distinguished from affixes formally. Therefore they must be glossed just like affixes, but at the same time they must be formally distinguished from affixes. This is achieved by providing the same kind of gloss for them as for grammatical formatives, but separating them by a tilde (R19; Bickel et al. 2004), as in (11)-(12).


(11) Ancient Greek
gé~graph-a
PRF~write-1.SG
'I have written'

(12) Yucatec Maya
ka~kas
INTNS~bad
'wicked'

4.4. Other morphological processes

Morphological processes not covered by the above conventions comprise transfixation, internal modification, metathesis, subtraction and suprasegmental processes (cf. ch. VIII). These are like infixation in not being peripheral to the base, but they differ from it in that the grammatical meaning in question is not associated with a single string of segments which, if subtracted, leaves the base. The notation recommended here distinguishes them from the other morphological processes, but not from each other. Such a morpheme can hardly be signaled in the L1 representation. In the IMG, its gloss follows the gloss of the base, separated by a backslash (R20). An example of transfixation is the Arabic broken plural, as in bujūt house\PL 'houses'. Apophony and metaphony, e.g. German säng-e sing\IRR-1/3.SG 'I/he would sing', and tone shift, as in Yucatec Maya hats beat\INTROV 'beat (unspecified object)', are treated in the same way.

4.5. Semantic and grammatical features

The gloss of a grammatical morph often consists of a set of symbols. They are separated by a period, as in Germ. Tisch-es table-GEN.SG (R21). The same rule applies in the situation mentioned in section 3.3, where an L1 lexeme is glossed by more than one L2 word. These, too, are separated by a period, as in Germ. fabulier-en invent.stories-INF.

Lexical stems fall into grammatical classes. Noun stems, for instance, have gender; verb stems have valence. If such grammatical categories are covert, this information is not deducible from (the gloss of) the lexical meaning. It therefore makes sense to represent it in the gloss of the stem. The Latin example puellae girl.F:NOM.PL of section 2.1 shows how this may be done for gender. The same would be possible with transitivity: instead of Yucatec Maya hats-ah beat-CMPL as shown in section 2.2, we might put beat.TR-CMPL. No rule beyond R3 and R21 seems necessary here.

The period between values of different morphological categories cumulated in one morpheme is dispensable between person, gender and number, provided the resulting letter sequence is unambiguous. Thus, Latin lauda-mus may be glossed as praise(PRS.IND)-1.PL or praise(PRS.IND)-1PL.

Sometimes the period is used as a general-purpose symbol to hide the lack of an analysis, including the function of the colon as regulated by R13. This is not advisable if, as is usually the case, the period is also used in the function regulated by R21. Given R21, the notation Lat. orant pray.3.PL would imply that orant consists of a single morph. An IMG should at least make the distinction between a morph and a grammatical feature of a morph. In other words, if the author knows the number and order of morphs in an L1 form, he should indicate them. If the author does not even know that much, he probably ought not to use the example. Still, in emergency situations, one may resort to R23, which allows IMG elements to be linked by an underscore without any implications for L1 morphological structure. This would allow for glossing orant as they_pray.
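The distinction between morph boundaries and feature separators drawn in this section can be made explicit by a small parser. The Python sketch below (the function name `parse_gloss_word` is hypothetical) splits one IMG word at every symbol that marks a morph boundary or a zero morph (hyphen, colon, plus, equal sign, tilde, backslash, parentheses, angled and square brackets) and then splits each morph gloss at the period into its components. It assumes that the IMG follows R13 and R21 strictly, and it does not handle the composite labels of R22.

```python
from __future__ import annotations

import re

# Any run of characters other than morph-boundary symbols, parentheses and brackets.
MORPH_GLOSS = re.compile(r"[^-:+=~\\()<>\[\]]+")


def parse_gloss_word(gloss_word: str) -> list[list[str]]:
    """Split an IMG word into morph glosses, each a list of period-separated components."""
    if "_" in gloss_word:
        # R23: an underscore-joined gloss makes no claim about L1 morphs.
        return [[gloss_word]]
    return [morph.split(".") for morph in MORPH_GLOSS.findall(gloss_word)]


print(parse_gloss_word("monument.N:ACC.SG"))  # [['monument', 'N'], ['ACC', 'SG']]
print(parse_gloss_word("pray.3.PL"))          # [['pray', '3', 'PL']]
print(parse_gloss_word("they_pray"))          # [['they_pray']]
```

The second call shows the point made about orant: read by R21, pray.3.PL is a single morph with three components. A multi-word gloss of one lexeme, such as invent.stories-INF, likewise yields one morph gloss with two components followed by INF.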

4.6. Composite categories

Two cross-reference categories may share a morphological slot, as in (13).

(13) Mayali
Kamak  kan-bolk-bukka-n               ke.
good   SBJ.2&OBJ.1-country-show-NPST  your
'It is good that you will show me your country.' (Evans 1997:400)

In principle, the case is analogous to one declension suffix showing both number and case. However, when actor and undergoer cross-reference is cumulated in one morpheme, sticking to R21 would lead to obscurity. Instead, the information on the two dependents should be separated by '&' or by '>' (R22). The greater-than sign has two advantages here: it is iconic, and it dispenses with function labels such as SBJ, OBJ, ACR, UGR (the prefix in (13) would simply be glossed 2>1). It has the disadvantage that the same symbol is also used for discontinuous and infixed material, which may lead to conflicts. This case must be kept distinct from a portmanteau morph, viz. when two cross-reference categories that generally each have their own morphological slot occasionally fuse in one morph. There R13 applies.
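The two notations permitted by R22 carry the same information in different ways: with '&', each component must name its function, whereas with '>' the order of the components does the work. The following Python sketch makes this explicit. The function `split_cross_reference` and its return format are hypothetical, and the sketch assumes that the label has already been isolated from the rest of the gloss, so that the conflict with the angled brackets of R17 and R18 does not arise.

```python
from __future__ import annotations


def split_cross_reference(label: str) -> dict[str, list[str]]:
    """Map each cross-referenced dependent of a composite label to its features."""
    if ">" in label:
        # Iconic notation: actor > undergoer, no function labels needed.
        actor, undergoer = label.split(">", 1)
        return {"actor": actor.split("."), "undergoer": undergoer.split(".")}
    result = {}
    for part in label.split("&"):
        # Assumption: each '&'-separated part starts with its function label, e.g. SBJ.2.
        function, *features = part.split(".")
        result[function] = features
    return result


print(split_cross_reference("SBJ.2&OBJ.1"))  # {'SBJ': ['2'], 'OBJ': ['1']}
print(split_cross_reference("2>1"))          # {'actor': ['2'], 'undergoer': ['1']}
```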

4.7. Constituency

The IMG operates at the level of morphology. The text may be represented at other levels in addition, if desired. Still, IMGs are used most frequently in publications on syntax, where not only morphological but also syntactic properties of the examples are at stake. Very often it suffices to identify one constituent in the example, for instance the prepositional phrase or the relative clause under analysis. In that case, showing constituency by brackets does no harm; on the contrary, it helps the reader scan the example. Thus in (14), the relative clause is identified by the bracketing.

(14) Yucatec Maya
le   mak     chowak  u       ho'l-e'
DEF  person  [long   POSS.3  head]-D3
'the person who has long hair'

In principle, this may be done either in the L1 line or in the IMG (it need not be repeated in both). However, since the IMG line is the one that contains the grammatical analysis, the bracketing seems more natural there (R24). An IMG may even be combined with a labeled bracketing; but above some rudimentary level, this will soon lead to illegibility.

5. Typographic conventions
IMGs obey a number of typographic conventions, all of which aim at facilitating the reader's task. First, if there are further lines of linguistic representation (cf. section 1.3), for instance one showing syntactic constituency or lines that show syntactic, semantic or pragmatic functions of the construction, then these follow the IMG, as stipulated in R25. Second, words (neither larger nor smaller units) of L1 are left-aligned with their glosses (R27). Further, since IMGs are generally longer than the L1 text they render, they are printed in a smaller typeface (R28), and grammatical category labels are abbreviated (R29). For comparison, here is an example from a publication which does not observe these rules (Monod-Becquelin 1976:138 on Trumai):

yyk letsi kate y hai-ts yy-ka-ke
avec du piment, je rends le poisson piquant (regarde)
// piment / avec / poisson / actualis. / 1re pers. erg. / piquant-causatif-marque d'adjectivisation //
[English: 'with pepper, I make the fish spicy (look)' // pepper / with / fish / actualiz. / 1st pers. erg. / spicy-causative-adjectivization marker //]

Furthermore, since IMG lines are not sentences, the relevant orthographic rules of punctuation, initial capitalization and syllabification do not apply (R30-R32).
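Where examples are set in a monospaced font, for instance in plain-text corpora or e-mail, the left alignment demanded by R27 can be produced mechanically by padding each word column to the width of its longer member. The following Python function is a sketch under that assumption; the name `align_interlinear` and the two-space gap are arbitrary choices, not part of the conventions. Proportional fonts require tab stops or a table instead, and because `len` counts code points, L1 text with combining diacritics would need a proper display-width function.

```python
def align_interlinear(l1_line: str, gloss_line: str, gap: int = 2) -> str:
    """Left-align each L1 word with its gloss by padding both to a common column width."""
    l1_words = l1_line.split()
    gloss_words = gloss_line.split()
    if len(l1_words) != len(gloss_words):
        raise ValueError("each L1 word needs exactly one gloss word")
    # Each column is as wide as its longer member plus the gap.
    widths = [max(len(w), len(g)) + gap for w, g in zip(l1_words, gloss_words)]
    top = "".join(w.ljust(n) for w, n in zip(l1_words, widths)).rstrip()
    bottom = "".join(g.ljust(n) for g, n in zip(gloss_words, widths)).rstrip()
    return top + "\n" + bottom


print(align_interlinear("um unser-es Heil-es willen",
                        "for our-GEN.SG salvation-GEN.SG sake"))
# um   unser-es    Heil-es           willen
# for  our-GEN.SG  salvation-GEN.SG  sake
```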

6. Summary
Instead of a prose summary, a list of the rules and symbols proposed follows:

6.1. Rules

6.1.1. Glossing principles

R1. With the exceptions specified below, there is a symbol or a configuration of symbols in the IMG if and only if there is a morph in the L1 text that it corresponds to.
R2. The IMG represents morphemes, not allomorphs. Therefore, the gloss of a grammatically conditioned allomorph does not contain the grammatical category that conditions it.
R3. An IMG should be as precise and detailed as tolerable. The limits of precision and detail are defined by practical considerations of complexity and intelligibility.
R4. There is a biunique mapping of individual L1 morphemes onto glosses.
R5. (a) An L1 lexeme is glossed by L2 lexemes. (b) L1 stems are glossed by L2 stems.
R6. The gloss of a grammatical morph is a configuration of grammatical category labels, each of which represents the value of a grammatical category. A grammatical morph should not be glossed by an L2 bound morpheme. It may be glossed by an L2 word if that word has the same function as the L1 morph.
R7. Homonymy is resolved in the IMG; polysemy preferably is not.
R8. The gloss of a polysemous L1 item should represent, in decreasing order of preference, its Gesamtbedeutung ('overall meaning'), its Grundbedeutung ('basic meaning'), the set of its senses, or its contextual sense.

6.1.2. Boundary symbols

R9. Apart from R30, there is a boundary symbol of a certain type in the IMG if there is a corresponding boundary symbol in the L1 text.
R10. More strictly, there is a blank, hyphen, plus, equal sign, angled bracket or tilde in an IMG if and only if there is an identical symbol in the L1 text corresponding to it.
R11. A word boundary is shown by a blank ( ). Two successive orthographic L1 words which must be glossed by one L2 word are linked by an underscore (_).
R12. A morpheme boundary is shown by a hyphen (-).
R13. A morpheme boundary not shown in the L1 text is indicated by a colon (:) in the IMG. This applies also to portmanteau morphs.
R14. A boundary in a compound stem, and possibly also in a derived stem, may be shown by a plus sign (+).
R15. A clitic boundary may be shown by an equal sign (=).
R16. The gloss of a zero morpheme or allomorph is enclosed in round parentheses (()).
R17. The string enclosed in a discontinuous L1 item P1 ... P2 is set off by inverted angled brackets (P1> ... <P2). In the IMG, P1 receives a gloss enclosed in angled brackets; P2 is not glossed.
R18. An infix is enclosed in angled brackets both in the L1 text and in the IMG. The gloss of a left-peripheral infix precedes the gloss of its host; the gloss of a right-peripheral infix follows it.
R19. A reduplicative segment is glossed like an affix (i.e. by a configuration of grammatical category labels) and separated from its source by a tilde (~).
R20. A grammatical meaning expressed by a non-segmentable morphological process (transfixation, internal modification, metathesis, subtraction, suprasegmental process) is not signaled in the L1 representation. Its gloss follows the gloss of the base, separated by a backslash (\).
R21. Elements of an IMG that represent components of one L1 morph are separated by a period (.).
R22. As a special case of R21, components of one L1 cross-reference morph that have distinct reference are separated by the ampersand (&) or, where no conflict with R17 and R18 arises, by the greater-than sign (>) for actor and undergoer cross-reference.
R23. An L1 word form whose morphological structure is not represented in the IMG may be represented by a set of symbols whose status as representing morphs or features is ignored and whose sequence has no implications as to L1. Such symbols that jointly correspond to an L1 word form are joined by an underscore (_).
R24. If constituent structure is to be displayed, square brackets ([]) can be inserted in the IMG.

6.1.3. Typographic conventions

R25. The IMG is in the line immediately below the corresponding L1 text line.
R26. The distance between an L1 text line and the line immediately preceding it is greater than that between it and the IMG line belonging to it.
R27. Each L1 word form is left-flush with the L2 word or complex of symbols rendering it. If such an arrangement is impossible, the following is a minimum requirement: if there is, in an IMG, an equivalent to an element of an L1 text line, it is contained in the line immediately below that line.
R28. The IMG is printed in a smaller typeface than the L1 text. If this is impossible, then at least grammatical category labels are in small capitals.
R29. Grammatical terms appearing in IMGs are abbreviated, without a period at the end, and set in (small) capitals.
R30. There is no punctuation in an IMG. Parentheses enclosing optional material in the L1 line are not repeated in the IMG, either (cf. R16).
R31. There is no sentence-initial uppercase in an IMG.
R32. There is no syllabification either in the L1 line or in the IMG.

6.2. Symbols

L1       IMG          meaning
x y      x y          word boundary between x and y
x_y      z            x and y are two orthographic words, but one lexical word
z        x_y          x and y jointly render z without morphological analysis
x-y      x-y          morpheme boundary between x and y
x+y      x+y          x and y form a compound or a derivative stem
x=y      x=y          x and y are joined by clisis
z        x/y          x and y are alternative meanings of ambiguous z
xy       x:y          morpheme boundary between x and y not shown in the L1 text
         (x)          x does not have a significans in the L1 text
a<x>b    ab<x>        x is an infix in ab
x>a<y    <xy>a        xy is a circumfix around a
z        x\y          y is a non-segmentable morphological process on lexeme x
z        x.y          x and y are semantic or grammatical components of z
z        x&y (x>y)    x and y are grammatical components of z cross-referencing two different dependents
x        [x]          x is a syntactic constituent
x        [x]Y         x is a syntactic constituent of category Y


7. References
7.1. Specialized literature

Bickel, Balthasar & Comrie, Bernard & Haspelmath, Martin 2004, The Leipzig Glossing Rules. Conventions for Interlinear Morpheme by Morpheme Glosses. Leipzig: Max-Planck-Institut für Evolutionäre Anthropologie
Lehmann, Christian 1982, "Directions for Interlinear Morphemic Translations". Folia Linguistica 16, 199-224
Lehmann, Christian 2004, "Data in Linguistics". Linguistic Review 21.2, 000-000
Lehmann, Christian & Bakker, Dik & Dahl, Östen & Siewierska, Anna ²1994, EUROTYP Guidelines. Strasbourg: Fondation Européenne de la Science (EUROTYP Working Papers)
Lieb, Hans-Heinrich & Drude, Sebastian 2000, Advanced Glossing: A Language Documentation Format. Berlin: Technische Universität (Working Paper)
Simons, Gary F. & Versaw, Larry 1988, How to use IT. A Guide to Interlinear Text Processing. Dallas, Tx.: Summer Institute of Linguistics (Revised edition, Version 1.1)

7.2. Sources of examples

Bloomfield, Leonard 1933, Language. New York etc.: Holt, Rinehart & Winston
Evans, Nicholas 1997, "Role or Cast? Noun Incorporation and Complex Predicates in Mayali". Alsina, Alex & Bresnan, Joan & Sells, Peter (eds.), Complex Predicates. Stanford: CSLI Publications, 379-430
Finck, Franz Nikolaus 1909, Die Haupttypen des Sprachbaus. Leipzig: B. G. Teubner [reprint of the unchanged 3rd ed.: Darmstadt: Wiss. Buchgesellschaft, 1965]
Fortescue, Michael 1984, West Greenlandic. London etc.: Croom Helm (Croom Helm Descriptive Grammars)
Gabelentz, Hans Conon von der 1861, "Über das Passivum. Eine sprachvergleichende Abhandlung". Abhandlungen der philologisch-historischen Classe der Königlich-Sächsischen Gesellschaft der Wissenschaften 8, 449-546
Monod-Becquelin, Aurore 1976, "Classes verbales et construction ergative en trumai". Amérindia 1, 117-143
Woodbury, Hanni 1975, "Onondaga Noun Incorporation: Some Notes on the Interdependence of Syntax and Semantics". International Journal of American Linguistics 41, 10-20

Christian Lehmann, Erfurt (Germany)
