Jonathan Owens* and Robin Dodsworth

Semantic mapping: What happens to idioms in discourse
Abstract: Idioms have generally played a supporting rather than a leading role
in research on figurative language. In Cognitive Linguistics, for instance, idioms
have been understood in terms of how they are embedded in conceptual metaphors
(Lakoff 1987, Women, fire, and dangerous things. Chicago: University of Chicago
Press; Clausner and Croft 1997, Productivity and schematicity in metaphors.
Cognitive Science 21. 247–282), while in the experimental psycholinguistic tradition
their role has been to challenge the basis of conceptual metaphor in
priming figurative language (Glucksberg et al. 1993, Conceptual metaphors
are not automatically accessed during idiom comprehension. Memory and
Cognition 21. 711–719; McGlone 2007, What is the explanatory value of a conceptual
metaphor? Language and Communication 27. 109–206). It is, moreover,
broadly assumed that the criteria defining the grammatical properties of idioms are
limited to their morphological and syntactic behavior (Nunberg et al. 1994,
Idioms. Language 70. 491–538). While the pragmatic properties of idioms have
been described informally (Glucksberg 2001, Understanding figurative language:
From metaphors to idioms (Oxford psychology series 36). Oxford: OUP), few
studies systematically contrast the behavior of nouns in literal vs.
idiomatic expressions in discourse. Using a battery of criteria developed
to study the discourse properties of subjects in spoken Arabic (Owens
et al. 2013, Subject expression and discourse embeddedness in Emirati Arabic.
Language Variation and Change 25. 255–285), we show that keyword nouns in
Nigerian Arabic differ significantly according to whether they are idiomatic
or literal. The conclusion rests on a statistical analysis of 1403
tokens drawn from a large corpus of natural Nigerian Arabic texts. Nouns in
idiomatic expressions are opaque to discourse in a way that those in literal ones are
not. To explain the statistical results, we argue that idioms partake in a semantic
mapping which incorporates the noun and its collocate in the idiom into a
word-like unit, rendering it largely invisible to subsequent discourse. Since
Nigerian Arabic idiomatic nouns, as is shown, display no clause-internal syntactic
constraints, exhibit no cross-clausal syntactic dependencies, and show no
significant interactions with the possessive pronouns which ostensibly mark
the discourse argument of the keyword they are suffixed to, we conclude
that the mapping is semantic in nature. Apart from illustrating basic facts
obtained via elicitation, the entire argument rests on an examination of nouns
in actual spoken discourse. The article establishes that large corpora coupled
with multivariate statistical treatment contribute directly to understanding
semantic factors that are difficult to evaluate via direct elicitation or the examination of
individual examples, in this case the sensitivity of cross-clausal referentiality
to idiomatic contextualization.

Keywords: idioms, discourse, referentiality, semantic mapping, Arabic

1 The problem
In Nigerian Arabic (NA) there is a class of nouns in the configuration [Pssd N Pssr],
with Pssr = N or possessor pronoun, whose referential properties within the clause
contrast with their behavior across clauses.1 Within the clause, any
reference to a Pssd N in the unit [Pssd N Pssr] must be to the possessed head
N. The possessor may occur in situ, or be extraposed to another clausal function,
usually a topic, leaving behind a pronoun possessor trace. In (1) below, naas
(> naaz via assimilation) is extraposed to Topic function, and its possessor
position is marked by the possessor pronoun trace -hm (> -tm via assimilation)
on iid (> iit via assimilation, i. e., < iid-hm). Clause-internal reference is required,
for instance, in S-Predicate agreement (Predicate here = verb or adjective), as in
(1), or in topicalization, as in (2).

(1) an-naaz iit-tm yaabs-e b-door le-hum deen

DEF-people hand-their dry-F 3-want for-them loan
Poor people require a loan. (lit. people whose hand is dry)
(TV 70a, Gulumba)

1 Abbreviations and phonetic symbols are standard, except DEF = definite,
ID = ideophone, and // = voiced emphatic implosive stop. The transcription makes no attempt
to separate out epenthetic vowels, which are parsed with the following morpheme.
Examples from texts are identified by a key which can be used to find the audio and transcript
in the online database listed in the bibliography.

Semantic mapping and idiomaticity

(2) naas-na raas-um fat-

peoplei-our head-theiri open-theyii.3.M.SG
Our people, they taught them. (lit. our peoplei theiri head theyii opened it)
(IM 14)

Example (1) is a straightforward case of agreement between the subject iid- 'hand.F'
and the adjectival predicate yaabs-e 'dry.F'. In (2) the object noun raas is topicalized,
i. e., moved to a preverbal position, and a cross-referencing 3MSG pronoun obligatorily
picks it up again. This pronoun is realized as stress on the plural subject suffix -o. Note that
although the meaning in (1) pertains to the topicalized possessor of iid- 'hand',
i. e., the people are the poor ones (see note 9 below), this noun has no subject-like
properties in the clause-internal syntax. In the current examples, for instance, it
is the possessed head noun (iid in [1], raas in [2]), not the Topic naas (nor the
pronoun possessor trace -hm), which determines co-referential agreement.
Across clauses, the same construction [Pssd N Pssr] allows two possibilities,
illustrated here by elicited data. Either the continuing reference is
expressed by a pro-form agreeing with the head noun, or a continuing pronominal
reference agrees with the possessor. Thus, (2) could be continued synonymously
in either of two ways.

(3) naas-na raas-um fat- amm ma bigi helu

people-our head-their but not became sweet
Our people taught them, but they didn't get smart (lit. it [= their head] did
not become sweet).

(4) naas-na raas-um fat- amm ma big-o hel

people-our head-their but not became-PL sweet
Our people taught them, but they (lit. they) didn't get smart.

In (3) the subject of the 3MSG verb bigi 'become' continues raas 'head', while in
(4) the plural big-o continues the possessor of raas, -um 'their', which refers back to
naas-na 'our people'. Note, however, that in both cases the subject of the second clause
refers to the people (see below).
Nouns in non-idiomatic expressions do not enjoy the cross-clausal freedom
of choice which idiomatic [Pssd N Pssr] does. For instance, given (5)

(5) bagar-na dassee-naa-hin fi l-buyuut

cattle-our stick-we-them.F in DEF-houses
As for our cattle, we put them in the houses (at night).
(IM 79)

any further reference to bagar will be to the cattle, and any further reference to 'our',
the possessive pronoun on bagar-na, will be to the 1PL referent. In this text
extract, for instance, the discourse continues immediately (in [6]) with a reference
to 'we', in the form of a question directed to the speaker (-tu 'you.PL'), and to bagar,
repeated as a separate token. With literal nouns, continuing reference is essentially
one-to-one with its antecedents.

(6) waddee-tu al-bagar

sent-you.PL DEF-cattle
Did you.PL send the cattle?

Nouns in idiomatic expressions, therefore, stand apart, at least in theory. In a
literal [Pssd N Pssr] construction, cross-clausal reference to N and to the possessor
always involves disjoint referents. In an idiomatic [Pssd N Pssr] construction,
agreement in a following clause can target either N or Pssr, but in either case the
referent is the same, as illustrated in (3) and (4) above.
If an essential contrast between idiomaticity and literalness resides in inter-clausal
referential properties, then ascertaining their actual behavior
requires corpus study. Elicitation has taken us as far as it can by establishing
that the question remains open. The basic question is: given the
[N(idiomatic) Pssr] configuration, how does the following discourse treat these two
elements? Is it the head noun or the possessor that tends to create
cross-clausal referential coherence? What happens to each part in discourse?
To answer this question, a large corpus of Nigerian Arabic was
tagged for the relevant parameters and subjected to statistical analysis, as described
in Section 3. The results are presented in Section 5, and their theoretical relevance
is addressed in Sections 6 and 7.

2 What are nouns in idiomatic expressions?

The basis of the current study is a small set of nouns, frequent in terms of tokens,
all of them body parts: raas 'head', gab 'heart', iid 'hand', een 'eye',
and baun 'stomach'. These have both idiomatic and literal meanings. The
nature of idiomaticity is a far-reaching topic, some of whose key elements will
be discussed in Sections 5–7, when the data and their theoretical treatment are
considered in greater detail. For present purposes, four properties of idioms are
essential. Idioms:

are collocationally unpredictable
are collocationally fixed
are based on metaphoric and metonymic extension (Riemer 2005)
are idioms only as collocations

Collocational unpredictability and fixedness go hand in hand. 'Educate,
enlighten' is formed from the keywords fata(h) 'open' + raas 'head' in NA. Keywords are
stipulated lexemes necessary for the idiomatic meaning (see Section 3.1). Even
allowing for the arbitrary association of 'educate' with 'open head', there seems
to be no a priori reason why 'educate' as in (2) could not also be expressed by
kasar raas! 'break a head' (! = semantically odd in the intended meaning), or
'poor' in (1) by iida muakkaa! 'his hand is dry' (akka 'be dry, overripe'). There
are historical, contact-based reasons for how the collocations came about (few
of the many idioms on which this paper is based are found as idioms in
varieties of Arabic outside the western Sudanic dialect area; Owens 2014),
but these reasons merely push the explanation for why only certain lexical
collocations work as idioms down the road into a larger linguistic
Sprachbund. The collocations are simply arbitrary, and they are limited to the
collocations given. In the case of the body-part nouns on which this paper focuses,
idiomaticity, the move from concrete to abstract, results from metaphoric and/or
metonymic extension, as will be outlined in Section 6.
Fixedness is a collocational fact, and it can be ascertained linguistically in more than
one way. First, by a behavioral criterion, all of the idioms studied are
readily recognizable and explicable to our language consultants. This point has
more than theoretical significance. Researchers working on an Arabic variety whose
idiomaticity differs considerably from that of other varieties of Arabic are
confronted by meanings that are opaque to them. Each idiom had to be, and
could be, explained much in the way a new lexeme is explicated. For instance,
bi-tallif galb-i 'he spoils my heart' = 'angers me' (see [24] below) was immediately
paraphrased as zaal-ni 'he angered me'.2 Second, fixedness is reflected in
important distributional patterns in our corpus. We will present statistics and

2 An anonymous reader correctly cautions against attributing too great an importance to the
opinions of native speakers. The thrust of our observation is rather that, in an ideal
world with unlimited resources for testing linguistic hypotheses, one could well imagine
setting up psycholinguistic experiments (e. g., reaction-time tests) to determine, for
instance, the extent to which a word such as galb 'heart' or tallaf 'spoil', alone, in non-idiomatic
collocations, in idiomatic collocations, etc., cues zaal 'anger', or vice versa.

data in Sections 5 and 6 showing that the idiomatic meanings (a) are pervasive
and (b) have a special status in the syntax of the language.
Finally, an obvious but important point to bear in mind is that the body-part
nouns treated here are idiomatic only in the relevant collocations. Alone,
raas is simply a physical head, and iid a physical hand. Any analysis of raas as idiomatic,
qua individual noun, necessarily abstracts away from its idiomatic realization in its
obligatory collocational context.

3 Measuring discourse embeddedness

The text-based study of discourse embeddedness was established through studies
such as Prince (1981), Givón (1983, 1992), Chafe (1994), and Ariel (1990,
1994). The basic unit of analysis is information status, characterized variously in
terms of new/old, shared knowledge (Prince), accessibility (Ariel), or
aboutness (Chafe).3 Information status can be characterized in binary terms,
new vs. old for instance, or in graded terms, usually
represented as a set of values. Prince (1981: 237), for instance, has a hierarchical
three-point scale, and Ariel (1994: 30) an eighteen-point scale. Referring
expressions (NPs, Ns, Pros; see Chafe 1994) have an information status
defined by their role in a given text. The first explicit quantification of referring
expressions in texts was probably that of the studies in Givón (1983), where a noun, NP, or
pronoun was characterized by so-called look-back and persistence values.
Look-back measures how far back in a text an entity was last referred to, and
persistence measures how long an entity continues to be referred to in the following text.
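The two measures can be illustrated with a minimal sketch. For illustration only, it assumes that a text has already been segmented into clauses and that each clause is represented as a set of referent labels. The function names and default window sizes are our own hypothetical choices. They are not Givón's (1983) exact operationalization or the coding procedure of this study. Persistence is read here as an uninterrupted run of mentions in the following clauses.

```python
from typing import Optional, Sequence, Set


def look_back(clauses: Sequence[Set[str]], index: int, referent: str,
              max_window: int = 20) -> Optional[int]:
    """Clauses back to the most recent earlier mention of `referent`.

    Returns None when no earlier mention falls inside `max_window`
    clauses, i.e. the referent is treated as new at `index`.
    The window size is an illustrative assumption.
    """
    for distance in range(1, max_window + 1):
        earlier = index - distance
        if earlier < 0:
            break
        if referent in clauses[earlier]:
            return distance
    return None


def persistence(clauses: Sequence[Set[str]], index: int, referent: str,
                max_window: int = 10) -> int:
    """Consecutive following clauses (up to `max_window`) mentioning `referent`."""
    count = 0
    for clause in clauses[index + 1: index + 1 + max_window]:
        if referent not in clause:
            break  # the chain of mentions is interrupted
        count += 1
    return count


# Toy clause sequence with invented referent labels.
clauses = [{"people", "head"}, {"people"}, {"cattle", "we"}, {"we", "cattle"}]
print(look_back(clauses, 1, "people"))    # 1
print(look_back(clauses, 2, "cattle"))    # None: first mention
print(persistence(clauses, 0, "people"))  # 1
```

Short look-back values are what the literature associates with pronouns, and longer or undefined ones with full NPs. This is the contrast taken up in the next paragraph.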
Numerous studies (Brown 1983; Ariel 1994; Chafe 1994) have shown that
there is an inherent relation between information status and the syntactic unit or
lexical item encoding it. Pronouns, for instance, are inherently given, and they
typically have very short look-back values. Definite NPs may be either new (in
some sense of new) or given, and they have longer look-back values. Word order
often correlates with information status as well. Many languages, for instance,
represent new subjects in VS order, with SV used for discourse-old subjects
(Payne 1992; Birner 1994). Owens et al. (2013) emphasize the inherent lexical
coding of information status in pronouns vs. nouns and how this impinges on
word order (SV/VS) in one variety of Arabic (Emirati Arabic).

3 Sentence or clause-based models of information status are developed in Levinson (1987) and
Lambrecht (1994).

Essentially, therefore, the discourse status of a referring expression can be
seen as a function of its discourse-given information status and its grammatical
value. Grammatical value is a cover term for syntactic type (type of NP, for
instance) and lexical class, which may itself have a semantic basis. It is these
two factors, discourse status and grammatical type, that we use to define
the discourse behavior of the class of nouns in idiomatic expressions in Nigerian
Arabic. While it is relatively easy to determine the discourse status of a given token,
gaining an overview of how discourse categories actually behave (i. e., moving
beyond the logical possibilities provided by direct elicitation, as described in
Section 1) requires examining their behavior in a relatively large corpus of actual
speech. This in turn requires an adequately large tagged set of referring expressions,
which feeds into a statistical treatment that permits the relevant generalizations
to be made. Without a statistical treatment, the relative effects of discourse status
and grammatical status (both lexical and syntactic) are impossible to discern in a large corpus.
The methodology followed and the variables used in this study are based on
Owens et al. (2009, 2010, 2013). Referring expressions are tagged for information
status and for syntactic and lexical properties. One caveat needs to be stated:
the statistical treatment imposes an inherent constraint on the
degree of analytical delicacy possible. This is because multivariate analysis is
possible only when the factors have an adequate number of representative tokens.
3.1 The keyword, possessor pronoun and the tagging profile

Referring expressions are either nominals or pronouns. The research focuses on
two different properties of the referring expression which, as noted above, is
termed a keyword. In the case of idioms, the keyword is one of the constituent
body parts which make up the idioms. These are discussed in greater detail in
Section 3.3, where it will be seen that one class of keywords is literal only, while a
second is either literal or abstract. In (1), the noun iid- is a keyword. The two
properties of the keyword are (a) its own discourse and grammatical properties,
and (b) whether or not it is possessed by a pronoun. Since a possessive pronoun
is itself a referring expression, it is treated as a variable separate from the
keyword it is suffixed to. In (1), the possessor -m (< -hum) is the second
referring expression studied.
The interest in these two elements follows from the discussion in Section 1.
We assess the discourse embeddedness of both keywords and their possessive
pronouns via the following set of properties.

3.2 The parameters

The variables used in this study are summarized in this section. They are divided
between information status variables and grammatical variables, which include
syntactic and lexical-semantic properties.

3.2.1 Information status variables

(7) Previous mention of the possessor pronoun referent:

new
same as referent in previous clause
in preceding context, but not in immediately preceding clause
situational

A referring expression can be new, i. e., not previously mentioned in the discourse;
it can be the same as a referent in the immediately preceding clause; or it
can have been mentioned in the previous discourse, but not in the immediately
preceding clause. 'Situational' describes a referent that is inferable from
the speech situation itself; a first- or second-person pronoun is a typical instance of
this value. The 'same as' and 'in preceding context' values distinguish whether the
referent was mentioned in the immediately preceding clause ('same as') or only in
the preceding but non-adjacent discourse.
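The four values of (7) can be sketched as a small classifier over the same kind of clause representation (sets of referent labels). This is a hypothetical illustration, not the study's coding tool. The order of the checks is our assumption. Textual mention is tested before the situational list, so a 1PL referent already mentioned in the previous clause comes out as 'same as', in line with the discussion of (8) below. The labels in `SITUATIONAL_REFERENTS` are also invented.

```python
from enum import Enum
from typing import FrozenSet, Sequence, Set


class PreviousMention(Enum):
    """The four values of variable (7)."""
    NEW = "new"
    SAME_AS = "same as referent in previous clause"
    PRECEDING_CONTEXT = "in preceding context, but not in immediately preceding clause"
    SITUATIONAL = "situational"


# Hypothetical labels for referents given by the speech situation
# (first- and second-person referents).
SITUATIONAL_REFERENTS: FrozenSet[str] = frozenset(
    {"speaker", "addressee", "we", "you.PL"})


def previous_mention(clauses: Sequence[Set[str]], index: int,
                     referent: str) -> PreviousMention:
    """Code the referent of a possessor pronoun in clause `index` for (7)."""
    # Prior textual mention is checked first (an assumption).
    if index > 0 and referent in clauses[index - 1]:
        return PreviousMention.SAME_AS
    # Earlier clauses, excluding the immediately preceding one.
    if any(referent in clause for clause in clauses[:max(index - 1, 0)]):
        return PreviousMention.PRECEDING_CONTEXT
    if referent in SITUATIONAL_REFERENTS:
        return PreviousMention.SITUATIONAL
    return PreviousMention.NEW


clauses = [{"cattle", "we"}, {"houses"}, {"we"}]
print(previous_mention(clauses, 2, "cattle").value)
```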
'Same as' is a good example of why values must sometimes be collapsed
to make a quantitatively larger category. 'Same as referent in previous clause'
may be the same as the subject (see next variable) or the same as a non-subject
argument in the previous clause. The contrast is illustrated in (8) and (9).

(8) shoqol al dassee-ni fi galb-inai be niyya waade nui-guum

thing which in heart-our with goal one we-get up
on it
(Whatever) we decide to do with determination, we take it up.

Here the token nu- 'we' in the second clause continues both the subject and the
possessor in the previous clause. In (9), by contrast, the possessor -na in the second
clause continues the object of the previous clause.

(9) dugut if-t-inai gab-inai waagif ooy

now saw-you-us heart-our standing ID
Now you see us scared (lit. saw us, our heart is standing).

While the original coding distinguished whether the previous-clause mention was
in subject (8) or non-subject (9) position, a previous study of Emirati Arabic
(Owens et al. 2013) showed that this contrast had a relatively small effect on
discourse coherence, and the two values are therefore collapsed into one.
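The collapsing step and the reason for it can be sketched in a few lines. Merging two thin levels into one makes each remaining level large enough for a multivariate model. The fine-grained code strings and the `min_count` threshold below are hypothetical stand-ins. They do not reproduce the original coding sheet or a cut-off used in this study.

```python
from collections import Counter
from typing import Dict, List, Sequence

# Hypothetical fine-grained codes standing in for the original
# subject/non-subject distinction.
COLLAPSE: Dict[str, str] = {
    "same as subject of previous clause": "same as referent in previous clause",
    "same as non-subject of previous clause": "same as referent in previous clause",
}


def collapse(codes: Sequence[str]) -> List[str]:
    """Map fine-grained codes onto the collapsed variable; others pass through."""
    return [COLLAPSE.get(code, code) for code in codes]


def sparse_levels(codes: Sequence[str], min_count: int) -> List[str]:
    """Levels with fewer than `min_count` tokens (too thin to model reliably)."""
    counts = Counter(codes)
    return sorted(level for level, n in counts.items() if n < min_count)


codes = ["same as subject of previous clause", "new",
         "same as non-subject of previous clause", "new"]
print(sparse_levels(codes, 2))            # both fine-grained 'same as' levels
print(sparse_levels(collapse(codes), 2))  # []
```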

(10) Possessor pronoun's co-reference in same clause

mentioned in same clause
same as subject of clause
no coreferent in same clause

Discourse studies have shown that the subject is a key hinge linking clauses
(Prince 1981; Chafe 1994). The current variable measures whether a referring
expression is coreferent with another element in its own clause. In (1), for instance,
the suffix pronoun -m on iid- cross-references the topic naas, so -m is 'mentioned in
same clause'; similarly, raas in (2) is cross-referenced by an object pronoun
on the verb. In (8) the possessive pronoun on the keyword 'heart' is cross-referenced
by the subject of the same clause.

(11) Possessor pronoun's co-reference in following clause

The discourse variable 'reference in following clause' is collapsed with the
syntactic variable discussed in (16) below, and so is both a discourse and a
grammatical variable.
The information status variables will yield a measure of the degree to which
a noun or pronoun is cross-referenced in preceding and following discourse. The
more an item is cross-referenced in preceding and following clauses, the more it
is embedded in or transparent to discourse.

3.2.2 Grammatical and semantic variables

Whereas discourse variables necessarily index values outside the keyword
itself, grammatical and semantic variables encode values inherent in the keyword.

(12) Keyword: literal or abstract

literal
abstract

Keywords can have either a literal meaning or an idiomatic meaning, which is
here termed abstract. In all examples given thus far the body-part keywords are
abstract (see [17] below for a literal body-part keyword).
Discussion of this point is continued in Section 3.3.

(13) Keyword: body part or non-body part

body part
non-body part

The keywords can either be a body part or not. Because all non-body part
keywords in the present sample have literal meaning, (12) and (13) were com-
bined into a single variable with three possible values: abstract body part, literal
body part, literal non-body part.
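The combined variable can be sketched as a function of the two original binary variables. Because the sample contains no abstract non-body-part keywords, this hypothetical sketch treats that combination as an error rather than inventing a fourth value.

```python
def keyword_class(is_body_part: bool, is_abstract: bool) -> str:
    """Combine variables (12) and (13) into one three-valued variable."""
    if not is_body_part:
        if is_abstract:
            # Not attested in the sample, so this combination has no value.
            raise ValueError("abstract non-body-part keywords are not in the sample")
        return "literal non-body part"
    return "abstract body part" if is_abstract else "literal body part"


print(keyword_class(True, True))  # abstract body part
```

Collapsing two variables with an empty cell into one variable with three levels avoids an unestimable combination in the statistical model.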

(14) Does referring expression have a possessor pronoun?

yes
no

This is self-explanatory. In both (1) and (2) the keywords have possessive
pronoun suffixes.

(15) Syntactic function of referring expression

The syntactic function of a referring expression is classified as follows.
object of a preposition
possessor of a noun

This list collapses some syntactically distinctive constructions. In particular,
there are two types of possessor in NA: the so-called iḍaafa, or direct possessor,
and an analytic possessor in which Pssd and Pssr are linked by hana 'of'. These
have been shown to be distinctive with respect to borrowing, for instance
(Owens 2005), but for the sake of analytic simplicity and statistical treatment
they are collapsed into one class.4 It should be noted that while adverbs,
predicates, and fragments5 were included in the tagging and the original statistical
treatment, their token counts are very low, and they were therefore left out
of the treatment in Section 5.

(16) Referent of referring expression continued as what in following clause?

Whereas (15) classifies the syntactic status of the referring item itself, this
variable classifies the status of the coreferential item in the following clause.
It has the same values as (15), plus one further possibility, namely that
the referring expression is not continued at all in the following clause.

not continued

In (18) below, for instance, the keyword gab is not referentially continued in the
following clause, while the referent of its possessor -i surfaces as the
subject ba- of the following verb 'say'.
As noted above, this is also a discourse variable, since it tracks whether the
referent of the keyword is continued in discourse or not.6
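Variable (16) can be sketched as a lookup in the following clause, applied once to the keyword and once to its possessor. The representation is an assumption made for illustration: a mapping from each referent in the next clause to its syntactic function there. The referent labels in the example are invented. The example only mirrors the pattern described for (18).

```python
from typing import Mapping, Optional, Tuple

NOT_CONTINUED = "not continued"


def continued_as(referent: str, next_clause: Mapping[str, str]) -> str:
    """Value of variable (16) for one referring expression.

    `next_clause` maps each referent in the following clause to its
    syntactic function there (a representation assumed for illustration).
    """
    return next_clause.get(referent, NOT_CONTINUED)


def code_following(keyword_ref: str, possessor_ref: Optional[str],
                   next_clause: Mapping[str, str]) -> Tuple[str, Optional[str]]:
    """Apply (16) twice: to the keyword and, if present, to its possessor."""
    keyword_value = continued_as(keyword_ref, next_clause)
    possessor_value = (None if possessor_ref is None
                       else continued_as(possessor_ref, next_clause))
    return keyword_value, possessor_value


# Schematic version of (18): the keyword is not continued, while the
# possessor's referent ("P", an invented label) is the subject of 'say'.
print(code_following("gab", "P", {"P": "subject"}))
# ('not continued', 'subject')
```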
The referring expression can be either a noun keyword or a possessor
pronoun, so (15) and (16) each apply twice, once for the keyword and once for the
pronoun possessor. However, only possessor pronouns, and not keywords, were
coded for previous mention. This is because past studies (Owens et al. 2013: 23)
have shown that full nouns are typically new, so the time required to code
each keyword token for previous mention is not commensurate with the expected results.
While this is not a categorical rule, the tendency is strong enough that, at this point
in the study of Arabic discourse, the issue is left for further, more refined analysis.
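The resulting tagging profile can be sketched as two record types. A keyword has no previous-mention field, while a possessor pronoun is tagged as a referring expression in its own right. The class and field names are hypothetical and do not reproduce the study's actual coding format. In the example, the values for iid- 'hand' in (1) follow statements in the text where available. The previous-mention and continuation values are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PossessorTag:
    """A possessor pronoun, tagged as a referring expression in its own right."""
    previous_mention: str   # variable (7): coded for possessors only
    same_clause: str        # variable (10)
    function: str           # variable (15)
    continued_as: str       # variable (16)


@dataclass(frozen=True)
class KeywordTag:
    """A keyword token; note that there is no previous-mention field."""
    lexeme: str
    keyword_class: str      # (12) and (13) combined
    function: str           # variable (15)
    continued_as: str       # variable (16)
    possessor: Optional[PossessorTag] = None  # None means (14) = 'no'

    @property
    def has_possessor(self) -> bool:
        """Variable (14)."""
        return self.possessor is not None


iid_token = KeywordTag(
    lexeme="iid",
    keyword_class="abstract body part",
    function="subject",                          # stated for (1) in Section 1
    continued_as="not continued",                # invented for illustration
    possessor=PossessorTag(
        previous_mention="new",                  # invented for illustration
        same_clause="mentioned in same clause",  # stated for (1) under (10)
        function="possessor of a noun",
        continued_as="not continued",            # invented for illustration
    ),
)
```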
The current study limits the discourse tracking of referents largely to the preceding
and following clauses. This is done, first of all, because the immediate
context has proved to be decisive in understanding the main properties of the

4 By elicitation, direct and analytic possessors are semantically equivalent: Beet- 'my house' =
al-beet hana-y. Disentangling the contrastive function of the two therefore requires more
detailed discourse-based study.
5 Adverbs are tokens such as een een bilkallamo 'eye eye they speak' = 'they speak face to face'.
In Arabic, nouns can occur directly as predicates without a verb. Fragments are false starts,
ellipted answers to questions, and similar elements.
6 A further grammatical variable, whether or not a noun is modified by a demonstrative,
intended to measure a noun's discreteness or individuality, was left out, since this variable
essentially increases the model deviance without contributing significantly to the research
question as developed in the statistical treatment.

discourse behavior of referents in Arabic (e. g., Owens et al. 2010, 2013: 28 note
6). Furthermore, limitations of resources prohibit the large-scale tagging of more
distant referents.
Further corpus examples of the categories described here can be found in the
appendix, and further exemplification comes from the discussion of detailed
aspects of the statistical analysis in Section 5.

3.3 The keyword, idiom vs. control sample

The focus of the study is the behavior of the keyword. The keyword is of interest
because a number of body parts have a dual status: they have a literal
meaning, and they have an abstract or idiomatic meaning (we use the two terms
interchangeably). Besides its idiomatic use in (2), raas 'head' also has literal uses,
as in the following, where the reference is to the head of a cow.

(17) al-hebl kula bu-rub- f aas-a

DEF-rope also on head-his
The rope as well, they tie it on its head.

The central question formulated thus far has focused on the keyword and its
possessor, asking which of the two, if either, contracts discourse relationships
with the preceding and following context, as defined by the parameters summarized
in Section 3.2. The fact that one and the same word can be idiomatic or literal introduces
a further research question, namely whether its status in one of the two usages
also affects its behavior in discourse. For instance, (16) asks what the function
of the keyword is in the following discourse, or whether it is continued at all. This
question naturally divides into two (or four, depending on perspective): how does
a keyword with an abstract meaning behave in the following discourse compared with
one with a literal meaning, and how does a pronoun possessor on a noun with an
abstract meaning behave compared with one on a noun with a literal meaning?
The invocation of the literal vs. abstract parameter introduces a further
consideration: whereas raas 'head' has both literal and abstract meanings, there
are many nouns which have only literal ones, at least in the corpus collected thus
far. These nouns obviously have the same discourse functions and can be subjected
to the same discourse classification as any other, so a set of what will be called
control nouns, consisting of nouns lacking idiomatic meaning in the corpus, has also
been tagged. Keywords, therefore, divide into three classes.

literal and idiomatic meaning, used idiomatically (raas in [2])
literal and idiomatic meaning, used literally (raas in [17])
literal meaning only (bagar in [5])

Equally, the behavior of the pronominal possessor suffixed to each of these three
classes can be compared.
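The three-way classification and the comparison it enables can be sketched as a cross-tabulation of some outcome per keyword class. An example outcome is the value of (16) for the possessor. The lexeme spellings follow the text, and the set of dual-meaning keywords is taken from Table 1. The function names and the toy tokens are hypothetical, and the counts say nothing about the study's results.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Iterable, Tuple

# Keywords with both literal and idiomatic meanings (Table 1).
DUAL_MEANING = {"raas", "gab", "iid", "een", "baun"}


def usage_class(lexeme: str, used_idiomatically: bool) -> str:
    """Assign a keyword token to one of the three classes."""
    if lexeme not in DUAL_MEANING:
        if used_idiomatically:
            raise ValueError(f"{lexeme!r} has no idiomatic meaning in the corpus")
        return "literal only"
    return ("literal and idiomatic, used idiomatically" if used_idiomatically
            else "literal and idiomatic, used literally")


def crosstab(tokens: Iterable[Tuple[str, bool, str]]) -> DefaultDict[str, Counter]:
    """Count an outcome (e.g. a value of (16)) per keyword class."""
    table: DefaultDict[str, Counter] = defaultdict(Counter)
    for lexeme, idiomatic, outcome in tokens:
        table[usage_class(lexeme, idiomatic)][outcome] += 1
    return table


toy = [("raas", True, "subject"), ("raas", False, "not continued"),
       ("bagar", False, "object"), ("gab", True, "subject")]
table = crosstab(toy)
```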
The keywords chosen for the study are the following. To anticipate the description
in Section 4, the study is based on a 400,000-word linguistic corpus. For each word,
Table 1 gives the number of tokens (with the number of idiomatic tokens in
parentheses if the word has both literal and idiomatic meanings), the percentage of
the entire 400,000-word corpus that the lexeme represents, the percentage of its
tokens that are idiomatic, and the number of different texts in which the word occurs.
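The figures reported for each keyword follow from simple counts over tagged tokens. The sketch below shows the arithmetic on invented toy tokens; it does not reproduce the values in Table 1. The `corpus_size` parameter stands for the number of running words a percentage is based on. This matters because the table note states that some figures are based on a 245,000-word portion rather than the full 400,000 words. The record type and function name are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class Token:
    lexeme: str
    idiomatic: bool
    text_id: str


def summarize(tokens: Sequence[Token], lexeme: str,
              corpus_size: int) -> Dict[str, float]:
    """Table 1-style figures for one lexeme.

    `corpus_size` is the number of running words the percentage of the
    corpus is computed against.
    """
    hits = [t for t in tokens if t.lexeme == lexeme]
    n = len(hits)
    n_idiomatic = sum(1 for t in hits if t.idiomatic)
    return {
        "tokens": n,
        "idiomatic": n_idiomatic,
        "pct_corpus": 100 * n / corpus_size,
        "pct_idiomatic": 100 * n_idiomatic / n if n else 0.0,
        "texts": len({t.text_id for t in hits}),
    }


toy = [Token("gab", True, "A"), Token("gab", True, "B"),
       Token("raas", False, "A"), Token("raas", True, "A")]
print(summarize(toy, "raas", 1000))
```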

Table 1: Corpus summary of keywords used in study.

Keyword    Tokens (idiomatic)    % corpus    % idiomatic    Number of texts

Literal and idiomatic meaning
raas 'head'
gab 'heart'
iid 'hand'
een 'eye'
baun 'stomach'
Literal only
bagar 'cattle'a
qalla 'grain'
ruaana 'foreign language'
nugura 'hole'
Total

Note: a These are only the tagged tokens of bagar, which are about half of the total number in the
corpus of 245,000 words. In the case of qalla, the percentage is based on 245,000 words of text.

These words were chosen for the following reasons. First, because a quantitative
treatment was envisaged, the words needed to have token frequencies high enough
to enable statistical treatment. As far as those with idiomatic meanings are concerned,
it turns out that the most frequent nouns in idiomatic expressions are all body parts.
Gab is a dramatic case: all of its tokens in the corpus are idiomatic. Baun was included
because it is one body part which happens not to be predominantly idiomatic. For the
literal-only keywords, words were chosen which were, again, frequent, and which
contribute a certain semantic diversity: cattle represents animate nouns, grain
a commodity that is central to Nigerian Arabic subsistence, ruaana 'foreign
language, non-Arab ethnic group' an abstract noun, and nugura 'hole' a
simple object. The extensive hand tagging required for each noun limits the number
of keywords that can be included in the sample, so no claims are made about overall
representativeness as far as the literal nouns go. However, there are enough tokens
to enable an initial contrast based on the parameters described above.
In Table 1, two important quantitative and distributional aspects of the
keywords can be highlighted. First, the idiomatic meanings are highly prevalent.
Gab 'heart' is particularly striking, having only idiomatic meanings in the corpus. Second,
the keywords are prevalent throughout the speech community, as witnessed by
the number of texts each occurs in. As will be explained in Section 4, these texts
encompass all of Nigerian Arabic. The keyword raas, in fact, is found in more
texts than any other keyword, and except for baun 'stomach', none of the
keywords with dual literal/idiomatic meanings occurs in fewer than a third of all texts.
In the case of idioms, all types used in this study consist of two lexemes.
While the analytic focus here is only on the noun, the noun typically combines with
a verb, an adjective, or a preposition, as will be discussed in greater detail in Section 6.
The lexical core of the idiom is represented between curly brackets, followed by the
literal and idiomatic translations. In (1), for instance, the lexemes are {iid yaabis
'hand dry' = 'poor'} and in (2) they are {fata raas 'open head' = 'educate, enlighten'}.

4 The sample
The data come from a 400,000-word corpus of spoken Nigerian Arabic, which
has served as the basis of various studies of Nigerian Arabic, including sociolinguistic
and code-switching studies. While some speakers appear in more than one
text, with a small number of exceptions each text features a speaker not found in
other texts. The total number of analyzed texts is 96. These were recorded
throughout native Arabic-speaking Borno, in both rural and urban
areas. About 250,000 words of the sample are available in audio and transcribed
format at the website given in the bibliography. In addition, idioms were picked
up in unrecorded, casual conversation, and these have been included in the
total of idiom types. The great majority of idiom types, however, come from the
recorded texts.
From the two sources have been extracted something in the range of 329
individual idioms or idiom types.7 There are, of course, questions in a number of
cases as to whether one is dealing with one and the same idiom from one token to
another. For instance, the idiom ligat raas-ha 'she escaped a dangerous situation;
she gave birth' was discussed in Owens (2014: 145), where it was pointed
out that various criteria can be adduced to argue for the unity of this as an idiom
(i. e., one idiom only), or its differentiation (two idioms, escape a dangerous
situation vs. give birth). These are important theoretical points, and deciding
between a unitary and a differentiated interpretation often requires detailed case-by-case
consideration. However, deciding one way or the other (two idioms or one) does
not impinge on the overall statistical analysis described in Section 5 below, and as
seen in (1) and (2) above, as well as elsewhere, in most cases it is an easy matter,
both in collocational and in semantic terms, to identify different idioms.
The 329 idioms (i. e., idiom types) tend to consist of recurrently used building
blocks. The 101 tokens of gab heart, for instance, occur in 32 individual idioms
and aas head in 40. From the complete list, the idiomatic keywords chosen
represent four of the seven most frequent idiomatic keywords in the corpus and
four of the five most common nouns. All of the nouns used as idiomatic keywords are
body parts. In Table 2, the number in parentheses represents the number of idiom
types for the given keyword which are actually attested in the corpus.

Table 2: Seven most frequent keywords occurring in idiom types.

aas head (40)
gab heart (32)
aal carry
een eye ()
qaim mouth
kaab hold, grab
iid hand ()

5 The hypotheses, the results

In Section 3, parameters were defined measuring the degree of discourse
embeddedness of referring items. Following Hopper and Thompson (1984: 708;
see Section 6), the neutral expectation is that referring expressions, i. e., nouns,
should play a central role in discourse and should all be equally visible to
discourse processes.

7 The extraction was facilitated with the help of a morphological parser developed by colleagues
in the Faculty of Applied Computer Studies of Bayreuth University (Ackermann et al. forthcoming).

This provides the basis for a first set of tests: are all classes of nouns as
defined here (body part vs. non-body part, literal vs. abstract) equally embedded
in discourse, i. e., do they all return the same results against the
parameters defined in Section 3?
A second set of tests looks at the possessive pronoun. Are all possessive
pronouns equally liable to have high previous mention values and to be continued
in following discourse, or does the type of noun they are suffixed to (literal,
abstract, body part or non-body part, for instance) play a role?
The null hypothesis predicts that there will be no distinction among these
classes. However, it could be, for instance, that nouns in idiomatic expressions
are less discourse-embedded, but at the same time that they have a higher
degree of pronoun attachment than literal nouns, and the pronoun provides a
referential stand-in for the noun in the discourse.8 For instance, it is notable in
(18) that while the grammatical subject of the verb i-door is gab, the subject of
the following clause, ba- 'I', continues the possessor of gab-, as marked by the
coindices. Here it is the possessive pronoun which contracts a referential relation
in the following clause.

(18) [kan gai-iii ii-door iya bikaan] [baii-guul] [baii-wadd le

if heart-my 3M-want little place I-say I-send.him to
aj juduud-a a]
RC-ancestors-his Arabs
If I were to prefer a place, I would say I'd send him to one whose ancestors
are Arab. (IM144)

5.1 Results

Table 3 shows the token counts from the subset of the corpus used in the
following statistical analysis.

8 Cf. Johnson-Laird's (1993: viii) remark that in the idiom pull my leg, my leg refers not to
my leg, but rather to me.

Table 3: Keyword tokens from corpus.

Abstract body part | Literal body part | Literal non-body part | Total

Possessor pronoun
Keyword syntactic function
prepositional object
Keyword continued in following
not continued
prepositional object
Previous mention of possessor
new referent
in previous clause
NA (no possessor)
Possessor pronouns coreferent
in same clause
no coreferent
NA (no possessor)
Possessor pronoun has a
coreferent in following clause
NA (no possessor)

Note: *10 tokens are missing from this variable as the result of occurring in syntactic positions
with zero tokens in at least one column of this table.

The statistical analysis is structured around three questions:

(i) Are keywords with abstract referents more likely than keywords with
literal referents to occur with pronominal possessors?
(ii) Which factors, if any, significantly favor/disfavor keyword persistence in
the following clause?
(iii) Among keywords that occur with pronominal possessors, which factors
affect the coreference of the possessor with some element of the following
clause? (That is, which factors affect the persistence of the possessor?)

We address these questions chiefly via generalized linear mixed effects
regression, and we assess the results in part by considering the data on an
idiom-by-idiom basis. If pronominal possessors serve as de facto subjects for
abstract idioms, then we expect abstract keywords to occur with pronominal
possessors more frequently than keywords with literal referents, controlling
for other factors and for differences between idioms. Further, we expect
abstractness to disfavor keyword persistence in the following clause, whereas
we hypothesize that abstractness will favor persistence of the possessor.

Question 1: Are keywords with abstract referents more likely than keywords with
literal referents to occur with pronominal possessors?

This question was addressed with a generalized linear mixed effects model
having the binary dependent variable of possessor pronoun (see (14), Table 3).
The fixed effects were (1) the three-way keyword distinction of abstract body
part, literal body part, and literal non-body part (see [12, 13]) and (2) the
keyword's syntactic function (see [15]). The data did not support an interaction term
in the model. In addition, the model had random intercepts for keyword. The
results (Table 4) show that abstract body parts are significantly less likely than
literal body parts to have possessors, but more likely than literal non-body parts.
The major contrast here is between (all) body parts and non-body parts, rather
than between abstract keywords and literal keywords.
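For readers who want the Question 1 model in symbols, the following is a minimal sketch of it as we read the description above. Two points are assumptions rather than statements from the analysis: the logit link (the standard choice for a binary dependent variable; the text specifies only a generalized linear mixed effects model) and normally distributed random intercepts. The notation is introduced here for illustration only: $y_i = 1$ if token $i$ carries a possessor pronoun, $t[i]$ is its keyword type, $f[i]$ its syntactic function and $k[i]$ its keyword.

$$
\operatorname{logit}\Pr(y_i = 1) = \beta_0 + \beta^{\mathrm{type}}_{t[i]} + \beta^{\mathrm{func}}_{f[i]} + u_{k[i]},
\qquad u_k \sim \mathcal{N}(0, \sigma_u^2),
\qquad \beta^{\mathrm{type}}_{\text{abstract body part}} = \beta^{\mathrm{func}}_{\text{subject}} = 0
$$

The zero constraints correspond to the reference levels listed in Table 4, so each estimate there is read as a shift in log-odds relative to abstract body parts or to subjects; there is no keyword type by syntactic function term because the data did not support an interaction.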
Regarding in particular the significant association between literal body part
and possessor pronoun, which we had not anticipated, it has often been observed
that body parts are inherently possessed nouns.9 A body part is someone's body
part. This basic fact, and not the parameter of abstractness, is probably what lies
behind the higher degree of possessor pronouns on literal body parts.

9 For instance, if a language has a formal distinction between alienable and inalienable
possession, body parts will be expressed in the latter category (cf. e. g. Heine 1997). Arabic,
as it turns out, does not make this distinction, though literal body parts are marked in the
corpus by their high degree of occurrence with possessors. A factor disfavouring pronominal
possessors on abstract body parts is discussed below.

Table 4: Estimates from generalized linear mixed effects model with presence vs.
absence of a possessor pronoun on the keyword as the dependent variable.

Variable Estimate

Keyword type
literal body part .*
literal non-body part .***
reference level = abstract body part
Syntactic function of keyword
adverb .
object .
prepositional object .
possessor .***
topic .
reference level = subject

Note: * = p < 0.05, ** = p < 0.01, *** = p < 0.001.
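The "reference level" rows in Table 4 (and in Tables 5 and 6) reflect the usual treatment coding of categorical predictors: each non-reference level receives its own 0/1 indicator, and the reference level is absorbed into the intercept. The Python sketch below illustrates this for the three-way keyword distinction. It is not the authors' code; the function name treatment_code and the example tokens are hypothetical, and that the original models used exactly this coding is an assumption based on the tables' reference-level notes.

```python
from typing import Dict, List, Sequence

# Levels of the keyword-type factor as named in Tables 4-6.
KEYWORD_TYPES = ["abstract body part", "literal body part", "literal non-body part"]


def treatment_code(values: Sequence[str], levels: Sequence[str],
                   reference: str) -> List[Dict[str, int]]:
    """Dummy-code a categorical factor against a reference level.

    Each token becomes a dict with one 0/1 indicator per non-reference
    level. A token at the reference level gets all zeros, so its effect
    is absorbed into the model intercept.
    """
    if reference not in levels:
        raise ValueError(f"reference level {reference!r} not among levels")
    contrasts = [lvl for lvl in levels if lvl != reference]
    rows = []
    for value in values:
        if value not in levels:
            raise ValueError(f"unknown level {value!r}")
        rows.append({lvl: int(value == lvl) for lvl in contrasts})
    return rows


if __name__ == "__main__":
    # Three hypothetical tokens, one per keyword type (not corpus data).
    tokens = ["abstract body part", "literal body part", "literal non-body part"]
    coded = treatment_code(tokens, KEYWORD_TYPES, reference="abstract body part")
    for token, row in zip(tokens, coded):
        print(f"{token:22} -> {row}")
```

Run on its own, the script prints one row per hypothetical token; the abstract body part token comes out as all zeros, which is why abstract body parts have no estimate of their own in Table 4.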
In addition, subjects are more likely than possessors to have possessor
pronouns, a result which holds irrespective of the literal/non-literal contrast.
One interpretation of this finding is that keywords occurring as possessors often form
word-like units. This in fact is a major aspect of our treatment of idioms in
general in Section 6, including but not limited to idiomatic keywords as pos-
sessor. An instance of idiomatic keyword as possessor is illustrated in (22c)
below, where it is argued that aas as possessor is part of a process which we
term semantic mapping, which downgrades the discourse independence of
idiomatic keywords in general, in this case disfavoring a pronoun possessor
on the idiomatic noun possessor.
Literal keywords as possessors can also form word-like
compounds. Arabic is a language which uses formal compounding sparingly.
Often, however, Pssd + Pssr units (iaafa in traditional terminology) form word-
like units. This can be discerned in the current sample. For instance, the literal
keyword with the largest exposure to possessor function is bagar cattle, with
sixty tokens. Thirteen of these tokens occur in one fixed collocation, suug al-
bagar market of the cattle = the cattle market. This always refers to the largest
cattle market in NE Nigeria, which is located in Maiduguri, and is a unique
referent, which probably explains why the possessor bagar in this collocation is
never itself possessed.
Figure 1 below reveals some variability across keywords with respect to the
relationship between abstractness and the presence of a pronominal possessor.

Figure 1: Raw counts per keyword, presence vs. absence of a possessor pronoun.

Of the five keywords that have both abstract and literal referents, literal body
parts predominantly have possessors for three of them: baun stomach, een
eye, and iid hand. By contrast, when aas head refers to a literal body part, it
shows only a slight preference for a possessor, whereas its abstract referents
favor the presence of a possessor to a greater extent than the former three
keywords. Gab heart, which only occurs with abstract reference, also favors
the occurrence of a possessor more so than baun, een, and iid. Of the four
keywords with only literal non-body part reference, all disfavor possessors, with
ruaana foreign language (which also has the fewest tokens by far) showing
the least bias. This variability deserves a more detailed treatment than is
possible in this article, inter alia because it touches on the factor of lexical
conditioning,10 which leads to a level of detail whose comprehensive treatment
requires a larger set of keywords than is on offer here. It is important to
note that the regression model included random intercepts for keywords, so the
estimates in Table 4 are, in principle, net of baseline differences across
individual keywords. We would, however, like to discuss one aspect of the data
presented in Figure 1 in order to indicate one way in which an examination of
individual frequencies contributes to a closer understanding of idiomaticity.

10 I. e., the extent to which a given grammatical phenomenon, e. g., reference in the
following clause in the current case, correlates with delimitable lexical classes, such as
idiomatic or literal collocates.

As noted, against our expectations, literal body parts are more likely to be
possessed than are non-literal, idiomatic ones. We suggest that one contributing
factor is that when body part nouns are used idiomatically, the inherent
body part connection can be attenuated or broken. For instance, iid hand is
realized literally in 65 tokens, 57 of these possessed by a pronoun possessor, as
in kaab-t-a-fi iid-ak you held it in your hand (GR161, see Figure 1). Only three
literal iids are possessed by a noun possessor. Among the abstract iids,
twenty-one of the tokens instantiate a single idiom {iid X} = hand of X = X's
possession, e. g., fi iid al-hakuuma in the hands of the government/the
government's. In this idiom, six of the possessors are nouns, as in the example.
As soon as iid is no longer tied to a human possessor, the discourse-inherent
tendency toward pronominalization wanes, and non-pronominal possessors
can multiply. This would thus appear to be one factor accounting for the lower
degree of possessive pronouns among abstract body parts.
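The contrast drawn here can be made concrete with simple proportions. The short Python sketch below reuses only counts given in the paragraph above; it assumes that the six noun possessors are counted out of the twenty-one {iid X} tokens, and the names IID_COUNTS and share are ours, chosen for illustration.

```python
from typing import Dict, Tuple

# Counts for iid 'hand' as reported in the text: (part, whole).
IID_COUNTS: Dict[str, Tuple[int, int]] = {
    "literal iid with a pronoun possessor": (57, 65),
    "literal iid with a noun possessor": (3, 65),
    "{iid X} idiom with a noun possessor": (6, 21),
}


def share(part: int, whole: int) -> float:
    """Return part/whole as a proportion, rejecting impossible counts."""
    if whole <= 0 or not 0 <= part <= whole:
        raise ValueError(f"invalid counts: {part}/{whole}")
    return part / whole


if __name__ == "__main__":
    for label, (part, whole) in IID_COUNTS.items():
        print(f"{label}: {part}/{whole} = {share(part, whole):.1%}")
```

On these figures, noun possessors account for roughly 5% of literal iid tokens but close to 29% of the {iid X} tokens. Raw proportions of this kind do not control for the other factors in the regression models.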

Question 2: Which factors, if any, significantly favor/disfavor keyword persistence
in the following clause?

This question was also addressed using a generalized linear mixed effects
model with a binary dependent variable (keyword continued vs. not continued),
random intercepts for keywords, and the fixed factors of keyword type, keyword
syntactic function, and presence/absence of a possessor pronoun. The results
(Table 5) indicate that literal non-body parts are significantly more likely to
persist into the next clause than abstract body parts (see Examples (11), (12),
(13)), which is consistent with the hypothesis developed in Sections 6 and 7 that
the abstract body parts are not canonical arguments. Subjects are more likely to
persist than prepositional objects or possessors, but they are less likely than
topics to persist (see [15, 16]). The presence of a pronominal possessor suffix on
the keyword falls just short of significance (p = .07), though the interaction
between keyword type and pronominal possessor is significant such that
among literal non-body parts, keywords with possessor pronouns are more
likely to persist than those without possessor pronouns.
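Under the same assumptions as for Question 1 (logit link, normally distributed keyword intercepts, notation ours), the Question 2 model can be sketched as follows. It adds a possessor-pronoun indicator $p_i$ and its interaction with keyword type; $c_i = 1$ if the keyword is continued in the following clause.

$$
\operatorname{logit}\Pr(c_i = 1) = \beta_0 + \beta^{\mathrm{type}}_{t[i]} + \beta^{\mathrm{func}}_{f[i]} + \beta^{\mathrm{poss}}\,p_i + \gamma_{t[i]}\,p_i + u_{k[i]},
\qquad u_k \sim \mathcal{N}(0, \sigma_u^2),
\qquad \beta^{\mathrm{type}}_{\text{abstract body part}} = \beta^{\mathrm{func}}_{\text{subject}} = \gamma_{\text{abstract body part}} = 0
$$

Written this way, the effect of a possessor pronoun for literal non-body parts is $\beta^{\mathrm{poss}} + \gamma_{\text{literal non-body part}}$ rather than $\beta^{\mathrm{poss}}$ alone, which is how a main effect that falls short of significance (p = .07) can coexist with a significant interaction row in Table 5.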
Figure 2 below shows that among body parts, persistence occurs less fre-
quently than not, both for abstract and literal referents. The apparent exception is
baun stomach, for which literal referents persist more often than not. However,
baun as an outlier in this case perhaps illustrates the role that genre and sample
size can play. Twenty of the literal tokens of baun occur in three folk tales. In
these, the stomach plays a significant role in the story. For instance, one story
tells about children cutting open a stomach and crawling into it:

(19) He said her children are in its stomach, (they are) inside in her stomach.

It is not so much that stomach is inherently more likely to be topical (and thus
to persist in discourse) as that it turned out to be so in the sample.

Table 5: Estimates from generalized linear mixed effects model with persistence vs.
non-persistence of the keyword in the following clause as the dependent variable.

Variable Estimate

Keyword type
literal body part .
literal non-body part .**
reference level = abstract body part
Syntactic function of keyword
adverb .
object .
prepositional object .**
possessor .***
topic .***
reference level = subject
Possessor pronoun .
reference level = no possessor
Keyword type * Possessor pronoun
literal body part: possessor .
literal non-body part: possessor .*

Note: * = p < 0.05, ** = p < 0.01, *** = p < 0.001.

Figure 2: Raw counts per keyword, persistence of keyword in the following clause.
Further, Figure 2 shows striking differences among non-body parts. In
particular, bagar is less persistent in discourse than the other non-body parts
with over 100 tokens, such as qalla grain. At the risk of departing from our
statistical script, one factor relevant here may simply be that cattle are so much
an inherent part of Nigerian Arab culture (the Arabs are nomads par excellence)
that cattle take their sequential place in discourse in a way that reflects their
ubiquitous presence among many Nigerian Arabs. One does not need to keep
them physically in discourse because they are always there.

(20) kan gammee-na taab-iin sarit al-bagara bas bakaan da kan ga iya
If we embark on pasturing cattle and the place doesn't have enough grass

In (20) the speaker is describing an activity that includes cattle, but has other
key elements as well (e. g., grass, herders), and here, once cattle are mentioned,
the speaker moves on to a different aspect of the activity. In (21), the
speaker describes the typical subsistence of rural Arabs.

(21) humma indu-hum bagar wo bi-hert-u

they at-them cattle and 3-farm-PL
They have cattle, and they farm.

The clause following bagar is wo bihertu. While (21) makes a fundamental obser-
vation about Nigerian Arab life, by our discourse measure bagar is not persistent.
In the current case, bagar is so available that in a measure based on immediate
adjacency it might appear to be less persistent than some other keywords.

Question 3: Among keywords that occur with pronominal possessors, which factors
affect the coreference of the possessor with some element of the following clause?
(That is, which factors affect the persistence of the possessor?)
Table 6 shows the results of a generalized linear mixed effects regression
with random intercepts for keyword. The data are subsetted to the 611 tokens
with possessor pronouns. The results show that possessor pronouns modifying
literal non-body parts are significantly less likely to persist in the following
clause than those modifying abstract body parts (and also those modifying
literal body parts). The parameter of body part rather than abstractness
motivates this contrast, as there is no significant difference between abstract and
literal body parts (whereas literal body parts favor persistence over literal
non-body parts at p < .05). Further, when the possessor pronoun does not have a
coreferent in the same clause (see [10]), it is significantly less likely to persist
into the following clause than when it is coreferential with the subject of the
same clause. There is no significant difference between subject and non-subject
coreferents. When the keyword appears as an object or a prepositional object in
the following clause, the possessor is more likely to persist than when the
keyword does not persist in the following clause. Finally, previous mention of
the possessor (see [7]) has no significant effect on the possessor's persistence;
that is, net of the other variables, a possessor that has already come up in
conversation is not more likely to persist in the following clause.

Table 6: Estimates from generalized linear mixed effects model with persistence
of the possessor in the following clause as the dependent variable.

Variable Estimate

Keyword type
literal body part .
literal non-body part .**
reference level = abstract body part
Previous mention of possessor
situational .
environment .
new referent .
reference level = in previous clause
Syntactic function of keyword
adverb NA
object .
prepositional object .
possessor .
topic .
reference level = subject
Keyword continued in following clause
object .*
prepositional object .*
possessor .
predicate .
subject .
topic .
reference level = not continued
Possessor pronouns coreferent in same clause
no coreferent .***
non-subject .
reference level = subject

Note: * = p < 0.05, ** = p < 0.01, *** = p < 0.001.
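Again as a sketch under the same assumptions (logit link, normally distributed keyword intercepts, notation ours), the Question 3 model is fitted only to the 611 tokens with possessor pronouns, written $S$ below, with $r_i = 1$ if the possessor has a coreferent in the following clause. The indices $m[i]$, $g[i]$ and $h[i]$ stand for previous mention of the possessor, the function of the keyword in the following clause, and coreference of the possessor within the same clause; their reference levels are those listed in Table 6.

$$
\operatorname{logit}\Pr(r_i = 1) = \beta_0 + \beta^{\mathrm{type}}_{t[i]} + \beta^{\mathrm{prev}}_{m[i]} + \beta^{\mathrm{func}}_{f[i]} + \beta^{\mathrm{cont}}_{g[i]} + \beta^{\mathrm{coref}}_{h[i]} + u_{k[i]},
\qquad i \in S,\ |S| = 611,
\qquad u_k \sim \mathcal{N}(0, \sigma_u^2)
$$

with $\beta^{\mathrm{type}}_{\text{abstract body part}} = \beta^{\mathrm{prev}}_{\text{in previous clause}} = \beta^{\mathrm{func}}_{\text{subject}} = \beta^{\mathrm{cont}}_{\text{not continued}} = \beta^{\mathrm{coref}}_{\text{subject}} = 0$.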
Figure 3 reveals that, looking only at tokens with possessor pronouns, all
keywords favor the possessor's persistence in the following clause. The large
majority of literal non-body parts in this subset are bagar cattle, which only mildly
favors persistence of the possessor pronoun. The keyword nugura hole does not
appear in Figure 3 because its 48 tokens all occur without a possessor pronoun.

Figure 3: Raw counts per keyword, persistence of the possessor in the following clause.

5.2 Two key results

To conclude this section we would like to highlight two main results of the
statistical survey. The first is that among the three categories of keywords
examined, abstract body parts represent the category least transparent to the
measures used to define discourse embeddedness (Table 5). The implications of
this central finding will be developed further in the following two sections.
The second is that, contrary to what we had hypothesized, pronominal
possessors are not in general more frequent on abstract body parts than on
literal nouns, nor are they more likely to represent a referent in following
discourse than are those associated with literal body parts. Thus, the
observationally well-founded idea that in (18) above the true or discourse
subject of the verb i-door is the possessor -i 'my' (= 'I') of gab, and that this
would be manifested by a higher frequency of continuing topicality in discourse,
is not substantiated once a large control group is used as the basis of comparison.
Instead, the idioms investigated here behave no differently from literal body
parts with respect to the discourse embeddedness of the possessive pronouns
which they occur with. The distribution of the possessive pronouns on keywords
does not depend crucially on idiomaticity, but rather on the parameter of body
part, and their continuing topicality in discourse correlates most closely with
their coreferentiality in the same clause (Table 6). Possessive pronouns have, as
it were, a life of their own, whose broader understanding in Nigerian Arabic
discourse awaits further study.

6 Idioms and semantic mapping

The major finding of the statistical analysis is that nouns in idiomatic
expressions are far less likely than non-idiomatic ones to be embedded in
discourse. They are far less transparent to discourse processes than are literal
ones. Nouns in idiomatic expressions, in the parlance of Hopper and Thompson
(1984: 708), are, like predicate nominals, compounds and incorporated nouns,
less prototypically nominal than non-idiomatic nouns. The current study, however,
is based on a very different type of data than Hopper and Thompson (1980,
1984). The results are arrived at statistically, not via introspection (see
discussion in Section 1); they describe mainly discourse-defined properties, not
categorical grammatical ones, and so cannot be exhaustively described by a
categorical grammatical rule (e. g., a process of noun incorporation or
compounding). Moreover, idiomaticity is semantic in nature. In some sense one and
the same noun occurs alternatively in literal or idiomatic expressions.
Idiomatic aas is not phonologically, morphologically or syntactically different
from literal aas. Nonetheless, in accordance with the thrust of Hopper and
Thompson's distinctions, it was shown that nouns in idiomatic expressions
do lack a prototypical noun attribute: they exhibit a reduced degree of
individuality, qua noun, as measured by a low degree of parametrically defined
discourse embeddedness.
It is important to bear in mind that the idioms described here are not exotic,
rare constructs to be considered as odd appendages to normal language. They
are not, as it were, kick the bucket-type idioms. This is apparent in four ways.
First, the statistics in Table 1 show that for some nouns of relatively high
frequency the normal state in discourse is idiomatic: they are conventionally
idiomatic. Secondly, as discussed in Section 1, clause-internally the properties of
nouns in idiomatic expressions which are referenced by grammatical processes
such as topicalization and agreement are no different from those of literal nouns.
Thirdly, as demonstrated in Section 5 (Tables 4 and 6), pronominal possessors
of nouns in idiomatic expressions maintain a complete syntactic and discourse
independence from their possessed head. In the sense of the statistical measures
used, the pronominal possessor is freely compositional with the possessed
abstract body part noun. Fourthly, the nature of the idiomatic collocations
studied here is demonstrated by the range of syntactic flexibility which
individual idioms in the corpus show. The following, for instance, partially
exemplifies the spread of constructions in which the idiom {lamma aas join
head = unite} occurs.

(22) a. lamma = verb, aas = object

tawwa lammee-tu aas-ku
formerly join-you.PL head-you.PL
Formerly you united.
b. aas = topic, lamma = predicate
al-kloob al aas-na laamm-inn-a fi l-koob al-waahid da
DEF-club which head-our joined-M.PL-it in DEF-club DEF-one this
the club in which they united us
c. lammiin, mlamm, malamma, lamamaan = possessed verbal noun,
aas = possessor
indu-hum mlamm ar-aa
at-them joining DEF-head
They have unity.
d. uqul mallam-it aas
thing joining-F head
the matter of uniting

While type (22a) (V + DO) is the most frequent {lamma aas} structure, the items
occur across a range of constructions and, as can be seen from the text citations,
are distributed throughout the sample. There is essentially nothing distinguishing
the syntactic behavior of aas in the idiom {lamma aas} from its behavior as
a literal noun, a finding broadly in line with studies on idioms in general
(Nunberg et al. 1994).11 The syntactic flexibility of {lamma aas} is typical, not
exceptional, for the idioms studied in this paper.
The syntactic flexibility of the idiom parts, however, appears to contradict
the statistical finding that the nominal keyword in idioms is constrained in
discourse. We will expand on the ramifications of these basic observations in
the rest of this section.

6.1 Semantic mapping

Studies of figurative language over the past 40 years have been dominated by
metaphor theory, foremost that of Lakoff (1987: 446–453). For the most part,
however, idioms have played a secondary role in Cognitive Linguistics, subordi-
nated to the central concern of metaphor (Lakoff and Johnson 1999: 68, 1980).
Nunberg et al. (1994) do appear to suggest that an idiom like spill the beans
should be conceived of as a mapping between source and target domain.

To say that an idiom is an idiomatically combining expression is to say that the conventional
mapping from literal to idiomatic interpretation is homomorphic with respect to certain
properties of the interpretations of the idiom's components. (Nunberg et al. 1994: 504)

From a related perspective, Clausner and Croft (1997: 265) take a middle position,
assigning idioms semi-productivity as against more fully productive conceptual
metaphors. Still, for them idioms are sanctioned by their embedding in an
overarching metaphor. Spill the beans, for instance, involves a source-target
mapping embedded in the conceptual metaphor THE MIND IS A CONTAINER.
Idioms have played a more important role in one domain of specialization,
namely in experimental psycholinguistic research on figurative language. In this
tradition the embedding of idioms in conceptual metaphors, as with Clausner
and Croft, is assumed, at least for the purposes of an experiment, and the focus is
on what idioms can tell us about the metaphors (e. g., Gibbs 1992; Gibbs et al.
1997; Gibbs and Nayak 1989; Glucksberg et al. 1993; McGlone 2007). Often,
however, even if it is found that conceptual metaphors do not produce significant
priming effects on idiom comprehension (e. g., McGlone 1996; Keysar and
Bly 1999; McGlone 2001: 1012), the status or structure of idioms as such is not
examined further.

11 Though not a universally accepted position. Kavka (2011: 2), for instance, definitionally states
that idioms are bound together lexically and syntactically. While we agree with the lexical
part of the definition, the idioms treated here have no special syntactic attributes marking them
as idioms. While Kavka's definition may apply to some idioms (kick the bucket), the definition
is unlikely to be universally true even in English (see Glucksberg's discussion of idiom
flexibility (1993: 9; 2001: 83–86), Nunberg et al. (1994: 520–525) on graded acceptability of
transformed idioms, and the discussion of compositional and transparent idioms below).

One significant exception in this regard is the analysis of the status of idioms
by Glucksberg and colleagues (Glucksberg 2001: chapter 5; Cacciari and Tabossi 1993).
A brief summary of their main findings and interpretations will help in
understanding the status of idioms in Nigerian Arabic as described in this study.
Glucksberg, like others before him, distinguishes various classes of idioms,
though the only one which concerns the current discussion is his third class
(2001: 74), which he terms compositional and transparent. This would appear to
largely fit the type of idioms which have been used in this large corpus study
(see list of characteristics in Section 2). Keeping this point in mind is important,
because a good deal of Glucksberg's discussion, as with much of the discussion
about idioms (e. g., Nunberg et al. 1994; Clausner and Croft 1997), is concerned
with differentiating different types of idioms and gauging the degree to which
each type fits general parameters of idiom behavior. If the discussion is restricted,
therefore, to the compositional and transparent type, Glucksberg's main conclusion
is as follows. Idioms appear to be memorized configurations (2001: 71,
86), yet paradoxically, it is not the type of memorization involved in learning
words. This is based on comparing word-recognition routines, as defined in
psycholinguistic studies, as opposed to idiom recognition routines. For indivi-
dual words, recognition takes place incrementally, from the beginning to the
end of the word, as the possible interpretation is narrowed down as the form is
spelled out in greater and greater detail (gating phenomenon). With idioms,
however, recognition is not incremental, but rather categorical, occurring only
when a key lexeme in an idiom has been recognized. This lexical key can be at
the end of the entire idiom, so that, for instance, there is no idiom activation at
all in hit the nail on the head until head has been presented (Cacciari and
Tabossi 1988, 1993; Tabossi and Zardon 1993).12 From this Glucksberg concludes
that idioms are not just long words. This conclusion is a reference to the
classic study of Swinney and Cutler (1979: 528) who argue from experimental
(RT) evidence for a model in which idioms are stored and accessed as lexical
items for instance idioms do appear to be compositional.
Instead, Glucksberg suggests that the words in idioms have a dual meaning:
their literal meaning, e. g., beans in spill the beans, and the stipulated idiomatic
meaning of the entire construct, e. g., reveal secrets (2001: 78). This allows him
to account for the syntactic flexibility associated with compositional and
transparent idioms: words remain words in their literal meaning, as it were,
while also carrying their idiomatic meaning, which is associated with the words
as a part of the entire idiom.13

12 Cacciari and Tabossi (1988: 678) see idioms as a part of idiomatic configurations whose
constituent words are, as with Glucksberg, accessed both literally and idiomatically. They note
that there is no logical criterion for identifying where the key lies and hence, as Glucksberg
observes, drawing parallels with the incrementality of word recognition is deceptive.

Glucksberg's analysis serves usefully as both a practical and a theoretical hinge.
Practically, even if one wanted to embed the current study in the larger domain of
metaphor, this could not be undertaken here, given the relatively
modest state of linguistic research on Nigerian Arabic. Knowledge of metaphors
goes to the heart of cultural matters which, in the current context, would require
anthropological linguistic research that is still outstanding. Theoretically, Glucksberg,
in any case, offers a perspective on figurative language, idioms included, which sees
no necessary role for conceptual metaphor (see references above). Regardless of
how one evaluates the extremely broad field of figurative language in this respect,
Glucksberg's ideas allow the question of idiomaticity in NA to be phrased in terms
of specific expectations. In particular, Glucksberg attributes to the words in idioms
a dual meaning, both a literal and an idiomatic one.
The data presented in this paper, however, suggest a different interpretation.
Keywords which can be both literal and idiomatic, such as aas, behave
differently according to whether they are literal or idiomatic. They cannot be
both at once, as they are for Glucksberg. Crucially, their different behavior in the
two modes resides not in the syntax (a direction Glucksberg was looking towards)
but rather in their discourse embeddedness, or, in the idiomatic mode, their lack
of discourse transparency.14 Words in idioms are referentially severely restricted
in discourse, even if syntactically unconstrained, as illustrated in (22). In a
nutshell, literal nouns are nouns because they have phonological, morphological
and syntactic properties of nouns, and they are transparent to discourse. Nouns in idiomatic expressions

13 We should caution that the nature of the data in this article and in Glucksbergs work are of
course different, discounting any language difference. Glucksbergs conclusions are based on
responses to cued stimuli whereas the current one is based on spontaneous production of
language. Still, since psycholinguists build general models on such response data, it can be
assumed to be applicable to understanding language in general. As far as language goes, we
assume a universality of interpretive relevance for any language. Note that the study by Cacciari
and Tabossi which Glucksberg used for his model was conducted on and in Italian.
14 At this point one might object, against note 14 above, that processing does indeed function
differently from production, and that Glucksbergs model describes only processing/perception.
There is no empirical evidence for such an assumption, and it is not one in any case emphasized
in the psycholinguistic literature on idiomaticity. In any case, our model does not assume
parallel literal/abstract processing during production; only the idiomatic/abstract meanings are
called up in the production of an idiom, an interpretation commensurate with our data showing
that literal and abstract/idiomatic meanings are different in discourse.

Semantic mapping and idiomaticity

have the first three properties as well, but their discourse transparency or
embeddedness is severely curtailed.
In this section we offer an account of what happens to the idiomatic keywords that makes them relatively opaque to discourse. In the course of this discussion we will offer further refinements to the representation of individual idioms. We elucidate the nature of idioms with a metaphor of our own, which we term semantic mapping. What is striking is how many of the idioms considered in this study have single-word paraphrases. The following is a brief list.

(23) Lexical equivalents to idioms

idiom equivalent lexeme
(a) aas-a xafiif = muooun
head-his light = mad
He is mad.
(b) fata aas-na = garraa-na
open head-our = taught-us
He taught us.
(c) ligi aas-hum = aawan-
got head-their = helped-they.him
They helped him.
(d) farrag-na/faal-na aa-na minhum = xallee-naa-hum
split-we/separate-we head-our from-them = left-we-them
We split with them.
(e) ligi aas-a = erred
got head-his = ran away
He escaped.
(f) galb-i faar/gamma = zil katiir
heart-my boiled/got up = angered.I much
I am furious.
(g) tallaf-at galb-i = zaal-at-ni
spoiled-F heart-my = anger-F-me
She angered me.

All of these idioms share a meaning with a single lexical item. An argument of the idiomatic verb, or an adnominal possessor of a noun, can be represented as mapped onto the subject or object of the single lexical item, while the idiomatic collocation itself (for instance, {tallaf gab} in (23g)) is mapped onto the lexeme (zaal).


(24) = (23g)

As an initial entry to understanding the discourse behavior of idioms, these idioms can be said to undergo a semantic mapping. The broken line in (24) represents a semantic mapping; {tallaf gab} is represented as mapping to zaal. Idioms are not so much long words as lexico-grammatical constructions of various types (more on which below) which share their meaning with individual words, while their individual collocates maintain their phonological, morphological and syntactic integrity as single lexical items.
Elaborating on the classic formulation of Swinney and Cutler (1979) noted above, it is more accurate to say here that the meaning of the idiom is mapped onto whatever semantic space underlies both the idiom and the corresponding lexical item (see below). This is apparent in a number of idioms, for instance, lig-at aas-ha 'she gave birth' (lit. 'she got her head'). This is synonymous with wild-at 'she gave birth'. Wilid, however, is more general than {ligat raa-ha}. A tree, for instance, can bear fruit, i. e., tuulid yaal-ha 'it reproduces its children' (= fruit), but in so doing it does not !{tilga aas-ha}. {ligat aas-ha} and wilid share a 'reproduce' meaning, but only as far as humans go. This can be represented as in (25), with {ligat aas-ha} overlapping with only a part of the semantic domain of wilid X.

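(25) [Diagram: the semantic domain of wilid X 'reproduce', covering reproduction of humans, of animals and of biologically reproducing organisms; ligat aas-ha occupies only the 'of humans' part.]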

Before considering the idea of semantic mapping more closely, it is relevant to consider the basis on which the idiomatic collocations described in this study are built.


The approach followed here is that the idiomatic collocations are based on abstract meanings of the component lexemes.15 In (23a), it is a property of 'head' which is referenced in the idiomatic meaning. Following Riemer (2005), the basis of the idiomatic meanings can be thought of as either metaphoric or metonymic extensions of a literal 'heart', 'head', 'hand', etc.16 As is the nature of metaphors and metonyms, these extensions are motivated by properties of the metaphoric or metonymic source. In (23a), for instance, it is the typical function of a head, its cognitive capacity, which is instrumentalized. This same extension applies to (23b). In (23c)–(23e), on the other hand, it is the head as representative of an individual, the individual as active agent. Here, 'head' stands in for the individual along two dimensions: the head is physically prominent, and it is the organ which controls actions. In Riemer's model, the abstract extensions of lexemes form new, conventionalized meanings, creating polysemous lexemes. If NA 'head' has a meaning representing the cognitive state of an individual (see (26) below), this is one basis of its polysemy.
In contrast to the lexical polysemy data analyzed by Riemer (see Note 16), the metaphorical and metonymic extensions of NA aas are apparent only in an idiomatic context. It is therefore relevant here to briefly introduce a basic representation of the essential attributes of idioms.
The idioms are based on abstract meanings of the idiomatic keywords. These meanings can be represented in what we term an attribute extension sense taxonomy. The idea of attribute extension borrows loosely from the idea of property attribution as developed in the psycholinguistic work on idiomaticity and metaphor of Glucksberg and colleagues (Glucksberg et al. 1993; Glucksberg and McGlone 1997, 1999; McGlone 2007). As McGlone (1996: 457) explains, the metaphor 'Our marriage was a rollercoaster ride' is interpreted by matching 'our marriage' with the situations which typify a rollercoaster ride: exciting, full of ups and downs, or scary, for instance. One of these properties is attributed to marriage.

15 For reasons of space, enduring questions about monosemy versus polysemy, and, if polysemy, about the degree to which and the way in which polysemy is circumscribed, are not discussed (see Owens 2015 for discussion based on the current data).
16 Analyzing the Warlpiri verb pakarni, Riemer begins with a basic meaning of 'hit' or 'hit with an object such as a hand'. He terms this a prototypical centre (2005: 327), also termed core meaning (2005: 345). Pakarni has a number of further meanings, including 'kill', 'pierce', 'paint' and 'perform dance ceremony'. Each of these meanings is derived via a metaphoric or metonymic application. The meanings 'kill' and 'perform a dance', for instance, are both seen as effect metonymies, killing being a causal metonymy from hitting, and performing a dance ceremony involving hitting feet or instruments against the ground. Although these further meanings are derived via metaphoric/metonymic extension, Riemer considers them conventionalized in the pakarni lexeme, i. e., they are no less a part of its meaning than is 'hit' (see also Riemer 2002 on conventionalized figurative meaning). The historical development of polysemy based on metonymy and metaphor extension is described inter alia in Geeraerts (1997), Enfield (2002) and Robert (2008).

In the attribute extension taxonomy developed here, the main figurative senses proposed are based on experiential properties of aas. Its cognitive control function, for instance, is paramount in (23b). In each case, an inherent, figurative attribute of aas forms the basis for attracting a large range of the collocates with which aas forms idioms. Properties of aas are not matched with another concept, as in property attribution, but rather extended out of aas itself. Each lexical item of the idiom has a figurative sense, so each lexeme is postulated to have, besides a literal sense, at least one abstract extension. In (22) {lamma aas}, for instance, the attribute of aas which is accessed is a psychological/intellectual/sensory capacity of the individual, which represents the individual as an active agent to the outside world. The attribute of lamma 'gather together' which is accessed is 'join together for collective action'. The idiom {lamma aas} is compositional out of these two meanings: 'join together individuals in their capacity as active agents for collective action', i. e., 'unite'. A very partial representation of the lexico-idiomatic properties of aas is as follows (see Owens 2015 for details).

(26) Partial attribute extension sense taxonomy for aas

(a) Literal aas (17)
(b) Psychological/intellectual/sensory capacity of the individual:
i. represents the cognitive state of an individual (23a)–(23b)
ii. represents an individual to the world as active agent (23c)–(23e)17

17 While our notion is based on Glucksberg's idea of property attribution, it differs from the work of Glucksberg and his associates in postulating an internal polysemic sub-categorization of the idiomatic collocates. This follows from our invocation of Riemer in defining the internal semantics of the abstract body parts. Glucksberg, to this point, speaks only of idiomatic word meanings (1993: 12), with a word like spill in spill the beans having an idiomatic meaning. While for Glucksberg individual words in an idiom can acquire an abstract meaning, he does not explore the possibility that the collocates themselves can be embedded in a conventionalized, structured polysemy. The ramifications of the different approaches need to be worked out in further research.
Even in the CL tradition, with its bias towards polysemic analyses (e. g., Brugman and Lakoff 1988), the treatment of idiomatic collocates as discretely polysemous is approached with caution. Gibbs (1993: 65) comes close in this respect: 'individual parts of idioms have some figurative meanings that contribute to the overall nonliteral interpretations of idiomatic phrases'. In general, however, Gibbs works to embed idioms in conceptual metaphors rather than explore the systematic semantic decomposability of the individual collocates further (1993:


Note that even if the attribute extensions are based on metonymic and metaphorical associations, there is, following Riemer (2005), no need to build metaphoricity and metonymicity directly into the taxonomy. The essential point is that aas has conventionalized, abstract meanings which in the current model define a polysemy in Nigerian Arabic aas. As already pointed out, conventionality in this study has, inter alia, been established by frequency of occurrence in a natural corpus. Nor is there any need, as there is for Glucksberg, to posit a residual literal meaning in the constituent idiom lexemes. Indeed, doing so would contradict a major finding of this study: idiomatic keywords are measurably distinct in discourse from their literal guise.
Idiomaticity directly accesses conventionalized, abstract meanings which can be represented on an attribute extension taxonomy. If an abstract meaning is accessed, the literal one is not, and vice versa. A second property of idioms is that they require a stipulation that given senses in the taxonomy must be accessed in order for the idiom to be successful as an idiom (see Notes 18, 19). 'Given sense' means that, for a successful idiom, only the lexemes in which the senses are found can be used. For the 'unite' idiom, for instance, it is not possible to use kambal 'gather together' instead of the related lamma 'gather together'.

(27) kambal-tu aas-ku

heaped up-you.PL head-your.PL
You gathered your heads together.

(27) suggests that the people addressed work in a slaughterhouse and have the job of collecting severed heads, i. e., it is a possible collocation, but, lacking idiomatic attribute extension legitimacy, it will only be interpreted literally. The following (28)

(28) !lammee-tu lisaan-ku

join-you.M.PL tongue-your.M.PL
!You joined your tongues.

is grammatical, but semantically odd, even if, to an outsider, it might appear a good candidate for 'unite'.18 Note that the idea of a lexical stipulation also has its correlate in the psycholinguistic studies discussed above: in online processing tests, idioms are not recognized until the key word is given. The key and the stipulated collocates in our model can be postulated to be the same item(s). However, rather than see stipulation (in our sense, see Note 18) as part of an online processing procedure, we understand it as a conventionalized part of the idiom itself.19

18 In fact, {lamma qaim} 'join mouth' is equivalent to {lamma aas}. For certain idioms there is a small degree of substitution leeway. Whereas {fata aas 'open head' = 'teach'} is the most common collocation for this meaning, it also appears as {kaah aas} with exactly the same meaning; kaah is a synonym of fata 'open'. In this case there are five {kaah aas} tokens and twelve {fata aas} tokens, with speakers using one or the other, not both, so here one might be dealing with lectal (dialectal, idiolectal) differences.

The word-like effect which is represented in (24) and in (29) below can be understood against two properties summarized in this section. First, the idioms are successful only if the given collocations are chosen. Secondly, the idiomatic meaning is supported only in the context of the idiom. Raas out of any context means 'head'; aas 'representing an individual to the world as active agent' only occurs in the context of the limited number of stipulated collocates {lamma 'join', ligi 'get', farrag 'separate'} which support this meaning.
It is now time to return to the representation in (24). Idiom formation combines the senses represented in two lexemes defined on the attribute extension taxonomy and maps them onto a meaning which it shares with a single lexeme. Note that in this perspective the semantics of the idiom is compositional of the abstract meanings of the constituent lexemes.
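The two properties summarized in this section (stipulated collocates, and access to either the abstract or the literal sense but never both), together with the mapping onto a shared lexeme, can be made concrete as a small lookup model. The following Python sketch is purely illustrative and is not the authors' formalism: the names NOUN_SENSES, VERB_GLOSSES, STIPULATED, Idiom, Reading and interpret are hypothetical, the English sense labels paraphrase (26), and only collocations discussed in the text ((22), (23b), (27), (28), (29) and Note 18) are entered.

```python
from typing import NamedTuple, Optional

# Hypothetical fragment of an attribute extension sense taxonomy for nouns,
# paraphrasing (26): one literal sense plus zero or more abstract extensions.
NOUN_SENSES = {
    "aas": {"literal": "head",
            "abstract": ("cognitive state of an individual",
                         "individual as active agent")},
    "lisaan": {"literal": "tongue", "abstract": ()},
}

# Literal glosses of the verbs used in the examples.
VERB_GLOSSES = {"lamma": "gather together", "kambal": "heap up",
                "fata": "open", "kaah": "open"}


class Idiom(NamedTuple):
    meaning: str            # meaning of the whole idiom
    lexeme: Optional[str]   # single lexeme sharing that meaning, or None (cf. X in (29))
    noun_sense: str         # abstract sense of the noun that the idiom accesses


# Stipulated collocations: only these exact verb + noun pairs yield the idiom.
# "garraa" is the stem as it appears in garraa-na, (23b); its citation form is assumed.
STIPULATED = {
    ("lamma", "aas"): Idiom("unite", None, "individual as active agent"),
    ("fata", "aas"): Idiom("teach", "garraa", "cognitive state of an individual"),
    ("kaah", "aas"): Idiom("teach", "garraa", "cognitive state of an individual"),
}


class Reading(NamedTuple):
    mode: str               # "idiomatic" or "literal", never both
    meaning: str
    lexeme: Optional[str]
    noun_sense: str


def interpret(verb: str, noun: str) -> Reading:
    """Read a verb + noun collocation under this hypothetical model."""
    idiom = STIPULATED.get((verb, noun))
    if idiom is not None:
        # Stipulated collocation: the abstract sense is accessed, the literal one is not.
        return Reading("idiomatic", idiom.meaning, idiom.lexeme, idiom.noun_sense)
    # Any other collocation is read literally, however odd the result, cf. (27), (28).
    literal = NOUN_SENSES[noun]["literal"]
    return Reading("literal", f"{VERB_GLOSSES[verb]} + {literal}", None, literal)


# Sanity check: every stipulated noun sense must exist in the taxonomy.
for (verb, noun), idiom in STIPULATED.items():
    assert idiom.noun_sense in NOUN_SENSES[noun]["abstract"], (verb, noun)
```

For instance, interpret("kambal", "aas") falls through to the literal reading 'heap up + head', mirroring (27), while interpret("lamma", "aas") returns the idiomatic 'unite' with no single-lexeme equivalent, mirroring (29). Keying the lookup on the exact lexeme pair rather than on meanings is what encodes stipulation in this sketch: a near-synonym such as kambal does not trigger the idiom, whereas the lectal variant kaah listed in Note 18 does.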
NA idiom formation can be thought of as word formation which leaves phonology, morphology and syntax intact. In (24) above, for instance, whatever attribute extensions of tallaf 'spoil, ruin' and gab 'heart' are called upon to create the meaning 'anger' combine in the idiom to yield a word-like meaning. Idiom formation does not create exact synonyms, and indeed in NA there are important idioms which have no obvious single-word paraphrase, yet still behave exactly like the idioms described here. One such is {lamma aas}, which has no single-word equivalent.20


(29) {lammee-tu aas-ku} X

join-you.M.PL head-your.M.PL

19 Not wanting to add to terminological confusion, we note here that stipulation is also used
by Glucksberg (1993: 12, 23). For Glucksberg, however, what is stipulated is the entire idiomatic
meaning. Stipulation in our sense is the arbitrary requirement that only specific lexemes
collocate to form the idiomatic meaning. At this point in our model, the predictability, or lack thereof, of the idiomatic meaning, given an attribute sense taxonomy for each constituent lexeme and the stipulation that the idiom be formed of the given collocates, is beyond the scope of the current discussion.
20 Others without single word equivalents include {aal aas carry head = convince}, {aas ja head
came = get orientated, regain consciousness}, and {sanad aas support/raise head = compete}.


That one is dealing with a word-like derivation is evident even here from the fact that contemporary NA can easily create lexeme-like equivalents to (29). One way, for those who know Standard Arabic, is to borrow a word from Standard Arabic, wahhad 'unite', while another is to use a common codeswitching strategy based on the verb 'do': sawwa yunayting 'do uniting' = 'unite' (Owens 2005).
Still, while the semantics of NA idioms can be understood as a kind of word formation process, (22) reminds us that the individual keywords never lose their individual syntactic integrity.21

7 Why nouns in idiomatic expressions have reduced discourse embeddedness
In the light of the discussion in Section 6, it is now time to return to the observations on the discourse embeddedness of referring expressions in NA. It was seen in the statistical analysis that idiomatic (abstract) meanings of lexemes are far less likely to be discourse embedded, according to the measures described in Section 3, than are literal nouns. This difference holds whether idiomatic meanings are contrasted with literal meanings in general, i. e., including lexemes which only have a literal meaning, or whether body parts alone are contrasted according to whether their abstract, idiomatic meaning or their literal meaning is accessed (Table 5 above). One explanation for this difference can be discounted, namely that nouns in their idiomatic guise are in some way syntactically degraded, made less noun-like, in the manner, for instance, of noun incorporation. (22) shows that the idioms which form the basis of this study are highly active syntactically; in fact, categorically they are unconstrained. The restrictions on discourse embeddedness, so far as this rather detailed analysis of referentiality goes, operate at the semantic and lexemic levels.

21 Our approach to idioms addresses the much-discussed relation between the compositionality and noncompositionality of idioms, recalling the tenor of Titone and Connine (1999: 1667), but not their concrete proposal. Titone and Connine argue that idioms are both compositional and noncompositional. For them, as for Glucksberg, compositional idioms access literal lexical meanings, but they also see idioms as holistic entities due to their high degree of conventionality. Our model is, as it were, modular. It locates noncompositionality, our semantic mapping, in the semantics (see Swinney and Cutler 1979), but compositionality, e. g., syntactic flexibility, in the grammar. However, it adds one further factor, namely that idioms need to be collocationally stipulated against the relevant abstract meanings of the constitutive lexemes.

The analysis of idioms developed in Section 6 offers an alternative explanation. It is argued that idioms can be conceived of as participating in a semantic mapping in which two items, most frequently a verb + noun combination, are mapped onto a semantic space which is equally occupied by single lexemes. This space represents a common ground which both single lexemes and idioms access; idioms do not merely paraphrase single lexemes. This way of conceptualizing the mapping allows it to be postulated even where there is idiom access but no single-word access, as in (29). Note that our analysis accommodates Glucksberg's observation that idioms are not simply long words. Here the issue hinges on what one understands by 'word'. In phonological, morphological and syntactic terms, the constituent lexemes are individual words. However, in their lack of referential transparency they lose individual word-like status, becoming, indeed, like long words.22 Just as fox in the verb compound fox-hunt is invisible to discourse (Hopper and Thompson 1984: 708), so too the idiomatic body-part nouns are of severely limited discourse transparency.
With this background, it is now possible to draw a relationship between the results of the statistical analysis and the characterization of idioms made in Section 6. This can be done from two complementary and related perspectives. Looking at the matter in terms of the nature of idioms: since the individual lexemes in idioms have their idiomatic meanings only in the context of the idiom (see the discussion around (27) and (28)), and since the idioms considered in this study consist of at least two lexemes, the lexeme qua discrete meaning-bearing element is subordinated semantically to the larger idiomatic whole. It follows that it is not the individual lexeme in the idiom which is visible to discourse, but rather the idiomatic unit as a whole. Since the idiomatic meanings of the body parts, as exemplified in (26), exist only stipulatively in the context of an appropriate idiomatic collocation, not individually, they do not achieve the degree of individuation necessary for discourse transparency.
In terms of semantic mapping, the meaning of the idiom is mapped as a whole onto a semantic space. The nouns lose their individual referentiality and hence their discourse transparency, with the outcome described in Section 5. This is a semantic process which does not impinge on the phonology, morphology and syntax of the constitutive lexemes.

22 It would be pointless to argue that phonology, morphology and syntax derive from the literal meaning of an idiom's constituents, i. e., from literal aas, while idiomatic aas is present in the semantics only. A literal meaning is as much a part of semantics as an idiomatic one, and there is no evidence which would allow one to claim that the syntactic permutations illustrated in (22) derive exclusively from a literal meaning of the constituent lexemes.


Acknowledgments: The authors would like to thank Prof. Jidda Hassan, Prof.
Sherif Abdulahi, Ibrahim Adamu, Kellu Ibrahim and Prof. Bosoma Sherif for
their persistent support in the research on Nigerian Arabic.

Funding: Research was generously supported by the Deutsche Forschungsgemeinschaft (German Research Council) under grant OW 5/5-1/3, 'Idiomaticity and lexical realignment in spoken Arabic'.


Online data corpus:

Owens, Jonathan and Jidda Hassan. In their own voices, in their own words: A
corpus of spoken Nigerian Arabic.


This appendix adds further examples illustrating the parameters defined in
Section 3.2, showing that nouns in literal and idiomatic expressions occur in
parallel contexts.
Literal non-body-part noun qalla 'grain', referred to by an object pronoun in the two following clauses.

(a) i-jiib-ui lee-naiii al qallaii foog at tawwaar nin-uiii bi-kiil-uuihaii min borno bi-
they'd bring us grain on the bulls. We'd go and they measured it (i. e., bought it in measures) from the Kanuri

Literal body-part noun een 'eye', referred to by an object pronoun in the two following clauses.
(b) eeni-ak da ma ti-sill-ahai tu-zugg-ii-ni ba-ai

eye-your DET no you-remove-it.F you-throw-F-me with-it.F
That eye of yours, don't take it out and hit me with it
(GR153, from a folktale)

Noun in an idiomatic expression, referred to by an object pronoun in the following clause.

(c) ana gul wu al-katkad di di bas een-i hu rijil- bas ar raajil bi-iil-hai bi-
I said, this very paper is my eye and my leg (necessity), then a person takes it and throws it

Note that the reference here is ambiguous. The object pronoun -ha in bi-iil-ha is feminine, as is een 'eye', so grammatically -ha can be seen as anaphoric to een. However, the subject noun 'paper', katkada, is also feminine, so it could be that -ha refers only to this noun.

