
Available online at www.sciencedirect.com

Lingua 119 (2009) 1523–1540


www.elsevier.com/locate/lingua

Reading between the (head)lines: A processing account of article omissions in newspaper headlines and child speech
Joke De Lange a, Nada Vasic b, Sergey Avrutin a,*
a Utrecht University, Department of Linguistics, Trans 10 3512 JK, Utrecht, The Netherlands
b University of Amsterdam, Department of Dutch studies, Spuistraat 134 1012 VB Amsterdam, Amsterdam, The Netherlands
Received 10 July 2007; received in revised form 24 September 2007; accepted 16 April 2008
Available online 30 September 2008

Abstract
In psycholinguistic research, minimized, or reduced structures, are a characteristic property of the child and agrammatic Broca’s
aphasics’ speech. These constructions have received much attention in the linguistic literature, with a particular focus on the
omission patterns the two populations exhibit. In most cases, such minimization or reduction has been explained in terms of
different underlying knowledge mechanisms, for example as an incorrect parameter setting, or a partial reduction (‘‘pruning’’) of a
syntactic tree. It is remarkable, however, that in unimpaired adult speech reduction (or minimization) of structure is also possible. As
a characteristic of unimpaired speech, these utterances have been investigated to a much lesser degree. In this article we offer a
processing account of the cross-linguistic similarities and differences observed in the pattern of article omissions in newspaper
headlines, one of the so-called special registers. We demonstrate that there are interesting similarities between this register and the
pattern of omissions observed in small children. Furthermore, this pattern is observed both in Dutch and Italian: two languages that
differ in their inventory of the article system. We present an information theoretical model of article selection and production that
explains the optional nature of the structure reduction in both child speech and newspaper headlines. We conclude that minimization
(or reduction) of structure can be explained in terms of optimization of the underlying processing mechanisms.
© 2008 Elsevier B.V. All rights reserved.

Keywords: Article omission; Language processing; Language acquisition; Newspaper headlines; Information theory; Cross-linguistic

1. Introduction

In the linguistic literature, omissions of articles in language production have mainly been ascribed to populations
with limited processing resources, such as children acquiring their first language and adults with agrammatic aphasia.
Omissions of articles in the early child language production have been studied extensively (Chierchia et al., 1999;
Gerken, 1991; Guasti et al., 2008; Lléo and Demuth, 1999; Pérez-Leroux and Roeper, 1999; Van der Velde, 2006,
among others). The results of these studies show that children acquiring Romance languages omit fewer articles than
children acquiring Germanic languages. This difference leads to a dissimilar developmental pattern of the article
systems in the respective types of languages; children acquiring Romance languages reach an adult-like state at an
earlier age than their Germanic peers. In the previous literature, these differences have been explained in terms of

* Corresponding author.
E-mail address: sergey.avrutin@let.uu.nl (S. Avrutin).

0024-3841/$ – see front matter © 2008 Elsevier B.V. All rights reserved.
doi:10.1016/j.lingua.2008.04.005

differences in semantic properties of nominal systems (Chierchia et al., 1999) and differences in the morphosyntactic
systems of the two types of languages (Pérez-Leroux and Roeper, 1999). For an overview on agrammatic article
omissions see Ruigendijk (2002).
Nevertheless, omission of articles can also be found in healthy speakers in the so-called special registers, such as
diary style (Haegemann, 1990), colloquial speech, telegram style and newspaper headlines (Avrutin, 1999; Stowell,
1999). We focus on the omission of articles in headlines and we examine this phenomenon cross-linguistically in
Dutch and Italian, exemplified in (1) and (2), respectively.1

(1) NEDERLANDS DRUGSBELEID ONTZET FRANSE REGERING


Dutch drugs policy horrifies French government
KRAB VEROVERT NOORDZEE
Crab conquers North sea

(2) LEGGE GASPARI, PERA MEDIA


Gaspari law, Pera mediates
COMANDANTE ARRESTATO PER SPIONAGGIO
Commander arrested for espionage

A number of studies have examined newspaper texts and headlines from a sociolinguistic perspective (e.g. for Dutch see
Nortier, 1995; for English see Bell, 1991; Arnold, 1969; for Italian see Dardano, 1981; Magni, 1992). There are also a few
descriptive linguistic studies on headlines in English (Mardh, 1980; Simon-Vandenbergen, 1981; Straumann, 1935). The
purely linguistic studies of headlines are limited in number. Stowell (1999) was the first to examine English headlines in
the light of the generative framework. He made an attempt at characterizing omissions of articles, auxiliary verb be
omissions and tense omissions with the help of formal syntactic rules. Avrutin (1999) presented evidence for specific
linguistic constraints on headlines, for example constraints on embedding, the use of pronouns, quantifiers, etc. To our
knowledge, only three other studies examined headlines from a linguistic perspective, focusing on cross-linguistic
aspects of article omission and connecting the findings from healthy adults to the omission pattern in children acquiring
their first language (for details, see De Lange, 2008; see also De Lange, 2004; De Lange et al., 2005).
Headlines are an essential ingredient of a newspaper. The most important function of a headline is to convey as
much information as fast as possible. In other words, the reader should be able to grasp the information conveyed by a
headline applying as little processing effort as possible. It has been shown that newspaper readers spend most of their
time scanning headlines without even reading the full stories. A number of studies tested how much the reported
readers could recall after reading full stories versus headlines only (Dor, 2003; Van Dijk, 1988, among others). All of
these studies found that newspaper readers who only scan headlines recall as much as the readers reading full stories. It
seems that the way a headline is constructed, e.g. omission of articles, facilitates its main function.
It is clear, thus, that omissions of articles are not restricted to speakers with limited processing resources but are also
found in healthy adults. Therefore, we should try to obtain a parsimonious model of article selection and production
that could explain, in a more or less unified way, why the optional omission of articles occurs both in populations with
limited processing resources and in special registers. The knowledge-based accounts of child speech relate omissions of articles to the
differences in knowledge that still exist between children and adults. Nevertheless, it is not reasonable to argue that
adults have suddenly lost the knowledge of the use of articles in their language when they use special registers. In
addition, the optional nature of omissions also suggests that they are not governed by a completely different
grammatical system. After all, there is no known language where the use of articles would be optional. This
optionality, at first sight, also appears to be problematic for accounts that claim that omissions are caused by lack of
processing resources. The healthy adult speakers do not lack processing resources when they use special registers.
In principle, it is imaginable that omissions of articles by adults in special registers are unrelated to omissions of
articles by speakers with limited processing resources. If that were the case, then there would be no need for a
unified account. However, the data we discuss here demonstrate similarities between the article omission patterns in
adults in special registers and in child speech. The same cross-linguistic differences are present in both populations;
there are more omissions in Dutch than in Italian. On the basis of these findings we propose a unified account of article

1
As opposed to full sentences, the headlines containing omissions are indicated with capital letters.

omissions in production. This account aims at capturing both the processing difficulties of people with limited
processing resources and the processing difficulties of people with normal processing resources in special contexts.
We focus on headlines where it is allowed (but not required) to omit articles. The optional omissions of articles in
adults were studied by examining the databases of 1000 headlines in both Dutch and Italian. Article omissions in
children were studied by investigating the files of child speech taken from the CHILDES database. The two sets of
articles, Italian and Dutch, differ in the number of members in the set and in the number of features that are relevant for
the correct article selection. The Dutch set of articles consists of three members: een, het and de. The relevant features
for the selection of articles are gender, number and discourse properties. The Italian set, on the other hand, has more
members: il, la, lo, l’, un, una, uno, un’, i, le, gli, del, della, dello, dell’, dei, delle and degli and the features
determining their selection are gender, number, discourse properties, set-partition properties and phonological
context.2 Given these differences, it is not surprising that children’s omission patterns during speech production
may differ across languages. To summarise briefly: healthy adults take longer to select an article in Italian
than in Dutch; in Italian, more information is necessary to retrieve the correct article form. Moreover, the required
information becomes available later in Italian than in Dutch (Alario and Caramazza, 2002; Caramazza et al., 2001;
Schiller and Caramazza, 2003). Hence, the Italian articles seem to be more difficult to produce; nevertheless, children
omit them less often than Dutch children.
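As a rough illustration of how the two inventories differ in size, the sketch below computes the number of bits needed to choose among the listed forms. It assumes, purely for illustration, that all forms are equally likely; this uniform assumption and the function name are ours and are not taken from the frequency data used in the study.

```python
import math

# Article inventories as listed in the text above.
DUTCH_ARTICLES = ["een", "het", "de"]
ITALIAN_ARTICLES = [
    "il", "la", "lo", "l'", "un", "una", "uno", "un'", "i", "le", "gli",
    "del", "della", "dello", "dell'", "dei", "delle", "degli",
]


def uniform_entropy_bits(forms):
    """Bits needed to pick one form if all forms were equally likely.

    The equal-likelihood assumption is illustrative only.
    """
    return math.log2(len(forms))


dutch_bits = uniform_entropy_bits(DUTCH_ARTICLES)
italian_bits = uniform_entropy_bits(ITALIAN_ARTICLES)

print(f"Dutch:   {len(DUTCH_ARTICLES)} forms, {dutch_bits:.2f} bits")
print(f"Italian: {len(ITALIAN_ARTICLES)} forms, {italian_bits:.2f} bits")
```

Under this simplification the Italian choice carries more information per selection than the Dutch one, which is in line with the observation that Italian articles take longer to select.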

1.1. Headlines in Dutch and Italian

1.1.1. Database set-up and description of analysis


The database is a collection of headlines that appeared in Dutch newspapers (De Volkskrant, De Telegraaf and
NRC) and Italian newspapers (Corriere della Sera and Repubblica) during the period from October to December
2003.3
The collected headlines were analyzed for a number of characteristics with regard to article omissions in both
languages. The non-legitimate omissions of articles were instances where articles were omitted in contexts that in
normal standard adult grammar would require the use of an article. The legitimate omissions of articles were cases in
which the omission of the article was required by the standard rules of grammar of the language, such as in bare nouns.
The correct uses of the article were instances in which all obligatory articles were present and used according to the
standard grammar rules. The type of article, definite or indefinite, that was used or omitted was also determined.4 Due
to space limitations we focus only on the overall number of articles produced and omitted. This is a part of a larger
investigation conducted by De Lange (2008) where she also studied the omission pattern as a function of structural
position, presence of finiteness and other linguistic characteristics.

1.1.2. Results (newspaper headlines in Dutch and Italian)


The results of the analysis of the overall article production and omission in both Dutch and Italian are given in
Table 1.
The second column in Table 1, Articles produced, indicates the percentage of article-requiring noun contexts in which
an article was produced, as exemplified in (3). This percentage was calculated by dividing the total number of nouns
that required an article and were produced with one by the total number of nouns that required an article.

2
The forms ‘del, dello, etc.’ are partitive articles. And, though they have the morphological form of the ‘preposition’ di + definite article, they
have a completely different function: they are the partitive counterpart of the indefinite articles in the paradigm, and, thus, are used specifically in
‘article requiring’ (and not: preposition requiring) contexts. For example: where un would be used with a ‘count noun’, ‘del’ is used with mass
nouns, to indicate an unspecified quantity or part of the whole denoted by a noun:
(. . .) C’è una mosca dentro la bottiglia - There is a fly in the bottle
C’e dell’acqua dentro la bottiglia - There is (some) water in the bottle
In fact, even in the Lexicon of Spoken Italian that we used for the frequency data (De Mauro et al., 1993) a distinction was made between ‘del,
dello, della, etc.’ when they are used as articles and when they are used as prepositions. This illustrates that they are used in different and
distinguishable contexts, and that treating the partitive articles on a par with prepositions would be wrong.
3
A preliminary comparison between the paper and the digital versions of the newspapers revealed differences between the two. In order to prevent
the effect of these differences on the results, only paper versions of the newspaper were examined.
4
The exact content of the omitted article was not recoverable in all cases. There were cases for which it was impossible to decide between definite/
indefinite, since both would have been possible (but with a difference in meaning). These were counted separately.

Table 1
Percentages of article production and omission in the headlines examined (p < 0.0001)

Language       Articles produced         Articles produced + standard omissions   Non-standard omission of articles
Dutch          7.9                       12.9                                     87.1
Italian        58.3                      70.8                                     27.9
Pearson’s χ²   χ² = 390.860, p < 0.001   χ² = 391.479, p < 0.001                  χ² = 254.654, p < 0.001

(3) PRAAG LAAT DE KALVERSTRAAT ACHTER ZICH


(Prague beats the Kalverstraat (famous street in Amsterdam))
IL PREMIER DIFENDE TREMONTI
(The Prime Minister defends Tremonti)

The third column, Articles produced + standard omissions, gives the percentage of noun-contexts in which articles
were produced or bare nouns were used without articles, such as in (4), which is in agreement with the rules of standard
adult grammar. This percentage was expressed as the total number of noun-contexts that were in agreement with the
rules of standard adult grammar, divided by the total number of noun-contexts.

(4) MOTOROLA IMPONEERT BELEGGERS NIET


(Motorola does not impress investors)
PIÙ FACILE COSTRUIRE NUOVE CENTRALI
(Easier to construct new power plants)

The fourth column, Non-standard omission of articles, shows the percentage of noun-contexts in which an article was
omitted in a non-standard way, as exemplified in (5). This percentage was calculated as the total number of article-
requiring-noun-contexts in which an article was omitted divided by the total number of article-requiring-noun-contexts.

(5) BANKOVERVALLERS VERLIEZEN GESTOLEN PINAUTOMAAT


(Bank robbers lose stolen bank machine)
PITBULL FERISCE DUE BAMBINI
(Pit-bull attacks two children)
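The three column definitions given for Table 1 can be sketched as a small calculation. The function name, the argument names and the counts in the usage example are hypothetical; the sketch only follows the verbal definitions above and does not reproduce the study's data.

```python
def headline_percentages(produced, nonstandard_omitted, standard_bare):
    """Compute the three percentages described for Table 1.

    produced            -- article-requiring noun contexts realized with an article
    nonstandard_omitted -- article-requiring noun contexts realized without one
    standard_bare       -- noun contexts where standard grammar requires no article
    """
    requiring = produced + nonstandard_omitted   # all article-requiring contexts
    all_contexts = requiring + standard_bare     # all noun contexts
    return {
        "articles_produced": 100 * produced / requiring,
        "produced_plus_standard": 100 * (produced + standard_bare) / all_contexts,
        "nonstandard_omission": 100 * nonstandard_omitted / requiring,
    }


# Hypothetical counts, for illustration only.
result = headline_percentages(produced=20, nonstandard_omitted=60, standard_bare=20)
print(result)
```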

The data in Table 1 show a striking difference between omissions of articles in Italian and Dutch. These data also
exhibit optionality in the omission pattern; the articles are sometimes used and at other times omitted. The substantial
difference between article omissions in Dutch and Italian newspaper headlines clearly points out that omission of
articles in headlines is not solely governed by functional motivations, e.g. space restriction, since space is equally
restricted in both Italian and Dutch newspapers.
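Table 1 reports Pearson's χ² statistics for the comparison between the two languages. As a minimal sketch of how such a statistic can be obtained for a 2 × 2 table of counts (language by produced/omitted), the code below computes it by hand, without a continuity correction. The counts are hypothetical, and whether the reported values were computed in exactly this way is an assumption.

```python
import math


def pearson_chi2_2x2(table):
    """Pearson chi-square (no continuity correction) for a 2x2 count table.

    Returns the statistic and its p-value for 1 degree of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # For 1 degree of freedom the chi-square survival function is erfc(sqrt(x/2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: rows are languages, columns are (produced, omitted).
chi2, p_value = pearson_chi2_2x2([[10, 90], [60, 40]])
print(f"chi2 = {chi2:.3f}, p = {p_value:.2e}")
```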

1.2. Cross-linguistic study of omissions in child language5

In this study we examined spontaneous longitudinal data. There are a number of reasons why this type of data is
used in this study. Unlike cross-sectional data, longitudinal data provide insight into developmental patterns emerging
over time. Spontaneous data are the most suitable data for examination of the extent to which article omissions occur in
the early stages of natural child speech.

1.2.1. Subjects
The data from four Dutch (Abel, Peter, Sara, Tom) and four Italian children (Diana, Martina, Raffaello, Rosa) were
analyzed at several points in time evenly spread across the age range between 1;7 and 3;0 years.

1.2.2. Data
The data were taken from the CHILDES database (MacWhinney and Snow, 1985; for Dutch: the Groningen-
corpus, Bol, 1996; and the corpus, Van Kampen, 1994; for Italian: Cipriani et al., 1989). Table 2 shows the CHILDES

5
A part of this study was performed as a joint project with Maria Teresa Guasti and Anna Gavarrò (see Guasti et al., 2004).

Table 2
Overview of CHILDES files used in the analysis

Child       Files                                     Age range    MLU range   No. of utterances requiring an article
Dutch
Abel        111, 201, 203, 205, 207, 210, 211, 301    1;11–3;0     1.2–3.7     487
Peter       200, 202, 204, 205, 207, 208              1;11–2;08    1.8–3.5     537
Sara        2, 7, 11, 17, 20, 27                      1;7–2;11     1.2–3.0     428
Tom         110, 202, 205, 206, 209, 210, 301         1;10–3;0     1.2–3.1     553
Italian
Diana       2–10                                      1;10–2;6     2.0–4.5     457
Martina     2, 4, 6, 8, 11, 13, 16                    1;7–2;7      1.2–2.7     500
Raffaello   3, 5, 7, 9, 10, 12, 13, 14, 16            1;9–2;11     1.3–2.7     405
Rosa        5, 7, 10, 12, 13, 14, 16, 18, 20, 21      2;0–2;12     1.4–2.8     398

files used, the age range during the period investigated, the MLU range and the number of article-requiring utterances,
in which the article was either used or omitted. The same number of utterances was examined in both languages.6 The
length of the files differed and in the case of shorter files the number of files examined was higher. All the files used
included at least three contexts that required use of an article. The MLU of the first 100 utterances of each file used was
calculated for all children, unless the number of utterances was less than 100, in which case it was based on the total
file. The total numbers of utterances were 1790 for Italian and 2005 for Dutch.
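The MLU rule described above (first 100 utterances, or the whole file when it is shorter) can be sketched as follows. Counting length in whitespace-separated words is an assumption made here for simplicity; the text does not state the unit of measurement, and the function name and sample utterances are hypothetical.

```python
def mlu_first_100(utterances, limit=100):
    """Mean length of utterance over the first `limit` utterances of a file.

    If the file has fewer than `limit` utterances, the whole file is used.
    Length is counted in whitespace-separated words (an assumption).
    """
    sample = utterances[:limit]
    if not sample:
        raise ValueError("file contains no utterances")
    return sum(len(utterance.split()) for utterance in sample) / len(sample)


# Hypothetical short file: fewer than 100 utterances, so all are used.
short_file = ["dat is auto", "auto", "nee"]
print(round(mlu_first_100(short_file), 2))
```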
The repetitions of the same sentence, clear imitations of adult input, idiomatic expressions, rhymes and songs and
routine sentences were eliminated from the analysis. The relevant utterances were analyzed by hand and a distinction
was made between different types of utterances. The aim was to compare the data on omission in child speech with the
data on omission in headlines. Therefore, the division of utterance types was the same as the one used for types of
headlines, as much as possible.
A number of different categories in child speech were distinguished. Nouns in isolation were nouns used without
phrasal context, also used by adults, for example in lists or in naming an object as an answer to a question. Children
often use nouns in isolation in situations where adults would use complete phrases. It is sometimes difficult to decide
whether articles are needed or not when a noun is used in isolation. It is possible that in such cases the number of
articles omitted would be overestimated. This is why omissions before nouns in isolation were counted separately.
Several further types of utterances were distinguished, although their analysis is beyond the scope of the present study. These types of
utterances were: non-verbal utterances (phrases with nouns where the verb was omitted); verbal utterances with a
non-finite verb (phrases with nouns in which a non-finite verb was used); verbal utterances with a finite verb
(constructions with nouns in a phrasal context with a finite verb overtly realized) and utterances in which the noun was
preceded by a preposition (for a detailed discussion, see De Lange, 2008).
The omissions of articles were counted in such a way that all nouns, which would require an article in the adult
language, but were produced without an article by the child, were counted as omissions. For the Dutch data plural and
mass nouns without an article were considered ungrammatical only in those contexts in which the use of an article was
obligatory in the target language. The Italian plural and mass nouns without an article were considered ungrammatical
unless they occurred as direct object or after a preposition. Copular sentences in which the distinction between
argument and predicate was impossible to draw (e.g. This tree is a maple or John is a doctor) were not included in the
counts. The predicates in these constructions could be used without an article in both Dutch and Italian.

1.2.3. Analysis and results


We first analyzed whether non-standard article omissions (that is, omissions in those cases where adult speech requires
overt articles) differed between the two languages, by collapsing the data for the whole period
investigated for all utterance types. As can be seen in Table 3, a significant difference was found between Italian and

6
The differences in omissions cannot be attributed to the fact that some children were older than others during the period investigated because
they were all of roughly the same age in both languages (Mann–Whitney, z = 0.901, p = 0.367; age in months, Italian: M = 27.7, S.D. = 4.8, age
range: 19–36, Dutch: M = 28.8, S.D. = 4.8, age range: 19–36).

Table 3
Overall non-standard omission means and standard deviations in whole period and all contexts under investigation

           Overall omission
           M        S.D.
Dutch      62.71    29.63
Italian    38.17    31.85

Table 4
Means and S.D. of article omission in the two different stages of linguistic development

                             Stage 1: 0.03–0.29 VU    Stage 2: 0.3–0.6 VU      Mann–Whitney St 1–St 2
                             M        S.D.            M        S.D.
Dutch                        87.05    14.84           41.6     21.84           z = 4.372, p < 0.0001
Italian                      58.29    29.5            16.9     17.18           z = 4.097, p < 0.0001
Mann–Whitney Italian–Dutch   z = 2.528, p = 0.011     z = 3.395, p < 0.0001

Dutch, analogous to the results for the headlines; the omission of articles in child speech is higher in Dutch than
in Italian (Mann–Whitney, z = 3.012, p = 0.003).
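The group comparison reported here uses the Mann–Whitney test. The sketch below shows one common way of computing the U statistic and a z value through the normal approximation, with average ranks for ties but without a tie correction in the variance. The omission percentages are hypothetical, and the exact variant used in the study is not specified in the text.

```python
import math


def mann_whitney_z(xs, ys):
    """Mann-Whitney U for sample xs, with normal-approximation z and two-sided p."""
    # Pool both samples, remembering group membership (0 = xs, 1 = ys).
    pooled = sorted((value, group) for group, values in enumerate((xs, ys))
                    for value in values)
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of the tied ranks i+1 .. j
        for k in range(i, j):
            if pooled[k][1] == 0:
                rank_sum_x += average_rank
        i = j
    n1, n2 = len(xs), len(ys)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    z = (u - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2))      # two-sided
    return u, z, p_value


# Hypothetical omission percentages per file, for illustration only.
group_a = [90, 80, 70, 60]
group_b = [50, 40, 30, 20]
u, z, p_value = mann_whitney_z(group_a, group_b)
print(f"U = {u}, z = {z:.3f}, p = {p_value:.3f}")
```

With samples this small the normal approximation is only rough; it is used here to keep the example short.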
In order to specify the developmental pattern of children’s omission of articles, their performance had to be grouped
on the basis of an independent measure representative of their linguistic development.
Following Guasti et al. (2008), we used the rate of verbal utterances (VU) as this independent measure of
linguistic development.7 For each child, the rate of VU in a given file was calculated, and on the basis of this
calculation the data on article use/omission were divided into two classes. The first class included observations
obtained when children’s rate of VU was between 0.03 and 0.29, the second those obtained when it was between 0.30 and 0.60
(VU ranges between 0.05 and 54, M = 30.54, S.D. = 15.75 in Dutch; range between 0.03 and 0.60, M = 27.00,
S.D. = 14.75 in Italian), as can be seen in Table 4.
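The grouping of files into two developmental stages can be sketched as below. Two points are assumptions: that the rate of verbal utterances is the proportion of utterances in a file that contain a verb, and that rates falling between the two stated ranges (for example 0.295) belong to Stage 1. The function names and counts are hypothetical.

```python
def vu_rate(n_verbal, n_total):
    """Proportion of utterances in a file that contain a verb (assumed definition)."""
    return n_verbal / n_total


def stage(rate):
    """Assign a file to Stage 1 or Stage 2 by its VU rate; None if out of range."""
    if 0.03 <= rate < 0.30:
        return 1
    if 0.30 <= rate <= 0.60:
        return 2
    return None


# Hypothetical file with 10 verbal utterances out of 100.
print(stage(vu_rate(10, 100)))
```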
In both stages, omission of articles is higher in Dutch than in Italian (Stage 1: z = 2.528, p = 0.011; Stage 2:
z = 3.395, p < 0.0001). In the second stage Italian children have almost stopped omitting articles and perform almost
adult-like, while Dutch children still omit articles quite often. The data show that children sometimes produced and
sometimes omitted the article. In the adult data on headlines the same optionality was found (see Table 1).
We propose a single model of language processing (for a detailed overview, see De Lange, 2008) that accounts for
both child and adult data on the omission of articles. This model is a combination of Levelt’s (1989) speech production
model and some ideas expressed in Avrutin (1999, 2004a,b). As described in the following sections, we also rely on the
experimental findings reported in the studies by Caramazza and colleagues (Alario and Caramazza, 2002; Caramazza
et al., 2001; Schiller and Caramazza, 2002, 2003) and the basic tenets of Shannon and Weaver’s (1949)
information theory as applied to language processing (Kostic, 2004; Moscoso del Prado et al., 2004).

1.3. The place of article production in the general speech production model

The model proposed by Levelt (1989) explains spontaneous speech production in healthy adults. Its main
components are: conceptualizer, formulator (with grammatical and phonological encoding as its subcomponents),
articulator, speech–comprehension system and monitor. We focus on the stages of the speech production process that
take place up to the level of grammatical encoding and lemma selection. Due to space limitations, only the relevant
components will be dealt with here.

7
Age is not a reliable measure of linguistic development since children at the same age may be more or less advanced. An enormous variability
has been found among children in the age range that is relevant in this study (Bates et al., 1995). Differences in morphology and lexicon make it
problematic to use MLU as a measure of linguistic development in cross-linguistic studies. Another drawback in the use of MLU as a measure of
linguistic development concerns the fact that it does not evaluate qualitative aspects of syntactic development (for a discussion, see Klee and
Fitzgerald, 1985).

The conceptualizer converts the communicative intention of the speaker into a so-called pre-verbal message. At
this level the speaker decides on the informational content as related to the amount of knowledge shared between her
and the interlocutors. The selected information is ordered according to relative importance; the speaker marks the
information status of referents as given or new, assigns topic and focus and so on. Each referent in a message is
assigned an accessibility index, which informs the listener where the referent can be found: in the store with shared
knowledge, in the store with general knowledge or somewhere else. Suppose the speaker wants to inform the hearer
about the car she bought, and that her thoughts are the non-linguistic counterpart of (6) or (7).

(6) I bought a yellow car


(7) I bought the yellow car

The final form of the produced utterance (with definite or indefinite article) depends on the amount of shared
knowledge between participants. In (6) the speaker assumes that the car represents new information for the hearer,
therefore, the pre-verbal message will indicate that there is no shared knowledge between speaker and hearer with
respect to the object ‘car’. An accessibility index will indicate that a new referent has to be stored in the discourse
model. In (7), the speaker assumes that the car is old information, and an accessibility index will instruct the hearer to
search for the intended referent in the shared knowledge store. The accessibility index is thus part of the pre-verbal
message of the noun; there is no separate encoding for the semantic information of an article. The selection and
production of the article is triggered by the accessibility status of the referent in the pre-verbal message of the noun.
The output of the conceptualizer is the input to the formulator, which translates the pre-verbal message into a linguistic
structure in two consecutive steps: the grammatical and phonological encoding. In the grammatical encoder, the semantic
information contained in the pre-verbal message activates the matching lemmas in the mental lexicon. The lemma
comprises the lexical item’s meaning or sense and its syntactic properties. The syntactic category of the lemma then
initiates category-specific functional procedures in order to build syntactic structure. The lemma of a noun will contain
information about its grammatical properties such as gender, the number of the referent, the accessibility status of the referent
and the topic/focus status of the referent. The grammatical encoder uses this information to select an article that is
appropriate for expressing the accessibility status. This initial mapping creates the so-called surface structure on the basis
of which the phonological forms of the words are accessed and a phonetic and an articulatory plan of the utterance are
built.
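The step in which the grammatical encoder selects an article from the information carried by the noun lemma can be sketched for the Dutch set (een, het, de). The class and function names are hypothetical, the accessibility index is reduced to a single boolean, and mass nouns and the topic/focus status are left out; this is an illustration of the idea, not an implementation of Levelt's model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NounLemma:
    form: str
    gender: str       # "common" or "neuter"
    number: str       # "sg" or "pl"
    accessible: bool  # True if the referent is in the shared knowledge store


def select_dutch_article(lemma: NounLemma) -> Optional[str]:
    """Pick a Dutch article from gender, number and accessibility status."""
    if not lemma.accessible:
        # New referent: indefinite article in the singular, bare plural otherwise.
        return "een" if lemma.number == "sg" else None
    if lemma.number == "sg" and lemma.gender == "neuter":
        return "het"
    # Definite common-gender singulars and all definite plurals take "de".
    return "de"


print(select_dutch_article(NounLemma("huis", "neuter", "sg", True)))
print(select_dutch_article(NounLemma("auto", "common", "sg", False)))
```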
Levelt defines use of different registers as ‘varieties, which may have characteristic syntactic, lexical and phonological
properties’ (1989:368). Nevertheless, he never explicitly states that registers are conceptually conditioned. We assume
that information about the choice of a register is encoded in the pre-verbal message and that it depends on the discourse
model and situational knowledge. This is a reasonable assumption since the conceptualizer is the only level at which the
speaker has access to this kind of knowledge. We also argue that the choice of a register style is conceptually conditioned
and takes place at the level of macroplanning. The choice of lexical elements and syntactic structures appropriate for the
realization of speaker’s communicative goal crucially depends on this choice.8 The assumption that selection of a register
style takes place in the conceptualizer is necessary in order to explain why ‘special registrese’ is used only when highly
specific contextual conditions are satisfied. However, this assumption is not sufficient to account for all the differences we
find between normal and special registers. We will show in the remainder of this paper that the cross-linguistic differences
in article use in headlines can only be explained if we take into account differences in processing cost. This cost is related
to the complexity in the morphological article paradigms in different languages.
In fact, Avrutin (2004a,b, 2006) argues that the relative complexity of what he refers to as the morphosyntactic
channel is a determining factor in the competition between two routes for encoding information. Under normal
circumstances, the message from the conceptualizer is translated into linguistic material using the morphosyntactic
channel, which in normal adults in normal registers is the most economical route. All ingredients of the message,
therefore, find their realization in overt speech. The morphosyntactic route, according to Avrutin, is in competition
with an alternative way of encoding information, which is based on presupposition. Avrutin argues that there are two

8
We do not assume that in the case of ‘special registrese’ speakers really have separate, ready-to-use ‘special’ registers at their disposal. The coding
for the characteristic properties of a special register has to be seen in a more abstract way, such as not marking the given/new
distinction or tense specification in the pre-verbal message of utterances in specific contexts. In these specific contexts this may even be an
‘unconscious’ choice, in the sense that, given a specific context, given/new marking or tense specification is not necessary for the realization of the
communicative goal. But, whether or not it is based on a conscious decision, it has to be part of the pre-verbal message.

possibilities for this alternative route to take over some of the functions of the morphosyntactic route. Either the
discourse channel becomes less expensive than the morphosyntactic channel (for Avrutin this is the case of special
registers), or the morphosyntactic channel is not fully developed (or impaired), which is the case of child and aphasic
speech. What Avrutin’s model lacks, however, is an explicit way of measuring the relative complexity of the
morphosyntactic channel, which we will address below.
Finally, in order to formulate a model of article omission in special circumstances (special registers or child
speech), it is important to consider how unimpaired adults process articles under normal circumstances.
To our knowledge, the only comprehensive groups of studies on article processing are studies by Caramazza and his
colleagues (see for details, Spanish and Catalan data: Costa et al., 1999; Italian data: Miozzo and Caramazza, 1999;
French data: Alario and Caramazza, 2002; Dutch data: Janssen and Caramazza, 2003; German and Dutch data:
Schiller and Caramazza, 2002, 2003).9 In his Primed Unitized Activation Model, Caramazza proposes that the article
production process is tuned to language-specific properties. This stands in contrast to open class lexical items, which are
processed in the same way in all languages. According to this model, each article is represented by a language-specific
frame with slots that must be filled with feature values. Each feature corresponds to a separate slot for discourse/
pragmatic information (def), grammatical gender and phonological value of the context (in languages where
realization of articles depends on the phonological context). All slots must be filled in for an article form to be
retrieved. The appropriate article form can be selected only when information about the semantic value, the
grammatical gender and the phonological value of the context are all simultaneously active in the system. However,
the activation of different article forms does not have to wait until all slots are filled. In fact, during the selection
process all articles corresponding with the feature content of a specific slot will be activated. For example, all article
forms compatible with the information masculine (in Italian: il, lo, l’) will be pre-activated to some degree when the
gender information masculine is specified, even the ones that are possibly incompatible with other features, such as
phonological context or discourse/pragmatic information. The final choice between the activated candidates is made
on the basis of their level of activation at the moment of article selection.
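
The slot-filling and pre-activation mechanism described above can be illustrated with a toy sketch. The small feature table, the slot names and the one-point-per-matching-feature scoring are illustrative assumptions made here for exposition; they are not part of Caramazza's model or of the data discussed in this article.

```python
# Toy sketch of slot filling with pre-activation.
# ASSUMPTIONS: the feature table below covers only a handful of Italian
# singular forms in simplified form, and activation is scored as one point
# per compatible filled slot. Both choices are purely illustrative.
FORMS = {
    "il": {"gender": "masc", "definite": True, "onset": "consonant"},
    "lo": {"gender": "masc", "definite": True, "onset": "s+consonant"},
    "l'": {"gender": "masc", "definite": True, "onset": "vowel"},
    "la": {"gender": "fem", "definite": True, "onset": "consonant"},
    "una": {"gender": "fem", "definite": False, "onset": "consonant"},
}
SLOTS = ("gender", "definite", "onset")


def activation(filled):
    """For every form, count how many of the filled slots it is compatible with."""
    return {
        form: sum(feats[slot] == value for slot, value in filled.items())
        for form, feats in FORMS.items()
    }


def select(filled):
    """Return the most active form, but only once every slot is filled."""
    if set(filled) != set(SLOTS):
        return None  # frame incomplete: forms are active, but none is retrieved
    scores = activation(filled)
    return max(scores, key=scores.get)


partial = {"gender": "masc"}
print(activation(partial))  # all masculine forms receive some activation
print(select(partial))      # no form is retrieved yet

full = {"gender": "masc", "definite": True, "onset": "vowel"}
print(select(full))
```

With only the gender slot filled, all masculine forms are active but none is retrieved; the choice among them is made only once the remaining slots are specified.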
On the basis of numerous experimental studies conducted by Caramazza and his colleagues, a distinction can be made
between the so-called early-selection-languages (Dutch and German) and late-selection-languages (Italian, Spanish,
Catalan and French). This difference depends on the moment the necessary information for article retrieval becomes
available in the process of production of the particular NP that the article is associated with. In Dutch, the selection of an
appropriate article depends only on the retrieval of the noun’s gender and number. As soon as these features are available,
article selection can take place. Therefore, anything that might disturb gender selection in Dutch adversely affects article
selection. In Italian, on the other hand, the selection of the appropriate article form is made at a later point when
phonological phrases are assembled. The process of article selection is, therefore, far less vulnerable; interference with
gender selection does not necessarily affect article selection. There is enough time to resolve conflicting information
about gender before this information is needed for selecting a specific article at the level of phonological phrase assembly.
In Italian, phonological phrase information is crucial for article selection and becomes available at a post-lexical stage
where phonological forms are ordered for output. Crucially, the difference in the article selection process in Dutch and
Italian can be explained by the different role phonology plays in article selection in these languages.
There seems to be a contradiction in findings from regular speech production of adults, on the one hand, and the
pattern of omissions in children and in headlines, on the other hand. Dutch (and, incidentally, German) adults
process articles faster than adult speakers of Romance languages. Therefore, we would expect more omissions in
headlines in the language where article selection takes longer. However, as we have seen above, children acquiring
Dutch omit more articles in production than children acquiring Italian. In addition, in special contexts we also find
more omissions of articles in Dutch newspaper headlines. We find most omissions in the language with the fastest adult
processing time. The lower the processing time needed for article production, the more omissions are observed, both in
child and adult speech (in headlines). This generalization seems to be rather puzzling. We address this puzzle in the
next section. We will argue that there is an objective measure of complexity of the article paradigm that quantifies the
Italian article set as less resource consuming in comparison to the Dutch article paradigm. Children whose processing
resources are limited demonstrate a better performance in production of articles selected from a ‘simpler’ paradigm;

9
Many studies on language processing focus on the production of open class words (Levelt, 1989; Roelofs, 1997). The processing of closed class
words has only been studied in comparison to the processing of open class words in general (Bradley, 1978; Bock, 1989; Bock and Loebell, 1990;
Garrett, 1982).

Fig. 1. Time course of article production processes in Italian and Dutch.

thus, they omit fewer articles in Italian than in Dutch. It is the speeded language production that matters in headlines.
Using the same objective measure of article selection and production, we will demonstrate that article omission
becomes more beneficial, in this sense, in Dutch than in Italian, hence we observe more omissions in Dutch.

1.4. Addressing the puzzle

In order to solve this puzzle, a distinction has to be made between the ‘processing’ time and the ‘selection’ time.
An important observation is that we have to distinguish two ‘time points’: the point marking the start of the activation
process and the point marking the start of the actual selection process. Fig. 1 illustrates the time course of article
production processes in Dutch and Italian, divided into these two stages.
As already pointed out, the final selection of an article in Italian begins later than in Dutch. Nevertheless, the actual
selection process costs less time in Italian. The left block of the time bar in Fig. 1 represents the time during which the
article is activated. The selection of the noun incurs processing cost. However, once the noun is selected, the
grammatical features necessary to fill the article slots are available; they do not require additional processing
resources. This process takes longer in Italian than in Dutch.
As can be seen in Fig. 1, the actual selection process starts later in Italian. Nevertheless, the selection process costs less
time in Italian. We assume that the activation of most of the features (gender, number and the phonological context),
necessary for the selection of the article is a by-product of other processes taking place, such as the selection of the noun.10
In the same way, grammatical features for article selection originate from the selection of another element, a noun, for
example. The selection of the noun itself requires processing resources, which is why there is specific processing cost
related to its selection. Once the noun is selected, the grammatical features necessary to fill the article slots are available.
Therefore, they do not require additional processing resources during the article selection process. They even become
available if the article is not selected (Costa et al., 1999; Miozzo and Caramazza, 1999; Alario and Caramazza, 2002;
Schiller and Caramazza, 2003). This means that the process represented by the left-hand block of the time bar, by itself,
does not make demands on available processing resources with respect to the article production per se. The process of
article selection, of course, does take time and this process takes longer in Italian than in Dutch.
The right block of the time bar represents the actual selection process. Here the processor is actively working on the
selection of the article. The selection can be more difficult, for example, because of the higher competition with other
elements of the set (caused, as discussed below, by a more similar probability distribution). In that case the selection
process will take longer and cost more processing resources than in the case when the elements are more conspicuous
and can easily be distinguished (meaning that the probability distribution of the set members is more diverse). This part
of the process takes longer in Dutch than in Italian. It may very well be the case that the total time necessary to process
an article is longer in Italian, but a large portion of this time the processor is not actively working on the selection of the

10
One may informally compare it with the manufacturing process. As is the case with by-products (spin-offs) in a production company: up to the
stage where the products will be processed individually, a by-product has no specific cost related to it. They come ‘for free’ because of the
production of another product.

article. After all, other processes take place as well, such as selection of other elements (the noun, the phonological
context, etc.).
Caramazza and his colleagues (Alario and Caramazza, 2002; Caramazza et al., 2001; Costa et al., 1999; Janssen and
Caramazza, 2003; Miozzo and Caramazza, 1999; Schiller and Caramazza, 2003) argue that for an article to be selected
its frame must be completely filled. However, they do not take into consideration the role of the characteristics of the
set out of which the article has to be selected; languages do differ in the complexity of their article paradigms. The
selection of an article out of the set of possible articles in one language may be a more ‘costly’ operation in terms of
processing resources required to retrieve the article, than selecting an article from another language. In other words,
languages may differ in the level of activation necessary to activate the article selection. If article selection is a
relatively ‘easy’ process in a language the activation level necessary for article selection will be low.11 If the activation
level is low an article can be selected, in spite of the fact that the article frame is not completely filled. The suggested
complexity of an article paradigm, of course, needs to be defined. In the following section we define the complexity in
terms of the information theory.

1.5. Aspects of the information theory and its application to language processing

1.5.1. Basic notions of the information theory


When an article is to be produced, the targeted form needs to receive the highest activation among the members of
its set. As discussed earlier, there is evidence that other members of the set also receive a certain amount of activation.
The final output is a result of a competition between all members of the set. The process of activation can, therefore, be
characterized as resolving uncertainty with regard to which element of the set will win the competition. Uncertainty is
a notion that can be measured and it is the basic notion of the information theory that provides mathematical tools for
measuring the so-called information value of an uncertain output. We begin with a brief introduction to this theory and
then turn to its application in language research.
The information theoretical approach adopted here finds its roots in Shannon and Weaver’s (1949) information
theory (see also Schneider, 1984; De Lange, 2008). In this framework, the notion information is used in a rather
counterintuitive way and has nothing to do with the meaning of a message. Unlike the concept meaning, it applies to
the situation as a whole rather than to an individual message. The information unit indicates that, in a particular
situation, there is a certain amount of uncertainty that a particular message will be selected. It thus refers to the degree
of freedom in choosing this particular message. The greater the freedom of choice, the greater is the uncertainty that a
particular message will actually be selected. In order to avoid confusion, it is important to note that the message in the
information theory can be anything from a single letter to a long text. We will be applying this approach to the process
of article selection; therefore a message, for our purposes, is the targeted article (not to be confused with the message
as an output of the conceptualizer).
The concepts information and uncertainty are technical terms related to the process of selecting one or more entities
out of a set of entities. There are two types of uncertainty or information value: that of an individual element and of a
system. Let us begin first with the former and, in order to specify this further, let us assume that there is a mechanism
that can produce one out of three entities: A, B and C. In this particular system the uncertainty is that of three entities as
we are not sure which one of the three will be produced. When, for example, the entity B is produced the uncertainty
decreases and we have received some information. If another mechanism is active in parallel and producing two
entities X and Y, its uncertainty will be that of two entities. When the two mechanisms are combined, there are six
possible combinations, hence an uncertainty of six entities (AX, AY, BX, BY, CX and CY). The uncertainty is,
therefore, an additive measure expressed in the logarithm of the number of possible entities.
In the above example the first mechanism has the uncertainty of log(3), the second of log(2) and the two
mechanisms combined of log(3) + log(2) = log(6). The base of the logarithm determines the units.
When we use base 2 the units are in bits (base 10 gives digits). Thus, if a mechanism produces one entity, we are
uncertain by log2 1 = 0 bits, and we have no uncertainty about what the device will do next. If it produces two symbols
our uncertainty would be log2 2 = 1 bit. So far, the formula for uncertainty is log2(M), with M being the number of
entities. Let us now extend the formula so it can handle cases where the entities are not equally likely to be produced.

11
This is essentially what Avrutin (1999, 2006) means when he discusses the relative complexity of the morphosyntactic channel, namely the ease
of retrieval of an element used for encoding part of the message.

For example, if there are three possible entities, but one of them never appears, then the uncertainty is 1 bit. If the third
entity appears rarely, relative to the other two entities, then the uncertainty should be larger than 1 bit, but not as high as
log2(3) bits. As a first step towards the case of unequal probabilities, the formula can be rewritten as in (8), where P = 1/M is the
probability that any one of M equally likely entities appears.

(8) $\log_2(M) = -\log_2(M^{-1}) = -\log_2(1/M) = -\log_2(P)$
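
The counting argument for the two mechanisms, and the rewriting of log2(M) in terms of the probability P = 1/M, can be checked numerically. The following is a minimal sketch; the variable names are chosen here for illustration only.

```python
import math
from itertools import product

first = ["A", "B", "C"]   # mechanism producing one of three entities
second = ["X", "Y"]       # mechanism producing one of two entities

# Running both mechanisms in parallel yields every pairing of their outputs.
combined = ["".join(pair) for pair in product(first, second)]

u_first = math.log2(len(first))        # uncertainty of the first mechanism
u_second = math.log2(len(second))      # uncertainty of the second mechanism
u_combined = math.log2(len(combined))  # uncertainty of the combined system

print(combined)
print(u_first + u_second, u_combined)  # the two values agree: uncertainty is additive

# For M equally likely entities, log2(M) equals -log2(P) with P = 1/M.
M = len(first)
P = 1 / M
print(math.log2(M), -math.log2(P))
```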

The amount of information (H) of an element (x) in a set, expressed in bits of information, is equal to the negative logarithm of
the probability that this specific element will appear. It expresses the decrease in uncertainty that will be achieved if the
element becomes available, which is exemplified in Formula (1)12:

$H(x) = -\log_2 P(x)$ (1)

H(x), written I in the calculations below, thus expresses the informative value of a specific single element in a set. The second type of uncertainty is that of
the system, i.e. the average information per element of the system. The value of the uncertainty of a system is the
average surprise for the string of symbols produced by the device:

$H = -\sum_i P_i \log_2 P_i$ (bits per symbol) (2)

Formula (2) is Shannon’s general formula of the uncertainty of a system, i.e. absolute entropy (H) of the system.
Absolute entropy reflects the level of uncertainty within a system and is based on the probability distribution of the
elements in the system. The formula comprises the sum of the informative values of each element in the system, in
which each of these values is weighted by its share in the total frequency. The formula thus expresses the average
information per element of a set, in other words the average surprisal of a set of N elements with corresponding (not
necessarily equal) probabilities p1, p2, p3, . . ., pN.
Let us illustrate this with the calculation of the absolute entropy based on the actual probability distribution rate of
articles in Dutch (taken from the Spoken Dutch Corpus): Probability ( p) of de = 0.479; ( p) of het = 0.182 and ( p) of
een = 0.339. These have the following surprisal/informative values (I), calculated by −log2(Pi):

I of de = 1.062 bits;
I of het = 2.456 bits, and
I of een = 1.561 bits.

The uncertainty (H), the absolute entropy of the Dutch article system, is thus H = 0.479 × 1.062 + 0.182 ×
2.456 + 0.339 × 1.561 = 1.485 bits per symbol.13
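
The calculation can be reproduced from the raw corpus frequencies reported in Table 5. The sketch below is a minimal illustration; the function names `surprisal` and `entropy` are chosen here and do not come from the article.

```python
import math

# Article frequencies from the Spoken Dutch Corpus, as reported in Table 5.
dutch_freq = {"de": 253210, "het": 96327, "een": 179119}


def surprisal(p):
    """Informative value of a single element, Formula (1), in bits."""
    return -math.log2(p)


def entropy(freqs):
    """Absolute entropy of a set, Formula (2), in bits per symbol."""
    total = sum(freqs.values())
    return sum((f / total) * surprisal(f / total) for f in freqs.values() if f > 0)


total = sum(dutch_freq.values())
for form, f in dutch_freq.items():
    p = f / total
    print(f"{form}: p = {p:.3f}, I = {surprisal(p):.3f} bits")
print(f"H = {entropy(dutch_freq):.3f} bits per symbol")
```

Working from the unrounded frequencies gives the same value of H, to three decimals, as the hand calculation with rounded probabilities.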
It is not difficult to see that the absolute entropy reaches its minimum level when one element has probability of 1
and all the rest that of 0. There is no uncertainty in that case; we can be completely sure about which element will be
selected, hence H equals 0. The entropy of a system reaches its maximum level if all elements have the same
probability of occurrence. In our example the maximum absolute entropy is 1.585 (for details, see footnote 13). This
example shows that the more ‘unequal’ the probability distribution is (i.e. the higher the probability differences

12
According to Tribus (1961), this measure is also called the ‘surprisal’. If a specific element is a very ‘common’ element, the surprise will be
‘low’, if it is a ‘rare’ element the surprise will be ‘high’.
13
The entropy of a system reaches its maximum level (Hmax) if all elements have the same probability of occurring. Let us assume that all three Dutch
articles have the same probability of occurrence: p of de = 0.333; p of het = 0.333 and p of een = 0.333. These give the same surprisal/informative
values (−log2(Pi)):
I of de = 1.585 bits
I of het = 1.585 bits, and
I of een = 1.585 bits.
All articles have the same probability of being selected; therefore, the uncertainty is: H = 0.333 × 1.585 + 0.333 × 1.585 + 0.333 × 1.585 = 1.585
bits per symbol. This is the maximum level of uncertainty in the system, the maximum entropy level (Hmax) of a system of three elements. From the
example above we see that a set of articles with unequal probability of occurrence has lower entropy.

between the elements in the system), the lower the absolute entropy value will be, meaning that there is less uncertainty
in the system.14
The absolute entropy reflects the level of uncertainty of a single system. Nevertheless, in order to compare two or more
systems with an unequal number of elements we cannot just compare the absolute entropy values of the two systems. The
absolute entropy is sensitive to the number of elements in the system, since it is calculated on the basis of the sum of the
informative value of the elements. The sum of the informative values of the single elements (and hence, the absolute
entropy) is very likely to be higher in a set of for example 17 elements than in a set of 3 elements. The more elements a
system has, the more choices there are. However, selecting a specific element out of the set does not necessarily have to be
more difficult in a set that contains a larger number of elements. If the elements in the larger set are more conspicuous than
the elements in the smaller set, selecting an element out of the larger set will be easier, in spite of the higher absolute
entropy value. Therefore, in order to compare the uncertainty levels of systems with an unequal number of elements we
need a measure that is not sensitive to the number of elements in the system: this is the relative entropy value.
In order to calculate relative entropy first the absolute entropy of the set must be calculated, based on the actual
probability distribution of the elements. Further we need to calculate what the entropy of this set would have been if all
elements would have had the same probability to occur (this gives the maximum entropy, Hmax). The maximum
entropy reflects that situation where all members of the set have equal probability and, therefore, represents the highest
uncertainty. Finally, the absolute entropy of the set is divided by the maximum entropy to get the value of relative
entropy. In our example above: Hr = H/Hmax = 1.485/1.585 = 0.937. The values of relative entropy range from 0 to 1,
and relative entropy is not sensitive to the number of elements in the system. Relative entropy represents an index of
homogeneity of probability distribution. The higher the relative entropy value, the more the actual entropy value of the
sets approaches the maximum level of entropy, and hence, the more the uncertainty level of the set approaches the
maximum level of uncertainty. In other words, the higher the relative entropy, the more homogeneous the probability
distribution of the elements is, and the more difficult it is to select an element out of the set.
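
The three steps (absolute entropy, maximum entropy, and their ratio) can be sketched as follows. This is a minimal illustration using the rounded Dutch probabilities quoted above; `relative_entropy` is a name chosen here for the ratio H/Hmax as defined in this section, and the two extra distributions are added only to show the bounds of the measure.

```python
import math


def entropy(probs):
    """Absolute entropy in bits per symbol; zero-probability elements add nothing."""
    return sum(-p * math.log2(p) for p in probs if p > 0)


def relative_entropy(probs):
    """H / Hmax, where Hmax = log2(N) is the entropy of N equiprobable elements."""
    h_max = math.log2(len(probs))
    return entropy(probs) / h_max


dutch = [0.479, 0.182, 0.339]   # rounded probabilities of de, het, een
print(round(relative_entropy(dutch), 3))

# Bounds of the measure: an equiprobable set reaches the maximum, and a set
# in which one element is certain reaches the minimum.
print(relative_entropy([1 / 3, 1 / 3, 1 / 3]))
print(relative_entropy([1.0, 0.0, 0.0]))

# An equiprobable set reaches the maximum regardless of its size.
print(relative_entropy([1 / 17] * 17))
```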

1.5.2. Language processing and information theory


To begin with, recent brain imaging research showed that the human brain is sensitive to the amount of entropy. In a
study using the fMRI technique, Overath et al. (2007) observed that the brain activation increases when the entropy of
the processing signal increases. In this article, however, we do not wish to go into the neurological side of the story but
we will focus on the available behavioural data as related to lexical processing. Specifically, we will base our
discussion on a series of experiments conducted by Kostic and colleagues. Kostic (2004) provided convincing
evidence that the human language processing system is sensitive to information load as defined in the previous section.
He conducted a series of studies on the processing of inflected noun forms and found that reaction times to these forms
were dependent on the individual information load of a case form (see Formula (1)) and the average information of the
set of case forms also referred to as the absolute entropy (see Formula (2)). In a number of experimental studies, Kostic
found a strong correlation between the amount of information carried by a specific form and response latencies.
However, the amount of information of a form is not the only factor influencing response latencies. In addition to the
average amount of information of a set, Kostic also calculates the average amount of information in the experiment. This
value is the ‘global information load’ as opposed to the ‘individual amount of information’ of a specific form. An increase
in global information load (entropy) was associated with a non-linear increase in processing speed and, therefore, an
increase in the amount of information processed per unit of time (bit/s). In other words, a particular form, when presented
in the context of other forms with high informational values (high entropy context), will be processed faster than when it is
presented with other forms of low informational value (low entropy contexts). Kostic found that the regression slope
varied systematically with the average amount of information from experiment to experiment. In other words, the
processing time for a particular form varies as a function of the informational values of the other forms presented in the
experiment.
According to Kostic, there are two factors influencing recognition latencies for inflected noun forms in Serbian. At
the level of the individual case form, processing time was directly proportional to the amount of information carried by
that specific form (see Formula (1)). The higher the amount of information carried by the specific form, the longer the
reaction time. At the global level, where information load was defined in terms of the average amount of information in

14
In the physical sciences the entropy associated with a situation is a measure of the degree of randomness (shuffled-ness) in the situation. When a
situation has a high degree of randomness, the entropy will be high. When there is a low degree of randomness, the entropy will be low.

Table 5
Absolute entropy values of Dutch articles (based on frequencies in the ‘Corpus Gesproken Nederlands’)

Article   freq.     p = rel. freq.   −log2 p        −p log2 p
De        253210    0.478969311      1.061994874    0.508662953
Het       96327     0.182211116      2.456317116    0.447568284
Een       179119    0.338819573      1.561410877    0.529036566
Total     528656    1                               1.485267803

Absolute entropy H = 1.485267803

an experiment (see Formula (2)), higher global information load is associated with faster processing time per bit of
information. The higher the absolute entropy in the system, the higher the processing speed per bit of information.
Kostic convincingly shows that our language processing system is sensitive to differences in the amount of
information of a single element (Formula (1)) and the average amount of information in the set (Formula (2)). Hence both
values can be adequate descriptors of the cognitive complexity of word forms. In the following section we apply the
information theoretical approach to the process of article selection. For the purposes of the present study we focus only on
the individual amount of information. For a detailed discussion of the influence of the global information load on article
omission in various linguistic constructions, such as finite versus non-finite sentences, the reader is referred to De Lange
(2008).

1.5.3. Articles and information theory


In this study, the process of article selection is examined in two different article sets, Dutch and Italian. The two
sets have an unequal number of elements; the Dutch set contains 3 elements and the Italian set contains 17 elements.
In order to compare the complexity of two sets with an unequal number of elements, relative entropy of the two sets
must be calculated. Tables 5 and 6 provide the calculations of absolute entropy for the Dutch and Italian article
system.

Table 6
Absolute entropy values of Italian articles (frequency data from the corpus of De Mauro, 1993)

Article   freq.     p = rel. freq.   −log2 p        −p log2 p
il        7,111     0.195351776      2.355853723    0.460220209
la        7,254     0.19928024       2.327129434    0.463750911
le        2,536     0.069668416      3.843351434    0.267760205
lo        425       0.011675503      6.420371433    0.074961069
i         2,637     0.072443065      3.787008608    0.27434251
l’        3,120     0.085711931      3.54436015     0.303793953
gli       820       0.022526854      5.472210364    0.123271682
un        6,624     0.181973023      2.458203506    0.447326723
un’       800       0.021977418      5.507834274    0.121047977
una       3,838     0.105436664      3.245551468    0.342200119
uno       227       0.006236092      7.325141977    0.045680262
degli     111       0.003049367      8.357274598    0.025484395
dei       394       0.010823878      6.529638644    0.070676015
del       18        0.000494492      10.98176546    0.005430394
della     15        0.000412077      11.24479987    0.004633719
delle     470       0.012911733      6.275173517    0.081023366
dello     1         2.74718E−05      15.15169046    0.000416244
Total     36,401    1                               3.112019753

Absolute entropy H = 3.112019753

Fig. 2. Probability distribution of Dutch articles. Dutch article set: absolute entropy = 1.49 bits, relative entropy, Hr = 0.94.

In order to calculate relative entropy we need to know the frequency distributions of the elements in the two article
sets.15 On the basis of frequency distributions we can calculate the probability of occurrence for each article ( p = rel.
freq. in Tables 5 and 6). On the basis of the probability of each article we can calculate the individual informative load,
using Formula (1) (H(x) = −log2 P(x)) from section 1.5.1. Then, in order to calculate the entropy of the article set, we
have to calculate the average information value of the elements in the set, H = −Σ Pi log2 Pi (bits per symbol). As the
tables show, this gives us the following values: H dutch = 1.485267803 and H italian = 3.112019753.
Before we proceed further, an important word of caution is needed since we do not refer to probability as a measure
that indicates how well, for example, a masculine singular noun predicts a specific article form. We define probability
as the probability distribution of elements within a set as a measure that reflects the organization of lexical storage of
the elements. It characterizes the ease or the effort of lexical retrieval of an element out of the set that is stored in the
lexicon. Differences in probability distribution reflect differences in the strength of memory traces of elements stored
in the set. The more frequently an article is used in a specific function, the stronger its memory trace is. The selection
process of an article is reflected in competition between different elements. The stronger the memory trace of a
particular article, the easier it is for that article to win over other competitors, hence, the easier the selection process of
that particular article is.
The absolute entropy value of the Italian article system is higher. Nevertheless, as we argued previously, this is not
surprising, as we are comparing the absolute entropy values of two systems with an unequal number of elements. In a
comparison of two systems with an unequal number of elements, the system with the highest number of elements will
very likely have the highest absolute entropy value. The Italian system has more elements than the Dutch system.
Therefore, the absolute entropy is expected to be higher in the Italian system, and it is. This fact, however, is not
informative in relation to the differences in the complexity level between the two systems.
In order to assess the differences in the complexity of the two systems and compare entropy values of systems with
unequal number of elements the relative entropy is calculated: Hr = H/Hmax.

                 Dutch          Italian
H                1.485267803    3.112019753
Hmax             1.584962501    4.087462841
Hr (= H/Hmax)    0.937099649    0.76135732

Figs. 2 and 3 illustrate the probability distribution of articles in the Dutch and Italian article set. The probability
differences are more conspicuous in the Italian article set, as is confirmed by the values of relative entropy. Note that, in
spite of the lower degree of complexity, the absolute entropy value is higher in Italian than in Dutch.
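
The comparison can be reproduced directly from the frequency counts in Tables 5 and 6. The sketch below is a minimal illustration; the function name `entropy_profile` is chosen here and is not part of the original analysis.

```python
import math

# Frequency counts as reported in Table 5 (Dutch) and Table 6 (Italian).
dutch_freq = {"de": 253210, "het": 96327, "een": 179119}
italian_freq = {
    "il": 7111, "la": 7254, "le": 2536, "lo": 425, "i": 2637, "l'": 3120,
    "gli": 820, "un": 6624, "un'": 800, "una": 3838, "uno": 227,
    "degli": 111, "dei": 394, "del": 18, "della": 15, "delle": 470, "dello": 1,
}


def entropy_profile(freqs):
    """Return (H, Hmax, Hr) for a set of forms given their frequency counts."""
    total = sum(freqs.values())
    h = sum(-(f / total) * math.log2(f / total) for f in freqs.values())
    h_max = math.log2(len(freqs))   # entropy if all forms were equiprobable
    return h, h_max, h / h_max


for name, freqs in (("Dutch", dutch_freq), ("Italian", italian_freq)):
    h, h_max, h_rel = entropy_profile(freqs)
    print(f"{name}: H = {h:.3f}, Hmax = {h_max:.3f}, Hr = {h_rel:.3f}")
```

The Italian set has the higher absolute entropy simply because it has 17 members rather than 3, while its relative entropy is the lower of the two.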
The relative entropy values show that the degree of uncertainty in Dutch article selection process is higher than in
Italian. The Dutch article system consists of, more or less, evenly distributed probabilities for different articles. The
entropy value of the Dutch article set is closer to the maximum entropy level than the Italian entropy value. In other
words, the contrast between the probability values of the individual elements is bigger in Italian than in Dutch. This
leads to smaller relative entropy in Italian. The overall probability differences in Italian are more pronounced. That is

15
The corpora of Spoken Dutch (Corpus Gesproken Nederlands) and Spoken Italian (De Mauro et al., 1993) were used to calculate the frequency
of the different article forms in the two respective languages.

Fig. 3. Probability distribution of Italian articles. Italian article set: absolute entropy = 3.11 bits, relative entropy, Hr = 0.76.

why the degree of uncertainty is smaller in the Italian article selection process. The bigger the probability differences
between the elements, the lower the relative entropy and, as a consequence, the less cognitive effort to process an
element of a paradigm is needed. Due to the fact that the degree of uncertainty of the Italian article selection process is
smaller, it requires less cognitive effort to select a specific element out of the set.16 To reformulate the same claim in
terms of a competition between members of a set, we can say that the competition in Dutch is harder than in Italian due
to the fact that the targeted element in Italian is more distinct from other elements than is the case in Dutch. The
notion of relative entropy, therefore, is a measure of complexity of a set as compared to other sets, for example,
comparison of the sets of articles in two different languages.
The production of a noun with an article (a DP) begins with a communicative intention that is passed to
syntax through the information structure level. The noun lemma contains rules that specify the article frame, defining
which slots have to be filled in and which functional elements are required to implement these rules. These elements
are selected out of the article set on the basis of the information contained in the article slots, such as accessibility
index, number and gender. An article is selected when all article slots are filled in. However, in ‘special’ contexts, or with
particular types of speakers, an article can be omitted in spite of the fact that the frame is filled in. The more processing
resources and time are necessary, the more likely it is that the article will be omitted. The stronger the competition between
articles in a set, the more difficult and time-consuming the article selection process will be. In what follows, we will
argue that, in special situations, an article may not be selected because of the high relative entropy value of the set.
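
The idea of an article frame whose slots must all be filled before an article can be selected can be sketched as a small data structure. This is only an illustration for Dutch: the class and function names are hypothetical, and the accessibility index is reduced to a binary definite/indefinite value, which is a simplifying assumption. The sketch covers selection only; it does not model omission or processing cost.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ArticleFrame:
    """Hypothetical article frame; None marks a slot that is not yet filled."""
    definite: Optional[bool] = None  # simplified stand-in for the accessibility index
    plural: Optional[bool] = None    # number slot
    neuter: Optional[bool] = None    # gender slot (Dutch: neuter vs. common)

    def is_filled(self) -> bool:
        return None not in (self.definite, self.plural, self.neuter)


def select_dutch_article(frame: ArticleFrame) -> Optional[str]:
    """Return the Dutch article for a filled frame.

    Returns "" where Dutch uses no article (indefinite plurals)
    and None when some slot is still empty.
    """
    if not frame.is_filled():
        return None
    if frame.definite:
        # 'het' only for singular neuter nouns; 'de' elsewhere
        return "het" if (frame.neuter and not frame.plural) else "de"
    return "" if frame.plural else "een"


print(select_dutch_article(ArticleFrame(definite=True, plural=False, neuter=True)))
print(select_dutch_article(ArticleFrame(definite=True)))  # frame not filled yet
```

In this toy version the set has only three overt forms; the point is merely that selection is driven by the information in the slots, not by the noun itself.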
The syntactic processes are non-intentional, automatic and most economical; nevertheless, the number of units of
information that the syntactic channel can process per unit of time is limited, even in healthy adult speakers.
Consequently, the more time is available for processing, the more units of information can be processed. As long as the
amount of information the syntactic channel has to process does not exceed its channel capacity, the output of this
channel will be ‘normal’ grammatical utterances. In a regular discourse situation, involving healthy adult speakers, the
available processing resources and the available time will suffice to translate the communicative intention into a
linguistic form. This means that the grammatical encoder will be able to retrieve appropriate lexical and functional
elements that constitute the linguistic message from the lexicon. The morphosyntactic channel wins the competition
with an alternative, presuppositional way of encoding information (Avrutin, 2006).
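
The capacity condition described in this paragraph can be written compactly. The symbols are introduced here only for illustration and are not part of the original formulation: $I$ is the amount of information the syntactic channel has to process, $C$ its capacity in units of information per unit of time, and $t$ the time available.

```latex
\[
  I \le C \cdot t
  \quad\Longrightarrow\quad
  \text{the output is a `normal' grammatical utterance}
\]
```

On this reading, either a lower capacity $C$ or a shorter available time $t$ can make the condition fail for the same amount of information.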
We would like to argue that the maximal capacity of children’s syntactic channel is lower than the maximal capacity of
a healthy adult’s syntactic channel. Therefore, the elements that require too much cognitive effort and processing
time will be omitted. In children, this limitation is, most likely, related to the on-going maturation of the relevant brain
structures and the amount of resources necessary for automatic, fast activation of lexical items. However, a detailed
neurological description of the phenomenon would be far beyond the scope of this article.
Dutch children omit more articles than Italian children because the relative entropy of the Dutch article set is higher
than that of the Italian article set. Children do not have enough processing resources to process this high relative entropy
value. This is why we find more omissions, over a longer period of time, in Dutch than in Italian.
Nevertheless, children do not omit all the time because of other factors that may affect the availability of processing

16 Once again, uncertainty, like information, is a technical notion that has nothing to do with conscious uncertainty of the speaker with
regard to what he/she intends to say. It is an indirect measure of the difficulty of selecting an element from a set of competitors.

resources in a given situation. These factors may or may not be linguistically interesting; for example, the frequency of a
particular item or structural complexity may also play a role. Other non-linguistic yet real factors, such as the child’s
tiredness or other unknown circumstances, may influence the amount of resources necessary for the retrieval of lexical
information. In general, children’s speech production may be more burdened by resource-consuming non-linguistic
activities than the speech production of healthy adults.
The omissions in headlines are naturally not related to limited processing resources. The channel capacity of headline writers and
readers is not limited, but the time readers wish to spend on reading headlines is. Readers wish to
receive as much information as possible in as little time as possible. A headline has to be a ‘news-flash’ with high
surprise and informative values. In terms of Shannon and Weaver’s (1949) theory, headline writers strive to
maximize the transmission rate (i.e. the number of informational units) through the channel per unit of time. Shannon and Weaver
suggest that, in order to maximize the transmission rate, proper coding is essential. This means that elements with
a low amount of information and/or a high processing time will be eliminated. However, they also argue that a price must
be paid for the gain in transmission rate: the process of coding will take longer (see Shannon and Weaver, 1949:19).
Interestingly, Dor (2003) points out that proper headline making is a time-consuming process, which may very well
be the reason why in normal speech we do not maximize the transmission rate. If the available channel time is not
restricted, we can be less careful with the coding process and use a transmission rate that allows for more redundancy,
for example the inclusion of articles.
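
The effect of eliminating a low-information element on the transmission rate can be illustrated with a toy calculation. This is a sketch under stated assumptions: the function name is hypothetical, the transmission rate is taken to be total information divided by total processing time, and the numbers (in arbitrary units) are invented for illustration rather than measured.

```python
def transmission_rate(units):
    """Information per unit of time for a list of (information, processing_time) pairs."""
    total_info = sum(info for info, _ in units)
    total_time = sum(time for _, time in units)
    return total_info / total_time


# Hypothetical values in arbitrary units: (information, processing time).
article = (1.0, 2.0)  # low information relative to the time it takes
noun = (8.0, 4.0)
verb = (7.0, 4.0)

with_article = [article, noun, verb]
without_article = [noun, verb]

print(transmission_rate(with_article))
print(transmission_rate(without_article))
```

Dropping an element raises the overall rate whenever that element's own ratio of information to processing time lies below the rate of the message as a whole, which is the situation assumed here for the article.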
The ‘coding’ in human language begins with the communicative intention and subsequently involves processes
taking place at the information structure level. At the moment the communicative intention is conceived, speakers or
headline writers do not have explicit knowledge of the informative load or relative entropy of the elements or sets
at their disposal. They do, however, know that they are writing a headline that must be comprehended in a limited time
span. In other words, they find themselves in the special register zone.
Like native speakers giving the intuitive judgments widely used in theoretical linguistics, headline writers cannot
explain why the ‘best’ headline is the best headline. This is not surprising, since morphosyntactic processing takes
place automatically, non-intentionally and unconsciously. Articles are not omitted because they are
less meaningful: they are equally meaningful or meaningless in Dutch and in Italian, yet we still find large differences in
article use in newspaper headlines. Because headline writers want to maximize the effectiveness of the
channel they use, they are extremely sensitive to the processing time per unit of information. They do know that their audience,
the reader, wants to receive the best informative value for their cognitive effort. As we have seen from Kostic’s study of
Serbian, the processing time per unit of information is influenced by two factors. First, he finds that the higher the
informative load, the longer the processing time per unit of information. In addition, he shows that the lower the
relative entropy, the shorter the processing time per unit of information.
The informative load of articles is lower than the informative load of lexical elements in both Italian and Dutch.
Hence, we expect and find omissions in both languages, but more of them in Dutch than in Italian. This cross-linguistic
difference is caused by differences in the entropy of the two sets. Selecting an article out of the Italian article set costs
less processing time and effort than selecting an article out of the Dutch article set. Recall Fig. 1 in section 1.4,
which illustrates the time course of article selection: the actual selection process takes less time in Italian. The complexity of
the Italian article set, expressed as the value of its relative entropy, is lower, and the overall percentage of omissions is
lower as well. Let us underline again that differences in probability distribution reflect differences in the strength of
memory traces of the various elements stored in the set.

1.6. Conclusions

Reduced utterances, as an instance of structure minimization, are the result of an intentional or a non-intentional
failure to activate all features necessary for producing an article. The non-intentional cases are those observed in child
speech and the intentional ones are those observed in newspaper headlines. The causes of this non-activation are
different in the two populations. Children’s omissions are caused by lower brain activation than in healthy adult
speakers. Omissions in headlines are driven by the desire to obtain the maximum processing speed per unit of
information. Nevertheless, in both cases the crucial factor is the optimal use of the available information channel
capacity, whose effectiveness is measured by the amount of information, in its technical sense, transmitted per unit of
time. There is no difference between brain maturation of Dutch and Italian children. Similarly, there is no difference in
the intentions of Dutch and Italian headline writers or the amount of space available in their respective publications.
What differs are the linguistic properties of the article sets in the two languages. They differ in such a way that
producing an article in Italian becomes possible at an earlier age than in Dutch and omitting an article in Italian
becomes less advantageous in a headline in comparison to Dutch. Both in the case of child speech and newspaper
headlines, structure minimization, therefore, can be viewed as an optimal solution for the language processor to
convey information in the most efficient way.

Acknowledgements

This research is supported in part by the NWO (Dutch National Science Foundation) Pioneer grant Comparative
Psycholinguistics to Sergey Avrutin, which is hereby gratefully acknowledged. We thank Aleksandar Kostic, Maria
Teresa Guasti and Fermin Moscoso del Prado for valuable discussions and suggestions.

References

Alario, F.X., Caramazza, A., 2002. The production of determiners: evidence from French. Cognition 82, 179–223.
Arnold, 1969. Modern Newspaper Design. Harper and Row, New York.
Avrutin, S., 2006. Weak syntax. In: Amunts, K., Grodzinsky, Y. (Eds.), Broca’s Region. Oxford University Press, Oxford, pp. 49–62.
Avrutin, S., 2004a. Beyond narrow syntax. In: Jenkins, L. (Ed.), Genetics of Language. Routledge, London, pp. 95–115.
Avrutin, S., 2004b. Optionality in child and aphasic speech. Lingue e Linguaggio 1, 67–93.
Avrutin, S., 1999. Development of the Syntax–Discourse Interface. Kluwer Academic Publishers, Dordrecht.
Bates, E., Dale, P.S., Thal, D., 1995. Individual differences and implication for theories of language development. In: Fletcher, P., MacWhinney, B.
(Eds.), Handbook of Child Language. Blackwell, Oxford, pp. 96–151.
Bell, A., 1991. The Language of the News Media. Blackwell Publishers, Oxford.
Bock, K., 1989. Closed-class immanence in sentence production. Cognition 31, 163–186.
Bock, K., Loebell, H., 1990. Framing sentences. Cognition 35, 1–39.
Bol, G.W., 1996. Optional subjects in Dutch child language. In: Koster, C., Wijnen, F. (Eds.), Proceedings of the Groningen Assembly on Language
Acquisition. pp. 125–135.
Bradley, D.C., 1978. Computational distinctions of vocabulary type. Doctoral dissertation. MIT.
Caramazza, A., Miozzo, M., Costa, A., Schiller, N., Alario, F.X., 2001. A crosslinguistic investigation of determiner production. In: Dupoux, E.
(Ed.), Language, Brain and Cognitive Development, Essays in Honor of Jacques Mehler. The MIT Press, Cambridge, MA, pp. 209–226.
Chierchia, G., Guasti, M.T., Gualmini, A., 1999. Nouns and articles in child grammar and the syntax/semantics map. In: Paper Presented at Gala,
Potsdam.
Cipriani, P., Pfanner, P., Chilosi, A., Cittadoni, L., Ciuti, A., Maccari, A., Pantano, N., Pfanner, L., Poli, P., Sarno, S., Bottari, P., Cappelli, G.,
Colombo, C., Veneziano, E., 1989. Protocolli diagnostici e terapeutici nello sviluppo e nella patologia del linguaggio, 1/84 Ministero della
salute, Fondazione Stella Maris, Pisa.
Costa, A., Sebastián-Gallés, N., Miozzo, M., Caramazza, A., 1999. The gender congruity effect: evidence from Spanish and Catalan. Language and
Cognitive Processes 14, 381–391.
Dardano, M., 1981. Il linguaggio dei giornali italiani. Laterza, Roma.
De Lange, J., 2008. Article omission in child speech and headlines: a processing account. Doctoral dissertation. LOT Series, Utrecht University.
De Lange, J., Avrutin, S., Guasti, M.T., 2005. Crosslinguistic differences in child and adult speech optional omissions: a comparison of Dutch and
Italian. In: Paper Presented at GALA 2005, Siena, Italy.
De Lange, J., 2004. Article omission in child language and headlines. In: Kerkhoff, A., De Lange, J., Sadeh Leicht, O. (Eds.), Yearbook 2004.
Utrecht Institute of Linguistics, pp. 109–120.
De Mauro, T., Mancini, F., Vedovelli, M., Voghera, M., 1993. Lessico di frequenza dell’italiano parlato. Etaslibri, Milano.
Dor, D., 2003. On newspaper headlines as relevance optimizers. Journal of Pragmatics 35, 695–721.
Garrett, M.F., 1982. Production of speech: observations from normal and pathological language use. In: Ellis, A. (Ed.), Normality and Pathology in
Cognitive Functions. Academic Press, London, pp. 19–76.
Gerken, L.A., 1991. The metrical basis for children’s subjectless sentences. Journal of Memory and Language 30, 1–21.
Guasti, M.T., De Lange, J., Gavarró, A., Caprin, C., 2004. Article omission: across child languages and across special registers. In: van Kampen, J.,
Baauw, S. (Eds.), Proceedings of GALA 2003, LOT Occasional Series, vol. 1, Utrecht, pp. 199–210.
Guasti, M.T., Gavarró, A., De Lange, J., Caprin, C., 2008. Article omission across languages. Language Acquisition 15 (2), 89–119.
Haegeman, L., 1990. Non-overt subjects in diary contexts. In: Mascaro, J., Nespor, M. (Eds.), Grammar in Progress. Foris, Dordrecht, pp. 167–
174.
Janssen, N., Caramazza, A., 2003. The selection of closed-class words in noun phrase production: the case of Dutch determiners. Journal of Memory
and Language 48, 635–652.
Klee, T., Fitzgerald, M.D., 1985. The relation between grammatical development and mean length of utterance in morphemes. Journal of Child
Language 12, 251–269.
Kostic, A., 2004. The Effects of the Amount of Information on Processing of Inflected Morphology. University of Belgrade.
Levelt, W.J.M., 1989. Speaking: From Intention to Articulation. The MIT Press, Cambridge, MA.
Lleó, C., Demuth, K., 1999. Prosodic constraints on the emergence of grammatical morphemes: crosslinguistic evidence from Germanic and
Romance languages. In: Proceedings of the 23rd BUCLD. Cascadilla Press, Somerville, pp. 407–418.
MacWhinney, B., Snow, C., 1985. The child language data exchange system. Journal of Child Language 12, 271–296.
Magni, M., 1992. Lingua italiana e giornali d’oggi. Guido Miano Editore, Milano.
Mardh, I., 1980. Headlines: On the Grammar of English Front Page Headlines. Gleerup C.W.K.
Miozzo, M., Caramazza, A., 1999. The selection of determiner in noun phrase production. Journal of Experimental Psychology: Learning, Memory
and Cognition 25, 907–922.
Moscoso del Prado, M., Kostic, A., Baayen, H., 2004. Putting the bits together: an informational perspective on morphological processing. Cognition
94, 1–18.
Nortier, J., 1995. Code switching in Moroccan Arabic/Dutch vs. Moroccan Arabic/French language contact. International Journal of the Sociology
of Language 112, 81–95.
Overath, T., Cusack, R., Kumar, S., von Kriegstein, K., Warren, J.D., 2007. An information theoretic characterization of auditory encoding. PLoS
Biology 5 (11), e288, doi:10.1371/journal.pbio.0050288.
Pérez-Leroux, A.T., Roeper, T., 1999. Scope and the structure of bare nominals: evidence from child language. Linguistics 37, 927–960.
Roelofs, A., 1997. The WEAVER-model of word-form encoding in spoken production. Cognition 64, 249–284.
Ruigendijk, E., 2002. Case-assignment in Agrammatism: a crosslinguistic study. Doctoral dissertation. Groningen University.
Schiller, N.O., Caramazza, A., 2002. The selection of grammatical features in word production: the case of plural nouns in German. Brain and
Language 81, 342–357.
Schiller, N.O., Caramazza, A., 2003. Grammatical feature selection in noun phrase production: evidence from German and Dutch. Journal of
Memory and Language 48, 169–194.
Schneider, T., 1984. The information content of binding sites on nucleotide sequences. Doctoral dissertation. University of Colorado.
Shannon, C., Weaver, W., 1949. The Mathematical Theory of Communication. University of Illinois Press, London.
Simon-Vandenbergen, A.M., 1981. The Grammar of Headlines in The Times 1870–1970. Paleis der Academieën, Brussel.
Stowell, T., 1999. Words lost and syntax found in headlines: the hidden structure of abbreviated English in headlines, instructions and diaries. In:
Paper Presented at York University, Toronto, November 24.
Straumann, H., 1935. Newspaper Headlines, A Study in Linguistic Method. Unwin Brothers Ltd., London.
Tribus, M., 1961. Thermostatics and Thermodynamics. D. van Nostrand Company Inc., Princeton, NJ.
Van Der Velde, M., 2006. L’acquisition des articles définis en L1, Acquisition et Interaction en Langue Étrangère [En ligne], Trois courants de
recherche en acquisition des langues, Mis en ligne le: 18 juillet 2006, Disponible sur: http://aile.revues.org/document1723.html.
Van Dijk, T.A., 1988. News as Discourse. Lawrence Erlbaum Associates Publishers, New Jersey.
Van Kampen, N.J., 1994. The learnability of the left branch condition. In: Bok-Bennema, R., Cremers, C. (Eds.), Linguistics in the Netherlands
1994. John Benjamins, Amsterdam, pp. 83–94.
