Morphological and Semantic Regularities in the Lexicon
Author: Ray Jackendoff
Source: Language, Vol. 51, No. 3 (Sep., 1975), pp. 639-671
Published by: Linguistic Society of America
Stable URL: http://www.jstor.org/stable/412891
MORPHOLOGICAL AND SEMANTIC REGULARITIES IN THE LEXICON
RAY JACKENDOFF
Brandeis University
This paper proposes a theory of the lexicon consistent with the Lexicalist Hypothesis of Chomsky's 'Remarks on nominalization' (1970). The crucial problem is to develop a notion of lexical redundancy rules which permits an adequate description of the partial relations and idiosyncrasy characteristic of the lexicon. Two lexicalist theories of redundancy rules, each equipped with an evaluation measure, are compared on the basis of their accounts of nominalizations; the superior one, the FULL-ENTRY THEORY, is then applied to a range of further well-known examples such as causative verbs, nominal compounds, and idioms.
The starting point of the Lexicalist Hypothesis, proposed in Chomsky's 'Remarks on nominalization' (1970), is the rejection of the position that a nominal such as Bill's decision to go is derived transformationally from a sentence such as Bill decided to go. Rather, Chomsky proposes that the nominal is generated by the base rules as an NP, no S node appearing in its derivation. His paper is concerned with the consequences of this position for the syntactic component of the grammar. The present paper will develop a more highly articulated theory of the lexical treatment of nominals, show that it is independently necessary, and extend it to a wide range of cases other than nominalizations.1
The goal of this paper is very similar to that of Halle 1973: the presentation of a framework in which discussion of lexical relations can be made more meaningful. I will not present any new and unusual facts about the lexicon; rather, I will try to formulate a theory which accommodates a rather disparate range of well-known examples of lexical relations. The theory presented here, which was developed independently of Halle's, has many points of correspondence with it; I have, however, attempted a more elaborate working out of numerous details. I will mention important differences between the theories as they arise.
1. LEVELS OF ADEQUACY IN DESCRIPTION. In a theory of the lexicon, we can distinguish three levels of adequacy in description, parallel to those discussed by Chomsky 1965 for grammatical theory. The first level consists in providing each lexical item with sufficient information to describe its behavior in the language. This corresponds to Chomsky's level of observational adequacy, in which the grammar is required to enumerate correctly the set of sentences in the language. A theory of the lexicon meeting the second level of adequacy expresses the relationships, sub-regularities, and generalizations among lexical items of the language, e.g. the fact that decide and decision are related in a systematic fashion. This level corresponds to Chomsky's level of descriptive adequacy, which requires the grammar to express correctly relationships between sentences, such as the active-passive relation.

1 My thanks go to John Bowers, Francois Dell, Noam Chomsky, Morris Halle, and to classes at the 1969 Linguistic Institute and Brandeis University for valuable discussion. Earlier versions of this paper were presented to the 1969 summer LSA meeting at the University of Illinois and to the 1970 La Jolla Syntax Conference. Thanks also to Dana Schaul for many useful examples scattered throughout the paper.
A theory of the lexicon meeting the third level of adequacy describes how the particular relationships and sub-regularities in the lexicon are chosen: why the observed relationships, and not other imaginable ones, form part of the description of the lexicon in question. One of the questions that must be answered at this level is, e.g., why decide rather than decision is chosen as the more 'basic' of the two related items. This element of the theory takes the form of an 'evaluation measure' which assigns relative values to competing lexical descriptions available within the theory. This is the level of explanatory adequacy.
As Chomsky emphasizes, the evaluation measure does not decide between competing THEORIES of the lexicon, but between competing descriptions within the same theory. Each theory must provide its own evaluation measure, and a comparison of competing theories must be based on their success in meeting all three levels of adequacy.
Evaluation measures have typically been built into linguistic theories implicitly, as measures of length of the grammar, i.e. its number of symbols. One place where such a measure is made explicit is in Chomsky & Halle 1968, Chapter 8. The abbreviatory conventions of the theory (parentheses, braces, etc.) are designed so as to represent linguistically significant generalizations in terms of reduced length of grammatical description. Similarly, Chomsky & Halle develop the concept of marking conventions in order to be able to distinguish more 'natural' (i.e. explanatory) rules from less 'natural' ones, in terms of the number of symbols needed to write the rules.
In §2 below, I will present two theories of the lexicon compatible with the Lexicalist Hypothesis. One has a traditional evaluation measure which is applied to the number of symbols in the lexicon; the other has a more unusual measure of complexity, referring to 'independent information content'. In §3 I will show that the latter theory is preferable. It is hoped that such an example of a non-traditional evaluation measure will lead to greater understanding of the issue of explanatory adequacy, which has been a source of great confusion in the field.
2. FORMULATION OF TWO PRELIMINARY THEORIES. The fundamental linguistic
generalization that must be captured by any analysis of English is that words like
decision are related to words like decide in their morphology, semantics, and
syntactic patterning. For Lees 1960, it seemed very logical to express this relation-
ship by assuming that only the verb decide appears in the lexicon, and by creating
the noun decision as part of a transformational process which derives the NP
John's decision to go from the S John decided to go. However, for reasons detailed in
Chomsky 1970, this approach cannot be carried out consistently without expanding
the descriptive power of transformations to the point where their explanatory
power is virtually nil.
Without transformations to relate decide and decision, we need to develop some
other formalism. Chomsky takes the position that decide and decision constitute a
single lexical entry, unmarked for the syntactic feature that distinguishes verbs
from nouns. The phonological form decision is inserted into base trees under the
node N; decide is inserted under V. Since Chomsky gives no arguments for this particular formulation, I feel free to adopt here the alternative theory that decide and decision have distinct but related lexical entries. In regard to Chomsky's further discussion, the theories are equivalent; the one to be used here extends more naturally to the treatment of other kinds of lexical relations (cf. §5). Our problem then is to develop a formalism which can express the relations between lexical entries in accord with a native speaker's intuition.2
It is important to ask what it means to capture a native speaker's intuition of lexical relatedness. It makes sense to say that two lexical items are related if knowing one of them makes it easier to learn the other, i.e. if the two items contain less independent information than two unrelated lexical items do. A grammar that expresses this fact should be more highly valued than one that does not. The advocate of a transformational relationship between decide and decision claims that this intuitive sense of relatedness is expressed by his transformation, in that it is unnecessary to state the shared properties of the words twice. In fact, it is unnecessary to state the properties of decision at all, since they are predictable from the lexical entry of decide and the nominalization transformation.3 Hence a grammar containing the nominalization transformation contains less independent information than one without it, since instead of listing a large number of nominalizations, we can state a single transformation. Within such a grammar, the pair decide-decision contains fewer symbols than a random pair such as decide-jelly: given decide, there need be no lexical entry at all for decision, but jelly needs a lexical entry whether or not decide is listed. Furthermore, the regularity of decide-decision means that many pairs will be related by the transformation, so a net reduction in symbols in the grammar is accomplished, and the evaluation measure will choose a grammar including the transformation over one without it.
Since the Lexicalist Hypothesis denies a transformational relationship between decide and decision, their relationship must be expressed by a rule within the lexical component. Transformational grammar has for many years had a name for the kind of rule that expresses generalizations within the lexicon: it is called a lexical redundancy rule; but little work has been done until now toward a formalization of such rules.

2 Advocates of the theory of generative semantics might at this point be tempted to claim that a formalism for separate but related lexical items is yet another frill required by lexicalist syntax, and that generative semantics has no need for this type of rule. I hasten to observe that this claim would be false. In the generative semantic theory of lexical insertion developed in McCawley 1968 and adopted by Lakoff 1971a, lexical items such as kill and die have separate lexical entries, and are inserted into distinct derived syntactic/semantic structures. For a consistent treatment of lexical insertion, then, break in The window broke must be inserted onto a tree of the form [V BREAK], while break in John broke the window must be inserted onto [V CAUSE BREAK], which has undergone Predicate Raising; in other words, break has two distinct lexical entries. Semantically, the two breaks are related in exactly the same way as die and kill; but clearly break and break must be related in the lexicon in a way that die and kill are not. A similar argument holds for rule and ruler vs. rule and king. Thus generative semantics requires rules expressing lexical relations for exactly the same reasons that the Lexicalist Hypothesis needs them. Only in the earlier 'abstract syntax' of Lees 1960 and Lakoff 1971b are such rules superfluous.

3 Of course, it also is difficult to express the numerous idiosyncrasies of nominalizations, as Chomsky 1970 points out at some length.
The first question we must ask is: by what means does the existence of a lexical redundancy rule reduce the independent information content of the lexicon? There are two possibilities. The first, which is more obvious and also more akin to the transformational approach, gives decide a fully specified entry; but the entry for decision is either non-existent or, more likely, not fully specified. The redundancy rule fills in the missing information from the entry of decide at some point in the derivation of a sentence containing decision, perhaps at the stage of lexical insertion. As in the transformational approach, the independent information content of decide-decision is reduced, because the entry for decision does not have to be filled in. The evaluation measure again can simply count symbols in the grammar. We may call this theory the IMPOVERISHED-ENTRY THEORY.
Within such a theory, a typical lexical entry will be of the form given below. All aspects of this form are traditional except for the 'entry number', which is simply an index permitting reference to a lexical entry independent of its content:
(1) [entry number
     /phonological representation/
     syntactic features
     SEMANTIC REPRESENTATION]
For example, decide will have the form 2. The entry number is arbitrary, and the semantic representation is a fudge standing for some complex of semantic markers. The NP indices correlate the syntactic arguments of the verb to the semantic arguments (cf. Jackendoff 1972, Chapter 2, for discussion of this):
(2) [784
     /decid/
     +V
     +[NP1 ___ on NP2]
     NP1 DECIDE ON NP2]
We now introduce a redundancy rule, 3, in which the two-way arrow may be read
as the symmetric relation 'is lexically related to'. The rule thus can be read: 'A
lexical entry x having such-and-such properties is related to a lexical entry w
having such-and-such properties.'
(3) [x                          [w
     /y + ion/                   /y/
     +N                     ↔    +V
     +[NP1's ___ (P) NP2]        +[NP1 ___ (P) NP2]
     ABSTRACT RESULT OF ACT      NP1 Z NP2]
     OF NP1'S Z-ING NP2]
Given the existence of 3, decision needs only the following lexical entry:
(4) [375
     derived from 784 by rule 3]
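The mechanics of reading a redundancy rule like 3 in one direction can be sketched in modern computational terms as a function from verb entries to predicted noun entries. The following Python sketch is purely illustrative and not part of the paper's formalism; the dictionary fields and the helper name `apply_rule_3` are invented, and the semantic side is a crude string stand-in.

```python
# A sketch (not from the paper) of redundancy rule 3 as a function from
# verb entries (/y/, +V) to predicted -ion nominal entries (/y + ion/, +N).
# All field names here are hypothetical illustrations.

def apply_rule_3(verb):
    """Map /y/, +V, +[NP1 ___ (P) NP2] onto /y + ion/, +N, +[NP1's ___ (P) NP2]."""
    return {
        "phon": verb["phon"] + " + ion",                    # /decid/ -> /decid + ion/
        "cat": "N",
        "frame": verb["frame"].replace("NP1", "NP1's", 1),  # subject becomes genitive
        "sem": "ABSTRACT RESULT OF ACT OF " + verb["sem"],  # crude stand-in for the
                                                            # semantic half of rule 3
    }

decide = {"phon": "decid", "cat": "V",
          "frame": "NP1 ___ on NP2", "sem": "NP1 DECIDE ON NP2"}

decision = apply_rule_3(decide)
print(decision["phon"])    # decid + ion
print(decision["frame"])   # NP1's ___ on NP2
```

Note that running the rule in the opposite direction (predicting decide from decision) would require the inverse function; the two-way arrow in 3 is symmetric, which a single function does not capture.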
This theory thus reduces the lexical entry for decision to a cross-reference to the related verb plus a reference to the redundancy rule. The entries of many other nouns will be simplified similarly by the use of a reference to 3. The independent information content of the lexicon can be determined straightforwardly by adding up the information in lexical entries plus that in redundancy rules; hence the evaluation measure can be stated so as to favor grammars with fewer symbols.
A second possible approach to lexical redundancy rules, the FULL-ENTRY THEORY, assumes that both decide and decision have fully specified lexical entries, and that the redundancy rule plays no part in the derivation of sentences, as it does in both the transformational theory and the impoverished-entry theory. Rather, the redundancy rule plays a role in the information measure for the lexicon. It designates as redundant that information in a lexical entry which is predictable by the existence of a related lexical item; redundant information will not be counted as independent.
In the full-entry theory, lexical entries again have the form of 1, except that an entry number is unnecessary. Decide has the form of 2, minus the entry number. Decision, however, will have the following entry:
(5) [/decid + ion/
     +N
     +[NP1's ___ on NP2]
     ABSTRACT RESULT OF ACT OF NP1'S DECIDING NP2]
We evaluate the lexicon as follows: first, we must determine the amount of independent information added to the lexicon by introducing a single new lexical entry; then, by adding up all the entries, we can determine the information content of the whole lexicon.
For a first approximation, the information added by a new lexical item, given a lexicon, can be measured by the following convention:
(6) (Information measure)
Given a fully specified lexical entry W to be introduced into the lexicon,
the independent information it adds to the lexicon is
(a) the information that W exists in the lexicon, i.e. that W is a word
of the language; plus
(b) all the information in W which cannot be predicted by the existence
of some redundancy rule R which permits W to be partially
described in terms of information already in the lexicon; plus
(c) the cost of referring to the redundancy rule R.
Here 6a is meant to reflect one's knowledge that a word exists. I have no clear
notion of how important a provision it is (it may well have the value zero), but I
include it for the sake of completeness. The heart of the rule is 6b; this reflects one's
knowledge of lexical relations. Finally, 6c represents one's knowledge of which
regularities hold in a particular lexical item; I will discuss this provision in more
detail in §6.
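Measure 6 can be given a toy computational rendering. The sketch below is my own illustration, not the paper's formalism: entries are represented as sets of feature tokens, the unit costs are arbitrary, and `rule3` is a hypothetical stand-in for how a redundancy rule marks material as predictable.

```python
# A sketch of information measure 6: the cost of adding an entry is
# (a) the fact that the word exists, plus (b) its unpredictable features,
# plus (c) one unit per redundancy rule referred to. Units are arbitrary.

def cost_of_entry(info, lexicon, rules, exists_cost=1, rule_ref_cost=1):
    """info: set of feature tokens in the new entry.
    lexicon: list of feature sets already introduced.
    rules: functions (info, lexicon) -> subset of info they predict."""
    predictable, refs = set(), 0
    for rule in rules:
        pred = rule(info, lexicon)
        if pred:
            predictable |= pred        # (6b): rule-predicted material is free
            refs += 1                  # (6c): but referring to the rule costs
    return exists_cost + len(info - predictable) + refs * rule_ref_cost

def rule3(info, lexicon):
    # Hypothetical rendering of rule 3: material shared with a listed verb,
    # plus the -ion affix features, counts as predictable.
    shared = set()
    for entry in lexicon:
        if "cat:V" in entry:
            shared |= info & entry
    if shared:
        shared |= info & {"phon:+ion", "cat:N", "sem:ABSTRACT-RESULT"}
    return shared

decide   = {"phon:decid", "cat:V", "sem:DECIDE"}
decision = {"phon:decid", "phon:+ion", "cat:N", "sem:DECIDE", "sem:ABSTRACT-RESULT"}

print(cost_of_entry(decide, [], [rule3]))          # 4: all features paid in full
print(cost_of_entry(decision, [decide], [rule3]))  # 2: exists + one rule reference
```

With decide already in the lexicon, decision costs only the existence fact plus one rule reference, mirroring the text's claim that its entry is completely predictable from 2 and rule 3.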
To determine the independent information content of the pair decide-decision,
let us assume that the lexicon contains neither, and that we are adding them one by
one into the lexicon. The cost of adding 2, since it is related to nothing yet in the
lexicon, is the information that a word exists, plus the complete information content of the entry 2. Given 2 in the lexicon, now let us add 5. Since its lexical entry is completely predictable from 2 and redundancy rule 3, its cost is the information that a word exists plus the cost of referring to 3, which is presumably less than the cost of all the information in 5. Thus the cost of adding the pair decide-decision is the information that two words exist, plus the total information of the entry 2, plus the cost of referring to redundancy rule 3.
Now note the asymmetry here: if we add decision first, then decide, we arrive at a different sum: the information that two words exist, plus the information contained in 5, plus the cost of referring to redundancy rule 3 (operating in the opposite direction). This is more than the previous sum, since 5 contains more information than 2: the four extra phonological segments +ion and the extra semantic information represented by ABSTRACT RESULT OF ACT OF. To establish the independent information content for the entire lexicon, we must choose an order of introducing the lexical items which minimizes the sum given by successive applications of 6. In general, the more complex derived items must be introduced after the items from which they are derived. The information content of the lexicon is thus measured as follows:
(7) (Information content of the lexicon)
    Given a lexicon L containing n entries, W1, ..., Wn, each permutation P of the integers 1, ..., n determines an order A_P in which W1, ..., Wn can be introduced into L. For each ordering A_P, introduce the words one by one and add up the information specified piecemeal by procedure 6, to get a sum S_P. The independent information content of the lexicon L is the least of the n! sums S_P, plus the information content of the redundancy rules.
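Procedure 7 can be transcribed almost directly into code. The sketch below is an illustration under my own assumptions (entries as feature sets, an injected cost function standing in for measure 6), not the paper's machinery:

```python
# A sketch of procedure 7: try every order of introducing the entries,
# sum the per-entry costs given by measure 6, and keep the minimum.
from itertools import permutations

def lexicon_information_content(entries, rules, cost_of_entry, rules_cost=0):
    best = None
    for order in permutations(entries):      # the n! orderings A_P
        lexicon, total = [], 0
        for w in order:                      # introduce words one by one (6)
            total += cost_of_entry(w, lexicon, rules)
            lexicon.append(w)
        best = total if best is None else min(best, total)
    return best + rules_cost                 # least sum S_P + cost of the rules

# Toy cost function: an entry is cheap if some already-introduced entry is
# contained in it (base before derived); otherwise it costs its full size.
def toy_cost(w, lexicon, rules):
    return 1 if any(e <= w for e in lexicon) else len(w)

print(lexicon_information_content([{"a"}, {"a", "b"}], [], toy_cost))  # 2
```

The brute-force search over n! orderings is of course combinatorially explosive; the text's observation that derived items must follow their bases suggests the minimum is always achieved by introducing entries in a base-before-derived order, which would make an exhaustive search unnecessary in practice.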
Now consider how an evaluation measure can be defined for the full-entry
theory. Minimizing the number of symbols in the lexicon will no longer work,
because a grammar containing decide and decision, but not redundancy rule 3,
contains fewer symbols than a grammar incorporating the redundancy rule, by
exactly the number of symbols in the redundancy rule. Since we would like the
evaluation measure to favor the grammar incorporating the redundancy rule, we
will state the evaluation measure as follows:
(8) (Full-entry theory evaluation measure)
Of two lexicons describing the same data, that with a lower information
content is more highly valued.
The details of the full-entry theory as just presented are somewhat more
complex
than those of either the transformational theory or the impoverished-entry theory.
However, its basic principle is in fact the same: the evaluation measure is set
up
so as to minimize the amount of unpredictable information the speaker
knows
(or must have learned). However, the measure of unpredictable
information is no
longer the number of symbols in the lexicon, but the output
of information
measure 7: this expresses the fact that, when one knows two lexical items related
by redundancy rules, one knows less than when one knows two unrelated items
of commensurate complexity.
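The evaluation measure 8 itself is a simple comparison once an information measure is in hand. A minimal sketch, with invented names and placeholder numbers (any function implementing 7 could be plugged in):

```python
# Measure 8, sketched: of two lexicons describing the same data, the one
# with lower information content is more highly valued. The info_content
# argument stands in for any implementation of procedure 7.

def preferred(lexicon_a, lexicon_b, info_content):
    return lexicon_a if info_content(lexicon_a) <= info_content(lexicon_b) else lexicon_b

# Placeholder numbers: a lexicon incorporating rule 3 has lower content
# than one listing every nominalization independently.
contents = {"with-rule-3": 10, "without-rule-3": 14}
print(preferred("with-rule-3", "without-rule-3", contents.get))  # with-rule-3
```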
I will argue that the full-entry theory, in spite of its apparent complexity, is preferable to the impoverished-entry theory. As a prelude to this argument, I will mention two other discussions of redundancy rules.
The formulation of morpheme-structure rules (those redundancy rules which predict possible phonological combinations within the words of a language) is also open to an impoverished-entry theory and a full-entry theory. The former is used in Halle 1959 and in the main presentation of Chomsky & Halle, where the redundancy rules are treated as part of the re-adjustment rules. However, Chapter 8 of SPE describes some difficulties in this theory pointed out by Stanley 1967. The alternative theory presented is (I believe) a notational variant of the full-entry theory: the redundancy rules do not play an active role in a derivation, but rather function as part of the evaluation measure for the lexicon.4 If the full-entry theory turns out to be correct for the closely related area of morpheme-structure rules, we should be inclined to prefer it for the rules relating lexical items.
Halle 1973 proposes a variant of the full-entry theory for the processes of word formation we are concerned with here. In his theory, the redundancy rules generate a set of 'potential lexical items' of the language. He then uses the feature [+Lexical Insertion] to distinguish actual words from non-existent but possible words. A 'special filter' supplies unpredictable information, including the value of [Lexical Insertion]. The filter thus contains all the information of 6a and 6b, but has nothing that I can relate to 6c.
Consider the contents of Halle's filter, an unordered list of idiosyncratic information. This list must include reference to every lexical item, including all potential but non-existent ones. It is not rule-governed; rather, it is intended to state precisely what is not rule-governed. It is clear why Halle sets up the lexicon in this way: he is trying to retain a portion of the lexicon where the independent information can be measured simply by counting features, and the filter is just such a place. Our formulation of the information measure in the full-entry theory has freed us of the necessity of listing the independent information separately, or of distinguishing it extrinsically from the redundant information. Instead we have a lexicon containing merely a set of fully specified lexical entries (giving exactly those words that exist), plus the set of redundancy rules. (I will mention Halle's theory again briefly at the end of §5.1.)
3. WHICH THEORY? The argument for fully specified entries comes from consideration of words whose affixation is predictable by a redundancy rule, but whose putative derivational ancestors are not lexical items of English. Examples are aggression, retribution, and fission, which have the morphological and semantic properties of the nouns described in redundancy rule 3, but for which there are no corresponding verbs *aggress, *retribute, or *fiss. Our intuition about these items is that they contain less independent information than comparable items which cannot be partially described by a redundancy rule (e.g. demise and soliloquy), but that they contain more than comparable items which are related to genuine lexical items (e.g. decision, attribution).
4 Chomsky & Halle retain impoverished lexical entries, but only for the purpose of counting up features not predicted by redundancy rules and listing what potential words actually exist. Paired with each impoverished entry, however, is a fully specified entry, which is what actually takes part in the derivation.
How can the three theories we have discussed describe these verbs? The transformational theory must propose a hypothetical lexical item marked obligatorily to undergo the nominalization transformation (cf. Lakoff 1971b). Thus the lexicon must be populated with lexical items such as *fiss which are positive absolute exceptions to various word-formation transformations. The positive absolute exception is of course a very powerful device to include in grammatical theory (see discussion in Jackendoff 1972). Furthermore, the use of an EXCEPTION feature to prevent a lexical item from appearing in its 'basic' form is counter-intuitive: it claims that English would be simpler if *fiss were a word, since one would not have to learn that it is exceptional.
Lakoff in fact claims that there must be a hypothetical verb *king, corresponding to the noun king as the verb rule corresponds to the noun ruler. Under his theory, the introduction of a real verb king would make English simpler, in that it would eliminate an absolute exception feature from the lexicon. In other words, the evaluation measure for the transformational theory seems to favor a lexicon in which every noun with functional semantic information has a related verb. Since there is little evidence for such a preference, and since it is strongly counter-intuitive in the case of king, the transformational account, besides requiring a very powerful mechanism (the absolute exception), is incorrect at the level of explanatory adequacy.
Next consider the impoverished-entry theory. There are two possible solutions to the problem of non-existent derivational ancestors. In the first, the entry of retribution is as unspecified as that of decision (4); and it is related by redundancy rule 3 to an entry retribute, which however is marked [-Lexical Insertion]. The
The
cost of adding retribution to the lexicon is the sum of the information in the entry
*retribute, plus the cost of retribution's references to the redundancy rule and to the
(hypothetical) lexical item, plus the information that one word exists (or, more
likely, two-and the information that one of those is non-lexical). Under the
reasonable assumption that the cost of the cross-references is less than the cost of
the phonological and semantic affixes, this arrangement accurately reflects our
initial intuition about the information content of retribution. Furthermore,
it
eliminates the use of positive absolute exceptions to transformations,
replacing
them with the more restricted device [-Lexical Insertion]. Still, it would be nice to
dispense with this device as well, since it is rather suspicious to have entries which
have all the properties of words except that of being words. The objections
to
hypothetical lexical items in the transformational theory at the level of explanatory
adequacy in fact apply here to [-Lexical Insertion] as well: the language is always
simpler if this feature is removed.
We might propose eliminating the hypothetical lexical entries by building them
into the entries of the derived items:
(9) 511
1
derived by rule 3 from
/retribut/
+V
+ [NP1 for NP2]
_
NP2 RETRIBUTE NP2 _ -
646
MORPHOLOGICAL AND SEMANTIC REGULARITIES IN THE LEXICON 647
The cost of 9 is thus the information that there is a word retribution, plus the information within the inner brackets, plus the cost of referring to the redundancy rule. Again, the assumption that the cross-reference costs less than the additional information /ion/ and ABSTRACT RESULT OF ACT OF gives the correct description of our intuitions. This time we have avoided hypothetical lexical items, at the expense of using rather artificial entries like 9.
This artificiality betrays itself when we try to describe the relation between sets like aggression-aggressive-aggressor, aviation-aviator, and retribution-retributive. If there are hypothetical roots *aggress, *aviate, and *retribute, each of the members of these sets can be related to its root by the appropriate redundancy rule 3, 10a, or 10b, where 10a and 10b respectively describe pairs like predict-predictive and protect-protector (I omit the semantic portion of the rules at this point for convenience; in any case, §4 will justify separating the morphological and semantic rules):
(10) a. [x            [w
         /y + ive/  ↔  /y/
         +A]           +V]
     b. [x            [w
         /y + or/   ↔  /y/
         +N]           +V]
Suppose we eliminate hypothetical lexical items in favor of entries like 9 for retribution. What will the entry for retributive look like? One possibility is:
(11) [65
      derived by rule 10a from
      [/retribut/
       +V
       +[NP1 ___ for NP2]
       NP1 RETRIBUTE NP2]]
But this solution requires us to list the information in the inner brackets twice, in
retribution and retributive: such an entry incorrectly denies the relationship between
the two words.
Alternatively, the entry for retributive might be 12 (I use 3' here to denote the
inverse of 3, i.e. a rule that derives verbs from -ion nouns; presumably the presence
of 3 in the lexical component allows us to use its inverse as well):
(12) [65
      derived by 3' and 10a from 511]
Thus retributive is related to retribution by a sequence of redundancy rules, and the
independent information content of the pair retribution-retributive is the informa-
tion that there are two words, plus the information within the inner brackets of 9,
plus the cost of referring to 3' once and 10a twice. This is closer to the intuitively
correct solution, in that it relates the two words. However, it is still suspicious,
because it claims retribution is more basic than retributive. Clearly the entries could just as easily have been set up with no difference in cost by making retributive basic. The same situation will arise with a triplet like aggression-aggressor-aggressive, where the choice of one of the three as basic must be purely arbitrary. Intuitively, none of the three should be chosen as basic, and the formalization of the lexicon should reflect this. The impoverished-entry theory thus faces a choice: either it incorporates hypothetical lexical items, or it describes in an unnatural fashion those related lexical items which are related through a non-lexical root.
Consider now how the full-entry theory accounts for these sets of words, beginning with the case of a singleton like perdition (or conflagration), which has no relatives like *perdite, *perditive, etc., but which obviously contains the -ion ending of rule 3. We would like the independent information content of this item to be less than that of a completely idiosyncratic word like orchestra, but more than that of, say, damnation, which is based on the lexical verb damn. The impoverished-entry theory resorts either to a hypothetical lexical item *perdite or to an entry containing another entry, like 9, which we have seen to be problematic.
The full-entry theory, on the other hand, captures the generalization without extra devices. Note that 6b, the measure of non-redundant information in the lexical entry, is cleverly worded so as to depend on the existence of redundant information somewhere in the lexicon, but not necessarily on the existence of related lexical entries. In the case of perdition, the only part of the entry which
related lexical entries. In the case of perdition, the only part of the entry which
represents a regularity in the lexicon is in fact the -ion ending, which appears as
part of the redundancy rule 3. What remains irregular is the residue described in
the right-hand side of 3, i.e. that part of perdition which corresponds to the non-
lexical root *perdite. Hence the independent information content of perdition is the
information that there is a word, plus the cost of the root, plus the cost of referring
to rule 3. Perdition adds more information than damnation, then, because it has a
root which is not contained in the lexicon; it contains less information than
orchestra because the ending -ion and the corresponding part of the semantic
content are predictable by 3 (presumably the cost of referring to 3 is less than the
information contained in the ending itself; see §6).
We see then that the full-entry theory captures our intuitions about perdition
without using a hypothetical lexical item. The root *perdite plays only an indirect
role, in that its COST appears in the evaluation of perdition as the difference between
the full cost of perdition and that of the suffix; nowhere in the lexicon does the root
appear as an independent lexical entry.
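The cost comparison among orchestra, perdition, and damnation can be made concrete with a toy model. All the numeric values and the function below are illustrative assumptions of mine; the paper itself assigns no concrete costs:

```python
# Toy model of the full-entry theory's information measure (all numbers
# are illustrative assumptions; the paper assigns no concrete values).
WORD_EXISTS = 1   # "the information that there is a word"
RULE_REF = 1      # cost of one reference to a redundancy rule
AFFIX = 3         # information in an ending like -ion, when unpredicted
ROOT = 5          # information in a root, lexical or not

def entry_cost(root_in_lexicon, rule_applies):
    """Independent information a word adds under the full-entry theory."""
    cost = WORD_EXISTS
    cost += 0 if root_in_lexicon else ROOT        # unpredicted residue
    cost += RULE_REF if rule_applies else AFFIX   # rule 3 vs. raw ending
    return cost

orchestra = entry_cost(root_in_lexicon=False, rule_applies=False)  # 1+5+3 = 9
perdition = entry_cost(root_in_lexicon=False, rule_applies=True)   # 1+5+1 = 7
damnation = entry_cost(root_in_lexicon=True,  rule_applies=True)   # 1+0+1 = 2

# the ordering the text requires
assert damnation < perdition < orchestra
```

The ordering holds for any positive values so long as a rule reference costs less than the ending it predicts, which is exactly the proviso in the parenthesis above.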
Now turn to the rootless pair retribution-retributive. Both words will have fully
specified lexical entries. To determine the independent information content of the
pair, suppose that retribution is added to the lexicon first. Its independent informa-
tion, calculated as for perdition above, is the information that there is a word, plus
the cost of the root *retribute, plus the cost of referring to 3. Note again that
*retribute does not appear anywhere in the lexicon. Now we add to the lexicon
the entry for retributive, which is entirely predictable from retribution plus redun-
dancy rules 3 and lOa. According to information measure 6, retributive adds the
information that it is a word, plus the cost of referring to the two redundancy rules.
The cost of the pair for this order of introduction is therefore the information that
648
MORPHOLOGICAL AND SEMANTIC REGULARITIES IN THE LEXICON 649
there are two words, plus the information in the root *retribute, plus the cost of referring to redundancy rules three times.
Alternatively, if retributive is added to the lexicon first, followed by retribution, the independent information content of the pair comes out the same, though this time the cost of the root appears in the evaluation of retributive. Since the costs of these two orders are commensurate, there is no optimal order of introduction, and thus no reason to consider either item basic.
Similarly, the triplet aggression-aggressor-aggressive will have, on any order of introduction, an independent information content consisting of the information that there are three words, plus the information content of the root *aggress, plus the cost of referring to redundancy rules five times (once for the first entry introduced, and twice for each of the others). Since no single order yields a significantly lower information content, none of the three can be considered basic to the others.
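The order-independence claim can be checked with the same kind of toy arithmetic; the unit costs are illustrative assumptions, not values from the paper:

```python
WORD, ROOT, RULE_REF = 1, 5, 1   # illustrative unit costs

def set_cost(n_words, rule_refs):
    """Cost of a rootless word family: n words, one non-lexical root,
    and a number of redundancy-rule references that depends only on the
    family's size, not on which member is introduced first."""
    return n_words * WORD + ROOT + rule_refs * RULE_REF

# retribution first or retributive first: 1 rule reference for the first
# entry introduced, 2 (one M-rule, one S-rule) for the entry derived from it
pair_either_order = set_cost(2, 1 + 2)

# aggression-aggressor-aggressive: 1 + 2 + 2 references on any order
triplet_any_order = set_cost(3, 1 + 2 + 2)

assert pair_either_order == 10
assert triplet_any_order == 13
```

Because the reference count is a function of family size alone, every order of introduction prices the family identically, which is what makes "basic member" an empty notion here.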
Thus the full-entry theory provides a description of rootless pairs and triplets which avoids either a root in the lexicon or a claim that one member of the group is basic, the two alternatives encountered by the impoverished-entry theory. The full-entry theory looks still more appealing when contrasted with the transformational theory's account of these items. The theory of Lakoff 1971b introduces a positive absolute exception on *perdite, requiring it to nominalize; but *aggress may undergo either -ion nominalization, -or nominalization, or -ive adjectivalization, and it must undergo one of the three. Lakoff is forced to introduce Boolean
combinations of exception features, together marked as an absolute exception, in
order to describe this distribution: patently a brute-force analysis.
In the full-entry theory, then, the lexicon is simply a repository of all information
about all the existing words; the information measure expresses all the relation-
ships. Since the full-entry theory escapes the pitfalls of the impoverished-entry
theory, without giving up adequacy of description, we have strong reason to
prefer the former, with its non-standard evaluation measure. From here on, the
term 'lexicalist theory' will be used to refer only to the full-entry theory.
Before concluding this section, let us consider a question which frequently
arises in connection with rootless pairs and triplets: What is the effect on the lexicon
if a back-formation takes place, so that a formerly non-existent root (say *retribute)
enters the language? In the transformational theory, the rule feature on the hypo-
thetical root is simply erased, and the lexicon becomes simpler, i.e. more regular.
In the lexicalist theory, the account is a bit more complex, but also more sophis-
ticated. If retribute were simply added without disturbing the previous order for
measuring information content, it would add to the cost of the lexicon the informa-
tion that there is a new word plus the cost of referring to one of the redundancy
rules. Thus the total cost of retribution-retributive-retribute would be the informa-
tion that there are three words, plus the information in the root retribute, plus the
cost of four uses of redundancy rules. But now that retribute is in the lexicon, a
restructuring is possible, in which retribute is taken as basic. Under this order of
evaluation, the information content of the three is the information that there are
three words, plus the information in retribute, plus only two uses of redundancy
rules. This restructuring, then, makes the language simpler than it was before
LANGUAGE, VOLUME 51, NUMBER 3 (1975)
retribute was introduced, except that there is now one more word to learn than before. What this account captures is that a back-formation ceases to be recognized as such by speakers precisely when they restructure the evaluation of the lexicon, taking the back-formation rather than the morphological derivatives as basic. I speculate that the verb *aggress, which seems to have only marginal status in English, is still evaluated as a back-formation, i.e. as a derivative of aggression-aggressor-aggressive, and not as their underlying root. Thus the lexicalist theory of nominalizations provides a description of the diachronic process of back-formation which does more than simply erase a rule feature on a hypothetical lexical item: it can describe the crucial step of restructuring as well.
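The restructuring arithmetic can be sketched the same way, again with illustrative unit costs of my own choosing:

```python
WORD, ROOT, RULE_REF = 1, 5, 1   # illustrative unit costs

# Before the back-formation: retribution + retributive, as computed above
before = 2 * WORD + ROOT + 3 * RULE_REF            # 10

# retribute added, old evaluation order kept: one more word,
# one more rule reference
unrestructured = 3 * WORD + ROOT + 4 * RULE_REF    # 12

# Restructured, retribute taken as basic: each derivative now needs
# only one rule application from the root
restructured = 3 * WORD + ROOT + 2 * RULE_REF      # 10

assert restructured < unrestructured
# simpler than before, except for the one extra word to learn:
assert restructured - before == WORD - RULE_REF
```

The model reproduces the text's trade-off exactly: restructuring saves one rule reference relative to the pre-back-formation lexicon at the price of one extra word-existence fact.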
4. SEPARATE MORPHOLOGICAL AND SEMANTIC RULES. At the outset of the discussion, I stated redundancy rule 3 so as to relate lexical items both at the morphological and semantic levels. In fact, this formulation will not do. It claims that there is a particular meaning, ABSTRACT RESULT OF ACT OF V-ING, associated with the ending -ion. However, several different semantic relations obtain between -ion nominals and their related verbs, and several nominalizing endings can express the same range of meanings. Some of the morphological rules are stated in M1 (the morphological part of 3), M2, and M3:
(13) M1: [/y + ion/, +N] ↔ [/y/, +V]
     M2: [/y + ment/, +N] ↔ [/y/, +V]
     M3: [/y + al/, +N] ↔ [/y/, +V]
Some of the semantic rules are S1 (the semantic part of 3), S2, and S3:
(14) S1: [+N, +[NP1's ___ ((P)NP2)], ABSTRACT RESULT OF ACT OF NP1'S Z-ING NP2] ↔ [+V, +[NP1 ___ ((P)NP2)], NP1 Z NP2]
     S2: [+N, +[___ (NP2)], GROUP THAT Z-S (NP2)] ↔ [+V, +[NP1 ___ (NP2)], NP1 Z (NP2)]
     S3: [+N, +[(NP1's) ___ ((P)NP2)], {ACT/PROCESS} OF (NP1'S) Z-ING NP2] ↔ [+V, +[NP1 ___ ((P)NP2)], NP1 Z NP2]
An example of the cross-classification of the morphological and semantic relations is the following table of nouns, where each row contains nouns of the
same semantic category, and each column contains nouns of the same morphological category.
(15)      M1             M2              M3
     S1:  discussion     argument        rebuttal
     S2:  congregation   government
     S3:  copulation     establishment   refusal
That is, John's discussion of the claim, John's argument against the claim, and John's rebuttal of the claim are semantically related to John discussed the claim, John argued against the claim, and John rebutted the claim respectively, in a way expressed by the subcategorization conditions and semantic interpretations of S1; the congregation and the government of Fredonia are related by S2 to they congregate and they govern Fredonia; and John's copulation with Mary, John's establishment of a new order, and John's refusal of the offer are related by S3 to John copulated with Mary, John established a new order, and John refused the offer.5
There are further nominalizing endings such as -ition (supposition), -ing (writing), and -∅ (offer); and further semantic relations, such as ONE WHO Z'S (writer, occupant) and THING IN WHICH ONE Z'S (residence, entrance). The picture that emerges is of a family of nominalizing affixes and an associated family of noun-verb semantic relationships. To a certain extent, the particular members of each family that are actually utilized in forming nominalizations from a verb are chosen randomly.
Insofar as the choice
is random, the information measure must measure independently the cost of
referring to morphological and semantic redundancy rules (cf. §6 for further
discussion).
How do we formalize the information measure, in light of the separation of
morphological rules (M-rules) and semantic rules (S-rules)? An obvious first point
is that a semantic relation between two words without a morphological relationship
cannot be counted as redundancy: thus the existence of the verb rule should render
the semantic content of the noun ruler redundant, but the semantic content of
king must count as independent information. Hence we must require a morpho-
logical relationship before semantic redundancy can be considered.
A more delicate question is whether a morphological relationship alone should
be counted as redundant. For example, professor and commission (as in 'a salesman's
commission') are morphologically related to profess and commit; but the existence
of a semantic connection is far from obvious, and I doubt that many English
speakers other than philologists ever make the association. What should the
information content of these items include? A permissive approach would allow
the phonology of the root to be counted as redundant information; the only
non-redundant part of professor, then, would be some semantic information like
TEACH. A restrictive approach would require a semantic connection before mor-
phology could be counted as redundant; professor then would be treated like
perdition, as a derived word with a non-lexical root, and both the phonology
/profess/ and the semantics TEACH would count as independent information.
5 I assume, with Chomsky 1970, that the of in nominalizations is transformationally inserted.
Note also that the semantic relations are only approximate-the usual idiosyncrasies appear in
these examples.
In §5.1 I will present two cases in which only morphological rules play a role because there are no semantic regularities. However, where semantic rules exist, it has not yet been established whether their use should be required in conjunction with morphological rules. Therefore the following restatement of information measure 6 has alternative versions:
(16) (Information measure) Given a fully specified lexical entry W to be introduced into the lexicon, the independent information it adds to the lexicon is
     (a) the information that W exists in the lexicon; plus
     (b) (permissive form) all the information in W which cannot be predicted by the existence of an M-rule which permits W to be partially described in terms of information already in the lexicon, including other lexical items and S-rules; or
     (b') (restrictive form) all the information in W which cannot be predicted by the existence of an M-rule and an associated S-rule (if there is one) which together permit W to be partially described in terms of information already in the lexicon; plus
     (c) the cost of referring to the redundancy rules.
Examples below will show that the permissive form of 16 is preferable.
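The difference between the two forms of 16 can be illustrated on professor, which is morphologically related to profess but not transparently related to it in meaning. The costs below are illustrative assumptions:

```python
WORD, RULE_REF = 1, 1   # illustrative unit costs
PHON_ROOT = 4           # information in the phonology of the root /profess/
SEM = 3                 # unpredicted semantics, roughly TEACH

# Permissive form (16b): an M-rule alone licenses redundancy, so the
# root's phonology comes free from the entry for profess; only the
# idiosyncratic semantics is paid for.
permissive = WORD + SEM + RULE_REF                   # 5

# Restrictive form (16b'): with no semantic connection, professor is
# treated like perdition, as a derived word with a non-lexical root;
# both the phonology and the semantics count as independent.
restrictive = WORD + PHON_ROOT + SEM + RULE_REF      # 9

assert permissive < restrictive
```

Whatever the actual unit costs, the permissive form always prices professor lower, since it lets the morphological match alone absorb the root's phonology.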
5. OTHER APPLICATIONS. The redundancy rules developed so far describe the relation between verbs and nominalizations. It is clear that similar rules can describe
relation between verbs and nominalizations. It is clear that similar rules can describe
de-adjectival nouns (e.g. redness, entirety, width), deverbal adjectives (predictive,
explanatory), denominal adjectives (boyish, national, fearless), de-adjectival verbs
(thicken, neutralize, yellow), and denominal verbs (befriend, originate, smoke).
Likewise, it is easy to formulate rules for nouns with noun roots like boyhood,6
adjectives with adjective roots like unlikely, and verbs with verb roots like re-enter
and outlast.7 Note, by the way, that phonological and syntactic conditions such as
choice of boundary and existence of internal constituent structure can be spelled
out in the redundancy rules.
Note also that a complex form such as transformationalist can be accounted
for with no difficulty: each of the steps in its derivation is described by a regular
redundancy rule. No question of ordering rules or of stating rule features need ever
arise (imagine, by contrast, the complexity of Boolean conditions on exceptions
which would be needed in the entry for transform, if we were to generate this word
in a transformational theory of the lexicon). Transformationalist is fully specified
in the lexicon, as are transform, transformation, and transformational. The total
information content of the four words is the information that there are four words,
plus the information in the word transform, plus idiosyncratic information added
by successive steps in derivation (e.g. that transformation in this sense refers to a
component of a theory of syntax and not just any change of state, and that a
transformationalist
in the sense used in this paper is one who believes in a particular
6 The abstractness of nouns in -hood, mentioned by Halle 1973 as an example, is guaranteed
by the fact that the associated semantic rule yields the meaning STATE (or PERIOD) OF BEING
A Z.
7 Cf. Fraser 1965 for an interesting discussion of this last class of verbs.
form of transformational theory), plus the cost of referring to the three necessary redundancy rules. Note that the information measure allows morphologically derived lexical items to contain more semantic information than the rule predicts; we use this fact crucially in describing the information content of transformationalist. More striking examples will be given below.

With these preliminary observations, I will now present some more diverse applications of the redundancy rules.
5.1. PREFIX-STEM VERBS. Many verbs in English can be analysed into one of the prefixes in-, de-, sub-, ad-, con-, per-, trans- etc. followed by one of the stems -sist, -mit, -fer, -cede, -cur etc. Chomsky & Halle argue, for phonological reasons, that the prefix and stem are joined by a special boundary =. Whether a particular prefix and a particular stem together form an actual word of English seems to be an idiosyncratic fact:
(17)
*transist transmit transfer *transcede *transcur
persist permit prefer precede *precur
consist commit confer concede concur
assist admit *affer accede *accur
subsist submit suffer succeed *succur
desist *demit defer *decede *decur
insist *immit infer *incede incur
We would like the information measure of the lexicon to take into account the
construction of these words in computing their information content. There are two
possible solutions. In the first, the lexicon will contain, in addition to the fully
specified lexical entries for each actually occurring prefix-stem verb, a list of the
prefixes and stems from which the verbs are formed. The redundancy rules will
contain the following morphological rule, which relates three terms:
(18) [/x = y/, +V] ↔ [/x/, +Prefix], [/y/, +Stem]
The information content of a particular prefix-stem verb will thus be the informa-
tion that there is a word, plus the semantic content of the verb (since there is no
semantic rule to go with 18, at least in most cases), plus the cost of referring to
morphological rule 18. The cost of each individual prefix and stem will be counted
only once for the entire lexicon.
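The division of labor just described can be sketched in a toy model: rule 18 defines a space of possible prefix=stem combinations, while the lexicon lists the filled cells of table 17. The unit costs and the choice of two rows are my own illustrative assumptions:

```python
WORD, RULE_REF, SEM = 1, 1, 3   # illustrative unit costs

prefixes = ["trans", "per", "con", "ad", "sub", "de", "in"]
stems = ["sist", "mit", "fer", "cede", "cur"]

# The space of possible prefix=stem verbs defined by rule 18
possible = {(p, s) for p in prefixes for s in stems}

# Filled cells for the con- and de- rows of table 17
# (surface spellings like "commit" reflect assimilation, ignored here)
actual = {("con", "sist"), ("con", "mit"), ("con", "fer"),
          ("con", "cede"), ("con", "cur"),
          ("de", "sist"), ("de", "fer")}
assert actual <= possible

# Each actual verb pays for existence, its unpredicted semantics, and one
# reference to rule 18; the prefixes and stems themselves are paid for
# once, in the rule (or the one-time list), not once per verb.
total = sum(WORD + SEM + RULE_REF for _ in actual)
assert total == 7 * 5
```

The point of the model is the asymmetry: adding another verb that fits rule 18 costs a constant amount, while adding a new prefix or stem is a one-time charge against the rule itself.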
Since we have up to this point been unyielding on the subject of hypothetical
lexical items, we might feel somewhat uncomfortable about introducing prefixes
and stems into the lexicon. However, this case is somewhat different from earlier
ones. In the case of perdition, the presumed root is the verb *perdite. If perdite
were entered in the lexicon, we would have every reason to believe that lexical
insertion transformations would insert perdite into deep structures, and that the
syntax would then produce well-formed sentences containing the verb perdite.
In order to prevent this, we would have to put a feature in the entry for perdite to
block the lexical insertion transformations. It is this rule feature [-Lexical
Insertion] which we wish to exclude from the theory. Consider now the lexical entry for the prefix trans-:
(19) [/trans/, +Prefix]
Trans- has no (or little) semantic information, and as syntactic information has only the marker [+Prefix]. Since the syntactic category Prefix is not generated by the base rules of English, there is no way for trans- alone to be inserted into a deep structure. It can be inserted only when combined with a stem to form a verb, since the category Verb does appear in the base rules. Hence there is no need to use the offending rule feature [-Lexical Insertion] in the entry for trans-, and no need to compromise our earlier position on *perdite.
However, there is another possible solution which eliminates even entries like 19, by introducing the prefixes and stems in the redundancy rule itself. In this case, the redundancy rule consists of a single term, and may be thought of as the simplest type of 'word-formation' rule:
(20) [/{trans, per, con, ad, sub, de, in} = {sist, mit, fer, cede, cur, tain}/, +V]
The information content of prefix-stem verbs is the same as before, but the cost of the individual prefixes and stems is counted as part of the redundancy rule, not of the list of lexical items. The two solutions appear at this level of investigation to be equivalent, and I know as yet of no empirical evidence to decide which should be permitted by the theory or favored by the evaluation measure.

This is the case of morphological redundancy without semantic redundancy promised in §4. Since, for the most part, prefixes and stems do not carry semantic information, it is not possible to pair 18 or 20 with a semantic rule.8 Obviously the information measure must permit the morphological redundancy anyway.
Besides complete redundancy, we now have three cases to consider: those in which a semantic redundancy rule relates a word to a non-lexical root (e.g. perdition), those in which the semantic rule relates a word incorrectly to a lexical root (e.g. professor), and those in which there is no semantic rule at all. The three cases are independent, and a decision on one of them need not affect the others. Thus the decision to allow morphological redundancy for prefix-stem verbs still leaves open the question raised in §4 of how to treat professor.
It should be pointed out that word-formation rules like 20 are very similar to
8 If they did carry semantic information, it would be more difficult, but not necessarily impossible, to state the rule in the form of 20. This is a potential difference between the solutions, concerning which I have no evidence at present.
Halle's word-formation rules (1973). The major difference here between his theory and mine is that his lexicon includes, in addition to the dictionary, a list of all MORPHEMES in the language, productive and unproductive. The present theory lists only WORDS in the lexicon. Productive affixes are introduced as part of lexical redundancy rules, and non-productive non-lexical morphemes (such as *perdite) do not appear independently anywhere in the lexical component. Other than the arguments already stated concerning the feature [Lexical Insertion], I know of little evidence to distinguish the two solutions. However, since Halle has not formulated the filter, which plays a crucial role in the evaluation measure for his theory of the lexicon, it is hard to compare the theories on the level where the present theory makes its most interesting claims.
5.2. NOUN COMPOUNDS. The compound nouns in 21 are all formed by concatenating two nouns:
(21) a. garbage man, iceman, milkman, breadbasket, oil drum
     b. snowman, gingerbread man, bread crumb, sand castle
     c. bulldog, kettledrum, sandstone, tissue paper
Although the meaning of each compound is formed from the meanings of the two constituent nouns, the way in which the meaning is formed differs from line to line. Part of a speaker's knowledge of the English lexicon is the way in which the meanings of compounds are related to the meanings of their constituents: thus we
would say that someone did not know English if he (seriously) used garbage man to
mean 'a man made out of garbage', by analogy with snowman.
If one brought Lees 1960 up to date, one would get an approach to compounds
which uses transformations to combine nouns randomly, controlled by exception
features so as to produce only the existing compounds with the correct meanings.
But how can such exception features be formulated? Either noun in a compound
can be changed, with a corresponding change in acceptability: we have garbage
man, garbage truck, but not *garbage gingerbread, *garbage tree; we also have
garbage man, gingerbread man, but not *ant man, *tissue man. Thus the use of
exception features will require each noun in the lexicon to be cross-listed with
every other noun for the compounding transformations. Furthermore, since
gingerbread is itself a compound, ginger, bread, and man will all somehow have to be
related by the exception features. In the end, the exception features appear to be
equivalent to a listing of all the existing compounds along with their meanings.
In the lexicalist theory, we can dispense with exception features in the description
of compounds. We simply give each actually occurring compound a fully specified
lexical entry, and in the list of redundancy rules we enter morphological rule
22 and semantic rules 23a,b,c, describing the data of 21a,b,c respectively. Of
course, there are a great number of additional semantic rules (cf. Lees, chapter 4);
I list only these three as a sample:
(22) [/[N x] [N y]/, +N] ↔ [/x/, +N], [/y/, +N]

(23) a. [+N, Z THAT CARRIES W] ↔ [+N, W], [+N, Z]
     b. [+N, Z MADE OF W] ↔ [+N, W], [+N, Z]
     c. [+N, Z LIKE A W] ↔ [+N, W], [+N, Z]
The redundancy rules thus define the set of possible compounds of English, and the lexicon lists the actually occurring compounds.

The information measure 16 gives an intuitively correct result for the independent information in compounds. For example, since the nouns garbage and man are in the lexicon, all their information will be counted as redundant in evaluating the entry for garbage man. Thus the independent information content of garbage man will be the information that such a word exists, plus any idiosyncratic facts about the meaning (e.g. that he picks up rather than delivers garbage), plus the cost of referring to 22 and 23a. The information of a complex compound like gingerbread man is measured in exactly the same way; but the independent information in its constituent gingerbread is reduced because of its relation to ginger and bread. Gingerbread man is thus parallel in its evaluation to the case of transformationalist cited earlier.
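The pricing of compounds under measure 16 can be sketched with the same toy arithmetic; the unit costs are illustrative assumptions:

```python
WORD, RULE_REF = 1, 1   # illustrative unit costs

def compound_cost(idiosyncratic_sem, n_rules=2):
    """Cost of a compound whose constituent nouns are already in the
    lexicon: existence + unpredicted semantics + references to one
    M-rule (22) and one S-rule (from 23)."""
    return WORD + idiosyncratic_sem + n_rules * RULE_REF

garbage_man = compound_cost(idiosyncratic_sem=1)  # picks up, not delivers

# gingerbread man is priced the same way; the saving on gingerbread
# itself (its relation to ginger and bread) shows up in gingerbread's
# own entry, not in the entry for gingerbread man
gingerbread_man = compound_cost(idiosyncratic_sem=1)

assert garbage_man == gingerbread_man == 4
```

Note the locality of the evaluation: each entry is priced against the lexicon as it stands, so complex compounds cost no more than simple ones once their constituents are paid for.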
Now consider the problem of evaluating the following nouns:
(24) a. blueberry, blackberry
b. cranberry, huckleberry
c. gooseberry, strawberry
Blueberry and blackberry are obviously formed along the lines of morphological
rule 25 and semantic rule 26. This combination of rules also forms flatiron, high-
chair, madman, drydock, and many others.
(25) [/[A x] [N y]/, +N] ↔ [/x/, +A], [/y/, +N]

(26) [+N, Z WHICH IS W] ↔ [+A, W], [+N, Z]
Thus blueberry and blackberry are evaluated in exactly the same way as garbage
man.
Cranberry and huckleberry contain one lexical morpheme and one non-lexical
morpheme. The second part (-berry) and its associated semantics should be redundant, but the phonological segments /kræn/ and /hʌkl/, and the semantic characteristics distinguishing cranberries and huckleberries from other kinds of berries, must be non-redundant. Hence this case is just like perdition, where a non-lexical root is involved, and the information measure formulated for the case of perdition will yield the intuitively correct result. One problem is that the lexical categories of cran- and huckle- are indeterminate, so it is unclear which morphological rule applies. Likewise, it is unclear which semantic rule applies. However, I see nothing against arbitrarily applying the rules which cost least; this convention will minimize the information in the lexicon without jeopardizing the generality of the evaluation procedure.
We observe next that gooseberry and strawberry contain two lexical morphemes and are both berries, but gooseberries have nothing to do with geese and strawberries have nothing to do with straw. This case is thus like professor, which has nothing to do semantically with the verb profess, and exactly the same question arises in their evaluation: should they be intermediate in cost between the previous two cases, or should they be evaluated like cranberry, with straw- and goose- counted as non-redundant? The fact that there is pressure towards phonological similarity even without semantic basis (e.g. gooseberry was once groseberry) is some evidence in favor of the permissive form of 16, in which morphological similarity alone is sufficient for redundancy.
Another semantic class of compound nouns (exocentric compounds) differs
from those mentioned so far in that neither constituent describes what kind of
object the compound is. For example, there is no way for a non-speaker of English
to know that a redhead is a kind of person, but that a blackhead is a kind of
pimple.9 Other examples are redwing (a bird), yellow jacket (a bee), redcoat (a
soldier), greenback (a bill), bigmouth (a person), and big top (a tent). The mor-
phological rule involved is 25; the semantic rule must be
(27) [+N, THING WITH A Z WHICH IS W] ↔ [+A, W], [+N, Z]
This expresses the generalization inherent in these compounds, but it leaves open
what kind of object the compound refers to. The information measure gives as the
cost of redhead, for example, the information that there is a word, plus the informa-
tion that a redhead is a person (a more fully specified form of THING in 27), plus
the cost of referring to 27. This evaluation reflects precisely what a speaker must
learn about the word.
A transformational theory of compound formation, on the other hand, encounters severe complication with this class of compounds. Since a compounding transformation must preserve functional semantic content, the underlying form of redhead must contain the information that a redhead is a person and not a pimple, and this information must be captured somehow in rule features (or derivational constraints) which are idiosyncratic to the word redhead. I am sure that such constraints can be formulated, but it is not of much interest to do so. The need for
9 I am grateful to Phyllis Pacin for this example.
these elaborate rule features stems from the nature of transformations. Any
phrase-marker taken as input to a particular transformation corresponds to a set of fully specified output phrase-markers. In the case of exocentric compounds, the combination of the two constituent words by rule 27 does not fully specify the output, since the nature of THING in 27 is inherently indeterminate.

We thus see an important empirical difference between lexical redundancy rules and transformations: it is quite natural and typical for lexical redundancy rules to relate items only partially, whereas transformations cannot express partial relations. Several illustrations of this point have appeared already, in the morphological treatment of perdition and cranberry and in the semantic treatment of transformationalist. However, the case of exocentric compounds is perhaps the most striking example, since no combination of exception features and hypothetical lexical items can make the transformational treatment appear natural. The lexicalist treatment, since it allows rules to relate exactly as much as necessary, handles exocentric compounds without any remarkable extensions of the machinery.
5.3. CAUSATIVE VERBS. There is a large class of verbs which have both transitive and intransitive forms;10 e.g.,
(28) a. The door opened.
     b. Bill opened the door.
(29) a. The window broke.
     b. John broke the window.
(30) a. The coach changed into a pumpkin.
     b. Mombi the witch changed the coach from a handsome young man into a pumpkin.
It has long been a concern of transformational grammarians to express the fact that the semantic relations of door to open, of window to break, and of coach to change are the same in the transitive and intransitive cases.

There have been two widely accepted approaches, both transformational in character. The first, that of Lakoff 1971b, claims that the underlying form of the transitive sentence contains the intransitive sentence as a complement to a verb of causation; i.e., the underlying form of 28b is revealed more accurately in the sentence Bill caused the door to open. The other approach, case grammar, is that of Fillmore 1968. It claims that the semantic relation of door to open is expressed syntactically in the deep structures of 28a and 28b, and that the choice of subject is a purely surface fact. The deep structures are taken to be 31a and 31b respectively:
(31) a. past open [Objective the door]
     b. past open [Objective the door] [Agentive by Bill]
These proposals and their consequences have been criticized on diverse syntactic and semantic grounds (cf., e.g., Chomsky 1972, Fodor 1970, and Jackendoff 1972, Chap. 2); I do not intend to repeat those criticisms here. It is of interest to note, however, that Lakoff's analysis of causatives is the opening wedge into the generative semanticists' theory of lexicalization: if the causative verb break is the result of a transformation, we would miss a generalization about the nature of

10 This class may also include the two forms of begin proposed by Perlmutter 1970.
agentive verbs by failing to derive the causative verb kill by the same transformation. But since kill has as intransitive parallel not kill but die, and since there are many such causative verbs without morphologically related intransitives, the only way to avoid an embarrassing number of exceptions in the lexicon is to perform lexical insertion AFTER the causative transformation, as proposed by McCawley 1968.

Again, the difficulty in this solution lies in the nature of transformations. There are two cross-classifying generalizations which a satisfactory theory must express: all causative verbs must share a semantic element in their representation; and the class of verbs which have both a transitive causative form and an intransitive non-causative form must be described in a general fashion. Expressing the second generalization with a transformation implies a complete regularity, which in turn loses the first generalization; McCawley's solution is to make a radical move to recapture the first generalization.
There remains the alternative of expressing the second generalization in a way that does not disturb the first. Fillmore's solution is along these lines; but he still requires a radical change in the syntactic component, viz. the introduction of case markers. The lexicalist theory can leave the syntactic component unchanged by using the power of the lexicon to express the partial regularity of the second generalization. The two forms of break are assigned separate lexical entries:
(32) a. [ /brek/, +V, +[NP1 ___], NP1 BREAK ]
     b. [ /brek/, +V, +[NP2 ___ NP1], NP2 CAUSE (NP1 BREAK) ]

The two forms are related by the following morphological and semantic rules:11

(33) a. [ /x/, +V ] <-> [ /x/, +V ]
     b. [ +[NP1 ___], NP1 W ] <-> [ +[NP2 ___ NP1], NP2 CAUSE (NP1 W) ]
Thus the independent information contained in the two entries for break is the fact
that there are two words,12 plus the independent information in the intransitive
form 32a, plus the cost of referring to the redundancy
rules. Hence the relation
between the (a) and (b) sentences in 28-30 is expressed
in the lexicon and not in
the transformational component.
11 Since 33a is an identity rule, it is possibly dispensable. I have included it here for the sake of explicitness, and also in order to leave the form of the information measure unchanged.
12 Perhaps the use of the identity rule 33a could make the two words count as one, if this were desirable. I have no intuitions on the matter, so I will not bother with the modification.
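To make the bookkeeping concrete, the relation between the two entries for break can be sketched computationally: the redundancy rule is treated as a predicate over pairs of entries, not as a derivation. This is a sketch of mine, not part of the paper's formalism; the dictionary fields, phonemic spellings, and function name are illustrative assumptions.

```python
# Schematic sketch of a causative redundancy rule like (33), treated as a
# relation over entry pairs. Entry encoding is an illustrative assumption.

def causative_rule(intrans, trans):
    """True if `trans` is the causative counterpart of `intrans` per (33):
    same phonology (morphological identity rule 33a), category V, and
    meaning CAUSE(NP2, <intransitive event>) (semantic rule 33b)."""
    return (trans["phon"] == intrans["phon"]                     # rule 33a: identity
            and intrans["cat"] == "V" == trans["cat"]
            and intrans["frame"] == ["NP1", "__"]
            and trans["frame"] == ["NP2", "__", "NP1"]
            and trans["sem"] == ("CAUSE", "NP2", intrans["sem"]))  # rule 33b

break_intrans = {"phon": "brek", "cat": "V", "frame": ["NP1", "__"],
                 "sem": ("BREAK", "NP1")}
break_trans   = {"phon": "brek", "cat": "V", "frame": ["NP2", "__", "NP1"],
                 "sem": ("CAUSE", "NP2", ("BREAK", "NP1"))}

# die/kill share the semantic half of the rule but not the morphological half:
die  = {"phon": "day", "cat": "V", "frame": ["NP1", "__"], "sem": ("DIE", "NP1")}
kill = {"phon": "kil", "cat": "V", "frame": ["NP2", "__", "NP1"],
        "sem": ("CAUSE", "NP2", ("DIE", "NP1"))}

print(causative_rule(break_intrans, break_trans))  # True: rule relates the two breaks
print(causative_rule(die, kill))   # False: phonology differs, so the information
                                   # measure does not relate the two entries
```

The point of the sketch is exactly the paper's: the two breaks satisfy both halves of the rule, while die/kill fail the morphological half, so no information is saved in evaluating kill.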
This solution permits us still to capture the semantic similarity of all causative verbs in their lexical entries; thus die and kill will have entries 34a and 34b respectively:

(34) a. [ /day/, +V, +[NP1 ___], NP1 DIE ]
     b. [ /kil/, +V, +[NP2 ___ NP1], NP2 CAUSE (NP1 DIE) ]
Die and kill are related semantically in exactly the same way as the two entries of break: one is a causative in which the event caused is the event described by the other. However, since there is no morphological rule relating 34a-b, the information measure does not relate them; the independent information contained in the two entries is the fact that there are two words, plus all the information in both entries. Thus the lexicalist theory successfully expresses the relation between the two breaks and their relation to kill and die, without in any sense requiring kill and die to be exceptional, and without making any radical changes in the nature of the syntactic component.
A further possibility suggested by this account of causative verbs is that the
partial regularities of the following examples from Fillmore are also expressed in the
lexicon:
(35) a. Bees swarmed in the garden.
        We sprayed paint on the wall.
     b. The garden swarmed with bees.
        We sprayed the wall with paint.
Fillmore seeks to express these relationships transformationally, but he encounters
the uncomfortable fact that the (a) and (b) sentences are not synonymous: the (b)
sentences imply that the garden was full of bees and that the wall was covered with
paint, but the (a) sentences do not carry this implication. Anderson 1971 shows
that this semantic difference argues against Fillmore's analysis, and in favor of one
with a deep-structure difference between the (a) and (b) sentences. A lexical treat-
ment of the relationship between the two forms of swarm and spray could express
the difference in meaning, and would be undisturbed by the fact that some verbs,
such as put, have only the (a) form and meaning, while others, such as fill, have
only the (b) form and meaning. This is precisely parallel to the break-break vs.
die-kill case just discussed.
Consider also some of the examples mentioned in Chomsky 1970. The relation of
He was amused at the stories and The stories amused him can be expressed in the
lexicon, and no causative transformation of the form Chomsky proposes need be
invoked. The nominalization his amusement at the stories contrasts with *the stories'
amusement of him because amusement happens to be most directly related to the
adjectival amused at rather than to the verb amuse. Other causatives do have
nominalizations, e.g. the excitation of the protons by gamma rays. I take it then that
the existence of only one of the possible forms of amusement is an ad-hoc fact, expressed in the lexicon.

Chomsky also cites the fact that the transitive use of grow, as in John grows tomatoes, does not form the nominalization *the growth of tomatoes by John. Rather the growth of tomatoes is related to the intransitive tomatoes grow. Again we can express this fact by means of lexical relations. This time, the relation is perhaps more systematic than with amusement, since nouns in -th, such as width and length, are generally related to intransitive predicates. Thus the meaning of growth can be predicted by the syntactic properties of the redundancy rule which introduces the affix -th. The transitive grow does in fact have its own nominalization: the growing of tomatoes by John. Thus Chomsky's use of causatives as evidence for the Lexicalist Hypothesis seems incorrect, in that causatives do have nominalizations, contrary to his claim. But we can account for the unsystematicity of the nominalizations, as well as for what regularities do exist, within the present framework.
Note also that our account of causatives extends easily to Lakoff's class of inchoative verbs (1971b). For example, the relation of the adjective open to the intransitive verb open ('become open') is easily expressed in a redundancy rule similar to that proposed for causatives.

As further evidence for the lexicalist theory, consider two forms of the verb smoke:

(36) a. {The cigar / The chimney} smoked.
     b. John smoked {the cigar / *the chimney}.
The intransitive verb smoke means 'give off smoke'; it is related to the noun smoke by a redundancy rule that applies also to the appropriate senses of steam, smell, piss, flower, and signal. The transitive form of smoke in the sense of 36b is partially related to the intransitive form by 33 in that it means 'cause to give off smoke', but it contains additional information, something like 'by holding in the mouth and puffing'. This information is not predictable from the redundancy rule, but it provides the clue to the anomaly of *John smoked the chimney (though if John were a giant, he might well use a chimney like a pipe, and then the sentence might be acceptable). A transformational theory has no way to capture this partial generalization without artificiality. The lexicalist theory simply counts the unpredictable information as non-redundant, and the predictable information as redundant.
While we are on the subject of smoke, it may be interesting to point out some other senses of smoke as illustration. Call the noun smoke1, and the intransitive and transitive senses just discussed smoke2 and smoke3 respectively. There is another transitive verb smoke4, which means 'permeate or cover with smoke', as in John smoked the ham. The redundancy rule relating smoke4 to smoke1 is also seen in verbs like paint, another sense of steam, water (in water the garden), powder (as in powder your nose), flour, and cover. There is another intransitive smoke5, meaning 'smoke3 something'. The ambiguity in John is smoking is between smoke2 and smoke5. Smoke5 is related to smoke3 by a redundancy rule that also handles two forms of eat, drink, draw, read, cook, and sing. From smoke3 we also get the nominalization smoke6, 'something that is smoked3' (e.g. A cigar is a good smoke), by the redundancy rule that also gives the nouns drink, desire, wish, dream, find, and experience. The verb milk (as in milk a cow) is related to the noun as smoke1 and smoke3 are related, but without an intermediate *The cow milked ('The cow gave off milk'); the relation between the two milks requires two sets of redundancy rules used together. We thus see the rich variety of partial regularities in lexical relations: their expression in a transformational theory becomes hard to conceive, but they can be expressed quite straightforwardly in the lexicalist framework.
5.4. IDIOMS. Idioms are fixed syntactic constructions which are made up of words already in the lexicon, but which carry meanings independent of the meanings of their constituents. Since the meanings are unpredictable, the grammar must represent a speaker's knowledge of what constructions are idioms and what they mean. The logical place to list idioms is of course in the lexicon, though it is not obvious that the usual lexical machinery will suffice.

Fraser 1970 discusses three points of interest in the formalization of idioms. First, they are constructed from known lexical items; the information measure, which measures how much the speaker must learn, should reflect this. Second, they are for the most part constructed in accordance with known syntactic rules (with a few exceptions such as by and large), and in accordance with the syntactic restrictions of their constituents. Third, they are often resistant to normally applicable transformations; e.g., The bucket was kicked by John has only the non-idiomatic reading. I have nothing to say about this third consideration, but the first two can be expressed in the present framework without serious difficulty.
Let us deal first with the question of the internal structure of idioms. Since we
have given internal structure to items like compensation and permit, there seems
to be nothing against listing idioms too, complete with their structure. The only
difference in the lexical entries is that the structure of idioms goes beyond the word
level. We can thus assign the lexical entries in 37 to kick the bucket, give hell to, and
take to task.13
(37) a. [ NP1 [VP [V kik] [NP [Art ðə] [N bukət]]] ]
        NP1 DIE
     b. [ NP1 [VP [V giv] [NP [N hel]] [PP [P to] NP2]] ]
        NP1 YELL AT NP2
     c. [ NP1 [VP [V tak] NP2 [PP [P to] [NP [N tæsk]]]] ]
        NP1 CRITICIZE NP2
The lexical insertion rule will operate in the usual way, inserting the lexical entries
onto deep phrase markers that conform to the syntactic structure of the lexical
entries. Since the structure of the entries goes beyond the word level, the idiom
must be inserted onto a complex of deep-structure nodes, in contrast to ordinary
words which are inserted onto a single node.
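The insertion condition just described can be sketched computationally: an idiom's entry is a phrase-level tree with open NP slots, and lexical insertion checks that a deep phrase marker matches the whole configuration at once. The toy matcher below and its tree encoding are assumptions of mine for illustration only, not the paper's formal mechanism.

```python
# Sketch: matching a phrase-level idiom entry (with an open NP slot) against
# a deep phrase marker. Trees are (label, children); strings are terminals.

def matches(pattern, tree):
    """True if `tree` instantiates `pattern`. The string 'NP2' is an open
    slot that matches any NP constituent, even in mid-idiom position
    (so discontinuous idioms like take NP to task are handled directly)."""
    if pattern == "NP2":                       # strictly subcategorized object
        return isinstance(tree, tuple) and tree[0] == "NP"
    if isinstance(pattern, str):               # a terminal of the idiom itself
        return pattern == tree
    plabel, pchildren = pattern
    if not isinstance(tree, tuple) or tree[0] != plabel:
        return False
    tchildren = tree[1]
    return len(pchildren) == len(tchildren) and all(
        matches(p, t) for p, t in zip(pchildren, tchildren))

# take ... to task, as in 37c: the direct object sits INSIDE the idiom's VP.
take_to_task = ("VP", [("V", ["tak"]), "NP2",
                       ("PP", [("P", ["to"]), ("NP", [("N", ["task"])])])])

deep_vp = ("VP", [("V", ["tak"]), ("NP", [("N", ["Bill"])]),
                  ("PP", [("P", ["to"]), ("NP", [("N", ["task"])])])])

print(matches(take_to_task, deep_vp))  # True: the discontinuous idiom
                                       # is inserted as a single unit
```

Because the matcher requires the phrase marker to conform to the entry's full structure, it also illustrates why idioms must have the syntactic shape of ordinary phrases: a marker with the wrong configuration simply fails to match.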
13
The normal notation for strict subcategorization
restrictions is difficult to apply in this
case, so I have for convenience adopted a notation in which the strict subcategorization con-
ditions are combined with the phonological and syntactic representations, in an obvious
fashion. No particular theoretical significance is intended by the change in notation. This
proposal, which appears to be much like that of Katz 1973, was arrived at independently.
As with ordinary lexical entries, the strictly subcategorized NP's must have a specific grammatical relation with respect to the entry, and this is indicated in the entries of 37. In the case of take NP to task, the strictly subcategorized direct object is in fact surrounded by parts of the idiom; i.e., the idiom is discontinuous. But in the present theory, this appears not to be cause for despair, as our formalisms seem adequate to accommodate a discontinuous lexical item.
This last observation enables us to solve a puzzle in syntax: which is the underlying form in verb-particle constructions, look up the answer or look the answer up? The standard assumption (cf. Fraser 1965) is that the particle has to form a deep-structure constituent with the verb in order to formulate a lexical entry; hence look up the answer is underlying, and the particle movement transformation is a rightward movement. But Emonds 1972 gives strong syntactic evidence that the particle movement rule must be a leftward movement. He feels uncomfortable about this result because it requires that look ... up be discontinuous in deep structure; he consoles himself by saying that the same problem exists for take ... to task, but does not provide any interesting solution. Having given a viable entry for take ... to task, we can now equally well assign discontinuous entries to idiomatic verb-particle constructions, vindicating Emonds' syntactic solution.
By claiming that the normal lexical-insertion process deals with the insertion of idioms, we accomplish two ends. First, we need not complicate the grammar in order to accommodate idioms. Second, we can explain why idioms have the syntactic structure of ordinary sentences: if they did not, the lexical insertion rules could not insert them onto deep phrase markers. Our account of idioms thus has the important virtue of explaining a restriction in terms of already existing conventions in the theory of grammar: good evidence for its correctness.
Now that we have provided a way of listing idioms, how can we capture the
speaker's knowledge that idioms are made up of already existing words? To relate
the words in the lexicon to the constituents of idioms, we need morphological
redundancy rules. The appropriate rules for kick the bucket must say that a verb
followed by a noun phrase forms a verb phrase, and that an article followed by a
noun forms a noun phrase. But these rules already exist as phrase-structure rules
for VP and NP. Thus, in the evaluation of idioms, we must use the phrase-structure
rules as morphological redundancy rules. If this is possible, the independent in-
formation in kick the bucket will be the information that it is a lexical entry, plus
the semantic information DIE, plus the cost of referring to the phrase-structure rules
for VP and NP.
Though mechanically this appears to be a reasonable solution, it raises the
disturbing question of why the base rules should play a role in the information
measure for the lexical component. Some discussion of this question will appear
in ?7. At this point I will simply note that this solution does not have very drastic
consequences for grammatical theory. Since the base rules can be used as redun-
dancy rules only if lexical entries go beyond the word level, no descriptive power
is added to the grammar outside the description of idioms. Therefore the proposal
is very limited in scope, despite its initially outrageous appearance.
If the base rules are used as morphological redundancy rules for idioms, we
might correspondingly expect the semantic projection rules to be used as semantic
redundancy rules. But of course this cannot be the case, since then an idiom would have exactly its literal meaning, and cease to be an idiom. So we must assume that the permissive version of the information measure is being used: both morphological and semantic redundancy rules exist, but only the morphological rules apply in reducing the independent information in the idiom. This is further evidence that the permissive version of the information measure must be correct.
Note, by the way, that a transformational theory of nominalization contains absolutely no generalization of the approach that accounts for idioms. Thus the lexicalist hypothesis proves itself superior to the transformational hypothesis in a way totally unrelated to the original arguments deciding between them.
6. THE COST OF REFERRING TO REDUNDANCY RULES. In evaluating the independent information of lexical entries, we have continually included the cost of referring to redundancy rules. We have not so far specified how to calculate this cost, or how to relate it quantitatively to other costs in the lexicon. In this section I will propose some preliminary answers to those questions.
In the discussion of the full-entry theory in §2, I said that the cost of referring to a redundancy rule in evaluating a lexical entry represents one's knowledge of which regularities hold in that particular lexical entry. In order to be more specific, let us reconsider the meaning of the information measure in the full-entry theory. In measuring the independent information contained in a lexical entry, we are in effect measuring how much new information one needs in order to learn that
lexical item. If the lexical item is totally unrelated to anything else in the lexicon,
one must learn it from scratch. But if there is other lexical information which helps
one know in advance some of the properties of the new word, there is less to learn;
this is captured in clause (b) of the information measure.
In learning that a new lexical item can be formed on the basis of an old lexical item and a redundancy rule, however, something must be learned besides the identity of the old lexical item: namely, which redundancy rule to apply. For example, part of one's knowledge of the lexicon of English is the fact that the nominalizations of refuse and confuse are refusal and confusion, not *refusion and *confusal, although in principle the latter forms could exist. That is, in learning the words refusal and confusion, one must learn the arbitrary fact that, of the choice of possible nominal affixes, refuse uses -al and confuse uses -ion. Clause (c) of the information measure, the cost of referring to the redundancy rule, is meant to represent this knowledge. I am claiming therefore that the evaluation of refusal must take into account the fact that it, and not *refusion, is the proper nominalization of refuse.
For a clear case of the use of clause (c), let us turn to another example. Botha
1968 discusses the process of nominal compounding in Afrikaans, which contains
many compounds which are morphologically simple concatenations of two nouns,
as in
English. But there are also many compounds in which the two nouns are
joined by a 'link
phoneme' s or a. Botha demonstrates at great length that there
is no phonological, morphological, syntactic, or semantic regularity in the use
of link phonemes; i.e., the link phoneme must be learned as an idiosyncrasy of
each individual compound.
In the present theory, the Afrikaans lexicon contains three morphological rules for noun compounds:

(38) a. [ /[N x] [N y]/, +N ] <-> [ /x/, +N ], [ /y/, +N ]
     b. [ /[N x] s [N y]/, +N ] <-> [ /x/, +N ], [ /y/, +N ]
     c. [ /[N x] a [N y]/, +N ] <-> [ /x/, +N ], [ /y/, +N ]

Since all the morphological information of a particular compound is predicted by one of the three rules in 38, clause (b) of the information measure contributes nothing to the information content of the compound. But since the speaker must learn which of the three is appropriate, clause (c) must contribute the cost of the information involved in making this choice.
A third example involves inflectional morphology. Halle 1973 argues that paradigmatic information should be represented in the dictionary, and in fact that
only and all fully inflected forms should be entered. As a consequence, the lexical
insertion rules must enter partial or complete paradigms into deep structures, and
the rules of concord must have the function of filtering out all but the correct
forms, rather than that of inserting inflectional affixes.14 Under Halle's proposal,
part of the task of the lexical component of English is to list the correspondences
between the present and past tense forms of verbs. Accordingly, we can state a
few morphological redundancy rules relating present to past tense forms in English:
(39) a. [ /x/, +[V, +pres] ] <-> [ /x+d/, +[V, +past] ]
     b. [ /C0VC0/, +[V, +pres] ] <-> [ /C0VC0+t/, +[V, +past] ]
     c. [ /C0 [-back, -round] C0/, +[V, +pres] ] <-> [ /C0 [+back, +round] C0/, +[V, +past] ]
     d. [ /C0VC0/, +[V, +pres] ] <-> [ /C0ɔ+t/, +[V, +past] ]
14
This of course requires rules of concord to be of a different formal nature than ordinary
transformations. But perhaps this is not such a bad result, considering that the most convincing
cases for Lakoff's global rules seem to be in this area. An independent argument that concord
rules differ formally from transformations could serve as evidence that transformations need
not be global: only the very limited class of concord rules, which are no longer transformations
at all, need information from various levels of derivation. This more highly structured theory
reduces the class of possible grammars.
Here 39a is the regular rule for forming past tenses, and the other three represent various irregular forms: 39b relates keep-kept, dream-dreamt, lose-lost, feel-felt, etc.; 39c relates tell-told, cling-clung, hold-held, break-broke, etc.; the very marginal and strange 39d relates just the six pairs buy-bought, bring-brought, catch-caught, fight-fought, seek-sought, and think-thought. Note that 39b-c take over the function of the 'precyclic re-adjustment rules' described by Chomsky & Halle (209-10).15

A final preliminary point in this example: in the evaluation of a paradigm by the information measure, I assume that the information that a word exists is counted only once for the entire paradigm. Although one does have to learn whether a verb has a nominalization, one knows for certain that it has a past tense, participles, and a conjugation. Therefore the information measure should not count knowledge that inflections exist as anything to be learned.
Now let us return to the problem of measuring the cost of referring to a redundancy rule. Intuitively, the overwhelmingly productive rule 39a should cost virtually nothing to refer to; the overwhelmingly marginal rules 39b-d should cost a great deal to refer to, but less than the information they render predictable. The disparity in cost reflects the fact that, in choosing a past tense form, 39a is ordinary and unremarkable, so one must learn very little to use it; but the others are unusual or 'marked' choices, and must be learned. We might further guess that 39b-c, which each account for a fair number of verbs, cost less to refer to than 39d, which applies to only six forms (but which is nevertheless perceived as a minor regularity). Still, the pair buy-bought contains less independent information than the totally irregular pair go-went, which must be counted as two independent entries.
These considerations lead to a formulation of the cost of reference something like:

(40) The cost of referring to redundancy rule R in evaluating a lexical entry W is I(R,W) × P(R,W), where I(R,W) is the amount of information in W predicted by R, and P(R,W) is a number between 0 and 1 measuring the regularity of R in applying to the derivation of W.

For an altogether regular rule application, such as the use of 39a with polysyllabic verbs, P(R,W) will be zero. With monosyllabic verbs and 39a, P(R,W) will be almost but not quite zero; the existence of alternatives means that something must be learned. For 39b-d, P(R,W) will be close to 1; their being irregular means that their use does not reduce the independent information content of entries nearly as much as 39a. In particular, 39d will reduce the independent information content hardly at all.
In fact, it is quite possible that the total information saved by 39d in the evaluation
15
I have not considered the question of how to extend the phonological generalization of
39c to other alternations such as mouse-mice, long-length. Perhaps the only way to do this is to
retain the rule in the phonology, and simply let the lexical redundancy rule supply a rule feature.
But a more sophisticated account of the interaction of the morphological rules might capture
this generalization without a rule feature; e.g., one could consider factoring morphological rules
into phonological and syntactic parts, as we factored out separate morphological and semantic
rules in ?4. In any event,
I am including all the phonology in 39 because many people have been
dissatisfied with the notion of re-adjustment rules: I
hope that bringing up an alternative may
stimulate someone to clarify the notion.
of the six relevant pairs of lexical entries is less than the cost of stating the rule. Our evaluation measure thus reflects the extremely marginal status of this rule. In other cases, perhaps the nominalizing affixes and Afrikaans compounds, the various possible derived forms are in more equal competition, and P(R,W) will have a value of, say, 0.3.
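As a toy rendering of 40, with invented magnitudes (the paper assigns no units or measured values, so the numbers below are purely illustrative assumptions):

```python
# Toy rendering of formula (40): cost of citing rule R for entry W
# = I(R,W) * P(R,W). The information values below are invented.

def cost_of_reference(info_predicted, p_irregularity):
    """Information R predicts in W, discounted by how regularly R applies:
    P = 0 for a fully regular rule, P near 1 for a marginal one."""
    assert 0.0 <= p_irregularity <= 1.0
    return info_predicted * p_irregularity

print(cost_of_reference(8.0, 0.0))    # regular 39a with a polysyllabic verb: cost 0.0
print(cost_of_reference(8.0, 0.875))  # a marginal 39d-like rule: cost 7.0, so nearly
                                      # all the 'predicted' information must still be learned
print(cost_of_reference(8.0, 0.25))   # competing affixes in rough balance: cost 2.0
```

The arithmetic makes the text's point directly: a rule with P near 1 saves almost nothing, so listing its outputs outright may be cheaper than stating the rule.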
I will not suggest a precise method of calculating P(R,W), as I believe it would be premature. However, the general concept of how it should be formulated is fairly clear. Count a lexical pair related by R as an ACTUAL use of R. Count a lexical entry which meets one term of the structural description of R, but in whose evaluation R plays no role, as a NON-USE of R. For example, confuse counts as a non-use of the rule introducing the -al nominal affix, since it meets the structural description of the verbal term of the rule, but there is no noun confusal. The sum of the actual uses and the non-uses is the number of POTENTIAL uses of R. P(R,W) should be near zero when the number of actual uses of R is close to the number of potential uses; P(R,W) should be near 1 when the number of actual uses is much smaller than the number of potential uses; and it should rise monotonically from the former extreme to the latter.
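One monotonic estimator consistent with these desiderata can be sketched as follows. The specific formula 1 − actual/potential is my own simplest stand-in: the paper deliberately leaves the function unspecified, requiring only the boundary behavior and monotonicity just described.

```python
# Sketch of P(R,W) computed from actual vs. potential uses of rule R:
# near 0 when almost every potential use is actual, near 1 when actual
# uses are rare. The linear form is an assumption, not Jackendoff's.

def p_irregularity(actual_uses, non_uses):
    potential = actual_uses + non_uses   # potential uses = actual uses + non-uses
    return 1.0 - actual_uses / potential

# e.g. for the -al nominalization rule: refusal is an actual use,
# confuse (which lacks *confusal) is a non-use.
print(p_irregularity(actual_uses=3, non_uses=1))    # fairly regular rule: 0.25
print(p_irregularity(actual_uses=6, non_uses=250))  # a 39d-like rule: 0.9765625
```

Adding a phonological condition to a rule, as in the mid-vowel restriction on 39b, shrinks `non_uses` while `actual_uses` stays fixed, so this estimator decreases: exactly the behavior the next paragraph requires.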
If phonological conditions can be placed on the applicability of a redundancy rule, P(R,W) decreases; i.e., the rule becomes more regular. For example, if the actual
uses of 39b all contain mid vowels (as I believe to be the case), then this specification
can be added to the vowel in 39b, reducing the potential uses of the rule from the
number of monosyllabic verbs to the number of such verbs with mid vowels. Since
the number of actual uses of the rule remains the same, P(R,W) is reduced; and,
proportionately, so is the cost of referring to 39b in the derivations where it is
involved.
It is obvious that this concept of P(R,W) must be refined to account for derivations
such as perdition with non-lexical sources; for compounding, where the number of
potential uses is infinite because compounds can form parts of compounds; and for
prefix-stem verbs, where the lexical redundancy rule does not relate pairs of items.
Furthermore, I have no idea how to extend the proposal to the evaluation of idioms,
where the base rules are used as lexical redundancy rules. Nevertheless, I believe
the notion of regularity of a lexical rule and its role in the evaluation measure for
the lexicon is by this point coherent enough to satisfy the degree of approximation
of the present theory.
7. CREATIVITY IN THE LEXICON AND ITS IMPLICATIONS. The accepted view of the
lexicon is that it is simply a repository of learned information. Creativity is taken to
be a product of the phrase-structure rules and transformations. That is, the ability
of a speaker to produce and understand new sentences is ascribed to his knowledge
of a productive set of rules which enable him to combine a fixed set of memorized
words in infinitely many ways.
If we were to adhere to this view strictly, it would be difficult to accept the
treatment of the lexicon proposed here. For example, it is quite common for
someone to invent a new compound noun spontaneously and to be perfectly under-
stood. This creative use of the compound rule, we would have to argue, is evidence
that compounding must be a transformational process rather than a lexical one.
This conclusion would fly in the face of all the evidence in §5.3 against a transformational account of compounds.

The way out of the dilemma must be to follow the empirical evidence, rather than our preconceived notions of what the grammar should be like. We must accept the lexicalist account of compounds, and change our notion of how creativity is embodied in the grammar.

The nature of the revision is clear. Lexical redundancy rules are learned from generalizations observed in already known lexical items. Once learned, they make it easier to learn new lexical items: we have designed them specifically to represent what new independent information must be learned. However, after a redundancy rule is learned, it can be used generatively, producing a class of partially specified possible lexical entries. For example, the compound rule says that any two nouns N1 and N2 can be combined to form a possible compound N1N2. The semantic redundancy rules associated with the compound rule provide a finite range of possible readings for N1N2. If the context is such as to disambiguate N1N2, any speaker of English who knows N1 and N2 can understand N1N2 whether he has heard it before or not, and whether it is an entry in his lexicon or not. Hence the lexical rules can be used creatively, although this is not their usual role.
In §5.4, I proposed that the description of idioms uses the phrase-structure rules as lexical redundancy rules. In broader terms, the rules normally used creatively are being used for the passive description of memorized items. Perhaps this change in function makes more sense in light of the discussion here: it is a mirror image to the creative use of the normally passive lexical redundancy rules.
We have thus abandoned the standard view that the lexicon is memorized and
only the syntax is creative. In its place we have a somewhat more flexible theory of
linguistic creativity. Both creativity and memorization take place in both
the
syntactic and the lexical component. When the rules of either component are used
creatively, no new lexical entries need be learned. When memorization of
new
lexical entries is taking place, the rules of either component can serve as an aid
to
learning. However, the normal mode for syntactic rules is creative, and the
normal
mode for lexical rules is passive.
Is there, then, a strict formal division between phrase-structure rules and
morphological redundancy rules, or between the semantic projection rules of
deep
structure and the semantic redundancy rules? I suggest that perhaps there is
not,
and that they seem so different simply because of the differences in their
normal
mode of operation. These differences in turn arise basically because lexical rules
operate inside words, where things are normally memorized, while
phrase-structure
rules operate outside words, where things are normally created
spontaneously.
One might expect the division to be less clear-cut in a highly agglutinative
language,
where syntax and morphology are less separable than in English.
To show that the only difference between the two types of rules is indeed in their
normal modes of operation, one would of course need to reconcile their
somewhat
disparate notations and to show that they make similar claims. Though I will
not carry out this project
here,
it is important to note, in the present
scheme, that
the syntactic analog of a morphological redundancy rule is a phrase-structure
rule,
not a transformation. This result supports the lexicalist theory's general trend
toward enriching the base component at the expense of the transformational component.16
8. SUMMARY. This paper set out to provide a theory of the lexicon that would accommodate Chomsky's theory of the syntax of nominalizations. This required a formalization of the notion 'separate but related lexical entries'. The formalization developed uses redundancy rules not for part of the derivation of lexical entries, but for part of their evaluation. I take this use of redundancy rules to be a major theoretical innovation of the present approach.

In turn, this use of redundancy rules entails the formulation of a new type of evaluation measure. Previous theories have used abbreviatory notations to reduce the evaluation measure on the grammar to a simple count of symbols. But we have seen that the usual notational conventions cannot capture the full range of generalizations in the lexicon. Accordingly I have formulated the evaluation measure as a minimization of independent information, measured by the rather complex function 16 and its refinement in 40. The abandonment of the traditional type of evaluation measure is a second very crucial theoretical innovation required for an adequate treatment of the lexicon.17

The concept of lexical rules that emerges from the present theory is that they are separated into morphological and semantic redundancy rules. The M-rules must play a role, and the S-rules may, in every lexical evaluation in which entries are
related. Typically, the redundancy rules do not completely specify the contents of
one entry in terms of another, but leave some aspects open. This partial specification
of output is a special characteristic of lexical redundancy rules not shared by other
types of rules; I have used this characteristic frequently in arguing against trans-
formational solutions.
In the discussion of nominalizations, I have taken great pains to tailor the
information measure to our intuitions about the nature of generality in the lexicon.
In particular, attention has been paid to various kinds of lexical derivatives with
non-lexical sources, since these form an important part of the lexicon which is not
accounted for satisfactorily in other theories.
While our solutions were developed specifically with nominalizations in mind,
there is little trouble in extending them to several disparate areas in the lexicon.
I have shown that parallel problems occur in these other areas, and that the solution
for nominalizations turns out to be applicable. Insofar as the success of a theory
is measured by how easily it generalizes to other problems, this theory thus seems
quite successful for English. A more stringent test would be its applicability to
languages where morphology plays a much more central role.
Another measure of a theory's success is its salutary effect on other sectors of the theory of grammar. The most important effect of the present theory is to eliminate a major part of the evidence for Lakoff's theory of exceptions to transformations (1971b): the lexicon has been set up to accommodate comfortably both regular and ad-hoc facts, with no sense of absolute exceptionality; and transformations are not involved in any event. Since (in Jackendoff 1972) I have eliminated another great part of Lakoff's evidence, virtually all of Lakoff's so-called exceptions are now accounted for in a much more systematic and restricted fashion. We also no longer need hypothetical lexical entries, a powerful device used extensively by Lakoff. With practically all of Lakoff's evidence dissolved, we see that the theory of exceptions plays a relatively insignificant role in lexicalist grammar. A small dent has also been made in the highly controversial area of idiosyncratic phonological re-adjustment rules, though much further work is needed before we know whether they are eliminable.

16 Halle 1973 argues for a view of lexical creativity very similar to that proposed here, on similar grounds.

17 One might well ask whether the traditional evaluation measure has inhibited progress in other areas of the grammar as well. I conjecture that the approach to marking conventions in SPE (Chapter 9) suffers for this very reason: Chomsky & Halle set up marking conventions so that more 'natural' rules save symbols. If, instead, the marking conventions were used as part of an evaluation measure on a set of fully specified rules, a great deal of their mechanical difficulty might well be circumvented in expressing the same insights.
There are three favorable results in syntax as well. First and most important, the analysis of causative verbs, which supposedly provides crucial evidence for the generative semantics theory of lexicalization, can be disposed of quietly and without fuss, leaving the standard theory of lexical insertion intact. Second, idioms can be listed in the lexicon and can undergo normal lexical insertion; some of their syntactic properties emerge as an automatic consequence of this position. Third, the direction of the English particle movement transformation can finally be settled in favor of leftward movement.
Thus a relatively straightforward class of intuitions about lexical relations has been used to justify a theory of the lexicon which has quite a number of significant properties for linguistic theory. Obviously, many questions remain in the area of morphology. I would hope, however, that this study has provided a more congenial framework in which to pose these questions.
REFERENCES

ANDERSON, S. 1971. On the role of deep structure in semantic interpretation. Foundations of Language 7.387-96.
BOTHA, RUDOLF P. 1968. The function of the lexicon in transformational generative grammar. The Hague: Mouton.
CHOMSKY, N. 1965. Aspects of the theory of syntax. Cambridge, Mass.: MIT Press.
--. 1970. Remarks on nominalization. In Jacobs & Rosenbaum, 184-221.
--. 1972. Some empirical issues in the theory of transformational grammar. Goals of linguistic theory, ed. by S. Peters, 63-130. Englewood Cliffs, N.J.: Prentice-Hall.
--, and M. HALLE. 1968. The sound pattern of English. New York: Harper & Row.
EMONDS, J. E. 1972. Evidence that indirect object movement is a structure-preserving rule. Foundations of Language 8.546-61.
FILLMORE, C. 1968. The case for case. Universals in linguistic theory, ed. by E. Bach & R. Harms, 1-88. New York: Holt, Rinehart & Winston.
FODOR, JERRY. 1970. Three reasons for not deriving 'kill' from 'cause to die'. Linguistic Inquiry 1.429-38.
FRASER, B. 1965. An examination of the verb-particle construction in English. MIT dissertation.
--. 1970. Idioms within a transformational grammar. Foundations of Language 6.22-42.
HALLE, M. 1959. The sound pattern of Russian. The Hague: Mouton.
--. 1973. Prolegomena to a theory of word formation. Linguistic Inquiry 4.3-16.
JACKENDOFF, R. S. 1972. Semantic interpretation in generative grammar. Cambridge, Mass.: MIT Press.
JACOBS, R., and P. ROSENBAUM (eds.) 1970. Readings in English transformational grammar. Waltham, Mass.: Blaisdell.
KATZ, J. J. 1973. Compositionality, idiomaticity, and lexical substitution. A Festschrift for Morris Halle, ed. by S. Anderson & P. Kiparsky, 357-76. New York: Holt, Rinehart & Winston.
LAKOFF, G. 1971a. On generative semantics. Semantics: an interdisciplinary reader, ed. by D. Steinberg & L. Jakobovits, 232-96. Cambridge: University Press.
--. 1971b. Syntactic irregularity. New York: Holt, Rinehart & Winston.
LEES, R. B. 1960. The grammar of English nominalizations. Bloomington: Indiana University.
MCCAWLEY, J. 1968. Lexical insertion in a transformational grammar without deep structure. Papers from the 4th Regional Meeting, Chicago Linguistic Society, 71-80.
PERLMUTTER, D. 1970. The two verbs 'begin'. In Jacobs & Rosenbaum, 107-19.
STANLEY, R. 1967. Redundancy rules in phonology. Lg. 43.393-436.

[Received 1 July 1974.]
