
Cognitive Science 35 (2011) 251–296

Copyright © 2010 Cognitive Science Society, Inc. All rights reserved.


ISSN: 0364-0213 print / 1551-6709 online
DOI: 10.1111/j.1551-6709.2010.01144.x

Computational Exploration of Metaphor Comprehension Processes Using a Semantic Space Model
Akira Utsumi
Department of Informatics, The University of Electro-Communications, Tokyo

Received 22 October 2008; received in revised form 25 June 2010; accepted 26 July 2010

Abstract
Recent metaphor research has revealed that metaphor comprehension involves both categorization
and comparison processes. This finding has triggered the following central question: Which property
determines the choice between these two processes for metaphor comprehension? Three competing
views have been proposed to answer this question: the conventionality view (Bowdle & Gentner,
2005), aptness view (Glucksberg & Haught, 2006b), and interpretive diversity view (Utsumi, 2007);
these views, respectively, argue that vehicle conventionality, metaphor aptness, and interpretive
diversity determine the choice between the categorization and comparison processes. This article
attempts to answer the question regarding which views are plausible by using cognitive modeling
and computer simulation based on a semantic space model. In the simulation experiment, categoriza-
tion and comparison processes are modeled in a semantic space constructed by latent semantic analy-
sis. These two models receive word vectors for the constituent words of a metaphor and compute a
vector for the metaphorical meaning. The resulting vectors can be evaluated according to the degree
to which they mimic the human interpretation of the same metaphor; the maximum likelihood
estimation determines which of the two models better explains the human interpretation. The result
of the model selection is then predicted by three metaphor properties (i.e., vehicle conventionality,
aptness, and interpretive diversity) to test the three views. The simulation experiment for Japanese
metaphors demonstrates that both interpretive diversity and vehicle conventionality affect the choice
between the two processes. On the other hand, it is found that metaphor aptness does not affect this
choice. This result can be treated as computational evidence supporting the interpretive diversity and
conventionality views.

Keywords: Metaphor comprehension; Cognitive modeling; Semantic space model; Latent semantic
analysis (LSA); Categorization; Comparison; Interpretive diversity; Conventionality; Maximum
likelihood estimation

Correspondence should be sent to Akira Utsumi, Department of Informatics, The University of Electro-
Communications, 1-5-1 Chofugaoka, Chofushi, Tokyo 182-8585, Japan. E-mail: utsumi@inf.uec.ac.jp

1. Introduction

Metaphors pervade language, in both spoken and written discourse. For example, in an
analysis of different types of discourse, Cameron (2008) demonstrated that 20 metaphors
were used per 1,000 words in college lectures, 50 per 1,000 words in ordinary discourse, and
60 per 1,000 words in discourse by teachers. Hence, it is no exaggeration to say that people can-
not verbally communicate with each other without using metaphors. Furthermore, an
increasing number of studies have revealed that metaphors are essentially involved in our
everyday thought (e.g., Gibbs, 2006; Kövecses, 2002; Lakoff & Johnson, 1980).
The prevalence of metaphor in language and thought has motivated a considerable num-
ber of cognitive studies on metaphor, particularly on the cognitive mechanism of metaphor
comprehension. These studies have focused on how people comprehend metaphors and have
discovered that two different processes, namely, comparison (Gentner, 1983; Gentner,
Bowdle, Wolff, & Boronat, 2001; Gentner & Markman, 1997) and categorization
(Glucksberg, 2001; Glucksberg & Keysar, 1990), are involved in metaphor comprehension.
Therefore, recent psycholinguistic studies have explored the metaphor property that
determines the choice between the two processes and proposed different views: the
conventionality view (Bowdle & Gentner, 2005; Gentner & Bowdle, 2008), aptness view
(Glucksberg & Haught, 2006a, 2006b; Jones & Estes, 2006), and interpretive diversity view
(Utsumi, 2007); these views, respectively, argue that vehicle conventionality, metaphor
aptness, and interpretive diversity determine the choice. However, empirical tests of these
three hybrid views have yielded conflicting results. Researchers have hitherto
not reached a consensus, and there is a heated debate regarding the kind of metaphors that
are processed as comparisons and as categorizations (e.g., Bowdle & Gentner, 2005; Gibbs,
2008; Glucksberg & Haught, 2006a, 2006b; Jones & Estes, 2006; Utsumi, 2007).
To answer the question regarding which of these metaphor views is most plausible, this
study adopts an approach that is different from that of existing studies, namely, computa-
tional modeling and simulation. In this approach, this study employs a semantic space model
(e.g., Landauer, McNamara, Dennis, & Kintsch, 2007; Padó & Lapata, 2007; Widdows,
2004). It models two processes of metaphor comprehension (categorization and comparison)
using a semantic space model and determines which of the two models better explains the
human interpretation of each Japanese metaphor obtained in a psychological experiment
(Utsumi, 2007). The result of the model selection procedure is then predicted by three meta-
phor properties, namely, conventionality, aptness, and interpretive diversity. The best pre-
dictor can be treated as the most plausible view of metaphor comprehension that determines
a shift between comparison and categorization.
The rest of this article is organized as follows. Section 2 illustrates the comparison and
categorization views of metaphor comprehension, and then presents three hybrid views for
reconciling the categorization and comparison views, which I intend to compare in this
study. Other metaphor views are also reviewed in this framework and discussed again in
Section 5 with reference to the implications of the result of this article. Section 3 explains
the semantic space model as a computational framework used for the simulation experiment,
and two algorithms as models of the comparison and categorization processes. Furthermore,
this section shows that these algorithms are plausible models of the psychological processes
of comparison and categorization by demonstrating that they are consistent with two
processing phenomena that the existing empirical studies of metaphor have employed to
determine the dominant process in metaphor comprehension, that is, grammatical concor-
dance between form and function and directionality in the metaphor comprehension process.
Section 4 presents the procedure for model selection and theory testing performed in the
simulation experiment and the result of the simulation experiment. Section 5 explains the
implications of the simulation results as well as some issues concerning the computational
methodology based on the semantic space model. Finally, Section 6 provides concluding
remarks for this study.

2. Metaphor theories

A metaphor consists of two concepts, which are referred to as topic and vehicle. A topic
is the concept described using a metaphor, and a vehicle is the concept employed to describe
a topic in a metaphor. For example, a metaphor ‘‘An X is a Y’’ has the topic X and the vehi-
cle Y. Note that analogy researchers often refer to the topic as the target and the vehicle as
the base (or the source).

2.1. Disagreement between comparison and categorization

Metaphor comprehension is a process of establishing correspondences between the nonoverlapping domains of the vehicle and the topic of a metaphor. For example, consider the meta-
phor ‘‘Socrates is a midwife.’’ This metaphor implies a correspondence between the topic
Socrates and the vehicle midwife such that, just as a midwife helps women to bring children
into the world, Socrates helps his students to bring interesting ideas into the world. Hence,
the manner in which such correspondences are established for comprehending metaphors is
a central topic for metaphor research. Although a considerable number of studies have been
conducted on this topic, a consensus has not been reached. The approaches of such studies
to the mechanism of metaphorical mapping are divided into two categories, namely, the
comparison view (e.g., Gentner, 1983; Gentner & Markman, 1997; Gentner et al., 2001)
and the categorization view (e.g., Glucksberg, 2001; Glucksberg & Keysar, 1990).

2.1.1. Comparison view


The comparison view has argued that metaphors are comprehended via a comparison pro-
cess, wherein metaphorical correspondences are established by finding commonalities
between conceptual representations of the vehicle and the topic. According to the structure
mapping theory proposed by Gentner et al. (Gentner, 1983, 1989; Gentner & Markman,
1997; Gentner et al., 2001), which is a representative theory of the comparison view, meta-
phorical commonalities are found by a process of structural alignment between two repre-
sentations. In the alignment process, all identical elements (i.e., predicates and arguments)
in the topic and the vehicle (e.g., the predicates help and bring in the case of ‘‘Socrates is a
midwife’’) are matched. In the later process of alignment, these local matches are collected
to form structurally consistent mappings, and these mappings are then merged into the com-
mon structure. Next, the elements connected to the common structure in the vehicle but not
initially present in the topic (e.g., the predicates specifying the gradual development of the
child within the mother) are projected as candidate inferences into the topic. Structural
alignment and inference projection constitute a process of comparison, which Gentner et al.
propose as a general cognitive mechanism for analogy, metaphor, and similarity.
The basic concept of the structure mapping theory is shared by dominant theories of
analogy (e.g., Holyoak & Thagard, 1989; Hummel & Holyoak, 1997; Larkey & Love,
2003). They accept the process of comparison, comprising alignment and projection, as a
mechanism of metaphorical mapping. (In particular, Hummel and Holyoak’s [1997] Learn-
ing and Inference with Schemas and Analogies [LISA] is basically the structure mapping
view.) The difference among these theories lies primarily in the kind of similarities that are
preferentially included in the common structure. The structure mapping theory emphasizes
relational similarities, particularly similarities between higher order relations according to
the systematicity principle. On the other hand, Holyoak and Thagard’s (1989) Analogical
Constraint Mapping Engine (ACME) and Hummel and Holyoak’s (1997) LISA argue that
semantic and pragmatic similarities are also required in analogical mapping. Despite such
differences, these theories of analogy can be classified into the comparison view (for the
same treatment, see Gentner & Bowdle, 2008).
The conceptual metaphor theory (Clausner & Croft, 1997; Grady, 1997; Kövecses, 2002;
Lakoff & Johnson, 1980, 1999) is related to the comparison view. The basic tenet of the
conceptual metaphor theory is that our conceptual system is structured by preexisting cross-
domain mappings, the so-called conceptual metaphors, which are grounded in embodied
experiences. Verbal metaphors (i.e., metaphorical expressions) are assumed to be compre-
hended merely on the basis of such conceptual metaphors. Conceptual metaphors differ in
the way they are experientially grounded; Grady (1997, 2005) distinguished primary meta-
phors (e.g., Happy Is Up)—directly motivated by experiential correlation—from complex
metaphors (e.g., Theories Are Buildings), which do not appear to be directly motivated by
experiential correlation but are constructed by combining primary metaphors. Although pri-
mary metaphors are embodied and not based on analogical mappings, some complex meta-
phors are based on analogical mappings (Grady, 2005). Hence, the conceptual metaphor
theory can be regarded as relevant to the comparison view, rather than to the categorization
view. Note that such an embodied view of metaphors leads to the critical argument that met-
aphor comprehension cannot be simulated using non-embodied computational methods,
such as the semantic space model (e.g., Louwerse & Van Peer, 2009), which appears to be
incompatible with this study. This issue will be discussed in Section 5.2.

2.1.2. Categorization view


The categorization view by Glucksberg et al. (Glucksberg, 2001, 2003; Glucksberg &
Keysar, 1990; Glucksberg, McGlone, & Manfredi, 1997) has claimed that metaphors are
comprehended via a categorization process, by which the topic is treated as a member of an
abstract superordinate category exemplified by the vehicle. The topic and the vehicle play
different roles in this comprehension process; the vehicle provides a superordinate category
that can be used to characterize the topic, whereas the topic constrains the dimensions by
which it can be characterized. For example, the metaphor ‘‘My job is a jail’’ is compre-
hended so that the topic my job is categorized as an ad hoc category like ‘‘unpleasant and
confining things’’ to which the vehicle jail typically belongs. In evoking the ad hoc cate-
gory, my job facilitates the attribution of features related to tasks and jobs, while blocking
out irrelevant features such as those related to jail building.
The recent development of Sperber and Wilson’s (1995) relevance theory adopts a view
of metaphor comprehension that is similar to the categorization view. Relevance theory
argues that metaphor comprehension can be regarded as the online construction of ad hoc
concepts by broadening or narrowing lexical (or literal) meanings of the vehicle (Carston,
2002; Sperber & Wilson, 2008). The role of the topic assumed in relevance theory differs
from Glucksberg’s attributive categorization view. Relevance theory assumes that the topic
influences the process of concept construction through pragmatic inferencing according to
the principle of relevance, rather than restricting the dimensions along which features are
mapped. Despite such a difference, the relevance-theoretic view of metaphor is very similar
to the categorization view; thus, it can be reasonably classified as the categorization view
(Carston, 2002).

2.1.3. Which view is better?


The two views of categorization and comparison encounter serious problems of their
own; this makes it difficult to decide which view is better for a plausible theory of metaphor
comprehension.
The main problem of the categorization view is that it downplays the role of the topic in
metaphor comprehension (Bowdle & Gentner, 2005). This view assumes that multiple ad
hoc categories are initially derived in parallel from the vehicle. This assumption implies that
in many cases, an unlimited number of categories have to be generated and stored in work-
ing memory until the topic determines the relevant category. Such a process seems to be
resource demanding and therefore is intuitively less plausible. An adequate theory of meta-
phor should allow more interaction between the topic and the vehicle.
In contrast, the comparison view has a serious problem of dealing with unspecified topics
(e.g., Glucksberg et al., 1997). When hearers are unfamiliar with the topic, they cannot align
the topic with the vehicle and thus cannot derive common structures sufficient to yield a
proper interpretation. Given that a metaphor serves as an efficient way of describing the
topic from a novel perspective, an adequate theory of metaphor should provide a way of
deriving a metaphorical interpretation primarily from the vehicle.

2.2. Reconciliation between comparison and categorization

The comparison and categorization views have their respective limitations and advanta-
ges; this has led metaphor research to reconcile these two opposite views. Thus, recent
studies (e.g., Bowdle & Gentner, 2005; Glucksberg & Haught, 2006b; Jones & Estes,
2005, 2006; Utsumi, 2007) have provoked a heated debate on how these two views can be
reconciled; that is, they debate over the metaphor properties that determine the choice of
comprehension strategy between categorization and comparison. Three different views,
namely, the conventionality view, aptness view, and interpretive diversity view, have been
proposed for reconciling the categorization and comparison views; these three views are
summarized in Table 1. In this section, using the metaphor examples listed in Table 2,
I explain how these views predict the comprehension process.

Table 1
Comparison of three hybrid metaphor views that attempt to reconcile the categorization view and the comparison view

Conventionality view (Bowdle & Gentner, 2005)
  Initial process: Comparison. Alternative process: Categorization.
  Metaphors that should activate the alternative: conventional metaphors (metaphors referring
  to a lexically encoded metaphoric category that can be attributed to the topic).

Aptness view (Glucksberg & Haught, 2006a, 2006b)
  Initial process: Categorization. Alternative process: Comparison.
  Metaphors that should activate the alternative: less apt metaphors (metaphors that cannot
  evoke any metaphoric categories relevant to the important feature of the topic).

Interpretive diversity view (Utsumi, 2007)
  Initial process: Categorization. Alternative process: Comparison.
  Metaphors that should activate the alternative: less diverse metaphors (metaphors that
  cannot evoke any rich metaphoric categories for the topic).

Note. Each view predicts that the comprehension of all metaphors starts with the initial
process, but a specific kind of metaphor is comprehended later by the alternative process.

Table 2
Examples of metaphors showing how the three metaphor views make the same or different predictions on the
comprehension process

Metaphor Example          | Conventionality (Predicted)   | Aptness (Predicted)    | Diversity (Predicted)
My job is a jail          | Conventional (Categorization) | Apt (Categorization)   | Diverse (Categorization)
A gene is a blueprint     | Conventional (Categorization) | Apt (Categorization)   | Less diverse (Comparison)
My memories are money     | Conventional (Categorization) | Less apt (Comparison)  | Diverse (Categorization)
Birds are airplanes       | Conventional (Categorization) | Less apt (Comparison)  | Less diverse (Comparison)
A goalie is a spider      | Novel (Comparison)            | Apt (Categorization)   | Diverse (Categorization)
That supermodel is a rail | Novel (Comparison)            | Apt (Categorization)   | Less diverse (Comparison)
A child is a snowflake    | Novel (Comparison)            | Less apt (Comparison)  | Diverse (Categorization)
A fisherman is a spider   | Novel (Comparison)            | Less apt (Comparison)  | Less diverse (Comparison)

Note. Each metaphor example can be characterized by the distinctive properties (i.e., conventionality,
aptness, diversity) of the three metaphor views, and its comprehension process is predicted on the basis of the
characterized properties.

2.2.1. Conventionality view


Bowdle and Gentner (2005) have claimed that although metaphors are initially processed
as comparisons, conventional metaphors are processed as categorizations by accessing the
stored metaphoric categories that are conventionalized by repeated figurative use of
the vehicle term.1 According to this view, it is vehicle conventionality that determines the
choice of comprehension strategy. Vehicle conventionality (or simply, conventionality)
refers to the degree of association between the figurative meaning of a metaphor and the
vehicle of that metaphor (Bowdle & Gentner, 2005; Gentner & Wolff, 1997; Jones & Estes,
2006). Consider, for example, that the term snowflake is used as the vehicle of metaphors.
As it is rarely used in metaphors, this term seems to have no conventional metaphoric
categories, and it only refers to a crystal of snow. The metaphorical meaning of the meta-
phor ‘‘A child is a snowflake,’’ shown in Table 2, cannot be associated with the vehicle
snowflake before the comprehension of that metaphor. Hence, according to the convention-
ality view, the metaphorical meaning should be derived online by the process of comparison
between child and snowflake. (Therefore, this novel metaphor may mean that all children
are unique and delicate.) Likewise, the terms spider and rail have no salient metaphorical
meanings. Hence, metaphors with these vehicle terms (e.g., ‘‘A goalie is a spider,’’
‘‘That supermodel is a rail,’’ and ‘‘A fisherman is a spider’’ in Table 2) are novel (or less
conventional) and are processed as comparisons.
On the other hand, the term jail refers to the metaphoric category ‘‘unpleasant and confin-
ing things,’’ which is conventionalized by the repeated figurative use of the term. Hence,
the metaphorical meaning of the metaphor ‘‘My job is a jail’’ is highly associated with the
vehicle jail, and the comparison process can be bypassed. As a result, the metaphor is com-
prehended directly by the categorization process, in which the topic is regarded as a member
of the metaphoric category ‘‘unpleasant and confining things.’’ Other terms in Table 2 such
as blueprint, money, and airplane also refer to a conventional metaphoric category relevant
to the metaphors ‘‘A gene is a blueprint,’’ ‘‘My memories are money,’’ and ‘‘Birds are
airplanes.’’ Thus, they are comprehended as categorizations. It must be noted here that when
the vehicle of a metaphor has a conventional metaphoric category but that category cannot
be ascribed to the topic (e.g., the metaphor ‘‘My team is a jail’’ is less likely to mean that
my team is unpleasant and confining), the metaphor is not conventional and should be
comprehended via the comparison process.

2.2.2. Aptness view


Against Bowdle and Gentner’s (2005) conventionality view, the recent development of
the categorization view (Glucksberg & Haught, 2006a, 2006b; Jones & Estes, 2005, 2006)
has advocated that metaphor aptness, not vehicle conventionality, mediates both the pro-
cesses. This view argues that while metaphors are initially processed as categorizations, less
apt metaphors are processed as comparisons, as shown in Table 1. Metaphor aptness (or
simply, aptness) refers to the extent to which the vehicle’s metaphoric category captures an
important feature of the topic (Blasko & Connine, 1993; Chiappe & Kennedy, 1999; Jones
& Estes, 2006). If the vehicle of a metaphor indicates a metaphoric category that is relevant
to an important feature of the topic during a categorization process, the metaphor is apt and
the comprehension process is completed regardless of whether the metaphoric category is
constructed online or lexically encoded in the vehicle. A typical example is the metaphor
‘‘My job is a jail,’’ whose metaphorical meaning is relevant to the important feature of the
topic (e.g., pleasantness and the degree of confinement are important characteristics of all
jobs). Although some conventional metaphors such as the job-jail example are apt, some
novel metaphors are also apt. For example, the term spider may have no conventional meta-
phoric categories, but the metaphor ‘‘A goalie is a spider’’ in Table 2 is nevertheless highly
apt because the ad hoc metaphoric category ‘‘waiting for something in the net and shooting
it down in a quick manner,’’ which is newly created by this metaphor, expresses an impor-
tant feature of goalies. The metaphor ‘‘That supermodel is a rail’’ is also apt and eventually
processed as categorization because the property implied by the vehicle (i.e., being
extremely thin) is highly relevant to fashion models. As a result, in the case of novel but apt
metaphors, the aptness and conventionality views make a different prediction about the
comprehension process.
According to the aptness view, less apt (or low apt) metaphors may be processed as
comparisons because a categorization does not make sense. For example, although the meta-
phor ‘‘A goalie is a spider’’ is highly apt, a different metaphor with the same vehicle ‘‘A
fisherman is a spider’’ is not apt because the vehicle spider cannot evoke any metaphoric
categories that include the topic fisherman as a typical member. Therefore, it should be rein-
terpreted by the process of comparison, and it yields an interpretation such as ‘‘a fisherman
uses a net to catch fish like a spider.'' Conventional metaphors are also less apt when the
conventional metaphoric category is not relevant or informative. The term money refers to
the conventional metaphoric category of precious things, and thus, the metaphor ‘‘My
memories are money’’ in Table 2 can mean that my memories are precious to me. However,
preciousness is not necessarily a salient feature of memory. Hence, the metaphor is less apt
and may be reinterpreted as comparison.

2.2.3. Interpretive diversity view


Utsumi (2007) has recently argued that interpretive diversity determines whether meta-
phors are processed as comparisons or categorizations. Although metaphors are initially
processed as categorizations, less diverse metaphors fail to be processed as categorizations
and must be reinterpreted as comparisons, because their vehicles do not readily exemplify a
metaphorical category to which the topic might belong. Interpretive diversity (or simply,
diversity) refers to the semantic richness of metaphors and depends on the following two
factors: the number of features or interpretations that constitute the figurative meaning, and
the uniformity of the salience distribution of those features (Utsumi, 2005, 2007). A higher
value of interpretive diversity means that the metaphor has a larger number of meanings (or
interpretations) and that the salience of those meanings is more uniformly distributed. For
example, the interpretive diversity view explains that the metaphor ‘‘My job is a jail’’ is
processed as categorization not because it is either conventional or apt, but because it is
interpretively diverse; the evoked metaphoric category implies many equally salient proper-
ties (e.g., unpleasant, involuntary, confining, unrewarding) that are applicable to the topic.
Novel metaphors (e.g., ‘‘A child is a snowflake’’) are also interpretively diverse when an ad
hoc category evokes many equally salient meanings (e.g., ‘‘A child is unique, delicate, and
likely to change the atmosphere just as snowfall changes the landscape’’). Similarly, some
less apt metaphors such as ‘‘My memories are money’’ may be interpretively diverse
because a number of less relevant but equally salient meanings are evoked from the vehicle
money as a potential metaphorical meaning (e.g., ‘‘My memories are precious to me,’’
‘‘I keep my memories in the soul so as not to miss it,’’ and ‘‘I cannot live without mem-
ory’’). Therefore, all these metaphors are predicted to be comprehended as categorizations.
On the other hand, metaphors are interpretively less diverse when the vehicle cannot
evoke a metaphoric category with many potential features, regardless of whether they are
conventional or whether they are apt. For example, the metaphor ‘‘A fisherman is a spider’’
is interpretively much less diverse because the vehicle spider cannot evoke any rich cate-
gories for the topic fisherman, although ‘‘A goalie is a spider’’ may imply diverse meanings.
For the same reason, some apt metaphors (e.g., ‘‘That supermodel is a rail’’) are also less
diverse. These less diverse metaphors can be reinterpreted via the comparison process. The
interpretive diversity view also predicts that even conventional metaphors (e.g., ‘‘A gene is a
blueprint’’) are comprehended via the comparison process if the conventional metaphoric
categories associated with the vehicle are semantically less rich (or semantically ''narrow'').
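Both factors that define interpretive diversity (the number of features and the uniformity of their salience distribution) can be captured jointly by the entropy of the normalized salience distribution. The sketch below is one illustrative formulation under that assumption, not necessarily the exact measure of Utsumi (2005, 2007), and the salience values are hypothetical:

```python
import math

def interpretive_diversity(saliences):
    """Illustrative diversity score: Shannon entropy of the normalized
    salience distribution over a metaphor's features.  The score grows
    both with the number of features and with how evenly salience is
    spread across them."""
    total = sum(saliences)
    probs = [s / total for s in saliences if s > 0]
    return -sum(p * math.log2(p) for p in probs)

# Many equally salient features (e.g., ''My job is a jail''):
diverse = interpretive_diversity([1.0, 1.0, 1.0, 1.0])
# One dominant feature (e.g., ''That supermodel is a rail''):
narrow = interpretive_diversity([0.9, 0.05, 0.05])
assert diverse > narrow
```

Under this formulation, a metaphor with four equally salient interpretations scores 2 bits, whereas one dominated by a single interpretation scores well under 1 bit, matching the intuition that the latter is interpretively ''narrow.''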

2.3. Disagreement among three hybrid views

These three hybrid views have empirically demonstrated the superiority of their own
views. In these experiments, the metaphor–simile distinction was used as a valuable tool for
examining the use of comparison and categorization during metaphor comprehension. The
basic assumption underlying this method is that the linguistic form of a figurative statement
invokes a specific comprehension process. Metaphors of the form ‘‘An X is a Y’’ should
invite categorization because they are grammatically identical to literal categorization state-
ments, whereas similes of the form ‘‘An X is like a Y’’ should invite comparison because
they are grammatically identical to literal comparison statements. Therefore, if the process
initially invoked by the form is different from the process eventually used for comprehen-
sion, such figurative statements should be reinterpreted; thus, such statements are compre-
hended more slowly and are less preferred. Following Bowdle and Gentner (2005), I refer to
this link between form and process as grammatical concordance.
Bowdle and Gentner’s (2005) conventionality view predicts that novel topic–vehicle
pairs should be comprehended as comparisons, and according to grammatical concordance,
it follows that they should be more comprehensible when presented in the simile form ‘‘An
X is like a Y’’ than in the metaphor form ‘‘An X is a Y.’’ This is because the metaphor form
initially invites an inappropriate process of categorization, whereas similes are compre-
hended as comparisons from the very beginning. In contrast, if topic–vehicle pairs are con-
ventional, both forms should be equally comprehensible. Bowdle and Gentner (2005)
demonstrated that the experimental results were consistent with this prediction; moreover,
they showed that these results could not be explained in terms of metaphor aptness.
On the other hand, Glucksberg and Haught (2006b) demonstrated that novel but apt figu-
rative statements (e.g., ‘‘My lawyer is (like) a well-paid shark’’) were easier to comprehend
in the metaphor form than in the simile form. This finding is obviously inconsistent with the
prediction of the conventionality view, and therefore, they concluded that the aptness or the
quality of metaphors determines the choice of comprehension strategy. Furthermore, Jones
and Estes (2005, 2006) reported that apt metaphors were more likely to be processed as
categorizations and were comprehended faster than less apt metaphors; however, no such
differences were observed between novel and conventional metaphors.
Against these two hybrid views, Utsumi (2007) demonstrated that only the interpretive
diversity of topic–vehicle pairs was positively correlated with the relative comprehensibility
of the metaphor form, as compared with the simile form; however, neither vehicle conven-
tionality nor metaphor aptness showed a correlation with the relative comprehensibility.
Although diverse pairs were equally comprehensible in both forms, less diverse pairs were
more comprehensible when presented in the simile form than in the metaphor form. In
addition, less diverse metaphors shared more meanings with the corresponding similes
than diverse metaphors, suggesting that less diverse metaphors and similes are likely to be
understood by the same process, namely, a comparison process. Again, such a difference
was not observed for either vehicle conventionality or aptness.
As I have described earlier, recent metaphor studies have provoked a heated debate with
regard to which metaphor properties determine the choice between categorization and com-
parison processes for comprehending metaphors. However, the question of determining
which view is the most plausible remains unresolved. This study thus employs a different
approach to this issue, namely, computational modeling and simulation experiment. I
attempt to provide a computational or theoretical solution to the problem by identifying the
metaphor property from vehicle conventionality, metaphor aptness, and interpretive diver-
sity that best explains the result of model selection between comparison and categorization
models. Given the lack of metaphor research using a model comparison technique, this
study can be regarded as a pioneering study that derives evidence or knowledge about the
mechanism of metaphor comprehension through a computational method.

3. Computational model

3.1. Semantic space model

Recently, vector-based semantic space models have been frequently used to represent
lexical meanings and have proved highly useful for a variety of natural language processing
(NLP) tasks, such as word sense disambiguation (Schütze, 1998), information retrieval
(Deerwester, Dumais, Furnas, Landauer, & Harshman, 1990; Widdows, 2004), thesaurus
construction (Lin, 1998), document clustering (Shahnaz, Berry, Pauca, & Plemmons, 2006),
and essay scoring (Landauer, Laham, & Foltz, 2003). What is more important is that
semantic space models have also provided a useful framework for cognitive modeling, for
example, similarity judgment (Landauer & Dumais, 1997), semantic priming (Jones,
Kintsch, & Mewhort, 2006; Lowe & McDonald, 2000), text comprehension (Foltz, Kintsch,
& Landauer, 1998; Kintsch, 2001), and language-mediated eye movement (Huettig,
Quinlan, McDonald, & Altmann, 2006). There are also good reasons for using semantic
space models for cognitive modeling and NLP. First, semantic space models are cost-
effective in that it takes less time and effort to construct large-scale geometric
representations of word meanings than to construct other types of lexical knowledge, such
as dictionaries or thesauri. Second, they can represent the implicit knowledge of word mean-
ings that dictionaries and thesauri cannot. Lastly, semantic spaces are easy to revise and
extend.
Semantic space models are based on two main assumptions. One assumption is that the
meaning of each word wi can be represented by a high-dimensional vector v(wi) =
(wi1,wi2,…,wiD), that is, a word vector. These D real-valued components define the lexical
meaning of the word. The second assumption is that the degree of semantic similarity
sim(wi,wj) between any two words wi and wj can be computed using the similarity function
of their word vectors. Among the variety of functions that can be used to compute the simi-
larity between two word vectors in semantic space models, the cosine cos (v(wi),v(wj)) is
the most widely used. Using the similarity measure, one can easily compute the degree to
which two words are semantically related.
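These two assumptions can be sketched in a few lines of Python; the three-dimensional vectors below are made-up stand-ins for real high-dimensional word vectors:

```python
import numpy as np

def cosine(v, w):
    """Cosine similarity cos(v(wi), v(wj)) between two word vectors."""
    return float(np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w)))

# Toy 3-dimensional word vectors (illustrative values, not from a real corpus).
v_rumor = np.array([0.9, 0.2, 0.1])
v_gossip = np.array([0.8, 0.3, 0.2])
v_oven = np.array([0.1, 0.1, 0.9])

# Semantically related words get a higher cosine.
assert cosine(v_rumor, v_gossip) > cosine(v_rumor, v_oven)
```

The cosine depends only on the angle between the vectors, so it abstracts away from overall word frequency, which affects vector length.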
Semantic spaces (or word vectors) are constructed from large bodies of text by observing
distributional statistics of word occurrence. The method for constructing word vectors
generally comprises the following two steps. First, M content words in a given corpus are
represented as R-dimensional initial word vectors, and an M by R matrix A is constructed
using M word vectors as rows.2 Then, the dimension of A’s row vectors is reduced from the
initial dimension R to D and, as a result, a D-dimensional semantic space including M words
is generated.
Numerous methods have been proposed for computing initial word vectors and for reduc-
ing the dimensionality (for an overview, see Padó & Lapata, 2007). Among them, latent
semantic analysis (henceforth, LSA; Landauer & Dumais, 1997; Landauer et al., 2007) is
the most popular. LSA uses the frequency of words in a document (e.g., paragraph) to com-
pute initial vectors, whose dimension R is equal to the number of documents.3 LSA then
reduces the number of dimensions using singular value decomposition (henceforth, SVD).
Many studies (e.g., Kintsch, 2001; Landauer & Dumais, 1997; Landauer et al., 2007) have
demonstrated that LSA successfully mimics a variety of human behaviors, particularly those
associated with semantic processing. Hence, this study uses LSA to construct a semantic
space for computer simulation.
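The two construction steps can be illustrated with NumPy's SVD; the tiny term-document count matrix, the word labels, and the unweighted counts are all simplifications of a real LSA setup, which would use a large corpus and a weighting scheme such as log-entropy:

```python
import numpy as np

# Step 1: a toy M x R term-document count matrix A (M = 5 words, R = 4 documents).
# Row labels are purely illustrative.
A = np.array([
    [2.0, 0.0, 1.0, 0.0],  # virus
    [1.0, 0.0, 2.0, 0.0],  # contagion
    [0.0, 2.0, 0.0, 1.0],  # rumor
    [0.0, 1.0, 0.0, 2.0],  # gossip
    [1.0, 1.0, 1.0, 1.0],  # spread
])

# Step 2: reduce the dimensionality from R to D via singular value decomposition.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
D = 2
word_vectors = U[:, :D] * s[:D]  # rows are D-dimensional word vectors

print(word_vectors.shape)
```

Keeping only the D largest singular values projects the words into a D-dimensional space in which co-occurrence patterns, rather than raw counts, determine proximity.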

3.2. Semantics in a semantic space model

To simulate human sentence processing including metaphor comprehension, it is necessary
to devise a method for generating a vector representation of a piece of text (e.g., phrase,
sentence) from its constituent words. Formally, given a piece of text S comprising a
sequence of words w1,…,wn, a generation function f(v(w1),…,v(wn)) that computes a vector
representation v(S) of S must be defined.
The standard method widely used in LSA research is the computation of the centroid
of constituent word vectors; in other words, a generation function is defined as
f(v(w1),…,v(wn)) = (v(w1) + … + v(wn))/n, that is, the mean of the constituent word
vectors. However, such a representation is not intuitively
plausible because word order (and thus, the assignment of semantic roles) is completely
ignored; the meaning of ‘‘People eat fishes’’ is obviously different from that of ‘‘Fishes eat people.’’ This
drawback is particularly harmful when LSA is used for simulating metaphor comprehen-
sion; when the topic and the vehicle of a metaphor are reversed, its metaphorical meaning is
drastically altered.
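A minimal sketch of the centroid generation function shows this order-insensitivity directly (the three word vectors are hypothetical):

```python
import numpy as np

def centroid(*word_vectors):
    """The standard generation function: the mean of the constituent word vectors."""
    return np.mean(word_vectors, axis=0)

# Hypothetical word vectors for illustration only.
people = np.array([1.0, 0.0, 0.0])
eat = np.array([0.0, 1.0, 0.0])
fishes = np.array([0.0, 0.0, 1.0])

# "People eat fishes" and "Fishes eat people" receive the identical vector,
# which is exactly the drawback noted in the text.
assert np.allclose(centroid(people, eat, fishes), centroid(fishes, eat, people))
```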
Kintsch (2001) proposed a predication algorithm for generating intuitively plausible and
contextually dependent vectors of the proposition with the predicate argument structure.
Given a proposition P(A), where P is a predicate and A is an argument, the predication algo-
rithm first chooses m nearest neighbors of a predicate P (i.e., m words with the highest simi-
larity to P). The algorithm then picks up k neighbors of P that are also related to A.4 Finally,
the algorithm computes the centroid vector of P, A, and the k neighbors of P as a vector rep-
resentation of P(A). The essence of the algorithm lies in the set of neighbors of P relevant to
A. It can be interpreted that this set of words represents the meaning of a predicate P that is
appropriate for describing or predicating an argument A. For example, consider the sentence
‘‘The computer works’’ with the predicate work and the argument computer. The verb work
is semantically ambiguous because it has many different meanings, such as to perform a
task for money, to operate correctly, or to have an effect. In LSA, all these meanings of
work are represented together as a single vector. However, each use of work in a sentence
does not represent all the meanings together; it represents a specific meaning appropriate for
a given context, such as work in ‘‘The computer works’’ is used to represent the meaning to
operate correctly. This is why the simple combination of P and A cannot represent the
appropriate meaning of the whole sentence. In the predication algorithm, such contextually
dependent meaning, for example, features of work that are relevant to the argument com-
puter, can be represented as the set of neighbors of P relevant to A. Kintsch (2001) demon-
strated that the predication algorithm performs in the way it is supposed to perform using
several semantic problems, such as causal inference, similarity judgment, and metaphor
comprehension.

3.3. Modeling the metaphor comprehension processes

As described in Section 2, metaphor comprehension involves two different processes,
namely, categorization and comparison. In this section, these two comprehension processes
are modeled in the LSA framework.

3.3.1. Modeling the process of categorization


As a computational model of the categorization process, this study employs Kintsch’s
(2001) predication algorithm without modification because it is reasonable to assume that a
set of P’s neighbors relevant to A characterizes an abstract metaphorical category exempli-
fied by the vehicle. The word virus in the metaphor ‘‘A rumor is a virus’’ refers to the
abstract category of contagious things that are spreading, preventable, and harmful. These
features of the abstract category can be included in the set of P’s neighbors because they are
closely related to the literal meaning of virus (P) and are also related to rumor (A). On
the other hand, the metaphorical category of contagious things does not include diseases
literally caused by a virus, such as influenza and tuberculosis, and they will be excluded
from the set of k neighbors because they are not relevant to rumor (A). Glucksberg’s
categorization theory also argues that literal categorization statements can be comprehended
in the same way as metaphorical assertions. The word virus in the literal statement ‘‘Influ-
enza is a virus’’ refers to the literal contagious virus, and the literal category can be repre-
sented as the set of k neighbors; in this case, the set of k neighbors can include meanings
that are not relevant to the metaphorical category, because the argument influenza is also a
virus. It must be noted that Kintsch (2000) briefly describes the relationship between the
predication algorithm and Glucksberg’s categorization theory. He suggests that the predica-
tion algorithm is consistent with the categorization theory, although he does not argue that it
can be considered as a computational model of categorization. Furthermore, Glucksberg
(2003) points out that the predication algorithm is very similar to the categorization process.
Let M be a given nominal metaphor with the vehicle wV (i.e., predicate) and the topic wT
(i.e., argument), and Ni(x) be a set of i neighbors of the word x (i.e., a set of words with i
highest similarity to x). The algorithm Categ(v(wT),v(wV);hcat) of computing a metaphor
vector vcat(M) for M by the process of categorization is given as follows: (Note here that the
algorithm Categ is identical to Kintsch’s [2001] predication algorithm.)

Categ(v(wT),v(wV);hcat)
1. Compute Nm(wV), that is, m neighbors of the vehicle wV.
2. Choose k words with the highest similarity to the topic wT from among Nm(wV).
3. Compute a vector vcat(M) as the centroid of v(wT), v(wV), and k vectors of the words
chosen in Step 2.
The parameter m denotes the number of vehicle neighbors that should be searched for in
the algorithm and the parameter k denotes the number of vehicle neighbors that should be
selected to be similar to the topic. The notation hcat denotes a list of these parameter values
(m, k) for the Categ algorithm.
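Assuming, purely for illustration, that the word vectors are stored in a dictionary, the three steps of Categ can be sketched as follows; a real run would use the full LSA space, for example, with hcat = (m, k) = (500, 5), and the toy vectors and expected result below are mine:

```python
import numpy as np

def cosine(v, w):
    return float(np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w)))

def categ(topic, vehicle, vectors, m, k):
    """Categ(v(wT), v(wV); (m, k)): Kintsch's (2001) predication algorithm.

    vectors: dict mapping each word in the vocabulary to its word vector."""
    vocab = [w for w in vectors if w not in (topic, vehicle)]
    # Step 1: the m nearest neighbors of the vehicle wV.
    step1 = sorted(vocab, key=lambda w: cosine(vectors[vehicle], vectors[w]),
                   reverse=True)[:m]
    # Step 2: the k of those neighbors with the highest similarity to the topic wT.
    step2 = sorted(step1, key=lambda w: cosine(vectors[topic], vectors[w]),
                   reverse=True)[:k]
    # Step 3: the centroid of v(wT), v(wV), and the k chosen word vectors.
    return np.mean([vectors[topic], vectors[vehicle]]
                   + [vectors[w] for w in step2], axis=0)

# Hypothetical 2-dimensional toy space (real vectors come from the LSA space).
toy = {'rumor': np.array([1.0, 0.0]), 'virus': np.array([0.0, 1.0]),
       'contagion': np.array([0.1, 0.9]), 'scandal': np.array([0.9, 0.1]),
       'spread': np.array([0.5, 0.5])}
v_cat = categ('rumor', 'virus', toy, m=2, k=1)
assert np.allclose(v_cat, [0.5, 0.5])  # the neighbor 'spread' is selected
```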
For example, Table 3 shows the step-by-step behavior of the Categ algorithm when it
computes a vector for the metaphor ‘‘A rumor is a virus’’ with the parameters hcat =
(m,k) = (500,5). The first column of Table 3 lists the top 10 and last 2 neighbors of the
vehicle virus included in the set N500(wV) computed at Step 1.5 (Hence, the words capsule
and estimation are the 499th and the 500th nearest neighbors.) These 500 neighbors are
sorted in descending order of cosine similarity to the topic rumor and are listed in the second
column. As k = 5, the top five words (i.e., doubt, trigger, recency, spread/get about, and
guess) are chosen for characterizing a metaphoric category of ‘‘contagious things.’’ (Note
that vehicle neighbors are generally not the nearest neighbors of the topic, although they
move closer to the topic as m grows larger. For example, the top five nearest neighbors of
the topic rumor are disclosure, prosecutor, expose, report, and conjecture.) These chosen
words do not seem to represent the metaphoric category of contagious things on their own,
but their centroid vector is close to the features of contagious things. In Step 3, these five
vectors of the chosen words are averaged with the topic and the vehicle vectors to obtain a
Table 3.
An example of the step-by-step behavior of the Categ algorithm in comprehending the metaphor ‘‘A rumor is a
virus’’

Nearest Neighbors of the     Sorted List of the 500 Vehicle    Nearest Neighbors of the
Vehicle virus and Cosines    Neighbors and Cosines with the    Metaphor Vector and Their
with the Vehicle (Step 1)    Topic (Step 2)                    Cosines (Step 3)
contagion         0.94       doubt                  0.28       recency            0.86
fungus            0.86       trigger                0.19       virus              0.63
tolerance         0.81       recency                0.18       epidemic/spread b  0.63
disease onset     0.80       spread/get about a     0.18       contagion          0.60
tuberculosis      0.80       guess                  0.17       fungus             0.59
bacteria          0.80       ovulation              0.17       take effect        0.57
antibiotic        0.77       sneeze                 0.17       bacteria           0.57
heated            0.75       pregnancy              0.14       tolerance          0.55
drug disaster     0.75       efficacy               0.14       of a kind          0.55
blood sampling    0.75       appearance             0.14       disease onset      0.55
...                          ...
capsule           0.18       trachea               −0.11
estimation        0.18       pulse                 −0.11

a The original Japanese word hiromaru means both ‘‘spread’’ and ‘‘get about.’’
b The original Japanese word ryuko means both ‘‘epidemic’’ and ‘‘spread.’’

metaphor vector vcat(M). The rightmost column of Table 3 lists the top 10 nearest neighbors
of the metaphor vector. They can be regarded as representing the meanings of the metaphor
because, as mentioned in Section 3.1, the cosine similarity between two vectors is used as a
measure of semantic relatedness. Some nearest neighbors of the vehicle (i.e., contagion and
take effect), which are also relevant to the topic, are attributed to the topic rumor. On the
other hand, some nearest neighbors of the vehicle that are not relevant to the topic, such as
tuberculosis, are downplayed. More important, some emergent words, such as recency and
epidemic/spread, which are not close to the vehicle or the topic, but appropriate as a meta-
phorical meaning, are also attributed to the topic; a rumor spreads rapidly, just as a virus
spreads epidemically, and recency is an important factor for the spread of both a virus and a rumor.

3.3.2. Modeling the process of comparison


For a computational model of the comparison process, I propose the following algorithm
Compa(v(wT),v(wV);hcom) that computes a metaphor vector vcom(M), given that hcom = (k)
is a list of one parameter value k.

Compa(v(wT),v(wV);hcom)
1. Compute k common neighbors Ni(wT) ∩ Ni(wV) of wT and wV by finding the smallest i
that satisfies |Ni(wT) ∩ Ni(wV)| ≥ k.
2. Compute a metaphor vector vcom(M) as the centroid of v(wT) and k vectors of the
words chosen in Step 1.
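Assuming, as before for illustration, that word vectors are stored in a dictionary, the two steps of Compa can be sketched as follows; the toy vectors and the expected result are mine, not values from the article's LSA space:

```python
import numpy as np

def cosine(v, w):
    return float(np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w)))

def compa(topic, vehicle, vectors, k):
    """Compa(v(wT), v(wV); (k)): the proposed comparison model."""
    vocab = [w for w in vectors if w not in (topic, vehicle)]
    rank_t = sorted(vocab, key=lambda w: cosine(vectors[topic], vectors[w]),
                    reverse=True)
    rank_v = sorted(vocab, key=lambda w: cosine(vectors[vehicle], vectors[w]),
                    reverse=True)
    # Step 1: grow i until the i-neighbor sets of wT and wV share at least k words.
    for i in range(1, len(vocab) + 1):
        common = [w for w in rank_t[:i] if w in rank_v[:i]]
        if len(common) >= k:
            break
    # Step 2: the centroid of v(wT) and the k common-neighbor vectors.
    return np.mean([vectors[topic]] + [vectors[w] for w in common[:k]], axis=0)

# Hypothetical 2-dimensional toy space for illustration.
toy = {'fisherman': np.array([1.0, 0.0]), 'spider': np.array([0.0, 1.0]),
       'fish': np.array([0.9, 0.1]), 'web': np.array([0.1, 0.9]),
       'net': np.array([0.7, 0.7])}
v_com = compa('fisherman', 'spider', toy, k=1)
assert np.allclose(v_com, [0.85, 0.35])  # 'net' is the first common neighbor
```

Note that, unlike Categ, the vehicle vector itself is not averaged in, so the result stays anchored to the topic and to the aligned common neighbors.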
Table 4 shows an example of how the Compa algorithm works for the metaphor ‘‘A
fisherman is a spider’’ with the parameter hcom = (k) = (3). First, the algorithm finds k
common neighbors of the topic fisherman and the vehicle spider. In this example, the first
common neighbor prey/bait is found when i = 23 (in other words, Ni(wT) ∩ Ni(wV) first
becomes non-empty when i = 23). As a result, three common neighbors prey/bait, net, and
fishing are found when i ¼ 67. These common neighbors represent a correspondence
between fisherman and spider, in that just as a spider waits for and catches its prey in a net
(web), a fisherman waits for and catches fish (as prey) using a net. (Note that the common
term fishing means the act of catching fish.) Then in Step 2, the vectors of these common
neighbors are averaged with the topic vector such that the resulting metaphor vector is close
to both the original topic vector and common neighbors. As shown in Table 4, the resulting
metaphor vector is indeed close to the three common vectors, as well as the topic fisherman
itself and some of the topic properties, such as fishery and catch landing, which are included
in the top 10 nearest neighbors of the metaphor vector. In particular, the common neighbors
are highlighted, because their cosines with the metaphor vector are higher than those with
the original topic vector. Furthermore, for example, the term wait, which is not initially
salient for fisherman, is also highlighted, although it is not ranked among the top 10 nearest
neighbors; wait has a higher cosine with the metaphor vector (0.35) than with fisherman
(0.24). Taken together, these results mean that the Compa algorithm produces an appropri-
ate metaphor vector that represents an intuitively sensible interpretation that the fisherman’s

Table 4
An example of the step-by-step behavior of the Compa algorithm in comprehending the
metaphor ‘‘A fisherman is a spider’’

Common Neighbors of        Cosine with fisherman and    Cosine with spider and
fisherman and spider       Its Rank in Parentheses      Its Rank in Parentheses
Computed at Step 1
prey/bait a                0.55 (18)                    0.31 (23)
net                        0.56 (16)                    0.28 (58)
fishing                    0.46 (67)                    0.30 (26)

Top 10 Nearest Neighbors of the          Cosine with the
Metaphor Vector Computed at Step 2       Metaphor Vector
prey/bait                                0.92
net                                      0.85
small fish                               0.79
fishing                                  0.79
fishery                                  0.76
water temperature                        0.75
fisherman                                0.75
angler                                   0.72
migration                                0.71
catch landing                            0.70

a The original Japanese word esa means both ‘‘prey’’ and ‘‘bait.’’
specific property of ‘‘waiting for and catching fish using a net’’ is emphasized by this
metaphor.
This algorithm can be regarded as a simplified model of the comparison process that com-
prises alignment and projection (Gentner, 1983; Gentner et al., 2001). The computation of
common neighbors Ni(wT) ∩ Ni(wV) of topic and vehicle at Step 1 can be reasonably
regarded as the alignment process. It is likely that the set of common neighbors includes
identical elements found in the early stage of alignment (e.g., the arguments net and prey in
the case of ‘‘A fisherman is a spider’’ and the predicates bring and help in the case of
‘‘Socrates is a midwife’’). It must be noted that according to Gentner (1983), the alignment
process is governed by the systematicity principle: a system of relations connected by higher
order relations is preferred over one with an equal number of independent matches. The
Compa algorithm does not explicitly take into account the later stage of alignment in which
structurally consistent mappings are derived from local matches according to the systematic-
ity principle; however, it implicitly deals with some aspects of this process. Predicates for
higher order relations are likely to be expressed by ambiguous words, and such ambiguous
words are likely to be common neighbors because they are similar to many words in a
semantic space. These predicates constitute consistent mappings, and as a result, the Compa
algorithm seems to prefer higher order relations in the alignment process.
Of course, I do not argue that the Compa algorithm completely embodies the later stage
of alignment (and the systematicity principle). This limitation is common to any algorithm
in semantic space models, rather than being specifically meant for the Compa algorithm,
because semantic space models at present lack the ability to represent the relational knowl-
edge of concepts expressed by words (Kintsch, 2008a). However, I do not consider this to
be a serious limitation of the Compa algorithm for the present purpose because most empiri-
cal findings regarding the debate between comparison and categorization (e.g., Bowdle &
Gentner, 2005; Chiappe & Kennedy, 1999; Glucksberg & Haught, 2006b; Jones & Estes,
2006; Utsumi, 2007; Wolff & Gentner, 2000) have been obtained for simple nominal meta-
phors, and understanding these metaphors does not require so many alignments of higher
order relations.
On the other hand, the computation of the centroid of k common neighbors and the topic
vector in Step 2 can be reasonably regarded as the projection process. Projection is a process
of transferring to the topic predicates and arguments connected to the common structure
found in the alignment process. As a result, projected predicates and arguments that are
included in the aligned structure (i.e., those common to both concepts or unique to the vehi-
cle) are highlighted, whereas the original salient properties of the topic are retained (e.g.,
Gentner & Bowdle, 2008; Gentner et al., 2001). This can be modeled as the centroid com-
putation of the vectors of k common neighbors and the topic vector, because the centroid of
multiple vectors is generally close to those vectors. It implies that the resulting centroid vec-
tor is close to the elements in the aligned structure (represented by k common neighbors), as
well as to the salient properties of the topic (represented by the topic vector). In fact, as
shown in the fisherman–spider example, the elements connected to the common structure,
that is, net and prey, which are common to both concepts, and wait, which is unique to the
vehicle but not initially salient in the topic, are highlighted by the Compa algorithm. In
addition, the original salient properties of the topic, such as fishery, are also included in the
nearest neighbors of the metaphor vector.

3.4. Justifying the plausibility of two models

Before presenting the results of the simulation experiment, I must demonstrate the plausi-
bility of the two algorithms Categ and Compa as models of categorization and comparison
processes, so that the simulation result constitutes a valid test of metaphor theories. In this
regard, to demonstrate the empirical adequacy of the model (McClelland, 2009), I show that
these algorithms are consistent with the following two distinctive processing phenomena:
1. Grammatical concordance between form and function (Bowdle & Gentner, 2005;
Chiappe & Kennedy, 1999; Gentner & Bowdle, 2008; Glucksberg & Haught, 2006b;
Utsumi, 2007).
2. Directionality in the metaphor comprehension process (Gentner & Wolff, 1997;
Wolff & Gentner, 2000).

Psychological studies of metaphor have employed these phenomena as a test to determine
whether people comprehend metaphors as comparisons or categorizations and as a means of
encouraging them to comprehend metaphors as comparisons or categorizations. Hence, if
the algorithms can explain these distinctive phenomena, they are plausible models at least
for reproducing the findings obtained in these psychological studies; this is sufficient for the
present purpose.
As described in Section 2.3, grammatical concordance refers to the link between linguis-
tic form and function. Literal statements of the form ‘‘An X is a Y’’ are interpreted as cate-
gorizations (e.g., ‘‘A whale is a mammal’’), whereas literal statements of the form ‘‘An X is
like a Y’’ are interpreted as comparisons (e.g., ‘‘A whale is like a dolphin’’). Therefore, the
algorithms Categ and Compa are shown to be plausible models of the categorization and
comparison processes if they can produce the sentence vector for literal categorization and
comparison sentences that fits our intuitions about the meaning of these sentences. Specifi-
cally, the Categ algorithm would be expected to produce intuitively more plausible results
for a literal categorization statement ‘‘An X is a Y,’’ whereas the Compa algorithm would
produce more plausible results for a literal comparison statement ‘‘An X is like a Y.’’
Directionality is concerned with the asymmetry of metaphors and its processing stage.
When the topic and the vehicle of a metaphor are reversed, the resulting statement expresses
a different meaning from the original metaphor or, in many cases, does not make sense. For
example, reversing the terms of the metaphor ‘‘A rumor is a virus’’ produces a different
metaphor ‘‘A virus is a rumor,’’ which seems meaningless. Categorization and comparison
processes make a different prediction with regard to when this asymmetry appears during
metaphor comprehension. Categorization is initially asymmetrical (or role specific) because
a metaphoric category is constructed primarily from the vehicle. On the other hand, the
comparison process begins with a symmetrical (i.e., role neutral) alignment process and
becomes asymmetrical in the later projection process. Hence, the plausibility of the
algorithms can be tested by examining whether and when these algorithms can yield the
result that is consistent with the asymmetry of metaphors.

3.4.1. Products of literal comparison and categorization statements


In general, a categorization statement ‘‘An X is a Y’’ is processed so that, owing to
default inheritance, the features characterizing Y-ness are highlighted unless they are irrele-
vant to X, and other salient features of X are downplayed. For example, consider a literal
categorization statement ‘‘A whale is a mammal.’’ Our intuition says that this statement
modifies our knowledge of the whale by emphasizing its typical mammalian features such
as having animal nature and suckling and de-emphasizing its distinctive, whale-specific
features that many (land-living) mammals do not have, for example, living in the sea and
swimming.
Fig. 1 shows the simulation result of computing the sentence vectors of the literal
categorization statement ‘‘A whale is a mammal’’ by the Categ algorithm and the Compa
algorithm. Fig. 1 depicts one bar chart and two line graphs; the bar chart shows the cosine
similarity between the original vector of topic whale and the vectors for the relevant land-
mark features, whereas the line graphs show the cosine similarity between the vectors vcat(S)
or vcom(S) for the literal categorization statement (S) ‘‘A whale is a mammal’’ and the land-
mark features.6 Note that, as mentioned in Section 3.1, the cosine similarity between two
vectors is used as a measure of semantic relatedness; a higher cosine implies that two words
or sentences are semantically more related. Hence, when the vector of a landmark feature
has a higher cosine with the sentence vector (i.e., vcat(S) or vcom(S) denoted by line graphs)
than with the vector of the topic whale (denoted by bars), the algorithm determines that the
sentence emphasizes or highlights that feature.
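The highlight/downplay criterion just described can be phrased as a small predicate; the function name and the toy vectors below are mine, not the article's:

```python
import numpy as np

def cosine(v, w):
    return float(np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w)))

def is_highlighted(sentence_vec, topic_vec, landmark_vec):
    """True if the sentence vector is semantically closer to the landmark
    feature than the topic vector alone is, i.e., the feature is emphasized."""
    return cosine(sentence_vec, landmark_vec) > cosine(topic_vec, landmark_vec)

# Toy illustration: a sentence vector shifted toward a "mammal" direction
# highlights a mammalian landmark feature relative to the bare topic.
whale = np.array([0.9, 0.1])
suckle = np.array([0.1, 0.9])    # landmark feature (hypothetical vector)
sentence = np.array([0.6, 0.5])  # e.g., vcat(S) for "A whale is a mammal"
assert is_highlighted(sentence, whale, suckle)
```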
The algorithm Categ mimics the intuition (that the categorization statement emphasizes
mammalian features and de-emphasizes whale-specific features) more appropriately than

[Figure 1 appears here: a bar chart with line overlays showing cosine similarities (x-axis, 0–0.6) for the
mammalian features animal nature and suckle and the whale-specific features swim and sea, with series for
Whale, the Categ algorithm, and the Compa algorithm.]
Fig. 1. An illustrative example showing that the Categ algorithm generates a more plausible vector than the
Compa algorithm for a literal categorization statement: The case of ‘‘A whale is a mammal.’’ The bar chart
shows the cosine similarity between the original vector of whale and the vectors for the relevant landmark fea-
tures. Two line graphs denote the cosine similarity between the vectors vcat(S) or vcom(S) for the literal categori-
zation statement ‘‘A whale is a mammal’’ and the landmark features. It is preferable that mammalian features
are more highlighted (i.e., graphs are on the right of the bars) and whale-specific features are more downplayed
(i.e., graphs are on the left of the bars). The Categ algorithm increases the cosine similarity of the mammalian
features and decreases the cosine similarity of the whale-specific features to a greater extent than the Compa
algorithm.
the algorithm Compa, as shown in Fig. 1. First, although the typical mammalian features
animal nature and suckle are highlighted by both algorithms, the Categ algorithm highlights
these mammalian features to a greater extent than the Compa algorithm; the increase in
cosine similarity from the original whale vector (denoted by the gray bars) is greater for the
sentence vector vcat(S) computed by the Categ algorithm (denoted by filled circles) than for
the sentence vector vcom(S) computed by the Compa algorithm (denoted by filled triangles).
The mean increase in cosine similarity for the two mammalian features is 0.27 for the vector
vcat(S) and 0.17 for the vector vcom(S). Second, whale-specific but non-mammalian features
such as sea and swim are downplayed by the Categ algorithm, but they are not downplayed
(or somewhat highlighted) by the Compa algorithm. The mean decrease in cosine similarity
from the original whale vector is 0.08 for the vector vcat(S) by Categ, and it is greater than
the mean decrease −0.01 for the vector vcom(S) by Compa. Finally, owing to these differ-
ences, the sentence vector vcat(S) of ‘‘A whale is a mammal’’ by the Categ algorithm
behaves like a mammal more appropriately than the vector vcom(S) by the Compa algorithm;
the vector vcat(S) by Categ has higher cosine with the mammalian features than with the
whale-specific features, but the vector vcom(S) by Compa inappropriately has higher cosine
with the whale-specific feature sea than with the mammalian feature suckle. From these sim-
ulation results, this example indicates that the algorithm Categ works better as a model of
categorization than the algorithm Compa.7
In contrast, it is reasonably assumed that people comprehend a literal comparison state-
ment ‘‘An X is like a Y,’’ such that only the common features shared by X and Y are high-
lighted without other Y-ness features being highlighted. For example, in comprehending ‘‘A
whale is like a dolphin,’’ people would try to seek commonality between whale and dolphin
and arrive at the interpretation in which common features (e.g., living in the sea, swimming)
are emphasized but dolphin-specific features (e.g., therapy, intelligence) are not highlighted.
Fig. 2 shows that such a pattern of interpretation can be replicated more appropriately

[Figure 2 appears here: a bar chart with line overlays showing cosine similarities (x-axis, 0–0.4) for the
common features sea and swim and the dolphin-specific features therapy and intelligent, with series for
Whale, the Categ algorithm, and the Compa algorithm.]

Fig. 2. An illustrative example showing that the Compa algorithm generates a more plausible vector than the
Categ algorithm for a literal comparison statement: The case of ‘‘A whale is like a dolphin.’’ The bar chart
shows the cosine similarity between the original vector of whale and the vectors for the relevant landmark fea-
tures. Two line graphs denote the cosine similarity between the vectors vcat(S) or vcom(S) for the literal
comparison statement ‘‘A whale is like a dolphin’’ and the landmark features. It is preferable that the common
features are more highlighted (i.e., graphs are on the right of the bars) and dolphin-specific features are not high-
lighted (i.e., graphs are located near the bars). The Compa algorithm increases the cosine similarity of the com-
mon features to a greater extent and the cosine similarity of the dolphin-specific features to a lesser extent than
the Categ algorithm.
by the algorithm Compa than by the algorithm Categ. (Note that the bar chart of Fig. 2
represents the cosine between the whale vector and feature vectors, and two graphs repre-
sent the cosines between the sentence vector vcat(S) or vcom(S) for the literal comparison
statement (S) ‘‘A whale is like a dolphin’’ and feature vectors.) First, the common features
sea and swim are highlighted by the Compa algorithm (i.e., they are closer to the sentence
vector vcom(S) than to the original whale vector), but the Categ algorithm undesirably down-
plays the common feature sea. Moreover, the increase in the cosine similarity of the feature
swim is greater for the Compa algorithm than for the Categ algorithm. Second, the unshared
dolphin-specific features therapy and intelligent are less highlighted by the Compa algo-
rithm than by the Categ algorithm. The mean increase in cosine similarity of two dolphin-
specific features is 0.06 for the vector vcom(S) and is smaller than the mean increase of 0.21
for the vector vcat(S). Finally, as a result, the Compa algorithm generates an intuitively plau-
sible sentence vector that is more similar (and thus, semantically more related) to the com-
mon features than to the dolphin-specific features. The Categ algorithm, however, does not
generate such a plausible vector; the generated sentence vector is less similar to the common
features than to the dolphin-specific feature therapy. These simulation results indicate that
the Compa algorithm works better as a model of comparison than the Categ algorithm.

3.4.2. Directionality in the metaphor comprehension process


In this section, I test the plausibility of the algorithms Categ and Compa by showing
whether and when these algorithms can yield the asymmetry of metaphors.
Concerning whether the algorithms Categ and Compa are consistent with the asymmetry
of metaphors, they can indeed compute different meanings (i.e., vectors) for an original
metaphor and its reversed metaphor, as shown in Tables 5 and 6. Table 5 illustrates that the
sentence vector of ‘‘A rumor is a virus'' computed by the Categ algorithm (with the param-
eters θcat = (m,k) = (500,5)) highlights the feature contagion that is typical of virus (i.e.,
contagion has higher cosine with ‘‘A rumor is a virus'' than with rumor alone, which is
shown by ΔCosine = 0.60). Further, this sentence vector downplays scandal that is typical
of rumor but irrelevant to the metaphor (i.e., scandal has lower cosine with ‘‘A rumor is a

Table 5
Asymmetry between the metaphor ‘‘A rumor is a virus'' and its reversed metaphor ‘‘A virus is a
rumor'' generated by the Categ algorithm

                         contagion            scandal
                       Cosine  ΔCosine     Cosine  ΔCosine
  Rumor                 0.00     –          0.32     –
  Virus                 0.94     –          0.12     –
  A rumor is a virus    0.60    0.60        0.10   −0.22
  A virus is a rumor    0.52   −0.42        0.31    0.19

Notes. Cosine denotes the cosine similarity to the two landmarks contagion and scandal. ΔCosine
denotes the increase in cosine similarity by metaphorization, which is equal to (Cosine of the
metaphor) − (Cosine of the topic alone).

Table 6
Asymmetry between the metaphor ‘‘Deserts are ovens'' and its reversed metaphor ‘‘Ovens are
deserts'' generated by the Compa algorithm

                          dry                vast               dish
                      Cosine  ΔCosine    Cosine  ΔCosine    Cosine  ΔCosine
  Deserts              0.28     –         0.56     –        −0.02     –
  Ovens                0.35     –        −0.02     –         0.70     –
  Deserts are ovens    0.78    0.50       0.41   −0.15       0.29    0.31
  Ovens are deserts    0.78    0.43       0.37    0.39       0.32   −0.38

Notes. Cosine denotes the cosine similarity to the three landmarks. ΔCosine denotes the
increase in cosine similarity by metaphorization, which is equal to (Cosine of the metaphor)
− (Cosine of the topic alone).

virus'' than with rumor, which is shown by ΔCosine = −0.22). On the other hand, the
vector for ‘‘A virus is a rumor'' shows a different result; it highlights the feature scandal
(ΔCosine = 0.19) and downplays the feature contagion (ΔCosine = −0.42). Although the
cosine similarity of contagion remains higher than that of scandal, this pattern may reflect the
intuition that the reversed metaphor ‘‘A virus is a rumor'' does not make sense. This mean-
ingless metaphor cannot appropriately describe the relevant features of a virus, and thus, the
originally salient features of a virus may still be salient in the metaphor. Likewise, Table 6
shows that the Compa algorithm (with the parameter θcom = (k) = (3)) reflects the asymme-
try of the metaphor; the two metaphor vectors differ in that the vector for ‘‘Deserts are ovens''
highlights the two features dry and dish, whereas the vector for ‘‘Ovens are deserts'' high-
lights the different features dry and vast. Note that the Compa algorithm may highlight common
properties (e.g., dry in this case) shared by the topic and the vehicle regardless of their order.
This tendency is consistent with the existing findings that reversed similes (i.e., reversed
figurative comparisons) preserved the original interpretation better and lowered the
comprehensibility to a lesser extent than the reversed metaphors (Chiappe, Kennedy, &
Smykowski, 2003).
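The ΔCosine entries of Tables 5 and 6 follow directly from the definition in the table notes, cos(metaphor, landmark) − cos(topic, landmark); a minimal sketch using the Table 5 values:

```python
# Delta-Cosine as defined in the notes to Tables 5 and 6:
# cos(metaphor, landmark) - cos(topic, landmark).
# The cosine values below are transcribed from Table 5.
def delta_cosine(cos_metaphor: float, cos_topic: float) -> float:
    return round(cos_metaphor - cos_topic, 2)

# ''A rumor is a virus'' (topic: rumor) highlights contagion, downplays scandal:
print(delta_cosine(0.60, 0.00))  # contagion:  0.6
print(delta_cosine(0.10, 0.32))  # scandal:   -0.22
# ''A virus is a rumor'' (topic: virus) shows the reverse pattern:
print(delta_cosine(0.52, 0.94))  # contagion: -0.42
print(delta_cosine(0.31, 0.12))  # scandal:    0.19
```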
With regard to when the algorithms Categ and Compa generate the asymmetry of meta-
phors, they differ in the stage at which directionality arises during computation, just as the
categorization and comparison processes differ as to when the asymmetry appears during
metaphor comprehension. In general, the algorithm Categ is initially asymmetrical in the
same way as the categorization process. The first step (Step 1) of the algorithm Categ com-
putes a set of m neighbors of the vehicle, that is, the set of Y’s neighbors for the original
metaphor ‘‘An X is a Y,’’ and the set of X’s neighbors for the reversed metaphor ‘‘A Y is
an X.’’ It is very likely that these two sets are not only different but also have quite a small
overlap, unless X and Y have very similar vectors. Particularly in the case of metaphors, X
and Y are not semantically similar, and thus, it seems much less likely that two sets of
neighbors would be identical; in many cases, they do not even overlap (especially when the
number of vehicle neighbors m is small). As a result, the second step (Step 2) chooses a dif-
ferent set of k neighbors between the original metaphor and its reversed metaphor. For
example, Table 7 lists 20 neighbors of the vehicle for the metaphors ‘‘A rumor is a virus’’

Table 7
Twenty neighbors of the vehicle computed at Step 1 of the Categ algorithm in comprehending
the metaphor ‘‘A rumor is a virus'' and its reversed metaphor ‘‘A virus is a rumor''

  ‘‘A rumor is a virus''     ‘‘A virus is a rumor''
  (neighbors of virus)       (neighbors of rumor)
  contagion                  disclosure
  fungus                     prosecutor
  tolerance                  expose
  disease onset              report
  tuberculosis               conjecture
  bacteria                   lady
  antibiotic                 surprised
  heated                     trouble
  drug disaster              public prosecutors office
  blood sampling             aide
  blood donation             resignation
  administration             scandal
  prevention                 tale
  take effect                fact
  blood transfusion          business trip
  blood                      reveal
  immunity                   monthly
  vaccine                    illegitimate child
  chronic                    mass media
  side-effect                disavow

Note. The words are listed in descending order of cosine similarity to the vehicle.

and ‘‘A virus is a rumor’’ computed in Step 1 of the Categ algorithm. In this example, the
two sets of vehicle neighbors do not overlap. Hence, when m ≤ 20, the sets of k neighbors
chosen at Step 2 are inevitably different. Even in the case of m = 500 and k = 5, the sets of
vehicle neighbors N500(virus) and N500(rumor) only have two common words (i.e., doubt,
trigger). These two common words are chosen at Step 2 for both metaphors, but the other
three words differ between them. The words recency, spread/get about, and guess are cho-
sen for ‘‘A rumor is a virus,’’ as shown in Table 3, whereas fear, topic, and emerge/show up
are chosen for the reversed metaphor ‘‘A virus is a rumor.’’ Furthermore, when the reversed
versions of all the metaphors used in the simulation of Section 4 are computed by the Categ
algorithm with the optimal parameters, none of the reversed metaphors produce the same
set of m neighbors in Step 1 and the same set of k neighbors in Step 2 as the original meta-
phors. These results show that the Categ algorithm is basically asymmetrical from the
beginning.
On the other hand, following the same steps as the comparison process, the algorithm
Compa is initially symmetrical and asymmetrical later. The first step (Step 1) of the algo-
rithm computes k common neighbors of the topic and the vehicle, and thus it is obviously
symmetrical; the Compa algorithm computes the same set of common neighbors for the
reversed metaphors as the original metaphor. The second step (Step 2) computes the

centroid of the topic and k common words as the metaphor vector, and the topic X of the
original metaphor ‘‘An X is a Y’’ is different from the topic Y of the reversed metaphors
‘‘A Y is an X.’’ Hence, Step 2 produces different metaphor vectors when the word order is
reversed, which means that Step 2 is asymmetrical.
Note that, in almost all cases (particularly in the case of a small m), the k common neigh-
bors computed at Step 1 of the Compa algorithm differ from the k neighbors computed at
Step 2 of the Categ algorithm. Words that are highly similar to both the vehicle and the
topic are likely to be included in both sets of k neighbors (i.e., k neighbors computed by
Compa and those computed by Categ); however, such words are quite rare, given that the
topic and the vehicle of metaphors are usually semantically dissimilar. Hence, most of the
vehicle neighbors are not neighbors of the topic.
To sum up the discussion, both algorithms can produce the asymmetry of metaphor.
Furthermore, the Categ algorithm behaves like categorization in that the computation is
asymmetrical from the beginning, whereas the Compa algorithm behaves like comparison in
that the computation is initially symmetrical and asymmetrical later. This consistency
strengthens the plausibility of the algorithms as models of categorization and comparison.
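The two regimes of asymmetry discussed above can be sketched in a toy semantic space. This is only an illustration with random vectors, not the paper's implementation: the Step 2 selection rule assumed for Categ (keep the k vehicle neighbors most similar to the topic) and the common-neighbor scoring assumed for Compa (rank words by the smaller of their cosines to the two constituents) stand in for the definitions given in Section 3.3.

```python
import numpy as np

# Toy "semantic space": word -> random 8-dimensional vector.
rng = np.random.default_rng(0)
SPACE = {w: rng.normal(size=8)
         for w in ["rumor", "virus", "contagion", "scandal", "spread", "doubt"]}

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def neighbors(word, m):
    """Categ Step 1: the m nearest neighbors of `word` (asymmetrical from the start)."""
    others = [w for w in SPACE if w != word]
    return sorted(others, key=lambda w: cos(SPACE[w], SPACE[word]), reverse=True)[:m]

def common_neighbors(topic, vehicle, k):
    """Compa Step 1 (assumed scoring): k words close to BOTH constituents;
    symmetrical in the word order because min() is symmetric."""
    others = [w for w in SPACE if w not in (topic, vehicle)]
    score = lambda w: min(cos(SPACE[w], SPACE[topic]), cos(SPACE[w], SPACE[vehicle]))
    return sorted(others, key=score, reverse=True)[:k]

def categ(topic, vehicle, m=4, k=2):
    # Step 2 (assumed): keep the k vehicle neighbors most similar to the topic;
    # the candidate set already depends on which word is the vehicle.
    chosen = sorted(neighbors(vehicle, m),
                    key=lambda w: cos(SPACE[w], SPACE[topic]), reverse=True)[:k]
    return np.mean([SPACE[topic]] + [SPACE[w] for w in chosen], axis=0)

def compa(topic, vehicle, k=2):
    # Step 2: centroid of the topic and the common words; asymmetry enters here.
    common = common_neighbors(topic, vehicle, k)
    return np.mean([SPACE[topic]] + [SPACE[w] for w in common], axis=0)

# Compa's Step 1 is order-invariant; Categ's Step 1 is not.
print(common_neighbors("rumor", "virus", 2) == common_neighbors("virus", "rumor", 2))  # -> True
```

Reversing the word order changes Categ's candidate set already in Step 1, whereas Compa computes the same common words for both orders and differs only in the Step 2 centroid, mirroring the discussion above.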

4. Simulation experiment

This section presents the details and the results of the simulation experiment comprising
model selection and theory testing. The overall procedure of model selection and theory
testing is summarized as follows and is illustrated in Fig. 3.

[Fig. 3 appears here: a flowchart in which, for each metaphor M, the human interpretation p (4.1)
and the computer interpretations qcat(θcat) and qcom(θcom) produced by the Categ and Compa
algorithms (3.3) feed into model-fit assessment D(p||q(θ)) (4.3), parameter estimation θ̂cat and
θ̂com (4.4), and AIC-based model selection via AICcat and AICcom (4.5); the selected model then
passes through goodness-of-fit testing D(p||q̂) (4.6) and, together with vehicle conventionality,
metaphor aptness, and interpretive diversity (4.1), enters the discriminant analysis (4.7) for
theory testing.]
Fig. 3. An illustration of model selection and theory testing procedure. The numbers in parentheses denote the
number of the section in which the corresponding procedure is explained.

Table 8
Metaphors used in the simulation experiment

 1. Life is a journey. (Jinsei ha tabi da)
 2. Life is a game. (Jinsei ha ge-mu da)
 3. Love is a journey. (Ai ha tabi da)
 4. Love is a game. (Ai ha ge-mu da)
 5. Anger is the sea. (Ikari ha umi da)
 6. Anger is a storm. (Ikari ha arashi da)
 7. Sleep is the sea. (Nemuri ha umi da)
 8. Sleep is a storm. (Nemuri ha arashi da)
 9. Perfume is a bouquet. (Ko-sui ha hanataba da)
10. Perfume is ice. (Ko-sui ha koori da)
11. A star is a bouquet. (Hoshi ha hanataba da)
12. A star is ice. (Hoshi ha koori da)
13. A sky is a mirror. (Sora ha kagami da)
14. A sky is a lake. (Sora ha mizuumi da)
15. An eye is a mirror. (Me ha kagami da)
16. An eye is a lake. (Me ha mizuumi da)
17. A lover is the sun. (Koibito ha taiyo da)
18. A lover is a rainbow. (Koibito ha niji da)
19. One's hope is the sun. (Kibou ha taiyo da)
20. One's hope is a rainbow. (Kibou ha niji da)
21. A child is water. (Kodomo ha mizu da)
22. A child is a jewel. (Kodomo ha houseki da)
23. Words are water. (Kotoba ha mizu da)
24. Words are jewels. (Kotoba ha houseki da)
25. An elderly person is a doll. (Roujin ha ningyou da)
26. An elderly person is a deadwood. (Roujin ha kareki da)
27. One's voice is a doll. (Koe ha ningyou da)
28. One's voice is a deadwood. (Koe ha kareki da)
29. One's character is fire. (Seikaku ha hi da)
30. One's character is a stone. (Seikaku ha ishi da)
31. A marriage is fire. (Kekkon ha hi da)
32. A marriage is a stone. (Kekkon ha ishi da)
33. Death is the night. (Shi ha yoru da)
34. Death is the fog. (Shi ha kiri da)
35. Anxiety is the night. (Fuan ha yoru da)
36. Anxiety is the fog. (Fuan ha kiri da)
37. Time is money. (Jikan ha okane da)
38. Time is an arrow. (Jikan ha ya da)
39. Memory is money. (Kioku ha okane da)
40. Memory is an arrow. (Kioku ha ya da)

Note. The original Japanese expressions used in the experiment are shown in parentheses, preceded by their
literal English translations.

1. Forty Japanese metaphors of the form ‘‘An X is a Y’’ were used for the simulation
experiment, as listed in Table 8. The human interpretation data (i.e., a list of mean-
ings W(M) and its salience distribution p in Fig. 3) of these metaphors, their ratings of
vehicle conventionality and metaphor aptness, and their interpretive diversity values
were obtained beforehand in a previous experiment (Utsumi, 2007). Section 4.1 and
Appendix A explain how these data were obtained.
2. For each of the 40 metaphors, optimal parameters θ̂cat and θ̂com of the two algorithms
Categ and Compa were estimated by the maximum likelihood method as follows.
(a) For given parameter values θ, an algorithm (Categ or Compa) computed the
similarity distribution q(θ) (i.e., qcat(θcat) or qcom(θcom) in Fig. 3) for the list of
meanings W(M).8 The method for computing the similarity distribution is
described in Section 4.2.
(b) Kullback–Leibler divergence D(p || q(θ)) (henceforth, KL-divergence) between
the computed similarity distribution q(θ) and the salience distribution p is
computed as a measure of the match between the model and data. KL-divergence
and its relation to the maximum likelihood method will be described in Sec-
tion 4.3.
(c) The optimal parameter θ̂ (i.e., θ̂cat or θ̂com) is obtained by finding the parameter
values that minimize the KL-divergence, which is described in Section 4.4.

3. For each metaphor, the two algorithms (i.e., models) were compared using
Akaike’s information criterion (henceforth, AIC), which is a measure of statisti-
cal model selection considering the tradeoff between the model’s precision (i.e.,
the maximum log-likelihood for the model computed by the minimum KL-
divergence) and complexity (i.e., the number of free parameters of the model).
The model with a smaller AIC is selected as the best one. Hence, if the AIC of
the Categ algorithm (denoted by AICcat in Fig. 3) is smaller than the AIC of
the Compa algorithm (denoted by AICcom), the categorization model is selected
as the best one. Likewise, if AICcat is greater than AICcom, the comparison
model is selected as the best one. The details of AIC and the result of model
selection are described in Section 4.5.
4. For each metaphor and its selected model, whose optimal similarity distribution is
denoted by q̂ in Fig. 3, the goodness-of-fit between the model and data is tested using
a chi-square test, based on the well-known fact that the KL-divergence can be approxi-
mated by the chi-square statistic. Metaphors that exhibit a significant discrepancy between
the model and data are excluded from the subsequent analysis. The goodness-of-fit
test is described in Section 4.6.
5. A linear discriminant analysis is conducted with the selected model (i.e., categoriza-
tion or comparison) as the dependent variable and three metaphor properties (vehicle
conventionality, metaphor aptness, and interpretive diversity) as the independent
variables. The method and the result of the discriminant analysis are described in
Section 4.7.

4.1. Metaphors, human interpretation data, and metaphor properties

This study employed 40 metaphors, as shown in Table 8. They were created from 10
groups, each of which comprised two topic words and two vehicle words. For each group,
four metaphors were created from all possible pairings of two topic words with two vehicle
words. For example, from the two topics, anger and sleep, and the two vehicles, sea and
storm, the following four metaphors were created: ‘‘Anger is the sea,’’ ‘‘Anger is a storm,’’
‘‘Sleep is the sea,'' and ‘‘Sleep is a storm.'' Topic and vehicle words were selected from an
experimental study on Japanese metaphor and from a list of words frequently used in Japanese
metaphors, so that they were highly frequent and familiar.
For human interpretation data of metaphors, this study employed the result of the psycho-
logical experiment (Experiment 2) that Utsumi (2007) conducted using the same 40 meta-
phors. This experiment addressed the difference in the comprehensibility between the
metaphor form and the simile form of a topic–vehicle pair and demonstrated that among the
three hybrid metaphor theories, the interpretive diversity view best explained the observed
comprehensibility difference. This study used some of the results obtained in this experi-
ment, namely, the listed meanings of metaphors (with the number of participants who listed
each meaning), ratings of vehicle conventionality and metaphor aptness, and interpretive

diversity values. A detailed procedure for obtaining these results is provided in
Appendix A.
For each metaphor M, a list W(M) of metaphorical meanings wi with the number of par-
ticipants xi who listed that meaning was provided by Utsumi’s (2007) experiment. These
meanings were used as landmarks with respect to which the computational model’s interpre-
tation and human interpretation were compared for evaluation. Note that in the experiment,
participants were instructed to write down three or more meanings as single words wherever
possible; as a result, the final list of meanings included only single words, which corresponds
to the unit of representation of the semantic space model. For example,
the list of meanings for the metaphor ‘‘Anger is the sea’’ includes eight features, such as
fearful/dreadful, rage/stormy, and deep, as shown in Fig. 4.
Using these data, the degree of salience pi for each meaning wi in the list W(M) is defined
as the ratio of the number of participants xi to the total number of tokens N = Σ_{j=1}^{n} xj,
where n = |W(M)|:

    pi = xi / Σ_{j=1}^{n} xj = xi / N.    (1)
This definition of salience is almost identical to the definition of salience for the prototype
representation of concepts by Smith, Osherson, Rips, and Keane (1988), and it reflects the
subjective frequency with which the feature (i.e., meaning) occurs in people’s interpreta-
tions of the metaphor. This definition is psychologically plausible because it has been
pointed out that frequency is closely related to salience (Giora, 2003; Tversky, 1977). For
example, the bar chart of Fig. 4 indicates the degree of salience of the eight features that the

[Fig. 4 appears here: a bar chart of the salience pi of the eight features fearful/dreadful,
rage/stormy, deep, big, surge, wave, wide, and strong, overlaid with line graphs of the
similarity distributions qcat,i(θ̂cat) and qcom,i(θ̂com) computed by the Categ and Compa
algorithms.]

Fig. 4. Simulation results for the metaphor ‘‘Anger is the sea.'' The bar chart indicates the degree of salience pi
of the human interpretation. Two line graphs indicate the normalized degree of similarity qcat,i(θ̂cat) or
qcom,i(θ̂com) computed by the Categ or Compa algorithms. The closer a line graph is to the bar chart, the better is the
match between its corresponding model and the human data. For six of eight features (i.e., fearful/dreadful,
rage/stormy, deep, surge, wave, strong), the normalized degree of similarity qcom,i(θ̂com) computed by the
Compa algorithm is closer to the human data p than the degree of similarity qcat,i(θ̂cat) by the Categ algorithm.
This result indicates that the metaphor ‘‘Anger is the sea'' is comprehended by the comparison process.

participants listed as the meaning of the metaphor ‘‘Anger is the sea.’’ The meaning fearful/
dreadful had the highest salience of 0.25, indicating that the number of participants who
listed it was the largest.
The interpretive diversity of each metaphor M was calculated using Shannon's entropy
H(p), defined by the following formula (Utsumi, 2005, 2007):

    H(p) = −Σ_{i=1}^{n} pi log pi.    (2)

For example, the interpretive diversity of the metaphor ‘‘Anger is the sea’’ in Fig. 4 was
calculated as 2.71, given that the bar length for a feature wi corresponds to pi. The mean
interpretive diversity across the 40 metaphors used in the experiment was 3.01 (SD = 0.42),
ranging from 2.09 (‘‘One's character is a stone,'' numbered 30 in Table 8) to 3.76 (‘‘An
eye is a mirror,'' numbered 15).
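Eqs. 1 and 2 can be sketched as follows. The listing counts are hypothetical, and the base-2 logarithm is an assumption that is consistent with the reported diversity values (2.71 for eight features exceeds the natural-log maximum ln 8 ≈ 2.08 but not the base-2 maximum log2 8 = 3).

```python
import math

def salience(counts):
    """Eq. 1: salience p_i = x_i / N, where N is the total number of tokens."""
    N = sum(counts)
    return [x / N for x in counts]

def interpretive_diversity(p):
    """Eq. 2: Shannon entropy H(p) = -sum_i p_i log p_i (base 2 assumed here)."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

# Hypothetical listing counts for illustration (not the actual data):
p = salience([2, 1, 1])           # -> [0.5, 0.25, 0.25]
print(interpretive_diversity(p))  # -> 1.5 (bits)
```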
For vehicle conventionality and metaphor aptness, this study used the mean ratings for
each metaphor obtained in the previous experiment (Utsumi, 2007). The metaphors were
rated on a 7-point scale ranging from 1 (very novel) to 7 (very conventional) for convention-
ality, or from 1 (not at all apt) to 7 (extremely apt) for aptness. The details of the rating
experiment are provided in Appendix A. The mean conventionality rating across the 40
metaphors was 4.46 (SD = 1.19), ranging from 1.83 (‘‘Memory is an arrow,'' numbered 40
in Table 8) to 6.28 (‘‘A lover is the sun,'' numbered 17). The mean aptness rating was 3.70
(SD = 1.07), ranging from 1.83 (‘‘One's voice is a doll,'' numbered 27) to 6.00 (‘‘Life is a
journey,'' numbered 1).

4.2. Computer interpretation

To generate a semantic space for computer simulation, a term–paragraph matrix A was
constructed from a Japanese corpus of 523,249 paragraphs containing 62,712 different
words, which were derived from a CD-ROM of Mainichi newspaper articles published in
words, which were derived from a CD-ROM of Mainichi newspaper articles published in
1999. The dimension of the row vectors of A was then reduced using SVD. The number of
dimensions D of the semantic space was determined to be 300 because a 300-dimensional
space usually yields the best performance for simulating human behavior (e.g., Kintsch,
2001; Landauer & Dumais, 1997).
Using the constructed semantic space, the metaphorical interpretation (i.e., similarity dis-
tribution q(θ)) of a metaphor M was computed as follows:
1. For a given list of parameter values θ (i.e., θcat or θcom presented in Section 3.3), an
algorithm (Categ or Compa) computed the metaphor vectors v(M;θ) (i.e., vcat(M;θcat)
or vcom(M;θcom)).
2. For each of the features wi listed for a metaphor M, the cosine similarity cos(v(wi),
v(M;θ)) between the feature wi and the metaphor vector v(M;θ) was computed. Fea-
tures with higher cosine similarity to the metaphor vector were more appropriate as a
metaphorical meaning, or in other words, more relevant to the metaphorical interpre-
tation.

3. Finally, the similarity distribution q(θ) (i.e., qcat(θcat) or qcom(θcom)) was calculated by
the following formulas:

       qi(θ) = di(M;θ) / Σ_{j=1}^{n} dj(M;θ),    (3)

       di(M;θ) = cos(v(wi), v(M;θ)) − min_{x∈X} {cos(v(x), v(M;θ))}.    (4)

   In Eq. 4, X denotes the set of all words in the semantic space, and thus min_{x∈X}
   {cos(v(x), v(M;θ))} denotes the smallest (minimum) cosine value between the meta-
   phor vector v(M;θ) and any word vector v(x) in the semantic space. Therefore,
   di(M;θ) expresses the deviation of wi's cosine similarity from the minimum cosine.
   Equation 3 shows that the normalized degree of similarity qi(θ) for the feature wi is
   defined as the ratio of this deviation of cosine, so that it is analogous to the degree
   of salience p defined in Eq. 1. The reason for using the deviation of cosine similarity
   instead of the cosine similarity itself is that the cosine can take negative values, and thus
   the absolute cosine value does not necessarily reflect the degree of similarity in that,
   for example, a zero cosine does not imply that there is no similarity.
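Eqs. 3 and 4 can be sketched as follows; the cosine values are hypothetical stand-ins for cosines computed in the actual semantic space.

```python
def similarity_distribution(feature_cosines, min_cosine):
    """Eq. 4: deviation of each feature's cosine from the space-wide minimum;
    Eq. 3: normalize the deviations into a distribution q that sums to 1."""
    d = [c - min_cosine for c in feature_cosines]
    total = sum(d)
    return [di / total for di in d]

# Hypothetical cosines of three listed features to the metaphor vector,
# with -0.4 as the smallest cosine of any word in the space:
q = similarity_distribution([0.6, 0.3, -0.1], min_cosine=-0.4)
print(q)  # approximately [0.5, 0.35, 0.15]; nonnegative, sums to 1
```

Subtracting the space-wide minimum guarantees nonnegative weights even for features with negative cosines, which is exactly why the deviation rather than the raw cosine is normalized.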
For instance, the two line graphs in Fig. 4 illustrate the similarity distribution computed
using two metaphor vectors for the metaphor ‘‘Anger is the sea.'' The solid line with filled
circles depicts the similarity distribution qcat(θcat) computed by the categorization algorithm
Categ with θcat = (m,k) = (10,7), and the dotted line with filled triangles depicts the simi-
larity distribution qcom(θcom) computed by the comparison algorithm Compa with θcom =
(k) = (1). This figure shows, for example, that the normalized degree of similarity qi(θ)
for the feature fearful/dreadful is qi,cat(θcat) = 0.157 for the Categ algorithm and
qi,com(θcom) = 0.247 for the Compa algorithm.

4.3. Assessing the match between model and data

As mentioned in the beginning of this section, the match between the model and data can
be assessed as the degree of similarity between the computed distribution q(h) and the
human salience distribution p. The greater the similarity between the two distributions, the
better is the algorithm’s simulation of human interpretation.
To quantitatively evaluate similarity or dissimilarity between the two distributions, this
study used KL-divergence, which is also known as relative entropy. KL-divergence is the
most popular measure of dissimilarity between two probability distributions and has been
applied in computational semantics as a semantic similarity measure (e.g., Hu et al., 2006;
Manning & Schütze, 1999). The KL-divergence D(p || q(θ)) of the computed similarity dis-
tribution q(θ) relative to the human salience distribution p is given by the following formula:

    D(p || q(θ)) = Σ_{i=1}^{n} pi ln (pi / qi(θ)).    (5)

As it measures how badly the model's distribution q(θ) approximates the observed distribution
p, a lower divergence implies better performance.

Minimizing the KL-divergence is equivalent to maximizing the likelihood function
(Bishop, 2006), because Eq. 5 can be written as:

    D(p || q(θ)) = Σ_{i=1}^{n} pi ln pi − Σ_{i=1}^{n} pi ln qi(θ)
                 = −H(p) − (1/N) Σ_{i=1}^{n} xi ln qi(θ)    (6)
                 = −H(p) − (1/N) ln L(θ).

The first term −H(p) on the right-hand side of Eq. 6 is the (negative) interpretive
diversity defined by Eq. 2 and is independent of θ, and the second term is the (negative)
log-likelihood function ln L(θ) = Σ_{i=1}^{n} xi ln qi(θ) (divided by N) for θ under the
distribution q(θ). Therefore, minimizing D(p || q(θ)) is equivalent to maximizing the
log-likelihood function.
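The identity in Eq. 6 can be checked numerically on hypothetical counts; the entropy here uses natural logarithms, matching the ln of Eq. 5.

```python
import math

# Numerical check of Eqs. 5-6 on hypothetical data: the KL-divergence
# computed directly equals -H(p) - (1/N) ln L(theta), where
# ln L(theta) = sum_i x_i ln q_i(theta).
x = [10, 6, 4]                # hypothetical listing counts
N = sum(x)
p = [xi / N for xi in x]      # salience distribution (Eq. 1)
q = [0.5, 0.2, 0.3]           # a model's similarity distribution

kl_direct = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))  # Eq. 5
H = -sum(pi * math.log(pi) for pi in p)                          # entropy (nats)
log_lik = sum(xi * math.log(qi) for xi, qi in zip(x, q))         # ln L(theta)
kl_via_likelihood = -H - log_lik / N                             # Eq. 6
print(abs(kl_direct - kl_via_likelihood) < 1e-12)  # -> True
```

Because −H(p) does not depend on θ, choosing θ to minimize the divergence and choosing θ to maximize the log-likelihood pick out the same parameter values.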

4.4. Estimating optimal parameters by maximum likelihood method

For each metaphor, optimal parameters θ̂cat and θ̂com of the two algorithms (i.e., categori-
zation and comparison models) were computed by finding the parameter values that mini-
mize the KL-divergence, that is, maximize the likelihood function. The parameter space
was given such that the parameter m varied between 10 and 45 in steps of 5 and between 50
and 500 in steps of 50, and the parameter k varied between 1 and 10.
In the case of the metaphor ‘‘Anger is the sea,'' for example, the parameter θcat =
(m,k) = (10,7) minimized the KL-divergence of the categorization model (i.e., Categ algo-
rithm), and the parameter θcom = (k) = (1) minimized the KL-divergence of the compari-
son model (i.e., Compa algorithm). (Two line graphs in Fig. 4 show the similarity
distributions at these optimal parameters.) The minimum KL-divergence was 0.147 for the
categorization model and 0.0854 for the comparison model. These KL-divergence values
indicate that the similarity distribution computed by the comparison model (i.e., the dotted
line with filled triangles) is more similar to the human salience distribution (i.e., bar chart)
than the similarity distribution computed by the categorization model (i.e., the solid line
with filled circles).
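The grid search over the stated parameter space can be sketched as follows; `kl_divergence` is a placeholder for evaluating Eq. 5 with a fitted model, not the paper's code.

```python
import itertools
import random

# Sketch of the minimum-KL (maximum-likelihood) grid search of Section 4.4.
M_GRID = list(range(10, 50, 5)) + list(range(50, 550, 50))  # 10..45 step 5, 50..500 step 50
K_GRID = list(range(1, 11))                                 # k = 1..10

random.seed(0)
def kl_divergence(theta):
    # Placeholder: a real model computes D(p || q(theta)) via Eq. 5.
    return random.random()

def fit_categ():
    """Grid-search (m, k) minimizing the KL-divergence for the Categ model."""
    grid = list(itertools.product(M_GRID, K_GRID))
    return min(grid, key=kl_divergence)

print(len(M_GRID) * len(K_GRID))  # -> 180 parameter settings for Categ
```

The Compa model searches only K_GRID, since its single parameter is k.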

4.5. Selecting the best comprehension model by AIC

To compare the two models while considering the tradeoff between the model’s precision
and complexity, this study used AIC, which has been widely used as a tool for statistical
model selection (e.g., Wagenmakers & Farrell, 2004). In general, AIC is given by:

    AIC = −2 ln L(θ̂) + 2K,    (7)

where L(θ̂) is the maximum value of the likelihood function for the model, and K is the
number of parameters in the model. Smaller AIC values represent more plausible models.
Hence, the model with the smallest AIC can be selected as the best one.
In this study, the AIC value can be calculated by:

    AIC = −2N Σ_{i=1}^{n} pi ln qi(θ̂) + 2K,    (8)

where K = 2 for the categorization model (because the algorithm Categ has two parameters
m and k), and K = 1 for the comparison model (because the algorithm Compa has only one
parameter k). For each metaphor, ΔAIC was calculated as the difference between the AIC
value of the categorization model (AICcat) and the AIC value of the comparison model
(AICcom).

    ΔAIC = AICcat − AICcom
         = 2N Σ_{i=1}^{n} pi ln (qi,com(θ̂com) / qi,cat(θ̂cat)) + 2.    (9)

If ΔAIC > 0 (i.e., AICcom is less than AICcat), the comparison model (Compa) was selected
as the one that best approximated the underlying comprehension process; conversely, if
ΔAIC < 0 (i.e., AICcat is less than AICcom), then the categorization model (Categ) was
selected.
The result of the model selection for all 40 metaphors was that the categorization model
was selected for 11 metaphors (the metaphors numbered 3, 17, 18, 25, 28, 31, 32, 33, 37,
38, and 39 in Table 8), and the comparison model was selected for 29 metaphors. The mean
ΔAIC for the 11 metaphors judged to be comprehended as categorizations was −2.71
(SD = 2.64), and that for the 29 metaphors judged to be comprehended as comparisons was
1.87 (SD = 1.38). For example, in the case of ‘‘Anger is the sea'' in Fig. 4, the AIC value
of the categorization model was 165.98, and that of the comparison model was 159.07.
Because ΔAIC = 6.91 was positive, the comparison model was selected as the best model,
suggesting that the metaphor ‘‘Anger is the sea'' is likely to be comprehended as a compari-
son, rather than as a categorization. Indeed, Fig. 4 shows that for six of eight features, the
normalized degree of similarity qcom,i(θ̂com) computed by the comparison model is closer to
the human data than the degree of similarity qcat,i(θ̂cat) by the categorization model. Further-
more, the comparison model correctly distinguishes the three most salient features (i.e.,
fearful, rage, and deep) from other less salient features by computing the degree of similar-
ity, whereas the categorization model does not.

4.6. Testing the goodness-of-fit between data and model

For each metaphor and its selected model, a chi-square goodness-of-fit test was
conducted to examine whether the discrepancy between the model and the data was significant.
Metaphors that displayed a significant discrepancy between the model and data would

be excluded from the subsequent analysis. (Note that the goodness-of-fit test was not
applied to the model that was not selected by the AIC model selection procedure
because the goodness-of-fit for the model not selected has no influence on the
subsequent analysis.)
It is well known that the KL-divergence can be approximated by the chi-square statistic
(divided by 2N) because they are identical up to the third order (Cover & Thomas, 2006):

    D(p || q̂) = (1/2) Σ_{i=1}^{n} (pi − q̂i)² / q̂i + O(||pi − q̂i||³)
              ≈ (1/(2N)) Σ_{i=1}^{n} N(pi − q̂i)² / q̂i = (1/(2N)) χ².    (10)

Hence, the discrepancy between the model distribution and the observed human distribution
is significant (i.e., the null hypothesis that the data distribution follows the model distribu-
tion is rejected) if the KL-divergence multiplied by 2N exceeds the critical value of the
chi-square distribution with n − 1 degrees of freedom:

    2N · D(p || q̂) ≥ χ²_{1−α}(n − 1).    (11)
The result of the goodness-of-fit test for all the metaphors was that none of the fits of the
selected model to the data were rejected at a significance level of α = 0.05; thus, all 40
metaphors were included in the subsequent analysis. For example, in the case of the
metaphor ‘‘Anger is the sea'' in Fig. 4, the selected model (i.e., comparison) was accepted
as a good fit to the human data; the minimum KL-divergence (= 0.0854) multiplied by
2N (= 2 × 40) equals 6.832, which did not exceed the critical value 14.07 (d.f. = 7).
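The check of Eq. 11 can be sketched with the values reported above for ‘‘Anger is the sea''; the critical value 14.07 is taken from the text rather than computed here.

```python
# Goodness-of-fit check of Eqs. 10-11: the model is retained when the
# statistic 2N * D(p||q) stays below the chi-square critical value
# (here 14.07 for alpha = .05 with 7 degrees of freedom, as reported).
def fits_data(kl_divergence, N, critical_value):
    statistic = 2 * N * kl_divergence
    return statistic, statistic < critical_value

statistic, retained = fits_data(0.0854, 40, 14.07)
print(round(statistic, 3), retained)  # -> 6.832 True
```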

4.7. Evaluating metaphor theories by discriminant analysis

A linear discriminant analysis was conducted to reveal the metaphor properties that deter-
mine the choice of comprehension process. The dependent variable was whether the
selected model (i.e., the comprehension process) is categorization or comparison. The inde-
pendent variables comprised three metaphor properties, namely, vehicle conventionality,
metaphor aptness, and interpretive diversity, whose correlations were r = .36 between con-
ventionality and aptness, r = −.30 between conventionality and diversity, and r = −.14
between aptness and diversity.
Table 9 shows the result of the discriminant analysis based on all the 40 metaphors. The
analysis yielded a significant discriminant function, Wilks' lambda = 0.70, F(3,36) =
5.08, p < .005. The function correctly classified 32 of 40 metaphors (80.0%), and the
kappa coefficient of agreement κ = 0.55 was significant, Z = 3.10, p = .002. The meta-
phors numbered 15, 16, 23, 24, 29, 35, 37, and 39 in Table 8 were not correctly classified.
The left table in Table 10 shows the classification table of the analysis. For the class of

Table 9
Result of the non-cross-validated discriminant analysis for predicting the choice
between categorization and comparison models

                                   Standardized Coefficients
  Vehicle conventionality           1.37 (p = .0062)
  Metaphor aptness                 −0.28 (p = .54)
  Interpretive diversity            1.47 (p = .0022)
  Wilks' lambda                     0.70
  R²                                0.30
  Accuracy (correctly predicted)    0.80
  Cohen's kappa                     0.55

Table 10
Classification tables of non-cross-validated and cross-validated discriminant analyses

          Non-cross-validated Classification    Cross-validated Classification
Actual      Cat     Com     Total                 Cat     Com     Total
Cat           9       2        11                   9       2        11
Com           6      23        29                   7      22        29
Total        15      25        40                  16      24        40

Note. Cat, categorization; Com, comparison. Column headings are the predicted categories.

categorization, recall was 0.82 (= 9/11) and precision was 0.60 (= 9/15). For the class of
comparison, recall was 0.79 (= 23/29) and precision was 0.92 (= 23/25). Therefore, the bal-
anced F-score (i.e., the harmonic mean of recall and precision) was 0.69 for the class of cat-
egorization and 0.85 for that of comparison.
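These per-class figures, together with the kappa value reported above, follow mechanically from the classification table; a small sketch (the function is my own) reproduces them from the left half of Table 10:

```python
def classification_metrics(cm):
    """cm[i][j] = number of items of actual class i predicted as class j."""
    n = sum(sum(row) for row in cm)
    accuracy = sum(cm[c][c] for c in range(len(cm))) / n
    scores = {}
    for c in range(len(cm)):
        recall = cm[c][c] / sum(cm[c])
        precision = cm[c][c] / sum(row[c] for row in cm)
        scores[c] = (recall, precision,
                     2 * recall * precision / (recall + precision))
    # Cohen's kappa: accuracy corrected for chance agreement
    chance = sum(sum(cm[c]) * sum(row[c] for row in cm)
                 for c in range(len(cm))) / n ** 2
    kappa = (accuracy - chance) / (1 - chance)
    return accuracy, scores, kappa

# Left half of Table 10: rows are actual (Cat, Com), columns predicted
accuracy, scores, kappa = classification_metrics([[9, 2], [6, 23]])
# accuracy = 0.80, F(Cat) ≈ 0.69, F(Com) ≈ 0.85, kappa ≈ 0.55
```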
Concerning the standardized discriminant coefficients for the three metaphor properties,
Table 9 demonstrates that interpretive diversity had the highest discriminant coefficient and
was significantly associated with the choice of comprehension process, F(1, 36) = 10.89,
p < .005. This result is consistent with the interpretive diversity view, which argues that
high-diversity metaphors are processed as categorizations and low-diversity metaphors are
processed as comparisons. Vehicle conventionality had the second-highest coefficient and
also reached statistical significance, F(1, 36) = 8.47, p < .01. This result is consistent with
the conventionality view, which predicts that conventional metaphors are processed as cate-
gorizations, whereas novel metaphors are processed as comparisons. On the other hand,
metaphor aptness did not affect the choice of comprehension process; its standardized coef-
ficient of −0.28 was not significant. This result is inconsistent with the aptness view and
suggests that the choice of comprehension strategy may not depend on metaphor aptness.
Hence, the result of the discriminant analysis indicates that both the interpretive diversity
view and the conventionality view are plausible theories of metaphor comprehension.
This result was replicated by the cross-validated discriminant analysis, suggesting
that the finding on the importance of interpretive diversity and conventionality may be

independent of the training data. The leave-one-out cross-validation method was used for
the cross-validated analysis, in which each metaphor was classified using a discriminant
function derived from the remaining 39 metaphors. The classification table for the cross-val-
idated analysis (the right half of Table 10) shows that 31 metaphors (77.5%) were classified
correctly and that the kappa coefficient of agreement κ = 0.51 was significant, Z = 2.92,
p < .005. All eight metaphors misclassified in the non-cross-validated analysis were also
misclassified in the cross-validated analysis; additionally, the metaphor numbered 2 was
misclassified in the cross-validated analysis.
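The leave-one-out scheme itself can be sketched as follows (a nearest-class-mean classifier stands in for the full linear discriminant analysis here, so this illustrates the procedure rather than the exact model; all names are mine):

```python
import numpy as np

def fit_centroids(X, y):
    """Stand-in classifier: store the mean vector of each class."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def predict(model, x):
    return min(model, key=lambda c: np.linalg.norm(x - model[c]))

def loo_accuracy(X, y):
    """Classify each item with a model trained on the remaining items."""
    hits = 0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i          # hold out item i
        model = fit_centroids(X[keep], y[keep])
        hits += predict(model, X[i]) == y[i]
    return hits / len(y)
```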
To compare the predictive ability between interpretive diversity and vehicle convention-
ality, I conducted a commonality analysis. Commonality analysis is a method of variation
partitioning by which one can calculate the proportions of variance in the dependent vari-
able associated uniquely with each of the independent variables (i.e., unique contributions
of independent variables to the prediction of the discriminant analysis), as well as the pro-
portions of variance attributed to various combinations of independent variables (i.e., com-
mon contributions of the combinations of variables). Table 11 shows the result of the
commonality analysis. Interpretive diversity made a larger unique contribution (0.212) to
predicting model selection than vehicle conventionality (0.165); this suggested that interpre-
tive diversity may be a more important factor in explaining the choice of comprehension
process. The negative common contribution ()0.080) of interpretive diversity and vehicle
conventionality indicates that they have no joint effect, or more concretely, that the two
variables are competitive in the sense that one variable hinders the contribution of the other
(Legendre & Legendre, 1998). In addition, I conducted two separate discriminant analyses,
one considering only interpretive diversity as the independent variable and the other
considering only conventionality. The discriminant analysis with interpretive diversity
yielded a significant discriminant function, Wilks' lambda = 0.87, F(1, 38) = 5.64,
p < .05, which correctly classified 29 (72.5%) metaphors. The kappa coefficient κ = 0.42
showed moderate agreement and was significant, Z = 2.56, p = .01. On the other hand, the
discriminant analysis with conventionality did not yield a significant discriminant func-
tion, Wilks' lambda = 0.92, F(1, 38) = 3.08, p = .09. The function correctly classified only
47.5% of the metaphors, and the kappa coefficient was negative, which indicated that there
was no agreement between the prediction and the simulation experiment. These findings
suggest that interpretive diversity may be a better predictor of the metaphor comprehension
process.
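The commonality partition can be computed from the R² values of regressions on every subset of predictors (with a 0/1-coded group variable, a two-group discriminant analysis is equivalent to such a regression). A sketch using inclusion-exclusion over predictor subsets; the function names are mine:

```python
from itertools import combinations
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on X (with intercept)."""
    if X.shape[1] == 0:
        return 0.0
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

def commonality(X, y, names):
    """Unique and common contributions of each predictor subset;
    the components sum to the full-model R^2."""
    p = X.shape[1]
    r2 = {S: r_squared(X[:, list(S)], y)
          for k in range(p + 1) for S in combinations(range(p), k)}
    parts = {}
    for k in range(1, p + 1):
        for S in combinations(range(p), k):
            rest = set(range(p)) - set(S)
            c = 0.0
            for j in range(k + 1):
                for T in combinations(S, j):
                    c -= (-1) ** j * r2[tuple(sorted(set(T) | rest))]
            parts[" and ".join(names[i] for i in S)] = c
    return parts
```

For two predictors this reduces to the familiar formulas: the unique contribution of X1 is R²(X1, X2) − R²(X2), and the common contribution is R²(X1) + R²(X2) − R²(X1, X2), which can be negative, as in Table 11.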

Table 11
Unique and common contributions of three metaphor properties in accounting for the variance in the
choice of the metaphor comprehension process

Unique Contributions       Common Contributions
ID      VC      AP         ID and VC   ID and AP   VC and AP   ID, VC, and AP   Sum
0.212   0.165   0.007      −0.080      0.002       −0.004      −0.006           0.298

Note. ID, interpretive diversity; VC, vehicle conventionality; AP, aptness.

Furthermore, to validate the finding of the simulation, I examined the effect of the image-
ability of metaphors (Marschark, Katz, & Paivio, 1983), that is, the ease with which a met-
aphorical sentence evokes mental imagery. Various metaphor studies (e.g., Marschark &
Hunt, 1985; Marschark et al., 1983; Paivio & Walsh, 1993) have addressed the role of men-
tal imagery in metaphor comprehension since the very beginning of metaphor research. If
metaphor imageability accounts for a significant portion of the variance in the discriminant
analysis, this weakens the validity of the finding that both the interpretive diversity view
and the conventionality view are plausible. The imageability of the 40 metaphors was rated
by 21 participants on a 7-point scale ranging from 1 (difficult to evoke mental imagery) to
7 (easy to evoke), and the mean imageability rating for each metaphor was used as the inde-
pendent variable of the discriminant analysis. In addition, lexical properties (i.e., vehicle
frequency, topic frequency, vehicle concreteness, topic concreteness, vehicle familiarity,
and topic familiarity) were used as independent variables. Word concreteness was
obtained from a rating study in which 11 participants rated the 40 words used in the 40 meta-
phors on a 7-point scale of concreteness (1: abstract; 7: concrete), whereas word
frequency and familiarity values were derived from the database of Japanese lexical proper-
ties "Nihongo No Goi Tokusei." The result of the (non-cross-validated) discriminant analy-
sis was that none of these seven properties were significantly associated with the dependent
variable. This result indicates that the explanatory power of interpretive diversity and con-
ventionality is not attributable to these factors.
In sum, these results support the conclusion that both the interpretive diversity view and
the conventionality view are plausible theories of metaphor comprehension. Additionally,
interpretive diversity emerged as a better predictor of the metaphor comprehension process
in this study, mimicking the experimental results. However, simulations with other meta-
phor data may yield different conclusions.

5. General discussion

5.1. Implications of the simulation results

The simulation experiment reported in this article demonstrated that the interpretive
diversity view and the conventionality view are plausible, but it did not provide evidence
supporting the aptness view. This computational finding is consistent with the empirical
finding obtained by Utsumi (2007); in his psychological experiment (Experiment 1), both
interpretive diversity and vehicle conventionality were found to be significant predictors of
the choice of comprehension process. Therefore, I have obtained theoretical and experimen-
tal convergence on the conclusion that the interpretive diversity view and the conventional-
ity view are plausible theories of metaphor comprehension. The observed consistency
between empirical and theoretical findings also indicates that the computational methodol-
ogy of this study is potentially useful for providing a new insight into the cognitive pro-
cesses in metaphor comprehension and possibly language comprehension in general. If the
cognitive processes that are being explored can be appropriately modeled in the
semantic-space-based framework, the maximum likelihood method can determine which processes
are plausible.
Why does interpretive diversity (or semantic richness) affect metaphor comprehension?
Utsumi (2007) has provided one possible answer in terms of the nature of categorization.
When people interpret an entity X as a member of, or as classified into, a category Y, X is
expected to share many salient features with Y because the members of a category inherit
many of the category's features by default. In other words, a semantically rich
entity is easy to categorize. Hence, the categorization process proceeds more easily when
more features of category Y can be attributed to X, that is, when a pairing of X and Y is
more diverse. As a result, interpretively diverse metaphors are comprehended via a categori-
zation process, whereas less diverse metaphors fail to be processed as categorizations, and
thus, they must be reinterpreted via a comparison process.
Empirical evidence for the effects of semantic richness or diversity has been established
by a number of studies on language comprehension. Rodd, Gaskell, and Marslen-Wilson
(2002) demonstrated that semantically rich words with many related senses facilitated word
recognition. Similarly, Pexman, Lupker, and Hino (2002) found a number-of-features effect,
that is, faster lexical decision responses, for words with many semantic features than words
with fewer semantic features. Pexman, Holyk, and Monfils (2003) demonstrated that the
number-of-features effect was also observed in the semantic categorization task; semanti-
cally richer words were more quickly judged as a member of a given category, and such an
effect was greater when a given category was broader, in other words, semantically richer.
Furthermore, Adelman et al. (Adelman & Brown, 2008; Adelman, Brown, & Quesada,
2006) have recently demonstrated that contextual diversity—the number of contexts in
which a word appears—affects lexical decision. As semantically richer words will be used
in more variable contexts, contextual diversity can also be considered as a measure of
semantic richness.

5.2. Semantic space model and the embodied theory of metaphor

As mentioned in Section 2, cognitive linguists have proposed that metaphor comprehension
is fundamentally embodied (e.g., Gibbs, 2006; Kövecses, 2002; Lakoff & Johnson,
1980, 1999). According to the embodied theory of metaphor, metaphorical expressions can
be comprehended on the basis of conceptual metaphors, which are grounded in embodied
experiences. For example, to comprehend the verbal metaphor "I'm feeling up today,"
people must know the conceptual metaphor Happy Is Up, which is acquired from the
experiential correlation between an affective state of happiness and an upright posture
(Lakoff & Johnson, 1999). Many abstract concepts are comprehended in the same way;
love can be understood as an act of traveling (e.g., Love Is A Journey), and theories can be
understood as physical structures (e.g., Theories Are Buildings). In addition, recent
research in cognitive science has also argued that language comprehension in general is
embodied (e.g., Barsalou, 1999, 2008; Pecher & Zwaan, 2005); the meaning of linguistic
symbols can be captured by grounding them in human perceptual experiences with the
environment. The embodied theory of language comprehension naturally implies that

semantic space models such as LSA cannot simulate language comprehension in general
(Glenberg & Robertson, 2000; Zwaan & Yaxley, 2003) and metaphor comprehension in
particular because semantic space models are not embodied. If this is true, then this
study cannot provide any evidence concerning the cognitive mechanism of metaphor
comprehension.
In response to this criticism of semantic space models from the embodied theory, this
article makes two points. One way of defending the position that metaphor
comprehension can be computationally simulated by semantic space models is to demon-
strate that such models can explain linguistic phenomena that,
according to the embodied theory, they should be unable to explain. Although
it is still controversial whether semantic space models can represent knowledge based on
embodied experiences and whether they can explain embodied comprehension (de Vega,
Glenberg, & Graesser, 2008), many recent studies have demonstrated that semantic space
models such as LSA (or co-occurrence statistics) are capable of doing so (e.g., Kintsch,
2007, 2008b; Louwerse, 2007, 2008; Louwerse & Van Peer, 2009). For example, Louwerse
(2007) demonstrated that LSA can successfully distinguish non-afforded sentences (e.g.,
‘‘He used his glasses to dry his feet.’’) from afforded sentences (e.g., ‘‘He used his shirt to
dry his feet.’’) and related sentences (e.g., ‘‘He used his towel to dry his feet.’’); this is in
contrast to Glenberg and Robertson’s (2000) claim that LSA cannot capture such embodied
distinction. Furthermore, the assumption of the embodied metaphor theory that metaphorical
expressions are inevitably linguistic realizations of conceptual metaphors would imply that
linguistic co-occurrence can capture conceptual metaphors. For example, it is highly likely
that Happy Is Up encourages words expressing affective states and words expressing vertical
positions to co-occur in text. It follows that LSA can explain embodied metaphors. Indeed,
Mason (2004) revealed that many conceptual metaphors can be extracted automatically
from a large corpus.
Another way of defending the position that the semantic space models can simulate met-
aphor comprehension is to demonstrate that the role of embodiment in metaphor compre-
hension is more limited than expected by the embodied theory of metaphor. As mentioned
in Section 2.1.1, although there is little doubt that primary metaphors are embodied, it is
highly unclear whether any complex metaphors are embodied or whether they are necessary
for metaphor comprehension. Concerning the need for conceptual metaphors, some negative
findings have established that people do not necessarily comprehend metaphors depending
on conceptual metaphors (Glucksberg & McGlone, 1999; Keysar, Shen, Glucksberg, &
Horton, 2000; Murphy, 1996). Surprisingly, even Barsalou (1999), who adopts an embodied
view of cognition, pointed out that abstract concepts such as anger are directly grounded in
perceptual experience, without being mediated by conceptual metaphors. He also suggested
that conceptual metaphors were not required in this case; familiar or conventional meta-
phors may bypass conceptual metaphors.
From these discussions, it can be concluded that metaphor comprehension can be ade-
quately simulated by the computational models presented in this article. It can be asserted
that the criticism made by the embodied theories does not apply to the framework of this
study.

5.3. Computational approaches to metaphor comprehension

Over the past few decades, a number of computational studies on metaphor comprehen-
sion have been conducted. Computational studies from the 1990s include computational dis-
crimination among metaphor, metonymy, anomaly, and literalness using lexical semantics
(Fass, 1991); comprehension of predicative metaphors using knowledge about conceptual
metaphors (Martin, 1992); and connectionist implementation of nominal metaphor compre-
hension (Thomas & Mareschal, 2001) and adjective metaphors (Weber, 1991).
This study essentially differs from these computational studies in that they did not test
the validity of their computational models in a systematic way; they provided only a small
number of examples, whose plausibility was judged on the basis of the researcher’s insight.
The reason behind this drawback was that the lexical, semantic, or metaphorical knowledge
used in these studies had to be manually coded by the researchers, and therefore, was small
in size.
In recent years, however, very large corpora have become easily available and corpus-
based computational studies on metaphor have been conducted. One corpus-based approach
to metaphor is to automatically build a large knowledge base on conceptual metaphor,
which is used for comprehending predicative metaphors (Martin, 1994; Mason, 2004),
particularly for the technical purpose of dealing with metaphors in an NLP system.
A more important and promising corpus-based approach is to develop a computa-
tional model of metaphor comprehension using a semantic space model constructed
from the statistical analysis of a huge corpus. A pioneering work that follows this
approach is Kintsch’s (2000, 2008a) computational model of metaphor comprehension
based on LSA. Kintsch applied his predication algorithm to metaphor comprehension
and demonstrated that the model can not only compute intuitively reasonable interpreta-
tions of metaphors but also account for some of the phenomena observed in metaphor
comprehension experiments, such as the nonreversibility of metaphors. However, he did
not test the model's psychological plausibility in either a direct or a systematic fashion;
in other words, he did not clarify how well the computed interpretation fits with human
data for metaphor interpretation. Lemaire and Bianco (2003) also employed LSA to
develop a computational model of referential metaphors for simulating the processing
time difference between a metaphorical reference and a literal reference. They modeled
the processing time of referential expressions as the depth of the search for those
neighbors of a referential expression that are also related to a given context. Using this
model, they showed that the simulation result was consistent with empirical data on the
processing time difference between a metaphorical reference and a literal reference
according to a different (literally supportive or metaphorically supportive) context.
However, their model has an important limitation: It cannot compute the meaning of referen-
tial metaphors (i.e., the referent of a metaphorical reference). Thus, Lemaire and Bianco
have not addressed how well their model mimics human interpretations.
In contrast, the LSA-based approach to metaphor presented in this article differs from
these studies in two ways. First, this study employs a quantitative measure of the fit between
the model (i.e., computer interpretations of metaphors) and data (i.e., human interpretations

of the same metaphors) to evaluate the degree to which the computational model imitates
human behavior concerning metaphor comprehension. Second, this study uses a computa-
tional methodology to provide an original contribution to the understanding of the cognitive
mechanisms of metaphor comprehension, rather than to simply retest or confirm the empiri-
cal findings. In other words, this study determines which of the metaphor views are more
plausible, by identifying the view that best explains the result of a simulation in which
human behavior is simulated by models that each embody a metaphor comprehension pro-
cess. In contrast, other LSA-based studies only showed whether human behavior could be
simulated by a model that may not embody any existing metaphor view. As mentioned in
Section 5.1, the observed consistency between the existing empirical findings and the com-
putational finding of this study provides some support for the usefulness of the computa-
tional methodology of this study for metaphor research.

5.4. Limitations of the simulation method

The semantic-space-based methodology presented in this article has its limitations; these
limitations delimit the aspects of the simulation results that do not extend to metaphor
views other than those tested directly in the simulation experiment.
One important limitation is that the finding obtained in this study does not address the
subtle but crucial differences among various views on the comparison process described in
Section 2.1.1. A crucial dimension along which these views differ is the kind of similarities
that are preferentially included in the common structure obtained during
the comparison process. For example, Gentner's structure mapping theory argues that the
comparison process primarily focuses on the relational similarities, whereas Holyoak’s
ACME and LISA argue that semantic and pragmatic similarities are required in the compar-
ison process. At present, the semantic-space-based methodology is less likely to provide an
appropriate technique to compare the plausibility of these views; Ramscar and Yarlett
(2003) suggested that a semantic space model such as LSA does not have sufficient model-
ing ability for analogical mapping, although it simulates appropriate patterns of analogical
retrieval. However, I am somewhat optimistic about this issue in that Kintsch (2008a) and
Mangalath, Quesada, and Kintsch (2004) show a possibility of LSA-based modeling of ana-
logical mapping.
Another limitation of the semantic-space-based methodology concerns the time-course of
metaphor comprehension. The semantic space framework is not suitable for simulating the
temporal behavior of the comprehension process, because it does not provide a method for
representing time. Although some product measures such as comprehension speed can be
simulated in this framework (e.g., Lemaire & Bianco, 2003), a fine-grained analysis of the
time-course using eye movement or functional brain mapping cannot be simulated. (Note
that this does not mean that the semantic space model cannot model cognitive processes.
The extent of time-course detail specified simply differs between the semantic space model
and other models, such as connectionist models [e.g., recurrent networks], that are much
more adequate for representing time.) This limitation may be serious for metaphor research, given
that a considerable number of metaphor studies (e.g., Gibbs, 1994; Giora, 2003) have been

conducted on the time-course of literal and metaphorical comprehension. In particular, it is
impossible to examine whether two processes run serially or in parallel in the semantic
space framework. An interesting and efficient way to deal with the time-course in the
semantic space model is to integrate connectionist models with the semantic space
model. Recurrent neural networks can exhibit dynamic temporal behavior of activation
computation, which can be analyzed as the time-course of language comprehension (e.g.,
Kawamoto, 1993; McRae, de Sa, & Seidenberg, 1997). Such neural networks generally
employ distributed representation, and the vector representation of word meaning is a good
implementation of distributed representation. If a computational method for constructing a
metaphor vector (or a sentence vector in general), such as the Categ and Compa algorithms,
can be implemented on a recurrent network, it would provide an effective way to computationally
analyze the time-course of metaphor comprehension.
Examining whether the two metaphor comprehension processes (i.e., categorization and
comparison) run serially or in parallel is a very interesting topic for further research. All
three hybrid views of metaphor tested in this article assume serial processing, but it is likely
that there is a race between the two processes running in parallel.9 One possible version of a
race model would be that metaphor comprehension starts with both processes, and the com-
parison process is suppressed later when the categorization process works properly because
of the high diversity of a metaphor or its fast access to the conventional metaphoric catego-
ries; the comparison process wins the race only when the categorization process does not
work. Although the present semantic-space-based methodology cannot provide a useful tool
for investigating such a race model, this issue is worth pursuing both as a metaphor study
and as a methodological study of semantic space models. The integration of the semantic
space model and the connectionist model mentioned earlier may offer an effective approach
to evaluating the race model.

6. Conclusion

A semantic space model, such as LSA, can provide an effective technique to simu-
late metaphor comprehension processes, such as categorization and comparison. Using
a semantic space model, this study has attempted to determine which of the existing
metaphor views, namely, the conventionality view (Bowdle & Gentner, 2005), aptness
view (Glucksberg & Haught, 2006b), or interpretive diversity view (Utsumi, 2007), is
most plausible. The simulation experiment, which comprised model selection and the-
ory testing, has shown that the interpretive diversity and conventionality views signi-
ficantly account for which of the categorization and comparison models fits better
with the empirical data; this finding indicates that both views are plausible. These
results are consistent with Utsumi’s (2007) empirical findings, and thus, they
strengthen the validity of the interpretive diversity and conventionality views. At the
same time, these results indicate the potential of the semantic-space-based computational
methodology for the cognitive study of language comprehension.

Notes

1. Bowdle and Gentner (2005) refer to this view as the "career of metaphor" hypothesis.
This term emphasizes an evolutionary aspect of metaphor comprehension. When a
metaphor is first used (i.e., it is novel), it is comprehended strictly as comparison.
However, if this metaphor is repeatedly used to convey the same meaning, then this
repeated mapping process gives rise to the creation of an abstract category that
becomes associated with the vehicle. They refer to the process through which a vehi-
cle term becomes associated with a metaphoric category as conventionalization.
Hence, conventionalization results in an evolutionary shift from comparison to catego-
rization. Note that in this article I do not use the term "career of metaphor" to refer to
their view, because this study is not directly concerned with an evolutionary aspect of
metaphor comprehension.
2. Content words are words that primarily express lexical meanings; therefore, they can
be represented as vectors in a semantic space. On the other hand, function words, such
as articles, auxiliary verbs, and pronouns, which primarily express grammatical rela-
tionships, are not represented because they have little lexical meaning. Grammatical
functions should not be attributed to the vector representation; they should be consid-
ered in the method for generating a vector representation of a sentence.
3. Formally, given that tf_ij is the frequency of the ith word w_i in the jth document (e.g.,
paragraph) and R is the number of documents, the jth element w_ij of the word vector
for the word w_i is computed by the following formulas:

   w_ij = tf_ij · (1 + Σ_{k=1}^{R} P_ik log P_ik / log R),

   P_ij = tf_ij / Σ_{k=1}^{R} tf_ik.
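A sketch of this entropy-based weighting (the function name is mine): the global weight is 0 for a word spread evenly over all documents and 1 for a word confined to a single document.

```python
import numpy as np

def entropy_weight(tf):
    """tf: (n_words, R) raw word-by-document frequency matrix.
    Returns w[i][j] = tf[i][j] * (1 + sum_k P[i][k] log P[i][k] / log R)."""
    tf = np.asarray(tf, float)
    R = tf.shape[1]
    totals = tf.sum(axis=1, keepdims=True)
    P = np.divide(tf, totals, out=np.zeros_like(tf), where=totals > 0)
    plogp = P * np.log(np.where(P > 0, P, 1.0))   # 0 * log 0 = 0
    g = 1.0 + plogp.sum(axis=1) / np.log(R)       # global entropy weight
    return tf * g[:, np.newaxis]
```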

4. The step of picking up k neighbors with the highest cosine to A is an approxima-
tion of the original predication algorithm. In the original predication algorithm, a
spreading activation network along the lines of the construction–integration model
(Kintsch, 1998) is constructed; this network comprises A, P, and m nearest neigh-
bors of P (or all other terms in the semantic space). In this network, each term is
connected to A and P with the cosine similarity between the two nodes as a
weight and is also connected to every other term by an inhibitory link. However,
Kintsch (2000, 2001) suggested that an approximation yields quite the same result
as such a self-inhibitory network, but reduces the processing cost of spreading
activation. Therefore, similar to Kintsch (2000, 2001), I have used this approxima-
tion. Note that this approximation is also supported by the recent finding (Rowe
& McNamara, 2008) that inhibition needs no negative links in the construction–
integration model.
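One plausible reading of this approximation can be sketched as follows (the centroid combination and all names here are my assumptions; the sketch covers only the neighbor-selection step, not the full spreading-activation network):

```python
import numpy as np

def cosine(u, v):
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

def predication_approx(A, P, space, m=20, k=3):
    """Pick the m nearest neighbors of the predicate vector P, keep the
    k of them most similar to the argument vector A, and return the
    centroid of A, P, and those k neighbor vectors.
    `space` is a plain word-to-vector mapping standing in for LSA."""
    neighbors = sorted(space, key=lambda w: -cosine(space[w], P))[:m]
    chosen = sorted(neighbors, key=lambda w: -cosine(space[w], A))[:k]
    return np.mean([A, P] + [space[w] for w in chosen], axis=0)
```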

5. Table 3 (and other tables) lists some phrases comprising multiple words (e.g., "dis-
ease onset," "drug disaster," "blood sampling"), which appear inconsistent with the
assumption that the semantic space model can only represent vectors for single words.
However, the Japanese translations of these phrases are single words (e.g.,
"hatsubyo," "yakugai," "saiketsu"), and thus, an inconsistency does not actually
occur.
6. These values are computed by using the semantic space employed in the simulation
experiment, which will be presented in Section 4. The Categ algorithm computed a
sentence vector in Fig. 1 (and also in Fig. 2) with m = 20 and k = 3, whereas the
Compa algorithm computed a sentence vector with k = 3. Kintsch (2001) suggests
that these parameter values work effectively for literal sentences.
7. Using some examples, Kintsch (2001) also showed that the predication algorithm
works well as a model of categorization. For example, he demonstrated that the
vector for "A pelican is a bird" computed by the predication algorithm became
more similar to the features related to bird (e.g., sing beautifully) and less similar
to the features irrelevant to bird (e.g., eat fish and sea) than the original vector of
pelican.
8. In this article, I use a generic notation without a subscript indicating the algorithm
(e.g., θ instead of θ_cat and θ_com, and q instead of q_cat and q_com), if the description is
applicable to both algorithms or models.
9. I thank one of the reviewers for suggesting the possibility of a race between
categorization and comparison.

Acknowledgments

This study was supported by Grants-in-Aid for Scientific Research C (No. 17500171 and
No. 20500234) from the Ministry of Education, Culture, Sports, Science and Technology, Japan. I
thank the associate editor Danielle S. McNamara and four anonymous reviewers for their
insightful comments and suggestions, which helped me improve the article.

References

Adelman, J., & Brown, G. (2008). Modeling lexical decision: The form of frequency and diversity effects.
Psychological Review, 115, 214–229.
Adelman, J., Brown, G., & Quesada, J. (2006). Contextual diversity, not word frequency, determines word-
naming and lexical decision times. Psychological Science, 17, 814–823.
Barsalou, L. (1999). Perceptual symbol systems. Behavioral and Brain Sciences, 22, 577–660.
Barsalou, L. (2008). Grounded cognition. Annual Review of Psychology, 59, 617–645.
Bishop, C. (2006). Pattern recognition and machine learning. New York: Springer.
Blasko, D., & Connine, C. (1993). Effects of familiarity and aptness on metaphor understanding. Journal of
Experimental Psychology: Learning, Memory, and Cognition, 19(2), 295–308.
Bowdle, B., & Gentner, D. (2005). The career of metaphor. Psychological Review, 112, 193–216.

Cameron, L. (2008). Metaphor and talk. In R. Gibbs (Ed.), The Cambridge handbook of metaphor and thought
(pp. 197–211). New York: Cambridge University Press.
Carston, R. (2002). Thoughts and utterances: The pragmatics of explicit communication. Oxford, England:
Blackwell.
Chiappe, D., & Kennedy, J. (1999). Aptness predicts preference for metaphors or similes, as well as recall bias.
Psychonomic Bulletin & Review, 6, 668–676.
Chiappe, D., Kennedy, J., & Smykowski, T. (2003). Reversibility, aptness, and the conventionality of metaphors
and similes. Metaphor and Symbol, 18, 85–105.
Clausner, T., & Croft, W. (1997). Productivity and schematicity in metaphors. Cognitive Science, 21,
247–282.
Cover, T., & Thomas, J. (2006). Elements of information theory (2nd ed.). Hoboken, NJ: John Wiley & Sons.
Deerwester, S., Dumais, S., Furnas, G., Landauer, T., & Harshman, R. (1990). Indexing by latent semantic
analysis. Journal of the American Society for Information Science, 41, 391–407.
Fass, D. (1991). Met*: A method for discriminating metonymy and metaphor by computer. Computational
Linguistics, 17, 49–90.
Foltz, P., Kintsch, W., & Landauer, T. (1998). The measurement of textual coherence with latent semantic
analysis. Discourse Processes, 25, 285–307.
Gentner, D. (1983). Structure mapping: A theoretical framework for analogy. Cognitive Science, 7, 155–170.
Gentner, D. (1989). The mechanisms of analogical learning. In S. Vosniadou & A. Ortony (Eds.), Similarity and
analogical reasoning (pp. 199–241). Cambridge, England: Cambridge University Press.
Gentner, D., & Bowdle, B. (2008). Metaphor as structure mapping. In R. Gibbs (Ed.), The Cambridge handbook
of metaphor and thought (pp. 109–128). New York: Cambridge University Press.
Gentner, D., & Markman, A. (1997). Structure mapping in analogy and similarity. American Psychologist, 52,
45–56.
Gentner, D., & Wolff, P. (1997). Alignment in the processing of metaphor. Journal of Memory and Language,
37, 331–355.
Gentner, D., Bowdle, B., Wolff, P., & Boronat, C. (2001). Metaphor is like analogy. In D. Gentner, K. Holyoak,
& B. Kokinov (Eds.), Analogical mind: Perspectives from cognitive science (pp. 199–253). Cambridge, MA:
MIT Press.
Gibbs, R. (1994). The poetics of mind. Cambridge, England: Cambridge University Press.
Gibbs, R. (2006). Embodiment and cognitive science. New York: Cambridge University Press.
Gibbs, R. W. (Ed.). (2008). The Cambridge handbook of metaphor and thought. New York: Cambridge
University Press.
Giora, R. (2003). On our mind: Salience, context, and figurative language. New York: Oxford University Press.
Glenberg, A., & Robertson, D. (2000). Symbol grounding and meaning: A comparison of high-dimensional and
embodied theories of meaning. Journal of Memory and Language, 43, 379–401.
Glucksberg, S. (2001). Understanding figurative language: From metaphors to idioms. New York: Oxford
University Press.
Glucksberg, S. (2003). The psycholinguistics of metaphor. Trends in Cognitive Sciences, 7, 92–96.
Glucksberg, S., & Haught, C. (2006a). Can Florida become like the next Florida? When metaphoric comparisons
fail. Psychological Science, 17, 935–938.
Glucksberg, S., & Haught, C. (2006b). On the relation between metaphor and simile: When comparison fails.
Mind & Language, 21, 360–378.
Glucksberg, S., & Keysar, B. (1990). Understanding metaphorical comparisons: Beyond similarity. Psycho-
logical Review, 97, 3–18.
Glucksberg, S., & McGlone, M. (1999). When love is not a journey: What metaphors mean. Journal of Pragmat-
ics, 31, 1541–1558.
Glucksberg, S., McGlone, M., & Manfredi, D. (1997). Property attribution in metaphor comprehension. Journal
of Memory and Language, 36, 50–67.
Grady, J. (1997). Theories are buildings revisited. Cognitive Linguistics, 8, 267–290.
Grady, J. (2005). Primary metaphors as inputs to conceptual integration. Journal of Pragmatics, 37, 1595–1614.
Holyoak, K. J., & Thagard, P. (1989). Analogical mapping by constraint satisfaction. Cognitive Science, 13,
295–355.
Hu, B., Kalfoglou, Y., Alani, H., Dupplaw, D., Lewis, P., & Shadbolt, N. (2006). Semantic metrics. In S. Staab
& V. Svatek (Eds.), Proceedings of the 15th international conference on knowledge engineering and knowl-
edge management (EKAW 2006) (pp. 166–181). Berlin: Springer.
Huettig, F., Quinlan, P. T., McDonald, S. A., & Altmann, G. T. (2006). Models of high-dimensional
semantic space predict language-mediated eye movements in the visual world. Acta Psychologica, 121,
65–80.
Hummel, J. E., & Holyoak, K. J. (1997). Distributed representations of structure: A theory of analogical access
and mapping. Psychological Review, 104, 427–466.
Jones, L., & Estes, Z. (2005). Metaphor comprehension as attributive categorization. Journal of Memory and
Language, 53, 110–124.
Jones, L., & Estes, Z. (2006). Roosters, robins, and alarm clocks: Aptness and conventionality in metaphor
comprehension. Journal of Memory and Language, 55, 18–32.
Jones, M. N., Kintsch, W., & Mewhort, D. J. (2006). High-dimensional semantic space accounts of priming.
Journal of Memory and Language, 55, 534–552.
Kawamoto, A. (1993). Nonlinear dynamics in the resolution of lexical ambiguity: A parallel distributed process-
ing account. Journal of Memory and Language, 32, 474–516.
Keysar, B., Shen, Y., Glucksberg, S., & Horton, W. (2000). Conventional language: How metaphorical is it?
Journal of Memory and Language, 43, 576–593.
Kintsch, W. (1998). Comprehension: A paradigm for cognition. New York: Cambridge University Press.
Kintsch, W. (2000). Metaphor comprehension: A computational theory. Psychonomic Bulletin & Review, 7,
257–266.
Kintsch, W. (2001). Predication. Cognitive Science, 25, 173–202.
Kintsch, W. (2007). Meaning in context. In T. Landauer, D. McNamara, S. Dennis, & W. Kintsch (Eds.), Hand-
book of latent semantic analysis (pp. 89–105). Mahwah, NJ: Lawrence Erlbaum Associates.
Kintsch, W. (2008a). How the mind computes the meaning of metaphor: A simulation based on LSA. In
R. Gibbs (Ed.), The Cambridge handbook of metaphor and thought (pp. 129–142). New York: Cambridge
University Press.
Kintsch, W. (2008b). Symbol systems and perceptual representations. In M. de Vega, A. Glenberg, & A.
Graesser (Eds.), Symbols and embodiment: Debates on meaning and cognition (pp. 145–163). New York:
Oxford University Press.
Kövecses, Z. (2002). Metaphor: A practical introduction. New York: Oxford University Press.
Lakoff, G., & Johnson, M. (1980). Metaphors we live by. Chicago: The University of Chicago Press.
Lakoff, G., & Johnson, M. (1999). Philosophy in the flesh: The embodied mind and its challenge to western
thought. New York: Basic Books.
Landauer, T. K., & Dumais, S. T. (1997). A solution to Plato’s problem: The latent semantic analysis theory of
the acquisition, induction, and representation of knowledge. Psychological Review, 104, 211–240.
Landauer, T. K., Laham, D., & Foltz, P. W. (2003). Automated scoring and annotation of essays with the Intelli-
gent Essay Assessor. In M. Shermis & J. Burstein (Eds.), Automated essay scoring: A cross-disciplinary
perspective (pp. 87–112). Mahwah, NJ: Lawrence Erlbaum Associates.
Landauer, T., McNamara, D., Dennis, S., & Kintsch, W. (2007). Handbook of latent semantic analysis. Mahwah,
NJ: Lawrence Erlbaum Associates.
Larkey, L. B., & Love, B. C. (2003). CAB: Connectionist analogy builder. Cognitive Science, 27, 781–
794.
Legendre, P., & Legendre, L. (1998). Numerical ecology (2nd English ed.). Amsterdam: Elsevier Science
B.V.
Lemaire, B., & Bianco, M. (2003). Contextual effects on metaphor comprehension: Experiment and simulation.
In F. Detje, D. Dörner, & H. Schaub (Eds.), Proceedings of the 5th international conference on cognitive
modeling (ICCM2003) (pp. 153–158). Germany: Universitäts-Verlag Bamberg.
Lin, D. (1998). Automatic retrieval and clustering of similar words. In Proceedings of the 17th international
conference on computational linguistics and the 36th annual meeting of the Association for Computational
Linguistics (pp. 768–774). Montreal, Canada: ACL.
Louwerse, M. (2007). Symbolic or embodied representations: A case for symbol interdependency. In
T. Landauer, D. McNamara, S. Dennis, & W. Kintsch (Eds.), Handbook of latent semantic analysis (pp.
107–120). Mahwah, NJ: Lawrence Erlbaum Associates.
Louwerse, M. (2008). Embodied relations are encoded in language. Psychonomic Bulletin & Review, 15, 838–
844.
Louwerse, M., & Van Peer, W. (2009). How cognitive is cognitive poetics? The interaction between symbolic
and embodied cognition. In G. Brone & J. Vandaele (Eds.), Cognitive poetics: Goals, gains and gaps (pp.
423–444). Berlin: Mouton de Gruyter.
Lowe, W., & McDonald, S. (2000). The direct route: Mediated priming in semantic space. In L. Gleitman &
A. Joshi (Eds.), Proceedings of the 22nd annual meeting of the cognitive science society (pp. 806–811).
Austin, TX: Cognitive Science Society.
Mangalath, P., Quesada, J., & Kintsch, W. (2004). Analogy-making as predication using relational information
and LSA vectors. In K. Forbus, D. Gentner, & T. Regier (Eds.), Proceedings of the 26th Annual Meeting of
the Cognitive Science Society (CogSci2004) (p. 1623). Austin, TX: Cognitive Science Society.
Manning, C. D., & Schütze, H. (1999). Foundations of statistical natural language processing. Cambridge, MA:
MIT Press.
Marschark, M., & Hunt, R. (1985). On memory for metaphor. Memory and Cognition, 13, 413–424.
Marschark, M., Katz, A., & Paivio, A. (1983). Dimensions of metaphor. Journal of Psycholinguistic Research,
12(1), 17–40.
Martin, J. (1992). Computer understanding of conventional metaphoric language. Cognitive Science, 16,
233–270.
Martin, J. (1994). Metabank: A knowledge-base of metaphoric language conventions. Computational Intelli-
gence, 10, 134–149.
Mason, Z. (2004). CorMet: A computational, corpus-based conventional metaphor extraction system. Computa-
tional Linguistics, 30, 23–44.
McClelland, J. (2009). The place of modeling in cognitive science. Topics in Cognitive Science, 1, 11–38.
McRae, K., de Sa, V. R., & Seidenberg, M. S. (1997). On the nature and scope of featural representations of
word meaning. Journal of Experimental Psychology: General, 126(2), 99–130.
Murphy, G. (1996). On metaphoric representation. Cognition, 60, 173–204.
Padó, S., & Lapata, M. (2007). Dependency-based construction of semantic space models. Computational
Linguistics, 33, 161–199.
Paivio, A., & Walsh, M. (1993). Psychological processes in metaphor comprehension and memory. In A. Ortony
(Ed.), Metaphor and thought (2nd ed., pp. 307–328). Cambridge, England: Cambridge University
Press.
Pecher, D., & Zwaan, R. (2005). Grounding cognition: The role of perception and action in memory, language
and thinking. Cambridge, England: Cambridge University Press.
Pexman, P., Lupker, S., & Hino, Y. (2002). The impact of feedback semantics in visual word recognition: Num-
ber-of-features effects in lexical decision and naming tasks. Psychonomic Bulletin & Review, 9, 542–549.
Pexman, P., Holyk, G., & Monfils, M.-H. (2003). Number-of-features effects and semantic processing. Memory
& Cognition, 31, 842–855.
Ramscar, M., & Yarlett, D. (2003). Semantic grounding in models of analogy: An environmental approach.
Cognitive Science, 27, 41–71.
Rodd, J., Gaskell, G., & Marslen-Wilson, W. (2002). Making sense of semantic ambiguity: Semantic
competition in lexical access. Journal of Memory and Language, 46, 245–266.
Rowe, M., & McNamara, D. (2008). Inhibition needs no negativity: Negative links in the construction-
integration model. In B. Love, K. McRae, & V. Sloutsky (Eds.), Proceedings of the 30th Annual Meeting of
the Cognitive Science Society (CogSci2008) (pp. 1777–1782). Austin, TX: Cognitive Science Society.
Schütze, H. (1998). Automatic word sense discrimination. Computational Linguistics, 24, 97–123.
Shahnaz, F., Berry, M., Pauca, V., & Plemmons, R. (2006). Document clustering using nonnegative matrix
factorization. Information Processing and Management, 42, 373–386.
Smith, E., Osherson, D., Rips, L., & Keane, M. (1988). Combining prototypes: A selective modification model.
Cognitive Science, 12, 485–527.
Sperber, D., & Wilson, D. (1995). Relevance: Communication and cognition (2nd ed.). Oxford, England:
Blackwell.
Sperber, D., & Wilson, D. (2008). A deflationary account of metaphors. In R. Gibbs (Ed.), The Cambridge
handbook of metaphor and thought (pp. 84–105). New York: Cambridge University Press.
Thomas, M., & Mareschal, D. (2001). Metaphor as categorization: A connectionist implementation. Metaphor
and Symbol, 16, 5–27.
Tversky, A. (1977). Features of similarity. Psychological Review, 84, 327–352.
Utsumi, A. (2005). The role of feature emergence in metaphor appreciation. Metaphor and Symbol, 20,
151–172.
Utsumi, A. (2007). Interpretive diversity explains metaphor-simile distinction. Metaphor and Symbol, 22,
291–312.
de Vega, M., Glenberg, A., & Graesser, A. (2008). Symbols and embodiment: Debates on meaning and cogni-
tion. New York: Oxford University Press.
Wagenmakers, E., & Farrell, S. (2004). AIC model selection using Akaike weights. Psychonomic Bulletin &
Review, 11, 192–196.
Weber, S. (1991). A connectionist model of literal and figurative adjective noun combinations. In D. Fass,
E. Hinkelman, & J. Martin (Eds.), Proceedings of the IJCAI workshop on computational approaches to non-
literal language: Metaphor, metonymy, idioms, speech acts, implicature (pp. 151–160). Sydney, Australia:
IJCAI.
Widdows, D. (2004). Geometry and meaning. Stanford, CA: CSLI Publications.
Wolff, P., & Gentner, D. (2000). Evidence for role-neutral initial processing of metaphors. Journal of Experi-
mental Psychology: Learning, Memory, and Cognition, 26, 529–541.
Zwaan, R., & Yaxley, R. (2003). Spatial iconicity affects semantic relatedness judgments. Psychonomic Bulletin
& Review, 10, 954–958.

Appendix A: Procedure for obtaining human interpretation data

In this appendix, I describe in detail the procedure of the psychological experiment
(Experiment 2) conducted by Utsumi (2007), which is the source of the human interpretation
data for this study.
The experiment comprised metaphor comprehension, simile comprehension, and the rating
of topic–vehicle pairs (Utsumi, 2007). (Simile comprehension is not described here because
this study did not use any results related to it.) In the metaphor comprehension experiment,
42 undergraduate students of Japan Women's University, who were all native speakers of
Japanese, were assigned two metaphors that shared neither vehicles nor topics (e.g., ‘‘Anger
is the sea’’ and ‘‘Sleep is a storm’’) from each of the 10 groups; therefore, each participant
comprehended 20 metaphors from the total of 40. The metaphors of each group were
counterbalanced such that each was assigned to 21
participants. Participants performed three subtasks, namely, a feature listing task, a free
description task, and a comprehensibility rating task; however, this study used only the data
obtained in the feature listing task. In the feature listing task, participants were asked to
consider the meaning of each metaphor and to list, as words or phrases, three or more
features (i.e., meanings) of the topic that they thought were involved in the interpretation of
the metaphor.
After the metaphor comprehension experiment, the following preprocessing was
conducted for each metaphor M to obtain the final list of metaphorical meanings W(M). First,
the features generated for M in the metaphor comprehension experiment were compiled into
a list. Then, closely related words or phrases in this list were treated as the same feature if
they met any of the following four criteria: (a)
they belonged to the same deepest category of a Japanese thesaurus Bunrui Goi Hyo (e.g.,
kakasenai and hitsuyoufukaketsu in Japanese, both of which mean being indispensable); (b)
they shared the same root form (e.g., red [akai in Japanese] and redness [akasa in
Japanese]); (c) they differed only in degree because of an intensive modifier (e.g., frightened
and quite frightened); or (d) a dictionary description of one word included the other word or
phrase (e.g., lie and not true). After this feature combination process, any feature mentioned
by only one participant was eliminated from the list. The list of features amended by this
preprocessing was used as the list of meanings W(M) in this study.
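As an illustration, the feature combination and filtering steps above can be sketched as follows. This is a minimal sketch, not the procedure actually used: the `same_feature` predicate is a hypothetical placeholder for the four criteria (shared thesaurus category, shared root form, intensifier-only difference, dictionary inclusion), which in the original study were checked by hand against Bunrui Goi Hyo and a dictionary.

```python
from collections import Counter

def merge_and_filter_features(listed_features, same_feature):
    """Combine closely related features and drop idiosyncratic ones.

    listed_features: list of (participant_id, feature_string) pairs.
    same_feature: predicate standing in for the four merging criteria.
    Returns the final meaning list W(M) as canonical feature strings,
    keeping only features listed by at least two participants.
    """
    canonical = []          # one representative string per merged feature
    counts = Counter()      # distinct participants per merged feature
    seen = set()            # (participant, canonical) pairs already counted
    for pid, feat in listed_features:
        # Map the raw feature onto an existing canonical form, if any.
        for rep in canonical:
            if same_feature(feat, rep):
                feat = rep
                break
        else:
            canonical.append(feat)
        # Count each participant at most once per merged feature.
        if (pid, feat) not in seen:
            seen.add(pid)
            counts[feat] += 1
    return [f for f in canonical if counts[f] >= 2]

# Toy usage with a crude stand-in predicate (shared first letters,
# loosely mimicking criterion b, shared root form):
features = [(1, "red"), (2, "redness"), (3, "blue")]
same = lambda a, b: a.startswith(b[:3]) or b.startswith(a[:3])
print(merge_and_filter_features(features, same))  # -> ['red']
```

Here ‘‘red’’ and ‘‘redness’’ merge into one feature mentioned by two participants and survive, while ‘‘blue,’’ mentioned by only one participant, is eliminated.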
The rating experiment comprised three rating tasks (vehicle conventionality, metaphor
aptness, and similarity) for metaphors and similes (Utsumi, 2007). The simulation experi-
ment in this study required only the conventionality and aptness ratings of metaphor. For
vehicle conventionality and metaphor aptness, 144 Japanese undergraduate students of the
University of Electro-Communications were recruited and assigned 10 metaphors. One half
of these students performed the conventionality rating task, whereas the other half per-
formed the aptness rating task. In the conventionality rating task, participants were given a
list of the vehicle terms of the metaphors, each paired with its most salient meaning, and were asked to rate
how conventional each meaning was as an alternative sense of the vehicle term on a scale of
1 (very novel) to 7 (very conventional). For example, as the meaning ephemeral was listed
by the largest number of participants for ‘‘Death is the fog,’’ the participants of this task
were asked the following question: ‘‘When we say that something (X) is the fog, how
conventional is the interpretation that this is something (X) that is ephemeral?’’ This method
of assessing vehicle conventionality was identical to the method used by Bowdle and
Gentner (2005). In the aptness rating task, participants were asked to rate how apt each
metaphor was, on a 7-point scale ranging from 1 (not at all apt) to 7 (extremely apt).
Following previous research (Jones & Estes, 2006), this study defined aptness as the extent
to which the metaphor captured the important features of the topic. These ratings were then
averaged across participants for each metaphor.
