
21

MEASURING REASONING ABILITY


OLIVER WILHELM
DEDUCTIVE AND INDUCTIVE REASONING
Reasoning is a thinking activity that is of crucial importance throughout our lives. Consequently, the ability to reason is of central importance in all major theories of intelligence structure.
Whenever we think about the causes of events
and actions, when we pursue discourse, when we
evaluate assumptions and expectations based on
our prior knowledge, and when we develop ideas
and plans, the ability to reason is pivotal.
The verb reason is associated with various
highly overlapping meanings. Justifying and
supporting concepts and ideas is as important as
convincing others through good reasons and the
discovery of conclusions through the analysis
of discourse. In modern psychology, two or three forms of reasoning are usually distinguished. In
deductive reasoning, we derive a conclusion that
is necessarily true if the premises are true. In
inductive reasoning, we try to infer information
by increasing the semantic content when pro-
ceeding from the premises to the conclusion.
Sometimes, a third form of reasoning is distin-
guished (Magnani, 2001). In abductive reason-
ing, we reason from a fact to the action that has
caused it. Abductive reasoning has not been
thoroughly investigated in intelligence research,
and we can consider abductive reasoning to be a
subset and mixture of inductive and deductive
reasoning. In the remainder of this chapter,
abductive reasoning will not be discussed.
In deduction, the premises necessarily entail
or imply the conclusion. It is impossible that the
premises are true and that the conclusion is
false. Three perspectives on deduction can be
distinguished. From a syntactic perspective, the
relation between premises and conclusion is
derivable independent of the instantiation of the
premises. The criterion for the correctness of an
argument is its derivability from the premises.
From a semantic perspective, the conclusion is
true in any possible model of the premises. The
criterion for the correctness of an argument is its
validity. From a pragmatic perspective, there is
a learned or acquired relation between premises
and conclusion that has no logical necessity.
The criterion to assess the quality of an argu-
ment is its utility.
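The semantic criterion — a conclusion is valid when it is true in every model of the premises — can be made concrete with a small enumeration sketch. This is illustrative only; the function and example names are my own, not from the chapter:

```python
from itertools import product

def entails(premises, conclusion, variables):
    """Semantic validity: the conclusion is true in every model of the premises."""
    for values in product([False, True], repeat=len(variables)):
        model = dict(zip(variables, values))
        if all(p(model) for p in premises) and not conclusion(model):
            return False  # a model of the premises refutes the conclusion
    return True

p = lambda m: m["p"]
q = lambda m: m["q"]
p_implies_q = lambda m: (not m["p"]) or m["q"]

print(entails([p, p_implies_q], q, ["p", "q"]))   # modus ponens: True
print(entails([q, p_implies_q], p, ["p", "q"]))   # affirming the consequent: False
```

The second call shows why "affirming the consequent" is a fallacy: the model p = false, q = true satisfies both premises but not the conclusion.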
These perspectives cannot be applied to
induction because the criteria to assess conclu-
sions must be different. Carnap's formalization
has attracted considerable attention when it
comes to distinguishing forms of induction.
Carnap (1971) classifies inductive arguments as
enumerative and eliminative. In enumerative
induction, the premises assert something as true
of a finite number of specific objects or subjects,
and the conclusion infers that what is true for
the finite number is true of all such objects or
subjects. In eliminative induction, confirmation
proceeds by falsifying competing alternative
hypotheses. The problem with induction is that
we cannot prove for any inductive inference
373
21-Wilhelm.qxd 9/8/2004 5:09 PM Page 373
with true premises that the inference provides
us with a true conclusion (Stegmüller, 1996).
Nevertheless, induction is of crucial importance
in science whenever we talk about discovery.
However, the testing of theories is a completely
deductive enterprise. Induction and deduction
hence both have their place in science, and the
ability to draw good inductive and deductive
inferences is of major importance in real life.
Historically, logic was primarily established
through Aristotle. Although Aristotle viewed
logic as the proper form of scientific investiga-
tion, he used the term as equivalent to verbal
reasoning. The syllogistic form of reasoning, as
established through Aristotle, dominated logic up
until the middle of the 19th century. Throughout
the second half of the 19th century, there was a
rapid development of logic as a scientific disci-
pline. Philosophers such as George Boole (1847)
and Gottlob Frege (1879) started to develop for-
malizations of deductive logic as a language that
went beyond the idea that logic should reflect
common sense and sound reasoning. In a nut-
shell, logic was the manipulations of symbols by
virtue of a set of rules. The logical truth of an
argument was hence no longer assessed by agree-
ment with some experts or through acceptance by
common sense. Whether logical reasoning was
correct could then be assessed by agreement with
a calculus. In our historical excursion, we need to
note, however, that George Boole did believe that
the laws of thinking and the rules of logic are
equivalent, and John Stuart Mill thought that the
rules of logic are generalizations of forms of
conclusions considered true by humans.
Apparently, from the early days of logic until now, the puzzle remains that although humans invented logic, they are not able or willing to follow its standards in all instances. Humans are
vulnerable to errors in reasoning and do not pro-
ceed consistently in deriving conclusions. The
research on biases, contents, and strategies in
reasoning has a long tradition in psychology.
For example, Störring (1908) investigated
thought processes in syllogistic reasoning and
distinguished various strategies, Wilkins (1929)
manipulated test content and observed effects
on test properties, and Woodworth and Sells
(1935) conducted outstanding research on a
particular bias in syllogistic reasoning labeled
the atmosphere effect.
In contemporary psychological research on
reasoning, so-called dual-process theories domi-
nate. In these theories, an associative, heuristic,
implicit, experiential, and intuitive system of
information processing is contrasted with a
second rule-based, analytical, explicit, and
rational system (Epstein, 1994; Evans, 1989;
Hammond, 1996; Sloman, 1996; Stanovich, 1999).
Most of the biases found in reasoning, judgment,
and decision making can be located within
the first system. A reasoning competence and
propensity to think rationally can be located
within the second system. In considering individ-
ual differences in reasoning ability, the interest is
primarily on differences within the second sys-
tem. Most of the differences could reflect indi-
vidual differences in available resources for the
computational work to be accomplished to obtain
a correct response. An additional source of indi-
vidual differences might be the probability with
which individuals deliberately use the second
system when responding to specific problems.
The discussion of individual differences in
reasoning ability starts with the assertion that
there are individual differences in the ability
to reason according to some rational standard.
Humans can be rational in principle, but they
fail to a varying degree in practice. The princi-
ple governing this rationality is that people
accept inferences as valid if there is no mental
representation contradicting the conclusion
(Johnson-Laird & Byrne, 1993; Stanovich,
1999). Individual differences from this per-
spective primarily arise from restrictions in the
ability to create and manipulate mental repre-
sentations. In other words, depending on our
cognitive apparatus, we are able to find a good,
or the correct, answer to some reasoning prob-
lems but not to other more difficult problems.
In measuring reasoning ability, it is conse-
quently assumed that individuals can think
rationally but that there are individual differ-
ences in how well people can do so.
THOUGHT PROCESSES IN REASONING
There are several competing theories for the
description and explanation of reasoning
processes. The theories are distinguished by the
broadness of the phenomena they can explain
374 HANDBOOK OF UNDERSTANDING AND MEASURING INTELLIGENCE
and how profound the proposed explanations
are. They are also different with respect to how
much experimental research was done to inves-
tigate them and how much supportive evidence
was collected. The theory of mental models
(Johnson-Laird & Byrne, 1991) is one outstand-
ing effort in describing and explaining what
people do when they reason, and this theory will
be described in more detail after briefly review-
ing more specific accounts of deductive and
inductive reasoning, respectively.
Besides many more specific accounts of rea-
soning, the mental logic approach to reasoning
has many adherents and was applied to a broad
range of reasoning problems (Rips, 1994).
According to mental logic theories, individuals
apply schemata of inference when they reason.
Errors in reasoning occur when inference
schemata are unavailable, corrupted, or cannot
be applied. More complex inferences are
accomplished by compiling several elemental
schemata. The inference schemata in various
mental logic theories are different from each
other, from logical terms in natural language,
and from logical terms in formal logic. The
psychology of proof by Rips (1994) is the
most elaborated and sophisticated theory of
mental logic. However, the mental model theory
covers a broader range of phenomena than
mental logic accounts do. In addition, the exper-
imental support seems to be in favor of the
mental models theory. Finally, both sets of theories are closely related to each other, the major difference being that the mental model approach deals with reasoning on the semantic level, whereas mental logic theories investigate reasoning on the syntactic level.
Analogical reasoning is a subset of inductive
thinking that has received considerable attention
in cognitive psychology. For example, Holyoak
and Thagard (1997) developed a multiconstraint
theory of analogical reasoning. Three constraints are claimed to create coherence in analogical thought: similarity between the concepts involved; structural parallels, specifically isomorphism, between the functions in the source and target domains; and guidance by the reasoner's goals. This work was recently
extended. Hummel and Holyoak (2003) devel-
oped a symbolic connectionist model of rela-
tional inference. The theory suggests that
distributed symbolic representations are the
basis of relational reasoning in working memory.
There is no doubt substantial promise in extend-
ing these accounts of inductive thinking to
available reasoning measures. So far, there is not
enough experimental evidence available allow-
ing derivation of predictions of item difficulties
(but see Andrews & Halford, 2002), and there is
not enough variability in the application of the
theories to allow a broad application in pre-
dicting psychometric properties of reasoning
tests in general. To illustrate the character and
promise of theories of reasoning processes,
I will limit the exposition to the mental model
theory. It is hoped that the future will bring an
integration of theories of inductive and deduc-
tive reasoning along with strong links to
theories of working memory.
The mental model theory has been exten-
sively applied to deductive reasoning (Johnson-
Laird, 2001; Johnson-Laird & Byrne, 1991)
and inductive thinking (Johnson-Laird, 1994b).
Briefly, mental model theory views thinking
as the manipulation of models (Craik, 1943).
These models are analogous representations,
meaning that the structure of the models corre-
sponds to what they represent. Each entity is
represented by an individual token in a model.
Properties of and relations between entities
are represented by properties of and relations
between tokens, respectively. Negations of
atomic propositions are represented as annota-
tions of tokens. Information can be represented
implicitly, and the implicit status of a model is
part of the representation. If necessary, implicit
representations can be fleshed out by simple
mechanisms. The epistemic status of a model
is represented as a propositional annotation in
the model.
A major determinant of the difficulty of rea-
soning tasks is the number of mental models
that are compatible with the premises. The premises "A is left of B. B is left of C. C is left of D. D is left of E." can be easily integrated into one mental model:
A B C D E
This mental model supports conclusions such as "C is left of E." However, the premises "A is left of B. B is left of C. C is left of E. D is left of E." call for the construction of two mental models. The first mental model places C left of D, whereas the second mental model places D left of C.
Model 1: A B C D E
Model 2: A B D C E
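Which arrangements each premise set allows can be checked by brute force. A minimal sketch (the function name `orderings` is my own): note that exhaustive logical enumeration gives all consistent arrangements, of which the chapter's two models capture the critical ambiguity in the relative position of C and D:

```python
from itertools import permutations

def orderings(premises):
    """All left-to-right arrangements of A-E consistent with 'x left of y' premises."""
    return ["".join(seq) for seq in permutations("ABCDE")
            if all(seq.index(x) < seq.index(y) for x, y in premises)]

# First premise set: fully determinate, a single arrangement.
print(orderings([("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]))  # ['ABCDE']

# Second premise set: the position of D is left open.
multi = orderings([("A", "B"), ("B", "C"), ("C", "E"), ("D", "E")])
print("ABCDE" in multi, "ABDCE" in multi, len(multi) > 1)  # True True True
```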
Both models are compatible with the
premises. Generally, the more mental models
that are compatible with the premises of a rea-
soning task, the harder the task will be. This pre-
diction has been confirmed with a wide variety
of reasoning tasks, including syllogisms, spatial
and temporal reasoning, propositional reason-
ing, and probabilistic reasoning (Johnson-Laird,
1994a; Johnson-Laird & Byrne, 1991). In estab-
lished measures of reasoning ability, it is hard or
impossible to specify the nature and number of
mental models a given item calls for (Yang &
Johnson-Laird, 2001). This is because test con-
struction is usually driven by applying psycho-
metric criteria and not by creating indicators
through the strictly theory-driven derivation
from a cognitive model of thought processes. In
specifically constructed measures, on the other
hand, the nature and number of mental models
that participants need to construct in order to
solve an item correctly can be manipulated. The
empirical study presented later in this chapter
mixes measures with and without explicit
manipulation of the number of mental models
required for successful solution.
Inductive and deductive reasoning processes
go through the same three stages of information
processing. In the first stage, the premises are
understood. Knowledge in general and literacy
in dealing with the stimuli are critical in build-
ing a representation of the problem. Frequently,
the problem will be verbal, and hence reading
comprehension will be an important aspect of
the creation of representations. However, it is
well known that strategies can have an effect on
encoding. In solving syllogisms, subgroups of
individuals might follow different strategies for
creating an initial representation of problem
content (Ford, 1995; Stenning & Oberlander,
1995; Sternberg & Turner, 1981). As a result,
specific groups of items are hard for one sub-
group but not for another, whereas for a second
group of items, the reverse is true.
In the second stage, a parsimonious description
of the constructed model(s) is attempted. If
the task is deductive reasoning, the resulting
construction should include something that
was not explicitly evident in the premises.
Technically, no meaning is created in deduction.
It is all implicit in the premises. Experientially,
deductive conclusions do not seem to be com-
pletely obvious and apparent. If no such conclu-
sion can be found, the answer to the problem can
be that there is no conclusion to the problem. If
the task is inductive reasoning, the resulting con-
struction allows a conclusion that increases the
semantic information of the premises. Hence,
a tentative hypothesis is constructed that implies
a semantically stronger description than evident
in the premises. However, if background knowl-
edge is operating besides the premises, an induc-
tive problem might turn into an enthymemea
deduction in which not all premises are explicit.
Many of the so-called inductive tasks used in
intelligence research technically might well
be classified as enthymemes. Frequently used
number-series problems could qualify as
enthymemes. If the premises of such a number-series task are explicitly stated, for example as "Continue the number series 1, 3, 5, 7, 9, 11 by one more number," "The operations you can use are +, /, and *, and all results are positive integers," and "Rules indicate regularities in proceeding through the number series, and these regularities can include rule-based changes to the rule," there might be just one option that meaningfully continues the sequence: 13.
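Under premises like these, the continuation can be recovered mechanically. A minimal sketch, assuming the intended regularity is a constant-difference rule:

```python
series = [1, 3, 5, 7, 9, 11]

# Differences between consecutive terms; a single repeated value
# indicates a constant-addition rule (here, +2).
diffs = [b - a for a, b in zip(series, series[1:])]
assert len(set(diffs)) == 1

print(series[-1] + diffs[0])  # 13
```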
In the third stage, models are evaluated,
maintained, modified, or rejected. If the task is
deductive reasoning, counterexamples to tenta-
tive conclusions are searched for. If no coun-
terexample can be found, the conclusion is
produced. If a counterexample is found, the
process goes back to Stage 2. If the task is
inductive reasoning, the conclusion adds infor-
mation to the premises. The conclusion should
be consistent with the premises and background
knowledge. Obviously, inductive conclusions
are not necessarily true. If an induction turns out
to be wrong, either the premises are false or the
induction was too strong. If a deduction turns
out to be wrong, the premises must be false.
Evidently, only the third stage is specific to
inductive and deductive reasoning, respectively.
However, errors in answering reasoning
problems can be located at any of the three
stages. The relevance of the third stage as a
primary source of errors can be debated.
Johnson-Laird (1985) argues that the search
for counterexamples is crucial for individual
differences, yet Handley, Dennis, Evans, and
Capon (2000) argue that individuals rarely
engage in a search for counterexamples. Psychometrically, syllogisms and spatial relational tasks that do not rely on a search for counterexamples are as good as or better measures of reasoning ability than items that require such a search (Wilhelm & Conrad, 1998).
Theories about reasoning processes in general
and the mental model theory in particular have
been widely and successfully applied to reason-
ing problems. Few of these applications have
considered problems from psychometric reason-
ing tasks (but see Yang & Johnson-Laird, 2001).
We will now discuss the status of reasoning abil-
ity in various models of the structure of intelli-
gence, as assessed by psychometric reasoning
tasks, and then turn to formal and empirical clas-
sifications of reasoning measures. Ideally, a gen-
eral theory of reasoning processes should govern
test construction and confirmatory data analysis.
In practice, theories of reasoning processes have
rarely been considered when creating and using
psychometric reasoning tasks.
REASONING IN VARIOUS MODELS
OF THE STRUCTURE OF INTELLIGENCE
Binet's original definition of intelligence focused on abilities of sensation, perception, and reasoning, but this definition was modified several times and ended up defining intelligence as the ability to adapt to novel situations (Binet, 1903, 1905, 1907). Structurally, Binet's as well as Ebbinghaus's (1895) earlier investigations do
not fall within the realm of factor-analytic work,
and consequently, they have been rarely dis-
cussed in this context.
Spearman's invention of tetrad analysis as a means to assess the rank of correlation matrices was the starting point of factor-analytic work (Krueger & Spearman, 1906; Spearman, 1904). Spearman's definition of general intelligence
(g) focuses on the role of educing correlates and
relations. The ability to educe relations and
correlates is best reflected in reasoning measures.
Other intelligence measures are characterized
by varying proximity to the general factor.
Reasoning measures are expected to have high g
loadings and low proportions of specific vari-
ance. The g factor is said to be precisely defined
and the core construct of human abilities
(Jensen, 1998; but see Chapter 16, this volume).
There are several more or less strict interpreta-
tions of the g factor theory (Horn & Noll, 1997).
In its strictest form, one core process is causal
for all communalities in individual differences.
In a much more relaxed form of the theory, a
general factor is supposed to capture the corre-
lations between oblique first- or second-order
factors. With respect to reasoning, Spearman
(1923) considered inductive and deductive
reasoning to be forms of syllogisms. Although
Spearman (1927) did not exclude the option of
a reasoning-specific group factor besides g, per-
formance on reasoning measures was assumed
to be primarily limited by mental energy, or g. The controversy around Spearman's theory
was initially focused on statistical and method-
ological issues, and it was in the context of new
statistical developments that Thurstone con-
tributed his theory of primary mental abilities.
Thurstone's initial work on the structure of intelligence (1938) was substantially modified and
improved by Thurstone and Thurstone (1941). In
the later work, the primary factors of Space,
Number, Verbal Comprehension, Verbal Fluency,
Memory, Perceptual Speed, and Reasoning are
distinguished. The initial distinction between
inductive and deductive reasoning was abandoned,
and the associated variances were allocated to
Reasoning, Verbal Comprehension, Number, and
Space. The Reasoning factor is marked mostly
by inductive tasks. Several of the other factors
have substantial loadings from reasoning
tasks. In a sample of eighth-grade students, the
Reasoning factor is the factor with the highest
loading on a second-order factor. Further elabo-
ration of deductive measures by creating better
indicators, as suggested by the Thurstones, was
attempted only by the research groups surround-
ing Colberg (Colberg, Nester, & Cormier, 1982;
Colberg, Nester, & Trattner, 1985) and Guilford.
Guilford's contribution to the measurement
of reasoning ability is mostly in constructing and
popularizing reasoning measures. The structure-
of-intellect (SOI) theory (Guilford, 1956, 1967)
is mostly to be credited for its heuristic value
in including some of what was previously no-man's-land into intelligence research. For the present purposes, the focus is on reasoning ability exclusively, and Guilford's major contributions to
this topic can be located prior to the specification
of the SOI theory. On the basis of a mixture of
literature review, construction of specific tests, and
empirical investigations of the structure of reason-
ing, Guilford proposed initially three, later four,
reasoning factors (Guilford, Christensen, Kettner,
Green, & Hertzka, 1954; Guilford, Comrey,
Green, & Christensen, 1950; Guilford, Green, &
Christensen, 1951; Hertzka, Guilford, Christensen,
& Berger, 1954). These four factors (General
Reasoning, Thurstone's Induction, Commonalities,
and Deduction) are hard to separate conceptually
and empirically. Specifically, the first three factors
are very similar on the task level, and empirically,
inductive tasks load on all three of these reasoning
factors. The deduction factor is marked weakly
with tasks that are hard to distinguish from tasks
assigned to other reasoning factors. The tasks
popularized by Guilford are still in use today
(Ekstrom, French, & Harman, 1976), but many
measures are available that are much better
conceptually and psychometrically.
The Berlin Intelligence Structure model
(Jäger, Süß, & Beauducel, 1997; see Chapter 18,
this volume) is a bimodal hierarchical perspec-
tive on cognitive abilities. Intelligence tasks are
classified with respect to a content facet and an
operation facet. On the content facet, Verbal,
Quantitative, and Spatial intelligence are
distinguished. On the operation facet, Creativity/
Fluency, Memory, Processing Speed, and
Reasoning are distinguished. The model has a
surface similarity with Guilford's SOI theory but avoids some of the technical pitfalls of Guilford's model. The Reasoning factor on the
operation facet is defined as information pro-
cessing in tasks that require availability and
manipulation of complex information. The pro-
cessing thus reflects reasoning and judgment
abilities. The Reasoning factor is defined across
the content facet, and consequently, there are
verbal, spatial, and numerical reasoning tasks.
In an epochal effort, Carroll (1993) summa-
rized and reanalyzed factor-analytic studies of
human cognitive abilities. The result of this
work is an elaborated hierarchical theory that
postulates a general factor, g, at the highest
level. On a second level, broad ability factors
are distinguished. The proposed abilities are
fluid intelligence (Gf), crystallized intelligence
(Gc), general memory and learning, broad
visual perception, broad auditory perception,
broad retrieval ability, broad cognitive speedi-
ness, and processing speed. Fluid intelligence
is largely identified by three reasoning abili-
ties distinguished on the lowest stratum of
Carroll's theory. The three reasoning factors
are Sequential Reasoning, Induction, and
Quantitative Reasoning. The Sequential Reason-
ing factor is measured by tasks that require
participants to reason from premises, rules, or
conditions to conclusions that properly and
necessarily follow from them. In the remainder
of this chapter, the terms sequential reasoning
and deductive reasoning will be used inter-
changeably. The Induction factor is measured
by tasks that provide individuals with materials
that are governed by some rules, principles,
similarities, or dissimilarities. Participants are
supposed to detect and infer those features of
the stimuli and apply the inferred rule. The
Quantitative Reasoning factor is measured by
tasks that ask the participant to reason with
concepts involving numerical or mathematical
relations. Figure 21.1 presents the classification
of reasoning tasks according to Carroll (1993,
p. 210).
The theory developed by Cattell and Horn
(Horn & Noll, 1994, 1997) is very closely related to Carroll's theory. In fact, Carroll's theory is based more on Cattell's and Horn's work than the other way round. Their investigation of human cognitive capabilities was
focused on five kinds of evidence in its develop-
ment: first, structural evidence as expressed in
the covariation of performances; second, devel-
opmental change through the life span; third,
neurocognitive evidence; fourth, achievement
evidence as expressed in the prediction of
criteria involving cognitive effort; and fifth,
behavioral-genetic evidence. A major difference between the three-stratum theory of Carroll and the Gf-Gc theory of Horn and Cattell is the lack of a general factor in the Cattell-Horn framework: according to Horn and Noll (1994), there is no unifying principle and hence no sufficient reason to specify a general factor. However, for the present
purposes, the proposed structure and interpreta-
tion of reasoning ability is of major importance.
Horn and Noll interpret fluid intelligence as
inductive and deductive reasoning that is critical
in understanding relations among stimuli, com-
prehending implications, and drawing infer-
ences. Horn and Noll (1997) also speak about
conjunctive and disjunctive reasoning, but sup-
posedly, these two forms fall under inductive
and deductive reasoning. The Cattell-Horn
theory assumes that both inductive and deduc-
tive reasoning tasks can have verbal as well as
spatial content (Horn & Cattell, 1967). This idea
can be extended, and both Gf and Gc can be
measured with a broader variety of contents
(Beauducel, Brocke, & Liepmann, 2001). In
terms of the structure of reasoning ability, there
is little difference between Carroll's theory, on
the one side, and the Cattell-Horn framework,
on the other. The major difference is the postu-
lation of a separate quantitative factor in the
latter model, whereas Carroll subsumes quanti-
tative reasoning under fluid intelligence.
Based on available psychometric reasoning
tasks, reasoning ability has a central place in all of
the above-discussed theories of the structure of
intelligence. However, the manifold of available
measures might still reflect a biased selection from
all possible reasoning tests. The two following
sections on formal and empirical classifications
should contribute to deepening our understanding
of reasoning measures and reasoning ability.
FORMAL CLASSIFICATIONS OF REASONING
There is certainly no lack of reasoning measures.
Carroll (1993) lists a very broad variety of avail-
able reasoning tasks, and more, similar tests
could be developed without major problems.
Kyllonen and Christal (1990) summarize the
situation as follows:
Since Spearman (1923) reasoning has been
defined as an abstract, high-level process, eluding
precise definition. Development of good tests of
reasoning ability has been almost an art form,
owing more to empirical trial-and-error than to
systematic delineation of the requirements such
tests must satisfy. (p. 426)
Although empirical evidence indicates that
some measures are better indicators of reason-
ing ability than others, the theoretical knowl-
edge about which measure is good for what
reasons is still very limited. In addition, scien-
tists and practitioners are left with little advice
from test authors as to why a specific test has
the form it has. It is easy to find two reasoning
tests that are said to measure the same ability
but that are vastly different in terms of their
features, attributes, and requirements.
Compared to this bottom-up approach of test
construction, a top-down approach could facilitate
construction and evaluation of measures. There
are four aspects of such a top-down approach that
will be discussed subsequently: operation, con-
tent, instantiation and nonreasoning requirements,
and vulnerability to reasoning strategies.
Figure 21.1  Carroll's Higher-Order Model of Fluid Intelligence (Reasoning)
[The figure shows gf at the apex, with the first-order factors Sequential Reasoning (marked by linear syllogisms, categorical syllogisms, and general verbal reasoning), Induction (marked by analogies, odd-element tasks, matrix tasks, multiple-exemplar tasks, series tasks, and rule discovery), and Quantitative Reasoning (marked by quantitative tasks).]
The first aspect to consider in the classification
of reasoning measures is the formal operational
requirement. Reasoning tasks can call for induc-
tive and deductive inferences, and among
various tests for fluid intelligence, there are
additional tests that primarily call for judgment,
decision making, and planning. In focusing on
inductive and deductive reasoning, the distinc-
tion is that in inductive reasoning, individuals
create semantic information; as a result, the
inferences are not necessarily true. In deductive
reasoning, however, individuals maintain
semantic information and derive inferences that
are necessarily true if the premises are true.
Tasks that are commonly classified as requiring
broad visualization (Carroll, 1993) usually
satisfy the definition of deductive reasoning.
However, the visualization demand of such
tasks is pivotal and paramount (Lohman, 1996),
and such tasks will consequently be excluded
from further discussion.
A second aspect to consider in the classifica-
tion of reasoning measures is the content of
tasks. Tasks can have many contents, but the
vast majority of reasoning measures employ
figural, quantitative, or verbal stimuli. Many
tasks also represent a mixture of contents. For
example, arithmetic reasoning tasks can be both
verbal and quantitative. Experimental manipula-
tions of the content of measures are desirable
to understand the structure of reasoning ability
more profoundly.
A third aspect of relevance in classifying
measures of reasoning ability has to do with the
instantiation of reasoning problems. Reasoning
problems have an underlying formal structure.
If we decide to construct a measure of reasoning
ability, we instantiate this general form and have
a variety of options in doing so. In choosing
between these options, essentially we go
through a decision tree. A first choice might
be to use either concrete or abstract forms of
reasoning problems. In the abstract branch, we
might choose between a nonsense instantia-
tion and a variable instantiation. In the case of
syllogistic reasoning tests, a nonsense instantia-
tion might be "All Gekus are Lemis. All Lemis
are Filop." A variable instantiation of the
same underlying logical form could be "All
A are B. All B are C." In the concrete branch of
the decision tree, prior knowledge is of crucial
importance. Instantiations of reasoning problems
can either conform to our prior knowledge or
not. Nonconforming instantiations can either
be counterfactual or impossible. A counter-
factual instantiation could be "All psycholo-
gists are Canadian. All Canadians drive Porsches."
An impossible instantiation could be "All cats
are dogs. All dogs are birds." In the branch that
includes instantiations that conform to prior
knowledge, we can distinguish factual and
possible instantiations. A factual instantia-
tion could be "All cats are mammals. All
mammals have chromosomes." A possible
instantiation could be "All white cars in this
garage are fast. All fast cars in this garage run
out of petrol."
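The decision tree just described can be sketched as a small classifier; the function and parameter names below are hypothetical illustrations, not part of the chapter's terminology.

```python
# Sketch of the decision tree for classifying instantiations of reasoning
# problems (category names from the text; the function and its flags are
# hypothetical).
def classify_instantiation(abstract, nonsense=False, conforms=False,
                           counterfactual=False, factual=False):
    if abstract:                        # abstract branch
        return "nonsense" if nonsense else "variable"
    if not conforms:                    # concrete, conflicts with knowledge
        return "counterfactual" if counterfactual else "impossible"
    return "factual" if factual else "possible"  # concrete, conforms

# "All A are B. All B are C." is an abstract, variable instantiation:
print(classify_instantiation(abstract=True))                   # variable
# "All cats are dogs. All dogs are birds." is concrete and impossible:
print(classify_instantiation(abstract=False, conforms=False))  # impossible
```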
It is well established that the form of the
instantiation has substantial effects on the diffi-
culty of structurally identical reasoning tasks
(Klauer, Musch, & Naumer, 2000). It is also
known that the form of the instantiation of rea-
soning tasks has some influence on the psycho-
metric properties of reasoning tasks (Gilinsky
& Judd, 1993). Abstract instantiations might
induce test anxiety in some individuals because
they look like formulas. Aside from this possi-
ble negative effect, abstract instantiations might
be a good format for reasoning tasks. Instan-
tiations that do not conform to prior knowledge
are likely to be less suitable forms of reasoning
problems because there is an apparent conflict
between prior knowledge and the required
thought processes. It is likely that some indi-
viduals are better able than others to abstract
from their prior knowledge. However, such an
abstraction would not be covered by a measure-
ment intention that aims at assessing the ability
to reason deductively. Instantiations that actu-
ally reflect prior knowledge are not good forms
for reasoning problems because the easiest way
to a solution is to recall the relevant knowledge
rather than to reason. Some of the most widely
used tests of deductive reasoning rely on impos-
sible instantiations. The psychometric differences
between measures instantiated in different
ways are likely to be nontrivial.
The final aspect of a classification of reason-
ing measures discussed here deals with the
vulnerability of a task to reasoning strategies.
In measuring reasoning ability, like most other
abilities, it is assumed that all individuals
380  HANDBOOK OF UNDERSTANDING AND MEASURING INTELLIGENCE
21-Wilhelm.qxd 9/8/2004 5:09 PM Page 380
approach the problems in the same way. Some
individuals are more successful than others
because they have more of the required abil-
ity. Consequently, it is implicitly assumed that
individuals at the very top of the ability distrib-
ution proceed roughly in the same way through
a reasoning test as individuals at the very
bottom of the distribution. If a subgroup of par-
ticipants chooses a different approach to work
on a given test, the consequence is that the test
is measuring different abilities for different sub-
groups. For syllogistic reasoning, it is known
that there are two or three subgroups of individ-
uals who approach syllogistic reasoning tests
differently. Depending on which strategy is
chosen, different items are easy and hard, respec-
tively (Ford, 1995). Knowledge about strategies
in reasoning is limited (but see Schaeken, de
Vooght, Vandierendonck, & d'Ydewalle, 2000),
and the role of strategies in established reasoning
measures has been barely investigated.
The actual reasoning tasks that have been
used in experimental investigations of reasoning
processes and psychometric studies of reason-
ing ability have little to no overlap in surface
features. However, there is now good evidence
(Stanovich, 1999) that reasoning problems, as
they have been used in cognitive psychology,
are moderately correlated with reasoning mea-
sures as they have been used in individual-
differences research. The experimentally used
tasks have been thoroughly investigated, and we
now know a lot about the ongoing thought
processes involved in these tasks. One important
conclusion from this research is that the instan-
tiations of reasoning problems are appropriate
to elicit the intended reasoning processes for the
most part (Shafir & Le Boeuf, 2002; Stanovich,
1999). However, there are pervasive reliability
issues because frequently, only a few such
problems are used in any given experiment.
Conversely, we do not know a lot about ongoing
thought processes in established measures of
reasoning ability as used in psychometric
research. However, we do know a lot about their
structure (Carroll, 1993), their relations with
other measures of maximal behavior (Carroll,
1993; Jäger et al., 1997; Kyllonen & Christal,
1990), and their validity for the prediction
of real-life criteria (Schmidt & Hunter, 1998).
Both sets of reasoning tasks can and should
be used when studying reasoning ability. The
benefits would be mutual. For example, differ-
ences in correlations between various individual
reasoning items as used in cognitive research
and latent variables from reasoning ability tests
might reveal important differences between the
experimental tasks. Similarly, variability in the
difficulties of items from standard psychometric
reasoning tests can be possibly explained by
application of various theories of reasoning
processes, like the mental model theory that
was sketched above.
EMPIRICAL CLASSIFICATIONS
OF REASONING MEASURES
In psychology, inductive reasoning has fre-
quently been equated with proceeding from
specific premises to general conclusions.
Conversely, deductive reasoning has frequently
been equated with proceeding from general
premises to specific conclusions. This definition
can still be found in textbooks, but it is outdated.
There are inductive arguments proceeding from
general premises to specific conclusions, and
there are deductive arguments proceeding from
specific premises to general conclusions. For
example, the argument "Almost all Swedes are
blond. Jan is a Swede. Therefore Jan is blond."
is an inductive argument that violates the above
definition, and the argument "Jan is a Swede.
Jan is blond. Therefore some Swedes are
blond." is a deductive argument that also
violates the above definition.
According to Colberg et al. (1982), most
established reasoning tests confound the direc-
tion of inference (general or specific premises
and general or specific conclusions) with deduc-
tive and inductive reasoning tasks. By con-
structing specific deductive and inductive
reasoning tasks (Colberg et al., 1985), they pre-
sent correlational evidence that seems to support
the unity of inductive and deductive reasoning
tasks. However, reliability of the measures is
very low; the applied method of disattenuating
correlations is not satisfactory; and, most impor-
tant, Shye (1988) reclassifies their tasks and
finds support for a distinction between rule-
inferring and rule-applying tasks (see Chapter 18,
this volume). In the initial classification and
construction of tasks (Colberg et al., 1985), tests
have been labeled as inductive when in fact they
were probabilistic. Probabilistic tasks can, in
principle, be deductive (Johnson-Laird, 1994a;
Johnson-Laird, Legrenzi, Girotto, Legrenzi, &
Caverni, 1999), and the probabilistic tasks used
(Colberg et al., 1985) were in fact deductive
tasks. What was shown by Colberg (Colberg
et al., 1982, 1985), then, was the unity of some
forms of deductive reasoning tasks, and what
Shye demonstrated was that task classification
is a sensitive business and that rule-applying
tasks, as constructed by Colberg et al., fall into
the periphery of a multidimensional scaling,
with rule inferring/inductive reasoning at the
center of the solution.
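The disattenuation problem raised above can be made concrete. A minimal sketch of the classical correction for attenuation follows; the numeric values are illustrative assumptions, not figures from Colberg et al.

```python
import math

# Classical correction for attenuation: the true-score correlation is the
# observed correlation divided by the square root of the product of the two
# reliabilities. With very low reliabilities, the corrected estimate is
# unstable and can even exceed 1 (numbers below are illustrative only).
def disattenuate(r_xy, rel_x, rel_y):
    return r_xy / math.sqrt(rel_x * rel_y)

print(round(disattenuate(0.40, 0.50, 0.45), 2))  # 0.84
# The same observed r with slightly lower reliabilities swings widely:
print(round(disattenuate(0.40, 0.35, 0.30), 2))  # 1.23 (exceeds 1)
```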
The most sophisticated and ambitious
attempt to propose factors of reason-
ing ability comes from Carroll (1993). Carroll
discusses the structure of reasoning ability,
bearing in mind several objections and difficul-
ties. Among those objections are that (a) reason-
ing tests are frequently complex, requiring
both inductive and deductive thought processes;
(b) reasoning measures are often short and
administrated under timed conditions; (c) rea-
soning tests are usually not carefully constructed
and analyzed on the item level; (d) inductive and
deductive reasoning processes are learned and
developed together; and (e) many reasoning
measures involve language, quantitative, or
spatial skills to an unknown amount.
Carroll (1993) asserts that his proposal of the
three reasoning factors (Induction, Deduction,
and Quantitative Reasoning) is preliminary for
several reasons (but see Carroll, 1989). First, in
many of the reanalyzed studies, only one rea-
soning factor emerged. This is simply due to the
fact that there was frequently not a sufficient
number of reasoning tests included to examine
the structure of reasoning ability in such studies.
Second, in the 37 out of 176 data sets with more
than one reasoning factor, most of the studies
were never intended and designed to investigate
the structure of reasoning ability. Third, those
studies intended to investigate the structure of
reasoning ability included insufficient numbers
of reasoning measures. Other problems with
investigating the structure of reasoning ability
include variations in time pressure across tests
and studies, variations in scoring procedures,
variations in instructing participants, and, most
important, individual measures that are classi-
fied post hoc rather than a priori.
In carefully examining Tables 6.1 and 6.2
from Carroll (1993), it is apparent that the
deductive reasoning tasks are frequently verbal.
Content for the inductive reasoning tasks is
more diverse but tends to be figural-spatial. The
last reasoning factor is rather unequivocally a
quantitative factor. An explanation of the data
in Carroll as indicating a distinction between
inductive, deductive, and quantitative reasoning
competes with an explanation that distinguishes
between verbal, figural-spatial, and quantitative
content. Inspection of Carroll's reanalysis of
individual data sets is compatible with an inter-
pretation of the factor labeled as general sequen-
tial reasoning or deductive reasoning as a verbal
reasoning factor. The inductive reasoning factor,
on the other hand, could reflect figural-spatial
reasoning. The quantitative reasoning factor
apparently reflects numerical or quantitative
reasoning. Compatible with this interpretation
is that the deductive reasoning factor can fre-
quently not be distinguished from a verbal
factor and tends to have high loadings on a
higher-order crystallized factor. In accord with
the interpretation of the inductive reasoning
factor, the figural-spatial reasoning processes
measured with the associated tasks tend to be
highly associated with a higher-order fluid rea-
soning factor. In line with this theorizing, the
induction factor has the highest loading on g of
all Stratum 1 factors. The deductive reasoning
factor ranks only 10th among these loadings. The
mean loading of induction on g is .57, whereas
the mean loading of deductive reasoning is only
.41. Besides the mean difference in the average
magnitude of loadings, there is a higher disper-
sion of g loadings among the deductive tasks.
Similarly, the fluid intelligence factor, Gf, is
best defined by induction in Carroll's reanalysis.
Gf is defined by induction 19 times, with an
average loading of .64. Deductive reasoning
defined Gf only 6 times, with an average load-
ing of .55. On the other hand, deductive reason-
ing appears among the variables defining
crystallized intelligence. Deductive reasoning
defined the Gc factor 7 times, with an average
loading of .69. Induction does not appear on the
list of Stratum 1 abilities defining crystallized
intelligence. Finally, deductive reasoning appears
8 times, with an average loading of .70 on a
factor labeled 2H, reflecting a mixture of fluid
and crystallized intelligence. Induction, on the
other hand, appeared only twice, with an
average loading of .41.
Given these considerations, the proposal of
reasoning ability as being composed of induc-
tive, deductive, and quantitative reasoning is
competing with a proposal of verbal, figural-
spatial, and quantitative reasoning. To investi-
gate possible structures of reasoning ability, one
should include tasks that allow for comparison
between several competing theories. There are
basically five theories competing as explana-
tions for the structure of reasoning ability.
1. a general reasoning factor accounting for the
communality of reasoning tasks varying with
respect to content (verbal, quantitative, figural-
spatial) and operation (inductive, deductive);
2. two correlated factors for inductive and
deductive reasoning, respectively, without the
specification of any content factors;
3. three correlated factors for verbal, quantitative,
and figural-spatial reasoning, without distin-
guishing inductive and deductive reasoning
processes;
4. a general reasoning factor along with nested
and completely orthogonal factors for verbal
and quantitative reasoning but no figural-
spatial factor; and
5. two correlated factors for inductive and deduc-
tive reasoning along with completely orthogo-
nal content factors for verbal and quantitative
reasoning and again no figural-spatial factor.
For the evaluation of these models, it is
important to avoid a confound between content
and process on the task level. A second crucial
aspect for exploring the structure of reasoning
ability is to select appropriate tasks to measure
the intended constructs. This is particularly hard
in the domain of deductive reasoning. Following
the above-presented definition of inductive and
deductive reasoning, it is very difficult to find
adequate measures of figural-spatial deductive
reasoning. In fact, only 7 of all the tasks
described in Carroll (1993) can be classified as
deductive figural-spatial tasks. However, these
tasks frequently represent a mixture with other
demands. For example, ship-destination has
quantitative demands; match problems, plot-
ting, and route planning have visualization
demands. In classifying 90 German intelligence
tasks, Wilhelm (2000) could not find a single
deductive figural-spatial measure.
To test the structure of reasoning ability,
Wilhelm (2000) selected reasoning measures
based on their cognitive demands and the
content involved. In addressing the above-
mentioned criticisms of existent reasoning tasks,
several reasoning tasks were newly constructed.
The following 12 measures were included in the
study (D and I denote deductive and inductive
reasoning; F, N, and V stand for figural, numeri-
cal, and verbal content, respectively).
DF1 (Electric Circuits): Positive and negative
signals travel through various switches. The result-
ing signal has to be indicated. The number and kind
of switches and the number of signals are varied
(Gitomer, 1988; Kyllonen & Stephens, 1990).
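A hypothetical miniature of such an item might propagate a signal through a chain of switches; the switch names and rules below are assumptions for illustration, and the operational tasks cited above differ in detail.

```python
# Hypothetical mini-version of an electric-circuits item: a positive or
# negative signal passes through a chain of switches. An "inverter" flips
# the sign; a "repeater" passes the signal unchanged. The examinee must
# deduce the resulting signal (labels and rules are illustrative only).
def resulting_signal(signal, switches):
    for kind in switches:
        if kind == "inverter":
            signal = -signal
        elif kind != "repeater":
            raise ValueError(f"unknown switch: {kind}")
    return signal

# Two inverters cancel, so a positive signal comes out positive:
print(resulting_signal(+1, ["inverter", "repeater", "inverter"]))  # 1
```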
DF2 (Spatial Relations): Spatial orientation of
symbols is presented pairwise. The spatial orien-
tation of two symbols that were not presented
together can be derived from the pairwise presen-
tations (Byrne & Johnson-Laird, 1989).
DN1 (Solving Equations): A series of equations is
presented. Participants can derive values of vari-
ables deductively. Items vary by the number of
variables and the difficulty of relation. A difficult
sample item is "A plus B is C plus D. B plus C is
2*A. A plus D is 2*B. A + B is 11. A + C is 9."
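The difficult sample item does have a unique, deducible solution, which a brute-force check confirms; restricting the search to small non-negative integers is an assumption made only for this illustration.

```python
# Brute-force check of the DN1 sample item: A + B = C + D, B + C = 2A,
# A + D = 2B, A + B = 11, A + C = 9 (the search range is an assumption).
solutions = [
    (a, b, c, d)
    for a in range(12) for b in range(12)
    for c in range(12) for d in range(12)
    if a + b == c + d and b + c == 2 * a and a + d == 2 * b
    and a + b == 11 and a + c == 9
]
print(solutions)  # [(5, 6, 4, 7)]
```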
DN2 (Arithmetic Reasoning): Participants pro-
vide free responses to short verbally stated arith-
metic problems from a real-life context.
DV1 (Propositions): Acts of a hypothetical
machine are described, and the correct conclusion
has to be deduced. The number of mental models,
logical relation, and negation are varied in this
multiple-choice test (Wilhelm & McKnight,
2002). A simple sample item is as follows: "If the
lever moves and the valve closes, then the inter-
rupter is switched. The lever moves. The valve
closes."
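The deduction demanded by this sample item can be checked by enumerating truth assignments: in every assignment consistent with the premises, the interrupter is switched.

```python
from itertools import product

# The simple DV1 sample item: "If the lever moves and the valve closes,
# then the interrupter is switched. The lever moves. The valve closes."
# Every truth assignment consistent with the premises makes the conclusion
# ("the interrupter is switched") true, so it follows deductively.
consistent = [
    (lever, valve, interrupter)
    for lever, valve, interrupter in product([True, False], repeat=3)
    if ((not (lever and valve)) or interrupter)  # the conditional premise
    and lever and valve                          # the categorical premises
]
print(all(interrupter for _, _, interrupter in consistent))  # True
```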
DV2 (Syllogisms): Verbally phrased quantitative
premises are presented in which the number of
mental models is varied by manipulating the
figure and quantifier (Wilhelm & McKnight,
2002). A sample item is as follows: "No big shield
is red. All round shields are big."
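The sample syllogism can be verified in the same spirit by enumerating tiny worlds of shields; the two-shield universe is an assumption sufficient for illustration.

```python
from itertools import product

# DV2 sample item: "No big shield is red. All round shields are big."
# Each shield is a (big, round, red) triple. Every two-shield world that
# satisfies the premises also satisfies "No round shield is red," so that
# conclusion is necessarily true (world size is an illustrative assumption).
def premises_hold(world):
    return (all(not red for big, rnd, red in world if big)
            and all(big for big, rnd, red in world if rnd))

worlds = product(product([True, False], repeat=3), repeat=2)
print(all(
    all(not red for big, rnd, red in w if rnd)
    for w in worlds if premises_hold(w)
))  # True
```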
IF1 (Figural Classifications): Participants are
asked to find the one pictorial figure that does not
belong with four other figures based on various
attributes of the figures.
IF2 (Matrices): Based on trends in rows and
columns of 3*3 matrices, a figure that belongs in
a specified cell has to be selected from several
distractors.
IN1 (Number Series): Rule-ordered series of
numbers are to be continued by two elements. The
difficulty of the rule that has to be detected is varied.
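A hypothetical number-series item of this kind can be sketched for one narrow rule family, where the first-order differences change by a constant; operational IN1 items vary the rule far more widely.

```python
# Continue a number series by two elements, assuming the rule is that the
# first-order differences change by a constant amount (a deliberately
# narrow sketch; the rule family is an assumption for illustration).
def continue_series(series, n_next=2):
    diffs = [b - a for a, b in zip(series, series[1:])]
    step_change = diffs[-1] - diffs[-2]
    last, step, out = series[-1], diffs[-1], []
    for _ in range(n_next):
        step += step_change
        last += step
        out.append(last)
    return out

print(continue_series([2, 3, 5, 8, 12]))  # [17, 23]
```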
IN2 (Unfitting Number): In a series of numbers,
one that does not fit has to be identified.
IV1 (Verbal Analogies): Analogies as they are fre-
quently used in intelligence research. The general
form of the multiple-choice items is "? is to B as
C is to ?". The vocabulary of these double analo-
gies is simple (i.e., participants are familiar with
all terms), and the difficulty of the relationship is
varied.
IV2 (Word Meanings): In this multiple-choice
test, participants should identify a word that means
approximately the same thing as a given word.
A total of 279 high school students with a
mean age of 17.7 years and a standard deviation
of 1.2 years completed all tests and several cri-
terion measures. All tests were analyzed sepa-
rately with item response theory models. For all
tests, a two-parameter model assuming disper-
sion in item discrimination was superior to a
Rasch model. The estimated person parameters
from these two-parameter models were subse-
quently analyzed. For participants who got
either all answers wrong or all answers right,
person parameters were interpolated. Some of
the reliabilities of the tasks are not satisfactory.
Coefficient Omega (McDonald, 1985) for IF1
and IF2 are only .50 and .51, respectively. The
overall test length for individual measures might
be responsible for these suboptimal results.
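The two-parameter model referred to above adds an item discrimination parameter to the Rasch model's difficulty parameter; a minimal sketch of the item response function:

```python
import math

# Two-parameter logistic (2PL) item response function: the probability that
# a person with ability theta solves an item with discrimination a and
# difficulty b. The Rasch model is the special case a = 1 for all items.
def p_correct(theta, a, b):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# At theta == b the probability is .5 regardless of discrimination:
print(p_correct(0.0, a=2.0, b=0.0))  # 0.5
# Higher discrimination separates abilities more sharply around b:
print(p_correct(1.0, a=2.0, b=0.0) > p_correct(1.0, a=0.5, b=0.0))  # True
```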
The core research question in the present
context is which of the above-specified models
provides the best fit for the data. A one-factor
model simply specifies one latent reasoning
factor with loadings from all indicators. A
two-factor model specifies two correlated latent
factors: one factor with loadings on all the
inductive tasks, the other factor with loadings on
all the deductive tasks. The correlation between
both factors is estimated freely. The three-factor
model specifies three correlated content factors:
a verbal factor with loadings from all the verbal
tasks, a quantitative factor with loadings from
all quantitative tasks, and a figural-spatial factor
with loadings on all the figural-spatial tasks.
The fourth model specifies a general reasoning
factor and two orthogonal nested factors: one
for the four verbal tasks and the other for the
four quantitative tasks. The fifth model specifies
an inductive reasoning factor with loadings
from all inductive reasoning tasks and, likewise,
a deductive reasoning factor with loadings from
all the deductive reasoning tasks. In addition,
the two content factors as in the fourth model
are specified. The two reasoning factors are
correlated, but the three content factors are not.
Generally, there are, of course, other possible
model architectures (see Chapter 14, this
volume). However, the above-mentioned mod-
els provide a test of competing theories for the
structure of reasoning ability. The last two models
mentioned above specify content factors for the
verbal and quantitative tasks only. For the
figural-spatial tasks, such a content factor might
not be necessary because such tasks have been
said to require decontextualized reasoning, and
observed individual differences do not reflect
specific prior knowledge (Ackerman, 1989,
1996; Undheim & Gustafsson, 1987). Models
with and without a first-order factor of figural-
spatial reasoning, as specified in the current
context, are nested and can be compared infer-
entially (see Chapter 14, this volume).
Table 21.1 summarizes the fit of the five
confirmatory factor analyses. Comparing the
general factor model with a model that specifies
two correlated factors of inductive and deductive
reasoning, respectively, reveals that there is no
advantage in estimating the correlation between
inductive and deductive reasoning freely (as
opposed to restricting this correlation to unity).
Indeed, the correlation between both factors in
Model 2 is estimated to be exactly 1. Conse-
quently, when comparing these two models, the
general factor model is the better explanation of
the data because it is more parsimonious than the
two-factor model. However, neither model
provides acceptable fit.
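The nested-model comparisons reported in Table 21.1 can be sketched directly from the tabled values; the closed-form p-value below holds only for a one-degree-of-freedom difference, and the model labels are mine.

```python
import math

# Fit statistics transcribed from Table 21.1 (model labels are shorthand).
fits = {
    "g":               {"chi2": 121.2, "df": 54, "bic": 316.0},
    "ind_ded":         {"chi2": 121.2, "df": 53, "bic": 324.1},
    "content":         {"chi2":  84.8, "df": 51, "bic": 303.9},
    "g_content":       {"chi2":  73.3, "df": 46, "bic": 333.0},
    "ind_ded_content": {"chi2":  72.0, "df": 45, "bic": 339.8},
}

def chi2_diff_p(restricted, free):
    """p-value of the chi-square difference test for two nested models;
    uses the closed form valid for exactly one degree of freedom."""
    d_chi2 = fits[restricted]["chi2"] - fits[free]["chi2"]
    d_df = fits[restricted]["df"] - fits[free]["df"]
    assert d_df == 1, "closed form below holds only for df = 1"
    # Survival function of chi-square(1): P(X > x) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(max(d_chi2, 0.0) / 2.0))

# Freeing the induction-deduction correlation does not improve fit at all:
print(chi2_diff_p("g", "ind_ded"))  # 1.0
# By BIC, the most parsimonious acceptable model is the content model:
print(min(fits, key=lambda m: fits[m]["bic"]))  # content
```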
A model specifying three correlated group
factors for content does substantially better in
explaining the data. Although there is still room
to improve fit, the model represents an accept-
able explanation of the data. Given that the
model is completely derived from theory, it can
serve as a good starting point for future investi-
gations. Comparing the two models with com-
pletely orthogonal content factors again
demonstrates the superiority of the model that
postulates the unity of inductive and deductive
reasoning. In this data set, inductive and deduc-
tive reasoning are perfectly correlated.
Introducing a distinction between both factors is
unnecessary and consequently does not improve
model fit. Both models are substantially better
than the initial one- and two-factor models.
However, one of the loadings on the verbal
factor is not significant and negative in sign.
Given this departure from the theoretical expec-
tation of positive and significant loadings, and
keeping in mind interpretative issues with group
factors in nested factor models (see Chapter 14,
this volume), the best solution seems to be
accepting the model based on the content
factors. In this model, there are three content-
related reasoning factors, each one of them sub-
suming inductive and deductive reasoning tasks.
In the current study, the model with correlated
group factors is equivalent to a second-order
factor model. In this model, the correlations
between factors are captured by a higher-order
factor. This model is presented in Figure 21.2.
The two content factors, Verbal and Quantita-
tive Reasoning, reflect deductive and inductive
reasoning with verbal and quantitative material,
respectively. Due to the relevance of task con-
tent, it can be expected that the Verbal and the
Quantitative Reasoning factors do predict dif-
ferent aspects of criteria such as school grades,
achievement, and the like. The loading of the
Figural Reasoning factor on fluid intelligence is
freely estimated, and the estimate is 1. Not only
are g and Gf very highly or perfectly correlated
(Gustafsson, 1983), but the same holds for figural-
spatial reasoning and fluid intelligence. Con-
sequently, the current analysis extends Undheim
and Gustafsson's (1987) work to a lower stra-
tum. It is a replicated finding that Gf is the
Stratum 2 factor with the highest loading on
g (Carroll, 1993). It has also been argued that
this relation might be perfect (Gustafsson, 1983;
Undheim & Gustafsson, 1987, but see Chapter
18, this volume). Figural-spatial reasoning, in
turn, has the highest loading on fluid intelli-
gence, and in the data presented in this chapter,
the relation between figural-spatial reasoning
and the factor labeled fluid intelligence is per-
fect. Hence, if we do want to measure g with a
single task, we should select a task of figural-
spatial reasoning. Matrices tasks have been con-
sidered particularly good measures of Gf and g.
Spearman (1938) suggested the Matrices test
from Penrose and Raven (1936), as well as the
inductive figural measure from Line (1931), as
the single best indicators of g. The latter test is
less prominent than the Matrices test, but vari-
ants of it can be found in various intelligence
Table 21.1    Fit Statistics of Five Competing Structural Explanations of Reasoning Ability

            g        Ind. Ded.   Cont.    g & Cont.   Ind. Ded. & Cont.
χ²        121.2      121.2       84.8      73.3        72.0
df         54         53         51        46          45
p         <.0001     <.0001      .002      .006        .006
CFI        .901       .900       .950      .960        .960
RMSEA      .067       .068       .049      .046        .046
BIC       316.0      324.1      303.9     333         339.8
CAIC      280.4      287        263.8     285.5       290.8

Note: Ind. Ded. = inductive and deductive; Cont. = contents; CFI = comparative fit index; RMSEA = root mean square error of approximation; BIC = Bayesian information criterion; CAIC = consistent Akaike's information criterion.
tests. Although it is not good practice to
measure rather general constructs with single
tasks, there is certainly evidence suggesting
that, if need be, this sole task should be a
figural-spatial reasoning measure. Whether such
a task is classified as inductive or deductive is
not important for that purpose.
Frequently, the composition of intelligence
batteries is not well balanced in the sense that
there are many indicators for one intelligence
construct but few or no tests for other intelli-
gence constructs. In such cases (e.g., Roberts
et al., 2000), the overall solution can be domi-
nated by tasks other than fluid intelligence
tasks. As a result, figural-spatial reasoning tasks
might not be the best selection in these cases to
reflect the g factor of such a battery.
When interpreting the results from this study,
it is important to keep in mind that the differ-
ences between various models were not that big.
With different tasks and different participants, it
is possible that different results emerge. The
present results are preliminary and in need of
replication and extension. The most important
result from the study reported above is that in a
critical test aimed to assess a distinction
between inductive and deductive reasoning, no
such distinction could be found. Latent factors
of inductive and deductive reasoning are per-
fectly correlated in several models. The result of
a unity of inductive and deductive reasoning
was also obtained with multidimensional
scaling, exploratory factor analysis, and tetrad
analysis. It is important to note that this result
emerged considering the desiderata for future
research provided by Carroll (1993, p. 232).
Specifically, the present tasks have been
selected or constructed based on a careful review
of the individual-differences and cognitive
literature on the topic, the items were analyzed
by latent item response theory, and the scales
were analyzed by confirmatory factor analyses.
The current tests include several new reasoning
measures that are based on and informed
through cognitive psychology.
WORKING MEMORY AND REASONING
There have been several attempts to explain
reasoning ability in terms of other abilities that
are considered more basic and tractable. Specifi-
cally, working memory has been proposed as
the major limiting factor for human reasoning
(Kyllonen & Christal, 1990; Süß, Oberauer,
Wittmann, Wilhelm, & Schulze, 2002). The
working definition of working memory has been
that any task that requires individuals to simul-
taneously store and process information can be
considered a working memory task (Kyllonen &
Christal, 1990). This definition has been criti-
cized because it seems to include all reasoning
measures. The definition has also been criti-
cized because its notions of storage and pro-
cessing are imprecise and fuzzy (see Chapter
22, this volume). A critique of the working
memory = reasoning hypothesis can also focus
on the problem of the reduction of one construct
[Figure 21.2. Higher-Order Model of Fluid Intelligence (Reasoning): a second-order Gf factor above first-order content factors (Verbal, Quantitative, Figural), each measured by its four tasks; the loading of the Figural factor on Gf is 1.00.]
in need of explanation through another one
(Deary, 2001) that is not doing any better.
However, this critique is unjustified for several
reasons.
1. It is easy to construct working
memory tasks. Many tasks that satisfy the above
definition work in the sense that they correlate
highly with other working memory measures,
reasoning, Gf, and g. In addition, it is easy and
straightforward to manipulate the difficulty of a
working memory item by manipulating the stor-
age demand, the process demand, or the time
available to do storage, processing, or both.
Those manipulations account for a large amount
of variance of task difficulty in almost all cases.
2. There is an enormous corpus of research
on working memory and processes in working
memory in cognitive psychology (Conway,
Jarrold, Kane, Miyake, & Towse, in press;
Miyake & Shah, 1999). It is fruitful to derive
knowledge and hypotheses about individual dif-
ferences in cognition from this body of research.
3. In the sense of a reduction of working
memory on biological substrates, intensive and
very productive research has linked working
memory functioning to the frontal lobes and
investigated the relation of various physiological
parameters to cognitive functioning (Kane &
Engle, 2002; see Chapter 9, this volume, for a
review of research linking reasoning to various
neuropsychological parameters). Hence, the
equation of working memory with reasoning is
complemented by relating working memory to
the frontal lobes and other characteristics and
features of the brain.
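Point 1 above, that item difficulty can be manipulated directly through storage and processing demands, can be sketched with a hypothetical complex-span item generator; all labels and numeric ranges are illustrative assumptions, not a published task.

```python
import random

# Hypothetical complex-span item: between each to-be-remembered letter
# (storage demand), the examinee verifies a simple sum (processing demand).
# Difficulty rises with storage_load; time limits would add a third lever.
def make_complex_span_item(storage_load, rng):
    letters = rng.sample("BCDFGHJKLMNPQRSTVWXZ", storage_load)
    trials = []
    for letter in letters:
        a, b = rng.randint(1, 9), rng.randint(1, 9)
        claimed = a + b + rng.choice([-1, 0, 1])      # sometimes false
        trials.append({"verify": (a, b, claimed), "remember": letter})
    return {"trials": trials, "answer": letters}      # recall in order

item = make_complex_span_item(4, random.Random(0))
print(len(item["answer"]))  # 4
```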
The strengths of the relation found between
latent factors of working memory and reasoning
vary substantially, fluctuating between a low of
.6 (Engle, 2002; Engle, Tuholski, Laughlin, &
Conway, 1999; Kane et al., 2004) and a high of
nearly 1 (Kyllonen, 1996). In the discussion of
the strength of the relation, several sources that
could cause an underestimation or an overesti-
mation should be kept in mind.
1. The relation should be assessed on the
level of latent factors because this is the level
of major interest when it comes to assessing
psychological constructs. There should be more
than three indicators of sufficient psychometric
quality for each construct to allow an evaluation
of the measurement models on both sides.
2. Depending on the task selection and the
breadth of the definition of both constructs, the
specification of more than one factor on both
sides might be necessary (Oberauer, Süß,
Wilhelm, & Wittmann, 2003).
3. The definition of constructs and task
classes is a difficult issue. Classifying anything
as a working memory task that requires simulta-
neous storage and processing could turn out to
be overinclusive. Restricting fluid intelligence
to figural-spatial reasoning measures is likely to
be underinclusive. The comments on tasks of
reasoning ability presented in this chapter, as
well as similar comments on what constitutes a
good working memory task (see Chapters 5 and
22, this volume), might be a good starting point
for definition of task classes.
4. Content variation in the operationaliza-
tion for both constructs can have an influence on
the magnitude of the relation. When assessing
reasoning ability, one is well advised to use
several tasks with verbal, figural, and quantita-
tive content. The same is true for working
memory. This chapter provided some evidence
for the content distinction on the reasoning side.
Similar evidence for the working memory
side is evident in structural models that posit
content-specific factors of working memory
(Kane et al., 2004; Kyllonen, 1996; Oberauer,
Süß, Schulze, Wilhelm, & Wittmann, 2000).
Relating working memory tasks of one content
with reasoning tasks of another content causes
one to underestimate the true relation.
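The cost of mismatching content can be illustrated with a minimal two-factor sketch (the function name and loadings below are hypothetical, not taken from the cited models): each standardized task loads on g and on one content factor, and the model-implied correlation between two tasks sums the products of their loadings on shared factors.

```python
def implied_corr(load_g1: float, load_g2: float,
                 load_c1: float = 0.0, load_c2: float = 0.0) -> float:
    """Model-implied correlation between two standardized tasks in a
    toy g-plus-content-factor model. Pass content loadings only when
    both tasks share the same content factor."""
    return load_g1 * load_g2 + load_c1 * load_c2

# Hypothetical loadings: g = .6 and content = .5 for every task.
same = implied_corr(0.6, 0.6, 0.5, 0.5)   # e.g., verbal WM with verbal reasoning
cross = implied_corr(0.6, 0.6)            # e.g., verbal WM with figural reasoning
print(round(same, 2), round(cross, 2))  # 0.61 0.36
```

Under these assumptions, pairing tasks of different contents recovers only the g-mediated part of the relation, hence the underestimation.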
5. A mono-operation bias should be avoided
in assessing both constructs. Using only com-
plex span tasks or only dual-tasks to assess
working memory functioning does not do
justice to the much more general nature of the
construct (Oberauer et al., 2000). Task class-
specific factors or task-specific strategies might
have an effect on the estimated relation.
6. Reasoning measures, like other intelligence tasks, are frequently administered under
time constraints. Timed and untimed reasoning
ability are not perfectly correlated (Wilhelm &
Schulze, 2002). Similarly, working memory tasks
frequently have timed aspects (Ackerman, Beier,
& Boyle, 2003). For example, there might be
only a limited time to execute a process before
the next stimulus appears, there might be a timed
rate of stimulus presentation, and the like.
Common speed variance could inflate the corre-
lation between working memory and reasoning.
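The concern about shared speed variance can be made concrete with a first-order partial correlation, which removes a common third variable from an observed correlation. The figures below are hypothetical and merely show the direction of the effect.

```python
import math

def partial_corr(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation between x and y controlling
    for z (here: working memory and reasoning, controlling for a
    shared speed component)."""
    num = r_xy - r_xz * r_yz
    den = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return num / den

# Hypothetical values: working memory and reasoning correlate .60,
# and each correlates .40 with mental speed. Removing the common
# speed variance lowers the estimate.
print(round(partial_corr(0.60, 0.40, 0.40), 3))  # 0.524
```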
The assumption that working memory is a
critical ingredient for success on reasoning tasks
is compatible with experimental evidence and
theories from cognitive psychology. The ability
to successfully create and manipulate mental
representations was argued to be the critical
ingredient in reasoning. Whether the necessary
representations can be created and manipulated
depends crucially on working memory. This
prediction has gained strong support from the
correlational studies relating working memory
and reasoning. If the individual differences in
reasoning ability and working memory turn out
to be roughly the same, the evidence supporting
the predictive validity of reasoning ability and
fluid intelligence applies to working memory
capacity, too. After careful consideration of
costs and benefits, it might be sensible to use
more tractable working memory tasks for many
practical purposes.
SUMMARY AND CONCLUSIONS
A fruitful avenue for future research on measuring and understanding reasoning ability is characterized by (a) more theoretically motivated work on the processes and resources involved in reasoning and (b) the use of confirmatory methods on the item and test level to
investigate meaningful measurement and struc-
tural models. The major result of efforts directed
that way would be a more profound understand-
ing of important thought processes and an
improved construction and design of measures
of reasoning ability. A side product of such
efforts will be generative item production and
theoretically derived assumptions about psycho-
metric properties of items and tests. Another
side product would be the option to develop
more appropriate means of altering reasoning
ability. There are several very interesting
attempts to develop training methods for rea-
soning ability, and the initial results are encour-
aging in some cases (Klauer, 1990, 2001).
Although it was not possible to discriminate
between inductive and deductive reasoning
psychometrically, appropriate training might still produce differential gains in the two forms of reasoning. The cognitive processes
in inductive and deductive reasoning tasks
might be different, but the individual differences
we can observe on adequate measures are not.
This does not exclude the option that both
thought processes might be affected by different
interventions.
REFERENCES
Ackerman, P. L. (1989). Abilities, elementary infor-
mation processes, and other sights to see at the
zoo. In R. Kanfer, P. L. Ackerman, & R. Cudeck
(Eds.), Abilities, motivation, and methodology:
The Minnesota symposium on learning and
individual differences (Vol. 10, pp. 280–293).
Hillsdale, NJ: Lawrence Erlbaum.
Ackerman, P. L. (1996). A theory of adult intellectual
development: Process, personality, interests, and
knowledge. Intelligence, 22, 229–259.
Ackerman, P. L., Beier, M. E., & Boyle, M. O.
(2003). Individual differences in working
memory within a nomological network of
cognitive and perceptual speed abilities. Journal
of Experimental Psychology: General, 131,
567–589.
Andrews, G., & Halford, G. S. (2002). A cognitive
complexity metric applied to cognitive develop-
ment. Cognitive Psychology, 45, 153–219.
Beauducel, A., Brocke, B., & Liepmann, D. (2001).
Perspectives on fluid and crystallized intelli-
gence: Facets for verbal, numerical, and figural
intelligence. Personality and Individual Differ-
ences, 30, 977–994.
Binet, A. (1903). L'étude expérimentale de l'intelligence [Experimental studies of intelligence]. Paris: Schleicher Frères.
Binet, A. (1905). À propos de la mesure de l'intelligence [On the subject of measuring intelligence]. L'Année Psychologique, 12, 69–82.
Binet, A. (1907). La psychologie du raisonnement
[The psychology of reasoning]. Paris: Alcan.
Boole, G. (1847). The mathematical analysis of logic:
Being an essay towards a calculus of deductive
reasoning. Cambridge, UK: Macmillan, Barclay,
and Macmillan.
Byrne, R. M. J., & Johnson-Laird, P. N. (1989).
Spatial reasoning. Journal of Memory and
Language, 28, 564–575.
Carnap, R. (1971). Logical foundations of probabil-
ity. Chicago: University of Chicago Press.
Carroll, J. B. (1989). Factor analysis since Spearman:
Where do we stand? What do we know? In
R. Kanfer, P. L. Ackerman, & R. Cudeck (Eds.),
Abilities, motivation, and methodology: The
Minnesota symposium on learning and individ-
ual differences (Vol. 10, pp. 43–70). Hillsdale,
NJ: Lawrence Erlbaum.
Carroll, J. B. (1993). Human cognitive abilities: A
survey of factor-analytic studies. Cambridge, UK: Cambridge University Press.
Colberg, M., Nester, M. A., & Cormier, S. M. (1982).
Inductive reasoning in psychometrics: A philo-
sophical corrective. Intelligence, 6, 139–164.
Colberg, M., Nester, M. A., & Trattner, M. H. (1985).
Convergence of the inductive and deductive
models in the measurement of reasoning abilities.
Journal of Applied Psychology, 70, 681–694.
Conway, A. R. A., Jarrold, C., Kane, M., Miyake, A.,
& Towse, J. (in press). Variation in working
memory. Oxford, UK: Oxford University Press.
Craik, K. (1943). The nature of explanation. Cambridge, UK: Cambridge University Press.
Deary, I. J. (2001). Human intelligence differences:
Towards a combined experimental-differential
approach. Trends in Cognitive Science, 5,
164–170.
Ebbinghaus, H. (1895). Über eine neue Methode zur Prüfung geistiger Fähigkeiten und ihre Anwendung bei Schulkindern [On a new method to test mental abilities and its application with schoolchildren]. Zeitschrift für Psychologie und Physiologie der Sinnesorgane, 13, 401–459.
Ekstrom, R. B., French, J. W., & Harman, H. H.
(1976). Manual for kit of factor-reference cogni-
tive tests. Princeton, NJ: Educational Testing
Service.
Engle, R. W. (2002). Working memory capacity as
executive attention. Current Directions in
Psychological Science, 11, 19–23.
Engle, R. W., Tuholski, S. W., Laughlin, J. E., & Conway,
A. R. A. (1999). Working memory, short-term
memory and general fluid intelligence: A latent
variable approach. Journal of Experimental
Psychology: General, 128, 309–331.
Epstein, S. (1994). Integration of the cognitive and
the psychodynamic unconscious. American
Psychologist, 49, 709–724.
Evans, J. St. B. T. (1989). Bias in human reasoning:
Causes and consequences. Hove, UK: Lawrence
Erlbaum.
Ford, M. (1995). Two modes of mental representation
and problem solution in syllogistic reasoning.
Cognition, 51, 1–71.
Frege, G. (1879). Begriffsschrift: Eine der arithmetis-
chen nachgebildete Formelsprache des reinen
Denkens [Begriffsschrift: A formula language
modeled upon that of arithmetic, for pure
thought]. Halle a.S.: L. Nebert.
Gilinsky, A. S., & Judd, B. B. (1993). Working
memory and bias in reasoning across the life
span. Psychology and Aging, 9, 356–371.
Gitomer, D. H. (1988). Individual differences in
technical troubleshooting. Human Performance,
1, 111–131.
Guilford, J. P. (1956). The structure of intellect.
Psychological Bulletin, 53, 267–293.
Guilford, J. P. (1967). The nature of human intelli-
gence. New York: McGraw-Hill.
Guilford, J. P., Christensen, P. R., Kettner, N. W.,
Green, R. F., & Hertzka, A. F. (1954). A factor
analytic study of Navy reasoning tests with the
Air Force Aircrew Classification Battery.
Educational and Psychological Measurement,
14, 301–325.
Guilford, J. P., Comrey, A. L., Green, R. F., &
Christensen, P. R. (1950). A factor-analytic
study on reasoning abilities: I. Hypotheses and
description of tests. Reports from the
Psychological Laboratory, University of
Southern California, Los Angeles.
Guilford, J. P., Green, R. F., & Christensen, P. R.
(1951). A factor-analytic study on reasoning
abilities: II. Administration of tests and analysis
of results. Reports from the Psychological
Laboratory, University of Southern California,
Los Angeles.
Gustafsson, J.-E. (1983). A unifying model for the
structure of intellectual abilities. Intelligence, 8,
179–203.
Hammond, K. R. (1996). Human judgment and social
policy: Irreducible uncertainty, inevitable error,
unavoidable injustice. Oxford, UK: Oxford
University Press.
Handley, S. J., Dennis, I., Evans, J. St. B. T., & Capon,
A. (2000). Individual differences and the search
for counter-examples in reasoning. In W. Schaeken,
A. Vandierendonck, & G. de Vooght (Eds.),
Deductive reasoning and strategies (pp. 241–266).
Hillsdale, NJ: Lawrence Erlbaum.
Hertzka, A. F., Guilford, J. P., Christensen, P. R., &
Berger, R. M. (1954). A factor analytic study
of evaluative abilities. Educational and Psycho-
logical Measurement, 14, 581–597.
Holyoak, K. J., & Thagard, P. (1997). The analogical
mind. American Psychologist, 52, 35–44.
Horn, J. L., & Cattell, R. B. (1967). Age differences
in fluid and crystallized intelligence. Acta
Psychologica, 26, 107–129.
Horn, J. L., & Noll, J. (1994). A system for under-
standing cognitive capabilities: A theory and
the evidence on which it is based. In D. K.
Detterman (Ed.), Current topics in human
intelligence: Vol. 4. Theories of intelligence
(pp. 151–203). Norwood, NJ: Ablex.
Horn, J. L., & Noll, J. (1997). Human cognitive
capabilities: Gf-Gc theory. In D. P. Flanagan, J. L.
Genshaft, & P. L. Harrison (Eds.), Contemporary
intellectual assessment: Theories, tests, and
issues (pp. 53–92). New York: Guilford.
Hummel, J. E., & Holyoak, K. J. (2003). A symbolic-
connectionist theory of relational inference and
generalization. Psychological Review, 110,
220–264.
Jäger, A. O., Süß, H.-M., & Beauducel, A. (1997). Berliner Intelligenzstruktur Test [Berlin Intelligence Structure test]. Göttingen: Hogrefe.
Jensen, A. R. (1998). The g factor: The science of
mental ability. London: Praeger.
Johnson-Laird, P. N. (1985). Deductive reasoning abil-
ity. In R. J. Sternberg (Ed.), Human abilities: An
information-processing approach (pp. 173194).
New York: Freeman.
Johnson-Laird, P. N. (1994a). Mental models and
probabilistic thinking. Cognition, 50, 189–209.
Johnson-Laird, P. N. (1994b). A model theory of
induction. International Studies in the Philoso-
phy of Science, 8, 5–29.
Johnson-Laird, P. N. (2001). Mental models and
deduction. Trends in Cognitive Science, 5,
434–442.
Johnson-Laird, P. N., & Byrne, R. M. J. (1991).
Deduction. Hove, UK: Lawrence Erlbaum.
Johnson-Laird, P. N., & Byrne, R. M. J. (1993).
Models and deductive rationality. In K. Manktelow
& D. Over (Eds.), Rationality: Psychological
and philosophical perspectives (pp. 177–210).
London: Routledge.
Johnson-Laird, P. N., Legrenzi, P., Girotto, V.,
Legrenzi, M. S., & Caverni, J. P. (1999). Naïve
probability: A mental model theory of exten-
sional reasoning. Psychological Review, 106,
62–88.
Kane, M. J., & Engle, R. W. (2002). The role of pre-
frontal cortex in working-memory capacity,
executive attention, and general fluid intelli-
gence: An individual-differences perspective.
Psychonomic Bulletin & Review, 9, 637–671.
Kane, M. J., Hambrick, D. Z., Tuholski, S. W.,
Wilhelm, O., Payne, T. W., & Engle, R. W.
(2004). The generality of working-memory
capacity: A latent-variable approach to verbal
and visuo-spatial memory span and reasoning.
Journal of Experimental Psychology: General,
133, 189–217.
Klauer, K. C., Musch, J., & Naumer, B. (2000). On
belief bias in syllogistic reasoning. Psychologi-
cal Review, 107, 852–884.
Klauer, K. J. (1990). A process theory of inductive
reasoning tested by the teaching of domain-
specific thinking strategies. European Journal of
Psychology of Education, 5, 191–206.
Klauer, K. J. (2001). Handbuch kognitives training
[Handbook of cognitive training]. Toronto:
Hogrefe.
Krueger, F., & Spearman, C. (1906). Die Korrelation zwischen verschiedenen geistigen Leistungsfähigkeiten [The correlation between different mental abilities]. Zeitschrift für Psychologie, 44, 50–114.
Kyllonen, P. C. (1996). Is working memory capacity
Spearman's g? In I. Dennis & P. Tapsfield
(Eds.), Human abilities: Their nature and mea-
surement (pp. 49–75). Mahwah, NJ: Lawrence
Erlbaum.
Kyllonen, P. C., & Christal, R. E. (1990). Reasoning
ability is (little more than) working-memory
capacity?! Intelligence, 14, 389–433.
Kyllonen, P. C., & Stephens, D. L. (1990). Cognitive
abilities as determinants of success in acquiring
logic skill. Learning and Individual Differences,
2, 129–160.
Line, W. (1931). The growth of visual perception in
children. British Journal of Psychology, 15.
Lohman, D. F. (1996). Spatial ability and g. In
I. Dennis & P. Tapsfield (Eds.), Human abilities:
Their nature and measurement (pp. 97–116).
Mahwah, NJ: Lawrence Erlbaum.
Magnani, L. (2001). Abduction, reason, and science:
Processes of discovery and explanation.
Dordrecht, the Netherlands: Kluwer Academic.
McDonald, R. P. (1985). Factor analysis and related
methods. Hillsdale, NJ: Lawrence Erlbaum.
Miyake, A., & Shah, P. (1999). Models of working
memory: Mechanisms of active maintenance
and executive control. New York: Cambridge
University Press.
Oberauer, K., Süß, H.-M., Schulze, R., Wilhelm, O.,
& Wittmann, W. W. (2000). Working memory
capacity: Facets of a cognitive ability construct.
Personality and Individual Differences, 29,
1017–1045.
Oberauer, K., Süß, H.-M., Wilhelm, O., & Wittmann,
W. W. (2003). The multiple faces of working
memory: Storage, processing, supervision, and
coordination. Intelligence, 31, 167–193.
Penrose, L. S., & Raven, J. C. (1936). A new series
of perceptual tests: Preliminary communication.
British Journal of Medical Psychology, 16,
97–104.
Rips, L. J. (1994). The psychology of proof:
Deductive reasoning in human thinking.
Cambridge, MA: MIT Press.
Roberts, R. D., Goff, G. N., Anjoul, F., Kyllonen, P. C.,
Pallier, G., & Stankov, L. (2000). The Armed
Services Vocational Aptitude Battery: Not much
more than acculturated learning (Gc)? Learning
and Individual Differences, 12, 81–103.
Schaeken, W., de Vooght, G., Vandierendonck, A., &
d'Ydewalle, G. (Eds.). (2000). Deductive reason-
ing and strategies. New York: Lawrence Erlbaum.
Schmidt, F. L., & Hunter, J. E. (1998). The validity
and utility of selection methods in personnel
psychology: Practical and theoretical implica-
tions of 85 years of research findings.
Psychological Bulletin, 124, 262–274.
Shafir, E., & Le Boeuf, R. A. (2002). Rationality.
Annual Review of Psychology, 53, 491–517.
Shye, S. (1988). Inductive and deductive reasoning: A
structural reanalysis of ability tests. Journal of
Applied Psychology, 73, 308–311.
Sloman, S. A. (1996). The empirical case for two
systems of reasoning. Psychological Bulletin,
119, 3–22.
Spearman, C. (1904). "General intelligence," objectively determined and measured. American Journal of Psychology, 15, 201–293.
Spearman, C. (1923). The nature of intelligence and
the principles of cognition. London: Macmillan.
Spearman, C. (1927). The abilities of man: Their
nature and measurement. New York: AMS.
Spearman, C. (1938). Measurement of intelligence.
Scientia, 64, 75–82.
Stanovich, K. E. (1999). Who is rational: Studies of
individual differences in reasoning. Mahwah,
NJ: Lawrence Erlbaum.
Stegmüller, W. (1996). Das Problem der Induktion: Humes Herausforderung und moderne Antworten [The problem of induction: Hume's challenge and modern answers]. Darmstadt: Wissenschaftliche Buchgesellschaft.
Stenning, K., & Oberlander, J. (1995). A cognitive
theory of graphical and linguistic reasoning:
Logic and implementation. Cognitive Science,
19, 97–140.
Sternberg, R. J., & Turner, M. E. (1981). Components
of syllogistic reasoning. Acta Psychologica, 47,
245–265.
Störring, G. (1908). Experimentelle Untersuchungen über einfache Schlussprozesse [Experimental studies on simple inference processes]. Archiv für die gesamte Psychologie, 11, 1–127.
Süß, H.-M., Oberauer, K., Wittmann, W. W.,
Wilhelm, O., & Schulze, R. (2002). Working
memory capacity explains reasoning ability
and a little bit more. Intelligence, 30, 261–288.
Thurstone, L. L. (1938). Primary mental abilities.
Chicago: University of Chicago Press.
Thurstone, L. L., & Thurstone, T. G. (1941).
Factorial studies of intelligence. Chicago:
University of Chicago Press.
Undheim, J. O., & Gustafsson, J.-E. (1987). The hier-
archical organization of cognitive abilities:
Restoring general intelligence through the use of
linear structural relations. Multivariate Behavior
Research, 22, 149–171.
Wilhelm, O. (2000). Psychologie des schlussfolgernden Denkens: Differentialpsychologische Prüfung von Strukturüberlegungen [Psychology of reasoning: Testing structural theories]. Hamburg: Dr. Kovac.
Wilhelm, O., & Conrad, W. (1998). Entwicklung und
Erprobung von Tests zur Erfassung des logis-
chen Denkens [Development and evaluation of
deductive reasoning tests]. Diagnostica, 44,
71–83.
Wilhelm, O., & McKnight, P. E. (2002). Ability and
achievement testing on the World Wide Web. In
B. Batinic, U.-D. Reips, & M. Bosnjak (Eds.),
Online social sciences (pp. 151–181). Toronto:
Hogrefe.
Wilhelm, O., & Schulze, R. (2002). The relation of
speeded and unspeeded reasoning with mental
speed. Intelligence, 30, 537–554.
Wilkins, M. C. (1929). The effect of changed mater-
ial on ability to do formal syllogistic reasoning.
Archives of Psychology, 16(102).
Woodworth, R. S., & Sells, S. B. (1935). An
atmosphere effect in formal syllogistic reason-
ing. Journal of Experimental Psychology, 18,
451–460.
Yang, Y., & Johnson-Laird, P. N. (2001). Mental
models and logical reasoning problems in the
GRE. Journal of Experimental Psychology:
Applied, 7, 308–316.