The Nature of Nonmonotonic Reasoning

CHARLES G. MORGAN
Department of Philosophy, University of Victoria, Victoria, B.C. V8W 3P4, Canada and Varney Bay
Institute for Advanced Study, P.O. Box 45, Coal Harbour, B.C. VON 1KO, Canada (E-mail:
morgan@phastf.phys.uvic.ca)
Abstract. Conclusions reached using common sense reasoning from a set of premises are often
subsequently revised when additional premises are added. Because we do not always accept previous
conclusions in light of subsequent information, common sense reasoning is said to be nonmonotonic.
But in the standard formal systems usually studied by logicians, if a conclusion follows from a set
of premises, that same conclusion still follows no matter how the premise set is augmented; that is,
the consequence relations of standard logics are monotonic. Much recent research in AI has been
devoted to the attempt to develop nonmonotonic logics. After some motivational material, we give
four formal proofs that there can be no nonmonotonic consequence relation that is characterized
by universal constraints on rational belief structures. In other words, a nonmonotonic consequence
relation that corresponds to universal principles of rational belief is impossible. We show that the
nonmonotonicity of common sense reasoning is a function of the way we use logic, not a function of
the logic we use. We give several examples of how nonmonotonic reasoning systems may be based
on monotonic logics.

Key words: logic, non-classical logic, nonmonotonic logic

1. Introduction
My friend Sarah is debating with herself about going to the mall. She needs to
do some shopping, but she dislikes crowds. She notes that it is 10:00 a.m. on a
Wednesday morning, and she reasons as follows:
At this hour of the day in the middle of the week, most adults with jobs will be
at work, and most children will be in school. So, the mall will not be crowded.
But then Sarah remembers that it is the 15th of the month. In her town, all the
merchants in the mall give a 25% discount to seniors on the 15th of the month. She
reasons as follows:
But it is senior’s discount day at the mall. Seniors really like to take advantage
of discounts because many of them are on fixed incomes, and seniors will not
be at work. So, the mall will be crowded.
On further reflection, Sarah remembers reading in the paper that there is a ser-
ious flu epidemic in her area. It is so severe that just yesterday the Department
of Health issued an advisory, warning all seniors to avoid crowds. She reasons as
follows:
However, the health department just issued that flu advisory, and at their age,
most seniors are pretty careful about their health. Hence most seniors will avoid
shopping until the advisory is lifted. So, the mall will not be crowded.

Minds and Machines 10: 321–360, 2000.


© 2000 Kluwer Academic Publishers. Printed in the Netherlands.

Sarah’s opinions about whether or not the mall will be crowded switch back and
forth as she recalls more and more relevant information. The pattern illustrated in
our example is familiar to us all. As we add more and more information, our conclu-
sions are apt to swing from one extreme to another, and back again. Commonsense
reasoning about everyday matters is full of such examples.
Formal logics of the sorts usually studied by logicians (at least up to about 20
years ago) do not seem well suited to the analysis of this sort of reasoning. Let A
be an arbitrary sentence and let Γ be an arbitrary set of sentences. We will use the
notation "Γ ⊢ A" to indicate that our logic sanctions the inference from premises
Γ to conclusion A. We may sometimes add a subscript to specify the logic if
confusion about the logic is likely in the context. The usual formal logics (classical,
intuitionistic, many-valued, modal) have a property that has come to be known
as "monotonicity"; namely, if the logic sanctions an inference from premises Γ to
conclusion A, then the logic will sanction the inference to conclusion A from Γ
plus any additional premises. That is, if Γ ⊢ A, then Γ ∪ Δ ⊢ A, for any set Δ.
Note that in our example about Sarah, we did not reject any of our previous
explicitly stated premises as we proceeded. We simply added more and more in-
formation. As each new bit of information was added, Sarah’s conclusion switched
back and forth. Sarah’s reasoning was nonmonotonic. Most common sense reas-
oning about everyday matters seems to be nonmonotonic. Since common sense
reasoning is nonmonotonic, but the usual formal logics are all monotonic, it would
seem that we need to develop a new kind of logic, a nonmonotonic logic, to model
common sense reasoning.
It is the purpose of this paper to challenge not only the NEED, but also the
rational basis, for moving to a special logic in order to account for the nonmono-
tonic character of commonsense inference. After a few brief preliminaries, we
will present four proofs that it is impossible to have a nonmonotonic consequence
relation that corresponds to universal constraints on rational belief structures. In
brief, a rational nonmonotonic logic is impossible. We will then go on to show that
one can model nonmonotonic reasoning with standard monotonic logic, indicating
that there is no need for a special nonmonotonic logic anyway. Non-monotonicity
results from making alterations in our mental models of the world as a result of
new information, and then using the revised mental models and standard logics to
draw our conclusions. Principles of nonmonotonic reasoning are simply heuristics
for the modification of the mental models we use to get around in the world. As
such, they are not content free, unlike principles of deductive reasoning.
Principles of nonmonotonic reasoning contain empirical assumptions and should
be subject to empirical test and verification, as with any other scientific generaliza-
tions. While these conclusions are not startlingly new, what is new are our proofs of
the irrationality inherent in the alternative of viewing nonmonotonicity as deriving
from some strange logic.
This paper is basically meta-theoretical. It is not intended to be a survey of the
field of nonmonotonic logic, and generally, we will avoid detailed commentary on
specific systems. Good introductions to and surveys of the early trends in research
in this area may be found in Besnard (1989), Smets et al. (1988) and Turner (1984).
More recently there have been a large number of survey books published covering
more recent developments; one may profitably begin with Brewka et al. (1997).

2. Preliminary matters
We will begin with a few mundane observations about the role of computation in
the interaction of an organism with its environment. Biological organisms have a
number of primary goals, among which we may list (1) energy acquisition, (2)
reproduction, and (3) death avoidance. These categories are only mentioned as
a rough guide and are intended to be neither mutually exclusive nor jointly ex-
haustive. Let us compare a very simple organism like the paramecium with a more
complicated organism, like the common house cat.
For the paramecium in its usual environment, food is plentiful and randomly
distributed, enemies are randomly distributed, and reproductive opportunities (at
least opportunities for the sharing of genetic material) are randomly distributed. So
in its environment, the paramecium has no need of great computational power.
Computational power would not help it in energy acquisition, reproduction, or
death avoidance. Since computational power costs energy, increased computational
power for the paramecium would be a net loss.
In contrast to the environment of the paramecium, for the common cat, food
is scarce and not randomly distributed, enemies are not randomly distributed, and
reproductive opportunities are not randomly distributed. Nor are any of these items
uniformly distributed. In order to overcome this non-random, but non-uniform dis-
tribution, the cat has to be able to take information in from its environment and
make predictions about the consequences of various actions it might take.
Consider a cat trying to pounce on a mouse in the middle of a field. In order
to be useful, the feline computations must satisfy a number of constraints. Any
prediction by the cat of the movements of the mouse, in response to anticipated
movements by the cat, must occur sufficiently in advance of the event to allow the
cat to act on the prediction; if it takes too long to do the computation, it is useless.
Further, the prediction must concern matters for which the cat possesses some
behavioral strategy; it will do the cat no good to compute air flight trajectories,
as an owl might do, since the cat cannot fly. And finally, the time and energy costs
of the computation from the data available must not on average exceed the utility
of the prediction; so the extra time and energy cost to the cat of predicting the
position of the mouse correct to an angstrom, rather than correct to a centimeter,
would not yield enough payoff to be worthwhile. In summary, predictions must be
made relatively efficiently, they must be mostly correct, and they must be about
matters that are important to the organism.
Random strategies are effective only in random environments of a certain sort.
Success in a highly non-random, non-uniform environment requires both (1) an
adequate behavioral repertoire, and (2) appropriate predictive capacity. The greater
the behavioral repertoire and the better the predictive capacity, the more diverse
environments an organism can exploit, and vice versa. But in general as beha-
vioral and computational capability increase, there is a corresponding increase
in the logical state space complexity (more vectors in the space). And as state
space complexity increases, there must in general be an increase in the physical
complexity of any physical realization of a computer with respect to that space.
But as the physical complexity of an organism increases, there will be increased
energy requirements. Hence, in general, organisms that can exploit more diverse
environments have greater energy requirements.
Devices which can store information in the environment and recover it in a
systematic way as needed will in general have an advantage over devices not
so endowed. The first advantage is that fewer internal states will be required for
memory, so devices with “environmental memory” can be physically simpler. And
the second advantage is that devices with “environmental memory” will generally
have logically superior computational ability. By analogy, recall that a Turing ma-
chine is just a finite state machine brain controlling a read/write head for storing
and recovering information in its environment; and Turing machines are certainly
computationally superior to finite state machines. Further, a universal Turing ma-
chine can perform any computation that can be done by any Turing machine, and
yet the universal machine has a fixed “brain” size.
Thus organisms, with the ability to exploit very diverse environments will derive
an energy advantage from the use of computationally efficient languages that can be
used to model the environment and to predict future events. But in general, time and
energy efficiency requirements prohibit totally accurate models and totally accurate
predictions.
So the human use of natural languages is based, at least in part, on their util-
ity for modeling and predicting aspects of the environment under conditions of
uncertainty. In order to be practically useful, our modeling of the world will not
generally include all possible details. The most detailed map would be one with
a 1:1 scale, but it would be absolutely useless because it would completely cover
the terrain! Similarly, time, space, and energy bounds on our mental computations
require that mental models be somewhat simpler than that which they model. And
we are seldom able to make, and seldom have need of, measurements of length precise to
an angstrom, for example. So many, perhaps most, of our models of and claims
about the world are best viewed as approximations.
To repeat, our rational thought processes involve modeling and predicting as-
pects of our environment under conditions of uncertainty. Here we will assume that
our formal language is adequate for such modeling and for the expression of claims
about our world that are important to us.
To begin with, we do not wish to beg any important formal questions, so we
will be as general as possible in our characterization of the language. We will
assume that we have given some formal language, with a denumerable set of well
formed formulas.

L = {E1 , E2 , . . . }

For stylistic variety, we will use the terms “well formed formula”, “WFF”, “sen-
tence”, and “expression” interchangeably. Initially we make no assumptions about
connectives, quantifiers, or other logical particles.
We will simply assume that crucial aspects of our reasoning can be represented
in an appropriate way by expressions in the language. In particular, we assume that
for any argument, the premises are given as a set of sentences, and the conclusion
is a sentence. We use the notation "Γ ⊢ A" to indicate that the logic under study
sanctions drawing conclusion A from the set of premises Γ. The consequence re-
lation "⊢" will sometimes be subscripted when it is necessary to distinguish the
consequence relation of one logic from that of another. For a given logic, the set of
consequences of the premises Γ is defined by:

Con(Γ) = {A : Γ ⊢ A}

We will now be more precise about the definition of monotonicity. In mathematical
terminology, a real function f(x) defined over the real interval (a,b) is said to be
strongly positively monotonic in the interval (a,b) iff for x ∈ (a,b), f(x) increases
as x increases; f(x) is said to be weakly positively monotonic iff for x ∈ (a,b),
f(x) does not decrease as x increases. For negatively monotonic, just replace "f(x)
increases" and "f(x) does not decrease" in the above definitions by respectively
"f(x) decreases" and "f(x) does not increase". By an obvious extension, the ter-
minology may be applied to any function whose domain and/or range is merely
partially ordered. So for example, the power set function P(Γ) is strongly posit-
ively monotonic, since if Γ ⊂ Δ, then P(Γ) ⊂ P(Δ). Further, the terminology
may be applied to functions of more than one argument by careful specification of
the argument position. For example, we may say that over the interval (1,2) the two
place function g(x,y) = x/y is strongly positively monotonic with respect to its first
argument and strongly negatively monotonic with respect to its second argument.
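These definitions can be checked mechanically on sampled points. The following sketch is purely illustrative; the helper names and the sampling grid are our own choices, not anything in the definitions themselves:

```python
# Brute-force checks of the monotonicity definitions on a finite sample.
# The helper names and the sampling grid are illustrative choices only.

def strongly_increasing(f, xs):
    """True iff f strictly increases over the increasing sample xs."""
    return all(f(a) < f(b) for a, b in zip(xs, xs[1:]))

def strongly_decreasing(f, xs):
    """True iff f strictly decreases over the increasing sample xs."""
    return all(f(a) > f(b) for a, b in zip(xs, xs[1:]))

def g(x, y):
    return x / y

xs = [1.1, 1.3, 1.5, 1.7, 1.9]  # sample points in the interval (1, 2)

# g is strongly positively monotonic in its first argument...
print(strongly_increasing(lambda x: g(x, 1.5), xs))  # True
# ...and strongly negatively monotonic in its second.
print(strongly_decreasing(lambda y: g(1.5, y), xs))  # True
```

A finite sample can of course only falsify, not prove, monotonicity over the whole interval; here it simply exhibits the definitions at work.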
With these definitions, we can see that the consequence relation of classical
logic is weakly positively monotonic with respect to premises, because if Γ ⊂ Δ,
then ConC(Γ) ⊆ ConC(Δ); i.e., if Γ ⊂ Δ, then for all sentences A, Δ ⊢C A if
Γ ⊢C A. Or, to use the more usual formulation, the consequence relation of clas-
sical logic is (weakly positively) monotonic with respect to premises because if
Γ ⊢C A, then Γ ∪ Δ ⊢C A. Many researchers in the field of computing science
seem to be unaware of the mathematical origins of the terms "monotonic" and
"nonmonotonic", and this lack of understanding sometimes leads to confusion.
For future reference, it will be useful to characterize the situation in a slightly
different way. There is an equivalent way of defining monotonicity of a relation
in terms of the characteristic function of the relation. For any n-adic relation R (n
= 1, 2, . . . ), the characteristic function C[R] is defined to be the map which takes
n-tuples from the field of R into {1, 0}, as follows:

C[R](x1, . . . , xn) = 1, if and only if R(x1, . . . , xn) is true;
C[R](x1, . . . , xn) = 0, otherwise.

We then say that the relation R is strongly/weakly, positively/negatively monotonic
with respect to position i if and only if its characteristic function is. So, we can say
that the consequence relation of classical logic ⊢C is weakly positively monotonic
because the characteristic function C[⊢C] is weakly positively monotonic.
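For a finite relation, the definition of C[R] transcribes directly into code. In the sketch below, representing the relation extensionally as a set of tuples is a simplification adopted purely for this illustration:

```python
# C[R] maps n-tuples to 1 or 0 according to whether R holds of them.
# Representing the relation R extensionally as a set of tuples is a
# simplification adopted only for this illustration.

def characteristic(R):
    return lambda *xs: 1 if xs in R else 0

# Example: the subset relation on the subsets of {1, 2}.
universe = [frozenset(s) for s in ([], [1], [2], [1, 2])]
SUBSET = {(a, b) for a in universe for b in universe if a <= b}

C = characteristic(SUBSET)
print(C(frozenset([1]), frozenset([1, 2])))  # 1: {1} is a subset of {1, 2}
print(C(frozenset([1]), frozenset([2])))     # 0: {1} is not a subset of {2}
```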
In this paper we will always be talking about positively monotonic functions,
so with that understanding we will no longer specify which sort of monotonicity
is meant. Further, consequence relations are generally only weakly monotonic at
best, so henceforth we will follow common usage in the logic literature and say
“monotonic” when we mean “weakly positively monotonic”.
Common sense reasoning takes us from premises to conclusions. So just as
with classical logic, we may think of common sense reasoning as corresponding
to a consequence relation "⊢CS". Our simple example about Sarah's reasoning
(concerning how crowded the mall will be) clearly shows that the common sense
consequence relation is not monotonic, i.e., it is nonmonotonic; in some cases we
have Γ ⊢CS A, but we do NOT have Γ ∪ Δ ⊢CS A. We now want to examine the
possibility of formalizing a theory of this nonmonotonic consequence relation.
Cooking up a nonmonotonic consequence relation is a completely trivial task.
For example, consider the following strange consequence relation:

Γ ⊢S A iff card(Γ) ≤ n and Lng(A) ⊆ Lng(Γ) but A ∉ Γ.

By "card(Γ)", we mean "the cardinal number of elements in Γ"; and in the above
formulation, "n" is some arbitrary finite integer. And we use "Lng(A)", the lan-
guage of A, to mean the set of non-logical components used in the construction of
A, i.e., the set of predicates, constants, and sentence letters that actually occur in
A; similarly, "Lng(Γ)" is used to designate the set of nonlogical components that
appear in the members of Γ. So basically, our strange consequence relation says
that as long as we have no more than some specified number of premises, any sentence
which uses only the language of the premises but is not one of the premises is a
consequence of those premises.
Our strange consequence relation is clearly nonmonotonic, since if we add
premises so the total exceeds the specified number n, then nothing follows. It is
also paraconsistent, because if the language contains a negation "∼", we may have
both Γ ⊢S A and Γ ⊢S ∼A, but we will not have Γ ⊢S B for all B. But even though
it is nonmonotonic, this consequence relation is still clearly uninteresting. It is
trivial because it sanctions many inferences we would deem to be irrational, and it
fails to sanction inferences we would deem to be rational. For example, assuming
2 ≤ n, then from premise set “The cat sat on the mat., The dog is brown.”, we could
conclude “The dog sat on the cat.” We would also be entitled to conclude “The cat
did not sit on the mat” but we would not be entitled to conclude “The cat sat on the
mat”. Anyone whose reasoning corresponded to our strange consequence relation
would rightly be deemed to be irrational. To the extent that one refuses to accept
logically obvious conclusions, one may be deemed to be less than rational; on the
other hand, jumping to bizarre conclusions is a clear indication of irrationality.
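The strange relation ⊢S is easy to implement. In the sketch below, sentences are modelled as strings and Lng(·) as the set of words occurring in a sentence; this word-based treatment is a crude stand-in for "non-logical components" (it makes no distinction for logical particles), adopted purely for illustration:

```python
N = 2  # the arbitrary finite bound n from the definition

def lng(sentence):
    """A crude stand-in for Lng(A): the words occurring in the sentence."""
    return {w.strip('.').lower() for w in sentence.split()}

def lng_of_set(premises):
    """Lng(Γ): the union of the languages of the members of Γ."""
    return set().union(*(lng(p) for p in premises)) if premises else set()

def strange_follows(premises, conclusion):
    """Γ ⊢S A iff card(Γ) ≤ n, Lng(A) ⊆ Lng(Γ), and A ∉ Γ."""
    return (len(premises) <= N
            and lng(conclusion) <= lng_of_set(premises)
            and conclusion not in premises)

gamma = {"The cat sat on the mat.", "The dog is brown."}
print(strange_follows(gamma, "The dog sat on the cat."))  # True
print(strange_follows(gamma, "The cat sat on the mat."))  # False: it is a premise

# Nonmonotonicity: adding a premise pushes card(Γ) above n, and nothing follows.
gamma3 = gamma | {"The mouse ran."}
print(strange_follows(gamma3, "The dog sat on the cat."))  # False
```

The last call exhibits the nonmonotonic failure directly: a conclusion sanctioned from the smaller premise set is lost when the set grows past the bound.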
So cooking up nonmonotonic consequence relations is not a problem. The prob-
lem is to formulate a nonmonotonic consequence relation that corresponds to
canons of rationality. Roughly what we want is an explication of something like
“Given that it is rational to believe the set of premises 0, it is rational to believe
A.” Relations which violate principles of rational belief can not be taken seriously
as candidates for a common sense consequence relation; to be non-trivial, a con-
sequence relation must correspond to principles of rational belief. In order to be
acceptable, a logic must sanction all (or at least most) of the arguments which
our common sense tells us are rational. On the other hand, we will not accept as
adequate a logic that sanctions just any old inference at all. To be acceptable, the
logic must reject all (or at least most) of the arguments which our common sense
tells us are not rational.
In short, to be acceptable, our consequence relation must correspond to some
sort of theory of rational belief and inference, both in the arguments it accepts and
in the arguments it rejects. When we say that nonmonotonic logic is impossible,
at least part of what we are claiming is that every adequate characterization of a
logical consequence relation in terms of rational belief structures is monotonic.
That is, mathematically speaking its consequence relation is weakly, positively
monotonic. Our first two proofs are aimed at establishing this result. Further, we
also wish to claim that the nonmonotonic character of our commonsense inferences
is an essentially meta-linguistic characteristic of the way in which the proof theory
is used, and it cannot be mirrored by any syntactic structures in the object language;
in other words, it is impossible to introduce into the syntax of the object language
any structures that adequately reflect the nonmonotonic character of common sense
reasoning. Our second two proofs are aimed at establishing this result.

3. First proof

Perhaps the most detailed formal account of rational belief structures is classical
probability theory. We can think of probability theory as an instantaneous snapshot
of the beliefs of some ideally rational agent. But for our purposes, classical prob-
ability theory depends on too many assumptions. We do not want to assume that
the language contains any particular logical particles, while classical probability
theory usually assumes at least the machinery of classical logic. We do not want
to assume that our beliefs can be assigned precise numerical values, as is done
in probability theory. We do not want to assume that all beliefs can be linearly
ordered, nor even that all beliefs are pairwise comparable.
Intuitively, we can think of a theory about some aspect of our world as just a set
of sentences. An instantaneous snapshot of the belief system of a rational agent will
assign some level of "belief" to at least some sets; we could write "b(Γ)" for "the
degree of belief in Γ". But degrees of belief may not correspond to numbers, and
many sets may have no degree of belief because they have never been entertained.
Further, it may be that there is no intersubjective way of comparing degrees of
belief from one individual to another. From our point of view, what is important is
that the belief system will impose some form of ordering on some of the sets. Since
it is the ordering that indicates the structure, we will generally concentrate on the
ordering. We will use LEb to indicate a quasi-ordering relation relative to belief
structure b; but because the "degree of belief" is of no interest to us apart from
the ordering relation, we will generally drop the subscript. So instead of writing
something like "b(Γ) ≤ b(Δ)" we will write "Γ LEb Δ", or more simply "Γ LE Δ",
for "the degree of joint acceptability of the members of Γ is less than or equal to
the degree of joint acceptability of the members of Δ".
Recall that we do not assume that “degree of acceptability” or “degree of belief”
corresponds to any numerical measure. Further, we do not assume that the relations
LE are either objectively or subjectively based; they may be either, they may be
partially one and partially the other, or they may be based on some third alternative.
We do not assume that the relations LE are in accord with any standard probability
measure or that they correspond to some objective relative frequency scheme. We
want to admit interpretations such as "If all the members of Γ have a probability
above threshold r, then all the members of Δ also have probability above that
threshold," or "most people who find all the members of Γ to be more acceptable
than their denials would find the members of Δ to be more acceptable than their
denials".
With these brief preliminaries, we are now in a position to specify some re-
strictions which apply to all rational belief structures. For our purposes, we will
take a belief structure LE to be a subset of P(L) × P(L), subject to the following
two constraints:
b.1 Reflexivity: Γ LE Γ
b.2 Transitivity: If Γ LE Γ′ and Γ′ LE Γ″, then Γ LE Γ″.
In short, we will consider any quasi-ordering relation defined over pairs of sets of
sentences to be a belief structure.
Recall that we are thinking of a set of sentences as being a theory about the
world, in the ordinary scientific sense of the word ‘theory’. For example, the set
{A, B} is interpreted as meaning that both A and B hold. It would be bizarre in the
extreme for someone to claim to believe the theory of thermodynamics, but then
go on to say they do not believe in the second law. So sets of sentences are treated
conjunctively, even if there is no conjunction operator in the language. Further, sets
may contain an infinite number of sentences, and so would not correspond to any
single sentence even in standard languages with a conjunction operator. Thus we do
not assume that sets are closed with respect to arbitrary conjunctions of members.
In addition, just as in ordinary scientific discourse, we do not assume that the sets
are “deductively closed”, or closed with respect to any consequence relation. To
write down the theory of Newtonian mechanics, we do not have to write down all
its deductive consequences.
Our next constraint is motivated by simple relative frequency considerations.
As an illustrative example, note that it is harder to build a house and an attached
garage than it is to build just the house. Similarly, a theory which claims both A
and B will be more difficult to support than a theory which claims just A. Looking
at it from another point of view, there will be fewer (or no more) universe designs
compatible with both A and B than there are compatible with just A. In general, if
Γ makes no more claims about the universe than Δ, then Γ is at least as likely as
Δ. Formally, we can state the condition as follows.

b.3 Subset principle: If Γ ⊆ Δ, then Δ LE Γ.

If Γ makes no more claims about the world than does Δ, then it would be irrational
not to have at least as strong a belief in Γ as in Δ.
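As a sanity check on b.1-b.3, the ordering in which Γ LE Δ holds exactly when Δ ⊆ Γ satisfies all three constraints. The sketch below verifies this by brute force over a three-sentence language; the representation is an illustrative choice, and this is of course only one belief structure among many admissible ones:

```python
from itertools import combinations

# Verify b.1-b.3 for the belief structure in which Γ LE Δ iff Δ ⊆ Γ,
# i.e. the minimal structure generated by the subset principle,
# over a tiny three-sentence language.

sentences = ["A", "B", "C"]
sets_ = [frozenset(c) for r in range(len(sentences) + 1)
         for c in combinations(sentences, r)]

LE = {(g, d) for g in sets_ for d in sets_ if d <= g}

# b.1 Reflexivity: Γ LE Γ for every Γ.
assert all((g, g) in LE for g in sets_)
# b.2 Transitivity: Γ LE Γ′ and Γ′ LE Γ″ imply Γ LE Γ″.
assert all((g, h) in LE
           for (g, d1) in LE for (d2, h) in LE if d1 == d2)
# b.3 Subset principle: Γ ⊆ Δ implies Δ LE Γ.
assert all((d, g) in LE for g in sets_ for d in sets_ if g <= d)

print("b.1-b.3 hold for this structure")
```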
For any specific logic, there will no doubt be other constraints one could place
on rational belief structures. For just one example, consider classical logic. In
classical logic we know that from a conditional A ⊃ B and its antecedent A, one
may infer the consequent B. This inference rule might translate into a number of
restrictions on rational belief structures, such as either of the following:

b.mp1 {A ⊃ B, A} LE {B}
b.mp2 If A ⊃ B ∈ Γ and A ∈ Γ, then Γ LE Γ ∪ {B}

But again, we do not wish to prejudge any issues with regard to logical particles or
inference rules. Recall that formal logics are both descriptive and prescriptive. So
to avoid questionable presuppositions, we will adopt the very general view that a
logic is just any prescription for what is to count as a rational belief structure. That
is, a logic L is any arbitrary set of belief structures:

L = {LE1 , LE2 , . . . }

Each distinct logic will pick out a different set of belief structures; those for clas-
sical logic will be different from those for intuitionism, and both will be different
than those for Post logic. But the important point is that from the standpoint of
logic L, all and only the belief structures in L are rational.
Recall that we are concerned with the question of what kinds of entailment
relations can be characterized in this way. Whatever else it may be, logic is not
psychology. Logicians are not trying to codify the mental associations of some
particular individual. So logical entailment does not depend on any single belief
structure. Rather, logical entailment for logic L must be based on some universal
properties of all rational belief structures as determined by L. Further, if the proof
theory is to be adequate, it must accept all the arguments sanctioned by L and reject
all the arguments deemed unacceptable by L. These considerations are expressed
by two standard properties, which we will discuss in turn.
To motivate the first property, suppose we are trying to construct an artificial
reasoning system. We certainly want to AVOID making a system that will some-
times, in logically unpredictable cases, draw conclusions that most would regard as
irrational. Our first property captures this desideratum and is called “soundness.”
We can think of this principle as the “good path” principle; logical entailment
should never lead us astray. We can express the soundness principle formally as
follows:

(1.1) Soundness: If Γ ⊢b A, then Γ LE {A} for all (most) rational belief
structures LE ∈ L.

Note that we are using the notation "⊢b" to indicate a proof theory which coincides
with some set L of rational belief structures.
To motivate the second property, again suppose we are trying to construct an
artificial reasoning system. As before, we want to AVOID making a system that
will sometimes, in logically unpredictable cases, FAIL to draw conclusions that
most would deem to be rational. The second property is called “completeness.”
Basically, we want our proof theory to be strong enough to capture any argument
which is sanctioned by every rational belief structure. If, no matter what the rational
belief structure is like, it is always rational to accept A on condition that we accept
all of 0, then 0 logically entails A. Using the same notation as above, we can
formally express the completeness condition as follows:

(1.2) Completeness: If Γ LE {A} for all (most) rational belief structures
LE ∈ L, then Γ ⊢b A.

Note that in our statements of (1.1) and (1.2), we have included “most” in par-
entheses in an effort to be as generous as possible. In our proof below, we will use
the “all” construction to keep the presentation of the proof as simple as possible.
However, simple relative frequency considerations could be used to establish the
same result as long as "most" means "more than 50%."
So far, we have been very general in all our considerations. We have allowed
the formal language to have absolutely any logical particles and internal structure
desired. Our characterization of belief structures was also very general, requiring
only the minimal conditions dictated by relative frequency considerations. In par-
ticular, we imposed no requirements on the evolution of belief structures. And we
allowed a completely arbitrary specification of which belief structures are to be
considered rational. We have imposed no requirements on the sorts of inference
rules contained in the proof theory. Of our proof theory, we required only (1)
that it not lead us astray by sanctioning inferences deemed to be irrational, and
(2) that it be strong enough to capture any argument deemed to be rational by all
belief structures. But even with this simple framework, we are now in a position to
demonstrate the impossibility of a nonmonotonic consequence relation.

Theorem 1: Let L be an arbitrary set of rational belief structures which are re-
flexive, transitive, and satisfy the subset principle. Further suppose that logical
entailment ⊢b is sound and complete with respect to the set L. Then logical
entailment is monotonic; that is, if Γ ⊢b A, then Γ ∪ Δ ⊢b A.
Proof:

1. Γ ⊢b A (given)
2. Γ LE {A} for all LE ∈ L (1, soundness)
3. Γ ∪ Δ LE Γ for all LE ∈ L (subset principle)
4. Γ ∪ Δ LE {A} for all LE ∈ L (2, 3, transitivity)
5. Γ ∪ Δ ⊢b A (4, completeness) ∎
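The proof can also be watched in action on a finite example. The sketch below fixes a small family of belief structures satisfying b.1-b.3, defines entailment exactly as soundness and completeness jointly dictate (Γ ⊢b A iff Γ LE {A} in every structure), and confirms by exhaustion that the resulting relation is monotonic. The particular structures chosen are arbitrary illustrations, not anything singled out by the theorem:

```python
from itertools import combinations

# A finite illustration of Theorem 1. Entailment is DEFINED from the
# belief structures as soundness and completeness jointly require:
# Γ ⊢b A iff Γ LE {A} in every structure of L. The two structures
# below are arbitrary examples satisfying b.1-b.3.

sentences = ["A", "B"]
sets_ = [frozenset(c) for r in range(len(sentences) + 1)
         for c in combinations(sentences, r)]

# The core generated by the subset principle: Γ LE Δ whenever Δ ⊆ Γ.
core = frozenset((g, d) for g in sets_ for d in sets_ if d <= g)
# A second structure adding one extra comparison; it remains reflexive,
# transitive, and subset-respecting, since it extends the core.
extra = core | {(frozenset(["A"]), frozenset(["B"]))}
L = [core, extra]

def entails(gamma, a):
    """Γ ⊢b A iff Γ LE {A} for every LE in L."""
    return all((gamma, frozenset([a])) in le for le in L)

# Exhaustive monotonicity check: Γ ⊢b A implies Γ ∪ Δ ⊢b A.
for g in sets_:
    for a in sentences:
        if entails(g, a):
            assert all(entails(g | d, a) for d in sets_)

print("entailment defined from L is monotonic")
```

The check never fails, for exactly the reason the proof gives: Γ ∪ Δ LE Γ holds in every structure by the subset principle, so transitivity carries Γ LE {A} up to Γ ∪ Δ LE {A}.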

In summary, it is trivially easy to formulate "proof theories" which violate the
principle of monotonicity. But these nonmonotonic algebras cannot correspond to
logical entailment based on rational belief. Any entailment relation which cor-
responds to elementary principles of rational belief must be monotonic. Hence, a
rational nonmonotonic logic is impossible, no matter how the set of rational belief
structures is selected.
(I have presented the proof in this section in a number of lectures over the years
since 1988, and it appears in Morgan (1998).)

4. Second Proof
Some may feel that trying to represent belief structures by quasi-orderings on sets
of sentences does not do justice to the conditional nature of actual belief systems.
It seems that almost any belief we have is conditional on a lot of background
assumptions. For example, even the belief that “The cat is on the mat” involves
assumptions about the physical nature of cats and mats, the impenetrability of
matter, and so on. So, let us consider conditional belief structures as a foundation
for an entailment relation.
Theories of conditional probability are likely the most frequently encountered
candidates for models of conditional belief structures. As we previously indicated, there is some controversy over the adequacy of numerically based probability
theories as models of actual belief structures. Most people cannot linearly order
their beliefs, much less assign precise numerical weights to them, so it seems that
numerically based theories are overspecified. As before, we do not wish to beg
any fundamental questions, so we will allow both numerically based theories and
comparative theories as well. A “conditional belief unit” will be represented by “c(A, Γ)”, for A any sentence in the language and Γ any set of sentences of the language. We think of A as the conclusion, and we think of the set Γ as containing the premises, the evidence, or the background assumptions. So c(A, Γ) is to be thought of as the degree of belief in A, given the assumptions (evidence) in the set Γ.
For numerically based theories, we think of “c” as being a function which maps ordered pairs (A, Γ) into the closed unit interval [0, 1]. A theory of rational belief must then include a set of constraints which must be satisfied by any acceptable function c. Comparative theories are sometimes represented in terms of some quasi-ordering relation on the units c(A, Γ), and the theory of rational belief specifies constraints on the permissible ordering relations. Alternatively, comparative theories are sometimes formulated in terms of a 4-place predicate; for example, Carnap (1962) uses the notation “mc(A, Γ, B, Δ)”, intuitively meaning something like “the argument (A, Γ) is at least as strong as the argument (B, Δ)”. Constraints are then phrased in terms of the predicate mc. But for our purposes, even for comparative theories it will be convenient to speak of “c” as a function mapping ordered pairs (A, Γ) onto the elements c(A, Γ) of some quasi-ordered set. That is, constraints on the ordering relation or the 4-place predicate can be written equivalently as constraints on the function c. Thus in any case, at least part of a theory of rational belief is the specification of a set of constraints that determines the acceptable conditional belief functions c.
Since we wish to be as general as possible, we will not discuss the nature of the
constraints that determine the acceptable conditional belief functions. Rather, we
will allow any prescription for rationality. So once again, we will take a logic L to
be any set of conditional belief functions:
L = {c1 , c2 , . . . }
The c-functions are assumed to have as domain the ordered pairs in L × P(L). However, for our purposes we need not specify the range; it could be the closed unit interval [0, 1], or it could be the elements of some quasi-ordered algebra in case we are dealing with a comparative theory.
We now wish to draw attention to an important property of conditional belief. It
is of interest to note that we are rationally able to entertain a variety of alternative
hypotheses. I can consider what the world will be like, on condition that I decide
not to go to work tomorrow. I can consider what the world will be like, on condition
that I decide I will go to work tomorrow. In each case, my conditional belief state
seems perfectly rational. A casual examination of a wide variety of probability
theories, whether numerical or comparative, will show that they are virtually all
conditionalizable; i.e., if I conditionalize a rational belief state by adding some
additional sentence to my background assumptions, the resulting belief state is
itself rational. A more formal characterization follows:
C.1 A set L of conditional belief functions is closed under conditionalization iff for all c in L, for any set Δ, the function c* defined below is also a member of L:

c*(A, Γ) =df c(A, Γ ∪ Δ)

Carefully note that we are not saying that c(A, Γ) = c(A, Γ ∪ Δ). We are making no claim about the relationship between c(A, Γ) and c(A, Γ ∪ Δ). All we are saying is that a conditionalized legitimate conditional belief structure is itself a legitimate conditional belief structure.
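The content of C.1 can be made concrete with a small sketch; the belief function below is our own toy illustration, not the paper's. Conditionalizing simply produces another function of the same type, with no claim about how its values relate to the original's.

```python
# Sketch (assumptions ours): a conditional belief function c maps
# (sentence, frozenset-of-premises) to [0, 1].  C.1 says only that
# conditionalizing a legitimate c yields another legitimate function;
# it says nothing about how c(A, G) relates to c(A, G ∪ D).

def conditionalize(c, delta):
    """Return c* with c*(A, gamma) = c(A, gamma ∪ delta)."""
    def c_star(a, gamma):
        return c(a, frozenset(gamma) | frozenset(delta))
    return c_star

# A toy belief function: confidence in "wet" rises with supporting evidence.
def c(a, gamma):
    if a == "wet":
        return 1.0 if "rain" in gamma else 0.2
    return 0.5

c_star = conditionalize(c, {"rain"})
# c* is just another conditional belief function; its values may differ
# from c's at the same arguments.
print(c("wet", frozenset()))        # 0.2
print(c_star("wet", frozenset()))   # 1.0
```

Here c* is perfectly well formed even though c*("wet", ∅) differs from c("wet", ∅), exactly the distinction the text draws.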
Also note that requiring L to be closed under conditionalization does not impose
any restriction on the evolution of beliefs. While some have claimed that the only
rational way for beliefs to evolve is by (e.g., Bayesian) conditionalization, others
have disputed such claims. Again, C.1 does not say anything about the evolution
of beliefs; it simply says that a conditionalized rational belief function is itself a
rational belief function.
The only problems with conditionalization that generally arise correspond to
cases in which the measure of the conditioning set is 0, or alternatively, when the
sentences in the conditioning set are internally inconsistent or incoherent in some
way. In such cases there is a trivial modification of the theory possible to give c* some default value, usually 1 for numerically based theories. The alternative is to permit partial functions; in such a case, we would take c* to be the function indicated in C.1, but undefined when Γ ∪ Δ is problematic as evidence.
We will now consider an important general property of many characterizations
of entailment. In order to motivate the property, let us consider a few proposed
definitions for an entailment relation. For simplicity, we will restrict the examples
to those based on numerically valued functions c.

D.1 Γ ⊢c A iff (∀c) c(A, Γ) > c(∼A, Γ).
D.2 Γ ⊢c A iff (∀c) c(A, Γ) ≥ 1 − ε.
D.3 Γ ⊢c A iff (∀c) for all Σ, c(A, Γ ∪ Σ) > 0.5.

In both D.1 and D.2, the evidence position is always occupied just by the premise set Γ. However, in D.3, the evidence position is occupied by Γ ∪ Σ, and Σ is universally quantified. That is, the defining sentence may talk about not just Γ, but also some set-theoretic function of Γ. But in all three cases, the defining sentence has a property which we call being “transparent to unions”; the formal characterization is as follows:

C.2 “Sentence [A, c(–, f(0))]” is transparent to unions iff for all sets 1, if
“Sentence[A, c(–, f(0) ∪1 ]” is true, then
“Sentence[A, c(–, f(0 ∪ 1)]” is true.

To forestall trivial objections based on misunderstanding, it is important to note that by “transparent to unions”, we do not mean any of the following:

∗    if “Sentence[A, c(–, f(Γ))]” is true, then “Sentence[A, c(–, f(Γ ∪ Δ))]” is true.
∗∗   if “Sentence[A, c(–, f(Γ))]” is true, then “Sentence[A, c(–, f(Γ) ∪ Δ)]” is true.
∗∗∗  if “Sentence[A, c(–, f(Γ) ∪ Δ)]” is true, then “Sentence[A, c(–, f(Γ))]” is true.
∗∗∗∗ if “Sentence[A, c(–, f(Γ ∪ Δ))]” is true, then “Sentence[A, c(–, f(Γ))]” is true.
Being transparent to unions does not mean that one can ignore unions in the evidence position.
Note that every sentence in which the assumption set Γ is mentioned only in expressions of the form “c(–, Γ)” will be transparent to unions. To fail to be transparent to unions, a defining sentence must make strong exclusionary claims about Γ (e.g. ‘Γ must not contain sentences of the form X’, or ‘Γ must be finite’), or the sentence must make use of complicated set-theoretic functions f. A casual examination of proposed definitions of entailment based on conditional belief functions shows that virtually all of them are based on properties defined by sentences transparent to unions.
We are now in a position to recast our soundness and completeness principles
in terms of some defining sentence concerning conditional belief structures. Recall
our previous discussion that logic is not psychology. Entailment must be a universal
property of all rational belief structures, independent of the beliefs of any particular
individual.
(2.1) Soundness: If Γ ⊢c A, then for all conditional belief functions c ∈ L, Sentence[A, c(–, f(Γ))].
(2.2) Completeness: If for all conditional belief functions c ∈ L, Sentence[A, c(–, f(Γ))], then Γ ⊢c A.
With these preliminaries behind us, we are now in a position to give our second proof that logical entailment must be monotone. Our proof of the following theorem is purposely rather pedantic in order that there will be no confusion on the role of “union-transparent”.
Theorem 2: Suppose the set of conditional belief structures is closed under conditionalization. Further suppose that logical entailment is sound and complete with respect to a property of conditional belief structures that is transparent to unions. Then logical entailment is monotonic; that is, if Γ ⊢c A, then Γ ∪ Δ ⊢c A, for all Δ.

Proof:
1. Γ ⊢c A                                    given
2. Let Δ be an arbitrary set.                assume
3. (∀c ∈ L) Sentence[A, c(–, f(Γ))]          1, soundness
4. Let c′ be an arbitrary member of L.       assume
5. c″ ∈ L, for c″(B, Σ) = c′(B, Σ ∪ Δ).      2, 4, conditionalization
6. Sentence[A, c″(–, f(Γ))]                  3, 5
7. Sentence[A, c′(–, f(Γ) ∪ Δ)]              5, 6
8. Sentence[A, c′(–, f(Γ ∪ Δ))]              7, union-transparent
9. (∀c ∈ L) Sentence[A, c(–, f(Γ ∪ Δ))]      4–8, generalization
10. Γ ∪ Δ ⊢c A                               9, completeness
11. Γ ∪ Δ ⊢c A, for all Δ.                   2–10, generalization ∎
Our proof here is extremely general, but very simple, and cannot be avoided by any
characterization of entailment in terms of conditional belief functions in which the
characterization depends only on the given evidence. All of our examples D.1–D.3 yield monotonic entailment relations. As another example, Carnap’s characterization in Carnap (1962) of his comparative relation mc is such that mc(h, e, h′, e′) holds if and only if for all his regular c-functions, c(h, e) ≥ c(h′, e′); but it is easy to show from his characterization that, as would be expected from the above theorem, mc is monotonic in the following sense: If mc(h, e, h′, e′), then for all d, mc(h, e ∧ d, h′, e′ ∧ d) (provided that neither e ∧ d nor e′ ∧ d is logically false).
As long as the set of conditional belief functions is closed under conditionalization, then any entailment relation based on a universal claim about c(–, Γ) must be monotonic. Any entailment relation which corresponds to a universal property of rational belief functions conditioned on the given evidence is monotonic. Hence, again, a rational nonmonotonic logic is impossible.
One possible way to avoid the result is to try to formulate a defining sentence that is not transparent to unions. But note that even if the defining sentence is not fully transparent to unions, it may be transparent to unions when the set Δ belongs to a certain class. For example, we may require that the set of assumptions be finite. In this case, the defining sentence would be transparent to unions for finite sets. Thus our proof of monotonicity would still hold, with the sets Δ restricted to being finite. From the standpoint of the examples usually cited to motivate nonmonotonic logic, the proof that entailment is monotonic relative to sets of a certain class (e.g., finite sets) is just as damaging as the more general proof.
Yet another way to try to avoid the result is to argue against conditionalizability. But while many have argued that conditionalization of a given belief state
(probability distribution) is not always the best, most useful, or most rational way
to update our beliefs, that is not the point at issue here. We are not here concerned
with the question of updating beliefs. All that is required is that a conditionalized
legitimate belief state should itself be a legitimate belief state. Clearly common
sense reasoning makes much use of constructions like “Suppose X is true . . . ”, and
“Consider what will happen if Y turns out to be the case . . . ” It would certainly
seem bizarre to claim that when we conditionalize our belief state in this way, we
suddenly fall into irrationality!
Further note that conditionalizability follows from relative frequency considerations, even relative frequency of universe designs. Any distribution derived from relative frequency can be conditionalized, as long as we incorporate the proviso to handle cases with frequency 0. In addition, let us consider the usual product rule for conjunctions.

P(A ∧ B, Γ) = P(B, Γ) × P(A, Γ ∪ {B})

If we reject conditionalizability, then this rule would seem to reduce to an absurdity, since it takes the probability of the conjunction to be a product of two terms, one of which comes from a conditionalized distribution.
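The product rule itself can be verified against relative frequencies on a toy universe; the universe and predicates below are our own illustration, with extra evidence modeled as intersecting the evidence set.

```python
from fractions import Fraction

# Sketch (toy universe ours): probabilities as relative frequencies, with
# evidence sets as subsets of a finite universe.
UNIVERSE = frozenset(range(12))
A = frozenset(n for n in UNIVERSE if n % 2 == 0)  # "even"
B = frozenset(n for n in UNIVERSE if n % 3 == 0)  # "multiple of 3"

def P(event, given=UNIVERSE):
    if not given:                 # proviso for evidence of frequency 0
        return Fraction(1)
    return Fraction(len(event & given), len(given))

# Product rule: P(A ∧ B, G) = P(B, G) × P(A, G ∪ {B}); adding the
# sentence B to the evidence means intersecting the evidence set with B.
for G in (UNIVERSE, A, B, frozenset(range(6))):
    assert P(A & B, G) == P(B, G) * P(A, G & B)
print("product rule holds on all tested evidence sets")
```

The second factor on the right is drawn from the conditionalized distribution, just as the text notes.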
There is an additional argument for conditionalizability based on the product rule for conjunctions. Assume P is some given distribution, and consider the distribution P*, which is just P conditionalized on B.

P*(A, Γ) =df P(A ∧ B, Γ) / P(B, Γ), for all WFFs A and sets Γ

There will be a problem if P(B, Γ) = 0, but otherwise the distribution P* will be as well defined as the distribution P. A trivial modification of the classical theory will permit the distribution obtained by assigning the constant value 1 to all conclusion-evidence pairs; so we could assign P*(A, Γ) the value 1 for those cases in which P(B, Γ) = 0.
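The definition of P*, including the default value for evidence of frequency 0, can be sketched directly; the universe below is our own toy example.

```python
from fractions import Fraction

# Sketch (toy universe ours): the conditionalized distribution
# P*(A, G) = P(A ∧ B, G) / P(B, G), with default value 1 when P(B, G) = 0.
UNIVERSE = frozenset(range(10))
B = frozenset(n for n in UNIVERSE if n < 5)

def P(event, given=UNIVERSE):
    if not given:
        return Fraction(1)
    return Fraction(len(event & given), len(given))

def P_star(event, given=UNIVERSE):
    if P(B, given) == 0:
        return Fraction(1)        # default for problematic evidence
    return P(event & B, given) / P(B, given)

evens = frozenset({0, 2, 4, 6, 8})
print(P(evens))        # 1/2
print(P_star(evens))   # (3/10) / (5/10) = 3/5
```

For a fixed evidence set, P(B, Γ) is a constant divisor, so P* preserves the ordering of the values P(A ∧ B, Γ), which is the point of the argument that follows.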
Now, note that in the definition, for each evidence set Γ, the value P(B, Γ) is just a normalization factor, a constant. In terms of order alone, for a fixed WFF B, the values of P*(A, Γ), for various WFFs A, will be related to each other in the same way as the values of P(A ∧ B, Γ). So, if there is nothing irrational about the ordering for conjunctions, there should be nothing irrational about the ordering in the conditionalized distribution for each fixed Γ. We do need to worry a bit about the ordering as Γ is varied; but again, by hypothesis the conjunctions are rationally ordered, compared over different evidence sets, by the original distribution. But recall as we said before, in terms of the beliefs of real agents, a quasi-ordering is the best we can hope to achieve. Hence those who wish to argue against conditionalizability will be forced to question the rationality of the ordering among conjunctions in the original belief structure, contrary to the hypothesis that the original belief structure is rational. (This same justification can be given in terms of classical comparative probability theory, but is a bit more complex, and arrives at the same result.) So the set of rational belief structures must be closed under conditionalization, assuming something like the classical product rule for conjunctions holds. And something akin to the classical product rule for conjunctions is dictated by relative frequency considerations.
(The proof in this section is a generalization of one presented in Morgan (1994).)
5. Soundness
Both of our first two proofs depend on the assumption of a principle of soundness.
One proposal for avoiding the impossibility proofs is to abandon soundness. In
terms of classical logic, “revisable” conclusions must go beyond what is deductively sanctioned by the premises, otherwise they would not be “revisable”. But for
such conclusions, it will always be possible to find a model in which the premises
are all true but the conclusion is false. Hence, in terms of classical semantics,
nonmonotonic inferences are not sound.
Before blithely abandoning the soundness principle, it should be noted that
rational belief structures of the sort considered in our first two proofs are quite
different from classical model theory. The two soundness principles utilized in our
proofs are not equivalent, at least not by themselves, to the soundness principle
of classical logic. Note that in both of our impossibility proofs, subject to the
initial constraints, we allowed the selection of the belief structures deemed to be
rational to be made in any way whatsoever. What we proved in each case is that
it is impossible to characterize a set of rational belief structures in such a way
that (1) every inference that is always deemed to be rational is sanctioned by the
proof theory, and such that (2) every inference sanctioned by the proof theory is
always deemed to be rational. That is, no matter what the nonmonotonic proof
theory is like, and no matter how you select the set of rational belief structures, if
the proof theory is complete, then it must sanction arguments that are sometimes
deemed irrational. In this sense, nonmonotonic proof theory is self-refuting. To
give up on soundness is, in essence, to admit that nonmonotonic proof theory must
be irrational.
Note that we cannot even beg the question and use the proof theory to select the
set of belief structures that are to be considered rational. Suppose we start with a
proof theory, and then select the set of belief structures that sanction (in the sense
given by the appropriate soundness and completeness conditions) all and only the
inferences that are sanctioned by the proof theory. Then relative to that set of belief
structures, the proof theory will be monotonic. In short, there is no way to select a
set of rational belief structures and specify a nonmonotonic proof theory that does
not violate the canons of rationality. If you want a complete nonmonotonic proof
theory, then you must be (at least sometimes) irrational.

6. Third Proof
In spite of our first two proofs, one might still harbor a bit of hope for constructing
a theory with some sort of nonmonotonic operator in the syntax of the language,
and then using that operator in some way to account for the nonmonotonicity of
the consequence relation. It will be useful to consider an analogy with classical
probability theory.
We begin with a simple example. Suppose we draw a single card from a standard deck with the usual constitution. Then the probability that the card is black,
given the background information, is 1/2. Now, if we are told, in addition to the
constitution of the deck, that the card drawn is a spade, then the probability that
the card is black will rise to 1; but if instead the additional information is that the
card is red, then the probability that the card is black will fall to 0. In other words,
we can find specific probability distributions P, sentences A, and sets Γ, Δ, and Σ such that:

P(A, Γ ∪ Δ) < P(A, Γ) < P(A, Γ ∪ Σ)

Some additional evidence makes our conclusion more likely, while some additional evidence makes our conclusion less likely. Mathematically speaking, conditional probability functions are in general not weakly positively monotonic in the evidence position.
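The card example can be computed directly as relative frequencies; the encoding of the deck below is our own sketch.

```python
from fractions import Fraction

# Sketch of the card example: a standard 52-card deck as a
# relative-frequency model.  Added evidence can push P(black) to 1 or to 0.
SUITS = ["spades", "clubs", "hearts", "diamonds"]
DECK = frozenset((suit, rank) for suit in SUITS for rank in range(1, 14))
BLACK = frozenset(c for c in DECK if c[0] in ("spades", "clubs"))
SPADE = frozenset(c for c in DECK if c[0] == "spades")
RED = DECK - BLACK

def P(event, given=DECK):
    return Fraction(len(event & given), len(given))

print(P(BLACK))          # 1/2  with only background evidence
print(P(BLACK, SPADE))   # 1    after learning the card is a spade
print(P(BLACK, RED))     # 0    after learning the card is red
```

So with Γ the background constitution of the deck, the spade evidence plays the role of Σ and the red evidence the role of Δ in the displayed inequality.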
At this point, it is important to be very clear about our terminology. As we stated above, conditional probability functions are in general not weakly positively monotonic in the evidence position. For some sets Δ, P(A, Γ ∪ Δ) is less than P(A, Γ), while for other sets Σ, P(A, Γ ∪ Σ) is greater than P(A, Γ). Without trying to specify the details, we note that the assertability of a conclusion relative to a set of premises is related to the conditional probability value. For just one example, we consider the extreme cases: the sentence “A probably holds, provided that things are as described by Γ” is a reasonable claim if P(A, Γ) = 1, but it is clearly unreasonable if P(A, Γ) = 0. Hence there will be cases in which the conditional “A probably holds, provided that things are as described by Γ” is reasonable, but the conditional “A probably holds, provided that things are as described by Γ ∪ Δ” is not reasonable. Similarly, there will be cases in which the conditional “A probably holds, provided that things are as described by Γ” is not reasonable, but the conditional “A probably holds, provided that things are as described by Γ ∪ Σ” is reasonable. Using the modern idiom, the conditional of conditional probability theory clearly seems to be nonmonotonic.
For an alternative view, recall our earlier discussion of the characteristic function of a consequence relation. Typically, characteristic functions are two-valued. But by a simple extension of the notion, we could admit graded characteristic functions. For any graded n-adic relation R (n = 1, 2, ...), the graded characteristic function G[R] is defined to be the map which takes n-tuples from the field of R into the range of possible gradations, as follows:

G[R](x1, . . . , xn) = d, if and only if R(x1, . . . , xn) holds to degree d

One very simple view of probability theory is that it is a graded characteristic function for an entailment relation. That is, probability theory specifies the degree to which a set of premises entails a conclusion. For example, we typically use significance levels as cutoffs for accepting or rejecting conclusions in statistics. Once again, from this point of view, the entailment relation associated with probability theory will be nonmonotonic (in the current idiom), because probability distributions are not in general weakly positively monotonic in the evidence position.
In summary, it seems that the conditional associated with conditional probability theory will be nonmonotonic. Similarly, it seems that if, by analogy with the characteristic function for classical entailment, we think of probability theory as a graded characteristic function for an entailment relation, that entailment relation will be nonmonotonic. So it seems appropriate to simply say that probability theory is nonmonotonic, meaning by this phrase only that in general, probability functions are not weakly positively monotonic in the evidence position.
Note that not every probability function P will behave in this nonmonotonic fashion. For example, in the case of classical logic, for any maximally consistent set Γ, we can define a two-valued distribution PΓ as follows, for all sentences A and sets Δ:

PΓ(A, Γ ∪ Δ) = 1 iff either A ∈ Γ or not Δ ⊆ Γ
PΓ(A, Γ ∪ Δ) = 0 iff Δ ⊆ Γ and A ∉ Γ

The function so defined will satisfy all the constraints for classical probability theory (modulo the proviso for evidence of measure 0). But the distribution PΓ is strictly monotonic, in the sense that for all sentences A and all sets Δ and Σ:

PΓ(A, Δ) ≤ PΓ(A, Δ ∪ Σ)

So we cannot say that every conditional probability distribution fails to be weakly positively monotonic in the evidence position. However, the general theory of probability does admit many nonmonotonic distributions.
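The monotonicity of a distribution of this two-valued kind can be checked exhaustively on a finite stand-in for the maximally consistent set; the encoding below is our own sketch.

```python
from itertools import chain, combinations

# Sketch (finite stand-in ours): the two-valued distribution built from a
# fixed set GAMMA, checked to be monotonic in the evidence position.
GAMMA = frozenset({"p", "q"})      # plays the role of the set Gamma
SENTENCES = ["p", "q", "r"]

def P_gamma(a, evidence):
    if a in GAMMA or not evidence <= GAMMA:
        return 1
    return 0

def powerset(xs):
    return [frozenset(s) for s in chain.from_iterable(
        combinations(xs, n) for n in range(len(xs) + 1))]

# Strict monotonicity: P_gamma(A, D) <= P_gamma(A, D ∪ S) for all D, S.
for a in SENTENCES:
    for d in powerset(SENTENCES):
        for s in powerset(SENTENCES):
            assert P_gamma(a, d) <= P_gamma(a, d | s)
print("P_gamma is monotonic in the evidence position")
```

The check works because once the evidence escapes Γ the value is pinned at 1, and enlarging the evidence can never bring it back inside Γ.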
So we do have general laws about how the conditional of conditional probability works, and we know that it is sometimes nonmonotonic. Some researchers feel that probability theory in some form (numerical, comparative, classificatory) is the most detailed account of rational non-demonstrative reasoning we have. But even without adopting such a strong view, it is clear that reasoning concerning relative frequencies certainly constitutes at least a cornerstone of rationality. Perhaps we can formulate general laws about a nonmonotonic conditional in our object language that captures the conditionality of some very general version of conditional probability theory. In other words, the suggestion is that we extend our formal object language with some object language syntactic structure “s” such that for all distributions P, all sentences A and B, and all sets Γ:

P(A, Γ ∪ {B}) = P(s(B, A), Γ)

Since the conditional of conditional probability is nonmonotonic, the syntactic structure “s(B, A)” should then represent some sort of nonmonotonic conditional in the object language, with antecedent B and consequent A. With some such object language conditional, and a corresponding proof theory “⊢s”, we may hope to define a corresponding nonmonotonic consequence relation “⊢nm” for finite premise sets, perhaps as follows:

Γ ⊢nm A iff ⊢s s(∧Γ, A)

for ∧Γ the conjunction of the members of Γ. (We would have to do something a bit fancier for the case of infinite Γ.)
As initially attractive as the above approach may seem, it is not too difficult to
prove that this strategy is doomed to failure; and the proof has a significant impact
on the prospects for any nonmonotonic logic. In order that our proof should be
as general as possible, we will not limit ourselves to classical probability theory,
nor even to numerically valued theories. Instead, our proof will depend only on
some very weak constraints of a comparative conditional theory. All of the standard
numerically valued probability theories will satisfy our constraints, and hence the
theorem will be very broad in its application.
As before, we assume to be given some arbitrary language L consisting of a set of well-formed formulas. We will use the notation “(A, Γ) LE (B, Δ)” for the claim “A, given Γ, is no more likely than is B, given Δ”. Each distribution LE is actually a 4-place relation:

LE ⊆ (L × P(L)) × (L × P(L))

However, as indicated, we will write the relations in infix notation, and treat them
as two-place relations on ordered pairs. We will first list the constraints on the
relations LE, and then we will give a brief motivation of each one.

CCP.1 The relation LE is transitive.
CCP.2 (A, Γ ∪ {A}) LE (A, Γ ∪ {A, B})
CCP.3 If (A, Γ ∪ {A}) LE (A, Γ ∪ {B}), then (B, Γ) LE (A, Γ).

Condition CCP.1 must trivially be satisfied by any comparative relation. Next consider CCP.2. In any standard logic, the argument to conclusion A from premises that contain A itself is as strong as any argument can be; in numerical probability, the value will always be 1. Such an argument is deductively valid, and no additional premises can reduce the strength of the conclusion, relative to the premises. Our condition CCP.2 reflects this observation in our comparative theory. Finally, consider CCP.3. The following relationship always holds in numerically valued conditional probability theory.

P(A, Γ) × P(B, Γ ∪ {A}) = P(B, Γ) × P(A, Γ ∪ {B})

An elementary 3-circle Venn diagram will easily establish that the relationship must hold of any theory in accord with relative frequency. But if P(A, Γ ∪ {B}) takes the value 1, then it immediately follows that P(B, Γ) ≤ P(A, Γ). Condition CCP.3 is just a simple comparative version of this result.
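These frequency facts behind CCP.2 and CCP.3 can be spot-checked on a toy universe; the universe, predicates, and evidence set below are our own illustration, with added premises modeled as intersections of evidence sets.

```python
from fractions import Fraction

# Sketch (toy universe ours): check the product identity behind CCP.3,
# P(A, G) × P(B, G ∪ {A}) = P(B, G) × P(A, G ∪ {B}), and the CCP.2 fact
# that an argument whose premises contain its conclusion is maximal.
UNIVERSE = frozenset(range(30))
A = frozenset(n for n in UNIVERSE if n % 2 == 0)
B = frozenset(n for n in UNIVERSE if n % 5 == 0)
G = frozenset(n for n in UNIVERSE if n < 20)

def P(event, given):
    if not given:                    # proviso for evidence of frequency 0
        return Fraction(1)
    return Fraction(len(event & given), len(given))

# The product identity behind CCP.3:
assert P(A, G) * P(B, G & A) == P(B, G) * P(A, G & B)

# CCP.2: P(A, G ∩ A) is already 1, so adding B cannot lower it.
assert P(A, G & A) <= P(A, G & A & B)

# CCP.3: whenever P(A, G ∪ {B}) = 1, we get P(B, G) <= P(A, G).
if P(A, G & B) == 1:
    assert P(B, G) <= P(A, G)
print("frequency model satisfies CCP.2 and CCP.3")
```

Both constraints hold for any choice of finite universe and sets here, which is what licenses treating them as minimal conditions on any theory in accord with relative frequency.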
Let us now consider the possibility of attempting to introduce into the object language a syntactic structure which is meant to capture the conditional of conditional probability theory. In terms of our comparative structures, we require the following two conditions:

CCP.4.a (A, Γ ∪ {B}) LE (s(B, A), Γ)
CCP.4.b (s(B, A), Γ) LE (A, Γ ∪ {B})
But serious difficulties arise if we try to add these conditions. We will state and
prove two theorems, and then discuss their significance.

Theorem 3.1: If LE satisfies CCP.1–3 and CCP.4.a, then the syntactic construction s(B, A) is monotonic; that is, the following holds for all expressions A and B and all sets Γ:

(A, Γ) LE (s(B, A), Γ)
Proof:
1. (A, Γ ∪ {A, B}) LE (s(B, A), Γ ∪ {A})                                 CCP.4.a
2. (A, Γ ∪ {A}) LE (A, Γ ∪ {A, B})                                       CCP.2
3. (A, Γ ∪ {A}) LE (s(B, A), Γ ∪ {A})                                    1, 2, CCP.1
4. (s(B, A), Γ ∪ {s(B, A)}) LE (s(B, A), Γ ∪ {s(B, A), A})               CCP.2
5. (A, Γ ∪ {s(B, A), A} ∪ {A}) LE (A, Γ ∪ {s(B, A), A} ∪ {A, s(B, A)})   CCP.2
6. (A, Γ ∪ {s(B, A), A} ∪ {A}) LE (A, Γ ∪ {s(B, A), A} ∪ {s(B, A)})      5, set theory
7. (s(B, A), Γ ∪ {s(B, A), A}) LE (A, Γ ∪ {s(B, A), A})                  6, CCP.3
8. (s(B, A), Γ ∪ {s(B, A)}) LE (A, Γ ∪ {s(B, A), A})                     4, 7, CCP.1
9. (A, Γ ∪ {A} ∪ {A}) LE (A, Γ ∪ {A} ∪ {s(s(B, A), A)})                  CCP.2
10. (s(s(B, A), A), Γ ∪ {A}) LE (A, Γ ∪ {A})                             9, CCP.3
11. (A, Γ ∪ {A, s(B, A)}) LE (s(s(B, A), A), Γ ∪ {A})                    CCP.4.a
12. (A, Γ ∪ {A, s(B, A)}) LE (A, Γ ∪ {A})                                10, 11, CCP.1
13. (s(B, A), Γ ∪ {s(B, A)}) LE (A, Γ ∪ {A})                             12, 8, CCP.1
14. (s(B, A), Γ ∪ {s(B, A)}) LE (s(B, A), Γ ∪ {A})                       3, 13, CCP.1
15. (A, Γ) LE (s(B, A), Γ)                                               14, CCP.3 ∎
Theorem 3.2: If LE satisfies CCP.1–3, 4.a, and 4.b, then LE is monotonic; that is, the following holds for all expressions A and B and all sets Γ:

(A, Γ) LE (A, Γ ∪ {B})

Proof: The proof is immediate from Theorem 3.1, CCP.4.b, and CCP.1. ∎

Suppose we try to introduce a conditional construction into the syntax that allows us to move sentences from the premise set of the probability distribution into the antecedent of a conditional in the object language without decrease of probability value (i.e., in accord with CCP.4.a). Theorem 3.1 immediately tells us that any such conditional must be monotonic. Looked at in another way, Theorem 3.1 tells us that any nonmonotonic conditional in the syntax must violate fundamental principles of probability theory. In other words, no nonmonotonic conditional in the syntax can serve as the syntactic structure of CCP.4.a. It is logically impossible for a nonmonotonic conditional in the syntax of the language to reflect the most elementary principles of relative frequency.
But the situation is even more serious, as is indicated by Theorem 3.2. Note
that the statement of Theorem 3.2 does not directly mention the new syntactic
structure. The simple attempt to include a structure in the object language which
captures the conditional of the probability theory forces the probability theory to
be monotonic for every set of premises and every possible conclusion, whether
the new conditional structure is involved or not. But not only is the probability
theory forced to be monotonic, the corresponding object language conditional must
itself be monotonic as well. So the attempt to capture in the object language the
nonmonotonic character of conditional probability theory is doubly doomed: (1)
the probability theory immediately ceases to be nonmonotonic, and (2) the new object language conditional cannot be nonmonotonic either.
No matter what the logic is like, no matter what logical particles the language contains, it is logically impossible to incorporate any syntactic structure in the object language which captures the nonmonotonic character of conditional probability theory. As soon as we try to incorporate such a structure into the object language, the probability theory is forced to be monotonic. But if the conditional of the probability theory is forced to be monotonic and the object language conditional captures the conditional of the probability theory, then the object language conditional will be monotonic too. Hence, the nonmonotonic character of conditional probability theory is essentially, necessarily meta-theoretical, and cannot be reflected in the underlying syntax of the object language.
Thus any alleged nonmonotonic conditional in the object language must be characterized by principles that violate the most elementary considerations of relative frequency. Note that this result applies, no matter how a theory of nonmonotonic conditionals is formulated. We motivated the theorem in this section by beginning with a consideration of the conditional of conditional probability theory. But these results apply to any logic, with any logical particles, no matter what the
characteristics of the underlying proof theory, no matter what the characteristics of its underlying semantics, if any.
In short, it does not matter whether or not the theory is formulated in terms
of some abstract calculus completely unrelated to probabilistic considerations. No
matter what the proof theory is like, no matter what the syntax of the language is
like, if the language includes a conditional operator that is nonmonotonic, then that
operator logically cannot correspond to the conditional of conditional probability
theory.
It is fair enough for any researchers to say that, at least initially, they are not
trying to capture probability theory. But the important point that cannot be avoided,
regardless of the goals of the research, is that any nonmonotonic conditional lo-
gically must be governed by principles that violate elementary considerations of
relative frequency. But these elementary principles are basic to almost any notion of
rationality.
(The proof in this section is based on considerations which appear in Morgan
and Mares (1995) and Morgan (1997).)

7. Fourth Proof
From our first three theorems it is obvious that there are very severe problems asso-
ciated with any attempt to account for the general nonmonotonic character of our
reasoning by the development of a new consequence relation in formal logic. Such
developments cannot correspond to rational belief structures. And there is no way
to incorporate logical particles into the syntax of the language in such a way that
those particles are governed by relative frequency considerations. However, we did
indicate that it is always trivially possible to develop nonmonotonic algebras, and in
spite of their shortcomings, such theories may turn out to have useful applications
in some areas. For our last proof, we will consider potential difficulties that might
arise when we try to reason with such logics. We will begin with a couple of simple
examples of reasoning.
First example: Sherlock, the detective, is trying to figure out who shot JR. Since
there was only one bullet, only one person did the deed. Initial evidence (motive,
location) indicates the culprit was either Mary or Sue. Subsequent evidence (find-
ing the gun, matching the marks on the bullet in the body to those from bullets fired
from the gun, fingerprints on the gun, witnesses) strongly suggests Mary pulled the
trigger. Sherlock concludes that Sue did not shoot JR.
Second example: Astronomer Spock makes observations of the motion of bod-
ies in a certain location in space. His observations plus considerations of relativistic
mechanics suggest there is a very massive object with a relatively small radius
there. Such an object would be a black hole. From the theory of black holes, and
from the observations of the matter in the region, Spock predicts that characteristic
radiation, released by matter falling into the black hole, will be detected in the
region.
Both of our examples are instances of chaining together inferences in a way
indicated by the following pattern:
trans: If Γ ⊢ A and Γ ∪ {A} ⊢ B, then Γ ⊢ B.
This characteristic of consequence relations is sometimes called transitivity. We
have already alluded to another property of consequence relations, which is some-
times called reflexivity.
reflx: Γ ∪ {A} ⊢ A
Both of these properties, along with monotonicity, were formulated by Tarski in his
early studies of consequence relations. However, here we would like to examine the
possibility of nonmonotonic reflexive and transitive consequence relations. It turns
out that there is a tight connection between our ability to reflect the consequence
relation in the syntax of our logic and the principle of monotonicity. In parallel to
our discussion in the previous section, let us consider the possibility of some sort of
“deduction” theorem. That is, let us suppose that there is some syntactic structure
in the object language which reflects the consequence relation; this “reflection” has
two parts.
ded.1 If Γ ∪ {B} ⊢ A, then Γ ⊢ s(B, A).
ded.2 If Γ ⊢ s(B, A), then Γ ∪ {B} ⊢ A.
Of course in classical logic, the deduction theorem holds with respect to the ma-
terial conditional, or equivalent syntactic structures. But it turns out that we cannot
have a nonmonotonic logic whose consequence relation is reflected in the syntax.

Theorem 4.1: For any reflexive, transitive consequence relation satisfying ded.1,
the syntactic structure “s” is monotonic; that is:
If Γ ⊢ A, then Γ ⊢ s(B, A).
Proof:
1. Γ ⊢ A given
2. Γ ∪ {A, B} ⊢ A reflx
3. Γ ∪ {A} ⊢ s(B, A) ded.1
4. Γ ⊢ s(B, A) 1, 3, trans □
Theorem 4.1 demonstrates that it is impossible to reflect any transitive, reflexive
consequence relation by any nonmonotonic construction in the syntax of the ob-
ject language. However, the situation is even more serious, as is indicated by the
following result.

Theorem 4.2: Any reflexive, transitive consequence relation that satisfies both
ded.1 and ded.2 with respect to any structure “s” is itself monotonic; that is:
If Γ ⊢ A, then Γ ∪ {B} ⊢ A.
Proof: The proof is immediate from Theorem 4.1 and ded.2. □

Theorem 4.2 tells us that if a deduction theorem holds with respect to any syntactic
construct, no matter what the characteristics of that construct, then a transitive,
reflexive consequence relation must be monotonic. (This result tells us that it is
impossible to obtain a nonmonotonic logic by any simple extension of a classical
natural deduction system.) Thus Theorem 4.2 suggests again that nonmonotonicity
of the consequence relation is independent of the syntax of the language.
Looked at from another point of view, Theorem 4.2 just says that no transitive,
reflexive, nonmonotonic consequence relation can have a deduction theorem with
respect to any syntactic structure in the object language. For some, this result may
not seem too devastating, at first. After all, the usual formulation of the rule of
necessitation for the standard modal logics precludes a proof of the deduction
theorem. But, there are some important differences between the standard modal
logics and attempts to formulate nonmonotonic logic. In modal logic, the rule of
necessitation is validity preserving, not truth preserving. But as we have seen in
our first two impossibility proofs, nonmonotonic logics cannot be formulated with
validity preserving principles, i.e., with principles that are universalizable across
all models. There are ways of formulating standard modal logics and the rule of
necessitation so that the deduction theorem holds. But in contrast, there is no way
to formulate a nonmonotonic logic so that the deduction theorem holds. So the
difficulty presented by Theorem 4.2 is rather deeper than the difficulty associated
with the usual formulations of modal logics.
To see how penetrating the problem is, we can reformulate Theorems 4.1 and 4.2
in terms of an axiomatic structure with a special conditional operator “→” and a
conjunction operator “∧”, both in the object-language. We can think of our
characterizations of the consequence relation at the meta-level as being characteristics
of the object-language conditional instead. Our transitivity and reflexivity conditions
now become the following object-language conditions:

otrans: If ⊢ x → y and ⊢ (y ∧ x) → z, then ⊢ x → z.

oreflx: ⊢ (x ∧ y) → x

There will be two object-language principles, parallel to the two components of the
deduction theorem, as follows:
oded.1: If ⊢ (x ∧ y) → z, then ⊢ x → s(y, z).
oded.2: If ⊢ x → s(y, z), then ⊢ (x ∧ y) → z.
Finally, we assume the conjunction operator is at least partially associative in the
antecedent of the conditional:
assoc: If ⊢ (w ∧ (x ∧ y)) → z, then ⊢ ((w ∧ x) ∧ y) → z.
We can now recast Theorems 4.1 and 4.2, as follows.
Theorem 4.3: Suppose we have an axiomatic system that satisfies otrans, oreflx,
oded.1, and assoc. Then for all expressions A, B, and C: if ⊢ A → B, then
⊢ A → s(C, B).
Proof:

1. ⊢ A → B given
2. ⊢ (B ∧ (A ∧ C)) → B oreflx
3. ⊢ ((B ∧ A) ∧ C) → B 2, assoc
4. ⊢ (B ∧ A) → s(C, B) 3, oded.1
5. ⊢ A → s(C, B) 1, 4, otrans □

Theorem 4.4: Suppose we have an axiomatic system that satisfies otrans, oreflx,
oded.1, oded.2 and assoc. Then for all expressions A, B, and C: if ⊢ A → B, then
⊢ (A ∧ C) → B.
Proof: The proof is immediate from Theorem 4.3 and oded.2. □

Any syntactic structure which serves to reflect a conditional in the consequent
of that conditional, in the sense of oded.1, must be monotonic in that reflection.
So in particular, if the conditional reflects itself, then in its second order position,
it will be monotonic. For example, suppose our conditional “→” is the syntactic
structure of oded.1, so that the following holds:

oded.3: If ⊢ (x ∧ y) → z, then ⊢ x → (y → z).

Then Theorem 4.3 tells us that the conditional is monotonic in its second order
occurrences:

If ⊢ A → B, then ⊢ A → (C → B).

Hence, all of its nonmonotonic behavior will be limited to its first-order occur-
rences.
But even worse for seekers after a nonmonotonic logic, any conditional which
has a uniform reflection by any syntactic structure in its consequent, in the sense
of oded.1 and oded.2, will be monotonic. If a conditional is
provable, then so will be a similar conditional with arbitrary added conjunctive
components in its antecedent. But then many moves which we make in common
sense reasoning will not hold for such a logic. For example, one cannot convert a
nonmonotonic conditional with a conjunctive antecedent into a conditional chain
and back again.
In summary, there can be no uniform connection between nonmonotonic con-
sequence relations and nonmonotonic conditionals. Out of necessity, researchers in
nonmonotonic logics have given up the deduction theorem for various specific pet
systems; thus some may be inclined to simply shrug at these results. But these
results are not peculiar to some specific system and some specific connective;
they are much more general. They tell us that all nonmonotonic conditionals and
all nonmonotonic consequence relations are logically incompatible. Paradoxically,
trying to formally force a connection between the two guarantees that neither the
consequence relation nor the conditional will be nonmonotonic. But then ordinary
patterns of common sense reasoning seem to have no counterpart in such logics.
So a theory of nonmonotonic conditionals that broadly accords with common sense
reasoning is impossible.

8. The Nature of Nonmonotonic Reasoning


Perhaps the first point that needs to be emphasized is that one cannot escape the
four impossibility proofs by introducing strange, new modal operators, condition-
als, or other connectives into the object language. Except for Theorems 4.3 and
4.4, none of our theorems depend in any way on the logical particles in the object-
language, and those latter two only assumed the presence of a simple conjunction
operator. The theorems apply no matter what exotic syntactic structures or infer-
ence rules are introduced. I do not mean to suggest that there are no good reasons
for introducing new syntactic structures or inference rules into our logical systems.
But changes in our underlying logic cannot be motivated by claims that the changes
will produce a nonmonotonic proof theory that accords with rational belief. What
we have shown is that no matter how exotic, if a proof theory is to accord with
rational belief, then it must be monotonic, period.
It seems we are now in a very strange position. We began this paper by giving
a rather straightforward, uncomplicated example of common sense reasoning that
is clearly nonmonotonic. Such examples can be enumerated without end. But we
have just presented four rather distinct proofs that a nonmonotonic logic that cor-
responds with rational belief is impossible. Are we left to conclude that common
sense nonmonotonic reasoning is illogical or irrational? Not at all. I do not want
to suggest that nonmonotonic reasoning is irrational. What I do want to suggest
is that we can account for nonmonotonic reasoning by using standard monotonic
logics; we do not need a “weird” logic to account for the process.
Thus far we have shown that no nonmonotonic consequence relation can corres-
pond to universal principles of either unconditional or conditional rational belief.
We have also shown that any nonmonotonic object language conditional is in-
compatible with elementary principles of relative frequency. And we have shown
that given elementary principles of common sense reasoning, nonmonotonic con-
ditionals in the object language are logically incompatible with nonmonotonic
consequence relations. But we also assume that common sense nonmonotonic reas-
oning is rational. Hence, the nonmonotonicity of common sense reasoning cannot
be accounted for by any nonmonotone character of the logic that we use. The
nonmonotonicity of commonsense reasoning is due to the way we use logic, and
not due to the logic that we use.
Let us consider a simple analogy. The field of astrophysics changes relatively
quickly for a science. For example, our views of the large scale structure of the
universe have changed drastically, several times over the last 50 years. However,
no one would suggest that we need a nonmonotonic theory of thermodynamics,
general relativity theory, or elementary arithmetic just because we in astrophys-
ics change our minds about some astronomical phenomenon when we get new
evidence. Mathematics and the fundamental theories of physics are just tools that
astrophysicists use to try to understand astrophysical phenomena. In a similar way,
logic is a tool that we can use to help us understand the human process of reasoning.
Logic is not itself a full account of the process of reasoning, any more than is
general relativity an account of the process of the evolution of the universe. Logic
is a tool to help us understand the process of reasoning, just as general relativity is
a tool to help us understand the evolution of the universe.
Part of what makes the jump from “nonmonotonic reasoning” to “nonmonotonic
logic” seem so appealing is a confusion of logical theory with an account of the
human process of reasoning. The term “logic” is commonly used in several senses.
In one sense, “logical” is used interchangeably with “reasonable”. To ask for the
logic behind someone’s inferences or actions is to ask for an account that would
make us see those inferences or actions as rational; that is, we want a story that
would allow us to empathize, to see that we would make the same inferences or
perform the same actions in similar circumstances. When we arrive at such an
understanding, we feel the inferences or actions are “rule governed”, not random
or erratic. But it would be a mistake to jump from the observation “Her infer-
ences were quite logical” to the claim that there is some different formal system
with weird connectives and/or inference rules that is needed to account for her
inferences. Reasoning is a human process that takes place through time. Logic
provides, at best, an instantaneous snapshot of a “rational” consequence relation.
The difference is as fundamental as the difference in physics between dynamics
and statics.
Before giving a couple of examples of how monotonic logic can be used to
provide an account of nonmonotonic reasoning, we will sketch an elementary
abstract account in terms of model theory and associated proof theory that may
serve as a simplified general framework to aid our understanding. Consider some
standard deductive consequence relation, which we will symbolize by “⊢D”. Let
us also assume we have given some notion of deductive model, so that with each
set Γ of sentences we have an associated set DM(Γ) of the deductive models of Γ.
The usual soundness and completeness results can be stated as follows:

D-soundness: If Γ ⊢D A, then DM(Γ) ⊆ DM({A}).

D-completeness: If DM(Γ) ⊆ DM({A}), then Γ ⊢D A.

Can we construct a similar account for inductive logic (i.e., so-called nonmono-
tonic logic)? Using similar notation, it seems we need an inductive consequence
relation and some notion of inductive models, and hopefully parallel soundness
and completeness results.


I-soundness: If Γ ⊢I A, then IM(Γ) ⊆ IM({A}).
I-completeness: If IM(Γ) ⊆ IM({A}), then Γ ⊢I A.
But as we have seen above, reasonable accounts of inductive model theory (e.g.,
in terms of rational belief structures, or in terms of probability theory) satisfy the
following condition:
subset: IM(Γ ∪ Δ) ⊆ IM(Δ)
And this combination immediately gives rise to monotonicity, which we are trying
to avoid. We now turn to a consideration of how this situation may be remedied.
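To make the “immediately” explicit, the three conditions chain as follows (nothing is added here beyond instantiating subset with the two sets interchanged):

```latex
\begin{align*}
\Gamma \vdash_I A
  &\;\Longrightarrow\; IM(\Gamma) \subseteq IM(\{A\})
      &&\text{(I-soundness)}\\
  &\;\Longrightarrow\; IM(\Gamma \cup \Delta) \subseteq IM(\{A\})
      &&\text{(subset, with the two sets interchanged)}\\
  &\;\Longrightarrow\; \Gamma \cup \Delta \vdash_I A
      &&\text{(I-completeness)}
\end{align*}
```

The conclusion is exactly monotonicity of the inductive consequence relation.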
It would seem pretty strange if there were no relationship between the notion
of a deductive model and an inductive model, particularly when there is a strong
relationship between the two consequence relations. After all, it is usual to treat
deductive consequence as just a “supreme” case of inductive consequence. In fact,
our formal theory would be much simpler if, instead of having both deductive
models “DM(Γ)” and inductive models “IM(Γ)” we could just settle on one notion
of model “M(Γ)” and use it throughout. But if inductive consequence is not to
coincide with deductive consequence, then there must be some set Σ and some
sentence C such that Σ ⊢I C, but not Σ ⊢D C. Hence, if we are going to have only
one notion of model, it must not be the case that M(Σ) ⊆ M({C}). So the pressure
is to abandon inductive soundness.
Having only one notion of model is so appealing that it is not to be abandoned
lightly, especially in view of the lack of consensus in the literature concerning the
notion of an inductive model. But then we are faced with the problem of having a
semantic account of inductive consequence. There is a very simple suggestion for
resolving the problem that even has some basis in the facts of human reasoning.
Going back to our original example of Sarah reasoning about the crowds at the
mall, every time Sarah recalled some new information, she revised her model of
the actions of potential shoppers. Note she did not revise her beliefs concerning
the conservation of matter or the color of her car. That is, Sarah’s revisions were
not random, nor did she consider all possible revisions consistent with the new
information.
Suppose we associate with each set Γ of WFFs some set of “presupposed”
models PM(Γ). Of course we assume that the presupposed models of Γ are in fact
models of Γ, so PM(Γ) ⊆ M(Γ). Our examples suggest that we define inductive
consequence as follows:
Γ ⊢I A iff PM(Γ) ⊆ M({A})
That is, when reasoning inductively about the consequences of Γ, we select some
“presupposed” set of models (pictures of the world, universe designs) in which Γ
holds; we do not consider all possible models of Γ. If all of these presupposed
models of Γ turn out to be models of A, then we say that A is an inductive consequence
of Γ. From this characterization, it would immediately follow that if Γ ⊢D A, then
Γ ⊢I A, but not the reverse, just as desired. Further, there is no reason to expect
that PM(Γ ∪ Δ) ⊆ PM(Γ), so monotonicity would not follow in general.
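A toy rendering of this definition may help fix ideas. In the sketch below, the three-atom language and the particular normality ranking are invented for illustration; M(Γ) is computed by ordinary monotonic satisfaction, PM(Γ) keeps only the lowest-ranked models of Γ, and all the nonmonotonicity lives in that selection step:

```python
from itertools import product

ATOMS = ("bird", "penguin", "flies")
ALL_MODELS = [dict(zip(ATOMS, bits)) for bits in product([False, True], repeat=len(ATOMS))]

def models_of(premises):
    """M(Gamma): every truth assignment satisfying all premises (ordinary monotonic satisfaction)."""
    return [m for m in ALL_MODELS if all(p(m) for p in premises)]

def rank(m):
    """Invented normality ranking: weighted count of violated default expectations."""
    r = 0
    if m["bird"] and not m["flies"]:
        r += 1          # birds are expected to fly
    if m["penguin"] and m["flies"]:
        r += 2          # penguins are strongly expected not to fly
    if m["penguin"] and not m["bird"]:
        r += 2          # penguins are expected to be birds
    return r

def presupposed(premises):
    """PM(Gamma): keep only the most normal (lowest-ranked) models of Gamma."""
    ms = models_of(premises)
    if not ms:
        return []
    best = min(rank(m) for m in ms)
    return [m for m in ms if rank(m) == best]

def inductively_entails(premises, conclusion):
    """Gamma |-I A  iff  PM(Gamma) is a subset of M({A})."""
    return all(conclusion(m) for m in presupposed(premises))

bird = lambda m: m["bird"]
penguin = lambda m: m["penguin"]
flies = lambda m: m["flies"]

# "Tweety is a bird": inductively conclude that Tweety flies.
print(inductively_entails([bird], flies))               # True
# Adding "Tweety is a penguin" changes PM, and the conclusion is withdrawn.
print(inductively_entails([bird, penguin], flies))      # False
```

Satisfaction itself never changes here; only the selected set PM(Γ) does, which is exactly the point made in the text.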
I do not wish to insist on any more detail about how the presupposed models
are determined. It could be that presupposed models differ from one individual
to another. It could be that presupposed models are determined by the amount
of energy required to alter the current mental model. It could be that the way in
which we alter our mental models of the world in response to new information is
determined by sociological conditioning, or by genetic factors, or both, or perhaps
neither. There are a lot of alternatives. I do not pretend to have the answers as
to the origin, selection, or ordering of the presupposed models. I do not wish to
suggest that these questions are unimportant. On the contrary, I would agree that
such questions are among the most important in our understanding of inductive
(nonmonotonic) reasoning. In fact, I would urge that they deserve a great deal of
research effort. But my goal in this part of the paper is rather modest. I only want to
show that with presupposed models, one need only employ standard monotonic
logic in order to construct systems which reason in a nonmonotonic manner. In the
previous sections, I demonstrated that rational nonmonotonic logic is impossible.
In this section and the next, I will show that nonmonotonic logic is not even needed.
As presented here, our account is much simpler than the “preferential models”
account of Besnard and Siegel (1988); the two accounts were developed independ-
ently but have some commonality. We are strongly sympathetic to their approach as
a general framework in which to think about nonmonotonic reasoning, though we
differ on specific technical details, and hence the slight shift in terminology. We do
not wish to suggest that they would agree with any of the positions advocated here.
Our simple account also has obvious affinities with the preferential models account
of Kraus et al. (1990); however, we doubt that the technical complexities of their
systems can be made to correspond to some independent account of rational belief.
Note that making inferences based on a set of presupposed models amounts to
making an assumption, or a bet, that the universe will turn out to be a certain way.
One is assuming, or betting, that it is safe to ignore those models of Γ that are not
in the presupposed set. Because inductive inferences frequently go beyond what
is deductively warranted by the premises, we can say in general that PM(Γ) ≠
M(Γ). Hence, on this bare bones theory, even if Γ ⊢I A, it will still be the case that
Pr(A, Γ) = 0 for some probability distributions Pr. So as we have seen, inductive
consequence is not definable in terms of some universal property of all probability
functions.
In terms of proof theory, the picture we are suggesting is extremely simple, and
not new. We get around in the world with some mental model of how the world
works; formally, uncertainties in our mental picture are equivalent to having a set of
more complete models instead of just one model. When I reason inductively, I just
draw standard deductive consequences from my mental models. When I get new
information, I update or change my mental models, and inductive consequences of
the new information are just deductive consequences of the updated models. Since
the set of presupposed models does not constitute all the models, the inductive
consequences are a superset of the deductive consequences.
Of course there are lots of interesting questions about how models get changed
or updated, especially if the new information contradicts something in the current
model. But just how models are actually updated by human beings is a matter
of empirical fact, not a question of logic. If one wishes to be prescriptive about
the way in which models should be changed, then clear criteria for preferring
one updating scheme to another must be specified. Claims like “changing models
this way rather than that is more likely to lead to correct conclusions” are very
strong claims about the way our world actually operates; they amount to empirical
assumptions which should be explicitly stated and empirically
tested. Remember, it is not our goal here to answer such questions. Our goal is only
to show that we do not need any nonmonotonic logic to understand nonmonotonic
reasoning. In terms of understanding human reasoning, the effort spent trying to
develop some nonmonotonic logic is totally wasted; that effort should be redirected
to potentially more fruitful areas, such as how humans update their presupposed
models or shift from one to the other.
Our bare-bones semantic framework has a lot in common with the semantics of
spheres proposed by Lewis (1973) and subsequently applied by Delgrande (1988)
to nonmonotonic logic. But from the discussion in our impossibility proofs, serious
problems arise with Delgrande’s approach when he attempts to use the semantics
as a semantics for an object-language conditional, and then use that conditional to
define a nonmonotonic consequence relation. In light of our previous discussion,
it is particularly suggestive that Delgrande does not allow his conditional to be
nested, and thus it is more like a consequence relation than an object language con-
ditional. Certainly there are nonmonotonic conditionals in English, and their logic
is important. But as we have seen, nonmonotonic consequence relations cannot be
generally based on object-language constructs.
On the other hand, our associated bare-bones proof theoretic account has a lot
in common with default logics of the sorts elaborated by Reiter (1980). From our
point of view, one can see the default rules as an attempt to code information about
the current mental model, as well as information about how the model is to be
changed when certain sorts of information are encountered. However, our simple
picture does not require some special inference rules, nor some special object lan-
guage connectives. From the presupposed models point of view, so-called default
rules are highly simplified encodings of empirical assumptions about our world.
As such, they are best thought of as part of the corpus of common sense empirical
wisdom (folk physics, folk psychology, folk biology, etc.) rather than as part of
logic.
The heart of nonmonotonic reasoning is not in some new logic; rather, what
is needed is more information about how our models of the world come to be
originally formulated and then subsequently changed and updated in response to
new information. The logic that we use to draw conclusions is always the same;
nonmonotonicity is just the result of changing our background assumptions or
models and using the same old logic to draw conclusions.
This simple framework also sheds light on the problems we encountered when
trying to use an object-language conditional to reflect either the conditional of con-
ditional probability or any allegedly nonmonotonic consequence relation (our third
and fourth proofs, above). Both cases are impossible, as we demonstrated above.
Thus nonmonotonicity seems to be essentially meta-theoretical. It is a result of the
way in which logic is used; it is not the result of syntactic constructs within the
language. The nonmonotonicity of common sense reasoning is a result of the way
we use logic; it is not a result of the logic we use.

9. Non-monotonic Systems
The general framework at which we have arrived is very simple. Common sense
reasoning by humans uses (some) standard monotonic logic. The reasoning process
involves (for all members of the animal kingdom) the use of mental models of our
world. When we get new information, we update or change our models. We use our
models and standard logic to draw conclusions. The nonmonotonic aspect of our
reasoning is a result of the model changing process, not a result of the logic. I do
not suggest that this picture is particularly original or startlingly new. However,
I do want to suggest that it is richer and more powerful than has been widely ac-
knowledged. Let us turn to a few examples of systems based on this “presupposed
models” account.
Our first example of a nonmonotonic reasoning system is the theory of prob-
ability itself. One way of looking at a probability distribution is as a sequence of
elementary weighting functions over a σ-field of sets of models, where models are
conceived of as maximally consistent sets of WFFs (see Morgan (1991) and Van
Fraassen (1981)). For a given evidence set Γ, we find the first weighting scheme
in the sequence in which there are models of Γ with non-zero weight. If there is
no such scheme in the sequence, then all probability values, given Γ, will be
1. If there is such a weighting scheme, then the probability of each sentence A,
given Γ, is just the weighted relative frequency of the Γ-models in which A holds.
When we add new information Δ, we just select the first weighting scheme in the
sequence in which there are models of Γ ∪ Δ with non-zero weight, and compute
probabilities as before. In the simple presupposed models picture, each model gets
a weight of 0 or 1, but in this generalized scheme, we may allow more than just
two values for the weights. Each distinct probability distribution just corresponds
to a different weighting scheme over the models. Seen in this light, probability
theory is a generalization of the presupposed models analysis of nonmonotonic
reasoning. Probability theory just is relative frequency applied to the presupposed
models analysis of nonmonotonic reasoning. Probability theory makes one very
important assumption to which we would like to draw attention. It assumes that
the order in which we arrive at our premises does not matter. More graphically, the
probability of A, given the background information ⟨B1, B2, B3⟩, is the same as
given ⟨B2, B3, B1⟩, and the same as given ⟨B1, B3, B2⟩ or any other permutation.
In some treatments of weird conditionals, the order of the antecedents is deemed
to be important.
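The sequence-of-weightings procedure just described can be sketched concretely. In this toy version, the two-atom language and the two particular weighting functions are invented for illustration; a distribution is simply a list of weightings tried in order, and conditional probability is weighted relative frequency over the evidence models:

```python
from itertools import product

ATOMS = ("p", "q")
MODELS = [dict(zip(ATOMS, bits)) for bits in product([False, True], repeat=len(ATOMS))]

def conditional_probability(weightings, evidence, sentence):
    """Pr(sentence, evidence): use the first weighting scheme in the sequence
    that gives the evidence models non-zero total weight; if none does,
    every probability conditional on that evidence is 1."""
    evidence_models = [m for m in MODELS if all(e(m) for e in evidence)]
    for w in weightings:
        total = sum(w(m) for m in evidence_models)
        if total > 0:
            favorable = sum(w(m) for m in evidence_models if sentence(m))
            return favorable / total
    return 1.0

# Invented two-stage distribution: the first stage weights only p-models
# (q-models among them doubly); the second stage is a uniform fallback.
w1 = lambda m: (2 if m["q"] else 1) if m["p"] else 0
w2 = lambda m: 1

q = lambda m: m["q"]
not_p = lambda m: not m["p"]

print(conditional_probability([w1, w2], [], q))        # 2/3: the first stage applies
print(conditional_probability([w1, w2], [not_p], q))   # 0.5: falls back to the second stage
print(conditional_probability([w1, w2], [lambda m: False], q))  # 1.0: evidence has no models
```

Adding the premise `not_p` shifts which weighting scheme applies, so the computed value for `q` changes; each stage by itself is ordinary relative frequency, and the nonmonotonicity enters only through the selection of the scheme.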
As we have mentioned earlier, probability theory is much too detailed for most
applications; we are just not in a position to assign precise numerical measures
to all evidence-conclusion pairs. Further, the abstract theory of probability really
does not tell us which distributions we should use in any given circumstance. Our
second example is a rather simple scheme which at least partially avoids both of
these problems. The approach is based on the pioneering work of Carnap (1962).
Many examples of nonmonotonic reasoning may be thought of as based on
simple statistical information derived from experience concerning elementary clas-
sificatory properties, combined with the following “rule”: if Pr(A, Γ) > .5, then it is
reasonable, given Γ, to act as if A is the case. (Of course the reasonableness of this
rule is based on the assumption of comparable, modest potential gains and losses.
We show by a simple example below the need to consider the magnitude of gains
and losses when choosing how to act.) We assume the language contains predicates,
individual constants, and the standard classical logical particles but no quantifiers.
Our experience is always finite, so our background information Σ concerns only a
finite number of individuals:
I(Σ) = {a1, . . . , an}
In addition to Σ, we are given information Γ(b) about some new individual b, and
we would like to know whether or not to conclude E(b); both Γ and E may be very
complex and may contain information about individuals other than b.
Of course the very first thing to check is whether Σ ∪ Γ(b) ⊢ E(b). If E(b) is
just a deductive consequence of our background information, then we are done.
For computational purposes, we could use any standard theorem checker for the
deductive logic in question, perhaps with a resource limitation cut-off to ensure
termination. In a similar way, we may check to see if Σ ∪ Γ(b) ⊢ ∼E(b), for if so, our
problem is solved.
If neither E(b) nor ∼E(b) is a deductive consequence (within whatever resource
limits we have imposed), we need to consider whether or not either is an inductive
consequence. For the purposes of this second problem, we may act as if I(Σ) con-
stitutes the entire universe. We then find the set of objects Γ(I(Σ)) in this universe
which, according to the information in Σ, have all of the properties listed in Γ:
Γ(I(Σ)) = {x : x ∈ I(Σ) and Σ ⊢ B(x) for every WFF B(x) ∈ Γ(x)}
If Γ(I(Σ)) is empty, we refuse to draw the conclusion on the grounds of inadequate
information. If it is not empty, we then find the set of objects E(Γ(I(Σ))) which,
according to the information in Σ, have the property E:
E(Γ(I(Σ))) = {x : x ∈ Γ(I(Σ)) and Σ ⊢ E(x)}
Then if E(Γ(I(Σ))) contains more than 50% of the objects in Γ(I(Σ)), it would
make sense to predict that E(b). On the other hand, if E(Γ(I(Σ))) contains less than
50% of the objects in Γ(I(Σ)), it would make sense to predict that ∼E(b).
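A minimal executable sketch of this majority-of-analogues rule follows. The individuals, properties, and the dictionary encoding of the background information are all invented for the sketch, and deductive consequence from the background is approximated crudely by membership in a finite property list, so negative information is not represented:

```python
def predict(background, observed, target):
    """Majority-of-analogues rule. `background` plays the role of Sigma,
    encoded as {individual: set of properties known to hold}."""
    # Gamma(I(Sigma)): individuals known to have every observed property.
    analogues = [x for x, props in background.items() if observed <= props]
    if not analogues:
        return None                     # inadequate information: refuse to conclude
    # E(Gamma(I(Sigma))): analogues known to have the target property.
    positives = [x for x in analogues if target in background[x]]
    ratio = len(positives) / len(analogues)
    if ratio > 0.5:
        return True                     # predict E(b)
    if ratio < 0.5:
        return False                    # predict ~E(b)
    return None                         # exactly 50%: no prediction either way

# Invented background information about five previously observed individuals.
SIGMA = {
    "a1": {"bird", "flies"},
    "a2": {"bird", "flies"},
    "a3": {"bird", "flies"},
    "a4": {"bird", "penguin"},          # a penguin, not known to fly
    "a5": {"dog"},
}

print(predict(SIGMA, {"bird"}, "flies"))             # True: 3 of the 4 known birds fly
print(predict(SIGMA, {"bird", "penguin"}, "flies"))  # False: the only known penguin does not
print(predict(SIGMA, {"cat"}, "flies"))              # None: empty reference class
```

Note that the prediction for the same individual flips once the extra property is observed, with no change to the underlying (monotonic) machinery.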
This simple scheme is an elementary formalization of reasoning by analogy.
It is similar to circumscription and is applicable to many of the same kinds of
problems. But our scheme is easy to implement, is computationally tractable, and
corresponds well with our intuitions. It easily handles the standard examples about
flying and birds, and similar cases as well. It does not require the introduction of
weird logical particles nor the use of dubious “default” rules.
Obviously the scheme is not monotonic. Adding information Λ about new objects
not in I(Σ) may change the distributions, since I(Σ ∪ Λ) will not be the same
as I(Σ). Further, if we add more background information Δ(b) about the individual
in question, we may find Γ(I(Σ)) is different from (Γ ∪ Δ)(I(Σ)). The scheme has
obvious limitations, since it deals only with very simple, non-quantified languages.
However, we could extend the scheme to quantified languages by requiring the
expansion of all quantifiers using the objects in I(6). (With quantifiers, it becomes
possible to make claims about the number of objects without naming them, so in
some cases difficulties may result from such a treatment of the quantifiers.) There
are many other ways to use probability in a nonmonotonic inferencing system; see
Kyburg (1994) for a good alternative.
As a final example of a nonmonotonic system, we will consider a technique
which is similar to so-called default logic. The technique has nothing directly to do
with probability theory, although it is actually derived from the method presented
in Morgan (1996) for constructing canonical probability distributions. The intuitive
picture behind the technique is that part of common sense is the possession of an
ordered sequence of hypotheses, each of which is a potential background assumption
which may be used to fill in holes in our knowledge. When we are given
some set of premises, we augment those premises by trying to consistently add the
hypotheses, in order, to the premises. Any hypothesis that cannot be consistently
added is just abandoned. Common sense conclusions are then just the deductive
conclusions of this expanded premise set.
For a slightly more formal account, we assume we are given some deductive
consequence relation “⊢” and a corresponding notion of consistency. Let S be
any arbitrary (preference) ordered set of WFFs, which will be used as potential
background assumptions:

S = {H1, H2, . . .}

Consider some given premise set Γ. We use S to determine an expanded premise
set Γ(S), as follows:

Γ0 = Γ
Γi+1 = Γi ∪ {Hi+1}, if Γi ∪ {Hi+1} is consistent
Γi+1 = Γi, otherwise
Γ(S) = ∪i Γi

There is no requirement that the set S be consistent. The members of S are
considered to be alternatives. Thus even if S is itself inconsistent, it may well be that
individual members of S are consistent with the premise set. As long as the premise
set Γ is consistent, the expanded premise set Γ(S) will be consistent, no matter
what S contains. We use the expanded premise set to determine the (inductive)
S-consequences of Γ, as follows:

Γ ⊢S E iff Γ(S) ⊢ E

Clearly even for a fixed sequence S, the S-consequence relation will be
nonmonotonic. If we add information Δ to the set Γ, we may well find that different
hypotheses from the sequence S get put into the expanded premise set. In general,
we will not have Γ(S) ⊆ (Γ ∪ Δ)(S).
For a simple, but illustrative example, we will use the following symbols for the
corresponding English phrases:
Px: x is a penguin
Bx: x is a bird
Ax: x is an animal
Fx: x is capable of flight
t: Tweety
For the example, we will use the following sequence of ordered background
assumptions:

S = {(x)((Px ∧ Bx ∧ Ax) ⊃ ∼Fx), (x)((Bx ∧ Ax) ⊃ Fx), (x)(Ax ⊃ ∼Fx)}
Using the definitions above, we have the following expanded premise sets:
{At}(S) = {At} ∪ S
{At, Bt}(S) = {At, Bt} ∪ {(x)((Px ∧ Bx ∧ Ax) ⊃ ∼Fx), (x)((Bx ∧ Ax) ⊃ Fx)}
{At, Bt, Pt}(S) = {At, Bt, Pt} ∪ {(x)((Px ∧ Bx ∧ Ax) ⊃ ∼Fx), (x)(Ax ⊃ ∼Fx)}
When told only that Tweety is an animal, we conclude that Tweety cannot fly, since
{At}(S) ⊢ ∼Ft. Similarly, when told that Tweety is an animal but also a bird, we
conclude that Tweety can fly, since {At, Bt}(S) ⊢ Ft. And finally, when told that
Tweety is an animal and a bird but also a penguin, we conclude that Tweety cannot
fly, since {At, Bt, Pt}(S) ⊢ ∼Ft.
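Since the single individual Tweety is fixed, the example is in effect propositional, and the three inferences can be verified mechanically. The following brute-force sketch is ours, not the author's implementation: formulas are encoded as Boolean functions over the atoms At, Bt, Pt, Ft, and consistency is tested by enumerating valuations.

```python
from itertools import product

# Brute-force check of the Tweety example; all names here are our own.
ATOMS = ("At", "Bt", "Pt", "Ft")

def consistent(formulas):
    """True iff some truth-value assignment satisfies every formula."""
    return any(all(f(dict(zip(ATOMS, vals))) for f in formulas)
               for vals in product([False, True], repeat=len(ATOMS)))

def expand(gamma, s):
    """Gamma(S): try the hypotheses in order, keeping the consistent ones."""
    out = list(gamma)
    for h in s:
        if consistent(out + [h]):
            out.append(h)
    return out

def entails(premises, goal):
    """premises |- goal iff premises together with not-goal have no model."""
    return not consistent(premises + [lambda v: not goal(v)])

H1 = lambda v: not (v["Pt"] and v["Bt"] and v["At"]) or not v["Ft"]
H2 = lambda v: not (v["Bt"] and v["At"]) or v["Ft"]
H3 = lambda v: not v["At"] or not v["Ft"]
S = [H1, H2, H3]
At, Bt, Pt = (lambda v: v["At"]), (lambda v: v["Bt"]), (lambda v: v["Pt"])
Ft, notFt = (lambda v: v["Ft"]), (lambda v: not v["Ft"])

print(entails(expand([At], S), notFt))            # True: Tweety cannot fly
print(entails(expand([At, Bt], S), Ft))           # True: Tweety can fly
print(entails(expand([At, Bt, Pt], S), notFt))    # True: Tweety cannot fly
```

Note that nothing here goes beyond ordinary deductive logic; the nonmonotonicity lives entirely in the re-selection of hypotheses when the premise set changes.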
This scheme makes use of a deductive consequence relation and the related
notion of consistency, both of which may be computationally problematic, especially
for a first-order language. But these notions are problematic for humans as
well. In any real application, reasonable results could be obtained by relying on
resource-limited computational processes. We have said nothing about the origin
of the sequence S. However, default logics generally have nothing to say about the
origin of the defaults. Perhaps some general principles could be formulated, like
“when comparing two universal conditionals, prefer the one with the logically more
complex antecedent over the one with the simpler antecedent”. From our point of
view, the important points to note are that (i) we rely on ordinary deductive logic,
and (ii) we do not require any weird connectives, any special inference rules, nor
any appeal to complex and powerful fixed point theorems.

10. Conclusions
We have given four formal proofs that a rational nonmonotonic logic is impossible.
In the first proof, we showed that it is impossible to formulate a nonmonotonic
consequence relation that corresponds to any characterization of unconditioned
rational belief structures. In the second proof, we showed that it is impossible to
formulate a nonmonotonic consequence relation that corresponds to any
characterization of conditional rational belief structures. In our third proof, we showed
that no matter how complex the formal language, no object-language construct
(e.g., a nonmonotonic conditional) can reflect elementary principles of relative
frequency. And in our fourth proof, we showed that given standard accepted patterns
of reasoning, there is a fundamental logical incompatibility between any
nonmonotonic consequence relation and any object-language nonmonotonic conditional;
any attempt to reflect one by the other forces both to be monotonic.
Careful attention to the proofs suggests that nonmonotonicity is a function of
the way we use logic, rather than being a function of the logic we use. We
suggested a very elementary semantic account, the presupposed models account, of
non-demonstrative reasoning. Using this account, we were able to explain how
nonmonotonic reasoning arises from the application of standard monotonic logic to
situations in which additional acquired background information leads us to change
our presupposed world model. We then discussed three examples of nonmonotonic
reasoning systems based on standard monotonic logic.
We certainly do not want to claim that our three examples of nonmonotonic
reasoning systems exhaust the possibilities. Our primary purpose was to demonstrate
the variety of systems which can be constructed without resorting to weird
logics. The simple presupposed models picture of nonmonotonic reasoning is very
general and deserves to be more widely investigated.
A secondary goal was to illustrate how nonmonotonic reasoning systems depend
on empirical assumptions. Recall that in earlier sections, we proved that no
nonmonotonic proof system could cohere with universal principles of rational belief;
every nonmonotonic proof system logically must violate elementary principles
of relative frequency. Semantically speaking then, nonmonotonic reasoning must
violate soundness with respect to rational belief structures. So specific examples
of nonmonotonic reasoning are only rational relative to a proper subset of rational
belief structures. It is the selection of that proper subset which constitutes
the empirical assumption behind the reasoning. It is impossible to construct a
nonmonotonic reasoning system without building in empirical assumptions. It is
important that researchers in the field clearly recognize these logical requirements
and try to be explicit about the empirical assumptions built into their systems.
These logical facts about nonmonotonic reasoning are clearly incorporated in the
presupposed model. In terms of formal structures, nonmonotonic reasoning is spe-
cific to the presupposed world model being considered. That is, to know what
inductive conclusions to draw, one must know the details of the presupposed world
model structure.
A third important item to note is that all of our different example reasoning
systems are applicable to many of the same sorts of problems (e.g., the question of
Tweety’s flight capabilities). As with automated (deductive) theorem proving
systems, there is no one “correct” nonmonotonic reasoning system. Each system will
have its advantages and its disadvantages; each may do better on certain kinds of
problems than others.
Finally, I would like to make a disclaimer. I do not claim to have shown here
that there are no general principles of reasoning that would result in changing the
conclusions that we draw when we change our background assumptions; I have not
shown that there are no nonmonotonic inference patterns. However, I suspect that
most such patterns, when fleshed out with all qualifications, are more reasonably
viewed as part of a higher order theory of rationality rather than as some sort of
inference rules.
As a simple example, consider the following elementary principle of “statistical
syllogism”:
SG.1.1 P(A, Γ) > P(∼A, Γ)
SG.1.2 Γ
−−−−−−−−
SG.1.3 (probably) A
Now, this simple exposition does not begin to do justice to the verbiage that generally
accompanies its presentation. At the very least, we must qualify the pattern by
some sort of requirement for total evidence, i.e., we must require that Γ contains
all the available evidence. Note that the principle of total evidence cannot just be
expressed by:

P(A, Γ ∪ Δ) > P(∼A, Γ ∪ Δ), for all Δ.
This latter requirement is met by almost none of the examples to which we want to
apply the statistical syllogism pattern, since probability distributions are in general
not weakly positively monotonic in the evidence position. For example, given only
that Tweety is a bird, it is reasonable to hold that it is more likely that Tweety can
fly than that Tweety cannot fly; but the relation will be reversed if we are given the
fact that Tweety is a bird and a penguin. So the total evidence requirement has to
do with what we know, accept, or believe.
The following is a more reasonable characterization of statistical syllogism:

SG.2.1 X believes P(A, Γ) > P(∼A, Γ).
SG.2.2 X believes the members of Γ.
SG.2.3 X believes there is no Δ such that:
(2.3.a) X believes the members of Δ, and
(2.3.b) X believes P(∼A, Γ ∪ Δ) > P(A, Γ ∪ Δ).
−−−−−−−−−−−−−−−−−−
SG.2.4 X should believe A.

For “belief” here, I do not have anything fancy in mind, just good old ordinary
reasonable belief. Certainly if any of the premises changes in a fundamental way,
we would not want to insist on the same conclusion. But now as we have just
phrased it, this pattern is part of a higher order theory about rational belief. But if
we are going to use statistical syllogism to update a data base, we might phrase it
as follows:
SG.3.1 P(A, Γ) > P(∼A, Γ)
SG.3.2 Γ is the total content of the data base.
−−−−−−−−−−−−−
SG.3.3 A should be added to the data base.

Now, if we add additional information Δ to the data base, we have falsified SG.3.2,
and thereby undercut the reason for adding A to the data base. Hence, we may have
to remove A on the basis of subsequent information. (This example well illustrates
the need for the currently common practice of distinguishing between information
given “externally” and information which results from “internal” computations
based on the external material.) But this inference pattern is no different in principle
from observing that: (1) if you give me p ⊃ q and p, then I am entitled to conclude q;
but (2) if you subsequently retract p, I can no longer conclude q from p ⊃ q alone.
In other words, this rule looks suspiciously like the presupposed models picture of
nonmonotonic inference!
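The data-base reading of SG.3 can be sketched concretely. In the fragment below, the conditional probabilities are a toy look-up table and every name is our own invention; the point is only that internal conclusions are always recomputed from the total external content, so adding external information can retract a previously added conclusion.

```python
# Sketch of SG.3 as a data-base update policy, keeping "external" facts
# separate from "internal" conclusions so the latter can be recomputed.
# The probability table and all names are illustrative assumptions.

PROB = {  # toy table of P(flies | evidence set)
    frozenset({"bird"}): 0.9,
    frozenset({"bird", "penguin"}): 0.02,
}

def internal_conclusions(external):
    """Apply statistical syllogism to the *total* external content."""
    p = PROB.get(frozenset(external))
    if p is None:
        return set()                      # no applicable statistics
    return {"flies"} if p > 0.5 else {"~flies"}

external = {"bird"}
print(internal_conclusions(external))     # {'flies'}
external.add("penguin")                   # SG.3.2 now fails for the old base
print(internal_conclusions(external))     # {'~flies'}
```

Because "flies" is internal, it is simply recomputed away rather than explicitly retracted; the external facts are never touched.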
There remains one aspect of nonmonotonic reasoning which I would like to
stress. In many actual cases, the nonmonotonicity of our reasoning is a function
(at least in part) of our perceptions concerning the utility of acting as if various
conclusions were true. As the perceived utilities change, the conclusions drawn
from the same premises may change as well. As a simple example, suppose we are
told only that Tweety is a bird, and we are asked whether or not Tweety can fly.
Consider the two following “pay off” regimes:

Regime 1: You answer “yes”.


A. You are correct. You receive $10,000.
B. You are incorrect. You receive nothing.
You answer “no”.


A. You are correct. You receive nothing.
B. You are incorrect. You receive nothing.

Regime 2: You answer “yes”.


A. You are correct. You receive nothing.
B. You are incorrect. You must pay $10,000.

You answer “no”.


A. You are correct. You receive nothing.
B. You are incorrect. You receive nothing.

Clearly under regime 1 the rational person ought to answer “yes”, since there is a
potential gain for that answer and no potential gain for answering “no”. However,
under regime 2, the rational person ought to answer “no” because an answer of
“yes” exposes one to a potential loss and no gain, while the “no” answer does not
expose one to any loss. So in this example, the utilities completely outweigh any
statistical or other considerations.
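The point can be put in standard expected-utility terms. In the sketch below, the probability that Tweety can fly is a fixed assumption (the value 0.9 is ours and plays no essential role); only the pay-off table changes between the regimes, yet the rational answer flips.

```python
# Expected-utility sketch of the two pay-off regimes; the probability
# value and all names here are our own assumptions.

P_FLY = 0.9  # assumed probability that Tweety can fly, held fixed

def expected_utility(payoff_correct, payoff_wrong, p_correct):
    return p_correct * payoff_correct + (1 - p_correct) * payoff_wrong

def best_answer(regime):
    """regime maps ('yes'/'no', correct?) to a dollar pay-off."""
    eu_yes = expected_utility(regime[("yes", True)], regime[("yes", False)], P_FLY)
    eu_no = expected_utility(regime[("no", True)], regime[("no", False)], 1 - P_FLY)
    return "yes" if eu_yes > eu_no else "no"

regime1 = {("yes", True): 10_000, ("yes", False): 0,
           ("no", True): 0, ("no", False): 0}
regime2 = {("yes", True): 0, ("yes", False): -10_000,
           ("no", True): 0, ("no", False): 0}

print(best_answer(regime1))   # yes: potential gain, no downside
print(best_answer(regime2))   # no: "yes" risks a loss with no gain
```

The same premises and the same probability thus yield opposite conclusions under the two regimes, as claimed above.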
The presupposed models view has been around for some time. See Daniels and
Freeman (1980) for the details of a formal analysis of subjunctive conditionals
based on this idea as well as for older references to the literature. Quite independently
of our work, Poole (1988) showed that default logics could be seen as a
scheme for revising a presupposed world model, and that nonmonotonicity results
from using standard logic to reason from the revised model. He rejected as
unnecessary the formulation of new proof theories with associated formal semantics.
He argued for the thesis that “. . . the problem of default reasoning is not a problem
with logic, but with how logic is used” (Poole (1988), p. 45). We claim that this
conclusion holds for all nonmonotonic reasoning, not just for default reasoning. We
have here presented four proofs that it is impossible to formulate a nonmonotonic
consequence relation with a semantics that corresponds to elementary principles
of rational belief. Briefly put, we have shown that nonmonotonic logic must be
irrational. The “presupposed models with updating” view seems to be a common
theme in much of the current research into nonmonotonic reasoning; hopefully we
have shed some light on why this should be the case.

Acknowledgement

Over the years I have benefited greatly from discussions of this material with many
people, especially with those who disagree strongly with my views. In particular I
would like to thank Romas Aleliunas, Jim Delgrande, David Etherington, Robert
Hadley, Henry Kyburg, and Don Nute.
References
Besnard, P., and Siegel, P. (1988), ‘The Preferential-models Approach to Nonmonotonic Logics’, in
Smets et al., Non-standard Logics for Automated Reasoning, Academic Press, pp. 137–161.
Besnard, P. (1989), An Introduction to Default Logic. Berlin: Springer-Verlag.
Brewka, G., Dix, J., and Konolige, K. (1997) Nonmonotonic Reasoning: An Overview. Stanford:
Center for the Study of Language and Information.
Carnap, R. (1962), Logical Foundations of Probability, 2nd ed, Chicago: University of Chicago Press.
Daniels, C., and Freeman, J. (1980) ‘An Analysis of the Subjunctive Conditional’, Notre Dame
Journal of Formal Logic 21, pp. 639–655.
Delgrande, J. (1988), ‘An Approach to Default Reasoning Based on a First-order Conditional Logic’,
Artificial Intelligence 36, pp. 63–90.
Kraus, S., Lehmann, D., and Magidor, M. (1990), ‘Nonmonotonic Reasoning, Preferential Models
and Cumulative Logics’, Artificial Intelligence 44, pp. 167–207.
Kyburg, H. (1994), ‘Believing on the Basis of the Evidence’, Computational Intelligence 10, pp.
3–20.
Lewis, D. (1973), Counterfactuals, Oxford: Basil Blackwell.
Morgan, C. and Mares, E. (1991), ‘Conditionals, Probability, and Non-triviality’, Journal of
Philosophical Logic 24, pp. 455–467.
Morgan, C. (1991), ‘Logic, Probability Theory, and Artificial Intelligence – Part 1: the Probabilistic
Foundations of Logic’, Computational Intelligence 7, pp. 94–109.
Morgan, C. (1994), ‘Evidence, Belief, and Inference’, Computational Intelligence 10, pp. 79–84.
Morgan, C. (1996) ‘Canonical Models and Probabilistic Semantics’, presented at the Annual Meeting
of the Society for Exact Philosophy, held at East Tennessee State University, forthcoming in
Logic, Probability and Science, N. Shanks et al., eds., in Poznan Studies in Logic.
Morgan, C. (1997), ‘Conditionals, Comparative Probability, and Triviality’, presented at the Annual
Meeting of the Society for Exact Philosophy, held at McGill University, forthcoming in Topoi.
Morgan, C. (1998), ‘Non-monotonic Logic is Impossible’, Canadian Artificial Intelligence 42, pp.
19–25.
Poole, D. (1988), ‘A Logical Framework for Default Reasoning’, Artificial Intelligence 36, pp. 27–47.
Reiter, R. (1980), ‘A Logic for Default Reasoning’, Artificial Intelligence 13, pp. 81–132.
Smets, P. et al., eds. (1988), Non-Standard Logics for Automated Reasoning, London: Academic
Press.
Turner, R. (1984) Logics for Artificial Intelligence, West Sussex: Ellis Horwood Limited.
Van Fraassen, B. (1981) ‘Probabilistic Semantics Objectified: 1. Postulates and Logics’, Journal of
Philosophical Logic 10, pp. 371–394.
