CHARLES G. MORGAN
Department of Philosophy, University of Victoria, Victoria, B.C. V8W 3P4, Canada and Varney Bay
Institute for Advanced Study, P.O. Box 45, Coal Harbour, B.C. V0N 1K0, Canada (E-mail:
morgan@phastf.phys.uvic.ca)
Abstract. Conclusions reached using common sense reasoning from a set of premises are often
subsequently revised when additional premises are added. Because we do not always accept previous
conclusions in light of subsequent information, common sense reasoning is said to be nonmonotonic.
But in the standard formal systems usually studied by logicians, if a conclusion follows from a set
of premises, that same conclusion still follows no matter how the premise set is augmented; that is,
the consequence relations of standard logics are monotonic. Much recent research in AI has been
devoted to the attempt to develop nonmonotonic logics. After some motivational material, we give
four formal proofs that there can be no nonmonotonic consequence relation that is characterized
by universal constraints on rational belief structures. In other words, a nonmonotonic consequence
relation that corresponds to universal principles of rational belief is impossible. We show that the
nonmonotonicity of common sense reasoning is a function of the way we use logic, not a function of
the logic we use. We give several examples of how nonmonotonic reasoning systems may be based
on monotonic logics.
1. Introduction
My friend Sarah is debating with herself about going to the mall. She needs to
do some shopping, but she dislikes crowds. She notes that it is 10:00 a.m. on a
Wednesday morning, and she reasons as follows:
At this hour of the day in the middle of the week, most adults with jobs will be
at work, and most children will be in school. So, the mall will not be crowded.
But then Sarah remembers that it is the 15th of the month. In her town, all the
merchants in the mall give a 25% discount to seniors on the 15th of the month. She
reasons as follows:
But it is seniors’ discount day at the mall. Seniors really like to take advantage
of discounts because many of them are on fixed incomes, and seniors will not
be at work. So, the mall will be crowded.
On further reflection, Sarah remembers reading in the paper that there is a
serious flu epidemic in her area. It is so severe that just yesterday the Department
of Health issued an advisory, warning all seniors to avoid crowds. She reasons as
follows:
However, the health department just issued that flu advisory, and at their age,
most seniors are pretty careful about their health. Hence most seniors will avoid
shopping until the advisory is lifted. So, the mall will not be crowded.
Sarah’s opinions about whether or not the mall will be crowded switch back and
forth as she recalls more and more relevant information. The pattern illustrated in
our example is familiar to us all. As we add more and more information, our
conclusions are apt to swing from one extreme to another, and back again. Commonsense
reasoning about everyday matters is full of such examples.
Formal logics of the sorts usually studied by logicians (at least up to about 20
years ago) do not seem well suited to the analysis of this sort of reasoning. Let A
be an arbitrary sentence and let Γ be an arbitrary set of sentences. We will use the
notation ‘Γ ⊢ A’ to indicate that our logic sanctions the inference from premises
Γ to conclusion A. We may sometimes add a subscript to specify the logic when
confusion about the logic is likely in context. The usual formal logics (classical,
intuitionistic, many-valued, modal) have a property that has come to be known
as “monotonicity”; namely, if the logic sanctions an inference from premises Γ to
conclusion A, then the logic will sanction the inference to conclusion A from Γ
plus any additional premises. That is, if Γ ⊢ A, then Γ ∪ Δ ⊢ A, for any set Δ.
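To make the monotonicity property concrete, here is a minimal sketch (ours, not a system from the literature) of a monotonic consequence relation: forward chaining with modus ponens over atomic sentences and implications. Since closure under the rule can only grow as premises are added, conclusions are never retracted.

```python
# A minimal monotonic consequence relation: forward chaining with modus
# ponens. Sentences are atoms (strings) or implications, written as a pair
# (antecedent, consequent). This sketch is illustrative only.

def consequences(premises):
    """Close a premise set under modus ponens."""
    derived = set(premises)
    changed = True
    while changed:
        changed = False
        for s in list(derived):
            if isinstance(s, tuple):            # an implication (ant, cons)
                ant, cons = s
                if ant in derived and cons not in derived:
                    derived.add(cons)
                    changed = True
    return derived

def entails(premises, conclusion):
    return conclusion in consequences(premises)

gamma = {"rain", ("rain", "wet")}
print(entails(gamma, "wet"))                                    # True
# Monotonicity: enlarging the premise set never retracts a conclusion,
# because the closure can only grow with its input.
print(entails(gamma | {("wet", "slippery"), "cold"}, "wet"))    # True
```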
Note that in our example about Sarah, we did not reject any of our previous
explicitly stated premises as we proceeded. We simply added more and more in-
formation. As each new bit of information was added, Sarah’s conclusion switched
back and forth. Sarah’s reasoning was nonmonotonic. Most common sense
reasoning about everyday matters seems to be nonmonotonic. Since common sense
reasoning is nonmonotonic, but the usual formal logics are all monotonic, it would
seem that we need to develop a new kind of logic, a nonmonotonic logic, to model
common sense reasoning.
It is the purpose of this paper to challenge not only the NEED, but also the
rational basis, for moving to a special logic in order to account for the
nonmonotonic character of commonsense inference. After a few brief preliminaries, we
will present four proofs that it is impossible to have a nonmonotonic consequence
relation that corresponds to universal constraints on rational belief structures. In
brief, a rational nonmonotonic logic is impossible. We will then go on to show that
one can model nonmonotonic reasoning with standard monotonic logic, indicating
that there is no need for a special nonmonotonic logic anyway. Non-monotonicity
results from making alterations in our mental models of the world as a result of
new information, and then using the revised mental models and standard logics to
draw our conclusions. Principles of nonmonotonic reasoning are simply heuristics
for the modification of the mental models we use to get around in the world. As
such, they are not content free, in contrast with principles of deductive reasoning.
Principles of nonmonotonic reasoning contain empirical assumptions and should
be subject to empirical test and verification, as with any other scientific
generalizations. While these conclusions are not startlingly new, what is new are our proofs of
the irrationality inherent in the alternative of viewing nonmonotonicity as deriving
from some strange logic.
This paper is basically meta-theoretical. It is not intended to be a survey of the
field of nonmonotonic logic, and generally, we will avoid detailed commentary on
NONMONOTONIC REASONING 323
specific systems. Good introductions to and surveys of the early trends in research
in this area may be found in Besnard (1989), Smets et al. (1988) and Turner (1984).
More recently, a large number of survey books have been published covering
subsequent developments; one may profitably begin with Brewka et al. (1997).
2. Preliminary matters
We will begin with a few mundane observations about the role of computation in
the interaction of an organism with its environment. Biological organisms have a
number of primary goals, among which we may list (1) energy acquisition, (2)
reproduction, and (3) death avoidance. These categories are only mentioned as
a rough guide and are intended to be neither mutually exclusive nor jointly
exhaustive. Let us compare a very simple organism like the paramecium with a more
complicated organism, like the common house cat.
For the paramecium in its usual environment, food is plentiful and randomly
distributed, enemies are randomly distributed, and reproductive opportunities (at
least opportunities for the sharing of genetic material) are randomly distributed. So
in its environment, the paramecium has no need of great computational power.
Computational power would not help it in energy acquisition, reproduction, or
death avoidance. Since computational power costs energy, increased computational
power for the paramecium would be a net loss.
In contrast to the environment of the paramecium, for the common cat, food
is scarce and not randomly distributed, enemies are not randomly distributed, and
reproductive opportunities are not randomly distributed. Nor are any of these items
uniformly distributed. In order to overcome this non-random, but non-uniform
distribution, the cat has to be able to take information in from its environment and
make predictions about the consequences of various actions it might take.
Consider a cat trying to pounce on a mouse in the middle of a field. In order
to be useful, the feline computations must satisfy a number of constraints. Any
prediction by the cat of the movements of the mouse, in response to anticipated
movements by the cat, must occur sufficiently in advance of the event to allow the
cat to act on the prediction; if it takes too long to do the computation, it is useless.
Further, the prediction must concern matters for which the cat possesses some
behavioral strategy; it will do the cat no good to compute air flight trajectories,
as an owl might do, since the cat cannot fly. And finally, the time and energy costs
of the computation from the data available must not on average exceed the utility
of the prediction; so the extra time and energy cost to the cat of predicting the
position of the mouse correct to an angstrom, rather than correct to a centimeter,
would not yield enough payoff to be worthwhile. In summary, predictions must be
made relatively efficiently, they must be mostly correct, and they must be about
matters that are important to the organism.
Random strategies are effective only in random environments of a certain sort.
Success in a highly non-random, non-uniform environment requires both (1) an
adequate behavioral repertoire, and (2) appropriate predictive capacity. The greater
the behavioral repertoire and the better the predictive capacity, the more diverse
environments an organism can exploit, and vice versa. But in general as
behavioral and computational capability increase, there is a corresponding increase
in the logical state space complexity (more vectors in the space). And as state
space complexity increases, there must in general be an increase in the physical
complexity of any physical realization of a computer with respect to that space.
But as the physical complexity of an organism increases, there will be increased
energy requirements. Hence, in general, organisms that can exploit more diverse
environments have greater energy requirements.
Devices which can store information in the environment and recover it in a
systematic way as needed will in general have an advantage over devices not
so endowed. The first advantage is that fewer internal states will be required for
memory, so devices with “environmental memory” can be physically simpler. And
the second advantage is that devices with “environmental memory” will generally
have logically superior computational ability. By analogy, recall that a Turing
machine is just a finite state machine brain controlling a read/write head for storing
and recovering information in its environment; and Turing machines are certainly
computationally superior to finite state machines. Further, a universal Turing
machine can perform any computation that can be done by any Turing machine, and
yet the universal machine has a fixed “brain” size.
Thus organisms with the ability to exploit very diverse environments will derive
an energy advantage from the use of computationally efficient languages that can be
used to model the environment and to predict future events. But in general, time and
energy efficiency requirements prohibit totally accurate models and totally accurate
predictions.
So the human use of natural languages is based, at least in part, on their
utility for modeling and predicting aspects of the environment under conditions of
uncertainty. In order to be practically useful, our modeling of the world will not
generally include all possible details. The most detailed map would be one with
a 1:1 scale, but it would be absolutely useless because it would completely cover
the terrain! Similarly, time, space, and energy bounds on our mental computations
require that mental models be somewhat simpler than that which they model. And
we are seldom able to make, and seldom have need of, measurements of length precise to
an angstrom, for example. So many, perhaps most, of our models of and claims
about the world are best viewed as approximations.
To repeat, our rational thought processes involve modeling and predicting as-
pects of our environment under conditions of uncertainty. Here we will assume that
our formal language is adequate for such modeling and for the expression of claims
about our world that are important to us.
To begin with, we do not wish to beg any important formal questions, so we
will be as general as possible in our characterization of the language. We will
assume that we are given some formal language, with a denumerable set of well
formed formulas:

L = {E1, E2, . . . }

For stylistic variety, we will use the terms “well formed formula”, “WFF”,
“sentence”, and “expression” interchangeably. Initially we make no assumptions about
connectives, quantifiers, or other logical particles.
We will simply assume that crucial aspects of our reasoning can be represented
in an appropriate way by expressions in the language. In particular, we assume that
for any argument, the premises are given as a set of sentences, and the conclusion
is a sentence. We use the notation “Γ ⊢ A” to indicate that the logic under study
sanctions drawing conclusion A from the set of premises Γ. The consequence
relation “⊢” will sometimes be subscripted when it is necessary to distinguish the
consequence relation of one logic from that of another. For a given logic, the set of
consequences of the premises Γ is defined by:

Con(Γ) = {A : Γ ⊢ A}
As an example of how easy it is to produce a nonmonotonic consequence relation,
consider the following strange relation, defined for some fixed finite integer n:

Γ ⊢S A iff card(Γ) < n, Lng(A) ⊆ Lng(Γ), and A ∉ Γ

By “card(Γ)”, we mean “the cardinal number of elements in Γ”; and in the above
formulation, “n” is some arbitrary finite integer. And we use “Lng(A)”, the
language of A, to mean the set of non-logical components used in the construction of
A, i.e., the set of predicates, constants, and sentence letters that actually occur in
A; similarly, “Lng(Γ)” is used to designate the set of nonlogical components that
appear in the members of Γ. So basically, our strange consequence relation says
that as long as we have fewer than some specified number of premises, any sentence
which uses only the language of the premises but is not one of the premises is a
consequence of those premises.
Our strange consequence relation is clearly nonmonotonic, since if we add
premises so the total exceeds the specified number n, then nothing follows. It is
also paraconsistent, because if the language contains a negation “∼”, we may have
both Γ ⊢S A and Γ ⊢S ∼A, but we will not have Γ ⊢S B for all B. But even though
it is nonmonotonic, this consequence relation is still clearly uninteresting. It is
trivial because it sanctions many inferences we would deem to be irrational, and it
fails to sanction inferences we would deem to be rational. For example, assuming
2 ≤ n, then from the premise set {“The cat sat on the mat.”, “The dog is brown.”},
we could conclude “The dog sat on the cat.” We would also be entitled to conclude “The cat
did not sit on the mat” but we would not be entitled to conclude “The cat sat on the
mat”. Anyone whose reasoning corresponded to our strange consequence relation
would rightly be deemed to be irrational. To the extent that one refuses to accept
logically obvious conclusions, one may be deemed to be less than rational; on the
other hand, jumping to bizarre conclusions is a clear indication of irrationality.
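The strange consequence relation just described is easy to implement. The sketch below uses the cat-and-dog sentences from the text, with the simplifying assumption that every word of a sentence counts as non-logical vocabulary:

```python
# The strange relation ⊢S, implemented directly: with fewer than n premises,
# any sentence built only from the premises' vocabulary, and not itself a
# premise, counts as a consequence. Sentences are plain strings, and for this
# sketch every word is treated as non-logical vocabulary (Lng).

def lng(sentence):
    """Non-logical vocabulary of a sentence: here, simply its words."""
    return set(sentence.replace(".", "").lower().split())

def lng_of_set(premises):
    return set().union(*(lng(p) for p in premises)) if premises else set()

def entails_s(premises, conclusion, n=10):
    return (len(premises) < n
            and lng(conclusion) <= lng_of_set(premises)
            and conclusion not in premises)

gamma = {"The cat sat on the mat.", "The dog is brown."}
print(entails_s(gamma, "The dog sat on the cat."))   # True: sanctioned, though bizarre
print(entails_s(gamma, "The cat sat on the mat."))   # False: the premises never follow
# Nonmonotonicity: push the premise count past n and nothing follows at all.
padded = gamma | {f"Filler sentence {i}." for i in range(10)}
print(entails_s(padded, "The dog sat on the cat.")) # False
```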
So cooking up nonmonotonic consequence relations is not a problem. The
problem is to formulate a nonmonotonic consequence relation that corresponds to
canons of rationality. Roughly, what we want is an explication of something like
“Given that it is rational to believe the set of premises Γ, it is rational to believe
A.” Relations which violate principles of rational belief cannot be taken seriously
as candidates for a common sense consequence relation; to be non-trivial, a
consequence relation must correspond to principles of rational belief. In order to be
acceptable, a logic must sanction all (or at least most) of the arguments which
our common sense tells us are rational. On the other hand, we will not accept as
adequate a logic that sanctions just any old inference at all. To be acceptable, the
logic must reject all (or at least most) of the arguments which our common sense
tells us are not rational.
In short, to be acceptable, our consequence relation must correspond to some
sort of theory of rational belief and inference, both in the arguments it accepts and
in the arguments it rejects. When we say that nonmonotonic logic is impossible,
at least part of what we are claiming is that every adequate characterization of a
logical consequence relation in terms of rational belief structures is monotonic.
That is, mathematically speaking, its consequence relation is weakly, positively
monotonic. Our first two proofs are aimed at establishing this result. Further, we
also wish to claim that the nonmonotonic character of our commonsense inferences
is an essentially meta-linguistic characteristic of the way in which the proof theory
is used, and it cannot be mirrored by any syntactic structures in the object language;
in other words, it is impossible to introduce into the syntax of the object language
any structures that adequately reflect the nonmonotonic character of common sense
reasoning. Our second two proofs are aimed at establishing this result.
3. First proof
Perhaps the most detailed formal account of rational belief structures is classical
probability theory. We can think of probability theory as an instantaneous snapshot
of the beliefs of some ideally rational agent. But for our purposes, classical
probability theory depends on too many assumptions. We do not want to assume that
the language contains any particular logical particles, while classical probability
theory usually assumes at least the machinery of classical logic. We do not want
to assume that our beliefs can be assigned precise numerical values, as is done
in probability theory. We do not want to assume that all beliefs can be linearly
ordered, nor even that all beliefs are pairwise comparable.
Intuitively, we can think of a theory about some aspect of our world as just a set
of sentences. An instantaneous snapshot of the belief system of a rational agent will
assign some level of “belief” to at least some sets; we could write “b(Γ)” for “the
degree of belief in Γ”. But degrees of belief may not correspond to numbers, and
many sets may have no degree of belief because they have never been entertained.
Further, it may be that there is no intersubjective way of comparing degrees of
belief from one individual to another. From our point of view, what is important is
that the belief system will impose some form of ordering on some of the sets. Since
it is the ordering that indicates the structure, we will generally concentrate on the
ordering. We will use LEb to indicate a quasi-ordering relation relative to belief
structure b; but because the “degree of belief” is of no interest to us apart from
the ordering relation, we will generally drop the subscript. So instead of writing
something like “b(Γ) ≤ b(Δ)” we will write “Γ LEb Δ”, or more simply “Γ LE Δ”,
for “the degree of joint acceptability of the members of Γ is less than or equal to
the degree of joint acceptability of the members of Δ”.
Recall that we do not assume that “degree of acceptability” or “degree of belief”
corresponds to any numerical measure. Further, we do not assume that the relations
LE are either objectively or subjectively based; they may be either, they may be
partially one and partially the other, or they may be based on some third alternative.
We do not assume that the relations LE are in accord with any standard probability
measure or that they correspond to some objective relative frequency scheme. We
want to admit interpretations such as “If all the members of Γ have a probability
above threshold r, then all the members of Δ also have probability above that
threshold,” or “most people who find all the members of Γ to be more acceptable
than their denials would find the members of Δ to be more acceptable than their
denials”.
With these brief preliminaries, we are now in a position to specify some
restrictions which apply to all rational belief structures. For our purposes, we will
take a belief structure LE to be a subset of P(L) × P(L), subject to the following
two constraints:

b.1 Reflexivity: Γ LE Γ
b.2 Transitivity: If Γ LE Γ′ and Γ′ LE Γ″, then Γ LE Γ″.

In short, we will consider any quasi-ordering relation defined over pairs of sets of
sentences to be a belief structure.
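For any explicitly given finite relation, the two constraints can be checked mechanically. The representation below (pairs of frozen sets of sentences) and the example relation are ours, chosen purely for illustration:

```python
# A belief structure, per b.1 and b.2, is any quasi-ordering over sets of
# sentences. Sketch: represent LE explicitly as a set of
# (frozenset, frozenset) pairs and check the two constraints over the sets
# it mentions. The example relation is invented for illustration.

def mentioned_sets(le):
    return {g for pair in le for g in pair}

def is_reflexive(le):                  # b.1: G LE G
    return all((g, g) in le for g in mentioned_sets(le))

def is_transitive(le):                 # b.2: G LE G' and G' LE G'' imply G LE G''
    sets = mentioned_sets(le)
    return all((a, c) in le
               for a in sets for b in sets for c in sets
               if (a, b) in le and (b, c) in le)

A = frozenset({"A"})
AB = frozenset({"A", "B"})
EMPTY = frozenset()
# A structure honoring the subset intuition: larger theories no more believable.
le = {(g, g) for g in (A, AB, EMPTY)} | {(AB, A), (A, EMPTY), (AB, EMPTY)}
print(is_reflexive(le) and is_transitive(le))   # True
```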
Recall that we are thinking of a set of sentences as being a theory about the
world, in the ordinary scientific sense of the word ‘theory’. For example, the set
{A, B} is interpreted as meaning that both A and B hold. It would be bizarre in the
extreme for someone to claim to believe the theory of thermodynamics, but then
go on to say they do not believe in the second law. So sets of sentences are treated
conjunctively, even if there is no conjunction operator in the language. Further, sets
may contain an infinite number of sentences, and so would not correspond to any
single sentence even in standard languages with a conjunction operator. Thus we do
not assume that sets are closed with respect to arbitrary conjunctions of members.
In addition, just as in ordinary scientific discourse, we do not assume that the sets
are “deductively closed”, or closed with respect to any consequence relation. To
write down the theory of Newtonian mechanics, we do not have to write down all
its deductive consequences.
Our next constraint is motivated by simple relative frequency considerations.
As an illustrative example, note that it is harder to build a house and an attached
garage than it is to build just the house. Similarly, a theory which claims both A
and B will be more difficult to support than a theory which claims just A. Looking
at it from another point of view, there will be fewer (or no more) universe designs
compatible with both A and B than there are compatible with just A. In general, if
Γ makes no more claims about the universe than Δ, then Γ is at least as likely as
Δ. Formally, we can state the condition as follows:

b.3 Subset principle: If Γ ⊆ Δ, then Δ LE Γ.

If Γ makes no more claims about the world than does Δ, then it would be irrational
not to have at least as strong a belief in Γ as in Δ.
For any specific logic, there will no doubt be other constraints one could place
on rational belief structures. For just one example, consider classical logic. In
classical logic we know that from a conditional A ⊃ B and its antecedent A, one
may infer the consequent B. This inference rule might translate into a number of
restrictions on rational belief structures, such as either of the following:

b.mp1 {A ⊃ B, A} LE {B}
b.mp2 If A ⊃ B ∈ Γ and A ∈ Γ, then Γ LE Γ ∪ {B}
But again, we do not wish to prejudge any issues with regard to logical particles or
inference rules. Recall that formal logics are both descriptive and prescriptive. So
to avoid questionable presuppositions, we will adopt the very general view that a
logic is just any prescription for what is to count as a rational belief structure. That
is, a logic L is any arbitrary set of belief structures:
L = {LE1, LE2, . . . }

Each distinct logic will pick out a different set of belief structures; those for
classical logic will be different from those for intuitionism, and both will be different
from those for Post logic. But the important point is that from the standpoint of
logic L, all and only the belief structures in L are rational.
Recall that we are concerned with the question of what kinds of entailment
relations can be characterized in this way. Whatever else it may be, logic is not
psychology. Logicians are not trying to codify the mental associations of some
particular individual. So logical entailment does not depend on any single belief
structure. Rather, logical entailment for logic L must be based on some universal
properties of all rational belief structures as determined by L. Further, if the proof
theory is to be adequate, it must accept all the arguments sanctioned by L and reject
all the arguments deemed unacceptable by L. These considerations are expressed
by two standard properties, which we will discuss in turn.
To motivate the first property, suppose we are trying to construct an artificial
reasoning system. We certainly want to AVOID making a system that will
sometimes, in logically unpredictable cases, draw conclusions that most would regard as
irrational. Our first property captures this desideratum and is called “soundness.”
We can think of this principle as the “good path” principle; logical entailment
should never lead us astray. We can express the soundness principle formally as
follows:

(1.1) If Γ ⊢b A, then Γ LE {A} for all (most) LE ∈ L.

Note that we are using the notation “⊢b” to indicate a proof theory which coincides
with some set L of rational belief structures.
To motivate the second property, again suppose we are trying to construct an
artificial reasoning system. As before, we want to AVOID making a system that
will sometimes, in logically unpredictable cases, FAIL to draw conclusions that
most would deem to be rational. The second property is called “completeness.”
Basically, we want our proof theory to be strong enough to capture any argument
which is sanctioned by every rational belief structure. If, no matter what the rational
belief structure is like, it is always rational to accept A on condition that we accept
all of Γ, then Γ logically entails A. Using the same notation as above, we can
formally express the completeness condition as follows:

(1.2) If Γ LE {A} for all (most) LE ∈ L, then Γ ⊢b A.
Note that in our statements of (1.1) and (1.2), we have included “most” in par-
entheses in an effort to be as generous as possible. In our proof below, we will use
the “all” construction to keep the presentation of the proof as simple as possible.
However, simple relative frequency considerations could be used to establish the
same result as long as “most” means “more than 50%.”
So far, we have been very general in all our considerations. We have allowed
the formal language to have absolutely any logical particles and internal structure
desired. Our characterization of belief structures was also very general, requiring
only the minimal conditions dictated by relative frequency considerations. In
particular, we imposed no requirements on the evolution of belief structures. And we
allowed a completely arbitrary specification of which belief structures are to be
considered rational. We have imposed no requirements on the sorts of inference
rules contained in the proof theory. Of our proof theory, we required only (1)
that it not lead us astray by sanctioning inferences deemed to be irrational, and
(2) that it be strong enough to capture any argument deemed to be rational by all
belief structures. But even with this simple framework, we are now in a position to
demonstrate the impossibility of a nonmonotonic consequence relation.
Theorem 1: Let L be an arbitrary set of rational belief structures which are
reflexive, transitive, and satisfy the subset principle. Further suppose that logical
entailment ⊢b is sound and complete with respect to the set L. Then logical
entailment is monotonic; that is, if Γ ⊢b A, then Γ ∪ Δ ⊢b A.

Proof:
1. Γ ⊢b A                          given
2. Γ LE {A} for all LE ∈ L         1, soundness
3. Γ ∪ Δ LE Γ for all LE ∈ L       subset principle
4. Γ ∪ Δ LE {A} for all LE ∈ L     2, 3, transitivity
5. Γ ∪ Δ ⊢b A                      4, completeness
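The theorem can also be illustrated computationally. In the sketch below (the particular structures are invented for illustration), each belief structure is built by counting "allowed worlds", which makes it reflexive, transitive, and subset-respecting by construction; entailment is then defined exactly by the soundness and completeness conditions, and an exhaustive check over a three-atom language finds no violation of monotonicity, just as the theorem guarantees.

```python
# Illustrative check of Theorem 1 on a tiny language. Each belief structure
# is built by world-counting: b(G) is the number of allowed worlds in which
# every sentence of G holds, and G LE D iff b(G) <= b(D). Such a structure
# is reflexive, transitive, and satisfies the subset principle by
# construction; entailment is then the sound-and-complete relation.

from itertools import combinations

ATOMS = ("p", "q", "r")

def powerset(xs):
    xs = list(xs)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]

WORLDS = powerset(ATOMS)   # a world = the set of atoms true in it

def make_le(allowed_worlds):
    def b(g):
        return sum(1 for w in allowed_worlds if g <= w)
    return lambda g, d: b(g) <= b(d)

# The "logic" L: an arbitrary handful of such belief structures.
L = [make_le(ws) for ws in (WORLDS, WORLDS[:4], WORLDS[2:])]

def entails_b(gamma, a):
    """Entailment per soundness + completeness: Gamma LE {a} in every structure."""
    return all(le(gamma, frozenset({a})) for le in L)

# Exhaustively confirm monotonicity: once derivable, always derivable.
sets = powerset(ATOMS)
for gamma in sets:
    for a in ATOMS:
        if entails_b(gamma, a):
            assert all(entails_b(gamma | delta, a) for delta in sets)
print("monotonicity holds in every case")
```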
4. Second Proof
Some may feel that trying to represent belief structures by quasi-orderings on sets
of sentences does not do justice to the conditional nature of actual belief systems.
It seems that almost any belief we have is conditional on a lot of background
assumptions. For example, even the belief that “The cat is on the mat” involves
assumptions about the physical nature of cats and mats, the impenetrability of
matter, and so on. So, let us consider conditional belief structures as a foundation
for an entailment relation.
Theories of conditional probability are likely the most frequently encountered
candidates for models of conditional belief structures. As we previously indicated,
there is some controversy over the adequacy of numerically based probability
theories as models of actual belief structures. Most people cannot linearly order
their beliefs, much less assign precise numerical weights to them, so it seems that
numerically based theories are overspecified. As before, we do not wish to beg
any fundamental questions, so we will allow both numerically based theories and
comparative theories. A “conditional belief unit” will be represented by
“c(A, Γ)” for A any sentence in the language and Γ any set of sentences of the
language. We think of A as the conclusion, and we think of the set Γ as containing
the premises or evidence. We require only that the set L of conditional belief
structures deemed rational be closed under conditionalization:

C.1 If c is in L, then for any set Δ, so is the function c∗ defined by:

c∗(A, Γ) =df c(A, Γ ∪ Δ)

Carefully note that we are not saying that c(A, Γ) = c(A, Γ ∪ Δ). We are making no
claim about the relationship between c(A, Γ) and c(A, Γ ∪ Δ). All we are saying
is that a conditionalized legitimate conditional belief structure is itself a legitimate
conditional belief structure.
Also note that requiring L to be closed under conditionalization does not impose
any restriction on the evolution of beliefs. While some have claimed that the only
rational way for beliefs to evolve is by (e.g., Bayesian) conditionalization, others
have disputed such claims. Again, C.1 does not say anything about the evolution
of beliefs; it simply says that a conditionalized rational belief function is itself a
rational belief function.
The only problems with conditionalization that generally arise correspond to
cases in which the measure of the conditioning set is 0, or alternatively, when the
sentences in the conditioning set are internally inconsistent or incoherent in some
way. In such cases there is a trivial modification of the theory possible to give c∗
some default value, usually 1 for numerically based theories. The alternative is
to permit partial functions; in such a case, we would take c∗ to be the function
indicated in C.1, but undefined when Γ ∪ Δ is problematic as evidence.
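As a concrete sketch of closure under conditionalization together with the default-value proviso just described, consider a numerical conditional belief function built from a small invented distribution; conditionalizing it yields another function of the same kind, and measure-zero evidence receives the default value 1:

```python
# A numerical conditional belief function c(A, G) over a small invented
# distribution, together with the conditionalized function c* of C.1. When
# the conditioning set has measure zero, c returns the default value 1, as
# the text suggests for numerically based theories. All names and numbers
# here are illustrative assumptions.

DIST = {                          # world -> probability; a world is the set
    frozenset({"p", "q"}): 0.4,   # of atomic sentences true in it
    frozenset({"p"}): 0.3,
    frozenset({"q"}): 0.2,
    frozenset(): 0.1,
}

def measure(gamma):
    """Total probability of the worlds in which every member of gamma holds."""
    return sum(pr for w, pr in DIST.items() if gamma <= w)

def c(a, gamma):
    mg = measure(gamma)
    if mg == 0:                   # the proviso: default value on measure-zero evidence
        return 1.0
    return measure(gamma | {a}) / mg

def conditionalize(delta):
    """C.1: conditionalizing a belief function yields another belief function."""
    return lambda a, gamma: c(a, gamma | delta)

c_star = conditionalize(frozenset({"q"}))
print(c("p", frozenset()))        # ~0.7
print(c_star("p", frozenset()))   # = c("p", {"q"}) = 0.4/0.6, ~0.667
```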
We will now consider an important general property of many characterizations
of entailment. In order to motivate the property, let us consider a few proposed
definitions for an entailment relation. For simplicity, we will restrict the examples
to those based on numerically valued functions c.
In both D.1 and D.2, the evidence position is always occupied just by the premise
set Γ. However, in D.3, the evidence position is occupied by Γ ∪ Σ, and Σ is
universally quantified. That is, the defining sentence may talk about not just Γ, but
also some set-theoretic function of Γ. But in all three cases, the defining sentence
has a property which we call being “transparent to unions”; the formal
characterization is as follows:

C.2 “Sentence[A, c(–, f(Γ))]” is transparent to unions iff for all sets Δ, if
“Sentence[A, c(–, f(Γ) ∪ Δ)]” is true, then
“Sentence[A, c(–, f(Γ ∪ Δ))]” is true.
Proof.
1. Γ ⊢c A                          given
seem bizarre to claim that when we conditionalize our belief state in this way, we
suddenly fall into irrationality!
Further note that conditionalizability follows from relative frequency
considerations, even relative frequency of universe designs. Any distribution derived from
relative frequency can be conditionalized, as long as we incorporate the proviso to
handle cases with frequency 0. In addition, let us consider the usual product rule
for conjunctions.
P∗(A, Γ) =df P(A ∧ B, Γ)/P(B, Γ), for all WFFs A and sets Γ
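The displayed identity can be checked numerically on a toy distribution (invented here): whenever P(B, Γ) > 0, the ratio P(A ∧ B, Γ)/P(B, Γ) coincides with conditioning on Γ ∪ {B}.

```python
# Numerical check of the product-rule identity on a toy distribution:
# whenever P(B, G) > 0, P(A ∧ B, G) / P(B, G) equals P(A, G ∪ {B}).
# The distribution below is invented purely for illustration.

DIST = {
    frozenset({"a", "b"}): 0.25,
    frozenset({"a"}): 0.25,
    frozenset({"b"}): 0.3,
    frozenset(): 0.2,
}

def measure(gamma):
    return sum(pr for w, pr in DIST.items() if gamma <= w)

def P(a, gamma):
    return measure(gamma | {a}) / measure(gamma)   # assumes measure(gamma) > 0

def P_conj(a, b, gamma):
    return measure(gamma | {a, b}) / measure(gamma)

lhs = P_conj("a", "b", frozenset()) / P("b", frozenset())   # P(A ∧ B, G)/P(B, G)
rhs = P("a", frozenset({"b"}))                              # P(A, G ∪ {B})
print(abs(lhs - rhs) < 1e-12)   # True
```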
5. Soundness
Both of our first two proofs depend on the assumption of a principle of soundness.
One proposal for avoiding the impossibility proofs is to abandon soundness. In
terms of classical logic, “revisable” conclusions must go beyond what is
deductively sanctioned by the premises, otherwise they would not be “revisable”. But for
such conclusions, it will always be possible to find a model in which the premises
are all true but the conclusion is false. Hence, in terms of classical semantics,
nonmonotonic inferences are not sound.
Before blithely abandoning the soundness principle, it should be noted that
rational belief structures of the sort considered in our first two proofs are quite
different from classical model theory. The two soundness principles utilized in our
proofs are not equivalent, at least not by themselves, to the soundness principle
of classical logic. Note that in both of our impossibility proofs, subject to the
initial constraints, we allowed the selection of the belief structures deemed to be
rational to be made in any way whatsoever. What we proved in each case is that
it is impossible to characterize a set of rational belief structures in such a way
that (1) every inference that is always deemed to be rational is sanctioned by the
proof theory, and such that (2) every inference sanctioned by the proof theory is
always deemed to be rational. That is, no matter what the nonmonotonic proof
theory is like, and no matter how you select the set of rational belief structures, if
the proof theory is complete, then it must sanction arguments that are sometimes
deemed irrational. In this sense, nonmonotonic proof theory is self-refuting. To
give up on soundness is, in essence, to admit that nonmonotonic proof theory must
be irrational.
Note that we cannot even beg the question and use the proof theory to select the
set of belief structures that are to be considered rational. Suppose we start with a
proof theory, and then select the set of belief structures that sanction (in the sense
given by the appropriate soundness and completeness conditions) all and only the
inferences that are sanctioned by the proof theory. Then relative to that set of belief
structures, the proof theory will be monotonic. In short, there is no way to select a
set of rational belief structures and specify a nonmonotonic proof theory that does
not violate the canons of rationality. If you want a complete nonmonotonic proof
theory, then you must be (at least sometimes) irrational.
6. Third Proof
In spite of our first two proofs, one might still harbor a bit of hope for constructing
a theory with some sort of nonmonotonic operator in the syntax of the language,
and then using that operator in some way to account for the nonmonotonicity of
the consequence relation. It will be useful to consider an analogy with classical
probability theory.
We begin with a simple example. Suppose we draw a single card from a standard deck with the usual constitution. Then the probability that the card is black,
given the background information, is 1/2. Now, if we are told, in addition to the
constitution of the deck, that the card drawn is a spade, then the probability that
the card is black will rise to 1; but if instead the additional information is that the
card is red, then the probability that the card is black will fall to 0. In other words,
we can find specific probability distributions P, sentences A, and sets Γ, Δ, and Σ
such that:
P(A, Γ ∪ Δ) < P(A, Γ) < P(A, Γ ∪ Σ)
Some additional evidence makes our conclusion more likely, while some additional evidence makes our conclusion less likely. Mathematically speaking, conditional probability functions are in general not weakly positively monotonic in the
evidence position.
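The card example can be checked by brute enumeration; the deck representation and function names below are our own illustrative choices.

```python
from fractions import Fraction

# Enumerate a standard 52-card deck as (rank, suit) pairs.
RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["spades", "clubs", "hearts", "diamonds"]
DECK = [(r, s) for r in RANKS for s in SUITS]

def prob(event, evidence=lambda c: True):
    """Conditional probability by relative frequency over the deck."""
    cases = [c for c in DECK if evidence(c)]
    return Fraction(sum(1 for c in cases if event(c)), len(cases))

is_black = lambda c: c[1] in ("spades", "clubs")

print(prob(is_black))                                            # 1/2
print(prob(is_black, lambda c: c[1] == "spades"))                # 1
print(prob(is_black, lambda c: c[1] in ("hearts", "diamonds")))  # 0
```

Adding the evidence that the card is a spade raises the value to 1; adding instead the evidence that it is red lowers the value to 0, exhibiting the failure of weak positive monotonicity in the evidence position.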
At this point, it is important to be very clear about our terminology. As we
stated above, conditional probability functions are in general not weakly positively
monotonic in the evidence position. For some sets Δ, P(A, Γ ∪ Δ) is less than
P(A, Γ), while for other sets Σ, P(A, Γ ∪ Σ) is greater than P(A, Γ). Without
trying to specify the details, we note that the assertability of a conclusion relative
to a set of premises is related to the conditional probability value. For just one
example, we consider the extreme cases: the sentence “A probably holds, provided
that things are as described by Γ” is a reasonable claim if P(A, Γ) = 1, but it is
clearly unreasonable if P(A, Γ) = 0. Hence there will be cases in which the conditional “A probably holds, provided that things are as described by Γ” is reasonable,
but the conditional “A probably holds, provided that things are as described by
Γ ∪ Δ” is not reasonable. Similarly, there will be cases in which the conditional “A
probably holds, provided that things are as described by Γ” is not reasonable, but
the conditional “A probably holds, provided that things are as described by Γ ∪ Σ”
is reasonable. Using the modern idiom, the conditional of conditional probability
theory clearly seems to be nonmonotonic.
For an alternative view, recall our earlier discussion of the characteristic function of a consequence relation. Typically, characteristic functions are two-valued.
But by a simple extension of the notion, we could admit graded characteristic
functions. For any graded n-adic relation R (n = 1, 2, ...), the graded characteristic
function G[R] is defined to be the map which takes n-tuples from the field of R
into the range of possible gradations, as follows:
ility distributions are not in general weakly positively monotonic in the evidence
position.
In summary, it seems that the conditional associated with conditional probability theory will be nonmonotonic. Similarly, it seems that if by analogy with the
characteristic function for classical entailment, we think of probability theory as
a graded characteristic function for an entailment relation, that entailment relation
will be nonmonotonic. So it seems appropriate to simply say that probability theory
is nonmonotonic, meaning by this phrase only that in general, probability functions
are not weakly positively monotonic in the evidence position.
Note that not every probability function P will behave in this nonmonotonic
fashion. For example, in the case of classical logic, for any maximally consistent
set Γ, we can define a two-valued distribution as follows, for all sentences A and
sets Δ:
PΓ(A, Δ) = 1 if A ∈ Γ; PΓ(A, Δ) = 0 otherwise
The function so defined will satisfy all the constraints for classical probability
theory (modulo the proviso for evidence of measure 0). But the distribution PΓ
is strictly monotonic, in the sense that for all sentences A and all sets Δ and Σ:
PΓ(A, Δ) ≤ PΓ(A, Δ ∪ Σ)
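A minimal sketch of such a strictly monotonic two-valued distribution, representing the maximally consistent set Γ crudely by the stock of atomic sentences it makes true (the atom names are illustrative assumptions):

```python
# Sketch of the two-valued distribution P_Gamma determined by a maximally
# consistent set Gamma; for the sketch, sentences are just atoms.
GAMMA = {"A", "C"}          # the atoms that Gamma makes true

def p_gamma(sentence, evidence=frozenset()):
    """P_Gamma(A, Delta) = 1 if A belongs to Gamma, else 0.
    The evidence set Delta is ignored, so the value can never drop."""
    return 1 if sentence in GAMMA else 0

# Strict monotonicity: augmenting the evidence never changes the value.
assert p_gamma("A", frozenset()) <= p_gamma("A", frozenset({"B"}))
```

Since the evidence position plays no role, the value trivially never decreases as evidence is added, which is the strict monotonicity just noted.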
Γ ⊢nm A iff ⊢s s(∧Γ, A)
LE ⊆ (L × P(L)) × (L × P(L))
However, as indicated, we will write the relations in infix notation, and treat them
as two-place relations on ordered pairs. We will first list the constraints on the
relations LE, and then we will give a brief motivation of each one.
An elementary three-circle Venn diagram will easily establish that the relationship must
hold of any theory in accord with relative frequency. But if P(A, Γ ∪ {B}) takes the
value 1, then it immediately follows that P(B, Γ) ≤ P(A, Γ). Condition CCP.3 is
just a simple comparative version of this result.
Let us now consider the possibility of attempting to introduce into the object
language a syntactic structure which is meant to capture the conditional of conditional probability theory. In terms of our comparative structures, we require the
following two conditions:
CCP.4.a (A, Γ ∪ {B}) LE (s(B, A), Γ)
CCP.4.b (s(B, A), Γ) LE (A, Γ ∪ {B})
But serious difficulties arise if we try to add these conditions. We will state and
prove two theorems, and then discuss their significance.
Theorem 3.1: If LE satisfies CCP.1–3 and 4.a, then the syntactic construction
s(B, A) is monotonic; that is, the following holds for all expressions A and B and
all sets Γ:
(A, Γ) LE (s(B, A), Γ)
Proof:
1. (A, Γ ∪ {A, B}) LE (s(B, A), Γ ∪ {A})  CCP.4.a
2. (A, Γ ∪ {A}) LE (A, Γ ∪ {A, B})  CCP.2
3. (A, Γ ∪ {A}) LE (s(B, A), Γ ∪ {A})  1, 2, CCP.1
4. (s(B, A), Γ ∪ {s(B, A)}) LE (s(B, A), Γ ∪ {s(B, A), A})  CCP.2
5. (A, Γ ∪ {s(B, A), A} ∪ {A}) LE (A, Γ ∪ {s(B, A), A} ∪ {A, s(B, A)})  CCP.2
6. (A, Γ ∪ {s(B, A), A} ∪ {A}) LE (A, Γ ∪ {s(B, A), A} ∪ {s(B, A)})  5, set theory
7. (s(B, A), Γ ∪ {s(B, A), A}) LE (A, Γ ∪ {s(B, A), A})  6, CCP.3
8. (s(B, A), Γ ∪ {s(B, A)}) LE (A, Γ ∪ {s(B, A), A})  4, 7, CCP.1
9. (A, Γ ∪ {A} ∪ {A}) LE (A, Γ ∪ {A} ∪ {s(s(B, A), A)})  CCP.2
10. (s(s(B, A), A), Γ ∪ {A}) LE (A, Γ ∪ {A})  9, CCP.3
11. (A, Γ ∪ {A, s(B, A)}) LE (s(s(B, A), A), Γ ∪ {A})  CCP.4.a
12. (A, Γ ∪ {A, s(B, A)}) LE (A, Γ ∪ {A})  10, 11, CCP.1
13. (s(B, A), Γ ∪ {s(B, A)}) LE (A, Γ ∪ {A})  12, 8, CCP.1
14. (s(B, A), Γ ∪ {s(B, A)}) LE (s(B, A), Γ ∪ {A})  3, 13, CCP.1
15. (A, Γ) LE (s(B, A), Γ)  14, CCP.3
Theorem 3.2: If LE satisfies CCP.1–3, 4.a, and 4.b, then LE is monotonic; that is,
the following holds for all expressions A and B and all sets Γ:
(A, Γ) LE (A, Γ ∪ {B})
Proof: The proof is immediate from Theorem 3.1, CCP.4.b, and CCP.1.
Suppose we try to introduce a conditional construction into the syntax that al-
lows us to move sentences from the premise set of the probability distribution into
the antecedent of a conditional in the object language without decrease of prob-
ability value (i.e., in accord with CCP.4.a). Theorem 3.1 immediately tells us that
any such conditional must be monotonic. Looked at in another way, Theorem 3.1
tells us that any nonmonotonic conditional in the syntax must violate fundamental
principles of probability theory. In other words, no nonmonotonic conditional in
the syntax can serve as the syntactic structure of CCP.4.a. It is logically impossible
for a nonmonotonic conditional in the syntax of the language to reflect the most
elementary principles of relative frequency.
But the situation is even more serious, as is indicated by Theorem 3.2. Note
that the statement of Theorem 3.2 does not directly mention the new syntactic
structure. The simple attempt to include a structure in the object language which
captures the conditional of the probability theory forces the probability theory to
be monotonic for every set of premises and every possible conclusion, whether
the new conditional structure is involved or not. But not only is the probability
theory forced to be monotonic, the corresponding object language conditional must
itself be monotonic as well. So the attempt to capture in the object language the
nonmonotonic character of conditional probability theory is doubly doomed: (1)
the probability theory immediately ceases to be nonmonotonic, and (2) the new
object language conditional cannot be nonmonotonic either.
No matter what the logic is like, no matter what logical particles the language
contains, it is logically impossible to incorporate any syntactic structure in the
object language which captures the nonmonotonic character of conditional prob-
ability theory. As soon as we try to incorporate such a structure into the object
language, the probability theory is forced to be monotonic. But if the conditional
of the probability theory is forced to be monotonic and the object language condi-
tional captures the conditional of the probability theory, then the object language
conditional will be monotonic too. Hence, the nonmonotonic character of condi-
tional probability theory is essentially, necessarily meta-theoretical, and cannot be
reflected in the underlying syntax of the object language.
Thus any alleged nonmonotonic conditional in the object language must be
characterized by principles that violate the most elementary considerations of re-
lative frequency. Note that this result applies, no matter how a theory of non-
monotonic conditionals is formulated. We motivated the theorem in this section by
beginning with a consideration of the conditional of conditional probability theory.
But these results apply to any logic, with any logical particles, no matter what the
7. Fourth Proof
From our first three theorems it is obvious that there are very severe problems asso-
ciated with any attempt to account for the general nonmonotonic character of our
reasoning by the development of a new consequence relation in formal logic. Such
developments cannot correspond to rational belief structures. And there is no way
to incorporate logical particles into the syntax of the language in such a way that
those particles are governed by relative frequency considerations. However, we did
indicate that it is always trivially possible to develop nonmonotonic algebras, and in
spite of their shortcomings, such theories may turn out to have useful applications
in some areas. For our last proof, we will consider potential difficulties that might
arise when we try to reason with such logics. We will begin with a couple of simple
examples of reasoning.
First example: Sherlock, the detective, is trying to figure out who shot JR. Since
there was only one bullet, only one person did the deed. Initial evidence (motive,
location) indicates the culprit was either Mary or Sue. Subsequent evidence (find-
ing the gun, matching the marks on the bullet in the body to those from bullets fired
from the gun, fingerprints on the gun, witnesses) strongly suggests Mary pulled the
trigger. Sherlock concludes that Sue did not shoot JR.
Second example: Astronomer Spock makes observations of the motion of bod-
ies in a certain location in space. His observations plus considerations of relativistic
mechanics suggest there is a very massive object with a relatively small radius
there. Such an object would be a black hole. From the theory of black holes, and
from the observations of the matter in the region, Spock predicts that characteristic
radiation, released by matter falling into the black hole, will be detected in the
region.
Theorem 4.1: For any reflexive, transitive consequence relation satisfying ded.1,
the syntactic structure “s” is monotonic; that is:
If Γ ⊢ A, then Γ ⊢ s(B, A).
Proof:
1. Γ ⊢ A    given
2. Γ ∪ {A, B} ⊢ A    reflex
3. Γ ∪ {A} ⊢ s(B, A)    2, ded.1
4. Γ ⊢ s(B, A)    1, 3, trans
Theorem 4.1 demonstrates that it is impossible to reflect any transitive, reflexive
consequence relation by any nonmonotonic construction in the syntax of the ob-
ject language. However, the situation is even more serious, as is indicated by the
following result.
Theorem 4.2: Any reflexive, transitive consequence relation that satisfies both
ded.1 and ded.2 with respect to any structure “s” is itself monotonic; that is:
If Γ ⊢ A, then Γ ∪ {B} ⊢ A.
Theorem 4.2 tells us that if a deduction theorem holds with respect to any syntactic
construct, no matter what the characteristics of that construct, then a transitive,
reflexive consequence relation must be monotonic. (This result tells us that it is
impossible to obtain a nonmonotonic logic by any simple extension of a classical
natural deduction system.) Thus Theorem 4.2 suggests again that nonmonotonicity
of the consequence relation is independent of the syntax of the language.
Looked at from another point of view, Theorem 4.2 just says that no transitive,
reflexive, nonmonotonic consequence relation can have a deduction theorem with
respect to any syntactic structure in the object language. For some, this result may
not seem too devastating, at first. After all, the usual formulation of the rule of
necessitation for the standard modal logics precludes a proof of the deduction
theorem. But, there are some important differences between the standard modal
logics and attempts to formulate nonmonotonic logic. In modal logic, the rule of
necessitation is validity preserving, not truth preserving. But as we have seen in
our first two impossibility proofs, nonmonotonic logics cannot be formulated with
validity preserving principles, i.e., with principles that are universalizable across
all models. There are ways of formulating standard modal logics and the rule of
necessitation so that the deduction theorem holds. But in contrast, there is no way
to formulate a nonmonotonic logic so that the deduction theorem holds. So the
difficulty presented by Theorem 4.2 is rather deeper than the difficulty associated
with the usual formulations of modal logics.
To see how penetrating the problem is, we can reformulate Theorems 4.1-2
in terms of an axiomatic structure with a special conditional operator “->” and a
conjunction operator “∧”, both in the object-language. We can think of our charac-
terizations of the consequence relation at the meta-level as being characteristics of
the object-language conditional instead. Our transitivity and reflexivity conditions
now become the following object-language conditions:
There will be two object-language principles, parallel to the two components of the
deduction theorem, as follows:
oded.1: If ⊢ (x ∧ y) -> z, then ⊢ x -> s(y, z).
oded.2: If ⊢ x -> s(y, z), then ⊢ (x ∧ y) -> z.
Finally, we assume the conjunction operator is at least partially associative in the
antecedent of the conditional:
assoc: If ⊢ (w ∧ (x ∧ y)) -> z, then ⊢ ((w ∧ x) ∧ y) -> z.
We can now recast Theorems 4.1 and 4.2, as follows.
Theorem 4.3: Suppose we have an axiomatic system that satisfies otrans, oreflex,
oded.1, and assoc. Then for all expressions A, B, and C: if ⊢ A -> B, then
⊢ A -> s(C, B).
Proof:
1. ⊢ A -> B    given
2. ⊢ (B ∧ (A ∧ C)) -> B    oreflex
3. ⊢ ((B ∧ A) ∧ C) -> B    2, assoc
4. ⊢ (B ∧ A) -> s(C, B)    3, oded.1
5. ⊢ A -> s(C, B)    1, 4, otrans
Theorem 4.4: Suppose we have an axiomatic system that satisfies otrans, oreflex,
oded.1, oded.2 and assoc. Then for all expressions A, B, and C: if ⊢ A -> B, then
⊢ (A ∧ C) -> B.
Proof: The proof is immediate from Theorem 4.3 and oded.2.
Theorem 4.3 tells us that the conditional is monotonic in its second-order
occurrences. Hence, all of its nonmonotonic behavior will be limited to its first-order
occurrences.
But even worse for seekers after a nonmonotonic logic, any conditional which
has a uniform reflection by some syntactic structure in its consequent, in the sense
of oded.1 and oded.2, will itself be monotonic. If a conditional is
provable, then so will be a similar conditional with arbitrary added conjunctive
components in its antecedent. But then many moves which we make in common
sense reasoning will not hold for such a logic. For example, one cannot convert a
nonmonotonic conditional with a conjunctive antecedent into a conditional chain
and back again.
In summary, there can be no uniform connection between nonmonotonic consequence relations and nonmonotonic conditionals. Out of necessity, researchers in
nonmonotonic logics have given up the deduction theorem for various specific pet
systems; thus some may be inclined to simply shrug at these results. But these
results are not peculiar to some specific system and some specific connective;
they are much more general. They tell us that all nonmonotonic conditionals and
universe have changed drastically, several times over the last 50 years. However,
no one would suggest that we need a nonmonotonic theory of thermodynamics,
general relativity theory, or elementary arithmetic just because we in astrophys-
ics change our minds about some astronomical phenomenon when we get new
evidence. Mathematics and the fundamental theories of physics are just tools that
astrophysicists use to try to understand astrophysical phenomena. In a similar way,
logic is a tool that we can use to help us understand the human process of reasoning.
Logic is not itself a full account of the process of reasoning, any more than is
general relativity an account of the process of the evolution of the universe. Logic
is a tool to help us understand the process of reasoning, just as general relativity is
a tool to help us understand the evolution of the universe.
Part of what makes the jump from “nonmonotonic reasoning” to “nonmonotonic
logic” seem so appealing is a confusion of logical theory with an account of the
human process of reasoning. The term “logic” is commonly used in several senses.
In one sense, “logical” is used interchangeably with “reasonable”. To ask for the
logic behind someone’s inferences or actions is to ask for an account that would
make us see those inferences or actions as rational; that is, we want a story that
would allow us to empathize, to see that we would make the same inferences or
perform the same actions in similar circumstances. When we arrive at such an
understanding, we feel the inferences or actions are “rule governed”, not random
or erratic. But it would be a mistake to jump from the observation “Her infer-
ences were quite logical” to the claim that there is some different formal system
with weird connectives and/or inference rules that is needed to account for her
inferences. Reasoning is a human process that takes place through time. Logic
provides, at best, an instantaneous snapshot of a “rational” consequence relation.
The difference is as fundamental as the difference in physics between dynamics
and statics.
Before giving a couple of examples of how monotonic logic can be used to
provide an account of nonmonotonic reasoning, we will sketch an elementary
abstract account in terms of model theory and associated proof theory that may
serve as a simplified general framework to aid our understanding. Consider some
standard deductive consequence relation, which we will symbolize by “⊢D”. Let
us also assume we have given some notion of deductive model, so that with each
set Γ of sentences we have an associated set DM(Γ) of the deductive models of Γ.
The usual soundness and completeness results can be stated as follows:
Γ ⊢D A iff A holds in every model in DM(Γ)
Can we construct a similar account for inductive logic (i.e., so-called nonmonotonic logic)? Using similar notation, it seems we need an inductive consequence
relation and some notion of inductive models, and hopefully parallel soundness
the new information are just deductive consequences of the updated models. Since
the set of presupposed models does not constitute all the models, the inductive
consequences are a superset of the deductive consequences.
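The presupposed-models picture can be sketched computationally; the atoms, the particular preference for "normal" models, and the function names below are our own illustrative assumptions, not part of the formal account.

```python
from itertools import product

# Models are truth assignments over two atoms; deductive consequence is truth
# in all models of the premises, inductive consequence is truth in all
# *presupposed* models of the premises, a subset fixed by background assumptions.
ATOMS = ["Bird", "Flies"]
MODELS = [dict(zip(ATOMS, vals)) for vals in product([True, False], repeat=2)]

def dm(premises):
    """Deductive models: all models satisfying every premise."""
    return [m for m in MODELS if all(p(m) for p in premises)]

def presupposed(models):
    """Background assumption: presuppose birds fly, unless that is impossible."""
    preferred = [m for m in models if not m["Bird"] or m["Flies"]]
    return preferred or models

def deduces(premises, conclusion):
    return all(conclusion(m) for m in dm(premises))

def induces(premises, conclusion):
    return all(conclusion(m) for m in presupposed(dm(premises)))

bird = lambda m: m["Bird"]
flies = lambda m: m["Flies"]

print(deduces([bird], flies))                          # False
print(induces([bird], flies))                          # True
print(induces([bird, lambda m: not flies(m)], flies))  # False: revised
```

The deductive machinery is entirely monotonic throughout; the apparent nonmonotonicity arises only because the presupposed model set shrinks or shifts as new information arrives, and the inductive consequences are simply deductive consequences over the updated models.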
Of course there are lots of interesting questions about how models get changed
or updated, especially if the new information contradicts something in the current
model. But just how models are actually updated by human beings is a matter
of empirical fact, not a question of logic. If one wishes to be prescriptive about
the way in which models should be changed, then clear criteria for preferring
one updating scheme to another must be specified. Claims like “changing models
this way rather than that is more likely to lead to correct conclusions” make very
strong claims about the way our world actually operates, and amount to empirical
assumptions about our world which should be explicitly stated and empirically
tested. Remember, it is not our goal here to answer such questions. Our goal is only
to show that we do not need any nonmonotonic logic to understand nonmonotonic
reasoning. In terms of understanding human reasoning, the effort spent trying to
develop some nonmonotonic logic is totally wasted; that effort should be redirected
to potentially more fruitful areas, such as how humans update their presupposed
models or shift from one to the other.
Our bare-bones semantic framework has a lot in common with the semantics of
spheres proposed by Lewis (1973) and subsequently applied by Delgrande (1988)
to nonmonotonic logic. But from the discussion in our impossibility proofs, serious
problems arise with Delgrande’s approach when he attempts to use the semantics
as a semantics for an object-language conditional, and then use that conditional to
define a nonmonotonic consequence relation. In light of our previous discussion,
it is particularly suggestive that Delgrande does not allow his conditional to be
nested, and thus it is more like a consequence relation than an object language con-
ditional. Certainly there are nonmonotonic conditionals in English, and their logic
is important. But as we have seen, nonmonotonic consequence relations cannot be
generally based on object-language constructs.
On the other hand, our associated bare-bones proof theoretic account has a lot
in common with default logics of the sorts elaborated by Reiter (1980). From our
point of view, one can see the default rules as an attempt to code information about
the current mental model, as well as information about how the model is to be
changed when certain sorts of information are encountered. However, our simple
picture does not require some special inference rules, nor some special object lan-
guage connectives. From the presupposed models point of view, so-called default
rules are highly simplified encodings of empirical assumptions about our world.
As such, they are best thought of as part of the corpus of common sense empirical
wisdom (folk physics, folk psychology, folk biology, etc.) rather than as part of
logic.
The heart of nonmonotonic reasoning is not in some new logic; rather, what
is needed is more information about how our models of the world come to be
originally formulated and then subsequently changed and updated in response to
new information. The logic that we use to draw conclusions is always the same;
nonmonotonicity is just the result of changing our background assumptions or
models and using the same old logic to draw conclusions.
This simple framework also sheds light on the problems we encountered when
trying to use an object-language conditional to reflect either the conditional of con-
ditional probability or any allegedly nonmonotonic consequence relation (our third
and fourth proofs, above). Both cases are impossible, as we demonstrated above.
Thus nonmonotonicity seems to be essentially meta-theoretical. It is a result of the
way in which logic is used; it is not the result of syntactic constructs within the
language. The nonmonotonicity of common sense reasoning is a result of the way
we use logic; it is not a result of the logic we use.
9. Nonmonotonic Systems
The general framework at which we have arrived is very simple. Common sense
reasoning by humans uses (some) standard monotonic logic. The reasoning process
involves (for all members of the animal kingdom) the use of mental models of our
world. When we get new information, we update or change our models. We use our
models and standard logic to draw conclusions. The nonmonotonic aspect of our
reasoning is a result of the model changing process, not a result of the logic. I do
not suggest that this picture is particularly original nor startlingly new. However,
I do want to suggest that it is richer and more powerful than has been widely ac-
knowledged. Let us turn to a few examples of systems based on this “presupposed
models” account.
Our first example of a nonmonotonic reasoning system is the theory of prob-
ability itself. One way of looking at a probability distribution is as a sequence of
elementary weighting functions over a σ-field of sets of models, where models are
conceived of as maximally consistent sets of WFFs (see Morgan (1991) and Van
Fraassen (1981)). For a given evidence set Γ, we find the first weighting scheme
in the sequence in which there are models of Γ with non-zero weight. If there is
no such scheme in the sequence, then all probability values, given Γ, will be
1. If there is such a weighting scheme, then the probability of each sentence A,
given Γ, is just the weighted relative frequency of the Γ-models in which A holds.
When we add new information Δ, we just select the first weighting scheme in the
sequence in which there are models of Γ ∪ Δ with non-zero weight, and compute
probabilities as before. In the simple presupposed models picture, each model gets
a weight of 0 or 1, but in this generalized scheme, we may allow more than just
two values for the weights. Each distinct probability distribution just corresponds
to a different weighting scheme over the models. Seen in this light, probability
theory is a generalization of the presupposed models analysis of nonmonotonic
reasoning. Probability theory just is relative frequency applied to the presupposed
models analysis of nonmonotonic reasoning. Probability theory makes one very
important assumption to which we would like to draw attention. It assumes that
the order in which we arrive at our premises does not matter. More graphically, the
probability of A, given the background information <B1 , B2 , B3 > is the same as
given <B2 , B3 , B1 >, and the same as given <B1 , B3 , B2 > or any other permutation.
In some treatments of weird conditionals, the order of the antecedents is deemed
to be important.
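The weighting-scheme construction can be made concrete with a minimal Python sketch; the atoms and the particular two-scheme sequence are invented for illustration.

```python
from fractions import Fraction

# Models as frozensets of the atoms they make true.
MODELS = [frozenset(s) for s in ([], ["Bird"], ["Bird", "Flies"])]

# An ordered sequence of weighting schemes, most preferred first: the first
# scheme gives weight only to the "normal" model in which birds fly.
SCHEMES = [
    {m: (1 if {"Bird", "Flies"} <= m else 0) for m in MODELS},
    {m: 1 for m in MODELS},
]

def prob(holds, gamma):
    """Use the first scheme giving the models of the evidence non-zero total weight."""
    for scheme in SCHEMES:
        total = sum(w for m, w in scheme.items() if gamma(m))
        if total:
            hit = sum(w for m, w in scheme.items() if gamma(m) and holds(m))
            return Fraction(hit, total)
    return Fraction(1)          # no scheme weights any model: all values are 1

bird = lambda m: "Bird" in m
flies = lambda m: "Flies" in m

print(prob(flies, bird))                               # 1 (first scheme applies)
print(prob(flies, lambda m: bird(m) and not flies(m))) # 0 (falls to the second)
```

Adding the information that the individual does not fly forces a fall-back to the second weighting scheme, so the value drops from 1 to 0: nonmonotonicity arises from the shift of weighting scheme, not from the underlying logic.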
As we have mentioned earlier, probability theory is much too detailed for most
applications; we are just not in a position to assign precise numerical measures
to all evidence-conclusion pairs. Further, the abstract theory of probability really
does not tell us which distributions we should use in any given circumstance. Our
second example is a rather simple scheme which at least partially avoids both of
these problems. The approach is based on the pioneering work of Carnap (1962).
Many examples of nonmonotonic reasoning may be thought of as based on
simple statistical information derived from experience concerning elementary classificatory properties, combined with the following “rule”: if Pr(A, Γ) > .5, then it is
reasonable, given Γ, to act as if A is the case. (Of course the reasonableness of this
rule is based on the assumption of comparable, modest potential gains and losses.
We show by a simple example below the need to consider the magnitude of gains
and losses when choosing how to act.) We assume the language contains predicates,
individual constants, and the standard classical logical particles but no quantifiers.
Our experience is always finite, so our background information Σ concerns only a
finite number of individuals:
I(Σ) = {a1 , . . . , an }
In addition to Σ, we are given information Γ(b) about some new individual b, and
we would like to know whether or not to conclude E(b); both Γ and E may be very
complex and may contain information about individuals other than b.
Of course the very first thing to check is whether Σ ∪ Γ(b) ⊢ E(b). If E(b) is
just a deductive consequence of our background information, then we are done.
For computational purposes, we could use any standard theorem checker for the
deductive logic in question, perhaps with resource limitation cut-off to ensure termination. In a similar way, we may check to see if Σ ∪ Γ(b) ⊢ ∼E(b), for if so, our
problem is solved.
If neither E(b) nor ∼E(b) is a deductive consequence (within whatever resource
limits we have imposed), we need to consider whether or not either is an inductive
consequence. For the purposes of this second problem, we may act as if I(Σ) constitutes the entire universe. We then find the set of objects Γ(I(Σ)) in this universe
which, according to the information in Σ, have all of the properties listed in Γ:
Γ(I(Σ)) = {x : x ∈ I(Σ) and Σ ⊢ B(x) for every WFF B(x) ∈ Γ(x)}
If Γ(I(Σ)) is empty, we refuse to draw the conclusion on the grounds of inadequate
information. If it is not empty, we then find the set of objects E(Γ(I(Σ))) which,
according to the information in Σ, have the property E:
E(Γ(I(Σ))) = {x : x ∈ Γ(I(Σ)) and Σ ⊢ E(x)}
Then if E(Γ(I(Σ))) contains more than 50% of the objects in Γ(I(Σ)), it would
make sense to predict that E(b). On the other hand, if E(Γ(I(Σ))) contains less than
50% of the objects in Γ(I(Σ)), it would make sense to predict that ∼E(b).
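A minimal Python sketch of the scheme; the predicates and individuals (Bird, Penguin, a1–a3) are invented for illustration, and deduction from Σ reduces here to set membership among atomic facts.

```python
# A tiny background set Sigma of atomic facts about named individuals.
SIGMA = {
    ("Bird", "a1"), ("Flies", "a1"),
    ("Bird", "a2"), ("Flies", "a2"),
    ("Bird", "a3"), ("Penguin", "a3"),   # a3 is a bird not known to fly
}
INDIVIDUALS = {"a1", "a2", "a3"}         # I(Sigma)

def predict(gamma, e):
    """Predict e(b) (True), its negation (False), or withhold judgment (None)."""
    # Gamma(I(Sigma)): known individuals having every property in gamma.
    matches = {x for x in INDIVIDUALS if all((p, x) in SIGMA for p in gamma)}
    if not matches:
        return None                      # inadequate information
    # E(Gamma(I(Sigma))): those matches known to have property e.
    positives = {x for x in matches if (e, x) in SIGMA}
    ratio = len(positives) / len(matches)
    if ratio > 0.5:
        return True
    if ratio < 0.5:
        return False
    return None                          # exactly 50%: no prediction

print(predict({"Bird"}, "Flies"))             # True: 2 of 3 known birds fly
print(predict({"Bird", "Penguin"}, "Flies"))  # False: nonmonotonic revision
```

Learning that the new individual is also a penguin shrinks the match set to the one known penguin, reversing the prediction, even though the underlying deductive test never changes.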
This simple scheme is an elementary formalization of reasoning by analogy.
It is similar to circumscription and is applicable to many of the same kinds of
problems. But our scheme is easy to implement, is computationally tractable, and
corresponds well with our intuitions. It easily handles the standard examples about
flying and birds, and similar cases as well. It does not require the introduction of
weird logical particles nor the use of dubious “default” rules.
Obviously the scheme is not monotonic. Adding information Λ about new objects not in I(Σ) may change the distributions, since I(Σ ∪ Λ) will not be the same
as I(Σ). Further, if we add more background information Δ(b) about the individual
in question, we may find Γ(I(Σ)) is different from (Γ ∪ Δ)(I(Σ)). The scheme has
obvious limitations, since it deals only with very simple, non-quantified languages.
However, we could extend the scheme to quantified languages by requiring the expansion of all quantifiers using the objects in I(Σ). (With quantifiers, it becomes
possible to make claims about the number of objects without naming them, so in
some cases difficulties may result from such a treatment of the quantifiers.) There
are many other ways to use probability in a nonmonotonic inferencing system; see
Kyburg (1994) for a good alternative.
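The quantifier expansion just mentioned is easy to sketch for a finite universe: a universal claim becomes the conjunction of its ground instances over the objects in I(Σ), and an existential claim the disjunction. Modeling predicates as boolean functions on names is an illustrative assumption, not the paper's apparatus.

```python
# Quantifier expansion over a finite universe: "(x)P(x)" is evaluated
# as the conjunction of its instances, "(Ex)P(x)" as the disjunction.

def forall(pred, universe):
    """(x)P(x): P holds of every named object in the universe."""
    return all(pred(a) for a in universe)

def exists(pred, universe):
    """(Ex)P(x): P holds of at least one named object."""
    return any(pred(a) for a in universe)

flies = {"robin", "crow"}                        # objects known to fly
universe = ["robin", "crow", "tweety"]
print(forall(lambda a: a in flies, universe))    # False: tweety is a counterexample
print(exists(lambda a: a in flies, universe))    # True
```

As the parenthetical warning notes, this treatment makes every quantified claim hostage to the current list of named objects, which is exactly where difficulties with numerical claims can arise.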
As a final example of a nonmonotonic system, we will consider a technique
which is similar to so-called default logic. The technique has nothing directly to do
with probability theory, although it is actually derived from the method presented
in Morgan (1996) for constructing canonical probability distributions. The intuitive
picture behind the technique is that part of common sense is the possession of an
ordered sequence of hypotheses, each of which is a potential background assump-
tion which may be used to fill in holes in our knowledge. When we are given
some set of premises, we augment those premises by trying to consistently add the
hypotheses, in order, to the premises. Any hypothesis that cannot be consistently
added is just abandoned. Common sense conclusions are then just the deductive
conclusions of this expanded premise set.
For a slightly more formal account, we assume to be given some deductive
consequence relation “⊢” and a corresponding notion of consistency. Let S be
any arbitrary (preference) ordered set of WFFs, which will be used as potential
background assumptions:
S = {H1, H2, . . . }
Γ0 = Γ
Γi+1 = Γi ∪ {Hi+1}, if Γi ∪ {Hi+1} is consistent
Γi+1 = Γi, otherwise
Γ(S) = ⋃i Γi
There is no requirement that the set S be consistent. The members of S are con-
sidered to be alternatives. Thus even if S is itself inconsistent, it may well be that
individual members of S are consistent with the premise set. As long as the premise set Γ is consistent, the expanded premise set Γ(S) will be consistent, no matter what S contains. We use the expanded premise set to determine the (inductive) S-consequences of Γ, as follows:
Γ ⊢S E iff Γ(S) ⊢ E
Clearly even for a fixed sequence S, the S-consequence relation will be nonmonotonic. If we add information Δ to the set Γ, we may well find that different hypotheses from the sequence S get put into the expanded premise set. In general, we will not have Γ(S) ⊆ (Γ ∪ Δ)(S).
For a simple, but illustrative example, we will use the following symbols for the
corresponding English phrases:
Px: x is a penguin
Bx: x is a bird
Ax: x is an animal
Fx: x is capable of flight
t: Tweety
For the example, we will use the following sequence of ordered background as-
sumptions:
S = {(x)((Px ∧ Bx ∧ Ax) ⊃ ∼Fx), (x)((Bx ∧ Ax) ⊃ Fx), (x)(Ax ⊃ ∼Fx)}
Using the definitions above, we have the following expanded premise sets:
{At}(S) = {At} ∪ S
{At, Bt}(S) = {At, Bt} ∪ {(x)((Px ∧ Bx ∧ Ax) ⊃ ∼Fx), (x)((Bx ∧ Ax) ⊃ Fx)}
{At, Bt, Pt}(S) = {At, Bt, Pt} ∪ {(x)((Px ∧ Bx ∧ Ax) ⊃ ∼Fx), (x)(Ax ⊃ ∼Fx)}
When told only that Tweety is an animal, we conclude that Tweety cannot fly, since {At}(S) ⊢ ∼Ft. Similarly, when told that Tweety is an animal but also a bird, we conclude that Tweety can fly, since {At, Bt}(S) ⊢ Ft. And finally, when told that Tweety is an animal and a bird but also a penguin, we conclude that Tweety cannot fly, since {At, Bt, Pt}(S) ⊢ ∼Ft.
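The whole construction, from Γ(S) down to the three Tweety verdicts, can be checked mechanically in a propositional setting, where consistency is decidable by brute-force truth tables. The encoding below (formulas as functions on valuations, four atoms) is a simplifying sketch, not the paper's apparatus.

```python
from itertools import product

# Propositional sketch of the hypothesis-augmentation scheme. A
# formula is a function from a valuation (dict atom -> bool) to bool.
ATOMS = ["Pt", "Bt", "At", "Ft"]

def atom(a):
    return lambda v: v[a]

def neg(f):
    return lambda v: not f(v)

def conj(*fs):
    return lambda v: all(f(v) for f in fs)

def implies(f, g):
    return lambda v: (not f(v)) or g(v)

def consistent(formulas):
    """True iff some valuation of ATOMS satisfies every formula."""
    return any(all(f(dict(zip(ATOMS, bits))) for f in formulas)
               for bits in product([True, False], repeat=len(ATOMS)))

def entails(formulas, goal):
    """Gamma |- goal iff Gamma plus the negation of goal is unsatisfiable."""
    return not consistent(list(formulas) + [neg(goal)])

def expand(gamma, hypotheses):
    """Gamma(S): add each hypothesis, in order, when consistently possible."""
    result = list(gamma)
    for h in hypotheses:
        if consistent(result + [h]):
            result.append(h)
    return result

Pt, Bt, At, Ft = map(atom, ATOMS)
S = [implies(conj(Pt, Bt, At), neg(Ft)),   # penguin birds cannot fly
     implies(conj(Bt, At), Ft),            # birds can fly
     implies(At, neg(Ft))]                 # animals cannot fly

print(entails(expand([At], S), neg(Ft)))          # True
print(entails(expand([At, Bt], S), Ft))           # True
print(entails(expand([At, Bt, Pt], S), neg(Ft)))  # True
```

Because expand recomputes which hypotheses survive against each premise set, adding Bt or Pt silently changes Γ(S); that recomputation is precisely the source of the nonmonotonicity.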
This scheme makes use of a deductive consequence relation and the related notion of consistency, both of which may be computationally problematic, especially for a first-order language. But these notions are problematic for humans as well.
10. Conclusions
We have given four formal proofs that a rational nonmonotonic logic is impossible.
In the first proof, we showed that it is impossible to formulate a nonmonotonic
consequence relation that corresponds to any characterization of unconditioned
rational belief structures. In the second proof, we showed that it is impossible to
formulate a nonmonotonic consequence relation that corresponds to any charac-
terization of conditional rational belief structures. In our third proof, we showed
that no matter how complex the formal language, no object-language construct
(e.g., a nonmonotonic conditional) can reflect elementary principles of relative fre-
quency. And in our fourth proof, we showed that given standard accepted patterns
of reasoning, there is a fundamental logical incompatibility between any nonmono-
tonic consequence relation and any object-language nonmonotonic conditional;
any attempt to reflect one by the other forces both to be monotonic.
Careful attention to the proofs suggests that nonmonotonicity is a function of
the way we use logic, rather than being a function of the logic we use. We sug-
gested a very elementary semantic account, the presupposed models account, of
non-demonstrative reasoning. Using this account, we were able to explain how
nonmonotonic reasoning arises from the application of standard monotonic logic to
situations in which additional acquired background information leads us to change
our presupposed world model. We then discussed three examples of nonmonotonic
reasoning systems based on standard monotonic logic.
We certainly do not want to claim that our three examples of nonmonotonic
reasoning systems exhaust the possibilities. Our primary purpose was to demon-
strate the variety of systems which can be constructed without resorting to weird
logics. The simple presupposed models picture of nonmonotonic reasoning is very
general and deserves to be more widely investigated.
A secondary goal was to illustrate how nonmonotonic reasoning systems de-
pend on empirical assumptions. Recall that in earlier sections, we proved that no
nonmonotonic proof system could cohere with universal principles of rational be-
lief; every nonmonotonic proof system logically must violate elementary principles
of relative frequency. Semantically speaking then, nonmonotonic reasoning must
violate soundness with respect to rational belief structures. So specific examples
fact that Tweety is a bird and a penguin. So the total evidence requirement has to
do with what we know, accept, or believe.
The following is a more reasonable characterization of statistical syllogism:
For “belief” here, I do not have anything fancy in mind, just good old ordinary
reasonable belief. Certainly if any of the premises changes in a fundamental way,
we would not want to insist on the same conclusion. As we have just phrased it, though, this pattern is part of a higher-order theory about rational belief. But if
we are going to use statistical syllogism to update a data base, we might phrase it
as follows:
SG.3.1 P(A, Γ) > P(∼A, Γ)
SG.3.2 Γ is the total content of the data base.
−−−−−−−−−−−−−
SG.3.3 A should be added to the data base.
Now, if we add additional information Δ to the data base, we have falsified SG.3.2,
and thereby undercut the reason for adding A to the data base. Hence, we may have
to remove A on the basis of subsequent information. (This example well illustrates
the need for the currently common practice of distinguishing between information
given “externally” and information which results from “internal” computations
based on the external material.) But this inference pattern is no different in principle
from observing that: (1) if you give me p ⊃ q and p, then I am entitled to conclude q; but (2) if you subsequently retract p, I can no longer conclude q from p ⊃ q alone.
In other words, this rule looks suspiciously like the presupposed models picture of
nonmonotonic inference!
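The SG.3 pattern is easy to simulate: if the derived conclusions are recomputed from the entire data base after every change, then adding information automatically retracts A whenever SG.3.1 fails against the enlarged base. In this toy sketch, relative frequency over stored cases stands in for P(A, Γ); all names are illustrative.

```python
# Toy sketch of the SG.3 update pattern: conclusions are recomputed
# against the *current* total content of the data base (SG.3.2), so
# new information can silently retract an earlier addition.

def conclusions(cases, target):
    """Add target iff it holds in a strict majority of stored cases
    (standing in for P(A, Gamma) > P(~A, Gamma), i.e. SG.3.1)."""
    yes = sum(1 for c in cases if target in c)
    return {target} if yes > len(cases) - yes else set()

base = [{"flies"}, {"flies"}, set()]      # 2 of 3 observed cases fly
print(conclusions(base, "flies"))         # {'flies'}: A is added
base += [set(), set()]                    # additional information arrives
print(conclusions(base, "flies"))         # set(): A is retracted
```

Nothing here is a nonmonotonic logic: the majority test is the same on every call, and only the data base it is applied to changes.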
There remains one aspect of nonmonotonic reasoning which I would like to
stress. In many actual cases, the nonmonotonicity of our reasoning is a function
(at least in part) of our perceptions concerning the utility of acting as if various
conclusions were true. As the perceived utilities change, the conclusions drawn
from the same premises may change as well. As a simple example, suppose we are
told only that Tweety is a bird, and we are asked whether or not Tweety can fly.
Consider the following two “pay-off” regimes:
Clearly under regime 1 the rational person ought to answer “yes”, since there is a
potential gain for that answer and no potential gain for answering “no”. However,
under regime 2, the rational person ought to answer “no” because an answer of
“yes” exposes one to a potential loss and no gain, while the “no” answer does not
expose one to any loss. So in this example, the utilities completely outweigh any
statistical or other considerations.
The presupposed models view has been around for some time. See Daniels and
Freeman (1980) for the details of a formal analysis of subjunctive conditionals
based on this idea as well as for older references to the literature. Quite independ-
ently of our work, Poole (1988) showed that default logics could be seen as a
scheme for revising a presupposed world model, and that nonmonotonicity results
from using standard logic to reason from the revised model. He rejected as unne-
cessary the formulation of new proof theories with associated formal semantics.
He argued for the thesis that “. . . the problem of default reasoning is not a problem
with logic, but with how logic is used.” (Poole (1988), p. 45) We claim that this
conclusion holds for all nonmonotonic reasoning, not just for default reasoning. We
have here presented four proofs that it is impossible to formulate a nonmonotonic
consequence relation with a semantics that corresponds to elementary principles
of rational belief. Briefly put, we have shown that nonmonotonic logic must be
irrational. The “presupposed models with updating” view seems to be a common
theme in much of the current research into nonmonotonic reasoning; hopefully we
have shed some light into why this should be the case.
Acknowledgement
Over the years I have benefited greatly from discussions of this material with many
people, especially with those who disagree strongly with my views. In particular I
would like to thank Romas Aleliunas, Jim Delgrande, David Etherington, Robert
Hadley, Henry Kyburg, and Don Nute.
References
Besnard, P., and Siegel, P. (1988), ‘The Preferential-models Approach to Nonmonotonic Logics’, in
Smets et al., Non-standard Logics for Automated Reasoning, Academic Press, pp. 137–161.
Besnard, P. (1989), An Introduction to Default Logic. Berlin: Springer-Verlag.
Brewka, G., Dix, J., and Konolige, K. (1997), Nonmonotonic Reasoning: An Overview, Stanford:
Center for the Study of Language and Information.
Carnap, R. (1962), Logical Foundations of Probability, 2nd ed, Chicago: University of Chicago Press.
Daniels, C., and Freeman, J. (1980), ‘An Analysis of the Subjunctive Conditional’, Notre Dame
Journal of Formal Logic 21, pp. 639–655.
Delgrande, J. (1988), ‘An Approach to Default Reasoning Based on a First-order Conditional Logic’,
Artificial Intelligence 36, pp. 63–90.
Kraus, S., Lehmann, D., and Magidor, M. (1990), ‘Nonmonotonic Reasoning, Preferential Models
and Cumulative Logics’, Artificial Intelligence 44, pp. 167–207.
Kyburg, H. (1994), ‘Believing on the Basis of the Evidence’, Computational Intelligence 10, pp.
3–20.
Lewis, D. (1973), Counterfactuals, Oxford: Basil Blackwell.
Morgan, C. and Mares, E. (1995), ‘Conditionals, Probability, and Non-triviality’, Journal of
Philosophical Logic 24, pp. 455–467.
Morgan, C. (1991), ‘Logic, Probability Theory, and Artificial Intelligence – Part 1: the Probabilistic
Foundations of Logic’, Computational Intelligence 7, pp. 94–109.
Morgan, C. (1994), ‘Evidence, Belief, and Inference’, Computational Intelligence 10, pp. 79–84.
Morgan, C. (1996), ‘Canonical Models and Probabilistic Semantics’, presented at the Annual Meeting
of the Society for Exact Philosophy, held at East Tennessee State University, forthcoming in
Logic, Probability and Science, N. Shanks et al., eds., in Poznan Studies in Logic.
Morgan, C. (1997), ‘Conditionals, Comparative Probability, and Triviality’, presented at the Annual
Meeting of the Society for Exact Philosophy, held at McGill University, forthcoming in Topoi.
Morgan, C. (1998), ‘Non-monotonic Logic is Impossible’, Canadian Artificial Intelligence 42, pp. 19–25.
Poole, D. (1988), ‘A Logical Framework for Default Reasoning’, Artificial Intelligence 36, pp. 27–47.
Reiter, R. (1980), ‘A Logic for Default Reasoning’, Artificial Intelligence 13, pp. 81–132.
Smets, P. et al., eds. (1988), Non-Standard Logics for Automated Reasoning, London: Academic
Press.
Turner, R. (1984), Logics for Artificial Intelligence, West Sussex: Ellis Horwood Limited.
Van Fraassen, B. (1981), ‘Probabilistic Semantics Objectified: 1. Postulates and Logics’, Journal of
Philosophical Logic 10, pp. 371–394.