
Fundamentals of Mathematics

Sets, Logic and Relations

Vasco Brattka

Cape Town
February 11, 2011

Picture of the Sierpiński Pyramid on the front page is taken from:
http://en.wikipedia.org/wiki/File:Sierpinski_pyramid.png
It is under GNU Free Documentation License, Version 1.2 and
Creative Commons Attribution ShareAlike 3.0 Licence
For the written notes
© 2010 Vasco Brattka

All rights reserved.
Version of February 11, 2011

Contents

1 Mathematics
1.1 What is Mathematics about?
1.2 What are Proofs?
1.3 Indirect Proofs and the Principle of Excluded Middle

2 Sets
2.1 What is a Set?
2.2 Explicit Definitions of Sets
2.3 Subsets and Comprehension
2.4 Russell's Paradox
2.5 Union and Intersection of Sets
2.6 Difference and Complement of Sets
2.7 Union and Intersection of Indexed Families of Sets
2.8 Power Sets
2.9 Product of Sets
2.10 Disjoint Union of Sets

3 Logic
3.1 What is Logic?
3.2 Propositional Logic
3.3 First-Order Logic
3.4 Correspondence Between Logic and Set Theory

4 Relations and Functions
4.1 What are Relations?
4.2 Composition and Inverse Relations
4.3 Functions
4.4 Injections, Surjections and Bijections
4.5 Families, Sequences and Restrictions
4.6 Images and Preimages
4.7 Set of Functions
4.8 The Axiom of Choice
4.9 Infinite Products

5 Cardinality
5.1 What is the Cardinality of a Set?
5.2 The Theorem of Schröder-Bernstein
5.3 Cantor's Diagonalization Method
5.4 The Continuum Hypothesis
5.5 Cantor's Pairing Function
5.6 Induction Principle on Natural Numbers
5.7 Finite and Countable Sets
5.8 Dedekind Infinite Sets
5.9 Cardinality and Set Constructions

6 Order
6.1 What is Order?
6.2 Reflexivity, Symmetry and Transitivity
6.3 Equivalence Relations
6.4 Preorders, Partial Orders and Linear Orders
6.5 Monoids
6.6 Maximum and Minimum
6.7 Supremum and Infimum

Axiomatic Set Theory
Mathematicians
Greek Alphabet
Mathematical Symbols
Index

CHAPTER 1

Mathematics
In my own experience, mathematics in general and pure mathematics
in particular has always seemed like secret gardens, special places where
I could grow exotic and beautiful theories. You need a key to get in, a key
that you earn by letting mathematical structures turn in your head until they
are as real as the room you are sitting in.
David Mumford (Fields Medalist, Brown University)

1.1 What is Mathematics about?

What is mathematics about? It is difficult to give an exact answer to this question.
Perhaps the best approach is to look at the actual practice of mathematics and at
the areas in which mathematicians actually work.
However, even this is not so easy to undertake. The Mathematical Reviews
database of the American Mathematical Society (AMS) contains references to more
than 2 million articles produced by many thousands of authors, and currently about
one hundred thousand articles are added each year, which means that on average
more than 270 mathematical articles are published per day.¹ The articles in this
database are classified according to the Mathematics Subject Classification, and this
classification alone is almost 50 pages long. Today an active mathematician is usually
an expert in just one small subfield of some of these categories and has a
rough idea about some of the others. Nowadays, the sheer volume of the body of
knowledge in mathematics is so enormous that no single human being can oversee
all of it. Following the exposition in [5], one can subsume most of mathematics under
the following main areas.

¹ This database can be found at http://www.ams.org/mathscinet/


Main areas of mathematics


1. Algebra
2. Number Theory
3. Geometry
4. Algebraic Geometry
5. Analysis
6. Logic
7. Combinatorics
8. Theoretical Computer Science
9. Probability
10. Mathematical Physics

Of course, this classification is a simplified one and many mathematicians will
wonder where their particular area can be found here. In view of the topics and
numbers mentioned above, we can attempt a vague, encyclopaedia-like definition
of what mathematics is:
Mathematics is the science of structure, quantity, change and space and
the interactions between them. While mathematical ideas can be inspired
by everyday observations, it is a characteristic feature of mathematical
truth that it is derived with logical reasoning on the basis of sound definitions. Mathematics dates back to ancient times, but has undergone
some of its most dramatic advances in the modern era. Nowadays it
can be considered as one of the most successful collective human endeavours. Each day mathematicians all over the world prove hundreds of
new theorems and solve numerous open problems and in this way they
contribute to the systematic body of knowledge that comprises modern
mathematics.
Besides the question of what mathematics is, this description also addresses the
question of how mathematics approaches its subject. That is, besides the content
there is an activity that characterizes what mathematics is, and the main tool of
this activity is the proof. Mathematics is developed by rigorous reasoning about
precise definitions, and the results of this reasoning are presented in the form of
theorems. The correctness of a theorem is usually witnessed by a proof. In the next
section we will deal with the question of what a proof is.
The results of mathematical work come under different titles, and we give
the reader a short glossary of the terminology.
1. Theorem: This usually stands for some major result that might itself be based
on several other auxiliary results. In a mathematical article often just a handful
or even only one result comes with the title of a theorem and that is then
usually the main result. Sometimes a theorem can also have a very simple
proof.

2. Corollary: A corollary is usually a direct conclusion made from results that
have been presented before. It usually does not come with a separate proof.
3. Proposition: A proposition is usually a result that is considered interesting
in itself and worth spelling out separately, although it might have a relatively
simple proof and is not necessarily a major achievement.
4. Lemma: A lemma is typically an auxiliary result that is used to prove some
other theorem. It is spelled out separately because this structures the entire
proof and usually makes it easier to understand. Sometimes lemmas are so
useful that they become very well known, perhaps even better known than the
theorems originally derived from them.
The above terminology is not entirely clear-cut and the boundaries between these
different terms are fuzzy. Different authors also adopt different habits in using these
terms, and the above is just meant as a rough guideline.

1.2 What are Proofs?


If Gauss says he has proved something, it seems very probable to me; if Cauchy says
so, it is about as likely as not; if Dirichlet says so, it is certain.
Carl Gustav Jacob Jacobi (1804-1851)

Now, what exactly is this activity of mathematicians about? What is a mathematical
proof? Usually, a proof is considered to be a text that convinces the reader of a
certain result by means of rigorous logical reasoning about the underlying definitions
and concepts. But what is rigorous logical reasoning? The truth is that we cannot
present a proper definition of rigorous reasoning and that mathematics is learned
by doing. This is a bit like learning to ride a bicycle. It is very hard to describe in
words what you have to do, but somebody will show you how to do it and eventually
you will manage not to fall. Basically, everybody can learn how to reason logically and
rigorously in the mathematical sense, but it requires some years of practice under
the guidance of other mathematicians to achieve some mastery in this discipline.
So, let us start right away and look at some proofs.
We recall that the natural numbers are exactly the numbers $0, 1, 2, 3, \ldots$. We
write
\[ \mathbb{N} = \{0, 1, 2, 3, \ldots\} \]
for the set of natural numbers. Strictly speaking, this is not a good definition of
$\mathbb{N}$, since it leaves the dots "$\ldots$" open to interpretation. However, we assume that
the reader has some intuitive understanding of the concept of natural numbers and
hence the definition above is clear enough. For the professional mathematician, the
most important information in this definition is that $0$ is considered a natural
number. Some authors start with $1$ here, but throughout this text we will
consider $0$ as a natural number as well.
Now, among the natural numbers we single out the prime numbers as an interesting
subset. We recall the definition.
Definition 1.1 (Prime numbers) A natural number $p \geq 2$ is called a prime number
if it has no natural number as divisor other than $1$ and $p$ itself. By
\[ \mathbb{P} = \{p \in \mathbb{N} : p \text{ is a prime number}\} \]
we denote the set of all prime numbers.
An easy calculation shows that the first few prime numbers are
2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, ...
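The "easy calculation" can be reproduced by checking Definition 1.1 literally. The following Python sketch is only an illustration: the function name `is_prime` and the bound 100 are choices made here and do not come from the notes.

```python
def is_prime(p: int) -> bool:
    """Check Definition 1.1 literally: p >= 2 and no divisor other than 1 and p."""
    if p < 2:
        return False
    # Any divisor d with 1 < d < p would violate the definition.
    return all(p % d != 0 for d in range(2, p))


# All prime numbers below 100, in increasing order.
primes_below_100 = [p for p in range(100) if is_prime(p)]
print(primes_below_100)
```

The test by trial division is deliberately naive; it mirrors the wording of the definition rather than aiming at efficiency.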
One obvious question is whether this sequence of numbers eventually ends or whether
it is infinite. Euclid already proved more than 2000 years ago that the set of prime
numbers is infinite. His proof is basically still the same one that we use nowadays. Let
us formulate our first theorem and proof to illustrate the mathematical activity.
Theorem 1.2 (Euclid 300 BC) The set $\mathbb{P}$ of prime numbers is not finite.
Proof. Let us assume the contrary, i.e. suppose that there are only finitely many
prime numbers, namely exactly $p_1, \ldots, p_n$ for some $n \in \mathbb{N}$. We know that there
are prime numbers such as $2$ and hence $n \geq 1$. That is, the finite set
$\mathbb{P} = \{p_1, \ldots, p_n\}$ is the set of all prime numbers and it is not empty. Now
consider the product of all these numbers plus one:
\[ k = p_1 p_2 \cdots p_n + 1. \]
This number $k > 1$ has a prime divisor $p$ and hence $p \in \mathbb{P}$. Then $p$ divides the
product $p_1 p_2 \cdots p_n$ and the number $k$ and hence it also divides
$1 = k - p_1 p_2 \cdots p_n$, which is impossible since $p \geq 2$. This means that we have
a contradiction and our assumption was wrong. Thus, the set $\mathbb{P}$ of prime numbers
cannot be finite. $\Box$
The little box $\Box$ indicates the position in the text where the proof ends. Some
authors use other symbols for this purpose or they write "q.e.d." (which stands for
the Latin phrase "quod erat demonstrandum", i.e. "which was to be demonstrated").
This version of the proof can be found in many textbooks and it is considered as
a logically rigorous example of a proof and as a starting point of number theory.
Despite this fact, the proof raises a number of questions:
1. What does it exactly mean that a set is finite or not finite?
2. What is a set at all?
3. We have proved that the assumption that there are only finitely many prime
numbers leads to a contradiction. Is it admissible to conclude in this indirect
way that there are infinitely many prime numbers?

4. And is this really an indirect proof?
5. What is a proof at all?
6. Why must the number $k$ in the proof have a prime divisor?
7. What does it mean precisely that one number divides another number?
All these questions are legitimate, and the first five really touch on some core
topics of this course. We will try to answer these questions step by step during
this course. The last two questions are rather specific technical questions. They
do indeed address a gap that we have left in the proof, and we will very soon
close this gap and answer them.
However, let us step back for a moment and let us analyse what this experience
tells us about the mathematical concept of a proof:
1. The question of whether a proof is rigorous enough is a context-dependent
question. It depends on what the reader is supposed to know, on the relevant
background and the development of the subject, and hence also on the level of
advancement of the presentation.
2. In a mathematics course, the right level of rigour has to be negotiated between
students and lecturers, and even during the course it might depend on how far
the course has advanced. At some stage, a certain type of argument needs to be
practised in detail and lecturers will expect students to flesh out every little
detail of the required argument. At a later stage it is taken for granted that this
type of technique has been mastered and the corresponding claim just needs to
be mentioned without proof.
What this means is that there is no mathematically well-defined concept of "rigorous
enough"; rather, this is a matter that needs to be resolved by interaction within the
relevant community (this could be a class or, for instance, the group of all experts
in a particular field).
Now let us try to close some gaps that we have left in the above proof of Euclid's
theorem on the infinity of primes. Firstly, we have to define precisely what divisibility
means. That one natural number $a$ divides another natural number $b$, in symbols
$a \mid b$, means that there exists a natural number $d$ such that $b = ad$. In
mathematical terms this is often written as follows.
Definition 1.3 (Divisibility) Let $a$ and $b$ be natural numbers. We define
\[ a \mid b \;:\Longleftrightarrow\; (\exists d \in \mathbb{N})\; b = ad \]
and if $a \mid b$ holds, then we say that $a$ divides $b$, that $a$ is a divisor or factor of $b$,
and that $b$ is a multiple of $a$.
In Definition 1.3, $\mid$ is read as "divides", $:\Longleftrightarrow$ is read as "is defined to be
equivalent to" and $\exists$ is read as "there exists". We summarize all the symbols that
we use in the course of this text in an appendix. A prime divisor is a divisor that is
a prime number. Now we can formulate the following lemma that closes the most
essential gap in the proof of Theorem 1.2. It shows why the number $k$ in the proof
has to have a prime divisor.
Lemma 1.4 Any natural number $n > 1$ has a prime divisor $p \in \mathbb{P}$.
Proof. Let $n > 1$ be a natural number. Let us consider the set
\[ D = \{d \in \mathbb{N} : d > 1 \text{ and } d \text{ is a divisor of } n\} \]
of all natural numbers $d > 1$ that divide $n$. This set is certainly not empty since
$n \in D$ and hence there is a minimal number $m \in D$ and we write $m = \min(D)$. If
this $m$ is a prime number, then we have found the desired prime divisor of $n$. If $m$
is not a prime number, then it has a natural number $d$ as divisor other than $1$ and
$m$ itself. This $d$ divides $m$ and hence also $n$, i.e. $d \in D$ and $d < m = \min(D)$. But
this is a contradiction and hence this second case cannot occur. $\Box$
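The argument of the proof can be traced on concrete numbers: compute $m = \min(D)$ and observe that it is prime. The following Python sketch is an illustration only; the function names are hypothetical, and the loop at the end is an empirical check for small $n$, not a substitute for the proof.

```python
def smallest_divisor_greater_than_one(n: int) -> int:
    """Return m = min(D), where D = {d in N : d > 1 and d divides n}."""
    if n <= 1:
        raise ValueError("n must be a natural number greater than 1")
    # D is non-empty because n itself belongs to D, so the search terminates.
    return next(d for d in range(2, n + 1) if n % d == 0)


def is_prime(p: int) -> bool:
    """Definition 1.1: p >= 2 with no divisor other than 1 and p."""
    return p >= 2 and all(p % d != 0 for d in range(2, p))


# Empirical check of Lemma 1.4 for small n: min(D) is always a prime number.
for n in range(2, 500):
    assert is_prime(smallest_divisor_greater_than_one(n))
```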
Note that we have proved more than claimed: we have even proved that for each
natural number $n > 1$ the smallest divisor $d > 1$ is a prime number. Once again
one could complain that this proof is not rigorous enough. For instance, we did not
prove that if $d$ divides $m$ and $m$ divides $n$, then $d$ also divides $n$. This property
is called transitivity of the divisor relation and we will discuss it in the exercises.
Besides this question and the questions raised above, this second proof
provokes a number of further questions:
1. Are we allowed to form arbitrary sets, or why can we just build a set like $D$?
2. What exactly is a minimum and why does $D$ have such a minimum?
We will address all these questions more carefully in this course. Below we
formulate a number of exercises that continue our little excursion into number theory.
Here we end with a big open problem.
Conjecture 1.5 (Twin primes) There are infinitely many twin primes, i.e. there
are infinitely many pairs $(p, q)$ of prime numbers $p, q \in \mathbb{P}$ such that $q - p = 2$.
Here are the first few twin primes:
\[ (3, 5), (5, 7), (11, 13), (17, 19), (29, 31), (41, 43), (59, 61), (71, 73), (101, 103) \]
It has long been conjectured that there are infinitely many such twin primes, but
until today nobody has managed to prove this conjecture. In recent years there has
been some partial progress on this matter, but until today (February 2010)
there is no final solution to this problem.
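The list of twin primes given above can be recomputed by a direct search. The following Python sketch is a minimal illustration; the function name `twin_primes_up_to` and the bound 103 are choices made here. Of course, such a finite search says nothing about whether the conjecture itself holds.

```python
def is_prime(p: int) -> bool:
    """Definition 1.1: p >= 2 with no divisor other than 1 and p."""
    return p >= 2 and all(p % d != 0 for d in range(2, p))


def twin_primes_up_to(bound: int) -> list:
    """All pairs (p, q) of prime numbers with q - p = 2 and q <= bound."""
    return [(p, p + 2) for p in range(2, bound - 1)
            if is_prime(p) and is_prime(p + 2)]


print(twin_primes_up_to(103))
```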
So, after the discussion of a proof that is more than two thousand years old, we see
a very similar property as a conjecture that is still unsolved. Hence, the impression
that mathematics is a completed body of results is as wrong as it can be. Every
solution of a mathematical problem brings with it further new questions that await
a solution, and in some cases it can take a long time and require enormous effort
to find one.

Problems
1.1 Prove that the following properties of the divisor relation hold true for all natural
numbers $a, b, d, n, m$:
1. $n \mid n$ (reflexivity)
2. $d \mid n$ and $n \mid m$ implies $d \mid m$ (transitivity)
3. $d \mid n$ and $d \mid m$ implies $d \mid (an + bm)$ (linearity)
4. $ad \mid an$ and $a \neq 0$ implies $d \mid n$ (cancellation)
5. $1 \mid n$ (1 divides every natural number)
6. $n \mid 0$ (every natural number divides 0)
7. $0 \mid n$ implies $n = 0$ (0 only divides itself)
8. $d \mid n$ and $n \neq 0$ implies $d \leq n$ (divisibility implies less or equal)
9. $d \mid n$ and $n \mid d$ implies $d = n$ (antisymmetry)
Identify the places in the proofs of this section where some of these properties have been
implicitly used without further mention.

1.3 Indirect Proofs and the Principle of Excluded Middle
So far as I can judge, Platonism of working mathematicians
is based on a feeling that important mathematical facts are
discoveries rather than inventions.
Yuri Manin (Bonn)

The proof of Euclid's result, Theorem 1.2, that the set of prime numbers is not
finite was presented as an indirect proof. An indirect proof is a proof that follows
the following logical pattern:
\[ (\neg A \Longrightarrow \bot) \Longrightarrow A. \]
Here $\bot$ stands for "false" and the symbol is also called "falsum". The symbol
$\Longrightarrow$ stands for "implies" and $\neg$ stands for "not". That is, if one uses the above
logical formula in order to prove $A$, then one first shows that the negation of $A$ implies
something false and then one concludes that this entire implication entails $A$.
Here one reads $\neg A$ as "not $A$". Why is this pattern of logical reasoning justified? It is
essentially based on the principle of excluded middle, which we formulate separately.


Principle 1.6 (Excluded Middle, Aristotle 350 BC) Any well-defined mathematical
statement $A$ is either true or false. In particular, the statement $(A \lor \neg A)$ is
true.
Here $\lor$ is read as "or" and the principle says that $(A \lor \neg A)$ is true. For most
mathematicians the principle of excluded middle is clearly true. For instance, we
believe that the Twin Prime Conjecture 1.5 is either true or false; it is just that
we do not know which alternative is correct. There are some mathematicians who
would argue that the statement of the Twin Prime Conjecture 1.5 is not clearly true
or false. This direction of mathematics is called intuitionism and was essentially
founded by Brouwer. In intuitionistic logic the formula $(A \lor \neg A)$ is not considered
as correct, since our current knowledge does not suffice to say clearly whether a
statement $A$ such as the Twin Prime Conjecture 1.5 is correct or whether its negation
$\neg A$ is correct. That is, intuitionists have an understanding of truth that is
time-dependent: something that is not found to be true today might be recognized
as true tomorrow. Most mathematicians rather follow platonism and believe
that any well-defined mathematical statement is either true or not. By the way,
we will say that a mathematical object or statement is well-defined if its definition
or specification has a clear and unambiguous mathematical interpretation that
actually leads to an object of the specified type. In the case of a statement the type
would be a truth value that we can clearly assign.
Directions of Mathematical Philosophy: Platonism versus Intuitionism
1. Platonism: Any well-defined mathematical statement $A$ is either true or false.
2. Intuitionism: For some well-defined mathematical statements $A$ proofs have
been found, i.e. those $A$ are true; others have been disproved, i.e. for them $\neg A$
is true. For some statements $A$ currently neither $A$ nor $\neg A$ is known.
For intuitionists truth entails knowledge and hence they cannot relate to the
statement that $(A \lor \neg A)$ holds in cases where neither $A$ nor $\neg A$ is known. We
adopt the platonistic mainstream philosophy of mathematics for this course and we
assume that the Principle of Excluded Middle is correct. Using this principle we can
obtain a justification for indirect proofs.
Proposition 1.7 (Indirect proof) For each well-defined mathematical statement
$A$ the reasoning $((\neg A \Longrightarrow \bot) \Longrightarrow A)$ is correct.
Proof. Let $A$ be some well-defined mathematical statement for which we can show
$(\neg A \Longrightarrow \bot)$. This means that if $A$ does not hold, then something false follows.
The Principle of Excluded Middle 1.6 tells us that either $A$ or $\neg A$ is correct. Since
something false follows from $\neg A$, the statement $\neg A$ cannot be correct, and we
have no other option but to conclude that $A$ must be correct. $\Box$
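Under the platonistic assumption that every statement has exactly one of the two truth values, the pattern of Proposition 1.7 can be verified by a two-line truth table. The following Python sketch is an illustration under exactly that assumption: statements are modelled by Python booleans and implication by the classical (material) implication. The helper names are hypothetical, and the sketch does not say anything about intuitionistic logic, where truth values are not the right model.

```python
def implies(a: bool, b: bool) -> bool:
    """Classical implication: a => b is false only if a is true and b is false."""
    return (not a) or b


FALSUM = False  # the constant "false", written as falsum in the text


def excluded_middle(a: bool) -> bool:
    """The formula (A or not A)."""
    return a or (not a)


def indirect_proof_pattern(a: bool) -> bool:
    """The formula ((not A) => falsum) => A."""
    return implies(implies(not a, FALSUM), a)


# Truth table: both formulas hold for each of the two possible truth values of A.
for a in (True, False):
    print(a, excluded_middle(a), indirect_proof_pattern(a))
```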
We formulate the indirect proof method (also called proof by contradiction or
reductio ad absurdum) as a method again.



Proof Method (Proof by Contradiction)
If we want to prove $A$, then it is sufficient to prove that $\neg A \Longrightarrow \bot$ holds, since this
implies $A$ by the Principle of Excluded Middle.
We have seen that indirect proofs are essentially based on the Principle of Excluded
Middle 1.6 and our platonistic mindset. Despite this platonistic mindset,
many theorems in mathematics are constructive, so they can also be proved without
using the Principle of Excluded Middle. One of the first prominent examples of a result
whose proof relied on the Principle of Excluded Middle is Hilbert's Basis Theorem,
which Hilbert proved indirectly. His proof provoked significant controversies and
doubts about the justification of the indirect method. However, indirect proofs are
sometimes much shorter and more elegant than direct proofs. Often, proofs without
the Principle of Excluded Middle are significantly harder, but today we know that
it is worth investing this additional effort. The benefit is that proofs done in
intuitionistic logic, i.e. without the Principle of Excluded Middle, can (more or less
automatically) be translated into computer programmes. That is, one can extract
programmes from intuitionistic proofs and this is not possible, in general, for
non-intuitionistic proofs. From this perspective, intuitionism has found a very pragmatic
justification and is, as a technique rather than as a philosophy, successfully used for
this and other purposes also by platonists.

Problems
1.2 Revisit the proof of Euclid's Theorem 1.2 and show that essentially the same proof with
small modifications proves the following statement:
For any given finite number $p_1, \ldots, p_n \in \mathbb{P}$ of prime numbers with $n \geq 1$ there
exists a prime number $p \in \mathbb{P}$ that is not among the numbers $p_1, \ldots, p_n$, i.e. such
that $p \notin \{p_1, \ldots, p_n\}$.
Show that this proof is easily arranged such that it does not use any indirect reasoning!
In fact, this shows that Euclid's Theorem is a constructive theorem and the proof actually
contains an algorithm for computing a further prime number $p$, given any finite number
$p_1, \ldots, p_n$ of prime numbers.
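One possible reading of the algorithm alluded to in Problem 1.2 is sketched below in Python: form $k = p_1 p_2 \cdots p_n + 1$ as in the proof of Theorem 1.2 and take the smallest divisor $d > 1$ of $k$, which is prime by Lemma 1.4. The function names are hypothetical and the sketch assumes that the input list really consists of prime numbers; it is an illustration, not the solution intended by the notes.

```python
from math import prod


def smallest_prime_divisor(k: int) -> int:
    """Smallest divisor d > 1 of k; by Lemma 1.4 this divisor is a prime number."""
    if k <= 1:
        raise ValueError("k must be a natural number greater than 1")
    return next(d for d in range(2, k + 1) if k % d == 0)


def further_prime(primes: list) -> int:
    """Given a non-empty finite list of primes, return a prime not in the list."""
    if not primes:
        raise ValueError("at least one prime number is required (n >= 1)")
    k = prod(primes) + 1  # Euclid's number k = p1 * p2 * ... * pn + 1
    # A prime from the list dividing k would also divide k - p1 * ... * pn = 1,
    # so the prime divisor found here cannot be among the given primes.
    return smallest_prime_divisor(k)


# Here k = 2 * 3 * 5 * 7 * 11 * 13 + 1 = 30031 = 59 * 509, so k itself is not prime.
print(further_prime([2, 3, 5, 7, 11, 13]))
```

The example shows that $k$ need not be prime itself; the construction only guarantees that every prime divisor of $k$ is new.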

Bibliographic Remarks
We close this chapter with some bibliographic remarks on useful books. There exists a
huge number of textbooks which can be used together with this course. Most of them
complement the course in one way or another. We just mention a few of them.
[1] Martin Aigner and Günter M. Ziegler, Proofs from THE BOOK, 4th edition, Springer,
Berlin, 2009.
[2] Ethan D. Bloch, Proofs and Fundamentals: A First Course in Abstract Mathematics,
Birkhäuser, Boston, 2000.
[3] Mariana Cook, Mathematicians: An Outer View of the Inner World, Princeton
University Press, 2009.
[4] Philip J. Davis and Reuben Hersh, The Mathematical Experience, Birkhäuser, Boston,
1981.
[5] Timothy Gowers (editor), The Princeton Companion to Mathematics, Princeton
University Press, 2008.
[6] Paul R. Halmos, Naive Set Theory, Springer, New York, 1974.
[7] Kevin Houston, How to Think Like a Mathematician: A Companion to Undergraduate
Mathematics, Cambridge University Press, 2009.

The content of the textbook by Aigner and Ziegler [1] goes far beyond this course; it
is basically a collection of the gems of mathematics. It is a good companion throughout
the life of any professional mathematician, who will return to this book in order to
learn some of the most beautiful proofs in mathematics. The book by Cook [3] is not a
textbook, but a collection of more than 90 photographic portraits of mathematicians, which
gives the reader some authentic insights into what mathematicians think and feel about
their work. The book by Davis and Hersh [4] tries to disclose the nature of mathematics
and the philosophical grounds on which many mathematicians operate. It also raises
questions about the metaphysical status of truth in mathematics and dominant attitudes
of mathematicians in this respect, such as platonism, formalism and constructivism. The
companion edited by Gowers [5] is an encyclopedic introduction to all areas of mathematics.
The content goes far beyond our course, but it is one of the best available introductions
of this kind to mathematics in general. Finally, the textbook by Houston [7] is perhaps
the most useful and affordable companion for the reader of this course.


CHAPTER 2

Sets
No one shall expel us from the paradise that Cantor created for us.
David Hilbert (1862-1943)

2.1 What is a Set?
A set is a Many that allows itself to be thought of as a One.
Georg Cantor (1845-1918)

In the previous chapter we have already seen several examples of sets, among
them the set of natural numbers $\mathbb{N}$ and the set of prime numbers $\mathbb{P}$. Although
mathematics is about rigorous reasoning, we will not present a formal definition
of what a set is here. There is a more rigorous development of set theory, which is
called axiomatic set theory, but this axiomatic approach is too difficult for beginners.
The way we will develop set theory here is called naive set theory, since it is based
on an intuitive understanding of the concept of a set. In other words, although we
want to develop mathematics rigorously, we have to start from somewhere and in
this case the starting point is the naive concept of a set. However, even this naive
concept has a number of features that we can make more precise:
Informal definition of a set
1. A set $S$ is a well-defined collection of mathematical objects $x$.
2. The members $x$ of a set are called elements. If $x$ is an element of the set $S$,
then we write $x \in S$. Otherwise, we write $x \notin S$.
3. Two sets $S_1$ and $S_2$ are equal if and only if they contain exactly the same
elements. If $S_1$ and $S_2$ are equal, then we write $S_1 = S_2$. Otherwise, we write
$S_1 \neq S_2$.

There is a particular set $\emptyset$, which is called the empty set and which does not contain
any elements. Sometimes one writes $\emptyset = \{\}$. We give a few further examples of sets
that are commonly used. The sets $\mathbb{N}$ and $\mathbb{P}$ have already been mentioned and used
in the previous chapter.
Some useful sets of numbers
1. $\mathbb{N} := \{0, 1, 2, \ldots\}$, the set of natural numbers,
2. $\mathbb{N}_n := \{1, \ldots, n\}$, the set of natural numbers from $1$ to $n$ for $n \in \mathbb{N}$,
3. $\mathbb{P}$, the set of prime numbers,
4. $\mathbb{Z} := \{\ldots, -2, -1, 0, 1, 2, \ldots\}$, the set of integers,
5. $\mathbb{Q}$, the set of rational numbers,
6. $\mathbb{A}$, the set of algebraic numbers,
7. $\mathbb{R}$, the set of real numbers,
8. $\mathbb{C}$, the set of complex numbers.
We do not define the integers, rational numbers, algebraic numbers, real numbers
and complex numbers precisely here. We assume that the reader has seen these sets
of numbers before and we leave a precise treatment to a later stage. We just want
to name some commonly used sets here in order to have some examples. We close
by emphasizing two important properties that members in a set do not have. They
have neither a position nor multiple appearances.
Multiplicity and Order
1. Multiplicity of an object x in a set S is not considered. Either x is an element
of S or not. No element can have multiple instances within a set.
2. Order of elements in a set does not play any role. That is, a member of a set
has no particular position within the set.
Later we will see the concept of an indexed family, where the position of elements
plays a role and one and the same element can also appear several times in different
positions. In certain special areas of mathematics also multisets are considered,
which are sets where multiplicity occurs, although order plays no role. Such multisets
have already been considered by Dedekind, but we will not use them here.
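The difference between sets, indexed families and multisets can be made tangible with three standard Python containers. This is only an analogy chosen here for illustration: `frozenset` stands in for a finite set, `tuple` for a finite indexed family and `collections.Counter` for a finite multiset. None of these identifications comes from the notes.

```python
from collections import Counter

listing = [3, 1, 3, 2]  # one particular listing, with the element 3 repeated

as_set = frozenset(listing)     # neither order nor multiplicity is recorded
as_family = tuple(listing)      # position matters and repetitions are kept
as_multiset = Counter(listing)  # multiplicity matters, order does not

same_set = as_set == frozenset([1, 2, 3])              # order and repetition ignored
same_family = as_family == (1, 2, 3, 3)                # same entries, other positions
same_multiset = as_multiset == Counter([1, 2, 3, 3])   # same multiplicities
multiset_differs = as_multiset != Counter([1, 2, 3])   # multiplicity of 3 differs

print(len(as_set), same_set, same_family, same_multiset, multiset_differs)
```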


2.2 Explicit Definitions of Sets


I remember once going to see him [Ramanujan] when he was ill at Putney. I had
ridden in taxi cab number 1729 and remarked that the number seemed to me rather
a dull one, and that I hoped it was not an unfavourable omen. "No," he replied, "it
is a very interesting number; it is the smallest number expressible as the sum of two
cubes in two different ways."
Godfrey Harold Hardy (1877-1947)

In general, the curly brackets "{" and "}" are used to specify a particular set.
This can happen in at least two different ways, either by listing the elements
explicitly or by comprehension. Only finite sets can be specified by listing their
elements explicitly.¹ For instance
\[ \{2, 7, 2010, 4, 2\} \]
is a finite set with 4 elements. The particular listing of elements used to specify
the set names the elements in a particular order and some elements might even be
repeated in this list. Nevertheless, neither the order nor the repetition matters for
the set that is defined in this way, as pointed out before. So,
\[ \{2, 7, 2010, 4, 2\} = \{2010, 7, 4, 2\}. \]
This is simply because we agreed that two sets are equal if and only if they contain
exactly the same elements. That is, order and multiplicity are features of the list that
specifies the set, but not properties of the set itself. Also the naming of elements
can happen in very different ways. For instance, the following two singletons are
identical:
\[ \{1729\} = \{\text{the smallest number expressible as the sum of two cubes in two different ways}\}. \]
A singleton is a set with exactly one element. In this case it is not so easy to
recognize that the two sets are actually identical. This requires some knowledge
about cubes and numbers and also some agreements. For instance, there is an
implicit agreement that the number 1729 is understood as a base 10 expansion (this
is usually the case if not mentioned otherwise). The text in the set on the right-hand
side is not read as a sequence of symbols, but as a mathematical definition
of the uniquely identified number, which is the element of this set. However, such
definitions are only acceptable if they have some clear mathematical interpretation.
See Problem 2.1 for a problematic example. Sometimes even concretely specified
sets are not so easy to understand. Here is an example.
Example 2.1 Let T = {the largest pair of twin primes (p, q)}. If the Twin Prime
Conjecture 1.5 is correct, then there is no largest pair (p, q) of twin primes and
consequently $T = \emptyset$. If the Twin Prime Conjecture is false, then there is a largest
pair (p, q) of twin primes and T = {(p, q)} is the singleton with this pair as its only
member. That is, $T = \emptyset$ if and only if the Twin Prime Conjecture is correct and,
since we do not know yet whether this conjecture is correct, we do not know whether
$T = \emptyset$ or not.

[1] We only made one exception: we also defined the set of natural numbers N = {0, 1, 2, 3, ...}
by indicating an infinite list of elements.

Also note that the pair (p, q), if existent, is a single object in the set T , although
it contains itself two components. The set T above is mathematically well-defined
although we cannot say whether it is empty or not. Hence, the limitation just
concerns our current knowledge, not the mathematical well-definedness of the set T .
However, other sets are not well-defined. For instance the set
S = {the most beautiful natural number}
is not well-defined, as long as we do not provide any precise mathematical definition
for a "most beautiful natural number". Perhaps some people would argue that
S = {1729}, but as long as "beautiful" is not specified, the definition does not make
any sense. In contrast to that, we want to agree in this course that a set like
S = {the largest natural number}
is well-defined, since the property "the largest natural number" used to specify the
set has a clear mathematical interpretation. However, this set S is empty, since
there is no largest natural number.
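
The remarks of this section on order, repetition and naming of elements can be tried out with Python's built-in sets. The following is only a small sketch: the function name `smallest_two_way_cube_sum` and the search bound `limit` are assumptions made for this illustration, and "cubes" is read as cubes of positive integers.

```python
# Order and repetition in the listing do not affect the set that is defined.
listed = {2, 7, 2010, 4, 2}
reordered = {2010, 7, 4, 2}
print(listed == reordered)  # True
print(len(listed))          # 4


def smallest_two_way_cube_sum(limit=20):
    """Smallest number that is a sum of two positive cubes in two different ways.

    `limit` bounds the cube roots that are searched (an assumption of this
    sketch); any bound of at least 12 is large enough to find the answer.
    """
    ways = {}
    for a in range(1, limit + 1):
        for b in range(a, limit + 1):
            ways.setdefault(a**3 + b**3, []).append((a, b))
    return min(n for n, pairs in ways.items() if len(pairs) >= 2)


# Two different names for the same element give the same singleton.
print({1729} == {smallest_two_way_cube_sum()})  # True
```

The second comparison needs "some knowledge about cubes and numbers", here in the form of a brute-force search, whereas the first one follows directly from the definition of set equality.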

2.3 Subsets and Comprehension

While the above sets are formed by an explicit listing of their objects, a more common
method is to specify a set by comprehension. Comprehension usually means that a
set is formed by specifying a subset of a given set using some property. An example
that we have already seen is
$\mathbb{P} := \{p \in \mathbb{N} : p \text{ is a prime number}\}$.
Here the given set is the set $\mathbb{N}$ of natural numbers and we single out a subset $\mathbb{P}$ of it
by specifying which elements of $\mathbb{N}$ are members of this subset. So, the way to read
the above definition is that $\mathbb{P}$ is defined to be the set of those natural numbers $p \in \mathbb{N}$
that have the property that they are prime. Some authors also write this set as
$\{p \in \mathbb{N} \mid p \text{ is a prime number}\}$,
i.e. with a "|" instead of a ":". In both cases ":" and "|" are read as "such that".
The symbol ":=" is read as "is defined to be equal". Let us now more formally
capture what a subset is in general.
Definition 2.2 (Subset) Let S be a set. We say that T is a subset of S if all
elements of T are also elements of S. If T is a subset of S, then we write $T \subseteq S$.
Otherwise, we write $T \not\subseteq S$.



Please note that some authors also write $T \subset S$ instead of $T \subseteq S$. However, the
former notation is somewhat ambiguous because it does not make clear whether the
sets T and S are also allowed to be equal. We define
$T \subsetneq S :\iff (T \subseteq S \text{ and } T \neq S)$.
That is, $\subsetneq$ stands for "subset, but not equal". The subset symbols can also be
used the other way around. For instance, $S \supseteq T$ means that T is a subset of S.
Sometimes, S is also called a superset of T in this situation.
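
As an informal illustration, comprehension and the subset relations have direct counterparts in Python: a set comprehension singles out a subset, `<=` tests "subset" and `<` tests "subset, but not equal". Since the set of natural numbers is infinite, the sketch below works with a finite cut-off; the names `BOUND`, `N_bounded`, `P_bounded` and `is_prime` are assumptions of this example only.

```python
def is_prime(p):
    """Trial division; sufficient for the small bound used here."""
    return p >= 2 and all(p % d != 0 for d in range(2, int(p**0.5) + 1))


BOUND = 50  # hypothetical cut-off, since N itself is infinite
N_bounded = set(range(BOUND))

# Comprehension: those p in N (below the bound) that are prime.
P_bounded = {p for p in N_bounded if is_prime(p)}

print(P_bounded <= N_bounded)   # True: subset
print({2, 3, 5} <= P_bounded)   # True: subset
print({2, 3, 5} < P_bounded)    # True: subset, but not equal
print({1, 2, 3} <= P_bounded)   # False, since 1 is not prime
print(N_bounded >= P_bounded)   # True: superset, read the other way around
```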
The diagram in Figure 2.1 illustrates a subset $T \subseteq S$. This figure is a so-called
Venn diagram. The value of such diagrams is limited, since they illustrate sets as if
they were subsets of the two-dimensional plane. This can lead to wrong conclusions
and perceptions. Hence, one should use such diagrams only for inspiration and
any formal proof has to be based rigorously on the original definitions.

Figure 2.1: A subset $T \subseteq S$

Here is an example of some sets and their corresponding subset relations.
Example 2.3 (Subsets) All the following statements hold true (try to prove them!):
1. $\{2, 3, 5\} \subseteq \mathbb{P} \subseteq \mathbb{N}$,
2. $\{2, 3, 5\} \subsetneq \mathbb{P}$,
3. $\{1, 2, 3\} \not\subseteq \mathbb{P}$,
4. $\mathbb{P} \not\subseteq \{1, 2, 3\}$.
The last two examples show that for two given sets S and T neither $S \subseteq T$ nor
$T \subseteq S$ needs to hold. However, both inclusions can also hold simultaneously. Our
first result says when exactly this happens. Although it is a very simple result, we
work out the proof in detail.
Proposition 2.4 (Equality) Let S and T be sets. Then
$S = T \iff (S \subseteq T \text{ and } T \subseteq S)$.


Proof. "$\Longrightarrow$" We start with the assumption that S = T. By definition this means
that S and T have exactly the same elements. In particular, all elements of S are
elements of T, i.e. $S \subseteq T$, and all elements of T are also in S, i.e. $T \subseteq S$.
"$\Longleftarrow$" Now we assume that $S \subseteq T$ and $T \subseteq S$. By definition this means that all
elements that are in S are also in T and all elements that are in T are also in S.
Thus, S and T have to have exactly the same elements and hence S = T. □
Although this is an extremely simple proof, it helps to illustrate a number of
important proof techniques. If we want to prove an equivalence $A \iff B$
of two statements A and B, then we have to prove two implications $A \Longrightarrow B$
and $B \Longrightarrow A$ because these two implications together comprise the meaning of
$A \iff B$. The way to read $\iff$ is as "if and only if" or "is equivalent to".
Similarly, Proposition 2.4 tells us how to prove the equality of two sets S and T,
namely by showing that $S \subseteq T$ and $T \subseteq S$. This is important enough to be captured
again.
Proof Methods (Equivalence of Statements and Equality of Sets)
1. Equivalence: An equivalence of statements like $A \iff B$ is usually proved
by showing $A \Longrightarrow B$ and $B \Longrightarrow A$ separately.
2. Equality: A set equality like S = T is usually proved by showing $S \subseteq T$
and $T \subseteq S$ separately.
There is another related terminology in mathematics. This is the terminology of
sufficient and necessary conditions.
Terminology (Sufficient and necessary conditions)
If A and B are two mathematical statements such that $A \Longrightarrow B$ holds, then
1. A is called a sufficient condition for B and
2. B is called a necessary condition for A.
That is, the condition "$S \subseteq T$ and $T \subseteq S$" is necessary and sufficient for the two
sets S and T to be equal.
It is important to note that there is a correspondence between subsets and the
way sets can be defined by comprehension using some properties. Let us assume
that S is a set and P is a property that can hold for elements x of S or not. We
write P (x) in order to indicate that property P holds for x. Then
$T := \{x \in S : P(x)\}$
defines a subset of S with exactly those elements of S that satisfy property P. On
the other hand, if $T \subseteq S$, then by
$P(x) :\iff x \in T$



we can define a property P for all elements $x \in S$ that holds if and only if $x \in T$.
Since we can move from properties to subsets by comprehension and backwards
from subsets to the respective properties, we can somehow identify subsets and
properties. These are essentially the same things! For example, we have used the
property of being a prime number in order to define the set $\mathbb{P}$ of prime numbers by
comprehension. Hence "being prime" is equivalent to "being a member of $\mathbb{P}$".
This correspondence between subsets formed by comprehension and properties
will play a crucial role in this course, since, as we will see, there is a close relationship
between set theory and logic and this is the first indication of this relationship.
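
The passage between properties and subsets can be sketched in a few lines of Python, where a property is modelled as a Boolean-valued function. The ground set `S` and the property "x is even" are hypothetical choices for this illustration.

```python
S = set(range(20))  # hypothetical small ground set


def P(x):
    """An example property of elements of S: x is even."""
    return x % 2 == 0


# From a property to a subset: comprehension.
T = {x for x in S if P(x)}


# From a subset back to a property: membership in T.
def P_from_T(x):
    return x in T


# On S the two properties agree, and comprehension recovers T.
print(all(P(x) == P_from_T(x) for x in S))  # True
print({x for x in S if P_from_T(x)} == T)   # True
```

Note that the two functions agree only on elements of S; outside of S the property `P_from_T` is false by construction, which reflects that the correspondence is stated for subsets of a given set S.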
Now let us look at further examples of subsets. It turns out that the empty set
is a subset of any set.

Proposition 2.5 For any set S we have $\emptyset \subseteq S$.

Proof. In order to show $\emptyset \subseteq S$ we have to show that all elements of $\emptyset$ are also in
S. Since by definition there are no elements in $\emptyset$, we have nothing to show and the
statement is proved. □
Some readers might not like this proof, since it seems to show nothing. Perhaps
an indirect proof is even more instructive in this case. So, let us formulate an
indirect proof for this result.
Proof. Let us assume that $\emptyset \not\subseteq S$. If $\emptyset \subseteq S$ does not hold, then by definition
there must be an element $x \in \emptyset$ which is not in S. But this is a contradiction,
because there is no element in $\emptyset$ at all. Hence, the assumption was wrong and $\emptyset \subseteq S$. □
So, what is the logical principle that corresponds to the statement that $\emptyset \subseteq S$
holds for any set S? It is the so-called principle of explosion that $\bot \Longrightarrow A$ holds
for any statement A or, in other words, "from a false statement one can conclude
everything". This principle illustrates why it is so important that mathematics is
consistent! As soon as there is some inconsistency, i.e. some contradiction or some
false statement that could be derived in mathematics, one could conclude everything
and hence all such results would be useless.
It is important to point out that our naive approach to set theory is based on
untyped sets. That is, in order to define a set, we do not have to specify its type, i.e.
we do not have to name a superset S, first. In particular, there is only one empty
set and not one for each type S.
So, what is the property that corresponds to the empty set? The answer is
that this is the property $\bot$ ("falsum") that we have seen before. It is the property
that does not hold. For an arbitrary mathematical object x we could also define
$\bot :\iff (x \neq x)$. Since the property $x \neq x$ is always wrong, no matter what x
is, $\bot$ is just a property that corresponds to "is not true". We obtain the following.


Proposition 2.6 (Empty set) For any set S we have
$\emptyset = \{x \in S : x \neq x\} = \{x \in S : \bot\}$.
Proof. Let S be some set and $x \in S$. Since $\bot \iff (x \neq x)$, it is clear that
$\{x \in S : x \neq x\} = \{x \in S : \bot\}$.
We have to show $\emptyset = \{x \in S : x \neq x\}$. By Proposition 2.4 it suffices to show
$\emptyset \subseteq \{x \in S : x \neq x\}$ and $\{x \in S : x \neq x\} \subseteq \emptyset$.
The first statement $\emptyset \subseteq \{x \in S : x \neq x\}$ holds by Proposition 2.5. For the second
statement, let x be an element in $\{x \in S : x \neq x\}$. This element satisfies $x \neq x$,
which is impossible. Hence such an element does not exist and we have nothing to
show. □
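
The "nothing to show" arguments above have a direct computational analogue: a universally quantified statement over an empty collection is true. The following sketch checks this for a few hypothetical example sets (integers and strings only), together with the principle of explosion and the comprehension with the always-false property. The helper `implies` is an assumption of this example.

```python
def implies(a, b):
    """Material implication a ==> b."""
    return (not a) or b


empty = set()
examples = [set(), {1, 2, 3}, {"a", "b"}]  # hypothetical example sets

# "Every element of the empty set lies in S" is vacuously true.
vacuous = all(all(x in S for x in empty) and empty <= S for S in examples)

# Comprehension with the always-false property x != x yields the empty set.
falsum_sets = all({x for x in S if x != x} == empty for S in examples)

# Principle of explosion: a false premise implies any statement A.
explosion = all(implies(False, A) for A in (False, True))

print(vacuous, falsum_sets, explosion)  # True True True
```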
Here is an example of a set that we have considered before.
Example 2.7 The set S = {the largest natural number} can be defined more precisely
by comprehension as follows:
$S = \{n \in \mathbb{N} : n \text{ is the largest natural number}\}$.
The property P(n) which is equivalent to "n is the largest natural number" is a
well-defined property and for each natural number $n \in \mathbb{N}$ it is clear what this is supposed
to mean. However, this property does not hold true for any natural number $n \in \mathbb{N}$
(since there is no largest one). Hence, S is the empty set by the previous proposition.
In contrast to that, the set
$S = \{n \in \mathbb{N} : n \text{ is the most beautiful natural number}\}$
is not well-defined, since it is not even clear for a fixed number $n \in \mathbb{N}$ what it is
supposed to mean that this is the most beautiful number.

Problems
2.1 Discuss the definition of the following set:
S = {the smallest natural number which cannot be defined with less than 100 symbols}.
Is this set S well-defined? If yes, can we determine the member of this set? If not, why not?
2.2 Prove that for any three sets R, S, T the following statements hold true:
1. $S \subseteq S$ (reflexivity)
2. $R \subseteq S$ and $S \subseteq T$ implies $R \subseteq T$ (transitivity)
2.3 Find out which of the following statements are correct!
1. {},
2. {},
3. {} ,
4. {} {N}.

2.4 Russell's Paradox
In formal logic, a contradiction is the signal of defeat,
but in the evolution of real knowledge it marks the
first step in progress toward a victory.
Alfred North Whitehead (1861-1947)

We have already seen that there are some limitations on how a set can be listed
explicitly: the listing has to be mathematically well-defined. Now we will see
another type of limitation that shows why comprehension has to be used carefully
as well. Already in the early years of set theory it was recognised that some sets
lead to serious problems. One such construction is called Russell's paradox and
we present it as an example here.
Example 2.8 (Russell's paradox 1901) We consider the set
$S = \{X : X \notin X\}$
of all sets X that do not contain themselves as an element. Now the question is whether
S is an element of itself. On the one hand, if $S \in S$, then S, by definition, has the
property that $S \notin S$. This is a contradiction! On the other hand, if $S \notin S$, then S,
by definition, has the property that $S \in S$. This is also a contradiction! Altogether,
we obtain
$S \in S \iff S \notin S$.
This statement is clearly not correct, hence the existence of the set S leads to a
contradiction!
Cantor had discovered similar antinomies, but he did not publish them. The
way out of the problem of Russell's paradox is just to declare the formation of sets
such as S as illegal. We have to apply some restrictions with regard to which sets
we can actually build. The discovery of this paradox has led to the development of
axiomatic set theory, a discipline which explains in great formal detail which sets
can be formed and which sets cannot be formed. Essentially, the situation is as
follows.
Admissible constructions of sets
1. We have an empty set and an infinite set such as the set N of natural numbers.
2. We can form finite sets by explicit specification of their elements.
3. We can form subsets of already constructed sets by comprehension (using some
property that characterizes the elements in the subset).
4. We can apply certain well-defined operations to sets in order to form new sets
out of given sets. These operations are the union of sets and the power set
construction.


We will specify in subsequent sections what the union and the power set construction
exactly mean. The essential point is that the set in Russell's paradox has not been
built by any of these admissible tools. The condition $X \notin X$ could be considered as
a property, but in order to use comprehension to form the set S of Russell's paradox
one would first need the set U of all sets X, and this set does not exist (something
that we will prove below exactly with the argument of Russell's paradox).
In some approaches to axiomatic set theory, the collection U of all sets is considered as a proper class, which is something like a set of second order. Already Cantor
considered such classes as the way to avoid antinomies. Then the class S of all sets,
which are not members of themselves can be formed, but the question of whether
S is a member of itself does not make any sense, since S contains only sets and not
proper classes. Outside of set theory, the term class is sometimes also used as a
synonym for sets.
One should note that a set can very well be a member of another set. For instance
the set $\{\emptyset, \mathbb{N}\}$ is a set with exactly two elements: the empty set $\emptyset$ and the set of
natural numbers $\mathbb{N}$. And in fact, Russell's paradox can be turned into a useful proof
of the fact that there is no set that contains all sets.
Proposition 2.9 (No universal set) There is no set U that contains all sets S.
Proof. Let U be some set. Now we consider the set
$S := \{X \in U : X \notin X\}$
of all sets in U that are not members of themselves. This set S is well-defined by
comprehension. Now, the assumption $S \in S$ implies $S \notin S$ by the definition of S.
This is a contradiction. Hence $S \notin S$. But this implies $S \notin U$, since otherwise
$S \in U$ together with $S \notin S$ would give $S \in S$ by the definition of S. So, no matter
how we choose the set U to start with, we can always construct a set S that is not
a member of U. Hence no set U can contain all sets S. □
Despite other claims, this result was already known and proved by Cantor in
1899 before Russell presented his paradox. Cantor did not publish his result, but
only reported it in letters to David Hilbert and Richard Dedekind. However,
Cantor's original proof was different from the proof presented here and we
will come back to his proof at a later stage (see Problem 5.5).
In computability theory, a branch of mathematical logic, one can use the idea
of Russell's paradox also in a constructive way to define sets such as the halting
problem or the self-applicability problem, which have been studied, for instance, by
Alan Turing and Kurt Gödel. These sets exhibit some interesting behaviour and
they play a crucial role in computability theory. It is not unusual in mathematics
that some paradox or contradiction has eventually been turned into a useful result.
In relation to Russell's paradox one can ask the question whether there can be
any set S with $S \in S$ at all. Indeed, the axioms of formal set theory do not allow
the construction of such a set S. A set S is called well-founded if an infinite chain
$S_0, S_1, S_2, S_3, \dots$ of sets with
$\dots \in S_3 \in S_2 \in S_1 \in S_0 \in S$
is impossible. If S is a set with $S \in S$, then we obtain an infinite chain
$\dots \in S \in S \in S \in S \in S$
and hence S is not well-founded. There is a particular axiom in formal set theory
which ensures that any set is well-founded. Hence, a set S with $S \in S$ does not
exist. This shows that in axiomatic set theory the set S that we have considered in
the proof of Proposition 2.9 is actually identical to U.
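
The construction from the proof of Proposition 2.9 can be imitated with Python's `frozenset`, which allows sets to be members of other sets. This is only a finite toy model, and the names `russell_subset`, `e`, `one`, `two` and `U` are assumptions of this sketch. Since frozensets are built from already existing objects, no frozenset is a member of itself, so the model also reflects the last remark: the constructed set S coincides with U.

```python
def russell_subset(U):
    """Return {X in U : X not in X} for a frozenset U of frozensets."""
    return frozenset(X for X in U if X not in X)


e = frozenset()              # the empty set
one = frozenset({e})         # the set containing only the empty set
two = frozenset({e, one})    # a set with two elements
U = frozenset({e, one, two}) # a hypothetical candidate for a "universe"

S = russell_subset(U)
print(S not in U)  # True: S is a set that U does not contain
print(S == U)      # True: in this well-founded model S is identical to U
```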
Nowadays variants of non-well-founded set theory are also studied, which have
found interesting applications in the study of non-terminating processes, in
linguistics and also in a branch of mathematics called non-standard analysis.
There is a little variant of comprehension, which is slightly more general, but
which we want to subsume under comprehension and this variant is called replacement. We illustrate this with an example.
Example 2.10 (Multiples) For any natural number $k \in \mathbb{N}$ we define the set $k\mathbb{N}$
of numbers which are multiples of k by
$k\mathbb{N} := \{n \in \mathbb{N} : (\exists m \in \mathbb{N})\ n = km\} = \{n \in \mathbb{N} : k \mid n\}$.
This definition by comprehension is sometimes also written as follows:
$k\mathbb{N} = \{km \in \mathbb{N} : m \in \mathbb{N}\}$.
Here all those values km are members of this set, for which $m \in \mathbb{N}$. Such a definition
is called a definition by replacement.
For the purposes of this course we will not formally distinguish between replacement and comprehension.
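
In Python, both styles correspond to set comprehensions of slightly different shape: comprehension filters the elements of a given set, while replacement applies an expression to the elements of a given set. The sketch below compares the two for multiples on a finite range; the cut-off `BOUND` and the function names are assumptions of this example.

```python
BOUND = 60  # hypothetical cut-off, since N itself is infinite
N_bounded = range(BOUND)


def multiples_by_comprehension(k):
    # {n in N : there is an m in N with n = k*m}
    return {n for n in N_bounded if any(n == k * m for m in N_bounded)}


def multiples_by_replacement(k):
    # {k*m : m in N}, cut off at the same bound
    return {k * m for m in N_bounded if k * m < BOUND}


print(all(
    multiples_by_comprehension(k) == multiples_by_replacement(k)
    for k in range(6)
))  # True
print(sorted(multiples_by_replacement(7)))  # [0, 7, 14, 21, 28, 35, 42, 49, 56]
```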

2.5 Union and Intersection of Sets

In this section we want to study operations that allow us to construct new sets from
given sets; in particular we will look at the union and the intersection of sets.
Definition 2.11 (Union and intersection) Let X and Y be sets.
1. We define the union $X \cup Y$ of X and Y by
$X \cup Y := \{x : x \in X \text{ or } x \in Y\}$.
2. We define the intersection $X \cap Y$ of X and Y by
$X \cap Y := \{x : x \in X \text{ and } x \in Y\}$.


Thus, the union $X \cup Y$ is the collection of all elements from X and Y together,
i.e. it is the set of all elements which are in X or in Y. The intersection $X \cap Y$ is
the collection of all elements that are simultaneously in both sets X and Y, i.e. it is
the set of all elements x which are in X and also in Y.
We note that the union is formally not a special case of comprehension, since
we do not define $X \cup Y$ as a subset of some given set U, but we only create this
set U by forming the union. In contrast to this, intersection can be considered as a
special case of comprehension, since we can prove the following lemma.
Lemma 2.12 For any two sets X and Y we have $X \cap Y = \{x \in X : x \in Y\}$.
The diagram in Figure 2.2 illustrates the intersection $X \cap Y$ and the union $X \cup Y$
of two sets X and Y.

Figure 2.2: The intersection $X \cap Y$ and the union $X \cup Y$

We give a number of examples of the intersection and union of sets.
Example 2.13 The following statements hold true (try to prove them!):
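1. $\{1, 3, 5\} \cup \{2, 3, 4\} = \{1, 2, 3, 4, 5\}$,
2. $\{1, 3, 5\} \cap \{2, 3, 4\} = \{3\}$,
3. $2\mathbb{N} \cap 3\mathbb{N} = 6\mathbb{N}$,
4. $2\mathbb{N} \cap \mathbb{P} = \{2\}$,
5. $\mathbb{N} \cup \mathbb{P} = \mathbb{N}$,
6. $\mathbb{N} \cap \mathbb{P} = \mathbb{P}$.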
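
Before proving these statements, one can test them numerically. Python writes the union as `|` and the intersection as `&`. The following sketch replaces the infinite sets by their parts below a hypothetical cut-off `BOUND`, so it is a plausibility check and not a proof; the helper names are assumptions of this example.

```python
BOUND = 100  # hypothetical cut-off; N itself is infinite


def is_prime(p):
    """Trial division; sufficient for the small bound used here."""
    return p >= 2 and all(p % d != 0 for d in range(2, int(p**0.5) + 1))


def multiples(k):
    """Bounded stand-in for the set kN, for k >= 1."""
    return set(range(0, BOUND, k))


N_b = set(range(BOUND))
P_b = {p for p in N_b if is_prime(p)}

checks = [
    ({1, 3, 5} | {2, 3, 4}) == {1, 2, 3, 4, 5},
    ({1, 3, 5} & {2, 3, 4}) == {3},
    (multiples(2) & multiples(3)) == multiples(6),
    (multiples(2) & P_b) == {2},
    (N_b | P_b) == N_b,
    (N_b & P_b) == P_b,
]
print(all(checks))  # True
```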
In the following proposition we collect a number of useful properties of union
and intersection. In particular, we prove that the union and intersection are both
commutative and associative and, additionally, they are distributive with respect to
each other.



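Proposition 2.14 (Union and intersection) Let X, Y and Z be sets. Then the
following properties hold:
1. $X \cap Y \subseteq X$ and $X \subseteq X \cup Y$,
2. $X \cup Y = Y \cup X$ and $X \cap Y = Y \cap X$, (commutativity)
3. $Z \cup (X \cup Y) = (Z \cup X) \cup Y$ and
$Z \cap (X \cap Y) = (Z \cap X) \cap Y$, (associativity)
4. $Z \cup (X \cap Y) = (Z \cup X) \cap (Z \cup Y)$ and
$Z \cap (X \cup Y) = (Z \cap X) \cup (Z \cap Y)$. (distributivity)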

Proof.
1. If $x \in X \cap Y$, then $x \in X$ and $x \in Y$. Hence, in particular, $x \in X$. This proves
the first statement. If $x \in X$, then $x \in X$ or $x \in Y$. Hence, $x \in X \cup Y$. This
proves the second statement.
2. and 3. are left to the reader (see Problem 2.4).
4. We prove only the first equality and leave the second one to the reader (see
Problem 2.4). In order to prove the first equality, we convince ourselves that
$x \in Z \cup (X \cap Y)$
$\iff x \in Z$ or $x \in X \cap Y$
$\iff x \in Z$ or ($x \in X$ and $x \in Y$)
$\iff$ ($x \in Z$ or $x \in X$) and ($x \in Z$ or $x \in Y$)
$\iff x \in (Z \cup X) \cap (Z \cup Y)$.
This implies the claim. □
In case of the last part of the proof, we have highlighted the logical structure of
the proof, by combining both inclusions in a single equivalence chain. This is not
recommendable as a general approach, but sometimes proofs can be captured in this
way more transparently. In this case one can see clearly how a set theoretical proof
is done using the underlying logical operations that have been used to define union
and intersection.
The fact that the union and intersection of sets are associative allows us to write
expressions like $X \cup Y \cup Z$ for the union of three sets, since it does not matter how
we add parentheses to this expression. In other words, we have
$X \cup Y \cup Z := (X \cup Y) \cup Z = X \cup (Y \cup Z)$ and
$X \cap Y \cap Z := (X \cap Y) \cap Z = X \cap (Y \cap Z)$.
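
The connection between the set-theoretic laws and the underlying logical operations can also be explored by brute force. The sketch below checks distributivity and associativity for all subsets of a small hypothetical universe, and the propositional law used in the equivalence chain for all truth values. Such a finite check can reveal a wrong conjecture quickly, but it does not replace the proof; the helper `subsets` is an assumption of this example.

```python
from itertools import combinations, product


def subsets(universe):
    """All subsets of a finite set, as a list of sets."""
    items = sorted(universe)
    return [set(c) for r in range(len(items) + 1) for c in combinations(items, r)]


all_subsets = subsets({0, 1, 2})  # hypothetical small universe

distributive = all(
    Z | (X & Y) == (Z | X) & (Z | Y) and Z & (X | Y) == (Z & X) | (Z & Y)
    for X, Y, Z in product(all_subsets, repeat=3)
)
associative = all(
    (X | Y) | Z == X | (Y | Z) and (X & Y) & Z == X & (Y & Z)
    for X, Y, Z in product(all_subsets, repeat=3)
)

# The logical law behind the first distributivity equation.
logical = all(
    (c or (a and b)) == ((c or a) and (c or b))
    for a, b, c in product([False, True], repeat=3)
)
print(distributive, associative, logical)  # True True True
```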


An analogous definition holds for the union and intersection of any finite number
of sets in general. Now we prove a result on inclusion and its interaction with union
and intersection.
Proposition 2.15 (Inclusion, union and intersection) Let X, Y and Z be sets.
Then the following hold:
1. ($X \subseteq Z$ and $Y \subseteq Z$) if and only if $(X \cup Y) \subseteq Z$.
2. ($Z \subseteq X$ and $Z \subseteq Y$) if and only if $Z \subseteq (X \cap Y)$.
Proof.
1. We prove both directions of the equivalence separately.
"$\Longrightarrow$" Let $X \subseteq Z$ and $Y \subseteq Z$. We have to prove $(X \cup Y) \subseteq Z$. Thus, let
$x \in X \cup Y$. That is, $x \in X$ or $x \in Y$. In the first case, it follows that $x \in Z$
since $X \subseteq Z$ and in the second case it also follows that $x \in Z$ since $Y \subseteq Z$. Thus,
in any case $x \in Z$, which was to be proved.
"$\Longleftarrow$" Now suppose $(X \cup Y) \subseteq Z$. We have to prove $X \subseteq Z$ and $Y \subseteq Z$. Let
$x \in X$. Then $x \in X \cup Y$ and hence $x \in Z$. For the second part, let $x \in Y$.
Then $x \in X \cup Y$ and $x \in Z$ follows, which was to be proved.
2. We leave this proof to the reader (see Problem 2.5). □
If two sets have no elements in common, then they are called disjoint.
Definition 2.16 (Disjoint) Let X and Y be sets. If $X \cap Y = \emptyset$, then X and Y
are called disjoint.
The diagram in Figure 2.3 illustrates two disjoint sets.

Figure 2.3: Two disjoint sets X and Y


Problems
2.4 Prove the remaining statements from Proposition 2.14. Let X, Y and Z be sets. Then
the following properties hold:
1. $X \cup Y = Y \cup X$ and $X \cap Y = Y \cap X$ (commutativity)
2. $Z \cup (X \cup Y) = (Z \cup X) \cup Y$ and
$Z \cap (X \cap Y) = (Z \cap X) \cap Y$ (associativity)
3. $Z \cap (X \cup Y) = (Z \cap X) \cup (Z \cap Y)$ (distributivity)
2.5 Let X, Y and Z be sets. Prove that $Z \subseteq X$ and $Z \subseteq Y$ if and only if $Z \subseteq (X \cap Y)$.
2.6 Let X, Y and Z be sets. Prove that $Z \subseteq X$ or $Z \subseteq Y$ implies $Z \subseteq (X \cup Y)$. Give
examples of sets X, Y and Z such that the inverse implication does not hold true.
2.7 Prove that for any two distinct prime numbers $p, q \in \mathbb{P}$ and their product r = pq it
holds that
$r\mathbb{N} = p\mathbb{N} \cap q\mathbb{N}$.
Is this also true for arbitrary distinct natural numbers $p, q \in \mathbb{N}$?

2.6 Difference and Complement of Sets

In this section we discuss another method to create new sets from given sets, by
considering the difference of sets.
Definition 2.17 (Difference) Let X and Y be sets. We define the difference $X \setminus Y$
of X and Y by
$X \setminus Y := \{x : x \in X \text{ and } x \notin Y\}$.
Some authors also write $X - Y$ instead of $X \setminus Y$. Figure 2.4 illustrates the
difference of two sets X and Y.

Figure 2.4: The difference $X \setminus Y$ of two sets X and Y


The following example illustrates some set differences.


Example 2.18 The following statements hold true (try to prove them!):
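1. $\{1, 3, 5\} \setminus \{2, 3, 4\} = \{1, 5\}$,
2. $2\mathbb{N} \setminus 3\mathbb{N} = 2\mathbb{N} \setminus 6\mathbb{N}$,
3. $2\mathbb{N} \setminus \mathbb{P} = 2\mathbb{N} \setminus \{2\}$,
4. $\mathbb{N} \setminus 2\mathbb{N} = \{2k + 1 : k \in \mathbb{N}\}$.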
In the following proposition we collect some useful basic properties of the set
difference.
Proposition 2.19 (Difference) Let X and Y be sets. Then
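1. $X \setminus Y \subseteq X$,
2. $(X \setminus Y) \cap Y = \emptyset$,
3. $(X \setminus Y) \cup Y = X \cup Y$.
Proof.
1. Let $x \in X \setminus Y$. Then $x \in X$ and $x \notin Y$. In particular, $x \in X$.
2. Let $x \in (X \setminus Y) \cap Y$. Then $x \in X \setminus Y$ and $x \in Y$. But $x \in X \setminus Y$ means $x \in X$
and $x \notin Y$. Hence, $x \in Y$ and $x \notin Y$, which is a contradiction. Thus, there is
no $x \in (X \setminus Y) \cap Y$ and hence $(X \setminus Y) \cap Y = \emptyset$.
3. We prove both inclusions separately.
"$\subseteq$" Let $x \in (X \setminus Y) \cup Y$. Then $x \in X \setminus Y$ or $x \in Y$. This means ($x \in X$ and
$x \notin Y$) or $x \in Y$. Altogether, $x \in X$ or $x \in Y$, i.e. $x \in X \cup Y$.
"$\supseteq$" Let $x \in X \cup Y$. Then $x \in X$ or $x \in Y$. Now we make a case distinction.
Case 1: $x \in Y$. In this case "$x \in X \setminus Y$ or $x \in Y$" is certainly correct.
Case 2: $x \notin Y$. In this case "$x \in X$ and $x \notin Y$" is correct. Hence "$x \in X \setminus Y$ or
$x \in Y$" is correct.
In both cases we obtain $x \in (X \setminus Y) \cup Y$. □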
De Morgan's Laws capture what happens if we subtract a union or an intersection
of sets from another set. Basically, unions are turned into intersections and
intersections into unions in this case, as expressed more precisely in the following
proposition.
Proposition 2.20 (De Morgan's Laws) Let X, Y and Z be sets. Then
1. $Z \setminus (X \cup Y) = (Z \setminus X) \cap (Z \setminus Y)$,
2. $Z \setminus (X \cap Y) = (Z \setminus X) \cup (Z \setminus Y)$.
Proof. We only prove the first part of the claim.
1. We prove both inclusions separately.
"$\subseteq$" Let $x \in Z \setminus (X \cup Y)$. This means $x \in Z$ and $x \notin (X \cup Y)$. That is, we
do not have $x \in X$ or $x \in Y$, hence x is neither in X nor in Y, which means
$x \in Z \setminus X$ and $x \in Z \setminus Y$, i.e. $x \in (Z \setminus X) \cap (Z \setminus Y)$.
"$\supseteq$" Let now $x \in (Z \setminus X) \cap (Z \setminus Y)$. Then $x \in (Z \setminus X)$ and $x \in (Z \setminus Y)$,
which means that $x \in Z$ and $x \notin X$ and $x \notin Y$. The latter implies that it is
not the case that $x \in X$ or $x \in Y$, i.e. $x \notin (X \cup Y)$. Altogether, this entails
$x \in Z \setminus (X \cup Y)$.
2. We leave this part of the proof to the reader (see Problem 2.8). □
Sometimes, if only subsets of a fixed type are considered, the following notation
is used in mathematics.
Definition 2.21 (Complement) We consider subsets $X \subseteq Z$ of a fixed given set
Z. Then we denote the complement of X by $X^c := Z \setminus X$.
The reader should note that the notation $X^c$ implicitly refers to the given set
Z and this notation does not make sense if Z has not been specified before. Some
authors also write $\overline{X}$ for the complement of X; other notations are used as well. De
Morgan's Laws can be expressed somewhat more neatly in this notation.
Corollary 2.22 (De Morgan's Laws) Let X, Y both be subsets of some fixed given
set Z. Then the following hold (where all complements are taken with respect to Z):
1. $(X \cup Y)^c = X^c \cap Y^c$,
2. $(X \cap Y)^c = X^c \cup Y^c$.
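
A small sketch may help to see the role of the fixed set Z in this notation. Python has no complement operator for sets, precisely because a complement only makes sense relative to a given set; the function `complement` below, as well as the example sets, are assumptions of this illustration.

```python
Z = set(range(10))  # hypothetical fixed ground set
X = {1, 2, 3, 4}
Y = {3, 4, 5, 6}


def complement(A):
    """Complement of a subset A of Z, taken with respect to the fixed set Z."""
    return Z - A


first_law = complement(X | Y) == complement(X) & complement(Y)
second_law = complement(X & Y) == complement(X) | complement(Y)
print(first_law, second_law)        # True True
print(sorted(complement(X | Y)))    # [0, 7, 8, 9]
```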
The fact that unions and intersections are swapped under complementation,
indicates that there is a duality between these two concepts. That means that any
property of unions can be translated into a property of intersections and vice versa,
using complements. The logical rule behind this result is captured in the following
two formulas (where A and B are two propositions):
$\neg(A \lor B) \iff \neg A \land \neg B$,
$\neg(A \land B) \iff \neg A \lor \neg B$.
Here $\lor$ stands for "or" and $\land$ stands for "and". A complement taken twice just
yields the original set back. We first prove a slightly more general statement.


Proposition 2.23 (Double difference) For sets X, Y and Z we have
1. $X \setminus (Y \setminus Z) = (X \setminus Y) \cup (X \cap Z)$,
2. $(X \setminus Y) \setminus Z = X \setminus (Y \cup Z)$.
Proof. We prove the first statement and leave the second one to the reader (see
Problem 2.11).
1. We obtain the following equivalence chain of statements for all x:
$x \in X \setminus (Y \setminus Z)$
$\iff x \in X$ and $x \notin Y \setminus Z$
$\iff x \in X$ and not($x \in Y$ and $x \notin Z$)
$\iff x \in X$ and ($x \notin Y$ or $x \in Z$)
$\iff$ ($x \in X$ and $x \notin Y$) or ($x \in X$ and $x \in Z$)
$\iff x \in (X \setminus Y) \cup (X \cap Z)$.
This means $X \setminus (Y \setminus Z) = (X \setminus Y) \cup (X \cap Z)$.
2. This is left to the reader (see Problem 2.11). □
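
The two statements show that the placement of parentheses matters for the set difference, which Python writes as `-`. The following sketch evaluates both sides for one hypothetical choice of sets; it illustrates the proposition on an instance and does not prove it.

```python
X, Y, Z = {1, 2, 3, 4}, {2, 3}, {3, 4}  # hypothetical example sets

right_nested = X - (Y - Z)
left_nested = (X - Y) - Z

print(right_nested == (X - Y) | (X & Z))  # True: first statement
print(left_nested == X - (Y | Z))         # True: second statement
print(sorted(right_nested))               # [1, 3, 4]
print(sorted(left_nested))                # [1]
print(left_nested <= right_nested)        # True, but the two sets differ here
```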
As a special case we obtain the following result on double complements.
Corollary 2.24 (Double complement) Let X be a subset of some fixed set Z.
Then $(X^c)^c = X$, where both complements are understood with respect to Z.
The logical rule behind this observation is the double negation law:
$\neg\neg A \iff A$,
which holds for all well-defined mathematical statements. The next quite important
proposition is about the contraposition law.
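Proposition 2.25 (Contraposition) Let X, Y, Z and W be sets. Then the following holds:
($X \subseteq Y$ and $W \subseteq Z$) $\Longrightarrow$ $(W \setminus Y) \subseteq (Z \setminus X)$.
Proof. Suppose $X \subseteq Y$ and $W \subseteq Z$. We prove $(W \setminus Y) \subseteq (Z \setminus X)$. For this purpose
let $x \in W \setminus Y$. Then $x \in W$ and $x \notin Y$. We obtain $x \in Z$ since $W \subseteq Z$. On the
other hand, we know that $x \in X$ implies $x \in Y$. Hence, $x \notin X$. Altogether, we have
$x \in Z \setminus X$. □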
It is easy to see that the inverse implication of this proposition does not even
hold true if W = Z (see Problem 2.9). Once again, the contraposition law can be
expressed somewhat neater using the complement notation. Roughly speaking it
says that the inclusion order is inverted by complements.



Corollary 2.26 Let X, Y both be subsets of some fixed given set Z. Then
$X \subseteq Y \iff Y^c \subseteq X^c$,
where the complements are all understood with respect to Z.
Convince yourself why we do not just get "$\Longrightarrow$" here but also "$\Longleftarrow$"! The logical
version of the contraposition law is the following:
$(A \Longrightarrow B) \iff (\neg B \Longrightarrow \neg A)$.
This leads to a common proof method in mathematics that we formulate separately.
Proof Method (Contraposition)
In order to prove $A \Longrightarrow B$ for two well-defined mathematical statements it is
sufficient (and, in fact, logically equivalent) to prove $\neg B \Longrightarrow \neg A$.
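
Both versions of the contraposition law can be checked exhaustively on small instances. The sketch below verifies the logical law with a truth table and the set version of Corollary 2.26 for all pairs of subsets of a small hypothetical set Z; the helper `implies` is an assumption of this example.

```python
from itertools import combinations, product


def implies(a, b):
    """Material implication a ==> b."""
    return (not a) or b


# Truth table for (A ==> B) <==> (not B ==> not A).
contraposition_law = all(
    implies(a, b) == implies(not b, not a)
    for a, b in product([False, True], repeat=2)
)

# Set version: X is a subset of Y exactly if the complement of Y
# is a subset of the complement of X (complements with respect to Z).
Z = {0, 1, 2, 3}  # hypothetical fixed ground set
subsets_of_Z = [set(c) for r in range(len(Z) + 1) for c in combinations(sorted(Z), r)]
inclusion_reversed = all(
    (X <= Y) == ((Z - Y) <= (Z - X))
    for X, Y in product(subsets_of_Z, repeat=2)
)

print(contraposition_law, inclusion_reversed)  # True True
```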

Problems
2.8 Let X, Y and Z be sets. Prove that $Z \setminus (X \cap Y) = (Z \setminus X) \cup (Z \setminus Y)$.
2.9 Find sets X, Y and Z such that $(Z \setminus Y) \subseteq (Z \setminus X)$ and $X \not\subseteq Y$.
2.10 Let X and Y be sets with $X \subseteq Y$. Prove that the following statements are pairwise
equivalent to each other:
1. $X \subsetneq Y$,
2. $Y \not\subseteq X$,
3. $Y \setminus X \neq \emptyset$.
2.11 We consider double differences of sets.
1. Prove that $(X \setminus Y) \setminus Z = X \setminus (Y \cup Z)$ for all sets X, Y and Z.
2. Prove that $(X \setminus Y) \setminus Z \subseteq X \setminus (Y \setminus Z)$ for all sets X, Y and Z.
3. Show that there are sets X, Y and Z such that $(X \setminus Y) \setminus Z \not\supseteq X \setminus (Y \setminus Z)$.
2.12 Let X and Y be subsets of a fixed set Z. Prove that $X \setminus Y = X \cap Y^c$, where the
complement is taken with respect to Z.
2.13 Let X and Y be sets. The symmetric difference $X \triangle Y$ of X and Y is defined by
$X \triangle Y := (X \setminus Y) \cup (Y \setminus X)$.
Let X, Y and Z be sets. Prove that the following holds:
1. $X \triangle \emptyset = X$,
2. $X \triangle X = \emptyset$,
3. $X \triangle Y = Y \triangle X$, (commutative)
4. $X \triangle (Y \triangle Z) = (X \triangle Y) \triangle Z$, (associative)
5. $X \triangle Y = (X \cup Y) \setminus (X \cap Y)$.

2.7 Union and Intersection of Indexed Families of Sets


One should always generalise.
Carl Jacobi (1804-1851)

Often we want to work with the union and intersection of infinitely many sets and
not just of finitely many sets. Usually this is done by considering indexed families of
sets. If I is a non-empty set and there is a set $X_i$ given for each $i \in I$, then $(X_i)_{i \in I}$
is called an indexed family of sets over I. Some authors also write $\{X_i\}_{i \in I}$ for an
indexed family of sets, but this is an unfortunate notation since the curly brackets
"{" and "}" are overloaded in this way with a different meaning; hence we will only
use round brackets in order to denote indexed families. Now we can define union
and intersection for indexed families.
Definition 2.27 (Union and intersection for indexed families) Let I be a non-empty
set and let $(X_i)_{i \in I}$ and $(Y_i)_{i \in I}$ be indexed families of sets over I. We define:
1. $\bigcup_{i \in I} X_i := \{x : (\exists i \in I)\ x \in X_i\}$,
2. $\bigcap_{i \in I} X_i := \{x : (\forall i \in I)\ x \in X_i\}$.
Here (i I) is read as there exists an i in I such that and (i I) is
read as for all i in I it holds that. In some
W sense, the existential quantifier can
actually be read like a big or operation
and the universal quantifier
can be
V
W
read
like
a
big
and
operation

.
Indeed,
some
authors
write
instead
of and
V
instead of . Having this in mind it is easy to see that the union and intersection
for indexed families of sets actually generalises the union and intersection for two
sets (see Problem 2.14). The reader should also note that intersection could be
considered as a special case of comprehension again, whereas union yields a genuine
new type of set.
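For a finite index set the two definitions can be tried out directly. The following is a minimal sketch in Python, assuming a small hypothetical family stored as a dictionary from indices to sets; the names `family`, `big_union` and `big_intersection` are illustrative and not part of the notes.

```python
from functools import reduce

# Hypothetical finite indexed family (X_i)_{i in I} with I = {1, 2, 3},
# stored as a dict that maps each index i to the set X_i.
family = {1: {0, 1, 2}, 2: {1, 2, 3}, 3: {2, 3, 4}}


def big_union(fam):
    """{x : there exists an i in I such that x is in X_i}."""
    return set().union(*fam.values())


def big_intersection(fam):
    """{x : for all i in I it holds that x is in X_i}."""
    if not fam:
        # Definition 2.27 assumes a non-empty index set.
        raise ValueError("the index set must be non-empty")
    return reduce(lambda a, b: a & b, fam.values())


union = big_union(family)
intersection = big_intersection(family)

# The quantifier reading: "exists" is a big "or", "for all" is a big "and".
union_by_quantifier = {x for x in union if any(x in s for s in family.values())}
intersection_by_quantifier = {x for x in union if all(x in s for s in family.values())}
```

The last two lines restate the definitions with `any` and `all`, which play the roles of the big "or" and the big "and" for finite families.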
In the special case that the index set $I = \mathbb{N}$ is the set of natural numbers, one
also uses the following notations:
$$\bigcup_{i=0}^{\infty} X_i := \bigcup_{i \in \mathbb{N}} X_i \quad\text{and}\quad \bigcap_{i=0}^{\infty} X_i := \bigcap_{i \in \mathbb{N}} X_i.$$
Note that $\infty$ is not considered as a value, but the notation "$i = 0$ to $\infty$" is just
understood as another way of saying $i \in \mathbb{N}$. Similarly, the above notation is used if
the index set is $I = \{n, ..., n + k\}$ for $n, k \in \mathbb{N}$ and then we write
$$\bigcup_{i=n}^{n+k} X_i := \bigcup_{i \in I} X_i \quad\text{and}\quad \bigcap_{i=n}^{n+k} X_i := \bigcap_{i \in I} X_i.$$
Note that these notations can also be typeset in-line like $\bigcup_{i=0}^{\infty} X_i$ and $\bigcup_{i=n}^{n+k} X_i$ with
indexes written at the side. We give some examples of sets formed with union and
intersection over the natural numbers (or a subset of natural numbers).



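Example 2.28 We obtain the following (try to prove all these statements, see also
Problem 2.7):
1. $k\mathbb{N} = \bigcup_{m \in \mathbb{N}} \{km\}$ for all $k \in \mathbb{N}$,
2. $\bigcap_{k \in \mathbb{N}} k\mathbb{N} = \{0\}$,
3. $\bigcup_{k \in \mathbb{N}} k\mathbb{N} = \mathbb{N}$,
4. $\mathbb{P} = \mathbb{N} \setminus \left(\{1\} \cup \bigcup_{k=2}^{\infty} (k\mathbb{N} \setminus \{k\})\right)$.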
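Item 4 can be explored on a finite initial segment of the natural numbers. The sketch below assumes that $0 \in \mathbb{N}$ (as the notation $i = 0$ to $\infty$ suggests) and truncates every set to $\{0, ..., \text{bound}\}$; the function names are hypothetical. This is an experiment, not a proof.

```python
def multiples(k, bound):
    """The part of kN that lies in {0, ..., bound}."""
    if k == 0:
        return {0}
    return set(range(0, bound + 1, k))


def primes_up_to(bound):
    """N \\ ({1} u U_{k>=2} (kN \\ {k})), truncated to {0, ..., bound}."""
    naturals = set(range(bound + 1))
    removed = set()
    # Every m <= bound that lies in some kN \ {k} with k >= 2 already lies in
    # one with k <= max(bound, 2), so a finite range of k is enough here.
    for k in range(2, max(bound, 2) + 1):
        removed |= multiples(k, bound) - {k}
    return naturals - ({1} | removed)


primes_30 = primes_up_to(30)
```

Note that $0$ is removed because it is a multiple of $2$ different from $2$, and $1$ is removed explicitly.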
The first result that we prove on unions and intersections of indexed families of
sets concerns the change of the index set. The following proposition shows how this
affects the union and intersection, respectively.
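Proposition 2.29 (Variation of index sets) Let $I$ be a non-empty index set with
non-empty subsets $J$ and $K$. Let $(X_i)_{i \in I}$ be an indexed family of sets. Then the
following hold:
1. $J \subseteq K \Longrightarrow \bigcup_{i \in J} X_i \subseteq \bigcup_{i \in K} X_i$ and $\bigcap_{i \in K} X_i \subseteq \bigcap_{i \in J} X_i$,
2. $\left(\bigcup_{i \in J} X_i\right) \cup \left(\bigcup_{i \in K} X_i\right) = \bigcup_{i \in J \cup K} X_i$,
3. $\left(\bigcap_{i \in J} X_i\right) \cap \left(\bigcap_{i \in K} X_i\right) = \bigcap_{i \in J \cup K} X_i$.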
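Proof.
1. Let $J \subseteq K$. We only prove the second statement. The first part of the
statement is left to the reader (see Problem 2.16). To this end, let $x \in \bigcap_{i \in K} X_i$.
Then $x \in X_i$ for all $i \in K$. Since $J \subseteq K$, this means that $x \in X_i$, in particular,
for all $i \in J$. But this means that $x \in \bigcap_{i \in J} X_i$.
2. We prove this statement by considering both inclusions separately.
"$\subseteq$" Let $x \in \left(\bigcup_{i \in J} X_i\right) \cup \left(\bigcup_{i \in K} X_i\right)$. This means that $x \in \bigcup_{i \in J} X_i$ or
$x \in \bigcup_{i \in K} X_i$. Hence, there exists an $i \in J$ such that $x \in X_i$ or there exists
an $i \in K$ such that $x \in X_i$. Altogether, this means that there exists an $i \in J \cup K$
such that $x \in X_i$, which means $x \in \bigcup_{i \in J \cup K} X_i$.
"$\supseteq$" Now let $x \in \bigcup_{i \in J \cup K} X_i$. Then there exists $i \in J \cup K$ such that $x \in X_i$.
Hence, there exists $i \in J$ such that $x \in X_i$ or there exists $i \in K$ such that $x \in X_i$.
This means $x \in \bigcup_{i \in J} X_i$ or $x \in \bigcup_{i \in K} X_i$, i.e. $x \in \left(\bigcup_{i \in J} X_i\right) \cup \left(\bigcup_{i \in K} X_i\right)$.
3. We leave this proof to the reader (see Problem 2.16).
$\Box$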
Next we discuss some basic properties of unions and intersections of indexed families of sets, analogous to those that we have discussed for the union and intersection of two sets.


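Proposition 2.30 (Inclusion, union and intersection) Let $I$ be a non-empty
index set and let $(X_i)_{i \in I}$ be an indexed family of sets and let $Y$ be another set. Then
the following hold:
1. $(\forall k \in I)\; \bigcap_{i \in I} X_i \subseteq X_k$,
2. $(\forall k \in I)\; X_k \subseteq \bigcup_{i \in I} X_i$,
3. $(\forall k \in I)\; Y \subseteq X_k \iff Y \subseteq \bigcap_{i \in I} X_i$,
4. $(\forall k \in I)\; X_k \subseteq Y \iff \bigcup_{i \in I} X_i \subseteq Y$.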
Proof.
1. Let $k \in I$. If $x \in \bigcap_{i \in I} X_i$, then $x \in X_i$ for all $i \in I$. In particular, $x \in X_k$.
2. We leave this proof to the reader (see Problem 2.17).
3. We prove both directions of the equivalence separately.
"$\Longrightarrow$" Let $Y \subseteq X_k$ for all $k \in I$ and let $x \in Y$. Then $x \in X_k$ for all $k \in I$.
Hence $x \in \bigcap_{i \in I} X_i$.
"$\Longleftarrow$" Let now $Y \subseteq \bigcap_{i \in I} X_i$. Fix a $k \in I$ and let $x \in Y$. Then $x \in \bigcap_{i \in I} X_i$
and hence $x \in X_i$ for all $i \in I$. In particular, $x \in X_k$, which was to be proved.
4. We leave this proof to the reader (see Problem 2.17).
$\Box$
Also the distributivity law that we had formulated for union and intersection of
three sets can be generalised to the case of indexed families of sets.
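Proposition 2.31 (Distributivity) Let $(Y_i)_{i \in I}$ be an indexed family of sets over
a non-empty index set $I$. Let $X$ be another set. Then the following hold true:
1. $X \cap \left(\bigcup_{i \in I} Y_i\right) = \bigcup_{i \in I} (X \cap Y_i)$,
2. $X \cup \left(\bigcap_{i \in I} Y_i\right) = \bigcap_{i \in I} (X \cup Y_i)$.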
Proof.
1. We prove both inclusions separately.
"$\subseteq$" Let $x \in X \cap \left(\bigcup_{i \in I} Y_i\right)$. Then $x \in X$ and $x \in \bigcup_{i \in I} Y_i$. Hence, there exists
an $i \in I$ such that $x \in Y_i$. But this means that there exists $i \in I$ such that
$x \in X$ and $x \in Y_i$, i.e. there is an $i \in I$ such that $x \in X \cap Y_i$. Altogether,
$x \in \bigcup_{i \in I} (X \cap Y_i)$.
"$\supseteq$" Let $x \in \bigcup_{i \in I} (X \cap Y_i)$. Then there exists $i \in I$ such that $x \in X \cap Y_i$, i.e.
such that $x \in X$ and $x \in Y_i$. In particular, $x \in X$ and there exists $i \in I$ such
that $x \in Y_i$ and we obtain $x \in X \cap \left(\bigcup_{i \in I} Y_i\right)$.
2. We leave this proof to the reader (see Problem 2.18).
$\Box$
Next we generalise de Morgan's law about unions and intersections in a set
difference to the general case of unions and intersections of families of sets.
Proposition 2.32 (Generalised de Morgan's law) Let $I$ be a non-empty index
set. Let $X$ be a set and $(Y_i)_{i \in I}$ an indexed family of sets. Then
1. $X \setminus \left(\bigcup_{i \in I} Y_i\right) = \bigcap_{i \in I} (X \setminus Y_i)$,
2. $X \setminus \left(\bigcap_{i \in I} Y_i\right) = \bigcup_{i \in I} (X \setminus Y_i)$.
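Proof.
1. We leave this proof to the reader; it is analogous to the proof of the second statement.
2. We prove both inclusions separately (see also Problem 2.18).
"$\subseteq$" Let $x \in X \setminus \left(\bigcap_{i \in I} Y_i\right)$. Then $x \in X$ and $x \notin \bigcap_{i \in I} Y_i$. The latter means
that it is not the case that $x \in Y_i$ for all $i \in I$. In other words, this means that
there is an $i \in I$ such that $x \notin Y_i$. Hence there is an $i \in I$ such that $x \in X \setminus Y_i$,
which means $x \in \bigcup_{i \in I} (X \setminus Y_i)$.
"$\supseteq$" Let $x \in \bigcup_{i \in I} (X \setminus Y_i)$. Then there is some $i \in I$ such that $x \in X$ and
$x \notin Y_i$. Hence, it is not the case that for all $i \in I$ we have $x \in Y_i$ and
this means that it is not the case that $x \in \bigcap_{i \in I} Y_i$. Altogether, we obtain
$x \in X \setminus \left(\bigcap_{i \in I} Y_i\right)$.
$\Box$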
Once again, there is a way to formulate de Morgan's law for complements.
Corollary 2.33 Let $I$ be a non-empty index set. Let $X$ be a fixed set and let $(Y_i)_{i \in I}$
be a family of subsets $Y_i \subseteq X$. Then we obtain (with all complements taken with
respect to $X$):
1. $\left(\bigcup_{i \in I} Y_i\right)^c = \bigcap_{i \in I} Y_i^c$,
2. $\left(\bigcap_{i \in I} Y_i\right)^c = \bigcup_{i \in I} Y_i^c$.

Problems
2.14 Let $X_1$ and $X_2$ be sets. Prove that the following hold:
1. $X_1 \cup X_2 = \bigcup_{i=1}^{2} X_i$,
2. $X_1 \cap X_2 = \bigcap_{i=1}^{2} X_i$.

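2.15 We consider the set $k\mathbb{N} = \{mk \in \mathbb{N} : m \in \mathbb{N}\}$ of all multiples of $k \in \mathbb{N}$. Prove
$$\mathbb{P} = \mathbb{N} \setminus \left(\{1\} \cup \bigcup_{k=2}^{\infty} (k\mathbb{N} \setminus \{k\})\right).$$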

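2.16 Let $I$ be a non-empty index set with non-empty subsets $J$ and $K$. Let $(X_i)_{i \in I}$ be an
indexed family of sets. Prove that the following holds:
1. $J \subseteq K \Longrightarrow \bigcup_{i \in J} X_i \subseteq \bigcup_{i \in K} X_i$,
2. $\left(\bigcap_{i \in J} X_i\right) \cap \left(\bigcap_{i \in K} X_i\right) = \bigcap_{i \in J \cup K} X_i$.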
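2.17 Let $I$ be a non-empty index set and let $(X_i)_{i \in I}$ be an indexed family of sets and let
$Y$ be another set. Prove that the following hold:
1. $(\forall k \in I)\; X_k \subseteq \bigcup_{i \in I} X_i$,
2. $(\forall k \in I)\; X_k \subseteq Y \iff \bigcup_{i \in I} X_i \subseteq Y$.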
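2.18 Let $I$ be a non-empty index set. Let $X$ be a set and $(Y_i)_{i \in I}$ an indexed family of sets.
Prove that:
1. $X \cap \left(\bigcup_{i \in I} Y_i\right) = \bigcup_{i \in I} (X \cap Y_i)$,
2. $X \setminus \left(\bigcap_{i \in I} Y_i\right) = \bigcup_{i \in I} (X \setminus Y_i)$.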
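2.19 Let $I$ be a non-empty index set. Let $Y$ be a set and $(X_i)_{i \in I}$ an indexed family of sets.
Prove that
1. $\left(\bigcup_{i \in I} X_i\right) \setminus Y = \bigcup_{i \in I} (X_i \setminus Y)$,
2. $\left(\bigcap_{i \in I} X_i\right) \setminus Y = \bigcap_{i \in I} (X_i \setminus Y)$.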

2.8 Power Sets

Besides the method of comprehension we have already seen that the union of sets
(or of an indexed family of sets) is a way to define new sets from given sets.
Another very important set theoretical construction that cannot be subsumed under
comprehension is the power set construction. Given a set $X$ the power set $2^X$ is the
set of all subsets of $X$, which is much larger than $X$ itself.
Definition 2.34 (Power set) Let $X$ be a set. Then
$$2^X := \{Y : Y \subseteq X\}$$
is called the power set of $X$.
Some authors also write $\mathcal{P}(X)$ instead of $2^X$. The power set $2^X$ of any set $X$ is
always non-empty, since $\emptyset \in 2^X$. We mention some examples.
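Example 2.35 We obtain the following (try to verify these examples!):
1. $2^{\emptyset} = \{\emptyset\}$,
2. $2^{(2^{\emptyset})} = 2^{\{\emptyset\}} = \{\emptyset, \{\emptyset\}\}$,
3. $2^{(2^{(2^{\emptyset})})} = \{\emptyset, \{\emptyset\}, \{\{\emptyset\}\}, \{\emptyset, \{\emptyset\}\}\}$,
4. $2^{\{0,1\}} = \{\emptyset, \{0\}, \{1\}, \{0, 1\}\}$,
5. $\emptyset, \mathbb{N}, \mathbb{P}, k\mathbb{N}, \{k\} \in 2^{\mathbb{N}}$ for all $k \in \mathbb{N}$.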
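The finite examples above can be checked mechanically. The following Python sketch is one possible way to compute the power set of a finite set; the name `power_set` is an assumption of this sketch, and `frozenset` is used because Python sets may only contain hashable elements.

```python
from itertools import chain, combinations


def power_set(xs):
    """The set of all subsets of the finite set xs, as a set of frozensets."""
    items = list(xs)
    # Collect the subsets of each size r = 0, 1, ..., len(items).
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return {frozenset(s) for s in subsets}


empty = frozenset()
p1 = power_set(empty)   # 2^{} = {{}}
p2 = power_set(p1)      # 2^(2^{}) = {{}, {{}}}
p3 = power_set(p2)      # has four elements
p01 = power_set({0, 1})
```

A set with $n$ elements has $2^n$ subsets, which is one motivation for the notation $2^X$.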
The power set $2^{\mathbb{N}}$ of $\mathbb{N}$ is very large and contains many sets. The example only
lists a few of those. Later on, we will discuss the size of sets and we will see that
the power set $2^X$ of a set $X$ is usually much larger than $X$ itself. We collect some
important properties of the power set in the following proposition.
Proposition 2.36 (Power set) Let $X$ and $Y$ be sets. Then the following holds
true:
1. $X \subseteq Y \iff 2^X \subseteq 2^Y$, (monotonicity)
2. $2^X \cap 2^Y = 2^{X \cap Y}$,
3. $2^X \cup 2^Y \subseteq 2^{X \cup Y}$.
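Proof.
1. We prove both implications separately.
"$\Longrightarrow$" Let $X \subseteq Y$. We have to prove $2^X \subseteq 2^Y$. Let $A \in 2^X$. This means
$A \subseteq X$. Since $X \subseteq Y$, we get $A \subseteq Y$ by transitivity of the inclusion relation
(see Problem 2.2). This means $A \in 2^Y$.
"$\Longleftarrow$" Now let $2^X \subseteq 2^Y$. We have to prove $X \subseteq Y$. Let $x \in X$. Then
$\{x\} \subseteq X$, i.e. $\{x\} \in 2^X$ and hence $\{x\} \in 2^Y$ since $2^X \subseteq 2^Y$. But this means
$\{x\} \subseteq Y$ and hence $x \in Y$.
2. We prove both inclusions separately.
"$\subseteq$" Let $A \in 2^X \cap 2^Y$. Then $A \in 2^X$ and $A \in 2^Y$. That is $A \subseteq X$ and $A \subseteq Y$.
By Proposition 2.15 this implies $A \subseteq X \cap Y$, i.e. $A \in 2^{X \cap Y}$.
"$\supseteq$" Let $A \in 2^{X \cap Y}$. Then $A \subseteq X \cap Y$, which implies by Proposition 2.15 that
$A \subseteq X$ and $A \subseteq Y$. Hence $A \in 2^X$ and $A \in 2^Y$, i.e. $A \in 2^X \cap 2^Y$.
3. This proof is left to the reader (see Problem 2.20).
$\Box$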
We note that the inverse inclusion of the last statement (3) does not hold true in
general (see Problem 2.20). We close this section by introducing another notation
that is commonly used in mathematics in order to denote unions and intersections
and that is best expressed using the power set.


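Definition 2.37 (Union and intersection for sets of subsets) Let $X$ be a set
and let $\mathcal{S} \subseteq 2^X$. Then we define
1. $\bigcup \mathcal{S} := \bigcup_{S \in \mathcal{S}} S$,
2. $\bigcap \mathcal{S} := \bigcap_{S \in \mathcal{S}} S$.
In order to make this more precise, we consider $\mathcal{S}$ here as an index set. Then we
can define an indexed family of sets $(X_S)_{S \in \mathcal{S}}$ where $X_S := S$ and we obtain
$$\bigcup \mathcal{S} = \bigcup_{S \in \mathcal{S}} S = \bigcup_{S \in \mathcal{S}} X_S \quad\text{and}\quad \bigcap \mathcal{S} = \bigcap_{S \in \mathcal{S}} S = \bigcap_{S \in \mathcal{S}} X_S.$$
This is the precise interpretation of the definition above and it shows that this is
not a new concept. It is just another way of writing the union and intersection of
indexed families of sets in a way that is sometimes more convenient.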

Problems
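2.20 Let $X$ and $Y$ be sets. Prove that
1. $2^X \cup 2^Y \subseteq 2^{X \cup Y}$,
2. $2^X \cup 2^Y = 2^{X \cup Y} \iff X \subseteq Y$ or $Y \subseteq X$.
2.21 Let $(X_i)_{i \in I}$ be an indexed family of sets. Prove that
1. $2^{\bigcap_{i \in I} X_i} = \bigcap_{i \in I} 2^{X_i}$,
2. $2^{\bigcup_{i \in I} X_i} \supseteq \bigcup_{i \in I} 2^{X_i}$.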

2.9 Product of Sets

When we discussed the twin prime conjecture, we have already spoken about pairs
$(p, q)$ of prime numbers. The essential idea of a pair $(p, q)$ is that it is ordered,
i.e. it matters in which position $p$ and $q$ appear, respectively. This distinguishes an
ordered pair from a set $\{p, q\}$. We could leave the definition of a pair intuitive, but
it is also relatively simple to define a pair more precisely using sets. This idea of
formalizing pairs goes back to Kuratowski.
Definition 2.38 (Kuratowski pair) Let $X$ be a set with $x, y \in X$. Then we
define the Kuratowski pair, or for short the pair $(x, y)$, as follows:
$$(x, y) := \{\{x\}, \{x, y\}\}.$$
Essentially, this definition of a pair is not really used in practice in mathematics,
but only the following property of pairs is of importance. That is, as soon as you
have understood the following proposition and its proof, you can forget the previous
definition.



Proposition 2.39 (Equality of pairs) Let $X$ be a set with $x_1, x_2, y_1, y_2 \in X$.
Then
$$(x_1, y_1) = (x_2, y_2) \iff (x_1 = x_2 \text{ and } y_1 = y_2).$$
Proof. We prove both implications separately.
"$\Longrightarrow$" Let $(x_1, y_1) = (x_2, y_2)$. Then we obtain
$$\{x_1\} = \{x_1\} \cap \{x_1, y_1\} = \bigcap (x_1, y_1) = \bigcap (x_2, y_2) = \{x_2\} \cap \{x_2, y_2\} = \{x_2\},$$
which implies $x_1 = x_2$. Moreover, we obtain
$$\{x_1, y_1\} = \{x_1\} \cup \{x_1, y_1\} = \bigcup (x_1, y_1) = \bigcup (x_2, y_2) = \{x_2\} \cup \{x_2, y_2\} = \{x_2, y_2\}.$$
Now we make a case distinction.
1. Case: $x_1 = y_1$. Then $\{x_1\} = \{x_1, y_1\} = \{x_2, y_2\}$ and hence $y_2 \in \{x_1\}$, i.e.
$y_2 = x_1 = y_1$.
2. Case: $x_1 \neq y_1$. Then $\{y_1\} = \{x_1, y_1\} \setminus \{x_1\} = \{x_2, y_2\} \setminus \{x_2\}$ and hence
$y_1 \in \{x_2, y_2\} \setminus \{x_2\}$, which implies $y_1 = y_2$.
"$\Longleftarrow$" If $x_1 = x_2$ and $y_1 = y_2$, then obviously the sets $(x_1, y_1) = \{\{x_1\}, \{x_1, y_1\}\}$
and $(x_2, y_2) = \{\{x_2\}, \{x_2, y_2\}\}$ coincide.
$\Box$
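The Kuratowski construction and the idea of the proof can be sketched in Python with nested `frozenset` values. The helper names `kuratowski_pair`, `first` and `second` are hypothetical; `first` uses the intersection $\bigcap (x, y) = \{x\}$ and `second` uses the union $\bigcup (x, y) = \{x, y\}$, mirroring the case distinction above.

```python
def kuratowski_pair(x, y):
    """(x, y) := {{x}, {x, y}} encoded with frozensets."""
    return frozenset({frozenset({x}), frozenset({x, y})})


def first(pair):
    """Recover x: the intersection of the members of the pair is {x}."""
    (x,) = frozenset.intersection(*pair)
    return x


def second(pair):
    """Recover y from the union {x, y} of the members of the pair."""
    union = frozenset.union(*pair)
    rest = union - frozenset.intersection(*pair)
    # Case x != y: rest = {y}.  Case x = y: rest is empty and y = x.
    (y,) = rest if rest else union
    return y


p = kuratowski_pair(1, 2)
q = kuratowski_pair(2, 1)
d = kuratowski_pair(3, 3)   # collapses to {{3}}
```

In contrast to the sets $\{1, 2\}$ and $\{2, 1\}$, the two encoded pairs `p` and `q` are different objects.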
What distinguishes a pair $(x, y)$ from the set $\{x, y\}$ is the fact that the pair is
ordered, i.e. the position in which $x$ and $y$ occur matters, whereas this aspect is
irrelevant in case of the set $\{x, y\}$. Two sets $\{x_1, y_1\}$ and $\{x_2, y_2\}$ are equal if and
only if they contain exactly the same elements, whereas the pairs $(x_1, y_1)$ and $(x_2, y_2)$
are equal if and only if they contain exactly the same elements in exactly the same
positions. Now we use pairs in order to define the product of two sets, which is also
called the Cartesian product after René Descartes.
Definition 2.40 (Cartesian product) Let $X$ and $Y$ be sets. Then
$$X \times Y := \{(x, y) : x \in X \text{ and } y \in Y\}$$
is called the Cartesian product or just the product of $X$ and $Y$.
We give a few examples of products of sets.
Example 2.41 The following sets are examples of products:
1. $\mathbb{N} \times \mathbb{N}$ is the set of pairs of natural numbers,
2. $2\mathbb{N} \times (\mathbb{N} \setminus 2\mathbb{N}) = \{(n, k) \in \mathbb{N} \times \mathbb{N} : n \text{ even and } k \text{ odd}\}$,
3. $\{(p, q) \in \mathbb{P} \times \mathbb{P} : q - p = 2\}$ is the set of twin primes.
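For finite sets the product can be computed directly. The sketch below uses Python tuples in place of Kuratowski pairs and checks a truncated version of the second example, with the hypothetical set `N10` $= \{0, ..., 9\}$ standing in for $\mathbb{N}$.

```python
from itertools import product


def cartesian(xs, ys):
    """X x Y as a set of Python tuples (x, y)."""
    return set(product(xs, ys))


# Hypothetical finite stand-in for N: the numbers 0, ..., 9.
N10 = set(range(10))
evens = {n for n in N10 if n % 2 == 0}   # 2N, truncated
odds = N10 - evens                        # N \ 2N, truncated

lhs = cartesian(evens, odds)
rhs = {(n, k) for (n, k) in cartesian(N10, N10) if n % 2 == 0 and k % 2 == 1}
```

Here `lhs` and `rhs` describe the same set of 25 pairs, once as a product and once by comprehension inside the larger product.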
In the following proposition we capture some basic properties of the product of
sets.

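Proposition 2.42 (Products) Let $W$, $X$, $Y$ and $Z$ be sets. Then the following
hold:
1. $X \times \emptyset = \emptyset \times X = \emptyset$,
2. $X \subseteq Y$ and $W \subseteq Z \Longrightarrow X \times W \subseteq Y \times Z$, (monotonicity)
3. $X \times (Y \cup Z) = (X \times Y) \cup (X \times Z)$, (distributivity)
4. $X \times (Y \cap Z) = (X \times Y) \cap (X \times Z)$, (distributivity)
5. $(X \times Y) \cap (W \times Z) = (X \cap W) \times (Y \cap Z)$,
6. $(X \times Y) \cup (W \times Z) \subseteq (X \cup W) \times (Y \cup Z)$.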
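Proof.
1., 2. and 3. are left to the reader (see Problem 2.22).
4. We prove both inclusions separately.
"$\subseteq$" Let $(x, y) \in X \times (Y \cap Z)$. Then $x \in X$ and $y \in Y \cap Z$. The latter means
$y \in Y$ and $y \in Z$. Hence $(x, y) \in X \times Y$ and $(x, y) \in X \times Z$, which means
$(x, y) \in (X \times Y) \cap (X \times Z)$.
"$\supseteq$" Let $a \in (X \times Y) \cap (X \times Z)$. Then $a \in (X \times Y)$ and $a \in (X \times Z)$. This means
that there are $x \in X$, $y \in Y$ and $z \in Z$ such that $a = (x, y)$ and $a = (x, z)$.
This implies $y = z$. In particular, $y \in Y \cap Z$ and thus $a = (x, y) \in X \times (Y \cap Z)$.
5. We prove both inclusions separately.
"$\subseteq$" Let $a \in (X \times Y) \cap (W \times Z)$. Then $a \in (X \times Y)$ and $a \in (W \times Z)$. Hence
there are $x \in X$, $y \in Y$, $w \in W$ and $z \in Z$ such that $a = (x, y)$ and $a = (w, z)$.
This implies $x = w$ and $y = z$ and hence $x \in X \cap W$ and $y \in Y \cap Z$. Thus
$a = (x, y) \in (X \cap W) \times (Y \cap Z)$.
"$\supseteq$" Let now $(x, y) \in (X \cap W) \times (Y \cap Z)$. Then $x \in X \cap W$ and $y \in Y \cap Z$, i.e.
$x \in X$ and $x \in W$ and $y \in Y$ and $y \in Z$. Thus $(x, y) \in (X \times Y) \cap (W \times Z)$.
6. This proof is left to the reader (see Problem 2.22).
$\Box$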
We point out that the inverse inclusion in 6. does not hold true in general (see
Problem 2.22). The diagram in Figure 2.5 illustrates the products $X \times Y$ and
$W \times Z$ in a coordinate system (this is not a Venn diagram!). The first components
of pairs are illustrated on the horizontal axis whereas the second components are
illustrated on the vertical axis. One can see why the intersection is a product
(= rectangle) itself and why the union is not. However, this does not constitute a
formal proof.

Figure 2.5: The products $X \times Y$ and $W \times Z$ in a coordinate system

The definition of pairs can easily be generalised to higher arities. By the arity
we mean the number $n$ of components in a tuple $(x_1, x_2, ..., x_n)$. For instance, we
could define triples by $(x_1, x_2, x_3) := (x_1, (x_2, x_3))$ and then we could prove that
$(x_1, x_2, x_3) = (y_1, y_2, y_3)$ if and only if $x_1 = y_1$ and $x_2 = y_2$ and $x_3 = y_3$. We will not
work this out formally here, but we will take an intuitive understanding of $n$-tuples
$(x_1, ..., x_n)$ for an $n \in \mathbb{N}$ from now on. That is, we assume
$$(x_1, ..., x_n) = (y_1, ..., y_n) \iff (\forall i \in \{1, ..., n\})\; x_i = y_i.$$
By the way, $n$-tuples are called pairs, triples, quadruples and quintuples for $n = 2, 3, 4$
and $5$, respectively. There is also a tuple $()$ of arity 0, which is sometimes called
the empty tuple or empty word. We do not distinguish between tuples of arity 1 and
their only component, i.e. $(x) = x$. Using tuples of arbitrary arity $n$ we can now
also generalise the Cartesian product to higher arities.
Definition 2.43 (Generalised Cartesian product) Let $X_1, ..., X_n$ be sets with
$n \in \mathbb{N}$. Then we define
$$\mathop{\times}_{i=1}^{n} X_i := \{(x_1, x_2, ..., x_n) : (\forall i \in \{1, ..., n\})\; x_i \in X_i\}.$$

Later on, we can even further generalize this definition to products over families
of sets, but we first need to define what an infinite (or indexed) tuple is for this
purpose and we do not have such a definition at hand yet. An important special
case of the previous definition is the situation where all the sets $X_i$ are the same set.
In this case we simply write
$$X^n := \mathop{\times}_{i=1}^{n} X = \underbrace{X \times ... \times X}_{n \text{ times}}$$
and call this the $n$-fold product of the set $X$ with itself. We also allow the special
case $n = 0$ here, in which case we obtain a singleton $X^0 = \{()\}$ with the empty
tuple.

Problems
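2.22 Let $W$, $X$, $Y$ and $Z$ be sets. Prove that the following holds:
1. $X \times \emptyset = \emptyset \times X = \emptyset$,
2. $X \subseteq Y$ and $W \subseteq Z \Longrightarrow X \times W \subseteq Y \times Z$,
3. $X \times (Y \cup Z) = (X \times Y) \cup (X \times Z)$, (distributivity)
4. $(X \times Y) \cup (W \times Z) \subseteq (X \cup W) \times (Y \cup Z)$.
5. Prove that there are sets $X, Y, Z, W$ such that the inverse inclusion in the previous
statement does not hold.
2.23 Let $X$ be a set, $I$ a non-empty set and $(Y_i)_{i \in I}$ an indexed family of sets over $I$. Prove
that
1. $X \times \left(\bigcup_{i \in I} Y_i\right) = \bigcup_{i \in I} (X \times Y_i)$,
2. $X \times \left(\bigcap_{i \in I} Y_i\right) = \bigcap_{i \in I} (X \times Y_i)$.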

2.10 Disjoint Union of Sets

Sometimes one would like to define a union of two sets $X$, $Y$ such that one can keep
track of which set the elements actually originate from. This is important, in
particular, when $X$ and $Y$ have non-empty intersection.
Definition 2.44 (Disjoint union) Let $X$ and $Y$ be sets. Then we define the disjoint union by
$$X \sqcup Y := (\{1\} \times X) \cup (\{2\} \times Y).$$
Sometimes the disjoint union is also denoted by $X + Y$ or by $X \oplus Y$ and sometimes
it is called discriminated union or tagged union. Here the numbers 1 and 2 in the
first component are used like a label that indicates which set, either $X$ or $Y$, the
elements originate from. If one takes the ordinary union $X \cup Y$ of two sets $X$ and
$Y$ that are not disjoint, i.e. such that $X \cap Y \neq \emptyset$, then the information whether an
element of $X \cup Y$ originates from $X$ or $Y$ (or both) is lost in the set $X \cup Y$. Properties
of the disjoint union can easily be derived from properties of the ordinary union and
the set product and we are not going to study such properties here. We just mention
that the disjoint union can be generalized to families of sets.
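The effect of the labels in Definition 2.44 can be seen on a small example. The following Python sketch assumes two hypothetical overlapping sets `X` and `Y` and represents tagged elements as tuples `(label, element)`.

```python
def disjoint_union(xs, ys):
    """({1} x X) u ({2} x Y): every element carries the label of its origin."""
    return {(1, x) for x in xs} | {(2, y) for y in ys}


# Hypothetical sets with non-empty intersection {1}.
X = {0, 1}
Y = {1, 2}

ordinary = X | Y                 # the origin of the element 1 is lost
tagged = disjoint_union(X, Y)    # contains both (1, 1) and (2, 1)
```

The ordinary union has three elements, while the disjoint union has $|X| + |Y| = 4$ elements, because the common element appears once with each label.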
Definition 2.45 (Disjoint union of a family of sets) Let $(X_i)_{i \in I}$ be an indexed
family of sets. Then we define the disjoint union of this family by
$$\bigsqcup_{i \in I} X_i := \bigcup_{i \in I} (\{i\} \times X_i).$$
Sometimes, the disjoint union is also denoted by $\sum_{i \in I} X_i$ or $\bigoplus_{i \in I} X_i$ or by
$\coprod_{i \in I} X_i$. Sometimes one would like to consider sets $X$ that contain tuples $(x_1, ..., x_n)$
of different arities $n \in \mathbb{N}$. One can capture this idea using the operation $*$ on sets
that can be expressed as disjoint union.
Definition 2.46 (Sets of finite words) Let $X$ be a set. Then the set $X^*$ of finite
words over $X$ is defined by
$$X^* := \bigsqcup_{n \in \mathbb{N}} X^n = \bigcup_{n \in \mathbb{N}} (\{n\} \times X^n).$$
The operation $*$ on sets is also called Kleene star operation. Strictly speaking,
any element of $X^*$ has the form $(i, x_1, ..., x_i)$ for some $i \in \mathbb{N}$. This includes the case
0 = (0). One usually only writes $(x_1, ..., x_i)$ instead of $(i, x_1, ..., x_i)$ in this situation with
the understanding that $i$ is defined implicitly by the number of arguments in the
tuple $(x_1, ..., x_i)$. In this abbreviated notation one obtains 0 = (). The Kleene star
operation has many applications also in computer science, where it is used to describe
regular languages. We give some examples of how it can be used in mathematics.
Example 2.47 Here are some examples.
1. We want to create a set $E \subseteq \{0, 1, 2, 3, 4, 5, 6, 7, 8, 9\}^*$ that contains all decimal
expansions of natural numbers without leading zeros. That is
$$E := \{(n_1, ..., n_k) \in \{0, 1, 2, 3, 4, 5, 6, 7, 8, 9\}^* : k = 1 \text{ or } (k > 1 \text{ and } n_1 \neq 0)\}.$$
2. We want to create a set $D \subseteq \mathbb{N}^*$ that contains chains of numbers that divide
each other. That is
$$D := \{(n_1, ..., n_k) \in \mathbb{N}^* : k \geq 2 \text{ and } (\forall i \in \{2, ..., k\})\; n_{i-1} \mid n_i\}.$$
That is, $(2, 4)$ and $(3, 6, 12, 36)$ are examples of elements in $D$.
We have used the simplified way to denote elements of sets of finite words that has
been described above.
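Membership in the two sets of Example 2.47 can be tested with short predicates. The sketch below represents finite words as Python tuples in the abbreviated notation (without the leading length label); the names `in_E`, `in_D` and `divides` are hypothetical, and divisibility is read as "$a \mid b$ if $b = a \cdot c$ for some $c \in \mathbb{N}$".

```python
DIGITS = set(range(10))


def divides(a, b):
    """a | b for natural numbers, i.e. b = a * c for some natural number c."""
    return b == 0 if a == 0 else b % a == 0


def in_E(word):
    """Decimal expansions without leading zeros: k = 1 or (k > 1 and n_1 != 0)."""
    k = len(word)
    if not all(d in DIGITS for d in word):
        return False
    return k == 1 or (k > 1 and word[0] != 0)


def in_D(word):
    """Chains of length k >= 2 in which each entry divides the next one."""
    k = len(word)
    return k >= 2 and all(divides(word[i - 1], word[i]) for i in range(1, k))
```

For instance, `in_E((1, 0))` accepts the expansion of ten, while `in_E((0, 1))` rejects the leading zero; the empty word `()` belongs to neither set.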

3 Logic

Logic is the anatomy of thought.
John Locke (1632–1704)

3.1 What is Logic?

Since ancient times logic was mostly considered as the art of proper and systematic
reasoning. Aristotle's work on analytics (as he called what we call logic nowadays)
was considered for a long time as the major work in logic and for almost 2000
years there was not much progress in this discipline. This changed radically at the
end of the 19th century when logic became an active field of research again within
mathematics. Nowadays logic is a rich subfield of mathematics that has many subdisciplines of its own, such as model theory, proof theory and computability theory.
There are many applications of particular branches of logic in other disciplines such
as computer science and philosophy, but also within algebra, analysis or other mathematical areas. Within mathematics logic is also the major foundational subdiscipline
which undertakes a reflection about mathematics with mathematical methods and
this is what has been called metamathematics. Gödel's results have spectacularly
contributed to the understanding of the limitations of mathematics and perhaps also
the scientific method in general and they are part of the jewels that 20th century
mathematics has produced.
The purpose of this section is neither to introduce any particular knowledge in
logic nor to introduce the subject as a foundational discipline. We will rather take
a naive approach to logic (similarly to set theory) and we will try to highlight the
relevance of logic as it is used on a day-to-day basis by any working mathematician.
Essentially, we will just look at how we have used logic so far and we will emphasize
and collect the rules of logical reasoning that we have already used.

3.2 Propositional Logic

Propositional logic is the part of logic that deals with the logical combination of
mathematical propositions without considering any particular mathematical objects.
Informal definition of a proposition
A proposition is a well-defined mathematical statement that is either true or false.
We will typically denote propositions by variables using letters $A, B, C, ...$. The
truth values "true" and "false" are sometimes denoted by "t" and "f". We will
denote them by 1 (for "true") and 0 (for "false"). If we have two propositions $A$
and $B$, which both are either true or false, then we have altogether $2^2 = 4$ different
possibilities of assigning truth values to the pair $(A, B)$ and correspondingly we
have $2^4 = 16$ different binary logical operations that we can consider. We will only
consider a small subset of these and we will define them in the following definition
via a truth table.
Definition 3.1 (Logical operations) Let $A$ and $B$ be propositions. Then we
define the logical operations of negation $\neg A$, conjunction $A \wedge B$, disjunction $A \vee B$,
implication $A \Longrightarrow B$ and equivalence $A \iff B$ via the following table of truth
values:

A   B   $\neg A$   $A \wedge B$   $A \vee B$   $A \Longrightarrow B$   $A \iff B$
0   0   1          0              0            1                       1
0   1   1          0              1            1                       0
1   0   0          0              1            0                       0
1   1   0          1              1            1                       1

The symbols $\bot$ and $\top$ can be read as constant logical operations with truth
values 0 and 1, respectively. The way to read this table is that it tells us what we
actually mean when we say "$A$ and $B$ is true": we mean that $A$ and $B$ both have the
truth value 1. Similarly, "$A$ or $B$ is true" means that at least one (possibly both)
of the propositions $A$ and $B$ have the truth value 1. This should be distinguished
from the "exclusive or" that we occasionally mean when we say "or" in our daily
language and that excludes the option that both $A$ and $B$ are true. The "exclusive
or" operation is sometimes denoted by $A + B$ or $A \oplus B$ and it is true if and only
if exactly one of $A$ and $B$ is true. However, we will not further use this operation
here and hence we have not included it in the table above. Note that by definition
$A \Longrightarrow B$ is considered as true if and only if the statement "if $A$ is true, then $B$
is true" is true. By definition this is always correct, if $A$ is not true (no matter
what the truth value of $B$ is). That means that implications are closely related to
disjunctions and we capture this relation in the following proposition.



But before we do this, we point out that whenever $A$ and $B$ are propositions,
then also $\neg A$, $A \wedge B$, $A \vee B$, $A \Longrightarrow B$ and $A \iff B$ are propositions. Such propositions that only involve logical operations and propositional variables are called
logical formulas. If we combine several propositions, then we will use parentheses to
make the order of combinations clear. In case of doubt we use the rule that negation binds stronger than any other operation, followed by conjunction, disjunction,
implication and equivalence in this order of priority. Some logical propositions have
truth values that do not depend on the involved propositional variables.
Definition 3.2 (Tautology) A proposition that involves finitely many propositional variables A, B, C, ... and that has the property that its truth value is always
1, irrespectively of the truth values of A, B, C, ..., is called a tautology.
We give some examples.
Example 3.3 We consider the following logical formulas.
1. $A \wedge B$ is not a tautology, since it depends on the truth values of $A$ and $B$
whether $A \wedge B$ is true or not.
2. $\neg(A \wedge \neg A)$ is a tautology, since it is always true, irrespective of the truth
value of $A$. (Law of Contradiction)
3. $(A \vee \neg A)$ is a tautology. (Principle of Excluded Middle)
4. $A \iff \neg\neg A$ is a tautology. (Double Negation Law)
5. $\top$ is a tautology since it is true and there are no propositional variables involved.
How can we actually find out whether a logical formula is a tautology or not?
This can be done with the truth table method that we illustrate in the proof of the
next proposition. We collect a number of tautologies that involve implications.
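Proposition 3.4 (Implication) Let $A$, $B$ and $C$ be propositions. Then the following are tautologies:
1. $((A \Longrightarrow B) \wedge A) \Longrightarrow B$ (modus ponens)
2. $(A \Longrightarrow B) \iff (\neg A \vee B)$ (implication and disjunction)
3. $((A \Longrightarrow B) \wedge (B \Longrightarrow C)) \Longrightarrow (A \Longrightarrow C)$ (hypothetical syllogism)
4. $(A \Longrightarrow B) \iff (\neg B \Longrightarrow \neg A)$ (contraposition law)
5. $(A \iff B) \iff ((A \Longrightarrow B) \wedge (B \Longrightarrow A))$ (equivalence)
6. $((A \wedge B) \Longrightarrow C) \iff (A \Longrightarrow (B \Longrightarrow C))$ (currying)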

Proof. We only prove 2. and leave the other proofs to the reader (see Problem 3.1).
We prove that the given logical formula is a tautology by systematically writing
down its truth table:

A   B   $\neg A$   $(\neg A \vee B)$   $A \Longrightarrow B$   $(A \Longrightarrow B) \iff (\neg A \vee B)$
0   0   1          1                   1                       1
0   1   1          1                   1                       1
1   0   0          0                   0                       1
1   1   0          1                   1                       1

The last column of this table indicates the truth value of the entire logical formula
$(A \Longrightarrow B) \iff (\neg A \vee B)$ depending on the truth values of $A$ and $B$ in the first
two columns. We see that the last column always carries the truth value 1 for true,
irrespective of the truth values of $A$ and $B$. Hence the formula is a tautology. $\Box$
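The truth table method is mechanical and can be sketched in a few lines of Python. The code below is an illustration under the assumption that truth values are the integers 0 and 1 as in Definition 3.1; the helper names (`implies`, `is_tautology` and so on) are hypothetical. Implication is defined by its truth table, so that the check of item 2 is not circular.

```python
from itertools import product

# Truth table of A => B, copied from Definition 3.1.
IMPLIES = {(0, 0): 1, (0, 1): 1, (1, 0): 0, (1, 1): 1}


def neg(a):
    return 1 - a


def conj(a, b):
    return min(a, b)


def disj(a, b):
    return max(a, b)


def implies(a, b):
    return IMPLIES[(a, b)]


def equiv(a, b):
    return 1 if a == b else 0


def is_tautology(formula, num_vars):
    """Evaluate the formula on all 2**num_vars rows of its truth table."""
    rows = product((0, 1), repeat=num_vars)
    return all(formula(*row) == 1 for row in rows)


# Proposition 3.4, item 2: (A => B) <=> (not A or B)
item_2 = is_tautology(lambda a, b: equiv(implies(a, b), disj(neg(a), b)), 2)
# Proposition 3.4, item 1 (modus ponens): ((A => B) and A) => B
item_1 = is_tautology(lambda a, b: implies(conj(implies(a, b), a), b), 2)
# A and B alone is not a tautology.
plain_conjunction = is_tautology(conj, 2)
```

The function visits every row of the table, so its running time doubles with each additional propositional variable.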
In fact, the tautology 2. is sometimes exploited in order to prove implications
and we capture this as a proof method.
Proof Method (Implications as disjunctions)
In order to prove $A \Longrightarrow B$ for two well-defined mathematical statements $A$ and $B$
it is sufficient (and, in fact, logically equivalent) to prove $\neg A \vee B$.
In the previous proposition we have carefully used parentheses to indicate in
which order the logical operations are to be applied. Sometimes, parentheses are
omitted and the operations are ordered in the following priority list:
$$\iff, \quad \Longrightarrow, \quad \vee, \quad \wedge, \quad \neg$$
with increasing priority. That is, a logical formula such as
$$\neg A \wedge B \Longrightarrow C \vee D$$
would be read as
$$((\neg A) \wedge B) \Longrightarrow (C \vee D).$$
However, in case of doubt it is better to use more rather than fewer parentheses. In the
following result we collect some very common other tautologies.
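Proposition 3.5 (Tautologies) Let $A$, $B$ and $C$ be propositions. Then the following are tautologies:
1. $((A \wedge B) \wedge C) \iff (A \wedge (B \wedge C))$ (associativity)
2. $((A \vee B) \vee C) \iff (A \vee (B \vee C))$
3. $(A \wedge B) \iff (B \wedge A)$ (commutativity)
4. $(A \vee B) \iff (B \vee A)$
5. $(A \wedge (B \vee C)) \iff ((A \wedge B) \vee (A \wedge C))$ (distributivity)
6. $(A \vee (B \wedge C)) \iff ((A \vee B) \wedge (A \vee C))$
7. $\neg(A \wedge B) \iff (\neg A \vee \neg B)$ (de Morgan's laws)
8. $\neg(A \vee B) \iff (\neg A \wedge \neg B)$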
Proof. One can use the truth table method to prove that all these formulas are
tautologies. We work number 5. out as an example and leave the rest to the reader
(see Problem 3.2). We denote the entire formula $(A \wedge (B \vee C)) \iff ((A \wedge B) \vee (A \wedge C))$
by $F$.

A   B   C   $B \vee C$   $(A \wedge (B \vee C))$   $A \wedge B$   $A \wedge C$   $((A \wedge B) \vee (A \wedge C))$   F
0   0   0   0            0                         0              0              0                                    1
0   0   1   1            0                         0              0              0                                    1
0   1   0   1            0                         0              0              0                                    1
0   1   1   1            0                         0              0              0                                    1
1   0   0   0            0                         0              0              0                                    1
1   0   1   1            1                         0              1              1                                    1
1   1   0   1            1                         1              0              1                                    1
1   1   1   1            1                         1              1              1                                    1

The last column of this table indicates the truth value of the entire logical formula
$F$, i.e. $(A \wedge (B \vee C)) \iff ((A \wedge B) \vee (A \wedge C))$, depending on the truth values of
$A$, $B$ and $C$ in the first three columns. We see that the last column always carries
the truth value 1 for true, irrespective of the truth values of $A$, $B$ and $C$. Hence
the formula is a tautology.
$\Box$
The previous proof shows that checking whether a logical formula is a tautology
or not becomes increasingly more time consuming as more propositional variables
are involved. Roughly speaking, the truth table method requires $2^n$ computational
steps (i.e. rows in the table) if the formula involves $n$ propositional variables
$A_1, ..., A_n$.
This observation is related to one of the challenging big open problems of mathematics, the so-called P-NP problem. Here P stands for the set of problems that
can be decided in polynomial time and NP stands for the set of problems that can
be verified in polynomial time. We cannot make the definitions of these sets precise
here, this would be subject of a course on computational complexity theory, but we
state that the big open problem is whether these two sets are equal or not. While
it can be proved easily that $\mathrm{P} \subseteq \mathrm{NP}$, it is not known whether the inverse inclusion
holds or not. The majority of experts believes that the inverse inclusion does not
hold. That is, we have the following conjecture (which is open more or less since the
late 1960s).
Conjecture 3.6 (P-NP Problem) $\mathrm{P} \subsetneq \mathrm{NP}$.
This problem is among those few big open mathematical problems for which
the Clay Mathematics Institute offers one million US dollars to anybody who solves
the problem successfully in either way (i.e. by proving or disproving the conjecture
and by publishing the result properly). For the proof that indeed $\mathrm{P} \subsetneq \mathrm{NP}$ holds, it
would be sufficient to show that there is no significantly more efficient way to check
whether a given formula is a tautology or not than the truth table method discussed
above. For the proof that $\mathrm{P} = \mathrm{NP}$, it would be sufficient to provide a significantly
more efficient algorithm (that is, one that does not require $2^n$ many steps, but rather
roughly $n^2$, $n^3$ or $n^k$ many steps for some fixed $k \in \mathbb{N}$).

Problems
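3.1 Let $A$, $B$ and $C$ be propositions. Prove that the following logical formulas are tautologies:
1. $((A \Longrightarrow B) \wedge A) \Longrightarrow B$ (modus ponens)
2. $((A \Longrightarrow B) \wedge (B \Longrightarrow C)) \Longrightarrow (A \Longrightarrow C)$ (hypothetical syllogism)
3. $(A \Longrightarrow B) \iff (\neg B \Longrightarrow \neg A)$ (contraposition law)
4. $(A \iff B) \iff ((A \Longrightarrow B) \wedge (B \Longrightarrow A))$ (equivalence)
5. $((A \wedge B) \Longrightarrow C) \iff (A \Longrightarrow (B \Longrightarrow C))$ (currying)

3.2 Let $A$, $B$ and $C$ be propositions. Prove that the following logical formulas are tautologies:
1. $((A \wedge B) \wedge C) \iff (A \wedge (B \wedge C))$ (associativity)
2. $((A \vee B) \vee C) \iff (A \vee (B \vee C))$
3. $(A \wedge B) \iff (B \wedge A)$ (commutativity)
4. $(A \vee B) \iff (B \vee A)$
5. $(A \vee (B \wedge C)) \iff ((A \vee B) \wedge (A \vee C))$ (distributivity)
6. $\neg(A \wedge B) \iff (\neg A \vee \neg B)$ (de Morgan's laws)
7. $\neg(A \vee B) \iff (\neg A \wedge \neg B)$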

3.3 First-Order Logic

Roughly speaking, first-order logic is an extension of logic where one does not only
consider mathematical propositions and their truth values, but also such propositions that depend on certain mathematical objects. Besides the ordinary logical
operations discussed in the previous section, first-order formulas also involve quantifications over such objects using universal and existential quantifiers. We start
with an example.
Example 3.7 The following first-order formula expresses the fact that $p \in \mathbb{N}$ is a
prime number:
$$p \geq 2 \wedge (\forall n \in \mathbb{N})(n \mid p \Longrightarrow (n = 1 \vee n = p)).$$
If we abbreviate this formula with $F(p)$, then we have
$$p \text{ is a prime number} \iff F(p) \text{ is true}.$$
In particular, the fact whether $F(p)$ is true does not only depend on the formula
$F$, but also on the involved mathematical object $p \in \mathbb{N}$. We say that $p$ is a free
variable in the formula $F(p)$, whereas the variable $n$ is a bound variable that falls
into the scope of the universal quantifier $(\forall n \in \mathbb{N})$.
Besides the logical operations $\neg$, $\wedge$, $\vee$, $\Longrightarrow$, $\iff$ a first-order formula can also
involve existential quantifiers $\exists$ and universal quantifiers $\forall$. Almost everything in
mathematics can be expressed using first-order logical formulas. Occasionally, one
needs second-order logic, where quantifications over subsets are allowed (i.e. we can
have formulas like $(\forall A \subseteq X)...$). First-order formulas can also involve other mathematical objects such as relations or functions (that we will discuss later on). For
instance, in the above example the divisibility relation $\mid$ has been used.
Similarly to propositional formulas, first-order formulas can be true just due to
their mere logical form. For instance, the formula
$(\forall x \in X)F(x) \lor (\exists x \in X)\neg F(x)$
is true, irrespective of what $F(x)$ means or whether it is true for certain $x$. Such
first-order formulas are called valid. An example of a formula which is not valid is
$(\forall x \in X)(F(x) \implies G(x))$.
The truth of this formula depends on what $F(x)$ and $G(x)$ actually mean and how
the respective truth value depends on $x$. If, for instance, $X = \mathbb{N}$ and $F(x)$ means "$x$
is a prime number" and $G(x)$ means "$x \geq 2$", then this would be correct. If we swap
the meaning of $F(x)$ and $G(x)$, then the above formula would not be true.
Since we have not defined first-order formulas precisely, we will not be able
to prove in detail that a given formula is actually valid. This can only be done
in a course on logic where syntax and semantics of first-order formulas are defined
more precisely. However, we believe that all the following examples are intuitively
understandable.
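Example 3.8 (Valid first-order formulas) Let $X$, $Y$ be sets and let $F(x)$, $G(x)$,
$H(x, y)$ be first-order formulas whose truth values depend on some $x \in X$ and
$y \in Y$. Let $E$ be a formula that does not depend on $x$. Then all the following
first-order formulas are valid.
1. $(\forall x \in X)(F(x) \land G(x)) \iff ((\forall x \in X)F(x) \land (\forall x \in X)G(x))$ (quantifier exportation)
2. $(\exists x \in X)(F(x) \lor G(x)) \iff ((\exists x \in X)F(x) \lor (\exists x \in X)G(x))$
3. $(\forall x \in X)(E \lor G(x)) \iff (E \lor (\forall x \in X)G(x))$ (free quantifier exportation)
4. $(\exists x \in X)(E \land G(x)) \iff (E \land (\exists x \in X)G(x))$
5. $\neg(\forall x \in X)F(x) \iff (\exists x \in X)\neg F(x)$ (de Morgan's law)
6. $\neg(\exists x \in X)F(x) \iff (\forall x \in X)\neg F(x)$
7. $(\forall x \in X)(\forall y \in Y)H(x, y) \iff (\forall y \in Y)(\forall x \in X)H(x, y)$ (quantifier order)
8. $(\exists x \in X)(\exists y \in Y)H(x, y) \iff (\exists y \in Y)(\exists x \in X)H(x, y)$.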


The rule of thumb for the quantifier exportation rules is that universal quantification is compatible with conjunctions and existential quantification is compatible
with disjunctions. This is because one can read the universal quantifier like a big
"and" $\bigwedge$ and the existential quantifier like a big "or" $\bigvee$. Correspondingly, some
authors write $\bigwedge_{x \in X}$ instead of $(\forall x \in X)$ and $\bigvee_{x \in X}$ instead of $(\exists x \in X)$. It is easy
to see that the exportation does not work if conjunctions are used with existential
quantifiers or if disjunctions are used with universal quantifiers (see Problem 3.3).
If one of the involved formulas does not involve the variable over which one quantifies, then one can export quantifiers also with incompatible logical operations, as
specified under "free quantifier exportation" above. Just as general quantifier
exportation is not valid for incompatible logical connectives, the order of quantifiers of different
type cannot be changed in general (see Problem 3.3).
In the section on propositional logic we have illustrated a simple method that
can be used to find out whether a given propositional formula is a tautology or
not. This truth table method was indicated as inefficient, but at least in principle
it is applicable to any formula whatsoever and it yields a clear result following the
specified algorithm. Unfortunately, there is no such method for first-order formulas.
The absence of such a method does not merely mean that nobody has found one yet;
it has been proved that there is no such method as a matter of principle.
Theorem 3.9 (Church 1936) There is no algorithm that can decide for a given
first-order formula whether the formula is valid or not.
However, this does not mean that we cannot prove that certain first-order formulas are valid. Indeed, there is an axiom system of valid first-order formulas from
which one can derive all the valid first-order formulas. This is the subject of Gödel's
Completeness Theorem and this is treated in a course on logic. That means, in particular, that for any valid first-order formula there is also a proof that the formula is
valid. It might just be that the proof is very intricate and lengthy. For the current
purposes we just treat first-order formulas intuitively and we do not formally prove
their correctness. We can, however, construct some counterexamples for first-order
formulas that are not valid.

Problems
3.3 We consider counterexamples for incompatible quantifier exportation.
1. Prove that there is a set $X$ and logical formulas $F(x)$, $G(x)$ such that
$(\forall x \in X)(F(x) \lor G(x)) \iff (\forall x \in X)F(x) \lor (\forall x \in X)G(x)$
is not true.
2. Prove that there is a set $X$ and logical formulas $F(x)$, $G(x)$ such that
$(\exists x \in X)(F(x) \land G(x)) \iff (\exists x \in X)F(x) \land (\exists x \in X)G(x)$
is not true.
3. Prove that there are sets $X$, $Y$ and a formula $H(x, y)$ such that
$(\forall x \in X)(\exists y \in Y)H(x, y) \iff (\exists y \in Y)(\forall x \in X)H(x, y)$
is not true.
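
Over finite sets, quantified formulas can be evaluated mechanically, which gives a quick way to test candidate counterexamples before writing down a proof. The following sketch assumes one particular choice of $X$, $Y$, $F$, $G$ and $H$ (our own, many others work); Python's `all` and `any` play the role of the universal and the existential quantifier.

```python
# Candidate counterexample: a two-element set and two complementary predicates.
X = {0, 1}
Y = {0, 1}
F = lambda x: x == 0
G = lambda x: x == 1
H = lambda x, y: x == y

# Universal quantifier combined with a disjunction.
forall_or = all(F(x) or G(x) for x in X)
or_of_foralls = all(F(x) for x in X) or all(G(x) for x in X)

# Existential quantifier combined with a conjunction.
exists_and = any(F(x) and G(x) for x in X)
and_of_exists = any(F(x) for x in X) and any(G(x) for x in X)

# Swapping a universal and an existential quantifier.
forall_exists = all(any(H(x, y) for y in Y) for x in X)
exists_forall = any(all(H(x, y) for x in X) for y in Y)

print(forall_or, or_of_foralls)      # the two sides disagree
print(exists_and, and_of_exists)     # the two sides disagree
print(forall_exists, exists_forall)  # the two sides disagree
```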

3.4 Correspondence Between Logic and Set Theory

We have noticed repeatedly that there is a close correspondence between concepts
in set theory and concepts in logic. Union and intersection, for instance, are defined
using the concepts of disjunction and conjunction, respectively (and using existential quantification and universal quantification, respectively, in the case of indexed
families of sets).
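Set Theory                      Logic
$\cup$ (union)                  $\lor$ (disjunction)
$\cap$ (intersection)           $\land$ (conjunction)
$A^c$ (complement)              $\neg$ (negation)
$\subseteq$ (subset)            $\implies$ (implication)
$=$ (equality)                  $\iff$ (equivalence)
$\bigcup_{i \in I}$             $(\exists i \in I)$
$\bigcap_{i \in I}$             $(\forall i \in I)$

Table 3.1: Correspondence between concepts in set theory and logic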


On the other hand, for instance, the logical form of de Morgan's Law is used
(implicitly) to prove its counterpart for sets. In Table 3.1 we just want to highlight
and collect these corresponding occurrences of concepts again in order to emphasize
this relation. One could add the correspondence between the symmetric difference
(see Problem 2.13) and the "exclusive or" operation to this table, but since we
are not going to use them any further, we will not include them here.
We are not going to explain the exact relation of these concepts here, but we
refer the reader to the respective definitions in order to identify the relation.

4 Relations and Functions

Mathematicians do not study objects, but relations between objects.
Thus, they are free to replace some objects by others so long as the
relations remain unchanged. Content to them is irrelevant:
they are interested in form only.
Jules Henri Poincaré (1854-1912)

4.1 What are Relations?

Mathematics is not just about objects, but about relations between objects. If we
study natural numbers, then we are not just interested in them as such, but we
want to understand relations between natural numbers such as divisibility. Only
such relations give substance to a subject such as number theory. Similarly,
if we study real numbers, we want to understand relations between them such as
linear, continuous or differentiable functions. This is what brings substance to linear
algebra and analysis. All such relations can be considered as subsets of set products
in the following straightforward sense.
Definition 4.1 (Relation) A triple $(R, X, Y)$ is called a relation, if $X$ and $Y$ are
sets and $R \subseteq X \times Y$. We will call $X$ the source and $Y$ the target of the relation and
$R$ its graph.
Typically, we will just say that $R \subseteq X \times Y$ is a relation between $X$ and $Y$ and we
assume that the source $X$ and target $Y$ are defined in this way implicitly. However,
one should keep in mind that source $X$ and target $Y$ have to be specified as part of
the relation. It is not sufficient just to specify the graph alone. A relation is called
homogeneous if $X = Y$, i.e. if source and target are identical. A relation $R \subseteq X \times X$
is also called a relation on $X$. If $R \subseteq X \times Y$ is a relation, then we also write
$xRy :\iff (x, y) \in R$.
The idea of the notation $xRy$ is that it is a short way of saying that "$x$ is in relation
$R$ to $y$". To understand the nature of the definition of a relation, we illustrate it
with a number of examples, some of which we have actually seen earlier.
Example 4.2 The following are relations:
1. The set $\{(x, y) \in \mathbb{N} \times \mathbb{N} : x \leq y\} \subseteq \mathbb{N} \times \mathbb{N}$ defines the ordinary "less or equal"
relation on natural numbers $\mathbb{N}$, usually denoted by $\leq$.
2. The set $\{(x, y) \in \mathbb{N} \times \mathbb{N} : x < y\} \subseteq \mathbb{N} \times \mathbb{N}$ defines the ordinary "strictly less"
relation on natural numbers $\mathbb{N}$, usually denoted by $<$.
3. The set $\{(x, y) \in \mathbb{N} \times \mathbb{N} : x \mid y\} \subseteq \mathbb{N} \times \mathbb{N}$ defines the ordinary divisor relation
on natural numbers $\mathbb{N}$, usually denoted by $\mid$.
4. For any set $X$, the set $\Delta_X := \{(x, y) \in X \times X : x = y\} \subseteq X \times X$ defines
the usual equality relation on $X$, usually denoted by $=$. The set $\Delta_X$ is also
called the diagonal of $X$.
5. For any two sets $X$, $Y$, the set $X \times Y$ defines the "all relation" between $X$ and
$Y$.
6. For any two sets $X$, $Y$, the set $\emptyset \subseteq X \times Y$ defines the empty relation between
$X$ and $Y$.
7. For any set $X$, the set $\{(x, A) \in X \times 2^X : x \in A\}$ defines the element relation
between $X$ and $2^X$, usually denoted by $\in$.
8. For any set $X$, the set $\{(A, B) \in 2^X \times 2^X : A \subseteq B\}$ defines the subset relation
on the power set $2^X$, usually denoted by $\subseteq$.
9. For any set $X$, the set $\{(A, B) \in 2^X \times 2^X : A \subsetneq B\}$ defines the proper subset
relation on the power set $2^X$, usually denoted by $\subsetneq$.
We emphasize again that the source and target sets specified here are part of
the definition of the corresponding relations. For instance, there is just one unique
empty set $\emptyset$, but there are many empty relations $(\emptyset, X, Y)$, namely one for each pair
of sets $X$ and $Y$. All the relations given in the previous example, besides the element
relation, the all relation and the empty relation, are homogeneous.
Since relations are just specified using subsets of products of sets, we can perform
all usual set-theoretic operations on relations or, more precisely, on their graphs. We
illustrate this with another example, where we try to capture the divisibility relation
restricted to numbers up to 5 and disregarding the number 1.

Example 4.3 We consider the following sets:
1. $D := \{(n, k) \in \mathbb{N} \times \mathbb{N} : n \mid k\}$,
2. $X := Y := \{0, 1, 2, 3, 4, 5\}$,
3. $A := X \setminus \{1\} = \{0, 2, 3, 4, 5\}$,
4. $R := D \cap (A \times A)$
$= \{(0, 0), (2, 0), (2, 2), (2, 4), (3, 0), (3, 3), (4, 0), (4, 4), (5, 0), (5, 5)\}$.
Then $(R, X, Y)$ is a relation that captures divisibility up to 5, but disregarding the
number 1. The following diagram illustrates this relation.
Figure 4.1: The relation $R \subseteq X \times Y$.
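
Since all sets in Example 4.3 are finite (apart from $D$, of which only a finite part is needed), the graph $R$ can be recomputed mechanically. The following Python sketch represents graphs as sets of pairs; the helper `divides` is our own and follows the convention that $n \mid k$ means that $k$ is a multiple of $n$, so that every $n$ divides 0.

```python
def divides(n, k):
    """n | k on natural numbers: k = n * m for some natural number m."""
    return k == 0 if n == 0 else k % n == 0

X = Y = {0, 1, 2, 3, 4, 5}
A = X - {1}

# Only the part of D that lies inside X x Y matters for the intersection.
D = {(n, k) for n in X for k in Y if divides(n, k)}
A_times_A = {(a, b) for a in A for b in A}
R = D & A_times_A

print(sorted(R))
```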

Finite relations R are often illustrated in diagrams such as the one in Figure 4.1.
The source set and the target set are given separately with all their elements and an
arrow is added from each point x in the source space to each point y in the target
space with xRy.

4.2 Composition and Inverse Relations

Since relations are special sets, we can apply the machinery of set theory to relations,
i.e. we can form unions, intersections, differences and other operations on sets. This
has been illustrated in Example 4.3. However, there are also some operations that
are tailor-made for relations, the most important of which is composition, which we
define next.
Definition 4.4 (Composition) Let $R \subseteq X \times Y$ and $S \subseteq Y \times Z$ be relations. Then
we define a relation $S \circ R \subseteq X \times Z$, which is called the composition of the two given
relations, by
$S \circ R := \{(x, z) \in X \times Z : (\exists y \in Y)(xRy \text{ and } ySz)\}$.
We point out that two relations $(R, X, Y)$ and $(S, V, Z)$ can only be composed in
the order $S \circ R$ if the target $Y$ of $R$ is identical to the source $V$ of $S$, i.e. if $Y = V$.
Sometimes we will just write $SR := S \circ R$, for short. We illustrate composition in a
continuation of Example 4.3.
Example 4.5 We consider the relation $R \subseteq X \times Y$ from Example 4.3 and we let
$Z := X = Y$. Moreover, we consider the predecessor relation $S \subseteq Y \times Z$ with
$S := \{(y, z) \in Y \times Z : z = y - 1\}$.
The following diagram illustrates the composition $S \circ R$ of the two relations.

Figure 4.2: The relation $S \circ R \subseteq X \times Z$.
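
For finite relations the composition of Definition 4.4 can be computed directly from the graphs. The sketch below uses a helper `compose` of our own naming and applies it to the relations of Example 4.5; note that the argument order mirrors the notation $S \circ R$, i.e. $R$ is applied first.

```python
def compose(S, R):
    """Graph of S o R: all (x, z) such that x R y and y S z for some y."""
    return {(x, z) for (x, y) in R for (y2, z) in S if y == y2}

Y = Z = {0, 1, 2, 3, 4, 5}
R = {(0, 0), (2, 0), (2, 2), (2, 4), (3, 0), (3, 3),
     (4, 0), (4, 4), (5, 0), (5, 5)}
# Predecessor relation: y is related to z = y - 1 whenever z lies in Z.
S = {(y, y - 1) for y in Y if y - 1 in Z}

SR = compose(S, R)
print(sorted(SR))
```

Pairs $(x, 0) \in R$ contribute nothing to $S \circ R$, since 0 has no predecessor in $Z$.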

The composition operation on relations satisfies a number of important properties. We mention that it is associative and that the diagonal acts as identity element
with respect to composition.
Proposition 4.6 (Composition) Let $R \subseteq X \times Y$, $S \subseteq Y \times Z$ and $T \subseteq Z \times W$ be
relations. Then
1. $(T \circ S) \circ R = T \circ (S \circ R)$ (associativity)
2. $R \circ \Delta_X = \Delta_Y \circ R = R$ (identity element)

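Proof. We only prove the first statement and we leave the second one to the reader
(see Problem 4.1). Let $x \in X$ and $w \in W$. Then we obtain
$(x, w) \in (T \circ S) \circ R$
$\iff (\exists y \in Y)\,((x, y) \in R \text{ and } (y, w) \in T \circ S)$
$\iff (\exists y \in Y)\,((x, y) \in R \text{ and } (\exists z \in Z)((y, z) \in S \text{ and } (z, w) \in T))$
$\iff (\exists y \in Y)(\exists z \in Z)\,((x, y) \in R \text{ and } ((y, z) \in S \text{ and } (z, w) \in T))$
$\iff (\exists z \in Z)(\exists y \in Y)\,(((x, y) \in R \text{ and } (y, z) \in S) \text{ and } (z, w) \in T)$
$\iff (\exists z \in Z)\,((\exists y \in Y)((x, y) \in R \text{ and } (y, z) \in S) \text{ and } (z, w) \in T)$
$\iff (\exists z \in Z)\,((x, z) \in S \circ R \text{ and } (z, w) \in T)$
$\iff (x, w) \in T \circ (S \circ R)$.
Thus, we have proved $(T \circ S) \circ R = T \circ (S \circ R)$. We note that in the above proof
we have used, among other logical transformations, free quantifier exportation (see
Example 3.8). □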
A relation does not need to be defined on all elements of the source and it does
not need to reach all elements of the target. In the next definition we capture those
elements of the source and target, respectively, which are actually in use. The
corresponding subsets of the source and the target are called domain and range,
respectively.
Definition 4.7 (Domain and range) Let $R \subseteq X \times Y$ be a relation. Then we
define
1. $\mathrm{dom}(R) := \{x \in X : (\exists y \in Y)\, xRy\}$, which is called the domain of $R$,
2. $\mathrm{range}(R) := \{y \in Y : (\exists x \in X)\, xRy\}$, which is called the range of $R$.
Some authors write $\mathrm{ran}(R)$ or $\mathrm{im}(R)$ instead of $\mathrm{range}(R)$. In Example 4.3 we
obtain $\mathrm{dom}(R) = \mathrm{range}(R) = A$, which is a proper subset of the source and target
$X = Y$. Those relations for which domain and source set, on the one hand, and
range and target set, on the other hand, coincide, have special names.
Definition 4.8 (Totality) Let $R \subseteq X \times Y$ be a relation. Then
1. $R$ is called left total, if $\mathrm{dom}(R) = X$,
2. $R$ is called right total, if $\mathrm{range}(R) = Y$.
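
Domain, range and totality of a finite relation can be checked with a few set comprehensions. In the following sketch (function names are our own) the source and target have to be passed explicitly to the totality tests, which reflects the fact that the graph alone does not determine the relation.

```python
def dom(R):
    """All x that are related to at least one y."""
    return {x for (x, _) in R}

def rng(R):
    """All y that are reached from at least one x."""
    return {y for (_, y) in R}

def is_left_total(R, X):
    return dom(R) == set(X)

def is_right_total(R, Y):
    return rng(R) == set(Y)

X = Y = {0, 1, 2, 3, 4, 5}
# The relation R from Example 4.3.
R = {(0, 0), (2, 0), (2, 2), (2, 4), (3, 0), (3, 3),
     (4, 0), (4, 4), (5, 0), (5, 5)}
# "Less or equal" restricted to the finite set X.
LE = {(x, y) for x in X for y in Y if x <= y}

print(dom(R), rng(R))
print(is_left_total(R, X), is_right_total(R, Y))
print(is_left_total(LE, X), is_right_total(LE, Y))
```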

None of the relations $R$, $S$ and $S \circ R$ in Examples 4.3 and 4.5 is left total or right total.
The less or equal relation $\leq$ and the divisibility relation $\mid$ on $\mathbb{N}$ are examples of left
and right total relations. The strictly less relation $<$ on $\mathbb{N}$ is left total, but not right
total. The strictly larger relation $>$ on $\mathbb{N}$ is right total, but not left total. In fact,
$>$ is just the inversion of $<$. Inversion is another operation that can be performed
on relations in general and we define it next.
Definition 4.9 (Inverse relation) Let $R \subseteq X \times Y$ be a relation. Then we define
the inverse relation $R^{-1} \subseteq Y \times X$ by
$R^{-1} := \{(y, x) \in Y \times X : xRy\}$.
Inversion intuitively means to swap source and target space, but to leave the
relation as it is otherwise. For instance, the inverse $\leq^{-1}$ is nothing but $\geq$, the
inverse $<^{-1}$ is nothing but $>$ (where we consider all these relations on $\mathbb{N}$). Moreover,
the inverse $\subseteq^{-1}$ is nothing but $\supseteq$ (where we consider these relations on $2^X$ for an
arbitrary set $X$). Regarding a diagram as in Example 4.3, inversion means to reverse
all the arrows. We state a number of properties regarding inversion and composition.
Proposition 4.10 (Inverse and composition) Let $R \subseteq X \times Y$ and $S \subseteq Y \times Z$
be relations. Then
1. $(R^{-1})^{-1} = R$.
2. $(S \circ R)^{-1} = R^{-1} \circ S^{-1}$.
3. $\Delta_{\mathrm{dom}(R)} \subseteq R^{-1} \circ R$.
4. $\mathrm{dom}(R^{-1}) = \mathrm{range}(R)$ and $\mathrm{range}(R^{-1}) = \mathrm{dom}(R)$.
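Proof. We prove 2. and 3. and we leave the other statements to the reader (see
Problem 4.2). Let $x \in X$ and $z \in Z$. Then we obtain
$(z, x) \in (S \circ R)^{-1}$
$\iff (x, z) \in S \circ R$
$\iff (\exists y \in Y)\,((x, y) \in R \text{ and } (y, z) \in S)$
$\iff (\exists y \in Y)\,((z, y) \in S^{-1} \text{ and } (y, x) \in R^{-1})$
$\iff (z, x) \in R^{-1} \circ S^{-1}$.
Thus, we have proved $(S \circ R)^{-1} = R^{-1} \circ S^{-1}$. Now let $(x, x') \in \Delta_{\mathrm{dom}(R)}$. Then
$x = x'$ and $x \in \mathrm{dom}(R)$. Hence, there exists $y \in Y$ such that $(x, y) \in R$ and
hence $(y, x) \in R^{-1}$. This implies $(x, x') = (x, x) \in R^{-1} \circ R$. Thus, we have proved
$\Delta_{\mathrm{dom}(R)} \subseteq R^{-1} \circ R$. □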
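
The statements of Proposition 4.10 can be checked on finite examples. The following Python sketch does this for the relations $R$ and $S$ from Examples 4.3 and 4.5; the helper names are our own, and a check on one example is of course no substitute for the proof.

```python
def inverse(R):
    """Graph of the inverse relation: every pair is swapped."""
    return {(y, x) for (x, y) in R}

def compose(S, R):
    """Graph of S o R (R is applied first)."""
    return {(x, z) for (x, y) in R for (y2, z) in S if y == y2}

def dom(R):
    return {x for (x, _) in R}

def rng(R):
    return {y for (_, y) in R}

def diagonal(A):
    return {(a, a) for a in A}

R = {(0, 0), (2, 0), (2, 2), (2, 4), (3, 0), (3, 3),
     (4, 0), (4, 4), (5, 0), (5, 5)}
S = {(y, y - 1) for y in range(1, 6)}

check1 = inverse(inverse(R)) == R
check2 = inverse(compose(S, R)) == compose(inverse(R), inverse(S))
check3 = diagonal(dom(R)) <= compose(inverse(R), R)   # <= is the subset test
check4 = dom(inverse(R)) == rng(R) and rng(inverse(R)) == dom(R)
print(check1, check2, check3, check4)
```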
Now we want to discuss a result that shows how composition and inverses affect
the totality of relations.
Proposition 4.11 (Totality) Let $R \subseteq X \times Y$ and $S \subseteq Y \times Z$ be relations. Then
1. If $R$ and $S$ are left total, then $S \circ R$ is left total.
2. If $R$ and $S$ are right total, then $S \circ R$ is right total.
3. $R$ is left total $\iff$ $R^{-1}$ is right total.
Proof. We prove 1. and we leave 2. and 3. to the reader (see Problem 4.3). Let $R$
and $S$ be left total. Then $\mathrm{dom}(R) = X$ and $\mathrm{dom}(S) = Y$ and we obtain
$x \in \mathrm{dom}(S \circ R)$
$\iff (\exists z \in Z)\, x(S \circ R)z$
$\iff (\exists z \in Z)(\exists y \in Y)(xRy \text{ and } ySz)$
$\iff (\exists y \in Y)(xRy \text{ and } (\exists z \in Z)\, ySz)$
$\iff (\exists y \in Y)(xRy \text{ and } y \in \mathrm{dom}(S))$
$\iff (\exists y \in Y)\, xRy$
$\iff x \in \mathrm{dom}(R) = X$.
Hence $\mathrm{dom}(S \circ R) = X$ and $S \circ R$ is left total.

Problems
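4.1 Let $R \subseteq X \times Y$ be a relation. Prove that
1. $R \circ \Delta_X = \Delta_Y \circ R = R$ (identity element)
4.2 Let $R \subseteq X \times Y$ be a relation. Prove that:
1. $(R^{-1})^{-1} = R$,
2. $\mathrm{dom}(R^{-1}) = \mathrm{range}(R)$ and $\mathrm{range}(R^{-1}) = \mathrm{dom}(R)$.
4.3 Let $R \subseteq X \times Y$ and $S \subseteq Y \times Z$ be relations. Prove that:
1. If $R$ and $S$ are right total, then $S \circ R$ is right total.
2. $R$ is left total $\iff$ $R^{-1}$ is right total.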

4.3 Functions

Perhaps the most important relations that are considered in mathematics are functions. The idea of a function $f : X \to Y$ is that each value $x \in X$ is mapped to one
and only one function value $f(x) \in Y$. Thus, the crucial property is the uniqueness
here. For symmetry reasons we have uniqueness on the left and uniqueness on the
right-hand side, where the right uniqueness is what is required for functions.
Definition 4.12 (Uniqueness) Let $R \subseteq X \times Y$ be a relation.
1. $R$ is called left unique, if for all $x_1, x_2 \in X$ and $y \in Y$
$x_1 R y \text{ and } x_2 R y \implies x_1 = x_2$.
2. $R$ is called right unique, if for all $x \in X$ and $y_1, y_2 \in Y$
$x R y_1 \text{ and } x R y_2 \implies y_1 = y_2$.
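
For finite relations, Definition 4.12 translates almost literally into code. The sketch below (function names are our own) tests the relation $R$ from Example 4.3 and the predecessor relation $S$ from Example 4.5.

```python
def is_left_unique(R):
    """x1 R y and x2 R y implies x1 = x2."""
    return all(x1 == x2 for (x1, y1) in R for (x2, y2) in R if y1 == y2)

def is_right_unique(R):
    """x R y1 and x R y2 implies y1 = y2."""
    return all(y1 == y2 for (x1, y1) in R for (x2, y2) in R if x1 == x2)

R = {(0, 0), (2, 0), (2, 2), (2, 4), (3, 0), (3, 3),
     (4, 0), (4, 4), (5, 0), (5, 5)}
S = {(y, y - 1) for y in range(1, 6)}

# R relates 2 to 0, 2 and 4, and both 0 and 2 to 0, so it is neither.
print(is_left_unique(R), is_right_unique(R))
# Each number has at most one predecessor and at most one successor.
print(is_left_unique(S), is_right_unique(S))
```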
Firstly, we show how composition and inverses affect the uniqueness of relations.
Proposition 4.13 (Uniqueness) Let $R \subseteq X \times Y$ and $S \subseteq Y \times Z$ be relations.
Then we obtain:
1. If $R$ and $S$ are right unique, then $S \circ R$ is right unique.
2. If $R$ and $S$ are left unique, then $S \circ R$ is left unique.
3. $R$ is left unique if and only if $R^{-1}$ is right unique.
Proof. We just prove the first statement and we leave the other statements to the
reader (see Problem 4.4). Let $R$ and $S$ be right unique. We prove that this implies
that $S \circ R$ is right unique. Let $x \in X$ and let $z_1, z_2 \in Z$ such that $x(S \circ R)z_1$ and
$x(S \circ R)z_2$ hold. Then there are $y_1, y_2 \in Y$ such that $xRy_1$, $y_1Sz_1$ and $xRy_2$, $y_2Sz_2$
hold. Since $R$ is right unique, we obtain $y_1 = y_2$. That is, we have $y_1Sz_1$ and $y_1Sz_2$.
This implies $z_1 = z_2$ since $S$ is right unique. Altogether we have proved that $S \circ R$
is right unique. □
Using the notion of right uniqueness we can now formally define what a function
is.
Definition 4.14 (Function) Any left total and right unique relation $R \subseteq X \times Y$
is called a function. We denote such a function by $f : X \to Y$.
The above notation $f : X \to Y$ for a function just indicates that this object is a
left total and right unique relation $f = (R, X, Y)$. The underlying set $R$ is usually
referenced as $\mathrm{graph}(f) = R$ and it is called the graph of the function $f$. Functions
are often called maps or mappings and we want to understand all three words
synonymously here. The important thing is, once again, that a function is more than
its graph $R$; the entire triple $(R, X, Y)$ constitutes the function. We write $\mathrm{range}(f)$
for the range of a function. The domain of a function is by definition always equal
to the source space. The target space of a function is referred to by some authors
as codomain. However, we will avoid this terminology and stick to the notion pairs
source and target, on the one hand, and domain and range, on the other hand. We
mention that by $Y^X$ one denotes the set of functions $f : X \to Y$. None of the
relations $R$, $S$ or $S \circ R$ in Example 4.5 is a function. We provide an example of a
function.
Example 4.15 Let $X := Y := \{0, 1, 2, 3, 4, 5\}$ and let
$R := \{(0, 1), (1, 1), (2, 3), (3, 2), (4, 4), (5, 5)\}$.
Then $R \subseteq X \times Y$ is a relation that is left total and right unique. Hence it defines a
function $f : X \to Y$. The relation $R$ is illustrated in the diagram in Figure 4.3.

Figure 4.3: A function $f : X \to Y$.

A characteristic feature of a function $f : X \to Y$ with $R := \mathrm{graph}(f)$ is that for
each $x \in X$ there is one and only one value $f(x) \in Y$ such that $x$ is related to $f(x)$,
i.e. such that $(x, f(x)) \in R$ holds. This value will be called the function value of $f$
on input $x$.
Definition 4.16 (Function value) Let $f : X \to Y$ be a function with $\mathrm{graph}(f) =
R$. Then we define $f(x) \in Y$ to be the unique value in the set
$\{y \in Y : xRy\}$
for any $x \in X$. The value $f(x)$ is called the function value of $f$ at $x$.
We need to justify why the value $f(x)$ is well-defined by the above implicit
definition. Firstly, the given set $\{y \in Y : xRy\}$ is non-empty, since $R$ is left total,
and secondly it contains exactly one point, since $R$ is right unique. Hence, we can
define $f(x)$ to be this point.
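
The well-definedness argument can be mirrored in code. The following sketch uses the graph $R$ from Example 4.15 and a helper `value` of our own naming, which returns the unique element of $\{y \in Y : xRy\}$ and fails loudly if this set does not contain exactly one point.

```python
X = Y = {0, 1, 2, 3, 4, 5}
R = {(0, 1), (1, 1), (2, 3), (3, 2), (4, 4), (5, 5)}

def value(R, x):
    """Return the unique y with x R y, as in Definition 4.16."""
    candidates = {y for (x2, y) in R if x2 == x}
    if len(candidates) != 1:
        # Empty: R is not left total at x. Several: R is not right unique at x.
        raise ValueError(f"no unique function value at {x}: {candidates}")
    return next(iter(candidates))

values = [value(R, x) for x in sorted(X)]
print(values)

# Adding the pair (0, 2) destroys right uniqueness at the input 0.
try:
    value(R | {(0, 2)}, 0)
    failed = False
except ValueError:
    failed = True
print(failed)
```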
Example 4.17 We consider the relation $R \subseteq \mathbb{N} \times \mathbb{N}$ given by
$R := \{(n, k) \in \mathbb{N} \times \mathbb{N} : k = n^2\}$.
This relation is left total and right unique and hence it defines a function $f : \mathbb{N} \to \mathbb{N}$
with $\mathrm{graph}(f) = R$. We have
$f(n) = n^2$
for each value $n \in \mathbb{N}$. We could also define the function $f : \mathbb{N} \to \mathbb{N}$ by saying that
$f(n) := n^2$ for all $n \in \mathbb{N}$, as this fully specifies the graph of $f$ (see Proposition 4.18).
Another common way of denoting this definition is as follows:
$f : \mathbb{N} \to \mathbb{N}, n \mapsto n^2$.
Here the understanding is that $n$ is an arbitrary element in the source set $\mathbb{N}$ and
$n \mapsto n^2$ means that $n$ is mapped to $n^2$. This is the same as saying $f(n) := n^2$
for all $n \in \mathbb{N}$.
Sometimes one finds statements such as
"We consider the function $f(n) = n^2$ ..."
The reader should be warned that this is an abuse of mathematical terminology that
sometimes creates confusion and mistakes. The function in the above example is
the object $f : \mathbb{N} \to \mathbb{N}$ and it is not fully specified without naming its source and
target set. Moreover, the object $f(n)$ is a natural number and not a function in this
case. It is advisable to avoid the above terminology and to keep functions and
their function values clearly separated in mathematical formulations. The equation
$f(n) = n^2$ can only be used to define $f$, given its source and target set. But neither
the equation nor $f(n)$ is the function. The following proposition justifies defining
a function $f$ by an equation as in Example 4.17 above.
Proposition 4.18 (Graph) Let $f : X \to Y$ be a function. Then
$\mathrm{graph}(f) = \{(x, y) \in X \times Y : f(x) = y\}$
Proof. If $f : X \to Y$ is a function, then that means that $R := \mathrm{graph}(f)$ is a left total
and right unique relation $R \subseteq X \times Y$. We prove $R = \{(x, y) \in X \times Y : f(x) = y\}$.
If $(x, y) \in R$, then $xRy$ and hence $\{f(x)\} = \{y' \in Y : xRy'\} = \{y\}$ due to right
uniqueness of $R$. This means $f(x) = y$, which proves "$\subseteq$". For the other inclusion
"$\supseteq$" we consider $(x, y) \in X \times Y$ with $f(x) = y$. This means that $y$ is the only value
in $\{y' \in Y : xRy'\}$ and in particular $xRy$ holds, i.e. $(x, y) \in R$. □
Figure 4.4: Graph of the function $f : \mathbb{N} \to \mathbb{N}, n \mapsto n^2$ (horizontal axis: $n$, vertical axis: $f(n)$).

This proposition says that the graph of a function can essentially be characterized
by the function values. The diagram in Figure 4.4 is a typical illustration of (a part
of) a graph of a function. This illustration uses a Cartesian coordinate system to
illustrate the graph. The horizontal axis represents the input values $n$, whereas the
vertical axis represents the function values $f(n)$. The previous proposition is the
basis of the following observation which says that two functions with identical source
and target set are equal if and only if all their function values coincide.
Proposition 4.19 (Equality of functions) Let $f : X \to Y$ and $g : X' \to Y'$ be
functions. Then
$f = g \iff X = X' \text{ and } Y = Y' \text{ and } (\forall x \in X)\, f(x) = g(x)$.
Proof. The fact that $f : X \to Y$ and $g : X' \to Y'$ are functions means that $f =
(\mathrm{graph}(f), X, Y)$ and $g = (\mathrm{graph}(g), X', Y')$ and $\mathrm{graph}(f) \subseteq X \times Y$ and $\mathrm{graph}(g) \subseteq
X' \times Y'$ are left total and right unique relations. Hence, it is clear that
$f = g \iff X = X' \text{ and } Y = Y' \text{ and } \mathrm{graph}(f) = \mathrm{graph}(g)$.
So let us assume now that $X = X'$ and $Y = Y'$. By Proposition 4.18 we obtain
$\mathrm{graph}(f) = \mathrm{graph}(g)$
$\iff \{(x, y) \in X \times Y : f(x) = y\} = \{(x, y) \in X \times Y : g(x) = y\}$
$\iff (\forall x \in X)(\forall y \in Y)(f(x) = y \iff g(x) = y)$
$\iff (\forall x \in X)\, f(x) = g(x)$.
Altogether, this proves the claim.

Next we mention that the composition of two functions is a function again. In
fact this result follows from our previous results on relations.
Corollary 4.20 (Composition) Let $f : X \to Y$ and $g : Y \to Z$ be functions with
graphs $R := \mathrm{graph}(f)$ and $S := \mathrm{graph}(g)$. Then the relation $S \circ R \subseteq X \times Z$ is a
function too, which we denote by $g \circ f : X \to Z$.
If $f$ and $g$ are functions, then the relations $R$ and $S$ are both left total and right
unique. Hence $S \circ R$ is also left total and right unique by Propositions 4.11 and 4.13.
But this means that $S \circ R$ is a function too. The composition of two functions can
be understood as applying the functions one after the other. This is made precise
in the following proposition.
Proposition 4.21 (Composition) Let $f : X \to Y$ and $g : Y \to Z$ be functions.
Then we obtain
$(g \circ f)(x) = g(f(x))$
for all $x \in X$.

Proof. For $x \in X$ and $z \in Z$ we obtain by Proposition 4.18
$(g \circ f)(x) = z$
$\iff (x, z) \in \mathrm{graph}(g \circ f)$
$\iff (\exists y \in Y)((x, y) \in \mathrm{graph}(f) \text{ and } (y, z) \in \mathrm{graph}(g))$
$\iff (\exists y \in Y)(f(x) = y \text{ and } g(y) = z)$
$\iff g(f(x)) = z$.
This proves $(g \circ f)(x) = g(f(x))$ for all $x \in X$. □
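
Proposition 4.21 can be observed on a small finite example by composing the graphs as relations and comparing the result with the nested function application. In the following sketch the functions `f` and `g`, stored as Python dictionaries, are our own illustrative choices with $X = \{0, 1, 2\}$, $Y = \{0, 1, 2, 3\}$ and $Z = \{0, \ldots, 9\}$.

```python
def graph(f):
    """Graph of a function that is stored as a dictionary."""
    return set(f.items())

def compose(S, R):
    """Graph of S o R (R is applied first)."""
    return {(x, z) for (x, y) in R for (y2, z) in S if y == y2}

f = {x: x + 1 for x in range(3)}   # f : X -> Y, x |-> x + 1
g = {y: y * y for y in range(4)}   # g : Y -> Z, y |-> y^2

# Compose the graphs as relations; the result is right unique, so it can be
# turned back into a dictionary.
gf = dict(compose(graph(g), graph(f)))
print(gf)
print(all(gf[x] == g[f[x]] for x in f))
```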

The composition of functions $f : X \to Y$ and $g : Y \to Z$ is often illustrated in
so-called commutative diagrams. Figure 4.5 shows a commutative diagram, which is
called such because it does not matter in which order one goes through the diagram.
Moving from $X$ to $Y$ along the arrow $f$ and continuing along $g$ to $Z$ leads to the
same result as moving from $X$ to $Z$ along $g \circ f$.

Figure 4.5: A commutative diagram for the composition of two functions (arrows $f : X \to Y$, $g : Y \to Z$ and $g \circ f : X \to Z$).


Next we mention that the diagonal $\Delta_X \subseteq X \times X$ of any set $X$ is a function that
we actually call the identity of $X$.
Definition 4.22 (Identity function) Let $X$ be a set. The function
$\mathrm{id}_X : X \to X, x \mapsto x$
is called the identity of $X$.
It is easy to see that $\mathrm{graph}(\mathrm{id}_X) = \Delta_X$. We mention an immediate corollary of
Proposition 4.6.
Corollary 4.23 (Identity) Let $f : X \to Y$ be a function. Then
$f = f \circ \mathrm{id}_X = \mathrm{id}_Y \circ f$.
At the end of this section we mention some other types of relations which are
often used in mathematics. Relations that are only left total are called multi-valued
functions or correspondences and they are typically denoted by $f : X \rightrightarrows Y$. They
lack the uniqueness property and hence there is not necessarily one unique function
value $f(x)$, but an entire set $f(x) \subseteq Y$ of possible values. Relations that are only
right unique are called partial functions and they are often denoted by $f : X \rightharpoonup Y$
or $f :\subseteq X \to Y$. Partial functions are not necessarily defined on the entire source
set $X$. We write $\mathrm{dom}(f)$ for the domain of a partial function. The diagram in
Figure 4.6 lists some common types of functions and relations that are often used
in mathematics. We study injections, surjections and bijections more closely in the
next section.

relation $R \subseteq X \times Y$
  left total: multi-valued function $f : X \rightrightarrows Y$
  right unique: partial function $f : X \rightharpoonup Y$
  left total and right unique: function $f : X \to Y$
function $f : X \to Y$
  left unique: injection $f : X \hookrightarrow Y$
  right total: surjection $f : X \twoheadrightarrow Y$
  left unique and right total: bijection

Figure 4.6: Some common types of functions

Problems
4.4 Let $R \subseteq X \times Y$ and $S \subseteq Y \times Z$ be relations. Prove that:
1. If $R$ and $S$ are left unique, then $S \circ R$ is left unique.
2. $R$ is left unique if and only if $R^{-1}$ is right unique.

4.4 Injections, Surjections and Bijections

In this section we discuss functions that have additional totality and uniqueness
properties. We start with a definition.
Definition 4.24 (Injective, surjective, bijective) Let $f : X \to Y$ be a function.
1. $f$ is called injective, if $f$ is left unique,
2. $f$ is called surjective, if $f$ is right total,
3. $f$ is called bijective, if $f$ is injective and surjective.
Injective, surjective and bijective functions are also called injection, surjection and
bijection, respectively.
An injection is sometimes denoted as $f : X \hookrightarrow Y$, where the arrow $\hookrightarrow$ is supposed
to indicate that this function is injective. Such an injection is also called a function $f$
from $X$ into $Y$. Similarly, surjections are sometimes denoted as $f : X \twoheadrightarrow Y$, where
the arrow $\twoheadrightarrow$ indicates that this function is surjective. Such a surjection is also
called a function $f$ from $X$ onto $Y$. For bijections one sometimes sees the notation
f : X Y , but we will not use this here.
By definition an injective function is a function that cannot map two different
inputs to the same output and a surjective function is a function that yields all
values of the target space as output. We capture these characterizations in terms of
function values in the following proposition.
Proposition 4.25 (Injectivity, surjectivity and bijectivity) Let $f : X \to Y$
be a function. Then
1. $f$ is injective if and only if for all $x, y \in X$ we have that $f(x) = f(y)$ implies
$x = y$,
2. $f$ is surjective if and only if for all $y \in Y$ there exists an $x \in X$ with $f(x) = y$,
3. $f$ is bijective if and only if for all $y \in Y$ there exists exactly one $x \in X$ with
$f(x) = y$.
We leave the proof to the reader (see Problem 4.5). Often the above characterization of injectivity is used in its contrapositive form, i.e. a function $f : X \to Y$ is
injective if and only if for all $x, y \in X$ we have that $x \neq y$ implies $f(x) \neq f(y)$. For
short: distinct inputs have to be mapped to distinct outputs. This is the reason why
some authors also call injective functions one-to-one functions. However, this terminology is ambiguous, since it is also sometimes used to refer to bijective functions
and hence we will try to avoid it here.
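
For functions on finite sets, the characterizations of Proposition 4.25 lead to very short tests. The sketch below stores functions as dictionaries and passes the target set separately; the function names and the three finite examples (stand-ins for functions on $\mathbb{N}$, which cannot be enumerated completely) are our own.

```python
def is_injective(f):
    """Distinct inputs have distinct outputs."""
    return len(set(f.values())) == len(f)

def is_surjective(f, Y):
    """Every element of the target Y occurs as a function value."""
    return set(f.values()) == set(Y)

def is_bijective(f, Y):
    return is_injective(f) and is_surjective(f, Y)

square = {n: n * n for n in range(4)}        # target {0, ..., 9}
pred = {n: max(n - 1, 0) for n in range(5)}  # target {0, ..., 3}
ident = {n: n for n in range(4)}             # target {0, ..., 3}

print(is_injective(square), is_surjective(square, range(10)))
print(is_injective(pred), is_surjective(pred, range(4)))
print(is_bijective(ident, range(4)))
```

Whether a function is surjective depends on the chosen target: the same dictionary `square` would be surjective onto the target $\{0, 1, 4, 9\}$.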

Figure 4.7: Examples of injective, surjective and bijective functions (panels: injective but not surjective; surjective but not injective; bijective).

Figure 4.6 summarizes the different types of functions that we have seen. The
function in Example 4.15 is neither surjective nor injective. The diagrams in Figure 4.7 provide examples of injective, surjective and bijective functions. We provide
some further examples.
Example 4.26
1. The square function $f : \mathbb{N} \to \mathbb{N}, n \mapsto n^2$ is an example of a function that is
injective, but not surjective. Hence, $f$ is also not bijective.
2. The square function $f : \mathbb{Z} \to \mathbb{Z}, z \mapsto z^2$ on integers is an example of a function
that is neither injective nor surjective.
3. The predecessor function
$f : \mathbb{N} \to \mathbb{N}, \quad n \mapsto \begin{cases} 0 & \text{if } n = 0 \\ n - 1 & \text{otherwise} \end{cases}$
is an example of a function that is not injective, but surjective.
4. The maximum function $\max : \mathbb{N} \times \mathbb{N} \to \mathbb{N}$ is defined by
$\max(n, k) := \begin{cases} n & \text{if } n \geq k \\ k & \text{otherwise} \end{cases}$
for all $n, k \in \mathbb{N}$. The function $\max$ is surjective, but not injective. The same
holds for the minimum function $\min : \mathbb{N} \times \mathbb{N} \to \mathbb{N}$ that is defined analogously
with $\leq$ in place of $\geq$.
5. The identity function $\mathrm{id}_X : X \to X, x \mapsto x$ on any set $X$ is injective and
surjective, hence bijective.
6. The constant function $c_y : X \to Y, x \mapsto y$ is defined for any two sets $X$ and $Y$
and any $y \in Y$. If each of $X$ and $Y$ contains at least two different elements,
then $c_y$ is neither surjective nor injective.

The examples of the predecessor function and the maximum function illustrate
another way in which definitions (of functions) are often written in mathematics,
namely by case distinction.
Another way to characterize injective and surjective functions is by using their
behaviour under composition with other functions. Roughly speaking, we can divide functional equations by injective functions on the left-hand side and by surjective functions on the right-hand side. These properties actually characterize
injective and surjective functions and they explain why these types of functions play
a significant role.
Theorem 4.27 (Cancellation) Let f : X Y be a function. Then
1. f is injective if and only if for all sets Z and all functions g, h : Z X we
have that f g = f h implies g = h,
2. f is surjective if and only if for all sets Z and all functions g, h : Y Z we
have that g f = h f implies g = h.
Proof.
1. "$\Longrightarrow$" Let $f$ be injective and let $g, h : Z \to X$ be two functions with $f \circ g = f \circ h$.
By Propositions 4.19 and 4.21 we obtain
$$f(g(x)) = (f \circ g)(x) = (f \circ h)(x) = f(h(x))$$
for all $x \in Z$ and hence $g(x) = h(x)$ follows for all $x \in Z$ due to injectivity of
$f$ by Proposition 4.25. Again by Proposition 4.19 we obtain $g = h$.
"$\Longleftarrow$" Now let us assume that for all sets $Z$ and all functions $g, h : Z \to X$ we have that
$f \circ g = f \circ h$ implies $g = h$. Let us now choose $Z = \{0\}$ (or any other non-empty
set) and let us consider for any $x \in X$ the constant function $c_x : Z \to X, z \mapsto x$.
Let now $x_1, x_2 \in X$ with $f(x_1) = f(x_2)$. Then by Proposition 4.21
$$(f \circ c_{x_1})(z) = f(c_{x_1}(z)) = f(x_1) = f(x_2) = f(c_{x_2}(z)) = (f \circ c_{x_2})(z)$$
follows for all $z \in Z$. By Proposition 4.19 this means $f \circ c_{x_1} = f \circ c_{x_2}$ and
hence by assumption $c_{x_1} = c_{x_2}$. This implies again by Proposition 4.19 that
we obtain $x_1 = c_{x_1}(z) = c_{x_2}(z) = x_2$ for any $z \in Z$. Hence we have proved by
Proposition 4.25 that $f$ is injective.
2. We leave this proof to the reader (see Problem 4.5).



$\Box$
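The first part of Theorem 4.27 can be checked by brute force on small finite sets. The following is only an illustrative sketch: functions are encoded as Python dictionaries, the helper names (`all_functions`, `compose`, `left_cancellable`) are our own, and we test only $Z = \{0\}$, which is the set used in the proof above.

```python
from itertools import product

def all_functions(domain, codomain):
    """Yield every function domain -> codomain, encoded as a dict."""
    domain = list(domain)
    for values in product(codomain, repeat=len(domain)):
        yield dict(zip(domain, values))

def compose(f, g):
    """Return f ∘ g, i.e. x -> f(g(x)); the values of g must be keys of f."""
    return {x: f[g[x]] for x in g}

def is_injective(f):
    """A dict-encoded function is injective iff no value is repeated."""
    return len(set(f.values())) == len(f)

def left_cancellable(f, X, Z):
    """Check that f∘g == f∘h implies g == h for all g, h : Z -> X."""
    funcs = list(all_functions(Z, X))
    return all(g == h for g in funcs for h in funcs
               if compose(f, g) == compose(f, h))

X, Y, Z = [0, 1, 2], ["a", "b", "c"], [0]
# Compare both properties for every one of the 27 functions X -> Y.
agree = all(is_injective(f) == left_cancellable(f, X, Z)
            for f in all_functions(X, Y))
print(agree)
```

Such a check is of course no substitute for the proof; it only confirms the statement on one small instance.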
Another important observation is that injective, surjective and bijective functions
are all closed under composition.
Corollary 4.28 Let $f : X \to Y$ and $g : Y \to Z$ be functions. Then we obtain the
following:
1. If $f$ and $g$ are injective, then $g \circ f$ is injective.
2. If $f$ and $g$ are surjective, then $g \circ f$ is surjective.
3. If $f$ and $g$ are bijective, then $g \circ f$ is bijective.
All these statements follow from Propositions 4.11 and 4.13. Another interesting
question is when the inverse relation of a function is actually a function.
Proposition 4.29 (Inverse function) Let $f : X \to Y$ be a function with $R := \mathrm{graph}(f)$.
Then the inverse relation $R^{-1} \subseteq Y \times X$ is a function if and only if $f$ is
bijective.
Proof. "$\Longleftarrow$" Let $f$ be a bijective function, i.e. $R$ is left and right total and left
and right unique. Then $R^{-1}$ is also left and right total and left and right unique by
Propositions 4.11 and 4.13. Thus, $R^{-1}$ is, in particular, a function.
"$\Longrightarrow$" If $R^{-1}$ is a function, then it is left total and right unique. This implies that
$R$ is right total and left unique by Propositions 4.11 and 4.13. Moreover, since $f$
is a function, $R$ is also left total and right unique. Altogether, this shows that $f$ is
bijective.
$\Box$
If $f : X \to Y$ is a bijective function with $R := \mathrm{graph}(f)$, then the inverse
relation $R^{-1} \subseteq Y \times X$ is a function too by this result. We denote this function
by $f^{-1} : Y \to X$ and call it the inverse function of $f$. If $f : X \to Y$ is only
an injective function, then the inverse relation $R^{-1}$ is only a partial function (see
Problem 4.6). It is common practice in mathematics to denote this partial function
also by $f^{-1} : Y \rightharpoonup X$. This partial function can also be considered as a function
of type $f^{-1} : \mathrm{range}(f) \to X$. In other words, the inverse of an injective function
$f$ always exists as a function with $\mathrm{range}(f)$ as source set. We obtain the following
result as a corollary of Proposition 4.10.
Corollary 4.30 (Inverse function) Let $f : X \to Y$ and $g : Y \to Z$ be bijective
functions. Then
1. $f^{-1}$ is bijective and $(f^{-1})^{-1} = f$.
2. $g \circ f$ is bijective and $(g \circ f)^{-1} = f^{-1} \circ g^{-1}$.
3. $f \circ f^{-1} = \mathrm{id}_Y$ and $f^{-1} \circ f = \mathrm{id}_X$.
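For finite sets the statements of Corollary 4.30 can be checked directly. The sketch below assumes that functions are encoded as dictionaries; the concrete bijections `f` and `g` and the helper names are illustrative choices, not part of the text.

```python
def compose(g, f):
    """Return g ∘ f as a dict: x -> g(f(x))."""
    return {x: g[f[x]] for x in f}

def inverse(f):
    """Invert an injective dict-encoded function (onto its set of values)."""
    inv = {y: x for x, y in f.items()}
    if len(inv) != len(f):
        raise ValueError("f is not injective, so it has no inverse function")
    return inv

f = {1: "a", 2: "b", 3: "c"}            # a bijection X -> Y
g = {"a": 30, "b": 10, "c": 20}         # a bijection Y -> Z

lhs = inverse(compose(g, f))            # (g ∘ f)^{-1}
rhs = compose(inverse(f), inverse(g))   # f^{-1} ∘ g^{-1}
id_Y = {y: y for y in g}                # identity on Y
id_X = {x: x for x in f}                # identity on X
```

Note that the order of the factors is reversed in $(g \circ f)^{-1} = f^{-1} \circ g^{-1}$: to undo "first $f$, then $g$" one first undoes $g$ and then $f$.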


The bijective functions of type $f : X \to X$ (with identical source and target set)
have particularly nice properties. They form what is called the symmetric group on
$X$. We mention all the relevant properties.
Corollary 4.31 (Symmetric group) Let $X$ be a set. Then we obtain for all bijective $f, g, h : X \to X$ the following:
1. $(f \circ g) \circ h = f \circ (g \circ h)$ (associative)
2. $f \circ \mathrm{id}_X = \mathrm{id}_X \circ f = f$ (identity)
3. $f \circ f^{-1} = f^{-1} \circ f = \mathrm{id}_X$ (inverse)

The bijective functions $f : X \to X$ are also called permutations. This terminology is used in particular if $X$ is a finite set. This is because a bijective map
$f : X \to X$ actually permutes the elements of $X$. The bijective function in Figure 4.7
is a typical example of a permutation on a finite set.
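As a small illustration of Corollary 4.31, one can enumerate all permutations of a three-element set and verify the three properties exhaustively. This is only a sketch; the set `X` and the dictionary encoding of functions are assumptions made for the example.

```python
from itertools import permutations

X = (0, 1, 2)
# Every bijection X -> X, encoded as a dict.
S_X = [dict(zip(X, p)) for p in permutations(X)]
identity = {x: x for x in X}

def compose(f, g):
    """Return f ∘ g as a dict: x -> f(g(x))."""
    return {x: f[g[x]] for x in g}

def inverse(f):
    """Inverse of a bijection given as a dict."""
    return {y: x for x, y in f.items()}

associative = all(compose(compose(f, g), h) == compose(f, compose(g, h))
                  for f in S_X for g in S_X for h in S_X)
has_identity = all(compose(f, identity) == f == compose(identity, f)
                   for f in S_X)
has_inverses = all(compose(f, inverse(f)) == identity == compose(inverse(f), f)
                   for f in S_X)
```

For a set with $n$ elements there are $n!$ permutations, which is presumably the motivation for the notation $X!$ used later for the set of bijections on $X$.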

Problems
4.5 Let $f : X \to Y$ be a function. Prove the following:
1. $f$ is injective if and only if for all $x, y \in X$ we have that $f(x) = f(y)$ implies $x = y$,
2. $f$ is surjective if and only if for all $y \in Y$ there exists an $x \in X$ with $f(x) = y$.
3. $f$ is bijective if and only if for all $y \in Y$ there exists exactly one $x \in X$ with $f(x) = y$.
4. $f$ is surjective if and only if for all sets $Z$ and all functions $g, h : Y \to Z$ we have that $g \circ f = h \circ f$
implies $g = h$.
4.6 Let $f : X \to Y$ be a function with $R := \mathrm{graph}(f)$. Prove that the inverse relation
$R^{-1} \subseteq Y \times X$ is a partial function of type $f^{-1} : Y \rightharpoonup X$ if and only if $f$ is injective. Prove
that for injective $f$ one obtains $\mathrm{dom}(f^{-1}) = \mathrm{range}(f)$. Hence, one can also consider this
partial function as a function $f^{-1} : \mathrm{range}(f) \to X$.
4.7 Let $X$ and $Y$ be non-empty sets. Prove that the canonical projections
$$p_X : X \times Y \to X, (x, y) \mapsto x \quad\text{and}\quad p_Y : X \times Y \to Y, (x, y) \mapsto y$$
are both surjective.
4.8 Let $f_i : X_i \to Y_i$ be functions for $i \in \{1, 2\}$. Then we define the product function by
$$f_1 \times f_2 : X_1 \times X_2 \to Y_1 \times Y_2, (x_1, x_2) \mapsto (f_1(x_1), f_2(x_2)).$$
Prove the following:
1. $f_1$ and $f_2$ injective $\Longrightarrow$ $f_1 \times f_2$ injective,
2. $f_1$ and $f_2$ surjective $\Longrightarrow$ $f_1 \times f_2$ surjective,
3. $f_1$ and $f_2$ bijective $\Longrightarrow$ $f_1 \times f_2$ bijective,
4. $f_1$ and $f_2$ bijective $\Longrightarrow$ $(f_1 \times f_2)^{-1} = f_1^{-1} \times f_2^{-1}$.
4.9 Let $X$ be a set. Prove the following:
1. The union $U : 2^X \times 2^X \to 2^X, (A, B) \mapsto A \cup B$ is surjective, but not injective in
general.
2. The intersection $I : 2^X \times 2^X \to 2^X, (A, B) \mapsto A \cap B$ is surjective, but not injective in
general.
3. The complement $C : 2^X \to 2^X, A \mapsto X \setminus A$ is bijective.
4.10 Let $f : X \to Y$ and $g : Y \to Z$ be functions. Prove the following:
1. $g \circ f$ surjective $\Longrightarrow$ $g$ surjective,
2. $g \circ f$ injective $\Longrightarrow$ $f$ injective.
4.11 Prove that for each function $f : X \to Y$ there exists a set $Z$ and a surjective function
$g : X \to Z$ and an injective function $h : Z \to Y$ such that $f = h \circ g$.

4.5

Families, Sequences and Restrictions

In this section we just introduce some further terminology that is related to the
source set of a function. A sequence in $X$ is just another name for a function
$f : \mathbb{N} \to X$ and a family in $X$ indexed by $I$ is just a function $f : I \to X$. There are
special ways of denoting such functions.
Definition 4.32 (Family and sequence) Let $I$ and $X$ be non-empty sets and let
$x_i \in X$ for each $i \in I$. Then $(x_i)_{i \in I}$ is just another way of writing the function
$$f : I \to X, i \mapsto x_i$$
and this function is called a family in $X$ (indexed by $I$). A family $(x_n)_{n \in \mathbb{N}}$ in $X$ that
is indexed by $\mathbb{N}$ is called a sequence in $X$.
The notation $(x_n)_{n \in \mathbb{N}}$ for sequences can easily be read as a generalization of the
notation $(x_1, x_2, ..., x_n)$ for $n$-tuples, since a sequence $(x_n)_{n \in \mathbb{N}}$ can be considered in
some sense as the infinite tuple
$$(x_0, x_1, x_2, x_3, ...).$$
However, one should keep in mind that formally we mean by $(x_n)_{n \in \mathbb{N}}$ the function
$f : \mathbb{N} \to X, n \mapsto x_n$. Sometimes sequences are also written as $\{x_n\}_{n \in \mathbb{N}}$, but this
notation is misleading since one has to distinguish a sequence $(x_n)_{n \in \mathbb{N}}$ (which is a
function $f : \mathbb{N} \to X$) from the set $\{x_n : n \in \mathbb{N}\}$ (which is, in fact, nothing but
$\mathrm{range}(f)$). See also Problem 4.12. We mention that the terminology of an indexed
family of sets $(X_i)_{i \in I}$ naturally falls under the terminology of a family introduced
here. If $X := \bigcup_{i \in I} X_i$, then $(X_i)_{i \in I}$ can be seen as a family in $2^X$ indexed by $I$. In
other words, what we mean by $(X_i)_{i \in I}$ is exactly the function $f : I \to 2^X, i \mapsto X_i$.


Occasionally, one is interested in changing the source set of a function. Since the
source set is part of what constitutes the function, this might change the properties
of the function.
Definition 4.33 (Restriction and extension) Let $f : X \to Y$ be a function and
let $A \subseteq X$. Then we define the restriction of $f$ to $A$ by
$$f|_A : A \to Y, x \mapsto f(x).$$
In this situation $f$ is also called an extension of $f|_A$.
So, in other words, the restriction $f|_A$ of $f$ to $A$ is simply obtained by leaving
$f$ as it is, but by allowing only inputs from the (potentially smaller) source set $A$.
In this way one can cut off pieces of $f$ that stop $f$ from being injective or from
having other properties. We give an example.

Example 4.34
1. We consider the function $f : X \to Y$ from Example 4.15. This function
$f$ is not injective, since $f(0) = f(1) = 1$. By restricting $f$ to either $A =
\{0, 2, 3, 4, 5\}$ or to $B = \{1, 2, 3, 4, 5\}$ we obtain restrictions $f|_A : A \to Y$ and
$f|_B : B \to Y$ that are both injective.
2. We consider the square function $f : \mathbb{Z} \to \mathbb{Z}, z \mapsto z^2$, which is not injective
since, for instance, $f(-1) = f(1) = 1$. If we restrict $f$ to $\mathbb{N}$, then the resulting
restriction $f|_{\mathbb{N}} : \mathbb{N} \to \mathbb{Z}$ is injective.
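The second example can be imitated on a finite window of the integers. This is a sketch only: the window $\{-5, ..., 5\}$ stands in for $\mathbb{Z}$, and the helper names `restrict` and `is_injective` are our own.

```python
def restrict(f, A):
    """Restriction f|_A of a dict-encoded function f to a subset A of its keys."""
    return {x: f[x] for x in A}

def is_injective(f):
    """A dict-encoded function is injective iff no value is repeated."""
    return len(set(f.values())) == len(f)

window = range(-5, 6)                    # finite stand-in for Z
square = {z: z * z for z in window}      # z -> z^2
naturals = [z for z in window if z >= 0] # the part of N inside the window
square_on_N = restrict(square, naturals)

print(is_injective(square), is_injective(square_on_N))
```

The restriction changes the source set but not the rule $z \mapsto z^2$; only the inputs that caused collisions have been removed.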
Later we will prove in Proposition 4.51 that any function $f : X \to Y$ has an injective
restriction $f|_A : A \to Y$ with the same range, i.e. such that $\mathrm{range}(f) = \mathrm{range}(f|_A)$.
However, this proof requires the Axiom of Choice.

Problems
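4.12 Show that there are two sequences $(x_i)_{i \in \mathbb{N}}$ and $(y_i)_{i \in \mathbb{N}}$ in $\mathbb{N}$ such that
$(x_i)_{i \in \mathbb{N}} \neq (y_i)_{i \in \mathbb{N}}$ and $\{x_i : i \in \mathbb{N}\} = \{y_i : i \in \mathbb{N}\}$.
4.13 Let $X$ be a set. We consider the union map
$$U : 2^X \times 2^X \to 2^X, (A, B) \mapsto A \cup B.$$
Find a restriction $U|_Y$ of $U$ that is bijective.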

4.6

Images and Preimages

When we work with functions $f : X \to Y$ we are often not just interested in single
function values, but we would like to know how a function $f$ behaves on certain
subsets $A \subseteq X$ or $B \subseteq Y$. In order to express such properties, we define the image
of a set under a function and the preimage of a set under a function.



Definition 4.35 (Image and preimage) Let $f : X \to Y$ be a function and let
$A \subseteq X$ and $B \subseteq Y$. Then we define
1. $f(A) := \{y \in Y : (\exists x \in A)\ f(x) = y\}$, the image of $A$ under $f$,
2. $f^{-1}(B) := \{x \in X : f(x) \in B\}$, the preimage of $B$ under $f$.
In other words, the values in the image $f(A)$ are all the function values of $f$ that
one obtains for inputs from $A$ and the set $f^{-1}(B)$ is the set of all inputs that yield
outputs in $B$. In particular, we obtain $\mathrm{range}(f) = f(X)$ and $X = f^{-1}(Y)$.

Figure 4.8: Image $B = f(A)$ and preimage $f^{-1}(B)$


The diagram in Figure 4.8 illustrates the situation that we get if we start with
a function $f : X \to Y$ and a set $A \subseteq X$: in a first step we consider the image
$B = f(A)$ and in a second step the preimage $f^{-1}(B)$. The diagram illustrates that
the preimage $f^{-1}(B)$ is potentially larger than the set $A$ that we started with. This
is because elements from $X$ that are not in $A$ can also potentially be mapped to
$B = f(A)$. So, what we can say is that $A \subseteq f^{-1}(f(A))$. A proper proof of this fact
is requested in Problem 4.19.
Correspondingly, the diagram in Figure 4.9 illustrates the situation that we get
if we start with a function $f : X \to Y$ and a set $B \subseteq Y$: in a first step we consider
the preimage $A = f^{-1}(B)$ and in a second step we consider the image $f(A)$. In
general, we only get that $f(f^{-1}(B)) \subseteq B$ and potentially $f(f^{-1}(B))$ is smaller than
the set $B$ we started with. This is because some elements of $B$ might not be in the
range of $f$. A proper proof of this fact is the subject of Problem 4.19.
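Both inclusions can be strict, as a small finite example shows. The following sketch encodes a function as a dictionary; the particular function `f` and the sets `A` and `B` are hypothetical choices made for illustration.

```python
def image(f, A):
    """f(A): all values f(x) for inputs x from A."""
    return {f[x] for x in A}

def preimage(f, B):
    """f^{-1}(B): all inputs x whose value f(x) lies in B."""
    return {x for x in f if f[x] in B}

# A function f : X -> Y with X = {0, 1, 2, 3} and Y = {"u", "v", "w", "t"}.
f = {0: "u", 1: "u", 2: "v", 3: "w"}
X, Y = set(f), {"u", "v", "w", "t"}

A = {0, 2}
back = preimage(f, image(f, A))      # also contains 1, since f(1) = f(0)

B = {"w", "t"}
forth = image(f, preimage(f, B))     # loses "t", which is not in range(f)
```

Here `back` is strictly larger than `A` because `f` is not injective, and `forth` is strictly smaller than `B` because `f` is not surjective; this matches the characterizations asked for in Problem 4.19.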
It is important to emphasize that $f^{-1}$ in the definition of the preimage does not
refer to the inverse function; the notation $f^{-1}$ is overloaded with two different
meanings. If $f^{-1}$ is used together with a set $B$, as in $f^{-1}(B)$, then it refers
to the preimage of $B$ under $f$, which always exists. If $f^{-1}$ is used with a single
value $y \in Y$, as in $f^{-1}(y)$, then it refers to the inverse function of $f$, which does
not need to exist. We mention that there is a special name for the sets $f^{-1}(\{y\})$.

Figure 4.9: Preimage $A = f^{-1}(B)$ and image $f(A)$

Definition 4.36 (Fiber) Let $f : X \to Y$ be a function and $y \in Y$. Then $f^{-1}(\{y\})$
is called the fiber over $y$.
In general, the fiber over $y$ may contain many elements, namely all those elements
$x \in X$ that are mapped by $f$ to $y$. If the inverse function $f^{-1}$ exists, then $f^{-1}(B)$
is the same thing as the image of $B$ under the inverse function $f^{-1}$ and hence this
overloading of notation is justified. We capture this for the special case of fibers in
the following proposition.
Proposition 4.37 (Inverse function and preimage) If $f : X \to Y$ is an injective function, then
$$f^{-1}(\{y\}) = \{f^{-1}(y)\}$$
for all $y \in \mathrm{range}(f)$.
Here $f^{-1}$ appears in two different meanings: on the left-hand side of the equality
it appears as notation for the preimage, on the right-hand side of the equality it
appears as notation for the inverse function. We recall that for injective $f$ the
function $f^{-1}$ can either be considered as a partial function $f^{-1} : Y \rightharpoonup X$ or as an ordinary
function $f^{-1} : \mathrm{range}(f) \to X$. We leave the obvious proof of the proposition to
the reader. It is very important to keep in mind that the preimage $f^{-1}(B)$ exists
irrespective of whether the inverse function $f^{-1}$ exists or not.
In a context where the values might be sets as well, it is better to use a slightly
different notation in order to avoid confusion. Some authors write $f[A]$ for the image
and $f^{-1}[B]$ for the preimage in such a context. The image is sometimes also called
forward image and the preimage is called inverse image as well. As a first result we
mention a monotonicity property of the image and the preimage that shows that
both preserve the subset relation.
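Proposition 4.38 (Monotonicity of image and preimage) Let $f : X \to Y$ be
a function and let $A, B \subseteq X$ and $C, D \subseteq Y$. Then
1. $f(\emptyset) = \emptyset$, $f^{-1}(\emptyset) = \emptyset$,
2. $A \subseteq B \Longrightarrow f(A) \subseteq f(B)$,
3. $C \subseteq D \Longrightarrow f^{-1}(C) \subseteq f^{-1}(D)$.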
Proof.
1. This property is easy to verify.
2. We leave this one to the reader (see Problem 4.14).
3. Let $C$ and $D$ be subsets of $Y$ with $C \subseteq D$. We need to prove $f^{-1}(C) \subseteq f^{-1}(D)$.
Let $x \in f^{-1}(C)$. This means $f(x) \in C$. Since $C \subseteq D$, we obtain $f(x) \in D$.
But this means $x \in f^{-1}(D)$.
$\Box$
Images and preimages are not completely independent constructions. There
are some important relations between images and preimages, which are studied in
Problems 4.17 and 4.19. For many applications in mathematics it is important
to understand how set-theoretical operations behave with respect to images and
preimages. The rough rule of thumb is that preimages are much better behaved than
images and images basically only preserve unions. We make this precise in the
following proposition.
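Proposition 4.39 (Image, preimage and set operations) Let $f : X \to Y$ be
a function and let $(A_i)_{i \in I}$ be an indexed family of subsets of $X$ and let $(B_i)_{i \in I}$ be an
indexed family of subsets of $Y$. Let $A, B \subseteq X$ and $C, D \subseteq Y$. Then
1. $f(\bigcup_{i \in I} A_i) = \bigcup_{i \in I} f(A_i)$,
2. $f(\bigcap_{i \in I} A_i) \subseteq \bigcap_{i \in I} f(A_i)$,
3. $f(A \setminus B) \supseteq f(A) \setminus f(B)$,
4. $f^{-1}(\bigcup_{i \in I} B_i) = \bigcup_{i \in I} f^{-1}(B_i)$,
5. $f^{-1}(\bigcap_{i \in I} B_i) = \bigcap_{i \in I} f^{-1}(B_i)$,
6. $f^{-1}(C \setminus D) = f^{-1}(C) \setminus f^{-1}(D)$.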
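Proof.
1. Let $(A_i)_{i \in I}$ be an indexed family of sets. Then we obtain for all $y \in Y$
$$\begin{aligned}
y \in f(\textstyle\bigcup_{i \in I} A_i) &\iff (\exists x \in \textstyle\bigcup_{i \in I} A_i)\ f(x) = y \\
&\iff (\exists i \in I)(\exists x \in A_i)\ f(x) = y \\
&\iff (\exists i \in I)\ y \in f(A_i) \\
&\iff y \in \textstyle\bigcup_{i \in I} f(A_i).
\end{aligned}$$
This shows $f(\bigcup_{i \in I} A_i) = \bigcup_{i \in I} f(A_i)$.
2. to 4. We leave these proofs to the reader (see Problem 4.14).
5. Let $(B_i)_{i \in I}$ be an indexed family of sets. Then we obtain for all $x \in X$
$$\begin{aligned}
x \in f^{-1}(\textstyle\bigcap_{i \in I} B_i) &\iff f(x) \in \textstyle\bigcap_{i \in I} B_i \\
&\iff (\forall i \in I)\ f(x) \in B_i \\
&\iff (\forall i \in I)\ x \in f^{-1}(B_i) \\
&\iff x \in \textstyle\bigcap_{i \in I} f^{-1}(B_i).
\end{aligned}$$
This shows $f^{-1}(\bigcap_{i \in I} B_i) = \bigcap_{i \in I} f^{-1}(B_i)$.
6. We leave this proof to the reader (see Problem 4.14).
$\Box$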
It is important to point out that the image does not preserve intersections and
differences; we only have the inclusions given in 2. and 3. It is easy to find
examples that show that the other inclusions are not valid in general. However, for
injective functions one can prove somewhat more (see Problem 4.15). Problem 4.16
is about how restrictions affect the preimage. In Problem 4.20 we discuss two maps,
the image map and the preimage map, which are induced by the image and preimage, respectively. We close this
section with a result that shows how images and preimages of compositions can be
determined.
Proposition 4.40 (Image, preimage and composition) Let $f : X \to Y$ and
$g : Y \to Z$ be functions and let $A \subseteq X$ and $B \subseteq Z$. Then the following holds:
1. $(g \circ f)(A) = g(f(A))$,
2. $(g \circ f)^{-1}(B) = f^{-1}(g^{-1}(B))$.
We leave the proof to the reader (see Problem 4.21).
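A two-element example already shows that the inclusions for the image in Proposition 4.39 can be strict, while the corresponding statements for the preimage and the composition rules of Proposition 4.40 hold with equality. The sketch below is an illustration only; the functions `f` and `g` and all sets are hypothetical choices, and functions are encoded as dictionaries.

```python
def image(f, A):
    """f(A) = { f(x) : x in A }."""
    return {f[x] for x in A}

def preimage(f, B):
    """f^{-1}(B) = { x : f(x) in B }."""
    return {x for x in f if f[x] in B}

f = {0: "u", 1: "u"}        # non-injective f : {0, 1} -> {"u", "v"}
A, B = {0}, {1}

# Image: only inclusions hold for intersection and difference.
inter_lhs, inter_rhs = image(f, A & B), image(f, A) & image(f, B)
diff_lhs, diff_rhs = image(f, A - B), image(f, A) - image(f, B)

# Preimage: equality holds, here checked for C = {"u"} and D = {"u", "v"}.
C, D = {"u"}, {"u", "v"}
pre_ok = (preimage(f, C & D) == preimage(f, C) & preimage(f, D)
          and preimage(f, D - C) == preimage(f, D) - preimage(f, C))

# Composition: (g∘f)(A) = g(f(A)) and (g∘f)^{-1}(E) = f^{-1}(g^{-1}(E)).
g = {"u": True, "v": False}
gf = {x: g[f[x]] for x in f}
comp_ok = (image(gf, A) == image(g, image(f, A))
           and preimage(gf, {True}) == preimage(f, preimage(g, {True})))
```

In this example $f(A \cap B) = \emptyset$ while $f(A) \cap f(B) = \{u\}$, and $f(A \setminus B) = \{u\}$ while $f(A) \setminus f(B) = \emptyset$.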

Problems
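4.14 Let $f : X \to Y$ be a function and let $(A_i)_{i \in I}$ be an indexed family of subsets of $X$ and
let $(B_i)_{i \in I}$ be an indexed family of subsets of $Y$. Let $A, B \subseteq X$ and $C, D \subseteq Y$. Prove the
following:
1. $A \subseteq B \Longrightarrow f(A) \subseteq f(B)$,
2. $f(\bigcap_{i \in I} A_i) \subseteq \bigcap_{i \in I} f(A_i)$,
3. $f(A \setminus B) \supseteq f(A) \setminus f(B)$,
4. $f^{-1}(\bigcup_{i \in I} B_i) = \bigcup_{i \in I} f^{-1}(B_i)$,
5. $f^{-1}(C \setminus D) = f^{-1}(C) \setminus f^{-1}(D)$.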


4.15 Let $f : X \to Y$ be an injective function and $A, B \subseteq X$. Prove the following:
1. $f(A \cap B) = f(A) \cap f(B)$,
2. $f(A \setminus B) = f(A) \setminus f(B)$.
4.16 Let $f : X \to Y$ be a function and let $A \subseteq X$ and $B \subseteq Y$. Prove that
$$(f|_A)^{-1}(B) = A \cap f^{-1}(B).$$
4.17 Let $f : X \to Y$ be a function and let $A \subseteq X$ and $B \subseteq Y$. Prove that
$$f(A) \subseteq B \iff A \subseteq f^{-1}(B).$$
4.18 Let $f : X \to Y$ be a function. Prove the following:
1. $f$ is injective if and only if $f^{-1}(\{y\})$ contains at most one element for each $y \in Y$.
2. $f$ is surjective if and only if $f^{-1}(\{y\})$ contains at least one element for each $y \in Y$.
3. $f$ is bijective if and only if $f^{-1}(\{y\})$ contains exactly one element for each $y \in Y$.
4.19 Let $f : X \to Y$ be a function. Prove the following:
1. $A \subseteq f^{-1}(f(A))$ for each set $A \subseteq X$,
2. $f(f^{-1}(B)) \subseteq B$ for each set $B \subseteq Y$,
3. $f$ is injective if and only if $A = f^{-1}(f(A))$ for each set $A \subseteq X$,
4. $f$ is surjective if and only if $B = f(f^{-1}(B))$ for each set $B \subseteq Y$.
4.20 For each function $f : X \to Y$ we define two associated functions
1. $f_* : 2^X \to 2^Y, A \mapsto f(A)$ (image map)
2. $f^* : 2^Y \to 2^X, B \mapsto f^{-1}(B)$ (preimage map)
Prove the following:
1. $f$ is injective if and only if $f_*$ is injective if and only if $f^*$ is surjective,
2. $f$ is surjective if and only if $f_*$ is surjective if and only if $f^*$ is injective,
3. $f$ is bijective if and only if $f_*$ is bijective if and only if $f^*$ is bijective,
4. If $f$ is bijective then $(f_*)^{-1} = f^*$.
4.21 Let $f : X \to Y$ and $g : Y \to Z$ be functions and let $A \subseteq X$ and $B \subseteq Z$. Prove that
the following holds:
1. $(g \circ f)(A) = g(f(A))$,
2. $(g \circ f)^{-1}(B) = f^{-1}(g^{-1}(B))$.

4.7

Set of Functions

In this section we discuss the set $Y^X$ of all functions $f : X \to Y$ for two given sets
$X$ and $Y$. As we will see, this concept generalizes the concept of a power set in some
sense and it can be considered as an exponentiation operation for sets.
Definition 4.41 (Set of functions) Let $X$ and $Y$ be sets. Then we denote by
$Y^X$ the set of all functions $f : X \to Y$ and by $X!$ the set of bijective functions
$f : X \to X$.
Some authors denote the set of bijective functions also by $S_X$, since it is also
called the symmetric group on $X$, as mentioned in Corollary 4.31. There is one
important function that comes associated with the function set $Y^X$ and which is
called evaluation.
Definition 4.42 (Evaluation) Let $X$ and $Y$ be sets. Then we define the evaluation map by
$$\mathrm{ev} : Y^X \times X \to Y, (f, x) \mapsto f(x).$$
Sometimes the evaluation map is also called apply operation since it applies the
first argument (which is a function) to the second argument (which is a suitable
input). The next theorem tells us that we can identify the set $Z^{X \times Y}$ with
the set $(Z^Y)^X$. This corresponds to the arithmetic rule that for natural numbers
$x, y, z \in \mathbb{N}$ we have $(z^y)^x = z^{xy}$. The bijection that maps $Z^{X \times Y}$ to $(Z^Y)^X$ is called
currying operation since it has been studied by Haskell Curry (and indeed already
earlier by others such as Moses Schönfinkel).
Theorem 4.43 (Currying) Let $X$, $Y$ and $Z$ be sets. Then the so-called currying
operation
$$C : Z^{X \times Y} \to (Z^Y)^X,$$
which is defined by $C(f)(x)(y) := f(x, y)$ for all functions $f : X \times Y \to Z$ and all
$x \in X$ and $y \in Y$, is bijective.
Proof. Let $g : X \to Z^Y$ be a function. Then we can define a function $f : X \times Y \to Z$
by
$$f(x, y) := g(x)(y)$$
for all $x \in X$ and $y \in Y$. For this function $f$ we obtain
$$C(f)(x)(y) = f(x, y) = g(x)(y)$$
for all $x \in X$ and $y \in Y$. Hence $C(f)(x) = g(x)$ for all $x \in X$, which means
$C(f) = g$. This shows that $C$ is surjective. Now, let $f_1, f_2 : X \times Y \to Z$ be two
functions with $C(f_1) = C(f_2)$. This implies $C(f_1)(x) = C(f_2)(x)$ for all $x \in X$ and
hence
$$f_1(x, y) = C(f_1)(x)(y) = C(f_2)(x)(y) = f_2(x, y)$$
for all $x \in X$ and $y \in Y$. This means $f_1 = f_2$. Hence $C$ is injective. Altogether, we
have proved that $C$ is bijective.
$\Box$
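Currying is also a familiar operation in programming. The following sketch mirrors the proof: `curry` plays the role of $C$, `uncurry` is the inverse construction $g \mapsto f$ with $f(x, y) := g(x)(y)$, and `ev` is the evaluation map of Definition 4.42. The function `power` is merely an illustrative choice of $f : \mathbb{N} \times \mathbb{N} \to \mathbb{N}$.

```python
def curry(f):
    """C(f): turn a two-argument f(x, y) into x -> (y -> f(x, y))."""
    return lambda x: lambda y: f(x, y)

def uncurry(g):
    """Inverse construction: turn g(x)(y) into a two-argument function."""
    return lambda x, y: g(x)(y)

def ev(f, x):
    """Evaluation map: apply the function f to the input x."""
    return f(x)

def power(x, y):
    """Example function on pairs of natural numbers."""
    return x ** y

two_to_the = curry(power)(2)          # the function y -> 2 ** y
same_as_power = uncurry(curry(power)) # agrees with power on every input
```

Fixing the first argument, as in `two_to_the`, is exactly what makes the curried form useful: $C(f)(x)$ is itself a function in $Z^Y$.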
Now we want to show in which sense the exponentiation $Y^X$ generalizes the
power set construction. There is a particular function $\chi_A : X \to \{0, 1\}$ for each set
$A \subseteq X$ which is called the characteristic function.
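Definition 4.44 (Characteristic function) Let $X$ be a set. For each subset $A \subseteq X$
we define the characteristic function $\chi_A : X \to \{0, 1\}$ of $A$ by
$$\chi_A : X \to \{0, 1\}, x \mapsto \begin{cases} 1 & \text{if } x \in A \\ 0 & \text{otherwise} \end{cases}.$$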
The characteristic function $\chi_A$ is called so, since one can think about it as if it
answers the question of whether $x \in A$ by the result 1 for "true" and 0 for "false".
In particular, we have
$$x \in A \iff \chi_A(x) = 1$$
for all $x \in X$. The following result shows now in which sense the function set
construction $Y^X$ generalizes the power set construction $2^X$. Namely, there is a
bijection between $2^X$ and $\{0, 1\}^X$.
Theorem 4.45 (Characteristic function) Let $X$ be a set. Then the following
map is bijective:
$$\chi : 2^X \to \{0, 1\}^X, A \mapsto \chi_A.$$
We leave the proof to the reader (see Problem 4.23). This result tells us
that we can identify the power set $2^X$ with the set of functions $\{0, 1\}^X$,
where each set $A \in 2^X$ is represented by its characteristic function $\chi_A$. That the
map $\chi$ is bijective means that we do not lose any information when we move from
the set $A$ to its characteristic function $\chi_A$ or backwards.
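For a small finite set the correspondence between subsets and characteristic functions can be tabulated completely. In the sketch below (an illustration, with helper names of our own choosing) each characteristic function is recorded as its table of values on $X = \{1, 2, 3\}$.

```python
from itertools import combinations

def chi(A):
    """Characteristic function of A: returns 1 on elements of A, else 0."""
    return lambda x: 1 if x in A else 0

def subset_of(chi_A, X):
    """Recover the set A from its characteristic function on X."""
    return {x for x in X if chi_A(x) == 1}

X = [1, 2, 3]
# All subsets of X, i.e. the power set 2^X.
subsets = [set(c) for r in range(len(X) + 1) for c in combinations(X, r)]
# The value table (chi_A(1), chi_A(2), chi_A(3)) of each characteristic function.
tables = [tuple(chi(A)(x) for x in X) for A in subsets]
```

The $2^3 = 8$ subsets yield $8$ pairwise different tables, which exhausts all functions $X \to \{0, 1\}$; this is the finite shadow of Theorem 4.45 and explains the notation $2^X$.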

Problems
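4.22 Let $X$, $Y$ and $Z$ be sets. Prove that for any function $f : X \times Y \to Z$ we obtain
$$\mathrm{ev} \circ (C(f) \times \mathrm{id}_Y) = f.$$
Here $C$ denotes the currying operation as defined in Theorem 4.43 and $\mathrm{ev} : Z^Y \times Y \to Z$
denotes the evaluation map. The following diagram illustrates the situation.
[Diagram: $f : X \times Y \to Z$ along the top, $C(f) \times \mathrm{id}_Y : X \times Y \to Z^Y \times Y$ pointing downwards, and $\mathrm{ev} : Z^Y \times Y \to Z$.]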

4.23 Let $X$ be a set. Prove that the following map is bijective:
$$\chi : 2^X \to \{0, 1\}^X, A \mapsto \chi_A.$$
4.24 Let $X$ be a set and let $A, B \subseteq X$. Prove the following for all $x \in X$:
1. $\chi_{A \cap B}(x) = \min(\chi_A(x), \chi_B(x))$,
2. $\chi_{A \cup B}(x) = \max(\chi_A(x), \chi_B(x))$,
3. $\chi_{A^c}(x) = 1 - \chi_A(x)$.
Here the complement is understood with respect to $X$. We denote by $\min, \max : \mathbb{N}^2 \to \mathbb{N}$
the minimum and maximum on natural numbers, respectively, as defined in Example 4.26.
4.25 Let $X$ and $Y$ both be sets with more than one element. Prove that
1. The range map $\mathrm{range} : Y^X \to 2^Y, f \mapsto \mathrm{range}(f)$ is not injective. Is it surjective?
2. The graph map $\mathrm{graph} : Y^X \to 2^{X \times Y}, f \mapsto \mathrm{graph}(f)$ is injective, but not surjective.
3. The inversion map $\mathrm{inv} : X! \to X!, f \mapsto f^{-1}$ is bijective.

4.8

The Axiom of Choice


"The Axiom of Choice (together with the Continuum Hypothesis)
is probably the most interesting and most discussed axiom in
mathematics after Euclid's Axiom of Parallels."
P. Bernays and A.A. Fraenkel (1958)

There is a particularly important axiom in set theory, which is called the Axiom
of Choice. It is perhaps the most controversial set-theoretical axiom, and some
mathematicians prefer not to use it, or at least to indicate whenever they use it.
However, this axiom is often applied tacitly without even mentioning it. We
phrase this axiom in the form of a definition.
Definition 4.46 (Axiom of Choice) The Axiom of Choice is the statement that
for any set $X$ there exists a choice function
$$C_X : 2^X \setminus \{\emptyset\} \to X$$
with $C_X(A) \in A$ for all non-empty $A \subseteq X$.
What the choice function $C_X$ does is that for any non-empty set $A \subseteq X$ it selects
a point $x = C_X(A)$ with the property that $x \in A$. This seems to be a trivial task
since any non-empty set $A$ has to have some member $x$. This is the reason why
most mathematicians readily accept the Axiom of Choice. However, from a more
constructive point of view, the axiom is debatable, since it does not specify how
such a point $x$ shall be chosen in general. The other axioms of set theory, such
as the power set axiom, specify in some sense how the object whose existence is
postulated is actually constructed. This is different in the case of the Axiom of Choice.
The existence of a certain set (a left total and right unique relation that is the graph
of a choice function $C_X$) is postulated for each set $X$ without further specification.
Indeed, one can prove that the Axiom of Choice directly implies the Principle of
Excluded Middle and hence it is not constructive in this sense. We formulate and
prove a corresponding theorem.
Theorem 4.47 (Diaconescu-Goodman-Myhill 1975) The Axiom of Choice directly implies the Principle of Excluded Middle (and "directly" here means that this
implication can be shown with a direct proof that does not use the principle itself).
Proof. Let $P$ be a proposition. We have to show that $P \vee \neg P$ is true (without
using this principle itself). We consider the sets
$$U := \{x \in \{0, 1\} : (x = 0) \vee P\} \quad\text{and}\quad V := \{x \in \{0, 1\} : (x = 1) \vee P\}.$$
Certainly both sets are non-empty since $0 \in U$ and $1 \in V$. Now we consider the set
$X := \{0, 1\}$, so that $U, V \in 2^X \setminus \{\emptyset\}$. By the Axiom of Choice there is a choice function $C_X$ for $X$ and this
function has the property that $C_X(U) \in U$ and $C_X(V) \in V$. Now, if $P$ is true, then
$U = V = \{0, 1\}$ and hence $C_X(U) = C_X(V)$. This means that $C_X(U) \neq C_X(V)$
implies $\neg P$. Now we see
$$\begin{aligned}
(C_X(U) \in U) \wedge (C_X(V) \in V) &\Longrightarrow ((C_X(U) = 0) \vee P) \wedge ((C_X(V) = 1) \vee P) \\
&\Longrightarrow (C_X(U) \neq C_X(V)) \vee P \\
&\Longrightarrow \neg P \vee P.
\end{aligned}$$
This was to be proved.
$\Box$
In the Appendix on Axiomatic Set Theory we have given a slightly different
alternative formulation of the Axiom of Choice. In the next section we will see yet
another equivalent version. The Axiom of Choice is known to be independent of
the other axioms of Zermelo-Fraenkel set theory (which follows from work of Kurt
Gödel and Paul Cohen). Hence, it can neither be proven from the other axioms nor
refuted on their basis. A reasonable perspective is to think about
mathematics as a very rich theory that can explore theorems based on the Axiom
of Choice and also theorems which are not based on it (or even theorems based on
other competing axioms that might contradict the Axiom of Choice). We will adopt
this perspective here to some extent and hence we will try to indicate whenever we
actually use the Axiom of Choice. We will give some applications soon. We need the
notion of a right inverse.
Definition 4.48 (Right inverse) Let $f : X \to Y$ and $g : Y \to X$ be functions.
1. $g : Y \to X$ is called a right inverse of $f$ if $f \circ g = \mathrm{id}_Y$.
2. $g : Y \to X$ is called a left inverse of $f$ if $g \circ f = \mathrm{id}_X$.
3. $g : Y \to X$ is called an inverse of $f$ if $g$ is a left and right inverse of $f$.

By Corollary 4.30 the inverse function $f^{-1}$ of a function $f$ is an inverse if it exists. However, by Proposition 4.29 the inverse function $f^{-1}$ only exists if $f$ is bijective.
We generalize this observation. However, the proof requires the Axiom of Choice.
Theorem 4.49 (Left and right inverses) Let $X$ and $Y$ be non-empty sets and
let $f : X \to Y$ be a function. Then
1. $f$ has a right inverse if and only if $f$ is surjective.
2. $f$ has a left inverse if and only if $f$ is injective.
3. $f$ has an inverse if and only if $f$ is bijective.
The proof of statement 1. uses the Axiom of Choice.
Proof.
1. Let us assume that the Axiom of Choice holds. We need to show that for
every function $f : X \to Y$ it holds that $f$ has a right inverse if and only if $f$ is
surjective. Let us fix some function $f : X \to Y$.
"$\Longleftarrow$" Let $f$ be surjective. By the Axiom of Choice there is a choice function $C_X : 2^X \setminus \{\emptyset\} \to X$
with $C_X(A) \in A$ for each non-empty set $A \subseteq X$. We need to show that $f$ has
a right inverse $g : Y \to X$. Since $f$ is surjective, the preimage $f^{-1}(\{y\})$ is a
non-empty subset of $X$ for each $y \in Y$ and hence we can define $g$ by
$$g(y) := C_X(f^{-1}(\{y\}))$$
for every $y \in Y$. Then $g(y) \in f^{-1}(\{y\})$, i.e. $(f \circ g)(y) = y$ for all $y \in Y$, which
means $f \circ g = \mathrm{id}_Y$. Hence $f$ has a right inverse $g$.
"$\Longrightarrow$" If, on the other hand, $f$ has a right inverse $g : Y \to X$, then for each
$y \in Y$ we have that $f(g(y)) = (f \circ g)(y) = \mathrm{id}_Y(y) = y$, hence $f$ is surjective. We
do not need the Axiom of Choice for this direction.
2. We leave this proof to the reader (see Problem 4.26).
3. "$\Longleftarrow$" Let $f$ be bijective. Then the inverse function $f^{-1}$ of $f$ exists by Proposition 4.29 and by Corollary 4.30 we obtain $f \circ f^{-1} = \mathrm{id}_Y$ and $f^{-1} \circ f = \mathrm{id}_X$.
Hence, the inverse function $f^{-1}$ is a right inverse as well as a left inverse of $f$.
"$\Longrightarrow$" If $f$ has an inverse, then it has in particular a right inverse and a left inverse,
so $f$ is surjective and injective by 1. and 2., hence bijective. More generally, let $f$ have
a right inverse $g : Y \to X$ and a left inverse $h : Y \to X$, i.e.
$f \circ g = \mathrm{id}_Y$ and $h \circ f = \mathrm{id}_X$. We obtain by associativity
$$g = \mathrm{id}_X \circ g = (h \circ f) \circ g = h \circ (f \circ g) = h \circ \mathrm{id}_Y = h.$$
Hence $g$ is a left inverse and a right inverse of $f$, hence it is an inverse.
$\Box$
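For finite sets the construction of a right inverse in the proof of statement 1. can be carried out explicitly, and no appeal to the Axiom of Choice is needed because a concrete selection rule is available. In the following sketch the parameter `choose` stands in for the choice function $C_X$; using `min` as the default is purely an assumption of this example, as are the function `f` and the set `Y`.

```python
def preimage(f, B):
    """f^{-1}(B) for a dict-encoded function f."""
    return {x for x in f if f[x] in B}

def right_inverse(f, Y, choose=min):
    """Build g : Y -> X with f(g(y)) = y by picking one point from each fiber.

    `choose` plays the role of the choice function C_X from the proof.
    """
    g = {}
    for y in Y:
        fiber = preimage(f, {y})
        if not fiber:
            raise ValueError(f"{y!r} has an empty fiber: f is not surjective")
        g[y] = choose(fiber)
    return g

f = {0: "a", 1: "a", 2: "b", 3: "c", 4: "c"}   # a surjection onto Y
Y = {"a", "b", "c"}
g = right_inverse(f, Y)
g_alt = right_inverse(f, Y, choose=max)        # a different right inverse
```

The two results `g` and `g_alt` differ, which illustrates that right inverses are in general not unique; only the two-sided inverse of a bijection is uniquely determined.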
The proof of 3. also shows that the inverse of a function $f$, if it exists, is uniquely
determined, namely it is the inverse function $f^{-1}$.
Corollary 4.50 Let $f : X \to Y$ be a bijective function. Then the inverse function
$f^{-1} : Y \to X$ is the uniquely determined inverse of $f$.
We mention another result whose proof requires the Axiom of Choice.
Proposition 4.51 (Injective restriction) The following statement is a consequence of the Axiom of Choice. Let $f : X \to Y$ be a function. Then there exists
a subset $A \subseteq X$ such that the restriction $f|_A : A \to Y$ is injective and such that
$\mathrm{range}(f) = \mathrm{range}(f|_A)$.
Proof. Let $f : X \to Y$ be a function. We define a function $h : X \to \mathrm{range}(f)$ by
$h(x) := f(x)$ for all $x \in X$. This function is surjective and hence it admits a right
inverse $g : \mathrm{range}(f) \to X$ by Theorem 4.49, which requires the Axiom of Choice.
Let $A := \mathrm{range}(g)$. Then we obtain for each $y \in \mathrm{range}(f)$
$$f|_A(g(y)) = f(g(y)) = h(g(y)) = y,$$
hence $\mathrm{range}(f|_A) = \mathrm{range}(f)$. Now let $x, y \in A$ with $f|_A(y) = f|_A(x)$. Then there
are $w, z \in \mathrm{range}(f)$ such that $g(w) = x$ and $g(z) = y$. Hence
$$w = f|_A(g(w)) = f|_A(x) = f|_A(y) = f|_A(g(z)) = z,$$
which implies $x = y$. Hence $f|_A$ is injective.
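In the finite case the set $A$ from this proof can be computed by selecting one representative from each fiber of $f$. The sketch below does exactly that; the selection rule `min` replaces the choice function and is an assumption of the example, as is the sample function `f`.

```python
def injective_restriction(f, choose=min):
    """Return f|_A, where A contains one chosen point of each non-empty fiber."""
    fibers = {}
    for x, y in f.items():
        fibers.setdefault(y, set()).add(x)    # group inputs by their value
    A = {choose(fiber) for fiber in fibers.values()}
    return {x: f[x] for x in A}

f = {0: "a", 1: "a", 2: "b", 3: "c", 4: "c"}
f_A = injective_restriction(f)
```

Here `A` corresponds to $\mathrm{range}(g)$ for the right inverse $g$ that always picks the smallest element of a fiber.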

The Axiom of Choice not only has many important applications in mathematics,
but also some counter-intuitive consequences. One of those is the Banach-Tarski
Paradox, which is in fact a theorem that follows from the Axiom of Choice. It
states that a solid ball in three-dimensional Euclidean space can be decomposed
into finitely many disjoint pieces that can be reassembled into two balls, each of the same
volume as the original ball. This process can be performed by rotations and
other geometrical transformations that do not change the shape of the pieces. The
pieces themselves are, however, very complicated and not like solid physical objects.

Problems
4.26 Let $X$ and $Y$ be non-empty sets and let $f : X \to Y$ be a function. Prove that $f$ has a
left inverse if and only if $f$ is injective.

4.9

Infinite Products

When we defined the Cartesian product of sets $X \times Y$ we generalized this concept
only to finite products $\bigtimes_{i=1}^n X_i$, but not to products over indexed families of sets.
Using functions we can now provide such a generalization.


Definition 4.52 (Product) Let $(X_i)_{i \in I}$ be an indexed family of sets. Then we
define the product
$$\prod_{i \in I} X_i := \Big\{ f : I \to \bigcup_{i \in I} X_i \;:\; (\forall i \in I)\ f(i) \in X_i \Big\}.$$
We note that in case that $X_i = X$ for all $i \in I$, we obtain $\prod_{i \in I} X = X^I$. In this
sense the exponentiation or function set construction is a special case of the product.
We will see in Problem 4.28 that the product also generalizes the finite Cartesian
product $\bigtimes_{i=1}^n X_i$ that we have considered earlier. Now we show that the Axiom of
Choice is equivalent to the statement that this product is non-empty whenever the
sets $X_i$ are all non-empty.
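For a finite index set the product of Definition 4.52 can be enumerated explicitly as a collection of functions $f$ with $f(i) \in X_i$. The following sketch encodes the family $(X_i)_{i \in I}$ as a dictionary from indices to sets; the helper name `family_product` and the sample family are our own illustrative choices.

```python
from itertools import product

def family_product(family):
    """All functions f on the index set with f[i] in family[i], as dicts."""
    indices = list(family)
    return [dict(zip(indices, values))
            for values in product(*(family[i] for i in indices))]

family = {"i": {0, 1}, "j": {"a", "b", "c"}}
P = family_product(family)

def pr(j, f):
    """Canonical projection: evaluate the element f of the product at index j."""
    return f[j]

# If one member of the family is empty, no such function exists.
P_empty = family_product({"i": {0, 1}, "j": set()})
```

Each element of `P` is a function on the index set rather than a tuple, which is precisely the change of viewpoint that allows arbitrary index sets $I$.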
Theorem 4.53 (Non-empty products) The following statement is equivalent to
the Axiom of Choice. For all indexed families $(X_i)_{i \in I}$ we obtain
$$\prod_{i \in I} X_i \neq \emptyset \iff (\forall i \in I)\ X_i \neq \emptyset.$$
Proof. Let $X := \bigcup_{i \in I} X_i$. Let us assume the Axiom of Choice holds. We need to
show that the statement in the theorem holds too. We consider both directions.
"$\Longleftarrow$" Let $X_i \neq \emptyset$ for all $i \in I$. We need to prove that there exists a function
$f : I \to X$ with $f(i) \in X_i$ for all $i \in I$. By the Axiom of Choice there is a choice function $C_X : 2^X \setminus \{\emptyset\} \to X$
for $X$. Since $X_i \neq \emptyset$ for all $i \in I$, we obtain $C_X(X_i) \in X_i$. Hence, we can define a
suitable $f$ by $f(i) := C_X(X_i)$ for all $i \in I$. For this $f$ we obtain $f \in \prod_{i \in I} X_i$ and
hence $\prod_{i \in I} X_i \neq \emptyset$.
"$\Longrightarrow$" We prove the contrapositive statement. Let $j \in I$ be such that $X_j = \emptyset$.
Then, obviously, there cannot be any function $f : I \to X$ with $f(j) \in X_j$. Hence
$\prod_{i \in I} X_i = \emptyset$. For this direction we have not used the Axiom of Choice.
Let us now assume that the statement in the theorem is correct. We prove that
under this assumption the Axiom of Choice follows. Let $X$ be a set. We consider
the indexed family of sets $(Y_A)_{A \in I}$ with $I := 2^X \setminus \{\emptyset\}$ and $Y_A := A$ for each $A \in I$.
We have to prove that there is a function $C_X : 2^X \setminus \{\emptyset\} \to X$ with $C_X(A) \in A$ for
each non-empty $A \subseteq X$. Since $Y_A = A \neq \emptyset$ for each $A \in I$, the product $\prod_{A \in I} Y_A$ is
non-empty by assumption, and hence there exists a function $f : I \to \bigcup_{A \in I} Y_A$ with $f(A) \in Y_A = A$ for each $A \in I = 2^X \setminus \{\emptyset\}$. Hence
$C_X(A) := f(A)$ is a suitable choice for each non-empty $A \subseteq X$. Thus, the Axiom of
Choice follows from the non-emptiness of $\prod_{A \in I} Y_A$.
$\Box$
There are many other theorems in mathematics that are, in fact, equivalent to
the Axiom of Choice. We just mention two examples:
1. The statement that each vector space has a basis is equivalent to the Axiom of
Choice. This statement is a crucial and fundamental fact in Linear Algebra.
2. The Theorem of Tychonoff, which is the statement that the product of compact
topological spaces is compact, is equivalent to the Axiom of Choice. This
theorem is an important theorem in Topology.$^1$
The product of sets comes with associated maps $\mathrm{pr}_j$ for each $j \in I$, which are
called the canonical projections:
\[ \mathrm{pr}_j : \prod_{i \in I} X_i \to X_j, \quad f \mapsto f(j). \]

These maps are all surjective (see Problem 4.27). The product together with the
canonical projections satisfies a so-called universal property that we formulate in the
following result.
Theorem 4.54 (Product) Let $(X_i)_{i \in I}$ be an indexed family of sets. For each set
$Y$ and each family $(f_i)_{i \in I}$ of functions $f_i : Y \to X_i$ there exists exactly one function
$f : Y \to \prod_{i \in I} X_i$ such that
\[ f_j = \mathrm{pr}_j \circ f \]
for all $j \in I$.
Proof. We first prove the existence of $f$. Let $Y$ be a set and let $(f_i)_{i \in I}$ be a family
of functions $f_i : Y \to X_i$. Then we define a function $f : Y \to \prod_{i \in I} X_i$ by
\[ f(y)(i) := f_i(y) \]
for all $y \in Y$ and $i \in I$. This $f$ is well-defined, since the function $f(y) : I \to \bigcup_{i \in I} X_i$
has the property that $f(y)(i) = f_i(y) \in X_i$ for each $i \in I$, hence $f(y) \in \prod_{i \in I} X_i$ for
all $y \in Y$. Now we obtain
\[ f_j(y) = f(y)(j) = \mathrm{pr}_j(f(y)) = (\mathrm{pr}_j \circ f)(y) \]
for each $y \in Y$ and $j \in I$, which means $f_j = \mathrm{pr}_j \circ f$ for each $j \in I$. Now we still
need to prove that $f$ is uniquely determined. Hence, let $g : Y \to \prod_{i \in I} X_i$ be some
function such that $f_j = \mathrm{pr}_j \circ g$ for all $j \in I$. We have to show that $f = g$. We obtain
\[ g(y)(j) = (\mathrm{pr}_j \circ g)(y) = f_j(y) = (\mathrm{pr}_j \circ f)(y) = f(y)(j) \]
for all $y \in Y$ and $j \in I$, hence $g(y) = f(y)$ for all $y \in Y$ and this means $f = g$. This
completes the proof.
□
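For a finite index set the construction used in this proof can be tried out directly. The following Python sketch is only an illustration under assumptions that are not part of the text: the index set is $I = \{0, 1, 2\}$, an element of the product is modelled as a dict from indices to values, and the names `pr`, `tuple_map` and `fs` are hypothetical.

```python
# Sketch: the universal property of the product for the finite index set I = {0, 1, 2}.
# An element of the product is modelled as a dict i -> value (a function on I).

I = [0, 1, 2]

def pr(j):
    """Canonical projection pr_j: an element p of the product is sent to p(j)."""
    return lambda p: p[j]

def tuple_map(fs):
    """Given functions f_i : Y -> X_i (as a dict i -> f_i), build f with f(y)(i) = f_i(y)."""
    return lambda y: {i: fs[i](y) for i in I}

# Hypothetical example family on Y = {0, ..., 9}.
fs = {0: lambda y: y % 2, 1: lambda y: y * y, 2: lambda y: str(y)}
f = tuple_map(fs)

# f_j = pr_j o f holds pointwise on Y.
commutes = all(pr(j)(f(y)) == fs[j](y) for j in I for y in range(10))
print(commutes)
```

Uniqueness is visible in the code as well: any function with the same projections must return the same dict for every `y`.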
The diagram in Figure 4.10 illustrates the situation of the proof. It is an example
of a commutative diagram. For finite products (i.e. finite sets $I$) we do not have
to distinguish between the product $\prod_{i \in I} X_i$ and the product $\bigtimes_{i \in I} X_i$ that we have
introduced earlier. We leave the proof to the reader (see Problem 4.28). Thus, the
product $\prod_{i \in I} X_i$ introduced in this section actually generalizes the finite product
$\bigtimes_{i \in I} X_i$.
1 Compactness is a notion that is studied in Topology and Analysis. It plays a very important
role because compact sets have many properties in common with finite sets, although they are not
necessarily finite.



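Figure 4.10: Commutative diagram for $f_j = \mathrm{pr}_j \circ f$ (with the maps $f : Y \to \prod_{i \in I} X_i$, $\mathrm{pr}_j : \prod_{i \in I} X_i \to X_j$ and $f_j : Y \to X_j$).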

Problems
4.27 Let $(X_i)_{i \in I}$ be a family of sets. Prove that the canonical projections
$\mathrm{pr}_j : \prod_{i \in I} X_i \to X_j$ are surjective for all $j \in I$. Prove that these maps are not injective in general.
4.28 Let $I := \mathbb{N}_n = \{1, 2, ..., n\}$ for some $n \geq 1$. We consider the projections
\[ p_j : \bigtimes_{i=1}^{n} X_i \to X_j, \quad (x_1, x_2, ..., x_n) \mapsto x_j. \]
Prove that there is exactly one map
\[ F : \bigtimes_{i=1}^{n} X_i \to \prod_{i \in I} X_i \]
with the property that $p_j = \mathrm{pr}_j \circ F$. Show that $F$ is bijective.
CHAPTER 5

Cardinality
The infinite! No other question has ever moved so profoundly the spirit of man.
David Hilbert (1862–1943)

5.1 What is the Cardinality of a Set?

For some considerations about sets the size of a set matters. This size of a
set $X$ is called the cardinality of $X$ and it is often denoted by $|X|$. Some authors also
write $\mathrm{card}(X)$ instead of $|X|$. Defining exactly what kind of quantity $|X|$ is turns out
to be somewhat non-trivial and we will not do this here. There is a mathematically precise way
to interpret $|X|$ as a so-called cardinal number, which is a quantity that can take
natural number values but also many different infinite values. Perhaps surprisingly,
we do not have to specify what $|X|$ exactly is in order to work with cardinalities.
We will just specify what expressions like $|X| \leq |Y|$ mean without saying what $|X|$
and $|Y|$ actually are. We make this precise in the following definition.
Definition 5.1 (Cardinality) Let $X$ and $Y$ be sets.
1. We write $|X| = |Y|$ and we say that $X$ has the same cardinality as $Y$ if there
is a bijective map $f : X \to Y$.
2. We write $|X| \leq |Y|$ and we say that $X$ has smaller or the same cardinality as
$Y$ if there is an injective map $f : X \to Y$.
3. We write $|X| < |Y|$ and we say that $X$ has strictly smaller cardinality than $Y$
if $|X| \leq |Y|$ and not $|X| = |Y|$.
To make this clear again: if we write $|X| \leq |Y|$, we are not saying that some
sort of number $|X|$ is less than or equal to some sort of number $|Y|$ (although there is a
meaningful way to interpret things in this direction), but the expression $|X| \leq |Y|$
is simply a short way to say that there exists an injective function $f : X \to Y$. We
give a simple example.
Example 5.2 The two sets $\{1, 2, 3\}$ and $\{A, B, C\}$ (where we assume that $A$, $B$
and $C$ are pairwise distinct objects) are of the same cardinality, i.e. $|\{1, 2, 3\}| =
|\{A, B, C\}|$. We can easily verify this using a bijective function $f : \{1, 2, 3\} \to
\{A, B, C\}$ defined by $f(1) := A$, $f(2) := B$ and $f(3) := C$.
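For finite sets such a verification can also be done mechanically. The following Python sketch assumes that a function is represented as a dict and that all its values lie in the given codomain; the helper names `is_injective`, `is_surjective` and `is_bijective` are hypothetical.

```python
# Sketch: checking that a finite map is injective, surjective and hence bijective.
X = {1, 2, 3}
Y = {"A", "B", "C"}
f = {1: "A", 2: "B", 3: "C"}   # the map from Example 5.2, written as a dict

def is_injective(f):
    # No value is hit twice.
    return len(set(f.values())) == len(f)

def is_surjective(f, codomain):
    # Every element of the codomain is hit (values are assumed to lie in the codomain).
    return set(f.values()) == set(codomain)

def is_bijective(f, codomain):
    return is_injective(f) and is_surjective(f, codomain)

print(is_bijective(f, Y))
```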
If one accepts the Axiom of Choice, then $|X| \leq |Y|$ is the same as saying that
there is a surjective function $g : Y \to X$.
Proposition 5.3 The following statement follows from the Axiom of Choice. Let
$X$ and $Y$ be non-empty sets. There exists an injective function $f : X \to Y$ if and
only if there exists a surjective function $g : Y \to X$.
Proof. Let $f : X \to Y$ be injective. Then $f$ has a left inverse $g : Y \to X$ by
Theorem 4.49 and hence $g \circ f = \mathrm{id}_X$. This implies that $g : Y \to X$ has to be surjective.
For the other direction we assume that there exists a surjective function $g : Y \to X$.
Then by Theorem 4.49 the Axiom of Choice implies that there exists a right inverse
$f : X \to Y$ of $g$, i.e. $g \circ f = \mathrm{id}_X$. This implies that $f$ has to be injective.
□
Hence, if one accepts the Axiom of Choice, then it does not matter whether
one defines $|X| \leq |Y|$ via injections $f : X \to Y$ or via surjections $g : Y \to X$. If,
however, one wants to be rather careful and independent of the Axiom of Choice,
then one should use injections here. The previous proposition is not correct for the
special case $X = \emptyset$ and $Y \neq \emptyset$. In this case, the only function $f : \emptyset \to Y$ is injective,
but there is no function whatsoever of type $g : Y \to \emptyset$, in particular, no surjective
one. As a first result we show that any subset of a set has smaller or the
same cardinality as the set itself.
Proposition 5.4 (Inclusion and cardinality) Let $X$ and $Y$ be sets. Then
\[ X \subseteq Y \Longrightarrow |X| \leq |Y|. \]
Proof. Let $X$ and $Y$ be sets with $X \subseteq Y$. We consider the identity $\mathrm{id}_Y$ restricted
to $X$, i.e. the function
\[ f : X \to Y, \quad x \mapsto x. \]
This function is clearly injective, hence $|X| \leq |Y|$.

However, one should be careful, since $X \subsetneq Y$ does not necessarily mean $|X| < |Y|$.
That is, a set $X$ can very well be smaller than $Y$ in the sense that it is a proper subset
of $Y$ without having a smaller cardinality. We give an example.



Proposition 5.5 (Hilbert's hotel) We have
\[ |\mathbb{N}| = |2\mathbb{N}|, \]
where $2\mathbb{N}$ denotes the set of even natural numbers.

Proof. Using the bijective function $f : \mathbb{N} \to 2\mathbb{N}, n \mapsto 2n$ we obtain $|\mathbb{N}| = |2\mathbb{N}|$.
□
This example is also known as the Hilbert Hotel Paradox, although it is not really
a paradox. The story goes as follows: imagine a hotel with infinitely many rooms
numbered by natural numbers 0, 1, 2, 3, .... Suppose all the rooms are occupied and
there are 5 new guests arriving. Then you can easily create space by asking all the
existing guests to move from room number $n$ into room number $n + 5$. Then the
rooms 0, 1, 2, 3, 4 become vacant and can host the 5 new guests. But even if there is
a bus arriving with infinitely many guests numbered by natural numbers 0, 1, 2, 3, ...
one can create enough space. One just asks each guest in room number $n$ to move into
room number $2n$. Then only the rooms with even room numbers are occupied and all the newly
arriving guests can move into the rooms with odd room numbers. One can even
continue this game if there are infinitely many buses arriving, one for each natural
number 0, 1, 2, 3, ..., but we do not go into this here. We close this section with a
number of further examples.
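Before turning to these examples, the two room reassignments from the hotel story can be checked on a finite window of rooms. The following Python sketch is only an illustration: the window size `N = 1000` and all variable names are assumptions, and a finite check can of course not replace the argument for the infinite hotel.

```python
# Sketch: the two room reassignments from the Hilbert hotel story on a finite window.
N = 1000  # only the first N guests are inspected; the hotel itself is infinite

shift = [n + 5 for n in range(N)]     # the guest in room n moves to room n + 5
double = [2 * n for n in range(N)]    # the guest in room n moves to room 2n

# No two guests end up in the same room (both maps are injective).
no_clash_shift = len(set(shift)) == N
no_clash_double = len(set(double)) == N

# Rooms 0..4 are free after the shift; only even rooms are used after doubling.
occupied_after_shift = set(shift)
free_after_shift = [r for r in range(5) if r not in occupied_after_shift]
only_even_rooms_used = all(r % 2 == 0 for r in double)

print(free_after_shift, only_even_rooms_used)
```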
Example 5.6 Let $X$, $Y$ and $Z$ be sets.
1. $|2^X| = |\{0, 1\}^X|$, i.e. the power set $2^X$ of $X$ and the set $\{0, 1\}^X$ of functions
$f : X \to \{0, 1\}$ have the same cardinality, which follows from Theorem 4.45.
2. $|(Z^Y)^X| = |Z^{X \times Y}|$, which follows from Theorem 4.43.
3. $|Y^X| < |2^{X \times Y}|$, if $X$ and $Y$ both have at least two elements, which follows
from Problem 4.25.

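Problems
5.1 Let $X_1$, $X_2$, $Y_1$ and $Y_2$ be sets. Prove the following:
1. $(|X_1| \leq |X_2| \text{ and } |Y_1| \leq |Y_2|) \Longrightarrow |X_1 \sqcup Y_1| \leq |X_2 \sqcup Y_2|$,
2. $(|X_1| \leq |X_2| \text{ and } |Y_1| \leq |Y_2|) \Longrightarrow |X_1 \times Y_1| \leq |X_2 \times Y_2|$,
3. $|X_1| \leq |X_2| \Longrightarrow |2^{X_1}| \leq |2^{X_2}|$,
4. $(|X_1| = |X_2| \text{ and } |Y_1| \leq |Y_2|) \Longrightarrow |Y_1^{X_1}| \leq |Y_2^{X_2}|$,
5. $|X_1| \leq |X_2| \Longrightarrow |X_1!| \leq |X_2!|$.
5.2 Let $X$ and $Y$ be sets and let $x \in X$. Prove that $|\{x\} \times Y| = |Y|$.
5.3 Let $X$ and $Y$ be sets. Prove that the following map is bijective:
\[ F : X \sqcup Y \to (X \cup Y) \sqcup (X \cap Y), \quad (i, z) \mapsto \begin{cases} (i, z) & \text{if } z \in X \cap Y \\ (1, z) & \text{if } z \in (X \cup Y) \setminus (X \cap Y) \end{cases} \]
Conclude that $|X \sqcup Y| = |(X \cup Y) \sqcup (X \cap Y)|$. Prove also $|X \cap Y| \leq |X \cup Y| \leq |X \sqcup Y|$.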

5.2 The Theorem of Schröder-Bernstein

An obvious question about cardinality is whether $|X| \leq |Y|$ and $|Y| \leq |X|$ together
imply $|X| = |Y|$. It turns out that the answer to this question is positive, but that it requires a
proof that is somewhat more complicated. In fact, Cantor already proved this result
with a somewhat simpler proof using the Axiom of Choice. We will see a different
proof without any usage of the Axiom of Choice. This is perhaps the most difficult
theorem that we have seen here so far. We first state and prove a preparatory result
that is somewhat surprising.
Proposition 5.7 Let $X$ and $Y$ be sets and let $f : X \to Y$ and $g : Y \to X$ be
injective functions. Then there exists a set $A \subseteq X$ such that
\[ g(Y \setminus f(A)) = X \setminus A. \]
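Proof. We consider the following function
\[ F : 2^X \to 2^X, \quad B \mapsto X \setminus g(Y \setminus f(B)). \]
First we prove that this function $F$ is monotone, which means
\[ B \subseteq A \Longrightarrow F(B) \subseteq F(A) \]
for all sets $A, B \subseteq X$. If $B \subseteq A$, then $f(B) \subseteq f(A)$ and hence $Y \setminus f(A) \subseteq Y \setminus f(B)$.
This implies $g(Y \setminus f(A)) \subseteq g(Y \setminus f(B))$ and hence $X \setminus g(Y \setminus f(B)) \subseteq X \setminus g(Y \setminus f(A))$.
But this means $F(B) \subseteq F(A)$. This finishes the proof that $F$ is monotone. Next we
consider the set
\[ \mathcal{M} := \{B \in 2^X : B \subseteq F(B)\}. \]
Since $\emptyset \in \mathcal{M}$ we obtain $\mathcal{M} \neq \emptyset$ and hence we can define
\[ A := \bigcup_{B \in \mathcal{M}} B. \]
We claim that this set has the desired property. In order to prove this, it suffices to show
that $A$ is a fixed point of $F$, i.e.
\[ A = F(A), \]
since this implies $A = F(A) = X \setminus g(Y \setminus f(A))$, which implies $X \setminus A = g(Y \setminus f(A))$
and this is the claim. In order to prove $A = F(A)$, we firstly note that $B \subseteq A$ for
all $B \in \mathcal{M}$ and hence monotonicity of $F$ yields
\[ F(B) \subseteq F(A) = F\Big(\bigcup_{B \in \mathcal{M}} B\Big), \]
which implies
\[ \bigcup_{B \in \mathcal{M}} F(B) \subseteq F\Big(\bigcup_{B \in \mathcal{M}} B\Big). \]
Since $B \subseteq F(B)$ for all $B \in \mathcal{M}$, we obtain
\[ A = \bigcup_{B \in \mathcal{M}} B \subseteq \bigcup_{B \in \mathcal{M}} F(B) \subseteq F\Big(\bigcup_{B \in \mathcal{M}} B\Big) = F(A). \]
Since $F$ is monotone, this implies $F(A) \subseteq F(F(A))$ and hence $F(A) \in \mathcal{M}$. Hence
\[ F(A) \subseteq \bigcup_{B \in \mathcal{M}} B = A. \]
Altogether, this proves $A = F(A)$, i.e. $A$ is a fixed point of $F$. This finishes the
proof.
□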
The diagram in Figure 5.1 illustrates the claim of the previous proposition.

Figure 5.1: Two injections $f : X \to Y$ and $g : Y \to X$ (the map $f$ sends $A$ onto $f(A)$ and the map $g$ sends $Y \setminus f(A)$ onto $X \setminus A$).

Now we can prove the following theorem.
Theorem 5.8 (Schröder-Bernstein 1897) Let $X$ and $Y$ be sets. Then we obtain
\[ |X| = |Y| \iff (|X| \leq |Y| \text{ and } |Y| \leq |X|). \]
Proof. Let $X$ and $Y$ be sets. If $|X| = |Y|$, then there exists a bijective map
$h : X \to Y$. It follows by Proposition 4.29 that the inverse $h^{-1} : Y \to X$ exists and
by Corollary 4.30 the inverse $h^{-1}$ is bijective too. In particular, $h : X \to Y$ and
$h^{-1} : Y \to X$ are injective and hence $|X| \leq |Y|$ and $|Y| \leq |X|$.
Now we still have to prove the converse implication. Hence, suppose $|X| \leq |Y|$ and
$|Y| \leq |X|$. This means that there are injective functions $f : X \to Y$ and $g : Y \to X$.
We need to show $|X| = |Y|$, i.e. we have to construct a bijective function $h : X \to Y$.
We recall that the inverse function $g^{-1} : \mathrm{range}(g) \to Y$ exists (see Problem 4.6). By
Proposition 5.7 there exists a subset $A \subseteq X$ such that $g(Y \setminus f(A)) = X \setminus A$. Now
we can define a map $h : X \to Y$ by
\[ h(x) := \begin{cases} f(x) & \text{if } x \in A \\ g^{-1}(x) & \text{if } x \in X \setminus A \end{cases} \]
for all $x \in X$. Here $g^{-1}(x)$ is defined for all $x \in X \setminus A$, since $X \setminus A = g(Y \setminus f(A)) \subseteq \mathrm{range}(g)$.
The diagram in Figure 5.1 illustrates the idea of the construction.
We need to prove that $h$ is bijective.
We first prove that $h$ is surjective. Let $y \in Y$. If $y \in f(A)$, then clearly there
is an $x \in A$ such that $h(x) = f(x) = y$. Otherwise, $y \in Y \setminus f(A)$, but then
$x := g(y) \in X \setminus A$ and hence $h(x) = g^{-1}(x) = y$.
Next we prove that $h$ is injective. To this end, let $x, y \in X$ with $h(x) = h(y)$. If $x$
and $y$ are both in $A$, then we obtain $f(x) = h(x) = h(y) = f(y)$ and hence $x = y$,
since $f$ is injective. If $x$ and $y$ are both in $X \setminus A$, then $g^{-1}(x) = h(x) = h(y) = g^{-1}(y)$.
Since $g^{-1} : \mathrm{range}(g) \to Y$ is bijective and, in particular, injective, this implies $x = y$.
If $x$ and $y$ are not both in $A$ and not both in $X \setminus A$, then we can assume without
loss of generality $x \in A$ and $y \in X \setminus A$. In this case $h(x) = f(x) \in f(A)$ and
$h(y) = g^{-1}(y) \in g^{-1}(X \setminus A) = Y \setminus f(A)$. Hence this case is impossible, since
$h(x) = h(y)$. This finishes the proof that $h$ is injective and hence bijective.
□
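The construction of $h$ can be made concrete in a simple infinite case. The following Python sketch is an illustration under assumptions that are not part of the text: we take $X = Y = \mathbb{N}$ with the injections $f(n) = n + 1$ and $g(n) = n + 1$, neither of which is surjective. For this particular pair one can check by hand that the set $A$ of even numbers satisfies $g(Y \setminus f(A)) = X \setminus A$, because $f(A)$ is the set of odd numbers and $g$ maps the even numbers onto the odd numbers. The names `g_inv` and `in_A` are hypothetical.

```python
# Sketch: the bijection h from the proof for f(n) = n + 1 and g(n) = n + 1 on X = Y = N.

def f(n):
    return n + 1          # injective, but 0 is not in range(f)

def g(n):
    return n + 1          # injective, but 0 is not in range(g)

def g_inv(x):
    return x - 1          # inverse of g on range(g) = {1, 2, 3, ...}

def in_A(x):
    return x % 2 == 0     # A = even numbers; then g(Y \ f(A)) = odd numbers = X \ A

def h(x):
    # The case distinction from the proof: f on A, the inverse of g on X \ A.
    return f(x) if in_A(x) else g_inv(x)

print([h(x) for x in range(10)])   # h swaps 0 <-> 1, 2 <-> 3, 4 <-> 5, ...
```

Although neither $f$ nor $g$ is a bijection, the combined map $h$ is one: it swaps each even number with its successor.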
Now we can conclude that the relations on cardinality that we have studied
satisfy the following important properties.
Corollary 5.9 The following holds for all sets $X$, $Y$ and $Z$:
1. $|X| = |X|$ (reflexivity)
2. $|X| \leq |Y|$ and $|Y| \leq |X| \Longrightarrow |X| = |Y|$ (antisymmetry)
3. $|X| \leq |Y|$ and $|Y| \leq |Z| \Longrightarrow |X| \leq |Z|$ (transitivity)

The first statement clearly holds since the identity $\mathrm{id}_X : X \to X$ is bijective, the
second statement is the statement of the Theorem of Schröder-Bernstein 5.8 and the
third statement holds since the composition of two injective maps is injective
by Corollary 4.28. The above properties are basically those of an order relation
(except that the underlying class of all sets is not a set itself). We will study such
order relations later on. This particular order is even total (in a sense specified in
Definition 6.1), as the next result shows.
Theorem 5.10 (Trichotomy) The following statement is equivalent to the Axiom
of Choice. For any two sets X and Y we have |X| < |Y | or |X| = |Y | or |Y | < |X|.
The proof is beyond our scope here and we have to postpone it until later.

Problems
5.4 Prove that the following two functions are injective:
1. $I : \{0, 1\}^{\mathbb{N}} \to \mathbb{N}^{\mathbb{N}}, f \mapsto f$,
2. $J : \mathbb{N}^{\mathbb{N}} \to \{0, 1\}^{\mathbb{N}}, f \mapsto (\underbrace{1, ..., 1}_{f(0)\text{ times}}, 0, \underbrace{1, ..., 1}_{f(1)\text{ times}}, 0, \underbrace{1, ..., 1}_{f(2)\text{ times}}, ...)$.
For the definition of $I$ we interpret $f$ on the left-hand side as a function $f : \mathbb{N} \to \{0, 1\}$ and on
the right-hand side we consider $f$ as a function $f : \mathbb{N} \to \mathbb{N}$ (which is possible since $\{0, 1\} \subseteq \mathbb{N}$).
For the definition of $J$ we have written the function $g : \mathbb{N} \to \{0, 1\}$ with $g = J(f)$ like a
sequence as an infinite tuple, i.e. the tuple contains the function values $(g(0), g(1), g(2), ...)$.
Conclude that $|2^{\mathbb{N}}| = |\{0, 1\}^{\mathbb{N}}| = |\mathbb{N}^{\mathbb{N}}|$.
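To get a feeling for the map $J$ from Problem 5.4, one can generate the beginning of the sequence $J(f)$ for a concrete $f$. The following Python sketch is an illustration only: the helper `decode`, which recovers the first values of $f$ by counting the 1s between consecutive 0s, and the example function `square` are assumptions and not part of the problem.

```python
from itertools import islice

def J(f):
    """Yield the 0/1 sequence (1 repeated f(0) times, 0, 1 repeated f(1) times, 0, ...)."""
    n = 0
    while True:
        for _ in range(f(n)):
            yield 1
        yield 0
        n += 1

def decode(bits, count):
    """Recover f(0), ..., f(count - 1) by counting the 1s before each 0."""
    values, run = [], 0
    for b in bits:
        if b == 1:
            run += 1
        else:
            values.append(run)
            run = 0
            if len(values) == count:
                break
    return values

def square(n):
    return n * n              # hypothetical example function N -> N

prefix = list(islice(J(square), 12))
print(prefix)
print(decode(J(square), 5))
```

That the values of $f$ can be read off from $J(f)$ is the informal reason why $J$ is injective.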

5.3 Cantor's Diagonalization Method

In this section we want to prove that there are sets of many different sizes. We start
with a result of Cantor that shows that the power set $2^X$ of any set $X$ is larger
than the set $X$ itself. In some sense this is another instance of Russell's paradox (see
Example 2.8).
Theorem 5.11 (Cantor 1892) Let $X$ be a set. Then
\[ |X| < |2^X|. \]
Proof. Let $X$ be a set. We have to prove $|X| \leq |2^X|$ and $|X| \neq |2^X|$. That is, it is
sufficient to show that there is an injective function $f : X \to 2^X$ and that there is
no injective function $g : 2^X \to X$. It is easy to see that the function
\[ f : X \to 2^X, \quad x \mapsto \{x\} \]
is injective: if $x, y \in X$ with $f(x) = f(y)$, then we obtain $\{x\} = \{y\}$, which implies
$x = y$. Now let us assume that there is an injective function $g : 2^X \to X$. Then this
function has a left inverse $h : X \to 2^X$ by Theorem 4.49, which means $h \circ g = \mathrm{id}_{2^X}$.
Now we define the set
\[ A := \{x \in X : x \notin h(x)\}. \]
Let $y := g(A)$. Then we obtain $h(y) = (h \circ g)(A) = A$, which implies
\[ y \in A \iff y \notin h(y) \iff y \notin A. \]
This is clearly a contradiction. Hence the assumption was wrong and there cannot
be any injective function $g : 2^X \to X$.
□
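The heart of this argument is that for any map $h : X \to 2^X$ the diagonal set $A = \{x \in X : x \notin h(x)\}$ is not a value of $h$. For a small finite set this can be confirmed by brute force. The following Python sketch assumes $X = \{0, 1, 2\}$ and represents subsets as frozensets; the names `power_set`, `diagonal_set` and `all_maps` are hypothetical.

```python
from itertools import combinations, product

X = [0, 1, 2]

# All subsets of X as frozensets: the power set 2^X.
power_set = [frozenset(c) for r in range(len(X) + 1) for c in combinations(X, r)]

def diagonal_set(h):
    """The set A = {x in X : x not in h(x)} for a map h : X -> 2^X given as a dict."""
    return frozenset(x for x in X if x not in h[x])

# Enumerate every map h : X -> 2^X (8^3 = 512 maps) and check that A is never a value of h.
all_maps = [dict(zip(X, images)) for images in product(power_set, repeat=len(X))]
never_hit = all(diagonal_set(h) not in h.values() for h in all_maps)

print(len(all_maps), never_hit)
```

In particular, no such $h$ is surjective, which is what contradicts $h \circ g = \mathrm{id}_{2^X}$ in the proof.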
In Theorem 4.45 we have proved that the power set $2^X$ has the same cardinality
as the set of functions $\{0, 1\}^X$. Hence we obtain the following corollary of Cantor's
Theorem.
Corollary 5.12 For any set $X$ we have $|X| < |\{0, 1\}^X|$.
From Cantor's Theorem 5.11 we can deduce that there are infinite sets of different
cardinality. In particular, we get the following corollary.
Corollary 5.13 We have $|\mathbb{N}| < |2^{\mathbb{N}}|$.


That is, the power set $2^{\mathbb{N}}$ of the natural numbers is larger than the set $\mathbb{N}$ of
natural numbers itself, with respect to cardinality. Hence, we get an infinite chain
of larger and larger infinite sets:
\[ |\mathbb{N}| < |2^{\mathbb{N}}| < |2^{2^{\mathbb{N}}}| < |2^{2^{2^{\mathbb{N}}}}| < ... \]
We mention that one can also conclude from Cantor's Theorem 5.11 that there is no
universal set.

Problems
5.5 Show as follows that Proposition 2.9 is also a consequence of Cantor's Theorem 5.11:
1. Assume that there is a universal set $U$ that contains all sets $X$.
2. Show that $X \subseteq U$ implies $2^X \subseteq U$.
3. Conclude that $X \subseteq U$ implies $|2^X| \leq |U|$.
4. Show that this leads to a contradiction!

5.4 The Continuum Hypothesis

Since the power set $2^X$ of any set $X$ is strictly larger than the set $X$ itself, the
question arises whether there is any set of cardinality in between. For large enough
finite sets this is certainly the case. If $X$ has two or more elements, then there is a set
$Y$ with $|X| < |Y| < |2^X|$, namely any set $Y$ that has exactly one element more than
$X$ will do the job. However, it was a matter of many mathematical investigations
whether there can be such a set $Y$ for infinite sets $X$ as well. It is the so-called
Continuum Hypothesis that there is no such set. It comes in a generalized form,
which makes a statement for any infinite set, and in a basic form, which makes the
statement only for the set $X = \mathbb{N}$. We capture both in the following definition.
Definition 5.14 (Continuum Hypothesis) The Generalized Continuum Hypothesis is the statement that for each set $X$ with $|\mathbb{N}| \leq |X|$ there does not exist any set
$Y$ with
\[ |X| < |Y| < |2^X|. \]
The (ordinary) Continuum Hypothesis is this statement for the special case $X = \mathbb{N}$.
Kurt Gödel proved in 1940 that consistency of Zermelo-Fraenkel set theory implies consistency of Zermelo-Fraenkel set theory together with the Continuum
Hypothesis. This also holds in the presence of the Axiom of Choice. This implies that
the Continuum Hypothesis cannot be proved to be false using the Zermelo-Fraenkel
axioms together with the Axiom of Choice. In 1963 Paul Cohen proved that the
Continuum Hypothesis can also not be derived from the Zermelo-Fraenkel axioms,
not even in the presence of the Axiom of Choice. This means that also the negation
of the Continuum Hypothesis together with the Zermelo-Fraenkel axioms and the
Axiom of Choice is consistent (provided that Zermelo-Fraenkel set theory itself is
consistent). Surprisingly, the Generalized Continuum Hypothesis
implies the Axiom of Choice (although the first one is a non-existence claim and the
second one an existence statement). We state this result here without proof.
Theorem 5.15 (Sierpiński 1947) The Generalized Continuum Hypothesis implies
the Axiom of Choice.
Hence the Generalized Continuum Hypothesis can be considered as an even
stronger non-constructive principle than the Axiom of Choice. Unlike the Axiom of
Choice the Generalized Continuum Hypothesis is not widely accepted in mathematics. That is, any application of it has to be mentioned explicitly.
Problems
5.6 Let $X$, $Y$ be sets. Prove that the Generalized Continuum Hypothesis implies
\[ |X| < |Y| \Longrightarrow |2^X| < |2^Y|. \]

5.5 Cantor's Pairing Function

In Proposition 5.5 we have seen that $|\mathbb{N}| = |2\mathbb{N}|$, i.e. cardinalitywise there are
exactly as many natural numbers as there are even numbers. Perhaps even more
surprisingly, we will show in this section that cardinalitywise there are as many
pairs of natural numbers as there are natural numbers, i.e. $|\mathbb{N} \times \mathbb{N}| = |\mathbb{N}|$. This
proof is due to Cantor and it is called Cantor's first diagonalization. The idea
is captured in the diagram in Figure 5.2. We systematically enumerate all pairs
$(n, k) \in \mathbb{N} \times \mathbb{N}$ of natural numbers in a coordinate system by moving diagonally
through this system. This enumeration yields a function $f : \mathbb{N} \times \mathbb{N} \to \mathbb{N}$ that assigns
the number $f(n, k) \in \mathbb{N}$ at position $(n, k)$ to each pair $(n, k) \in \mathbb{N} \times \mathbb{N}$. This function
$f$ is bijective and hence it shows that cardinalitywise there are as many pairs of
natural numbers as there are natural numbers.

Figure 5.2: Cantor's pairing function (the pairs $(n, k)$ are numbered 0, 1, 2, ..., 14, ... along the diagonals of the coordinate system).


Proposition 5.16 (Cantor's pairing function) The function
\[ f : \mathbb{N} \times \mathbb{N} \to \mathbb{N}, \quad (n, k) \mapsto \frac{1}{2}(n + k)(n + k + 1) + k \]
is bijective.
Proof. We consider the following additional functions:
1. $s : \mathbb{N} \to \mathbb{N}, i \mapsto \frac{1}{2} i(i + 1)$,
2. $h : \mathbb{N} \to \mathbb{N}, m \mapsto \max\{i \in \mathbb{N} : s(i) \leq m\}$,
3. $g_2 : \mathbb{N} \to \mathbb{N}, m \mapsto m - s(h(m))$,
4. $g_1 : \mathbb{N} \to \mathbb{N}, m \mapsto h(m) - g_2(m)$,
5. $g : \mathbb{N} \to \mathbb{N} \times \mathbb{N}, m \mapsto (g_1(m), g_2(m))$.
Intuitively, $s$ captures the values in the first column of the diagram in Figure 5.2
and $h(m)$ determines the number of the row in which the upwards diagonal starts
on which $m$ is located. Firstly, we need to show that all the above functions
are well-defined. Since $s(0) = 0$ and $s(i) < s(i + 1)$ for all $i \in \mathbb{N}$, we obtain that
the maximum $h(m)$ actually exists for all $m \in \mathbb{N}$. The definitions of $s$ and $h$ imply
\[ s(h(m)) \leq m < s(h(m) + 1) = s(h(m)) + h(m) + 1 \]
for all $m \in \mathbb{N}$. The first inequality shows that $m - s(h(m))$ is non-negative and the
second inequality implies $m \leq s(h(m)) + h(m)$ and hence $h(m) - m + s(h(m))$ is
also non-negative for all $m \in \mathbb{N}$. Hence $g_2$ and $g_1$ are both well-defined.
Now we prove that $f$ is surjective. The definition of $s$ implies
\[ f(n, k) = s(n + k) + k \]
for all $n, k \in \mathbb{N}$ and hence we obtain for all $m \in \mathbb{N}$
\begin{align*}
(f \circ g)(m) &= f(g_1(m), g_2(m)) \\
&= s(g_1(m) + g_2(m)) + g_2(m) \\
&= s(h(m) - g_2(m) + g_2(m)) + m - s(h(m)) \\
&= s(h(m)) + m - s(h(m)) \\
&= m.
\end{align*}
This proves that $f$ is surjective. In fact, it also proves that the function $g$ is a right
inverse of $f$. Now we prove that $g$ is also a left inverse of $f$ and hence $f$ is injective.
First we note that an easy calculation shows
\[ s(n + k + 1) = s(n + k) + n + k + 1 > s(n + k) + k \]
for all $n, k \in \mathbb{N}$. Together with $s(n + k) \leq s(n + k) + k$ this implies
\[ h(f(n, k)) = h(s(n + k) + k) = n + k \]
for all $n, k \in \mathbb{N}$ and we obtain
\[ g_2(f(n, k)) = f(n, k) - s(h(f(n, k))) = f(n, k) - s(n + k) = k \]
for all $n, k \in \mathbb{N}$ and hence
\[ g_1(f(n, k)) = h(f(n, k)) - g_2(f(n, k)) = n + k - k = n \]
for all $n, k \in \mathbb{N}$. Altogether, this means
\[ (g \circ f)(n, k) = (g_1(f(n, k)), g_2(f(n, k))) = (n, k) \]
for all $n, k \in \mathbb{N}$, i.e. $g$ is a left inverse of $f$. This implies that $f$ is injective.
□
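The functions used in this proof translate almost literally into code. The following Python sketch is a minimal illustration: the function names follow the proof, while the search loop inside `h` is just one possible way to compute the maximum and is an assumption of this sketch.

```python
def s(i):
    """s(i) = i(i + 1)/2; the product i(i + 1) is always even."""
    return i * (i + 1) // 2

def f(n, k):
    """Cantor's pairing function f(n, k) = s(n + k) + k."""
    return s(n + k) + k

def h(m):
    """The largest i with s(i) <= m, found by a simple upward search."""
    i = 0
    while s(i + 1) <= m:
        i += 1
    return i

def g(m):
    """The inverse of f, built from g1 and g2 as in the proof."""
    g2 = m - s(h(m))
    g1 = h(m) - g2
    return (g1, g2)

print([f(n, 0) for n in range(5)])   # the values s(0), ..., s(4)
print(g(14))
```

Checking `g(f(n, k)) == (n, k)` and `f(*g(m)) == m` on a finite range gives some evidence for both inverse properties, but of course only the proof covers all natural numbers.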

From this proposition we obtain the following corollary.

Corollary 5.17 We have $|\mathbb{N} \times \mathbb{N}| = |\mathbb{N}|$.
The same idea can be used to generalize the result to triples and $k$-tuples of
natural numbers in general. For instance, for $k = 3$ we can use the bijective function
\[ F : \mathbb{N}^3 \to \mathbb{N}, \quad (m, n, k) \mapsto f(m, f(n, k)) \]
that is defined with the help of the pairing function $f$ from Proposition 5.16. This
implies $|\mathbb{N}^3| = |\mathbb{N}|$ and similarly we obtain $|\mathbb{N}^k| = |\mathbb{N}|$ for all $k \geq 1$. From this we
can conclude that the set of natural numbers $\mathbb{N}$, the set of integers $\mathbb{Z}$ and the set of
rational numbers $\mathbb{Q}$ all have the same cardinality.
Proposition 5.18 We have $|\mathbb{N}| = |\mathbb{Z}| = |\mathbb{Q}|$.
We have not defined the sets $\mathbb{Z}$ and $\mathbb{Q}$ formally, however, we indicate in Problem 5.7 and Problem 5.8 how this conclusion follows. Another important remark is
that the set of real numbers has exactly the same cardinality as the power set of $\mathbb{N}$.
Proposition 5.19 We have $|2^{\mathbb{N}}| = |\mathbb{R}|$.
Once again, we have not formally defined $\mathbb{R}$ here, but we indicate how to prove
this result in Problem 5.9. Finally, we mention that from the two aforementioned
propositions and the Theorem of Cantor 5.11 we get the following corollary.
Corollary 5.20 We have $|\mathbb{N}| < |\mathbb{R}|$.


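Problems
5.7 Prove that the following function is surjective:
\[ f : \mathbb{N} \times \mathbb{N} \to \mathbb{Z}, \quad (n, k) \mapsto n - k. \]
Provide a concrete right inverse $g : \mathbb{Z} \to \mathbb{N} \times \mathbb{N}$ of $f$ (without using the Axiom of Choice).
Show that this implies $|\mathbb{N}| = |\mathbb{Z}|$.
5.8 Prove that the following function is surjective:
\[ f : \mathbb{N} \times \mathbb{N} \times \mathbb{N} \to \mathbb{Q}, \quad (n, k, m) \mapsto \frac{n - k}{m + 1}. \]
Provide a concrete right inverse $g : \mathbb{Q} \to \mathbb{N} \times \mathbb{N} \times \mathbb{N}$ of $f$ (without using the Axiom of Choice).
Show that this implies $|\mathbb{N}| = |\mathbb{Q}|$.
5.9 This question requires some basic knowledge about real numbers. Prove that the following two functions are injective:
1. $F : \{0, 1\}^{\mathbb{N}} \to \mathbb{R}, f \mapsto \sum_{n=0}^{\infty} f(n) 3^{-n}$,
2. $G : \mathbb{R} \to 2^{\mathbb{Q} \times \mathbb{Q}}, x \mapsto \{(a, b) \in \mathbb{Q} \times \mathbb{Q} : a < x < b\}$.
Show that this implies $|2^{\mathbb{N}}| = |\mathbb{R}|$.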
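5.10 Prove that the map
\[ f : 2^{\mathbb{N}} \times 2^{\mathbb{N}} \to 2^{\mathbb{N} \sqcup \mathbb{N}}, \quad (A, B) \mapsto A \sqcup B \]
is bijective. Conclude that $|2^{\mathbb{N}} \times 2^{\mathbb{N}}| = |2^{\mathbb{N}}|$.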
5.11 Show with the help of the previous problems that $|\mathbb{R}^2| = |\mathbb{R}|$.
5.12 This question requires some basic knowledge about complex numbers. Prove that the
following function is bijective:
\[ f : \mathbb{R}^2 \to \mathbb{C}, \quad (a, b) \mapsto a + bi. \]
Show that this implies $|\mathbb{C}| = |\mathbb{R}|$.

5.6 Induction Principle on Natural Numbers

In the following section we want to discuss finite and infinite sets and prototypes
of such sets will be derived from the natural numbers. For this purpose we need to
clarify some further properties of natural numbers. The first property is called the
induction principle.
Proposition 5.21 (Induction principle) Let $A \subseteq \mathbb{N}$ be a subset that satisfies the
following properties:
1. $0 \in A$ (induction base)
2. $(\forall n \in \mathbb{N})(n \in A \Longrightarrow n + 1 \in A)$ (induction step)
Then $A = \mathbb{N}$.
We cannot really prove this proposition here, since we are working with an
intuitive concept of the natural numbers. We have just defined $\mathbb{N}$ as the set of numbers 0, 1, 2, ... and on the basis of this informal definition, the above induction principle
is intuitively correct. The interpretation of the dots "..." in the informal definition of $\mathbb{N}$ is just that with any number $n$ also its successor $n + 1$ follows in that list.
Besides the above induction principle, there is a second principle that also follows
intuitively. This is called the recursion principle. The recursion principle allows us
to define functions "inductively" (or "recursively") following the inductive structure
of natural numbers.
Proposition 5.22 (Recursion principle) Let $X$, $Y$ be sets and let $g : X \to Y$
and $h : Y \times X \times \mathbb{N} \to Y$ be functions. Then there exists exactly one function
$f : X \times \mathbb{N} \to Y$ with
1. $f(x, 0) := g(x)$,
2. $f(x, n + 1) := h(f(x, n), x, n)$
for all $x \in X$ and $n \in \mathbb{N}$.
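The recursion scheme of Proposition 5.22 can be mirrored in a few lines of code. The following Python sketch is only an illustration and not a proof of the existence claim: the helper `recursive` and the two example instances (addition and powers) are assumptions chosen for this sketch.

```python
def recursive(g, h):
    """Return the function f with f(x, 0) = g(x) and f(x, n + 1) = h(f(x, n), x, n)."""
    def f(x, n):
        value = g(x)
        for m in range(n):          # unfold the recursion step by step
            value = h(value, x, m)
        return value
    return f

# Hypothetical instance: addition, defined by x + 0 = x and x + (n + 1) = (x + n) + 1.
add = recursive(lambda x: x, lambda y, x, n: y + 1)

# Hypothetical instance: powers, defined by x^0 = 1 and x^(n + 1) = x^n * x.
power = recursive(lambda x: 1, lambda y, x, n: y * x)

print(add(3, 4), power(2, 10))
```

The loop computes $f(x, 0), f(x, 1), ..., f(x, n)$ in this order, which is exactly how the two equations determine the values of $f$.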
Once again, we cannot prove this result here, at least not the existence claim,
since we do not use a precise definition of N. However, we can use the induction
principle in order to derive the uniqueness claim in the recursion principle. We
formulate this as an example here that shows how to use the induction method.
Example 5.23 Let $X$ and $Y$ be sets and let $g : X \to Y$ and $h : Y \times X \times \mathbb{N} \to Y$
be functions. Let us assume that we have two functions $f : X \times \mathbb{N} \to Y$ and
$f' : X \times \mathbb{N} \to Y$ that both satisfy the equations given in the Recursion Principle 5.22.
We claim that $f = f'$ follows, using the induction principle. In order to show this,
we prove the following claim:
\[ (\forall n \in \mathbb{N})(\forall x \in X)\; f(x, n) = f'(x, n). \]
This claim clearly implies $f = f'$. More precisely, this claim is equivalent to the
statement that the set
\[ A := \{n \in \mathbb{N} : (\forall x \in X)\; f(x, n) = f'(x, n)\} \]
is equal to $\mathbb{N}$. If we can show that $A$ satisfies both requirements of the Induction
Principle 5.21, then $A = \mathbb{N}$ follows. We prove this now.
Induction base: $n = 0$. In this case clearly $f(x, 0) = g(x) = f'(x, 0)$ for all $x \in X$.
That means $0 \in A$.
Induction step: $n \to n + 1$. Now we assume that $n \in \mathbb{N}$ is fixed and that for this fixed
$n$ we have $(\forall x \in X)\; f(x, n) = f'(x, n)$. This means $n \in A$, which is the so-called
induction hypothesis. We need to show $n + 1 \in A$. We obtain for all $x \in X$
\[ f(x, n + 1) = h(f(x, n), x, n) = h(f'(x, n), x, n) = f'(x, n + 1), \]
where the induction hypothesis has been used in the middle equality. This now means
$n + 1 \in A$. Hence we have proved $n \in A \Longrightarrow n + 1 \in A$.
This finishes the induction. Altogether, we have proved $A = \mathbb{N}$ and hence $f = f'$.
Hence, there can be at most one function $f$ that satisfies the requirements of the
Recursion Principle 5.22.
The above structure of a proof by induction is typical, including the terminology
of an induction base, an induction step and an induction hypothesis. Usually, we will
not formulate the set $A$ explicitly. The implicit understanding is that whenever we
want to prove a statement of the form $(\forall n \in \mathbb{N})\, P(n)$ with a proposition $P(n)$,
then we can achieve this by applying the Induction Principle to the set
\[ A := \{n \in \mathbb{N} : P(n)\}. \]
The induction principle is the key idea for the so-called Peano axioms of natural
numbers. We formulate these axioms in the following definition.
Definition 5.24 (Peano model) We say that a triple $(N, z, s)$ is a Peano model
of the natural numbers if the following holds:
1. $N$ is a set (natural numbers)
2. $z \in N$ (zero)
3. $s : N \to N$ is an injective function with $z \notin \mathrm{range}(s)$ (successor)
4. if a subset $A \subseteq N$ has the properties
a) $z \in A$ (induction base)
b) $(\forall n \in N)(n \in A \Longrightarrow s(n) \in A)$ (induction step)
then $A = N$.
Since we are not going to develop set theory axiomatically here, we will not
prove that there are Peano models of the natural numbers at all. We keep on using
our intuitive model $(\mathbb{N}, 0, s)$ where $s : \mathbb{N} \to \mathbb{N}, n \mapsto n + 1$ is the successor function.
The Induction Principle 5.21 essentially says that $(\mathbb{N}, 0, s)$ is a Peano model of the
natural numbers. We briefly sketch how one can construct a set theoretical model
of the natural numbers, namely by choosing the following sets:
\[ 0 := \emptyset, \quad 1 := 0 \cup \{0\}, \quad 2 := 1 \cup \{1\}, \quad 3 := 2 \cup \{2\}, \quad ... \]
The set of natural numbers corresponds then to the set $N$ of all these sets and the
successor function corresponds to the function $s : N \to N, n \mapsto n \cup \{n\}$. One can
actually prove in a precise way that along these lines one can construct a Peano
model of the natural numbers. But we will not work this out in detail here.
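The first few of these sets can be built explicitly. The following Python sketch is an illustration under the assumption that sets are represented by Python's immutable `frozenset`; the names `successor`, `zero` and `numeral` are hypothetical.

```python
def successor(n):
    """s(n) = n ∪ {n} for a set n represented as a frozenset."""
    return n | frozenset([n])

zero = frozenset()   # 0 := the empty set

def numeral(k):
    """The set that represents the natural number k, obtained by k successor steps."""
    n = zero
    for _ in range(k):
        n = successor(n)
    return n

three = numeral(3)
print(len(three))                                            # the set 3 has three elements
print(three == frozenset([numeral(0), numeral(1), numeral(2)]))
```

In this model the set representing $k$ has exactly $k$ elements, namely the sets representing $0, ..., k - 1$.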


Problems
5.13 Use the Induction Principle 5.21 in order to prove the following statement by induction:
\[ (\forall n \in \mathbb{N}) \quad \sum_{i=0}^{n} i = \frac{n(n + 1)}{2}. \]
5.14 Use the Recursion Principle 5.22 in order to prove that there exists exactly one function
$f : \mathbb{N} \to \mathbb{N}$ such that
1. $f(0) := 1$,
2. $f(n + 1) := f(n) \cdot (n + 1)$,
for all $n \in \mathbb{N}$. This function $f$ is called the factorial function and usually one writes $n! := f(n)$
for all $n \in \mathbb{N}$.
5.15 For $n, k \in \mathbb{N}$ with $k \leq n$ we define the binomial coefficient
\[ \binom{n}{k} := \frac{n!}{k!(n - k)!} \]
and we define $\binom{n}{k} := 0$ for $k > n$. Prove Pascal's rule, which states that
\[ \binom{n + 1}{k + 1} = \binom{n}{k} + \binom{n}{k + 1} \]
for all $n, k \in \mathbb{N}$. Use this rule in order to show by induction that $\binom{n}{k}$ is a natural number
for all $n, k \in \mathbb{N}$.

5.7 Finite and Countable Sets

We have seen that many common sets are either of the same cardinality as the set
of natural numbers $\mathbb{N}$ or of the same cardinality as the power set $2^{\mathbb{N}}$. In fact, most
infinite sets that commonly occur in mathematics are of one of the two corresponding
cardinalities. We introduce some related terminology. We recall that for each $n \in \mathbb{N}$
with $n \geq 1$ we denote by $\mathbb{N}_n := \{1, 2, 3, ..., n\}$ the set of the natural numbers from
1 to $n$. We define $\mathbb{N}_0 := \emptyset$ to be the empty set.
Definition 5.25 Let $X$ be a set. Then we say that
1. $X$ is finite if $|X| = |\mathbb{N}_n|$ for some $n \in \mathbb{N}$,
2. $X$ is infinite if $X$ is not finite,
3. $X$ is countable if $|X| \leq |\mathbb{N}|$,
4. $X$ is countably infinite if $|X| = |\mathbb{N}|$,
5. $X$ is uncountable if $X$ is not countable.


We note that it follows directly from the definition that the empty set is finite,
each finite set is countable and each countably infinite set is countable. Countable
sets are sometimes also called denumerable. We give some examples.
Example 5.26 We discuss some examples of sets.
1. The empty set is finite and so are 2 and $\mathbb{N}_n$ for each $n \in \mathbb{N}$.
2. The set $P$ of prime numbers is infinite, this is exactly what we proved in Theorem 1.2.
3. The sets $\mathbb{N}$, $\mathbb{Z}$ and $\mathbb{Q}$ are all countably infinite.
4. The sets $2^{\mathbb{N}}$ and $\mathbb{R}$ and $2^{\mathbb{R}}$ are uncountable.
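Next we prove that the relation between the cardinalities of $\mathbb{N}_n$ and $\mathbb{N}_k$ can be
directly deduced from $n$ and $k$.
Proposition 5.27 (Finite sets) Let $n, k \in \mathbb{N}$. Then we obtain:
1. $|\mathbb{N}_n| \leq |\mathbb{N}_k| \iff n \leq k$,
2. $|\mathbb{N}_n| = |\mathbb{N}_k| \iff n = k$,
3. $|\mathbb{N}_n| < |\mathbb{N}_k| \iff n < k$.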
Proof.
1. Let $|\mathbb{N}_n| \leq |\mathbb{N}_k|$. Then there is an injective map $f : \mathbb{N}_n \to \mathbb{N}_k$. If $n > k$, then
the values $f(1), ..., f(n)$ cannot all be distinct, but one of the values $1, ..., k$
must occur twice among $f(1), ..., f(n)$. In this case $f$ is not injective. Hence
$n \leq k$. Let us now assume that $n \leq k$. Then $\mathbb{N}_n \subseteq \mathbb{N}_k$ and hence $|\mathbb{N}_n| \leq |\mathbb{N}_k|$
by Proposition 5.4.
2. Let $|\mathbb{N}_n| = |\mathbb{N}_k|$. This means $|\mathbb{N}_n| \leq |\mathbb{N}_k|$ and $|\mathbb{N}_k| \leq |\mathbb{N}_n|$ and hence $n \leq k$
and $k \leq n$ by 1. Hence $n = k$ follows. If, on the other hand, $n = k$, then the
identity $\mathrm{id} : \mathbb{N}_n \to \mathbb{N}_k$ is clearly bijective and hence $|\mathbb{N}_n| = |\mathbb{N}_k|$.
3. This follows directly from 1. and 2.
□
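For small values of $n$ and $k$ the first statement of Proposition 5.27 can be confirmed by exhaustive search. The following Python sketch is only an illustration: it enumerates all maps $\mathbb{N}_n \to \mathbb{N}_k$ as tuples of values, and the names `N` and `has_injection` as well as the bound 5 are assumptions of this sketch.

```python
from itertools import product

def N(n):
    """The set N_n = {1, ..., n} as a list; N(0) is empty."""
    return list(range(1, n + 1))

def has_injection(n, k):
    """Brute force: is there an injective map N_n -> N_k?"""
    for images in product(N(k), repeat=n):   # every map N_n -> N_k as a tuple of values
        if len(set(images)) == n:            # all n values are pairwise distinct
            return True
    return False

# Compare with the criterion n <= k for all n, k < 5.
agrees = all(has_injection(n, k) == (n <= k) for n in range(5) for k in range(5))
print(agrees)
```

The case $n > k$ is the pigeonhole argument from the proof: every tuple of $n$ values taken from $k < n$ candidates contains a repetition.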
The previous result is the reason why one can actually consider |X| as a natural
number for finite sets X.
Definition 5.28 (Cardinality) Let $X$ be a finite set and $n \in \mathbb{N}$. Then we define
$$|X| = n :\iff |X| = |\mathbb{N}_n|.$$
In this case, the natural number $n \in \mathbb{N}$ is called the cardinality of $X$.



Firstly, this quantity $|X|$ is well-defined for finite sets, since $|X| = |\mathbb{N}_n|$ and
$|X| = |\mathbb{N}_k|$ for $n, k \in \mathbb{N}$ implies $n = k$ by Proposition 5.27. Secondly, we have now two
ways of reading expressions like $|X| \leq |Y|$ for finite sets $X, Y$. For one, according to
the original definition this means that there is an injective map $f : X \to Y$. Secondly,
we can read this statement as inequality $n \leq k$ for the cardinalities $n = |X|$ and
$k = |Y|$. Proposition 5.27 guarantees that these two different interpretations actually
lead to the same result, i.e. this ambiguity cannot cause any confusion. An analogous
remark holds for the interpretation of the expressions $|X| = |Y|$ and $|X| < |Y|$. Next
we prove that the set $\mathbb{N}$ is actually infinite according to our definition of finiteness.
Proposition 5.29 The set $\mathbb{N}$ is infinite.
Proof. Let us assume that $\mathbb{N}$ is finite, i.e. that there is an $n \in \mathbb{N}$ such that $|\mathbb{N}| = |\mathbb{N}_n|$.
Then it follows that there is a bijection $f : \mathbb{N} \to \mathbb{N}_n$. Then among the function values $f(0), f(1), ..., f(n)$ there must be at least one repetition, since there are only $n$
distinct values in $\mathbb{N}_n$, but there are $n + 1$ function values. This is a contradiction to
the injectivity of $f$. Hence, $\mathbb{N}$ cannot be finite.
□
Now we show how the notions of finiteness and infinity behave with respect to a
change in cardinality.
Proposition 5.30 Let $X$ and $Y$ be sets. Then we obtain the following:
1. $|X| \leq |Y|$ and $Y$ finite $\implies$ $X$ finite,
2. $|X| \leq |Y|$ and $Y$ countable $\implies$ $X$ countable,
3. $|X| \leq |Y|$ and $X$ infinite $\implies$ $Y$ infinite,
4. $|X| \leq |Y|$ and $X$ uncountable $\implies$ $Y$ uncountable.
Proof.
1. Let $|X| \leq |Y|$ and let $Y$ be finite. This means that $|Y| = |\mathbb{N}_n|$ for some $n \in \mathbb{N}$
and there is an injective map $f : X \to \mathbb{N}_n$. Then the inverse $f^{-1}$ can be
considered as a function $f^{-1} : \mathrm{range}(f) \to X$ and $\mathrm{range}(f) \subseteq \mathbb{N}_n$ is a set that
contains exactly $k$ values with $k \leq n$. Let $n_i$ be the $i$th value in $\mathrm{range}(f)$
in increasing order $n_1 < n_2 < ... < n_k$ and let $g : \mathbb{N}_k \to X$ be defined by
$g(i) := f^{-1}(n_i)$. Then $g$ is bijective and hence $|X| = |\mathbb{N}_k|$, i.e. $X$ is finite.
2. This follows directly from the definition by transitivity.
3. and 4. This follows from 1. and 2. by contraposition.
□
In particular, we obtain that any subset of a finite set is finite and any subset
of a countable set is countable. Now we prove that for finite sets a situation like
$|2\mathbb{N}| = |\mathbb{N}|$ for the natural numbers cannot occur.


Proposition 5.31 Let X be a finite set. Then the following hold:


1. any injective map $f : X \to X$ is bijective,
2. any surjective map $f : X \to X$ is bijective.
Proof. Let $X$ be finite. Then $|X| = |\mathbb{N}_n|$ for some $n \in \mathbb{N}$. Then there is a bijective
map $h : X \to \mathbb{N}_n$.
1. If $f : X \to X$ is injective, then $g := h \circ f \circ h^{-1} : \mathbb{N}_n \to \mathbb{N}_n$ is also injective and
hence all the values $g(1), ..., g(n) \in \mathbb{N}_n$ are distinct. Since there are exactly $n$
distinct values in $\mathbb{N}_n$, this means that $g$ is surjective. But then $h^{-1} \circ g \circ h = f$ is
surjective too. See the diagram in Figure 5.3 for an illustration of the situation.
2. We leave this proof to the reader (see Problem 5.17).
□
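[Diagram: top arrow $f : X \to X$, vertical arrows $h : X \to \mathbb{N}_n$ on both sides, bottom arrow $g : \mathbb{N}_n \to \mathbb{N}_n$.]

Figure 5.3: Commutative diagram for $g \circ h = h \circ f$.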
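For small $n$ the statement of Proposition 5.31 can also be confirmed by enumerating all maps $\mathbb{N}_n \to \mathbb{N}_n$. This is a sketch under the assumption that a map is encoded as the tuple $(f(1), ..., f(n))$; the function names are illustrative only.

```python
from itertools import product

def self_maps(n):
    """All maps f : N_n -> N_n, each encoded as the tuple (f(1), ..., f(n))."""
    return product(range(1, n + 1), repeat=n)

def is_injective(f):
    return len(set(f)) == len(f)

def is_surjective(f, n):
    return set(f) == set(range(1, n + 1))

def injective_iff_surjective(n):
    """Check that injectivity and surjectivity coincide for every self-map of N_n."""
    return all(is_injective(f) == is_surjective(f, n) for f in self_maps(n))

assert all(injective_iff_surjective(n) for n in range(6))
```

The enumeration visits $n^n$ maps, so it is only practical for very small $n$.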


In the next section we will see that the conditions given in this result also imply
finiteness, provided one accepts the Axiom of Choice. We close this section with
some additional quantifiers that are related to finiteness.
Definition 5.32 (Infinitely many, for almost all) Let $X$ be a set and $P(x)$ a
predicate that depends on $x \in X$. Then we define
1. $(\forall^\infty x \in X)\, P(x) :\iff \{x \in X : \neg P(x)\}$ is finite,
2. $(\exists^\infty x \in X)\, P(x) :\iff \{x \in X : P(x)\}$ is infinite.
In the first case, we say that $P(x)$ holds for almost all $x \in X$ and in the
second case we say that $P(x)$ holds for infinitely many $x \in X$.
Here "almost all" means "for all but finitely many". In probability theory, measure theory and topology the term "for almost all" is sometimes used with a different
meaning (with different concepts of size). We note that we get directly from the



definition the following version of de Morgan's law (and the corresponding statement
with the quantifiers swapped):
$$\neg(\forall^\infty x \in X)\, P(x) \iff (\exists^\infty x \in X)\, \neg P(x).$$

Problems
5.16 Let $f : X \to Y$ be a function and let $A \subseteq X$ and $B \subseteq Y$. Then we obtain:
1. $A$ finite $\implies$ $f(A)$ finite,
2. $A$ countable $\implies$ $f(A)$ countable,
3. $B$ infinite and $f$ surjective $\implies$ $f^{-1}(B)$ infinite,
4. $B$ uncountable and $f$ surjective $\implies$ $f^{-1}(B)$ uncountable.
5.17 Let $X$ be a finite set. Prove that any surjective map $f : X \to X$ is bijective.

5.8 Dedekind Infinite Sets

Even before Cantor defined finite sets in terms of Nn and infinite sets as sets that
are not finite, Richard Dedekind already suggested a concept of infinity that does
not refer to the natural numbers but that uses an intrinsic property of finite sets
to characterize them. This concept leads to further important characterizations of
finite and infinite sets (which, however, require the Axiom of Choice).
Definition 5.33 (Dedekind infinite sets) A set $X$ is called Dedekind infinite if
and only if there is a proper subset $A \subsetneq X$ such that $|A| = |X|$.
We give an example.
Example 5.34 The set $\mathbb{N}$ is Dedekind infinite, since $|\mathbb{N}| = |2\mathbb{N}|$ and $2\mathbb{N} \subsetneq \mathbb{N}$. The
set $P$ of prime numbers is also Dedekind infinite (see Problem 5.18).
In the following theorem we collect a number of conditions that are equivalent
to Dedekind infiniteness.
Theorem 5.35 (Dedekind infinite sets) Let $X$ be a set. Then the following are
equivalent:
1. $X$ is Dedekind infinite,
2. there is a function $f : X \to X$ which is injective but not bijective,
3. $|\mathbb{N}| \leq |X|$,
4. $X$ has a countably infinite subset.


Proof. "1. $\Rightarrow$ 2." Let $X$ be Dedekind infinite. Then there is a proper subset
$A \subsetneq X$ such that $|A| = |X|$, i.e. there is a bijective map $g : A \to X$. Then the
inverse $g^{-1} : X \to A$ is bijective as well and the function $f : X \to X, x \mapsto g^{-1}(x)$ is
injective, but not surjective, since $\mathrm{range}(f) = A \subsetneq X$.
"2. $\Rightarrow$ 3." Let us now assume that there is a function $f : X \to X$ that is injective,
but not surjective. We inductively define an injective function $h : \mathbb{N} \to X$. Since
$f$ is not surjective, there is some $x_0 \in X \setminus \mathrm{range}(f)$. Now we assume that we have
already defined $x_n$ for some $n \in \mathbb{N}$ and we use $x_n$ in order to define
$$x_{n+1} := f(x_n).$$
We claim that $h : \mathbb{N} \to X, n \mapsto x_n$ is injective. We prove by
induction that
$$(\forall n \in \mathbb{N})(\forall i < n)\; x_i \neq x_n.$$
Induction base: $n = 0$. In this case nothing is to be proved.
Induction step $n \to n + 1$. We assume we have some fixed $n \in \mathbb{N}$ such that $(\forall i < n)\; x_i \neq x_n$
holds. We need to show this statement for $n + 1$. By injectivity of $f$,
this implies $(\forall i < n)\; f(x_i) \neq f(x_n)$. But this implies $(\forall i < n)\; x_{i+1} \neq x_{n+1}$. Since
$x_{n+1} \in \mathrm{range}(f)$ and $x_0 \notin \mathrm{range}(f)$, it is also clear that $x_0 \neq x_{n+1}$. Altogether, this
means $(\forall i < n + 1)\; x_i \neq x_{n+1}$. But this is the claim for $n + 1$. Altogether, this
proves that $h$ is injective.
"3. $\Rightarrow$ 4." Let $|\mathbb{N}| \leq |X|$. Then there is an injective function $h : \mathbb{N} \to X$ and hence
$f : \mathbb{N} \to \mathrm{range}(h), x \mapsto h(x)$ is bijective and hence $|\mathbb{N}| = |\mathrm{range}(h)|$. Hence $\mathrm{range}(h)$
is a countably infinite subset of $X$.
"4. $\Rightarrow$ 1." Let $D \subseteq X$ be a countably infinite set. Then there exists a bijective
function $h : \mathbb{N} \to D$. Let $B := h(2\mathbb{N})$, $C := X \setminus D$ and $A := B \cup C$. We consider the
inverse function $h^{-1} : D \to \mathbb{N}$. Since $h$ is injective, it is clear that $B \subsetneq D$ and hence
$A \subsetneq X$. We define a function $g : A \to X$ by
$$g(x) := \begin{cases} h(\tfrac{1}{2} h^{-1}(x)) & \text{if } x \in B \\ x & \text{if } x \in C \end{cases}$$
for all $x \in A$. Then $g(B) = D$ and $g(C) = C$, i.e. $\mathrm{range}(g) = X$, i.e. $g$ is
surjective. Moreover, $g$ is also injective: if $x, y \in B$, then $g(x) = g(y)$ implies
$h(\tfrac{1}{2} h^{-1}(x)) = h(\tfrac{1}{2} h^{-1}(y))$, which in turn implies $x = y$; if $x, y \in C$, then clearly
$g(x) = g(y)$ implies $x = y$; if $x \in B$ and $y \in C$, then $g(x) \in D$ and $g(y) \in C$, hence
$g(x) = g(y)$ is not possible in this case. Altogether, this shows that $g$ is bijective. □
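Two of the constructions in this proof can be tried out on $X = \mathbb{N}$. The sketch below is illustrative only: the map $f(x) = 2x + 1$, the choice $D = 3\mathbb{N}$ with $h(n) = 3n$, and all function names are assumptions made for the example, and only finite windows of $\mathbb{N}$ are inspected.

```python
def orbit(f, x0, length):
    """Return [x_0, ..., x_{length-1}] with x_{n+1} = f(x_n), as in step 2. => 3."""
    xs = [x0]
    while len(xs) < length:
        xs.append(f(xs[-1]))
    return xs

# Assumed example: f(x) = 2x + 1 is injective on N and 0 is not in range(f).
f = lambda x: 2 * x + 1
xs = orbit(f, 0, 20)
assert len(set(xs)) == len(xs)       # the first 20 values of n |-> x_n are pairwise distinct

# Step 4. => 1. with the assumed countably infinite subset D = 3N and h(n) = 3n.
def h(n):
    return 3 * n

def h_inv(x):
    return x // 3

def in_B(x):                         # B = h(2N), the multiples of 6
    return x % 6 == 0

def in_C(x):                         # C = X \ D, the non-multiples of 3
    return x % 3 != 0

def g(x):
    """The map g : A -> X from the proof, where A is the union of B and C."""
    if in_B(x):
        return h(h_inv(x) // 2)
    if in_C(x):
        return x
    raise ValueError("x is not an element of A")

M = 60
A_window = [x for x in range(2 * M) if in_B(x) or in_C(x)]
images = [g(x) for x in A_window]
assert len(set(images)) == len(images)   # g is injective on this window
assert set(range(M)) <= set(images)      # every value below M is attained
```

Note that $3 \in X$ is not an element of $A$ here, so $A$ is indeed a proper subset of $X$.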
It is easy to see that each Dedekind infinite set is infinite. This follows from
Proposition 5.31. The reverse implication requires the Axiom of Choice. We first
prove the following proposition.
Proposition 5.36 It follows from the Axiom of Choice that any infinite set $X$
contains a countably infinite set $A \subseteq X$.



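Proof. Let $X$ be infinite. By the Axiom of Choice there exists a choice function
$C_X : 2^X \setminus \{\emptyset\} \to X$ such that $C_X(A) \in A$ for each non-empty $A \subseteq X$. We define a
function $f : \mathbb{N} \to X$ inductively by
$$f(0) := C_X(X)$$
$$f(n + 1) := C_X(X \setminus f(\{0, ..., n\}))$$
for all $n \in \mathbb{N}$. Here $X \setminus f(\{0, ..., n\})$ is non-empty, since $X$ is infinite. This function is injective and hence it proves $|\mathbb{N}| \leq |X|$. By Theorem 5.35 this means that $X$ is Dedekind infinite and, in particular, contains a countably infinite subset.
□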
Now we obtain the following characterization of infinite sets.
Theorem 5.37 (Infinite sets) Let $X$ be a set. It follows from the Axiom of Choice
that the following are equivalent:
1. $X$ is infinite,
2. $X$ is Dedekind infinite.
Proof. Let $X$ be a set. If $X$ is Dedekind infinite, then by Theorem 5.35 it follows
that there is an injection $f : X \to X$ that is not surjective. Hence $X$ cannot be
finite according to Proposition 5.31. Hence $X$ is infinite.
For the other direction, let $X$ be infinite. Then $X$ contains a countably infinite subset $A \subseteq X$ by Proposition 5.36. This implies by Theorem 5.35 that $X$ is
Dedekind infinite.
□
If we accept the Axiom of Choice, then we also get the following characterization
of finite sets.
Theorem 5.38 (Finite sets) Let $X$ be a set. It follows from the Axiom of Choice
that the following are equivalent:
1. $X$ is finite,
2. there is no proper subset $A \subsetneq X$ with $|A| = |X|$,
3. any injective function $f : X \to X$ is bijective,
4. any surjective function $g : X \to X$ is bijective,
5. $|X| < |\mathbb{N}|$.
Proof. Let us assume the Axiom of Choice. It follows from Theorem 5.37 that $X$ is
finite if and only if $X$ is not Dedekind infinite, hence finiteness of $X$ is equivalent to
the negations of the conditions of Theorem 5.35, i.e. to 2., 3. and 5. That $|X| < |\mathbb{N}|$
is the negation of $|\mathbb{N}| \leq |X|$ follows from the Trichotomy Theorem 5.10. In order to
complete the proof, it suffices to show that 4. is equivalent to the other statements. It
follows from Proposition 5.31 that any finite $X$ satisfies 4. Let us hence assume that
4. holds. We prove that 3. follows. Let hence $f : X \to X$ be injective. Then $f$ has
a left inverse $g : X \to X$, i.e. $g \circ f = \mathrm{id}_X$. Such a left inverse $g$ is necessarily surjective and hence bijective by 4. Thus, $g^{-1} = g^{-1} \circ g \circ f = f$ and hence $f$ is bijective. □
We mention that it also follows from the Axiom of Choice that a set is countably
infinite if and only if it is countable and infinite.

Problems
5.18 Prove that the set of prime numbers $P$ is Dedekind infinite, without using the Axiom
of Choice, directly by going back to the Theorem of Euclid 1.2 (see also Problem 1.2).
5.19 Let $A \subseteq \mathbb{N}$. Prove that it follows from the Axiom of Choice that the following are
equivalent:
1. $A$ is infinite,
2. $(\forall n \in \mathbb{N})(\exists k \in \mathbb{N})(k \geq n$ and $k \in A)$.
5.20 Let $X$ be a set. Prove that the following are equivalent (without the Axiom of Choice):
1. there exists a function $f : X \to X$ which is surjective but not injective,
2. there exists a surjective map $g : X \to Y$ with a countably infinite set $Y$,
3. the power set $2^X$ is Dedekind infinite.
5.21 Let $X$ be a set. Prove that it follows from the Axiom of Choice that the following are
equivalent:
1. $X$ is countably infinite,
2. $X$ is countable and infinite.

5.9 Cardinality and Set Constructions

In this section we investigate how set constructions affect the cardinality of sets. We
will only treat finite and countable sets here. The first observation is that all finite
set constructions that we have considered preserve finiteness. That means that a
finite union, intersection, product, or the power set or function set of finite sets is
finite again. We can say more than this: we can determine formulas that allow us to
compute the size of the resulting sets, if we know the size of the original sets. We
assume that $k^0 = 1$ for all $k \in \mathbb{N}$.
Theorem 5.39 (Constructions on finite sets) Let $X$ and $Y$ be finite sets. Then
we obtain
1. $|X \cup Y| + |X \cap Y| = |X \sqcup Y| = |X| + |Y|$,
2. $|X \times Y| = |X| \cdot |Y|$,
3. $|Y^X| = |Y|^{|X|}$,
4. $|X!| = |X|!$,
5. $|2^X| = 2^{|X|}$.
In particular, the sets $X \cup Y$, $X \cap Y$, $X \times Y$, $Y^X$, $X!$ and $2^X$ are finite as well.
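Proof.
1. Before we start, we prove the extra claim that for any finite set $Z$ and $x \notin Z$
we obtain $|Z \cup \{x\}| = |Z| + 1$. Since $Z$ is finite, there is some $m \in \mathbb{N}$ such
that $|Z| = |\mathbb{N}_m|$ and hence there is some bijection $h : Z \to \mathbb{N}_m$. Hence we get
a bijection
$$H : Z \cup \{x\} \to \mathbb{N}_{m+1}, \quad z \mapsto \begin{cases} h(z) & \text{if } z \in Z \\ m + 1 & \text{otherwise} \end{cases}$$
This proves the claim, i.e. $|Z \cup \{x\}| = |Z| + 1$. Since $|X| = |\mathbb{N}_n|$ and $|Y| = |\mathbb{N}_k|$
implies $|X \sqcup Y| = |\mathbb{N}_n \sqcup \mathbb{N}_k|$ (see Problem 5.1), it suffices to show $|\mathbb{N}_n \sqcup \mathbb{N}_k| = n + k$
for all $n, k \in \mathbb{N}$ in order to prove $|X \sqcup Y| = |X| + |Y|$. We prove this
claim by induction on $n \in \mathbb{N}$.
Induction base: $n = 0$. In this case $\mathbb{N}_n \sqcup \mathbb{N}_k = (\{1\} \times \emptyset) \cup (\{2\} \times \mathbb{N}_k) = \{2\} \times \mathbb{N}_k$
and hence $|\mathbb{N}_n \sqcup \mathbb{N}_k| = |\{2\} \times \mathbb{N}_k| = k$ for all $k \in \mathbb{N}$ by Problem 5.2.
Induction step: $n \to n + 1$. We assume that we have a fixed $n \in \mathbb{N}$ with
$|\mathbb{N}_n \sqcup \mathbb{N}_k| = n + k$ for all $k \in \mathbb{N}$. We consider the claim for $n + 1$ and we obtain
$$|\mathbb{N}_{n+1} \sqcup \mathbb{N}_k| = |(\{1\} \times (\mathbb{N}_n \cup \{n + 1\})) \cup (\{2\} \times \mathbb{N}_k)|
= |(\mathbb{N}_n \sqcup \mathbb{N}_k) \cup \{(1, n + 1)\}|
= n + k + 1,$$
where the last equality follows from the induction hypothesis and the extra
claim we proved first, since $(1, n + 1) \notin (\mathbb{N}_n \sqcup \mathbb{N}_k)$. This finishes the induction.
We still have to prove $|X \sqcup Y| = |X \cup Y| + |X \cap Y|$. Firstly, we note that $X \cup Y$
and $X \cap Y$ are finite, since $|X \cap Y| \leq |X \cup Y| \leq |X \sqcup Y|$ holds (see Problem 5.3).
Hence, the claim follows since $|X \sqcup Y| = |(X \cup Y) \sqcup (X \cap Y)| = |X \cup Y| + |X \cap Y|$
by Problem 5.3 and by what we proved before.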
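2. Since $|X| = |\mathbb{N}_n|$ and $|Y| = |\mathbb{N}_k|$ implies $|X \times Y| = |\mathbb{N}_n \times \mathbb{N}_k|$ (see Problem 5.1),
it suffices to show $|\mathbb{N}_n \times \mathbb{N}_k| = n \cdot k$ for all $n, k \in \mathbb{N}$. We prove this by induction
on $n \in \mathbb{N}$.
Induction base: $n = 0$. In this case $\mathbb{N}_0 \times \mathbb{N}_k = \emptyset \times \mathbb{N}_k = \emptyset$ for all $k \in \mathbb{N}$ and
hence $|\mathbb{N}_0 \times \mathbb{N}_k| = |\emptyset| = 0 = 0 \cdot k$.
Induction step: $n \to n + 1$. We assume that $n \in \mathbb{N}$ is some fixed number such
that $|\mathbb{N}_n \times \mathbb{N}_k| = n \cdot k$ for all $k \in \mathbb{N}$. We now consider the statement for $n + 1$.
We obtain
$$\mathbb{N}_{n+1} \times \mathbb{N}_k = (\mathbb{N}_n \cup \{n + 1\}) \times \mathbb{N}_k = (\mathbb{N}_n \times \mathbb{N}_k) \cup (\{n + 1\} \times \mathbb{N}_k)$$
where $(\mathbb{N}_n \times \mathbb{N}_k) \cap (\{n + 1\} \times \mathbb{N}_k) = \emptyset$. Hence we obtain with 1. and Problem 5.2
$$|\mathbb{N}_{n+1} \times \mathbb{N}_k| = |\mathbb{N}_n \times \mathbb{N}_k| + |\{n + 1\} \times \mathbb{N}_k| = n \cdot k + k = (n + 1) \cdot k.$$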
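3. Since $|X| = |\mathbb{N}_n|$ and $|Y| = |\mathbb{N}_k|$ implies $|Y^X| = |\mathbb{N}_k^{\mathbb{N}_n}|$ (see Problem 5.1), it
suffices to show $|\mathbb{N}_k^{\mathbb{N}_n}| = k^n$ for all $n, k \in \mathbb{N}$. We prove this by induction on
$n \in \mathbb{N}$.
Induction base: $n = 0$. Then $\mathbb{N}_0 = \emptyset$ and there is only one function $f : \emptyset \to \mathbb{N}_k$,
i.e. $|\mathbb{N}_k^{\mathbb{N}_0}| = 1 = k^0$ for all $k \in \mathbb{N}$.
Induction step: $n \to n + 1$. We assume that we have a fixed $n \in \mathbb{N}$ with
$|\mathbb{N}_k^{\mathbb{N}_n}| = k^n$ for all $k \in \mathbb{N}$. Now we consider the map
$$F : \mathbb{N}_k^{\mathbb{N}_{n+1}} \to \mathbb{N}_k^{\mathbb{N}_n} \times \mathbb{N}_k, \quad f \mapsto (f|_{\mathbb{N}_n}, f(n + 1)),$$
where $f|_{\mathbb{N}_n} : \mathbb{N}_n \to \mathbb{N}_k$ denotes the restriction of $f : \mathbb{N}_{n+1} \to \mathbb{N}_k$ to $\mathbb{N}_n$. It is
not too difficult to see that $F$ is actually bijective, which implies
$$|\mathbb{N}_k^{\mathbb{N}_{n+1}}| = |\mathbb{N}_k^{\mathbb{N}_n} \times \mathbb{N}_k| = |\mathbb{N}_k^{\mathbb{N}_n}| \cdot |\mathbb{N}_k| = k^n \cdot k = k^{n+1}$$
by induction hypothesis and by 2.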

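4. Since $|X| = |\mathbb{N}_n|$ implies $|X!| = |\mathbb{N}_n!|$ (see Problem 5.1), it suffices to show
$|\mathbb{N}_n!| = n!$ for all $n \in \mathbb{N}$. We prove this by induction on $n \in \mathbb{N}$.
Induction base: $n = 0$. Then $|\mathbb{N}_0!| = |\emptyset!| = 1 = 0!$, since there is exactly one
bijective function $f : \emptyset \to \emptyset$.
Induction step: $n \to n + 1$. We assume that we have a fixed $n \in \mathbb{N}$ with
$|\mathbb{N}_n!| = n!$. We consider the map
$$F : \mathbb{N}_{n+1}! \to \mathbb{N}_n! \times \mathbb{N}_{n+1}, \quad f \mapsto (f', f(n + 1)),$$
where $f' : \mathbb{N}_n \to \mathbb{N}_n$ is the function defined by
$$f'(k) := \begin{cases} f(k) & \text{if } f(k) \in \mathbb{N}_n \\ f(n + 1) & \text{if } f(k) = n + 1 \end{cases}$$
for all $k \in \mathbb{N}_n$ and $f : \mathbb{N}_{n+1} \to \mathbb{N}_{n+1}$. Since $f'$ is bijective for any bijective $f$, it
is clear that $F$ is well-defined. Moreover, since $F$ is bijective, we obtain
$$|\mathbb{N}_{n+1}!| = |\mathbb{N}_n! \times \mathbb{N}_{n+1}| = |\mathbb{N}_n!| \cdot |\mathbb{N}_{n+1}| = n! \cdot (n + 1) = (n + 1)!$$
by induction hypothesis and by 2.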
5. By Theorem 4.45 we have $|2^X| = |\{0, 1\}^X|$. Hence, we can conclude with 3.
$$|2^X| = |\{0, 1\}^X| = |\{0, 1\}|^{|X|} = 2^{|X|}.$$
□
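The five formulas of Theorem 5.39 can be spot-checked for concrete finite sets by enumerating the constructed sets. The following is a minimal sketch; the function name `check_counts` is hypothetical, functions $X \to Y$ are assumed to be encoded as tuples of values, and the disjoint union is assumed to be tagged with $1$ and $2$ as in the proof above.

```python
from itertools import product, permutations, combinations
from math import factorial

def check_counts(X, Y):
    """Enumerate the constructions of Theorem 5.39 and compare with the formulas."""
    X, Y = set(X), set(Y)
    n, k = len(X), len(Y)
    # 1. disjoint union, tagged as ({1} x X) united with ({2} x Y)
    disjoint_union = {(1, x) for x in X} | {(2, y) for y in Y}
    assert len(X | Y) + len(X & Y) == len(disjoint_union) == n + k
    # 2. product
    assert len(set(product(X, Y))) == n * k
    # 3. functions X -> Y, one tuple of values per function
    assert len(list(product(Y, repeat=n))) == k ** n
    # 4. bijections X -> X, one permutation per bijection
    assert len(list(permutations(X))) == factorial(n)
    # 5. power set, listed by subset size
    subsets = [c for r in range(n + 1) for c in combinations(X, r)]
    assert len(subsets) == 2 ** n
    return True

check_counts({1, 2, 3}, {3, 4})
check_counts(set(), {1})
```

The second call covers the convention $k^0 = 1$: there is exactly one function from the empty set into any set.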
This result explains one motivation for the notations for products, the power set,
the set of functions and the set of bijective functions. These notations are inspired by
the corresponding operations on natural numbers that describe the cardinalities
of these sets in case of finite sets. For infinite sets, these notations still make some
sense in cardinal arithmetic, but we are not going to discuss this here. Countable
sets are less well-behaved with respect to set constructions. We give some examples
of operations that do not preserve countability.
Example 5.40 By definition $\mathbb{N}$ is countable.
1. $2^{\mathbb{N}}$ and $\mathbb{N}^{\mathbb{N}}$ are uncountable, hence power sets $2^X$ and function sets $Y^X$ are not
countable in general for countable $X, Y$.
2. $\prod_{i \in \mathbb{N}} \mathbb{N} = \mathbb{N}^{\mathbb{N}}$ is uncountable, hence products $\prod_{i \in I} X_i$ are not countable in
general for countable $I$ and $X_i$.
3. $\mathbb{N}!$ is uncountable (see Problem 5.23), hence $X!$ is not countable in general for
countable $X$.
However, we can say at least something positive on set operations that preserve
countability.
Proposition 5.41 Let $X$ and $Y$ be countable and let $(X_i)_{i \in I}$ be a countable family
of countable sets, i.e. $I$ is countable and $X_i$ is countable for all $i \in I$. Then we
obtain:
1. $X \cup Y$ and $X \cap Y$ are countable,
2. $\bigcup_{i \in I} X_i$ and $\bigcap_{i \in I} X_i$ are countable,
3. $X \times Y$ and $X^n$ are countable for each $n \in \mathbb{N}$.


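Proof. That $X$, $Y$ and $X_i$ are countable for all $i \in I$ means $|X| \leq |\mathbb{N}|$, $|Y| \leq |\mathbb{N}|$ and
$|X_i| \leq |\mathbb{N}|$ for all $i \in I$. Hence there are injective functions $f : X \to \mathbb{N}$, $g : Y \to \mathbb{N}$
and $f_i : X_i \to \mathbb{N}$ for all $i \in I$.
1. The function
$$F : X \cup Y \to \mathbb{N}, \quad z \mapsto \begin{cases} 2f(z) & \text{if } z \in X \\ 2g(z) + 1 & \text{if } z \in Y \setminus X \end{cases}$$
is injective. Hence $|X \cup Y| \leq |\mathbb{N}|$ and $X \cup Y$ is countable. Moreover, $X \cap Y \subseteq X \cup Y$
and hence $|X \cap Y| \leq |X \cup Y| \leq |\mathbb{N}|$ and $X \cap Y$ is countable too.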

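2. It is sufficient to prove the statement for $I = \mathbb{N}_n$ with $n \in \mathbb{N}$ and for $I = \mathbb{N}$.
The case of a general $I$ can be reduced to either of these cases. The case
$I = \mathbb{N}_n$ with $n \in \mathbb{N}$ follows from 1. by induction over $n \in \mathbb{N}$. We only discuss
the case $I = \mathbb{N}$ here. We consider the function
$$m : \bigcup_{i \in \mathbb{N}} X_i \to \mathbb{N}, \quad x \mapsto \min\{i \in \mathbb{N} : x \in X_i\}$$
that determines for each $x \in \bigcup_{i \in \mathbb{N}} X_i$ the smallest index $i = m(x) \in \mathbb{N}$ such
that $x \in X_i$. Using this function, we define
$$F : \bigcup_{i \in \mathbb{N}} X_i \to \mathbb{N} \times \mathbb{N}, \quad x \mapsto (f_{m(x)}(x), m(x)).$$
This function $F$ is injective and hence $|\bigcup_{i \in \mathbb{N}} X_i| \leq |\mathbb{N} \times \mathbb{N}| = |\mathbb{N}|$ by Corollary 5.17.
This means that $\bigcup_{i \in \mathbb{N}} X_i$ is countable. Moreover, $\bigcap_{i \in \mathbb{N}} X_i \subseteq \bigcup_{i \in \mathbb{N}} X_i$
and hence $|\bigcap_{i \in \mathbb{N}} X_i| \leq |\bigcup_{i \in \mathbb{N}} X_i| \leq |\mathbb{N}|$ and this means that $\bigcap_{i \in \mathbb{N}} X_i$
is countable as well.
3. The function
$$f \times g : X \times Y \to \mathbb{N} \times \mathbb{N}, \quad (x, y) \mapsto (f(x), g(y))$$
is injective (see Problem 4.8). Hence $|X \times Y| \leq |\mathbb{N} \times \mathbb{N}| = |\mathbb{N}|$ by Corollary 5.17.
This means that $X \times Y$ is countable. It follows that $X^n$ is countable for all
$n \in \mathbb{N}$ by an easy induction on $n$.
□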
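The injection $F$ used for the union in part 1 of this proof interleaves the codes of $X$ on the even numbers and the codes of $Y \setminus X$ on the odd numbers. The sketch below illustrates this on assumed sample data ($X$ the even naturals, $Y$ the multiples of $3$); the helper `make_union_injection` is a hypothetical name, and only a finite window is inspected.

```python
def make_union_injection(f, g, in_X):
    """Return F with F(z) = 2 f(z) for z in X and F(z) = 2 g(z) + 1 for z in Y \\ X."""
    def F(z):
        return 2 * f(z) if in_X(z) else 2 * g(z) + 1
    return F

# Assumed sample data: X = even naturals, Y = multiples of 3.
f = lambda x: x // 2          # injective on X
g = lambda y: y // 3          # injective on Y
F = make_union_injection(f, g, in_X=lambda z: z % 2 == 0)

union_window = [z for z in range(200) if z % 2 == 0 or z % 3 == 0]
codes = [F(z) for z in union_window]
assert len(set(codes)) == len(codes)   # F is injective on this finite window
```

Elements of $X \cap Y$ are coded via $f$ only, which is why the second case is restricted to $Y \setminus X$.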
Other operations that are well-behaved with respect to countability are the disjoint union (see Problem 5.24) and the Kleene star operation (see Problem 5.25).

Problems
5.22 Let $X$ and $Y$ be finite sets. Prove that
1. $|X \setminus Y| = |X| - |X \cap Y|$,
2. $|X \setminus Y| = |X| - |Y|$ if $Y \subseteq X$,
3. $|X \Delta Y| = |X \cup Y| - |X \cap Y|$.
5.23 Prove that the set $\mathbb{N}!$ of bijections $f : \mathbb{N} \to \mathbb{N}$ is not countable.
5.24 Let $(X_i)_{i \in \mathbb{N}}$ be a sequence of countable sets, i.e. $X_i$ is countable for each $i \in \mathbb{N}$.
Prove that the disjoint union $\bigsqcup_{i \in \mathbb{N}} X_i$ is countable.
5.25 Let $X$ be a set. We consider the Kleene operation $X^*$ (i.e. the set of all finite words
over $X$). Prove the following:
1. $\{0, 1\}^*$ is countably infinite,
2. $X^*$ is countable for each countable $X$.
5.26 For any set $X$ we define the set
$$\mathcal{F}(X) := \{A \subseteq X : A \text{ finite}\}$$
of finite subsets of $X$. Prove that
1. $X$ finite $\implies$ $\mathcal{F}(X)$ finite,
2. $X$ countable $\implies$ $\mathcal{F}(X)$ countable.
Prove that the following are equivalent:
1. $|X| = |\mathcal{F}(X)|$ for any infinite set $X$,
2. the Axiom of Choice.
5.27 For any two sets $X$ and $Y$ we define the set
$$\binom{X}{Y} := \{A \subseteq X : |A| = |Y|\}$$
of subsets of $X$ with exactly the same cardinality as $Y$. Prove that for finite sets $X$ and $Y$
we obtain
$$\left|\binom{X}{Y}\right| = \binom{|X|}{|Y|},$$
where the right hand side uses the binomial coefficient as defined in Problem 5.15.
5.28 Let $X$ and $Y$ be finite sets with $k = |X|$ and $n = |Y|$. Prove that there are
$\frac{n!}{(n-k)!}$ many injective functions $f : X \to Y$.

5.29 Let $X$ be a set. Prove that the following are equivalent, without using the Axiom of
Choice:
1. $X$ is infinite,
2. $2^{2^X}$ is Dedekind infinite.
Hint: Consider the function $f : \mathbb{N} \to 2^{2^X}, n \mapsto \{A \subseteq X : |A| = n\}$.


5.30 Prove that the following are equivalent:
1. For all sets X, Y we obtain
|X| < |X Y | and |Y | < |X Y | = X Y finite,
2. the Axiom of Choice.


6 Order

The mathematical sciences particularly exhibit order, symmetry, and limitation;
and these are the greatest forms of the beautiful.
Aristotle (384–322 BC)

6.1 What is Order?

So far we have studied mainly uniqueness and totality properties of relations and
their consequences. The concepts of left and right totality and uniqueness are the
building blocks that we have used to define functions, injections, surjections and
bijections. These building blocks have also led to the concept of cardinality. Now
we want to focus on homogeneous relations that extend the identity relation. Such
relations can be used to order mathematical objects and to identify them according
to specific properties. Often there are different properties that we can use to identify
or order objects. For instance, the relation $\leq$ on natural numbers orders the natural
numbers according to their appearance in the sequence 0, 1, 2, 3, ... (one could also
say that this is an additive property), whereas the divisibility relation $|$ orders natural
numbers according to their multiplicative properties. These two ways of ordering
natural numbers focus on different properties and they are not identical. However,
they share certain properties as we will see. As another example, the relation $\subseteq$
orders sets according to the containment of elements and the relation $|X| \leq |Y|$
orders sets according to their cardinality. As we have seen, these different types of
ordering sets are not identical, but again they share certain properties. In a first
step we will identify the relevant properties that certain types of order relations have
in common.


6.2 Reflexivity, Symmetry and Transitivity

We recall that we call a relation $R \subseteq X \times X$ a relation on $X$. Such relations are also
called homogeneous, since source and target set are identical. Similarly, as totality
and uniqueness are the building blocks of our study of functions, the concepts of
reflexivity, symmetry and transitivity and variants thereof are the building blocks
of our study of order relations.
Definition 6.1 (Reflexivity, symmetry and transitivity) Let $R \subseteq X \times X$ be
a relation. Then
1. $R$ is called reflexive, if $xRx$ holds for all $x \in X$.
2. $R$ is called irreflexive, if $xRx$ holds for no $x \in X$.
3. $R$ is called symmetric, if $xRy$ implies $yRx$ for all $x, y \in X$.
4. $R$ is called antisymmetric, if $xRy$ and $yRx$ implies $x = y$ for all $x, y \in X$.
5. $R$ is called transitive, if $xRy$ and $yRz$ implies $xRz$ for all $x, y, z \in X$.
6. $R$ is called total, if $xRy$ or $yRx$ holds for all $x, y \in X$.
The reader should be warned that the concept of totality mentioned here is not
the same that we studied before. A relation that is total in the sense defined here is
left total and right total, but not any left and right total relation is necessarily total
in the sense specified here. We have already seen a number of relations with some
of the properties listed in the previous definition.
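Example 6.2 In the following table we list the empty relation $\emptyset \subseteq \mathbb{N} \times \mathbb{N}$, the all
relation $\mathbb{N} \times \mathbb{N}$, the relations $=$, $\neq$, $\leq$, $<$ and divisibility $|$ all considered on $\mathbb{N}$ and
the set relations $\subseteq$, $\subsetneq$ and $\not\subseteq$ all on $2^{\mathbb{N}}$.

$$\begin{array}{l|cccccccccc}
 & \emptyset & \mathbb{N} \times \mathbb{N} & = & \neq & \leq & < & \mid & \subseteq & \subsetneq & \not\subseteq \\ \hline
\text{reflexive} & - & + & + & - & + & - & + & + & - & - \\
\text{irreflexive} & + & - & - & + & - & + & - & - & + & + \\
\text{symmetric} & + & + & + & + & - & - & - & - & - & - \\
\text{antisymmetric} & + & - & + & - & + & + & + & + & + & - \\
\text{transitive} & + & + & + & - & + & + & + & + & + & - \\
\text{total} & - & + & - & - & + & - & - & - & - & -
\end{array}$$

A plus $+$ in the table means that the relation has the corresponding property, a
minus $-$ means that it does not have the property.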
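The six properties of Definition 6.1 are easy to test mechanically on a finite carrier set. The sketch below is an illustration only: the helper `properties` and the choice of the window $\{0, ..., 11\}$ are assumptions. Since all six properties are universal statements, a property that holds on $\mathbb{N}$ also holds on the window, while a failure found on the window is a genuine counterexample on $\mathbb{N}$.

```python
def properties(carrier, related):
    """Check the six properties of Definition 6.1 for x R y :<=> related(x, y)."""
    X = list(carrier)
    return {
        "reflexive": all(related(x, x) for x in X),
        "irreflexive": not any(related(x, x) for x in X),
        "symmetric": all(related(y, x) for x in X for y in X if related(x, y)),
        "antisymmetric": all(x == y for x in X for y in X
                             if related(x, y) and related(y, x)),
        "transitive": all(related(x, z) for x in X for y in X for z in X
                          if related(x, y) and related(y, z)),
        "total": all(related(x, y) or related(y, x) for x in X for y in X),
    }

def divides(x, y):
    """x | y on N: there is some k with y = x * k (so 0 | 0 holds)."""
    return y == 0 if x == 0 else y % x == 0

window = range(12)
leq = properties(window, lambda x, y: x <= y)
div = properties(window, divides)
neq = properties(window, lambda x, y: x != y)
```

For instance, `div["total"]` is false because neither $2 \mid 3$ nor $3 \mid 2$ holds.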
Symmetry is quite a restrictive property for relations. The equality relation,
the empty relation and the all relation are all unique symmetric relations in some
sense. For instance, equality is the only relation on a given set that is reflexive,
symmetric and antisymmetric (see Problem 6.2). It is interesting to point out that
the properties of homogeneous relations defined here can be characterized in terms
of composition and inversion and without mentioning points. We formulate a corresponding result.

[Diagram: homogeneous relation, if irreflexive: irreflexive relation, if moreover transitive: strict order. Homogeneous relation, if reflexive: reflexive relation; if moreover symmetric: compatibility relation; if moreover transitive: preorder. Compatibility relation plus transitive, or preorder plus symmetric: equivalence relation. Preorder plus antisymmetric: partial order. Equivalence relation plus antisymmetric, or partial order plus symmetric: equality relation. Partial order plus total: linear order.]

Figure 6.1: Some common types of homogeneous relations $R \subseteq X \times X$.

Proposition 6.3 Let $R \subseteq X \times X$ be a relation. Then the following hold:
1. $R$ is reflexive if and only if $\Delta_X \subseteq R$.
2. $R$ is irreflexive if and only if $R \cap \Delta_X = \emptyset$.
3. $R$ is symmetric if and only if $R = R^{-1}$.
4. $R$ is antisymmetric if and only if $R \cap R^{-1} \subseteq \Delta_X$.
5. $R$ is transitive if and only if $R \circ R \subseteq R$.
6. $R$ is total if and only if $X \times X \subseteq R \cup R^{-1}$.
We leave the proof to the reader (see Problem 6.1). The diagram in Figure 6.1
shows how the building blocks of reflexivity, symmetry and transitivity can be used
to define certain common types of order relations. Those types of relations that are
highlighted in bold face are the most common ones in mathematics. We will, in
particular, focus on the right hand side of the diagram and study those relations
that contain the equality relation (i.e. the reflexive relations).
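The point-free characterizations of Proposition 6.3 can be compared with the pointwise definitions of Definition 6.1 on randomly generated relations. This is only a sketch and not a proof: relations are assumed to be represented as Python sets of pairs over $X = \{0, 1, 2, 3\}$, and all function names are hypothetical.

```python
from itertools import product
import random

def diagonal(X):
    return {(x, x) for x in X}

def inverse(R):
    return {(y, x) for (x, y) in R}

def compose(S, R):
    """All pairs (x, z) such that (x, y) is in R and (y, z) is in S for some y."""
    return {(x, z) for (x, y) in R for (y2, z) in S if y == y2}

def algebraic(X, R):
    """The six conditions of Proposition 6.3, in the order 1. to 6."""
    D, Rinv = diagonal(X), inverse(R)
    return (D <= R, not (R & D), R == Rinv, (R & Rinv) <= D,
            compose(R, R) <= R, set(product(X, X)) <= (R | Rinv))

def pointwise(X, R):
    """The six properties of Definition 6.1, in the same order."""
    def r(x, y):
        return (x, y) in R
    return (all(r(x, x) for x in X),
            not any(r(x, x) for x in X),
            all(r(y, x) for (x, y) in R),
            all(x == y for (x, y) in R if r(y, x)),
            all(r(x, z) for (x, y) in R for (y2, z) in R if y == y2),
            all(r(x, y) or r(y, x) for x in X for y in X))

def agree_on_random_relations(trials=200, seed=0):
    rng = random.Random(seed)
    X = range(4)
    pairs = list(product(X, X))
    for _ in range(trials):
        R = {p for p in pairs if rng.random() < 0.5}
        if algebraic(X, R) != pointwise(X, R):
            return False
    return True

assert agree_on_random_relations()
```

Agreement on random samples is of course no substitute for the proof asked for in Problem 6.1.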


Problems
6.1 Let $R \subseteq X \times X$ be a relation. Prove all the statements in Proposition 6.3.
6.2 Let $X$ be a set. Prove the following:
1. The equality relation $\Delta_X \subseteq X \times X$ is the only relation on $X$ that is reflexive, symmetric
and antisymmetric.
2. The empty relation $\emptyset \subseteq X \times X$ is the only relation on $X$ that is irreflexive, symmetric
and antisymmetric.
3. The all relation $X \times X$ is the only relation on $X$ that is symmetric and total.
6.3 Let $R \subseteq X \times X$ be a relation.
1. We define the reflexive closure of $R$ by $R^= := \Delta_X \cup R$. Prove the following:
a) $R^=$ is a reflexive relation.
b) If $S \subseteq X \times X$ is reflexive and $R \subseteq S$, then $R^= \subseteq S$.
c) $R^= = \bigcap \{S \subseteq X \times X : S \text{ reflexive and } R \subseteq S\}$.
d) If $R$ is reflexive, then $R = R^=$.
2. We define the reflexive and symmetric closure of R by R := X R R1 . Prove the
following:
a) R is a reflexive and symmetric relation.
b) If S X X is reflexive and symmetric and R S, then R S.
T
c) R = {S X X : S reflexive and symmetric and R S}.
d) If R is reflexive and symmetric, then R = R.
3. We define the transitive closure of $R$ by $R^+ := \bigcup_{n=1}^{\infty} R^n$. Here $R^n$ stands for the
$n$-fold composition of $R$ with itself. Prove the following:
a) $R^+$ is a transitive relation.
b) If $S \subseteq X \times X$ is transitive and $R \subseteq S$, then $R^+ \subseteq S$.
c) $R^+ = \bigcap \{S \subseteq X \times X : S \text{ transitive and } R \subseteq S\}$.
d) If $R$ is transitive, then $R = R^+$.
6.4 Let $X$ be a finite set with $n = |X|$ elements. Prove the following:
1. There are exactly $2^{n^2}$ relations $R \subseteq X \times X$.
2. There are exactly $2^{n^2 - n}$ reflexive relations $R \subseteq X \times X$.


6.3 Equivalence Relations

Often one needs to identify some mathematical objects that share certain properties.
Equivalence relations are a tool to express such identifications.
Definition 6.4 (Equivalence relation) Let $\sim$ be a relation on $X$. Then $\sim$ is
called an equivalence relation on $X$, if $\sim$ is reflexive, symmetric and transitive.
Perhaps the most basic example of an equivalence relation is the equality relation
$=$ on an arbitrary set $X$. Obviously, we use the equality to identify objects. However,
in general we can also identify objects which are not equal.
Example 6.5 We mention a few examples of equivalence relations.
1. Let $X$ be a set with the diagonal $\Delta_X \subseteq X \times X$. The diagonal can be seen as
equality relation on $X$ and this relation is an equivalence relation.
2. Let $X$ be a set with the all relation $X \times X$. This relation is also an equivalence
relation on $X$.
3. Let $S$ be a set of sets. We define the equinumerosity relation $\sim\ \subseteq S \times S$ by
$$X \sim Y :\iff |X| = |Y|$$
for all $X, Y \in S$. By Corollary 5.9 this relation is an equivalence relation.
4. Let $n \in \mathbb{N}$ be fixed. For integers $x, y \in \mathbb{Z}$ we define the relation
$$x \equiv_n y :\iff n \text{ divides } |x - y|.$$
In this case $x$ is called congruent to $y$ modulo $n$. The relation $\equiv_n$ is an
equivalence relation (see Problem 6.9).
5. Let $f : X \to Y$ be a function. Then we define the fiber relation $\sim_f\ \subseteq X \times X$
of $f$ by
$$x \sim_f y :\iff f(x) = f(y)$$
for all $x, y \in X$. This relation is an equivalence relation. The fiber relation
$\sim_f$ is also called the equivalence kernel of $f$.
We will study the fiber relation $\sim_f$ of functions somewhat more below, since it
is a particularly important equivalence relation. Since the purpose of equivalence
relations is to identify objects, we need a tool to combine those objects that we
identify. By $[x]$ we denote the equivalence class of $x$, which is the set of all those
objects that are identified with $x$ with respect to some given equivalence relation.


Definition 6.6 (Equivalence classes) Let $\sim$ be an equivalence relation on $X$.
Then we define the equivalence class of $x \in X$ by
$$[x] := \{y \in X : y \sim x\}.$$
The set
$$X/{\sim} := \{[x] : x \in X\}$$
of all equivalence classes is called the quotient of $X$ by $\sim$.
Any equivalence class $[x]$ can be seen as a cluster of objects in $X$, namely the
cluster of those objects that we identify. The quotient $X/{\sim}$ is then the coarsening
of $X$ that we obtain if we replace points by clusters. In fact, the quotient $X/{\sim}$
yields a partition of $X$ and vice versa any partition of $X$ defines an equivalence relation
whose quotient is the original partition (see Problem 6.8 for details). We give an
example.
Example 6.7 Let $X = \{1, 2, 3\}$ and let $\sim\ := \{(1, 1), (1, 2), (2, 1), (2, 2), (3, 3)\} \subseteq X \times X$.
Then $\sim$ is an equivalence relation and $X/{\sim} = \{[1], [3]\}$, where $[1] = [2]$.

Figure 6.2: Quotient $X/{\sim}$.
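The quotient of Example 6.7 can be computed directly from Definition 6.6. In the following minimal sketch the relation is assumed to be given as a set of pairs, and the function names are illustrative only.

```python
def equivalence_class(X, R, x):
    """[x] = {y in X : y ~ x} for a relation R given as a set of pairs."""
    return frozenset(y for y in X if (y, x) in R)

def quotient(X, R):
    """The quotient X/~, i.e. the set of all equivalence classes [x] with x in X."""
    return {equivalence_class(X, R, x) for x in X}

X = {1, 2, 3}
R = {(1, 1), (1, 2), (2, 1), (2, 2), (3, 3)}
Q = quotient(X, R)

assert Q == {frozenset({1, 2}), frozenset({3})}
assert equivalence_class(X, R, 1) == equivalence_class(X, R, 2)
```

Using `frozenset` for the classes makes them hashable, so that the quotient itself can be a set and equal classes such as $[1]$ and $[2]$ are automatically identified.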

The map in the following definition assigns to each point its equivalence class.
Definition 6.8 (Canonical projection) Let $\sim$ be an equivalence relation on a
set $X$. Then
$$p : X \to X/{\sim}, \quad x \mapsto [x]$$
is called the canonical projection of the equivalence relation $\sim$.
It is easy to see that $p$ is surjective and that the fiber relation $\sim_p$ of $p$ is just $\sim$
(see Problem 6.6). In particular, this shows that any equivalence relation $\sim$ on $X$
is the fiber relation of some function $f : X \to Y$ for some suitable $Y$. Now we prove
that the canonical projection and the fiber relation can be used to decompose any
function into an injective, a bijective and a surjective function.
Theorem 6.9 (Canonical decomposition) Let $f : X \to Y$ be a function. Then
$f$ can be decomposed into an injective function $i$, a bijective function $b$ and a surjective function $p$, i.e.
$$f = i \circ b \circ p.$$
This decomposition can be obtained with the fiber relation $\sim_f$ and the following
selection of functions:
1. $p : X \to X/{\sim_f}, x \mapsto [x]$ is the canonical projection of $\sim_f$,
2. $b : X/{\sim_f} \to \mathrm{range}(f), [x] \mapsto f(x)$,
3. $i : \mathrm{range}(f) \to Y, x \mapsto x$ is the restriction of the identity to $\mathrm{range}(f)$.
Proof. It is clear that $p$ is surjective (see Problem 6.6) and that $i$ is injective (since
it is a restriction of the injective identity). We need to prove that $b$ is well-defined
and bijective. We obtain for all $x, y \in X$
$$[x] = [y] \iff x \sim_f y \iff f(x) = f(y) \iff b([x]) = b([y]).$$
The forwards direction "$\Longrightarrow$" shows that $b$ is well-defined (i.e. right unique) and the
backwards direction "$\Longleftarrow$" shows that $b$ is injective (i.e. left unique). Moreover, $b$
is also surjective since for any $y \in \mathrm{range}(f)$ there is some $x \in X$ with $f(x) = y$ and
hence $b([x]) = f(x) = y$. Now we still need to prove $f = i \circ b \circ p$. We obtain
$$i \circ b \circ p(x) = i \circ b([x]) = i(f(x)) = f(x)$$
for all $x \in X$ and hence $f = i \circ b \circ p$.
□

The composition $f = i \circ b \circ p$ is called the canonical decomposition of $f$. The
commutative diagram in Figure 6.3 illustrates the situation for a general function
$f : X \to Y$ whereas the set diagram next to it illustrates the special case of a
particular function $f : \{1, 2, 3\} \to \{1, 2, 3\}$.

[Diagram: top arrow $f : X \to Y$; left arrow $p : X \to X/{\sim_f}$; bottom arrow $b : X/{\sim_f} \to \mathrm{range}(f)$; right arrow $i : \mathrm{range}(f) \to Y$.]

Figure 6.3: Canonical decomposition $f = i \circ b \circ p$
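For a function on a finite set the three maps of the canonical decomposition can be written down explicitly. The following is a sketch under assumptions: equivalence classes are represented as `frozenset` objects, the sample function $f : \{1, 2, 3\} \to \{1, 2, 3\}$ is chosen for illustration and need not be the one shown in the figure, and the name `canonical_decomposition` is hypothetical.

```python
def canonical_decomposition(f, X):
    """Return (p, b, i) such that f(x) == i(b(p(x))) for all x in X."""
    def p(x):
        # canonical projection: x |-> [x], the set of all y with f(y) = f(x)
        return frozenset(y for y in X if f(y) == f(x))
    def b(cls):
        # [x] |-> f(x); well-defined because f is constant on each class
        return f(next(iter(cls)))
    def i(y):
        # inclusion of range(f) into the target set
        return y
    return p, b, i

X = [1, 2, 3]
f = {1: 1, 2: 1, 3: 2}.get        # assumed sample function on {1, 2, 3}
p, b, i = canonical_decomposition(f, X)

classes = {p(x) for x in X}
assert all(i(b(p(x))) == f(x) for x in X)          # f = i o b o p
assert classes == {frozenset({1, 2}), frozenset({3})}
assert {b(c) for c in classes} == {1, 2}           # b maps onto range(f)
```

Here $p$ identifies $1$ and $2$, the map $b$ sends the two classes bijectively onto $\mathrm{range}(f) = \{1, 2\}$, and $i$ embeds the range into $\{1, 2, 3\}$.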


Problems
6.5 Let $\sim\ \subseteq X \times X$ be an equivalence relation. Then the following three statements are
equivalent to each other for all $x, y \in X$:
1. $x \sim y$,
2. $[x] = [y]$,
3. $[x] \cap [y] \neq \emptyset$.
6.6 Let $\sim$ be an equivalence relation on $X$ with canonical projection $p : X \to X/{\sim}$. Prove
the following:
1. $p$ is surjective,
2. $\sim$ is the same relation as $\sim_p$.
Here $\sim_p$ denotes the fiber relation of $p$.
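6.7 Let $X$ be a set and $R \subseteq X \times X$ a relation on $X$. We define the equivalence closure of
$R$ by $R^{\equiv} := (\widehat{R})^{+}$ (see Problem 6.3). Prove the following:
1. $R^{\equiv}$ is an equivalence relation.
2. If $S \subseteq X \times X$ is an equivalence relation and $R \subseteq S$, then $R^{\equiv} \subseteq S$.
3. $R^{\equiv} = \bigcap \{S \subseteq X \times X : S \text{ is an equivalence relation and } R \subseteq S\}$.
4. If $R$ is an equivalence relation, then $R = R^{\equiv}$.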
6.8 Let $X$ be a set. Then $P \subseteq 2^X$ is called a partition of $X$, if the following hold:
1. $\emptyset \notin P$ (non-emptiness)
2. $\bigcup P = X$ (cover)
3. $A \neq B \Longrightarrow A \cap B = \emptyset$ for all $A, B \in P$ (disjointness)
Each element $C \in P$ is called a cell of the partition $P$. Prove the following:
1. If $P$ is a partition of $X$, then $\bigcup_{C \in P} C \times C$ is an equivalence relation $\equiv_P$ on $X$.
Moreover, $X/{\equiv_P}$ is identical to $P$.
2. If $\equiv$ is an equivalence relation on $X$, then $X/{\equiv}$ is a partition $P$ of $X$ such that $\equiv_P$
is identical with $\equiv$.
6.9 Let $n \in \mathbb{N}$ be fixed. For integers $x, y \in \mathbb{Z}$ we define the relation
$x \equiv_n y :\iff n \text{ divides } |x - y|$.
In this case $x$ is called congruent to $y$ modulo $n$. This is often denoted by $x \equiv y \pmod{n}$
instead of $x \equiv_n y$. Prove the following:
1. $\equiv_n$ is an equivalence relation,
2. $\equiv_0$ is the equality relation on $\mathbb{Z}$,
3. $[x] = \{x + zn : z \in \mathbb{Z}\}$ for all $x \in \mathbb{Z}$,
4. $\mathbb{Z}/{\equiv_n} = \{[0], [1], ..., [n-1]\}$ for $n > 0$,
5. $|\mathbb{Z}/{\equiv_n}| = n$ for $n > 0$ and $|\mathbb{Z}/{\equiv_0}| = |\mathbb{Z}|$.
6.10 Let $\equiv$ be an equivalence relation on a set $X$. Prove that it follows from the Axiom of
Choice that $|X/{\equiv}| \le |X|$.
6.11 Let $X$ be a finite set with $n = |X|$ elements. Prove that there are exactly $B_n$ equivalence relations on $X$, where $B_n$ is the $n$th Bell number, defined by $B_0 := 1$ and
$B_{n+1} := \sum_{k=0}^{n} \binom{n}{k} B_k$
for all $n \in \mathbb{N}$.
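
The recursion of Problem 6.11 can be checked numerically for small $n$. The sketch below is not a proof; it assumes the correspondence between equivalence relations and partitions from Problem 6.8 and simply counts partitions by brute force. The function names `bell_numbers` and `partitions` are chosen here for illustration.

```python
from math import comb


def bell_numbers(n_max):
    """Return [B_0, ..., B_n_max] using B_{n+1} = sum_k C(n, k) * B_k."""
    bell = [1]
    for n in range(n_max):
        bell.append(sum(comb(n, k) * bell[k] for k in range(n + 1)))
    return bell


def partitions(elements):
    """Yield every partition of the list `elements` as a list of cells."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for smaller in partitions(rest):
        # Put `first` into each existing cell in turn ...
        for k in range(len(smaller)):
            yield smaller[:k] + [[first] + smaller[k]] + smaller[k + 1:]
        # ... or into a new cell of its own.
        yield [[first]] + smaller


bell = bell_numbers(6)
counts = [sum(1 for _ in partitions(list(range(n)))) for n in range(7)]
```

For $n = 0, ..., 6$ both lists agree, which is consistent with the claim of the problem.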

6.4 Preorders, Partial Orders and Linear Orders

The minimal requirements that an order relation should satisfy are reflexivity and
transitivity. The properties of antisymmetry and totality are additional properties
that are considered.
Definition 6.10 (Order) Let $\le$ be a relation on $X$.
1. $\le$ is called a preorder, if it is reflexive and transitive.
2. $\le$ is called a partial order if it is reflexive, antisymmetric and transitive.
3. $\le$ is called a linear order if $\le$ is reflexive, antisymmetric, transitive and total.
The pair $(X, \le)$ is called a preordered set, a partially ordered set or a linearly ordered
set if $\le$ is a preorder, a partial order or a linear order on $X$, respectively.
Preorders are sometimes also called quasiorders and linear orders are also called
total orders.
Example 6.11 We provide a number of examples of order relations.
1. The less or equal relation $\le$ on $\mathbb{N}$ is a linear order.
2. The divisibility relation $\mid$ on $\mathbb{N}$ is a partial order, but not a linear order.
3. The divisibility relation $\mid$ on $\mathbb{Z}$, defined by
$x \mid y :\iff (\exists z \in \mathbb{Z})\ x \cdot z = y$
for all $x, y \in \mathbb{Z}$, is a preorder, but not a partial order.
4. The prefix relation $\sqsubseteq$ on $X^*$ for some set $X$, defined by
$(n, u_1, ..., u_n) \sqsubseteq (k, v_1, ..., v_k) :\iff (n \le k \text{ and } (\forall i \le n)\ u_i = v_i)$
for all $n, k \in \mathbb{N}$ and $u_1, ..., u_n, v_1, ..., v_k \in X$, is a partial order, but not a linear
order in general.
5. The subset relation $\subseteq$ on the power set $2^X$ of some set $X$ is a partial order,
but not a linear order in general (see Problem 6.12).
6. The relation $\preceq$ on a set of sets $S$, defined by
$X \preceq Y :\iff |X| \le |Y|$
for all $X, Y \in S$ is a preorder, but not a partial order in general. It follows
from the Axiom of Choice that it is total (see Theorem 5.10).
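
The properties in Definition 6.10 can be tested mechanically on finite fragments of the sets in the first three examples. This is only a sketch under the assumption that a finite fragment is representative: a counterexample found in a fragment refutes the property on the whole set, whereas a successful check on a fragment does not prove it. The helper names are chosen here for illustration.

```python
from itertools import product


def is_reflexive(rel, X):
    return all(rel(x, x) for x in X)


def is_transitive(rel, X):
    return all(rel(x, z) for x, y, z in product(X, repeat=3)
               if rel(x, y) and rel(y, z))


def is_antisymmetric(rel, X):
    return all(x == y for x, y in product(X, repeat=2)
               if rel(x, y) and rel(y, x))


def is_total(rel, X):
    return all(rel(x, y) or rel(y, x) for x, y in product(X, repeat=2))


def classify(rel, X):
    """Return the strongest notion of Definition 6.10 that rel satisfies on X."""
    if not (is_reflexive(rel, X) and is_transitive(rel, X)):
        return "no order"
    if not is_antisymmetric(rel, X):
        return "preorder"
    return "linear order" if is_total(rel, X) else "partial order"


def divides(x, y):
    # x | y iff there is an integer z with x * z = y; note that 0 | 0.
    return y == 0 if x == 0 else y % x == 0


naturals = range(0, 13)   # finite fragment of N
integers = range(-6, 7)   # finite fragment of Z

kinds = (
    classify(lambda x, y: x <= y, naturals),
    classify(divides, naturals),
    classify(divides, integers),
)
```

On these fragments, divisibility fails to be total because neither of 2 and 3 divides the other, and it fails to be antisymmetric on the integers because 2 and -2 divide each other.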

The graphs in Figures 6.4 and 6.8 illustrate the partially ordered sets $(\mathbb{N}, \le)$,
$(\mathbb{N}, \mid)$ and $(2^{\mathbb{N}}, \subseteq)$. Such a graph is called a Hasse diagram. If $x$ is below $y$ and
connected to $y$ by an edge, then this means that $x \le y$. Edges that follow from
transitivity are omitted in these graphs. Linearly ordered sets like $(\mathbb{N}, \le)$ actually
correspond to linear graphs in this way.
Figure 6.4: Hasse diagrams of the partially ordered sets $(\mathbb{N}, \le)$ and $(\mathbb{N}, \mid)$
Whenever we have a preorder, then we automatically get some equivalence relations.
If $R$ is a relation, then $\widehat{R} := R \cup R^{-1}$ is called the symmetric closure of $R$ (see
Problem 6.3). On the other hand, if $R$ is a preorder, then the intersection $R \cap R^{-1}$ is also
an equivalence relation. This is what the following result says in other words.
Proposition 6.12 (Induced equivalence relation) Let $\le$ be a preorder on $X$.
Then we obtain an equivalence relation $\equiv$ on $X$ by
$x \equiv y :\iff (x \le y \text{ and } y \le x)$
for all $x, y \in X$. The equivalence relation $\equiv$ is called the equivalence relation that is
induced by the preorder $\le$.
Proof. We can also express the definition by saying that $\equiv$ is the intersection of
$\le$ and its inverse $\le^{-1}$. This relation is obviously symmetric and it is also reflexive,
since $\le$ is reflexive. We prove that $\equiv$ is transitive. Let $x, y, z \in X$ such that $x \equiv y$
and $y \equiv z$. Then $x \le y$ and $y \le z$ and $z \le y$ and $y \le x$. Since $\le$ is transitive, we
obtain $x \le z$ and $z \le x$, which means $x \equiv z$. □
We discuss a number of induced equivalence relations.
Example 6.13 We give some examples of induced equivalence relations.
1. The equivalence relation induced by any partial order on a set X is the equality
relation on X (see Problem 6.13).
2. The equivalence relation induced by the relation $\preceq$ from Example 6.11 is just the
relation of Example 6.5 (this is the statement of the Theorem of
Schröder-Bernstein 5.8).
3. The equivalence relation induced by the divisibility relation $\mid$ on $\mathbb{Z}$ has the
equivalence classes $[z] = \{z, -z\}$ for all $z \in \mathbb{Z}$.
If we start with a preorder $\le$ on some set $X$ then the induced equivalence relation
$\equiv$ leads to some quotient $X/{\equiv}$ and on this quotient we can now derive a partial
order from $\le$. We denote this derived partial order here with a slightly different
symbol $\leqq$.
Proposition 6.14 (Induced partial order) Let $\le$ be a preorder on $X$ with induced equivalence relation $\equiv$. Then we can define a partial order $\leqq$ on the quotient
$X/{\equiv}$ by
$[x] \leqq [y] :\iff x \le y$
for all $x, y \in X$.
Proof. Firstly, we need to show that $\leqq$ is a well-defined relation on $X/{\equiv}$. Let
$x, x', y, y' \in X$ with $[x] = [x']$ and $[y] = [y']$. Then $x \equiv x'$ and $y \equiv y'$ and we obtain
$(x \le y \Longrightarrow x' \le x \le y \le y')$ and $(x' \le y' \Longrightarrow x \le x' \le y' \le y)$,
which means $x \le y \iff x' \le y'$. Hence, $\leqq$ is well-defined. Now we need to show
that $\leqq$ is a partial order.
Reflexivity: We obtain $[x] \leqq [x]$ for all $x \in X$ since $x \le x$.
Antisymmetry: Let $[x] \leqq [y]$ and $[y] \leqq [x]$ for some $x, y \in X$. Then $x \le y$ and $y \le x$
and hence $x \equiv y$. But this means $[x] = [y]$.
Transitivity: Let $x, y, z \in X$ with $[x] \leqq [y] \leqq [z]$. Then $x \le y \le z$ and hence $x \le z$,
which implies $[x] \leqq [z]$. □
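
The passage from a preorder to the induced partial order can be illustrated with the divisibility preorder on a finite fragment of the integers. The following sketch assumes the fragment $\{-6, ..., 6\}$; the names `equivalent`, `cls` and `leqq` are chosen for illustration only.

```python
def divides(x, y):
    # x | y iff there is an integer z with x * z = y; note that 0 | 0.
    return y == 0 if x == 0 else y % x == 0


X = range(-6, 7)   # finite fragment of Z, preordered by divisibility


def equivalent(x, y):
    # Induced equivalence relation in the sense of Proposition 6.12.
    return divides(x, y) and divides(y, x)


def cls(x):
    # Equivalence class [x], restricted to the fragment X.
    return frozenset(y for y in X if equivalent(x, y))


quotient = {cls(x) for x in X}


def leqq(c, d):
    # Induced order in the sense of Proposition 6.14, evaluated on
    # arbitrary representatives of the two classes.
    return divides(next(iter(c)), next(iter(d)))
```

The classes are $\{0\}$ and $\{z, -z\}$ for $z = 1, ..., 6$. Well-definedness shows up in the fact that `leqq` gives the same answer whichever representatives are picked, and on the quotient the relation is antisymmetric although divisibility on the integers is not.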


This result is the reason why one usually studies partial orders and not preorders.
If a relation $\le$ is just a preorder on a given set $X$, then one can replace it by the partial
order $\leqq$ induced on the quotient set $X/{\equiv}$.

Problems
6.12 Let $X$ be a set and consider the subset relation $\subseteq$ on $2^X$. Prove that $\subseteq$ is linear if and
only if $X$ has fewer than two elements.
6.13 Let $\le$ be a partial order on a set $X$. Prove that the induced equivalence relation $\equiv$ is
the equality on $X$.
6.14 We call a relation $<$ on $X$ a strict order if it is irreflexive and transitive. Let $\le$ be a
preorder on $X$. Prove that by
$x < y :\iff (x \le y \text{ and } y \not\le x)$
we can define a strict order on $X$, which is called the induced strict order of $\le$. Prove that
1. the strict order $<$ induced by the usual less or equal relation $\le$ on $\mathbb{N}$ is the usual
strictly less relation,
2. the strict order $\subsetneq$ induced by the inclusion relation $\subseteq$ on the power set $2^X$ of some
set $X$ is the usual proper inclusion relation,
3. the strict order $\prec$ induced by the cardinality relation $\preceq$ on some set $S$ of sets satisfies
the property $X \prec Y \iff |X| < |Y|$ for all $X, Y \in S$.
6.15 Let X be a finite set with n = |X| elements. Prove that there are exactly n! linear
orders on X.

6.5 Monoids

In the previous section we have seen that equivalence relations and partial orders
are often induced by preorders. But where do preorders come from? In this section
we show that many preorders are induced by monoids. A monoid is an algebraic
structure with one binary operation $\cdot : X \times X \to X$. In general, an algebraic
structure is a set together with operations that typically satisfy some additional
conditions. In the following definition we list some conditions that apply to a single
binary operation.
Definition 6.15 (Binary operations) Let $X$ be a set with a binary operation
$\cdot : X \times X \to X$. We usually write $x \cdot y$ instead of $\cdot(x, y)$ for all $x, y \in X$. Let $e \in X$.
We define the following properties of binary operations:
1. $\cdot$ is called associative if $x \cdot (y \cdot z) = (x \cdot y) \cdot z$ for all $x, y, z \in X$,
2. $\cdot$ is called commutative if $x \cdot y = y \cdot x$ for all $x, y \in X$,
3. $e$ is said to be an identity for $\cdot$ if $x \cdot e = x = e \cdot x$ for all $x \in X$,
4. if $e$ is an identity for $\cdot$, then $y \in X$ is said to be an inverse for $x \in X$ with respect
to $\cdot$, if $x \cdot y = e = y \cdot x$.
Algebraic structures with one binary operation $\cdot : X \times X \to X$ that satisfy
some combinations of these conditions have particular names. We give a survey of
some common structures of this kind in the diagram in Figure 6.5. Here, we are mainly
interested in so-called monoids.
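Figure 6.5: Algebraic structures with one binary operation $\cdot : X \times X \to X$. (Diagram: an associative magma is a semigroup, a semigroup with identity is a monoid, and a monoid with inverses is a group; a magma with inverses is a quasigroup, a quasigroup with identity is a loop, and an associative loop is a group.)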

Definition 6.16 (Monoid) A triple $(M, \cdot, e)$ is called a monoid if $M$ is a set,
$\cdot : M \times M \to M$ is a binary operation on $M$ that is associative and $e \in M$ is an
identity for $\cdot$.
One can prove that the identity of a monoid is uniquely determined (see Problem 6.16). The most important algebraic structures with one binary operation are
groups. These are monoids such that, additionally, each element has an inverse (see
Problem 6.20). We mention a number of monoids in the following example.
Example 6.17 Let X be a set.
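1. $(\mathbb{N}, +, 0)$ is a monoid, where $+ : \mathbb{N} \times \mathbb{N} \to \mathbb{N}$ is the usual addition.
2. $(\mathbb{N}, \cdot, 1)$ is a monoid, where $\cdot : \mathbb{N} \times \mathbb{N} \to \mathbb{N}$ is the usual multiplication.
3. $(\mathbb{Z}, \cdot, 1)$ is a monoid, where $\cdot : \mathbb{Z} \times \mathbb{Z} \to \mathbb{Z}$ is the usual multiplication.
4. $(2^X, \cup, \emptyset)$ is a monoid, where $\cup : 2^X \times 2^X \to 2^X$ is the usual union.
5. $(X^*, \cdot, 0)$ is a monoid, where $\cdot : X^* \times X^* \to X^*$ is the concatenation operation,
defined by
$(n, u_1, ..., u_n) \cdot (k, v_1, ..., v_k) := (n + k, u_1, ..., u_n, v_1, ..., v_k)$
for all $n, k \in \mathbb{N}$ and $u_1, ..., u_n, v_1, ..., v_k \in X$.
6. $(X^X, \circ, \mathrm{id}_X)$ is a monoid, where $\circ : X^X \times X^X \to X^X$ is the composition.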

We leave the proofs of these facts to Problem 6.17. The reason why we discuss
monoids here is that each monoid automatically comes with an induced preorder
that we mention in the following result.
Theorem 6.18 (Preorder of monoids) Let $(M, \cdot, e)$ be a monoid. Then we obtain a preorder $\le$ on $M$ by
$x \le y :\iff (\exists z \in M)\ x \cdot z = y$
for all $x, y \in M$. The preorder $\le$ is called the induced preorder of the monoid
$(M, \cdot, e)$.
Proof. Firstly, $\le$ is reflexive, since $\cdot$ has an identity $e$ and hence $x = x \cdot e$ for all
$x \in M$ and hence $x \le x$. Secondly, let $x \le y$ and $y \le z$ for $x, y, z \in M$. Then there
are $a, b \in M$ such that $y = x \cdot a$ and $z = y \cdot b$. Hence $z = (x \cdot a) \cdot b = x \cdot (a \cdot b)$
because $\cdot$ is associative. But this means $x \le z$. Hence $\le$ is transitive. □
The following example shows that many of the preorders that we have considered
here are actually induced by monoids.
Example 6.19 Let X be a set. We consider some monoids and the induced preorders:
1. $(\mathbb{N}, +, 0)$ induces the usual less or equal relation $\le$ on $\mathbb{N}$,
2. $(\mathbb{N}, \cdot, 1)$ induces the usual divisibility relation $\mid$ on $\mathbb{N}$,
3. $(\mathbb{Z}, \cdot, 1)$ induces the usual divisibility relation $\mid$ on $\mathbb{Z}$,
4. $(X^*, \cdot, 0)$ induces the usual prefix relation $\sqsubseteq$ on $X^*$,
5. $(2^X, \cup, \emptyset)$ induces the usual inclusion relation $\subseteq$ on $2^X$.
We leave the proof to Problem 6.18. The example of the monoid $(\mathbb{Z}, \cdot, 1)$ shows
that the induced preorder of a monoid is not necessarily a partial order. A monoid
is just the right algebraic structure to yield an interesting preorder. If we have
too much algebraic structure, then the induced preorder can become trivial (see
Problem 6.21).
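
The definition of the induced preorder in Theorem 6.18 translates directly into a search for a witness $z$. The sketch below does this on finite fragments, which are chosen so that every witness needed for a pair inside the fragment also lies inside it; the function name `induced_preorder` and the example set `X` are assumptions made for illustration.

```python
from itertools import combinations


def induced_preorder(op, carrier):
    """Return the relation x <= y :<=> (exists z in carrier) op(x, z) == y."""
    carrier = list(carrier)

    def leq(x, y):
        return any(op(x, z) == y for z in carrier)

    return leq


# (N, +, 0) on the fragment {0, ..., 9}: the witness z = y - x lies in it.
N = range(10)
leq_add = induced_preorder(lambda x, z: x + z, N)

# (N, *, 1) on the fragment {0, ..., 12}: the witness z = y / x lies in it.
M = range(13)
leq_mul = induced_preorder(lambda x, z: x * z, M)

# (2^X, union, empty set) for the hypothetical set X = {0, 1, 2}.
X = {0, 1, 2}
subsets = [frozenset(c) for r in range(len(X) + 1)
           for c in combinations(sorted(X), r)]
leq_union = induced_preorder(lambda a, z: a | z, subsets)
```

On these fragments `leq_add` agrees with the usual less or equal relation, `leq_mul` with divisibility and `leq_union` with inclusion, in line with Example 6.19.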

Problems
6.16 Let $(M, \cdot, e)$ be a monoid and let $e' \in M$ be an element with the property that
$x \cdot e' = x = e' \cdot x$ for all $x \in M$. Prove that $e = e'$ follows.
6.17 Let X be a set. Prove the statements in Example 6.17.
6.18 Let X be a set. Prove the statements of Example 6.19.



Figure 6.6: Hasse diagram of the partially ordered set $(\{a, b\}^*, \sqsubseteq)$

6.19 We consider the monoid $(X^X, \circ, \mathrm{id}_X)$ for a set $X$, where $\circ : X^X \times X^X \to X^X$ denotes
the ordinary composition operation. Let $\le$ be the induced preorder and $\equiv$ the induced
equivalence relation on $X^X$.
1. Prove that $f \equiv \mathrm{id}_X$ if and only if $f$ has a right inverse.
2. Prove that $\le$ is not antisymmetric, if $X$ has at least two elements.
3. Prove that $\le$ is not total, if $X$ has at least two elements.
6.20 A monoid $(G, \cdot, e)$ is called a group, if for all $x \in G$ there is a $y \in G$ such that
$x \cdot y = e = y \cdot x$.
In this situation $y$ is called the inverse of $x$.
1. Prove that $(\mathbb{Z}, +, 0)$ is a group with the usual addition $+ : \mathbb{Z} \times \mathbb{Z} \to \mathbb{Z}$.
2. Prove that $(X!, \circ, \mathrm{id}_X)$ is a group for any set $X$, where $\circ : X! \times X! \to X!$ denotes the
composition (recall that $X!$ denotes the set of bijective functions $f : X \to X$).
6.21 Prove that the preorder induced by a group $(G, \cdot, e)$ is the all relation $G \times G$.

6.6 Maximum and Minimum

Some elements in a preordered set are located in a particular position. In order to
make this more precise, we need the concept of an upper and lower bound.
Definition 6.20 (Upper and lower bounds) Let $(X, \le)$ be a preordered set,
$A \subseteq X$ and $b \in X$.
1. $b$ is called an upper bound of $A$, if $x \le b$ for all $x \in A$.
2. $b$ is called a lower bound of $A$, if $b \le x$ for all $x \in A$.


That is, an upper bound of a set $A$ is an element $b \in X$ which is above all elements
of $A$ and a lower bound of $A$ is an element $b \in X$ that is below all elements of $A$.
However, in both cases the bound is not required to be an element of the set $A$ itself.
If an upper bound or a lower bound additionally is a member of $A$, then we call it
a greatest element or a least element, respectively.
Definition 6.21 (Minimum and Maximum) Let $(X, \le)$ be a preordered set and
let $A \subseteq X$ and $m \in X$.
1. Then $m$ is called a greatest element of $A$ or a maximum of $A$ if $m \in A$ and $m$
is an upper bound of $A$.
2. Then $m$ is called a least element of $A$ or a minimum of $A$ if $m \in A$ and $m$ is a
lower bound of $A$.
The least and the greatest element of a subset of a preordered set need not exist
and if one of them exists it is not necessarily uniquely determined. We give some
examples.
Example 6.22 Let X be a set.
1. The partially ordered set $(\mathbb{N}, \le)$ has 0 as least element and it has no greatest
element.
2. The partially ordered set $(\mathbb{N}, \mid)$ has 0 as greatest element and 1 as least element.
3. The preordered set $(\mathbb{Z}, \mid)$ has 0 as greatest element and $1$ and $-1$ as least
elements.
4. In the preordered set $(\mathbb{Z}, \mid)$ the set $A = \{-4, -2, -1, 1, 2, 4\}$ has the greatest
elements $4$ and $-4$ and the least elements $1$ and $-1$.
5. In the partially ordered set $(\mathbb{N}, \mid)$ the set $A = \{1, 2, 3\}$ has the least element 1 and no
greatest element (see the diagram in Figure 6.7).
6. The partially ordered set $(2^X, \subseteq)$ has the empty set $\emptyset$ as least element and $X$
as greatest element.
7. The partially ordered set $(X^*, \sqsubseteq)$ has the empty word 0 as least element and it
has no greatest element in general.
This example also shows that even a finite subset of a partially ordered set does
not necessarily have a greatest element. On the other hand, an infinite set (such as
$2^{\mathbb{N}}$ in $(2^{\mathbb{N}}, \subseteq)$) can have a least and a greatest element. In a partially ordered set,
the maximum and the minimum are at least uniquely determined if they exist. From
now on we will essentially consider only partially ordered sets.
Proposition 6.23 If $(X, \le)$ is a partially ordered set and $A \subseteq X$, then $A$ has at
most one minimum and at most one maximum.

Figure 6.7: The subset $A = \{1, 2, 3\}$ of the partially ordered set $(\mathbb{N}, \mid)$

Proof. We prove the claim for the minimum. Let us assume that $A \subseteq X$ has two
minima $m, m'$. Then $m \in A$ and $m' \in A$ and hence $m \le m'$ and $m' \le m$. This
implies $m = m'$, since $\le$ is antisymmetric. □
Since the maximum and minimum of a set in a partially ordered set are uniquely
determined, if they exist at all, we can use a special notation for these elements.
Definition 6.24 (Minimum and maximum) Let $(X, \le)$ be a partially ordered
set and let $A \subseteq X$. Then we denote by $\max(A)$ the maximum of $A$, if it exists and
by $\min(A)$ the minimum of $A$, if it exists.
We give some further examples. In particular, we show that the maximum and
the minimum of a two element subset of a linear order always exist.
Example 6.25 We give some examples of maxima and minima.
1. We consider a linearly ordered set $(X, \le)$. Then for all $x, y \in X$
$\max(\{x, y\}) = \begin{cases} y & \text{if } x \le y \\ x & \text{otherwise} \end{cases}$
and
$\min(\{x, y\}) = \begin{cases} x & \text{if } x \le y \\ y & \text{otherwise} \end{cases}$
Hence these notations correspond to our previous usage of max and min as
functions on natural numbers (see Example 4.26).
2. We consider the partially ordered set $(2^X, \subseteq)$ for some set $X$. Then $\max(2^X) = X$
and $\min(2^X) = \emptyset$.
In Example 6.22 we have seen that it can happen that a set like {1, 2, 3} in (N, |)
has no greatest element. Nevertheless, there are elements like 2 and 3 in this set,
for which there are no greater elements. Such elements are called maximal.
Definition 6.26 (Maximal and minimal elements) Let $(X, \le)$ be a partially
ordered set and let $A \subseteq X$.
1. Then an element $m \in A$ is called a maximal element of $A$, if $m \le x$ implies
$m = x$ for all $x \in A$.
2. Then an element $m \in A$ is called a minimal element of $A$, if $x \le m$ implies
$x = m$ for all $x \in A$.

Example 6.27 The elements 2 and 3 of the subset A = {1, 2, 3} of the partially
ordered set (N, |) are maximal elements of A and 1 is a minimal element.
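
The difference between greatest and maximal elements can be made concrete on the finite sets from Examples 6.22 and 6.27. The following sketch implements Definitions 6.21 and 6.26 literally for finite lists; the function names are chosen for illustration.

```python
def divides(x, y):
    # x | y iff there is an integer z with x * z = y; note that 0 | 0.
    return y == 0 if x == 0 else y % x == 0


def greatest_elements(A, leq):
    # Elements of A that are upper bounds of A (Definition 6.21).
    return [m for m in A if all(leq(x, m) for x in A)]


def least_elements(A, leq):
    # Elements of A that are lower bounds of A (Definition 6.21).
    return [m for m in A if all(leq(m, x) for x in A)]


def maximal_elements(A, leq):
    # m in A such that m <= x implies m == x for all x in A (Definition 6.26).
    return [m for m in A if all(m == x for x in A if leq(m, x))]


def minimal_elements(A, leq):
    # m in A such that x <= m implies x == m for all x in A (Definition 6.26).
    return [m for m in A if all(m == x for x in A if leq(x, m))]


A = [1, 2, 3]                  # subset of (N, |)
B = [-4, -2, -1, 1, 2, 4]      # subset of the preordered set (Z, |)
```

For `A` there is no greatest element, while 2 and 3 are maximal. For `B` the lists of greatest and least elements each have two entries, which shows that uniqueness can fail in a preordered set that is not partially ordered.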
If a partially ordered set has a maximum or a minimum, then this is the only
maximal or minimal element of the set, respectively.
Proposition 6.28 Let $(X, \le)$ be a partially ordered set and let $A \subseteq X$.
1. If $\max(A)$ exists, then it is the only maximal element of $A$.
2. If $\min(A)$ exists, then it is the only minimal element of $A$.
Proof. Let $A \subseteq X$ and let us assume that $\max(A)$ exists and let $m \in A$ be a
maximal element. Then $m \le \max(A)$ since $\max(A)$ is the maximum and hence
$m = \max(A)$ since $m$ is a maximal element. This shows that any maximal element
of $A$ is already the maximum. Moreover, if $\max(A) \le x$ for some $x \in A$ then we
also have $x \le \max(A)$ since $\max(A)$ is the maximum and hence $x = \max(A)$ follows
by antisymmetry of $\le$. Hence $\max(A)$ is actually a maximal element. The second
claim can be proved analogously. □
If one considers a linearly ordered set, then each maximal or minimal element of
a set is automatically the maximum or minimum of that set.
Proposition 6.29 Let $(X, \le)$ be a linearly ordered set and let $A \subseteq X$ and $m \in A$.
1. If $m$ is a maximal element of $A$, then $m = \max(A)$ follows.
2. If $m$ is a minimal element of $A$, then $m = \min(A)$ follows.
Proof. Let $m \in A$ be a maximal element of $A$. Then for each $x \in A$ we have $m \le x$
or $x \le m$ since $\le$ is total. If $m \le x$, then $m = x$ follows from the maximality of $m$.
In any case $x \le m$ holds. Hence $m$ is the maximum of $A$. □
Besides the least and the greatest element, there are often elements in the second
row that are also of some importance. These elements are called atoms and co-atoms.
Definition 6.30 (Atoms) Let $(X, \le)$ be a partially ordered set and let $A \subseteq X$.
1. Then $a \in A$ is called an atom of $A$, if $\min(A)$ exists and $a$ is minimal in
$A \setminus \{\min(A)\}$.
2. Then $a \in A$ is called a co-atom of $A$, if $\max(A)$ exists and $a$ is maximal in
$A \setminus \{\max(A)\}$.
We give some examples of atoms and co-atoms.



Example 6.31 Let X be some set.
1. In the partially ordered set $(\mathbb{N}, \le)$ the only atom of $\mathbb{N}$ is 1 and there are no
co-atoms, since there is no greatest element.
2. In the partially ordered set $(\mathbb{N}, \mid)$ the atoms of $\mathbb{N}$ are exactly the prime numbers
and there are no co-atoms, although there is a greatest element $0 \in \mathbb{N}$.
3. In the partially ordered set $(2^X, \subseteq)$ the atoms of $2^X$ are exactly the singletons
$\{x\}$ for $x \in X$ and the co-atoms are exactly the complements of singletons
$X \setminus \{x\}$ for $x \in X$.
4. In the partially ordered set $(X^*, \sqsubseteq)$ the atoms of $X^*$ are exactly the words
$x \in X$ of length 1 (more formally, one should say the words $(1, x) \in X^*$ for
$x \in X$) and there are no co-atoms in general.
Figure 6.8: Hasse diagram of the partially ordered set $(2^{\mathbb{N}}, \subseteq)$

Problems
6.22 Let $(M, \cdot, e)$ be a monoid with induced preorder $\le$. Prove that $e$ is the least element
in $M$.
6.23 Prove the statements of Example 6.31.


6.7 Supremum and Infimum

We have called an upper bound $b$ of a subset $A$ of a partially ordered set a
maximum if additionally $b \in A$. It is also interesting to consider the least upper
bound, which is not necessarily a member of $A$. And correspondingly, we also
consider the greatest lower bound. These values, if existent, are called supremum
and infimum, respectively.
Definition 6.32 (Supremum and infimum) Let $(X, \le)$ be a partially ordered
set and let $A \subseteq X$.
1. If existent, the value $\sup(A) := \min\{b \in X : (\forall x \in A)\ x \le b\}$ is called
supremum or least upper bound of $A$.
2. If existent, the value $\inf(A) := \max\{b \in X : (\forall x \in A)\ b \le x\}$ is called infimum
or greatest lower bound of $A$.
We give an example that shows that the supremum is actually a concept that is
different from the maximum and from maximal elements.
Example 6.33 We consider the set A = {1, 2, 3} in the partially ordered set (N, |).
Then inf(A) = min(A) = 1 and sup(A) = 6, but A has no maximum and A has the
maximal elements 2 and 3 (see the diagram in Figure 6.7).
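
The values in Example 6.33 can be recomputed by following Definition 6.32 step by step: collect the upper (or lower) bounds and take their minimum (or maximum). The sketch below works in the finite fragment $\{0, ..., 30\}$ of $(\mathbb{N}, \mid)$, which is an assumption that is harmless here because the least common multiples involved lie inside the fragment. The function names are chosen for illustration, and `None` stands for "does not exist".

```python
def divides(x, y):
    # x | y iff there is an integer z with x * z = y; note that 0 | 0.
    return y == 0 if x == 0 else y % x == 0


def least(S, leq):
    # The minimum of S with respect to leq, or None if it does not exist.
    found = [m for m in S if all(leq(m, x) for x in S)]
    return found[0] if found else None


def greatest(S, leq):
    # The maximum of S with respect to leq, or None if it does not exist.
    found = [m for m in S if all(leq(x, m) for x in S)]
    return found[0] if found else None


def sup(A, X, leq):
    # Minimum of the set of upper bounds of A in X.
    upper = [b for b in X if all(leq(x, b) for x in A)]
    return least(upper, leq)


def inf(A, X, leq):
    # Maximum of the set of lower bounds of A in X.
    lower = [b for b in X if all(leq(b, x) for x in A)]
    return greatest(lower, leq)


X = range(0, 31)   # finite fragment of N, ordered by divisibility
A = [1, 2, 3]
```

With these definitions `sup(A, X, divides)` evaluates to 6 and `inf(A, X, divides)` to 1, while `greatest(A, divides)` is `None`, matching the example.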
If the supremum or the infimum of a set A is additionally a member of that set A,
then it follows automatically that it is the maximum or the minimum, respectively.
This follows directly from the uniqueness of maximum and minimum by Proposition 6.23. If the maximum or the minimum of a set A exists, then the supremum or
the infimum exists and is identical to the maximum or minimum, respectively.
Corollary 6.34 Let $(X, \le)$ be a partially ordered set and let $A \subseteq X$. Then the
following conditions are equivalent:
1. $\max(A)$ exists,
2. $\max(A)$ and $\sup(A)$ exist and $\sup(A) = \max(A)$,
3. $\sup(A)$ exists and $\sup(A) \in A$.
Moreover, also the following are equivalent:
1. $\min(A)$ exists,
2. $\min(A)$ and $\inf(A)$ exist and $\inf(A) = \min(A)$,
3. $\inf(A)$ exists and $\inf(A) \in A$.



The importance of the supremum and the infimum stems from the fact that they
can exist even in some cases where the maximum and the minimum, respectively,
do not exist. We have already seen in Example 6.33 that the supremum can exist in
a case where the maximum does not exist. Roughly speaking, if the maximum or
minimum of a set does not exist, then the supremum or the infimum, respectively,
is the next best object that one can hope to exist.
Partially ordered sets for which the supremum and the infimum of any two
elements always exist have a special name, they are called lattices.
Definition 6.35 (Lattice) A partially ordered set $(X, \le)$ is called a lattice, if the
values $x \sqcup y := \sup\{x, y\}$ and $x \sqcap y := \inf\{x, y\}$ exist for all $x, y \in X$.
In this situation $x \sqcup y$ is often called the join of $x$ and $y$ and $x \sqcap y$ is called the
meet of $x$ and $y$. We discuss a number of examples.
Example 6.36 Let X be a set.
1. Any linear order $(X, \le)$ is a lattice with $x \sqcup y = \sup\{x, y\} = \max\{x, y\}$ and
$x \sqcap y = \inf\{x, y\} = \min\{x, y\}$ for all $x, y \in X$.
2. The partially ordered set $(\mathbb{N}, \mid)$ is a lattice and for all $n, k \in \mathbb{N}$
$n \sqcup k = \sup\{n, k\} = \mathrm{lcm}(n, k) :=$ the least common multiple of $n$ and $k$,
$n \sqcap k = \inf\{n, k\} = \gcd(n, k) :=$ the greatest common divisor of $n$ and $k$.
3. The partially ordered set $(2^X, \subseteq)$ is a lattice with
$A \sqcup B = \sup\{A, B\} = A \cup B$ and $A \sqcap B = \inf\{A, B\} = A \cap B$
for all $A, B \subseteq X$.
4. The partially ordered set $(X^*, \sqsubseteq)$ is not a lattice in general. The supremum
$\sup\{x, y\}$ of two words $x, y \in X^*$ exists if and only if $x \sqsubseteq y$ or $y \sqsubseteq x$ holds
(in which case it is $y$ or $x$, respectively). The infimum $\inf\{x, y\}$ of two words
$x, y \in X^*$ always exists and it is the longest common prefix of $x$ and $y$.
We leave the proof of these claims to Problem 6.25. In a lattice $(X, \le)$ the
operations join $\sqcup$ and meet $\sqcap$ can be interpreted as binary operations $\sqcup : X \times X \to X$
and $\sqcap : X \times X \to X$. These operations satisfy some properties that constitute an
algebraic characterization of a lattice.
Proposition 6.37 (Lattice) Let $(X, \le)$ be a lattice. Then we obtain the following
for all $x, y, z \in X$:
1. $(x \sqcup y) \sqcup z = x \sqcup (y \sqcup z)$ and $(x \sqcap y) \sqcap z = x \sqcap (y \sqcap z)$ (associative)
2. $x \sqcup y = y \sqcup x$ and $x \sqcap y = y \sqcap x$ (commutative)
3. $x \sqcup (x \sqcap y) = x$ and $x \sqcap (x \sqcup y) = x$ (absorption)
4. $x \sqcup x = x$ and $x \sqcap x = x$ (idempotent)
We leave the proof to the reader (see Problem 6.26).
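
The four laws of Proposition 6.37 can be spot-checked for the lattice $(\mathbb{N}, \mid)$ of Example 6.36, where join is the least common multiple and meet is the greatest common divisor. This is a numerical check on the fragment $\{0, ..., 12\}$, not a proof; the helper `lcm` is defined by hand here, with the convention that the least common multiple with 0 is 0, which matches 0 being the greatest element of $(\mathbb{N}, \mid)$.

```python
from itertools import product
from math import gcd


def lcm(a, b):
    # Least common multiple; lcm(a, 0) = 0 since 0 is the only common multiple.
    return 0 if a == 0 or b == 0 else a * b // gcd(a, b)


def divides(x, y):
    # x | y iff there is an integer z with x * z = y; note that 0 | 0.
    return y == 0 if x == 0 else y % x == 0


join, meet = lcm, gcd
X = range(0, 13)   # finite fragment of N

associative = all(
    join(join(x, y), z) == join(x, join(y, z))
    and meet(meet(x, y), z) == meet(x, meet(y, z))
    for x, y, z in product(X, repeat=3)
)
commutative = all(
    join(x, y) == join(y, x) and meet(x, y) == meet(y, x)
    for x, y in product(X, repeat=2)
)
absorption = all(
    join(x, meet(x, y)) == x and meet(x, join(x, y)) == x
    for x, y in product(X, repeat=2)
)
idempotent = all(join(x, x) == x and meet(x, x) == x for x in X)

# The order can be recovered from the join: x | y iff lcm(x, y) == y.
order_from_join = all(
    (join(x, y) == y) == divides(x, y) for x, y in product(X, repeat=2)
)
```

The last check illustrates why these laws amount to an algebraic characterization: the order relation itself can be read off from the join operation.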

Problems
6.24 Let $(X, \le)$ be a lattice with least element $e \in X$. Prove that $(X, \sqcup, e)$ is a monoid.
Conclude that the following are monoids: $(\mathbb{N}, \max, 0)$, $(\mathbb{N}, \mathrm{lcm}, 1)$, $(2^{\mathbb{N}}, \cup, \emptyset)$.
6.25 Prove the statements of Example 6.36.
6.26 Prove Proposition 6.37.


Axiomatic Set Theory


God exists since mathematics is consistent,
and the Devil exists since we cannot prove it.
André Weil (1906–1998)

In this section we want to present a brief survey of axiomatic set theory. Actually, there are many different versions of axiomatic set theory and the version presented here is called Zermelo-Fraenkel set theory, often abbreviated by ZF. Zermelo-Fraenkel set theory plus the Axiom of Choice, abbreviated as ZFC, is the standard
framework in which most of modern mathematics is developed.
Zermelo-Fraenkel set theory ZFC
1. Axiom of the Empty Set. There exists an empty set $\emptyset$ without elements.
2. Axiom of Extensionality. Two sets $X$ and $Y$ are equal, if they contain
exactly the same elements.
3. Axiom of Comprehension. For each given set $X$ and each predicate $P$ for
$X$, there exists a set $S = \{x \in X : P(x)\}$ of all elements of $X$ that satisfy the
predicate $P$.
4. Axiom of Pairing. For all objects $x$ and $y$ there exists a set $\{x, y\}$ that
contains exactly $x$ and $y$.
5. Axiom of Union. For all sets of sets $S$ there exists a set
$\bigcup S = \{x : (\exists X \in S)\ x \in X\}$
that contains all elements that belong to at least one set $X \in S$.
6. Axiom of the Power Set. For all sets $X$ there exists a power set
$2^X = \{S : S \subseteq X\}$
that contains all subsets of $X$.
7. Axiom of Infinity. There exists an infinite set (such as $\mathbb{N}$).
8. Axiom of Replacement. If $C$ and $D$ are classes and $f : C \to D$ is a function,
then for each subset $X$ of $C$ the image $f(X)$ is a subset of the class $D$.
9. Axiom of Foundation. Any non-empty set $X$ of sets has the property that
it contains a member $Y$ such that $X \cap Y = \emptyset$.
10. Axiom of Choice. Any set $S$ that contains non-empty sets has a choice
function, i.e. a function $f : S \to \bigcup S$ such that $f(X) \in X$ for each $X \in S$.
The axioms as listed here are only informal and simplified versions of the formal
axioms of ZFC. The purpose is to give the reader an impression of what these
axioms are about. For instance, in case of the Axiom of Replacement one would have
to be more precise about what kind of functions $f$ are allowed here. As phrased here,
the axioms are certainly also redundant. For instance, the existence of the empty
set can be deduced from the Axiom of Infinity and the Axiom of Comprehension.
The Axiom of Comprehension itself can be deduced from the Axiom of Replacement.
Hence, the presentation of these axioms can be optimised. The Axiom of Foundation
(sometimes also called the Axiom of Regularity) is equivalent to the fact that the
element relation $\in$ is well-founded (provided one has the Axiom of Choice). There
are other versions of axiomatic set theory such as von Neumann-Bernays-Gödel set
theory.
Unfortunately, it is not known whether the ZFC axioms are consistent! This is
one of the big open problems in mathematics and is related to a problem that was
discussed by Hilbert in a famous talk he gave at Paris in 1900.
Conjecture 6.38 (Consistency) The ZFC axioms of Zermelo-Fraenkel set theory
together with the Axiom of Choice are consistent.
Usually one proves that some mathematical axioms are consistent by providing
an example of a mathematical object that satisfies all the axioms. However, in case
of set theory the problem is that nobody has any idea how to create a model for set
theory without using sets (and hence set theory). Nobody has been able to resolve this
circularity until today. Even worse, Gödel turned the observation that there is such
a circularity into the following negative result.
Theorem 6.39 (Gödel's Second Incompleteness Theorem 1931) ZFC is consistent if and only if one cannot prove its consistency within ZFC.
The "if" direction of this result is trivial: if ZFC is inconsistent, then one can
conclude everything from ZFC, using the principle of explosion. In particular, one
can prove the consistency of ZFC in ZFC in that case. The "only if" direction of
the proof requires a deeper insight into logic and computability. What this theorem
clearly shows is that a consistency proof for ZFC would require some meta theory
that goes beyond set theory and then the question of consistency for this meta theory
would have to be resolved.
What is known, however, is that the axiom of choice is independent of the other
axioms of ZFC, i.e. ZFC is consistent if and only if ZF is consistent.


Mathematicians
Mathematicians are generally thought of as some kind of intellectual machine, a great
brain that crunches numbers and spits out theorems. In fact we are, as Hermann
Weyl said, more like creative artists. Although strongly constrained by the rules of
logic and by physical experience, we use our imagination to make great leaps into
the unknown. The development of mathematics over thousands of years is one of the
great achievements of civilization.
Sir Michael Francis Atiyah (Fields Medalist, Cambridge)

Some Selected Biographies of Mathematicians


Aristotle (384–322 BC) was a philosopher and student of Plato. His work
includes texts on almost all areas of sciences, philosophy and politics of his
time. Aristotle is considered as the first author who studied formal logic and
his views dominated mathematical logic for more than 2000 years. Aristotle
systematised deduction rules and systems and it seems that he was the first
one who formulated the Principle of Excluded Middle and in this way he
gave a logical foundation to the concept of reductio ad absurdum, which
was already used in Greek philosophy.
Euclid of Alexandria (323–283 BC) is best known for his book Elements, in which he developed geometry axiomatically. For more than 2000
years this was the standard text book on geometry. However, Euclid also
worked in number theory and results such as Euclid's lemma on factorization or the Euclidean algorithm to calculate greatest common divisors of
two numbers are named after him.
René Descartes (1596–1650) was a philosopher, mathematician and
physicist. As a mathematician Descartes is best known for his contributions to analytic geometry including his approach to treat geometry using
coordinates and vectors. With this approach he created a link between
geometry and algebra that is essential for the modern treatment of these
subjects. Cartesian coordinate systems and the Cartesian product are named
after Descartes.

Leopold Kronecker (1823–1891) was a student of Dirichlet and worked
in algebra, number theory, analysis, and mathematical logic. He rejected
Cantor's set theory due to its non-constructive nature and proposed himself
finitism, which can be considered as a forerunner of intuitionism. Kronecker
proposed the idea that analysis and other branches of mathematics should
be based on natural numbers and he said "God made the integers; all else
is the work of man."
Georg Cantor (1845–1918) was a student of Kummer and Weierstrass and
is best known for his development of set theory. His own definition of a set
was roughly the following: "In its entirety we consider any collection M of
well-defined and distinguished objects m of our perception or of our thoughts
as a set. The objects m are called the elements of the set M." Cantor was
led to his study of set theory by his investigation of bijective maps and the
concept of equinumerosity. In this context Cantor's diagonalisation method
and Cantor's pairing function are well-known and named after him. He
also initiated the study of cardinality and of transfinite numbers. The set
$\{0, 1\}^{\mathbb{N}}$ as a metric space is often called Cantor space and plays a crucial
role in some fields of mathematics such as topology, descriptive set theory
and fractal geometry.
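To make the idea of a pairing function concrete, here is a minimal sketch in Python. It assumes the common convention $\pi(x, y) = \frac{(x+y)(x+y+1)}{2} + y$; the notes may use a different ordering of the arguments, and the names cantor_pair and cantor_unpair are chosen only for this example.

```python
from math import isqrt


def cantor_pair(x: int, y: int) -> int:
    """Map a pair of natural numbers to a single natural number.

    Assumed convention: enumerate the diagonals x + y = s one after
    another and count y steps along each diagonal.
    """
    s = x + y
    return s * (s + 1) // 2 + y


def cantor_unpair(z: int) -> tuple[int, int]:
    """Inverse of cantor_pair: recover (x, y) from z."""
    # Largest s with s * (s + 1) / 2 <= z, computed with integer arithmetic.
    s = (isqrt(8 * z + 1) - 1) // 2
    y = z - s * (s + 1) // 2
    return s - y, y


print(cantor_pair(3, 4))   # 32
print(cantor_unpair(32))   # (3, 4)
```

Since every natural number is hit exactly once, the map is a bijection between pairs of natural numbers and natural numbers, which is what makes it useful in arguments about equinumerosity.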
David Hilbert (1862–1943) was a student of Lindemann and is one of
the best-known mathematicians of the 20th century. This is because
he made substantial contributions to various fields of mathematics such as
the theory of invariants, foundations of functional analysis and mathematical physics as well as mathematical logic. Many concepts and results are
named after him. For instance, Hilbert spaces play a central role in functional analysis and in the mathematical foundations of modern quantum
physics. Hilbert was a strong supporter of Cantor's set theory. Hilbert's
Basis Theorem caused a controversy around the fact that the proof was
highly non-constructive. Hilbert's Nullstellensatz is another fundamental
result that relates geometry to algebra and it is one of the starting points of
modern algebraic geometry. In a famous talk that Hilbert delivered at the
Sorbonne in Paris in 1900 he described 23 mathematical problems that he
considered essential for the mathematics of the 20th century. His problems
turned out to be very influential and some developments of mathematics
in the 20th century were motivated by attempts to solve some of his problems. Some of these problems are still unsolved. One of his problems was
to find a complete and sound axiomatic system for mathematics. Gödel's
First Incompleteness Theorem brought Hilbert's programme in its original
form to an end, since Gödel proved that even for arithmetic there cannot
be any reasonable axiom system that is sound and complete simultaneously.
Nevertheless, Hilbert's foundational ideas have a substantial impact even on present-day mathematical logic.



Bertrand Russell (1872–1970) was a philosopher who published texts
on various areas of philosophy, science and politics. In mathematics he is
best known for the three-volume work Principia Mathematica, which he
wrote together with Alfred North Whitehead. The purpose of this work
was to derive mathematics starting from some basic axioms and using only
the rules of symbolic logic. At the same time this approach was designed
to avoid antinomies such as Russell's paradox, which Russell discovered when
he studied Frege's work on naive set theory. The Principia Mathematica
is sometimes considered the most significant work on formal logic since
Aristotle. The value of this approach was relativised when Gödel published
his First Incompleteness Theorem, from which it follows that the system in
Principia Mathematica cannot be complete and sound simultaneously.
Luitzen Egbertus Jan Brouwer (1881–1966) was a student of Korteweg
and made substantial contributions to topology and logic. He was one of
the main proponents of intuitionism, a constructive approach to mathematics that avoids the Principle of the Excluded Middle. Ironically, he is best
known for the Brouwer Fixed Point Theorem, which is a theorem in classical topology that does not admit any constructive proof. Brouwer proved
this theorem in the early years of his career, when he was seeking recognition among mathematicians, and at that time he deliberately refrained from
propagating intuitionism in order not to endanger his career.
Andrey Kolmogorov (1903–1987) was a student of Luzin and one of
the most influential mathematicians of the 20th century. He is perhaps best
known for his contributions to probability theory, since he started to develop
the subject systematically using the concepts of measure and integration. In
topology Kolmogorov is known, for instance, for his Superposition Theorem
that solved Hilbert's 13th problem and shows that addition is a universal
continuous function of two arguments (any other continuous function of
two arguments can be expressed using addition and continuous functions
of only one variable). Kolmogorov also contributed extensively to classical
mechanics, the mathematical treatment of turbulence, analysis and
the theory of dynamical systems. In the area of logic and the foundations
of mathematics he is best known for his work on intuitionistic logic and
the concept of Kolmogorov complexity, which allows one to define random objects
using the concept of algorithms (as opposed to the treatment in probability
theory, which does not allow one to speak about randomness of single objects).

Kurt Gödel (1906–1978) was a student of Hahn and certainly the most
influential mathematical logician of the 20th century. The Gödel Completeness Theorem shows that first-order logic can be axiomatised in a complete
and sound way, i.e. it shows that in some sense provability and truth correspond in pure logic. However, Gödel's First Incompleteness Theorem indicates that this is not true for mathematics in general. Even a simple fragment of mathematics such as arithmetic does not admit any axiom system
that is complete and sound simultaneously. This result brought Hilbert's
programme in its original form to an end. Gödel's Second Incompleteness
Theorem states that a sufficiently rich consistent mathematical theory cannot prove
its own consistency and this applies, in particular, to set theory. Gödel
also proved that the Axiom of Choice and the Generalized Continuum Hypothesis are both consistent with Zermelo-Fraenkel set theory (i.e. if ZF is
consistent, then so is ZF together with the Axiom of Choice and the Generalized Continuum Hypothesis). Later, Paul Cohen was able to show that
this is also true for the negation of the Axiom of Choice and the negation
of the Generalized Continuum Hypothesis, so that both are independent
of Zermelo-Fraenkel set theory. Gödel also made notable contributions to
other areas of logic, such as intuitionistic logic, and to relativity theory.


Greek Alphabet

α    alpha      Α    Alpha
β    beta       Β    Beta
γ    gamma      Γ    Gamma
δ    delta      Δ    Delta
ε    epsilon    Ε    Epsilon
ζ    zeta       Ζ    Zeta
η    eta        Η    Eta
θ    theta      Θ    Theta
ι    iota       Ι    Iota
κ    kappa      Κ    Kappa
λ    lambda     Λ    Lambda
μ    mu         Μ    Mu
ν    nu         Ν    Nu
ξ    xi         Ξ    Xi
ο    omicron    Ο    Omicron
π    pi         Π    Pi
ρ    rho        Ρ    Rho
σ    sigma      Σ    Sigma
τ    tau        Τ    Tau
υ    upsilon    Υ    Upsilon
φ    phi        Φ    Phi
χ    chi        Χ    Chi
ψ    psi        Ψ    Psi
ω    omega      Ω    Omega


Mathematical Symbols
$\Box$                         end of the proof box (q.e.d.)
$\mathbb{N}$                   set of natural numbers 0, 1, 2, ...
$\mathbb{P}$                   set of prime numbers 2, 3, 5, 7, 11, ...
$\mathbb{Z}$                   set of integers ..., -2, -1, 0, 1, 2, ...
$\mathbb{Q}$                   set of rational numbers
$\mathbb{A}$                   set of algebraic numbers
$\mathbb{R}$                   set of real numbers
$\mathbb{C}$                   set of complex numbers
$a \mid b$                     a divides b
$\in$                          is an element of
$\notin$                       is not an element of
$\emptyset$                    empty set
$\{, \}$                       set brackets
$=$                            is equal to
$\neq$                         is not equal to
$:=$                           is defined to be equal to
$\subseteq$                    is a subset of
$\not\subseteq$                is not a subset of
$\subsetneqq$                  subset, but not equal to
$\cup$                         union
$\cap$                         intersection
$\setminus$                    set difference
$X^c$                          complement of X (wrt. some other set)
$\times$                       Cartesian product
$\sqcup$                       disjoint union
$2^X$                          power set of X, also written as P(X)
$(X_i)_{i \in I}$              indexed family of sets
$\bigcup_{i \in I} X_i$        union of an indexed family of sets
$\bigcap_{i \in I} X_i$        intersection of an indexed family of sets
$\bigsqcup_{i \in I} X_i$      disjoint union of an indexed family of sets
$\prod$                        product
$\sum$                         sum
$\coprod$                      coproduct (disjoint union)
$X^*$                          set of finite words over X (Kleene star)
$\bot$                         is not true (falsum)
$\top$                         is true (verum)
$\wedge$                       and
$\vee$                         or
$\neg$                         not
X   Y                          X is of the same cardinality as Y
X   Y                          X is of cardinality less or equal to Y
X   Y                          X is of smaller cardinality than Y

$\Longrightarrow$              implies
$\Longleftarrow$               is implied by
$\Longleftrightarrow$          if and only if (is equivalent to)
$:\Longleftrightarrow$         is defined to be equivalent to
$\exists$                      there exists (existential quantifier)
$\forall$                      for all (universal quantifier)
$\exists!$                     there is exactly one
$\exists^\infty$               there are infinitely many
$\forall^\infty$               for almost all (for all but finitely many)
$S \circ R$                    composition of relations
$R^{-1}$                       inverse relation
dom(R)                         domain of a relation
range(R)                       range of a relation
$\Delta_X$                     diagonal of X (equality relation)
$\mathrm{id}_X : X \to X$      identity function
graph(f)                       graph of a function
$f : X \to Y$                  function from X to Y
$f : X \hookrightarrow Y$      injection from X into Y
$f : X \twoheadrightarrow Y$   surjection from X onto Y
$f : X \rightharpoonup Y$      partial function from X to Y
$f : X \rightrightarrows Y$    multi-valued function from X to Y
$x \mapsto y$                  x is mapped to y
$f(x)$                         function value of $f : X \to Y$ at $x \in X$
$f^{-1} : Y \to X$             inverse function of a bijective $f : X \to Y$
$f(A)$                         image of $A \subseteq X$ under $f : X \to Y$
$f^{-1}(B)$                    preimage of $B \subseteq Y$ under $f : X \to Y$
$Y^X$                          set of functions $f : X \to Y$
$X!$                           set of bijective functions $f : X \to X$
$|X|$                          cardinality of a set X
$\binom{X}{Y}$                 set of subsets of X of cardinality |Y|
F(X)                           set of finite subsets of X
                               equivalence relation
$[x]$                          equivalence class of x
X/                             quotient of X by
  f                            fiber relation of $f : X \to Y$
$|$                            divisibility relation
$\leq$                         less or equal (or a preorder)
$<$                            strictly less (or a strict order)
$x \equiv y \pmod{n}$          x congruent to y modulo n
$n!$                           factorial of n
$\binom{n}{k}$                 binomial coefficient
$B_n$                          nth Bell number


Index

n-fold product, 42
n-tuples, 41
algebraic geometry, 142
algebraic numbers, 14
algebraic structure, 128
all relation, 56
almost all, 106
American Mathematical Society, 3
antisymmetric, 118
apply operation, 80
arity, 41
associative, 128
atom, 134
Axiom of Choice, 82, 139, 144
Axiom of Regularity, 140
axiomatic set theory, 13
Banach-Tarski Paradox, 85
Bell number, 125
bijection, 68
bijective, 68
binary operation, 128
binomial coefficient, 103
bound variable, 51
Brouwer Fixed Point Theorem, 143
canonical decomposition, 123
canonical projection, 122
canonical projections, 72, 87
Cantor space, 142
Cantor's diagonalisation method, 142
Cantor's first diagonalization, 97
Cantor's pairing function, 142
cardinal number, 89
cardinality, 89, 104
Cartesian product, 39
case distinction, 70
cell, 124
characteristic function, 80, 81
choice function, 82
co-atom, 134
codomain, 62
commutative, 128
commutative diagrams, 66
complex numbers, 14
comprehension, 15, 16
computability theory, 22, 45
computational complexity theory, 49
concatenation, 129
congruent, 121, 124
conjunction, 46
consistent, 19
constant function, 70
Continuum Hypothesis, 96
Corollary, 5
correspondence, 66
countable, 103
countably infinite, 103
currying operation, 80
Dedekind infinite, 107
denumerable, 104
descriptive set theory, 142
diagonal, 56
difference, 27
discriminated union, 42
disjoint, 26
disjoint union, 42

disjunction, 46
divides, 7
divisibility relation, 125
divisor, 7
divisor relation, 56
domain, 59
double negation law, 30
duality, 29
element relation, 56
elements, 13
empty relation, 56
empty set, 14
empty tuple, 41
empty word, 41
equal, 14
equality relation, 56
equinumerosity relation, 121
equivalence, 46
equivalence class, 121, 122
equivalence closure, 124
equivalence kernel, 121
equivalence relation, 121
Euclid's lemma, 141
Euclidean algorithm, 141
evaluation, 80
evaluation map, 80
exclusive or, 46
existential quantifier, 32
extension, 74
factor, 7
factorial function, 103
falsum, 19
family, 73
fiber, 76
fiber relation, 121
finite, 6, 103
finite words, 43
finitism, 142
First Incompleteness Theorem, 143
first-order logic, 50
fixed point, 92
forward image, 76


fractal geometry, 142


free variable, 51
function, 62
function value, 63
Gödel Completeness Theorem, 144
Gödel's Completeness Theorem, 52
Gödel's First Incompleteness Theorem, 144
Gödel's Second Incompleteness, 144
Generalized Continuum Hypothesis, 96, 144
graph, 55, 62
graph map, 82
greatest element, 132
greatest lower bound, 136
group, 131
groups, 129
halting problem, 22
Hasse diagram, 126
Hilbert Hotel Paradox, 91
Hilbert spaces, 142
Hilbert's Basis Theorem, 142
Hilbert's Nullstellensatz, 142
Hilbert's programme, 142
homogeneous, 55, 118
identity, 66, 128
identity function, 70
image, 75
implication, 46
inconsistency, 19
indexed family, 14, 32
indirect, 6
indirect proof, 7
induced preorder, 130
induced strict order, 128
Induction base, 101
induction base, 102
induction hypothesis, 102
induction principle, 100
Induction step, 101
induction step, 102
infimum, 136

infinite, 103
infinitely many, 106
injection, 68
injective, 68
integers, 14
intersection, 23
into, 68
Intuitionism, 10
intuitionism, 10, 142, 143
intuitionistic logic, 10
inverse, 83, 129, 131
inverse function, 71
inverse image, 76
inverse relation, 60
inversion map, 82
irreflexive, 118
join, 137
Kleene star operation, 43
Kuratowski pair, 38
lattice, 137
lattices, 137
least element, 132
least upper bound, 136
left inverse, 83
left total, 59
left unique, 61
Lemma, 5
less or equal relation, 56
linear order, 125
linearly ordered set, 125
logical formulas, 47
lower bound, 131
map, 62
mapping, 62
Mathematical Reviews, 3
Mathematics Subject Classification, 3
maximal element, 133
maximum, 132
maximum function, 69
meet, 137
metamathematics, 45

minimal element, 134


minimum, 8, 132
minimum function, 69
model theory, 45
modulo, 121, 124
monoid, 129
monotone, 92
multi-valued function, 66
multiple, 7
multisets, 14
naive set theory, 13
natural numbers, 14
necessary, 18
necessary condition, 18
negation, 46
non well-founded set theory, 23
non-standard analysis, 23
not finite, 6
one-to-one function, 68
onto, 68
order relation, 94
P-NP problem, 49
pair, 38
pairs, 41
partial function, 67
partial order, 125
partially ordered set, 125
partition, 122, 124
Pascal's rule, 103
Peano axioms, 102
Peano model of the natural numbers, 102
permutations, 72
Platonism, 10
platonism, 10
power set, 36
power set construction, 21
predecessor function, 69
prefix relation, 125
preimage, 75
preorder, 125
preordered set, 125

prime divisor, 8
prime numbers, 14
Principia Mathematica, 143
principle of excluded middle, 9
principle of explosion, 19
Principle of the Excluded Middle, 143
product, 39, 85
product function, 72
projections, 88
proof, 4
proof by contradiction, 10
proof by induction, 102
proof theory, 45
proper class, 22
proper subset relation, 56
Proposition, 5
proposition, 46
Propositional, 46
quadruples, 41
quasiorders, 125
quintuples, 41
quotient, 122
range, 59
range map, 82
rational numbers, 14
real numbers, 14
reductio ad absurdum, 10
reflexive, 118
reflexive and symmetric closure, 120
reflexive closure, 120
relation, 55
replacement, 23
restriction of f to A, 74
right inverse, 83
right total, 59
right unique, 62
rigorous enough, 7
Russell's paradox, 21, 143
same cardinality, 89
scope, 51
second-order logic, 51
selfapplicability problem, 22


sequence, 73
set, 6, 13
set of all functions, 79
set of bijective functions, 79
set of functions, 62
Sierpiński Pyramid, 2
singleton, 15
smaller or the same cardinality, 89
source, 55
square function, 69, 74
strict order, 128
strictly less relation, 56
strictly smaller cardinality, 89
subset, 16
subset relation, 56
successor function, 102
sufficient, 18
sufficient condition, 18
superset, 17
supremum, 136
surjection, 68
surjective, 68
symmetric, 118
symmetric closure, 126
symmetric difference, 31
symmetric group, 72, 80
tagged union, 42
target, 55
tautology, 47
Theorem, 4
Theorem of Diaconescu-Goodman-Myhill, 83
Theorem of Tychonoff, 86
topology, 142
total, 118
total orders, 125
transitive, 118
transitive closure, 120
transitivity, 8
triples, 41
truth table, 46
truth table method, 47
tuple, 41

twin primes, 8
uncountable, 103
union, 21, 23
universal property, 87
universal quantifier, 32
untyped sets, 19
upper bound, 131
valid, 51
Venn diagram, 17
von Neumann-Bernays-Gödel set theory, 140
well-defined, 10, 63
well-founded, 22
Zermelo-Fraenkel set theory, 139
