
Mathematical Logic for Mathematicians

Joseph R. Mileti
January 9, 2014

Contents

1 Introduction
  1.1 The Nature of Mathematical Logic
  1.2 The Language of Mathematics
  1.3 Syntax and Semantics
  1.4 The Point of It All
  1.5 Some Basic Terminology and Notation

2 Induction and Recursion
  2.1 Induction and Recursion on N
  2.2 Generation
    2.2.1 From Above
    2.2.2 From Below: Building by Levels
    2.2.3 From Below: Witnessing Sequences
    2.2.4 Equivalence of the Definitions
  2.3 Step Induction
  2.4 Step Recursion
  2.5 An Illustrative Example
    2.5.1 Proving Freeness
    2.5.2 The Result
    2.5.3 An Alternate Syntax - Polish Notation

3 Propositional Logic
  3.1 The Syntax of Propositional Logic
    3.1.1 Standard Syntax
    3.1.2 Polish Notation
    3.1.3 Official Syntax and Our Abuses of It
    3.1.4 Recursive Definitions
  3.2 Truth Assignments and Semantic Implication
  3.3 Boolean Functions and Connectives
  3.4 Syntactic Implication
    3.4.1 Motivation
    3.4.2 Official Definitions
    3.4.3 Examples of Deductions
    3.4.4 Theorems about ⊢
  3.5 Soundness and Completeness
    3.5.1 The Soundness Theorem
    3.5.2 The Completeness Theorem
  3.6 Compactness and Applications
    3.6.1 The Compactness Theorem
    3.6.2 Combinatorial Applications
    3.6.3 An Algebraic Application

4 First-Order Logic: Syntax and Semantics
  4.1 The Syntax of First-Order Logic
  4.2 Structures: The Semantics of First-Order Logic
    4.2.1 Structures: Definition and Satisfaction
    4.2.2 Elementary Classes of Structures
    4.2.3 Definability in Structures
    4.2.4 Substitution
  4.3 Relationships Between Structures
    4.3.1 Homomorphisms and Embeddings
    4.3.2 An Application to Definability
    4.3.3 Substructures
    4.3.4 Elementary Substructures
  4.4 Changing the Language
    4.4.1 Expansions and Restrictions
    4.4.2 Adding Constants to Name Elements

5 Semantic and Syntactic Implication
  5.1 Semantic Implication and Theories
    5.1.1 Definitions
    5.1.2 Finite Models of Theories
    5.1.3 Countable Models of Theories
  5.2 Syntactic Implication
    5.2.1 Definitions
    5.2.2 Some Fundamental Deductions
    5.2.3 Theorems About ⊢

6 Soundness, Completeness, and Compactness
  6.1 Soundness
  6.2 Prime Formulas
  6.3 Completeness
    6.3.1 Motivating the Proof
    6.3.2 The Proof
  6.4 Compactness
  6.5 Applications of Compactness
  6.6 An Example: The Random Graph

7 Quantifier Elimination
  7.1 Motivation and Definition
  7.2 What Quantifier Elimination Provides
  7.3 Quantifier Manipulation Rules
  7.4 Examples of Theories With QE
  7.5 Algebraically Closed Fields

8 Nonstandard Models of Arithmetic and Analysis
  8.1 Nonstandard Models of Arithmetic
  8.2 The Structure of Nonstandard Models of Arithmetic
  8.3 Nonstandard Models of Analysis

9 Introduction to Axiomatic Set Theory
  9.1 Why Set Theory?
  9.2 Motivating the Axioms
  9.3 Formal Axiomatic Set Theory
  9.4 Working from the Axioms
  9.5 ZFC as a Foundation for Mathematics

10 Developing Basic Set Theory
  10.1 First Steps
  10.2 Ordered Pairs and Cartesian Products
  10.3 Relations and Functions
  10.4 Orderings
  10.5 The Natural Numbers and Induction
  10.6 Sets and Classes
  10.7 Finite Sets, Powers, and Products
    10.7.1 Finite Sets
    10.7.2 Finite Powers
    10.7.3 Finite Products
  10.8 Definitions by Recursion
  10.9 Infinite Sets, Powers, and Products
    10.9.1 Countable Sets
    10.9.2 General Powers
    10.9.3 General Products

11 Doing Mathematics in Set Theory
  11.1 The Basic Number Systems
  11.2 Doing Logic in Set Theory

12 Well-Orderings, Ordinals, and Cardinals
  12.1 Well-Orderings
  12.2 Ordinals
  12.3 Arithmetic on Ordinals
  12.4 Cardinals
  12.5 Addition and Multiplication of Cardinals

13 The Axiom of Choice
  13.1 Use of the Axiom of Choice in Mathematics
  13.2 Equivalents of the Axiom of Choice
  13.3 The Axiom of Choice and Cardinal Arithmetic

14 Set-theoretic Methods in Analysis and Model Theory
  14.1 Subsets of R
    14.1.1 The Reals
    14.1.2 Perfect Sets
    14.1.3 Closed Sets
    14.1.4 Borel Sets
    14.1.5 Measurable Sets
  14.2 The Size of Models
    14.2.1 Controlling the Size of Models
    14.2.2 Counting Models
  14.3 Ultraproducts and Compactness
    14.3.1 Ultrafilters
    14.3.2 Ultraproducts

15 Primitive Recursive Functions and Relations
  15.1 Primitive Recursive Functions
  15.2 Primitive Recursive Relations
  15.3 Coding Sequences
  15.4 Coding Primitive Recursive Functions

16 Recursive Functions and Relations
  16.1 Definitions and Basic Results
  16.2 Turing Machines and Computable Functions
  16.3 The Church-Turing Thesis
  16.4 Computably Enumerable Sets

17 Coding Logic Computably

18 Subtheories of Number Theory
  18.1 The Natural Numbers with Successor
  18.2 The Natural Numbers with Order
  18.3 The Natural Numbers with Addition

19 Number Theory
  19.1 Definability in N
  19.2 Incompleteness and Undecidability in N
    19.2.1 Proof Using a C.E. Set Which is Not Computable
    19.2.2 Proof Using Undefinability of Truth
    19.2.3 Proof Using a Sentence Implying Its Nonprovability
  19.3 Robinson's Q and Peano Arithmetic
    19.3.1 Robinson's Q
    19.3.2 Peano Arithmetic
  19.4 Representable Relations and Functions
  19.5 Working From Q
  19.6 The Second Incompleteness Theorem
  19.7 Diophantine Sets
  19.8 A Speed-Up Theorem

Chapter 1

Introduction

1.1 The Nature of Mathematical Logic

Mathematical logic originated as an attempt to codify and formalize


1. The language of mathematics.
2. The basic assumptions of mathematics.
3. The permissible rules of proof.
One successful result of such a program is that we can study mathematical language and reasoning using
mathematics. For example, we will eventually give a precise mathematical definition of a formal proof, and
to avoid confusion with your current intuitive understanding of what a proof is, we'll call these objects
deductions. You should think of this as analogous to giving a precise mathematical definition of continuity
to replace the fuzzy "a graph that can be drawn without lifting your pencil". Once we've codified the notion
in this way, we have turned deductions into mathematical objects, which allows us to prove mathematical
theorems about deductions using normal mathematical reasoning. For example, we've opened up the
possibility of proving that there is no deduction of a certain statement.
Some newcomers to the subject find the whole enterprise perplexing. For instance, if you come to the
subject with the belief that the role of mathematical logic is to serve as a foundation to make mathematics
more precise and secure, then the description above probably seems a little circular, and this will almost
certainly lead to a great deal of confusion. You may ask yourself:

Okay, we've just given a decent definition of a deduction. However, instead of proving things
about deductions following this formal definition, we're proving things about deductions using
the usual informal proof style that I've grown accustomed to in other math courses. Why should
I trust these informal proofs about deductions? How can we formally prove things (using deductions)
about deductions? Isn't that circular? Is that why we're only giving informal proofs? I
thought that I'd come away from this subject feeling better about the philosophical foundations
of mathematics, but we've just added a new layer to mathematics, and we now have both informal
proofs and deductions, making the whole thing even more dubious.
To others who begin a study of the subject, there is no problem. After all, mathematics is the most
reliable method we have to establish truth, and there was never any serious question as to its validity. Such
a person may react to the above thoughts as follows:
We gave a mathematical definition of a deduction, so what's wrong with using mathematics to
prove things about deductions? There's obviously a real world of true mathematics, and we're
just working in that world to build a certain model of mathematical reasoning which is susceptible
to mathematical analysis. It's quite cool, really, that we can subject mathematical proofs to a
mathematical study by building this internal model. All of this philosophical speculation and
worry about secure foundations is tiresome and in the end probably meaningless. Let's get on
with the subject!

Should we be so dismissive of the first philosophically inclined student? The answer, of course, depends
on your own philosophical views, but I'll try to give my own views as a mathematician specializing in logic
with a definite interest in foundational questions. It is my firm belief that you should put all philosophical
questions out of your mind during a first reading of the material (and perhaps forever, if you're so inclined),
and come to the subject with a point of view which accepts an independent mathematical reality susceptible
to the mathematical analysis you've grown accustomed to. In your mind, you should keep a careful distinction
between normal "real" mathematical reasoning and the formal precise model of mathematical reasoning we
are developing. Some people like to give this distinction a name by calling the normal mathematical realm
we're working in the metatheory.
To those who are interested, we'll eventually be able to give reasonable answers to the first student and
provide other respectable philosophical accounts of the nature of mathematics, but this should wait until
we've developed the necessary framework. Once we've done so, we can give examples of formal theories,
such as first-order set theory, which are able to support the entire enterprise of mathematics, including
mathematical logic. This is of great philosophical interest, because it makes it possible to carry out
(nearly) all of mathematics inside this formal theory.
The ideas and techniques that were developed with philosophical goals in mind have now found application
in other branches of mathematics and in computer science. The subject, like all mature areas of mathematics,
has also developed its own very interesting internal questions which are often (for better or worse) divorced
from its roots. Most of the subject developed after the 1930s has been concerned with these internal and
tangential questions, along with applications to other areas, and now foundational work is just one small
(but still important) part of mathematical logic. Thus, if you have no interest in the more philosophical
aspects of the subject, there remains an impressive, beautiful, and mathematically applicable theory which
is worth your attention.

1.2 The Language of Mathematics

Our first and probably most important task in providing a mathematical model of mathematics is to deal with
the language of mathematics. In this section, we sketch the basic ideas and motivation for the development
of a language, but we will leave precise detailed definitions until later.
The first important point is that we should not use English (or any other natural language), because
it's constantly changing, often ambiguous, and allows the construction of statements that are certainly not
mathematical and/or arguably express very subjective sentiments. Once we've thrown out natural language,
our only choice is to invent our own formal language. This seems quite daunting. How could we possibly
write down one formal language which encapsulates geometry, algebra, analysis, and every other field of
mathematics, not to mention those we haven't developed yet, without using natural language? Our approach
to this problem will be to avoid (consciously) doing it all at once.
Instead of starting from the bottom and trying to define primitive mathematical statements which can't
be broken down further, let's first think about how to build new mathematical statements from old ones. The
simplest way to do this is to take already established mathematical statements and put them together using
"and", "or", "not", and "implies". To keep a careful distinction between English and our language, we'll
introduce symbols for each of these, and we'll call these symbols connectives.

1. ∧ will denote and.
2. ∨ will denote or.
3. ¬ will denote not.
4. → will denote implies.
In order to ignore the nagging question of what constitutes a primitive statement, our first attempt will be to
simply take an arbitrary set whose elements we think of as the primitive statements and put them together
in all possible ways using the connectives.
For example, suppose we start with the set P = {A, B, C}. We think of A, B, and C as our primitive
statements, and we may or may not care what they might express. We now want to put together the elements
of P using the connectives, perhaps repeatedly. However, this naive idea quickly leads to a problem. Should
the meaning of A ∧ B ∨ C be "A holds, and either B holds or C holds", corresponding to A ∧ (B ∨ C), or
should it be "Either both A and B hold, or C holds", corresponding to (A ∧ B) ∨ C? We need some way
to avoid this ambiguity. Probably the most natural way to achieve this is to insert parentheses to make it
clear how to group terms (but as we will see, there are other natural ways to overcome this issue). We now
describe the formulas of our language, denoted by Form_P. First, we put every element of P in Form_P, and
then we generate other formulas using the following rules.
1. If φ and ψ are in Form_P, then (φ ∧ ψ) is in Form_P.
2. If φ and ψ are in Form_P, then (φ ∨ ψ) is in Form_P.
3. If φ is in Form_P, then (¬φ) is in Form_P.
4. If φ and ψ are in Form_P, then (φ → ψ) is in Form_P.
Thus, the following is an element of Form_P:

((¬(B ∧ ((¬A) ∨ C))) → A)
This simple setup, called propositional logic, is a drastic simplification of the language of mathematics,
but there are already many interesting questions and theorems that arise from a careful study. We'll spend
some time on it in Chapter 3.
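The generation process just described is easy to simulate. The following sketch (the helper name next_level is ours; the text defines Form_P only informally) computes the first few "levels" of Form_P for P = {A, B, C} by applying each formation rule to the formulas built so far:

```python
# Sketch of the inductive generation of Form_P for P = {"A", "B", "C"}.
# Form_P is the union of all levels; we compute only the first few here.

P = {"A", "B", "C"}

def next_level(forms):
    """Apply each formation rule once to the formulas generated so far."""
    new = set(forms)
    for phi in forms:
        new.add("(¬" + phi + ")")           # rule 3
        for psi in forms:
            for op in ("∧", "∨", "→"):      # rules 1, 2, 4
                new.add("(" + phi + " " + op + " " + psi + ")")
    return new

level0 = set(P)             # the primitive statements themselves
level1 = next_level(level0)
level2 = next_level(level1)

print("(A ∧ B)" in level1)      # True: built in one step
print("((¬A) ∨ C)" in level2)   # True: needs two steps
```

Of course, each level is finite, while Form_P itself is infinite; this mirrors the "from below, building by levels" description of generation in Chapter 2.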
Of course, mathematical language is much more rich and varied than what we can get using propositional
logic. One important way to make more complicated and interesting mathematical statements is to make
use of the quantifiers "for all" and "there exists", which we'll denote using the symbols ∀ and ∃. In order to do
so, we will need variables to act as something to quantify over. We'll denote variables by letters like x, y, z,
etc. Once we've come this far, however, we'll have to refine our naive notion of primitive statements
above, because it's unclear how to interpret a statement like ∀xB without knowledge of the role of x inside
B.
Let's think a little about our primitive statements. As we mentioned above, it seems daunting to come
up with primitive statements for all areas of mathematics at once, so let's think of the areas in isolation.
For instance, take group theory. A group is a set G equipped with a binary operation · (that is, · takes in
two elements x, y ∈ G and produces a new element of G denoted by x · y) and an element e satisfying

1. Associativity: For all x, y, z ∈ G, we have (x · y) · z = x · (y · z).
2. Identity: For all x ∈ G, we have x · e = x = e · x.
3. Inverses: For all x ∈ G, there exists y ∈ G such that x · y = e = y · x.


Although it is customary and certainly easier on the eyes to put · between two elements of the group,
let's instead use standard function notation in order to make the mathematical notation uniform across
different areas. In this setting, a group is a set G equipped with a function f : G × G → G and an element
e satisfying

1. For all x, y, z ∈ G, we have f(f(x, y), z) = f(x, f(y, z)).
2. For all x ∈ G, we have f(x, e) = x = f(e, x).
3. For all x ∈ G, there exists y ∈ G such that f(x, y) = e = f(y, x).
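As a quick sanity check (our own illustration, not part of the text), one can verify these three axioms by brute force for a small concrete group, say the integers modulo 5 under addition, with a function f playing the role of the operation:

```python
# Verify the three group axioms for Z_5 under addition mod 5, with e = 0.
# A concrete illustration of ours; any finite group would work the same way.

G = range(5)
f = lambda x, y: (x + y) % 5
e = 0

# 1. f(f(x, y), z) = f(x, f(y, z)) for all x, y, z
associativity = all(f(f(x, y), z) == f(x, f(y, z))
                    for x in G for y in G for z in G)
# 2. f(x, e) = x = f(e, x) for all x
identity = all(f(x, e) == x == f(e, x) for x in G)
# 3. for all x there exists y with f(x, y) = e = f(y, x)
inverses = all(any(f(x, y) == e == f(y, x) for y in G) for x in G)

print(associativity, identity, inverses)  # True True True
```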
In order to allow our language to make statements about groups, we introduce a function symbol, which we
denote by f, to represent the group operation, and a constant symbol, which we denote by e, to represent the
group identity. Now the group operation is supposed to take in two elements of the group, so if x and y are
variables, then we should allow the formation of f(x, y), which should denote an element of the group (once
we've assigned elements of the group to x and y). Also, we should allow the constant symbol to be used
in this way, allowing us to form things like f(x, e). Once we've formed these, we should be allowed to use
them like variables in more complicated expressions such as f(f(x, e), y). Each of these expressions formed
by putting together, perhaps repeatedly, variables and the constant symbol e using the function symbol f is
called a term. Intuitively, a term will name a certain element of the group once we've assigned elements to
the variables.
With a way to name group elements in hand, we're now in a position to say what our primitive statements
are. The most basic thing that we can say about two group elements is whether or not they are equal, so
we introduce a new equality symbol, which we will denote by the customary =. Given two terms t1 and t2,
we call the expression (t1 = t2) an atomic formula. These are our primitive statements.
With atomic formulas in hand, we can use the old connectives and the new quantifiers to make new
statements. This puts us in a position to define formulas. First off, all atomic formulas are formulas. Given
formulas we already know, we can put them together using the connectives above. Also, if φ is a formula
and x is a variable, then each of the following is a formula:

1. ∀xφ
2. ∃xφ
Perhaps without realizing it, we've described quite a powerful language which can make many nontrivial
statements. For instance, we can write formulas in this language which express the axioms for a group:

1. ∀x∀y∀z(f(f(x, y), z) = f(x, f(y, z)))
2. ∀x((f(x, e) = x) ∧ (f(e, x) = x))
3. ∀x∃y((f(x, y) = e) ∧ (f(y, x) = e))

We can also write a statement saying that the group is abelian:

∀x∀y(f(x, y) = f(y, x))

or that the center is nontrivial:

∃x((¬(x = e)) ∧ ∀y(f(x, y) = f(y, x)))

Perhaps unfortunately, we can also write syntactically correct formulas which express things nobody would ever
utter, such as:

∀x∃y∀x(¬(e = e))
What if you want to consider an area other than group theory? Commutative ring theory doesn't pose
much of a problem, so long as we're allowed to alter the number of function symbols and constant symbols.


We can simply have two function symbols a and m which take two arguments (a to represent addition and
m to represent multiplication) and two constant symbols 0 and 1 (0 to represent the additive identity and 1
to represent the multiplicative identity). Writing the axioms for commutative rings in this language is fairly
straightforward.
To take something fairly different, what about the theory of partially ordered sets? Recall that a partially
ordered set is a set P equipped with a subset ≤ of P × P, where we write x ≤ y to mean that (x, y) is an
element of this subset, satisfying

1. Reflexive: For all x ∈ P, we have x ≤ x.
2. Antisymmetric: If x, y ∈ P are such that x ≤ y and y ≤ x, then x = y.
3. Transitive: If x, y, z ∈ P are such that x ≤ y and y ≤ z, then x ≤ z.
Analogous to the syntax we used when handling the group operation, we will use notation which puts the
ordering in front of the two arguments. Doing so may seem odd at this point, given that we're putting equality
in the middle, but we'll see that this provides a unifying notation for other similar objects. We thus introduce
a relation symbol R (intuitively representing ≤), and we keep the equality symbol =, but we no longer have
a need for constant symbols or function symbols.
In this setting without constant or function symbols, the only terms that we have (i.e., the only names for
elements of the partially ordered set) are the variables. However, our atomic formulas are more interesting,
because now there are two basic things we can say about elements of the partial ordering: whether they are
equal and whether they are related by the ordering. Thus, our atomic formulas are things of the form t1 = t2
and R(t1, t2), where t1 and t2 are terms. From these atomic formulas, we build up all our formulas as above.
We can now write formulas expressing the axioms of partial orderings:

1. ∀xR(x, x)
2. ∀x∀y((R(x, y) ∧ R(y, x)) → (x = y))
3. ∀x∀y∀z((R(x, y) ∧ R(y, z)) → R(x, z))

We can also write a statement saying that the partial ordering is a linear ordering:

∀x∀y(R(x, y) ∨ R(y, x))

or that there exists a maximal element:

∃x∀y(R(x, y) → (x = y))
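To see the axioms in action on a concrete ordering (a hypothetical example of ours, not from the text), take divisibility on the set {1, 2, 4} and check each axiom by brute force, mirroring the quantifier structure of the formulas above:

```python
# Check the partial-order axioms for divisibility on {1, 2, 4}, reading
# R(x, y) as "x divides y". A concrete illustration of ours.

P = {1, 2, 4}
R = {(x, y) for x in P for y in P if y % x == 0}

reflexive = all((x, x) in R for x in P)                     # ∀x R(x, x)
antisymmetric = all(x == y for (x, y) in R if (y, x) in R)  # ∀x∀y((R(x,y) ∧ R(y,x)) → x = y)
transitive = all((a, c) in R                                # ∀x∀y∀z((R(x,y) ∧ R(y,z)) → R(x,z))
                 for (a, b) in R for (b2, c) in R if b == b2)
linear = all((x, y) in R or (y, x) in R for x in P for y in P)

print(reflexive, antisymmetric, transitive)  # True True True
print(linear)                                # True here, since 1 | 2 | 4 is a chain
```

On a set such as {2, 3, 6} the first three checks would still pass but `linear` would fail, since neither 2 divides 3 nor 3 divides 2.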
The general idea is that by leaving flexibility in the types and number of constant symbols, relation
symbols, and function symbols, we'll be able to handle many areas of mathematics. We call this setup
first-order logic. An analysis of first-order logic will consume the vast majority of our time.
Now we dont claim that first-order logic allows us to do and express everything in mathematics, nor do
we claim that each of the setups above allow us to do and express everything of importance in that particular
field. For example, take the group theory setting above. We can express that every nonidentity element has
order two with:
x(f(x, x) = e)
but its unclear how to say that every element of the group has finite order. The natural guess is:
xn(xn = e)
but this poses a problem for two reasons. The first is that our variables are supposed to quantify over
elements of the group in question, not the natural numbers. The second is that we put no construction in

our language to allow us to write something like x^n. For each fixed n, we can express it (for example, for n = 3, we can write f(f(x, x), x), and for n = 4, we can write f(f(f(x, x), x), x)), but it's not clear how to write it in a general way that would even allow quantification over the natural numbers.
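For concreteness, here is a small Python sketch (our illustration, not part of the formal development; the function name is invented) that mechanically builds the term expressing x^n for any fixed n. It makes plain that n is a parameter living outside the formal language: each n yields a different formula, and no single formula quantifies over n.

```python
def power_term(n):
    """Build the formal group-language term for x^n using only the
    binary function symbol f; e.g. n = 3 yields 'f(f(x, x), x)'.
    Note that n is a Python parameter, not a variable of the language."""
    term = "x"
    for _ in range(n - 1):
        term = "f({}, x)".format(term)
    return term

print(power_term(3))          # f(f(x, x), x)
print(power_term(4) + " = e") # f(f(f(x, x), x), x) = e
```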
Another example is trying to express that a group is simple (i.e. has no nontrivial normal subgroups).
The natural instinct is to quantify over all subsets H of the group G, and say that if it so happens that
H is a normal subgroup, then H is either trivial or everything. However, we have no way to quantify over
subsets. It's certainly possible to allow such constructions, and this gives second-order logic. If you allow
quantifications over sets of subsets (for example one way of expressing that a ring is Noetherian is to say
that every nonempty set of ideals has a maximal element), you get third-order logic, etc.
Newcomers to the field often find it strange that we focus primarily on first-order logic. There are many
reasons to give special attention to first-order logic that will be developed throughout our study, but for
now you should think of it as providing a simple example of a language which is capable of expressing many
important aspects of various branches of mathematics.

1.3 Syntax and Semantics

In the above discussion, we introduced symbols to denote certain concepts (such as using ∧ in place of "and", ∀ in place of "for all", and a function symbol f in place of the group operation ·). Building and maintaining a careful distinction between formal symbols and how to interpret them is a fundamental aspect of mathematical logic.
The basic structure of the formal statements that we write down using the symbols, connectives, and
quantifiers is known as the syntax of the logic that were developing. This corresponds to the grammar
of the language in question with no thought given to meaning. Imagine an English instructor who cared
nothing for the content of your writings, but only that the it was grammatically correct. That is exactly
what the syntax of a logic is all about. Syntax is combinatorial in nature and is based on rules which provide
admissible ways to manipulate symbols devoid of meaning.
The manner in which we are permitted (or forced) to interpret the symbols, connectives, and quantifiers is known as the semantics of the logic that we're developing. In a logic, some symbols are to be interpreted in only one way. For instance, in the above examples, we interpret the symbol ∧ to mean "and". In the propositional logic setting, this doesn't settle how to interpret a formula because we haven't said how to
interpret the elements of P . We have some flexibility here, but once we assert that we should interpret
certain elements of P as true and the others as false, our formulas express statements that are either true
or false.
The first-order logic setting is more complicated. Since we have quantifiers, the first thing that must be
done in order to interpret a formula is to fix a set X which will act as the set of objects over which the
quantifiers will range. Once this is done, we can interpret each function symbol f taking k arguments as an actual function f : X^k → X, each relation symbol R taking k arguments as a subset of X^k, and each constant symbol c as an element of X. Once we've fixed what we're talking about by providing such interpretations, we can view our formulas as expressing something meaningful. For example, if we've fixed a group G and interpreted f as the group operation and e as the identity, the formula
∀x∀y(f(x, y) = f(y, x))
is either true or false, according to whether G is abelian or not.
Always keep the distinction between syntax and semantics clear in your mind. Many basic theorems of
the subject involve the interplay between syntax and semantics. For example, in the logics we discuss, we will
have two types of implication between formulas. Let Γ be a set of formulas and let ϕ be a formula. One way of saying that the formulas in Γ imply ϕ is semantic: whenever we provide an interpretation which makes all of the formulas of Γ true, it happens that ϕ is also true. For instance, if we're working in propositional logic

and we have Γ = {((A ∨ B) → C)} and ϕ = (A → C), then Γ implies ϕ in this sense because no matter how we assign true/false values to A, B, and C that make the formulas in Γ true, it happens that ϕ will also be true.
Another approach that we'll develop is syntactic. We'll define deductions, which are formal proofs built from certain permissible syntactic manipulations, and Γ will imply ϕ in this sense if there is a witnessing deduction. The Soundness Theorem and the Completeness Theorem for first-order logic (and propositional logic) say that the semantic version and the syntactic version are the same. This result amazingly allows one to mimic mathematical reasoning with purely syntactic manipulations.

1.4 The Point of It All

One important aspect, often mistaken as the only aspect, of mathematical logic is that it allows us to study
mathematical reasoning. A prime example of this is given by the last sentence of the previous section.
The Completeness Theorem says that we can capture the idea of one mathematical statement following
from other mathematical statements with nothing more than syntactic rules on symbols. This is certainly
computationally, philosophically, and foundationally interesting, but it's much more than that. A simple
consequence of this result is the Compactness Theorem, which says something very deep about mathematical
reasoning and has many interesting applications in mathematics.
Although we've developed the above logics with modest goals of handling certain fields of mathematics, it's a wonderful and surprising fact that we can embed (nearly) all of mathematics in an elegant and natural first-order system: first-order set theory. This opens the door to the possibility of proving that certain mathematical statements are independent of our usual axioms. That is, that there are formulas ϕ such that there is no deduction from the usual axioms of either ϕ or (¬ϕ). Furthermore, the field of set theory has blossomed into an intricate field with its own deep and interesting questions.
Other very interesting and fundamental subjects arise when we ignore the foundational aspects and
deductions altogether, and simply look at what we've accomplished by establishing a precise language to describe an area of mathematics. With a language in hand, we now have a way to say that certain objects are definable in that language. For instance, take the language of commutative rings mentioned above. If we fix a particular commutative ring, then the formula
∃y(m(x, y) = 1)
has a free variable x and defines the set of units in the ring. With this point of view, we've opened up
the possibility of proving lower bounds on the complexity of any definition of a certain object, or even of
proving that no such definition exists in the language.
Another, closely related, way to take our definitions of precise languages and run with it is the subject
of model theory. In group theory, you state some axioms and work from there in order to study all possible
realizations of the axioms, i.e. groups. However, as we saw above, the group axioms arise in one possible
language with one possible set of axioms. Instead, we can study all possible languages and all possible sets
of axioms and see what we can prove in general and how the realizations compare to each other. In this
sense, model theory is a kind of abstract abstract algebra.
Finally, although it's probably far from clear how it fits in at this point, computability theory is intimately
related to the above subjects. To see the first glimmer of a connection, notice that computer programming
languages are also formal languages with a precise grammar and a clear distinction between syntax and
semantics. As well see in time, however, the connection is much deeper.

1.5 Some Basic Terminology and Notation

Definition 1.5.1. We let N = {0, 1, 2, . . . } and we let N+ = N\{0}.

Definition 1.5.2. For each n ∈ N, we let [n] = {m ∈ N : m < n}.


We will often find a need to work with finite sequences, so we establish notation here.
Definition 1.5.3. Let X be a set. Given n ∈ N, we call a function σ : [n] → X a finite sequence from X of length n. We denote the set of all finite sequences from X of length n by X^n. We use λ to denote the unique sequence of length 0, so X^0 = {λ}.
Definition 1.5.4. Let X be a set. We let X* = ⋃_{n∈N} X^n, i.e. X* is the set of all finite sequences from X.
We denote finite sequences by simply listing the elements in order. For instance, if X = {a, b}, the sequence aababbba is an element of X*. Sometimes, for clarity, we'll insert commas and instead write a, a, b, a, b, b, b, a.
Definition 1.5.5. If σ, τ ∈ X*, we say that σ is an initial segment of τ, and write σ ⊆ τ, if σ = τ ↾ [n] for some n. We say that σ is a proper initial segment of τ, and write σ ⊊ τ, if σ ⊆ τ and σ ≠ τ.
Definition 1.5.6. If σ, τ ∈ X*, we denote the concatenation of σ and τ by στ or σ ∗ τ.
Definition 1.5.7. If σ, τ ∈ X*, we say that σ is a substring of τ if there exist θ, ρ ∈ X* such that τ = θσρ.
Definition 1.5.8. A set A is countably infinite if there exists a bijection g : N → A. A set A is countable if it is either finite or countably infinite.
Definition 1.5.9. Given a set A, we let P(A) be the set of all subsets of A, and we call P(A) the power set
of A.

Chapter 2

Induction and Recursion


The natural numbers are perhaps the only structure that you've had the pleasure of working with when doing proofs by induction or definitions by recursion, but there are many more arenas in which variants of induction and recursion apply. In fact, more delicate and exotic proofs by induction and definitions by recursion are two central tools in mathematical logic. Once we get to set theory, we'll see how to do transfinite induction and recursion, and this tool is invaluable in set theory and model theory. In this section, we develop the more modest tools of induction and recursion along structures which are generated by one-step processes, like the natural numbers. Occasionally, these types of induction are called structural induction.

2.1 Induction and Recursion on N

We begin by compiling the basic facts about induction and recursion on the natural numbers. We don't seek to prove that proofs by induction or definitions by recursion on N are valid methods, because these are obvious from the normal mathematical perspective which we are adopting. Besides, in order to do so, we would first have to fix a context in which we are defining N. Eventually, we will indeed carry out such a construction in the context of axiomatic set theory, but that is not our current goal. Although you're no doubt familiar with the intuitive content of the results below, our goal here is simply to carefully codify these facts in more precise ways to ease the transition to more complicated types of induction and recursion.
Definition 2.1.1. We define S : N → N by letting S(n) = n + 1 for all n ∈ N.
Induction is often stated in the form "If we know something holds of 0, and we know that it holds of S(n) whenever it holds of n, then we know that it holds for all n ∈ N". We state it in the following more precise set-theoretic fashion (avoiding explicit mention of "somethings" or properties) because we can always form the set X = {n ∈ N : something holds of n}.
Theorem 2.1.2 (Induction on N - Step Form). Suppose that X ⊆ N is such that 0 ∈ X and S(n) ∈ X whenever n ∈ X. We then have X = N.
Definition by recursion is usually referred to by saying that "when defining f(S(n)), you are allowed to refer to the value of f(n)". For instance, let f : N → N be the factorial function f(n) = n!. One usually sees this defined in the following manner:
f(0) = 1
f(S(n)) = S(n) · f(n)
We aim to codify this idea a little more abstractly and rigorously, both to avoid the self-reference to f in the definition and to isolate the allowable rules, so that we can generalize it to other situations.
Suppose that X is a set and we're trying to define f : N → X recursively. What do we need? Well, we need to know f(0), and we need to have a method telling us how to define f(S(n)) from knowledge of n and the value of f(n). If we want to avoid the self-referential appeal to f when invoking the value of f(n), what we need is a method which tells us what to do next regardless of the actual particular value of f(n). That is, it needs to tell us what to do on any possible value, not just the one that ends up happening to be f(n). Formally, this method can be given by a function g : N × X → X which tells us what to do at the next step. Intuitively, this function acts as an iterator. That is, it says that if the last thing you were working on was input n and it so happened that you set f(n) to equal x ∈ X, then you should define f(S(n)) to be the value g(n, x).
With all this setup, we now state the theorem which says that no matter what value you want to assign to f(0), and no matter what iterating function g : N × X → X you give, there exists a unique function f : N → X obeying the rules.
Theorem 2.1.3 (Recursion on N - Step Form). Let X be a set, let y ∈ X, and let g : N × X → X. There exists a unique function f : N → X such that
1. f(0) = y.
2. f(S(n)) = g(n, f(n)) for all n ∈ N.
In the case of the factorial function, we have X = N, y = 1, and g : N × N → N defined by g(n, x) = S(n) · x. The above theorem implies that there is a unique function f : N → N such that
1. f(0) = y = 1.
2. f(S(n)) = g(n, f(n)) = S(n) · f(n) for all n ∈ N.
Notice how we moved any mention of self-reference out of the definition of g, and pushed all of the weight
onto the theorem which states the existence and uniqueness of a function which behaves properly, i.e. which
satisfies the initial condition and appropriate recursive equation.
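The content of this theorem can be illustrated with a short Python sketch (our illustration, not part of the formal development): the unique f determined by y and g is computed by iterating g, with no self-reference anywhere, and the factorial function falls out as the special case just described.

```python
def step_recursion(y, g, n):
    """Compute f(n) for the unique f with f(0) = y and
    f(S(n)) = g(n, f(n)), by iterating g from the bottom up."""
    value = y
    for k in range(n):
        value = g(k, value)
    return value

# Factorial: y = 1 and g(n, x) = S(n) * x, exactly as in the text.
factorial = lambda n: step_recursion(1, lambda k, x: (k + 1) * x, n)
print([factorial(n) for n in range(6)])  # [1, 1, 2, 6, 24, 120]
```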
There is another version of induction on N, sometimes called strong induction, which appeals to the
ordering of the natural numbers rather than the stepping of the successor function.
Theorem 2.1.4 (Induction on N - Order Form). Suppose that X ⊆ N is such that n ∈ X whenever m ∈ X for all m < n. We then have X = N.
Notice that there is no need to deal with the base case of n = 0, because this is handled vacuously due
to the fact that there is no m < 0.
Theorem 2.1.5 (Recursion on N - Order Form). Let X be a set and let g : X* → X. There exists a unique function f : N → X such that
f(n) = g(f ↾ [n])
for all n ∈ N.
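As an illustration of the order form (ours, not from the text; the Fibonacci instance is our invented example), we can sketch it in Python by letting g see the entire list of earlier values, playing the role of f ↾ [n]:

```python
def order_recursion(g, n):
    """Compute f(n) for the unique f with f(n) = g(f restricted to [n]):
    g receives the whole list [f(0), ..., f(n-1)] of earlier values."""
    values = []
    for _ in range(n + 1):
        values.append(g(values))
    return values[n]

# Fibonacci: g looks at the last two earlier values, defaulting to the
# index itself when fewer than two values exist.
def g_fib(prev):
    return len(prev) if len(prev) < 2 else prev[-1] + prev[-2]

print([order_recursion(g_fib, n) for n in range(8)])  # [0, 1, 1, 2, 3, 5, 8, 13]
```

Notice that, as in Theorem 2.1.4, no separate base case is needed: when n = 0, the restriction f ↾ [0] is the empty sequence, and g alone decides the value.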

2.2 Generation

There are many situations throughout mathematics when we want to look at what a certain subset generates. For instance, you have a subset of a group (vector space, ring), and you want to consider the subgroup (subspace, ideal) that it generates. Another example is when you have a subset of the vertices of a graph, and you want to consider the set of vertices in the graph reachable from the subset. In the introduction, we talked about generating all formulas from primitive formulas using certain connectives. This situation will arise so frequently in what follows that it's a good idea to unify all of these examples in a common framework.

Definition 2.2.1. Let A be a set and let k ∈ N+. A function h : A^k → A is called a k-ary function on A. We call k the arity of h. A 1-ary function is sometimes called unary, and a 2-ary function is sometimes called binary.
Definition 2.2.2. Suppose that A is a set, B ⊆ A, and H is a collection of functions such that each h ∈ H is a k-ary function on A for some k ∈ N+. We call (A, B, H) a simple generating system. In such a situation, for each k ∈ N+, we denote the set of k-ary functions in H by Hk.
Examples.
1. Let G be a group and let B ⊆ G. We want the subgroup of G that B generates. The operations in question here are the group operation and inversion, so we let H = {h1, h2} where
(a) h1 : G^2 → G is given by h1(x, y) = x · y for all x, y ∈ G.
(b) h2 : G → G is given by h2(x) = x⁻¹ for all x ∈ G.
(G, B, H) is a simple generating system.
2. Let V be a vector space over R and let B ⊆ V. We want the subspace of V that B generates. The operations in question consist of vector addition and scalar multiplication, so we let H = {g} ∪ {hλ : λ ∈ R} where
(a) g : V^2 → V is given by g(v, w) = v + w for all v, w ∈ V.
(b) For each λ ∈ R, hλ : V → V is given by hλ(v) = λ · v for all v ∈ V.
(V, B, H) is a simple generating system.

There are certain cases when the natural functions to put into H are not total, or are multi-valued. For instance, in the first example below, we'll talk about the subfield generated by a certain subset of a field, and we'll want to include multiplicative inverses for all nonzero elements. When putting a corresponding function in H, there is no obvious way to define it on 0. Also, when generating the vertices reachable from a subset of a graph, we may want to throw in many vertices at once, because a vertex can be linked to many others.
Definition 2.2.3. Let A be a set and let k ∈ N+. A function h : A^k → P(A) is called a set-valued k-ary function on A. We call k the arity of h. A 1-ary set-valued function is sometimes called unary, and a 2-ary set-valued function is sometimes called binary.
Definition 2.2.4. Suppose that A is a set, B ⊆ A, and H is a collection of functions such that each h ∈ H is a set-valued k-ary function on A for some k ∈ N+. We call (A, B, H) a generating system. In such a situation, for each k ∈ N+, we denote the set of set-valued k-ary functions in H by Hk.
Examples.
1. Let K be a field and let B ⊆ K. We want the subfield of K that B generates. The operations in question here are addition, multiplication, and both additive and multiplicative inverses. We thus let H = {h1, h2, h3, h4} where
(a) h1 : K^2 → P(K) is given by h1(a, b) = {a + b} for all a, b ∈ K.
(b) h2 : K^2 → P(K) is given by h2(a, b) = {a · b} for all a, b ∈ K.
(c) h3 : K → P(K) is given by h3(a) = {−a} for all a ∈ K.

(d) h4 : K → P(K) is given by h4(a) = {a⁻¹} if a ≠ 0, and h4(a) = ∅ if a = 0.
(K, B, H) is a generating system.


2. Let G be a graph with vertex set V and edge set E, and let B ⊆ V. We want to consider the subset of V reachable from B using edges from E. Thus, we want to say that if we've generated v ∈ V, and w ∈ V is connected to v via some edge, then we should generate w. We thus let H = {h} where h : V → P(V) is defined as follows:
h(v) = {u ∈ V : (v, u) ∈ E}
(V, B, H) is a generating system.

Notice that if we have a simple generating system (A, B, H), then we can associate to it the generating system (A, B, H'), where H' = {h' : h ∈ H} and where, if h : A^k → A is an element of Hk, then h' : A^k → P(A) is defined by letting h'(a1, a2, . . . , ak) = {h(a1, a2, . . . , ak)}.
Given a generating system (A, B, H), we want to define the set of elements of A generated from B using the functions in H. There are many natural ways of doing this. We discuss three different ways, which divide into approaches "from above" and approaches "from below". Each of these descriptions can be slightly simplified for simple generating systems, but it's not much harder to handle the more general case.

2.2.1 From Above

Our first approach is a top-down approach.


Definition 2.2.5. Let (A, B, H) be a generating system, and let J ⊆ A. We say that J is inductive if
1. B ⊆ J.
2. If k ∈ N+, h ∈ Hk, and a1, a2, . . . , ak ∈ J, then h(a1, a2, . . . , ak) ⊆ J.
Given a generating system (A, B, H), we certainly have a candidate for an inductive set, namely A itself. However, this set may be too big. For instance, consider the generating system A = R, B = {7}, and H = {h} where h : R → R is the function h(x) = 2x. In this situation, each of the sets R, Z, N, and {n ∈ N : n is a multiple of 7} is inductive, but they're not what we want. The idea is to consider the smallest inductive subset of A containing B. Of course, we need to prove that such a set exists.
Proposition 2.2.6. Let (A, B, H) be a generating system. There exists a unique inductive set I such that I ⊆ J for every inductive set J.
Proof. We first prove existence. Let I be the intersection of all inductive sets, i.e. I = {a ∈ A : a ∈ J for every inductive set J}. By definition, we have I ⊆ J for every inductive set J, so we need only show that I is inductive. Since B ⊆ J for every inductive set J (by definition of inductive), it follows that B ⊆ I. Suppose that k ∈ N+, h ∈ Hk, and a1, a2, . . . , ak ∈ I. For any inductive set J, we have a1, a2, . . . , ak ∈ J, hence h(a1, a2, . . . , ak) ⊆ J because J is inductive. Therefore, h(a1, a2, . . . , ak) ⊆ J for every inductive set J, hence h(a1, a2, . . . , ak) ⊆ I. It follows that I is inductive.
To see uniqueness, suppose that both I1 and I2 are inductive sets such that I1 ⊆ J and I2 ⊆ J for every inductive set J. We then have I1 ⊆ I2 and I2 ⊆ I1, hence I1 = I2.
Definition 2.2.7. Let (A, B, H) be a generating system. We denote the unique set of the previous proposition
by I(A, B, H), or simply by I when the context is clear.

2.2.2 From Below: Building by Levels

The second idea is to make a system of levels, at each new level adding elements of A which are reachable from elements already accumulated by applying an element of H.
Definition 2.2.8. Let (A, B, H) be a generating system. We define a sequence Vn(A, B, H), or simply Vn, recursively as follows.
V0 = B
Vn+1 = Vn ∪ {c ∈ A : there exist k ∈ N+, h ∈ Hk, and a1, a2, . . . , ak ∈ Vn such that c ∈ h(a1, a2, . . . , ak)}
Let V (A, B, H) = V = ⋃_{n∈N} Vn.

The following remarks are immediate from our definition.


Remark 2.2.9. Let (A, B, H) be a generating system.
1. If m ≤ n, then Vm ⊆ Vn.
2. For all c ∈ V, either c ∈ B or there exist k ∈ N+, h ∈ Hk, and a1, a2, . . . , ak ∈ V with c ∈ h(a1, a2, . . . , ak).
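For finite fragments, the levels can be computed directly. Here is a Python sketch (our illustration, simplified to unary set-valued functions, whereas Definition 2.2.8 allows arbitrary arities), applied to the system A = R, B = {7}, h(x) = 2x used above to motivate inductive sets:

```python
def levels(base, set_valued_fns, n):
    """Compute V_0, ..., V_n for a generating system with unary
    set-valued functions: V_{n+1} adds every c in h(a) with a in V_n."""
    V = set(base)
    out = [set(V)]
    for _ in range(n):
        new = set(V)
        for h in set_valued_fns:
            for a in V:
                new |= h(a)
        V = new
        out.append(set(V))
    return out

# The system B = {7} with h(x) = 2x (as a set-valued function):
Vs = levels({7}, [lambda x: {2 * x}], 4)
print(sorted(Vs[-1]))  # [7, 14, 28, 56, 112]
```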

2.2.3 From Below: Witnessing Sequences

The third method is to consider those elements of A which you are forced to put in because you see a witnessing construction.
Definition 2.2.10. Let (A, B, H) be a generating system. A witnessing sequence is an element σ ∈ A*\{λ} such that for all j < |σ|, either
1. σ(j) ∈ B, or
2. there exist k ∈ N+, h ∈ Hk, and i1, i2, . . . , ik < j such that σ(j) ∈ h(σ(i1), σ(i2), . . . , σ(ik)).
If σ is a witnessing sequence, we call it a witnessing sequence for σ(|σ| − 1) (i.e. a witnessing sequence for the last element of the sequence).
Definition 2.2.11. Let (A, B, H) be a generating system. Set
W (A, B, H) = W = {a ∈ A : there exists a witnessing sequence for a}.
It is sometimes useful to look only at those elements which are witnessed by sequences of a bounded length, so for each n ∈ N+, set
Wn = {a ∈ A : there exists a witnessing sequence for a of length n}.
The first simple observation is that if we truncate a witnessing sequence, what remains is a witnessing
sequence.
Remark 2.2.12. If σ is a witnessing sequence and |σ| = n, then for all m ∈ N+ with m < n, we have that σ ↾ [m] is a witnessing sequence.
Another straightforward observation is that if we concatenate two witnessing sequences, the result is a
witnessing sequence.
Proposition 2.2.13. If σ and τ are witnessing sequences, then so is στ.
Finally, since we can always insert dummy elements from B (assuming B is nonempty, because otherwise the result is trivial), we have the following observation.
Proposition 2.2.14. Let (A, B, H) be a generating system. If m ≤ n, then Wm ⊆ Wn.
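Definition 2.2.10 is easy to check mechanically for finite data. The following Python sketch (our illustration; the function and variable names are invented) tests whether a given finite sequence is a witnessing sequence, again using the B = {7}, h(x) = 2x system:

```python
from itertools import product

def is_witnessing(seq, base, fns, arities):
    """Check Definition 2.2.10: a nonempty sequence is witnessing if each
    entry lies in B or arises by applying some h to strictly earlier entries."""
    if not seq:
        return False
    for j, c in enumerate(seq):
        if c in base:
            continue
        if not any(c in h(*args)
                   for h, k in zip(fns, arities)
                   for args in product(seq[:j], repeat=k)):
            return False
    return True

double = lambda x: {2 * x}
print(is_witnessing([7, 14, 28], {7}, [double], [1]))  # True
print(is_witnessing([7, 28], {7}, [double], [1]))      # False: 28 needs 14 first
```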

2.2.4 Equivalence of the Definitions

Theorem 2.2.15. Let (A, B, H) be a generating system. We then have
I(A, B, H) = V (A, B, H) = W (A, B, H)
Proof. Let I = I(A, B, H), V = V (A, B, H), and W = W (A, B, H).
We first show that V is inductive, hence I ⊆ V. Notice first that B = V0 ⊆ V. Suppose now that k ∈ N+, h ∈ Hk, and a1, a2, . . . , ak ∈ V. For each i, fix ni such that ai ∈ Vni. Let m = max{n1, n2, . . . , nk}. We then have ai ∈ Vm for all i, hence h(a1, a2, . . . , ak) ⊆ Vm+1 ⊆ V. It follows that V is inductive.
We next show that W is inductive, hence I ⊆ W. Notice first that for every b ∈ B, the sequence b is a witnessing sequence, so b ∈ W1 ⊆ W. Suppose now that k ∈ N+, h ∈ Hk, and a1, a2, . . . , ak ∈ W. Let c ∈ h(a1, a2, . . . , ak). For each i, fix a witnessing sequence σi for ai. The sequence σ1σ2 · · · σk c is a witnessing sequence for c. Therefore, h(a1, a2, . . . , ak) ⊆ W. It follows that W is inductive.
We next show that Vn ⊆ I by induction on n, and hence V ⊆ I. Notice first that V0 = B ⊆ I. Suppose now that n ∈ N and Vn ⊆ I. Fix k ∈ N+, h ∈ Hk, and a1, a2, . . . , ak ∈ Vn. Since Vn ⊆ I, we have a1, a2, . . . , ak ∈ I, hence h(a1, a2, . . . , ak) ⊆ I because I is inductive. It follows that Vn+1 ⊆ I. By induction, Vn ⊆ I for every n ∈ N, hence V ⊆ I.
We next show that Wn ⊆ I by induction on n ∈ N+, and hence W ⊆ I. Notice first that W1 = B ⊆ I. Suppose now that n ∈ N+ and Wn ⊆ I. Let σ be a witnessing sequence of length n + 1. We then have that σ ↾ [m + 1] is a witnessing sequence of length m + 1 for all m < n, hence σ(m) ∈ Wm+1 ⊆ Wn ⊆ I for all m < n. Now either σ(n) ∈ B, or there exist k ∈ N+, h ∈ Hk, and i1, i2, . . . , ik < n such that σ(n) ∈ h(σ(i1), σ(i2), . . . , σ(ik)). In either case, σ(n) ∈ I because I is inductive. It follows that Wn+1 ⊆ I. By induction, Wn ⊆ I for every n ∈ N+, hence W ⊆ I.
Definition 2.2.16. Let (A, B, H) be a generating system. We denote the common value of I, V, W by
G(A, B, H) or simply G.
The nice thing about having multiple equivalent definitions for the same concept is that we can use the
most convenient one when proving a theorem. For example, using (2) of Remark 2.2.9, we get the following
corollary.
Corollary 2.2.17. Let (A, B, H) be a generating system. For all c ∈ G, either c ∈ B or there exist k ∈ N+, h ∈ Hk, and a1, a2, . . . , ak ∈ G with c ∈ h(a1, a2, . . . , ak).

2.3 Step Induction

Here's a simple example of using the I definition to prove that we can argue by induction.
Proposition 2.3.1 (Step Induction). Let (A, B, H) be a generating system. Suppose that X ⊆ A satisfies
1. B ⊆ X.
2. h(a1, a2, . . . , ak) ⊆ X whenever k ∈ N+, h ∈ Hk, and a1, a2, . . . , ak ∈ X.
We then have that G ⊆ X. Thus, if X ⊆ G, we have X = G.
Proof. Our assumption simply asserts that X is inductive, hence G = I ⊆ X.
The next example illustrates how we can sometimes identify G explicitly. Notice that we use 2 different
types of induction in the argument. One direction uses induction on N and the other uses induction on G
as just described.

Example 2.3.2. Consider the following simple generating system. Let A = R, B = {7}, and H = {h}, where h : R → R is the function h(x) = 2x. Determine G explicitly.
Proof. Intuitively, we want the set {7, 14, 28, 56, . . . }, which we can write more formally as {7 · 2^n : n ∈ N}. Let X = {7 · 2^n : n ∈ N}.
We first show that X ⊆ G by showing that 7 · 2^n ∈ G for all n ∈ N by induction (on N). We have 7 · 2^0 = 7 · 1 = 7 ∈ G because B ⊆ G, as G is inductive. Suppose that n ∈ N is such that 7 · 2^n ∈ G. Since G is inductive, it follows that h(7 · 2^n) = 2 · 7 · 2^n = 7 · 2^(n+1) ∈ G. Therefore, 7 · 2^n ∈ G for all n ∈ N by induction, hence X ⊆ G.
We now show that G ⊆ X by induction (on G). Notice that B ⊆ X because 7 = 7 · 1 = 7 · 2^0 ∈ X. Suppose now that x ∈ X, and fix n ∈ N with x = 7 · 2^n. We then have h(x) = 2 · x = 7 · 2^(n+1) ∈ X. Therefore, G ⊆ X by induction.
It follows that X = G.
In many cases, it's very hard to give a simple explicit description of the set G. This is where induction really shines, because it allows us to prove something about all elements of G despite the fact that we have a hard time getting a handle on what exactly the elements of G look like. Here's an example.
Example 2.3.3. Consider the following simple generating system. Let A = Z, B = {6, 183}, and H = {h}, where h : A^3 → A is given by h(k, m, n) = k · m + n. Every element of G is divisible by 3.
Proof. Let X = {n ∈ Z : n is divisible by 3}. We prove by induction that G ⊆ X. We first handle the base case. Notice that 6 = 3 · 2 and 183 = 3 · 61, so B ⊆ X.
We now do the inductive step. Suppose that k, m, n ∈ X, and fix ℓ1, ℓ2, ℓ3 ∈ Z with k = 3ℓ1, m = 3ℓ2, and n = 3ℓ3. We then have
h(k, m, n) = k · m + n
= (3ℓ1) · (3ℓ2) + 3ℓ3
= 9ℓ1ℓ2 + 3ℓ3
= 3(3ℓ1ℓ2 + ℓ3),
hence h(k, m, n) ∈ X.
It follows by induction that G ⊆ X, i.e. that every element of G is divisible by 3.
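Although we cannot list G explicitly, we can generate finite approximations to it and observe the divisibility claim empirically. The following Python sketch (our illustration; the names are invented) runs a couple of rounds of the level construction from Section 2.2.2:

```python
from itertools import product

def generate(base, fns_with_arity, rounds):
    """Finitely approximate G by iterating the level construction:
    each round applies every k-ary function to all k-tuples so far."""
    G = set(base)
    for _ in range(rounds):
        new = set(G)
        for h, k in fns_with_arity:
            for args in product(G, repeat=k):
                new.add(h(*args))
        G = new
    return G

# A = Z, B = {6, 183}, h(k, m, n) = k * m + n, as in the example above.
h = lambda k, m, n: k * m + n
G_approx = generate({6, 183}, [(h, 3)], 2)
print(all(x % 3 == 0 for x in G_approx))  # True
```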

2.4 Step Recursion

In this section, we restrict attention to simple generating systems for simplicity (and also because all of the examples that we'll need which support definition by recursion will be simple). Naively, one might expect that a straightforward analogue of the Step Form of Recursion on N will carry over to recursion on generated sets. The hope would be the following.
Hope 2.4.1. Suppose that (A, B, H) is a simple generating system and X is a set. Suppose also that α : B → X and that for every h ∈ Hk, we have a function gh : (A × X)^k → X. There exists a unique function f : G → X such that
1. f(b) = α(b) for all b ∈ B.
2. f(h(a1, a2, . . . , ak)) = gh(a1, f(a1), a2, f(a2), . . . , ak, f(ak)) for all a1, a2, . . . , ak ∈ G.
Unfortunately, this hope is too good to be true. Intuitively, we may generate an element a of A in many very different ways, and our different iterating functions conflict on what values we should assign to a. Here's a simple example to see what can go wrong.

Example 2.4.2. Consider the following simple generating system. Let A = {1, 2}, B = {1}, and H = {h}, where h : A → A is given by h(1) = 2 and h(2) = 1. Let X = N. Define α : B → N by letting α(1) = 1, and define gh : A × N → N by letting gh(a, n) = n + 1. There is no function f : G → N such that
1. f(b) = α(b) for all b ∈ B.
2. f(h(a)) = gh(a, f(a)) for all a ∈ G.
Proof. Notice first that G = {1, 2}. Suppose that f : G → N satisfies (1) and (2) above. Since f satisfies (1), we must have f(1) = α(1) = 1. By (2), we then have that
f(2) = f(h(1)) = gh(1, f(1)) = f(1) + 1 = 1 + 1 = 2.
By (2) again, it follows that
f(1) = f(h(2)) = gh(2, f(2)) = f(2) + 1 = 2 + 1 = 3,
contradicting the fact that f(1) = 1.
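The conflict in this proof can be watched directly. The following Python sketch (our illustration) follows the two requirements around the 2-cycle h(1) = 2, h(2) = 1 and exhibits the disagreement:

```python
# Applying f(h(a)) = g_h(a, f(a)) = f(a) + 1 around the cycle returns
# to the element 1 with a different value, so no such f can exist.
h = {1: 2, 2: 1}
g_h = lambda a, n: n + 1

f = {1: 1}              # requirement (1): f(1) = 1
f[h[1]] = g_h(1, f[1])  # requirement (2) with a = 1 gives f(2) = 2
forced = g_h(2, f[2])   # requirement (2) with a = 2 forces f(1) = 3
print(forced, f[1])     # 3 1 -- the two constraints on f(1) disagree
```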
To get around this problem, we want a definition of a "nice" simple generating system. Intuitively, we want to say something like "every element of G is generated in a unique way". The following definition is a relatively straightforward way to formulate this.
Definition 2.4.3. A simple generating system (A, B, H) is free if
1. ran(h ↾ G^k) ∩ B = ∅ whenever h ∈ Hk.
2. h ↾ G^k is injective for every h ∈ Hk.
3. ran(h1 ↾ G^k) ∩ ran(h2 ↾ G^ℓ) = ∅ whenever h1 ∈ Hk and h2 ∈ Hℓ with h1 ≠ h2.
Here's a simple example which will play a role for us in Section 2.5. We'll see more subtle and important examples when we come to Propositional Logic and First-Order Logic.
Example 2.4.4. Let X be a set. Consider the following simple generating system. Let A = X*, let B = X, and let H = {hx : x ∈ X}, where hx : X* → X* is the function hx(σ) = xσ. We then have that G = X*\{λ} and that (A, B, H) is free.
Proof. First notice that X*\{λ} is inductive, because λ ∉ B and hx(σ) ≠ λ for all σ ∈ X*. Next, a simple induction on n shows that X^n ⊆ G for all n ∈ N+. It follows that G = X*\{λ}.
We now show that (A, B, H) is free. First notice that for any x ∈ X, we have that ran(hx ↾ G) ∩ X = ∅, because every element of ran(hx ↾ G) has length at least 2 (because λ ∉ G).
Now for any x ∈ X, we have that hx ↾ G is injective, because if hx(σ) = hx(τ), then xσ = xτ, and hence σ = τ.
Finally, notice that if x, y ∈ X with x ≠ y, we have that ran(hx ↾ G) ∩ ran(hy ↾ G) = ∅, because every element of ran(hx ↾ G) begins with x while every element of ran(hy ↾ G) begins with y.
On to the theorem which says that if a simple generating system is free, then we can perform recursive
definitions on it.
Theorem 2.4.5. Suppose that the simple generating system (A, B, H) is free and X is a set. Suppose also that α : B → X and that for every h ∈ Hk, we have a function gh : (A × X)^k → X. There exists a unique function f : G → X such that
1. f(b) = α(b) for all b ∈ B.

2. f(h(a1, a2, . . . , ak)) = gh(a1, f(a1), a2, f(a2), . . . , ak, f(ak)) for all h ∈ Hk and all a1, a2, . . . , ak ∈ G.


Proof. We first prove the existence of such an f using a fairly slick argument. The basic idea is to build a new simple generating system whose elements are pairs (a, x), where a ∈ A and x ∈ X. Intuitively, we want to generate the pair (a, x) if something (either α or one of the gh functions) tells us that we'd better set f(a) = x if we want to satisfy the above conditions. We then go on to prove (by induction on G) that for every a ∈ G, there exists a unique x ∈ X such that (a, x) is in our new generating system. Thus, there are no conflicts, so we can use this to define our function.
Now for the details. Let A' = A × X, B' = {(b, α(b)) : b ∈ B} ⊆ A', and H' = {g'h : h ∈ H}, where for each h ∈ Hk, the function g'h : (A × X)^k → A × X is given by
g'h(a1, x1, a2, x2, . . . , ak, xk) = (h(a1, a2, . . . , ak), gh(a1, x1, a2, x2, . . . , ak, xk)).
Let G′ = G(A′, B′, H′). A simple induction (on G′) shows that if (a, x) ∈ G′, then a ∈ G. Let
Z = {a ∈ G : there exists a unique x ∈ X such that (a, x) ∈ G′}.
We prove by induction (on G) that Z = G.
Base Case: Notice that for each b ∈ B, we have (b, ι(b)) ∈ B′ ⊆ G′, hence there exists x ∈ X such
that (b, x) ∈ G′. Fix b ∈ B and suppose that y ∈ X is such that (b, y) ∈ G′ and y ≠ ι(b). We then have
(b, y) ∉ B′, hence by Corollary 2.2.17 there exist h ∈ H_k and (a1, x1), (a2, x2), . . . , (ak, xk) ∈ G′ such that
(b, y) = g′_h(a1, x1, a2, x2, . . . , ak, xk)
= (h(a1, a2, . . . , ak), g_h(a1, x1, a2, x2, . . . , ak, xk)).
Since a1, a2, . . . , ak ∈ G, this contradicts the fact that ran(h ↾ G^k) ∩ B = ∅. Therefore, for every b ∈ B,
there exists a unique x ∈ X, namely ι(b), such that (b, x) ∈ G′. Thus, B ⊆ Z.
Inductive Step: Fix h ∈ H_k, and suppose that a1, a2, . . . , ak ∈ Z. For each i, let xi be the unique element
of X with (ai, xi) ∈ G′. Notice that
(h(a1, a2, . . . , ak), g_h(a1, x1, a2, x2, . . . , ak, xk)) = g′_h(a1, x1, a2, x2, . . . , ak, xk) ∈ G′,
hence there exists x ∈ X such that (h(a1, a2, . . . , ak), x) ∈ G′. Suppose now that y ∈ X is such that
(h(a1, a2, . . . , ak), y) ∈ G′. We have (h(a1, a2, . . . , ak), y) ∉ B′ because ran(h ↾ G^k) ∩ B = ∅, so there exists
h̃ ∈ H_ℓ together with (c1, z1), (c2, z2), . . . , (cℓ, zℓ) ∈ G′ such that
(h(a1, a2, . . . , ak), y) = (h̃(c1, c2, . . . , cℓ), g_h̃(c1, z1, c2, z2, . . . , cℓ, zℓ)).
Since c1, c2, . . . , cℓ ∈ G, it follows that h = h̃ (because ran(h ↾ G^k) ∩ ran(h̃ ↾ G^ℓ) = ∅ if h ≠ h̃), and
hence k = ℓ. Also, since h ↾ G^k is injective, it follows that ai = ci for all i. We therefore have y =
g_h(a1, x1, a2, x2, . . . , ak, xk). Therefore, there exists a unique x ∈ X, namely g_h(a1, x1, a2, x2, . . . , ak, xk),
such that (h(a1, a2, . . . , ak), x) ∈ G′. It now follows by induction that Z = G.
Define f : G → X by letting f(a) be the unique x ∈ X such that (a, x) ∈ G′. We need to check that f
satisfies the needed conditions. As stated above, for each b ∈ B, we have (b, ι(b)) ∈ G′, so f(b) = ι(b). Thus,
f satisfies condition (1). Suppose now that h ∈ H_k and a1, a2, . . . , ak ∈ G. We have (ai, f(ai)) ∈ G′ for
all i, hence
(h(a1, a2, . . . , ak), g_h(a1, f(a1), a2, f(a2), . . . , ak, f(ak))) ∈ G′
by the above comments. It follows that f(h(a1, a2, . . . , ak)) = g_h(a1, f(a1), a2, f(a2), . . . , ak, f(ak)). Thus,
f also satisfies condition (2).
Finally, we need to show that f is unique. Suppose that f1, f2 : G → X both satisfy conditions (1) and (2).
Let Y = {a ∈ G : f1(a) = f2(a)}. We show that Y = G by induction on G. First notice that for any b ∈ B
we have
f1(b) = ι(b) = f2(b),
hence b ∈ Y. It follows that B ⊆ Y. Suppose now that h ∈ H_k and a1, a2, . . . , ak ∈ Y. Since ai ∈ Y for each
i, we have f1(ai) = f2(ai) for each i, and hence
f1(h(a1, a2, . . . , ak)) = g_h(a1, f1(a1), a2, f1(a2), . . . , ak, f1(ak))
= g_h(a1, f2(a1), a2, f2(a2), . . . , ak, f2(ak))
= f2(h(a1, a2, . . . , ak)).
Thus, h(a1, a2, . . . , ak) ∈ Y. It follows by induction that Y = G, i.e. f1(a) = f2(a) for all a ∈ G.
2.5 An Illustrative Example
We now embark on a careful formulation and proof of the statement: If f : A² → A is associative, i.e.
f(a, f(b, c)) = f(f(a, b), c) for all a, b, c ∈ A, then any grouping of terms which preserves the ordering of
the elements inside the grouping gives the same value. In particular, if we are working in a group A, then
we can write things like acabba without parentheses because any allowable insertion of parentheses gives the
same value.
Throughout this section, let A be a set not containing the symbols [, ], or ?. Let Sym_A = A ∪ {[, ], ?}.
Definition 2.5.1. Define a binary function h : (Sym_A*)² → Sym_A* by letting h(σ, τ) be the sequence [σ ? τ].
Let ValidExp_A = G(Sym_A*, A, {h}) (viewed as a simple generating system).
For example, suppose that A = {a, b, c}. Typical elements of G(Sym_A*, A, {h}) are c, [b ? [a ? c]], and
[c ? [[c ? b] ? a]]. The idea now is that if we have a particular function f : A² → A, we can interpret ? as
application of the function, and then this should give us a way to make sense of, that is evaluate, any
element of ValidExp_A.
2.5.1 Proving Freeness
Our first goal is to prove the following result.
Theorem 2.5.2. The simple generating system (Sym_A*, A, {h}) is free.
Definition 2.5.3. Define K : Sym_A* → Z as follows. We first define w : Sym_A → Z as follows.
w(a) = 0 for all a ∈ A
w(?) = 0
w([) = 1
w(]) = −1
We then define K : Sym_A* → Z by letting K(λ) = 0 and letting
K(σ) = Σ_{i<|σ|} w(σ(i))
for all σ ∈ Sym_A*\{λ}.
Remark 2.5.4. If σ, τ ∈ Sym_A*, then K(στ) = K(σ) + K(τ).
Proposition 2.5.5. If φ ∈ ValidExp_A, then K(φ) = 0.
Proof. The proof is by induction on φ. In other words, we let X = {φ ∈ ValidExp_A : K(φ) = 0}, and we
prove by induction that X = ValidExp_A. Notice that for every a ∈ A, we have that K(a) = 0. Suppose
that σ, τ ∈ ValidExp_A are such that K(σ) = 0 = K(τ). We then have that
K([σ ? τ]) = K([) + K(σ) + K(?) + K(τ) + K(])
= 1 + 0 + 0 + 0 + (−1)
= 0.
The result follows by induction.
Proposition 2.5.6. If φ ∈ ValidExp_A and θ ⊂ φ (i.e. θ is a proper initial segment of φ) with θ ≠ λ, then
K(θ) ≥ 1.
Proof. Again, the proof is by induction on φ. That is, we let
X = {φ ∈ ValidExp_A : for all θ ⊂ φ with θ ≠ λ, we have K(θ) ≥ 1}
and we prove by induction that X = ValidExp_A.
For every a ∈ A, this is trivial because there is no θ ≠ λ with θ ⊂ a.
Suppose that σ, τ ∈ ValidExp_A and the result holds for σ and τ. We prove the result for [σ ? τ].
Suppose that θ ⊂ [σ ? τ] and θ ≠ λ. If θ is [, then K(θ) = 1. If θ is [σ′ where σ′ ≠ λ and σ′ ⊂ σ, then
K(θ) = 1 + K(σ′)
≥ 1 + 1   (by induction)
≥ 1.
If θ is [σ or [σ?, then
K(θ) = 1 + K(σ)
= 1 + 0   (by Proposition 2.5.5)
= 1.
If θ is [σ ? τ′, where τ′ ≠ λ and τ′ ⊂ τ, then
K(θ) = 1 + K(σ) + K(τ′)
= 1 + 0 + K(τ′)   (by Proposition 2.5.5)
≥ 1 + 0 + 1   (by induction)
≥ 1.
Otherwise, θ is [σ ? τ, and
K(θ) = 1 + K(σ) + K(τ)
= 1 + 0 + 0   (by Proposition 2.5.5)
= 1.
Thus, the result holds for [σ ? τ].
Corollary 2.5.7. If σ, τ ∈ ValidExp_A, then σ ⊄ τ.
Proof. This follows by combining Proposition 2.5.5 and Proposition 2.5.6, along with noting that λ ∉
ValidExp_A (which follows by a trivial induction).
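These counting arguments are easy to verify mechanically. Here is a small sketch (ours, not from the text) in Python, assuming single-character elements of A: it computes K with weights +1 for [ and −1 for ], generates a few rounds of valid expressions, and checks Propositions 2.5.5 and 2.5.6 on each of them.

```python
# K sums the weights: w([) = +1, w(]) = -1, and 0 for ? and elements of A.
def K(sigma):
    return sum({'[': 1, ']': -1}.get(ch, 0) for ch in sigma)

# All elements of ValidExp_A obtainable in `rounds` applications of h.
def valid_exps(A, rounds):
    exps = set(A)
    for _ in range(rounds):
        exps |= {f'[{s}?{t}]' for s in exps for t in exps}
    return exps

for e in valid_exps({'a', 'b', 'c'}, 2):
    assert K(e) == 0                                      # Proposition 2.5.5
    assert all(K(e[:i]) >= 1 for i in range(1, len(e)))   # Proposition 2.5.6
```

The second assertion ranges over exactly the nonempty proper initial segments e[:i], matching the hypothesis of Proposition 2.5.6.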
Theorem 2.5.8. The generating system (Sym_A*, A, {h}) is free.
Proof. First notice that ran(h ↾ (ValidExp_A)²) ∩ A = ∅ because all elements of ran(h) begin with [.
Suppose that σ1, σ2, τ1, τ2 ∈ ValidExp_A and h(σ1, τ1) = h(σ2, τ2). We then have [σ1 ? τ1] = [σ2 ? τ2],
hence σ1 ? τ1] = σ2 ? τ2]. Since σ1 ⊂ σ2 and σ2 ⊂ σ1 are both impossible by Corollary 2.5.7, it follows that
σ1 = σ2. Therefore, ? τ1] = ? τ2], and so τ1 = τ2. It follows that h ↾ (ValidExp_A)² is injective.
2.5.2 The Result
Since we have established freeness, we can define functions recursively. The first such function we define is
the evaluation function.
Definition 2.5.9. Let f : A² → A. We define a function Ev_f : ValidExp_A → A recursively by letting
Ev_f(a) = a for all a ∈ A.
Ev_f([σ ? τ]) = f(Ev_f(σ), Ev_f(τ)) for all σ, τ ∈ ValidExp_A.
Formally, we use freeness to justify this definition as follows. Let ι : A → A be the identity map, and let
g_h : (Sym_A* × A)² → A be the function defined by letting g_h((σ, a), (τ, b)) = f(a, b). By freeness, there is a
unique function Ev_f : ValidExp_A → A such that
1. Ev_f(a) = ι(a) for all a ∈ A.
2. Ev_f(h(σ, τ)) = g_h((σ, Ev_f(σ)), (τ, Ev_f(τ))) for all σ, τ ∈ ValidExp_A,
which, unravelling definitions, is exactly what we wrote above.
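To see the recursion concretely, here is a sketch (ours, not from the text) of Ev_f in Python for single-character elements of A. By freeness, every non-atom splits uniquely as [σ ? τ], and the splitting ? is the one at bracket depth 1.

```python
def ev(f, expr):
    """Evaluate an element of ValidExp_A, applying f at each ?."""
    if len(expr) == 1:
        return expr                      # Ev_f(a) = a for a in A
    depth = 0
    for i, ch in enumerate(expr):
        if ch == '[':
            depth += 1
        elif ch == ']':
            depth -= 1
        elif ch == '?' and depth == 1:   # expr = [sigma ? tau]
            return f(ev(f, expr[1:i]), ev(f, expr[i + 1:-1]))

# max on characters is associative, so the two groupings of b, a, c agree:
assert ev(max, '[b?[a?c]]') == 'c'
assert ev(max, '[[b?a]?c]') == 'c'
```

With a non-associative f the two groupings can of course disagree, which is exactly why Theorem 2.5.11 below needs associativity.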
We now define the function which eliminates all mention of parentheses and ?. Thus, it produces the
sequence of elements of A within the given sequence in order of their occurrence.
Definition 2.5.10. Define a function D : ValidExp_A → A* recursively by letting
D(a) = a for all a ∈ A.
D([σ ? τ]) = D(σ)D(τ) for all σ, τ ∈ ValidExp_A.
With these definitions in hand, we can now precisely state our theorem.
Theorem 2.5.11. Suppose that f : A² → A is associative, i.e. f(a, f(b, c)) = f(f(a, b), c) for all a, b, c ∈ A.
For all σ, τ ∈ ValidExp_A with D(σ) = D(τ), we have Ev_f(σ) = Ev_f(τ).
In order to prove our theorem, we'll make use of the following function. Intuitively, it takes a sequence
such as cabc and associates to the right to produce [c ? [a ? [b ? c]]]. Thus, it provides a canonical way to
put together the elements of the sequence into something we can evaluate.
To make the recursive definition precise, consider the simple generating system (A*, A, {h_a : a ∈ A})
where h_a : A* → A* is defined by h_a(σ) = aσ. As shown in Example 2.4.4, we know that (A*, A, {h_a : a ∈ A})
is free and we have that G = A*\{λ}.
Definition 2.5.12. We define R : A*\{λ} → Sym_A* recursively by letting R(a) = a for all a ∈ A, and letting
R(aσ) = [a ? R(σ)] for all a ∈ A and all σ ∈ A*\{λ}.
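Both D and R are short recursions, and a concrete sketch (ours, assuming single-character elements of A) makes the intended behavior plain:

```python
def D(expr):
    """Strip brackets and ?, leaving the elements of A in order."""
    return ''.join(ch for ch in expr if ch not in '[?]')

def R(seq):
    """Associate a nonempty sequence to the right: R(a sigma) = [a?R(sigma)]."""
    return seq if len(seq) == 1 else f'[{seq[0]}?{R(seq[1:])}]'

assert R('cabc') == '[c?[a?[b?c]]]'
assert D('[c?[[c?b]?a]]') == 'ccba'
```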
In order to prove our theorem, we will show that Ev_f(φ) = Ev_f(R(D(φ))) for all φ ∈ ValidExp_A,
i.e. that we can take any φ ∈ ValidExp_A, rip it apart so that we see the elements of A in order, and then
associate to the right, without affecting the result of the evaluation. We first need the following lemma.
Lemma 2.5.13. Ev_f([R(σ) ? R(τ)]) = Ev_f(R(στ)) for all σ, τ ∈ A*\{λ}.
Proof. Fix τ ∈ A*\{λ}. We prove the result for this fixed τ by induction on σ ∈ A*\{λ}. That is, we let
X = {σ ∈ A*\{λ} : Ev_f([R(σ) ? R(τ)]) = Ev_f(R(στ))}
and prove by induction on (A*, A, {h_a : a ∈ A}) that X = A*\{λ}. Suppose first that a ∈ A. We then have
Ev_f([R(a) ? R(τ)]) = Ev_f([a ? R(τ)])   (by definition of R)
= Ev_f(R(aτ))   (by definition of R)
so a ∈ X. Suppose now that σ ∈ X and that a ∈ A. We show that aσ ∈ X. We have
Ev_f([R(aσ) ? R(τ)]) = Ev_f([[a ? R(σ)] ? R(τ)])   (by definition of R)
= f(Ev_f([a ? R(σ)]), Ev_f(R(τ)))   (by definition of Ev_f)
= f(f(a, Ev_f(R(σ))), Ev_f(R(τ)))   (by definition of Ev_f, using Ev_f(a) = a)
= f(a, f(Ev_f(R(σ)), Ev_f(R(τ))))   (since f is associative)
= f(a, Ev_f([R(σ) ? R(τ)]))   (by definition of Ev_f)
= f(a, Ev_f(R(στ)))   (since σ ∈ X)
= Ev_f([a ? R(στ)])   (by definition of Ev_f, using Ev_f(a) = a)
= Ev_f(R(aστ))   (by definition of R)
so aσ ∈ X. The result follows by induction.
Lemma 2.5.14. Ev_f(φ) = Ev_f(R(D(φ))) for all φ ∈ ValidExp_A.
Proof. By induction on φ ∈ ValidExp_A. If a ∈ A, this is trivial because R(D(a)) = R(a) = a. Suppose that
σ, τ ∈ ValidExp_A and the result holds for σ and τ. We have
Ev_f([σ ? τ]) = f(Ev_f(σ), Ev_f(τ))   (by definition of Ev_f)
= f(Ev_f(R(D(σ))), Ev_f(R(D(τ))))   (by induction)
= Ev_f([R(D(σ)) ? R(D(τ))])   (by definition of Ev_f)
= Ev_f(R(D(σ)D(τ)))   (by Lemma 2.5.13)
= Ev_f(R(D([σ ? τ])))   (by definition of D)
Proof of Theorem 2.5.11. Suppose that σ, τ ∈ ValidExp_A are such that D(σ) = D(τ). We then have that
Ev_f(σ) = Ev_f(R(D(σ)))   (by Lemma 2.5.14)
= Ev_f(R(D(τ)))   (since D(σ) = D(τ))
= Ev_f(τ)   (by Lemma 2.5.14)
2.5.3 An Alternate Syntax - Polish Notation
It is standard mathematical practice to place a binary operation like ? between its two arguments (so-called infix
notation) to signify the application of a binary function, and throughout this section we have followed that
tradition in building up permissible expressions. However, the price we pay is that we need to use parentheses
to avoid ambiguity. For example, it is not clear how to parse a ? b ? c as one of [[a ? b] ? c] or [a ? [b ? c]],
and if the underlying function f is not associative, then the distinction really matters.
We can of course move the operation to the front and write ?[a, b] instead of [a ? b], similar to how we
write f(x, y) for a function of two variables. At first sight this looks even worse, because we have
introduced a comma. However, it turns out that we can avoid all of this extra punctuation entirely. That
is, we simply write ?ab without additional punctuation, and build further expressions from here without
introducing ambiguity. This syntactic approach is called Polish notation. For example, we have the
following translations into Polish notation.
[[a ? b] ? c] = ? ? abc
[a ? [b ? c]] = ?a ? bc
[[a ? b] ? [c ? d]] = ? ? ab ? cd
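To see that the prefix form really does determine the grouping, here is a small sketch (ours, assuming single-character elements of A) that reads a Polish expression back into the bracketed syntax: each ? consumes the next two complete expressions.

```python
def parse(s, i=0):
    """Return (bracketed form of the expression starting at i, next index)."""
    if s[i] != '?':
        return s[i], i + 1                 # an element of A
    left, j = parse(s, i + 1)
    right, k = parse(s, j)
    return f'[{left}?{right}]', k

def to_infix(s):
    infix, end = parse(s)
    assert end == len(s)                   # the whole string is one expression
    return infix

# the three translations above, reversed
assert to_infix('??abc') == '[[a?b]?c]'
assert to_infix('?a?bc') == '[a?[b?c]]'
assert to_infix('??ab?cd') == '[[a?b]?[c?d]]'
```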
We now go about proving that every expression in Polish notation is built up in a unique way. That is, we
prove that the corresponding generating system is free. For this section, let A be a set not containing the
symbol ?, and let Sym_A = A ∪ {?} (we no longer need the parentheses).
Definition 2.5.15. Define a binary function h : (Sym_A*)² → Sym_A* by letting h(σ, τ) be the sequence ?στ.
Let PolishExp_A = G(Sym_A*, A, {h}) (viewed as a simple generating system).
Proposition 2.5.16. The simple generating system (Sym_A*, A, {h}) is free.
Definition 2.5.17. Define K : Sym_A* → Z as follows. We first define w : Sym_A → Z as follows.
w(a) = 1 for all a ∈ A
w(?) = −1
We then define K : Sym_A* → Z by letting K(λ) = 0 and letting
K(σ) = Σ_{i<|σ|} w(σ(i))
for all σ ∈ Sym_A*\{λ}.
Remark 2.5.18. If σ, τ ∈ Sym_A*, then K(στ) = K(σ) + K(τ).
Proposition 2.5.19. If φ ∈ PolishExp_A, then K(φ) = 1.
Proof. The proof is by induction on φ. Notice that for every a ∈ A, we have that K(a) = 1. Suppose that
σ, τ ∈ PolishExp_A are such that K(σ) = 1 = K(τ). We then have that
K(?στ) = K(?) + K(σ) + K(τ)
= −1 + 1 + 1
= 1.
The result follows by induction.
Proposition 2.5.20. If φ ∈ PolishExp_A and θ ⊂ φ, then K(θ) ≤ 0.
Proof. The proof is by induction on φ. For every a ∈ A, this is trivial because the only θ ⊂ a is θ = λ, and
we have K(λ) = 0.
Suppose that σ, τ ∈ PolishExp_A and the result holds for σ and τ. We prove the result for ?στ. Suppose
that θ ⊂ ?στ. If θ = λ, then K(θ) = 0. If θ is ?σ′ for some σ′ ⊂ σ, then
K(θ) = K(?) + K(σ′)
≤ −1 + 0   (by induction)
≤ −1
≤ 0.
Otherwise, θ is ?στ′ for some τ′ ⊂ τ, in which case
K(θ) = K(?) + K(σ) + K(τ′)
= −1 + 1 + K(τ′)   (by Proposition 2.5.19)
≤ −1 + 1 + 0   (by induction)
≤ 0.
Thus, the result holds for ?στ.
Corollary 2.5.21. If σ, τ ∈ PolishExp_A, then σ ⊄ τ.
Proof. This follows by combining Proposition 2.5.19 and Proposition 2.5.20.
Theorem 2.5.22. The generating system (Sym_A*, A, {h}) is free.
Proof. First notice that ran(h ↾ (PolishExp_A)²) ∩ A = ∅ because all elements of ran(h) begin with ?.
Suppose that σ1, σ2, τ1, τ2 ∈ PolishExp_A and that h(σ1, τ1) = h(σ2, τ2). We then have ?σ1τ1 = ?σ2τ2,
hence σ1τ1 = σ2τ2. Since σ1 ⊂ σ2 and σ2 ⊂ σ1 are both impossible by Corollary 2.5.21, it follows that
σ1 = σ2. Therefore, τ1 = τ2. It follows that h ↾ (PolishExp_A)² is injective.
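As with the bracketed syntax, the weight computations can be checked mechanically. This sketch (ours, assuming single-character elements of A) verifies Propositions 2.5.19 and 2.5.20 on a few generations of Polish expressions.

```python
def K(sigma):
    return sum(-1 if ch == '?' else 1 for ch in sigma)  # w(?) = -1, w(a) = +1

# All elements of PolishExp_A obtainable in `rounds` applications of h.
def polish_exps(A, rounds):
    exps = set(A)
    for _ in range(rounds):
        exps |= {'?' + s + t for s in exps for t in exps}
    return exps

for e in polish_exps({'a', 'b'}, 2):
    assert K(e) == 1                                     # Proposition 2.5.19
    assert all(K(e[:i]) <= 0 for i in range(len(e)))     # Proposition 2.5.20
```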
Chapter 3
Propositional Logic
3.1 The Syntax of Propositional Logic
3.1.1 Standard Syntax
Definition 3.1.1. Let P be a nonempty set not containing the symbols (, ), ¬, ∧, ∨, and →. Let Sym_P =
P ∪ {(, ), ¬, ∧, ∨, →}. Define a unary function h¬ and binary functions h∧, h∨, and h→ on Sym_P* as follows.
h¬(σ) = (¬σ)
h∧(σ, τ) = (σ ∧ τ)
h∨(σ, τ) = (σ ∨ τ)
h→(σ, τ) = (σ → τ)
Definition 3.1.2. Fix P. Let Form_P = G(Sym_P*, P, H) where H = {h¬, h∧, h∨, h→}.
Definition 3.1.3. Define K : Sym_P* → Z as follows. We first define w : Sym_P → Z by letting w(A) = 0
for all A ∈ P, letting w(□) = 0 for all □ ∈ {¬, ∧, ∨, →}, letting w(() = 1, and letting w()) = −1. We then
define K : Sym_P* → Z by letting K(λ) = 0 and letting K(σ) = Σ_{i<|σ|} w(σ(i)) for all σ ∈ Sym_P*\{λ}.
Remark 3.1.4. If σ, τ ∈ Sym_P*, then K(στ) = K(σ) + K(τ).
Proposition 3.1.5. If φ ∈ Form_P, then K(φ) = 0.
Proof. A simple induction as above.
Proposition 3.1.6. If φ ∈ Form_P and θ ⊂ φ with θ ≠ λ, then K(θ) ≥ 1.
Proof. A simple induction as above.
Corollary 3.1.7. If φ, ψ ∈ Form_P, then φ ⊄ ψ.
Proof. This follows by combining Proposition 3.1.5 and Proposition 3.1.6, along with noting that λ ∉ Form_P
(which follows by a simple induction).
Theorem 3.1.8. The generating system (Sym_P*, P, H) is free.
Proof. First notice that ran(h¬ ↾ Form_P) ∩ P = ∅ because all elements of ran(h¬) begin with (. Similarly,
for any □ ∈ {∧, ∨, →}, we have ran(h□ ↾ Form_P²) ∩ P = ∅ since all elements of ran(h□) begin with (.
Suppose that φ, ψ ∈ Form_P and h¬(φ) = h¬(ψ). We then have (¬φ) = (¬ψ), hence φ = ψ. Therefore,
h¬ ↾ Form_P is injective. Fix □ ∈ {∧, ∨, →}. Suppose that φ1, φ2, ψ1, ψ2 ∈ Form_P and that h□(φ1, ψ1) =
h□(φ2, ψ2). We then have (φ1 □ ψ1) = (φ2 □ ψ2), hence φ1 □ ψ1) = φ2 □ ψ2). Since φ1 ⊂ φ2 and φ2 ⊂ φ1 are
both impossible by Corollary 3.1.7, it follows that φ1 = φ2. Therefore, □ ψ1) = □ ψ2), and so ψ1 = ψ2. It
follows that h□ ↾ Form_P² is injective.
Let □ ∈ {∧, ∨, →}. Suppose that φ, ψ1, ψ2 ∈ Form_P and h¬(φ) = h□(ψ1, ψ2). We then have (¬φ) =
(ψ1 □ ψ2), hence ¬φ = ψ1 □ ψ2, contradicting the fact that no element of Form_P begins with ¬ (by a simple
induction). Therefore, ran(h¬ ↾ Form_P) ∩ ran(h□ ↾ Form_P²) = ∅.
Suppose now that □1, □2 ∈ {∧, ∨, →} with □1 ≠ □2. Suppose that φ1, φ2, ψ1, ψ2 ∈ Form_P and
h□1(φ1, ψ1) = h□2(φ2, ψ2). We then have (φ1 □1 ψ1) = (φ2 □2 ψ2), hence φ1 □1 ψ1) = φ2 □2 ψ2). Since φ1 ⊂ φ2
and φ2 ⊂ φ1 are both impossible by Corollary 3.1.7, it follows that φ1 = φ2. Therefore, □1 = □2, a
contradiction. It follows that ran(h□1 ↾ Form_P²) ∩ ran(h□2 ↾ Form_P²) = ∅.
3.1.2 Polish Notation
Definition 3.1.9. Let P be a set not containing the symbols ¬, ∧, ∨, and →. Let Sym_P = P ∪ {¬, ∧, ∨, →}.
Define a unary function h¬ and binary functions h∧, h∨, and h→ on Sym_P* as follows.
h¬(σ) = ¬σ
h∧(σ, τ) = ∧στ
h∨(σ, τ) = ∨στ
h→(σ, τ) = →στ
Definition 3.1.10. Fix P. Let Form_P = G(Sym_P*, P, H) where H = {h¬, h∧, h∨, h→}.
Definition 3.1.11. Define K : Sym_P* → Z as follows. We first define w : Sym_P → Z by letting w(A) = 1 for
all A ∈ P, letting w(¬) = 0, and letting w(□) = −1 for all □ ∈ {∧, ∨, →}. We then define K : Sym_P* → Z
by letting K(λ) = 0 and letting K(σ) = Σ_{i<|σ|} w(σ(i)) for all σ ∈ Sym_P*\{λ}.
Remark 3.1.12. If σ, τ ∈ Sym_P*, then K(στ) = K(σ) + K(τ).
Proposition 3.1.13. If φ ∈ Form_P, then K(φ) = 1.
Proof. The proof is by induction on φ. Notice that for every A ∈ P, we have that K(A) = 1. Suppose that
φ ∈ Form_P is such that K(φ) = 1. We then have that
K(¬φ) = 0 + K(φ)
= K(φ)
= 1.
Suppose now that φ, ψ ∈ Form_P are such that K(φ) = 1 = K(ψ), and □ ∈ {∧, ∨, →}. We then have that
K(□φψ) = −1 + K(φ) + K(ψ)
= −1 + 1 + 1
= 1.
The result follows by induction.
Proposition 3.1.14. If φ ∈ Form_P and θ ⊂ φ, then K(θ) ≤ 0.
Proof. The proof is by induction on φ. For every A ∈ P, this is trivial because the only θ ⊂ A is θ = λ, and
we have K(λ) = 0.
Suppose that φ ∈ Form_P and the result holds for φ. We prove the result for ¬φ. Suppose that θ ⊂ ¬φ.
If θ = λ, then K(θ) = 0. Otherwise, θ is ¬θ′ for some θ′ ⊂ φ, in which case
K(θ) = 0 + K(θ′)
≤ 0 + 0   (by induction)
≤ 0.
Thus, the result holds for ¬φ.
Suppose that φ, ψ ∈ Form_P and the result holds for φ and ψ. Let □ ∈ {∧, ∨, →}. We prove the result
for □φψ. Suppose that θ ⊂ □φψ. If θ = λ, then K(θ) = 0. If θ is □θ′ for some θ′ ⊂ φ, then
K(θ) = −1 + K(θ′)
≤ −1 + 0   (by induction)
≤ −1
≤ 0.
Otherwise, θ is □φτ′ for some τ′ ⊂ ψ, in which case
K(θ) = −1 + K(φ) + K(τ′)
= −1 + 1 + K(τ′)   (by Proposition 3.1.13)
≤ −1 + 1 + 0   (by induction)
≤ 0.
Thus, the result holds for □φψ.
Corollary 3.1.15. If φ, ψ ∈ Form_P, then φ ⊄ ψ.
Proof. This follows by combining Proposition 3.1.13 and Proposition 3.1.14.
Theorem 3.1.16. The generating system (Sym_P*, P, H) is free.
Proof. First notice that ran(h¬ ↾ Form_P) ∩ P = ∅ because all elements of ran(h¬) begin with ¬. Similarly,
for any □ ∈ {∧, ∨, →}, we have ran(h□ ↾ Form_P²) ∩ P = ∅ since all elements of ran(h□) begin with □.
Suppose that φ, ψ ∈ Form_P and h¬(φ) = h¬(ψ). We then have ¬φ = ¬ψ, hence φ = ψ. Therefore,
h¬ ↾ Form_P is injective. Fix □ ∈ {∧, ∨, →}. Suppose that φ1, φ2, ψ1, ψ2 ∈ Form_P and that h□(φ1, ψ1) =
h□(φ2, ψ2). We then have □φ1ψ1 = □φ2ψ2, hence φ1ψ1 = φ2ψ2. Since φ1 ⊂ φ2 and φ2 ⊂ φ1 are both
impossible by Corollary 3.1.15, it follows that φ1 = φ2. Therefore, ψ1 = ψ2. It follows that h□ ↾ Form_P² is
injective.
For any □ ∈ {∧, ∨, →}, we have ran(h¬ ↾ Form_P) ∩ ran(h□ ↾ Form_P²) = ∅ because all elements of ran(h¬)
begin with ¬ and all elements of ran(h□) begin with □. Similarly, if □1, □2 ∈ {∧, ∨, →} with □1 ≠ □2, we
have ran(h□1 ↾ Form_P²) ∩ ran(h□2 ↾ Form_P²) = ∅ because all elements of ran(h□1) begin with □1 and all
elements of ran(h□2) begin with □2.
3.1.3 Official Syntax and Our Abuses of It
Since we should probably fix an official syntax, let's agree to use Polish notation, because it's simpler in
many respects and it will be natural to generalize when we talk about the possibility of other connectives
and when we discuss first-order logic. However, as with many official definitions in mathematics, we'll ignore
and abuse this convention constantly in the interest of readability. For example, we'll often write things in
standard syntax or in more abbreviated forms. For example, we'll write A ∧ B instead of ∧AB (or (A ∧ B)
in the original syntax). We'll also write something like
A1 ∧ A2 ∧ · · · ∧ An−1 ∧ An
or
⋀_{i=1}^{n} Ai
instead of (A1 ∧ (A2 ∧ (· · · (An−1 ∧ An) · · · ))) in standard syntax or ∧A1∧A2 · · · ∧An−1An in Polish notation
(which can be precisely defined in a similar manner as R in Section 2.5). In general, when multiple
applications of an operation (such as ∧) occur in order, we always associate to the right.
When it comes to mixing symbols, let's agree to the following conventions about binding, in a similar
fashion to how we think of · as more binding than + (so that 3·5+2 is read as (3·5)+2). We think of ¬ as the
most binding, so we read ¬A ∧ B as ((¬A) ∧ B). After that, we consider ∧ and ∨ as the next most binding,
and → has the least binding. We'll insert parentheses when we wish to override this binding. For example,
A ∧ ¬B → C ∨ D is really ((A ∧ (¬B)) → (C ∨ D)), while A → (¬B ∧ C ∨ D) is really (A → ((¬B) ∧ (C ∨ D))).
3.1.4 Recursive Definitions
Since we've shown that our generating system is free, we can define functions recursively. It is possible
to avoid using recursion on Form_P to define some of these functions. In such cases, you may wonder why we
bother. Since our only powerful way to prove things about the set Form_P is by induction, and definitions
of functions by recursion are well-suited to induction, it's simply the easiest way to proceed.
Definition 3.1.17. If X is a set, we denote by P(X) the set of all subsets of X. Thus P(X) = {Z : Z ⊆ X}.
We call P(X) the power set of X.
Definition 3.1.18. We define a function OccurProp : Form_P → P(P) recursively as follows.
OccurProp(A) = {A} for all A ∈ P.
OccurProp(¬φ) = OccurProp(φ).
OccurProp(□φψ) = OccurProp(φ) ∪ OccurProp(ψ) for each □ ∈ {∧, ∨, →}.
If you want to be precise in the previous definition, we're defining functions ι : P → P(P), g_{h¬} : Sym_P* ×
P(P) → P(P), and g_{h□} : (Sym_P* × P(P))² → P(P) for each □ ∈ {∧, ∨, →} as follows:
ι(A) = {A} for all A ∈ P.
g_{h¬}(σ, Z) = Z.
g_{h□}(σ1, Z1, σ2, Z2) = Z1 ∪ Z2 for each □ ∈ {∧, ∨, →},
and we're using our result on freeness to ensure that there is a unique function OccurProp : Form_P → P(P)
which satisfies the associated requirements. Of course, this method is more precise, but it's hardly more
intuitive to use. It's a good exercise to make sure that you can translate a few more informal recursive
definitions in this way, but once you understand how it works you can safely keep the formalism in the back
of your mind.
Here's a somewhat trivial example of using induction to prove a result based on a recursive definition.
Proposition 3.1.19. Suppose that Q ⊆ P. We then have that Form_Q ⊆ Form_P.
Proof. A trivial induction on Form_Q.
Proposition 3.1.20. Fix P. For any φ ∈ Form_P, we have φ ∈ Form_{OccurProp(φ)}.
Proof. The proof is by induction on φ ∈ Form_P. Suppose first that A ∈ P. Since OccurProp(A) = {A} and
A ∈ Form_{{A}}, we have A ∈ Form_{OccurProp(A)}.
Suppose that φ ∈ Form_P and that the result holds for φ, i.e. we have φ ∈ Form_{OccurProp(φ)}. Since
OccurProp(¬φ) = OccurProp(φ), it follows that φ ∈ Form_{OccurProp(¬φ)}. Hence, ¬φ ∈ Form_{OccurProp(¬φ)}.
Suppose that φ, ψ ∈ Form_P, that □ ∈ {∧, ∨, →}, and that the result holds for φ and ψ, i.e. we have
φ ∈ Form_{OccurProp(φ)} and ψ ∈ Form_{OccurProp(ψ)}. Since
OccurProp(φ) ⊆ OccurProp(□φψ) and OccurProp(ψ) ⊆ OccurProp(□φψ),
it follows from Proposition 3.1.19 that φ, ψ ∈ Form_{OccurProp(□φψ)}. Therefore, □φψ ∈ Form_{OccurProp(□φψ)}.
On to some more important recursive definitions.
Definition 3.1.21. We define a function Depth : Form_P → N recursively as follows.
Depth(A) = 0 for all A ∈ P.
Depth(¬φ) = Depth(φ) + 1.
Depth(□φψ) = max{Depth(φ), Depth(ψ)} + 1 for each □ ∈ {∧, ∨, →}.
Example 3.1.22. Depth(→∧AB∨CD) = 2.
Definition 3.1.23. We define a function Subform : Form_P → P(Form_P) recursively as follows.
Subform(A) = {A} for all A ∈ P.
Subform(¬φ) = {¬φ} ∪ Subform(φ).
Subform(□φψ) = {□φψ} ∪ Subform(φ) ∪ Subform(ψ) for each □ ∈ {∧, ∨, →}.
Example 3.1.24. Subform(∧¬AB) = {∧¬AB, ¬A, A, B}.
Definition 3.1.25. Let θ, γ ∈ Form_P. We define a function Subst_γ^θ : Form_P → Form_P recursively as
follows.
Subst_γ^θ(A) = θ if γ = A, and A otherwise.
Subst_γ^θ(¬φ) = θ if γ = ¬φ, and ¬Subst_γ^θ(φ) otherwise.
Subst_γ^θ(□φψ) = θ if γ = □φψ, and □Subst_γ^θ(φ)Subst_γ^θ(ψ) otherwise, for each □ ∈ {∧, ∨, →}.
Example 3.1.26. Subst_C^{∧AB}(∨¬C∧AC) = ∨¬∧AB∧A∧AB.
3.2 Truth Assignments and Semantic Implication
Definition 3.2.1. A function v : P → {0, 1} is called a truth assignment on P.
Definition 3.2.2. Let v : P → {0, 1} be a truth assignment. We denote by v̄ the unique function
v̄ : Form_P → {0, 1} such that
v̄(A) = v(A) for all A ∈ P.
v̄(¬φ) = 1 if v̄(φ) = 0, and v̄(¬φ) = 0 if v̄(φ) = 1.
v̄(∧φψ) = 1 if v̄(φ) = 1 and v̄(ψ) = 1, and v̄(∧φψ) = 0 otherwise.
v̄(∨φψ) = 0 if v̄(φ) = 0 and v̄(ψ) = 0, and v̄(∨φψ) = 1 otherwise.
v̄(→φψ) = 0 if v̄(φ) = 1 and v̄(ψ) = 0, and v̄(→φψ) = 1 otherwise.
Before moving on, we should note a couple of things about what happens when we shrink or enlarge the set P.
Intuitively, if φ ∈ Form_Q and Q ⊆ P, then we can extend a truth assignment from Q to P arbitrarily
without affecting the value of v̄(φ). Here is the precise statement.
Proposition 3.2.3. Suppose that Q ⊆ P and that v : P → {0, 1} is a truth assignment on P. We then have
that v̄(φ) = (v ↾ Q)‾(φ) for all φ ∈ Form_Q.
Proof. A trivial induction on Form_Q.
Proposition 3.2.4. Suppose φ ∈ Form_P. Whenever v1 and v2 are truth assignments on P such that
v1(A) = v2(A) for all A ∈ OccurProp(φ), we have v̄1(φ) = v̄2(φ).
Proof. Let Q = OccurProp(φ). We then have that φ ∈ Form_Q by Proposition 3.1.20. Since v1 ↾ Q = v2 ↾ Q,
we have
v̄1(φ) = (v1 ↾ Q)‾(φ) = (v2 ↾ Q)‾(φ) = v̄2(φ).
With a method of assigning true/false values to formulas in hand (once we've assigned them to P), we're
now in position to use our semantic definitions to give a precise meaning to "the set of formulas Γ implies
the formula φ".
Definition 3.2.5. Let P be given. Let Γ ⊆ Form_P and let φ ∈ Form_P. We write Γ ⊨_P φ, or simply Γ ⊨ φ
if P is clear, to mean that whenever v is a truth assignment on P such that v̄(γ) = 1 for all γ ∈ Γ, we have
v̄(φ) = 1. We pronounce Γ ⊨ φ as "Γ semantically implies φ".
We also have a semantic way to say that a set of formulas is not contradictory.
Definition 3.2.6. Γ is satisfiable if there exists a truth assignment v : P → {0, 1} such that v̄(γ) = 1 for all
γ ∈ Γ. Otherwise, we say that Γ is unsatisfiable.
Example 3.2.7. Let P = {A, B, C}. We have {A ∨ B, ¬(A ∧ (¬C))} ⊨ B ∨ C.
Proof. Let v : P → {0, 1} be a truth assignment such that v̄(A ∨ B) = 1 and v̄(¬(A ∧ (¬C))) = 1. We need to
show that v̄(B ∨ C) = 1. Suppose not. We would then have v̄(B) = 0 and v̄(C) = 0. Since v̄(A ∨ B) = 1,
this implies that v̄(A) = 1. Therefore, v̄(A ∧ (¬C)) = 1, so v̄(¬(A ∧ (¬C))) = 0, a contradiction.
Example 3.2.8. Let P be given. For any φ, ψ ∈ Form_P, we have {φ → ψ, φ} ⊨ ψ.
Proof. Let v : P → {0, 1} be a truth assignment and suppose that v̄(φ → ψ) = 1 and v̄(φ) = 1. If v̄(ψ) = 0,
it would follow that v̄(φ → ψ) = 0, a contradiction. Thus, v̄(ψ) = 1.
Notation 3.2.9.
1. If Γ = ∅, we write ⊨ φ instead of ∅ ⊨ φ.
2. If Γ = {γ}, we write γ ⊨ φ instead of {γ} ⊨ φ.
Definition 3.2.10.
1. Let φ ∈ Form_P. We say that φ is a tautology if ⊨ φ.
2. If φ ⊨ ψ and ψ ⊨ φ, we say that φ and ψ are semantically equivalent.
Remark 3.2.11. Notice that φ and ψ are semantically equivalent if and only if for all truth assignments
v : P → {0, 1}, we have v̄(φ) = v̄(ψ).
Example 3.2.12. φ ∨ ¬φ is a tautology for any φ ∈ Form_P.
Proof. Fix φ ∈ Form_P. Let v : P → {0, 1} be a truth assignment. If v̄(φ) = 1, then v̄(φ ∨ ¬φ) = 1.
Otherwise, we have v̄(φ) = 0, in which case v̄(¬φ) = 1, and hence v̄(φ ∨ ¬φ) = 1. Therefore, v̄(φ ∨ ¬φ) = 1
for all truth assignments v : P → {0, 1}, hence φ ∨ ¬φ is a tautology.
Example 3.2.13. ¬¬φ is semantically equivalent to φ for any φ ∈ Form_P.
Proof. Fix φ ∈ Form_P. We need to show that for any truth assignment v : P → {0, 1}, we have v̄(¬¬φ) = 1
if and only if v̄(φ) = 1. We have
v̄(¬¬φ) = 1 ⟺ v̄(¬φ) = 0
⟺ v̄(φ) = 1.
Example 3.2.14. φ → ψ and ¬φ ∨ ψ are semantically equivalent for any φ, ψ ∈ Form_P.
Proof. Fix φ, ψ ∈ Form_P. We need to show that for any truth assignment v : P → {0, 1}, we have
v̄(φ → ψ) = 1 if and only if v̄(¬φ ∨ ψ) = 1. Let v : P → {0, 1} be a truth assignment. We have
v̄(φ → ψ) = 1 ⟺ v̄(φ) = 0 or v̄(ψ) = 1
⟺ v̄(¬φ) = 1 or v̄(ψ) = 1
⟺ v̄(¬φ ∨ ψ) = 1.
Definition 3.2.15. Define OccurProp : P(Form_P) → P(P) by letting
OccurProp(Γ) = {A ∈ P : A ∈ OccurProp(γ) for some γ ∈ Γ}.
Proposition 3.2.16. Suppose that Q ⊆ P, that Γ ⊆ Form_Q, and that φ ∈ Form_Q. We then have that
Γ ⊨_P φ if and only if Γ ⊨_Q φ.
Proof. First notice that Γ ⊆ Form_P and φ ∈ Form_P by Proposition 3.1.19 and Proposition 3.1.20.
Suppose first that Γ ⊨_Q φ. Let v : P → {0, 1} be a truth assignment such that v̄(γ) = 1 for all γ ∈ Γ.
We then have that (v ↾ Q)‾(γ) = 1 for all γ ∈ Γ, hence (v ↾ Q)‾(φ) = 1 because Γ ⊨_Q φ. Therefore, v̄(φ) = 1.
It follows that Γ ⊨_P φ.
Suppose now that Γ ⊨_P φ. Let v : Q → {0, 1} be a truth assignment such that v̄(γ) = 1 for all γ ∈ Γ.
Define a truth assignment w : P → {0, 1} by letting w(A) = v(A) for all A ∈ Q and letting w(A) = 0 for all
A ∈ P\Q. Since w ↾ Q = v, we have w̄(γ) = v̄(γ) = 1 for all γ ∈ Γ. Since Γ ⊨_P φ, it follows that w̄(φ) = 1,
hence v̄(φ) = 1. Therefore, Γ ⊨_Q φ.
Suppose that Γ is finite, and we want to determine whether or not Γ ⊨ φ. By the previous proposition,
instead of examining all truth assignments v : P → {0, 1} on P, we need only consider truth assignments
v : OccurProp(Γ ∪ {φ}) → {0, 1}. Now OccurProp(Γ ∪ {φ}) is a finite set, so there are only finitely many
possibilities. Thus, one way of determining whether Γ ⊨ φ is simply to check all of them. If |OccurProp(Γ ∪
{φ})| = n, then there are 2^n different truth assignments. We can systematically arrange them in a table
like the one below, where we ensure we put the elements of OccurProp(Γ ∪ {φ}) in the first columns, and put
all elements of Γ ∪ {φ} in later columns. We also ensure that if ψ is in a column, then all subformulas of ψ
appear in an earlier column. This allows us to fill in the table one column at a time. This simple-minded
exhaustive technique is called the method of truth tables.
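This exhaustive check is easy to mechanize. In the sketch below (ours; formulas are represented directly as Python predicates on an assignment, sidestepping the syntax entirely), we decide Γ ⊨ φ by looping over all 2^n assignments, and confirm the semantic implication of Example 3.2.17.

```python
from itertools import product

def implies(gamma, phi, symbols):
    """True iff every assignment making all of gamma true also makes phi true."""
    for bits in product((0, 1), repeat=len(symbols)):
        v = dict(zip(symbols, bits))
        if all(g(v) for g in gamma) and not phi(v):
            return False
    return True

# {(A or B) and C, A -> not C} |= (not C) -> B
gamma = [lambda v: (v['A'] or v['B']) and v['C'],
         lambda v: not v['A'] or not v['C']]
phi = lambda v: v['C'] or v['B']          # (not C) -> B, i.e. C or B
assert implies(gamma, phi, 'ABC')
```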
Example 3.2.17. Show that {(A ∨ B) ∧ C, A → (¬C)} ⊨ (¬C) → B.
Proof.
A  B  A ∨ B  C  (A ∨ B) ∧ C  ¬C  A → (¬C)  (¬C) → B
0  0    0    0       0        1      1         0
0  0    0    1       0        0      1         1
0  1    1    0       0        1      1         1
0  1    1    1       1        0      1         1
1  0    1    0       0        1      1         0
1  0    1    1       1        0      0         1
1  1    1    0       0        1      1         1
1  1    1    1       1        0      0         1
Notice that in every row in which both the (A ∨ B) ∧ C column and the A → (¬C) column have a 1, namely
just the row beginning with 011, the entry under the (¬C) → B column is a 1. Therefore,
{(A ∨ B) ∧ C, A → (¬C)} ⊨ (¬C) → B.
Example 3.2.18. Show that ¬(A ∧ B) is semantically equivalent to ¬A ∨ ¬B.
Proof.
A  B  A ∧ B  ¬(A ∧ B)  ¬A  ¬B  ¬A ∨ ¬B
0  0    0        1      1   1     1
0  1    0        1      1   0     1
1  0    0        1      0   1     1
1  1    1        0      0   0     0
Notice that the rows in which the ¬(A ∧ B) column has a 1 are exactly the same as the rows in which the
¬A ∨ ¬B column has a 1. Therefore, ¬(A ∧ B) is semantically equivalent to ¬A ∨ ¬B.
3.3 Boolean Functions and Connectives
It's natural to wonder if our choice of connectives is the right one. For example, why didn't we introduce a
new connective ↔, allowing ourselves to form the formulas φ ↔ ψ (or ↔φψ in Polish notation) and extend
our definition of v̄ so that
v̄(φ ↔ ψ) = 1 if v̄(φ) = v̄(ψ), and v̄(φ ↔ ψ) = 0 otherwise?
The idea is that there's no real need to introduce this connective, because for any φ, ψ ∈ Form_P we would
have that φ ↔ ψ is semantically equivalent to (φ → ψ) ∧ (ψ → φ).
Perhaps we could be more exotic and introduce a new connective ◇ which takes three formulas, allowing
us to form the formulas ◇φψθ (here's an instance where Polish notation becomes important), and extend
our definition of v̄ so that
v̄(◇φψθ) = 1 if at least two of v̄(φ) = 1, v̄(ψ) = 1, v̄(θ) = 1 hold, and v̄(◇φψθ) = 0 otherwise.
It's not hard (and a good exercise) to show that for any φ, ψ, θ ∈ Form_P, there exists α ∈ Form_P such that
◇φψθ is semantically equivalent to α. We want a general theorem which says that no matter how exotic a
connective one invents, it's always possible to find an element of Form_P which is semantically equivalent,
and thus our choice of connectives is sufficient to express everything we'd ever want.
Rather than deal with arbitrary connectives, the real issue here is whether we can express any possible
function taking k true/false values to true/false values.
Definition 3.3.1. Let k ∈ N⁺. A function f : {0,1}^k → {0,1} is called a boolean function of arity k.

Definition 3.3.2. Suppose that P = {A_0, A_1, ..., A_{k-1}}. Given φ ∈ Form_P, we define a boolean function B_φ : {0,1}^k → {0,1} as follows. Given σ ∈ {0,1}^k, define a truth assignment v : P → {0,1} by letting v(A_i) = σ(i) for all i, and set B_φ(σ) = v̄(φ).

Theorem 3.3.3. Fix k ∈ N⁺, and let P = {A_0, A_1, ..., A_{k-1}}. For any boolean function f : {0,1}^k → {0,1} of arity k, there exists φ ∈ Form_P such that f = B_φ.

In fact, we'll prove a stronger theorem below which says that we may assume that our formula is in a particularly simple form.
Let's look at an example before we do the proof. Suppose that f : {0,1}^3 → {0,1} is given by the following table:

σ(0)  σ(1)  σ(2)  f(σ)
 0     0     0     1
 0     0     1     0
 0     1     0     1
 0     1     1     0
 1     0     0     0
 1     0     1     0
 1     1     0     1
 1     1     1     1

Suppose we wanted to come up with a formula φ such that f = B_φ. One option is to use a lot of thought to come up with an elegant solution. Another is simply to think as follows. Since f(000) = 1, perhaps we should put

(¬A_0) ∧ (¬A_1) ∧ (¬A_2)

into the formula somewhere. Similarly, since f(010) = 1, perhaps we should put

(¬A_0) ∧ A_1 ∧ (¬A_2)

into the formula somewhere. If we do the same to the other lines which have value 1, we can put all of these pieces together in a manner which makes them all play nice by connecting them with ∨. Thus, our formula is

((¬A_0) ∧ (¬A_1) ∧ (¬A_2)) ∨ ((¬A_0) ∧ A_1 ∧ (¬A_2)) ∨ (A_0 ∧ A_1 ∧ (¬A_2)) ∨ (A_0 ∧ A_1 ∧ A_2)

We now give the general proof.
Definition 3.3.4. A literal is an element of P ∪ {¬A : A ∈ P}. We denote the set of literals by Lit_P.

Definition 3.3.5.
• Let Conj_P = G(Sym_P*, Lit_P, {h_∧}). We call the elements of Conj_P conjunctive formulas.
• Let Disj_P = G(Sym_P*, Lit_P, {h_∨}). We call the elements of Disj_P disjunctive formulas.

Definition 3.3.6.
• Let DNF_P = G(Sym_P*, Conj_P, {h_∨}). We say that an element of DNF_P is in disjunctive normal form.
• Let CNF_P = G(Sym_P*, Disj_P, {h_∧}). We say that an element of CNF_P is in conjunctive normal form.
Theorem 3.3.7. Fix k ∈ N⁺, and let P = {A_0, A_1, ..., A_{k-1}}. For any boolean function f : {0,1}^k → {0,1} of arity k, there exists φ ∈ DNF_P such that f = B_φ.

Proof. Let T = {σ ∈ {0,1}^k : f(σ) = 1}. If T = ∅, we may let φ be A_0 ∧ (¬A_0). Suppose then that T ≠ ∅. For each σ ∈ T, let

ψ_σ = ⋀_{i=0}^{k-1} θ_{σ,i}

where

θ_{σ,i} = A_i   if σ(i) = 1
          ¬A_i  if σ(i) = 0

For each σ ∈ T, notice that ψ_σ ∈ Conj_P because θ_{σ,i} ∈ Lit_P for all i. Finally, let

φ = ⋁_{σ∈T} ψ_σ

and notice that φ ∈ DNF_P. Given any σ ∈ {0,1}^k with corresponding truth assignment v, we have v̄(ψ_τ) = 1 if and only if τ = σ, so B_φ(σ) = 1 if and only if σ ∈ T, i.e. f = B_φ.
Since DNF_P formulas suffice, we don't even need → if all we want to do is have the ability to express all boolean functions. In fact, we can also get rid of one of ∧ or ∨ as well (think about why).
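The construction in the proof of Theorem 3.3.7 is completely effective, and a short program can carry it out. The sketch below (our own illustration; the encoding of formulas as Python-evaluable strings is an assumption we make here, not the text's official syntax) builds the DNF formula for an arbitrary boolean function and checks it against the arity-3 majority connective ◊ discussed earlier.

```python
from itertools import product

def dnf_for(f, k):
    """Build a DNF formula realizing f : {0,1}^k -> {0,1}, following the
    proof of Theorem 3.3.7: one conjunctive clause per row where f is 1.
    The formula is returned as a Python-evaluable string over v[0..k-1]."""
    T = [sigma for sigma in product([0, 1], repeat=k) if f(sigma)]
    if not T:
        return "(v[0] and not v[0])"  # the unsatisfiable formula A0 ∧ (¬A0)
    clause = lambda sigma: " and ".join(
        f"v[{i}]" if sigma[i] == 1 else f"not v[{i}]" for i in range(k))
    return " or ".join(f"({clause(sigma)})" for sigma in T)

# The majority connective from the text: 1 iff at least two inputs are 1.
maj = lambda sigma: int(sum(sigma) >= 2)
phi = dnf_for(maj, 3)
# Check that the boolean function of phi agrees with maj on all 8 inputs.
ok = all(bool(eval(phi, {"v": sigma})) == bool(maj(sigma))
         for sigma in product([0, 1], repeat=3))
print(ok)  # True
```

Note that the generated formula uses only "not", "and", and "or", matching the observation that → is dispensable.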

3.4 Syntactic Implication

3.4.1 Motivation

We now seek to define a different notion of implication which is based on syntactic manipulations instead of a detour through truth assignments and other semantic notions. We will do this by setting up a proof system which gives rules on how to transform certain implications into other implications. There are many, many ways to do this. Some approaches pride themselves on being minimalistic by using a minimal number of axioms and rules, often at the expense of making the system extremely unnatural to work with. We'll take a different approach and set down our rules and axioms based on the types of steps in a proof that are used naturally throughout mathematics.

We begin with a somewhat informal description of what we plan to do. The objects that we will manipulate are pairs, where the first component is a set of formulas and the second is a formula. Given Γ ⊆ Form_P and φ ∈ Form_P, we write Γ ⊢ φ to intuitively mean that there is a proof of φ from the assumptions Γ. We begin with the most basic proofs. If φ ∈ Γ, i.e. if φ is one of your assumptions, then you're permitted to assert that Γ ⊢ φ.
Basic Proofs: Γ ⊢ φ if φ ∈ Γ.

Rules for ∧: We have two rules for ∧-elimination and one for ∧-introduction.

  Γ ⊢ φ ∧ ψ            Γ ⊢ φ ∧ ψ            Γ ⊢ φ    Γ ⊢ ψ
  ─────────  (∧ER)     ─────────  (∧EL)     ───────────────  (∧I)
    Γ ⊢ ψ                Γ ⊢ φ                 Γ ⊢ φ ∧ ψ

Rules for ∨: We have two rules for introducing ∨.

    Γ ⊢ φ                Γ ⊢ ψ
  ─────────  (∨IL)     ─────────  (∨IR)
  Γ ⊢ φ ∨ ψ            Γ ⊢ φ ∨ ψ

Rules for →:

  Γ ∪ {φ} ⊢ ψ           Γ ⊢ φ → ψ
  ───────────  (→I)     ───────────  (→E)
   Γ ⊢ φ → ψ            Γ ∪ {φ} ⊢ ψ

Rules for proofs by cases:

  Γ ∪ {φ} ⊢ θ    Γ ∪ {ψ} ⊢ θ           Γ ∪ {ψ} ⊢ φ    Γ ∪ {¬ψ} ⊢ φ
  ──────────────────────────  (∨PC)    ────────────────────────────  (¬PC)
        Γ ∪ {φ ∨ ψ} ⊢ θ                          Γ ⊢ φ

Rule for proof by contradiction:

  Γ ∪ {¬φ} ⊢ ψ    Γ ∪ {¬φ} ⊢ ¬ψ
  ─────────────────────────────  (Contr)
              Γ ⊢ φ

3.4.2 Official Definitions

Definition 3.4.1. Let Line_P = P(Form_P) × Form_P.


Definition 3.4.2. Let Assume_P = {(Γ, φ) ∈ Line_P : φ ∈ Γ}.

We need to define functions corresponding to the various rules. For example, we define h_∧EL : Line_P → P(Line_P) by letting

h_∧EL(Γ, φ) = {(Γ, ψ)}  if φ = ψ ∧ θ where ψ, θ ∈ Form_P
              ∅         otherwise

h_∧ER is similar, and we define h_∧I : (Line_P)² → P(Line_P) by

h_∧I((Γ_1, φ_1), (Γ_2, φ_2)) = {(Γ_1, φ_1 ∧ φ_2)}  if Γ_1 = Γ_2
                               ∅                   otherwise

For the ∨IL rule, we have the function h_∨IL : Line_P → P(Line_P) given by

h_∨IL(Γ, φ) = {(Γ, φ ∨ ψ) : ψ ∈ Form_P}

Similarly, we define functions h_∨IR and functions for the remaining rules.

We let H be the collection of all of these functions.

Definition 3.4.3. A deduction is a witnessing sequence in (Line_P, Assume_P, H).

Definition 3.4.4. Let Γ ⊆ Form_P and let φ ∈ Form_P. We write Γ ⊢_P φ, or simply Γ ⊢ φ if P is clear, to mean that

(Γ, φ) ∈ G(Line_P, Assume_P, H).

We pronounce Γ ⊢ φ as "Γ syntactically implies φ".
We pronounce ` as syntactically implies .
Notation 3.4.5.
1. If Γ = ∅, we write ⊢ φ instead of ∅ ⊢ φ.
2. If Γ = {γ}, we write γ ⊢ φ instead of {γ} ⊢ φ.

Definition 3.4.6. Γ is inconsistent if there exists θ ∈ Form_P such that Γ ⊢ θ and Γ ⊢ ¬θ. Otherwise, we say that Γ is consistent.

3.4.3 Examples of Deductions

Proposition 3.4.7. A ∧ B ⊢ A ∨ B.

Proof.

(1) {A ∧ B} ⊢ A ∧ B   (Assume_P)
(2) {A ∧ B} ⊢ A       (∧EL on 1)
(3) {A ∧ B} ⊢ A ∨ B   (∨IL on 2)

Proposition 3.4.8. ¬¬φ ⊢ φ for all φ ∈ Form_P.

Proof.

(1) {¬¬φ, ¬φ} ⊢ ¬φ    (Assume_P)
(2) {¬¬φ, ¬φ} ⊢ ¬¬φ   (Assume_P)
(3) {¬¬φ} ⊢ φ         (Contr on 1 and 2)

Proposition 3.4.9. ⊢ φ ∨ (¬φ) for all φ ∈ Form_P.

Proof.

(1) {φ} ⊢ φ             (Assume_P)
(2) {φ} ⊢ φ ∨ (¬φ)      (∨IL on 1)
(3) {¬φ} ⊢ ¬φ           (Assume_P)
(4) {¬φ} ⊢ φ ∨ (¬φ)     (∨IR on 3)
(5) ⊢ φ ∨ (¬φ)          (¬PC on 2 and 4)

Proposition 3.4.10. {φ ∨ ψ, ¬φ} ⊢ ψ for all φ, ψ ∈ Form_P.

Proof.

(1) {¬φ, φ, ¬ψ} ⊢ φ     (Assume_P)
(2) {¬φ, φ, ¬ψ} ⊢ ¬φ    (Assume_P)
(3) {¬φ, φ} ⊢ ψ         (Contr on 1 and 2)
(4) {¬φ, ψ} ⊢ ψ         (Assume_P)
(5) {¬φ, φ ∨ ψ} ⊢ ψ     (∨PC on 3 and 4)

3.4.4 Theorems about ⊢

Proposition 3.4.11. If Γ ⊢ φ and Γ ⊆ Γ′, then Γ′ ⊢ φ.

Proof. The proof is by induction. We let X = {(Γ, φ) ∈ G : Γ′ ⊢ φ for all Γ′ ⊇ Γ} and we show by induction on G that X = G. We begin by noting that if φ ∈ Γ, then for every Γ′ ⊇ Γ, we have φ ∈ Γ′ and hence Γ′ ⊢ φ. Therefore, (Γ, φ) ∈ X for all (Γ, φ) ∈ Assume_P.

We first handle the ∧EL rule. Suppose that (Γ, φ ∧ ψ) ∈ X. We need to show that (Γ, φ) ∈ X. Suppose that Γ′ ⊇ Γ. We then have that Γ′ ⊢ φ ∧ ψ by induction (i.e. since (Γ, φ ∧ ψ) ∈ X), hence Γ′ ⊢ φ by the ∧EL rule. Therefore, (Γ, φ) ∈ X. The other ∧ rules and the ∨ rules are similar.

We now handle the →E rule. Suppose that (Γ, φ → ψ) ∈ X. We need to show that (Γ ∪ {φ}, ψ) ∈ X. Suppose that Γ′ ⊇ Γ ∪ {φ}. We then have that Γ′ ⊇ Γ, hence Γ′ ⊢ φ → ψ by induction, and so Γ′ ∪ {φ} ⊢ ψ by the →E rule. However, Γ′ ∪ {φ} = Γ′ because φ ∈ Γ′, so Γ′ ⊢ ψ. Therefore, (Γ ∪ {φ}, ψ) ∈ X.

We now handle the →I rule. Suppose that (Γ ∪ {φ}, ψ) ∈ X. We need to show that (Γ, φ → ψ) ∈ X. Suppose that Γ′ ⊇ Γ. We then have that Γ′ ∪ {φ} ⊇ Γ ∪ {φ}, hence Γ′ ∪ {φ} ⊢ ψ by induction, and so Γ′ ⊢ φ → ψ by the →I rule. Therefore, (Γ, φ → ψ) ∈ X.

Let's go for the ∨PC rule. Suppose that (Γ ∪ {γ}, φ) ∈ X and (Γ ∪ {δ}, φ) ∈ X. We need to show that (Γ ∪ {γ ∨ δ}, φ) ∈ X. Suppose that Γ′ ⊇ Γ ∪ {γ ∨ δ}. We then have that Γ′ ∪ {γ} ⊇ Γ ∪ {γ} and Γ′ ∪ {δ} ⊇ Γ ∪ {δ}, hence Γ′ ∪ {γ} ⊢ φ and Γ′ ∪ {δ} ⊢ φ by induction, and so Γ′ ∪ {γ ∨ δ} ⊢ φ by the ∨PC rule. However, Γ′ ∪ {γ ∨ δ} = Γ′ because γ ∨ δ ∈ Γ′, so Γ′ ⊢ φ. Therefore, (Γ ∪ {γ ∨ δ}, φ) ∈ X.

Let's next attack the ¬PC rule. Suppose that (Γ ∪ {ψ}, φ) ∈ X and (Γ ∪ {¬ψ}, φ) ∈ X. We need to show that (Γ, φ) ∈ X. Suppose that Γ′ ⊇ Γ. We then have that Γ′ ∪ {ψ} ⊇ Γ ∪ {ψ} and Γ′ ∪ {¬ψ} ⊇ Γ ∪ {¬ψ}, hence Γ′ ∪ {ψ} ⊢ φ and Γ′ ∪ {¬ψ} ⊢ φ by induction, and so Γ′ ⊢ φ by the ¬PC rule. Therefore, (Γ, φ) ∈ X.

We finish off with the Contr rule. Suppose that (Γ ∪ {¬φ}, ψ) ∈ X and (Γ ∪ {¬φ}, ¬ψ) ∈ X. We need to show that (Γ, φ) ∈ X. Suppose that Γ′ ⊇ Γ. We then have that Γ′ ∪ {¬φ} ⊇ Γ ∪ {¬φ}, hence Γ′ ∪ {¬φ} ⊢ ψ and Γ′ ∪ {¬φ} ⊢ ¬ψ by induction, and so Γ′ ⊢ φ by the Contr rule. Therefore, (Γ, φ) ∈ X.

The result follows by induction.


Proposition 3.4.12. If Γ is inconsistent, then Γ ⊢ φ for all φ ∈ Form_P.

Proof. Fix θ such that Γ ⊢ θ and Γ ⊢ ¬θ, and fix φ ∈ Form_P. We have that Γ ∪ {¬φ} ⊢ θ and Γ ∪ {¬φ} ⊢ ¬θ by Proposition 3.4.11. Therefore, Γ ⊢ φ by using the Contr rule.

Proposition 3.4.13.
1. If Γ ∪ {φ} is inconsistent, then Γ ⊢ ¬φ.
2. If Γ ∪ {¬φ} is inconsistent, then Γ ⊢ φ.

Proof.
1. Since Γ ∪ {φ} is inconsistent, we know that Γ ∪ {φ} ⊢ ¬φ by Proposition 3.4.12. Since we also have that Γ ∪ {¬φ} ⊢ ¬φ by Assume_P, it follows that Γ ⊢ ¬φ by the ¬PC rule.
2. Since Γ ∪ {¬φ} is inconsistent, we know that Γ ∪ {¬φ} ⊢ φ by Proposition 3.4.12. Since we also have that Γ ∪ {φ} ⊢ φ by Assume_P, it follows that Γ ⊢ φ by the ¬PC rule.

Corollary 3.4.14. If Γ ⊆ Form_P is consistent and φ ∈ Form_P, then either Γ ∪ {φ} is consistent or Γ ∪ {¬φ} is consistent.

Proof. If both Γ ∪ {φ} and Γ ∪ {¬φ} are inconsistent, then both Γ ⊢ ¬φ and Γ ⊢ φ by Proposition 3.4.13, so Γ is inconsistent.

Proposition 3.4.15.
1. If Γ ⊢ φ and Γ ∪ {φ} ⊢ ψ, then Γ ⊢ ψ.
2. If Γ ⊢ φ and Γ ⊢ φ → ψ, then Γ ⊢ ψ.

Proof.
1. Since Γ ⊢ φ, it follows from Proposition 3.4.11 that Γ ∪ {¬φ} ⊢ φ. Since we also have Γ ∪ {¬φ} ⊢ ¬φ by Assume_P, we may conclude that Γ ∪ {¬φ} is inconsistent. Therefore, by Proposition 3.4.12, we have that Γ ∪ {¬φ} ⊢ ψ. Now we also have Γ ∪ {φ} ⊢ ψ by assumption, so the ¬PC rule gives that Γ ⊢ ψ.
2. Since Γ ⊢ φ → ψ, we can conclude that Γ ∪ {φ} ⊢ ψ by rule →E. The result follows from part 1.
Proposition 3.4.16. Γ ⊢ φ if and only if there is a finite Γ_0 ⊆ Γ such that Γ_0 ⊢ φ.

Proof. The proof is by induction. We let X = {(Γ, φ) ∈ G : there exists a finite Γ_0 ⊆ Γ such that Γ_0 ⊢ φ} and we show by induction on G that X = G. We begin by noting that if φ ∈ Γ, then we have that {φ} is a finite subset of Γ and {φ} ⊢ φ. Therefore, (Γ, φ) ∈ X for all (Γ, φ) ∈ Assume_P.

We first handle the ∧EL rule. Suppose that (Γ, φ ∧ ψ) ∈ X. We need to show that (Γ, φ) ∈ X. By induction (i.e. since (Γ, φ ∧ ψ) ∈ X), we may fix a finite Γ_0 ⊆ Γ such that Γ_0 ⊢ φ ∧ ψ. We then have that Γ_0 ⊢ φ by the ∧EL rule. Therefore, (Γ, φ) ∈ X. The other ∧ER rule and the ∨ rules are similar.

We now handle the ∧I rule. Suppose that (Γ, φ) ∈ X and (Γ, ψ) ∈ X. We need to show that (Γ, φ ∧ ψ) ∈ X. By induction, we may fix a finite Γ_0 ⊆ Γ such that Γ_0 ⊢ φ, and we may fix a finite Γ_1 ⊆ Γ such that Γ_1 ⊢ ψ. We then have that Γ_0 ∪ Γ_1 ⊢ φ and Γ_0 ∪ Γ_1 ⊢ ψ by Proposition 3.4.11, hence Γ_0 ∪ Γ_1 ⊢ φ ∧ ψ by the ∧I rule. Therefore, (Γ, φ ∧ ψ) ∈ X because Γ_0 ∪ Γ_1 is a finite subset of Γ.

We now handle the →E rule. Suppose that (Γ, φ → ψ) ∈ X. We need to show that (Γ ∪ {φ}, ψ) ∈ X. By induction, we may fix a finite Γ_0 ⊆ Γ such that Γ_0 ⊢ φ → ψ. We then have that Γ_0 ∪ {φ} ⊢ ψ by the →E rule. Therefore, (Γ ∪ {φ}, ψ) ∈ X because Γ_0 ∪ {φ} is a finite subset of Γ ∪ {φ}.

We now handle the →I rule. Suppose that (Γ ∪ {φ}, ψ) ∈ X. We need to show that (Γ, φ → ψ) ∈ X. By induction, we may fix a finite Γ_0 ⊆ Γ ∪ {φ} such that Γ_0 ⊢ ψ. Let Γ_0′ = Γ_0 \ {φ}. We then have Γ_0 ⊆ Γ_0′ ∪ {φ}, hence Γ_0′ ∪ {φ} ⊢ ψ by Proposition 3.4.11, and so Γ_0′ ⊢ φ → ψ by the →I rule. Therefore, (Γ, φ → ψ) ∈ X because Γ_0′ is a finite subset of Γ.

The other rules are exercises. The result follows by induction.

Corollary 3.4.17. If every finite subset of Γ is consistent, then Γ is consistent.

Proof. Suppose that Γ is inconsistent, and fix θ ∈ Form_P such that Γ ⊢ θ and Γ ⊢ ¬θ. By Proposition 3.4.16, there exist finite sets Γ_0 ⊆ Γ and Γ_1 ⊆ Γ such that Γ_0 ⊢ θ and Γ_1 ⊢ ¬θ. Using Proposition 3.4.11, it follows that Γ_0 ∪ Γ_1 ⊢ θ and Γ_0 ∪ Γ_1 ⊢ ¬θ, so Γ_0 ∪ Γ_1 is a finite inconsistent subset of Γ.

3.5 Soundness and Completeness

3.5.1 The Soundness Theorem

Theorem 3.5.1 (Soundness Theorem).
1. If Γ ⊢ φ, then Γ ⊨ φ.
2. Every satisfiable set of formulas is consistent.

Proof.
1. The proof is by induction. We let X = {(Γ, φ) ∈ G : Γ ⊨ φ} and we show by induction on G that X = G. We begin by noting that if φ ∈ Γ, then Γ ⊨ φ, because if v : P → {0,1} is such that v̄(γ) = 1 for all γ ∈ Γ, then v̄(φ) = 1 simply because φ ∈ Γ. Therefore, (Γ, φ) ∈ X for all (Γ, φ) ∈ Assume_P.

We first handle the ∧EL rule. Suppose that Γ ⊨ φ ∧ ψ. We need to show that Γ ⊨ φ. However, this is straightforward, because if v : P → {0,1} is such that v̄(γ) = 1 for all γ ∈ Γ, then v̄(φ ∧ ψ) = 1 because Γ ⊨ φ ∧ ψ, hence v̄(φ) = 1. Therefore, Γ ⊨ φ. The other ∧ rules and the ∨ rules are similar.

We now handle the →E rule. Suppose that Γ ⊨ φ → ψ. We need to show that Γ ∪ {φ} ⊨ ψ. Let v : P → {0,1} be such that v̄(γ) = 1 for all γ ∈ Γ ∪ {φ}. Since Γ ⊨ φ → ψ, we have v̄(φ → ψ) = 1. Since v̄(φ) = 1, it follows that v̄(ψ) = 1. Therefore, Γ ∪ {φ} ⊨ ψ. The →I rule is similar.

Let's next attack the ¬PC rule. Suppose that Γ ∪ {ψ} ⊨ φ and Γ ∪ {¬ψ} ⊨ φ. We need to show that Γ ⊨ φ. Let v : P → {0,1} be such that v̄(γ) = 1 for all γ ∈ Γ. If v̄(ψ) = 1, then v̄(φ) = 1 because Γ ∪ {ψ} ⊨ φ. Otherwise, we have v̄(ψ) = 0, hence v̄(¬ψ) = 1, and thus v̄(φ) = 1 because Γ ∪ {¬ψ} ⊨ φ. Therefore, Γ ⊨ φ. The ∨PC rule is similar.

We finish off with the Contr rule. Suppose that Γ ∪ {¬φ} ⊨ ψ and Γ ∪ {¬φ} ⊨ ¬ψ. We need to show that Γ ⊨ φ. Let v : P → {0,1} be such that v̄(γ) = 1 for all γ ∈ Γ. Suppose that v̄(φ) = 0. We then have v̄(¬φ) = 1, and so both v̄(ψ) = 1 and v̄(¬ψ) = 1 because Γ ∪ {¬φ} ⊨ ψ and Γ ∪ {¬φ} ⊨ ¬ψ. This is a contradiction, so we may conclude that v̄(φ) = 1. Therefore, Γ ⊨ φ.

The result follows by induction.

2. Let Γ be a satisfiable set of formulas. Fix a truth assignment v : P → {0,1} such that v̄(γ) = 1 for all γ ∈ Γ. Suppose that Γ is inconsistent, and fix θ ∈ Form_P such that Γ ⊢ θ and Γ ⊢ ¬θ. We then have Γ ⊨ θ and Γ ⊨ ¬θ by part 1, hence v̄(θ) = 1 and v̄(¬θ) = 1, a contradiction. It follows that Γ is consistent.

3.5.2 The Completeness Theorem

The Completeness Theorem is the converse of the Soundness Theorem (both parts). In other words, it says that (1) if Γ ⊨ φ, then Γ ⊢ φ, and (2) every consistent set of formulas is satisfiable. Part (1) looks quite difficult to tackle directly (think about the amount of cleverness that went into finding the simple deductions we have used so far), so instead we go after (2) first and use it to prove (1).

Suppose then that Γ ⊆ Form_P is consistent. We need to build a truth assignment v : P → {0,1} such that v̄(γ) = 1 for all γ ∈ Γ. Suppose that we are trying to define v(A) for a given A ∈ P. If A ∈ Γ, then we should certainly set v(A) = 1. Similarly, if ¬A ∈ Γ, then we should set v(A) = 0. But what should we do if both A ∉ Γ and ¬A ∉ Γ? What if every formula in Γ is very long and complex, so that you have no idea how to start defining the truth assignment? The idea is to expand Γ to a larger consistent set which has some simpler formulas that aid us in deciphering how to define v. Ideally, we would like to extend Γ to a consistent set Γ′ such that for all A ∈ P, either A ∈ Γ′ or ¬A ∈ Γ′, because that would give us a clear way to define v. However, in order to check that our v satisfies v̄(γ) = 1 for all γ ∈ Γ, we want even more. That is the content of our next definition.
Definition 3.5.2. Let ∆ ⊆ Form_P. We say that ∆ is complete if for all φ ∈ Form_P, either φ ∈ ∆ or ¬φ ∈ ∆.

Our first task is to show that if Γ is consistent, then it can be expanded to a consistent and complete set ∆. We first prove this in the special case when P is countable, because the construction is more transparent and avoids more powerful set-theoretic tools.
Proposition 3.5.3. Suppose that P is countable. If Γ is consistent, then there exists a set ∆ ⊇ Γ which is consistent and complete.

Proof. Since P is countable, it follows that Form_P is countable. List Form_P as φ_1, φ_2, φ_3, .... We define a sequence of sets ∆_0, ∆_1, ∆_2, ... recursively as follows. Let ∆_0 = Γ. Suppose that n ∈ N and we have defined ∆_n. Let

∆_{n+1} = ∆_n ∪ {φ_{n+1}}   if ∆_n ∪ {φ_{n+1}} is consistent
          ∆_n ∪ {¬φ_{n+1}}  otherwise

Using induction and Corollary 3.4.14, it follows that ∆_n is consistent for all n ∈ N. Let ∆ = ⋃_{n∈N} ∆_n.

We first argue that ∆ is consistent. For any finite subset ∆′ of ∆, there exists an n ∈ N such that ∆′ ⊆ ∆_n, and so ∆′ is consistent because every ∆_n is consistent. Therefore, ∆ is consistent by Corollary 3.4.17. We end by arguing that ∆ is complete. Fix φ ∈ Form_P, and fix n ∈ N⁺ such that φ = φ_n. By construction, we either have φ ∈ ∆_n or ¬φ ∈ ∆_n. Therefore, ∆ is complete.
We now show how to handle the uncountable case. The idea is that a complete consistent set is a maximal consistent set, so we can obtain one using Zorn's Lemma (a standard set-theoretic tool for obtaining maximal objects). If you are unfamiliar with Zorn's Lemma, feel free to focus only on the countable case until we cover set theory.
Definition 3.5.4. ∆ is maximal consistent if ∆ is consistent and there is no Γ ⊋ ∆ which is consistent.

Proposition 3.5.5. ∆ is maximal consistent if and only if ∆ is consistent and complete.

Proof. Suppose first that ∆ is maximal consistent. We certainly have that ∆ is consistent. Fix φ ∈ Form_P. By Corollary 3.4.14, either ∆ ∪ {φ} is consistent or ∆ ∪ {¬φ} is consistent. If ∆ ∪ {φ} is consistent, then φ ∈ ∆ because ∆ is maximal consistent. Similarly, if ∆ ∪ {¬φ} is consistent, then ¬φ ∈ ∆ because ∆ is maximal consistent. Therefore, either φ ∈ ∆ or ¬φ ∈ ∆.

Suppose now that ∆ is consistent and complete. Suppose that Γ ⊋ ∆ and fix φ ∈ Γ \ ∆. Since ∆ is complete and φ ∉ ∆, we have ¬φ ∈ ∆. Therefore, Γ ⊢ φ and Γ ⊢ ¬φ, so Γ is inconsistent. It follows that ∆ is maximal consistent.

Proposition 3.5.6. If Γ is consistent, then there exists a set ∆ ⊇ Γ which is consistent and complete.

Proof. Let S = {∆ ⊆ Form_P : Γ ⊆ ∆ and ∆ is consistent}, and order S by ⊆. Notice that S is nonempty because Γ ∈ S. Suppose that C ⊆ S is a chain in S. Let Λ = ⋃C = {φ ∈ Form_P : φ ∈ ∆ for some ∆ ∈ C}. We need to argue that Λ is consistent. Suppose that Λ_0 is a finite subset of Λ, say Λ_0 = {θ_1, θ_2, ..., θ_n}. For each θ_i, fix ∆_i ∈ C with θ_i ∈ ∆_i. Since C is a chain, there exists j such that ∆_i ⊆ ∆_j for all i. Now ∆_j ∈ C ⊆ S, so ∆_j is consistent, and hence Λ_0 is consistent. Therefore, Λ is consistent by Corollary 3.4.17. It follows that Λ ∈ S, and using the fact that ∆ ⊆ Λ for all ∆ ∈ C, we may conclude that C has an upper bound.

Therefore, by Zorn's Lemma, S has a maximal element ∆. Notice that ∆ is maximal consistent, hence ∆ is complete and consistent by Proposition 3.5.5.

Lemma 3.5.7. Suppose that ∆ is consistent and complete. If ∆ ⊢ φ, then φ ∈ ∆.

Proof. Suppose that ∆ ⊢ φ. Since ∆ is complete, we have that either φ ∈ ∆ or ¬φ ∈ ∆. Now if ¬φ ∈ ∆, then ∆ ⊢ ¬φ, hence ∆ is inconsistent, contradicting our assumption. It follows that φ ∈ ∆.
Lemma 3.5.8. Suppose that ∆ is consistent and complete. We have
1. ¬φ ∈ ∆ if and only if φ ∉ ∆.
2. φ ∧ ψ ∈ ∆ if and only if φ ∈ ∆ and ψ ∈ ∆.
3. φ ∨ ψ ∈ ∆ if and only if either φ ∈ ∆ or ψ ∈ ∆.
4. φ → ψ ∈ ∆ if and only if either φ ∉ ∆ or ψ ∈ ∆.

Proof.
1. If ¬φ ∈ ∆, then φ ∉ ∆, because otherwise ∆ ⊢ φ and ∆ ⊢ ¬φ, and so ∆ would be inconsistent. Conversely, if φ ∉ ∆, then ¬φ ∈ ∆ because ∆ is complete.

2. Suppose first that φ ∧ ψ ∈ ∆. We then have that ∆ ⊢ φ ∧ ψ, hence ∆ ⊢ φ by the ∧EL rule and ∆ ⊢ ψ by the ∧ER rule. Therefore, φ ∈ ∆ and ψ ∈ ∆ by Lemma 3.5.7.
Conversely, suppose that φ ∈ ∆ and ψ ∈ ∆. We then have ∆ ⊢ φ and ∆ ⊢ ψ, hence ∆ ⊢ φ ∧ ψ by the ∧I rule. Therefore, φ ∧ ψ ∈ ∆ by Lemma 3.5.7.

3. Suppose first that φ ∨ ψ ∈ ∆. Suppose that φ ∉ ∆. Since ∆ is complete, we have that ¬φ ∈ ∆. From Proposition 3.4.10, we know that {φ ∨ ψ, ¬φ} ⊢ ψ, hence ∆ ⊢ ψ by Proposition 3.4.11. Therefore, ψ ∈ ∆ by Lemma 3.5.7. It follows that either φ ∈ ∆ or ψ ∈ ∆.
Conversely, suppose that either φ ∈ ∆ or ψ ∈ ∆.
Case 1: Suppose that φ ∈ ∆. We have ∆ ⊢ φ, hence ∆ ⊢ φ ∨ ψ by the ∨IL rule. Therefore, φ ∨ ψ ∈ ∆ by Lemma 3.5.7.
Case 2: Suppose that ψ ∈ ∆. We have ∆ ⊢ ψ, hence ∆ ⊢ φ ∨ ψ by the ∨IR rule. Therefore, φ ∨ ψ ∈ ∆ by Lemma 3.5.7.

4. Suppose first that φ → ψ ∈ ∆. Suppose that φ ∈ ∆. We then have that ∆ ⊢ φ and ∆ ⊢ φ → ψ, hence ∆ ⊢ ψ by Proposition 3.4.15. Therefore, ψ ∈ ∆ by Lemma 3.5.7. It follows that either φ ∉ ∆ or ψ ∈ ∆.
Conversely, suppose that either φ ∉ ∆ or ψ ∈ ∆.
Case 1: Suppose that φ ∉ ∆. We have ¬φ ∈ ∆ because ∆ is complete, hence ∆ ∪ {φ} is inconsistent (as ∆ ∪ {φ} ⊢ φ and ∆ ∪ {φ} ⊢ ¬φ). It follows that ∆ ∪ {φ} ⊢ ψ by Proposition 3.4.12, hence ∆ ⊢ φ → ψ by the →I rule. Therefore, φ → ψ ∈ ∆ by Lemma 3.5.7.
Case 2: Suppose that ψ ∈ ∆. We have ψ ∈ ∆ ∪ {φ}, hence ∆ ∪ {φ} ⊢ ψ, and so ∆ ⊢ φ → ψ by the →I rule. Therefore, φ → ψ ∈ ∆ by Lemma 3.5.7.

Proposition 3.5.9. If ∆ is consistent and complete, then ∆ is satisfiable.

Proof. Suppose that ∆ is complete and consistent. Define v : P → {0,1} by

v(A) = 1 if A ∈ ∆
       0 if A ∉ ∆

We prove by induction on φ that φ ∈ ∆ if and only if v̄(φ) = 1. For any A ∈ P, we have

A ∈ ∆ ⟺ v(A) = 1 ⟺ v̄(A) = 1

by our definition of v.

Suppose that the result holds for φ. We have

¬φ ∈ ∆ ⟺ φ ∉ ∆        (by Lemma 3.5.8)
       ⟺ v̄(φ) = 0     (by induction)
       ⟺ v̄(¬φ) = 1

Suppose that the result holds for φ and ψ. We have

φ ∧ ψ ∈ ∆ ⟺ φ ∈ ∆ and ψ ∈ ∆            (by Lemma 3.5.8)
          ⟺ v̄(φ) = 1 and v̄(ψ) = 1     (by induction)
          ⟺ v̄(φ ∧ ψ) = 1

and

φ ∨ ψ ∈ ∆ ⟺ φ ∈ ∆ or ψ ∈ ∆             (by Lemma 3.5.8)
          ⟺ v̄(φ) = 1 or v̄(ψ) = 1      (by induction)
          ⟺ v̄(φ ∨ ψ) = 1

and finally

φ → ψ ∈ ∆ ⟺ φ ∉ ∆ or ψ ∈ ∆             (by Lemma 3.5.8)
          ⟺ v̄(φ) = 0 or v̄(ψ) = 1      (by induction)
          ⟺ v̄(φ → ψ) = 1

Therefore, by induction, we have φ ∈ ∆ if and only if v̄(φ) = 1. In particular, we have v̄(φ) = 1 for all φ ∈ ∆, hence ∆ is satisfiable.
Theorem 3.5.10 (Completeness Theorem). (Suppose that P is countable.)
1. Every consistent set of formulas is satisfiable.
2. If Γ ⊨ φ, then Γ ⊢ φ.

Proof.
1. Suppose that Γ is consistent. By Proposition 3.5.6, we may fix ∆ ⊇ Γ which is consistent and complete. Now ∆ is satisfiable by Proposition 3.5.9, so we may fix v : P → {0,1} such that v̄(δ) = 1 for all δ ∈ ∆. Since Γ ⊆ ∆, it follows that v̄(γ) = 1 for all γ ∈ Γ, hence Γ is satisfiable.
2. Suppose that Γ ⊨ φ. We then have that Γ ∪ {¬φ} is unsatisfiable, hence Γ ∪ {¬φ} is inconsistent by part 1. It follows from Proposition 3.4.13 that Γ ⊢ φ.

3.6 Compactness and Applications

3.6.1 The Compactness Theorem

Corollary 3.6.1 (Compactness Theorem).
1. If Γ ⊨ φ, then there exists a finite Γ_0 ⊆ Γ such that Γ_0 ⊨ φ.
2. If every finite subset of Γ is satisfiable, then Γ is satisfiable.

Proof. We first prove 1. Suppose that Γ ⊨ φ. By the Completeness Theorem, we have Γ ⊢ φ. Using Proposition 3.4.16, we may fix a finite Γ_0 ⊆ Γ such that Γ_0 ⊢ φ. By the Soundness Theorem, we have Γ_0 ⊨ φ.

We now prove 2. If every finite subset of Γ is satisfiable, then every finite subset of Γ is consistent by the Soundness Theorem, hence Γ is consistent by Corollary 3.4.17, and so Γ is satisfiable by the Completeness Theorem.

3.6.2 Combinatorial Applications

Definition 3.6.2. Let G = (V, E) be a graph, and let k ∈ N⁺. A k-coloring of G is a function f : V → [k] such that for all u, v ∈ V which are linked by an edge in E, we have f(u) ≠ f(v).

Proposition 3.6.3. Let G = (V, E) be a (possibly infinite) graph and let k ∈ N⁺. If every finite subgraph of G is k-colorable, then G is k-colorable.
Proof. Let P = {A_{u,i} : u ∈ V and i ∈ [k]}. Let

Γ = { ⋁_{i=0}^{k-1} A_{u,i} : u ∈ V } ∪ { ¬(A_{u,i} ∧ A_{u,j}) : u ∈ V and i, j ∈ [k] with i ≠ j }
    ∪ { ¬(A_{u,i} ∧ A_{w,i}) : u and w are linked by an edge in E, and i ∈ [k] }

We use the Compactness Theorem to show that Γ is satisfiable. Suppose that Γ_0 ⊆ Γ is finite. Let {u_1, u_2, ..., u_n} be all of the elements u ∈ V such that A_{u,i} occurs in some element of Γ_0 for some i. Since every finite subgraph of G is k-colorable, we may fix a k-coloring f : {u_1, u_2, ..., u_n} → [k] such that whenever u_ℓ and u_m are linked by an edge of E, we have f(u_ℓ) ≠ f(u_m). If we define a truth assignment v : P → {0,1} by

v(A_{w,i}) = 1 if w = u_ℓ for some ℓ and f(u_ℓ) = i
             0 otherwise

we see that v̄(γ) = 1 for all γ ∈ Γ_0. Thus, Γ_0 is satisfiable. Therefore, Γ is satisfiable by the Compactness Theorem.

Fix a truth assignment v : P → {0,1} such that v̄(γ) = 1 for all γ ∈ Γ. Notice that for each u ∈ V, there exists a unique i such that v(A_{u,i}) = 1, because of the first two sets in the definition of Γ. If we define f : V → [k] by letting f(u) be the unique i such that v(A_{u,i}) = 1, then for all u, w ∈ V linked by an edge in E we have that f(u) ≠ f(w) (because of the third set in the definition of Γ). Therefore, G is k-colorable.
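For a finite graph, the satisfiability argument in the proof can be replaced by a direct search. The Python sketch below (our own illustration; the real force of the proposition concerns infinite graphs, which no finite search can handle) checks k-colorability in the sense of Definition 3.6.2 by brute force, which is exactly what deciding the finite sets Γ_0 amounts to.

```python
from itertools import product

def k_colorable(vertices, edges, k):
    """Brute-force k-colorability of a finite graph: try every assignment
    of a color in range(k) to each vertex and check that no edge is
    monochromatic (the role of the third family of clauses in Γ; picking
    exactly one color per vertex plays the role of the first two)."""
    for colors in product(range(k), repeat=len(vertices)):
        f = dict(zip(vertices, colors))
        if all(f[u] != f[w] for (u, w) in edges):
            return True
    return False

# K4, the complete graph on 4 vertices, is 4-colorable but not 3-colorable.
V = [0, 1, 2, 3]
E = [(u, w) for u in V for w in V if u < w]
print(k_colorable(V, E, 4), k_colorable(V, E, 3))  # True False
```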
Corollary 3.6.4. Every (possibly infinite) planar graph is 4-colorable.
Proof. Since every subgraph of a planar graph is planar, this follows trivially from the previous proposition
and the highly nontrivial theorem that every finite planar graph is 4-colorable.
Definition 3.6.5. A set T ⊆ {0,1}* is called a tree if whenever σ ∈ T and τ ⊆ σ (i.e. τ is an initial segment of σ), we have τ ∈ T.
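For finite sets of binary strings, the defining closure condition is easy to test directly. A minimal Python sketch (our own illustration, encoding elements of {0,1}* as Python strings):

```python
def is_tree(T):
    """Check Definition 3.6.5 for a finite set of binary strings: T is a
    tree exactly when it is closed under initial segments, i.e. every
    proper prefix of every element of T is also in T."""
    return all(sigma[:n] in T for sigma in T for n in range(len(sigma)))

print(is_tree({"", "0", "1", "01"}))  # True: prefix-closed
print(is_tree({"", "01"}))            # False: the prefix "0" is missing
```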

Theorem 3.6.6 (Weak König's Lemma). Suppose that T ⊆ {0,1}* is a tree which is infinite. There exists an f : N → {0,1} such that f↾[n] ∈ T for all n ∈ N.

Proof. Let P = {A_σ : σ ∈ T}. For each n ∈ N, let T_n = {σ ∈ T : |σ| = n} and notice that T_n ≠ ∅ for all n ∈ N because T is an infinite tree. Let

Γ = { ⋁_{σ∈T_n} A_σ : n ∈ N } ∪ { ¬(A_σ ∧ A_τ) : n ∈ N and σ, τ ∈ T_n with σ ≠ τ }
    ∪ { A_σ → A_τ : σ, τ ∈ T with τ ⊆ σ }

We use the Compactness Theorem to show that Γ is satisfiable. Suppose that Γ_0 ⊆ Γ is finite. Let {σ_1, σ_2, ..., σ_k} be all of the elements σ ∈ {0,1}* such that A_σ occurs in some element of Γ_0. Let n = max{|σ_1|, |σ_2|, ..., |σ_k|}. Since T_n ≠ ∅, we may fix τ ∈ T_n. If we define a truth assignment v : P → {0,1} by

v(A_σ) = 1 if σ ⊆ τ
         0 otherwise

we see that v̄(γ) = 1 for all γ ∈ Γ_0. Thus, Γ_0 is satisfiable. Therefore, Γ is satisfiable by the Compactness Theorem.

Fix a truth assignment v : P → {0,1} such that v̄(γ) = 1 for all γ ∈ Γ. Notice that for each n ∈ N⁺, there exists a unique σ ∈ T_n such that v(A_σ) = 1, because of the first two sets in the definition of Γ. For each n, denote the unique such σ by σ_n, and notice that σ_m ⊆ σ_n whenever m ≤ n (because of the third set in the definition of Γ). Define f : N → {0,1} by letting f(n) = σ_{n+1}(n). We then have that f↾[n] = σ_n ∈ T for all n ∈ N.

3.6.3 An Algebraic Application

Definition 3.6.7. An ordered abelian group is an abelian group (A, +, 0) together with a binary relation ≤ on A such that

1. ≤ is a linear ordering on A, i.e. we have
   • For all a ∈ A, we have a ≤ a.
   • For all a, b ∈ A, either a ≤ b or b ≤ a.
   • If a ≤ b and b ≤ a, then a = b.
   • If a ≤ b and b ≤ c, then a ≤ c.
2. If a ≤ b and c ≤ d, then a + c ≤ b + d.

Example 3.6.8. (Z, +, 0) with its usual order is an ordered abelian group.

Example 3.6.9. Define ≤ on Zⁿ using the lexicographic order. In other words, given distinct elements ~a = (a_1, a_2, ..., a_n) and ~b = (b_1, b_2, ..., b_n) in Zⁿ, let i be least such that a_i ≠ b_i, and set ~a < ~b if a_i <_Z b_i, and ~b < ~a if b_i <_Z a_i. With this order, (Zⁿ, +, 0) is an ordered abelian group.
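The lexicographic order is easy to experiment with. The sketch below (our own illustration) implements the comparison from Example 3.6.9 on tuples of integers and spot-checks the translation axiom (a ≤ b and c ≤ d imply a + c ≤ b + d) on sample points of Z².

```python
def lex_le(a, b):
    """The lexicographic order of Example 3.6.9: compare at the least
    index where the tuples differ; equal tuples satisfy a ≤ a."""
    for x, y in zip(a, b):
        if x != y:
            return x < y
    return True  # a == b

add = lambda u, v: tuple(x + y for x, y in zip(u, v))

# Spot-check: a ≤ b and c ≤ d, so we should have a + c ≤ b + d.
a, b, c, d = (0, 5), (1, -9), (2, 2), (2, 3)
print(lex_le(a, b), lex_le(c, d), lex_le(add(a, c), add(b, d)))  # True True True
```

(Python's built-in tuple comparison happens to agree with `lex_le`; writing it out makes the "least differing index" rule explicit.)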
Proposition 3.6.10. Suppose that (A, +, 0) is an ordered abelian group, and define < by letting a < b if a ≤ b and a ≠ b. We then have
1. For all a, b ∈ A, exactly one of a < b, a = b, or b < a holds.
2. If a < b and b ≤ c, then a < c.
3. If a ≤ b and b < c, then a < c.

Proof.
1. Let a, b ∈ A. We first show that at least one holds. Suppose then that a ≠ b. We either have a ≤ b or b ≤ a. If a ≤ b, we then have a < b, while if b ≤ a, we then have b < a.
We now show that at most one occurs. Clearly, we can't have both a < b and a = b, nor can we have both a = b and b < a. Suppose then that we have both a < b and b < a. We would then have both a ≤ b and b ≤ a, hence a = b, a contradiction.
2. Since a ≤ b and b ≤ c, we have a ≤ c. If a = c, it would follow that a ≤ b and b ≤ a, hence a = b, a contradiction.
3. Since a ≤ b and b ≤ c, we have a ≤ c. If a = c, it would follow that c ≤ b and b ≤ c, hence b = c, a contradiction.

Proposition 3.6.11. Suppose that (A, +, 0) is an ordered abelian group.
1. If a < b and c ∈ A, then a + c < b + c.
2. If a < b and c ≤ d, then a + c < b + d.
3. If a > 0, then −a < 0.
4. If a < 0, then −a > 0.

Proof.
1. Since a ≤ b and c ≤ c, we have a + c ≤ b + c. If a + c = b + c, then we would have a = b, a contradiction. Therefore, a + c < b + c.
2. We have a + c < b + c by part 1, and b + c ≤ b + d, hence a + c < b + d by the previous proposition.
3. We have a ≠ 0, hence −a ≠ 0. Suppose that −a > 0. We would then have a + (−a) > 0, hence 0 > 0, a contradiction.
4. Similar to 3.
Definition 3.6.12. An abelian group (A, +, 0) is torsion-free if every nonzero element of A has infinite order.

Proposition 3.6.13. Every ordered abelian group is torsion-free.

Proof. Let (A, +, 0) be an ordered abelian group. Let a ∈ A be nonzero. If a > 0, then we have n·a > 0 for every n ∈ N⁺ by induction. If a < 0, then we have n·a < 0 for every n ∈ N⁺ by induction. In either case, n·a ≠ 0 for all n ∈ N⁺, so a has infinite order.

Theorem 3.6.14. Every torsion-free abelian group can be ordered.

Proof. First notice that every finitely generated torsion-free abelian group is isomorphic to Zⁿ for some n, which we can order lexicographically from above. We can transfer this ordering across the isomorphism to order our finitely generated abelian group.
Suppose now that A is an arbitrary torsion-free abelian group. Let P be the set {L_{a,b} : a, b ∈ A} and let Γ be the union of the sets

• {L_{a,a} : a ∈ A}
• {L_{a,b} ∨ L_{b,a} : a, b ∈ A}
• {¬(L_{a,b} ∧ L_{b,a}) : a, b ∈ A, a ≠ b}
• {(L_{a,b} ∧ L_{b,c}) → L_{a,c} : a, b, c ∈ A}
• {(L_{a,b} ∧ L_{c,d}) → L_{a+c,b+d} : a, b, c, d ∈ A}

We show that Γ is satisfiable. By Compactness, it suffices to show that any finite subset of Γ is satisfiable. Suppose that Γ_0 ⊆ Γ is finite, and let S be the finite subset of A consisting of all elements of A which appear as a subscript of a symbol occurring in Γ_0. Let B be the subgroup of A generated by S. We then have that B is a finitely generated torsion-free abelian group, so from above we may fix an order ≤ on it. If we define a truth assignment v : P → {0,1} by

v(L_{a,b}) = 1 if a ≤ b
             0 otherwise

we see that v̄(γ) = 1 for all γ ∈ Γ_0. Thus, Γ_0 is satisfiable. Therefore, Γ is satisfiable by the Compactness Theorem.

Fix a truth assignment v : P → {0,1} such that v̄(γ) = 1 for all γ ∈ Γ. Define ≤ on A by letting a ≤ b if and only if v(L_{a,b}) = 1. We then have that ≤ orders A. Therefore, A can be ordered.

Chapter 4

First-Order Logic: Syntax and Semantics

Now that we've succeeded in giving a decent analysis of propositional logic, together with proving a few nontrivial theorems, it's time to move on to a much more substantial and important logic: first-order logic. As summarized in the introduction, the general idea is as follows. Many areas of mathematics deal with mathematical structures consisting of special constants, relations, and functions, together with certain axioms that these objects obey. We want our logic to be able to handle many different types of situations, so we allow ourselves to vary the number and types of these symbols. Any such choice gives rise to a language, and once we've fixed such a language, we can build up formulas which will express something meaningful once we've decided on an interpretation of the symbols.

4.1 The Syntax of First-Order Logic

Since our logic will have quantifiers, the first thing that we need is a collection of variables.

Definition 4.1.1. Fix a countably infinite set Var called the set of variables.

Definition 4.1.2. A first-order language, or simply a language, consists of the following:
1. A set C of constant symbols.
2. A set F of function symbols together with a function Arity_F : F → N⁺.
3. A set R of relation symbols together with a function Arity_R : R → N⁺.
We also assume that C, R, F, Var, and {¬, ∧, ∨, →, =, ∃, ∀} are pairwise disjoint. For each k ∈ N⁺, we let

F_k = {f ∈ F : Arity_F(f) = k}

and we let

R_k = {R ∈ R : Arity_R(R) = k}

Definition 4.1.3. Let L be a language. We let Sym_L = C ∪ R ∪ F ∪ Var ∪ {¬, ∧, ∨, →, =, ∃, ∀}.

Now that we've described all of the symbols that are available once we've fixed a language, we need to talk about how to build up formulas. Before doing this, however, we need a way to name objects. Intuitively, our constant symbols and variables name objects once we've fixed an interpretation. From here, we can get new objects by applying, perhaps repeatedly, interpretations of function symbols. This is starting to sound like a recursive definition.

Definition 4.1.4. Let L be a language. For each f ∈ F_k, define h_f : (Sym_L*)^k → Sym_L* by letting

h_f(σ_1, σ_2, ..., σ_k) = fσ_1σ_2⋯σ_k

Let

Term_L = G(Sym_L*, C ∪ Var, {h_f : f ∈ F})

Now that we have terms, which intuitively name elements once we've fixed an interpretation, we need to say what our atomic formulas are. The idea is that the most basic things we can say are whether or not two objects are equal, or whether or not a k-tuple is in the interpretation of some relation symbol R ∈ R_k.

Definition 4.1.5. Let L be a language. We let

AtomicForm_L = {Rt_1t_2⋯t_k : k ∈ N⁺, R ∈ R_k, and t_1, t_2, ..., t_k ∈ Term_L} ∪ {=t_1t_2 : t_1, t_2 ∈ Term_L}

From here, we can build up all formulas.

Definition 4.1.6. Let L be a language. Define a unary function h_¬ and binary functions h_∧, h_∨, and h_→ on Sym_L* as follows:

h_¬(σ) = ¬σ
h_∧(σ, τ) = ∧στ
h_∨(σ, τ) = ∨στ
h_→(σ, τ) = →στ

Also, for each x ∈ Var, define two unary functions h_{∃,x} and h_{∀,x} on Sym_L* as follows:

h_{∃,x}(σ) = ∃xσ
h_{∀,x}(σ) = ∀xσ

Let

Form_L = G(Sym_L*, AtomicForm_L, {h_¬, h_∧, h_∨, h_→} ∪ {h_{∃,x}, h_{∀,x} : x ∈ Var})

As with propositional logic, we'd like to be able to define things recursively, so we need to check that our generating systems are free. Notice that in the construction of formulas, we have two generating systems around. We first generate all terms. With terms taken care of, we next describe the atomic formulas, and from them we generate all formulas. Thus, we'll need to prove that two generating systems are free. The general idea is to make use of the insights gained by proving the corresponding result for Polish notation in propositional logic.
Definition 4.1.7. Let L be a language. Define K : Sym*L → Z as follows. We first define w : SymL → Z as follows:
w(c) = 1 for all c ∈ C
w(x) = 1 for all x ∈ Var
w(f) = 1 − k for all f ∈ Fk
w(R) = 1 − k for all R ∈ Rk
w(=) = −1
w(Q) = −1 for all Q ∈ {∃, ∀}
w(¬) = 0
w(□) = −1 for all □ ∈ {∧, ∨, →}
We then define K on all of Sym*L by letting K(λ) = 0 and letting K(σ) = Σ_{i<|σ|} w(σ(i)) for all σ ∈ Sym*L \ {λ}.
4.1. THE SYNTAX OF FIRST-ORDER LOGIC
Remark 4.1.8. If σ, τ ∈ Sym*L, then K(στ) = K(σ) + K(τ).
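The symbol-counting machinery can be made concrete. Below is a minimal Python sketch (our own illustration, not from the text) for a hypothetical toy language with one constant c, one binary function symbol f, and variables x and y; the ASCII symbols ~, &, |, >, E, A stand in for ¬, ∧, ∨, →, ∃, ∀:

```python
# Weight of each symbol (Definition 4.1.7) for a hypothetical toy
# language: one constant "c", one binary function symbol "f",
# variables "x" and "y".
CONSTANTS = {"c"}
VARIABLES = {"x", "y"}
ARITY = {"f": 2}                 # k-ary function/relation symbols

def w(symbol):
    if symbol in CONSTANTS or symbol in VARIABLES:
        return 1
    if symbol in ARITY:          # weight 1 - k for a k-ary symbol
        return 1 - ARITY[symbol]
    if symbol == "~":            # negation has weight 0
        return 0
    if symbol in ("=", "E", "A", "&", "|", ">"):
        return -1                # equality, quantifiers, binary connectives
    raise ValueError(f"unknown symbol: {symbol}")

def K(sigma):
    # K(sigma) sums the weights of the symbols of the sequence sigma.
    return sum(w(ch) for ch in sigma)
```

Every term has K-value 1 and every proper initial segment of a term has K-value at most 0, as the next two propositions verify: for instance K(fcx) = (1 − 2) + 1 + 1 = 1, while K(fc) = 0.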
Proposition 4.1.9. If t ∈ TermL, then K(t) = 1.
Proof. The proof is by induction on t. Notice first that K(c) = 1 for all c ∈ C and K(x) = 1 for all x ∈ Var. Suppose that k ∈ N+, f ∈ Fk, and t1, t2, . . . , tk ∈ TermL are such that K(ti) = 1 for all i. We then have that
K(ft1t2 · · · tk) = K(f) + K(t1) + K(t2) + · · · + K(tk)
= (1 − k) + 1 + 1 + · · · + 1   (by induction)
= 1.
The result follows by induction.
Proposition 4.1.10. If t ∈ TermL and σ ≺ t (that is, σ is a proper initial segment of t), then K(σ) ≤ 0.
Proof. The proof is by induction on t. For every c ∈ C, this is trivial because the only σ ≺ c is σ = λ, and we have K(λ) = 0. Similarly, for every x ∈ Var, the only σ ≺ x is σ = λ, and we have K(λ) = 0.
Suppose that k ∈ N+, f ∈ Fk, and t1, t2, . . . , tk ∈ TermL are such that the result holds for each ti. We prove the result for ft1t2 · · · tk. Suppose that σ ≺ ft1t2 · · · tk. If σ = λ, then K(σ) = 0. Otherwise, there exists i < k and τ ≺ ti+1 such that σ = ft1t2 · · · ti τ, in which case
K(σ) = K(f) + K(t1) + K(t2) + · · · + K(ti) + K(τ)
= (1 − k) + 1 + 1 + · · · + 1 + K(τ)   (by Proposition 4.1.9)
= (1 − k) + i + K(τ)
≤ (1 − k) + i + 0   (by induction)
= 1 + (i − k)
≤ 0.   (since i < k)
Thus, the result holds for ft1t2 · · · tk.
Corollary 4.1.11. If t, u ∈ TermL, then t ⊀ u.
Proof. This follows by combining Proposition 4.1.9 and Proposition 4.1.10.
Theorem 4.1.12. The generating system (Sym*L, C ∪ Var, {hf : f ∈ F}) is free.
Proof. First notice that for all f ∈ F, we have ran(hf ↾ (TermL)^k) ∩ (C ∪ Var) = ∅, because all elements of ran(hf) begin with f and we know that f ∉ C ∪ Var.
Fix f ∈ Fk. Suppose that t1, t2, . . . , tk, u1, u2, . . . , uk ∈ TermL and hf(t1, t2, . . . , tk) = hf(u1, u2, . . . , uk). We then have ft1t2 · · · tk = fu1u2 · · · uk, hence t1t2 · · · tk = u1u2 · · · uk. Since t1 ≺ u1 and u1 ≺ t1 are both impossible by Corollary 4.1.11, it follows that t1 = u1. Thus, t2 · · · tk = u2 · · · uk, and so t2 = u2 for the same reason. Continuing in this fashion, we conclude that ti = ui for all i. It follows that hf ↾ (TermL)^k is injective.
Finally, notice that for any f ∈ Fk and any g ∈ Fℓ with f ≠ g, we have ran(hf ↾ (TermL)^k) ∩ ran(hg ↾ (TermL)^ℓ) = ∅, because all elements of ran(hf ↾ (TermL)^k) begin with f while all elements of ran(hg ↾ (TermL)^ℓ) begin with g.
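Freeness is exactly what makes unique parsing possible: reading a Polish-notation term left to right, each symbol determines how many arguments must follow. Here is a sketch of such a parser in Python for the same hypothetical toy language (one constant c, variables x and y, one binary function symbol f); none of these names are fixed by the text:

```python
# Unique readability in action: a recursive-descent parser for
# Polish-notation terms over a hypothetical toy language.
ATOMS = {"c", "x", "y"}          # constants and variables
ARITY = {"f": 2}

def parse(s, i=0):
    """Return (tree, j): the unique term starting at s[i], and the
    index just past it.  Freeness guarantees there is exactly one."""
    sym = s[i]
    if sym in ATOMS:
        return sym, i + 1
    k = ARITY[sym]               # sym must be a k-ary function symbol
    args, j = [], i + 1
    for _ in range(k):
        arg, j = parse(s, j)     # each argument parses unambiguously
        args.append(arg)
    return (sym, *args), j

def parse_term(s):
    tree, j = parse(s)
    if j != len(s):
        raise ValueError("trailing symbols after a complete term")
    return tree
```

For example, parse_term("ffcxy") recovers the tree with outer f applied to fcx and y, reflecting the unique way the string decomposes.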
Proposition 4.1.13. If φ ∈ FormL, then K(φ) = 1.
Proof. The proof is by induction on φ. We first show that K(φ) = 1 for all φ ∈ AtomicFormL. Suppose that φ is Rt1t2 · · · tk where R ∈ Rk and t1, t2, . . . , tk ∈ TermL. We then have
K(Rt1t2 · · · tk) = K(R) + K(t1) + K(t2) + · · · + K(tk)
= (1 − k) + 1 + 1 + · · · + 1   (by Proposition 4.1.9)
= 1.
Suppose that φ is =t1t2 where t1, t2 ∈ TermL. We then have
K(=t1t2) = K(=) + K(t1) + K(t2)
= −1 + 1 + 1   (by Proposition 4.1.9)
= 1.
Thus, K(φ) = 1 for all φ ∈ AtomicFormL.
Suppose that φ ∈ FormL is such that K(φ) = 1. We then have that
K(¬φ) = K(¬) + K(φ)
= 0 + 1
= 1.
For any Q ∈ {∃, ∀} and any x ∈ Var, we also have
K(Qxφ) = K(Q) + K(x) + K(φ)
= −1 + 1 + 1
= 1.
Suppose now that φ, ψ ∈ FormL are such that K(φ) = 1 = K(ψ), and □ ∈ {∧, ∨, →}. We then have that
K(□φψ) = K(□) + K(φ) + K(ψ)
= −1 + 1 + 1
= 1.
The result follows by induction.
Proposition 4.1.14. If φ ∈ FormL and σ ≺ φ, then K(σ) ≤ 0.
Proof. The proof is by induction on φ. We first show that the result holds for all φ ∈ AtomicFormL. Suppose that φ is Rt1t2 · · · tk where R ∈ Rk and t1, t2, . . . , tk ∈ TermL. Suppose that σ ≺ Rt1t2 · · · tk. If σ = λ, then K(σ) = 0. Otherwise, there exists i < k and τ ≺ ti+1 such that σ is Rt1t2 · · · ti τ, in which case
K(σ) = K(R) + K(t1) + K(t2) + · · · + K(ti) + K(τ)
= (1 − k) + 1 + 1 + · · · + 1 + K(τ)   (by Proposition 4.1.9)
= (1 − k) + i + K(τ)
≤ (1 − k) + i + 0   (by Proposition 4.1.10)
= 1 + (i − k)
≤ 0.   (since i < k)
Thus, the result holds for Rt1t2 · · · tk. The same argument works for =t1t2 where t1, t2 ∈ TermL, so the result holds for all φ ∈ AtomicFormL.
Suppose that the result holds for φ ∈ FormL. Suppose that σ ≺ ¬φ. If σ = λ, then K(σ) = 0. Otherwise, σ = ¬τ for some τ ≺ φ, in which case
K(σ) = K(¬) + K(τ)
= 0 + K(τ)
≤ 0.   (by induction)
Suppose now that Q ∈ {∃, ∀}, that x ∈ Var, and that σ ≺ Qxφ. If σ = λ, then K(σ) = 0, and if σ = Q, then K(σ) = −1. Otherwise, σ = Qxτ for some τ ≺ φ, in which case
K(σ) = K(Q) + K(x) + K(τ)
= −1 + 1 + K(τ)
≤ 0.   (by induction)
Suppose now that the result holds for φ, ψ ∈ FormL, and □ ∈ {∧, ∨, →}. Suppose that σ ≺ □φψ. If σ = λ, then K(σ) = 0. If σ is □τ for some τ ≺ φ, then
K(σ) = K(□) + K(τ)
= −1 + K(τ)
≤ −1.   (by induction)
Otherwise, σ is □φτ for some τ ≺ ψ, in which case
K(σ) = K(□) + K(φ) + K(τ)
= −1 + 1 + K(τ)   (by Proposition 4.1.13)
≤ 0.   (by induction)
Thus, the result holds for □φψ.


Corollary 4.1.15. If φ, ψ ∈ FormL, then φ ⊀ ψ.
Proof. This follows by combining Proposition 4.1.13 and Proposition 4.1.14.
Theorem 4.1.16. The generating system (Sym*L, AtomicFormL, {h¬, h∧, h∨, h→} ∪ {h∃,x, h∀,x : x ∈ Var}) is free.
Proof. Similar to the others.
With these freeness results, we are now able to define functions recursively on TermL and FormL. Since we use terms in our definition of atomic formulas, which are the basic formulas, we will often need to make two recursive definitions (on terms first, then on formulas) in order to define a function on formulas. Here's an example.
Definition 4.1.17. Let L be a language.
1. We define a function OccurVar : TermL → P(Var) recursively as follows.
OccurVar(c) = ∅ for all c ∈ C.
OccurVar(x) = {x} for all x ∈ Var.
OccurVar(ft1t2 · · · tk) = OccurVar(t1) ∪ OccurVar(t2) ∪ · · · ∪ OccurVar(tk) for all f ∈ Fk and t1, t2, . . . , tk ∈ TermL.
2. We define a function OccurVar : FormL → P(Var) recursively as follows.
OccurVar(Rt1t2 · · · tk) = OccurVar(t1) ∪ OccurVar(t2) ∪ · · · ∪ OccurVar(tk) for all R ∈ Rk and t1, t2, . . . , tk ∈ TermL.
OccurVar(=t1t2) = OccurVar(t1) ∪ OccurVar(t2) for all t1, t2 ∈ TermL.
OccurVar(¬φ) = OccurVar(φ) for all φ ∈ FormL.
OccurVar(□φψ) = OccurVar(φ) ∪ OccurVar(ψ) for each □ ∈ {∧, ∨, →} and φ, ψ ∈ FormL.
OccurVar(Qxφ) = OccurVar(φ) ∪ {x} for each Q ∈ {∃, ∀}, x ∈ Var, and φ ∈ FormL.
Definition 4.1.18. Let L be a language.
1. We define a function FreeVar : FormL → P(Var) recursively as follows.
FreeVar(Rt1t2 · · · tk) = OccurVar(t1) ∪ OccurVar(t2) ∪ · · · ∪ OccurVar(tk) for all R ∈ Rk and t1, t2, . . . , tk ∈ TermL.
FreeVar(=t1t2) = OccurVar(t1) ∪ OccurVar(t2) for all t1, t2 ∈ TermL.
FreeVar(¬φ) = FreeVar(φ) for all φ ∈ FormL.
FreeVar(□φψ) = FreeVar(φ) ∪ FreeVar(ψ) for each □ ∈ {∧, ∨, →} and φ, ψ ∈ FormL.
FreeVar(Qxφ) = FreeVar(φ)\{x} for each Q ∈ {∃, ∀}, x ∈ Var, and φ ∈ FormL.
2. We define a function BoundVar : FormL → P(Var) recursively as follows.
BoundVar(Rt1t2 · · · tk) = ∅ for all R ∈ Rk and t1, t2, . . . , tk ∈ TermL.
BoundVar(=t1t2) = ∅ for all t1, t2 ∈ TermL.
BoundVar(¬φ) = BoundVar(φ) for all φ ∈ FormL.
BoundVar(□φψ) = BoundVar(φ) ∪ BoundVar(ψ) for each □ ∈ {∧, ∨, →} and φ, ψ ∈ FormL.
BoundVar(Qxφ) = BoundVar(φ) ∪ {x} for each Q ∈ {∃, ∀}, x ∈ Var, and φ ∈ FormL.
Definition 4.1.19. Let L be a language and let φ ∈ FormL. We say that φ is an L-sentence, or simply a sentence, if FreeVar(φ) = ∅. We let SentL be the set of sentences.
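The recursive clauses of the last two definitions translate directly into code once formulas are represented as trees. The sketch below uses our own tuple-based encoding (with English words standing in for the connectives and quantifiers, which the text writes symbolically) to compute OccurVar on terms and FreeVar on formulas:

```python
# OccurVar on terms and FreeVar on formulas, for formulas encoded as
# nested tuples (a hypothetical encoding of ours):
#   ("R", t1, ..., tk), ("=", t1, t2), ("not", phi),
#   ("and"/"or"/"implies", phi, psi), ("exists"/"forall", x, phi),
# where terms are variable/constant names or ("f", t1, ..., tk).

def occur_var(t, variables):
    if isinstance(t, tuple):     # function symbol applied to terms
        return set().union(*(occur_var(u, variables) for u in t[1:]))
    return {t} if t in variables else set()

def free_var(phi, variables):
    op = phi[0]
    if op in ("exists", "forall"):
        return free_var(phi[2], variables) - {phi[1]}
    if op == "not":
        return free_var(phi[1], variables)
    if op in ("and", "or", "implies"):
        return free_var(phi[1], variables) | free_var(phi[2], variables)
    # atomic: a relation symbol or = applied to terms
    return set().union(*(occur_var(t, variables) for t in phi[1:]))
```

For the formula ∃y(¬(y = x)), the free variables are exactly {x}; a sentence is precisely a formula on which free_var returns the empty set.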

4.2 Structures: The Semantics of First-Order Logic
4.2.1 Structures: Definition and Satisfaction
Up until this point, all that we've dealt with are sequences of symbols without meaning. Sure, our motivation was to capture meaningful situations with our languages and the way we've described formulas, but all we've done so far is describe the grammar. If we want our formulas to actually express something, we need to set up a context in which to interpret them. Since we have quantifiers, the first thing we'll need is a nonempty set M to serve as the domain of objects that the quantifiers range over. Once we've fixed that, we need to interpret the symbols of our language as actual elements of our set (in the case of constant symbols), actual k-ary relations on M (in the case of R ∈ Rk), and actual k-ary functions on M (in the case of f ∈ Fk).
Definition 4.2.1. Let L be a language. An L-structure, or simply a structure, is a set M = (M, gC, gF, gR) where
1. M is a nonempty set called the universe of M.
2. gC : C → M.
3. gR is a function on R such that gR(R) is a subset of M^k for all R ∈ Rk.
4. gF is a function on F such that gF(f) is a k-ary function on M for all f ∈ Fk.
We use the following notation.
1. For each c ∈ C, we use c^M to denote gC(c).
2. For each R ∈ Rk, we use R^M to denote gR(R).
3. For each f ∈ Fk, we use f^M to denote gF(f).
Example 4.2.2. Let L = {R} where R is a binary relation symbol. Here are some examples of L-structures.
1. M = N and R^M = {(m, n) ∈ M^2 : m | n}.
2. M = {0, 1}* and R^M = {(σ, τ) ∈ M^2 : σ is an initial segment of τ}.
3. M = R^2 and R^M = {((a1, b1), (a2, b2)) ∈ M^2 : a1 = a2}.
4. M = {0, 1, 2, 3, 4} and R^M = {(0, 2), (3, 3), (4, 1), (4, 2), (4, 3)}.
Example 4.2.3. Let L = {c, f} where c is a constant symbol and f is a binary function symbol. Here are some examples of L-structures.
1. M = Z, c^M = 3, and f^M is the subtraction function (m, n) ↦ m − n.
2. M = R, c^M = π, and f^M is the function (a, b) ↦ sin(a · b).
3. For any group (G, e, ·), we get an L-structure by letting M = G, c^M = e, and letting f^M be the group operation.
At first, it may appear that an L-structure provides a means to make sense out of any formula. However, this is not the case, as we can see by looking at the formula x = y where x, y ∈ Var. Even given an L-structure M, we can't say whether the formula x = y is true in M until we know how to interpret both x and y.
For a more interesting example, consider the language of groups L = {c, f}, where c is a constant symbol and f is a binary function symbol. Let M be the integers Z with c^M = 0 and with f^M being addition. Consider the formula fxy = z. If we interpret x as 7, y as −3, and z as 4, then the formula fxy = z is true in M. However, if we interpret x as 2, y as 7, and z as 1, then the formula fxy = z is false in M.
Once we fix an L-structure M, this need to interpret the elements of Var as elements of M motivates the following definition.
Definition 4.2.4. Let M be an L-structure. A function s : Var → M is called a variable assignment on M.
Recall from propositional logic that every truth assignment v : P → {0, 1} gave rise to an extension v̄ : FormP → {0, 1} telling us how to interpret every formula. In the first-order case, we need an L-structure M together with a variable assignment s : Var → M to make sense of things. We first show how this apparatus allows us to assign an element of M to every term by extending s to a function s̄ : TermL → M.
Definition 4.2.5. Let M be an L-structure, and let s : Var → M be a variable assignment. By freeness, there exists a unique s̄ : TermL → M such that
s̄(x) = s(x) for all x ∈ Var.
s̄(c) = c^M for all c ∈ C.
s̄(ft1t2 · · · tk) = f^M(s̄(t1), s̄(t2), . . . , s̄(tk)) for all f ∈ Fk and t1, t2, . . . , tk ∈ TermL.
Notice that there is nothing deep going on here. Given an L-structure M and a variable assignment s, to apply s̄ to a term, we simply unravel the term, attaching meaning to each symbol using M and s as we bottom out through the recursion. For example, assume that L = {c, f} where c is a constant symbol and f is a binary function symbol. Given an L-structure M and a variable assignment s : Var → M, then working through the definitions, we have
s̄(ffczfxffczy) = f^M(s̄(fcz), s̄(fxffczy))
= f^M(f^M(s̄(c), s̄(z)), f^M(s̄(x), s̄(ffczy)))
= f^M(f^M(s̄(c), s̄(z)), f^M(s̄(x), f^M(s̄(fcz), s̄(y))))
= f^M(f^M(s̄(c), s̄(z)), f^M(s̄(x), f^M(f^M(s̄(c), s̄(z)), s̄(y))))
= f^M(f^M(c^M, s(z)), f^M(s(x), f^M(f^M(c^M, s(z)), s(y))))
In other words, we take the syntactic term ffczfxffczy and assign a semantic meaning to it by returning the element of M described in the last line. For a specific example of how this gets interpreted in one case, let M be the integers Z with c^M = 5 and with f^M being addition. Let s : Var → M be an arbitrary variable assignment with s(x) = 3, s(y) = −11, and s(z) = 2. We then have
s̄(ffczfxffczy) = 6
because
s̄(ffczfxffczy) = f^M(f^M(c^M, s(z)), f^M(s(x), f^M(f^M(c^M, s(z)), s(y))))
= f^M(f^M(5, 2), f^M(3, f^M(f^M(5, 2), −11)))
= (5 + 2) + (3 + ((5 + 2) + (−11)))
= 6
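The unraveling computation can be replayed mechanically. The following Python sketch evaluates terms over the structure just described (M = Z, c^M = 5, f^M = addition), with terms encoded as nested tuples rather than Polish strings; the encoding itself is our own choice:

```python
# Evaluating a term (Definition 4.2.5) in the structure above:
# M = Z, c^M = 5, f^M = addition.
C_INTERP = {"c": 5}
F_INTERP = {"f": lambda a, b: a + b}

def evaluate(t, s):
    """The extension of the variable assignment s to all terms."""
    if isinstance(t, tuple):               # ("f", t1, ..., tk)
        return F_INTERP[t[0]](*(evaluate(u, s) for u in t[1:]))
    if t in C_INTERP:
        return C_INTERP[t]
    return s[t]                            # t is a variable

s = {"x": 3, "y": -11, "z": 2}
# The term ffczfxffczy as a tree:
term = ("f", ("f", "c", "z"), ("f", "x", ("f", ("f", "c", "z"), "y")))
```

Here evaluate(term, s) returns 6, matching the hand computation.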
We're now in position to define the intuitive statement "φ holds in the L-structure M with variable assignment s" recursively. We need the following definition in order to handle quantifiers.
Definition 4.2.6. Let M be an L-structure, and let s : Var → M be a variable assignment. Given x ∈ Var and a ∈ M, we let s[x ⇒ a] denote the variable assignment defined by
s[x ⇒ a](y) = a if y = x, and s[x ⇒ a](y) = s(y) otherwise.
Definition 4.2.7. Let M be an L-structure. We define a relation (M, s) ⊨ φ (pronounced "φ holds in (M, s)", or "φ is true in (M, s)", or "(M, s) models φ") for all φ ∈ FormL and all variable assignments s by induction on φ.
Suppose first that φ is an atomic formula.
If φ is Rt1t2 · · · tk, we have (M, s) ⊨ φ if and only if (s̄(t1), s̄(t2), . . . , s̄(tk)) ∈ R^M.
If φ is =t1t2, we have (M, s) ⊨ φ if and only if s̄(t1) = s̄(t2).
For any s, we have (M, s) ⊨ ¬φ if and only if (M, s) ⊭ φ.
For any s, we have (M, s) ⊨ ∧φψ if and only if (M, s) ⊨ φ and (M, s) ⊨ ψ.
For any s, we have (M, s) ⊨ ∨φψ if and only if either (M, s) ⊨ φ or (M, s) ⊨ ψ.
For any s, we have (M, s) ⊨ →φψ if and only if either (M, s) ⊭ φ or (M, s) ⊨ ψ.
For any s, we have (M, s) ⊨ ∃xφ if and only if there exists a ∈ M such that (M, s[x ⇒ a]) ⊨ φ.
For any s, we have (M, s) ⊨ ∀xφ if and only if for all a ∈ M, we have (M, s[x ⇒ a]) ⊨ φ.
Comments. The above recursive definition takes a little explanation, because some recursive calls change the variable assignment. Thus, we are not fixing an L-structure M and a variable assignment s on M and then doing a recursive definition on FormL. We can make the definition formal as follows. Fix an L-structure M. Let VarAssignM be the set of all variable assignments on M. We then define a function gM : FormL → P(VarAssignM) recursively using the above rules as guides, and we write (M, s) ⊨ φ to mean that s ∈ gM(φ).
Example. Let L = {R, f} where R is a unary relation symbol and f is a unary function symbol. Let M be the following L-structure. We have M = {0, 1, 2, 3}, R^M = {1, 3}, and f^M : M → M is the function defined by
f^M(0) = 3
f^M(1) = 1
f^M(2) = 0
f^M(3) = 3
1. For every s : Var → M with s(x) = 2, we have (M, s) ⊨ ¬Rx because
(M, s) ⊨ ¬Rx ⟺ (M, s) ⊭ Rx
⟺ s(x) ∉ R^M
⟺ 2 ∉ R^M
which is true.
2. For every s, we have (M, s) ⊨ ∃xRx. To see this, fix s : Var → M. We have
(M, s) ⊨ ∃xRx ⟺ There exists a ∈ M such that (M, s[x ⇒ a]) ⊨ Rx
⟺ There exists a ∈ M such that s[x ⇒ a](x) ∈ R^M
⟺ There exists a ∈ M such that a ∈ R^M
which is true since 1 ∈ R^M.
3. For every s, we have (M, s) ⊨ ∀x(Rx → (fx = x)). To see this, fix s : Var → M. We have
(M, s) ⊨ ∀x(Rx → (fx = x)) ⟺ For all a ∈ M, we have (M, s[x ⇒ a]) ⊨ (Rx → (fx = x))
⟺ For all a ∈ M, we have either (M, s[x ⇒ a]) ⊭ Rx or (M, s[x ⇒ a]) ⊨ (fx = x)
⟺ For all a ∈ M, we have either s[x ⇒ a](x) ∉ R^M or f^M(s[x ⇒ a](x)) = s[x ⇒ a](x)
⟺ For all a ∈ M, we have either a ∉ R^M or f^M(a) = a
which is true because 0 ∉ R^M, f^M(1) = 1, 2 ∉ R^M, and f^M(3) = 3.
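Over a finite structure, the recursive satisfaction relation of Definition 4.2.7 is directly computable, since the quantifier clauses reduce to searching the universe. Here is a sketch for the structure in the example just given, using our own tuple encoding of formulas with English connective names:

```python
# A satisfaction checker (Definition 4.2.7) for the finite structure
# above: M = {0,1,2,3}, unary R^M = {1,3}, unary f^M given by a table.
M = {0, 1, 2, 3}
R = {1, 3}
f = {0: 3, 1: 1, 2: 0, 3: 3}

def ev(t, s):
    # terms here are variables or ("f", u) for the unary symbol f
    return f[ev(t[1], s)] if isinstance(t, tuple) else s[t]

def sat(phi, s):
    op = phi[0]
    if op == "R":
        return ev(phi[1], s) in R
    if op == "=":
        return ev(phi[1], s) == ev(phi[2], s)
    if op == "not":
        return not sat(phi[1], s)
    if op == "and":
        return sat(phi[1], s) and sat(phi[2], s)
    if op == "or":
        return sat(phi[1], s) or sat(phi[2], s)
    if op == "implies":
        return (not sat(phi[1], s)) or sat(phi[2], s)
    if op == "exists":               # search the (finite) universe
        return any(sat(phi[2], {**s, phi[1]: a}) for a in M)
    if op == "forall":
        return all(sat(phi[2], {**s, phi[1]: a}) for a in M)
    raise ValueError(op)
```

For instance, sat on the encoding of ∀x(Rx → (fx = x)) returns True under any assignment, matching item 3 above.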
In the above examples, it's clear that only the values of s on the free variables of φ affect whether or not (M, s) ⊨ φ. The following precise statement of this fact follows by a straightforward induction.
Proposition 4.2.8. Let M be an L-structure. Suppose that t ∈ TermL and s1, s2 : Var → M are two variable assignments such that s1(x) = s2(x) for all x ∈ OccurVar(t). We then have s̄1(t) = s̄2(t).
Proposition 4.2.9. Let M be an L-structure. Suppose that φ ∈ FormL and s1, s2 : Var → M are two variable assignments such that s1(x) = s2(x) for all x ∈ FreeVar(φ). We then have
(M, s1) ⊨ φ if and only if (M, s2) ⊨ φ
Notation 4.2.10. Let L be a language.
1. If x1, x2, . . . , xn ∈ Var are distinct, and we refer to a formula φ(x1, x2, . . . , xn) ∈ FormL, we mean that φ ∈ FormL and FreeVar(φ) ⊆ {x1, x2, . . . , xn}.
2. Suppose that M is an L-structure, φ(x1, x2, . . . , xn) ∈ FormL, and a1, a2, . . . , an ∈ M. We write (M, a1, a2, . . . , an) ⊨ φ to mean that (M, s) ⊨ φ for some (equivalently, any) s : Var → M with s(xi) = ai for all i.
3. As a special case of 2, we have the following. Suppose that M is an L-structure and σ ∈ SentL. We write M ⊨ σ to mean that (M, s) ⊨ σ for some (equivalently, any) s : Var → M.

4.2.2 Elementary Classes of Structures

As we've seen, once we fix a language L, an L-structure can take any nonempty set M at all as universe, interpret the elements of C as arbitrary fixed elements of M, interpret the elements of Rk as arbitrary subsets of M^k, and interpret the elements of Fk as arbitrary k-ary functions on M. However, since we have a precise language in hand, we can now carve out classes of structures that satisfy certain sentences of our language.
Definition 4.2.11. Let L be a language, and let Σ ⊆ SentL. We let Mod(Σ) be the class of all L-structures M such that M ⊨ σ for all σ ∈ Σ. If σ ∈ SentL, we write Mod(σ) instead of Mod({σ}).
Definition 4.2.12. Let L be a language and let K be a class of L-structures.
1. K is an elementary class if there exists σ ∈ SentL such that K = Mod(σ).
2. K is a weak elementary class if there exists Σ ⊆ SentL such that K = Mod(Σ).
By taking conjunctions, we have the following simple proposition.
Proposition 4.2.13. Let L be a language and let K be a class of L-structures. K is an elementary class if and only if there exists a finite Σ ⊆ SentL such that K = Mod(Σ).
Examples. Let L = {R} where R is a binary relation symbol.
1. The class of partially ordered sets is an elementary class, as we saw in the introduction. We may let Σ be the following collection of sentences:
(a) ∀xRxx
(b) ∀x∀y((Rxy ∧ Ryx) → (x = y))
(c) ∀x∀y∀z((Rxy ∧ Ryz) → Rxz)
2. The class of equivalence relations is an elementary class. We may let Σ be the following collection of sentences:
(a) ∀xRxx
(b) ∀x∀y(Rxy → Ryx)
(c) ∀x∀y∀z((Rxy ∧ Ryz) → Rxz)
3. The class of simple undirected graphs (i.e., edges have no direction, and there are no loops and no multiple edges) is an elementary class. We may let Σ be the following collection of sentences:
(a) ∀x(¬Rxx)
(b) ∀x∀y(Rxy → Ryx)
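For a finite structure, membership in Mod(Σ) can be checked by brute force over tuples of elements. The sketch below tests structure 4 from Example 4.2.2 against the partial-order axioms listed above; the three checks are hand-translated from the three sentences rather than produced by a general satisfaction routine:

```python
# Brute-force check of M ∈ Mod(Σ) for the partial-order axioms, on
# structure 4 from Example 4.2.2 (finite, so each quantifier is a loop).
M = {0, 1, 2, 3, 4}
R = {(0, 2), (3, 3), (4, 1), (4, 2), (4, 3)}

def reflexive():                 # forall x: Rxx
    return all((a, a) in R for a in M)

def antisymmetric():             # forall x, y: (Rxy and Ryx) -> x = y
    return all(not ((a, b) in R and (b, a) in R) or a == b
               for a in M for b in M)

def transitive():                # forall x, y, z: (Rxy and Ryz) -> Rxz
    return all(not ((a, b) in R and (b, c) in R) or (a, c) in R
               for a in M for b in M for c in M)

is_partial_order = reflexive() and antisymmetric() and transitive()
```

Reflexivity fails here, since (0, 0) ∉ R^M, so this structure is not a partial order, even though antisymmetry and transitivity happen to hold.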

Examples. Let L be any language whatsoever, and let n ∈ N+. The class of L-structures of cardinality at least n is an elementary class, as witnessed by the sentence
∃x1∃x2 · · · ∃xn(⋀_{1≤i<j≤n} (xi ≠ xj))
Furthermore, the class of L-structures of cardinality equal to n is an elementary class. Letting σn be the above sentence for n, we can see this by considering σn ∧ (¬σn+1).
Examples. Let L = {0, 1, +, ·} where 0, 1 are constant symbols and +, · are binary function symbols.
1. The class of fields is an elementary class. We may let Σ be the following collection of sentences:
(a) ∀x∀y∀z(x + (y + z) = (x + y) + z)
(b) ∀x((x + 0 = x) ∧ (0 + x = x))
(c) ∀x∃y((x + y = 0) ∧ (y + x = 0))
(d) ∀x∀y(x + y = y + x)
(e) ∀x∀y∀z(x · (y · z) = (x · y) · z)
(f) ∀x((x · 1 = x) ∧ (1 · x = x))
(g) ∀x(x ≠ 0 → ∃y((x · y = 1) ∧ (y · x = 1)))
(h) ∀x∀y(x · y = y · x)
(i) ∀x∀y∀z(x · (y + z) = (x · y) + (x · z))
2. For each prime p > 0, the class of fields of characteristic p is an elementary class. Fix a prime p > 0, and let Σp be the above sentences together with the sentence 1 + 1 + · · · + 1 = 0 (where there are p many 1s in the sum).
3. The class of fields of characteristic 0 is a weak elementary class. Let Σ be the above sentences together with {τn : n ∈ N+}, where for each n ∈ N+, we let τn be ¬(1 + 1 + · · · + 1 = 0) (where there are n many 1s in the sum).

Example. Let F be a field, and let LF = {0, +} ∪ {hα : α ∈ F} where 0 is a constant symbol, + is a binary function symbol, and each hα is a unary function symbol. The class of vector spaces over F is a weak elementary class. We may let Σ be the following collection of sentences:
1. ∀x∀y∀z(x + (y + z) = (x + y) + z)
2. ∀x((x + 0 = x) ∧ (0 + x = x))
3. ∀x∃y((x + y = 0) ∧ (y + x = 0))
4. ∀x∀y(x + y = y + x)
5. ∀x∀y(hα(x + y) = hα(x) + hα(y)) for each α ∈ F.
6. ∀x(hα+β(x) = hα(x) + hβ(x)) for each α, β ∈ F.
7. ∀x(hα·β(x) = hα(hβ(x))) for each α, β ∈ F.
8. ∀x(h1(x) = x)

At this point, it's often clear how to show that a certain class of structures is a (weak) elementary class: simply exhibit the correct sentences. However, it may seem very difficult to show that a class is not a (weak) elementary class. For example, is the class of fields of characteristic 0 an elementary class? Is the class of finite groups a weak elementary class? There are no obvious ways to answer these questions affirmatively. We'll develop some tools later which will allow us to resolve these questions negatively.
Another interesting case is that of Dedekind-complete ordered fields. The ordered field axioms are easily written down in the first-order language L = {0, 1, <, +, ·}. In contrast, the Dedekind-completeness axiom, which says that every nonempty subset which is bounded above has a least upper bound, cannot be directly translated into the language L because it involves quantifying over subsets instead of elements. However, we are unable to immediately conclude that this obstacle isn't simply due to a lack of cleverness on our part. Perhaps there is an alternative approach which captures Dedekind-complete ordered fields in a first-order way (by finding a clever equivalent first-order expression of Dedekind-completeness). More formally, the precise question is whether the Dedekind-complete ordered fields are a (weak) elementary class in the language L. We'll be able to answer this question in the negative later as well.

4.2.3 Definability in Structures
Another wonderful side effect of developing a formal language is the ability to talk about which objects we can define using that language.
Definition 4.2.14. Let M be an L-structure. Suppose that k ∈ N+ and X ⊆ M^k. We say that X is definable in M if there exists φ(x1, x2, . . . , xk) ∈ FormL such that
X = {(a1, a2, . . . , ak) ∈ M^k : (M, a1, a2, . . . , ak) ⊨ φ}
Examples. Let L = {0, 1, +, ·} where 0 and 1 are constant symbols and + and · are binary function symbols.
1. The set X = {(m, n) ∈ N^2 : m < n} is definable in (N, 0, 1, +, ·), as witnessed by the formula
∃z(z ≠ 0 ∧ (x + z = y))
2. The set X = {n ∈ N : n is prime} is definable in (N, 0, 1, +, ·), as witnessed by the formula
¬(x = 1) ∧ ∀y∀z(x = y · z → (y = 1 ∨ z = 1))
3. The set X = {r ∈ R : r ≥ 0} is definable in (R, 0, 1, +, ·), as witnessed by the formula
∃y(y · y = x)
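Definability can be explored experimentally by evaluating the witnessing formula over a finite window of N. The sketch below checks the first formula above, ∃z(z ≠ 0 ∧ x + z = y), on pairs below an arbitrary bound of 20 (the bound is our own choice; N itself is infinite, so this is only a spot check, not a proof):

```python
# Spot-checking definability: evaluate the formula
# "exists z (z != 0 and x + z = y)" over a finite window of N.
N_RANGE = range(20)

def defined_pairs():
    out = set()
    for m in N_RANGE:
        for n in N_RANGE:
            # witness search for the existential quantifier
            if any(z != 0 and m + z == n for z in N_RANGE):
                out.add((m, n))
    return out
```

On this window, the pairs picked out by the formula are exactly the pairs with m < n, as the definition promises.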
Example.
1. Let L = {<} where < is a binary relation symbol. For every n ∈ N, the set {n} is definable in (N, <). To see this, first define φn(x) to be the formula
∃y1∃y2 · · · ∃yn(⋀_{1≤i<j≤n}(yi ≠ yj) ∧ ⋀_{i=1}^{n}(yi < x))
Now notice that {0} is definable, as witnessed by the formula
∀y(¬(y < x))
and for each n ∈ N+, the set {n} is definable, as witnessed by the formula
φn(x) ∧ ¬φn+1(x)
2. Let L = {e, f} where e is a constant symbol and f is a binary function symbol. Let (G, e, ·) be a group interpreted as an L-structure. The center of G is definable in (G, e, ·), as witnessed by the formula
∀y(f(x, y) = f(y, x))

Sometimes there isn't an obvious way to show that a set is definable, but some cleverness and/or nontrivial mathematics really pays off.
Examples. Let L = {0, 1, +, ·} where 0 and 1 are constant symbols and + and · are binary function symbols.
1. The set N is definable in (Z, 0, 1, +, ·), as witnessed by the formula
∃y1∃y2∃y3∃y4(x = y1 · y1 + y2 · y2 + y3 · y3 + y4 · y4)
Certainly every element of Z that is a sum of squares must be an element of N. The fact that every element of N is a sum of four squares is Lagrange's Theorem, an important result in number theory.
2. Let (R, 0, 1, +, ·) be a commutative ring. The Jacobson radical of R, denoted Jac(R), is the intersection of all maximal ideals of R. As stated, it is not clear that this is definable in (R, 0, 1, +, ·), because it appears to quantify over subsets. However, a basic result in commutative algebra says that
a ∈ Jac(R) ⟺ ab − 1 is a unit for all b ∈ R
⟺ For all b ∈ R, there exists c ∈ R with (ab − 1)c = 1
Using this, it follows that Jac(R) is definable in (R, 0, 1, +, ·), as witnessed by the formula
∀y∃z((x · y) · z = z + 1)
3. The set X = {(k, m, n) ∈ N^3 : k^m = n} is definable in (N, 0, 1, +, ·), as is the set {(m, n) ∈ N^2 : m is the nth digit in the decimal expansion of π}. In fact, every set C ⊆ N^k which is computable (i.e., for which you can write a computer program which outputs "yes" on elements of C and "no" on elements of N^k \ C) is definable in (N, 0, 1, +, ·). These are nontrivial yet fundamental results we will prove later.
4. The set Z is definable in (Q, 0, 1, +, ·). This is a deep result of Julia Robinson, using some nontrivial number theory.

As for elementary classes, it's clear how to attempt to show that something is definable (although as we've seen, this may require a great deal of cleverness). However, it's not at all obvious how one could show that a set is not definable. We'll develop a few tools to do this in time.

4.2.4 Substitution

Eventually, we will see the need to substitute terms for variables. Roughly, one might naturally think that
if x is true (M, s), then upon taking a term t and substituting it in for x in the formula , the resulting
formula would also be true in (M, s). We need a way to relate truth before substituting with truth after
substituting. The hope would be the following, where we use the notation tx to intuitively mean that you
substitute t for x:
Hope 4.2.15. Let M be an L-structure, let s : V ar M , let t T ermL , and let x V ar. For all
F ormL , we have
(M, s)  tx if and only if (M, s[x s(t)]) 
In order to make this precise, we first need to define substitution. Even with the correct definition of substitution, however, the above statement is not true. Let's first define substitution for terms and show that it behaves well.
Definition 4.2.16. Let x ∈ Var and let t ∈ TermL. We define a function Subst^t_x : TermL → TermL, denoted u ↦ u^t_x, as follows.
1. c^t_x = c for all c ∈ C.
2. y^t_x = t if y = x, and y^t_x = y otherwise, for all y ∈ Var.
3. (fu1u2 · · · uk)^t_x = f(u1)^t_x(u2)^t_x · · · (uk)^t_x for all f ∈ Fk and all u1, u2, . . . , uk ∈ TermL.
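Term substitution is a one-line recursion on tuple-encoded terms, and the lemma that follows can then be spot-checked numerically. In the sketch below we reuse the earlier hypothetical structure M = Z with c^M = 5 and f^M = addition (the encoding and interpretations are our own illustration):

```python
# Term substitution u -> u^t_x (Definition 4.2.16) on tuple-encoded
# terms, plus a numeric spot-check of the key substitution lemma.
def subst_term(u, x, t):
    if isinstance(u, tuple):               # ("f", u1, ..., uk)
        return (u[0],) + tuple(subst_term(ui, x, t) for ui in u[1:])
    return t if u == x else u              # variable or constant name

def evaluate(u, s):
    if isinstance(u, tuple):
        return evaluate(u[1], s) + evaluate(u[2], s)   # only f = +
    return 5 if u == "c" else s[u]         # c^M = 5

s = {"x": 3, "y": -11, "z": 2}
t = ("f", "c", "z")                        # the term fcz
u = ("f", "x", "y")                        # the term fxy
lhs = evaluate(subst_term(u, "x", t), s)   # evaluate u^t_x under s
rhs = evaluate(u, {**s, "x": evaluate(t, s)})  # evaluate u after x => 7
```

Both sides evaluate to −4: substituting fcz for x in fxy and then evaluating under s agrees with evaluating fxy after redirecting x to the value of fcz.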
Here's the key lemma that relates how to interpret a term before and after substitution.
Lemma 4.2.17. Let M be an L-structure, let s : Var → M, let t ∈ TermL, and let x ∈ Var. For all u ∈ TermL, we have
s̄(u^t_x) = s[x ⇒ s̄(t)](u)
where we also write s[x ⇒ s̄(t)] for the extension of this variable assignment to TermL.
Although the statement of the lemma is symbol heavy, it expresses something quite natural. In order to determine the value of the term u^t_x according to the variable assignment s, we need only change s so that x now gets sent to s̄(t) (the value of t assigned by s̄), and evaluate u using this new variable assignment.
Proof. The proof is by induction on TermL. For any c ∈ C, we have
s̄(c^t_x) = s̄(c)
= c^M
= s[x ⇒ s̄(t)](c)
Suppose that u = x. We then have
s̄(x^t_x) = s̄(t)
= s[x ⇒ s̄(t)](x)
Suppose that u = y ∈ Var and that y ≠ x. We then have
s̄(y^t_x) = s̄(y)
= s(y)
= s[x ⇒ s̄(t)](y)
Finally, suppose that f ∈ Fk and that the result holds for u1, u2, . . . , uk ∈ TermL. We then have
s̄((fu1u2 · · · uk)^t_x) = s̄(f(u1)^t_x(u2)^t_x · · · (uk)^t_x)
= f^M(s̄((u1)^t_x), s̄((u2)^t_x), . . . , s̄((uk)^t_x))
= f^M(s[x ⇒ s̄(t)](u1), s[x ⇒ s̄(t)](u2), . . . , s[x ⇒ s̄(t)](uk))   (by induction)
= s[x ⇒ s̄(t)](fu1u2 · · · uk)

With substitution in terms defined, we now move to define substitution in formulas. The key fact about this definition is that we only replace the free occurrences of x by the term t, because we certainly don't want to change ∀x into ∀t, nor do we want to mess with an x inside the scope of such a quantifier. We thus make the following recursive definition.
Definition 4.2.18. We now define FreeSubst^t_x : FormL → FormL, again denoted φ ↦ φ^t_x, as follows.
1. (Ru1u2 · · · uk)^t_x = R(u1)^t_x(u2)^t_x · · · (uk)^t_x for all R ∈ Rk and all u1, u2, . . . , uk ∈ TermL.
2. (=u1u2)^t_x = =(u1)^t_x(u2)^t_x for all u1, u2 ∈ TermL.
3. (¬φ)^t_x = ¬(φ^t_x) for all φ ∈ FormL.
4. (□φψ)^t_x = □φ^t_x ψ^t_x for all φ, ψ ∈ FormL and all □ ∈ {∧, ∨, →}.
5. (Qyφ)^t_x = Qyφ if x = y, and (Qyφ)^t_x = Qy(φ^t_x) otherwise, for all φ ∈ FormL, y ∈ Var, and Q ∈ {∃, ∀}.
With the definition in hand, let's analyze the above hope. Suppose that L = ∅, and consider the formula φ(x) ∈ FormL given by
∃y(¬(y = x))
For any L-structure M and any s : Var → M, we have (M, s) ⊨ φ if and only if |M| ≥ 2. Now notice that the formula φ^y_x is
∃y(¬(y = y))
so for any L-structure M and any s : Var → M, we have (M, s) ⊭ φ^y_x. Therefore, the above hope fails whenever M is an L-structure with |M| ≥ 2. The problem is that the term we substituted (in this case y) had a variable which became "captured" by a quantifier, and thus the meaning of the formula became transformed. In order to define ourselves out of this obstacle, we define the following function.
Definition 4.2.19. Let t ∈ TermL and let x ∈ Var. We define a function ValidSubst^t_x : FormL → {0, 1} as follows.
1. ValidSubst^t_x(φ) = 1 for all φ ∈ AtomicFormL.
2. ValidSubst^t_x(¬φ) = ValidSubst^t_x(φ) for all φ ∈ FormL.
3. ValidSubst^t_x(□φψ) = 1 if ValidSubst^t_x(φ) = 1 and ValidSubst^t_x(ψ) = 1, and ValidSubst^t_x(□φψ) = 0 otherwise, for all φ, ψ ∈ FormL and all □ ∈ {∧, ∨, →}.
4. ValidSubst^t_x(Qyφ) = 1 if x ∉ FreeVar(Qyφ), or if y ∉ OccurVar(t) and ValidSubst^t_x(φ) = 1; and ValidSubst^t_x(Qyφ) = 0 otherwise, for all φ ∈ FormL, x, y ∈ Var, and Q ∈ {∃, ∀}.
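Free substitution and the ValidSubst check can be sketched together. The encoding below is our own tuple-based one, and for brevity the terms occurring in atomic formulas are single variable names; the capture phenomenon from the failed hope is then visible concretely:

```python
# FreeSubst (Definition 4.2.18) and ValidSubst (Definition 4.2.19) on
# a hypothetical tuple encoding; terms inside atomic formulas are
# single variable names here for brevity.
def occur_var(t):
    return {t} if isinstance(t, str) else set().union(*map(occur_var, t[1:]))

def free_var(phi):
    op = phi[0]
    if op in ("exists", "forall"):
        return free_var(phi[2]) - {phi[1]}
    if op == "not":
        return free_var(phi[1])
    if op in ("and", "or", "implies"):
        return free_var(phi[1]) | free_var(phi[2])
    return set().union(*map(occur_var, phi[1:]))         # atomic

def subst(phi, x, t):
    op = phi[0]
    if op in ("exists", "forall"):                       # skip if x is bound
        return phi if phi[1] == x else (op, phi[1], subst(phi[2], x, t))
    if op == "not":
        return (op, subst(phi[1], x, t))
    if op in ("and", "or", "implies"):
        return (op, subst(phi[1], x, t), subst(phi[2], x, t))
    return (op,) + tuple(t if u == x else u for u in phi[1:])

def valid_subst(phi, x, t):
    op = phi[0]
    if op in ("exists", "forall"):
        if x not in free_var(phi):
            return True
        return phi[1] not in occur_var(t) and valid_subst(phi[2], x, t)
    if op == "not":
        return valid_subst(phi[1], x, t)
    if op in ("and", "or", "implies"):
        return valid_subst(phi[1], x, t) and valid_subst(phi[2], x, t)
    return True                                          # atomic

phi = ("exists", "y", ("not", ("=", "y", "x")))          # the formula from above
```

Substituting y for x in this formula produces the always-false ∃y(¬(y = y)), and valid_subst flags exactly this substitution as invalid while allowing a fresh variable such as z.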


Theorem 4.2.20 (Substitution Theorem). Let M be an L-structure, let s : Var → M, let t ∈ TermL, and let x ∈ Var. For all φ ∈ FormL with ValidSubst^t_x(φ) = 1, we have
(M, s) ⊨ φ^t_x if and only if (M, s[x ⇒ s̄(t)]) ⊨ φ
Proof. The proof is by induction on φ. We first handle the case when φ ∈ AtomicFormL. Suppose that R ∈ Rk and that u1, u2, . . . , uk ∈ TermL. We then have
(M, s) ⊨ (Ru1u2 · · · uk)^t_x ⟺ (M, s) ⊨ R(u1)^t_x(u2)^t_x · · · (uk)^t_x
⟺ (s̄((u1)^t_x), s̄((u2)^t_x), . . . , s̄((uk)^t_x)) ∈ R^M
⟺ (s[x ⇒ s̄(t)](u1), s[x ⇒ s̄(t)](u2), . . . , s[x ⇒ s̄(t)](uk)) ∈ R^M   (by Lemma 4.2.17)
⟺ (M, s[x ⇒ s̄(t)]) ⊨ Ru1u2 · · · uk
If u1, u2 ∈ TermL, we have
(M, s) ⊨ (=u1u2)^t_x ⟺ (M, s) ⊨ =(u1)^t_x(u2)^t_x
⟺ s̄((u1)^t_x) = s̄((u2)^t_x)
⟺ s[x ⇒ s̄(t)](u1) = s[x ⇒ s̄(t)](u2)   (by Lemma 4.2.17)
⟺ (M, s[x ⇒ s̄(t)]) ⊨ =u1u2
Suppose that the result holds for φ and that ValidSubst^t_x(¬φ) = 1. We then have that ValidSubst^t_x(φ) = 1, and hence
(M, s) ⊨ (¬φ)^t_x ⟺ (M, s) ⊨ ¬(φ^t_x)
⟺ (M, s) ⊭ φ^t_x
⟺ (M, s[x ⇒ s̄(t)]) ⊭ φ   (by induction)
⟺ (M, s[x ⇒ s̄(t)]) ⊨ ¬φ
The connectives ∧, ∨, and → are similarly uninteresting.
Suppose that the result holds for φ and that ValidSubst^t_x(∃yφ) = 1. First, if x ∉ FreeVar(∃yφ), we have
(M, s) ⊨ (∃yφ)^t_x ⟺ (M, s) ⊨ ∃yφ
⟺ (M, s[x ⇒ s̄(t)]) ⊨ ∃yφ   (by Proposition 4.2.9)
Suppose then that x ∈ FreeVar(∃yφ), so in particular x ≠ y. Since ValidSubst^t_x(∃yφ) = 1, we have that y ∉ OccurVar(t), and also that ValidSubst^t_x(φ) = 1. Now using the fact that y ∉ OccurVar(t), it follows
that s[y ⇒ a](t) = s̄(t) for every a ∈ M (writing s[y ⇒ a] also for its extension to terms). Therefore,
(M, s) ⊨ (∃yφ)^t_x ⟺ (M, s) ⊨ ∃y(φ^t_x)
⟺ There exists a ∈ M such that (M, s[y ⇒ a]) ⊨ φ^t_x
⟺ There exists a ∈ M such that (M, (s[y ⇒ a])[x ⇒ s[y ⇒ a](t)]) ⊨ φ   (by induction)
⟺ There exists a ∈ M such that (M, (s[y ⇒ a])[x ⇒ s̄(t)]) ⊨ φ
⟺ There exists a ∈ M such that (M, (s[x ⇒ s̄(t)])[y ⇒ a]) ⊨ φ   (since x ≠ y)
⟺ (M, s[x ⇒ s̄(t)]) ⊨ ∃yφ
Suppose that the result holds for φ and that ValidSubst^t_x(∀yφ) = 1. First, if x ∉ FreeVar(∀yφ), we have
(M, s) ⊨ (∀yφ)^t_x ⟺ (M, s) ⊨ ∀yφ
⟺ (M, s[x ⇒ s̄(t)]) ⊨ ∀yφ   (by Proposition 4.2.9)
Suppose then that x ∈ FreeVar(∀yφ), so in particular x ≠ y. Since ValidSubst^t_x(∀yφ) = 1, we have that y ∉ OccurVar(t) and also that ValidSubst^t_x(φ) = 1. Now using the fact that y ∉ OccurVar(t), it follows that s[y ⇒ a](t) = s̄(t) for every a ∈ M. Therefore,
(M, s) ⊨ (∀yφ)^t_x ⟺ (M, s) ⊨ ∀y(φ^t_x)
⟺ For all a ∈ M, we have (M, s[y ⇒ a]) ⊨ φ^t_x
⟺ For all a ∈ M, we have (M, (s[y ⇒ a])[x ⇒ s[y ⇒ a](t)]) ⊨ φ   (by induction)
⟺ For all a ∈ M, we have (M, (s[y ⇒ a])[x ⇒ s̄(t)]) ⊨ φ
⟺ For all a ∈ M, we have (M, (s[x ⇒ s̄(t)])[y ⇒ a]) ⊨ φ   (since x ≠ y)
⟺ (M, s[x ⇒ s̄(t)]) ⊨ ∀yφ

4.3 Relationships Between Structures
4.3.1 Homomorphisms and Embeddings

Definition 4.3.1. Let L be a language, and let M and N be L-structures.

1. A function h : M → N is called a homomorphism if
   (a) For all c ∈ C, we have h(c^M) = c^N.
   (b) For all R ∈ R_k and all a1, a2, ..., ak ∈ M, we have
       (a1, a2, ..., ak) ∈ R^M if and only if (h(a1), h(a2), ..., h(ak)) ∈ R^N.
   (c) For all f ∈ F_k and all a1, a2, ..., ak ∈ M, we have
       h(f^M(a1, a2, ..., ak)) = f^N(h(a1), h(a2), ..., h(ak)).

2. A function h : M → N is called an embedding if it is an injective homomorphism.

3. A function h : M → N is called an isomorphism if it is a bijective homomorphism.

Notation 4.3.2. Let M and N be L-structures. If M and N are isomorphic, i.e. if there exists an isomorphism h : M → N, then we write M ≅ N.
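When the structures are finite, the conditions in Definition 4.3.1 are finitely many concrete checks, so they can be tested by brute force. The following sketch uses our own dictionary encoding of structures (the names is_homomorphism, 'univ', 'consts', 'rels', 'funcs' are assumptions for illustration, not notation from the text):

```python
from itertools import product

def is_homomorphism(h, M, N):
    """Brute-force check of Definition 4.3.1 for a map h given as a dict."""
    # (a) constants: h(c^M) = c^N
    if any(h[cM] != N['consts'][c] for c, cM in M['consts'].items()):
        return False
    # (b) relations: a tuple is in R^M if and only if its image is in R^N
    for R, RM in M['rels'].items():
        RN = N['rels'][R]
        k = len(next(iter(RM | RN), ()))   # read the arity off a stored tuple
        for tup in product(M['univ'], repeat=k):
            if (tup in RM) != (tuple(h[a] for a in tup) in RN):
                return False
    # (c) functions: h(f^M(a1, ..., ak)) = f^N(h(a1), ..., h(ak))
    for f, fM in M['funcs'].items():
        fN = N['funcs'][f]
        for args, val in fM.items():
            if h[val] != fN[tuple(h[a] for a in args)]:
                return False
    return True

# Example: reduction mod 2 from (Z/4Z, +, 0) onto (Z/2Z, +, 0)
M = {'univ': set(range(4)), 'consts': {'e': 0}, 'rels': {},
     'funcs': {'f': {(a, b): (a + b) % 4 for a in range(4) for b in range(4)}}}
N = {'univ': {0, 1}, 'consts': {'e': 0}, 'rels': {},
     'funcs': {'f': {(a, b): (a + b) % 2 for a in range(2) for b in range(2)}}}
h = {a: a % 2 for a in range(4)}
print(is_homomorphism(h, M, N))   # True: a homomorphism that is not an embedding
```

Since h here is not injective, it is a homomorphism but neither an embedding nor an isomorphism.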


CHAPTER 4. FIRST-ORDER LOGIC : SYNTAX AND SEMANTICS

Definition 4.3.3. Let M be an L-structure. An isomorphism h : M → M is called an automorphism.


Theorem 4.3.4. Let L be a language, and let M and N be L-structures. Suppose that h : M → N is a homomorphism, and suppose that s : Var → M is a variable assignment.

1. For any t ∈ Term_L, we have h(s(t)) = (h ∘ s)(t).

2. For every quantifier-free φ ∈ Form_L not containing the equality symbol, we have
   (M, s) ⊨ φ if and only if (N, h ∘ s) ⊨ φ

3. If h is an embedding, then for every quantifier-free φ ∈ Form_L, we have
   (M, s) ⊨ φ if and only if (N, h ∘ s) ⊨ φ

4. If h is an isomorphism, then for every φ ∈ Form_L, we have
   (M, s) ⊨ φ if and only if (N, h ∘ s) ⊨ φ
Proof.
1. We prove this by induction on t. First, for any c ∈ C, we have

   h(s(c)) = h(c^M)
           = c^N                (since h is a homomorphism)
           = (h ∘ s)(c).

   Now for x ∈ Var, we have h(s(x)) = (h ∘ s)(x) by definition of composition.

   Suppose now that f ∈ F_k, that t1, t2, ..., tk ∈ Term_L, and that the result holds for each ti. We then have

   h(s(f t1 t2 ··· tk)) = h(f^M(s(t1), s(t2), ..., s(tk)))
                        = f^N(h(s(t1)), h(s(t2)), ..., h(s(tk)))           (since h is a homomorphism)
                        = f^N((h ∘ s)(t1), (h ∘ s)(t2), ..., (h ∘ s)(tk))  (by induction)
                        = (h ∘ s)(f t1 t2 ··· tk).

   The result follows by induction.
2. We prove the result by induction on φ. Suppose first that R ∈ R_k and that t1, t2, ..., tk ∈ Term_L. We then have

   (M, s) ⊨ R t1 t2 ··· tk ⇔ (s(t1), s(t2), ..., s(tk)) ∈ R^M
                           ⇔ (h(s(t1)), h(s(t2)), ..., h(s(tk))) ∈ R^N              (since h is a homomorphism)
                           ⇔ ((h ∘ s)(t1), (h ∘ s)(t2), ..., (h ∘ s)(tk)) ∈ R^N     (by part 1)
                           ⇔ (N, h ∘ s) ⊨ R t1 t2 ··· tk

   Suppose that the result holds for φ. We prove it for ¬φ. We have

   (M, s) ⊨ ¬φ ⇔ (M, s) ⊭ φ
               ⇔ (N, h ∘ s) ⊭ φ    (by induction)
               ⇔ (N, h ∘ s) ⊨ ¬φ

   Suppose that the result holds for φ and ψ. We have

   (M, s) ⊨ φ ∧ ψ ⇔ (M, s) ⊨ φ and (M, s) ⊨ ψ
                  ⇔ (N, h ∘ s) ⊨ φ and (N, h ∘ s) ⊨ ψ    (by induction)
                  ⇔ (N, h ∘ s) ⊨ φ ∧ ψ

   and similarly for ∨ and →. The result follows by induction.
3. In light of the proof of 2, we need only show that if φ is = t1 t2 where t1, t2 ∈ Term_L, then (M, s) ⊨ φ if and only if (N, h ∘ s) ⊨ φ. For any t1, t2 ∈ Term_L, we have

   (M, s) ⊨ = t1 t2 ⇔ s(t1) = s(t2)
                    ⇔ h(s(t1)) = h(s(t2))          (since h is injective)
                    ⇔ (h ∘ s)(t1) = (h ∘ s)(t2)    (by part 1)
                    ⇔ (N, h ∘ s) ⊨ = t1 t2
4. In light of the proof of 3, we need only handle the quantifiers. Suppose that the result holds for φ and that x ∈ Var. We have

   (M, s) ⊨ ∃x φ ⇔ There exists a ∈ M such that (M, s[x ↦ a]) ⊨ φ
                 ⇔ There exists a ∈ M such that (N, h ∘ (s[x ↦ a])) ⊨ φ      (by induction)
                 ⇔ There exists a ∈ M such that (N, (h ∘ s)[x ↦ h(a)]) ⊨ φ
                 ⇔ There exists b ∈ N such that (N, (h ∘ s)[x ↦ b]) ⊨ φ      (since h is bijective)
                 ⇔ (N, h ∘ s) ⊨ ∃x φ

   and also

   (M, s) ⊨ ∀x φ ⇔ For all a ∈ M, we have (M, s[x ↦ a]) ⊨ φ
                 ⇔ For all a ∈ M, we have (N, h ∘ (s[x ↦ a])) ⊨ φ            (by induction)
                 ⇔ For all a ∈ M, we have (N, (h ∘ s)[x ↦ h(a)]) ⊨ φ
                 ⇔ For all b ∈ N, we have (N, (h ∘ s)[x ↦ b]) ⊨ φ            (since h is bijective)
                 ⇔ (N, h ∘ s) ⊨ ∀x φ
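Part 1 of the theorem can be watched in action on a finite example: evaluate a term in M under s, apply h, and compare with evaluating the same term in N under h ∘ s. The term encoding below (nested tuples tagged 'var', 'const', 'func') is our own choice, not the text's official syntax:

```python
def evaluate(t, struct, s):
    """Evaluate a term by recursion on its structure (the s(t) of the text)."""
    kind = t[0]
    if kind == 'var':
        return s[t[1]]
    if kind == 'const':
        return struct['consts'][t[1]]
    args = tuple(evaluate(u, struct, s) for u in t[2:])
    return struct['funcs'][t[1]][args]

# M = (Z/6Z, +, 0), N = (Z/3Z, +, 0), and h(a) = a mod 3 is a homomorphism
M = {'consts': {'e': 0},
     'funcs': {'f': {(a, b): (a + b) % 6 for a in range(6) for b in range(6)}}}
N = {'consts': {'e': 0},
     'funcs': {'f': {(a, b): (a + b) % 3 for a in range(3) for b in range(3)}}}
h = {a: a % 3 for a in range(6)}

s = {'x': 4, 'y': 5}                        # a variable assignment into M
t = ('func', 'f', ('var', 'x'), ('func', 'f', ('var', 'y'), ('const', 'e')))
hs = {v: h[a] for v, a in s.items()}        # the assignment h o s into N
print(h[evaluate(t, M, s)] == evaluate(t, N, hs))   # True
```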

Definition 4.3.5. Let L be a language, and let M and N be L-structures. We write M ≡ N, and say that M and N are elementarily equivalent, if for all σ ∈ Sent_L, we have M ⊨ σ if and only if N ⊨ σ.

Corollary 4.3.6. Let L be a language, and let M and N be L-structures. If M ≅ N, then M ≡ N.


4.3.2 An Application To Definability

Proposition 4.3.7. Suppose that M is an L-structure and k ∈ N+. Suppose also that X ⊆ M^k is definable in M and that h : M → M is an automorphism. For every a1, a2, ..., ak ∈ M, we have

(a1, a2, ..., ak) ∈ X if and only if (h(a1), h(a2), ..., h(ak)) ∈ X

Proof. Fix φ(x1, x2, ..., xk) ∈ Form_L such that

X = {(a1, a2, ..., ak) ∈ M^k : (M, a1, a2, ..., ak) ⊨ φ}

By part 4 of Theorem 4.3.4, we know that for every a1, a2, ..., ak ∈ M, we have

(M, a1, a2, ..., ak) ⊨ φ if and only if (M, h(a1), h(a2), ..., h(ak)) ⊨ φ

Therefore, for every a1, a2, ..., ak ∈ M, we have

(a1, a2, ..., ak) ∈ X if and only if (h(a1), h(a2), ..., h(ak)) ∈ X

Corollary 4.3.8. Suppose that M is an L-structure and k ∈ N+. Suppose also that X ⊆ M^k and that h : M → M is an automorphism. If there exist a1, a2, ..., ak ∈ M such that exactly one of the following holds:

• (a1, a2, ..., ak) ∈ X
• (h(a1), h(a2), ..., h(ak)) ∈ X

then X is not definable in M.
Example. Let L = {R} where R is a binary relation symbol, and let M be the L-structure where M = Z and R^M = {(a, b) ∈ Z^2 : a < b}. We show that a set X ⊆ M is definable in M if and only if either X = ∅ or X = Z. First notice that ∅ is definable as witnessed by ¬(x = x), and Z as witnessed by x = x. Suppose now that X ⊆ Z is such that X ≠ ∅ and X ≠ Z. Fix a, b ∈ Z such that a ∈ X and b ∉ X. Define h : M → M by letting h(c) = c + (b − a) for all c ∈ M. Notice that h is an automorphism of M because it is bijective (the map g(c) = c − (b − a) is clearly an inverse) and a homomorphism because if c1, c2 ∈ Z, then we have

(c1, c2) ∈ R^M ⇔ c1 < c2
               ⇔ c1 + (b − a) < c2 + (b − a)
               ⇔ h(c1) < h(c2)
               ⇔ (h(c1), h(c2)) ∈ R^M

Notice also that h(a) = a + (b − a) = b, so a ∈ X but h(a) ∉ X. It follows from the proposition that X is not definable in M.
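The shift map in this example can be checked computationally. Since Z is infinite, the sketch below tests the displayed chain of equivalences only on a finite window of integers, which is an assumption made purely for illustration (the helper name is ours):

```python
def is_order_automorphism(h, window):
    """Check clause (b) of Definition 4.3.1 (with 'if and only if') for <."""
    return all((a < b) == (h(a) < h(b)) for a in window for b in window)

a, b = 3, 10                       # a is in X, b is not in X
h = lambda c: c + (b - a)          # the shift automorphism from the example
window = range(-50, 50)
print(is_order_automorphism(h, window))   # True
print(h(a) == b)                          # True: h moves a onto b, so X is not definable
```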

4.3.3 Substructures

Definition 4.3.9. Let L be a language and let M and A be L-structures. We say that A is a substructure of M, and we write A ⊆ M, if

1. A ⊆ M.
2. c^A = c^M for all c ∈ C.
3. R^A = R^M ∩ A^k for all R ∈ R_k.
4. f^A = f^M ↾ A^k for all f ∈ F_k.

Remark 4.3.10. Let L be a language and let M and A be L-structures with A ⊆ M. We then have that A ⊆ M if and only if the identity map ι : A → M is a homomorphism.

Remark 4.3.11. Suppose that M is an L-structure and that A ⊆ M. A is the universe of a substructure of M if and only if {c^M : c ∈ C} ⊆ A and f^M(a1, a2, ..., ak) ∈ A for all f ∈ F_k and all a1, a2, ..., ak ∈ A.

Proposition 4.3.12. Let M be an L-structure and let B ⊆ M. Suppose either that B ≠ ∅ or C ≠ ∅. If we let A = G(M, B ∪ {c^M : c ∈ C}, {f^M : f ∈ F}), then A is the universe of a substructure of M. Moreover, if N ⊆ M with B ⊆ N, then A ⊆ N.
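Proposition 4.3.12 describes the generated universe G(M, B ∪ {c^M : c ∈ C}, {f^M : f ∈ F}), which for a finite structure can be computed "from below" by iterating closure under the functions until nothing new appears. The encoding below (functions as dicts of tuples) is our own sketch:

```python
from itertools import product

def generated_universe(base, funcs):
    """Smallest set containing base and closed under every function in funcs.

    Each function is a dict mapping k-tuples of elements to elements."""
    A = set(base)
    changed = True
    while changed:
        changed = False
        for f in funcs:
            k = len(next(iter(f)))            # arity, read off a key of the table
            for args in product(A, repeat=k): # product snapshots A, so this is safe
                v = f[args]
                if v not in A:
                    A.add(v)
                    changed = True
    return A

# In (Z/12Z, +), the substructure generated by {4} has universe {0, 4, 8}
add = {(x, y): (x + y) % 12 for x in range(12) for y in range(12)}
print(sorted(generated_universe({4}, [add])))   # [0, 4, 8]
```

Per the proposition, any substructure of (Z/12Z, +) whose universe contains 4 must contain all of {0, 4, 8}, and this is the smallest such universe.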
Proposition 4.3.13. Let L be a language.

1. A Σ1-formula is an element of G(Sym*_L, QuantFreeForm_L, {h_{∃,x} : x ∈ Var}).
2. A Π1-formula is an element of G(Sym*_L, QuantFreeForm_L, {h_{∀,x} : x ∈ Var}).

Proposition 4.3.14. Suppose that A ⊆ M.

1. For any φ ∈ QuantFreeForm_L and any s : Var → A, we have
   (A, s) ⊨ φ if and only if (M, s) ⊨ φ

2. For any Σ1-formula φ ∈ Form_L and any s : Var → A, we have
   If (A, s) ⊨ φ, then (M, s) ⊨ φ

3. For any Π1-formula φ ∈ Form_L and any s : Var → A, we have
   If (M, s) ⊨ φ, then (A, s) ⊨ φ
Proof.
1. This follows from Remark 4.3.10 and Theorem 4.3.4.

2. We prove this by induction. If φ is quantifier-free, this follows from part 1. Suppose that we know the result for φ, and suppose that (A, s) ⊨ ∃x φ. Fix a ∈ A such that (A, s[x ↦ a]) ⊨ φ. By induction, we know that (M, s[x ↦ a]) ⊨ φ, hence (M, s) ⊨ ∃x φ.

3. We prove this by induction. If φ is quantifier-free, this follows from part 1. Suppose that we know the result for φ, and suppose that (M, s) ⊨ ∀x φ. For every a ∈ A, we then have (M, s[x ↦ a]) ⊨ φ, and hence (A, s[x ↦ a]) ⊨ φ by induction. It follows that (A, s) ⊨ ∀x φ.

4.3.4 Elementary Substructures

Definition 4.3.15. Let L be a language and let M and A be L-structures. We say that A is an elementary substructure of M if A ⊆ M and for all φ ∈ Form_L and all s : Var → A, we have

(A, s) ⊨ φ if and only if (M, s) ⊨ φ

We write A ≼ M to mean that A is an elementary substructure of M.


Example. Let L = {f} where f is a unary function symbol. Let M be the L-structure with M = N and f^M(n) = n + 1. Let A be the L-structure with A = N+ and f^A(n) = n + 1. We then have that A ⊆ M. Furthermore, we have M ≅ A, hence for all σ ∈ Sent_L we have

A ⊨ σ if and only if M ⊨ σ

However, notice that A ⋠ M, because if φ(x) is the formula ¬∃y(fy = x), we then have that (A, 1) ⊨ φ but (M, 1) ⊭ φ.
Theorem 4.3.16 (Tarski-Vaught Test). Suppose that A ⊆ M. The following are equivalent.

1. A ≼ M.
2. Whenever φ ∈ Form_L, x ∈ Var, and s : Var → A satisfy (M, s) ⊨ ∃x φ, there exists a ∈ A such that (M, s[x ↦ a]) ⊨ φ.

Proof. We first prove that 1 implies 2. Suppose then that A ≼ M. Let φ ∈ Form_L and s : Var → A be such that (M, s) ⊨ ∃x φ. Using the fact that A ≼ M, it follows that (A, s) ⊨ ∃x φ. Fix a ∈ A such that (A, s[x ↦ a]) ⊨ φ. Using again the fact that A ≼ M, we have (M, s[x ↦ a]) ⊨ φ.

We now prove that 2 implies 1. We prove by induction on φ ∈ Form_L that for all s : Var → A, we have (A, s) ⊨ φ if and only if (M, s) ⊨ φ. That is, we let

X = {φ ∈ Form_L : For all s : Var → A, we have (A, s) ⊨ φ if and only if (M, s) ⊨ φ}

and prove that X = Form_L by induction. First notice that φ ∈ X for all quantifier-free φ because A ⊆ M.

Suppose now that φ ∈ X. For any s : Var → A, we have

(A, s) ⊨ ¬φ ⇔ (A, s) ⊭ φ
            ⇔ (M, s) ⊭ φ    (since φ ∈ X)
            ⇔ (M, s) ⊨ ¬φ

Therefore, ¬φ ∈ X.

Suppose now that φ, ψ ∈ X. For any s : Var → A, we have

(A, s) ⊨ φ ∧ ψ ⇔ (A, s) ⊨ φ and (A, s) ⊨ ψ
               ⇔ (M, s) ⊨ φ and (M, s) ⊨ ψ    (since φ, ψ ∈ X)
               ⇔ (M, s) ⊨ φ ∧ ψ

Therefore, φ ∧ ψ ∈ X. Similarly, we have φ ∨ ψ ∈ X and φ → ψ ∈ X.

Suppose now that φ ∈ X and x ∈ Var. For any s : Var → A, we have

(A, s) ⊨ ∃x φ ⇔ There exists a ∈ A such that (A, s[x ↦ a]) ⊨ φ
              ⇔ There exists a ∈ A such that (M, s[x ↦ a]) ⊨ φ    (since φ ∈ X)
              ⇔ (M, s) ⊨ ∃x φ    (by our assumption 2)

Therefore, ∃x φ ∈ X.

Suppose now that φ ∈ X and x ∈ Var. We then have that ¬φ ∈ X from above, hence ∃x ¬φ ∈ X from above, hence ¬∃x ¬φ ∈ X again from above. Thus, for any s : Var → A, we have

(A, s) ⊨ ∀x φ ⇔ (A, s) ⊨ ¬∃x ¬φ
              ⇔ (M, s) ⊨ ¬∃x ¬φ    (since ¬∃x ¬φ ∈ X)
              ⇔ (M, s) ⊨ ∀x φ

Therefore, ∀x φ ∈ X.


Theorem 4.3.17 (Countable Löwenheim-Skolem-Tarski Theorem). Suppose that L is countable, that M is an L-structure, and that X ⊆ M is countable. There exists a countable A ≼ M such that X ⊆ A.

Proof. Fix an element d ∈ M (this will be used as a dummy element of M to ensure that we always have something to go to when all else fails).

For each φ ∈ Form_L and x ∈ Var such that FreeVar(φ) = {x}, we define an element n_{φ,x} ∈ M as follows. If M ⊨ ∃x φ, fix an arbitrary m ∈ M such that (M, m) ⊨ φ, and let n_{φ,x} = m. Otherwise, let n_{φ,x} = d.

Now for each φ ∈ Form_L and x ∈ Var such that {x} ⊊ FreeVar(φ), we define a function. Suppose that FreeVar(φ) = {y1, y2, ..., yk, x}. We define a function h_{φ,x} : M^k → M as follows. Let b1, b2, ..., bk ∈ M. If (M, b1, b2, ..., bk) ⊨ ∃x φ, fix an arbitrary a ∈ M such that (M, b1, b2, ..., bk, a) ⊨ φ, and let h_{φ,x}(b1, b2, ..., bk) = a. Otherwise, let h_{φ,x}(b1, b2, ..., bk) = d.

We now let

B = X ∪ {d} ∪ {c^M : c ∈ C} ∪ {n_{φ,x} : x ∈ Var, φ ∈ Form_L, and FreeVar(φ) = {x}}

and we let

A = G(M, B, {f^M : f ∈ F} ∪ {h_{φ,x} : φ ∈ Form_L, x ∈ Var})

We then have that A is the universe of a substructure A of M. Notice that X ⊆ A and that, by a problem on Homework 1, A is countable. Thus, we need only show that A ≼ M, which we do by the Tarski-Vaught test. Suppose that φ ∈ Form_L, x ∈ Var, and s : Var → A are such that (M, s) ⊨ ∃x φ.

Suppose first that x ∉ FreeVar(φ). Since (M, s) ⊨ ∃x φ, we may fix m ∈ M such that (M, s[x ↦ m]) ⊨ φ. Now using the fact that x ∉ FreeVar(φ), it follows that (M, s[x ↦ d]) ⊨ φ.

Suppose now that FreeVar(φ) = {x}, and let a = n_{φ,x} ∈ A. Since M ⊨ ∃x φ, we have (M, a) ⊨ φ by definition of n_{φ,x}. It follows that there exists a ∈ A such that (M, s[x ↦ a]) ⊨ φ.

Finally, suppose that FreeVar(φ) = {y1, y2, ..., yk, x}. For each i with 1 ≤ i ≤ k, let bi = s(yi), and let a = h_{φ,x}(b1, b2, ..., bk) ∈ A. Since (M, b1, b2, ..., bk) ⊨ ∃x φ, we have (M, b1, b2, ..., bk, a) ⊨ φ by definition of h_{φ,x}. It follows that there exists a ∈ A such that (M, s[x ↦ a]) ⊨ φ.
Corollary 4.3.18. Suppose that L is countable and that M is an L-structure. There exists a countable L-structure N such that N ≡ M.

Proof. Let N be a countable elementary substructure of M. For any σ ∈ Sent_L, we then have that N ⊨ σ if and only if M ⊨ σ, so N ≡ M.

This is our first indication that first-order logic is not powerful enough to distinguish certain aspects of cardinality, and we'll see more examples of this phenomenon after the Compactness Theorem (for first-order logic) and once we talk about infinite cardinalities and extend the Löwenheim-Skolem-Tarski result.

This restriction already has some interesting consequences. For example, you may be familiar with the result that (R, 0, 1, <, +, ·) is the unique (up to isomorphism) Dedekind-complete ordered field.

Corollary 4.3.19. The Dedekind-complete ordered fields are not a weak elementary class in the language L = {0, 1, <, +, ·}.

Proof. Let K be the class of all Dedekind-complete ordered fields. Suppose that Σ ⊆ Sent_L is such that K = Mod(Σ). By the Countable Löwenheim-Skolem-Tarski Theorem, there exists a countable N such that N ≡ (R, 0, 1, <, +, ·). Since (R, 0, 1, <, +, ·) ∈ K, we have (R, 0, 1, <, +, ·) ⊨ σ for all σ ∈ Σ, so N ⊨ σ for all σ ∈ Σ, and hence N ∈ K. However, this is a contradiction because all Dedekind-complete ordered fields are isomorphic to (R, 0, 1, <, +, ·), hence are uncountable.
Definition 4.3.20. Let L be a language and let M and N be L-structures. Suppose that h : N → M. We say that h is an elementary embedding if h is an embedding and for all φ ∈ Form_L and all s : Var → N, we have

(N, s) ⊨ φ if and only if (M, h ∘ s) ⊨ φ


4.4 Changing the Language

4.4.1 Expansions and Restrictions

Definition 4.4.1. Let L ⊆ L′ be languages, let M be an L-structure, and let M′ be an L′-structure. We say that M is the restriction of M′ to L, and that M′ is an expansion of M to L′, if

• M = M′.
• c^M = c^{M′} for all c ∈ C.
• R^M = R^{M′} for all R ∈ R.
• f^M = f^{M′} for all f ∈ F.

Proposition 4.4.2. Let L ⊆ L′ be languages, let M′ be an L′-structure, and let M be the restriction of M′ to L. For all φ ∈ Form_L and all s : Var → M, we have (M, s) ⊨ φ if and only if (M′, s) ⊨ φ.

Proof. By induction.

4.4.2 Adding Constants to Name Elements

Definition 4.4.3. Let L be a language and let M be an L-structure. For each a ∈ M, introduce a new constant c_a (not appearing in the original language L, and all distinct). Let L_M = L ∪ {c_a : a ∈ M} and let M_exp be the L_M-structure which is the expansion of M in which c_a^{M_exp} = a for all a ∈ M. We call M_exp the expansion of M obtained by adding names for elements of M.

Definition 4.4.4. Let M be an L-structure, and let s : Var → M be a variable assignment. Define a function Name_s : Term_L → Term_{L_M} by plugging in names for free variables according to s. Define a function Name_s : Form_L → Sent_{L_M} again by plugging in names for free variables according to s.

Proposition 4.4.5. Let M be an L-structure, and let s : Var → M be a variable assignment. For every φ ∈ Form_L, we have

(M, s) ⊨ φ if and only if M_exp ⊨ Name_s(φ)

Definition 4.4.6. Let M be an L-structure.

• We let AtomicDiag(M) = {σ ∈ Sent_{L_M} : σ is atomic or the negation of an atomic formula, and M_exp ⊨ σ}.
• We let Diag(M) = {σ ∈ Sent_{L_M} : M_exp ⊨ σ}.

Proposition 4.4.7. Let L be a language and let M and N be L-structures. The following are equivalent:

• There exists an embedding h from M to N.
• There exists an expansion of N to an L_M-structure which is a model of AtomicDiag(M).

Proposition 4.4.8. Let L be a language and let M and N be L-structures. The following are equivalent:

• There exists an elementary embedding h from M to N.
• There exists an expansion of N to an L_M-structure which is a model of Diag(M).

Chapter 5

Semantic and Syntactic Implication

5.1 Semantic Implication and Theories

5.1.1 Definitions

Definition 5.1.1. Let L be a language and let Γ ⊆ Form_L. A model of Γ is a pair (M, s) where

• M is an L-structure.
• s : Var → M is a variable assignment.
• (M, s) ⊨ γ for all γ ∈ Γ.

Definition 5.1.2. Let L be a language. Let Γ ⊆ Form_L and let φ ∈ Form_L. We write Γ ⊨ φ to mean that whenever (M, s) is a model of Γ, we have that (M, s) ⊨ φ. We pronounce Γ ⊨ φ as "Γ semantically implies φ".

Definition 5.1.3. Let L be a language and let Γ ⊆ Form_L. We say that Γ is satisfiable if there exists a model of Γ.

Definition 5.1.4. Let L be a language. A set T ⊆ Sent_L is an L-theory if

• T is satisfiable.
• For every σ ∈ Sent_L with T ⊨ σ, we have σ ∈ T.
There are two standard ways to get theories. One is to take a structure, and consider all of the sentences that are true in that structure.

Definition 5.1.5. Let M be an L-structure. We let Th(M) = {σ ∈ Sent_L : M ⊨ σ}. We call Th(M) the theory of M.

Proposition 5.1.6. Let L be a language and let M be an L-structure. Th(M) is an L-theory.

Proof. First notice that Th(M) is satisfiable because M is a model of Th(M) (since M ⊨ σ for all σ ∈ Th(M) by definition). Suppose now that σ ∈ Sent_L is such that Th(M) ⊨ σ. Since M is a model of Th(M), it follows that M ⊨ σ, and hence σ ∈ Th(M).

Another standard way to get a theory is to take an arbitrary satisfiable set of sentences, and close it off under semantic implication.

Definition 5.1.7. Let L be a language and let Σ ⊆ Sent_L. We let Cn(Σ) = {τ ∈ Sent_L : Σ ⊨ τ}. We call Cn(Σ) the set of consequences of Σ.

Proposition 5.1.8. Let L be a language and let Σ ⊆ Sent_L be satisfiable. We then have that Cn(Σ) is an L-theory.

Proof. We first show that Cn(Σ) is satisfiable. Since Σ is satisfiable, we may fix a model M of Σ. Let τ ∈ Cn(Σ). We then have that Σ ⊨ τ, so using the fact that M is a model of Σ we conclude that M ⊨ τ. Therefore, M is a model of Cn(Σ), hence Cn(Σ) is satisfiable.

Suppose now that τ ∈ Sent_L and that Cn(Σ) ⊨ τ. We need to show that τ ∈ Cn(Σ), i.e. that Σ ⊨ τ. Let M be a model of Σ. Since Σ ⊨ σ for all σ ∈ Cn(Σ), it follows that M ⊨ σ for all σ ∈ Cn(Σ). Thus, M is a model of Cn(Σ). Since Cn(Σ) ⊨ τ, it follows that M ⊨ τ. Thus, Σ ⊨ τ, and so τ ∈ Cn(Σ).
Definition 5.1.9. An L-theory T is complete if for all σ ∈ Sent_L, either σ ∈ T or ¬σ ∈ T.

Proposition 5.1.10. Let L be a language and let M be an L-structure. Th(M) is a complete L-theory.

Proof. We've already seen that Th(M) is a theory. Suppose now that σ ∈ Sent_L. If M ⊨ σ, we then have that σ ∈ Th(M). Otherwise, we have M ⊭ σ, so by definition M ⊨ ¬σ, and hence ¬σ ∈ Th(M).
Example. Let L = {f, e} where f is a binary function symbol and e is a constant symbol. Consider the following sentences:

σ1 = ∀x∀y∀z(f(f(x, y), z) = f(x, f(y, z)))
σ2 = ∀x(f(x, e) = x ∧ f(e, x) = x)
σ3 = ∀x∃y(f(x, y) = e ∧ f(y, x) = e)

The theory T = Cn({σ1, σ2, σ3}) is the theory of groups. T is not complete because it contains neither ∀x∀y(f(x, y) = f(y, x)) nor its negation, since there are both abelian groups and nonabelian groups.
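In a finite structure the quantifiers in σ1, σ2, σ3 range over a finite universe, so whether the structure is a model can be decided by exhaustive search. The following is our own sketch (the helper name is an assumption, not notation from the text):

```python
def satisfies_group_axioms(univ, f, e):
    """Brute-force check of σ1 (associativity), σ2 (identity), σ3 (inverses)."""
    assoc = all(f[f[(x, y)], z] == f[x, f[(y, z)]]
                for x in univ for y in univ for z in univ)
    ident = all(f[(x, e)] == x and f[(e, x)] == x for x in univ)
    inver = all(any(f[(x, y)] == e and f[(y, x)] == e for y in univ)
                for x in univ)
    return assoc and ident and inver

univ = range(4)
f = {(a, b): (a + b) % 4 for a in univ for b in univ}   # Z/4Z with addition
print(satisfies_group_axioms(univ, f, 0))               # True
```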
Definition 5.1.11. Let L = {R} where R is a binary relation symbol. Consider the following sentences:

σ1 = ∀x ¬Rxx
σ2 = ∀x∀y∀z((Rxy ∧ Ryz) → Rxz)
σ3 = ∀x∀y(Rxy → ¬Ryx)
σ4 = ∀x∀y(x = y ∨ Rxy ∨ Ryx)

and let LO = Cn({σ1, σ2, σ3, σ4}). LO is called the theory of (strict) linear orderings. LO is not complete because it contains neither the sentence ∃y∀x(x = y ∨ Rxy) (asserting a greatest element) nor its negation, because there are linear orderings with greatest elements and linear orderings without greatest elements.
Definition 5.1.12. Let L = {R} where R is a binary relation symbol. Consider the following sentences:

σ1 = ∀x ¬Rxx
σ2 = ∀x∀y∀z((Rxy ∧ Ryz) → Rxz)
σ3 = ∀x∀y(Rxy → ¬Ryx)
σ4 = ∀x∀y(x = y ∨ Rxy ∨ Ryx)
σ5 = ∀x∀y(Rxy → ∃z(Rxz ∧ Rzy))
σ6 = ∀x∃y Rxy
σ7 = ∀x∃y Ryx

and let DLO = Cn({σ1, σ2, σ3, σ4, σ5, σ6, σ7}). DLO is called the theory of dense (strict) linear orderings without endpoints. DLO is complete, as we'll see below.


Theorem 5.1.13 (Countable Löwenheim-Skolem Theorem). Suppose that L is countable and that Γ ⊆ Form_L is satisfiable. There exists a countable model (M, s) of Γ.

Proof. Since Γ is satisfiable, we may fix a model (N, s) of Γ. Let X = ran(s) ⊆ N and notice that X is countable. By the Countable Löwenheim-Skolem-Tarski Theorem, there exists a countable elementary substructure M ≼ N such that X ⊆ M. Notice that s is also a variable assignment on M. Now for any γ ∈ Γ, we have that (N, s) ⊨ γ because (N, s) is a model of Γ, hence (M, s) ⊨ γ because M ≼ N. It follows that (M, s) is a model of Γ.

5.1.2 Finite Models of Theories

Given a theory T and an n ∈ N+, we want to count the number of models of T of cardinality n up to isomorphism. There are some technical set-theoretic difficulties here which will be elaborated upon later, but the key fact that limits the number of isomorphism classes is the following result.
Proposition 5.1.14. Let L be a language and let n ∈ N+. For every L-structure M with |M| = n, there exists an L-structure N with N = [n] such that M ≅ N.

Proof. Let M be an L-structure with |M| = n. Fix a bijection h : M → [n]. Let N be the L-structure where

• N = [n].
• c^N = h(c^M) for all c ∈ C.
• R^N = {(b1, b2, ..., bk) ∈ N^k : (h⁻¹(b1), h⁻¹(b2), ..., h⁻¹(bk)) ∈ R^M} for all R ∈ R_k.
• f^N is the function from N^k to N defined by f^N(b1, b2, ..., bk) = h(f^M(h⁻¹(b1), h⁻¹(b2), ..., h⁻¹(bk))) for all f ∈ F_k.

We then have that h is an isomorphism from M to N.

Proposition 5.1.15. If L is finite and n ∈ N+, then there are only finitely many L-structures with universe [n].
Definition 5.1.16. Let L be a finite language and let T be an L-theory. For each n ∈ N+, let I(T, n) be the number of models of T of cardinality n up to isomorphism. Formally, we consider the set of all L-structures with universe [n], and count the number of equivalence classes under the equivalence relation of isomorphism.

Example 5.1.17. If T is the theory of groups, then I(T, n) is a very interesting function that you study in algebra courses. For example, you show that I(T, p) = 1 for all primes p, that I(T, 6) = 2, and that I(T, 8) = 5.

Example 5.1.18. Let L = ∅ and let T = Cn(∅). We have I(T, n) = 1 for all n ∈ N+.

Proof. First notice that for every n ∈ N+, the L-structure M with universe [n] is a model of T of cardinality n, so I(T, n) ≥ 1. Now notice that if M and N are models of T of cardinality n, then any bijection h : M → N is an isomorphism (because L = ∅), so I(T, n) ≤ 1. It follows that I(T, n) = 1 for all n ∈ N+.
Example 5.1.19. I(LO, n) = 1 for all n ∈ N+.

Proof. First notice that for every n ∈ N+, the L-structure M where M = [n] and R^M = {(k, ℓ) ∈ [n]^2 : k < ℓ} is a model of LO of cardinality n, so I(LO, n) ≥ 1. Next notice that any two linear orderings of cardinality n are isomorphic. Intuitively, this works as follows. Notice (by induction on the number of elements) that every finite linear ordering has a least element. Let M and N be two linear orderings of cardinality n. Each must have a least element, so map the least element of M to that of N. Remove these elements, then map the least element remaining in M to the least element remaining in N, and continue. This gives an isomorphism. Formally, you can turn this into a proof by induction on n.
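The repeated-least-element argument can be sketched as code: ranking each element of a finite linear order by how many elements lie below it, and matching ranks, produces the (unique) isomorphism. The helper name and the encoding of orders as comparison functions are our own choices:

```python
def finite_order_isomorphism(univ_m, less_m, univ_n, less_n):
    """Match the i-th least element of M with the i-th least element of N."""
    rank = lambda univ, less: sorted(univ, key=lambda a: sum(less(b, a) for b in univ))
    return dict(zip(rank(univ_m, less_m), rank(univ_n, less_n)))

# Two 3-element linear orders: usual < on numbers, reverse alphabetical on letters
h = finite_order_isomorphism({10, 20, 30}, lambda a, b: a < b,
                             {'a', 'b', 'c'}, lambda a, b: a > b)
print(h)   # {10: 'c', 20: 'b', 30: 'a'}
```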


Example 5.1.20. Let L = {f} where f is a unary function symbol, and let T = Cn({∀x(ffx = x)}). We have I(T, n) = ⌊n/2⌋ + 1 for all n ∈ N+.

Proof. Let's first analyze the finite models of T. Suppose that M is a model of T of cardinality n. For every a ∈ M, we then have f^M(f^M(a)) = a. There are now two cases. Either f^M(a) = a, or f^M(a) = b ≠ a, in which case f^M(b) = a. Let

• Fix_M = {a ∈ M : f^M(a) = a}.
• Move_M = {a ∈ M : f^M(a) ≠ a}.

From above, we then have that |Move_M| is even and that |Fix_M| + |Move_M| = n. Now the idea is that two models M and N of T of cardinality n are isomorphic if and only if they have the same number of fixed points, because then we can match up the fixed points and then match up the pairings left over to get an isomorphism. Here's a more formal argument.

We now show that if M and N are models of T of cardinality n, then M ≅ N if and only if |Fix_M| = |Fix_N|. Clearly, if M ≅ N, then |Fix_M| = |Fix_N|. Suppose conversely that |Fix_M| = |Fix_N|. We then must have |Move_M| = |Move_N|. Let X_M ⊆ Move_M be a set of cardinality |Move_M|/2 such that f^M(x) ≠ y for all x, y ∈ X_M (that is, we pick out one member from each pairing given by f^M), and let X_N be such a set for N. We define a function h : M → N as follows. Fix a bijection α : Fix_M → Fix_N and a bijection β : X_M → X_N. Define h by letting h(a) = α(a) for all a ∈ Fix_M, letting h(x) = β(x) for all x ∈ X_M, and letting h(y) = f^N(β(f^M(y))) for all y ∈ Move_M \ X_M. We then have that h is an isomorphism from M to N.

Now we need only count how many possible values there are for |Fix_M|. Let n ∈ N+. Suppose first that n is even. Since |Move_M| must be even, it follows that |Fix_M| must be even. Thus, |Fix_M| ∈ {0, 2, 4, ..., n}, so there are n/2 + 1 many possibilities, and it's easy to construct models in which each of these possibilities occurs. Suppose now that n is odd. Since |Move_M| must be even, it follows that |Fix_M| must be odd. Thus, |Fix_M| ∈ {1, 3, 5, ..., n}, so there are (n−1)/2 + 1 many possibilities, and it's easy to construct models in which each of these possibilities occurs. Thus, in either case, we have I(T, n) = ⌊n/2⌋ + 1.
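The formula I(T, n) = ⌊n/2⌋ + 1 can be confirmed by brute force for small n, using the fact proved above that the number of fixed points is a complete isomorphism invariant for models of T. The testing code is our own, not part of the text:

```python
from itertools import product

def count_involution_classes(n):
    """Count models of Cn({forall x (ffx = x)}) on [n] up to isomorphism."""
    univ = range(n)
    fixed_counts = set()
    for values in product(univ, repeat=n):       # all functions f : [n] -> [n]
        f = dict(zip(univ, values))
        if all(f[f[x]] == x for x in univ):      # f is a model: an involution
            fixed_counts.add(sum(f[x] == x for x in univ))
    return len(fixed_counts)                     # classes <-> fixed-point counts

for n in range(1, 7):
    assert count_involution_classes(n) == n // 2 + 1
print("verified for n = 1, ..., 6")
```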
Example 5.1.21. I(DLO, n) = 0 for all n ∈ N+.

Proof. As mentioned in the LO example, every finite linear ordering has a least element, and hence fails to satisfy σ7.
Definition 5.1.22. Suppose that L is a finite language and σ ∈ Sent_L. Let

Spec(σ) = {n ∈ N+ : I(Cn(σ), n) > 0}

Proposition 5.1.23. There exists a finite language L and a σ ∈ Sent_L such that Spec(σ) = {2n : n ∈ N+}.

Proof. We give two separate arguments. First, let L = {e, f} be the language of group theory. Let σ be the conjunction of the group axioms with the sentence ∃x(¬(x = e) ∧ fxx = e) expressing that there is an element of order 2. Now for every n ∈ N+, the group Z/(2n)Z is a model of σ of cardinality 2n because n is an element of order 2. Thus, {2n : n ∈ N+} ⊆ Spec(σ). Suppose now that k ∈ Spec(σ), and fix a model M of σ of order k. We then have that M is a group with an element of order 2, so by Lagrange's Theorem it follows that 2 | k, so k ∈ {2n : n ∈ N+}. It follows that Spec(σ) = {2n : n ∈ N+}.
For a second example, let L = {R} where R is a binary relation symbol. Let σ be the conjunction of the following sentences:

• ∀x Rxx.
• ∀x∀y(Rxy → Ryx).
• ∀x∀y∀z((Rxy ∧ Ryz) → Rxz).
• ∀x∃y(¬(y = x) ∧ Rxy ∧ ∀z(Rxz → (z = x ∨ z = y))).

Notice that a model of σ is simply an equivalence relation in which every equivalence class has exactly 2 elements. It is now straightforward to show that Spec(σ) = {2n : n ∈ N+}.
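The claim about this second sentence can be confirmed by exhaustive search for small n: enumerate the reflexive symmetric relations on [n] and look for one that is transitive with every class of size exactly 2. The code below is our own sketch:

```python
from itertools import combinations

def in_spectrum(n):
    """Is there an equivalence relation on [n] with all classes of size 2?"""
    univ = list(range(n))
    pairs = list(combinations(univ, 2))
    for bits in range(1 << len(pairs)):          # choose the unordered pairs in R
        R = {(a, a) for a in univ}               # reflexive part
        for i, (a, b) in enumerate(pairs):
            if bits >> i & 1:
                R |= {(a, b), (b, a)}            # symmetric part
        # every class has exactly 2 elements: each a relates to exactly 2 points
        if all(sum((a, b) in R for b in univ) == 2 for a in univ):
            if all((a, c) in R for (a, b) in R for (b2, c) in R if b == b2):
                return True                      # transitive as well: a model
    return False

print([n for n in range(1, 7) if in_spectrum(n)])   # [2, 4, 6]
```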
Proposition 5.1.24. There exists a finite language L and a σ ∈ Sent_L such that Spec(σ) = {2^n : n ∈ N+}.

Proof. Again, we give two separate arguments. First, let L = {e, f} be the language of group theory. Let σ be the conjunction of the group axioms with the sentences ∃x ¬(x = e) and ∀x(fxx = e), expressing that the group is nontrivial and that every nonidentity element has order 2. Now for every n ∈ N+, the group (Z/2Z)^n is a model of σ of cardinality 2^n. Thus, {2^n : n ∈ N+} ⊆ Spec(σ). Suppose now that k ∈ Spec(σ), and fix a model M of σ of order k. We then have that k > 1 and that M is a group such that every nonidentity element has order 2. Now for any prime p ≠ 2, it is not the case that p divides k, because otherwise M would have to have an element of order p by Cauchy's Theorem. Thus, the only prime that divides k is 2, and so k ∈ {2^n : n ∈ N+}. It follows that Spec(σ) = {2^n : n ∈ N+}.

For a second example, let L = {0, 1, +, ·} be the language where 0, 1 are constant symbols and +, · are binary function symbols. Let σ be the conjunction of the field axioms together with 1 + 1 = 0. Thus, the models of σ are exactly the fields of characteristic 2. By results in algebra, there is a finite field of characteristic 2 of order k if and only if k is a power of 2.

5.1.3 Countable Models of Theories

Theorem 5.1.25. Suppose that M and N are two countably infinite models of DLO. We then have that M ≅ N.

Proof. Back-and-forth construction. See Damir's carefully written proof.
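The key step of the back-and-forth construction is that a finite order-preserving partial map between dense orders without endpoints can absorb any new point. The following sketch illustrates one such round over Q (function names are ours, and this is a single step under those assumptions, not Damir's full proof):

```python
from fractions import Fraction

def extend(p, a):
    """Extend a finite order-preserving partial map p on Q to the point a."""
    if a in p:
        return p
    below = [x for x in p if x < a]
    above = [x for x in p if x > a]
    if below and above:                        # density: choose a midpoint
        b = (p[max(below)] + p[min(above)]) / 2
    elif below:                                # no right endpoint: go above
        b = p[max(below)] + 1
    elif above:                                # no left endpoint: go below
        b = p[min(above)] - 1
    else:                                      # empty map: any value works
        b = Fraction(0)
    q = dict(p)
    q[a] = b
    return q

p = {Fraction(0): Fraction(5), Fraction(1): Fraction(7)}
p = extend(p, Fraction(1, 2))                  # lands strictly between 5 and 7
print(p[Fraction(1, 2)])                       # 6
```

Alternating this step on the elements of M and of N (applying it to the inverse map for the "back" direction) and taking the union of the finite stages yields the isomorphism.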
Corollary 5.1.26 (Countable Łoś-Vaught Test). Let L be a countable language. Suppose that T is an L-theory such that all models of T are infinite, and suppose also that any two countably infinite models of T are isomorphic. We then have that T is complete.

Proof. Suppose that T is not complete, and fix σ ∈ Sent_L such that σ ∉ T and ¬σ ∉ T. We then have that T ∪ {σ} and T ∪ {¬σ} are both satisfiable by infinite models (because all models of T are infinite), so by the Countable Löwenheim-Skolem Theorem we may fix countably infinite models M1 of T ∪ {σ} and M2 of T ∪ {¬σ}. We then have that M1 and M2 are countably infinite models of T which are not isomorphic (because they are not elementarily equivalent), a contradiction.
Corollary 5.1.27. DLO is complete.

Proposition 5.1.28. Suppose that T is a complete L-theory. If M and N are models of T, then M ≡ N.

Proof. Let σ ∈ Sent_L. If σ ∈ T, we then have that both M ⊨ σ and N ⊨ σ. Suppose that σ ∉ T. Since T is complete, we then have that ¬σ ∈ T, hence M ⊨ ¬σ and N ⊨ ¬σ. It follows that both M ⊭ σ and N ⊭ σ. Therefore, for all σ ∈ Sent_L, we have that M ⊨ σ if and only if N ⊨ σ, so M ≡ N.

Corollary 5.1.29. In the language L = {R} where R is a binary relation symbol, we have (Q, <) ≡ (R, <).

5.2 Syntactic Implication

5.2.1 Definitions

Basic Proofs:

Γ ⊢ φ  if φ ∈ Γ                                              (Assume_L)

Γ ⊢ t = t  for all t ∈ Term_L                                (EqRefl)

Proof Rules:

Γ ⊢ φ ∧ ψ                      Γ ⊢ φ ∧ ψ
---------- (∧EL)               ---------- (∧ER)
Γ ⊢ φ                          Γ ⊢ ψ

Γ ⊢ φ    Γ ⊢ ψ
--------------- (∧I)
Γ ⊢ φ ∧ ψ

Γ ⊢ ψ                          Γ ⊢ φ
---------- (∨IR)               ---------- (∨IL)
Γ ⊢ φ ∨ ψ                      Γ ⊢ φ ∨ ψ

Γ ⊢ φ → ψ                      Γ ∪ {φ} ⊢ ψ
------------ (→E)              ------------ (→I)
Γ ∪ {φ} ⊢ ψ                    Γ ⊢ φ → ψ

Γ ∪ {φ} ⊢ θ    Γ ∪ {¬φ} ⊢ θ
---------------------------- (PC)
Γ ⊢ θ

Γ ∪ {φ} ⊢ θ    Γ ∪ {ψ} ⊢ θ
--------------------------- (∨PC)
Γ ∪ {φ ∨ ψ} ⊢ θ

Γ ∪ {¬φ} ⊢ ψ    Γ ∪ {¬φ} ⊢ ¬ψ
------------------------------ (Contr)
Γ ⊢ φ

Equality Rules:

Γ ⊢ φ_x^t    Γ ⊢ t = u
-----------------------  if ValidSubst_x^t(φ) = 1 = ValidSubst_x^u(φ)   (= Sub)
Γ ⊢ φ_x^u

Existential Rules:

Γ ⊢ φ_x^t
----------  if ValidSubst_x^t(φ) = 1                                    (∃I)
Γ ⊢ ∃x φ

Γ ∪ {φ_x^y} ⊢ θ
----------------  if y ∉ FreeVar(Γ ∪ {∃x φ, θ}) and ValidSubst_x^y(φ) = 1   (∃P)
Γ ∪ {∃x φ} ⊢ θ

Universal Rules:

Γ ⊢ ∀x φ
----------  if ValidSubst_x^t(φ) = 1                                    (∀E)
Γ ⊢ φ_x^t

Γ ⊢ φ_x^y
----------  if y ∉ FreeVar(Γ ∪ {∀x φ}) and ValidSubst_x^y(φ) = 1        (∀I)
Γ ⊢ ∀x φ

Superset Rule:

Γ ⊢ φ
------  if Γ ⊆ Γ′                                                       (Super)
Γ′ ⊢ φ

Definition 5.2.1. A deduction is a witnessing sequence in (P(Form_L) × Form_L, Assume_L ∪ EqRefl, H).

Definition 5.2.2. Let Γ ⊆ Form_L and let φ ∈ Form_L. We write Γ ⊢ φ to mean that

(Γ, φ) ∈ G(P(Form_L) × Form_L, Assume_L ∪ EqRefl, H)

We pronounce Γ ⊢ φ as "Γ syntactically implies φ".

Notation 5.2.3.
1. If Γ = ∅, we write ⊢ φ instead of ∅ ⊢ φ.
2. If Γ = {γ}, we write γ ⊢ φ instead of {γ} ⊢ φ.

Definition 5.2.4. Γ is inconsistent if there exists θ ∈ Form_L such that Γ ⊢ θ and Γ ⊢ ¬θ. Otherwise, we say that Γ is consistent.

5.2.2 Some Fundamental Deductions

Proposition 5.2.5. For any t, u ∈ Term_L, we have t = u ⊢ u = t.

Proof. Fix t, u ∈ Term_L, and fix a variable x with x ∉ OccurVar(t) ∪ OccurVar(u). We have

{t = u} ⊢ t = t        (EqRefl)                              (1)
{t = u} ⊢ t = u        (Assume_L)                            (2)
{t = u} ⊢ u = t        (= Sub on 1 and 2 with x = t)         (3)

Proposition 5.2.6. For any t, u, w ∈ Term_L, we have {t = u, u = w} ⊢ t = w.

Proof. Fix t, u, w ∈ Term_L, and fix a variable x not occurring in t, u, or w. We have

{t = u, u = w} ⊢ t = u        (Assume_L)                     (1)
{t = u, u = w} ⊢ u = w        (Assume_L)                     (2)
{t = u, u = w} ⊢ t = w        (= Sub on 1 and 2 with t = x)  (3)

Proposition 5.2.7. For any R ∈ R_k and any t1, t2, ..., tk, u1, u2, ..., uk ∈ Term_L, we have

{R t1 t2 ··· tk, t1 = u1, t2 = u2, ..., tk = uk} ⊢ R u1 u2 ··· uk

Proof. Fix R ∈ R_k and t1, t2, ..., tk, u1, u2, ..., uk ∈ Term_L. Fix x ∉ ⋃_{i=1}^{k} (OccurVar(ti) ∪ OccurVar(ui)). Let Γ = {R t1 t2 ··· tk, t1 = u1, t2 = u2, ..., tk = uk}. We have

Γ ⊢ R t1 t2 ··· tk      (Assume_L)                                   (1)
Γ ⊢ t1 = u1             (Assume_L)                                   (2)
Γ ⊢ R u1 t2 t3 ··· tk   (= Sub on 1 and 2 with R x t2 t3 ··· tk)     (3)
Γ ⊢ t2 = u2             (Assume_L)                                   (4)
Γ ⊢ R u1 u2 t3 ··· tk   (= Sub on 3 and 4 with R u1 x t3 ··· tk)     (5)
  ⋮
Γ ⊢ tk = uk             (Assume_L)                                   (2k)
Γ ⊢ R u1 u2 ··· uk      (= Sub on 2k − 1 and 2k with R u1 u2 ··· u_{k−1} x)   (2k + 1)

Proposition 5.2.8. For any f ∈ F_k and any t1, t2, ..., tk, u1, u2, ..., uk ∈ Term_L, we have

{t1 = u1, t2 = u2, ..., tk = uk} ⊢ f t1 t2 ··· tk = f u1 u2 ··· uk

Proof. This is similar to the previous proposition, but we start with the line Γ ⊢ f t1 t2 ··· tk = f t1 t2 ··· tk using the EqRefl rule. Fix f ∈ F_k and t1, t2, ..., tk, u1, u2, ..., uk ∈ Term_L. Fix x ∉ ⋃_{i=1}^{k} (OccurVar(ti) ∪ OccurVar(ui)). Let Γ = {t1 = u1, t2 = u2, ..., tk = uk}. We have

Γ ⊢ f t1 t2 ··· tk = f t1 t2 ··· tk      (EqRefl)                                                 (1)
Γ ⊢ t1 = u1                              (Assume_L)                                               (2)
Γ ⊢ f t1 t2 ··· tk = f u1 t2 ··· tk      (= Sub on 1 and 2 with f t1 t2 ··· tk = f x t2 ··· tk)   (3)
Γ ⊢ t2 = u2                              (Assume_L)                                               (4)
Γ ⊢ f t1 t2 ··· tk = f u1 u2 t3 ··· tk   (= Sub on 3 and 4 with f t1 t2 ··· tk = f u1 x t3 ··· tk) (5)
  ⋮
Γ ⊢ tk = uk                              (Assume_L)                                               (2k)
Γ ⊢ f t1 t2 ··· tk = f u1 u2 ··· uk      (= Sub on 2k − 1 and 2k with f t1 t2 ··· tk = f u1 u2 ··· u_{k−1} x)   (2k + 1)
Proposition 5.2.9. ∃x ¬φ ⊢ ¬∀x φ.

Proof. Fix y ≠ x with y ∉ OccurVar(φ).

{¬φ_x^y, ∀x φ, ¬¬∀x φ} ⊢ ∀x φ       (Assume_L)            (1)
{¬φ_x^y, ∀x φ, ¬¬∀x φ} ⊢ φ_x^y      (∀E on 1)             (2)
{¬φ_x^y, ∀x φ, ¬¬∀x φ} ⊢ ¬φ_x^y     (Assume_L)            (3)
{¬φ_x^y, ∀x φ} ⊢ ¬∀x φ              (Contr on 2 and 3)    (4)
{¬φ_x^y, ¬∀x φ} ⊢ ¬∀x φ             (Assume_L)            (5)
{¬φ_x^y} ⊢ ¬∀x φ                    (PC on 4 and 5)       (6)
{∃x ¬φ} ⊢ ¬∀x φ                     (∃P on 6)             (7)

Proposition 5.2.10. ¬∃x ¬φ ⊢ ∀x φ.

Proof. Fix y ≠ x with y ∉ OccurVar(φ).

{¬∃x ¬φ, ¬φ_x^y} ⊢ ¬∃x ¬φ           (Assume_L)            (1)
{¬∃x ¬φ, ¬φ_x^y} ⊢ ¬φ_x^y           (Assume_L)            (2)
{¬∃x ¬φ, ¬φ_x^y} ⊢ ∃x ¬φ            (∃I on 2)             (3)
{¬∃x ¬φ} ⊢ φ_x^y                    (Contr on 1 and 3)    (4)
{¬∃x ¬φ} ⊢ ∀x φ                     (∀I on 4)             (5)

5.2.3 Theorems About ⊢

Proposition 5.2.11. If Γ is inconsistent, then Γ ⊢ φ for all φ ∈ Form_L.

Proof. Fix θ such that Γ ⊢ θ and Γ ⊢ ¬θ, and fix φ ∈ Form_L. We have that Γ ∪ {¬φ} ⊢ θ and Γ ∪ {¬φ} ⊢ ¬θ by the Super rule. Therefore, Γ ⊢ φ by using the Contr rule.

Proposition 5.2.12.
Proposition 5.2.12.

1. If Γ ∪ {φ} is inconsistent, then Γ ⊢ ¬φ.
2. If Γ ∪ {¬φ} is inconsistent, then Γ ⊢ φ.

Proof.

1. Since Γ ∪ {φ} is inconsistent, we know that Γ ∪ {φ} ⊢ ¬φ by Proposition 5.2.11. Since we also have that Γ ∪ {¬φ} ⊢ ¬φ by Assume, it follows that Γ ⊢ ¬φ by the ¬PC rule.
2. Since Γ ∪ {¬φ} is inconsistent, we know that Γ ∪ {¬φ} ⊢ φ by Proposition 5.2.11. Since we also have that Γ ∪ {φ} ⊢ φ by Assume, it follows that Γ ⊢ φ by the ¬PC rule.

Corollary 5.2.13. If Γ ⊆ Form_L is consistent and φ ∈ Form_L, then either Γ ∪ {φ} is consistent or Γ ∪ {¬φ} is consistent.

Proof. If both Γ ∪ {φ} and Γ ∪ {¬φ} are inconsistent, then both Γ ⊢ ¬φ and Γ ⊢ φ by Proposition 5.2.12, so Γ is inconsistent.
Proposition 5.2.14.

1. If Γ ⊢ φ and Γ ∪ {φ} ⊢ ψ, then Γ ⊢ ψ.
2. If Γ ⊢ φ and Γ ⊢ φ → ψ, then Γ ⊢ ψ.

Proof.

1. Since Γ ⊢ φ, it follows from the Super rule that Γ ∪ {¬φ} ⊢ φ. Since we also have Γ ∪ {¬φ} ⊢ ¬φ by Assume, we may conclude that Γ ∪ {¬φ} is inconsistent. Therefore, by Proposition 5.2.11, we have that Γ ∪ {¬φ} ⊢ ψ. Now we also have Γ ∪ {φ} ⊢ ψ by assumption, so the ¬PC rule gives that Γ ⊢ ψ.
2. Since Γ ⊢ φ → ψ, we can conclude that Γ ∪ {φ} ⊢ ψ by the →E rule. The result follows from part 1.

Proposition 5.2.15. Let G_fin = G(P_fin(Form_L) × Form_L, AssumeL ∪ EqRefl, H), i.e. we insist that the set Γ is finite but otherwise have exactly the same proof rules. Let Γ ⊢_fin φ denote that (Γ, φ) ∈ G_fin.

1. If Γ ⊢_fin φ, then Γ ⊢ φ.
2. If Γ ⊢ φ, then there exists a finite Γ_0 ⊆ Γ such that Γ_0 ⊢_fin φ.

In particular, if Γ ⊢ φ, then there exists a finite Γ_0 ⊆ Γ such that Γ_0 ⊢ φ.

Proof. Part 1 is a completely straightforward induction because the starting points are the same and we have the exact same rules. The proof of part 2 goes in much the same way as the corresponding result for propositional logic.
Corollary 5.2.16. If every finite subset of Γ is consistent, then Γ is consistent.

Proof. Suppose that Γ is inconsistent, and fix θ ∈ Form_L such that Γ ⊢ θ and Γ ⊢ ¬θ. By Proposition 5.2.15, there exist finite sets Γ_0 ⊆ Γ and Γ_1 ⊆ Γ such that Γ_0 ⊢ θ and Γ_1 ⊢ ¬θ. Using the Super rule, it follows that Γ_0 ∪ Γ_1 ⊢ θ and Γ_0 ∪ Γ_1 ⊢ ¬θ, so Γ_0 ∪ Γ_1 is a finite inconsistent subset of Γ.


Chapter 6

Soundness, Completeness, and Compactness

6.1  Soundness

Theorem 6.1.1 (Soundness Theorem).

1. If Γ ⊢ φ, then Γ ⊨ φ.
2. Every satisfiable set of formulas is consistent.

Proof.

1. The proof is by induction. We let X = {(Γ, φ) ∈ G : Γ ⊨ φ}, and we show by induction on G that X = G. We begin by noting that if φ ∈ Γ, then Γ ⊨ φ, because if (M, s) is a model of Γ, then (M, s) is a model of φ simply because φ ∈ Γ. Therefore, (Γ, φ) ∈ X for all (Γ, φ) ∈ AssumeL. Also, for any Γ ⊆ Form_L and any t ∈ Term_L, we have Γ ⊨ t = t because for any model (M, s) of Γ we have s̄(t) = s̄(t), hence (M, s) ⊨ t = t.
We now handle the inductive steps. All of the old rules go through in a similar manner as before, and the Super rule is trivial.

We first handle the =Sub rule. Suppose that Γ ⊨ φ_t^x, that Γ ⊨ t = u, and that ValidSubst_t^x(φ) = 1 = ValidSubst_u^x(φ). We need to show that Γ ⊨ φ_u^x. Fix a model (M, s) of Γ. Since Γ ⊨ φ_t^x, we have that (M, s) ⊨ φ_t^x. Also, since Γ ⊨ t = u, we have that (M, s) ⊨ t = u, and hence s̄(t) = s̄(u). Now using the fact that ValidSubst_t^x(φ) = 1 = ValidSubst_u^x(φ), we have

(M, s) ⊨ φ_t^x ⟹ (M, s[x ⇒ s̄(t)]) ⊨ φ     (by the Substitution Theorem)
              ⟹ (M, s[x ⇒ s̄(u)]) ⊨ φ     (since s̄(t) = s̄(u))
              ⟹ (M, s) ⊨ φ_u^x            (by the Substitution Theorem)

We now handle the ∃I rule. Suppose that Γ ⊨ φ_t^x, where ValidSubst_t^x(φ) = 1. We need to show that Γ ⊨ ∃xφ. Fix a model (M, s) of Γ. Since Γ ⊨ φ_t^x, it follows that (M, s) ⊨ φ_t^x. Since ValidSubst_t^x(φ) = 1, we have

(M, s) ⊨ φ_t^x ⟹ (M, s[x ⇒ s̄(t)]) ⊨ φ                          (by the Substitution Theorem)
              ⟹ There exists a ∈ M such that (M, s[x ⇒ a]) ⊨ φ
              ⟹ (M, s) ⊨ ∃xφ


Let's next attack the ∃P rule. Suppose that Γ ∪ {φ_y^x} ⊨ θ, that y ∉ FreeVar(Γ ∪ {∃xφ, θ}), and that ValidSubst_y^x(φ) = 1. We need to show that Γ ∪ {∃xφ} ⊨ θ. Fix a model (M, s) of Γ ∪ {∃xφ}. Since (M, s) ⊨ ∃xφ, we may fix a ∈ M such that (M, s[x ⇒ a]) ⊨ φ. We first divide into two cases to show that (M, s[y ⇒ a]) ⊨ φ_y^x.

Case 1: Suppose that y = x. We then have φ_y^x = φ_x^x = φ and s[x ⇒ a] = s[y ⇒ a], hence (M, s[y ⇒ a]) ⊨ φ_y^x because (M, s[x ⇒ a]) ⊨ φ.

Case 2: Suppose that y ≠ x. We then have

(M, s[x ⇒ a]) ⊨ φ ⟹ (M, (s[y ⇒ a])[x ⇒ a]) ⊨ φ                   (since y ∉ FreeVar(φ) and y ≠ x)
                  ⟹ (M, (s[y ⇒ a])[x ⇒ s[y ⇒ a](y)]) ⊨ φ
                  ⟹ (M, s[y ⇒ a]) ⊨ φ_y^x                         (by the Substitution Theorem)

Thus, (M, s[y ⇒ a]) ⊨ φ_y^x in either case. Now since (M, s) ⊨ γ for all γ ∈ Γ and y ∉ FreeVar(Γ), we have (M, s[y ⇒ a]) ⊨ γ for all γ ∈ Γ. Thus, (M, s[y ⇒ a]) ⊨ θ because Γ ∪ {φ_y^x} ⊨ θ. Finally, since y ∉ FreeVar(θ), it follows that (M, s) ⊨ θ.
We next do the ∀E rule. Suppose that Γ ⊨ ∀xφ and that t ∈ Term_L is such that ValidSubst_t^x(φ) = 1. We need to show that Γ ⊨ φ_t^x. Fix a model (M, s) of Γ. Since Γ ⊨ ∀xφ, it follows that (M, s) ⊨ ∀xφ. Since ValidSubst_t^x(φ) = 1, we have

(M, s) ⊨ ∀xφ ⟹ For all a ∈ M, we have (M, s[x ⇒ a]) ⊨ φ
            ⟹ (M, s[x ⇒ s̄(t)]) ⊨ φ
            ⟹ (M, s) ⊨ φ_t^x                                    (by the Substitution Theorem)

We finally end with the ∀I rule. Suppose that Γ ⊨ φ_y^x, that y ∉ FreeVar(Γ ∪ {∀xφ}), and that ValidSubst_y^x(φ) = 1. We need to show that Γ ⊨ ∀xφ. Fix a model (M, s) of Γ.

Case 1: Suppose that y = x. Since y = x, we have φ_y^x = φ_x^x = φ. Fix a ∈ M. Since (M, s) ⊨ γ for all γ ∈ Γ and x = y ∉ FreeVar(Γ), we may conclude that (M, s[x ⇒ a]) ⊨ γ for all γ ∈ Γ. Therefore, (M, s[x ⇒ a]) ⊨ φ_y^x, i.e. (M, s[x ⇒ a]) ⊨ φ, because Γ ⊨ φ_y^x. Now a ∈ M was arbitrary, so (M, s[x ⇒ a]) ⊨ φ for every a ∈ M, hence (M, s) ⊨ ∀xφ.

Case 2: Suppose that y ≠ x. Fix a ∈ M. Since (M, s) ⊨ γ for all γ ∈ Γ and y ∉ FreeVar(Γ), we may conclude that (M, s[y ⇒ a]) ⊨ γ for all γ ∈ Γ. Therefore, (M, s[y ⇒ a]) ⊨ φ_y^x because Γ ⊨ φ_y^x. Since ValidSubst_y^x(φ) = 1, we have

(M, s[y ⇒ a]) ⊨ φ_y^x ⟹ (M, (s[y ⇒ a])[x ⇒ s[y ⇒ a](y)]) ⊨ φ     (by the Substitution Theorem)
                      ⟹ (M, (s[y ⇒ a])[x ⇒ a]) ⊨ φ
                      ⟹ (M, s[x ⇒ a]) ⊨ φ                         (since y ∉ FreeVar(φ) and y ≠ x)

Now a ∈ M was arbitrary, so (M, s[x ⇒ a]) ⊨ φ for every a ∈ M, hence (M, s) ⊨ ∀xφ.

The result follows by induction.
2. Let Γ be a satisfiable set of formulas. Fix a model (M, s) of Γ. Suppose that Γ is inconsistent, and fix θ ∈ Form_L such that Γ ⊢ θ and Γ ⊢ ¬θ. We then have Γ ⊨ θ and Γ ⊨ ¬θ by part 1, hence (M, s) ⊨ θ and (M, s) ⊨ ¬θ, a contradiction. It follows that Γ is consistent.

6.2  Prime Formulas

Definition 6.2.1. A formula φ ∈ Form_L is prime if one of the following holds.

1. φ ∈ AtomicForm_L.
2. φ = ∃xψ for some x ∈ Var and some ψ ∈ Form_L.
3. φ = ∀xψ for some x ∈ Var and some ψ ∈ Form_L.

We denote the set of prime formulas by PrimeForm_L.
Definition 6.2.2. Let L be a language. Let P(L) = {A_φ : φ ∈ PrimeForm_L}.

We define a function h : Form_L → Form_{P(L)} recursively as follows.

1. h(φ) = A_φ for all φ ∈ AtomicForm_L.
2. h(¬φ) = ¬h(φ) for all φ ∈ Form_L.
3. h(◇φψ) = ◇h(φ)h(ψ) for all φ, ψ ∈ Form_L and all ◇ ∈ {∧, ∨, →}.
4. h(Qxφ) = A_{Qxφ} for all φ ∈ Form_L, x ∈ Var, and Q ∈ {∃, ∀}.

For φ ∈ Form_L, we write φ^# for h(φ). For Γ ⊆ Form_L, we write Γ^# for {γ^# : γ ∈ Γ}.

We also define a function g : Form_{P(L)} → Form_L recursively as follows.

1. g(A_φ) = φ for all φ ∈ PrimeForm_L.
2. g(¬φ) = ¬g(φ) for all φ ∈ Form_{P(L)}.
3. g(◇φψ) = ◇g(φ)g(ψ) for all φ, ψ ∈ Form_{P(L)} and all ◇ ∈ {∧, ∨, →}.

For φ ∈ Form_{P(L)}, we write φ^⋆ for g(φ). For Γ ⊆ Form_{P(L)}, we write Γ^⋆ for {γ^⋆ : γ ∈ Γ}.
Proposition 6.2.3.

1. For all φ ∈ Form_L, we have (φ^#)^⋆ = φ.
2. For all φ ∈ Form_{P(L)}, we have (φ^⋆)^# = φ.

Proof. A trivial induction.
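To see the translation in action, here is a small illustrative sketch (not from the text) of the maps h and g on formulas represented as nested Python tuples; the node tags ("atomic", "exists", and so on) are hypothetical choices made only for this sketch, not the book's notation.

```python
# Illustrative sketch (not from the text): the maps h (phi -> phi^#) and
# g (phi -> phi^*) on a tiny formula representation.  Prime formulas
# (atomic, exists, forall) become opaque propositional variables A_phi.

def h(phi):
    """Translate a first-order formula into a propositional one:
    prime subformulas are wrapped as variables A_phi."""
    op = phi[0]
    if op in ("atomic", "exists", "forall"):
        return ("var", phi)                # A_phi for prime phi
    if op == "not":
        return ("not", h(phi[1]))
    return (op, h(phi[1]), h(phi[2]))      # and / or / implies

def g(phi):
    """Inverse translation: unwrap the propositional variables."""
    op = phi[0]
    if op == "var":
        return phi[1]
    if op == "not":
        return ("not", g(phi[1]))
    return (op, g(phi[1]), g(phi[2]))

# (phi^#)^* = phi, as in Proposition 6.2.3:
phi = ("and", ("exists", "x", ("atomic", "P", "x")),
              ("not", ("atomic", "Q", "y")))
assert g(h(phi)) == phi
```

The key point mirrored here is that h recurses only through the propositional connectives and stops at prime formulas, which is exactly why g ∘ h is the identity.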
Proposition 6.2.4. Let L be a language, let Γ ⊆ Form_{P(L)}, and let φ ∈ Form_{P(L)}.

1. If Γ ⊢_{P(L)} φ (in the propositional language P(L)), then Γ^⋆ ⊢_L φ^⋆ (in the first-order language L).
2. If Γ ⊨_{P(L)} φ (in the propositional language P(L)), then Γ^⋆ ⊢_L φ^⋆ (in the first-order language L).

Proof.

1. This follows by induction because all propositional rules are included as first-order logic rules.
2. If Γ ⊨_{P(L)} φ, then Γ ⊢_{P(L)} φ by the Completeness Theorem for propositional logic, hence Γ^⋆ ⊢_L φ^⋆ by part 1.

Corollary 6.2.5. Let L be a language, let Γ ⊆ Form_L, and let φ ∈ Form_L. If Γ^# ⊨_{P(L)} φ^# (in the propositional language P(L)), then Γ ⊢_L φ (in the first-order language L).

Proof. Suppose that Γ^# ⊨_{P(L)} φ^#. By Proposition 6.2.4, it follows that (Γ^#)^⋆ ⊢_L (φ^#)^⋆, hence Γ ⊢_L φ by Proposition 6.2.3.
Example 6.2.6. Let L be a language and let φ, ψ ∈ Form_L. We have {¬(φ → ψ)} ⊢_L φ ∧ ¬ψ.

Proof. We show that (¬(φ → ψ))^# ⊨_{P(L)} (φ ∧ ¬ψ)^#. Notice that

1. (¬(φ → ψ))^# = ¬(φ^# → ψ^#).
2. (φ ∧ ¬ψ)^# = φ^# ∧ (¬ψ^#).

Suppose that v : P(L) → {0, 1} is a truth assignment such that v(¬(φ^# → ψ^#)) = 1. We then have v(φ^# → ψ^#) = 0, hence v(φ^#) = 1 and v(ψ^#) = 0. We therefore have v(¬ψ^#) = 1, and hence v(φ^# ∧ ¬ψ^#) = 1. It follows that (¬(φ → ψ))^# ⊨_{P(L)} (φ ∧ ¬ψ)^#, so the result follows from Corollary 6.2.5.
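Because the claim (¬(φ → ψ))^# ⊨_{P(L)} (φ ∧ ¬ψ)^# is purely propositional, it can be checked mechanically by running through all truth assignments to the relevant propositional variables. Here is a small illustrative script (not part of the text) that treats φ^# and ψ^# as variables p and q:

```python
from itertools import product

def implies(a, b):
    # truth function of ->
    return (not a) or b

# Check the propositional entailment  not(p -> q) |= p and (not q)
# by enumerating every truth assignment to p and q.
for p, q in product([False, True], repeat=2):
    if not implies(p, q):          # v(not(p -> q)) = 1
        assert p and (not q)       # then v(p and not q) = 1 as well

print("not(p -> q) |= p and not q holds under all assignments")
```

Only four assignments exist, and the hypothesis v(¬(p → q)) = 1 holds in exactly one of them, matching the case analysis in the proof above.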
Corollary 6.2.7. Let L be a language, let Γ ⊆ Form_L, and let φ, ψ ∈ Form_L. If Γ ⊢_L φ and {φ^#} ⊨_{P(L)} ψ^#, then Γ ⊢_L ψ.

Proof. Since {φ^#} ⊨_{P(L)} ψ^#, we have that {φ} ⊢_L ψ by Corollary 6.2.5. It follows from the Super rule that Γ ∪ {φ} ⊢_L ψ. Using Proposition 5.2.14 (since Γ ⊢_L φ and Γ ∪ {φ} ⊢_L ψ), we may conclude that Γ ⊢_L ψ.

6.3  Completeness

6.3.1  Motivating the Proof

We first give an overview of the key ideas in our proof of completeness. Let L be a language, and suppose that Γ ⊆ Form_L is consistent.
Definition 6.3.1. Suppose that L is a language and that Δ ⊆ Form_L. We say that Δ is complete if for all φ ∈ Form_L, either φ ∈ Δ or ¬φ ∈ Δ.
As we saw in propositional logic, it will aid us greatly to extend Γ to a set Δ which is both consistent and complete, so let's assume that we can do that (we will prove it in exactly the same way below). We need to construct an L-structure M and a variable assignment s : Var → M such that (M, s) ⊨ δ for all δ ∈ Δ. Now all that we have is the syntactic information that Δ provides, so it seems that the only way to proceed is to define our M from these syntactic objects. Since terms intuitively name elements, it is natural to try to define the universe M to simply be Term_L. We would then define the structure as follows:

1. c^M = c for all c ∈ C.
2. R^M = {(t_1, t_2, ..., t_k) ∈ M^k : Rt_1t_2···t_k ∈ Δ} for all R ∈ R_k.
3. f^M(t_1, t_2, ..., t_k) = ft_1t_2···t_k for all f ∈ F_k and all t_1, t_2, ..., t_k ∈ M.

and let s : Var → M be the variable assignment defined by s(x) = x for all x ∈ Var.

However, there are two problems with this approach, one of which is minor and the other of which is quite serious. First, let's think about the minor problem. Suppose that L = {f, e}, where f is a binary function symbol and e is a constant symbol, and that Γ is the set of group axioms. Suppose that Δ ⊇ Γ is consistent and complete. We then have fee = e ∈ Δ because Δ ⊢ fee = e. However, the two terms fee and e are syntactically different objects, so if we were to let M be Term_L this would cause a problem, because fee and e are distinct despite the fact that Δ says they must be equal. Of course, when you have distinct objects which you want to consider equivalent, you should define an equivalence relation. Thus, we should define ∼ on Term_L by letting t ∼ u if t = u ∈ Δ. We would then need to check that ∼ is an equivalence relation and that the definition of the structure above is independent of our choice of representatives for the classes. This is all fairly straightforward, and will be carried out below.
On to the more serious obstacle. Suppose that L = {P}, where P is a unary relation symbol. Let Γ = {¬Px : x ∈ Var} ∪ {¬(x = y) : x, y ∈ Var with x ≠ y} ∪ {∃xPx}, and notice that Γ is consistent because it is satisfiable (let M = N, let s : Var → N be s(x_k) = k + 1, and let P^M = {0}). Suppose that Δ ⊇ Γ is consistent and complete. In the structure M described above, we have M = Term_L = Var (notice that the equivalence relation defined above will be trivial in this case). Thus, since (M, s) ⊨ ¬Px for all x ∈ Var, it follows that (M, s) ⊭ ∃xPx. Hence, M is not a model of Γ.
The problem in the above example is that there was an existential statement in Δ, but whenever we plugged a term in for the quantified variable, the resulting formula was not in Δ. Since we are building our structure from the terms, this is a serious problem. However, if Δ had the following property, then this problem would not arise.

Definition 6.3.2. Let L be a language and let Δ ⊆ Form_L. We say that Δ contains witnesses if for all φ ∈ Form_L and all x ∈ Var, there exists c ∈ C such that (∃xφ) → φ_c^x ∈ Δ.
Our goal then is to show that if Γ is consistent, then there exists a Δ ⊇ Γ which is consistent, complete, and contains witnesses. On the face of it, this is not true, as the above example shows (because there are no constant symbols). However, if we allow ourselves to expand our language with new constant symbols, we can repeatedly add witnessing statements by using these fresh constant symbols as our witnesses. The key question we need to consider is the following. Suppose that L is a language and Γ ⊆ Form_L is consistent. If we expand the language L to a language L' obtained by adding a new constant symbol, is the set Γ still consistent when viewed as a set of L'-formulas? It might seem absolutely harmless to add a new constant symbol about which we say nothing (and it's not very hard to see that it is semantically harmless), but we are introducing new deductions in L'. We need a way to convert a possibly bad L'-deduction into a similarly bad L-deduction to argue that Γ is still consistent as a set of L'-formulas.

6.3.2  The Proof

We can also define substitution of variables for constants in the obvious recursive fashion. Ignore the following lemma until you see why we need it later.

Lemma 6.3.3. Let φ ∈ Form_L, let t ∈ Term_L, let c ∈ C, and let x, z ∈ Var. Suppose that z ∉ OccurVar(φ).

1. (φ_t^x)_z^c equals (φ_z^c)^x_{t_z^c}.
2. If ValidSubst_t^x(φ) = 1, then ValidSubst^x_{t_z^c}(φ_z^c) = 1.
Lemma 6.3.4. Let L be a language, and let L' be L together with a new constant symbol c. Suppose that

Γ_0 ⊢_{L'} φ_0
Γ_1 ⊢_{L'} φ_1
Γ_2 ⊢_{L'} φ_2
  ⋮
Γ_n ⊢_{L'} φ_n

is an L'-deduction. For any z ∈ Var with z ∉ OccurVar(⋃_{i=0}^{n} (Γ_i ∪ {φ_i})), we have that

(Γ_0)_z^c ⊢_L (φ_0)_z^c
(Γ_1)_z^c ⊢_L (φ_1)_z^c
(Γ_2)_z^c ⊢_L (φ_2)_z^c
  ⋮
(Γ_n)_z^c ⊢_L (φ_n)_z^c

is an L-deduction.
Proof. We prove by induction on i that

(Γ_0)_z^c ⊢_L (φ_0)_z^c
(Γ_1)_z^c ⊢_L (φ_1)_z^c
  ⋮
(Γ_i)_z^c ⊢_L (φ_i)_z^c

is an L-deduction.

Suppose that line i is Γ ⊢ φ where φ ∈ Γ. If φ ∈ Γ, then φ_z^c ∈ Γ_z^c, so we can place Γ_z^c ⊢ φ_z^c on line i by the AssumeL rule.

Suppose that line i is Γ ⊢ t = t where t ∈ Term_{L'}. Since (t = t)_z^c equals t_z^c = t_z^c, we can place Γ_z^c ⊢ t_z^c = t_z^c on line i by the EqRefl rule.

Suppose that Γ ⊢_{L'} φ ∧ ψ was a previous line and we inferred Γ ⊢_{L'} φ. Inductively, we have Γ_z^c ⊢_L (φ ∧ ψ)_z^c on the corresponding line. Since (φ ∧ ψ)_z^c = (φ_z^c) ∧ (ψ_z^c), we may use the ∧EL rule to put Γ_z^c ⊢_L φ_z^c on the corresponding line. The other propositional rules are similarly uninteresting.
Suppose that Γ ⊢_{L'} φ_t^x and Γ ⊢_{L'} t = u were previous lines, that ValidSubst_t^x(φ) = 1 = ValidSubst_u^x(φ), and we inferred Γ ⊢_{L'} φ_u^x. Inductively, we have Γ_z^c ⊢_L (φ_t^x)_z^c and Γ_z^c ⊢_L (t = u)_z^c on the corresponding lines. Now (φ_t^x)_z^c equals (φ_z^c)^x_{t_z^c} by the previous lemma, and (t = u)_z^c equals t_z^c = u_z^c. Thus, we have Γ_z^c ⊢_L (φ_z^c)^x_{t_z^c} and Γ_z^c ⊢_L t_z^c = u_z^c on the corresponding lines. Using the fact that ValidSubst_t^x(φ) = 1 = ValidSubst_u^x(φ), we can use the previous lemma to conclude that ValidSubst^x_{t_z^c}(φ_z^c) = 1 = ValidSubst^x_{u_z^c}(φ_z^c). Hence, we may use the =Sub rule to put Γ_z^c ⊢_L (φ_z^c)^x_{u_z^c} on the corresponding line. We now need only note that (φ_z^c)^x_{u_z^c} equals (φ_u^x)_z^c by the previous lemma.
Suppose that Γ ⊢_{L'} φ_t^x, where ValidSubst_t^x(φ) = 1, was a previous line and we inferred Γ ⊢_{L'} ∃xφ. Inductively, we have Γ_z^c ⊢_L (φ_t^x)_z^c on the corresponding line. Now (φ_t^x)_z^c equals (φ_z^c)^x_{t_z^c} and ValidSubst^x_{t_z^c}(φ_z^c) = 1 by the previous lemma. Hence, we may use the ∃I rule to put Γ_z^c ⊢_L ∃x(φ_z^c) on the corresponding line. We now need only note that ∃x(φ_z^c) equals (∃xφ)_z^c.

The other rules are similarly awful.
Corollary 6.3.5 (Generalization on Constants). Let L be a language, and let L' be L together with a new constant symbol c. Suppose that Γ ⊆ Form_L and φ ∈ Form_L. If Γ ⊢_{L'} φ_c^x, then Γ ⊢_L ∀xφ.

Proof. Since Γ ⊢_{L'} φ_c^x, we may use Proposition 5.2.15 to fix an L'-deduction

Γ_0 ⊢_{L'} φ_0
Γ_1 ⊢_{L'} φ_1
  ⋮
Γ_n ⊢_{L'} φ_n

such that each Γ_i ⊆ Form_{L'} is finite, Γ_n ⊆ Γ, and φ_n = φ_c^x. Fix y ∈ Var such that

y ∉ OccurVar(⋃_{i=0}^{n} (Γ_i ∪ {φ_i}))

From Lemma 6.3.4, we have that

(Γ_0)_y^c ⊢_L (φ_0)_y^c
(Γ_1)_y^c ⊢_L (φ_1)_y^c
  ⋮
(Γ_n)_y^c ⊢_L (φ_n)_y^c

is an L-deduction. Since Γ_n ⊆ Form_L, we have (Γ_n)_y^c = Γ_n. Now (φ_n)_y^c = (φ_c^x)_y^c = φ_y^x. We therefore have Γ_n ⊢_L φ_y^x. We may then use the ∀I rule to conclude that Γ_n ⊢_L ∀xφ. Finally, Γ ⊢_L ∀xφ by the Super rule.
Corollary 6.3.6. Let L be a language, let Γ ⊆ Form_L, and let φ ∈ Form_L.

1. Let L' be L together with a new constant symbol. If Γ ⊢_{L'} φ, then Γ ⊢_L φ.
2. Let L' be L together with finitely many new constant symbols. If Γ ⊢_{L'} φ, then Γ ⊢_L φ.
3. Let L' be L together with (perhaps infinitely many) new constant symbols. If Γ ⊢_{L'} φ, then Γ ⊢_L φ.

Proof.

1. Since Γ ⊢_{L'} φ, we may use Proposition 5.2.15 to fix an L'-deduction

Γ_0 ⊢_{L'} φ_0
Γ_1 ⊢_{L'} φ_1
  ⋮
Γ_n ⊢_{L'} φ_n

such that each Γ_i ⊆ Form_{L'} is finite, Γ_n ⊆ Γ, and φ_n = φ. Fix y ∈ Var such that

y ∉ OccurVar(⋃_{i=0}^{n} (Γ_i ∪ {φ_i}))

From Lemma 6.3.4, we have that

(Γ_0)_y^c ⊢_L (φ_0)_y^c
(Γ_1)_y^c ⊢_L (φ_1)_y^c
  ⋮
(Γ_n)_y^c ⊢_L (φ_n)_y^c

is an L-deduction. Since Γ_n ⊆ Form_L and φ ∈ Form_L, it follows that (Γ_n)_y^c = Γ_n and (φ_n)_y^c = φ. Thus, Γ_n ⊢_L φ, and so Γ ⊢_L φ by the Super rule.

2. This is proved by induction on the number of new constant symbols, using part 1 for both the base case and the inductive step.

3. Suppose that Γ ⊢_{L'} φ, and use Proposition 5.2.15 to fix an L'-deduction

Γ_0 ⊢_{L'} φ_0
Γ_1 ⊢_{L'} φ_1
  ⋮
Γ_n ⊢_{L'} φ_n

such that each Γ_i ⊆ Form_{L'} is finite, Γ_n ⊆ Γ, and φ_n = φ. Let {c_0, c_1, ..., c_m} be all of the new constant symbols appearing in ⋃_{i=0}^{n} (Γ_i ∪ {φ_i}), and let L_0 = L ∪ {c_0, c_1, ..., c_m}. We then have that

Γ_0 ⊢_{L_0} φ_0
Γ_1 ⊢_{L_0} φ_1
  ⋮
Γ_n ⊢_{L_0} φ_n

is an L_0-deduction, so Γ_n ⊢_{L_0} φ. Using the Super rule, we conclude that Γ ⊢_{L_0} φ. Therefore, Γ ⊢_L φ by part 2.

Corollary 6.3.7. Let L be a language and let L' be L together with (perhaps infinitely many) new constant symbols. Let Γ ⊆ Form_L. Γ is L-consistent if and only if Γ is L'-consistent.

Proof. Since any L-deduction is also an L'-deduction, if Γ is L-inconsistent then it is L'-inconsistent. Suppose that Γ is L'-inconsistent. We then have that Γ ⊢_{L'} φ for all φ ∈ Form_L by Proposition 5.2.11, hence Γ ⊢_L φ for all φ ∈ Form_L by Corollary 6.3.6. Therefore, Γ is L-inconsistent.
Lemma 6.3.8. Let L be a language, and let L' be L together with a new constant symbol c. Suppose that Γ ⊆ Form_L is L-consistent and that φ ∈ Form_L. We then have that Γ ∪ {(∃xφ) → φ_c^x} is L'-consistent.

Proof. Suppose that Γ ∪ {(∃xφ) → φ_c^x} is L'-inconsistent. We then have that Γ ⊢_{L'} ¬((∃xφ) → φ_c^x), hence Γ ⊢_{L'} (∃xφ) ∧ ¬(φ_c^x) by Corollary 6.2.7 (because (¬((∃xφ) → φ_c^x))^# ⊨_{P(L')} ((∃xφ) ∧ ¬(φ_c^x))^#). Thus, Γ ⊢_{L'} ∃xφ by the ∧EL rule, so Γ ⊢_{L'} ¬∀x¬φ (by Proposition 5.2.9 and Proposition 5.2.14), and hence Γ ⊢_L ¬∀x¬φ by Corollary 6.3.6. We also have Γ ⊢_{L'} (¬φ)_c^x by the ∧ER rule, so Γ ⊢_L ∀x¬φ by Generalization on Constants. This contradicts the fact that Γ is L-consistent.
Lemma 6.3.9. Let L be a language and let Γ ⊆ Form_L be L-consistent. There exists a language L' ⊇ L and Γ' ⊆ Form_{L'} such that

1. Γ ⊆ Γ'.
2. Γ' is L'-consistent.
3. For all φ ∈ Form_L and all x ∈ Var, there exists c ∈ C such that (∃xφ) → φ_c^x ∈ Γ'.

Proof. For each φ ∈ Form_L and each x ∈ Var, let c_{φ,x} be a new constant symbol (distinct from all symbols in L). Let L' = L ∪ {c_{φ,x} : φ ∈ Form_L and x ∈ Var}. Let

Γ' = Γ ∪ {(∃xφ) → φ^x_{c_{φ,x}} : φ ∈ Form_L and x ∈ Var}

Conditions 1 and 3 are clear, so we need only check that Γ' is L'-consistent. By Corollary 5.2.16, it suffices to check that all finite subsets of Γ' are L'-consistent, and for this it suffices to show that

Γ ∪ {(∃x_1φ_1) → (φ_1)^{x_1}_{c_1}, (∃x_2φ_2) → (φ_2)^{x_2}_{c_2}, ..., (∃x_nφ_n) → (φ_n)^{x_n}_{c_n}}

is L'-consistent whenever φ_1, φ_2, ..., φ_n ∈ Form_L and x_1, x_2, ..., x_n ∈ Var, where we abbreviate c_i = c_{φ_i,x_i}. Formally, one can prove this by induction on n. A slightly informal argument is as follows. Fix φ_1, φ_2, ..., φ_n ∈ Form_L and x_1, x_2, ..., x_n ∈ Var. We have:

Γ is L-consistent, so
Γ ∪ {(∃x_1φ_1) → (φ_1)^{x_1}_{c_1}} is (L ∪ {c_1})-consistent by Lemma 6.3.8, so
Γ ∪ {(∃x_1φ_1) → (φ_1)^{x_1}_{c_1}, (∃x_2φ_2) → (φ_2)^{x_2}_{c_2}} is (L ∪ {c_1, c_2})-consistent by Lemma 6.3.8, so
...
Γ ∪ {(∃x_1φ_1) → (φ_1)^{x_1}_{c_1}, ..., (∃x_nφ_n) → (φ_n)^{x_n}_{c_n}} is (L ∪ {c_1, c_2, ..., c_n})-consistent.

Therefore,

Γ ∪ {(∃x_1φ_1) → (φ_1)^{x_1}_{c_1}, (∃x_2φ_2) → (φ_2)^{x_2}_{c_2}, ..., (∃x_nφ_n) → (φ_n)^{x_n}_{c_n}}

is L'-consistent by Corollary 6.3.7.
Proposition 6.3.10. Let L be a language and let Γ ⊆ Form_L be consistent. There exists a language L' ⊇ L and Γ' ⊆ Form_{L'} such that

1. Γ ⊆ Γ'.
2. Γ' is L'-consistent.
3. Γ' contains witnesses.

Proof. Let L_0 = L and Γ_0 = Γ. For each n ∈ N, use the previous lemma to get L_{n+1} and Γ_{n+1} from L_n and Γ_n. Set L' = ⋃_{n∈N} L_n and set Γ' = ⋃_{n∈N} Γ_n. Any finite subset of Γ' is contained in some Γ_n, and is therefore L'-consistent by Corollary 6.3.7, so Γ' is L'-consistent by Corollary 5.2.16. Moreover, every φ ∈ Form_{L'} lies in some Form_{L_n}, so its witnessing statements appear in Γ_{n+1} ⊆ Γ'.
Proposition 6.3.11. (Suppose that L is countable.) If Γ is consistent, then there exists a set Δ ⊇ Γ which is consistent and complete.

Proof. Exactly the same proof as in the propositional logic case, using Zorn's Lemma in the uncountable case.
Proposition 6.3.12. Let L be a language. If Γ ⊆ Form_L is consistent, then there is a language L' ⊇ L (which is L together with new constant symbols) and Δ ⊆ Form_{L'} such that

• Γ ⊆ Δ.
• Δ is consistent.
• Δ is complete.
• Δ contains witnesses.

Proof. First use Proposition 6.3.10 to obtain a language L' ⊇ L and an L'-consistent Γ' ⊇ Γ which contains witnesses, and then use Proposition 6.3.11 to extend Γ' to a consistent and complete Δ ⊆ Form_{L'}. Since Γ' ⊆ Δ, the set Δ still contains witnesses.
Lemma 6.3.13. Suppose that Δ is consistent and complete. If Δ ⊢ φ, then φ ∈ Δ.

Proof. Suppose that Δ ⊢ φ. Since Δ is complete, we have that either φ ∈ Δ or ¬φ ∈ Δ. Now if ¬φ ∈ Δ, then Δ ⊢ ¬φ, hence Δ is inconsistent, contradicting our assumption. It follows that φ ∈ Δ.
Lemma 6.3.14. Suppose that Δ is consistent, complete, and contains witnesses. For every t ∈ Term_L, there exists c ∈ C such that t = c ∈ Δ.

Proof. Let t ∈ Term_L. Fix x ∈ Var such that x ∉ OccurVar(t). Since Δ contains witnesses, we may fix c ∈ C such that (∃x(t = x)) → (t = c) ∈ Δ (using the formula t = x). Now ∅ ⊢ (t = x)_t^x by the EqRefl rule (notice that (t = x)_t^x is the formula t = t), so we may use the ∃I rule (because ValidSubst_t^x(t = x) = 1) to conclude that ∅ ⊢ ∃x(t = x). From here we can use the Super rule to conclude that Δ ⊢ ∃x(t = x). We therefore have Δ ⊢ ∃x(t = x) and Δ ⊢ (∃x(t = x)) → (t = c), hence Δ ⊢ t = c by Proposition 5.2.14. Using Lemma 6.3.13, we conclude that t = c ∈ Δ.
Lemma 6.3.15. Suppose that Δ is consistent, complete, and contains witnesses. We have:

1. ¬φ ∈ Δ if and only if φ ∉ Δ.
2. φ ∧ ψ ∈ Δ if and only if φ ∈ Δ and ψ ∈ Δ.
3. φ ∨ ψ ∈ Δ if and only if φ ∈ Δ or ψ ∈ Δ.
4. φ → ψ ∈ Δ if and only if φ ∉ Δ or ψ ∈ Δ.
5. ∃xφ ∈ Δ if and only if there exists c ∈ C such that φ_c^x ∈ Δ.
6. ∀xφ ∈ Δ if and only if φ_c^x ∈ Δ for all c ∈ C.

Proof.

1. If ¬φ ∈ Δ, then φ ∉ Δ, because otherwise Δ ⊢ φ and Δ ⊢ ¬φ and so Δ would be inconsistent. Conversely, if φ ∉ Δ, then ¬φ ∈ Δ because Δ is complete.

2. Suppose first that φ ∧ ψ ∈ Δ. We then have that Δ ⊢ φ ∧ ψ, hence Δ ⊢ φ by the ∧EL rule and Δ ⊢ ψ by the ∧ER rule. Therefore, φ ∈ Δ and ψ ∈ Δ by Lemma 6.3.13.
Conversely, suppose that φ ∈ Δ and ψ ∈ Δ. We then have Δ ⊢ φ and Δ ⊢ ψ, hence Δ ⊢ φ ∧ ψ by the ∧I rule. Therefore, φ ∧ ψ ∈ Δ by Lemma 6.3.13.

3. Suppose first that φ ∨ ψ ∈ Δ. Suppose that φ ∉ Δ. Since Δ is complete, we have that ¬φ ∈ Δ. From Proposition 3.4.10, we know that {φ ∨ ψ, ¬φ} ⊢ ψ, hence Δ ⊢ ψ by the Super rule. Therefore, ψ ∈ Δ by Lemma 6.3.13. It follows that either φ ∈ Δ or ψ ∈ Δ.
Conversely, suppose that either φ ∈ Δ or ψ ∈ Δ.
Case 1: Suppose that φ ∈ Δ. We have Δ ⊢ φ, hence Δ ⊢ φ ∨ ψ by the ∨IL rule. Therefore, φ ∨ ψ ∈ Δ by Lemma 6.3.13.
Case 2: Suppose that ψ ∈ Δ. We have Δ ⊢ ψ, hence Δ ⊢ φ ∨ ψ by the ∨IR rule. Therefore, φ ∨ ψ ∈ Δ by Lemma 6.3.13.

4. Suppose first that φ → ψ ∈ Δ. Suppose that φ ∈ Δ. We then have that Δ ⊢ φ → ψ and Δ ⊢ φ, hence Δ ⊢ ψ by Proposition 5.2.14. Therefore, ψ ∈ Δ by Lemma 6.3.13. It follows that either φ ∉ Δ or ψ ∈ Δ.
Conversely, suppose that either φ ∉ Δ or ψ ∈ Δ.
Case 1: Suppose that φ ∉ Δ. We have ¬φ ∈ Δ because Δ is complete, hence Δ ∪ {φ} is inconsistent (as Δ ∪ {φ} ⊢ φ and Δ ∪ {φ} ⊢ ¬φ). It follows that Δ ∪ {φ} ⊢ ψ by Proposition 5.2.11, hence Δ ⊢ φ → ψ by the →I rule. Therefore, φ → ψ ∈ Δ by Lemma 6.3.13.
Case 2: Suppose that ψ ∈ Δ. We have ψ ∈ Δ ∪ {φ}, hence Δ ∪ {φ} ⊢ ψ, and so Δ ⊢ φ → ψ by the →I rule. Therefore, φ → ψ ∈ Δ by Lemma 6.3.13.

5. Suppose first that ∃xφ ∈ Δ. Since Δ contains witnesses, we may fix c ∈ C such that (∃xφ) → φ_c^x ∈ Δ. We therefore have Δ ⊢ ∃xφ and Δ ⊢ (∃xφ) → φ_c^x, hence Δ ⊢ φ_c^x by Proposition 5.2.14. Using Lemma 6.3.13, we conclude that φ_c^x ∈ Δ.
Conversely, suppose that there exists c ∈ C such that φ_c^x ∈ Δ. We then have Δ ⊢ φ_c^x, hence Δ ⊢ ∃xφ using the ∃I rule (notice that ValidSubst_c^x(φ) = 1). Using Lemma 6.3.13, we conclude that ∃xφ ∈ Δ.

6. Suppose first that ∀xφ ∈ Δ. We then have Δ ⊢ ∀xφ, hence Δ ⊢ φ_c^x for all c ∈ C using the ∀E rule (notice that ValidSubst_c^x(φ) = 1 for all c ∈ C). Using Lemma 6.3.13, we conclude that φ_c^x ∈ Δ for all c ∈ C.
Conversely, suppose that φ_c^x ∈ Δ for all c ∈ C. Since Δ is consistent, this implies that there does not exist c ∈ C with ¬(φ_c^x) = (¬φ)_c^x ∈ Δ. Therefore, ∃x¬φ ∉ Δ by part 5, so ¬∃x¬φ ∈ Δ by part 1. It follows from Proposition 5.2.10 that Δ ⊢ ∀xφ. Using Lemma 6.3.13, we conclude that ∀xφ ∈ Δ.

Proposition 6.3.16. If Δ is consistent, complete, and contains witnesses, then Δ is satisfiable.

Proof. Suppose that Δ is consistent, complete, and contains witnesses.

Define a relation ∼ on Term_L by letting t ∼ u if t = u ∈ Δ. We first check that ∼ is an equivalence relation. Reflexivity follows from the EqRefl rule and Lemma 6.3.13. Symmetry and transitivity follow from Proposition 5.2.5 and Proposition 5.2.6, together with the Super rule and Lemma 6.3.13.

We now define our L-structure M. We first let M = Term_L/∼. For each t ∈ Term_L, we let [t] denote the equivalence class of t. Notice that M = {[c] : c ∈ C} by Lemma 6.3.14. We now finish our description of the L-structure M by saying how to interpret the constant, relation, and function symbols. We let

1. c^M = [c] for all c ∈ C.
2. R^M = {([t_1], [t_2], ..., [t_k]) ∈ M^k : Rt_1t_2···t_k ∈ Δ} for all R ∈ R_k.
3. f^M([t_1], [t_2], ..., [t_k]) = [ft_1t_2···t_k] for all f ∈ F_k.

Notice that our definitions of R^M do not depend on our choice of representatives for the equivalence classes by Proposition 5.2.7. Similarly, our definitions of f^M do not depend on our choice of representatives for the equivalence classes by Proposition 5.2.8. Finally, define s : Var → M by letting s(x) = [x] for all x ∈ Var.

We first show that s̄(t) = [t] for all t ∈ Term_L by induction. We have s̄(c) = c^M = [c] for all c ∈ C and s̄(x) = s(x) = [x] for all x ∈ Var. Suppose that f ∈ F_k and t_1, t_2, ..., t_k ∈ Term_L are such that s̄(t_i) = [t_i] for all i. We then have

s̄(ft_1t_2···t_k) = f^M(s̄(t_1), s̄(t_2), ..., s̄(t_k))
                = f^M([t_1], [t_2], ..., [t_k])       (by induction)
                = [ft_1t_2···t_k]

Therefore, s̄(t) = [t] for all t ∈ Term_L.


We now show that φ ∈ Δ if and only if (M, s) ⊨ φ for all φ ∈ Form_L by induction. We first prove the result for φ ∈ AtomicForm_L. Suppose that R ∈ R_k and t_1, t_2, ..., t_k ∈ Term_L. We have

Rt_1t_2···t_k ∈ Δ ⟺ ([t_1], [t_2], ..., [t_k]) ∈ R^M
                  ⟺ (s̄(t_1), s̄(t_2), ..., s̄(t_k)) ∈ R^M
                  ⟺ (M, s) ⊨ Rt_1t_2···t_k

Suppose now that t_1, t_2 ∈ Term_L. We have

t_1 = t_2 ∈ Δ ⟺ [t_1] = [t_2]
              ⟺ s̄(t_1) = s̄(t_2)
              ⟺ (M, s) ⊨ t_1 = t_2

Suppose that the result holds for φ. We have

¬φ ∈ Δ ⟺ φ ∉ Δ             (by Lemma 6.3.15)
       ⟺ (M, s) ⊭ φ        (by induction)
       ⟺ (M, s) ⊨ ¬φ

Suppose that the result holds for φ and ψ. We have

φ ∧ ψ ∈ Δ ⟺ φ ∈ Δ and ψ ∈ Δ                  (by Lemma 6.3.15)
          ⟺ (M, s) ⊨ φ and (M, s) ⊨ ψ        (by induction)
          ⟺ (M, s) ⊨ φ ∧ ψ

and

φ ∨ ψ ∈ Δ ⟺ φ ∈ Δ or ψ ∈ Δ                   (by Lemma 6.3.15)
          ⟺ (M, s) ⊨ φ or (M, s) ⊨ ψ         (by induction)
          ⟺ (M, s) ⊨ φ ∨ ψ

and finally

φ → ψ ∈ Δ ⟺ φ ∉ Δ or ψ ∈ Δ                   (by Lemma 6.3.15)
          ⟺ (M, s) ⊭ φ or (M, s) ⊨ ψ         (by induction)
          ⟺ (M, s) ⊨ φ → ψ

Suppose that the result holds for φ and that x ∈ Var. We have

∃xφ ∈ Δ ⟺ There exists c ∈ C such that φ_c^x ∈ Δ                (by Lemma 6.3.15)
        ⟺ There exists c ∈ C such that (M, s) ⊨ φ_c^x           (by induction)
        ⟺ There exists c ∈ C such that (M, s[x ⇒ s̄(c)]) ⊨ φ     (by the Substitution Theorem)
        ⟺ There exists c ∈ C such that (M, s[x ⇒ [c]]) ⊨ φ
        ⟺ There exists a ∈ M such that (M, s[x ⇒ a]) ⊨ φ        (since M = {[c] : c ∈ C})
        ⟺ (M, s) ⊨ ∃xφ

and also

∀xφ ∈ Δ ⟺ For all c ∈ C, we have φ_c^x ∈ Δ                      (by Lemma 6.3.15)
        ⟺ For all c ∈ C, we have (M, s) ⊨ φ_c^x                 (by induction)
        ⟺ For all c ∈ C, we have (M, s[x ⇒ s̄(c)]) ⊨ φ           (by the Substitution Theorem)
        ⟺ For all c ∈ C, we have (M, s[x ⇒ [c]]) ⊨ φ
        ⟺ For all a ∈ M, we have (M, s[x ⇒ a]) ⊨ φ              (since M = {[c] : c ∈ C})
        ⟺ (M, s) ⊨ ∀xφ

Therefore, by induction, we have φ ∈ Δ if and only if (M, s) ⊨ φ. In particular, we have (M, s) ⊨ δ for all δ ∈ Δ, hence Δ is satisfiable.
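The quotient M = Term_L/∼ can be illustrated on a finite fragment: given finitely many equations asserted by Δ, a union–find structure groups terms into ∼-classes, and the symbols are then interpreted on classes via representatives. The following Python sketch is illustrative only; the class name and the toy equations are hypothetical, not from the text.

```python
class UnionFind:
    """Minimal union-find for building Term/~ from asserted equations."""
    def __init__(self):
        self.parent = {}

    def find(self, t):
        self.parent.setdefault(t, t)
        while self.parent[t] != t:
            self.parent[t] = self.parent[self.parent[t]]  # path halving
            t = self.parent[t]
        return t

    def union(self, t, u):
        self.parent[self.find(t)] = self.find(u)

# Toy fragment of the group-axiom example: Delta asserts fee = e,
# so the terms "fee" and "e" must land in the same class.
uf = UnionFind()
for t, u in [("fee", "e"), ("fxe", "x")]:   # equations t = u in Delta
    uf.union(t, u)

# The interpretation of each symbol is defined on classes via
# representatives; Propositions 5.2.7 and 5.2.8 are what make this
# well defined in the actual construction.
assert uf.find("fee") == uf.find("e")
assert uf.find("fxe") == uf.find("x")
assert uf.find("e") != uf.find("x")
```

Of course, in the proof the relation ∼ is determined by the infinite set Δ rather than a finite list, but the well-definedness issue is exactly the one the assertions above make visible.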
Theorem 6.3.17 (Completeness Theorem). (Suppose that L is countable.)

1. Every consistent set of formulas is satisfiable.
2. If Γ ⊨ φ, then Γ ⊢ φ.

Proof.

1. Suppose that Γ is consistent. By Proposition 6.3.12, we may fix a language L' ⊇ L and Δ ⊆ Form_{L'} with Γ ⊆ Δ such that Δ is consistent, complete, and contains witnesses. Now Δ is satisfiable by Proposition 6.3.16, so we may fix an L'-structure M' together with s : Var → M' such that (M', s) ⊨ δ for all δ ∈ Δ. We then have (M', s) ⊨ γ for all γ ∈ Γ. Letting M be the restriction of M' to L, we then have (M, s) ⊨ γ for all γ ∈ Γ. Therefore, Γ is satisfiable.
2. Suppose that Γ ⊨ φ. We then have that Γ ∪ {¬φ} is unsatisfiable, hence Γ ∪ {¬φ} is inconsistent by part 1. It follows from Proposition 5.2.12 that Γ ⊢ φ.
We now give another proof of the Countable Löwenheim–Skolem Theorem which does not go through the concept of elementary substructures.

Corollary 6.3.18 (Countable Löwenheim–Skolem Theorem). Suppose that L is countable and Γ ⊆ Form_L is consistent. There exists a countable model of Γ.

Proof. Notice that if Γ is L-consistent, then the L' formed in Lemma 6.3.9 is countable because Form_L × Var is countable. Thus, each L_n in the proof of Proposition 6.3.10 is countable, so the L' formed in Proposition 6.3.10 is countable. It follows that Term_{L'} is countable, and since the L'-structure M we construct in the proof of Proposition 6.3.16 is formed by taking the quotient of the countable set Term_{L'} by an equivalence relation, we can conclude that M is countable. Therefore, the L-structure which is the restriction of M to L from the proof of the Completeness Theorem is countable.

6.4  Compactness

Corollary 6.4.1 (Compactness Theorem).

1. If Γ ⊨ φ, then there exists a finite Γ_0 ⊆ Γ such that Γ_0 ⊨ φ.
2. If every finite subset of Γ is satisfiable, then Γ is satisfiable.

Proof.

1. Suppose that Γ ⊨ φ. By the Completeness Theorem, we have Γ ⊢ φ. Using Proposition 5.2.15, we may fix a finite Γ_0 ⊆ Γ such that Γ_0 ⊢ φ. By the Soundness Theorem, we have Γ_0 ⊨ φ.
2. If every finite subset of Γ is satisfiable, then every finite subset of Γ is consistent by the Soundness Theorem, hence Γ is consistent by Corollary 5.2.16, and so Γ is satisfiable by the Completeness Theorem.

6.5  Applications of Compactness

The next proposition is another result which expresses that first-order logic is not powerful enough to distinguish certain aspects of cardinality. Here the distinction is between large finite numbers and the infinite.

Proposition 6.5.1. Let L be a language. Suppose that Γ ⊆ Form_L is such that for all n ∈ N, there exists a model (M, s) of Γ such that |M| > n. We then have that there exists a model (M, s) of Γ such that M is infinite.

Proof. Let L' = L ∪ {c_k : k ∈ N}, where the c_k are new distinct constant symbols. Let

Γ' = Γ ∪ {c_k ≠ c_ℓ : k, ℓ ∈ N and k ≠ ℓ}

We claim that every finite subset of Γ' is satisfiable. Fix a finite Γ'' ⊆ Γ'. Fix N ∈ N such that

Γ'' ⊆ Γ ∪ {c_k ≠ c_ℓ : k, ℓ ≤ N and k ≠ ℓ}

By assumption, we may fix a model (M, s) of Γ such that |M| > N. Let M' be the L'-structure M together with interpreting the constants c_0, c_1, ..., c_N as distinct elements of M and interpreting each c_i for i > N arbitrarily. We then have that (M', s) is a model of Γ''. Hence, every finite subset of Γ' is satisfiable.

By the Compactness Theorem, we may conclude that Γ' is satisfiable. Fix a model (M', s) of Γ'. If we let M be the restriction of M' to L, then (M, s) is a model of Γ which is infinite.
Corollary 6.5.2. The class of all finite groups is not a weak elementary class in the language L = {f, e}.

Proof. If Σ ⊆ Sent_L is such that Mod(Σ) includes all finite groups, then we may use the trivial fact that there are arbitrarily large finite groups, together with Proposition 6.5.1, to conclude that Mod(Σ) contains an infinite structure.
Proposition 6.5.3. Let L be a language. Suppose that Γ ⊆ Form_L is such that there exists a model (M, s) of Γ with M infinite. We then have that there exists a model (M, s) of Γ such that M is uncountable.

Proof. Let L' = L ∪ {c_r : r ∈ R}, where the c_r are new distinct constant symbols. Let

Γ' = Γ ∪ {c_r ≠ c_t : r, t ∈ R and r ≠ t}

We claim that every finite subset of Γ' is satisfiable. Fix a finite Γ'' ⊆ Γ'. Fix a finite Z ⊆ R such that

Γ'' ⊆ Γ ∪ {c_r ≠ c_t : r, t ∈ Z and r ≠ t}

By assumption, we may fix a model (M, s) of Γ such that M is infinite. Let M' be the L'-structure M together with interpreting the constants c_r for r ∈ Z as distinct elements of M and interpreting each c_t for t ∉ Z arbitrarily. We then have that (M', s) is a model of Γ''. Hence, every finite subset of Γ' is satisfiable.

By the Compactness Theorem, we conclude that Γ' is satisfiable. Fix a model (M', s) of Γ'. If we let M be the restriction of M' to L, then (M, s) is a model of Γ which is uncountable.

6.5. APPLICATIONS OF COMPACTNESS

101

Proposition 6.5.4. The class K of all torsion groups is not a weak elementary class in the language L = {f, e}.
Proof. Suppose that Σ ⊆ Sent_L is such that K ⊆ Mod(Σ). Let L′ = L ∪ {c} where c is a new constant symbol. For each n ∈ ℕ⁺, let σₙ ∈ Sent_{L′} be cⁿ ≠ e (more formally, fcfc···fcc ≠ e where there are n − 1 f's). Let
Σ′ = Σ ∪ {σₙ : n ∈ ℕ⁺}
We claim that every finite subset of Σ′ has a model. Suppose that Σ′₀ ⊆ Σ′ is finite. Fix N ∈ ℕ such that
Σ′₀ ⊆ Σ ∪ {σₙ : n < N}
Notice that if we let M′ be the group ℤ/Nℤ and let c^{M′} = 1, then M′ is a model of Σ′₀. Thus, every finite subset of Σ′ has a model, so Σ′ has a model by Compactness. If we restrict this model to L, we get an element of Mod(Σ) which is not in K because it has an element of infinite order.
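To see concretely why ℤ/Nℤ works here: writing the group additively, the element interpreting c, namely 1, has order exactly N, so every σₙ with n < N holds. A quick sketch, with a function name of our own choosing:

```python
def order_of_one(N):
    """Least n >= 1 with 1 + 1 + ... + 1 (n times) equal to 0 in Z/NZ."""
    n, x = 1, 1 % N
    while x != 0:
        n += 1
        x = (x + 1) % N
    return n
```

Since the order is exactly N, the multiplicatively written cⁿ ≠ e holds in ℤ/Nℤ for all 1 ≤ n < N, which is all the finite subset requires.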
Proposition 6.5.5. The class K of all equivalence relations in which all equivalence classes are finite is not a weak elementary class in the language L = {R}.
Proof. Suppose that Σ ⊆ Sent_L is such that K ⊆ Mod(Σ). Let L′ = L ∪ {c} where c is a new constant symbol. For each n ∈ ℕ⁺, let σₙ ∈ Sent_{L′} be
∃x₁∃x₂···∃xₙ( ⋀_{1≤i<j≤n} ¬(xᵢ = xⱼ) ∧ ⋀_{i=1}^{n} Rcxᵢ )
and let
Σ′ = Σ ∪ {σₙ : n ∈ ℕ⁺}
We claim that every finite subset of Σ′ has a model. Suppose that Σ′₀ ⊆ Σ′ is finite. Fix N ∈ ℕ such that
Σ′₀ ⊆ Σ ∪ {σₙ : n ≤ N}
Notice that if we let M′ = {0, 1, 2, . . . , N}, R^{M′} = (M′)², and c^{M′} = 0, then M′ is a model of Σ′₀. Thus, every finite subset of Σ′ has a model, so Σ′ has a model by Compactness. If we restrict this model to L, we get an element of Mod(Σ) which is not in K because it has an infinite equivalence class.
Proposition 6.5.6. Suppose that K is an elementary class, that Σ ⊆ Sent_L, and that K = Mod(Σ). There exists a finite Σ₀ ⊆ Σ such that K = Mod(Σ₀).
Proof. Since K is an elementary class, we may fix τ ∈ Sent_L with K = Mod(τ). We then have Σ ⊨ τ, so by the Compactness Theorem we may fix a finite Σ₀ ⊆ Σ such that Σ₀ ⊨ τ. Notice that K = Mod(Σ₀).
Corollary 6.5.7. The class K of all fields of characteristic 0 is a weak elementary class, but not an elementary class, in the language L = {0, 1, +, ·}.
Proof. We already know that K is a weak elementary class because if we let τ be the conjunction of the field axioms and let σₙ be 1 + 1 + ··· + 1 ≠ 0 (where there are n 1's) for each n ∈ ℕ⁺, then K = Mod(Σ) where
Σ = {τ} ∪ {σₙ : n ∈ ℕ⁺}
Suppose that K is an elementary class. By the previous proposition, we may fix a finite Σ₀ ⊆ Σ such that K = Mod(Σ₀). Fix N ∈ ℕ such that
Σ₀ ⊆ {τ} ∪ {σₙ : n ≤ N}
Now if we fix a prime p > N, we see that (𝔽_p, 0, 1, +, ·) (the field with p elements) is a model of Σ₀ which is not an element of K. This is a contradiction, so K is not an elementary class.
6.6 An Example: The Random Graph
Throughout this section, we work in the language L = {R} where R is a binary relation symbol. We think of graphs as L-structures which are models of {∀x¬Rxx, ∀x∀y(Rxy → Ryx)}.
Definition 6.6.1. For each n ∈ ℕ⁺, let Gₙ be the set of all models of {∀x¬Rxx, ∀x∀y(Rxy → Ryx)} with universe [n].
Definition 6.6.2. For each A ⊆ Gₙ, we let
Prₙ(A) = |A| / |Gₙ|
For each σ ∈ Sent_L, we let
Prₙ(σ) = |{M ∈ Gₙ : M ⊨ σ}| / |Gₙ|
We use the suggestive Pr because we think of constructing a graph randomly by flipping a fair coin for each pair i, j to determine whether or not there is an edge linking them. In this context, Prₙ(A) is the probability that the graph so constructed on {1, 2, . . . , n} is in A. Notice that given distinct two-element subsets {i, j} and {i′, j′} of {1, 2, . . . , n}, the question of whether there is an edge linking i, j and the question of whether there is an edge linking i′, j′ are independent. We aim to prove the following.
Theorem 6.6.3. For all σ ∈ Sent_L, either lim_{n→∞} Prₙ(σ) = 1 or lim_{n→∞} Prₙ(σ) = 0.
Definition 6.6.4. For each r, s ∈ ℕ with max{r, s} > 0, let σ_{r,s} be the sentence
∀x₁∀x₂···∀xᵣ∀y₁∀y₂···∀yₛ(( ⋀_{1≤i<j≤r} ¬(xᵢ = xⱼ) ∧ ⋀_{1≤i<j≤s} ¬(yᵢ = yⱼ) ∧ ⋀_{i=1}^{r} ⋀_{j=1}^{s} ¬(xᵢ = yⱼ)) → ∃z( ⋀_{i=1}^{r} ¬(z = xᵢ) ∧ ⋀_{j=1}^{s} ¬(z = yⱼ) ∧ ⋀_{i=1}^{r} Rxᵢz ∧ ⋀_{j=1}^{s} ¬Ryⱼz ))
Proposition 6.6.5. For all r, s ∈ ℕ with max{r, s} > 0, we have lim_{n→∞} Prₙ(σ_{r,s}) = 1.
Proof. Fix r, s ∈ ℕ. Suppose that n ∈ ℕ with n > r + s. Fix distinct a₁, a₂, . . . , aᵣ, b₁, b₂, . . . , bₛ ∈ {1, 2, . . . , n}. For each c distinct from the aᵢ and bⱼ, let
A_c = {M ∈ Gₙ : c is linked to each aᵢ and to no bⱼ}
For each such c, we have Prₙ(A_c) = 1/2^{r+s}, so the probability that a given c fails is 1 − 1/2^{r+s}, and hence the probability that no c works is
(1 − 1/2^{r+s})^{n−r−s}
Therefore,
Prₙ(¬σ_{r,s}) ≤ \binom{n}{r} \binom{n−r}{s} (1 − 1/2^{r+s})^{n−r−s}
≤ n^{r+s} (1 − 1/2^{r+s})^{n−r−s}
= (1 − 1/2^{r+s})^{−r−s} · n^{r+s} · (1 − 1/2^{r+s})^{n}
= (1 − 1/2^{r+s})^{−r−s} · n^{r+s} · ((2^{r+s} − 1)/2^{r+s})^{n}
= (1 − 1/2^{r+s})^{−r−s} · n^{r+s} / (2^{r+s}/(2^{r+s} − 1))^{n}
Thus, lim_{n→∞} Prₙ(¬σ_{r,s}) = 0, and hence lim_{n→∞} Prₙ(σ_{r,s}) = 1.
Proposition 6.6.6. Let Σ = {∀x¬Rxx, ∀x∀y(Rxy → Ryx)} ∪ {σ_{r,s} : r, s ∈ ℕ and max{r, s} > 0} and let RG = Cn(Σ).
Proposition 6.6.7. RG is satisfiable.
Proof. We build a countable model M of RG with M = ℕ. Notice first that since P_fin(ℕ) (the set of all finite subsets of ℕ) is countable, so is the set P_fin(ℕ)², hence the set
{(A, B) ∈ P_fin(ℕ)² : A ∩ B = ∅ and A ∪ B ≠ ∅}
is countable. Therefore, we may list it as
(A₁, B₁), (A₂, B₂), (A₃, B₃), . . .
and furthermore we may assume that max(Aₙ ∪ Bₙ) < n for all n ∈ ℕ. Let M be the L-structure where M = ℕ and R^M = {(k, n) : k ∈ Aₙ} ∪ {(n, k) : k ∈ Aₙ}. Suppose now that A, B ⊆ ℕ are finite with A ∩ B = ∅ and A ∪ B ≠ ∅. Fix n ∈ ℕ with A = Aₙ and B = Bₙ. We then have that (k, n) ∈ R^M for all k ∈ A (because k ∈ Aₙ) and (ℓ, n) ∉ R^M for all ℓ ∈ B (because ℓ ∉ Aₙ and n ∉ A_ℓ since ℓ < n). Therefore, M ⊨ σ_{r,s} for all r, s ∈ ℕ with max{r, s} > 0. Thus, M is a model of RG.
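The enumeration in this proof can be made completely explicit with a standard bit trick (this is the usual presentation of the Rado graph, offered here as our own illustration rather than the text's exact enumeration): link k < n exactly when the k-th binary digit of n is 1. Witnesses for a disjoint pair (A, B) can then be found by brute-force search:

```python
def adjacent(k, n):
    """Bit-graph edge relation: the smaller number is a set bit position of the larger."""
    if k == n:
        return False
    k, n = min(k, n), max(k, n)
    return (n >> k) & 1 == 1

def witness(A, B):
    """Search for a vertex outside A and B linked to all of A and none of B."""
    z = 0
    while (z in A or z in B
           or not all(adjacent(z, a) for a in A)
           or any(adjacent(z, b) for b in B)):
        z += 1
    return z
```

The search always terminates: adding a sufficiently high power of 2 to the sum of 2^a over a ∈ A produces a number whose binary digits are 1 at each a ∈ A and 0 at each b ∈ B.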
Theorem 6.6.8. All models of RG are infinite, and any two countable models of RG are isomorphic.
Proof. Suppose that M is a model of RG which is finite. Let n = |M|. Since M ⊨ σ_{n,0}, there exists b ∈ M such that (b, a) ∈ R^M for all a ∈ M. However, this is a contradiction because (a, a) ∉ R^M for all a ∈ M. It follows that all models of RG are infinite.
Suppose now that M and N are two countable models of RG. From above, we know that M and N are both countably infinite. List M as m₀, m₁, m₂, . . . and list N as n₀, n₁, n₂, . . . . We build an isomorphism via a back-and-forth construction as in the proof of the corresponding result for DLO. That is, we define αₖ ∈ P_fin(M × N) for k ∈ ℕ recursively such that
1. αₖ ⊆ αₖ₊₁.
2. If (m, n) ∈ αₖ and (m′, n) ∈ αₖ, then m = m′.
3. If (m, n) ∈ αₖ and (m, n′) ∈ αₖ, then n = n′.
4. mᵢ ∈ dom(α_{2i}).
5. nⱼ ∈ ran(α_{2j+1}).
6. If (m, n) ∈ αₖ and (m′, n′) ∈ αₖ, then (m, m′) ∈ R^M if and only if (n, n′) ∈ R^N.
Suppose that we are successful. Define h : M → N by letting h(m) be the unique n such that (m, n) ∈ ⋃_{k∈ℕ} αₖ, and notice that h is an isomorphism.
We now define the αₖ. Let α₀ = {(m₀, n₀)}. Suppose that k ∈ ℕ and we've defined αₖ. Suppose first that k is odd, say k = 2i − 1. If mᵢ ∈ dom(αₖ), let αₖ₊₁ = αₖ. Suppose then that mᵢ ∉ dom(αₖ). Let A = {m ∈ dom(αₖ) : (m, mᵢ) ∈ R^M} and let B = {m ∈ dom(αₖ) : (m, mᵢ) ∉ R^M}. Since N is a model of RG and A ∩ B = ∅, we may fix n ∈ N \ ran(αₖ) such that (αₖ(m), n) ∈ R^N for all m ∈ A and (αₖ(m), n) ∉ R^N for all m ∈ B. Let αₖ₊₁ = αₖ ∪ {(mᵢ, n)}.
Suppose now that k is even, say k = 2j. If nⱼ ∈ ran(αₖ), let αₖ₊₁ = αₖ. Suppose then that nⱼ ∉ ran(αₖ). Let A = {n ∈ ran(αₖ) : (n, nⱼ) ∈ R^N} and let B = {n ∈ ran(αₖ) : (n, nⱼ) ∉ R^N}. Since M is a model of RG and A ∩ B = ∅, we may fix m ∈ M \ dom(αₖ) such that (αₖ⁻¹(n), m) ∈ R^M for all n ∈ A and (αₖ⁻¹(n), m) ∉ R^M for all n ∈ B. Let αₖ₊₁ = αₖ ∪ {(m, nⱼ)}.
Corollary 6.6.9. RG is a complete theory.
Proof. Apply the Countable Łoś–Vaught Test.
Theorem 6.6.10. Let τ ∈ Sent_L.
1. If τ ∈ RG, then lim_{n→∞} Prₙ(τ) = 1.
2. If τ ∉ RG, then lim_{n→∞} Prₙ(τ) = 0.
Proof.
1. Suppose that τ ∈ RG. We then have Σ ⊨ τ, so by Compactness we may fix N ∈ ℕ such that
{∀x¬Rxx, ∀x∀y(Rxy → Ryx)} ∪ {σ_{r,s} : r, s ≤ N and max{r, s} > 0} ⊨ τ
We then have that if M ∈ Gₙ is such that M ⊨ ¬τ, then
M ⊨ ⋁_{0≤r,s≤N, max{r,s}>0} ¬σ_{r,s}
Hence for every n ∈ ℕ we have
Prₙ(¬τ) ≤ ∑_{0≤r,s≤N, max{r,s}>0} Prₙ(¬σ_{r,s})
Therefore, lim_{n→∞} Prₙ(¬τ) = 0, and hence lim_{n→∞} Prₙ(τ) = 1.
2. Suppose that τ ∉ RG. Since RG is complete, it follows that ¬τ ∈ RG. Thus, lim_{n→∞} Prₙ(¬τ) = 1 by part 1, and hence lim_{n→∞} Prₙ(τ) = 0.
Chapter 7

Quantifier Elimination

7.1 Motivation and Definition

Quantifiers make life hard, so it's always nice when we can find a way to express a statement involving quantifiers using an equivalent statement without quantifiers.
Examples.
1. Let L = {0, 1, +, ·} and let φ(a, b, c) (where a, b, c ∈ Var) be the formula
∃x(ax² + bx + c = 0)
or more formally
∃x(a · x · x + b · x + c = 0)
Let M be the L-structure (ℂ, 0, 1, +, ·). Since ℂ is algebraically closed, we have that
(M, s) ⊨ φ ↔ (a ≠ 0 ∨ b ≠ 0 ∨ c = 0)
for all s : Var → ℂ.
2. Let L = {0, 1, +, ·, <} and let φ(a, b, c) (where a, b, c ∈ Var) be the formula
∃x(ax² + bx + c = 0)
Let M be the L-structure (ℝ, 0, 1, +, ·, <). Using the quadratic formula, we have
(M, s) ⊨ φ ↔ ((a ≠ 0 ∧ b² − 4ac ≥ 0) ∨ (a = 0 ∧ b ≠ 0) ∨ (a = 0 ∧ b = 0 ∧ c = 0))
for all s : Var → ℝ.

The above examples focused on one structure rather than a theory (which could have many models).
The next example uses a theory.
Example. Let L = {0, 1, +, } and let T be the theory of fields, i.e. T = Cn() where is the set of field
axioms. Let (a, b, c, d) (where a, b, c, d V ar) be the formula
wxyz(wa + xc = 1 wb + xd = 0 ya + zc = 0 yb + zd = 1)
The part inside the parentheses is really just saying that the matrix equation
[ w x ] [ a b ]   [ 1 0 ]
[ y z ] [ c d ] = [ 0 1 ]
is true. Therefore, using simple facts about 2 × 2 determinants over an arbitrary field, we have
T ⊨ φ ↔ ad ≠ bc
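Since the displayed equivalence holds in every field, it can be spot-checked exhaustively in a small finite field such as ℤ/3ℤ. The brute-force harness below is ours, not part of the text:

```python
def formula_holds(a, b, c, d, p):
    """Exhaustively test Ew Ex Ey Ez (wa+xc=1 & wb+xd=0 & ya+zc=0 & yb+zd=1)
    over the field Z/pZ (p prime), by ranging over all candidate witnesses."""
    rng = range(p)
    return any((w*a + x*c) % p == 1 and (w*b + x*d) % p == 0 and
               (y*a + z*c) % p == 0 and (y*b + z*d) % p == 1
               for w in rng for x in rng for y in rng for z in rng)
```

Checking all 81 choices of (a, b, c, d) over ℤ/3ℤ confirms the formula holds exactly when ad ≠ bc in that field.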

Definition 7.1.1. Let T be a theory. We say that T has quantifier elimination, or has QE, if for every k ≥ 1 and every φ(x₁, x₂, . . . , xₖ) ∈ Form_L, there exists a quantifier-free ψ(x₁, x₂, . . . , xₖ) ∈ Form_L such that
T ⊨ φ ↔ ψ
This seems like an awful lot to ask of a theory. However, it is a pleasant surprise that several natural
and important theories have QE, and in several more cases we can obtain a theory with QE by only adding
a few things to the language. Before proving this, we first explain what we get from it.
7.2 What Quantifier Elimination Provides
The first application of QE is to show that certain theories are complete. QE itself is not sufficient, but a very mild additional assumption gives us what we want.
Proposition 7.2.1. Let T be a theory that has QE. If there exists an L-structure N such that for every model M of T there is an embedding h : N → M from N to M, then T is complete. (Notice that there is no assumption that N is a model of T.)
Proof. Fix an L-structure N such that for every model M of T there is an embedding h : N → M from N to M, and fix n ∈ N. Let M₁ and M₂ be two models of T. For each i ∈ {1, 2}, fix an embedding hᵢ : N → Mᵢ from N to Mᵢ. For each i, let Aᵢ = ran(hᵢ), and notice that Aᵢ is the universe of a substructure Aᵢ of Mᵢ. Furthermore, notice that hᵢ is an isomorphism from N to Aᵢ.
Let σ ∈ Sent_L and let φ(x) ∈ Form_L be the formula σ ∧ (x = x). Since T has QE, we may fix a quantifier-free ψ(x) ∈ Form_L such that T ⊨ φ ↔ ψ. We then have
M₁ ⊨ σ ⇔ (M₁, h₁(n)) ⊨ φ
⇔ (M₁, h₁(n)) ⊨ ψ
⇔ (A₁, h₁(n)) ⊨ ψ  (since ψ is quantifier-free)
⇔ (N, n) ⊨ ψ  (since h₁ is an isomorphism from N to A₁)
⇔ (A₂, h₂(n)) ⊨ ψ  (since h₂ is an isomorphism from N to A₂)
⇔ (M₂, h₂(n)) ⊨ ψ  (since ψ is quantifier-free)
⇔ (M₂, h₂(n)) ⊨ φ
⇔ M₂ ⊨ σ
Proposition 7.2.2. Let T be a theory that has QE. Suppose that A and M are models of T and that A ⊆ M. We then have that A ≼ M.
Proof. Let φ ∈ Form_L and let s : Var → A be a variable assignment. Suppose first that φ ∉ Sent_L. Since T has QE, we may fix a quantifier-free ψ ∈ Form_L such that T ⊨ φ ↔ ψ. We then have
(M, s) ⊨ φ ⇔ (M, s) ⊨ ψ
⇔ (A, s) ⊨ ψ  (since ψ is quantifier-free)
⇔ (A, s) ⊨ φ
If φ is a sentence, we may tack on a dummy x = x as in the previous proof.
Proposition 7.2.3. Let T be a theory that has QE, let M be a model of T, and let k ∈ ℕ⁺. Let Z be the set of all subsets of Mᵏ which are definable by atomic formulas. The set of definable subsets of Mᵏ equals G(P(Mᵏ), Z, {h₁, h₂}) where h₁ : P(Mᵏ) → P(Mᵏ) is the complement function and h₂ : P(Mᵏ)² → P(Mᵏ) is the union function.
Proof.
7.3 Quantifier Manipulation Rules
Definition 7.3.1. Let L be a language, and let φ, ψ ∈ Form_L. We say that φ and ψ are semantically equivalent if φ ⊨ ψ and ψ ⊨ φ.
We now list a bunch of simple rules for manipulating formulas while maintaining semantic equivalence.
1. ¬∀xφ and ∃x(¬φ) are s.e.
2. ¬∃xφ and ∀x(¬φ) are s.e.
3. (∀xφ) ∧ ψ and ∀x(φ ∧ ψ) are s.e. if x ∉ FreeVar(ψ).
4. (∃xφ) ∧ ψ and ∃x(φ ∧ ψ) are s.e. if x ∉ FreeVar(ψ).
5. (∀xφ) ∨ ψ and ∀x(φ ∨ ψ) are s.e. if x ∉ FreeVar(ψ).
6. (∃xφ) ∨ ψ and ∃x(φ ∨ ψ) are s.e. if x ∉ FreeVar(ψ).
7. (∀xφ) → ψ and ∃x(φ → ψ) are s.e. if x ∉ FreeVar(ψ).
8. (∃xφ) → ψ and ∀x(φ → ψ) are s.e. if x ∉ FreeVar(ψ).
We'll need the following to change annoying variables.
1. ∀xφ and ∀y(φ^{y}_{x}) are s.e. if y ∉ OccurVar(φ).
2. ∃xφ and ∃y(φ^{y}_{x}) are s.e. if y ∉ OccurVar(φ).
We'll also need to know that if φ and ψ are s.e., then
1. ¬φ and ¬ψ are s.e.
2. ∀xφ and ∀xψ are s.e.
3. ∃xφ and ∃xψ are s.e.
and also that if φ₁ and φ₂ are s.e., and ψ₁ and ψ₂ are s.e., then
1. φ₁ ∧ ψ₁ and φ₂ ∧ ψ₂ are s.e.
2. φ₁ ∨ ψ₁ and φ₂ ∨ ψ₂ are s.e.
3. φ₁ → ψ₁ and φ₂ → ψ₂ are s.e.
Definition 7.3.2. A quantifier-free formula φ is in disjunctive normal form if there exist γᵢ for 1 ≤ i ≤ n such that
φ = γ₁ ∨ γ₂ ∨ ··· ∨ γₙ
where for each i, there exist θ_{i,j} for 1 ≤ j ≤ mᵢ, each either an atomic formula or the negation of an atomic formula, such that
γᵢ = θ_{i,1} ∧ θ_{i,2} ∧ ··· ∧ θ_{i,mᵢ}
Proposition 7.3.3. Suppose that φ(x₁, x₂, . . . , xₖ) ∈ Form_L is quantifier-free. There exists a quantifier-free formula ψ(x₁, x₂, . . . , xₖ) in disjunctive normal form such that φ and ψ are s.e.
Proof.
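One standard way to prove Proposition 7.3.3 is by truth tables: take one conjunct for each satisfying row of the formula, viewing its atomic subformulas as boolean atoms. A propositional sketch (function names ours):

```python
from itertools import product

def dnf_rows(atoms, f):
    """Truth-table DNF of f: a list of conjuncts, one per satisfying assignment.
    Each conjunct is a list of (atom, polarity) literals; f maps a dict
    atom -> bool to a bool."""
    rows = []
    for values in product([False, True], repeat=len(atoms)):
        assignment = dict(zip(atoms, values))
        if f(assignment):
            rows.append([(a, assignment[a]) for a in atoms])
    return rows

def eval_dnf(rows, assignment):
    """A DNF holds when some conjunct has every literal matching the assignment."""
    return any(all(assignment[a] == pol for a, pol in row) for row in rows)
```

When f is unsatisfiable the list of conjuncts is empty; formally one would use a single contradictory conjunct such as θ ∧ ¬θ instead.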
Definition 7.3.4. A formula is called a prenex formula if it is an element of
G(Sym*_L, QuantFreeForm_L, {h_{∀,x}, h_{∃,x} : x ∈ Var})
Proposition 7.3.5. For every φ ∈ Form_L, there exists a prenex formula ψ such that φ and ψ are semantically equivalent.
Proposition 7.3.6. Let T be a theory. The following are equivalent.
1. T has QE.
2. For all k ≥ 1 and all α₁, . . . , α_m, β₁, . . . , βₙ ∈ Form_L with
(a) FreeVar(α₁, . . . , α_m, β₁, . . . , βₙ) ⊆ {y, x₁, x₂, . . . , xₖ}.
(b) y ∈ FreeVar(αᵢ) for all i and y ∈ FreeVar(βⱼ) for all j.
(c) Each αᵢ and βⱼ is an atomic formula.
there exists a quantifier-free ψ(x₁, x₂, . . . , xₖ) ∈ Form_L such that
T ⊨ ∃y( ⋀_{i=1}^{m} αᵢ ∧ ⋀_{j=1}^{n} ¬βⱼ ) ↔ ψ
Proof.
7.4 Examples of Theories With QE
Theorem 7.4.1. Let L = ∅. For each n ∈ ℕ⁺, let σₙ be the sentence
∃x₁∃x₂···∃xₙ ⋀_{1≤i<j≤n} ¬(xᵢ = xⱼ)
Let T = Cn({σₙ : n ∈ ℕ⁺}). T has QE and is complete.
Proof. Suppose that α₁, . . . , α_m, β₁, . . . , βₙ ∈ Form_L with
1. FreeVar(α₁, . . . , α_m, β₁, . . . , βₙ) ⊆ {y, x₁, x₂, . . . , xₖ}.
2. y ∈ FreeVar(αᵢ) for all i and y ∈ FreeVar(βⱼ) for all j.
3. Each αᵢ and βⱼ is an atomic formula.
We need to show that there exists a quantifier-free ψ(x₁, x₂, . . . , xₖ) ∈ Form_L such that
T ⊨ ∃y( ⋀_{i=1}^{m} αᵢ ∧ ⋀_{j=1}^{n} ¬βⱼ ) ↔ ψ
Now each αᵢ and βⱼ is s.e. with, and hence we may assume is, one of the following:
1. x_ℓ = y
2. y = y
If some αᵢ is x_ℓ = y, then
T ⊨ ∃y( ⋀_{i=1}^{m} αᵢ ∧ ⋀_{j=1}^{n} ¬βⱼ ) ↔ ( ⋀_{i=1}^{m} αᵢ ∧ ⋀_{j=1}^{n} ¬βⱼ )^{x_ℓ}_{y}
If some βⱼ is y = y, then
T ⊨ ∃y( ⋀_{i=1}^{m} αᵢ ∧ ⋀_{j=1}^{n} ¬βⱼ ) ↔ ¬(x₁ = x₁)
Suppose then that each αᵢ is y = y and each βⱼ is x_ℓ = y. We then have
T ⊨ ∃y( ⋀_{i=1}^{m} αᵢ ∧ ⋀_{j=1}^{n} ¬βⱼ ) ↔ x₁ = x₁
because all models of T are infinite. Therefore, T has QE.
Notice that T is complete because the structure M given by M = {0} trivially embeds into all models of T.
Theorem 7.4.2. RG has QE and is complete.
Proof. Suppose that α₁, . . . , α_m, β₁, . . . , βₙ ∈ Form_L with
1. FreeVar(α₁, . . . , α_m, β₁, . . . , βₙ) ⊆ {y, x₁, x₂, . . . , xₖ}.
2. y ∈ FreeVar(αᵢ) for all i and y ∈ FreeVar(βⱼ) for all j.
3. Each αᵢ and βⱼ is an atomic formula.
We need to show that there exists a quantifier-free ψ(x₁, x₂, . . . , xₖ) ∈ Form_L such that
RG ⊨ ∃y( ⋀_{i=1}^{m} αᵢ ∧ ⋀_{j=1}^{n} ¬βⱼ ) ↔ ψ
Now each αᵢ and βⱼ is RG-equivalent with, and hence we may assume is, one of the following:
1. x_ℓ = y
2. Rx_ℓy
3. y = y
4. Ryy
If some αᵢ is x_ℓ = y, then
RG ⊨ ∃y( ⋀_{i=1}^{m} αᵢ ∧ ⋀_{j=1}^{n} ¬βⱼ ) ↔ ( ⋀_{i=1}^{m} αᵢ ∧ ⋀_{j=1}^{n} ¬βⱼ )^{x_ℓ}_{y}
If some αᵢ is Ryy or some βⱼ is y = y, then
RG ⊨ ∃y( ⋀_{i=1}^{m} αᵢ ∧ ⋀_{j=1}^{n} ¬βⱼ ) ↔ ¬(x₁ = x₁)
Suppose then that no αᵢ is x_ℓ = y, no αᵢ is Ryy, and no βⱼ is y = y. Let
A = {ℓ ∈ {1, 2, . . . , k} : there exists i such that αᵢ is Rx_ℓy}
and let
B = {ℓ ∈ {1, 2, . . . , k} : there exists j such that βⱼ is Rx_ℓy}
We then have that
RG ⊨ ∃y( ⋀_{i=1}^{m} αᵢ ∧ ⋀_{j=1}^{n} ¬βⱼ ) ↔ ⋀_{a∈A} ⋀_{b∈B} ¬(x_a = x_b)
because in models of RG, given disjoint finite sets A and B of vertices, there are infinitely many vertices linked to everything in A and to nothing in B. Therefore, RG has QE.
Notice that RG is complete because the structure M given by M = {0} and R^M = ∅ trivially embeds into all models of RG.
7.5 Algebraically Closed Fields

Definition 7.5.1. Let L = {0, 1, +, ·}. Let Σ ⊆ Sent_L be the field axioms together with the sentences
∀a₀∀a₁···∀aₙ(aₙ ≠ 0 → ∃x(aₙxⁿ + ··· + a₁x + a₀ = 0))
for each n ∈ ℕ⁺. Let ACF = Cn(Σ).
Theorem 7.5.2. ACF has QE.
Proof Sketch. The fundamental observation is that we can think of atomic formulas with free variables in {y, x₁, x₂, . . . , xₖ} as equations p(x⃗, y) = 0 where p(x⃗, y) ∈ ℤ[x⃗, y] is a polynomial. Thus, we have to find quantifier-free equivalents to formulas of the form
∃y[ ⋀_{i=1}^{m} (pᵢ(x⃗, y) = 0) ∧ ⋀_{j=1}^{n} (qⱼ(x⃗, y) ≠ 0) ]
which, letting q(x⃗, y) = ∏_{j=1}^{n} qⱼ(x⃗, y), is equivalent in ACF to
∃y[ ⋀_{i=1}^{m} (pᵢ(x⃗, y) = 0) ∧ q(x⃗, y) ≠ 0 ]
Suppose now that R is a ring and p₁, p₂, . . . , p_m, q ∈ R[y] are listed in decreasing order of degree. Let the leading term of p₁ be ayⁿ and let the leading term of p_m be byᵏ. We then have that there is a simultaneous root of the polynomials p₁, p₂, . . . , p_m which is not a root of q if and only if one of the following happens:
1. b = 0 and there is a simultaneous root of the polynomials p₁, p₂, . . . , p_{m−1}, p_m − byᵏ which is not a root of q (where by p_m − byᵏ we mean that we eliminate any mention of the leading term).
2. b ≠ 0 and there is a simultaneous root of the polynomials bp₁ − ay^{n−k}p_m, p₂, . . . , p_m which is not a root of q.
Repeating this, we may assume that we have a formula of the form
∃y[ p(x⃗, y) = 0 ∧ q(x⃗, y) ≠ 0 ]
If q is not present, then we may use the fact that in an algebraically closed field, the polynomial aₙyⁿ + ··· + a₁y + a₀ has a root if and only if some aᵢ ≠ 0 for i > 0, or a₀ = 0. If p is not present, then we use the fact that every algebraically closed field is infinite, and also that every nonzero polynomial has finitely many roots, to conclude that there is an element which is not a root of the polynomial aₙyⁿ + ··· + a₁y + a₀ if and only if some aᵢ ≠ 0.
Suppose then that both p and q are present. The key fact to use here is that if p and q are polynomials over an algebraically closed field and the degree of p is at most n, then every root of p is a root of q if and only if p | qⁿ.
Thus, we have two polynomials p and q, and we want to find a quantifier-free formula equivalent to p | q. Suppose that p(y) = ∑_{i=0}^{m} aᵢyⁱ and that q(y) = ∑_{i=0}^{n} bᵢyⁱ. Now if m = 0, then we have p | q if and only if one of the following is true:
• a₀ ≠ 0
• a₀ = 0 and each bᵢ = 0
If n = 0, then we have p | q if and only if one of the following is true:
• b₀ = 0
• b₀ ≠ 0, a₀ ≠ 0, and aᵢ = 0 for 1 ≤ i ≤ m.
Suppose that 1 ≤ m ≤ n. We then have that p | q if and only if one of the following is true:
• a_m = 0 and (p − a_m yᵐ) | q
• a_m ≠ 0 and p | (a_m q − bₙ y^{n−m} p)
Thus, in either case, we've reduced the potential degree of one of the two polynomials. Finally, suppose that 1 ≤ n < m. We then have that p | q if and only if one of the following is true:
• Each bᵢ = 0
• aᵢ = 0 for n < i ≤ m and (p − a_m yᵐ − a_{m−1} y^{m−1} − ··· − a_{n+1} y^{n+1}) | q
By repeatedly applying these latter two, and bottoming out as appropriate, we eventually obtain a quantifier-free equivalent to our formula.
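Over a field, where we may divide by nonzero leading coefficients, the repeated degree reductions above amount to polynomial long division: p | q exactly when the remainder is 0. A minimal sketch with exact rational arithmetic (the coefficient-list representation and names are ours):

```python
from fractions import Fraction

def trim(p):
    """Drop trailing zero coefficients."""
    while p and p[-1] == 0:
        p = p[:-1]
    return p

def divides(p, q):
    """Decide p | q for polynomials over Q given as coefficient lists [a0, a1, ...]:
    run long division of q by p and check for a zero remainder."""
    p = trim([Fraction(c) for c in p])
    q = trim([Fraction(c) for c in q])
    if not p:
        return not q  # the zero polynomial divides only the zero polynomial
    r = q
    while len(r) >= len(p):
        c = r[-1] / p[-1]          # cancel the current leading term of r
        k = len(r) - len(p)
        r = trim([r[i] - c * p[i - k] if i >= k else r[i] for i in range(len(r))])
    return not r
```

For parameterized coefficients, as in the proof sketch, one cannot divide outright, which is exactly why the case analysis above branches on whether each leading coefficient vanishes.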
Corollary 7.5.3. ACF₀ is complete and ACF_p is complete for all primes p.
Corollary 7.5.4. If F and K are algebraically closed fields such that F is a subfield of K, then (F, 0, 1, +, ·) ≼ (K, 0, 1, +, ·).
Corollary 7.5.5. (ℚ̄, 0, 1, +, ·) ≼ (ℂ, 0, 1, +, ·), where ℚ̄ is the field of algebraic numbers.
Corollary 7.5.6. Suppose that F is an algebraically closed field. A subset X ⊆ F is definable in (F, 0, 1, +, ·) if and only if X is either finite or cofinite.
Proposition 7.5.7. Let σ ∈ Sent_L. The following are equivalent.
1. ACF₀ ⊨ σ.
2. There exists m such that ACF_p ⊨ σ for all primes p > m.
3. ACF_p ⊨ σ for infinitely many primes p.
Proof. 1 implies 2 is Compactness. 2 implies 3 is trivial. 3 implies 1 using completeness of ACF₀ and of the ACF_p for each prime p, together with 1 implies 2.
Proposition 7.5.8. Let p be prime. Every finitely generated subfield of 𝔽̄_p (the algebraic closure of 𝔽_p) is finite.
Proof. Let p be prime. For every n, let Kₙ be the set of roots of x^{pⁿ} − x in 𝔽̄_p. By standard results in algebra, we have that Kₙ is a field of order pⁿ, and furthermore is the unique subfield of 𝔽̄_p of order pⁿ. If d | n, we then have that K_d ⊆ Kₙ because if a^{p^d} = a, then a^{p^{2d}} = (a^{p^d})^{p^d} = a^{p^d} = a, a^{p^{3d}} = (a^{p^{2d}})^{p^d} = a^{p^d} = a, etc. Let K = ⋃_{n∈ℕ} Kₙ. Notice that K is a subfield of 𝔽̄_p because if a ∈ Kₙ and b ∈ K_m, then a + b, a · b ∈ K_{mn}. Furthermore, notice that K is algebraically closed because a finite extension of a finite field is finite. Therefore, K = 𝔽̄_p.
Now if we have finitely many a₁, a₂, . . . , a_m ∈ 𝔽̄_p = K, then we may fix an n such that a₁, a₂, . . . , a_m ∈ Kₙ. We then have that the subfield of 𝔽̄_p generated by a₁, a₂, . . . , a_m is a subfield of Kₙ, and hence is finite.
Theorem 7.5.9. Every injective polynomial map from ℂⁿ to ℂⁿ is surjective.
Proof. Let σ_{n,d} ∈ Sent_L be the sentence expressing that every injective polynomial map from Fⁿ to Fⁿ, where each polynomial has degree at most d, is surjective. We want to show that ℂ ⊨ σ_{n,d} for all n, d. To do this, it suffices to show that 𝔽̄_p ⊨ σ_{n,d} for all primes p and all n, d ∈ ℕ (by Proposition 7.5.7, since each 𝔽̄_p is a model of the complete theory ACF_p and ℂ is a model of the complete theory ACF₀). Thus, it suffices to show that for all primes p, every injective polynomial map from (𝔽̄_p)ⁿ to (𝔽̄_p)ⁿ is surjective.
Fix a prime p and an n ∈ ℕ. Suppose that f : (𝔽̄_p)ⁿ → (𝔽̄_p)ⁿ is an injective polynomial map. Let (b₁, b₂, . . . , bₙ) ∈ (𝔽̄_p)ⁿ. We need to show that there exists (a₁, a₂, . . . , aₙ) ∈ (𝔽̄_p)ⁿ with f(a₁, a₂, . . . , aₙ) = (b₁, b₂, . . . , bₙ). Let f₁, f₂, . . . , fₙ ∈ 𝔽̄_p[x₁, x₂, . . . , xₙ] be such that f = (f₁, f₂, . . . , fₙ), and let C be the finite set of coefficients appearing in f₁, f₂, . . . , fₙ. Let K be the subfield of 𝔽̄_p generated by C ∪ {b₁, b₂, . . . , bₙ}, and notice that K is a finite field by Proposition 7.5.8. Now f ↾ Kⁿ maps Kⁿ into Kⁿ and is injective, so it is surjective because Kⁿ is finite. Thus, there exists (a₁, a₂, . . . , aₙ) ∈ Kⁿ ⊆ (𝔽̄_p)ⁿ such that f(a₁, a₂, . . . , aₙ) = (b₁, b₂, . . . , bₙ).
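The heart of the finite-field step is the pigeonhole fact that an injective map from a finite set to itself is surjective. As a one-variable illustration (our harness, not from the text): x ↦ x³ is injective on ℤ/5ℤ, and therefore hits every value:

```python
def image(p, coeffs):
    """Values of the polynomial sum(c_i * x^i) at every point of Z/pZ."""
    return [sum(c * pow(x, i, p) for i, c in enumerate(coeffs)) % p for x in range(p)]

vals = image(5, [0, 0, 0, 1])                 # the map x -> x^3 on Z/5Z
injective = len(set(vals)) == len(vals)       # no repeated values
surjective = set(vals) == set(range(5))       # every value is hit
```

By contrast, x ↦ x² on ℤ/7ℤ is neither injective nor surjective, so the pigeonhole fact says nothing about it.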
Chapter 8

Nonstandard Models of Arithmetic and Analysis

8.1 Nonstandard Models of Arithmetic

Throughout this section, we work in the language L = {0, 1, <, +, ·} where 0, 1 are constant symbols, < is a binary relation symbol, and +, · are binary function symbols. We also let N = (ℕ, 0, 1, <, +, ·) where the symbol 0 is interpreted as the real 0, the symbol + is interpreted as real addition, etc. Make sure that you understand when + means the symbol in the language L and when it means the addition function on ℕ.
A basic question is whether Th(N) completely determines the model N. More precisely, we have the following question.
Question 8.1.1. Are all models of T h(N) isomorphic to N?
Using Proposition 6.5.3, we can immediately give a negative answer to this question, because there is an uncountable model of Th(N), and an uncountable model can't be isomorphic to N. What would such a model look like? In order to answer this, let's think a little about the kinds of sentences that are in Th(N).
Definition 8.1.2. For each n ∈ ℕ, we define a term n̄ ∈ Term_L as follows. Let 0̄ = 0 and let 1̄ = 1. Now define the n̄ recursively by declaring that the term for n + 1 is n̄ + 1 for each n ≥ 1. Notice here that the n + 1 being named involves the actual number 1 and the actual addition function, whereas the 1 and + in n̄ + 1 mean the symbols 1 and + in our language L. Thus, for example, 2̄ is the term 1 + 1 and 3̄ is the term (1 + 1) + 1.
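The recursion defining the terms n̄ is easy to mirror in code; this helper of ours builds a fully parenthesized version of the L-term:

```python
def numeral(n):
    """L-term for n: numeral(0) = '0', numeral(1) = '1', and the term for n+1
    is built from the term for n as (term+1)."""
    if n == 0:
        return "0"
    if n == 1:
        return "1"
    return "(" + numeral(n - 1) + "+1)"
```

So numeral(3) returns the term for 3̄ with one extra outer pair of parentheses relative to the display above.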
Definition 8.1.3. Let M be an L-structure. We know that given any t ∈ Term_L containing no variables, t corresponds to an element of M given by s(t) for some (any) variable assignment s : Var → M. We denote this value by t^M.
Notice that n̄^N = n for all n ∈ ℕ by a simple induction. Here are some important examples of the kinds of things in Th(N).
Examples of Sentences in Th(N).
1. 2̄ + 2̄ = 4̄ and in general m̄ + n̄ = \overline{m + n} and m̄ · n̄ = \overline{m · n}.
2. ∀x∀y(x + y = y + x)
3. ∀x(x ≠ 0 → ∃y(y + 1 = x))
4. For each φ(x) ∈ Form_L, the sentence
(φ^{0}_{x} ∧ ∀x(φ → φ^{x+1}_{x})) → ∀xφ
Now any model M of Th(N) must satisfy all of these sentences. The basic sentences in 1 above roughly tell us that M has a piece which looks just like N. We make this precise as follows.
Proposition 8.1.4. For any model M of Th(N), the function h : ℕ → M given by h(n) = n̄^M is an embedding of N into M.
Proof. Notice that
h(0^N) = h(0) = 0̄^M = 0^M
and
h(1^N) = h(1) = 1̄^M = 1^M
Now let m, n ∈ ℕ. We have
m <^N n ⇔ m < n
⇔ (m̄ < n̄) ∈ Th(N)
⇔ M ⊨ m̄ < n̄
⇔ m̄^M <^M n̄^M
⇔ h(m) <^M h(n)
Also, since m̄ + n̄ = \overline{m + n} is in Th(N), we have
h(m + n) = \overline{m + n}^M = m̄^M +^M n̄^M = h(m) +^M h(n)
and since m̄ · n̄ = \overline{m · n} is in Th(N), we have
h(m · n) = \overline{m · n}^M = m̄^M ·^M n̄^M = h(m) ·^M h(n)
Finally, for any m, n ∈ ℕ with m ≠ n, we have (m̄ ≠ n̄) ∈ Th(N), so M ⊨ m̄ ≠ n̄, and hence h(m) ≠ h(n).
Proposition 8.1.5. Let M be a model of Th(N). The following are equivalent.
1. M ≅ N.
2. M = {n̄^M : n ∈ ℕ}.
Proof. If 2 holds, then the h of Proposition 8.1.4 is surjective, and hence an isomorphism. Suppose then that 1 holds, and fix an isomorphism h : ℕ → M from N to M. We show that h(n) = n̄^M for all n ∈ ℕ by induction. We have
h(0) = h(0^N) = 0^M = 0̄^M
and
h(1) = h(1^N) = 1^M = 1̄^M
Suppose that n ∈ ℕ and h(n) = n̄^M. We then have
h(n + 1) = h(n) +^M h(1) = n̄^M +^M 1̄^M = \overline{n + 1}^M
Therefore, h(n) = n̄^M for all n ∈ ℕ, so M = {n̄^M : n ∈ ℕ} because h is surjective.
Definition 8.1.6. A nonstandard model of arithmetic is a model M of Th(N) such that M ≇ N.
We've already seen that there are nonstandard models of arithmetic by cardinality considerations, but we can also build countable nonstandard models of arithmetic using the Compactness Theorem and the Countable Löwenheim–Skolem Theorem.
Theorem 8.1.7. There exists a countable nonstandard model of arithmetic.
Proof. Let L′ = L ∪ {c} where c is a new constant symbol. Consider the following set of L′-sentences:
Σ′ = Th(N) ∪ {c ≠ n̄ : n ∈ ℕ}
Notice that every finite subset of Σ′ has a model (by taking N and interpreting c large enough), so Σ′ has a countable model M (notice that L′ is countable) by the Compactness Theorem and the Countable Löwenheim–Skolem Theorem. Restricting this model to the original language L, we may use Proposition 8.1.5 to conclude that M is a countable nonstandard model of arithmetic.
8.2 The Structure of Nonstandard Models of Arithmetic

Throughout this section, let M be a nonstandard model of arithmetic. Anything we can express in the first-order language of L which is true of N is in Th(N), and hence is true in M. For example, we have the following.
Proposition 8.2.1.
• +^M is associative on M.
• +^M is commutative on M.
• <^M is a linear ordering on M.
• For all a ∈ M with a ≠ 0^M, there exists b ∈ M with b +^M 1^M = a.
Proof. The sentences
∀x∀y∀z(x + (y + z) = (x + y) + z)
∀x∀y(x + y = y + x)
∀x∀y(x < y ∨ y < x ∨ x = y)
∀x(x ≠ 0 → ∃y(y + 1 = x))
are in Th(N).
Since we already know that N is naturally embedded in M, and it gets tiresome to write +^M, ·^M, and <^M, we'll abuse notation by using just +, ·, and < to denote these. Thus, these symbols now have three different meanings. They are used as formal symbols in our language, as the normal functions and relations on ℕ, and as their interpretations in M. Make sure you know how each appearance of these symbols is being used.
Definition 8.2.2. We let M_fin = {n̄^M : n ∈ ℕ} and we call M_fin the set of finite elements of M. We also let M_inf = M \ M_fin and we call M_inf the set of infinite elements of M.
The following proposition justifies our choice of name.
Proposition 8.2.3. Let a ∈ M_inf. For any n ∈ ℕ, we have n̄^M < a.
Proof. For each n ∈ ℕ, the sentence
∀x(x < n̄ → ⋁_{i=0}^{n−1} (x = ī))
is in Th(N). Since a ≠ n̄^M for all n ∈ ℕ, it follows that it's not the case that a < n̄^M for any n ∈ ℕ. Since < is a linear ordering on M, we may conclude that n̄^M < a for all n ∈ ℕ.
Definition 8.2.4. Define a relation ∼ on M by letting a ∼ b if either
• a = b.
• a < b and there exists n ∈ ℕ such that a + n̄^M = b.
• b < a and there exists n ∈ ℕ such that b + n̄^M = a.
In other words, a ∼ b if a and b are finitely far apart.
Proposition 8.2.5. ∼ is an equivalence relation on M.
Proof. ∼ is clearly reflexive and symmetric. Suppose that a, b, c ∈ M, that a ∼ b, and that b ∼ c. We handle one case. Suppose that a < b and b < c. Fix m, n ∈ ℕ with a + m̄^M = b and b + n̄^M = c. We then have
a + \overline{m + n}^M = a + (m̄^M + n̄^M) = (a + m̄^M) + n̄^M = b + n̄^M = c
so a ∼ c. The other cases are similar.
Definition 8.2.6. Let a, b ∈ M. We write a ≪ b to mean that a < b and a ≁ b.
We'd like to know that the relation ≪ is well-defined on the equivalence classes of ∼. The following lemma is useful.
Lemma 8.2.7. Let a, b, c ∈ M be such that a ≤ b ≤ c and suppose that a ∼ c. We then have a ∼ b and b ∼ c.
Proof. If either a = b or b = c, this is trivial, so assume that a < b < c. Since a < c and a ∼ c, there exists n ∈ ℕ⁺ with a + n̄^M = c. Now the sentence
∀x∀z∀w(x + w = z → ∀y((x < y ∧ y < z) → ∃u(u < w ∧ x + u = y)))
is in Th(N), so there exists d ∈ M such that d < n̄^M and a + d = b. Since d < n̄^M, there exists i ∈ ℕ with d = ī^M. We then have a + ī^M = b, hence a ∼ b. The proof that b ∼ c is similar.
Proposition 8.2.8. Suppose that a₀, b₀ ∈ M are such that a₀ ≪ b₀. For any a, b ∈ M with a ∼ a₀ and b ∼ b₀, we have a ≪ b.
Proof. We first show that a ≁ b. If a ∼ b, then using a₀ ∼ a and b₀ ∼ b, together with the fact that ∼ is an equivalence relation, we can conclude that a₀ ∼ b₀, a contradiction. Therefore, a ≁ b.
Thus, we need only show that a < b. Notice that a₀ < b because otherwise a₀ ∼ b₀ by Lemma 8.2.7. Similarly, a < b₀ because otherwise a₀ ∼ b₀ by Lemma 8.2.7. Thus, if b ≤ a, we have
a₀ < b ≤ a < b₀
so b ∼ a₀ by Lemma 8.2.7, hence a₀ ∼ b₀, a contradiction. It follows that a < b.
This allows us to define an ordering on the equivalence classes.
Definition 8.2.9. Given a, b ∈ M, we write [a] ≪ [b] to mean that a ≪ b.
The next proposition implies that there is no largest equivalence class under the ordering ≪.
Proposition 8.2.10. For any a ∈ M_inf, we have a ≪ a + a.
Proof. Let a ∈ M_inf. For each n ∈ ℕ, the sentence
∀x(n̄ < x → x + n̄ < x + x)
is in Th(N). Using this when n = 0, we see that a = a + 0̄^M < a + a. Since a ∈ M_inf, we have n̄^M < a and hence a + n̄^M < a + a for all n ∈ ℕ. Therefore, a + n̄^M ≠ a + a for all n ∈ ℕ, and so a ≁ a + a.
Lemma 8.2.11. For all a ∈ M, one of the following holds:
1. There exists b ∈ M such that a = 2̄^M · b.
2. There exists b ∈ M such that a = 2̄^M · b + 1̄^M.
Proof. The sentence
∀x∃y(x = 2̄ · y ∨ x = 2̄ · y + 1̄)
is in Th(N).
Proposition 8.2.12. For any a ∈ M_inf, there exists b ∈ M_inf with b ≪ a.
Proof. Suppose first that we have a b ∈ M such that a = 2̄^M · b. We then have a = b + b (because ∀x(2̄ · x = x + x) is in Th(N)). Notice that b ∉ M_fin because otherwise we would have a ∈ M_fin. Therefore, b ≪ b + b = a using Proposition 8.2.10. Suppose instead that we have a b ∈ M such that a = 2̄^M · b + 1̄^M. We then have a = (b + b) + 1̄^M because ∀x(2̄ · x + 1̄ = (x + x) + 1̄) is in Th(N). Notice that b ∉ M_fin because otherwise we would have a ∈ M_fin. Therefore, b ≪ b + b using Proposition 8.2.10, so b ≪ (b + b) + 1̄^M = a since b + b ∼ (b + b) + 1̄^M.
Proposition 8.2.13. For any a, b ∈ M_inf with a ≪ b, there exists c ∈ M_inf with a ≪ c ≪ b.
Proof. Suppose first that we have a c ∈ M such that a + b = 2̄^M · c. We then have a + b = c + c. Since
∀x∀y∀z((x < y ∧ x + y = z + z) → (x < z ∧ z < y))
is in Th(N), it follows that a < c < b.
Suppose that a ∼ c, and fix n ∈ ℕ with a + n̄^M = c. We then have that a + b = c + c = a + a + \overline{2n}^M, so b = a + \overline{2n}^M, contradicting the fact that a ≪ b. Therefore, a ≁ c.
Suppose that c ∼ b, and fix n ∈ ℕ with c + n̄^M = b. We then have that
a + \overline{2n}^M + b = c + c + \overline{2n}^M = (c + n̄^M) + (c + n̄^M) = b + b
so a + \overline{2n}^M = b, contradicting the fact that a ≁ b. Therefore, b ≁ c.
A similar argument handles the case when we have a c ∈ M such that a + b = 2̄^M · c + 1̄^M.
The last proposition of this section shows how nonstandard models can "simplify" quantifiers. It says that asking whether a first-order statement holds for infinitely many n ∈ N is equivalent to asking whether it holds for at least one infinite element of a nonstandard model.
Proposition 8.2.14. Let φ(x) ∈ Form_L. The following are equivalent.
1. There are infinitely many n ∈ N such that (N, n) ⊨ φ.
2. There exists a ∈ M_inf such that (M, a) ⊨ φ.
Proof. Suppose first that there are infinitely many n ∈ N such that (N, n) ⊨ φ. In this case, the sentence
∀y∃x(y < x ∧ φ)
is in Th(N), so it holds in M. Fixing any b ∈ M_inf, we may conclude that there exists a ∈ M with b < a such that (M, a) ⊨ φ. Since b < a and b ∈ M_inf, we may conclude that a ∈ M_inf.
Conversely, suppose that there are only finitely many n ∈ N such that (N, n) ⊨ φ. Fix N ∈ N such that n < N for all n ∈ N with (N, n) ⊨ φ. We then have that the sentence
∀x(φ → x < N)
is in Th(N), so it holds in M. Since there is no a ∈ M_inf with a < N^M, it follows that there is no a ∈ M_inf such that (M, a) ⊨ φ.
8.3 Nonstandard Models of Analysis

With a basic understanding of nonstandard models of arithmetic, let's think about nonstandard models of other theories. One of the more amazing and useful such theories is the theory of the real numbers. The idea is that we will have nonstandard models of the theory of the reals which contain both infinite and infinitesimal elements. We can then transfer first-order statements back-and-forth, and do calculus in this expanded structure where the basic definitions (of, say, continuity) are simpler and more intuitive.
The first thing we need to decide on is what our language will be. Since we want to do calculus, we want to have analogs of all of our favorite functions (such as sin) in the nonstandard models. Once we throw these in, it's hard to know where to draw the line. In fact, there is no reason to draw a line at all. Simply throw in relation symbols for every possible subset of R^k, and throw in function symbols for every possible function f : R^k → R. Thus, throughout this section, we work in the language L = {r : r ∈ R} ∪ {P : P ⊆ R^k} ∪ {f : f : R^k → R}, where the P and f have the corresponding arities. We also let R be the structure with universe R where we interpret all symbols in the natural way.
Proposition 8.3.1. For any model M of Th(R), the function h : R → M given by h(r) = r^M is an embedding of R into M.
Proof. Notice that
h(r^R) = h(r) = r^M
for every r ∈ R. Now let P ⊆ R^k and let r1, r2, . . . , rk ∈ R. We have
(r1, r2, . . . , rk) ∈ P^R ⟺ R ⊨ P r1 r2 · · · rk
⟺ P r1 r2 · · · rk ∈ Th(R)
⟺ M ⊨ P r1 r2 · · · rk
⟺ (r1^M, r2^M, . . . , rk^M) ∈ P^M
⟺ (h(r1), h(r2), . . . , h(rk)) ∈ P^M
Now let f : R^k → R and let r1, r2, . . . , rk ∈ R. Since f r1 r2 · · · rk = f(r1, r2, . . . , rk) ∈ Th(R), we have
h(f^R(r1, r2, . . . , rk)) = h(f(r1, r2, . . . , rk))
= f(r1, r2, . . . , rk)^M
= f^M(r1^M, r2^M, . . . , rk^M)
= f^M(h(r1), h(r2), . . . , h(rk))
Finally, for any r1, r2 ∈ R with r1 ≠ r2, we have r1 ≠ r2 ∈ Th(R), so M ⊨ r1 ≠ r2, and hence h(r1) ≠ h(r2).
Proposition 8.3.2. Let M be a model of Th(R). The following are equivalent.
1. M ≅ R.
2. M = {r^M : r ∈ R}.
Proof. If 2 holds, then the h of Proposition 8.3.1 is surjective, and hence is an isomorphism. Suppose then that 1 holds, and fix an isomorphism h : R → M from R to M. For any r ∈ R, we must have h(r) = h(r^R) = r^M. Therefore, M = {r^M : r ∈ R} because h is surjective.
Definition 8.3.3. A nonstandard model of analysis is a model M of Th(R) such that M ≇ R.
Theorem 8.3.4. There exists a nonstandard model of analysis.
Proof. Let L′ = L ∪ {c}, where c is a new constant symbol. Consider the following set of L′-sentences:
Σ = Th(R) ∪ {c ≠ r : r ∈ R}
Notice that every finite subset of Σ has a model (by taking R and interpreting c to be distinct from each r such that c ≠ r appears in the finite subset), so Σ has a model M by the Compactness Theorem. Restricting this model to the original language L, we may use Proposition 8.3.2 to conclude that M is a nonstandard model of analysis.
Definition 8.3.5. For the rest of this section, fix a nonstandard model of analysis and denote it by *R. Instead of writing f^{*R} for each f : R^k → R, we simply write *f. We use similar notation *P for each P ⊆ R^k. Also, since there is a natural embedding (the h above) from R into *R, we will identify R with its image, and hence think of R as a subset of *R. Finally, for operations like + and ·, we will abuse notation and omit the *'s.
Proposition 8.3.6. There exists z ∈ *R such that z > 0 and z < ε for all ε ∈ R with ε > 0.
Proof. Fix b ∈ *R such that b ≠ r for all r ∈ R.
Case 1: Suppose that b > r for all r ∈ R. Let f : R → R be the function with f(r) = 1/r if r ≠ 0, and f(r) = 0 otherwise. Let z = *f(b). We then have that z > 0 using the sentence
∀x(0 < x → 0 < f(x))
Also, for any ε ∈ R with ε > 0, we have that b > 1/ε, hence z < ε using the sentence
∀x(f(ε) < x → f(x) < ε)
Case 2: Suppose that b < r for all r ∈ R. We then have that b < −r for all r ∈ R, and hence r < −b for all r ∈ R. Thus, we may take z = *f(−b) by the argument in Case 1.
Case 3: Suppose then that there exists r ∈ R with r < b and there exists r ∈ R with b < r. Let
X = {r ∈ R : r < b}
Notice that X is downward closed (if r1, r2 ∈ R with r2 ∈ X and r1 < r2, then r1 ∈ X), nonempty, and bounded above. Let s = sup X ∈ R. Now b = s is impossible, so either s < b or b < s.
Subcase 1: Suppose that s < b. We claim that we may take z = b − s. Since s < b, we have z = b − s > 0. Suppose that ε ∈ R and ε > 0. We then have that s + ε > s = sup X, so s + ε ∉ X and hence s + ε ≥ b. Now s + ε ≠ b because s + ε ∈ R, so s + ε > b. It follows that z = b − s < ε.
Subcase 2: Suppose that b < s. We claim that we may take z = s − b. Since b < s, we have z = s − b > 0. Suppose that ε ∈ R and ε > 0. We then have that s − ε < s = sup X, so we may fix r ∈ X with s − ε < r. Since X is downward closed, we have that s − ε ∈ X, so s − ε < b. It follows that z = s − b < ε.
From now on, we'll use the more natural notation 1/b for *f(b) whenever b ≠ 0.

Definition 8.3.7.
1. Z = {a ∈ *R : |a| < ε for all ε ∈ R with ε > 0}. We call Z the set of infinitesimals.
2. F = {a ∈ *R : |a| < r for some r ∈ R with r > 0}. We call F the set of finite or limited elements.
3. I = *R \ F. We call I the set of infinite or unlimited elements.
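As a purely illustrative aside that we add here (not from the text): the three classes Z, F, and I can be mimicked in the ordered field of Laurent polynomials in a formal positive infinitesimal eps. This field is not a model of Th(R), but it does contain infinitesimal and infinite elements. Representing an element as a dictionary from exponents to coefficients, the smallest exponent carrying a nonzero coefficient decides the classification. All function names below are ours.

```python
from fractions import Fraction

def lead(a):
    """Smallest exponent with a nonzero coefficient (None for the zero element)."""
    ks = [k for k, c in a.items() if c != 0]
    return min(ks) if ks else None

def in_Z(a):
    """Infinitesimal: smaller in absolute value than every positive real."""
    v = lead(a)
    return v is None or v > 0

def in_F(a):
    """Finite (limited): bounded in absolute value by some positive real."""
    v = lead(a)
    return v is None or v >= 0

eps   = {1: Fraction(1)}                  # a positive infinitesimal: in Z and in F
omega = {-1: Fraction(1)}                 # 1/eps: infinite, so in I = *R \ F
x     = {0: Fraction(3), 1: Fraction(5)}  # 3 + 5*eps: finite but not infinitesimal

assert in_Z(eps) and in_F(eps)
assert not in_F(omega)
assert in_F(x) and not in_Z(x)
```

Note that Z ⊆ F in this sketch, matching the definitions: every infinitesimal is in particular finite.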
Proposition 8.3.8.
1. Z is a subring of *R.
2. F is a subring of *R.
3. Z is a prime ideal of F.
Proof.
1. First notice that Z ≠ ∅ because 0 ∈ Z (or we can use Proposition 8.3.6). Suppose that a, b ∈ Z. Let ε ∈ R with ε > 0. We have that ε/2 ∈ R and ε/2 > 0, hence |a| < ε/2 and |b| < ε/2. It follows that
|a − b| = |a + (−b)| ≤ |a| + |−b| = |a| + |b| < ε/2 + ε/2 = ε
Therefore, a − b ∈ Z. We also have that |a| < 1 and |b| < ε, hence
|a · b| = |a| · |b| < 1 · ε = ε
Therefore, a · b ∈ Z.
2. Clearly, F ≠ ∅. Suppose that a, b ∈ F, and fix r1, r2 ∈ R with r1, r2 > 0 such that |a| < r1 and |b| < r2. We have
|a − b| = |a + (−b)| ≤ |a| + |−b| = |a| + |b| < r1 + r2
so a − b ∈ F. We also have
|a · b| = |a| · |b| < r1 · r2
so a · b ∈ F.
3. We first show that Z is an ideal of F. Suppose that a ∈ F and b ∈ Z. Fix r ∈ R with r > 0 and |a| < r. Let ε ∈ R with ε > 0. We then have that ε/r ∈ R and ε/r > 0, hence |b| < ε/r. It follows that
|a · b| = |a| · |b| < r · (ε/r) = ε
Therefore, a · b ∈ Z.
We now show that Z is a prime ideal of F. Suppose that a, b ∈ F \ Z. We have a · b ∈ F by part 2. Fix ε, δ ∈ R with ε, δ > 0 such that |a| > ε and |b| > δ. We then have |a · b| = |a| · |b| > ε · δ, hence a · b ∉ Z.
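Part 3 has a concrete echo in a toy calculation we add for illustration (not from the text): for Laurent polynomials in a formal infinitesimal eps, stored as exponent-to-coefficient dictionaries, leading exponents add under multiplication. A product is therefore infinitesimal (leading exponent > 0) only if some factor already is, which is exactly the primeness of Z. The names here are ours.

```python
def lead(a):
    """Smallest exponent with a nonzero coefficient (None for the zero element)."""
    ks = [k for k, c in a.items() if c != 0]
    return min(ks) if ks else None

def mul(a, b):
    """Multiply two polynomials in eps given as exponent -> coefficient dicts."""
    out = {}
    for j, cj in a.items():
        for k, ck in b.items():
            out[j + k] = out.get(j + k, 0) + cj * ck
    return out

a = {0: 2, 1: 7}    # 2 + 7*eps: finite, not infinitesimal
b = {0: -3, 2: 1}   # -3 + eps**2: finite, not infinitesimal
assert lead(mul(a, b)) == 0    # the product is again not infinitesimal

c = {1: 5}          # 5*eps: infinitesimal
assert lead(mul(a, c)) == 1    # finite times infinitesimal is infinitesimal
```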

Definition 8.3.9. Let a, b ∈ *R.
1. We write a ≈ b to mean that a − b ∈ Z.
2. We write a ∼ b to mean that a − b ∈ F.
Proposition 8.3.10. ≈ and ∼ are equivalence relations on *R.
Definition 8.3.11. Let a ∈ *R. The ≈-equivalence class of a is called the halo of a. The ∼-equivalence class of a is called the galaxy of a.
Proposition 8.3.12. Let a1, b1, a2, b2 ∈ *R with a1 ≈ b1 and a2 ≈ b2.
1. a1 + a2 ≈ b1 + b2.
2. a1 − a2 ≈ b1 − b2.
3. If a1, b1, a2, b2 ∈ F, then a1 · a2 ≈ b1 · b2.
4. If a1, b1, a2, b2 ∈ F \ Z, then a1/a2 ≈ b1/b2.
Proof.
1. We have a1 − b1 ∈ Z and a2 − b2 ∈ Z, hence
(a1 + a2) − (b1 + b2) = (a1 − b1) + (a2 − b2)
is in Z by Proposition 8.3.8.
2. We have a1 − b1 ∈ Z and a2 − b2 ∈ Z, hence
(a1 − a2) − (b1 − b2) = (a1 − b1) − (a2 − b2)
is in Z by Proposition 8.3.8.
3. We have a1 − b1 ∈ Z and a2 − b2 ∈ Z. Now
a1 · a2 − b1 · b2 = a1 · a2 − a1 · b2 + a1 · b2 − b1 · b2 = a1 · (a2 − b2) + b2 · (a1 − b1)
so a1 · a2 − b1 · b2 ∈ Z by Proposition 8.3.8.
4. We have a1 − b1 ∈ Z and a2 − b2 ∈ Z. Now
a1/a2 − b1/b2 = (a1 · b2 − a2 · b1)/(a2 · b2) = (1/(a2 · b2)) · (a1 · b2 − a2 · b1)
and we know by part 3 that a1 · b2 − a2 · b1 ∈ Z. Since a2, b2 ∈ F \ Z, it follows that a2 · b2 ∈ F \ Z by Proposition 8.3.8. Therefore, 1/(a2 · b2) ∈ F (if ε > 0 is such that |a2 · b2| > ε, then |1/(a2 · b2)| < 1/ε), so a1/a2 − b1/b2 ∈ Z by Proposition 8.3.8.
Proposition 8.3.13. For every a ∈ F, there exists a unique r ∈ R such that a ≈ r.
Proof. Fix a ∈ F. We first prove existence. Let
X = {r ∈ R : r < a}
and notice that X is downward closed, nonempty, and bounded above because a ∈ F. Now let s = sup X ∈ R and argue as in Case 3 of Proposition 8.3.6 that a ≈ s.
Suppose now that r1, r2 ∈ R are such that a ≈ r1 and a ≈ r2. We then have that r1 ≈ r2 because ≈ is an equivalence relation. If r1 ≠ r2, this is a contradiction, because |r1 − r2|/2 ∈ R is positive and |r1 − r2| > |r1 − r2|/2. Therefore, r1 = r2.


Definition 8.3.14. We define a map st : F → R by letting st(a) be the unique r ∈ R such that a ≈ r. We call st(a) the standard part or shadow of a.
Corollary 8.3.15. The function st : F → R is a ring homomorphism and ker(st) = Z.
Proposition 8.3.16. Suppose that A ⊆ R, that f : A → R, and that r, ℓ ∈ R. Suppose also that there exists δ > 0 such that (r − δ, r + δ) \ {r} ⊆ A. The following are equivalent.
1. lim_{x→r} f(x) = ℓ.
2. For all a ≈ r with a ≠ r, we have *f(a) ≈ ℓ.
Proof. Suppose first that lim_{x→r} f(x) = ℓ. Fix a ∈ *A \ {r} with a ≈ r. Let ε ∈ R with ε > 0. Since lim_{x→r} f(x) = ℓ, we may fix δ ∈ R with δ > 0 such that |f(x) − ℓ| < ε whenever x ∈ A and 0 < |x − r| < δ. Now the sentence
∀x((x ∈ A ∧ 0 < |x − r| < δ) → |f(x) − ℓ| < ε)
is in Th(R) = Th(*R). Now we have a ∈ *A and 0 < |a − r| < δ, hence |*f(a) − ℓ| < ε. Since ε was arbitrary, it follows that *f(a) ≈ ℓ.
Suppose now that for all a ≈ r with a ≠ r, we have *f(a) ≈ ℓ. Fix z ∈ Z with z > 0. Let ε ∈ R with ε > 0. By assumption, whenever a ∈ *A and 0 < |a − r| < z, we have that *f(a) ≈ ℓ, and hence |*f(a) − ℓ| < ε. Thus, the sentence
∃δ(δ > 0 ∧ ∀x((x ∈ A ∧ 0 < |x − r| < δ) → |f(x) − ℓ| < ε))
is in Th(*R) = Th(R). By fixing a witnessing δ ∈ R, we see that the limit condition holds for ε.
Proposition 8.3.17. Suppose that A ⊆ R, that f, g : A → R, and that r, ℓ, m ∈ R. Suppose also that there exists δ > 0 such that (r − δ, r + δ) \ {r} ⊆ A, that lim_{x→r} f(x) = ℓ, and that lim_{x→r} g(x) = m. We then have:
1. lim_{x→r} (f + g)(x) = ℓ + m.
2. lim_{x→r} (f − g)(x) = ℓ − m.
3. lim_{x→r} (f · g)(x) = ℓ · m.
4. If m ≠ 0, then lim_{x→r} (f/g)(x) = ℓ/m.
Proof. Fix a ≈ r with a ≠ r. We then have *f(a) ≈ ℓ and *g(a) ≈ m. Using Proposition 8.3.12, we have:
1. *(f + g)(a) = *f(a) + *g(a) ≈ ℓ + m.
2. *(f − g)(a) = *f(a) − *g(a) ≈ ℓ − m.
3. *(f · g)(a) = *f(a) · *g(a) ≈ ℓ · m.
4. *(f/g)(a) = *f(a)/*g(a) ≈ ℓ/m (notice that *g(a) ∉ Z because m ≠ 0).

Corollary 8.3.18. Suppose that A ⊆ R, that f : A → R, and that r ∈ R. Suppose also that there exists δ > 0 such that (r − δ, r + δ) ⊆ A. The following are equivalent.
1. f is continuous at r.
2. For all a ≈ r, we have *f(a) ≈ f(r).
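This characterization of continuity can be watched in a small symbolic computation (our illustration, with truncated polynomials in a formal infinitesimal eps standing in for elements of *R): for f(x) = x² and the point a = r + eps ≈ r, expanding *f(a) and taking the coefficient of eps⁰ (the shadow) recovers f(r). The helper names are ours.

```python
from fractions import Fraction

def mul(a, b):
    """Multiply polynomials in eps stored as exponent -> coefficient dicts."""
    out = {}
    for j, cj in a.items():
        for k, ck in b.items():
            out[j + k] = out.get(j + k, 0) + cj * ck
    return out

def st(a):
    """Shadow of a finite element: its coefficient of eps**0."""
    return a.get(0, Fraction(0))

r = Fraction(3)
a = {0: r, 1: Fraction(1)}   # a = r + eps, so a is in the halo of r and a != r
fa = mul(a, a)               # *f(a) for f(x) = x**2
assert fa == {0: r * r, 1: 2 * r, 2: Fraction(1)}
assert st(fa) == r * r       # *f(a) and f(r) differ by an infinitesimal
```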
Taking the standard definition of the derivative, we immediately get the following.

Corollary 8.3.19. Suppose that A ⊆ R, that f : A → R, and that r, ℓ ∈ R. Suppose also that there exists δ > 0 such that (r − δ, r + δ) ⊆ A. The following are equivalent.
1. f is differentiable at r with f'(r) = ℓ.
2. For all a ≈ r with a ≠ r, we have (*f(a) − f(r))/(a − r) ≈ ℓ.
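The difference-quotient characterization above is precisely the idea behind dual-number automatic differentiation. In the sketch below (our illustration; dual numbers truncate at eps² = 0 and are not a model of Th(R)), the eps-coefficient of f(r + eps) plays the role of the shadow of (*f(a) − f(r))/(a − r) at a = r + eps.

```python
class Dual:
    """Numbers a + b*eps with eps**2 = 0; the b-component tracks derivatives."""
    def __init__(self, a, b=0.0):
        self.a, self.b = a, b
    def _coerce(self, o):
        return o if isinstance(o, Dual) else Dual(o)
    def __add__(self, o):
        o = self._coerce(o)
        return Dual(self.a + o.a, self.b + o.b)
    __radd__ = __add__
    def __sub__(self, o):
        o = self._coerce(o)
        return Dual(self.a - o.a, self.b - o.b)
    def __mul__(self, o):
        o = self._coerce(o)
        return Dual(self.a * o.a, self.a * o.b + self.b * o.a)
    __rmul__ = __mul__

def derivative(f, r):
    """The eps-coefficient of f(r + eps): the shadow of the difference quotient."""
    return f(Dual(r, 1.0)).b

assert derivative(lambda x: x * x * x, 2.0) == 12.0   # (x**3)' = 3*x**2 at x = 2

g = lambda x: x * x + 1.0
f = lambda x: x * x
assert derivative(lambda x: f(g(x)), 3.0) == 120.0    # matches f'(g(3)) * g'(3)
```

The last assertion also previews the chain rule proved below: the composite's derivative is f'(g(r)) · g'(r).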

Proposition 8.3.20. If f is differentiable at r, then f is continuous at r.
Proof. Fix a ≈ r with a ≠ r. Since f is differentiable at r, we have
(*f(a) − f(r))/(a − r) ≈ f'(r)
Now f'(r) ∈ F, so (*f(a) − f(r))/(a − r) ∈ F, and hence *f(a) − f(r) ∈ Z because a − r ∈ Z. It follows that *f(a) ≈ f(r).
Proposition 8.3.21. Suppose that f, g : R → R and r ∈ R. Suppose also that g is differentiable at r and f is differentiable at g(r). We then have that f ∘ g is differentiable at r and (f ∘ g)'(r) = f'(g(r)) · g'(r).
Proof. We know that for all a ≈ r with a ≠ r, we have
(*g(a) − g(r))/(a − r) ≈ g'(r)
Also, for all b ≈ g(r) with b ≠ g(r), we have
(*f(b) − f(g(r)))/(b − g(r)) ≈ f'(g(r))
Now fix a ≈ r with a ≠ r. Since g is continuous at r, we have *g(a) ≈ g(r). If *g(a) ≠ g(r), then
(*(f ∘ g)(a) − (f ∘ g)(r))/(a − r) = (*f(*g(a)) − f(g(r)))/(a − r)
= ((*f(*g(a)) − f(g(r)))/(*g(a) − g(r))) · ((*g(a) − g(r))/(a − r))
≈ f'(g(r)) · g'(r)
Suppose then that *g(a) = g(r). Since the first displayed line above holds for our a, and *g(a) − g(r) = 0, we must have g'(r) ≈ 0, and hence g'(r) = 0 because g'(r) ∈ R. Therefore,
(*(f ∘ g)(a) − (f ∘ g)(r))/(a − r) = (*f(*g(a)) − f(g(r)))/(a − r) = 0 = f'(g(r)) · g'(r)

Chapter 9

Introduction to Axiomatic Set Theory

"No one shall expel us from the paradise that Cantor has created." - David Hilbert

9.1 Why Set Theory?

Set theory originated in an attempt to understand and somehow classify "small" or "negligible" sets of real numbers. Cantor's early explorations in the realm of the transfinite were motivated by a desire to understand the points of convergence of trigonometric series. The basic ideas quickly became a fundamental part of analysis.
Since then, set theory has become a way to unify mathematical practice and the way in which mathematicians deal with the infinite in all areas of mathematics. You've all seen the proof that the set of real numbers is uncountable, but what more can be said? Exactly how uncountable is the set of real numbers? Does this taming of the infinite give us any new tools to prove interesting mathematical theorems? Is there anything more that the set-theoretic perspective provides to the mathematical toolkit other than a crude notion of size and cute diagonal arguments?
We begin by listing a few basic questions from various areas of mathematics that can only be tackled with a well-defined theory of the infinite, which set theory provides.
Algebra: A fundamental result in linear algebra is that every finitely generated vector space has a basis, and any two bases have the same size. We call the unique size of any basis of a vector space the dimension of that space. What can be said about vector spaces that aren't finitely generated? Does every vector space have a basis? Is there a meaningful way to assign a "dimension" to every vector space in such a way that two vector spaces over the same field are isomorphic if and only if they have the same dimension? We need a well-defined and robust notion of infinite sets and infinite cardinality to deal with these questions.
Analysis: Lebesgue's theory of measure and integration requires an important distinction between countable and uncountable sets. Aside from this use, the study of the basic structure of the Borel sets or the projective sets (an extension of the Borel sets) requires some sophisticated use of set theory, in a way that can be made precise.
Foundations: A remarkable side-effect of our undertaking to systematically formalize the infinite is that
we can devise a formal axiomatic and finitistic system in which virtually all of mathematical practice can
be embedded in an extremely faithful manner. Whether this fact is interesting or useful depends on your
philosophical stance about the nature of mathematics, but it does have an important consequence. It puts us
in a position to prove that certain statements do not follow from the axioms (which have now been formally
defined and are thus susceptible to a mathematical analysis), and hence can not be proven by the currently
accepted axioms. For better or worse, this feature has become the hallmark of set theory. For example, we
can ask questions like:
1. Do we really need the Axiom of Choice to produce a nonmeasurable set of real numbers?
2. Is there an uncountable set of real numbers which can not be in one-to-one correspondence with the
set of all real numbers?
Aside from these ideas which are applicable to other areas of mathematics, set theory is a very active
area of mathematics with its own rich and beautiful structure, and deserves study for this reason alone.

9.2 Motivating the Axioms

In every modern mathematical theory (say group theory, topology, or the theory of Banach spaces), we start with a list of axioms, and derive results from these. In most of the fields that we axiomatize in this way, we have several models of the axioms in mind (many different groups, many different topological spaces, etc.), and we're using the axiomatization to prove abstract results which will be applicable to each of these models. In set theory, you may think that it is our goal to study one unique universe of sets, so our original motivation in writing down axioms is simply to state precisely what we are assuming in an area that can often be very counterintuitive. Since we will build our system in first-order logic, it turns out that there are many models of set theory as well (assuming that there is at least one...), and this is the basis for proving independence results, but this isn't our initial motivation. This section will be a little informal. We'll give the formal axioms (in a formal first-order language) and derive consequences starting in the next section.
Whether the axioms that we are writing down now are "obviously true", "correct", "justified", or even worthy of study are very interesting philosophical questions, but I will not spend much time on them here. Regardless of their epistemological status, they are now nearly universally accepted as the right axioms to use in the development of set theory. The objects of our theory are sets, and we have one binary relation ∈ which represents set membership. That is, we write x ∈ y to mean that x is an element of y. We begin with an axiom which ensures that our theory is not vacuous.
Axiom of Existence: There exists a set.
We need to have an axiom which says how equality of sets is determined in terms of the membership relation. In mathematical practice using naive set theory, the most common way to show that two sets A and B are equal is to show that each is a subset of the other. We therefore define A ⊆ B to mean that for all x ∈ A, we have x ∈ B, and we want to be able to conclude that A = B from the facts that A ⊆ B and B ⊆ A. That is, we want to think of a set as being completely determined by its members, thus linking = and ∈, but we need to codify this as an axiom.
Axiom of Extensionality: For any two sets A and B, if A ⊆ B and B ⊆ A, then A = B.
The Axiom of Extensionality implicitly carries a few perhaps unexpected consequences about the nature of sets. First, if a is a set, then we should consider the two sets {a} and {a, a} (if we are allowed to assert their existence) to be equal, because they have the same elements. Similarly, if a and b are sets, then we should consider {a, b} and {b, a} to be equal. Hence, whatever a set is, it should be inherently unordered and have no notion of multiplicity. Also, since the only objects we are considering are sets, we are ruling out the existence of "atoms" other than the empty set, i.e. objects a which are not the empty set but which have no elements.
We next need some rules about how we are allowed to build sets. The naive idea is that any property we write down determines a set. That is, for any property P of sets, we may form the set {x : P(x)}. For example, if you have a group G, you may form the center of G, given by Z(G) = {x : x ∈ G and xy = yx for all y ∈ G}. Of course, this naive approach leads to the famous contradiction known as Russell's paradox.
Let P(x) be the property x ∉ x, and let z = {x : P(x)} = {x : x ∉ x}. We then have z ∈ z if and only if z ∉ z, a contradiction.
This gives our first indication that it may be in our best interest to tread carefully when giving rules about how to build sets. One now standard reaction to Russell's paradox and other similar paradoxes in naive set theory is that the set-theoretic universe is too large to encapsulate into one set. Thus, we shouldn't allow ourselves the luxury of forming the set {x : P(x)}, because by doing so we may package too much into one set, and the set-theoretic universe is too large to make this permissible. In other words, we should only christen something as a set if it is not "too large".
However, if we already have a set A and a property P, we should be allowed to form {x ∈ A : P(x)}, because A is a set (hence not "too large"), so we should be allowed to assert that the subcollection consisting of those sets x in A such that P(x) holds is in fact a set. For example, if we have a group G (so G is already known to be a set), its center Z(G) is a set because Z(G) = {x ∈ G : xy = yx for all y ∈ G}. Therefore, we put forth the following axiom.
Axiom of Separation: For any set A and any property P of sets, we may form the set consisting of precisely those x ∈ A such that P(x), i.e. we may form the set {x ∈ A : P(x)}.
You may object to this axiom because of the vague notion of a "property of sets", and that would certainly be a good point. We'll make it precise when we give the formal first-order axioms in the next section. The Axiom of Separation allows us to form sets from describable subcollections of sets we already know exist, but we currently have no way to build larger sets from smaller ones. We now give axioms which allow us to build up sets in a permissible manner.
Our first axiom along these lines will allow us to conclude that for any two sets x and y, we may put them together into a set {x, y}. Since we already have the Axiom of Separation, we will state the axiom in the (apparently) weaker form that for any two sets x and y, there is a set with both x and y as elements.
Axiom of Pairing: For any two sets x and y, there is a set A such that x ∈ A and y ∈ A.
We next want to have an axiom which allows us to take unions. However, in mathematics, we often want to take a union over a family of sets, possibly infinite. For example, we may have a set An for each natural number n, and then want to consider ⋃_{n∈N} An. By being clever, we can incorporate all of these ideas of taking unions into one axiom. The idea is the following. Suppose that we have two sets A and B, say A = {u, v, w} and B = {x, z}. We want to be able to assert the existence of the union of A and B, which is {u, v, w, x, z}. First, by the Axiom of Pairing, we may form the set F = {A, B}, which equals {{u, v, w}, {x, z}}. Now the union of A and B is the set of elements of elements of F. In the above example, if we can form the set F = {A1, A2, A3, . . . } (later axioms will justify this), then ⋃_{n∈N} An is the set of elements of elements of F. Again, in the presence of the Axiom of Separation, we state this axiom in the (apparently) weaker form that for any set F, there is a set containing all elements of elements of F.
Axiom of Union: For any set F, there is a set U such that for all sets x, if there exists A ∈ F with x ∈ A, then x ∈ U.
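Read computationally, the axiom says the union of F is the set of "elements of elements" of F. A quick sketch we add for illustration, reusing the example sets from the text:

```python
A = {"u", "v", "w"}
B = {"x", "z"}
F = [A, B]   # playing the role of F = {A, B}, obtained from Pairing

# U collects the elements of elements of F, as the Axiom of Union permits
U = {x for S in F for x in S}
assert U == {"u", "v", "w", "x", "z"}
```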
We next put forward two axioms which really allow the set-theoretic universe to expand. The first is the Power Set Axiom, which tells us that if we have a set A, it is permissible to form the set consisting of all subsets of A.
Axiom of Power Set: For any set A, there is a set F such that for all sets B, if B ⊆ A, then B ∈ F.
Starting with the empty set ∅ (which exists using the Axiom of Existence and the Axiom of Separation), we can build a very rich collection of finite sets using the above axioms. For example, we can form {∅} using the Axiom of Pairing. We can also form {∅} by applying the Axiom of Power Set to ∅. We can then go on to form {∅, {∅}} and many other finite sets. However, our axioms provide no means to build an infinite set.
Before getting to the Axiom of Infinity, we will lay some groundwork about ordinals. If set theory is going to serve as a basis for mathematics, we certainly need to be able to embed within it the natural numbers. It seems natural to represent the number n as some set which we think of as having n elements. Which set should we choose? Let's start from the bottom up. The natural choice to play the role of 0 is ∅, because it is the only set without any elements. Now that we have 0, and we want 1 to be a set with one element, perhaps we should let 1 be the set {0} = {∅}. Next, a canonical choice for a set with two elements is {0, 1}, so we let 2 = {0, 1} = {∅, {∅}}. In general, if we have defined 0, 1, 2, . . . , n, we can let n + 1 = {0, 1, . . . , n}. This way of defining the natural numbers has many advantages which we'll come to appreciate. For instance, we'll have n < m if and only if n ∈ m, so we may use the membership relation to define the standard ordering of the natural numbers.
However, the ". . ." in the above definition of n + 1 may make you a little nervous. Fortunately, we can give another description of n + 1 which avoids this unpleasantness. If we've defined n, we let n + 1 = n ∪ {n}, which we can justify the existence of using the Axiom of Pairing and the Axiom of Union. The elements of n + 1 will then be n, together with the elements of n, which should inductively be the natural numbers up to, but not including, n.
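The recursion n + 1 = n ∪ {n} is easy to carry out concretely. The sketch below (our illustration; the function names are ours) builds the finite von Neumann numerals as Python frozensets and checks that n has exactly n elements and that membership agrees with the usual order.

```python
def succ(x):
    """S(x) = x ∪ {x}, justified by the Axioms of Pairing and Union."""
    return x | frozenset([x])

def numeral(n):
    """The set representing n: start from 0 = ∅ and apply succ n times."""
    x = frozenset()
    for _ in range(n):
        x = succ(x)
    return x

assert len(numeral(4)) == 4          # n has exactly n elements: 0, 1, ..., n-1
assert numeral(2) in numeral(5)      # n < m if and only if n ∈ m
assert numeral(5) not in numeral(2)
assert numeral(3) == frozenset({numeral(0), numeral(1), numeral(2)})
```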
Using the above outline, we can use our axioms to justify the existence of any particular natural number n (or, more precisely, the set that we've chosen to represent our idea of the natural number n). However, we can't justify the existence of the set of natural numbers {0, 1, 2, 3, . . . }. To enable us to do this, we make the following definition. For any set x, let S(x) = x ∪ {x}. We call S(x) the successor of x. We want an axiom which says that there is a set containing 0 = ∅ which is closed under successors.
Axiom of Infinity: There exists a set A such that ∅ ∈ A and for all x, if x ∈ A, then S(x) ∈ A.
With the Axiom of Infinity asserting existence, it's not too difficult to use the above axioms to show that there is a smallest (with respect to ⊆) set A such that ∅ ∈ A and for all x, if x ∈ A, then S(x) ∈ A. Intuitively, this set is the collection of all natural numbers. Following standard set-theoretic practice, we denote this set by ω (this strange choice, as opposed to the typical N, conforms with the standard practice of using lowercase Greek letters to represent infinite ordinals).
With the set of natural numbers in hand, there's no reason to be timid and stop counting. We started with 0, 1, 2, . . . , where each new number consisted of collecting the previous numbers into a set, and we've now collected all natural numbers into a set ω. Why not continue the counting process by considering S(ω) = ω ∪ {ω} = {0, 1, 2, . . . , ω}? We call this set ω + 1 for obvious reasons. This conceptual leap of counting into the so-called transfinite gives rise to the ordinals, the numbers which form the backbone of set theory.
Once we have ω + 1, we can then form the set ω + 2 = S(ω + 1) = {0, 1, 2, . . . , ω, ω + 1}, and continue on to ω + 3, ω + 4, and so on. Why stop there? If we were able to collect all of the natural numbers into a set, what's preventing us from collecting these into the set {0, 1, 2, . . . , ω, ω + 1, ω + 2, . . . }, and continuing? Well, our current axioms are preventing us, but we shouldn't let that stand in our way. If we can form ω, surely we should have an axiom allowing us to make this new collection a set. After all, if ω isn't "too large", this set shouldn't be "too large" either, since it's just another sequence of ω many sets after ω.
The same difficulty arises when you want to take the union of an infinite family of sets. In fact, the previous problem is a special case of this one, but in this generality it may feel closer to home. Suppose we have sets A0, A1, A2, . . . , that is, we have a set An for every n ∈ ω. Of course, we should be able to justify making the union ⋃_{n∈ω} An into a set. If we want to apply the Axiom of Union, we should first form the set F = {A0, A1, A2, . . . } and apply the axiom to F. However, in general, our current axioms don't justify forming this set, despite its similarity to asserting the existence of ω.
To remedy these defects, we need a new axiom. In light of the above examples, we want to say something along the lines of "if we can index a family of sets with ω, then we can form this family into a set". Using this principle, we should be able to form the set {ω, ω + 1, ω + 2, . . . }, and hence {0, 1, 2, . . . , ω, ω + 1, ω + 2, . . . } is a set by the Axiom of Union. Similarly, in the second example, we should be able to form the set {A0, A1, A2, . . . }. In terms of our restriction of not allowing sets to be "too large", this seems justified because if we consider ω to not be "too large", then any family of sets it indexes shouldn't be "too large" either.
There is no reason to limit our focus to ω. If we have any set A, and we can index a family of sets using A, then we should be able to assert the existence of a set containing the elements of the family. We also want to make the notion of indexing more precise, and we will do it using the currently vague notion of a property of sets as used in the Axiom of Separation.
Axiom of Collection: Suppose that A is a set and P(x, y) is a property of sets such that for every x ∈ A, there is a unique set y such that P(x, y) holds. Then there is a set B such that for every x ∈ A, we have y ∈ B for the unique y such that P(x, y) holds.
Our next axiom is often viewed as the most controversial due to its nonconstructive nature and the
sometimes counterintuitive results it allows us to prove. I will list it here as a fundamental axiom, but we
will avoid using it in the basic development of set theory below until we get to a position to see its usefulness
in mathematical practice.
The Axiom of Separation and the Axiom of Collection involved the somewhat vague notion of a property, but whenever we think of a property (and the way we will make the notion of property precise using a formal language), we have a precise unambiguous definition which describes the property in mind. Our next axiom, the Axiom of Choice, asserts the existence of certain sets without the need for such a nice description. Intuitively, it says that if we have a set consisting only of nonempty sets, there is a function which picks an element out of each of these nonempty sets, without requiring that there be a "definable" description of such a function. We haven't defined the notion of a function in set theory, and it takes a little work to do, so we will state the axiom in the following form: For every set F of nonempty pairwise disjoint sets, there is a set C consisting of exactly one element from each element of F. We think of C as a set which "chooses" an element from each of the elements of F. Slightly more precisely, we state the axiom as follows.
Axiom of Choice: Suppose that F is a set such that every A ∈ F is nonempty, and for every A, B ∈ F, if there exists a set x with x ∈ A and x ∈ B, then A = B. Then there exists a set C such that for every A ∈ F, there is a unique x ∈ C with x ∈ A.
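For a finite family we can sidestep the axiom entirely by writing a choice set down with an explicit rule, which is exactly why the Axiom of Choice only has bite when no "definable" description is available. A small sketch we add for illustration:

```python
# Nonempty, pairwise disjoint sets
F = [frozenset({1, 2}), frozenset({3}), frozenset({4, 5})]

# An explicit choice rule: take the least element of each member of F
C = {min(A) for A in F}
assert all(len(C & A) == 1 for A in F)   # C meets each A in exactly one point
```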
Our final axiom is in no way justified by mathematical practice, because it never appears in arguments outside set theory. It is also somewhat unique among our axioms in that it asserts that certain types of sets do not exist. However, adopting it gives a much clearer picture of the set-theoretic universe, and it will come to play an important role in the study of set theory itself. As with the Axiom of Choice, we will avoid using it in the basic development of set theory below until we are able to see its usefulness to us.
The goal is to eliminate sets which appear circular in terms of the membership relation. For example, we want to forbid sets x such that x ∈ x (so there is no set x such that x = {x}). Similarly, we want to forbid the existence of sets x and y such that x ∈ y and y ∈ x. In more general terms, we don't want to have an infinite descending chain of sets, each a member of the next, such as having sets xn for each n ∈ ω such that · · · ∈ x2 ∈ x1 ∈ x0. We codify this by saying that every nonempty set A has an element which is "minimal" with respect to the membership relation.
Axiom of Foundation: If A is a nonempty set, then there exists x ∈ A such that there is no set z with both z ∈ A and z ∈ x.

9.3 Formal Axiomatic Set Theory

We now give the formal version of our axioms. We work in a first-order language L with a single binary relation symbol ∈. By working in this first-order language, we are able to make precise the vague notion of property discussed above by using first-order formulas instead. However, this comes at the cost of replacing the Axiom of Separation and the Axiom of Collection by infinitely many axioms (also called an axiom scheme), since we can't quantify over formulas within the theory itself. There are other more subtle consequences of formalizing the above intuitive axioms in first-order logic which we will discuss below.
Notice also that we allow parameters (denoted by ~p) in the Axioms of Separation and Collection, so that we will be able to derive statements which are universally quantified over a parameter, such as "For all groups G, the set Z(G) = {x ∈ G : xy = yx for all y ∈ G} exists", rather than having to reprove that Z(G) is a set for each group G that we know exists. Finally, notice how we can avoid using defined notions (like ⊆, ∅, and S(x) in the Axiom of Infinity) by expanding them out into our fixed language. For example, we replace x ⊆ y by ∀w(w ∈ x → w ∈ y), and replace ∅ ∈ z by ∃w(∀y(y ∉ w) ∧ w ∈ z) (we could also replace it by ∀w(∀y(y ∉ w) → w ∈ z)).
In each of the following axioms, when we write a formula φ(x1, x2, . . . , xk), we implicitly mean that the xi's are distinct variables and that every free variable of φ is one of the xi. We also use ~p to denote a finite sequence of variables p1, p2, . . . , pk. Notice that we don't need the Axiom of Existence, because it is true in all L-structures (recall that all L-structures are nonempty).
Axiom of Extensionality:
∀x∀y(∀w(w ∈ x ↔ w ∈ y) → x = y)
Axiom (Scheme) of Separation: For each formula φ(x, y, ~p) we have the axiom
∀~p∀y∃z∀x(x ∈ z ↔ (x ∈ y ∧ φ(x, y, ~p)))
Axiom of Pairing:
∀x∀y∃z(x ∈ z ∧ y ∈ z)
Axiom of Union:
∀x∃u∀z(∃y(z ∈ y ∧ y ∈ x) → z ∈ u)
Axiom of Power Set:
∀x∃z∀y(∀w(w ∈ y → w ∈ x) → y ∈ z)
Axiom of Infinity:
∃z(∃w(∀y(y ∉ w) ∧ w ∈ z) ∧ ∀x(x ∈ z → ∃y(∀w(w ∈ y ↔ (w ∈ x ∨ w = x)) ∧ y ∈ z)))
Axiom (Scheme) of Collection: For each formula φ(x, y, ~p) we have the axiom
∀~p∀w((∀x(x ∈ w → ∃y φ(x, y, ~p)) ∧ ∀x(x ∈ w → ∀u∀v((φ(x, u, ~p) ∧ φ(x, v, ~p)) → u = v)))
→ ∃z∀x(x ∈ w → ∃y(y ∈ z ∧ φ(x, y, ~p))))
Axiom of Choice:
∀z((∀x(x ∈ z → ∃w(w ∈ x)) ∧ ∀x∀y((x ∈ z ∧ y ∈ z ∧ ∃w(w ∈ x ∧ w ∈ y)) → x = y))
→ ∃c∀x(x ∈ z → (∃w(w ∈ x ∧ w ∈ c) ∧ ∀u∀v((u ∈ x ∧ v ∈ x ∧ u ∈ c ∧ v ∈ c) → u = v))))
Axiom of Foundation:
∀z(∃x(x ∈ z) → ∃x(x ∈ z ∧ ¬∃y(y ∈ z ∧ y ∈ x)))


Let AxZFC be the above set of sentences, and let ZFC = Cn(AxZFC) (ZFC stands for Zermelo-Fraenkel
set theory with Choice). Other presentations state the axioms of ZFC a little differently, but they all
give the same theory. Some people refer to the Axiom of Separation as the Axiom of Comprehension, but
"Comprehension" is sometimes also used to mean the contradictory statement (via Russell's Paradox) that
we can always form the set {x : P(x)}, so I prefer to call it Separation. Also, some presentations refer to
the Axiom of Collection as the Axiom of Replacement, but this name is more applicable to the statement
that replaces the last → in the statement of Collection with a ↔, and this formulation implies the Axiom of
Separation.

9.4  Working from the Axioms

We have set up ZFC as a first-order theory similar to the group axioms, ring axioms, or axioms of partial
orderings. Since we have two notions of implication (semantic and syntactic), in order to show that σ ∈ ZFC,
we can show that either AxZFC ⊨ σ or AxZFC ⊢ σ. Given your experience with syntactic deductions, I'm
guessing that you will jump on the first one.
When attempting to show that AxZFC ⊨ σ, we must take an arbitrary model of AxZFC and show that
it is a model of σ. Thus, we must be mindful of strange L-structures and perhaps unexpected models. For
example, let L be the language of set theory (so we have one binary relation symbol ∈) and let N be the
L-structure (N, <). Let's see which elements of AxZFC hold in N.
Axiom of Extensionality: In the structure N, this interprets as saying that whenever two elements of
N have the same elements of N less than them, then they are equal. This holds in N.
Axiom (Scheme) of Separation: This does not hold in N. Let φ(x, y) be the formula ∃w(w ∈ x). The
corresponding instance of Separation is:
∀y∃z∀x(x ∈ z ↔ (x ∈ y ∧ ∃w(w ∈ x)))
In the structure N, this interprets as saying that for all n ∈ N, there is an m ∈ N such that for all
k ∈ N, we have k < m if and only if k < n and k ≠ 0. This does not hold in N because if we consider
n = 2, there is no m ∈ N such that 0 ≮ m and yet 1 < m.
Axiom of Pairing: In the structure N, this interprets as saying that whenever m, n ∈ N, there exists
k ∈ N such that m < k and n < k. This holds in N because given m, n ∈ N, we may take k =
max{m, n} + 1.
Axiom of Union: In the structure N, this interprets as saying that whenever n ∈ N, there exists ℓ ∈ N
such that whenever k ∈ N has the property that there exists m ∈ N with k < m and m < n, then
k < ℓ. This holds in N because given n ∈ N, we may take ℓ = n, since if k < m and m < n, then k < n
by transitivity of < in N (in fact, we may take ℓ = n − 1 if n ≠ 0).
Axiom of Power Set: In the structure N, this interprets as saying that whenever n ∈ N, there exists
ℓ ∈ N such that whenever m ∈ N has the property that every k < m also satisfies k < n, then m < ℓ.
This holds in N because given n ∈ N, we may take ℓ = n + 1, since if m ∈ N has the property that
every k < m also satisfies k < n, then m ≤ n and hence m < n + 1.
Axiom of Infinity: In the structure N, this interprets as saying that there exists n ∈ N such that 0 < n
and whenever m < n, we have m + 1 < n. This does not hold in N.
Axiom (Scheme) of Collection: This holds in N, as we now check. Fix a formula φ(x, y, ~p). Interpreting
in N, we need to check that if we fix natural numbers ~q and an n ∈ N such that for all k < n there
exists a unique ℓ ∈ N such that (N, k, ℓ, ~q) ⊨ φ, then there exists m ∈ N such that for all k < n there
exists an ℓ < m such that (N, k, ℓ, ~q) ⊨ φ. Let's then fix natural numbers ~q and an n ∈ N, and suppose
that for all k < n there exists a unique ℓ ∈ N such that (N, k, ℓ, ~q) ⊨ φ. For each k < n, let ℓk be
the unique element of N such that (N, k, ℓk, ~q) ⊨ φ. Letting m = max{ℓk : k < n} + 1, we see that m
suffices. Therefore, this holds in N.
Axiom of Choice: In the structure N, this interprets as saying that whenever n ∈ N is such that
• every m < n is nonzero, and
• for all distinct ℓ, m < n, there is no k with k < ℓ and k < m,
then there exists m ∈ N such that for all k < n, there is exactly one ℓ ∈ N with ℓ < m and ℓ < k.
Notice that the only n ∈ N satisfying the hypothesis (that is, the above two conditions) is n = 0. Now
for n = 0, the condition is trivial because we may take m = 0, as there is no k < 0. Therefore, this
holds in N.
Axiom of Foundation: In the structure N, this interprets as saying that whenever n ∈ N has the
property that there is some m < n, then there exists m < n such that there is no k with k < m and
k < n. Notice that n ∈ N has the property that there is some m < n if and only if n ≠ 0. Thus, this
holds in N because if n ≠ 0, then we have that 0 < n and there is no k with k < 0 and k < n.
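These interpretations are easy to experiment with on a computer. The following Python sketch (an illustration of our own, not part of the development, with all searches artificially bounded so it proves nothing) models the structure N by interpreting membership as <, and confirms the behavior described above for Pairing and for the displayed instance of Separation.

```python
# A small sanity check of the L-structure N = (N, <), interpreting
# "x is an element of y" as x < y.  Searches are bounded by BOUND,
# so this is only an illustration, not a proof.

BOUND = 50

def mem(x, y):
    return x < y

def pairing_holds(m, n):
    # Pairing in N: some k lies strictly above both m and n.
    return any(mem(m, k) and mem(n, k) for k in range(BOUND))

def separation_instance_holds(n):
    # The instance for phi(x, y) = "there is w with w in x":
    # we need m with: k < m iff (k < n and k != 0), for all k.
    return any(all((k < m) == (k < n and k != 0) for k in range(BOUND))
               for m in range(BOUND))

assert all(pairing_holds(m, n) for m in range(10) for n in range(10))
assert not separation_instance_holds(2)   # the counterexample in the text
```

As the text notes, the failure at n = 2 is forced: taking k = 0 requires m = 0, which then contradicts the requirement 1 < m.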

Is AxZFC satisfiable? Can we somehow construct a model of AxZFC? These are interesting questions
with subtle answers. For now, you'll have to live with a set of axioms with no obvious models.
Thus, when we develop set theory below, we will be arguing semantically via models. Rather than
constantly saying "Fix a model M of AxZFC" at the beginning of each proof, and proceeding by showing
that (M, s) ⊨ φ for various φ, we will keep the models in the background and assume that we are living
inside one for each proof. When we are doing this, a "set" is simply an element of the universe M of our
model M, and given two sets a and b, we write a ∈ b to mean that (a, b) is an element of ∈M (the
interpretation of the symbol ∈ in M).
Also, although there is no hierarchy of sets in our axioms, we will often follow the practice of using
lowercase letters a, b, c, etc. to represent sets that we like to think of as having no internal structure (such
as numbers, elements of a group, points of a topological space), use capital letters A, B, C, etc. to represent
sets whose elements we like to think of as having no internal structure, and use script letters A, F, etc. to
represent sets of such sets.

9.5  ZFC as a Foundation for Mathematics

In the next two chapters, we'll show how to develop mathematics quite faithfully within the framework of ZFC.
This raises the possibility of using set theory as a foundation for mathematical practice. However, this seems
circular because our development of logic presupposed normal mathematical practice and naive set theory
(after all, we have the set of axioms of ZFC). It seems that logic depends on set theory and set theory
depends on logic, so how have we gained anything from a foundational perspective?
It is indeed possible, at least in principle, to get out of this vicious circle and have a completely finitistic
basis for mathematics. The escape is to buckle down and use syntactic arguments. Now there are infinitely
many axioms of ZFC (because of the two axiom schemes), but instead of showing that AxZFC ⊢ σ, we
can instead show that Γ ⊢ σ for a finite Γ ⊆ AxZFC (in which every line of the deduction has a finite
collection of formulas on the left-hand side). In this way, it would be possible in principle to make every
proof completely formal and finitistic, where each line follows from previous lines by one of our proof rules.
If we held ourselves to this style, then we could reduce mathematical practice to a game with finitely many
symbols (if you insisted, we could replace our infinite stock of variables Var with one variable symbol x and
a new symbol ′, and refer to x3 as x′′′, etc.) where each line could be mechanically checked according to our
finitely many rules. Thus, it would even be possible to program a computer to check every proof.


In practice (for human beings at least), the idea of giving deductions for everything is outlandish. Leaving
aside the fact that actually giving short deductions is often a painful endeavor in itself, it turns out that
even the most basic statements of mathematics, when translated into ZFC, are many thousands of symbols
long, and elementary mathematical proofs (such as, say, the Fundamental Theorem of Arithmetic) are many
thousands of lines long. We'll discuss how to develop the real numbers below, but any actual formulas
talking about real numbers would be ridiculously long and incomprehensible to the human reader. For
these reasons, and since the prospect of giving syntactic deductions for everything gives me nightmares, I
choose to argue everything semantically in the style of any other axiomatic subject in mathematics. It is an
interesting and worthwhile exercise, however, to imagine how everything could be done syntactically.

Chapter 10

Developing Basic Set Theory

10.1  First Steps

We first establish some basic set-theoretic facts carefully from the axioms.
Definition 10.1.1. If A and B are sets, we write A ⊆ B to mean that for all c ∈ A, we have c ∈ B.
Although the symbol ⊆ is not part of our language, we will often use ⊆ in our formulas and arguments.
This use is justified because it can always be transcribed into our language by replacing it with the
corresponding formula, as we did in the axioms.
Proposition 10.1.2. There is a unique set with no elements.
Proof. Fix a set b. By Separation applied to the formula x ≠ x, there is a set c such that for all a, we have
a ∈ c if and only if a ∈ b and a ≠ a. For all a, we have a = a, hence a ∉ c. Therefore, there is a set with no
elements. If c1 and c2 are two sets with no elements, then by the Axiom of Extensionality, we may conclude
that c1 = c2.
Definition 10.1.3. We use ∅ to denote the unique set with no elements.
As above, we will often use ∅ in our formulas and arguments despite the fact that there is no constant
in our language representing it. Again, this use can always be eliminated by replacing it with a formula
as we did in the axioms. We will continue to follow this practice without comment in the future when we
introduce new definitions to stand for sets for which ZFC proves existence and uniqueness. In each case, be
sure to understand how these definitions could be eliminated.
We now show how to turn the idea of Russell's Paradox into a proof that there is no universal set.
Proposition 10.1.4. There is no set u such that a ∈ u for every set a.
Proof. Suppose that u is a set and a ∈ u for every set a. By Separation applied to the formula x ∉ x, there
is a set c such that for all sets a, we have a ∈ c if and only if a ∈ u and a ∉ a. Since a ∈ u for every set a,
we have a ∈ c if and only if a ∉ a for every set a. Therefore, c ∈ c if and only if c ∉ c, a contradiction.
Proposition 10.1.5. For all sets a and b, there is a unique set c such that, for all sets d, we have d ∈ c if
and only if either d = a or d = b.
Proof. Let a and b be sets. By Pairing, there is a set e such that a ∈ e and b ∈ e. By Separation applied to
the formula x = a ∨ x = b (notice that we are using parameters a and b in this use of Separation), there is a
set c such that for all d, we have d ∈ c if and only if both d ∈ e and either d = a or d = b. It follows that a ∈ c,
b ∈ c, and for any d ∈ c, we have either d = a or d = b. Uniqueness again follows from Extensionality.

Corollary 10.1.6. For every set a, there is a unique set c such that, for all sets d, we have d ∈ c if and
only if d = a.
Proof. Apply the previous proposition with b = a.
Definition 10.1.7. Given two sets a and b, we use the notation {a, b} to denote the unique set guaranteed
to exist by Proposition 10.1.5. Given a set a, we use the notation {a} to denote the unique set guaranteed
to exist by Corollary 10.1.6.
Using the same style of argument, we can use Union and Separation to show that for every set F, there
is a unique set consisting precisely of elements of elements of F. The proof is an exercise.
Proposition 10.1.8. Let F be a set. There is a unique set U such that for all a, we have a ∈ U if and only
if there exists B ∈ F with a ∈ B.
Definition 10.1.9. Let F be a set. We use the notation ⋃F to denote the unique set guaranteed to exist
by the previous proposition. If A and B are sets, we use the notation A ∪ B to denote ⋃{A, B}.
We now introduce some notation which conforms with the normal mathematical practice of writing sets.
Definition 10.1.10. Suppose that φ(x, y, ~p) is a formula, and that B and ~q are sets. By Separation
and Extensionality, there is a unique set C such that for all sets a, we have a ∈ C if and only if a ∈ B and
φ(a, B, ~q). We denote this unique set by {a ∈ B : φ(a, B, ~q)}.
With unions in hand, what about intersections? As with unions, the general case to consider is when we
have a family of sets F. We then want to collect those a such that a ∈ B for all B ∈ F into a set. We do
need to be a little careful, however. What happens if F = ∅? It seems that our definition would want to
make the intersection of the sets in F consist of all sets, contrary to Proposition 10.1.4. However, this
is the only case which gives difficulty, because if F ≠ ∅, we can take the intersection to be a subset of one
(any) of the elements of F.
Proposition 10.1.11. Let F be a set with F ≠ ∅. There is a unique set I such that for all a, we have a ∈ I
if and only if a ∈ B for all B ∈ F.
Proof. Since F ≠ ∅, we may fix C ∈ F. Let I = {a ∈ C : ∀B ∈ F(a ∈ B)}. For all a, we have a ∈ I if and
only if a ∈ B for all B ∈ F. Uniqueness again follows from Extensionality.
Definition 10.1.12. Let F be a set with F ≠ ∅. We use the notation ⋂F to denote the unique set
guaranteed to exist by the previous proposition. If A and B are sets, we use the notation A ∩ B to denote
⋂{A, B}.
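The asymmetry between unions and intersections of families can be seen in a quick computational model. In the following Python sketch (a modeling of our own using frozensets, with helper names of our own choosing), the union of the empty family is simply ∅, while the intersection of an empty family is rejected, matching the hypothesis F ≠ ∅ in Proposition 10.1.11.

```python
# Union and intersection of a family F of sets, as in Definitions
# 10.1.9 and 10.1.12, modeled with Python frozensets.

def union(F):
    # all elements of elements of F; union of the empty family is empty
    return frozenset(a for B in F for a in B)

def intersection(F):
    # mirrors Proposition 10.1.11: fix some C in F, then separate out
    # those a in C belonging to every B in F
    if not F:
        raise ValueError("intersection of the empty family is not a set")
    F = list(F)
    return frozenset(a for a in F[0] if all(a in B for B in F))

F = {frozenset({1, 2, 3}), frozenset({2, 3, 4})}
assert union(F) == frozenset({1, 2, 3, 4})
assert intersection(F) == frozenset({2, 3})
assert union(frozenset()) == frozenset()
```

Note that `union(frozenset())` succeeds while `intersection(frozenset())` raises: in the model, as in the theory, only the empty intersection is problematic.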
If A is a set, then we cannot expect the "complement" of A to be a set, because the union of such a
purported set with A would be a set which has every set as an element, contrary to Proposition 10.1.4.
However, if A and B are sets with A ⊆ B, we can take the relative complement of A in B.
Proposition 10.1.13. Let A and B be sets with A ⊆ B. There is a unique set C such that for all a, we
have a ∈ C if and only if a ∈ B and a ∉ A.
Definition 10.1.14. Let A and B be sets with A ⊆ B. We use the notation B\A or B − A to denote the
unique set guaranteed to exist by the previous proposition.

10.2  Ordered Pairs and Cartesian Products

Since sets have no internal order to them, we need a way to represent ordered pairs. Fortunately (since it
means we don't have to extend our notion of set), there is a hack which allows us to build sets which capture
the notion of an ordered pair.
Definition 10.2.1. Given two sets a and b, we let (a, b) = {{a}, {a, b}}.
Proposition 10.2.2. Let a, b, c, d be sets. If (a, b) = (c, d), then a = c and b = d.
Proof. Suppose that a, b, c, d are sets and {{a}, {a, b}} = {{c}, {c, d}}. We first show that a = c. Since
{c} ∈ {{a}, {a, b}}, either {c} = {a} or {c} = {a, b}. In either case, we have a ∈ {c}, hence a = c. We
now need only show that b = d. Suppose instead that b ≠ d. Since {a, b} ∈ {{c}, {c, d}}, we have either
{a, b} = {c} or {a, b} = {c, d}. In either case, we conclude that b = c (because either b ∈ {c} or b ∈ {c, d},
and b ≠ d). Similarly, since {c, d} ∈ {{a}, {a, b}}, we have either {c, d} = {a} or {c, d} = {a, b}. In either
case, we conclude that d = a. Therefore, using the fact that a = c, it follows that b = d.
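The Kuratowski coding is concrete enough to play with directly. The following Python sketch (a modeling of our own using frozensets; the names pair and unpair are ours) builds pairs as in Definition 10.2.1, and unpair recovers the coordinates by exploiting the uniqueness guaranteed by Proposition 10.2.2.

```python
# The Kuratowski coding (a, b) = {{a}, {a, b}}, modeled with frozensets.

def pair(a, b):
    return frozenset({frozenset({a}), frozenset({a, b})})

def unpair(p):
    # Recover (a, b) from {{a}, {a, b}}: a is the common element of all
    # members of p, and b is the leftover element (or a when p = {{a}}).
    members = list(p)
    common = frozenset.intersection(*members)
    (a,) = common
    rest = frozenset.union(*members) - common
    b = a if not rest else next(iter(rest))
    return a, b

# Proposition 10.2.2 in action: equal pairs force equal coordinates.
assert pair(1, 2) != pair(2, 1)
assert pair(1, 1) == frozenset({frozenset({1})})   # {{a},{a,a}} collapses
assert unpair(pair(3, 7)) == (3, 7)
assert unpair(pair(4, 4)) == (4, 4)
```

The collapse {{a}, {a, a}} = {{a}} in the case a = b is exactly why the proof above must handle that case with care.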
We next turn to Cartesian products. Given two sets A and B, we would like to form the set {(a, b) : a ∈ A
and b ∈ B}. Justifying that we can collect these elements into a set takes a little work. The idea is as
follows. For each fixed a ∈ A, we can assert the existence of {a} × B = {(a, b) : b ∈ B} using Collection (and
Separation), because B is a set. Then using Collection (and Separation) again, we can assert the existence
of {{a} × B : a ∈ A}, since A is a set. The Cartesian product is then the union of this set. At later points,
we will consider this argument sufficient, but we give a slightly more formal version here to really see how
the axioms of Collection and Separation are applied and where the formulas come into play.
Proof. Let (b, x, a) be a formula expressing that x = (a, b) (think about how to write this down). We
have the statement
aB(b(b B !x(b, x, a)))
where ! is shorthand for there is a unique. Therefore, by Collection, we may conclude that
aBCb(b B x(x C (b, x, a)))
Next using Separation and Extensionality, we have
aB!Cb(b B x(x C (b, x, a)))
From this it follows that
ABa(a A !Cb(b B x(x C (b, x, a))))
Using Collection again, we may conclude that
ABFa(a A C(C F b(b B x(x C (b, x, a)))))
This implies
ABFab((a A b B) C(C F x(x C (b, x, a))))
Now let A and B be sets. From the last line above, we may conclude
that there exists F such that for all
S
a A and all b B, there exists C F with (a, b) C. Let D = F. Given any a A and b B, we then
have (a, b) D. Now applying Separation to the set D and the formula ab(a A b B (b, x, a)),
there is a set E such that for all x, we have x E if and only if there exists a A and b B with x = (a, b).
As usual, Extensionality gives uniqueness.
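The structure of this proof, namely build {a} × B for each a ∈ A, collect these sets into a family, and take the union, can be mirrored computationally. The following Python sketch (a modeling of our own with frozensets and the Kuratowski coding; the names pair and product are ours) follows exactly those steps.

```python
# Mirroring the proof of Proposition 10.2.3: form {a} x B for each a in A
# (Collection), gather these into a family F (Collection again), and take
# the union of F.

def pair(a, b):
    # Kuratowski coding (a, b) = {{a}, {a, b}}
    return frozenset({frozenset({a}), frozenset({a, b})})

def product(A, B):
    family = [frozenset(pair(a, b) for b in B) for a in A]   # {{a} x B : a in A}
    return frozenset().union(*family)                        # the union of F

A, B = {0, 1}, {1, 2, 3}
P = product(A, B)
assert len(P) == len(A) * len(B)          # pairs are distinct by Prop. 10.2.2
assert pair(0, 3) in P and pair(1, 1) in P
```

The count check relies on Proposition 10.2.2: distinct coordinate pairs yield distinct Kuratowski sets.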

10.3  Relations and Functions

Now that we have ordered pairs and Cartesian products, we can really make some progress.
Definition 10.3.1. A relation is a set R such that every set x ∈ R is an ordered pair. In other words, R is
a relation if ∀x(x ∈ R → ∃a∃b(x = (a, b))).
Given a relation R, we want to define its domain to be the set of first elements of ordered pairs which are
elements of R, and we want to define its range to be the set of second elements of ordered pairs which are
elements of R. These are good descriptions which can easily (though not shortly) be turned into formulas,
but we need to know that there is some set which contains all of these elements in order to apply Separation.
Since the elements of an ordered pair (a, b) = {{a}, {a, b}} are two deep, a good exercise is to convince
yourself that ⋃⋃R will work. This justifies the following definitions.
Definition 10.3.2. Let R be a relation.
1. dom(R) is the set of a such that there exists b with (a, b) ∈ R.
2. ran(R) is the set of b such that there exists a with (a, b) ∈ R.
Definition 10.3.3. Let R be a relation. We write aRb if (a, b) ∈ R.
Definition 10.3.4. Let A be a set. We say that R is a relation on A if dom(R) ⊆ A and ran(R) ⊆ A.
We define functions in the obvious way.
Definition 10.3.5. A function f is a relation such that for all a ∈ dom(f), there exists a unique
b ∈ ran(f) such that (a, b) ∈ f.
Definition 10.3.6. Let f be a function. We write f(a) = b if (a, b) ∈ f.
Definition 10.3.7. Let f be a function. f is injective (or an injection) if whenever f(a1) = b and f(a2) = b,
we have a1 = a2.
Definition 10.3.8. Let A and B be sets. We write f : A → B to mean that f is a function, dom(f) = A,
and ran(f) ⊆ B.
We are now in a position to define when a function f is surjective and bijective. Notice that surjectivity
and bijectivity are not properties of a function itself, because these notions depend on a set which you consider
to contain ran(f). Once we have a fixed such set in mind, however, we can make the definitions.
Definition 10.3.9. Let A and B be sets, and let f : A → B.
1. f is surjective (or a surjection) if ran(f) = B.
2. f is bijective (or a bijection) if f is injective and surjective.
Definition 10.3.10. Let A and B be sets.
1. We write A ⪯ B to mean that there is an injection f : A → B.
2. We write A ≈ B to mean that there is a bijection f : A → B.
Proposition 10.3.11. Let A, B, and C be sets.
1. If A ⪯ B and B ⪯ C, then A ⪯ C.
2. A ≈ A.
3. If A ≈ B, then B ≈ A.
4. If A ≈ B and B ≈ C, then A ≈ C.
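Since relations and functions are now literally sets of ordered pairs, the definitions above can be tested directly. The following Python sketch (using tuples in place of Kuratowski pairs; the helper names are our own) computes domains and ranges and checks the function and injectivity conditions.

```python
# Relations and functions as bare sets of ordered pairs, sketching
# Definitions 10.3.1-10.3.7.

def dom(R):
    return {a for (a, b) in R}

def ran(R):
    return {b for (a, b) in R}

def is_function(R):
    # exactly one pair per first coordinate
    return len(dom(R)) == len(R)

def is_injective(f):
    # exactly one pair per second coordinate
    return len(ran(f)) == len(f)

R = {(1, 2), (1, 3), (2, 3)}    # a relation that is not a function
f = {(1, 4), (2, 5), (3, 4)}    # a function, but not injective

assert dom(R) == {1, 2} and ran(R) == {2, 3}
assert not is_function(R)
assert is_function(f) and not is_injective(f)
```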

10.4  Orderings

Definition 10.4.1. Let R be a relation on a set A.
1. R is reflexive on A if for all a ∈ A, we have aRa.
2. R is symmetric on A if for all a, b ∈ A, if aRb then bRa.
3. R is asymmetric on A if for all a, b ∈ A, if aRb then it is not the case that bRa.
4. R is antisymmetric on A if for all a, b ∈ A, if aRb and bRa, then a = b.
5. R is transitive on A if for all a, b, c ∈ A, if aRb and bRc, then aRc.
6. R is connected on A if for all a, b ∈ A, either aRb, a = b, or bRa.
Definition 10.4.2. Let R be a relation on a set A.
1. R is a partial ordering on A if R is transitive on A and asymmetric on A.
2. R is a linear ordering on A if R is a partial ordering on A and R is connected on A.
3. R is a well-ordering on A if R is a linear ordering on A and for every X ⊆ A with X ≠ ∅, there exists
m ∈ X such that for all x ∈ X, either m = x or mRx.
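On a finite set, each property in Definition 10.4.1 can be checked by brute force. The following Python sketch (helper names ours) verifies that the usual strict order on {0, 1, 2, 3} is transitive, asymmetric, and connected, and hence a linear ordering in the sense of Definition 10.4.2.

```python
# Brute-force checks of the properties in Definition 10.4.1 for a
# relation R (a set of pairs) on a finite set A.

def is_transitive(R, A):
    return all((a, c) in R
               for a in A for b in A for c in A
               if (a, b) in R and (b, c) in R)

def is_asymmetric(R, A):
    return all(not ((a, b) in R and (b, a) in R) for a in A for b in A)

def is_connected(R, A):
    return all((a, b) in R or a == b or (b, a) in R for a in A for b in A)

A = {0, 1, 2, 3}
less = {(a, b) for a in A for b in A if a < b}

# < is a linear ordering on A in the sense of Definition 10.4.2.
assert is_transitive(less, A)
assert is_asymmetric(less, A)
assert is_connected(less, A)
```

Note that asymmetry applied with a = b already rules out aRa, which is why partial orderings in this (strict) sense are never reflexive.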

10.5  The Natural Numbers and Induction

We specifically added the Axiom of Infinity with the hope that it captured the idea of the set of natural
numbers. We now show how this axiom, in league with the others, allows us to embed the theory of the
natural numbers into set theory. We start by defining the initial natural number and successors of sets.
Definition 10.5.1. 0 = ∅.
Definition 10.5.2. Given a set x, we let S(x) = x ∪ {x}, and we call S(x) the successor of x.
With 0 and the notion of successor, we can then go on to define 1 = S(0), 2 = S(1) = S(S(0)), and
continue in this way to define any particular natural number. However, we are seeking to form the set of all
natural numbers.
Definition 10.5.3. A set I is inductive if 0 ∈ I and for all x ∈ I, we have S(x) ∈ I.
The Axiom of Infinity simply asserts the existence of some inductive set J. Intuitively, we have 0 ∈ J,
S(0) ∈ J, S(S(0)) ∈ J, and so on. However, J may very well contain more than just repeated applications
of S to 0. We now use the top-down approach to generation to define the natural numbers (the other two
approaches will not work yet because their definitions rely on the natural numbers).
Proposition 10.5.4. There is a smallest inductive set. That is, there is an inductive set K such that K ⊆ I
for every inductive set I.
Proof. By the Axiom of Infinity, we may fix an inductive set J. Let K = {x ∈ J : x ∈ I for every inductive
set I}. Notice that 0 ∈ K because 0 ∈ I for every inductive set I (and so, in particular, 0 ∈ J). Suppose
that x ∈ K. If I is inductive, then x ∈ I, hence S(x) ∈ I. It follows that S(x) ∈ I for every inductive set I
(and so, in particular, S(x) ∈ J), hence S(x) ∈ K. Therefore, K is inductive. By definition of K, we have
K ⊆ I whenever I is inductive.
By Extensionality, there is a unique smallest inductive set, so this justifies the following definition.


Definition 10.5.5. We denote the unique smallest inductive set by ω.

We think that ω captures our intuitive idea of the set of natural numbers, and it is now our goal to show
how to prove the basic statements about the natural numbers which are often accepted axiomatically. We
first define a relation < on ω. Remember, our intuitive idea is that ∈ captures the order relationship on the
natural numbers.
Definition 10.5.6.
1. We define a relation < on ω by setting < = {(n, m) ∈ ω × ω : n ∈ m}.
2. We define a relation ≤ on ω by setting ≤ = {(n, m) ∈ ω × ω : n < m or n = m}.
3. We define a relation > on ω by setting > = {(n, m) ∈ ω × ω : m < n}.
4. We define a relation ≥ on ω by setting ≥ = {(n, m) ∈ ω × ω : n > m or n = m}.
Lemma 10.5.7. There is no n ∈ ω with n < 0.
Proof. Since 0 = ∅, there is no set x such that x ∈ 0. Therefore, there is no n ∈ ω with n < 0.
Lemma 10.5.8. Let m, n ∈ ω. We have m < S(n) if and only if m ≤ n.
Proof. Let m, n ∈ ω. We then have S(n) ∈ ω since ω is inductive, and
m < S(n) ⇔ m ∈ S(n)
⇔ m ∈ n ∪ {n}
⇔ either m ∈ n or m ∈ {n}
⇔ either m < n or m = n
⇔ m ≤ n.
This proves the lemma.
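Since the numerals 0, S(0), S(S(0)), . . . are concrete finite objects, we can experiment with them directly. The following Python sketch (a modeling of our own, not part of the formal development) builds the first few numerals as frozensets and checks, on this finite fragment, that membership behaves like the usual order and that Lemma 10.5.8 holds.

```python
# The von Neumann numerals 0 = empty set and S(x) = x union {x},
# modeled with frozensets, with a bounded check that membership
# agrees with < (Definition 10.5.6) and that Lemma 10.5.8 holds.

def S(x):
    return x | frozenset({x})

numerals = [frozenset()]          # numerals[0] is 0 = empty set
for _ in range(8):
    numerals.append(S(numerals[-1]))

def lt(m, n):
    # m < n is defined to mean m is an element of n
    return m in n

for i, m in enumerate(numerals):
    assert len(m) == i            # the numeral n has exactly n elements
    for j, n in enumerate(numerals):
        assert lt(m, n) == (i < j)                    # < agrees with membership
        assert lt(m, S(n)) == (lt(m, n) or m == n)    # Lemma 10.5.8
```

Here `m == n` on frozensets is extensional equality, so the last assertion is exactly the chain of equivalences in the proof above, instantiated on the first nine numerals.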
Our primary objective is to show that < is a well-ordering on ω. Due to the nature of the definition of
ω, it seems that the only way to prove nontrivial results about ω is by induction. We state the Step Induction
Principle in two forms. The first is much cleaner and seemingly more powerful (because it immediately
implies the second, and we can quantify over sets but not over formulas), but the second is how one often
thinks about induction in practice (using "properties" of natural numbers) and will be the only form that
we can generalize to the collection of all ordinals.
Proposition 10.5.9 (Step Induction Principle on ω).
1. Suppose that X is a set, 0 ∈ X, and for all n ∈ ω, if n ∈ X then S(n) ∈ X. We then have ω ⊆ X.
2. For any formula φ(n, ~p), we have the sentence
∀~p((φ(0, ~p) ∧ (∀n ∈ ω)(φ(n, ~p) → φ(S(n), ~p))) → (∀n ∈ ω)φ(n, ~p))
where φ(S(n), ~p) is shorthand for the formula
∀x(∀y(y ∈ x ↔ (y ∈ n ∨ y = n)) → φ(x, ~p))
Proof.


1. Let Y = X ∩ ω. Notice first that 0 ∈ Y. Suppose now that n ∈ Y = X ∩ ω. We then have n ∈ ω
and n ∈ X, so S(n) ∈ ω (because ω is inductive), and S(n) ∈ X by assumption. Hence, S(n) ∈ Y.
Therefore, Y is inductive, so we may conclude that ω ⊆ Y. It follows that ω ⊆ X.
2. Fix sets ~q, and suppose φ(0, ~q) and (∀n ∈ ω)(φ(n, ~q) → φ(S(n), ~q)). Let X = {n ∈ ω : φ(n, ~q)}. Notice
that 0 ∈ X, and for all n ∈ ω, if n ∈ X then S(n) ∈ X by assumption. It follows from part 1 that
ω ⊆ X. Therefore, we have (∀n ∈ ω)φ(n, ~q).

With the Step Induction Principle in hand, we can begin to prove the basic facts about the natural
numbers. Our goal is to prove that < is a well-ordering on ω, but it will take some time to get there.
We first give a very simple inductive proof. For this proof only, we will give careful arguments using both
versions of Step Induction to show how a usual induction proof can be formalized in either way.
Lemma 10.5.10. For all n ∈ ω, we have 0 ≤ n.
Proof. The following two proofs correspond to the above two versions of the Induction Principle.
1. Let X = {n ∈ ω : 0 ≤ n}, and notice that 0 ∈ X. Suppose now that n ∈ X. We then have n ∈ ω
and 0 ≤ n, hence 0 < S(n) by Lemma 10.5.8, so S(n) ∈ X. Thus, by Step Induction, we have ω ⊆ X.
Therefore, for all n ∈ ω, we have 0 ≤ n.
2. Let φ(n) be the formula 0 ≤ n. We clearly have φ(0) because 0 = 0. Suppose now that n ∈ ω and
φ(n). We then have 0 ≤ n, hence 0 < S(n) by Lemma 10.5.8. It follows that φ(S(n)). Therefore, by
Step Induction, we have 0 ≤ n for all n ∈ ω.

We give a few more careful inductive proofs to illustrate
how parameters can be used. Afterwards, our later inductive proofs will be given in a more natural relaxed
style.
Our relation < is given by ∈, but it is only defined on elements of ω. We thus need the following
proposition, which says that every element of a natural number is a natural number.
Proposition 10.5.11. Suppose that n ∈ ω and m ∈ n. We then have m ∈ ω.
Proof. The proof is by induction on n; that is, we hold m fixed by treating it as a parameter. Thus, fix
a set m. Let X = {n ∈ ω : m ∈ n → m ∈ ω}. Notice that 0 ∈ X because m ∉ 0 = ∅. Suppose now that
n ∈ X. We show that S(n) ∈ X. Suppose that m ∈ S(n) = n ∪ {n}. We then know that either m ∈ n,
in which case m ∈ ω by induction (i.e. because n ∈ X), or m = n, in which case we clearly have m ∈ ω.
It follows that S(n) ∈ X. Therefore, by Step Induction, we may conclude that X = ω. Since m was
arbitrary, the result follows.
Proposition 10.5.12. < is transitive on ω.
Proof. We prove the result by induction on n. Fix k, m ∈ ω. Let X = {n ∈ ω : (k < m ∧ m < n) → k < n}.
We then have that 0 ∈ X vacuously, because we do not have m < 0 by Lemma 10.5.7. Suppose now that
n ∈ X. We show that S(n) ∈ X. Suppose that k < m and m < S(n) (if not, then S(n) ∈ X vacuously).
By Lemma 10.5.8, we have m ≤ n, hence either m < n or m = n. If m < n, then k < n because n ∈ X. If
m = n, then k < n because k < m. Therefore, in either case, we have k < n, and hence k < S(n) by Lemma
10.5.8. It follows that S(n) ∈ X. Thus, by Step Induction, we may conclude that X = ω. Since k, m ∈ ω
were arbitrary, the result follows.
Lemma 10.5.13. Let m, n ∈ ω. We have S(m) ≤ n if and only if m < n.


Proof. Suppose first that m, n ∈ ω and S(m) ≤ n.
Case 1: Suppose that S(m) = n. We have m < S(m) by Lemma 10.5.8, hence m < n.
Case 2: Suppose that S(m) < n. We have m < S(m) by Lemma 10.5.8, hence m < n by Proposition
10.5.12.
Therefore, for all m, n ∈ ω, if S(m) ≤ n, then m < n.
We prove the converse statement, that for all m, n ∈ ω, if m < n, then S(m) ≤ n, by induction on n. Fix
m ∈ ω. Let X = {n ∈ ω : m < n → S(m) ≤ n}. We have 0 ∈ X vacuously, because we do not have m < 0
by Lemma 10.5.7. Suppose now that n ∈ X. We show that S(n) ∈ X. Suppose that m < S(n) (otherwise,
S(n) ∈ X vacuously). By Lemma 10.5.8, we have m ≤ n.
Case 1: Suppose that m = n. We then have S(m) = S(n), hence S(n) ∈ X.
Case 2: Suppose that m < n. Since n ∈ X, we have S(m) ≤ n. By Lemma 10.5.8, we know that
n < S(n). If S(m) = n, this immediately gives S(m) < S(n), while if S(m) < n, we may conclude that
S(m) < S(n) by Proposition 10.5.12. Hence, we have S(n) ∈ X.
Thus, by Step Induction, we may conclude that X = ω. Since m ∈ ω was arbitrary, the result follows.
Lemma 10.5.14. There is no n ∈ ω with n < n.
Proof. This follows immediately from the Axiom of Foundation, but we prove it without that assumption.
Let X = {n ∈ ω : ¬(n < n)}. We have that 0 ∈ X by Lemma 10.5.7. Suppose that n ∈ X. We prove that
S(n) ∈ X by supposing that S(n) < S(n) and deriving a contradiction. Suppose then that S(n) < S(n).
By Lemma 10.5.8, we have S(n) ≤ n, hence either S(n) = n or S(n) < n. Also by Lemma 10.5.8, we have
n < S(n). Therefore, if S(n) = n, then n < n, and if S(n) < n, then n < n by Proposition 10.5.12 (since
n < S(n) and S(n) < n), a contradiction. It follows that S(n) ∈ X. Therefore, there is no n ∈ ω with
n < n.
Proposition 10.5.15. < is asymmetric on ω.
Proof. Suppose that n, m ∈ ω, n < m, and m < n. By Proposition 10.5.12, it follows that n < n,
contradicting Lemma 10.5.14.
Proposition 10.5.16. < is connected on ω.
Proof. Fix m ∈ ω. We prove that for all n ∈ ω, either m < n, m = n, or n < m, by induction on n. Let
X = {n ∈ ω : (m < n) ∨ (m = n) ∨ (n < m)}. We have 0 ≤ m by Lemma 10.5.10, hence either m = 0 or
0 < m, and so 0 ∈ X. Suppose then that n ∈ X, so that either m < n, m = n, or n < m.
Case 1: Suppose that m < n. Since n < S(n) by Lemma 10.5.8, we have m < S(n) by Proposition
10.5.12.
Case 2: Suppose that m = n. Since n < S(n) by Lemma 10.5.8, it follows that m < S(n).
Case 3: Suppose that n < m. We have S(n) ≤ m by Lemma 10.5.13. Hence, either m = S(n) or
S(n) < m.
Therefore, in all cases, either m < S(n), m = S(n), or S(n) < m, so S(n) ∈ X. The result follows by
induction.
In order to finish off the proof that < is a well-ordering on , we need a new version of induction. You
may have heard it referred to as Strong Induction.
Proposition 10.5.17 (Induction Principle on ω).
1. Suppose that X is a set and for all n ∈ ω, if m ∈ X for all m < n, then n ∈ X. We then have ω ⊆ X.
2. For any formula φ(n, ~p), we have the sentence

∀~p((∀n ∈ ω)((∀m < n)φ(m, ~p) → φ(n, ~p)) → (∀n ∈ ω)φ(n, ~p))
Proof.
1. Let Y = {n ∈ ω : (∀m < n)(m ∈ X)}. Notice that Y ⊆ ω and 0 ∈ Y because there is no m ∈ ω with m < 0 by Lemma 10.5.7. Suppose that n ∈ Y. We show that S(n) ∈ Y. Suppose that m < S(n). By Lemma 10.5.8, we have m ≤ n, hence either m < n or m = n. If m < n, then m ∈ X because n ∈ Y. For the case m = n, notice that n ∈ X by assumption (because m ∈ X for all m < n). Therefore, S(n) ∈ Y. By Step Induction, it follows that Y = ω.
Now let n ∈ ω. We have n ∈ ω, hence S(n) ∈ ω because ω is inductive, so S(n) ∈ Y. Since n < S(n) by Lemma 10.5.8, it follows that n ∈ X. Therefore, ω ⊆ X.
2. This follows from part 1 using Separation. Fix sets ~q, and suppose that

(∀n ∈ ω)((∀m < n)φ(m, ~q) → φ(n, ~q))

Let X = {n ∈ ω : φ(n, ~q)}. Suppose that n ∈ ω and m ∈ X for all m < n. We then have (∀m < n)φ(m, ~q), hence φ(n, ~q) by assumption, so n ∈ X. It follows from part 1 that ω ⊆ X. Therefore, we have (∀n ∈ ω)φ(n, ~q).
It is possible to give a proof of part 2 which makes use of part 2 of the Step Induction Principle, thus avoiding the detour through sets and using only formulas. This proof simply mimics how we obtained part 1 above, but uses formulas everywhere instead of working with sets. Although it is not nearly as clean, when we treat ordinals, there will be times when we need to argue at the level of formulas.
Theorem 10.5.18. < is a well-ordering on ω.
Proof. By Proposition 10.5.12, Proposition 10.5.15, and Proposition 10.5.16, it follows that < is a linear ordering on ω. Suppose then that Z ⊆ ω and there is no n ∈ Z such that for all m ∈ Z, either n = m or n < m. We show that Z = ∅. Notice that for every n ∈ Z, there exists m ∈ Z with m < n by Proposition 10.5.16.
Let Y = ω\Z. We show that Y = ω using the Induction Principle. Notice first that 0 ∈ Y because if 0 ∈ Z, then there exists m ∈ Z with m < 0 by the last sentence of the previous paragraph, contrary to Lemma 10.5.7. Suppose then that n ∈ ω is such that m ∈ Y, i.e. m ∉ Z, for all m < n. If n ∉ Y, we would then have that n ∈ Z, so by the last sentence of the previous paragraph, there exists m ∈ Z with m < n, a contradiction. Therefore, n ∈ Y. Hence, by the Induction Principle, we have that Y = ω and so Z = ∅.
Therefore, if Z ⊆ ω and Z ≠ ∅, there exists n ∈ Z such that for all m ∈ Z, either n = m or n < m. It follows that < is a well-ordering on ω.
10.6 Sets and Classes
We know from Proposition 10.1.4 that there is no set u such that a ∈ u for all sets a. Thus, our theory forbids us from placing every set into one universal set which we can then play with and manipulate. However, this formal impossibility within our theory does not prevent us from thinking about or referring to the collection of all sets or other collections which are too large to form into a set. After all, our universal quantifiers do indeed range over the collection of all sets. Also, if we are arguing semantically, then given a model M of ZFC, we may externally work with the power set of M.
We want to be able to reason about such collections of sets in a natural manner within our theory without violating our theory. We will call such collections classes to distinguish them from sets. The idea is to recall that any first-order theory can say things about certain subsets of every model: the definable subsets. In our case, a formula φ(x) is implicitly defining a certain collection of sets. Perhaps this collection
is too large to put together into a set inside the model, but we may nevertheless use the formula in various ways within our theory. For example, for any formulas φ(x) and ψ(x), the sentence ∀x(φ(x) → ψ(x)) says that every set which satisfies φ also satisfies ψ. If there exist sets C and D such that ∀x(φ(x) → x ∈ C) and ∀x(ψ(x) → x ∈ D), then we can use Separation to form the sets A = {x ∈ C : φ(x)} and B = {x ∈ D : ψ(x)}, in which case the sentence ∀x(φ(x) → ψ(x)) simply asserts that A ⊆ B. However, even if we can't form these sets (intuitively because {x : φ(x)} and {x : ψ(x)} are too large to be sets), the sentence is expressing the same underlying idea. Allowing the possibility of parameters, this motivates the following internal definition.
Definition 10.6.1. A class C is a formula φ(x, ~p).
Of course, this isn't a very good way to think about classes. Externally, a class is simply a definable set (with the possibility of parameters). The idea is that once we fix sets ~q to fill in for the position of the parameters, the formula describes the collection of those sets a such that φ(a, ~q). The first class to consider is the class of all sets, which we denote by V. Formally, we define V to be the formula x = x, but we will content ourselves with defining classes in the following more informal external style.
Definition 10.6.2. V is the class of all sets.
Here's a more interesting illustration of how classes can be used and why we want to consider them. Let CR be the class of all relations and let CF be the class of all functions. More formally, CR is the formula φ_R(x) given by

∀y(y ∈ x → ∃a∃b(y = (a, b)))

while CF is the formula φ_F(x) given by

∀y(y ∈ x → ∃a∃b(y = (a, b))) ∧ ∀a∀b_1∀b_2(((a, b_1) ∈ x ∧ (a, b_2) ∈ x) → b_1 = b_2)
With this shorthand in place, we can write things like CF ⊆ CR to stand for the provable sentence ∀x(φ_F(x) → φ_R(x)). Thus, by using the language of classes, we can express complicated formulas in a simplified, more suggestive, fashion. Of course, there's no real need to introduce classes because we could always just refer to the formulas, but it is psychologically easier to think of a class as some kind of ultra-set which our theory is able to handle, even if we are limited in what we can do with classes.
With the ability to refer to classes, why deal with sets at all? The answer is that classes are much less versatile than sets. For example, if C and D are classes, it makes no sense to write C ∈ D because this doesn't correspond to a formula built from the implicit formulas giving C and D. This inability corresponds to the intuition that classes are too large to collect together into a set and then put into other collections. Hence, asking whether V ∈ V is meaningless. Also, since classes are given by formulas, we are restricted to referring only to definable collections. Thus, there is no way to talk about or quantify over all collections of sets (something that is meaningless internally). However, there are many operations which do make sense on classes.
For instance, suppose that R is a class of ordered pairs (with parameters ~p). That is, R is a formula φ(x, ~p) such that the formula ∀x(φ(x, ~p) → ∃a∃b(x = (a, b))) is provable. We think of R as a class relation. Using suggestive notation, we can then go on to define dom(R) to be the class consisting of those sets a such that there exists a set b with (a, b) ∈ R. To be precise, dom(R) is the class which is the formula ψ(a, ~p) given by ∃x∃b(x = (a, b) ∧ φ(x, ~p)). Thus, we can think of dom(·) as an operation on classes (given any formula φ(x, ~p) which is a class relation, applying dom(·) results in the class given by the formula ∃x∃b(x = (a, b) ∧ φ(x, ~p))).
Similarly, we can talk about class functions. We can even use notation like F : V → V to mean that F is a class function with dom(F) = V. Again, each of these expressions could have been written out as formulas in our language, but the notation is so suggestive that it's clear how to do this without actually having to do it. An example of a general class function is U : V × V → V given by U(a, b) = a ∪ b. Convince yourself how to write U as a formula.
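One way to carry out that exercise, as a sketch (the convention of coding a two-argument class function as a class of pairs whose first coordinate is itself a pair is ours), is to take U to be the formula θ(x) given by:

```latex
% x belongs to U exactly when x = ((a,b), c) with c = a \cup b
\exists a \, \exists b \, \exists c \, \Big( x = \big( (a,b),\, c \big)
  \;\wedge\; \forall y \, \big( y \in c \leftrightarrow ( y \in a \vee y \in b ) \big) \Big)
```

The first conjunct says that x is an input-output pair, and the second pins the output down to the union of the two inputs.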
We cannot quantify over classes within our theory in the same way that we can quantify over sets because there is no way to quantify over the formulas of set theory within set theory. However, we can, at the price of considering one theorem as infinitely many (one for each formula), make sense of a theorem which does universally quantify over classes. For example, consider the following.
Proposition 10.6.3. Suppose that C is a class, 0 ∈ C, and for all n ∈ ω, if n ∈ C then S(n) ∈ C. We then have ω ⊆ C.
This proposition is what is obtained from the first version of Step Induction on ω by replacing the set X with the class C. Although the set version can be written as one sentence which is provable in ZFC, this version cannot because we can't quantify over classes in the theory. Unwrapping this proposition into formulas, it says that for every formula φ(x, ~p), if we can prove φ(0, ~p) and (∀n ∈ ω)(φ(n, ~p) → φ(S(n), ~p)), then we can prove (∀n ∈ ω)φ(n, ~p). That is, for each formula φ(x, ~p), we can prove the sentence

∀~p((φ(0, ~p) ∧ (∀n ∈ ω)(φ(n, ~p) → φ(S(n), ~p))) → (∀n ∈ ω)φ(n, ~p))

Thus, the class version is simply a neater way of writing the second version of Step Induction on ω which masks the fact that the quantification over classes requires us to write it as infinitely many different propositions (one for each formula φ(x, ~p)) in our theory.
Every set can be viewed as a class by making use of the class M given by the formula x ∈ p. That is, once we fix a set p, the class x ∈ p describes exactly the elements of p. For example, using M in the class version of Step Induction on ω, we see that the following sentence is provable:

∀p((0 ∈ p ∧ (∀n ∈ ω)(n ∈ p → S(n) ∈ p)) → (∀n ∈ ω)(n ∈ p))

Notice that this is exactly the set version of Step Induction on ω.
On the other hand, not every class can be viewed as a set (look at V, for example). Let C be a class. We say that C is a set if there exists a set A such that for all x, we have x ∈ C if and only if x ∈ A. At the level of formulas, this means that if C is given by the formula φ(x, ~p), then we can prove the formula ∃A∀x(φ(x, ~p) ↔ x ∈ A). By Separation, this is equivalent to saying that there is a set B such that for all x, if x ∈ C then x ∈ B (i.e. we can prove the formula ∃B∀x(φ(x, ~p) → x ∈ B)). A class which is not a set (that is, we can prove ¬(∃A∀x(φ(x, ~p) ↔ x ∈ A))) is called a proper class. For example, V is a proper class.
The following proposition will be helpful to us when we discuss transfinite constructions. Intuitively, it says that proper classes are too large to be embedded into any set.
Proposition 10.6.4. Let C be a proper class and let A be a set. There is no injective class function F : C → A.
Proof. Suppose that F : C → A is an injective class function. Let B = {a ∈ A : ∃c(c ∈ C ∧ F(c) = a)} and notice that B is a set by Separation (recall that C and F are given by formulas). Since for each b ∈ B, there is a unique c ∈ C with F(c) = b (using the fact that F is injective), we may use Collection and Separation to conclude that C is a set, contradicting the fact that C is a proper class.
We end this section by seeing how to simply restate the Axiom of Separation and the Axiom of Collection
in the language of classes.
Axiom of Separation: Every subclass of a set is a set.
Axiom of Collection: If F is a class function and A is a set, then there is a set containing the image of A
under F.
10.7 Finite Sets, Powers, and Products

10.7.1 Finite Sets
Definition 10.7.1. Let A be a set. A is finite if there exists n ∈ ω such that A ≈ n. If A is not finite, we say that A is infinite.
Proposition 10.7.2. Suppose that n ∈ ω. Every injective f : n → n is bijective.
Proof. The proof is by induction on n ∈ ω. Suppose first that n = 0 and f : 0 → 0 is injective. We then have f = ∅, so f is trivially bijective. Suppose now that the result holds for n, so that every injective f : n → n is bijective. Suppose that f : S(n) → S(n) is injective. We then have f(n) ≤ n, and we consider two cases.
Case 1: Suppose that f(n) = n. Since f is injective, we have f(m) ≠ n for every m < n, hence f(m) < n for every m < n (because f(m) < S(n) for every m < n). It follows that f ↾ n : n → n. Notice that f ↾ n : n → n is injective because f is injective, hence f ↾ n is bijective by induction. Therefore, ran(f ↾ n) = n, and hence ran(f) = S(n) (because f(n) = n). It follows that f is surjective, so f is bijective.
Case 2: Suppose that f(n) < n. We first claim that n ∈ ran(f). Suppose instead that n ∉ ran(f). Notice that f ↾ n : n → n is injective because f is injective, hence f ↾ n is bijective by induction. Therefore, f(n) ∈ ran(f ↾ n) (because f(n) < n), so there exists ℓ < n with f(ℓ) = f(n), contrary to the fact that f is injective. It follows that n ∈ ran(f). Fix k < n with f(k) = n. Define a function g : n → n by

g(m) = f(m) if m ≠ k, and g(m) = f(n) if m = k.

Notice that if m_1, m_2 < n with m_1 ≠ m_2 and m_1, m_2 ≠ k, then g(m_1) ≠ g(m_2) since f(m_1) ≠ f(m_2) (because f is injective). Also, if m < n with m ≠ k, then g(m) ≠ g(k) since f(m) ≠ f(n) (again because f is injective). It follows that g : n → n is injective, hence bijective by induction. From this we can conclude that ran(f) = S(n) as follows. Notice that f(n) ∈ ran(f) and n ∈ ran(f) because f(k) = n. Suppose that ℓ < n with ℓ ≠ f(n). Since g : n → n is bijective, there exists a unique m < n with g(m) = ℓ. Since ℓ ≠ f(n), we have m ≠ k, hence f(m) = g(m) = ℓ, so ℓ ∈ ran(f). Therefore, ran(f) = S(n), and hence f is bijective.
Corollary 10.7.3 (Pigeonhole Principle). If n, m ∈ ω and m > n, then m ⋠ n.
Proof. Suppose that f : m → n is injective. It then follows that f ↾ n : n → n is injective, hence f ↾ n is bijective by Proposition 10.7.2. Therefore, since f(n) ∈ n, it follows that there exists k < n with f(k) = f(n), contradicting the fact that f is injective. Hence, m ⋠ n.
Corollary 10.7.4. If m, n ∈ ω and m ≈ n, then m = n.
Proof. Suppose that m ≠ n so that either m > n or m < n. If m > n, then m ⋠ n by the Pigeonhole Principle, so m ≉ n. If m < n, then n ⋠ m by the Pigeonhole Principle, so n ≉ m and hence m ≉ n.
Corollary 10.7.5. If A is finite, there exists a unique n ∈ ω such that A ≈ n.
Definition 10.7.6. If A is finite, the unique n ∈ ω such that A ≈ n is called the cardinality of A and is denoted by |A|.
Proposition 10.7.7. Let A be a nonempty set and let n ∈ ω. The following are equivalent:
1. A ≼ n.
2. There exists a surjection g : n → A.
3. A is finite and |A| ≤ n.
Proof. 1 implies 2: Suppose that A ≼ n and fix an injection f : A → n. Fix an element b ∈ A (which exists since A ≠ ∅). Define g : n → A by letting

g = {(m, a) ∈ n × A : f(a) = m} ∪ {(m, a) ∈ n × A : m ∉ ran(f) and a = b}.

Notice that g : n → A and that g is a surjection.
2 implies 1: Suppose that g : n → A is a surjection. Define a set f by letting

f = {(a, m) ∈ A × n : g(m) = a and g(k) ≠ a for all k < m}

Using the fact that < well-orders ω and that g is a surjection, it follows that f : A → n. Also, f is injective because g is a function.
1 implies 3: Suppose that A ≼ n. Let m be the least element of ω such that A ≼ m, and fix an injection g : A → m. We claim that g is a bijection. Notice that m ≠ 0 because A is nonempty, so we may fix k ∈ ω with m = S(k). If g is not a bijection, we could construct an injective h : A → k, a contradiction.
3 implies 1: Suppose that A is finite and |A| ≤ n. Let m = |A| ≤ n and fix a bijection f : A → m. We then have that f : A → n is an injection, so A ≼ n.
Corollary 10.7.8. Suppose that n ∈ ω. Every surjective g : n → n is bijective.
Proof. Suppose that g : n → n is surjective. Define an injective f : n → n such that g ∘ f = id_n as above. We then have that f is bijective, hence g is bijective.
10.7.2 Finite Powers
It is possible to use ordered pairs to define ordered triples, ordered quadruples, and so on. For example, we could define the ordered triple (a, b, c) to be ((a, b), c). However, with the basic properties of ω in hand, we can give a much more elegant definition.
Proposition 10.7.9. Let A be a set and let n ∈ ω. There is a unique set, denoted by A^n, such that for all f, we have f ∈ A^n if and only if f : n → A.
Proof. As usual, uniqueness follows from Extensionality, so we need only prove existence. The proof is by induction on n. Suppose that n = 0. Since for all f, we have f : 0 → A if and only if f = ∅, we may take A^0 = {∅}. Suppose that the result holds for n, i.e. there exists a set A^n such that for all f, we have f ∈ A^n if and only if f : n → A.
Fix a ∈ A. Notice that for each f ∈ A^n, there is a unique function f_a : S(n) → A such that f_a(m) = f(m) for all m < n and f_a(n) = a (let f_a = f ∪ {(n, a)} and use Lemma 10.5.8). Therefore, by Collection (since A^n is a set), Separation, and Extensionality, there is a unique set C_a such that for all g, we have g ∈ C_a if and only if g = f_a for some f ∈ A^n. Notice that for every g : S(n) → A with g(n) = a, there is an f : n → A such that g = f_a (let f = g\{(n, a)}). Therefore, for every g, we have g ∈ C_a if and only if g : S(n) → A and g(n) = a.
By Collection (since A is a set), Separation, and Extensionality again, there is a set F such that for all D, we have D ∈ F if and only if there exists a ∈ A with D = C_a. Notice that for all g, we have g ∈ ⋃F if and only if there exists a ∈ A with g ∈ C_a. Let A^{S(n)} = ⋃F. For all g, we then have g ∈ A^{S(n)} if and only if g : S(n) → A. Therefore, by induction, for every n ∈ ω, there is a set B such that for all f, we have f ∈ B if and only if f : n → A.
Proposition 10.7.10. Let A be a set. There is a unique set, denoted by A^{<ω}, such that for all f, we have f ∈ A^{<ω} if and only if f ∈ A^n for some n ∈ ω.
Proof. By Collection (since ω is a set), Separation, and Extensionality, there is a unique set F such that for all D, we have D ∈ F if and only if there exists n ∈ ω with D = A^n. Let A^{<ω} = ⋃F. For every f, we then have f ∈ A^{<ω} if and only if f ∈ A^n for some n ∈ ω.
10.7.3 Finite Products
Suppose that f is a function with dom(f) = n ∈ ω. We want to consider the Cartesian product of the sets indexed by f:

∏f = {g ∈ (⋃ran(f))^n : g(i) ∈ f(i) for all i < n}
10.8 Definitions by Recursion
Theorem 10.8.1 (Step Recursive Definitions on ω - Set Form). Let A be a set, let b ∈ A, and let g : ω × A → A. There exists a unique function f : ω → A such that f(0) = b and f(S(n)) = g(n, f(n)) for all n ∈ ω.
Proof. We first prove existence. Call a set Z ⊆ ω × A sufficient if (0, b) ∈ Z and for all (n, a) ∈ Z, we have (S(n), g(n, a)) ∈ Z. Notice that sufficient sets exist (since ω × A is sufficient). Let

Y = {(n, a) ∈ ω × A : (n, a) ∈ Z for every sufficient set Z}.

We first show that Y is sufficient. Notice that (0, b) ∈ Y because (0, b) ∈ Z for every sufficient set Z. Suppose now that (n, a) ∈ Y. For any sufficient set Z, we have (n, a) ∈ Z, hence (S(n), g(n, a)) ∈ Z. Therefore, (S(n), g(n, a)) ∈ Z for every sufficient set Z, so (S(n), g(n, a)) ∈ Y. It follows that Y is sufficient.
We next show that for all n ∈ ω, there exists a unique a ∈ A such that (n, a) ∈ Y. Let

X = {n ∈ ω : there exists a unique a ∈ A such that (n, a) ∈ Y}.

Since Y is sufficient, we know that (0, b) ∈ Y. Suppose that d ∈ A and d ≠ b. Since the set (ω × A)\{(0, d)} is sufficient (because S(n) ≠ 0 for all n ∈ ω), it follows that (0, d) ∉ Y. Therefore, there exists a unique a ∈ A such that (0, a) ∈ Y (namely, a = b), so 0 ∈ X. Suppose now that n ∈ X, and let c be the unique element of A such that (n, c) ∈ Y. Since Y is sufficient, we have (S(n), g(n, c)) ∈ Y. Fix d ∈ A with d ≠ g(n, c). We then have that Y\{(S(n), d)} is sufficient (otherwise, there exists a ∈ A such that (n, a) ∈ Y and g(n, a) = d, contrary to the fact that in this case we have a = c by induction), so by definition of Y it follows that Y ⊆ Y\{(S(n), d)}. Hence, (S(n), d) ∉ Y. Therefore, there exists a unique a ∈ A such that (S(n), a) ∈ Y (namely, a = g(n, c)), so S(n) ∈ X. By induction, we conclude that X = ω, so for all n ∈ ω, there exists a unique a ∈ A such that (n, a) ∈ Y.
Let f = Y and notice that f : ω → A from above. Since Y is sufficient, we have (0, b) ∈ Y, so f(0) = b. Let n ∈ ω. Since (n, f(n)) ∈ Y and Y is sufficient, it follows that (S(n), g(n, f(n))) ∈ Y, so f(S(n)) = g(n, f(n)).
We now prove uniqueness. Suppose that f_1, f_2 : ω → A are such that:
1. f_1(0) = b.
2. f_2(0) = b.
3. f_1(S(n)) = g(n, f_1(n)) for all n ∈ ω.
4. f_2(S(n)) = g(n, f_2(n)) for all n ∈ ω.
Let X = {n ∈ ω : f_1(n) = f_2(n)}. Notice that 0 ∈ X because f_1(0) = b = f_2(0). Suppose that n ∈ X so that f_1(n) = f_2(n). We then have

f_1(S(n)) = g(n, f_1(n)) = g(n, f_2(n)) = f_2(S(n))

hence S(n) ∈ X. It follows by induction that X = ω, so f_1(n) = f_2(n) for all n ∈ ω.
As an example of how to use this result (assuming we already know how to multiply - see below), consider how to define the factorial function. We want to justify the existence of a unique function f : ω → ω such that f(0) = 1 and f(S(n)) = f(n) · S(n) for all n ∈ ω. We can make this work as follows. Let A = ω, b = 1, and define g : ω × ω → ω by letting g(n, a) = a · S(n) (here we are thinking that the second argument of g will contain the accumulated value f(n)). The theorem now gives the existence and uniqueness of a function f : ω → ω such that f(0) = 1 and f(S(n)) = f(n) · S(n) for all n ∈ ω.
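As a sanity check, this recursion can be run directly. A sketch (ordinary integers stand in for the von Neumann naturals, with n + 1 playing S(n)):

```python
# Step recursion for factorial: f(0) = 1 and f(S(n)) = g(n, f(n)),
# where the iterating function is g(n, a) = a * S(n).

def g(n, a):
    return a * (n + 1)   # a * S(n)

def fact(n):
    a = 1                # f(0) = b = 1
    for k in range(n):   # after this step, a holds f(k+1) = g(k, f(k))
        a = g(k, a)
    return a

print(fact(5))           # prints 120
```

The loop is exactly the "unique f" whose existence the theorem guarantees, computed level by level.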
However, this raises the question of how to define multiplication. Let's start by thinking about how to define addition. The basic idea is to define it recursively. For any m ∈ ω, we let m + 0 = m. If m ∈ ω, and we know how to find m + n for some fixed n ∈ ω, then we should define m + S(n) = S(m + n). It looks like an appeal to the above theorem is in order, but how do we treat the m that is fixed in the recursion? We need a slightly stronger version of the above theorem which allows a parameter to come along for the ride. The proof is basically the same so we just give a short sketch.
Theorem 10.8.2 (Step Recursive Definitions with Parameters on ω). Let A and P be sets, let h : P → A, and let g : P × ω × A → A. There exists a unique function f : P × ω → A such that f(p, 0) = h(p) for all p ∈ P, and f(p, S(n)) = g(p, n, f(p, n)) for all p ∈ P and all n ∈ ω.
Proof. One could reprove this from scratch following the above outline, but we give a simpler argument using Collection. For each p ∈ P, define g_p : ω × A → A by letting g_p(n, a) = g(p, n, a) for all (n, a) ∈ ω × A. Using the above result without parameters, for each fixed p ∈ P, there exists a unique function f_p : ω → A such that f_p(0) = h(p) and f_p(S(n)) = g_p(n, f_p(n)) for all n ∈ ω. By Collection and Separation, we may form the set {f_p : p ∈ P}. It is then straightforward to check that the function f : P × ω → A given by f(p, n) = f_p(n) is the unique function satisfying the necessary properties.
Definition 10.8.3. Let h : ω → ω be defined by h(m) = m and let g : ω × ω × ω → ω be defined by g(m, n, a) = S(a). We denote the unique f from the previous theorem by +. Notice that + : ω × ω → ω, that m + 0 = m for all m ∈ ω, and that m + S(n) = S(m + n) for all m, n ∈ ω.
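The two equations defining + translate directly into code. A sketch, again with ordinary integers encoding the naturals (n - 1 recovers the m with S(m) = n):

```python
# Addition by step recursion: m + 0 = m and m + S(n) = S(m + n).

def add(m, n):
    if n == 0:
        return m              # m + 0 = m
    return add(m, n - 1) + 1  # m + S(n) = S(m + n)

print(add(2, 3))              # prints 5
```

Note that the recursion runs on the second argument only; the first argument m is exactly the parameter that Theorem 10.8.2 carries along.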
Now that we have the definition of +, we can prove all of the basic axiomatic facts about the natural numbers with + by induction. Here's a simple example.
Proposition 10.8.4. 0 + n = n for all n ∈ ω.
Proof. The proof is by induction on n. For n = 0, simply notice that 0 + 0 = 0. Suppose that n ∈ ω and 0 + n = n. We then have 0 + S(n) = S(0 + n) = S(n). The result follows by induction.
A slightly less trivial example is a proof that + is associative.
Proposition 10.8.5. For all k, m, n ∈ ω, we have (k + m) + n = k + (m + n).
Proof. We fix k, m ∈ ω, and prove the result by induction on n. Notice that (k + m) + 0 = k + m = k + (m + 0). Suppose that we know the result for n, so that (k + m) + n = k + (m + n). We then have

(k + m) + S(n) = S((k + m) + n)
               = S(k + (m + n))    (by induction)
               = k + S(m + n)
               = k + (m + S(n))

The result follows by induction.
Definition 10.8.6. Let h : ω → ω be defined by h(m) = 0 and let g : ω × ω × ω → ω be defined by g(m, n, a) = a + m. We denote the unique f from the previous theorem by ·. Notice that · : ω × ω → ω, that m · 0 = 0 for all m ∈ ω, and that m · S(n) = m · n + m for all m, n ∈ ω.
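Multiplication is then one more layer of the same recursion, calling the addition already defined. A sketch under the same integer encoding:

```python
# Multiplication by step recursion on top of +:
# m * 0 = 0 and m * S(n) = m * n + m.

def add(m, n):
    return m if n == 0 else add(m, n - 1) + 1

def mult(m, n):
    if n == 0:
        return 0                   # m * 0 = 0
    return add(mult(m, n - 1), m)  # m * S(n) = m * n + m

print(mult(3, 4))                  # prints 12
```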
From now on, we will present our recursive definitions in the usual mathematical style. For example, we define iterates of a function as follows.
Definition 10.8.7. Let B be a set, and let h : B → B be a function. We define, for each n ∈ ω, a function h^n by letting h^0 = id_B and letting h^{S(n)} = h ∘ h^n for all n ∈ ω.
For each fixed h : B → B, this definition can be justified by appealing to the theorem with A = B^B, b = id_B, and g : ω × A → A given by g(n, a) = h ∘ a. However, we will content ourselves with the above more informal style when the details are straightforward and uninteresting.
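Definition 10.8.7 can likewise be read as a small program; here is a sketch, with the iterate h^n represented as a closure:

```python
# Iterates of a function: h^0 = id_B and h^{S(n)} = h composed with h^n.

def iterate(h, n):
    def hn(x):
        for _ in range(n):  # apply h exactly n times
            x = h(x)
        return x
    return hn

square_thrice = iterate(lambda x: x * x, 3)
print(square_thrice(2))     # prints 256, i.e. ((2^2)^2)^2
```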
The above notions of recursive definitions can only handle types of recursion where the value of f(S(n)) depends just on the previous value f(n) (and also n). Thus, they are unable to deal with recursive definitions such as that used in defining the Fibonacci sequence, where the value of f(n) depends on the two previous values of f whenever n ≥ 2. We can justify these more general types of recursion by carrying along all previous values of f in the inductive construction. Thus, instead of having our iterating function g : ω × A → A, where we think of the second argument of g as carrying the current value f(n), we will have an iterating function g : A^{<ω} → A, where we think of the argument of g as carrying the finite sequence consisting of all values f(m) for m < n. Thus, given such a g, we are seeking the existence and uniqueness of a function f : ω → A such that f(n) = g(f ↾ n) for all n ∈ ω. Notice that in this framework, we no longer need to put forward a b ∈ A as a starting place for f because we will have f(0) = g(∅). Also, we do not need to include a number argument in the domain of g because the current n in the iteration can be recovered as the domain of the single argument of g.
Theorem 10.8.8 (Recursive Definitions on ω). Let A be a set and let g : A^{<ω} → A. There exists a unique function f : ω → A such that f(n) = g(f ↾ n) for all n ∈ ω.
Proof. We first prove existence. Call a set Z ⊆ ω × A sufficient if for all n ∈ ω and all q ∈ A^n such that (k, q(k)) ∈ Z for all k < n, we have (n, g(q)) ∈ Z. Notice that sufficient sets exist (since ω × A is sufficient). Let

Y = {(n, a) ∈ ω × A : (n, a) ∈ Z for every sufficient set Z}.

We first show that Y is sufficient. Suppose that n ∈ ω, that q ∈ A^n, and that (k, q(k)) ∈ Y for all k < n. For any sufficient set Z, we have (k, q(k)) ∈ Z for all k < n, so (n, g(q)) ∈ Z. Therefore, (n, g(q)) ∈ Z for every sufficient set Z, so (n, g(q)) ∈ Y. It follows that Y is sufficient.
We next show that for all n ∈ ω, there exists a unique a ∈ A such that (n, a) ∈ Y. Let

X = {n ∈ ω : there exists a unique a ∈ A such that (n, a) ∈ Y}.

Suppose that n ∈ ω is such that k ∈ X for all k < n. Let q = Y ∩ (n × A) and notice that q ∈ A^n. Since (k, q(k)) ∈ Y for all k < n and Y is sufficient, it follows that (n, g(q)) ∈ Y. Fix b ∈ A with b ≠ g(q). We then have that Y\{(n, b)} is sufficient (otherwise, there exists p ∈ A^n such that (k, p(k)) ∈ Y for all k < n and g(p) = b, but this implies that p = q and hence b = g(q), a contradiction), so by definition of Y it follows that Y ⊆ Y\{(n, b)}. Hence, (n, b) ∉ Y. Therefore, there exists a unique a ∈ A such that (n, a) ∈ Y, so n ∈ X. By induction, we conclude that X = ω, so for all n ∈ ω, there exists a unique a ∈ A such that (n, a) ∈ Y.
Let f = Y and notice that f : ω → A from above. Suppose that n ∈ ω. Let q = Y ∩ (n × A) and notice that q ∈ A^n and q = f ↾ n. Since (k, q(k)) ∈ Y for all k < n and Y is sufficient, it follows that (n, g(q)) ∈ Y, so f(n) = g(q) = g(f ↾ n).
We now prove uniqueness. Suppose that f_1, f_2 : ω → A are such that:
1. f_1(n) = g(f_1 ↾ n) for all n ∈ ω.
2. f_2(n) = g(f_2 ↾ n) for all n ∈ ω.
Let X = {n ∈ ω : f_1(n) = f_2(n)}. We prove by induction that X = ω. Let n ∈ ω and suppose that k ∈ X for all k < n. We then have that f_1 ↾ n = f_2 ↾ n, hence

f_1(n) = g(f_1 ↾ n) = g(f_2 ↾ n) = f_2(n)

so n ∈ X. It follows by induction that X = ω, so f_1(n) = f_2(n) for all n ∈ ω.
As above, there is a similar version when we allow parameters. If f : P × ω → A and p ∈ P, we use the notation f_p to denote the function f_p : ω → A given by f_p(n) = f(p, n) for all n ∈ ω.
Theorem 10.8.9 (Recursive Definitions with Parameters on ω). Let A and P be sets and let g : P × A^{<ω} → A. There exists a unique function f : P × ω → A such that f(p, n) = g(p, f_p ↾ n) for all p ∈ P and n ∈ ω.
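To see course-of-values recursion in action, here is a sketch in which the iterating function g consumes the whole restriction f ↾ n (represented as the list of earlier values) and the resulting f is the Fibonacci sequence, the motivating example above:

```python
# Course-of-values recursion as in Theorem 10.8.8: f(n) = g(f | n),
# where f | n is the finite sequence (f(0), ..., f(n-1)).

def recursive(g, n):
    """Return f(n) for the unique f with f(k) = g(f | k) for all k."""
    values = []
    for k in range(n + 1):
        values.append(g(values[:k]))   # f(k) = g(f | k)
    return values[n]

def g(prev):
    if len(prev) < 2:                  # f(0) = g(()) = 0 and f(1) = g((0,)) = 1
        return len(prev)
    return prev[-1] + prev[-2]         # f(n) = f(n-1) + f(n-2) for n >= 2

print([recursive(g, k) for k in range(8)])   # prints [0, 1, 1, 2, 3, 5, 8, 13]
```

No separate starting value b is supplied: g itself decides what to do with the empty sequence, exactly as the text observes.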
10.9 Infinite Sets, Powers, and Products
Theorem 10.9.1 (Cantor-Bernstein). Let A and B be sets. If A ≼ B and B ≼ A, then A ≈ B.
Proof. We may assume that A and B are disjoint (otherwise, we can work with A × {0} and B × {1}, and transfer the result back to A and B). Fix injections f : A → B and g : B → A. We say that an element a ∈ A is B-originating if there exist b_0 ∈ B and n ∈ ω such that b_0 ∉ ran(f) and a = (g ∘ f)^n(g(b_0)). Similarly, we say that an element b ∈ B is B-originating if there exist b_0 ∈ B and n ∈ ω such that b_0 ∉ ran(f) and b = (f ∘ g)^n(b_0). Let

h = {(a, b) ∈ A × B : either a is not B-originating and f(a) = b, or a is B-originating and g(b) = a}

Notice that h is a function (because f is a function and g is injective), dom(h) ⊆ A, and ran(h) ⊆ B. We first show that dom(h) = A. Let a ∈ A. If a is not B-originating, then (a, f(a)) ∈ h, hence a ∈ dom(h). Suppose that a is B-originating, and fix b_0 ∈ B and n ∈ ω with b_0 ∉ ran(f) and a = (g ∘ f)^n(g(b_0)). If n = 0, then a = g(b_0), so (a, b_0) ∈ h and hence a ∈ dom(h). Suppose that n ≠ 0 and fix m ∈ ω with n = S(m). We then have a = (g ∘ f)^{S(m)}(g(b_0)) = (g ∘ f)((g ∘ f)^m(g(b_0))) = g(f((g ∘ f)^m(g(b_0)))). Therefore, (a, f((g ∘ f)^m(g(b_0)))) ∈ h, and hence a ∈ dom(h). It follows that dom(h) = A.
We now know that h : A → B, and we need only show that h is a bijection. Let a_1, a_2 ∈ A and suppose that h(a_1) = h(a_2). We first show that either a_1 and a_2 are both B-originating or a_1 and a_2 are both not B-originating. Without loss of generality, suppose that a_1 is B-originating and a_2 is not, so that a_1 = g(h(a_1)) and h(a_2) = f(a_2). Since a_1 is B-originating, we may fix b_0 ∈ B and n ∈ ω such that b_0 ∉ ran(f) and a_1 = (g ∘ f)^n(g(b_0)). Notice that (g ∘ f)^n(g(b_0)) = a_1 = g(h(a_1)) = g(h(a_2)) = g(f(a_2)) = (g ∘ f)(a_2). If n = 0, this implies that g(b_0) = g(f(a_2)), hence f(a_2) = b_0 (because g is injective), contrary to the fact that b_0 ∉ ran(f). Suppose that n ≠ 0 and fix m ∈ ω with S(m) = n. We then have (g ∘ f)((g ∘ f)^m(g(b_0))) = (g ∘ f)^n(g(b_0)) = (g ∘ f)(a_2), hence (g ∘ f)^m(g(b_0)) = a_2 (because g ∘ f is injective), contrary to the fact that a_2 is not B-originating. Therefore, either a_1 and a_2 are both B-originating or a_1 and a_2 are both not B-originating. If a_1 and a_2 are both not B-originating, this implies that f(a_1) = f(a_2), hence a_1 = a_2 because f is injective. If a_1 and a_2 are both B-originating, we then have a_1 = g(h(a_1)) = g(h(a_2)) = a_2. It follows that h is injective.
We finally show that h is surjective. Fix b ∈ B. Suppose first that b is B-originating, and fix b_0 ∈ B and n ∈ ω such that b_0 ∉ ran(f) and b = (f ∘ g)^n(b_0). We then have g(b) = g((f ∘ g)^n(b_0)) = (g ∘ f)^n(g(b_0)), hence g(b) ∈ A is B-originating. It follows that h(g(b)) = b, so b ∈ ran(h). Suppose now that b is not B-originating. We then must have b ∈ ran(f), so we may fix a ∈ A with f(a) = b. If a is B-originating, we may fix b_0 ∈ B and n ∈ ω such that b_0 ∉ ran(f) and a = (g ∘ f)^n(g(b_0)), and notice that (f ∘ g)^{S(n)}(b_0) = f((g ∘ f)^n(g(b_0))) = f(a) = b, contrary to the fact that b is not B-originating. Therefore, a is not B-originating, so h(a) = f(a) = b, and hence b ∈ ran(h). It follows that h is surjective.
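The bijection h built in the proof can be traced on small examples. A sketch (on finite sets of equal size every backward orbit is a cycle, so nothing is B-originating and h collapses to f; the code still exercises the definition, and the particular sets and injections below are arbitrary choices of ours):

```python
# The Cantor-Bernstein bijection: a maps to f(a) unless a is
# B-originating, in which case a maps to the unique b with g(b) = a.

def cantor_bernstein(A, f, g):
    ginv = {v: k for k, v in g.items()}   # well-defined since g is injective
    finv = {v: k for k, v in f.items()}

    def b_originating(a):
        seen = set()
        while a not in seen:              # trace the orbit backwards
            seen.add(a)
            if a not in ginv:
                return False              # orbit stops inside A
            b = ginv[a]
            if b not in finv:
                return True               # orbit starts at some b0 outside ran(f)
            a = finv[b]
        return False                      # orbit is a cycle

    return {a: (ginv[a] if b_originating(a) else f[a]) for a in A}

A = {0, 1, 2}
f = {0: 'x', 1: 'y', 2: 'z'}   # injection A -> B
g = {'x': 1, 'y': 2, 'z': 0}   # injection B -> A
h = cantor_bernstein(A, f, g)
```

The genuinely infinite behaviour (chains that begin at some b_0 outside ran(f)) only appears when f fails to be surjective, which on finite sets cannot happen alongside an injection back.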
Definition 10.9.2. Let A and B be sets. We write A ≺ B to mean that A ≼ B and A ≉ B.
Theorem 10.9.3. For any set A, we have A ≺ P(A).
Proof. First, define a function f : A → P(A) by letting f(a) = {a} for every a ∈ A. Notice that f is an injection, hence A ≼ P(A). We next show that A ≉ P(A) by showing that there is no bijection f : A → P(A). Suppose then that f : A → P(A). Let B = {a ∈ A : a ∉ f(a)}, and notice that B ∈ P(A). Suppose that B ∈ ran(f), and fix b ∈ A with f(b) = B. We then have b ∈ f(b) ↔ b ∈ B ↔ b ∉ f(b), a contradiction. It follows that B ∉ ran(f), hence f is not surjective. Therefore, A ≺ P(A).
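The diagonal set is easy to watch concretely; the particular f below is an arbitrary choice:

```python
# Cantor's diagonal set: B = {a in A : a not in f(a)} disagrees with
# f(a) about the element a for every a, so B is not in the range of f.

A = {0, 1, 2}
f = {0: {0, 1}, 1: set(), 2: {1, 2}}   # some function A -> P(A)

B = {a for a in A if a not in f[a]}    # the diagonal set
print(B)                               # prints {1}
```

Whatever f one picks, B differs from each f(a) at the "diagonal" element a, which is exactly the contradiction in the proof.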
10.9.1 Countable Sets

Definition 10.9.4. Let A be a set.
1. A is countably infinite if A ≈ ω.
2. A is countable if A is either finite or countably infinite.
3. A is uncountable if A is not countable.
Proposition 10.9.5. Let A be a set. The following are equivalent:
1. A is countable.
2. A ≼ ω.
3. There is a surjection g : ω → A.
Proof.
10.9.2 General Powers
There is no reason to restrict to n ∈ ω in the above examples. In general, we want to define A^B to be the set of all functions from B to A. We can certainly make this definition, but it is the first instance where we really need to use Power Set.
Proposition 10.9.6. Let A and B be sets. There is a unique set, denoted by AB , such that for all f , we
have f AB if and only if f : B A.
Proof. Notice that if f : B A, then f B A, hence f P(B A). Therefore, AB = {f P(B A) : f
is a function, dom(f ) = B, and ran(f ) = A}. As usual, uniqueness follows from Extensionality.

10.9.3 General Products
Chapter 11

Doing Mathematics in Set Theory


11.1 The Basic Number Systems

11.2 Doing Logic in Set Theory
Chapter 12

Well-Orderings, Ordinals, and Cardinals

12.1 Well-Orderings
The ability to do induction and make definitions by recursion on ω was essential to developing the basic
properties of the natural numbers. With such success, we may wonder on which other kinds of structures
we can do induction and recursion. Looking at the Step Induction Principle and Step Recursive Definitions
on ω, it seems hard to generalize these ideas to anything more complicated than ω, because by starting with
zero and taking successors we can't get any further. However, the more general versions of induction and
recursion, which refer to the order on ω rather than just 0 and successors, can be very fruitfully generalized
to any well-ordering.
Proposition 12.1.1 (Induction on Well-Orderings). Let (W, <) be a well-ordering.

1. Suppose that X is a set and for all z ∈ W, if y ∈ X for all y < z, then z ∈ X. We then have W ⊆ X.
2. For any formula φ(z, p⃗), we have the sentence
∀p⃗((∀z ∈ W)((∀y < z)φ(y, p⃗) → φ(z, p⃗)) → (∀z ∈ W)φ(z, p⃗))
3. Suppose that C is a class and for all z ∈ W, if y ∈ C for all y < z, then z ∈ C. We then have W ⊆ C.

Proof.
1. Suppose that W ⊈ X, so that W \ X ≠ ∅. Since (W, <) is a well-ordering, there exists z ∈ W \ X such
that for all y ∈ W \ X, either z = y or z < y. Therefore, for all y ∈ W with y < z, we have y ∈ X
(because y ∉ W \ X). It follows from our assumption that z ∈ X, contradicting the fact that z ∈ W \ X.
Thus, it must be the case that W ⊆ X.

2. This follows from part 1 using Separation. Fix sets q⃗, and suppose that
(∀z ∈ W)((∀y < z)φ(y, q⃗) → φ(z, q⃗))
Let X = {z ∈ W : φ(z, q⃗)}. Suppose that z ∈ W and y ∈ X for all y < z. We then have
(∀y < z)φ(y, q⃗), hence φ(z, q⃗) by assumption, so z ∈ X. It follows from part 1 that W ⊆ X. Therefore,
we have (∀z ∈ W)φ(z, q⃗).
3. This is just a restatement of 2 using the language of classes.

This is all well and good, but are there other interesting well-orderings besides ω (and each n ∈ ω)?
Well, any well-ordering has a smallest element. If there are any elements remaining, there must be a next
smallest element. Again, if there are any elements remaining, there must be a next smallest element, and so
on. Thus, any well-ordering begins with a piece that looks like ω.

However, we can build a longer well-ordering by taking ω and adding a new element which is
greater than every element of ω. This can be visualized by thinking of the set

A = {1 − 1/n ∈ R : n ∈ ω \ {0}} ∪ {1}.

It's a simple exercise to check that A, ordered by inheritance from the usual order on R, is a well-ordering.
We can then add another new element which is greater than every element, and another and another, and
so on, to get a well-ordering that is a copy of ω with another copy of ω on top of the first. We can add a
new element greater than all of these, and continue. These well-orderings beyond ω differ from ω (and all
n ∈ ω) in that they have points that are neither initial points nor immediate successors of other points.
Definition 12.1.2. Let (W, <) be a well-ordering, and let z ∈ W.
1. If z ≤ y for all y ∈ W, we call z the initial point (such a z is easily seen to be unique).
2. If there exists y ∈ W such that y < z and there is no x ∈ W with y < x < z, we call z a successor point.
3. If z is neither an initial point nor a successor point, we call z a limit point.
A little thought will suggest that all well-orderings should be built up by starting at an initial point,
taking successors (perhaps infinitely often), and then jumping to a limit point above everything previous.
After all, if we already have an initial part that looks like ω, and we haven't exhausted the well-ordering,
then there must be a least element not accounted for, and this is the first limit point. If we still haven't
exhausted it, there is another least element, which is a successor, and perhaps another successor, and so on.
If this doesn't finish off the well-ordering, there is another least element not accounted for, which will be the
second limit point.

This idea makes it seem plausible that we can take any two well-orderings and compare them by running
through this procedure until one of them runs out of elements. That is, if (W1, <1) and (W2, <2) are
well-orderings, then either they are isomorphic, or one is isomorphic to an initial segment of the other. We now
develop the tools to prove this result. We first show that we can make recursive definitions along
well-orderings. The proof is basically the same as the proof of the corresponding theorem on ω, because the only
important fact that allowed that argument to work was the well-ordering property of the order < on ω (not the fact that
every element of ω was either an initial point or a successor point).

Definition 12.1.3. Let (W, <) be a well-ordering, and let z ∈ W. We let W(z) = {y ∈ W : y < z}.
Definition 12.1.4. Let (W, <) be a well-ordering. A set I ⊆ W is called an initial segment of W if I ≠ W
and whenever x ∈ I and y < x, we have y ∈ I.

Proposition 12.1.5. Suppose that (W, <) is a well-ordering and I is an initial segment of W. There exists
z ∈ W with I = W(z).

Proof. Since I is an initial segment of W, we have I ⊆ W and I ≠ W. Hence, W \ I ≠ ∅. Since (W, <)
is a well-ordering, there exists z ∈ W \ I such that z ≤ y for all y ∈ W \ I. We claim that I = W(z). If
y ∈ W(z), we then have y ∉ W \ I (because y < z), hence y ∈ I. Therefore, W(z) ⊆ I. Suppose that y ∈ I
and y ∉ W(z). We then have y ≥ z, hence z ∈ I because I is an initial segment, contradicting the fact that
z ∈ W \ I. It follows that I ⊆ W(z). Therefore, I = W(z) by Extensionality.
Definition 12.1.6. Let (W, <) be a well-ordering and let A be a set. We let
A^{<W} = {f ∈ P(W × A) : f is a function and f : W(z) → A for some z ∈ W}

Theorem 12.1.7 (Recursive Definitions on Well-Orderings). Let (W, <) be a well-ordering, let A be a set,
and let g : A^{<W} → A. There exists a unique function f : W → A such that f(z) = g(f ↾ W(z)) for all
z ∈ W.
Proof. We first prove existence. Call a set Z ⊆ W × A sufficient if for all z ∈ W and all q ∈ A^{W(z)} such that
(y, q(y)) ∈ Z for all y < z, we have (z, g(q)) ∈ Z. Notice that sufficient sets exist (since W × A is sufficient).
Let

Y = {(z, a) ∈ W × A : (z, a) ∈ Z for every sufficient set Z}.

We first show that Y is sufficient. Suppose that z ∈ W, that q ∈ A^{W(z)}, and that (y, q(y)) ∈ Y for all y < z.
For any sufficient set Z, we have (y, q(y)) ∈ Z for all y < z, so (z, g(q)) ∈ Z. Therefore, (z, g(q)) ∈ Z for
every sufficient set Z, so (z, g(q)) ∈ Y. It follows that Y is sufficient.

We next show that for all z ∈ W, there exists a unique a ∈ A such that (z, a) ∈ Y. Let

X = {z ∈ W : there exists a unique a ∈ A such that (z, a) ∈ Y}.

Suppose that z ∈ W is such that y ∈ X for all y < z. Let q = Y ∩ (W(z) × A) and notice that q ∈ A^{W(z)}.
Since (y, q(y)) ∈ Y for all y < z and Y is sufficient, it follows that (z, g(q)) ∈ Y. Fix b ∈ A with b ≠ g(q).
We then have that Y \ {(z, b)} is sufficient (otherwise, there exists p ∈ A^{W(z)} such that (y, p(y)) ∈ Y for all
y < z and g(p) = b, but this implies that p = q and hence b = g(q), a contradiction), so by definition of Y it follows that
Y ⊆ Y \ {(z, b)}. Hence, (z, b) ∉ Y. Therefore, there exists a unique a ∈ A such that (z, a) ∈ Y, so z ∈ X.
By induction, we conclude that X = W, so for all z ∈ W, there exists a unique a ∈ A such that (z, a) ∈ Y.

Let f = Y and notice that f : W → A from above. Suppose that z ∈ W. Define q ∈ A^{W(z)} by letting
q = Y ∩ (W(z) × A) and notice that q = f ↾ W(z). Since (y, q(y)) ∈ Y for all y < z and Y is sufficient, it
follows that (z, g(q)) ∈ Y, so f(z) = g(q) = g(f ↾ W(z)).

We now prove uniqueness. Suppose that f1, f2 : W → A are such that:
1. f1(z) = g(f1 ↾ W(z)) for all z ∈ W.
2. f2(z) = g(f2 ↾ W(z)) for all z ∈ W.
Let X = {z ∈ W : f1(z) = f2(z)}. We prove by induction that X = W. Let z ∈ W and suppose that y ∈ X
for all y < z. We then have that f1 ↾ W(z) = f2 ↾ W(z), hence
f1(z) = g(f1 ↾ W(z)) = g(f2 ↾ W(z)) = f2(z),
hence z ∈ X. It follows by induction that X = W, so f1(z) = f2(z) for all z ∈ W.
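For a finite well-ordering, the recursion of Theorem 12.1.7 can be carried out by a simple loop, since each element only needs the values at its predecessors. A minimal sketch (the function names and the example g are ours, not the text's):

```python
def recursion_on_well_ordering(W, g):
    """Build the unique f with f(z) = g(f restricted to W(z)), where the list W
    enumerates a finite well-ordering in increasing order (Theorem 12.1.7)."""
    f = {}
    for z in W:             # increasing order: all predecessors are already defined
        prior = dict(f)     # f restricted to W(z) = {y : y < z}
        f[z] = g(prior)
    return f

# Example g: each value is one more than the sum of all earlier values.
g = lambda prior: 1 + sum(prior.values())
f = recursion_on_well_ordering(["a", "b", "c", "d"], g)
assert f == {"a": 1, "b": 2, "c": 4, "d": 8}
```

The loop mirrors the theorem exactly: g never sees z itself, only the restriction of f below z, which is what makes the definition legitimate.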
Definition 12.1.8. Let (W1, <1) and (W2, <2) be well-orderings.
1. A function f : W1 → W2 is order-preserving if whenever x, y ∈ W1 and x <1 y, we have f(x) <2 f(y).
2. A function f : W1 → W2 is an isomorphism if it is bijective and order-preserving.
3. If W1 and W2 are isomorphic, we write W1 ≅ W2.
Proposition 12.1.9. Suppose that (W, <) is a well-ordering and f : W → W is order-preserving. We then
have f(z) ≥ z for all z ∈ W.

Proof. We prove the result by induction on W. Suppose that z ∈ W and f(y) ≥ y for all y < z. Suppose
instead that f(z) < z, and let x = f(z). Since f is order-preserving and x < z, it follows that f(x) <
f(z) = x, contradicting the fact that f(y) ≥ y for all y < z. Therefore, f(z) ≥ z. The result follows by
induction.
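Proposition 12.1.9 can be spot-checked exhaustively on a small finite well-ordering 0 < 1 < … < n−1. (On a finite linear order the only order-preserving self-map is in fact the identity, so f(z) ≥ z holds trivially there; the substance of the proposition lies in the infinite case.) A brute-force sketch:

```python
from itertools import product

# Check f(z) >= z for every order-preserving (strictly increasing) f : n -> n.
n = 5
for f in product(range(n), repeat=n):     # all n^n maps f : n -> n
    if all(f[x] < f[y] for x in range(n) for y in range(x + 1, n)):
        assert all(f[z] >= z for z in range(n))
print("checked all order-preserving self-maps of a 5-element well-ordering")
```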
Corollary 12.1.10.
1. If (W, <) is a well-ordering and z ∈ W, then W ≇ W(z).
2. If (W, <) is a well-ordering, then its only automorphism is the identity.
3. If (W1, <1) and (W2, <2) are well-orderings and W1 ≅ W2, then the isomorphism from W1 to W2 is
unique.

Proof.
1. Suppose that W ≅ W(z) for some z ∈ W, and let f : W → W(z) be a witnessing isomorphism. Then
f : W → W is order-preserving and f(z) < z (because f(z) ∈ W(z)), contrary to Proposition 12.1.9.
2. Suppose that f : W → W is an automorphism of W. By Proposition 12.1.9,
we have f(z) ≥ z for all z ∈ W. Suppose that z ∈ W and let y = f(z). Since f⁻¹ : W → W is also
an automorphism of W, Proposition 12.1.9 implies that f⁻¹(y) ≥ y, hence z ≥ f(z). Combining this
with the above-mentioned fact that f(z) ≥ z, it follows that z = f(z). Therefore, f is the identity.
3. Suppose that f : W1 → W2 and g : W1 → W2 are both isomorphisms. We then have that g⁻¹ : W2 → W1
is an isomorphism, hence g⁻¹ ∘ f : W1 → W1 is an automorphism. Hence, by part 2, we may conclude
that g⁻¹ ∘ f is the identity on W1. It follows that f = g.

Theorem 12.1.11. Let (W1, <1) and (W2, <2) be well-orderings. Exactly one of the following holds:
1. W1 ≅ W2.
2. There exists z ∈ W2 such that W1 ≅ W2(z).
3. There exists z ∈ W1 such that W1(z) ≅ W2.
In each of the above cases, the isomorphism and the z (if appropriate) are unique.

Proof. We first prove that one of the three options holds. Fix a set a such that a ∉ W1 ∪ W2 (such an
a exists by Proposition 10.1.4). Our goal is to define a function f : W1 → W2 ∪ {a} recursively. Define
g : (W2 ∪ {a})^{<W1} → W2 ∪ {a} as follows. Let q ∈ (W2 ∪ {a})^{<W1} and fix z ∈ W1 such that q : W1(z) →
W2 ∪ {a}. If a ∈ ran(q) or ran(q) = W2, let g(q) = a. Otherwise ran(q) is a proper subset of W2, and we let
g(q) be the <2-least element of W2 \ ran(q). By Theorem 12.1.7, there is a unique f : W1 → W2 ∪ {a} such
that f(z) = g(f ↾ W1(z)) for all z ∈ W1.

Suppose first that a ∉ ran(f), so that f : W1 → W2. We begin by showing that ran(f ↾ W1(z)) is an initial
segment of W2 for all z ∈ W1 by induction. Suppose that z ∈ W1 and ran(f ↾ W1(y)) is an initial segment of
W2 for all y < z. If z is the initial point of W1, then ran(f ↾ W1(z)) = ∅ is certainly an initial segment of W2.
Suppose that z is a successor point of W1, and let y ∈ W1 be such that y < z and there is no x ∈ W1 with y < x < z. By
induction, we know that ran(f ↾ W1(y)) is an initial segment of W2. Since f(y) = g(f ↾ W1(y)) is the <2-least
element of W2 \ ran(f ↾ W1(y)), it follows that ran(f ↾ W1(z)) = ran(f ↾ W1(y)) ∪ {f(y)} is an initial segment of
W2. Suppose finally that z is a limit point of W1. It then follows that ran(f ↾ W1(z)) = ⋃_{y<z} ran(f ↾ W1(y)).
Since every element of the union is an initial segment of W2, it follows that ran(f ↾ W1(z)) is an initial segment
of W2 (note that it can't equal W2 because f(z) ≠ a).

Therefore, ran(f ↾ W1(z)) is an initial segment of W2 for all z ∈ W1 by induction. It follows that for
all y, z ∈ W1 with y < z, we have f(y) <2 f(z) (because ran(f ↾ W1(z)) is an initial segment of W2 and
f(y) ∈ ran(f ↾ W1(z))), so f is order-preserving. This implies that f is an injection, so if ran(f) = W2, we
have W1 ≅ W2. Otherwise, ran(f) is an initial segment of W2, so by Proposition 12.1.5 there is a z ∈ W2
such that W1 ≅ W2(z).
Suppose now that a ∈ ran(f). Let z ∈ W1 be the <1-least element of W1 such that f(z) = a. It
then follows that f ↾ W1(z) : W1(z) → W2 is order-preserving by induction as above. Also, we must have
ran(f ↾ W1(z)) = W2 because f(z) = a. Therefore, f ↾ W1(z) : W1(z) → W2 is an isomorphism. This
completes the proof that one of the above three cases must hold.

The uniqueness of the case, the isomorphism, and the z (if appropriate) all follow from Corollary 12.1.10.

With this result in hand, we now know that any well-ordering is uniquely determined by its length.
The next goal is to find a nice system of representatives for the isomorphism classes of well-orderings. For
that, we need to generalize the ideas that went into the construction of the natural numbers.

12.2 Ordinals

Our definition of the natural numbers had the advantage that the ordering was given by the membership
relation ∈. This feature allowed us to define successors easily and to think of a natural number n as the set
of all natural numbers less than n. We now seek to continue this progression to measure well-orderings longer
than ω. The idea is to define successors as in the case of the natural numbers, but now to take unions to
achieve limit points.

The key property of ω (and each n ∈ ω) that we want to use in our definition of ordinals is the fact that
∈ well-orders ω (and each n ∈ ω). We need one more condition to ensure that there are no holes or gaps
in the set. For example, ∈ well-orders the set {0, 2, 3, 5}, but we don't want to consider it as an ordinal
because it skipped over 1 and 4. We therefore make the following definition.
Definition 12.2.1. A set z is transitive if whenever x and y are sets such that x ∈ y and y ∈ z, we have
x ∈ z.

Definition 12.2.2. Let z be a set. We define a relation ∈_z on z by setting ∈_z = {(x, y) ∈ z × z : x ∈ y}.

Definition 12.2.3. An ordinal is a set α which is transitive and well-ordered by ∈_α.

Our hard work developing the natural numbers gives us one interesting example of an ordinal.

Proposition 12.2.4. ω is an ordinal.

Proof. Proposition 10.5.11 says that ω is transitive, and Theorem 10.5.18 says that ω is well-ordered by
< = ∈_ω.

Proposition 12.2.5. If α is an ordinal and β ∈ α, then β is an ordinal.

Proof. We first show that β is transitive. Let x and y be sets with x ∈ y and y ∈ β. Since y ∈ β, β ∈ α, and
α is transitive, it follows that y ∈ α. Since x ∈ y and y ∈ α, it follows that x ∈ α. Now since x, y, β ∈ α,
x ∈ y, y ∈ β, and ∈_α is transitive on α, we may conclude that x ∈ β. Therefore, β is transitive.
Notice that β ⊆ α because β ∈ α and α is transitive. Therefore, ∈_β is the restriction of ∈_α to the subset
β. Since ∈_α is a well-ordering on α, it follows that ∈_β is a well-ordering on β. Hence, β is an ordinal.

Corollary 12.2.6. Every n ∈ ω is an ordinal.

Lemma 12.2.7. If α is an ordinal, then α ∉ α.

Proof. Suppose that α is an ordinal and α ∈ α. Since α ∈ α, it follows that ∈_α is not asymmetric on α,
contradicting the fact that ∈_α is a well-ordering on α.

Proposition 12.2.8. If α is an ordinal, then S(α) is an ordinal.
Proof. We first show that S(α) is transitive. Suppose that x ∈ y ∈ S(α). Since y ∈ S(α) = α ∪ {α}, either
y ∈ α or y = α. Suppose first that y ∈ α. We then have x ∈ y ∈ α, so x ∈ α because α is transitive. Hence,
x ∈ S(α). Suppose now that y = α. We then have x ∈ α because x ∈ y, so x ∈ S(α).

We next show that ∈_{S(α)} is transitive on S(α). Let x, y, z ∈ S(α) with x ∈ y ∈ z. Since z ∈ S(α), either
z ∈ α or z = α. Suppose first that z ∈ α. We then have y ∈ α (since y ∈ z and α is transitive), and
hence x ∈ α (since x ∈ y and α is transitive). Thus, x, y, z ∈ α, so we may conclude that x ∈ z using
the fact that ∈_α is transitive on α. Suppose now that z = α. We then have x ∈ α = z because x ∈ y ∈ α
and α is transitive.

We next show that ∈_{S(α)} is asymmetric on S(α). Let x ∈ S(α). If x ∈ α, then x ∉ x because ∈_α is
asymmetric on α. If x = α, then x ∉ x by Lemma 12.2.7.

We now show that ∈_{S(α)} is connected on S(α). Let x, y ∈ S(α). If x ∈ α and y ∈ α, then either x ∈ y,
x = y, or y ∈ x because ∈_α is connected on α. If x = α and y = α, we clearly have x = y. Otherwise, one
of x, y equals α, and the other is an element of α, in which case we're done.

Finally, suppose that X ⊆ S(α) and X ≠ ∅. If X ∩ α = ∅, then we must have X = {α}, in which case
X clearly has an ∈_{S(α)}-least element. Suppose that X ∩ α ≠ ∅. Since X ∩ α is nonempty and ∈_α is a
well-ordering on α, there exists an ∈_α-least element β of X ∩ α. For any γ ∈ X, either γ ∈ α, in which case we
have either γ = β or β ∈ γ by choice of β, or γ = α, in which case β ∈ γ (because β ∈ α). Therefore,
X has an ∈_{S(α)}-least element.
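The successor construction can be mimicked concretely by representing hereditarily finite sets as nested frozensets, so that the first few von Neumann ordinals can be built and their transitivity checked. A sketch (the helper names are ours):

```python
def successor(a):
    """S(a) = a ∪ {a}, as in Proposition 12.2.8."""
    return frozenset(a) | {frozenset(a)}

def is_transitive(z):
    """z is transitive when every element of z is a subset of z."""
    return all(x <= z for x in z)   # for frozensets, x <= z means x ⊆ z

zero = frozenset()        # 0 = ∅
one = successor(zero)     # 1 = {∅}
two = successor(one)      # 2 = {∅, {∅}}
three = successor(two)

for ordinal in (zero, one, two, three):
    assert is_transitive(ordinal)
assert zero in three and one in three and two in three   # ∈ is the order
assert len(three) == 3                                   # n has exactly n elements
```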
Proposition 12.2.9. Suppose that α and β are ordinals. We then have α ⊆ β if and only if either α = β
or α ∈ β.

Proof. (⇐) If α = β, then clearly α ⊆ β, and if α ∈ β we can use the fact that β is transitive to conclude
that α ⊆ β.

(⇒) Suppose that α ⊆ β and α ≠ β. Notice that β \ α is a nonempty subset of β, so there exists an
∈_β-least element of β \ α; call it z. We show that α = z, hence α ∈ β. We first show that z ⊆ α. Let x ∈ z.
Since z ∈ β and β is transitive, we have x ∈ β. Since x ∈ z, we cannot have x ∈ β \ α by choice of z, so
x ∈ α. Thus, z ⊆ α. We next show that α ⊆ z. Let x ∈ α. Since α ⊆ β, we have x ∈ β. Using the fact
that x, z ∈ β and ∈_β is connected on β, we know that either x ∈ z, x = z, or z ∈ x. We cannot have x = z
because x ∈ α and z ∈ β \ α. Also, we cannot have z ∈ x, because if z ∈ x we could also conclude that z ∈ α
(because z ∈ x ∈ α and α is transitive), contradicting the fact that z ∈ β \ α. Thus, α ⊆ z. It follows that
z = α (by Extensionality), so α ∈ β.
Proposition 12.2.10. Suppose that α and β are ordinals. Exactly one of α ∈ β, α = β, or β ∈ α holds.

Proof. We first show that at least one of α ∈ β, α = β, β ∈ α holds. We first claim that α ∩ β is an ordinal.
If x ∈ y ∈ α ∩ β, then x ∈ y ∈ α and x ∈ y ∈ β, so x ∈ α and x ∈ β (because α and β are transitive), and
hence x ∈ α ∩ β. Thus, α ∩ β is transitive. Notice that ∈_{α∩β} is the restriction of ∈_α to the subset α ∩ β.
Since ∈_α is a well-ordering on α, it follows that ∈_{α∩β} is a well-ordering on α ∩ β. Hence, α ∩ β is an ordinal.

Now we have α ∩ β ⊆ α and α ∩ β ⊆ β. If α ∩ β ≠ α and α ∩ β ≠ β, then α ∩ β ∈ α and α ∩ β ∈ β by
Proposition 12.2.9, hence α ∩ β ∈ α ∩ β, contrary to Lemma 12.2.7. Therefore, either α ∩ β = α or α ∩ β = β.
If α ∩ β = α, we then have α ⊆ β, hence either α = β or α ∈ β by Proposition 12.2.9. Similarly, if α ∩ β = β,
we then have β ⊆ α, hence either β = α or β ∈ α by Proposition 12.2.9. Thus, in any case, at least one of
α ∈ β, α = β, or β ∈ α holds.

We finish by showing that exactly one of α ∈ β, α = β, or β ∈ α holds. If α ∈ β and α = β, then α ∈ α,
contrary to Lemma 12.2.7. Similarly, if α = β and β ∈ α, then α ∈ α, contrary to Lemma 12.2.7. Finally, if
α ∈ β and β ∈ α, then α ∈ α (because α is transitive), contrary to Lemma 12.2.7.
Definition 12.2.11. If α and β are ordinals, we write α < β to mean that α ∈ β.

Proposition 12.2.12. Suppose that α and β are ordinals. If α ≅ β as well-orderings, then α = β.
Proof. If α ≠ β, then either α < β or β < α by Proposition 12.2.10. Suppose without loss of generality that
α < β. We then have that the well-ordering α is an initial segment of the well-ordering β (in the notation
for well-orderings, we have α = β(α)), hence α ≇ β by Corollary 12.1.10.

By the above results, it seems that we are in a position to say that < is a linear ordering on the collection
of all ordinals. However, there is a small problem here. We do not know that the class of all ordinals is a
set. In fact, we will see below that the collection of all ordinals is a proper class.

Definition 12.2.13. ORD is the class of all ordinals.
We first establish that nonempty sets of ordinals have least elements.
Proposition 12.2.14. If A is a nonempty subset of ORD, then A has a least element. Furthermore, the
least element is given by ⋂A.

Proof. Since A ≠ ∅, we may fix an ordinal α ∈ A. If A ∩ α = ∅, then for any β ∈ A, we cannot have β ∈ α,
hence either α = β or α ∈ β by Proposition 12.2.10, so α is the least element of A. Suppose that A ∩ α ≠ ∅. Since A ∩ α is nonempty,
it has an ∈_α-least element; call it γ. Let β ∈ A and notice that β is an ordinal. By Proposition 12.2.10,
either β ∈ γ, β = γ, or γ ∈ β. If β ∈ α, then β ∈ A ∩ α, so either γ = β or γ ∈ β by choice of γ. If β = α,
then γ ∈ β because γ ∈ α. If α ∈ β, we then have γ ∈ α ∈ β, so γ ∈ β because β is transitive. It follows that
γ is the least element of A.

Therefore, we know that A has a least element; call it γ. Since γ ∈ A, we certainly have ⋂A ⊆ γ. For
all β ∈ A, we then have either γ = β or γ ∈ β, hence γ ⊆ β by Proposition 12.2.9. Therefore, γ ⊆ ⋂A. It
follows that γ = ⋂A.
Proposition 12.2.15. If A is a subset of ORD, then ⋃A is an ordinal. Furthermore, we have ⋃A = sup A,
i.e. α ⊆ ⋃A for all α ∈ A, and ⋃A ⊆ β whenever β is an ordinal with α ⊆ β for all α ∈ A.

Proof. We first show that ⋃A is transitive. Suppose that x ∈ y ∈ ⋃A. Since y ∈ ⋃A, there exists α ∈ A,
necessarily an ordinal, such that y ∈ α. Since α is transitive and x ∈ y ∈ α, we can conclude that x ∈ α.
It follows that x ∈ ⋃A. Hence, ⋃A is transitive.

We next show that ∈_{⋃A} is transitive on ⋃A. Let x, y, z ∈ ⋃A with x ∈ y ∈ z. Since z ∈ ⋃A, there
exists α ∈ A, necessarily an ordinal, such that z ∈ α. Since z ∈ α and α is an ordinal, we may
use Proposition 12.2.5 to conclude that z is an ordinal. Thus, z is transitive, so we may use the fact that
x ∈ y ∈ z to conclude that x ∈ z.

We next show that ∈_{⋃A} is asymmetric on ⋃A. Let x ∈ ⋃A and fix α ∈ A, necessarily an ordinal, such
that x ∈ α. Using Proposition 12.2.5 again, it follows that x is an ordinal, hence x ∉ x by Lemma
12.2.7.

We now show that ∈_{⋃A} is connected on ⋃A. Let x, y ∈ ⋃A. Fix α, β ∈ A, necessarily ordinals, such
that x ∈ α and y ∈ β. Again, using Proposition 12.2.5, we may conclude that x and y are ordinals,
hence either x ∈ y, x = y, or y ∈ x by Proposition 12.2.10.

Finally, suppose that X ⊆ ⋃A and X ≠ ∅. Notice that for any y ∈ X, there exists α ∈ A, necessarily an
ordinal, such that y ∈ α, and hence y is an ordinal by Proposition 12.2.5. Therefore, X is a nonempty
subset of ORD, so by Proposition 12.2.14 we may conclude that X has a least element (with respect to
∈_{⋃A}).

We now show that ⋃A = sup A. Suppose that α ∈ A. For any β ∈ α, we have β ∈ ⋃A, hence α ⊆ ⋃A.
Thus, ⋃A is an upper bound for A. Suppose that β is an upper bound for A, i.e. β is an ordinal and α ⊆ β
for all α ∈ A. For any γ ∈ ⋃A, we may fix α ∈ A such that γ ∈ α, and notice that α ⊆ β, so γ ∈ β. It
follows that ⋃A ⊆ β. Therefore, ⋃A = sup A.
Proposition 12.2.16. ORD is a proper class.
Proof. Suppose that ORD is a set, so that there is a set O such that α is an ordinal if and only if α ∈ O. In
this case, O is a transitive set (by Proposition 12.2.5) which is well-ordered by ∈_O (transitivity follows from
the fact that ordinals are transitive sets, asymmetry follows from Lemma 12.2.7, connectedness follows from
Proposition 12.2.10, and the fact that every nonempty subset has a least element is given by Proposition
12.2.14). Therefore, O is an ordinal, and so it follows that O ∈ O, contrary to Lemma 12.2.7. Hence, ORD
is not a set.
Since ORD is a proper class, there are subclasses of ORD which are not subsets of ORD. We therefore
extend Proposition 12.2.14 to the case of nonempty subclasses of ORD. The idea is that if we fix an α ∈ C,
then C ∩ α becomes a set of ordinals, so we can apply the above result.

Proposition 12.2.17. If C is a nonempty subclass of ORD, then C has a least element.

Proof. Since C ≠ ∅, we may fix an ordinal α ∈ C. If C ∩ α = ∅, then for any β ∈ C, we cannot have
β ∈ α, hence either α = β or α ∈ β by Proposition 12.2.10, so α is the least element of C. Suppose that C ∩ α ≠ ∅. In this case, C ∩ α is
a nonempty set of ordinals by Separation, hence C ∩ α has a least element β by Proposition 12.2.14. It now
follows easily that β is the least element of C.
Proposition 12.2.18 (Induction on ORD). Suppose that C ⊆ ORD and that for all ordinals α, if β ∈ C
for all β < α, then α ∈ C. We then have C = ORD.

Proof. Suppose instead that C ⊊ ORD. Let B = ORD \ C and notice that B is a nonempty class of ordinals. By
Proposition 12.2.17, it follows that B has a least element; call it α. For all β < α, we then have β ∉ B,
hence β ∈ C. By assumption, this implies that α ∈ C, a contradiction. It follows that C = ORD.
This gives a way to do strong induction on the ordinals, but there is a slightly more basic version. We
can't avoid looking at many previous values at limit ordinals, but we can get by with just looking at the
previous ordinal in the case of successors.
Proposition 12.2.19 (Step/Limit Induction on ORD). Suppose that C ⊆ ORD and that:
1. 0 ∈ C.
2. Whenever α ∈ C, we have S(α) ∈ C.
3. Whenever α is a limit ordinal and β ∈ C for all β < α, we have α ∈ C.
We then have C = ORD.

Proof. Suppose instead that C ⊊ ORD. Let B = ORD \ C and notice that B is a nonempty class of ordinals. By
Proposition 12.2.17, it follows that B has a least element; call it α. We can't have α = 0 because 0 ∈ C.
Also, it is not possible that α is a successor, say α = S(β), because if so, then β ∉ B (because β < α), so
β ∈ C, hence α = S(β) ∈ C. Finally, suppose that α is a limit. Then for all β < α, we have β ∉ B,
hence β ∈ C. By assumption, this implies that α ∈ C, a contradiction. It follows that C = ORD.
Theorem 12.2.20 (Recursive Definitions on ORD). Let G : V → V be a class function. There exists a
unique class function F : ORD → V such that F(α) = G(F ↾ α) for all α ∈ ORD.

Theorem 12.2.21 (Recursive Definitions with Parameters on ORD). Let P be a class and let G : P × V →
V be a class function. There exists a unique class function F : P × ORD → V such that F(p, α) = G(F_p ↾ α)
for all p ∈ P and all α ∈ ORD.

Theorem 12.2.22. Let (W, <) be a well-ordering. There exists a unique ordinal α such that W ≅ α.
Proof. Fix a set a such that a ∉ W (such an a exists by Proposition 10.1.4). We define a class function
F : ORD → W ∪ {a} recursively as follows. If a ∈ ran(F ↾ α) or ran(F ↾ α) = W, let F(α) = a. Otherwise,
ran(F ↾ α) ⊊ W, and we let F(α) be the least element of W \ ran(F ↾ α).

Since ORD is a proper class, it follows from Proposition 10.6.4 that F is not injective. From this it
follows that a ∈ ran(F) (otherwise, a simple inductive proof gives that F would have to be injective). Let α
be the least ordinal such that F(α) = a. Now it is straightforward to prove (along the lines of the proof of
Theorem 12.1.11) that F ↾ α : α → W is an isomorphism.

Uniqueness follows from Proposition 12.2.12.

Definition 12.2.23. Let (W, <) be a well-ordering. The unique ordinal α such that W ≅ α is called the
order-type of (W, <).

12.3 Arithmetic on Ordinals

Definition 12.3.1. We define ordinal addition (that is, a class function + : ORD × ORD → ORD) recursively as follows:
1. α + 0 = α.
2. α + S(β) = S(α + β).
3. α + β = ⋃{α + γ : γ < β} if β is a limit ordinal.

Similarly, we define ordinal multiplication recursively as follows:
1. α · 0 = 0.
2. α · S(β) = α · β + α.
3. α · β = ⋃{α · γ : γ < β} if β is a limit ordinal.

Finally, we define ordinal exponentiation recursively as follows:
1. α^0 = 1.
2. α^{S(β)} = α^β · α.
3. α^β = ⋃{α^γ : γ < β} if β is a limit ordinal.
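On finite ordinals (that is, on natural numbers) the successor clauses alone determine the three operations, and there they agree with ordinary arithmetic; the limit clauses only matter for infinite ordinals (for instance 1 + ω = ω ≠ ω + 1), which a terminating computation cannot reach. A sketch of the successor recursions:

```python
def add(a, b):
    """a + b via the successor clause a + S(b) = S(a + b); the limit clause is
    never needed for finite ordinals."""
    return a if b == 0 else add(a, b - 1) + 1

def mul(a, b):
    """a * b via the clause a * S(b) = a * b + a."""
    return 0 if b == 0 else add(mul(a, b - 1), a)

def power(a, b):
    """a ** b via the clause a ** S(b) = (a ** b) * a."""
    return 1 if b == 0 else mul(power(a, b - 1), a)

assert add(2, 3) == 5 and mul(2, 3) == 6 and power(2, 3) == 8
```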
Proposition 12.3.2. Let α, β, and γ be ordinals. If β ≤ γ, then α + β ≤ α + γ.

Proof. Fix ordinals α and β. We prove by induction on γ that if β ≤ γ, then α + β ≤ α + γ. If γ = β, this
is trivial. Suppose that γ ≥ β and we know the result for γ. We then have
α + β ≤ α + γ
< S(α + γ)
= α + S(γ)
Suppose now that γ > β is a limit ordinal. We then have
α + β ⊆ ⋃{α + δ : δ < γ}
= α + γ
(since β < γ)
Proposition 12.3.3. Let α, β, and γ be ordinals. We have β < γ if and only if α + β < α + γ.

Proof. Notice first that
α + β < S(α + β) = α + S(β)
Now for any ordinal γ > β, we have S(β) ≤ γ, hence
α + β < α + S(β) ≤ α + γ
Proposition 12.3.4. Let α and β be ordinals. If β is a limit ordinal, then α + β is a limit ordinal.

Proof. Since β is a limit ordinal, we have
α + β = ⋃{α + γ : γ < β}
Suppose now that δ < α + β, and fix an ordinal γ < β such that δ < α + γ. We then have that S(γ) < β
because β is a limit ordinal, hence
S(δ) < S(α + γ)
= α + S(γ)
⊆ α + β
It follows that α + β is a limit ordinal.
Proposition 12.3.5. Let α, β, and γ be ordinals. We have (α + β) + γ = α + (β + γ).

Proof. Fix ordinals α and β. We prove that (α + β) + γ = α + (β + γ) for all ordinals γ by induction. Suppose
first that γ = 0. We then have
(α + β) + 0 = α + β
= α + (β + 0)
Suppose now that γ is an ordinal and we know that (α + β) + γ = α + (β + γ). We then have
(α + β) + S(γ) = S((α + β) + γ)
= S(α + (β + γ))
(by induction)
= α + S(β + γ)
= α + (β + S(γ))
Suppose now that γ is a limit ordinal and we know that (α + β) + δ = α + (β + δ) for all δ < γ. We then
have
(α + β) + γ = ⋃{(α + β) + δ : δ < γ}
= ⋃{α + (β + δ) : δ < γ}
= ⋃{α + δ : δ < β + γ}
= α + (β + γ)
where the last line follows because β + γ is a limit ordinal.
12.4 Cardinals
Definition 12.4.1. A cardinal is an ordinal α such that α ≉ β for any β < α.

Proposition 12.4.2. Every n ∈ ω is a cardinal, and ω is a cardinal.

Proposition 12.4.3. Every infinite cardinal is a limit ordinal.
Proposition 12.4.4. Let A be a set. There is an ordinal α such that α ⋠ A.

Proof. Let F = {(B, R) ∈ P(A) × P(A × A) : R is a well-ordering on B}. By Collection and Separation,
S = {order-type(B, R) : (B, R) ∈ F} is a set of ordinals. Let α be an ordinal such that α > ⋃S (such an
α exists because ORD is a proper class). Notice that α ⋠ A, because if f : α → A were an injection, we could
let B = ran(f) and let R be the well-ordering on B obtained by transferring the ordering of α. We would
then have α ∈ S since (B, R) ∈ F and (B, R) has order-type α, a contradiction. It follows that α ⋠ A.

Definition 12.4.5. Let A be a set. The least ordinal α such that α ⋠ A (i.e. such that there is no injection from α into A) is called the Hartogs number of A,
and is denoted by H(A).
Proposition 12.4.6. H(A) is a cardinal for every set A.

Proof. Let A be a set and let α = H(A). Suppose that β < α and β ≈ α. Let f : α → β be a bijection.
Since β < α = H(A), there exists an injection g : β → A. We then have that g ∘ f : α → A is an injection,
contrary to the fact that α ⋠ A. It follows that α ≉ β for any β < α, so H(A) = α is a cardinal.

Definition 12.4.7. If κ is a cardinal, we let κ⁺ = H(κ).
Definition 12.4.8. We define ℵ_α for α ∈ ORD by:
1. ℵ_0 = ω.
2. ℵ_{α+1} = (ℵ_α)⁺.
3. ℵ_α = ⋃{ℵ_β : β < α} if α is a limit ordinal.
The following proposition can be proved by a straightforward induction.

Proposition 12.4.9. Let α and β be ordinals.
1. α ≤ ℵ_α.
2. If α < β, then ℵ_α < ℵ_β.
Proposition 12.4.10. Let κ be an ordinal. κ is an infinite cardinal if and only if there exists α ∈ ORD
with κ = ℵ_α.

Proof. We first prove that ℵ_α is an infinite cardinal for all α ∈ ORD by induction. Notice that ℵ_0 = ω
is a cardinal by Proposition 12.4.2. Also, if ℵ_α is a cardinal, then ℵ_{α+1} = (ℵ_α)⁺ = H(ℵ_α) is a cardinal by
Proposition 12.4.6. Suppose then that α is a limit ordinal and that ℵ_β is a cardinal for all β < α. Notice
that ℵ_α is an ordinal by Proposition 12.2.15. Suppose that γ < ℵ_α. Since γ < ℵ_α = ⋃{ℵ_β : β < α}, there
exists β < α such that γ < ℵ_β. Notice that β + 1 < α since β < α and α is a limit ordinal. Since ℵ_{β+1} ⋠ ℵ_β
and γ ⊆ ℵ_β, it follows that ℵ_{β+1} ⋠ γ, and hence ℵ_α ≉ γ (as ℵ_{β+1} ⊆ ℵ_α). Therefore ℵ_α ≉ γ for any γ < ℵ_α, hence ℵ_α is a cardinal.

Suppose now that κ is an infinite cardinal. By Proposition 12.4.9, we have κ ≤ ℵ_κ. If κ = ℵ_κ, we are
done. Suppose then that κ < ℵ_κ, and let α be the least ordinal such that κ < ℵ_α. Notice that α ≠ 0 because
κ is infinite, and α cannot be a limit ordinal (otherwise, κ < ℵ_β for some β < α). Thus, there exists β such
that α = S(β). By choice of α, we have ℵ_β ≤ κ. If ℵ_β < κ, then ℵ_β < κ < ℵ_{S(β)} = H(ℵ_β), contradicting
the definition of H(ℵ_β). It follows that κ = ℵ_β.
Proposition 12.4.11. Let A be a set. There exists an ordinal α such that A ≈ α if and only if A can be
well-ordered.

Proof. Suppose first that there exists an ordinal α such that A ≈ α. We use a bijection between A and α
to transfer the ordering on the ordinals to an ordering on A. Let f : A → α be a bijection. Define a relation
< on A by letting a < b if and only if f(a) < f(b). It is then straightforward to check that (A, <) is a
well-ordering (using the fact that (α, ∈) is a well-ordering).

For the converse direction, suppose that A can be well-ordered. Fix a relation < on A so that (A, <) is a
well-ordering. By Theorem 12.2.22, there is an ordinal α such that A ≅ α. In particular, we have A ≈ α.
Of course, this leaves open the question of which sets can be well-ordered. Below, we will use the Axiom
of Choice to show that every set can be well-ordered.

Definition 12.4.12. Let A be a set which can be well-ordered. We define |A| to be the least ordinal α such
that A ≈ α.

Lemma 12.4.13. If A can be well-ordered, then |A| is a cardinal.
12.5 Addition and Multiplication of Cardinals
Lemma 12.5.1. Let A1, A2, B1, B2 be sets with A1 ≈ A2 and B1 ≈ B2.

1. (A1 × {0}) ∪ (B1 × {1}) ≈ (A2 × {0}) ∪ (B2 × {1}).
2. A1 × B1 ≈ A2 × B2.

This lemma gives us a reasonable way to define the sum and product of two cardinals. Let κ and λ be
cardinals. Notice that (κ × {0}) ∪ (λ × {1}) and κ × λ can be well-ordered. This allows us to make the
following definition.

Definition 12.5.2. Let κ and λ be cardinals.
1. κ + λ = |(κ × {0}) ∪ (λ × {1})|.
2. κ · λ = |κ × λ|.
Proposition 12.5.3. Let κ and λ be cardinals.
1. κ + λ = λ + κ.
2. κ · λ = λ · κ.
Definition 12.5.4. We define an ordering < on ORD × ORD as follows. Let α1, β1, α2, β2 be ordinals. We set (α1, β1) < (α2, β2) if one of the following holds.
1. max{α1, β1} < max{α2, β2}.
2. max{α1, β1} = max{α2, β2} and α1 < α2.
3. max{α1, β1} = max{α2, β2}, α1 = α2, and β1 < β2.


Lemma 12.5.5. < is a well-ordering on ORD × ORD.
Proof. Transitivity, asymmetry, and connectedness are easily shown by appealing to the transitivity, asymmetry, and connectedness of the ordering on ORD. Let C be a nonempty subclass of ORD × ORD. Notice that D = {max{α, β} : (α, β) ∈ C} is a nonempty subclass of ORD, hence has a least element γ by Proposition 12.2.17. Now let A = {α : (α, γ) ∈ C}.
Suppose first that A ≠ ∅, and let α_0 be the least element of A (which exists by Proposition 12.2.14). Let (α, β) ∈ C. Notice that if max{α, β} > γ, we then have (α_0, γ) < (α, β). Suppose then that max{α, β} = γ. If α = γ, we then have (α_0, γ) ≤ (α, β) because α_0 ≤ γ. If α ≠ γ and β = γ, we then have α_0 ≤ α by choice of α_0, hence (α_0, γ) ≤ (α, β).
Suppose now that A = ∅. Let B = {β ∈ S(γ) : (γ, β) ∈ C} and notice that B ≠ ∅. Let β_0 be the least element of B (which exists by Proposition 12.2.14). Let (α, β) ∈ C. Notice that if max{α, β} > γ, we then have (γ, β_0) < (α, β). Suppose then that max{α, β} = γ. Notice that we must have α = γ because A = ∅. It follows that β_0 ≤ β by choice of β_0, hence (γ, β_0) ≤ (α, β).
Theorem 12.5.6. For all α ∈ ORD, we have ℵ_α · ℵ_α = ℵ_α.
Proof. The proof is by induction on α ∈ ORD. Suppose that α is an ordinal and that ℵ_β · ℵ_β = ℵ_β for all β < α. Notice that if we restrict the < relation on ORD × ORD to ℵ_α × ℵ_α, we still get a well-ordering. Given (δ, ε) ∈ ℵ_α × ℵ_α, we let
P_{δ,ε} = {(η1, η2) ∈ ℵ_α × ℵ_α : (η1, η2) < (δ, ε)}.
Let (δ, ε) ∈ ℵ_α × ℵ_α. Let γ = max{δ, ε} + 1. Since δ, ε < ℵ_α, and ℵ_α is an infinite cardinal by Proposition 12.4.10, it follows that γ < ℵ_α and hence |γ| < ℵ_α. Fix β < α such that |γ| = ℵ_β. We then have P_{δ,ε} ⊆ γ × γ ≈ ℵ_β × ℵ_β ≈ ℵ_β, by induction. Therefore, |P_{δ,ε}| < ℵ_α for every (δ, ε) ∈ ℵ_α × ℵ_α.
Since ℵ_α × ℵ_α is well-ordered by <, it follows from Theorem 12.2.22 that (ℵ_α × ℵ_α, <) ≅ θ for some ordinal θ. Let f : ℵ_α × ℵ_α → θ be a witnessing isomorphism. Then f is injective, so we must have ℵ_α ⪯ θ, and hence ℵ_α ≤ θ. Suppose that ℵ_α < θ. Since f is an isomorphism, there exists (δ, ε) ∈ ℵ_α × ℵ_α such that f((δ, ε)) = ℵ_α. We then have |P_{δ,ε}| = ℵ_α, a contradiction. It follows that θ = ℵ_α, so f witnesses that ℵ_α × ℵ_α ≈ ℵ_α. Hence ℵ_α · ℵ_α = ℵ_α.
Corollary 12.5.7. Suppose that κ and λ are cardinals with 1 ≤ κ ≤ λ and ℵ_0 ≤ λ. We then have
1. κ + λ = λ = λ + κ.
2. κ · λ = λ = λ · κ.
Proof. Fix α such that λ = ℵ_α by Proposition 12.4.10. Notice that
κ · λ ≤ λ · λ = ℵ_α · ℵ_α = ℵ_α = λ.
Since we clearly have λ ≤ κ · λ, it follows that κ · λ = λ. Also, notice that
κ + λ ≤ λ + λ = 2 · λ = λ.
Since we clearly have λ ≤ κ + λ, it follows that κ + λ = λ.


Chapter 13

The Axiom of Choice

13.1 Use of the Axiom of Choice in Mathematics

Definition 13.1.1. Let F be a family of nonempty sets. A choice function on F is a function h : F → ⋃F such that h(A) ∈ A for all A ∈ F.

Proposition 13.1.2. The following are equivalent (over ZF).
1. The Axiom of Choice: If F is a family of nonempty pairwise disjoint sets, then there is a set C such that there is a unique element of C ∩ A for every A ∈ F.
2. Every family F of nonempty sets has a choice function.
3. Every family F of nonempty pairwise disjoint sets has a choice function.
Proof. 1 implies 2: Let F be a family of nonempty sets. Let G = {{A} × A : A ∈ F}, and notice that G is a set by Collection and Separation. Furthermore, G is a family of nonempty pairwise disjoint sets. By 1, there is a set C such that there is a unique element of C ∩ B for every B ∈ G. By Separation, we may assume that C ⊆ ⋃G. Letting h = C, it now follows that h : F → ⋃F and h(A) ∈ A for every A ∈ F. Therefore, F has a choice function.
2 implies 3: Trivial.
3 implies 1: Let F be a family of nonempty pairwise disjoint sets. By 3, there is a choice function h for F. Let C = ran(h) and notice that there is a unique element of C ∩ A for every A ∈ F (because the sets in F are pairwise disjoint).
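In the finite case, where no choice principle is needed, the step "3 implies 1" can be carried out concretely. A minimal sketch in Python (the family F is an arbitrary illustration, and min stands in for a choice function):

```python
# A family of nonempty pairwise disjoint finite sets (chosen for illustration).
F = [{0, 1}, {2, 3, 4}, {5}]

# A choice function on F: min picks a definite element of each set, so
# no choice principle is needed in this finite setting.
h = {frozenset(A): min(A) for A in F}

# As in "3 implies 1": C = ran(h) meets each A in F in exactly one
# element, because the sets in F are pairwise disjoint.
C = set(h.values())
assert all(len(C & A) == 1 for A in F)
```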
Here are some examples where the Axiom of Choice is implicitly used in mathematics.
Proposition 13.1.3. If f : A → B is a surjection, there exists an injection g : B → A such that f ∘ g = idB.
Proof. The idea of constructing such a g is to let g(b) be an arbitrary a ∈ A such that f(a) = b. When you think about it, there doesn't seem to be a way to define g without making all of these arbitrary choices.
Define a function H : B → P(A) by letting H(b) = {a ∈ A : f(a) = b}. Notice that H(b) ≠ ∅ for every b ∈ B because f is surjective. Let h : P(A)\{∅} → A be a choice function, so h(D) ∈ D for every D ∈ P(A)\{∅}. Set g = h ∘ H and notice that g : B → A. We first show that (f ∘ g)(b) = b for every b ∈ B. Let b ∈ B. Since h(H(b)) ∈ H(b), it follows that f(h(H(b))) = b, hence (f ∘ g)(b) = f(g(b)) = f(h(H(b))) = b. Therefore, f ∘ g is the identity function on B. We finally show that g is injective. Let b1, b2 ∈ B with g(b1) = g(b2). We then have b1 = (f ∘ g)(b1) = f(g(b1)) = f(g(b2)) = (f ∘ g)(b2) = b2.
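For finite sets the construction in this proof needs no Axiom of Choice and can be run directly. A sketch (the surjection f is an arbitrary illustration; min plays the role of the choice function h):

```python
# A surjection f : A -> B on finite sets (chosen for illustration).
A = [0, 1, 2, 3, 4, 5]
B = ["x", "y", "z"]
f = {0: "x", 1: "y", 2: "x", 3: "z", 4: "y", 5: "z"}

# H(b) = {a in A : f(a) = b} is nonempty for each b since f is surjective.
H = {b: {a for a in A if f[a] == b} for b in B}

# min serves as a choice function h on nonempty sets of integers,
# so the Axiom of Choice is not needed in this finite setting.
g = {b: min(H[b]) for b in B}

assert all(f[g[b]] == b for b in B)       # f o g = id_B
assert len(set(g.values())) == len(B)     # g is injective
```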

Proposition 13.1.4. If f : R → R and y ∈ R, then f is continuous at y if and only if for every sequence {xn}_{n∈ω} with lim_{n→∞} xn = y, we have lim_{n→∞} f(xn) = f(y).
Proof. The left-to-right direction is unproblematic. For the right-to-left direction, the argument is as follows. Suppose that f is not continuous at y, and fix ε > 0 such that there is no δ > 0 such that whenever |x − y| < δ, we have |f(x) − f(y)| < ε. We define a sequence as follows. Given n ∈ ω, let xn be an arbitrary real number with |xn − y| < 1/n such that |f(xn) − f(y)| ≥ ε. Again, we're making infinitely many arbitrary choices in the construction.
To make this precise with a choice function, suppose that f is not continuous at y, and fix ε > 0 such that there is no δ > 0 such that whenever |x − y| < δ, we have |f(x) − f(y)| < ε. Define a function H : R⁺ → P(R) by letting H(δ) = {x ∈ R : |x − y| < δ and |f(x) − f(y)| ≥ ε}. Notice that H(δ) ≠ ∅ for every δ ∈ R⁺ by assumption. Let h : P(R)\{∅} → R be a choice function. For each n ∈ ω, let xn = h(H(1/n)). One then easily checks that lim_{n→∞} xn = y but it's not the case that lim_{n→∞} f(xn) = f(y).

Another example is the proof that the countable union of countable sets is countable. Let {An}_{n∈ω} be countable sets. The first step is to fix injections fn : An → ω for each n ∈ ω and then build an injection f : ⋃_{n∈ω} An → ω from these. However, we are again making infinitely many arbitrary choices when we fix the injections. We'll prove a generalization of this fact using the Axiom of Choice below.
Example. Let F = P(ω)\{∅}. Notice that ⋃F = ω. We can prove the existence of a choice function for F without the Axiom of Choice as follows. Define g : F → ω by letting g(A) be the <-least element of A for every A ∈ P(ω)\{∅}. More formally, we define g = {(A, a) ∈ F × ω : a ∈ A and a ≤ b for all b ∈ A} and prove that g is a choice function on F.
Proposition 13.1.5. Without the Axiom of Choice, one can prove that if F is a family of nonempty sets
and F is finite, then F has a choice function.

13.2 Equivalents of the Axiom of Choice

Theorem 13.2.1 (Zermelo). The following are equivalent.


1. The Axiom of Choice.
2. Every set can be well-ordered.
Proof. 2 implies 1: We show that every family of nonempty sets has a choice function. Let F be a family of nonempty sets. By 2, we can fix a well-ordering < of ⋃F. Define g : F → ⋃F by letting g(A) be the <-least element of A. Notice that g is a choice function on F.
1 implies 2: Let A be a set. It suffices to show that there is an ordinal α such that A ≈ α. First, let g : P(A)\{∅} → A be a choice function. Fix x ∉ A. Our goal is to define a class function F : ORD → A ∪ {x} recursively as follows. If x ∈ ran(F ↾ α) or ran(F ↾ α) = A, let F(α) = x. Otherwise, ran(F ↾ α) ⊊ A, and we let F(α) = g(A\ran(F ↾ α)). Since A is a set and ORD is a proper class, we know that F is not injective. It follows that we must have x ∈ ran(F) (otherwise, a simple induction shows that F is injective). Let α be the least ordinal such that F(α) = x. A straightforward induction now shows that F ↾ α : α → A is injective, and we notice that it is surjective because F(α) = x. It follows that A ≈ α.
Definition 13.2.2. Zorn's Lemma is the statement that if (P, <) is a nonempty partially ordered set with the property that each chain in P has an upper bound in P, then P has a maximal element.
Theorem 13.2.3. The following are equivalent.
1. The Axiom of Choice.


2. Zorn's Lemma.
Proof. 1 implies 2: Let (P, <) be a nonempty partially ordered set with the property that each chain in P has an upper bound in P. Let g : P(P)\{∅} → P be a choice function. Fix x ∉ P. We define a class function F : ORD → P ∪ {x} recursively as follows. If x ∈ ran(F ↾ α), let F(α) = x. Also, if x ∉ ran(F ↾ α) and there is no q ∈ P such that q > p for every p ∈ ran(F ↾ α), let F(α) = x. Otherwise, {q ∈ P : q > p for every p ∈ ran(F ↾ α)} ≠ ∅, and we let F(α) = g({q ∈ P : q > p for every p ∈ ran(F ↾ α)}). We know that F can not be injective, so as above we must have x ∈ ran(F). Fix the least ordinal α such that F(α) = x. A straightforward induction shows that F ↾ α is injective and that ran(F ↾ α) is a chain in P.
Notice that α ≠ 0 because P ≠ ∅. Suppose that α is a limit ordinal. Since ran(F ↾ α) is a chain in P, we know by assumption that there exists q ∈ P with q ≥ p for all p ∈ ran(F ↾ α). Notice that we can not have q = F(β) for any β < α because we would then have β + 1 < α (because α is a limit ordinal) and q < F(β + 1) by definition of F, contrary to the fact that q ≥ p for all p ∈ ran(F ↾ α). It follows that q > p for all p ∈ ran(F ↾ α), hence F(α) ≠ x, a contradiction. It follows that α is a successor ordinal, say α = S(β). Since F(β) ≠ x and F(S(β)) = x, it follows that F(β) is a maximal element of P.
2 implies 1: Let F be a family of nonempty sets. We use Zorn's Lemma to show that F has a choice function. Let P = {q : q is a function, dom(q) ⊆ F, and q(A) ∈ A for every A ∈ dom(q)}. Given p, q ∈ P, we let p < q if and only if p ⊊ q. It is easy to check that (P, <) is a partial ordering. Notice that P ≠ ∅ because ∅ ∈ P. Also, if H is a chain in P, then ⋃H ∈ P, and p ⊆ ⋃H for all p ∈ H. It follows that every chain in P has an upper bound in P. By Zorn's Lemma, P has a maximal element which we call g. We need only show that dom(g) = F. Suppose instead that dom(g) ⊊ F, and fix A ∈ F\dom(g). Fix a ∈ A. We then have g ∪ {(A, a)} ∈ P and g < g ∪ {(A, a)}, a contradiction. It follows that dom(g) = F, so g is a choice function on F.

13.3 The Axiom of Choice and Cardinal Arithmetic

Once we adopt the Axiom of Choice, it follows that every set can be well-ordered. Therefore, |A| is defined
for every set A.
Proposition 13.3.1. Let A and B be sets.
1. A ⪯ B if and only if |A| ≤ |B|.
2. A ≈ B if and only if |A| = |B|.
Proof.
1. Suppose first that |A| ≤ |B|. Let κ = |A| and let λ = |B|, and fix bijections f : A → κ and g : λ → B. Since κ ≤ λ, we have κ ⊆ λ, and so we may consider g ∘ f : A → B. One easily checks that this is an injective function.
Suppose now that A ⪯ B, and fix an injection h : A → B. Let κ = |A| and let λ = |B|, and fix bijections f : κ → A and g : B → λ. We then have that g ∘ h ∘ f : κ → λ is an injection, hence κ ≤ λ.
2. Suppose first that A ≈ B. We then have that |A| ≤ |B| and |B| ≤ |A| by part 1, hence |A| = |B|. Suppose now that |A| = |B|. By part 1, we then have that A ⪯ B and B ⪯ A, hence A ≈ B by the Cantor-Bernstein Theorem.

Proposition 13.3.2. |A × A| = |A| for every infinite set A.
Proof. Since A is infinite, there exists α such that |A| = ℵ_α. We then have A × A ≈ ℵ_α × ℵ_α ≈ ℵ_α, hence |A × A| ≤ ℵ_α. We clearly have ℵ_α ≤ |A × A|, hence |A × A| = ℵ_α = |A|.
Proposition 13.3.3. Let κ be an infinite cardinal and let F be a family of sets. Suppose that |F| ≤ κ and that |A| ≤ κ for every A ∈ F. We then have |⋃F| ≤ κ.
Proof. Let λ = |F| (notice that λ ≤ κ), and fix a bijection f : λ → F. Also, for each A ∈ F, fix an injection gA : A → κ (using the Axiom of Choice). We define an injection h : ⋃F → κ × κ as follows. Given b ∈ ⋃F, let α be the least ordinal such that b ∈ f(α), and set h(b) = (α, g_{f(α)}(b)). Suppose that b1, b2 ∈ ⋃F and h(b1) = h(b2). Let α1 be the least ordinal such that b1 ∈ f(α1) and let α2 be the least ordinal such that b2 ∈ f(α2). Since h(b1) = h(b2), it follows that α1 = α2, and we call their common value α. Therefore, using the fact that h(b1) = h(b2) again, we conclude that g_{f(α)}(b1) = g_{f(α)}(b2). Since g_{f(α)} is an injection, it follows that b1 = b2. Hence, h : ⋃F → κ × κ is an injection, so we may conclude that |⋃F| ≤ |κ × κ| = κ.
Proposition 13.3.4. |A^{<ω}| = |A| for every infinite set A.
Proof. Using Proposition 13.3.2 and induction (on n), it follows that |A^n| = |A| for every n ∈ ω with n ≥ 1. Since A^{<ω} = ⋃{A^n : n ∈ ω}, we may use Proposition 13.3.3 to conclude that |A^{<ω}| ≤ ℵ_0 · |A| = |A|. We clearly have |A| ≤ |A^{<ω}|, hence |A^{<ω}| = |A|.
Definition 13.3.5. Let A and B be sets. We let A^B be the set of all functions from B to A.
Proposition 13.3.6. Let A1, A2, B1, B2 be sets with A1 ≈ A2 and B1 ≈ B2. We then have A1^{B1} ≈ A2^{B2}.

Now that we've adopted the Axiom of Choice, we know that A^B can be well-ordered for any sets A and B, so it makes sense to talk about |A^B|. This gives us a way to define cardinal exponentiation.
Definition 13.3.7. Let κ and λ be cardinals. We use κ^λ to also denote the cardinality of the set κ^λ. (So, we're using the same notation to denote both the set of functions from λ to κ and also its cardinality.)
Proposition 13.3.8. Let κ, λ, and μ be cardinals.
1. κ^{λ+μ} = κ^λ · κ^μ.
2. κ^{λ·μ} = (κ^λ)^μ.
3. (κ · λ)^μ = κ^μ · λ^μ.
Proof. Fix sets A, B, C such that |A| = κ, |B| = λ, and |C| = μ (we could use κ, λ, and μ themselves, but it's easier to distinguish sets from cardinals).
1. It suffices to find a bijection F : A^{(B×{0})∪(C×{1})} → A^B × A^C. We define F as follows. Given f : (B × {0}) ∪ (C × {1}) → A, let F(f) = (g, h) where g : B → A is given by g(b) = f((b, 0)) and h : C → A is given by h(c) = f((c, 1)).
2. It suffices to find a bijection F : (A^B)^C → A^{B×C}. We define F as follows. Given f : C → A^B, let F(f) : B × C → A be the function defined by F(f)((b, c)) = f(c)(b) for all b ∈ B and c ∈ C.
3. It suffices to find a bijection F : A^C × B^C → (A × B)^C. We define F as follows. Given g : C → A and h : C → B, let F((g, h)) : C → A × B be the function defined by F((g, h))(c) = (g(c), h(c)) for all c ∈ C.
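Part 2 of this proposition is the set-theoretic form of currying: functions C → A^B correspond to functions B × C → A. A sketch of the bijection F and its inverse on small finite sets (the helper names curry and uncurry are mine):

```python
from itertools import product

def uncurry(f):
    """Send f : C -> (B -> A) to F(f) : B x C -> A with F(f)(b, c) = f(c)(b)."""
    return lambda b, c: f(c)(b)

def curry(F):
    """The inverse: send F : B x C -> A back to c -> (b -> F(b, c))."""
    return lambda c: (lambda b: F(b, c))

# Check on small finite sets that the two maps are mutually inverse,
# which is what makes F a bijection (A^B)^C -> A^(B x C).
A, B, C = [0, 1, 2], ["p", "q"], [False, True]
f = lambda c: (lambda b: (len(b) + int(c)) % 3)   # an element of (A^B)^C
for b, c in product(B, C):
    assert curry(uncurry(f))(c)(b) == f(c)(b)
```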

Proposition 13.3.9. 2^κ = |P(κ)| for all cardinals κ.
Proof. Fix a cardinal κ. We define a function F : 2^κ → P(κ) as follows. Given f : κ → 2, let F(f) = {α ∈ κ : f(α) = 1}. We then have that F is a bijection, hence 2^κ = |P(κ)|.
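For finite κ the bijection between functions f : κ → 2 and subsets of κ can be checked exhaustively. A small sketch (κ = 5 is an arbitrary illustration):

```python
from itertools import product

# The bijection F : 2^kappa -> P(kappa) for kappa = 5, sending
# f : kappa -> 2 to the set {alpha < kappa : f(alpha) = 1}.
kappa = 5

def F(f):
    return frozenset(alpha for alpha in range(kappa) if f[alpha] == 1)

# Enumerate all of 2^kappa and check that F is a bijection onto P(kappa):
# 2^kappa distinct inputs give 2^kappa distinct subsets.
functions = list(product([0, 1], repeat=kappa))
images = {F(f) for f in functions}
assert len(images) == len(functions) == 2 ** kappa
```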
Corollary 13.3.10. κ < 2^κ for all cardinals κ.
Proof. We know that κ ≺ P(κ) from above.
Proposition 13.3.11. If 2 ≤ κ ≤ λ and λ is infinite, then κ^λ = 2^λ.
Proof. We have
2^λ ≤ κ^λ ≤ (2^λ)^λ = 2^{λ·λ} = 2^λ.


Chapter 14

Set-theoretic Methods in Analysis and Model Theory

14.1 Subsets of R

14.1.1 The Reals

Proposition 14.1.1. |R| = 2^{ℵ_0}.
Proof. The function f : R → P(Q) given by f(x) = {q ∈ Q : q < x} is injective, so
|R| ≤ |P(Q)| = 2^{|Q|} = 2^{ℵ_0}.
The function f : 2^ω → R given by
f(q) = Σ_{n=0}^∞ q(n)/10^n
is injective, so 2^{ℵ_0} ≤ |R|.
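The injectivity of q ↦ Σ q(n)/10^n can be observed on finite truncations: decimal expansions that use only the digits 0 and 1 are unique, so distinct 0-1 sequences of a fixed length already give distinct partial sums. A sketch (the truncation length is an arbitrary illustration):

```python
from fractions import Fraction
from itertools import product

def value(q):
    """Partial sum of q(n)/10^n over n < len(q) for a finite 0-1 sequence q."""
    return sum(Fraction(d, 10 ** n) for n, d in enumerate(q))

# Distinct 0-1 sequences of a fixed length give distinct sums; this is
# the finite shadow of the injectivity of q |-> sum of q(n)/10^n.
values = {value(q) for q in product([0, 1], repeat=6)}
assert len(values) == 2 ** 6
```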
Proposition 14.1.2. If a, b ∈ R and a < b, then |(a, b)| = 2^{ℵ_0}.
Proof. The above injection shows that |(0, 1)| = 2^{ℵ_0}, and for all a, b ∈ R with a < b, we have (0, 1) ≈ (a, b).
Proposition 14.1.3. If O is a nonempty open subset of R, then |O| = 2^{ℵ_0}.
Proof. Every nonempty open subset of R contains an open interval.

14.1.2 Perfect Sets

Definition 14.1.4. Let P ⊆ R. We say that P is perfect if it is closed and has no isolated points.
Example. [a, b] is perfect for all a, b ∈ R with a < b.
Proposition 14.1.5. The Cantor set C defined by
C = { Σ_{n=1}^∞ q(n−1)/3^n : q ∈ {0, 2}^ω }
is perfect.
Proof. Consult your favorite analysis book.
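The points of C can be generated by truncating the series: each q ∈ {0, 2}^ω names the real whose base-3 expansion uses only the digits 0 and 2. A sketch on finite prefixes (the prefix length is an arbitrary illustration):

```python
from fractions import Fraction
from itertools import product

def cantor_point(q):
    """Sum of q(n-1)/3^n over n = 1, ..., len(q) for a prefix q in {0,2}^k."""
    return sum(Fraction(d, 3 ** (n + 1)) for n, d in enumerate(q))

# Distinct prefixes give distinct points, since base-3 expansions using
# only the digits 0 and 2 are unique; all points land in [0, 1].
points = {cantor_point(q) for q in product([0, 2], repeat=8)}
assert len(points) == 2 ** 8
assert all(0 <= p <= 1 for p in points)
```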


Proposition 14.1.6. If P ⊆ R is perfect and a, b ∈ R with a < b and a, b ∉ P, then P ∩ [a, b] is perfect.
Proof. Since both P and [a, b] are closed, it follows that P ∩ [a, b] is closed. Fix x ∈ P ∩ [a, b], and notice that x > a and x < b since a, b ∉ P. Let ε > 0. Since P is perfect, we know that x is not isolated in P, so there exists y ∈ P such that 0 < |x − y| < min{ε, x − a, b − x}. We then have that 0 < |x − y| < ε and also that y ∈ [a, b] (by choice of y). Therefore, x is not isolated in P ∩ [a, b]. It follows that P ∩ [a, b] is perfect.
Proposition 14.1.7. If P ⊆ R is a nonempty perfect set and ε > 0, then there exist nonempty perfect sets P1, P2 ⊆ R such that
1. P1 ∩ P2 = ∅.
2. P1 ∪ P2 ⊆ P.
3. diam(P1), diam(P2) < ε.
Proof. Let P ⊆ R be a nonempty perfect set and let ε > 0. Since P is nonempty, we may fix x ∈ P.
Case 1: There exists δ > 0 such that [x − δ, x + δ] ⊆ P. We may assume (by making δ smaller if necessary) that δ < ε. In this case, let P1 = [x − δ, x − δ/2] and let P2 = [x + δ/2, x + δ].
Case 2: Otherwise, for every δ > 0, there exist infinitely many y ∈ [x − δ, x + δ]\P. Thus, there exist points a, b, c, d ∈ [x − ε/4, x + ε/4]\P such that a < b < c < d. In this case, let P1 = P ∩ [a, b] and let P2 = P ∩ [c, d].
Proposition 14.1.8. If P ⊆ R is a nonempty perfect set, then |P| = 2^{ℵ_0}.
Proof. Since P ⊆ R, we know that |P| ≤ 2^{ℵ_0}. By the previous Proposition, there exists a nonempty perfect set Q ⊆ P such that diam(Q) < 1. We can now use the previous Proposition to recursively define a function f : 2^{<ω} → P(P) such that:
1. f(∅) = Q.
2. f(σ) is a nonempty perfect set for all σ ∈ 2^{<ω}.
3. diam(f(σ)) < 1/2^{|σ|} for all σ ∈ 2^{<ω}.
4. f(σ⌢0) ∪ f(σ⌢1) ⊆ f(σ) for all σ ∈ 2^{<ω}.
5. f(σ⌢0) ∩ f(σ⌢1) = ∅ for all σ ∈ 2^{<ω}.
Now define g : 2^ω → P by letting g(q) be the unique element of ⋂_{n∈ω} f(q ↾ n) for all q ∈ 2^ω (notice that such an element must exist because the intersection is of a nested sequence of nonempty compact sets, and that the element is unique because the diameters go to 0). Finally, notice that g is injective by virtue of property 5 of the function f, so 2^{ℵ_0} ≤ |P|.

14.1.3 Closed Sets

Definition 14.1.9. Suppose that C ⊆ R is a closed set. We define C′ to be the set C \ {x ∈ R : x is an isolated point of C}. We call C′ the Cantor-Bendixson derivative of C.
Notice that a closed set C is perfect if and only if C = C′.
Proposition 14.1.10. If C ⊆ R is a closed set, then C′ ⊆ C is also closed.


Proof. Recall that a set is closed if and only if its complement is open. We show that R\C′ is open. Fix x ∈ R\C′. If x ∉ C, then since C is closed, we may fix ε > 0 such that (x − ε, x + ε) ⊆ R\C ⊆ R\C′. Suppose then that x ∈ C. Since x ∉ C′, we know that x is an isolated point of C. Fix ε > 0 such that C ∩ (x − ε, x + ε) = {x}. We then have that (x − ε, x + ε) ⊆ R\C′. Therefore, R\C′ is open, hence C′ is closed.
Proposition 14.1.11. If C ⊆ R is a closed set, then C\C′ = {x ∈ R : x is an isolated point of C} is countable.
Proof. Define a function f : C\C′ → Q × Q by letting f(x) = (q, r) where (q, r) is least (under some fixed well-ordering of Q × Q) such that C ∩ (q, r) = {x}. We then have that f is injective, hence C\C′ is countable because Q × Q is countable.
Definition 14.1.12. Let C ⊆ R be a closed set. We define a sequence C^(α) for α < ω_1 recursively as follows.
1. C^(0) = C.
2. C^(α+1) = (C^(α))′.
3. C^(α) = ⋂{C^(β) : β < α} if α is a limit.
Notice that each C^(α) is closed and that C^(α) ⊆ C^(β) whenever β < α < ω_1 by a trivial induction.
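As a concrete illustration (this example is not in the text), the derivative can strictly shrink a countable closed set before stabilizing at ∅:

```latex
C = \{0\} \cup \{\tfrac{1}{n} : n \geq 1\}, \qquad
C^{(1)} = \{0\}, \qquad
C^{(2)} = \emptyset = C^{(3)} = \cdots
```

Every point 1/n is isolated in C while 0 is a limit of them, so one derivative leaves {0}; the point 0 is then isolated in C^(1), and the sequence stabilizes at α = 2.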
Proposition 14.1.13. Let C ⊆ R be a closed set. There exists an α < ω_1 such that C^(α+1) = C^(α).
Proof. Suppose that C^(α+1) ≠ C^(α) for all α < ω_1. Define a function f : ω_1 → Q × Q by letting f(α) = (q, r) where (q, r) is least (under some fixed well-ordering of Q × Q) such that there is a unique element of C^(α) ∩ (q, r). We then have that f is injective, contrary to the fact that |Q × Q| = ℵ_0.
Theorem 14.1.14. Let C ⊆ R be a closed set. There exists a perfect set P ⊆ R and a countable set A ⊆ R such that C = A ∪ P and A ∩ P = ∅.
Proof. Let α < ω_1 be least such that C^(α+1) = C^(α). Let P = C^(α) and let A = ⋃_{β<α}(C^(β)\C^(β+1)). Notice that C = A ∪ P and A ∩ P = ∅, that P is perfect because P = P′, and that A is countable because it is the countable union of countable sets.
Corollary 14.1.15. If C ⊆ R is an uncountable closed set, then |C| = 2^{ℵ_0}.
Proof. Let C ⊆ R be an uncountable closed set. We have |C| ≤ 2^{ℵ_0} because C ⊆ R. Let P be perfect and A countable such that C = A ∪ P and A ∩ P = ∅. Since C is uncountable, we have P ≠ ∅, hence |P| = 2^{ℵ_0}, and so |C| ≥ 2^{ℵ_0}.

14.1.4 Borel Sets

Definition 14.1.16. Let O be the set of open subsets of R. We define the set B of Borel sets to be the smallest subset of P(R) such that
1. O ⊆ B.
2. If A ∈ B, then R\A ∈ B.
3. If An ∈ B for all n ∈ ω, then ⋃_{n∈ω} An ∈ B.

Definition 14.1.17. We define a sequence (Σ_α, Π_α) for α ∈ ω_1\{0} recursively as follows.
1. Σ_1 = O and Π_1 = {R\A : A ∈ O}.
2. Σ_{α+1} = {⋃_{n∈ω} An : each An ∈ Π_α} and Π_{α+1} = {R\A : A ∈ Σ_{α+1}}.
3. Σ_α = {⋃_{n∈ω} An : each An ∈ Π_β for some β < α} and Π_α = {R\A : A ∈ Σ_α} if α is a limit.
Proposition 14.1.18. B = ⋃_{α<ω_1} Σ_α.
Corollary 14.1.19. |B| = 2^{ℵ_0}.
Corollary 14.1.20. B ≠ P(R).
Proof. We have |P(R)| = 2^{|R|} = 2^{2^{ℵ_0}} > 2^{ℵ_0} = |B|.

14.1.5 Measurable Sets

Definition 14.1.21. Let M be the set of all (Lebesgue) measurable subsets of R.


Proposition 14.1.22. |M| = 2^{2^{ℵ_0}}.
Proof. The Cantor set C is measurable with measure 0, and |C| = 2^{ℵ_0}. Since every subset of a set of measure 0 is measurable (with measure 0), it follows that P(C) ⊆ M. Therefore, |M| ≥ 2^{|C|} = 2^{2^{ℵ_0}}. Since M ⊆ P(R), we also have |M| ≤ 2^{2^{ℵ_0}}.
Corollary 14.1.23. There is a measurable set which is not Borel.

14.2 The Size of Models

14.2.1 Controlling the Size of Models

Proposition 14.2.1. Let L be a language and suppose that Γ ⊆ Form_L is satisfiable. There exists a model (M, s) of Γ such that |M| ≤ |L| + ℵ_0.
Proof. We already proved this last quarter when L was countable (and in particular when L was finite). Suppose that L is infinite and let κ = |L|.
Recall the proof of the Completeness Theorem. Notice that if Γ is consistent, then the language L1 formed in the first step of adding witnesses satisfies |L1| = κ because |Form_L × Var| = κ · ℵ_0 = κ. Thus, each Ln achieved by iteratively adding witnesses satisfies |Ln| = κ, so the final L′ = ⋃_{n∈ω} Ln satisfies |L′| = κ. It follows that |Term_{L′}| = κ, and since the L′-structure M′ we constructed in the proof of the Completeness Theorem is formed by taking the quotient of an equivalence relation on Term_{L′}, we can conclude that |M′| ≤ κ. Therefore, the L-structure M′ ↾ L from the proof of the Completeness Theorem has cardinality at most κ.
Theorem 14.2.2 (Löwenheim-Skolem Theorem). Let L be a language and suppose that Γ ⊆ Form_L has an infinite model. Let κ ≥ |L| + ℵ_0. There exists a model (M, s) of Γ such that |M| = κ.
Proof. Let L′ be L together with new constant symbols c_α for all α < κ. Notice that |L′| = |L| + κ = κ. Let
Γ′ = Γ ∪ {c_α ≠ c_β : α, β < κ and α ≠ β}.
Notice that every finite subset of Γ′ has a model, by using an infinite model of Γ and interpreting the constants which appear in the finite subset as distinct elements. Therefore, by Compactness, we know that Γ′ has a model. By Proposition 14.2.1, there exists a model (M′, s) of Γ′ such that |M′| ≤ |L′| + ℵ_0 = κ. Notice that we must also have |M′| ≥ κ (as the c_α are interpreted as distinct elements), hence |M′| = κ. Letting M be the restriction of the structure M′ to the language L, we see that (M, s) is a model of Γ and that |M| = κ.

14.2.2 Counting Models

Definition 14.2.3. Given a theory T in a language L and a cardinal κ, let I(T, κ) be the number of models of T of cardinality κ up to isomorphism.
Proposition 14.2.4. Let T be a theory in a language L with |L| = κ. For any infinite cardinal λ, we have I(T, λ) ≤ 2^{λ·κ}. In particular, if λ ≥ κ is infinite, then I(T, λ) ≤ 2^λ.
Proof. Let λ be an infinite cardinal. Counting the possible interpretations of the symbols of L on the universe λ, we have
I(T, λ) ≤ λ^{|C|} · |P(λ^{<ω})|^{|R|} · |P(λ^{<ω})|^{|F|}
= λ^{|C|} · |P(λ)|^{|R|} · |P(λ)|^{|F|}
≤ (2^λ)^κ · (2^λ)^κ · (2^λ)^κ
= 2^{λ·κ}.

Proposition 14.2.5. If T is the theory of groups, then I(T, ℵ_0) = 2^{ℵ_0}.
Proof. Let P be the set of primes. Notice that the set of finite subsets of P is countable, so the set of infinite subsets of P has cardinality 2^{ℵ_0}. For each infinite A ∈ P(P), let
G_A = ⊕_{p∈A} Z/pZ.
Notice that if A, B ∈ P(P) are infinite and A ≠ B, then G_A ≇ G_B.
Proposition 14.2.6. Let T be the theory of vector spaces over Q. We have I(T, ℵ_0) = ℵ_0 and I(T, κ) = 1 for all κ ≥ ℵ_1.
Proof. Notice first that if V is a vector space over Q and dimQ(V) = n ∈ ω with n ≥ 1, then
|V| = |Q^n| = ℵ_0.
Now if V is a vector space over Q and dimQ(V) = κ ≥ ℵ_0, then since every element of V is a finite sum of scalar multiples of elements of a basis, it follows that
|V| ≤ |(Q × κ)^{<ω}| = |(ℵ_0 · κ)^{<ω}| = |κ^{<ω}| = κ,
and we clearly have |V| ≥ κ, so |V| = κ.
Since two vector spaces over Q are isomorphic if and only if they have the same dimension, it follows that I(T, ℵ_0) = ℵ_0 (corresponding to dimensions in ω ∪ {ℵ_0}) and I(T, κ) = 1 for all κ ≥ ℵ_1 (corresponding to dimension κ).
Proposition 14.2.7. For any p, we have I(ACFp, ℵ_0) = ℵ_0 and I(ACFp, κ) = 1 for all κ ≥ ℵ_1.
Proof. Notice that if F is an algebraically closed field and tr.deg.(F) = κ is uncountable, then |F| = κ.
Definition 14.2.8. Let T be a theory and let κ be a cardinal. We say that T is κ-categorical if I(T, κ) = 1.
Proposition 14.2.9 (Łoś-Vaught Test). Suppose that T is a theory such that all models of T are infinite. If there exists κ ≥ |L| + ℵ_0 such that T is κ-categorical, then T is complete.


Proof. Let T be a theory such that all models of T are infinite. Suppose that T is not complete and fix σ ∈ Sent_L such that σ ∉ T and ¬σ ∉ T. We then have that T ∪ {σ} and T ∪ {¬σ} are both satisfiable with infinite models (because all models of T are infinite), so by the Löwenheim-Skolem Theorem we may fix a model M1 of T ∪ {σ} and a model M2 of T ∪ {¬σ} such that |M1| = κ = |M2|. We then have that M1 and M2 are models of T which are not isomorphic, hence I(T, κ) ≥ 2.
Corollary 14.2.10. DLO and each ACFp are complete.
Theorem 14.2.11 (Morley's Theorem). Let L be a countable language and let T be a theory. If T is κ-categorical for some κ ≥ ℵ_1, then T is κ-categorical for all κ ≥ ℵ_1.

14.3 Ultraproducts and Compactness

Let L be a language. Let I be a set, and suppose that for each i ∈ I we have an L-structure Mi. For initial clarity, think of I = ω, so that we have L-structures M0, M1, M2, . . . . We want a way to put together all of the Mi which somehow blends the properties of the Mi together into one structure. An initial thought is to form a product of the structures Mi with underlying set ∏_{i∈I} Mi. That is, M consists of all functions g : I → ⋃_{i∈I} Mi such that g(i) ∈ Mi for all i ∈ I. Interpreting the constants and functions would then be straightforward. For example, suppose that L = {e, f} where e is a constant symbol and f is a binary function symbol. Suppose that I = ω and each Mi is a group. Elements of M would then be sequences ⟨ai⟩_{i∈ω}, we would interpret e as the sequence of the identities of the groups, and we would interpret f as the componentwise group operation (i.e. f^M(⟨ai⟩_{i∈ω}, ⟨bi⟩_{i∈ω}) = ⟨f^{Mi}(ai, bi)⟩_{i∈ω}). In general, we would let c^M be the function i ↦ c^{Mi} for each constant symbol c, and given f ∈ Fk we would let f^M(g1, g2, . . . , gk) be the function i ↦ f^{Mi}(g1(i), g2(i), . . . , gk(i)).
This certainly works, but it doesn't really blend the properties of the structures together particularly well. For example, if each Mi is a group and all but one is abelian, the product is still nonabelian. Also, if we have relation symbols, it's not clear what the right way is to interpret the relations on M. For example, if L = {R} where R is a binary relation symbol and I = ω, do we say that the pair (⟨ai⟩_{i∈ω}, ⟨bi⟩_{i∈ω}) is an element of R^M if some (ai, bi) ∈ R^{Mi}, or if all (ai, bi) ∈ R^{Mi}, or something else? Which is the right definition? In other words, if each Mi is a graph, do we put an edge between the sequences if some edge exists between the components, or if every pair has an edge?
We thus want a more democratic approach of forming M which also gives a way to nicely interpret the relation symbols. If I were finite, perhaps we could do a majority rules (if most of the pairs were in the relation), but what if I is infinite?

14.3.1 Ultrafilters

Definition 14.3.1. Let X be a set. A filter on X is a set F ⊆ P(X) such that
1. X ∈ F and ∅ ∉ F.
2. If A ∈ F and A ⊆ B ⊆ X, then B ∈ F.
3. A ∩ B ∈ F whenever A, B ∈ F.
Example. Let X be a nonempty set, and let x ∈ X. The set
F = {A ∈ P(X) : x ∈ A}
is a filter on X. Such a filter is called the principal filter on X generated by x.
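The filter axioms can be checked exhaustively for a principal filter on a small set; the last line also checks the ultrafilter property that principal filters turn out to have. A brute-force sketch (the set X and generator x are arbitrary illustrations):

```python
from itertools import chain, combinations

X = {0, 1, 2, 3}
x = 1

def powerset(s):
    s = list(s)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(s, r) for r in range(len(s) + 1))]

# The principal filter on X generated by x.
F = {A for A in powerset(X) if x in A}

assert frozenset(X) in F and frozenset() not in F             # axiom 1
assert all(B in F for A in F for B in powerset(X) if A <= B)  # axiom 2 (upward closure)
assert all(A & B in F for A in F for B in F)                  # axiom 3 (intersections)
# For every A, exactly one of A and X\A contains x.
assert all((A in F) != (frozenset(X) - A in F) for A in powerset(X))
```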


Proposition 14.3.2. Let X be an infinite set. The set
F = {A ∈ P(X) : A is cofinite}
is a filter on X.
Proposition 14.3.3. Let X be a set and let F be a filter on X. For every finite T ⊆ F, we have ⋂T ≠ ∅.
Proof. By induction on |T|.
Definition 14.3.4. Let X be a set and suppose that S ⊆ P(X). We say that S has the finite intersection property if ⋂T ≠ ∅ for all finite T ⊆ S.
Proposition 14.3.5. Let X be a set and suppose that S ⊆ P(X). The following are equivalent.
1. S has the finite intersection property.
2. There exists a filter F on X such that S ⊆ F.
Proof. 1 implies 2: Let
F = {A ∈ P(X) : ⋂T ⊆ A for some finite T ⊆ S}.
We claim that F is a filter on X. Notice that we clearly have X ∈ F, and that ∅ ∉ F because S has the finite intersection property. Now if A ∈ F, say ⋂T ⊆ A where T ⊆ S is finite, and A ⊆ B ⊆ X, then ⋂T ⊆ B, so B ∈ F. Finally, suppose that A, B ∈ F, and fix finite T1, T2 ⊆ S such that ⋂T1 ⊆ A and ⋂T2 ⊆ B. We then have that ⋂(T1 ∪ T2) ⊆ A ∩ B, hence A ∩ B ∈ F.
2 implies 1: Fix a filter F on X with S ⊆ F. Let T be a finite subset of S. We then have that T is a finite subset of F, hence ⋂T ∈ F because F is a filter. Since ∅ ∉ F, it follows that ⋂T ≠ ∅.
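The closure construction from "1 implies 2" can be run on a small example: starting from a family S with the finite intersection property, collect all supersets of intersections of finite subfamilies. A sketch (X and S are arbitrary illustrations):

```python
from itertools import chain, combinations

X = {0, 1, 2, 3}
S = [frozenset({0, 1, 2}), frozenset({1, 2, 3})]   # has the finite intersection property

def powerset(s):
    s = list(s)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(s, r) for r in range(len(s) + 1))]

def intersect_all(T):
    # The intersection of a finite subfamily; the empty family gives X.
    out = frozenset(X)
    for A in T:
        out = out & A
    return out

finite_T = [c for r in range(len(S) + 1) for c in combinations(S, r)]

# F = {A : intersection(T) subset of A for some finite T subset of S}.
F = {A for A in powerset(X) if any(intersect_all(T) <= A for T in finite_T)}

# F is a filter on X containing S.
assert all(A in F for A in S)
assert frozenset(X) in F and frozenset() not in F
assert all(B in F for A in F for B in powerset(X) if A <= B)
assert all(A & B in F for A in F for B in F)
```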
Definition 14.3.6. Let X be a set. An ultrafilter on X is a filter U on X such that for all A ⊆ X, either A ∈ U or X\A ∈ U.
Example. Every principal filter is an ultrafilter.
Proposition 14.3.7. Let F be a filter on X. F is an ultrafilter on X if and only if F is a maximal filter on X (i.e. there is no filter G on X with F ⊊ G).
Proof. Suppose that F is not a maximal filter on X. Fix a filter G on X such that F ⊊ G. Fix A ∈ G\F. Notice that X\A ∉ F, because otherwise we would have X\A ∈ G and hence ∅ = A ∩ (X\A) ∈ G, a contradiction. Therefore, A ∉ F and X\A ∉ F, so F is not an ultrafilter on X.
Conversely, suppose that F is not an ultrafilter on X. Fix A ∈ P(X) such that A ∉ F and X\A ∉ F. We claim that F ∪ {A} has the finite intersection property: otherwise, there would be B1, B2, . . . , Bn ∈ F with A ∩ B1 ∩ · · · ∩ Bn = ∅, in which case B1 ∩ · · · ∩ Bn ⊆ X\A and hence X\A ∈ F, a contradiction. By Proposition 14.3.5, we may fix a filter G on X such that F ∪ {A} ⊆ G. Since F ⊊ G (as A ∉ F), it follows that F is not a maximal filter on X.
Proposition 14.3.8. Let F be a filter on X. There exists an ultrafilter U on X such that F ⊆ U.
Proof. Apply Zorn's Lemma to the set of filters on X extending F, ordered by ⊆: the union of a chain of such filters is again a filter extending F, and a maximal one is an ultrafilter by Proposition 14.3.7.
Corollary 14.3.9. Let X be an infinite set. There exists a nonprincipal ultrafilter on X.
Proof. Let F be the filter on X consisting of all cofinite subsets of X. Fix an ultrafilter U on X such that F ⊆ U. For all x ∈ X, we have X\{x} ∈ F ⊆ U, hence {x} ∉ U.


14.3.2 Ultraproducts

Ultrafilters (or even just filters) solve our democratic blending problem for relation symbols beautifully. Suppose that L = {R}, where R is a binary relation symbol, and that I = ω. Suppose also that U is an ultrafilter on ω. Given elements ⟨a_i⟩_{i∈I} and ⟨b_i⟩_{i∈I} of ∏_{i∈I} M_i, we could then say that the pair (⟨a_i⟩_{i∈I}, ⟨b_i⟩_{i∈I}) is an element of R^M if the set of indices i ∈ I such that (a_i, b_i) ∈ R^{M_i} is large, i.e. if {i ∈ I : (a_i, b_i) ∈ R^{M_i}} ∈ U. Of course, our notion of large depends on the ultrafilter, but that flexibility is the beauty of the construction!

However, we have yet to solve the dictatorial problem of function symbols (such as the product of groups, each abelian save one, ending up nonabelian regardless of what we consider large). Wonderfully, and perhaps surprisingly, the ultrafilter can be used in another way to save the day. For concreteness, consider the situation where L = {e, f}, where e is a constant symbol and f is a binary function symbol, I = ω, and each M_i is a group. The idea is to flat out ignore variations on small sets by considering two sequences ⟨a_i⟩_{i∈I} and ⟨b_i⟩_{i∈I} to be the same if the set of indices on which they agree is large, i.e. if {i ∈ I : a_i = b_i} ∈ U. In other words, we should define an equivalence relation ∼ in this way and take a quotient! This is completely analogous to considering two functions f, g : ℝ → ℝ to be the same if the set {x ∈ ℝ : f(x) ≠ g(x)} has measure 0. What does this solve? Suppose that M_0 was our rogue nonabelian group, and each M_i for i ≠ 0 was an abelian group. Suppose also that ω\{0} ∈ U (i.e. our ultrafilter is not the principal ultrafilter generated by {0}, and thus we are considering {0} to be a small set). Given a sequence ⟨a_i⟩_{i∈I}, let [⟨a_i⟩_{i∈I}] be the equivalence class of ⟨a_i⟩_{i∈I} under the relation ∼. Assuming that everything is well-defined (see below), we then have that ⟨f^{M_i}(a_i, b_i)⟩_{i∈I} ∼ ⟨f^{M_i}(b_i, a_i)⟩_{i∈I}, and so

f^M([⟨a_i⟩_{i∈I}], [⟨b_i⟩_{i∈I}]) = [⟨f^{M_i}(a_i, b_i)⟩_{i∈I}]
                                = [⟨f^{M_i}(b_i, a_i)⟩_{i∈I}]
                                = f^M([⟨b_i⟩_{i∈I}], [⟨a_i⟩_{i∈I}])

and so we have saved abelianness by ignoring problems on small sets!
To summarize before launching into details, here's the construction. Start with a language L, a set I, and L-structures M_i for each i ∈ I. Form the product ∏_{i∈I} M_i, but take a quotient by considering two elements of this product to be equivalent if the set of indices on which they agree is large. Elements of our structure are now equivalence classes, so we need to worry about things being well-defined, but the fundamental idea is to interpret constant symbols and function symbols componentwise, and to interpret relation symbols by saying that a k-tuple is in the interpretation of some R ∈ R_k if the set of indices on which the corresponding k-tuple is in R^{M_i} is large. Amazingly, this process behaves absolutely beautifully with regard to first-order logic. For example, if we denote this blended structure by M, we will prove below that for any σ ∈ Sent_L we have

M ⊨ σ if and only if {i ∈ I : M_i ⊨ σ} ∈ U

That is, an arbitrary sentence σ is true in the blended structure if and only if the set of indices i ∈ I such that σ is true in M_i is large!

Onward to the details. The notation is painful and easy to get lost in, but keep the fundamental ideas in mind and revert to thinking of I = ω whenever the situation looks hopelessly complicated. First we have the proposition saying that the relation ∼ defined in this way is an equivalence relation and that our definitions are well-defined.
Proposition 14.3.10. Let I be a set, and suppose that for each i ∈ I we have an L-structure M_i. Let U be an ultrafilter on I. Define a relation ∼ on ∏_{i∈I} M_i by saying that g ∼ h if {i ∈ I : g(i) = h(i)} ∈ U.

1. ∼ is an equivalence relation on ∏_{i∈I} M_i.

2. Suppose that g_1, g_2, ..., g_k, h_1, h_2, ..., h_k ∈ ∏_{i∈I} M_i are such that g_j ∼ h_j for all j.

(a) {i ∈ I : (g_1(i), g_2(i), ..., g_k(i)) = (h_1(i), h_2(i), ..., h_k(i))} ∈ U.

(b) For each R ∈ R_k, the following are equivalent:
• {i ∈ I : (g_1(i), g_2(i), ..., g_k(i)) ∈ R^{M_i}} ∈ U.
• {i ∈ I : (h_1(i), h_2(i), ..., h_k(i)) ∈ R^{M_i}} ∈ U.

(c) For each f ∈ F_k, we have {i ∈ I : f^{M_i}(g_1(i), g_2(i), ..., g_k(i)) = f^{M_i}(h_1(i), h_2(i), ..., h_k(i))} ∈ U.
Proof.
Definition 14.3.11. Let I be a set, and suppose that for each i ∈ I we have an L-structure M_i. Let U be an ultrafilter on I. We define an L-structure M = ∏_{i∈I} M_i/U as follows. Define the relation ∼ on ∏_{i∈I} M_i as above, and let the universe of M be the corresponding quotient. We interpret the symbols of L as follows.

1. For each c ∈ C, let c^M = [i ↦ c^{M_i}].

2. For each R ∈ R_k, let R^M = {([g_1], [g_2], ..., [g_k]) ∈ M^k : {i ∈ I : (g_1(i), g_2(i), ..., g_k(i)) ∈ R^{M_i}} ∈ U}.

3. For each f ∈ F_k, let f^M([g_1], [g_2], ..., [g_k]) = [i ↦ f^{M_i}(g_1(i), g_2(i), ..., g_k(i))].

We call M the ultraproduct of the M_i over the ultrafilter U.
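The definition can be made concrete in a small, purely illustrative computation. The sketch below (not from the text — all structures and data are hypothetical) uses a finite index set and the principal ultrafilter generated by {1}, so "large" simply means "contains the index 1"; the quotient then collapses onto M_1, which is exactly the dictatorial behavior principal ultrafilters exhibit.

```python
from itertools import product

# A purely illustrative sketch (not from the text): a finite index set with
# the principal ultrafilter generated by {1}.
I = [0, 1, 2]
universes = {0: [0, 1], 1: [0, 1, 2], 2: [0, 1]}          # the M_i
R = {0: {(0, 1)}, 1: {(0, 1), (1, 2)}, 2: set()}          # R^{M_i}

def in_U(s):                      # the principal ultrafilter generated by {1}
    return 1 in s

def equivalent(g, h):             # g ~ h iff {i : g(i) = h(i)} is large
    return in_U({i for i in I if g[i] == h[i]})

def in_R_M(g, h):                 # ([g], [h]) in R^M iff the index set is large
    return in_U({i for i in I if (g[i], h[i]) in R[i]})

elements = list(product(*(universes[i] for i in I)))

# With the principal ultrafilter at 1, everything collapses onto M_1:
for g in elements:
    for h in elements:
        assert equivalent(g, h) == (g[1] == h[1])
        assert in_R_M(g, h) == ((g[1], h[1]) in R[1])
```

The interesting case is of course a non-principal ultrafilter on an infinite index set, which cannot be exhibited computationally; the sketch only shows how the quotient and the "largeness" test interact.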
Definition 14.3.12. In the above situation, given variable assignments s_i : Var → M_i for each i ∈ I, we let ⟨s_i⟩_{i∈I} denote the variable assignment Var → M given by ⟨s_i⟩_{i∈I}(x) = [i ↦ s_i(x)].
Lemma 14.3.13. Let L be a language, let I be a set, and let U be an ultrafilter on I. Suppose that for each i ∈ I, we have an L-structure M_i, and let M = ∏_{i∈I} M_i/U. For all t ∈ Term_L and all s_i : Var → M_i, we have

⟨s_i⟩_{i∈I}(t) = [i ↦ s_i(t)]

In other words, for all t(x_1, x_2, ..., x_k) ∈ Term_L and all g_1, g_2, ..., g_k ∈ ∏_{i∈I} M_i, we have

t^M([g_1], [g_2], ..., [g_k]) = [i ↦ t^{M_i}(g_1(i), g_2(i), ..., g_k(i))]

Proof. Suppose that c ∈ C. Let s_i : Var → M_i be variable assignments. We then have

⟨s_i⟩_{i∈I}(c) = c^M = [i ↦ c^{M_i}] = [i ↦ s_i(c)]

Suppose that x ∈ Var. Let s_i : Var → M_i be variable assignments. We then have

⟨s_i⟩_{i∈I}(x) = [i ↦ s_i(x)]

Suppose that f ∈ F_k and t_1, t_2, ..., t_k ∈ Term_L are such that the result holds for the t_i. Let s_i : Var → M_i be variable assignments. We then have

⟨s_i⟩_{i∈I}(f t_1 t_2 ⋯ t_k) = f^M(⟨s_i⟩_{i∈I}(t_1), ⟨s_i⟩_{i∈I}(t_2), ..., ⟨s_i⟩_{i∈I}(t_k))
= f^M([i ↦ s_i(t_1)], [i ↦ s_i(t_2)], ..., [i ↦ s_i(t_k)])
= [i ↦ f^{M_i}(s_i(t_1), s_i(t_2), ..., s_i(t_k))]
= [i ↦ s_i(f t_1 t_2 ⋯ t_k)]


Theorem 14.3.14. Let L be a language, let I be a set, and let U be an ultrafilter on I. Suppose that for each i ∈ I, we have an L-structure M_i, and let M = ∏_{i∈I} M_i/U. For all φ ∈ Form_L and all s_i : Var → M_i, we have

(M, ⟨s_i⟩_{i∈I}) ⊨ φ if and only if {i ∈ I : (M_i, s_i) ⊨ φ} ∈ U

In other words, for all φ(x_1, x_2, ..., x_k) ∈ Form_L and all g_1, g_2, ..., g_k ∈ ∏_{i∈I} M_i, we have

(M, [g_1], [g_2], ..., [g_k]) ⊨ φ if and only if {i ∈ I : (M_i, g_1(i), g_2(i), ..., g_k(i)) ⊨ φ} ∈ U

In particular, for any σ ∈ Sent_L, we have

M ⊨ σ if and only if {i ∈ I : M_i ⊨ σ} ∈ U
Proof. The proof is by induction on φ.

Suppose that t_1, t_2 ∈ Term_L. Let s_i : Var → M_i be variable assignments. We then have

(M, ⟨s_i⟩_{i∈I}) ⊨ = t_1 t_2 ⇔ ⟨s_i⟩_{i∈I}(t_1) = ⟨s_i⟩_{i∈I}(t_2)
⇔ [i ↦ s_i(t_1)] = [i ↦ s_i(t_2)]
⇔ {i ∈ I : s_i(t_1) = s_i(t_2)} ∈ U
⇔ {i ∈ I : (M_i, s_i) ⊨ = t_1 t_2} ∈ U

Suppose that R ∈ R_k and t_1, t_2, ..., t_k ∈ Term_L. Let s_i : Var → M_i be variable assignments. We then have

(M, ⟨s_i⟩_{i∈I}) ⊨ R t_1 t_2 ⋯ t_k ⇔ (⟨s_i⟩_{i∈I}(t_1), ⟨s_i⟩_{i∈I}(t_2), ..., ⟨s_i⟩_{i∈I}(t_k)) ∈ R^M
⇔ ([i ↦ s_i(t_1)], [i ↦ s_i(t_2)], ..., [i ↦ s_i(t_k)]) ∈ R^M
⇔ {i ∈ I : (s_i(t_1), s_i(t_2), ..., s_i(t_k)) ∈ R^{M_i}} ∈ U
⇔ {i ∈ I : (M_i, s_i) ⊨ R t_1 t_2 ⋯ t_k} ∈ U

Suppose that the result holds for φ and ψ. Let s_i : Var → M_i be variable assignments. We then have

(M, ⟨s_i⟩_{i∈I}) ⊨ φ ∧ ψ ⇔ (M, ⟨s_i⟩_{i∈I}) ⊨ φ and (M, ⟨s_i⟩_{i∈I}) ⊨ ψ
⇔ {i ∈ I : (M_i, s_i) ⊨ φ} ∈ U and {i ∈ I : (M_i, s_i) ⊨ ψ} ∈ U
⇔ {i ∈ I : (M_i, s_i) ⊨ φ and (M_i, s_i) ⊨ ψ} ∈ U
⇔ {i ∈ I : (M_i, s_i) ⊨ φ ∧ ψ} ∈ U

Suppose that the result holds for φ. Let s_i : Var → M_i be variable assignments. We then have

(M, ⟨s_i⟩_{i∈I}) ⊨ ¬φ ⇔ (M, ⟨s_i⟩_{i∈I}) ⊭ φ
⇔ {i ∈ I : (M_i, s_i) ⊨ φ} ∉ U
⇔ {i ∈ I : (M_i, s_i) ⊭ φ} ∈ U
⇔ {i ∈ I : (M_i, s_i) ⊨ ¬φ} ∈ U

Suppose that the result holds for φ. Let s_i : Var → M_i be variable assignments. We then have

(M, ⟨s_i⟩_{i∈I}) ⊨ ∃y φ ⇔ there exists a ∈ M such that (M, ⟨s_i⟩_{i∈I}[y ⇒ a]) ⊨ φ
⇔ there exists g ∈ ∏_{i∈I} M_i such that (M, ⟨s_i⟩_{i∈I}[y ⇒ [g]]) ⊨ φ
⇔ there exists g ∈ ∏_{i∈I} M_i such that (M, ⟨s_i[y ⇒ g(i)]⟩_{i∈I}) ⊨ φ
⇔ there exists g ∈ ∏_{i∈I} M_i such that {i ∈ I : (M_i, s_i[y ⇒ g(i)]) ⊨ φ} ∈ U
⇔ {i ∈ I : there exists a ∈ M_i such that (M_i, s_i[y ⇒ a]) ⊨ φ} ∈ U
⇔ {i ∈ I : (M_i, s_i) ⊨ ∃y φ} ∈ U


Theorem 14.3.15. If every finite subset of Γ has a model, then Γ has a model.

Proof. Let I be the set of all finite subsets of Γ. For each Δ ∈ I, fix a model M_Δ of Δ. For each σ ∈ Γ, let A_σ = {Δ ∈ I : σ ∈ Δ}. Let S = {A_σ : σ ∈ Γ} ⊆ P(I), and notice that S has the finite intersection property because

{σ_1, σ_2, ..., σ_n} ∈ A_{σ_1} ∩ A_{σ_2} ∩ ⋯ ∩ A_{σ_n}

Fix an ultrafilter U on I such that S ⊆ U, and let M = ∏_{Δ∈I} M_Δ/U. For any σ ∈ Γ, we then have that A_σ ⊆ {Δ ∈ I : M_Δ ⊨ σ}, hence {Δ ∈ I : M_Δ ⊨ σ} ∈ U, and so M ⊨ σ. Therefore, M is a model of Γ.


Chapter 15

Primitive Recursive Functions and Relations

15.1 Primitive Recursive Functions

Definition 15.1.1. Let F be the set of all functions f : N^n → N for some n ∈ N^+.

Definition 15.1.2. The initial functions are:

1. O : N → N defined by O(x) = 0.

2. S : N → N defined by S(x) = x + 1.

3. I_i^n : N^n → N defined by I_i^n(x_1, x_2, ..., x_n) = x_i whenever 1 ≤ i ≤ n.

The collection of initial functions is denoted by Init.
Definition 15.1.3. Suppose that m, n ∈ N^+, that h : N^m → N, and that g_1, g_2, ..., g_m : N^n → N. We let Compose(h, g_1, g_2, ..., g_m) be the function f : N^n → N defined by f(~x) = h(g_1(~x), g_2(~x), ..., g_m(~x)) for all ~x ∈ N^n.

Definition 15.1.4. Let n ∈ N^+, let g : N^n → N and h : N^{n+2} → N. We let PrimRec(g, h) be the unique function f : N^{n+1} → N defined by

f(~x, 0) = g(~x)
f(~x, y + 1) = h(~x, y, f(~x, y))

Definition 15.1.5. The collection of primitive recursive functions is the collection of functions generated by starting with the initial functions, and generating using Compose and PrimRec.
Proposition 15.1.6. For every k ∈ N and every n ∈ N^+, the constant function C_k^n : N^n → N given by C_k^n(~x) = k for all ~x ∈ N^n is primitive recursive.

Proof. We first prove the result when n = 1 by induction on k. Notice that C_0^1 = O, so C_0^1 is primitive recursive. Suppose that C_k^1 is primitive recursive. Since C_{k+1}^1 = Compose(S, C_k^1), it follows that C_{k+1}^1 is primitive recursive. Therefore, C_k^1 is primitive recursive for all k ∈ N.

Suppose now that k ∈ N and that n ∈ N^+. We then have that C_k^n = Compose(C_k^1, I_1^n), so C_k^n is primitive recursive.

Proposition 15.1.7. The addition function f : N^2 → N given by f(x, y) = x + y for every x, y ∈ N is primitive recursive.

Proof. Notice that we have

f(x, 0) = x
f(x, y + 1) = f(x, y) + 1

Thus, we define g : N → N by letting g(x) = x for all x ∈ N, and we define h : N^3 → N by letting h(x, y, a) = a + 1 for all x, y, a ∈ N. Now g = I_1^1, so g is primitive recursive. Also, notice that h = Compose(S, I_3^3), so h is primitive recursive. It follows that f = PrimRec(g, h) is primitive recursive.
Proposition 15.1.8. The multiplication function f : N^2 → N given by f(x, y) = x · y for every x, y ∈ N is primitive recursive.

Proof. Notice that we have

f(x, 0) = 0
f(x, y + 1) = f(x, y) + x

Thus, we define g : N → N by letting g(x) = 0 for all x ∈ N, and we define h : N^3 → N by letting h(x, y, a) = a + x for all x, y, a ∈ N. Now g = O, so g is primitive recursive. Also, notice that h = Compose(+, I_3^3, I_1^3), so h is primitive recursive. It follows that f = PrimRec(g, h) is primitive recursive.
Proposition 15.1.9. The exponentiation function f : N^2 → N given by f(x, y) = x^y for every x, y ∈ N is primitive recursive.

Proof. Notice that we have

f(x, 0) = 1
f(x, y + 1) = f(x, y) · x

Thus, we define g : N → N by letting g(x) = 1 for all x ∈ N, and we define h : N^3 → N by letting h(x, y, a) = a · x for all x, y, a ∈ N. Now g = C_1^1, so g is primitive recursive. Also, notice that h = Compose(·, I_3^3, I_1^3), so h is primitive recursive. It follows that f = PrimRec(g, h) is primitive recursive.
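The three proofs above follow one pattern, which can be mirrored directly as code. The following sketch (an illustration, not from the text) implements Compose and PrimRec as Python combinators and builds addition, multiplication, and exponentiation exactly as in Propositions 15.1.7 through 15.1.9.

```python
# A sketch (not from the text) of the closure operations as Python
# combinators, then addition, multiplication, and exponentiation built
# exactly as in the proofs of Propositions 15.1.7-15.1.9.
O = lambda x: 0                       # the zero function
S = lambda x: x + 1                   # the successor function

def I(n, i):                          # the projection I_i^n (1 <= i <= n)
    return lambda *xs: xs[i - 1]

def Compose(h, *gs):                  # Definition 15.1.3
    return lambda *xs: h(*(g(*xs) for g in gs))

def PrimRec(g, h):                    # Definition 15.1.4, unrolled as a loop
    def f(*args):
        *xs, y = args
        val = g(*xs)
        for z in range(y):
            val = h(*xs, z, val)
        return val
    return f

one = lambda x: 1                     # the constant function C_1^1

add  = PrimRec(I(1, 1), Compose(S, I(3, 3)))
mult = PrimRec(O, Compose(add, I(3, 3), I(3, 1)))
exp  = PrimRec(one, Compose(mult, I(3, 3), I(3, 1)))

assert add(3, 4) == 7 and mult(3, 4) == 12 and exp(2, 10) == 1024
```

Note the order of arguments to the inner Compose: h(x, y, a) = a + x becomes Compose(add, I(3, 3), I(3, 1)), selecting the accumulator first and then x, just as in the proof of Proposition 15.1.8.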
Using our definition of PrimRec, we have no way to define a function f : N → N recursively by letting f(0) be some constant and defining f(y + 1) in terms of y and f(y). This is because all of our objects are functions, so we required the g in PrimRec to be a function, not a constant. However, we can easily prove that such definitions give primitive recursive functions by adding on a dummy variable and projecting down at the end.

Proposition 15.1.10. Let a ∈ N and let h : N^2 → N be primitive recursive. Let f : N → N be defined by

f(0) = a
f(y + 1) = h(y, f(y))

Then f is primitive recursive.
Proof. The idea is first to go up to a function of two variables f′ : N^2 → N defined by

f′(x, 0) = a
f′(x, y + 1) = h′(x, y, f′(x, y))

where h′ : N^3 → N is given by h′(x, y, a) = h(y, a). This is something we can handle, after which we can strip off the extraneous first variable to get that f(x) = f′(0, x) is primitive recursive.

Let g′ = C_a^1 : N → N and let h′ : N^3 → N be defined by h′(x, y, a) = h(y, a). Notice that g′ is primitive recursive and h′ = Compose(h, I_2^3, I_3^3) is also primitive recursive. It follows that f′ = PrimRec(g′, h′) is primitive recursive. Therefore, f = Compose(f′, C_0^1, I_1^1) is primitive recursive.
Proposition 15.1.11. The factorial function f : N → N is primitive recursive.

Proof. Notice that

f(0) = 1
f(y + 1) = (y + 1) · f(y)

Thus, we want to consider the function h : N^2 → N defined by h(y, a) = (y + 1) · a. Notice that h = Compose(·, Compose(S, I_1^2), I_2^2) is primitive recursive, so f is primitive recursive by Proposition 15.1.10.
Proposition 15.1.12. The predecessor function Pred : N → N defined by Pred(0) = 0 and Pred(n) = n − 1 for all n > 0 is primitive recursive.

Proof. We have

Pred(0) = 0
Pred(y + 1) = y

Thus, if we let h : N^2 → N be the function h(y, a) = y, we notice that h = I_1^2 is primitive recursive, hence Pred is primitive recursive by Proposition 15.1.10.
Proposition 15.1.13. The truncated subtraction function f : N^2 → N defined by

f(x, y) = x − y if x ≥ y, and f(x, y) = 0 otherwise

is primitive recursive. We denote this function by ∸, writing x ∸ y for f(x, y).

Proof. We have

f(x, 0) = x
f(x, y + 1) = Pred(f(x, y))

More formally, notice that f = PrimRec(I_1^1, Compose(Pred, I_3^3)).
Proposition 15.1.14. The following functions are primitive recursive.

1. sg : N → N defined by

sg(n) = 1 if n ≠ 0, and sg(n) = 0 otherwise.

2. s̄g : N → N defined by

s̄g(n) = 1 if n = 0, and s̄g(n) = 0 otherwise.

Proof. We first handle s̄g. Notice that s̄g(n) = 1 ∸ n for every n ∈ N, so s̄g = Compose(∸, C_1^1, I_1^1). Next notice that sg(n) = 1 ∸ s̄g(n) for every n ∈ N, so sg = Compose(∸, C_1^1, s̄g).
Proposition 15.1.15. The following functions are primitive recursive:


1. The function Gt : N^2 → N defined by

Gt(x, y) = 1 if x > y, and Gt(x, y) = 0 otherwise.

2. The function Lt : N^2 → N defined by

Lt(x, y) = 1 if x < y, and Lt(x, y) = 0 otherwise.

3. The function Equal : N^2 → N defined by

Equal(x, y) = 1 if x = y, and Equal(x, y) = 0 otherwise.

Proof. Notice that x > y if and only if x ∸ y ≠ 0, which is if and only if sg(x ∸ y) = 1. Therefore, Gt = Compose(sg, ∸) is primitive recursive. Now Lt = Compose(Gt, I_2^2, I_1^2), so Lt is primitive recursive. Finally, notice that Equal = Compose(s̄g, Compose(+, Gt, Lt)), so Equal is primitive recursive.
Proposition 15.1.16. For each k ∈ N, the function i_k defined by

i_k(n) = 1 if n = k, and i_k(n) = 0 otherwise

is primitive recursive.

Proof. Notice that for each k, we have that i_k = Compose(Equal, I_1^1, C_k^1), so i_k is primitive recursive.
Proposition 15.1.17. Suppose that f : N^{n+1} → N is primitive recursive.

1. The function g : N^{n+1} → N given by g(~x, y) = Σ_{z<y} f(~x, z) is primitive recursive.

2. The function g : N^{n+1} → N given by g(~x, y) = Π_{z<y} f(~x, z) is primitive recursive.

Proof.

1. We have

g(~x, 0) = 0
g(~x, y + 1) = g(~x, y) + f(~x, y)

2. We have

g(~x, 0) = 1
g(~x, y + 1) = g(~x, y) · f(~x, y)

Proposition 15.1.18. Suppose that f : N^{n+1} → N is primitive recursive.


1. The function g_1 : N^{n+1} → N defined by

g_1(~x, y) = 1 if there exists z < y such that f(~x, z) = 0, and g_1(~x, y) = 0 otherwise

is primitive recursive.

2. The function g_2 : N^{n+1} → N defined by

g_2(~x, y) = z if f(~x, w) = 0 for some w < y and z is the least such w, and g_2(~x, y) = y otherwise

is primitive recursive.

Proof.

1. Notice that g_1(~x, y) = s̄g(Π_{z<y} f(~x, z)). More formally, we can argue as follows. Let h : N^{n+1} → N be the function h(~x, y) = Π_{z<y} f(~x, z), and notice that h is primitive recursive. Notice that g_1 = Compose(s̄g, h), so g_1 is primitive recursive.

2. We have

g_2(~x, y) = s̄g(g_1(~x, y)) · y + Σ_{z<y} s̄g(g_1(~x, z)) · s̄g(f(~x, z)) · z

Definition 15.1.19. If f : N^{n+1} → N, we denote the g_2 of the previous proposition by writing

g_2(~x, y) = μz<y (f(~x, z) = 0)
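The closed form for g_2 is easy to mistrust, so here is a sketch (not from the text) that checks the arithmetic formula against a direct bounded search; s̄g is written as an ordinary Python function, and the test function f is an arbitrary choice.

```python
# A sketch (not from the text) checking the closed form for bounded
# minimization from Proposition 15.1.18 against a direct search.
def sgbar(n):                         # 1 if n == 0, else 0
    return 1 if n == 0 else 0

def prod(values):
    p = 1
    for v in values:
        p *= v
    return p

def g1(f, x, y):                      # 1 iff some z < y has f(x, z) = 0
    return sgbar(prod(f(x, z) for z in range(y)))

def g2_formula(f, x, y):              # the arithmetic closed form from the proof
    return sgbar(g1(f, x, y)) * y + sum(
        sgbar(g1(f, x, z)) * sgbar(f(x, z)) * z for z in range(y))

def g2_direct(f, x, y):               # least z < y with f(x, z) = 0, else y
    for z in range(y):
        if f(x, z) == 0:
            return z
    return y

f = lambda x, z: (z - x) % 5          # an arbitrary test function
assert all(g2_formula(f, x, y) == g2_direct(f, x, y)
           for x in range(6) for y in range(10))
```

The first summand contributes y only when no zero exists below y, and each term of the sum is nonzero only at the least zero, where both s̄g factors equal 1.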

15.2 Primitive Recursive Relations

Definition 15.2.1. Let R ⊆ N^n. We say that R is primitive recursive if its characteristic function, i.e. the function K_R : N^n → N given by

K_R(x_1, x_2, ..., x_n) = 1 if (x_1, x_2, ..., x_n) ∈ R, and 0 otherwise,

is primitive recursive.
Proposition 15.2.2. If R ⊆ N^n is primitive recursive, then so is N^n\R.

Proof. Notice that K_{N^n\R} = Compose(s̄g, K_R).

Proposition 15.2.3. If R, S ⊆ N^n are primitive recursive, then so are R ∩ S and R ∪ S.

Proof. Notice that K_{R∩S} = Compose(·, K_R, K_S) and that K_{R∪S} = Compose(sg, Compose(+, K_R, K_S)).
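Treating relations as 0/1-valued characteristic functions, the last two propositions can be checked directly; the relations R and S below are arbitrary examples chosen for illustration (a sketch, not from the text).

```python
# A sketch (not from the text) of Propositions 15.2.2 and 15.2.3, treating
# relations on N as 0/1-valued characteristic functions.
def sg(n):    return 1 if n != 0 else 0
def sgbar(n): return 1 if n == 0 else 0

K_R = lambda x: 1 if x % 2 == 0 else 0        # R = even numbers (example)
K_S = lambda x: 1 if x % 3 == 0 else 0        # S = multiples of 3 (example)

K_comp  = lambda x: sgbar(K_R(x))             # complement: sgbar of K_R
K_inter = lambda x: K_R(x) * K_S(x)           # intersection: the product
K_union = lambda x: sg(K_R(x) + K_S(x))       # union: sg of the sum

assert [x for x in range(13) if K_inter(x)] == [0, 6, 12]
assert [x for x in range(7) if K_union(x)] == [0, 2, 3, 4, 6]
assert (K_comp(2), K_comp(3)) == (0, 1)
```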
Proposition 15.2.4. Suppose that f : N^n → N is primitive recursive (as a function). We then have that graph(f) ⊆ N^{n+1} is primitive recursive (as a relation).


Proof. Notice that

K_{graph(f)}(x_1, x_2, ..., x_n, y) = 1 if f(x_1, x_2, ..., x_n) = y, and 0 otherwise.

Thus, K_{graph(f)} = Compose(Equal, Compose(f, I_1^{n+1}, I_2^{n+1}, ..., I_n^{n+1}), I_{n+1}^{n+1}).

Proposition 15.2.5. Suppose that R ⊆ N^{n+1} is primitive recursive.

1. The set

S = {(~x, y) ∈ N^{n+1} : (∃z < y)R(~x, z)}

is primitive recursive.

2. The function f : N^{n+1} → N defined by

f(~x, y) = μz<y R(~x, z)

is primitive recursive.

Proof. Let g = K_{N^{n+1}\R} be the characteristic function of the complement of R, which we know is primitive recursive.

1. We have

K_S(~x, y) = 1 if there exists z < y such that g(~x, z) = 0, and 0 otherwise,

so K_S is primitive recursive by Proposition 15.1.18.

2. We have f(~x, y) = μz<y (g(~x, z) = 0), so f is primitive recursive by Proposition 15.1.18.

Proposition 15.2.6. The set D = {(x, y) ∈ N^2 : x divides y} is primitive recursive.

Proof. Notice that

(x, y) ∈ D ⇔ (∃w < y + 1)[x · w = y]

Since D is obtained from a primitive recursive relation by an existential quantification with a primitive recursive bound, it follows that D is primitive recursive.

More formally, notice that the set R = {(x, y, w) ∈ N^3 : x · w = y} is primitive recursive because it is, up to permutation of the variables, the graph of the primitive recursive multiplication function. Therefore, the set S = {(x, y, z) ∈ N^3 : (∃w < z)[x · w = y]} is primitive recursive from above. Now let g : N → N be the successor function (i.e. g = S), which is primitive recursive. We then have that K_D(x, y) = K_S(x, y, g(y)) for all x, y ∈ N, or more annoyingly,

K_D(x, y) = K_S(I_1^2(x, y), I_2^2(x, y), g(I_2^2(x, y)))

for all x, y ∈ N. It follows that K_D = Compose(K_S, I_1^2, I_2^2, Compose(g, I_2^2)), so K_D is primitive recursive.
Proposition 15.2.7. The set Prime ⊆ N of primes is primitive recursive.

Proof.

n ∈ Prime ⇔ n ≠ 0 ∧ n ≠ 1 ∧ ¬(∃k < n)(∃m < n)[k · m = n]


Proposition 15.2.8. The function f : N → N which sends n to the (n + 1)st prime (i.e. f(0) = 2, f(1) = 3, f(2) = 5, etc.) is primitive recursive.

Proof. Notice that by Euclid's proof that there are infinitely many primes, we have that f(n + 1) ≤ f(n)! + 1 for all n ∈ N. Thus,

f(0) = 2
f(n + 1) = μz<(f(n)!+2) (z > f(n) ∧ z ∈ Prime)
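The bounded search in this proof runs, albeit inefficiently, as actual code. In the sketch below (not from the text), is_prime mirrors the bounded-quantifier definition of Prime from Proposition 15.2.7, and nth_prime searches below the Euclid bound f(n)! + 2.

```python
from math import factorial

# A sketch (not from the text) of Proposition 15.2.8: find the next prime by
# bounded search below Euclid's bound f(n)! + 2.
def is_prime(n):                      # mirrors the definition of Prime
    return n >= 2 and not any(k * m == n
                              for k in range(2, n) for m in range(2, n))

def bounded_mu(bound, pred):          # least z < bound with pred(z), else bound
    for z in range(bound):
        if pred(z):
            return z
    return bound

def nth_prime(n):                     # the (n+1)st prime: f(0) = 2, f(1) = 3, ...
    p = 2
    for _ in range(n):
        p = bounded_mu(factorial(p) + 2, lambda z: z > p and is_prime(z))
    return p

assert [nth_prime(n) for n in range(6)] == [2, 3, 5, 7, 11, 13]
```

The factorial bound is astronomically generous; the search returns at the first prime above p, so in practice only a few values of z are examined.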

15.3 Coding Sequences

In what follows, we let p_n denote the (n + 1)st prime, so p_0 = 2, p_1 = 3, p_2 = 5, etc. Notice that the function n ↦ p_n is primitive recursive by Proposition 15.2.8.

Definition 15.3.1. For each n ∈ N^+, let α_n : N^n → N be the function

α_n(x_1, x_2, ..., x_n) = p_0^{x_1+1} · p_1^{x_2+1} ⋯ p_{n−1}^{x_n+1}

Proposition 15.3.2. α_n is primitive recursive for each n ∈ N^+.

Proof. We know that exponentiation and multiplication are primitive recursive.

Definition 15.3.3. Let Seq = {1} ∪ ⋃_{n∈N^+} ran(α_n) (we think of 1 as coding the sequence of length 0).

Proposition 15.3.4. Seq is primitive recursive.

Proof. We have

x ∈ Seq ⇔ x ≠ 0 ∧ (x = 1 ∨ (∀p ≤ x)[(p ∈ Prime ∧ p | x) → (∀q < p)(q ∈ Prime → q | x)])

Notation 15.3.5. Given x_1, x_2, ..., x_n ∈ N, we use ⟨x_1, x_2, ..., x_n⟩ to denote α_n(x_1, x_2, ..., x_n).


Proposition 15.3.6. The function f : N^2 → N defined by

f(x, y) = e if x > 1, x^e | y, and x^{e+1} ∤ y, and f(x, y) = 0 otherwise

is primitive recursive.

Proof. f(x, y) = Gt(x, 1) · Pred(μe≤y (x^e ∤ y)).

Notation 15.3.7. Given y, i ∈ N, we use (y)_i to denote K_Seq(y) · Pred(f(p_i, y)). In other words, (y)_i is the (i + 1)st element of the sequence coded by y (and is 0 if either y does not code a sequence, or does not code a sufficiently long sequence).
Proposition 15.3.8. The function ln : N → N defined by

ln(y) = 0 if y = 1, ln(y) = n if y ∈ ran(α_n), and ln(y) = 0 otherwise

(i.e. ln(y) is the length of the sequence coded by y) is primitive recursive.


Proof. ln(x) = K_Seq(x) · μi≤x (p_i ∤ x).

Proposition 15.3.9. The function Concat : N^2 → N, which given x, y ∈ N coding sequences returns the code of their concatenation (giving 0 otherwise), is primitive recursive.

Proof. Concat(x, y) = K_Seq(x) · K_Seq(y) · x · Π_{i<ln(y)} p_{ln(x)+i}^{(y)_i+1}.
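The coding machinery of this section can be sketched directly in Python (an illustration, not from the text); the helper names seq_code, elem, ln, and concat correspond to ⟨...⟩, (y)_i, ln, and Concat, and a fixed prime table stands in for the primitive recursive function n ↦ p_n.

```python
# A sketch (not from the text) of the prime-power sequence coding of this
# section; a fixed prime table stands in for p_n.
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]

def seq_code(xs):                     # <x_1, ..., x_n>; the empty sequence is 1
    y = 1
    for i, x in enumerate(xs):
        y *= PRIMES[i] ** (x + 1)
    return y

def elem(y, i):                       # (y)_i: exponent of p_i in y, minus 1
    e = 0
    while y % PRIMES[i] == 0:
        y //= PRIMES[i]
        e += 1
    return max(e - 1, 0)

def ln(y):                            # length: least i with p_i not dividing y
    i = 0
    while y % PRIMES[i] == 0:
        i += 1
    return i

def concat(x, y):                     # as in Proposition 15.3.9
    return seq_code([elem(x, i) for i in range(ln(x))] +
                    [elem(y, i) for i in range(ln(y))])

y = seq_code([4, 0, 7])
assert (ln(y), elem(y, 0), elem(y, 1), elem(y, 2)) == (3, 4, 0, 7)
assert concat(seq_code([1, 2]), seq_code([3])) == seq_code([1, 2, 3])
```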
Definition 15.3.10. Given a function f : N^{n+1} → N, we let f̄ : N^{n+1} → N be the function

f̄(~x, y) = ⟨f(~x, 0), f(~x, 1), ..., f(~x, y − 1)⟩

Proposition 15.3.11. Suppose that f : N^{n+1} → N. We have that f is primitive recursive if and only if f̄ is primitive recursive.

Proof. Suppose that f̄ is primitive recursive. We then have that f(~x, y) = (f̄(~x, y + 1))_y, so f is primitive recursive.

Suppose that f is primitive recursive. Notice that we have

f̄(~x, 0) = 1
f̄(~x, y + 1) = Concat(f̄(~x, y), ⟨f(~x, y)⟩)

so f̄ is primitive recursive.
Theorem 15.3.12. If h : N^{n+1} → N is primitive recursive, then the function f : N^{n+1} → N defined by

f(~x, y) = h(~x, f̄(~x, y))

for all y ∈ N is primitive recursive.

Proof. Notice that we have

f̄(~x, 0) = 1
f̄(~x, y + 1) = Concat(f̄(~x, y), ⟨h(~x, f̄(~x, y))⟩)

so f̄ is primitive recursive, and hence f is primitive recursive.
Corollary 15.3.13. The Fibonacci sequence defined by f(0) = 1, f(1) = 1, and f(n) = f(n − 1) + f(n − 2) for all n ≥ 2 is primitive recursive.

Proof. Define h : N → N by letting

h(y) = 1 if y ∈ Seq and ln(y) < 2, h(y) = (y)_{ln(y)−2} + (y)_{ln(y)−1} if y ∈ Seq and ln(y) ≥ 2, and h(y) = 0 otherwise,

and notice that h is primitive recursive and that f(y) = h(f̄(y)) for all y.
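Course-of-values recursion can be carried out literally on the codes. The sketch below (not from the text) computes the Fibonacci sequence by repeatedly extending the coded history, in the spirit of Theorem 15.3.12 and Corollary 15.3.13; the fixed prime table limits it to short histories.

```python
# A sketch (not from the text) of course-of-values recursion: compute
# f(y) = h(fbar(y)), where fbar(y) is the prime-power code of the history
# <f(0), ..., f(y-1)>.
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]

def seq_code(xs):
    y = 1
    for i, x in enumerate(xs):
        y *= PRIMES[i] ** (x + 1)
    return y

def elem(y, i):
    e = 0
    while y % PRIMES[i] == 0:
        y //= PRIMES[i]
        e += 1
    return max(e - 1, 0)

def ln(y):
    i = 0
    while y % PRIMES[i] == 0:
        i += 1
    return i

def h(y):                             # h from the proof of Corollary 15.3.13
    if ln(y) < 2:
        return 1
    return elem(y, ln(y) - 2) + elem(y, ln(y) - 1)

def f(y):                             # fbar is extended one entry at a time
    history = 1                       # fbar(0) = 1 codes the empty sequence
    for _ in range(y):
        entries = [elem(history, i) for i in range(ln(history))]
        history = seq_code(entries + [h(history)])
    return h(history)

assert [f(n) for n in range(8)] == [1, 1, 2, 3, 5, 8, 13, 21]
```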
Proposition 15.3.14. Suppose that f : N^n → N, and let g : N → N be the function

g(y) = f(x_1, x_2, ..., x_n) if y = ⟨x_1, x_2, ..., x_n⟩ ∈ Seq, and g(y) = 0 otherwise.

We then have that f is primitive recursive if and only if g is primitive recursive.

Proof. Suppose that f is primitive recursive. Since

g(y) = f((y)_0, (y)_1, ..., (y)_{n−1}) if y ∈ Seq and ln(y) = n, and g(y) = 0 otherwise,

it follows that g is primitive recursive.

Suppose conversely that g is primitive recursive. Since f(x_1, x_2, ..., x_n) = g(⟨x_1, x_2, ..., x_n⟩), it follows that f is primitive recursive.

15.4 Coding Primitive Recursive Functions

We want to be able to code primitive recursive functions using numbers. The idea is as follows:

1. We use the number ⟨0⟩ = 2^{0+1} = 2 as a code for the function O.

2. We use the number ⟨1⟩ = 2^{1+1} = 4 as a code for the function S.

3. We use the number ⟨2, n, i⟩ = 2^{2+1} · 3^{n+1} · 5^{i+1} = 8 · 3^{n+1} · 5^{i+1} as a code for the function I_i^n.

4. If a, b_1, b_2, ..., b_m are codes for functions such that each of the functions coded by the b_i have the same arity, and the arity of the function coded by a is m, then we use the number ⟨3, a, b_1, b_2, ..., b_m⟩ as a code for the function which is the composition of the function coded by a with the functions coded by the b_i.

5. If a and b are codes for functions such that the function coded by b has arity two more than the function coded by a, then we use the number ⟨4, a, b⟩ as a code for the function which is achieved via primitive recursion using the function coded by a as the base case and the function coded by b as our iterator.

For example, the number ⟨3, ⟨1⟩, ⟨0⟩⟩ = 2^{3+1} · 3^{4+1} · 5^{2+1} = 2^4 · 3^5 · 5^3 is a code for the function C_1^1.

We want to show that the set of codes described above is primitive recursive. To do this, we define a function f : N → N recursively in which f(e) = 0 if e is not a valid code, and f(e) = n > 0 if e is a valid code of a function of arity n. Here's the precise definition.
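The example code number above can be checked mechanically. The following sketch (not from the text) computes codes from the scheme using the prime-power coding of Section 15.3.

```python
# A sketch (not from the text) computing code numbers from the scheme above,
# using the prime-power sequence coding of Section 15.3.
PRIMES = [2, 3, 5, 7, 11, 13]

def seq_code(xs):                     # <x_1, ..., x_n>
    y = 1
    for i, x in enumerate(xs):
        y *= PRIMES[i] ** (x + 1)
    return y

O_code = seq_code([0])                       # <0> = 2 codes O
S_code = seq_code([1])                       # <1> = 4 codes S
proj_code = lambda n, i: seq_code([2, n, i]) # <2, n, i> codes I_i^n

# <3, <1>, <0>> codes Compose(S, O) = C_1^1:
C11_code = seq_code([3, S_code, O_code])

assert (O_code, S_code) == (2, 4)
assert proj_code(1, 1) == 8 * 3 ** 2 * 5 ** 2
assert C11_code == 2 ** 4 * 3 ** 5 * 5 ** 3
```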
Definition 15.4.1. We define a function f : N → N recursively as follows:

f(e) =
1 if e = ⟨0⟩;
1 if e = ⟨1⟩;
(e)_1 if e ∈ Seq, (e)_0 = 2, ln(e) = 3, (e)_1 ≥ 1, and 1 ≤ (e)_2 ≤ (e)_1;
f((e)_2) if e ∈ Seq, (e)_0 = 3, ln(e) ≥ 3, (∀i < ln(e))(i > 0 → f((e)_i) ≠ 0), (∀i < ln(e))(∀j < ln(e))((i > 1 ∧ j > 1) → f((e)_i) = f((e)_j)), and f((e)_1) = ln(e) − 2;
f((e)_1) + 1 if e ∈ Seq, (e)_0 = 4, ln(e) = 3, f((e)_1) ≠ 0, and f((e)_2) = f((e)_1) + 2;
0 otherwise.

We denote f by PRArity.
Proposition 15.4.2. PRArity is primitive recursive.


Proof. Roughly, this is because PRArity is defined recursively using primitive recursive conditions. More formally, we define h : N → N by

h(y) =
1 if ln(y) = ⟨0⟩;
1 if ln(y) = ⟨1⟩;
(ln(y))_1 if ln(y) ∈ Seq, (ln(y))_0 = 2, ln(ln(y)) = 3, (ln(y))_1 ≥ 1, and 1 ≤ (ln(y))_2 ≤ (ln(y))_1;
(y)_{(ln(y))_2} if ln(y) ∈ Seq, (ln(y))_0 = 3, ln(ln(y)) ≥ 3, (∀i < ln(ln(y)))(i > 0 → (y)_{(ln(y))_i} ≠ 0), (∀i < ln(ln(y)))(∀j < ln(ln(y)))((i > 1 ∧ j > 1) → (y)_{(ln(y))_i} = (y)_{(ln(y))_j}), and (y)_{(ln(y))_1} = ln(ln(y)) − 2;
(y)_{(ln(y))_1} + 1 if ln(y) ∈ Seq, (ln(y))_0 = 4, ln(ln(y)) = 3, (y)_{(ln(y))_1} ≠ 0, and (y)_{(ln(y))_2} = (y)_{(ln(y))_1} + 2;
0 otherwise.

Notice that h is primitive recursive and that PRArity(e) = h(⟨PRArity(0), PRArity(1), ..., PRArity(e − 1)⟩) for all e ∈ N, so PRArity is primitive recursive by Theorem 15.3.12.
Definition 15.4.3. Let PRCode be the set of all e ∈ N such that PRArity(e) ≠ 0.

Proposition 15.4.4. PRCode is primitive recursive.

We now define our decoding function, which on input e ∈ PRCode gives the function coded by e.
Definition 15.4.5. We define a function Φ : PRCode → F recursively as follows:

Φ(e) =
O if e = ⟨0⟩;
S if e = ⟨1⟩;
I_{(e)_2}^{(e)_1} if (e)_0 = 2;
Compose(Φ((e)_1), Φ((e)_2), ..., Φ((e)_{ln(e)−1})) if (e)_0 = 3;
PrimRec(Φ((e)_1), Φ((e)_2)) if (e)_0 = 4.

Proposition 15.4.6. A function f : N^n → N is primitive recursive if and only if there exists e ∈ N such that f = Φ(e).
Definition 15.4.7. We define a function F : N^2 → N by letting

F(e, x) = Φ(e)((x)_0, (x)_1, ..., (x)_{ln(x)−1}) if x ∈ Seq, ln(x) > 0, and PRArity(e) = ln(x), and F(e, x) = 0 otherwise.

For each n ∈ N, we then define a function F_n : N^{n+1} → N by letting

F_n(e, x_1, x_2, ..., x_n) = F(e, ⟨x_1, x_2, ..., x_n⟩) if PRArity(e) = n, and F_n(e, x_1, x_2, ..., x_n) = 0 otherwise.

Notice that F and each F_n are intuitively computable.
Proposition 15.4.8. F_1 is not primitive recursive.

Proof. Suppose that F_1 is primitive recursive. Define g : N → N by letting g(x) = F_1(x, x) + 1 for all x ∈ N, and notice that g is primitive recursive. Fix an e ∈ N such that g = Φ(e). We then have that

g(x) = Φ(e)(x) = F_1(e, x)

for all x ∈ N, hence

F_1(e, e) = g(e) = F_1(e, e) + 1

a contradiction.

Chapter 16

Recursive Functions and Relations

16.1 Definitions and Basic Results

Definition 16.1.1. Fix some element not in N and denote it by ↑. Let N_↑ = N ∪ {↑}.
Definition 16.1.2. Let F_↑ be the set of all functions f : N^n → N_↑ for some n ∈ N^+. We call elements of F_↑ partial functions.

Definition 16.1.3. Suppose that m, n ∈ N^+, that h : N^m → N_↑, and that g_1, g_2, ..., g_m : N^n → N_↑. We let Compose(h, g_1, g_2, ..., g_m) be the function f : N^n → N_↑ defined by

f(~x) = ↑ if g_j(~x) = ↑ for some j, and f(~x) = h(g_1(~x), g_2(~x), ..., g_m(~x)) otherwise.

Definition 16.1.4. Let n ∈ N^+, let g : N^n → N_↑ and h : N^{n+2} → N_↑. We let PrimRec(g, h) be the unique function f : N^{n+1} → N_↑ defined by

f(~x, 0) = g(~x)
f(~x, y + 1) = ↑ if f(~x, y) = ↑, and f(~x, y + 1) = h(~x, y, f(~x, y)) otherwise.

Definition 16.1.5. Let h : N^{n+1} → N_↑. We let Minimize(h) be the function f : N^n → N_↑ defined as follows: for each ~x ∈ N^n, if h(~x, z) ≠ ↑ for all z ∈ N and there exists y with h(~x, y) = 0, then f(~x) is the least such y, and otherwise f(~x) = ↑. We denote f by writing f(~x) = μy(h(~x, y) = 0).
Definition 16.1.6. The collection of partial recursive functions is the collection of partial functions generated by starting with the initial functions, and generating using Compose, PrimRec, and Minimize. A (total) recursive function is a partial recursive function f : N^n → N_↑ such that ↑ ∉ ran(f).

Definition 16.1.7. Let R ⊆ N^n. We say that R is recursive if its characteristic function, i.e. the function K_R : N^n → N given by

K_R(x_1, x_2, ..., x_n) = 1 if (x_1, x_2, ..., x_n) ∈ R, and 0 otherwise,

is recursive.
Proposition 16.1.8. Suppose that f : N^n → N (notice that f is total). If graph(f) ⊆ N^{n+1} is recursive (as a relation), then f is recursive (as a function).

Proof. The right-to-left direction is as before, so we prove the left-to-right direction. Suppose that graph(f) is recursive (as a relation). Notice that for all ~x ∈ N^n, there exists y ∈ N such that s̄g(K_{graph(f)}(~x, y)) = 0 (because (~x, f(~x)) ∈ graph(f), and so s̄g(K_{graph(f)}(~x, f(~x))) = 0). Since f(~x) = μy(s̄g(K_{graph(f)}(~x, y)) = 0), it follows that f is recursive.
Definition 16.1.9. We define a function f : N → N recursively as follows:

f(e) =
1 if e = ⟨0⟩;
1 if e = ⟨1⟩;
(e)_1 if e ∈ Seq, (e)_0 = 2, ln(e) = 3, (e)_1 ≥ 1, and 1 ≤ (e)_2 ≤ (e)_1;
f((e)_2) if e ∈ Seq, (e)_0 = 3, ln(e) ≥ 3, (∀i < ln(e))(i > 0 → f((e)_i) ≠ 0), (∀i < ln(e))(∀j < ln(e))((i > 1 ∧ j > 1) → f((e)_i) = f((e)_j)), and f((e)_1) = ln(e) − 2;
f((e)_1) + 1 if e ∈ Seq, (e)_0 = 4, ln(e) = 3, f((e)_1) ≠ 0, and f((e)_2) = f((e)_1) + 2;
f((e)_1) − 1 if e ∈ Seq, (e)_0 = 5, ln(e) = 2, and f((e)_1) > 1;
0 otherwise.

We denote f by RArity.
Proposition 16.1.10. RArity is primitive recursive.

Definition 16.1.11. Let RCode be the set of all e ∈ N such that RArity(e) ≠ 0.

Proposition 16.1.12. RCode is primitive recursive.
Definition 16.1.13. We define a function Φ : RCode → F_↑ ∪ {↑} recursively as follows:

Φ(e) =
O if e = ⟨0⟩;
S if e = ⟨1⟩;
I_{(e)_2}^{(e)_1} if (e)_0 = 2;
Compose(Φ((e)_1), Φ((e)_2), ..., Φ((e)_{ln(e)−1})) if (e)_0 = 3 and Φ((e)_i) ≠ ↑ whenever 0 < i < ln(e);
PrimRec(Φ((e)_1), Φ((e)_2)) if (e)_0 = 4 and Φ((e)_1), Φ((e)_2) ≠ ↑;
Minimize(Φ((e)_1)) if (e)_0 = 5, Φ((e)_1) ≠ ↑, and (∀~x)(∃y)(Φ((e)_1)(~x, y) = 0);
↑ otherwise.
Definition 16.1.14. We define a set T ⊆ N^3 such that (e, x, y) ∈ T is meant to capture that y codes a computation of Φ(e) on input x ∈ Seq.

Proposition 16.1.15. T is primitive recursive.

Definition 16.1.16. For each n ∈ N^+, let T_n ⊆ N^{n+2} be defined by letting

T_n = {(e, x_1, x_2, ..., x_n, y) ∈ N^{n+2} : (e, ⟨x_1, x_2, ..., x_n⟩, y) ∈ T}

Proposition 16.1.17. T_n is primitive recursive for each n ∈ N^+.

Theorem 16.1.18. Suppose that f : N^n → N_↑ is a partial recursive function. There exists e ∈ N such that

1. For all ~x ∈ N^n, we have f(~x) ≠ ↑ if and only if there exists y such that T_n(e, ~x, y).

2. For all ~x ∈ N^n and all y ∈ N such that T_n(e, ~x, y), we have f(~x) = U(y). In particular, we have f(~x) = U(μy T_n(e, ~x, y)) for all ~x ∈ N^n.

16.2 Turing Machines and Computable Functions

16.3 The Church-Turing Thesis

16.4 Computably Enumerable Sets

Definition 16.4.1. Let A ⊆ N. We say that A is computably enumerable, or c.e., if either A = ∅ or A = ran(f) for some computable function f : N → N.

Proposition 16.4.2. Let A ⊆ N. The following are equivalent.

1. A is c.e.

2. There exists a partial computable function g : N → N_↑ such that A = ran(g)\{↑}.

3. There exists a partial computable function h : N → N_↑ such that A = dom(h), i.e.

A = {x ∈ N : h(x) ≠ ↑}

4. There exists a computable R ⊆ N^2 such that for all x ∈ N, we have x ∈ A ⇔ ∃y R(x, y).
Proof. Notice that 1 implies 2 is trivial.

We now show that 2 implies 3. Suppose that g : N → N_↑ is a partial computable function such that A = ran(g)\{↑}. Fix an e ∈ N such that for all x ∈ N, we have g(x) = U(μy T_1(e, x, y)). Define h : N → N_↑ by letting

h(y) = μx(T_1(e, (x)_0, (x)_1) ∧ U((x)_1) = y)

We then have that h is a partial computable function and that A = dom(h).

We now show that 3 implies 4. Suppose that h : N → N_↑ is a partial computable function such that A = dom(h). Fix an e ∈ N such that for all x ∈ N, we have

h(x) ≠ ↑ if and only if there exists y such that T_1(e, x, y)

Define R ⊆ N^2 by letting R = {(x, y) ∈ N^2 : T_1(e, x, y)}. For all x ∈ N, we then have that x ∈ A ⇔ ∃y R(x, y).

We end by proving that 4 implies 1. Suppose that R ⊆ N^2 is computable and such that for all x, we have x ∈ A ⇔ ∃y R(x, y). If A = ∅, then A is c.e. by definition. Suppose then that A ≠ ∅ and fix a ∈ A. Define f : N → N by letting

f(x) = (x)_0 if R((x)_0, (x)_1), and f(x) = a otherwise.

Notice that f is computable and that A = ran(f).
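The 4-implies-1 direction is the one that makes "enumerable" vivid: running over codes of pairs produces exactly the members of A. The sketch below (not from the text) does this for an example relation, using a simple diagonal pairing function in place of the prime-power coding; all names here are hypothetical.

```python
# A sketch (not from the text) of the 4-implies-1 direction: given decidable
# R with x in A iff some y has R(x, y), enumerate A by running over pairs.
def R(x, y):                          # example: y witnesses that x is a square
    return y * y == x

def pair_decode(n):                   # a hypothetical stand-in for ((n)_0, (n)_1)
    d = 0                             # walk the diagonals of N x N
    while (d + 1) * (d + 2) // 2 <= n:
        d += 1
    i = n - d * (d + 1) // 2
    return i, d - i

a = 0                                 # a fixed element of A (0 = 0 * 0)

def f(n):                             # f(n) = first coordinate if R holds, else a
    x, y = pair_decode(n)
    return x if R(x, y) else a

enumerated = {f(n) for n in range(2000)}
assert {x * x for x in range(6)} <= enumerated   # 0, 1, 4, 9, 16, 25 all appear
```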


Proposition 16.4.3. Let A ⊆ N and suppose that both A and N\A are c.e. We then have that A is computable.

Proof. If either A = ∅ or A = N, this is trivial, so we may suppose that A ≠ ∅ and A ≠ N. Fix computable functions f, g : N → N such that A = ran(f) and N\A = ran(g). Notice that for all x, there is an s such that either f(s) = x or g(s) = x, so the function h : N → N defined by letting h(x) = μs(f(s) = x ∨ g(s) = x) is a total computable function. We then have that x ∈ A ⇔ f(h(x)) = x. Since {x ∈ N : f(h(x)) = x} is computable, it follows that A is computable.


Theorem 16.4.4. There exists a c.e. set which is not computable. In particular, the set

K = {x ∈ N : (∃y)T_1(x, x, y)}

is a c.e. set which is not computable.

Proof. Since T_1 is primitive recursive, it follows that {(x, y) ∈ N^2 : T_1(x, x, y)} is primitive recursive, hence recursive. Therefore, K is c.e. Suppose that K was computable. Define a function f : N → N by letting

f(x) = U(μy T_1(x, x, y)) + 1 if x ∈ K, and f(x) = 0 otherwise,

and notice that f is a total computable function. Thus, we fix an e such that f(x) = U(μy T_1(e, x, y)) for all x ∈ N. We then have that e ∈ K because f(e) ∈ N, hence f(e) = U(μy T_1(e, e, y)) + 1, a contradiction.

Chapter 17

Coding Logic Computably

Suppose that you are working with a reasonable first-order language. For the moment, you can interpret reasonable as finite. If we assign numbers to each of the symbols from L, together with the logical symbols and the variables, then each formula can be viewed as a sequence of numbers and hence be coded by a number. We first fix a numbering for the logical symbols and variables, independent of the language. We use only the even numbers to save the odds for the symbols of a language L.
1. Code(¬) = 0
2. Code(∧) = 2
3. Code(∨) = 4
4. Code(→) = 6
5. Code(∃) = 8
6. Code(∀) = 10
7. Code(=) = 12.
8. Fix an enumeration x₀, x₁, x₂, ... of Var, and let Code(xᵢ) = 2i + 14 for all i ∈ ℕ.
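The numbering above is easy to realize as a function. Which glyphs stand for the seven logical symbols is our reading of the list just given (an assumption of this sketch); the numbers themselves match the text:

```python
# Even codes for the logical symbols and variables; odd numbers are
# reserved for the symbols of the language L.
LOGICAL_CODES = {'¬': 0, '∧': 2, '∨': 4, '→': 6, '∃': 8, '∀': 10, '=': 12}

def code(symbol):
    """Code(symbol) for logical symbols and for variables 'x0', 'x1', ..."""
    if symbol in LOGICAL_CODES:
        return LOGICAL_CODES[symbol]
    if symbol[0] == 'x' and symbol[1:].isdigit():   # variable x_i
        return 2 * int(symbol[1:]) + 14             # Code(x_i) = 2i + 14
    raise ValueError('not a logical symbol or variable')
```

For example, code('=') is 12 and code('x0') is 14, so the variable codes continue the even numbers where the logical symbols stop.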
Now we give a precise definition of "reasonable". Informally, a language is reasonable if we can view
the symbols as odd natural numbers in such a way that the sets of constant symbols, relation symbols, and
function symbols are computable, and such that we can compute the arities of the relation and function symbols.
Definition 17.0.5. Let L be a (first-order) language and let Odd = {2n + 1 : n ∈ ℕ}. We say that L is
a computable language if there exist functions h_C : C → Odd, h_R : R → Odd, and h_F : F → Odd, together
with functions Arity_R : ℕ → ℕ and Arity_F : ℕ → ℕ, such that
1. h_C, h_R, and h_F are injective.
2. ran(h_C) ∩ ran(h_R) = ∅, ran(h_C) ∩ ran(h_F) = ∅, and ran(h_R) ∩ ran(h_F) = ∅.
3. ran(h_C), ran(h_R), and ran(h_F) are computable.
4. Arity_R and Arity_F are computable.
5. Arity_R(n) is the arity of h_R⁻¹(n) for all n ∈ ran(h_R).
6. Arity_F(n) is the arity of h_F⁻¹(n) for all n ∈ ran(h_F).
Suppose for the rest of this section that we are working in a computable language L, and we have fixed
such functions.
Definition 17.0.6. We assign code numbers to sequences of elements of L by letting

    ♯a₁a₂⋯aₙ = ⟨Code(a₁), Code(a₂), ..., Code(aₙ)⟩

We call ♯a₁a₂⋯aₙ the Gödel number of the sequence a₁a₂⋯aₙ.
Definition 17.0.7. Given X ⊆ Sym_L*, we let ♯X = {♯σ : σ ∈ X}.
Proposition 17.0.8. ♯Var is computable.
Proof. Notice that ♯Var = {2^(2n+15) : n ∈ ℕ}, which is computable.
Proposition 17.0.9. ♯Con is computable.
Proof. Notice that ♯Con = {2^(n+1) : n ∈ ran(h_C)}, which is computable.
Proposition 17.0.10. ♯Term_L is computable.
Proof. Recall that terms are defined recursively: we start with the constant symbols and variables, and
generate new terms by putting a function symbol of arity k in front of k terms concatenated together. Let
SeqConcat : ℕ → ℕ be the function which, on input x ∈ ℕ, views x as coding a sequence of sequences, and
outputs the code of the concatenation of the sequences in x. For example,
SeqConcat(⟨⟨2, 7⟩, ⟨14⟩, ⟨1, 0, 42⟩⟩) = ⟨2, 7, 14, 1, 0, 42⟩.
Notice that SeqConcat is computable from the homework.
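SeqConcat can be sketched concretely under the prime-power coding of sequences (we assume here the convention ⟨a₀, ..., a_{n−1}⟩ = ∏ pᵢ^(aᵢ+1)); for readability, the function below takes a list of sequence codes rather than a single code of a sequence of codes:

```python
def first_primes(n):
    """The first n primes, by trial division (adequate for a sketch)."""
    ps, c = [], 2
    while len(ps) < n:
        if all(c % p for p in ps):
            ps.append(c)
        c += 1
    return ps

def encode(seq):
    """<a_0, ..., a_{n-1}> as the product of p_i^(a_i + 1); the +1 in the
    exponent keeps trailing zeros from being lost."""
    out = 1
    for p, a in zip(first_primes(len(seq)), seq):
        out *= p ** (a + 1)
    return out

def decode(code):
    """Recover the sequence from its code."""
    seq, i = [], 0
    while code > 1:
        p = first_primes(i + 1)[-1]
        e = 0
        while code % p == 0:
            code //= p
            e += 1
        seq.append(e - 1)
        i += 1
    return seq

def seq_concat(codes):
    """SeqConcat: the code of the concatenation of the coded sequences."""
    return encode([a for c in codes for a in decode(c)])
```

This reproduces the example in the text: applying seq_concat to the codes of ⟨2, 7⟩, ⟨14⟩, and ⟨1, 0, 42⟩ yields the code of ⟨2, 7, 14, 1, 0, 42⟩.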
We define a function f : ℕ → ℕ recursively as follows.

    f(n) = 1   if n ∈ ♯Con
           1   if n ∈ ♯Var
           1   if n ∈ Seq, (n)₀ ∈ ran(h_F), and (∃m ≤ (p_{Arity_F((n)₀)})^(n·Arity_F((n)₀)))
                   [m ∈ Seq ∧ ln(m) = Arity_F((n)₀) ∧ (∀i < ln(m))[(m)ᵢ < n ∧ f((m)ᵢ) = 1]
                    ∧ n = Concat(⟨(n)₀⟩, SeqConcat(m))]
           0   otherwise

Notice that f is computable and that f = K_{♯Term_L}.
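The structural recursion behind this definition (accept constants and variables outright, and otherwise peel off a function symbol and check arity-many smaller terms) can be sketched on sequences of symbols instead of their numeric codes. The toy language below, with one constant c and one binary function symbol f, is our own assumption:

```python
ARITY = {'f': 2}      # assumed function symbols with their arities
CONSTANTS = {'c'}     # assumed constant symbols

def read_term(seq, i=0):
    """If a term starts at position i of seq, return the position just
    past it; otherwise return None.  A term is a constant, a variable,
    or a function symbol followed by arity-many terms, so Polish
    notation can be parsed left to right with no backtracking."""
    if i >= len(seq):
        return None
    s = seq[i]
    if s in CONSTANTS or (s[0] == 'x' and s[1:].isdigit()):
        return i + 1
    if s in ARITY:
        j = i + 1
        for _ in range(ARITY[s]):
            j = read_term(seq, j)
            if j is None:
                return None
        return j
    return None

def is_term(seq):
    """Mirror of f above: 1 exactly when the whole sequence is a term."""
    return read_term(seq) == len(seq)
```

The fact that read_term never needs to backtrack is the computational content of unique readability for Polish notation.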
Proposition 17.0.11. ♯AtomicForm_L is computable.
Proof. We define a function f : ℕ → ℕ recursively as follows.

    f(n) = 1   if n ∈ Seq, ln(n) = 3, (n)₀ = 12, (n)₁ ∈ ♯Term_L, and (n)₂ ∈ ♯Term_L
           1   if n ∈ Seq, (n)₀ ∈ ran(h_R), and (∃m ≤ (p_{Arity_R((n)₀)})^(n·Arity_R((n)₀)))
                   [m ∈ Seq ∧ ln(m) = Arity_R((n)₀) ∧ (∀i < ln(m))[(m)ᵢ < n ∧ (m)ᵢ ∈ ♯Term_L]
                    ∧ n = Concat(⟨(n)₀⟩, SeqConcat(m))]
           0   otherwise

Proposition 17.0.12. ♯Form_L is computable.

Notice that since we can code finite sequences of numbers with numbers, we can also code finite sets of
numbers with numbers. One natural such coding is as follows. Given a finite F ⊆ ℕ, list the elements
of F in ascending order as a₁, a₂, ..., aₙ, and let the code of F be ⟨a₁, a₂, ..., aₙ⟩. Using this definition,
one can now check that the function which takes two such codes and outputs the code of their union is
computable, along with many other such basic functions.
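A sketch of this finite-set coding and of the union operation on codes, reusing a prime-power sequence coding (the particular encode/decode convention is an assumption of the sketch):

```python
def first_primes(n):
    ps, c = [], 2
    while len(ps) < n:
        if all(c % p for p in ps):
            ps.append(c)
        c += 1
    return ps

def encode(seq):
    """<a_1, ..., a_n> as the product of p_i^(a_i + 1)."""
    out = 1
    for p, a in zip(first_primes(len(seq)), seq):
        out *= p ** (a + 1)
    return out

def decode(code):
    seq, i = [], 0
    while code > 1:
        p = first_primes(i + 1)[-1]
        e = 0
        while code % p == 0:
            code, e = code // p, e + 1
        seq.append(e - 1)
        i += 1
    return seq

def set_code(F):
    """Code of a finite set: the code of its ascending enumeration."""
    return encode(sorted(F))

def set_union(m, n):
    """Take two set codes to the code of the union of the two sets."""
    return set_code(set(decode(m)) | set(decode(n)))
```

Sorting before encoding is what makes the set code well-defined: two enumerations of the same finite set always produce the same number.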
Definition 17.0.13. Let FreeVar : ℕ → ℕ be the function defined by letting FreeVar(n) be the code of the
finite set of variables occurring free in φ if n = ♯φ for some φ ∈ Form_L, and 0 otherwise.
Proposition 17.0.14. FreeVar is computable.
Proposition 17.0.15. ♯Sent_L is computable.
Proposition 17.0.16. Subst : ℕ² → ℕ is computable.
Proposition 17.0.17. ValidSubst : ℕ² → ℕ is computable.
Since we can code finite sets and formulas using numbers, we can now code pairs (Γ, φ), where Γ is a finite
set of formulas and φ is a formula, as numbers. From here, we can code sequences of such pairs as numbers.
Proposition 17.0.18. Let Ded ⊆ ℕ be the set of codes of such sequences which are deductions. We then
have that Ded is computable. Furthermore, if Γ ⊆ Form_L is such that ♯Γ is computable, then the subset
Ded_Γ ⊆ Ded, consisting of those elements of Ded whose last line is of the form (Γ′, φ) with Γ′ ⊆ Γ, is computable.
Proposition 17.0.19. Suppose that Γ ⊆ Form_L is such that ♯Γ is computable. The set ♯{φ ∈ Form_L :
Γ ⊨ φ} is c.e.
Proof. Notice that n ∈ ♯{φ ∈ Form_L : Γ ⊨ φ} if and only if n ∈ ♯{φ ∈ Form_L : Γ ⊢ φ} by the Soundness
and Completeness Theorems, which holds if and only if ∃y(Ded_Γ(y) ∧ Last(y) = n).
Corollary 17.0.20. Suppose that Σ ⊆ Sent_L is such that ♯Σ is computable. The set ♯Cn(Σ) is c.e.
Definition 17.0.21. Let T be a theory in a computable language.
1. We say that T is finitely axiomatizable if there exists a finite Σ ⊆ Sent_L such that T = Cn(Σ).
2. We say that T is axiomatizable if there exists Σ ⊆ Sent_L such that ♯Σ is computable and T = Cn(Σ).
Definition 17.0.22. We say that a theory T is decidable if ♯T is computable.
Proposition 17.0.23. If T is an axiomatizable complete theory, then T is decidable.
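The idea behind Proposition 17.0.23 is a search: by Corollary 17.0.20 the theorems of an axiomatizable T can be computably enumerated, and if T is complete (and consistent), then for any sentence σ that enumeration must eventually produce σ or ¬σ. A sketch, treating sentences as opaque values, with `theorems` an assumed computable enumeration of T and `neg` the map to negations:

```python
from itertools import count

def decide(sigma, neg, theorems):
    """Search an enumeration of a complete theory until sigma or its
    negation appears; completeness guarantees the loop terminates."""
    for n in count():
        t = theorems(n)
        if t == sigma:
            return True
        if t == neg(sigma):
            return False

# Toy "complete theory" (our own construction): all true statements of
# the form even(k) / ~even(k).
neg = lambda s: s[1:] if s.startswith('~') else '~' + s
theorems = lambda n: ('even(%d)' % n) if n % 2 == 0 else ('~even(%d)' % n)
```

The search gives no bound on how long it runs, which is why the conclusion is decidability and nothing finer.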
Corollary 17.0.24. DLO is decidable.
Corollary 17.0.25. ACFp is decidable for each p.


Chapter 18

Subtheories of Number Theory


A few big questions about the theory.
1. Is the theory axiomatizable/decidable?
2. What do the models look like?
3. For all models, or just particular interesting models, what are the definable sets?

18.1

The Natural Numbers with Successor

Definition 18.1.1. Let L = {0, S} where 0 is a constant symbol and S is a unary function symbol. Let
N_S = (ℕ, 0, S).
Definition 18.1.2. Let Ax_S be the following set of L-sentences.
1. ∀x∀y(Sx = Sy → x = y)
2. ∀x(x ≠ 0 → ∃y(Sy = x))
3. ∀x(Sx ≠ 0)
4. ∀x(Sⁿx ≠ x) for each n ∈ ℕ⁺.
A model of Cn(Ax_S) consists of one "ℕ-chain" together with some number (possibly 0) of "ℤ-chains", as
we discuss now.
Proposition 18.1.3. Suppose that M ⊨ Ax_S. Define a relation ∼ on M by saying that a ∼ b if either
1. a = b,
2. a = (S^M)⁽ⁿ⁾(b) for some n ∈ ℕ⁺, or
3. b = (S^M)⁽ⁿ⁾(a) for some n ∈ ℕ⁺.
We then have that ∼ is an equivalence relation on M.
Definition 18.1.4. Let κ be a cardinal. We define an L-structure M_κ by letting M_κ = ℕ ⊔ (κ × ℤ), letting
0^{M_κ} = 0, letting S^{M_κ}(n) = n + 1 for all n ∈ ℕ, and letting S^{M_κ}(α, n) = (α, n + 1) for all (α, n) ∈ κ × ℤ.
Proposition 18.1.5. M_κ ⊨ Ax_S for every cardinal κ.

Proposition 18.1.6. Let κ and λ be cardinals. If κ ≠ λ, then M_κ ≇ M_λ.
Proposition 18.1.7. |M_κ| = κ + ℵ₀ for all cardinals κ.
Proposition 18.1.8. Suppose that M ⊨ Ax_S, and let κ = |(M/∼) \ {[0^M]}|. We then have that M ≅ M_κ.
Proposition 18.1.9. Cn(Ax_S) is complete.
Proof. From above, we know that |M_κ| = ℵ₀ for all κ ≤ ℵ₀, and that |M_κ| = κ for all κ > ℵ₀. It follows
that Cn(Ax_S) is κ-categorical for all uncountable κ. Since all models of Cn(Ax_S) are infinite, we may use
Łoś–Vaught to conclude that Cn(Ax_S) is complete.
Corollary 18.1.10. Th(N_S) = Cn(Ax_S), hence Th(N_S) is axiomatizable and decidable.
Proposition 18.1.11. Cn(Ax_S) has QE.
Proof. Let T = Cn(Ax_S). Suppose that α₁, ..., α_m, β₁, ..., β_n ∈ Form_L are such that
1. FreeVar(α₁, ..., α_m, β₁, ..., β_n) ⊆ {y, x₁, x₂, ..., x_k}.
2. y ∈ FreeVar(α_i) for all i and y ∈ FreeVar(β_j) for all j.
3. Each α_i and β_j is an atomic formula.
We need to show that there exists a quantifier-free θ(x₁, x₂, ..., x_k) ∈ Form_L such that

    T ⊨ ∃y(⋀_{i=1}^{m} α_i ∧ ⋀_{j=1}^{n} ¬β_j) ↔ θ

Notice that the terms in our language are S^ℓ x for some ℓ ∈ ℕ and some x ∈ Var, and also S^ℓ 0 for some ℓ ∈ ℕ.
Let

    X = {S^ℓ 0 : ℓ ∈ ℕ} ∪ {S^ℓ y : ℓ ∈ ℕ} ∪ {S^ℓ x_i : ℓ ∈ ℕ, 1 ≤ i ≤ k}

Now each α_i and β_j is s.e. with, and hence we may assume is, an atomic formula of the form S^ℓ y = t for
some t ∈ X.
First, we may suppose that none of the t's is of the form S^p y. This is because we may ignore the formulas
α_i of the form S^ℓ y = S^p y with ℓ = p and the formulas β_j of the form S^ℓ y = S^p y with ℓ ≠ p. Also, if some
α_i is of the form S^ℓ y = S^p y with ℓ ≠ p, or some β_j is of the form S^ℓ y = S^p y with ℓ = p, then using the
injectivity axioms and the noncircularity axioms, we see that

    T ⊨ ∃y(⋀_{i=1}^{m} α_i ∧ ⋀_{j=1}^{n} ¬β_j) ↔ ¬(x₁ = x₁)
Next, we may assume that the ℓ's in each S^ℓ y are the same. This is because we have t = u if and only if
St = Su (due to the injectivity axiom for S).
Now for each i, we denote the term on the right in α_i by t_i, and for each j, we denote the term on
the right of β_j by u_j. If m ≥ 1 (i.e. if there is some α_i), we then have

    T ⊨ ∃y(⋀_{i=1}^{m} α_i ∧ ⋀_{j=1}^{n} ¬β_j) ↔ ((⋀_{p=0}^{ℓ-1} t₁ ≠ S^p 0) ∧ ⋀_{i=2}^{m} (t₁ = t_i) ∧ ⋀_{j=1}^{n} ¬(t₁ = u_j))

Finally, if m = 0 (i.e. if there is no α_i), then we have

    T ⊨ ∃y(⋀_{i=1}^{m} α_i ∧ ⋀_{j=1}^{n} ¬β_j) ↔ x₁ = x₁

because all models of T are infinite.


Corollary 18.1.12. Cn(Ax_S) is complete.

Corollary 18.1.13. Th(N_S) = Cn(Ax_S), hence Th(N_S) is axiomatizable and decidable.
Proposition 18.1.14. A set X ⊆ ℕ is definable in N_S if and only if it is either finite or cofinite.
Proof. Notice that for every n ∈ ℕ, the set {n} is definable in N_S by the formula x = Sⁿ0. Since the
collection of definable sets is closed under finite unions and complements, it follows that all finite and cofinite
sets are definable in N_S.
From above, we know that Th(N_S) has QE. Thus, a set is definable in N_S if and only if it is definable by
a quantifier-free formula. We first see which X ⊆ ℕ are definable by atomic formulas. Suppose that x ∈ Var
and that φ(x) ∈ AtomicForm_L. Since all terms have the form either S^ℓ y for some ℓ ∈ ℕ and some y ∈ Var,
or S^ℓ 0 for some ℓ ∈ ℕ, and the only free variable in φ(x) is x, it follows that φ(x) is s.e. to either S^ℓ x = S^p x
or S^ℓ x = S^p 0 for some ℓ, p ∈ ℕ. In either case, we see that φ(x) defines ∅, ℕ, or {n} for some n ∈ ℕ. Since
the sets definable by quantifier-free formulas are obtained from the sets definable by atomic formulas by
generating with finite unions and complements, and the collection of finite and cofinite sets is closed under
these operations, the result follows.

18.2

The Natural Numbers with Order

Definition 18.2.1. Let L = {0, S, <} where 0 is a constant symbol, S is a unary function symbol, and < is
a binary relation symbol. Let N_L = (ℕ, 0, S, <).
Definition 18.2.2. Let Ax_L be the following set of L-sentences.
1. ∀x(x ≠ 0 → ∃y(Sy = x))
2. ∀x ¬(x < x)
3. ∀x∀y∀z((x < y ∧ y < z) → x < z)
4. ∀x∀y((x < y) ∨ (y < x) ∨ (x = y))
5. ∀x(x ≠ 0 → 0 < x)
6. ∀x(x < Sx)
7. ∀x∀y ¬(x < y ∧ y < Sx)
Lemma 18.2.3. Ax_L ⊨ ∀x∀y(x < y ↔ Sx < Sy).
Proof. Fix a model M ⊨ Ax_L and suppose that a, b ∈ M.
Suppose first that a <^M b. Notice that b <^M S^M(a) is impossible by 7, so we must have either S^M(a) = b or
S^M(a) <^M b by 4. In the former case, we have that S^M(a) <^M S^M(b) by 6. In the latter case, we may use
6 and 3 to conclude that S^M(a) <^M S^M(b).
Suppose conversely that S^M(a) <^M S^M(b). We need to show that a <^M b. Suppose for a contradiction
that this is not the case. By 4, either a = b or b <^M a. In the former case, we could conclude that
S^M(a) = S^M(b), so S^M(a) <^M S^M(a), contradicting 2. In the latter case, we could use the previous paragraph
to conclude that S^M(b) <^M S^M(a), then use 3 to get that S^M(a) <^M S^M(a), contradicting 2. It follows that
a <^M b.
Lemma 18.2.4. Ax_L ⊨ ∀x(x < Sⁿx) for each n ∈ ℕ⁺.

Proof. By induction on n, using 6 and 3.
Proposition 18.2.5.
1. Ax_L ⊨ ∀x∀y(Sx = Sy → x = y).
2. Ax_L ⊨ ∀x(Sx ≠ 0).
3. Ax_L ⊨ ∀x(Sⁿx ≠ x) for each n ∈ ℕ⁺.
Hence, Cn(Ax_S) ⊆ Cn(Ax_L).
We now describe the models of Cn(Ax_L).
Definition 18.2.6. Let (L, ≺) be a linear ordering. We define an L-structure M_(L,≺) by letting M_(L,≺) =
ℕ ⊔ (L × ℤ), letting 0^{M_(L,≺)} = 0, letting S^{M_(L,≺)}(n) = n + 1 for all n ∈ ℕ, letting S^{M_(L,≺)}(a, n) = (a, n + 1)
for all (a, n) ∈ L × ℤ, and letting <^{M_(L,≺)} be the set

    {(m, n) ∈ ℕ² : m < n} ∪ (ℕ × (L × ℤ)) ∪ {((a, m), (b, n)) ∈ (L × ℤ)² : either a ≺ b, or a = b and m < n}
Proposition 18.2.7. M_(L,≺) ⊨ Ax_L for every linear ordering (L, ≺).
Proposition 18.2.8. Let (L₁, ≺₁) and (L₂, ≺₂) be linear orderings. If L₁ ≇ L₂ (as linear orderings), then
M_(L₁,≺₁) ≇ M_(L₂,≺₂).
Proposition 18.2.9. |M_(L,≺)| = |L| + ℵ₀ for all linear orderings (L, ≺).
Proposition 18.2.10. Suppose that M ⊨ Ax_L. Let L = (M/∼) \ {[0^M]}, and define ≺ on L by letting
[a] ≺ [b] if a <^M b. We then have that M ≅ M_(L,≺).
Proposition 18.2.11. Cn(Ax_L) has QE.
Proof. Let T = Cn(Ax_L). Suppose that α₁, ..., α_m, β₁, ..., β_n ∈ Form_L are such that
1. FreeVar(α₁, ..., α_m, β₁, ..., β_n) ⊆ {y, x₁, x₂, ..., x_k}.
2. y ∈ FreeVar(α_i) for all i and y ∈ FreeVar(β_j) for all j.
3. Each α_i and β_j is an atomic formula.
We need to show that there exists a quantifier-free θ(x₁, x₂, ..., x_k) ∈ Form_L such that

    T ⊨ ∃y(⋀_{i=1}^{m} α_i ∧ ⋀_{j=1}^{n} ¬β_j) ↔ θ

Notice that the terms in our language are S^ℓ x for some ℓ ∈ ℕ and some x ∈ Var, and also S^ℓ 0 for some ℓ ∈ ℕ.
Let

    X = {S^ℓ 0 : ℓ ∈ ℕ} ∪ {S^ℓ y : ℓ ∈ ℕ} ∪ {S^ℓ x_i : ℓ ∈ ℕ, 1 ≤ i ≤ k}

Now each α_i and β_j is s.e. with, and hence we may assume is, one of the following:
1. S^ℓ y = t for some t ∈ X.
2. S^ℓ y < t for some t ∈ X.
3. t < S^ℓ y for some t ∈ X.


First, we may suppose that none of the t's is of the form S^p y, because we can either ignore such formulas, or
they make the formula trivial, as above. Next, we may suppose that there are no β_j's. This is because we can
replace each ¬(S^ℓ y = t) by (S^ℓ y < t) ∨ (t < S^ℓ y), and similarly for the negated inequalities, then distribute
the ∧ over the ∨, and then distribute the ∃ over the ∨, handling each case separately. Next, we may assume
that the ℓ's in each S^ℓ y are the same. This is because we have t = u if and only if St = Su (due to the
injectivity axiom for S) and t < u if and only if St < Su (by what we showed above).
For each i, we denote the t in α_i by t_i. Let E = {i : α_i is S^ℓ y = t_i}, let L = {i : α_i is t_i < S^ℓ y}, and let
U = {i : α_i is S^ℓ y < t_i}.
Suppose first that E ≠ ∅. Fix j ∈ E. We then have

    T ⊨ ∃y(⋀_{i=1}^{m} α_i) ↔ ((⋀_{p=0}^{ℓ-1} (t_j ≠ S^p 0)) ∧ ⋀_{i∈E} (t_j = t_i) ∧ ⋀_{i∈L} (t_i < t_j) ∧ ⋀_{i∈U} (t_j < t_i))

Suppose then that E = ∅. If U = ∅, then

    T ⊨ ∃y(⋀_{i=1}^{m} α_i) ↔ x₁ = x₁

If L = ∅, then

    T ⊨ ∃y(⋀_{i=1}^{m} α_i) ↔ ⋀_{i∈U} (S^ℓ 0 < t_i)

Suppose then that L ≠ ∅ and U ≠ ∅. We then have

    T ⊨ ∃y(⋀_{i=1}^{m} α_i) ↔ (⋀_{i∈U} (S^ℓ 0 < t_i) ∧ ⋀_{i∈L, j∈U} (St_i < t_j))

Corollary 18.2.12. Cn(Ax_L) is complete.

Corollary 18.2.13. Th(N_L) = Cn(Ax_L), hence Th(N_L) is axiomatizable and decidable.
Proposition 18.2.14. A set X ⊆ ℕ is definable in N_L if and only if it is either finite or cofinite.
Proof. As above, it suffices to show that if x ∈ Var and φ(x) ∈ AtomicForm_L, then φ(x) defines a
subset of ℕ which is either finite or cofinite.
Since all terms have the form either S^ℓ y for some ℓ ∈ ℕ and some y ∈ Var, or S^ℓ 0 for some ℓ ∈ ℕ, and
the only free variable in φ(x) is x, it follows that φ(x) is s.e. to one of the following:
1. S^ℓ x = S^p x
2. S^ℓ x = S^p 0
3. S^ℓ x < S^p x
4. S^ℓ x < S^p 0
5. S^ℓ 0 < S^p x
for some ℓ, p ∈ ℕ. In any case, we see that φ(x) defines a set which is either finite or cofinite. Since the sets
definable by quantifier-free formulas are obtained from the sets definable by atomic formulas by generating
with finite unions and complements, and the collection of finite and cofinite sets is closed under these
operations, the result follows.

18.3

The Natural Numbers with Addition

Definition 18.3.1. Let L = {0, 1, <, +} where 0 and 1 are constant symbols, < is a binary relation symbol,
and + is a binary function symbol. Let N_A = (ℕ, 0, 1, <, +).
Given n ∈ ℕ⁺ and y ∈ Var, we use n·y as shorthand for y + y + ⋯ + y (n times), and given n ∈ ℕ, we
use n as shorthand for the term n·1.
Proposition 18.3.2. Th(N_A) does not have QE.
Proof. Notice that if φ(x) ∈ Form_L is quantifier-free, then {n ∈ ℕ : (N_A, n) ⊨ φ} is either finite or cofinite,
because atomic formulas in the one variable x are linear equations and inequalities. Since ∃y(y + y = x) defines
the set of even numbers, which is neither finite nor cofinite, it follows that ∃y(y + y = x) is not equivalent to
any quantifier-free formula.
Definition 18.3.3. Let Ax_A be the following set of L-sentences.
1. ∀x(x ≠ 0 → ∃y(y + 1 = x))
2. ∀x ¬(x < x)
3. ∀x∀y∀z((x < y ∧ y < z) → x < z)
4. ∀x∀y((x < y) ∨ (y < x) ∨ (x = y))
5. ∀x(x ≠ 0 → 0 < x)
6. ∀x(x < x + 1)
7. ∀x∀y ¬(x < y ∧ y < x + 1)
8. ∀x(x + 0 = x)
9. ∀x∀y∀z((x + y) + z = x + (y + z))
10. ∀x∀y(x + y = y + x)
11. ∀x∀y(x < y → ∃z(x + z = y))
12. ∀w∀x∀y∀z((w < y ∧ (x < z ∨ x = z)) → w + x < y + z)
13. ∀x∃y ⋁_{k<n} (x = n·y + k) for each n ≥ 2.
Lemma 18.3.4.
1. Ax_A ⊨ ∀x(x < ℓ ↔ ⋁_{k=0}^{ℓ-1} (x = k)) whenever ℓ ∈ ℕ⁺.
2. Ax_A ⊨ ¬(k = ℓ) for all k ≠ ℓ.
3. Ax_A ⊨ k + ℓ = k + ℓ for all k, ℓ ∈ ℕ (on the left, the sum of the shorthand terms k and ℓ; on the
right, the shorthand term for the number k + ℓ).
4. Ax_A ⊨ ∀x∀y∀z(x + z = y + z → x = y).
5. Ax_A ⊨ ∀x(n·x + m·x = (n + m)·x) whenever n, m ∈ ℕ.
Definition 18.3.5. Let L⁺ = L ∪ {Dₙ : n ≥ 2} where each Dₙ is a unary relation symbol. Let N_A⁺ be N_A
together with interpreting each Dₙ as {k ∈ ℕ : n | k}. Let Ax_A⁺ be Ax_A together with the sentences

    ∀x(Dₙ(x) ↔ ∃y(x = n·y))

for each n ≥ 2.


Proposition 18.3.6. Cn(Ax_A⁺) has QE.
Proof.

Corollary 18.3.7. Cn(Ax_A⁺) is complete.

Corollary 18.3.8. Th(N_A⁺) = Cn(Ax_A⁺), hence Th(N_A⁺) is axiomatizable and decidable.

Corollary 18.3.9. Cn(Ax_A) is complete.

Corollary 18.3.10. Th(N_A) = Cn(Ax_A), hence Th(N_A) is axiomatizable and decidable.
Definition 18.3.11. A set X ⊆ ℕ is eventually periodic if there exist m ∈ ℕ and p ∈ ℕ⁺ such that for all
n ∈ ℕ with n > m, we have n ∈ X if and only if n + p ∈ X.
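A finite sanity check of this definition (verifying the condition only up to a bound, so a sketch rather than a proof): the even numbers are eventually periodic with m = 0 and p = 2, while no small pair (m, p) works for the set of squares, which is the fact used in Corollary 18.3.13 below.

```python
def eventually_periodic(X, m, p, bound):
    """Check 'n in X iff n + p in X' for all n with m < n < bound."""
    return all((n in X) == (n + p in X) for n in range(m + 1, bound))

evens = {n for n in range(1000) if n % 2 == 0}
squares = {n * n for n in range(32)}   # the squares below 1000

# evens satisfy the condition with period 2; for squares, every small
# (m, p) fails because the gaps between consecutive squares grow.
```

Since the gap between consecutive squares is unbounded, for any fixed p there are squares n with n + p not a square, which is why the check below never succeeds for the squares.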
Proposition 18.3.12. A set X ⊆ ℕ is definable in N_A if and only if it is eventually periodic.
Proof.
Corollary 18.3.13. The set Y = {(k, m, n) ∈ ℕ³ : k·m = n} is not definable in N_A.
Proof. Suppose that Y is definable in N_A, say by the formula φ(x, y, z). Let ψ(z) be the formula ∃x φ(x, x, z).
We then have that ψ defines X = {n ∈ ℕ : n is a square} in N_A. However, X is not eventually periodic, so
this is a contradiction. It follows that Y is not definable in N_A.


Chapter 19

Number Theory
19.1

Definability in N

Definition 19.1.1. Let L = {0, 1, +, ·} where 0 and 1 are constant symbols, and + and · are binary function
symbols. Let N = (ℕ, 0, 1, +, ·).
Theorem 19.1.2 (Essentially Gödel). The collection of partial computable functions equals the collection of
partial functions obtained by starting with O, S, I_i^n, +, ·, and Equal, and closing off under Compose and Minimize.
Proof. Let F be the collection of all such functions. Since +, ·, and Equal are all partial computable, it
follows by a simple induction that every element of F is partial recursive (i.e. partial computable). For the
converse, we need to show that if g : ℕⁿ → ℕ and h : ℕⁿ⁺² → ℕ are in F, then PrimRec(g, h) ∈ F.
Our method to accomplish this will be to define a function β : ℕ² → ℕ such that β ∈ F, with the property
that for all a₀, a₁, ..., aₙ ∈ ℕ, there exists c ∈ ℕ such that β(c, i) = aᵢ for all i ≤ n. That is, we want a
function of two variables in F which is able to code finite sequences. You may think that we've already done
this through our sequence decoding function. However, our coding of sequences uses powers of primes, and
exponentiation is defined using primitive recursion, which at this point is not obviously in F (we will know
that it is once we prove the theorem, but we don't know that yet).
Suppose that we have such a function β. Suppose that g : ℕⁿ → ℕ and h : ℕⁿ⁺² → ℕ are in F. Let
f = PrimRec(g, h). Consider the function t : ℕⁿ⁺¹ → ℕ defined by

    t(x⃗, y) = μz[β(z, 0) = g(x⃗) ∧ (∀i < y)(β(z, i + 1) = h(x⃗, i, β(z, i)))]

Assuming that F is closed under some basic operations (like ∧ and bounded quantification), it follows that
t ∈ F. Since f(x⃗, y) = β(t(x⃗, y), y), it follows that f ∈ F. I'll leave it to you to think about why F is closed
under these basic operations, and move on to the subtle part of the argument.
Rather than defining our function β of two variables all at once, we'll first make a function of three
variables work. Assuming that some pairing function exists in F (it does, see below), this will be sufficient. Thus,
we define a function γ : ℕ³ → ℕ in F such that for all a₀, a₁, ..., aₙ ∈ ℕ, there exist b, k ∈ ℕ such that
γ(b, k, i) = aᵢ for all i ≤ n. The idea for this function is to get the aᵢ's as remainders upon division of the
number b by n + 1 numbers given in terms of k and i. To this end, recall the Chinese Remainder Theorem,
which says that if d₀, d₁, ..., dₙ ∈ ℕ are pairwise relatively prime, and a₀, a₁, ..., aₙ ∈ ℕ satisfy aᵢ < dᵢ for
all i, then there exists m ∈ ℕ such that m ≡ aᵢ (mod dᵢ) for all i.
Let γ(b, k, i) be the remainder upon division of b by 1 + (i + 1)k. To see that γ ∈ F, simply notice
that γ(b, k, i) = μr[(∃q < b + 1)(b = q·(1 + (i + 1)k) + r)]. We now need to show that γ works. Suppose
that a₀, a₁, ..., aₙ ∈ ℕ. Let s = max{n, a₀, a₁, ..., aₙ} + 1 and let k = s!. We first argue that the numbers
1 + k, 1 + 2k, ..., 1 + (n + 1)k are pairwise relatively prime. Suppose that 1 ≤ i, j ≤ n + 1, that p is prime,
and that p | (1 + ik) and p | (1 + jk). Since p | (1 + ik), we must have p ∤ k, hence it must be the case
that p > s. We also have p | (i − j)k, hence p | (i − j) (because p is prime and p ∤ k). Therefore, since
p > s ≥ n + 1, it must be the case that i = j. Thus, the numbers 1 + k, 1 + 2k, ..., 1 + (n + 1)k are pairwise
relatively prime. Since aᵢ < s ≤ k < 1 + (i + 1)k for all i, it follows by the Chinese Remainder Theorem
that there exists b ∈ ℕ such that γ(b, k, i) = aᵢ for all i.
Next, we need a pairing function which is in F. Rather than using powers of primes, and thus
exponentiation, we will make do with the standard "diagonalizing through the plane" encoding. That is,
we define J : ℕ² → ℕ by letting

    J(x, y) = (Σ_{i=1}^{x+y} i) + x = (x + y)(x + y + 1)/2 + x

which simplifies to

    J(x, y) = ((x + y)² + 3x + y)/2

Notice that J(x, y) = μz(2z = (x + y)² + 3x + y), so J ∈ F. Now define L : ℕ → ℕ by letting L(z) =
μx((∃y < z + 1)(J(x, y) = z)) and R : ℕ → ℕ by letting R(z) = μy((∃x < z + 1)(J(x, y) = z)), and notice that L, R ∈ F.
Finally, define β : ℕ² → ℕ by letting β(c, i) = γ(L(c), R(c), i), and notice that β ∈ F. Given a₀, a₁, ..., aₙ
∈ ℕ, we then have that there exists c ∈ ℕ such that β(c, i) = aᵢ for all i ≤ n.
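The whole construction can be run concretely. The closed-form inverse of J and the explicit Chinese-Remainder combination below are conveniences of this sketch (the text uses μ-searches); the functions J, γ, and β themselves follow the definitions above.

```python
from math import factorial, isqrt

def J(x, y):
    """The pairing J(x, y) = ((x + y)^2 + 3x + y) / 2."""
    return ((x + y) ** 2 + 3 * x + y) // 2

def J_inv(z):
    """Invert J in closed form: z = w(w+1)/2 + x with w = x + y."""
    w = (isqrt(8 * z + 1) - 1) // 2
    x = z - w * (w + 1) // 2
    return x, w - x

def gamma(b, k, i):
    """Remainder of b upon division by 1 + (i + 1)k."""
    return b % (1 + (i + 1) * k)

def beta(c, i):
    b, k = J_inv(c)
    return gamma(b, k, i)

def code_sequence(a):
    """Find c with beta(c, i) = a[i] for all i, following the proof:
    k = s! for s = max{n, a_0, ..., a_n} + 1, then b is produced by the
    Chinese Remainder Theorem for the pairwise coprime moduli 1 + (i+1)k."""
    n = len(a) - 1
    s = max([n] + list(a)) + 1
    k = factorial(s)
    b, M = 0, 1
    for i, ai in enumerate(a):
        d = 1 + (i + 1) * k
        b += M * (((ai - b) * pow(M, -1, d)) % d)   # now b = ai (mod d)
        M *= d
    return J(b, k)
```

The incremental CRT step relies on the moduli being pairwise coprime, which is exactly what the factorial choice of k guarantees in the proof.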
Theorem 19.1.3. Every computable relation and (the graph of) every partial computable function is definable in N.
Proof. We first show that (the graph of) every partial computable function is definable in N.
Suppose that h : ℕᵐ → ℕ is definable in N and that g₁, g₂, ..., g_m : ℕⁿ → ℕ are definable in N. Let f =
Compose(h, g₁, g₂, ..., g_m). Fix φ(y₁, y₂, ..., y_m, z) ∈ Form_L defining h in N, and fix ψᵢ(x₁, x₂, ..., xₙ, y) ∈
Form_L defining gᵢ in N. Let θ(x₁, x₂, ..., xₙ, z) ∈ Form_L be the formula

    ∃y₁∃y₂⋯∃y_m(⋀_{i=1}^{m} ψᵢ(x₁, x₂, ..., xₙ, yᵢ) ∧ φ(y₁, y₂, ..., y_m, z))

We then have that θ defines f in N.

Suppose that h : ℕⁿ⁺¹ → ℕ is definable in N. Let f = Minimize(h). Fix φ(x₁, x₂, ..., xₙ, xₙ₊₁, y) ∈
Form_L defining h in N. Let ψ(x₁, x₂, ..., xₙ, y) ∈ Form_L be the formula

    φ(x₁, x₂, ..., xₙ, y, 0) ∧ ∀z(∃u(u ≠ 0 ∧ z + u = y) → ∃w(w ≠ 0 ∧ φ(x₁, x₂, ..., xₙ, z, w)))

We then have that ψ defines f in N.
Therefore, (the graph of) every partial computable function is definable in N. Suppose now that R ⊆ ℕⁿ is
computable. By definition, this means that the characteristic function K_R : ℕⁿ → ℕ is a computable function.
Fix φ(x₁, x₂, ..., xₙ, y) ∈ Form_L defining K_R in N. We then have that the formula φ(x₁, x₂, ..., xₙ, 1) defines
R in N.
Corollary 19.1.4. If X ⊆ ℕ is c.e., then X is definable in N.
Proof. Since X is c.e., there exists a computable set C ⊆ ℕ² such that

    m ∈ X ⟺ there exists n ∈ ℕ with (m, n) ∈ C

Now C is computable, so it is definable in N from above. Fix φ(x, y) ∈ Form_L defining C in N. Let ψ(x)
be the formula ∃y φ(x, y), and notice that ψ defines X in N.

19.2

Incompleteness and Undecidability in N

Our goal in this section is to prove the following theorem, which is a weak form of the First Incompleteness
Theorem. We will strengthen it later, but all of the real insight and hard work is in this version anyway.
Theorem 19.2.1 (First Incompleteness Theorem - Gödel). Th(N) is undecidable and not axiomatizable.
Notice that since Th(N) is a complete theory, we know that it is undecidable if and only if it is not
axiomatizable. Thus, it suffices to prove only one. We will give three proofs below.

19.2.1

Proof Using a C.E. Set Which is Not Computable

Proof of Incompleteness Theorem via Computability. Suppose that ] is computable and that Cn() =
T h(N). Let K be a c.e. set which is not computable. We then know that K is not c.e. Fix a formula
(x) defining K in N. Notice that the set {n N :  (n)} is c.e. However, we have that
n K N  (n)  (n)
so K = {n N :  (n)} is c.e., a contradiction.

19.2.2

Proof Using Undefinability of Truth

Theorem 19.2.2 (Undefinability of Truth - Tarski). The set ♯Th(N) is not definable in N.
Proof. Suppose that {♯σ : N ⊨ σ} is definable in N, and fix τ(x) ∈ Form_L defining it, so that we have

    N ⊨ σ ⟺ N ⊨ τ(♯σ)

for all σ ∈ Sent_L. The idea is to show that there is a definable subset of ℕ² such that every definable subset
of N appears as a row. We can then definably diagonalize out by taking the negation of the diagonal to get
a contradiction.
Notice that the function f : ℕ² → ℕ given by letting

    f(m, n) = ♯(φ(n))   if m = ♯φ for a formula φ with one free variable
              0         otherwise

is computable. Fix θ(x, y, z) defining (the graph of) f in N. Let ψ(x, y) be the formula ∃z(θ(x, y, z) ∧ τ(z)).
Notice that for all φ(x) ∈ Form_L and all n ∈ ℕ, we have

    N ⊨ ψ(♯φ, n) ⟺ N ⊨ τ(f(♯φ, n))
                 ⟺ N ⊨ τ(♯(φ(n)))
                 ⟺ N ⊨ φ(n)

Now let δ(x) be the formula ¬ψ(x, x) (so δ defines the complement of the diagonal). The point is that we
diagonalized out, so δ can't define one of the rows. But it must be one of the rows, since in fact it must be
row number ♯δ. Formally, we have

    N ⊨ δ(♯δ) ⟺ N ⊨ ¬ψ(♯δ, ♯δ)
              ⟺ N ⊭ ψ(♯δ, ♯δ)
              ⟺ N ⊭ δ(♯δ)

which is a contradiction.
Proof of Incompleteness Theorem via Definability. If Th(N) is decidable, then ♯Th(N) is computable, hence
definable in N, contradicting Undefinability of Truth. Therefore, Th(N) is undecidable.

19.2.3

Proof Using a Sentence Implying Its Nonprovability

Our next proof of the Incompleteness Theorem uses the following fundamental lemma, which allows us to make
sentences which indirectly refer to themselves.
Lemma 19.2.3 (Fixed-Point Lemma - Gödel). Let φ(x) ∈ Form_L. There exists σ ∈ Sent_L such that

    N ⊨ σ ↔ φ(♯σ)
Proof. As above, notice that the function f : ℕ² → ℕ defined by letting

    f(m, n) = ♯(γ(n))   if m = ♯γ for a formula γ with one free variable
              0         otherwise

is computable. Fix θ(x, y, z) defining (the graph of) f in N. Let ψ(x, y) be the formula ∃z(θ(x, y, z) ∧ φ(z)).
Notice that for all γ(x) ∈ Form_L and all n ∈ ℕ, we have

    N ⊨ ψ(♯γ, n) ⟺ N ⊨ φ(f(♯γ, n))
                 ⟺ N ⊨ φ(♯(γ(n)))

Now let δ(x) be the formula ψ(x, x) (so δ defines the diagonal). The point here is to look at what happens
when the row ♯δ which is defining the diagonal actually meets the diagonal. That is, we should look at the
(♯δ, ♯δ) entry of the table. We have

    N ⊨ δ(♯δ) ⟺ N ⊨ ψ(♯δ, ♯δ)
              ⟺ N ⊨ φ(♯(δ(♯δ)))

Thus, if we let σ = δ(♯δ), we then have that N ⊨ σ if and only if N ⊨ φ(♯σ). That is, N ⊨ σ ↔ φ(♯σ).
We first show how to get Undefinability of Truth using the Fixed-Point Lemma. The idea is to take a
purported definition of truth, and use it to get a sentence which indirectly says that it is false.
Using the Fixed-Point Lemma to Prove Undefinability of Truth. Suppose that the set {♯σ : N ⊨ σ} is
definable in N, and fix τ(x) ∈ Form_L defining it, so that

    N ⊨ σ ⟺ N ⊨ τ(♯σ)

for all σ ∈ Sent_L. By the Fixed-Point Lemma, there exists σ ∈ Sent_L such that N ⊨ σ ↔ ¬τ(♯σ). We then
have that

    N ⊨ σ ⟺ N ⊨ ¬τ(♯σ)
          ⟺ N ⊭ τ(♯σ)
          ⟺ N ⊭ σ

a contradiction.
We now give another proof of incompleteness, using a sentence which indirectly asserts that it is not
provable.
Proof of Incompleteness Theorem via Self-Reference. Suppose that Σ ⊆ Th(N) and that ♯Σ is computable.
We then have that the set {♯φ : Σ ⊨ φ} is c.e., so it is definable in N. Fix Prv_Σ(x) ∈ Form_L defining
{♯φ : Σ ⊨ φ} in N, so that

    N ⊨ Prv_Σ(♯φ) ⟺ Σ ⊨ φ

for all φ ∈ Sent_L. By the Fixed-Point Lemma, there exists σ ∈ Sent_L such that

    N ⊨ σ ↔ ¬Prv_Σ(♯σ)

Now if Σ ⊨ σ, we would then have that N ⊨ σ (because Σ ⊆ Th(N)), but we also have

    Σ ⊨ σ ⟹ N ⊨ Prv_Σ(♯σ) ⟹ N ⊨ ¬σ

which is a contradiction. Therefore, we must have Σ ⊭ σ. It follows that N ⊭ Prv_Σ(♯σ), so N ⊨ ¬Prv_Σ(♯σ),
and hence N ⊨ σ. Therefore, σ ∈ Th(N)\Cn(Σ), so Cn(Σ) ≠ Th(N).
It follows that Th(N) is not axiomatizable.

19.3

Robinson's Q and Peano Arithmetic

19.3.1

Robinson's Q

Definition 19.3.1. Let L = {0, S, +, ·} and let Ax_Q be the following set of L-sentences.
1. ∀x∀y(Sx = Sy → x = y)
2. ∀x(Sx ≠ 0)
3. ∀x(x ≠ 0 → ∃y(Sy = x))
4. ∀x(x + 0 = x)
5. ∀x∀y(x + Sy = S(x + y))
6. ∀x(x · 0 = 0)
7. ∀x∀y(x · Sy = x · y + x)
Let Q = Cn(Ax_Q).
Definition 19.3.2. Let ≤(x, y) be the formula ∃z(z + x = y) and let <(x, y) be the formula ≤(x, y) ∧ x ≠ y.
Proposition 19.3.3.
1. Q ⊨ ∀x(x + k = Sᵏx) for all k ∈ ℕ.
2. Q ⊨ ∀x(x + k + 1 ≠ k) for all k ∈ ℕ.
3. Q ⊨ ∀x(≤(x, ℓ) ↔ ⋁_{i=0}^{ℓ} (x = i)) for all ℓ ∈ ℕ.
4. Q ⊨ ∀x(<(x, k) ∨ x = k ∨ <(k, x)) for all k ∈ ℕ.
5. Q ⊨ ¬(k = ℓ) for all k ≠ ℓ.
6. Q ⊨ k + ℓ = k + ℓ for all k, ℓ ∈ ℕ.
7. Q ⊨ k · ℓ = k · ℓ for all k, ℓ ∈ ℕ.
Proposition 19.3.4. For every variable-free t ∈ Term_L, there exists k ∈ ℕ such that Q ⊨ t = k.
Proof.
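The (omitted) proof is a structural recursion, and the recursion itself is easy to sketch. We represent variable-free terms as nested tuples (our own representation), with 'S' for the successor symbol:

```python
def evaluate(t):
    """Return the k with Q |= t = k, by recursion on the term t.
    Terms: '0', ('S', t), ('+', t, u), ('*', t, u)."""
    if t == '0':
        return 0
    op = t[0]
    if op == 'S':
        return evaluate(t[1]) + 1
    if op == '+':
        return evaluate(t[1]) + evaluate(t[2])
    if op == '*':
        return evaluate(t[1]) * evaluate(t[2])
    raise ValueError('not a variable-free term')

# SS0 * (S0 + SS0): the term evaluates to the numeral 6.
term = ('*', ('S', ('S', '0')), ('+', ('S', '0'), ('S', ('S', '0'))))
```

Each recursive step corresponds to a Q-provable equation, with items 6 and 7 of Proposition 19.3.3 supplying the cases for + and ·.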


Proposition 19.3.5. If σ ∈ Sent_L is quantifier-free and N ⊨ σ, then Q ⊨ σ.

Proof.

Proposition 19.3.6. If φ(x₁, x₂, ..., xₙ) ∈ Form_L is quantifier-free and N ⊨ ∃x₁∃x₂⋯∃xₙ φ, then Q ⊨
∃x₁∃x₂⋯∃xₙ φ.
Proof.

19.3.2

Peano Arithmetic

Definition 19.3.7. Let PA be the L-theory axiomatized by Ax_Q together with the sentences

    ∀p⃗ ((φ(0, p⃗) ∧ ∀x(φ(x, p⃗) → φ(Sx, p⃗))) → ∀x φ(x, p⃗))

for all φ(x, p⃗) ∈ Form_L.

19.4

Representable Relations and Functions

Definition 19.4.1. Let L be a computable language containing a constant symbol 0 and a unary function
symbol S. Let T be an L-theory.
1. A relation R ⊆ ℕⁿ is representable in T if there exists φ(x₁, x₂, ..., xₙ) ∈ Form_L such that for all
k₁, k₂, ..., kₙ ∈ ℕ, we have
(a) If (k₁, k₂, ..., kₙ) ∈ R, then T ⊨ φ(k₁, k₂, ..., kₙ).
(b) If (k₁, k₂, ..., kₙ) ∉ R, then T ⊨ ¬φ(k₁, k₂, ..., kₙ).
2. A function f : ℕⁿ → ℕ is representable in T if graph(f) ⊆ ℕⁿ⁺¹ is representable in T (as a relation).
Proposition 19.4.2. Let T be an axiomatizable theory.
1. If R ⊆ ℕⁿ is representable in T, then R is computable.
2. If f : ℕⁿ → ℕ is representable in T, then f is computable.
Proof. Since T is axiomatizable, we may fix a set Σ such that ♯Σ is computable and Cn(Σ) = T.
1. Suppose that R ⊆ ℕⁿ is representable in T, and fix φ(x₁, x₂, ..., xₙ) ∈ Form_L representing it. Notice
that the function g : ℕⁿ → ℕ defined by g(k₁, k₂, ..., kₙ) = ♯φ(k₁, k₂, ..., kₙ) is computable. Now we
know that the set {♯ψ : Σ ⊨ ψ} is c.e., hence both R and R̅ are c.e. It follows that R is computable.
2. By part 1, we know that graph(f) is computable, so f is computable.
Definition 19.4.3. A function f : ℕⁿ → ℕ is strongly representable in T if there exists φ(x₁, x₂, ..., xₙ, y)
∈ Form_L such that φ represents f and also for all k₁, k₂, ..., kₙ ∈ ℕ we have

    T ⊨ ∀y(φ(k₁, k₂, ..., kₙ, y) → y = f(k₁, k₂, ..., kₙ))
Proposition 19.4.4. If a function f : ℕⁿ → ℕ is representable in Q, then it is strongly representable in Q.


Proof. Fix φ(x₁, x₂, ..., xₙ, y) ∈ Form_L representing f. Let ψ(x₁, x₂, ..., xₙ, y) be the formula

    φ(x₁, x₂, ..., xₙ, y) ∧ ∀z(<(z, y) → ¬φ(x₁, x₂, ..., xₙ, z))

We claim that ψ strongly represents f in Q. Fix k₁, k₂, ..., kₙ ∈ ℕ. Since φ represents f, it follows that
Q ⊨ φ(k₁, k₂, ..., kₙ, f(k₁, k₂, ..., kₙ)) and Q ⊨ ¬φ(k₁, k₂, ..., kₙ, ℓ) for all ℓ ≠ f(k₁, k₂, ..., kₙ). Since
Q ⊨ ∀x(<(x, k) ↔ ⋁_{m<k}(x = m)), it follows that Q ⊨ ψ(k₁, k₂, ..., kₙ, f(k₁, k₂, ..., kₙ)). Therefore, ψ
represents f in Q. Now Q ⊨ ∀x(<(x, f(k₁, k₂, ..., kₙ)) ∨ x = f(k₁, k₂, ..., kₙ) ∨ <(f(k₁, k₂, ..., kₙ), x)),
and combining this with the two conjuncts of ψ, one checks that Q ⊨ ∀y(ψ(k₁, k₂, ..., kₙ, y) → y =
f(k₁, k₂, ..., kₙ)).
Theorem 19.4.5. Every computable function and relation is representable in Q.
Proof.

19.5

Working From Q

Lemma 19.5.1 (Fixed-Point Lemma - Gödel). Let φ(x) ∈ Form_L. There exists σ ∈ Sent_L such that

    Q ⊨ σ ↔ φ(♯σ)
Proof. We now need to check that the argument for N can be carried out in Q. Notice that the function f : N² → N defined by letting

f(m, n) = ♯(ψ(n)) if m = ♯ψ for a formula ψ with one free variable
f(m, n) = 0 otherwise

is computable. Fix θ(x, y, z) strongly representing f in Q. Let δ(x) be the formula ∀z(θ(x, x, z) → ϕ(z)). Let σ = δ(♯δ). We need to check that

Q ⊨ σ ↔ ϕ(♯σ)

which is to say that

Q ⊨ ∀z(θ(♯δ, ♯δ, z) → ϕ(z)) ↔ ϕ(♯σ)

Notice that because θ strongly represents f in Q, and f(♯δ, ♯δ) = ♯σ, we have that

Q ⊨ ∀z(θ(♯δ, ♯δ, z) ↔ z = ♯σ)

so the left-hand side is equivalent in Q to ϕ(♯σ), as required.
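To see the shape of this diagonalization concretely, here is a minimal string-level sketch (not the actual arithmetization): formulas are strings with free variable "x", a string serves as its own "Gödel number", and the hypothetical `diag` plays the role of the diagonal values of f, i.e. diag(♯ψ) = ♯(ψ(♯ψ)).

```python
# diag substitutes the (quoted) code of psi for its free variable x, mirroring
# the map f(♯ψ, ♯ψ) = ♯(ψ(♯ψ)) from the proof.
def diag(psi: str) -> str:
    return psi.replace("x", repr(psi))

# delta(x) says "phi holds of the diagonalization of x"; sigma = delta(♯delta).
delta = "phi(diag(x))"
sigma = diag(delta)

# sigma is the string phi(diag('phi(diag(x))')), and diag('phi(diag(x))') is
# sigma itself, so sigma asserts phi of its own code, as in the lemma.
assert sigma == "phi(diag('phi(diag(x))'))"
```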
Lemma 19.5.2. Let Σ ⊆ Sent_L and let τ ∈ Sent_L. If Cn(Σ) is decidable, then Cn(Σ ∪ {τ}) is also decidable.
Proof. Notice that σ ∈ Cn(Σ ∪ {τ}) if and only if (τ → σ) ∈ Cn(Σ).
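The reduction in this proof can be illustrated in a propositional toy setting, where entailment is decidable by truth tables; `entails` and `decide_with_extra` are hypothetical names, and the point is that a single query about τ → σ to the decision procedure for Cn(Σ) decides membership in Cn(Σ ∪ {τ}).

```python
from itertools import product

ATOMS = ["p", "q"]

def entails(sigma_set, phi):
    """Decide Sigma ⊨ phi by checking every truth assignment (a toy Cn-decider)."""
    for values in product([False, True], repeat=len(ATOMS)):
        a = dict(zip(ATOMS, values))
        if all(s(a) for s in sigma_set) and not phi(a):
            return False
    return True

def decide_with_extra(sigma_set, tau, phi):
    # phi ∈ Cn(Sigma ∪ {tau}) iff (tau → phi) ∈ Cn(Sigma):
    # one call to the original decision procedure suffices.
    return entails(sigma_set, lambda a: (not tau(a)) or phi(a))

sigma_set = [lambda a: a["p"] or a["q"]]   # Sigma = {p ∨ q}
tau = lambda a: not a["p"]                 # tau = ¬p
phi = lambda a: a["q"]                     # phi = q

# The reduction agrees with direct entailment from Sigma ∪ {tau}.
assert decide_with_extra(sigma_set, tau, phi) == entails(sigma_set + [tau], phi) == True
```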
Theorem 19.5.3 (Strong Undecidability of Q). If ♯Σ is computable and Σ ∪ Q is consistent, then Cn(Σ) is undecidable.
Proof. Suppose that Cn(Σ) is decidable. Let T = Cn(Σ ∪ Q), and notice that T is decidable by the lemma (applied once for each of the finitely many sentences of Q). Fix ϕ(x) ∈ Form_L representing ♯T in Q. By the Fixed-Point Lemma, there exists σ ∈ Sent_L such that Q ⊨ σ ↔ ¬ϕ(♯σ). We then have

σ ∈ T ⟹ ♯σ ∈ ♯T
        ⟹ Q ⊨ ϕ(♯σ)  (since ϕ represents ♯T in Q)
        ⟹ Q ⊨ ¬σ  (by choice of σ)
        ⟹ ¬σ ∈ T  (since Q ⊆ T)


and

σ ∉ T ⟹ ♯σ ∉ ♯T
        ⟹ Q ⊨ ¬ϕ(♯σ)  (since ϕ represents ♯T in Q)
        ⟹ Q ⊨ σ  (by choice of σ)
        ⟹ σ ∈ T  (since Q ⊆ T)

which are both contradictions (the first because T is consistent, and the second outright).


Theorem 19.5.4 (Strong Incompleteness of Q). If ♯Σ is computable and Σ ∪ Q is consistent, then Cn(Σ) is not complete.
Proof. If Cn(Σ) were complete, then it would be decidable (since it is axiomatizable, as ♯Σ is computable), contrary to the Strong Undecidability of Q.
Corollary 19.5.5. PA is undecidable.
Corollary 19.5.6 (Church's Theorem). In the language L, the theory Cn(∅) is undecidable.

19.6 The Second Incompleteness Theorem

Definition 19.6.1. Let Σ ⊆ Sent_L be decidable. We then have that the set {(m, n) ∈ N² : n codes a deduction witnessing that Σ proves the sentence coded by m} is computable. Denote by Ded_Σ(x, y) a formula representing the above set in Q. Let Prv_Σ(x) be the formula ∃y Ded_Σ(x, y).
Lemma 19.6.2. Suppose that Σ ⊆ Sent_L and that ♯Σ is computable. If Σ ⊨ τ, then Q ⊨ Prv_Σ(♯τ).
Proof. Suppose that Σ ⊨ τ, and let n be the Gödel number of a deduction witnessing that Σ ⊢ τ. We then have that Q ⊨ Ded_Σ(♯τ, n), hence Q ⊨ ∃y Ded_Σ(♯τ, y), which is to say that Q ⊨ Prv_Σ(♯τ).
Definition 19.6.3. Suppose that Σ ⊆ Sent_L and that ♯Σ is computable. We say that Σ has the reflection property if whenever Σ ⊨ τ, we have Σ ⊨ Prv_Σ(♯τ).
Corollary 19.6.4. Suppose that Σ ⊆ Sent_L and that ♯Σ is computable. If Q ⊆ Cn(Σ), then Σ has the reflection property.
Definition 19.6.5. Suppose that Σ ⊆ Sent_L and that ♯Σ is computable. A Gödel sentence of Σ is a σ ∈ Sent_L such that

Σ ⊨ σ ↔ ¬Prv_Σ(♯σ)
Proposition 19.6.6. If Σ ⊆ Sent_L, ♯Σ is computable, and Q ⊆ Cn(Σ), then Σ has a Gödel sentence.
Proposition 19.6.7. Suppose that Σ ⊆ Sent_L, ♯Σ is computable, and Q ⊆ Cn(Σ). Let σ be a Gödel sentence of Σ. If Σ is consistent, then Σ ⊭ σ.
Proof. Suppose that Σ is consistent and, seeking a contradiction, that Σ ⊨ σ. Notice that

Σ ⊨ σ ⟹ Σ ⊨ Prv_Σ(♯σ)  (by reflection)
        ⟹ Σ ⊨ ¬σ  (since σ is a Gödel sentence of Σ)
        ⟹ Σ is inconsistent

a contradiction, so Σ ⊭ σ.


Definition 19.6.8. Given a set Σ ⊆ Sent_L such that ♯Σ is computable, let Con_Σ be the sentence ¬Prv_Σ(♯(0 = 1)).
Definition 19.6.9. Suppose that Σ ⊆ Sent_L and that ♯Σ is computable. We say that Σ is sufficiently strong if
1. Q ⊆ Cn(Σ).
2. For any τ ∈ Sent_L, we have Σ ⊨ Prv_Σ(♯τ) → Prv_Σ(♯Prv_Σ(♯τ)). We call this formalized reflection.
3. For any τ, π ∈ Sent_L, we have Σ ⊨ (Prv_Σ(♯τ) ∧ Prv_Σ(♯(τ → π))) → Prv_Σ(♯π). We call this formalized Modus Ponens.
Theorem 19.6.10 (Second Incompleteness Theorem - Gödel). Suppose that Σ ⊆ Sent_L, that ♯Σ is computable, and that Cn(Σ) is sufficiently strong. We then have that Σ ⊨ Con_Σ if and only if Σ is inconsistent.
Proof. Let σ be a Gödel sentence of Σ. We formalize the proof of Proposition 19.6.7 (which says that if Σ is consistent then Σ ⊭ σ) inside Σ to show that Σ ⊨ Con_Σ → ¬Prv_Σ(♯σ). From this it will follow that Σ ⊨ Con_Σ → σ, which will give the result as we'll see below.
1. Notice that Σ ⊨ Prv_Σ(♯σ) → Prv_Σ(♯Prv_Σ(♯σ)) by formalized reflection (so the first implication in the proof of Proposition 19.6.7 holds inside Σ).
2. Now since Σ ⊨ Prv_Σ(♯σ) → ¬σ by choice of σ, we have Σ ⊨ Prv_Σ(♯(Prv_Σ(♯σ) → ¬σ)) by reflection. Using formalized Modus Ponens, it follows that Σ ⊨ Prv_Σ(♯Prv_Σ(♯σ)) → Prv_Σ(♯¬σ) (so the second implication in the proof of Proposition 19.6.7 holds inside Σ).
3. By combining 1 and 2, we therefore have that Σ ⊨ Prv_Σ(♯σ) → Prv_Σ(♯¬σ).
4. Notice that Σ ⊨ σ → (¬σ → (0 = 1)), so Σ ⊨ Prv_Σ(♯(σ → (¬σ → (0 = 1)))) by reflection. By formalized Modus Ponens, it follows that Σ ⊨ Prv_Σ(♯σ) → Prv_Σ(♯(¬σ → (0 = 1))).
5. Combining 3 and 4, we see that Σ ⊨ Prv_Σ(♯σ) → (Prv_Σ(♯¬σ) ∧ Prv_Σ(♯(¬σ → (0 = 1)))).
6. Therefore, Σ ⊨ Prv_Σ(♯σ) → Prv_Σ(♯(0 = 1)) by formalized Modus Ponens, which is to say that Σ ⊨ Prv_Σ(♯σ) → ¬Con_Σ.
7. Hence, Σ ⊨ Con_Σ → ¬Prv_Σ(♯σ).
8. Since σ is a Gödel sentence of Σ, it follows that Σ ⊨ Con_Σ → σ.
Therefore, if Σ ⊨ Con_Σ, it would follow that Σ ⊨ σ, which would imply that Σ was inconsistent by Proposition 19.6.7. Conversely, if Σ is inconsistent, then Σ ⊨ Con_Σ trivially.
Corollary 19.6.11. There exists a consistent, decidable Σ with PA ⊆ Cn(Σ) such that Σ ⊨ ¬Con_Σ.
Proof. Let Σ be the axioms of PA together with ¬Con_PA.

19.7 Diophantine Sets

Definition 19.7.1. A set D ⊆ N⁺ is Diophantine if there exists f(x1, x2, . . . , xk, y) ∈ Z[x1, x2, . . . , xk, y] such that for all n ∈ N⁺, we have

n ∈ D ⟺ There exist m1, m2, . . . , mk ∈ N⁺ such that f(m1, m2, . . . , mk, n) = 0
Proposition 19.7.2. The set D = {n ∈ N⁺ : n is composite} is Diophantine.


Proof. Notice that n ∈ D if and only if there exist ℓ, m ∈ N⁺ with n = (ℓ + 1)(m + 1).
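As a quick sanity check (not part of the text), one can search for roots of the corresponding polynomial f(x1, x2, y) = (x1 + 1)(x2 + 1) − y over a finite range; the bound is an artificial search cutoff, not part of the Diophantine definition.

```python
# n is composite iff f(l, m, n) = (l+1)*(m+1) - n has a root with l, m in N+.
def is_composite_diophantine(n, bound=100):
    return any((l + 1) * (m + 1) == n
               for l in range(1, bound)
               for m in range(1, bound))

assert [n for n in range(2, 20) if is_composite_diophantine(n)] == \
       [4, 6, 8, 9, 10, 12, 14, 15, 16, 18]
```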
Proposition 19.7.3. The set D = {n ∈ N⁺ : n is not a power of 2} is Diophantine.
Proof. Notice that n ∈ D if and only if there exist ℓ, m ∈ N⁺ with n = ℓ(2m + 1).
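Similarly, a brute-force search over ℓ(2m + 1) (again with an artificial cutoff) confirms that exactly the powers of 2 are excluded, since n = ℓ(2m + 1) with ℓ, m ≥ 1 says precisely that n has an odd divisor at least 3.

```python
# n is not a power of 2 iff n = l*(2m+1) for some l, m in N+.
def has_odd_factor(n, bound=100):
    return any(l * (2 * m + 1) == n
               for l in range(1, bound)
               for m in range(1, bound))

assert [n for n in range(1, 17) if not has_odd_factor(n)] == [1, 2, 4, 8, 16]
```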
Proposition 19.7.4. Every Diophantine set is c.e.
Theorem 19.7.5 (Davis, Putnam, Robinson, Matiyasevich). Every c.e. set is Diophantine.
Corollary 19.7.6. Suppose that Σ ⊆ Th(N) is decidable. There exists f ∈ Z[x1, x2, . . . , xk] such that if σ is the sentence expressing that f has no root in N⁺, then N ⊨ σ but Σ ⊭ σ.

19.8 A Speed-Up Theorem

Theorem 19.8.1 (Gödel). Suppose that ♯Σ is computable, that Q ⊆ Cn(Σ), and that τ is such that Σ ⊭ τ and Σ ⊭ ¬τ. For any computable h : N → N, there exist σ and n such that
1. Σ ⊨ σ.
2. Σ ∪ {τ} ⊨ σ via a proof of length at most n.
3. Every proof of σ from Σ has length at least h(n).
Proof. We first argue that Cn(Σ ∪ {τ})\Cn(Σ) is not c.e. Suppose instead that Cn(Σ ∪ {τ})\Cn(Σ) was c.e. We show that the complement of Cn(Σ ∪ {¬τ}) is c.e., implying that Cn(Σ ∪ {¬τ}) is computable, which contradicts the Strong Undecidability of Q (note that Σ ∪ {¬τ} ∪ Q is consistent, because Q ⊆ Cn(Σ) and Σ ⊭ τ). Since Σ ∪ {τ} ⊨ ¬τ → σ trivially for every σ ∈ Sent_L, we have

σ ∉ Cn(Σ ∪ {¬τ}) ⟺ Σ ∪ {¬τ} ⊭ σ
                  ⟺ Σ ⊭ ¬τ → σ
                  ⟺ (¬τ → σ) ∉ Cn(Σ)
                  ⟺ (¬τ → σ) ∈ Cn(Σ ∪ {τ})\Cn(Σ)

Suppose then that h : N → N is computable but that there is no σ and n satisfying the above three conditions. We show that Cn(Σ ∪ {τ})\Cn(Σ) is c.e., a contradiction. Given σ, wait until (if ever) we see it enter Cn(Σ ∪ {τ}) via a proof of length n. If we ever see this happen, check all proofs from Σ of length up to h(n) to see if σ appears. If not, enumerate σ. Every element of Cn(Σ ∪ {τ})\Cn(Σ) is eventually enumerated, since it has no proof from Σ at all; conversely, any enumerated σ satisfies conditions 2 and 3 for some n, so by our assumption it must fail condition 1, which means that σ ∉ Cn(Σ) and hence σ ∈ Cn(Σ ∪ {τ})\Cn(Σ).
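The enumeration procedure in the last paragraph can be sketched as follows, with hypothetical stand-ins: a stream of (sentence, proof length) pairs enumerating Cn(Σ ∪ {τ}), a checker for short Σ-proofs, and a toy instantiation just to exercise the control flow.

```python
def enumerate_difference(proofs_with_tau, has_sigma_proof_of_length_at_most, h):
    # Under the assumption that no sigma and n satisfy conditions 1-3, this
    # enumerates Cn(Sigma ∪ {tau}) \ Cn(Sigma): a sentence with a length-n proof
    # from Sigma ∪ {tau} but no Sigma-proof of length at most h(n) cannot be in
    # Cn(Sigma) at all.
    for sentence, n in proofs_with_tau:
        if not has_sigma_proof_of_length_at_most(sentence, h(n)):
            yield sentence

# Toy instantiation: "sentences" are integers, Cn(Sigma ∪ {tau}) is the even
# numbers, Cn(Sigma) is the multiples of 4, and proof lengths are made up.
stream = [(2, 1), (4, 2), (6, 3), (8, 4)]
out = list(enumerate_difference(stream, lambda s, bound: s % 4 == 0, lambda n: 2 * n))
assert out == [2, 6]
```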
